This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
AI–Based Classification of Coffee Leaf Rust from Leaf Images in Smallholder Kenyan Farms
Abstract
Coffee leaf rust (CLR), caused by Hemileia vastatrix, remains a major threat to smallholder coffee production, yet smallholders have limited access to timely, actionable disease-risk information. This study developed and compared machine learning models for predicting CLR incidence using 9,850 plot-level observations collected across six Arabica-producing counties in Kenya between 2018 and 2023. The dataset included microclimatic variables (relative humidity, temperature, precipitation, and leaf wetness duration), spatial variables (elevation, distance to infected farms, and NDVI), and agronomic variables (coffee variety, plant age, shade cover, fungicide use, and outbreak history). After preprocessing, we addressed class imbalance (36.1% CLR-positive) with the synthetic minority oversampling technique (SMOTE) and then trained and evaluated logistic regression, random forest, XGBoost, support vector machine, and artificial neural network models. Logistic regression achieved the highest discriminative performance (area under the receiver operating characteristic curve, AUC = 0.872) and the best calibration (Brier score = 0.148). XGBoost performed comparably (AUC = 0.845) and better captured non-linear threshold effects. Across models, leaf wetness duration, relative humidity, and distance to the nearest infected farm were the most influential predictors of CLR incidence; in the logistic regression model, their odds ratios (OR) were 3.21 per hour, 2.75 per percentage point, and 0.51 per km, respectively. SHapley Additive exPlanations (SHAP) revealed clear non-linear thresholds: CLR risk increased sharply when relative humidity exceeded 80% or when leaf wetness duration exceeded 12 hours per day. A scalability analysis showed that logistic regression and XGBoost are computationally efficient, with model sizes below 2 MB and inference latencies under 2 ms per sample on a standard CPU, indicating that both models are suitable candidates for real-time prediction on low-cost smartphones.
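The evaluation quantities named in the abstract (SMOTE oversampling, AUC, Brier score, and per-unit odds ratios) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names (`smote`, `roc_auc`, `brier_score`, `odds_ratio`) and the toy data are hypothetical, and `smote` is a simplified version of the SMOTE interpolation idea (a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours). The odds-ratio helper uses the standard logistic-regression relation OR = exp(β·Δ), so an OR of 3.21 per hour corresponds to a coefficient of ln(3.21) ≈ 1.17 per hour. In practice, scikit-learn (`roc_auc_score`, `brier_score_loss`, `LogisticRegression`) and imbalanced-learn (`SMOTE`) provide production implementations.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def odds_ratio(coef, delta=1.0):
    """Odds ratio implied by a logistic-regression coefficient for a delta-unit increase."""
    return math.exp(coef * delta)


if __name__ == "__main__":
    # Toy labels and predicted probabilities only; not data from the study.
    y = [1, 0, 1, 0]
    p = [0.9, 0.7, 0.6, 0.2]
    print("AUC:", roc_auc(y, p))                   # prints 0.75
    print("Brier:", round(brier_score(y, p), 3))   # prints 0.175
    # A coefficient of ln(3.21) per hour corresponds to OR = 3.21 per hour.
    print("OR:", round(odds_ratio(math.log(3.21)), 2))  # prints 3.21
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote(minority, n_new=3, k=2))
```

In a real pipeline, oversampling is normally applied to the training data only, so that synthetic points do not leak into the data used to compute AUC and Brier scores.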
DOI
https://doi.org/10.31223/X5ZB6T
Subjects
Life Sciences
Keywords
Coffee leaf rust, Machine learning, Plant disease classification, Smallholder farming systems
Dates
Published: 2026-05-26 12:59
Last Updated: 2026-05-26 12:59
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
The author declares that no conflict of interest exists.
Data Availability:
The data underlying this study have been deposited in Zenodo and can be accessed at https://doi.org/10.5281/zenodo.17861841