AI-Based Classification of Coffee Leaf Rust from Leaf Images in Smallholder Kenyan Farms

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.


Authors

Maurice Wanyonyi 

Abstract

Coffee leaf rust (CLR), caused by Hemileia vastatrix, remains a major threat to smallholder coffee production, yet smallholders have limited access to timely and actionable disease risk information. This study developed and compared machine learning models for predicting CLR incidence using 9,850 plot-level observations collected across six Arabica-producing counties in Kenya between 2018 and 2023. The dataset included microclimatic variables (relative humidity, temperature, precipitation, and leaf wetness duration), spatial variables (elevation, distance to the nearest infected farm, and normalized difference vegetation index, NDVI), and agronomic variables (coffee variety, plant age, shade cover, fungicide use, and outbreak history). After preprocessing, we addressed class imbalance (36.1% of observations CLR-positive) with the synthetic minority oversampling technique (SMOTE), then trained and evaluated logistic regression, random forest, XGBoost, support vector machine, and artificial neural network models. Logistic regression achieved the highest discriminative performance (area under the receiver operating characteristic curve, AUC = 0.872) and the best calibration (Brier score = 0.148). XGBoost achieved comparable discrimination (AUC = 0.845) and better captured non-linear threshold effects. Leaf wetness duration (odds ratio, OR = 3.21 per hour), relative humidity (OR = 2.75 per percentage point), and distance to the nearest infected farm (OR = 0.51 per km) were the most influential predictors of CLR incidence across models. SHapley Additive exPlanations (SHAP) analysis revealed clear non-linear thresholds: CLR risk increases sharply when relative humidity exceeds 80% or when leaf wetness duration exceeds 12 hours per day. Scalability analysis showed that logistic regression and XGBoost are computationally efficient, with model sizes below 2 MB and inference latencies under 2 ms per sample on a standard CPU, which makes both models suitable for real-time prediction on low-cost smartphones.
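The abstract compares models on discrimination (AUC), calibration (Brier score), and per-unit odds ratios, after rebalancing the data with SMOTE. The minimal Python sketch below, using only the standard library, illustrates how these quantities are defined. It is not the author's pipeline, and the function names (`brier_score`, `roc_auc`, `odds_ratio`, `smote`) are hypothetical. The `smote` function assumes all features are numeric and already scaled to comparable ranges, because it relies on Euclidean distance; categorical variables such as coffee variety would need a mixed-type variant.

```python
import math
import random


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores higher than a random negative.

    Ties count as half a win (Mann-Whitney formulation); O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(coef, delta=1.0):
    """Odds ratio for a `delta`-unit increase in a predictor with logistic-regression coefficient `coef`."""
    return math.exp(coef * delta)


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE: each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x and keep the k closest.
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, p))  # 0.75: 3 of 4 positive/negative pairs ranked correctly
    print(brier_score(y, p))
    print(odds_ratio(math.log(3.21), delta=3))  # odds multiplier for +3 h of leaf wetness
```

Under a logistic model, a coefficient of ln(3.21) ≈ 1.166 on leaf wetness duration corresponds to the reported OR of 3.21 per hour, and the odds multiply across hours, so a 3-hour increase corresponds to 3.21³ ≈ 33. Oversampling such as SMOTE is normally applied to the training split only, so that synthetic samples do not inflate the AUC and Brier score measured on held-out data.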

DOI

https://doi.org/10.31223/X5ZB6T

Subjects

Life Sciences

Keywords

Coffee leaf rust, Machine learning, Plant disease classification, Smallholder farming systems

Dates

Published: 2026-05-26 12:59

Last Updated: 2026-05-26 12:59

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
The author declares that no conflict of interest exists.

Data Availability:
The data underlying this study were deposited in Zenodo and can be accessed at https://doi.org/10.5281/zenodo.17861841
