This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.
Explainable machine learning for wheat above-ground biomass estimation under spatial cross-validation: sensor contributions and exploratory harvest forecasting
Abstract
Global food security faces increasing challenges under climate change, making accurate monitoring of wheat (*Triticum aestivum*) critical. This study presents an explainable machine learning (ML) framework to estimate in-season wheat above-ground biomass (AGB) and to explore harvest forecasting one to four months ahead. The framework integrates in-situ weather and soil moisture data with Sentinel-1, Sentinel-2, and PlanetScope imagery across four Mediterranean wheat site-seasons (2020 to 2023) in central Chile. We evaluated 41 models: 40 recipe-algorithm combinations spanning eight sensor-combination recipes and five ML algorithms (Random Forest [RF], XGBoost, GLMnet, bagMLP, and KNN), plus a stacked ensemble. Their spatial transferability was given a preliminary, exploratory assessment with a leave-one-site-out cross-validation (LOSO-CV) scheme across the four site-seasons, and model transparency was assessed via the DALEX framework.
For in-season estimation (stage 1), the best LOSO-CV configuration (RF using only PlanetScope predictors, *rec7*) reached R²=0.78 (RMSE=7.87 t/ha, MAE=5.90 t/ha). A sensor-ablation analysis showed that Sentinel-2 reduced RMSE by 23.4% relative to a weather-only baseline, Sentinel-1 added essentially no further skill (+0.5%), and PlanetScope contributed a further 5.7% reduction. DALEX analysis of the stacked ensemble identified a Sentinel-2 SWIR-related chlorophyll index (S2~SWIR12~-MCARI), two PlanetScope band/index predictors, a cumulative Sentinel-2 red-edge index, and the Sentinel-1 VH/VV backscatter ratio as the leading, comparably important predictors, spanning optical, SAR, and accumulated-thermal-time variables. As a secondary, exploratory objective, the same LOSO-CV framework was applied to a self-consistency analysis of harvest-AGB forecasting one to four months ahead. Because no independent harvest-date ground truth was available, the stage 1 model's own spatial predictions served as the target. Self-consistency (RF, all predictors) was modest and decayed sharply with lead time (R² from 0.46 at one month to 0.10 at four months; RMSE from 4.5 to 6.5 t/ha), underscoring the need for both independent harvest-date measurements and larger multi-site networks to properly evaluate multi-month forecasting. This multi-sensor, cloud-resilient framework provides a promising, explainable basis for in-season AGB monitoring in Mediterranean wheat systems. However, the four-site-season LOSO-CV evaluation should be read as a preliminary sensitivity analysis of spatial transferability rather than definitive proof of generalizability (see Discussion).
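As an illustration of the leave-one-site-out scheme and the metrics quoted above, here is a minimal sketch in plain Python. It is not the authors' pipeline: the toy data, the single-predictor linear model standing in for RF, the pooling of held-out predictions across folds, the R² definition (1 − SS_res/SS_tot), and all function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def loso_splits(groups):
    """Yield (held_out_site, train_idx, test_idx), one fold per unique site."""
    for site in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != site]
        test = [i for i, g in enumerate(groups) if g == site]
        yield site, train, test


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 (as 1 - SS_res / SS_tot) for paired values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "rmse": sqrt(ss_res / len(errors)),
        "mae": sum(abs(e) for e in errors) / len(errors),
        "r2": 1.0 - ss_res / ss_tot,
    }


def relative_rmse_reduction(rmse_baseline, rmse_new):
    """Percent RMSE reduction of a richer predictor set vs. a baseline."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline


# Toy, made-up data: four hypothetical site-seasons, one vegetation index.
sites = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
index = [0.2, 0.4, 0.6, 0.3, 0.5, 0.7, 0.1, 0.5, 0.8, 0.25, 0.45, 0.9]
agb = [20.0 * v + 1.0 for v in index]  # synthetic AGB in t/ha, exactly linear

folds = []
pooled_true, pooled_pred = [], []
for site, train, test in loso_splits(sites):
    # Fit on three site-seasons, predict the one that was held out.
    slope, intercept = fit_line([index[i] for i in train],
                                [agb[i] for i in train])
    pooled_true += [agb[i] for i in test]
    pooled_pred += [slope * index[i] + intercept for i in test]
    folds.append((site, train, test))

metrics = regression_metrics(pooled_true, pooled_pred)
print(metrics)
```

With only four site-seasons, each fold holds out a quarter of the sites, which is one reason such an evaluation is better read as a sensitivity analysis than as proof of generalizability. The `relative_rmse_reduction` helper shows the arithmetic behind an ablation percentage of the kind reported above, assuming the reduction is expressed relative to the baseline RMSE.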
DOI
https://doi.org/10.31223/X5KJ1K
Subjects
Agriculture, Life Sciences
Keywords
above-ground biomass, Sentinel-1, Sentinel-2, PlanetScope, soil moisture, machine learning, precision agriculture
Dates
Published: 2025-12-18 18:02
Last Updated: 2026-06-13 17:53
License
CC-BY Attribution-NonCommercial-ShareAlike 4.0 International
Additional Metadata
Conflict of interest statement:
None