This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Optimal Modeling and County-Level Applications for the Spatial Prediction of Soil Organic Matter: A Case Study of Thirteen Counties in the Yellow River Basin
Authors
Abstract
Soil organic matter (SOM) is a key indicator of soil health and carbon sequestration potential, and its accurate spatial prediction is vital for sustainable agricultural development. Machine learning has become a core tool in digital soil mapping. However, in complex landscapes such as the Yellow River Basin, both the choice of model and the translation of model predictions into county-level governance remain uncertain. To address these issues, this study focused on thirteen counties in the Yellow River Basin. Using 172 soil samples and multiple environmental variables, we built five machine learning models to predict SOM: random forest (RF), ridge regression, LASSO regression, gradient boosting, and support vector regression. We optimized the models through recursive feature elimination and cross-validation and evaluated their performance using metrics such as the coefficient of determination (R2) and the root mean square error (RMSE). The random forest model achieved the best prediction accuracy and stability (test set R2 = 0.56; RMSE = 3.36 g/kg). Spatial distribution maps generated from this model revealed a distinct SOM pattern across the study area, with high values in the east and west and low values in the central region. County-level comparison showed that counties with high forest cover (e.g., Jixian) had significantly higher average SOM contents than counties dominated by intensive agriculture (e.g., Hejin). This study confirms that the random forest model is a high-precision prediction model suitable for this region. Beyond revealing key environmental drivers, the findings provide a SOM spatial dataset that is comparable across counties and demonstrate clear intercounty differences, directly supporting the formulation of differentiated soil conservation policies.
For instance, the results of this study offer a systematic basis for delineating SOM enhancement and conservation zones and for implementing corresponding management measures with precision.
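The workflow summarized in the abstract (five candidate models, recursive feature elimination, cross-validation, and evaluation by R2 and RMSE) can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' code. The data are synthetic stand-ins generated with `make_regression`, and the number of covariates, the train/test split ratio, the number of retained features, the fold count, and all hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for 172 samples with 12 covariates (NOT the study's data).
X, y = make_regression(n_samples=172, n_features=12, n_informative=5,
                       noise=10.0, random_state=0)

# Hold out a test set; the 70/30 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Recursive feature elimination, fitted on the training data only.
# Keeping 6 features is an arbitrary illustrative choice.
selector = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
               n_features_to_select=6, step=1)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# The five model families named in the abstract; hyperparameters are assumed.
# Linear models and SVR are scaled because they are sensitive to feature scale.
models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}

# 5-fold cross-validation on the training set, then a single test-set check.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    cv_r2 = cross_val_score(model, X_train_sel, y_train,
                            cv=cv, scoring="r2").mean()
    model.fit(X_train_sel, y_train)
    pred = model.predict(X_test_sel)
    results[name] = {
        "cv_r2": float(cv_r2),
        "test_r2": float(r2_score(y_test, pred)),
        "test_rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }

for name, scores in results.items():
    print(name, scores)
```

Because the features are selected once before cross-validation, the cross-validated scores in this sketch are slightly optimistic; a stricter variant would wrap the `RFE` step and the model in a single pipeline so that selection is repeated inside each fold. The scores printed for synthetic data say nothing about the model ranking reported in the study.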
DOI
https://doi.org/10.31223/X5NZ0Z
Subjects
Civil and Environmental Engineering
Keywords
Soil organic matter, Machine learning, Random forest, Spatial prediction, Yellow River Basin, County scale, Precision agriculture
Dates
Published: 2026-05-22 05:09
Last Updated: 2026-05-22 05:09
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
The authors have declared that no competing interests exist