Optimal Modeling and County-Level Applications for the Spatial Prediction of Soil Organic Matter: A Case Study of Thirteen Counties in the Yellow River Basin

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Ziyang Zhang, Mengmeng Wu, ZiYi Hu, Haibin Wang, LiWen Liu, Yaodong Jing, Mingxing Qin, Ke Hu, Qi Liu

Abstract

Soil organic matter (SOM) is a key indicator of soil health and carbon sequestration potential, and its precise spatial prediction is vital for sustainable agricultural development. Machine learning has emerged as a core tool in digital soil mapping. However, in complex landscapes such as the Yellow River Basin, it remains uncertain which models to select and how to translate their predictions into county-level governance. To address these issues, this study focused on thirteen counties in the Yellow River Basin. Using 172 soil samples and multiple environmental variables, we built five machine learning models to predict SOM: random forest (RF), ridge regression, LASSO regression, gradient boosting, and support vector regression. We optimized the models through recursive feature elimination and cross-validation and evaluated their performance using metrics such as the coefficient of determination (R²) and root mean square error (RMSE). The random forest model achieved the best prediction accuracy and stability (test set R² = 0.56; RMSE = 3.36 g/kg). Spatial distribution maps generated from this model revealed a distinct SOM pattern across the study area, with high values in the east and west and low values in the central region. A county-level comparison showed that counties with high forest cover (e.g., Jixian) had significantly higher average SOM contents than counties dominated by intensive agriculture (e.g., Hejin). This study confirms that the random forest model is a high-precision prediction model suitable for this region. The findings not only reveal key environmental drivers but also, more importantly, provide a SOM spatial dataset that is comparable across counties and demonstrate clear intercounty differences, directly supporting the formulation of differentiated soil conservation policies. For instance, the results offer a systematic basis for delineating SOM enhancement and conservation zones and for implementing the corresponding management measures with precision.
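
The two evaluation metrics named in the abstract follow their standard definitions, R² = 1 − SS_res / SS_tot and RMSE = sqrt(mean squared error). The sketch below is only an illustration of those definitions, not the authors' code: the function names and the five sample values are hypothetical and made up for the example, and they are not data from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Hypothetical SOM values in g/kg, invented for illustration only.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]

print(f"R2 = {r_squared(observed, predicted):.3f}")          # R2 = 0.875
print(f"RMSE = {rmse(observed, predicted):.2f} g/kg")        # RMSE = 1.00 g/kg
```

Read this way, the reported test-set R² of 0.56 means the model's squared prediction error is 44% of the variance of the observed test values, and the RMSE of 3.36 g/kg is the typical prediction error expressed in SOM units.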

DOI

https://doi.org/10.31223/X5NZ0Z

Subjects

Civil and Environmental Engineering

Keywords

Soil organic matter, machine learning, random forest, spatial prediction, Yellow River Basin, county scale, precision agriculture

Dates

Published: 2026-05-22 05:09

Last Updated: 2026-05-22 05:09

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
The authors have declared that no competing interests exist.
