Hybrid Machine Learning for Integrating Pedological Knowledge into Digital Soil Mapping to Advance Next-Generation Earth System Models

This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.




Rodrigo Miranda, Rodolfo L. B. Nobrega, Estevão Silva, Jadson Silva, José Araújo Filho, Magna Moura, Alexandre Barros, Alzira Souza, Anne Verhoef, Wanhong Yang, Hui Shao, Raghavan Srinivasan, Feras Ziadat, Suzana Montenegro, Maria Araújo, Josiclêda Galvíncio


Land surface and Earth System models require reliable soil maps to represent how the spatial variability of soil properties influences ecosystem fluxes and storages. However, mapping soils with conventional in situ survey protocols is time-consuming and costly. We addressed the outdated spatial information on soil physico-chemical properties for a tropical region in NE Brazil (~98,000 km²) that spans a ~700-km longitudinal gradient of contrasting topography, climate, and vegetation, by developing a novel hybrid machine learning framework and applying it to this region. The framework reduces prediction redundancies caused by high multicollinearity by using a recursive feature selector algorithm for input selection. Its core consists of the Soil-Landscape Estimation and Evaluation Program (SLEEP) and a calibrated Gradient Boosting Model (GBM) capable of modeling the spatial distribution of soil properties at multiple and dynamic soil depths. Combining SLEEP and GBM allowed us to explain the spatial distribution of various soil properties and their environmental modulators. Model training and testing used six topographic, ten meteorological, and two vegetation properties, together with data from 223 soil profiles across the study area. Our models performed consistently, with spatial extrapolations yielding r² values of 0.79–0.98 and percent bias of −1.39% to 1.14%. Topographic and climatic properties were dominant when estimating the number of soil layers, soil texture, and the sum of bases. The framework is highly flexible and transferable to other tropical regions, and it reduces capital investment while increasing accuracy compared with traditional mapping protocols.
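The abstract reports model accuracy as r² and percent bias. The following is a minimal sketch of how these two metrics are commonly computed, assuming r² is the squared Pearson correlation between observed and predicted values and percent bias uses the predicted-minus-observed convention. The preprint abstract does not state its exact formulas, and the function names `r_squared` and `percent_bias` are hypothetical.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: r² here means squared correlation, not the coefficient of
    determination (1 - SSE/SST), which can differ for biased predictions.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("r² is undefined for constant inputs")
    return cov ** 2 / (var_obs * var_pred)


def percent_bias(observed, predicted):
    """Percent bias: 100 * sum(predicted - observed) / sum(observed).

    Assumed convention: positive values mean the model overestimates on average.
    """
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("percent bias is undefined when observed values sum to 0")
    return 100.0 * sum(p - o for o, p in zip(observed, predicted)) / total_obs


if __name__ == "__main__":
    # Hypothetical clay contents (%) at four validation profiles.
    obs = [20.0, 35.0, 15.0, 40.0]
    pred = [22.0, 34.0, 16.0, 39.0]
    print(f"r² = {r_squared(obs, pred):.3f}, PBIAS = {percent_bias(obs, pred):.2f}%")
```

Some references define percent bias as observed minus predicted, which flips the sign, so a reported range such as −1.39% to 1.14% should be read against the convention the authors used.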




Environmental Monitoring, Soil Science, Statistical Models


Gradient Boosting Model, Decision trees, SLEEP, Soil properties, tropics, Pernambuco.


Published: 2022-07-22 01:15

Last Updated: 2023-04-11 10:42


CC BY Attribution 4.0 International

