Beyond prediction: methods for interpreting complex models of soil variation

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1016/j.geoderma.2022.115953. This is version 1 of this Preprint.

Authors

Alexandre M.J.-C. Wadoux, Christoph Molnar

Abstract

Understanding the spatial variation of soil properties is central to many sub-disciplines of soil science. Commonly in soil mapping studies, a soil map is constructed through prediction by a statistical or non-statistical model calibrated with measured values of the soil property and environmental covariates for which maps are available. In recent years, the field has gradually shifted attention towards more complex statistical and algorithmic tools from the field of machine learning. These models are particularly useful for their predictive capabilities and are often more accurate than classical models, but they lack interpretability and their functioning cannot be readily visualized. There is a need to understand how these models can be used for purposes other than making accurate predictions and whether it is possible to extract information on the relationships among variables found by the models. In this paper we describe and evaluate a set of methods for the interpretation of complex models of soil variation. An overview is presented of how model-independent methods can serve the purpose of interpreting and visualizing different aspects of the model. We illustrate the methods with the interpretation of two mapping models in a case study mapping topsoil organic carbon in France. We reveal the importance of each driver of soil variation, their interactions, as well as the functional form of the association between each environmental covariate and the soil property. Interpretation is also conducted locally for an area and two spatial locations with distinct land use and climate. We show that in all cases important insights can be obtained, both into the overall model functioning and into the decision made by the model for a prediction at a location. This underpins the importance of going beyond accurate prediction in soil mapping studies. Interpretation of mapping models reveals how the predictions are made and can help us formulate hypotheses on the underlying soil processes and mechanisms driving soil variation.

DOI

https://doi.org/10.31223/X5G62K

Subjects

Applied Statistics, Soil Science, Statistical Models

Keywords

Digital soil mapping, Shapley, Partial dependence, H-statistic, Accumulated local effect, Surrogate modelling

Dates

Published: 2021-10-27 02:55

Last Updated: 2021-10-27 09:55

License

CC BY Attribution 4.0 International