Equifinality, sloppiness, and emergent structures of mechanistic soil biogeochemical models

Biogeochemical models increasingly consider the microbial control of carbon cycling in soil. The major current challenge is to validate mechanistic descriptions of microbial processes and predicted system responses against experimental observations. We analyzed soil biochemical models of different complexity regarding parameter identifiability using information geometry, i.e., a model is geometrically interpreted as a manifold embedded in data space. The most complex model (PECCAD) was used as a test case to reveal parsimonious process formulations. All models showed sloppiness, i.e., most individual parameter values cannot be inferred from the observed data. We derived a less complex model formulation of PECCAD with effective inferable parameter combinations and identified structural model limitations. The complexity of identified effective models was systematically reduced with decreasing information content of data. Our results suggest that information geometry provides a powerful approach to derive effective descriptions of relevant biogeochemical processes and reduce structural model uncertainty.

∗Corresponding author
Email addresses: gianna.marschmann@uni-hohenheim.de (Gianna L. Marschmann), holger_pagel@uni-hohenheim.de (Holger Pagel), philipp.kuegler@uni-hohenheim.de (Philipp Kügler), tstreck@uni-hohenheim.de (Thilo Streck)
1 Schloss-Wolfsbrunnenweg 8, 69117 Heidelberg, Germany
Preprint submitted to Elsevier, May 18, 2019

Starting from a complex model, this information can be leveraged by [...] to reveal physically or biologically relevant mechanistic information about [...]

Turnover depends on C input into the soil ($I(t)$), the first-order cycling rate [...]

Here, $V_{max}$ is the maximum growth rate, $K_S$ is the substrate affinity coefficient and $Y$ denotes the microbial C use efficiency.

[...] Table 1-2)) to an ODE system (PECCAD ODE, Fig. 1).

PECCAD ODE couples the dynamics of two pesticide pools (dissolved $C_P$ and sorbed phase $C_{P-s}$) to that of several C pools (readily available [...]

[...] high resolutions: (i) using information on all available data including the dynamics of functional genes (dark gray), or (ii) bulk biomass (gray) along with measurements of dissolved organic C (DOC), insoluble soil organic matter ($C_I$), total pesticide ($C_P + C_{P-s}$) and heterotrophic respiration (CO₂); (iii) only with input-output information on total pesticide and CO₂ (light gray). Fluxes directly related to pesticide degradation in orange. Individual C pools in white boxes correspond to unobserved system components.
Local sensitivity of model output with respect to changes in parameters around the best fit value $p^*$ was measured by the Hessian matrix [...]

In parameter space, the Hessian approximates regions of constant cost as [...] each width being smaller than the previous one by a roughly constant factor.

The width of the spectra increases with increasing apparent model complexity, which is taken here as the number of model parameters.
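The relation between parameter count and spectral width can be illustrated with a minimal sketch. It does not use any of the soil models analyzed here. Instead it assumes a toy sum-of-exponentials model, log-transformed parameters, and the Gauss-Newton approximation $H \approx J^T J$, which equals the Hessian of a least-squares cost only at a best fit with zero residuals. The names `jacobian`, `hessian_spectrum`, `all_rates` and `widths` are hypothetical.

```python
import numpy as np

def jacobian(log_p, t):
    """Analytic Jacobian of y(t) = sum_j exp(-p_j * t) w.r.t. log p_j."""
    p = np.exp(log_p)
    # d y / d log p_j = -(t * p_j) * exp(-p_j * t)
    return -np.outer(t, p) * np.exp(-np.outer(t, p))

def hessian_spectrum(log_p, t):
    """Eigenvalues of the Gauss-Newton Hessian J^T J, largest first.

    They are computed as squared singular values of J, which keeps
    them non-negative despite round-off.
    """
    singular_values = np.linalg.svd(jacobian(log_p, t), compute_uv=False)
    return singular_values ** 2

t = np.linspace(0.1, 5.0, 50)                          # assumed observation times
all_rates = np.array([0.5, 3.0, 1.5, 1.0, 2.0, 2.5])   # hypothetical decay rates

widths = {}
for n_par in (2, 4, 6):
    # Nested parameter sets: each larger model contains the smaller one.
    eig = hessian_spectrum(np.log(all_rates[:n_par]), t)
    widths[n_par] = np.log10(eig[0] / eig[-1])
    print(f"N = {n_par}: spectrum spans {widths[n_par]:.1f} decades")
```

Because the parameter sets are nested, the singular values of the smaller Jacobian interlace those of the larger one, so the width in this toy setting cannot shrink when parameters are added.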

Sloppiness and parameter identifiability analysis
For the parameter identifiability analysis of the minimal microbial soil C [...] data sets (Fig. 4a). For sets consisting of 2 or 3 parameters, identifiability depends on the specific parameter combination as well as the specific data set. Application of the MBAM results in reduced models whose parameters are uniquely identifiable from the data (filled symbols in Fig. 4a).
where $\vartheta_1 = V_{max}/K_S$ is the emergent linear decomposition factor. The [...] Table A.4 and Fig. 5a).

Limits are less obvious when multiple parameters approach extreme values at the same rate as defined by Eq. 7. In these cases, emergent finite parameter combinations correspond to expressions such as $\infty/\infty$, $0/0$, $0 \cdot \infty$ or $\infty - \infty$.
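A limit of the $\infty/\infty$ type can be checked numerically with a short sketch. It assumes the standard Monod form $V_{max} C / (K_S + C)$ for substrate-dependent growth and lets $V_{max}$ and $K_S$ grow together while their ratio $\vartheta_1 = V_{max}/K_S$ stays fixed. The substrate range and the value of `theta_1` are hypothetical and not taken from the calibrated models.

```python
import numpy as np

def monod(c, v_max, k_s):
    """Monod kinetics (assumed form): saturating in substrate concentration c."""
    return v_max * c / (k_s + c)

def linear(c, theta_1):
    """First-order kinetics with the emergent combination theta_1 = V_max / K_S."""
    return theta_1 * c

c = np.linspace(0.0, 10.0, 101)   # hypothetical substrate concentrations
theta_1 = 0.2                     # hypothetical fixed ratio V_max / K_S

errors = []
for k_s in (1e1, 1e3, 1e5):
    v_max = theta_1 * k_s         # both parameters diverge at the same rate
    err = np.max(np.abs(monod(c, v_max, k_s) - linear(c, theta_1)))
    errors.append(err)
    print(f"K_S = {k_s:.0e}: max deviation from linear kinetics = {err:.2e}")
```

The deviation equals $\vartheta_1 C^2 / (K_S + C)$, so it vanishes as $K_S$ grows: neither parameter remains individually meaningful, but their ratio does.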

As an illustration of different types of limiting processes, consider, e.g., the following ODE of specific pesticide degrader C: [...]

The death rate ($a_{BP}$) is mediated by substrate availability in order to simu[...]

We identified the discarding limits $k_{BP,P} \to 0$, $k_{BP,hiq} \to 0$ and $K_{a-BP,P} \to 0$.

That is, the time evolution of the specific degrader pool does not explicitly depend on the pesticide concentration $C_P$ in the system: [...]

Additionally, we identified the rescaling limit, $a_{max-BP}$, $K$ [...]

Singular limits leading to steady-state approximations usually require evaluating more than a single biokinetic term on the right-hand side of the equation [...]

Here, $\Phi_{BP}$ is a limiting factor of activity increase and $m_{max-BP}$ is the max[...]

[...] whereas the alternative limit $\vartheta_5 \to 0$ would have corresponded to a saturating approximation of Monod kinetics (cf. Fig. 2).

The results of the MBAM agree well with global sensitivity measures derived from the Morris method and Bayesian model calibration (Fig. 7). In con[...] 2014)).

C stock | Differential equation

Specific pesticide degraders; Physiological state index of specific pesticide [...]; Sorbed phase; Insoluble soil organic matter C; Sorbed phase; Litter derived CO [...]

Fraction of hiq litter on total decomposed litter; Maximum rate of total litter decomposition; Substrate-dependent specific growth rate of bacteria; Substrate-dependent specific rate of maintenance respiration of bacteria; Substrate-dependent specific growth rate of bacterial pesticide degraders; Substrate-dependent specific rate of maintenance respiration of bacterial pesticide degraders; Limiting factor of activity increase of bacteria; Limiting factor of activity increase of fungi; Limiting factor of activity increase of bacterial pesticide degraders

Bacteria (1); Rescaled physiological state index of bacteria; Rescaled physiological state index of specific pesticide degraders; hiq dissolved organic C; Sorbed phase; Sorbed phase; Insoluble soil organic matter C; Sorbed phase

Table A.5: Governing differential equations and parameters of the reduced PECCAD ODE model (M=7, N=21) corresponding to bulk measurements in the MCPA + Litter data set.

C stock | Differential equation | Renormalized parameters [unit]

Bacteria; Fungi; Rescaled physiological state index of bacteria; Rescaled physiological state index of specific pesticide degraders; Sorbed phase; Sorbed phase; Insoluble soil organic matter C; Sorbed phase

Table A.6: Carbon stocks, governing differential equations and renormalized parameters of the reduced PECCAD ODE model (M=6, N=18) corresponding to input-output observations of the MCPA + Litter experiment.

C stock | Differential equation | Renormalized parameters

Bacteria; Fungi; Rescaled physiological state index of specific pesticide degraders; Sorbed phase; Sorbed phase; Insoluble soil organic matter C; Sorbed phase; Insoluble soil organic matter

Description | Expression | Unit

Specific rate of initial-stage decomposer growth; Limiting factor for activity increase of initial-stage de[...]; Limiting factor for activity increase of late-stage decomposer, $\Phi(C_{s,ls}) =$ [...]

Mineral-associated organic carbon; Mineral-associated organic carbon decomposition; Adsorption of DOC; Desorption of DOC; DOC uptake by microbes; Dormancy flux; BA growth respiration; BA maintenance respiration; BD maintenance respiration; BA mortality; Synthesis of enzymes for P1, $F_{13,EP1} = P_1$ [...]; Synthesis of enzymes for P2, $F_{13,EP2} = P_2$ [...]; Synthesis of enzymes for M; Turnover of enzymes

[...] Figure S2, Gelisol)). [...] 2014)).

C stock | Differential equation

Stable soil organic C substrates

Description | Expression | Unit

Microbial uptake; Mortality of active microbial biomass; Mortality of dormant microbial biomass; Enzyme production rate; Transfer from dormant to active pop[...]; Transfer from active to dormant pop[...]; Maintenance respiration; Leaching of dissolved organic C; Leaching of enzymes; Transfer coefficient for dissolved or[...]; Diffusivity of dissolved organic C in bulk soil; Transfer coefficient for enzymes; Diffusivity of enzymes in bulk soil
Switching function for active-dormant state transition; Switching function for dormant-active state transition; Soil matric potential

[...] with finite difference approximation ($h = 0.01$). The geodesic ODE has to be integrated until a manifold boundary is iden[...]

For the Metropolis-Hastings algorithm (Chib and Greenberg, 1995), which samples the posterior distribution of sloppy models, Gutenkunst (2007) suggests sampling the candidate parameter vector from a multivariate Gaussian distribution whose inverse covariance matrix is the Hessian matrix. The acceptance probability that satisfies detailed balance reads: [...]

[...] The larger $\mu^*_i$, the larger the effect of the $i$-th parameter on the model performance metric. $\sigma_i$ is a measure of nonlinearity or interaction effects for the $i$-th parameter. If $\sigma_i$ is small, the elementary effects (EEs) for the $i$-th parameter do not vary significantly over support points in parameter space. If the effect of a small perturbation of a parameter is the same everywhere, a linear relationship between parameter and model performance metric is likely. A parameter with large $\sigma_i$ will have nonlinear or interaction effects. Different combinations of Morris mean and standard deviation hence correspond to parameters that have a negligible effect on the model performance metric (both $\mu^*_i$ and $\sigma_i$ small), those that have a linear effect ($\mu^*_i > \sigma_i$, with $\sigma_i$ small) and those with significant interaction effects ($\mu^*_i < \sigma_i$, with both $\mu^*_i$ and $\sigma_i$ large).

Convergence plot for the mean of EEs and confidence intervals derived from 3000 bootstrap resamplings evaluated for different sample sizes.
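The Morris measures $\mu^*_i$ and $\sigma_i$ can be sketched as follows. This is a simplified radial one-at-a-time design on the unit hypercube, not the trajectory design of the original Morris method and not the implementation used for the results reported here. The performance metric `toy_metric`, the step `delta`, and the number of base points are hypothetical choices for illustration only.

```python
import numpy as np

def elementary_effects(f, n_par, n_base=200, delta=0.1, seed=0):
    """Elementary effects of f at n_base random base points in [0, 1]^n_par."""
    rng = np.random.default_rng(seed)
    ee = np.empty((n_base, n_par))
    for r in range(n_base):
        # Sample so that x + delta stays inside the unit hypercube.
        x = rng.uniform(0.0, 1.0 - delta, size=n_par)
        f0 = f(x)
        for i in range(n_par):
            x_step = x.copy()
            x_step[i] += delta            # perturb one parameter at a time
            ee[r, i] = (f(x_step) - f0) / delta
    return ee

def morris_measures(ee):
    """Mean of absolute EEs (mu*) and standard deviation of EEs (sigma)."""
    mu_star = np.abs(ee).mean(axis=0)
    sigma = ee.std(axis=0, ddof=1)
    return mu_star, sigma

def toy_metric(x):
    """Hypothetical metric: x[0] acts linearly, x[1] and x[2] interact, x[3] is inert."""
    return 2.0 * x[0] + 5.0 * x[1] * x[2]

ee = elementary_effects(toy_metric, n_par=4)
mu_star, sigma = morris_measures(ee)
for i in range(4):
    print(f"parameter {i}: mu* = {mu_star[i]:.3f}, sigma = {sigma[i]:.3f}")
```

In this toy case the linear parameter has a constant elementary effect, so its $\sigma_i$ is zero up to round-off. The two interacting parameters have elementary effects that depend on the base point, which shows up as a clearly non-zero $\sigma_i$, and the inert parameter has both measures equal to zero.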