Global Sensitivity Analysis to Optimize Basin-Scale Conductive Model Calibration – A Case Study From the Upper Rhine Graben

Calibrating geothermal simulations is a critical step, both in scientific and industrial contexts, with suitable model parameterizations being optimized to reduce discrepancies between simulated and measured temperatures. Here we present a methodology to identify model errors in the calibration and compensate for measurement sparsity. With an application to the Upper Rhine Graben, we demonstrate the essential need for global sensitivity studies to robustly calibrate geothermal models, showing that local studies overestimate the influence of some parameters. We ensure the feasibility of the study through a physicsbased machine learning approach (reduced basis method), reducing computation time by several orders of magnitude.


Introduction
Physics-based forward modeling simulations are used in a wide range of geothermal studies -from the investigation of global and regional temperature fields to local reservoir studies (Cacace et al., 2013;Kolditz & Clauser, 1998;Konrad et al., 2019;Randolph & Saar, 2011). As in many geoscientific appli-5 cations, these simulations are based on models with associated uncertainties in the selection of the physical model (conductive, hydrothermal, mechanical, etc.), the geological model itself, as well as all relevant parameters (boundary conditions and material properties, including hydraulic and thermal conductivies, radiogenic heat production, etc.). To compensate for these model errors, 10 arising from, for instance, measurement errors, generalizations, and geometrical uncertainties (Houghton et al., 2001;Murphy et al., 2004;Refsgaard et al., 2007;Wellmann & Caumon, 2018), we need an appropriate calibration process. Reliable and robust model calibrations represent a particular challenge for deep geothermal systems because most temperature measurements come from 15 shallow depth. Typically, only a limited number of deeper measurements are available for the model validation. This imbalance has an important influence on the ability to constrain uncertain model parameters: if a model parameter only has a very limited influence on the output value at the position of the available data points, then, logically, it will be di cult to estimate this parameter from the 20 available data. The analysis of parameters that can be identified and calibrated is the main goal of the field of sensitivity analysis (Saltelli et al., 2004). Here, we investigate the requirement for sensitivity analyses and potential of automated model calibrations for the specific case of regional conductive heat flow models on the basin scale, with an application to the Upper Rhine Graben in Central 25 Europe.
Sensitivity analyses can be categorized into local and global analyses (Saltelli et al., 2004;Wainwright et al., 2014). The local sensitivity analysis (SA) investigates the influence of the model parameters on the model response in the vicinity of the initial parameters. Furthermore, the local SA assumes that the 30 parameters are independent. As such, local SAs have known deficiencies as they are only locally investigating the first-order parameter influence around a given parameter set. In contrast, the global SA investigates the entire parameter domain and considers parameter correlations. In the work presented here, we investigate both local and global approaches and evaluate specifically the 35 requirement and added value of global SA for regional geothermal simulation studies on the scale of an entire basin (Sobol, 2001;Wainwright et al., 2014).
Global sensitivity analyses in geosciences have been performed for hydrological studies (van Griensven et al., 2006;Tang et al., 2007;Cloke et al., 2008;Zhan et al., 2013;Baroni & Tarantola, 2014;Song et al., 2015), for volcanic source 40 modeling (Cannavó, 2012), and for geothermal heat exchangers (Fernández et al., 2017), to name just a few fields. Furthermore, a comparison of local and global sensitivity analysis has been performed for a hydrological model in Wainwright et al. (2014). However, there is no comparison of local and global sensitivity analyses for basin-scale geothermal heat flow models available yet, 45 as presented here, with a focus also on the uneven distribution of measurement points.
In geothermal applications, the material parameters including thermal conductivities and radiogenic heat productions are usually associated with high uncertainties Lehmann et al., 1998;Vogt et al., 2010;50 Wagner & Clauser, 2005). These high uncertainties quickly lead to wrong initial guesses of the various parameters, bearing the substantial risk of exploring the wrong part of the parameter space with a local sensitivity analysis. Based on this realization, global SAs have been developed (see Saltelli et al., 2004, for an overview), but these typically require many thousand forward simulations and 55 are therefore often infeasible for more complex or larger simulations, where a single simulation run can require tens of minutes or even hours of simulation time.
To overcome the problem of the long simulation time, and to make global SAs possible for geothermal conductive heat flow models, we apply here the reduced 60 basis (RB) method to obtain a highly e cient surrogate model for the entire forward simulation. The RB method is essentially a physics-based machine learning approach (Hesthaven et al., 2016;Prud'homme et al., 2002;Quarteroni et al., 2015). This method has been adapted successfully for geophysical simulations (Degen et al., 2020) and showed promising results with a reduction of simulation time by several orders of magnitude, after model training, while still providing highly accurate estimates of state variables at measurement locations.
We apply this method here to make a global SA feasible and to e ciently test several model scenarios.
While focusing on the model calibration methodology, our study also derives 70 some new information about the case study of the Upper Rhine Graben and its surrounding region, Central Europe. This region has gained significant interest for deep geothermal exploration due to its high geothermal gradient, for example around Soultz-sous-Forêts, Landau, and Bruchsal (Agemar et al., 2013(Agemar et al., , 2014Illies, 1972;Pauwels et al., 1993;Geothermie, 2007;Vidal et al., 2015). However, 75 obtaining reliable spatial temperature distributions is a major challenge as seen in the ongoing discussion in the literature (Agemar et al., 2013(Agemar et al., , 2014Freymark et al., 2017Freymark et al., , 2019GeORG-Projektteam, 2013;Grimmer et al., 2017;Stober & Bucher, 2015), partly because of the unclear influence of advection on the temperature distribution within the Upper Rhine Graben. So far, mostly "trial- 80 and-error" model calibrations have been performed in basin-scale models of this region (e.g. Freymark et al., 2017). With this work, we aim to contribute with a detailed global SA, followed by a full calibration of a conductive geothermal model, and identify the suitability of such a model in di↵erent regions of the basin. 85 The paper is structured as follows: We provide an overview of the numerical methods, including sensitivity analyses, model calibrations, and the reduced basis method in Section 2. This is followed by a comparison of various model calibrations, considering di↵erent sensitivity analysis techniques and di↵erent data weighting schemes for the Upper Rhine Graben in Section 3. We discuss 90 the results in Section 4 and present concluding remarks in Section 5.

Materials and Methods
This section presents the conceptual di↵erences between local and global sensitivity analyses, in order to demonstrate the need for global sensitivity analyses for robust and reliable model calibrations. As stated in the Introduction, our 95 case studies using sensitivity analysis are limited here to thermal basin-scale applications. To focus solely on the methodology, we first perform this comparison on a simplified geological model, even though the results are not restricted to simple models but apply in general, as will be seen in Section 3 with the case study of the Upper Rhine Graben.

Forward Simulation
The first step in a simulation is to select a physical model, which we take as the one for the real-case study of Section 3 to ensure the basic comparison of this section can be directly extended later on. With the focus of this contribution on the methodology, we keep the physics simple and do not consider convection processes, despite their potential importance in the Upper Rhine Graben (Freymark et al., 2019). For the forward simulations of the temperatures, we follow therefore Bayer et al. (1997) and consider a steady-state geothermal heat conduction problem with a radiogenic heat production: where is the thermal conductivity, T the temperature and S the radiogenic heat production.
We are taking only nondimensional parameters and variables into account, which leads to Eq. 2:

115
We now compare the two end-member concepts of local and global sensitivity analysis on the conductive heat transport model presented above, noting that the results extend to other physical problems.
We consider a function f such that u = f (x), with x = (x 1 , ..., x N ) the model parameters and u ⇤ = f (x ⇤ ) the required solution at the optimal parameter 120 set x ⇤ . A sensitivity analysis aims to determine the influence of the model parameters x on the model f (x) (Sobol, 2001;Wainwright et al., 2014), and can be performed locally or globally (Sobol, 2001), as discussed in the next sections.
In the following, we briefly describe the main di↵erence between both analyses, and illustrate the concepts by using a simplified geological basin-scale model 125 ( Fig. 2.1), while keeping the same governing equations as for the subsequent case study.
The simple model is derived from the Upper Rhine Graben model , extending 292 km in the x-direction, 525 km in the y-direction, We investigate the sensitivities of all thermal conductivities and the radio-145 genic heat production of the Saxothuringian.

Local Sensitivity Analysis
We go back to the generic definition of the function u = f (x), with x = (x 1 , ..., x N ). Note that u is a generic function that can contain any value. In our geothermal study a u typically contains numerous temperature measurements 150 of our model domain. Performing a local sensitivity analysis at x 0 consists in . Hence, the local SA investigates the influence of the various model parameters with respect to a pre-defined reference parameter set x 0 . For this reason, local sensitivity analyses focus on the sensitivity in the vicinity of the input parameters (Sobol, 2001;Wainwright et al., 2014) and 155 they do not consider parameter correlations. They assume that the influence observed for each realization is solely arising from the single changed model parameter, which itself is allowed a defined maximum variation (e.g., 1 %).
In the case of a geothermal application, we would, for instance, determine the influence of the di↵erent thermal conductivities with respect to a reference 160 set, which would usually be our initial guess. That already shows the major issue with local sensitivity analyses: defining that reference x 0 . Once selected, we evaluate the function f at that parameter set x 0 , i.e. we calculate the reference temperatures at our observation points, using our physical model with those parameters. Ideally, we would like these reference temperatures to match 165 the observed ones, but this is unlikely, if not impossible, due to model and measurement errors. Hence, despite naturally using our "best" knowledge as the reference, the inherent uncertainties related to geothermal applications lead to discrepancy between this "best" knowledge and the "true" parameter values.
Therefore, we introduce an error in the sensitivity analysis that can lead to 170 wrong and possibly misleading estimates of parameter sensitivities.
We apply now this process to our simple basin-scale model with a 1% varia- which is due to the fact that these temperatures lie outside the imposed range 180 of 1 % variation from the true values. We also note that the response of the simplified model is sensitive to all investigated parameters assuming an arbitrary threshold 1 of 10 -1 (shown as a black line in Fig. 2.2). This means that we would have to consider all model parameters of the simplified model for further 1 Note that it is not possible to define a general value for the threshold because it depends on what we consider as "sensitive". This might di↵er from application to application. In this paper, we look for a significant drop in the sensitivity index values to define the threshold. analyses.

Global Sensitivity Analysis
For the global analysis, the idea is to remove the constraint of the reference x 0 . In this paper, we use the Sobol sensitivity analysis with the Saltelli sampler as our global method. The Sobol sensitivity analysis is a variance-based method (Sobol, 2001;Saltelli, 2002;Saltelli et al., 2010;Wainwright et al., 2014). It does not set x = x 0 , but evaluates instead the entire model f (x). Assuming a square-integrable function with orthogonal members, we can decompose it into the following form (Sobol, 2001): Z This means that the Sobol sensitivity index is defined as the ratio between the partial and total variance.
We use the global sensitivity analyses to investigate the influence of the model parameters on the total absolute misfit between the simulated and the 190 observed temperature data. In contrast to the local SA, the global SA does not require a reference and investigates the influences of all parameters within the pre-defined parameter ranges (Sobol, 2001). Hence, we are no longer restricted to small parameter variations. Furthermore, we no longer need to know the parameter distribution in advance and only specify an allowable physical range.

195
The first-order index represents the influence of the model parameter on the model itself. The second-order index describes the influence of the correlation between two parameters. Further, higher-order indices are available to investigate the influence of the correlation between more than two parameters (Sobol, 2001). For further details regarding the definition of the sensitivity indices, 200 we refer to Sobol (2001), and for further information regarding the sampling procedure, we refer to Saltelli (2002), and Saltelli et al. (2010).
In the example of the geothermal simulation considered here, we no longer need to specify our (actually wrong) "best" knowledge of the thermal properties.
We performed the global sensitivity analysis for the simple basin-scale model 205 ( Fig. 2.2), using 10,000 realizations per parameter to reduce the statistical error and eliminate negative sensitivities, yielding a computationally extremely demanding analysis of in total 100,000 forward solves. To compensate for this high computational cost, we construct a fast, physics-preserving, and highly accurate surrogate model. The sensitivity analysis is executed using the Python library 210 SALib (Herman & Usher, 2017). The results, of the first-order indices (shown in blue) and the total-order indicies (shown in orange) in Fig. 2.2, di↵er significantly from those of the local sensitivity analysis (in green). We still observe the dominant influence of the thermal conductivity of the top layer; however, we see that the model response is insensitive to the thermal conductivity of the 215 middle layer. This means that the local sensitivity analysis overestimates the influence of this particular model parameter. Furthermore, the local analysis overestimates the influence of both the thermal conductivity of the base layer and the radiogenic heat production of the middle layer. In contrast to the local SA, the results of the global SA allow us to reduce the dimension of the 220 parameter space from four to three.
In addition to the total sensitivities, we obtain information about the parameter correlations. For the presented model, we observe nearly exclusive first-order contributions. Hence, the parameter correlations in the presented model are small. This could be seen as an argument to claim that the local 225 SA should be ideally suited for this model; yet it clearly fails. Conversely, the global SA has a di↵erent issue, requiring many forward simulations. In Section 3, we describe how we overcome this disadvantage.

Model Calibration
Based on the results of the sensitivity study, we perform model calibrations 230 on the most relevant parameters only, i.e. those whose sensitivity is above the selected threshold of 10 -1 . Hence, the sensitivity analysis is a natural preparation step for a model calibration (Ray et al., 2015). As we demonstrate in this paper, the e cient reduction of the parameter space leads to more robust model calibration results. 235 We use the trust region reflective (TRF) method from SciPy (Branch et al., 1999;Jones et al., 2014) as our calibration method since it is robust and considers bounds for the model parameters (Jones et al., 2014). Especially the latter aspect is actually of utmost importance since our surrogate models are only valid if we remain within the pre-defined training ranges. This is the reason 240 why we cannot use, for instance, the conjugate gradient method provided by SciPy, whose implementation does not allow the consideration of bounds for the model parameters.
Take the following problem into account: min x2R n {m(x) : l  x  u}, where we want to minimize the function m(x) with x being our model parameter. Furthermore, the model parameters are bounded below by the lower bound l, and above by the upper bound u. The method defines a trust region N around the current "best" solution to improve the convergence rate. In standard trust region methods, the quadratic model q(s) is assumed to approximate the original objective function inside the trust region. We derive the quadratic model The current point is updated to x + s if m(x + s) < m(x) and the trust region is updated. If we do not fulfill this condition, the current point stays the same. Then, we decrease the trust region and repeat the computation of 255 the trial step. The trust region optimization problem is then expressed as: min 1 2 s T Hs + s T g such that kDsk  . Here, g is the gradient, H the Hessian matrix, D a diagonal scaling matrix, and a positive scalar. We repeat these steps until we reach convergence. Minimizing a quadratic function -instead of the original, possibly non-linear, objective function -results in a re-260 duction in computation time. More details about the method can be found in Branch et al. (1999).

Case Study of the Upper Rhine Graben
After illustrating the shortcomings of a local sensitivity analysis for our simple basin-scale model, we extend the study to the real-case study of the 265 Upper Rhine Graben, presented in Freymark et al. (2017). As mentioned in the introduction, our purpose is not to discuss the assumptions and validity of this particular geological model but to demonstrate the impact of global SAs on a given study. Furthermore, we demonstrate the consequences of the outcome of the sensitivity analysis for further analyses such as model calibrations.

270
First, we introduce both the high-dimensional finite element (FE) and the lowdimensional RB model and then we present the results for the sensitivity-driven model calibrations. Fig. 3.3 summarises all model scenarios used for the model calibrations and sensitivity analyses of this Upper Rhine Graben case study.

275
Our study is based on the model of the Upper Rhine Graben presented in Freymark et al. (2017), which is solved using the FE method. This approach explicitly simulates the whole problem and is therefore referred to as the high dimensional model. We recall here some of its numerical aspects for convenience.

290
The temperature sampling points used for calibration are very unequally distributed throughout the spatial domain. To compensate for this inequality, we introduce weights. For the calculation of the weights, we consider two methods: first, we calculate them automatically via a distance matrix provided by SciPy, and second, we specify them via user-input. Both weighting schemes are 295 displayed in a map view in Fig. 3.2. We obtain the user-defined weighting by splitting the data into three regions ( Fig. 3.2). The decision of subdividing the data into three di↵erent regions is made upon their depth distribution and the underlying physical behavior. In detail, region 3 contains the data points below -2,5 km. Region 2, contains the data points deviating from the trend that a 300 conductive model introduces. All remaining points are associated to region 1.
We note that region 3, which corresponds to the northern part of the Upper Rhine Graben, contains only a few points. To compensate for its data sparsity, we apply a weight of 10 to the misfit in this region. We also note that many observation data are available in the Hesse area (northeast of Frankfurt a. M.

305
in Fig. 3.2) while significantly fewer data are available for the Upper Rhine Graben (gray outline, labeled with URG). To compensate for this unequal data distribution, we apply a weight of 0.1 to region 1 and 1.0 to region 2. To explain the assigned weights for region 1 to 3, we have 2282 data points in region 1, 53 in region 2, and 12 data points in region 3 ( Fig. 3.1). Hence, region 1 contains 310 one order of magnitude more data than region 2 (resulting in the weight of 0.1 for region 1) and two orders of magnitude more data than region 3 (yielding the weight of 10 in region 3).

Upper Rhine Graben -Low-Dimensional Model
Based on the full FE model, we constructed two reduced models:

315
The first reduced model (branch 1 of Fig. 3.3) considers only thermal conductivities as model parameters. It consists of 12 di↵erent parameters since we combined those layers with equal thermal conductivities. In both models, we take the allowed parameters ranges from Freymark et al. (2017). If no range is provided, we allow a variation of ± 50 % from the initial value. For the nondimensional representation of the problem, we set the reference thermal conductivity ref to the maximum thermal conductiv-325 ity of 6.0 W m -1 K -1 . We choose the maximum temperature of 1300 C as the reference temperature T ref and the maximum radiogenic heat production of 3.0 µW m 3 as the reference radiogenic heat production S ref . The reference length corresponds to the maximum y-extent of the mesh (525,000 m).
We use the RB solution as a surrogate model for the FE method. To apply the method, we must decompose the PDE into its parameter-dependent and - where u 2 H 1 0 (⌦) is the solution, µ 2 D the parameter (where P = the number of parameters), v 2 X the test function, X the function space (H 1 0 (⌦) ⇢ X ⇢ H 1 (⌦)), and ⌦ the spatial domain in R 3 . The decomposition of the bilinear form a for both reduced models is given by: Here, w 2 X is the trial function, the index "q" denotes the number of the where is the domain boundary in R 3 , n is equal to 11, g(x, y, z) the lifting function, T top the temperature at the top of the model, h(x, y, z) the location in the model, z base (x, y) the depth of the base surface, and d(x, y) the distance between the base and top surface.
The decomposition of the linear form of the second reduced model (branch 2 of Fig. 3.3) is similar: optimization problem (such as model calibration), we seek to minimize the loss function, which is also refered to as cost function (Golberg, 1989;Geem, 2012).
Here, we use the smooth approximation of the L1 absolute value loss with a value of 0.1 for the soft margin between inlier and outlier residuals to but less weight on possible outliers. We set the tolerances for the termination to 10 -5 365 for changes of the cost function, 10 -8 for the norm of the gradient, and 10 -5 for changes of the independent parameters.
Furthermore, note that all forward simulations throughout all model calibrations and sensitivity analyses are performed with the reduced model to speed-up the analysis time and to make the global sensitivity study feasible. 1 (in the Supplementary Material), indicating that no satisfactory calibration was actually obtained for those parameters within the imposed ranges.

Model Calibration with Local Sensitivity Analysis
In this section, we investigate the model calibration of the Upper Rhine

395
Graben under consideration of a local SA. Again, we take all three model weighting versions into account (branches 1.1.2, 1.2.2, and 1.3.2 of Fig. 3.3). First, we present the sensitivity results, and then we show how to incorporate these results into the model calibration procedure.
Sensitivity Analysis:

400
The sensitivity indices resulting from the local sensitivity analyses are shown in Fig. 3.4 for all weighting schemes. In Fig. 3.5, we display them in a crosssectional view to provide a spatial impression of the distribution of the sensitivities.
The local sensitivity analysis without weights (for the reduced model with The model response has a sensitivity below 0.1 for the remaining layers. There-415 fore, we do not consider them for further analyses.

Model Calibration with Global Sensitivity Analysis
We present now the results from the global SA and the model calibration using these sensitivity results. Again, we perform all analyses for the scenarios, 450 using i) no weights, ii) automatic weights, and iii) user-defined weights.
Sensitivity Analysis: For both the sensitivity analysis without weights (branch 1.1.3 of Fig. 3.3) and automatic weights (branch 1.2.3 of Fig. 3.3), we obtain similar results. The first-and total-order indices of the Sobol sensitivity analysis are shown in Fig.   455 3.6. Again, we provide the cross-sectional view in Fig. 3. The influences from all remaining layers are negligible with sensitivities less than 465 10 -2 . As for the local sensitivity analysis also the influence of the radiogenic heat production is negligible.
In addition to the sensitivities for each parameter, we now also obtain an indication of higher-order contributions (see Section 2.2.2), which can be attributed to parameter correlations. The sensitivities are dominated by first-order contri-470 butions for all layers except the Lithospheric Mantle (LM1 & LM2) which also has significant higher-order contributions. The Sobol sensitivity analysis with user-defined weights (branch 1.3.3 of Fig. 3.3) results in a slightly di↵erent pattern. For this scenario, the higher-order contributions are nearly non-existent for all layers. Furthermore, we lose the influences of the remaining layers.
For the calibration, we now consider only the thermal conductivities (branches 1.1.3, 1.2.3, and 1.3.3 of Fig. 3.3) since the Sobol sensitivity analysis showed that the influence of the radiogenic heat production is small in comparison to the thermal conductivity. As before, we repeat the calibration in three scenarios.

480
In the no weight and automatic weight scenario (branches 1.1.3, and 1.2.3 of In Fig. 3.8, we plot the spatial distribution of the user-defined weights. Additionally, we show the misfit between the simulated and observed temperatures for the calibration without weights, with user-defined weights, and for the initial  for the calibration without weights c) the misfit in temperature after the calibration with user-defined weights, and d) the misfit in temperature for the initial parameter distribution.
In the background of all images are the annual regional average surface temperature values plotted. The gray line indicates the main outline of the Upper Rhine Graben.

505
The reduction requires 128 basis functions to describe the model for a relative error tolerance of 5·10 -4 for the Upper Rhine Graben model, where we vary only the thermal conductivities (Tab. 1, branch 1 of Fig. 3.3). The convergence rate of the maximum relative error bound for both reduced model is shown in Fig.   3.9. We can then reduce the computation time of a single forward simulation 510 from 44 min to 3 ms, yielding a speed-up of 9.2·10 5 . For the model with a Graben. Here, the first RB model refers to branch 1 in Fig. 3.3, and the second RB model to branch 2 in Fig. 3

520
The results of our case study highlight the essential need for global sensitivity analyses, required to obtain meaningful, comprehensive, and robust geothermal model calibrations. This aspect is specifically relevant in comparison to traditional "trial-and-error" model calibrations, which we discuss in this section in more detail. Additionally, we discuss important aspects related to data sparsity 525 and the computational cost of using the RB method instead of the classical FE method.

Local vs. Global Sensitivity Analysis
Global and local sensitivity analysis both allow to identify parameters that do not influence the model, leading to a reduction of the number of parameters to 530 consider in further analyses. The quality of the results, though, di↵ers strongly from one method to the other. The common use of local SAs stems from the computational cost of global SAs, which, despite being recognised as a better strategy, certainly appears as prohibitive, at least in absence of any specific strategy to palliate the fact that forward simulations take minutes to hours 535 to run. This second-best nature of local SAs, however, has been perceived as unimportant until now, most probably since local SAs has provided nonetheless a way for modellers to optimise parameters. To assess the accuracy of this optimisation, we have a closer look at the calibrated parameters from our case study of Section 3. parameters are still reaching their pre-defined bounds. This either shows that: i) the bounds were not placed far enough away from the initial parameter distribution, or ii) the parameters are insensitive to the temperature distribution and that consequently the calibration results are meaningless. This is potentially problematic. Considering the already quite large variation range in our study

550
(up to ± 50 %), the second option is more likely. In such a case of insensitive parameters, the model calibration will fail to correctly determine the best parameter values since all of them result in similar temperature distributions.
If we then wanted to perform predictions away from known observations, we would likely obtain biased and overfitted results. To conclude, we need a global sensitivity analysis for the reduction of the parameter space to ensure a robust model calibration and to investigate possible 585 parameter correlations. In this paper, we presented the two end-members of a local SA and a global Sobol SA. However, note that also analyses that combine aspects of both local and global SAs exist. One example is the Morris SA, which could address the reliance on the reference parameters (Campolongo et al., 2007;Morris, 1991;Wainwright et al., 2014). The Morris sensitivity analysis 590 is di↵erence-based, as the local sensitivity analysis. It requires fewer forward simulations than the Sobol sensitivity analysis. Therefore, it is computationally cheaper. In contrast to the Sobol sensitivity analysis, it does not allow an intensive study of the parameter correlations (Campolongo et al., 2007;Morris, 1991;Wainwright et al., 2014). Since these correlations are of major interest 595 in geothermal studies, we advise using the Sobol sensitivity analysis with a surrogate model to compensate for the computational costs.

Geological Conclusion of the Sensitivity Analysis
While our contribution focuses on methodology, it is also interesting to discuss some geological aspects from the results of the global SA. For the SA, we need to define a quantity of interest, which is tailored to the aims of further analyes. In our study, we want to perform a model calibration, which minimizes the di↵erence between simulated and observed temperatures.
Hence, our study is focused on the temperatures at the measurement locations.
Consequently, we focus also our SAs on the measurements. Note that if we were 615 interested in the physical processes, we would need to chose a di↵erent quantity of interest, such as the total amount of heat in the system, to avoid a potential measurement bias.
For our measurement-focused study, we find the influences of the radiogenic heat production to be negligible in comparison to the thermal conductivities.

620
This is the reason why we did not account for these parameters in further analyses. The minor influence of the radiogenic heat production (the sensitivity index of the most influencing radiogenic heat production is nearly one order of magnitude smaller than the one of the thermal conducitivity of the Cenozoic Rift Sediments (see Fig. 2

Data Sparsity
One main di culty of working with most observation data consists in dealing with very uneven spatial distributions. Note that this problem of data sparsity is a general one, not specific to the Upper Rhine Graben. A detailed analysis of how the SA is influenced by data sparsity is presented in Degen et al. (2021). We 645 present here a way to compensate for the sparsity during the model calibration in general, using di↵erent data weighting strategies, along with some consequences for the Upper Rhine Graben in specifically.
As a reminder, for the geological model of Section 3, we have 2282 data points in region 1, 53 in region 2, and 12 data points in region 3 ( Fig. 3.8).

650
Hence, region 1 contains one order of magnitude more data than region 2 and two orders of magnitude more data than region 3. Performing a calibration without applying any weights to the data reduces therefore mainly the misfit in region 1, only a↵ecting slightly the misfits in regions 2 and 3, in line with the overall regions contributions. This can be seen in Fig. 3.8 by comparing 655 the temperature values of the model calibration without weights and the initial temperature distribution. The di↵erence between these two is insignificant, especially under consideration of the measurement accuracy. If we are only interested in region 1 of the model it is su cient to follow the approach of a model calibration without any weights. However, this is not enough for a more even 660 fit over the entire spatial domain. To compensate for the unequal distribution of observations, we apply user-defined weights to the di↵erent regions, as described in Section 3.2. This results in a slightly worse fit for region 1 compared to the initial temperature distribution. However, the misfit in region 2 can be decreases significantly from 29 C to 17 C. The misfit in region 3 is comparable for both realizations.
The calibration with automatic weights yields very similar results to the calibration without weights. We explain this result by focussing on the spatial distribution of the misfit inside the southern and central part of the Upper Rhine Graben. In these areas, we can identify three zones: i) the central part with a 670 zone of overall underestimated temperatures, as well as its ii) northern and iii) southern zones, both with overall overestimated temperatures. These zones of over-and underestimated temperatures correspond to region 2. Therefore, this is the region in which the conductive model fails to represent the measured temperature distribution. If we use the automated weighting scheme that considers 675 distances only, we apply a higher weight to both the over-and underestimated temperature regions and we increase for both parts the contribution to the total misfit, which the calibration minimizes. The user-defined scheme only applies a higher weight to the overestimated temperature zones, therefore we increase the contribution to the total misfit of this region alone yielding the di↵erence 680 between those two schemes.

Consequences of the Data Weighting Analysis for the Upper Rhine Graben Model
Analogously to Section 4.1 on SAs, we draw several geological conclusions for the data weighting analysis. Using the model calibration with user-defined weights and a global SA (branch 1.3.3 of Fig. 3. ized by highly permeable sediments with a dense fault network (Buchmann & Connolly, 2007;Bauer et al., 2015;Vidal & Genter, 2018;Meixner et al., 2016).
Combining these structural characteristics with the discrepancy in the thermal 715 conductivity of the upper two layers leads to the conclusion that the misfit might be related to fluid interactions and related e↵ects on heat transport (such as convection). Both e↵ects are ignored in the current conductive model, which does not consider any faults or fractures as it focuses on the temperature of at basin-scale. This conclusion is supported by Freymark et al. (2019), who show 720 that these features become important for local studies and that the temperature discrepancy is a result of the advective heat transport.
In contrast to previous studies, we have identified the two layers that are responsible for the misfit and systematically determined which of these layers is of higher importance. Furthermore, based on the sensitivity analysis, we have 725 determined that most of these misfits are caused by the two layers themselves, and only a significantly smaller part is introduced by the correlation with other parameters. The global SA was only made feasible by using highly e cient surrogate In the current stage, the RB method provides e cient error bounds only for elliptic and parabolic partial di↵erential equations (PDE). For a geothermal 745 application, this has the consequence that the method is only applicable to conductive studies if we want to obtain a guarantee for the approximation error.

Computational Cost
The RB method is also applicable for hyperbolic problems but in this case, we only obtain estimates of the approximation error. Also note, that we need to decompose the PDE into a parameter-dependent and -independent part. This 750 is naturally given for linear problems, but not for nonlinear ones. Nonetheless, the method is also applicable for nonlinear PDEs by using the empirical interpolation method (e.g. Barrault et al., 2004), which approximates the nonlinear part of the PDE.

755
We performed systematic geothermal model calibrations and sensitivity stud- Not only were we able to identify the spatial areas with model errors (arising, for instance, from unaccounted physical processes) but we furthermore quan-765 tified the influence of the thermal conductivity on these areas. As such, we showed that a simple conductive model can actually compensate for the errors in the underlying physics and potentially provide more reliable predictions temperatures at the target depth of geothermal exploration applications than more complex models accounting for all relevant physical processes.

770
Furthermore, we encountered in this study the problem of data sparsity, as in many geophysical applications. Through di↵erent weighting schemes, we o↵ered systematic methods to compensate for this sparsity during the analyses and therefore possibilities to reduce the bias caused by this unequal data distribution.

775
The combination of a global sensitivity study and an automated model calibration is computationally prohibitive with state-of-the-art finite element simulations for high-resolution models, like basin-scale geothermal models as investigated here. Therefore, we used the RB method as a surrogate model, for the inverse processes. This reduced the computation time from 36 core-years to Our results on this case study open up the path to several subsequent steps.
We showed that the calibration indicates the need to consider the fluid interaction in the upper layers. Due to the high dimensionality of the model, a hydrothermal simulation becomes computationally very costly, making global 790 sensitivity analyses as investigated here prohibitively expensive. Furthermore, the number of parameters that need to be calibrated increases, making a calibration of the individual parameters increasingly di cult, due to the data sparsity (Freymark et al., 2019). In order to perform both e cient inverse processes and consider the hydrothermal e↵ects, novel approaches will be required to introduce 795 another step change, using potentially concepts such as the entropy production (Börsing et al., 2017;Huang & Wellmann, 2021), to transfer the e↵ects of the convection into e↵ective thermal conductivities.

Acknowledgement
We would like to acknowledge the funding provided by the DFG through Analogous to the article, we start the presentation of the results with the model calibration without a sensitivity analysis. Therefore, we present in Tab.
1 the initial thermal conductivities, as well as the calibrated thermal conductivities for all layers and all weighting schemes, i.e. for the i) no weights, ii) automatic weights, and iii) user-defined weights scenario. We also present the 10 radiogenic heat production values. Note that the values were not varied in the calibration since the sensitivity analyses showed that the influence of the radiogenic heat production is minor in comparison to the thermal conductivity.

⇤ Corresponding author
We observe in Tab. 1 a high decrease in the thermal conductivity of the Cenozoic Rift Sediments (CRS) in case of the model calibration with user-15 defined weights with a decrease of 0.4 W m -1 K -1 . Also, we obtain a high increase in thermal conductivity of 0.9 W m -1 K -1 for the Cenozoic Volcanics (CV) for the model calibration without and with automatic weights.

Model Calibration with Local Sensitivity Analysis
In the following, we present how the calibrated values change by including 20 the results of a local sensitivity study into the process of model calibration. The results of the local sensitivity analysis for the unweighted, automatic weighted, and user-defined weighted case are presented in Fig. 3.4. Here, we present the additional analysis performed for determining the influence of the radiogenic heat production. For this analysis, we compared in Fig. 1  The highest influence of the radiogenic heat production is arising from the Saxothuringian (S). However, the influence is significantly lower than the influence of the thermal conductivity of the Cenozoic Rift Sediments (CRS). This is 30 the reason why we did not consider the radiogenic heat production during the model calibration. The acronyms for the respective geological layers are defined in Tab. 1.
As before, we present the detailed result of the calibration of the thermal conductivities, along with the initial values of the thermal conductivities and radiogenic heat productions, for all layers in Tab. 2.

35
As discussed in the main article, the model calibration results without and with a local SA are alike (Tab. 2). Only for the Lower Crust (LC) and the Lithospheric Mantle (LM1 & LM2), di↵erences are observed. We still, observe that many of the model parameters reach their upper or lower bound.

Model Calibration with Global Sensitivity Analysis
The global sensitivity analysis whose results are accounted for in the model calibration is presented in Fig. 3.6. Here, we present the comparison between the influences between the radiogenic heat productions of the Upper Crust and the thermal conductivity of the Cenozoic Rift Sediments (CRS) in Fig. 2. As for the local SA, we conclude that the highest influence of the radiogenic heat pro-45 duction is caused by the Saxothuringian (S). Since this influence is significantly lower than the one of the thermal conductivity of the Cenozoic Rift Sediments (CRS), we disregard the radiogenic heat productions during the calibration.
First-order Sobol Indices [-] 10 0 10 -2 10 -4 10 -8 Total-order Sobol Indices [-] 10 0 10 -2 10 -4 no weights automatic weights user-defined weights For the calibration with user-defined weights (branch 1.3.3 of Fig. 3.3), we observe for the Rift Sediments (CRS) a decrease in thermal conductivity after the calibration, resulting in an extremely low thermal conductivity of 0.8 W m -1 K -1 . The Lithospheric Mantle (LM1 & LM2) has a slightly decreased thermal conductivity of 3.6 W m -1 K -1 in contrast to the initial thermal conductivity of 3.95 W m -1 K -1 .  Fig. 3.3) by 70 decreasing the allowed variation range from ± 50 % to ± 10 % in step sizes of 10 %. For this analysis, only the model calibration with a global sensitivity analysis behaves in accordance with our expectations. Both the model calibration without a sensitivity study and with a local sensitivity study behave nonphysically.
For example, if we define a variation range of ± 10 % the thermal conductivity of 75 the Cenozoic Folded Molasse/Cenozoic Foreland Molasse/Buntsandstein/Jura Mountains/Odenwald layers decreases to its lower bound of 2.7 W m -1 K -1 .
Note that the initial value is 3.0 W m -1 K -1 . However, if we now increase the variation range to ± 20 %, then the thermal conductivity increases to its upper bound of 3.6 W m -1 K -1 , which is physically not plausible.

80
In order to control the robustness of the di↵erent model calibration versions, we performed the model calibration without a sensitivity study, with a local sensitivity study, and with a global sensitivity study with di↵erent initial guesses.
All model calibrations are tested using the user-defined weights (branches 1.3.1 to 1.3.3 of Fig. 3.3). We tested the parameter values provided by Freymark 85 et al. (2017) as an initial guess, the lower and upper parameter bounds, and ten randomly chosen initial parameter sets. The least robust model calibration is the one without a sensitivity analysis (branch 1.3.1 of Fig. 3.3).
It shows the highest di↵erences with thermal conductivity di↵erences in the order of 10 -2 W m -1 K -1 .

90
The model calibration with a local sensitivity analysis (branch 1.3.2 of Fig.   3.3) yields di↵erences in the order of 10 -5 W m -1 K -1 , and the one with a global sensitivity (branch 1.3.3 of Fig. 3.3) di↵erence in the order of 10 -4 W m -1 K -1 .   Fig. 3). That is suspicious since at least di↵erences caused by numerical errors should be visible. This leads to the conclusion that the model calibration with the local sensitivity study is robust for the model parameters that do not reach their bounds and stuck for those that reach one of their bounds. So, overall it is a non-robust model calibration.

105
To conclude, only the model calibration with a global sensitivity analysis results in robust and therefore reliable model calibration. The local sensitivity analysis fails to e ciently reduce the parameter space. This means we remain with insensitive parameters in the model calibration. These parameters cause the non-robustness since any for these parameters any value results in the same 110 temperature distribution.