A data assimilation framework to constrain anthropogenically-induced geomechanical processes at depth: the subsidence case

Subsurface activities, such as reservoir gas production, geothermal heat extraction, ground water extraction, phreatic groundwater level lowering, and storage of natural gas and CO2, potentially lead to geomechanical risks. The two most critical instances of these risks are anthropogenically-induced seismicity and subsidence. A combination of geological interpretations with seismic campaigns and flow modeling often provides a relatively rich pre-existing knowledge of the underground around anthropogenic subsurface activities. However, our understanding of the driving mechanisms for induced seismicity and subsidence is still poor, and our modelling forecasts are still very uncertain. In our companion paper [Candela et al., 2021] the focus is on induced seismicity; here the focus is on subsidence induced by natural gas extraction. The translation of reservoir pressure depletion to ground surface displacements involves multiple types of poorly constrained physics-based models. Deploying a data assimilation procedure based on ensemble smoother algorithms, we demonstrate that (i) the reservoir compaction process driving subsidence can be effectively constrained and (ii) our subsidence forecasts can be well aligned with the geodetic observations. The identification of the physical process at work is crucial to build confidence in our subsidence forecasts. The performance is achieved by honoring uncertainties at each step of the workflow from reservoir depletion to ground surface displacement, and by assimilating not only vertical displacements (that is, the subsidence) but also horizontal displacements. The predictive power of the procedure is demonstrated with an ensemble of synthetic but complex reservoir flow simulations mimicking all the characteristics and uncertainties representative of real gas fields in the north of the Netherlands.


INTRODUCTION
Various types of energy-related anthropogenic subsurface activities, such as reservoir gas production, geothermal heat extraction, ground water extraction, and storage of natural gas or CO2, can lead to geomechanical risks, such as induced seismicity and subsidence [Zoback, 2007]. The relatively rich pre-existing knowledge of the underground around anthropogenic subsurface activities (e.g. through geological interpretations of seismic campaigns, borehole analysis, and reservoir flow modelling, e.g. [NAM, 2016] for the Groningen gas field, the Netherlands) could lead one to think that the driving physical processes of anthropogenically-induced seismicity and subsidence are well constrained. This is unfortunately not the case, and for each instance of anthropogenic subsurface activity, the contribution of multiple driving physical processes is still highly debated (e.g. [Mossop and Segall, 1999], [van Thienen-Visser et al., 2015a], [Fokker et al., 2016], [Goebel and Brodsky, 2018], [Buijze, 2020], [Candela et al., 2019, 2021]).
An important step towards the identification of the driving processes of induced seismicity and subsidence is to properly assimilate the observations and to constrain the pre-existing geological, hydrological, and geomechanical models and their inherent uncertainties. As in many scientific realms where uncertainties are abundant, like weather forecasting and hydrology, a probabilistic ensemble-based approach can be appropriate and fruitful (e.g. [Reggiani and Weerts, 2008], [Jaynes, 2003], [Evensen, 2003], [Emerick and Reynolds, 2013a,b]). In the context of Bayesian probability, "ensemble" here implies that we build multiple model realizations, based on the possible choices of processes and subsurface parameters. For the quantification of the driving mechanisms of induced seismicity at the Groningen gas field, a data assimilation procedure based on an ensemble smoother algorithm has been deployed in the companion paper of [Candela et al., 2021]. Here the focus is on subsidence.
During the extraction of natural gas from a gas field, the reservoir pressure decreases, leading to compaction. This reduction in volume at reservoir depth may induce surface subsidence [Doornhof, 1992], with consequences for the environment and for human activities [van Thienen-Visser et al., 2015b]. The consequences of reservoir production and its changes call for subsidence forecasting approaches. However, the link between reservoir production and subsidence is non-trivial. Some gas fields in the Netherlands and elsewhere have shown a non-linearity between pressure depletion and subsidence, or even a delay between the start of production and the onset of subsidence, and a continuation of subsidence even after production had stopped [van Thienen-Visser et al., 2015a; Hettema et al., 2002]. Suggested explanations for this non-linear relationship between reservoir production and subsidence include a potential delay between reservoir depletion and compaction, and potential non-linear or inelastic mechanical behavior of the subsurface. For the latter class of explanations, multiple reservoir compaction models have already been developed (e.g. [Mossop, 2012], [De Waal, 1986], [NAM, 2015], [Pijnenburg et al., 2018, 2019]). An efficient procedure to discriminate which compaction model is the most likely to explain the subsidence observations is a crucial step to gain confidence in our subsidence predictions.
Ensemble-based approaches for inversion of surface subsidence have already been developed in the past (see e.g. [Fokker et al., 2012], [Fokker et al., 2016], [Baù et al., 2015], [Gazzola et al., 2021]). However, these pioneering works were designed for specific applications, where the full spectrum of uncertainties from the reservoir flow to the modelling of subsidence was not accounted for. For example, [Fokker et al., 2016] assumed a linear compaction model and only varied the reservoir compaction coefficient and the subsurface elastic moduli. [Baù et al., 2015] focused on a simplified disk-shaped reservoir with a spatially uniform pressure drop. We here introduce a newly developed integrated approach, coined Ensemble-based Subsidence Interpretation and Prediction (ESIP). It takes advantage of all our a-priori knowledge in terms of processes and subsurface parameters. The objective is to carefully span the a-priori model space with an ensemble of subsidence realizations. Then, deploying an ensemble smoother algorithm, the ensemble is confronted with the ground surface displacement data in order to refine the predictions and to identify the most likely driving mechanisms. For evaluation purposes, the integrated approach is applied on numerical flow simulations aiming to mimic the characteristics of real gas fields in the north of the Netherlands. We investigate the predictive power of our approach by training our model on a certain time period, then making forecasts for the period following, and evaluating whether our method has improved the forecasting reliability.
Our integrated approach was developed for the production of a natural gas field, but it can easily be adapted to other settings of exploitation of subsurface resources causing ground surface movement. Typical examples are the injection of CO2 [Vasco et al., 2010], underground gas storage [Teatini et al., 2011], steam injection [Khakim et al., 2012], geothermal systems ([Mossop and Segall, 1999], [Allis et al., 2009], [Vasco et al., 2013]), groundwater extraction [Galloway and Burbey, 2011], phreatic groundwater level management and salt mining [Fokker and Visser, 2014].

WORKFLOW DESCRIPTION
This section details the workflow that we designed to improve the parameterization and forecasting capability of subsidence above a typical gas field in the north of the Netherlands. The complexities of such a field are reproduced with a synthetic reservoir model, and multiple reservoir simulations are performed to mimic the prevalent uncertainty. One of the realizations is selected as the synthetic truth model and used for creating synthetic subsidence data, while the others are employed to show how the knowledge about the reservoir and the subsidence forecasting can be improved with these data.

ENSEMBLE OF PRESSURE DEPLETION SCENARIOS
A synthetic gas field was designed with Shell's proprietary reservoir simulator MoReS, mimicking the complexities of a typical real field in the north of the Netherlands. These complexities (see Figure 1) are: (1) a fault potentially compartmentalizing the reservoir with an uncertain sealing capacity (that is, an uncertain fault transmissibility), (2) an uncertain reservoir permeability, including the possibility of a flow barrier and a high permeability streak, (3) an uncertain amount of residual gas in the aquifer, and (4) irregular reservoir boundaries and grid size. The variability in the permeability of the flow barrier results in the variability of the aquifer activity. We built an ensemble of pressure depletion scenarios to map our a-priori belief in terms of reservoir geometry, geology, flow properties, and their uncertainty. Figures 2 and 3 display the first and the last of the 76 members of the prior ensemble. Each member simulates 30 years of field production; the total volumes of gas and water produced are representative of real fields in the north of the Netherlands. The depletion level of a reservoir-connected aquifer is regularly uncertain in real scenarios. This is also the case for our synthetic scenario, where the prior ensemble intentionally maps uncertain pre-existing knowledge on the degree of aquifer depletion. The pressure depletion of the aquifer is clearly variable between members of the prior ensemble (see Figures 2 and 3).

UPSCALING
The 3-D pressure fields (Figures 1 to 3) are upscaled to 2-D maps in two steps: the vertical averaging of the pressure in the reservoir layers and the horizontal averaging over a coarser grid. This averaging process reduces the computational runtime and is justified by the fact that a distribution of compaction over an upscaled cell in a thin and deep reservoir has a limited effect on the surface subsidence [Geertsma, 1973].
The upscaling needs to be performed such that the amount of compaction for the upscaled 2-D grid and the initial layered 3-D grid is identical. Therefore, the vertical and horizontal averaging of the 3-D pressure field need to be weighted, to take into account the variability in compaction between each individual grid block of the reservoir model. Figure 4 presents the upscaled 2-D pressure fields for the first and last member of the prior ensemble of reservoir flow simulations.
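The two averaging steps can be combined into a single weighted aggregation. The sketch below is a minimal illustration of such a compaction-weighted upscaling, not the actual MoReS post-processing; the array layout and the choice of compaction contribution (e.g. compaction coefficient times net volume) as weight are assumptions.

```python
import numpy as np

def upscale_pressure(p, weights, coarse_shape):
    """Compaction-weighted averaging of a 3-D pressure field onto a 2-D grid.

    p            : (nz, ny, nx) pressure per fine grid block
    weights      : (nz, ny, nx) compaction weight per block (e.g. c_m * net volume)
    coarse_shape : (ny_c, nx_c), with ny % ny_c == 0 and nx % nx_c == 0
    """
    nz, ny, nx = p.shape
    ny_c, nx_c = coarse_shape
    fy, fx = ny // ny_c, nx // nx_c
    # Vertical and horizontal averaging in one step: sum w*p and w over each
    # coarse column, so the weighted mean preserves the total compaction.
    wp = (weights * p).reshape(nz, ny_c, fy, nx_c, fx).sum(axis=(0, 2, 4))
    w = weights.reshape(nz, ny_c, fy, nx_c, fx).sum(axis=(0, 2, 4))
    return wp / w
```

With uniform weights this reduces to a plain average; with non-uniform weights, high-compaction blocks dominate the upscaled pressure, as required for the compaction of the 2-D and 3-D grids to be identical.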

FROM PRESSURE DEPLETION TO COMPACTION
Translating the pressure depletion between two epochs to volume compaction is non-trivial. Models for rock compaction based on laboratory measurements and/or field measurements are still highly debated ([van Thienen-Visser et al., 2015a], [NAM, 2015], [Hettema et al., 2002], [De Waal, 1986], [Spiers et al., 2017], [Pijnenburg et al., 2018, 2019], [Smith et al., 2019]). For our exercise, four types of compaction models were considered: the linear, the bilinear, the time decay, and the rate type model. An important question that we address here is: can we determine which compaction model explains the subsidence observations best?
The linear elastic compaction model assumes a linear relationship between pressure depletion and compaction:

ΔV = c_m V (p_0 − p),     (1)

where V is the grid block net volume and p_0 the initial pressure. Only one material parameter, the compaction coefficient c_m of the reservoir rock, is needed for the linear compaction model. c_m can be chosen as a single value or as a function of porosity φ. A polynomial regression derived from uniaxial laboratory experiments on rock samples representative of the gas reservoirs of the north of the Netherlands [NAM, 2017] yielded a third-order polynomial:

c_m(φ) = a_0 + a_1 φ + a_2 φ² + a_3 φ³,     (2)

with coefficients a_0 to a_3 following from the regression. The three other compaction models (bilinear, time decay, rate type) intend to capture non-linear, time-dependent and/or pressure-dependent behavior. They have been formulated in response to geomechanical and geodetic measurements pointing out non-linearities and delays in compaction and subsidence relative to the production of gas fields ([van Thienen-Visser et al., 2015a], [NAM, 2015], [Hettema et al., 2002], [De Waal, 1986], [Spiers et al., 2017], [Pruiksma et al., 2015]). Recent observations also indicate continuing subsidence even when production has stopped.
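As a minimal sketch of the linear model, the snippet below computes the compaction of one grid block with a porosity-dependent compaction coefficient. The polynomial coefficients are placeholders for illustration only, not the actual NAM (2017) regression values.

```python
import numpy as np

def c_m(porosity, coeffs=(1e-5, 2e-5, 3e-5, 4e-5)):
    """Third-order polynomial for the compaction coefficient vs. porosity.
    The coefficients here are hypothetical placeholders."""
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * porosity + a2 * porosity**2 + a3 * porosity**3

def linear_compaction(volume, p0, p, porosity):
    """Linear elastic model: compaction proportional to depletion (p0 - p)."""
    return c_m(porosity) * volume * (p0 - p)
```

By construction, doubling the depletion doubles the predicted compaction, which is exactly the behavior the other three models relax.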
In the bilinear compaction model [NAM, 2017], the compaction coefficient is assumed to be pressure-dependent. It combines two linear relationships between pressure depletion and reservoir rock compaction:

ΔV = c_m,a V (p_0 − p)   for p > p_t,     (3)
ΔV = c_m,a V (p_0 − p_t) + c_m,b V (p_t − p)   for p ≤ p_t,     (4)

where p_0 and p_t respectively define the initial pressure before the start of production and the transition pressure. The first relationship, equation (3), should fit the stiff behavior at early stages, using a low value c_m,a from the onset of pressure depletion down to the transition pressure p_t. The second relationship, equation (4), addresses the later, weaker behavior, using a high value c_m,b at larger depletions. Two material parameters, c_m,a and c_m,b, and the transition pressure p_t are required to compute the bilinear compaction.
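A hedged sketch of the bilinear relation is given below; the function and variable names are ours. Note that the two branches agree at the transition pressure, so the compaction is continuous while its slope changes.

```python
def bilinear_compaction(volume, p0, p, p_t, cm_a, cm_b):
    """Bilinear model: stiff coefficient cm_a while pressure is above the
    transition pressure p_t, weaker (higher) coefficient cm_b below it.
    Pressure declines from the initial value p0."""
    if p >= p_t:
        # early, stiff branch
        return cm_a * volume * (p0 - p)
    # later, weaker branch: stiff contribution down to p_t, then cm_b
    return volume * (cm_a * (p0 - p_t) + cm_b * (p_t - p))
```

This piecewise form is one way to reproduce the observed delay between the onset of depletion and significant subsidence.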
The time-decay model was inspired by the ubiquitous diffusive behavior of many physical systems pushed into non-equilibrium high-energy states, which slowly decay to their low-energy equilibrium states again [Mossop, 2012]. Such a diffusion-type process can be modeled using a convolution of a linear relationship between pressure depletion and reservoir rock compaction with an exponential time decay function:

ΔV(t) = c_m V [ (p_0 − p(t)) ∗ (1/τ) exp(−t/τ) ],     (5)

where ∗ denotes convolution in time. In order to compute the time decay compaction, the material parameter c_m and the characteristic time decay constant τ are required.
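A discrete approximation of this convolution could look as follows. The increment-based discretization and the normalized kernel (1/τ)·exp(−t/τ) are our assumptions; each pressure-drop increment "matures" towards its full linear compaction with characteristic time τ.

```python
import numpy as np

def time_decay_compaction(volume, p0, pressures, times, cm, tau):
    """Discrete convolution of pressure-drop increments with an exponential
    time-decay kernel; returns the compaction at the last epoch in `times`."""
    depletion = p0 - np.asarray(pressures, dtype=float)
    increments = np.diff(depletion, prepend=0.0)   # depletion increment per epoch
    t = np.asarray(times, dtype=float)
    total = 0.0
    for t_i, dp in zip(t, increments):
        # an increment applied at t_i has matured by 1 - exp(-(t_end - t_i)/tau)
        total += cm * volume * dp * (1.0 - np.exp(-(t[-1] - t_i) / tau))
    return total
```

For τ much smaller than the production timescale the result tends to the linear model, while a large τ produces the observed delay and the continuing compaction after production stops.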
The rate type compaction model was derived from laboratory measurements designed to capture the loading-rate dependency of sandstones [De Waal, 1986]. The onset and arrest of production can be seen as changes in the loading rate of the reservoir rocks. At a change in loading rate, a first direct elastic strain response ε_d is modeled, followed by a more gradual creep strain response ε_cr. We followed [Pruiksma et al., 2015] with a formulation based on the linear isotach compaction. The rate type isotach compaction was implemented as an explicit Euler finite-difference scheme with a constant time step Δt [Pruiksma et al., 2015]. To calculate the compaction of one grid block, the applied numerical scheme can be divided into 5 steps:

1) From the current effective vertical stress σ′_v(t_i) and total strain ε(t_i), calculate the creep strain rate ε̇_cr according to the isotach formulation of [Pruiksma et al., 2015]. The vertical stress σ_v is derived from the reservoir depth D and the mean density ρ̄ of the subsurface up to the reservoir top, and the effective stress is its difference with the pressure:

σ_v = ρ̄ g D,   σ′_v = σ_v − p.

At t_0, that is at the onset of pressure depletion/production, the direct elastic strain ε_d(t_0) and creep strain ε_cr(t_0) are both taken zero, and thus the total strain ε(t_0) is set to zero. The reference total strain is expressed as:

ε_ref = c_m,ref (σ′_v − σ′_ref),

with the reference vertical effective stress σ′_ref = σ′_v(t_0).
Three material parameters (c_m,ref, c_m,d, and b) and one state parameter (σ̇′_ref) are needed to compute the rate type compaction. The parameter b is an empirical laboratory-derived constant. The material parameters c_m,ref and c_m,d are respectively the reference and direct compaction coefficients, where c_m,ref is the compaction coefficient corresponding to the pre-depletion loading rate, and thus by definition relatively high. Parameter c_m,d is dedicated to mapping the direct effect at a change of loading rate. In the scenario of a change of loading rate due to the onset of pressure depletion, c_m,d is expected to be relatively low in order to mimic the stiff response of the reservoir rocks. The state parameter σ̇′_ref represents the reference vertical effective stress rate at the start of the reservoir depletion.
2) The second step of the Euler scheme consists in calculating the increase in creep strain,

Δε_cr = ε̇_cr Δt,

and updating the creep strain as:

ε_cr(t_{i+1}) = ε_cr(t_i) + Δε_cr.

3) The time is updated as t_{i+1} → t_i + Δt.

4) Following a linear stress-strain relationship, one can calculate the direct elastic strain as:

ε_d = c_m,d (σ′_v − σ′_ref).

5) Finally one can calculate the total cumulative strain,

ε = ε_d + ε_cr,

and the total cumulative compaction as:

ΔV = ε V_0,

with V_0 the grid block net volume, assumed constant over time. After this fifth step the workflow returns to the first step for the next time step.
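The five-step Euler loop can be sketched as below. Since the exact isotach creep law of [Pruiksma et al., 2015] is not reproduced in this paper, the creep rate is passed in as a user-supplied function of the current stress and strain; all names are illustrative, not the actual implementation.

```python
def rate_type_compaction(volume0, sigma_eff, dt, cm_d, creep_rate):
    """Explicit Euler scheme for rate type (isotach) compaction.

    volume0    : initial grid block net volume (assumed constant in time)
    sigma_eff  : sequence of effective vertical stresses, one per time step
    dt         : constant time step
    cm_d       : direct (elastic) compaction coefficient
    creep_rate : function (sigma, eps_total) -> creep strain rate; stands in
                 for the isotach law of Pruiksma et al. (2015)
    """
    sigma_ref = sigma_eff[0]      # reference stress at the onset of depletion
    eps_cr = 0.0                  # creep strain, zero at t0
    eps_tot = 0.0                 # total strain, zero at t0
    for sigma in sigma_eff:
        # 1) creep strain rate from the current stress and strain
        rate = creep_rate(sigma, eps_tot)
        # 2) update the creep strain
        eps_cr += rate * dt
        # 3) time advances by dt (implicit in the loop)
        # 4) direct elastic strain from the linear stress-strain relationship
        eps_d = cm_d * (sigma - sigma_ref)
        # 5) total strain and cumulative compaction
        eps_tot = eps_d + eps_cr
    return eps_tot * volume0
```

With a zero creep rate the scheme degenerates to the purely elastic direct response, which is a convenient sanity check.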

FROM COMPACTION TO GROUND SURFACE DISPLACEMENTS
In order to propagate the 2-D compaction fields to surface subsidence, we employ influence functions, also called Green's functions. They are rotationally symmetric surface displacement profiles for a unit volume of compaction (a nucleus of volumetric strain), and they are used in conjunction with the 2-D compaction fields in order to calculate the total surface displacements at the desired geodetic benchmark locations. Our integrated approach (ESIP) gives the possibility to compute either Gaussian influence functions following Knothe's theory [Hejmanowsky & Sroka, 2000], or physics-based semi-analytical influence functions based on the work of [Fokker and Orlic, 2006]. The latter are based on the nucleus of strain concept [Mindlin & Cheng, 1950] and extend Geertsma's analytical solutions ([Geertsma, 1973], [Van Opstal, 1974]) in the sense that they can handle a layered elastic subsurface with the possibility of including visco-elastic salt layers.
The Knothe influence function only requires two input parameters, the reservoir depth and the influence angle, and only gives the vertical surface displacement (i.e. the subsidence). The influence functions resulting from the semi-analytical approach developed by [Fokker and Orlic, 2006] give the vertical and horizontal surface displacements for a single "nucleus" located at reservoir depth, and for given elastic parameters, or visco-elastic parameters in the case of salt layers. The required input parameters are the reservoir depth, the depths of the layer interfaces, Young's modulus and Poisson's ratio of each layer, and the Maxwell viscosity in case of a visco-elastic layer. For the present study, only semi-analytical influence functions for an elastic two-layer subsurface model have been considered. The interface between the layers was put at -3800m depth, 80m below the reservoir (-3720m). Uncertainties of the model parameters (Young's modulus and Poisson's ratio) of both layers have been mapped out in the prior ensembles (see Section 2.5) to cover the broad range of possibilities from a homogeneous elastic subsurface (no elastic differences between both layers) to a subsurface with a stiff basement (that is, with a strong contrast of elasticity between the top and bottom layer). The subsidence signatures of a homogeneous elastic subsurface and one with a stiff basement are quite different. One objective of our exercise was to test if our integrated approach (ESIP) can effectively constrain the potential contrast of elasticity of our two-layer subsurface model.
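For the simpler of the two options, a sketch of the Knothe (Gaussian) influence function might look as follows. The influence radius R = depth · tan(influence angle) and the Gaussian normalization follow Knothe's theory; the function names are ours, and the total subsidence is obtained by summing the contributions of all compacting cells.

```python
import numpy as np

def knothe_subsidence(x, y, xc, yc, dV, depth, influence_angle_deg):
    """Vertical displacement at (x, y) caused by a compaction volume dV
    located at (xc, yc) and at reservoir depth, using Knothe's Gaussian
    influence function. Negative values denote subsidence."""
    R = depth * np.tan(np.radians(influence_angle_deg))   # influence radius
    r2 = (x - xc) ** 2 + (y - yc) ** 2
    # normalized so the subsidence integrated over the surface equals dV
    return -dV / R**2 * np.exp(-np.pi * r2 / R**2)
```

The normalization guarantees volume conservation at the surface: integrating the subsidence bowl over the plane recovers the compacted volume, which is a useful numerical check.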
Following the application of the influence function to the compaction grids and the calculation of the modeled surface displacements, a set of 90 double differences is generated for each realization for the training period (T0 to T1). The combinations of the 90 modeled double differences (in terms of benchmark locations and times of the geodetic campaigns) correspond to those of the 90 measured (synthetic in our case) double differences.

PRIOR PREDICTIONS
For each type of compaction model and based on the prior information, an ensemble of prior model vectors M_0 = (m_1, m_2, …, m_{N_e}) is created. These vectors are generated by stochastically selecting values from the prior distributions of the driving input parameters. In this sense, the available prior knowledge is mapped to the ensuing ground surface displacement by means of the geomechanical forward models, for both the compaction and the influence functions translating compaction at depth to ground surface displacements. Note here that the prior model parameters used to generate the ensemble of reservoir flow simulations are not included in M_0 and thus will not be updated during the data assimilation procedure (detailed in Section 2.7).
The prior distributions of all the model parameters used for the compaction models and the influence function are presented in Table 1 and Table 2, respectively. For some model parameters, stochastic factors are employed to generate the prior ensembles. The prior ensembles of the compaction coefficients (c_m, c_m,a, and c_m,d) are generated by multiplying the initial compaction coefficient value (as defined by equation (2)) with their respective stochastic factors. The early stage low compaction coefficient c_m,a of the bilinear compaction model is then multiplied by a stochastic factor to obtain the prior ensemble of the relatively higher later stage compaction coefficient c_m,b. The prior ensemble of the relatively higher pre-depletion compaction coefficient c_m,ref of the rate type compaction model is generated by dividing c_m,d by a stochastic factor.
From our a-priori knowledge one can generate an ensemble of predictions for each component of surface displacement (u1, u2, u3). From these ensembles of surface displacements one can generate a prior ensemble of double differences (defined as differences in time of level differences in space, see Section 2.6) for each type of compaction model. The geomechanical forward model (compaction model + influence function) is indicated by the function g working on each vector of prior model parameters. For the sake of clarity, the ensembles Y_0 for each type of compaction model will from now on only refer to the ensemble of prior double difference predictions:

Y_0 = g(M_0) = (g(m_1), g(m_2), …, g(m_{N_e})).

For each type of compaction model one can define a mean and a covariance matrix of the prior double difference predictions. The mean over the N_e members is defined as:

⟨Y_0⟩ = (1/N_e) Σ_{i=1}^{N_e} g(m_i).

The covariance over the members between the jth location and the kth location is defined by

C_{jk} = (1/(N_e − 1)) Σ_{i=1}^{N_e} (g(m_i)_j − ⟨Y_0⟩_j)(g(m_i)_k − ⟨Y_0⟩_k),

which can be written in matrix notation as:

C = (1/(N_e − 1)) Y_0′ Y_0′ᵀ,  where Y_0′ = Y_0 − ⟨Y_0⟩.

The mean of the priors ⟨Y_0⟩ can be seen as the best prior estimate for the various model types. The ensembles can be regarded as the implementation of the prior knowledge, and, assuming a production scenario from time T1 to T2 (that is from 2008 to 2015), they can be used to generate the best prior estimates for the various types of compaction model from T0 to T2. For a real scenario, the times T1 and T2 correspond respectively to the present day and a future geodetic campaign.
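The ensemble mean and covariance above are standard sample statistics; a minimal sketch, with one row per ensemble member:

```python
import numpy as np

def ensemble_stats(Y):
    """Mean and covariance of an ensemble of predictions.

    Y : (n_members, n_data) array of double difference predictions
    Returns the mean (n_data,) and covariance matrix (n_data, n_data)."""
    n_e = Y.shape[0]
    mean = Y.mean(axis=0)
    anomalies = Y - mean                      # "primes": deviations from the mean
    C = anomalies.T @ anomalies / (n_e - 1)   # unbiased sample covariance
    return mean, C
```

This is equivalent to `numpy.cov` on the transposed ensemble, but written out to match the anomaly (primed) notation used in the smoother update.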

GEODETIC OBSERVATIONS
Two types of ground surface displacement measurements were employed for our exercise: optical levelling and GPS. Their spatial distribution with regard to the reservoir mimics real cases of the north of the Netherlands (Figure 4). The levelling measurements give vertical displacements only (subsidence or heave, indicated by u3); GPS also measures the horizontal displacements (West-East component u1, and North-South u2). The additional assimilation of the horizontal displacements is particularly interesting for constraining the shape of the subsidence bowl. Our integrated approach allows flexible incorporation of other measurement types like InSAR, if necessary.
To connect to the actual procedure of taking measurements, the synthetic ground surface displacements were translated to double difference estimates and their corresponding covariance matrix. A double difference indicates a difference in time of a level difference in space [van Leijen et al., 2017], which circumvents the effect of the choice of references in time and space. We considered 90 double differences, including both optical levelling and GPS, for the training period (T0 to T1). They were chosen from one member of our prior ensemble, which was then excluded from the further analysis.
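The definition can be sketched directly in code; the array layout (epochs by benchmarks) is an assumption. The key property is that a double difference is invariant to both the choice of a reference epoch (a constant offset per epoch) and the choice of a reference benchmark (a constant offset per benchmark).

```python
import numpy as np

def double_difference(u, i, j, t1, t2):
    """Difference in time of a level difference in space:
    (u[t2, i] - u[t2, j]) - (u[t1, i] - u[t1, j]),
    for a displacement series u with shape (n_epochs, n_benchmarks)."""
    return (u[t2, i] - u[t2, j]) - (u[t1, i] - u[t1, j])
```

Because of this double invariance, no datum needs to be fixed in either time or space before comparing modeled and measured displacements.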
Along with the simulated observations, the uncertainties of the observations due to measurement noise, and temporal and spatio-temporal idealization noise, were considered, formalized with the full data covariance matrix according to [van Leijen et al., 2017]. Figure 5 presents the full data covariance of the 90 double differences for the training period (T0 to T1). The temporally correlated idealization noise, mostly giving positive off-diagonal entries, describes the effect of benchmark instability. Most negative off-diagonal entries are related to the spatio-temporal components reflecting additional surface motion not associated with the signal of interest, e.g., compaction of shallow soft soil layers. A detailed description of the method used to simulate the noise of the measurements and the full covariance matrix is beyond the scope of the present publication. We simply note here that the full data covariance matrix, describing the uncertainty/variance of each measurement, but also the temporal and spatio-temporal correlations between the measurements, even if often neglected [e.g. Fokker et al., 2012], is essential for a proper conditioning of the model parameters (see later Section 2.7).

CONDITIONING OF THE MODELS WITH THE DATA
The next step consists in confronting, or conditioning, the prior estimates up to T1 with the geodetic data acquired up to T1 (that is from 1986 to 2008).
The ensemble smoother algorithm consists in an inversion scheme whose goal is to minimize an objective function J of the form ([Menke, 1989], [Tarantola, 2005]):

J(m̂) = (d − g(m̂))ᵀ C_d⁻¹ (d − g(m̂)) + (m̂ − m_0)ᵀ C_M⁻¹ (m̂ − m_0),     (19)

where m̂ and g(m̂) are respectively the "optimized" (posterior) vector of model parameters and the corresponding double difference predictions from time T0 to T1. In other words, the objective function is integrated in an inversion scheme seeking the solution for the vector of model parameters m̂ that optimizes the match with the data d and with the prior information m_0. This way, the ensemble smoother conditioning step updates both models and predictions.
The optimal "least-squares" solution of equation (19) for one particular realization, assuming a linear inverse problem with forward operator G, is given by [Tarantola, 2005]:

m̂ = m_0 + C_M Gᵀ (G C_M Gᵀ + C_d)⁻¹ (d − g(m_0)),

which can be rewritten in ensemble form as:

M̂ = M_0 + M_0′ G_0′ᵀ (G_0′ G_0′ᵀ + (N_e − 1) C_d)⁻¹ (D − G_0).     (22)

In equation (22), M_0 is the prior ensemble of model parameters; G_0 represents the result of the non-linear geomechanical forward model working on all the members of the prior ensemble, that is the ensemble of prior double difference predictions. Primes indicate anomalies with respect to the ensemble mean, as M_0′ = M_0 − ⟨M_0⟩ and G_0′ = G_0 − ⟨G_0⟩. Finally, D = (d + ε_1, d + ε_2, …, d + ε_{N_e}) corresponds to an ensemble of double difference data realizations, created by adding to the data vector d random noise vectors ε_i corresponding to the uncertainty range of the measurements. This procedure ensures a posterior error covariance that is consistent with the theory. The background of this was discussed by [Burgers et al., 1998].
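The ensemble update translates almost directly into code. The sketch below is a minimal single-step smoother under the assumptions of row-per-member arrays and an explicit matrix inverse for clarity; a production implementation would use a solver and possibly localization.

```python
import numpy as np

def ensemble_smoother_update(M0, G0, D, Cd, rng=None):
    """One-step ensemble smoother update (ES-SS).

    M0 : (n_e, n_par) prior parameter ensemble
    G0 : (n_e, n_data) prior predictions g(m_i)
    D  : (n_data,) observed double differences
    Cd : (n_data, n_data) full data covariance matrix
    """
    rng = np.random.default_rng(rng)
    n_e = M0.shape[0]
    # perturbed data: one noisy realization of the data per member (Burgers et al., 1998)
    D_ens = rng.multivariate_normal(D, Cd, size=n_e)
    Mp = M0 - M0.mean(axis=0)            # parameter anomalies (primes)
    Gp = G0 - G0.mean(axis=0)            # prediction anomalies (primes)
    C_mg = Mp.T @ Gp / (n_e - 1)         # cross-covariance parameters/predictions
    C_gg = Gp.T @ Gp / (n_e - 1)         # prediction covariance
    K = C_mg @ np.linalg.inv(C_gg + Cd)  # Kalman-like gain
    return M0 + (D_ens - G0) @ K.T
```

For a linear identity forward model and near-perfect data, the posterior ensemble should collapse onto the data, which provides a simple consistency check.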
After the data assimilation step has been performed, that is computing equation (22), the posterior model parameters included in M̂ are used to re-run the compaction models and influence functions to generate the posterior ensemble of ground surface displacements. This implementation is called the Ensemble Smoother with one Single Step of data assimilation (ES-SS).
The non-linearity in the geomechanical forward model can be more appropriately handled by applying multiple steps of data assimilation, the so-called Ensemble Smoother with Multiple Data Assimilation (ES-MDA) [Emerick and Reynolds, 2013a]. In ES-MDA, the output ensemble of the smoother is used as input for the next update. The same data are used for all assimilation steps; to compensate for the effect of the multiple application, the data covariances used in the update steps are increased with respect to their actual value. For each step i, a covariance multiplication factor α_i must be chosen such that Σ_{i=1}^{N_a} 1/α_i = 1 (with N_a the number of assimilation steps). Following [Emerick and Reynolds, 2013a], we apply 4 steps with α_i decreasing with increasing i, giving progressively larger weights to later update steps.
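The multi-step procedure can be sketched as a loop around the single-step update. The decreasing inflation factors shown are the common choice from [Emerick and Reynolds, 2013a] (they satisfy Σ 1/α_i ≈ 1); the forward model is passed as a function, and the array layout is an assumption.

```python
import numpy as np

def es_mda(M0, forward, D, Cd, alphas=(9.33, 7.0, 4.0, 2.0), rng=None):
    """Ensemble Smoother with Multiple Data Assimilation (ES-MDA).

    The same data D are assimilated len(alphas) times, each time with the
    data covariance inflated to alpha_i * Cd; the inverse inflation factors
    must (approximately) sum to one.
    `forward` maps a parameter ensemble (n_e, n_par) to predictions (n_e, n_data).
    """
    rng = np.random.default_rng(rng)
    n_e = M0.shape[0]
    M = M0.copy()
    for alpha in alphas:
        G = forward(M)                                       # re-run the forward model
        D_ens = rng.multivariate_normal(D, alpha * Cd, size=n_e)
        Mp = M - M.mean(axis=0)
        Gp = G - G.mean(axis=0)
        K = (Mp.T @ Gp / (n_e - 1)) @ np.linalg.inv(
            Gp.T @ Gp / (n_e - 1) + alpha * Cd)
        M = M + (D_ens - G) @ K.T                            # "gentle" partial update
    return M
```

Each individual update is damped by the inflation, which is why ES-MDA is less prone to ensemble collapse than the aggressive single-step ES-SS, at the cost of re-running the forward model at every step.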
ES-MDA updates model parameters in a more "progressive" way, relative to the rather "aggressive" single step of the ES-SS. Indeed, even if at the end of the 4 steps of the ES-MDA the actual covariance is honored, during each step of the ES-MDA the update can be seen as relatively "gentle" because of the covariance inflation. Therefore, one can expect a higher ability of the ES-MDA for avoiding ensemble collapse. The downside of ES-MDA is that it is computationally slower; in our case, 4 times slower as we pick 4 steps of assimilation.

PERFORMANCE ASSESSMENT
The performance of the ensemble smoother algorithms (ES-SS and ES-MDA) in updating the prior ensembles towards posterior ensembles consistent with the data can be quantitatively evaluated with different metrics. First we define a metric based on the Absolute Error (AE) ([Baù et al., 2015], [Gazzola et al., 2021]), which is possible since the synthetic data were generated with known parameters:

AE = (1/N_e) Σ_{i=1}^{N_e} |x_i − x_true|.

A second metric, the Average Ensemble Spread (AES), can be defined independently from the true values, as the variation of the values with regard to their average ([Baù et al., 2015], [Gazzola et al., 2021]):

AES = (1/N_e) Σ_{i=1}^{N_e} |x_i − ⟨x⟩|.

Both metrics can be defined independently for the data and for the model parameters. For the comparison of data with model results, the average over all double differences is determined. For the model parameters we calculate the value for each parameter separately, because we want to know the behavior of the separate parameters, and their absolute values differ considerably. Low values of AE correspond to a model close to the truth. Low values of AES indicate that the spread of the ensemble, and thus the model uncertainties, are small.
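Both metrics are one-liners over the ensemble; a minimal sketch, with names ours:

```python
import numpy as np

def absolute_error(ensemble, truth):
    """AE: mean absolute deviation of the ensemble members from the true value."""
    return np.mean(np.abs(np.asarray(ensemble, dtype=float) - truth))

def average_ensemble_spread(ensemble):
    """AES: mean absolute deviation of the members from the ensemble average."""
    e = np.asarray(ensemble, dtype=float)
    return np.mean(np.abs(e - e.mean()))
```

Note that AES requires no knowledge of the truth, so, unlike AE, it can also be evaluated for real (non-synthetic) cases.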
To evaluate the update of the ensemble smoother algorithms (ES-SS and ES-MDA), we evaluate the variation between the prior and the posterior metrics with respect to the prior metrics: (AE_prior − AE_post)/AE_prior and (AES_prior − AES_post)/AES_prior. The closer their value is to unity, the better the estimate is (for AE) or the stronger the constraining power of the smoother, in the sense of reducing the uncertainty of the estimate (for AES). In Section 3, we will only use these normalized values, but we will still call them AE and AES for convenience.
A third class of quality metrics is formed by a χ² test. It evaluates the difference between model results and data, taking account of their covariance. We have defined two formulations: one in which the covariance is formed by the combination of the covariances of both the model ensemble forecasts and the data, and one in which only the covariance of the data is used. The latter does not completely use the ensemble result but incorporates its average only:

χ²_c = (1/N_d) (d − ⟨g⟩)ᵀ (C_d + C_gg)⁻¹ (d − ⟨g⟩),     (27)

χ² = (1/N_d) (d − ⟨g⟩)ᵀ C_d⁻¹ (d − ⟨g⟩).     (28)

The χ² measures (equations (27) and (28)) judge whether the model posterior predictions and the data are consistent. Loosely speaking, the average squared mismatch should be of the order of the covariance. The corrected measure χ²_c employs a combination of the data covariance and the covariance of the posterior prediction. A χ² close to unity means that the model ensemble matches the data and that the quality of the match is in agreement with the error covariance of the data. The non-corrected measure χ² compares the average of the model predictions with the data, with account of the data covariance.
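Both χ² variants share the same quadratic form and differ only in the covariance used; a sketch, with names ours:

```python
import numpy as np

def chi2(d, g_mean, Cd, Cg=None):
    """Normalized chi-squared mismatch between the data d and the mean
    posterior prediction g_mean. If Cg (covariance of the posterior ensemble
    predictions) is given, the 'corrected' measure with combined covariance
    Cd + Cg is computed; otherwise only the data covariance Cd is used."""
    r = np.asarray(d, dtype=float) - np.asarray(g_mean, dtype=float)
    C = Cd if Cg is None else Cd + Cg
    # r^T C^{-1} r / N_d, using a linear solve instead of an explicit inverse
    return r @ np.linalg.solve(C, r) / r.size
```

A residual whose squared size matches the covariance gives a value of one; adding the prediction covariance can only lower the value, so the corrected measure is the more lenient of the two.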

INTRODUCTION OF THE EXERCISE AND CHALLENGES
Our exercise aims to estimate and forecast ground surface displacements from 1986 to 2015 based on synthetic data from 1986 to 2008. We ran reservoir simulations for multiple scenarios from 1986 to 2015. Because the synthetic data correspond to one member of the prior ensemble we have data and models from 1986 to 2015. Our predictions from 1986 to 2015, based on model conditioning from 1986 to 2008, can therefore be directly checked against our synthetic measurements from 1986 to 2015.
Each type of compaction model corresponds to one separate prior ensemble. Each of the four prior ensembles is based on the initial ensemble of 76 flow reservoir simulations. Uncertainties in the compaction model and the influence function are then superposed to the uncertainties in the pressure depletion, resulting in an ensemble size of 760 members for each of the four types of compaction model. At each step of the modelling workflow, the full spectrum of uncertainties is thus properly accounted for. One member of the bilinear prior ensemble is randomly picked as synthetic data and excluded from the set that is used for the assimilation exercise.
The prime scientific challenge that we are tackling with our exercise is: does our integrated approach, combining forward models (and their uncertainties) with the ensemble smoother algorithms, help to discriminate which type of compaction model best explains the synthetic data? We would hope that the integrated approach should easily identify the bilinear compaction model as the best model. However, there are at least three reasons that might complicate this identification. The first one is that the non-linearity between the pressure depletion and reservoir compaction, imposed by the bilinear compaction model, can potentially be reproduced by the time decay and rate type compaction models. It might thus be a real challenge for our integrated approach to discriminate between the bilinear, time decay and rate type models, and to identify the bilinear compaction model as the model that explains the data best. The second reason is that the large uncertainties at both the pressure depletion level and the influence function level might blur the differences between the spatio-temporal signatures of each type of compaction model. The third reason concerns the geodetic synthetic data and their spatio-temporal distribution. To mimic real field cases of the north of the Netherlands, the geodetic benchmarks were not distributed uniformly in space and time with regard to the complex reservoir and aquifer. This uneven distribution of the observations, and their intrinsic errors (mapped in the full covariance matrix, see Figure 5), might complicate the ability of the integrated approach to discriminate between each type of compaction model.
The less aggressive multi-step update of the ES-MDA helps to "better constrain" most of the model parameters. "Better constrain" means that the reduction of the parameter uncertainties is more effective with the ES-MDA than with the ES-SS.
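The "less aggressive" multi-step character of the ES-MDA can be illustrated with a scalar toy example (our own sketch, not the paper's implementation): the single update of a plain ensemble smoother is split into several steps, each using an inflated data-error covariance, with inflation factors alpha_i satisfying sum(1/alpha_i) = 1.

```python
import numpy as np

def esmda_update(params, preds, data, data_var, alpha, rng):
    """One ES-MDA step for a scalar parameter and a single observation.

    params : (n,) ensemble of parameter values
    preds  : (n,) corresponding model predictions
    alpha  : error-covariance inflation factor for this step
    """
    c_pd = np.cov(params, preds)[0, 1]       # parameter-prediction covariance
    c_dd = preds.var(ddof=1)                 # prediction variance
    gain = c_pd / (c_dd + alpha * data_var)  # Kalman-type gain
    # Perturb the datum with the inflated error covariance
    perturbed = data + np.sqrt(alpha * data_var) * rng.standard_normal(params.size)
    return params + gain * (perturbed - preds)

rng = np.random.default_rng(1)

# Toy forward model: prediction equals the parameter (identity map), datum 2.0
prior = rng.normal(0.0, 1.0, size=200)
data, data_var = 2.0, 0.05

# Four assimilation steps; the inflation factors must satisfy sum(1/alpha) = 1
alphas = [4.0, 4.0, 4.0, 4.0]
ens = prior.copy()
for a in alphas:
    ens = esmda_update(ens, ens, data, data_var, a, rng)

# The multi-step update reduces uncertainty and pulls the ensemble to the datum
assert ens.std() < prior.std()
assert abs(ens.mean() - data) < 0.3
```

Each inflated step applies only a fraction of the full correction, which is why the accumulated update is gentler on the parameter distributions than a single aggressive smoother step.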
For the model parameters of the influence function, used to propagate the reservoir compaction into ground surface displacement, only the ratio between the Young's modulus of the top layer (from the ground surface to -3800 m depth, i.e. including the reservoir) and the Young's modulus of the bottom layer (i.e. the basement) is updated significantly during the data assimilation procedure. Interestingly, this update of the Young's modulus ratio is very similar for all types of compaction models.

Model parameters
For the bilinear compaction model, where we know the true model parameters used to generate the synthetic data, the visual-inspection-based conclusion about the parameter distributions is confirmed by the performance metrics. The metrics show more constraining power for the ES-MDA than for the ES-SS (Figure 8): the posterior ensembles of model parameters obtained after the fourth assimilation step of the ES-MDA are less scattered and more centered around the true set of model parameters. Figure 8 also shows that the Young's modulus of the basement is more constrained by the data assimilation than the elasticity of the top layer. The conditioning of the ratio of Young's moduli towards the true value, as visualized in Figures 6 and 7, is thus dominated by the update of the Young's modulus of the basement.

Ground surface displacements
Figures 9 to 12 display the prior and posterior ensembles of ground surface displacements for both the ES-SS and ES-MDA. The results are shown for two specific geodetic benchmark locations, one on top of the reservoir (location 18, Figure 4) and one on top of the aquifer (location 8, Figure 4). For both ensemble smoother algorithms (ES-SS and ES-MDA), a qualitative assessment of the curves indicates that the spread of each posterior is smaller than the spread of its prior. The vertical ground surface displacements of the posterior ensembles are mostly "well aligned" with the data in the sense that the spread of the posterior ensembles encompasses the data. Only for the linear compaction model, and for the combination of the rate-type compaction model with the ES-SS, is the alignment suboptimal. With the exception of the rate-type compaction model with the ES-SS, the horizontal ground surface displacements (West-East component u1), even though of low magnitude, are relatively "well constrained" by the posterior ensembles for both the reservoir benchmark and the aquifer benchmark.
The posterior ground surface displacements at the benchmark location on top of the aquifer (benchmark location 8, Figure 4) are in better agreement with the data than the movements calculated with the prior ensemble. In fact, the posterior means moved very close to the data during the assimilation procedure. This is remarkable given (i) the high uncertainty of the aquifer depletion and (ii) the low magnitudes of the ground surface displacements on top of the aquifer compared to those on top of the reservoir. Interestingly, for the benchmark location on top of the aquifer, the relative bandwidths of the posterior horizontal ground surface displacements (with the ES-MDA after four steps of assimilation), defined as the ratio between the standard deviation and the mean of the ensemble, are smaller than those of the posterior vertical displacements. As an example, for the bilinear compaction model, the relative bandwidth of the posterior horizontal ground surface displacements on top of the aquifer is 0.57, whereas for the posterior vertical ground surface displacements it is 1.01. This highlights the importance of using the horizontal ground displacements to constrain the shape of the subsidence bowl on the sides of the reservoir and, ultimately, to constrain the aquifer activity. The suboptimal update obtained with the ES-SS for the ground surface displacements of the rate-type compaction model (Figure 12) is confirmed by its lower quality metrics compared with the bilinear and time decay ensembles (Figure 13). For the rate-type compaction model, the performance differences between the ES-SS and ES-MDA are larger than for the other models. It seems that for a complex compaction model with a higher number of model parameters, such as the rate-type model, the need for a less aggressive, multi-step assimilation procedure is more pronounced.
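The relative bandwidth used above is simply the coefficient of variation of the posterior ensemble. A minimal sketch (the displacement numbers below are illustrative, not the paper's values):

```python
import numpy as np

def relative_bandwidth(ensemble):
    """Ratio of ensemble standard deviation to ensemble mean."""
    ensemble = np.asarray(ensemble, dtype=float)
    return ensemble.std(ddof=1) / ensemble.mean()

# Illustrative posterior displacement ensembles (mm) at an aquifer benchmark
horizontal = [2.0, 2.5, 3.0, 3.5, 4.0]
vertical = [1.0, 2.0, 4.0, 6.0, 7.0]

# The smaller relative bandwidth indicates the better-constrained component
assert relative_bandwidth(horizontal) < relative_bandwidth(vertical)
```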
The suboptimal alignments of the posterior ground surface displacements of the linear compaction model (Figure 9) are confirmed by its poor χ² values (i.e. χ² ≫ 1 for both the corrected and non-corrected measures) compared to those derived for the other compaction models (Figure 13).
An important question concerns the reliability of the subsidence forecasts. Figure 13 presents the quality metrics both for the training period [1986-2008] and for the full period [1986-2015], which includes the forecasting years for which no measurements were used in the assimilation. We see that the metrics for the full period are not significantly different from the metrics for the training period. In particular, the corrected and non-corrected χ² for the bilinear model do not increase with the extension of the evaluation period. This indicates that the forecast of the bilinear model is trustworthy.

ACCOMPLISHMENTS OF THE INTEGRATED APPROACH
The main objective of our exercise was to test whether our integrated approach could effectively identify the bilinear compaction model as the best compaction model to explain the synthetic data (themselves generated with a bilinear compaction model) and make trustworthy forecasts. Our analysis has demonstrated that, with the combination of the performance metrics, it was possible to identify the bilinear and rate-type models as the best performing compaction models. The additional corrected and non-corrected χ² metrics allowed us to clearly identify the bilinear compaction model as better performing than the rate-type model. Our integrated approach is thus indeed capable of identifying the main driving compaction model. This is especially remarkable as our exercise incorporated the full spectrum of uncertainties from the reservoir flow to the modelling of subsidence. To the best of our knowledge, no previous studies have honored this full spectrum. Even though the large uncertainties in the ensembles of pressure depletion scenarios and influence functions were effectively "blurring" the spatio-temporal signature of the reservoir compaction, the integrated approach successfully recognized the bilinear compaction as the best model to explain the synthetic data.
It is important to stress here that the pressure depletion realizations were not subjected to the assimilation procedure. Even if the uncertainties in the pressure depletion were kept constant throughout the assimilation procedure, the model parameters of the bilinear compaction model were well constrained and aligned with the true-synthetic data. Both footprints in the subsidence signal, of (1) the pressure depletion and (2) the bilinear compaction, were thus effectively disentangled by our integrated approach.
In addition, our analysis has demonstrated that the update of the model parameters of the influence function is independent of the type of compaction model. Indeed, for each type of compaction model, the posterior ensembles of the Young's modulus ratio (between the two subsurface layers) similarly converged towards the true value. This last result indicates that our integrated approach was again capable of disentangling the unique footprint of the elastic layering on the subsidence signal from the footprints of the pressure depletion and compaction.

DISCUSSION AND CONCLUDING REMARKS
We have presented and validated a robust ensemble-based probabilistic approach for forecasts of ground surface displacements. The integrated approach, which we have coined ESIP (Ensemble-Based Subsidence Interpretation and Prediction), reduces the model parameter uncertainties by assimilating subsidence data and it gives a probabilistic forecast with well-defined error bounds. ESIP is not only intended to make testable predictions of surface movement but also to constrain the driving physical processes. Honoring the full spectrum of processes and uncertainties attached to it, from the reservoir flow to subsidence, we demonstrated that the compaction at depth can be effectively constrained. The identification of the driving physical mechanisms is especially important for the subsidence forecasts. As an example, in the possible scenario of a reservoir production shutdown, the subsidence forecast will be very different for the different types of reservoir compaction models at work. Besides constraining the active type of reservoir compaction model at depth, the integrated approach effectively identified the parameters of that model and the elastic layering properties of the subsurface. The latter directly controls the influence function translating the reservoir compaction at depth to ground surface displacements.
The main limitation of the current workflow is that the ensemble of reservoir flow simulations is unchanged upon assimilation. Ideally, one should update the controlling reservoir parameters and make a new reservoir simulation for every ensemble member; this should be part of future developments. Regarding potential future implementations, it is also important to exploit the versatility of applications offered by ensemble smoother algorithms. Indeed, the ES-SS has recently been deployed, in our companion paper [Candela et al., 2021], on a real dataset to identify the driving mechanisms of induced seismicity at the Groningen gas field. This specific scenario demonstrates that the ensemble smoother algorithms can also be applied to a discrete dataset, such as an earthquake catalog. The flexibility of the implementation of the ensemble smoother also offers the possibility to assimilate multiple types of data. As an example, by jointly assimilating subsidence geodetic observations and seismicity data, our understanding of the driving mechanisms of both induced subsidence and seismicity might be improved. An active research area in the Netherlands is the disentangling of the relative contributions of deeply seated sources of subsidence (such as natural gas extraction) and shallow seated sources of subsidence. These sources often interfere. Incorporating in ESIP the additional forward models for shallow seated sources of subsidence, such as soil compaction and oxidation caused by lowering of the phreatic level, might help to achieve this disentangling [Candela et al., 2020].

Figure 3: Top-views of the pore pressure distribution in the reservoir and connected aquifer for two members of the ensemble of flow simulations. Top: pore pressure pre-production [1986]. Bottom: pore pressure at the end of production [2015].
The blue rectangle of zero pore pressure, on the south side of the model, corresponds to an intentionally cropped volume (non-active zone) intended to mimic the complex geometry of real field cases. Also for the sake of capturing structural complexities of real field cases, east-west and north-south corridors of relatively smaller grid-block volumes are visible.

Figure 8: Performance metrics (over the training period) for the model parameters of the bilinear ensembles.