A comparison of Bayesian inference and gradient-based approaches for friction parameter estimation

Numerical tidal models are essential to the study of a variety of coastal ocean processes, but typically rely on uncertain inputs, including a bottom friction parameter which can in principle be spatially varying. Here we employ an adjoint-capable numerical ocean model, Thetis, and apply it to the Bristol Channel and Severn Estuary, using a spatially varying Manning coefficient within the bottom friction parameterisation. The spatial variation in the coefficient is a priori constrained by a categorisation of the sediment type found on the sea bed into three groups: rock, sediment containing gravel, and sediment containing only sand. We compare two calibration methods to estimate the three corresponding Manning coefficients using tide gauge observation data. The first method consists of Bayesian inversion via a Markov Chain Monte Carlo algorithm, using a Gaussian process emulator as a surrogate for the full numerical model, while the second uses a gradient-based approach via the adjoint mode of the numerical model. We first apply these methods to a ‘synthetic’ experiment, then to the assimilation of real data; in each experiment we compare the results from each method and their respective computational cost. We further find that the use of the estimated Manning coefficients also reduces the model-observation misfit when tested within an independent numerical model, TELEMAC-2D, indicating that the calibration procedure has identified non model-specific and physically meaningful parameters.


Introduction
1 Numerical coastal ocean, and specifically tidal, models have a wide range of applications; studies of tidal degrees of freedom, the use of adjoint-capable models (Chen et al., 2014, Lu andZhang, 2006, Maßmann, 43 2010b,a, Zhang et al., 2011, Heemink et al., 2002 (or other gradient-calculation methods (Sraj et al., 2014a)) 44 is appropriate, as they allow the gradient of the model-observation misfit to be computed efficiently with 45 respect to an arbitrarily large set of input parameters. This gradient information permits the use of efficient using Kalman filters has also been used for friction parameter estimation (Mayo et al., 2014), but is more 61 commonly used in the context of state estimation, or joint state and parameter estimation (Evensen, 2009), 62 and is not considered here. 63 In this work, we implement and compare two methods for the calibration of bottom friction parameters 64 via the assimilation of observation data, using the Thetis numerical coastal ocean model. The first uses a described in Section 3. Results from a synthetic experiment and the assimilation of real data are described 79 in Section 4, these results are discussed in Section 5, and we draw conclusions in Section 6. In this work we consider the Bristol Channel and Severn Estuary region on the UK west coast as a real 83 world case study location and use data from two sources for the purposes of model calibration and validation: 84 (i) 11 locations at which tidal harmonic data is available (National Oceanography Centre, personal com-85 munication 2018), which are shown as green squares in Fig. 1. To compare modelled results with 86 these data, the model must be run for a suitably long time (ideally a month or more), and a harmonic 87 analysis performed at these locations. 88 (ii) Five tide gauges where quality controlled timeseries surface elevation data is available from the British where η is the free surface elevation, H = η + h is the total water depth, h is the bathymetry (measured 100 positive downwards), u is the two-dimensional depth-averaged velocity, F C is the Coriolis force, g is the 101 acceleration due to gravity, ρ is the water density, τ b is the bottom stress due to friction between the ocean 102 and sea bed, and ν h is the kinematic viscosity. Here, we parameterise the bottom friction τ b via a Manning's 103 n formulation to the bathymetry in order to avoid negative water depth. This scheme is adopted since its formulation 117 involves only a modification to the underlying equations; it therefore poses little additional complexity in 118 differentiation of the model using numerical adjoint methods. This scheme introduces an additional wetting-119 drying parameter α, which controls the transition from wet to dry regions, and is user-defined. In all Thetis 120 simulations presented herein, α is taken to be 0.5 m.

121
The mesh used for the simulations here is shown in Fig. 1 problem with this approach is that "a substantial amount of manual intervention is still necessary" (Sraj In addition to Thetis, we also use the TELEMAC-2D model (Hervouet, 2007), henceforth referred to as   The GPE also exports a covariance matrix encapsulating the uncertainty in the GPE estimate. The GPE 188 output interpolates the training data, with zero covariance at training points.

189
The training data is generated from a harmonic analysis (at the 11 locations) of Thetis model outputs.

190
The Thetis training runs were configured using 40 samples from the input parameter space, drawn from   Fig. 1). The set of emulator outputs, estimated for a vector of Manning coefficients n = (n 1 , n 2 , n 3 ) T , is where Π is the posterior distribution of the parameters n given the observed data {y C }, L is the likelihood of 214 observing the outputs {y C } given the parameters n, and q is the prior distribution of each of the parameters 215 n i , which we take to be an uninformative prior in the range [0.01, 0.05] and hence For a constituent C, we assume that the model-observation discrepancies, which are the components of 217 the vector y C − G C (n), are independent and identically distributed variables with zero mean and variance 218 σ 2 C . The likelihood L({y C }|n) for a set of constituents C is therefore given by The covariance in the model outputs G C (n) due to the use of the GPE, which is estimated as part of the 220 GPE evaluation, is assumed to be negligible compared to the variances σ 2 C , and neglected within the Bayesian 221 inversion. Since these σ 2 C are unknown a priori, they are treated as hyperparameters, i.e. they are included 222 as additional parameters to be inferred by the MCMC algorithm. We denote the full vector of unknowns 223 θ = (n 1 , n 2 , n 3 , log σ 2 C1 , log σ 2 C2 , . . . ), and the full posterior distribution is therefore given by For the unknown variances σ 2 C , the only prior information is that they must be positive. We therefore follow 225 Sraj et al. (2014b) and assume Jeffreys priors (Sivia and Skilling, 2006), such that The posterior distribution Π(θ|{y C }) of Eq. (6) gives the probability distribution of the unknown Man- Hastings MCMC algorithm (Hastings, 1970), which is given by Algorithm 1. The algorithm requires the 232 selection of an appropriate proposal distribution covariance matrix, Σ step , governing the size of the random 233 steps within the parameter space. We set so that the random steps in each of the Manning coefficients have zero mean and a standard deviation of 235 0.001 s m −1/3 , and the random steps in each log σ 2 C have zero mean and a standard deviation of 0.1. These 236 step sizes were found to give satisfactory results, without the need for an adaptive MCMC algorithm.

255
The challenge in assimilating this data via a numerical adjoint model is the high demand on placed on 256 computational resources, in particular memory, by adjoint models. This is compounded by the definition of 257 the misfit between the model and harmonic tide gauge data, which requires a sufficiently long forward run 258 that a harmonic analysis may be performed on the simulated timeseries. In order to limit the computational 259 cost of assimilating harmonic data, we opt to assimilate only the M2 harmonic, so that a short assimilation where A M,i and A O,i are the modelled and observed M2 amplitudes at the 11 gauges i, respectively. Note 267 that the minimisation of this misfit is equivalent to the maximisation of the posterior of Eq. (6) for a single 268 constituent, i.e. this method constitutes maximum likelihood estimation. We therefore expect results from 269 this calibration method to be consistent with those from the MCMC method applied to the M2 constituent.

270
The advantage of this method over the MCMC approach is its use of an efficient gradient-based algorithm 271 via the use of the adjoint model, without needing to approximate the numerical model with a surrogate.

272
However, the computational resources available for this work prevent a thorough harmonic analysis based 273 on more than one tidal constituent. of Manning coefficients, before the 24-hour assimilation period.

283
For comparisons between the model and this timeseries data, the misfit functional is defined as where η M i and η Oi are the modelled and observed surface elevations at each of the five tide gauges i, 285 respectively, and the time integral spans the 24-hour period described above and is approximated by a 286 suitable discrete method applied to the 15 minute data intervals.

287
This method has the advantage that each optimisation iteration requires forward and adjoint model runs   We again simulate uncertainty in the observations by specifying a fixed σ 2 C = 0.0025 m 2 for all con-313 stituents C. The resulting joint PDF is shown in Fig. 3  shows that assimilation of additional data can be used to overcome uncertainty in observation data, 317 resulting in a tighter constraint on the unknown parameters. However, the high covariance between 318 the estimated rock and gravel friction parameters remains present.

Assimilation of real data
We now apply each calibration method to the assimilation of real data, with results summarised in Tables 333 3 and 4. We observe the following:   The convergence for the assimilation of timeseries data is shown in Fig. 6  this is a comparison using a greater number of harmonics than used by any of the calibration methods.

369
This measure will indicate how successful the assimilation of timeseries data, or different combinations 370 of harmonic constituent data, has been in improving overall harmonic-based error.

371
(iii) The same eight-harmonic NRMSE as in (ii) above, using results from Telemac. The use of a second 372 numerical model will indicate how successful each calibration method has been in identifying non-model 373 specific friction parameters.

374
The results for each of these misfit measures are summarised in Table 5. For comparison purposes we 375 have also included optimal results using single-coefficient (spatially uniform) friction parameters for each 376 misfit measure, which were determined from a brute force approach using model runs with values of n from    Uniform n = 0.03 s m −1/3 5.1% Uniform n = 0.0275 s m −1/3 5.9%  implemented adjoint model is typically greater than that of the forward model by a multiplier slightly larger 425 than unity (Griewank and Walther, 2008). For the Thetis adjoint model, we find that this multiplier is 426 approximately 3.6. The gradient-based calibration using timeseries data is therefore the least computationally 427 expensive of the methods used within this work, due to its short 1. approach; specifically, we obtain the full posterior distribution, and can therefore estimate uncertainties in 445 the estimated friction parameters. We therefore suggest that, for the three-dimensional input parameter 446 space used here, the MCMC approach has the potential to be more computationally efficient, by providing 447 more information than the gradient-based approach, at a similar computational cost. this essentially requires that the assimilated data are sufficient to constrain the unknown parameters). In 464 this study, we used a three-dimensional parameter space for the unknown Manning coefficient, and have 465 concluded that the MCMC and gradient-based approaches can be applied at similar computational cost.

466
Given the considerations above, we suggest that for a parameter space of four or more dimensions, the 467 computational cost of the MCMC approach would exceed that of the adjoint approach, whereas for two or 468 fewer dimensions, the emulator training would be cheaper than the adjoint approach.

469
Lastly, we note that the availability of timeseries tide gauge data is typically limited to locations on the all of these calibration methods are found to be consistent, within the uncertainty estimates resulting from 486 the MCMC approach.

487
The adjoint approach using timeseries data is the least computationally expensive approach due to its 488 short assimilation window, but for a general application is dependent on the availability of timeseries data at 489 multiple locations within the same time period. For the three-dimensional parameter space used within this 490 work, we have shown that the assimilation of harmonic data via gradient-based and MCMC approaches can 491 be applied at similar computational cost. Since the MCMC approach provides uncertainties in the resulting 492 parameter estimates, which can be valuable information in solving inverse problems, it is therefore more 493 efficient for our purposes. However, considering a more general parameter estimation problem, the adjoint 494 approach will scale better with increased degrees of freedom in the unknown parameter space, and we suggest 495 that for an input parameter space of four dimensions or more, adjoint methods will be more efficient than an 496 emulator-based MCMC approach, whereas for two dimensions or fewer, the MCMC approach will be more 497 efficient. In general, a choice of calibration method should take into account the dimension of the input 498 parameter space, the availability of data and the computational resources available.

499
Using the M2-only MCMC calibration result, we suggest a best estimate of the Manning coefficients for 500 our rock, gravel and sand sediment groups to be 0.037 ± 0.07, 0.029 ± 0.009 and 0.014 ± 0.003 s m −1/3 , 501 respectively. These parameters are found to decrease multiple measures of model-observation misfit across 502 two numerical models, which suggests that the calibration has avoided excessively correcting for model-503 specific numerical errors, and therefore that these parameter estimates are physically meaningful.