A Bayesian Level Set Method for Identifying Subsurface Geometries and Rheological Properties in Stokes Flow

SUMMARY We aim to simultaneously infer the shape of subsurface structures and material properties such as density or viscosity from surface observations. Modeling mantle ﬂow using incompressible instantaneous Stokes equations, the problem is formulated as an inﬁnite-dimensional Bayesian inverse problem. Subsurface structures are described as level sets of a smooth auxiliary function, allowing for geometric ﬂexibility. As inverting for subsurface structures from surface observations is inherently challenging, knowledge of plate geometries from seismic images is incorporated into the prior probability distributions. The posterior distribution is approximated using a dimension-robust Markov-chain Monte Carlo sampling method, allowing quantiﬁca-tion of uncertainties in inferred parameters and shapes. The effectiveness of the method is demonstrated in two numerical examples with synthetic data. In a model with two higher-density sinkers, their shape and location are inferred with moderate uncertainty, but a trade-off between sinker size and density is found. The uncertainty in the inferred is signiﬁcantly reduced by combining horizontal surface velocities and normal traction data. For a more realistic subduction problem, we construct tailored level-set priors representing “seismic” knowledge


INTRODUCTION
Imaging structures at depth, in the crust, lithosphere and mantle, and constraining their mechanical properties is a major challenge in geodynamics. While seismic tomography provides images of the lithosphere and mantle, they tend to be blurry with no direct translation from reconstructed velocity anomalies to rheology. Combining seismic images as prior knowledge with models for the mechanical long-term behavior of the mantle along with consistency between model predictions and observational data is a promising approach. By modeling mantle flow using the Stokes equations, we aim at inferring subsurface structures and material parameters from surface measurements of plate velocities and normal tractions while incorporating seismic information as prior knowledge.
Computations with varying geometries are challenging, in particular when the topology may change. For many physics simulations, level set and phase field methods have been established as flexible and efficient tools accomplishing this (e.g. Osher & Sethian 1988;Boettinger et al. 2002).
Inferring geometries from observational data is particularly challenging when access to data is limited and if one is additionally interested in quantifying uncertainties in the inferred parameters.
When aiming to identify high-dimensional parameter fields in geodynamics, the results often suffer from blurry edges due to the ill-posedness of the problem (Worthen et al. 2014). Therefore, Corresponding author one may resort to pre-defined regions in which one infers spatially restrictive constants (Baumann & Kaus 2015;Reuber et al. 2020). While this approach does not allow one to infer geometric features, it reduces the dimension and thus allows for quantification of uncertainties using sampling methods. Alternatively, one can rely on parameterizations of geometric structures. These include explicit parameterizations of geometric interfaces (Carpio et al. 2020), for which a moderate number of parameters may be sufficient. However, this approach limits geometric flexibility. Implicit parameterizations such as describing geometric structures as level sets of an auxiliary function has become a particularly popular tool for geometric inverse problems due to the shape flexibility it provides (Santosa 1996).
In the context of seismic imaging and full waveform inversion, parametric level set methods have been employed to identify subsurface salt bodies (Kadu et al. 2017), whereas a nonparametric level set method has been used to improve image resolution in travel-time tomography (Muir & Tsai 2019). Additionally, the level-set technique has been applied to joint inversion of density contrast and seismic slowness using gravity and travel-time data (Li & Qian 2016). The above papers deal with a purely deterministic setting and do not allow for quantification of uncertainties. Recently, methods to estimate geometric uncertainties have been applied to level-set based travel-time tomography (Muir et al. 2022). In general, Bayesian inference is a powerful tool to infer geometric uncertainties. In this approach, prior beliefs are stated in terms of probabilities and the solution to the inverse problem is not one particular shape but instead a probability distribution for the shapes. On the other hand, the curse of dimensionality typically forces one to either work in low dimensions or use dimension-robust inference methods, which is the approach taken here.
Some geodynamic inverse studies have assumed the geometry of the present-day subsurface (e.g., mantle structure) using adjoints of the Stokes equations while employing surface velocity measurements (Ratnaswamy et al. 2015;Rudi et al. 2022). These studies have proven to be efficient at recovering rheological parameters and the covariance between them. However, the models are limited in their ability to decipher the trade-offs between rheology and density or assessing the role of uncertain geometry. Other studies have attempted to recover geometry in the geological past through an integration of the convection equations backward in time (Bunge et al. 2003;Liu et al. 2008).
Here, we build on Iglesias et al. (2016) and use flexible geometric priors for binary functions with parameter-to-observable maps that are expensive to evaluate. We propose a method to construct tailored level-set priors with adjusted mean and variance representing seismic knowledge, building a bridge between seismic and mechanical models. Then, we study the trade-offs between geometry, density, and viscosity using a Bayesian framework in which dimension-robust sampling methods are employed to approximate the posterior distribution and quantify uncertainties in the inferred shapes and material parameters. Finally, we investigate how informative different observation types are and demonstrate the benefits of combining them.

FORWARD PROBLEM
Over long time scales, the slow deformation of rock can be modeled as an incompressible fluid modeled by the steady-state Stokes equations on a spatial domain Ω: where the Boussinesq approximation is employed. Here, σ is the stress tensor, ρ the density field, and g the gravitational acceleration vector. The velocity and pressure fields are denoted by u and p, respectively,ε(u) = 1 2 (∇u + ∇u T ) is the strain rate tensor, and I the second-order identity tensor. We assume the viscosity η to be independent of the velocity and pressure such that the above system of partial differential equations (PDEs) is linear. However, the viscosity may be spatially varying. For illustration, we focus on a bounded rectangular domain Ω ⊂ R 2 .
In order to close the system, we need to define appropriate boundary conditions. Let ∂Ω denote the boundary of Ω. We consider no normal flow and free-slip boundary conditions everywhere on where n denotes the unit-length outer normal vector, and T := I −n ⊗ n the tangential projection on ∂Ω with the outer product ⊗, i.e., n ⊗ n = nn T .
Since we discretize the governing equations using a finite element method (see Section 6), we require the weak form of the equations (1) and (2). With the free-slip boundary conditions, the pressure is only unique up to a constant. To eliminate this additional degree of freedom, we consider only square-integrable pressure fields with zero mean. Let Multiplying equations (1) and (2) with test functions v ∈ V and q ∈ L 2 0 (Ω), respectively, integrating by parts and adding the equations yields the weak form of the Stokes equations: Find for all (v, q) ∈ V × L 2 0 (Ω). The boundary integral in Green's first identity vanishes due to the free-slip boundary conditions (4) and (5) and thus does not occur in (7). Standard theory about linear saddle point problems (e.g. Girault & Raviart 1986) yields existence of a unique solution of equation (7) under the assumption ρ, η ∈ L ∞ + (Ω), i.e., the density and viscosity are essentially bounded from above and below by positive constants.

INVERSE PROBLEM
Although mantle structures can be inferred with seismic or with electrical and magnetic imaging methods, rheological parameters cannot be directly determined through measurements, and density is poorly determined at regional and smaller scales. Consequently, we estimate rheology and density by solving an inverse problem. More precisely, we are interested in recovering a vector m of unknown parameters (e.g. density, viscosity, etc.) from a finite number of noisy measurements y ∈ R d , i.e., where δ ∈ R d models measurement and model errors, and G is the parameter-to-observable map.
In our case, G involves solving the Stokes equations in their weak form (7) given the parameters m and evaluating the solution at the measurement locations. Inverse problems governed by partial differential equations are ill-posed, i.e., they do not have a unique solution or small perturbations in the data may lead to large differences in the reconstructions (e.g. Hanke 2017). There are several ways of dealing with this issue, e.g., regularizing the problem by incorporating additional penalty terms in a deterministic inversion (e.g. Engl et al. 1996). Instead, we will overcome this issue by formulating the inverse problem in the Bayesian framework in Section 3.1, which has the additional advantage of providing information about uncertainties in the parameters.
One of our aims is to investigate the effect of different data types on the inversion. In all calculations, the data y is observed on the upper boundary of the domain, which corresponds to the Earth's surface. We will consider two data types individually and in combination: (i) horizontal velocities u 1 and (ii) normal traction σ n = n · σn, which is related to dynamic topography. We assume that the normal traction has zero mean as topography in practice is measured with respect to some reference.

Bayesian Inversion
We briefly introduce Bayesian inversion as applied here, but a more detailed introduction can be found in Kaipio & Somersalo (2005). In this approach, all quantities in (8) are treated as random variables and we assume some prior knowledge on our parameters of interest in the form of probability distributions. We assume all components of the parameter vector m to be mutually independent and denote the joint Gaussian prior distribution m ∼ µ 0 = N (m 0 , C 0 ). Additionally, we assume that the noise vector is independent of m and normally distributed with mean zero and covariance matrix Γ, δ ∼ N (0, Γ). The goal of Bayesian inversion is to find and describe the posterior probability distribution µ y , which is a conditional distribution of the parameters m given the data y and prior distribution µ 0 . This posterior distribution is defined through Bayes' law. Since our inversion parameter vector m contains a function, the infinite-dimensional version of Bayes' law is needed (e.g. Stuart 2010): where J (m) : and we dropped the normalization constant in (9) because it is not needed for the methods used here. Note that J is the least-squares data misfit weighted by the inverse noise covariance matrix and thus the posterior distribution takes its largest values around minimizers of J , linking Bayesian inversion to deterministic approaches to solving inverse problems.

Level Set Parameterization for Uncertain Geometries
The rheological parameters of interest often exhibit large variation over small length scales. Therefore, we assume them to be piecewise constant functions. As we are primarily concerned with inferring geometric structures, we impose as few restrictions as possible on the interfaces between different structural entities, which we refer to as phases. This is achieved by using a Bayesian version of the level set method that was introduced in Iglesias et al. (2016). The main idea is to describe interfaces as level sets of a smooth function and infer this function from observation data.
Note that different from classical level set methods that use the level set function to evolve shapes in some time-dependent or iterative process, here level set functions are only used to define and sample distributions of geometric subsurface structures. Next, we detail level set parameterizations using the density field ρ as an example.
We denote the indicator function of a set A ⊆ Ω with 1 A , i.e., Let for some ρ i > 0, Ω i ⊆ Ω, i = 1, . . . , n, with Ω i ∩ Ω j = ∅ for i = j. The sets Ω i determine the shape of the different phases. The idea of the level set method is to parameterize these sets with the help of a smooth function h : Ω −→ R, the so-called level set function. For given thresholds c i ∈ R, i = 0, . . . , n, we define such that the shape of the phases is fully determined by the level set function h. This leaves considerable flexibility in describing the geometric structure of ρ and allows us to change the interfaces implicitly by updating h.
The level set function is naturally linked to the physical parameter of interest ρ through the so-called level set map F defined via where ρ = (ρ 1 , . . . , ρ n ) T . This procedure is visualized in Fig. 1. With this parameterization of the density, we can infer the smooth level set function h rather than the piecewise constant function ρ. For clarity, we simplify notation and refer to the adjusted parameter-to-observable map as G.
Note that by choosing the thresholds c i in (14) an assumption is made on the relative area of the different phases. Furthermore, it is not necessary to specify the phase values, ρ i , of ρ a priori.
By incorporating these scalar values in the definition of F in (15), they can be included in the parameter vector m for the inverse problem. We will follow this procedure in the computations.
As shown in Iglesias et al. (2016), the level set map F and hence the adjusted parameter-toobservable map are discontinuous, leading to difficulties if the problem is treated in a deterministic context. This is another reason for choosing the Bayesian framework for the inverse problem. It also means that the new parameter-to-observable map is nonlinear even if the data depend linearly on the physical parameters of interest. In the following, we focus on linear Stokes equations to demonstrate the method for the sake of simplicity. However, the framework is also applicable to nonlinear rheologies.

Prior Distribution for the Level Set Function
The prior measure on the level set function h is Gaussian, N (h 0 , C 0,h ), with mean h 0 and covariance operator C 0,h . In the function space setting, it is common to choose C 0,h to be a power of an inverse differential operator. This allows exploitation of fast PDE solvers for sampling from the distribution and yields control over the samples' regularity (Bui-Thanh et al. 2013;Stuart 2010).
In particular, we define the covariance operator C 0,h on the domain Ω in terms of the PDE operator A as where I is the identity and ∆ the Laplace operator. The use of inverse elliptic operators as covariance operators is motivated by their relationship to Matérn covariance functions, and an empirical relation between τ and α to the correlation length of the resulting random fields can be found in Lindgren et al. (2011). In particular, the constant τ ≥ 0 controls the (inverse) correlation length, whereas α > 0 determines the regularity of the samples. For example, increasing τ leads to small-scale features in realizations of the random field, whereas increasing α yields smoother samples.
In the level set context, the latter results in smoother interfaces between the different phases in physical space. Amongst other applications, this type of prior covariance operator has been used in the context of Bayesian level set priors in Dunlop et al. (2016).
For the PDE operator A, we employ zero Dirichlet boundary conditions on the bottom part of the boundary, ∂Ω b , and zero Neumann boundary conditions everywhere else: The implications of these prior modeling choices can be seen in Fig. 1: When the threshold parameter satisfies c 1 > 0, the subdomain Ω 2 cannot reach the bottom boundary. This is desirable in our applications where we are typically only interested in the upper part of a physical domain and the bottom boundary is artificial and only chosen for computational reasons.
The prior covariance operator A is discretized using a finite element method. Defining H := for all k ∈ H. Existence of a unique solution of (19) follows immediately from the Lax-Milgram theorem (e.g. Evans 1998).
Sampling from the prior distribution requires access to the square root of the covariance op- . This is realized by solving systems of the form Ah = r subsequently α/2 times. Hence, we limit the choices of α to positive even integers.

INCORPORATING SEISMIC IMAGES AS PRIOR KNOWLEDGE
Inverting for subsurface structures from surface observations is challenging. Thus, it is advantageous to incorporate any available information about these structures into the prior distribution.
For example, seismic images can provide insight into subsurface geometries a priori. Since we do not invert for the physical fields directly, the question arises of how to translate this knowledge Bayesian Level Set Method for Stokes Problems 11 into prior information for the level set function. In this section, we propose a method to construct the mean h 0 for the prior distribution of the level set function given a seismic image. For the presentation of this idea, we use a density field ρ with two phases and known values ρ = (ρ 1 , ρ 2 ) T as in Fig. 1.
Assume we have a seismic image of our computational domain Ω. Through experimental data or first principle calculations, estimates of the prior density can be made. Letρ : denote this estimated density field. Our aim is to construct the mean h 0 for the level set function in such a way that the expected value of the corresponding binary functions matches the seismic imageρ, i.e.,ρ Next, we use (20) to derive a computable expression for h 0 . For that purpose, we denote the probability density function and cumulative distribution function of the standard normal distribution N (0, 1) as ϕ and Φ, respectively. Furthermore, let c : Ω × Ω −→ R be the covariance function which is equivalent to In the case with two phases, the level set map simplifies to where the threshold c 1 is given.
. Substituting this and (21) into (20) yieldŝ which can be rearranged to obtain the desired prior mean of the level set function: Note that, in discretized form, the values c(x, x) are the diagonal values of the covariance matrix and h 0 can be computed easily.

Discretization and Prior Distribution
The discretization of the governing equations and the prior distribution are based on the finite element method. In all computations, the mesh consists of triangular elements. Since the domain and local mesh refinement is different for the two models in Sections 6.1 and 6.2, the exact properties are stated there.
For the discretization of the weak form of the Stokes equations (7), we choose continuous piecewise quadratic basis functions for the velocity components and continuous piecewise linear basis functions for the pressure. This choice of basis functions, also known as Taylor-Hood elements, is stable and a standard choice for the Stokes equations (see e.g. Elman et al. 2014). For computational reasons, we use a different approach for the discretization of the pressure space than stated in Section 2. Instead of enforcing a zero mean on the pressure field, we remove the excess degree of freedom by setting the pressure to zero at one fixed node, resulting in a shift of the pressure field.
We discretize the weak form (19) of the PDE operator underlying the prior covariance using continuous piecewise linear finite elements. More information about the discretization of the relevant inner products can be found in Bui-Thanh et al. (2013). We choose the inverse length scale τ = 14 and the exponent α = 8, amounting in a rather smooth level set function with relatively large correlation length. Choosing (τ, α) constitutes prior modeling, in which one incorporates all prior knowledge into the prior distribution. Practically, to find appropriate (τ, α), we visualize multiple samples from the resulting distribution and aim at choosing parameters (τ, α) such that samples represent potential models for the true parameters (see, e.g., the prior samples in Fig. 4).

Bayesian Level Set Method for Stokes Problems 13
Alternatively, it is possible to consider (τ, α) as hyper-parameters as in Dunlop et al. (2016) at the cost of increasing the computational requirements. Our implementation is based on FEniCS (Logg et al. 2012), an open-source software library for solving partial differential equations using the finite element method. FEniCS offers Python interfaces to efficient sparse direct solver tools such as MUMPS (Amestoy et al. 2001) or SuperLU (Li 2005), which we use for the computations in Section 6.

Approximating the Posterior Distribution
Since the parameter-to-observable map G is nonlinear, the posterior distribution is non-Gaussian and does not have a closed form. Using Gaussian approximations to the posterior distribution typically requires derivatives of G, which are not available here as G is discontinuous. Therefore, we rely on sampling methods to explore the distribution. The goal is to find a sequence of samples from the posterior measure µ y and use them to compute statistical quantities like expected value or This sampling method does not require derivatives of the parameter-to-observable map and is thus applicable to problems using a level-set prior. Another advantage of the pCN-MCMC algorithm is that it is well-defined on infinite-dimensional function spaces, yielding mesh-independent convergence of the Markov chain. However, in practice one might need a large number of samples for accurate approximations of the posterior distribution, an aspect addressed further in Section 6.1.
As subsequent samples of the Markov chain are correlated, it is important to choose the parameters in Algorithm 1 carefully to obtain a sufficiently large effective sample size. A larger jump parameter β leads to a better exploration of the full space of the posterior distribution but will in turn decrease the acceptance rate of the samples, whereas a smaller jump parameter will increase the acceptance rate but might lead to a poor exploration of the space. Both situations result in a slow convergence and large autocorrelation times of the Markov chain. As a compromise, we ad- Propose

NUMERICAL EXPERIMENTS
We apply the proposed method to two different scenarios with synthetic data: First, one where the data is generated using two sinkers in an otherwise homogeneous rectangular domain. This problem discussed in Section 6.1 is used to develop intuition for the method. Second, we use a more realistic geophysical setup with a subduction zone as detailed in Section 6.2. All quantities are non-dimensional. We visualize the results, interpret them in terms of expectations from viscous fluid dynamics, and discuss the prospects and limitations of this inference approach.
In ing the horizontal velocity and mean-zero normal traction measurements are given by respectively. The exact locations of the measurement points x j as well as the model setup used to create synthetic data for the inversion are specified for each model separately. For both models, the synthetic velocity and traction data are corrupted by 5% and 10% uncorrelated relative noise, respectively, to mitigate the "inverse crime" (Kaipio & Somersalo 2005).

Model 1: Sinkers
The first model consists of two higher-density sinkers in the rectangular domain Ω = [0, 4] × [0, 1], which is discretized with a uniform triangular mesh constructed from 200 squares in xand 50 squares in y-direction, each of which is split into two triangles (Fig. 2). We aim to recover the geometric structure of the density field ρ, the higher density value ρ 2 , and the constant background viscosity η ref . Since the velocity is driven by the density difference between ρ 1 and ρ 2 , we consider ρ 1 = 1 to be known. To ensure that the density of the sinkers ρ 2 is larger than ρ 1 , we define ρ 2 := ρ 1 + exp(ρ log ) and invert for ρ log . For the viscosity, we use η ref = exp(η log ) and invert for η log . As the level set method presented here requires a Gaussian prior for the inversion parameters, ρ log and η log , we assume substantial prior uncertainty in ρ 2 and η ref .
Since the geometric structure of ρ is determined by the level set function h, the full parameter vector is m = (h, ρ log , η log ), where h is a function and ρ log and η log are scalars.
In this example, we do not incorporate "seismic" knowledge about the subsurface into the prior mean and choose h 0 ≡ 0 as mean for the level set function. The prior mean and variance of the scalar parameters can be found in Table 1. Synthetic data is created with the density field shown in  Table 1, where the observations are taken at 16 equidistant surface points x j indicated by red squares in Fig. 2.
In all computations, we draw five million samples and discard the first million as burn-in. When using Markov chains to explore the posterior distribution, it is important to ensure that the chain is converged and exhibits sufficient mixing to obtain a good approximation of the distribution. For this purpose, we estimate the autocorrelation time for scalar quantities of m and visually inspect their traces. Using the norm of the level set function h L 2 (Ω) (Fig. 3), we consider the statistical dependence of samples for which autocorrelation drops below 0.1 as negligible. This suggests that roughly every 15,000th sample of the chain is statistically independent, which shows that a large number of samples is needed to obtain a reasonable number of statistically independent samples.
The trace of h L 2 (Ω) in Fig. 3 suggests reasonable (but slow) mixing of the chain and is another visualization for the correlation of subsequent samples.
Comparing samples from the prior and posterior distribution of the density field, the horizontal  location of the sinkers is recovered well, whereas there is some variance in the vertical location and the sizes of the sinkers (Fig. 4). This is expected since the measurements at the surface are mostly determined by the sinkers' mass -a larger lower-density sinker excites almost the same surface flow as a smaller higher-density sinker. Furthermore, small artifacts with higher density can appear close to the bottom of the domain. Again, this is due to the observations being taken on the surface and since higher density material leads to less flow when closer to the bottom of the domain due to the no-outflow boundary condition at the bottom boundary.
We study the role of different observation data on the recovered density field comparing three different settings, namely (i) using only horizontal velocities, (ii) using only normal tractions, and (iii) using both horizontal velocities and normal tractions as data. For cases (i) and (ii), measureprior samples i.e., the sinker area multiplied by its density. The true mass is recovered with small uncertainty, particularly compared to the cases with only one type of observation data.
Finally, the two-dimensional joint marginal distributions of the background viscosity η ref and the sinkers' density ρ 2 (Fig. 7) are visualized through the prior probability density function (pdf) as well as kernel density estimates of the posterior pdfs. Using both data types, the background viscosity is constrained well (Fig. 7c). As expected, this uncertainty is larger for the case with just velocity data (Fig. 7a). Furthermore, there is a positive correlation between the background viscosity and the density, in accordance with physical expectations. In contrast to these cases, we do not learn much about η ref or ρ 2 when using only normal tractions as data (Fig. 7b). However, one should keep in mind that this is a two-dimensional marginal distribution of a high-dimensional distribution -in fact, using normal tractions as data allowed us to infer the shape and locations of the sinkers reasonably well (Fig. 5e-f).
Note that despite there being substantial uncertainty in the inferred density for all three observation data types (Fig. 7), we observe that the uncertainty in the overall sinker mass is small  ( Fig. 6). This is due to the different sinker geometries whose size is anticorrelated to the density, as can also be seen from the posterior samples ( Fig. 4g-l).
In a similar model setup, we performed computations where, in addition to the geometric structure of the density field, the structure of the viscosity field was also unknown. However, we used one level set function for both fields, coupling their shapes. We then inverted for the sinker viscosity instead of a constant background viscosity. Independent of the data type used, the locations of the sinkers were well recovered (see Fig. A1). On the other hand, neither the sinkers' density nor viscosity could be well constrained, leading to more uncertainty about the exact size of the sinkers compared to the model presented in this section. The sinkers' mass, however, was again inferred with low uncertainty.

Model 2: Subduction Zone
The second model describes a subduction zone in the domain Ω = [0, 3] × [0, 1], which is discretized with a mesh that is locally refined around the ridge and hinge zone, and further refined around the dipping weak zone that represents the mega-thrust fault (Figs. 8 and A2). The goal is to recover the shape of the subducting plate, its density, and a "weakened" viscosity in the hinge zone.
As the location and shape of continental plates are typically well constrained, we assume that the properties of the over-riding plate (including the weak zone) are known and invert for both the density field ρ and viscosity field η in the remaining part of the domain. We assume that the geometries of ρ and η coincide, which is accomplished by using a single level set function h for the shape of both fields. Using the same parameterization as in Section 6.1, we write ρ 2 = ρ 1 + exp(ρ log ) with ρ 1 = 1 given and invert for ρ log . Furthermore, we consider the background viscosity η 1 = 10 2 to be known and parameterize the subducting plate's viscosity as where the reference viscosity η pl = 10 5 is given and the prescribed function γ controls the size, location, and smoothness of the hinge zone (see Fig. 8b, cf. Rudi (2019)). The physical parameter of interest is the minimal "weakened" viscosity in the hinge zone, denoted by η w = min x η 2 (x).
However, to allow for sufficiently large variance in this value, we invert for w log , leading to the full parameter vector m = (h, ρ log , w log ).
Synthetic data are created with the density and viscosity fields (Fig. 8) with the true scalar Prior mean and variance of the scalar parameters are given in Table 2. For this inversion, we construct a non-zero mean h 0 for the prior distribution of the level set function as described in Section 4. We also computed cases with a zero prior mean for comparison. After one million samples, even the most likely samples of the chain obtained using the latter approach were not near the true parameter values (Fig. A3) despite fitting the data reasonably well, supporting the need for a tailored level-set prior mean to infer reasonable values.
In all computations, we draw five million samples and discard the first million as burn-in.
Again, traces of the norm of the level set function h L 2 (Ω) were inspected to check for convergence and sufficient mixing. We estimate the autocorrelation time for these computations was about 15,000, similar to the results in Section 6.1.
Comparing prior and posterior mean and pointwise variance of the density field ρ, the shape of the slab as well as its density are recovered well and the pointwise variance is significantly reduced ( Fig. 9). At first sight, the tailored prior mean and corresponding pointwise variance ( Fig. 9a-b) seem restrictive. However, comparing samples from the prior and posterior distributions (Fig. 10) shows this is not the case. The prior density samples mostly do not resemble plates with a large variance in the plate's density ρ 2 . On the other hand, the shapes of the posterior density samples closely resemble a true subducting plate with less variance in ρ 2 . As observed in the sinker model in Section 6.1, higher-density artifacts can appear close to the bottom of the domain because their influence on surface measurements is small.
If just velocity data is used, the conditional distribution illustrates well the trade-off between recovered ρ and η w (Fig. 11). However, if just traction data is used, the density is substantially prior samples Lukas Holbach, Michael Gurnis, and Georg Stadler narrowed to values around the true value, but the viscosity of the hinge zone is not much changed from the prior marginal. By combining the two data types, the recovery of the density (from the traction) allows for the viscosity of the hinge zone to be much better estimated. The final marginal from the combined data is firmly centered on the true value (Fig. 11). Nevertheless, the shape of the plate does differ somewhat from the true shape within the hinge zone with a slight down warping of the top side of the slab (Fig. 9d). In addition, the top side of the slab has a wider zone of higher pointwise variance. This is consistent with earlier work that has shown that plate velocity is expected to go as u p ≈ where η L is the viscosity of the plate, D the thickness of the plate, R the radius of curvature of the plate in the hinge zone, and C 1 , C 2 and C 3 are constants (Conrad & Hager 1999). The posterior variance on the hinge zone viscosity is roughly 4 and this trades off with plate thickness as ≈ 4 1/3 , consistent with the size of the zone of high uncertainty around the hinge zone (Fig. 9d).
We are able to compare the inversion results when different data types are used given the true shape of the subducting plate, i.e., the only unknown parameters are η w and ρ 2 (Fig. 11d-f).
This reduces the dimension of the parameter space drastically, and as expected in all three cases the parameters are better constrained than when geometries are uncertain. Using only velocity data, there is a nonlinear positive correlation between the hinge zone viscosity and subducting plate's density. If the density is higher (and thus a larger force is pulling the plate), the hinge zone viscosity does not need to be as small for the same bending to occur, and vice versa. In the computation with only normal traction data, the parameters are very well constrained. Moreover, the correlation between the parameters changes its sign. As before, combining the data types yields the best result, in this case with rather low uncertainty.

DISCUSSION
We have demonstrated the potential of using Bayesian level set methods to infer subsurface structures and material parameters in Stokes flow and quantify their uncertainties. Through construction of tailored level-set priors, we have formulated a method to incorporate geometric knowledge from seismic imaging in the inversion while imposing few restrictions on the specific shapes. The method, furthermore, allows inversion for density and rheological parameter fields, complementing seismic imaging methods.
A focus on linear Stokes equations allowed for a clearer presentation of the underlying ideas.
While the method is applicable to models with nonlinear viscosities, this becomes computationally more challenging due to the substantially higher computational cost per sample. Due to the high-dimensional parameter space and the resulting slow decay of autocorrelation times, many samples are needed to approximate the posterior distribution. Since each sample requires solving the Stokes equations, the overall computational cost increases substantially when nonlinear Stokes equations are used. More efficient sampling methods can potentially alleviate the effect to some extent. A Gaussian approximation to the posterior distribution can be employed to enable faster sampling or as a direct approximation to the distribution (Pinski et al. 2015;Bui-Thanh et al. 2013). Although derivative information is typically required for these approaches and the nondifferentiable parameter-to-observable map would need to be adjusted, a derivative-free method to generate samples from a Gaussian approximation to the posterior distribution was proposed recently (Garbuno-Inigo et al. 2020).
In one of the presented model problems we used locally refined meshes to resolve the subduction physics of the setup. This refinement was static, i.e., it was kept the same for all samples.
An extension of the present method could be to use dynamic refinement that aims at resolving the level set interface, which changes in each sampling step. However, this would increase the cost per MCMC sample substantially, in particular for the case where the viscosity is constant and thus the Stokes discretization matrix only has to be assembled and factorized once, since these factors can be reused for all samples.
In two numerical examples with synthetic data, the shape of two sinkers as well as the shape of a subducting plate were inferred with their uncertainties, along with densities and viscosity parameters. There were important trade-offs: In models with a pair of sinkers, only their mass could be inferred with low uncertainty as variations in their size and density can lead to the same negative buoyancy force. In the hinge zone of the subducting plate, the plate thickness and viscosity have similar effects on the flow and therefore lead to higher uncertainties in this region, which is consistent with earlier work (Conrad & Hager 1999;Ratnaswamy et al. 2015). The results of both models demonstrate the benefits of combining two different surface data types, horizontal (plate) velocities and normal tractions, which reduced the uncertainties significantly compared to using more measurements of the same data type.
A limitation of some optimization methods meant to infer rheological parameters using adjoints of the Stokes equations (Ratnaswamy et al. 2015;Rudi et al. 2022) is that the density field driving the flow is not perfectly known. These methods could be expanded to include a scaling factor between assumed geometry of the temperature (or density) and their magnitude. For subduction zone problems, this is not ideal as plate velocity is partly resisted by forces within the hinge zone. Specifically, within the hinge zone, the resistance from the bending is the product of the effective viscosity of the plate and the cube of the plate thickness normalized by the radius Bayesian Level Set Method for Stokes Problems 27 of curvature of the bending (Conrad & Hager 1999;Ribe 2001;Buffet 2006). Plate geometry and radius of curvature are geometric quantities driving the flow and why substantial variance in the recovered viscosity within the hinge zone was found (Fig. 9d). Uncertainty quantification of mantle rheology needs to account for the uncertainty in the radius of curvature and plate thickness within the hinge zone, for the subduction zone problem.
Although There are now numerous constraints on slab shape that could be used in such inversions with level sets. When inverting for the shape of the subducting plate, it became evident that a tailored level-set prior representing "seismic" knowledge is crucial when dealing with more realistic models, as we were not able to obtain meaningful results without a tailored prior mean. Constraints on the prior mean could come from mapping traditional body wave seismic tomographic images (Xue & Allen 2007;Zhao et al. 1992) while prior variance could come from both tomography and detailed constraints on velocity gradients from seismic wave forms (Chen et al. 2007;Chu et al. 2012) or seismic interferometry (Shen & Zhan 2020). Construction of such density priors requires the mapping from seismic velocities to densities which is often accomplished through thermodynamic relations determined experimentally or from first principle calculations, such as through the framework in Stixrude & Lithgow-Bertelloni (2005). Taken together, there are now sufficient surface data and seismic controls within individual subduction zones to formally infer rheological parameters within the hinge zone and formally quantify uncertainties.