Simultaneous Inference of Plate Boundary Stresses and Mantle Rheology Using Adjoints: Large-Scale Two-Dimensional Models

Plate motions are a primary surface constraint on plate and mantle dynamics and rheology, plate boundary stresses, and the occurrence of great earthquakes. Within an optimization method, we use plate motion data to better constrain uncertain mantle parameters. For the optimization problem characterizing the maximum a posteriori rheological parameters we derive gradients using adjoints and expressions to approximate the posterior distributions for stresses within plate boundaries. We apply these methods to a 2-D cross section from the western to eastern Pacific, with temperature distributions and fault zone geometries developed primarily from seismic and plate motion data. We find that the best-fitting stress exponent, $n$, is about 2.8 and the yield stress about 100 MPa or less. The normal stress on the interplate fault zones is about 100 MPa and the shear stresses about 10 MPa or less.


INTRODUCTION
Plate motion is the result of a force balance involving mantle and lithosphere rheology, the longterm strength of fault zones that define plate boundaries, and buoyancy forces. But considerable disagreement and uncertainty about these characteristics remain, primarily because of the complexity of mechanical properties. The creep strength of the mantle is highly nonlinear and is de-5 pendent on pressure, temperature, and composition (Ranalli, 1995), while a range of strain-and composition-dependent processes control the development and evolution of fault zones (Scholz, 1990), such that mantle downwellings are characterized by relatively strong slabs that slide by the overriding plate. Plate bending (Buffett, 2006) and interplate coupling (Scholz & Campos, 2012) are distinct processes, each constrained observationally. These processes need to be cap-10 tured in models in order to decipher the plate tectonic controls on interplate coupling and how slab strength (Billen, 2008) and mantle rheology balance to control plate motions and mantle flow.
In two-dimensional forward models (i.e., computations with assumed parameters), the treatment of constitutive relations for creep and faults (shear zones) approximately consistent with those suggested from laboratory work has been accomplished in cross-sectional models of subduc-15 tion (Zhong & Gurnis, 1995;Zhong et al., 1998;Billen & Hirth, 2007;Crameri et al., 2012;Gerya, 2011;Gérault et al., 2015;Garel et al., 2014). Indeed, assuming strain rate-and temperaturedependent viscosities has led to forward models that are consistent with generic subduction zone evolution (Zhong & Gurnis, 1995;Garel et al., 2014;Burkett & Billen, 2009). In global models, these constitutive relationships and parameterizations are often simplified  Bertelloni, 2002) so as to make the problem computationally tractable. The simplifications entail the use of linear viscosities, potentially smaller lateral variations in viscosity that could exist (as suggested by 2-D models), and are usually low resolution. In subduction models, low resolution is limiting because it changes the character of deformation and flow within a subduction zone: Instead of a relatively strong plate that slides by and is resisted by the overriding plate, the oth-25 erwise strong plate advects through a wide, low-viscosity zone such that the plate neither bends nor is strong. These limitations have been partially overcome in models with resolutions of up to 1 km within and near the shear zones defining subducting plate boundaries Alisic et al., 2012Alisic et al., , 2010. But such models also have their limitations, especially in that they are forward computations that only reasonably fit plate motions. Such models show that the degree of 30 fault coupling likely varies between subduction zones, but the computational cost of each forward model precludes finding the true best-fitting model as well as the correlation between parameters.
To accurately estimate the forces controlling plate motions, we need an optimization scheme that faithfully resolves the key mechanical, rheological, and geometric aspects of the system. Here we move in that direction by expanding on the adjoint approach to efficiently compute derivatives 35 (Ratnaswamy et al., 2015) with actual plate motions and seismic constraints in 2-D cross-sectional models. Besides applying the method to geophysical data and inferring mantle properties, we enhance the method by inferring the stress within plate margins. We determine the conditional distributions of these stresses and inferred rheological parameters. Moreover, we demonstrate the stability of the numerical methods for inference and show that we can constrain the nonlinear 40 exponent of mantle deformation, the activation energy, the prefactor to a rheological model, and the effective viscosity and stress within fault zones, as well as the trade-offs among all of these quantities. In essence, we refine optimization methods for geodynamic models using 2-D models, paving the way for the geodynamic inverse models that will require computation of the entire spherical problem with global temperature and plate motion constraints. in Ω, −∇ · σ = −RaT e r in Ω, Simultaneous Inference of Plate Boundary Stresses and Mantle Rheology 5 where Ω ⊂ R n , n = 2, 3 is a bounded two-or three-dimensional domain with boundary ∂Ω. On that boundary we assume no-outflow and tangential free-slip boundary conditions, u · n = 0, T (σn) = 0 on ∂Ω.
Here u : Ω → R n and p : Ω → R are the unknown velocity and the pressure, respectively. The viscous stress tensor is defined as σ = 2η(x, T ,ε II , m)ε − pI, whereε = 1 2 (∇u + ∇u T ) is the strain rate tensor. Moreover, η = η(x, T ,ε II , m) is the (effective) viscosity, which depends on the 55 location in the domain x, a given temperature field T , on (the square root of) the second invariant of the strain rate tensorε II = ( 1 2ε :ε) 1/2 , and on rheology parameters summarized in the vector m, to be specified later. The momentum equation (2) is driven by thermal buoyancy, where Ra is the thermal Rayleigh number and e r is the negative unit vector in the direction of increasing gravity.
In (3), n represents the outward unit normal vector at the boundary ∂Ω, and T = I − n ⊗ n is the 60 tangential projection operator, where ⊗ denotes the outer product.
A nonlinear rheology is used in the upper mantle to account for dislocation creep and a linear rheology in the lower mantle for diffusion creep. A global yield stress σ y is used that allows for the dynamic weakening, which is triggered primarily within the hinge zones as slabs bend when entering the mantle. At a point x ∈ Ω, the viscosity is described as where η min and η max are the minimum and maximum bounds of the viscosity, respectively; ω(x) is the function describing the reduction of viscosity for each weak zone defining subducting plate boundaries; and a(T ) is the temperature-dependent component of viscosity. Further, σ y is a yield stress triggering plastic failure. We denote n as the stress exponent causing strain rate weakening when n > 1, and we call its inverse 1/n the strain rate weakening exponent. As detailed in the 70 next section, the vector m collects the parameters we aim to infer from data, such as the strain rate exponent, the weak zone factors, and the yield stress, possibly after some transformation.
The normal Arrhenius equation is linearized as the adjusted Frank-Kamenetskii law (Frank-Kameneëtìskiaei, 1969) where β is the nondimensional activation energy and η 0 is the viscosity prefactor. The reduction in 75 viscosity that represents the fault zones between converging plates at subduction zones is described by using the weakening function with a weakening coefficient 0 < w i ≤ 1 that is, generally, different for each subduction zone with index i; dist(Γ i , x) denotes the minimal distance of x and the center surface describing the weak zone Γ i , d is the width of the zone of full weakening, and b is the length-scale of smoothing.

80
Although the parameters governing ω(x) are intrinsic parameters in the viscosity formulation (4), the viscosity within the weak zones is not solely intrinsic, because strain rate weakening plays an important role.

ESTIMATING PARAMETERS: THE BAYESIAN INVERSE PROBLEM
Next we summarize the Bayesian inference approach and the computation of derivatives using the 85 adjoint method in Section 3.1. We describe the parameters and the effect of nonlinear transformations in Section 3.2. In Section 3.3 we show how Bayesian estimation of the rheology parameters can be used to estimate uncertain physics quantities such as the plate boundary shear and normal stresses.

Bayesian inverse problem and computation of derivatives 90
We now briefly summarize the Bayesian approach to parameter estimation. For a general introduction on the subject, we refer to Tarantola (2005) and Kaipio & Somersalo (2005) and, in the context of mantle dynamics, to Ratnaswamy et al. (2015). We combine the uncertain rheology parameters that we aim to infer from observations into a parameter vector m ∈ R p ; the inference parameters are discussed in Section 3.2. In the current work, observational data are considered to be plate velocities at the top surface of Ω, which we denote by ∂Ω obs . Since plate movement is assumed to be rigid, one velocity vector (or Euler pole for spheres) is available for each plate. We interpolate these velocity vectors to obtain one tangential velocity field at the surface per plate. These velocity fields constitute the observational data of the inverse problem and are only considered sufficiently far away from nonrigid plate boundaries. Corresponding to the observational data, denoted as d obs , 100 we define the observations from the solution of the Stokes equations (1-2) for given parameters m as f (m), which is supported on ∂Ω obs ; and we refer to f as the parameter-to-observable map.
Moreover, we decompose f (m) into an observation operator b(·) applied to the solution u(m) of the Stokes equations for given parameters m; hence, f (m) = b(u(m)).
In a Bayesian inverse problem, the parameters m and data d obs are considered random vari-105 ables to model our imperfect knowledge about these parameters. Furthermore, we assume that the combined measurement and model error is additive and that the error (d obs − f (m)) is Gaussian with zero mean and covariance matrix C data . We additionally choose a prior distribution for the parameters m that is Gaussian with mean m 0 and covariance matrix C prior . With these assumptions, it follows that the posterior distribution, which is obtained as the solution of the Bayesian inverse 110 problem, has the density where ∝ means equal up to a (normalization) constant that makes π post a proper density. The real-valued exponent J in the posterior density is defined as where the first term is a cost measuring the misfit between data and model output † and the second term penalizes deviations of parameters from prior knowledge. Since within mapping f (m), the 115 Stokes solution (u, p) depends nonlinearly on the parameters m, the posterior distribution (7) is non-Gaussian, and thus its statistical characterization is challenging. Since we aim at problems where we infer multiple global rheology parameters as well as multiple plate boundary weak zone † Note that in the case of observations being plate motions at the surface ∂Ω obs , the evaluation of the cost for the data misfit entails computing an integral over ∂Ω obs . In (8) the integral is implied in the quadratic form associated with the inverse error covariance matrix C −1 data .
factors, it is computationally difficult to fully explore the high-dimensional posterior distribution (e.g., through sampling methods), because of the computational cost that each solution of a high-120 resolution nonlinear Stokes system entails. Thus we have to resort to approximations of π post . One such approximation is to first minimize J (m) over m. This yields the maximum a posteriori  (7), it now becomes computationally tractable to locally describe π post .
The MAP point as well as the Gaussian approximation of the posterior requires derivatives of J with respect to m ∈ R p . We utilize adjoint methods to compute these derivatives efficiently, 130 because the computational cost of adjoint-based derivatives is independent of the number of parameters p. Next, we introduce adjoint variables obtained as solutions to adjoint equations derived from the Lagrangian (Hinze et al., 2009), which incorporate the minimization objective (8) and the Stokes equations (1-2) in weak form. The Lagrangian function L is defined as where v and q are the adjoint velocity and pressure and the arguments m, u ofJ in (9), are now 135 considered independent variables; thus The gradient G(m) of J (u(m)) in (8) with respect to m is given by the gradient of L with respect to m, provided that all variations of the Lagrangian with respect to (v, q) and (u, p) vanish; see Tröltzsch (2010); Borzì & Schulz (2012). These conditions for stationarity of the Lagrangian imply with the adjoint stress and with boundary conditions for v as in (3) but with This means that the difference between the observation data d obs and the model output b(u) drives the adjoint equation. Note that the adjoint stress τ defined in (13) depends on the Stokes solution u and that ⊗ denotes the outer product between second-order tensors. The gradient now can be evaluated as the sensitivity of L in (9) with respect to m, using the solutions u and v of the Stokes equations (1-2) and the adjoint Stokes equations (11-12), 145 respectively.

Inference parameters and their transformations
We collect the parameters we are inverting for in the parameter vector m. In order to avoid difficulties that arise when physical parameters are scaled very differently or must satisfy sign conditions, the inference parameters in m are (nonlinear) transformations of these physical parameters. In 150 particular, since the weak zone factors must satisfy 0 < w i ≤ 1, the corresponding inversion parameters are m w i ∝ log(w i ). Since the nonlinear exponent n is responsible for strain rate weakening viaε ters. The next section discusses how the parameters m can be inferred, together with estimates of their uncertainty, from observation data.

Uncertainty quantification for physical quantities
Here we derive estimates for physical quantities of interest (QoIs) from a computed solution of the Bayesian inverse problem from Section 3.1. In addition, we are able to propagate uncertainties 170 from parameter estimates in the rheological relationship to those physical quantities. Specifically, we focus on the average plate boundary shear stresses and normal tractions since these quantities are central to the linking of geodynamic to seismogeneic processes. For the ith weak zone, the average normal tractionσ where V χ i = Ω χ i dx is the volume of the weak zone i and χ i (x) ∈ {0, 1} equals one inside and 175 equals zero outside of the weak zone. Moreover, n w = n w (x) is the unit normal vector on the weak zone's center surface Γ i , evaluated at the point on Γ i that is nearest to x.
The normal and shear components of the stress are important because they effectively give the resisting stresses along the plate boundaries. The larger the resisting stress, the more mechanically coupled a plate boundary, and vice versa. Ultimately, these quantities will allow us to establish a relationship between seismic and mechanical coupling by comparing the weakening factors w i with the geographic variability of great earthquake occurrence and characteristics.
Our goal is to estimate the distribution of the quantities (16) given estimates and uncertainties of the rheological parameters m. For that purpose, let us summarize the quantities we are inter- i , in a vector q. We consider q as a function of the velocity u and, in general, 185 also as a function of the uncertain parameters m; hence A Taylor expansion of q with respect to m about the MAP point m map yields where J denotes the Jacobian of q(m) at the MAP point, For the next step, we assume to have computed the Gaussian approximation of the posterior distribution for m (see Section 3.1) with mean m map and the covariance matrix C post .  ing this Gaussian posterior and the truncated Taylor series (18), we arrive at an approximation for the resulting distribution of q. It has the mean q map = q(m map ) and the covariance matrix This follows from (18) and basic properties of affine-transformed Gaussian random variables. While q map can be evaluated directly, the covariance C QoI requires the Jacobian (19), which entails the sensitivity vectors ∂q/∂m j . Note that q in (17)  One straightforward way of computing the Jacobian (19) approximately is to use forward sensitivities. It is carried out with finite differences, where each parameter m i is perturbed by a small value δ and the QoI is evaluated at the perturbed parameter, where e i is a vector with entry one at index i and zero otherwise. In order to compute all entries 200 of J , the number of nonlinear Stokes solves amounts to the product of the number of QoIs and parameter dimension p. We are using forward sensitivities to compute the small amount of six QoIs and have a parameter dimension of eight. Therefore, the postprocessing step of propagating uncertainties to QoIs is computationally feasible for the inverse problem we are targeting. To derive the sensitivities in (19), we use a formal Lagrangian procedure analogously to the 210 approach in Section 3.1. Specifically, in Equation 9, we substitute J (m, u) by one scalar-valued QoI q i (m, u) (i.e., the ith component of q) and carry out the same formal Lagrange procedure as in Section 3.1. Since we performed the substitution of J (m, u) with q i (m, u), the right-hand side of the adjoint equation (12) becomes the derivative of q i (m, u) with respect to u. Therefore, for every component q i of the QoI vector q, an adjoint equation with a different right-hand side 215 must be solved. In addition to the modified adjoint equation, a new term appears in the gradient expression (15), which is the derivative of the QoI's Lagrangian with respect to the parameters m.
This is due to the explicit dependence of q i (m, u) on the parameters m.
We consider the QoI to be the normal tractionσ where the adjoint stress τ has been defined in (13). The adjoint equations for the QoIσ frame was used since the sidewalls on two-dimensional cross sections preclude any large-scale differential motion between the bulk of the mantle and plates, that is, any net rotation. We account for uncertainty in plate motions with the standard deviation σ data .   Table 1.
When solving the nonlinear Stokes flow problem (1-2), it is crucial to resolve the thermal boundary layers and fault zones. This requires using either a discretization on a fine uniform mesh, which is computationally very expensive, or a discretization on an adaptively refinement mesh, which is algorithmically more complex. Here we use adaptive mesh refinement (AMR) to 270 refine the mesh in areas such as oceanic plates, slabs, the mantle wedge, and fault zones. The resulting adaptive meshes exhibit a difference of eight refinement levels, which corresponds to a factor of 256 for the side lengths between the coarsest and finest mesh elements. We reach the finest resolutions at fault zones, where the spacing of degrees of freedom for velocity and pressure is only about 0.6 km-a small fraction compared with the 25,644 km extension of the domain.

275
Our AMR capabilities are based on the p4est library (Burstedde et al., 2011(Burstedde et al., , 2013, which manages adaptive meshes in parallel by using scalable algorithms that exploit topological "oc-tree" structures of the mesh. For the function approximation, we use quadratic finite elements for velocity and first-order discontinuous elements for pressure to ensure local mass conservation within each element. Resolving the fine-scale structures of the thermal boundary layers and 280 fault zones through AMR is performed dynamically while solving the nonlinear Stokes equations; thus, the mesh is refined during the nonlinear solution process. The computational challenges of solving these nonlinear systems are severe. We use an advanced nonlinear solver based on Newton's method, which was specifically developed for viscoplastic Stokes flow problems (Rudi et al., 2020). The nonlinear solver is combined with a linear solver into an inexact Newton-Krylov 285 method. The linearized Stokes systems arising at each Newton iteration are solved with a preconditioned Krylov GMRES method, where the preconditioning consists of an inverse Schur complement approximation (Rudi et al., 2017) and hybrid spectral-geometric-algebraic multigrid detailed in Rudi et al. (2015). This multigrid approach uses spectral coarsening of the polynomial order of finite elements, followed by a sequence of geometric multigrid levels coarsening the adaptive 290 mesh, and concludes with algebraic multigrid. Code details and validation are provided in Rudi (2018).

NUMERICAL RESULTS OF BAYESIAN INFERENCE
We infer rheological parameters that best fit observed plate motions with a series of numerical experiments, which are summarized in Table 2. We determine how robust our algorithms for in-  We now look at the initial guess in the context of earlier work using synthetic data with known 305 rheological parameters, with conditional and marginal distributions computed through repeated forward solves (without assumptions on their form using MCMC). In this work, Ratnaswamy et al. (2015) found mostly unimodal posterior distributions that were well approximated by Gaussian distributions near the best-fitting values (MAP points). For the Pacific cross-section data with two different configurations of data errors, discussed below, we estimate rheological parameters with 310 three different guesses on n (2.0, 2.8, and 3.2). In all cases, the optimization algorithm converges on the same n ≈ 2.8 ( Fig. 2A), indicating that there does not appear to be any substantial local minimum for cross-sectional inference with geophysical data. We observe that other parameters  Figure 2. For series A.1, convergence of optimization algorithm (Newton BFGS) to find the optimally fitting parameters using different initial guesses for the stress exponent n. The different initial guesses are n = 2.0, n = 2.8, and n = 3.2. The three (out of eight) parameters-stress exponent (A), Mariana weak zone factor (B), and yield stress (C)-converge to the same optimum values regardless of initial guesses. Correspondingly, the gradient norm of the inverse problem is reduced by a factor between 10 −4 and 10 −3 . such as yield stress and weak zone factor also converge to the same respective values regardless of the initial value given to n (Fig. 2C, D).

315
Our focus on the stress exponent, n, comes from the strong sensitivity of the inverse problem with respect to n, which is further documented below. In addition to different initial guesses for n, we test the influence of the transfer function that maps a (normalized) inference parameter value m n to the value n used in the Stokes model. One version of the transfer function is linear, m n ∝ n, and the other version is inverted, m n ∝ 1/n. Hence m n directly represents the strain rate From inference series B.1 (where m n ∝ 1/n) considering the case with the smallest assumed data error (σ data = 0.25 mm/yr), we have a model (Fig. 3C) that fits the data the closest; the optimization method rapidly minimizes the cost function during several iterations with the forward velocity converging toward the plate motion data, including the major plates and small plates in the 335 back-arc (Fig. 3C). The effective viscosity through the entire domain varies between 10 18 and 10 24 Pa-s with major strain thinning within the upper mantle below the plates, in a halo above the slab in the upper mantle, and weakening in the hinge zone of each subducting plate (Fig. 3C, right).
In this inference, the viscosity structure of the Mariana subduction zone differs substantially from that of Chile. The Mariana shows more weakening of the slab throughout the hinge zone, below 340 the imposed weak zone (e.g., the fault). There is also weakening above the fault on the overriding  The results depend on the magnitude and type of observation errors, that is, on σ data , where 345 σ data is assumed to contain data and model errors and it is incorporated in C data from (8) generally fits the data well for any of the assumed data errors. But the small back-arc basin for the Mariana subduction zone is not well fit (Fig. 3B,C). When the data error is large, the surface plate motions do not display divergence above the Mariana slab. As σ data is reduced, the fit of the velocity of this small Mariana plate improves, especially between σ data values of 1 and 0.5 mm/yr.
During this trend toward resolving the back-arc motion better, there is a substantial transition in MPa, which leads to much more (e.g., broader scale) yielding within the hinge zones of the three slabs. When the yield stress decreases, there is a jump in the weak zone factor for the Mariana from 10 −5 to 10 −4 (Fig. 4C). Essentially, the divergence above the slab, referred to as trench rollback, causes the Mariana slab to roll back. The slab is able to roll back only if it can easily bend 360 in the hinge zone and a lower effective viscosity in the hinge zone is required (Alisic et al., 2012); consequently, fitting the roll-back in the kinematic data well leads to a global reduction in the yield stress and hence the quite evident sharp reduction of σ y as σ data decreases. While it is interesting that the model is capable of fitting the back-arc motion, the inferred parameters might be a result of under-estimating observation and model errors. Table 3. Inference series C.1. Prior and posterior modes and standard deviations. Standard deviations are "additiv" (±) for normal distributions and "multiplicative" ( ) for log-normal distributions. log-normal 1.0 × 10 −5 10 7.9 × 10 −6 8.5 Mariana weak zone factor log-normal 1.0 × 10 −5 10 4.9 × 10 −6 9.4 Ryuku weak zone factor log-normal 1.0 × 10 −4 10 1.2 × 10 −3 2.1 plate size. Essentially, when the error is constant for each data value (recall we have one data value per node in the finite element mesh within the interior of plates), the larger plates are weighted more strongly, as we have already seen with the Pacific Plate being well fit even for large data errors. However, the small plates, although geodynamically of substantial importance, exert little 370 control in the inference. We address this in inference series C.1 by assuming a data standard deviation proportional to the square root of the plate size. The inference with this plate sizedependent error leads to a good fit for most plates, from the large Pacific Plate to the Mariana microplate (Fig. 5). We provide more details of this inference series in Table 3. The table lists  which are obtained by marginalizing the posterior covariance matrix. We can learn from this table that for the parameters, where standard deviations in the posterior are reduced relative to the prior, the data and model are able to inform these parameters.
Looking at estimated parameters with plate size-dependent data errors in more detail, we see 380 that a number of parameters show a positive trade-off with the stress exponent, n, evident through 2-D marginal distributions ( Fig. 6 and Fig. S18 in the Supplementary Material). Both the yield stress and the stress exponent control the nonlinearity of the material, and a high n, which leads to more weakening, must trade off with a higher σ y so that there is less volume (usually in the hinge zone) that experiences yielding. The activation energy, E a (which is the dimensional quantity 385 associated with β in (5)), trades off with n, since a high E a by itself leads to higher viscosities within the cooler parts of the domain (the slabs and plates). Therefore, a larger n is needed to weaken this cold material. There is also a strong trade-off of E a with σ y (Supplementary Fig. S18).
This trade-off between n and E a has been suggested in early studies of the lithosphere and mantle using forward models (Christensen, 1983). We further consider the inference series C.1 with plate size-dependent error, and we take the procedure a step further by inferring the normal and tangential (shear) stresses within the fault zone. We find that the weak zone factors for the Chile, Mariana, and Ryuku subduction zones vary from 1.9 × 10 −6 to 1.2 × 10 −3 (Table 4)-a substantial variation over nearly three orders of magnitude. However, we observe that the average tangential and normal stresses for these faults varies 395 by a much smaller degree (Fig. 7). The values for the normal stresses vary between 83 and 110 MPa while the tangential stresses vary between 3 and 22 MPa. This interplay between rheological parameters is important and demonstrates the nonlinearity and strong interactions between global rheological parameters (yield stress and strain rate exponent) and local coupling parameters.

400
We have inferred rheological parameters (nonlinear exponent, yield stress, and activation energy), viscosity prefactors within the upper mantle and hinge zone of the subducting plate, the coupling coefficient and stresses within several subduction megathrusts, and the covariances between them.
Since inference is carried out in a two-dimensional geometry where it is difficult to isolate plate forces and kinematics solely along a cross section, the results are preliminary and illustrative of the 405 potential of the approach. The method overcomes substantial limitations in previous geodynamic inference by not needing to simplify the rheogical laws. Moreover, when placed in the perspective of prior work with synthetic data constructed with forward models showing recovery of known nonlinear rheological parameters (Ratnaswamy et al., 2015), the results point to a considerable robustness in this application of adjoint-based MAP estimation in plate and mantle dynamics. However, the inferred rheological parameters and plate coupling values stem from a single (universal) form governed by stress and temperature. This means that other processes that could influence deformation in the lithosphere and mantle, such as grain size and volatile concentrations, might be mapped together. Grain size and volatiles could vary spatially in the mantle (Riedel & Karato, 1997;Hirth & Kolhstedt, 1996;Spasojevic et al., 2010), and these would likely lead to a spatial 415 variation in the dominant deformation mechanism in different parts of the mantle (Hall & Parmentier, 2003). An alternative approach would require that the rheological parameters (like the nonlinear exponent, n) vary spatially. Exploration of this method with synthetic data showed that with plate motion constraints and with many more unknowns than independent data, this inference method was less effective at parameter recovery while showing rapid decay in parameter recovery 420 with depth (Worthen et al., 2014). A compromise could be a tectonic regionalization to the rheological parameters, such as a small set of nonlinear exponents, n, that vary with tectonic type (e.g., within the mantle wedge versus below cratons). This was attempted successfully with a Markov chain Monte Carlo approach by Baumann et al. (2014) in a two-plate system with different parameters for the plates and the crust versus the mantle. Considerable opportunities remain for future 425 exploration in this direction using the adjoint method once full global inference become available.
Mantle and lithosphere deformation becomes qualitatively more nonlinear when n > 3 (Alisic 435 et al., 2012) since there is substantially more shear thinning in the upper mantle using experimental values compared with the recovered values. Earlier, using a global geometry but with a forward approach similar to the present inversions with resolved weak (shear) zones for the interplate faults and resolved hinge zones, Alisic et al. (2012) preferred an exponent, n, of about 3.0 and yield stress, σ y , of about 100 MPa when the models were fit to both the major plates and the 440 microplates adjacent to trenches that rapidly roll back. Although these global models suggest a larger n and about the same yield stress from what we recover from the cross-sectional inference, they are within the 95% confidence interval of the 2-D marginal distributions (Fig. 6). We currently cannot determine whether this difference between the two sets of models is due to the geometry (2-D cross-sectional versus full spherical), the more realistic parameterization of the fault zones in 445 this inference, or some other reason. Speculating that the change in fault parameterization causes the differences is not unreasonable: The Alisic et al. (2012) study had near-surface dip angles for the faults that were steeper, and fits to the plate motions could have been accomplished through more strain rate weakening achieved by a higher n than otherwise inferred with more realistic, continuously curving fault zones.

450
Taking the geophysical models, either the spherical or the cross-sectional, with n = 3.0 or 2.8 at face value suggests that the nonlinear exponent may not be as high as preferred from experimental work for either dry olivine (3.0 to 3.6) or wet olivine (3.5 to 4.9). Since we impose a non-negativity restriction (Tarantola, 2005) on the inferred parameters, strain rate exponents less than zero cannot be recovered, and so the hypothesis for a velocity-strengthing material cannot be 455 precluded. Bercovici (1993) found that a value of n = −1 can produce plate-like behavior, but without yielding or temperature-dependence, both included here. The inferred activation energies are 530 ± 16 kJ/mol, which is in the range of dry olivine, 540 kJ/mol, but more than the expected values for wet olivine, 430 kJ/mol, respectively, in the dislocation creep regime (Karato & Wu, 1993). We see a strong trade-off with n. 460 We find yield stresses of 94 ± 16 MPa, six times smaller than those found in rock mechanics experiments (600 MPa) (Mei et al., 2010), a conclusion in line with previous studies using forward geodynamic models with generic plate motions (Zhong & Gurnis, 1996;Moresi & Gurnis, 1996).
However, this postiori distribution is not significantly different from the prior. The mean of 94 MPa is slightly smaller than the preferred values found in the global forward models of Alisic zones. The moderately higher yield stresses found here will recover more realistic estimates once we can achieve full spherical models. 470 We find an average (arithmetic mean) viscosity of the upper mantle through the entire cross section to be 6 × 10 19 Pa-s. The average samples the upper mantle from the base of the thermal lithosphere to 410 km depth, while excluding the thermal slab, and is dominated by the suboceanic environment. We note that the Pacific and Nazca Plates several hundred kilometers from the Mariana and Chile trenches are about 10 19 Pa-s, at the higher end of the 5 × 10 17 to 1 × 10 19 Pa-s 475 inferred for the Indian Plate several hundred kilometers from the Sumatra Trench using transients excited by the 2012 Indian Plate earthquake (Hu et al., 2016). In global models, it will be possible to alter the adjoint-based inference approach to incorporate priors within specific regions in the lithosphere and upper mantle where we have explicit geodetic constraints from transients induced by co-seismic, lake, and glacial loads.

480
For thermal convection with variable viscosity, Christensen (1984) has argued that a strain rateweighted viscosity, which we call η , is more appropriate since what limits convection is the viscosity in those regions undergoing the strongest deformations. The strain rate-weighted average viscosities within the Chile, Mariana, and Ryuku hinge zones are 4.0, 5.0, and 2.6×10 22 Pa-s, respectively, all about the same as the 6 × 10 22 Pa-s inferred from simple bending models of linearviscous plates (Buffett & Rowley, 2006). Whether η is more appropriate for the asthenosphere is not entirely certain, but it is the most sensible for the hinge zone where the forces transmitted to the subducting plates are strongly influenced by resistance from the bending plate that yields while it bends.
High resolutions of the computational mesh are required to resolve the weak zones (faults), 490 which are characterized by an intrinsic material property (e.g., the coupling factors). Additionally, the hinge zones, where the bending plates yield and drive rapidly moving microplates, also need to be resolved. Hence, high-resolution forward models are essential for meaningful inference results. In addition, the parameterization of data errors can strongly influence the proper inference of small-scale features such as back-arcs, fault zones, and bending plates. The computational experiments with different data errors allow us to demonstrate how the material properties change between cases where the microplates sufficiently contribute to the objective function of the inverse problem versus when they have relatively small weights compared with larger plates. A small constant data error can lead to overfitting. This motivated us to vary data error according to the square root of plate size, allowing us to mitigate over fitting and achieve the back-arc motion. The global 500 yield stress can change substantially when such back-arc extension (trench roll-back) is resolved.
How this will influence parameter estimation with a global, fully spherical geometry awaits to be demonstrated.
The coupling factors between the subducting and overriding plates, w i , along with the associated stresses have been inferred. The coupling coefficients are intrinsic parameters whereas 505 stresses are extrinsic. We find that two weak zones (Mariana and Chile) have small values of about 10 −5 with large variances (Supplementary Material) while the Ryuku weak zone is substantially larger at about 10 −3 with a smaller variance. This statement is independent of the yield stress and the nonlinear exponent, inspite of the large posteriori variance on both quantities. Both yielding and shear thinning can weaken the boundary between the plates, but the inference algorithm is 510 finding coupling factors that vary by nearly two orders of magnitude. However, the effective viscosities and stresses within the shear zones do not vary by nearly this amount (Fig. 7). Although the total variation of the stresses in the upper mantle is about 10 3 (from ∼ 0.1 MPa to ∼ 200 MPa), the variation within similar dynamic elements (e.g., plate interfaces, plate interiors, slabs, and the central parts of the asthenosphere) are all much smaller (Fig. 5D). This is significant for 515 understanding the occurrences of great earthquakes and other tectonic processes because despite substantial differences in the material properties of megathrust (such as differences in sediment thickness between trenches (Ruff, 1989;Heuret et al., 2011)) or in the strength of potential regional tectonic controls (such as plate age and convergence rates (Ruff & Kanamori, 1980)), the plate-mantle system will deliver nearly constant stresses to similar tectonic elements because of 520 the strong nonlinearity in the rheology of the system (through both n and σ y ).
Caution must be exercised in interpreting the geophysical significance of the difference between the large values for Ryuku and the small values for Mariana and Chile because of the two-dimensional nature of the forward model. The Ryuku slab only partially penetrates through the Philippine Sea Plate (especially since we placed no constraint on the velocity of the Eurasian Plate, Fig. 5). The inferred shear stresses within the weak zones are consistently smaller than those of the normal stresses, consistent with models of the seismogenic process that usually start with the premise of a frictional material with shear stresses being a fraction of the normal stress (Scholz, 1990).

530
The inference results illustrate the substantial trade-offs expected between the global rheological and local coupling parameters and point to the strength of the approach. The adjoint-based inference method described here has been implemented in a finite element code that scales to millions of cores on parallel supercomputers (Rudi et al., 2015). It has been used extensively in ultra-high-resolution full spherical models for the present day  and the geo-535 logical past (Hu et al., 2021). This use suggests that such high-resolution inference of the highly nonlinear plate-mantle system are within reach.

ACKNOWLEDGMENTS
This work was partially supported by the U.S. National Science Foundation (NSF) through awards EAR-1645775 and EAR-2009935 to MG and EAR-1646337 and DMS-1723211 to GS. This ma-540 terial also was based in part upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357. Computations carried out on the Texas Advanced Computing Center (TACC) Stampede-2 supercomputer using allocation TG-DPP130002 of the Extreme Science and Engineering Discovery Environment (XSEDE) supported by NSF grant ACI-1548562.

DATA AVAILABILITY
The authors declare that all other data supporting the findings of this study are available within the paper and its Supplementary Material files. The code (Rhea) used for the forward models and the inference is available upon request.