A mixed, uniﬁed forward/inverse framework for earthquake problems: fault implementation and coseismic slip estimate

technique. We demonstrate the potential of this new computational framework by performing a linear coseismic slip inversion through adjoint-based optimization methods, without requiring computation of elastic Green’s functions. Speciﬁcally, we consider a penalized least squares formulation, which in a Bayesian setting—under the assumption of Gaussian noise and prior— reﬂects the negative log of the posterior distribution. The comparison of the inversion results with a standard, linear inverse theory approach based on Okada’s solutions shows analogous results. Preliminary uncertainties are estimated via eigenvalue analysis of the Hessian of the penalized least squares objective function. Our implementation is fully open-source and Jupyter notebooks to reproduce our results are provided. The extension to a fully Bayesian framework for detailed uncertainty quantiﬁcation and non-linear inversions, including for heterogeneous media earthquake problems, will be analysed in a forthcoming paper.

we will extend this linear inversion to a fully Bayesian framework for detailed uncertainty quantification and provide non-linear inversions, for example for heterogeneous material parameters, for earthquake type problems.
This manuscript is structured as follows. We first present a brief description of the forward-inverse framework in Section 2, and then compare the new fault implementation within the mixed FE elastic formulation to the standard displacement method and to the split node technique in Section 3. Next, we address the common coseismic slip problem as an example application and compare our inversion results to the classic linear approach using Green's functions (Section 4). We conclude in Section 5 by discussing capabilities and limitations of our approach.

F E n i C S -h I P P Y l i b F R A M E W O R K
Our open-source forward-inverse modelling framework is based on two advanced numerical libraries, FENICS and HIPPYLIB. FENICS (Logg & Wells 2010;Logg et al. 2012) is a high-level parallel FE collection of software components for automated and efficient solution of PDEs. It includes several libraries for the FE discretization, assembly and solution of linear and non-linear systems of equations. In FENICS, any PDE can be explicitly and easily expressed in variational form using the Unified Form Language (Alnaes et al. 2014) Python library. This makes a problem coded in this framework transparent, reproducible, flexible for multiphysics formulations and easy to implement. The variational forms of these equations can then be automatically discretized, converted and assembled into low-level C++ codes using the FENICS form compiler (Kirby & Logg 2006) and the high performance library DOLFIN (Logg & Wells 2010). The latter provides the user interface and integrates all other computational components, communicating with external libraries such as PETSC (Balay et al. 1997(Balay et al. , 2020 and TRILINOS (Heroux et al. 2005) for the numerical linear algebra, SCOTCH (Pellegrini 2008) for the mesh partitioning, and MPI (Gropp et al. 1999) and OpenMP (Dagum & Menon 1998) for parallel computing. FENICS is well tested and benchmarked through several available demos and applications in the Earth sciences (e.g. Vynnytska et al. 2013;Rhebergen et al. 2015;Tosi et al. 2015;Damiani et al. 2020;Haagenson et al. 2020). For instance, Vynnytska et al. (2013) provided 2-D/3-D benchmarks for mantle convection problems, and  conducted a wide range of geodynamic benchmarks using TERRAFERMA, including for subduction zone thermal structure (van Keken et al. 2008).
This advanced framework can interoperate with the HIPPYLIB package (Villa et al. 2016(Villa et al. , 2018. Built on FENICS and PETSC for the discretization of the PDEs and scalable linear algebra operations and solvers, respectively, this library implements state-of-the-art scalable adjoint-based algorithms for PDE-based deterministic and Bayesian inverse problems. In HIPPYLIB, derivative information-that is gradients and actions of the second derivative of objective functions (Hessian)-are efficiently computed using the adjoint method while leveraging the automated symbolic differentiation and assembly of variational forms in FENICS. These are essential ingredients not only for the solution of the deterministic inverse problem but also for uncertainty quantification.
HIPPYLIB preserves all of the flexibility of the underlying libraries, allowing solution of linear and non-linear and stationary and time-dependent PDE-based systems of equations. It provides a collection of functions for deterministic and Bayesian solution of inverse problems, accelerated by adjoint-based gradient and Hessian operations. While in a deterministic inversion the result is found by a least squares minimization for the 'best' model parameters, a Bayesian framework provides a posterior distribution of likely values within a range, thereby quantifying uncertainties and trade-offs in information. Algorithms to solve linear and non-linear deterministic inverse problems in HIPPYLIB make use of common kernels, such as randomized singular value decomposition (SVD) methods . In the linear case, for example, the solution of the deterministic inverse problem is found using conjugate gradients (CG), while for the Bayesian inverse problem, the posterior is Gaussian with mean given by the solution of the deterministic problem and covariance by the inverse of the Hessian. In the case of non-linear inverse problems, the deterministic inversion is solved by use of inexact Newton-CG, while Bayesian solution is computed using geometric Markov chain Monte Carlo (MCMC) methods (Beskos et al. 2017), which use Hessian information to accelerate sampling.
Issues such as high-dimensionality [O(10 6 ) parameters] of large non-linear geophysical problems, highly concentrated posterior distributions (Baumann et al. 2014;Baumann & Kaus 2015;Gallovič et al. 2019), and the slow convergence of Monte Carlo methods have made Bayesian inversion for complex problems intractable using methods such as black-box MCMC. HIPPYLIB efficiently overcomes these challenges by exploiting the intrinsic low dimensionality of the parameter-to-observable map of the problem (Flath et al. 2011;Bui-Thanh et al. 2012;Isaac et al. 2015;Wang et al. 2018;Chen et al. 2019), and by exploiting posterior geometry via adjoint-based gradient and low-rank Hessian information (e.g. Bashir et al. 2008;Martin et al. 2012;Petra et al. 2014;Bui-Thanh & Ghattas 2015;Alexanderian et al. 2016a;Beskos et al. 2017). These techniques require a number of forward model solves that is independent of the parameter or data dimension (as opposed to, for example, gradient-only or derivative-free methods), and depend only on the intrinsic information contained in the data about the model (e.g. Bui-Thanh et al. 2012, 2013Isaac et al. 2015). Given a vector field f(x) for each point x ∈ indicating body forces, the equation of linear elasticity in the Hellinger-Reissner form seeks to find the stress σ and displacement u which satisfy the constitutive and the linear momentum equations, including the boundary conditions (e.g. Arnold 1990; Arnold et al. 2007): where A = A(x) is the fourth-order elastic compliance tensor, which is a symmetric and positive definite linear operator M → M describing the material properties of the medium. ε(u) = sym(∇u) = 1 2 ∇u + (∇u) T is the strain tensor, which is the symmetric part of the gradient deformation tensor, u 0 is the imposed displacement values at the boundaries, n denotes the outward unit normal of ∂ , and t is the traction. In the case of an homogeneous and isotropic elastic material, the compliance A(x) depends only on the two Lamé coefficients, the shear modulus μ and λ: where I is the d × d identity matrix, and tr(σ ) is the trace of the stress tensor. The formula above relies on the symmetry of the stress tensor and the invertibility of the stress-strain relation (e.g. Rognes & Winther 2010).
In this work, we are interested in simulating the cosesmic slip s along a fault plane F in , as shown in Fig. 1. For a fault discontinuity F , let n + and n − be the two unit normal vector fields on F with opposite directions (n + = −n − = n F ). For example, n + represents the outward unit normal vector from the '+' side pointing towards the negative side of the fault. Then using the above notation, we write the interface conditions at the fault interface as where φ = φ + − φ − is the jump operator, and T (n F ) is a tangent operator which allows to take the component of the displacement u parallel to the fault plane F . Given the normal vector n = (n 1 , n 2 , n 3 ), we define the vector T (n) = (n 2 , −n 1 ) in 2-D and in 3-D the 2 × 3 matrix as where these two expressions can be found by decomposing the unit normal over the basis. In 2-D we look for a vector orthogonal to a given normal, and in 3-D we first get a plane and then find an orthogonal pair.
To derive a variational formulation of the strong form in eq. (1) with the fault interface condition stated in eq. (3), we first introduce the Lagrange multiplier r = skew(∇u) = 1 2 (∇u − (∇u) T ) from the space of skew-symmetric matrices K, which has a physical meaning of rotation (Fraeijs de Veubeke 1975), and substitute ε(u) = ∇u − r. Then, we construct the variational form by taking the dot product of eq. (1) Downloaded from https://academic.oup.com/gji/article/230/2/733/6524187 by University of Texas at Austin user on 13 April 2022 with weighting functions τ ∈ , ω ∈ W and ξ ∈ , and setting the integral over equal to zero: where τ , ω, ξ are the test functions for stress, displacement and rotation, respectively, and the spaces are defined as = {τ ∈ H (∇·, , M) : , which represent the space of square-integrable matrices fields with square-integrable divergence satisfying the traction boundary conditions, and the spaces of all square integrable vector fields (Arnold 1990).
We define the asymmetry operator as(σ ) = (σ 12 − σ 21 ) in 2-D and as(σ ) = (σ 32 − σ 23 , σ 31 − σ 13 , σ 21 − σ 12 ) T in 3-D. The restriction on the stress space along F is effectively satisfied when the fault is resolved by the computational grid. The last equation of eq. (5) is necessary in order to enforce the symmetry of the stress tensor weakly (Arnold et al. 1984a;Stenberg 1988;Farhloul & Fortin 1997;Arnold et al. 2007;Boffi et al. 2009;Cockburn et al. 2010). Note that the Lagrange multiplier r is a scalar and vector field in 2-D and 3-D, respectively. The integration by parts of the non-conforming term in the first equation gives where ds and dS represent the integration of the integrand over the external and internal boundaries, respectively. We recognize that the traction boundary condition (σ · n = t) becomes essential, hence it has to be imposed a priori onto the function space where the stress tensor is sought, while the displacement boundary condition (u = u 0 ) arises naturally from the weak form derivation. Decomposing the last term in eq. (6) into its normal and tangential (relative to the fault F ) components results in where we have used the slip definition (eq. 3) and the fact that τ ∈ to obtain the last equality. Hence, the variational formulation of the elasticity equation in this mixed form reads: seek (σ , u, r) ∈ × W × such that where we use the superscript ' + ' to indicate the side of the fault where the slip is prescribed (Fig. 1). In this scenario, the slip s will be negative for a thrust fault and positive for a normal movement. Vice versa, we could also write a negative superscript for the unit normal, but in this case we would need to take a positive sign of the slip to indicate a reverse movement. We will see in the next section how the fault problem formulation in eq. (8) can be implemented by using a stable pair of function spaces to discretize the displacement field with discontinuous piecewise polynomial elements, thus allowing relative motions between two adjacent cells. This approach, which naturally arises from the integration by parts in the derivation of the variational formulation, is contrasted with the split node (Melosh & Raefsky 1981) and the decomposition (e.g. Aagaard et al. 2013) techniques. While in the first method the slip is explicitly prescribed on both sides of the fault through a modification of the local force vector, in the decomposition approach the slip is imposed as double couple point sources via a Lagrange multiplier. Our approach differs from the methods above in the sense that the slip is prescribed, at the infinite-dimensional (continuous) level, directly as a constraint in the displacement field, without applying a proper traction or requiring local modifications of the force vector. In addition, our approach does not introduce additional unknown variables at the fault nodes as for example PYLITH (Aagaard et al. 2017) does, which generally requires particular techniques for the matrix solution (Aagaard et al. 2013). Another advantage of our approach is that the slip vector appears explicitly in the weak form, which allows derivation of the gradient for the inverse problem at the continuous level, leading to consistent discretizations of all field variables, and inversion for the slip distribution without having to compute elastic Green's functions (Section 4). Lastly, the calculation of the stress as primary variable may allow to impose constraints on the fault tractions, allowing the slip to be consistent with fault constitutive models, with potential applications to dynamic earthquake problems. First, second and third degree kth of BDM k × DG k − 1 × DG k − 1 elements for the discretization of stress, displacement and rotation, respectively. Left-hand side: Brezzi-Douglas-Marini (BDM k ) and Discontinuous Lagrange (DG k − 1 ) elements for triangles (2-D). Right-hand side: same as above, but for tetrahedra (3-D). Black dots indicate the degrees-of-freedom (DOFs) within each element. Arrows normal to an edge or a face denote DOFs associated to normal components of vector fields along that edge or face. Gray circles and dots of BDM elements indicate bubble functions, that is functions with vanishing normal trace on the boundary of the element. For the discontinuous Lagrange elements, all the DOFs are internal to the elements.

Mixed finite element method
Next, we present the mixed FE approximation of eq. (8). Mixed FE methods are a type of FE method in which additional fields are introduced as unknowns in the formulation of a PDE problem, often leading to a so-called saddle point problem. In the case of elasticity (Arnold 1990), the stress field is introduced in the formulation in addition to the displacement field (the primal variable); the resulting mixed form is in contrast to the traditional displacement form. There are several studies that analyse the choice of stable finite-element spaces h , W h and h to discretize stress, displacement and rotation, respectively (e.g. Arnold et al. 2007;Falk 2008;Cockburn et al. 2010;Rognes & Winther 2010;Ambartsumyan et al. 2020). The most common choices are those introduced by Arnold et al. (2007), where the lowest order elements are the union of linear vector polynomials with continuity of normal components over the element facets for the stress Nédélec 1986), and piecewise discontinuous constants for the displacement and the rotation (Fig. 2, top panel).
For an arbitrary polynomial degree k and considering a 2-D/3-D finite-element discretization T h of the domain , the kth order elements for the elasticity equation are where BDM k represent the kth order of Brezzi-Douglas-Marini elements  which are vector polynomials with continuity of normal components over inter-element facets (Fig. 2). The choice to use piecewise discontinuous Galerkin elements DG k − 1 to discretize the displacement vector field allows to prescribe relative motion between two adjacent cells on the fault plane. Therefore, the mixed FE approximation of eq. (8) can be written as: where h is the discretized domain. The mesh conforms to the fault geometry, that is h, F is the union of facets which align with the fault geometry in the triangulation of h . This system of equations has a unique solution both on continuous and discrete levels. The choice of the spaces h , W h and h in eq. (10) provide stable finite-element approximations (e.g. Arnold et al. 2007), which satisfy the stability conditions in the Brezzi's theory of mixed methods (Brezzi 1974). Finally, the system possesses the same order accuracy for all variables in their corresponding norms (e.g. Arnold et al. 2007;Cockburn et al. 2010;Ambartsumyan et al. 2020), and advantage over the displacement formulation, which yields one order lower stresses.

Solution of saddle-point type systems
In matrix-vector form, the system Ax = b of eq. (8) can be written as ⎛ ] d S and (M ) i j = ω i · ω j dx represent the FE basis of the spaces to discretize the slip (ψ j ) and source term, respectively. As all problems with Lagrange multipliers, eq. (11) exhibits a saddle-point structure, making the system Jacobian A indefinite, possessing both positive and negative eigenvalues.
For 2-D problems, the solver of choice is usually a sparse direct solver that can handle indefinite matrices, such as those implemented in UMFPACK, MUMPS, LU, SuperLU and STRUMPACK. In 3-D, preconditioned Krylov methods, such as MINRES or GMRES, can outperform parallel direct solvers. Other techniques to solve eq. (11) involve hybridization or static condensation techniques, such as generalized displacement methods (Fraeijs de Veubeke 1965;Arnold & Brezzi 1985), domain decomposition approaches (Khattatov & Yotov 2019) and elimination of the degrees of freedom of the stress around the vertices (e.g. Ambartsumyan et al. 2020). For example, Ambartsumyan et al. (2020) use a vertex quadrature rule for the stress bilinear form which allows for local eliminations of the stress in the case of DG 0 for rotation, resulting in a cell-centred displacement-rotation system, or eliminating both stress and rotation in the case of DG 1 which leads to a displacement-only cell-centred system. Several pre-conditioners are available to efficiently solve eq. (11) using iterative methods (e.g. Klawonn & Starke 2004;Wildey & Xue 2013;Baerland et al. 2017;Rees & Wathen 2021). These advanced physics-based pre-conditioners have been shown to outperform simpler Schur complement-based pre-conditioners leading to solution approaches that are robust with respect to variations of model parameters as well as refinements of the discretization (Klawonn & Starke 2004;Baerland et al. 2017). For instance, Klawonn & Starke (2004) suggested an efficient block diagonal pre-conditioner for the MINRES iterative solver to solve the mixed FE formulation of linear elasticity, resulting in mesh-independence and uniform convergence in the incompressible limit. However, some of the implementation of these methods is not trivial. For simplicity in this work, we use an efficient direct solver to avoid nuances in the implementation of such complex methods.

Advantages and disadvantages of mixed FE methods
The classic approach to construct a FE discretization for the elasticity equation is represented by the pure displacement formulation method. Since the compliance material tensor A is invertible, the stress σ can be eliminated by substituting the constitutive relationship into the conservation of linear momentum. After taking the dot product of the governing equations with a test function υ and setting the integral over equal to zero, the weak form in this pure elastic displacement formulation reads where ϒ = {υ ∈ H 1 0 ( , V) : υ = 0 on N } is the space of square integrable vector fields on , with square integrable derivatives which vanish on the Neumann boundary N . In general, the vector function space ϒ is approximated by ϒ h = CG k (T h , V), which are the kth order continuous piecewise polynomials belonging to the Lagrange elements. Note that in this case the displacement boundary condition is essential while the traction condition becomes natural.
A FE approach based on this pure displacement formulation is standard and found in many textbooks (e.g. Ciarlet 2002). However, such formulation is not preferable for more complex models in viscoelasticity (e.g. Rognes & Winther 2010), poroelasticity (e.g. Baerland et al. 2017), plasticity (e.g. Johnson 1977) and Stokes problems (e.g. Stenberg 1984), where the stress-strain relation is not local and the stress variable σ cannot be eliminated (Arnold et al. 2007).
Another advantage of this mixed FE approach is that the stress is computed with one order higher accuracy than the pure displacement formulation (see Section 3.3). For pure displacement formulations, instead, the stress variable must be obtained a posteriori by differentiation leading to a loss of accuracy (Arnold 1990). The mixed formulation also results in conservation of momentum at the element level, not just at the global level. It is also well-known that the standard displacement discretization is not robust in the incompressible and nearly incompressible case, that is as λ → ∞ (Arnold et al. 1984b). While the elastic compliance tensor A is bounded as Poisson's ratio ν → 1 2 , its inverse blows up. For the elastic mixed method, instead, the compliance A tends to a limiting value (Arnold 1990).
There are also some disadvantages to these mixed approaches in comparison with displacement methods. While displacement methods typically lead to positive definite algebraic systems, we saw that the discretized system for the mixed method is indefinite, which for large scale problems requires specialized pre-conditioners in combination with iterative solvers (Section 3.1.1). Finally, because both stress, displacement and rotation are calculated simultaneously, the discrete mixed system generally involves more degrees of freedom than displacement approaches (see Section 3.3.1).

Benchmarks and performance
We now compare the elastic solution between the mixed finite-element approach (MF) and the pure displacement formulation (DF). We compute the corresponding convergence rate and analyse their performances. Then, always within the same FENICS framework, we compare the fault implementation described in Section 3 with the split node technique (Melosh & Raefsky 1981) in the case of an in-plane crack mode II. We also compare both results with analytic solutions (Pollard & Segall 1987).
The simulations and the convergence tests are performed using a single core of a laptop (eight-core Intel I9-9880H machine running at 2.4 GHz with 32 GB of RAM). We use the sparse LU factorization provided by Multifrontal Massively Parallel Solver (MUMPS) to solve our systems of equations (Amestoy et al. 2001(Amestoy et al. , 2019. For simplicity and because of the 2-D nature of our tests, we choose to use this robust and accurate direct solver rather than Krylov iterative solvers to fairly compare the performance of both formulations without preconditioning the systems.

Verification
To verify the accuracy of the mixed FE approach and compare it to the standard pure displacement formulation, we create an exact solution by applying the method of manufactured solutions (Roache 2002;Oberkampf & Roy 2010). Considering a 2-D connected domain ⊂ R 2 with boundaries ∂ = D , the boundary value problem of the linear elasticity in the pure displacement formulation, with vanishing Dirichlet boundaries, reads The forcing vector f can be calculated such that the exact solution is given by where u ex is the exact solution for displacement, and the stress σ ex is recovered by substituting the exact displacement into eq. (13). Appendix A provides a derivation of the source term f and stress σ ex expressions. Now that we have the exact solution of our variables, we can verify and compare the accuracy between our mixed method and the pure displacement approach. In order to do so, we consider a unit square := [0, 1] × [0, 1] with vanishing Dirichlet boundary conditions. We build the triangular mesh directly in FENICS using the built-in mesh function (Fig. S1). The body force f is then determined using Lamé coefficients μ = 1.0 and λ = 2.0. To fairly compare the two approaches, we choose to keep the same order of accuracy for the displacement variable. For linear elements, we discretize u with first order continuous polynomials for the pure displacement formulation, and k = 2 for the mixed method. In the latter, the displacement field is discretized with linear discontinuous piecewise elements (DG 1 , Fig. 2). Fig. 3 shows the comparison of the displacement and stress magnitudes between the difference of the pure displacement approach and the mixed method with respect to the exact solution (eq. 14) for cell size h = 1/512. We estimate the stress magnitude as the Frobenius norm of the stress tensor, defined as the square root of the sum of the absolute squares of its components.
While the displacement is comparable between the two different methods (same colour scale), the stress is not. As analysed by many studies (e.g. Arnold et al. 2007;Cockburn et al. 2010;Ambartsumyan et al. 2020), the mixed finite-element exhibits one order higher accuracy than the pure displacement formulation. Moreover, the error pattern differs between the two formulations for the magnitude of displacement and stress. While in DF the main displacement difference is concentrated at the four peaks (top centre in Fig. 3), in the mixed method the error is slightly lower and more evenly distributed (top right). For the stress, the error distribution between DF and MF is quite different. The error for the pure displacement formulation is three orders of magnitude larger and is mainly concentrated in the slopes around the peaks (bottom centre), while for the mixed method it is mostly focused at the corners of each quadrant (bottom right).
Given the exact solution and mesh size h, we can also compute the convergence rates of the two different elastic methods. For polynomials of order k, we expect the error to be O(h k+1 ) and O(h k ) in the L 2 and H 1 norm, respectively. Fig. 4 shows the convergence rates for linear and quadratic elements for the displacement field. The theoretical rates in (a), (b), (e) and (f) are denoted with dashed lines. All rates are in agreement with the theoretical expectations, as indicated in (a), (b), (e) and (f). The mixed method possesses the same order of accuracy for both stress and displacement variables in their corresponding norms for both linear and quadratic elements (Arnold et al. 2007;Cockburn et al. 2010;Ambartsumyan et al. 2020). Although the order of convergence for displacement is the same, the mixed approach is slightly more accurate than the standard elastic method for the same discretization.
Figs 4(c), (d), (g) and (h) show the total computational time versus the error of the displacement (top) and stress (bottom) for linear and quadratic elements, respectively. Dashed lines indicate linear trends, whose rates are shown in the corresponding legend. For a target fixed accuracy in the displacement unknown the run times of DF and MF are comparable, the former being slightly faster than the latter. This may Top left to right: displacement magnitude of the exact solution, and the absolute difference between DF and MF with respect to the exact solution, respectively. Bottom left to right: same as above, but for the magnitude of the stress field. Note that the colour scale for the displacement difference is the same for DF and MF, while for the stress it differs by about three orders of magnitude. Linear elements for displacement are used in all computations, and cell size h = 1/512. The stress magnitude results are normalized by the maximum exact values. be due to to the greater number of degrees of freedom (DOFs) that the mixed system has (stress, displacement and rotation) than DF. Fig. S2 shows the total number of DOFs and computational time as a function of the mesh refinement. We can see that the number of DOFs for the mixed method are greater than those for the pure displacement formulation, with a ratio MF/DF of about 4.8 and 2.9 for linear and quadratic elements, respectively. This difference leads the computational time for MF to be about 6.1 and 5.8 times higher than DF.
However, for a fixed target accuracy in the stress variables, the run time of MF is several order of magnitude faster than DF, as seen in Figs 4(g) and (h). This demonstrates that the proposed MF formulation is preferable when accuracy in the stress variables is needed. For the very fine mesh and quadratic elements, the run time for the mixed method seems to slightly deviate from the trend. This may be due to loss of performance of MUMPS in solving very large problems (about 24M of total DOFs).

Comparison with analytical solution for a mode II crack
To test the implementation of a fault discontinuity, we consider the case of an in-plane shear crack, mode II (e.g. Lawn & Wilshaw 1975). Analytic expressions for the displacement and stress fields are in the form of (e.g. Pollard & Segall 1987;Segall 2010;Scholz 2019) where μ and ν indicate the shear modulus and Poisson's ratio, respectively, r is the distance from the crack tip into the crack, and θ is the angle measured from the crack plane. K II is the stress intensity factor for mode II and depends on the geometry and magnitudes of the applied loads. Both K II and the functions f i (θ) and f ij (θ) can be found in standard references (e.g. Tada et al. 1973;Lawn & Wilshaw 1975). provides the analytic expressions of the displacement and stress fields for a crack mode II used in this study and reported in Pollard & Segall (1987). Besides with the analytic solution, we compare the mixed method results with the pure displacement approach, in which the relative motion along crack walls is modelled by using the split node technique. For the implementation of the split node method within the FENICS framework, we follow the strategy first suggested by Melosh & Raefsky (1981).
For the comparison between the different approaches, we consider a domain := [0, 4] × [0, 4] with a crack of unity length located at the centre of the domain (Fig. S3). We use the FENICS built-in function to build the mesh, and apply the analytic displacement (u = u an ) at the boundaries (eq. B2). For the elastic properties, a shear modulus of μ = 1 and Poisson's ratio of ν = 0.25 are used throughout the computations. We apply a unity stress drop, which leads to an elliptical solution for the slip (Appendix B, Pollard & Segall 1987;Scholz 2019). We prescribe a left-lateral movement, and perform the simulations using linear elements. Fig. 5 shows the comparison of the displacement magnitude and the mean normal stress between the analytic, the pure displacement method with the split node technique, and the mixed approach, for cell size h = 1/128. To make the absolute difference of DF and MF with respect to the analytic solution meaningful (second and third columns), we normalize the displacement magnitude by the maximum analytic slip (top panel), and the normal stress by its maximum analytic value (bottom panel). Both the mixed method and the pure displacement formulation with the split node technique are in good agreement with the analytic solution, with errors less than ∼0.05 per cent for both displacement and stress. MF gives better results than DF. For the stress, this is expected since the mixed method has an order of accuracy higher than the standard displacement approach (Arnold et al. 2007;Cockburn et al. 2010;Ambartsumyan et al. 2020). The main differences are focused on the crack tips. In the pure displacement method with the split node technique, the systematic error is likely related to the stress singularities at the crack tips since the elastic solution is computed at the vertices. This problem can be overcome by the mixed method because the computation of the displacement solution occurs within each cell (Fig. 2). We note a small difference at the crack tips, but this discrepancy is below 0.02 per cent for both variables. Similar results are also visible in Figs S4 and S5, where we compare all the components of the displacement and stress field.
Lastly, we compute the convergence rates for the MF with the fault implementation described at the beginning of the section and DF with the split node technique. In order to avoid the systematic discrepancies around the crack tips (Fig. 5), we remove a 0.5 × 0.5 block around the discontinuity, which corresponds to one cell in the coarser mesh case. We refine the mesh, and compute the error L 2 and H 1 norm for the displacement and stress, respectively, as we did for the manufactured solution case. Fig. 6 shows the comparison of the convergence rate between the mixed method (red) and the pure displacement formulation with the split node technique (blue), using linear elements. Empty and full symbols indicate the integration of the error over the entire domain and excluding the 0.5 × 0.5 block around the crack, respectively.
Considering the entire domain, the convergence rates do not agree with the theoretical ones (dotted lines), likely due to the systematic errors around the crack tips (Fig. 5). In particular, for the displacement the error decreases as the mesh refines at the rate of about 0.45 for Downloaded from https://academic.oup.com/gji/article/230/2/733/6524187 by University of Texas at Austin user on 13 April 2022 Figure 5. Comparison of the displacement magnitude and mean normal stress between the pure displacement formulation with the split node technique (DF), the mixed method with the fault implementation described in Section 3 (MF), and the analytic solution for a shear crack, mode II (Pollard & Segall 1987). Top left to right: displacement magnitude of the analytic solution, and the absolute difference between DF and MF with respect to the analytic solution, respectively. Bottom left to right: same as above, but for the mean normal stress. Linear elements for displacement are used in all computations, and cell size h = 1/128. The displacement magnitude and the mean normal stress results are normalized by the maximum analytic slip and by the maximum analytic value, respectively. both DF and MF. For the stress, instead, the convergence rate is negative. This opposite trend may be related to the presence of the stress singularities at the crack tips. Refining the mesh, the singularity may be better approximated by the stress solution increasing the corresponding error. Removing the small area around the discontinuity in the error calculation, the convergence rates significantly improve, approaching the theoretical ones. For the displacement, while the MF follows the theoretical curve, the convergence for the DF with the split node technique is slightly lower than the theoretical one (1 1 2 instead of 2). This small discrepancy may be related to the effect of the displacement around the crack tips beyond the small region around the discontinuity that we excluded in the error calculation (cf. Fig. 5).

T H E I N V E R S E P RO B L E M : C O S E I S M I C S L I P D I S T R I B U T I O N
We now pose the inverse problem: given a set of surface displacement data recorded with different observations (e.g. GPS, InSAR, etc.), we seek to find the fault slip responsible for such deformation. One of the strengths of the proposed FENICS-HIPPYLIB framework is that the forward and inverse problems can be performed within the same infinite-dimensional (continuous) formulation, in a flexible, transparent and easily extensible way. Although the HIPPYLIB library contains many algorithms to solve the inverse problem in a Bayesian fashion to better quantify model uncertainties, here we show an application of this new framework to a classic earthquake problem, the linear deterministic inversion for the coseismic slip distribution. However, we stress that the transition between the two inverse formulations is straightforward (e.g. Isaac et al. 2015) and all the underlying algorithms are contained in HIPPYLIB .
The reason for initially addressing the linear slip inversion problem is twofold: first, we want to exploit our treatment of the fault discontinuity within the mixed FE approach, since the slip appears directly in the right-hand side of the constitutive law (eq. 8) after integration by parts. This fault implementation presents a straightforward path to derive the gradient corresponding to the slip field without discretizing the fault a priori (which avoids differentiating through numerical artefacts) and computing the Green's functions. Secondly, we can compare our results with the standard approach of inverting the matrix of Green's functions (Okada 1992) to solve the linear inverse problem. Although HIPPYLIB automatically computes gradient and Hessian information by applying symbolic differentiation to the variational form of the forward problem (eq. 8), in the next section we describe the adjoint method to derive gradient and Hessian actions for the solution of the coseismic slip inversion problem.

The adjoint method for the coseismic slip problem
Given some discrete, noisy observations d ∈ R n obs , the goal of the inverse problem is to infer the unknown model parameter field m ∈ M that best reproduces the observations. Mathematically, this relationship can be written as where F : M → R n obs is the parameter-to-observable map, describing the process that predicts the data for a given parameter m, and m is the slip field s in eq. (8) for the linear coseismic slip inversion. η indicates additive noise due to uncertainties in the data and model errors (Tarantola 2005). In HIPPYLIB, the noise is modelled as a Gaussian distribution η ∼ N (0, noise ) centred at 0 with covariance noise . The mapping F is given by a linear or non-linear observation operator B(ϕ) : U → R n obs that extracts the observations from the states ϕ ∈ U, where ϕ depend on m via the solution of the forward problem or state equation. In the case of the coseismic slip inversion, the states ϕ correspond to displacement, stress, and rotation, respectively, and the mapping F(m) is linear since the slip appears linearly in eq. (8). This mapping can be discretized as Fm = BA −1 Mm, where F is the discretized parameter-to-observable map, and m is the slip vector. B is the discretized observation operator B, which evaluates the displacement u at the observation locations. A and M = (M F , −M , 0) T are the mixed elasticity and mass matrices of eq. (11), respectively. In our case, the source term f is zero, hence M vanishes. The main challenge of solving eq. (17) is that, in the general case of F governed by PDEs with infinite-dimensional parameters, the inverse problem is ill-posed, that is the solution is not unique and highly sensitive to errors in the data (Hadamard 1923;Tikhonov & Arsenin 1977;Engl et al. 1996). To overcome this issue, we usually regularize the problem by including additional information on the solution, such as smoothness (Vogel 2002).
In general, we can formulate the linear inverse problem as follows: given a set of finite-dimensional noisy measurements d ∈ R n obs , we seek to find the model parameter m which can predict the data within the noise tolerance. This translates into solving the following optimization problem: Here, the cost functional J (m) consists of two terms. The first is the misfit between the observations d and those predicted by the mapping F(m), weighted by the inverse of the data noise covariance −1 noise . The second term R(m) is the regularization, which penalizes oscillatory components of the model parameter m by imposing some sort of regularity, such as smoothness. In the case of the coseismic slip, we may use some type of Tikhonov regularization (Phillips 1962;Tikhonov 1963) that penalizes, for example, the gradient or the second derivative of the model parameter, allowing the solution to vary smoothly.
We propose to infer the fault slip by solving the optimization problem of eq. (18). In general, to efficiently solve this linear least-squares problem, first (gradient) and second derivative (Hessian) information of J (m) are needed. In our case, only the gradient depends on the model parameter, since the forward problem is linear, while the Hessian is independent of m. The gradient expression can be derived by using the Lagrangian formalism (Tröltzsch 2010). For the linear elastic problem, the Lagrangian functional for the infinite-dimensional gradient L G , in variational form, reads where the first term is the data misfit, and the second is a linear combination of a H 1 and L 2 -type Tikhonov regularization that penalizes the H 1 ( ) and L 2 ( ) norm of (m − m 0 ), respectively. m 0 is a reference model parameter, while γ and δ represent smoothing weights. The other terms form the residual of the forward PDE model (eq. 8), where τ , ω and ξ represent auxiliary variables, called the adjoint variables, for the stress, displacement and rotation, respectively. We have replaced the slip s of eq. (8) with the unknown parameter field m we seek to infer from the data. Note that, although the slip is defined on F , we need to integrate the model parameter on the entire domain . This is due to a current limitation in FENICS to manage function spaces defined on different meshes (e.g. domain and fault). By setting to zero the variation of the Lagrangian L G with respect to the adjoint variables, one obtains the variational form of the forward problem (eq. 8) to find the displacement, stress, and rotation, respectively. Similarly, by setting to zero the variation of the Lagrangian L G with respect to the state variables one obtains the weak form of the so-called adjoint problem: whereσ ,ũ, andr are test functions, and B * : R n obs → U is the inverse mapping of B that maps the discrete observations back to the infinite-dimensional space of the states U. We can solve the adjoint equation above to find the adjoint variables τ , ω and ξ . In contrast with the strong form of the forward problem in eq. (1), the strong form of the adjoint problem reads where as * (ξ ) is defined such that: as(σ ) · ξ dx = σ : as * (ξ ) dx, and in 2-D takes the form of Lastly, we can derive the gradient of the cost functional by taking the variation of the Lagrangian with respect to the model parameter. The gradient of J (m) in an arbitrary directionm ∈ M, evaluated at an arbitrary point m * in the parameter space M, is the Gâteaux derivative of L G with respect to m : where we can see that the gradient is linear in m * since τ + depends linearly on u via the solution of the adjoint problem (eq. 21), and u depends linearly on m * via solution of the (weak) forward mixed elasticity problem (eq. 8). The terms in the gradient expression (eq. 23) that depend linearly on m * define the Hessian operator, whose action on an arbitrary m * entails solution of one forward (eq. 8) and one adjoint (eq. 20) mixed elasticity problem. Integrating the regularization term by parts, the strong form of the gradient expression then reads where n F = n + = −n − . In HIPPYLIB, we can either explicitly input the expression for the gradient as in eq. (23), or else let HIPPYLIB derive this expression using FENICS's symbolic capability for taking variations of weak forms. To efficiently solve the linear inverse problem (eq. 18), we use a preconditioned conjugate gradient (CG) algorithm to solve the first order necessary condition, for m * . At each CG iteration, a Hessian action must be computed, which as stated above entails solution of a pair of forward/adjoint mixed elasticity problems. Preconditioning the system by the inverse of the regularization operator R transforms the Hessian into the sum of a compact operator (its eigenvalues accumulate at zero) and an identity operator , for which CG is known to converge rapidly and in a number of iterations that does not depend on the parameter dimension or the data dimension (Ghattas & Willcox 2021). Therefore, the overall cost of solving the inverse problem, measured in forward/adjoint problem solutions, does not depend on the parameter or data dimensions, and instead depends only on the intrinsic information contained in the data about the model (e.g. Bui-Thanh et al. 2012, 2013Isaac et al. 2015;Ghattas & Willcox 2021).

Comparison with standard coseismic slip inversion approach
For a linear inverse problem, we can rewrite eq. (17) in its discrete form, omitting the noise term, as where the data kernel G relates the model parameter vector m = (m 1 , m 2 , ..., m M ) T to the finite-dimensional observations d = (d 1 , d 2 , ..., d N ) T . For the coseismic slip problem, the fault surface is generally discretized a priori into rectangular patches, and every column of the N × M matrix G contains the surface displacements at the observation locations computed by imposing unity slip for each fault patch, using the elastic Green's functions within an elastic half-space (Okada 1992). The inverse problem defined by eq. (26) is usually ill-posed due either to having more unknown parameters than data (lack of uniqueness) or to having very small singular values (lack of stability) (Tarantola 2005). Therefore, we need to add some a priori information on m to constrain the solution (Jackson 1979). This prior knowledge can be encapsulated in a regularization term, analogous to R(m) in eq. (18).
We can solve the linear inverse problem of eq. (26) by using, for example, a weighted damped least-squares approach (Menke 2018): where L is a linear operator identified with the Tikhonov regularization R(m) in eq. (18), and β is a weighting parameter. In particular, if L is the identity, the regularization is called zeroth-order Tikhonov. If the operator is the gradient or the Laplacian, we refer to it as first and second-order Tikhonov regularization, respectively. Zeroth-order Tikhonov regularization is not commonly used; the majority of coseismic studies typically decide to penalize the gradient or the second derivative of the model parameters (e.g. Liu & Archuleta 2004;Hsu et al. 2006Hsu et al. , 2011Liu et al. 2019;Wang et al. 2020). However, this standard approach encapsulates many limitations. This method requires a priori fault discretization, and the computational time to calculate the slip rises up as the number of subfault patches increases. Additionally, a realistic complex geometry of the fault and 3-D heterogeneous media may be difficult to explore with this approach, mainly due to computational limitations.

Comparison of the inversion results
To compare the results of the coseismic slip inversion between the standard linear approach and the adjoint method via the FENICS-HIPPYLIB framework, we consider a 2-D model with a curved fault and 20 observations uniformly spaced at the surface (inset in Figs 7 and 8). For the adjoint linear inversion, we consider a rectangular computational domain of size 1100 × 500 km. The open-source software GMSH (Geuzaine & Remacle 2009) is used to generate an unstructured mesh with 12 930 triangular cells. The mesh is finer in a region near the fault, mesh size ∼5 km, and coarser near the vertical and bottom boundaries. The mixed elasticity problem is discretized using the 1st order stable triplet of finite element spaces, resulting in 117 062 DOFs for the state variables (stress, displacement and rotation). Fig. 7 shows the discretized domain, where the fault discontinuity is divided in 22 uniformly spaced segments of 5 km each. The same fault geometry is used to compute the data kernel G of Green's functions in eq. (26). We choose the particular location of the surface observations on just one side of the fault trace as this resembles the most common GPS network distribution in subduction zones, where the majority of geodetic stations are located in the overriding plate. The estimated slip, as well as the uncertainty associated with such estimation, significantly depends on the observation configurations. However, the problem of finding an 'optimal' configuration of sensors requires solving an optimal experimental design problem. Development of scalable approaches for the the solution of large-scale optimal experimental design problems governed by PDE forward model is a very active area of research (e.g. Alexanderian et al. 2016a, b;Attia et al. 2018;Herman et al. 2020;Alexanderian 2021) and is beyond the scope of this work. A homogeneous isotropic Earth's elastic structure is considered, but a fully heterogeneous medium can be easily implemented within our framework. Table 1 summarizes the parameters used in our computation. At the fault boundary, we prescribe a Gaussian slip centred at 20 km depth and standard deviation of 15 km resembling an earthquake nucleated at shallow depth on the subduction interface.    , and it is represented by horizontal coloured segments. Each slip segment is coloured by the absolute error with respect to its true value. Panels (b)-(c) show the same inversion results as (a) but using the standard linear approach with first-order (b) and second-order (c) Tikhonov regularization, respectively. The data are polluted with 5 per cent random Gaussian noise with zero mean and covariance noise .
We apply zero displacement boundary conditions to the left, right and bottom boundaries, and a free surface at the top of the model. Although the model extends for several fault lengths, it is not sufficient to fully remove the effects of the boundary conditions. This produces a slightly different displacement field between the numerical and the analytic solution moving away from the fault source. The standard solution would again involve increasing the domain size. However, this is not an issue in our case since we are interested in demonstrating the solution of the coseismic slip inversion with synthetic (simulated) data. As long as the boundary conditions of the forward model used to generate the synthetic data are consistent with those used for the inversion, the quality of the inverse solution is not affected by the choice of the boundary conditions.
For the classic slip inversion, we compute the data kernel G of Green's functions by Okada's routine 'DC3D' (Okada 1992). In our FENICS-HIPPYLIB framework, we compute synthetic horizontal and vertical surface deformations by solving the forward problem (eq. 8, Fig. 8) and extract the displacement values at the 20 observations using the observation operator B(u). Fig. 8 shows the horizontal and vertical displacement field given the prescribed Gaussian slip, and the locations of the surface observations. The imposed Gaussian slip produces the largest surface deformation close to the 4th, 5th and 6th stations from the left of the intersection between the fault interface and the surface. Moreover, there is a sharp and resolved jump of the displacement field between the two sides of the fault. This confirms the power of the mixed FE approach to deal with fault discontinuities.
To perform the coseismic slip inversion, we apply 5 per cent random noise to the surface observations and calculate the noise variance as the product of the relative noise level and the L ∞ norm of observed surface deformation. Since the inverse problem is ill-posed, we regularize the system with a regularization term R(m) of the form of eq. (19). We penalize both the magnitude and the gradient of the model parameter field so that the pre-conditioner R(m) is invertible, as needed by the CG algorithm. Therefore, we need a small contribution of the mass matrix. In a Bayesian inference setting, the ratio √ γ /δ also plays the role of the correlation length in the prior term. After polluting the synthetic horizontal and vertical surface data with the random noise, we fix the ratio γ /δ to be 10 4 , and perform an L-curve analysis (Miller 1970;Lawson & Hanson 1995) to find the best value of γ . We obtain an 'optimal' value of γ = 60 (Fig. S6a).
To fairly compare the inversion results between the adjoint method and the standard approach using linear inverse theory, we need to discretize the fault slip in the same way. The data kernel G of Green's functions is calculated by imposing unity slip in each fault patch, and each patch is characterized by constant slip. To reproduce the same condition in the FENICS-HIPPYLIB framework, we use Crouzeix-Raviart (CR 1 ) elements to discretize the fault slip, where the DOF is located at the mid-point of each fault segment. It is easy enough to discretize the slip using linear elements (i.e. using CG 1 elements), but in this case we would have one more DOF for slip than the standard linear method.
The result of our inversion is shown in Fig. 9. With this configuration of noise and regularization, the CG method converges in 24 iterations. At each CG iteration a pair of forward (eq. 8) and adjoint (eq. 20) problems are solved. In the same figure, we also plot the results of the coseismic slip inversion using linear inverse theory. We only plot the results for the first and second-order Tikhonov regularization, where the values of the corresponding weights are inferred from the L-curve criterion (Figs S6b and S6c). The slip distribution inferred using the adjoint method (Fig. 9a) approaches the true solution (black line). The result is comparable with the standard linear approach using Green's functions (b and c). The non-perfect match is due to the limited amount of surface data (only 20 observations) and noise level (5 per cent). Fig. S7 replicates the same inversion with lower noise (1 per cent) in which case the inverted slip very closely matches the true slip.

Spectrum decomposition
To further compare the two different approaches, the adjoint method and the standard linear inversion using Green's function, we analyse the spectra of the discretized parameter-to-observable map F and the data kernel G. Due to the linear nature of the coseismic slip inverse problem, the second derivative of the objective function (eq. 18), the so-called Hessian H, is independent of the model parameter field m and data d. The analysis of the Hessian spectrum is useful to: (1) characterize the degree of ill-posedness of the inverse problem; (2) understand the redundancy of the data and (3) determine which data contain more information about the infinite-dimensional field m. After discretization, the Hessian is generally a large, dense matrix; therefore, an explicit construction of H for large-scale problems is typically intractable since its dimension is equal to the dimension of m. Each column of the Hessian requires the solution of a pair of linearized forward/adjoint PDEs. To make operations with the Hessian tractable, it is well-known that eigenvalues of the Hessian typically collapse rapidly to zero, since the data only contain limited information about the infinite-dimensional parameter field. Hence, we can build a low-rank approximation of the data misfit component of the Hessian, H mis f it . The low-rank properties of H mis f it have been analytically demonstrated for many complex forward PDE problems (e.g. Hesse & Stadler 2014;Petra et al. 2014;Worthen et al. 2014;Isaac et al. 2015;Chen et al. 2019;Alghamdi et al. 2020) The discretized Hessian can be decomposed into two components: the Hessian of the data misfit and the Hessian of the regularization, where H mis f it has eigenvalues that decay to zero, reflecting ill-posedness. This property invites a low rank approximation of H mis f it , which in HIPPYLIB we compute via a randomized eigensolver (Halko et al. 2011) to solve the following symmetric eigenproblem : where v i is the eigenvector associated with the eigenvalue λ i . The Hessian of the data misfit, H mis f it , is a symmetric positive semi-definite matrix, and it can be easily related to the data kernel G. From eq. (27) we can see that the term in parenthesis of G −g can be decomposed into a data misfit term and a regularization component. The data misfit Hessian can be discretized as H misi f t = 1 where σ 2 d is the data noise variance and we used the symmetry of M and A. We note that G ∼ F, in the sense that G corresponds to a different discretization technique, using Green's functions, of the same parameter to observable map. Hence, the data misfit Hessian H mis f it corresponds to G T G in the linear inverse theory. Fig. 10 compares the spectrum between the data misfit Hessian and G T G for the coseismic slip inverse problem of Fig. 9. Both the eigenvalue decay (a) and the eigenvectors (b-c) are very similar. We will use the spectrum information of Fig. 10 to compare the resolution of the model parameters between the two approaches, and infer preliminary uncertainties of our coseismic slip inversion. We choose p = 8 to avoid to include the smallest eigenvalues (see Fig. 10a). The data are polluted with 5 per cent random Gaussian noise with zero mean and covariance noise .

Truncated SVD solution, model resolution and uncertainty analysis
The Hessian H and the data kernel G play a fundamental role in quantifying model resolution and uncertainty in the estimated slip. In particular, their spectral properties: (1) characterize the degree of ill-posedness of the inversion procedure, (2) indicate which parameter modes are most informed by the data  and (3) expose the low intrinsic dimensionality of the coseismic slip problem. To do so, we write the data kernel G using the singular-value decomposition (SVD) as where U and Vare N × N left and M × M right matrices of singular vectors, respectively, and is an N × M diagonal matrix whose diagonal elements are called singular values. These singular values are non-negative and usually arranged in decreasing order. Some of them may be zero, and thus can be partitioned into a p × p matrix p containing nonzero singular values and a matrix with zero entries. The subscript p is an integer indicating how many singular values are positive (Menke 2018). Hence we can rewrite eq. (30) as G = U p p V T p , where U p and V p consist of the first p columns of U and V, respectively.
The linear operator G is usually explicitly constructed, so its SVD decomposition is relatively straightforward. However, the computation of U, and V for the Hessian requires some care. Since the Hessian is typically a large, (formally) dense matrix for large-scale geophysical problems, unless the parameter dimension is modest, it is not possible to explicitly construct H and compute the SVD of the discretized parameter-to-observable map F ∼ G. In order to address this issue and compute the SVD of F given the data misfit Hessian, we can readily build the columns of the right matrix of singular vectors V from the eigenvectors v i of eq. (29). The diagonal matrix can be easily calculated by taking the square roots of the eigenvalues λ i of eq. (29). Finally, the N × N left matrix of singular vectors U requires the knowledge of the discretized observation operator B, the mixed elasticity matrix A, and the mass matrix M: From this SVD decomposition of the data kernel G and Hessian, the truncated SVD solution of the linear inverse problem (eq. 27) can be obtained by where the integer p must be chosen such that the smallest eigenvalues are excluded from the calculation (Menke 2018). The grey dashed line in Fig. 10 a indicates the choice of the value of p for our coseismic slip inverse problem. We choose p = 8 because it gives a good natural solution of the inverse problem from SVD (Fig. 11). Lower or higher values of p do not improve the truncated SVD solutions. We may obtain better results by lowering the noise level in the data. According to linear inverse theory, we can write the model resolution matrix of the natural generalized inverse, R, as The model resolution matrix characterizes whether each model parameter is uniquely determined (if R is the identity matrix). If R is not an identity matrix, the estimates of m are weighted averages of the true model parameters. Fig. 12 shows the comparison of the model resolution matrix between the standard approach (a) and the adjoint method based on the mixed elasticity formulation (b) for our coseismic slip problem. The model resolution matrices look very similar. Both plots show high resolution near the top left, indicating that the shallow slip is well resolved, while the resolution becomes poor at depth. This is expected, since we have observations only at the surface (see Fig. 8). Note that R does not depend on the actual values of the data, but only on the fault geometry, observation location, and a priori information added to the problem (Menke 2018).
The data kernel and Hessian spectra can also provide information about the uncertainties in the model parameters. Given statistically uncorrelated observational errors with uniform variance σ 2 d , it is possible to calculate how the error in the data propagates through the inversion process, leading to estimates of model parameters with covariance C m . We can also rewrite this model covariance matrix using SVD. The pointwise variance of m is given by the diagonal elements of the model covariance matrix, where C d = σ 2 d I is the data covariance matrix. The diagonal matrix of singular values, p , can be easily calculated by taking the square root of the eigenvalues λ i (eq. 29 and Fig. 10a). The model variances σ 2 m are typically used to infer the confidence bounds for the model parameters. Fig. 13 shows the results of the coseismic slip inversion shown in Fig. 9 within the 2σ 2 m = 95 per cent confidence intervals. As expected, resolution is highest at the shallow depths. This is also reflected by smaller confidence intervals in the slip solution close to the surface than at depths, where the coseismic slip may not be predicted with high accuracy due to the limited surface observations (20) and high data noise (5 per cent).

C O N C L U S I O N S
We developed a new, open-source FE modelling framework able to solve forward and inverse earthquake problems within the same computational architecture. This FENICS-HIPPYLIB framework provides the user with all the flexibility and transparency of the two advanced numerical libraries. Although suited for many multiphysics problems, we have focused on the coseismic slip problem and provided a new and rigorous fault implementation in a mixed finite element formulation at the continuous level. This allows the formulation of the coseismic slip inverse problem to expose the unknown slip field at the infinite dimensional level, thereby bypassing numerical artifacts and assumptions of the fault discretization (such as piecewise-constant slip and piecewise-linear fault geometry). It avoids the underlying assumptions of the Green's function approach, including the assumption of homogeneity and permits the gradient and Hessian to be readily derived at the infinite dimensional level. This allows the discretization to be chosen in a manner that is appropriate for other needs such as sufficient accuracy and smoothness. The new formulation provides an attractive framework for slip inversion in a heterogeneous medium, for joint slip-medium inversion, and for inversion of the fault geometry. The mixed elastic formulation exhibits a stress convergence rate that is one order higher than that of the pure displacement method, both theoretically and observed in numerical convergence tests. Moreover, the new fault implementation is more accurate near crack tips than the split node technique implemented within the standard displacement elastic formulation. While the new method is slower than the displacement approach for the same resolution (due to a larger number of degrees of freedom), it is far faster for the same stress accuracy-we observed several orders of magnitude speedup in runtime. The larger computational cost for the same mesh size may be remedied by the use of existing efficient pre-conditioners with iterative solvers, domain decomposition methods, and local elimination techniques, or a combination thereof. When applying the new framework to a classic earthquake problem, inversion for coseismic slip, results are comparable to the standard Green's function approach. From a spectral decomposition of the data misfit Hessian, we can estimate preliminary model uncertainties, and we document correspondence between the Hessian and data kernel spectra.
These promising results for our general forward and inverse framework indicate great utility for a number of more sophisticated earthquake problems. In a forthcoming paper, we will extend this modelling framework to perform non-linear and Bayesian inference inversions for more rigorous uncertainty quantification and inversions for heterogeneous material parameters, which are impossible with standard Green's function approaches. The flexibility of our new framework should allow for the rigorous integration of multiphysics and heterogeneous data sets, providing a new tool to help answer fundamental questions in earthquake science.

A C K N O W L E D G M E N T S
We thank Dr Ilona Ambartsumyan for the fruitful discussions about the mixed finite-element elastic formulation. SP, TWB, and DL were supported by NSF EAR-2121666, EAR-2045292, 19214743 and 1927216. EK, UV and OG were supported by NSF ACI-1550593 and DOE ASCR DE-SC0019303. We thank Brad Aagaard and an anonymous reviewer for their helpful comments to improve the quality of the manuscript.

DATA AVA I L A B I L I T Y
The fully documented Jupyter notebooks to reproduce the results are available for the readers in the online GitHub repository https: //github.com/SimonePuel/Coseismic-Slip-Inversion.git. We used FENICS-2019.1.0 and HIPPYLIB-3.0.0 to compute all the results in this study. These libraries can be downloaded at https://fenicsproject.org and https://hippylib.github.io, respectively. The unstructured mesh for the FE simulation of the coseismic slip inversion was built using the open-source software GMSH (Geuzaine & Remacle 2009) and the files are included in the online repository. Figure S1. Example of triangular mesh used for computing the convergence rate of Fig. 4. We use the FENICS built-in mesh function, and cell size h = 1/32. Figure S2. DOFs and computational time comparisons between the pure displacement formulation (DF, blue dots) and the mixed method (MF, red squares). Log-log plots of computational time (a-b) and DOFs (c-d) as a function of mesh size h with linear and quadratic elements, respectively. Dashed lines in all plots indicate the best fit. Figure S3. Example of triangular mesh used for computing the convergence rate of Fig. 6. We use the FENICS built-in mesh function. The black solid line indicates the crack, and cell size h = 1/8. Figure S4. Comparison of the displacement components between the pure displacement formulation with the split node technique (DF), the mixed method with the fault implementation described in Section 3 (MF), and the analytic solution for a shear crack, mode II (Pollard & Segall 1987). Top left to right: horizontal displacement of the analytic solution, and the absolute difference between DF and MF with respect to the analytic solution, respectively. Bottom left to right: same as above, but for the vertical displacement. Linear elements for displacement are used in all computations, and cell size h = 1/128. The displacement components are normalized by the maximum analytic slip. Figure S5. Comparison of the components of the stress tensor σ between the pure displacement formulation with the split node technique (DF), the mixed method with the fault implementation described in Section 3 (MF), and the analytic solution for a shear crack, mode II (Pollard & Segall 1987). Top left to right: σ xx of the analytic solution, and the absolute difference between DF and MF with respect to the analytic solution, respectively. Middle left to right: same as before, but for the σ yy . Bottom left to right: same as above, but for the σ xy . Linear elements for displacement are used in all computations, and cell size h = 1/128. The stress components are normalized by the maximum analytic value. Figure S6. L-curve criterion for inferring the 'optimal' regularization weight for the coseismic slip inversion. (a) L-curve log-log plot of the model norm as a function of the data norm for the adjoint inversion. L is a linear operator and in this case it is represented by the gradient. The 'optimal' regularization weight is located a the 'elbow' of the L-curve (red dot). (b)-(c) are same as (a) but for the standard linear inversion using the data kernel of Green's functions, and first-order and second-order Tikhonov regularization, respectively. In this case, L is represented by the gradient and Laplacian operator, respectively. The data are polluted with 5 per cent random Gaussian noise with zero mean and covariance noise . Figure S7. Reconstructed slip from the coseismic slip linear inversion. Same plots as Fig. 9, but for 1 per cent random Gaussian data noise. (a) Fault slip estimated using the new framework and the adjoint method. The slip is constant within each subfault patch (22 total), and it is represented by horizontal colored segments. Each slip segment is colored by the absolute error with respect to its true value. (b)-(c) show the same inversion results as (a) but using the standard linear approach with first-order (b) and second-order (c) Tikhonov regularization, respectively.

R E F E R E N C E S
Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper.

A P P E N D I X A : D E R I VAT I O N O F M A N U FA C T U R E D S O L U T I O N
We derive the expression of the source term f(x, y) and exact stress given an exact solution, using the method of manufactured solution (Roache 2002;Oberkampf & Roy 2010). Considering a 2-D connected domain ⊂ R 2 with boundaries ∂ = D , the boundary value problem of the Downloaded from https://academic.oup.com/gji/article/230/2/733/6524187 by University of Texas at Austin user on 13 April 2022 linear elasticity in the pure displacement formulation, with vanishing Dirichlet boundaries, reads ⎧ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎩ −∇ · σ = f in , σ = 2με + λtr(ε)I in , ε = 1 2 (∇u + (∇u) T ) in , u = 0 on D . (A1) Let the exact solution be given by u ex = 0 sin(2π x) sin(2π y) .
We want to determine the source term f(x, y) and the boundary conditions, such that they give the exact solution. In order to find the source term f(x, y), we substitute the expression of the strain tensor ε into the conservation of linear momentum equation to get ∇ · μ(∇u ex + ∇u T ex ) + λ∇ · u ex I = f.

A P P E N D I X B : A N A LY T I C E X P R E S S I O N S F O R M O D E I I C R A C K
In this appendix, we provide the analytic expressions of the displacement and stress fields for a crack mode II following Pollard & Segall (1987) and used in this study. We consider a 2-D domain and a shear crack of unity width, 2a = 1, where a is the half-width of the crack. The formulas below are expressed in polar coordinates centred at the crack middle (x 0 , y 0 ) and tips a. Following Pollard & Segall (1987), we can