Fast nonlinear gravity inversion in spherical coordinates with application to the South American Moho

S U M M A R Y Estimating the relief of the Moho from gravity data is a computationally intensive nonlinear inverse problem. What is more, the modelling must take the Earths curvature into account when the study area is of regional scale or greater. We present a regularized nonlinear gravity inversion method that has a low computational footprint and employs a spherical Earth approximation. To achieve this, we combine the highly efficient Bott’s method with smoothness regularization and a discretization of the anomalous Moho into tesseroids (spherical prisms). The computational efficiency of our method is attained by harnessing the fact that all matrices involved are sparse. The inversion results are controlled by three hyperparameters: the regularization parameter, the anomalous Moho density-contrast, and the reference Moho depth. We estimate the regularization parameter using the method of hold-out cross-validation. Additionally, we estimate the density-contrast and the reference depth using knowledge of the Moho depth at certain points. We apply the proposed method to estimate the Moho depth for the South American continent using satellite gravity data and seismological data. The final Moho model is in accordance with previous gravity-derived models and seismological data. The misfit to the gravity and seismological data is worse in the Andes and best in oceanic areas, central Brazil and Patagonia, and along the Atlantic coast. Similarly to previous results, the model suggests a thinner crust of 30–35 km under the Andean foreland basins. Discrepancies with the seismological data are greatest in the Guyana Shield, the central Solimões and Amazonas Basins, the Paraná Basin, and the Borborema province. These differences suggest the existence of crustal or mantle density anomalies that were unaccounted for during gravity data processing.


I N T RO D U C T I O N
The Mohorovičić discontinuity (or Moho) that marks the transition from the crust to the mantle, is studied almost exclusively through indirect geophysical methods. The two main geophysical methods used to estimate the depth of the Moho are seismology, with both natural and controlled sources, and gravimetry. With the advent of satellite gravimetry missions like GRACE and GOCE, gravity-derived crustal models can be produced in regional or global scales (e.g. Reguzzoni et al. 2013;van der Meijde et al. 2013van der Meijde et al. , 2015. New spherical harmonic gravity models that use these satellite observation, like GOCO5S (Mayer-Guerr et al. 2015), provide almost homogeneous data coverage in difficult to access regions traditionally poor in terrestrial data. An example is South America, where seismological and terrestrial gravity data are traditionally concentrated around urban centres and coastal ar-eas, resulting in large areas (e.g. forests and mountains) devoid of data.
Estimating Moho depth from gravity data is a nonlinear inverse problem. One can generalize this problem of estimating the depths of an interface separating two media, such as the sedimentbasement interface of a sedimentary basin or the crust-mantle interface (Moho). Several methods have been developed over the years to solve this inverse problem, for example, Bott (1960); Barbosa et al. (1997Barbosa et al. ( , 1999a; Barnes & Barraud (2012); Leão et al. (1996); Martins et al. (2010Martins et al. ( , 2011; Oldenburg (1974); Reguzzoni et al. (2013); Santos et al. (2015); Silva et al. (2006Silva et al. ( , 2014, to name a few. Solving the inverse problem is computationally demanding because it requires the construction of large dense matrices and the solution of large linear systems. As a result, some authors search for ways to increase the computational efficiency of this class of inverse problem. Bott (1960) proposed a method based on iteratively applying Fast nonlinear gravity inversion 163 corrections to a starting estimate based on the inversion residuals. The algorithm is fast because it bypasses the construction and solution of linear systems and only involves forward modelling. Oldenburg (1974) showed that the fast FFT-based forward modelling of Parker (1973) could be rearranged to estimate the relief. Barnes & Barraud (2012) use a form of adaptive discretization to compute the Jacobian, or sensitivity, matrix. For each data point, the discretization will be progressively coarser the further way from the point. This reduces the matrix and, consequently, the linear systems to a sparse form that can be solved efficiently. Recently, Silva et al. (2014) extended and generalized the original method of Bott (1960) and Santos et al. (2015) used this extension to estimate a basement relief with sharp boundaries.
A spherical Earth approximation is preferred when estimating the Moho depth from gravity data in continental and global scale studies. Wieczorek & Phillips (1998) developed a spherical harmonic equivalent of the Parker-Oldenburg FFT algorithm and applied it to estimate the crustal structure of the Moon. Reguzzoni et al. (2013) use a spherical Earth approximation to estimate the global Moho relief using data from the GOCE satellite mission. Another approach is to use non-spectral (space domain) gravity inversion methods. Many such methods were developed for estimating the basement relief of a sedimentary basin (e.g. Barbosa et al. 1997Barbosa et al. , 1999aMartins et al. 2010Martins et al. , 2011Sun & Li 2014). These methods approximate the sedimentary pack by a set of juxtaposed right-rectangular prisms. The top of the prisms coincide with the Earth's surface and the prisms' thicknesses represent the depths to the basement and are the parameters to be estimated in the inversion. The use of rectangular prisms implies a planar Earth approximation and may not be adequate for depth-to-Moho estimates in continentalor global-scale study. A straightforward way to circumvent this hindrance is to adapt one of the methods developed for rectangular prisms to use tesseroids (spherical prisms). One of the difficulties of this approach is that the forward problem for a tesseroid must be solved numerically. Two alternatives proposed in the literature to the numerical solution are Taylor series expansion (Heck & Seitz 2007;Grombein et al. 2013) and the Gauss-Legendre Quadrature (Asgharzadeh et al. 2007). Numerical experiments by Wild-Pfeiffer (2008) suggest that the Gauss-Legendre Quadrature (GLQ) offers superior results. However, the GLQ suffers from numerical instability when the computation point is close to the tesseroid (Asgharzadeh et al. 2007). To overcome the numerical instability, Li et al. (2011) proposed an adaptive discretization algorithm which was later improved upon by .
In any gravity inversion for estimating the relief of an interface, two hyperparameters control the inversion results: the densitycontrast between the two media and the reference level around which the interface undulates. The reference level is the constant depth of the Normal Earth Moho in the case of the anomalous Moho. For regularized inversions, an additional hyperparameter is the regularization parameter that balances the relative importance between the data-misfit measure and the regularizing function. The two most commonly used methods for estimating the regularization parameter are the L-curve criterion and Generalized Cross Validation (GCV). Farquharson & Oldenburg (2004) provide for a thorough comparison of both methods. Estimating the density-contrast in a sedimentary basin context has been tackled by Silva et al. (2006) and Martins et al. (2010) when the basement depth is known at a few points. To the authors' knowledge no attempt has been made to estimate the reference level.
We present a nonlinear gravity inversion to estimate the Moho depth in a spherical Earth approximation. Our method is based on Figure 1. Sketch of the stages in gravity data correction and the discretization of the anomalous Moho relief using tesseroids. (a) The Earth and the measured gravity at point P (g(P)). (b) The Normal Earth and the calculated normal gravity at point P (γ (P)). z ref is the depth of the Normal Earth Moho. (c) The gravity disturbance (δ(P)) and the corresponding density anomalies after removal of the normal gravity: topography, oceans, crustal and mantle heterogeneities, and the anomalous Moho. (d) The Bouguer disturbance (δ bg (P)) after topographic correction and the remaining density anomalies. (e) All density anomalies save the anomalous Moho are assumed to have been removed before inversion. (f) The discretization of the anomalous Moho in tesseroids. Grey tesseroids will have a negative density contrast while red tesseroids will have a positive one.
the Silva et al. (2014) Gauss-Newton formulation of the method of Bott (1960). We use tesseroids to discretize the anomalous Moho and the adaptive discretization algorithm of  for the forward modelling. The stability of the inversion is achieved through smoothness regularization. In order to maintain the computational efficiency of Bott's method, we exploit the sparse nature of all matrices involved in the computations. We employ a variant of GCV known as hold-out cross-validation (Kim 2009) to estimate the regularization parameter. Additionally, we estimate the densitycontrast and reference level simultaneously in a second validation step using knowledge of the Moho depth at a few points, similarly to Silva et al. (2006) and Martins et al. (2010). Finally, we apply the proposed method to estimate the Moho depth for South America using gravity data from the GOCO5S model (Mayer-Guerr et al. 2015) and the seismological data of Assumpção et al. (2013).

M E T H O D O L O G Y
In potential field methods, we must isolate the target anomalous density distribution before modelling and inversion. In our case, the target is the relief of the real Moho undulating around a reference Moho. We do this by removing all other effects from the gravity observations. The first correction is to remove the scalar gravity of an ellipsoidal reference Earth (the Normal Earth), hereafter denoted as γ . This effect is calculated on the same point P where the gravity observation was made (Figs 1a and b). γ (P) is calculated using the closed-form solution presented by Li & Götze (2001). The difference between the observed gravity at point P (g(P)) and Normal gravity at the same point is known as the gravity disturbance, δ(P) = g(P) − γ (P).
(1) 164 L. Uieda and V.C.F. Barbosa Figure 2. Sketch of a tesseroid (spherical prism) in a geocentric coordinate system (X, Y, Z). Observations are made at point P with respect to its local north-oriented coordinate system (x, y, z). After Uieda (2015).
(see Fig. 1c). This includes all masses above the surface of the ellipsoid (the topography), the mass deficiency of the oceans, the mass deficiency of sedimentary basins, crustal sources (e.g. igneous intrusions, lateral density changes, etc.), heterogeneities below the upper mantle, and the effect of the difference between the real Moho topography and the Moho of the Normal Earth.
To estimate the anomalous Moho relief from gravity data, we must first isolate its gravitational attraction. Thus, all other gravitational effects must be either removed or assumed negligible. Here, we will remove the gravitational effect produced by the known topography and ocean masses to obtain the full Bouguer disturbance (Fig. 1d), δ bg (P) = δ(P) − g topo (P). (2) We will also remove the gravitational effect of know sedimentary basins but assume that the effects of other crustal and mantle sources are negligible. Thus, the only effect left will be that of the anomalous Moho relief (Fig. 1e). The gravitational attraction of the topography, oceans, and basins are calculated in a spherical Earth approximation by forward modelling using tesseroids (Fig. 2). The tesseroid effects are calculated numerically using GLQ integration (Asgharzadeh et al. 2007). The accuracy of the GLQ integration is improved by the adaptive discretization scheme of .

Parametrization and the forward problem
We parameterize the forward problem by discretizing the anomalous Moho into a grid of M lon × M lat = M juxtaposed tesseroids (Fig. 1f). The true (real Earth) Moho varies in depth with respect to the Moho of the Normal Earth. Hereafter we will refer to the depth of the Normal Earth Moho as z ref (see Fig. 1b). If the true Moho is above z ref , the top of the kth tesseroid is the Moho depth z k , the bottom is z ref , and the density-contrast ( ρ) is positive (red tesseroids in Fig. 1f). If the Moho is below z ref , the top of the tesseroid is z ref , the bottom is z k , and ρ is negative (grey tesseroids in Fig. 1f).
Considering that the absolute value of the density-contrasts of the tesseroids is a fixed parameter, the predicted gravity anomaly of the Moho is a nonlinear function of the parameters z k , k = 1, . . . , M, in which d i is the ith element of the N-dimensional predicted data vector d, p is the M-dimensional parameter vector containing the M Moho depths (z k ), and f i is the ith nonlinear function that maps the parameters onto the data. The functions f i are the radial component of the gravitational attraction of the tesseroid Moho model.

Inverse problem
We wish to estimate the parameter vector p from a set of observed gravity data d o . The least-squares estimate is the one that minimizes the data-misfit function Function φ(p) is nonlinear with respect to p. Thus, we can determine its minimum using gradient-based iterative optimization methods like Gauss-Newton or Steepest Descent. Such methods start from an initial approximation to the model parameter vector p 0 and estimate a parameter perturbation vector p 0 . The perturbation vector is used to update p 0 to p 1 = p 0 + p 0 . This procedure is repeated until a minimum of function φ(p) (eq. 4) is reached.
For the Gauss-Newton method, the parameter perturbation vector at the kth iteration p k is obtained by solving the linear system in which ∇φ k and H k are, respectively, the gradient vector and the Hessian matrix of φ(p).
The gradient vector and the Gauss-Newton approximation of the Hessian matrix of φ(p) are, respectively, and in which A k is the N × M Jacobian or sensitivity matrix whose elements are

Regularization
Nonlinear gravity inversions for estimating the relief of an interface separating two media (like the Moho) are ill-posed and require additional constraints in the form of regularization (Silva et al. 2001). A common approach is to use the first-order Tikhonov regularization (Tikhonov & Arsenin 1977) to impose smoothness on the solution. The cost function for smoothness regularization is given by where R is an L × M finite-difference matrix representing L firstorder differences between the depths of adjacent tesseroids.
To transform the ill-posed inverse problem into a well-posed one via Tikhonov regularization, we adopted the well-established procedure of formulating a constrained inverse problem that is solved by minimizing an unconstrained goal function Fast nonlinear gravity inversion 165 in which μ is the regularization parameter that controls the balance between fitting the observed data and obeying the smoothness constraint imposed by the regularizing function θ (p) (eq. 9). The goal function (p) is also nonlinear with respect to p and can be minimized using the Gauss-Newton method. The gradient vector and Hessian matrix of the goal function are, respectively, and At the kth iteration, the parameter perturbation vector p k is obtained by solving the linear equation system Estimating the Moho depths using the above equations is computationally costly because of two main factors: (1) the evaluation and storage of the dense N × M Jacobian matrix A k and (2) the solution of the resulting M × M equation system. In practice, the derivatives in the Jacobian (eq. 8) are often calculated through a first-order finite-difference approximation. Thus, evaluating A k requires 2 × M × N forward modelling operations for each iteration of the gradient descent algorithm. These computations are performed for each iteration of the optimization of the goal function (p). Bott (1960) developed an efficient method to determine the depth of the basement of a sedimentary basin from gravity observations. The method requires data on a regular grid of N x × N y = N observations. The basement relief is then discretized into an equal grid of M x × M y = M elements with M x = N x and M y = N y . Bott's iterative method starts with an initial approximation of the basement depths p 0 equal to the null vector. The method updates the approximation by calculating a parameter perturbation vector p k using the formula

Bott's method
in which G is the gravitational constant and ρ is the contrast between the density of the sediments and the reference density. The iterative process stops when the inversion residuals r k = d o − d(p k ) fall below the assumed noise level of the data. Silva et al. (2014) showed that Bott's method can be formulated as a special case of the Gauss-Newton method (eq. 5) by setting the Jacobian matrix (eq. 8) to where I is the identity matrix. In this framework, Bott's method uses a Bouguer plate approximation of the gravitational effect of the relief, d i = 2π G ρ p i . The derivative of d i with respect to the parameter p i is 2π G ρ, thus linearizing the Jacobian matrix. However, the nonlinearity of the predicted data d(p k ) is preserved.
One of the advantages of Bott's method over the traditional Gauss-Newton or Steepest Descent is the elimination of the computation and storage of the dense Jacobian matrix A k . Furthermore, Bott's method also does not require the solution of equation systems. However, a disadvantage of Bott's method is that it suffers from instability (Silva et al. 2014). A common approach to counter this issue is to apply a smoothing filter after the inversion to the unstable estimate, as in Silva et al. (2014).

Combining Bott's method, regularization and tesseroids
We propose a regularized version of Bott's method to invert gravity data for estimating the depth of the Moho in spherical coordinates. To adapt Bott's method to spherical coordinates, we replace the right-rectangular prisms in the forward modelling (d(p k ) in eq. 14) with tesseroids. The tesseroid forward modelling uses the adaptive discretization algorithm of  to achieve accurate results. Furthermore, our formulation maintains the regularized solution for the Gauss-Newton method (eq. 13) but replaces the full Jacobian matrix with the Bouguer plate approximation. Here, the Jacobian matrix is replaced by a diagonal matrix (eq. 15) whose elements are invariant along successive iterations. Using this approximation eliminates the cost of computing and storing the full N × M-dimensional Jacobian matrix A k at each iteration (eq. 8). Traditionally, the full Jacobian matrix is computed using a firstorder finite difference scheme, which requires 2 × N × M forward modelling operations per iteration. Using eq. (15) requires N multiplications that need only be performed once. This provides a considerable speed gain.
Matrix arithmetic operations can be performed efficiently by taking advantage of the sparse nature of matrices A and R (respectively, eqs 15 and 9). The same is true for solving the equation system in the Gauss-Newton method (eq. 13). However, the computational cost of forward modelling is still present. Particularly, forward modelling using tesseroids is more computationally intensive than using right-rectangular prisms because of the numerical integration and adaptive discretization . We show later in this article that sparse matrix multiplications and solving the sparse linear system in eq. (13) account for less than 0.1 per cent of the computation time required for a single inversion.
The main advantage of our formulation is that it retains the efficiency of Bott's method while stabilizing the solution through the well-established formalism of Tikhonov regularization. For example, the total variation approach used by Martins et al. (2011) could potentially be implemented in a more straight forward manner than done by Santos et al. (2015).

Estimating the inversion hyperparameters
Parameters that influence the inversion result but are not estimated directly in the inversion are known as hyperparameters. In the case of our regularized Moho depth inversion, the hyperparameters are the regularization parameter μ (eq. 10), the Moho density-contrast ρ (eq. 15), and the depth of the Normal Earth Moho, or reference level, z ref (Fig. 1b).
We estimate these hyperparameters in two steps. First, we assume fixed values for z ref and ρ and perform a hold-out crossvalidation procedure (Hansen 1992) to estimate an optimal value for μ. Our investigations suggest that the optimal value of μ does not depend on the particular values of z ref and ρ used. Second, we use the estimated μ to perform a validation procedure to estimate z ref and ρ. The final outcomes of both steps are the values of the three hyperparameters and the final estimated Moho depths.

Estimating the regularization parameter
The regularization parameter μ controls how much smoothness is applied to the inversion result. An optimal value of μ will by guest on November 11, 2016 http://gji.oxfordjournals.org/ stabilize and smooth the solution while not compromising the fit to the observed data. Two widely used methods to estimate an optimal μ are the L-curve criterion and cross-validation (Hansen 1992). Here, we will adopt the hold-out method of cross-validation (Kim 2009). The hold-out method consists of splitting the observed data set into two independent parts: a training set d o inv and a testing set d o test . The training set is used in the inversion while the testing set is kept back and used to judge the quality of the chosen value of μ. For a value of the regularization parameter μ n , the training set is inverted using μ n to obtain an estimatep n . This estimate is used to calculate predicted data on the same points as the testing set via forward modelling The metric chosen to evaluate μ n is the mean square error (MSE) of the misfit between the observed and predicted testing data sets, in which N test is the number of data in the testing set. The optimal value of μ will be the one that minimizes the MSE, that is, the one that best predicts the testing data. We emphasize that the inversion is performed on the training data set only.
The algorithm for the hold-out cross-validation is summarized as follows: (a) Estimatep n by inverting the training set d o inv . (b) Usep n to calculate the predicted testing set d n test using eq. (16).
(c) Calculate the mean square error MSE n using eq. (17).
(iii) The final solution is thep n corresponding to the smallest MSE n .
The separation of the training and testing data sets is commonly done by taking random samples from the full data set. However, we cannot perform the separation in this way because Bott's method requires data on a regular grid as well as having model elements directly below each data point. Thus, we take as our training set the points from the observed data grid that fall on a similar grid but with twice the grid spacing (open circles in Fig. 3). All other points from the original data grid make up the testing data set (black dots in Fig. 3). This separation will lead to a testing data set with more points than the training data set. A way to balance this loss of data in the inversion is to generate a data grid with half of the desired grid spacing, either through interpolation or from a spherical harmonic model.

Estimating z ref and ρ
The depth of the Normal Earth Moho (z ref ) and the density-contrast of the anomalous Moho ( ρ) are other hyperparameters of the inversion. That is, their value influences the final solution but they are not estimated during the inversion. Both hyperparameters cannot be determined from the gravity data alone. Estimating z ref and ρ requires information that is independent of the gravity data, such as knowledge of the parameters (Moho depths) at certain points. This information can be used in a manner similar to the crossvalidation described in the previous section. In this study, we use Let z o s be a vector of N s known Moho depths. We use the MSE as a measure of how well a given inversion outputp l,m fits the know depths. The optimal values of z ref and ρ are the ones that best fit the independent known Moho depths (i.e. produce the smallest MSE). However, the points do not necessarily coincide with the model elements of the inversion. Before computing the MSE, we interpolatep l,m on the known points to obtain the predicted depths z l,m s . The MSE is defined as The algorithm for estimating z ref and ρ is: A similar approach was used by Silva et al. (2006) and Martins et al. (2010) to estimate the parameters defining the density-contrast variation with depth of a sedimentary basin. van der Meijde et al. (2013) also had a similar methodology for dealing with the hyperparameters, though in a less formalized way.

Software implementation
The inversion method proposed here is implemented in the Python programming language. The software is freely available under the terms of the BSD 3-clause open-source software license. Our implementation relies on the open-source libraries scipy and numpy (Jones et al. 2001 (Hunter 2007, http:// matplotlib.org) and seaborn (Waskom et al. 2015, http://stanford. edu/∼mwaskom/software/seaborn) for plots and maps, and Fatiando a Terra (Uieda et al. 2013, http://www.fatiando.org) for geophysics specific tasks, particularly for forward modelling using tesseroids. We use the scipy.sparse package for sparse matrix arithmetic and linear algebra. The sparse linear system in eq. (13) is solved using the conjugate gradient method implemented in scipy.sparse.
The computational experiments (e.g. data processing, synthetic tests, real data application) were performed in Jupyter (formerly IPython) notebooks (Pérez & Granger 2007, http://jupyter.org/). The notebook files combine the source code used to run the experiments, the results and figures generated by the code, and rich text to explain and document the analysis.

A P P L I C AT I O N T O S Y N T H E T I C DATA
We test and illustrate the proposed inversion method by applying it to two noise-corrupted synthetic data sets. The first one is generated by a simple Moho model simulating the transition from a thicker continental crust to a thinner oceanic crust. This application uses cross-validation to estimate the regularizing parameter (μ) while assuming that the anomalous Moho density-contrast ( ρ) and the Normal Earth Moho depth (z ref ) are known quantities. This first test is simplified in order to investigate solely the efficiency of the inversion and the cross-validation procedure to estimate μ. The second data set is generated by a more complex model derived from the South American portion of the global CRUST1.0 model (Laske et al. 2013). This second application uses cross-validation to estimate μ and the validation procedure using synthetic seismological data to estimate ρ and z ref . The model and corresponding synthetic data are meant to simulate with more fidelity the real data application.

Simple model
We simulate the transition from a continental-type Moho to an oceanic-type Moho using a model composed of M lat × M lon = 40 × 50 grid of juxtaposed tesseroids (a total of M = 2000 model elements). The anomalous Moho density-contrast is ρ = 400 kg m −3 and the Normal Earth Moho depth is z ref = 30 km. Fig. 4(a) shows the model Moho depths where we can clearly see an eastward crustal thinning. In Fig. 4(a), each pixel in the pseudo-colour image corresponds to a tesseroid of the model.
The synthetic data were forward modelled on a regular grid of N lat × N lon = 79 × 99 points (a total of N = 7821 observations) at a constant height of 50 km. The data were contaminated with pseudo-random noise sampled from a normal distribution with zero mean and 5 mGal standard deviation. Fig. 4(b) shows the noisecorrupted full synthetic data set exhibiting an eastward increase due to the simulated eastward crustal thinning shown in Fig. 4(a). The data grid spacing is half the grid spacing of the tesseroid model so that, when separating the training and testing data sets (Fig. 3), the training data set points will fall directly above each model element. We separated the synthetic data into training and testing data sets following Fig. 3. The training data set is a regular grid of N lat × N lon = 40 × 50 points (a total of N train = 2000). The testing data set is composed of N test = 5821 observations. We used cross-validation to estimate an optimal regularization parameter (μ) from a set of N μ = 16 values equally spaced on a logarithmic scale between 10 −6 and 10 −1 . We ran our regularized inversion on the training data set for each value of μ, obtaining 16 Moho depth estimates. For all inversions, the initial Moho depth estimate used to start the Gauss-Newton optimization was set to 60 km depth for all inversion parameters. Furthermore, z ref and ρ are set to their respective true values. Finally, we computed the MSE (eq. 17) for each estimate and chose as the final estimated Moho model the one that minimizes the MSE. Fig. 5(a) shows the final estimated Moho depth after the crossvalidation. The recovered model is smooth, indicating that the crossvalidation procedure was effective in estimating an optimal regularization parameter. Fig. 5(b) shows difference between the true Moho depth (Fig. 4a) and the estimated Moho depth. The differences appear to be semi-randomly distributed with a maximum coinciding with a short-wavelength feature in the true model. The maximum and minimum differences are approximately 2.19 and −2.13 km, respectively. Fig. 5(c) shows inversion residuals, defined as the difference between the observed and predicted data (in mGal). The largest residual (in absolute value) coincides with the largest difference between the true model and the estimate. The inversion residuals are normally distributed, as shown in Fig. 5(d), with 0.02 mGal mean and a standard deviation of 3.63 mGal. The cross-validation curve in Fig. 5   at μ = 0.00046 (indicated by the red triangle). Fig. 5(f) shows the convergence of the Gauss-Newton optimization in eight iterations. We also investigated the computation time spent in each section of the inversion process using a source code profiler. The profiler measures how much time is spent inside each function during the execution of a program. We ran the profiler on a single inversion of the training data set using the estimated regularization parameter. We tracked the total time spent inside each of the three functions that represent the potential bottlenecks of the inversion: solving the linear system in eq. (13) using the conjugate gradient method, performing the dot products required to compute the Hessian matrix (eq. 12) and the gradient vector (eq. 11), and forward modelling to calculate the predicted data (eq. 3). The profiling results presented in Table 1 show that the time spent on forward modelling accounts for approximately 99.8 per cent of the total computation time.

Model based on CRUST1.0
In this test, we simulate the anomalous Moho of South America using Moho depth information extracted from the CRUST1.0 model (Laske et al. 2013). We construct a tesseroid model with M lat × M lon = 80 × 60 juxtaposed elements, 4800 in total, using the Moho depths shown in Fig. 6(a). In our model, the Normal Earth Moho is z ref = 30 km and the density-contrast is ρ = 350 kg m −3 . We produce the synthetic data at a constant height of 50 km and on a regular grid of N lat × N lon = 159 × 119 points (a total of 18 921 observations). We contaminate the synthetic data with normally distributed pseudo-random noise with zero mean and 5 mGal standard deviation (Fig. 6b).
The validation procedure to determine ρ and z ref requires knowledge of the Moho depth at certain points (z o s in eq. 18), usually from seismic experiments. Thus, we must also generate synthetic seismic data about the Moho depth. We produce such data by interpolating the Moho depth shown in Fig. 6(a) on the same 937 geographic coordinates pinpointed in the data set of Assumpção et al. (2013). The resulting synthetic seismic data is shown in Fig. 6(c).
We estimate the three hyperparameters in two parts. First, we run the cross-validation to estimate an optimal regularization parameter (μ). The starting estimate for all inversions is 60 km depth for all model parameters. For this cross-validation, we keep z ref and ρ fixed to 20 km and 500 kg m −3 , respectively. Second, we use the estimated μ to run the validation procedure with the synthetic seismological data to estimate z ref and ρ, thus obtaining the final estimated Moho depths. Fig. 7 summarizes the results.
For the cross-validation, we separate the synthetic data (Fig. 3) into a training set with twice the grid spacing of the original data (N lat × N lon = 80 × 60) and a testing set with 14 121 observations. We run the inversion for 16 different values of μ equally spaced in a logarithmic scale between 10 −7 and 10 −2 . For each of the 16 estimates we compute the MSE (eq. 17), shown in Fig. 7(a) as function of μ. The optimal regularization parameter that minimizes the MSE is μ = 10 −4 (red triangle in Fig. 7a).
In the validation using seismological data, we use the esti-  Fig. 7(b) shows a pseudo-colour map of the MSE with a minimum (marked by the red triangle) at z ref = 30 km and ρ = 350 kg m −3 . Fig. 7(c) shows the final solution after cross-validation and validation using seismological data. The recovered model is smooth, indicating that the cross-validation procedure was effective in estimating an optimal regularization parameter. Fig. 7(d) shows the difference between the true Moho depths (Fig. 6a) and the estimated depths (Fig. 7c). The maximum and minimum differences are, respectively, 9.8 and −8.2 km. The largest absolute differences are located along the central and northern Andes, where there is a sharp increase in the true Moho depth (Fig. 6a). Positive differences (indicating a too shallow estimate) appear along the central portion of the Andes, flanked by regions of negative differences (indicating a too deep estimate) on the continental and Pacific sides. Figs 7(e) and (g) show the gravity residuals, defined as the difference between the observed and predicted gravity data. The residuals appear normally distributed, with 0.03 mGal mean and a standard deviation of 4.10 mGal. The gravity residuals follow a similar, though reversed, pattern to the differences shown in Fig. 7(d). The largest residuals (in absolute value) are along the Andes, with the central portion being dominated by negative residuals and flanked by positive residuals on both sides. Figs 7(f) and (h) show the differences between the synthetic seismic data (Fig. 6c) and the estimated Moho depths. Once more, the largest differences are concentrated along the Andes, particularly in the central Andes and near Ecuador and Colombia. The differences are smaller along the Atlantic coast of South America, with notable larger differences in a few points of northeastern Brazil and along the Amazon river. In general, large residuals are associated with sharp increases in Moho depth.

A P P L I C AT I O N T O T H E S O U T H A M E R I C A N M O H O
We apply the inversion method proposed here to invert for the Moho depth of the South American continent. We follow the application of van der Meijde et al. (2013) but with some differences, mainly using a different data set and performing all modelling in spherical coordinates using tesseroids. The data are corrected of the effects of topography and sedimentary basins. Crust and mantle heterogeneities cannot be properly accounted for in regions where information coverage is sparse and readily accessible models are not available, like in South America and Africa. Hence, for the purposes of this study, we will assume to be negligible all other crustal and mantle sources, including lateral variations in density along the Moho. All tesseroid models are defined with respect to a spherical Earth of radius 6378.137 km.

Gravity and seismological data
The raw gravity data are generated from the satellite only spherical harmonic model GOCO5S (Mayer-Guerr et al. 2015). The GOCO5S model combines data from 15 satellites, including the complete mission data from the GOCE satellite. The data were downloaded from the International Centre for Global Earth Models (ICGEM) web-service (Barthelmes & Köhler 2012, http://icgem.gfz-potsdam.de/ICGEM/) in the form of the complete gravity field on a regular grid with 0.2 • grid spacing at ellipsoidal height 50 km. We calculate the gravity disturbance (δ(P) in eq. 1) by subtracting from the raw data the normal gravity of the WGS84 reference ellipsoid (γ (P)) using the formula of Li & Götze (2001). Fig. 8(a) show the calculated gravity disturbance of South America.
We remove the gravitational effect of the topography from the gravity disturbance by modelling the ETOPO1 digital terrain model (Amante & Eakins 2009, http://dx.doi.org/10.7289/V5C8276M) using tesseroids (Fig. 8b). We used the standard densities of 2670 kg m −3 for continents and −1630 kg m −3 for the oceans. Fig. 8(c) shows the calculated gravitational attraction of the topographic masses at 50 km height. Fig. 8(d) shows the Bouguer disturbance (eq. 2) obtained after subtracting the topographic effect from the gravity disturbance.
The effect of sedimentary basins is removed using tesseroid models of the three sedimentary layers present in the CRUST1.0 model (Laske et al. 2013, http://igppweb.ucsd.edu/˜gabi/rem.html). Each sedimentary layer model includes the density of each 1 • × 1 • model cell. Figs 8(e)-(g) show the thickness of the upper, middle, and lower sedimentary layers, respectively. The density-contrasts of the tesseroid model are obtained by subtracting 2670 kg m −3 from the density of each model element. Fig. 8(h) shows the combined gravitational attraction of the sedimentary basin tesseroid model. We subtract the total effect of sediments from the Bouguer disturbance in Fig. 8(d) to obtain the sediment-free Bouguer disturbance (Fig. 9a), which will be used as input for the inversion. Fig. 9(b) shows the 937 known Moho depths (coloured dots) which were estimated from seismological data by Assumpção et al. (2013). This data set is used in the validation procedure.

Inversion, cross-validation, and validation using seismological data
As in the CRUST1.0 synthetic data test (Section 3.2), we estimate the hyperparameters in two steps. First, we run the cross-validation to estimate an optimal regularization parameter (μ). The starting  estimate for all inversions is 60 km depth for all model parameters. For this cross-validation, we keep z ref and ρ fixed to 20 km and 500 kg m −3 , respectively. Second, we use the estimated μ to run the validation using the seismological data of Assumpção et al. (2013) to estimate z ref and ρ, thus obtaining the final estimated Moho depth model.
We split the sediment-free gravity disturbance (Fig. 9a) into the training and testing data sets. The training data set is a regular grid with 0.4 • grid spacing (twice the spacing of the original data grid) and N lat × N lon = 201 × 151 grid points, a total of 30 351 observations. The remaining 90 350 points compose the testing data set. We test 16 values of the regularization parameter (μ) equally spaced on a logarithmic scale between 10 −10 and 10 −2 . Fig. 10(a) shows the Mean Square Error (MSE) as a function of μ. The minimum MSE is found at μ = 10 −10 , the lowest value of μ tested, suggesting that little or no regularization is required.
We proceed with the validation using seismological data using μ = 10 −10 in all inversions. We test all combinations of nine values of z ref , from 20 to 40 km with 2.5 km intervals, and seven values of ρ, from 200 to 500 kg m −3 with 50 kg m −3 intervals. Fig. 10(b) shows a pseudo-colour map of the MSE with respect to the Assumpção et al. (2013) data set. The MSE has a minimum, indicated by the red triangle, at z ref = 35 km and ρ = 400 kg m −3 . The minimum is not as well-defined as for the CRUST1.0 synthetic ( Fig. 7b), which is expected because in reality ρ is not homogeneous across all of South America and the surrounding oceans.

Moho model for South America
The final Moho depth model for South America is shown as a pseudo-colour map in Fig. 11. The model is available in the online repository that accompanies this contribution (see Section 2.7). Each model element is a 0.4 • × 0.4 • tesseroid, represented by the pixels in the pseudo-colour map.
Our model differs significantly from CRUST1.0 ( Fig. 6a) but contains most of the large-scale features present in the GMSA12 gravity-derived model of van der Meijde et al. (2013). The deepest Moho is along the central Andes, reaching depths upward of 70 km. The oceanic areas present the shallowest Moho, ranging approximately from 7.5 to 20 km. The Brazilian and Guyana Shields have a deeper Moho (greater than 35 km), with the deepest portions in the area around the São Francisco Craton and the northern border of the Parecis Basin. The Moho is shallower than 35 km along the Guyana Basin, the Andean foreland basins, the Chaco Basin, and along the centres of the Solimões, Amazonas and Paraná Basins. Fig. 12(a) shows the gravity residuals, defined as the difference between the observed and predicted gravity data. Fig. 12(b) shows the differences between the seismic-derived Moho depths of 172 L. Uieda and V.C.F. Barbosa Figure 9. Input data for the South American Moho inversion. (a) Sediment-free Bouguer disturbance for South America. Obtained by subtracting the total sediment gravitational effect (Fig. 8h) from the Bouguer disturbance (Fig. 8d). (b) Seismological Moho depth estimates from Assumpção et al. (2013). Assumpção et al. (2013;Fig. 9b) and the depths of our gravityderived model (Fig. 11). The differences shown in Fig. 12(b) range from approximately −23 to 23 km and have a mean of 1.18 km and a standard deviation of 6.84 km. The gravity residuals and Moho depth differences from seismic are smallest in the oceanic areas, southern Patagonia, and the eastern coast of the continent. The largest gravity residuals are located along the Andes and correlate with the deepest Moho depths. These large residuals follow a pattern of a negative value in the centre flanked by positive values to the east and west. This same pattern is observed in the CRUST1.0 synthetic test results (Fig. 7). In general, larger gravity residuals appear to be associated with sharp variations in the estimated Moho depth. Along the Andes, large differences with seismic data are correlated with the larger gravity residuals. Conversely, this correlation is absent from the large differences seen in the Guyana, Paraná, and the Solimões Basins. In the Borborema province, northeastern Brazil, our model slightly overestimates the Moho depth. On the other hand, our model underestimates the Moho depths in the Amazonas, Solimões, and Paraná Basins. Particularly in the Amazonas and Solimões Basins, where our model predicts a Moho depth of approximately 30 km, the differences with the seismological estimates can reach 10 km or more.

Discussion
Differences between our Moho depth model and the seismological data (Fig. 12b) may indicate regions where our initial assumptions (summarized in Fig. 1) are inadequate or where we have failed to correct for all crustal and mantle sources. The largest differences are seen along the Andean Province and are likely caused by the fact that our model does not include the subducting Nazca plate. Furthermore, the CRUST1.0 synthetic data test (Fig. 7) suggests that our inversion method is not able to fully recover deep Moho depths in the Andes, even without the effect of the subducting plate. In the Guyana Basin, our model is able to fit the gravity data but differs from the seismological data by up to ±10 km with no clear pattern for the distribution of the differences. A possible explanation is an inaccuracy in the CRUST1.0 sediment model (Laske et al. 2013) used to correct our gravity data. The inversion results will be biased if the input data includes effects other than the anomalous Moho.
In the Amazon and Paraná Basins, our model fits the gravity data but underestimates the seismological data by up to 15 km. This indicates that a mass excess may be present in the crust or in the upper mantle. A body with positive density contrast whose gravitational effect was not removed from the data during processing will make the observed gravity disturbance greater than it would be otherwise. This will cause the inversion to produce a shallower Moho estimate. These discrepancies between gravity and seismological estimates have been noted before by Nunn & Aires (1988) for the Amazon Basin and Mariani et al. (2013) for the Paraná Basin. Both studies propose high density rocks in the lower crust as probable causes for the discrepancies. Another possible cause for the observed discrepancy in our model could be our failure to fully remove from the data the effects of the igneous intrusions present in both basins. Using a sediment model for South America more detailed than CRUST1.0 might lead to more accurate results for these basins.
Greater confidence in our Moho model can be had in areas where it is able to fit both the gravity and the seismological data. In such places, a gravity-derived model can serve as an interpolator for the seismological point estimates. In general, our model fits both data sets in the oceans, the Atlantic coast of the continent, and central Brazil. In the Parnaíba Basin, our model has small differences with the few seismological data points that fall inside and on the borders of the basin. The Moho surrounding the basins continental borders is deep, following inward with a shallower Moho, then a deeper step, and finally a shallower part in the middle of the basin. Recent deep seismic reflection studies by Daly et al. (2014) that cross the basin from west to east agree with our model. The Northern border of the Parecis Basin has a deep Moho in our model that is corroborated by a single seismological data point.

C O N C L U S I O N S
We have developed a computationally efficient gravity inversion method in spherical coordinates. Our method extends the Gauss-Newton formulation of Bott's method to use tesseroids as model elements and Tikhonov regularization. The computational efficiency of our method is due to using Bott's approximation for the Jacobian matrix and using sparse matrix algorithms for arithmetic operations and the solution of linear systems. This approximation for the Jacobian matrix is adequate and the method converges even when the data are at higher altitudes and the model is not outcropping, as shown by the applications to synthetic data. We employ hold-out cross-validation to estimate the regularization parameter and a validation procedure using seismological data to estimate the Moho density-contrast and the Normal Earth Moho depth.
There are two main advantages of the proposed method over previous works. First, unlike the Parker-Oldenburg method or methods using rectangular prisms, our inversion method does not require the data to be projected onto a plane. Second, the Parker-Oldenburg method and methods derived from Bott's method cannot apply the traditional methodology for constraining inverse problems. Our method has no such restriction because we use the formalism of traditional Tikhonov inversion. However, the proposed method is not without limitations. It requires data on a regular grid and restricts the model to be a regular mesh tied to the data grid. Like Bott's method, our method only works for gravity disturbances and not for the gravity gradients.
The test on simple synthetic data shows that our inversion method is able to recover a smooth Moho relief with a homogeneous densitycontrast distribution. The inversion was not able to fully recover the shortest wavelength feature in the model, possibly due to the smoothness constraints which tend to soften high-frequency (sharp) variations. The cross-validation Mean Square Error curve has a welldefined minimum, indicating a value of the regularization parameter (μ) whose corresponding estimate best predicts data that were not included in the inversion. Using this value of μ in the inversion leads to a stable solution characterized by a smooth Moho relief with an acceptable data misfit.
The efficiency of the proposed method is because solving linear systems and performing matrix multiplications together account for a mere 0.067 per cent of the total computation time required for a single inversion. The majority of the computation time (99.824 per cent) is spent on forward modelling. Thus, we are able to retain the high computational efficiency of Bott's method and use a classic Tikhonov regularization formulation. This approach could, in theory, be extended to other types of regularization (e.g. total variation) and misfit functions (e.g. re-weighted least squares) already available in the literature.
The more complex synthetic data test based on CRUST1.0 shows that the validation using pointwise Moho depth information is able to correctly estimate the density-contrast ( ρ) and Normal Earth Moho depth (z ref ). This test indicates that the inversion neither correctly estimates Moho depth nor adequately fits the gravity and pointwise data when sharp variations in Moho depth occur. This phenomenon is particularly strong in the region below the Andes. A likely explanation is that the smoothness regularization is intrinsically unable to produce sharp variations in Moho depth. These effects might be mitigated with the use of sharpnessinducing regularization, like the weighted smoothness inversion, Cauchy norm regularization, entropic regularization, total variation regularization, or an adaptive mixed smoothness-sharpness regularization.
We applied the method proposed here to estimate the Moho depth for South America. Our estimated Moho depth model is in accordance with previous results. The model fits well the gravity and seismic data in all oceanic regions, the central portion of the Andean foreland, Patagonia, and coastal and central parts of Brazil. However, the model is unable to fit the gravity and seismic data in places with sharp variations in Moho depth, particularly below the Andes and in the boundaries of the main geotectonic provinces of the South American Plate, like the Borborema province, the Parnaiba Basin, and the São Francisco Craton. This might indicate that smoothness regularization should not be applied indiscriminately to the whole model, as suggested by the CRUST1.0 synthetic data test. Another reason for the observed misfit might be the presence of crustal or mantle density anomalies whose gravitational effects were not removed during the data corrections. In the Guyana Basin on the coastal region of Venezuela, along the central Amazonas and Solimões Basins, and in the Paraná Basin, our Moho depth model is able to fit the gravity data but differs significantly from the seismic data. These discrepancies in the Paraná and Amazonas Basins are interpreted in the literature as high density rocks in the lower crust. In general, differences between a gravity and a seismically derived Moho model may indicate the presence of crustal or mantle density anomalies that were unaccounted for in the data processing. Such locations warrant further detailed investigation.  Fig. 11. (a) Gravity residuals, defined as the difference between the observed data in Fig. 9(a) and the data predicted by the estimate in Fig. 11. (b) Differences between the seismological depth estimates of Assumpção et al. (2013) and our gravity-derived Moho depth estimate. The inset in b shows a histogram of the differences along with their calculated mean and standard deviation (std). Dotted lines mark the limits of major geologic provinces and lithospheric plates.

A C K N O W L E D G E M E N T S
We are indebted to the developers and maintainers of the opensource software without which this work would not have been possible. We thank Marcelo Assumpção for providing the seismological Moho depth estimates, Naomi Ussami, Julio C.S.O. Lyrio, Cosme F.P. Neto, and Daniel R. Franco for their constructive feedback, and Editor Dr Gary Egbert and two anonymous reviewers for their thoughtful comments that helped improve this work. L. Uieda was supported by a scholarship from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). V.C.F. Barbosa was supported by a fellowship from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).