Upper Mantle Radial Anisotropy Under the Indian Ocean from Higher Mode Surface Waves and a Hierarchical Transdismensional Approach


 We investigated the likelihood of radial anisotropy in the shallow and deep upper mantle, including the mantle transition zone (MTZ) under the Indian Ocean. Seismic anisotropy can be an indicator of mantle deformation through lattice preferred orientation of anisotropic crystals in the mantle. It has thus the potential to illuminate Earth’s dynamic interior, but previous seismic tomography studies have not achieved consensus on the existence of radial anisotropy below ∼250 km depth. We developed a fully nonlinear transdimensional hierarchical Bayesian Markov Chain Monte Carlo approach to invert fundamental and higher mode surface wave dispersion data and applied it to a subset of a global Love and Rayleigh wave data set. We obtained posterior model parameter distributions for shear wave velocity (VS) and radial anisotropy ξ under the Indian Ocean. These posterior model distributions were used to calculate the probability of having radial anisotropy at different depths. We demonstrated that separate inversions of Love and Rayleigh waves yield models compatible with the results of joint inversions within uncertainties. The obtained pattern of VS anomalies agrees with most previous studies. They display negative anomalies along ridges in the uppermost mantle, but those are stronger than for regularized inversions. The Central Indian Ridge and the Southeastern Indian Ridge present velocity anomalies that extend to ∼200 km depth, whereas the Southwestern Indian Ridge seems to have a shallower origin. Weaker, laterally variable velocity perturbations were found at larger depths. The anisotropy models differ more strongly from regularized inversion results and their uncertainties were rather large. We found that anisotropy models from regularized inversions also depend on the chosen parametrization, which is consistent with the existence of a large model null-space. Apart from a fast horizontally polarized shear wave signal in the top 100 km, likely reflecting the horizontal plate motion due to asthenospheric deformation, no clear relation to surface geology was detected in the anisotropy models. We found that, although the anisotropy model uncertainties are rather large, and lateral variations are present, the data generally prefer at least 1 per cent anisotropy in the MTZ with fast vertically polarized shear waves, within errors. Incorporating group velocity data did not help better constrain deep structure by reducing parameter trade-offs. We also tested the effect of prior constraints on the 410- and 660-km topography and found that the undulations of these discontinuities had little effect on the resulting models in our study region.


INTRODUCTION
Seismic anisotropy, which is the directional dependence of wave propagation, can provide clues to the properties and dynamics of the mantle. Radial anisotropy is the change in seismic wave velocity between the horizontal and vertical direction. Seismic anisotropy can occur via shape preferred orientation (SPO), which occurs where multiple isotropic mediums are stacked on top of each other, such as melt pockets and cracks (Kendall & Silver 1998;Faccenda et al. 2008;Sakamaki et al. 2013). Although the individual mediums are isotropic, the bulk medium has directional dependence. Seismic anisotropy can also develop due to the crystallographic or lattice preferred orientation (CPO or LPO) of intrinsically anisotropic crystals. If such crystals can deform by dislocation creep, they may partially align within the strain field caused by mantle convection, resulting in bulk anisotropy that may be observable at the seismic scale. There are abundant observations of seismic anisotropy in the crust and upper mantle, both globally and regionally. Radial anisotropy was first proposed by Anderson (1962) to explain apparent discrepancies between shear-wave velocity models derived from inversions of Love wave dispersion curves and those resulting from Rayleigh wave data. The Preliminary Reference Earth Model (PREM) (Dziewonski & Anderson 1981) was the first global model to include this type of anisotropy in the top 220 km of the mantle. Since then, many global and regional studies have found evidence for its presence in the uppermost mantle (Montagner & Tanimoto 1991;Shapiro & Ritzwoller 2002;Visser et al. 2008a).
LPO of olivine is the prevailing explanation for the existence of seismic anisotropy at these depths due to the abundance of this mineral in the mantle and its high single crystal anisotropy (Mainprice et al. 2005). Mantle convection is thought to be the origin of LPO in the upper mantle, which is supported by observations of the seismically fast axes in the asthenosphere aligning with absolute plate motion (Becker et al. 2003;Burgos et al. 2014;Debayle & Ricard 2013;Beghein et al. 2014). Therefore, improving our constraints on seismic anisotropy is important to better understand the nature of mantle convection. In particular, determining whether it is present in the mantle transition zone (MTZ) is important since this region likely plays an important role in Earth's thermal evolution, but it is a challenging problem. Improving constraints on upper mantle anisotropy can also help determine the depth of the lithosphere-asthenosphere boundary (LAB), which directly relates to the fundamental concept of plate tectonics. Additionally, vertical changes in seismic anisotropy can indicate layering in the upper mantle. This, in turn, can provide insight on the LAB depth and can indicate the existence of compositional boundaries (Smith et al. 2004;Yuan & Romanowicz 2010;Beghein et al. 2014).
While the presence of seismic anisotropy in the upper 250 km of the mantle, in the lowermost mantle, and in the inner core is well established, its existence in the rest of the mantle is still debated. For decades, seismic anisotropy had not been observed in the deep upper mantle or in most of the lower mantle, which was interpreted as an indication that deformation at those depths did not produce LPO (Karato & Li 1992). More recently, however, radial and azimuthal anisotropy observations below 250 km depth, including in and below the MTZ, have challenged this view (e.g. Gung et al. 2003a;Panning & Romanowicz 2006a;Yuan & Beghein 2013Lynner & Long 2015;Huang et al. 2019). Constraining seismic anisotropy in the deep upper mantle is, however, not straightforward. Body waves do not generally have good vertical resolution as they travel quasi vertically. Some shear-wave splitting analyses using source-side measurements have been able to identify seismic anisotropy originating from large depths, though they cannot separate the contribution of the uppermost mantle from that of the transition zone and lower mantle (Lynner & Long 2014. Tomographic studies that include higher mode surface wave dispersion and/or long period body waveforms generally result in models with better depth resolution, but lower lateral resolution. Several recent publications have reported seismic anisotropy of up to a few percents in the transition zone (e.g. Panning & Romanowicz 2006a;Yuan & Beghein 2013;Auer et al. 2014;Moulik & Ekström 2014;Chang et al. 2015;Ferreira et al. 2019), but there are large discrepancies between models at those depths (Schaeffer et al. 2016). Differences in the models can arise from differences in the datasets, but also from different inversion techniques and applied regularizations.
Tomographic inverse problems are typically solved by trying to minimize a cost function that measures the distance between data and model predictions in a least-squares sense.
They are, however, generally not well-posed and can thus be highly non-unique, i.e., several solutions can explain the data equivalently well. Traditional inverse methods deal with this non-uniqueness by imposing subjective regularizations, but those can have significant impacts on the calculated solution. These regularizations include the choice of the reference model, the model parameterization, and/or the choice of ad hoc damping parameters that compromise between minimizing the misfit and the size of the model (Trampert 1998). In addition, regularized inversions can underestimate model uncertainties as the posterior model covariance is generally smaller than the prior by construction (Tarantola 2005). In addition, resolution analyses for regularized inverse methods depend on the regularization, which may give a false sense of certainty in some model parameters. It should also be noted that errors in tomographic models can be introduced by errors or approximations in the forward theory.
One common example in seismic tomography is solving linearized equations instead of the full non-linear problem.
In this study, in order to avoid some of these issues, we developed a Hierarchical Transdimensional Bayesian (TB) inverse method (Sambridge et al. 2005;Bodin et al. 2012Bodin et al. , 2016Gao & Lekić 2018), which uses a reversible jump Monte Carlo Markov Chain (rj-MCMC) algorithm (Green 1995(Green , 2003Gallagher et al. 2009) to solve the non-linear forward problem associated with calculating higher mode surface wave dispersion from an interior model. With advances in computing, Monte Carlo methods, which repetitively solve the forward problem, have become more and more common in geophysics and seismic tomography in particular. An advantage of these techniques is that they can solve non-linear problems directly instead of relying on a linearized approximation of the problem. In addition, besides the choice of the prior model parameter distribution, they do not require as much regularization as traditional inverse methods. By testing thousands or millions of possible models, they can provide an ensemble of solutions that fit the data instead of choosing a single one with a subjective regularization. This provides quantitative posterior model uncertainties and allows for an estimate of the probability of different values of the model parameters. In our study, we used the resulting parameter distributions to determine the likelihood of the presence of radial anisotropy at different depths in the upper mantle.
It should be noted that a different fully non-linear approach to the same problem was previously taken by Visser et al. (2008a) to determine the likelihood of radial anisotropy in the top 1400 km of the mantle at the global scale. They had found that 1−2% radial anisotropy with vertically polarized shear-waves traveling faster than horizontally polarized shear-waves was likely in the mantle transition zone globally. Our technique has, however, several possible advantages compared to their work. First, it allows for the depth parametrization to change during the model space search, letting the data guide how complicated the model needs to be instead of imposing a fixed set of basis functions a priori. Second, the hierarchical nature of our MCMC algorithm enables us to include the data noise among the unknowns. Using incorrect data noise can negatively impact the accuracy of inversion results (Bodin et al. 2012). Instead, by inverting for data noise, we allow the algorithm to decide how much noise is needed in order to explain the measurements without overfitting them.
In this manuscript, we first present the data and method employed. Second, we demonstrate the dependence of radial anisotropy models on the regularization imposed when performing linearized inversions. Third, we apply our TB technique to a fundamental and higher mode surface wave dataset sampling the Indian Ocean and compare the results with those of a regularized inversion. Fourth, we test the performance of the method with synthetic tests.
Finally, we briefly discuss the results before drawing conclusions.

DATA
The measurements used in this study are the Rayleigh and Love wave fundamental and higher mode phase velocity maps measured by Visser et al. (2008b). The authors used a model space search to perform those measurements, which allowed them to also calculate quantitative data uncertainties. Both Rayleigh and Love waves are necessary to constrain radial anisotropy as they each are sensitive to separate shear-wave velocity polarization directions. Rayleigh waves are sensitive to vertically polarized shear-wave speed (V SV ), and Love waves are sensitive to horizontally polarized shear-wave speed (V SH ). The Voigt average bulk shear-wave speed (V S ) and radial anisotropy (ξ) can be calculated from V SV and V SH (Babuska & Cara 1991): Both wave types are also sensitive to density, ρ, and Rayleigh waves are additionally sensitive to elastic parameters relating to compressional wave velocities. However, due to the existence of parameter trade-offs, Rayleigh waves and Love waves can only resolve shear-wave parameters. We thus only inverted for the two S-wave velocity parameters only, V SH and V SV , as explained in the Method section. Visser et al. (2008b) measured phase velocity dispersion for branches up to the sixth overtone for Rayleigh waves and up to the fifth overtone for Love waves. Including higher mode data is critical to constrain elastic structure beyond the uppermost 200 km of the mantle because they are sensitive to deeper structure than fundamental modes at the same period.
Sensitivity curves for V S and ξ for this dataset are displayed in Fig. 1 at periods between 35 s and 175 s. They show that the higher modes are much more sensitive to structure below 200 km than the fundamental modes, and demonstrate that the dataset used in this study is sensitive to structure well below the bottom of the mantle transition zone. For this work, we divided the phase velocity maps into 5 • × 5 • cells, with each cell being assigned velocities based on the values of the maps at the center of the grid cell. The surface wave velocities were inverted at each of the resulting 238 grid cells separately using different methods, as described in section 3, to obtain one-dimensional (1-D) shear-wave velocity depth profiles.
The obtained individual 1-D velocity profiles were then used to construct shear-wave velocity and anisotropy maps.

METHOD
In this study, we compared results from regularized linear inversions of phase velocity maps with those obtained with a fully non-linear Bayesian approach. Both methods are described below.

Regularized Inversion
The relation between perturbations in phase velocity, c and perturbations in the parameters m describing the physical state of Earth's interior can be written as follows (Woodhouse & Dahlen 1978): where R is the radius of the Earth, δd represents the phase or group velocity data, T is the period considered, and b is the boundary radius. When constraining radial anisotropy, one needs to account for five elastic parameters in addition to density. These elastic parameters are : L and N , which relate to the speed of vertically and horizontally polarized shear-waves, respectively (V 2 SV = L/ρ and V 2 SH = N/ρ); C and A, which relate to P-wavespeed for waves traveling vertically and horizontally, respectively (V 2 P V = C/ρ and V 2 P H = A/ρ); and F , which describes waves traveling at intermediate angle. If we define η = F/(A − 2L), we thus have (Montagner 1986): Among these six parameters, only the two shear-wave related parameters can be reasonably well resolved with surface waves (e.g. Beghein & Trampert 2004). It is customary to assume scaling relationships for the other parameters and to invert for the shear-wave parameters only. Such relationships were derived by Montagner & Anderson (1989) for the uppermost 200 km of the mantle and are commonly used in surface wave inversions for radial anisotropy (e.g. Panning et al. 2006;Xing & Beghein 2015): δ ln η δ ln ξ = −1.5 While these relations are strictly only valid for the shallow upper mantle, studies have shown that using them for the deeper mantle does not affect the shear-wave velocity or anisotropy model significantly (Panning et al. 2006). We thus applied them to the entire depth range of interest. Equation 3 then becomes: where the K kernels are linear combinations of the kernels in Equation 4. The problem can alternatively be described in terms of perturbations in bulk shear-wave velocity and anisotropy: where V S and ξ are defined in Equations 1 and 2.
These equations can be discretized by introducing a basis function to parameterize the depth dependence of the model parameters. We chose a basis function in terms of J splines S j : Equation 3 then becomes: where index i is for each one of the two elastic parameters we solve for. The forward problem can thus be written as: where d is the data vector. G is the kernel matrix that maps the model space into the data space: with G kj an element of G, and K k m i (r) the sensitivity kernel for the ith model parameter in m and the kth measurement in d.
A generalized inverse matrix, G g , can be calculated via singular value decomposition (SVD) and can be used to determine a least-squares solution m (Jackson 1972;Lanczos 1961;Wiggins 1972). If G is a n × m matrix where n is the number of data points and m is the number of model parameters, it can be decomposed into: where U is a n×n matrix of eigenvectors that span the data space, Λ is a n×m diagonal matrix whose columns are non-negative eigenvalues λ i , and V is a m × m matrix of eigenvectors that span the model space. The λ 2 i are the singular values of G. The generalized inverse of G is thus: where p is the number of non-zero eigenvalues. The estimated model parameters m est are then given by a sum limited to the non-zero eigenvalues to insure stability of the solution: Here, we applied the method of Matsu'Ura & Hirata (1982) to determine p: we first normalize G by the data covariance matrix C d and a prior model covariance matrix C m : We then calculated its eigenvalues and determined the solution using all the eigenvalues smaller or equal to unity. For C d , we used a diagonal matrix and the uncertainties estimated by Visser et al. (2008b). The choice of model covariance matrix acts as implicit regularization as it yields a different cutoff in the number of eigenvalues (Snieder & Trampert 2000).
In this study, we inverted Love and Rayleigh waves jointly using the SVD technique described above, and compared models obtained with different prior values for C m and with different model parameterizations (Equations 9 and 10). The misfit (φ d ) was defined by: where N is the total number of measurements, d obs i is the ith measured data, and d pre i is the ith predicted data calculated with Equation 12. We followed Xing & Beghein (2015) and corrected for the effect of the crust by applying non-linear crustal corrections based on the 3-D crustal model CRUST 1.0 (Laske et al. 2013). To calculate these non-linear crustal corrections, we divided the phase velocity maps into 5 • × 5 • cells and, at each grid cell, we created local 1-D models composed of CRUST1.0 and the PREM mantle. At each grid cell, we then used MINEOS (Masters et al. 2011) to predict phase velocities. Those predictions were then subtracted from the real measurements so that the corrected data contain information about mantle structure only. Sensitivity kernels were calculated at each grid cell relative to the local reference model, and we inverted for perturbations in mantle structure only.

Transdimensional Hierarchical Bayesian method
While solving the inverse problem using the regularized method described above is relatively simple and efficient, it yields solutions that depend on the regularizations imposed, including the choice of a reference model. In addition, posterior model uncertainties can be underestimated if the model null-space is large (Trampert 1998), and the non-linear relation between phase velocity and structure are approximated by linearized equations. We thus developed instead a non-linear model space search approach to find a range of plausible solutions instead of choosing one with a subjective regularization. Our technique is based on the hierarchical TB method of Bodin et al. (2012) to explore the model space with a rj-MCMC method, which we combined with Fortran code MINEOS (Masters et al. 2011) to solve the forward problem.
As stated before, some of the advantages of this method compared to traditional inversions include the fact that the depth parameterization can change at each iteration, allowing the data themselves to constrain how complicated the model needs to be. In addition, unknown data noise can be simultaneously inverted for instead of being fixed at a presumed level, which reduces the risk of mapping noise into the model. Another advantage is that model parameters are described by probability density functions (PDFs), which can be used to quantify their uncertainties.
In this study, noise was assumed to be Gaussian, and we chose to parameterize the models as a set of layers. For each layer, a Voronoi nuclei defines its elastic parameter, and the boundaries between layers are the midpoints between adjacent nuclei. The number of points and their depths are allowed to change at each iteration, meaning that the model depth parameterization is a variable of the inversion, in addition to the elastic parameters for each layer. Monte Carlo inversions are generally time-consuming, and this is especially true when solving the forward problem with MINEOS. Minimizing the number of unknowns to solve for is thus important. For this purpose, we inverted Love and Rayleigh waves separately for V SH and V SV , respectively. We verified that the models obtained were equivalent to joint inversion results (see section 5.1). Scaling relations were imposed on the other elastic parameters as in section 3.1 (Montagner & Anderson 1989). The resulting distributions of 1-D V SH and V SV profiles were then sampled and used to calculate V S and ξ distributions (Equations 1 and 2).
With a hierarchical TB approach, model parameters or data noise level are perturbed iteratively, and each new model and data noise combination is either accepted or rejected based on an acceptance criteria as described below. If a change is accepted, the iteration repeats with the modified parameter, and a parameter is randomly selected to be perturbed again. If rejected, the iteration repeats with the model and noise used prior to the modification.
Every 100 iterations, the current model is added to the ensemble of solutions. This process is repeated thousands of times in order for the chain to reach convergence, after which the saved ensemble of models gives a posterior for V SH or V SV . For our application, each chain had a burn-in period of 50,000 iterations before models were saved. We then ran enough chains and iterations to get 10,000 samples after the burn-in period.
Data noise perturbations consisted in random values generated from a prior Gaussian distribution that were added to the original noise following level. A model perturbation can consist in moving a Voronoi nucleus, inserting or removing a nucleus, or randomly modifying the elastic parameters at a nucleus using a prior Gaussian distribution. The acceptance or rejection criteria used depends on the likelihood and the prior probability as described in Bodin et al. (2012). According to Bayes' theorem (Bayes 1763), the posterior probability of having model m when a set of observed data d obs is fixed was given by: where p(d obs |m) is the likelihood function, which gives the probability of observing data given a particular model, and p(m) is the a priori probability of model m, which reflects our prior knowledge on the parameters before having the observed data. The acceptance criteria compares the posterior for the perturbed model to that of the original model. The larger the perturbed model posterior, the higher the likelihood the new model will be accepted. Here, the likelihood function is defined by: where d pre are the data predicted by model m, N is the total number of measurements, and σ is the total data noise level. The prior can be written as: where c is the set of depths of the Voronoi nuclei associated with m, v is the set of wave velocities, k is the number of Voronoi nuclei, and h is the set of noise parameters.
In Bodin et al. (2012), the prior p(c, v) was a uniform distribution that was constant with depth. Thus, p(c, v|k) can be written as the product of p(c|k) and p(v|k). The prior probabilities of models with the same number of Voronoi nuclei are equal, but they differ when layers are added and removed due to the prior on the velocities: where v i is the velocity of the ith Voronoi nuclei and ∆v is the range of allowable velocities.
In our case, the prior on V SV or V SH is a uniform distribution within 10% of CRUST1.0 in the crust and within 10% of PREM in the mantle. Since velocities are generally increasing with depth, δv increases with depth. This depth-dependent prior means that points at shallower depths will generally have a higher prior probability than points at larger depths. For anisotropy, the models are even more sensitive to changes in parameterization and regularization, even at 100 km depth where the data have some of the strongest sensitivity to structure (Fig. 1). Fig. 3 shows that for the V SV and V SH inversions at 100 km depth, both

Effect of Mantle Transition Zone Topography
In theory, topography of the MTZ boundaries can affect higher mode phase velocities (Equation 12). Higher modes have thus the potential to contain information about the MTZ thickness, which could be used to constrain its thermochemical state (Meier et al. 2009). Here, we tested the effect of MTZ topography on our results by first inverting the dispersion data provided by Visser et al. (2008b) and corrected for the effect of the crust as described above, and then comparing the models with those obtained by inverting the same data set corrected with a prior model of MTZ topography. For this, we used the 410 km and 670 km topography values obtained by Huang et al. (2019) with SS precursors and calculated their contributions to Rayleigh and Love wave phase velocities (Equation 12). Fig. S1 displays examples of MTZ topography corrections for MTZ sensitive modes and shows that they are at least an order of magnitude smaller than the measurements in the study region, and are thus unlikely to have an effect on the models. These contributions were nevertheless removed from the dispersion measurements and the remaining signal was inverted for at several grid cells. Fig. S2 shows the 1-D V S and ξ profiles obtained at two different locations within the study area using data corrected for the crust and data corrected both for the crust and for deviations in the depth of the 440 km and 660 km boundaries. We see that the effect of the boundary topography on the model velocities is not significant as both sets of inversions produced models that were nearly identical. This is in agreement with the study of Meier et al.
(2009) who inverted these same higher mode dispersion measurements using neural networks and found that the 410-km and 660-km topographies were not well constrained with respect to the prior PPDFs. We thus proceeded with uncorrected data for the rest of the study.

Joint vs. Separate Inversions
Non-linear inversions with Monte Carlo methods are computationally prohibitive. This is espe- and Rayleigh waves separately assuming isotropy did not differ significantly from those calculated for anisotropic PREM. The differences were within data errors. They did not, however, compare joint and separate non-linear inversions of real dispersion data. For completeness, we thus decided to verify whether this assumption is valid for our method using real data.
For this purpose, we inverted Love and Rayleigh wave fundamental and higher mode dispersion data separately at a few grid cells, and compared the models with those obtained from joint Love and Rayleigh wave TB inversions (Fig. 4). An important aspect of these joint Non-peer reviewed preprint sub. to EarthArXiv 17 inversions is that they were performed using independent depth parameterizations for V SH and for V SV . As demonstrated by Gao & Lekić (2018), inversions performed in a TB framework that use identical choices of depth parameterizations for the different unknowns can bias structure estimates. This "attached-type" parameterization is very common in seismology and has the advantage of potentially reducing the number of parameters, but it also constitutes a form of implicit regularization that imposes the same geometry to all parameters. It often results in a model geometry that is determined by the best resolved parameters but that does not necessarily appropriate for the others. We verified that this is true for our dataset as well. Fig. 4 compares results of joint inversions with independent parametrizations for V SH and for V SV and results of separate Love and Rayleigh wave inversions using our non-linear TB inversion approach. It displays the mean models and their 90 % confidence intervals on the same plots for the two types of inversions. We can see that there are few differences between the mean models and that separate inversions yield equivalent results to joint inversions as long as model uncertainties are accounted for. We thus proceeded with separate inversions for the rest of our study.

Effect of noise parameterization
As explained above, one advantage of the TB method is that it can invert for unknown data noise jointly with other parameters to prevent the mapping of data uncertainties into the model. While the phase velocity maps we used here (Visser et al. 2008b) were published with quantitative uncertainties at each period, these uncertainties resulted from averaging the errors on each measured earthquake-station pair. They therefore do not reflect the lateral variations in the uncertainties that can be caused by differences in ray coverage, for instance.
It is thus reasonable for us to invert for data noise in addition to V S and ξ instead of In order to determine the best approach for our problem, we compared TB inversion results for different noise parameterizations. In the first case we used one noise parameter for the Love wave inversions and one for the Rayleigh wave inversions. In the second case, each of the data subsets (Love and Rayleigh) were divided into two groups: one group contains mode branches n = 0 to n = 2 and the other group contains the mode branches n = 3 and higher. Each of these groups had its own noise level. In the third case, each mode branch within the two subsets had its own noise level. Results are shown for V SV and V SH in Fig. 5

. Supplementary
Figs. S3-S5 display results for the reconstructed V S and ξ and the noise parameter PPDFs.
The models are not significantly different, though the cases with additional noise parameters yielded less well-resolved V S and ξ. This is expected when increasing the number of unknowns in the inversion if trade-offs are present. However, the same general trends still exist across these different model distributions. We also note that the posterior noise parameters are generally larger for Love waves than for Rayleigh waves, indicating that Love wave dispersion data are noisier. Considering how little effect the number of noise parameters has on the velocity and anisotropy models, we proceeded using only one for each data set inversion. For radial anisotropy, the methods have less agreement (Fig. 7). At 100 km depth, the SVD inversion shows a general horizontally fast direction (ξ > 1) with pockets of vertically fast (ξ < 1) direction underneath the CIR, SWIR, and SEIR, and under Madagascar. The TB results display a more ubiquitous ξ > 1 with larger amplitude across the region, with We thus decided instead to follow previous studies (Beghein & Trampert 2006;Visser et al. 2008a) and use the posteriors to calculate the likelihood of the presence of positive or negative anomalies in the model parameters. These results are shown in Figs. 8 and 9 for d ln V S relative to PREM and in Figs. 10 and 11 for radial anisotropy.

Models
In Figs. 8 and 9 we see that at 100 km depth, the probability P that V S anomalies are negative underneath oceanic ridges is greater or equal to 0.75, and that the anomalies are of at least 1% in strength. In addition, while the negative anomalies beneath the SWIR are likely to be between 1% and 5%, the CIR and SEIR are likely (P ≥ 0.75) characterized by stronger anomalies of at least −5% at that depth. These relatively low velocity zones largely disappear at 200 km depth, with the exception of a likely d ln V S < −1% underneath part of the SEIR. At this depth, positive relative velocity perturbations of at least 1% likely exist under East Africa and the SWIR, but they are unlikely to be > 5%. At 400 km, an almost ubiquitious d ln V s > 1% is found with a high likelihood. This anomaly is, however, probably lower than 5% for much of the region, with the exceptions of the region between Southern Africa and Madagascar and east of the CIR. At 500 km, relative negative or positive velocity perturbations of 5% or more are very unlikely to exist. At those depths, positive anomalies of at least 1% may exist near La Réunion Island and the SEIR, and negative anomalies of at least 1% may exist on the SWIR and east of the CIR.
Figs. 10 and 11 reveal that at 100 km depth, there is a high likelihood (P ≥ 0.75) for V SH > V SV with strength of at least 5% underneath much of the Indian ocean. The exceptions are the RTJ, where the CIR, SEIR, and SWIR intersect, and the region around the Indonesian subduction zone where the likelihood of ξ > 1 is low (Fig. 10). At the RTJ, the likelihood of ξ < 1 is also weak (Fig. 11), suggesting the area may not have any radial anisotropy. A small pocket of likely V SH < V SV with strength of at least 5% is also visible south of the Indonesian slab around 30 • S, 105 • E. At 200 km, the likelihood of anisotropy anomalies is more heterogeneous and we see both horizontally fast and vertically fast anomalies of about 1%, indicating a likely reduction in the amplitude of the anisotropy with depth. There is also a pocket east of the RTJ with likely horizontally fast velocities with at least 5% anisotropy.
At 400 km depth, the likelihood of V SH > V SV decreases further and a few locations are characterized by a probability of 0.7 or greater of having V SH < V SV with about 1% anisotropy (ξ 0.99). At that depth, a zone of likely vertically fast velocities with at least 5% anisotropy is found south of the RTJ and in between the the CIR and Indonesian subduction zone. At 500 km, we see that much of the region has likely V SH < V SV with 0.95 < ξ < 0.99.

TB Inversions Synthetic Tests
In order to test the robustness of the above results, we ran synthetic tests using some of the mean models displayed in Figs. 6 and 7 as input to calculate the synthetic data. The prior and the noise parameterization were defined in the same way as in the real data inversions. Fig. 12 shows the result of one such test. For V SV and the reconstructed V S , we find that the posterior model distribution encompasses the input model and that the mean of this distribution is representative of the input model, especially above 400 km depth. At greater depths, the uncertainties increase. For V SH , the peaks of the posterior distribution are offset compared to the input model, showing that the TB inversion tends to overestimate the model amplitudes.
However, the mean model follows the input model at most depths except between 400 km and 550 km where rapid vertical changes in the input model are not accurately captured by the mean of the posterior distribution. This is likely due to limitations in the vertical resolution of the data. Additionally, the 50 km to 150 km depth range is characterized by multiple peaks in the posterior distribution that indicate trade-offs: the data cannot distinguish between Figure 10. Probability of having V SH > V SV (i.e., ξ > 1) at different depths with at least 1% (left) and 5% (right) anisotropy. Figure 11. Probability of having V SH < V SV (i.e., ξ < 1) at different depths with at least 1% (left) and 5% (right) anisotropy. a high velocity zone above a low velocity layer and a constant velocity structure at these depths. Since ξ is the square of the ratio of V SH to V SV , these differences between input and mean model are amplified in the reconstructed ξ profile. We see that the general trend of the posterior distribution roughly follows that of the input ξ model, but the amplitudes of the input model and the mean or the peak of the posterior distribution differ. Nevertheless, it should be noted that the posterior ξ distribution encompasses the input model, which highlights the importance of accounting for model uncertainties when interpreting results.
A question that may still arise is whether there could be leakage from isotropic to anisotropic structure. We thus ran another synthetic test with an isotropic input model below 200 km ( Fig. 13). As in Fig. 12, we find a close match between the input model and the mean of anisotropy, the mean model shows a strong ξ < 1 near 200 km depth that peaks about 50 km deeper than the corresponding anomaly of the input model. At larger depths, the mean model oscillates around the isotropic input model, with peak anisotropy approaching 5% at some depths. If one were to interpret these results based on the mean anisotropy model alone, one would conclude that there is a risk of finding deep anisotropy with our method that is not constrained by the data. However, we note that the posterior ξ distribution below 200 km is wide and allows a wide range of values of both ξ > 1 and ξ < 1 at these depths. This deep anisotropy found in the mean model is therefore not resolved, but this again highlights how essential it is to determine the posterior model uncertainties before interpreting the anisotropy signal.
In order to determine the level of confidence with which we can interpret the likelihood map shown in Figs. 10 and 11, we used the posterior ξ distribution displayed in Fig. 13 to calculate the likelihood of finding 1% and 5% anisotropy for both V SV > V SH and V SH > V SV .
The results are shown in Fig. 14. We see a strong likelihood (P = 0.8) of finding ξ < 0.95 between 200 km and ∼ 230 km, where the input model is isotropic. We attribute this to the inability of the surface waves to recover a sharp velocity or anisotropy vertical change. Most noteworthy, however, is the fact that below ∼ 250 km depth, the probability that we might find an anisotropy signal not constrained by the data is generally less than 0.6. This means that, when discussing our results (Figs. 10 and 11), we can confidently conclude that the deep anisotropy signal in our models is constrained by the data as long as we only interpret anisotropy associated with a likelihood higher than 0.6.
Non-peer reviewed preprint sub. to EarthArXiv 31

Joint Group and Phase Velocity Inversion
Some of the results presented above show multiple peaks in the posterior distributions, indicating that trade-offs exist between the different depth parameters (Fig. 4). In addition, it is clear that Love wave phase velocity data alone cannot resolve V SH as well as Rayleigh wave phase velocities can resolve V SV , resulting in large uncertainties in ξ. Group velocity dispersion data are generally sensitive to shallower depths than phase velocities at the same periods  ( Fig. 15) and may therefore help break trade-offs between shallow and deep structure. In this section, we tested whether adding the group velocity data measured by Ma et al. (2014) could reduce trade-offs and help better constrain the deep anisotropy signal.
Since the group and phase velocity data came from two different studies, we included independent noise parameters for each data type in our inversions. Fig. S6 represents the posterior noise distributions. It shows that the Love wave data are generally noisier than the Rayleigh wave data and that the group velocity data also contain a larger amount of noise than the phase velocities. anisotropy model is also found to be slightly lower in the top 50 km when group velocities are included but the general trend does not change significantly. Importantly, we find that the posterior distributions do not change significantly below ∼ 100 km. This suggests that, although the group velocity data can help constrain the shallow structure better, it does not change the model resolution in the rest of the upper mantle and does not affect our interpretation of the results.

DISCUSSION
The surface wave tomography of Debayle & Lévêque (1997) revealed that the lowest shear wave velocity anomalies at 100 km depth were located along the ridges, with the CIR and SEIR showing stronger anomalies than the SWIR. Low velocities along the CIR are also seen in the fundamental Rayleigh wave phase and group velocity study of Mazzullo et al. (2017). This is in agreement with our mean model (Fig. 6) and is confirmed by our likelihood estimates. Interestingly, our results suggest that the low velocity anomalies seen under the CIR and SEIR persist down to ∼ 200 km as in Debayle & Lévêque (1997), but that the SWIR has a shallower origin. Debayle & Lévêque (1997) found that these anomalies largely disappear at around 200 km. This is qualitatively consistent with our results, which show a decrease in the amplitude of the anomalies. At 400 km, our results favor a positive velocity perturbation of at least 1% and likely less than 5% across the whole study area, which suggests a relatively cold mantle. This differs from several other studies (Ritsema et al. 2011;Kustowski et al. 2008; As in Visser et al. (2008a), we find a high likelihood (P ≥ 0.75) of V SH > V SV at 100 km depth throughout most of the study region. This is consistent with previously published radial anisotropy models of our study area (Lévèque et al. 1998) as well as in other regions and globally (Gung et al. 2003b;Ekström & Dziewonski 1998;Panning & Romanowicz 2006b;Kustowski et al. 2008;Panning et al. 2010;Auer et al. 2014;French & Romanowicz 2014;Chang et al. 2015) and is often interpreted as the signature of horizontal asthenospheric motion. At 200 km, our models present likely (P ≥ 0.75) V SV > V SH and V SH > V SV anomalies with 1-5% anisotropy scattered across the region, which differs somewhat from Visser et al. (2008a) who found a generally low likelihood of V SH > V SV at that depth. No clear anisotropy pattern or relation to geological features emerges from our results below 100 km depth except for a vertically fast signal at 200 km along the Indonesian subduction zone (Fig. 7), though it is likely to be of small amplitude (Fig. 11) though they do not necessarily share the same patterns. Assuming anisotropy at those depths is due to the strain-induced lattice preferred orientation of intrinsically anisotropic minerals, our results suggest that the higher mode data favor a generally vertical flow signal below 200 km down to the MTZ. It should, however, be noted that the relationship between fast seismic direction and mantle deformation is not straightforward at those depths. Unlike for the shallow mantle, the anisotropy fast axis may not be a good proxy for flow direction in this part of the mantle as deformation could occur with a different slip system than at shallower depths (Mainprice et al. 2005). As pointed by Yuan & Beghein (2013), recrystallization or a change in the slip system of the olivine structure during phase changes at 410 km depth could also affect the relation between observed seismic anisotropy and flow direction.

CONCLUSION
We developed a non-linear inversion method based on Markov Chain Monte Carlo within a transdismensional Bayesian framework to constrain Earth's interior structure with fundamental and higher mode surface wave dispersion measurements. Besides the ability to solve the non-linear problem, the advantages of this approach include the fact that the choice of the depth parametrization is iteratively driven by the data. The results are therefore not influenced by an arbitrary prior choice of the number of model parameters. Another advantage lies in the ability to invert the noise level jointly with the model parameters, which reduces the risk of mapping unknown data noise into the models. In addition, the use of a Monte Carlo approach enables us to sample all possible models and to represent the ensemble of solutions with a distribution of parameters instead of choosing one model among many possibilities. This allowed us to account for model uncertainties by calculating a likelihood for every parameter of interest.
We applied our technique on a subset of the global phase velocity maps obtained by Visser et al. (2008b) for Love and Rayleigh waves up to the fifth and sixth overtone, respectively.
We focused on the Indian ocean and determined how well these data can constrain radial anisotropy in the deep upper mantle and transition zone. We found that correcting the measurements for the effect of perturbations in the depth of the MTZ boundaries did not impact the resulting models significantly, despite the higher mode data having sensitivity to these depths. We also demonstrated that separate Love and Rayleigh wave inversions produced results equivalent within model uncertainties to those of joint inversions, as long as the depth parameterizations for V SV and V SH are allowed to differ in the joint inversions.
The velocity models resulting from the mean of the posterior model distribution were in general agreement with results from linearized inversions, though with stronger negative anomalies along ridges in the upper 100 km-200 km. Stronger deviations between the models were found at 400 km depth, but they appeared to be in better agreement in the MTZ.
This suggests that using a fully non-linear TB method to constrain shear-wave velocities with higher mode surface waves may not result in significant changes in the model compared to regularized inverse methods. Our results show, however, that using such as a technique and quatifying model uncertainties is essential when discussing models of radial anisotropy. Large discrepancies were indeed found between the two methods for ξ, even at 100 km depth where the data sensitivity was high, and model uncertainties were larger below ∼ 300 km. Despite these large posterior model uncertainties, we were able to calculate the likelihood of anisotropy at different depths and determine how well resolved deep anisotropy signal is. Overall, while the dominant signal at 100 km is V SH > V SV as expected for seismic anisotropy resulting from horizontal mantle motion, we found likely lateral variations in ξ at greater depths, with a tendency for the data to favor a small amount of anisotropy with V SV > V SH across most of the region in the deep upper mantle and MTZ. The synthetic tests we performed also demonstrated that the anisotropy signal found at these depths is resolved by the higher mode data and does not result from leakage. We also found that the inclusion of group velocity dispersion data may help reduce parameter trade-offs in the top 100 km of the mantle but does not improve the model resolution in the deep upper mantle or MTZ.
For our TB inversions, we chose a prior that would not significantly influence the results.
Thus, the posterior distribution is mostly a function of the likelihood of a model given the misfit of the predicted data and the observations. This results in models that can explain the data used, but may not necessarily be in agreement with other datasets. This is reflected in some of the very large deviations of ξ from 1 (isotropy), with the posterior distributions sometimes including anisotropy of 25% or more. Additional information could improve constraints on model parameters, guiding the posterior distributions to more realistic values. This could take the form of complementary types of data that reduce trade-offs from the current data set.
This could also take the form of a narrower prior, which could be based on mineral physics.
This will be implemented in future work.

ACKNOWLEDGMENTS
This project was funded by NSF EAR grant #1446978. The data used in this study were downloaded from http://www.geo.uu.nl/~jeannot/JTweb/downloads.html for the phase velocity maps, and from https://igppweb.ucsd.edu/~gabi/litho1.0.html#surfcodes for the group velocity data.