Lowermost Mantle Shear‐Velocity Structure From Hierarchical Trans‐Dimensional Bayesian Tomography

The core‐mantle boundary (CMB) is the most extreme boundary within the Earth where the liquid, iron‐rich outer core interacts with the rocky, silicate mantle. The nature of the lowermost mantle atop the CMB, and its role in mantle dynamics, is not completely understood. Various regional studies have documented significant heterogeneities at different spatial scales. While there is a consensus on the long scale length structure of the inferred S‐wave speed tomograms, there are also notable differences stemming from different imaging methods and datasets. Here we aim to overcome over‐smoothing and avoid over‐fitting data for the case where the spatial coverage is sparse and the inverse problem ill‐posed. We present an S‐wave tomography model at a global scale for the Lowermost Mantle (LM) using the Hierarchical Trans‐Dimensional Bayesian Inversion (HTDBI) framework, LM‐HTDBI. Our LM‐HTDBI analysis of ScS‐S travel times includes uncertainty, and the complexity of the model is deduced from the data itself through an implicit parameterization of the model space. Our comprehensive resolution estimates indicate that short‐scale anomalies are significant and resolvable features of the lowermost mantle regardless of the chosen mantle‐model reference to correct the travel times above the D’’ layer. The recovered morphology of the Large‐Low‐Shear‐wave Velocity Provinces (LLSVPs) is complex, featuring small high‐velocity patches among low‐velocity domains. Instead of two large, unified, and smooth LLSVPs, the newly obtained images suggest that their margins are not uniformly flat.

Importantly, while various regional studies have argued for the existence of short-scale heterogeneities (e.g., Frost et al., 2013;He & Wen, 2009;He & Wen, 2011;Hung et al., 2005;Li et al., 2021;Suzuki et al., 2020;Takeuchi, et al., 2008;Takeuchi, 2012) the global seismic studies of shear-waves robustly revealed a dominant character of long-scale features (e.g., Durand et al., 2017;Houser et al., 2008). A recent mantle model using shear waves, developed by French and Romanowicz (2014) utilized waveform inversion of selected windows of body waves (300-32 s) and surface waves (400-60 s) to provide a detailed image of the Earth's mantle. This model also found evidence for short-scale heterogeneity at the top of the CMB.
The remarkable diversity of the short and medium scale anomalies suggests a significant uncertainty amongst models, even at the scales that they allegedly resolve (e.g., Moulik & Ekström, 2014;Kustowski et al., 2008;Simmons et al., 2010). In our hypothesis, most discrepancies among various tomographic models stem from the adopted procedures. First, they are prone to subjective choices of model parameters and regularization, which, in turn, determines the level of model complexity. In model regularizations, smoothing and damping are usually applied to the entire model, and consequently, they mute the sharp edges that are likely present in Earth and transform them into pixelated gradual transitions. Second, building models based on explicit parameterizations such as regular cells which ignores the spatial variability in the data's resolving power usually introduces small-scale artifacts in regions with less ray coverage. Third, the difficulty in quantifying data uncertainties takes its toll since the level of data noise is usually determined by the user, prior to the inversion, and it regulates the level of data fit.
To take a progressive approach and improve the imaging capability of structural features in the D'' by using travel times only, we believe it is essential to possess the following two ingredients: (a) differential travel times of body waves, whose use suppresses the unwanted crustal and upper mantle effects near the source and receiver, thus capturing the contribution from the lowermost mantle, (b) an improved theoretical and methodological framework to overcome the aforementioned limitations in various seismic-tomography inversion approaches. For this purpose, we use the Hierarchical Trans-Dimensional Bayesian Inversion (HTDBI) framework to perform a tomographic inversion that circumvents subjective choices. In fact, the first global imaging application of the HTDBI method was to study the lowermost mantle Young et al., 2013), but the improvement here in comparison with the previous studies is that we now deploy truly spherical Voronoi parameterization. This type of parameterization has recently been applied in an attenuation tomography (Pejić et at., 2019) and the P-wave velocity tomography studies of the upper inner core (Burdick et al., 2019). The main limitation of our study (due to the computational expense consideration) is that we do not account for the finite frequency effects. In contrast, other global tomography studies have utilized more complete information by utilizing waveforms, a wide range of body waves and phases, and normal modes. In subsequent sections, we present a development of a new shear-wave velocity model of the D''. In Section 2, we explain the data that was used in the inversion. In Section 3, we discuss how the HTDBI method differs from the linearized travel-time tomography. Next, in Section 4, we present the resolution tests including an all-embracing test for an abstract, complex, D'' model and a test for a more realistic structure, which mimics the final pattern of anomalies in D''. In Section 5, we explain the inversion and data analysis that we carried out through HTDBI. In Section 6, we first describe our approach to correcting for mantle structure and then present the averaged shear-wave velocity model together with its standard deviation map. In Sections 7 and 8, we compare our model with other available models of D'' and discuss the advantages of using HTDBI in global imaging of D''.

Data
In this study, we utilize differential travel time of shear waves to image the heterogeneity pattern at the lowermost mantle. Differential travel times of body waves are particularly helpful: first, by reducing the unwanted crustal and upper mantle effects near the source and receiver, since the raypaths are close for the large epicentral distances in the crust and upper mantle; second, source mislocations are not projected into the model. We used ScS-S differential travel times from Houser et al. (2008) which encompasses events from 1976 -2007, and augmented it by our newly measured data in the time window 2008-2018, focusing on the source-receiver geometries that fill the gaps in spatial coverage of the Houser et al. (2008) data set. We considered only earthquakes with simple source-time functions and magnitudes 5.5 < M < 7.5. To reduce the effect of upper mantle structure on our ScS-S data set, we imposed a relatively strict criterion and only selected the event-station pairs at epicentral distances between 60 and 75 degrees (Figure 1a). In these epicentral distances, the raypaths of ScS and S waves are more proximate to each other and thus their differential travel time is a better proxy for the travel time affected by D'' structure where they differ the most. The vertical separation between the bottoming points of S and ScS varies from 1,450 km to 950 km for epicentral distances of 60 and 75 degrees. Our data-selection procedure resulted in 18,131 rays in total, including 16,344 rays out of the original 41,000 rays (Houser et al., 2008) indicated by the red dashed line in Figure 1b, and the newly collected 1,787 rays, to augment the geometric coverage and enhance the imaging capacity of D''. The resulting coverage of rays in CMB is shown in the top panel of Figure 2 and Figure S1. A simple calculation for our data set shows that over 70% of the CMB has a hit count of 5 or more if we divide the area of CMB to 4° by 4° cells. The distribution of the bounce points of ScS rays and their raypaths for both datasets at the CMB are shown in Figures 2b and 2c. We compare the newly picked data and data set from Houser et al. (2008) in Figure S2 to Figure S3.
To ensure the compatibility between the two datasets we followed the same measurement technique as Houser et al. (2008) and handpicked the differential travel times through waveform correlation of band-pass  (Houser et al., 2008). The red dashed line indicates the portion of data (16,144 rays) that was selected for this study. New raypaths added in this study Raypath from Houser et al., (2008) ScS-S di . travel time (s) -4 -2 0 2 4 filtered waveform data at 10-50 s ( Figure 3). Correspondingly, the size of the first Fresnel zone for ScS in the lowermost mantle is near 400 km in diameter, which can be interpreted as a rough approximation of the potential resolution of the data in the areas with intermediate or low coverage and a criterion for the size of resolved short-scale heterogeneities.
The advantage of handpicking is that it allows us to align the onsets of S and ScS phases and discard noisy waveforms. The measurements were made on transverse components to suppress any interference from converted phases.
Given the fact that S and ScS raypaths are not identical in the lower mantle (albeit they are similar), to further reduce the effects of mantle heterogeneity, S and ScS travel times were corrected for 3D mantle structure using 1D raypaths down to the top of D'' (300 km above CMB) using five global mantle models: TX2011 (Grand, 2002), S362ANI + M (Moulik & Ekström, 2014), GyPSuM-S (Simmons et al., 2010), SEMUCB-WM1 (French & Romanowicz, 2014) and SAW642ANb (Panning et al., 2010). S and ScS traced in the mantle using "tau-p" (Buland & Chapman, 1983) through PREM and the extracted mantle velocity structure (based on the selected five global models and their reference models) along each raypath give the time corrections for S and ScS. While this does not account for ray bending that might occur due to 3D structure, the difference between velocities sampled along 1D and 3D raypaths is minimal given the long wavelength of features imaged in tomographic models. Mantle correction reasonably suppresses the projection of mantle heterogeneities into the inferred structure of D''. At these epicentral distances and in the crust, the raypaths of S and ScS are nearly identical, which means that the correction for the crust is not necessary. The corrected ScS-S data from all 5 models are then inverted for the velocity anomalies in D''.

Method
We employed the HTDBI method (Bodin et al., 2012;Pejić et al., 2019;Young et al., 2013) as our preferred inversion method to image the lowermost mantle. The HTDBI method includes the overall level of data noise as a free parameter in the inversion process, together with the number of model parameters. Consequently, large, small, smooth, and sharp discontinuous features can be resolved based on the trade-off between data uncertainty and resolution. In the non-HTDBI tomographic inversions, arbitrary choices, such as a single parameterization type, damping, and smoothing, must be made a priori. These choices can be especially problematic in global seismic tomography, where an uneven distribution of sources and receivers causes uneven ray coverage. Also, the use of a fixed regular grid of some chosen size typically introduces artifacts in regions of low ray coverage, where the particular property being investigated is poorly constrained.
Here, we assume a 300-km-thick layer atop CMB to perform the inversion. The choice of 300 km is based on the average thickness of D'' (350 ± 50 km) presented by statistical analysis of Garcia et al. (2009) and empirical examination of different D'' thicknesses in Tkalčić et al. (2015). Given the trade-off between layer thickness and the amplitude of perturbations, further advances will include the HTDBI approaches where the thickness and the amplitude of the perturbation will be treated as variables.
The Trans-Dimensional inverse problem in tomography may be solved with the reversible jump algorithm (Bodin et al., 2012) in which the solution is represented by multi-scale irregular ensemble of Voronoi cells with variable number and size (see Figure S5 for example of a Voronoi model). For a review of the Trans-Dimensional Bayesian inference, see a review of Sambridge et al. (2013). In this approach, the Voronoi cells  dynamically change their position, values (here velocity) and number (via death and birth of the cells) during the inversion, within the context of multi-dimensional probabilistic sampling.
Bayes' theorem is applied to solve the inverse problem in which the solution is represented as a posterior probability distribution. The Bayes' theorem for a Trans-Dimensional model can be represented as: k, the number of Voronoi cells, typically chosen to be uniform between chosen limits.
The Likelihood term includes the forward modeling, where we make predictions of travel times for a given Earth model. Here, the forward model involves computation of a travel time prediction along each raypath in D'' using the tau-p algorithm through the PREM Earth model (Dziewonski & Anderson, 1981). The accrued time differences between the 1D and 2D ray-tracing are on the same order of magnitude as the uncertainty introduced by ray tracing through various mantle tomographic models (up to several seconds for most extreme velocity perturbations). Still, moving from 1D to 2D or complete 3D ray tracing is an ambitious computational goal within the HTDBI framework, which hopefully the next generation of models will achieve.
The downward and upward ray-legs of ScS in the D'' layer with 300 km thickness are projected along with the longitude and latitude coordinates on the surface of a sphere. The predicted travel times of ScS are attributed to the surface projection of ScS raypath on the D'' spherical shell. The prediction is then compared to the observation. We assume travel time errors are independently and normally distributed which leads to a Gaussian likelihood of the following form: is a measure of data misfit used by the algorithm. Minimizing the negative log-likelihood is equivalent to maximizing the posterior probability.
In the hierarchical approach, the data noise, i E  , is assumed a fixed parameter and a scaling hierarchical parameter, λ, is allowed to vary. Together, they define the noise parameter, , A Bayesian setting is adopted to solve the inverse problem in which the solution is represented as a posterior probability distribution over a multi-dimensional Earth model, E m (the first term in Equation 1). An ensemble of models generated with the Birth-Death Markov chain Monte Carlo, McMC, (Geyer & Møller, 1994), which is a special case of the generalized reversible jump algorithm (Green, 1995).
The McMC algorithm probabilistically samples Earth models, with variable numbers of unknowns, in proportion to their support as expressed by the combination of the likelihood and prior. The property known as natural parsimony ensures more complex, higher E k models, which, if optimized, would fit the data better, but are not necessarily favored over less complex, lower E k, models (Mackay, 2003). In the McMC algorithm, new models are proposed according to some chosen proposal distribution, and these get accepted or rejected probabilistically according to an acceptance ratio. Typically a Gaussian proposal PDF width for velocity perturbations is adjusted so that actual acceptance levels are between 30% and 50% (Tarantola, 2004). If acceptance levels are much lower than this then the algorithm spends a large proportion of time-solving the forward problem for models that are ultimately rejected. This is often the case when the proposal step size is too large. If acceptance levels much higher than this range are obtained, the algorithm is likely inefficiently walking through model space taking small steps and thereby sampling many similar models. The acceptance rate between 30% and 50% is where the random walk establishes a balance between large and small steps.
By definition, each new model (or step) along a Markov chain is dependent on the previous step. Since each chain is typically initiated at an arbitrary point in model space, which may be poorly supported by the data, (i.e., having low Likelihood) then early steps along the chain are discarded as "burn-in." The number of models discarded is usually decided to be once the Likelihood reaches some stabilized level, or when models provide a satisfactory fit to data given estimated noise levels. In addition, the chained is "thinned" in order to reduce the correlation of nearby samples along the chain. In this procedure, only every th E n model is taken into account. The collected output samples from all chains after burn-in and thinning is referred to as the "ensemble". Statistical properties of this ensemble, such as mean and standard deviation, can then be calculated to aid the interpretation of the results.
In this study, we use a recently developed extension in which the model space is defined by a network of spherical Voronoi cells forming an irregular mesh over the 2D sphere (Burdick et al., 2019;Pejić et al., 2019). Spherical Voronoi cells directly partition a spherical surface, which is critical for global tomographic studies. We view this as an improvement over previously employed Cartesian Voronoi cells parameterizations (e.g., Young et al., 2013;Tkalčić et al., 2015) defined through a Cartesian projection over the sphere. The implementation of spherical Voronoi cells removes distortion at the poles inherent with the Cartesian approach, and also naturally accommodates cyclicity over the globe.

Resolution Tests
We carried out two all-inclusive synthetic tests (Figure 4), two synthetic tests with simpler shapes and a synthetic test including a model with smooth variations ( Figure S4) to assess the resolving power of our ray coverage ( Figure 2) and the capabilities of the HTDBI method. We used the data predicted by the two models featured in Figure 4 and added Gaussian noise with the standard deviation of N E  = 0.5 s to the travel times. Velocity perturbations, relative to an average PREM velocity in the lowermost 300 km of the mantle, were allowed to range in the interval ±4%.
Our synthetic tests are improvements over the checkerboard pattern tests as they check the recovery of coexisting heterogeneities of different sizes and shapes. They examine retrieval of both simple and more complex heterogeneous structures with sharp boundaries as well as varying strengths in S-wave speed in D''. The first synthetic test (Figure 4a) examines the recovery of a pseudo, complex heterogeneity in D'' dominated by the spherical degree 2 anomalies (LLSVPs) as revealed in numerous long-period S-wave tomographic studies, but assumes other, smaller-scale features of varying amplitudes superimposed on the main signature. For example, it includes fast and slow velocities outside of the LLSVPs, beneath Central America, Alaska, Greenland, East Asia, South-East Asia and Australia.
The second synthetic test (Figures 4b and 4c) includes the maximum power amplitudes and explores the recovery of several multi-scale features of different shapes (circle, rhombus, square and a small circle) that are superimposed on top of a predominantly hemispherical structure with sharp boundaries Young et al., 2013). The size of the overlaying anomalies onto the hemispherical structures varies from 5,000 km (the diameter of the big circles) to 300 km (the narrow parts of the squares). We show the mean and standard deviation of the posterior ensemble in the second and third row of Figure 4, respectively. The recovery of both long-and short-scale features in the synthetic tests highlights the benefits of the HTDBI approach. As expected, the best model-recovery is achieved in regions with a dense ray coverage, in particular North Africa, North Atlantic, Asia, and Australia.
In our synthetic tests, we used 6 Markov chains with 3 million iterations in each chain. The first 200,000 iterations were discarded as burn-in period and every 250th model was selected during the thinning procedure. We let the number of cells vary from 50 to 2000. Since the number of the model parameters can change, the required number of cells to fit the data can be expressed as a posterior probability distribution function. The histograms for the number of cells for both synthetic tests are shown in Figure 5. In these synthetic examples, the true noise is known and there is no need to use hierarchical sampling to estimate the character and level of the noise. However, adding the noise as a free parameter gives us confidence that we are able to recover noisy data in the inversion (Bodin et al., 2012). Therefore, as in a hierarchical approach, we allowed the scaling factor, λ, to be estimated through the inversion in all of our synthetic tests. As we added Gaussian random noise to the synthetic travel times and introduced the same value for E  with a standard deviation of 0.5, we expect λ to arrive at a value around 1 (Equation 3). The histogram of λ (Figures 5b and 5d) confirms that the data noise is properly resolved.
From synthetic test #1, it is evident that the method is able to retrieve the pattern and strength of the model parameters with a good resolution in areas with reasonable ray coverage. The ensemble mean of the retrieved models from inversion of the synthetic travel times illustrates that the combination of our ray coverage and this inversion scheme is successful in recovering both long-and short-scale features. According to the synthetic tests, small and large structures are well resolved in most parts of the northern hemisphere and Australia. We should expect smearing and distortions to happen in the east Pacific and parts of the southern hemisphere which is also evident in the standard deviation maps (Figures 4g-4i), and this is in agreement with the ray coverage ( Figure 2a). The higher standard deviation occurs in the poorly sampled regions and at the sharp boundaries of positive and negative anomalies, which has been shown to be associated with uncertainty in boundary locations . The northern hemisphere is characterized by a higher ray coverage which results in a lower standard deviation and a good capacity to recover the details of S-wave speed variations. In contrast, the poorly sampled areas or the sharp velocity transitions go through more versatility in parameterization (mobile geometry of the Voronoi cells) and hence have a higher standard deviation.

Inversion
We used our high-quality, hand-picked measurements collected through waveform correlation, corrected the differential travel times for the effect of mantle structure and simulated the shear wave structure in the D''. In the HTDBI approach often a number of chains are utilized to thoroughly sample the model space. Here, at each chain, the initial 200,000 burn-in iterations were discarded and every 250th model was selected for further processing (similar to the synthetic tests). Note that these chains start at a different random point and sample the model space independently. We allowed between 50 and 2000 cells with a uniform prior PDF over the globe, and set the shear wave velocity perturbation with a uniform prior in the interval <−7.5%, 5%> relative to an average PREM shear-wave speed in D''. A uniform prior PDF for the model parameters means that the possible outcome could be with equal probability within the prior bound but not outside. The hierarchical parameter, λ, was set over the range <1.0e−9, 1.0e1> and is sampled from the Jeffreys prior distribution. Bounds of the priors are set to include all possible values that a model parameter can take.
The determining factors for the resulting number of Voronoi cells and their spatial distribution are the ray distribution and the noise amplitude. Larger Voronoi cells form in the regions of scattered sampling while much smaller ones form in the regions of denser sampling (Bodin & Sambridge, 2009). We performed the inversion hierarchically, meaning that the noise is treated as a free parameter in the inversion (i.e., the inversion attempts to explore the trade-off between the model complexity (the number of Voronoi cells) and the observational noise (hierarchical error)).
Our LM-HTDBI approach generates a total of 54 million models. Given our burn-in period and the thinning stage, the posterior inference ends up with an ensemble of 180,000 models. Each model of the ensemble is a crude and implausible representation of the D'', however, the mean of many such crude models represents a more interpretable field of shearwave velocity variation in D''.

Results
The mean and standard deviation of the posterior ensemble solution can be expressed visually by plotting them on a pixeled 2-D velocity map.
To suppress the effect of the mantle on our differential travel times, we have tested a number of mantle models and analyzed the corresponding corrections for 3-D mantle structure. The structure of the mantle has been imaged by different research groups and some differences can be observed among them, which makes the decision of choosing a mantle model non-trivial. Here we have taken a more general approach, explained below.

Mantle Correction
Given the variability among global mantle models and to avoid possible bias towards a specific one, we corrected our differential travel times using five different mantle models including TX2011 (Grand, 2002) Mantle corrected using S362ANI+M (Moulik & Ekström, 2014) and SEMUCB-WM1 (French & Romanowicz, 2014). We ran nine Markov chains for each corrected data set, resulting in 45 chains with 1.2 million model-iterations performed under each chain. The resulting D'' models inferred from data corrected for these five models are presented in Figure 6. The patchy distribution of high-and low-velocity structures is independent of our choice of mantle model and shows more or less a consistent image of D'' (Figures 6a-6e). The short-scale heterogeneities exist across all five models obtained from corrected data sets, which suggests that these patchy anomalies are a feature of the Earth's structure at D'' rather than a bias towards a specific mantle model.
We show the chain history of negative log-likelihood as a function of iterations for 45 Markov chains in Figure 7. The chain history of negative log-likelihood demonstrates that all Markov chains stabilize around a similar value (left panel in Figure 7). A significant overlap can be observed in the histograms of different chains, which demonstrates that all chains converged to a similar set of models (right panel in Figure 7).
Posterior probability plots of the data noise and the number of cells show the estimated noise in the data set and the number of Voronoi cells to which the inversion converges (Figures 8a and 8b). The frequency histogram of the number of cells shows a Gaussian distribution with a maximum centered on 600 cells. Figure 8c shows three distinctive groups representing the Trans-Dimensional ensembles recovered from the five different data corrections used. They are not due to different chains converging to different parts of the model space. Hence in this figure, we see the effect of the change in data on the ensemble. The key things to note are that the effect of the change in model correction on the distribution of cell numbers is not significant since all are centered around 600-650 cells, and most have similar spreads. The hierarchical noise parameter, λ, sees the most change, which again is understandable since each correction results in different levels of scatter in the observations and hence differing levels of data consistency (or noise). A sampling of the λ parameter responds precisely to this effect. It will be higher when the (uncorrelated) residual scatter is larger and smaller otherwise. Also, the slope of each cluster in terms of the λ versus the number of cells is as expected, i.e., when λ becomes larger (data noise is assumed larger), then fewer numbers of cells tend to be introduced, and vice versa. The Natural Parsimony principle can be seen in action on the distribution of cells as fewer models with a higher number of cells are sampled. The natural parsimony is also evident in Figure 8c, since there is a negative correlation between data noise level and numbers of cells introduced by the algorithm. This is consistent with the fact that more cells, and hence structure, are required when there is more apparent signal in the data (noise levels are lower). Given that the majority of the models recovered in the inversion have around 600 cells, this signifies a well-sampled parameter space. Also, given that the inversion was performed with data noise as a free parameter, we show the frequency histogram for the scaling factor of the hierarchical parameter, λ. Since we assumed a 5% E   , and given the peak value of 0.27 in the distribution of λ, the estimated noise in HTDBI is about 1.35% ( N E  , in Equation 3). Given this estimation for data noise, and assuming a typical travel time of 170 s for ScS within D'', we arrive at a representative traveltime uncertainty of about 2.3 s for our data set.

Shear-Wave Velocity in D''
We combine the models with different mantle corrections (Figures 6a-6e) to obtain the mean and standard deviation over all models with the assumption that they all have equal weight. Therefore, our S-wave velocity variations in the D'' is the average of five models. This average velocity model and its associated standard deviation map are shown in Figure 9. We compare the travel-time residual of the mean model (Figure 9a) with an individual Voronoi-cell model ( Figure S5) and the initial travel-time residuals in Figure S6. The histogram shows that the mean of the posterior ensembles can reduce travel time residuals compared to a single Voronoi-cell model shown in ( Figure S5) and the average velocity in PREM.
Following Simons et al. (2006), we filtered the tomographic image of this study along with other tomographic models: SGLOBE-rani (Chang et al., 2015), S362ANI + M (Moulik & Ekström, 2014), SP12RTS (Koelemeijer et al., 2016), GyPSuM (Simmons et al., 2010), S40RTS (Ritsema et al., 2011), SEMUCB-WM1 (French & Romanowicz, 2014) and HMSL (Houser et al., 2008) to the low spherical harmonic degree, lmax = 6 ( Figure 10). The long-scale patterns in LM-HTDBI are in general agreement about low velocities beneath Africa and the Pacific and fast in most parts of Asia and Antarctica. However, there is a striking difference across the existing global models, even for long-scale features in the shape and intensity of the LLSVPs (Figure 10). The critical characteristic of our tomographic model is the appearance of short-scale features at the base of the mantle exterior to the LLSVPs, which our resolution analysis confirms are not artifacts of the inversion. In Figure S7, we examine the heterogeneity-spectrum or the power-spectrum of the models. Spectral analyses reveal the long-scale content of our model is similar to the previous models as shown in the filtered plots; however, its power of heterogeneity in higher degrees is larger than previous estimates (Figures 10 and S7).
The main advantage of our approach over previous approaches is that as no explicit subjective regularization or parameterization are applied, the size of velocity heterogeneities is directly controlled by the data itself and the short-scale features are preserved. In terms of a comparison with other lowermost mantle models, we confirm that our model correlates well with SEMUCB-WM1 in terms of short-scale heterogeneity content, but it has more intense variation in velocity in comparison with the latter. The areas with the lowest and the highest shear-wave speed are scattered throughout the lowermost mantle. Four specific examples of regions with high-velocity are the mid-Indian ocean, west Australia, North Atlantic and North Chile. From Figure 9b, it is obvious that the areas with higher standard deviation correspond to either the poorly sampled parts of D'' or to the areas that have gone through more velocity variations during the inversion (e.g., closer to the sharp contrasts in elastic properties).

Discussions
The chief goal of this study was to invert for the shear-wave velocity heterogeneities atop the CMB using ScS and S travel times. Our study has a couple of limitations. The first limitation is that we assume a fixed thickness of 300 km for the D'' layer. In addition, our forward method does not include the finite frequency effects. However, ray-theory based tomography at the frequencies we use is a reasonable approximation to interpret finite frequency travel times. For example, Hung et al. (2005) and Castle and van der Hilst (2000) used ScS-S data at the periods of 20 s to perform inversions with and without the sensitivity kernels. They showed that the tomography based on the sensitivity kernels yields similar results using infinite frequency raypaths. Hung et al. (2005) showed that the recovered velocity perturbations using finite frequency kernels (in the frequency band similar to our data) are approximately 1-2 times larger than those from the ray-based kernels. In our study, given the average number of 600 Voronoi cells for the area covering the CMB (Figure 8a), the average diameter of a simplified Voronoi cell is ∼500 km, which is slightly larger than the diameter of the Fresnel zone (400 km). The structures we interpret are also larger than the size of the Fresnel zone. Hence, not accounting for finite frequency effects might not be a too severe limitation.
Another limitation of our approach is that the inversion scheme we deploy does not assess an independent impact of anisotropy on S and ScS data, and in turn, on our tomograms. Within the Bayesian framework, not accounting for anisotropy is considered a theory error. Accounting for both data noise and theory errors is at the leading edge of geophysical inference studies. Accurate mapping seismic anisotropy in the lowermost mantle using tomographic methods is challenging and still an open question (Romanowicz & Wenk, 2017). Many studies used differential measurements of pair phases such as ScS and S to calculate anisotropy in D''  (e.g., Lay, 2015;Nowacki et al., 2010;Wookey et al., 2005). Shear wave splitting measurements of D'' show that horizontally polarized S-wave velocities are larger than vertically polarized ones by ∼1% (Panning & Romanowicz, 2004). The residual splitting in ScS, which is attributed to the anisotropy of D'' shows lag times between 1.0 and 3.9 s for epicentral distances 60° to 85° beneath the north Pacific (Wookey et al., 2005).
Our tomographic approach in a combination with a high-quality data set enables multi-scale imaging of the lowermost mantle. Our model (LM-HTDBI) reveals multi-scale structures ranging in wavelengths from hundreds to thousands of kilometers. The deduced shear-wave tomogram of D'' also reveals that the amplitudes of the short-scale low-velocity anomalies outside the LLSVPs are comparable to those observed inside them. Also, the morphology of the LLSVPs imaged by our method features small high-velocity patches creating separate regions of low-velocity anomalies, rather than only two large, unified and smooth LLSVPs. In addition, the appearance of the short-scale structures distorts the general shape of the LLSVPs as their margins do not look uniformly flat.
The results of this study are in general agreement with the D'' P-wave tomography of Tkalčić et al. (2015) using the HTDBI method in that both models contain short-scale and medium-scale structures superimposed on top of the long-scale structures. Furthermore, Muir and Tkalčić (2020) have recently obtained a similar result using Hamiltonian Monte Carlo sampling and low-degree spherical harmonics expansion with implicit regularization by basis truncation. They showed that short-scale perturbations are required to account for the observed travel-time residuals.
The complex pattern of short-scale heterogeneities that we obtained suggests that LLSVPs are not necessarily the only contributing factors to mantle instabilities. Given the existence of short-scale heterogeneities, we believe that it is feasible that they play a significant role in mantle dynamics. Several regional studies found evidence for locally sharp variation (up to or more than ±4%) in regions like eastern Eurasia (He & Wen, 2011), western Pacific (e.g., He et al., 2006Suzuki et al., 2020;Takeuchi, 2012) and the Caribbean (Hung et al., 2005). The recent study of Suzuki et al. (2020) illustrated several strong short-scale, low shearwave velocities in the western Pacific and northern Australia. For instance, in ULVZs, shear-wave velocity reduction can reach 30%-45% (e.g., Jenkins et al., 2021;Yu and Garnero, 2018) and many of them show complex internal radial structures (Pachhai et al., 2014(Pachhai et al., , 2015. We cannot expect to reveal such a level of details in our tomograms as the thickness of ULVZs varies most likely between 5 and 10 km (Garnero et al., 2016) or on the order of 20-30 km in the case of the Hawaiian mega-ULVZ Jenkins et al., 2021). Our model presents an averaged image of D'' with approximately 300 km thickness. Hence, some theory limitations inevitably contribute to the noise and are captured by our estimates of observational uncertainty through the hierarchical noise estimation. A recent study by Kim et al. (2020) found evidence of pervasive scattering (Sdiff postcursors present on ∼50% of paths) across the Pacific without a clear relationship between the postcursors' amplitudes and delay-times, suggesting either distributed heterogeneities or interaction with boundaries of a larger structure. We speculate that the significant amplitude Vs variation obtained in this study is probably significant enough to produce the postcursors detected by Kim et al. (2020).
Our study reveals short-scale heterogeneities on top of the long-scale structures while most global models imaged predominantly long-scale features in D''. The method adopted here alleviates some of the shortcomings of using classical inversion methods including block parameterization, smoothing and damping regularization. Importantly, inverting independently for ScS-S and correcting for the mantle heterogeneities enables us to isolate the lowermost mantle from the heterogeneities in the rest of the mantle. To further investigate these differences, we compared seven different global maps of shear wave variations at D'' including SGLOBE-rani (Chang et al., 2015), S362ANI + M (Moulik & Ekström, 2014), SP12RTS (Koelemeijer et al., 2016), GyPSuM (Simmons et al., 2010), S40RTS (Ritsema et al., 2011), SEMUCB-WM1 (French & Romanowicz, 2014) and HMSL (Houser et al., 2008) with our model (Figure 11). More broadly, these models were derived either from different datasets or by using different methods of inversion.
Figures 10 and 11 reveal that there are similarities among different models for long-scale features: they agree on the existence and extent of large low-velocity patterns, the well-known LLSVPs, under the Pacific and Africa. However, the margins, shapes, strength of heterogeneity and gradients across the D'' are significantly different. Although the data coverage is usually better in the northern hemisphere, most of the models in D'' do not capture small-scale heterogeneities, which we speculate is most likely due to applying uniform damping and smoothing. For example, an interesting observation from our comparison with other global Figure 10. Global maps of shear wave velocity variations at the core-mantle boundary for the eight models including SGLOBE-rani (Chang et al., 2015), S362ANI + M (Moulik & Ekström, 2014), SP12RTS (Koelemeijer et al., 2016), GyPSuM (Simmons et al., 2010), S40RTS (Ritsema et al., 2011), SEMUCB-WM1 (French & Romanowicz, 2014), HMSL (Houser et al., 2008) and this study (LM-HTDBI) filtered to the spherical-harmonic degree lmax = 6. Each model extreme value is annotated on the top left side of their map. All scale bars are in %δVs and set to the ±3% to show the amplitude diversity between models.  dVs/Vs (%) models ( Figure 11) is a clear difference in the recovery of short-scale anomalies between the HMSL model and this study. We employed a subset of data used in the HMSL model, which only includes data from larger epicentral distances. Although the outline of the LLSVPs is similar, the presence and lack of short-scale features represent a striking difference between these two models. This is attributable to the inversion strategy deployed here where no fixed model parameterization, damping and smoothing are required, and noise is treated as a free parameter, which is achieved at a significantly higher computational cost.
Another clear difference among the models selected for comparison is the amplitudes of Vs variations. This is not surprising as different modelers use different data and methods, and they achieve different resolutions in different regions. We compared the amplitudes of LM-HTDBI with the previous models in Figure S8, as in Burdick and Lekić (2017). The tilt of the ellipses in Figure S8 shows that velocity variations in our model have higher amplitude relative to the previous studies. Among all models presented in Figure 11, SEMUCB-WM1 is generally consistent with our model in places where they both exhibit larger amplitudes of S-wave velocity heterogeneity, as well as in the features not seen in earlier generations of global D'' Figure 11. Comparison of the global shear-wave velocity models for the lowermost mantle. All scale bars are in %δVs and set to ±5% to show the amplitude diversity between models. As in Figure 10, the numbers in the corner of each panel show maximum variation for each filtered model and scale bars are set to ±5%. -4 -2 0 2 4 dVs/Vs (%) models. However, these two models certainly differ in detail, especially in terms of the location and distribution of short-scale heterogeneities. The existence of some short-scale features in SEMUCB-WM1 is likely due to the usage of short-period waveform data.

HMSL
The only low-velocity anomaly outside of the LLSVPs that emerges in the majority of global models is the "Perm anomaly" (e.g., Lekić et al., 2012), north of the Caspian Sea. The region around the Perm anomaly is imaged as two smaller low-velocity anomalies in our model and this region is well constrained in our synthetic test (Figure 4d).
A possible origin and composition of the LLSVPs presented in global tomographic images of D'' is still enigmatic (e.g., Garnero et al., 2016;Doucet et al., 2020), yet our tomographic model confirms a significantly more complex image of D''. The base of the mantle outside of the LLSVPs has long been considered as the site of accumulation of cold subducted slabs (Grand, 2002;Jones et al., 2020;Li, 2020;Suzuki et al., 2020). This is a possible explanation for the existence of high shear-wave velocity regions. A recent study by Li (2020) showed that the scattered low-velocity anomalies outside of the LLSVPs could have a hot thermal origin that are detached from the thermochemical piles. Along the same line, the existence of fast seismic velocity anomalies within LLSVPs might suggest principally thermal-heterogeneity characterized by the clusters of plumes (Davaille & Romanowicz, 2020;Davies et al., 2015;Muir & Tkalčić, 2020;Tkalčić et al., 2015).

Conclusions
This study presents a new model of shear-velocity variations in the D'' using high quality, handpicked data set of S and ScS differential travel times through waveform correlation. Inverting for the compilation of 18,131 ScS-S differential travel times in conjunction with careful consideration of global mantle models enabled us to "isolate" the lowermost mantle from the heterogeneities present in the rest of the mantle. In this paper, we assumed that the D'' could be modeled as a single layer. We utilized recent improvements in the inversion technique through the use of the HTDBI framework which is based on implicit parametrization (spherical Voronoi cells) and treats the model complexity and the data noise as free parameters, while a uniform application of damping and smoothing to the model space can be avoided. We found that our model is in agreement with most of the recent global studies in terms of imaging the long-scale features at the base of the mantle. However, short-scale heterogeneity, as seen in our D'' tomogram, is far more omnipresent than in spherical-harmonic, degree-2-dominated images for shear-wave speed of the lowermost mantle obtained in most global models. This is the most striking difference between our model and the previous model obtained using the same subset of data (Houser et al., 2008). We demonstrate, however, that our method is able to recover much simpler, degree-2 images of the lowermost mantle if the lowermost mantle is indeed void of short-and medium-scale heterogeneity and only characterized by long-scale features. We are thus confident that the tomogram of the lowermost mantle we obtain is a realistic representation of the seismic structures. Significantly, the recovered LLSVP morphology of is complex, consisting of smaller high-velocity patches among low-velocity domains. The newly obtained image suggests that the LLSVPs' margins are not uniformly flat. We further argue that the methodology we adopt is a significant step forward in imaging the complicated structure of the D'' layer on a global scale, and provides an important bridge between long-scale features at a global scale and short-scale features of regional models. Future work may explore combining additional datasets of travel times to increase spatial coverage and resolution of the lowermost mantle, adopting similar principles in full-waveform modeling and novel methodological approaches to inversion.