Lowermost mantle shear-velocity structure from hierarchical trans-

7 The core-mantle boundary (CMB) is the most extreme boundary within the Earth where the liquid, 8 iron-rich outer core interacts with the rocky, silicate mantle. The nature of the lowermost mantle atop 9 the CMB, and its role in mantle dynamics, is not completely understood. Various regional studies 10 have documented significant heterogeneities at different spatial scales. While there is a consensus on 11 the long scale-length structure of the inferred S-wave speed tomograms, there are also notable 12 differences stemming from different imaging methods and datasets. Here we aim to overcome over13 smoothing and avoid over-fitting data for the case where the spatial coverage is sparse and the 14 inverse problem ill-posed. Here we present an S-wave tomography model at global scale for the 15 Lowermost Mantle (LM) using the Hierarchical Trans-dimensional Bayesian Inversion (HTDBI) 16 framework, LM-HTDBI. Our HTDBI analysis of ScS-S travel times includes uncertainty, and the 17 complexity of the model is deduced from the data itself through an implicit parameterization of the 18 model space. Our comprehensive resolution estimates indicate that short-scale anomalies are 19 significant and resolvable features of the lowermost mantle regardless of the chosen mantle-model 20 reference to correct the travel times above the D’’ layer. The recovered morphology of the Large21 Low-Shear-wave Velocity Provinces (LLSVPs) is complex, featuring small high-velocity patches 22 among low-velocity domains. Instead of two large, unified, and smooth LLSVPs, the newly obtained 23 images suggest that their margins are not uniformly flat. 24


Introduction 45
The core-mantle boundary (CMB) lies approximately halfway down to the Earth's center and is the 46 boundary with the most extreme contrasts in seismic velocity, viscosity, and density, where the heat 47 original 41,000 rays (Houser et al., 2008) indicated by the red dashed line in Fig. 1b, and the newly-140 collected 1,787 rays, to augment the geometric coverage and enhance the imaging capacity of D''. 141 The resulting coverage of rays in CMB is shown in the top panel of Fig. 2

and the Supporting 142
Information (Fig. S1). A simple calculation for our dataset shows that over 70% of the CMB has a hit 143 count of 5 or more if we divide the area of CMB to 4 ° by 4 ° cells. The distribution of the bounce 144 points of ScS rays and their raypaths for both datasets at the CMB are shown in Fig. 2b & 2c. We 145 compare the newly picked data and dataset from Houser et al., (2008) in the Supporting Information 146 ( Fig. S2 to Fig. S3). 147 To ensure the compatibility between the two datasets we followed the same measurement technique 148 as Houser et al., (2008) and handpicked the differential travel times through waveform correlation of 149 band-pass filtered waveform data at 10-50 s (Fig. 3). Correspondingly, the size of the first Fresnel 150 zone for ScS in the lowermost mantle is near 400 km in diameter, which can be interpreted as a 151 rough approximation of the potential resolution of the data in the areas with intermediate or low 152 coverage and a criterion for the size of resolved short-scale heterogeneities. 153 The advantage of handpicking is that it allows us to align the onsets of S and ScS phases and discard 154 noisy waveforms. The measurements were made on transverse components to suppress any 155 interference from converted phases. 156 Given the fact that S and ScS raypaths are not identical in the lower mantle (albeit they are similar), 157 to further reduce the effects of mantle heterogeneity, S and ScS travel times were corrected for 3D 158 mantle structure using 1D ray paths down to the top of D'' (300 km above CMB) using five global 159 mantle models: TX2011 (Grand, 2002), S362ANI+M (Moulik and Ekström, 2014), GyPSuM-S 160 (Simmons et al, 2010), SEMUCB-WM1 (French and Romanowicz, 2014) and SAW642ANb 161 (Panning et al., 2010). S and ScS traced in the mantle using "tau-p" (Buland and Chapman, 1983) 162 through PREM and the extracted mantle velocity structure (based on the selected five global models 163 and their reference models) along each ray path give the time corrections for S and ScS. While this 164 does not account for ray bending that might occur due to 3D structure, the difference between 165 velocities sampled along 1D and 3D ray paths is minimal given the long wavelength of features 166 imaged in tomographic models. Mantle correction reasonably suppresses the projection of mantle 167 heterogeneities into the inferred structure of D''. At these epicentral distances and in the crust, the 168 raypaths of S and ScS are nearly identical, which means that the correction for the crust is not 169 necessary. The corrected ScS-S data from all 5 models are then inverted for the velocity anomalies in 170 D''. 171

Method 173
We employed the HTDBI method (Bodin et  preferred inversion method to image the lowermost mantle. The HTDBI method includes the overall 175 level of data noise as a free parameter in the inversion process, together with the number of model 176 parameters. Consequently, large, small, smooth, and sharp discontinuous features can be resolved 177 based on the trade-off between data uncertainty and resolution. In the non-HTDBI tomographic 178 inversions, arbitrary choices, such as a single parameterization type, damping, and smoothing, must 179 be made a priori. These choices can be especially problematic in global seismic tomography, where 180 an uneven distribution of sources and receivers causes uneven ray coverage. Also, the use of a fixed 181 regular grid of some chosen size typically introduces artifacts in regions of low ray coverage, where 182 the particular property being investigated is poorly constrained. 183 Here, we assume a 300-km-thick layer atop CMB to perform the inversion. The choice of 300 km is 184 based on the average thickness of D'' (350±50 km) presented by statistical analysis of Garcia et al. 185 (2009) and empirical examination of different D'' thicknesses in Tkalčić et al. (2015). Given the 186 trade-off between layer thickness and the amplitude of perturbations, further advances will include 187 the HTDBI approaches where the thickness and the amplitude of the perturbation will be treated as 188

variables. 189
The Trans-dimensional inverse problem in tomography may be solved with the reversible jump The Likelihood term includes the forward modeling, where we make predictions of travel times for a 204 given Earth model. Here, the forward model involves computation of a travel time prediction along 205 each ray path in D'' using the tau-p algorithm through the PREM Earth model (Dziewonski and 206 Anderson, 1981). The accrued time differences between the 1D and 2D ray-tracing are on the same 207 order of magnitude as the uncertainty introduced by ray tracing through various mantle tomographic 208 models (up to several seconds for most extreme velocity perturbations). Still, moving from 1D to 2D 209 or complete 3D ray tracing is an ambitious computational goal within the HTDBI framework, which 210 hopefully the next generation of models will achieve. 211 The downward and upward ray-legs of ScS in D'' layer with 300 km thickness are projected along 212 with the longitude and latitude coordinates on the surface of a sphere. The predicted travel times of 213 ScS are attributed to the surface projection of ScS ray path on the D'' spherical shell. The prediction 214 is then compared to the observation. We assume travel time errors are independently and normally 215 distributed which leads to a Gaussian likelihood of the following form: 216 where ( ) is data predicted from model , is observation, is the total number of observations 218 and is the error of the observation. The negative log-likelihood, − ( ), is a measure of data 219 misfit used by the algorithm. Minimizing the negative log-likelihood is equivalent to maximizing the 220 posterior probability. 221 In the hierarchical approach, the data noise, , is assumed a fixed parameter and a scaling 222 hierarchical parameter, λ, is allowed to vary. Together, they define the noise parameter, , : 223 , = . (

3) 224
A Bayesian setting is adopted to solve the inverse problem in which the solution is represented as a 225 posterior probability distribution over a multi-dimensional Earth model, (the first term in eq. 1). 226 An ensemble of models generated with the Birth-Death Markov chain Monte Carlo, McMC, (Geyer 227 and Moyer, 1994), which is a special case of the generalized reversible jump algorithm (Green, 228 1995). 229 The McMC algorithm probabilistically samples Earth models, with variable numbers of unknowns, 230 in proportion to their support as expressed by the combination of the likelihood and prior. The 231 property known as natural parsimony ensures more complex, higher models, which, if optimized, 232 would fit the data better, but are not necessarily favored over less complex, lower , models 233 (Mackay, 2003). In the McMC algorithm, new models are proposed according to some chosen 234 proposal distribution, and these get accepted or rejected probabilistically according to an acceptance 235 ratio. Typically a Gaussian proposal PDF width for velocity perturbations is adjusted so that actual 236 acceptance levels are between 30 -50% (Tarantola, 2005). If acceptance levels are much lower than 237 this then the algorithm spends a large proportion of time-solving the forward problem for models that 238 are ultimately rejected. This is often the case when the proposal step size is too large. If acceptance 239 levels much higher than this range are obtained, the algorithm is likely inefficiently walking through 240 model space taking small steps and thereby sampling many similar models. The acceptance rate 241 between 30 -50% is where the random walk establishes a balance between large and small steps. 242 By definition, each new model (or step) along a Markov chain is dependent on the previous step. 243 Since each chain is typically initiated at an arbitrary point in model space, which may be poorly 244 supported by the data, (i.e. having low Likelihood) then early steps along the chain are discarded as 245 'burn-in'. The number of models discarded is usually decided to be once the Likelihood reaches 246 some stabilized level, or when models provide a satisfactory fit to data given estimated noise levels. 247 In addition, the chained is 'thinned' in order to reduce the correlation of nearby samples along the 248 chain. In this procedure, only every ℎ model is taken into account. The collected output samples 249 from all chains after burn-in and thinning is referred to as the 'ensemble'. Statistical properties of 250 this ensemble, such as mean and standard deviation, can then be calculated to aid the interpretation 251 of the results. 252 In this study, we use a recently developed extension in which the model space is defined by a 253

Resolution tests 261
We carried out two all-inclusive synthetic tests (Fig. 4), two synthetic tests with simpler shapes and a 262 synthetic test including a model with smooth variations (Supporting Information, Fig. S4) to assess 263 the resolving power of our ray coverage (Fig. 2) and the capabilities of the HTDBI method. We used 264 the data predicted by the two models featured in Fig. 4 and added Gaussian noise with the standard 265 deviation of =0.5 s to the travel times. Velocity perturbations, relative to an average PREM 266 velocity in the lowermost 300 km of the mantle, were allowed to range in the interval ±4%. free parameter gives us confidence that we are able to recover noisy data in the inversion (Bodin et 293 al., 2012). Therefore, as in a hierarchical approach, we allowed the scaling factor, λ, to be estimated 294 through the inversion in all of our synthetic tests. As we added Gaussian random noise to the 295 synthetic travel times and introduced the same value for with a standard deviation of 0.5, we 296 expect λ to arrive at a value around 1 (equation 3). The histogram of λ ( Fig. 5b & 5d) confirms that 297 the data noise is properly resolved. 298 From synthetic test #1, it is evident that the method is able to retrieve the pattern and strength of the 299 model parameters with a good resolution in areas with reasonable ray coverage. The ensemble mean 300 of the retrieved models from inversion of the synthetic travel times illustrates that the combination of 301 our ray coverage and this inversion scheme is successful in recovering both long-and short-scale 302 features. According to the synthetic tests, small and large structures are well resolved at most parts of 303 the northern hemisphere and Australia. We should expect smearing and distortions to happen in the 304 east Pacific and parts of the southern hemisphere which is also evident in the standard deviation 305 maps (Fig. 4g, 4h & 4i), and this is in agreement with the ray coverage (Fig. 2). The higher standard 306 deviation occurs in the poorly sampled regions and at the sharp boundaries of positive and negative 307 anomalies, which has been shown to be associated with uncertainty in boundary locations (Hawkins 308 et al., 2019). The northern hemisphere is characterized by a higher ray coverage which results in a 309 lower standard deviation and a good capacity to recover the details of S-wave speed variations. In 310 contrast, the poorly sampled areas or the sharp velocity transitions go through more versatility in 311 parameterization (mobile geometry of the Voronoi cells) and hence have a higher standard deviation. 312

Inversion 313
We used our high-quality, hand-picked measurements collected through waveform correlation, 314 corrected the differential travel times for the effect of mantle structure and simulated the shear wave 315 structure in the D''. In the HTDBI approach often a number of chains are utilized to thoroughly 316 sample the model space. Here, at each chain, the initial 200,000 burn-in iterations were discarded 317 and every 250 th model was selected for further processing (similar to the synthetic tests). Note that 318 these chains start at a different random point and sample the model space independently. We allowed 319 between 50 and 2000 cells with a uniform prior PDF over the globe, and set the shear wave velocity 320 perturbation with a uniform prior in the interval <-7.5, 5%> relative to an average PREM shear-wave 321 speed in D''. A uniform prior PDF for the model parameters means that the possible outcome could 322 be with equal probability within the prior bound but not outside. The hierarchical parameter, λ, was set over the range <1.0e-9, 1.0e1> and is sampled from the Jeffreys prior distribution. Bounds of the 324 priors are set to include all possible values that a model parameter can take. 325 The determining factors for the resulting number of Voronoi cells and their spatial distribution are 326 the ray distribution and the noise amplitude. Larger Voronoi cells form in the regions of scattered 327 sampling while much smaller ones form in the regions of denser sampling (Bodin and Sambridge, 328 2009). We performed the inversion hierarchically, meaning that the noise is treated as a free 329 parameter in the inversion (i.e. the inversion attempts to explore the trade-off between the model 330 complexity (the number of Voronoi cells) and the observational noise (hierarchical error)). 331 Our LM-HTDBI approach generates a total of 54 million models. Given our burn-in period and the 332 thinning stage, the posterior inference ends up with an ensemble of 180,000 models. Each model of 333 the ensemble is a crude and implausible representation of the D'', however, the mean of many such 334 crude models represents a more interpretable field of shear-wave velocity variation in D''. 335

Results 336
The mean and standard deviation of the posterior ensemble solution can be expressed visually by 337 plotting them on a pixeled 2-D velocity map. To suppress the effect of the mantle on our differential 338 travel times, we have tested a number of mantle models and analyzed the corresponding corrections 339 for 3-D mantle structure. The structure of the mantle has been imaged by different research groups 340 and some differences can be observed among them, which makes the decision of choosing a mantle 341 model non-trivial. Here we have taken a more general approach, explained below. 342

Mantle correction 343
Given the variability among global mantle models and to avoid possible bias towards a specific one, 344 we corrected our differential travel times using five different mantle models including TX2011 345 under each chain. The resulting D'' models inferred from data corrected for these five models are 349 presented in Fig. 6. The patchy distribution of high-and low-velocity structure is independent of our 350 choice of mantle model and shows more or less a consistent image of D'' (Fig. 6a to 6e). The short-351 scale heterogeneities exist across all five models obtained from corrected data sets, which suggests 352 that these patchy anomalies are a feature of the Earth's structure at D'' rather than a bias towards a 353 specific mantle model. 354 We show the chain history of negative log-likelihood as a function of iterations for 45 Markov chains 355 in Fig. 7. The chain history of negative log-likelihood demonstrates that all Markov chains stabilize 356 around a similar value (left panel in Fig. 7). A significant overlap can be observed in the histograms 357 of different chains, which demonstrates that all chains converged to a similar set of models (right 358 panel in Fig. 7). 359 Posterior probability plots of the data noise and the number of cells show the estimated noise in the 360 dataset and the number of Voronoi cells to which the inversion converges (Fig. 8a & 8b). The precisely to this effect. It will be higher when the (uncorrelated) residual scatter is larger and smaller 371 otherwise. Also, the slope of each cluster in terms of the λ vs. the number of cells is also as expected, 372 i.e., when λ becomes larger (data noise is assumed larger), then fewer numbers of cells tend to be 373 introduced, and vice versa. The Natural Parsimony principle can be seen in action on the distribution 374 of cells as fewer models with a higher number of cells are sampled. The natural parsimony is also 375 evident in Fig. 8c, since there is a negative correlation between data noise level and numbers of cells 376 introduced by the algorithm. This is consistent with the fact that more cells, and hence structure, are 377 required when there is more apparent signal in the data (noise levels are lower). Given that the 378 majority of the models recovered in the inversion have around 600 cells, this signifies a well-379 sampled parameter space. Also, given that the inversion was performed with data noise as a free 380 parameter, we show the frequency histogram for the scaling factor of the hierarchical parameter, λ. 381 Since we assumed a = 5%, and given the peak value of 0.27 in the distribution of λ, the estimated 382 noise in HTDBI is about 1.35% ( , in equation 3). Given this estimation for data noise, and assuming a typical travel time of 170 seconds for ScS within D'', we arrive at a representative 384 traveltime uncertainty of about 2.3 s for our dataset. 385

Shear-wave velocity in D'' 386
We combine the models with different mantle corrections (Fig. 6a-e) to obtain the mean and standard 387 deviation over all models with the assumption that they all have equal weight. Therefore, our S-wave 388 velocity variations in the D'' is the average of five models. This average velocity model and its 389 associated standard deviation map are shown in Fig. 9. We compare the travel-time residual of the 390 mean model (Fig. 9 a) with an individual Voronoi-cell model ( Supporting Information, Fig. S5) and Antarctica. However, there is a striking difference across the existing global models, even for long-401 scale features in the shape and intensity of the LLSVPs (Fig. 10). The critical characteristic of our 402 tomographic model is the appearance of short-scale features at the base of the mantle exterior to the 403 LLSVPs, which our resolution analysis confirms are not artifacts of the inversion. In the Supporting 404 Information (Fig. S7), we examine the heterogeneity-spectrum or the power-spectrum of the models. 405 Spectral analyses reveal the long-scale content of our model is similar to the previous models as 406 shown in the filtered plots (Supporting Information, Fig.10 and Fig. S7); however, its power of 407 heterogeneity in higher degrees is larger than previous estimates (Fig. S7). 408 The main advantage of our approach over previous approaches is that as no explicit subjective 409 regularization or parameterization are applied, the size of velocity heterogeneities is directly west Australia, North Atlantic and North Chile. From Fig. 8b, it is obvious that the areas with higher 416 standard deviation correspond to either the poorly sampled parts of D'' or to the areas that have gone 417 through more velocity variations during the inversion (e.g. closer to the sharp contrasts in elastic 418 properties). 419

Discussions 420
The chief goal of this study was to invert for the shear-wave velocity heterogeneities atop the CMB 421 using ScS and S travel times. Our study has a couple of limitations. The first limitation is that we 422 assume a fixed thickness of 300 km for the D'' layer. In addition, our forward method does not 423 include the finite frequency effects. However, ray-theory based tomography at the frequencies we 424 use is a reasonable approximation to interpret finite frequency travel times. For example, Hung et al. without the sensitivity kernels. They showed that the tomography based on the sensitivity kernels 427 yields similar results using infinite frequency ray paths. Hung et al. (2005) showed that the recovered 428 velocity perturbations using finite frequency kernels (in the frequency band similar to our data) are 429 approximately 1-2 times larger than those from the ray-based kernels. In our study, given the 430 average number of 600 Voronoi cells for the area covering the CMB (Fig. 8a), the average diameter 431 of a simplified Voronoi cell is ~ 500 km, which is slightly larger than the diameter of the Fresnel 432 zone (400 km). The structures we interpret are also larger than the size of the Fresnel zone. Hence, 433 not accounting for finite frequency effects might not be a too severe limitation. 434 Another limitation of our approach is that the inversion scheme we deploy does not assess an 435 independent impact of anisotropy on S and ScS data, and in turn, on our tomograms. Within the 436 Bayesian framework, not accounting for anisotropy is considered a theory error. Accounting for both 437 data noise and theory errors is at the leading edge of geophysical inference studies. Accurate The complex pattern of short-scale heterogeneities that we obtained suggests that LLSVPs are not 460 necessarily the only contributing factors to mantle instabilities. Given the existence of short-scale 461 heterogeneities, we believe that it is feasible that they play a significant role in mantle dynamics. 462 Several regional studies found evidence for locally sharp variation (up to or more than ±4%) in 463  (Fig. 11). More broadly, these models were derived either from 490 different datasets or by using different methods of inversion. 491 hemisphere, most of the models in D'' do not capture small-scale heterogeneities, which we 496 speculate is most likely due to applying uniform damping and smoothing. For example, an 497 interesting observation from our comparison with other global models (Fig. 11) is a clear difference 498 in the recovery of short-scale anomalies between the HMSL model and this study. We employed a 499 subset of data used in the HMSL model, which only includes data from larger epicentral distances. 500 Although the outline of the LLSVPs is similar, the presence and lack of short-scale features represent 501 a striking difference between these two models. This is attributable to the inversion strategy 502 deployed here where no fixed model parameterization, damping and smoothing are required, and 503 noise is treated as a free parameter, which is achieved at a significantly higher computational cost. 504 Another clear difference among the models selected for comparison is the amplitudes of Vs 505 variations. This is not surprising as different modelers use different data and methods, and they 506 achieve different resolutions in different regions. We compared the amplitudes of LM-HTDBI with 507 the previous models in the Supporting Information (Fig. S8), as in Burdick & Lekić (2017). The tilt 508 of the ellipses in Fig. S8 shows that velocity variations in our model have higher amplitude relative to the previous studies. Among all models presented in Fig. 11, SEMUCB-WM1 is generally 510 consistent with our model in places where they both exhibit larger amplitudes of S-wave velocity 511 heterogeneity, as well as in the features not seen in earlier generations of global D'' models. 512 However, these two models certainly differ in detail, especially in terms of the location and 513 distribution of short-scale heterogeneities. The existence of some short-scale features in SEMUCB-514 WM1 is likely due to the usage of short-period waveform data. 515 The only low-velocity anomaly outside of the LLSVPs that emerges in the majority of global models 516 is the 'Perm anomaly' (e.g. Lekić et al., 2012), north of the Caspian Sea. The region around Perm 517 anomaly is imaged as two smaller low-velocity anomalies in our model and this region is well 518 constrained in our synthetic test (Fig. 4 d).

Conclusions 531
This study presents a new model of shear-velocity variations in the D'' using high quality, 532 handpicked dataset of S and ScS differential travel times through waveform correlation. Inverting for 533 the compilation of 18,131 ScS-S differential travel times in conjunction with careful consideration of 534 global mantle models enabled us to "isolate" the lowermost mantle from the heterogeneities present 535 in the rest of the mantle. In this paper, we assumed that the D'' could be modeled as a single layer. 536 We utilized recent improvements in the inversion technique through the use of the HTDBI 537 framework which is based on implicit parametrization (spherical Voronoi cells) and treats the model 538 complexity and the data noise as free parameters, while a uniform application of damping and 539 smoothing to the model space can be avoided. We found that our model is in agreement with most of 540 the recent global studies in terms of imaging the long-scale features at the base of the mantle. 541 However, short-scale heterogeneity, as seen in our D'' tomogram, is far more omnipresent than in 542 spherical-harmonic, degree-2-dominated images for shear-wave speed of the lowermost mantle 543 obtained in most global models. This is the most striking difference between our model and the 544 previous model obtained using the same subset of data (Houser et al., 2008). We demonstrate, 545 however, that our method is able to recover much simpler, degree-2 images of the lowermost mantle 546 if the lowermost mantle is indeed void of short-and medium-scale heterogeneity and only 547 characterized by long-scale features. We are thus confident that the tomogram of the lowermost 548 mantle we obtain is a realistic representation of the seismic structures. Significantly, the recovered 549 LLSVP morphology of is complex, consisting of smaller high-velocity patches among low-velocity 550 domains. The newly obtained image suggests that the LLSVPs' margins are not uniformly flat. We 551 further argue that the methodology we adopt is a significant step forward in imaging the complicated 552 structure of the D'' layer on a global scale, and provides an important bridge between long-scale 553 features at a global scale and short-scale features of regional models. Future work may explore 554 combining additional datasets of travel times to increase spatial coverage and resolution of the 555 lowermost mantle, adopting similar principles in full-waveform modeling and novel methodological 556 approaches to inversion. 557

Acknowledgment 558
We thank Guy Masters for sharing his dataset with us, and his insights into the interpretation of our 559 results. The authors would like to thank the associate editor Ved Lekić, Ed Garnero, and an 560 anonymous reviewer for constructive discussions and feedback. We used GMT (Wessel et al., 1998)