Small-scale lithospheric heterogeneity characterization using Bayesian inference and energy ﬂux models

Observations from different disciplines have shown that our planet is highly heterogeneous at multiple scale lengths. Still, many seismological Earth models tend not to include any small-scale heterogeneity or lateral velocity variations, which can affect measurements and predictions based on these homogeneous models. In this study, we describe the lithospheric small-scale isotropic heterogeneity structure in terms of the intrinsic, diffusion and scattering quality factors, as well as an autocorrelation function, associated with a characteristic scale length ( a ) and RMS fractional velocity ﬂuctuations ( ε ). To obtain this characterization, we combined a single-layer and a multilayer energy ﬂux models with a new Bayesian inference algorithm. Our synthetic tests show that this technique can successfully retrieve the input parameter values for 1- or 2-layer models and that our Bayesian algorithm can resolve whether the data can be ﬁtted by a single set of parameters or a range of models is required instead, even for very complex posterior probability distributions. We applied this technique to three seismic arrays in Australia: Alice Springs array (ASAR), Warramunga Array (WRA) and Pilbara Seismic Array (PSAR). Our single-layer model results suggest intrinsic and diffusion attenuation are strongest for ASAR, while scattering and total attenuation are similarly strong for ASAR and WRA. All quality factors take higher values for PSAR than for the other two arrays, implying that the structure beneath this array is less attenuating and heterogeneous than for ASAR or WRA. The multilayer model results show the crust is more heterogeneous than the lithospheric mantle for all arrays. Crustal correlation lengths and RMS velocity ﬂuctuations for these arrays range from ∼ 0.2 to 1.5 km and ∼ 2.3 to 3.9 per cent, respectively. Parameter values for the upper mantle are not unique, with combinations of low values of the parameters ( a < 2 km and ε < ∼ 2.5 per cent) being as likely as those with high correlation length and velocity variations ( a > 5 km and ε > ∼ 2.5 per cent, respectively). We attribute the similarities in the attenuation and heterogeneity structure beneath ASAR and WRA to their location on the proterozoic North Australian Craton, as opposed to PSAR, which lies on the archaean West Australian Craton. Differences in the small-scale structure beneath ASAR and WRA can be ascribed to the different tectonic histories of these two regions of the same craton. Overall, our results highlight the suitability of the combination of an energy ﬂux model and a Bayesian inference algorithm for future scattering and small-scale heterogeneity studies, since our approach allows us to obtain and compare the different quality factors, while also giving us detailed information about the trade-offs and uncertainties in the determination of the scattering parameters.


I N T RO D U C T I O N
The Earth is heterogeneous on a variety of scales, ranging from the grain scale to scales of hundreds of kilometres. This heterogeneity is evident in data from geo-disciplines with varying sensitivity to different scales, such as geochemistry, mineralogy or seismology (e.g. Wu & Aki 1988). Due to the seismic wavelengths, most seismological earth models are laterally homogeneous or smoothly varying, with a lack of small-scale heterogeneity (e.g . Helmberger 1968;Dziewonski & Anderson 1981;Kennett & Engdahl 1991;Randall 1994). This limits our understanding of high-frequency seismic wave propagation and challenges in seismic imaging of small-scale heterogeneities remain.
Many seismic studies published before the 1970s were based on laterally homogeneous Earth models (e.g. Alexander & Phinney 1966) which were able to explain the propagation of long period signals, but failed to explain high frequency seismograms. Aki (1969) showed that the power spectra of coda waves for a given station are independent of epicentral distance and earthquake magnitude. He proposed that codas were caused by backscattered energy from discrete heterogeneities randomly distributed beneath the stations. The presence and shape of the coda strongly depends on the heterogeneity structure and, therefore, the geology beneath the station. Later studies (e.g. Aki & Chouet 1975;Rautian & Khalturin 1978) showed that the stable decay in coda wave amplitude was also independent of epicentral distance and source mechanism, fully supporting the scattering hypothesis.
Methods to study heterogeneity and scattering within the Earth vary depending on the type of the heterogeneity. Many seismological studies use deterministic methods to characterize the structure of the Earth (e.g. Christensen & Mooney 1995;Zelt & Barton 1998) or to find individual scatterers and try to obtain their particular characteristics and locations (e.g. Etgen et al. 2009). Marchenko imaging (e.g. van der Neut et al. 2015; Thorbecke et al. 2017) or migration techniques (e.g. Etgen et al. 2009) are often used in reflection seismology to study shallow structure and are a good example of deterministic methods. These techniques tend to have limited spatial resolution due to the wavelength of the studied waves and do not always take into account small-scale heterogeneities (on the order of magnitude of the wavelength or smaller), therefore failing to explain or reproduce the complex coda waves we see in seismograms. A different approach that partially overcomes these issues uses a stochastic description of the heterogeneity (e.g. Korn 1990Korn , 1997Ritter et al. 1998;Hock et al. 2004;Margerin 2005). This approach (e.g. Frankel & Wennerberg 1987;Shapiro & Kneib 1993;Hock et al. 2004;Sato & Emoto 2018) provides a statistical description of the structure and determines the integrated effect of heterogeneity on propagating seismic waves, so the characteristics and locations of individual scatterers are not relevant. Studies show the crust and lithospheric heterogeneity to be statistically complex and the necessity of heterogeneous Earth models that are capable of explaining not only the main waveforms but also coda waves (e.g. Aki 1973;Flatté & Wu 1988;Langston 1989).
Several methods allow us to study the propagation of seismic waves through heterogeneous stochastic media and characterize the scattering and attenuation properties of the Earth. Single-scattering perturbation theory (e.g. Aki & Chouet 1975;Sato 1977Sato , 1984 was one of the first methods designed for this purpose. It considers scattering to be a weak process and coda waves the superposition of single scattered waves generated at randomly distributed heterogeneities within the Earth. It often makes use of the Born approximation (e.g. Sato et al. 2012), a first-order perturbation condition which does not take into account the energy loss from the primary waves. As a result, energy is not conserved in the scattering process (e.g. Aki & Chouet 1975). Sato (2006Sato ( , 2007 and Emoto et al. (2010) later set the basis for future synthesis of vector wave envelopes studies by extending the Markov approximation for scalar waves and developing a series of algorithms to synthesize vector wave envelopes in 3-D Gaussian random elastic media. Recently, many studies have used radiative transfer theory (RTT), a technique initially developed for light propagation (Chandrasekhar 1950) which has been significantly improved and expanded (e.g. Margerin et al. 1998;Przybilla & Korn 2008;Nakahara & Yoshimoto 2011;Sanborn et al. 2017;Sato & Emoto 2017Hirose et al. 2019;Margerin et al. 2019) since its first applications to seismology (e.g. Wu 1985;Gusev & Abubakirov 1987). In particular, the development and improvement of Monte Carlo simulations and analytical approaches to solve the radiative transfer equations have made it possible to apply RTT to a wide variety of tectonic and geological settings (e.g. Margerin & Nolet 2003;Gaebler et al. 2015b, a;Fielitz & Wegler 2015;Hirose et al. 2019). Other methods to analyse coda energy and study lithospheric heterogeneity have been proposed and are also widely used (e.g. coda normalization method (Aki 1980), multiple lapse time window analysis (e.g. Fehler et al. 1992), coda wave interferometry (e.g. Snieder 2006, etc.). While these methods are able to characterize the heterogeneity structure of the Earth, they all use approximations or are computationally expensive.
In this study, we combine two stochastic methods, the single layer modified energy flux model (EFM, Korn 1990) and the depth dependent energy flux model (EFMD, Korn 1997), with a Bayesian inversion algorithm which allows us to characterize small-scale lithospheric heterogeneity by fully exploring the scattering parameter space and obtain information about the trade-offs and uncertainties in the determination of the parameters. We applied these methods to a large data set of teleseismic events recorded at three seismic arrays of the Australian National Seismic Network: Pilbara Seismic Array (PSAR), and Alice Springs Array (ASAR) and Warramunga Array (WRA), which are also primary seismic arrays from the International Monitoring System (IMS) network, the worlwide network built to ensure compliance with the Comprehensive Test Ban Treaty (CTBT).

M E T H O D S
We use the random medium approach, which considers the propagation of seismic waves through a medium with constant background velocity and density and random heterogeneities distributed according to a given autocorrelation function (ACF) and linearly related through Birch's law (Birch 1961). The ACF depends on the RMS fractional velocity fluctuations, ε, and the characteristic or correlation length, a, which defines the spatial variation of the heterogeneities. By obtaining these parameters, it is possible to obtain a statistical description of the sampled structure that reveals the strength of the scattering experienced by seismic waves. The modified energy flux model (EFM) and depth-dependent energy flux model (EFMD) can be used for both weak and strong scattering (e.g. Korn 1990;Hock & Korn 2000;Hock et al. 2004) and allow determining the best-fitting ACF of the heterogeneous medium. Both methods work under the assumption of planar wave fronts and vertical or near-vertical incidence from below on a single I. N. GonzálezÁlvarez et al. scattering layer (EFM) or stack of layers (EFMD), conditions well met by teleseismic events, allowing the study of the heterogeneity structure in seismically quiet regions.
Here we present a short introduction to the EFM and EFMD. Full details about the methods can be found in Korn (1990), Korn (1997), Hock & Korn (2000) and Hock et al. (2004).

The modified EFM for a single scattering layer
When a plane wave front enters a heterogeneous unlayered medium from below, part of the energy propagates with the ballistic wave front, while part forms the forward scattered coda energy that arrives later at the surface and some energy scatters back into the half-space. Total energy E tot is conserved in this process and we can write it in terms of frequency, ω, and time, t, as with E d being the energy of the direct wave, E c the energy transferred from the direct wave into the coda (forward scattered) and E diff the energy diffusion (backscattering) from the current layer back into the half-space. The energy that is transferred from the incoming wave front to the scattered coda and the backscattering to the half-space can be expressed as an energy loss for the direct wave, controlled by a quality factor Q s for scattering and Q diff for diffusion. To take into account anelastic (intrinsic) attenuation, we use the quality factor Q i . The EFM assumes spatially homogeneous coda energy within the scattering layer. Energy transfer into the coda due to scattering or anelastic losses stops once the ballistic wave leaves the scattering layer after totally reflecting at the free surface, while diffusion out of the scattering layer can continue after that.
A linear least-squares fit of the theoretical coda power spectral density allows us to calculate the coda decay rate, a 1 , and its amplitude at zero time, a 0 (Korn 1990(Korn , 1993. The values of Q i and Q diff at 1 Hz, Q i0 and Q d0 , can be obtained from values of a 1 at different frequencies via where α is the exponent controlling the frequency dependence of Q i (Korn 1990). To determine Q diff and Q i at different frequency bands, we then use: Laboratory measurements of α have shown that it probably remains below 1 for most of the frequency range considered here (Korn 1990, and references therein). Our attempts at obtaining α as a third free parameter in the least-squares inversion of eq. (2) revealed a very complicated trade-off with Q i0 and Q d0 , with high values of α corresponding to negative values of Q i0 and/or Q d0 . Therefore, we limited α to the range of 0.0-0.6, in steps of 0.1, and chose the value that minimized the misfit to the data. The impossibility to fully invert for α makes it difficult to accurately calculate Q i within the EFM, but has a minor effect in the determination of Q diff (Korn 1990). For our range of source distances, Q i is generally much larger than Q diff (Korn 1990), which reduces the impact of this limitation of the EFM inversion.
The coda amplitude at zero time, a 0 , is related to Q s through I D being the integral of the squared amplitude envelope, A 2 (t; ω), over the time window of the direct wave arrival (Hock & Korn 2000). We can then use the relationships between Q −1 s and the structural parameters for different types of ACFs obtained by Fang & Müller (1996) to determine the type of ACF that fits the data best, as well as a first estimation of the correlation length (a) and the RMS velocity fluctuations (ε) for a single scattering layer. The eight one octave-wide frequency bands we used in our analysis for both methods are shown in Table 1. Given the similarity between different ACFs within our frequency range of interest, and despite the possibility to determine the type of ACF of the scattering structure using the EFM, we decided to assume an exponential ACF for this study, since previous studies have proposed it as an appropriate ACF for teleseismic scattering studies (Shearer & Earle 2004).
Finally, we calculated the combined quality factor, Q comb , as the combination of all three quality factors: Please note that Q comb , as opposed to other quality factors, is not related to the energy decay of the wavefield nor it is applied to any specific part of the seismogram. Its only intent is to summarize the total coda attenuation and make it easier to compare our results from the different arrays. Korn (1997) modified the EFM to include depth-dependent heterogeneity. In this model, a plane wave front enters a stack of N heterogeneous layers from below. Each layer j has its own characteristic transit time δt j and scattering quality factor Q s j , which is calculated from the structural parameters a j and ε j (Fig. 1) using the analytical approximation for isotropic exponential media obtained by Fang & Müller (1996). The stack of layers is symmetric with respect to the free surface, which is located at the centre of the stack to take into account the reflection of the wave front.

The EFM for depth-dependent heterogeneity
For a given angular frequency ω c , the normalized coda energy envelope of a velocity seismogram at the free surface is computed from the squared amplitude envelope A 2 (t; ω c ) and is related to the energy balance within the different layers in the model through with E C N (t; ω c ) being the spectral coda energy density of the layer containing the free surface, t N the traveltime from the bottom of the stack of layers to the free surface and E D (t; ω c ) the energy density of the direct wave at the free surface. Q s and Q i control the decay of the direct wave energy over time due to scattering and intrinsic attenuation via where t j represents the one-way travel time through each layer. The energy balance within layer j (j = 1,..., N) is represented by where H is the Heaviside function. The first two terms of eq. (9) describe the energy flux from layer j to the layers above and below, while the next two terms describe the opposite flux from the neighbouring layers into layer j. The last two terms represent the anelastic or intrinsic energy loss and the direct wave energy input into the layer. In practice, for a given model m, comprising a single value of a and ε for each layer in the stack, E D is calculated for each time sample using eq. (8), starting from the measured energy value at the free surface. Then, the system of linear differential equations in eq. (9) is solved for each layer in the model. Finally, synthetic coda envelopes are calculated for each frequency band using eq. (7).

Bayesian inference
We use a Bayesian approach to obtain the values of the structural parameters for each layer in the model (e.g. Tarantola 2005). In this approach, the aim is not to obtain a best fitting model, but to test a large number of models with parameters drawn from a prior probability distribution p(m) (or prior) defined by our previous knowledge on them. In our case, we assume we have no previous knowledge on the value of the parameters and use a uniform prior.
The likelihood associated with model m, p(d|m), is the probability of observing our data, d, given the model parameters in m. We used the Mahalanobis distance (m) (Mahalanobis 1936) between d, with variance-covariance matrix C, and the synthetic envelopes g(m), to calculate the fit to our data: which we then applied to the calculation of the likelihood of model m: Bayes' theorem (Bayes 1763) allows us to calculate the corresponding sample of the posterior probability distribution (or posterior), that is, the probability density associated with model m, or p(m|d): We create an initial model by selecting a random value for the correlation length and velocity fluctuations in all layers in the (a min , a max ) or (ε min , ε max ) intervals, with a min = 0.2λ min [m], a max = 2λ max [m] (λ min and λ max being the minimum and maximum wavelengths in the layer, depending on signal frequency and background velocity), ε min = 4.5 × 10 −3 per cent and ε max = 10 per cent. These maximum and minimum values were chosen considering the relevant range for detectable scattering while being geologically feasible (e.g. Korn 1993;Hock et al. 2004). We then applied the Metropolis-Hastings algorithm (Metropolis & Ulam 1949;Metropolis et al. 1953;Hastings 1970) to sample the posterior probability distribution and generate our ensemble of solution models. This way, at every time step, this Markov Chain Monte Carlo (MCMC) algorithm generates a new model m by randomly choosing one of the parameters in the previous model (m) and updating its value by adding a random number in the (− δa, δa) or (− δε, δε) interval, with δa and δε being the step size for correlation length and RMS velocity fluctuations respectively. In case the new value of the parameter exceeds the boundaries defined by (a min , a max ) or (ε min , ε max ), the distance to the boundary is calculated and the new parameter value is forced to bounce back into the valid parameter range by the same distance . The algorithm then takes model m and uses eqs (9) and (7) to obtain the corresponding synthetic envelopes. In order to decide whether to accept or reject the new model, the algorithm uses the posterior probability exponent (eq. 11), (m)/2, called here the loglikelihood, L, associated with model m, as an estimator of the likelihood and the goodness of the fit to the data. Thus, if L(m)/L(m ) ≥ 1, m will be accepted. If L(m)/L(m ) < 1, however, it will only be accepted if exp(L(m) − L(m )) ≥ q, q being a random number between 0 and 1. This algorithm ensures that parameter values closer to the true value have high likelihoods and are accepted more often than values further from the true value. The acceptance rate (AR) represents the percentage of times new parameter values were accepted through the Markov chain. There are several criteria defining what the value of the AR should be, most of them making assumptions about the properties of the target distributions (e.g. Brooks et al. 2011). In our case, since we do not have any a priori information about the posterior distributions, we aimed at AR values between 30 and 60 per cent. Finally we calculate the 5-to 95-percentile range (PR) for each parameter in each layer in the model from our ensemble of accepted models.
For more detailed descriptions of Bayesian inference and MCMC, we refer the reader to Tarantola (2005) or Brooks et al. (2011).

Synthetic tests
Previous studies have tested the validity of both the EFM and EFMD: Frankel & Wennerberg (1987) and Korn (1990) used a 2-D acoustic finite difference code to check the validity of their respective versions of the EFM; Korn (1997) and Hock et al. (2004) tested their approaches by obtaining synthetic seismograms from a fully elastic 2-D finite difference method and comparing them with synthetic envelopes obtained from the EFMD. Here, we tested our Bayesian inversion code with five different synthetic data sets, with varying number of layers and parameter values. Synthetic envelopes for these five models were calculated using the EFMD algorithm. Parameter values for each one are shown in Table 2, together with a summary of our synthetic tests results. In all of them, we used Pilbara Seismic Array (PSAR, Section 3) as a test array and obtained its velocity model and Moho depth from the Australian Seismological Reference Model (AuSREM, Kennett & Salmon 2012;Kennett et al. 2013;Salmon et al. 2013b) and AusMoho model (Kennett et al. 2011) respectively, although our results should be applicable to all arrays. Based on the lower bound of the lithosphere-asthenosphere boundary (LAB) for this array (Yoshizawa & Kennett 2015;Kennett 2015), we set the bottom depth of all models to 200 km. Frequency bands used are listed in Table 1. Figs 2, 3, 4, S1 and S2, illustrate the results from our synthetic tests for Models 1 to 5 (Table 2). In order to test the convergence of our algorithm, we ran three independent Markov chains for each model, with a total of 3 million iterations (parameter combinations tested) for the single layer model, 9 million for the 2-layer models and 15 million for the 3-layer model. For each chain, we discarded the models corresponding to the burn-in phase, during which the algorithm is not efficiently sampling the posterior probability distribution and models are still affected by the random initialization of the Markov chain. In order to define the point at which the algorithm reached convergence and the burn-in phase ended, we first calculated the mean loglikelihood value in the second half of the chain (during which the algorithm is stable) and then subtracted 5 per cent off that value. We consider the algorithm has converged the first time it accepts a model with loglikelihood L equal or higher than this value. Our threshold was defined based on the observation, in test runs of the EFMD, that L generally remained stable after reaching the defined threshold for the first time. L provides an estimation of the goodness-of-fit of the synthetic data to our real data and takes negative values, meaning fits improve as L gets closer to zero (eq. 11). In terms of parameter values, we consider that a narrow 5-95 percentile range (PR) points to clearly determined values of the structural parameters, while wide 5-95 PRs would suggest multiple parameter values are equally likely and good at fitting our data.
For Model 1, with a single layer encompassing the entire lithosphere, all three chains reached stability and converged within 10 000 iterations. Panels (d)-(f) in Fig. 2 show our posterior probability density functions (PDFs) for each parameter, as well as the joint PDF. In both cases, the distributions are approximately Gaussian and symmetric, with the 5-95 PR being ∼0.06 km and ∼0.01 per cent wide for the correlation length and RMS velocity fluctuations respectively (Table 2), which indicated that the range of suitable values of the parameters is very well defined. The algorithm slightly overestimates the correlation length and underestimates the RMS velocity fluctuations, with the input value of the parameter being included in the 5-95 PR for the latter but not for the former (Table 2, Fig. 2). However, the difference between the central value of the PDFs and the true value of the parameter is <0.4 per cent for both the correlation length and the RMS velocity fluctuations. Graphs on the right-hand side of Fig. 2(panels (g)-(n)) show histograms of the synthetic envelopes for our ensemble of accepted models for all frequency bands. As frequency increases, both envelope amplitudes and width of the ensemble of synthetic envelopes increase too. However, in all cases, the highest density of envelopes, indicated by a dark brown colour, is found in a very narrow line that matches the input data envelopes, not only in the time window used for the fit (shadowed area in the plots), but also outside of it.
Model 2 contains two layers, representing the crust and lithospheric mantle. Our three chains converged in less than 120 000 iterations and remained stable for the rest of the inversion, as shown in panels (a)-(c) in Fig. 3. Panels (d)-(i) in this figure summarize our results. In this case, the PDFs for the parameters in both layers are narrow (the 5-95 PR is <0.7 km wide at most for a and <0.5 per cent for ε) and approximately centred around the input values, even if they are not Gaussian and show some local maxima.
The true values of the parameters lie within the 5-95 PR in all cases, near the centre of the joint PDFs, and the maximum difference between the input values and the absolute maxima of the PDFs is 2 per cent. Panels (j)-(q) in Fig. 3 indicate fits to the synthetic data are good, since they show again that the largest concentration of synthetic envelopes for all frequencies coincides with the input data envelopes.
Models 3 and 4 have the same interface structure as model 2 (Table 2) and investigate high contrast situations in which a strong heterogeneity layer is above or below a layer containing weak heterogeneities, respectively. Figs S1 and S2 summarize our results and can be found in the Supporting Information. In both cases, the chains reached stability within 11 000 iterations. Posterior PDFs for the strongly scattering layer are approximately Gaussian and narrow for both models 3 and 4, with maxima that deviate from the input parameter values by 0.4 per cent at most ( Table 2). The weakly scattering layer, however, is poorly resolved for both models. The posterior PDFs for this layer are very similar in both cases and clearly non-Gaussian. They show multiple maxima that do not correspond to the input parameter values, which widens the 5-95 PR, especially for a. The RMS velocity fluctuation values seem to be constrained to the range from 0.5 to 1.9 per cent for both models, while the shape of the PDFs suggests any value of the correlation length would be equally acceptable, even if large values (>5 km) are favoured. The stability of the chains, shown in panels (a)-(c) in Figs S1 and S2, together with the ensemble of synthetic envelopes on panels (j)-(q), indicate that all these models provide similarly good fits to the data and have similar loglikelihoods. This observation points to solutions being highly non-unique, and to the scattering parameters of the weakly heterogeneous layer not being easily recoverable for these high contrast cases.
Finally, model 5 contains three layers, with boundaries corresponding to upper and lower crust and lithospheric mantle. Our results are shown in Fig. 4 and Table 2. Chains converged in less than 130 000 iterations. In all cases, PDFs are clearly non-Gaussian (panels d-l on Fig. 4) and have complex shapes, which widens the 5-95 PR and increases the range of suitable values of the parameters. The correlation length PDFs show clearly defined maxima near the true values of the parameter in all layers (the maximum distance between the maximum and the input parameter value being 0.35 per cent). RMS velocity fluctuations PDFs are more complex Table 2. Summary of the synthetic model layering and our synthetic tests results. For each model, we include the 5-95 percentile range (PR) and the acceptance rate (AR) for each parameter, as well as the maximum loglikelihood (L) found during the inversion. and neither of them show clear maxima near the input parameter values. Fig. S3 contains the marginal PDFs for all parameters in all layers, as well as the PDF for each individual parameter. It shows a strong trade-off between parameter values in different layers of the model, especially the two crustal layers, and allows us to identify two independent sets of parameters from our results (see Section S.1 in the Supporting Information for details). This interaction between the parameters is caused by two main factors: first, the energy balance the EFMD is based on (eq. 9) is strongly dependent on the layering of the model, since the maximum energy that can be present within a layer at any time depends on its thickness (i.e. energy leaks out of thinner layers faster); secondly, correlation length values have a much smaller effect on coda amplitudes, compared with RMS velocity fluctuations, so the algorithm uses ε to compensate the excess or lack of energy within a layer and match data coda amplitudes. Since panels (m)-(t) on Fig. 4 do not show two clearly different sets of envelopes in our ensemble of synthetic envelopes, and given that the loglikelihood values remained stable throughout the three independent chains we ran for this example, we conclude that both sets of parameters we obtained from our inversion provide equally good fits to the data, even if neither of them match our input parameter values.
Overall, our results show that our Bayesian algorithm is capable of successfully fitting our data and retrieving the input parameter values for our 1-layer and 2-layer models. For our 3-layer model, however, the method provides good fits to the data but fails to obtain the correct parameter values, so we cannot trust results from this model for real data inversions, since we do not know what the scattering parameters are beforehand. Our observations illustrate the usefulness of the Bayesian approach we took in this study. It provides detailed information about the parameter space and indicates whether a single set of parameters that fits our data exists or a range of models can equally match the data. Any estimation of scattering parameters in a maximum-likelihood framework would therefore have led to erroneous conclusions about the physical parameters in this system, which we have avoided. The joint PDFs highlight the complicated relationships and trade-offs between the model parameters in the different settings explored here, which had not been observed in previous studies using the EFMD. We do not observe systematic overestimation of a in the EFMD, as reported by Hock et al. (2004). This observation might be related to the limited number of models tested in grid search approaches and the observed trade-offs between parameters.

DATA S E L E C T I O N A N D P RO C E S S I N G
Our data set consists of seismic recordings from teleseismic events from 1 January 2012 to 31 December 2018, with epicentral distances between 30 • and 80 • from the arrays, source depths greater than 200 km and magnitudes from 5 to 7. These conditions ensure vertical or nearly vertical incidence angles and prevent near-source scattering and unwanted deep seismic phases from appearing in our time window of interest.
After removing the instrument response, we calculate the signalto-noise ratio (SNR) for each trace and frequency band using the peak-to-peak amplitude in two separate time windows: for noise, we used a 20 s long window, starting ∼25 s before the theoretical P-wave arrival [as estimated from PREM (Dziewonski & Anderson 1981)], while for the signal we chose a time window starting 1 s before the theoretical first arrival and ending 40 s later. Only traces with signal-to-noise ratio equal to or higher than 5 were used. Hock et al. (2004) pointed out that the EFMD generally overestimated the RMS velocity fluctuations by up to 3 per cent when using only vertical-component data and that a mix of 1-component and 3-component data produced unstable results, both of them caused by the difference in coda amplitudes between 1-component and 3-component data. However, the International Monitoring System arrays are dominantly vertical component, with WRA having three 3-component stations and ASAR a single 3-component central station. All PSAR stations are 3-component. To address this issue, we tried calculating a correction factor to approximate 1-component to 3-component coda levels. We used several different approaches to obtain this correction factor, all of them based on the ratio between every available 3-component coda envelope A(t; ω c ) or normalized envelope (left-hand side on eq. 7) and its 1-component (vertical) counterpart. However, we found that these ratios varied significantly from event to event and frequency band to frequency band and followed complicated probability distributions, even after using our large datasets to calculate them. The corrected 1-component envelopes did not, in general, fully match the 3-component coda amplitudes using this approach. Our tests also showed the correction factors needed for the normalized envelopes were different than for the unnormalized ones and that small variations in coda amplitudes affected the results we got from both the EFM and EFMD. We also used the 'corrected' 1-component data in our EFM-EFMD algorithm and compared the results in different settings with those from our 3-component data for PSAR. In both cases, the distribution of Figure 2. Summary of the results obtained from our EFMD algorithm for synthetic model 1 from Table 2 from three separate chains, adding up to a total of 3 million iterations (parameter combinations tested). Panels (a)-(c) show the loglikelihood (or posterior probability exponent) for each accepted model in the chain, once the burn-in phase was removed. Panels (d)-(f) contain the posterior PDFs of the structural parameters, as well as the joint PDF. Dotted blue lines in these plots represent the input parameter values and the shaded area corresponds to the 5-95 percentile range (PR). Panels (g)-(n) on the right-hand side show 2-D histograms of the synthetic envelopes for all accepted models and frequency bands, with colour bars indicating the number of models that produced a data sample within each bin. Vertical scale is the same in all plots. The shaded area here indicates the time window used for the fitting and blue dotted lines are the input data envelopes. the heterogeneity followed similar patterns, but the values of the scattering parameters and the posterior PDFs differred. Therefore, we only analyse 3-component data in this study. Table 3 shows the number of events and traces used for each array and frequency band. For PSAR, we only kept events with ≥5 good quality 3-component traces. For WRA and ASAR, we used all available 3-component data. This allowed us to test this method with different station configurations, from a full array (PSAR) to a small group of stations (WRA) or even a single station (ASAR). In all cases, our large event data set guarantees a thorough sampling of the structure beneath the stations and allows us to obtain robust results.
For each array, the data processing prior to the EFM/EFMD analysis was carried out as follows: (i) Computation of 3-component envelopes for each frequency band, station and event. All traces were trimmed to the time window going from t N seconds before to 3t N seconds after the theoretical P-wave arrival (t N being the travel time through the lithosphere, ∼25 s for all arrays). These were then stacked by event, normal-  Table 2 (2-layer model).
ized using eq. (7) and stacked by frequency band. Unnormalized envelopes for all events were also stacked by event and frequency band. The variance of both normalized and unnormalized envelopes was calculated sample by sample from all individual event stacked envelopes and used as the uncertainty of our data. (ii) Estimation of Q s , Q i , Q diff , a and ε for a single scattering layer using the EFM. (iii) Bayesian inversion for the structural parameters of each layer in each model type from Fig. 5 by applying the envelope modelling technique from EFMD, as described in Section 2.2, and using the Q i values obtained from the single layer EFM. The bottom depth of these models was set to 200 km in all cases to make it easier to compare our results from the three arrays. In order to speed up this process, our data were resampled to a common sampling rate of 10 Hz (original sampling rates were 40 Hz for PSAR and WRA and 20 Hz for ASAR) before applying the EFMD algorithm.

T E C T O N I C S E T T I N G
ASAR and WRA are located on the North Australian Craton (NAC), one of the Proterozoic cratons in the Precambrian westernmost twothird of the Australian continent (e.g. Myers 1990;Simons et al. 1999;Wellman 1998;Cawood & Korsch 2008 , Fig. 6). The NAC consists of late Archaean to Proterozoic cratonic blocks overlaid by Proterozoic and Phanerozoic orogenic belts and basins. PSAR is located on Archaean lithosphere part of the West Australian Craton (WAC), which includes both the Pilbara and Yilgarn Archaean cratons, as well as some Proterozoic orogens and basins (Cawood & Korsch 2008, Fig. 6). Present day tectonic activity in Australia is concentrated along the active plate boundaries in the north and east, with continental regions presenting only moderate seismicity (Fichtner et al. 2009).
Previous studies have investigated crust and lithospheric thicknesses and structure around the three arrays studied here. Thick crust (L c > 40 km) with a wide and smooth Moho transition has generally been found in the Proterozoic shields of Central Australia while the Archaean regions of western Australia have thinner crust (L c < 40 km) and sharper crust-upper mantle transitions (e.g. Clitheroe et al. 2000;Kennett et al. 2011;Salmon et al. 2013a;Kennett & Saygin 2015;Sippl 2016). This difference in crustal thickness between Archaean and Proterozoic regions seems not to fit the trend of crustal thickness increasing with age suggested for Australia (e.g. Clitheroe et al. 2000). It has been attributed to post Archaean tectonic activity underplating material at the base of the crust in these regions, as opposed to the Archaean cratons being located at passive margins and, therefore, not being affected by more recent tectonics (e.g. Drummond & Collins 1986). Sippl (2016) and Kennett & Sippl (2018) imaged a series of Moho offsets along a north-south profile in the NAC. One of these  Table 2 (3-layer model).
offsets is associated with the Redbank Shear Zone, which separates the Aileron Province and the location of ASAR from the Amadeus Basin, just south of the array (e.g Goleby et al. 1989;Korsch et al. 1998;Sippl 2016). The profile used in Sippl (2016) and Kennett & Sippl (2018) is located roughly 50 km west of ASAR and shows an offset of up to 20 km coinciding with ASAR latitude, even though they show constant Moho depths beneath the array. An east-west gravity anomaly has been found at the location of this Moho offset (Sippl 2016, Fig. 1) and attributed to denser lithosphere at the base of the crust caused by the uplift of the Aileron crustal block during the Alice Springs Orogeny 400-350 Ma (Goleby et al. 1989;Aitken 2009;Aitken et al. 2009;Sippl 2016). Another offset imaged by Sippl (2016) and Kennett & Sippl (2018), further north, shows a north-south decrease in Moho depth of about 10 km just south from WRA, which has been associated with a Proterozoic suture zone. Corbishley (1970) also found evidence of a layered and dipping structure below WRA. Gravimetric data do not show any anomalies here (Sippl 2016), which has been attributed to a layer of sediments near the surface isostatically compensating the mass excess at depth.
Several studies have addressed the thickness of the lithosphere beneath the Australian continent. Some suggest similarly deep interfaces across all Precambrian cratonic regions in Australia (L l ≈ 200 km, e.g. Debayle & Kennett 2000). More recent studies use a lithosphere-asthenosphere transition (LAT) zone, defined as a mechanical or thermal boundary layer related to changes in rheology, as opposed to a simple interface at the bottom of the lithosphere (e.g. Yoshizawa & Kennett 2015;Kennett & Sippl 2018). Specifically, Kennett & Sippl (2018) place the upper and lower bounds of the LAT at 140 and 170 km depth respectively for ASAR, and at 120 and 160 km for WRA, while Yoshizawa & Kennett (2015) place them at 100 and 200 km depth for PSAR. Some studies have also found evidence for mid-lithospheric discontinuities below both ASAR and WRA which have been interpreted as vertical variations in mantle composition, grain size or fabric, for example a low  Figure 5. Representation of the AuSREM P-wave velocity models for each seismic array (left-hand panel) and the three types of lithospheric models used in the EFMD (right-hand panel). The layering and bottom depth is the same we used in the models for our synthetic tests, with Model types I, II and III corresponding to Models 1, 2 and 5 from Table 2 (Models 2, 3 and 4 have the same layering). Moho depths for each array were obtained from the AusMoho model (Kennett et al. 2011).  velocity melt cumulate layer (Ford et al. 2010) and as a former mantle detachment zone associated with the Alice Springs orogeny (Kennett & Sippl 2018).

EFM results
We calculated the coda decay rate, a 1 , and its value at zero time, a 0 , for all frequency bands and arrays as stated in Section 2.1. We applied the linear least-squares fit of the squared stacked envelopes at the free surface (Fig. S4) to a time window starting t N s after the theoretical P wave arrival (t N being the one-way traveltime through the lithosphere), since the EFM is only applicable after the direct wave has left the scattering layer (Korn 1990;Hock & Korn 2000). The length of this time window varied from 42.5 to 48 s for all arrays and frequency bands, depending on differences in P-wave velocities and arrival times. Table 4 and Fig. 7 summarize our EFM results for all arrays.
A least-squares fit using eq. (2) then allowed us to calculate the quality factors for diffusion and anelasticity at 1 Hz from a 1 . For all arrays, the coda decay rate for the lowest frequency band did not follow the trend defined by the other frequency bands. Including it in the least squares fit produced inconsistent results, and it was excluded from the analysis (Fig. S5). The intrinsic quality factor, Q i , takes similar, frequency independent (α = 0), values of ∼2000 for WRA and PSAR. For ASAR, our best fits to the coda decay rate (eq. 2) correspond to α = 0.2 (Fig. S5) and Q i ∼ 1000. Diffusion quality factor values at 1 Hz are similar for ASAR and WRA (∼400), and higher for PSAR (∼500). Since this quality factor does not depend on α (eq. 16, Korn 1990), this translates into Q diff following the same trend for all arrays but being higher for PSAR than for WRA and ASAR. Fig. S6 shows measured Q s values, obtained from eq. (5), together with the theoretical least-squares regression curves derived by Fang & Müller (1996) for the relationship between the structural parameters and Q s for an exponential ACF. As explained on Section 2.1, these parameters represent a first approximation to the average spatial distribution and strength of the heterogeneity of a hypothetical single scattering layer beneath the arrays. Correlation length values are similar for the three arrays, varying from 0.92 to 1.1 km. Heterogeneities appear to be weaker beneath PSAR than ASAR or WRA, with ε jumping from ∼3.0 per cent for PSAR to ∼4.5 and ∼4.7 per cent for WRA and ASAR, respectively. Fig. 7 shows the frequency dependence of the different quality factors obtained from the EFM. The total quality factor, Q comb , and Q s follow a similar trend. They take the highest and lowest values for PSAR and ASAR, respectively. For WRA and ASAR, their maximum value corresponds to the 0.5-1 and 0.75-1.5 Hz bands, respectively, and the minimum for the 1.5-3 Hz frequency band. The frequency dependence of Q s and Q comb for the highest Table 4. Summary of the main results obtained from the EFM for all arrays: intrinsic (Q i0 ) and diffusion (Q d0 ) quality factors values at 1 Hz, intrinsic quality factor frequency dependence coefficient (α), correlation length (a) and RMS velocity fluctuations (ε).

Array
Q i0 Q d0 α a (km) (per cent) PSAR 2100 ± 200 500 ± 40 0.0 0.9 ± 0.1 2.9 ± 0.1 WRA 2100 ± 100 400 ± 20 0.0 1.1 ± 0.1 4.5 ± 0.1 ASAR 1000 ± 100 400 ± 40 0.2 0.9 ± 0.2 4.7 ± 0.2  frequencies is similar for both arrays. This indicates that the dominating scale length of the heterogeneity is in the 2.6-5.3 km range for these arrays when we consider a single scattering layer. For PSAR, however, Q s decreases for frequencies below 1.5 Hz and then remains approximately constant, which could be indicative of different scale lengths of the heterogeneity being equally present in the structure. For this array, Q comb increases slowly over the frequency range covered here. In general, diffusion is the strongest attenuation mechanism (lowest Q) at low frequencies, with scattering dominating at higher frequencies. For WRA, this transition happens at 0.75 Hz, while for ASAR and PSAR, the change takes place at 1.125 Hz. Anelasticity remains the weakest attenuation mechanism (highest Q) at low frequencies, up to 4.5 Hz for WRA and PSAR and 3.75 Hz for ASAR. Above that frequency, Q diff becomes dominant. These results agree with the observations by Korn (1990), who obtained Q i > 1000 and Q diff ∼ 300-400 at 1 Hz for WRA, even if his results showed that Q i remained larger than Q diff up to 10 Hz. Our Q comb results suggest that, even if Q s , Q i and Q diff are lower at most frequencies for ASAR than for the other two arrays, total attenuation strength is similar for ASAR and WRA. These lower Q comb values could be related to the location of these arrays on the NAC, younger in origin than the WAC (Section 4). The location of ASAR, on the southern edge of the NAC, in an area widely affected by the accretionary processes that took place during the assembly of the Australian continent, as well as major events like the Petermann and Alice Springs orogens (Section 4), could explain the lower values of the different quality factors obtained for this array. For PSAR, the generally high quality factors values we obtained could be related to the location of the array on a tectonically quiet Archaean craton (Section 4). Previous studies (e.g. Cormier 1982;Korn 1993;Sipkin & Revenaugh 1994;Domínguez & Rebollar 1997) have also found lower Q values in regions with quiet tectonic histories, an observation that matches our results from the EFM for all three arrays.

EFMD results
We used the 1-layer and 2-layer lithospheric models shown in Fig. 5 in our inversion of the data for all three arrays. Q i values necessary to calculate the synthetic envelopes from eq. (7) are determined by the EFM. As with our synthetic tests, we ran three parallel Markov chains for each array and model type, with 1 million or 3 million iterations for models with 1 and 2 layers, respectively. The burn-in phase, defined as described in Section 2.2.2, was removed from all chains. Table 5 summarizes our results. To avoid repetition, we include here only the most relevant results for each array. Figures from the rest of our inversions can be found in the Supporting Information.
Inversion of PSAR data with Model type I (single layer), revealed this model produces very large amplitude codas that barely decay over time (Fig. S7). All chains were stable and converged within 14 000 iterations, but the maximum loglikelihood reached during the inversion (<−10 6 , panels (a)-(c) on Fig. S7), indicated fits to the data are very poor, which is also obvious from the comparison of the ensemble of synthetic envelopes with the data (panels (g)-(n) on Fig. S7). The posterior PDFs suggest a nearly homogeneous lithosphere, with ε ∼ 0 per cent and a > 20 km. This is likely due to the large thickness of the layer (200 km) preventing diffusion out of it and, therefore, energy levels in the heterogeneous layer remaining high at all times, regardless of the magnitude of the scattering parameters. We also tested model type I on ASAR data, since coda levels for this array are higher. These results are shown on Fig. S8. Despite the higher coda amplitudes, model type I fails to fit our data for this array, with the maximum loglikelihood reached being on the order of −10 000. ASAR coda amplitudes are similar to WRA, indicating similar behaviour. Therefore, this model was not tested for WRA.
Model type II (two layer) inversions for all three arrays showed much better fits for frequency bands D-H (Table 1) than for A-C (example for PSAR in Fig. S9). However, loglikelihood values are still very low (<−4 × 10 5 , Table 5), which indicates poor fits to the data and, therefore, unreliable parameter estimations, even if there is a substantial improvement with respect to model type I. Our EFM results show scattering only becomes the dominant attenuation mechanism above 1.5 Hz for PSAR (Fig. 7). This, together with coda amplitudes shown on panels (j)-(q) in Fig. S9 being barely above the noise level in the time window of interest for the lowest frequency bands, suggests these codas are affected by large-scale heterogeneities and might not be composed only of energy scattered at small-scale structure. Therefore, the EFMD may not be able to fit our coda envelopes for frequencies below this threshold. To test this, we ran our EFMD inversion code for frequency bands D to H (Table 1) alone. By comparing our results for PSAR in Figs S9 and 8, we observe considerable improvement in the fits to the data, also evidenced by much higher loglikelihood values (<−10). Given these new observations, we discard frequency bands A-C (central frequencies below 1.5 Hz, Table 1) in future inversions of the data for all arrays.
Figs 8, 9 and 10 summarize our results for all three arrays and model type II. All Markov chains converged within 10 000, 7000 and 4000 iterations for PSAR, ASAR and WRA, respectively. The scattering structure beneath all three arrays shows different amounts of heterogeneity in the crust and a relatively homogeneous lithospheric mantle. The posterior PDFs for both parameters in the top layer in all cases are roughly Gaussian and narrow (Table 5). Maxima for the correlation length PDFs for PSAR, ASAR and WRA are at 0.6, 0.7 and 1 km, while RMS velocity fluctuations posteriors peak at 2.4, 2.7 and 3.6 per cent respectively. PDFs for layer 2, on the other hand, show no clear maxima and also have similar shapes for all arrays. For PSAR, ε only takes values below ∼3 per cent, while for WRA and ASAR, the PDF extends up to ∼8 and ∼6 per cent, respectively. In all cases, most of the accepted models have ε < 1 per cent. The correlation length PDF, on the other hand, extends throughout the entire parameter space. For PSAR and WRA, large values of a (>5 km) are favoured, while small correlation lengths (<1 km) seem to work better for ASAR. Loglikelihood values are high (>−10) for all arrays, which suggests fits to the data are generally good. The shape of the PDFs for the bottom layer makes our solutions non-unique and highlights a complicated trade-off between the scattering parameters. These results strongly resemble the ones we obtained from our synthetic test of model 3 (Table 2), in which our Bayesian inference algorithm successfully recovered the input parameter values for the strongly heterogeneous layer while pointing out similar trade-offs between the two parameters and non-unique solutions for the more homogeneous layer. These results suggest the lithospheric mantle beneath all three arrays is much more homogeneous than the crust above it, where most of the scattering and attenuation takes place.
These results agree with observations from previous studies. Kennett (2015) studied P-wave reflectivity in the lithosphere and asthenosphere in Australia. Their results point to strong lithospheric heterogeneity being present beneath stations in the Proterozoic NAC and they suggest correlation lengths of at most a few kilometres and ∼2 per cent velocity fluctuations in the crust. For the lithospheric mantle, they propose much larger correlation lengths (10-20 km) and ε < 1 per cent. Kennett & Furumura (2016) and Kennett et al. (2017) also addressed the presence and interaction of multiscale lithospheric heterogeneity in the Australian continent. In their simulations, they combined large scale heterogeneities with stochastic media and fine scale structure. Their results indicate a wide range of heterogeneity spatial scales are present and interact within the lithosphere. Their models contain four different layers for the fine scale structure, two in the crust and two in the lithospheric mantle, and different horizontal (a H ) and vertical (a V ) correlation lengths. Their scattering parameters suggest a mildly heterogeneous asthenospheric mantle (a H = 10 km, a V = 10 km, ε = 0.5 per cent) and an increase in the strength of the heterogeneity in the lithosphereasthenosphere transition zone (a H = 5 km, a V = 1 km, ε = 1 per cent). The crust is generally more heterogeneous in these models, with a H = 2.6 km, a V = 0.4 km for both crustal layers and RMS velocity fluctuations of 0.5 and 1.5 per cent for the upper and lower crust respectively. At resolvable scales, these values are consistent with our results from the EFMD (Table 5).

Limitations and assumptions
A possible source of error in our inversion is the prescribed thickness of the layers in our models. The EFMD is sensitive to changes in the bottom depth of the different layers, especially for the shallowest layer, as this affects the diffusion out of them. For our model type II, we used a priori information on Moho and lithosphereasthenosphere boundary (LAB) depths. As discussed in Section 4, however, there is some uncertainty in reported depths, especially for the LAB. Our models consider the lithosphere to extend down to 200 km depth for all three arrays, but tests of the EFMD with shallower LABs did not produce major changes in our results.
Previous studies have shown that the strongest inhomogeneities within our planet are found in the lithosphere, even if deeper sections can also be heterogeneous (e.g. Shearer & Earle 2004;Shearer 2007;Rost et al. 2015). In this study, we focused on the characterization of small-scale lithospheric heterogeneities beneath ASAR, PSAR and WRA, with our models extending down to 200 km depth in all cases. We interpreted our results under the assumption that the coda energy was generated by lithospheric inhomogeneities, even if we are aware that we cannot rule out energy contributions from deeper, weaker scatterers. It is unlikely that these structures are the dominant source of coda energy throughout the time window used in our analysis and their effect on our results is likely small.
Other limitations of our approach are the assumptions for the determination of the different quality factors in the EFM and the fact that neither the EFM nor the EFMD take into account phase conversions and reflections at interfaces other than the free surface. Eq. 15b from Korn (1990), which we use in this study, is based on the assumption that Q s and Q diff are of the same order of magnitude, even if that is not necessarily always the case. The intrinsic quality factor (Q i ) value used in the EFMD was determined by the EFM, with a limitation to a single scattering layer and a poorly constrained frequency dependence of Q i , since α could not be fully inverted for in the EFM (Section 2.1). Therefore, all layers in our EFMD models have the same Q i and frequency dependence as obtained in the EFM. The heterogeneity anisotropy observed by Kennett & Furumura (2016) and Kennett et al. (2017) could be included in future approaches of Bayesian inversion for heterogeneity structure but given the range of acceptable models we find and the trade-offs inherent in inverting for scattering parameters we have demonstrated, we are unsure if anisotropy in scattering could be well resolved with these kinds of data.

C O N C L U S I O N S
For three Australian seismic arrays, we applied the single layer modified EFM and depth dependent EFMD to a large data set which includes events from a wide range of magnitudes, distances and azimuths. This ensures we are thoroughly sampling the structure of the lithosphere beneath the arrays and reduces azimuthal and lateral bias. Our EFM results highlight similarities and differences in the behaviour of the quality factors (Q i , Q diff , Q s , Q comb ) for the three arrays studied here and, therefore, the attenuation structure beneath them. Generally, intrinsic and diffusion quality factors are lower at all frequencies for ASAR than for the other two arrays, which would indicate that attenuation caused by these two mechanisms would be strongest for this array. However, the scattering and total quality factors take similar values for ASAR and WRA, making their heterogeneity and overall attenuation structure comparable and different to PSAR. These results are consistent with the tectonic histories and settings of the areas the arrays are located on. WRA and ASAR lie on the proterozoic North Australian Craton (NAC), but while WRA is situated near its centre, ASAR is on its southern border, a margin with more complex and recent tectonic history than the interior of the craton, which correlates with the generally lower quality factor values we observe for ASAR. The EFMD confirms some of these similarities and differences. Our results suggest the crust is more heterogeneous than the lithospheric mantle for all arrays, which could be related to the cratonic nature of the lithosphere in these areas. Correlation lengths in the crust vary from ∼0.2 to 1.5 km and RMS velocity fluctuations take values in the 2-4 per cent range. The scattering structure of the lithospheric mantle, on the other hand, is more complex. Solutions for this layer are not unique, with both low (<2 km) and high (>5 km) correlation length values being equally possible. Low velocity fluctuation values are favoured in the inversion results for all arrays, but the posterior PDFs for ASAR and WRA extend up to ∼6 and ∼7 per cent, respectively, and only to ∼3 per cent for PSAR, thus supporting our hypothesis that the similarities and differences in the heterogeneity structure beneath these arrays are caused by their different locations on the cratons and the different tectonic histories of these areas.
These results highlight the suitability of Bayesian inversion approaches for the characterization of lithospheric small-scale structure. Our synthetic tests show that the combination of the EFMD and our Bayesian inference algorithm can effectively recover heterogeneity parameters for 1-and 2-layer models. Our approach provides detailed information about the parameter space and the trade-offs and uncertainties in the determination of the structural parameters. The study of the posterior PDFs also allows us to determine whether a single set of scattering parameters can successfully explain our data or whether solutions are not unique.
Our study shows that energy flux models can be used for seismic arrays or groups of stations (PSAR, WRA) and single seismic stations (like the single available 3-component station at ASAR). The methods rely on teleseismic data, which makes them suitable for regions with limited local and regional seismicity, such as our study areas in northern and western Australia. The strength of the heterogeneity is not constrained, which makes this technique applicable to strong and weak scattering regimes and apt to the study of small-scale heterogeneity on Earth and other planets. Finally, the computational efficiency of the EFMD means it can be combined with Bayesian inference algorithms to explore wide and complex parameter spaces. Overall, our study shows that the combination of the EFM and Bayesian EFMD is an effective tool to quantify heterogeneities in the lithosphere and can contribute to our understanding of heterogeneity distribution in the Earth.

A C K N O W L E D G E M E N T S
This work was supported by the Leeds-York Natural Environment Research Council (NERC) Doctoral Training Partnership (DTP) NE/L002574/1, in CASE partnership with the Atomic Weapons Establishment (AWE) Blacknest. Andy Nowacki was supported by NERC (standard grant REMIS: Reliable Earthquake Magnitudes for Induced Seismicity: NE/R001.154/1).
The facilities of IRIS Data Services, and specifically the IRIS Data Management Center, were used for access to waveforms, related metadata, and/or derived products used in this study. IRIS Data Services are funded through the Seismological Facilities for the Advancement of Geoscience and EarthScope (SAGE) Proposal of the National Science Foundation under Cooperative Agreement EAR-1261681.
Geoscience Australia operates the Australian National Seismic Network within Australia, its Territories and across its local region. The network supports the Joint Australian Tsunami Warning Centre, operated by Geoscience Australia and the Australian Bureau of Meteorology, which is responsible for issuing tsunami warning bulletins for Australia and its Territories. The network also contributes to monitoring earthquakes in the Australian region.
The authors would like to thank Vernon Cormier and two anonymous reviewers for their constructive and helpful comments which contributed to improving the original manuscript. Discussions with Corinna Roy and Andrew Walker about Bayesian inference and optimization of Python code are gratefully acknowledged.

DATA AVA I L A B I L I T Y
All data for this study are freely available from the IRIS DMS under the network code AU. The code used to download and analyse these data is publicly available and can be found under the DOI: https://doi.org/10.5281/zenodo.5119164

S U P P O RT I N G I N F O R M AT I O N
Supplementary data are available at GJ I online. Table S1. Summary of the two independent and equally likely families of parameters extracted from Fig. S3 for our synthetic model 5 from Table 2. Figure S1. Results from our synthetic test of model 3 from Table 2, in which a strongly scattering layer lies above a weakly scattering one. Panels (a)-(c) show the loglikelihood for each accepted model in the chain, while (d)-(i) contain the posterior PDFs of the structural parameters and the joint PDF. Dotted blue lines in these plots represent the input parameter values and the shaded area indicates the 5-95 percentile range (PR). Panels (j)-(q) on the right contain 2-D histograms of the synthetic envelopes for all accepted models, with colour bars indicating the number of models that produced a data sample within each bin of the grid. Vertical scale is the same in all plots. The shaded area in these panels points to the extent of the time window used for the fitting. Figure S2. Results from our synthetic test of model 4. In this case, the top layer contains weak heterogeneities, while the bottom layer is highly heterogeneous. Figure S3. Joint PDFs for all parameters and layers in synthetic model 5 from Table 2. Plots in the diagonal of the figure contain the individual PDF for the different scattering parameters. Figure S4. Linear fit of the logarithm of the squared normalized coda envelopes (A) for all arrays, as described in Section 2.1. The shaded area represents the maximum time window used for the fits. Lighter solid lines represent our data envelopes. Darker, dashed lines show the linear fits whose equations are shown in the legend. Figure S5. Coda decay coefficient (a 1 ) versus frequency for all arrays. Solid lines represent the regression curves defined by eq. (18) from Korn (1990). The legend contains our obtained values of the intrinsic and diffusion quality factors at 1 Hz, as well as the indicative estimation of the thickness of the scattering layer. Figure S6. Scattering quality factor, Q s , versus the theoretical curve derived by Fang & Müller (1996). The legend contains our estimation of the correlation length and RMS velocity fluctuations for a single scattering layer. Figure S7. EFMD results for PSAR and model type I. Figure S8. EFMD results for ASAR and model type I. Figure S9. EFMD results for PSAR and model type II using all eight frequency bands listed on Table 1.
Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper.