H IGH-FREQUENCY GLOBAL WAVEFIELDS FOR LOCAL 3 D STRUCTURES BY WAVEFIELD INJECTION AND EXTRAPOLATION

Earth structure is multiscale, and seismology remains the primary means of deciphering signatures from small structures over large distances. To enable this at the highest resolution, we present a flexible injection and extrapolation type hybrid framework that couples wavefields from a precomputed global database of accurate Green’s functions with a local three dimensional (3-D) method of choice (e.g. a spectral element of a finite difference solver). The interface allows to embed a full 3-D domain in a spherically symmetric Earth model, tackling large-scale wave propagation with focus on localized heterogeneous complex structures. Thanks to reasonable computational costs (10k CPU hours) and storage requirements (a few TB for 1 Hz waveforms) of databases of global Green’s functions, the method provides coupling of 3-D wavefields that can reach the highest observable body-wave frequencies in the 1-4 Hz range. The framework is highly flexible and adaptable; alterations in source properties (radiation patterns, source-time function), in the source-receiver geometry, and in local domain dimensions and location can be introduced without re-running the global simulation. The once-and-for-all database approach reduces the overall computational cost by a factor of 5,000-100,000 relative to a full 3-D run, provided that the local domain is of the order of tens of wavelengths in size. In this paper we present the details of the method and its implementation, show benchmarks with a 3-D spectral-element solver, discuss its setup-dependent performance, and explore possible wave-propagation applications.


INTRODUCTION
The past few decades have seen a significant improvement in our capacity to image the Earth's interior. Rapid hardware and software developments, along with increasing and more accessible high-quality broadband data, have driven progress in our attempts to infer the Earth's structure and dynamics [5,20]. However, even though we understand wave propagation and its numerical solution, it remains computationally prohibitive to generate high frequency synthetic waves in a realistic, multiscale, heterogeneous Earth -a challenge especially relevant when considering global scale problems.
Theoretically, with full three-dimensional (3-D) numerical solvers at hand, such as finite difference [29,43], discontinuous Galerkin [19] and spectral element [14,32,33] methods, we can tackle media of various geometries and desired complexities. The computational cost of those methods, however, scales with frequency to the 4 th power (three dimensions in space, one in time), decidedly restricting their applicability to global scale problems for the observable frequency band: at present, frequencies around 1 Hz can be considered the practical limit for a few full 3-D global simulations on large clusters [35,60]. Since iterative inverse problems or systematic studies of synthetic waveforms require large numbers of simulations, in practice such full 3-D methods remain unfeasible at high frequencies especially at the global scale where waves travel over thousands of wavelengths in distance. Given the dependence of computational cost on the frequency, this will remain a bottleneck for some time even on the largest future infrastructures.
As a result, to compute high-frequency body waves we often rely on methods that include the full physics of wave propagation but reduce media complexity by assuming one-dimensional (1-D) spherically symmetric models, such as DSM [31], GEMINI [25], Yspec [2], or AxiSEM [48]. Such structural simplification facilitates wave propagation for realistic Earth models and provides a compromise between broad frequency ranges and computational costs, allowing for direct comparisons of synthetics with observed waveform data [48]. Leng et al. [36], however, show in a synthetic study that 3-D scatterers of a couple of wavelengths in size have a recognisable impact on high-frequency global wave propagation, an effect that is currently largely ignored. Along with insufficient data coverage, such structural modelling assumptions resulted in global tomographic models that are in reasonable agreement for large-scale structures [4,5], but diverge at smaller length scales. Improving seismic Earth models and our understanding of waveforms must therefore incorporate numerical methods that not only capture the relevant physics and provide a high-frequency resolution, but also account in a computationally feasible manner for the complexity that affects global wave propagation in 3-D.
More recently, AxiSEM3D [35] has drastically improved our capability to generate high-frequency synthetic wavefields on a global scale. Leng et al. [35] proposed to exploit the smoothness of global wavefields to reduce computational costs and characterized the azimuthal dependence of 3-D wavefields in terms of Fourier series. Later, van Driel et al. [64] have adopted the smoothness idea to generate wavefield-adapted three-dimensional meshes. The efficiency of both methods is therefore model dependent, and increasing model complexity increases numerical requirements. Even though the methods can be applied to highly heterogeneous structures and provide a significant speed-up relative to full 3-D solvers and meshes, they nonetheless remain prohibitive for routine use at resolutions above 1 Hz.
Hybrid methods that couple two solvers can bridge that gap between high structural complexity and computational cost. Such techniques represent the bulk 'background' medium as a sparse or smooth structure that accounts for source and path effects, and honour complexities in the model to resolve local scattering in areas of interest. The wavefields are coupled on the boundary between the subdomains, effectively embedding smaller-scale structures in the larger model. Each method provides accurate solutions in its domain of validity, while the reduced complexity (e.g. 1-D model) or approximate physics (e.g. ray theory or the frequency-wavenumber (FK) method) in the background medium decrease the overall computational cost of the problem.
A large proportion of coupling methods proposed to date in seismic wave propagation, often referred to as wavefield injection or domain reduction methods, stem from the work by Alterman and Karal [3] addressing the source singularity problem in finite difference schemes. Such methods are effectively a two-step procedure with no dynamic exchange of information between the solvers. They have been evolving over the years as numerical solvers themselves evolved, striving to efficiently obtain ever better wavefield estimates for both the incident background field and the local scattered field. The vast majority offers to couple the local domain directly under the receiver arrays on the surface [e.g., 8,9,15,41,42,44,49,50,54,55,59,67,74], and only a few of the injection methods are suited for full 3-D global rather than regional applications. Chevrot et al. [15] presented the first implementation of injection of teleseismic body waves into a 3-D spectral-element solver, though the local 3-D domain was embedded in a homogeneous background medium and only incoming planar wave fronts were considered. Later, Monteiller et al. [44] coupled 1-D DSM with Specfem3D Cartesian, Tong et al. [59] coupled 1-D frequency-wavenumber (FK) wavefields with Specfem3D Cartesian, Masson and Romanowicz [39,40] coupled Specfem3D Globe with regSEM [18], and Beller et al. [6,7] coupled 1-D AxiSEM with Specfem3D Cartesian. To the best of the authors' knowledge, only three global-scale 3-D methods go beyond wavefield injection and include domains away from the receivers -around the source [40,70], or at depth [38,39,40] -that extrapolate local wavefields to stations at distance. Masson and Romanowicz [39,40], however, focus on fully 3-D structures in both the background and the local domains, and the high computational requirements of the global background 3-D model restrict the applicability of their method for the uppermost frequencies observed teleseismically. Lin et al. [38], on the other hand, propose a 2-D conceptual study of the full workflow for local spectral-element methods, with a possible extension to 3-D. Other two-way coupling methods have also been proposed to account for higher order long range scattering between the two domains [11,12,13,26,65,68], but dynamic boundary conditions on the interface between the subdomains require a significant computational effort and remain expensive for large numbers of high-frequency simulations.
Offering advantages not provided by either of the methods on their own, hybrid methods have recently been used to study the effects of Moho and surface topography on teleseismic wavefields [44], to carry out parametric studies of inversion methods [7,73], and to locally improve the resolution of tomographic models [16,39]. Although a coupling of two solvers does not provide the ultimate solution to high-frequency simulations on global scale, it is well-suited for a wide range of applications and can bring us closer to understanding the multi-scale Earth structure and the imprints of small heterogeneities -around sources, receivers, or at depth -on the global wavefield.
The methodology put forward in this paper fills the gap for a global scale hybrid coupling that includes the full physics, targets the highest observed body-wave frequencies and caters for both wavefield injection and extrapolation. It provides an unprecedented level of flexibility, with a framework built around 1-D spherically symmetric Earth models that allows for coupling with an arbitrary local 3-D solver of choice. The framework employs Instaseis [63, www.instaseis.net], a tool that allows a near-instantaneous extraction of global Green's functions from pre-computed AxiSEM [48, www.axisem.info] databases. Such databases act as a once-and-for-all solution to wave propagation in spherically symmetric background models, leaving the flexibility with respect to parameter alterations, such as modifications in source properties (radiation patterns, source-time function), in the source-receiver geometry, and in local domain dimensions and location (including regions around the source or receivers, and at depth) to subsequent hybrid applications without the need of re-running the global simulation. Thanks to reasonable computational costs (about 10,000 CPU hours for a re-usable database) and storage requirements (a few TB for global 1 Hz waveforms) of the Instaseis databases, the hybrid method can reach the highest observable body-wave frequencies in global seismology and opens numerical simulations to new parameter regimes.
The paper is structured as follows. In Section 2 we first review the principles of wavefield injection and of wavefield extrapolation (via the Helmholz-Kirchhoff integral in the representation theorem) which form the basis of the one-way methods and connect the solution in two subdomains. We then look in Section 3 at the specificities of the implementation of the coupling with global AxiSEM wavefields and Instaseis databases. In Section 4 we validate the framework with two very different local solvers and discuss the key parameters defining the efficiency and performance of hybrid methods. We also compare the hybrid runs to full global runs, with the coupled framework performing up to 100,000 times faster than 3-D global solvers. In Section we explore a range of possible applications, and in Section 6 we discuss the method with its limitations, further developments and extensions.

THEORY
Hybrid methods decompose the computational domain into subdomains -often referred to as the background and local domains -in order to exploit advantages of individual solvers and increase algorithmic efficiency. Wavefield injection caters for cases when the source is in the background domain, and the wavefield needs to be imposed in the local domain that is embedded therein. The Helmholz-Kirchhoff integrals, on the other hand, allow to ex-  (1), interactions at depth are accounted for locally (2), and finally repropagated to the receivers at distance (3).
trapolate the wavefield from the local domain around the source region to receivers at distance in the background medium. Together the two provide a framework for three distinct set-ups: with the local domain around the receiver (injection, Fig. 1a), around the source (extrapolation, Fig. 1b), and at depth (both injection and extrapolation, Fig. 1c). Figure 2: Definition of domains involved in wavefield injection and extrapolation. Background volume V = V b ∪ V s is subdivided by a fictitious interface ∂V s , where subdomain V s defines the truncated volume represented in a local solver. Modifying the structure in V s in the local solver, hybrid coupling accounts for local interactions and the wavefield u(x, t) in V s is a linear combination of the background wavefield u b (x, t) and the scattered wavefield u s (x, t). Note that the structure of the coupling interface ∂V s itself must remain unchanged relative to its representation in the background volume V. With V s the same as its representation in the background volume V there is no residual scattered wavefield relative to the background model and u s (x, t) = 0, i.e. hybrid coupling reproduces the original wavefield u b (x, t). The volume V s is represented in a local solver that includes the seismic source, and the local wavefields are recorded on ∂V s . The fictitious boundary ∂V s is used to repropagate the wavefield u(x, t) from the local solver to receivers at distance.

Wavefield injection
Wavefield injection introduces an incident wavefield generated by a source in a background medium into a local domain embedded in that medium. The source and the majority of the path effects are accounted for in the background domain and are carried over to a regional solver via the local boundary: either via imposing appropriate boundary conditions that drive the local simulation [54] or by introducing a distribution of body forces on the local boundary [41]. The method therefore consists of two steps. First, the interactions in the background domain are simulated and recorded on a fictitious boundary of an area of interest. Next, that area is represented in a local solver, with the boundary introducing the background wavefield.
Let V be a linearly elastic background model subdivided by a fictitious interface ∂V s such that V s is the region of the local truncated volume embedded in V, and V b the part of the domain including the seismic source, i.e. V = V b ∪ V s (Fig. 2a). The domain V s , along with its bounding interface ∂V s , has two representations, one in each solver: -In the global solver, V = V b ∪ V s is represented, and ∂V s is a fictitious, non-physical boundary that surrounds the region of interest V s within the global domain V. ∂V s serves only as a coupling interface.
-In the local solver only V s is represented, while ∂V s is an artificial, non-physical boundary that serves as a coupling interface and requires absorbing boundary conditions.
In the first step, the background wavefield u b (x, t) is recorded on ∂V s within the global domain V. In the second step, the volume V s is represented in a different solver and the wavefield u b (x, t) is imposed on the boundary ∂V s . If the model in the local domain remains identical to the representation of V s in the background model, there is no residual scattered wavefield relative to the background model and the coupling should reproduce the original background wavefields in V s , i.e. u s (x, t) = 0 and u(x, t) = u b (x, t). When the model in V s is modified to account for complex local structures, the total wavefield u(x, t) in the local solver becomes u(x, t) = u b (x, t) + u s (x, t), where u s (x, t) 0 represents local scattering. In what follows, we discuss two ways of imposing the background wavefield on the boundary of the local domain to drive the local simulation.

Boundary condition
The background wavefield u b (x, t) can be imposed via a Dirichlet condition on the boundary of the local domain. Let's first consider the total displacement wavefield u(x, t) that is a solution to the elastodynamic equations of motion (and similarly for σ h (x, t)) where u b (x, t) is the background wavefield computed in the first step, u s (x, t) is the scattered wavefield due to 3-D model heterogeneities in V s , and u(x, t) is the total wavefield that solves equation (1). We then want to find the hybrid field u h (x, t) such that In the new problem, unlike in equation (1), the information about the excitation is given by a Dirichlet condition on ∂V s that imposes a discontinuity in the wavefield. This result could be extended to problems with a non-linear constitutive relation in V b as long as u h (x, t) remains a linear perturbation around u b (x, t) on ∂V s .
Posing this problem as above, we can see that the boundary ∂V s not only generates the complete field u(x, t) in V s , but also the scattered field u s (x, t) propagating out of ∂V s . In practice, however, the second computation happens only in the truncated subdomain V s with absorbing boundary conditions applied on ∂V s to the outgoing field u s (x, t). The background wavefield u b (x, t) can be computed with any numerical or analytic method provided that the wavefield can be interpolated to the grid points of the local solver.

Distribution of point sources
Wavefield injection can also be implemented via a discretisation of the surface integrals in the representation theorem into multiple point sources. Applying the representation theorem to the background wavefield u b (x, t) in V b we have: where n is the normal on ∂V s pointing outwards of V b . Given a global background domain in our methodology ∂V is assumed to be the Earth's stress-free surface, so that integrals over ∂V vanish and only integrals over ∂V s remain in (4). Note that this represents the background wavefield in V b only.
Recall now that we aim to impose the background wavefield on the boundary of the local domain in order to represent (inject) where n = −n is the normal on ∂V s pointing outwards of V s . This result can be rigorously derived by considering a surface-source type of problem of scattering by an arbitrary inhomogeneous object [24]. Approximating the surface integrals by a numerical quadrature S f (x)dS ≈ M m=1 α m f (x m ) and using the Dirac delta function property Expression (6) has the form of a body-force contribution with a discrete distribution of moment tensor and force sources on ∂V s , and it completely represents the background wavefield in an unmodified model in subdomain V s . It is similar to the so called "time reversal mirrors" in acoustics [23,41], although with reconstructing the wavefield forward in time rather than refocussing it backward in time. Modifying the structure in V s and imposing the same moment tensor and force sources on the boundary results in wavefield reconstruction with additional scattering terms superimposed on the background field: u(x, t) = u b (x, t) + u s (x, t). As before, absorbing boundary conditions are necessary on the representation of ∂V s in the local solver to absorb u s (x, t) and avoid non-physical reflections off model boundaries.
The equivalence between the distribution of body forces and introducing a discontinuity on the surface ∂V s in equation (3) is discussed in Chapter 3 of Aki and Richards [1] in the context of representing finite faults.

Extrapolation
Wavefield extrapolation propagates the wavefield from a local domain through an encompassing background domain. The interactions in the vicinity of the source are accounted for in the local solver and carried over to receivers at distance. Like wavefield injection, extrapolation consists of two steps. First, the source region is simulated in the 3-D solver and the fields are recorded on the boundary. Next, that local boundary is fictitiously represented in the background medium where it introduces the local wavefields.
As before, let V be a linearly elastic background model subdivided by a fictitious interface ∂V s such that V s is the region of the local truncated volume embedded in V, and V = V b ∪V s (Fig.  2b). This time, however, the representation of V s in the local solver includes the seismic source. In the first step, the local wavefield is recorded on ∂V s . In the second step, the boundary ∂V s is fictitiously represented in the background medium and allows for extrapolation to receivers. If the model in the local domain remains unchanged relative to the representation of V s in the background model, the coupling should reproduce the original background wavefields in V s , as u s (x, t) = 0 and u(x, t) = u b (x, t). With the structure in the local representation of V s modified to include heterogeneities, the total wavefield u(x, t) in the local solver and at receivers after extrapolation becomes u( We can repropagate the scattered wavefield out of the local domain V s to the receivers in the background domain V b by discretising surface integrals in the representation theorem like in Section 2.1.2, and imposing moment tensor and force sources on the fictitious representation of ∂V s in the background model. The total wavefield at the receivers at distance is the sum of the background and scattered wavefields, We know the background wavefield u b (x, t) in V b and on ∂V s , and hence we wish to represent the scattered wavefield u s ( and the volume integral vanishes. As mentioned above, integrals over the surface reduce to integrals over ∂V s . Repeating the procedure from Section 2.1.2 for the scattered field in V b , with the numerical quadrature and the delta property of the Dirac delta we get for x m on ∂V s . Note that for a domain at the surface around the source region, the total wavefield from the local solver can be considered as the locally scattered wavefield u s (x, t), since all path effects are accounted for in the extrapolation step. For the more general case of a local domain along the propagation path, however, the locally scattered wavefield needs to be separated from the background medium in order to account for the propagation effects of the background wavefield outside of the local region.
Expression (7) has the form of a body-force contribution with a discrete distribution of moment tensor and force sources on ∂V s . In other words, we can regenerate the scattered field u s at any point x ∈ V b if we know the scattered field on ∂V s and the Green's functions between x and ∂V s .

IMPLEMENTATION
The proposed implementation of hybrid coupling embeds any local 3-D domain in a 1-D spherically symmetric Earth model. We treat the global background medium using AxiSEM [48] and Instaseis [63] both to extract the background wavefields necessary for injection in the local domain and to extrapolate the scattered wavefields back to receivers on the Earth's surface. The approach is centred around precomputed databases that provide the general framework of the method and leave flexibility for coupling with an arbitrary local solver of choice.

AxiSEM: 2.5-D global wavefields
AxiSEM is a parallel spectral element axisymmetric solver based on the variational form of the wave equation. Simulations are run in two-dimensional (2-D) domains, while the third azimuthal dimension is tackled analytically to generate 3-D wavefields in 1-D Earth models, as shown in Fig. 3. Such an axisymmetric setting reduces the computational cost by up to 3-4 orders of magnitude relative to full 3-D modelling [48]. With AxiSEM simulations scaling with frequency to the third power, it allows for global wave propagation at frequencies above 1 Hz with a reasonable computational effort. Should only source-receiver pairs with distances under 180 degrees be of interest, the numerical cost can be further limited by truncating the 2-D slice to a given distance (which adds an absorbing boundary to the system). For example, limiting the domain to 90 degrees halves the computational requirements relative to a full Earth simulation.
Apart from a compromise on model symmetry, the solver does not make any limiting assumptions regarding short-period wave propagation physics (the lack of rotation and gravity is not relevant for periods below 100 s) or on the source radiation pattern. Transverse anisotropy -the most complex type of anisotropy for a spherically symmetric model [61] -and attenuation [62] can also be accounted for.

Instaseis: a database approach
The AxiSEM-generated Instaseis databases of 2-D Green's functions act as a once-and-for-all solution to wave propagation in spherically symmetric models and store the basis coefficients of Lagrange polynomials, typically of degree 4. The stored databases give flexibility with respect to parameter alterations, such as modifications in source properties (radiation patterns, source-time function), in the source-receiver geometry, and in local domain dimensions and location (including regions around the source or receivers, and at depth). Green's functions for a given Earth model therefore allow for a wide range of parameter changes in subsequent hybrid applications without the need of re-running the global simulation. We distinguish two types of Instaseis databases: 1. The forward database is generated by a moment tensor point source at a fixed depth. For such Green's functions receivers exist throughout the medium, whereas the source can be rotated from the north pole to any longitude and latitude. However, it remains at the depth prescribed at the time of the simulation.
2. The reciprocal database is generated by a single force point source at a fixed depth and recorded throughout the medium. Due to the reciprocity of Green's functions, the locations of the single force source can be treated as three component receivers. Therefore, in such databases the sources exist throughout the medium, whereas receivers can be rotated to any longitude and latitude, but must remain at the depth prescribed at the time of the simulation. Typically, reciprocal databases are generated by a single force on the Earth's surface which mimics real-world receivers.
Instaseis -the backbone of our method -is a tool that efficiently accesses, processes, and extracts full 3-D wavefields from such pre-computed AxiSEM databases. Given the 4 th order spatial accuracy of the Green's functions, Instaseis can interpolate the wavefields to a mesh-independent location, while higher-order Lanczos resampling allows to retrieve seismograms at any sampling rate. Changing the source radiation pattern or the sourcetime function, or defining a finite fault does not require a new database generation, as Instaseis convolves chosen source parameters into the Green's functions. In the following we discuss how the two classes of Instaseis databases serve as a basis for the global hybrid framework.

Hybrid: a flexible framework
The Instaseis-based approach to global hybrid modelling exploits the reciprocity of Green's functions to accommodate both injection and extrapolation of wavefields. Each of the database types imposes different geometrical source-receiver constraints, but nonetheless both represent the same response of the same medium (for a given spherically symmetric model and a given frequency). A forward database allows to place receivers anywhere in the domain and requires a fixed source depth, and therefore lends itself to wavefield injection that necessitates the response of the background medium on the boundary of the local domain. A reciprocal database, on the other hand, allows to place sources anywhere in the domain and fixes receivers at the surface, and is appropriate for wavefield extrapolation.
In the most general case of a complex subdomain at depth, we identify the proposed global hybrid method as a three-step approach (with the other cases reducing to two steps, steps 2 & 3 and steps 1 & 2 for local domains around the source and receiver, respectively):

1: [Instaseis injection]
The forward database is used to extract the background wavefield on an artificial boundary which is defined by the mesh of the local method chosen for the coupling. The incoming wavefield is then imposed on those boundary points of the 3-D solver and drives the local wave propagation.

2:
[Local solver] The total wavefield from the simulation is recorded on the bounding box. The locally scattered wavefield is equal to the difference between the total local wavefield and the background wavefield (u s (x, t) = u(x, t) − u b ).

3: [Instaseis extrapolation]
The locally scattered field is convolved with the reciprocal database of Green's functions and repropagated back to the receivers on the surface. It is then superimposed on the pre-computed background wavefield to render the total global wavefield that incorporates global propagation effects as well as local 3-D scattering.
The availability of both forward and reciprocal databases of Green's functions for a global model renders the method particularly attractive. The spherically symmetric set-up caters for a broad range of possible geometries and sources with only two database simulations -the same databases can be queried for any arbitrary source-receiver configurations. The heavy computational burden of simulating the background field is therefore transposed to storage requirements and to the embarrassingly parallel and thus massively scalable procedure of accessing a relevant database to extract and process wavefields.
The modified hybrid Instaseis interface provides the processing infrastructure, with rotations between local and global coordinate systems, interpolations of wavefields to points defined by the mesh or quadrature of the local method used for the coupling, temporal resampling of the local and global wavefields, as well as changes in the source mechanism. Communication between Instaseis and the local solver happens via HDF5 files.

Local solvers: SPECFEM3D Cartesian and Wave Propagation Program
Currently the framework has been coupled with two local 3-D codes representative of the typical and most widely used methods using spectral elements and finite differences: Specfem3D Cartesian [32], a spectral-element (SEM) solver, and Wave Propagation Program (WPP) [46,51], a second order finite-difference (FD) solver. The Specfem3D implementation includes all three coupling cases, with the choice of the local domain on the source side, on the receiver side, and at depth. The WPP implementation, a strictly cartesian code, has been developed for a specific source-side application [52, 53, in preparation] and has not been extended to the receiver and depth cases.

Specfem3D Cartesian
Specfem3D Cartesian is a software that is mostly used for seismic wave propagation at local and regional scales, although it can be applied to other acoustic and elastic wave propagation problems. It is well-suited for handling various geometries and allows to account for 3-D complexities affecting the seismic wavefield, such as lateral variations of elastic parameters and density, anelasticity, full 21-parameter anisotropy, and topography of internal discontinuities and of the free surface. It also includes coupled fluid-solid domains.
We inject the global wavefield into a Specfem3D simulation via imposing the Dirichlet boundary condition (see Section 2.1.1) on all points of the boundary of the SEM mesh. In order to avoid spurious reflections, we then apply the paraxial boundary condition [57] to absorb the residual outgoing scattered wavefield. Applying the boundary condition to the outgoing scattered wavefield requires the knowledge of background tractions on the boundary of the SEM mesh -the known background tractions are subtracted from the locally computed tractions, while the residual (i.e. the difference not known a priori) is absorbed. Therefore, we query a forward Instaseis database for velocities and tractions on the GLL points on the mesh boundary. The values are stored in HDF5 files and later accessed by Specfem3D during the simulation -a modification added to the code for coupling purposes. Note that absorbing only the lower energy residual scattered wavefield renders the boundary condition more effective.
For extrapolation we save displacements and stresses from the local simulation into interfacing HDF5 files. Given the mesh construction, there is no need to define a new quadrature to discretise the integral in the representation theorem and we directly use the weights of the GLL points in the extrapolation equation 7. The integration points are therefore well-defined for a high order quadrature which renders the extrapolation step of the coupling efficient and easy to implement. Moreover, the native mesher of the code respects the curvature of the Earth's surface as well as of the 1-D interfaces.

Wave Propagation Program
The Wave Propagation Program (WPP) is a second order (in time and space) open source FD solver developed at Lawrence Livermore National Laboratory for local and regional seismic wave propagation. WPP solves the elastodynamic equations of motion for elastic or anelastic materials with fully 3-D heterogeneous material model specification, including anisotropy and anelastic attenuation with P-and S-wave quality factors. Surface topography can be represented through a boundary conforming curvilinear mesh created automatically with a built-in mesh generator from user-supplied files. Supergrid absorbing boundary conditions are implemented to dampen the outgoing wavefields, and depth-dependent mesh refinement can account for the increase in seismic wavespeeds.
For the extrapolation step necessary for the source-side coupling, we save displacements and strains from the local simulation into interfacing HDF5 files. Given the Cartesian mesh construction, we need to define a quadrature to discretise the integral in the representation theorem. With a dense regular FD grid, we choose to use the mid-point rule with the areas between grid points as weights in equation 7. In the current implementation mesh construction does not respect the curvature of the Earth's surface or of the 1-D interfaces.

VALIDATION
To benchmark the hybrid scheme we first consider the Specfem3D Cartesian implementation and reconstruct the background wavefield for a 1-D model from the original Instaseis database. We then introduce a smooth Gaussian perturbation in the local Specfem region and benchmark hybrid seismograms against a 3-D reference solution computed by Specfem3D Globe. Finally, we turn to the WPP-Instaseis source-side implementation at high frequencies and reconstruct 1-D wavefields. All 1-D simulations that reconstruct the background wavefield were validated for multiple receiver locations and components. The 3-D validation, however, was constrained to receiver locations at 30 degrees due to computational costs of global Specfem3D Globe simulations. We chose 30 degrees as a compromise between the simulation time of Specfem3D Globe and our aim to validate waveforms at teleseismic distances.
The accuracy of the benchmarks, and more generally of the hybrid method, depends on the meshing of the local solver relative to the 'dominant period' of the database. The parameters of the local simulation need to be considered carefully both relative to the requirements of the local solver itself, as well as relative to the discretisation of the surface integrals of the representation theorem for extrapolation and for injection via a distribution of point sources. Each use case needs to be considered separately, and it is recommended to test every problem-specific coupling set-up with 1-D benchmarks before introducing 3-D structures.
The dominant frequency (inverse of the dominant period) in this paper is understood as the frequency corresponding to the halfmaximum of the spectral amplitude of the source-time function, ensuring that the bulk of the energy up to the dominant frequency is contained in the simulations. In a Specfem3D Globe nomenclature, on the other hand, frequency corresponding to the maximal frequency with non-zero spectral amplitude is considered as the 'dominant frequency'. The 10 s dominant period runs discussed below, for example, would be referred to as '5 s' in Specfem3D Globe.

1-D benchmarks: Specfem-Instaseis wavefield reconstruction
The hybrid methodology should reproduce the original background wavefields if the local domain remains unmodified relative to the background model. With the local domain at the surface, either around the source or under the receivers, the wavefields can be reconstructed at teleseismic distances. However, for the local domain at depth along the propagation path, only the "residual" scattered wavefield (u s (x, t)) is extrapolated to receivers at distance, while unchanged 1-D models only regenerate the 1-D interactions. Hence, the 1-D benchmark for the local domain at depth can only prove a correct reconstruction of the 1-D wavefield inside of a buried domain without a free surface. It tests a correct implementation of wavefield injection at depth, as extrapolation for the depth case cannot be validated with a 1-D model.

3-D benchmarks: Specfem-Instaseis local scattering
To benchmark the hybrid framework for 3-D structures in the local domain, we introduce a 40% Gaussian perturbation of seismic velocities v p , v s , and of density ρ over PREM into the Specfem3D Cartesian domains at depth and on the surface, around the source and under the receivers. The extreme perturbation, rarely observed in bulk-Earth applications, has been chosen to validate the accuracy of our method as an upper bound. As before, a full moment tensor source and a Gaussian sourcetime function of a 5 second half-width are used, resulting in wavefields with a dominant period of 10 s. The Gaussian perturbation is defined with a standard deviation equal to two S wave wavelengths. For the domain at depth, the perturbation is located in the middle of the domain in x, y and z, and the domain itself is centred around the turning point of the P and S wave for the presented source-receiver configuration of 30 degrees epicentral distance (Fig. 5a). For the domains at the free surface -around the source and under the receivers -the centre of the Gaussian is located in the middle of the domain in x and y), and at a depth equal to two standard deviations, resulting in a strong perturbation on the surface (Fig. 5b) Table 2 with a summary of computational costs. therefore been simulated for 23 minutes to obtain a complete seismogram at 30 degrees, and Fig. 6a presents the resulting benchmark of the full synthetic signal. Given the high computational cost of Specfem3D Globe simulations (271,200 CPUh for 23 minute seismograms at 10 s dominant period) and the completeness of the depth benchmark for injection and extrapolation, we choose to validate the source and receiver cases without surface waves and compute only 15 minutes of Specfem3D Globe seismograms. Moreover, since a correct generation of realistic surface waves requires a full 3-D representation of the crust between the source and the receiver, our hybrid methodology that approximates the Earth as a spherically symmetric layered medium with a flat free surface is not well suited for surface-wave applications. Since the hybrid benchmark at depth is complete and fully sufficient, and given our focus on highfrequency body waves, a validation of 1-D surface waves for all three coupling cases is of little interest.
The model perturbation has a comparable effect in both solvers, with differences arising due to numerical reasons as well as limitations of the hybrid method. The numerical issues, such as distinct meshing schemes, source and receiver mislocations, and accumulative round-off errors, are a consequence of solver differences. According to Leng et al. [35], the misfits between Instaseis databases (that correspond to AxiSEM3D solution to 1-D models) and Specfem3D Globe simulations can generally be expected to remain below 2% for 1-D models, although envelope misfits can sometimes be very large for unidentified reasons. For 3-D tomographic models, the misfits between AxiSEM3D and Specfem3D Globe are more significant and vary between receiver locations. Thus, with a locally perturbed 1-D model we can expect a mismatch between the hybrid and full 3-D seismograms, as meshing differences alone can result in different representations of the perturbation. Moreover, in the local method of the hybrid scheme, it is necessary to use absorbing boundary conditions which are known to be imperfect and can result in often significant spurious reflections. On top of such numerical issues, the limitations of a one-way hybrid scheme are expected to contribute to the visible waveform discrepancies. Higher order long range scattering is not accounted for in the coupling, while the Specfem3D Globe waveform represents all interactions between the perturbation, the free surface, and the internal 1-D model discontinuities.
The benchmark for the perturbation at depth along the propagation path shows the best fit, with PM of 10% and less and EM of 20% or less -values considered to represent an 'excellent fit' by Kristeková et al. [34]. With the local domain of 16 wavelengths in size (1020 km in x, y and z) and centred around the turning point of P and S waves, most contributing long range scattering is accounted for due to the size of the domain, and the resulting misfits can be mostly attributed to numerical discrepancies discussed above. For the source and receiver cases, however, the local domain is significantly smaller (480 km in x, y and z), as the decrease in wave velocities results in shorter wavelengths close to the Earth's surface. The local domain, therefore, does not encompass all contributing long range scattering and results in larger differences and breaks the one-way assumption of the method. Filtering all components at 20 seconds reduces the misfits for all three cases, though the visible amplitude differences for the source and receiver coupling settings remain. The extreme 40% perturbation -validating the accuracy of our method as an upper bound -can be benchmarked with a 'good fit' (as defined by Kristeková et al. [34]) for all three coupling set-ups, and we expect the higher order long range scattering from more realistic structures to remain small at teleseismic distances relative to primary interactions of interest. Taking all contributing factors into consideration, we deem the results acceptable with the PM and EM under 20% for the complete coupling case that incorporates a perturbation at depth, with the larger misfits for the coupling set-ups under the free surface attributed to the limitations of the method.

High-frequency WPP-Instaseis wavefield reconstruction
The current implementation of the WPP-Instaseis source-side coupling does not respect the curvature of the Earth's surface or of internal discontinuities. A regular Cartesian WPP mesh is used, which for large domains results in a significant difference in altitude between the spherical Earth and the flat Cartesian surface. The method, however, requires that the coupling surface ∂V i remains unchanged relative to the background model. For large WPP Cartesian domains, this condition is not met and numerical errors arise. We have found that the magnitude of resulting errors is also dependent on the size of the altitude gap relative to the wavelength.
For small domains on the order of tens of kilometres, the errors remain small. Given the cost of global Specfem 3-D simulations, however, our 3-D benchmarks are limited to a dominant period of 10 seconds. The local domain needs to be large enough to encompass a smooth Gaussian perturbation of two wavelengths, which for a 10-second simulation means 500 km in x, y, and z.
In a domain of this size the model mismatch is large, as both the free surface and five discontinuities that are within the top 500 km of a 1-D model are misrepresented on a Cartesian mesh relative to a spherical background model.
We therefore choose to benchmark the WPP-Instaseis wavefield extrapolation for a short period 1-D reconstruction at 2 seconds with a local domain of size 30 × 30 × 30 km. According to the WPP manual [51], the number of points per wavelength (PPW), the largest significant frequency of the source time function ( f max ), the minimum wavelength (v min ) and the FD grid spacing h min are related according to With a minimum wavespeed v s of 3360 m/s, 16 PPW recommended for best accuracy, and the required f max of 1 Hz (1 s, for a 2 s dominant period), the WPP simulation should be sampled with at least 210 m FD grid spacing in x, y and z.
For the sake of variety, we choose an isotropic moment tensor source (explosion) and IASP91 as the 1-D background model, and we include attenuation to show the validity of the scheme for attenuating databases. As for the Specfem-Instaseis coupling in Section 4.1, the PM and EM are below 1% (Fig. 7)    FWD stands for forward and BWD for reciprocal database, and database type also indicates the dominant period of the run. Maximum source depth, components, seismogram length and the range of epicentral distances define the final database size. All databases were generated for 3 components and all distances. Exact CPU hours of the 2 s reciprocal database are unknown, as it has been generated for other purposes prior to this study. Injection CPUh is a one-off cost for a single application that requires multiple realisations of the local domain. Extrapolation CPUh is a cost of a single three component seismogram. Run ID corresponds to IDs in Figs 6 and 7. The database cost is a one-off cost, as a single database is available for a multitude of hybrid applications. Excluding costs of database generation, a full standard hybrid run is 5,000-100,000 times cheaper than a full 3-D run by Specfem3D Globe, and exact costs depend on the local method, on the chosen parameters, and on the location of the local domain. Benchmark database indicates whether a forward (fwd) or a reciprocal (bwd) database was used, and specifies the dominant period, and columns 3-8 summarise injection and extrapolation parameters that affect the costs. Length of inputs/outputs defines the time length of wavefields in interfacing HDF5 files with spacing between samples equal to the time step (dt) of a given local run.

Database generation and storage
The bulk of the computational effort of the hybrid methodology lies in database generation, and the cost of generating an AxiSEM database scales with frequency to the 3 rd power [63]. The actual cost of each run depends both on the machine and on the chosen simulation parameters (e.g. attenuation, number of elements per wavelength), and a forward database is approximately twice as expensive as a reciprocal database for the same set of parameters. It should be noted that the cost can be significantly reduced for specialised studies with limited epicentral distances, as the 2-D disk in Fig. 3 can be cropped to encompass only the required latitudes (with absorbing boundaries applied to the additional non-physical boundary).
The advantage of database generation is the once-and-for-all approach. Storing the Green's functions from a single AxiSEM run, one can freely access and re-use the 1-D background wavefields for a variety of hybrid applications -the location and size of the local domain, and more generally of the source-receiver geometry can vary, and the source parameters can be modified. Therefore, the cost of generating a database can be considered a one-off investment for a large spectrum of problems, shifting the focus from regular access to substantial computational resources to the necessary database storage requirements. As discussed by van Driel et al. [63] the storage space required for a reciprocal database depends on frequency content, maximum source depth, components, seismogram length and the range of epicentral distances. A forward database takes approximately twice as much space as a reciprocal database for the same set of parameters. The CPU hours and storage requirements for the databases used in the benchmarks are listed in Table 1.

Hybrid injection and extrapolation
There is no general rule for the costs of injection and extrapolation, as both depend on the the length of the local simulation, as well as on mesh discretisation and time step of the local method: -The number of points on the boundary of the local domain influences both injection and extrapolation. It depends on the size of the local domain, as well as on the points chosen for the coupling. For injection, the points must correspond to the mesh of the local method. For extrapolation, one could define a quadrature independent of the mesh points, provided that the local method can accurately interpolate the local wavefields to any point. In both Specfem and WPP implementations, however, the extrapolation points are chosen to be the local mesh points, and in WPP using only every other point on the boundary proves sufficiently accurate for extrapolation and reduces the cost.
-For injection, background wavefields extracted from the database are required to have the same time length and time step as defined in the local simulation, and both factors influence the cost. Longer time series require higher time to save to HDF5. Moreover, the costs of interpolating wavefields to the right time step vary and depend both on the sampling of the database and on the new sampling required.
-For extrapolation, the time length of the local wavefields influences the computational cost. However, the local wavefields are downsampled for extrapolation to avoid the high costs of interpolation, and hence the time step of the local simulation has less of an effect on the overall cost than in case of injection. Table 2 summarises the injection and extrapolation costs and parameters (number of points, time steps, lengths of wavefields) for benchmark runs in Section 4.2, as well as CPU hours (CPUh) of respective local simulations and one-off database runs. Note that the cost of injection includes all receivers defined in the local domain, while the cost of extrapolation is a per receiver cost for a three-component seismogram. Therefore, the injection cost for a single application that requires multiple realisations of the same local domain is a one-off cost. Wavefields are saved into HDF5 files according to the requirements of the local mesh and remain available for multiple runs of the local solver, further reducing the costs of injection solely to the cumulative cost of required local runs. However, modifications in the local domain for hybrid extrapolation require re-running both the local simulation and the extrapolation to receivers.
Depending on the parameters of the set-up, the computational cost of a full standard hybrid run -including the cost of the local simulation that is method dependent, but excluding database generation -is reduced by a factor of 5,000-100,000 relative to a full 3-D run, provided that the local domain is of the order of tens of wavelengths in size. A Specfem3D Globe simulation for the 3-D benchmark at depth (run ID 1) cost 271,200 CPUh for 23 minutes of wave propagation, while the source and receiver cases (run ID 1 and 2, respectively) required 227,500 CPUh for 15 minutes of wave propagation. The corresponding hybrid coupling costs are on the order of tens of CPUh. The database runs, even though included in the comparison, are a one-off cost available for multiple hybrid applications and settings.

WAVE-PROPAGATION APPLICATIONS
Given its ability to generate high-frequency seismograms at teleseismic distances, the hybrid method allows to study the influence of localised 3-D structures on global propagation of body waves over 1 Hz. Purely synthetic studies in multi-scale media are made possible, providing insights into the long-range effects of complex scattering and a better understanding of uncertainties in existing models. Realistic simulations of structure-induced anomalies, on the other hand, are essential to interpretation and analysis of seismic recordings, and can now be achieved for highest frequencies observed teleseismically.

Source side heterogeneity
With strong topography prevalent both in regions of high seismicity and in nuclear test sites, accounting for topographic scattering in a local 3-D model around the source region allows to capture its teleseismic effects on body waves -a topic further addressed by Pienkowska et al. [52, 53, in preparation] in the context of forensic seismology, where observations are compared with synthetics at 2 Hz. Figure 8 illustrates the comparison on three locations in a USSR Degelen Mountain nuclear test site, demonstrating that topography alone can reproduce some complex waveform features observed teleseismically. Sourceside modelling can also account for heterogeneous small-scale geologic structures, including random media, for realistic finite faults [22] or dynamic ruptures [27,69], and for non-linear hydrodynamic interactions [71]. In all such applications, tradeoffs between various structural and source parameters -and how such interactions translate to teleseismic signals -could be evaluated. Moreover, applying backprojection [30,72] or time-reversal [10] to synthetics could help to better map the time and space history of high-frequency body waves and their source-time evolution, highlighting interactions with specific scatterers in the source region.

Depth
Embedding 3-D solvers at depth and modelling localised smallscale structures along the propagation path could improve our understanding of teleseismic signatures -or lack thereof -of deep mantle structures and provide important constraints on the evolution of the Earth's interior. Global tomographic models are in reasonable agreement for large-scale 3-D structures [4,5], but diverge at smaller length scales as a result of structural modelling assumptions and insufficient data coverage. A multi-scale view of thermal and chemical heterogeneities in the interior of the Earth, however, carries implications for global mantle convection and the Earth's heat budget, including coremantle exchanges and geochemical as well as mineralogical constraints [21,28,58]. Synthetic waveforms at high frequencies could provide insights into seismic properties of structures in the mid-mantle and around the CMB, such as velocities, densities, anisotropy and attenuation, as well as topography [e.g. 17] and sharpness of the lateral (e.g. D", 660 and Moho) and vertical (e.g. LLSVP, ULVZ) interfaces [e.g. 66]. Hybrid methods can also be directly adapted to localised tomographic studies based on full waveforms in order to improve the resolution of global Earth models at depth [39].

Receiver side
Finally, receiver-side hybrid modelling could provide a numerical means of exploring the ever larger high quality datasets from seismic arrays, such as USArray, AlpArray or Hi-net. Initially installed to improve the signal-to-noise ratio of teleseismic body waves from nuclear explosions, seismic arrays -with their regular and closely spaced instrument configurations -have since been used to study the Earth's fine-scale structure. Mirroring such dense datasets with high-frequency hybrid modelling would help not only in the interpretation of seismic recordings, but also in a systematic analysis of various array techniques, such as beamforming and its derivatives, different slant stacking techniques, and frequency-wave number analysis (for a review of array methods, see for example Rost and Thomas [56]). Receiver-side injection methods can also be used for explicit studies of study have also recently been used to study the effects of Moho and surface topography on teleseismic wavefields [44], to carry out parametric studies of inversion methods [6,73], and to locally improve the resolution of tomographic models [7,16,39] 6 DISCUSSIONS Applying the principles of wavefield injection and wavefield extrapolation, we propose a global hybrid framework built around Instaseis databases that seeks a compromise between the availability of computational resources and simulating strong 3-D scattering of teleseismic body-waves at high frequencies. The methodology boasts an unprecedented level of flexibility, and can provide up to a 100,000 fold speed-up relative to full 3-D global solvers. With the databases serving as a once-and-for-all solution to wave propagation in spherically symmetric Earth models, the global wavefields are applicable to a variety of problems, shifting the heavy computational costs to long-term storage requirements. A single database can not only accommodate for different geometries and locations of the local domain within the background model, but also can be used for coupling with multiple local 3-D solvers of choice. Moreover, it allows alterations in source properties, such as locations, source-time functions and radiation patterns, leaving such choices to later hybrid simulations. With the ability to generate and store wavefields in the 1-4 Hz frequency range, and with the flexibility of the subsequent local modelling, the hybrid method can reach the highest frequencies observed teleseismically and opens global wave propagation to a new parameter space. The method has been applied to generate 2 Hz teleseismic seismograms bearing the signature of local 3-D topography of a nuclear explosions test site, successfully reproducing amplitudes and complex waveform features present in observed data [52, 53, in preparation] Another novel method separating background and scattered wavefields has recently been proposed, entirely within the framework of AxiSEM3D [37], for high-frequency simulations with localised strong scatterers at depth. The method is precise for multiple, local interactions between a 3-D domain and a background wavefield, and allows for 3-D global models. It is, however, significantly more expensive for 3-D global simulations, and addresses a different suite of problems. Our method is geared towards higher-frequency applications, and it is complementary in its flexibility, its ease of use by relying on precomputed databases, and its plug-in approach for multiple solvers.
Two-step methods, however, are not suitable for problems where scattering is not localised. Not all wavefield interactions are accounted for with a one-way approach to coupling given the lack of a dynamic exchange of information on the boundary. All scattering effects are present within the local domain both for wavefield injection and extrapolation, and the scattered wavefield can leave the local domain. However, it can never return into the local domain in order to interact with embedded 3-D structures again, as the global and local solvers do not act simultaneously or iteratively, but sequentially. In global seismology the effects of such higher order long range scattering at teleseismic distances are often small relative to primary interactions that the method can help understand. When designing a problem, however, it is of paramount importance to understand this limitation and the potential effect in can have on resulting waveforms.
Future developments of the framework include embedding multiple domains, including solid-fluid interfaces, and using 3-D background media if precomputed databases such as Instaseis for 3D models were available, which may be a future direction for machine-learning aided wave propagation [45]. The code for the framework is open source and is available as a modification of Instaseis at https://github.com/martapien/instaseis_hybrid.