A ccelerating numerical wave-propagation using wavefield adapted meshes , P art I : F orward and adjoint modelling

An order of magnitude speed-up in finite-element modelling of wave propagation can be achieved by adapting the mesh to the anticipated space-dependent complexity and smoothness of the waves. This can be achieved by designing the mesh not only to respect the local wavelengths, but also the propagation direction of the waves depending on the source location, hence by anisotropic adaptive mesh refinement. Discrete gradients with respect to material properties as needed in full waveform inversion can still be computed exactly, but at greatly reduced computational cost. In order to do this, we explicitly distinguish the discretization of the model space from the discretization of the wavefield and derive the necessary expressions to map the discrete gradient into the model space. While the idea is applicable to any wave propagation problem that retains predictable smoothness in the solution, we highlight the idea of this approach with instructive 2D examples of forward as well as inverse elastic wave propagation. Furthermore, we apply the method to 3D global seismic wave simulations and demonstrate how meshes can be constructed that take advantage of high-order mappings from the reference coordinates of the finite elements to physical coordinates. Error level and speed-ups are estimated based on convergence tests with 1D and 3D models.


Introduction
Numerical simulations of wave propagation using finite-element methods are nowadays a standard tool in seismology, but despite the ever growing computational power they still remain computationally challenging, and the full bandwidth observed in the field can hardly be modelled with the biggest supercomputers available [e.g. 1,2]. This has been the motivation for a variety of approaches to speed up the calculations including, for example, advanced code optimization [3], usage of GPU accelerators [4,5], improved time integrators [6,7] as well as physical approximation using 1D models [8] or coarse grained attenuation [9].
AxiSEM [8] is based on the idea that the azimuthal dependency of the wavefield can be solved analytically for models that are symmetric with respect to an axis through the source. In this way, it reduces the numerical problem to two dimensions. Recently, Leng et al. [10,11] generalized this idea to include 3D variations of the elastic parameters as well as crustal thickness variations, topography and ocean loading by replacing the azimuthal analytical solution by a pseudo-spectral numerical approach and report speed-ups of up to several orders of magnitude for the case of global wave propagation in comparison to the 3D spectral-element method [SPECFEM3D_GLOBE,12]. These performance gains are based on the observation that the complexity of the wavefield emanating from a seismic source is high in the radial and vertical direction, but much lower in the azimuthal direction. To take advantage of this property, the pseudo-spectral method used in the azimuthal direction uses a spatially (depth and distance) varying maximum Fourier order.
In this paper, we build on the same observation of the wavefield properties by introducing the concept of anisotropic adaptive mesh refinement (aAMR): we construct finite-element meshes with element sizes adapted to the expected complexity of the wavefield. This way similar speed-ups can be achieved with any finite-element method and the complexity of the numerical scheme in AxiSEM3D is traded for additional complexity in meshing. Furthermore, we demonstrate how gradients with respect to the elastic properties can be computed in this framework despite the fact that the mesh is designed for the forward wavefield only, and although physical wavefields for point sources at locations other than the source location cannot be properly modelled. Finally, we demonstrate the applicability of aAMR for global wave propagation. All numerical simulations were performed with the Salvus suite of simulation software [13].
In Part II [14] we discuss in detail how the full waveform inversion workflow needs to be adapted to accomodate the source specific meshes, perform numerical inversion experiments in 2D and show the first gradients computed in 3D on the global scale.

Motivation
Leng et al. [10,11] readily show that for a variety of current 3D models on the global scale seismic waves retain their azimuthal smoothness with respect to the source location, hence we do not dive into quantitative aspects of applicability, but illustrate the idea with simple 2D examples. To this end, we consider the membrane wave approximation of surface waves [e.g. 15,16]. In contrast to 2D vertical slices, this application justifies the use of a smooth model and does not require a free surface condition. Instead, we apply absorbing boundary conditions on all surfaces of the truncated domain.   ing criterion requiring a fixed number of elements per shortest wavelength (in the elastic case the s-wave length). This mesh is conservative in the sense that it can resolve waves propagating in any direction for a given highest frequency. Looking at the wavefield reveals that both p-and s-wavefronts remain close to circular in the smooth velocity model with perturbations in material parameters of up to ±8.5%.
The time averaged azimuthal spectra of the radial and transverse displacement computed along circles around the source location ( Fig. 2) show the expected behaviour: A peak at order 2 due to the source mechanism and a steep fall-off down to a numerical noise floor about 2-3 orders of magnitude below the signal. The vertical lines indicate the standard meshing criterion (here: 2 elements per s-wave length) and reveal that the wavefield is over-resolved by the mesh in the azimuthal direction.

Anisotropic Adaptive Mesh Refinement (aAMR)
This over-resolution can be avoided by using a modified meshing criterion: instead of resolving waves travelling in any possible direction, the mesh can be designed to resolve only the waves expected for a particular source, i.e. travelling approximately in a certain direction. The required azimuthal resolution can then be found either from the azimuthal spectra or from a convergence test using a series of meshes. Leng et al. [10] introduce an empirical relation for the azimuthal resolution as a function of depth and source distance for global wave propagation. Furthermore, Leng et al. [11] monitor the energy in each azimuthal order as a function of depth and distance to the source to derive the required lateral resolution for a given 3D model and source.

submitted to Geophysical Journal International
Wavefield adapted meshes, Part I 3 source receivers Figure 3: One quadrant of an anisotropically refined mesh adapted to the complexity of the wavefields: elements are elongated in the direction parallel to the wavefronts expected for the source location marked with the star. This mesh is designed for 0.2 Hz, the mesh used for the computation of seismograms at 1 Hz has the same number of elements in lateral direction (48 in the interior, 96 after the refinement) but is refined by a factor 5 in radial direction. The number of elements in the meshes are 18.5 K for the aAMR mesh and 253.7 K for the classical mesh, giving rise to a speed-up of a factor of 13.7.
In the meshing literature, this technique of adapting the mesh to the solution is referred to as adaptive mesh refinement, and this is commonly done in the community when adapting the element size to the local velocity. The novelty in this study is to include information about the expected approximate propagation direction of the waves in the meshing process, and to create elements with large aspect ratios oriented with their long sides parallel to the wavefront resulting in an anisotropically refined mesh. It is important to note that common mesh quality measures such as equiangular skewness or edge aspect ratio lose their meaning for such meshes and the best criterion for judging the mesh quality appears to be the resulting time step according to the Courant-Friedrichs-Lewy criterion [CFL criterion, 17]. Fig. 3 shows how the anisotropic adaptive refinement can be achieved in practice: first, an inner circle with isotropic element sizes and the required number of elements along the lateral direction is created. This also ensures that the source is properly resolved in the same way as in the classical mesh. This mesh is then combined with a regular grid in cylindrical coordinates, giving rise to the anisotropic elements. The mesh can be refined in azimuthal direction to achieve higher azimuthal resolution as complexity accumulates with traveled distance from the source. As the model is the same as in Fig. 1, we can use the azimuthal spectra to choose the azimuthal element number approximately equal to the angular order at which the energy has decayed by a factor 10: 48 in the interior, 96 in the refined region. This corresponds to an optimistic meshing criterion with one element per azimuthal wavelength, while we use 2 elements per wavelength in the radial direction. Comparing the seismograms to those computed on a mesh similar to the one shown in Fig. 1 exhibits differences on the order of a few percent as highlighted by the dashed green lines, despite the fact that the aAMR mesh needs a factor 13.7 less elements.

Inverse Modelling
The adjoint equations required for gradient-based optimization (also known as adjoint tomography or full-waveform inversion) are commonly derived using the optimize-then-discretize paradigm [e.g. 18,19,20,21]. The fact that seismic stations are modelled as point observations in the theory leads to the adjoint sources being point sources. As we argued above, the aAMR mesh is designed with respect to the source, which seems to prevent using these meshes for adjoint computations. The remedy to this apparent problem is a paradigm shift in approaching the inverse problem, i.e., to use the discretize-then-optimize paradigm and to derive the discrete adjoint equation and the consistent discrete adjoint source. This is motivated by the fact that all numerical schemes to simulate seismic waves can only resolve a subset of the full wave propagation physics. For instance, for any given discretization spectral-element and finite-difference schemes produce accurate solutions only within a certain frequency band. Consequently, the discrete wave operator considers a discrete representation of the model space and is only sensitive to variations within this space. Hence, the key ingredients of the discrete adjoint approach are (1) the choice of the discretization of the (forward) wave operator such that it resolves all the physics of interest, and (2) computing derivatives with respect to changes in the discrete model space. This requires, in particular, to discretize the forward and the adjoint equation on the exact same mesh and this may be different to the discretization of the model. Furthermore it is impotant to note that the resulting discrete adjoint equation does not describe a physical wavefield and should not be interpreted with physical intuition.
To illustrate the derivation of the discrete adjoint source, we consider a misfit functional χ that compares synthetic waveforms to measured data, and the minimization problem where u denotes the displacement field, m is the material model and L(u, m) is a time-dependent wave operator. In this general form, the adjoint equation is given by with the adjoint wavefield u † [18,20]. The right-hand side in eq. (2) is called the adjoint source. By discretizing the wave equation in space using the spectral-element method, we obtain the following semi-discrete representation of the wavefield u  where φ i denote the nodal basis functions of the spectral-element method, and u i (t) are time-dependent coefficients. Note that the basis functions only have local spatial support within single or neighbouring elements, which means that for each point evaluation, e.g., at a receiver location x r , φ i (x r ) is nonzero for only a few i.

submitted to Geophysical Journal International
By inserting eq. (3) into the right-hand side of eq. (2) and applying the chain rule, we obtain the discrete adjoint source for a receiver located at x r as The adjoint source is injected at all nodes i, where φ i (x r ) is nonzero; the same weights φ i (x r ) are used for both the point evaluation u(x r , t) and the adjoint source. Hence, the spatial representation of an adjoint source is equivalent to the point evaluation; and it involves interpolation on the spectral-element mesh. As this is done in the discrete basis though, the adjoint source can naturally be represented in any mesh that is appropriate for the forward problem. Although the discrete adjoint equation still resembles a wave equation, the discrete adjoint field does not represent a physical wavefield radiated from a point source. However, this is actually not required for computing the sensitivity of the seismogram with respect to changes of the discretized smooth model.
Using the adjoint method, we obtain the directional derivative as where K is the continuous sensitivity kernel and M denotes the finite-element mass matrix that encodes the discretized integral over the domain G. The discrete gradient ∇ m χ(u) is defined on the spectral-element mesh. Because aAMR uses sourcedependent meshes, solving an inverse problem involving several sources thus requires a common parameterization of the model space. For instance, as the model is assumed to be smooth, it can efficiently be parametrized in terms of the coefficients of a multidimensional Fourier series. Prior to solving the wave equation, the Fourier space modelm is then mapped onto the spectralelement model m with the help of an interpolation operator A, i.e., m = Am.
In order to map the discrete gradient back into the Fourier space, we combine eq. (6) and eq. (5) and obtain which involves the transpose of the interpolation operator A.  [22] and a sponge layer [23]. We consider a single source-receiver pair with a recording length of 132 s and a misfit functional based on cross-correlation time shifts of both displacement components. While the forward wavefield is the same for both meshes, the adjoint wavefield on the aAMR mesh shows significant distortions radiating from the receiver location in off-source directions, i.e. where the wavefronts are not parallel to the longer sides of the elements. The discrete gradients are almost exactly the same except for small differences close to the source and the receiver and these differences disappear after projecting the discrete gradients into the model space. The small differences before projection are

submitted to Geophysical Journal International
Wavefield adapted meshes, Part I 5 expected and consistent with the theory: model perturbations on this small scale would break the underlying assumption of smoothness that was used to justify the aAMR forward operator.
After projection to the model space, the gradient is again consistent with this assumption and hence matches the reference solution.
The smoothness of the gradient is a consequence of the choice of the model space, and this choice can be interpreted as a form of regularization. In the case where the model resolution of an inverse problem is constrained by the sparsity of the data rather than the wavelength of the seismic waves, as is typically true for global and continental scale seismology, the gradients minimum scale length in the model space is decoupled from the wavelength and effectively independent of frequency. Importantly, this means that a tomographic inversion using this approach can explicitly invert for a smooth model that fits high frequency data and this is fundamentally different to using a coarser discretization of the wavefield and lower frequency data. In this particular example, the model was expanded up to Fourier order 10, relating to a minimum scale length of 85 km in comparison of 30 km for the P-wavelength.

Mesh Construction
The work by Leng et al. [10,11] suggests that a scheme adapted to the azimuthal complexity of the wavefield can be very beneficial for global wave propagation with speed-ups of up to three orders of magnitude. The aAMR technique can be employed on the global scale as shown in Fig. 6, left: the D-shaped mesh as used in AxiSEM [8, Figure 3] is extruded rotating about a vertical axis. A number of elements at the symmetry axis is removed and replaced by a cylindrical mesh to avoid singular elements. Finally, the near surface elements around the equator are refined in azimuthal direction to capture the higher complexity of the surface waves using an edge template based directional refinement scheme.
The numerical cost of the mesh generation is on the order of seconds to minutes on a workstation, and negligible compared to the simulation itself. The most expensive routines are parallelized by multithreading, allowing for meshes up to several hundred million elements to be efficiently built. Load balancing in the solver is achieved in the same way as in the standard cubed sphere mesh using graph partitioning [PT-Scotch, 25]. The meshing algorithms employed here will be described in more detail in a future publication. Fig. 6 (right) shows seismic waves propagating through a mesh designed this way for a dominant period of 50 s at two elements per wavelength. As we only employ 3D variations in the mantle using the smooth S20RTS model [24] and keep a 1D crust, the 3D effects on the waves are hardly visible, which confirms the applicability of aAMR.

High Order Approximation of the Sphere
On non-Cartesian domains such as the globe the mesh must not only resolve the wavefield, but also approximate the geometry of the domain (including topography) to sufficient accuracy. While many numerical codes rely on a first order (i.e. trilinear) mapping between physical and reference coordinates, it is possible to use more accurate representations of the geometry (e.g. second order, Komatitsch and Tromp [12] or even exact mappings to the sphere, Nissen-Meyer et al. [26]). Here we use Lagrange polynomials of arbitrary order, as these allow to represent general shapes in a straight forward manner by moving the control nodes. Fig. 7 visualizes the error in representing a sphere for varying polynomial orders of this mapping, and Fig. 8 shows how the maximum error converges with the number of elements along a great circle for a cubed sphere mesh. Slight discrepancies between the two figures stem from the fact that the latter is computed on a single element of varying size which is therefore less distorted on the surface of the sphere than the corner elements of the cubed sphere. The general picture remains unchanged by this: very few elements are sufficient to accurately represent the sphere at order 8 or higher. For most applications involving 3D models and a few tens of elements along the equator, order 4 is likely sufficiently accurate. In seismology, order 4 coincides with the order that is most widely used for the representation of the wavefield in spectral-element methods. Using the same order for shape mapping, wavefield and parameter interpolation simplifies the implementation significantly.
It should be noted that the number of control nodes is (n + 1) 3 for order n, so that using a higher order than required results in overheads in mesh file size as well as in pre time-loop computational time for the computation of the elemental Jacobian matrices. The more relevant computational cost in the time-loop is not affected by the order of the shape approximation at all, and for typical simulations the increase in time-to-solution is negligible.

1D Convergence Test
To quantify the effect of the approximation of the sphere on the seismograms and to verify the anisotropic meshing approach for global applications, we perform a convergence test using a 1D model [isotropic PREM, 28]. We use AxiSEM and Instaseis [8,27] to compute the reference solution, which uses an exact mapping to the sphere [26]. Furthermore, we use a five times over-resolved spatial discretization (2 elements per wavelength at 10 s period) as well as a fourth-order time scheme [7], so we can assume the numerical error for this reference solution to be negligible.
For the test, we build a series of meshes as shown in Fig. 6 for 50 s period at 2 elements per wavelength and for varying number of elements in azimuthal direction and several polynomial orders for the shape approximation. The surface refinement (marked C in Fig. 6) is not used here. Seismograms for a strike slip event in 20 km depth below the north pole are recorded on a regular grid of 12 × 12 receivers on Earth's surface for one hour after the event. We quantify the error using the mean of the normalized L2 waveform error over all stations [eq. 9, 29]. This misfit normalizes to the sum of the three components to avoid overemphasis of small signals close to nodal planes of the source and larger amplitudes close to the source. It emphasizes surface waves, which are more sensitive to the proper representation of the spherical surface than body waves.  max error / km shape order 1 shape order 2 shape order 4 shape order 6 shape order 8 shape order 10    The results of this test are shown in Fig. 9, confirming the convergence with respect to azimuthal mesh refinement. As the mesh is refined only in azimuthal direction, all other errors (e.g. spatial errors in source-receiver plane, time integration, model representation on the same order as the shape mapping) remain constant, the convergence is as a result limited and the limiting value depends on the spatial order.
In addition, the test suggests that fourth-order shape mapping might suffice (at an error level of a few percent) for modelling wave propagation in a 1D model using 4 elements in azimuthal direction, which coincides with a resolution criterion of 2 elements per wavelength for a quadrupole source such as the strike slip used here. On the other hand, using the standard linear shape mapping requires more than 64 elements in the azimuthal direction even when allowing for a generous error level of 10%.

3D Convergence Test
Knowing the geometrical error, we continue with the addition of 3D variations to the seismic velocities according to the S20RTS model [24], where p-wave velocity perturbations are scaled from the s-wave model. Fig. 10 shows the L2 error as a function of the azimuthal mesh resolution with and without the surface refinement (from 20 • to 160 • epicentral distance and down to 220 km depth) for a fourth-order representation of model and shape mapping. All other parameters are the same as in the 1D convergence test. The reference solution is computed on a cubed sphere mesh using the same resolution in azimuthal as well as in source-receiver direction with otherwise identical parameters. Importantly, the time step required by the Courant criterion at this mesh resolution is driven by the thickness of the crustal layers and is thus the same for all meshes including the reference solution. This might appear surprising due to the very small angles in some elements of the surface refinement, however, these elements are also very elongated in the same direction. As a consequence, the minimal point distance and hence the time step is not constrained by the angle, but by the crustal thickness. The only factor that affects the relative computational cost is consequently the total number of elements; speed-ups computed accordingly are shown in Fig. 11. The results demonstrate that at an error level of a few percent, an order of magnitude speed-up can be achieved in this problem setup. Of course, changing any of the parameters such as the resolved period, model complexity, seismogram duration, etc. influences these numbers. As demonstrated by Leng et al. [10], the efficiency generally increases with shorter periods. Rayleigh waves also show a significant effect on the amplitude, which is likely due to focussing along the ray path. For body waves, slight phase changes can be seen with almost no effect on the waveform. The major differences already disappear at the lowest azimuthal resolution and the error becomes small at 24 elements in azimuthal direction, which still provides an order of magnitude speed-up.
To achieve the convergence to an error level of 10 −3 , we use a mesh resolution of two elements per wavelength at 50 s period. As this level of accuracy is often not required given the noise level in real data, the same meshes could be used for tomography at 25 s period. We deliberately chose a very simple model for this convergence test, as this leads to a smoother wavefield. While the extremely elongated elements with accurate approximation of the sphere posed the biggest challenge in this study, it is well established that the spectral-element method works well on more complex models and the addition of topography by deforming the mesh is straightforward.
Leng et al. [10,11] show in great detail, that for a fixed model complexity, the average azimuthal complexity of the wavefield on global scale depends only weakly on frequency, which implies that the efficiency of the method increases with increased frequency.

Regional Example
Finally, we highlight the flexibility of using an element based approach for applications that do not span the whole globe in Fig. 13. In this particular use case of a simulation for East Asia the source and receiver locations are very disparate with receivers on the continent and events along the plate boundaries. Instead of using a cylindrically symmetric domain, we can adapt the mesh to the available data by removing all elements that are not within 500 km of body or surface wave ray paths. This distance appears sufficient for the absorbing boundary on the very non-smooth lower surface of the mesh to be effective, as no reflections are visible in the wavefield. In this specific case, the number of elements could be reduced by an additional factor 4.6, with respect to a cylindrical domain as would be used in AxiSEM3D on top of the savings granted by the aAMR approach.

Discussion & Conclusions
In this manuscript we demonstrate how practical order of magnitude speed-ups in numerical modelling can be achieved by taking advantage of the azimuthal smoothness of seismic waves in the mesh design. The smoothness of the wavefield and model are necessary prerequisites for the method, and such conditions need to be verified for each application (as it was done by Leng et al. [10,11] for global wave propagation).
In comparison to the hybrid pseudo-spectral/spectral-element approach as described by Leng et al. [10,11], there are several important differences. In the hybrid method, the meshing is done in an axisymmetric 2-D domain, and the adaption to the wavefield's azimuthal complexity is achieved by locally adjusting the order of the spectral basis in that direction. While this keeps the meshing procedure relatively simple, an additional algorithmic and computational cost is borne in the computation of the stiffness term. In contrast, the aAMR method proposed here requires no change to the wave equation solver, but instead requires a more complex meshing stage that can only adapt to the azimuthal wavelength in a coarser manner via mesh refinement. As an additional advantage, aAMR is not limited to axisymmetric domains, but allows for removing elements outside of a region of interest and for applying absorbing boundary conditions (e.g.  Figure 12: Example seismogram for the convergence in S20RTS with azimuthal refinement including surface refinement (orange line in Fig. 10) at an epicentral distance of 110.8 • . For reference, the lowest trace shows the 1D solution. Note the factor 10 on the difference for meshes starting from 8 elements in azimuthal direction. Fig. 5 and 13). Moreover, the azimuthal resolution can be varied as a function of azimuth, if necessary.
While our discussion here is limited to point sources, it is easy to see that the cylindrical region of the mesh which consists of regularly shaped elements can also be centered on a finite source and the implementation would be no different to a standard cubed sphere mesh. The only requirement for the aAMR to work and achieving speed-ups similar to those shown above is that the complexity of the resulting far-field wavefield retains similar directional characteristics.
Furthermore, we demonstrate that gradients with respect to model parameters can be computed consistently using the discrete adjoint method irrespectively of artefacts in the adjoint wavefield. aAMR will hence help to push large-scale full waveform inversions that are still limited by computational resources into new regimes of resolution and bandwidth [see part II for first steps in this direction, 14].