Gradient-boosted equivalent sources

The equivalent source technique is a powerful and widely used method for processing gravity and magnetic data. Nevertheless, its major drawback is the large computational cost in terms of processing time and computer memory. We present two techniques for reducing the computational cost of equivalent source processing: block-averaging source locations and the gradient-boosted equivalent source algorithm. Through block-averaging, we reduce the number of source coefficients that must be estimated while retaining the minimum desired resolution in the final processed data. With the gradient boosting method, we estimate the source coefficients in small batches along overlapping windows, allowing us to reduce the computer memory requirements arbitrarily to conform to the constraints of the available hardware. We show that the combination of block-averaging and gradient-boosted equivalent sources is capable of producing accurate interpolations through tests against synthetic data. Moreover, we demonstrate the feasibility of our method by gridding a gravity dataset covering Australia with over 1.7 million observations using a modest personal computer.


Introduction
Measurements of anomalies in potential fields, like gravity disturbances and total-field magnetic anomalies, are widely used in geophysical exploration for their low cost of acquisition. These data can be surveyed using ground, airborne, shipborne, or satellite systems. During ground surveys, the data are often gathered following irregular paths or networks along the surface of the terrain, leading to highly variable elevations in mountainous regions. Airborne and satellite surveys gather data along flight lines, producing closely spaced measurements along almost straight lines but with larger spacing between adjacent lines. Measurement height can also change because of the vertical movement of the aircraft. Processing of the data often involves interpolation onto a regular grid at constant height, both to improve visualization for interpretation purposes as well as to prepare the data for further processing and modelling (e.g., reduction-to-the-pole, derivative calculations, upward continuation, Euler deconvolution).
Several methods exist in the literature for interpolation in two dimensions, for example continuous curvature splines in tension (Smith and Wessel, 1990), biharmonic (thin-plate) splines (Sandwell, 1987), and kriging (Hansen, 1993). These general-purpose methods have limitations when it comes to interpolating potential field data, namely (i) they are not able to take into account the variable height of the observation points and (ii) the interpolating functions are not necessarily harmonic, which is the underlying assumption behind many processing techniques (e.g., upward continuation and vertical derivatives).
A widely used method for interpolating gravity and magnetic data is the equivalent sources technique (also known as equivalent layer, radial basis functions, or Green's functions interpolation). First introduced by Dampney (1969), the method consists in fitting a model of finite elementary sources to the data and using this model to predict new data values. Besides interpolation, equivalent sources have been used for reduction-to-the-pole of magnetic data (Guspí and Novara, 2009; Nakatsuka and Okuma, 2006; Silva, 1986), upward continuation (Emilia, 1973; Li and Oldenburg, 2010), joint processing of gravity gradient data (Barnes and Lumley, 2011), modelling the lithospheric magnetic field (Kother et al., 2015), recovering the magnetic induction vector from total-field magnetic anomalies (Li et al., 2020), and more.
It is also worth mentioning the least-squares collocation method (LSC), which is widely used in geodesy (Tscherning, 2015, and references therein). LSC is often applied to combine and interpolate different linear functionals of the disturbing gravity potential (gravity anomalies, gravity disturbances, deflections of the vertical, geoid height, et cetera). Like equivalent sources, collocation also requires the solution of a large linear system of the order of the number of observed data. As such, its practical application suffers from the same computational challenges.
Many variants of the equivalent sources technique have been proposed, often attempting to obtain faster or more accurate solutions. The key factors that vary between them are: (i) the type of source, (ii) the location of the sources, and (iii) the solution strategy.
The most commonly used type of source is a point mass for gravity or a dipole for magnetics (e.g., Mendonça and Silva, 1994; Silva, 1986; Siqueira et al., 2017; von Frese et al., 1981). However, right-rectangular prisms (e.g., Barnes and Lumley, 2011; Jirigalatu and Ebbing, 2019; Li et al., 2020) and tesseroids (Bouman et al., 2016) have also been used successfully. In fact, even point sources with a simple inverse distance function, instead of actual gravity or magnetic fields, can be used as equivalent sources (Cordell, 1992).
The location of sources often follows one of two strategies. The most common approach is to distribute sources on a regular grid at a constant depth (e.g., Barnes and Lumley, 2011; Leão and Silva, 1989; Oliveira et al., 2013). Alternatively, sources can be placed beneath each data point (e.g., Cordell, 1992; Siqueira et al., 2017). Some recent work by Li et al. (2020) places the sources in two overlapping layers at different depths.
The coefficients of the equivalent source model are often estimated through damped least-squares. This imposes a heavy computational load when the number of data points is large (e.g., airborne and satellite surveys). To reduce the computational load, Mendonça and Silva (1994) built the solution iteratively by incorporating one data point at a time using the "equivalent data concept". Leão and Silva (1989) processed the input data using a moving window, only fitting the data inside the window and predicting observations at its center. Li and Oldenburg (2010) and Barnes and Lumley (2011) apply different operations to generate a sparse representation of the sensitivity matrix (respectively, wavelet compression and quadtree discretization), which significantly improves the speed of the least-squares solution. Oliveira et al. (2013) parametrized the equivalent layer as a piecewise bivariate polynomial function, reducing the number of parameters in the solution. Siqueira et al. (2017) developed an iterative solution in which the sensitivity matrix is transformed into a diagonal matrix with constant terms through the "excess mass criterion". Jirigalatu and Ebbing (2019) applied the Gauss-FFT method to speed up the forward modelling operations and solved the least-squares problem using steepest descent to avoid calculating the Hessian matrix and solving linear systems.
Many of the existing methods solve under-determined problems, requiring a much larger number of equivalent sources than the number of data points. Some achieve greater efficiency by restricting their applications to specific data types (Siqueira et al., 2017), interpolating only on regular grids (Leão and Silva, 1989), or requiring already gridded data (Takahashi et al., 2020), to name a few. Furthermore, many of the optimizations proposed are also complex to implement in a computer program, limiting their wider adoption.
In the present study, we propose two strategies for reducing the computational load of the equivalent sources technique:
1. Reduce the number of equivalent sources for oversampled surveys through a block-averaging strategy while maintaining the quality of the solution.
2. Fit the equivalent source model iteratively along overlapping windows using a gradient boosting algorithm (Friedman, 2001).
The first strategy consists in dividing the survey area into horizontal blocks and assigning a single source to each block, located at the median horizontal location of the data points. For airborne, shipborne, and satellite surveys, which are oversampled along tracks, this can greatly reduce the size of the inverse problem while retaining the same quality of interpolation.
The gradient boosting algorithm allows us to fit the equivalent source model iteratively by operating on individual overlapping windows. As a result, our method solves several much smaller least-squares problems instead of a large one. This has some similarities with the strategy used by Leão and Silva (1989) but without the requirement for sources and predictions to be on regular grids.
Through tests on synthetic data, we show that: (i) the block-averaged sources are able to achieve the same accuracy as other traditional equivalent source layouts while using a fraction of the number of sources, and (ii) the gradient boosting algorithm greatly reduces the computational memory required to fit very large datasets without sacrificing prediction accuracy. Finally, a combination of both strategies is used to process a collection of approximately 1.7 million ground gravity data measurements from Australia.

The equivalent sources technique
We will follow the "generalized equivalent sources" of Cordell (1992) and assume that any harmonic function d(p) can be approximated by a sum of M discrete point source effects,

    d(p) = \sum_{j=1}^{M} \frac{c_j}{\| p - q_j \|},    (1)

in which p and q_j are, respectively, the position vectors in a 3D Cartesian space of the data point and the sources, c_j is a scalar coefficient related to the point source located at q_j, and \| \cdot \| represents the L_2 norm. The horizontal and vertical distribution of sources is discussed in section 2.4.
In case we have values of the harmonic function at N discrete points {p_1, p_2, ..., p_N}, we can write a set of N equations of the form

    d_i = \sum_{j=1}^{M} \frac{c_j}{\| p_i - q_j \|},    (2)

where d_i is the calculated value at point p_i. These equations can also be expressed in matrix form as

    d = A c,    (3)

where d is a column vector containing the N predicted values at the observation points, c is a column vector containing the M coefficients c_j, and A is the N × M Jacobian matrix, whose elements are

    a_{ij} = \frac{1}{\| p_i - q_j \|}.    (4)

For a given set of N observed data d^o, we can find a least-squares solution to Eq. 3 and obtain the values of c that best fit the observations. These coefficients can, in turn, be used to predict the value of the harmonic function at any other point outside of the sources by evaluating Eq. 1. Gridding and upward continuation can thus be achieved by predicting values on points that fall on a regular grid or at different heights, respectively.
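The forward and prediction operations above can be sketched in a few lines of NumPy. This is an illustrative toy rather than the implementation used in this work; the observation points, source positions, and coefficients are made up for the example.

```python
import numpy as np

def jacobian(points, sources):
    """Inverse-distance Jacobian with elements 1 / |p_i - q_j|.

    points and sources are arrays of shape (N, 3) and (M, 3) holding
    3D Cartesian coordinates.
    """
    # Pairwise differences broadcast to shape (N, M, 3), then norms.
    diff = points[:, np.newaxis, :] - sources[np.newaxis, :, :]
    return 1.0 / np.linalg.norm(diff, axis=2)

# Toy example: 3 observation points at zero height, 2 sources at depth 10.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sources = np.array([[0.0, 0.0, 10.0], [1.0, 1.0, 10.0]])
A = jacobian(points, sources)  # the N x M matrix of Eq. 3
c = np.array([2.0, -1.0])      # hypothetical source coefficients
d = A @ c                      # predicted values at the observation points
```

Predicting on a regular grid or at a different height only requires evaluating `jacobian` (and its product with the estimated coefficients) at the new points.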

Damped least-squares solution
We can obtain the values of the source coefficients c that best fit the observed field values d^o by minimizing the goal function

    \phi(c) = (d^o - A c)^T W (d^o - A c) + \lambda_d c^T c,    (5)

where W is an N × N diagonal matrix of data weights and \lambda_d is a positive damping parameter with the same units as the Jacobian matrix elements. The second term on the right-hand side of Eq. 5 is the zeroth-order Tikhonov regularization (Tikhonov, 1977), also known as a damping regularization, which is used to stabilize the solution. The damping parameter controls the amount of regularization that will be applied. An overly large value would generate a smooth solution that fails to reproduce the high-frequency components of the data, while an overly small value would result in over-fitting, thus failing to produce realistic interpolation results (Martinez and Li, 2016). The range of acceptable values for the damping parameter \lambda_d will depend on the values of the Jacobian matrix A and the coefficients. Consequently, this range will vary (often dramatically) between datasets, making it difficult to choose an appropriate value in practice.
To solve this issue, we first scale the Jacobian matrix so that its elements are dimensionless and each column has unit variance. We define a diagonal matrix

    S = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_M),    (6)

in which \sigma_j is the standard deviation of the j-th column of A. We then write the forward problem in Eq. 3 as

    d = B m,    (7)

where B = A S^{-1} is the scaled and dimensionless Jacobian matrix and m = S c is a vector containing scaled coefficients with the same units as the data. The goal function defined in Eq. 5 can be rewritten as

    \phi(m) = (d^o - B m)^T W (d^o - B m) + \lambda m^T m,    (8)

where \lambda is a dimensionless damping parameter and regularization is applied to the scaled coefficients m instead of c. Using a dimensionless damping parameter allows us to narrow the range of values of \lambda that would generate the most accurate predictions, irrespective of the dataset and its units. From experience, we recommend searching for suitable \lambda values between 10^{-6} and 10^{4}, varying by order of magnitude. The choice of the damping and other hyper-parameters, like the source depth, could be done through well-established statistical methods, such as cross-validation. The vector of scaled coefficients m̂ that minimizes the goal function can be found by solving the normal equation system (Menke, 1989)

    (B^T W B + \lambda I) \hat{m} = B^T W d^o.    (9)

Once the scaled coefficients are obtained, the estimated unscaled coefficients ĉ can be calculated by removing the scaling factor:

    \hat{c} = S^{-1} \hat{m}.    (10)
The forward modelling operations used to perform predictions (e.g., for interpolation and upward continuation) are left unchanged by using vector ĉ instead of m̂.
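A minimal sketch of this scaled, damped least-squares estimation (the goal function of Eq. 8 and its normal equations in Eq. 9), assuming a generic Jacobian and synthetic data; the function name and defaults are ours, for illustration:

```python
import numpy as np

def fit_damped(A, d_obs, weights=None, damping=1e-2):
    """Estimate source coefficients by scaled, damped least-squares.

    Columns of A are scaled to unit standard deviation so that the
    dimensionless damping behaves consistently across datasets and units.
    """
    nparams = A.shape[1]
    W = np.diag(weights) if weights is not None else np.identity(A.shape[0])
    sigma = A.std(axis=0)   # diagonal of the scaling matrix S
    B = A / sigma           # B = A S^{-1}: unit-variance columns
    # Normal equations: (B^T W B + damping * I) m = B^T W d_obs
    lhs = B.T @ W @ B + damping * np.identity(nparams)
    m = np.linalg.solve(lhs, B.T @ W @ d_obs)
    return m / sigma        # unscale: c = S^{-1} m

# Synthetic check: recover known coefficients from noise-free data.
rng = np.random.default_rng(42)
A = rng.uniform(0.5, 1.5, size=(20, 5))
c_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
c_hat = fit_damped(A, A @ c_true, damping=1e-8)
```

With a tiny damping and noise-free data the recovered coefficients match the true ones; in practice the damping would be chosen by searching over orders of magnitude, as recommended in the text.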

Gradient boosting
Gradient boosting was first introduced by Friedman (2001, 2002) as a method for fitting additive parametric models of the form

    d = \sum_{k=1}^{K} \alpha_k f(c_k),    (11)

where \alpha_k is a scalar coefficient called the step size and f is a function of the parameter vector c_k. For linear problems, these additive models can be written as the matrix equation

    d = \sum_{k=1}^{K} A_k c_k.    (12)

Because of the linearity of the f(c_k) functions, the \alpha_k step-size parameters can be incorporated into the parameter vectors c_k. We can transform our equivalent source problem in Eq. 3 into an additive model by following these steps:
1. Define a set of M equivalent sources distributed throughout the survey area (see section 2.4 for details).
2. Define a set of K overlapping windows of equal size that cover the survey area.
3. Create K separate sets of equivalent sources, one for each window. Each set will be formed by the portion of the original M sources that fall inside the respective window. Since the windows overlap, the total number of sources from all sets will be greater than M .
4. Define vector c k as the M k coefficients of the equivalent sources of the k-th window.
5. Define matrix A k as the N × M k Jacobian matrix between the sources in the k-th window and all N data points of the survey.
6. Model the predicted data as a superposition of the effects of the K separate sets of equivalent sources (i.e., Eq. 12).
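Steps 2 and 3 above (building the overlapping windows and the per-window source sets) can be sketched as follows; the (west, east, south, north) region convention and the function names are assumptions for illustration:

```python
import numpy as np

def overlapping_windows(region, window_size, overlap):
    """Square windows covering region = (west, east, south, north),
    where `overlap` is the fraction (in [0, 1)) shared by adjacent windows.
    """
    west, east, south, north = region
    step = window_size * (1 - overlap)  # spacing between window corners
    return [
        (x, x + window_size, y, y + window_size)
        for x in np.arange(west, east, step)
        for y in np.arange(south, north, step)
    ]

def inside_window(coordinates, window):
    """Boolean mask of the points (rows of easting/northing) in one window."""
    w, e, s, n = window
    x, y = coordinates[:, 0], coordinates[:, 1]
    return (x >= w) & (x <= e) & (y >= s) & (y <= n)

# Example: a 10 x 10 region, windows of size 4 with 50% overlap.
windows = overlapping_windows((0, 10, 0, 10), window_size=4, overlap=0.5)
sources = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
masks = [inside_window(sources, w) for w in windows]
```

Because the windows overlap, each source generally belongs to more than one set, so the total number of sources across all sets exceeds M.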
The gradient boosting algorithm works by fitting each component of the additive model, one at a time, to the residuals of the previous component. Friedman (2001) demonstrates that this corresponds to a steepest-descent optimization in the so-called "function space". The adaptation of the gradient boosting method to find the damped least-squares solutions for the K parameter vectors c k in Eq. 12 is presented in Algorithm 1.
Algorithm 1: Gradient boosting solution for damped least-squares regression.
 1: Define the residual vector r_0 = d^o
 2: for k = 1 to K do
 3:     Calculate the Jacobian matrix A_k between the sources inside the k-th window and all N data points
 4:     Scale the Jacobian: B_k = A_k S_k^{-1}
 5:     Solve the normal equations (B_k^T W B_k + \lambda I) m̂_k = B_k^T W r_{k-1}
 6:     Unscale the coefficients: ĉ_k = S_k^{-1} m̂_k
 7:     Calculate the predicted data d_k = A_k ĉ_k at all N data points
 8:     Update the residual vector: r_k = r_{k-1} - d_k
 9: end for

After all ĉ_k coefficient vectors are estimated, we can predict the effect of the additive equivalent source model on any point p through the summation

    d(p) = \sum_{k=1}^{K} \sum_{j=1}^{M_k} \frac{c_{kj}}{\| p - q_{kj} \|},    (13)

in which c_{kj} is the j-th element of the c_k vector and q_{kj} is the position vector of the j-th source of the k-th window.
To improve the convergence of the algorithm, Friedman (2002) suggests introducing randomness into the fitting process. We achieve this by randomizing the order in which the K windows are used in the gradient boosting algorithm. Section 3.3 explores the effect of randomization on the convergence rate of the algorithm and on the accuracy of the interpolation.
The A_k matrices have only N × M_k elements (where M_k is the number of sources in the k-th window), which can be considerably smaller than the N × M elements of A. Therefore, the gradient boosting method allows us to fit equivalent source models that would otherwise produce Jacobian matrices larger than the available computer memory. Furthermore, we can increase or decrease the size of the overlapping windows as needed, depending on the number of sources in the model and the available computer memory.
We can improve the efficiency of the algorithm further by:
1. Using only the N_k data points that fall within the k-th window when fitting the sources (steps 4 and 5 of Algorithm 1). By doing so, we can replace the N × M_k Jacobian matrix A_k with the smaller N_k × M_k matrix Ã_k. We still use all N data points when calculating the predicted data and residuals (steps 7 and 8 of Algorithm 1).
2. Performing the forward modelling operation in step 7 through a summation (Eq. 2) instead of a matrix-vector product, which allows us to avoid computing and storing the larger N × M_k matrix A_k at any point.
Algorithm 2 is the final gradient-boosted equivalent sources algorithm, which incorporates these changes. Figure 1 shows a sketch of the algorithm steps applied to a set of observation points that simulates a ground survey, with one source located below each data point.
It is worth noting that two sets of equivalent sources obtained through two adjacent overlapping windows have a portion of their sources at the same locations, specifically the ones that fall in the intersection between the two windows. We can interpret this as the gradient-boosting algorithm fitting the source coefficients multiple times: once for every window that covers each source. This fact can be exploited in order to save computer memory. Instead of storing all of the c_k vectors (Eq. 12), we can initialize a single vector c with zeros, where each element represents the coefficient of one of the original M sources. After each iteration of the gradient-boosting algorithm, we add the estimated coefficients ĉ_k to the corresponding elements of vector c. Because the forward modelling function is linear, we can safely compute the resulting field through Eq. 1 instead of Eq. 13. This way, the memory needed to store the entire set of estimated coefficients is limited to a single vector of M elements.
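The whole procedure can be sketched as follows. This is a simplified illustration rather than the exact algorithm: it uses unscaled inverse-distance Jacobians, a fixed damping, unit weights, and accumulates the per-window coefficients into the single vector c described above.

```python
import numpy as np

def predict(points, sources, coeffs):
    """Field of inverse-distance point sources evaluated at `points` (Eq. 1)."""
    diff = points[:, None, :] - sources[None, :, :]
    return (1.0 / np.linalg.norm(diff, axis=2)) @ coeffs

def gradient_boosted_fit(points, sources, d_obs, windows, damping=1e-2, seed=0):
    """Simplified gradient-boosted equivalent sources.

    points, sources: (N, 3) and (M, 3) arrays (easting, northing, vertical);
    windows: list of (west, east, south, north) tuples covering the survey.
    Returns a single coefficient vector accumulated over all windows.
    """
    def inside(xyz, window):
        w, e, s, n = window
        return (xyz[:, 0] >= w) & (xyz[:, 0] <= e) & (xyz[:, 1] >= s) & (xyz[:, 1] <= n)

    coeffs = np.zeros(len(sources))
    residuals = d_obs.astype(float).copy()
    # Randomize the order in which the windows are processed.
    order = np.random.default_rng(seed).permutation(len(windows))
    for k in order:
        in_pts = inside(points, windows[k])
        in_src = inside(sources, windows[k])
        if not in_pts.any() or not in_src.any():
            continue
        # Small damped least-squares fit of the residuals inside the window.
        diff = points[in_pts][:, None, :] - sources[in_src][None, :, :]
        A_k = 1.0 / np.linalg.norm(diff, axis=2)
        lhs = A_k.T @ A_k + damping * np.identity(int(in_src.sum()))
        c_k = np.linalg.solve(lhs, A_k.T @ residuals[in_pts])
        coeffs[in_src] += c_k  # accumulate into the single coefficient vector
        # Update the residuals on ALL data points.
        residuals -= predict(points, sources[in_src], c_k)
    return coeffs
```

With a single window covering the whole survey this reduces to an ordinary damped least-squares fit; with several overlapping windows, only the small per-window Jacobians are ever held in memory.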
Our gradient boosting algorithm for overlapping windows is similar to the "bootstrap inversion" used in von Frese et al. (1988), which also iteratively fits portions of an equivalent source model to the data residuals. The key differences are that in our method: (i) the sources in the overlapping portions of the windows are fitted more than once, allowing the algorithm to self-correct for poor solutions to any given window; and (ii) we use only the data points within the window when fitting, which enables the use of larger datasets.

Figure 1: Sketch of the gradient-boosted equivalent source algorithm. Data points are represented by blue upwards-facing triangles, equivalent sources by orange dots, data residuals by red downwards-facing triangles, and the current window by black dashed lines. The algorithm starts by selecting the data and sources inside the first window and estimating the source coefficients using the selected data points. Then, the effect of the estimated sources is predicted on all data points and used to calculate the residuals. Another window is used to select residuals and sources, and the coefficients are estimated using the selected residuals instead of the original data. Again, the effect of the estimated sources is predicted on all data points and the residuals are updated. These steps are repeated for every window in a randomized order.

Algorithm 2: Gradient-boosted equivalent sources algorithm.
 1: Define the residual vector r_0 = d^o
 2: for k = 1 to K do (with the windows taken in a randomized order)
 3:     Select the weights W̃_k and residuals r̃_{k-1} for the data points inside the k-th window
 4:     Calculate the Jacobian matrix Ã_k with the data points and sources inside the k-th window
 5:     Scale the Jacobian: B̃_k = Ã_k S_k^{-1}
 6:     Solve the normal equations (B̃_k^T W̃_k B̃_k + \lambda I) m̂_k = B̃_k^T W̃_k r̃_{k-1}
 7:     Unscale the coefficients: ĉ_k = S_k^{-1} m̂_k
 8:     Calculate the predicted data d_k of the k-th window sources at all N data points through the summation in Eq. 2
 9:     Update the residual vector: r_k = r_{k-1} - d_k
10: end for

Location of sources
The ideal number of sources and their locations, both horizontally and vertically, have been debated since the inception of the equivalent sources technique with Dampney (1969). The choices made regarding these parameters can play an important role in the accuracy of the predictions and the computational resources needed to estimate the source coefficients. An ideal distribution of sources should simultaneously be able to reproduce the measured data on the survey points, make accurate predictions at non-surveyed locations, and minimize the required computational resources. A large number of evenly distributed sources across the survey region is capable of reproducing the observed data. Nevertheless, the computational load can be prohibitive, and such underdetermined problems are prone to overfitting the data, leading to poor predictive power when interpolating and extrapolating. On the other hand, using few sources will reduce the computational requirements, but the model may be incapable of reproducing the full spectral content of the measured data.
Particular survey characteristics also play a role in the choice of equivalent source distribution. In a ground survey, observations are usually located along irregular paths and scattered points. The coverage of the survey region is often uneven, leaving large areas without any observation. On the other hand, observations from airborne surveys are located along almost straight and closely spaced flight lines. Measurements are usually taken at a high temporal frequency, leading to observation points along the flight lines that are several times closer to each other than the flight line spacing. This creates a bias in the sampling, which can cause aliasing artifacts in gridded products.

Horizontal source layouts
The most widely used layouts for distributing equivalent sources horizontally are:
1. Sources below data points: one equivalent source is placed at the horizontal location of each data point (Fig. 2b). Therefore, the number of sources is equal to the number of observations (M = N).
2. Regular grid: a homogeneous distribution of point sources below the survey region (Fig. 2c). A padding region is often added to help reduce edge effects. In practice, this layout often leads to underdetermined problems since a large number of sources is required (M > N).
For ground surveys, the regular grid layout needs a sufficiently small grid spacing to be able to fit the observed data. This creates an unnecessarily large number of sources in areas where no observations exist. In contrast, the sources below data layout is more likely to accurately fit the observed data with many fewer sources, reducing the computational load. But when applied to airborne surveys, the sources below data layout may place an undesirably large number of sources along the flight paths. This could lead to aliasing effects on the predicted values, such as the stripes parallel to flight lines that are often observed when gridding airborne magnetic data. The regular grid layout can avoid this effect by evenly distributing sources and using a continuous source layer (e.g., right-rectangular prisms or tesseroids).
We propose a new way of distributing equivalent sources horizontally that can simultaneously reduce the computational load and mitigate some of the drawbacks of existing layouts. In the block-averaged sources layout, point sources are placed at the average position of the data points that fall within specified spatial blocks (Fig. 2d). This is done by:
1. Dividing the survey region into rectangular blocks of equal size.
2. Computing the median horizontal position of the observation points that fall inside each block. Blocks without any observation point are omitted.
3. Assigning one point source to each of the median horizontal positions calculated in step 2.
The number of sources created by this new layout will be less than the number of observations if the block size is chosen appropriately (i.e., making sure that blocks are large enough to contain more than a single data point). The overdetermined problem that arises from this layout has a lower computational load and is less prone to overfitting the data since the model complexity is lower. Moreover, the block averaging process can balance the spacing between sources along a flight line and between adjacent lines, helping to reduce aliasing effects in the generated grids. In Section 3.1, we demonstrate through tests on synthetic data that the block-averaged sources layout is able to interpolate with comparable accuracy to other layouts while using a fraction of the equivalent sources.
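The three steps above can be sketched with NumPy; the function name and the (west, east, south, north) region convention are ours, for illustration:

```python
import numpy as np

def block_averaged_sources(coordinates, region, block_size):
    """One source per non-empty block: the median horizontal position of
    the observations inside it.

    coordinates: (N, 2) array of easting/northing;
    region: (west, east, south, north) bounds of the survey.
    """
    west, east, south, north = region
    # Integer block indices of every observation (step 1).
    ix = ((coordinates[:, 0] - west) // block_size).astype(int)
    iy = ((coordinates[:, 1] - south) // block_size).astype(int)
    sources = []
    # Only blocks containing at least one observation appear here (step 2).
    for bx, by in np.unique(np.column_stack([ix, iy]), axis=0):
        in_block = (ix == bx) & (iy == by)
        sources.append(np.median(coordinates[in_block], axis=0))  # step 3
    return np.array(sources)

# Example: 6 scattered observations in a 10 x 10 region, blocks of size 5.
coords = np.array(
    [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [8.0, 8.0], [9.0, 9.0], [8.0, 2.0]]
)
block_sources = block_averaged_sources(coords, (0, 10, 0, 10), block_size=5)
```

Here the six observations collapse into three sources (one per occupied block), so M < N and the resulting problem is overdetermined.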

Depth of sources
It is widely known from potential theory that the depth of a point source influences the wavelength of the observed field at the surface. This makes the source depth a key parameter affecting the outcome of interpolation and other operations performed with equivalent sources. Several different strategies for assigning the depths of equivalent sources have been proposed in the literature. Here, we will highlight the following (Fig. 3):
1. Constant depth: The simplest option is to locate all sources at the same depth (Fig. 3a). If the measurements were taken at significantly different altitudes, some measurements will be more distant from the sources than others, which may create problems for reproducing short wavelengths at high-altitude points.
2. Relative depth: The depths of sources are determined by shifting the vertical coordinate of data points downward by a fixed amount (Fig. 3b). The sources will not all be at the same vertical coordinate, but they will all be at the same vertical distance from the observation points.
3. Variable depth: The depths of sources are proportional to the horizontal distance to the nearest neighbouring data points or sources (Fig. 3c). Different variations of this strategy have been proposed before, for example Cordell (1992), Guspí et al. (2004), and Guspí and Novara (2009). The rationale for this strategy is that if a survey has data points clustered in some areas, we may want the sources below those areas to be shallower in order to preserve the shorter wavelengths that can be measured.
Our approach to the variable depth strategy will be

    z = z_{obs} + \Delta z + \alpha h,    (14)

in which z is the vertical coordinate (positive downwards) of an equivalent source, \Delta z is a relative depth shift that is the same for all sources, \alpha is a dimensionless depth factor, h is the median horizontal distance to the k nearest neighbouring sources, and z_{obs} is a vertical observation coordinate that depends on the horizontal layout strategy. For sources below data, it is the vertical coordinate of the data point corresponding to the given source. For the regular grid, it can be interpolated from the vertical coordinates of all data points. Finally, for block-averaged sources, it is the median vertical coordinate of the data within the corresponding block. In Section 3.1, we test the effectiveness of each of these strategies on synthetic data.
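A sketch of the variable depth assignment; h is computed brute-force from pairwise horizontal distances (fine for small source sets), and the function name and defaults are hypothetical:

```python
import numpy as np

def variable_depths(sources_xy, z_obs, depth_shift, depth_factor, k=3):
    """Depths z = z_obs + depth_shift + depth_factor * h, with h the median
    horizontal distance to the k nearest neighbouring sources.

    sources_xy: (M, 2) horizontal positions; z_obs: (M,) observation
    coordinates (positive downwards, matching the text's convention).
    """
    diff = sources_xy[:, None, :] - sources_xy[None, :, :]
    distances = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(distances, np.inf)  # exclude each source itself
    h = np.median(np.sort(distances, axis=1)[:, :k], axis=1)
    return z_obs + depth_shift + depth_factor * h

# Three clustered sources and one isolated source, all with z_obs = 0.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
depths = variable_depths(xy, np.zeros(4), depth_shift=5.0, depth_factor=1.0, k=2)
```

The clustered sources end up shallower than the isolated one, preserving the shorter wavelengths that dense sampling can resolve.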

Tests on synthetic data
We have used synthetic gravity datasets to test the interpolation accuracy of the different horizontal and vertical source distribution strategies, as well as the gradient-boosted equivalent sources method. To generate the data, we created a model of 64 right-rectangular prisms, distributed in a 111319 m × 111319 m area with depths varying between 10000 m and zero. The density contrast of the prisms ranges from −900 kg m−3 to 500 kg m−3. The model includes prisms of different shapes, sizes, and depths to create gravity disturbances with a variety of wavelengths. We created two synthetic datasets from the model, one simulating a ground survey and another an airborne acquisition (Fig. 4). To create the synthetic ground survey, we selected measurement positions from a portion of a public domain gravity dataset for Southern Africa, available through the NOAA National Centers for Environmental Information (NCEI). For the synthetic airborne survey, we used a portion of the Great Britain Aeromagnetic Survey acquired by Hunting Geology and Geophysics Ltd and Canadian Aeroservices Ltd between 1955 and 1965 and made publicly available by the British Geological Survey (BGS). In both cases, we rescaled the horizontal coordinates of each survey portion to span an area of 111319 m × 110576 m, matching the model dimensions. The ground survey contains 963 observations distributed at heights between 0 m and 2052.2 m (Fig. 4a). The airborne survey has 5673 observations at heights between 359 m and 1255 m (Fig. 4c).
The vertical component of the gravitational acceleration generated by the model was computed using the method of Nagy et al. (2000, 2002) with recent modifications by Fukushima (2020), as implemented in the open-source software Harmonica (Uieda et al., 2020b). We generated a target grid of 57 × 56 points with a spacing of 2 km, located 2000 m above the zero height plane (Fig. 5), to serve as a reference when calculating the interpolation error. We then generated synthetic ground (Fig. 4b) and airborne (Fig. 4d) data, to which we added pseudo-random Gaussian noise with zero mean and 1 mGal standard deviation.

Source distribution strategies
We investigated the effect on interpolation accuracy of different strategies for distributing the equivalent sources horizontally and vertically. To do this, we used the damped least-squares solution described in Section 2.2 (without gradient boosting) to interpolate the synthetic datasets (Fig. 4) and compared the results against the target grid (Fig. 5). This process was repeated for each combination of horizontal layout (sources below data and block-averaged sources) and depth type (constant, relative, and variable) and for regular grid sources with a constant depth, totalling 7 different combinations. Each source distribution strategy requires certain hyper-parameters to be chosen in order to build the set of point sources. For example, using a constant depth needs the definition of the depth and using block-averaged sources requires the definition of the block size. The predictive capabilities of the equivalent sources depend on the choice of these hyper-parameters. To ensure that our comparisons are fair, we perform an exhaustive search over combinations of hyper-parameter values (including the damping parameter from Eq. 8) to obtain the best prediction that can be achieved by each source distribution strategy. Here, the best prediction is defined as the one that minimizes the root mean-square error (RMS) between interpolated values and the target grid (Fig. 5).
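The exhaustive search can be sketched generically. Here `fit_predict` is a hypothetical stand-in for building the equivalent sources with a given hyper-parameter combination and predicting on the target grid; the toy example below just brute-forces a constant:

```python
import itertools
import numpy as np

def grid_search(fit_predict, target, param_grid):
    """Try every hyper-parameter combination and keep the one that
    minimizes the RMS error against the target grid values.
    """
    names = list(param_grid)
    best_rms, best_params = np.inf, None
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        predicted = fit_predict(**params)
        rms = np.sqrt(np.mean((predicted - target) ** 2))
        if rms < best_rms:
            best_rms, best_params = rms, params
    return best_rms, best_params

# Toy stand-in: the "prediction" is a constant level plus an offset.
target = np.full(100, 3.0)
fit_predict = lambda level, damping: np.full(100, level) + damping
best_rms, best_params = grid_search(
    fit_predict, target, {"level": [1.0, 2.0, 3.0], "damping": [0.0, 0.5]}
)
```

In the actual experiments each candidate combination would involve fitting the equivalent sources and predicting on the target grid, so the search cost grows with the product of the grid sizes; cross-validation is a drop-in alternative for the scoring step.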
The parameter values used in these searches and the ones producing the smallest RMS error are outlined in Tables 1 and 2. Figs. 6 and 7 show the differences between the target grid and the best prediction achieved by each source distribution strategy for the ground and airborne synthetic surveys, respectively. For the synthetic ground survey, the horizontal layouts produced similar RMS values of approximately 0.8 mGal regardless of the depth type, with the exception of the regular grid layout, which produced a larger RMS of 0.97 mGal. The differences between the target grid and the interpolated values are larger in regions of poor data coverage. Edge effects are present for all strategies but are noticeably smaller for the combination of block-averaged sources with a variable depth based on the nearest-neighbour distance. For the synthetic airborne survey, all strategies (including the regular grid) produced similar RMS errors of approximately 0.3 mGal. The maps of the differences between the target grid and

Window size and overlap in gradient boosting
We assessed the trade-offs in interpolation accuracy and computation time of the gradient-boosted equivalent sources algorithm as a function of the two key controlling factors: the window size and the amount of overlap between adjacent windows. The comparisons were performed against a regular least-squares solution (Eq. 9) using the synthetic airborne data (Fig. 4c-d). To avoid biasing the results, we used the same locations of equivalent sources for both the regular and gradient-boosted interpolations, namely block-averaged sources with a block size of 2000 m and a relative depth of 9000 m.

Window size
The size of the windows controls the size of the Jacobian matrices Ã_k by limiting the number of data points and equivalent sources used in each step of the gradient-boosting algorithm (Alg. 2). Thus, using smaller windows will reduce the total amount of computer memory required to estimate the source coefficients. Nevertheless, smaller windows may produce less accurate interpolations by failing to achieve the global minimum of the goal function in Eq. 8. The window size might also impact the computation time in non-intuitive ways, since smaller windows generate smaller least-squares problems but also require more gradient-boosting iterations. We calculated the interpolation RMS error (between the interpolated grid and the target grid in Fig. 5) and the computation time for a fixed window overlap of 50% and several window sizes. To avoid any biases introduced by the shuffling of windows, the calculations were repeated using different seeds for the pseudo-random number generator used in the shuffling. Fig. 8a shows the RMS error and Fig. 8c shows the computation time required for estimating the source coefficients, both as functions of the window size.
These results show that the interpolation error for gradient-boosting is generally larger than the error for regular equivalent sources. The error decreases asymptotically to within ∼ 40% of the regular equivalent sources for windows with an area greater than ∼ 10% of the survey area. The computation time similarly decreases with window size, with the gradient-boosting being generally faster than the regular equivalent sources for windows with an area greater than ∼ 5% of the survey area. As the window size increases, both RMS error and computation time appear to stabilize to nearly constant levels.
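The per-window fitting loop behind these experiments (Alg. 2) can be sketched as follows. This is an illustrative reimplementation, not the paper's code: `greens` is a placeholder 1/distance point-source kernel, and the `windows` argument is assumed to be a pre-built list of (data indices, source indices) pairs for each overlapping window.

```python
import numpy as np

def greens(coords, sources):
    # Illustrative 1/distance kernel of a point source (physical constants omitted)
    diff = coords[:, None, :] - sources[None, :, :]
    return 1.0 / np.sqrt((diff ** 2).sum(axis=2))

def gradient_boosted_fit(coords, data, sources, windows, damping, seed=0):
    """Sketch of gradient-boosted equivalent sources: fit each window's
    sources to the current residual field, in randomized window order."""
    coeffs = np.zeros(len(sources))
    residual = data.astype(float).copy()
    order = np.random.default_rng(seed).permutation(len(windows))
    for k in order:  # randomized order speeds up convergence
        data_idx, src_idx = windows[k]
        # Jacobian restricted to this window's data points and sources
        jac = greens(coords[data_idx], sources[src_idx])
        # damped least squares on the current residual
        hess = jac.T @ jac + damping * np.identity(len(src_idx))
        delta = np.linalg.solve(hess, jac.T @ residual[data_idx])
        coeffs[src_idx] += delta
        # update the residual at every data point, not only inside the window
        residual -= greens(coords, sources[src_idx]) @ delta
    return coeffs
```

With a single window covering all data and sources and zero damping, the sketch reduces to an ordinary least-squares fit and reproduces the data exactly.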

Window overlap
The amount of overlap between adjacent windows plays an important role in the performance of the gradient-boosted equivalent sources. It controls the number of iterations and how many times a particular source is used in the least-squares fitting process. The experiments in the previous section showed that 50% overlap was sufficient to achieve acceptable interpolation accuracy. However, we studied the impact of the amount of window overlap on both accuracy and computation time separately.
We performed a similar experiment to the one in section 3.2.1 but this time kept the window size fixed at 30000 m and varied the amount of overlap from 0% to 95% with a step size of 5%. All other experimental procedures remained unchanged. Fig. 8b shows the RMS error and Fig. 8d shows the computation time required for estimating the source coefficients, both as functions of the window overlap.
Our results show that the interpolation RMS error decreases with the amount of overlap, reaching the same accuracy as the regular equivalent sources at approximately 90% overlap. On the other hand, the computation time increases with the amount of overlap, becoming larger than that of the regular equivalent sources for overlaps greater than 70%. This is expected since increasing the overlap adds iterations to the gradient boosting algorithm without decreasing the individual least-squares problem sizes to compensate.
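The growth in iteration count with overlap can be made concrete with a back-of-the-envelope count of window positions along one dimension (a hypothetical helper, not taken from the paper's code):

```python
import math

def n_windows_1d(region_size, window_size, overlap):
    """Number of window positions along one dimension for a given
    fractional overlap; windows advance by window_size * (1 - overlap)."""
    step = window_size * (1 - overlap)
    return math.ceil((region_size - window_size) / step) + 1
```

For a 100 km region and 30 km windows, 50% overlap gives 6 positions per dimension while 90% overlap gives 25, which illustrates why computation time rises sharply at high overlaps: every extra window position adds a least-squares solve of unchanged size.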

Interpolation with gradient boosting
Finally, we applied the gradient-boosted equivalent sources to interpolate the synthetic airborne survey (Fig. 4). As previously, we used the block-averaged sources layout with a block size of 2 km. Based on the results from section 3.2, we adopted a window overlap of 50% and a window size of 20 km.
We estimated the relative depth of the sources and the damping parameter by comparing the predictions against the values of the target grid. The search explored depth values between 1000 m and 19000 m and damping values between 1e-06 and 10 in steps of one order of magnitude. The most accurate predictions achieved an RMS error of 0.38 mGal with a depth of 3000 m and a damping of 0.1. It is worth noting that the RMS error achieved by the gradient-boosted equivalent sources is comparable to those obtained by the regular equivalent sources in Section 3.1. To highlight the importance of randomizing the order of windows in the gradient-boosting iterations, we performed the interpolation once more using the same values of damping and depth but this time iterating over windows in sequential order (South to North, West to East).
Figs. 9a-b show the differences between the target grid and the interpolation results for windows in randomized and sequential order, respectively. The differences for randomized windows resemble those for regular least-squares equivalent sources seen in Figs. 6 and 7. On the other hand, the differences for sequential windows show a clear trend of large negative differences in the South decreasing towards the North. This trend is correlated with the order in which windows are executed, with differences decreasing in absolute value towards the end of the algorithm. Fig. 9c shows the RMS error of the fitting process after each iteration for both window orders, clearly indicating that a randomized window order leads to faster convergence of the algorithm.

Figure 8: Window overlap is given as a percentage of the window size (an overlap of 50% means that two adjacent windows share an area half the size of the entire window). For gradient-boosting, the RMS errors and computation times are the means (error bars are 1 standard deviation) of results using different seeds for the pseudo-random number generator. Computation time is given as the ratio between the time required to estimate the source coefficients for the gradient-boosted and the regular equivalent sources.

Gridding gravity data from Australia
This section demonstrates how gradient-boosted equivalent sources can be used to interpolate large datasets onto regular grids at a uniform height. For this purpose, we selected an open-access compilation of ground gravity surveys over Australia made by Wynne (2018) and filtered and referenced to the WGS84 ellipsoid by Uieda (2021). It contains over 1.7 million data points and covers most of the Australian territory at variable point spacings. Our goal is to create a 1 arc-minute resolution grid of gravity disturbances at a constant geometric height of 2127.58 m (the largest height of observations).
We computed the gravity disturbance by removing the normal gravity of the WGS84 ellipsoid from the observed gravity data (Fig. 10). Here, normal gravity was computed at each observation point through the closed-form formula of Li and Götze (2001) using the Boule software (Uieda and Soler, 2020). Finally, we converted the observations to planar Cartesian coordinates by applying a Mercator projection.
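As a simplified illustration of this step, the normal gravity removal can be sketched with Somigliana's formula and the standard WGS84 constants. Note this is a deliberate simplification: it is only valid on the ellipsoid surface, whereas the paper uses the closed-form of Li and Götze (2001) via Boule, which also accounts for the height of each observation point.

```python
import math

# Standard WGS84 parameters for Somigliana's formula
GAMMA_EQUATOR = 9.7803253359    # normal gravity at the equator (m/s^2)
SOMIGLIANA_K = 0.00193185265241
ECC_SQUARED = 0.00669437999014  # first eccentricity squared

def normal_gravity_mgal(latitude_deg):
    """Normal gravity on the WGS84 ellipsoid surface, in mGal.
    Ignores observation height, unlike the formula used in the paper."""
    sin2 = math.sin(math.radians(latitude_deg)) ** 2
    gamma = GAMMA_EQUATOR * (1 + SOMIGLIANA_K * sin2) / math.sqrt(1 - ECC_SQUARED * sin2)
    return gamma * 1e5  # convert m/s^2 to mGal

def gravity_disturbance(observed_mgal, latitude_deg):
    # disturbance = observed gravity minus normal gravity at the same point
    return observed_mgal - normal_gravity_mgal(latitude_deg)
```

The disturbance is small (tens of mGal) compared with the full field (about 9.8 x 10^5 mGal), which is why the removal must use the same reference ellipsoid as the data.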
We start the interpolation process by defining a set of block-averaged sources using a block size of 1.8 km, resulting in a total of 796744 point sources. The block size was chosen to match the desired resolution of the final grid (1 arc-minute is approximately 1.8 km at the equator). Based on the results obtained in Section 3.1, we chose to use the relative depth strategy. The window overlap was once again fixed at 50%. To determine the size of the windows, we calculated the amount of computer memory needed to store the largest Jacobian matrix for different values of window size (Fig. 11a). We chose a size of 225 km in order to limit the amount of memory needed to under 16 gigabytes.
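The memory estimate behind this choice is straightforward: a dense Jacobian stores one float64 value per data point per source. A sketch (the helper name is hypothetical):

```python
def jacobian_memory_gib(n_data, n_sources, bytes_per_value=8):
    """Memory (in GiB) needed to store a dense Jacobian of float64 values."""
    return n_data * n_sources * bytes_per_value / 1024 ** 3
```

For the full Australian problem (1.7 million data points and 796744 sources), the dense Jacobian would need on the order of 10000 GiB, roughly 10 TiB, which is why the fitting must be broken into windows whose individual Jacobians fit in RAM.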
We determined the depth of the sources and the damping parameter by applying K-Fold cross-validation through the scikit-learn library (Pedregosa et al., 2011). The method randomly divides the original data into k sets (folds), fits the model using only data from k − 1 folds, and validates the model by comparing its predictions against the one remaining fold. This process is carried out once for each of the k folds, leading to an estimated mean cross-validation RMS error for the model. To speed up the computation, we only performed the cross-validation on a subset of the data corresponding to an area of 300 km×300 km containing 14934 points. We ran the cross-validation repeatedly for combinations of depth, ranging from 1000 m to 15000 m, and damping, from 0.01 to 10000 in steps of one order of magnitude. Figure 11c shows the resulting cross-validation RMS errors and highlights the minimum value of 1.33 mGal, which corresponds to a relative depth of 3000 m and a damping equal to 100.
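The fold construction described above amounts to the following; this is a minimal pure-Python stand-in for scikit-learn's `KFold(shuffle=True)`, written out to make the train/validate split explicit:

```python
import random

def kfold_splits(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold, after shuffling.
    A minimal stand-in for sklearn.model_selection.KFold(shuffle=True)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        # train on every fold except the held-out one
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx
```

Fitting the equivalent sources on each `train_idx` and scoring predictions on the matching `test_idx` gives one RMS error per fold; the mean over folds is the cross-validation score used to compare depth/damping combinations.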
Finally, we proceeded to estimate the source coefficients using the entire dataset and the parameters previously determined. The estimated source coefficients were then used to predict the values of the gravity disturbance on a regular grid of 2442×2085 points at 2127.58 m above the ellipsoid. On a modest workstation with 16 cores and 16 gigabytes of RAM, estimating the 796744 coefficients with gradient-boosting took ∼ 1.3 hours and the prediction step took ∼ 18 minutes. Fig. 10 shows the original data distribution and the interpolated grid. Grid points that are further than 50 km from the nearest data point are masked to avoid unrealistic extrapolations. Fig. 11b shows the RMS error against the observed data after each iteration of the algorithm. Fig. 12 shows the difference between the observed and predicted gravity disturbances. The inset figure shows a histogram of these residuals, which are approximately normally distributed around zero.

Figure 9: Interpolation error for gradient-boosted equivalent sources using randomized (a) and sequential (b) window order. (a and b) Pseudo-color maps of the differences between the target grid and the interpolated synthetic airborne survey data. The color scale has been cropped to the same range as Fig. 7. (c) Root-mean squared error after each iteration of the gradient-boosting algorithm.

Location of sources
The results of our tests on synthetic data (Figs. 6 and 7) show that there are no significant differences in interpolation accuracy between source distribution strategies, both in terms of the interpolation RMS errors and from visual inspection of the difference maps. Therefore, we conclude that all source distribution strategies are able to produce comparable interpolations. Nevertheless, the block-averaged sources strategy makes use of fewer sources when compared with other strategies, which reduces the computational load of estimating the source coefficients and forward modelling. To ensure that the interpolation is able to reproduce the high frequencies in the data, the block size used in the averaging should be chosen to match the desired grid resolution.
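The block-averaged layout itself can be sketched in a few lines. This is an illustration only (the paper's implementation uses Verde's block reductions): one source is placed per occupied block, at the mean of the data coordinates falling inside that block.

```python
from collections import defaultdict

def block_averaged_sources(points, block_size):
    """One source per occupied square block, located at the mean of the
    horizontal data coordinates that fall inside that block."""
    blocks = defaultdict(list)
    for x, y in points:
        # index of the block that contains this point
        blocks[(int(x // block_size), int(y // block_size))].append((x, y))
    return [
        (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for pts in blocks.values()
    ]
```

Empty blocks produce no sources, so the number of sources never exceeds the number of data points and shrinks wherever data are densely clustered.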
The choice of source depth strategy does not appear to significantly impact the interpolation RMS error. In the particular case of a sparse ground survey with block-averaged sources, the use of a variable depth visibly reduced edge effects and artifacts in areas of poor data coverage. At first glance, the choice of a depth strategy would not seem to impact the computation time. However, when searching for the set of hyper-parameters that produce the most accurate interpolation (e.g., through cross-validation), one must solve the inverse problem once for every possible combination of parameters. A strategy like the variable depth requires more hyper-parameters (depth shift ∆z, depth factor α, and the number of nearest neighbours k from Eq. 14) than strategies that only require a single parameter. Having more parameters increases the dimensions of the parameter space and thus the number of possible combinations. We therefore recommend using a constant depth or a relative depth when processing large datasets in order to minimize computation time.

Gradient boosting
From Figs. 8a and 8c, we can see that the gradient-boosted equivalent sources produce slightly less accurate interpolation results but achieve smaller computation times than regular equivalent sources. The reduction in accuracy might be due to the gradient boosting algorithm failing to converge to the global minimum of the goal function. As the windows increase in size, the interpolation error decreases because more data points are included in the least-squares fitting of the source coefficients. At the same time, the fitting process becomes faster because fewer iterations are needed. Our results indicate that it is desirable to maximize the window size, up to the point where the Jacobian matrices still fit within the available computer memory.
The results shown in Figs. 8b and 8d indicate that window overlap values between 40% and 70% strike a balance between accuracy and computation time. This corroborates our initial choice of 50% overlap, which is sufficient for producing accurate predictions in reasonable time.
Finally, the results in Fig. 9 highlight the importance of randomizing the order in which the overlapping windows are iterated. Running the gradient boosting algorithm sequentially produces less accurate predictions and decreases the convergence rate of the method.

Figure 11: (a) Amount of computer memory needed to store the largest Jacobian matrix for different window sizes. Our implementation uses double precision floating point numbers (64 bits) for the Jacobian. (b) Root-mean square error against the observed data after each iteration of the gradient-boosting algorithm. (c) K-Fold cross-validation root-mean square errors obtained for each pair of damping and depth parameters. The orange star highlights the minimum.

Australia gravity data
The application of the gradient-boosted equivalent sources to the Australian gravity dataset demonstrates that the method is able to interpolate and upward-continue large datasets in a reasonable amount of time using only modest computational resources. The resulting grid (Fig. 10) preserves the high resolution of the original data while avoiding aliasing artifacts due to the block averaging of the source locations. Some parts of the grid are smoother and have lower amplitudes than the original data (e.g., some southwestern parts), which is expected from the upward continuation performed to obtain the grid at a constant height. From the cross-validation analysis on a subset of the data, we estimate that the interpolation error is approximately 1.33 mGal.

Figure 12: Residuals. Differences between the gravity disturbance data from Australia and the values predicted by the estimated equivalent sources at the same observation points. The color map has been truncated to improve the visualization around the largest portion of residual values. The inset plot shows a histogram of the residuals.
The largest residuals in Fig. 12 are located in regions with high-amplitude short-wavelength features in the observed data. This is expected since the method involves some degree of smoothing because of the use of damping and the source depths. There are also low-amplitude long-wavelength residual signals that seem to coincide with some of the windows of the gradient-boosting method. A possible cause of these features is the inability of the equivalent sources within a window to adequately fit the long-wavelength components of the data. We note, however, that all of these long-wavelength residuals are smaller than 1 mGal in amplitude and do not represent a significant source of error.
The elongated valley around the minimum of the cross-validation RMS errors (Fig. 11c) shows that there is ambiguity in the choice of damping and source depths. One could choose a large damping with a small depth or a small damping with a large depth to achieve roughly the same interpolation result. This is expected since both parameters control the smoothness of the interpolation.

Conclusions
The equivalent source technique has proven to be well suited for interpolating gravity disturbances and magnetic anomalies. The two main reasons it stands out from other 2D interpolation methods are that the equivalent sources take into account the height of the observations and that the interpolated values are always harmonic functions. The main challenge of using equivalent sources in practice is the high computational load of estimating the source coefficients, especially the computer memory needed to store the Jacobian matrix.
We present two strategies that can be applied simultaneously to interpolate datasets with millions of points on modest hardware: block-averaging source locations, which reduces the number of equivalent sources needed for the interpolation, and the gradient-boosted equivalent source algorithm, which breaks down the inverse problem into smaller sets of equivalent sources defined by overlapping windows. Both methods were tested against synthetic datasets to compare their accuracy and computational efficiency.
Our results show that block-averaged sources reduce the computational load needed to estimate source coefficients in comparison to two traditional strategies (placing sources below data points or on regular grids). We also show that this reduction in the number of sources does not affect the accuracy of the predictions. The use of block-averaged sources may also prevent aliasing of the interpolated values, especially when the observations are unevenly sampled (e.g., airborne and shipborne surveys). Special attention must be paid when choosing the size of the blocks for averaging. As a rule of thumb, we recommend choosing a size approximately equal to the resolution of the regular grid onto which the values will be interpolated.
Tests that compared strategies for the vertical location of the sources showed that any one of the three strategies tested (constant depth, relative depth, and variable depth) produces comparable interpolation accuracy. Nevertheless, we are inclined to recommend either the constant depth or the relative depth for most applications because they involve fewer hyper-parameters that need to be configured before the actual interpolation.
Gradient-boosted equivalent sources were shown to greatly reduce the computer memory needed to estimate source coefficients, making it possible to interpolate large datasets with millions of points that would otherwise produce Jacobian matrices larger than the available memory. The interpolations obtained through this new method achieve close to the same accuracy as the regular equivalent sources while reducing the computation time by approximately a factor of three. We also show that an overlap of 50% between adjacent windows achieves a good compromise between accuracy and computation time. The size of the overlapping windows should be chosen as the largest value that still produces Jacobian matrices that fit into computer memory. Moreover, randomizing the order in which the windows are iterated increases the convergence rate of the algorithm and is essential to producing accurate predictions.
The gradient-boosting method can be used in conjunction with any horizontal source layout, depth strategy, or source type (e.g., point sources, prisms, tesseroids) because it does not rely on assumptions about the sources. Future research should investigate the application of gradient boosting to other equivalent source methods.

Data and code availability
The Python source code used to produce all results and figures presented here is available at https://doi.org/ 10.6084/m9.figshare.13604360 and https://github.com/ compgeolab/eql-gradient-boosted under the BSD 3-clause open-source license.
The gradient-boosted equivalent sources implementation is based on the equivalent source code in the Harmonica library (Uieda et al., 2020b). Other software used in this study includes: Pooch (Uieda et al., 2020a) for downloading and caching datasets, Verde (Uieda, 2018) for block reductions and coordinate manipulations, Boule (Uieda and Soler, 2020) for normal gravity calculations, xarray (Hoyer and Hamman, 2017) and Numpy (Harris et al., 2020) for handling multidimensional arrays and numerical computations, Numba (Lam et al., 2015) for just-in-time compilation and parallelization, scikit-learn (Pedregosa et al., 2011) for cross-validation, Matplotlib (Hunter, 2007) and PyGMT (Uieda et al., 2020c) for generating the figures and maps, and the Jupyter notebook programming environment (Kluyver et al., 2016). Harmonica, Boule, Pooch, and Verde are part of the Fatiando a Terra project .
All datasets used are open-access and publicly available.
The synthetic surveys were generated using a public domain gravity dataset for Southern Africa distributed by the NOAA NCEI (https:// www.ngdc.noaa.gov/mgg/gravity/gravity.html) and the Great Britain Aeromagnetic Survey distributed by the British Geological Survey (BGS) under an Open Government License (https://www.bgs.ac.uk/products/ geophysics/aeromagneticRegional.html). The shaded relief in Fig. 10 is the SRTM15+ dataset by Tozer et al. (2019). The Australian ground gravity data is based on a compilation distributed by Geoscience Australia under a Creative Commons Attribution 4.0 International Licence (Wynne, 2018) which was filtered and referenced to the WGS84 ellipsoid by Uieda (2021) and is distributed under the same license (https://doi.org/10.6084/ m9.figshare.13643837).

Acknowledgements
We are indebted to the developers and maintainers of the open-source software without which this work would not have been possible. We would also like to thank Editor Frederik Simons, Assistant Editor Fern Storey, and two anonymous reviewers for their constructive comments. S.R. Soler is supported by a scholarship from CONICET, Argentina. This work contains British Geological Survey materials © UKRI. S.R. Soler and L. Uieda jointly developed the initial idea, analysed the results, and wrote the paper. S.R. Soler produced all results and developed the software implementation with the assistance of L. Uieda.

A Source position parameters used for the tests on synthetic data
Tables 1 and 2 show the parameter values tested and their optimal values for creating the equivalent source distributions tested in Section 3.1. The optimal values were used to produce the results in Figs. 6 and 7.