Automatic Slowness Vector Measurements of Seismic Arrivals with Uncertainty Estimates using Bootstrap Sampling, Array Methods and Unsupervised Learning


Horizontal slowness vector measurements using array techniques have been used to analyse many Earth phenomena, from lower mantle heterogeneity to meteorological event location. While providing observations essential for studying much of the Earth, slowness vector analysis is limited by the necessary and subjective visual inspection of observations. Furthermore, it is challenging to determine the uncertainties caused by limitations of array processing, such as array geometry, local structure and noise, and their effect on slowness vector measurements. To address these issues, we present a method to automatically identify seismic arrivals and measure their slowness vector properties with uncertainty bounds. We do this by bootstrap sampling the waveforms, therefore also creating random sub-arrays, then using linear beamforming to measure the coherent power at a range of slowness vectors. For each bootstrap sample, we take the top N peaks from the power distribution as the slowness vectors of possible arrivals. The slowness vectors of all bootstrap samples are gathered and the clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is used to identify arrivals as clusters of slowness vectors. The mean of the slowness vectors in each cluster gives the slowness vector measurement for that arrival and the distribution of slowness vectors in each cluster gives the uncertainty estimate. We tuned the parameters of DBSCAN using a data set of 2489 SKS and SKKS observations at a range of frequency bands from 0.1 to 1 Hz. We then present examples at higher frequencies (0.5–2.0 Hz) than the tuning data set, identifying PKP precursors, and at lower frequencies (0.04–0.06 Hz), identifying multipathing in surface waves. While we use a linear beamforming process, this method can be implemented with any beamforming process, such as cross-correlation beamforming or phase-weighted stacking.
This method allows for much larger data sets to be analysed without visual inspection of data. Phenomena such as multipathing, reflections or scattering can be identified automatically in body or surface waves and their properties analysed with uncertainties.

Past studies which analysed slowness vector properties using array methods (for a review see: Rost & Thomas, 2002) were limited in the number of observations by the usual requirement to visually inspect each observation to determine an arrival's slowness vector properties, or whether it is too noisy to use. In addition, measurement uncertainties are usually not included in the analysis. Automating the identification of arrivals and the measurement of their slowness vector properties would remove the time-consuming and subjective process of visually inspecting each observation and could allow larger data sets to be analysed. Estimating the uncertainty of these measurements allows for better interpretation of the observations and the ability to rigorously accept or reject scientific hypotheses on Earth structure or its processes. Previous efforts have been made in automating standard seismic processing techniques such as shear wave splitting (Teanby et al., 2004) and H − κ stacking (Ogden et al., 2019). Methods also exist to estimate uncertainties in the beamforming methodology (Lin & Roecker, 1996; Bear & Pavlis, 1997; Ritsema et al., 2020). To our knowledge, no method has been developed to automate measuring the full slowness vector properties with uncertainties. Machine learning methodologies are becoming more prevalent in the geosciences (for a review see: Bower et al., 2013) and seismology (for a review see: Kong et al., 2019), with methods used to automate data selection (e.g. Valentine & Woodhouse, 2010; Thorne et al., 2020) and to extract properties from data, by mapping seismograms to a lower dimensional space using autoencoders (Valentine & Trampert, 2012) or by treating seismograms as sequences to identify features such as the presence of seismic scatterers (Kim et al., 2020). Here we use an unsupervised learning algorithm as part of our automation technique.
In the approach we present in this paper, we create subsets of waveforms using bootstrap sampling (Efron, 1992). For each sample, beamforming (Rost & Thomas, 2002) corrected for a curved wavefront (Ward et al., 2020) is used to search over a range of slowness vectors and recover the slowness vectors of potential seismic arrivals. The slowness vector measurements of all the individual bootstrap samples are collected and we use the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm (Ester et al., 1996) to identify clusters of slowness vectors as seismic arrivals. DBSCAN is an unsupervised learning algorithm which uses the density of points to classify them as part of a cluster or as noise. For further details, see Section 2.
We tune the parameters of the DBSCAN algorithm on a visually inspected dataset where each observation is labelled as having 0, 1 or 2 arrivals. More arrivals are possible, but in this dataset the maximum number confidently observed is 2. In this dataset, observations with more than one arrival are hypothesised to be caused by multipathing, one of many phenomena which can cause multiple arrivals. Multipathing occurs when the wavefront is incident on a sufficiently large velocity gradient, causing different parts of the wavefield to travel at different velocities, diffract and refract. This results in 2 arrivals reaching the station at different times and with different slowness vector properties. The predictions made by the method are compared to the labels given from visual inspection to find the best parameters for the DBSCAN algorithm. Following this, we show the effectiveness of this automated method in finding the slowness vector properties of short-period PKP scattering and long-period surface wave arrivals. Guidance on using the method is given in Section 5. We find the parameters work well for our example applications, with a minor change needed for the surface wave example. The algorithm can also be tuned for specific applications.

METHOD OVERVIEW
This section outlines the method to automatically measure the slowness vector properties with uncertainty estimates. The process can be roughly broken down into the following steps with more detail given below.
(i) Create a number of bootstrap sub-samples (1000 here) through random sampling with replacement of a set of waveforms recorded at the seismic array in question.
(ii) For each sample, use beamforming (Rost & Thomas, 2002) correcting for a curved wavefront (Ward et al., 2020) to find how the power of coherent energy varies with backazimuth and horizontal slowness.
(iii) Make a noise estimate in each bootstrap sample by randomly scrambling the traces in time, stacking them and measuring the power. This is repeated 1000 times for each sample and the mean power is taken as the noise estimate.
(iv) Set all power values below the noise estimate to zero.
(v) From the resultant power distribution, take up to X peaks (in this study we take up to 3 peaks), which describe the slowness vectors of possible arrivals.
(vi) Store the locations for these peaks of all the bootstrap samples.
(vii) Use DBSCAN, a density-based clustering algorithm, to identify the arrivals and measure their slowness properties with uncertainties.
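As an illustration, the bootstrap and peak-picking loop (steps i, ii, v and vi) could be sketched as below. This is a toy sketch, not the actual implementation: the beamformer is a stand-in returning a synthetic power surface rather than the curved-wavefront beamforming of Ward et al. (2020), the noise-thresholding steps (iii and iv) are omitted, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def beam_power(traces, slow_grid, baz_grid):
    # Stand-in for the curved-wavefront beamformer (Ward et al., 2020):
    # a smooth synthetic power surface with one peak near
    # backazimuth 60 deg and horizontal slowness 4 s/deg.
    peak = np.exp(-((slow_grid - 4.0) ** 2 + (baz_grid - 60.0) ** 2 / 100.0))
    # small random roughness stands in for sample-to-sample variability
    return peak + 0.02 * rng.random(slow_grid.shape)

def top_peaks(power, slow_ax, baz_ax, n_peaks=3):
    # step (v): take up to n_peaks largest grid nodes as candidate arrivals
    flat = np.argsort(power, axis=None)[::-1][:n_peaks]
    iy, ix = np.unravel_index(flat, power.shape)
    return list(zip(baz_ax[iy], slow_ax[ix]))

n_stations, n_boot = 20, 100
waveforms = rng.normal(size=(n_stations, 512))
slow_ax = np.arange(2.0, 6.0, 0.05)   # horizontal slowness axis (s/deg)
baz_ax = np.arange(40.0, 80.0, 0.5)   # backazimuth axis (deg)
slow_grid, baz_grid = np.meshgrid(slow_ax, baz_ax)

candidates = []
for _ in range(n_boot):
    # step (i): bootstrap sample of the stations, with replacement
    sample = waveforms[rng.integers(0, n_stations, n_stations)]
    # step (ii): beamform the sample over the slowness-vector grid
    power = beam_power(sample, slow_grid, baz_grid)
    # step (vi): store the peak slowness vectors of this sample
    candidates.extend(top_peaks(power, slow_ax, baz_ax))
```

The collected `candidates` list is what would then be passed to DBSCAN in step (vii).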
One advantage of the bootstrap sampling process is that bootstrap samples of the stations in the array are used. Beamforming subsets of the array leads to different peak powers in the beams, which leads to variations in the recovered slowness vectors for each arrival. When all of the slowness vectors from all of the bootstrap sampled arrays are taken into account, we obtain uncertainty estimates for the slowness vector. These uncertainty estimates will include the effect that array geometry and local structure have on the slowness vector measurements. For each bootstrap sample, we use a relative beamforming method where the traces are aligned on a target slowness before searching over the slowness vectors. After the beamforming, we calculate a noise estimate using the traces in the bootstrap sample, with a method similar to Korenaga (2013). The traces are aligned using the slowness vector with the highest power. Then, they are randomly shifted in time, stacked, and the power of the stack calculated. This is repeated 1000 times and the mean of all power estimates is used as the noise power estimate. All power values in the beamforming plot (Fig 1) below three times this noise estimate are set to zero. The factor of three was determined by exploratory analysis and found to give the most satisfactory result; it can be changed depending on the application. To remove local power maxima, the power distribution is smoothed. The peaks recovered for each bootstrap sample are then collected and the clustering algorithm DBSCAN (Ester et al., 1996) is used to find clusters. DBSCAN is an unsupervised learning algorithm which uses the density of points to identify clusters and noise. The algorithm takes a radius (ε) and a minimum number of points (MinPts) to define a minimum density for points to form a cluster.
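A minimal sketch of the time-scrambling noise estimate described above, assuming the traces are already aligned on the highest-power slowness vector; the circular time shifts and the reduced number of scrambles are simplifications for illustration, and the trace values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def noise_power_estimate(traces, n_scrambles=200):
    # Randomly shift each (pre-aligned) trace in time, stack, and
    # measure the power of the stack; the mean over many scrambles
    # estimates the power of incoherent noise (cf. Korenaga, 2013).
    n_traces, n_samps = traces.shape
    powers = np.empty(n_scrambles)
    for i in range(n_scrambles):
        shifts = rng.integers(0, n_samps, n_traces)
        stack = np.mean([np.roll(tr, s) for tr, s in zip(traces, shifts)],
                        axis=0)
        powers[i] = np.mean(stack ** 2)
    return powers.mean()

traces = rng.normal(size=(20, 512))       # one bootstrap sample (synthetic)
threshold = 3.0 * noise_power_estimate(traces)
power_grid = rng.random((50, 50))         # stand-in beamforming power grid
power_grid[power_grid < threshold] = 0.0  # zero everything below 3x noise
```

In the real method this thresholding is applied to the beamforming power distribution of each bootstrap sample before peak picking.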
Here, we define MinPts as a fraction of the number of bootstrap samples. DBSCAN sorts the data into three categories, as visualised in Fig 2.
(i) Core point: A point with at least MinPts points within its neighbourhood (i.e. within radius ε).
(ii) Boundary point: A point within the neighbourhood of a core point, but without MinPts points in its own neighbourhood.
(iii) Noise: Points that are not within ε of a core point and do not have MinPts points within their own neighbourhood.
DBSCAN has advantages over other clustering algorithms such as k-means (MacQueen et al., 1967) for this application:
(i) It does not take the number of clusters as input, so visual inspection before the clustering is not required.
(ii) Not all points need to be part of a cluster allowing for noise.
(iii) If clusters are not well separated or the data is noisy, clusters of non-hyperspherical shape can still be recovered unlike k-means (Ertöz et al., 2003;Celebi et al., 2013).
There are also disadvantages to DBSCAN:
(i) If the data and the appropriate range of ε are not well understood, choosing the parameters can be challenging.
(ii) Clustering data with large variations in density is challenging because there may be no combination of ε and MinPts which will find all of the clusters.
(iii) Clusters separated by a distance smaller than ε will be combined into one cluster.
[Figure caption: the clusters found by the DBSCAN algorithm (Ester et al., 1996) where MinPts is 0.25 and ε is 0.2 s/°. The red and yellow points are classified as clusters 1 and 2 respectively and the black points are noise. The background power distribution is the mean of all the power distributions found from bootstrap sampling.]
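For concreteness, a compact DBSCAN implementation in the spirit of Ester et al. (1996), applied to synthetic slowness vector picks; the two-component clustering space, the cluster positions and the point counts are illustrative assumptions, not the paper's actual measurements.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise. A point's
    neighbourhood includes itself, as in Ester et al. (1996)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:                      # grow the cluster outwards
            for k in neighbours[stack.pop()]:
                if labels[k] == -1:
                    labels[k] = cluster   # core or boundary point
                    if core[k]:
                        stack.append(k)   # only core points extend the cluster
        cluster += 1
    return labels

rng = np.random.default_rng(2)
# two tight clusters of bootstrap slowness vector picks plus sparse noise
arrival1 = rng.normal([2.0, 3.5], 0.05, size=(100, 2))
arrival2 = rng.normal([2.6, 4.2], 0.05, size=(100, 2))
stray = rng.uniform([0.0, 2.0], [5.0, 6.0], size=(15, 2))
picks = np.vstack([arrival1, arrival2, stray])
labels = dbscan(picks, eps=0.2, min_pts=25)  # MinPts = 0.25 x 100 samples
```

Two clusters (two arrivals) are recovered and the scattered picks are labelled as noise; production use would more likely call a tested implementation such as scikit-learn's `DBSCAN`.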
We tested other density-based clustering algorithms such as HDBSCAN (Campello et al., 2013, 2015) and OPTICS (Ankerst et al., 1999). HDBSCAN's default cluster-selection method, Excess of Mass (EOM), will preferentially return a large, single cluster because one large cluster will usually contain more "mass" (for a detailed explanation, see McInnes & Healy, 2017). To avoid one large cluster being returned when multiple clusters exist, HDBSCAN by default will not return a single cluster as an output. Keeping this default is not possible as most instances will have a single cluster (arrival).
Changing the default and allowing HDBSCAN to return one cluster will mean phenomena causing multiple arrivals (such as multipathing) may not be identified as EOM will preferentially return a single cluster.
OPTICS (Ordering Points To Identify the Clustering Structure) (Ankerst et al., 1999) is another density-based algorithm which specialises in identifying clusters of varying density. OPTICS orders the points to represent the clustering structure. From this, clusters can be extracted. When using OPTICS, we found the size of the clusters retrieved was too inconsistent to estimate the uncertainties of slowness vector properties. Furthermore, OPTICS is more computationally expensive. Because of these considerations, we decide to use DBSCAN.

Slowness Vector Uncertainty Estimates
We estimate the uncertainty of each arrival's measurement using the standard deviation of the backazimuths and horizontal slownesses of the points in its cluster, together with the area bounded by the 95% confidence ellipse of the cluster.
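A sketch of this uncertainty calculation on a synthetic cluster; the 95% confidence ellipse is assumed here to be the Gaussian covariance ellipse (chi-squared quantile 5.991 for two degrees of freedom), which is one common construction and may differ in detail from the paper's implementation.

```python
import numpy as np

def cluster_uncertainty(points):
    """points: (n, 2) array of (backazimuth, horizontal slowness) picks
    in one cluster. Returns the mean measurement, one-standard-deviation
    uncertainties and the 95% confidence ellipse area."""
    mean = points.mean(axis=0)
    std = points.std(axis=0, ddof=1)
    cov = np.cov(points.T)
    # for a 2-D Gaussian, the 95% ellipse spans chi2(0.95, df=2) = 5.991
    # squared Mahalanobis units, so its area is pi * 5.991 * sqrt(det(cov))
    area95 = np.pi * 5.991 * np.sqrt(np.linalg.det(cov))
    return mean, std, area95

rng = np.random.default_rng(3)
# synthetic cluster loosely mimicking the PKP example (58.6 deg, 2.93 s/deg)
cluster = rng.normal([58.6, 2.93], [2.3, 0.32], size=(500, 2))
mean, std, area = cluster_uncertainty(cluster)
```

The recovered standard deviations approximate the spread used to generate the cluster, illustrating how the cluster scatter maps to the reported uncertainties.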

PARAMETER TUNING
To find the best parameters to use with the DBSCAN algorithm (ε and MinPts), we compare the number of arrivals predicted by the algorithm to the number of arrivals identified from visual inspection. We use the same dataset as Ward et al. (2020), which consists of SKS and SKKS data recorded at the Kaapvaal array in southern Africa. Ward et al. (2020) make observations at a range of frequency bands (Table 1) using the whole Kaapvaal array and several sub-arrays. The traces are first aligned on the predicted slowness of SKS or SKKS, depending on the arrival of interest. The beamforming is conducted in a time window from 20 s before to 40 s after the predicted arrival.
The dataset provides a good test for the algorithm since it has clear single arrivals, multipathed arrivals (2 arrivals) and observations that are too noisy to identify any arrivals (0 arrivals). Each observation is labeled from visual inspection of the distribution and density of the points collected from all the bootstrap samples and the mean power distribution of all the bootstrap samples. If the algorithm predicts a higher number of arrivals than the human given labels, we assume here the algorithm has identified noise as arrivals. If the algorithm predicts a lower number of arrivals, the density threshold is too high for arrivals to be identified. Due to the subjective nature of the labelling this may not always be the case, but for the tuning process we assume the human labels are a ground truth. Observations where it was not clear whether there is one or two arrivals are labeled as "1-2 arrivals" and excluded from this tuning process.
We searched over a range of ε and MinPts values and predicted the number of arrivals in each observation. This is compared to the human labels in Table 1 and an accuracy score is calculated.
The accuracy score is defined as the number of instances where the method correctly predicts the number of arrivals, divided by the total number of observations. We test how well the algorithm generalises using cross validation. Cross validation involves splitting the dataset into N representative subsets (5 here). One of the subsets is removed, the grid search is conducted on the remaining N − 1 subsets and the best set of parameters recorded.
The removed subset acts as a validation set. Then we take these best parameters and make predictions on the validation set. For each combination of ε and MinPts, the number of arrivals in each observation is predicted and compared to the true labels (Table 1), giving counts of true positives (TP), false positives (FP) and false negatives (FN) for each label. From these measures, the precision is defined by P = TP/(TP + FP). This is essentially the proportion of instances predicted as the target label which truly have that label. The recall, R = TP/(TP + FN), is a measure of how many of the target labels have been recovered by the algorithm. The F1 score is the harmonic mean of the precision and recall: F1 = 2PR/(P + R). These scores show that the method is capable of greater than 90% agreement with the observations of a human. This is mainly from observations with one clear arrival, which make up the majority of the observations. The algorithm also performs well with more complex observations of multipathing, with an F1 score of over 0.75. The method is also quite insensitive to noise, as it does not regularly incorrectly identify noisy observations as arrivals, shown by an F1 score of over 0.85 for observations with 0 arrivals. As with the accuracy, we use cross validation to see how well the parameters generalise to new data. Table 2 shows the mean F1 scores for the individual labels.
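The precision, recall and F1 definitions above can be computed per label as follows; the labels here are toy values, not the tuning data set.

```python
def precision_recall_f1(true_labels, pred_labels, target):
    # one-vs-rest counts for a single target label (e.g. 2 arrivals)
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # P = TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0      # R = TP / (TP + FN)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of P, R
    return precision, recall, f1

# toy number-of-arrivals labels for ten observations
true = [1, 1, 1, 2, 2, 0, 0, 1, 2, 1]
pred = [1, 1, 1, 2, 1, 0, 0, 1, 2, 1]
p, r, f1 = precision_recall_f1(true, pred, target=2)
```

For the '2 arrivals' label in this toy example, both predicted twos are correct (P = 1.0) but one true two is missed (R = 2/3), giving F1 = 0.8.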
The cross validation analysis over all the labels and the F1 scores on the individual labels show the parameters ε = 0.20 s/° and MinPts = 0.25 are consistently found to be the best. Inferring how well the parameters generalise from this analysis is limited by the low number of cross validation samples (5 here). The low sample number was necessary because of the small number of labelled observations. Due to the subjective nature of labelling each observation with the number of arrivals, some difference between the method's prediction and the human labels is acceptable. To analyse how reasonable the predictions are when the technique disagrees with the human labels, we create a confusion matrix using the predictions with parameters of ε = 0.20 s/° and MinPts = 0.25 (Fig 7).
In the confusion matrix, each row represents a true label (number of arrivals in this case) and each column a predicted label. The value at each point in the matrix indicates how many times that true label is given the corresponding predicted label. For example, for all instances with the true label of 1 arrival, the confusion matrix shows how many are correctly classified as having one arrival and how many are incorrectly identified as having 0, 2 or 3 arrivals. We normalise the values along each row of the confusion matrix so that, for each true label, the columns show the proportion of the predictions given to that label. For example, for the instances with a true label of '0 arrivals', 80% of the predictions are correctly identified as having 0 arrivals, 18% are identified as having 1 arrival and so on. The confusion matrix shows that when the method's prediction differs from the human labels, the predictions it makes are not radically unreasonable. It is worth remembering that the labelling process is quite subjective, and just because the algorithm predicts a different number of arrivals to that labelled by a human does not mean it is wrong. It is possible that some of the human labels with two arrivals only have one arrival, or that some have three arrivals. Equally, it is possible some instances labelled with no arrivals do have one arrival that a human could not confidently identify above the noise. Fig 7 shows the algorithm makes reasonable predictions in the vast majority of cases for this data set using the parameters found from the tuning process and cross validation. Analysis of the uncertainty estimates shows the slowness vector measurements have small variation, with a mean standard deviation of 1.2° for the backazimuth measurements and 0.14 s/° for the horizontal slowness. The mean area bounded by the 95% confidence ellipse is 0.14 s²/°².
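The row-normalised confusion matrix described above can be built as follows; the labels are toy values, with four classes to mirror the 0-3 arrival predictions.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    # rows: true number of arrivals; columns: predicted number of arrivals
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # normalise each row so its columns show proportions of that true label
    return cm / np.where(row_sums == 0, 1.0, row_sums)

# toy labels: five '0 arrivals', three '1 arrival', two '2 arrivals' truths
true = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
pred = [0, 0, 0, 0, 1, 1, 1, 1, 2, 1]
cm = confusion_matrix(true, pred, n_classes=4)
```

Here 80% of the '0 arrivals' instances are correctly classified and 20% are misread as one arrival, the same style of reading applied to Fig 7.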
Analysis of the confusion matrix, in addition to the findings from the cross validation process, shows the parameters ε = 0.20 s/° and MinPts = 0.25 give reasonable results that generalise well. We use these parameters in the other applications, with a minor change for the application to surface waves (Section 4).

APPLICATIONS TO PKP SCATTERING AND RAYLEIGH WAVE MULTIPATHING
This section provides two example applications of this method to study Earth structure. First, we show an example identifying a PKP precursor in the high frequency teleseismic wavefield (0.5 to 2 Hz). Coherent precursors are indicative of scattering caused by small scale structures and our method can constrain uncertainties on their location. Then, we show an example of low frequency (0.04 to 0.06 Hz) Rayleigh wave multipathing. Using our method to identify Rayleigh wave multipathing, we can interpret possible causes of multipathing and provide uncertainties for phase velocity measurements. All measurements of backazimuth and horizontal slowness are shown with one standard deviation describing the uncertainties.

PKP precursors
The slowness vectors of PKP precursors are indicative of their scattering location and of whether they are caused by source or receiver side structure (Haddon & Cleary, 1974). We use PKP data from Thomas et al. (1999), who observe several scatterers beneath Europe and Eastern Asia. Of the data used in Thomas et al. (1999), we focus on a single event occurring on 15 September, 1992 which shows clear PKP precursors. We only use data recorded at the Grafenberg array and not the larger GRSN array to avoid spatial aliasing. In this example, the PKP precursors appear to be coherent from visual inspection of the seismograms (Fig 8). Coherent precursors suggest they probably originate from localised scatterers such as an Ultra Low Velocity Zone (ULVZ) (Ma & Thomas, 2020).
Fig 8 shows the traces used for this example and the clusters found by our algorithm. The data have the instrument response removed and are filtered between 0.5 and 2 Hz before the beamforming process. We used a time window of 10 s before the predicted PKIKP arrival and the same DBSCAN parameters found from the tuning (ε = 0.20 s/° and MinPts = 0.25). The method identifies a single precursor arriving with a backazimuth of 58.6° ± 2.3° and a horizontal slowness of 2.93 ± 0.32 s/°. This is similar to the slowness vector properties of the dominant arrival found by Thomas et al. (1999), arriving 6.5 s before PKIKP with a horizontal slowness of 2.8 s/° and backazimuth of 53.6°. Unlike Thomas et al. (1999), we identify only one precursor rather than three. We believe this is because our time window encompasses all precursors, meaning that if one precursor has a significantly higher amplitude it may be the only one recovered. Furthermore, visual inspection of the waveforms suggests a single dominant precursor (Fig 8). The range of possible horizontal slownesses of this PKP precursor inferred from the uncertainty of the measurement (2.93 ± 0.32 s/°) at a distance of approximately 140° means this precursor could originate from either source side or receiver side structure (Haddon & Cleary, 1974).

Rayleigh wave multipathing
The second example shows the identification of multipathed Rayleigh waves. From this observation, the phase velocities and backazimuths of the multipathed arrivals can be measured and analysed with uncertainty bounds. Xia et al. (2018) identify multipathing in Rayleigh waves in the western US and suggest this is caused by the transition from continental to coastal to oceanic structure, each with unique velocity profiles. We analyse Rayleigh waves from an event on 05 January 2013 recorded at the Southern California Seismic Array (CI) to identify multipathing and hypothesise some potential causes. The instrument response is removed and the traces are filtered between 0.04 and 0.06 Hz. The time window used in the relative beamforming is 200 s before and after the predicted arrival time assuming a velocity of 3.5 km/s. More observations than the points in each cluster would be needed to recover any anisotropic properties, but this example shows how our technique can be used to identify statistically significant differences in phase velocity measurements.

CODE GUIDELINES
This section outlines some guidance to use this technique in terms of parameter selection and computation time. There are many potential aspects of a study that can influence the method's effectiveness such as frequency bands, array size and configuration or local receiver side structure.
The tuning process (Section 3) shows we cover a range of frequency bands (Table 1). The number of peaks taken above the noise threshold should be equal to the maximum number of arrivals of interest or expected to be possible. The noise threshold of three times the noise estimate was determined through exploratory analysis and found to give satisfactory results, but it can be changed depending on the application. DBSCAN parameters ε and MinPts of 0.20 s/° and 0.25 respectively work well for identifying single arrivals and are relatively insensitive to noise. If the study is searching for multipathing, changing MinPts to 0.15 and keeping ε at 0.20 s/° increases the accuracy on the multipathed arrivals from 66% to 75% but decreases the accuracy on the noisy observations from 80% to 44%. These alternative parameters would require visual inspection of the observations identified as multipathing by the algorithm, but would still significantly reduce the amount of visual inspection as observations with one arrival need not be inspected.
For surface waves, the algorithm also works well after changing ε to 0.02 s/km. For applications with significantly different frequency bands or array sizes, or when searching for a very specific phenomenon, the DBSCAN parameters may need to be re-tuned to optimise performance (Section 3).
The remaining parameters can be kept the same. Sensible beamforming practice such as avoiding spatial aliasing still applies when using this method.
The computationally intensive part of the method is the bootstrap sampling and the beamforming on each sample, which must be performed for each observation; the cluster analysis is comparatively quick. However, the code is trivially parallelisable over observations since each is independent of all the others. The code is written in Python, is easily editable and freely available (https://github.com/eejwa/Array_Seis_Circle). The code has been parallelised so the bootstrap sampling can be spread over several cores and uses Numba (Lam et al., 2015) to compile the functions into machine code before execution. Further improvements in efficiency could be made by rewriting the algorithm in more efficient languages such as Julia, C++ or Fortran, and investigating further performance improvements possible with the existing code base.
For an example array with 20 stations, a time window of 30 seconds, and a search over a grid of 14641 slowness vectors (a grid where each axis covers 6 s/° in increments of 0.05 s/°), each bootstrap sample takes approximately 1.6 seconds to process. This makes tens of observations viable on a handful of cores, such as on a desktop machine. Larger datasets (thousands of observations) can be processed on the order of hours using tens of cores.
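As a quick consistency check of the quoted figures (the per-sample time is the value from the text; the 40-core count is an assumed example of "tens of cores"):

```python
# grid size: each axis spans 6 s/deg in 0.05 s/deg steps, nodes inclusive
nodes_per_axis = int(round(6.0 / 0.05)) + 1      # 121 nodes per axis
n_vectors = nodes_per_axis ** 2                  # 14641 slowness vectors

seconds_per_sample = 1.6                         # quoted per bootstrap sample
n_boot = 1000
seconds_per_obs = seconds_per_sample * n_boot    # ~27 minutes on one core

# thousands of observations on tens of cores: order of hours
hours_for_1000_obs = 1000 * seconds_per_obs / 40 / 3600
```

With these assumptions, 1000 observations on 40 cores take roughly 11 hours, consistent with the "order of hours" claim.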

CONCLUSIONS
Slowness vector measurements have been used to understand a variety of Earth structures and phenomena. They are typically used for identifying wavefield perturbations and scattering, and for event/noise source localisation. While this analysis is a common tool used by seismologists, studies are limited because of the necessary and subjective visual inspection of observations. Interpretation of the resolved structures is limited as uncertainties are not normally included in the analysis.
In this study, we described a method to automate slowness vector measurement, estimate the uncertainties and identify the number of possible arrivals. To do this, we bootstrap sample the waveforms and in each sample use a relative beamforming process to measure the coherent power and recover slowness vector properties of potential arrivals. These slowness vector properties are collected and the clustering algorithm DBSCAN is used to identify arrivals. The mean of the clusters gives the backazimuth and horizontal slowness and the spread of the cluster gives the uncertainties of the measurements.
We tuned the DBSCAN parameters on a data set with 0, 1 and 2 arrivals and achieved > 90% accuracy in recovering these arrivals. We present examples of analysis of scattered P wave energy and Rayleigh wave multipathing. The advantage this method brings to these applications is the ability to automatically identify the arrivals and measure the slowness vectors with uncertainty estimates. The difference in spatial scale and wavelengths used in these examples shows that our approach is applicable to studying Earth properties at a wide variety of spatial scales. Using this method, it may be possible to analyse slowness vector properties on larger data sets with reduced need for subjective visual inspection. In addition, uncertainties can also be quantified and used alongside the measurements. This technique makes 1000s of observations feasible in a matter of hours and allows for global-scale slowness vector observations to be made.

ACKNOWLEDGMENTS
ObspyDMT (Hosseini & Sigloch, 2017) was used to download data. GMT (Wessel et al., 2013) was used to make some of the figures. Predictions from 1D velocity models were made using the TauP toolkit Crotwell et al. (1999). We thank the Department of Geology and Geophysics, University of Utah for hosting a collaborative visit and the Centre for High Performance Computing (CHPC) for computer resources and support. This research is supported by the NERC DTP Spheres grant NE/L002574/1, AN was funded by NERC standard grant NE/R001154/1 (REMIS: