DEEP LEARNING OF GEOLOGICAL STRUCTURES IN SEISMIC REFLECTION DATA

Understanding the internal structure of our planet is a fundamental goal of the Earth Sciences. As direct observations are restricted to surface outcrops and borehole cores, we rely on geophysical data to study the Earth's interior. In particular, seismic reflection data, which provide acoustic images of the subsurface, give us critical insights into sedimentary, tectonic and magmatic systems. The interpretation of these large 2-D grids or 3-D seismic volumes is, however, time-consuming, even for a well-trained person or team. Here we demonstrate how to automate and accelerate the analysis of these increasingly large seismic datasets with machine learning. Using deep convolutional neural networks, we perform typical seismic interpretation tasks, such as the mapping of (1) tectonic faults, (2) salt bodies and (3) sedimentary horizons, at high accuracy. We share our workflows and scripts, encouraging users to apply our methods to similar problems. Our methodology is generic and flexible, allowing an easy adaptation to a wide range of problems.


INTRODUCTION
Deep learning is transforming the way we analyze large datasets in many scientific disciplines, including medicine (e.g. Esteva et al., 2017), chemistry (e.g. Butler et al., 2018) and physics (e.g. Carleo & Troyer, 2017). In Earth Science, we use large volumes of geophysical data to learn about the internal structure of our planet. Seismic reflection data showing acoustic images of the subsurface are particularly important to our understanding of sedimentary, tectonic and magmatic systems. The emergence of 3-D seismic technology allows us to image geological structures in the subsurface at a resolution of a few tens of meters over areas of thousands of square kilometers (Cartwright & Huuse, 2005). While this technology provides fascinating insights into the Earth's interior, it requires increasing amounts of time, experience and expertise to analyze these large datasets (e.g. Bond et al., 2012).
Deep learning allows us to learn where a particular geological feature occurs in the data without engineering a specific set of features (seismic attributes) to highlight it. In addition, we can use neural networks for unsupervised learning, where we do not pre-define the geological structures we would like to detect (Coléou et al., 2003; de Matos et al., 2007).
With a few exceptions (e.g. Waldeland et al., 2018), many studies exploring deep learning in seismic interpretation do not publish their codes. Given the complexity of these models and their sensitivity to small changes in the workflow, it is very challenging to replicate, compare and evaluate different approaches. Moreover, deep learning models are hardly used in practice, where seismic interpretations are almost exclusively performed by experts. Here we aim to show, step by step, how to use deep learning to analyse large volumes of two-dimensional (2-D) and three-dimensional (3-D) seismic reflection data. We explain: (1) how these techniques work; (2) how to apply them to geophysical datasets; and (3) which challenges can occur and what remedies are available. Deep learning describes a set of machine learning models that extract data representations (e.g. features, patterns) directly from the raw data (e.g. LeCun et al., 2015). Here we train these models to perform typical seismic interpretation tasks, such as: (1) mapping faults, (2) extracting geobodies and (3) tracing stratigraphic horizons in 3-D seismic reflection data. This can accelerate the analysis of geophysical datasets significantly, allowing us to study the structure and internal processes of the Earth in great detail.

3-D SEISMIC REFLECTION DATA
We exemplify our technique using state-of-the-art 3-D seismic reflection data from the northern North Sea. This newly-acquired dataset covers an area of 35,410 km², imaging the crust down to ∼22 km depth. The dataset was acquired using a series of up to 8-km-long streamers towed ∼40 m below the water surface. The BroadSeis technology used for recording covers a wide range of frequencies (2.5-155 Hz), providing high-resolution imaging. The data were binned at 12.5 × 18.75 m with a vertical sample rate of 4 ms and were 3-D true-amplitude Kirchhoff prestack time migrated. The seismic volume was zero-phase processed with SEG normal polarity, i.e. a positive reflection (white) corresponds to an increase in acoustic impedance (density × velocity) with depth. More details on data acquisition and pre-processing steps are given in the supplemental material.

Figure 1. Schematic illustration of deep learning of 3-D seismic reflection data.
We deconstruct a 3-D seismic volume into a large number of cubes, which function as input data for the neural network. Using existing seismic interpretations as labels, the neural network learns to predict the output. By predicting the output of each cube, we can reconstruct the result volume, thereby providing a full interpretation of contained fault sets and stratigraphic horizons.
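The deconstruction of a volume into input cubes can be sketched in NumPy (function name, cube size and stride are illustrative assumptions, not taken from our published scripts):

```python
import numpy as np

def extract_cubes(volume, size=32, stride=32):
    """Split a 3-D seismic volume (inline, crossline, time) into
    equally sized cubes that serve as input to a neural network."""
    nx, ny, nz = volume.shape
    cubes, positions = [], []
    for i in range(0, nx - size + 1, stride):
        for j in range(0, ny - size + 1, stride):
            for k in range(0, nz - size + 1, stride):
                cubes.append(volume[i:i+size, j:j+size, k:k+size])
                positions.append((i, j, k))
    # the positions let us reassemble predictions into a result volume
    return np.stack(cubes), positions

volume = np.random.rand(64, 64, 64)   # stand-in for a seismic amplitude volume
cubes, positions = extract_cubes(volume)
print(cubes.shape)  # (8, 32, 32, 32)
```

Storing the cube positions alongside the cubes is what makes the reconstruction of the result volume in Figure 1 possible.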

DEEP LEARNING
Machine learning describes a set of algorithms and models which learn to perform a specific task without explicit instructions. While classic machine learning relies on features engineered for a specific task, deep learning extracts data representations directly from the raw data. Convolutional neural networks are built from a small number of building blocks: convolutional layers, which apply filters to the input to extract features, and non-linear activation functions, such as Rectified Linear Units (ReLU), which set values below a certain threshold to a lower bound (zero), i.e. f(x) = max(0, x); for more details see Nair and Hinton (2010). Batch normalisation tends to accelerate convergence during training (Bjorck et al., 2018; Santurkar et al., 2018). A dropout layer is used to prevent neural networks from overfitting the training data and thus generalizing poorly. The basic idea is to randomly drop units from the network during training. This prevents co-adaptation (different units of the network adopting similar weights) and provides regularization (lowering the generalization error) (Srivastava et al., 2014).
Using these building blocks, we construct two types of convolutional neural networks in this study. The first type consists of several convolutional and max pooling layers, which downsample an image, followed by a couple of dense layers, which classify it. These models are most commonly employed to predict the label of an image, e.g. whether a certain object is in the image or not. The second type (UNets) consists of two paths: one for downsampling and one for upsampling (e.g. Long et al., 2015; Ronneberger et al., 2015). Each path consists of convolutional, max pooling, batch normalization and dropout layers as well as so-called skip connections, i.e. extra connections between nodes in different layers of a network that skip one or more layers (Orhan and Pitkow, 2018). In contrast to the first type, these models predict the label of each pixel of an image, allowing a much faster pixel-by-pixel prediction of images. We apply both types of models to compare their accuracy and speed, and to highlight advantages and disadvantages during training and application.
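The convolution, activation and pooling operations described above can be sketched in plain NumPy (a toy illustration of the building blocks, not the Keras layers used in our actual models):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping max pooling, downsampling by `size`."""
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.rand(8, 8)            # stand-in for a seismic patch
kernel = np.array([[1., 0., -1.]] * 3)  # simple vertical-edge filter
feature_map = max_pool(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (3, 3)
```

Stacking several such convolution-activation-pooling steps, followed by dense layers, yields the first model type; adding an upsampling path with skip connections yields the second (UNet) type.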
We can think of the training of these models as an optimization, which aims to minimize the difference between the predicted and the actual labels. This difference is calculated using a loss function (e.g. mean squared error, cross entropy).
Optimizers (e.g. stochastic gradient descent, adaptive moment estimation) allow us to minimize the loss function by adjusting the model weights via backpropagation (Rumelhart et al., 1986) . We can monitor the optimization by tracking several metrics (e.g. loss, accuracy) during training. These learning curves are useful for hyperparameter tuning and to decide when training has completed. Once training is complete, we can determine the performance of the model on the test set and apply it to our data set.
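The optimization loop described above can be illustrated with a toy example: gradient descent minimizing a mean-squared-error loss for a single linear unit (a didactic sketch, not our actual training code, which uses Keras optimizers):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0                       # 'true labels' from a known relation

w, b, lr = 0.0, 0.0, 0.1                # model weights and learning rate
losses = []
for epoch in range(200):
    pred = w * x + b
    loss = np.mean((pred - y) ** 2)     # mean squared error
    losses.append(loss)                 # learning curve for monitoring
    # gradients of the loss with respect to the weights (backpropagation)
    dw = np.mean(2 * (pred - y) * x)
    db = np.mean(2 * (pred - y))
    w, b = w - lr * dw, b - lr * db     # gradient-descent update

print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

The recorded `losses` play the role of the learning curves used to tune hyperparameters (here, the learning rate and the number of epochs) and to decide when training has completed.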
On the technical side, we wrote our code in Python 3 using open-source libraries, such as Segyio, NumPy (Van der Walt et al., 2011), Matplotlib (Hunter, 2007), Tensorflow (Abadi et al., 2016) and Keras (Chollet, 2015) (see supplementary material for more details). We label training data with a graphics editor (Adobe Illustrator) and save trained models at the epoch where the validation loss is minimal.

FAULTS
Our first example focuses on mapping tectonic faults. The confusion matrix cross-plots predicted and true labels, so that high values on the diagonal indicate high accuracy for each class. Note that the relatively large fraction of false positives is somewhat by design, as we would rather detect too many faults than miss some (e.g. for CO2 storage).
During training, the loss (mean squared error) decreases from 0.16 to 0.12 on the training set and from 0.19 to 0.14 on the validation set (Fig. 3A), while the binary accuracy increases from 0.77 to 0.83 on the training and from 0.71 to 0.79 on the validation set (Fig. 3B). We save the model after epoch 5, when the accuracy is highest and the loss lowest on the validation set.
The final model predicts 85% true positives (fault), 15% false positives, 71% true negatives (no fault) and 29% false negatives (Fig. 3C). The model reaches a precision, recall and F1-score of 0.79 with a support of 10,000 on the test set (Tab. 1). Applying the model to an unseen 2-D seismic section of the northern North Sea takes 120 minutes on our machine, because the model has to predict the label of each of the 21,983,584 pixels of the section. Overall, the model predicts all major faults in the upper crust, but also labels some steeply-inclined stratigraphic reflections as faults (Fig. 2A, B). These faults show typical geometries, including: (1) planar, (2) listric and (3) kinked surfaces (Fig. 2C) (Bell et al., 2014). The model is also able to predict fault intersections, such as criss-crossing and splaying faults, as well as associated structures, such as en échelon faults and rotated fault blocks.
Next, we approach this task as a semantic segmentation problem, where we train a model (UNet) to predict the label of every pixel of an image. During training, the loss decreases from 0.35 to 0.10 on the training set and from 0.36 to 0.33 on the validation set (Fig. 3D), while the binary accuracy increases from 0.82 to 0.94 on the training and from 0.85 to 0.92 on the validation set (Fig. 3B). We save the model after epoch 3, when the loss is lowest on the validation set.
The final model predicts 64% true positives (fault), 36% false positives, 95% true negatives (no fault) and 5% false negatives (Fig. 3C). The model reaches a precision of 0.83, a recall of 0.82 and an F1-score of 0.83 with a support of 10,000 on the test set (Tab. 1).
All these values are reasonably high, so it makes sense to apply the model to the entire seismic section.
Applying the model to an unseen 2-D seismic section of the northern North Sea takes only 5 seconds on our machine, because the model simultaneously predicts tiles of 128 × 128 of the 21,983,584 pixels in each batch, instead of one pixel at a time. Moreover, the model is able to predict the faults just as well as the previous model.
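This tile-wise prediction can be sketched as follows (the tile size of 128 matches the text; `fake_model` is a stand-in for a trained segmentation network):

```python
import numpy as np

def predict_section(section, model, tile=128):
    """Predict a label for every pixel of a 2-D seismic section by
    tiling it into (tile x tile) patches and stitching the patch
    predictions back together."""
    h, w = section.shape
    result = np.zeros_like(section)
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            patch = section[i:i+tile, j:j+tile]       # edge patches may be smaller
            result[i:i+patch.shape[0], j:j+patch.shape[1]] = model(patch)
    return result

# stand-in 'model': thresholds amplitudes instead of running a real UNet
fake_model = lambda patch: (patch > 0.5).astype(float)
section = np.random.rand(300, 500)
labels = predict_section(section, fake_model)
print(labels.shape)  # (300, 500)
```

Because each call to the model labels a whole tile, the number of forward passes drops from millions (one per pixel) to a few dozen per section, which explains the speed-up reported above.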

SALT
Our second example focuses on mapping salt bodies in three dimensions.
Given that salt forms complex bodies, which are difficult to label in 3-D as segmentation masks, we approach this task as a binary classification, where we train a model to predict whether or not a given input cube contains salt. The final model predicts 89% true positives (salt), 11% false positives, 92% true negatives (no salt) and 8% false negatives (Fig. 4C). The model achieves a precision, recall and F1-score of 0.91 with a support of 10,000 on the test set (Tab. 1). These values give us confidence that we can apply the model.

HORIZONS
Our third example focuses on tracing stratigraphic horizons. During training, the loss (mean squared error) decreases from 0.27 to 0.047 on the training set and from 0.33 to 0.051 on the validation set (Fig. 6A). At the same time, the binary accuracy increases from 0.75 to 0.966 on the training and from 0.63 to 0.932 on the validation set (Fig. 6B). Plateauing of loss and accuracy with time suggests that training has been completed.
The final model predicts 95% true positives (horizon), 5% false positives, 98% true negatives (no horizon) and 2% false negatives (Fig. 6C). The model reaches a precision of 0.92, a recall of 0.91 and an F1-score of 0.92 with a support of 10,000 on the test set (Tab. 1). Given that these values are sufficiently high, we can apply the model to a neighbouring area.

DISCUSSION
Our models highlight that we can use deep learning for typical seismic interpretation tasks, such as fault, salt and horizon mapping in two and three dimensions (Figs. 2, 5, 7). Using labelled training data, we can train a set of 2-D and 3-D convolutional neural networks to perform these tasks in as little as 5 seconds.
This allows us to map 2-D and 3-D geological structures (e.g. faults, geobodies and horizons) in great detail, over vast areas, and at high accuracy. This has far-reaching implications. For instance, if we are able to map entire rifts and rift systems in 3-D seismic reflection data, we can compare the architecture of these systems to 3-D geodynamic models, which, in turn, helps us understand the rheology of the lithosphere and the spatial evolution of rift fault systems. Mapping every stratigraphic horizon in these systems would help us reconstruct their temporal evolution in much greater detail than previously possible. While this example focuses on tectonic faults in continental rifts, a similar approach is feasible in different disciplines working with 2-D and 3-D seismic reflection data, whether it concerns the study of submarine channels, magmatic intrusions or salt tectonics.
In the following, we discuss: (1) the potential of deep learning in geophysics, (2) how generalisable these methods are to other problems in geophysics, and (3) which challenges can occur and which remedial strategies are available. We think there are three main reasons why deep learning techniques are well suited to the analysis of 3-D geophysical data. First, geophysical datasets are large (gigabytes to terabytes), a key requirement for the implicit feature extraction of deep learning (Chen & Lin, 2014).
Second, we already have large volumes of historical training data labelled by expert geologists and if necessary, we can label even more training data. Third, the translation invariance of convolutional neural networks has made deep learning very effective in analyzing images; a property we too can exploit in geophysics.
With deep learning transforming the way we analyze data, we can ask ourselves how transferable these methods are to other problems in geophysics. First, it is worth noting that deep learning does not require feature engineering and is thus not limited to a specific type of geophysical data. Second, the models presented here are still relatively simple in terms of architecture, so that users can easily tweak them to adapt the workflow to their problem. Moreover, it is possible to optimize the hyperparameters of these models to further increase accuracy and speed. To encourage these developments, we will make our code freely available on GitHub: https://github.com/thilowrona/seismic_deep_learning. Third, we can apply our workflow to map a variety of geological structures (e.g. volcanoes, channels or craters) in different types of geophysical data (radar, magnetic, gravity). Finally, we can explore applications of deep learning in other tasks in seismic interpretation.
Uncertainty quantification, for example, is extremely difficult in manual seismic interpretations. While our models only predict a proxy for the probability of detecting a certain geological structure (e.g. Fig. 2B, C), we can envision incorporating uncertainty in the training data and architecture of these models (e.g. Blei et al., 2017).
While these techniques are very powerful, it is also worth highlighting some of the challenges (and potential remedies) associated with deep learning. For example, pixel-wise predictions are more time-consuming with classification than with segmentation models. Labelling training data for segmentation models, in particular in 3-D, is also challenging and time-consuming. A masked loss function allows training 3-D models with 1-D and 2-D labels (see Tschannen et al., 2020 for more details). This function restricts the loss calculation during training using a mask, which is one where a label is available and zero where it is absent.
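The idea of such a masked loss can be sketched in NumPy (an illustration of the concept; see Tschannen et al., 2020 for the actual implementation):

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    """Mean squared error computed only where labels exist (mask == 1).
    Unlabelled points (mask == 0) contribute nothing to the loss."""
    mask = mask.astype(float)
    return np.sum(mask * (y_true - y_pred) ** 2) / np.maximum(np.sum(mask), 1.0)

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.8, 0.1, 0.0, 0.0])
mask   = np.array([1,   1,   0,   0])  # labels exist only for the first two points

print(masked_mse(y_true, y_pred, mask))  # ≈ 0.025
```

Note that the large error at the third point (label 1.0, prediction 0.0) is ignored entirely, because no label is actually available there; only the two masked-in points drive the weight updates.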
Another challenge is determining the optimal number of training examples.
While we found that around 100,000 examples provide a good trade-off between prediction accuracy and training time, this point is worth investigating further.
When labels are only available for a small subset, it is possible to increase the size of the training set using data augmentation techniques. When there are no labels available, we (as a user or community) have to spend time labelling data manually.
Alternatively, we can explore unsupervised learning methods (e.g. autoencoders), which do not require labels.
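As an example of data augmentation, horizontal flips double the number of labelled patches (a minimal NumPy sketch; which transformations are geologically admissible depends on the structure being mapped):

```python
import numpy as np

def augment(patches, labels):
    """Double a training set of 2-D patches by horizontal flipping.
    Labels (e.g. fault masks) are flipped in the same way so that
    image and annotation stay aligned."""
    flipped = patches[:, :, ::-1]
    flipped_labels = labels[:, :, ::-1]
    return (np.concatenate([patches, flipped]),
            np.concatenate([labels, flipped_labels]))

patches = np.random.rand(10, 64, 64)                       # stand-in seismic patches
labels = (np.random.rand(10, 64, 64) > 0.9).astype(float)  # stand-in fault masks
aug_patches, aug_labels = augment(patches, labels)
print(aug_patches.shape)  # (20, 64, 64)
```

Further transformations (e.g. small shifts or crops) can be added in the same way, as long as each one is applied identically to the patch and its label.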
Another major issue of deep learning is overfitting: deep neural networks can provide accurate predictions on the training set, but fail to generalize to new data (Domingos, 2012). To identify overfitting, we can compare model predictions between the training and validation set. Moreover, it is advisable to start with a simple model and successively increase complexity as necessary.
Another challenge is the quality and amount of training data. Supervised machine learning, for example, requires high-quality labelled training data representative of the entire data set. Moreover, deep learning requires large volumes of this data.
Finally, it is worth mentioning that neural networks remain, to a large degree, a black box, offering only sporadic insights into their inner workings. So while these models can perform many tasks at high accuracy and speed, we are only beginning to understand which features they use and how they make decisions; interpretability thus remains a key focus of machine learning research (e.g. Ribeiro et al., 2016; Lundberg and Lee, 2017; Molnar, 2018).

CONCLUSIONS
Here we demonstrate how to use deep learning to analyse geophysical data sets with high accuracy in limited time. We show that deep convolutional neural networks are able to perform key seismic interpretation tasks, such as (1) mapping tectonic faults, (2) extracting salt bodies and (3) tracing stratigraphic horizons.

MACHINE LEARNING
The models used in this study were developed, trained and applied using a series of scripts implemented in Python 3 and built on top of several existing packages (e.g.
Segyio, NumPy, Matplotlib, Tensorflow and Keras). A typical supervised machine learning workflow consists of these components: (1) load data and labels, (2) prepare data, (3) set up model, (4) train model, (5) tune model, and (6) apply model (e.g. Wrona et al., 2018). First, we typically load seismic sections or volumes into NumPy arrays using the package Segyio. We load labelled sections saved as images using Matplotlib. Second, we standardize our seismic data (mean = 0, variance = 1) using Scikit-learn. Then we either store training and validation data in NumPy arrays or stream the data using data generators from Keras. Third, we set up our models (listed below) using Keras, a high-level neural networks API built on top of Tensorflow. We use tensorflow-gpu. We initialize model weights with Keras' default settings (e.g. the glorot uniform kernel initializer). Fourth, we train our models using the loss functions introduced above (e.g. mean squared error). Fifth, we tune model hyperparameters by monitoring the learning curves on the validation set. Sixth, we apply our models to entire data sets by iterating through each point.
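The standardization in step (2) can be sketched as follows (stand-in random data; in practice the volume is loaded with Segyio and the scaling done with Scikit-learn's StandardScaler):

```python
import numpy as np

# Stand-in for a seismic amplitude volume loaded from a SEG-Y file
volume = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(16, 16, 16))

# Standardize to mean 0 and variance 1, so that all inputs to the
# network share a common amplitude scale
standardized = (volume - volume.mean()) / volume.std()
print(round(float(standardized.std()), 6))  # mean ≈ 0, std ≈ 1
```

The same mean and standard deviation computed on the training data should be reused when standardizing the validation, test and application data, so that all sets are on the same scale.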