Toward Real-time Microseismic Event Detection using the YOLOv3 Algorithm

Real-time microseismic monitoring is essential for understanding fractures associated with underground fluid injection in unconventional reservoirs. However, microseismic events recorded on monitoring arrays are usually contaminated with strong noise. With a low signal-to-noise ratio (S/N), the detection of microseismic events is challenging using conventional detection methods such as the short-term average/long-term average (STA/LTA) technique. Common machine learning methods, e.g., feature extraction plus support vector machine (SVM) and convolutional neural networks (CNNs), can achieve higher accuracy in the presence of strong noise, but they are usually time-consuming and memory-intensive to run. We propose the use of YOLOv3, a state-of-the-art real-time object detection system, for microseismic event detection. YOLOv3 is a one-stage deep CNN detector that predicts class confidence and bounding boxes for images at high speed and with high precision. Starting from weights pre-trained on the ImageNet 1000-class competition dataset, we perform physics-based training of the YOLOv3 algorithm on a set of forward-modeled synthetic microseismic data with varying S/N. We also add randomized forward-modeled surface seismic events and Gaussian white noise to generate “semi-realistic” training and testing datasets. YOLOv3 is able to detect weak microseismic event signals with low S/N (e.g., S/N=0.1) and achieves a mean average precision of 88.71% in near real time. Further work is required to test YOLOv3 in field production settings.

INTRODUCTION
Microseismic monitoring is crucial for evaluating the dynamics of hydraulic fracturing in unconventional reservoirs (Akram et al., 2017). Microseismic events are usually recorded using surface or downhole geophones; with recent advances in distributed acoustic sensing (DAS), monitoring is also performed using fiber-optic cable installed downhole (Binder and Chakraborty, 2020). However, real-time event detection and analysis are difficult because of the challenges in telemetering data from geophone stations or handling the daily TB-size volumes of data coming from DAS interrogators. Human-based analysis and interpretation are nearly impossible at this "data velocity". Automatic detection is further complicated by the low (i.e., <1) signal-to-noise ratios (S/N) commonly found in acquired data, which challenge conventional automatic arrival-picking methods such as the STA/LTA algorithm (Withers et al., 1998; Vaezi and van der Baan, 2015).
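For reference, the STA/LTA baseline mentioned above can be sketched in a few lines. This is a generic energy-based formulation with trailing windows, not the specific variant used in the cited studies; the function name, window lengths, and toy trace are illustrative assumptions.

```python
def sta_lta(trace, n_sta, n_lta):
    """Energy-based STA/LTA ratio with trailing windows (n_sta <= n_lta).

    Returns a list as long as `trace`; samples that do not yet have a
    full long-term window are left at 0.0.
    """
    energy = [x * x for x in trace]
    # Prefix sums let each window average be computed in constant time.
    csum = [0.0]
    for e in energy:
        csum.append(csum[-1] + e)
    ratio = [0.0] * len(trace)
    for i in range(n_lta, len(trace) + 1):
        sta = (csum[i] - csum[i - n_sta]) / n_sta
        lta = (csum[i] - csum[i - n_lta]) / n_lta
        if lta > 0.0:
            ratio[i - 1] = sta / lta
    return ratio


# Toy trace (assumed values): weak background with a short, stronger burst.
trace = [0.1] * 50 + [1.0] * 5 + [0.1] * 50
ratio = sta_lta(trace, n_sta=5, n_lta=50)
peak_index = max(range(len(ratio)), key=lambda i: ratio[i])
```

A detection is typically declared where the ratio exceeds a user-chosen threshold. When the burst amplitude approaches the background level, the ratio stays close to 1, which illustrates why this class of detector struggles at S/N below 1.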
In the past, classification algorithms (such as fuzzy c-means clustering and SVM) and neural networks have been used for microseismic event detection in 1D time series (Zhu et al., 2016; Jia et al., 2017; Akram et al., 2017). With recent advances in deep learning approaches for image segmentation and object detection, other possibilities have emerged, including several robust algorithms that can handle streaming video data [e.g., YOLO (Redmon, 2016) and Single-Shot MultiBox Detector (Liu et al., 2016)]. One can leverage these advanced algorithms by training them to detect geophysical phenomena of interest, such as microseismic events.
Binder and Chakraborty (2020) use a convolutional neural network (CNN) that takes the 2D strain wavefield from DAS data as space-time images and outputs a 1D profile of the probability of microseismic events along time. The CNN outperforms STA/LTA, but a 1D profile has difficulty representing multiple events scattered in space. Qu et al. (2019) propose an image segmentation workflow that combines feature extraction techniques on 2D microseismic data with SVM classification. It overcomes the low signal-to-noise ratio of microseismic data and achieves a high accuracy of 93% for data with an S/N of -13 dB. However, the workflow is too computationally expensive to achieve real-time detection.
Compared with SVM algorithms and the image segmentation workflow, YOLO is an order of magnitude faster as an object detector for streaming video. With a one-stage detection workflow, the YOLO detection time is extremely short: it can handle HD video streams at 45 frames per second (FPS). Even with the increase in data velocity, YOLO achieves better precision than other real-time object detectors according to ImageNet data benchmark tests (Redmon and Farhadi, 2018). Hence, its newest variation, YOLOv3, is a good candidate for large-scale real-time DAS monitoring.
As a supervised machine learning approach, YOLOv3 requires 2D images with object location labels for the training process. In the detection process, YOLOv3 takes 2D image inputs and returns bounding boxes around detected objects as well as an associated confidence measure. We use physics-based 2D elastic modeling to develop a microseismic database that provides a sufficiently large labeled training dataset for YOLOv3 as well as additional data for testing and validation. We also add randomized forward-modeled surface seismic events and Gaussian white noise to generate "semi-realistic" training and testing datasets to investigate the effects of coherent noise on the ability to predict microseismic events.
In this paper, we first discuss the YOLOv3 algorithm and our procedure of training it to detect microseismic events. We then outline our synthetic testing procedure to determine the ability of the algorithm to detect object "events" in noise-free conditions. After generating a suite of coherent and random noise panels, we test YOLOv3's performance while adding in increasing amounts of coherent and random noise. Finally, we offer insights about the applicability of YOLOv3-based microseismic monitoring for field-data environments.

DEEP LEARNING DETECTORS
Deep convolutional neural networks are inspired by the connectivity of human neurons and are usually constructed with convolutional layers, sub-sampling layers, and fully connected layers. Deep-learning-based object-detection frameworks mainly fall into two categories: two-stage (e.g., Faster R-CNN using the ResNet backbone) and one-stage (e.g., YOLOv3 using the Darknet-53 backbone) detectors. In a two-stage detector, the two main steps of object detection are generating regions that possibly contain objects and extracting features from the generated regions (Wu et al., 2020). Usually, 2000 region proposals are generated for a single image. By eliminating the proposal generation step, one-stage detectors directly predict objects of different classes and perform much faster than two-stage detectors.
The YOLOv3 network architecture is termed Darknet-53 and consists of 53 convolutional layers and residual blocks, which provide shortcut connections between deep and shallow layers. Compared to the original YOLO and follow-on YOLOv2 versions, the network is much deeper, provides improved precision in predictions, and can detect smaller objects.
While Darknet-53 is slower than the original YOLO network, it has fewer layers than CNN backbones such as ResNet-101 or ResNet-152, making it more efficient and faster to train and test.
[ Figure 1 about here.]
YOLOv3 and its variations divide the input image into an S-by-S grid, with each grid cell predicting several objects whose centers fall within that cell (Figure 1, upper center).
The output prediction includes three categories of information about each object: class, bounding box location, and confidence score for each bounding box (Figure 1, lower center).
Class defines the type of object; the bounding box gives the center coordinates of the object and the width and height of the box; and the confidence score is the product of the probability of the object's existence and the accuracy of the bounding box location. In the figures below, the predicted bounding box location is shown in red.
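The grid assignment and confidence score described above can be sketched as follows. This is a minimal illustration that assumes boxes are given as normalized (x_center, y_center, width, height) tuples and that the "accuracy of the bounding box location" is measured by intersection over union (IoU), as in the original YOLO formulation; the function names are hypothetical and do not come from the Darknet code.

```python
def responsible_cell(x_center, y_center, s):
    """Return (row, col) of the S-by-S grid cell containing a normalized box center."""
    col = min(int(x_center * s), s - 1)
    row = min(int(y_center * s), s - 1)
    return row, col


def iou(box_a, box_b):
    """Intersection over union of two (x_center, y_center, width, height) boxes."""
    ax0, ax1 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay0, ay1 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx0, bx1 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by0, by1 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2
    # Overlap extents are clipped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0.0 else 0.0


def confidence(p_object, predicted_box, true_box):
    """Confidence score: objectness probability times box accuracy (IoU, by assumption)."""
    return p_object * iou(predicted_box, true_box)


# Two full-height boxes that overlap over half of their width.
box_a = (0.25, 0.5, 0.5, 1.0)
box_b = (0.50, 0.5, 0.5, 1.0)
score = confidence(0.9, box_a, box_b)
```

Under this definition, a prediction is penalized both for a low objectness probability and for a poorly placed box, so a confident but badly localized detection still receives a low score.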
For the training process, we begin by using weights pre-trained on the ImageNet 1000-class competition dataset (Redmon and Farhadi, 2018). The neural network is trained in iterations: for each iteration, a batch of data is fed into the network, the network parameters are updated, and the result is then evaluated by calculating the precision and average loss using the rest of the training data as validation. The training process is complete when the average loss does not improve after additional iterations, which in our case usually occurs after about 3000 iterations (beyond which the network may become overtrained).
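The stopping rule described above (training ends when the average loss stops improving) can be expressed as a simple plateau check. The sketch below is illustrative only: the `patience` and `min_delta` parameters and the loss values are assumptions, not settings taken from this study or from Darknet.

```python
def training_converged(avg_losses, patience=5, min_delta=1e-3):
    """Return True when the average loss has stopped improving.

    The loss is considered to have plateaued when the best value in the
    last `patience` evaluations is not at least `min_delta` lower than
    the best value recorded before them.
    """
    if len(avg_losses) <= patience:
        return False
    best_before = min(avg_losses[:-patience])
    best_recent = min(avg_losses[-patience:])
    return best_recent > best_before - min_delta


# Hypothetical loss histories, one value per evaluation.
still_improving = [5.0, 4.0, 3.0, 2.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
plateaued = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0]
```

Stopping at the plateau rather than continuing for a fixed number of iterations is one common way to limit the overtraining mentioned above.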

PHYSICS-BASED DATA GENERATION
We test the YOLOv3 algorithm on a section of the SEG/EAGE Overthrust P-wave velocity model (see Figure 2). For the purposes of elastic forward modeling, we construct an S-wave velocity model using $V_S = V_P/\sqrt{3}$ and assume a constant density model, $\rho = 2800$ kg/m$^3$.
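The construction of the elastic model from the P-wave velocity model can be sketched as below. The small velocity grid is a made-up placeholder rather than the Overthrust model, and the function names are hypothetical. The scaling $V_S = V_P/\sqrt{3}$ corresponds to a Poisson's ratio of 0.25, which the last function checks.

```python
import math


def elastic_models(vp, density=2800.0):
    """Build S-wave velocity and constant-density grids from a 2D P-wave velocity grid."""
    vs = [[v / math.sqrt(3.0) for v in row] for row in vp]
    rho = [[density for _ in row] for row in vp]
    return vs, rho


def poisson_ratio(vp, vs):
    """Poisson's ratio implied by a pair of P- and S-wave velocities."""
    return (vp ** 2 - 2.0 * vs ** 2) / (2.0 * (vp ** 2 - vs ** 2))


# Placeholder 2 x 3 P-wave velocity grid in m/s (not the Overthrust model).
vp = [[3000.0, 3500.0, 4000.0],
      [4500.0, 5000.0, 5500.0]]
vs, rho = elastic_models(vp)
nu = poisson_ratio(vp[0][0], vs[0][0])
```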
[ Figure 2 about here.]
We use a 2D GPU-based isotropic elastic finite-difference modeling code (Weiss and Shragge, 2013) to forward model the synthetic microseismic events.
[ Figure 3 about here.]
Figure 3 presents one of the 300 forward-modeled events, where the P- and S-wave first arrivals are clearly visible on the right-hand side of the image. We also automatically generate and overlay a red bounding box that serves as the labeled input for the YOLOv3 algorithm. The overall output image dimensions and 3-s record length represent a reasonable image size and frame rate for real-time YOLOv3 processing.
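For training, Darknet-style YOLO implementations expect each bounding box as one text line holding a class index followed by the box center, width, and height, all normalized by the image size. The sketch below shows one way an automatically generated pixel box could be converted to that form. The image size and pixel coordinates are hypothetical, since the actual dimensions used in this study are not stated here, and the function name is illustrative.

```python
def darknet_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box to a normalized 'class xc yc w h' label line."""
    xc = (x_min + x_max) / 2.0 / img_w
    yc = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: one "event" class (index 0) in a 400 x 500 pixel image.
label = darknet_label(0, x_min=100, y_min=50, x_max=300, y_max=250,
                      img_w=400, img_h=500)
```

Because the modeled source position and velocity model are known, the box extents can be derived from the modeled arrivals rather than picked by hand, which is what makes a fully automatic labeling step possible.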
In addition to generating noise-free microseismic events, we also forward modeled 100 coherent near-surface noise sources (see the red triangles in Figure 2). We concatenated these noise events, randomly selected 401 traces from the result, and concatenated those traces to generate a coherent background noise panel. We also add Gaussian white noise to the coherent noise panels. We then add the combined noise panels to the noise-free microseismic events (see Figure 4) to test the ability of the YOLOv3 algorithm to identify events in "semi-realistic" low S/N settings.
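The noise-mixing step can be sketched as follows. The sketch assumes that the signal-to-noise ratio is defined as a ratio of root-mean-square (RMS) amplitudes over the whole panel; the exact definition used in this study is not stated, so this is an assumption. The toy panels stand in for the forward-modeled data, and all function names are hypothetical.

```python
import math
import random


def rms(panel):
    """Root-mean-square amplitude over all samples of a 2D panel (list of traces)."""
    values = [x for trace in panel for x in trace]
    return math.sqrt(sum(x * x for x in values) / len(values))


def scale_noise(signal, noise, s_over_n):
    """Scale `noise` so that rms(signal) / rms(scaled noise) equals `s_over_n`."""
    factor = rms(signal) / (s_over_n * rms(noise))
    return [[factor * x for x in trace] for trace in noise]


def add_panels(*panels):
    """Sample-by-sample sum of equally sized 2D panels."""
    return [[sum(samples) for samples in zip(*traces)] for traces in zip(*panels)]


# Toy panels: 4 traces of 200 samples with one spike "arrival" per trace.
rng = random.Random(0)
n_traces, n_samples = 4, 200
signal = [[1.0 if t == 50 + 5 * i else 0.0 for t in range(n_samples)]
          for i in range(n_traces)]
coherent = [[math.sin(0.2 * t) for t in range(n_samples)] for _ in range(n_traces)]
white = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_traces)]

# Coherent noise ten times stronger than the signal, random noise equal to it.
scaled_coherent = scale_noise(signal, coherent, 0.1)
scaled_white = scale_noise(signal, white, 1.0)
noisy = add_panels(signal, scaled_coherent, scaled_white)
```

Scaling each noise component separately keeps the coherent and random contributions independently controllable, which matches the way the datasets are described by separate signal, coherent-noise, and random-noise levels.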

EXPERIMENTS
We design three sets of training and testing data for YOLOv3 (see Table 1) with the noise levels set at noise-free (dataset 1), S/N=1 (dataset 2), and S/N=0.1 (dataset 3). In addition to adding different levels of noise to the forward-modeled data, we also generate the same number of noise-only panels to balance the training dataset. Noise-only panels make the positive and negative examples equal in number and prevent class imbalance in the dataset (Oksuz et al., 2020). Thus, each noisy dataset consists of 300 forward-modeled events and 300 background noise images. The dataset is then randomly divided into two subsets: 90% is used for training, with the remaining unseen 10% reserved for testing the precision of the YOLOv3 algorithm.
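The balanced dataset and the random 90%/10% split can be sketched as below. The panel identifiers, the seed, and the function name are placeholders; the sketch assumes a simple unstratified shuffle, which may differ from the procedure actually used.

```python
import random


def split_dataset(items, test_fraction=0.1, seed=0):
    """Randomly split `items` into (training, testing) lists without overlap."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 300 event panels and 300 noise-only panels, identified by (label, index).
panels = [("event", i) for i in range(300)] + [("noise", i) for i in range(300)]
train, test = split_dataset(panels)
n_test_events = sum(1 for label, _ in test if label == "event")
```

An unstratified split of this kind does not guarantee that the testing subset is exactly balanced, which is consistent with the 31 noisy and 29 noise panels listed for dataset 2 in Table 1.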
[ Table 1 about here.]
The noise-free dataset has the highest average precision at 91.8%. However, increasing the noise level to S/N=1 and S/N (coherent noise)=0.1 (i.e., datasets 2 and 3 in Table 1) drops the precision to 88.7% and 88.0%, respectively. Figure 5 presents nine example results from the third dataset. The predictions are reasonable, and YOLOv3 successfully discriminates noise panels from event images by not predicting any objects in the noise-only panels, which are shown in the lower two panels of the right column of Figure 5.
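For readers unfamiliar with the metric, average precision summarizes the precision-recall curve obtained by ranking detections by confidence. The sketch below uses the common all-point interpolation; the exact evaluation settings behind the values reported here (e.g., the IoU threshold used to decide whether a detection is correct) are not stated, so the function, its arguments, and the toy inputs are assumptions.

```python
def average_precision(scores, is_true_positive, n_ground_truth):
    """Area under the interpolated precision-recall curve for ranked detections."""
    if n_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)
    # Replace each precision by the maximum precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: three detections, two of which match the two true events.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], n_ground_truth=2)
```

With a single object class, as in this study, the mean average precision over classes reduces to the average precision of that class.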

DISCUSSION
YOLOv3 is able to detect the synthetic microseismic events with high precision in a very short time (around 11.09 ms per frame). Thus, this algorithm appears to be a promising approach for real-time microseismic detection and discrimination. Because YOLOv3 is less sensitive to smaller objects compared to CNN backbone detectors, slicing large streaming data into separate images is likely necessary. In this work, we use 3 s time windows in which the microseismic events usually occupy a significant portion of the image, which makes the events easier for the object detector to predict and significantly improves the precision. Our physics-based forward-modeled datasets do not have the same level of complexity as field data: the source locations are restricted to 2-4 km in depth and the receiver arrays are perfectly uniform. These factors will undoubtedly affect the performance of YOLOv3 on field data. Thus, additional testing on field data is required to determine the full capability of the YOLOv3 approach for microseismic event detection on high-velocity streams of monitoring data.
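The throughput implied by the per-frame detection time, and the slicing of a continuous record into 3 s windows, can be sketched as follows. The sampling interval, record length, and overlap are hypothetical values chosen for illustration, and the function names are not from any existing package.

```python
def frames_per_second(ms_per_frame):
    """Number of frames that can be processed per second at a given per-frame time."""
    return 1000.0 / ms_per_frame


def window_starts(n_samples, dt, window_s=3.0, overlap=0.0):
    """Start indices of fixed-length windows covering a record of `n_samples` samples."""
    n_win = int(round(window_s / dt))
    step = max(1, int(round(n_win * (1.0 - overlap))))
    return list(range(0, n_samples - n_win + 1, step))


fps = frames_per_second(11.09)

# Hypothetical 6 s record sampled at 1 ms, cut into 3 s windows.
starts_no_overlap = window_starts(6000, dt=0.001)
starts_half_overlap = window_starts(6000, dt=0.001, overlap=0.5)
```

At roughly 90 frames per second, detection on a single 3 s window takes far less time than the window spans, so overlapping windows could be used to avoid splitting an event across two images, at the cost of processing more frames.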

CONCLUSIONS
We applied the YOLOv3 deep convolutional neural network to microseismic event detection. Using a physics-based 2D forward-modeling procedure to simulate microseismic events and realistic coherent noise, we generated three large training datasets with different noise levels. Using pre-trained neural network weights from benchmark image detection datasets, YOLOv3 achieved a high level of precision and efficiency in training and detecting microseismic events. This shows the possibility of real-time microseismic processing using deep learning object detectors, even in scenarios involving "semi-realistic" low S/N levels.

Table 1: Summary of composition and testing precision of three datasets with different signal-to-noise ratios. S, C, and R refer to signal, coherent noise, and random noise, respectively.

Dataset | S:C:R      | Composition                  | Testing panels             | Testing precision
1       | Noise-free | 300 noise-free panels        | 30 noise-free panels       | 91.8%
2       | 1:1:1      | 300 noisy + 300 noise panels | 31 noisy + 29 noise panels | 88.7%
3       | 1:10:1     | 300 noisy + 300 noise panels | 30 noisy + 30 noise panels | 88.0%