An Active Learning Pipeline to Detect Hurricane Washover in Post-Storm Aerial Images

We present an active learning pipeline to identify hurricane impacts on coastal landscapes. Previously unlabeled post-storm images are used in a three-component workflow: first, an online interface is used to crowdsource labels for the imagery; second, a convolutional neural network is trained on the labeled images; third, model predictions are displayed on an interactive map. Both the labeler and the interactive map allow coastal scientists to provide additional labels, which will be used to develop a large labeled dataset, a refined model, and improved hurricane impact assessments.


Introduction
Hurricanes can change coastal landscapes by redistributing large quantities of sediment. For storms that impact the US, coastal landscape change can be assessed using Emergency Response Imagery collected by the National Geodetic Survey Remote Sensing Division of the US National Oceanic and Atmospheric Administration (NOAA; https://storms.ngs.noaa.gov/). This comprehensive aerial imagery is often obtained soon after the storm event and is typically large, both in the number of individual images per storm and in the size of each image. Additionally, this imagery is unlabeled.
Here we use post-storm imagery to identify hurricane impacts to coastal landscapes along the US Atlantic and Gulf coasts. Our work combines ideas from two lines of research: first, the use of crowdsourcing to label the impacts visible in post-storm imagery [Liu et al., 2014, Morgan et al., 2019], and second, the use of machine learning to classify and investigate coastal landscape dynamics [e.g., Buscombe and Ritchie, 2018, Ridge et al., 2019, Goldstein et al., 2019]. We develop an active learning [e.g., Settles, 2011] methodology to identify storm-impacted landscapes from crowd-labeled imagery, specifically the binary classification task of identifying the presence of washover deposits in each image. Washover is a deposit of sediment left on land surfaces after elevated coastal water levels [e.g., Hudock et al., 2014, Lazarus, 2016]. We crowdsource labels for 528 post-storm images from two recent hurricanes (Florence and Michael, both 2018) and develop a deep learning model to detect the presence of washover deposits in a corpus of unlabeled imagery. The model is used to determine which unlabeled images have the most uncertain labels, and these images are passed back to the labeler for human annotation. In addition, a public-facing map displays geolocated ML predictions of washover presence in post-storm images and provides an additional route for researchers to label imagery. Both the labeler and the interactive map provide more training data for future model improvements.

Active Learning Pipeline
The active learning system is broken up into three components: an online labeler, a machine learning model, and an online interactive map (Figure 1). In the subsections below we discuss these components sequentially.
Figure 1: Schematic of the components (boxes) and processes (arrows) involved in the active learning pipeline.

Labeler
The labeling interface [Rafique et al., 2020] serves a stream of images and a given set of questions to coastal researchers (Figure 2). The labeler is designed to accommodate multiple researchers labeling a single image (e.g., to ensure correct labeling via consensus and to assess inter-rater reliability), and is currently hosted on a virtual machine exposed at https://coastalimagelabeler.science/. An administrator uploads images to be labeled (in this case, from NOAA), develops questions for the labeler to present to users, assigns sets of images to each user, and periodically exports the labeled data to an open data repository. We developed a bespoke software package to download and manage the collection of NOAA post-storm imagery [Moretz et al., 2020a,b]. To begin the project, 8 coastal researchers labeled 388 images from Hurricane Florence. Each image was labeled until at least 2 researchers agreed on the label. These images were used to train and test the deep learning model. Following this initial task, the labeler now hosts a changing set of images for all users, selected based on the results of the deep learning model (described in the next section).
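The consensus rule described above (keep collecting labels for an image until at least 2 researchers agree) and a simple inter-rater reliability check can be sketched in Python as follows. This is a minimal illustration, not the labeler's actual code: the function names `consensus_label` and `cohens_kappa` are hypothetical, and Cohen's kappa is one common agreement statistic chosen for the example; the text does not say which reliability measure is used.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def consensus_label(votes: Sequence[Hashable], min_agree: int = 2) -> Optional[Hashable]:
    """Return the first label to reach `min_agree` votes, or None if none has yet.

    `votes` are one image's answers in submission order; None means the image
    should be served to another labeler.
    """
    counts: Counter = Counter()
    for vote in votes:
        counts[vote] += 1
        if counts[vote] >= min_agree:
            return vote
    return None


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same images in the same order."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must label the same, non-empty set of images")
    # Observed agreement: fraction of images given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label: kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    print(consensus_label(["washover", "no washover"]))              # None: keep labeling
    print(consensus_label(["washover", "no washover", "washover"]))  # washover
    print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))                  # 0.5
```

Because labels are checked in submission order, an image stops being served as soon as any label reaches the agreement count, so a tie at that count cannot occur.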

Model
The 388 labeled images from Hurricane Florence (179 washover, 209 no washover) were used to develop a model for the binary classification task of detecting washover presence/absence. We use VGG16 [Simonyan and Zisserman, 2014], as it has been used successfully for other remote sensing tasks [e.g., Sinha et al., 2019, Sumbul et al., 2020]. The base model was initialized with ImageNet weights, joined to one fully connected layer (with 50% dropout), and then fine-tuned with a low learning rate (1e-5). We resize the post-storm images to 416 × 416 pixels, use image augmentation during training (rotation, width and height shift, shear, zoom, horizontal and vertical flip), and train the model for up to 200 epochs with an early stopping callback. We use 140 labeled images from Hurricane Michael (70 washover, 70 no washover), labeled by two coastal scientists, to test the model. The F1 score on the test set is 0.92, and the confusion matrix is shown in Table 1. Visual inspection of results using Grad-CAM [Selvaraju et al., 2017] suggests that the model is correctly identifying washover deposits (Figure 3). The trained model is then used for inference on 9,700 unlabeled images from Hurricane Florence (2018), Hurricane Michael (2018), and Hurricane Isaias (2020). For each of the 3 storms, we select the 100 images with the least label certainty, i.e., the 100 images with sigmoid output values closest to the decision value of 0.5. With the unlabeled images of storm $s$ indexed in ascending order of $\lvert 0.5 - S(I_k) \rvert$, where $S(I_k)$ is the sigmoid output for image $I_k$, the threshold certainty value for that storm is therefore $T_s = \max_{k \le 100} \lvert 0.5 - S(I_k) \rvert$, i.e., the value for the 100th image.
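The uncertainty-sampling rule and the threshold $T_s$ described above can be illustrated with a short Python sketch. It assumes the sigmoid outputs for one storm are held in a dictionary keyed by a hypothetical image identifier; `select_most_uncertain` is an illustrative name, not part of the authors' software.

```python
from typing import Dict, List, Tuple


def select_most_uncertain(scores: Dict[str, float], k: int = 100) -> Tuple[List[str], float]:
    """Pick the k images whose sigmoid outputs S(I) lie closest to the 0.5 decision value.

    `scores` maps a (hypothetical) image identifier to S(I) for one storm.
    Returns the selected identifiers, ordered from most to least uncertain,
    and the threshold T_s = max |0.5 - S(I_k)| over the selected images.
    """
    distance = {image_id: abs(0.5 - s) for image_id, s in scores.items()}
    # Ascending distance from 0.5 means descending uncertainty; sorted() is stable,
    # so ties keep the dictionary's insertion order.
    ranked = sorted(distance, key=distance.__getitem__)
    chosen = ranked[:k]
    threshold = max((distance[i] for i in chosen), default=0.0)
    return chosen, threshold


if __name__ == "__main__":
    storm_scores = {"img_a": 0.10, "img_b": 0.45, "img_c": 0.90, "img_d": 0.52, "img_e": 0.60}
    chosen, t_s = select_most_uncertain(storm_scores, k=2)
    print(chosen)         # ['img_d', 'img_b']
    print(round(t_s, 3))  # 0.05
```

Applied separately to each of the three storms with `k=100`, this would yield the 300-image batch, and every selected image satisfies $\lvert 0.5 - S(I_k) \rvert \le T_s$ for its storm.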
Figure 3: Three images with washover deposits from Hurricane Michael, and the associated Grad-CAM heat maps from the last model layer.
These 300 images are sent back to the labeler for expert annotation (i.e., an uncertainty sampling method of active learning). We select 100 images per storm to limit the burden on labelers while still providing enough images to fine-tune the model further. Once these images have been labeled by coastal experts, we will retrain the model, repeat this active learning step on the remaining unlabeled images, and send a new batch of images to the labeler. See  for the most recent labels.
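One round of the loop described above (predict, pick the most uncertain images per storm, collect expert labels, fold them into the training set) might look like the following Python sketch. The callables `train`, `predict`, and `request_labels` are hypothetical placeholders for model training, inference, and the round trip through the labeler; in practice labeling is asynchronous, whereas here it is simplified to a blocking call.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, int]  # image id -> 1 (washover) or 0 (no washover)


def active_learning_round(
    labeled: Labels,
    unlabeled_by_storm: Dict[str, List[str]],
    train: Callable[[Labels], object],
    predict: Callable[[object, List[str]], Dict[str, float]],
    request_labels: Callable[[List[str]], Labels],
    per_storm: int = 100,
) -> Tuple[object, List[str]]:
    """Run one uncertainty-sampling round; mutates `labeled` and the unlabeled pools."""
    model = train(labeled)
    batch: List[str] = []
    for image_ids in unlabeled_by_storm.values():
        scores = predict(model, image_ids)  # sigmoid output S(I) per image
        # Most uncertain first: smallest |0.5 - S(I)|.
        ranked = sorted(image_ids, key=lambda i: abs(0.5 - scores[i]))
        batch.extend(ranked[:per_storm])
    # Hypothetical round trip through the labeler (blocking here for simplicity).
    new_labels = request_labels(batch)
    labeled.update(new_labels)
    # Drop newly labeled images from each storm's unlabeled pool, in place.
    for image_ids in unlabeled_by_storm.values():
        image_ids[:] = [i for i in image_ids if i not in new_labels]
    return model, batch


if __name__ == "__main__":
    fake_scores = {"f1": 0.49, "f2": 0.90, "f3": 0.30, "m1": 0.55, "m2": 0.05}
    labeled = {"seed": 0}
    pools = {"florence": ["f1", "f2", "f3"], "michael": ["m1", "m2"]}
    _, batch = active_learning_round(
        labeled,
        pools,
        train=lambda labels: "stub-model",
        predict=lambda model, ids: {i: fake_scores[i] for i in ids},
        request_labels=lambda ids: {i: 1 for i in ids},  # stand-in for expert answers
        per_storm=1,
    )
    print(batch)  # ['f1', 'm1']
    print(pools)  # {'florence': ['f2', 'f3'], 'michael': ['m2']}
```

Calling the function repeatedly reproduces the retrain-select-label cycle: each call trains on every label gathered so far and draws the next batch only from images that remain unlabeled.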

Map
Each image is geolocated. We visualize the output of model inference on a map (https://uncg-daisy.github.io/StormImpactMap/) to observe the presence/absence of washover deposits in the context of other data, e.g., hurricane tracks (https://www.nhc.noaa.gov/data/tcr/), measured washover extents [Doran et al., 2019a], and coastal change forecasts [Doran et al., 2019b]. Each post-storm image and its washover prediction is displayed on the online map as a marker (Figure 4). This online interactive map is another route for active learning: users can click a marker to see links to the post-storm image and to the Grad-CAM overlay obtained from the last layer of the model, along with a button to mark the ML prediction as correct (or incorrect, if no washover is actually visible). Because washover is a rare class in the dataset, the map aids in finding and labeling washover imagery and helps counteract class imbalance in future labeled data. Labels collected through the map interface will be incorporated in future model retraining.
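As an illustration of how geolocated predictions could become map markers, and how a user's correct/incorrect click could become a training label, here is a Python sketch. It assumes (this is not stated in the text) that the map reads point markers from a GeoJSON FeatureCollection and that a sigmoid output of at least 0.5 counts as a washover prediction; the record fields (`image_id`, `lat`, `lon`, `score`) and the example coordinates are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def predictions_to_geojson(records: Iterable[Dict[str, Any]], threshold: float = 0.5) -> Dict[str, Any]:
    """Build a GeoJSON FeatureCollection with one point marker per post-storm image.

    Each record is assumed to carry `image_id`, `lat`, `lon`, and the sigmoid
    output `score`; these field names are hypothetical.
    """
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {
                "image_id": rec["image_id"],
                "washover_score": rec["score"],
                "washover_predicted": rec["score"] >= threshold,
            },
        })
    return {"type": "FeatureCollection", "features": features}


def label_from_feedback(predicted_washover: bool, marked_correct: bool) -> bool:
    """Convert a map user's correct/incorrect click into a washover label."""
    return predicted_washover if marked_correct else not predicted_washover


if __name__ == "__main__":
    # Hypothetical record; the coordinates are illustrative only.
    collection = predictions_to_geojson(
        [{"image_id": "img_001", "lat": 34.20, "lon": -77.80, "score": 0.83}]
    )
    print(json.dumps(collection, indent=2))
    print(label_from_feedback(predicted_washover=True, marked_correct=False))  # False
```

Storing the raw score alongside the thresholded prediction would let the map style markers by confidence without rerunning inference.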

Future Directions
We have developed a three-part active learning pipeline for unlabeled post-storm aerial images. The labeler, model, and map work in conjunction toward two overarching goals: 1) to crowdsource the development of a large labeled dataset of storm impacts, and 2) to continually improve a model to detect storm impacts in sandy, low-sloped coastal regions. A successful model could be used for rapid, automated assessment of post-storm imagery. The pipeline currently focuses on identifying hurricane washover deposits, but can be expanded to other impacts (e.g., damage to the built environment, flooding, dune erosion, and ecosystem impacts).
This pipeline does not currently incorporate pre-storm imagery, so the model is susceptible to classifying past storm impacts (i.e., washover from previous storms) and other sandy, bowl-shaped geomorphic features (e.g., dune blowouts) as current washover deposits. Note, however, that washover deposits can be long-lived features that are reactivated in subsequent storms [e.g., Hosier and Cleary, 1977]. Future work will focus on testing different model architectures (to allow for active learning using query-by-committee) and on providing additional data to the learner (i.e., pre-storm imagery, storm water levels, storm wave heights) to improve model performance.
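Query-by-committee selects the images on which a set of models disagrees most. A minimal Python sketch using vote entropy, one standard disagreement measure, is shown below. It assumes each committee member produces a sigmoid output that is thresholded at 0.5, and the function names are hypothetical; it illustrates the planned direction and is not an existing part of the pipeline.

```python
import math
from typing import Dict, List, Sequence


def vote_entropy(committee_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Vote entropy of a binary committee: 0 when all members agree, ln 2 at a 50/50 split."""
    n = len(committee_scores)
    washover_votes = sum(s >= threshold for s in committee_scores)
    entropy = 0.0
    for count in (washover_votes, n - washover_votes):
        if count:  # skip empty classes (0 * log 0 is taken as 0)
            p = count / n
            entropy -= p * math.log(p)
    return entropy


def query_by_committee(scores_by_image: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k image ids with the highest committee disagreement."""
    return sorted(
        scores_by_image,
        key=lambda image_id: vote_entropy(scores_by_image[image_id]),
        reverse=True,
    )[:k]


if __name__ == "__main__":
    committee = {
        "img_a": [0.90, 0.80, 0.70],  # unanimous washover
        "img_b": [0.90, 0.10, 0.60],  # split vote
        "img_c": [0.51, 0.52, 0.53],  # unanimous, though every score is near 0.5
    }
    print(query_by_committee(committee, k=1))  # ['img_b']
```

Unlike the single-model rule of picking sigmoid outputs closest to 0.5, vote entropy is zero whenever all members agree, even if each member's score is near the decision value (as for `img_c`).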