An attempt at improving atmospheric corrections in InSAR using cycle-consistent adversarial networks

Interferometry from satellite radar has thrived as a major asset to study surface deformations from earthquakes, volcanoes, aquifers, glaciers, landslides, etc. Most signals recorded in an interferogram have precise enough models to remove them almost completely. Yet, current models still fail to capture the full range and scales of variations of atmospheric perturbations. This work explores the use of an image-to-image translation method, CycleGAN, to learn a function that wraps and improves an imperfect model for atmospheric correction. CycleGAN is a generative adversarial network, in which discriminators determine which images are real, and generators produce images to fool the discriminators. Training the discriminators and generators against each other improves the resulting translation function. We have tested this approach using Sentinel-1A data gathered around Puebla, Mexico, near the epicenter of the 2017 earthquake that devastated Mexico City. CycleGAN can generate visually compelling fringes, including small-scale perturbations that are absent from the atmospheric models, while limiting noisy areas. However, it fails to capture the variations of amplitude behind the fringes, especially at large scale, and the generated interferograms remain too different from the real interferograms. Solving that amplitude issue could create practical applications for a CycleGAN-type method in atmospheric correction or phase unwrapping. The large amount of InSAR data and the continuous progress of deep learning methods provide ample opportunity for improvement.


Introduction
Interferometric synthetic-aperture radars (InSAR) measure deformations of Earth's surface by sending and recording microwaves at different times and orbits [Moreira et al., 2013]. This technique has grown to become a major asset in geodesy and remote sensing due to its unrivaled spatial coverage, and is now widely used to study earthquakes and landslides, and to monitor volcanoes, aquifers, and glaciers.
Interferograms produced from InSAR data do not just include deformations, but also other signals, such as contributions from the satellite's orbit, Earth's shape, the local topography, and atmospheric delays. Removing those signals is a vital step of InSAR's processing pipeline to extract the deformations as accurately as possible. Most of those signals can be modeled precisely enough to limit the remaining processing errors, except for atmospheric delays.
Variations in atmospheric conditions, e.g., water vapor, pressure, and temperature, delay the signals and perturb the interferograms, sometimes masking the deformations. Several techniques exist to model those atmospheric delays based on other types of data or weather models [Bekaert et al., 2015]. But none of those techniques perfectly reproduce the delays, leaving ambiguities in the deformation patterns that can lead to misestimating the source of deformation.
Recent years have seen a huge increase in the available InSAR data thanks to the Sentinel-1 mission. Most studies focus on interferograms containing deformation events, either sudden ones like earthquakes, or longer, smaller-amplitude ones requiring a stack of acquisitions over a large timespan. This leaves aside numerous interferograms that contain no deformation because no sudden event has occurred and the timespan between their acquisitions remains too small to record any small-amplitude event.
We propose to better estimate atmospheric delays using machine learning and all the interferograms that contain no deformation. Indeed, they constitute examples of all the signals that need to be removed, in which the atmospheric delays dominate over residual processing errors. By computing the atmospheric models at the same location and timespan, we have numerous paired examples of models of atmospheric delays and actual atmospheric delays (section 2). This work attempts to use a deep learning method for image-to-image translation, CycleGAN, to find a relationship between atmospheric models and real interferograms of atmospheric delays, which could ultimately improve atmospheric correction (section 2).
We use Sentinel-1A interferograms related to the 2017 Puebla earthquake as a case study to assess the viability of CycleGAN in this context (sections 3 and 4). This magnitude 7.1 earthquake struck central Mexico on 19 September 2017, damaging infrastructure near Mexico City and killing hundreds of people. Characterizing its source and the resulting deformation could help better understand the processes behind such earthquakes and limit tragedies. But atmospheric perturbations obscure the deformations in InSAR data (figure 1), limiting their usefulness without further atmospheric correction.

Figure 1 Location of the 2017 Puebla earthquake, and interferogram covering the day of the earthquake before and after atmospheric correction (see sections 2.1 and 2.2). Overlaying the deformation modeled by the USGS National Earthquake Information Center [2018] suggests that the interferogram contains deformations related to the earthquake.

Processing Sentinel-1 data
Sentinel-1 is a constellation of two satellites from the European Space Agency (ESA) for SAR data acquisition. Sentinel-1A was launched in 2014, followed by Sentinel-1B in 2016. Each satellite visits the same location every 12 days and covers the entire planet. ESA releases all the acquired data with several stages of preprocessing. In our case, we need single look complexes (SLC) acquired in interferometric wide swath mode (IW) to generate interferograms covering the epicenter of the 2017 Puebla earthquake. Sentinel-1A acquires data almost centered on the epicenter during its descending orbit. We used the open-source Python package PyInSAR [Rude et al., 2018] to download all the SLC covering this exact same footprint from February 2017 to April 2019 (appendix A) and prepare their processing. Most acquisitions occurred every 12 days, although some gaps exist.
We generated the interferograms from those SLCs using the Interferometric synthetic aperture radar Scientific Computing Environment (ISCE) version 2.2.0 [ISCE developer team, 2019]. Interferograms were not generated when two acquisitions occurred more than 14 days apart. In the end, one interferogram covers the day of the earthquake and should contain deformations (figure 1). 59 interferograms contain primarily atmospheric delays and processing errors, and constitute the first part of our training data.

Models for atmospheric correction
Models for atmospheric correction generated at the same time and location as Sentinel-1A's SLC constitute the second part of our training data. Atmospheric delays can be divided into tropospheric delays due to variations in water vapor, pressure, and temperature in the troposphere, and ionospheric delays due to variations in total electron content in the ionosphere. The effect of ionospheric delays depends on the wavelength, and using C-band SAR like Sentinel-1 tends to minimize those perturbations. Thus, we have not included models for ionospheric correction in our analysis.
Several methods to estimate tropospheric delays have been developed based on different data sources [Bekaert et al., 2015], mainly the interferometric phase itself through an empirical linear or power-law relationship with the topography, multi-spectral data from satellites, weather models, and Global Positioning System (GPS) data. Their ability to correct for tropospheric delays varies in space and time, which can leave a large part of the delays untouched. Yu et al. [2017, 2018b] have attempted to solve some of the limitations of those models, mainly independence from cloud cover and global, real-time availability. Their method combines the spatial coverage of the high-resolution weather models from the European Centre for Medium-Range Weather Forecasts (HRES ECMWF) with the quality of tropospheric delays estimated by GPS through a decoupled interpolation of an elevation-dependent component and a turbulent component. The Generic Atmospheric Correction Online Service for InSAR (GACOS, cegresearch.ncl.ac.uk/v2/gacos) implements this model for online, on-demand generation of zenith tropospheric delay maps for a given area and time. All the models for atmospheric correction used in this work come from this service.
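To illustrate how such zenith tropospheric delay maps relate to interferometric phase, the sketch below converts the differential zenith delay between two acquisition dates into a phase correction. This is a simplified illustration under common assumptions (a plain cosine mapping from zenith to slant delay and a two-way travel path), not GACOS's exact procedure; the function name and constant are our own.

```python
import numpy as np

# Approximate Sentinel-1 C-band radar wavelength (5.405 GHz carrier), in meters.
SENTINEL1_WAVELENGTH_M = 0.05547

def zenith_delays_to_phase(ztd_date1_m, ztd_date2_m, incidence_rad):
    """Convert two zenith tropospheric delay maps (meters) into an
    interferometric phase correction (radians).

    Assumes a simple cosine mapping from zenith to slant (line-of-sight)
    delay, and a two-way travel path (hence the factor 4*pi/wavelength).
    """
    slant_delay = (ztd_date2_m - ztd_date1_m) / np.cos(incidence_rad)
    return 4.0 * np.pi * slant_delay / SENTINEL1_WAVELENGTH_M
```

For instance, a differential zenith delay of a quarter wavelength at nadir incidence corresponds to a full pi radians of phase, i.e., half a fringe.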

Image-to-image translation using Generative Adversarial Networks
Image-to-image translation has been a longstanding field of study in computer science. It aims at learning a function to transform, or translate, an input image into an output image. Style transfer is an example of such translation in which the style of a famous painter is applied to a photo. Over recent years, deep learning has made breakthroughs in many areas, and the same applies to image-to-image translation. Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] in particular have been able to accurately translate images in different contexts, for instance generating photos from label maps or edge maps, colorizing images, and transferring styles [e.g., Isola et al., 2017, Zhu et al., 2017a]. GANs manage such results by training two neural networks against each other: a discriminator, which aims at determining whether an image is real, and a generator, which aims at generating images that fool the discriminator.
CycleGAN [Zhu et al., 2017a] pushes the concept further by using two discriminators and two generators, one for the input images, the other for the output images, and by translating the output back to the input (figure 2). Such an approach does not need paired images between input and output, so it is applicable to a wider range of problems, while showing accuracies close to methods requiring paired images. Cycle-consistency can also make CycleGAN more robust to noise even when paired images are available, as reported in seismic velocity inversion [Mosser et al., 2018]. Temporal decorrelation introduces noise in interferograms, making such robustness an attractive property.
InSAR already offers large amounts of interferograms containing only atmospheric delays, and the corresponding atmospheric models can be easily computed. This provides enough training data to train CycleGAN to translate atmospheric models into wrapped interferograms and vice versa. Using wrapped interferograms avoids any bias or artifacts due to unwrapping, and may stabilize training by adding bounds to the interferogram values. In the end, CycleGAN has to learn two functions, one per generator, each with two objectives: (i) a function to wrap and improve an atmospheric model; (ii) a function to unwrap and regress an interferogram. Only the first function would be useful in practice if it improves the atmospheric correction. We have implemented our own version of CycleGAN using the open-source Python package TensorFlow [Abadi et al., 2015].
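The cycle-consistency idea behind these two functions can be sketched in a framework-agnostic way. The snippet below uses toy stand-ins for the two generators (the real generators are convolutional networks) to show the reconstruction loss that CycleGAN minimizes on top of the adversarial losses; the function names are ours.

```python
import numpy as np

def wrap(phase):
    """Wrap phase values into (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

# Toy stand-ins for the two learned generators (illustrative only):
# generator_i should wrap and improve an atmospheric model,
# generator_am should unwrap and regress an interferogram.
def generator_i(atmospheric_model):
    return wrap(atmospheric_model)

def generator_am(interferogram):
    return interferogram  # identity is exact for phases already in (-pi, pi]

def cycle_consistency_loss(x, forward, backward):
    """L1 reconstruction error after a full translation cycle
    (model -> interferogram -> model), one of CycleGAN's objectives."""
    return np.mean(np.abs(backward(forward(x)) - x))

# A model whose values already lie in (-pi, pi] is reconstructed exactly:
model = np.random.default_rng(0).uniform(-3.0, 3.0, size=(64, 64))
loss = cycle_consistency_loss(model, generator_i, generator_am)
```

In the actual training, both cycle directions (model to interferogram to model, and interferogram to model to interferogram) contribute a loss term of this form.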

Validation of the predictions
Evaluating the predictions of GANs remains a tough task [Borji, 2018]. Fortunately, the paired relationship between real interferogram and atmospheric model means that we can compare CycleGAN's predictions to the real interferograms, and assess if there is any improvement on the atmospheric correction compared with the atmospheric model. Since unwrapping can be a hazardous task, we compare wrapped interferometric phase fields using the mean circular deviation [Mardia, 1972, Nikolaidis and Pitas, 1998]:

d = 1 − (1/n) Σ_{i=1}^{n} cos(φ_i − φ̂_i)    (1)

where d is the mean circular deviation; n is the number of cells of the interferometric phase fields; φ is the first interferometric phase field, an interferogram containing real atmospheric delays; and φ̂ is the second interferometric phase field, the corresponding wrapped atmospheric model or the interferogram predicted by CycleGAN. Such a measure has already been used in InSAR to invert interferograms without unwrapping [Feigl and Thurber, 2009].
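The mean circular deviation is straightforward to compute on wrapped phase fields; a possible direct implementation of equation 1 (the function name is ours):

```python
import numpy as np

def mean_circular_deviation(phi, phi_hat):
    """Mean circular deviation between two phase fields (radians).

    Invariant to 2*pi shifts, so wrapped phases can be compared directly
    without unwrapping. Ranges from 0 (identical) to 2 (opposite phases).
    """
    return 1.0 - np.mean(np.cos(phi - phi_hat))
```

Because only the cosine of the phase difference enters the measure, adding any multiple of 2π to either field leaves the deviation unchanged.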

Results
We divided our 60 interferograms into a training set of 55 interferograms and a testing set of 5 interferograms, which includes the interferogram containing the Puebla earthquake and 4 randomly chosen interferograms. Since interferograms come from the difference between phases taken at separate dates, one could imagine similar phase patterns occurring on reversed dates. This provides an easy way to augment the training set: we include the interferogram and atmospheric model corresponding to day 2 − day 1 as well as the interferogram and atmospheric model corresponding to day 1 − day 2, giving us 110 interferograms for training. Data augmentation is a common practice in deep learning to stabilize training and improve the results. Atmospheric models for training are scaled altogether between −1 and 1 to preserve the variations of amplitudes between them while stabilizing training. Interferograms are divided by π to achieve a similar effect. Atmospheric models for testing are scaled using the coefficients from the training set.
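The date-reversal augmentation and the scalings described above can be sketched as follows. This is a minimal illustration under the assumption of a (-π, π] wrapping convention; the function names are ours.

```python
import numpy as np

def wrap(phase):
    """Wrap phase values into (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def augment_by_date_reversal(interferograms, models):
    """Double the training set: the pair for day 1 - day 2 is the negated
    pair for day 2 - day 1 (re-wrapped for the interferograms)."""
    return (interferograms + [wrap(-ifg) for ifg in interferograms],
            models + [-m for m in models])

def scale_models(models):
    """Scale all atmospheric models jointly to [-1, 1], preserving the
    amplitude variations between them; returns the scaled models and the
    (min, max) coefficients to reuse on the test set."""
    lo = min(m.min() for m in models)
    hi = max(m.max() for m in models)
    scaled = [2.0 * (m - lo) / (hi - lo) - 1.0 for m in models]
    return scaled, (lo, hi)

# Interferograms are simply divided by pi to bring them into [-1, 1].
```

The joint scaling is the key detail: scaling each model independently would erase the amplitude differences between dates, which is precisely the information the generator should learn to reproduce.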
Training goes over 451 000 iterations on 512 by 512 pixels tiles randomly extracted from the interferograms of the training set. Similarly to the original implementation, the learning rate starts at 0.0002 and linearly decreases after 220 000 iterations. The number of residual blocks is increased to 12, instead of 6 when training on 128 by 128 pixels images or 9 when training on 256 by 256 pixels images in the original work. All the other hyperparameters of CycleGAN follow the original implementation [Zhu, 2020], since changing them did not improve the results.
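The learning-rate schedule amounts to a small function (a sketch; we assume the decay reaches zero at the final iteration, as in the original CycleGAN implementation):

```python
def learning_rate(iteration, base_lr=2e-4,
                  decay_start=220_000, total_iterations=451_000):
    """Constant learning rate up to decay_start, then linear decay to
    zero at total_iterations."""
    if iteration < decay_start:
        return base_lr
    progress = (iteration - decay_start) / (total_iterations - decay_start)
    return base_lr * max(0.0, 1.0 - progress)
```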
Training a neural network is an optimization problem to minimize a loss function that defines the quality of the outputs. In the case of generative adversarial networks, the goal is to improve generator and discriminator by using them against each other, and, ideally, training should reach an equilibrium so that generator and discriminator could perpetually learn from each other. As such, the loss values from generative adversarial networks provide little information on the quality of the outputs, but they help to assess the stability of the training (figure 3). While the discriminator for the atmospheric models is quite stable, the discriminator for the interferograms displays an unstable behavior during two-thirds of the training period, before eventually stabilizing. This behavior might reflect the difficulty in generating the fringes of the interferograms, which display a complex periodicity and noise absent from the atmospheric models.
CycleGAN successfully generates realistic-looking fringes (figure 4), and even captures some characteristics of the interferograms that are absent or differ in the atmospheric models, such as a loose influence from the topography (see for instance tile 6 of interferogram 31 in figure 4) and some wave-like patterns (see for instance tile 10 of interferogram 10 in figure 4). Some noise also appears in the generated interferograms (see for instance tile 6 of interferogram 18 in figure 4), but it remains consistent with the real interferograms (see for instance the same tile in figure 4) and many predictions remain sharp (see for instance tile 8 of interferogram 12 in figure 4). However, predictions are often dissimilar to their real counterparts, which is even more apparent on the atmospheric models predicted by CycleGAN, with some predictions being completely out of range (see for instance tile 9 of interferogram 51 in figure 4). As CycleGAN is based on convolution layers, it is possible to predict over a larger area than during training (figure 5). Some of the generated interferograms seem closer to their real counterpart than the atmospheric models (see for instance interferogram 51 in figure 5), while others do not (see for instance interferogram 12 in figure 5). Increasing the prediction area blurs the results, although all the generated interferograms reproduce fine-scale patterns visible in the real interferograms but not in the atmospheric models. Such large-scale predictions better highlight CycleGAN's failures. First, it relies heavily on the topographic structures in the atmospheric models to build the fringes (see for instance interferogram 10 in figure 5), leading to an absence of details in the areas of the atmospheric models not dominated by topography. Second, its conversion from interferograms to atmospheric models is less successful: fringes are translated individually and CycleGAN fails to capture the large-scale structure of the atmospheric patterns (see for instance interferogram 12 in figure 5).

Figure 2 CycleGAN's two generators: Generator I translates a real atmospheric model (Real AM) into a fake interferogram (Fake I), i.e., it wraps and improves the model; Generator AM translates a real interferogram (Real I) into a fake atmospheric model (Fake AM), i.e., it unwraps and regresses the interferogram.

Figure 4 For tile x of each interferogram: test data and corresponding predictions by CycleGAN of interferograms and atmospheric models. The test data are 512 by 512 pixels tiles from the test interferograms, the same size used during training. Using the training data, CycleGAN learns to transform interferograms containing real atmospheric delays (Real I) into atmospheric models (Fake AM) and, more importantly, to transform atmospheric models (Real AM) into interferograms (Fake I). In doing so, CycleGAN could learn to correct the atmospheric models and to improve predictions of atmospheric delays in interferograms.
Overall, it remains unlikely that any of those generated interferograms will improve the atmospheric correction. The mean circular deviations with the real interferograms (table 1) show an almost even split between improved and worsened 512 by 512 pixels tiles (figure 4), except in interferogram 51, which confirms a spatial variability in CycleGAN's efficiency. While results seem better on the largest possible tiles (figure 5), the mean circular deviation suggests a clear improvement for interferogram 51 only. But CycleGAN's prediction for this interferogram is noisy, which should make unwrapping challenging.

Discussion
CycleGAN is able to capture and generate various small-scale features of the training data when using a configuration close to default. While this shows its versatility, modifying that configuration only led to similar if not worse results, suggesting that improvements might be difficult to achieve. In our attempts, we modified: the size of the training tiles, the scaling of those tiles, the number of scaling layers in the generator and discriminator, the number of residual blocks in the generator, the last activation of the generator, the weights in the loss functions, and the learning rate. We also tried to use unwrapped interferograms or wrapped atmospheric models, and to reduce the wavelength of the fringes in the interferograms to have more fringes per tile. Since phase is a circular quantity, we tried to use the mean circular deviation (equation 1) in the loss function for the interferograms, which failed to improve the results as well.
Overall, we observe issues similar to those identified by the deep learning community when using CycleGAN. The most noteworthy is the need for similar structures to appear in the two sets to be translated. For instance, CycleGAN is quite good at transforming horses into zebras thanks to their anatomical similarity, but it is less successful at transforming cats into dogs because their anatomies are too divergent. In InSAR, this highlights how far atmospheric models still are from reality: they tend to over-represent the elevation-dependent component. This results in large smooth areas that are all similar, while the corresponding areas in the interferograms display more variability. A stochastic version of image translation [Zhu et al., 2017b, Almahairi et al., 2018] could be an option to compensate for the limitations of the atmospheric models.

Table 1 Mean circular deviations (equation 1) between the real interferograms and the wrapped atmospheric models (GACOS) and between the real interferograms and the predictions by CycleGAN (CycleGAN) for all the test interferograms except interferogram 18, because it must contain deformations from the Puebla earthquake that hinder a fair comparison. A black value indicates the lowest deviation from the real interferogram.

Figure 5 Test data and corresponding predictions by CycleGAN of interferograms and atmospheric models. The test data are the largest possible tiles within the test interferograms. Wrapped atmospheric models are given for comparison. Using the training data, CycleGAN learns to transform interferograms containing real atmospheric delays (Real I) into atmospheric models (Fake AM) and, more importantly, to transform atmospheric models (Real AM) into interferograms (Fake I). In doing so, CycleGAN could learn to correct the atmospheric models and to improve predictions of atmospheric delays in interferograms.
On top of that, weather patterns and correction success can vary a lot from one interferogram to the next, leading to very heterogeneous properties that cannot be fully captured by 55 interferograms. And noisy areas or areas with little atmospheric delay in some interferograms could perturb the training, which might benefit from identifying and removing those areas. Including interferograms from nearby areas may improve the results as well.
As such, the lack of large-scale trend when predicting at larger sizes remains a big issue. It implies that prediction should be done at training size, but 512 by 512 pixels is already huge from a deep learning and hardware perspective, and a full interferogram is much larger. Improving training stability could help from that perspective, because an unstable training prevents CycleGAN from learning a clear representation of the transfer function. Many developments have already been suggested since CycleGAN came out [e.g., Choi et al., 2018, Hoffman et al., 2018, Gokaslan et al., 2019], and maybe better results could be achieved based on those efforts. CycleGAN requires unpaired data, which is less of a strength in InSAR because interferograms and atmospheric models are paired data. Although it has shown good results when working with paired data [Zhu et al., 2017a, Mosser et al., 2018], using a more adapted method [Isola et al., 2017] could improve the results.
While solving that issue is essential for atmospheric correction, it could also expand the use of such an approach, for instance to phase unwrapping, which is a hard exercise because of the spatial structures and noise in interferograms. In this case however, we cannot build a training set using real interferograms, since we would have to use the already existing phase unwrappers and the errors they create. An option could be to create artificial unwrapped interferograms, then wrap them [Rongier et al., 2019], thus creating an artificial training set in which the wrapping transform is accurate in both directions.
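A minimal sketch of that idea: build a smooth synthetic field, treat it as the unwrapped interferogram, and wrap it, which yields a training pair in which both directions of the transform are exact. The plane-wave construction below is an arbitrary illustrative choice, not the generation scheme of Rongier et al. [2019].

```python
import numpy as np

def synthetic_pair(shape=(128, 128), n_waves=8, seed=0):
    """Return an (unwrapped, wrapped) phase-field pair built from a sum
    of random plane waves, usable as exact training data for learning
    to wrap and unwrap."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    unwrapped = np.zeros(shape)
    for _ in range(n_waves):
        kx, ky = rng.uniform(-0.1, 0.1, size=2)   # spatial frequencies
        amplitude = rng.uniform(0.0, 5.0)
        offset = rng.uniform(0.0, 2.0 * np.pi)
        unwrapped += amplitude * np.sin(kx * x + ky * y + offset)
    wrapped = np.angle(np.exp(1j * unwrapped))    # exact wrapping into (-pi, pi]
    return unwrapped, wrapped
```

By construction, the two fields agree exactly up to multiples of 2π, so a network trained on such pairs sees a noiseless version of the wrapping transform in both directions.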

Conclusions
This work illustrates a first attempt at applying CycleGAN, an image-to-image translation method based on generative adversarial networks, to atmospheric correction in InSAR. CycleGAN successfully captures the style of interferograms and generates realistic-looking fringes, but fails to improve atmospheric correction, and more work would be required to use such an approach in practice. The identified failures of CycleGAN are similar to what has been observed by the deep learning community, and developments on the deep learning side might translate directly to InSAR. Moreover, the InSAR community now has access to an incredible amount of data that should open up many avenues to deep learning. While our approach is a post-processing step, integrating machine learning directly into atmospheric modeling might be more efficient and might lead to more meaningful results.