Reconstruction of Cloud Vertical Structure With a Generative Adversarial Network

We demonstrate the feasibility of solving atmospheric remote sensing problems with machine learning using conditional generative adversarial networks (CGANs), implemented using convolutional neural networks. We apply the CGAN to generating two-dimensional cloud vertical structures that would be observed by the CloudSat satellite-based radar, using only the collocated Moderate-Resolution Imaging Spectroradiometer (MODIS) measurements as input. The CGAN is usually able to generate reasonable guesses of the cloud structure and can infer complex structures, such as multilayer clouds, from the MODIS data. This network, which is formulated probabilistically, also estimates the uncertainty of its own predictions. We examine the statistics of the generated data and analyze the response of the network to each input parameter. The success of the CGAN in solving this problem suggests that generative adversarial networks are applicable to a wide range of problems in atmospheric science, a field characterized by complex spatial structures and observational uncertainties.


Introduction
Clouds are a major component of the hydrological cycle of the Earth and greatly affect its radiative balance, constituting one of the most important yet least well understood climate feedbacks (e.g., Stevens & Bony, 2013; Vial et al., 2013). Since the radiative effect of clouds is greatly dependent on their altitude (Stephens, 2005), their vertical distribution must be understood in order to fully constrain their climate impact observationally. While dozens of passive satellite sensors are currently operational, providing continuous monitoring of clouds in all regions of the Earth, they mostly measure the cloud top height, often in a biased manner (e.g., Garay et al., 2008; Marchand et al., 2010), and thus are unable to fully characterize the vertical profile. Active cloud-observing instruments, that is, radars and lidars, can resolve the cloud vertical structure, but their coverage is much sparser, with only a few such instruments currently operational in Earth orbit. This large disparity in spatial coverage is one reason for the lack of a global three-dimensional (3-D) cloud observation data set, whose absence is a major limitation in the development and validation of atmospheric models. Moreover, the observations of passive sensors are themselves affected by the three-dimensional cloud structure, which can affect the radiative transfer in a manner that is inconsistent with the assumptions of the retrieval algorithms used to derive the physical properties of the cloud (Várnai & Marshak, 2002).
To mitigate the large disparity between passive and active sensor spatial coverage, different algorithmic approaches are available. For instance, several algorithms have been proposed to construct 3-D cloud fields using data from both kinds of sensors as input (Barker et al., 2011; Ham et al., 2015), thereby enabling simulation of solar radiative transfer from available data. However, the Barker et al. (2011) algorithm constructs each vertical column in the 3-D cloud field from a nearby column with similar radiances. While this approach seems successful near the measured cross section, and suffices for modeling radiative properties, it would likely not create completely geometrically realistic 3-D clouds, as each column in the 3-D field is simply a copy of one of the measured columns. Machine learning offers an alternative: advances in generative modeling, in which a model learns the probability distribution of the training data, have recently been driven particularly by the invention of generative adversarial networks (GANs; Goodfellow et al., 2014; Radford et al., 2015; see also section 3). These use adversarial training to learn to map a simple probability distribution (e.g., a set of independent standard normal variables) to the training data distribution. GANs can learn to generate artificial samples that strongly resemble those found in the training set. A relatively straightforward variant, the conditional GAN (CGAN; Mirza & Osindero, 2014), learns the distribution conditional on a given input. CGANs can learn to solve conditional probability problems in which the random fields have complex spatial structures and thus are directly applicable to cloud vertical profile reconstruction. Since GANs learn directly from the data, they allow for solutions that might be precluded by algorithms using prescribed rules.
In this paper, we introduce the application of CGANs to probabilistic problem solving in atmospheric remote sensing. We demonstrate the concept by generating CloudSat radar scenes from collocated Moderate-Resolution Imaging Spectroradiometer (MODIS) observations. Thus, we solve a subproblem of the 3-D reconstruction problem stated above by reconstructing two-dimensional (2-D) cloud vertical structures from one-dimensional (1-D) MODIS data.

Data
The CloudSat satellite carries a nadir-looking 94-GHz cloud radar and flies in the A-Train constellation at a 705-km Sun-synchronous orbit. The primary data product is the radar reflectivity, given in logarithmic dBZ units, which is available in the 2B-GEOPROF data product (Marchand et al., 2008). The MODIS instrument (Platnick et al., 2003) on the Aqua satellite is also part of the A-Train constellation, in which CloudSat operated for the majority of its mission, allowing close spatiotemporal collocation of the data from the two instruments. The Aqua MODIS data have been mapped to the CloudSat data coordinates in the CloudSat MOD06-AUX product.
We used the entire year 2010 of the 2B-GEOPROF and MOD06-AUX products as the basis of our data set. From these data, we extracted nonoverlapping rectangular patches of radar reflectivity, 64 × 64 radar bins in size. We refer to these as "scenes" throughout this paper. In physical coordinates, the 64 × 64 size corresponds to approximately 15 km in height and 70 km in horizontal along-track distance, owing to the 1.1-km along-track resolution and 240-m vertical bin size of CloudSat. The scene height is sufficient to cover nearly the entire altitude range where CloudSat is able to detect clouds, while the horizontal extent means that the scenes reflect the mesoscale organization of clouds and precipitation. We chose this approach, rather than processing each column individually, because adjacent columns are often similar, and thus their probability distributions are strongly dependent on each other. Furthermore, the 70-km scale represents a good compromise between statistical representativeness of observed cloud scales and the inclusion of horizontal cloud correlations. Guillaume et al. (2018) have shown that the distribution of horizontal cloud chord length evaluated from CloudSat data is heavily skewed toward short scales, so that clouds at the CloudSat horizontal resolution of 1.1 km are vastly more frequent than clouds at scales of about 2,000 km, which are very rare.
From the MOD06-AUX product, we extracted four variables: cloud top pressure (P_top), cloud optical depth (τ_c), effective radius (r_e), and cloud water path (CWP). Additionally, we generated a binary cloud mask variable to indicate whether a cloud was detected by MODIS in a given column (if not, this might be either because a cloud was actually absent or due to missing data). Thus, the MODIS data consist of five 64-bin time series for each scene.
In preprocessing, we rescaled the CloudSat radar reflectivity Z_dB linearly from the range [−35 dBZ, 20 dBZ] to [−1, 1] as

Z' = (Z_dB + 7.5 dBZ) / 27.5 dBZ, (1)

with missing points and bins below −35 dBZ set to −1, and bins above 20 dBZ set to 1. We mapped the missing values to the minimum value because radar reflectivity tends to decrease on the edges of clouds and precipitating regions; this allows a smooth transition between cloudy and cloudless regions. The MODIS variables (except the cloud mask) were rescaled as

x' = (x − μ_x) / σ_x or x' = (ln x − μ_ln x) / σ_ln x, (2)-(5)

where μ and σ denote the mean and standard deviation, over the data set, of the variable or of its logarithm. These transformations scale the variables in the data set to near zero mean and unit variance; the logarithm transform was used for some variables to reduce skew. The missing values for these variables were treated differently from the radar reflectivity because not all of them tend to 0 near the cloud edges. Instead, we set each transformed variable to 0 where the data were missing, and also set the cloud mask to 0 there, as opposed to a mask of 1 where data were available. This provides information to the network regarding the location of the missing values, helping the network learn to distinguish between cloudy and cloud-free areas.
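As a concrete illustration, the linear reflectivity rescaling described above can be sketched in NumPy; the constants follow directly from mapping [−35 dBZ, 20 dBZ] onto [−1, 1], but the function name and NaN convention are our own:

```python
import numpy as np

def rescale_reflectivity(z_db):
    """Linearly map radar reflectivity from [-35, 20] dBZ to [-1, 1].

    Missing values (NaN) and bins below -35 dBZ map to -1;
    bins above 20 dBZ map to 1.
    """
    z = np.nan_to_num(np.asarray(z_db, dtype=float), nan=-35.0)
    scaled = (z + 7.5) / 27.5  # [-35, 20] dBZ -> [-1, 1]
    return np.clip(scaled, -1.0, 1.0)
```

The clipping implements the saturation at the range edges, while mapping NaN to −35 dBZ realizes the smooth transition between cloudy and cloudless regions described above.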
The scenes are limited to daytime observations because some MODIS variables are based on measurements of sunlight scattering from the cloud and thus are not available at night. To avoid complications due to terrain echoes in the radar data, we also limited the scenes to those occurring over the oceans. Finally, to avoid processing large numbers of near-empty scenes, we limited the data set to scenes where the MODIS cloud mask indicated a cloud in at least 50% of bins. We recognize that this downselection, made in the interest of efficiency, introduces some bias into the global distribution of samples. The same applies to the use of a single year of training data, which neglects possible interannual variability in the modeled relationship of MODIS-derived cloud properties and radar reflectivity. Depending on the application, it might be useful to retrain the model with different selection criteria.
The final data set consists of 199,622 scenes. Of these, 90% were selected randomly for training, while the remaining 10% were set aside for validation.
The output of the generator network is scaled back to [−35 dBZ, 20 dBZ]. Output bins that have a reflectivity lower than −30 dBZ are then flagged as missing values. This is done because CloudSat rarely detects signals below −30 dBZ, and because the network sometimes generates weak spurious outputs at just above the minimum value. This postprocessing removes these artifacts effectively, thus improving the visual similarity of the real and generated images.
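This postprocessing step can be sketched as follows (a minimal NumPy version, assuming NaN is used to represent missing values; the inverse scaling mirrors the preprocessing mapping of [−35, 20] dBZ to [−1, 1]):

```python
import numpy as np

def postprocess_output(g_out, threshold_dbz=-30.0):
    """Map generator output from [-1, 1] back to dBZ and mask weak echoes.

    Bins below the CloudSat sensitivity threshold (about -30 dBZ) are
    flagged as missing (NaN), removing weak spurious outputs.
    """
    z_db = g_out * 27.5 - 7.5  # [-1, 1] -> [-35, 20] dBZ
    return np.where(z_db < threshold_dbz, np.nan, z_db)
```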

GAN Architecture and Training
The machine learning problem is stated formally as follows: Given a vector y, containing the MODIS observations described above, we seek to characterize the conditional probability distribution p data (x|y) of CloudSat scenes x. We use the CGAN to solve this problem by training a generator neural network to map vectors z, whose each element z i is sampled from the standard normal distribution, to CloudSat scenes x, conditional to the MODIS observation vectors y. Following the GAN principle, the generator is trained adversarially against a discriminator network, which is trained simultaneously with the generator. The discriminator is trained to distinguish generated samples from real samples, while the generator is trained to "fool" the discriminator as much as possible.
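In the usual convention where the discriminator D outputs the probability that a sample is real, this adversarial training corresponds to the standard conditional GAN value function (Mirza & Osindero, 2014):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x \mid y)} \left[ \log D(x, y) \right]
  + \mathbb{E}_{z \sim \mathcal{N}(0, I)} \left[ \log \left( 1 - D(G(z, y), y) \right) \right]
```

The discriminator used in this paper instead outputs the probability that a sample is generated, which swaps the roles of the two logarithmic terms but is equivalent up to relabeling.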
For the generator, we use a deep CNN that takes as its inputs the MODIS observation vector y and the noise vector z. The generator has one densely connected layer followed by four convolutional layers. Following deep convolutional GAN practice (Radford et al., 2015), we use upsampling layers followed by convolutions. Each hidden layer is followed by a rectified linear unit (ReLU) activation (Nair & Hinton, 2010) and a batch normalization step (Ioffe & Szegedy, 2015). The final layer uses a tanh activation with outputs between −1 and 1; this is then rescaled to the appropriate dBZ range.
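A simplified Keras version of such a generator is sketched below for illustration. The noise dimension, layer widths, and kernel sizes here are placeholders rather than the paper's exact configuration (which is given in its Figure S1); only the overall structure (one dense layer, four upsample-then-convolve blocks, ReLU plus batch normalization, tanh output) follows the description above:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(noise_dim=64, modis_shape=(64, 5)):
    """Illustrative CGAN generator: one dense layer, four conv layers."""
    z = layers.Input(shape=(noise_dim,))         # noise vector
    y = layers.Input(shape=modis_shape)          # 64 columns x 5 MODIS variables
    h = layers.Concatenate()([z, layers.Flatten()(y)])
    h = layers.Dense(4 * 4 * 256)(h)
    h = layers.Reshape((4, 4, 256))(h)
    for filters in (128, 64, 32):                # 4x4 -> 8x8 -> 16x16 -> 32x32
        h = layers.UpSampling2D()(h)
        h = layers.Conv2D(filters, 5, padding="same")(h)
        h = layers.ReLU()(h)                     # ReLU then batch normalization
        h = layers.BatchNormalization()(h)
    h = layers.UpSampling2D()(h)                 # 32x32 -> 64x64
    x = layers.Conv2D(1, 5, padding="same", activation="tanh")(h)
    return tf.keras.Model([z, y], x)
```

The tanh output in [−1, 1] is then rescaled to dBZ as described in section 2.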
The discriminator takes as its input a scene x and a MODIS observation vector y. The MODIS observations are first upsampled into 64 × 64 bin channels using a four-layer convolutional network similar to the architecture used in the generator. The upsampled MODIS observations and the generated image are then processed using four hidden layers, each using strided convolutions followed by leaky rectified linear unit activations (with negative slope of 0.2) and dropout. The output layer is densely connected to the final hidden layer and is sigmoid activated to yield a number between 0 and 1 representing the probability that the input scene is a fake sample created by the generator (as opposed to a real CloudSat scene).
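A correspondingly simplified Keras discriminator is sketched below. The layer widths are again placeholders, and for brevity the MODIS observations are simply tiled to 64 × 64 channels here, rather than upsampled by the learned four-layer convolutional network used in the paper:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(modis_shape=(64, 5)):
    """Illustrative CGAN discriminator with strided convolutions."""
    x = layers.Input(shape=(64, 64, 1))          # radar scene (real or generated)
    y = layers.Input(shape=modis_shape)
    # Tile the 1-D MODIS observations into 64x64 channels (the paper
    # instead uses a learned convolutional upsampling network).
    h_y = layers.Reshape((1, 64, 5))(y)
    h_y = layers.UpSampling2D((64, 1))(h_y)
    h = layers.Concatenate()([x, h_y])
    for filters in (32, 64, 128, 256):           # four strided conv blocks
        h = layers.Conv2D(filters, 5, strides=2, padding="same")(h)
        h = layers.LeakyReLU(0.2)(h)             # negative slope 0.2
        h = layers.Dropout(0.25)(h)
    h = layers.Flatten()(h)
    p_fake = layers.Dense(1, activation="sigmoid")(h)  # P(input is generated)
    return tf.keras.Model([x, y], p_fake)
```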
The generator and discriminator networks are described in detail in Figure S1 in the supporting information.
The code and training data are available as described in the Acknowledgments.

10.1029/2019GL082532
To train the CGAN, we alternated between training the generator with a single batch of data and training the discriminator with two batches, one containing real samples and the other containing generated samples. We trained the CGAN for a total of 45 epochs, gradually increasing the training batch size from 32 to 256. The Adam optimizer (Kingma & Ba, 2014) was used to train both the generator and the discriminator. We performed the training using a single Nvidia Tesla K80 general-purpose graphics processing unit; the full training required approximately 40 hr.

Figure 1 displays selected examples of generated CloudSat scenes for a variety of different MODIS measurements. The top two rows in each column show the MODIS variables, the four middle rows show scenes generated by the CGAN from the MODIS data, and the bottom row shows the actual CloudSat scene that corresponds to the MODIS data. All data shown are from the validation data set; that is, they were not used to train the network. Of the generated scenes, the topmost shows the image generated with the noise input z set to all zeros, representing the most likely answer according to the CGAN. The other generated images were created with randomly sampled noise vectors. On each generated image, a root-mean-square error (RMSE) relative to the real image is also plotted, calculated with missing data set to −30 dBZ before taking the difference. The RMSE is an imperfect metric because the GAN is explicitly designed to optimize not the RMSE but rather the visual similarity, as defined by the discriminator. In general, quantitative evaluation of GAN-generated images is a topic of ongoing debate with no clear consensus (Borji, 2018). Nevertheless, the RMSE can give some indication of the accuracy of the reconstruction.

Generated Versus Real Scenes
It is evident from Figure 1 that the CGAN generator can create realistic-looking radar reflectivity scenes. Columns 1 and 2 show scenes that contain fairly uniform cloud layers. In these, the structure of the cloud is accurately predicted by the CGAN: The radar echo top height and the geometric thickness of the cloud are predicted to within 1 km, and the radar reflectivity of the generated cloud also has very similar values. The textures are also similar between the real and generated scenes: Scene 1 is relatively uniform, while the structure of the cloud in Scene 2 is more complex. However, the generator misses certain specific details in both scenes, such as the change in the altitude of the radar echo top in the middle of Scene 1, and the low-level cloud that is present in Scene 2, although in this case, one of the solutions does include a low-level cloud in the wrong position.
Columns 3-5 in Figure 1 demonstrate various cases where the CGAN successfully infers the presence of multilayer clouds. It appears that the generator exploits the spatial variability of the MODIS variables to infer the presence of multiple cloud layers. In Columns 3 and 4, the cloud top pressure P top is variable, and this seems to drive the CGAN to create multiple layers. In Column 4, the best match to the real scene is notably not the scene deemed most likely by the CGAN, but rather one of the randomly sampled scenes. This demonstrates the advantage of the CGAN generating a distribution of possible predictions for a given input. In Column 5, the increase of c and CWP on the right side of the scene apparently allows the CGAN to infer the presence of a thick low-level cloud, probably of convective origin, underlying the thinner cloud layer around 12-km altitude.
Columns 6 and 7 of Figure 1 show high-reflectivity scenes where the radar echo reaches the surface. In these scenes, as with Columns 1 and 2, the cloud top height is accurately predicted by the GAN, as is the general intensity of the radar echo. The generated scenes in Column 6 also include traces of the melting layer bright band that is evident in the real scene, although the generated bright band is not nearly as sharp as that in the real scene. This could possibly be improved by including information about the atmospheric temperature in the CGAN inputs, but we did not explore this in the current study. In both Columns 6 and 7, the most likely solution resembles the real scene quite closely, while the randomly sampled scenes include some solutions where the radar echo does not reach the surface, leading to a higher RMSE.
Finally, Column 8 of Figure 1 demonstrates a case where gaps in cloud detection by CloudSat are correctly predicted by the CGAN; while there are gaps in the MODIS data, these clearly do not correspond exactly to the missing CloudSat echoes. This demonstrates that the CGAN can predict situations where CloudSat would not detect a cloud even though it is seen by MODIS. Conversely, there are significant MODIS data gaps on the right side of the scene, but the CGAN correctly generates a low-level cloud there regardless; apparently, the CGAN can recognize situations where data gaps are caused by missing data (e.g., rejected retrievals) rather than actual absence of clouds and enforce continuity in the generated cloud scene.
In contrast to Scenes 1-8, the CGAN has some difficulty making correct predictions in Scenes 9-16 of Figure 1. In Column 9, the radar echo in the real scene reaches the ground, while the generated scenes do not reproduce this. In Scene 10, a multilayer cloud is incorrectly interpreted as a deeper, single-layer cloud. The real scene in Column 11 is quite uniform and contains a pronounced reflectivity intensification at the melting layer; in the generated scenes, the layer is much thinner on the left side of the scene than on the right, and no melting layer is present. Notably, the MODIS data in this scene contain rather large gaps that have no obvious counterpart in the CloudSat data. In Column 12, the real scene contains a detected cloud that covers almost all of the horizontal extent of the scene, but the CGAN predicts a radar echo much more concentrated on the left side. In Column 13, the CGAN generates a spurious second cloud layer on the left and in the center, and also mostly misses the convective cloud in the middle of the scene. Finally, Columns 14-16 contain complicated scenes that the CGAN appears to find difficult to interpret. In each case, there is considerable variability among the generated scenes, none of which correspond to the real scene particularly well. The common feature in these scenes seems to be that a high, continuous cloud layer prevents MODIS from seeing the cloud layers below. In such cases, it is hardly surprising that not much can be reliably predicted about the underlying clouds. Thus, the high variability among the generated radar reflectivity fields seems to reflect the uncertainty of the CGAN about the correct solution.
Naturally, in probabilistic predictions, the most likely solution is not always the correct one; rather, in a properly functioning probabilistic model, one would expect to find the correct solution somewhere within the predicted distribution. In Figure 1, only four generated solutions are shown for each case due to space constraints. So few samples cannot be expected to fully represent the entire probability space. In order to further explore the probability space of our predictions, we have included Figures S2-S9 in the supporting information. These correspond to the problematic Scenes 9-16 of Figure 1 but show 64 randomly generated examples for each scene. Additionally, to widen the range of predictions made by the CGAN, we used a noise standard deviation of 2 rather than 1 in the noise input z of the generator. As expected, increasing the noise standard deviation led to higher variability in the generated scenes. Meanwhile, this increase in the noise neither reduced the credibility of the generated images nor triggered the generation of obvious artifacts.
With the higher variability and the larger number of generated samples drawn for each scene, the generated probability space in most cases includes scenes that correspond closely to the correct solution. Solutions where the reflectivity field reaches the surface can be found for Scene 9 (Figure S2), and multiple cloud layers at roughly the right altitudes are present in some examples in Scene 10 (Figure S3), although the radar reflectivity in these remains too high. Likewise, some solutions in Figure S4 are considerably more horizontally uniform than those found in Scene 11 of Figure 1, and the solutions in Figure S5 include scenes with extended low-level clouds resembling that of the real solution in Column 12. In Figure S6, there are some solutions where the spurious cloud on the left is weaker than in the solutions shown in Column 13, and others where the convective cloud in the middle is stronger. These solutions improve the representation of these features, but none of the generated scenes in Figure S6 completely reproduce the real scene; in particular, the spurious second cloud layer persists at least partially in all of the generated images. In Figures S7-S9, corresponding to Columns 14-16 of Figure 1, the high variability of the generated scenes further demonstrates the uncertainty of the CGAN about the vertical structure of the clouds. This is accompanied by a higher RMS variability, indicated on top of each plot, which can be used as a simple diagnostic for uncertainty about the correct solution. In each of these cases, some of the generated images somewhat resemble the real scene, indicating that the highly variable solution space also includes the correct solution with a non-negligible probability.
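The RMS variability used above as an uncertainty diagnostic can be computed from an ensemble of generated scenes as follows (a NumPy sketch under our own naming; the ensemble itself would be produced by running the generator with independently sampled, optionally inflated, noise vectors):

```python
import numpy as np

def rms_variability(samples):
    """RMS deviation of an ensemble of generated scenes from their mean.

    `samples` has shape (n_samples, height, width). The returned scalar
    summarizes how much the generated solutions disagree, serving as a
    simple diagnostic of the CGAN's uncertainty for a given input.
    """
    mean_scene = samples.mean(axis=0)
    return float(np.sqrt(np.mean((samples - mean_scene) ** 2)))
```

A wider sampling of the probability space, as used for Figures S2-S9, corresponds to drawing the noise as, e.g., `z = 2.0 * np.random.randn(n_samples, noise_dim)` instead of using unit standard deviation.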
The scenes shown in Figure 1 were selected manually to demonstrate the operation of our CGAN in various situations. As such, they are not statistically representative of the data set. In order to provide further examples of the functionality of the CGAN over the entire data set, we have also included Figures S10-S17 in the supporting information. These figures are equivalent to Figure 1, except that the cases shown in them have been selected randomly.

Dependence on MODIS Parameters
The above analysis suggests that the CGAN has learned a fairly complex, nonlinear response to the MODIS variables. Nevertheless, it can be instructive to examine how the generator responds to simple changes in the input variables. In Figure 2, we have plotted the changes in the generated cloud scene while varying each of the four MODIS variables individually. The middle column shows the scene generated from synthetic MODIS data with all transformed variables (as defined in equations (2)-(5)) set to their mean values in the data set, while each row shows the variability of the generated scene when a single input variable is varied from 2 standard deviations below the mean (−2.0σ) to 2 standard deviations above the mean (+2.0σ). All scenes have been generated with zero noise in order to give the most likely answer according to the generator.
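The one-at-a-time sensitivity sweep described above can be constructed as in the sketch below (our own illustrative function; in the transformed variable space, the data set mean is 0 and one standard deviation is 1, so the sweep simply offsets a single input channel):

```python
import numpy as np

def parameter_sweep(var_index, n_steps=9, n_vars=5, n_bins=64):
    """Synthetic MODIS inputs for a one-at-a-time sensitivity sweep.

    All transformed variables are held at their data set mean (0) while
    variable `var_index` is varied from -2 to +2 standard deviations,
    constant along the 64-bin scene. Feeding each row to the generator
    with zero noise yields Figure 2-style panels.
    """
    offsets = np.linspace(-2.0, 2.0, n_steps)
    sweep = np.zeros((n_steps, n_bins, n_vars))
    sweep[:, :, var_index] = offsets[:, None]
    return sweep
```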
In many cases, the scenes generated in this way do not look physically realistic. This is probably because, in reality, the parameters do not vary individually but are significantly correlated. Nevertheless, it is encouraging that the generator is well behaved, in the sense that no scenes contain obvious image processing artifacts and the response to the parameters is smooth. The response to a change in P_top is the easiest to interpret, as increasing P_top corresponds to lowering echo tops in the generated cloud scene up to +1.0σ. This good correspondence can be expected, as the CGAN also accurately predicted the echo top heights in section 4.1. However, at high P_top, the clouds become increasingly thin and multilayered. The low cloud layer, which seems to correspond to the P_top observation, is barely visible at +1.5σ and disappears altogether at +2.0σ. The lowest P_top scenes are also accompanied by lower-altitude clouds. A plausible explanation of this is that very low P_top usually occurs with anvil clouds originating from deep convection, which is often accompanied by shallower convective clouds.
The effective radius r_e is another variable for which one can make a physical interpretation of the generator response. Low r_e occurs in nature in nonprecipitating clouds, which tend to be somewhat shallow in vertical extent and also have weak radar reflectivity signatures. Conversely, high r_e typically occurs in precipitating clouds, which have higher reflectivities that cover a larger vertical extent (as the radar is sensitive to the precipitation in addition to the cloud). The CGAN response to r_e is consistent with this relationship.
The effects of τ_c and CWP individually are difficult to interpret, since in practice, these two variables are strongly dependent on each other (for details, see, e.g., Grosvenor et al., 2018). Thus, it is physically unrealistic to change one of these without changing the other. Low values of τ_c create a vertically shallow, high-reflectivity cloud layer, which probably would not occur in realistic scenarios. Meanwhile, high values of τ_c create a deep, high-reflectivity (i.e., precipitating) region with a low-reflectivity layer on top. Curiously, the scenes generated with low CWP are similar to those produced by high τ_c. Meanwhile, high CWP leads to a rather unrealistic-looking layer with high reflectivity around 5- to 8-km altitude, with lower-reflectivity regions both above and below.

Cloud Vertical Distribution
A downside of using adversarial training in GANs is that there is no clear, specific metric by which to judge model performance. However, we can still examine the distribution of the data statistically and compare the generated and real data sets. A commonly used method for analyzing radar data climatologically is to present the aggregated data as a two-dimensional joint distribution of altitude and radar reflectivity, sometimes called a contour frequency by altitude diagram (e.g., Steiner et al., 1995). We present these distributions for our data set in Figure 3. The histogram for the real data was computed from the validation data set, while the generated histogram was obtained by running the generator for each scene in the validation set using randomly sampled noise.
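Such a joint altitude-reflectivity distribution can be computed with a two-dimensional histogram; a minimal NumPy sketch (function name and normalization convention are our own):

```python
import numpy as np

def cfad(reflectivity, altitudes, z_bins, alt_bins):
    """Contour frequency by altitude diagram (CFAD).

    `reflectivity` has shape (n_profiles, n_levels) in dBZ, and
    `altitudes` gives the height of each level. Returns a 2-D joint
    histogram of altitude and reflectivity, normalized so that the
    frequencies sum to 1; missing values (NaN) are ignored.
    """
    alt = np.broadcast_to(altitudes, reflectivity.shape)
    valid = np.isfinite(reflectivity)
    hist, _, _ = np.histogram2d(
        alt[valid], reflectivity[valid], bins=[alt_bins, z_bins]
    )
    return hist / hist.sum()
```

Applying this to the real and generated validation scenes with identical bins yields directly comparable histograms of the kind shown in Figure 3.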
Clearly, the generated histogram replicates the most significant features of the histogram for the real data set. The CGAN also replicates the decreasing occurrence near −30-dBZ reflectivity, which is caused by the CloudSat radar detecting only some of the radar echoes near its sensitivity limit. However, this transition appears to be more gradual in the generated data than in the real data set. The extremes of the real and generated histograms also seem to have similar distributions, indicating that the CGAN captures the data distribution well near the extreme values.
The relationship of reflectivity and altitude can also help illustrate regional differences in cloud structure (Leinonen et al., 2016; Oreopoulos et al., 2017). In Figures S18-S24 in the supporting information, we also show the same plot for 20° zonal bands. The accuracy of these is similar to Figure 3, indicating that the GAN does not suffer from significant regional bias. Furthermore, we show the standard deviation of occurrence for both the real and generated data in Figure S25, which indicates that the generator correctly captures the variability of the data.

Conclusions
The CGAN described in this study is capable of generating crisp images that strongly resemble the radar reflectivity scenes in the data set. Most of the time, the CGAN generates cloud vertical structures that are close to those measured by CloudSat, using only the collocated MODIS data as input. The generator is capable of exploiting the spatial structure of information in the input data, most notably inferring the presence of multilayer clouds. It is robust in cases of missing data, being able to interpolate into regions of missing MODIS inputs. The generator can also characterize the uncertainty of its predictions to some degree, creating more variability in its outputs in cases where the uncertainty is high, although we observed a few cases where the variability appears underestimated, as none of the generated scenes in the output distribution match the real scene particularly well. The generator is also able to generalize its learning to the validation data set, which was not used for training.
Based on these results, we argue that machine learning using GANs (and CGANs specifically) has the potential to solve a variety of problems in atmospheric remote sensing, and observational Earth science in general. Typical problems in this field involve complex spatial structures, which CNNs handle effectively, and incomplete measurements, which are best treated using probability distributions, an integral feature of GANs. Conditional probability problems, in particular, are ubiquitous in the formulation of remote sensing retrieval problems and are naturally handled by CGANs. This study is intended to demonstrate these capabilities and lay the foundations for further investigations that target more practical applications. For instance, reconstructing 3-D cloud scenes from MODIS 2-D imagery, as opposed to reconstructing 2-D vertical profiles from 1-D MODIS data as in this study, would make available an estimate of cloud vertical structure over very large areas, as the MODIS data cover a swath of over 2,000 km rather than the single nadir-pointing scan obtained by CloudSat. This could also be useful in the context of missions such as EarthCARE, for which 3-D reconstruction algorithms are being developed (Barker et al., 2011). Implementing such reconstruction using GANs will likely involve substantial challenges related to network design and computational requirements.


Acknowledgments
The original CloudSat data products 2B-GEOPROF and MOD06-AUX are available at the CloudSat Data Processing Center (http://www.cloudsat.cira.colostate.edu/). The training data set has been made available by Leinonen (2019). A Python/Keras implementation that can be used to reproduce the results is available at https://github.com/jleinonen/cloudsat-gan. The research of J. L. and A. G. was carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology, under a contract with the National Aeronautics and Space Administration (NASA) and funded through the internal Research and Technology Development program. The High Performance Computing resources used in this investigation were provided by funding from the JPL Office of the Chief Information Officer. T. Y. acknowledges funding from NASA Grant 80NSSC18M0084, "Making Earth System Data Records for Use in Research Environments," PM: Lucia Tsaoussi.