Reconstructing Aerosols Vertical Profiles with Aggregate Output Learning

Aerosol-cloud interactions constitute the largest source of uncertainty in assessments of anthropogenic climate change. This uncertainty arises in part from the inability to observe aerosol amounts at cloud formation levels and, more broadly, the vertical distribution of aerosols. Hence, we often have to settle for less informative two-dimensional proxies, i.e. vertically aggregated data. In this work, we formulate the problem of disaggregating vertical profiles of aerosols. We propose some initial solutions to this aggregate output regression problem and demonstrate their potential on climate model data.


Introduction
Aerosols are atmospheric particles that influence the Earth's energy budget both by scattering radiation directly (direct effect) [8], and by acting as cloud condensation nuclei (CCN) that modulate cloud droplet number and radiative properties (indirect effect). Aerosol-cloud interactions (ACI) contribute the largest uncertainty to the projection of anthropogenic climate change, in part due to the poor estimates of the abundance and vertical distribution of aerosols in the atmosphere [4].
While field measurement campaigns provide detailed aerosol data, these are spatio-temporally sparse [1,13] and provide insufficient constraints on the global distribution of aerosols. In contrast, satellite observations offer long-term global records, but they are typically limited to measurements of aerosol optical properties [12] such as the Aerosol Optical Depth (AOD), defined at a given wavelength λ as:
$$\mathrm{AOD}_\lambda = \int_0^H b_{\mathrm{ext},\lambda}(h)\,\mathrm{d}h \tag{1}$$
where $b_{\mathrm{ext}}$ is the extinction coefficient and the integral is taken over the height $H$ of the atmospheric column.
This manuscript has been accepted to the Tackling Climate Change with Machine Learning Workshop at ICML 2021. This is a non-archival submission that will appear on the workshop website https://www.climatechange.ai/events/icml2021.html.
While AOD is useful as a measure of total aerosol load within the column, it does not provide information on the vertical distribution of aerosols, which strongly influences both the magnitude and even the sign of the aerosol direct and indirect effects. For example, both modelling [14] and observational studies [10] find AOD to be inadequate for assessing aerosol-cloud interactions over vast subtropical ocean areas, which play a key role in determining the radiation balance of the Earth. However, in both cases, the vertically resolved aerosol extinction coefficient $b_{\mathrm{ext}}$ shows significantly higher correlations with CCN or its proxies.
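To make the limitation of AOD concrete, the following minimal sketch builds two hypothetical extinction profiles, one near the surface and one aloft, that integrate to the same AOD. The Gaussian layer shapes, the height grid and the AOD value of 0.2 are illustrative assumptions and are not taken from any dataset.

```python
import numpy as np

def column_integral(profile, heights):
    """Trapezoidal approximation of the integral of `profile` over height."""
    dh = np.diff(heights)
    return float(np.sum(0.5 * (profile[:-1] + profile[1:]) * dh))

def gaussian_layer(heights, center, width, target_aod):
    """Hypothetical extinction profile: a Gaussian layer rescaled to a given AOD."""
    shape = np.exp(-0.5 * ((heights - center) / width) ** 2)
    return shape * target_aod / column_integral(shape, heights)

heights = np.linspace(0.0, 10e3, 201)               # level heights in m (assumed grid)
b_low = gaussian_layer(heights, 1e3, 500.0, 0.2)    # aerosol layer near the surface
b_high = gaussian_layer(heights, 6e3, 500.0, 0.2)   # aerosol layer aloft

aod_low = column_integral(b_low, heights)
aod_high = column_integral(b_high, heights)
level_1km = int(np.argmin(np.abs(heights - 1e3)))   # index of the 1 km level

print(f"AOD, low layer:  {aod_low:.3f}")
print(f"AOD, high layer: {aod_high:.3f}")
print(f"extinction at 1 km: {b_low[level_1km]:.2e} vs {b_high[level_1km]:.2e} m^-1")
```

Both columns carry the same aggregate, yet the extinction at a given level differs by many orders of magnitude, which is why a column total alone cannot constrain aerosol amounts at cloud level.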
We investigate the reconstruction of aerosol vertical profiles using meteorological and chemical covariates as inputs. While our prime motivation is to reconstruct $b_{\mathrm{ext}}$ from satellite measurements of AOD [12,18], the intricacies of combining measurements from different instruments make it challenging to validate any proposed methodology. On the other hand, climate models have readily available aerosol vertical profiles and are self-consistent in the sense that all data is jointly observed. Hence, we propose to use data from NASA's Goddard Earth Observing System, version 5 (GEOS-5) Nature Run (G5NR) simulation [17], a high-resolution global circulation model [3], for model development. We choose sulfates (SO₄) as a case study: these are a major contributor to anthropogenic aerosol pollution and arise primarily through oxidation of sulfur dioxide (SO₂) emitted from burning fossil fuel. G5NR provides matched samples of the vertically resolved sulfate mass concentration $[\mathrm{SO_4}]$ and the mass column density, defined as:
$$\sigma_{\mathrm{SO_4}} = \int_0^H [\mathrm{SO_4}](h)\,\mathrm{d}h \tag{2}$$
To mirror our motivating application, i.e. predicting vertically resolved $b_{\mathrm{ext}}$ from AOD observations only, we propose in this work to probe the vertical reconstruction of $[\mathrm{SO_4}]$ given $\sigma_{\mathrm{SO_4}}$.
Motivated by the study of cloud vertical structures, the task of reconstructing three-dimensional (3D) profiles corresponding to two-dimensional (2D) observations has been framed in the past as fully supervised learning [6]. Collecting high-quality observational data of aerosol vertical profiles at large scale is however infeasible, which makes fully supervised approaches unsuitable. In fact, while Nair and Yu [9] have previously addressed the task of CCN number prediction from atmospheric measurements, they resorted to using model data in order to apply fully supervised learning models.
Spatial disaggregation, or statistical downscaling, is the task of inferring subgrid details given a low-resolution observation process. Postulating an underlying fine-grained spatial field that aggregates into coarse observations, this problem can be framed as weakly supervised learning [21] with aggregate outputs. Existing works [2,7,15,16,19,20] have only considered applications to 2D fields, yet this rationale can be extended to 3D fields that aggregate with respect to height into a 2D field. Furthermore, Kipling et al. [5] show that aerosols' behavior is driven by a relatively small number of atmospheric processes, suggesting that having access to 3D observations of these processes could help vertical profile reconstruction. We thus propose to frame the reconstruction of the $[\mathrm{SO_4}]$ vertical profile as the vertical disaggregation of $\sigma_{\mathrm{SO_4}}$ using 3D atmospheric covariates.
Our contributions are threefold: (i) we propose the novel problem of aerosol vertical profile disaggregation given vertically aggregated targets and 3D atmospheric covariates; (ii) we introduce a dataset of 2D+3D meteorological and chemical covariates from NASA's G5NR simulation, with a focus on sulfate aerosols; (iii) we describe baseline vertical disaggregation models and demonstrate them on the sulfate mass column density disaggregation problem.

Dataset
NASA's GEOS-5 Nature Run (G5NR) is a 2-year (June 2005-May 2007) global, non-hydrostatic mesoscale simulation with a 7 km horizontal resolution and 72 vertical levels (up to 85 km). This simulation includes standard meteorological parameters, as well as aerosol tracers (dust, sea salt, sulfate, black and organic carbon), O₃, CO and CO₂. The simulation is driven by prescribed sea-surface temperature and sea-ice, daily volcanic and biomass burning emissions, and high-resolution inventories of anthropogenic sources.
We used the instantaneous (30 min) products relevant for sulfate formation, at 7 km horizontal resolution, from the 15th of January 2007 as the basis for our dataset. Sulfates are a major contributor to the total AOD in our dataset (see Figure 3 in Appendix A). As depicted in Figure 1, the dataset covers a region of the Pacific Ocean (10-30°S, 80-100°W). Table 1 outlines the variables used in the dataset and the notation that will be used to refer to these in the following sections.

Table 1. Dataset variables. "2D" corresponds to variables indexed by time, latitude and longitude, while "3D" corresponds to variables that also have a height dimension.

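The groundtruth 3D SO₄ mass concentration is calculated by multiplying the SO₄ mass mixing ratio $r_{\mathrm{SO_4}}$ with the (moist) air density $\rho_{\mathrm{air}}$ in the column as:
$$[\mathrm{SO_4}] = r_{\mathrm{SO_4}} \times \rho_{\mathrm{air}} \tag{3}$$
We verify numerically that this field aggregates with respect to height into the 2D field $\sigma_{\mathrm{SO_4}}$ provided by the dataset.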
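The consistency check described above can be sketched as follows on a toy column. Everything numeric here is an assumption for illustration (exponentially decaying air density, constant mixing ratio, trapezoidal integration, 1% tolerance); the analytic `reference` merely stands in for the 2D column density that a dataset would provide.

```python
import numpy as np

def aggregate_over_height(field, heights):
    """Trapezoidal integral of `field` along its last (height) axis."""
    dh = np.diff(heights)
    return np.sum(0.5 * (field[..., :-1] + field[..., 1:]) * dh, axis=-1)

# Toy column (assumed values, not G5NR data): 72 levels up to 85 km.
heights = np.linspace(0.0, 85e3, 72)                    # m
rho0, scale_height, r_so4 = 1.2, 8e3, 1e-9              # kg m^-3, m, kg kg^-1
air_density = rho0 * np.exp(-heights / scale_height)    # kg m^-3
mixing_ratio = np.full_like(heights, r_so4)             # kg kg^-1

# Mass concentration = mass mixing ratio x air density.
so4_concentration = mixing_ratio * air_density          # kg m^-3
column_density = aggregate_over_height(so4_concentration, heights)  # kg m^-2

# Analytic column density of the toy profile, standing in for the stored 2D field.
reference = r_so4 * rho0 * scale_height * (1.0 - np.exp(-85e3 / scale_height))
matches = bool(np.isclose(column_density, reference, rtol=1e-2))
print(f"column density {column_density:.3e} kg m^-2, matches reference: {matches}")
```

On real data the acceptable tolerance depends on how the stored column density was computed, so the `rtol` used here should be treated as a placeholder.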

Baseline Methodologies
In this section, we describe baseline models. While our experiments focus on SO₄ column mass density disaggregation, we keep the notation general.

Problem Statement
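Let $\mathcal{D} = \{\{x_j^{(i)}\}_{i=1}^{H}, y_j, z_j\}_{j=1}^{n}$ be a dataset of $n$ columns with $H$ height levels each. Here $x_j^{(i)} \in \mathbb{R}^{d_x}$ is a spatial covariate that admits 3D coordinates (latitude, longitude and height) and belongs to the $j$-th height column. We denote the tensor concatenation of a column as $x_j := [x_j^{(1)} \dots x_j^{(H)}]^\top \in \mathbb{R}^{H \times d_x}$ and denote $X := [x_1 \dots x_n] \in \mathbb{R}^{n \times H \times d_x}$. $y_j \in \mathbb{R}^{d_y}$ is a column-level covariate that admits 2D latitude and longitude coordinates, and $z_j \in \mathbb{R}$ is the column covariate we wish to disaggregate along height. Likewise, we use the tensor representations $Y := [y_1 \dots y_n]^\top \in \mathbb{R}^{n \times d_y}$ and $z = [z_1 \dots z_n]^\top \in \mathbb{R}^{n}$.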
Let $f : \mathbb{R}^{d_x} \to \mathbb{R}$ be the function of interest we want to recover and consider the linear aggregation operator defined for the $j$-th column as $\mathrm{Agg}_j : f \mapsto \int_{j\text{-th column}} f(x)\,\mathrm{d}h(x)$, where $h$ is a positive measure of height. We then postulate the aggregate observation model
$$z_j = \mathrm{Agg}_j(f) + \varepsilon_j \tag{4}$$
where $\varepsilon_j \sim \mathcal{N}(0, \sigma^2)$ is observation noise.
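With finite data, we substitute $\mathrm{Agg}_j(f)$ with an approximation $\mathrm{Agg}(f(x_j))$ obtained with the trapezoidal rule
$$\mathrm{Agg}(f(x_j)) = \sum_{i=1}^{H-1} \frac{f(x_j^{(i)}) + f(x_j^{(i+1)})}{2}\,\Delta h_j^{(i)} \tag{5}$$
where $\Delta h_j^{(i)}$ is the height difference between $x_j^{(i+1)}$ and $x_j^{(i)}$. With a slight abuse of notation, we will use $\mathrm{Agg}(\cdot)$ in what follows for the approximate aggregation over height of any tensor that admits a height dimension. For example, $\mathrm{Agg}(X) \in \mathbb{R}^{n \times d_x}$ denotes the covariates aggregated over height within each column.
To probe the potential of the vertical disaggregation task, we propose as a first approach to adopt a plain linear model baseline and hypothesize $f(x) = x^\top \beta + c$ with $\beta \in \mathbb{R}^{d_x}$ and $c \in \mathbb{R}$. Without loss of generality, we assume that the intercept is included in the covariates, i.e. $c = 0$.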
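The trapezoidal aggregation over height can be sketched in NumPy as below. The function name `agg`, the array layout (columns, levels, features) and the synthetic data are assumptions made for illustration. The last two lines check the property that the linear baseline relies on: aggregating a linear prediction gives the same result as applying the linear model to aggregated covariates.

```python
import numpy as np

def agg(values, heights):
    """Trapezoidal aggregation over the height axis (axis 1).

    values:  array of shape (n, H) or (n, H, d), one value per column and level
    heights: array of shape (n, H), level heights of each column
    """
    dh = np.diff(heights, axis=1)                    # (n, H-1) layer thicknesses
    mid = 0.5 * (values[:, :-1] + values[:, 1:])     # mean of adjacent levels
    if mid.ndim == 3:
        dh = dh[:, :, None]                          # broadcast over features
    return np.sum(mid * dh, axis=1)

rng = np.random.default_rng(0)
n, H, d_x = 5, 10, 3
heights = np.sort(rng.uniform(0.0, 10.0, size=(n, H)), axis=1)  # irregular levels
X = rng.normal(size=(n, H, d_x))     # synthetic 3D covariates
beta = rng.normal(size=d_x)          # arbitrary linear model weights

agg_of_prediction = agg(X @ beta, heights)      # Agg(X beta), shape (n,)
prediction_from_agg = agg(X, heights) @ beta    # Agg(X) beta, shape (n,)
print(np.allclose(agg_of_prediction, prediction_from_agg))
```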

Aggregate Ridge Regression
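Given observation model (4), it is natural to consider as an objective the regularized quadratic risk between the aggregate observations and the aggregated prediction, which reads
$$\min_{\beta \in \mathbb{R}^{d_x}} \; \|z - \mathrm{Agg}(X\beta)\|_2^2 + \lambda \|\beta\|_2^2 \tag{6}$$
with regularization weight $\lambda > 0$.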
By linearity of the aggregation operator, $\mathrm{Agg}(X\beta) = \mathrm{Agg}(X)\beta$, so the solution to (6) is simply the solution to the ridge regression of the 3D covariates aggregated over height, $\mathrm{Agg}(X)$, against the aggregate targets $z$, given by
$$\beta^\star = \left(\mathrm{Agg}(X)^\top \mathrm{Agg}(X) + \lambda I_{d_x}\right)^{-1} \mathrm{Agg}(X)^\top z \tag{7}$$
While straightforward, this method scales to large datasets, since the only matrix inversion it requires costs $O(d_x^3)$.
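A minimal sketch of this aggregate ridge regression on synthetic data is given below. It assumes an unnormalized squared loss, so that the weights are (AᵀA + λI)⁻¹Aᵀz with A the height-aggregated covariates. The names `agg` and `fit_aggregate_ridge`, the data sizes and the noise level are illustrative assumptions, and a linear solve replaces the explicit inverse for numerical stability.

```python
import numpy as np

def agg(values, heights):
    """Trapezoidal aggregation over the height axis (axis 1)."""
    dh = np.diff(heights, axis=1)
    mid = 0.5 * (values[:, :-1] + values[:, 1:])
    if mid.ndim == 3:
        dh = dh[:, :, None]
    return np.sum(mid * dh, axis=1)

def fit_aggregate_ridge(X, heights, z, lam):
    """Ridge regression of aggregated covariates Agg(X) against aggregate targets z."""
    A = agg(X, heights)                                   # (n, d_x)
    gram = A.T @ A + lam * np.eye(A.shape[1])             # (d_x, d_x)
    return np.linalg.solve(gram, A.T @ z)                 # (d_x,)

# Synthetic problem: only the column aggregates z are observed during training.
rng = np.random.default_rng(0)
n, H, d_x = 500, 12, 4
heights = np.tile(np.linspace(0.0, 1.0, H), (n, 1))
X = rng.normal(size=(n, H, d_x))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
z = agg(X @ beta_true, heights) + 0.01 * rng.normal(size=n)

beta_hat = fit_aggregate_ridge(X, heights, z, lam=1e-3)
profiles_hat = X @ beta_hat          # disaggregated prediction, shape (n, H)
print(np.round(beta_hat, 2))
```

The fitted weights are applied level by level to the 3D covariates, which is what turns a model trained on column totals into a vertical profile prediction.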

Two-Stage Aggregate Ridge Regression
In the above, it is implicitly assumed that we can establish a mapping $j \mapsto \mathrm{Agg}_j(f)$ that associates each column with its aggregated value based on its index only. In doing so, each column is treated and regressed independently of the others. This is unrealistic since we expect continuous fields to correlate across spatial and temporal dimensions. On the other hand, the 2D column-level covariates $y_j$ provide information about correlations between columns. This in turn can be used to embed information on column dependence by learning a mapping $y_j \mapsto \mathrm{Agg}_j(f)$.
In a second baseline, we augment the aggregate ridge regression model by a two-stage process that enables leveraging the information conveyed by 2D covariates.

Stage 1
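We first regress the 2D covariates $Y$ against the aggregated column values $\mathrm{Agg}(X\beta)$. For the sake of simplicity, we assume a linear model $g(y) = \gamma^\top y$ and minimize the regularized empirical quadratic risk
$$\min_{\gamma \in \mathbb{R}^{d_y}} \; \|\mathrm{Agg}(X\beta) - Y\gamma\|_2^2 + \nu \|\gamma\|_2^2 \tag{8}$$
where $\nu > 0$ is a regularization weight. As above, this admits the closed-form solution
$$\gamma^\star = \left(Y^\top Y + \nu I_{d_y}\right)^{-1} Y^\top \mathrm{Agg}(X\beta) \tag{9}$$

Stage 2
We now use the prediction provided by $g(Y)$ to regress against the aggregate targets $z$.
The evaluation of the regressor learnt in Stage 1 reads $g(Y) = Y\gamma^\star = \Upsilon\,\mathrm{Agg}(X\beta)$, with $\Upsilon := Y\left(Y^\top Y + \nu I_{d_y}\right)^{-1} Y^\top$. Hence, substituting the latter for the aggregated column values in (6), we obtain the empirical risk
$$\min_{\beta \in \mathbb{R}^{d_x}} \; \|z - \Upsilon\,\mathrm{Agg}(X)\beta\|_2^2 + \lambda \|\beta\|_2^2 \tag{10}$$
which admits the closed-form minimizer
$$\beta^\star = \left(\mathrm{Agg}(X)^\top \Upsilon^\top \Upsilon\,\mathrm{Agg}(X) + \lambda I_{d_x}\right)^{-1} \mathrm{Agg}(X)^\top \Upsilon^\top z \tag{11}$$
Comparing (11) to (7), we can interpret $\Upsilon$ as a regularizing term enforcing functional smoothness across 2D covariates.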
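The two-stage estimator can be sketched as follows, assuming unnormalized squared losses in both stages so that Υ = Y(YᵀY + νI)⁻¹Yᵀ. The function names, data sizes and regularization weights are illustrative assumptions. The sketch applies Υ to the aggregated covariates without ever forming the n × n matrix, then compares the result with a direct evaluation of the closed form on a small problem.

```python
import numpy as np

def agg(values, heights):
    """Trapezoidal aggregation over the height axis (axis 1)."""
    dh = np.diff(heights, axis=1)
    mid = 0.5 * (values[:, :-1] + values[:, 1:])
    if mid.ndim == 3:
        dh = dh[:, :, None]
    return np.sum(mid * dh, axis=1)

def fit_two_stage_ridge(X, Y, heights, z, lam, nu):
    """Two-stage aggregate ridge: smooth Agg(X) through Y, then regress z on it."""
    A = agg(X, heights)                                          # (n, d_x)
    d_y, d_x = Y.shape[1], A.shape[1]
    # Stage 1: Upsilon @ A, computed as Y (Y^T Y + nu I)^{-1} Y^T A.
    UA = Y @ np.linalg.solve(Y.T @ Y + nu * np.eye(d_y), Y.T @ A)  # (n, d_x)
    # Stage 2: ridge regression of z on the smoothed aggregated covariates.
    return np.linalg.solve(UA.T @ UA + lam * np.eye(d_x), UA.T @ z)

rng = np.random.default_rng(0)
n, H, d_x, d_y = 50, 8, 3, 4
heights = np.tile(np.linspace(0.0, 1.0, H), (n, 1))
X = rng.normal(size=(n, H, d_x))      # synthetic 3D covariates
Y = rng.normal(size=(n, d_y))         # synthetic 2D column-level covariates
z = rng.normal(size=n)                # synthetic aggregate targets
lam, nu = 0.1, 0.1

beta_hat = fit_two_stage_ridge(X, Y, heights, z, lam, nu)

# Direct evaluation with the explicit n x n matrix Upsilon, for comparison only.
A = agg(X, heights)
Upsilon = Y @ np.linalg.inv(Y.T @ Y + nu * np.eye(d_y)) @ Y.T
beta_explicit = np.linalg.inv(
    A.T @ Upsilon.T @ Upsilon @ A + lam * np.eye(d_x)
) @ A.T @ Upsilon.T @ z
print(np.allclose(beta_hat, beta_explicit))
```

Avoiding the explicit Υ keeps memory linear in the number of columns, which matters when n is the number of grid cells of a high-resolution field.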

Experiments
We demonstrate and evaluate the baseline models on the vertical disaggregation of sulfate mass column density using the dataset introduced in Section 2. The ridge regression baseline is referred to as RIDGE and the two-stage ridge approach as TWO-STAGE. We report evaluation against the unobserved groundtruth 3D sulfate mass concentration profiles $[\mathrm{SO_4}]$ and also compare the vertically aggregated prediction against the 2D sulfate mass column density $\sigma_{\mathrm{SO_4}}$ used for training. Scores are reported in root mean square error (RMSE), mean absolute error (MAE) and Pearson correlation (Corr.). Experiments are implemented in PyTorch [11], and the code and dataset are made publicly available.
Model Setup: Since RIDGE only uses 3D covariates, we use as input the 3D variable x = (latitude, longitude, altitude, $r_{\mathrm{SO_2}}$, RH, T, w, q), where the covariate notation is defined in Table 1. For TWO-STAGE, we can additionally leverage column-level 2D covariates in the first stage. We use the 2D covariate y = (latitude, longitude, $\sigma_{\mathrm{SO_4}}$, LWP) to fit the first stage, and then use the same covariates as RIDGE for the second stage. We emphasize that while $\sigma_{\mathrm{SO_4}}$ is also our aggregate target $z$, it can nonetheless be used as a column-level covariate. All input variables are standardized.

                          RIDGE    TWO-STAGE
2D    RMSE (10^-6)        3.47     3.52
      MAE (10^-6)         3.39     3.39
      Corr. (%)           93.5     87.5
3D    RMSE (10^-10)       2.71     2.50
      MAE (10^-10)        1.07     1.10
      Corr. (%)           62.5     63.9

Table 2. Evaluation scores on vertical profile reconstruction; "2D" refers to evaluation against the aggregate $\sigma_{\mathrm{SO_4}}$ targets used for training; "3D" refers to evaluation against the vertical groundtruth.

Results: As depicted in Figure 2, we observe that the model is able to resolve vertically distributed details that correlate with the input covariates. Table 2 suggests that the column-level knowledge conveyed by 2D covariates is reflected in a lower RMSE and a higher correlation for the two-stage ridge regression model on the reconstructed 3D profiles, at the cost of a slightly higher 3D MAE and a weaker fit to the 2D aggregate targets.
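For reference, the three scores used in the evaluation (RMSE, MAE and Pearson correlation) can be computed as in the sketch below. Flattening all columns and levels into a single vector before scoring is an assumption made here; the helper name `scores` and the toy arrays are hypothetical.

```python
import numpy as np

def scores(pred, target):
    """RMSE, MAE and Pearson correlation between flattened arrays."""
    pred, target = np.ravel(pred), np.ravel(target)
    err = pred - target
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "corr": float(np.corrcoef(pred, target)[0, 1]),
    }

# Toy values: three exact predictions and one that is off by 2.
target = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.0, 2.0, 3.0, 6.0])
s = scores(pred, target)
print(s)   # rmse is 1.0 and mae is 0.5 for these toy values
```

The same function applies unchanged to 3D profiles of shape (n, H) and to 2D aggregates of shape (n,), which mirrors the "3D" and "2D" rows of the evaluation.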
The scene plotted in Figure 2 shows the patterns of the cloud liquid water reproduced in the prediction: while the cloud layer can be identified in the groundtruth, the groundtruth contains no clusters of high concentration within the cloud layer such as those seen in the prediction. Cloud liquid water is taken as a proxy for where oxidation of SO₂ would occur, so having explicit oxidant fields could help reduce the bias due to the cloud field. Both prediction and groundtruth feature a layer of SO₄ that extends across all longitudes and beyond 13 km in altitude and that is consistent with the SO₂ mass mixing ratio. Although the prediction in this scene fails to reproduce the thin layer of higher concentration (around 90°W) consistent with high SO₂, some predictions are strongly influenced by the SO₂ mass mixing ratio and reproduce its pattern in full (see Appendix B).

Discussion
Motivated by the prediction of better vertically resolved aerosol proxies, we introduced the new task of vertical disaggregation from aggregated 2D observations. We provide a dataset of G5NR model data including diverse meteorological and chemical covariates, propose baseline vertical disaggregation models, and demonstrate their performance on sulfate mass column density disaggregation.
In future work, we intend to apply the baseline models to collocated observations from the MODIS 2D AOD product [12], CALIOP vertical lidar measurements and more widespread measurements of atmospheric states and compositions. A major benefit of doing so would be to increase the spatiotemporal resolution of CALIOP. Simultaneously, we aim to define evaluation metrics that sensibly penalise vertical incoherence and are hence naturally suited to this problem. Finally, while we limit ourselves in this work to demonstrating simple linear models operating on aggregate output, further directions will include both non-linear (e.g. kernel-based) and Bayesian methods. This will enable a finer treatment of input covariates along with uncertainty quantification.