Use of multi-spectral visible and near-infrared satellite data for timely estimates of the Earth’s surface reﬂectance in cloudy conditions: Part 2 - image restoration with HICO satellite data in overcast conditions

Satellite-based visible and near-infrared imaging of the Earth’s surface is generally not performed in moderate to highly cloudy conditions; images that look visibly cloud covered to the human eye are typically discarded. Here, we expand upon previous work that employed machine learning (ML) to estimate underlying land surface reﬂectances at red, green, and blue (RGB) wavelengths in cloud contaminated spectra using a low spatial resolution satellite spectrometer. Speciﬁcally, we apply the ML methodology to a case study at much higher spatial resolution with the Hyperspectral Imager for the Coastal Ocean (HICO) that ﬂew on the International Space Station (ISS). HICO spatial sampling is of the order of 90 m. The purpose of our case study is to test whether high spatial resolution features can be captured using multi-spectral imaging in lightly cloudy and overcast conditions. We selected one clear and one cloudy image over a portion of the panhandle coastline of Florida to demonstrate that land features are partially recoverable in overcast conditions. Many high contrast features are well recovered in the presence of optically thin clouds. However, some of the low contrast features, such as narrow roads, are smeared out in the heavily clouded part of the reconstructed image. This case study demonstrates that our approach may be useful for many science and applications that are being developed for current and upcoming satellite missions including precision agriculture and natural vegetation analysis, water quality assessment as well as disturbance, change, hazard, and disaster detection.


Introduction
Space-borne hyper-spectral imagers are instruments that typically have a spatial resolution of the order of 100 m or better and spectral coverage from the visible through near-infrared (NIR) and sometimes also encompassing ultraviolet (UV) and/or short-wave infrared (SWIR) wavelengths.These spectrometers are enabling a host of new science and applications.For example, the 2018 US National Academies' Decadal Survey identified several priorities for a hyper-spectral sensor including the monitoring of terrestrial and aquatic ecosystem physiology and health; snow and ice albedo, accumulation, and melting; active surface changes including those due to volcanic eruptions, landslides, and other hazards; and land use changes and resulting e↵ects on fluxes of energy, water, and carbon (National Academies of Sciences, Engineering, and Medicine, 2018).Several such sensors have been or are currently flying in low Earth orbit (LEO) and more are planned for launch over the next decade.Remote sensing of the Earth's surface with hyperspectral imagers is typically not attempted in moderately cloudy conditions.Rather, observations in overcast cloudy conditions are commonly discarded in most types of surface remote sensing with backscattered sunlight (e.g., Thompson et al., 2014).Atmospheric correction, defined as the removal of atmospheric and unwanted e↵ects such as Rayleigh, cloud, and aerosol scattering, remains a critical part of ocean and land remote sensing algorithms (e.g., Frouin et al., 2019;Lyapustin et al., 2011aLyapustin et al., ,b, 2012;;Thompson et al., 2015Thompson et al., , 2016)).This is a preprint and has not been peer reviewed.Joiner et al. (2021) (hereafter referred to as J21) demonstrated that surface color (red, green, and blue components) could be recovered in cloud conditions of low to moderate optical thickness as well as in heavily loaded absorbing aerosol using satellite multi-spectral data.The instrument they used, the Global Ozone Monitoring Experiment 2 (GOME-2), covers the UV through NIR with continuous measurements at about 0.5 nm spectral resolution and spatial resolution of the order of 40 km 2 .Within the fairly large GOME-2 pixels, many surface spatial details are obscured.It was therefore unclear to what extent and under which conditions cloud e↵ects blur surface imagery reconstructed at higher spatial resolution.
In this work, we expand on the study of J21 to examine whether their cloud-clearing approach can be e↵ectively applied to higher spatial resolution multi-spectral imagers with lower spectral resolution and sampling.Here, we use the Hyperspectral Imager for the Coastal Ocean (HICO) that flew on the International Space Station (ISS).The results of J21 showed that their approach should work well at HICO spectral resolution and sampling.We conduct a case study over the northern coast of Florida, where there were overcast conditions over much of the scene with optically thin clouds over a portion of the scene transitioning to overcast conditions with optically thick clouds over another portion.With HICO we also examine the accuracy of the cloud-clearing at near-infrared wavelengths and show how this applies to reconstructions of a vegetation index; this was not done in the previous J21 GOME-2 study.
The basic cloud-clearing approach involves the use of an artificial neural network (NN) to retrieve surface spectral reflectances from observations in overcast skies.The approach is a spectral method of the type that has been employed for image dehazing, with far ranging applications such as search and rescue and event recognition (e.g., Mehta et al., 2020, and references therein) and ocean remote sensing in the presence of thin clouds, aerosol, and glitter (Frouin et al., 2014;Gross-Colzy et al., 2007a,b;Schroeder et al., 2007;Steinmetz et al., 2011).Unlike other approaches such as spatial, temporal, and noncomplementation methods (e.g., Li et al., 2019;Wang et al., 2019;Zhang et al., 2018b, and references therein), our methodology does not require a priori information about the Earth's surface or atmosphere (beyond an appropriate data set for NN training).In addition to cloud e↵ects, the observations are impacted by scattering and absorption from air molecules (Rayleigh scattering and absorption from gases such as O 2 , H 2 O, and O 3 ).The method is able to remove these e↵ects along with those of the clouds.

Hyperspectral Imager for the Coastal Ocean (HICO) multi-spectral reflectances
We use reflectance measurements from the US Naval Research Laboratory's (NRL) HICO (data provided by NASA Goddard Space Flight Center, Ocean Ecology This is a preprint and has not been peer reviewed. Laboratory, Ocean Biology Processing Group, 2018).HICO has continuous spectral coverage from approximately 400-1000 nm with a spectral binning of ⇠5.73 nm.It flew on the Japanese Experiment Module-Exposed Facility on the ISS at an altitude of approximately 350 km and at an inclination of 51.6 .HICO incorporates an O↵ner grating-type spectrometer that images in a pushbroom mode.The operations period was from 1 October 2009 through 13 September 2014 with a focus on coastal zones worldwide.The ground sample distance is approximately 90 m in both the cross-and along-track directions.The cross-track instantaneous field-of-view (IFOV) varies with view angle from 83 m at nadir to 182 m at 45 .In the along-track direction, the IFOV is also 83 m at nadir and increases to 120 m at 45 .The camera employs a 512 ⇥ 512 charge-coupled device (CCD).Calibration procedures are described by Lucke et al. (2011) and Gao et al. (2012).
There are a few noted issues with HICO that may a↵ect results shown here.Second-order light from wavelengths between 350 and 540 nm falls into the same pixels as the first-order light between 700 and 1080 nm.Although an empirical correction technique was developed (Li et al., 2012), there may still be errors in a bright cloudy scene such as the one we focus on.Imperfections in the frame smearing correction (Lucke et al., 2011) may also a↵ect results shown here.
Here, we use two sets of observations taken over the northern coast of Florida that include a portion of the Panama City Beach area, Mexico Beach, and the St.Joseph Bay.The first scene is taken over mostly clear skies on day 78 (19 March) of 2011 (scene H2011078211654) and is used as the target or outputs for NN training.The second cloud-covered scene is from 17 days later on day 95 (5 April) (scene H2011095142905) and is used for NN inputs during the training and evaluation.A basic assumption is that the scene has not changed during the 17 day period in between when the two days the observations were made.
Because the two scenes are spatially mismatched, it is necessary to align them for the NN training.This was accomplished using the AUTO ALIGN IMAGES software provided for use in the interactive display language (IDL) by T. Metcalf.With this software, we aligned images of the di↵erence vegetation index (DVI) defined as the NIR minus red reflectance (Tucker, 1979).The DVI provides enhanced contrast for image alignment in cloudy conditions as compared with a single wavelength as will be shown below.
The northernmost portion of the full scene was discarded because some clouds were detected on 19 March in this part of the scene which is taken to be the clear sky reference.In addition, we trimmed o↵ the sides of the aligned images to use rows 20-479 and columns 20-1980 for training.This corresponds to latitudes between 29.59N and 30.45N and longitudes between 85.05W and 86.42W.For the cloudy day (5 April), the VZA range is 11.2-17.6 .For the clear day (March 19), the VZA range is 12.3-18.3such that there was not much relative FOV distortion between the two sets of observations.This is a preprint and has not been peer reviewed.

Methodology
Figure 1 shows a flow diagram of the NN training that reconstructs red, green, blue, and near-infrared surface reflectances (henceforth denoted RGBN recon or R recon , G recon , B recon , and N recon ) from cloudy HICO reflectances ⇢ cld .The approach is similar to that detailed in J21, however with a few minor di↵erences.J21 used collocated nadir-adjusted collocated reflectances derived from a di↵erent sensor, the moderate-resolution imaging spectroradiometer (MODIS), as the target in the NN training developed for GOME-2.The MODIS MCD43 data used were processed over a 16 day window in order to construct surface reflectances adjusted to nadir view (Schaaf et al., 2002;Wang et al., 2018).MCD43 processing includes removal of cloudy data as well as atmospheric correction and quality control.Clear-sky data are then weighted over the 16 day window towards the day of interest.Therefore, MCD43 are somewhat smoothed over a 16 day interval.We were not able to collocate the HICO and the lower resolution MODIS images to a satisfactory level for training.Instead, we use clear sky data HICO taken on another day close in time (8 days apart) to our cloudy day of interest for the target in our NN training.
Surface Lambertian-equivalent reflectances (LERs) at R, G, B, and N wavelengths are derived from the clear sky image on 19 March to be used as the predicted or target variables.The wavelengths used to define R, G, B, and N are 630-690, 520-600, 450-520, and 780-900 nm, respectively, similar to bands on the Landsat 7 Enhanced Thematic Mapper Plus (ETM+).Surface LERs are derived by inverting where I 0 is the radiance contributed by the atmosphere with a black surface, T is direct plus di↵use irradiance reaching the surface converted to an ideal Lambertian reflected radiance in the direction an observer by division by ⇡, then multiplied by the transmittance between surface and top-of-atmosphere, and S b is the di↵use flux reflectivity of the atmosphere for isotropic illumination from below.We compute I 0 , T , and S b using the vector linearized discrete ordinate radiative transfer (VLIDORT) software (Spurr, 2006).Henceforth, R, G, B, and N will refer to LERs at red, green, blue, and NIR wavelengths, respectively.Once trained, the non-linear NN reconstruction of RGBN recon , denoted as f NN , can be described by where the inputs depend on observed reflectance spectra ⇢ cld ( ) along with the optional sun-satellite geometry that can be described by the cosines of the solar zenith, view zenith, and phase angles, ✓ 0 , ✓, and , respectively.As in J21, we reduce the dimensionality of the HICO spectra by performing a principal component analysis (PCA) or eigen-decomposition of a covariance matrix constructed from a large sample of spectra from the cloudy scene on 5 April 2011 (note that we do not subtract a mean spectrum first as is common practice for PCA).We use coe cients of the leading modes as the actual NN This is a preprint and has not been peer reviewed.This is a preprint and has not been peer reviewed.
predictors or pseudo-observations.We also reconstruct each spectrum using the leading mode coe cients as a quality check as described below.While J21 found that 14 leading modes well captured most of the spectral variability in GOME-2 spectra, we find that 12 modes are su cient for the lower spectral resolution HICO spectra in order to represent most of the variability (>99.9987%) in the wavelength range 400 to 1000 nm.
As in J21, we find that training results improve when we remove watercontaminated pixels; the NN would require a large number of samples over many di↵erent sun-satellite geometries to e↵ectively learn the complex angular-spectral dependencies of water surface scattering.Such an e↵ort is beyond the scope of this case study.We use quality control filters for training data similar to those in J21 as follows: 1) to remove observations with optically thick clouds, we filter out pixels with an observed red reflectance in the cloudy scene of > 0.7 (note that this check was not necessary for the cloudy scene in this study as no pixels meet this requirement); 2) to remove water-contaminated observations, we filter out pixels with observed red and NIR reflectances in the clear scene < 0.1; 3) to remove suspect spectra, we check the reconstruction of each spectrum with leading modes and discard if the maximum error in the reconstructed spectrum exceeds the mean error computed over all spectra in the training set by more than 5 .This flagging may identify erroneous observations in spectra, such as excessive amounts of scattered light or other spectral distortions (removes ⇠0.2% of otherwise good pixels).We trimmed o↵ the edges of the scenes, particularly over ocean, to use rows 20-1980 and columns 20 to 479 of the 19 March scene for the training.We also detected a few clouds in the northernmost portion of the target scene on 19 March and eliminate all pixels north of 30.45N.
The total number of samples meeting passing all quality control checks was 180,996.We conduct the training and evaluation using 2 fold cross validation, where two separate trainings were conducted each using half the samples for training and the other half for evaluation.All results shown here are for the independent samples (i.e., not used in the training).Similar statistical results were obtained for the dependent and independent samples, an indication that over-training did not occur.
We employ the same NN architecture as that used by J21, consisting of a three layer feed-forward artificial NN with two hidden layers and 2N nodes in each layer, where N = 15 is the number of inputs (coe cients of 12 leading modes and cosines of 3 angles defining the sun-satellite geometry).For activation functions, we use a soft-sign for the first layer, a logistic (sigmoid) for the second layer, and a bent identity for the third layer.An adaptive moment estimation optimizer minimizes the error function with a learning rate of 0.1.Inputs and outputs are both scaled to produce zero means and unit standard deviations.

Results
A random sample of HICO spectra is shown in Figure 2 for conditions ranging from mostly clear, with lowest reflectances in the visible wavelengths, to highly This is a preprint and has not been peer reviewed.cloudy, with high visible reflectances.Major atmospheric absorption bands are seen including the O 2 A band near ⇠760 nm and O 2 B band near 685 nm.The red edge, characterized by a rapid rise in reflectance between about 685 and 760 nm, is apparent in the land pixels.These pixels can be distinguished from those over water that have generally low NIR reflectances.The e↵ect of Rayleigh scattering that is more prevalent at shorter (bluer) wavelengths is also seen in the spectra as an increase in reflectance from green to blue wavelengths; the surface reflectance is generally higher in the green as compared with the blue.Some defects may be present at the longest and shortest wavelengths in this range.These artifacts are not expected to degrade results as they do not fall within the ranges of the target reflectances.HICO spectra have been adjusted within the processing of the raw data to account for instrument wavelength and response function variations across its swath (also known as the spectral smile e↵ect) (Gao et al., 2012).
Table 1 summarizes the evaluation with statistics comparing RGBN clr and RGBN recon .Variability captured is generally high for R, G, and B at more that 94% for these wavelengths.The r 2 values are a bit lower than those reported for This is a preprint and has not been peer reviewed.As in J21, we conducted training with and without the sun-satellite angles as predictors and found that inclusion of the angles gave noticeable improvement though reasonable results were obtained without the geometry included (r 2 > 0.92 for R, G, and B).Also, as in J21, we found that the errors in the reconstructed reflectances were highly correlated between the di↵erent bands.We list statistics for band di↵erences in Table 1.
Figure 3 shows the 5 April 2011 HICO original all sky RGB image, the target surface RGB image for 19 March, and the reconstructed RGB image for 5 April.Here, we display only a portion of the reconstructed image that shows a gradient in cloudiness with reduced coverage over ocean (rows 1100-1800 and columns 200-479).The coverage corresponds approximately to latitudes between 29.59N and 30.19N and longitudes from 85.14W to 85.77W.The RGB images are selectively scaled using the ScaleModis procedure from the Interactive Display Language (IDL) coyote library (Fanning Software Consulting) based on original code from the MODIS rapid response team.The two images on the right and in the middle were further enhanced using the Mac Preview application with the "auto levels" function to enhance contrast.
High reflectances over the beaches along the coastline are visible in the uppermost portion of the cloudy 5 April image where clouds are optically thin, but disappear in the lower part of the image where clouds are more optically thick.All surface features are obscured in the lower portion of the cloudy image.As will be shown below, the red reflectance in the cloudiest parts of the scene This is a preprint and has not been peer reviewed.This is a preprint and has not been peer reviewed.
(approximately the bottom third of the scene) ranges from about 0.2-0.4.These reflectances correspond to cloud optical thicknesses of the order of 5 (Kujanpää and Kalakoski, 2015).Note that in the center and right panels in this figure and following figures unless otherwise noted, pixels over water surfaces, detected as described above in the 19 March clear sky image, are displayed as black as the training is not optimized for these pixels.The major bright features are fairly well captured in the reconstructed cloudy image, especially in the upper portion of the image where the clouds are more transmissive.
Figure 4 shows a similar set of panels for the DVI (NIR minus red).The DVI is used here because it shows good contrast between vegetated (high DVI) and non-vegetated (low DVI) surfaces.The DVI displays noticeable contrast between land and water scenes even through the optically thick clouds in the lower portion of the image.As in Figure 3, most of the large contrast DVI features are preserved while some of the lower contrast high spatial resolution features are washed out.We found that standard image sharpening software packages were able to restore the general loss of contrast in the lower part of the reconstructed image in Figure 4, but they were not able to restore the fine scale spatial features that are lost in the reconstruction.Note that the DVI error due to clouds is positive in the upper part of the figure where clouds are optically thin but errors are negative in the lower portion of the scene with thicker clouds.As in Figure 3, some of the clear sky spatial features are visible in the cloudy image, while others such as in the lower portion of the image, a↵ected by heavy clouds, are not.
In the cloudy 5 April DVI image (left panel), some moderately low values are seen extending over the ocean.The positive values do not extend in an isotropic manner around all of the coastlines as may be expected if they were due to spatial-spectral scrambling from clouds.Rather, the features over ocean appear in the direction of the image acquisition after land has been encountered.This suggests a possible ghosting e↵ect in the readout of the CCD array.These e↵ects would likely also occur over land and may contribute to spatial smearing.Full scene (including over ocean) clear sky images (19 March) are shown for RGB, red, and blue in Figure 5 and show that the positive DVI features over ocean in the cloudy image of Figure 4 do not correspond to those of ocean color.
Figures 6 and 7 show the same set of three images for the red and blue bands, respectively, along with estimated uncertainties for the reconstructed bands.As in the RGB image, the bright beaches are seen through the clouds in the upper portions of the cloudy images, but are completely obscured in the lower parts of the images.Similar to the set of images in Figures 3 and 4, the high contrast features are captured fairly well, while the low contrast, high spatial features, such as roads are smeared out.Imagery appears very similar in the red and blue bands and there does not appear to be any noticeable smearing in the blue band due to increased Rayleigh scattering.Estimated uncertainties in reconstructed reflectances increase with the value of the surface reflectance similar to results shown in J21.Images in the reconstructed green band display similar features (not shown).This is a preprint and has not been peer reviewed.This is a preprint and has not been peer reviewed.This is a preprint and has not been peer reviewed.This is a preprint and has not been peer reviewed.This is a preprint and has not been peer reviewed.

Discussion
While some of the low contrast, high resolution features such as roads are smeared out, particularly in the lower portion of the reconstructed imagery, it is encouraging that the the higher contrast features are relatively well preserved even when the clouds appear opaque to the human eye.While we used a clear sky image over the same area for training of a neural network, none of the spatial information from the training set entered into the image reconstruction; all information used to recover the spatial-spectral details from the cloudy image came from the observed spectra on the cloudy day as well as the sun-satellite geometry.Additional studies need to be undertaken to determine the range of conditions and instrument performance that is required to achieve image reconstruction with the required accuracy for a particular application.We have expanded on the work of J21 to include NIR reflectance and DVI, demonstrating that recovery of vegetation information is possible with our reconstruction approach.Therefore, applications in precision agriculture may be possible.
In our case study of a cloudy image where optically thin clouds cover the upper part of the scene transitioning to optically thick clouds (that would appear white to the human eye) in the lower portion, we demonstrate how a hyperspectral data can be used to reconstruct much of the spatial-spectral detail that is obscured or distorted by the clouds.However, some of the highest spatial resolution features with low contrast are lost.We may attribute at least some of this loss to spectral-spatial scrambling of the observations within the optically thick clouded portion of the scene that would be expected to occur particularly in the presence of liquid water clouds.Terra MODIS data on that data indicate water clouds in the vicinity, although the Terra overpass time did not perfectly coincide (our scene was on the eastern part of the MODIS swath).Ice clouds and aerosols produce more forward scattering as compared with water clouds and therefore may produce less spectral-spatial scrambling and blurring in a reconstructed image.Our results in thin clouds shows excellent reconstruction of spectral and high spatial spatial details.However, close inspection of our results reveals that instrumental e↵ects may also play a significant role in the blurring of the reconstructed image.
We note that the clear sky image was taken at 21:16 UTC or 17:16 EST which is later in the day than the cloudy image taken at 14:29 UTC or 10:29 EDT.Unlike the work of J21 that trained on many di↵erent viewing conditions with a nadir adjusted target reflectance, here we essentially neglect surface BRDF e↵ects in the target data.Therefore, shadowing in the clear sky image may have degraded the fitting results somewhat.
This study focuses on land surfaces with a very limited training data set.More training data over a wide range of angles and conditions would be needed to e↵ectively reconstruct imagery over the ocean surface with our methodology.We plan to address ocean applications in future works.We also plan to apply our approach to other hyperspectral imagers.The techniques developed here provide capability to do remote sensing in cloudy conditions and increase the data coverage and timeliness of products from these sensors for a wide variety of This is a preprint and has not been peer reviewed.

Figure 1 :
Figure 1: Flow diagram showing how surface reflectances at red (R), green (G), blue (B), and near-infrared (N) wavelengths are reconstructructed and their uncertainties are estimated with a neural network (NN) using multi-spectral reflectance measurements from HICO.

Figure 2 :
Figure 2: Random sampling of HICO reflectance spectra (random colors and line-types) on 5 April 2011 with major atmospheric atmospheric absorption bands labeled above.

Figure 3 :
Figure 3: RGB imagery from HICO: Left: Original image from 5 April 2011 over the northern coast of Florida including the St.Joseph Peninsula and St.Joseph Bay. Middle: Aligned image from 19 March 2011; Right: Reconstructed surface RGB using 5 April HICO spectra.Pixels over water in middle and right panels are displayed as black.See text for description of processing used to produce these images.

Figure 4 :
Figure 4: Similar to Figure 3 but for the DVI (NIR minus red): Left: Original image from cloudy 5 April 2011; Middle: Original image from clear day 19 March; Right: Reconstructed image from 5 April.Pixels over water in middle and right panels are displayed as black.

Figure 5 :
Figure 5: Original imagery from the clear day 19 March 2011 (ocean not blacked out); Left: RGB; Middle: Red band; Right: Blue band.

Figure 6 :
Figure 6: Similar to Figure 3 but for the Red band: Far Left: Original image from cloudy 5 April 2011; Middle left: Original image from clear day 19 March; Middle Right: Reconstructed image from 5 April; Far Right: Estimated uncertainties in reconstructed reflectances.Pixels over water in middle and right panels are displayed as black.

Figure 7 :
Figure 7: Similar to Figure 6 but for the Blue band: Far Left: Original image from cloudy 5 April 2011; Middle Left: Original image from clear day 19 March; Middle Right: Reconstructed image from 5 April; Far Right: Estimated uncertainties in reconstructed reflectances.Pixels over water in middle and right panels are displayed as black.

Table 1 :
Statistical comparison of reconstructed red (R), green (G), blue (B), and NIR (N) bands (RGBN recon ) and band di↵erences with independent (not used in training) data points (180,996 in total) from the clear sky day (RGBN clr ).Statistics include the root mean squared di↵erence (RMSD), bias (mean of RGBN recon -RGBN clr ), and variance explained (r 2 ).
97, and 0.95 for GOME-2 reconstructed R, G, B, respectively).However, RMSD values were higher for R, G, and B for GOME-2/MODIS as compared with HICO which may result from higher values of reflectances contained in the global sample for GOME-2, particularly the inclusion of deserts.Note that GOME-2/MODIS r 2 values decreased from red to blue wavelengths whereas the opposite behavior is seen for HICO.