Regional analysis of climate projections using bias-corrected, spatially disaggregated super-resolution convolutional neural networks

Climate change has crucial implications for ecological systems and society, but global climate models (GCMs) run at a coarse spatial resolution that makes regional analysis difficult. Regional-scale projections can be obtained through statistical downscaling, which uses historical data to learn the mapping between low-resolution and high-resolution fields. Statistical downscaling methods fall into two broad groups: 1) conventional methods and 2) deep learning architectures. Some existing works, such as DeepSD, have produced high-resolution climate projections, but GCM data suffers from concept drift: the mapping between input and label changes over time. Applying such deep learning models directly is therefore problematic for statistical downscaling. In this study, I develop a new downscaling approach that outperforms other deep learning architectures, such as the super-resolution convolutional neural network (SRCNN) and the long short-term memory network (LSTM), in terms of accuracy and reliability. These existing models focus on minimizing the root mean square error (RMSE) and neglect the tails, or extremes, of the distribution; their objective function should therefore go beyond RMSE. My proposed model targets both means and extremes. I compare the proposed model with other existing deep learning models in downscaling daily precipitation and temperature from 1.25° to 0.25° resolution over India, downscaling six GCMs in the comparative study.


Introduction
Climate change has dangerous effects on society, leading to extreme precipitation and temperature events. Natural resources are highly sensitive to these extremes, which can cause droughts, floods, and other disasters. Earth system models simulate climate change, but these physics-based models predict atmospheric variables only on a very large scale, at a grid spacing of about 125x125 km [4]. For regional analysis of these variables, the GCM data must be downscaled to a resolution of about 25x25 km [3]. Downscaling is broadly of two types: statistical and dynamical [1]. Dynamical downscaling uses physics-based models that run on a regional scale and is computationally expensive [2]. In contrast, statistical downscaling finds the relationship between observed small-scale variables and larger-scale variables using methods such as artificial neural networks, support vector machines, or linear regression. However, these methods ignore spatial correlations, and other existing deep learning models such as SRCNN and LSTM do not perform well for statistical downscaling due to concept drift in GCM data.
SRCNN is used in computer vision for single image super-resolution. It minimizes the mean difference (RMSE) between high-resolution and low-resolution images. BCSD (bias-correction spatial disaggregation) is a state-of-the-art technique for statistical downscaling that reproduces the statistical distribution by quantile mapping between GCM data and observations at each individual grid point.
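The bias-correction step of BCSD can be illustrated with a minimal sketch of per-grid-point quantile mapping. The function name and toy data below are my own illustration, not code from this study: each GCM value is placed on the empirical CDF of the historical GCM distribution, and the observed CDF is inverted at that quantile.

```python
import numpy as np

def quantile_map(gcm, obs, gcm_future):
    """Map future GCM values onto the observed distribution by matching
    empirical quantiles (the bias-correction step of BCSD).
    gcm, obs: 1-D arrays of historical values at one grid point."""
    # Empirical CDF position of each future value within historical GCM data
    ranks = np.searchsorted(np.sort(gcm), gcm_future) / len(gcm)
    ranks = np.clip(ranks, 0.0, 1.0)
    # Invert the observed CDF at those quantiles
    return np.quantile(obs, ranks)

# Toy demo: a model climate that runs systematically 2 units too warm
rng = np.random.default_rng(0)
obs = rng.normal(25.0, 3.0, 5000)   # "observed" temperatures
gcm = obs + 2.0                     # biased model climate
corrected = quantile_map(gcm, obs, gcm[:100])
```

In full BCSD this correction is applied at every grid point and followed by a spatial disaggregation step; the sketch covers only the quantile-mapping idea.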
My proposed model, BCSRCNN, combines BCSD and SRCNN to perform statistical downscaling: it does not limit itself to minimizing mean error but also captures the extremes. I have downscaled low-resolution climate projections into high-resolution projections over India for six global climate models (CESM-CAM5, NOR-ESM, MIROC, MPI, BNU-ESM, GFDL). In the proposed model, BCSD is applied first, and then the weights of an auto-encoded SRCNN are used to predict the high-resolution output. BCSD handles the statistics of the data, while SRCNN captures the spatial correlations and the distribution of errors. The data used to train and validate the downscaling methods include observed high-resolution precipitation data and GCM output.
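The SRCNN stage can be sketched as the standard three-layer convolutional stack (patch extraction, non-linear mapping, reconstruction) with the original 9-1-5 kernel sizes. The weights below are random placeholders, not trained values, and the pure-numpy convolution is only for illustration; a real implementation would use a deep learning framework.

```python
import numpy as np

def conv2d(x, w, b):
    """'Same' 2-D convolution: x is (H, W, Cin), w is (kh, kw, Cin, Cout)."""
    kh, kw, cin, cout = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, cout))
    for i in range(H):
        for j in range(W):
            # Contract the kh x kw x Cin patch against all output filters
            out[i, j] = np.tensordot(xp[i:i+kh, j:j+kw], w, axes=3) + b
    return out

def srcnn_forward(img, params):
    """SRCNN: patch extraction -> non-linear mapping -> reconstruction."""
    h = np.maximum(conv2d(img, *params[0]), 0.0)   # 9x9 conv, ReLU
    h = np.maximum(conv2d(h, *params[1]), 0.0)     # 1x1 conv, ReLU
    return conv2d(h, *params[2])                   # 5x5 conv, linear

rng = np.random.default_rng(1)
def layer(kh, kw, cin, cout):
    return rng.normal(0, 0.05, (kh, kw, cin, cout)), np.zeros(cout)

params = [layer(9, 9, 2, 64), layer(1, 1, 64, 32), layer(5, 5, 32, 1)]
# Interpolated precipitation plus elevation as the two input channels
lr_input = rng.random((32, 32, 2))
hr_output = srcnn_forward(lr_input, params)        # shape (32, 32, 1)
```

In BCSRCNN, the input to this stack would be the bias-corrected, interpolated field rather than the raw GCM field.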

Related Work
Ahmed et al. observed that statistical and dynamical downscaling perform almost equally, with negligible difference, over a small region of a GCM, which encourages the choice of statistical over dynamical downscaling [6][7][8]. However, statistical downscaling approaches are built on the assumption that the statistical relationship between GCM output and observations will remain the same in future data [sachindra pap]. Statistical downscaling approaches are generally divided into three major categories: weather classification, weather generators, and regression-based approaches [9][10][11][12][13][14][15][16][17][18][19]. My study focuses on regression-based approaches.
The literature shows that, irrespective of the machine learning or deep learning model used, these models simulate the average (mean) well but underestimate the tails and the standard deviation [60][61][62]. Such downscaling models overfit the trend of the lower percentiles and underfit the trend of the higher percentiles [68]. Yet climate-related natural calamities are extremes, which occur at the higher percentiles.
Even though machine learning has been applied to statistical downscaling in past studies, those studies lack a thorough evaluation of the models developed, because the majority used only RMSE as their metric. Since the mean resides in the lower percentiles, which these models overfit, RMSE alone is not a sufficient metric to evaluate them [63][64][65][66][67].
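The point can be made concrete with a toy comparison: a model that predicts only the climatological mean achieves a modest RMSE on skewed, rain-like data while missing the tail entirely. The metric names here are my own illustration; a tail-aware diagnostic such as the bias at the 99th percentile exposes what RMSE hides.

```python
import numpy as np

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def tail_bias(pred, obs, q=0.99):
    """Difference between predicted and observed q-th percentiles;
    negative values mean the model underestimates extremes."""
    return float(np.quantile(pred, q) - np.quantile(obs, q))

rng = np.random.default_rng(2)
obs = rng.gamma(shape=0.5, scale=8.0, size=20000)   # skewed, rain-like
smooth = np.full_like(obs, obs.mean())              # predicts the mean only

print(rmse(smooth, obs))        # modest RMSE despite useless extremes
print(tail_bias(smooth, obs))   # strongly negative: the tail is missed
```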

Methodology

Data Pre-processing
Data for a single day at the coarse GCM resolution of 1.25° is an "image" of size 25x27.
Precipitation and elevation are used as input channels while precipitation is the sole output.
Images at each resolution are obtained by resampling with bicubic interpolation. For instance, interpolating from 1.25° to 0.25° increases the image size from 25x27 to 129x135, matching the resolution of the observed data. This interpolated image is given as input to all models. Data pre-processing is the same across all methods.
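The resampling step above can be sketched as follows, assuming SciPy is available. `scipy.ndimage.zoom` with `order=3` performs cubic spline interpolation, which is close in spirit to the bicubic interpolation described, though not identical to any specific implementation used in this study.

```python
import numpy as np
from scipy.ndimage import zoom

# One day of GCM precipitation at 1.25 deg: a 25x27 grid
coarse = np.random.default_rng(3).random((25, 27))

# Resample to the 0.25 deg observation grid (129x135).
# order=3 selects cubic spline interpolation.
fine = zoom(coarse, (129 / 25, 135 / 27), order=3)
print(fine.shape)   # (129, 135)
```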

Long short-term memory network
LSTM is a special type of recurrent neural network [70] well suited to temporally related data. LSTM introduces a memory cell, which acts as an accumulator to learn long-term dependencies in a time series. The cell is self-connected and copies its own real-valued state. The memory cell contains three gates: an input gate, an output gate, and a forget gate. These gates indicate how much information should be passed to the next state and how much should be forgotten; LSTM therefore preserves long-term dependencies without vanishing gradients. The formulation of the LSTM cell is as follows:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c · [h_{t-1}, x_t] + b_c)
h_t = o_t ⊙ tanh(c_t)

Here f is the forget gate, i is the input gate, o is the output gate, c is the cell memory, and h is the previous hidden state. The output from the last ConvLSTM2D layer is given as input to a Conv2D layer with one filter of size 3x3.
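The gate equations above can be sketched as a single numpy step; the weights below are random placeholders, and the fused weight matrix `W` (all four gates stacked) is an implementation convenience, not notation from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: W maps [h_prev, x] to the four gate pre-activations."""
    z = np.concatenate([h_prev, x]) @ W + b   # (4 * hidden,)
    hid = h_prev.size
    f = sigmoid(z[:hid])                      # forget gate
    i = sigmoid(z[hid:2 * hid])               # input gate
    o = sigmoid(z[2 * hid:3 * hid])           # output gate
    g = np.tanh(z[3 * hid:])                  # candidate cell update
    c = f * c_prev + i * g                    # new cell memory
    h = o * np.tanh(c)                        # new hidden state
    return h, c

rng = np.random.default_rng(4)
n_in, n_hid = 3, 5
W = rng.normal(0, 0.1, (n_hid + n_in, 4 * n_hid))
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for t in range(10):                           # unroll over a short series
    h, c = lstm_step(rng.random(n_in), h, c, W, b)
```

ConvLSTM2D applies the same gating, but with convolutions in place of the matrix products so that spatial structure is preserved.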

Concept drift
In my study, I use GCM daily precipitation as input to the SRCNN. However, raw GCM output and observed data have no day-to-day correlation, and the mapping between raw GCM output and observations changes over time; this is called concept drift.
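Concept drift can be illustrated on synthetic data: fit the GCM-to-observation mapping on an early period and a late period, and the learned relationship differs. The drifting slope here is fabricated purely to demonstrate the effect, not derived from any GCM.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(10000)
gcm = rng.random(10000) * 10.0
# Synthetic "observed" series whose relationship to the GCM input
# shifts over time: the true slope drifts from 1.0 toward 1.5.
obs = gcm * (1.0 + 0.5 * t / t.size) + rng.normal(0, 0.1, t.size)

def slope(x, y):
    """Slope of a least-squares linear fit of y on x."""
    return float(np.polyfit(x, y, 1)[0])

early = slope(gcm[:5000], obs[:5000])
late = slope(gcm[5000:], obs[5000:])
print(early, late)   # the mapping learned early no longer holds later
```

A model trained only on the early period would systematically misestimate the later period, which is why fixed input-output mappings are fragile under drift.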

Auto-encoded SRCNN
Data for a single day at the highest resolution, 0.25°, is an "image" of size 129x135. Models trained only on RMSE fit the mean values and therefore do not capture extremes, but BCSRCNN and AUBCSRCNN perform well even at the higher percentiles.