Evaluation of satellite precipitation products for water allocation studies in the Sio-Malaba-Malakisi River Basin of East Africa

Study region: Sio-Malaba-Malakisi River Basin, East Africa. Study focus: Poor rain-gauge density limits comprehensive hydrological studies in Sub-Saharan Africa. Satellite precipitation products (SPPs) therefore provide an alternative source of data for possible use in hydrological modelling. However, their reliability needs to be tested across varied hydro-climatic and physiographic conditions to understand their applicability. In this study, we evaluated and compared the Tropical Rainfall Measuring Mission (TRMM-3B42 v7), Climate Hazards Group Infrared Precipitation with Station data (CHIRPS v2.0), Multi-Source Weighted-Ensemble Precipitation (MSWEP v2.2), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) and Tropical Applications of Meteorology using Satellite data (TAMSAT) products against gauge observations, for possible use in water allocation studies. Furthermore, the Continuous Semi-distributed Runoff (COSERO) model was adapted using the SPPs and applied to generate discharges, which were cross-compared with observed discharges. New hydrological insights for the region: Our results indicate that the SPPs are able to detect seasonal rainfall patterns throughout the basin. At lower altitudes, the products overestimated rainfall events, as indicated by the performance measures. The COSERO results indicate that PERSIANN-CDR and MSWEPv2.2 overcompensated and underestimated discharge throughout the basin. This could be attributed to differences in the temporal dynamics of the products. Overall, the seasonal trends captured by the SPPs can be used to support catchment management efforts in data-scarce regions.


INTRODUCTION
Rain gauges have been the mainstay of global rainfall measurements for a very long time.
Inadequate rain gauge density means that the distribution of rainfall cannot be well captured, considering its high variability in time and space (Ouma et  In ungauged basins, studies have shown that satellite rainfall products can be used as an alternative source of meteorological data important for river basin management studies (Blöschl et  . In relatively smaller transboundary basins, however, not much has been done, partly because of the varied national basin management plans that govern observed data sharing and acquisition, which limits the requisite validation of satellite-based rainfall products. In a composite study over East Africa, Kimani et al. (2017) noted that most satellite precipitation products generally replicate the rainfall patterns, but with observable differences when it comes to reproducing the characteristics of orographic rainfall events. The precipitation products significantly overestimated rainfall amounts in the mountainous areas of East Africa.
Similarly, the performance of Climate Hazards Group Infrared Precipitation with station data (CHIRPS) and Tropical Application of Meteorology using Satellite data (TAMSAT) over Eastern Africa has been found to be varied, and weaker in the coastal and mountainous regions (Dinku et al. 2018). According to Li et al. (2009), the results for some of the products, such as the TRMM-based Multi-Satellite Precipitation Analysis (TMPA), can be much improved by using systematically bias-corrected TMPA rainfall data.
In the literature, the majority of studies have focused on understanding the performance of TRMM and CHIRPS products in the region. This is partly because they are quasi-global products with relatively high spatial and temporal resolution, with uninterrupted and uniformly distributed precipitation estimates for the tropical region (Huffman et al. 2007; Ngoma et al. 2021). Also, the focus of the studies has been on understanding drought and flood events, which are adequately captured by CHIRPS and TRMM products (Ayugi et al. 2019). Few studies have focused on other products such as TAMSAT, the Multi-Source Weighted Ensemble Precipitation (MSWEP) and, to some extent, the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), which are considered relatively novel and have been improved over the years. This study incorporated such products, with a view to understanding their performance and applicability for extended use in hydrological modelling in a mesoscale tropical river basin.
Several approaches have been employed to establish the reliability of satellite precipitation products as an alternative source of data (Knapp et  In this study, the two approaches were tested separately with a view to better understanding the satellite precipitation products for possible use in water allocation studies within the Sio-Malaba-Malakisi River Basin of East Africa. The statistical measures were first applied through direct comparison of the rainfall datasets; subsequently, the indirect model approach was used to understand the discharge variations in select areas with observed data. The evaluation was important to discern products that can be used locally for enhanced catchment management in the transboundary river basin spanning Western Kenya and Eastern Uganda in East Africa.

The Study Area
The study area is the Sio-Malaba-Malakisi river basin (SMMRB), a shared basin located between longitudes 33.7° and 34.7° E and latitudes 0.1° and 1.2° N on the western Kenya border region with eastern Uganda. In total, 5,000 km² are drained by three main rivers: the Sio River, which flows from Mt. Elgon into Lake Victoria in the south, and the Malakisi and Malaba rivers, which flow into Lake Kyoga and its wetlands in the south-west. The drainage pattern of the main stream, Sio River, is dendritic with a high drainage density. At its northernmost point, the elevation of the SMMRB rises with Mt. Elgon to an altitude of 4320 meters above sea level (masl), while it merges with Lakes Victoria and Kyoga at altitudes of 1135 and 1033 masl respectively (Fig 1). The basin is nestled between these two striking topographic features, which play an important role in the hydrology of the region. Owing to these two features, the area experiences two types of rainfall: orographic rainfall, which occurs around Mt. Elgon, and convectional rainfall near Lake Victoria. The rainfall pattern is normally bimodal, with long rains between March and June (MAM), and the short rains between

Gauge-based datasets
Historical 10-day aggregates of observed precipitation time series were obtained from the Kenya Meteorological Department (KMD). A data quality assessment was applied based on the completeness of each record (at least 60% complete), the difference in values from, and proximity to, the nearest neighbors, and a check for systematic errors and extreme values. In this manner, only 9 out of 39 gauging stations were selected and a subset period of 1998-2016 was chosen. The rainfall stations in the SMMRB tend to be located in densely populated areas, specifically in schools and government premises; they are also located in areas of intense agriculture and near forest stations (Fig 1). The acquired data served as the reference for the evaluation of the downloaded satellite precipitation estimates. Additionally, daily streamflow records from hydrological stations in the SMMRB were acquired from the Water Regulatory Authority (WRA) in Kenya for the period 1981 to 2016 (Fig 1). The time series have extended periods of missing values due to measurement challenges, faulty instruments and limited reporting of measured values. Hydro-meteorological data acquisition in Uganda unfortunately proved to be an unresolvable challenge. Table 1 and Table 2 summarise the attributes of the rainfall and hydrological stations used in the study, respectively.
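The completeness criterion described above can be illustrated with a minimal sketch. It covers only the 60% completeness threshold, not the neighbor comparison or the extreme-value checks, and the station names, values and function names are hypothetical rather than taken from the study.

```python
import math

def completeness(series):
    """Fraction of non-missing values in a series (None or NaN counts as missing)."""
    if not series:
        return 0.0
    valid = sum(1 for v in series if v is not None and not math.isnan(v))
    return valid / len(series)

def screen_stations(records, min_complete=0.60):
    """Keep only stations whose record meets the completeness threshold."""
    return {name: values for name, values in records.items()
            if completeness(values) >= min_complete}

# Hypothetical dekadal totals (mm); None marks a missing dekad.
records = {
    "station_a": [12.0, 0.0, 35.5, None, 20.1],  # 4 of 5 values present
    "station_b": [None, None, 8.2, None, 4.0],   # 2 of 5 values present
}
kept = screen_stations(records)
```

Here only `station_a` would pass the threshold; in practice the threshold would be applied to the full 1998-2016 record of each gauge.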

Satellite Precipitation Products (SPPs)
Satellite precipitation products are based on data derived from various sensors and satellites.
They can also include other data sources such as ground radar, gauge networks, or output from models and reanalyses. In this study, five satellite precipitation products were chosen because of their diversity, high spatial resolution, coverage domain, and periods of availability (Maidment et al. 2017). Additionally, the suitability of the products to capture precipitation patterns and extremes over a tropical region was considered. The product attributes are described in Table 3.

CHIRPSv2
The CHIRPS dataset is a composite of five different data inputs that include in situ rainfall measurements, pentadal precipitation climatology, quasi-global geostationary thermal infrared

TAMSATv2
The dataset used in this study is derived from the TAMSAT version 2.0 dekadal (10 day GridSat-B1 infrared window data. The resulting estimates are then bias-corrected using the Global Precipitation Climatology Project (GPCP) 2.5° monthly precipitation product (Liu et al. 2017). This product has been specifically developed for climate and variability studies.

2.4. Evaluation of satellite products
Our main focus was to evaluate the performance of the five satellite precipitation products by comparing their estimates to station observations across the SMMRB. The comparison was done at both the monthly and annual time steps. The products were then used as input to a hydrological model to compare the simulated discharge with observed discharge, in an indirect comparison approach. This section describes the methodology applied.
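Because the gauge records are 10-day (dekadal) aggregates while the comparison is made at monthly and annual steps, some aggregation is implied. The sketch below shows one possible way to do it, assuming three dekads per calendar month and assuming that a month or year with any missing value is reported as missing rather than as a partial sum. Both the record layout and the missing-data policy are assumptions, not the procedure documented in the study.

```python
from collections import defaultdict

def _sum_if_complete(values, expected):
    """Sum the values only if all expected entries are present and non-missing."""
    if len(values) == expected and all(v is not None for v in values):
        return sum(values)
    return None

def to_monthly_and_annual(dekads):
    """Aggregate (year, month, dekad, total_mm) records to monthly and annual sums."""
    by_month = defaultdict(list)
    for year, month, _dekad, total_mm in dekads:
        by_month[(year, month)].append(total_mm)
    monthly = {key: _sum_if_complete(vals, 3) for key, vals in by_month.items()}

    by_year = defaultdict(list)
    for (year, _month), total_mm in monthly.items():
        by_year[year].append(total_mm)
    annual = {year: _sum_if_complete(vals, 12) for year, vals in by_year.items()}
    return monthly, annual

# Hypothetical dekadal totals (mm) for two months of one year.
dekads = [
    (2000, 3, 1, 10.0), (2000, 3, 2, 20.0), (2000, 3, 3, 30.0),
    (2000, 4, 1, 5.0), (2000, 4, 2, None), (2000, 4, 3, 15.0),
]
monthly, annual = to_monthly_and_annual(dekads)
```

With this policy, March 2000 sums to 60 mm, while April 2000 and the year 2000 are flagged as missing, so gaps in the gauge record do not appear as artificially low totals.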

The Approach employed
Point values from the pixels of each satellite product were georeferenced to the gauge locations and extracted from the raster files downloaded from the data access sites (Table 3).
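A point-to-pixel extraction of this kind can be sketched as a nearest-cell lookup on a regular latitude-longitude grid. The grid spacing, coordinates and values below are hypothetical, and the study's actual extraction tool is not specified in the text, so this is only an illustration of the idea.

```python
def nearest_index(coords, value):
    """Index of the grid-cell centre closest to the given coordinate."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))

def extract_at_gauge(grid, lats, lons, gauge_lat, gauge_lon):
    """Value of the pixel whose centre is nearest to the gauge.

    grid[i][j] holds the estimate for the cell centred at (lats[i], lons[j]).
    """
    i = nearest_index(lats, gauge_lat)
    j = nearest_index(lons, gauge_lon)
    return grid[i][j]

# Hypothetical 0.25-degree grid of monthly totals (mm).
lats = [0.125, 0.375, 0.625]
lons = [34.125, 34.375, 34.625]
grid = [
    [10.0, 11.0, 12.0],
    [20.0, 21.0, 22.0],
    [30.0, 31.0, 32.0],
]
value = extract_at_gauge(grid, lats, lons, gauge_lat=0.46, gauge_lon=34.10)
```

A nearest-cell lookup compares a point measurement with an areal average, which is one reason gauge and pixel values can differ even when the product performs well.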

Statistical measurements
Various statistical measures were applied to compare the performance of the satellite precipitation products with gauge observations (Table 4). The following validation measures were used: correlation coefficient (CC), mean error (ME), mean absolute error (MAE), root-mean-square error (RMSE), and percent bias (R_bias). These are calculated as shown in Table 4, Eq. (1) to Eq.  , where P_est is the satellite precipitation estimate, P_obs is the observed rain gauge precipitation, an overbar denotes the mean, and n is the number of samples considered.
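Since Table 4 is not reproduced here, the following sketch uses the commonly used textbook forms of these five measures. The sign convention (estimate minus observation) and the expression of percent bias relative to the observed total are assumptions, and the sample values are hypothetical.

```python
import math

def evaluation_metrics(p_est, p_obs):
    """Return CC, ME, MAE, RMSE and percent bias for paired estimates and observations."""
    n = len(p_obs)
    if n == 0 or n != len(p_est):
        raise ValueError("p_est and p_obs must be non-empty and of equal length")
    mean_est = sum(p_est) / n
    mean_obs = sum(p_obs) / n
    diffs = [e - o for e, o in zip(p_est, p_obs)]  # assumed sign: estimate minus observation

    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(p_est, p_obs))
    var_est = sum((e - mean_est) ** 2 for e in p_est)
    var_obs = sum((o - mean_obs) ** 2 for o in p_obs)

    return {
        "CC": cov / math.sqrt(var_est * var_obs),
        "ME": sum(diffs) / n,
        "MAE": sum(abs(d) for d in diffs) / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "PBIAS": 100.0 * sum(diffs) / sum(p_obs),  # percent of the observed total
    }

# Hypothetical monthly totals (mm) at one gauge.
p_obs = [100.0, 150.0, 200.0, 50.0]
p_est = [110.0, 140.0, 230.0, 60.0]
metrics = evaluation_metrics(p_est, p_obs)
```

For this sample, ME is 10 mm, MAE is 15 mm and the percent bias is 8%, so a positive ME or percent bias indicates overestimation under the assumed sign convention.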

Hydrological Evaluation
The hydrological simulations were performed using the conceptual rainfall-runoff model Given the limited availability of spatially distributed data in the SMMRB, the model was set up in a lumped manner, with the SMMRB being divided into ten sub-catchments (Fig. 1, Table 1  for calibration in the SMMRB. The calibration procedure exposed the poor quality of the observed discharge data, with the many gaps weakening the performance of certain products (Fig 2). To evaluate the model performance, the following objective functions were applied.
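The results are later reported in terms of the Nash-Sutcliffe efficiency (NSE) and a bias measure. A minimal sketch of both is given here, using the standard NSE definition and assuming a relative volume bias of the form (simulated minus observed) divided by observed. The exact objective functions used with COSERO are not stated in the text, so the bias form, the handling of gaps and the sample values are assumptions.

```python
def _paired(sim, obs):
    """Keep only time steps with an observed value (gaps are marked as None)."""
    return [(s, o) for s, o in zip(sim, obs) if o is not None]

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, values below 0 are worse than the observed mean."""
    pairs = _paired(sim, obs)
    mean_obs = sum(o for _, o in pairs) / len(pairs)
    num = sum((o - s) ** 2 for s, o in pairs)
    den = sum((o - mean_obs) ** 2 for _, o in pairs)
    return 1.0 - num / den

def relative_bias(sim, obs):
    """Relative volume bias: positive when simulated discharge exceeds observed discharge."""
    pairs = _paired(sim, obs)
    total_obs = sum(o for _, o in pairs)
    return (sum(s for s, _ in pairs) - total_obs) / total_obs

# Hypothetical daily discharge (m3/s); the second observation is missing.
obs = [2.0, None, 4.0, 6.0]
sim = [2.5, 3.0, 3.5, 6.6]
nse_value = nse(sim, obs)
bias_value = relative_bias(sim, obs)
```

Skipping time steps without observations keeps the gaps in the discharge record from entering the scores, although long gaps still reduce how representative the resulting values are.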

Station distribution
To get a better understanding of the performance of the satellite precipitation estimates against the observed precipitation, we first briefly explore the status of the observed precipitation and the station distribution. The observed precipitation indicates a mixed but related rainfall distribution across the basin. There are clear similarities in rainfall amounts and distribution among stations that are close to each other and lie at a similar altitude (Table 1, Fig 1). For example, Busia (1228 masl) is close to Nambale (1234 masl), Bungoma (1427 masl) is close to Nzoia (1490 masl) and Webuye (1562 masl) is close to Lugari (1673 masl); hence each pair depicts an almost similar rainfall distribution (Fig 3). Busia and Nambale lie in the middle of the basin and depict a larger distribution of rainfall, which could be an effect of their equidistance from two key natural features, Mt. Elgon and Lake Victoria, whose different rainfall regimes (orographic and convectional) contribute to increased rainfall amounts in the middle of the basin.

Performance against station observations -Monthly time scale
The performance of the satellite products in comparison to the station observations is evaluated over the SMMRB. The low rainfall amounts observed in the lower-altitude region contrast with the high amounts in the rest of the basin (Fig 4 & Table 1). The scatter plot in Fig 5 presents the evaluation of the five satellite precipitation products against rain gauge measurements at the monthly time-scale. All the products correctly represent the spatial distribution of rainfall. However, PERSIANN-CDR has a wider scatter than the other products and also tends to overestimate station rainfall amounts. MSWEPv2.2 underestimates some of the rainfall amounts. CHIRPSv2, MSWEPv2.2 and TAMSATv2 rainfall fields have less scatter beyond 300 mm per month. In addition, the differences in monthly average precipitation grow as the precipitation amount increases. In a correlation matrix (Table 7)

Evaluation at an annual time-scale
At the annual scale, TRMM-3B42 underestimates the rainfall events with a correlation coefficient of 0.60 and a Pbias of 36%, while PERSIANN-CDR overestimates the rainfall events with a correlation coefficient of 0.76 and a Pbias of 62% (Fig 6, Table 8   In the scatter-plots of the annual sums (Fig 7), PERSIANN-CDR overestimates most of the rainfall amounts with a wider scatter compared to the other products. TRMM-3B42, TAMSATv2 and CHIRPSv2 also overestimate rainfall amounts but with less scatter than PERSIANN-CDR. However, it is also evident that MSWEPv2.2 tends to underestimate rainfall at the annual time-step, similar to the underestimation seen in the monthly evaluation.  overlapping values during the calibration and validation process (Fig 11, 12 and 13).

Bias performance evaluation
The products registered varied bias performance across the sub-basins. The magnitude of the bias often fell below 0.35, which indicates good model performance. However, in basin 8, PERSIANN-CDR registers an extreme bias of 10 in the validation period, an indication of deteriorating model performance (Fig 16). In basin 2, MSWEPv2.2 exhibits a smaller bias of less than 0.05 in both calibration and validation. TRMM-3B42 and TAMSATv2 both register a bias of -0.15 in the same basin (Fig 14). In basin 3, MSWEPv2.2 registers the largest bias at 0.32 and hence the weakest performance of the five products. The product with the smallest bias in basin 3 is TRMM-3B42, with a negative bias of magnitude below 0.05 in both calibration and validation (Fig 15).

NSE performance evaluation
The NSE results for basin 8 indicate a performance of 0.56 to 0.73 for all the products except PERSIANN-CDR, which posted an NSE value of -1.19 (Fig 17). However, the NSE performance was poor for both basins 2 and 3 (Fig 17, 18). To better understand the performance of the products in the SMMRB, we rely on the NSE performance from basin 8 (Fig 19), which had fairly complete observed discharge data. In that regard, MSWEPv2.2, CHIRPSv2, TRMM-3B42 and TAMSATv2 exhibit decreasing performance, in that order.
The next steps would be to decide whether and how the satellite data should be further processed so that they can be used in conjunction with the gauge data. Preliminary results suggest that they can be used, but with slight modifications (Dinku et al. 2018). The real issue is therefore to determine whether it is possible to configure a locally robust calibration blueprint that can be applied to the satellite data to ensure that they are compatible with the available gauge data and a calibrated rainfall-runoff model. In this regard, a multi-objective calibration process might be useful for the study area.
The CMORPHv1 dataset was found to have systematic errors in its rainfall estimates over the study area at the daily time-scale; these were compounded in the monthly and yearly aggregations, and the product was therefore eliminated from the final analysis in this study.

CONCLUSIONS
1. All the products were able to replicate rainfall patterns in space and time, but showed systematic errors in rainfall retrieval that decreased with an increase in rainfall amounts. The systematic errors were mainly underestimations and showed seasonality, as they were larger during the OND rainy season than during the MAM rainy season. The errors were more evident at the monthly timescale and decreased at the yearly timescale.
2. Products' input data affected their performances in rainfall retrieval. Products using multiple sensors performed better than those with single sensors, especially if the sensors were on different platforms. This increased their ability to retrieve different types of rainfall over SMMRB. This mainly affected TAMSATv2 and PERSIANN-CDR, which use only infrared sensors. A single sensor (infrared) tends to limit rainfall retrieval ability of different rainfall regimes. The distribution of the rain gauges used in calibration also affects their performance, and thus there is a need to regularly update the algorithms with denser rain gauge data where applicable. This affects the way each product varies in performance from region to region.
3. The satellite products considered are therefore applicable over SMMRB, but errors in high altitude areas need to be considered during the OND season, especially for products using only infrared sensors. To reduce orographic effects, elevation and wind direction data are recommended to be included as input data in the development of algorithms to improve the accuracy of orographic rainfall retrieval.
4. The COSERO hydrological model performs well in the middle of the basin with most of the products. However, the bias evaluation and the ETA performance indicate that the model attempted to overcompensate for the performance of products such as PERSIANN-CDR and MSWEPv2.2.
5. PERSIANN-CDR was found to overestimate rainfall amounts by up to 60% and is therefore not ideal for use as an input to hydrological models in the area. CHIRPSv2 and MSWEPv2.2 perform best, with correlation coefficients of 0.75 and 0.72 and Pbias values of 14% and 4% respectively. At the lower altitude (Port Victoria Station), all the products were found to overestimate the rainfall amounts.
6. CHIRPSv2 and MSWEPv2.2 were found to be the most suitable products for estimating rainfall amounts in the SMMRB.
7. For the purposes of water resource assessments, the findings indicate that it is crucial to select SPPs that perform well both in direct comparison with rain gauge data and in hydrological simulations. Only then can they be used for water resource allocation and planning. The example of CMORPH shows that one cannot simply use any SPP; a careful selection process is necessary.