Time-Series Analysis and Statistical Forecasting of Daily Rainfall in Kupang, East Nusa Tenggara, Indonesia: A Pilot Study



Introduction
Kupang is the capital city of the province of East Nusa Tenggara, Indonesia. Kupang generally has a dry tropical climate that is strongly influenced by wind, and it is categorized as a semiarid area because of its relatively low rainfall and its vegetation, which is dominated by savanna and steppe. Drought therefore poses a serious threat to local food security in Kupang in particular, and across Timor Island in general.
In this study we address this problem by studying the time-series pattern of rainfall in Kupang, using daily rain gauge data from the BMKG El Tari-Kupang station (WMO: 97372, 10°10'17"S, 123°40'19"E, elevation 102 m) (https://dataonline.bmkg.go.id/). The dataset covers the period from 1 January 1978 to 31 December 2020.
This pilot study presents a novel statistical time-series approach for analyzing daily rainfall data in Kupang, East Nusa Tenggara, Indonesia. Using the piecewise cubic Hermite interpolation algorithm, we filled in the null values in the daily rainfall time series. We then analyzed the monthly average and its pattern using the continuous wavelet transform (CWT) algorithm, which shows the strong annual pattern of rainfall in this region. In addition, we used the rainfall anomaly index (RAI) to standardize daily rainfall as an indicator of dry/wet conditions in this region. We then used the daily RAI time series from 1978 to 2020 to model and predict daily RAI over the next year with the Prophet model, obtaining a root mean squared error (RMSE) of 0.8424041040593219. The Prophet model is also able to capture the linear trend of increasing drought throughout the study period and the annual pattern of wet/dry conditions, in accordance with the previous study by (1).
This is an open access article under the CC-BY-SA license.
Let $h_k$ be the length of the kth subinterval of the data points,
$$h_k = x_{k+1} - x_k \tag{1}$$
where x in this context is the daily time index. The first divided difference $\delta_k$ is given by
$$\delta_k = \frac{y_{k+1} - y_k}{h_k} \tag{2}$$
where y in this context is the rainfall datum. The slope of the interpolant at $x_k$, denoted $d_k$, is given by
$$d_k = P'(x_k) \tag{3}$$
On the interval $x_k \le x \le x_{k+1}$, the interpolant can be expressed as a cubic polynomial in the local variables $s = x - x_k$ and $h = h_k$,
$$P(x) = \frac{3hs^2 - 2s^3}{h^3}\, y_{k+1} + \frac{h^3 - 3hs^2 + 2s^3}{h^3}\, y_k + \frac{s^2 (s - h)}{h^2}\, d_{k+1} + \frac{s (s - h)^2}{h^2}\, d_k \tag{4}$$
so that this expression satisfies the four interpolation conditions as follows (4),
$$P(x_k) = y_k, \quad P(x_{k+1}) = y_{k+1}, \quad P'(x_k) = d_k, \quad P'(x_{k+1}) = d_{k+1} \tag{5}$$
To automate this interpolation process, we use the built-in interpolate method of the pandas (5) DataFrame with the argument method='pchip'. The result of this interpolation is shown in Figure 2.
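As a minimal sketch of the cubic Hermite piece described in this section, the following pure-Python function evaluates the interpolant on a single subinterval. The function names, the sample values, and the explicit polynomial form (the standard cubic Hermite basis written in the local variable s) are assumptions for illustration. The study itself relies on the pandas interpolate method with method='pchip', which also selects the slopes d_k automatically.

```python
def hermite_cubic(x, xk, xk1, yk, yk1, dk, dk1):
    """Evaluate a cubic Hermite interpolant P(x) on the subinterval [xk, xk1].

    yk, yk1 are the data values at the endpoints; dk, dk1 are the slopes there.
    """
    h = xk1 - xk   # subinterval length h_k
    s = x - xk     # local variable s = x - x_k
    return (
        (3 * h * s**2 - 2 * s**3) / h**3 * yk1
        + (h**3 - 3 * h * s**2 + 2 * s**3) / h**3 * yk
        + s**2 * (s - h) / h**2 * dk1
        + s * (s - h) ** 2 / h**2 * dk
    )


def first_divided_difference(xk, xk1, yk, yk1):
    """First divided difference delta_k = (y_{k+1} - y_k) / h_k."""
    return (yk1 - yk) / (xk1 - xk)


# Hypothetical example: 1.0 mm of rain on day 0 and 5.0 mm on day 2, with
# assumed endpoint slopes; the missing value on day 1 is interpolated.
filled = hermite_cubic(1.0, 0.0, 2.0, 1.0, 5.0, 0.5, 3.0)
```

Evaluating the function at the two endpoints returns the original data values, which is the property that lets the interpolated series pass exactly through every observed rainfall value.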

Periodicity Analysis
To explore the rainfall data for Kupang, we first average the monthly rainfall over the entire period of the interpolated time series. The result (Figure 3) is consistent with the study conducted by (1), which grouped East Nusa Tenggara into region A with one peak and one trough. This pattern is due to the strong influence of the northwest monsoon from November to March (NDJFM) and the southeast monsoon from May to September (MJJAS) (1).
In addition to looking at the annual cycle, we can also identify nonlinear periodicity by using the continuous wavelet transform (CWT) algorithm. CWT is commonly used to analyze local transient oscillations in a time-series object (6). In other words, by using CWT, we can capture the pattern of a time-series object that is not normally distributed. Therefore, periodicity analysis with CWT can be used to analyze the dominant factors that affect climate in a particular location. In the definition of the CWT given by (7) as equation 6, x(t) is a time-series object and ψ(t) is a Morlet wavelet basis function at scale s. The power of this CWT is $|W_{d,\psi}|^2$. To simplify the numerical calculation, we use the PyCWT library in the Python computing environment for the periodicity analysis of rainfall in Kupang. The graphical representation is shown in Figure 4.
Qualitatively, Figure 4 shows that the daily rainfall in Kupang is strongly influenced by the annual cycle, consistent with the results of the spectral analysis using the double correlation method (DCM), empirical orthogonal function (EOF), and rotated empirical orthogonal function (REOF) conducted by (1) on monthly rainfall in region A. In addition, we also see interannual signals that may be influenced by the Madden-Julian Oscillation (MJO) and other local phenomena. Meanwhile, the effect of the El Niño-Southern Oscillation (ENSO) in Kupang, with a quasiperiodicity of 2-8 years, is not clearly visible in the daily rainfall data, which may be due to the strong influence of the annual cycle.
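To illustrate how a Morlet-based CWT singles out a dominant cycle, the sketch below computes time-averaged wavelet power by direct summation in pure Python. It is not the PyCWT implementation used in the study. The Morlet central frequency of 6, the scale-to-period conversion factor, the function names, and the synthetic 32-step sine wave (a stand-in for the annual cycle) are all assumptions chosen for illustration.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet central frequency (a common default)


def morlet(eta):
    """Morlet wavelet psi(eta) = pi^(-1/4) * exp(i*omega0*eta) * exp(-eta^2 / 2)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * eta) * math.exp(-0.5 * eta * eta)


def mean_wavelet_power(x, period, dt=1.0):
    """Time-averaged CWT power |W|^2 of x at the scale matching one Fourier period."""
    # Convert the Fourier period to the corresponding Morlet scale s.
    fourier_factor = 4 * math.pi / (OMEGA0 + math.sqrt(2 + OMEGA0**2))
    s = period / fourier_factor
    n = len(x)
    total = 0.0
    for tau in range(n):  # translate the wavelet along the series
        w = sum(x[t] * morlet((t - tau) * dt / s).conjugate() for t in range(n))
        w *= dt / math.sqrt(s)  # 1/sqrt(s) normalization of the transform
        total += abs(w) ** 2
    return total / n


# Synthetic series with a single 32-step cycle (hypothetical data).
signal = [math.sin(2 * math.pi * t / 32) for t in range(256)]
periods = [8, 16, 32, 64]
power = {p: mean_wavelet_power(signal, p) for p in periods}
dominant_period = max(power, key=power.get)
```

The direct double loop is quadratic in the series length, so it is only practical for short series; for the 43-year daily record a library routine based on the fast Fourier transform, such as the one in PyCWT, is the reasonable choice.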

Estimation of Rainfall Anomaly Index (RAI)
The calculation of the drought/wetness index using the rainfall anomaly index (RAI) (8) begins by subtracting the average rainfall over the entire time series period in Kupang ($\bar{R}$) from the time-series object of daily rainfall (R). The result is then divided by the difference between $\bar{R}$ and either the average of the 10 largest rainfall values ($\bar{M}$), for positive anomalies, or the average of the 10 smallest rainfall values ($\bar{X}$), for negative anomalies, as shown in equation 7,
$$\mathrm{RAI} = \begin{cases} +3\,\dfrac{R - \bar{R}}{\bar{M} - \bar{R}}, & R \ge \bar{R} \\ -3\,\dfrac{R - \bar{R}}{\bar{X} - \bar{R}}, & R < \bar{R} \end{cases} \tag{7}$$
To simplify the RAI calculation process, we use the precintcon package (9) in the R computing environment. The graphical visualization is shown in Figure 5.
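The RAI calculation can be sketched in a few lines of pure Python. This is an illustrative version, not the precintcon implementation used in the study: the function name, the sample data, and the scaling factors of +3 and -3 (taken from the standard formulation of the index) are assumptions.

```python
def rainfall_anomaly_index(rain, n_extremes=10):
    """Compute the RAI for each value in a rainfall series.

    Positive anomalies are scaled by the mean of the n largest values,
    negative anomalies by the mean of the n smallest values.
    Assumes the series is not constant, so the denominators are nonzero.
    """
    mean = sum(rain) / len(rain)
    ordered = sorted(rain)
    x_bar = sum(ordered[:n_extremes]) / n_extremes    # mean of the n smallest values
    m_bar = sum(ordered[-n_extremes:]) / n_extremes   # mean of the n largest values
    index = []
    for r in rain:
        anomaly = r - mean
        if anomaly >= 0:
            index.append(3 * anomaly / (m_bar - mean))
        else:
            index.append(-3 * anomaly / (x_bar - mean))
    return index


# Hypothetical daily rainfall (mm): 20 dry days, 10 moderate days, 10 wet days.
sample = [0.0] * 20 + [5.0] * 10 + [10.0] * 10
rai = rainfall_anomaly_index(sample)
```

With this scaling, a day as wet as the average of the 10 wettest days receives an index of +3 and a day as dry as the average of the 10 driest days receives -3, so the index is directly comparable across stations with different rainfall totals.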

Statistical Forecasting
In this final section, we model the daily RAI time series using the Prophet algorithm (10),
$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t \tag{8}$$
where g(t) is the trend (non-periodic changes), s(t) is the seasonality (periodic changes), h(t) is the holiday effect (potentially irregular schedules lasting one or more days), and $\varepsilon_t$ is the error term with prior N(0, 0.5).
In this study, we use the default g(t), which is the linear trend model of the Facebook Prophet library in a Python computing environment. The linear trend model is formally defined as
$$g(t) = \left(k + a(t)^{T}\delta\right) t + \left(m + a(t)^{T}\gamma\right) \tag{9}$$
where k is the growth rate, $a(t)^{T}\delta$ is the change in growth rate at time t, m is the offset parameter (prior: m ∼ N(0, 5)), and γ is the adjustment of the offset parameter that connects the endpoints of the time-series segments.
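The role of the offset adjustment γ is easiest to see in code. The sketch below evaluates a piecewise linear trend with rate changes at given changepoints. The function name, the changepoint times, and the parameter values are hypothetical, and the rule gamma_j = -s_j * delta_j (with s_j the time of the jth changepoint) follows the usual Prophet formulation rather than anything stated explicitly in this text.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate g(t) = (k + a(t)^T delta) * t + (m + a(t)^T gamma).

    a_j(t) is 1 once t has passed changepoint s_j and 0 before it.
    gamma_j = -s_j * delta_j keeps the trend continuous at each changepoint.
    """
    rate = k
    offset = m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:                   # a_j(t) = 1
            rate += delta_j            # growth rate changes by delta_j
            offset += -s_j * delta_j   # offset adjustment gamma_j
    return rate * t + offset


# Hypothetical trend: slope 1.0 until t = 5, then slope 0.5 afterwards.
before = piecewise_linear_trend(4.0, 1.0, 0.0, [5.0], [-0.5])
at_change = piecewise_linear_trend(5.0, 1.0, 0.0, [5.0], [-0.5])
after = piecewise_linear_trend(10.0, 1.0, 0.0, [5.0], [-0.5])
```

Without the offset adjustment the trend would jump at every changepoint; with it, the two line segments meet exactly at t = 5.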
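Seasonality in the Prophet algorithm is based on the Fourier series as follows,
$$s(t) = \sum_{n=1}^{N}\left(a_n \cos\left(\frac{2\pi n t}{T}\right) + b_n \sin\left(\frac{2\pi n t}{T}\right)\right) \tag{10}$$
where the vector of sine and cosine terms (X(t)) is defined as,
$$X(t) = \left[\cos\left(\frac{2\pi (1) t}{T}\right), \sin\left(\frac{2\pi (1) t}{T}\right), \ldots, \cos\left(\frac{2\pi N t}{T}\right), \sin\left(\frac{2\pi N t}{T}\right)\right] \tag{11}$$
and the vector of weights (β) is defined as,
$$\beta = \left[a_1, b_1, \ldots, a_N, b_N\right]^{T} \tag{12}$$
with N the order of the Fourier series, T the period, and prior β ∼ N(0, σ²).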
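A short sketch can make the Fourier seasonality concrete: the seasonal component is the dot product of the vector of cosine and sine terms with the weight vector β. The function names, the yearly period of 365.25 days, and the weights below are assumptions for illustration, not values fitted in the study.

```python
import math


def fourier_features(t, period, order):
    """Return X(t) = [cos(2*pi*1*t/T), sin(2*pi*1*t/T), ..., cos(2*pi*N*t/T), sin(2*pi*N*t/T)]."""
    features = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t / period
        features.append(math.cos(angle))
        features.append(math.sin(angle))
    return features


def seasonality(t, period, beta):
    """Evaluate s(t) as the dot product of X(t) and beta = [a_1, b_1, ..., a_N, b_N]."""
    order = len(beta) // 2
    x = fourier_features(t, period, order)
    return sum(x_i * b_i for x_i, b_i in zip(x, beta))


YEAR = 365.25            # assumed yearly period T in days
beta = [2.0, 3.0]        # hypothetical weights a_1, b_1 (order N = 1)
start_of_year = seasonality(0.0, YEAR, beta)
quarter_year = seasonality(YEAR / 4, YEAR, beta)
```

A higher order N lets the seasonal curve follow sharper features, such as a short wet season, at the cost of more weights to estimate.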

International Journal of Data Science ISSN 2722-2039 Vol. 3, No. 1, June 2022, pp. 25-32
Since there is no holiday term in the daily RAI time series in Kupang, our final model can be formalized as a distribution of the daily RAI conditional on the model parameters, including the seasonality weights β (equation 13). We use the Facebook Prophet library to obtain the maximum a posteriori (MAP) estimate of all parameters in this final model. The results of prediction and time series modeling using Facebook Prophet are shown in Figure 6. To evaluate this model, we calculated the root mean squared error (RMSE) using the scikit-learn library in the Python computing environment for the period 1 January 1978 to 31 December 2020. The RMSE value is 0.8424041040593219.
The RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \tag{14}$$
where N is the number of data points, y is the actual value of RAI, and ŷ is the predicted value of RAI. Model components can be seen in Figure 7. The results of running the Prophet model with the default parameter settings show an increasing trend in the drought index from 1978 to 2020 in Kupang. In addition, Prophet is also able to model seasonality at the annual scale, with results similar to those reported by (1) for region A.
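The RMSE evaluation can be reproduced with the standard library alone. The sketch below is an assumed equivalent of the scikit-learn computation used in the study; the function name and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical observed and predicted RAI values.
observed_rai = [1.0, 2.0, 3.0]
predicted_rai = [1.0, 2.0, 5.0]
error = rmse(observed_rai, predicted_rai)
```

Because the RMSE is expressed in the same units as the RAI, a value below 1 means the typical prediction error is smaller than one index unit.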

Conclusion
In this study, we have used several algorithms to analyze patterns and predict the time series of rainfall anomalies in Kupang. The results show that the annual rainfall pattern is the dominant pattern in this region. The Prophet algorithm that we use to analyze and predict RAI is able to capture the annual pattern of dry/wet conditions, but has not been able to capture the interannual pattern, so we