Estimating flood quantiles at ungauged sites using nonparametric regression methods with spatial components

ABSTRACT Prediction of flood quantiles at ungauged sites is investigated using several nonparametric regression methods including: local regression and generalized additive models (GAM). These methods are used to describe the relationship between runoff variables and catchment descriptors to predict flood quantiles. Previous work reported the presence of spatial correlation in the residuals for these models. To this end, this study proposes and investigates ways of incorporating spatial components. An L-moments regression technique (LRT) is developed to predict L-moments of target sites and flood quantiles are derived by aggregating quantiles from multiple candidate distributions. The predictive power of the proposed methods is evaluated on a large database of Canadian rivers using cross-validation. The results are examined inside different hydrological regions to assess the behaviour of the methods. The results show that GAM and local regression using, respectively, thin plate spline and kriging provide the best predictive powers among the considered methods.


29
Estimating the statistical distribution of extreme events is a crucial task in designing safe and cost-30 efficient infrastructure. For a given catchment, the amount of streamflow data may not be sufficient 31 to carry out a proper at-site frequency analysis with the desired level of accuracy. In such a 32 situation, regional frequency analysis is commonly used to improve the quality of the estimates by 33 transferring additional information from nearby catchments with similar characteristics. In 34 particular, when a target site is ungauged, flood quantiles associated with a given return period 35 must be deduced from the relationship with existing catchment descriptors. Quantile regression 36 technique (QRT) (Thomas and Benson, 1975) is an approach where the model is directly linking 37 the at-site flood quantiles to the catchment descriptors. This method avoids the subjective task of 38 selecting a best distribution at the site of interest, which impacts the extrapolation for longer return 39 periods. However, this approach may lead to incoherent estimations among different return periods 40 as separate models are necessary for every desired return period. Alternatively, parameter 41 regression techniques (PRT) can be used to link the parameters of a target distribution to its 42 catchment descriptors. In terms of performance for estimating flood quantiles at ungauged sites, 43 recent comparison studies concluded that both techniques led to similar results and that the choice 44 between the two methods depends mostly on practical considerations (Ahn and Palmer, 2016; 45 Haddad and Rahman, 2012). 46 Among existing parameter regression techniques, the index-flood method (Hosking and Wallis, 47 1997) is commonly used to estimate flood quantiles based on the assumption that inside a 48 homogenous region all sites follow the same distribution up to a scale factor, called the index-49 flood. In particular, a neighborhood defines a homogenous region that is unique to a target site and 50 where the pooled catchments are the nearest catchments according to a given metric (Acreman and 51 Sinclair, 1986). Several comparative studies showed that this type of region, commonly called 52 Region Of Influence (ROI), outperforms the use of fixed regions derived from common clustering 53 techniques (Burn, 1990;GREHYS, 1996;Tasker et al., 1996). there are no streamflow data, the degree of homogeneity of a region in regards of the target site 58 cannot be directly assessed. However, useful regions can still be delineated using geographical 59 distance or similarity between catchment descriptors (Eng et al., 2007;Zrinji and Burn, 1994), 60 even though the resulting regions may lack hydrological similarities (Burn et al., 1997; Oudin et 61 al., 2010). Nonetheless, the delicate task of delineating and validating homogenous regions can be 62 avoided if one assumes instead that the catchment attributes evolve smoothly inside the descriptor 63 space (Chokmani and Ouarda, 2004;Laio et al., 2011). 64 A logarithm transformation is applied to the first two L-moments to obtain more normal-shape 151 distributions and to enforce positive values. The L-moments are then used to estimate the flood 152 quantiles of ungauged sites for multiple candidate distributions and a final prediction is obtained 153 by averaging the output of each candidate. This section starts by presenting the nonparametric 154 regression methods and continues by describing the procedure to estimate the flood quantiles from 155 the L-moments predicted at target sites. At the end of the section, criteria are proposed for assessing 156 the quality of estimated flood quantiles. 157

158
Kriging is a powerful geostatistical technique for predicting a variable at unknown location 0 s 159 when data are generated by a Gaussian random field. In the present context, kriging can be used 160 to obtain L-moments at ungauged sites. Let's consider a set of gauged sites (1) However, the choice of the control points plays an important role in the quality of the fitting and 198 using a large number of control points provides more flexibility, but is likely to overfit the data. In 199 that case, regulation can be used to control the smoothness of the estimated function The motivation for using the sum of univariate smooth functions in GAM modelling is that the 207 effect on the response variable with respect to each explanatory variable is straightforward and it 208 was shown that this strategy adapts reasonably well to various nonlinear forms encountered in 209 flood frequency analysis (Chebana et al., 2014). However, that restriction creates limitations. For 210 instance, it means that a GAM model cannot correctly characterize geographical coordinates, 211 because they are treated separately. Consequently, for two or more dimensions, an alternative 212 approach is needed. In this line, for a set of r bivariate control points 1 ,, r cc , thin plate spline 213 (TPS) is better suited for modelling spatial data and is defined as a radial basis function 214 . Wood (2003) showed that thin plate splines can be used to expand the 216 capability of GAM models by incorporating a bivariate function that characterizes the interaction 217 between two explanatory variables, while keeping the general interpretability of the GAM model. 218 For predicting L-moments at ungauged sites, a GAM model can be considered where the spatial 219 component is described by thin plate splines. In this case, the deterministic trend includes both the 220 spatial component and the relationship with the catchment descriptors and consequently the error 221 term is assumed to be spatially independent. 222

223
The GAM models described above contain useful strategies for the spatial component in the 224 prediction of L-moments at ungauged sites. Other regression procedures may not share the same 225 mathematical simplicity, but can also lead to accurate predictions. Local regression where a 226 multiple regression model is performed inside a region of influence is one example. In the present 227 study, weights proportional to the distance of a gauged site from the target site are used to give 228 more importance to the closer sites. Formally, if () i s are the gauged sites of a neighborhood of site 229 1, , in  , ordered by distance () i h with the target then the weights are 230 Note that only the number of sites in the neighborhood must be calibrated to use the model. Although locally linear, the local regression as a whole cannot be treated as a linear model, and 238 therefore the universal kriging estimator (2) with a ROI trend cannot be evaluated. However, a 239 ROI/KRG procedure can be considered by extracting the trend in a first step and then using kriging 240 (or TPS) in a second step to further predict the residuals based on spatial proximity. Here, the mean 241 of the residuals is known to be zero and that assumption is incorporated as a constraint in the 242 simple kriging estimator (Schabenberger and Gotway, 2004). This gives rise to a model having a 243 similar form as the universal kriging estimator, but with a different trend and spatial component. Afterward, the AIC of the best distribution is compared to the one of the GEV. If they are judged 280 similar, then the GEV is used; otherwise, the best distribution is kept. The present study accepts 281 that two distributions are similar when the difference between their AIC is lower than two. Note 282 that such threshold is not largely biased towards the GEV. In particular, that difference corresponds 283 to the impact of removing one parameter from a nested model; Burnham and Anderson (2002) also 284 suggest that differences greater than 4 should be considered meaningful. 285 An alternative to the presented methodology for evaluating flood quantiles at a target site using L-286 moments is the index-flood model (IF) using the L-Moment algorithm (Hosking and Wallis, 1997). 287 The index-flood method computes the flood quantile of a site s as ( ) ( ) q s z s q  , where () zs is 288 the index-flood factor taken here as the mean of the site annual maxima i s (L1) and q represents 289 the flood quantile of a dimensionless distribution, or regional growth curve. With the L-Moment 290 algorithm, the regional growth curves are estimated using neighborhoods of size 20 and by 291 averaging the LCV and LSK using weights proportional to the record lengths. Hosking and Wallis 292 (1997) reported that neighborhoods of size larger than 20-25 sites do not usually provide more 293 information. Traditionally, the L-Moment algorithm selects a single regional distribution based on 294 a Z-statistic that identifies the distribution having theoretically the most coherent L-kurtosis with 295 the sample. Note that in the present situation, the homogeneity of the regions used for computing 296 the regional growth curve is not verified as it is assumed instead that the L-moments are smoothly 297 evolving in descriptor space. 298

299
The assessment of a method for flood frequency analysis is mostly focused on evaluating the 300 capacity of such method in estimating flood quantiles of specific return periods. Leave-one-out 301 cross-validation is often used in that context to evaluate in turn the prediction error made when 302 one site is assumed ungauged. For local regression, such strategy can be efficiently implemented 303 by excluding the target site in each neighborhood. However, for other nonparametric regression 304 methods such as GAM and neural network, refitting a model many times is inefficient. 305 Consequently, k-fold cross-validation is preferred here to limit the computational burden (Hastie 306 et al., 2009). This cross-validation procedure does essentially the same as a leave-one-out cross-307 validation scheme. In turns, one of k group of sites with similar sizes is considered as ungauged 308 and these sites are predicted by the remaining sites. For the present study 10 k  and n=770 sites. 309 Therefore, this strategy reduces by 77 times the number models to be fitted. 310 Based on the residuals of the cross-validation scheme, two criteria are used to evaluate the 311 magnitude of the prediction error. Let () i zs be a response variable predicted at the i-th of n sites 312 when considered as ungauged and () i zs the observed value. Two well-known criteria are the 313 Nash-Sutcliffe 314 and the mean absolute deviation 316 The NSH criterion provides a measure of the quadratic error relatively to the performance of the 318 model represented by a global mean. Values close to 100 % indicate good predictive performance. 319 Alternatively, the MAD criterion is based on absolute errors, which leads to a criterion that is less 320 affected by large discrepancies and so more robust to potential outliers. Note that the MAD 321 criterion provides a measurement of the prediction error in the same units as the predicted variable, 322 while NSH is a dimensionless measure. 323 In general, for two competing methods it is difficult to interpret the meaning of small differences 324 in performance measures as they can represent either a real gain of prediction skill or be simply 325 the consequence of random sampling. In the context of hydrological model forecasting, it was 326 shown that the assessment of significantly better forecasting skills can be determined by hypothesis 327 testing (DelSole and Tippett, 2014; Hamill, 1999). The same strategy is adopted here to evaluate 328 prediction skills using the Wilcox signed-rank (WSR) test. For the first two predicted L-moments 329 (L1 and LCV), the loss differential of two competing methods with predictions () i zs and ˆ() The Wilcox signed-rank test verifies the hypothesis of a zero-mean rank of the loss differential 335 against a significant (bilateral) difference. Therefore, a small p-value of this test provides evidence 336 that one competing method has superior predictive skills. In particular, as a nonparametric test, the 337 outcome is invariant to the distribution of the loss differential.   (Ward, 1963) and their total number is determined based on 370 their overall parsimony and interpretability. Figure 1 illustrates the geographical locations of the 371 770 gauged sites regrouped in 7 hydrological regions. Region 1 represents catchments located 372 along the Pacific coast, which receive the largest amount of precipitation in Canada. Regions 2 to 373 4 are wetter catchments mostly found in eastern Canada and in British Columbia, while regions 5 374 to 7 are drier catchments mainly located in the Prairies and northern Canada. One can see that 375 British Columbia is hydrologically diverse as at least one member of every region is found in its 376 territory. Notice also that large watersheds are generally found at the more northerly locations. 377

378
The quality of the models is assessed by the cross-validation criteria and is reported in Table 2. 379 When no spatial component is included, the GAM method is found to have the least predictive 380 power for the mean (L1) with low NSH and high MAD. This suggests that the assumption of 381 independence between the smooth functions limit the model flexibility. In that regard, ROI offers 382 the best predicting power, with results slightly outperforming NN. Similar observations are made 383 for LCV and LSK. By default, ROI uses the Mahalanobis distance between catchment descriptors 384 to delineate the neighborhood of a target site. Similarly, ROI/GEO is the same method, but using 385 instead the geographical distance. One can see that the change of metric improves the predictive 386 capability of the local regression approach, although based on a WSR test the difference of 387 predictive skills is not significant for L1 and LCV at a significance level of 0.05. 388 These important gains show the importance of geographical proximity in the determination of the 401 scale and shape of the target distribution. 402

Evaluation of the aggregated weights
403 Section 2.4 described a strategy to use weights for estimating flood quantiles from multiple 404 candidate distributions. For that purpose, the Mahalanobis distance between descriptors was 405 proposed to delineate neighborhoods of size 20. This size of neighborhood was found optimal in 406 terms of cross-validation criteria. Table 3 presents the contingency table of the distributions  407 selected by at-site frequency analysis. As expected, the GEV distribution is preferred in a large 408 proportion (78%). However, Region 5 (smaller and drier) shows a substantial percentage of sites 409 (47%) where the GNO and PE3 distributions are selected. Due to the prevalence of GEV, weights 410 greater than 0.5 are found for that distribution in 84% of the neighborhoods. At the same time, in 411 Region 5 the cumulative weight of the GNO and PE3 is greater than 0.5 in 42% of the cases. This 412 shows that the proposed methodology preserves the specificity of the distribution selected inside 413 hydrological regions. Consequently, in a majority of the sites outside of Region 5, the estimated 414 flood quantiles are essentially those of a GEV distribution, while inside Region 5 a compromise is 415 made between the relevant distributions. 416

417
In the following, the focus is set on the GAM/TPS and ROI/KRG methods that were found to be 418 the best methods overall for predicting the L-moments at target sites. The cross-validation criteria 419 of the estimated flood quantiles of return period of 10 and 100 years are presented in Table 4. 420 Overall, it is seen that neither of the two methods is systematically superior to the other. Indeed, 421 according to the WSR test, only hydrological region 7 is significantly better modelled by 422 ROI/KRG for a significance level of 0.05. Notice that the NSH of the hydrological regions is 423 always lower than the NSH of all sites, because the predicting errors are compared to the regional 424 means and not the global mean. The hydrological region with the lowest NSH is Region 7 with 425 70.3 for Q100. However, this region has a MAD of 0.293 that is comparable with Region 2 (0.310), 426 even though the latter has a NSH of 84.0. According to MAD, Region 5 (smaller and drier) is the 427 hydrological region estimated with the lowest accuracy. 428 For evaluating the quality of the prediction considering the sampling error, the coefficient of 429 variation of the estimated flood quantiles for each site is evaluated and averaged by hydrological 430 regions. For Q100, apart from Regions 5-6, the average coefficient of variation (ACV) of 431 GAM/TPS is relatively constant between 10% and 12% . The ACV criterion also shows that the 432 flood quantiles in Region 5 are the less accurately predicted with an ACV of almost double that of 433 the others (19.4). The ACV of Q100 for the ROI/KRG model is less constant across hydrological 434 regions (between 8.6 and 13.7), but they are overall slightly lower than those of GAMP/TPS for 435 all sites. For Q10, according to ACV the ROI/KRG method led to more accurate estimates than 436 GAM/TPS, except for Region 2. 437

Comparison of different approaches for computing flood quantiles
438 Table 5 reports the comparison of different approaches for computing the flood quantiles. The 439 results indicate that the index-flood (IF) method is underperforming when estimating L1 by both 440 GAM/TPS and ROI/KRG. According to the WSR test, the GAM/TPS method using the LRT 441 approach has significantly better prediction skill than the index-flood method to predict Q100 with 442 a p-value less than 0.001. Similar results are observed for ROI/KRG. One reason for this 443 discrepancy is that the index-flood method does not include a spatial component for LCV and 444 LSK, which was shown in Table 2 to provide important information. A second factor could be the 445 choice of the distribution using the Z-statistic. In particular, the adopted procedure for selecting 446 the at-site distribution has chosen the GEV distribution in 78% of the cases, while the GEV 447 distribution is selected in 49% of the neighborhoods using the Z-statistic. To measure the impact 448 of this selection procedure, a variant of the index-flood model (IFW) is considered where the 449 estimated flood quantiles are taken as a weighted average of candidate distributions, like LRT. 450 These results, presented in Table 5, reveal that for Q10 the two variants of the index-flood method 451 are essentially the same in terms of cross-validation criteria, but a slight discrepancy is observed 452 for Q100. This is coherent with the fact that the selection of the regional growth curve has more 453 impact in the extrapolation to the longer return periods and that the averaging approach is more 454 coherent with the at-site selection procedure. However, LRT has also significantly better 455 prediction skill than IFW, according to the WSR test. This shows that the choice of the regional 456 growth curve is not the most important factor to explain the gap in prediction skill between IF and 457

LRT. 458
Previous studies suggested that parameter regression techniques (similar to LRT) have prediction 459 power comparable to quantile regression techniques, QRT (Ahn and Palmer, 2016; Haddad and 460 Rahman, 2012). Table 5 revisits these conclusions and arrives at similar results. In particular, 461 notice that the highest difference in NSH between LRT and QRT is only 0.03, which shows a 462 similar predictive power of these two methods. These findings are also corroborated by the WSR 463 tests with p-values greater than 0.14. 464 The comparison between the various methods have shown overall very similar results in terms of 465 cross-validation when examined at the national level. One reason is that this case study includes a 466 large number of sites and so nonparametric regression methods having comparable features 467 converge to similar solutions. As mentioned, flood frequency analyses in Canada are often 468 performed at the provincial level for political reasons. Therefore, it is useful to investigate the 469 impact of imposing administrative boundaries and limiting the number of sites. The ROI/KRG and 470 the GAM/TPS methods are thus carried out individually for each province. To ensure a minimum 471 of sites by province, the Atlantic provinces (ATL), the three North territories (NT) and Manitoba-472 Saskatchewan (MS) are pooled together. Also, even though Labrador is part of the Atlantic 473 province of Newfoundland and Labrador, these sites are regrouped with the adjacent province of 474 Quebec (QC). Note that for ROI/KRG, the hydrological regions are not used in this context and a 475 constant neighborhood size is selected for each province. Table 6 shows the cross-validation 476 criteria by "province", using national and provincial strategies. Overall, the national analysis 477 appears to outperform the provincial analysis when looking at the cross-validation criteria, but for 478 the vast majority of the provinces and for the whole of Canada, the WSR test does not identify a 479

491
In this study, nonparametric regression models with spatial components were investigated for 492 predicting flood quantiles at ungauged sites. The proposed methodology used three separate 493 regression models to predict the first three L-moments of a target distribution. The flood quantiles 494 were then computed as a weighted average of the flood quantile derived by multiple candidate 495 distributions and a weighting scheme was developed to represent the relative importance of each 496 distribution in the proximity of the target site. The overall performance of the nonparametric 497 regression methods was evaluated using cross-validation on a case study of 770 sites in Canada. 498 The results were summarized at the national level, but also inside 7 relevant hydrological and 7 499 administrative regions. This allowed verification of how the considered methods behave in 500 different situations. It was shown that all nonparametric regression methods worked best when a 501 spatial component was added. In particular, a GAM model combined with thin plate spline 502 (GAM/TPS) and a local regression model using kriging (ROI/KRG) were shown to provide 503 globally the most accurate flood quantile estimates. Overall, the cross-validation criteria used to 504 compare these two methods did not identify a better method and the WSR tests showed no 505 significant difference in prediction skills among them. Each method was found to perform slightly 506 better than the other depending on the context and criteria examined. This resemblance in terms of 507 predictive power could be attributed to the large number of sites present in this case study, which 508 allowed them to converge toward similar solutions. Dividing the analysis by administrative regions 509 revealed a small reduction in the predictive power, but this reduction was not found significant 510 according to the WSR test. Nonetheless, these restrictions brought about by use of administrative 511 regions appear to have affected the performance of GAM/TPS more than ROI/KRG.