Selection of hydrological signatures for large-sample hydrology 1 2

24 25 Hydrological signatures are now used for a wide range of purposes, including catchment 26 classification, process exploration and hydrological model calibration. The recent boost 27 in the popularity and number of signatures has however not been accompanied by the 28 development of clear guidance on signature selection, meaning that signature selection is 29 often arbitrary. Here we use three complementary approaches to compare and rank 15 30 commonly-used signatures, which we evaluate in 671 US catchments from the CAMELS 31 data set (Catchment Attributes and MEteorology for Large-sample Studies). Firstly, we 32 employ machine learning (random forests) to explore how attributes characterizing the 33 climatic conditions, topography, land cover, soil and geology influence (or not) the 34 signatures. Secondly, we use a conceptual hydrological model (Sacramento) to critically 35 assess which signatures are well captured by the simulations. Thirdly, we take advantage 36 of the large sample of CAMELS catchments to characterize the spatial smoothness (using 37 Moran’s I) of the signature field. These three approaches lead to remarkably similar 38 rankings of the signatures. We show that signatures with the noisiest spatial pattern tend 39 to be poorly captured by hydrological simulations, that their relationship to catchments 40 attributes are elusive (in particular they are not correlated to climatic indices like aridity) 41 and that they are particularly sensitive to discharge uncertainties. We question the utility 42 and reliability of those signatures in experimental and modeling hydrological studies, and 43 Manuscript submitted to Water Resources Research on 17/01/2018. 2 we underscore the general importance of accounting for uncertainties in hydrological 44 signatures. 45


space. 89 90
Our approach is motivated by the general idea that "the ability to accurately predict 91 behavior is a severe test of the adequacy of knowledge in any subject" put forward by 92 Crawford and Linsley (1966). We argue that the failure to predict a signature is 93 symptomatic of limitations of our understanding of what it represents, and/or of 94 limitations of data we use to compute or predict it. Here we explore and reveal limitations 95 of hydrological signatures, and argue that this can help to guide signature selection. Our 96 approach is driven by three main research questions: 97 98 1. How well can signatures be predicted using landscape characteristics? With this 99 question, we try to better understand how the interplay of landscape attribute shape 100 hydrological behavior. We used a statistical model (random forests) to relate 101 catchments attributes to hydrological signatures. 102 2. How well can signatures be simulated by a conceptual hydrological model calibrated 103 using an aggregated measure of performance? We used signatures to critically assess 104 the realism of simulations from a model calibrated using RMSE. Our aim is to 105 examine shortcomings of simulations resulting from this kind of traditional (and still 106 prevalent) parameter estimation technique. 107 3. How smoothly do signatures vary in space? We explored the spatial patterns of 108 signatures drawn when plotting their value for 671 catchments. We used those 109 patterns to reflect on whether signature variations in space truly reflect differences in 110 hydrological processes, or rather, data and method uncertainties. 111 112 The analysis in this paper enables us to compare and rank signatures, and to provide 113 general guidance for their selection and use. The remainder of this paper is organized as 114 follows: The data and methods are presented in Section 2; the ranking of signatures is 115 presented in Section 3; the implications for signature selection are discussed in Section 4; 116 conclusions and future research needs are presented in Section 5. 117 2 Data and methods 118

The CAMELS data set 119
All the data used in this study come from the CAMELS data set (Catchment Attributes 120 and MEteorology for Large-sample Studies). The CAMELS data set covers 671 121 catchments in the contiguous US (CONUS) and consists of two types of data: daily time 122 series the atmospheric forcing and discharge (Newman et al., 2014(Newman et al., , 2015 and catchment 123 attributes selected to provide a quantitative description of landscape features likely to 124 influence hydrological processes (Addor et al., 2017a(Addor et al., , 2017b. The hydrometeorological 125 time series and catchment attributes are described in Sections 2.2 and 2.3, respectively. 126

CAMELS hydrometeorological time series 127
The hydrometeorological time series include both daily meteorological forcing and 128 observed discharge time series, as well as daily hydrological simulations. Precipitation 129 and temperature at the catchment scale were retrieved from the Daymet data set (Thornton et al., 2012). Potential evapotranspiration was estimated based on Priestley and 131 Taylor (1972). The hydrologic simulations were produced using the Sacramento Soil 132 Moisture Accounting model (Burnash et al., 1973) combined with the SNOW-17 snow 133 accumulation and ablation model (Anderson, 1973), with streamflow being routed using 134 a unit-hydrograph model. Hereafter this modeling setup is referred to as SAC. SAC was 135 calibrated using the shuffled complex evolution (SCE, Duan et al., 1992) global 136 optimization routine, minimizing the root mean squared error (RMSE) of the discharge 137 simulations. Simulations started on October 1 st 1980 for the 598 basins (out of 671) for 138 which discharge measurements started on or before that date. For the other basins, 139 simulations started on the first October 1 st after the start of the discharge records. SAC 140 was calibrated over the first 15 years of the simulation for each catchment, meaning that 141 different periods were used for different catchments. For each catchment, SCE was 142 started from 10 different random seeds, which led to 10 optimized parameter sets. Further 143 details on the hydrometeorological time series are provided in Newman et al. (2015). 144

CAMELS catchment attributes 145
The landscape of each catchment was described using a wide range of attributes, which 146 can be divided into five classes: 147 148 1.  Those signatures were selected because they characterize  175  different parts of the hydrograph and they are sensitive to processes occurring over  176  different time scales. They are also commonly employed in the literature, so we used this  177  study as an opportunity to compare them. The signatures we considered are described in  178  Table 2. We computed them using the observed discharge and the mean of the 10 SAC 179 simulations produced for each catchment. We also predicted these signatures based on 180 catchment attributes using random forests (Section 2.4). We evaluated the signatures 181 simulated by SAC and predicted by random forests by computing the fraction of variance 182 (R 2 ) of the observed signatures that they explain. forests are a flexible statistical model, which is not constrained by any physical 218 principles or assumptions on hydrological processes. We see it as an advantage, as 219 data exploration using random forest can potentially reveal relationships, which are 220 not commonly acknowledged, although they can be explained a posteriori from a 221 physical perspective. 222 223 3. Reduced risk of data overfitting: Random forests are an ensemble of regression trees, 224 which gives them more robustness than individual regression trees. Randomness is 225 introduced when they are constructed so that their predictions are not overly 226 influenced by specific catchments or predictors (Appendix 1). The random forest 227 predictions were evaluated using a ten-fold cross-validation: a random forest was 228 trained using 90% of the basins and its predictions were evaluated using the 229 remaining basins, this procedure was then repeated nine additional times in order to 230 cover all the basins. forest takes a few seconds). Further, each random forest relies on an ensemble of 246 trees, that can be used to estimate the uncertainty of the prediction (those uncertainty 247 estimates can be very reliable, see Figure A1d). 248 249 We argue that these advantages justify the use of random forests in our study. It is 250 however fair to acknowledge that random forests also have drawbacks. Critically, they 251 are highly parameterized, as each regression tree uses on the order of 10 thresholds. In 252 this study, we used 500 trees to predict each of the 15 hydrological signatures, which 253 leads to about 70,000 parameters (thresholds on predictors). This number of parameter is 254 impractical to analyze on an individual basis, but the relative influence of the predictors 255 on each signature can be quantified using the IncMSE. 256

258
The presentation of the results is organized as follows. We first present spatial maps for a 259 subset of commonly used signatures (mean discharge, slope of the flow duration curve, 260 and the baseflow index), and then we present statistics for the full set of 15 signatures. 261 forests and the SAC model. Mean discharge can be predicted very well by a random 268 forest based on catchment descriptors (R 2 = 0.92) and can be also simulated remarkably 269 well by the conceptual hydrological model SAC calibrated by minimizing the RSME (R 2 270 = 0.98). In contrast, the performance of both the random forest and SAC is poor when it 271 comes to the slope of the flow duration curve (R 2 = 0.29 and R 2 = 0.15, respectively). The 272 baseflow index is predicted (by the random forest) and simulated (by SAC) better than 273 the slope of the flow duration curve, but worse than the mean annual discharge (R 2 = 0.64 274 and R 2 = 0.84, respectively). Note that for these three signatures, the performance of the 275 random forest and of SAC are related: both methods perform well for the mean annual 276 discharge, reasonably well for the baseflow index, and poorly for the slope of the flow 277 duration curve. 278 279 Interestingly, the performance of both the random forest and SAC is related to the spatial 280 smoothness of the hydrological signatures. Note how the mean discharge field varies 281 smoothly across space, whereas the slope of the flow duration curve exhibits large 282 changes over short distances (first row of Figure 1). To quantify the spatial smoothness, 283 we used Moran's I to measure the spatial auto-correlation (Appendix 2). I enables us to 284 quantify features that are clear visually, and to compare signatures based on the spatial 285 smoothness of their field. The spatial smoothness is the highest for the mean discharge (I 286 = 0.51), intermediate for the baseflow index (I = 0.16) and the lowest for the slope of the 287 flow duration curve (I = 0.09). This ranking is the same as the ranking based on the 288 performance of the random forest and SAC. In other words, Figure 1 suggests that 289 signatures with lower spatial smoothness may be harder to relate to catchment 290 characteristics and to simulate using a conceptual model. 291

Simulation, prediction and spatial smoothness of hydrological signatures -292
evaluation for 15 signatures 293 Figure 2 shows that there is a strong three-way relationship between how well signatures 294 can be predicted based on catchment attributes, how well they can be simulated by SAC,295 and the smoothness of their spatial variability over the CONUS. The signatures in Figure  296 2 are ordered from left to right based on how well they can be predicted using a random 297 forest. Like for Figure 1, we compared the observed and predicted signatures from the 298 random forest by computing the coefficient of determination R 2 , shown in light blue in 299 Figure 2. R 2 varies from 0.92 (mean annual discharge) to 0.29 (slope of the flow duration 300 curve). The performance of the random forest is compared to that of SAC, shown in dark 301 blue in Figure 2. It is clear that hydrological signatures that can be accurately predicted 302 from catchment attributes by the random forest can also be well simulated by SAC. 303 Indeed, the performance of the random forest and that of SAC, each described by 15 R 2 304 values, are highly correlated (r = 0.91). Note that several signatures we considered were 305 also predicted by Beck et al. (2015) using characteristics from thousands of catchments 306 from across the world and neural networks. They also find that some signatures are better 307 predicted than others and interestingly, it appears that if they had ranked signatures based 308 on the R 2 they report in their Figure 5, the ranking would have been very similar to what 309 we propose (with the mean annual flow and half-flow date being best predicted, followed 310 by the high-flow quantile, and finally the low flow quantile and the baseflow index). 311 312 Furthermore, the spatial smoothness measured by Moran's I (shown in green in Figure 2) 313 is almost systematically greater for signatures that can be accurately predicted by the 314 random forest and well simulated by SAC. In fact, the correlation between the 315 performance of the random forest and spatial smoothness is strong (r = 0.91). This 316 suggests that random forests fail to capture sudden (small-scale) changes in hydrological 317 signatures over short distances. The spatial smoothness also appears to be a good 318 predictor of how well hydrological signatures are captured by SAC (r = 0.78). 319 320 The remarkable similarity between the performance of the random forests and SAC is 321 somewhat surprising given that the two methods are fundamentally different. The random 322 forest is based on catchment attributes (not on hydrometeorological time series, although 323 some catchment attributes are shaped by hydrometeorological conditions described by the 324 time series) and is a statistical framework that is not constrained by physical processes. In 325 contrast, SAC is a conceptual hydrological model conditioned by hypotheses on 326 catchment behavior imbedded in its formulation, it requires daily time series (random 327 forests only have access to climatic averages) and its parameters are determined by an 328 automated discharge calibration procedure (they were not inferred from catchment 329 attributes). Further, the random forests were trained to capture each hydrological 330 signature independently, but SAC was only trained to optimize RMSE (note that this does 331 not prevent SAC from providing better estimates of most signatures, as shown by Figure  332 2). 333 To better understand why some signatures were better predicted than others, we explored 347 which predictors were preferentially used by the random forest. To this end, we consider 348 the IncMSE, the increase in the MSE of the prediction when the value for each predictor 349

Climatic indices as strongest predictors of hydrological signatures
were shuffled. IncMSE is indicated by the size of the dots in Figure 4. The importance of aridity is clear and its control over the water balance receives 369 continuous attention, sustained by the high number of studies based on the Budyko 370 framework (Padrón et al., 2017). Yet, Figure 4 shows that several hydrological variables, 371 which reflect key aspects of hydrological dynamics, are poorly predicted by aridity alone, 372 or even by a combination of climatic indices. For instance, random forests were unable to 373 clearly relate climate indices to the precipitation-streamflow elasticity, the slope of the 374 flow duration curve or the no-flow frequency. The variations in space of these signatures 375 (bottom row of Figure 3) appear to be too complex to be captured by correlation 376 coefficients, or by a more complex statistical model (random forest) or by a conceptual 377 hydrological model (SAC). In other words, the number of hydrological signatures that 378 can be well predicted based on climatic indices alone is limited. 379

Weak influence of land cover, soil and geology on hydrological behavior? 380
We found that climatic indices have by far the greatest influence on selected hydrological 381 signatures, while the attributes characterizing the land cover, soil, geology and 382 topography have a much weaker influence. The lack of dark colors in the corresponding 383 columns of Figure 4 indicate that those attributes, when considered individually, are not 384 strongly correlated to hydrological signatures. Even when those attributes are combined 385 with other attributes using a random forest, their influence is generally insignificant, as 386 shown by the lack of the large circles in the same columns. The relative strength of 387 climatic variables when compared to other catchment attributes has the following 388 implication. When a hydrological signature is strongly linked to one or several climate 389 indices, it is well predicted, and conversely, weak links lead to poor predictions. Hence, 390 climatic attributes strongly condition how well hydrological signatures can be predicted 391 by the random forest. Some signatures like the slope of the flow duration curve are not 392 well constrained by climate variables, and the random forest is not able to extract relevant 393 information from the predictors we are using. 394 395 The lack of significance of land cover, soil and geology attributes shown in Figure 4  the full information available in the data sets we used as predictors. They fail in 429 particular at capturing sudden changes in space, for instance the fields in Figure 1j  430 are too smooth, but the basic equations constituting SAC lead to more spatial 431 diversity (Figure 1k) In the Results section we discussed how, as we move down the table of signatures shown 450 in Figure 4, the quality of the predictions and simulations, the spatial smoothness of the 451 signature fields and the influence of climate vary in consistent ways. We summarized 452 these results in the top part of signatures, such as the mean discharge, are far less sensitive to rating curve uncertainty 465 than others, such as the slope of the flow duration curve (as illustrated by their Figure 6). 466 Similarly, low flow signatures are more sensitive to data errors than high flow signatures. 467 They also regionalized signatures following a weighted-pooling-group approach, in 468 which each signature was estimated using the weighted mean of its value in similar 469 catchments (similarity was defined based on mean annual precipitation, the 90 th percentile 470 catchment elevation, the base-flow index and catchment area). Their regionalization 471 performs better for high flows than for low flows, and better for the mean discharge than 472 for the slope of the flow duration curve (their Figure 8). This is not only consistent with 473 the sensitivity of the signatures to rating curve uncertainties that they determined, but also 474 with the ranking of signatures we propose based on random forest regionalization. instance, the frequency of zero discharge is impacted by the fact that different stations 499 report very low flows differently, which is likely to contribute to the strong variations in 500 space ( Figure 3i) and to partly explain why the random forest predictions and the SAC 501 simulations are particularly poor for this signature (Figure 2). Further, the formulation of 502 the signature influences its value, and if it is not robust enough, it can exacerbate 503 insignificant differences or mask significant differences between catchments. For 504 instance, streamflow-precipitation elasticity can be formulated in different ways, some 505 being less sensitive to outliers (Sankarasubramanian et al., 2001). It is well possible that 506 other signatures suffer from similar drawbacks: their aim is clear, but their formulation 507 does not capture well that they should because it is lacking robustness. 508 509 Note that uncertainties related to data and methods are only two factors making it 510 difficult to isolate and understand differences in behavior between catchments. A more 511 general issue is that, while signatures enable us to explore hydrological behavior, they do 512 not necessarily allow us to pinpoint the hydrological processes leading to this behavior. 513 For instance, the slope of the flow duration can be related to myriad of processes which 514 are difficult to disentangle and which interact in complex ways. Similarly, both baseflow 515 generation and the snow melt contribute to the slowly-varying part of a hydrograph, but 516 discharge separation techniques used to compute the baseflow index (such as digital 517 filters) are unable to distinguish between these two processes. Also, both 518 evapotranspiration and loss to groundwater lead to low values of the runoff ratio, but the 519 runoff ratio on its own does not inform us on this partitioning. Furthermore, statistics 520 based on high discharge thresholds enable us to explore the frequency and amplitude of 521 floods, but do not account for the different processes leading to floods. In most cases, 522 signatures do not enable us to focus on a single process, but rather, reflect the interplay of 523 several processes. As a consequence of this diversity of processes, it is difficult to 524 establish clear links between landscape attributes and hydrological signatures. 525 526 There are few exceptions, for instance the seasonality of discharge is in many cases 527 determined by the seasonality of precipitation and the eventual presence of snow, and the 528 mean discharge is strongly controlled by the aridity (top of the signature table). But 529 overall, the hydrological drivers of many signatures are still unclear. Hydrological 530 signatures are promising tools, but the results showed here illustrate that research on 531 hydrological signatures is still at an early stage, since we still do not understand many of 532 them well enough to explain what is driving their changes in space. We think that this 533 should make us re-evaluate what they tell us on hydrological processes. To give one 534 example, the precipitation-discharge elasticity is commonly used for anticipate the future 535 impact of climate change on discharge, yet even recent research recognizes that "it is 536 difficult to identify physical reasons, for the spatial variations in elasticity values" 537 (Andréassian et al., 2016). If we are not able to explain how elasticity changes in space, is 538 it reasonable to rely on it to produce projections of discharge under future climate, that 539 will potentially support decision-making on adaptation strategies? We recognize the 540 value of assessing the sensitivity of discharge to precipitation, but we wonder whether 541 sensitivity is correctly captured by this specific signature (and hence, how much faith we 542 should put into it). 543

Hydrological signatures for model calibration and selection 544
SAC performs overall better than the random forests ( Figure 2 Figure 4 have relatively 558 low value for model calibration if their uncertainties are not accounted for. These 559 signatures can be strongly influenced by data and method uncertainties, so approaching 560 them from a deterministic perspective and trying to exactly match them may not bring 561 much, since it means using a hydrological model to mimic variations over space that are 562 only partially related to hydrological processes. Again, in this study, we do not explicitly 563 assess how data and formulation uncertainties propagate into the signatures. Yet, the 564 wider range in the random forest predictions and the scatter in the maps of these 565 signatures indicate that they are particularly uncertain. The signatures we identify as 566 uncertain are also considered as uncertain by Westerberg et al. (2016). Future research 567 should systematically assess the value of signatures. Model calibration using a wide 568 range of signatures in a wide range of catchments would help us to better assess which 569 signatures are most useful for hydrological calibration. 570 571 Our concerns about model calibration using signatures impacted by data and methods 572 uncertainties also apply to model evaluation (and by extension, to model comparison). 573 Signatures at the bottom of the table are particularly uncertain and their relationship to  574 catchment characteristics remain elusive (i.e., we do not have a good handle on those 575 signatures). To use an example, we can predict the mean discharge in space well, which 576 provides us with a reference models can be compared to. In contrast, if we consider the 577 slope of the flow duration curve, it is poorly constrained by observations. Although we 578 can compute its value, we cannot explain its variations in space, hence we question 579 whether this reference is robust enough to enable model comparison. In absence of 580 uncertainty estimates, we do not think that a model should be selected instead of another 581 model, based solely on its better representation of signatures from the bottom of the table. 582 5 Conclusions and outlook 583 584 We systematically explored how landscape attributes influence (or not) hydrological 585 signatures. We described the landscape of 671 catchments in the contiguous USA using 586 five classes of attributes (topography, climatology, land cover, soil and geology) and 587 summarized catchment behaviour using 15 hydrological signatures. Random forests 588 allowed us to combine those landscape characteristics in non-linear ways and to 589 quantitatively explore their relative influence on hydrological signatures. We found that 590 climatic attributes are by far the most influential predictors for signatures that can be 591 well-predicted based on catchment attributes (such as the mean annual discharge or the 592 half-flow date), with land cover, soil and geology attributes playing secondary roles. Yet, 593 several other signatures, such as the slope of the flow duration curve or the streamflow-594 precipitation elasticity are poorly predicted based on catchments attributes, and in 595 particular, could not be satisfactorily predicted by climatic indices alone. 596 597 Using a large sample of catchments enabled us to explore the spatial patterns of 598 hydrological signatures over the CONUS, and to characterize their spatial smoothness 599 (auto-correlation) using Moran's I. We found that spatial smoothness is a simple yet 600 powerful way to gain insights into a variety of aspects of large-sample studies. Signatures 601 with smooth spatial variations are typically those with a high spatial predictability. In 602 contrast, when signatures exhibit abrupt changes over short distances, those changes 603 usually cannot be related to catchment attributes using random forests and they are also 604 poorly captured by hydrological simulations from a conceptual model. Those sudden 605 variations make signature regionalization difficult if neighbouring catchments are used as 606 donors. The reasons behind noisy spatial patterns are not entirely clear and deserve more 607 attention. 608 609 In summary, we found strong relationships between i) our ability to capture hydrological 610 signatures using simulations from a conceptual hydrological model (SAC), ii) our ability 611 to predict them using catchment characteristics as predictors in a machine-learning 612 algorithm (random forests), iii) the spatial smoothness of the maps of these signatures 613 (characterized using Moran's I) and iv) the strength of the climate influence on those 614 signatures. The strong consistency between these four aspects enabled us to rank 615 hydrological signatures (Figure 4). Signatures at the bottom of this ranking are poorly 616 related to catchment attributes, poorly captured by SAC, their spatial pattern is noisy, and 617 based on results from other studies, they are also particularly susceptible to discharge 618 uncertainties and difficult to regionalize. In other words, these signatures are poorly 619 constrained by discharge observations and the drivers of their variations in space are 620 elusive. Hence in absence of uncertainty estimates for these signatures, we question their 621 reliability to formulate conclusions on hydrological processes and we do not recommend 622 them for the evaluation and selection of hydrological models. Those findings are outlined 623 in Table 3.  624  625 Future research could explore whether signatures at the top of the ranking deliver better 626 results when used for the calibration and selection of hydrological models. Another 627 research avenue would be to re-use the framework presented here and explore how our 628 conclusions vary when a subsample of catchments is selected in order to explore a 629 specific landscape feature. We hope that the ideas and results presented in this study will 630 trigger discussions on the drivers of hydrological processes at the catchment scale and on 631 the use of hydrological signatures for hydrological modeling. 632 Appendix 1: An introduction to regression trees and random forests 633 634 We chose to use a machine-learning tool (random forests, Breiman, 2001) to explore how 635 the interplay between landscape attributes shapes hydrological behavior. Machine-636 learning algorithms are gaining in popularity as the quantity and diversity of data to 637 process increase. Machine-learning algorithms have been shown to be powerful 638 prediction techniques, including in hydrologic studies (e.g., Gudmundsson and 639 Seneviratne, 2013; Beck et al., 2015). Here we present a brief introduction to random 640 forests, which may be useful for the interpretation of our results. 641 642 A random forest relies on an ensemble of regression trees to relate predictors (here 643 catchment attributes) to a response variable (here a hydrological signature). In a 644 regression tree, the prediction is made based on a series of threshold-based conditions on 645 the predictors. The prediction scheme is initiated at the top of the tree (in the example 646 shown in Figure A1a, the question at the top split is whether the mean elevation is greater 647 than 1151m). The prediction is then refined using other thresholds on other (and 648 sometimes the same) predictors at lower levels of the tree. The influence of each 649 predictor on the response variable can be estimated based on its position in the regression 650 tree: predictors appearing higher in the tree have a higher separating/predictive power 651 ( Figure A1a indicates that mean elevation is a strong predictor of the base flow index, 652 likely because it conditions the formation a snow pack, which will increase the baseflow 653 index when it melts). Note that regression trees are typically not symmetrical (different 654 variables are used in different parts of the tree). 655 656 Regression trees are grown following a "recursive binary splitting" approach. The 657 procedure starts at the top of the tree and at each split, one variable and one threshold are 658 selected in order to minimize the mean squared error (MSE) of the prediction. The 659 prediction is the mean value of the predictor for all the elements (catchments) falling in 660 each class. As a consequence, the predictions of a decision tree are discrete values (one 661 per terminal node, such as 0.4801 for the left-most terminal node of the tree shown in 662 Figure A1a, which leads to the horizontally aligned back points in Figure A1b). Trees are 663 grown and then pruned by minimizing the cross-validated MSE in order to reduce the risk 664 of overfitting. While regression trees are intuitive to interpret and can deal with non-665 linear relationships between variables, they typically lack robustness. We found that 666 regression trees produced by randomly excluding half of the catchments to be quite 667 different in the predictive variables they selected and in the position of these variables in 668 the tree. 669 670 To overcome this limitation, we used random forests instead of single regression trees. 671 Random forests are an ensemble of regression trees (here we used 500 trees per forest). 672 The robustness of the forest comes from the way each tree is grown. At each split, a 673 subsample of predictors is randomly excluded and the prediction must be done using 674 solely the remaining. This implies that strong predictors, which otherwise might have 675 been used for this specific split, will be excluded. This introduces differences between the 676 trees, making the prediction more robust than if all the trees were similar. The number of 677 trees N and the number of predictors P excluded at each split are variables defined by the 678 user. We found that variations around the default value for P (a third of the total number 679 of predictors) has little influence on our predictions, and that N = 500 is adequate because 680 it leads to better predictions than small forests, but more trees did not improve the 681 predictions. 682 683 Since it is not practical to inspect each tree to determine which variables are used for the 684 prediction, the relative influence of the predictors of a random forest is measured in an 685 automated way. Once the forest has been grown, each predictor is considered individually 686 and its values are shuffled (their statistical distribution remains the same but their order is 687 now random). The relative drop in prediction accuracy (expressed in %) indicates how 688 influential this predictor is (large increases in MSE indicate influential predictors). Figure  689 A1c shows that for the prediction of the baseflow by a random forest, the fraction of 690 precipitation falling as snow is the most influential predictor. 691 An advantage of growing a random forest is that the ensemble of trees can be used to 692 characterize the uncertainty in the prediction. We used QQ plots to assess the reliability 693 of the ensembles and found that for all the hydrological signatures except the fraction of 694 no flow, the ensembles are remarkably reliable ( Figure A1d). Although this is not a 695 feature we use in this study, we consider important to stress this finding, as it can be 696 relevant in other contexts, for instance for parameter estimation based on regionalized 697 hydrological signatures. Finally, note that because the deterministic prediction of each 698 random forest is the mean prediction of its regression trees, the predictions are continuous 699 values. This reduces the granularity of the predictions when compared to regression trees, 700 which only predict a limited number of discrete values ( Figure A1b). 701 702 Appendix 2: Moran's I as a measure of spatial smoothness 703 When a variable is plotted on a map for numerous catchments, spatial patterns can appear 705 and help with the formulation of starting hydrological hypotheses. A fundamental 706 advantage of large-sample hydrology over small-sample hydrology is that, when maps 707 are produced using hundreds of catchments, those insights are likely to be clearer than if 708 the maps were based on a handful of catchments, because those tend to be patchier. 709 710 In this study, we explore and quantify regional variability in hydrological signatures 711 using a measure of spatial smoothness. Addor et al. (2017b) observed that maps of 712 climate indices generally exhibit smoother patterns than maps of hydrological signatures, 713 whose patterns tend to be noisier (with potentially strong differences between adjacent 714 catchments). Similar differences in spatial variability can also be observed among 715 hydrological signatures: some signatures vary gradually across the landscape, while 716 others exhibit abrupt changes over short distances. This is already apparent in earlier 717 studies. Figure 2