What is the hydrologically effective size of a catchment?

Linking human activities and climate change with their consequences for water availability is a prerequisite for sustainable water management, which is traditionally performed at the scale of topographically delineated catchments. However, inter-catchment groundwater flow results in effective catchment sizes that differ from those suggested by topography. Here, we introduce the notion of effective catchment size and analyze the prevalence of substantial differences between topographic and effective catchment sizes using a large global dataset. We find that 1 in 3 catchments exhibit an effective catchment size that is either more than twice or less than half its topographic size. These catchments will likely be affected by management activities such as groundwater pumping or land use change outside their topographic boundaries, or they will affect water resources beyond their topographic boundaries. The observed differences are strongly linked to aridity, slope, distance to coast, and topographic size. We show that our findings also hold for other large catchment databases, e.g., GRDC, CAMELS-US, and MOPEX. Our study provides a first-order identification of catchments where additional in-depth analysis of subsurface connectivity is needed to achieve sustainable water management. This supplement describes the global distributions of ECI, the CART (Classification And Regression Tree) analysis used to indicate the importance of different predictors (catchment attributes), and an analysis of the sensitivity of our results to the minimum number of ECI combinations obtained from the different precipitation and actual evapotranspiration datasets.


Introduction
Sustainable water resources management requires a robust understanding of the link between management activities and their consequences for freshwater availability. Traditional water management has focused on what is sometimes called blue water, i.e. water available for development and supply such as pumping (groundwater) or the creation of artificial reservoirs (mainly through damming of rivers). This view has been expanded to additionally consider activities that influence green water (soil moisture, Falkenmark and Rockström 2006), which includes the impact of land use that partially controls the amount of soil moisture removed from the terrestrial water cycle through evapotranspiration (Gordon et al 2005, Schewe et al 2019). Traditionally, blue water and green water availability is quantified from climate and discharge observations at the outlet of topographically delineated catchments, assuming a closed water balance. For that reason, topographic catchments are regarded as the basic unit for water management activities, and they represent the most common scale for hydrologic research and practice (Wagener et al 2007). Consequently, a growing number of hydro-meteorological catchment-scale datasets have been developed to enable hydrologic analysis and advance understanding (Addor et al 2017, Duan et al 2006). However, regional groundwater flow systems can remove water from or add water to topographic catchments (Tóth 1962), resulting in a violation of the assumption of a closed water balance. Such groundwater connectivity means that some catchments collect water from areas beyond topographic boundaries, while others are effectively much smaller than their topography suggests. Evidence for this problem so far comes only from studies of individual catchments or from those performed in geographically focused regions (Fan and Schaller 2009), while a wider discussion of the consequences for water management activities is missing.
In hydrologic research and practice, most modeling strategies currently ignore groundwater connectivity, assuming that catchments are not connected to their surroundings (Bouaziz et al 2018). So far, no global analysis of effective catchment areas has been performed, and the conditions that control the extent of these deviations are still largely unexplored.
In this study, we introduce the notion of effective catchment area and define a novel metric, the Effective Catchment Index (ECI), to detect and quantify differences between topographic and effective catchment sizes. We determine ECI for a global set of catchments derived from various publicly available discharge datasets and use a Classification And Regression Tree (CART) analysis to link it to physiographic characteristics of the catchments and to identify the most important factors controlling its strength and direction.

The Effective Catchment Index (ECI)
We define a new metric, the Effective Catchment Index (ECI), to describe the deviation of the effective catchment size from the topographic catchment area. ECI enables us to estimate the effective size of a catchment in contrast to its topographic size and to quantify the inter-catchment groundwater flow. It is calculated as follows:

$$\mathrm{ECI} = \log\left(\frac{Q_{obs}}{P - AET}\right) \qquad (1)$$

Positive ECI values indicate catchments whose effective size is larger than their topographic size, while negative ECI values indicate catchments whose effective size is smaller than their topographic size.
Assuming a negligible influence of water storage on the total water balance on a long-term basis, the observed mean discharge, $Q_{obs}$, at the catchment outlet represents the response of the effective catchment ($A_{eff}$ [km$^2$]), while the difference between catchment average precipitation (P) and actual evapotranspiration (AET), P-AET, represents the response of the topographic catchment ($A_{topo}$ [km$^2$]). Therefore, the ratio of the effective over the topographic catchment area ($A_{eff}/A_{topo}$) can be calculated as follows:

$$\frac{A_{eff}}{A_{topo}} = \frac{Q_{obs}}{P - AET} \qquad (2)$$

Combining equations (1) and (2) yields:

$$\mathrm{ECI} = \log\left(\frac{A_{eff}}{A_{topo}}\right) \qquad (3)$$
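As a minimal sketch of the calculation described above, the following Python functions compute ECI and the implied effective area from long-term means. It assumes that ECI is the base-10 logarithm of the ratio of observed discharge to P-AET, which is consistent with the sign convention used in the text (positive for gaining, negative for losing catchments) but is an assumption here. The function names and the handling of non-positive values are illustrative and not taken from the study.

```python
import math

def eci(q_obs, p, aet):
    """ECI from long-term means, all in the same units (e.g. mm/yr).

    Assumption: base-10 logarithm of Q_obs / (P - AET).
    Returns None where the log ratio is undefined.
    """
    net = p - aet
    if q_obs <= 0 or net <= 0:
        return None
    return math.log10(q_obs / net)

def effective_area(a_topo_km2, q_obs, p, aet):
    """Effective catchment area implied by A_eff / A_topo = Q_obs / (P - AET)."""
    return a_topo_km2 * q_obs / (p - aet)

# A catchment discharging twice what P - AET suggests:
print(eci(800.0, 1000.0, 600.0))                    # positive ECI (gaining)
print(effective_area(100.0, 800.0, 1000.0, 600.0))  # effective area in km^2
```

With these illustrative numbers, the discharge is twice P-AET, so the effective area is twice the topographic area and ECI is positive.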

Data
The ECI is calculated using observed discharge ($Q_{obs}$), and independent precipitation (P) and actual evapotranspiration (AET) data. Based on the best availability of discharge observations, we chose the 10-year time period 2000-2009 to obtain long-term averages for ECI computation. In our study, discharge observations are obtained from several sources (table 1) described in detail by Beck et al (2019a). We further selected AET and precipitation datasets according to the following quality criteria: 1) temporal coverage: the P and AET datasets should cover the time period 2000-2009, since the analysis is performed for this period; 2) spatial coverage: the longitude should range from -180° to 180° and the latitude from -60° to 80°, which is set based on the locations of our catchments; 3) spatial resolution: the resolution should not be coarser than around 0.25°×0.25°. Finally, three independent precipitation (P) and three actual evapotranspiration (AET) products were used to perform the analysis (table 1). To minimize the influence of irrigation on the calculation of ECI, we only consider catchments from the discharge observation dataset whose irrigated area is <5% of the total catchment area (table 1). By doing so we exclude the pumping effect to a large extent, even though pumping for purposes other than irrigation might still occur and is not captured by our analysis. These considerations result in a final set of 8,701 catchments for the analysis, all of which are consistent with our a priori data requirements.

Uncertainty analysis
The uncertainty in the ECI calculation originates from uncertainties in the discharge, precipitation, and actual evapotranspiration datasets. Compared to the precipitation and AET datasets, the discharge observed at local gauging stations is assumed to contain much smaller uncertainty (Khan et al 2018, Biemans et al 2009, Sauer and Meyer 1992). We consequently focus on the uncertainties introduced through precipitation and AET by analyzing the range of ECI estimates obtained from the nine possible combinations of the three independent AET and three independent precipitation datasets presented in table 1. We define "conclusive" catchments as those for which (i) at least seven out of the nine possible ECI estimates can be obtained (given that some datasets do not cover all catchments) and (ii) all ECI estimates indicate a consistent deviation of the effective catchment size from the topographic size (the effective catchment sizes are either all smaller or all larger than the topographic catchment area). We use the mean of the conclusive ECI estimates for the subsequent analysis. To avoid conclusions that depend on the subjective choice of the threshold for conclusive catchments, we repeat our analysis requiring that at least eight, and then all nine, ECIs show the same sign.
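The selection of conclusive catchments can be sketched as follows. This is one reading of the definition above, not the study's code: it assumes a base-10 log ratio for ECI, represents a dataset that does not cover a catchment by `None`, and skips combinations where P-AET is not positive. All function and parameter names are hypothetical.

```python
import math
from itertools import product

def eci_estimates(q_obs, p_values, aet_values):
    """ECI for every available P x AET combination (up to nine).

    None marks a dataset that does not cover the catchment (assumption).
    Combinations with P - AET <= 0 are skipped because the log is undefined.
    """
    estimates = []
    for p, aet in product(p_values, aet_values):
        if p is None or aet is None:
            continue
        net = p - aet
        if q_obs > 0 and net > 0:
            estimates.append(math.log10(q_obs / net))
    return estimates

def conclusive_mean(estimates, min_estimates=7):
    """Mean ECI if the catchment is conclusive, otherwise None.

    Conclusive: at least `min_estimates` estimates exist and all share one sign.
    """
    if len(estimates) < min_estimates:
        return None
    all_positive = all(e > 0 for e in estimates)
    all_negative = all(e < 0 for e in estimates)
    if not (all_positive or all_negative):
        return None
    return sum(estimates) / len(estimates)

est = eci_estimates(800.0, [1000.0, 1100.0, 1050.0], [600.0, 620.0, 610.0])
print(len(est), conclusive_mean(est))
```

Setting `min_estimates` to 8 or 9 corresponds to the stricter thresholds used in the sensitivity analysis.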

Identification of controlling factors
We use a Classification And Regression Tree (CART) analysis to identify the most relevant controlling factors of ECI and to quantify their importance, and we apply a 10-fold cross-validation to the regression tree. CART is a multivariate statistical analysis that can be easily applied to identify the most "important" factors for organizing a dataset, e.g., for catchment classification with multiple hydrologic indices (Sawicz et al 2014). Building on the idea of hydrologic landscapes with respect to dominant controls on the water balance (Winter 2001), we use physiographic properties that characterize a catchment's climate, topography, and geology: aridity index (AI), distance to the coast (D), topographic catchment area (A), mean slope (S), permeability (µ), and mean elevation (H). In addition, the fraction of lakes and reservoirs ($f_w$) is considered. We identify the most important factors using the "predictorImportance" function in Matlab©, which computes estimates of predictor importance for trees by averaging changes in the risk considering tree splits and tree branch nodes. To avoid false conclusions due to the pre-selection of conclusive catchments (see section 2.3), we applied the CART analysis to the 2,760 conclusive catchments as well as to all 8,701 catchments. We then explore the relationship between ECI and the four most important controlling factors with a bivariate correlation analysis.
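To illustrate the idea behind split-based predictor importance, the sketch below ranks predictors by how much the best single split on each one reduces the node risk (here the sum of squared errors). It is a deliberately simplified, single-level stand-in and not Matlab's "predictorImportance", which aggregates risk changes over all splits of a full tree. The toy values and all function names are hypothetical.

```python
def sse(values):
    """Node risk for regression: sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Best threshold on one predictor; returns (risk_reduction, threshold)."""
    parent_risk = sse(y)
    best = (0.0, None)
    levels = sorted(set(x))
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        gain = parent_risk - sse(left) - sse(right)
        if gain > best[0]:
            best = (gain, threshold)
    return best

def rank_predictors(predictors, y):
    """Order predictor names by the risk reduction of their best split."""
    gains = {name: best_split(x, y)[0] for name, x in predictors.items()}
    return sorted(gains, key=gains.get, reverse=True)

# Toy data (made up for illustration): ECI falls as aridity rises.
eci_values = [0.2, 0.25, 0.1, -0.3, -0.4, -0.5]
predictors = {
    "aridity": [0.5, 0.8, 1.5, 3.0, 4.0, 6.0],
    "elevation": [100, 900, 300, 800, 200, 700],
}
print(rank_predictors(predictors, eci_values))
```

In this toy example the aridity split separates positive from negative ECI values, so aridity ranks first. A full analysis would grow the whole tree and cross-validate it, as described in the text.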

Global distribution of ECI
Using a global dataset of 8,701 catchments (Beck et al 2019a) and the newly defined Effective Catchment Index (ECI), we provide the first quantitative evidence that effective catchment sizes can differ significantly from topographic sizes. Through the analysis of the 2,760 conclusive catchments (figure 1), we find that substantial deviations between topographic and effective catchment sizes (the effective size is either larger than twice or smaller than half the topographic catchment area) are abundant across the entire globe (circles in dark red and dark blue in figure 1). 36% of the conclusive catchments show this substantial deviation. Consequently, blue water availability in these catchments will likely be affected by management activities outside their topographic boundaries, or activities within their topographic boundaries will affect water resources elsewhere.
In North America, our analysis indicates that 2,117 of 6,172 catchments are conclusive. Among those, 691 catchments show substantial deviations between effective and topographic catchment sizes (the effective size is either more than twice or less than half the topographic size). In the US in particular, our analysis indicates that catchments located in the Upper Mississippi River Basin gain water from neighbouring areas, resulting in larger effective catchment areas (positive ECI), while catchments in the Colorado River Basin show smaller effective catchment sizes (negative ECI), indicating water-losing conditions. These results agree well with a previous regional study on groundwater exporters and importers in the US (Fan and Schaller 2009). Repeating the analysis with a minimum of eight or all nine conclusive ECI estimates results in 30% (2,573) and 28% (2,442) conclusive catchments, respectively. Their global ECI distributions (figure S1 and figure S2) show similar patterns to our main analysis (figure 1).
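The twice/half criterion used above can be expressed directly on the ratio of effective to topographic area, which avoids any assumption about the logarithm base of ECI. The following is a small sketch with hypothetical function names and class labels, not the study's code.

```python
def deviation_class(area_ratio):
    """Classify A_eff / A_topo using the twice / half thresholds from the text."""
    if area_ratio > 2.0:
        return "substantially larger"
    if area_ratio < 0.5:
        return "substantially smaller"
    if area_ratio > 1.0:
        return "larger"
    if area_ratio < 1.0:
        return "smaller"
    return "no deviation"

def substantial_share(area_ratios):
    """Fraction of catchments whose effective size is > 2x or < 0.5x topographic."""
    flagged = [r for r in area_ratios if r > 2.0 or r < 0.5]
    return len(flagged) / len(area_ratios)

print(deviation_class(2.5))
print(substantial_share([2.5, 0.4, 1.2, 0.8]))

# North American counts quoted in the text: 691 of 2,117 conclusive catchments.
print(round(691 / 2117, 2))
```

The last line reproduces, from the counts given in the text, the roughly one-in-three share of conclusive North American catchments with substantial deviations.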

Figure 1
Global distribution of ECI. Colored circles represent the 2,760 conclusive catchments among the 8,701 catchments, while the grey squares indicate inconclusive catchments. The dark red and dark blue circles represent catchments whose effective area is either more than twice or less than half the topographic area. The light red and light blue circles indicate catchments with a smaller deviation between the topographic and effective catchment areas.

Factors controlling ECI
The CART analysis of ECI shows that the 10-fold cross-validation losses (misclassified data) of the 2,760 conclusive and the entire 8,701 catchments are 0.11 and 0.12, respectively, which indicates a good classification and regression. The analysis on the 2,760 conclusive catchments (figure 2) shows that aridity index, mean slope, distance to coast and topographic catchment area are located in the top four levels of the splits in the classification and regression tree, indicating their importance to the ECI. To avoid bias due to our definition of conclusive catchments, the same CART analysis was performed on the entire 8,701 catchments (figure S3). It identified the same four factors as the most important predictors.
Quantifying the importance estimates of all seven factors that represent a catchment's climate, topography, and geology (figure S4) confirms that aridity index is clearly the strongest controlling factor in both cases (the conclusive and the entire set of catchments). The ranking of the following two factors, distance to coast and mean slope, differs between the two cases. They are followed by topographic catchment area, permeability, fraction of lakes and reservoirs, and mean elevation. We therefore analyze the relationship between ECI and the top four controlling factors (see section 3.3), i.e. aridity index, distance to coast, mean slope, and topographic catchment area.
In general, catchments in the more arid regions present a higher possibility of producing negative ECI

Relationship between influencing factors and ECI
For the 2,760 conclusive catchments, 17% have an effective catchment size smaller than half their topographic size, while 19% have an effective catchment size larger than double that suggested by topography (figure 3a). Analyzing further the relationship between ECI and the four most influential factors identified via CART analysis for the 2,760 conclusive catchments, we find that arid catchments tend to lose more water, resulting in smaller effective catchment sizes (figure 3b), while the other controlling factors are more important in wetter regions. This is in line with the numerical model simulations for a drier climate by Fan and Schaller (2009), who showed that in arid regions the regional groundwater table often falls below stream beds and much of the recharge from local precipitation enters the regional groundwater flow system rather than the river. A decreasing trend in ECI is found with decreasing distance to the coast (figure 3c), suggesting that near-coast catchments tend to lose water through subsurface drainage to the sea. This is consistent with studies on submarine groundwater discharge of coastal catchments, which showed that the estimated total submarine groundwater discharge can be 80−160% of the amount of freshwater entering the Atlantic Ocean from rivers (Moore et al 2008). ECI variability also decreases with increasing slope (figure 3d) and catchment size (figure 3e), indicating a higher variability of effective catchment area in flatter regions and in headwater catchments.
Figure 3e also shows smaller deviations from the topographic catchments for large basins, which one would expect to be more consistent with the regional groundwater system. There is some statistical dependency between the four controlling factors (figure S5), which can be explained by physiography. For instance, aridity index is positively correlated with distance to the coast since inland regions are more likely to be arid than coastal regions (Trenberth et al 2011, Makarieva et al 2009). The relationships between ECI and the controlling factors do not change substantially when using at least eight or all nine ECI estimates to define conclusive catchments (figure S6 and figure S7).

ECI analysis of other catchment databases
We find that about 30%−40% of the catchments in the other catchment databases (GRDC, CAMELS-US, and MOPEX) are conclusive catchments with consistent deviations between effective and topographic catchment areas. We further find that 20.6%-58.1% of the conclusive catchments in these databases show a substantial deviation, such that the effective catchment size is either more than twice or less than half the topographic size. These results confirm our previous analysis with our global catchment dataset of 8,701 catchments and indicate that the results of previous large-scale studies may have been biased for a substantial fraction of their catchments.

Conclusions
Detecting and quantifying differences between effective catchments and topographic catchments using the Effective Catchment Index, our global study provides evidence that the assumption of a closed water balance of topographic catchments does not hold for a substantial fraction of catchments across the globe.
One in three catchments has an effective catchment size that is either more than twice or less than half its topographic catchment area. Consequently, these catchments will potentially be affected by management activities such as land use change outside their topographic boundaries, or activities within their topographic boundaries will affect blue water resources outside. Repeating this analysis with other catchment databases frequently used in previous large-scale studies (GRDC, CAMELS-US, and MOPEX) revealed consistent results. This strongly indicates that these studies may have drawn conclusions biased by a false assumption of a closed water balance for their topographic catchments. Our study shows that our newly defined Effective Catchment Index provides a simple and easily applied diagnostic tool to account for the deviations between effective and topographic catchments. Thus, it provides a first step towards better understanding the hydrologically effective size of catchments and a first-order insight into where additional in-depth analysis of subsurface connectivity across topographic boundaries is needed to support sustainable water management. Figure S3 shows the CART analysis of the entire 8,701 catchments, confirming that it is consistent, in terms of the controlling factors identified, with the same analysis of the 2,760 conclusive catchments.

ECI sensitivity to minimum number of conclusive catchments
In the main manuscript, we show results for the catchments with at least seven conclusive ECI estimates.
We also show our results for even stricter criteria for the ECI estimates (at least eight, and all nine conclusive ECI estimates). Figures S6 and S7 show results for the catchments with at least eight and all nine conclusive ECI estimates, respectively. The general trends and relationships of the ECI to aridity, distance to the coast, mean slope, and catchment size also hold for these stricter criteria. This indicates that the general relationship of ECI to aridity, distance to the coast, mean slope, and catchment size is robust for this large dataset.

Figure S9
Results for the catchments with at least eight conclusive ECI estimates: (a) distribution of the ratio of effective catchment area over topographic catchment area, and relationships between the ECI estimates and (b) aridity index, (c) distance to the coast, (d) mean slope, and (e) topographic catchment area.

Figure S10
Results for the catchments with all nine conclusive ECI estimates: (a) distribution of the ratio of effective catchment area over topographic catchment area, and relationships between the ECI estimates and (b) aridity index, (c) distance to the coast, (d) mean slope, and (e) topographic catchment area.
Our study demonstrates that the effective catchment size has significant impacts on catchment water balance. Additional research is needed, though. Beck et al (2019) recently showed that precipitation in mountain ranges may be underestimated when they analyzed three widely used global precipitation datasets. Their study, however, assumed that the deviation of effective catchment size from the topographic catchment size is insignificant everywhere and that precipitation bias is the only relevant factor. Overall, our newly defined metric, the ECI, is a useful indicator that can help to avoid biased