Rapid and Accurate Estimates of Streamflow Depletion Caused by Groundwater Pumping Using Analytical Depletion Functions

Reductions in streamflow due to groundwater pumping (“streamflow depletion”) can negatively impact water users and aquatic ecosystems but are challenging to estimate due to the time and expertise required to develop numerical models often used for water management. Here we develop analytical depletion functions, which are simpler approaches consisting of (i) stream proximity criteria, which determine the stream segments impacted by a well; (ii) a depletion apportionment equation, which distributes depletion among impacted stream segments; and (iii) an analytical model to estimate streamflow depletion in each segment.We evaluate 50 analytical depletion functions via comparison to an archetypal numerical model and find that analytical depletion functions predict streamflow depletion more accurately than analytical models alone. The choice of a depletion apportionment equation has the largest impact on analytical depletion function performance and equations that consider stream network geometry perform best. The best‐performing analytical depletion function combines stream proximity criteria which expand through time to account for the increasing size of the capture zone; a web squared depletion apportionment equation, which considers stream geometry; and the Hunt analytical model, which includes streambed resistance to flow. This analytical depletion function correctly identifies the stream segment most affected by a well >70% of the time with mean absolute error < 15% of predicted depletion and performs best for wells in relatively flat settings within ~3 km of streams. Our results indicate that analytical depletion functions may be useful water management decision support tools in locations where calibrated numerical models are not available. Plain Language Summary Groundwater pumping can reduce streamflow (“streamflow depletion”), but it is hard to determine which streams will be affected by a well and how much each stream will be depleted. In this study, we develop and test simple tools called analytical depletion functions that can be used to estimate streamflow depletion in real‐world settings where more complex models or field estimates are not available. We find that analytical depletion functions accurately predict which stream will be most affected by groundwater pumping for >70% of wells and accurately estimate the amount of depletion. Thus, we conclude that analytical depletion functions are a useful tool to provide rapid, screening‐level estimates of streamflow depletion for water managers in areas where more complex approaches are unavailable. By integrating analytical depletion functions into online decision support tools, it will be possible for nonexperts to quickly estimate reductions in streamflow caused by groundwater pumping from existing and/or proposed wells.


Introduction
Effective conjunctive management of surface water and groundwater requires information about the impacts of groundwater pumping on streamflow, which is often unknown. Groundwater pumping reduces streamflow ('streamflow depletion') by capturing groundwater which would have otherwise discharged into streams or, in extreme cases, inducing infiltration from the stream into the aquifer Bredehoeft et al., 1982). This can have negative consequences on surface water users and aquatic ecosystems, both of which depend on a stable contribution of groundwater to streamflow (Gleeson & Richter, 2017;Larsen & Woelfle-Erskine, 2018;Perkin et al., 2017;Rohde et al., 2017Rohde et al., , 2018. However, quantifying streamflow depletion is challenging due to the complexity of modeling groundwater-surface water interactions (Barlow & Leake, 2012). To guide sustainable water management, it is critical to develop and test alternative approaches to estimate streamflow depletion that can allow local water managers to make informed decisions on groundwater withdrawals in a variety of settings .
Streamflow depletion can be modeled using numerical or analytical models (Table S1; Zipper et al., 2018a). Numerical models (e.g. MODFLOW) are process-based representations of the physics governing groundwater flow, and are therefore ideal for local water management applications including estimating streamflow depletion. However, the time, expertise, and financial costs associated with their development make them impractical for most areas around the world. Analytical models offer water managers a simpler approach to estimating streamflow depletion, but do not simulate most of the processes and real-world complexity included in numerical models. Due to the relative ease of implementing analytical models, they have been suggested as a path towards developing real-time, web-based conjunctive management decision support tools in locations where numerical models do not exist (Huggins et al., 2018). The only functional example the authors are aware of is the Michigan Water Withdrawal Assessment Tool (http://www.deq.state.mi.us/wwat), which integrates an analytical depletion function with a fish ecological model and is a required step to approve proposals that permit new or increased large quantity withdrawal from the surface or groundwater (Hamilton & Seelbach, 2011;Reeves et al., 2009Reeves et al., , 2010Zorn et al., 2012).
Given their practicality, numerous analytical models have been developed for calculating streamflow depletion for different hydrogeological conditions (reviewed in Huang et al., 2018 andHunt, 2014). Most analytical models assume either one or two linear streams with aquifers extending infinitely away from the stream, though some analytical models have been developed for more complex settings. For instance, Yeh et al. (2008) provide an analytical model for a well between two intersecting streams, and Singh (2009) for a well next to a stream with a right-angle bend. Despite these advances, there are still many real-world settings in which even the most complex analytical models cannot predict streamflow depletion, such as domains with > 2 streams or sinuous stream networks (Barlow & Leake, 2012).
In these complex real-world settings, recent work has suggested integrating analytical models with depletion apportionment equations, which are geometric methods used to distribute the impacts of pumping among the stream segments within the surrounding stream network (Reeves et al., 2009;Zipper et al., 2018a). To use depletion apportionment equations, it is also necessary to define the subset of streams within the domain which may be impacted by pumping using a stream proximity criteria. Here, we introduce the term analytical depletion function to refer to the combination of an analytical model with a depletion apportionment equation calculated for streams meeting a given stream proximity criteria.
For use in water management decision-making, it is necessary to rigorously evaluate the performance of analytical depletion functions. However, many aspects of analytical depletion functions remain untested, particularly their performance under transient conditions. During the development of the Michigan Tool, Reeves et al. (2009) found that the best match to numerical model results was the Hunt (1999) analytical model with an inverse distance-based depletion apportionment equation using adjacent catchments as the stream proximity criteria. However, this comparison was only conducted for a single timestep (after 5 years of pumping) and a single stream in Michigan. Subsequently, Zipper et al. (2018a) tested 5 depletion apportionment equations across a range of stream network geometries in British Columbia and found that the web squared depletion apportionment equation, which considers stream network geometry, best matched numerical model results across several stream network and aquifer configurations. However, this comparison was under steady-state conditions and therefore did not consider different analytical models or timesteps.
To date, there has been no systematic evaluation of the performance of analytical depletion functions considering the constituent elements (analytical model, depletion apportionment equation, and stream proximity criteria), nor any studies which assess the performance of analytical depletion functions under transient conditions. To address this knowledge gap and advance understanding of the utility of analytical depletion functions as a potential decision support tool, we investigate the questions: (1) How does the choice of analytical model, depletion apportionment equation, and stream proximity criteria affect the performance of analytical depletion functions?
(2) How does the performance of analytical depletion functions change through time under transient conditions?
(3) What are the primary landscape attributes associated with errors in analytical depletion functions?

Methods
We answer these questions by creating and testing 50 analytical depletion functions using a combination of two analytical models, five depletion apportionment equations, and five stream proximity criteria (Section 2.1). We evaluate these models via comparison with an archetypal numerical model, which is a simplified numerical model intended to capture key processes of interest (Table S1; Zipper et al., 2018a), based on the Navarro River Watershed, California (Section 2.2). This comparison is intended to provide generalizable and broadly relevant insights into the performance of simplified analytical depletion functions via comparison to a processbased numerical model, rather than a specific assessment of streamflow depletion in the Navarro River Watershed.

Analytical depletion functions
An analytical depletion function consists of the three constituent elements: stream proximity criteria (Section 2.1.1), a depletion apportionment equation (Section 2.1.2), and an analytical streamflow depletion model (Section 2.1.3). The combination of these constituent elements is demonstrated in Figure 1. First, the stream proximity criteria are used to identify all stream segments and/or portions of stream segments that may be affected by a given well. Second, the depletion apportionment equation is used to estimate the fraction of total depletion (fi) which is apportioned to each stream segment, i, meeting the stream proximity criteria. Third, the analytical model is used to estimate the volumetric streamflow depletion (Qsi) for each stream segment meeting the stream proximity criteria. Finally, for each stream segment the estimated depletion is calculated as the product of the fraction of total depletion estimated using the depletion apportionment equations and the streamflow depletion rate estimated using the analytical model (Equation 1): In this study, we are representing streamflow depletion as depletion potential (Qd,i), which is the volumetric streamflow depletion in a stream segment normalized by the pumping rate, Qw Fienen et al., 2018). Depletion potential is typically between 0% (streamaquifer flux is unaffected by pumping) and 100% (the change in stream-aquifer flux is equal to the pumping rate).

Stream proximity criteria
Stream proximity criteria define the stream segments which could potentially be depleted by a given well, and to our knowledge have never been explicitly defined or tested in previous work. In this study, we developed and evaluated five stream proximity criteria ( Figure 1): The whole domain stream proximity criteria are the most permissive and use all stream segments within the domain.
The local area stream proximity criteria retain any stream segment within a specified distance of the well. This is based on the 'local area' concept for a well, which is the area in which the effects of pumping are likely to impact streams Fienen et al., 2016). We define our local area by calculating double the maximum distance from any land point to the closest stream segment (=8057 m), which ensures that > 1 stream segments are potentially affected by each well and avoids instances of wells with no streams meeting the stream proximity criteria.
The adjacent stream proximity criteria, proposed by Reeves et al. (2009), retain all stream segments with catchments neighboring the well. To identify which stream segments are adjacent to the well, we use the stream segments with non-zero depletion fractions estimated by the Thiessen Polygon depletion apportionment equation (see Section 2.1.2).
The expanding stream proximity criteria, introduced in this study, use the analytical model (Section 2.1.3) to determine the maximum distance from a well with a depletion potential of at least 1% for the timestep of interest and retains all stream segments within this distance of the well. Unlike the whole domain, local area, and adjacent criteria (which are static through time), the expanding criteria retain more stream segments at later timesteps. We also evaluated the sensitivity of these criteria to the 1% threshold used for this criterion (Section 2.1.4).
The adjacent + expanding stream proximity criteria combines the adjacent and expanding criteria by considering any stream segment that is either in a catchment neighboring the well (adjacent), or within the maximum distance with a depletion potential > 1% at a given timestep (expanding). Thus, when the expanding radius is very small (e.g. shortly after the start of pumping), the results are identical to the adjacent method, and when the expanding radius is very large the results are identical to the expanding method.

Depletion apportionment equations
We evaluated the same five depletion apportionment equations Zipper et al. (2018a) compared under steady-state conditions (Figure 1). We briefly described them here: The web (Equation 2) and web squared (Equation 3) depletion apportionment equations divide each stream segment into a series of equally spaced (e.g., 5 m) points and then weight depletion based on the inverse distance and square of the inverse distance to each point, respectively. For each stream segment, i, the fraction of total depletion apportioned to that segment, fi, is calculated as: where d is the horizontal distance to each stream segment from the well, P is the total number of points into which a stream segment is subdivided, and n is the number of stream segments meeting the stream proximity criteria. By dividing each stream segment into a series of points, the web and web squared methods apportion depletion based on the length of each stream segment which is consistent with analytical model theory developed for streams of finite length (Kollet et al., 2002).
The inverse distance (Equation 4) and inverse distance squared (Equation 5) methods are similar to web and web squared, respectively, but only consider the point on each stream segment closest to the well of interest, where d is the horizontal distance to each stream segment from the well and n is the number of stream segments meeting the stream proximity criteria.
The Thiessen polygon (Equation 6) is an area-based, rather than distance-based, depletion apportionment equation. It uses two overlapping sets of polygons: one of which is defined using the point on each stream segment closest to the well, and the other using the point on each stream segment closest to the well and the location of the well. For each stream segment, fi is calculated as: where ai is the area of the polygon for stream segment i from the first set and aw is the area of the polygon from the second set which contains the well. Because this is an area-based method, the depletion apportionment results from this method are the same for all stream proximity criteria.
Only two studies we are aware of compared the performance of depletion apportionment equations. Reeves et al. (2009) compared 9 different methods with results from the Kalamazoo Valley (Michigan) numerical groundwater model and found that the inverse distance approach performed the best, which was then implemented in the Michigan Water Withdrawal Assessment Tool. Zipper et al. (2018a) compared these same 5 depletion apportionment equations among several different drainage densities, topographic conditions, and recharge rates around Nanaimo, British Columbia (Canada), and found that the web squared method best matched results from a numerical model under steady-state conditions.

Analytical streamflow depletion models
Of the dozens of existing analytical streamflow depletion models (reviewed in detail by Huang et al., 2018), we selected two for comparison. The first, referred to as the Glover model, is described in Glover & Balmer (1954). The second, referred to as the Hunt model, is described in Hunt (1999). Following Huggins et al. (2018), these two models were selected due to their widespread application, simplicity of implementation, and contrasting representation of surface water features as described below. Like most analytical models, both the Glover and Hunt models adopt the assumptions of a single linear stream perpendicular to a homogeneous, isotropic aquifer of infinite horizontal extent with no vertical groundwater flow, among others.
The Glover method assumes that streams fully penetrate the aquifer and that there is no resistance to flow through the streambed. Based on these assumptions, the Glover model defines depletion potential, Qf, in a stream segment for a given pumping well as: where S is the aquifer storage coefficient (specific yield in an unconfined aquifer), T is the aquifer transmissivity, t is the time since the start of pumping, and d, Qs, and Qw are as defined above.
In contrast, the Hunt method assumes that streams partially penetrate the aquifer and that there is a streambed clogging layer of a finite thickness (br) and hydraulic conductivity (Kr). Based on these assumptions, the Hunt method defines depletion potential for a given pumping well as: where λ is the streambed conductance. The streambed conductance is defined as: where w is the width of the river. Since data on Kr and br are rarely available, they are often estimated based on other quantities. Following Reeves et al. (2009), in this study we estimated Kr as 10% of the aquifer's horizontal hydraulic conductivity; and br as either the vertical distance from the top of a well's screened interval, or the total length of the well screen, whichever was greater.
The Glover model will always estimate greater depletion than the Hunt model, though when the streambed conductance term is large (high w*Kr or small br) the two models provide similar output. For both the Glover and Hunt models, we also used the principle of superposition (Jenkins, 1968) to model Qf under intermittent pumping schedules (Section 2.2.3).

Fine-tuning analytical depletion functions
After identifying the combination of stream proximity criteria, depletion apportionment equation, and analytical model (Section 3.1), we conducted an additional sensitivity analysis of two parameters: the percent threshold used to define the limit of the adjacent + expanding stream proximity criteria, and the exponent used in the web and web squared depletion apportionment equations. For the percent threshold, we varied over three orders of magnitude: 0.01%, 0.1%, and 1%. Smaller thresholds would correspond to a larger domain used in the expanding portion of the stream proximity criteria. For the exponent, we varied the exponent from one to three at intervals of 0.5; an exponent of 1 is equal to the web depletion apportionment equation (Eq. 2) and an exponent of 2 is equal to the web squared equation (Eq. 3).

Evaluating analytical depletion functions
To evaluate the performance of the analytical depletion functions, we compared them to model output from an archetypal numerical MODFLOW model (Section 2.2.2) based on the Navarro River (California) developed for this study. Archetypal numerical models are simplified representations of real-world environments intended to isolate specific processes of interest (Table S1; Zipper et al., 2018a) and have many advantages over calibrated, site-specific models for the evaluation of analytical depletion functions. Most importantly, archetypal numerical models eliminate site-specific complexity unrelated to our research questions in order to develop generalizable understanding of the importance of process-based representations of streamflow depletion via a comparison between the two modeling approaches. Given these advantages, archetypal models have a long history of use in hydrogeology (Bredehoeft & Kendy, 2008;Kendy & Bredehoeft, 2006;Lamontagne-Hallé et al., 2018;Zipper et al., 2017Zipper et al., , 2018b).

Study domain
Our analytical depletion functions and archetypal numerical model were based on the Navarro River Watershed, an 816 km 2 watershed in Mendocino County, California, USA ( Figure  2; HUC1801010804 in the US National Hydrography Dataset). Streamflow in the Navarro River is highly seasonal, with high flows during the winter rainy season (December-April) and flow sustained primarily by baseflow during the summer dry season ( Figure S1; Figure S2). While the Navarro River formerly contained excellent anadromous fish habitat, increases in stream temperature and sedimentation in recent years have contributed to a decline in salmonid populations and subsequent classification by the US Environmental Protection Agency as a "water quality limited water body" (North Coast Regional Water Quality Control Board, 2005).
Human land use in the Navarro River Watershed includes agriculture, timberland, rangeland, and rural residential. The footprint of irrigated agriculture has expanded notably over the past 50 years, with the largest water users being vineyards (McGourty et al., 2013). Additionally, the Navarro River Watershed is part of the 'Emerald Triangle' region of California (Humboldt, Mendocino, and Trinity Counties) which is home to widespread cultivation of cannabis, and surface water and groundwater use associated with cannabis cultivation is an emerging management concern (Carah et al., 2015). For illustrative purposes, in this paper we test analytical depletion functions using hypothetical groundwater wells pumping at typical rates for cannabis cultivation (Section 2.2.3).

Numerical model
The basis of our archetypal numerical model was the Navarro River Watershed ( Figure  2), including all adjacent HUC12 watersheds. We used the FloPy package for Python (Bakker et al., 2016(Bakker et al., , 2018 to build a simplified model of the domain using MODFLOW-NWT finitedifference groundwater flow program ( Figure 2b; Niswonger et al., 2011). We simulated a homogeneous subsurface to reduce domain complexity based on our archetypal modeling approach (Table S1). The archetypal numerical modelling approach is appropriate for our research questions as the focus of the present study was the comparison of the numerical and analytical approaches for a generalized assessment of sensitivity analysis of analytical depletion functions under transient conditions, rather than a site-specific assessment of the Navarro River Watershed. Therefore, our model does not represent site-specific features such as potential subsurface heterogeneity, spatial variability in recharge, or existing groundwater/surface water withdrawals. Future work will investigate the impacts of subsurface heterogeneity on the performance of analytical depletion functions via a comparison with a site-specific model.
The domain was discretized at 100 m x 100 m lateral resolution with 5 vertical layers for a total of 1,112,555 active grid cells. Vertically, the top of the model domain was set to the land surface elevation at the center of each grid cell from the National Elevation Dataset (Figure 2a). The top 4 model layers had a thickness of 20 m each, and the bottom layer had a variable thickness with a constant bottom elevation of 100 m below sea level. All model layers were unconfined and capable of drying and re-wetting as necessary. The subsurface was defined as homogeneous with a horizontal saturated hydraulic conductivity of 10 -5 m s -1 , vertical hydraulic conductivity of 10 -6 m s -1 , and specific yield of 0.1.
Surface water boundary conditions were defined at all cells including second-order or higher streams in the US National Hydrography dataset ( Figure 2). The stream network within the model domain is divided into 485 segments, of which 175 are part of the Navarro River Watershed and 310 are in the surrounding adjacent watersheds (Figure 2). We used the river package (RIV; Harbaugh et al., 2000) to represent surface water features. For comparison, we also built an archetypal numerical MODFLOW model representing stream features with the surface-water routing package (SFR2; Niswonger & Prudic, 2005) which routes flow through a network of stream channels and allows for overland flow input to the streams; these results are presented in the supplemental information. The ocean along the west edge of the domain was a specified constant head boundary (CHB) at an elevation of 0 m.
Groundwater recharge for the domain was spatially uniform and prescribed as 150 mm yr -1 , which is equal to the long-term annual average baseflow ( Figure S1). We divided recharge evenly (30 mm mo -1 ) among the 5 months constituting the rainy season (December-April; Figure  S2), and set recharge to 0 mm mo -1 during the rest of each year. For the SFR2 package, which requires an overland flow input, we calculated the mean monthly difference between total streamflow and baseflow ( Figure S1) which was applied to each stream segment based on the direct drainage area.
To test the impacts of groundwater pumping on streamflow in a systematic manner, we created a grid of 113 synthetic pumping wells within the Navarro River Watershed which were simulated using the Multi-Node Well package (MNW2; Konikow et al., 2009). Well screens started at the water table elevation from a steady-state simulation and extended 50 m below this. Thus, wells could either be fully contained in one model layer or span up to 3 model layers. These synthetic pumping wells were created by making a grid of pumping wells with 1000 m spacing (10 grid cells in x-and y-dimensions), excluding any pumping wells that were placed in grid cells containing stream features, and selecting every 7 th well for simulations as a compromise between a large number of wells and minimizing computational time (Figure 2b).

Pumping scenarios
We conducted two transient pumping experiments using these wells to test the performance of the analytical models with depletion apportionment equations: (1) transient continuous; and (2) transient intermittent. In each experiment, we turned wells on one-at-a-time at a rate of 2.63 x 10 -5 m 3 s -1 (600 gallons day -1 ) and compared to a baseline simulation with no pumping at any well. As noted above (Section 2.2.1), cannabis cultivation is a concern in the Navarro River Watershed; this pumping rate corresponds to estimated water use for an outdoor cannabis plantation with 100 plants . This is larger than the average number of plants at a typical outdoor grow site in the region (n=45), but well below the maximum observed 757 plants (Butsic & Brenner, 2016). While cannabis water needs were used to define our pumping rate, our results and analysis focused on depletion potential (Section 2.1) and are therefore broadly applicable to groundwater withdrawals for any purpose.
Both transient experiments were 10 years in length. For the transient continuous experiment, we began pumping in May (the beginning of the dry season; Figure S2) and pumped at a constant rate for 116 months until the end of the 10-year simulation. We compared results from this experiment to the Glover and Hunt analytical models combined with each of the depletion apportionment equations (Section 2.2). For the transient intermittent experiment, we turned pumps on during the typical irrigation season of June-October . Pumps were turned off between pumping periods.
We used a multi-stage spin-up to ensure the groundwater and surface water components of our models had reached a dynamic equilibrium prior to beginning our pumping experiments (Somers et al., 2018;Zipper et al., 2018b). First, we ran a steady-state simulation with no pumping and recharge rates defined as the long-term average annual baseflow (150 mm yr -1 ). Using these steady-state results as initial conditions, we then ran the model for a 30-year transient spin-up. To ensure the model reached a dynamic equilibrium, we calculated the annual range in river leakage and found <0.1% change between years by the end of the spin-up simulation for both RIV and SFR ( Figure S3).
The primary variable of interest for comparison with analytical depletion functions was depletion potential, or the change in the stream-aquifer flux following pumping normalized by the pumping rate (Section 2.1; Barlow et al., 2018). To calculate depletion potential from our MODFLOW model output, we calculated the change in net stream-aquifer flux into each stream segment in the Navarro River Watershed while pumping each well relative to a simulation with no pumping (Barlow & Leake, 2012). In rare cases, the depletion potential from the numerical model could be <0%, indicating an increase in stream-aquifer flux during pumping; this can occur when, for example, pumping causes the contributing area to a stream to shift due to a change in the water table elevation leading to changes in groundwatershed divides (Zipper et al., 2018a).

Model evaluation criteria
We identified four performance metrics intended to capture different aspects of analytical depletion function performance which were calculated at each output timestep: 1. Spatial distribution of primary impact, defined as accurate identification of the stream segment most affected by a well. We quantified this as the percentage of wells for which the stream segment with the greatest depletion potential value predicted by the analytical depletion function matched the stream segment with the greatest depletion potential predicted by the MODFLOW model.

2.
Magnitude of primary impact, defined as the accuracy of the predicted depletion potential in the most affected stream segment. We quantified this as the mean absolute error (MAE) between the depletion potential estimated by the analytical depletion function and the MODFLOW model in the most affected segment, normalized by the range in depletion potential among all wells. We normalized MAE to account for the fact that larger absolute errors are more common but less problematic at higher predictions of depletion potential (e.g. an error of 0.10 is less problematic for a predicted depletion potential of 0.80 than it is for 0.20).
3. Spatial distribution of overall impacts, defined as the accuracy of the predicted depletion potential across the entire domain. We quantified this as the Kling-Gupta Efficiency (KGE; Gupta et al., 2009) between the depletion potential estimated by the analytical depletion function and the depletion potential estimated by the MODFLOW model. The KGE is a hydrological fit metric related to the Nash-Sutcliffe Efficiency which integrates correlation, bias, and variability between the two methods, with 1.0 being a perfect fit and lower values indicating worse performance. KGE is a useful metric for assessing the performance of analytical depletion functions, because it can be further decomposed into the relative contributions of correlation, bias, and variability to overall error (Gudmundsson et al., 2012) which have different management implications (Zipper et al., 2018a).
4. Magnitude of overall impacts, defined as the accuracy of the predicted capture fraction, which is equal to the cumulative depletion potential summed across all stream segments from a given well at a given timestep . We quantified this as the MAE between the capture fraction estimated by the analytical depletion function and the capture fraction estimated by the MODFLOW model, normalized by the range in capture fraction among all wells.
We calculated each of these performance metrics for all analytical depletion functions and compared across different combinations of stream proximity criteria, depletion apportionment equations, and analytical models to determine the sensitivity of the analytical depletion function for each constituent component and identify which analytical depletion function had the best overall performance for our domain. To evaluate the factors influencing the performance, we also used statistical analyses to compare performance metrics for the best performing analytical depletion function to various landscape attributes and metrics describing well-stream geometry in order to identify the conditions under which analytical depletion functions were most accurate.

Sensitivity analysis of analytical depletion functions
There is a wide variety of performance across all analytical depletion functions ( Figure  S4; Figure S5). In the following sections, we explore variability in each of our performance metrics (described in Section 2.1) and as a function of each depletion apportionment equation (Section 3.1.1), stream proximity criteria (Section 3.1.2), and analytical model (Section 3.1.3). The average performance over the entire 10-year simulation for a subset of the analytical depletion functions is shown in Table 1.

Sensitivity to depletion apportionment equation
The relative performance of the depletion apportionment equations varies among the different performance metrics (Section 2.1). The web and web squared methods perform substantially better than the other methods at the spatial distribution of the primary impact, accurately identifying the most affected stream segment ~50-75% of the time (Figure 3a). Much of the inaccuracy, particularly in the first year after the start of pumping, is due to not identifying stream segments with minimal pumping impacts (e.g. depletion potential between 0.1% and 5%). When only wells with a depletion potential of at least 5% in the most affected segment are considered, the accuracy of all depletion apportionment approaches improves notably, with >80% accuracy in the first several years before asymptoting at ~75% in the continuous pumping experiment and ~90% in the intermittent pumping experiment.
Similarly, the web and web squared approaches are also the best at estimating the magnitude of impacts in the most affected segment (Figure 3b), with normalized MAE consistently lower than the other depletion apportionment equations. Among the various depletion apportionment equations, normalized MAE is ~0.05-0.15 throughout the continuous pumping experiment (meaning ~5-15% of the observed range in depletion potential). There is a noted seasonal pattern in performance in the intermittent pumping experiment, with normalized MAE of ~0.05-0.10 during the pumping period and normalized MAE of ~0.15-0.20 when the wells are shut off. This variability in seasonal performance is driven primarily by changes in the observed range of depletion potentials between the two seasons, with a larger range when pumps are turned on during the summer.
The spatial distribution of overall impacts, as quantified using the KGE across all stream segments (Figure 3c), indicate a decay in performance through time for all methods. Early in the simulations, shortly after the start of pumping, KGE is relatively high since depletion is primarily concentrated in the segments closest to the well. As time goes on and impacts become more diffuse, the overall performance decreases to different degrees among the different methods. After ~1.5 years, the web and inverse distance approaches plateau at a KGE of ~0, while the web squared and inverse distance squared approaches plateau at a KGE of ~ -0.5. Unlike the distancebased approaches, the performance of the area-based Thiessen Polygon method continues to degrade through time.
The magnitude of overall impacts shows fairly consistent patterns across all depletion apportionment equations (Figure 3d). Normalized MAE of predicted capture fraction increases through time, from ~0.10 at the start of the continuous pumping experiment to ~0.50 for the worst-performing metrics by the end. The various depletion apportionment equations diverge after approximately 3 years and differences between equations increases through time. Normalized MAE of capture fraction in the intermittent pumping experiment has a similar upwards trajectory to the continuous pumping experiment, and a strong seasonal pattern as observed in the normalized MAE of the most affected segment (Figure 3b). However, in contrast to the spatial distribution of overall impacts (Figure 3c), the web and inverse distance approaches are the worst performing depletion apportionment equations, with the Thiessen Polygon, web squared, and inverse distance squared all performing similarly and representing the best options.
Due to the similar performance of the web and web squared depletion apportionment equations across the various performance criteria (Figure 3; Figure S4), we also compared among additional exponents ranging from one to three. For the performance criteria assessed (magnitude of primary impact, bias of primary impact, and magnitude of overall impacts) there tended to be a clear rank-ordering among the different exponents ( Figure 4). As the web exponent increased, the normalized MAE and bias of depletion potential predictions for the most affected segment also increased while the KGE across all segments decreased. From a management perspective, analytical depletion functions are most valuable if they provide conservative estimates of depletion (overestimates) to avoid potentially over-allocating water resources in a region (Zipper et al., 2018a). We find that all web exponents overestimate depletion shortly after the start of pumping in the continuous pumping experiment, and all except web provide initially conservative estimates for the intermittent pumping experiment (Figure 4b). However, as time goes on the percent bias decreases and the web squared approach produces the least biased estimates among the exponents providing conservative results.

Sensitivity to stream proximity criteria
Compared to the variability among depletion apportionment equations, there is little difference among stream proximity criteria in predicting either the spatial distribution (Figure 5a) or the magnitude (Figure 5b) of the primary impact. In contrast, however, the stream proximity criteria lead to greater variability in both the spatial distribution ( Figure 5c) and magnitude (Figure 5d) of overall impacts compared to the variability among depletion apportionment equations. This difference in sensitivity is due to the different roles of these two components of the analytical depletion functions. The depletion apportionment equations distribute depletion among different stream segments, and therefore are relatively unaffected by the total number of segments included as all of them weigh more depletion to closer stream segments. In contrast, the primary function of the stream proximity criteria is defining the total number of streams included in the depletion apportionment equations; therefore, the stream proximity criteria have a large influence on the results encompassing the overall domain.
Comparing among the performance criteria related to overall impacts, there is a strong effect of the number of stream segments included. The stream proximity criteria which considers the largest number of stream segments (whole domain) has the highest KGE across all segments, but also the highest normalized MAE of capture fraction; followed sequentially by criteria with decreasing numbers of stream segments (local area, adjacent + expanding, adjacent, and expanding). As the time increases and the number of stream segments included in the expanding criteria increases, it begins to perform better than the adjacent stream proximity criteria ( Figure  5c).
While the number of stream segments is an important determinant of the performance of the stream proximity criteria, overall performance is fairly insensitive to the percent depletion potential threshold used to define the stream segments included. Comparing across three orders of magnitude, the spatial distribution of overall impacts is identical for the first 2 years following the start of pumping, and differs only slightly during the rest of the continuous pumping and intermittent pumping experiments ( Figure S6). Overall, a 1% threshold for the stream proximity criteria (meaning that depletion is apportioned among all stream segments which have an estimated depletion potential of > 1% of the pumping rate) performs the best throughout the entire simulated period.

Sensitivity to analytical model
As with the stream proximity criteria, the performance of the analytical models at identifying the most affected segment is virtually identical (Figure 6a). Unlike the stream proximity criteria, however, the two analytical models differ in their prediction of the magnitude of depletion in this segment: the Hunt method has a consistently lower normalized MAE in the most affected segment. Given that analytical depletion functions tend to overpredict depletion potential in the most affected segment ( Figure S7), the lower error with the Hunt model indicates that the consideration of the streambed properties leads to lower predicted depletion potential which better matches the MODFLOW output. However, unlike the differences between depletion apportionment equations and stream proximity criteria, all of the performance criteria show a converging trend between the two analytical models towards the end of the continuous pumping experiment (Figure 6). The converging trend indicates that, under transient conditions, the relative importance of streambed conductance decreases as estimated depletion potential increases.
While our results indicate that the sensitivity of modeled depletion potential to the choice of analytical model is relatively low, previous work has demonstrated that the streambed conductance exerts a large influence on the accuracy of analytical model results (Sophocleous et al., 1995;Spalding & Khaleel, 1991). In settings with a lower streambed conductance (e.g. lower streambed hydraulic conductivity or a thicker streambed clogging layer), the difference between the Hunt and Glover models would be greater; unfortunately, estimates of streambed conductance require substantial field work to obtain, and are therefore available in very few settings. Heterogeneity in streambed conductance can be an important factor which is not considered here (Lackey et al., 2015). Additionally, both the numerical model and analytical models used here assume that streams do not dry as a result of groundwater pumping, and therefore may not be well-suited to intermittent streams which are vulnerable to groundwater pumping, though if it is known a priori which stream segments are dry at certain times, they can be excluded from stream proximity criteria.

Accuracy of the best approach
Based on the results of our sensitivity analysis, we conclude that the best-performing analytical depletion function is the combination of the web squared depletion apportionment equation (Section 3.1.1; Figure 3; Figure 4), the adjacent + expanding stream proximity criteria using a 1% threshold (Section 3.1.2; Figure 5; Figure S6), and the Hunt model (Section 3.1.3; Figure 6). Compared to all other analytical depletion functions, this approach provides a conservative estimate of depletion while performing among the best for each of our four performance criteria.
The selected analytical depletion function does particularly well at estimating the primary impacts of pumping, which tend to be of most concern to managers. The best-performing approach correctly identifies the spatial distribution of primary impact (most affected stream segment) > 70% of the time there are substantial impacts (depletion potential Qd > 0.05) in the continuous pumping experiment and > 85% of the time in the intermittent pumping experiment (Figure 3a, Figure 5a, and Figure 6a). Additionally, the magnitude of primary impact is wellpredicted, with a normalized MAE in most affected segment < 0.15 in the continuous pumping experiment and < 0.20 in the intermittent pumping experiment (panel b in Figure 3, Figure 5, and Figure 6). Error in the magnitude of primary impacts is characterized by a positive bias ( Figure  6b), which is most pronounced at high levels of depletion (Figure 7a). This positive bias indicates that the selected analytical depletion function provides a conservative estimate of depletion in strongly affected stream segments, which is important to avoid over-allocating groundwater resources.
Performance metrics describing predictions of the distribution and magnitude of domainwide depletion are less encouraging than those describing the primary impacts. For the spatial distribution of overall impacts the selected analytical depletion function performs the worst relative to other options, with KGE across all stream segments > 0 only during the first year of the transient pumping experiments (panel c in Figure 3, Figure 5, and Figure 6). However, the magnitude of overall impacts is still predicted fairly accurately, with the normalized MAE of total capture fraction < 0.40 throughout the continuous and intermittent pumping experiments (panel d in Figure 3, Figure 5, and Figure 6), with normalized MAE < 0.20 for the first 2 years after the start of pumping. Like the primary impacts, the analytical depletion function provides a conservative estimate of depletion potential, with errors characterized by overprediction of depletion in heavily affected segments (Figure 7a).
Error decomposition (Gudmundsson et al., 2012;Gupta et al., 2009) indicates that the contributions of different factors to overall error are relatively stable through time (Figure 7b; Figure S8). Imperfect correlation is the cause of ~65% of the total mean squared error for the most affected segment, with variability contributing most of the other ~35% (Figure 7b; Figure  S8). Bias was not a dominant source of error, despite the observed overprediction at high levels of depletion potential; this is because the contribution of bias to overall mean squared error is calculated using the difference between the mean analytical and mean MODFLOW depletion potential, and the positive bias at high levels of depletion potential is balanced out by a negative bias at low levels of depletion potential which is due to the conservation of mass (Figure 7a). This is consistent with the steady-state results from Zipper et al. (2018a), which found that the web squared depletion apportionment equation had a mix of primarily correlation-and variability-driven error. The management implications of these different types of error are discussed in Zipper et al. (2018a); having a relatively balanced error profile between variability and correlation indicates that both the overall mean depletion and the spatial patterns of depletion will be captured by the analytical depletion function.
To determine whether our results were sensitive to the construction of the MODFLOW model, we also compared each analytical depletion function to results from separate MODFLOW models constructed using the SFR2 package for stream features instead of the RIV package ( Figure S5). While 3 of the 4 performance metrics are comparable whether the RIV or SFR2 packages are used, the analytical depletion functions do not match the most affected segment as frequently when compared to the SFR2 models, asymptoting at ~60-65% for both the continuous and intermittent pumping experiments.

Landscape attributes influencing performance
Performance of the analytical depletion functions varies following several factors related to landscape position and well-stream geometry. Spatially, performance tends to be worst in the northeastern portion of the domain near the aquifer boundary (Figure 8a). This region corresponds with some of the highest elevation portions of the watershed (Figure 2). Across all wells, performance is highly correlated with a number of elevation-based metrics including the land surface elevation, water table elevation, and water table depth. Of these, normalized MAE has the strongest linear relationship with steady-state water table elevation (Figure 8b), with decreased performance at higher water table elevations (R 2 = 0.29, p < 10 -5 ).
Additionally, both the lateral and vertical distance between the well and the stream segment of interest influence analytical depletion function skill. The lateral stream-well distance has a strong positive correlation with normalized MAE (R 2 = 0.72, p < 10 -5 ), though at wellstream distances < 2.7 km performance is insensitive to changes in well-stream distance ( Figure  8c). Similarly, the analytical depletion functions perform best when the elevation difference between the well and stream is small, with particularly large decreases when the well screen is at a lower elevation than the stream segment ( Figure 8d). Finally, there is a negative correlation between analytical depletion function performance (normalized MAE) and stream segment length for stream segments < ~1 km in length, while performance is fairly insensitive to stream segment length once segment length exceeds 1 km (Figure 8e). Poor performance in short streams was also observed under steady-state conditions in Zipper et al. (2018a).

Utility, limitations, and future research needs for analytical depletion functions
Our results indicate that analytical depletion functions are likely to be a useful tool for quantifying streamflow depletion resulting from an existing and/or proposed well, thus allowing managers to assess pumping impacts on streamflow in settings where numerical models are not available (Watson et al., 2014). Notably, the analytical depletion functions are successfully able to identify which stream segment will be most affected by a pumping well the majority of the time and provide accurate predictions of the magnitude of its impact (Section 3.2). Comparing across the various factors influencing performance (Section 3.3), we can generalize that the analytical depletion functions are most likely to be accurate for wells placed in relatively flat areas with a near-surface water table and within a few kilometers of a downgradient stream. Conveniently, these factors also describe locations which are often well-suited to agriculture, such as alluvial valleys, indicating that the analytical depletion functions are likely to be most effective in the locations where they are most needed. For instance, in the Navarro River Watershed much of the agriculture is concentrated in the lowlands of the Anderson Valley around Boonville, which is where analytical depletion functions perform the best (Figure 8a).
From a bigger picture perspective, the primary utility of analytical depletion functions is likely to be as a screening tool for impacts of pumping wells, rather than a replacement for calibrated numerical models. By providing rapid estimates of streamflow depletion which can be used to identify areas of potential concern, adding analytical depletion functions to the toolbox of water managers and scientists will allow more efficient prioritization of time-intensive efforts such as field data collection or the development of calibrated numerical models. One example for how these tools may be implemented in a decision support context is provided by Huggins et al. (2018), who show that depletion apportionment equations combined with analytical models can provide rapid network-wide assessment of streamflow depletion when integrated with existing online tools.
While we tested a variety of analytical depletion functions, our analysis was not exhaustive and in some settings it may be necessary to go beyond the combinations of stream proximity criteria, depletion apportionment equations, and analytical models considered here. For instance, in domains where semi-confined ('leaky') aquifers represent a significant source of water to wells, analytical depletion function performance would likely be improved by using an analytical model specifically designed for these settings (Butler et al., 2007;Hunt, 2003;Zlotnik, 2004;Zlotnik & Tartakovsky, 2008), rather than the Hunt and Glover models used here. A recent review provides a useful flow-chart for analytical model selection (Huang et al., 2018, Figures 4 and 5). However, additional work is needed to test the integration of these analytical models with depletion apportionment equations. Furthermore, all of our experiments turned wells on one-at-atime, and future work is needed to examine the cumulative impacts of multiple pumping wells, as the total impact from multiple wells may not be equal to the sum of the effects of individual wells (Ahlfeld et al., 2016;Schneider et al., 2017).
Alternately, in some settings more complex analytical models may eliminate the need for identifying stream proximity criteria and depletion apportionment equations. For instance, in wedge-shaped aquifers bounded by approximately linear surface water features which are commonly found at the confluence of two streams, Yeh et al. (2008) provide a fully analytical solution which does not require the use of depletion apportionment equations. While the Yeh et al. (2008) solution approach considers only two stream segments and therefore does not account for potential factors such as underflow, it has the potential to provide additional theory-based evaluations of the performance of analytical depletion functions in controlled modelling experiments.

Conclusions
In this study, we evaluated the performance of 50 analytical depletion functions in order to quantify the sensitivity of analytical depletion functions to the choice of depletion apportionment equations, stream proximity criteria, and analytical model under transient conditions; and identify factors describing the landscape and well-stream geometry that influence the performance of analytical depletion functions. We found that the analytical depletion functions are most sensitive to the choice of depletion apportionment equations (Section 3.1.1), followed by stream proximity criteria (Section 3.1.2), and the least sensitive to the choice of analytical model (Section 3.1.3) under the conditions studied. The web and web squared depletion apportionment equations, which consider stream geometry, were best able to predict which stream segment would be most affected by a well, as well as the magnitude of overall impacts.
The analytical depletion function which performed the best combined the adjacent + expanding stream proximity criteria with the web squared depletion apportionment equation and the Hunt analytical model. This analytical depletion function correctly identified the stream segment most affected by a well > 70% and > 85% of the time under continuous and intermittent pumping conditions, respectively, with a mean absolute error < 20% of the range in observed depletion potential. From an application perspective, analytical depletion functions performed the best in areas with little topographic relief, when wells were within ~3 km of downgradient streams, and when stream segments are at least ~1 km in length.
Overall, these results indicate that analytical depletion functions are likely to be a useful management decision support tool, though additional research is needed to test their accuracy in a variety of hydrogeological settings. In locations where numerical models are unavailable, analytical depletion functions could be used to test whether proposed pumping wells will cause negative impacts on streams and prioritize more complex field investigations and modelling studies in higher risk locations. Given their low computational requirements, analytical depletion functions are particularly well-suited for integration with web-based tools for real-time screening and decision support (Huggins et al., 2018), where the analytical depletion functions can be integrated with diverse geospatial datasets to provide accurate, site-specific estimates of streamflow depletion.         Figure S8. In both plots, the best performing analytical depletion function is compared to the MODFLOW model using RIV for stream features. (c) lateral distance between well and stream segment; (d) vertical distance between top of well screen and stream segment, where a negative value means the well is at a lower elevation than the stream; and (e) stream segment length. For each plot, the variable on the x-axis was divided into 20 quantiles used to calculate normalized MAE. Blue lines in (b) and (c) are linear best-fit (R 2 = 0.29 and R 2 = 0.72, respectively; p < 10 -5 for both), and blue lines in (d) and (e) are smoothed loess filters. MODFLOW model with RIV stream features used for evaluation. Significant effort (weeks) and skill (specialists). Not calibrated.
Kendy & Bredehoeft, 2006;Konikow & Leake, 2014;Lackey et al., 2015. Ahlfeld et al., 2016Feinstein et al., 2016;Fienen et al., 2018;Reeves et al., 2009. Figure S1. Streamflow in the Navarro River Watershed (USGS NWIS station #11468000). Daily unit discharge for the 1951-2017 water years. Baseflow separated using recursive digital filter with exponent of 0.925 (Nathan & McMahon, 1990).   (b) magnitude of primary impact; (c) spatial distribution of overall impacts; (d) magnitude of overall impacts. Note that y-axis is reversed on (b) and (d) so that upwards indicates better performance. Left column shows continuous pumping experiment and right column is intermittent pumping experiment. The thick colored lines show the results from the Hunt model with adjacent + expanding stream proximity criteria and the web squared (blue) and web (red) depletion apportionment equations. Figure S5. Comparison across all analytical depletion functions for each performance indicator (Section 2.2.4), evaluated using MODFLOW model with SFR package. (a) Spatial distribution of primary impact; (b) magnitude of primary impact; (c) spatial distribution of overall impacts; (d) magnitude of overall impacts. Note that y-axis is reversed on (b) and (d) so that upwards indicates better performance. Left column shows continuous pumping experiment and right column is intermittent pumping experiment. The thick colored lines show the results from the Hunt model with adjacent + expanding stream proximity criteria and the web squared (blue) and web (red) depletion apportionment equations. Figure S6. Comparison among different percent thresholds used to define adjacent + expanding stream proximity criteria. Plots show spatial distribution of overall impacts performance criteria for analytical depletion function using Hunt model and web squared depletion apportionment equation compared to MODFLOW model using RIV for stream features.