HydroLang: An Open-Source Web-Based Programming Framework for Hydrological Sciences

This paper presents HydroLang, an open-source and integrated community-driven computational web framework to support research and education in hydrology and water resources. HydroLang uses client-side web technologies and standards to perform different routines aimed at the acquisition, management, transformation, analysis, and visualization of hydrological datasets. HydroLang comprises four main high-cohesion, low-coupling modules for: (1) retrieving, manipulating, and transforming raw hydrological data, (2) statistical operations, hydrological analysis, and model creation, (3) generating graphical and tabular data representations, and (4) mapping and geospatial data visualization. Two extensive case studies (i.e., evaluation of lumped models and development of a rainfall disaggregation model) are presented to demonstrate the framework's capabilities, portability, and interoperability. HydroLang's unique modular architecture and open-source nature allow it to be easily tailored to any use case and web framework and promote iterative enhancements with community involvement to establish a comprehensive next-generation hydrological software toolkit.


Introduction
The development of environmental and water resources web applications has grown in popularity as a way to serve vast amounts of publicly available data so that they can be conveniently consumed and integrated to aid professionals and scientists in decision-making (Nelson et al., 2019; Ewing and Demir, 2021). However, the amount of data obtained from numerous sources in the geosciences can be overwhelming due to the variety of formats, structures, and procedures for adoption and customization (Ebert-Uphoff et al., 2017). To overcome these problems, library-based frameworks (Yildirim and Demir, 2019) were developed to provide an easy-to-use environment that can conveniently be plugged into workflow systems (Duffy et al., 2012) according to the requirements of the application area.
Several studies report software development efforts that promote open-source environmental, hydrologic, and hydraulic analyses (Dawson et al., 2007; Xu et al., 2019a). These software libraries and applications provide one or more of the functionalities pertaining to hydrological research workflows, such as environmental data retrieval, map-based visualization (Xu et al., 2019b), geospatial data processing (Sit et al., 2019), statistical operations, and data manipulation and analytics for domain-specific tasks. HydroDesktop, for example, is an open-source desktop application developed as part of the CUAHSI toolkit that allows users to search, retrieve, and analyze hydrologic data while supporting external plug-ins for visualization and analysis. Although it is now a legacy system, it is one of the first notable examples of a comprehensive, open-source, and extensible hydrological software toolkit, putting forth a vision for, and demonstrating the necessity of, modern and technologically advanced approaches to informatics tasks in hydrology. Like HydroDesktop, FREEWAT requires a desktop installation: it is an open-source plugin for the QGIS desktop software that serves as an integrated water management and planning module to simulate the quantity and quality of surface water and groundwater (Rossetto et al., 2018).
As the popularity and convenience of web frameworks increase, the focus has shifted towards providing on-demand and interoperable GIS capabilities through the web (Swain et al., 2016; Sit et al., 2021). Alcantara et al. (2019) introduced two web applications (i.e., Streamflow Prediction Tool and HydroViewer) for displaying hydrographs for specific river reaches and visualizing streamflow forecasts for different regions; both require a specific server configuration to be adopted and customized. Delipetrev et al. (2014) presented a scalable web application for handling geospatial data and offering functionality for water resources modeling. HydroCloud is an open-source web application to retrieve, plot, and analyze hydrological data from stream gauges and to provide contextual visualization with weather imagery and watershed descriptions (Roberge et al., 2017). The SHARKS app is an R-based data analysis tool to retrieve and visualize meteorological and stream-related data, served on the web using the Shiny web application framework (Brendel et al., 2019).
As the diversity of hydrological and environmental tasks grows and the formats of high-velocity, high-volume data resources differ substantially, plug-and-play software libraries offer, in comparison to desktop applications, an opportunity to provide customizable functionality that is not tied to any specific use case (Kraft et al., 2011). Given the popularity of the Python language among environmental and geophysical scientists for data analysis due to its high-level abstraction and ease of use, various Python packages have been developed and proposed in the literature, each partially addressing certain aspects of hydrological data retrieval, analysis, and visualization needs (Ayzel, 2016). Wradlib is an open-source Python library that facilitates the analysis of weather radar data with a set of processing algorithms while keeping a community-based approach with detailed documentation (Heistermann et al., 2013). Wflow is a Python-based platform for distributed hydrological simulations that includes various built-in models, for example for waves, sediment dynamics, and runoff (Schellekens, 2018). Ulmo is an open-source tool that specializes in retrieving public hydrology and climatology data from a preconfigured list of US-based data resources (Ulmo, 2020). MetPy is a community-driven, comprehensive library that brings the meteorological data analysis functionality of GEMPAK to the Python ecosystem for custom use cases (May et al., 2020).
The literature review reveals a significant gap: there is no comprehensive, generalized solution for hydrological data communication, analysis, and visualization on the web. Most prior works are limited to very specific use cases with unique computational environment requirements, which makes integrating them into existing systems for new tasks very challenging. To free practitioners in the hydrological domain from the technical challenges of data variety and format incompatibility, and to provide a shared environment where hydrological analyses can be reproduced and communicated among stakeholders without any system configuration, a web-based, modular, integrated, community-driven, extensible, and open-source software library is needed for client-side scientific computations.
This paper presents HydroLang, an open-source and integrated community-driven computational web framework to support research and education in hydrology and water resources. HydroLang uses client-side web technologies and standards to perform different routines aimed at the acquisition, management, transformation, analysis, and visualization of environmental datasets. HydroLang comprises four main high-cohesion, low-coupling modules: Data, Analysis, Maps, and Visualization. The data module can retrieve environmental data from open repositories, such as those of governmental agencies and other freely available data resources, through HydroLang Application Programming Interfaces (APIs). The analysis module encapsulates three sub-components: one for data preprocessing and statistical analyses, one for hydrological analyses using community-accepted subroutines and practices, and one for developing and training predictive models with customizable neural networks. The maps module provides the means to generate and annotate interactive maps from retrieved raw data as well as from the results of analyses. Finally, the visualization module provides charts, reports, and other visual tools that help the user examine and communicate the data at hand. HydroLang's unique modular architecture and open-source nature allow it to be easily tailored to any use case and web framework and promote iterative enhancements with community involvement to establish a comprehensive next-generation hydrological software toolkit.
The remainder of this article is organized as follows. Section 2 presents the system design and components, and describes the methodology. Section 3 focuses on the implementation of the presented framework detailing the functionality. Section 4 describes multiple use cases to demonstrate its modular nature and applicability. Section 5 concludes the article with future work.

Methods
HydroLang is developed using a component-based modular approach to serve as a one-stop software library supporting hydrological research and education on the web. All hydrological methods described in this article are implemented to support the selected case studies and to demonstrate HydroLang's applicability to common hydrological tasks in research and education. The framework also allows users to implement and use hydraulic, statistical, and hydrological functions according to their requirements and available data. This is possible because HydroLang is specifically designed to allow systematic and ontological expansions and modifications of its functionality, services, external tools, and domain-specific parameters.

Software Architecture
HydroLang is designed as a JavaScript library to augment existing web platforms and applications regardless of their underlying software stack. It has a modular structure built upon the ECMAScript 6 standard to benefit from advanced programmatic features and object-oriented design. During development, no particular architecture is assumed, in order to eliminate dependencies and minimize reliance on JavaScript frameworks (e.g., Angular.js, Vue.js). External software libraries (e.g., Leaflet, TensorFlow) are declared within HydroLang and retrieved on the fly when a function requiring the library is invoked. To integrate HydroLang into web applications, the library supports module-typed import via HTML or JS as well as MIME-typed (i.e., text/javascript) import through its bundled version as a single JS file (namely, hydrolang.js). Webpack and Babel are used for bundling with backward compatibility. Figure 1 shows an overview of HydroLang's software architecture.
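The on-the-fly retrieval of external libraries can be illustrated with a small promise cache. This is a minimal sketch under stated assumptions, not HydroLang's implementation: `createLazyLoader` is a hypothetical name, and the injected `loadFn` stands in for whatever actually fetches a library (in a browser, typically a wrapper around dynamic `import()`).

```javascript
// Hypothetical lazy loader: fetch an external library the first time it is
// requested and reuse the same Promise for every later request.
function createLazyLoader(loadFn) {
  const cache = new Map(); // library name -> Promise of the loaded library

  return function load(name) {
    if (!cache.has(name)) {
      // The executor runs synchronously, so loadFn is invoked immediately; a
      // synchronous throw inside loadFn becomes a rejected Promise.
      const pending = new Promise((resolve) => resolve(loadFn(name))).catch((err) => {
        cache.delete(name); // evict failures so a later call can retry
        throw err;
      });
      cache.set(name, pending);
    }
    return cache.get(name);
  };
}
```

Caching the pending promise rather than the resolved module means that two functions requesting the same library at the same time trigger only one download, and a failed load is evicted so that it can be retried.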
The library is organized as a hierarchy in which functionalities with similar scope are grouped together. JS modules implement this hierarchy, providing semantic encapsulation between the different use cases of HydroLang. The hierarchy follows the high-cohesion, low-coupling principle: each module contains closely related methods and does not depend on other modules to function properly. Thus, the functions in each module independently validate their input parameters and return output that is useful on its own. This philosophy paves the way for chain-like command structures, where users create combinations with dot notation (Engelschall, 2017) such that the output of a function in one module can directly serve as the input of a function in another module. Furthermore, this approach creates intelligible namespaces in which the underlying algorithm or external service used by each function can be modified or extended. In fact, each external service and library used within HydroLang is designed to be easily replaced depending on user preferences and requirements. The easily extendable nature of the library thus creates the opportunity for a comprehensive, consensus-based hydrological software toolkit that addresses the needs of different stakeholders and organizations.
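To make the low-coupling idea concrete, the sketch below nests independent functions in namespaces reached with dot notation and feeds the output of one module directly into another. The `hl` object, its `data.column` and `analyze.stats.mean` functions, and the record field `stage` are hypothetical illustrations, not HydroLang's actual API.

```javascript
// Hypothetical namespaces illustrating low coupling: each function validates
// its own inputs and returns plain data, so modules compose via dot notation.
const hl = {
  data: {
    // Extract one column of an array of records as numbers.
    column(records, key) {
      if (!Array.isArray(records)) throw new TypeError("records must be an array");
      return records.map((record) => Number(record[key]));
    },
  },
  analyze: {
    stats: {
      // Arithmetic mean that skips non-numeric (NaN) entries.
      mean(values) {
        const valid = values.filter((v) => !Number.isNaN(v));
        if (valid.length === 0) throw new RangeError("no numeric values");
        return valid.reduce((sum, v) => sum + v, 0) / valid.length;
      },
    },
  },
};

// The output of the data module feeds the stats module directly.
const records = [{ stage: "1.2" }, { stage: "1.8" }, { stage: "n/a" }];
const meanStage = hl.analyze.stats.mean(hl.data.column(records, "stage")); // 1.5
```

Because each function checks its own inputs and returns plain arrays or numbers, neither module needs to know about the other for the composition to work.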

Hydrological Analysis Methodology
This section describes the methodology for rainfall-runoff analysis implemented within HydroLang, illustrating the steps taken to derive well-known hydrological procedures. These selected analyses and methods serve as an example of HydroLang's capability to integrate methods within its framework, allowing users either to select already implemented methods or to add new customized ones.
Rainfall varies in space and time according to patterns that are subject to global and local factors. The spatial distribution of rainfall is commonly represented with isohyetal maps, which show contours of equal rainfall depth interpolated from the rainfall recorded at gauged points within an area (Chow, et al., 1988). To determine areal averages of rainfall, the arithmetic mean method (Equation 1) is used if the gauges are uniformly distributed over the area.
$$\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_i \tag{1}$$

where $P_i$ is the rainfall depth recorded at gauge $i$ and $n$ is the number of gauges. If each station is more representative of a particular portion of the area, its depth is weighted by the fraction of the area it represents. This is called the Thiessen method (Equation 2), which assumes that every point inside a watershed receives the same rainfall as the nearest gauge.
$$\bar{P} = \frac{\sum_{i=1}^{n} A_i P_i}{\sum_{i=1}^{n} A_i} \tag{2}$$

where $A_i$ is the area of the Thiessen polygon surrounding gauge $i$. Rainfall accumulated during an event can be expressed as total depth or as intensity, from which the rainfall hyetograph and mass curve are derived; these are usually represented graphically and used to calculate losses and runoff. Rainfall data are more widely available at coarser temporal resolutions, which is linked to both the number of stations recording the data and the corresponding metadata. Koutsoyiannis (2003) highlights that, in many countries, far fewer gauges provide hourly or sub-hourly data than daily data. Rainfall aggregation sums the readings of an event to a coarser temporal scale (e.g., from 5-min to hourly data). Disaggregation, conversely, comprises the methods that transform coarse-resolution rainfall data (e.g., daily) to a finer scale. Different methods have been applied in the literature to achieve accurate disaggregation, such as empirical methods (Knoesen & Smithers, 2009), cascade models (Muller & Haberlandt, 2018), and, more recently, neural networks (Muller & Haberlandt, 2018). Evaluation metrics such as those implemented within the framework have shown the advantages and disadvantages of these approaches, but this lies outside the scope of this research.
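The areal averaging methods of Equations 1 and 2 and the temporal aggregation described above reduce to a few lines of code. This sketch assumes that gauge depths share one unit and that Thiessen polygon areas have already been computed (polygon construction is not shown); the function names are illustrative.

```javascript
// Areal average by the arithmetic mean method (Equation 1).
function arithmeticMean(gaugeDepths) {
  return gaugeDepths.reduce((sum, p) => sum + p, 0) / gaugeDepths.length;
}

// Areal average by the Thiessen method (Equation 2): each gauge depth is
// weighted by the area of its Thiessen polygon.
function thiessenMean(gaugeDepths, polygonAreas) {
  if (gaugeDepths.length !== polygonAreas.length) {
    throw new RangeError("one polygon area is required per gauge");
  }
  const totalArea = polygonAreas.reduce((sum, a) => sum + a, 0);
  const weighted = gaugeDepths.reduce((sum, p, i) => sum + p * polygonAreas[i], 0);
  return weighted / totalArea;
}

// Temporal aggregation: sum consecutive readings into blocks of `factor`
// steps (e.g., factor = 12 turns 5-min depths into hourly depths).
// A trailing partial block is kept as its own sum.
function aggregate(depths, factor) {
  const out = [];
  for (let i = 0; i < depths.length; i += factor) {
    out.push(depths.slice(i, i + factor).reduce((sum, d) => sum + d, 0));
  }
  return out;
}
```

For example, `thiessenMean([10, 20, 30], [1, 1, 2])` weights the third gauge twice as heavily and returns 22.5, whereas `arithmeticMean([10, 20, 30])` returns 20.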
Regarding runoff, the water balance equation (Equation 3) expresses it as precipitation minus evapotranspiration, the change in storage, and groundwater flow (Sitterson, et al., 2017). The result is the surface runoff that can be observed on land.
The relationship between rainfall and runoff has been studied for more than 150 years, since Thomas Mulvaney published the first such method, the rational method, in 1851. The method is simple: it uses rainfall intensity, the drainage area, and a runoff coefficient to determine peak discharges in a basin (Beven, 2012). The runoff coefficient has been widely studied to determine how it changes across new scenarios, including through graphical techniques (Beven, 2012), and later approaches such as the unit hydrograph account for the response of a basin to a rainfall event (Xu, 2002).
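As a worked illustration of the rational method, the sketch below computes peak discharge in SI units, assuming the usual form Q = C i A with intensity in mm/h and area in km², which introduces a conversion factor of 1/3.6. The function name is hypothetical and is not claimed to be part of HydroLang.

```javascript
// Rational method peak discharge in SI units (hypothetical helper):
//   Q [m^3/s] = C * i [mm/h] * A [km^2] / 3.6
// because 1 mm/h of runoff over 1 km^2 is 1000 m^3/h, i.e., 1/3.6 m^3/s.
function rationalPeakDischarge(runoffCoeff, intensityMmPerHr, areaKm2) {
  if (runoffCoeff < 0 || runoffCoeff > 1) {
    throw new RangeError("runoff coefficient C must lie in [0, 1]");
  }
  return (runoffCoeff * intensityMmPerHr * areaKm2) / 3.6;
}
```

For C = 0.5, i = 36 mm/h, and A = 2 km², the peak discharge is 0.5 × 36 × 2 / 3.6 = 10 m³/s.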
The analysis of runoff has become a field of study in its own right through the use of models. A runoff model helps to understand hydrological phenomena in a complex system, and models can be categorized by the approach they take. Empirical models capture the non-linear relationships between inputs and outputs without explicitly representing the underlying processes (Klemes, 1982). Conceptual models use simplified equations that represent water storage in a catchment (Vaze, et al., 2011), while physically based models apply physical laws and equations based on the real hydrologic responses within a certain domain (Sitterson, et al., 2017).
Models can also be classified according to how they treat spatial processes in the catchment: lumped, semi-distributed, and distributed (Devi, et al., 2015). Lumped models do not consider spatial variability, so the entire catchment is modeled as a single unit. Their inputs are averaged over the study area; they are fast to compute but require many assumptions. Distributed models account for spatial variability by dividing the area into grid cells and performing the physical calculations for each cell. Because of this approach, distributed models are data-intensive and require longer computation times. Finally, semi-distributed models combine ideas from both lumped and distributed models.

Time of Concentration
The time of concentration is defined as the time it takes for runoff to form and travel hydraulically from the most distant point of a catchment to the outlet. It is obtained by summing the travel times through consecutive components of the catchment drainage, and it is directly related to the shape and peak of the runoff hydrograph (USDA, 2015). The framework implements several approaches for calculating the time of concentration. For instance, the SCS watershed lag method (Equation 4) covers a wide range of conditions by using the curve number (CN), described in Method 1 below. It uses the flow length, the average catchment slope, and the maximum potential retention.
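A sketch of the SCS watershed lag method, assuming the standard NRCS form in US customary units (flow length in ft, slope in percent, lag in hours) and the 0.6 ratio between lag and time of concentration given later in this section. The function names are hypothetical.

```javascript
// SCS (NRCS) watershed lag method, US customary units:
//   lag [h] = l^0.8 * (S + 1)^0.7 / (1900 * Y^0.5)
// l: flow length [ft]; Y: average watershed slope [%];
// S = 1000 / CN - 10: maximum potential retention [in].
function scsLag(flowLengthFt, slopePercent, curveNumber) {
  const S = 1000 / curveNumber - 10;
  return (Math.pow(flowLengthFt, 0.8) * Math.pow(S + 1, 0.7)) /
    (1900 * Math.sqrt(slopePercent));
}

// Time of concentration [h] from the lag, using lag = 0.6 * Tc.
function scsTimeOfConcentration(flowLengthFt, slopePercent, curveNumber) {
  return scsLag(flowLengthFt, slopePercent, curveNumber) / 0.6;
}
```

For a 5000 ft flow path, a 2% slope, and CN = 75, the lag is roughly 0.95 h and the time of concentration roughly 1.6 h; steeper slopes and higher CN values shorten both.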
The Kerby-Kirpich equation (Equation 5) treats the total time of concentration as the sum of the overland flow time and the channel flow time (Sharify & Hosseini, 2011). For small watersheds, where overland flow contributes significantly to the travel time, the overland time is calculated using the overland flow length L; a conversion coefficient K, which is 0.828 in the metric system and 1.44 in the imperial system; the overland slope S; and a dimensionless retardance coefficient N that depends on the terrain and ranges from 0.02 to 0.80. The channel flow time is calculated using another coefficient K, which is 0.0078 in the imperial system (0.0195 in the metric system), the main channel slope S, and the channel flow length L.

$$T_c = t_{ov} + t_{ch}, \qquad t_{ov} = K\,(L\,N)^{0.467}\,S^{-0.235}, \qquad t_{ch} = K\,L^{0.770}\,S^{-0.385} \tag{5}$$

Finally, HydroLang implements an alternative approach to calculating the time of concentration based on Kerby's formulation with Manning's roughness coefficient (Equation 6), which uses the roughness coefficient, the overland slope, and the longest flow path. The coefficients of these formulas change depending on the unit system used.
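The Kerby-Kirpich sum in Equation 5 can be sketched as follows. The Kerby coefficients are those stated above; the Kirpich coefficients assume the commonly tabulated values of 0.0078 (lengths in ft) and 0.0195 (lengths in m). Times are in minutes, slopes are dimensionless, and the function name and argument object are hypothetical.

```javascript
// Kerby-Kirpich time of concentration [min], Tc = t_ov + t_ch.
//   Kerby (overland):  t_ov = K_ov * (L_ov * N)^0.467 * S_ov^-0.235
//   Kirpich (channel): t_ch = K_ch * L_ch^0.770 * S_ch^-0.385
// Lengths in m (metric) or ft (imperial); slopes in m/m or ft/ft.
const KERBY_KIRPICH_COEFFS = {
  metric: { kov: 0.828, kch: 0.0195 },
  imperial: { kov: 1.44, kch: 0.0078 },
};

function kerbyKirpich({ overlandLength, retardance, overlandSlope,
                        channelLength, channelSlope, units = "metric" }) {
  const c = KERBY_KIRPICH_COEFFS[units];
  if (!c) throw new RangeError(`unknown unit system: ${units}`);
  if (retardance < 0.02 || retardance > 0.8) {
    throw new RangeError("retardance N is expected in [0.02, 0.80]");
  }
  const tov = c.kov * Math.pow(overlandLength * retardance, 0.467) *
    Math.pow(overlandSlope, -0.235);
  const tch = c.kch * Math.pow(channelLength, 0.770) *
    Math.pow(channelSlope, -0.385);
  return { tov, tch, tc: tov + tch };
}
```

With 300 ft of overland flow (N = 0.4, slope 0.01) and a 5000 ft channel (slope 0.005) in imperial units, the overland and channel times are each about 40 min, giving a Tc of roughly 82 min.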
The relationships between time of concentration, lag time, and time to peak are well established. Lag time is the interval between the center of mass of rainfall excess and the peak runoff (USDA, 2015) and is taken as 60% of the time of concentration (Equation 7). Time to peak is the interval from the beginning of rainfall excess to the peak discharge and is considered to be 70% of the time of concentration.

Method 1 -Curve Number (SCS)
A commonly used empirical lumped model is the Curve Number (CN) method developed by the SCS. It considers the total drainage area of a watershed or subbasin for a rainfall event but, unlike the rational method, it also accounts for infiltration, losses and interception, and the temporal distribution of rainfall. Within the SCS curve number method, rainfall is considered to be uniformly distributed over the watershed. Initial abstraction is defined as the losses that occur in a watershed before runoff begins, arising from processes such as surface depression storage, evapotranspiration, and infiltration. It is calculated with an empirical equation (Equation 8).
S is the maximum retention after runoff begins (Equation 9).
The curve number CN is determined from the soil cover type, the hydrologic soil group, and the antecedent moisture condition. The documentation of the method has been widely extended with tabulated values for many combinations of these three components. CN values typically range between 30 and 100 (exclusive). Finally, runoff (Equation 10) is calculated per time step from the precipitation in excess of the initial abstraction, and is zero while precipitation does not exceed it.
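The CN runoff calculation can be sketched as below, assuming the standard NRCS relations Ia = 0.2S, S = 1000/CN - 10 (inches), and Q = (P - Ia)² / (P - Ia + S) for P > Ia. Applying the formula to cumulative rainfall and differencing gives the excess per time step. The function names are hypothetical.

```javascript
// SCS curve number runoff depth [in] for cumulative precipitation [in],
// assuming Ia = 0.2 * S and S = 1000 / CN - 10.
function scsRunoffDepth(precipIn, curveNumber) {
  if (curveNumber <= 0 || curveNumber > 100) throw new RangeError("CN must be in (0, 100]");
  const S = 1000 / curveNumber - 10; // potential maximum retention [in]
  const Ia = 0.2 * S;                 // initial abstraction [in]
  if (precipIn <= Ia) return 0;       // no runoff until Ia is satisfied
  return (precipIn - Ia) ** 2 / (precipIn - Ia + S);
}

// Excess per time step: apply the method to cumulative rainfall and
// difference the cumulative runoff between successive steps.
function scsExcessSeries(incrementalPrecipIn, curveNumber) {
  let cumP = 0;
  let prevQ = 0;
  return incrementalPrecipIn.map((p) => {
    cumP += p;
    const cumQ = scsRunoffDepth(cumP, curveNumber);
    const excess = cumQ - prevQ;
    prevQ = cumQ;
    return excess;
  });
}
```

With CN = 80, S = 2.5 in and Ia = 0.5 in, so a 3 in storm yields (2.5)² / (2.5 + 2.5) = 1.25 in of runoff.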

Method 2 -Unit Hydrograph
One of the most common ways to perform a simple yet powerful hydrological analysis is the unit hydrograph. A unit hydrograph is the direct runoff hydrograph resulting from one unit (e.g., 1 in or 1 cm) of effective rainfall uniformly distributed over a basin during a specified duration. It requires certain assumptions (Shaw, 1998): the effective rainfall is uniformly distributed over the basin, the rainfall intensity is uniform during the unit duration, and the basin response is time-invariant and linear, so that superposition and proportionality hold between one hydrograph and another. A unit hydrograph can be derived from an observed hydrograph, from multi-peaked flood events, or from synthetic calculations. When derived from observed hydrographs, it should be based on a storm whose rainfall is uniformly distributed and of roughly uniform intensity. When derived from synthetic calculations, the physical characteristics of the basin play a major role in determining durations and discharges. Specifically for synthetic calculations, the Natural Resources Conservation Service (NRCS) developed a dimensionless unit hydrograph, based on natural unit hydrographs observed in catchments varying in size and location, that is used as a starting point when information is unavailable. It is derived from the gamma distribution and expressed in terms of the discharge ratio Q/Qp, Euler's number e, the shape factor m of the gamma equation (Equation 11), and the time ratio t/tp (USDA, 2007).
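The gamma-equation dimensionless unit hydrograph can be tabulated as follows, assuming the standard NRCS form Q/Qp = e^m (t/tp)^m e^(-m t/tp), which peaks at Q/Qp = 1 when t/tp = 1. The function name, the step size, and the upper limit of t/tp are illustrative choices.

```javascript
// NRCS gamma-equation dimensionless unit hydrograph:
//   Q/Qp = e^m * (t/tp)^m * e^(-m * t/tp) = [(t/tp) * e^(1 - t/tp)]^m
// m is the shape factor; the ordinate equals 1 at t/tp = 1.
function gammaDimensionlessUH(m, step = 0.1, maxRatio = 5) {
  const ordinates = [];
  const n = Math.round(maxRatio / step);
  for (let i = 0; i <= n; i++) {
    const tr = i * step;                            // t/tp
    const qr = Math.pow(tr * Math.exp(1 - tr), m);  // Q/Qp
    ordinates.push({ tr, qr });
  }
  return ordinates;
}
```

Larger values of m produce a narrower, more peaked hydrograph, because the bracketed term is at most 1 and is raised to a higher power.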
The only parameter that varies in the equation is the shape factor, which is linked to the peak rate factor (Equation 12). The peak rate factor varies with terrain characteristics from 101 to 566 cfs (Table 1); values at the low end correspond to flat areas, while the highest values correspond to steep terrain (Chow, et al., 1988).
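The link between the shape factor and the peak rate factor follows from requiring the gamma unit hydrograph to hold one inch of runoff. Assuming that relation, PRF = 645.333 m^(m+1) / (e^m Γ(m+1)), where 645.333 cfs corresponds to one inch per hour over one square mile, the sketch below evaluates it with a Lanczos gamma approximation and inverts it by bisection. This derivation is shown for illustration and may differ in form from Equation 12; the function names are hypothetical.

```javascript
// Lanczos approximation of the gamma function (g = 7, 9 coefficients).
const LANCZOS = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
  771.32342877765313, -176.61502916214059, 12.507343278686905,
  -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7];

function gamma(x) {
  if (x < 0.5) return Math.PI / (Math.sin(Math.PI * x) * gamma(1 - x)); // reflection
  const z = x - 1;
  let a = LANCZOS[0];
  for (let i = 1; i < LANCZOS.length; i++) a += LANCZOS[i] / (z + i);
  const t = z + 7.5;
  return Math.sqrt(2 * Math.PI) * Math.pow(t, z + 0.5) * Math.exp(-t) * a;
}

// Peak rate factor implied by shape factor m (area under the gamma UH equals
// 1 in of runoff; 645.333 cfs = 1 in/h over 1 mi^2).
function prfFromShape(m) {
  return (645.333 * Math.pow(m, m + 1)) / (Math.exp(m) * gamma(m + 1));
}

// Invert prfFromShape by bisection; PRF increases monotonically with m.
function shapeFromPrf(prf, lo = 0.01, hi = 20, tol = 1e-6) {
  if (prf <= prfFromShape(lo) || prf >= prfFromShape(hi)) {
    throw new RangeError("PRF outside the supported range");
  }
  while (hi - lo > tol) {
    const mid = (lo + hi) / 2;
    if (prfFromShape(mid) < prf) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}
```

The standard PRF of 484 corresponds to m ≈ 3.7, and the monotonic increase of PRF with m is what makes bisection a safe choice for the inversion.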
Finally, using the physical characteristics of the basin, the unit hydrograph can be derived within HydroLang from a triangular hydrograph (Equation 13). The time to peak Tp accounts for the duration D of unit excess rainfall, which is itself estimated as 0.4 times the lag time.
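A sketch of the time to peak and of the scaling of dimensionless ordinates into a synthetic unit hydrograph for one inch of runoff. It assumes the standard triangular-UH relation Tp = D/2 + lag, D = 0.4 × lag as stated above, time ratios scaled by Tp, and Qp = PRF × A / Tp with A in mi², Tp in hours, and Qp in cfs. The helper names are hypothetical.

```javascript
// Time to peak [h] from the lag [h], assuming Tp = D / 2 + lag with the
// unit excess duration D taken as 0.4 * lag.
function timeToPeak(lagHours) {
  const D = 0.4 * lagHours;
  return D / 2 + lagHours;
}

// Scale dimensionless ordinates {tr: t/Tp, qr: Q/Qp} into a unit hydrograph
// for 1 in of runoff: t = tr * Tp [h], Q = qr * Qp [cfs], Qp = PRF * A / Tp.
function syntheticUH(dimensionless, tpHours, areaMi2, prf = 484) {
  const qp = (prf * areaMi2) / tpHours;
  return dimensionless.map(({ tr, qr }) => ({ t: tr * tpHours, q: qr * qp }));
}
```

For a 10 mi² basin with Tp = 2 h and the default PRF of 484, the peak ordinate is 484 × 10 / 2 = 2420 cfs per inch of runoff.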
Multiplying the time and discharge ordinates of the dimensionless unit hydrograph by the time to peak and the peak discharge, respectively, generates a synthetic unit hydrograph. After all losses are subtracted from the rainfall event, convolving the effective rainfall with the synthetic unit hydrograph yields a flood hydrograph. If both streamflow discharge readings and rainfall readings are available for an event, a unit hydrograph can be derived empirically, using the previously mentioned concepts of proportionality and uniformity (Raghunath, 2006). For a given event, once the baseflow has been subtracted, the total volume of the direct runoff hydrograph (DRH) (Equation 14) is obtained by summing the DRH ordinates multiplied by the time step of the readings; this volume corresponds to the total depth of the effective rainfall hyetograph (ERH) over the catchment.
$$V_{DRH} = \sum_{i=1}^{N} \left(Q_i - Q_b\right)\Delta t \tag{14}$$

where $Q_i$ are the observed discharges, $Q_b$ is the baseflow, and $\Delta t$ is the time step. The equivalent depth (Equation 15) is the DRH volume divided by the total catchment area. The unit hydrograph is then calculated as a time series of the DRH ordinates divided by this depth, in units of discharge per unit depth, e.g., m³/s/cm.
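Deriving a unit hydrograph from an observed event (Equations 14 and 15) can be sketched as below, assuming a constant baseflow, discharge in m³/s, a uniform time step in seconds, and catchment area in km². The function name and return shape are hypothetical.

```javascript
// Derive a unit hydrograph (m^3/s per cm of effective rainfall) from an
// observed hydrograph, assuming a constant baseflow.
function unitHydrographFromObserved(flows, baseflow, dtSeconds, areaKm2) {
  // Direct runoff hydrograph: observed flow minus baseflow (never negative).
  const drh = flows.map((q) => Math.max(q - baseflow, 0));
  // Equation 14: DRH volume = sum of ordinates * time step [m^3].
  const volume = drh.reduce((sum, q) => sum + q, 0) * dtSeconds;
  // Equation 15: equivalent depth over the catchment, converted from m to cm.
  const depthCm = (volume / (areaKm2 * 1e6)) * 100;
  if (depthCm === 0) throw new RangeError("no direct runoff above baseflow");
  return { drh, volume, depthCm, uh: drh.map((q) => q / depthCm) };
}
```

A useful check is that the returned `uh`, multiplied by the time step and divided by the catchment area, integrates to exactly 1 cm of runoff.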
Once derived, the unit hydrograph can be used to compute the flood hydrograph of a rainfall event by discrete convolution. The effective rainfall hyetograph must be separated into pulses, each expressed in units of the UH's equivalent depth. Each discharge ordinate is calculated as a convolution sum over time whose number of terms depends on the number of pulses (Equation 16).
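$$Q_n = \sum_{m=1}^{\min(n,M)} P_m\,U_{n-m+1} \tag{16}$$

where $P_m$ is the m-th effective rainfall pulse, $M$ is the number of pulses, and $U_j$ is the j-th ordinate of the unit hydrograph. The limitations of the method lie in the distributions of rainfall and of the observed hydrograph: uniformity is required to obtain good results, and the method is not recommended for extreme events.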
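Equation 16 translates directly into nested loops over pulses and UH ordinates. In this sketch, whose function name is hypothetical, each pulse must be expressed in the depth unit of the UH (e.g., cm for a UH in m³/s/cm), and a constant baseflow can optionally be added back.

```javascript
// Discrete convolution of effective rainfall pulses with a unit hydrograph:
//   Q_n = sum_{m=1}^{min(n, M)} P_m * U_{n-m+1}   (1-based indices)
// pulses: effective rainfall per step in the UH depth unit (e.g., cm);
// uh: UH ordinates (e.g., m^3/s per cm); baseflow: optional constant.
function floodHydrograph(pulses, uh, baseflow = 0) {
  const out = new Array(pulses.length + uh.length - 1).fill(0);
  pulses.forEach((p, m) => {
    // Each pulse produces a scaled copy of the UH, lagged by m time steps.
    uh.forEach((u, k) => {
      out[m + k] += p * u;
    });
  });
  return out.map((q) => q + baseflow);
}
```

For pulses [1, 2] and UH [0, 1, 3, 1], the result is [0, 1, 5, 7, 2]: the second pulse's response is shifted by one time step and added to the first.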

Method 3 -Bucket Model
Within the framework, a simple propagation model was implemented using the bucket model as a reference. A bucket model replaces the unit hydrograph by representing streamflow as a cascade of state variables calculated at each time step (Santos et al., 2018). The variables included vary with the modeling approach taken, but most well-established models consider evaporation, rainfall, surface flow, infiltration, and baseflow. A simple bucket flow model calculates the initial flow (Qi) with Equation 17, where FC stands for field capacity and LU is the land use percentage.
The soil moisture content is related to the different land-use scenarios. If Qi is larger than FC, the system overflows, and the overflow Q0 is calculated as the difference between Qi and FC; otherwise, Q0 is 0. If Q0 is larger than 0, there is also interflow (Qinf) (Equation 18), calculated as the product of Qi and the infiltration capacity (if).

$$Q_{inf} = Q_i \times \mathit{if} \tag{18}$$

In subsequent iterations (Equation 19), the latter is expressed as a function of overland flow and interflow.
Total flow (Qt), i.e., the final discharge of the bucket model, is then calculated as the sum of all the fluxes (Equation 20).
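A single time step of the overflow and interflow logic described above can be sketched as follows. The initial flow Qi is passed in rather than computed, to keep the sketch independent of Equation 17, and the total is only the sum of the two fluxes computed in this step; a full bucket model would add its other fluxes (e.g., baseflow). The function name is hypothetical.

```javascript
// Hypothetical single step of the bucket overflow/interflow logic.
// Qi: initial flow (Equation 17, computed elsewhere); fieldCapacity: FC;
// infiltrationCapacity: the "if" factor of Equation 18.
function bucketStep(Qi, fieldCapacity, infiltrationCapacity) {
  // Overflow Q0 occurs only when Qi exceeds field capacity.
  const Q0 = Qi > fieldCapacity ? Qi - fieldCapacity : 0;
  // Interflow (Equation 18) is generated only when there is overflow.
  const Qinf = Q0 > 0 ? Qi * infiltrationCapacity : 0;
  // Sum of the fluxes computed in this step (other fluxes omitted).
  return { Q0, Qinf, Qt: Q0 + Qinf };
}
```

For Qi = 120, FC = 100, and an infiltration capacity of 0.1, the step yields an overflow of 20, an interflow of 12, and a total of 32; with Qi below FC all three are 0.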

HydroLang Framework
HydroLang is designed and implemented around four main components (i.e., Data, Analysis, Visualization, Map) which are initialized and managed through a core module. The core module is responsible for generating HydroLang instances, caching asynchronously loaded resources and external libraries, and configuring system-wide parameters (e.g., metric system, default map services vendor). Figure 2 summarizes the architecture and components within HydroLang along with the scope of each module's functionality.

Data Component
The Data Component contains the handlers and mechanisms to retrieve, manipulate, and transform raw hydrological data from a variety of sources in an object-oriented fashion. Its main functionality is grouped under four functions (i.e., Retrieve, Upload, Transform, Download). To import data into the HydroLang ecosystem, the user can either use external data resources through built-in APIs (i.e., Retrieve) or upload their own data from the client side (i.e., Upload). The Transform function converts between different data structures (e.g., JSON, CSV, XML, array) for the preferred representation, defines custom labels, and filters out columns for compatibility with different procedures. Finally, users can download the processed and/or analyzed data for their records (i.e., Download).
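A Transform-style conversion from JSON records to CSV, with column filtering and custom labels, might look like the sketch below. `recordsToCsv` is a hypothetical name, not HydroLang's Transform signature; fields containing commas, quotes, or newlines are quoted in the RFC 4180 style.

```javascript
// Hypothetical sketch: convert an array of records (JSON) to CSV, keeping only
// the selected columns and replacing header names with custom labels.
function recordsToCsv(records, columns, labels = {}) {
  const escape = (value) => {
    const s = value === null || value === undefined ? "" : String(value);
    // Quote fields containing commas, quotes, or newlines; double inner quotes.
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = columns.map((c) => escape(labels[c] ?? c)).join(",");
  const rows = records.map((r) => columns.map((c) => escape(r[c])).join(","));
  return [header, ...rows].join("\n");
}
```

Columns that are not listed (such as a quality flag) are dropped, which is how a dataset can be trimmed to what a downstream procedure expects.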

Figure 2: Architecture and components of the HydroLang framework
In support of HydroLang's extendable architecture, a unique workflow is implemented to retrieve raw data from external sources. The Retrieve function implements an organization-independent algorithm that performs an HTTP request based on the parameters defined in a separate JSON-style definition file, namely DataResources.js. Each data resource is defined in this file by providing relevant parameters such as the name of the organization (e.g., USGS), the type of data (e.g., stream stage), the query arguments (e.g., date, location), an access token, and a proxy address (if required). The Data Module automatically parses these definitions, so retrieving data is as simple as providing the appropriate identifier for the desired data resource. This usage follows the convention-over-configuration principle, minimizing complexity and error susceptibility by eliminating verbose configuration; instead, it allows access to data resources with intelligible commands that may virtually correspond to an actual sentence (e.g., retrieve stream stage data from USGS for Iowa for January 2020). HydroLang's current data resource descriptions provide connections to numerous APIs offering hydrologically relevant data (e.g., stage, precipitation, disaster declarations) from around the world (Table 2).
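The convention-over-configuration idea can be sketched with a definition object and a generic request builder. The USGS Instantaneous Values endpoint and its query parameters (`format`, `stateCd`, `sites`, `parameterCd`, `startDT`, `endDT`) are real, but the layout of the `dataResources` entry and the `buildRequest` function are hypothetical and do not reproduce DataResources.js.

```javascript
// Hypothetical definition entry in the spirit of DataResources.js: it describes
// how to query a source so that one generic routine can build any request.
const dataResources = {
  usgs: {
    "instant-values": {
      endpoint: "https://waterservices.usgs.gov/nwis/iv/",
      defaults: { format: "json" },
      params: ["stateCd", "sites", "parameterCd", "startDT", "endDT"],
    },
  },
};

// Build a request URL for source/dataType, rejecting undeclared arguments.
function buildRequest(source, dataType, args) {
  const def = dataResources[source]?.[dataType];
  if (!def) throw new Error(`unknown data resource: ${source}/${dataType}`);
  for (const key of Object.keys(args)) {
    if (!def.params.includes(key)) throw new Error(`unsupported argument: ${key}`);
  }
  const url = new URL(def.endpoint);
  // Defaults first, then user arguments (user values win on conflicts).
  for (const [key, value] of Object.entries({ ...def.defaults, ...args })) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

The sentence-like query from the text then becomes `buildRequest("usgs", "instant-values", { stateCd: "ia", parameterCd: "00065", startDT: "2020-01-01", endDT: "2020-01-31" })`, where `00065` is the USGS parameter code for gage height; the resulting URL can be passed to `fetch`. Adding a new source means adding an entry, not writing new request code.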

Analysis Component
The Analysis Component can be regarded as the backbone of HydroLang, as it packages all the methodology required for reasoning and generating useful information. It comprises three subcomponents, namely Stats, Hydro, and NeuralNets.

Stats Module
The Stats subcomponent provides a wide range of tools to preprocess, clean, and perform statistical operations on raw data. It implements and assembles a variety of functions suited to the supported data structures, including time series. Basic statistical operations commonly used in data-driven projects include the calculation of the mean, median, value range, quantiles, frequencies, variance, and standard deviation, as well as matrix operations. The main focus of this component is to bundle the operations often used to preprocess data before predictive modeling. These operations include standardizing a dataset, identifying and removing outliers, converting temporal data into the frequency domain with the Fast Fourier Transform (FFT) to amplify patterns in the data for better reasoning (Maklin, 2019), running Pearson's correlation analysis, and detecting gaps that indicate missing values. There are many techniques to handle data gaps, the most common being to take the average of the nearby interval in which the gap is found (Samules, 2020). Nevertheless, identifying and addressing gaps is usually tailored to the type of data and the desired outcome (Zhao & Huang, 2015). Two of the most popular methods for outlier identification are the inter-quartile range (IQR) method and data normalization. The IQR method defines a partition of the data bounded by the first quartile minus 1.5 times the IQR and the third quartile plus 1.5 times the IQR; any data point that falls outside this partition is considered an outlier. The second method uses the sample's standard deviation to standardize the data so that the entire sample is expressed on a scale where the mean is equal to 0. Similar to the first approach, it identifies the values that fall within a partition, usually defined between -1.5 and +1.5, as the clean sample, while the remaining values are classified as outliers (Hadi & Simonoff, 1993). Both methods are implemented within the framework as part of data classification and cleaning.
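The gap-filling and outlier procedures described above can be sketched as follows, under stated assumptions: quantiles use linear interpolation between order statistics, the standardization method uses the sample standard deviation with a caller-supplied threshold k, and gaps are marked with `null` and filled with the mean of the nearest valid neighbors. All function names are hypothetical.

```javascript
// Quantile by linear interpolation between order statistics (sorted copy).
function quantile(values, p) {
  const s = [...values].sort((a, b) => a - b);
  const pos = (s.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return s[lo] + (s[hi] - s[lo]) * (pos - lo);
}

// IQR method: flag values outside [Q1 - factor * IQR, Q3 + factor * IQR].
function iqrOutliers(values, factor = 1.5) {
  const q1 = quantile(values, 0.25);
  const q3 = quantile(values, 0.75);
  const iqr = q3 - q1;
  return values.filter((v) => v < q1 - factor * iqr || v > q3 + factor * iqr);
}

// Standardization method: flag values whose |z-score| exceeds k.
function zScoreOutliers(values, k) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
  return values.filter((v) => Math.abs((v - mean) / sd) > k);
}

// Fill gaps (null) with the mean of the nearest valid neighbours on each
// side, or with the single available neighbour at the series edges.
function fillGaps(series) {
  return series.map((v, i) => {
    if (v !== null) return v;
    let prev = i - 1;
    while (prev >= 0 && series[prev] === null) prev--;
    let next = i + 1;
    while (next < series.length && series[next] === null) next++;
    const left = prev >= 0 ? series[prev] : null;
    const right = next < series.length ? series[next] : null;
    if (left === null) return right;
    if (right === null) return left;
    return (left + right) / 2;
  });
}
```

For the series 1 to 9 followed by 100, both `iqrOutliers` and `zScoreOutliers(values, 1.5)` flag only 100. With a threshold of k = 3 the z-score method would miss it, because the outlier itself inflates the standard deviation, which is one reason the IQR method is often preferred for small samples.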

Hydro Module
The Hydro subcomponent provides hydrological analysis capabilities for precipitation and runoff while being structured to allow further expansion to include any hydrological model. It contains basic functions for analyzing the spatial and temporal coverage of precipitation data from cloud computing services (Seo et al., 2019), as well as for deriving runoff from basin physical characteristics using synthetic calculations. The functions can be used independently or in combination, since the coupling between them is low, which offers interoperability with limited constraints. For instance, rainfall distributions derived from Thiessen polygons can be used to build intensity hyetographs that are then passed to the bucket model, yielding easily readable chain-like commands.
In addition to helper functions for commonly used operations (e.g., a linear system solver), this component implements several hydrological functions. The Arithmetic function calculates the mean rainfall of an event recorded by different stations in the same basin. The Thiessen function calculates the weighted average precipitation for a rainfall event assuming there is one station per subbasin. The Syntheticcalc function derives duration parameters (i.e., time of concentration and lag time) for creating a unit hydrograph, using approaches that require different parameters: (1) the CN value, longest path, and average basin slope for the SCS approach; (2) the longest path and average slope of both the main channel and the basin for Kerby-Kirpich; and (3) the Manning coefficient, longest path, and slope for Kerby. The Dimunithydro function creates a dimensionless hydrograph based on the gamma distribution using the peak rate factor (PRF). As input, it requires the distribution type (e.g., gamma), the PRF (ranging from 101 to 566 cfs), the required time step, and the number of hours of the event. From these it derives the shape parameter m of the gamma distribution, which is later used to derive the hydrograph. A PRF at the lower end corresponds to a basin with a flat slope and a longer response time, whereas a PRF at the upper end corresponds to a very steep basin. Users should take this into account before using the function.
The Hyetogen function creates an intensity hyetograph for a time-series event. The Unityhydrocons function creates a unit hydrograph from a time series of either a dimensionless unit hydrograph or an observed discharge hydrograph.

The dimensionless case relies on the drainage area, the time of concentration, and the dimensionless unit hydrograph created by the Dimunithydro function. The unit hydrograph is constructed by multiplying the ordinates of the dimensionless hydrograph by the drainage area and dividing by the time to peak.

The observed case relies on the drainage area, precipitation intensity, and baseflow. A direct runoff hydrograph (DRH) is obtained by removing the baseflow from the observed hydrograph, and its total volume is expressed as a depth over the basin area. The unit hydrograph is then calculated by dividing the DRH by this depth. Finally, the function returns an object containing the unit hydrograph and the total volume calculated for the rainfall event, which can then be used to calculate a flooding hydrograph.
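The observed-hydrograph case can be illustrated with a minimal sketch. It assumes SI units, a constant baseflow, and a fixed time step. The text does not give HydroLang's actual units or signature, so `unitHydrographFromObserved` is a hypothetical name.

```javascript
// Minimal sketch: unit hydrograph from an observed discharge hydrograph.
// Assumptions (not from HydroLang): SI units, constant baseflow, fixed time
// step. The function name is hypothetical.

/**
 * @param {number[]} discharge - observed discharge ordinates (m³/s)
 * @param {number} baseflow - constant baseflow to remove (m³/s)
 * @param {number} areaKm2 - drainage area (km²)
 * @param {number} dtSeconds - time step between ordinates (s)
 * @returns {{unitHydrograph: number[], depthMm: number}}
 */
function unitHydrographFromObserved(discharge, baseflow, areaKm2, dtSeconds) {
  // Direct runoff hydrograph: observed flow minus baseflow, clipped at zero.
  const drh = discharge.map((q) => Math.max(q - baseflow, 0));
  // Runoff volume (m³) = sum of ordinates multiplied by the time step.
  const volume = drh.reduce((sum, q) => sum + q, 0) * dtSeconds;
  // Equivalent runoff depth over the basin, converted from m to mm.
  const depthMm = (volume / (areaKm2 * 1e6)) * 1000;
  if (depthMm === 0) throw new Error("no direct runoff above baseflow");
  // Scaling the DRH to 1 mm of runoff depth gives the unit hydrograph.
  return { unitHydrograph: drh.map((q) => q / depthMm), depthMm };
}

// Usage: hourly ordinates over a hypothetical 3.6 km² basin.
const result = unitHydrographFromObserved([5, 15, 25, 15, 5], 5, 3.6, 3600);
console.log(result.depthMm);        // ≈ 40 (mm)
console.log(result.unitHydrograph); // ≈ [0, 0.25, 0.5, 0.25, 0] (m³/s per mm)
```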
The Floodhydro function generates a flooding hydrograph based on the physical characteristics of a basin. It offers two approaches (i.e., SCS and empirical).

The SCS approach calculates abstractions based on the CN and combines the resulting excess rainfall with a unit hydrograph to produce a composite hydrograph. The empirical method calculates runoff from the total volume of the unit hydrograph expressed as depth.

Using the principles of superposition and proportionality, the flooding hydrograph is calculated by convolving the unit hydrograph with the rainfall intensity expressed as pulses. Each pulse produces a response of the system, found by multiplying the rainfall pulse by the unit hydrograph. Each subsequent response is displaced by one time step. The final hydrograph is obtained by summing all responses, including the baseflow if given as a parameter.
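The SCS runoff equation and the pulse convolution can be sketched as follows. Several assumptions apply:
- depths are in inches;
- the initial abstraction is the standard Ia = 0.2S;
- cumulative runoff is differenced into incremental excess pulses;
- the unit hydrograph is sampled at the same time step as the pulses.

The function names are hypothetical and do not reflect HydroLang's API.

```javascript
// Minimal sketch of SCS-CN excess rainfall followed by discrete convolution
// with a unit hydrograph. Units (inches) and function names are assumptions.

/** Cumulative SCS-CN runoff depth (in) for cumulative rainfall P (in). */
function scsRunoff(P, CN) {
  const S = 1000 / CN - 10; // potential maximum retention (in)
  const Ia = 0.2 * S;       // initial abstraction (in)
  return P > Ia ? (P - Ia) ** 2 / (P - Ia + S) : 0;
}

/** Converts incremental rainfall depths into incremental excess pulses. */
function excessPulses(rainfall, CN) {
  let cumP = 0;
  let prevQ = 0;
  return rainfall.map((p) => {
    cumP += p;
    const Q = scsRunoff(cumP, CN); // cumulative runoff so far
    const pulse = Q - prevQ;       // runoff generated in this time step
    prevQ = Q;
    return pulse;
  });
}

/**
 * Superposition: each pulse scales the unit hydrograph and is lagged by one
 * time step per pulse index; the responses are summed and baseflow added.
 */
function convolve(pulses, unitHydrograph, baseflow = 0) {
  const flood = new Array(pulses.length + unitHydrograph.length - 1).fill(baseflow);
  pulses.forEach((p, i) => {
    unitHydrograph.forEach((u, j) => {
      flood[i + j] += p * u;
    });
  });
  return flood;
}

// Usage: CN = 80 (as in Case Study 1), two hypothetical rainfall steps.
const excess = excessPulses([1, 2], 80);   // sums to 1.25 in of runoff
console.log(convolve([1, 2], [0, 1, 0.5], 0.5)); // [0.5, 1.5, 3, 1.5]
console.log(convolve(excess, [0, 1, 0.5]));
```

Keeping runoff generation (`excessPulses`) separate from routing (`convolve`) mirrors the two stages in the text. Either stage can be replaced, for example by feeding empirically derived pulses into the same convolution.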
The Rainaggr function can aggregate and disaggregate rainfall to adjust its temporal resolution; resolution enhancement (disaggregation) relies on pretrained models.

The Ground1d function simulates steady-state one-dimensional groundwater flow using both the heads and the discharge of a system. It is based on parameters such as length, the k constant, node count, head, and discharge of the medium. The final discharge is solved with a first-order Runge-Kutta (i.e., forward Euler) method.
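A first-order Runge-Kutta (forward Euler) march along a 1-D flow path can be illustrated with a minimal sketch. The sketch assumes an unconfined aquifer under the Dupuit approximation. In steady state without recharge, the discharge per unit width q' = -k h dh/dx is then constant. The text does not state HydroLang's actual governing equation, units, or signature, so `dupuitHeads` and `dupuitExact` are hypothetical names.

```javascript
// Minimal sketch: steady 1-D unconfined flow (Dupuit), heads marched with a
// first-order Runge-Kutta (forward Euler) scheme. Assumptions: uniform k,
// no recharge, constant discharge per unit width q' = -k h dh/dx.
// Function names and signature are hypothetical, not HydroLang's API.
function dupuitHeads(length, k, nodes, headUpstream, qPerWidth) {
  if (nodes < 2) throw new Error("at least two nodes are required");
  const dx = length / (nodes - 1);
  const heads = [headUpstream];
  for (let i = 1; i < nodes; i++) {
    const h = heads[i - 1];
    if (h <= 0) throw new Error("water table reached the aquifer base");
    const dhdx = -qPerWidth / (k * h); // slope from Darcy-Dupuit equation
    heads.push(h + dx * dhdx);         // forward Euler step
  }
  return heads;
}

// Analytical Dupuit solution for comparison: h(x)^2 = h0^2 - 2 q' x / k.
function dupuitExact(x, k, headUpstream, qPerWidth) {
  return Math.sqrt(headUpstream ** 2 - (2 * qPerWidth * x) / k);
}

// Usage: 100 m flow path, k = 10 m/day, h0 = 10 m, q' = 2 m²/day.
const profile = dupuitHeads(100, 10, 1001, 10, 2);
console.log(profile[1000], dupuitExact(100, 10, 10, 2)); // exact ≈ 7.746 m
```

Because forward Euler is first-order accurate, halving the node spacing should roughly halve the error against the analytical profile.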

NeuralNets Module
NeuralNets subcomponent is designed as a wrapper for TensorFlow.js, an open-source, hardware-accelerated JavaScript library that enables client-side machine learning on web-based platforms (Smilkov et al., 2019). It can be used to train, test, and run predictive models (Xiang and Demir, 2020). It can also customize networks (e.g., hidden layers, activation functions, bias, learning rate, batch size, number of epochs, loss function, optimizer, metrics) using client hardware resources. This reduces dependence on a centralized server and preserves data privacy. It introduces neural networks to the HydroLang framework and simplifies building hydrological models by combining data retrieval, preprocessing, statistical and hydrological analysis, and predictive modelling in a one-stop ecosystem.
Hydrological systems, like other physical phenomena, often require the modelling of complex hydraulic, environmental, and geophysical data. Such data possess intrinsic relationships and may not always explicitly manifest their underlying characteristics (Marçais and De Dreuzy, 2017). For this reason, deep learning has proven to be a valuable tool in hydrological tasks, as it can produce actionable and robust knowledge from noisy and multi-faceted data at a higher level of conceptualization (Sit et al., 2020). Within HydroLang, TensorFlow.js can be used to perform machine learning tasks (e.g., classification, sequence and matrix predictions, regression, segmentation, reinforcement learning, unsupervised learning). These tasks apply across subfields of hydrology, including floods, land use, water quality and resources, surface water, groundwater, soil moisture, and weather.

Visualization Component
Visualization component comprises the functionality to generate visual data reports (i.e., charts and tables) based on data retrieved or generated within HydroLang. Each visualization object returned by the component is associated with a semi-encapsulated division (div) element. This allows developers to place and manipulate it on the web page while protecting the element's inner integrity.
The user may choose from different external libraries (e.g., Google Charts, D3) when constructing charts and tables. Because a web page usually follows a theme, these visual representations of data are expected to share common styling. For this reason, a separate class (i.e., styles) defines stylistic attributes that can be passed directly to table and chart creation functions as an object, eliminating redundancy.

Map Component
Map component encapsulates all operations and dynamics related to map-based visualizations. It provides a single, comprehensive interface to geospatial functionality regardless of the underlying map engine. Thus, depending on user preferences and system requirements, different map engines (e.g., Leaflet, Google Maps) can be used without major modifications to the code. The main functionalities of this component are:
- initializing a map instance with the preferred engine;
- configuring and rendering the map with appropriate parameters (e.g., location, zoom, tile type);
- creating markers with annotations;
- drawing geometric features (e.g., polygon, line, point) from KML and GeoJSON files, as well as from data structures generated within HydroLang; and
- styling.

Community Oriented Development
HydroLang uses a modular architecture and open-source libraries and requires no installation. This flexibility provides a unique opportunity for scalability and upgrades. It also creates the potential for the library to grow into a community-based framework with contributions from research institutions and individuals with expertise. Interested parties can customize and extend HydroLang to suit specific use cases, development environments, project requirements, and data resources.

An external tool (i.e., Documentation.js) is integrated into the HydroLang framework to document the library. It automatically generates a human-readable documentation website by parsing the code and the comments written in JSDoc syntax. The documentation gives an overview of the library's functions, the input and output data types for interoperability, and descriptions of the employed methodology. It is hosted on a website that will be updated as the library grows. Additionally, a detailed and practical guideline on how to extend and modify the framework, along with sample code snippets, is published as part of the GitHub repository (https://github.com/uihilab/HydroLang).

In addition to the core library, a separate repository, HydroLang-Models (https://github.com/uihilab/HydroLang-Models), was created for HydroLang users to share their models, code, and case studies. Others can browse and build on top of this shared work, thus creating a community. HydroLang, along with HydroLang-Models, is published on GitHub under the MIT License. Both are available for use by academics and professionals in the water domain around the world.
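For readers unfamiliar with JSDoc, the sketch shows the kind of comment block that documentation.js parses into a documentation page. The helper `cfsToCms` is hypothetical and not part of HydroLang. It only illustrates the `@param`, `@returns`, and `@example` tags that describe a function's input and output types.

```javascript
/**
 * Converts discharge from cubic feet per second to cubic meters per second.
 * Hypothetical helper used only to illustrate JSDoc tags; not part of HydroLang.
 *
 * @param {number} cfs - Discharge in cubic feet per second.
 * @returns {number} Discharge in cubic meters per second.
 * @example
 * cfsToCms(180); // about 5.097
 */
function cfsToCms(cfs) {
  // 1 ft = 0.3048 m exactly, so 1 ft³ = 0.3048³ m³ = 0.028316846592 m³.
  return cfs * 0.028316846592;
}

console.log(cfsToCms(180)); // ≈ 5.097
```

Running documentation.js over such a file renders these tags as an HTML reference page, for example with `documentation build index.js -f html -o docs`. The exact entry file and options depend on the project layout.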

Results and Discussions
HydroLang is designed as a plug-and-play JS library encapsulating numerous functions to retrieve, analyze, and visualize geospatial data for hydrological tasks. In operational and research settings, it can augment existing web-based information systems with an easy-to-use, integrated hydrological toolkit. In educational settings, it can be used to teach hydrological processes to K-12 and college-level students. Its chain-like commands with intelligible names allow students to perform hydroinformatics tasks intuitively, as if they were forming a sentence in plain English. Furthermore, the framework is suitable for integration into curricula to create assignments with defined tasks.
One of the main benefits and distinctions of the framework is that it encapsulates advanced hydrological and analytical (e.g., machine learning) functionality on the client side. The entire chain of data processing (e.g., retrieval, cleaning, analysis) requires only a single codebase, eliminating the need for external programs and manual work. Enabling machine learning within JavaScript reduces the learning curve and hides technical complexity behind multiple layers of abstraction. Running ML models in the browser also brings two advantages. It increases accessibility by providing a working environment with access to client hardware that requires no installation or configuration. It increases speed by avoiding the transfer of large amounts of user data to a server for model execution (Rivera, 2020).
In addition, on-device computation minimizes privacy concerns by keeping user data and sensor readings on the client. This connectivity and privacy, along with access to a wide range of client devices, open paths for crowdsourcing and citizen science applications in hydrology (Sermet et al., 2020a). The privacy- and community-centric approach can benefit decision support systems (Sermet et al., 2020b; Xu et al., 2020) and operational use cases by federal and state organizations. Such applications can achieve high performance by leveraging emerging web technologies for distributed (Agliamzanov et al., 2020) or parallel computing, such as WebAssembly, WebGL, WebGPU, and WebRTC (Smilkov et al., 2019).

Case Studies
Two extensive case studies have been developed to demonstrate the framework's capabilities and how its modular nature allows adoption for different purposes and data resources. These case studies show the workflow (e.g., objects, functions) for performing routines that are common in hydrological tasks. They are not intended to be comprehensive or novel hydrological analyses. Rather, they are examples of how HydroLang can be used in common hydrological analysis and research tasks. In both case studies, the entire implementation and execution take place within HydroLang using the tools it provides.

Case Study 1 - Evaluation of Lumped Models for Medium-Size Basins
This case study presents an evaluation of lumped models on medium-size basins within the HydroLang framework. The SCS method and the bucket model are applied to their respective basins to generate hydrographs. These hydrographs are compared with observed runoff within basin-specific constraints. The basins selected for the experiment are the Upper Roanoke Basin in Virginia, USA, and the Morland Basin in Cumbria, England (Figure 3). The basins were selected solely on the basis of data availability and physical characteristics. Both basins have intensive agricultural land use and similar river and basin slopes.

SCS-based Hydrograph Generation for Upper Roanoke River Basin
The synthetic hydrograph analysis was performed for the Roanoke River within the Upper Roanoke Basin, which has a drainage area of 509 mi². The river has a median daily discharge of around 180 ft³/s. The basin is covered by 62% forest, 25% grassland and agriculture, and 10% urban area. The river spans 50.3 miles up to the gauging station. The analysis covered the period from August 30 to September 2, 2020 (Figure 4).

For analytical purposes, a CN value of 80 was determined based on the studies described by Li et al. (2018) for basins with similar characteristics. The travel-time parameters were obtained (Figure 5) using the synthetic calculators implemented within the framework. The terrain varies strongly between the headwaters and the gauging station. Nevertheless, an average river slope of 0.068% was assumed, as described by the City of Roanoke (City of Roanoke, 2016).

The travel times were calculated using the Kerby-Kirpich method. They were then used to select a dimensionless unit hydrograph with a peak rate factor of 238. This value was chosen because the basin terrain combines steep and flat areas, with flat areas predominating. The calculated travel times are 6.76, 4.73, and 4.06 hours for the time of concentration, time to peak, and lag time, respectively.
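As a quick arithmetic check, the reported lag time is consistent with the common NRCS (SCS) approximation relating lag time to time of concentration. That this particular relation was used in the analysis is an assumption; the text does not state it.

```latex
t_L \approx 0.6\, t_c = 0.6 \times 6.76~\text{h} = 4.056~\text{h} \approx 4.06~\text{h}
```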
Both the observed discharge and the rainfall measurements were obtained from the USGS API service using the functions included in HydroLang's Data Component. The data resolution was 15 minutes for discharge and 5 minutes for rainfall. To analyze the two series jointly, both were aggregated, with discharge taken as the average value over each period. For a robust and comparable analysis, the data were aggregated to a 1-hour resolution to create a synthetic flooding hydrograph using SCS.
The bucket model implemented within the framework was verified with sensor data for the Morland Basin. These data contain hourly rainfall, evaporation, and observed runoff from October 1, 2011 through November 30, 2012. The basin has an area of 12.7 km². Its land use is categorized as 3% urban area, 60% agriculture, 3% bare rock, 27% grassland, and 7% forest. The field capacities for these land uses are 5 mm, 25 mm, 5 mm, 25 mm, and 50 mm, respectively. The data were obtained from the Environment Agency UK (EAUK).
The efficiency of the models at different time resolutions was evaluated with the Nash-Sutcliffe efficiency coefficient, the coefficient of determination (r²), and the index of agreement (d). The data for both basins were analyzed at hourly, 6-hour, 12-hour, and daily resolutions. These metrics were selected because they are commonly used for the validation and calibration of such models (Krause et al., 2005). The hydrograph derived using the synthetic method performed poorly for long-term events in large basins but performed well for single-event hydrographs in small basins. The selection of the type of dimensionless hydrograph was key to simulating a realistic rainfall pattern, which itself is constrained by the type of terrain, slope, and land use.
Nonetheless, synthetic methods rely solely on the physical characteristics of a basin. Complex numerical systems, by contrast, account for changes in topographical features and wave propagation. For this reason, the SCS method allows fast computations in the case of a flooding event. The methods showing the best performance are the deconvolution unit hydrograph and the bucket model at coarser time resolutions. The synthetic dimensionless hydrograph, by contrast, shows differences in performance for the reasons stated above. The bucket model performs well for datasets that span longer periods of time, but not at an hourly scale (Figure 6). The sensitivity of the efficiency metrics shows increasing agreement between observed and modelled volumes as the simulation period is extended (Figure 7).
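The three efficiency metrics and the temporal aggregation can be illustrated with a minimal sketch. It uses the standard definitions of Nash-Sutcliffe efficiency, r² as the squared Pearson correlation, and Willmott's index of agreement. Discharge is averaged over each block, as described in the text. Summing rainfall depths within a block is an assumption, and incomplete trailing blocks are simply dropped. The function names are hypothetical, not HydroLang's API.

```javascript
// Minimal sketch of model-evaluation metrics and block aggregation.
// Function names are hypothetical; definitions follow the standard formulas.
const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

/** Nash-Sutcliffe efficiency: 1 - Σ(O-S)² / Σ(O-Ō)². */
function nashSutcliffe(obs, sim) {
  const mo = mean(obs);
  let num = 0;
  let den = 0;
  for (let i = 0; i < obs.length; i++) {
    num += (obs[i] - sim[i]) ** 2;
    den += (obs[i] - mo) ** 2;
  }
  return 1 - num / den;
}

/** Coefficient of determination as the squared Pearson correlation. */
function rSquared(obs, sim) {
  const mo = mean(obs);
  const ms = mean(sim);
  let cov = 0;
  let vo = 0;
  let vs = 0;
  for (let i = 0; i < obs.length; i++) {
    cov += (obs[i] - mo) * (sim[i] - ms);
    vo += (obs[i] - mo) ** 2;
    vs += (sim[i] - ms) ** 2;
  }
  return (cov * cov) / (vo * vs);
}

/** Willmott's index of agreement: 1 - Σ(O-S)² / Σ(|S-Ō| + |O-Ō|)². */
function indexOfAgreement(obs, sim) {
  const mo = mean(obs);
  let num = 0;
  let den = 0;
  for (let i = 0; i < obs.length; i++) {
    num += (obs[i] - sim[i]) ** 2;
    den += (Math.abs(sim[i] - mo) + Math.abs(obs[i] - mo)) ** 2;
  }
  return 1 - num / den;
}

/**
 * Aggregates a series into blocks of `stepsPerBlock` values, e.g. 4 for
 * 15-minute to 1-hour. "sum" suits rainfall depths (assumption); "mean"
 * suits discharge. An incomplete trailing block is dropped.
 */
function aggregate(series, stepsPerBlock, method) {
  const out = [];
  for (let i = 0; i + stepsPerBlock <= series.length; i += stepsPerBlock) {
    const block = series.slice(i, i + stepsPerBlock);
    out.push(method === "sum" ? block.reduce((s, x) => s + x, 0) : mean(block));
  }
  return out;
}

// Usage with a toy series whose simulation is biased by +1 everywhere.
const observed = [1, 2, 3];
const simulated = [2, 3, 4];
console.log(nashSutcliffe(observed, simulated));    // -0.5
console.log(rSquared(observed, simulated));         // 1
console.log(indexOfAgreement(observed, simulated)); // ≈ 0.7273 (8/11)
console.log(aggregate([1, 2, 3, 4, 5, 6], 3, "sum")); // [6, 15]
```

The toy example shows why several metrics are reported together. A constant bias leaves r² at 1, while NSE and d penalize it.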

Case Study 2 - Rainfall Disaggregation using Neural Networks
This case study explores the use of the machine learning libraries integrated into HydroLang to create reusable models for hydrology and environmental sciences. Using TensorFlow.js, a rainfall disaggregation model is created to increase the resolution of 1-hour rainfall data to 15 minutes.

A rainfall station in Altavista, Virginia, USA, was selected to retrieve both 1-hour and 15-minute rainfall measurements between 1984 and 1987. The raw data were preprocessed and sorted for further hydrological analysis following the implementation described by Burian et al. (2000). The storms for the training and evaluation sets were selected from the center of the cumulative depth distribution of each storm within the given time frame, thus excluding extreme events.

The model was implemented as a 3-layer feed-forward neural network with 1 hidden layer of 11 neurons. The combined dataset of 1-hour and 15-minute rainfall data, which contains 96 storms, was divided into two blocks for training and validation. The sigmoid function was selected as the activation function for both the hidden layer and the output layer. The model was compiled with Adaptive Moment Estimation (Adam) (Kingma and Ba, 2014) as the optimizer, binary cross-entropy as the loss function, and mean squared error as the evaluation metric. The number of training epochs was set to 1000 with a learning rate of 0.19. Both values were selected empirically after several iterations to find an optimum with regard to computation time (Figure 8).

The final performance of the model was evaluated with the mean absolute error (MAE), for which smaller values indicate better prediction accuracy. For the validation dataset, the model produced an MAE of 0.551, indicating reasonable agreement with the observed sensor readings (Figure 9).
This model was based on a single station without being informed on factors such as seasonality, extreme events, and stochastic differences, and thus, did not achieve strong or generalizable performance in comparison to what is available in the literature using advanced models and computing (Poschlod et al., 2018). However, the sole purpose of this case study is to highlight HydroLang's vision and capability regarding the range of hydrological applications with predictive aspects that can be implemented within the framework, and to provide a boilerplate for community adoption. Generalized rainfall disaggregation using neural networks is an active research area (Sit et al., 2020) for which this study showcases its potential implementation in web-based systems.

Conclusions and Future Work
This paper introduces HydroLang, a web-based, modular, integrable, extensible, and open-source software framework. It is a full-scale solution for hydrological research and education use cases to retrieve, manipulate, analyze, visualize, and model water-related data on the web. It is implemented in JavaScript (JS) following the ECMAScript 6 standard in a modular structure that enables chain-like commands. It is packaged to be served as a single JS file ready to use on any web platform. The core library is grouped under four main components for:
(1) retrieving, manipulating, and transforming raw hydrological data;
(2) statistical operations (i.e., Stats), hydrological functions (i.e., Hydro), and creating models (i.e., NeuralNets);
(3) generating graphical and tabular data representations; and
(4) mapping and geospatial data visualization.
Two case studies (i.e., evaluation of lumped models and development of a rainfall disaggregation model) have been presented. They demonstrate the capabilities of the framework and serve as a guide for further adoption in the hydrological domain.
HydroLang's main objective is to provide an easy-to-use framework for research and education. The framework provides basic, customizable tools for data-driven projects in hydrology and hydraulics, with a structure that eases adoption in other fields. It relies only on open-source libraries for functionality and data retrieval, and it runs in web browsers using client hardware. These properties make it an adequate tool for research. HydroLang is built upon a modular architecture specifically designed to be tailored to different use cases, software stacks, organizations, and data resources. Many kinds of additions can be introduced in pursuit of a consensual and comprehensive hydrological software toolkit supported by the domain community. These include new data resources, data types, external libraries and map engines, hydrological and statistical functions, neural network configurations, visualization methods, and other modifications and expansions. To support community building, the authors have developed and published detailed documentation, thorough guidelines for adoption and extension, and a repository for sharing models, data, and case studies.
The open-source release of HydroLang will allow the scientific community to contribute to the framework toward a more complete solution. The framework can be ported to server-side JavaScript environments (e.g., Node.js) to make more computational power available for specialized cases. Server-side use would make the analysis of large-scale data feasible. More specifically, enhancements can be made per module, such as adding new data resource endpoints and hydrological models (e.g., runoff). The NeuralNets component can be expanded with predesigned configurations suited to specific use cases for off-the-shelf use with custom data. Finally, the framework can be enhanced to handle geospatial data types generated by desktop GIS applications.