WaterBench: A Large-scale Benchmark Dataset for Data-Driven Streamflow Forecasting

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.




Ibrahim Demir, Zhongrun Xiang, Bekir Zahit Demiray, Muhammed Sit 


This study proposes WaterBench, a comprehensive benchmark dataset for streamflow forecasting that follows FAIR data principles and is prepared for convenient use in data-driven and machine learning studies. It also provides benchmark performance for state-of-the-art deep learning architectures on the dataset for comparative analysis. By aggregating streamflow, precipitation, watershed area, slope, soil type, and evapotranspiration data from federal agencies and state organizations (i.e., NASA, NOAA, USGS, and the Iowa Flood Center), we provide WaterBench for hourly streamflow forecasting studies. The dataset has high temporal and spatial resolution with rich metadata and relational information, and it can be used for a variety of deep learning and machine learning research. We define a sample streamflow forecasting task for the next 120 hours and provide performance benchmarks on this task with sample linear regression and deep learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and sequence-to-sequence (S2S) models. To some extent, WaterBench makes up for the lack of a unified benchmark in earth science research. We highly encourage researchers to use WaterBench for deep learning research in hydrology.
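To illustrate the shape of the 120-hour forecasting task described in the abstract, the following is a minimal sketch rather than the authors' pipeline. It slices an hourly series into (history, future) pairs and scores a naive persistence baseline. The 72-hour input window, the synthetic series, the function names, and the choice of mean absolute error are all assumptions made for illustration; none of them come from the WaterBench release.

```python
HORIZON = 120  # forecast length in hours, as stated in the abstract
HISTORY = 72   # input window length in hours; an assumption for illustration


def make_windows(series, history=HISTORY, horizon=HORIZON):
    """Slice an hourly series into (past, future) pairs."""
    pairs = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon]
        pairs.append((past, future))
    return pairs


def persistence_forecast(past, horizon=HORIZON):
    """Naive baseline: repeat the last observed value for every future hour."""
    return [past[-1]] * horizon


def mean_absolute_error(predicted, observed):
    """Average absolute difference between forecast and observation."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Synthetic hourly values with a daily cycle; NOT real WaterBench data.
synthetic_flow = [10.0 + (hour % 24) * 0.5 for hour in range(400)]

windows = make_windows(synthetic_flow)
errors = [mean_absolute_error(persistence_forecast(p), f) for p, f in windows]
print("samples:", len(windows))
print("mean MAE of persistence baseline:", sum(errors) / len(errors))
```

A learned model such as the linear regression, LSTM, GRU, or S2S baselines named in the abstract would take the place of `persistence_forecast` while consuming the same windowed pairs, so a naive baseline like this one gives a floor against which such models can be compared.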




Civil and Environmental Engineering, Computer Sciences, Earth Sciences, Hydrology



Published: 2021-12-31 18:27


CC BY Attribution 4.0 International
