In Defense of Metrics: Metrics Sufficiently Encode Typical Human Preferences Regarding Hydrological Model Performance

This is a Preprint and has not been peer reviewed. A published version of this Preprint is available. This is version 3 of this Preprint.


Martin Gauch, Frederik Kratzert, Oren Gilon, Hoshin Gupta, Juliane Mai, Grey Nearing, Bryan Tolson, Sepp Hochreiter, Daniel Klotz


Building accurate rainfall-runoff models is an integral part of hydrological science and practice. The variety of modeling goals and applications has led to a large suite of evaluation metrics for these models. Yet hydrologists still place considerable trust in visual judgment, although it is unclear whether such judgment agrees with existing quantitative metrics. In this study, we asked 622 experts to compare and judge more than 14,000 pairs of hydrographs from 13 different models. Our results show that expert opinion broadly agrees with quantitative metrics and yields a clear preference for a machine learning model over traditional hydrological models. The expert opinions are, however, subject to considerable inconsistency. Nevertheless, where experts agree, we can predict their opinion purely from quantitative metrics, which indicates that the metrics sufficiently encode human preferences in a small set of numbers. While there remains room to improve quantitative metrics, we suggest that the hydrologic community should reinforce its benchmarking efforts and put more trust in these metrics.



Subjects: Earth Sciences, Hydrology, Physical Sciences and Mathematics, Water Resource Management


Keywords: hydrology, metrics, visual inspection, Rainfall-Runoff, expert judgment, machine learning


Published: 2022-10-19 21:38

Last Updated: 2023-02-08 02:38

CC BY Attribution 4.0 International
