This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Towards HydroLLM: A Benchmark Dataset for Hydrology-Specific Knowledge Assessment for Large Language Models
Abstract
The rapid advancement of Large Language Models (LLMs) has enabled their integration into a wide range of scientific disciplines. This paper introduces a comprehensive benchmark dataset specifically designed for testing recent large language models in the hydrology domain. Leveraging a collection of research articles and a hydrology textbook, we generated a wide array of hydrology-specific questions in various formats, including True/False, Multiple Choice, Open-Ended, and Fill-in-the-Blank. These questions serve as a robust foundation for evaluating the performance of state-of-the-art LLMs, including GPT-4o-mini, Llama3:8B, and Llama3.1:70B, in addressing domain-specific queries. Our evaluation framework employs accuracy metrics for objective question types and cosine similarity measures for subjective responses, ensuring a thorough assessment of the models' proficiency in understanding and responding to hydrological content. The results underscore both the capabilities and limitations of Artificial Intelligence (AI)-driven tools within this specialized field, providing valuable insights for future research and the development of educational resources. By introducing HydroLLM-Benchmark, this study contributes a vital resource to the growing body of work on domain-specific AI applications, demonstrating the potential of LLMs to support complex, field-specific tasks in hydrology.
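The abstract describes two scoring modes: exact-match accuracy for objective formats (True/False, Multiple Choice, Fill-in-the-Blank) and cosine similarity for open-ended answers. The following is a minimal sketch of that idea using simple bag-of-words vectors; the paper itself likely computes similarity over learned text embeddings, and the example texts below are hypothetical, not drawn from the dataset.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using bag-of-words term counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Exact-match accuracy for objective question types (e.g. True/False)."""
    correct = sum(p.strip().lower() == a.strip().lower()
                  for p, a in zip(predictions, answers))
    return correct / len(answers) if answers else 0.0

# Objective questions: exact-match accuracy.
print(accuracy(["True", "False", "B"], ["True", "True", "B"]))

# Open-ended questions: similarity of a model answer to a reference answer.
reference = "Infiltration is the process by which water enters the soil surface."
model_answer = "Infiltration describes how water moves into the soil through its surface."
print(round(cosine_similarity(reference, model_answer), 3))
```

A graded similarity score lets partially correct open-ended answers earn partial credit, which an exact-match metric cannot capture.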
DOI
https://doi.org/10.31223/X5R410
Subjects
Environmental Engineering
Keywords
benchmark dataset, large language models, hydrology, Question Generation, Domain-Specific AI, natural language processing
Dates
Published: 2025-03-15 19:34
Last Updated: 2025-03-16 02:32
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
None
Data Availability (Reason not available):
Data is shared in the paper.