Towards HydroLLM: Building a Domain-Specific Language Model for Hydrology

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Authors

Dilara Kizilkaya, Yusuf Sermet, Ibrahim Demir

Abstract

As large language models (LLMs) continue to expand, their effective adaptation to specialized fields remains a critical challenge. This work presents an initial step toward the development of HydroLLM, a domain-specific LLM for hydrology. We construct a dataset of approximately 8,800 hydrology-focused question–answer pairs, each with a supporting context passage drawn from textbooks and scientific articles. The dataset includes four instructional formats: multiple choice, true/false, fill-in-the-blank, and open-ended. Using this corpus, we fine-tune several LLMs of varying type and scale, from compact (1.5B) to large (32B) parameter counts, using parameter-efficient LoRA (Low-Rank Adaptation) methods. We compare the fine-tuned models and evaluate their performance across task types using accuracy and cosine similarity metrics. Results show that larger model size is not always advantageous: among the fine-tuned models, the 8B DeepSeek Llama variant achieved the strongest overall performance, while the 32B model overfit and the 1.5B model underperformed, emphasizing the need to match model capacity to dataset size. This work demonstrates that effective domain adaptation requires careful consideration of model architecture, parameter count, and task complexity, with fill-in-the-blank tasks proving particularly challenging across all models. By establishing baseline performance and identifying the limits of current fine-tuning approaches, we take a concrete step toward building HydroLLM as a robust, domain-specific language model for hydrological analysis and decision support.
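
The abstract names accuracy and cosine similarity as the evaluation metrics but does not say how they are computed. The following is a minimal sketch, not the authors' implementation. It assumes exact-match accuracy for the closed formats (multiple choice, true/false, fill-in-the-blank) and, purely for illustration, a bag-of-words cosine similarity for open-ended answers. The paper may instead compare embedding vectors. The function names `accuracy` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def accuracy(predictions, references):
    """Fraction of exact matches after trimming whitespace and lowercasing.

    Assumption: closed-format answers are scored by exact string match.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts.

    Assumption: whitespace tokenization stands in for whatever text
    representation the study actually used.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, `accuracy` would be applied per task type to the closed formats, and `cosine_similarity` to each generated open-ended answer against its reference, with the scores averaged over the test set.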

DOI

https://doi.org/10.31223/X51H99

Subjects

Engineering

Keywords

HydroLLM, Large Language Models (LLMs), Fine-Tuning, hydrology, Question Generation, Domain-Specific AI, Natural Language Processing (NLP)

Dates

Published: 2025-07-11 10:38

Last Updated: 2025-07-11 19:58

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None

Data Availability (Reason not available):
All data is available upon reasonable request.