Comparing Process-Based and Machine Learning Models for Streamflow Prediction in the Kaligandaki River Basin, Nepal

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Aayush Roka, Bishesh Khanal

Abstract

Reliable daily streamflow prediction is critical for hydropower operations, flood risk management, and irrigation planning in monsoon-dominated Himalayan river basins. Although both process-based and machine learning (ML) approaches have been applied to such tasks, systematic comparisons that decompose the sources of performance differences remain scarce. This study evaluates seven configurations for streamflow prediction in the Kaligandaki River Basin, Nepal: a process-based SWAT model, plus three XGBoost (XGB) and three Random Forest (RF) models representing pure rainfall-runoff, observed-lag, and recursive-simulation scenarios. An NSE gap decomposition framework is applied to quantify two distinct components of the information value of lagged discharge: the watershed memory benefit and the recursive error propagation cost. Over the five-year independent test period, SWAT achieved NSE = 0.851, while the pure rainfall-runoff models XGB-A and RF-A both achieved NSE = 0.840. The observed-lag upper-bound models reached NSE values above 0.94. RF recursive simulation (RF-C) achieved NSE = 0.861, exceeding SWAT, whereas XGB recursive simulation showed no improvement (XGB-C: NSE = 0.838), revealing a strong algorithm-dependent sensitivity to recursive error propagation. Flow duration curve analysis shows that SWAT underestimates low flows (exceedance probability > 60%) despite near-zero total PBIAS (−0.28%), reflecting compensating biases between the high- and low-flow regimes. SHAP analysis identifies the Antecedent Precipitation Index (API) as the dominant precipitation predictor, with 30-day cumulative rainfall as the second-ranked feature, confirming multi-week soil-moisture memory as the primary catchment-scale control on discharge in this large, monsoon-dominated basin. These results establish that well-engineered pure rainfall-runoff models match SWAT in aggregate NSE while substantially outperforming it in reproducing the flow distribution.
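The abstract depends on several standard hydrological quantities. The Python sketch that follows gives textbook forms of the Nash-Sutcliffe efficiency (NSE) and percent bias (PBIAS), a common recursive Antecedent Precipitation Index, and one plausible reading of the NSE gap decomposition. The function names, the API decay constant `k`, and the PBIAS sign convention are assumptions. The decomposition (memory benefit = NSE_B − NSE_A, propagation cost = NSE_B − NSE_C, where A, B, and C are the pure rainfall-runoff, observed-lag, and recursive configurations) is a hypothetical formulation and not necessarily the authors' exact framework.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over the variance of obs about its mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def pbias(obs, sim):
    """Percent bias. Assumed convention: 100 * sum(obs - sim) / sum(obs),
    so a positive value means the model underestimates total volume."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)


def antecedent_precipitation_index(precip, k=0.9):
    """Recursive API: API_t = k * API_{t-1} + P_t, starting from zero storage.
    The decay constant k = 0.9 is an illustrative assumption, not the study's value."""
    precip = np.asarray(precip, dtype=float)
    api = np.empty_like(precip)
    acc = 0.0
    for t, p in enumerate(precip):
        acc = k * acc + p  # decay yesterday's index, then add today's rainfall
        api[t] = acc
    return api


def nse_gap_decomposition(nse_a, nse_b, nse_c):
    """Hypothetical decomposition of lagged-discharge value.

    nse_a: pure rainfall-runoff model (no discharge lags)
    nse_b: observed-lag upper bound (true discharge lags supplied)
    nse_c: recursive simulation (model feeds back its own predicted lags)
    """
    memory_benefit = nse_b - nse_a      # skill gained if lags were error-free
    propagation_cost = nse_b - nse_c    # skill lost by using predicted lags
    net_gain = nse_c - nse_a            # identity: memory_benefit - propagation_cost
    return {
        "memory_benefit": memory_benefit,
        "propagation_cost": propagation_cost,
        "net_gain": net_gain,
    }
```

Under this reading, a recursive model beats its pure rainfall-runoff counterpart (as RF-C does relative to RF-A in the abstract) only when the memory benefit outweighs the propagation cost.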

DOI

https://doi.org/10.31223/X55N3N

Subjects

Civil and Environmental Engineering, Earth Sciences, Engineering, Environmental Sciences, Hydrology, Physical Sciences and Mathematics, Water Resource Management

Keywords

Streamflow prediction, SWAT model, Machine learning hydrology, XGBoost, Random Forest, Himalayan river basin, Kaligandaki River, Hydrological modeling, NSE decomposition, Recursive forecasting

Dates

Published: 2026-06-04 05:57

Last Updated: 2026-06-04 05:57

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None

Data Availability:
The data used in this study are available from publicly accessible sources and referenced datasets. Processed datasets and model configurations can be shared upon reasonable request.