Geospatial Machine Learning for Predicting Flash Flood Response at Ungauged Appalachian Watersheds: Terrain, Soil, and Land Cover Controls


This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.


Authors

Sujan Bhattarai 

Abstract

Flash floods remain among the deadliest weather hazards in the United States, yet most flood-prone watersheds in the Appalachian region lack streamflow monitoring. Predicting flood response at these ungauged sites requires understanding which landscape properties control hydrologic behavior. This study evaluates whether geospatial basin descriptors derived from high-resolution terrain, soil, and land cover datasets can predict seven flood response metrics across 49 gauged Appalachian watersheds in seven states (Virginia, West Virginia, North Carolina, Tennessee, Kentucky, Georgia, and Pennsylvania). Predictor variables were extracted from the USGS 3D Elevation Program (10 m), the National Land Cover Database (30 m), and the NRCS Soil Data Access service. Four model families were compared using leave-one-out spatial cross-validation: regularized linear models (Ridge, ElasticNet), tree-based models (Random Forest, XGBoost), and Gaussian Process Regression (GPR) with multiple kernel configurations. GPR with a Matérn kernel (ν = 1.5) achieves the highest predictive skill for the Q95 discharge ratio (R² = 0.46) and mean rise rate (R² = 0.73), while regularized linear models perform comparably or better for the other targets. Flashiness index and the coefficient of variation of annual peaks are not predictable from static geospatial descriptors (R² ≈ 0), indicating that these properties depend on storm characteristics rather than landscape attributes. Spearman correlation analysis identifies basin relief (ρ = -0.58, p < 0.001) and drainage area (ρ = -0.42, p < 0.01) as the strongest correlates of flood response. SHAP-based feature importance analysis confirms that terrain properties dominate across most targets, contributing 42 to 69 percent of total importance. GPR prediction intervals are well calibrated: observed coverage of the nominal 95 percent intervals ranges from 88 to 95 percent across targets. These findings suggest that geospatial machine learning can provide moderate predictive skill for flood magnitude indicators at ungauged Appalachian sites, but that flashiness metrics require dynamic storm-event information that static basin descriptors cannot capture.
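
As an illustration of the evaluation described in the abstract, the following is a minimal sketch of leave-one-out cross-validation for a Gaussian process with a Matérn ν = 1.5 kernel, followed by the R² and 95 percent interval coverage calculations. It is not the author's code. The data are synthetic stand-ins for 49 basins, the function names (`matern32`, `gp_predict`, `loo_gp`) are hypothetical, the kernel hyperparameters are fixed by assumption rather than optimized, and the loop is plain leave-one-out without any spatial handling.

```python
import numpy as np

def matern32(A, B, length_scale=1.0, variance=1.0):
    """Matern kernel with smoothness nu = 1.5 between rows of A and B."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    s = np.sqrt(3.0) * d / length_scale
    return variance * (1.0 + s) * np.exp(-s)

def gp_predict(X_tr, y_tr, X_te, noise=0.1, **kern):
    """Posterior mean and predictive std (noise included) of a zero-mean GP."""
    K = matern32(X_tr, X_tr, **kern) + noise * np.eye(len(X_tr))
    Ks = matern32(X_te, X_tr, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    prior = matern32(X_te, X_te, **kern).diagonal()
    var = prior - (v ** 2).sum(0) + noise
    return mean, np.sqrt(np.maximum(var, 0.0))

def loo_gp(X, y, **kw):
    """Leave-one-out predictions; scaling is fit on the training fold only."""
    n = len(y)
    mean, std = np.empty(n), np.empty(n)
    for i in range(n):
        tr = np.arange(n) != i
        mu_x, sd_x = X[tr].mean(0), X[tr].std(0)
        mu_y, sd_y = y[tr].mean(), y[tr].std()
        m, s = gp_predict((X[tr] - mu_x) / sd_x,
                          (y[tr] - mu_y) / sd_y,
                          (X[i:i + 1] - mu_x) / sd_x, **kw)
        mean[i] = m[0] * sd_y + mu_y   # back to original units
        std[i] = s[0] * sd_y
    return mean, std

# Synthetic stand-in data (assumption): 49 basins, 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 3))
y = -0.8 * X[:, 0] - 0.4 * X[:, 1] + 0.3 * rng.normal(size=49)

mean, std = loo_gp(X, y, noise=0.1, length_scale=2.0)

# Cross-validated R-squared and empirical coverage of 95% intervals.
r2 = 1.0 - ((y - mean) ** 2).sum() / ((y - y.mean()) ** 2).sum()
coverage = np.mean(np.abs(y - mean) <= 1.96 * std)
print(f"LOO R2 = {r2:.2f}, 95% interval coverage = {coverage:.2f}")
```

Coverage close to 0.95 indicates that the predictive intervals are neither too narrow nor too wide. With only 49 held-out points, each basin shifts the coverage estimate by about two percentage points.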

DOI

https://doi.org/10.31223/X5ZB5F

Subjects

Physical Sciences and Mathematics

Keywords

flash flood susceptibility; ungauged basins; geospatial machine learning; Gaussian process regression; SHAP interpretability; Appalachian hydrology; uncertainty quantification; terrain analysis

Dates

Published: 2026-04-14 15:09

Last Updated: 2026-04-14 15:09

License

No Creative Commons license

Additional Metadata

Conflict of interest statement:
No conflicts

Data Availability:
Available through USGS
