This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Geospatial Machine Learning for Predicting Flash Flood Response at Ungauged Appalachian Watersheds: Terrain, Soil, and Land Cover Controls
Abstract
Flash floods remain among the deadliest weather hazards in the United States, yet the majority of flood-prone watersheds in the Appalachian region lack streamflow monitoring. Predicting flood response characteristics at these ungauged sites requires understanding which landscape properties control hydrologic behavior. This study evaluates whether geospatial basin descriptors derived from high-resolution terrain, soil, and land cover datasets can predict seven flood response metrics across 49 gauged Appalachian watersheds spanning seven states (Virginia, West Virginia, North Carolina, Tennessee, Kentucky, Georgia, and Pennsylvania). Predictor variables were extracted from the USGS 3D Elevation Program (10 m), the National Land Cover Database (30 m), and the NRCS Soil Data Access service. Four model families were compared using leave-one-out spatial cross-validation: regularized linear models (Ridge, ElasticNet), tree-based models (Random Forest, XGBoost), and Gaussian Process Regression (GPR) with multiple kernel configurations. Results show that GPR with a Matern 1.5 kernel achieves the highest predictive skill for the Q95 discharge ratio (R-squared = 0.46) and mean rise rate (R-squared = 0.73), while regularized linear models perform comparably or better for other targets. Flashiness index and coefficient of variation of annual peaks are not predictable from static geospatial descriptors (R-squared approximately equal to 0), indicating that these properties depend on storm characteristics rather than landscape attributes. Spearman correlation analysis identifies basin relief (rho = -0.58, p < 0.001) and drainage area (rho = -0.42, p < 0.01) as the strongest correlates of flood response. SHAP-based feature importance analysis confirms that terrain properties dominate across most targets, contributing 42 to 69 percent of total importance. 
GPR prediction intervals are well calibrated, with empirical coverage of the nominal 95 percent intervals ranging from 88 to 95 percent across targets. These findings suggest that geospatial machine learning can provide moderate predictive skill for flood magnitude indicators at ungauged Appalachian sites, but flashiness metrics require dynamic storm-event information that static basin descriptors cannot capture.
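The two evaluation quantities reported in the abstract, held-out R-squared and the empirical coverage of 95 percent prediction intervals, can be illustrated with a minimal sketch. It assumes that each left-out watershed already has a predictive mean and standard deviation (as a Gaussian process would supply) and that the intervals are central Gaussian intervals. The function names `interval_coverage` and `r_squared` and the five sample values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def interval_coverage(observed, pred_mean, pred_std, level=0.95):
    """Fraction of observations inside the central Gaussian prediction interval."""
    if not (len(observed) == len(pred_mean) == len(pred_std)):
        raise ValueError("inputs must have equal length")
    if not observed:
        raise ValueError("inputs must be non-empty")
    # Two-sided z-score for the requested level (about 1.96 for 0.95).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(
        1
        for y, m, s in zip(observed, pred_mean, pred_std)
        if m - z * s <= y <= m + z * s
    )
    return hits / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination computed on held-out predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values for five watersheds (illustrative only,
# not data from the study).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_mean = [1.1, 1.8, 3.5, 3.9, 7.5]
pred_std = [0.2, 0.2, 0.3, 0.2, 0.5]

coverage = interval_coverage(observed, pred_mean, pred_std)
skill = r_squared(observed, pred_mean)
print(f"coverage = {coverage:.2f}, R-squared = {skill:.3f}")
```

In this toy case four of the five observations fall inside their intervals, so the empirical coverage is below the nominal 95 percent. With the 49 watersheds described above, each watershed contributes roughly two percentage points to the coverage estimate, which is worth keeping in mind when reading the 88 to 95 percent range.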
DOI
https://doi.org/10.31223/X5ZB5F
Subjects
Physical Sciences and Mathematics
Keywords
flash flood susceptibility; ungauged basins; geospatial machine learning; Gaussian process regression; SHAP interpretability; Appalachian hydrology; uncertainty quantification; terrain analysis
Dates
Published: 2026-04-14 15:09
Last Updated: 2026-04-14 15:09
Additional Metadata
Conflict of interest statement:
No conflicts
Data Availability:
Available through USGS