Published: 2025
Authors: Pin-Ching Li, Sayan Dey, Venkatesh Merwade
Abstract
Machine learning (ML) models are alternatives to traditional hydrologic models for streamflow prediction in ungauged basins (PUB). The variability in watershed characteristics across ungauged basins, however, adds uncertainty to ML-based PUB frameworks. This uncertainty arises from inconsistency, known as covariate shift, between the statistical distribution of the dataset used to train and test an ML model and that of the real-world (global) dataset to which the trained model is applied. Covariate shift is a widespread issue for ML in real-world applications, yet it has not been investigated in hydrological applications. This study evaluates the uncertainty of ML-based PUB methods, including Random Forest (RF) and Artificial Neural Network (ANN) models, under covariate shift. A Monte Carlo approach aggregates RF and ANN simulations obtained under various data-splitting configurations into predictive distributions. The results indicate that ML performance is not robust under covariate shift and is influenced by heterogeneous watershed characteristics such as drainage area, dam density, and urbanized area. Between 20% and 48% of the simulation results depart from a normal distribution under the different covariate shift scenarios. Furthermore, the efficiency and limitations of RF models for PUB are highlighted by investigating their biased predictions in watersheds with varying dam density, drainage area, and meteorological variables such as annual snowfall and annual precipitation.
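The Monte Carlo aggregation over data splits described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbour regressor stands in for the RF and ANN models so the sketch runs on the Python standard library alone (scikit-learn's `RandomForestRegressor` could be swapped in), the basin attributes and flows are synthetic, and the names `knn_predict`, `ks_statistic`, `monte_carlo_pub`, and `jarque_bera` are hypothetical. Each run trains on a different random subset of gauged basins, and the predictions for one ungauged basin are pooled into a predictive distribution. The two-sample Kolmogorov-Smirnov statistic is used here as one possible measure of how far each training subset drifts from the full (global) set, and the Jarque-Bera test flags departures from normality; the abstract does not say which shift metric or normality test the study used, so both are assumptions.

```python
import bisect
import math
import random
import statistics

# Hypothetical basin record: (attributes, observed mean flow). Attributes are
# assumed pre-scaled to [0, 1], e.g. (drainage_area, dam_density).
Basin = tuple[tuple[float, ...], float]


def knn_predict(train: list[Basin], x: tuple[float, ...], k: int = 3) -> float:
    """Stand-in regressor (not RF/ANN): mean flow of the k nearest training basins."""
    nearest = sorted(train, key=lambda b: math.dist(b[0], x))[:k]
    return statistics.fmean(flow for _, flow in nearest)


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(a, p) / len(a) - bisect.bisect_right(b, p) / len(b))
        for p in set(a) | set(b)
    )


def monte_carlo_pub(basins, target_x, n_runs=200, train_frac=0.7, seed=0):
    """Refit on a new random split each run. Returns the predictive sample for an
    ungauged basin at target_x and each split's KS shift in attribute 0."""
    rng = random.Random(seed)
    global_attr0 = [attrs[0] for attrs, _ in basins]
    n_train = round(train_frac * len(basins))
    preds, shifts = [], []
    for _ in range(n_runs):
        # Held-out basins would form the test set for skill scores (omitted here).
        train = rng.sample(basins, n_train)
        preds.append(knn_predict(train, target_x))
        shifts.append(ks_statistic([attrs[0] for attrs, _ in train], global_attr0))
    return preds, shifts


def jarque_bera(sample: list[float]) -> float:
    """Jarque-Bera statistic; values above 5.991 reject normality at the 5% level."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2, m3, m4 = (sum((v - mean) ** p for v in sample) / n for p in (2, 3, 4))
    if m2 == 0:
        raise ValueError("sample has zero variance")
    skew, kurt = m3 / m2**1.5, m4 / m2**2
    return n / 6 * (skew**2 + (kurt - 3) ** 2 / 4)


if __name__ == "__main__":
    # Synthetic basins: flow rises with drainage area and falls with dam density.
    rng = random.Random(42)
    basins = []
    for _ in range(40):
        area, dams = rng.random(), rng.random()
        basins.append(((area, dams), 10 + 50 * area - 20 * dams + rng.gauss(0, 2)))

    preds, shifts = monte_carlo_pub(basins, target_x=(0.5, 0.3))
    print(f"predictive mean {statistics.fmean(preds):.1f}, sd {statistics.stdev(preds):.2f}")
    print(f"Jarque-Bera {jarque_bera(preds):.2f} (normality rejected if > 5.991)")
    print(f"mean KS shift in attribute 0: {statistics.fmean(shifts):.3f}")
```

Drawing the training subsets deliberately biased in one attribute (for example, only basins with small drainage areas) instead of uniformly at random is one way to construct stronger covariate shift scenarios with the same loop.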