Capturing the spatio-temporal variability of groundwater pumping using remote sensing products and machine learning techniques: An assessment of training data quality and quantity implications on model performance

Author
Abstract

Data paucity has limited machine learning studies of groundwater, particularly of withdrawals, which drive depletion and must be monitored to ensure future groundwater sustainability. Machine learning techniques have provided significant insight in numerous studies, including studies of terrestrial water storage changes and trends. The reliability of machine learning results depends heavily on the quality and quantity of the available data. Nevertheless, the data quality and quantity required to produce a robust estimate of groundwater withdrawals are not well studied. In this study, we evaluated combinations of training and testing sets to understand the performance variability of Random Forest models that predict groundwater withdrawals. We performed the analysis in the Northwestern Kansas Groundwater Management District 4, which monitors withdrawals from all major use wells. The model used publicly available remote sensing, modeling, and observational products as input. The model covers 2008 to 2020 and is trained at the point scale (well locations) on subsets ranging from 5% to 95% of the available data to predict the volume of groundwater withdrawals. The predicted withdrawals are then aggregated to a 2 km by 2 km grid, producing volumetric water use estimates for the entire region. Prediction performance varied with the amount of training data used. Models trained on 75% to 95% of the total data achieved R² values ranging from 0.88 to 0.96, respectively. Model performance dropped significantly when the training data were reduced to 5% to 25% of the total, with corresponding R² values of 0.61 to 0.67. The feature importance analysis ranked precipitation and groundwater level change as the two most important variables.
This study has the potential to provide insight into the amount of data needed to accurately capture the spatio-temporal variability of groundwater withdrawals in regions with limited data availability. It may also improve our understanding of the drivers of groundwater withdrawals in regions with similar climate, anthropogenic activities, and aquifer systems.
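
The workflow described in the abstract (training a Random Forest on varying fractions of well-level data, scoring it with R² on the held-out wells, ranking feature importance, and summing point predictions onto a 2 km grid) can be sketched roughly as follows. This is a minimal illustration using scikit-learn and NumPy, not the authors' implementation. The data are synthetic, and the predictor set, the response formula, the coordinate system, the hyperparameters, and the function names `sweep_training_fraction` and `aggregate_to_grid` are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for well-level records (NOT the study's data).
n_wells = 2000
x_m = rng.uniform(0, 40_000, n_wells)    # hypothetical easting, metres
y_m = rng.uniform(0, 40_000, n_wells)    # hypothetical northing, metres
precip = rng.uniform(300, 600, n_wells)  # hypothetical precipitation
dgwl = rng.normal(-0.3, 0.2, n_wells)    # hypothetical groundwater level change
et = rng.uniform(400, 900, n_wells)      # hypothetical evapotranspiration
X = np.column_stack([precip, dgwl, et])
feature_names = ["precipitation", "gw_level_change", "evapotranspiration"]

# Made-up response, used only so the example has a signal to learn.
withdrawal = 500 - 0.6 * precip - 200 * dgwl + 0.1 * et + rng.normal(0, 20, n_wells)


def sweep_training_fraction(X, y, fractions, seed=0):
    """Train on each fraction of the data and return held-out R² per fraction."""
    scores = {}
    for frac in fractions:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=frac, random_state=seed
        )
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X_tr, y_tr)
        scores[frac] = r2_score(y_te, model.predict(X_te))
    return scores


def aggregate_to_grid(x, y, values, cell_size=2000.0):
    """Sum point values into square cells of side cell_size (metres)."""
    col = np.floor((x - x.min()) / cell_size).astype(int)
    row = np.floor((y - y.min()) / cell_size).astype(int)
    grid = np.zeros((row.max() + 1, col.max() + 1))
    np.add.at(grid, (row, col), values)  # accumulates repeated cell indices
    return grid


# Training fractions spanning the 5% to 95% range mentioned in the abstract.
scores = sweep_training_fraction(X, withdrawal, [0.05, 0.25, 0.75, 0.95])
for frac, r2 in scores.items():
    print(f"train fraction {frac:.2f}: held-out R2 = {r2:.2f}")

# Fit once on all wells to inspect feature importance and grid the predictions.
full_model = RandomForestRegressor(n_estimators=100, random_state=0)
full_model.fit(X, withdrawal)
importances = dict(zip(feature_names, full_model.feature_importances_))
predicted = full_model.predict(X)
grid = aggregate_to_grid(x_m, y_m, predicted)
print(importances)
print("grid shape:", grid.shape)
```

A random split by record is used here for simplicity. Whether the study split by well, by year, or in some other way is not stated in the abstract, so the scores printed by this sketch should not be compared with the reported R² values.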

Year of Publication
2023
Conference Name
AGU23
Date Published
12/2023
Conference Location
San Francisco
URL
https://ui.adsabs.harvard.edu/abs/2023AGUFM.H43I2192A/abstract