This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1016/j.envsoft.2022.105364. This is version 2 of this Preprint.
Abstract
Sensors measuring environmental phenomena at high frequency commonly report anomalies related to fouling, sensor drift and calibration, and datalogging and transmission issues. The suitability of these data for analysis and decision making often depends on manual review and adjustment. Machine learning techniques have the potential to automate the identification and correction of anomalies, streamlining the quality control process. We explored approaches for automating anomaly detection and correction of aquatic sensor data for implementation in a Python package (PyHydroQC). We applied both classical and deep learning time series regression models that estimate values, identify anomalies based on dynamic thresholds, and offer correction estimates. Techniques were developed and their performance assessed using data reviewed, corrected, and labeled by technicians in an aquatic monitoring use case. Auto-Regressive Integrated Moving Average (ARIMA) models consistently performed best, and aggregating results from multiple models improved detection. PyHydroQC includes custom functions and a workflow for anomaly detection and correction.
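The general pattern the abstract describes (estimate each value, flag observations whose residual exceeds a dynamic threshold, and offer the estimate as a correction) can be illustrated with a minimal sketch. This is not PyHydroQC's API or the authors' implementation. A rolling median of neighboring points stands in for the ARIMA and deep learning models, and the function name `detect_and_correct` and the parameters `window`, `k`, and `min_width` are hypothetical, chosen only for illustration.

```python
import numpy as np


def detect_and_correct(values, window=5, k=4.0, min_width=0.5):
    """Flag anomalies against a locally scaled threshold and correct them.

    Assumption: a rolling median of neighboring points stands in for a
    fitted time series model (ARIMA, LSTM, ...). All names are hypothetical.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    half = window // 2
    estimate = np.empty(n)
    threshold = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # Estimate the point from its neighbors, excluding the point itself.
        neighbors = np.delete(values[lo:hi], i - lo)
        estimate[i] = np.median(neighbors)
        # Dynamic threshold: widens where the local variability is larger
        # (scaled median absolute deviation), with a fixed minimum width.
        mad = np.median(np.abs(neighbors - estimate[i]))
        threshold[i] = max(min_width, k * 1.4826 * mad)
    residual = values - estimate
    anomalous = np.abs(residual) > threshold
    # Offer the model estimate as the correction for flagged points only.
    corrected = np.where(anomalous, estimate, values)
    return anomalous, corrected


# Synthetic example: a smooth signal with one injected spike at index 20.
clean = 10.0 + np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
observed = clean.copy()
observed[20] += 5.0
flags, corrected = detect_and_correct(observed)
print(np.flatnonzero(flags), corrected[20], clean[20])
```

In this sketch the threshold adapts to local variability, so a noisy stretch of record tolerates larger residuals than a quiet one. A real workflow would replace the rolling median with a fitted forecasting model and would compare the flags against technician labels.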
DOI
https://doi.org/10.31223/X5Z62X
Subjects
Biogeochemistry, Civil Engineering, Environmental Engineering, Environmental Monitoring, Hydrology
Keywords
data management, aquatic sensors, quality control, anomaly detection, Python
Dates
Published: 2021-07-23 12:13
Last Updated: 2022-03-26 11:18