Interpretable Quality Control of Sparsely Distributed Environmental Sensor Networks Using Graph Neural Networks

Elżbieta Krystyna Lasota; Timo Houben; Julius Polz; Lennart Schmidt; Luca Glawion; David Schäfer; Jan Bumberger; Christian Chwala

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Abstract

Environmental sensor networks play a crucial role in monitoring key parameters essential for understanding Earth’s systems. Effective quality control (QC) measures are essential to ensure the reliability and accuracy of the collected data. Conventional QC methods struggle to handle the complexity of environmental data, whereas advanced techniques, such as neural networks, are typically not designed to process data from sensor networks with an irregular spatial distribution. In this study, we focus on anomaly detection in environmental sensor networks using graph neural networks, which can represent sensor network structures as graphs. We investigate their performance on two datasets with distinct dynamics and resolution: commercial microwave link (CML) signal levels used for rainfall estimation, and SoilNet soil moisture measurements. To evaluate the benefit of incorporating information from neighboring sensors into anomaly detection, we compare two models: a Graph Convolutional Network (GCN) and a graph-less baseline long short-term memory (LSTM) network. Our robust evaluation through 5-fold cross-validation demonstrates the superiority of the GCN models. For CML, the mean area under the curve (AUC) was 0.941 for the GCN compared to 0.885 for the baseline LSTM; for SoilNet, it was 0.858 for the GCN and 0.816 for the baseline LSTM. Visual inspection of CML time series revealed that the GCN proficiently classified anomalies and remained resilient against rain-induced events that the baseline LSTM often misidentified. For SoilNet, however, the advantage of the GCN was less pronounced, likely due to a fragile labeling strategy. Through interpretable model analysis, we demonstrate how feature attributions vividly illustrate the significance of neighboring sensor data, particularly in distinguishing between anomalies and expected changes in signal level in the time series.
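The core idea of a graph convolution, letting each sensor's representation draw on its neighbors, can be sketched in a few lines of NumPy. This is a minimal sketch assuming the standard GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the distance-threshold graph construction, the function names `build_adjacency`, `normalize_adjacency`, and `gcn_layer`, and the toy coordinates and features are all hypothetical illustrations, not the graph construction or architecture used in this preprint.

```python
import numpy as np


def build_adjacency(coords: np.ndarray, max_dist: float) -> np.ndarray:
    """Hypothetical graph construction: connect sensors closer than max_dist."""
    # Pairwise Euclidean distances between sensor locations
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= max_dist).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added during normalization
    return adj


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the standard GCN rule."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph convolution: average each sensor with its neighbors, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Toy network (hypothetical): three nearby sensors and one isolated sensor
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])

adj = build_adjacency(coords, max_dist=1.5)
a_norm = normalize_adjacency(adj)
out = gcn_layer(a_norm, features, np.eye(2))
print(out[0])  # [0.6667 0.6667]: sensor 0 now reflects its two neighbors
print(out[3])  # [5. 5.]: the isolated sensor keeps only its own features
```

In this toy case, a change that appears at all three neighboring sensors at once (analogous to rain affecting several nearby CMLs) shifts their shared representation together. A graph-less model sees each sensor alone and has no such context.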

DOI

https://doi.org/10.31223/X5WT3W

Subjects

Artificial Intelligence and Robotics, Earth Sciences

Keywords

quality control, graph neural network, time series, environmental sensor network, anomaly detection, machine learning

Dates

Published: 2024-05-02 08:04

Last Updated: 2024-05-02 12:04

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None

Data Availability (Reason not available):
The CML data supporting this research were provided to the authors by Ericsson. These data are not publicly available because Ericsson restricted their distribution due to its commercial interest. To obtain CML data for research purposes, a separate, individual agreement with the network provider must be established. The SoilNet data used in this study are available upon request at https://www.ufz.de/record/dmp/logger/806/en/.