From Points to Predictions: Data Curation for Geospatial Machine Learning

Louis Saumier; Joe R. Melton; Scott Winton

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Abstract

The quality of training datasets can have a large impact on Machine Learning (ML) models, yet this aspect of the pipeline frequently receives less scrutiny than it should. In the context of geospatial mapping from point-scale field data, quality control strategies to remove erroneous or misleading data can be applied prior to model training to improve performance. However, such strategies and their resulting impact are rarely reported, compared to extensive discussions of model selection and tuning. To investigate the potential for spatial data error correction, we examine the case of peatland mapping from peat core samples. We assess several curation strategies and compare fully automated filters against filters that require monitoring by domain experts. We find that cleaning strategies based on location precision and landcover classification filtering to detect mismatches can significantly improve performance metrics. We also find that blind reliance on fully automated classification may lead to worse results. Despite the additional effort required, we conclude that manual spatial data quality control processes are an important component of large-scale spatial modelling and discuss recommended approaches to scale them effectively for large datasets.
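The two curation ideas named in the abstract, a location-precision filter and a landcover-mismatch check, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the preprint: the `Sample` fields, the 100 m threshold, the list of landcover classes treated as implausible for peat, and the `curate` function itself. The sketch routes mismatches to a review list rather than deleting them, in line with the abstract's caution against blind reliance on fully automated classification.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    is_peat: bool            # field label derived from the core
    location_error_m: float  # reported positional uncertainty, in metres
    landcover: str           # landcover class looked up at the point


# Hypothetical values, chosen only for illustration.
MAX_LOCATION_ERROR_M = 100.0
IMPLAUSIBLE_FOR_PEAT = {"water", "urban", "bare_rock"}


def curate(samples, max_error_m=MAX_LOCATION_ERROR_M,
           implausible=IMPLAUSIBLE_FOR_PEAT):
    """Split samples into (kept, dropped, needs_review).

    - dropped: positional uncertainty exceeds the threshold
    - needs_review: labelled peat but sitting on an implausible landcover
      class; flagged for a domain expert instead of removed automatically
    - kept: everything else
    """
    kept, dropped, needs_review = [], [], []
    for s in samples:
        if s.location_error_m > max_error_m:
            dropped.append(s)
        elif s.is_peat and s.landcover in implausible:
            needs_review.append(s)
        else:
            kept.append(s)
    return kept, dropped, needs_review


samples = [
    Sample("a", True, 10.0, "wetland"),
    Sample("b", True, 500.0, "wetland"),
    Sample("c", True, 5.0, "urban"),
    Sample("d", False, 5.0, "urban"),
]
kept, dropped, needs_review = curate(samples)
print([s.sample_id for s in kept])          # ['a', 'd']
print([s.sample_id for s in dropped])       # ['b']
print([s.sample_id for s in needs_review])  # ['c']
```

Keeping the review list separate from the dropped list is a design choice of this sketch: it makes the expert-monitored step explicit and leaves an audit trail of which points were removed automatically and which were escalated.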

DOI

https://doi.org/10.31223/X59N2H

Subjects

Physical Sciences and Mathematics

Keywords

geospatial machine learning, data-centric machine learning, peatland mapping, location accuracy, landcover filtering

Dates

Published: 2026-01-20 14:33

Last Updated: 2026-01-20 14:33

License

No Creative Commons license
