ml4xcube: Machine Learning Toolkits for Earth System Data Cubes

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Julia Peters, Anja Neumann, Marco Jaeger, Lukas Gienapp, Josefine Umlauft

Abstract

Rapidly changing climate conditions and the increase in extreme events pose severe challenges to human life and infrastructure, requiring sophisticated analytical capabilities for hazard prediction and disaster risk management. Earth System Data Cubes (ESDCs) have become an essential tool in Earth System Sciences (ESS) by organizing large-scale, multivariate environmental datasets into a structured, scalable and analysis-ready format. However, modern machine learning (ML) techniques are not yet being used to their full potential on ESDCs, owing to a lack of proper tooling, domain-specific challenges, and high barriers to entry for practitioners. We introduce ml4xcube, an open-source Python framework designed to assist ESS domain experts in applying ML techniques to ESDCs for advanced analysis and prediction of environmental variables and impacts. Through a comprehensive suite of tools, it addresses challenges specific to ESS data, such as non-uniform data distribution due to dynamic gaps and the spatio-temporal autocorrelation of environmental variables. Its modular architecture covers the complete analysis process, from data exploration and preparation to model development, result interpretation and evaluation. With support for distributed computing, it handles large ESDC datasets efficiently. To ease adoption, it includes extensive documentation and tutorial notebooks. We demonstrate ml4xcube’s capabilities through three examples that showcase its potential for integrating machine learning with ESDC data.

DOI

https://doi.org/10.31223/X5D13M

Subjects

Computer and Systems Architecture, Computer Engineering, Engineering, Geography, Remote Sensing, Social and Behavioral Sciences

Keywords

Earth System Data Cubes, machine learning, multivariate data analysis, high performance computing

Dates

Published: 2024-10-09 07:42

License

CC BY Attribution 4.0 International