Data Science for Geoscience: Recent Progress and Future Trends from the Perspective of a Data Life Cycle

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Xiaogang Ma 

Abstract

Data science is receiving increasing attention in a variety of geoscience disciplines and applications. Many successful data-driven geoscience discoveries have been reported recently, and the number of geoinformatics and data science sessions has begun to increase at many geoscience conferences. Across academia, industry, and government, there is strong interest in learning more about the current progress and the potential of data science for geoscience. To address that need, this article provides a review from the perspective of a data life cycle. The key steps in the data life cycle include concept, collection, preprocessing, analysis, archive, distribution, discovery, and repurpose. These steps are intuitive and easy to follow, even for geoscientists with very limited experience with cyberinfrastructure, statistics, and machine learning. The review has two key parts. The first covers the fundamental concepts and theoretical foundations of data science, and the second summarizes highlights and shareable experience from existing publications, organized around each step in the data life cycle. The article closes with a vision of future trends in data science applications in geoscience, including open science, smart data, and the science of team science. We hope this review will be useful to data science practitioners in the geoscience community and will lead to more discussion of best practices and future trends of data science for geoscience.

DOI

https://doi.org/10.31223/X55S4D

Subjects

Artificial Intelligence and Robotics, Databases and Information Systems, Earth Sciences, Environmental Sciences, Numerical Analysis and Scientific Computing, Theory and Algorithms

Keywords

cyberinfrastructure, open science, science of team science, workflow platform, data-driven geoscience discovery

Dates

Published: 2021-05-03 23:46

Last Updated: 2021-05-04 02:46

License

CC BY Attribution 4.0 International