A Model Workflow for GeoDeepDive: Locating Pliocene and Pleistocene Ice-Rafted Debris

Simon Goring; Jeremiah Marsicek; Shan Ye; John Warren Williams; Stephen Meyers; Shanan E Peters; Daven Quinn; Allen Schaen; Brad Singer; Shaun Marcott

This is a preprint and has not been peer reviewed. This is version 1 of this preprint.

Abstract

Machine learning technology promises a more efficient and scalable approach to locating and aggregating data and information from the burgeoning scientific literature. Realizing this promise requires providing applications and data resources and documenting analytic workflows. GeoDeepDive provides a digital library comprising over 13 million peer-reviewed documents, along with the computing infrastructure on which to build and deploy search and text-extraction capabilities using regular expressions and natural language processing. Here we present a model GeoDeepDive workflow and an accompanying R package that show how GeoDeepDive can be used to extract spatiotemporal information about site-level records in the geoscientific literature. As a proof of concept, we apply these capabilities to a subset of papers in a case study that generates a preliminary distribution of ice-rafted debris (IRD) records in both space and time. We use regular expressions and natural language processing utilities to extract and plot reliable latitude-longitude pairs from publications that mention IRD, and we also extract age estimates from those publications. Together, this workflow and R package provide researchers in the geosciences and allied disciplines with a general set of tools for querying spatiotemporal information from GeoDeepDive to address their own science questions.
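The abstract describes using regular expressions to pull latitude-longitude pairs and age estimates out of publication text. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' R package or its API. The patterns `COORD` and `AGE` and the helpers `to_decimal`, `extract_pairs`, and `extract_ages_ka` are hypothetical names introduced only for illustration. The patterns assume coordinates written as degrees with optional minutes and seconds followed by a hemisphere letter (e.g. 62°30'N), and ages written as a number followed by "Ma" or "ka".

```python
import re
from typing import List, Optional, Tuple

# Hypothetical coordinate pattern (an assumption, not the R package's regex):
# degrees (integer or decimal), optional minutes and seconds, then a
# hemisphere letter, e.g. 62°30'N or 10.5°S.
COORD = re.compile(
    r"(\d{1,3}(?:\.\d+)?)\s*[°º]\s*"          # degrees
    r"(?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?"     # optional minutes
    r'(?:(\d{1,2}(?:\.\d+)?)\s*["″]\s*)?'     # optional seconds
    r"([NSEW])\b"                             # hemisphere
)

# Hypothetical age pattern: a number followed by "Ma" (millions of years)
# or "ka" (thousands of years), e.g. "2.7 Ma" or "120 ka".
AGE = re.compile(r"(\d+(?:\.\d+)?)\s*(Ma|ka)\b")


def to_decimal(deg: str, minutes: Optional[str], seconds: Optional[str], hemi: str) -> float:
    """Convert degree/minute/second strings to signed decimal degrees."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    # Southern latitudes and western longitudes are negative.
    return -value if hemi in "SW" else value


def extract_pairs(text: str) -> List[Tuple[float, float]]:
    """Pair each latitude (N/S) with the longitude (E/W) that follows it."""
    pairs: List[Tuple[float, float]] = []
    pending_lat: Optional[float] = None
    for match in COORD.finditer(text):
        deg, minutes, seconds, hemi = match.groups()
        value = to_decimal(deg, minutes, seconds, hemi)
        if hemi in "NS":
            # Discard impossible latitudes instead of keeping them for pairing.
            pending_lat = value if abs(value) <= 90 else None
        else:
            # A longitude completes a pair only if a valid latitude precedes it.
            if pending_lat is not None and abs(value) <= 180:
                pairs.append((round(pending_lat, 4), round(value, 4)))
            pending_lat = None
    return pairs


def extract_ages_ka(text: str) -> List[float]:
    """Return every '<number> Ma' or '<number> ka' mention, converted to ka."""
    return [float(v) * (1000 if unit == "Ma" else 1) for v, unit in AGE.findall(text)]


if __name__ == "__main__":
    # Invented example sentence, not taken from any real publication.
    sentence = "Core X (62°30'N, 15°20'W) shows IRD layers from 2.7 Ma and 120 ka."
    print(extract_pairs(sentence))  # [(62.5, -15.3333)]
    print(extract_ages_ka(sentence))
```

Pairing each latitude with the next longitude and rejecting out-of-range values acts as a simple plausibility filter. Real article text uses many other coordinate conventions (for example, signed decimal degrees without hemisphere letters) that this sketch would miss.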

DOI

https://doi.org/10.31223/X54312

Subjects

Earth Sciences, Glaciology, Sedimentology

Keywords

Digital Libraries, Geoinformatics, Ice Rafted Debris, Open Workflows, Text Mining

Dates

Published: 2021-07-02 06:39

License

CC BY Attribution 4.0 International

Additional Metadata

Conflict of interest statement:
None