A Model Workflow for GeoDeepDive: Locating Pliocene and Pleistocene Ice-Rafted Debris

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.




Simon Goring, Jeremiah Marsicek, Shan Ye, John Warren Williams, Stephen Meyers, Shanan E Peters, Daven Quinn, Allen Schaen, Brad Singer, Shaun Marcott


Machine learning technology promises a more efficient and scalable approach to locating and aggregating data and information from the burgeoning scientific literature. Realizing this promise requires the provision of applications and data resources and the documentation of analytic workflows. GeoDeepDive provides a digital library comprising over 13 million peer-reviewed documents and the computing infrastructure upon which to build and deploy search and text-extraction capabilities using regular expressions and natural language processing. Here we present a model GeoDeepDive workflow and accompanying R package to show how GeoDeepDive can be employed to extract spatiotemporal information about site-level records in the geoscientific literature. We apply these capabilities to a proof-of-concept subset of papers in a case study to generate a preliminary distribution of ice-rafted debris (IRD) records in both space and time. We use regular expressions and natural language processing utilities to extract and plot reliable latitude-longitude pairs from publications containing IRD, and we also extract age estimates from those publications. This workflow and R package provide researchers from the geosciences and allied disciplines with a general set of tools for querying spatiotemporal information from GeoDeepDive for their own science questions.
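The regular-expression step described above can be illustrated with a minimal sketch. It is written in Python for illustration only and does not reproduce the accompanying R package; the pattern, the function names (`to_decimal`, `extract_pairs`), and the example sentence are all hypothetical. It assumes coordinates are written as degrees with optional minutes and seconds followed by a hemisphere letter, and that a latitude is immediately followed by its longitude.

```python
import re

# Hypothetical pattern (an assumption, not the authors' expression):
# degrees, optional minutes, optional seconds, then a hemisphere letter.
COORD = re.compile(
    r"""
    (?<![\d.])(\d{1,3}(?:\.\d+)?)\s*°\s*  # degrees, possibly decimal
    (?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?    # optional minutes
    (?:(\d{1,2}(?:\.\d+)?)\s*["″]\s*)?    # optional seconds
    ([NSEW])                              # hemisphere letter
    """,
    re.VERBOSE,
)


def to_decimal(deg, minutes, seconds, hemi):
    """Convert one matched coordinate to signed decimal degrees."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemi in "SW" else value


def extract_pairs(sentence):
    """Return (lat, lon) tuples for each N/S match directly followed by an E/W match."""
    matches = COORD.findall(sentence)
    pairs = []
    i = 0
    while i + 1 < len(matches):
        first, second = matches[i], matches[i + 1]
        if first[3] in "NS" and second[3] in "EW":
            lat, lon = to_decimal(*first), to_decimal(*second)
            # Keep only values inside the valid geographic range.
            if abs(lat) <= 90 and abs(lon) <= 180:
                pairs.append((lat, lon))
            i += 2
        else:
            i += 1
    return pairs


print(extract_pairs("Core site at 45°30'N, 120°15'W contained IRD."))
# prints [(45.5, -120.25)]
```

Requiring a hemisphere letter after the degree sign keeps strings such as temperatures ("4°C") from being read as coordinates, and the range check discards pairs that cannot be valid latitudes or longitudes.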




Earth Sciences, Glaciology, Sedimentology


Digital Libraries, Geoinformatics, Ice Rafted Debris, Open Workflows, Text Mining


Published: 2021-07-02 04:39


CC BY Attribution 4.0 International

