This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.
Authors
Abstract
Machine learning technology promises a more efficient and scalable approach to locating and aggregating data and information from the burgeoning scientific literature. Realizing this promise requires applications, data resources, and documented analytic workflows. GeoDeepDive provides a digital library comprising over 13 million peer-reviewed documents and the computing infrastructure upon which to build and deploy search and text-extraction capabilities using regular expressions and natural language processing. Here we present a model GeoDeepDive workflow and accompanying R package to show how GeoDeepDive can be employed to extract spatiotemporal information about site-level records in the geoscientific literature. We apply these capabilities to a proof-of-concept subset of papers in a case study to generate a preliminary distribution of ice-rafted debris (IRD) records in both space and time. We use regular expressions and natural language processing utilities to extract and plot reliable latitude-longitude pairs from publications that mention IRD, and also extract age estimates from those publications. This workflow and R package provide researchers from the geosciences and allied disciplines with a general set of tools for querying spatiotemporal information from GeoDeepDive for their own science questions.
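To make the regular-expression step in the abstract concrete, here is a minimal sketch of how latitude-longitude pairs and age estimates might be pulled from a sentence. It is not the authors' implementation: their workflow is an R package, while this illustration is written in Python. The patterns, the function names (`to_decimal`, `extract_pairs`, `extract_ages_years`), the coordinate format handled (degrees with optional minutes and seconds plus a hemisphere letter), and the example sentence are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: degrees, optional minutes and seconds, hemisphere letter.
COORD = re.compile(
    r"(\d{1,3}(?:\.\d+)?)\s*°\s*"            # degrees
    r"(?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?"    # optional minutes
    r'(?:(\d{1,2}(?:\.\d+)?)\s*["″]\s*)?'    # optional seconds
    r"([NSEW])"                              # hemisphere
)

# Hypothetical pattern: a number followed by "ka" (thousand years) or "Ma" (million years).
AGE = re.compile(r"(\d+(?:\.\d+)?)\s*(ka|Ma)\b")


def to_decimal(deg, minutes, seconds, hemi):
    """Convert degree/minute/second strings to signed decimal degrees."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemi in "SW" else value


def extract_pairs(sentence):
    """Return (lat, lon) tuples from an N/S match directly followed by an E/W match."""
    hits = [(m.group(4), to_decimal(*m.groups())) for m in COORD.finditer(sentence)]
    pairs = []
    i = 0
    while i < len(hits) - 1:
        (h1, v1), (h2, v2) = hits[i], hits[i + 1]
        # Keep only plausible pairs: latitude first, both values within valid ranges.
        if h1 in "NS" and h2 in "EW" and abs(v1) <= 90 and abs(v2) <= 180:
            pairs.append((v1, v2))
            i += 2
        else:
            i += 1
    return pairs


def extract_ages_years(sentence):
    """Return ages in years for every 'ka' or 'Ma' expression in the sentence."""
    scale = {"ka": 1e3, "Ma": 1e6}
    return [float(num) * scale[unit] for num, unit in AGE.findall(sentence)]


# Invented example sentence, not taken from any publication.
text = "A hypothetical core at 54°15'N, 16°50'W contains an IRD layer dated to 18.5 ka."
pairs = extract_pairs(text)
ages = extract_ages_years(text)
```

Requiring a latitude match to be followed directly by a longitude match, and range-checking both values, is one simple way to discard stray degree symbols (temperatures or angles, for example) that a bare pattern would otherwise pick up. A production workflow would need to handle many more coordinate and age formats than this sketch does.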
DOI
https://doi.org/10.31223/X54312
Subjects
Earth Sciences, Glaciology, Sedimentology
Keywords
Digital Libraries, Geoinformatics, Ice Rafted Debris, Open Workflows, Text Mining
Dates
Published: 2021-07-02 03:39
License
CC BY Attribution 4.0 International
Additional Metadata
Conflict of interest statement:
None