This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.
Abstract
On September 18-19, 2019, a workshop on Securing Legacy Data to Enable Future Discoveries was held in Albuquerque, New Mexico, engaging 29 researchers representing universities, national laboratories, and governmental agencies, including 4 international and 10 early career participants. The need and funding for this workshop grew out of a June 2018 event focused on legacy seismic data organized by the National Academy of Sciences Committee on Seismology and Geodynamics (NAS COSG), which sparked interest at both the National Science Foundation and the U.S. Geological Survey. The NAS COSG event identified a number of technical as well as financial challenges in trying to collect and build the datasets necessary to address key problems spanning large time periods that require legacy data. Not only are such datasets essential to evaluate global change in microseisms and extend time series of precursory phenomena, they are also a crucial first step toward machine learning and other data-intensive processing. Nevertheless, critical paper and magnetic tape records are at risk of loss or severe degradation.
Presentations and discussions at the meeting were organized around three main themes: science drivers, data preservation, and future directions. Through a series of presentations by the participants, the workshop reviewed examples of tectonic, volcanic, national security, and climate science questions that can best be addressed using legacy data. Next, the workshop reviewed past and ongoing legacy data preservation efforts in the US and internationally, enabling participants to consider best practices. This second theme covered several software products designed to improve the scanning of analog legacy data. Several large-scale international scanning and digitizing projects, such as those by the Istituto Nazionale di Geofisica e Vulcanologia in Italy, SISMOMEx by the Universidad Nacional Autónoma de México, and Harvard University, were described in detail. These efforts show that, with sufficient resources, significant volumes of analog data can be preserved and securely archived. However, they currently address only a small fraction of the high-value legacy data available worldwide.
The workshop participants identified opportunities to coordinate activities internationally to achieve consensus on metadata standards. Initially, 39 metadata elements were identified. These elements can be grouped into 6 broad categories that parameterize the data: 1) Time of Data, 2) Station/Channel, 3) Sensor, 4) Recording System, 5) Image File, and 6) Other. Participants were surveyed as to whether each element should be required, recommended, optional, or omitted. After the workshop, 20 additional metadata elements were contributed to the list. To reach consensus and maximize the utility of these efforts, additional vetting by the international community is warranted.
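As a rough illustration of how such a survey could be tallied, the following Python sketch groups elements under the six categories and records votes for the four survey choices. The class, its methods, the element name `start_time`, and the tie-handling rule are all hypothetical; the abstract does not list the individual elements or describe how responses were aggregated.

```python
from collections import Counter
from dataclasses import dataclass, field

# The six categories and four survey choices named in the abstract.
CATEGORIES = (
    "Time of Data", "Station/Channel", "Sensor",
    "Recording System", "Image File", "Other",
)
STATUSES = ("required", "recommended", "optional", "omitted")


@dataclass
class MetadataElement:
    """One candidate metadata element (names used here are hypothetical)."""
    name: str
    category: str
    votes: Counter = field(default_factory=Counter)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

    def record_vote(self, status):
        """Count one participant's response for this element."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        self.votes[status] += 1

    def leading_status(self):
        """Return the most-voted status, or None if there are no votes or a tie.

        Treating a tie as 'no consensus' is an assumption of this sketch.
        """
        ranked = self.votes.most_common(2)
        if not ranked:
            return None
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return None
        return ranked[0][0]


# Usage: three hypothetical responses for one hypothetical element.
element = MetadataElement("start_time", "Time of Data")
for response in ["required", "required", "recommended"]:
    element.record_vote(response)
print(element.leading_status())  # prints: required
```

Returning `None` on a tie keeps unresolved elements visible, which fits the point above that further community vetting is needed before consensus is declared.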
At the end of the workshop, a list of next steps for legacy data activities was developed and is summarized below:
Analog holdings catalog. Create an inventory of analog seismic data holdings to identify current resources, connect potential users to resources, and aid in metadata discovery.
Publications database. Create a database of research publications that use analog data to serve as a resource for other researchers, inspire new studies, and provide evidence of the importance of these data.
Data Availability. Develop policies to encourage legacy data submission to data centers, working with existing centers on sustainable financial models.
Standards. Begin work on creating FAIR-compliant metadata standards to enable federated discovery and access. Establish best practices and standards for imaging and digitizing, learning from established projects.
Pilot Project. Identify existing repositories to pilot federated search, access, and retrieval of multiple data and metadata types using the proposed metadata standards. A pilot study will help to demonstrate the data’s value, enable consensus on standardization, and advance data processing workflows.
Future Research. Identify strategies to enable future research through open source and standardization of both data and software. Identify targeted campaigns with specific research objectives that define the high-priority science questions, such as identifying key stations for which all records should be imaged and specific earthquakes for historical analysis.
New Technologies. Identify enabling technologies to reduce human intervention in the end-to-end process of creating research-ready time series data.
Other Communities. Attract a broader scientific community to apply seismological data in nontraditional research domains, and engage communities with similar needs in preserving analog time series data.
Outreach. Create a larger community of users through outreach at all career levels.
DOI
https://doi.org/10.31223/osf.io/dre8m
Subjects
Earth Sciences, Geophysics and Seismology, Physical Sciences and Mathematics
Keywords
analog data, analog seismic data, data rescue, legacy data, legacy seismic data, seismogram, workshop
Dates
Published: 2020-02-21 17:07