The PATCH Lab: A database and workspace for Cenozoic terrestrial paleoclimate and environment reconstruction

In the last two decades, analytical advances and a growing interest in relevant research questions have brought a rapid increase in the amount of stable isotope data used for reconstructing terrestrial paleoclimates and environments. As the spatial and temporal resolution of proxy data continues to improve, the quantitative interpretation of these data is becoming increasingly common. These advances in data resolution and theory bring opportunities for multi-proxy comparisons, synthesis and modeling of large datasets, integration with paleoecological datasets, improved climate model benchmarking, and more. Here, in an effort to support these growing avenues of research, we present The PATCH Lab (Paleo-Analysis of Terrestrial Climate and Hydrology), an online portal to discover, download, and quantitatively analyze Cenozoic terrestrial stable isotope data. The PATCH Lab portal includes a new database that currently holds 27009 stable isotope measurements from 211 publications spanning multiple terrestrial proxies, and quantitative models for interpreting water isotope and soil carbonate data. Data query, download, and modeling results are organized into user-friendly graphical interfaces that export datasets as .csv files. New data can be easily submitted to the PATCH Lab curators through the portal by completing a data submission template. The PATCH Lab, with the help of community engagement, serves as a resource for archiving terrestrial stable isotope data, building paleo "isoscapes", and increasing accessibility to quantitative methods of investigating terrestrial stable isotopes in paleoclimate.


Stable isotopes of oxygen, hydrogen, and carbon from terrestrial "proxy" materials are valuable tracers of water and climate on land throughout Earth history. In this paper, we will refer to these proxy data collectively as "terrestrial [...]

[...] when available (Table 1). Of these, 24906 are δ¹⁸O measurements, 22114 are δ¹³C, and 978 are δD. About 59% of [...]

[...] and area sampled increasing with bin duration. We bin the data in 5 million year intervals because this is similar to or greater than the age uncertainty for many terrestrial sections, and consistent with the timescales considered for the tectonic evolution of mountain ranges (Chamberlain and others, 2012). The extensive sample coverage of western North America demonstrates that climate and tectonics reconstructions on a regional (> 100 km) scale have grown increasingly tractable as data coverage approaches the coverage of the sedimentary record.

[...] other dimensions beyond the outcrop scale are becoming easier and more common. We anticipate that the database presented here will further contribute to this ongoing work to understand data spatially and across proxies.

3. DATA STRUCTURE

[...] flexible inputs are publication references or codes that identify entities like a locality or sample. These inputs are flexible, rather than free, because they contain input restrictions such as following a reference format or disallowing spaces or certain characters. Numeric flexible inputs (restricted to numbers) include columns such as the age of the sample or its isotopic ratio. Numeric flexible input columns are useful for querying data based on quantitative characteristics such as geographic range (latitude and longitude) or age range (e.g. the Eocene, 56-34 Ma).

Fixed input columns allow only a fixed set of input values. Fixed input columns are used for broad grouping of data types by defining the sample material and its basic lithology or fossil source.
They also include columns that accept binary values (yes or no) that are useful for data filtering (for example, to remove duplicated values that were published in one paper and compiled in a later paper). Fixed input columns are designed for querying qualitative data characteristics, such as data from a specific mineral or type of deposit (e.g. paleosol versus lacustrine carbonate). Together, these three column input types are designed to meet a range of data query and post-processing goals while minimizing the amount of information that is lost or condensed.

Any given stable isotope value rests on a hierarchy of information designed to simplify data querying and post-processing steps (Fig. 3). The basic unit of this structure is the publication (or a dataset ID in cases where data is unpublished such as publicly available MS and PhD theses). While a more basic unit would be a given sample [...] is not reported and must be independently determined.
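As a rough illustration of how the column input types described above support querying, the sketch below filters a handful of records by age range, geographic range, and a fixed sample-type value, drops entries flagged as duplicates, and assigns an age to a 5 million year bin. It is written in Python for illustration only (the portal itself runs on R). The `Record` fields, the `BASIC_SAMPLE_TYPES` vocabulary, and the function names are assumptions made for this example, not the database's actual column names or the portal's query code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical fixed-input vocabulary; the real allowed values are defined by the database.
BASIC_SAMPLE_TYPES = {"carbonate", "phyllosilicate", "shell"}


@dataclass
class Record:
    reference: str            # flexible alphanumeric input (publication reference)
    age_ma: float             # flexible numeric input, age in Ma
    latitude: float           # flexible numeric input, decimal degrees
    longitude: float          # flexible numeric input, decimal degrees
    basic_sample_type: str    # fixed input
    is_duplicate: bool        # fixed binary input (yes/no)
    d18o: Optional[float] = None  # isotope ratio, if measured


def query(records: List[Record],
          age_range: Tuple[float, float],
          lat_range: Tuple[float, float],
          lon_range: Tuple[float, float],
          basic_sample_type: str) -> List[Record]:
    """Return non-duplicate records inside the age and geographic ranges."""
    if basic_sample_type not in BASIC_SAMPLE_TYPES:
        raise ValueError(f"not an allowed fixed value: {basic_sample_type!r}")
    # Sorting lets callers write an age range as (56, 34) or (34, 56).
    young, old = sorted(age_range)
    south, north = sorted(lat_range)
    west, east = sorted(lon_range)  # does not handle ranges crossing the antimeridian
    return [
        r for r in records
        if young <= r.age_ma <= old
        and south <= r.latitude <= north
        and west <= r.longitude <= east
        and r.basic_sample_type == basic_sample_type
        and not r.is_duplicate
    ]


def age_bin(age_ma: float, width: float = 5.0) -> Tuple[float, float]:
    """Return the (younger, older) edges of the bin that contains age_ma."""
    young = (age_ma // width) * width
    return young, young + width
```

For example, `query(records, (56, 34), (30, 50), (-125, -100), "carbonate")` would select non-duplicate Eocene carbonate records in a box over western North America, and `age_bin` could then be used to group the results into 5 million year intervals.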

Within every publication, we define at least one section ID. Section IDs are flexible alphanumeric inputs that separate [...] specific terms. The basic sample type defines a given sample based on its lithological, mineralogical, or fossil origin (e.g. "phyllosilicate" or "shell") and the more specific sample type distinguishes between sub-categories (e.g. "altered ash" or "mollusc"). The sample material denotes the mineral or material measured (e.g. "smectite" or "aragonite").

Finally, the sample ID is a flexible alphanumeric identification code assigned by the original paper or, when none is available, generated by us. We deviate from the sample ID defined by the original paper if multiple sample types are measured for a given sample ID. In this case, we append a suffix to the sample ID that distinguishes the sample type (such as "XXorg" and "XXcarb" for organic and carbonate carbon measurements of the same hand sample).
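The suffix rule above can be sketched in a few lines. This is a minimal Python illustration, not the curators' actual procedure: the `SUFFIXES` mapping reuses the "org" and "carb" suffixes from the example in the text, while the sample-type labels and the function name are assumptions for this sketch.

```python
from collections import defaultdict
from typing import List, Tuple

# Suffixes follow the "XXorg" / "XXcarb" example in the text;
# the sample-type labels used as keys are hypothetical.
SUFFIXES = {"organic": "org", "carbonate": "carb"}


def disambiguate_sample_ids(rows: List[Tuple[str, str]]) -> List[str]:
    """Append a type suffix only where one sample ID carries several sample types.

    rows is a list of (sample_id, sample_type) pairs in data-entry order.
    """
    types_by_id = defaultdict(set)
    for sample_id, sample_type in rows:
        types_by_id[sample_id].add(sample_type)

    result = []
    for sample_id, sample_type in rows:
        if len(types_by_id[sample_id]) > 1:
            # Same hand sample measured as more than one sample type.
            result.append(sample_id + SUFFIXES[sample_type])
        else:
            # Keep the ID assigned by the original paper unchanged.
            result.append(sample_id)
    return result
```

Sample IDs that appear with only one sample type pass through unchanged, so the original paper's identifiers are preserved wherever they are already unambiguous.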

The last two delineations, the replicate ID (flexible alphanumeric) and isotope ratio (flexible numeric), refer to [...] were a single measurement and note that the value reflects the average of multiple measurements. Finally, the isotope ratio is the last level of the hierarchy, and is a numeric value assigned to one of 5 columns (δ¹⁸O, δ¹³C, δD, Δ₄₇, Δ¹⁷O). All data in a given column are reported relative to the same isotope standard.
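Because replicates sit directly below the sample ID in this hierarchy, a common post-processing step is to collapse replicate measurements into one value per sample. The Python sketch below shows one way this might be done, assuming rows are dictionaries keyed by hypothetical names (`sample_id`, `replicate_id`, and ASCII stand-ins for the five isotope columns). These keys are illustrative and are not the database's actual column headers.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# ASCII stand-ins for the five isotope ratio columns (hypothetical names).
ISOTOPE_COLUMNS = {"d18O", "d13C", "dD", "D47", "D17O"}


def average_replicates(rows: List[dict], column: str) -> Dict[str, float]:
    """Return the mean of one isotope column for each sample ID.

    Rows without a value in that column are skipped, since a given
    measurement usually fills only some of the isotope columns.
    """
    if column not in ISOTOPE_COLUMNS:
        raise ValueError(f"unknown isotope column: {column!r}")
    values = defaultdict(list)
    for row in rows:
        value = row.get(column)
        if value is not None:
            values[row["sample_id"]].append(value)
    return {sample_id: mean(vals) for sample_id, vals in values.items()}
```

A sample with a single replicate simply returns that value, which matches the case where only an average was reported and entered as one measurement.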

All column names are grouped by their data input type (free, flexible, or fixed) and described in Table 2 [...] taken. In order to include these complementary data we add six generic data entry columns: three to define data types ("Other data type X") and three to define data values ("Other data value X"). In these columns, the value "X" can be either "1", "2", or "3", and this number links the data type and value. While many publications include data beyond terrestrial stable isotope data, we generally only include other data in these columns if they are directly relevant for contextualizing the isotope data. However, new data entries do not need to follow this guideline and are welcome to include any additional data in these generic columns.

[...] but at the risk of the data being omitted from geographic queries.
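The pairing of "Other data type X" with "Other data value X" can be read back out of a row as follows. This is a minimal Python sketch that assumes a row is a dictionary keyed by column name; the column names come from the text, while the function name and the example contents are hypothetical.

```python
from typing import Dict


def other_data(row: dict) -> Dict[str, object]:
    """Pair each 'Other data type X' with its 'Other data value X' for X = 1, 2, 3."""
    pairs = {}
    for x in ("1", "2", "3"):
        data_type = row.get(f"Other data type {x}")
        data_value = row.get(f"Other data value {x}")
        if data_type:  # skip slots that were left empty
            pairs[data_type] = data_value
    return pairs
```

For a row whose first slot holds a hypothetical type such as "carbonate wt%" with value 12.5, the function returns `{"carbonate wt%": 12.5}`, which makes the complementary data easy to filter alongside the isotope columns.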

In order to make our data entry practices transparent, we document decisions such as these in Word documents or text files referred to as data upload "notebooks". Each data upload (publication or unpublished dataset entry) has its own notebook and each notebook contains "General notes" and "Next steps". The general notes section includes decisions about data entry such as those discussed above. These decisions are usually also documented in a notes column in the database itself. The next steps section includes updates that can be made to the data entry in the future. These updates might be to ask the corresponding author for clarification in the case of a data entry question, or for additional data if only average values of multiple replicates are reported. In some cases the next steps include entering "other data" that is available in the manuscript but does not fall into any of the stable isotope data columns.

The next steps efforts do not involve changes to the data entry that are critical for the isotope data (in this case, the [...]

Database queries can be completed from the homepage of The PATCH Lab portal (Fig. 4). Here, users will find [...]

[...] its isotopic composition, the isotopic composition of respired soil carbon, and the carbonate formation temperature.

The output data are δ¹³C and soil respiration rate values for a given depth in the soil column. Output 2 solves for the δ¹³C as a function of depth in the soil profile using similar inputs as output 1 as well as the soil respiration rate.

Output 3 returns atmospheric pCO₂ based on the isotopic composition of soil carbonate and other necessary inputs such as the soil pCO₂ and its isotopic composition. Lastly, output 4 reads in carbon isotope data that is defined [...]

In addition to data and model downloads, The PATCH Lab portal includes the tools necessary to contribute new data (see Fig. 4). A basic workflow for the data contribution process is shown in Figure 6. Data contributors can begin by downloading the PATCH Lab template that includes all of the data entry columns listed in Table 2 [...]

The PATCH Lab curators will continue to add data to the database and we strive to upload recent publications swiftly. We encourage community contributors to upload their own data to minimize the influence of curators on data entry decisions.

The "Community Tools" page of The PATCH Lab portal hosts links to code developed by the research community that is of interest to other PATCH Lab users. These entries are not restricted to any specific functionality, but some helpful resources include code for processing or plotting data, for running other models, or for integrating other datasets like modern climate, topography, or water isotopes. We also encourage users to contact us about integrating new graphical user interface models into The PATCH Lab portal directly. The portal runs on code written in R, so resources written in R can be easily adapted to be featured on the portal.

Currently, this page includes links to R code to work with the database. There are a variety of scripts, including the same data query scripts used by The PATCH Lab portal, for users to build upon or modify. There are also some functions for statistical analysis that are useful for determining statistical power and possible data biases. Scripts like these can be sent to PATCH Lab curators to upload to the Community Tools alongside data from a given publication to improve transparency and reproducibility.

The PATCH Lab is advised by a steering committee to help ensure its long-term success and to stay up to date with the needs of different research communities. The steering committee identifies areas of the resource that can be improved and advises on its long-term infrastructure. The committee includes two distinct groups: an advisory panel and regional editors. The primary role of advisory panel members is to identify long-term goals for the PATCH Lab and provide feedback from relevant subdisciplines that frequently use terrestrial stable isotope data. In turn, regional editors handle data curation for certain continents or regions and act to identify new data trends that may impact data curation and archiving. Regional editors also identify and upload data that is missing from the database within [...]

The PATCH Lab is an online workspace that integrates terrestrial stable isotope data and models for efficient data download and analysis. Previous and ongoing research already leverages the rapid increase in data of the past two decades. However, synthesizing these data on a case-by-case basis will become more challenging and time intensive as the total amount of available data grows. The goal of this effort is to provide a long-term, curated database for research purposes, alongside tools that can assist in data interpretation.

We emphasize that community engagement in this effort is critical to its utility. The PATCH Lab is a tool developed for the terrestrial paleoclimate and environment community, which spans a wide range of disciplines and sub-disciplines, each of which may find the tool useful for different reasons. Feedback from, and engagement with, the research community are the most effective ways to ensure this resource remains useful as science progresses.

We thank all of the researchers who generated the stable isotope data compiled in The PATCH Lab. We are also grateful to Hari Mix, Danielle Y. Moragne, and Sam Kramer for their contributions compiling data. We thank Katharina Methner for valuable conversations about the database structure, and Caitlin Mothes and Matthew Ross of the CSU Geospatial Centroid for assistance in hosting the PATCH Lab application. This work was funded by NSF EAR-1322084.