Differences in carbon isotope discrimination between angiosperm and gymnosperm woody plants, and their relationship to atmospheric O2:CO2 ratio, physiology, and environment

For most of the Phanerozoic Eon, Earth’s woody vegetation has been dominated by C3 plants – predominantly gymnosperms with angiosperms only emerging as the dominant plant group as CO2 declined during the Cenozoic (66 Ma onward). At present, differences in carbon isotope discrimination (∆C) between angiosperm and gymnosperm plants are relatively small (2–3 ‰), but an increasing body of evidence points to larger differences across geological times (up to 6–7 ‰), potentially associated with varying environmental conditions and atmospheres (i.e. concentrations of atmospheric carbon dioxide, [CO2], and oxygen, [O2] could have ranged from ∼ 180 to 1100 ppm, and ∼ 15 to 25 %, respectively, across the past 250 Ma). Yet, differences in ∆C between the two plant groups, and their potential link to climatic and environmental changes, have not yet been fully explored and understood. Here, we combine a comprehensive ab initio model of discrimination, with a recent model of plant eco-physiology based on least-cost optimality theory, to show how differences in ∆C between angiosperms and gymnosperms arise. We train the comprehensive model using a very large (n > 7000) database of leaf and tree ring data spanning the past 110 years. We find that averaged differences in ∆C between angiosperm and gymnosperms decrease modestly with atmospheric [O2]:[CO2] ratios, and increase strongly with vapor pressure deficit (D). These relationships can be explained by three key physiological differences: (1) the ratio of cost factors for transpiration to carboxylation (higher in angiosperms); (2) the ratio of mesophyll to stomatal conductances of CO2 (lower in gymnosperms); and (3) differences in photorespiration. In particular, the amount of CO2 released from photorespiration per oxygenation reaction, λ, is generally lower in gymnosperms than in angiosperms. As a result of these factors, ∆C is more sensitive to [CO2] in angiosperms, and to D in gymnosperms. We propose a simplified empirical model to account for this behaviour, and test it against isotopic data from leaves, tree rings and previously-published plant chamber experiments, along with geological data from the Cenozoic. Overall, these data agree with our model over a range of [O2]:[CO2] ratios from 100 to 650 mol mol−1 (equivalent to a CO2 range around 323 − 525 ppm at 21% O2), and D levels between 0.45 and 1.1 kPa (R = 0.51, RMSE = 1.49‰). Our simplified empirical model offers a new explanation for secular trends in the geological record, and suggests a way ∗Corresponding author Email address: vincent.john.hare@gmail.com (Vincent J. Hare) Preprint submitted to Geochimica et Cosmochimica Acta February 8, 2021 forward to improve paleo-[CO2] proxies based on terrestrial discrimination models by incorporating the effects of [O2], phylogeny, and photorespiration. Lastly, the framework predicts that the average difference in ∆C between woody C3 plant groups will increase in the future if both [CO2] and global D continue to rise as suggested by projections.

wards), angiosperms rapidly diversified to become the dominant land plant group. It is thought that the radiation of contributes only slightly to isotope discrimination, e.g. < 1‰ at a typical Γ * of ∼ 3.2 Pa, at 20 o C (value from Bernacchi et al. (2002)), assuming h is negligible. However, its contribution increases at lower CO 2 concentrations, 119 and/or higher Γ * (higher leaf temperature). Because Rubisco has an affinity for both O 2 and CO 2 , Γ * also depends 120 on the oxygen concentration in the chloroplast, O c , and on the amount of CO 2 released from photorespiration per 121 oxygenation reaction -a variable defined by Busch (2020) as λ. The relationship between these variables can be 122 written as: stress, and CH 2 -THF is the chemical precursor of lignin and many other secondary products. Typically, λ is around 128 0.5 (corresponding to 25% of the 2-phosphoglycerate carbon lost as CO 2 ), but as the relative proportion of these 129 components change, so too does λ -and the discimination against 13 C due to photorespiration.

130
Incorporating mesophyll effects, and retaining the photorespiration terms (see full derivation in Electronic Annex-131 ure), Eq. (3) can be rewritten more succinctly as: 132 133 whereā = (a s θ m + a m )/(1 + θ m ) and θ m is the ratio of mesophyll conductance to stomatal conductance (g m /g s ).
134ā thus represents all the fractionation processes during the CO 2 diffusion along the pathway from the atmosphere to   ∆ 13 C (‰) leaf-level carbon isotope discrimination 1,2,3 ∆ 13 C * a−g (‰) difference between average co-located angiosperm and gymnosperm ∆ 13   were estimated using the training dataset (Section 3.1), but also using the tree ring and leaf data from the training 204 dataset individually (the latter calibrated parameters are reported in the Electronic Annexure). In all cases, we 205 considered a model of the form of Eqns. (5)(6)(7), with Gaussian errors, and constants b = 30‰, a s = 4.4‰, 206 a m = 1.8‰ (Ubierna and Farquhar, 2014), R = 8.3145 J mol −1 K −1 , and f = 11 ± 4‰ (Tcherkez, 2006). In 207 practice, we found that the best fit (lowest RMSE) was obtained using a value of h = −1 ‰ for angiosperms, 208 and h = −10 ‰ for gymnosperms. Rubisco kinetic parameters measured on tobacco leaves were taken from   Eqns. (5-7) and our best-fit values for β c , g m /g s , and λ/λ ref . The simulations were performed at 20 o C for two 217 different levels of vapour pressure deficit: low D (0.23 kPa) and high D (1.54 kPa). Although this choice of D levels may seem arbitrary, it encompasses a reasonably wide range of environments (optimal D ∼ 0.8 kPa). 219 We then compared our results against our "testing dataset" which was comprised of averaged ∆ 13 C differences 220 calculated from the tree ring and leaf isotopic compilation, Cenozoic geological data from the available literature    Table 2, Fig. 2). The best-fit values of β c (combining leaf and tree ring datasets, correction "A") 236 are 210 ± 25 (1σ) for angiosperms and 147 ± 10 for gymnosperms. The value for angiosperms is in excellent 237 agreement with that obtained by Lavergne et al. (2020b) using robust linear regressions (i.e. 211 ± 1.8), but that 238 for gymnosperms is lower than the one obtained by the same study (i.e. 286 ± 1.6), and the value first estimated by  For both corrections, g m /g s is higher in angiosperms (e.g. 2.6 ± 0.7) than in gymnosperms (e.g. 0.98 ± 0.10), is also in good agreement with our findings.

247
The most pronounced differences between the two plant groups are observed for λ/λ ref (Fig. 2c), with values 248 around 5.2 for angiosperms and 0.1 for gymnosperms. Using species-specific ε cellulose (correction "B"), λ/λ ref is 249 slightly higher than using correction "A" for gymnosperms (0.2), but still significantly lower than the comparable 250 value for angiosperms (4.0). Table 2: Best fit plant-specific parameters for Eqns. (5)(6)(7) fitted to global ∆ 13 C data from leaves and tree rings. Parameters are: βc, the ratio of carboxylation to transpiration cost factors at 25 o C; gm/gs, the ratio of mesophyll to stomatal conductance; and λ/λ ref , the amount of CO 2 released from transpiration per oxygenation reaction, relative to that of N. tabacum at 25 o C. All values are unitless, errors are 1σ.

251
Correction "A" used a constant post-photosynthetic fractionation factor for tree ring cellulose, ε cellulose = −2.1 ‰, whereas correction "B" used species-specific post-photosynthetic fractionation factors (i.e., ε cellulose,angio = −2.8‰ for angiosperms and ε cellulose,gymno = −4.7 ‰ for gymnosperms). Lowest RMSE values were found with with h = −1‰ for angiosperms, and h = −10‰ for gymnosperms. For simplicity, we define the offset in discrimination between co-located (i.e same T d and D) angiosperm and 260 gymnosperm plants as ∆ * a−g = ∆ a −∆ g , where the subscripts "a" and "g" are adopted to indicate each plant group, 261 respectively. The asterisk ( * ) denotes an average isotopic offset between two tissues, rather than a fractionation in  shows that ∆ * a−g is expected to fall in a large range between +3. 5 where ε f is a coefficient related to the difference in fractionation between angiosperms and gymnosperms due to 279 photorespiration terms, and ε ab is a coefficient related to differences attributed to CO 2 diffusion and carboxylation.

280
At this stage the meaning of the third term, ε 0 is not fully clear, but is included to describe all other remaining 281 contributions (including differences in respiration, random effects, etc.), which we assume to be constant. Because  the Rubisco specificity, S c/o (Busch, 2020), the first term can be approximated (using Eqns. [4][5] as: another species of angiosperm at any given location. However, when several species are compared, we expect this 292 variation will be averaged out, because according to our analysis, differences in λ appear to be fairly robust across Our proposed model (Eqn. 8) is simple enough to be tested against stable carbon isotope data from plant chamber 295 experiments, as well as tree rings and leaves (see Fig. 4a,c caption, and details of "testing dataset" in Methods).

306
The slope of ∆ * a−g versus D is higher than that of ∆ * a−g versus [O 2 ]/[CO 2 ], and in the opposite direction (Fig. 4b).

307
A linear fit to the binned data in Figure 4b yielded a positive slope of ε ab = 5.9 ± 1.3 ‰ mol mol −1 (1σ), with 308 an intercept of −3.9 ± 1.0 (adjusted R 2 = 0.86). This result is in general agreement with the predictions (purple 309 lines, Fig. 4b) at moderate D levels (i.e. between 0.7 and 1 kPa). However, the slope of ∆ * a−g versus D as implied 310 by the data, is greater than that predicted by Eqns. (5)(6)(7). Figure 4b shows that the predictions using Eqns. (5-7) 311 diverge slightly from the trend in the data at 0.5 < D < 0.7 kPa and at D > 1 kPa.

312
Finally, using our fitted value for ε f , it is possible to calculate the Rubisco specificity (S c/o ) using Eqn. 9. We   ∆ * a−g values calculated from Cenozoic paleo-data generally support our novel interpretation. Figure 5 shows that

488
In this study, we aimed to better understand the factors that influence differences in ∆ 13 C between angiosperm values -including β c -the ratio of cost factors for carboxylation to transpiration (related to leaf physiology), g m /g s 493 -the ratio of mesophyll to stomatal conductances for CO 2 (related to leaf morphology), and λ -the fraction of CO 2 494 released during photorespiration (related to plant carbon metabolism). 495 We also showed that the ∆ 13 C offset between the two C 3 plant groups is very likely not constant over time presents an opportunity to refine ∆ 13 C-based proxies of paleoatmospheric composition, if diagenesis can be ruled 507 out, and D levels can be independently constrained.

508
Our framework reconciles previously unexplained observed patterns, such as covariation of modern tree ring ∆ * The authors declare that they have no known competing financial interests or personal relationships that could have 515 appeared to influence the work reported in this paper.

516
1 Locations of ∆ 13 C dataset Figure S1 shows the location of the tree ring and leaf data sites for angiosperm and gymnosperm plants used in the study. The isotopic data are derived from diverse species and plant functional types growing in a wide range of environments characterised by different soil water content and evaporative demand (vapour pressure deficit, D, ranging around 0.1 and 2 kPa). Figure S1: Global locations of tree ring (TR) and leaf stable carbon isotopic data. Blue points correspond to angiosperms, while red points are for gymnosperms. Yearly-averaged daytime vapour pressure deficit (D) is given in kPa.

Derivation of expression for ∆ 13 C in C c -basis (Eq. 4, main text)
An comprehensive expression for leaf carbon isotope discrimination, assuming finite mesophyll conductance, and including photorespiration terms (but excluding fractionation during respiration), was proposed by [14] as: where for convenience we adopt the notation suggested by ref. [16] of χ = C i /C a , and χ c = C c /C a . Assuming that the CO 2 flux from the outside of the leaf to the intercellular spaces is equal to the flux from the intercelullar spaces to the chloroplast, Fick's law yields: which can be rewritten as: where θ m is the ratio of mesophyll (g m ) to stomatal conductance (g s ). Rearranging this expression yields Inserting Eqns 1.4-1.5 into Eq. 1.1 yields the expression for ∆ 13 C in terms of χ c :

Rubisco kinetics
Parameters associated with Rubisco kinetics include the Michaelis-Menten coefficients for CO 2 , K c (Pa), and O 2 , K o (Pa), as well as Γ c (Pa). The former two parameters are combined into the effective Michaelis-Menten coefficient for Rubisco, K (Pa), as follows: Both K c and K o exhibit an Arrhenius temperature response which depends on their respective activation energies (E a,Kc and E a,Ko ). At any given temperature T (K), these variables in (Pa) were computed as follows: Values for K c,25 , E a,Kc , K o,25 , and E a,Ko were taken from study [1] as 272.38 µmol mol −1 , 80.99 kJ mol −1 , 165.82 mmol mol −1 , and 23.72 kJ mol −1 respectively. For leaf and tree ring data, pO 2 was estimated from altitude (z, in meters) via a standard barometric formula: Finally, the photorespiratory CO 2 compensation point (Pa) was calculated as: Likewise, Γ c,25 and E a,Γc were taken from [1] as 37.43 µmol mol −1 and 24.46 kJ mol −1 . Since these values were estimated for tobacco at 25 o C, we use the λ ref value inferred from [15] as 0.6.
Research Unit (CRU Ts4.03; Ref. [9]). We calculated the raw monthly mean daytime air T (T d,raw ) to consider only the part of the day when photosynthesis occurs, as: where x = −tanφtanδ, with φ the latitude, and δ the average solar declination for the month.
We then convert raw monthly mean daytime air temperature to rough estimation of leaf temperature, T d , using the relationship identified between mean daytime air temperature (growing season) and leaf temperature in [10] (illustrated in Figure S2, S3). A regression was perfomed on the [10]     We estimated these parameters for both combined and individual leaf and tree ring datasets. In all cases, we consider a model in the form of Eqns. Our choice was motivated as follows. Constraints on the range of β c (between 70 and 1000) were chosen in accordance to other analyses (e.g. [11,16]), and are quite conservative. The range of g m /g s was chosen to reflect the isotope-independent measurements reported in ref.
[17], and initial values of µ 1 ,s 1 , µ 2 , and s 2 chosen from the same study as 1.8, 1.1, 0.9, and 0.1, respectively. λ/λ ref must be greater than 0, and is expected to be around 1. It could also be greater than 1 if the proportion of CH 2 -THF exported from the photorespiratory pathway increases. Thus, we use a likely range of values between 0 and 10.
MCMC chains were run for 15000 iterations. To assess chain quality and algorithm convergence, we considered both integrated correlation time, τ (a measure of the averaged number of iterations required to achieve independent sampling), and a Geweke test (the output of which is equivalent to a Z-test). Lower τ values, and a Geweke statistic approaching 1 (p << 0.05) can be regarded as indications of acceptable chain convergence. Figure S5 shows the chain outputs for the combined angiosperm datasets, as an example. All statistics can be found in Tables S1-S4. In all cases, chains converged rapidly on the stationary distribution (τ < 48).  LGM) using Eqns. (5-7) and our best-fitted values for β c , g m /g s , and λ/λ ref at 20 o C, at two different levels of vapour pressure deficit. In fact, these simulations were a subset of a larger number of simulations over variable D (shown in Fig S5a) and variable T d (shown in Fig. S5b). The MATLAB code is also available from the authors. Our simulations also show that the D effect is stronger than the T effect, considering the range of leaf T d estimated according to Section 4 (above). D levels higher than 1.5 kPa indicate significant water demand for transpiration. Note that because higher D is usually also accompanied by higher temperatures, the two effects are expected to partly cancel each other out with respect to ∆ * a−g . However, the D effect is much stronger, and ultimately wins!

Datasets
The following datasets are included in this paper, in excel format:

We have added a few sentences to our MS to reflect the concerns and implications of Reviewers 2 and 3, and trust that this is sufficient to accept for publication in GCA. Once again, thanks very much to Reviewers 2 and 3 for these thoughtful suggestions, and encouragement.
We believe that our title is presently quite long, and perhaps also a little pedestrian. We would like whether the associate editor believes a change might be appropriate. Since our manuscript is framed in quite a general way, one possible alternative is something along the lines of: "Differences in Δ 13 C between angiosperm and gymnosperm woody plants, and their geological significance." Perhaps we can leave the final decision on this up to the Editors' judgement?
Please note that in the latest MS version, Fig. 5 has been re-plotted with the axes switched around (predicted as x-axis, observed as y-axis). This is more in line with convention. Accordingly, RMSE values have been updated in the text.

Reviewer #2:
The authors claim here that gymnosperms are less responsive to CO2 than angiosperms. This appears to contradict recent work (DOI: 10.1038/s41467-017-02691-x) by the first author (Hare) that showed, "the effect of pCO2 during the last deglaciation is stronger for gymnosperms (−1.4 ± 1.2‰) than angiosperms/fauna (−0.5 ± 1.5‰)." How can these dissimilar results be reconciled? This previous study is cited within, but not in the context of showing a different result from that presented here. et al. (2018), the gymnosperm and angiosperm/faunal data were biased towards different locations across the northern Hemisphere, with very different moisture availabilities (Fig 3c of that paper). In general, the angiosperm/faunal data were entirely biased towards continental Europe and the British Isles, whereas gymnosperm data were heavily biased towards North America, and higher latitude sites. We suspect that the CO2 effect for gymnosperms appeared much larger in that study because there was a strong bias in the gymnosperm dataset towards locations for which there were higher changes in VPD (see Fig 3c of  that paper). This is consistent with the findings of our present study, specifically -gymnosperms are more sensitive to VPD than angiosperms.

Although an interesting observation, a more thorough investigation is arguably best suited for another publication. Re-calculation will require monthly VPD data from the latest PMIP model ensembles (for both LGM and mid-Holocene) -which is a considerable task.
I appreciate the new and revised figures, but the high scatter (Fig 4a) and lack of fit (Fig 4b) with the model remains concerning. The deviation of the observed vs predicted fit from the 1:1 line ( fig. 5) is also poor. However, I still think this work represents a valued contribution and will spur additional work to test and revise the interpretations made here. These results give me pause, but the authors are clear in their approach, uncertainties, and limitations of their work, and do not overstate their conclusions. (R 2 = 0.51, RMSE = 1.49‰, Fig. 5), which is much more than previous models, which have held Δ*a-g to be constant. Our approach is thus a stepchange from previous studies, where Δ*a-g was assumed to be invariant with environmental conditions. This is the central point of the paper, which we have drawn attention to in the discussion. We therefore see our findings as a promising way forward for future research. We have added the following line for emphasis:

Line 424-427 "However, it should be emphasised that although many confounding effects may potentially exist, our framework still explains over half of the variations observed in the geological record; and is thus a step-change from previous studies, where Δ*a-g was assumed to be invariant with D, [CO2], and [O2]."
One possible reason for the deviation between observed and predicted values is that we considered a Δ 13 C model which depends on changes in atmospheric D through its effect on the partial pressure of CO2 in the leaf intercellular space (ci) and in the chloroplast (cc), but does not depend on changes in soil moisture. Yet, a recent study (Lavergne et al. 2020b) has shown that beta in the least-cost hypothesis, which was assumed constant here, should decrease with a reduction of soil moisture. This effect was not taken into account in our predictions mainly because we do not currently have a good understanding of how soil moisture should downscale ci and cc values based on first principles -but this is grounds for future research.
Because soil moisture will co-vary with atmospheric D, not accounting for such an effect may result in a potential underestimation of Δ 13 C values, and thus affect the predicted Δ*a-g values. Soil pH was also suggested to influence beta in several studies ( We have added the following paragraph to clarify the lack of fit: Line 398-405 "One possible deficiency in our ab initio model is that it assumes that χc (and thus Δ13C), is affected by atmospheric D, but not affected by other complex variables such as soil moisture. Because atmospheric D is generally expected to co-vary with soil moisture, not accounting for such additional variables might result in a potential underestimation of Δ 13 Figure  4b. A recent study (Lavergne et al. 2020b)  Perhaps the best evidence of this work's promise is the comparison shown in Fig 6 (a new figure, after Reviewer 1). I think the agreement is best for low D because terrestrial organic matter records are best preserved from wet sites (preservation bias) even though many of the modern data come from sites with higher D, and not because the model is biased towards chamber data (line 442-444).

kPa had less scatter. We have retained these curves in the manuscript as examples of the degree of uncertainty associated with the inversion + smoothing, whilst being consistent with Cui et al. (2020). However, we expect that the robustness of these curves will increase with (1) more data coverage, and (2) refinements in the smoothing technique, although this awaits another more detailed study.
Also, it is true that D=0.8 and D=1.4 give CO2 = 550 to 700 ppm at the MMCO (lines 448-450), but to suggest a similar CO2 result by lumping these two scenarios together for the MMCO is suspect given their widely different shape and timing for the CO2 peak in the Miocene.
As discussed in our response to the Reviewer's previous point, these scenarios are presented as a a rough visual estimate of the sensitivity of CO2 reconstructions, also incorporating uncertainties in smoothing procedures. They are not intended as formal error bounds, but rather as a likely range of mean/median estimates, following either model. Formal error bounds are technically complex in this case, and are arguably beyond the scope of this manuscript. A more robust curve, with formal uncertainties, will have to await future studies. However, we have included the following word of caution in our MS:

Line 466-470 "It should be noted that this range is presented as a rough estimate of the sensitivity of the terrestrial Δ13C-based [CO2] proxy to different D levels -incorporating potential errors in smoothing methods -and is not a formal confidence interval. Obtaining statistically robust confidence intervals for this [CO2] proxy method is a highly technical task that will likely improve with future research, and greater data coverage."
Line 95: f = 7-16‰, but in Table 1 = 8-18‰.

Reviewer #3:
Ln 116 -117: these are growth chamber studies, not geologic studies. Fix to '…acknowledged in growth chamber studies'.

Thank you. We have changed this to "plant chamber studies".
Ln 320: "none of them" reads awkwardly, change to "neither relationship has been described before" We agree. Changed.

Citations: Schlanser et al 2020a, is 'a' necessary?
The 'a' has been removed. Thank you.

Report from Associate Editor Sarah Feakins:
This is an intensely studied topic -whether plant d13C can be used to reconstruct aspects of atmospheric chemistry and in particular pCO2, and of course this approach also faces strong criticism given all the documented ways that this can not work in the geological record, however there are certainly also studies that find ways that it can have promise. Given that publication debate, there was some reviewer fatigue on the topic, yet three reviewers volunteered their time and have offered up careful observations as they try to follow the findings and the interpretations, as well as carefully referencing the extensive work that comes before -with some of those missed references pointed out in the reviews below. I'm not sure if it is an omission in referencing or an omission in reading those studies, either way I hope you find the commentary helpful to add to the background information to surround your studies. The reviewers are all interested in the work presented, however they present a number of challenges to dimensions that are problematic. Given the extent of the concern "data does not appear to support the conclusions" and some concerns about following what is what from all reviewers, I'm not sure if this is readily fixable. However, if you find their feedback helpful and that you are able to address their major and minor concerns, any resubmission should be accompanied by details of how these concerns were addressed with a thorough response to reviews.
In light of the reviewers' concerns, and many helpful suggestions, we have re-evaluated our manuscript. We appreciate the considerable time taken by the reviewers in presenting their meticulous comments, and we now believe that we have included almost all of their suggestions.
The most significant concern was that the data did not appear to support the conclusions, as stated by Reviewer 2: "there are no apparent trends in ∆*a-g observed from sedimentary records across the Cenozoic", and "the application of this to the geologic record is not convincing". Similar sentiments were echoed by Reviewer 3:" Perhaps this is why we see little to no variation in Δ*a-g, especially through the Paleogene?" We agree that the manner in which this was presented in our initial contribution could have been clearer, and we have addressed this by making the following substantial changes to the manuscript (also see our detailed responses to individual reviewers), among others:  Table of abbreviated symbols now appears in the main text

The major change has been to the status of the proxy. Reviewer 2 best picked up on our intention: "to spark further interest in this area and to lead to additional experiments to help clarify and test the framework provided here".
After reconsideration of our data, we agree that the paper is best focussed on presenting the novel framework, as opposed to the [O2]/[CO2] proxy. A proxy application is indeed premature at this stage, given the uncertainties at play (in fact, we set out to better understand the various factors behind differences in Δ13C, so a new proxy was never the primary aim of our paper, but seemed like a natural extension of our work at the time). Therefore, we have scaled back the claims about the proxy (as suggested by Reviewer 3), but at the same time we have included a new discussion on the implications of our findings for reconstructing pCO2 in general (as suggested by Reviewer 1). We believe this strikes a better balance than proposing a novel proxy and will be of better immediate use to the community.
In evaluating our revisions, we would also like to draw the attention of the Editors and Reviewers to the following: We also find a better fit when h is varied (-10 per mil for gymnosperms, -1 for angiosperms). These new details are also presented in the text (Lines 194-197, and 209).

Note 3: Reviewer 3 stated that "There is a large body of work that focuses on the relationships between pCO2, water use efficiency, and leaf fractionation and should at least be addressed somewhere in this manuscript."
We went to great lengths to incorporate water use efficiency into our manuscript at first submission (implicitly, via the theory for Ci/Ca, therefore it seems that Reviewer 3 has misunderstood our approach (and findings) here. We have tried to be clearer about this in the new manuscript and in our response to this Reviewer.

Note 4: Several of the papers suggested by the reviewers were known to us. Most are relevant and we have included them in the revised version of the manuscript. Schlanser et al. (2020)
was not known to us -as it is a very recent paper. We have included this data, which adds more weight to our thesis.
Note 5: Based on their responses, it seems that the three Reviewers have not considered the Electronic Annexure in their reviews.

Reviewer #1:
The authors present an in-depth study to understand the differences in carbon isotope fractionation between phylogenetically distant plant groups, the effect of this difference on paleo-CO2 reconstructions, and offer a novel way of reconstructing paleo-CO2 incorporating the novel understanding of the differences in fractionation between angiosperms and gymnosperms. The article is very well written, easy to follow, and the methods are sound. The authors use a dataset of over 7000 d13C measurements on gymnosperms and angiosperms tree-rings to identify that the ratio of mesophyll to stomatal conductance, the amount of CO2 released from photorespiration per oxygenation reaction, and the ratio of cost factors for carboxylation to transpiration, are all significantly different between angiosperms and gymnosperms and most likely drive the difference in carbon fractionation. In particular the amount of CO2 released from photorespiration per oxygenation reaction was identified as driving major differences in fractionation. - Tables and figure captions:

Reviewer #2:
Hare and Lavergne present the hypothesis that the difference in <DELTA>13C value between angiosperms and gymnosperms (∆*a-g) is not constant, but changes through the geologic record as a function of O2/CO2. Support for this is provided by first-principles theory, which they use to predict a linear, negative relationship between ∆*a-g and O2:CO2. However, application might be limited given the narrow range in which their data match the prediction. Nevertheless, this is a particularly intriguing idea and has potential to augment our understanding of changes in <DELTA>13C value in the geologic record. I very much enjoyed reading this manuscript, and I fully suspect this work to spark further interest in this area and to lead to additional experiments to help clarify and test the framework provided here.
We thank the Reviewer for these kind comments -this was our intention. I do have some substantial reservations that should be addressed before publication, including, but not limited to Figures 4 and 5, which form the heart of the proxy and application. Figure 4 represents the proxy equation that will be used to calculated O2/CO2 from ∆*a-g, and is therefore critical to ∆*a-g being used as a CO2 proxy. The choice to split the Porter (2018) data in half, and only include the two higher O2/CO2 levels in the relationship (and not the two lower O2/CO2 levels) is troubling. Much of this relationship therefore leverages on the two data points representing 20th century tree rings that only cover a very narrow range of O2/CO2, and on their own, would not be robust.
We agree in retrospect that our claims about the O2:CO2 relationship were premature, and have revaluated this relationship by using values for leaf temperature (Tg) following Helliker et al.
(2013) (see above, and the Electronic Annexure). This has resulted in the inclusion of more data into the binning procedures, and the identification of the relationship with D.
As a result of this (arbitrary?) grouping, the authors suggest the relationship is valid for O2/CO2 between 400 and 650, but the lack of clear mechanism for this is disconcerting, and extrapolation to the Paleogene and Quaternary does not work. The authors need to do a better job justifying this grouping, and reconciling these data with the first-principles theory to make this a robust proxy.
We agree that there were too few datapoints for a good test of the relationship, and that a better agreement between the theoretical predictions and observations was needed.  . 4ab). Note, too, that we have scaled back claims for a O2:CO2 proxy based on Δ*a-g, but nevertheless include a new discussion section (5.3) which outlines implications for existing proxies, which will be of greater interest to the community. These estimates of Δ*a-g were originally obtained from a minimum of 4 gymnosperm or angiosperm datapoints. We have now also increased this minimum number to 6 datapoints. We have indicated the total number of datapoints used to calculate Δ*a-g next to each datapoint in Figure 4  Lines 300-306 "Although there is more scatter in this relationship (adjusted R2 = 0.51) and the trend is modest, it is in agreement with the slope predicted by Eqns. (5)(6)(7) to within 1sigma prediction bounds. Note that to ensure robust binning, a minimum number of 6 angiosperms and 6 gymnosperm δ13C values were used in the calculation of Δ*a-g. Fits are found to be somewhat insensitive to the binning procedure when a different bin width (i.e., 30 mol mol-1 ) is chosen; the resulting regressed coefficients are not significantly different ("epsilon_f = -0.0037 ± 0.0014 ‰ mol mol-1 ). All binned data can be found in the Electronic Annexure." Other comments: Line 230. Suggest adding "decreasing" before CO2 for clarity… "The simulations show that <DELTA>13C decreases strongly with decreasing [CO2] at low D, but decreases slightly with decreasing [CO2] at high D (Fig. 3a)." Yes, this is a good suggestion-we have now rewritten the sentence accordingly (as also suggested by Reviewer 3) Line 255-259 "The simulations show that gymnosperm Δ 13   ? What term(s) are primarily responsible for the different curve shapes (i.e., the gymnosperms are obviously more responsive than angiosperms for any D)? Is 3c is drawn from Eqn 8? Including these details in the caption would be useful. Yes, the curves are drawn from Eqn 5-7, using the best-fit parameters determined by MCMC ( Table 2). We have now also indicated this on the Figure 4ab legend. Gymnosperms are generally more responsive to D than angiosperms (Figure 3a). The terms responsible for this speciesspecific pattern are primarily gm/gs, and beta_c (via χc). Angiosperms are generally more responsive to CO2 (also Figure 3a) due to the term lambda. We have now included a paragraph in the discussion section to clarify the meaning of the curves from Figure 3a (Lines 324-330). We have also included the following in the caption to Figure 3: "… as estimated from the MCMC approach applied to Eqns. (5)(6)(7). Parameters are listed in Table (  The solution for D=1.54 kPa was shown in Figure 3c, but note that the photorespiration contributions were identical in either case, so these two curves were plotted over one another. We have now included a shaded region in the figure to make this clearer.  Fig 3) is negative. I think it is simply the sign for <epsilon>f that is wrong and should be negative 0.017, but this should be checked. Thank you very much for spotting this -it was a typographical error, and we have now corrected it. Note that this regression has changed in response to the new binning procedure (more datapoints), but the slope is still negative, as expected. It is lower, but in agreement with the theory.
line 280. "The linear relationship between ∆ * a-g and [O2]/[CO2], at least over 400 to 650 mol mol-1, is noteworthy." This is interesting, but is also across a very small change in CO2 (assuming O2 = 21%, then this represents CO2 = 350-525 ppm). The fact that the trend does not continue to lower O2/CO2 is troubling. Please see our comments above, and our revised interpretation.
Likewise, the implied ∆ * a-g for O2/CO2 of the latest Pleistocene (very high O2/CO2, ~1200 mol mol-1) from Fig 4 is extremely low (on the order of -10‰). Are there any data to support such a large implied offset between angiosperms and gymnosperms during the last glacial maximum, for example? This is a very good idea, LGM data would indeed be an excellent test of our framework. In the two years leading up to this article, we performed an extensive search of the literature for colocated data from the LGM but could not find any. When undertaking these revisions and following Reviewer 2's suggestion, we contacted Prof Takeshi Nakagawa (Ritsumeikan University, Kyoto) and Dr Richard Staff (Glasgow) regarding their unpublished Lake Suigetsu leaf macrofossil δ13C record. They kindly looked through it but found only two needles from the LGM (both at about -27 per mil), so it is impossible to discern any statistically-meaningful trends (or lack thereof) here, given the challenges with the binning procedures for low sample numbers (see comments above). This will have to await research in years to come.
Note, it would be easier to compare the simulations (Fig 3) to the results (Fig 4) if both figures were on the same scales. Or, perhaps the model (expected) relationship can be drawn on Fig 4? Thank you very much for this excellent suggestion. We have included the expected relationship on Figure 4 (heavy purple line). It is in agreement with our updated regression, to within uncertainties. Note that the expected relationship did not agree with the regression in the initial version of this manuscript, but after our changes in binning and estimation of Tg, we believe the two are now in better agreement (with more modest slopes). Changing the bin width does not significantly affect the result (as discussed in the text).
283. I do not consider a value of <400 mol/mol to be an extreme O2/CO2. It equates to a CO2 > 525 ppm assuming O2 = 21%, and according to Fig 4 represents a substantial portion of the Neogene and all the Paleogene. The above is really my biggest concern for this work. I appreciate the observation that first-principles theory suggests a linear, negative relationship between ∆*a-g and O2:CO2, but the application of this to the geologic record is not convincing. The authors report a strong linear correlation between ∆ * a-g and O2/CO2, but only for a very small window of O2/CO2 space, and the meaning of this is not clear. The way the authors grouped the data appears very arbitrary: there are 6 O2/CO2 levels represented, 4 from Porter (2018) and 2 from tree ring data. The authors chose to group the two higher O2/CO2 experiments from Porter with the tree ring data, and leave the two lower O2/CO2 experiments separate. Why? Instead, if the grouping changed and only the Porter (2018) data are plotted, then the relationship changes completely. Much of this interpretation therefore hinges on the two data points represented by the 20th century tree rings. Yet, these two points would be entirely unconvincing on their own (only two data points, both with large overlapping errors, and an extremely narrow range of O2/CO2: ~590 to 630 mol/mol). Does the relationship shown in Fig 4 match the predicted trend? Is this why the authors believe this grouping of the data is best? Please see our comments above, and our revised interpretation. Thank you for drawing attention to this. The relationship shown in Fig 4 now matches the predicted trend, to within errors. Although the errors are quite large, they are conservative formal error bounds for Δ*a-g, i.e. they represent the 95% CI for the difference between two means. Therefore, one can say, for instance, that the (chamber) values for Δ*a-g at 90 and 400 are higher than those at 520 and 570 (which is significant at alpha = 0.05). The datapoint at 110 seems a bit low (but has larger error bounds). In Fig 4, we have now also indicated the total number of angiosperm+gymnosperm datapoints above each data point. Figure 4. We agree with the reviewer, and have reconsidered our interpretation, as well as scaled back the claims for a proxy throughout the new manuscript.
We agree with the reviewer, and have reconsidered our interpretation, as well as scaled back the claims for a proxy. Figure 5 shows the measured ∆ * a-g, but the predicted values are not shown. It would be much more convincing to the reader that "∆ * a-g calculated from the paleo-data generally support our novel interpretation" (line 327) if the authors showed the predicted values on this plot (could use open symbols), or show an additional plot of measured versus predicted (x-y plot). As of now, this plot looks like ∆ * a-g stays constant at ~2-3‰ for all of the Neogene and Paleogene, with tree ring records showing somewhat lower values (1-2‰). As currently shown, this figure does little to convince the reader of the utility of this proxy. This is an excellent suggestion, thank you. We have now included a new Figure 5 of measured versus predicted Δ*a-g.
360. Same potential sign issue for <epsilon>f. Should this be -0.017? See response above -corrected.
358-361. This would be much more powerful if the authors did the same calculation for the other ~10 time slices represented on Fig 5, and not only picking one point. However, it seems to me that the other Betchel data (Betchel et al 2007a, 2007b) would give similar (or slightly higher) values to this one, yet these are not at the MCO. It also appears that many of the other data points also have much higher uncertainty than the one chosen at 15 Ma. I do not follow why the authors chose this one time slice to illustrate. It appears that this single point is critical to verifying the proxy (see also final sentence of abstract), but why only this point? Since we have scaled back our proxy approach, this comment is no longer relevant, but for interest's sake, we chose this dataset because of the precision available at a single location (n=83 diterpenoids, n=113 triterpenoids), which propagated into small uncertainties in Δ*a-g, and thus [O2]/[CO2].
371-374. Yes, this proxy benefits from its independence from <delta>13CO2. However, it has its own requirements/inputs (namely O2, and also T and D?), as do all proxies. What is the evidence that uncertainty in <delta>13CO2 yields greater uncertainty in CO2 for these other proxies than do the uncertainties in O2, T, and D do here?
We are not aware of any systematic published study into the uncertainties of δ13CO2 on palaeoatmosphere reconstruction, and although it is mentioned briefly in Hollis et al. (2019), it awaits further investigation. We have therefore left this out of the new manuscript, to make way for the new Section 5.3. δ13CO2 is generally estimated using the method of Tipple et al. (2010) on foraminifers (input into this model are T, via the equilibrium fractionation factor between DIC and CO2). However, this method sometimes disagrees with measurements from Antarctic ice cores (Eggleston et al., 2016), by up to 1 per mil, at periods (perhaps related to DIC disequilibrium effects -see graph below from ongoing study by VJH). This is yet to be published, but one might expect it to propagate strongly into estimates of CO2, on the order of uncertainties of D, and possibly of O2.

Reviewer #3:
This manuscript uses a modern dataset to train a plant physiology model to explain the relationship between the offset of angiosperms and gymnosperms leaf fractionation and the O2:CO2 ratio that can then be applied to the geologic record as a CO2 proxy. The O2:CO2 level on plant fractionation merits investigating. However, I have a number of major concerns about the project design and these need be addressed before this could be considered for publication. I have outlined my major concerns below and also include some minor comments.
Major Comments 1. The data does not appear to support the conclusions. * The model is based on a large compilation of modern datasets. Comparatively little data is used to then validate the Dleaf-O2:CO2 model; 2 datapoints from field data in the last century, and 5 datapoints from growth chamber studies, two of which, do not match the model. Going forward, this study should either validate the model with more data or scale back the assertive claims about the proxy (i.e., lines 289 -292). Following the comments of Reviewer 2, and changes to our binning procedure and estimation of Tg, we have scaled back the assertive claims about the proxy. We also identify a strong relationship to vapor pressure deficit. Both the relationship to D and to [O2]:[CO2] are now in broad agreement with the predicted relationship (Eqns. 5-7). Note that the parameters responsible for this predicted relationship (lambda, gm/gs, beta_c) are inferred from the largest compilation of stable carbon isotopes yet assembled (n>7000), with great care and attention to numerical procedures. Comparatively few data are indeed used in the testing dataset -but nevertheless, our binning procedures are now robust (new lines 300-306, as well as expanded on the caption of Figure 4ab), and Δ*a-g represents the community average. Lines 300-306 "Although there is more scatter in this relationship (adjusted R2 = 0.51) and the trend is modest, it is in agreement with the slope predicted by Eqns. (5)(6)(7) to within 1sigma ~182 ppm ~220 ppm ~190 ppm ~270 ppm prediction bounds. Note that to ensure robust binning, a minimum number of 6 angiosperms and 6 gymnosperm δ13C values were used in the calculation of Δ*a-g. Fits are found to be somewhat insensitive to the binning procedure when a different bin width (i.e., 30 mol mol-1 ) is chosen; the resulting regressed coefficients are not significantly different ("epsilon_f = -0.0037 ± 0.0014 ‰ mol mol-1 ). All binned data can be found in the Electronic Annexure." * Looking at the geologic data ( Figure 5), <DELTA>a-g largely reflect a 2-3 per mil offsets with little to no variation, especially through the Paleogene despite large fluctuations in pCO2 during this span of time (see: Foster, G. L., Royer, D. L., & Lunt, D. J. (2017). Future climate forcing potentially without precedent in the last 420 million years. Nature Communications, 8, 14845 and references therein). Does this mean the proxy is not as sensitive on geologic timescales and different phylogenies have offsets that remain fairly consistent through time? Also, to consider, the samples used here are derived from sediment, and thus represent time and community averages rather than individual plant specimens, such as those used to calibrate the model. Could this add an extra level of error? Perhaps this paper could benefit from a section in the discussion specifically addressing assumptions and caveats that would help future studies using this model.
We * The most important product of this paper is to establish a new Da-g -pCO2:O2 proxy. However, only one example value from the Miocene was calculated. Why not reconstruct pCO2 from all the samples pulled from the geological record in Figure 5 and . Note that pCO2 curves are highly uncertain. For that reason, we deliberately chose to train our core model with data for which pCO2, pO2, and D are much more precise (i.e., over the 20 th -21st centuries), avoiding uncertainties associated with data from chamber experiments. We have now clarified this in Lines 225-228: "Note that for various reasons, we excluded these data from the training dataset to estimate the plant-specific parameters. These included potential uncertainties in chamber design (Porter et al. , 2015 ), and estimation of chamber δ13CO2 values (Leavitt , 2001), which occasionally result in high variability of Δ13C" We have proposed a linear model, not only based on functional form implied by the discrimination model, but also because for its simplicity (Occam's Razor). We do not know yet whether calculating pCO2 for the Schouten PETM samples will work, and so we have not presented the pCO2 proxy in this paper. It might be complicated by uncertainties in vapor pressure deficit (see new section 5.3).

From the abstract and intro, it is not immediately clear what is being done in this study.
A new framework is mentioned, but is this a model? Is this a framework based on data? Date generated in this project? Growth chamber data? Data from the geologic record? This should be clarified.
We have edited the abstract to add greater clarity. We use framework and simple empirical model interchangeably, but we are more careful to refer to the full model (Eqns. 5-7) as the comprehensive ab initio model of discrimination (see response to minor comment #1). Throughout the manuscript, we have been specific when we are referring to particular datasets (including "training" and "testing" -new terms).