Modeling the impact of bioturbation and species abundance upon discrete-depth individual foraminifera analysis used in ENSO-type climate reconstructions

We use a single foraminifera enabled, holistic hydroclimate-to-sediment transient modelling approach to fundamentally evaluate the efficacy of discrete-depth individual foraminifera analysis (IFA) for reconstructing past sea surface temperature (SST) distribution, a method that has been used for reconstructing El Nino Southern Oscillation (ENSO). The computer model environment allows us to control for variables such as sea surface temperature (SST), foraminifera species abundance response to SST, as well as depositional processes such as sediment accumulation rate (SAR) and bioturbation depth (BD), and subsequent laboratory processes such as sample size and machine error. Examining a number of best-case scenarios, we find that IFA-derived reconstructions of past SST distribution are sensitive to all of the aforementioned variables. Running 100 ensembles for each scenario, we find that the influence of bioturbation upon IFA-derived SST reconstructions, combined with typical samples sizes employed in the field, produces noisy SST reconstructions with poor correlation to the true SST distribution in the water. This noise is especially apparent for values near the edge of the SST distribution, which also happens to be the distribution region of particular interest for ENSO. The noise is further increased in the case of machine error and decreasing SAR. We also find poor agreement between ensembles, underscoring the need for replication studies in the field to confirm findings at particular sites and time periods. Furthermore, we show that a species’ abundance response to SST could bias IFA-derived SST reconstructions, which can have consequences when comparing IFA-derived SST from markedly different mean climate states.

Measurements on individual sea-dwelling shelled organisms called foraminifera retrieved from deep-sea sediment cores have been used to reconstruct past water temperature variation and, in turn, past El Niño.
To evaluate this method, we use a computer model to simulate millions of single foraminifera and how they become mixed in the sediment after being deposited upon the sea floor. This approach allows us to subsequently compare the water temperature reconstructed from the single foraminifera in the sediment core to the true water temperature in the water during the past. In short, our computer simulation helps to define the total uncertainty associated with El Niño reconstructions based on such methods.

Background
El Nino Southern Oscillation (ENSO) is an important feature of the Earth's interannual climate dynamics, and the magnitude of ENSO events has important consequences for human populations of the tropical Pacific regions and further afield (Lyon & Barnston, 2005). Future climate change may impact the magnitude and/or frequency of climate events. ENSO manifests itself as a reversal of the Pacific trade winds and associated extreme shifts in tropical Pacific sea surface temperature (SST) patterns, with observational era ENSO events having occurred every two to seven years, and being between nine and 12 months in duration. Knowledge of ancient monthly SST variation and its relationship with past global climate parameters is therefore crucial in advancing our understanding of palaeo-ENSO dynamics and, thus, better informing future regional climate predictions (Rosenthal & Broccoli, 2004). To this end, forensic palaeoceanographic analysis of pan-Pacific deep-sea sediment can potentially offer a pan-Pacific archive of past frequency and magnitude of SST extremes and their relationship to past global climate conditions, potentially providing valuable insight into past ENSO.
The most-studied palaeoclimate signal carrier vessel within deep-sea sediment cores is the carbonate shells of planktonic foraminifera (microscopic, single-celled organisms), which can record the conditions of the ambient water that the foraminifera lived in. These organisms have a lifespan of ~1 month, after which their shells are deposited on the sea-floor. Their short lifespan means that foraminifera populations retrieved from deep-sea sediment archives can, in principle, reflect past monthly SST dynamics, which is key for reconstructing ENSO events. However, the technical limits associated with isotope ratio mass spectrometry (IRMS) analysis of foraminifera has traditionally required that many tens of single foraminifera shells to produce a viable measurement, thus averaging out any monthly SST signal.
Advances in IRMS have allowed for the analysis of single foraminifera shells of sizes representative of planktonic populations (Oba & Uomonoto, 1989;Spero & Williams, 1990), which has encouraged researchers to carry out individual foraminifera analysis (IFA) to reconstruct SST variability associated with ENSO (Koutavas et al., 2006;Leduc et al., 2009). This method can, in principle, allow for the extraction of a range of monthly SST values from a given interval of a deep-sea sediment archive (i.e. 1 cm discrete depths from a given sediment core). Using the IFA method, a number of foraminifera are subsampled from a discrete-depth's foraminifera population, after which some form of proxy method is applied to each foraminifera's carbonate shell to infer individual SST values.
The IFA method depends upon a major assumption, namely that the distribution of the sub-sampled foraminifera SST values are a faithful representation of the true distribution of monthly SST values for a given time interval (i.e. a decadal/centennial/millennial period). The ability of discrete-depth IFA to accurately reproduce a time period's true SST distribution can be clouded by a number of environmental, biological and logistical issues, which occur in the water domain (pre-deposition), sediment archive domain (post-deposition) and laboratory domain (post-retrieval). Regarding issues in the water domain, it is possible that a foraminifera species may not exist continually at a single surface water location, thus giving a non-continuous record of SST, which can have consequences for ENSO reconstructions (Metcalfe et al., 2020). Secondly, a species' foraminiferal abundance through time is not constant and can be influenced by SST itself, which may bias IFA-derived SST distribution reconstructions, which is especially relevant in the case of ENSO, which itself influences SST. Similarly, long-term absolute shifts in the overall range of SST (e.g. from a glacial to an interglacial world) may cause the water's SST range to shift from one that partially overlaps with a species' preferred temperature range to on that fully overlaps with a species' preferred temperature range. In practical terms, this could lead to an IFA-derived artefactual shift from a relatively narrow SST range to a relatively wider SST range, with potential for incorrect interpretation regarding glacial-interglacial ENSO dynamics.
Issues associated with the sediment archive domain can further cloud IFA-derived SST distributions.
Specifically, systematic bioturbation of deep-sea sediment archives means that individual foraminifera with vastly different ages are mixed into single discrete-depth sediment intervals, which is a particular challenge in the current state-of-the-art in IFA, which still relies on the 'average age' of a particular sediment interval (i.e. it is not yet feasible to systematically date single planktonic foraminifera). This practical limitation in turn places an interpretive constraint upon IFA; when foraminifera from vastly different long-term climate states (i.e. multi-millennial) are mixed into the same sediment interval, IFAderived SST variability reconstructed from that sediment interval cannot be exclusively assigned to decadal or centennial changes in ENSO dynamics. For these reasons, it is important to understand the age distribution of the foraminifera contained within a discrete-depth sediment interval. For example, it is often assumed that a sediment archive with a sediment accumulation rate (SAR) of e.g. 5 cm ka -1 will have a temporal resolution of 1000/5 = 200 yr cm -1 , an assumption that is deceptively supported the observation that the mean age of such a sediment archive would increase by ~200 yr every cm. However, downcore discrete-depth change in mean age is not the same concept as variance of the age contained within a single discrete-depth of sediment. The distribution of the age contained within a single centimetre of sediment core is governed not only by the SAR, but also by the bioturbation depth (BD), the uppermost depth of the sediment within which bottom-dwelling organisms actively mix the sediment.
Following established understanding of bioturbation processes (Berger & Heath, 1968;Pisias, 1983;Schiffelbein, 1984), the 1σ age value for a single cm of sediment core can be approximated, in the example of a 5 cm ka -1 sediment core with a representative BD of 10 cm (Boudreau, 1998;Trauth et al., 1997), as 10/5×1000 = 2000 yr. In idealised conditions, the corresponding shape of the age distribution for a discrete-depth interval of sediment core will be characterised by an exponential distribution with long tail towards older ages. The average age of the sediment at the top of the sediment archive will also be similar to the 1σ age value, as exhibited in 14 C dates of deep-sea core tops which support a BD of between 5 and 10 cm (Henderiks et al., 2002;Trauth et al., 1997), including for the Pacific (Peng et al., 1979;White et al., 2018). It is additionally important to consider the shape of this distribution when comparing IFA-derived SST from an interval of sediment core (subsampled from a population with a exponential age distribution with a long tail towards older ages) to observational or model SST from specific periods of climate history (i.e. a uniform interval of time).

Experimental Design
Here, we use a computer modelling approach, which allows all simulation parameters to be known and strictly controlled, thereby allowing us to create an idealised experimental design with minimised degrees of freedom. Such an experimental design takes advantage of computer modelling as an investigative tool with which to increase our conceptual understanding of processes (Emanuel, 2020). This approach offers advantages over field-based testing of IFA, the latter of which is affected by multiple dynamic parameters, thus leading to increased degrees of freedom and limiting the ability to make interpretative conclusions. Our comprehensive modelling approach incorporates controlled parameterisations of climate, sediment and laboratory processes. Such a strictly-controlled computer model environment allows us to directly compare a known input SST signal to a reconstructed SST derived from the corresponding simulated sediment-based IFA incorporating known understanding of sediment concepts.
In this way, we can objectively quantify how well field-based discrete-depth IFA would function in a number of best-case scenarios, allowing its interpretive capacity for the reconstruction of ENSO-type climate variability to be evaluated at the most fundamental level.

Approach synopsis and model setup
We carry out a holistic hydroclimate-to-sediment transient modelling approach to test the suitability of discrete-depth IFA for the reconstruction of ENSO-type climate variability. Crucially, our approach includes a quantified representation of both sediment dynamics (in particular bioturbation) and species abundance, thus building upon previous models and simulation estimations of IFA accuracy where such information was not yet included (Fraass & Lowery, 2017;Leduc et al., 2009;Thirumalai et al., 2013).
Our modelling approach is carried out using an offline coupling of two transient models: a singleforaminifera sediment accumulation simulator (SEAMUS; (Lougheed, 2020)) run at a monthly timestep resolution, forced with monthly SST from the TRACE-21ka climate model (He, 2011). We investigate a number of best-case scenarios, concentrating on the time period spanning from 20 ka (BP 1950) up to and including 1989 CE, assuming a hypothetical sediment core at a location ideal for studying ENSO (Fig. 1), i.e. at the centre of the Niño 3.4 ENSO region that is used to calculate the Oceanic Niño Index (ONI).
In this study, simulated single foraminifera are incorporated into synthetic sediment archives, the latter of which employ best-case sedimentation conditions whereby representative values for SAR and BD are both kept temporally constant. We assume a best-case scenario where foraminifera perfectly record monthly SST (in this case the TRACE-21ka SST), and we also assume the existence of an ideal proxy method that allows for perfect retrieval of SST data from the single foraminifera. In reality, foraminifera may not continuously record the water temperature at the surface or indeed at the same water depth in general, which further complicates IFS ENSO in practice, however, here we seek to test best case conditions. After carrying out the sediment archive and bioturbation simulation, synthetic single foraminifera are randomly picked from each discrete-depth cm interval of simulated core, thereby resulting in virtual IFA. The output of the best-case virtual IFA retrieved from the sediment depth domain can subsequently be directly compared to the inputted SST in the time domain (i.e. TRACE-21ka SST), allowing us to evaluate the current state-of-the-art in IFA at the most fundamental level.
We sum up the sediment model component (SEAMUS) in Section 2.1, and the climate component (TRACE-21ka) in Section 2.2. An overview of our various best-case scenario simulations, as well as their associated run parameters, can be found in Section 2.3.

Sediment model component
We model the sedimentation history of single foraminifera using the the SEAMUS sediment accumulation model (Lougheed, 2020). This stochastic model uses the same established understanding of bioturbation (Bard, 2001;Berger & Heath, 1968;Pisias, 1983) that is also incorporated into previous  (Dolman & Laepple, 2018;Trauth, 1998Trauth, , 2013, but differs in model execution in that it is explicitly designed for the purpose of modelling single foraminifera, thus making it a suitable sediment model for use in this IFA evaluation study. Furthermore, the stochastic nature of the model is ideal for simulating bioturbation of single foraminifera, which is in itself a stochastic process. Our period of interest spans 20 ka BP to 1990 CE, so we have run the SEAMUS model from 30 ka BP to 1990 CE to provide sufficient model spin-up for our period of interest. The model is run using a monthly timestep resolution, whereby single synthetic foraminifera are generated at each time-step and added to the top of the sediment archive after which the BD of the sediment archive is uniformly mixed. All simulations are run with an appropriate BD of 10 cm, following previous studies (Boudreau, 1998;Trauth et al., 1997). Some of our model run scenarios assume a temporally constant foraminiferal abundance, in such cases we assign a constant per timestep foraminiferal abundance that results in 10 4 foraminifera per cm of sediment (i.e. the prescribed per timestep abundance is higher in the case of higher SAR and viceversa). In the case of model runs with temporally dynamic foraminiferal abundance, 100 foraminifera per timestep (i.e. month) are simulated, allowing temporal (i.e. monthly) changes in abundance to be modelled with sufficient statistical power (i.e. if relative abundance of a the species drops from 0.56 to 0.55 then it will result in one less foraminifera of the species being simulated for a timestep). All of our model run scenarios are carried out using 100 ensemble runs in SEAMUS, thus capturing the stochastic nature of bioturbation (i.e. the fact that no two sediment archives formed under the same conditions will be exactly alike). Subsequently, four separate 'picking' scenarios are carried out on each of the 100 ensembles, whereby 50, 100, 500 or 10 4 synthetic foraminifera are randomly picked from each discrete 1 cm depth slice of the synthetic core, whereby the picker is assumed to have perfectly identified the species in all cases, thus avoiding challenges associated with species mis-identification (Pracht et al., 2019). Finally, in some scenarios we add Gaussian noise of ±1°C to the SST of all simulated foraminifera, to mimic proxy uncertainty (approximately equivalent to a δ 18 O measurement error of 0.1‰). Ensemble runs were performed using a computer cluster provided by the Swedish National Infrastructure for Computing (SNIC) at the Uppsala Multidisciplinary Centre for Advanced Computational Science (UPPMAX).

Climate model component
Monthly SST forcing for the SEAMUS model is sourced from the TRACE-21ka transient climate simulation (He, 2011), specifically using the surface temperature data for the TRACE-21ka grid cell centred on the coordinates 1.86° N and 146.25° W. This grid cell, at the centre of the Niño 3.4 ENSO region used for calculating the ONI-index, is ideal for our synthetic core simulation as it is characterised by high inter-annual surface temperature variation typical of ENSO (Fig. 1a). Furthermore, the grid cell also captures the glacial-interglacial SST transition (Fig. 1b), as well as typical TRACE-21ka transient changes in ENSO-type SST dynamics, as shown by the 1.5-7 yr filtered 100 and 1000 year moving 1σ of SST (Fig. 1c). This filtering approach has previously been used to identify ENSO-type dynamics in TRACE-21ka for the Niño 3.4 region (Liu et al., 2014).
The TRACE-21ka dataset is the result of a fully-coupled Community Climate System Model (CCSM3) simulation with T31_gx3 grid resolution that uses transient forcing changes in both greenhouses gases, orbital driven insolation variations, ice sheet evolution (ICE-5G) and associated meltwater fluxes for a non-accelerated atmosphere-ocean-sea ice-land surface coupling. The TRACE-21ka dataset begins at 22 ka, whereas our SEAMUS run starts at 30 ka. The reason for this difference is that our period of interest for this study covers the period spanning the past 20 ka, and we provide an extra 10 ka of spin-up time for the SEAMUS model, which is important in cases of very low SAR (e.g. ≤ 5 cm ka -1 ). In order to provide SST data for synthetic foraminifera generated between 30 ka and 22 ka, the oldest 1500 years contained within the TRACE-21ka dataset are repeated from 22 ka to 30 ka. Such an approach obviously does not represent an accurate picture of the climate between 30 ka and 22 ka, but it has no practical consequences for the particular purpose of our study, which is to compare a given climate input signal in the time domain to the subsequent signal recorded by single foraminifera in the sediment depth domain.
Furthermore, our period of study interest spans the past 20 ka.

Model run settings
We carry out a number of best-case scenarios, with each scenario being subject to 100 ensemble runs to capture the full stochastic range resulting from the sedimentation, bioturbation and picking process. We run SAR scenarios for 5, 10 and 40 cm ka -1 . In this study, we concentrate on best-case scenarios, so the figures in the main text are for the 40 cm ka -1 scenarios only. The corresponding figures for the 5 cm ka -1 and 10 cm ka -1 scenarios, which may be more realistic for much of the Pacific, can be found in the supplement. Each of the three SAR scenarios is first subjected to 100 ensemble runs with constant foraminifera abundance and a perfect SST proxy, a second set of 100 ensemble runs is then carried out with constant abundance and added ±1°C Gaussian noise on the SST proxy, a third set of 100 ensemble runs is carried out with dynamic abundance and a perfect SST proxy, and a final set of 100 ensemble runs is carried out with dynamic abundance and ±1°C Gaussian noise on the SST proxy. All of the aforementioned 1200 ensembles are each subjected to randomised picking for 50, 100, 500 and 10 4 foraminifera per cm of sediment core depth.
As described in the previous paragraph, some of our scenarios incorporate dynamic foraminiferal abundance in order to investigate the effect of changes in species abundance upon IFA-derived reconstructions. In these scenarios, we use a hypothetical transfer function (Fig. 2a) to assign a per timestep abundance to our simulated foraminifera species, whereby the abundance is calculated as a function of the corresponding TRACE-21ka SST for the timestep. This approach allows us to quantify how a known species abundance response to SST could systematically bias an IFA-derived SST distribution. Consider, for example, a theoretical time interval whereby the true monthly SST data are normally distributed, as in the theoretical example in Fig. 2b. In such a case, an IFA-derived SST distribution using a species characterised by our SST transfer function would be biased towards warmer temperatures and, furthermore, the shape of the IFA-derived SST distribution would be skewed, as shown in the abundance-modified profile in Fig. 2b.

Downcore, discrete-depth IFA standard deviation
Numerous studies have concentrated on subsampling numerous individual foraminifera from the same discrete-depth interval of a sediment core, from which the 1σ value of the SST (or a proxy equivalent thereof) of those foraminifera is calculated to infer ENSO activity for a particular time period, whereby a greater 1σ value is assumed to indicate increased ENSO activity, and vice versa (Koutavas et al., 2006;Koutavas & Joanides, 2012;Rustic et al., 2015). To evaluate such an approach, we compare the 1.5-7 yr filtered 1000 year moving 1σ of SST in the time domain (Fig. 1c) to ensembles of SEAMUS runs carried out under various sediment and picking conditions within a best-case 40 cm ka -1 scenario ( Fig. 3 and Fig.   4). The equivalent figures for the 5 cm ka -1 and 10 cm ka -1 scenarios, which may more be representative best-case values for the open ocean areas of the Pacific (Metcalfe et al., 2020), can be found in the supplement.
We find that the discrete-depth, downcore 1σ value reconstructed using IFA analysis for the simulated 40 cm/ka scenarios varies greatly between all of the 100 ensemble runs in the case of IFA sample sizes typically used in the field, i.e. between 50 foraminifera ( Fig. 3a-b; Fig. 4a-b) and 100 foraminifera ( distinct time periods by comparing, e.g., a late glacial sediment slice's 1σ SST value to a late Holocene sediment slice's 1σ SST value. In such cases, our model results suggest that, for the aforementioned typical sample sizes deployed in the field (50-100 foraminifera), random chance may lead to any number of possible apparent outcomes regarding the relative apparent ENSO states of the late glacial and the late Holocene.
We do find, however, that greatly increased sample size, higer SAR and reduced measurement error can all significantly improve the probability of a given ensemble's IFA-derived downcore 1σ SST exhibiting significant correlation with the TRACE-21ka SST ENSO-type variation ( Table 2). We must stress, however, that our best-case scenarios involve constant SAR and BD, whereas real world conditions in the field are inherently dynamic and would, therefore, be even more challenging, even in the case of larger sample sizes. Additionally, we note that the improved correlation in the case of larger samples size does not correspond to a good reproduction of the absolute values of the ENSO-type variation as indicated by the TRACE-21ka SST ENSO-type variation. We note that even in the case of an extreme best-case scenario where it is possible to find, pick and analyse 10 4 foraminifera per cm, the absolute values of the ENSO-type variation derived from IFA are systematically greater than that of the TRACE-21ka SST ENSO-type variation ( Fig. 3g and Fig. 4g), despite good correlation (Table X). This offset in absolute values can be due to the fact that the 1.5-7 yr filtered, 1000 year smoothed TRACE-21ka standard deviation is reflecting a different integration of the time than the 1σ data retrieved from discrete-depth IFA. The former is based on a smooth of uniform time, whereas the latter is represents a population of foraminifera with a long-tailed age distribution. The absolute offset between the two signals is further increased in the case of machine error on the IFA SST analysis (Fig. 3h and Fig. 4h), thus highlighting the importance of accurately quantifying uncertainties in the analytical process.

Discrete-depth IFA distribution analysis
Many IFA studies have gone beyond studying a discrete depth's 1σ SST value and have branched into more forensic studies of a discrete depth's IFA-derived SST distribution. These studies have focussed on analysing the shape of said distribution using various statistical tools, including histograms to infer distribution skewness analysis of histograms (Khider et al., 2011;Leduc et al., 2009), as well as quantilequantile (Q-Q) plots (Ford et al., 2015;Rongstad et al., 2020;Thirumalai et al., 2019;White et al., 2018;White & Ravelo, 2020). Such analysis can reveal apparent shifts in the shape of the downcore, IFAderived SST distribution, which the aforementioned studies have attributed to changes in ENSO-type climate variability. Here, we compare the monthly TRACE-21ka SST data for the 18 ka to 17 ka period to our 100 ensembles of simulated IFA SST for our 40 cm ka -1 scenario, taking in each ensemble the 1 cm discrete-depth with a median age closest to 17.5 ka. We detail 100 ensembles with no analytical error and constant abundance (Fig. 5), 100 ensembles with ±1° C analytical error and constant abundance (Fig. 6), 100 ensembles with ±1° C analytical error and dynamic abundance (Fig. 7), and 100 ensembles with ±1° C analytical error and dynamic abundance (Fig. 8). In all cases in our 40 cm ka -1 scenario, we find that sample sizes typically associated with IFA in the field (50-100 foraminifera) produce high levels of noise, leading to low reproducibility from one ensemble to the next (panels a and d in Figs. 5-8), with even lower reproducibility in the case of the 5 and 10 cm ka -1 scenarios (see supplemental figures). In practical terms, these results suggest that if one were to, at the same coring location, retrieve multiple sediment cores and carry out discrete-depth IFA, it is likely that markedly different outcomes would be produced each time, each with poor correspondence to the true SST distribution. Furthermore, as the level of noise increases with lower SAR, one has to be additionally careful when comparing IFA results from sites with markedly different SAR (Thirumalai et al., 2019).
We also find that the IFA method has a tendency for noisy over-or undersampling of the tails of the true SST distribution in the case of typical sample sizes (50-100 specimens) used in the field (panels b and e in Fig. 5-8). This effect can be attributed to the fact that there is a low occurrence of individual foraminifera within the population that record more extreme temperatures, and small sample sizes are likely to either miss such foraminifera altogether (i.e., -100% oversampling), or, in the case of a single such foraminifera being picked within the sample, significantly over-represent extreme SST within the sample (in some cases >500% oversampling). This effect has practical consequences for interpretations made within IFA studies, seeing as the tails of the SST distribution also happen to be the region of interest when reconstructing the presence of extreme ENSO events (Koutavas et al., 2006;Rustic et al., 2015). This noisy under-or oversampling of the distribution tails by IFA also translates directly to sample Q-Q plots (panels c and f in Fig. 5-8), which are commonly used in IFA studies to investigate the population distribution (Ford et al., 2015;Rongstad et al., 2020). This level of noise in the tails increases substantially in the case of increased analytical error, i.e. when one compares panels a-f in Fig 5 (without simulated analysis error) and Fig. 6 (with ±1°C simulated analysis error). We furthermore find that even larger sample sizes involving 500 foraminifera are also prone to noisy under-or oversampling in the tails, especially in the case of analytical error (panels g, h, and i in Fig. 6). We also note that the tendency for under-and oversampling in the tails is greatly increased in the case of lower SAR that is more representative of real-world conditions (see supplemental figures for 5 cm ka -1 and 10 cm ka -1 SAR scenarios). Even in the case of sample sizes of 10 4 foraminifera in our 40 cm ka -1 scenario (panels j, k and l in Figs. 5 and 6) we also find sub-optimal agreement with the TRACE-21ka SST distribution in the tails. This disagreement is not due to noise, but due to the fact that we emulate the current state of the art, whereby researchers compare SST from a uniform interval of time (in our case 18 ka to 17 ka) to a sample of foraminifera retrieved from a discrete-depth with a foraminifera population characterised not by a uniform distribution, but an exponential distribution with a long tail towards older ages.
Finally, we investigate the influence of temperature-induced species abundance changes upon IFAderived SST distributions. Our 40 cm ka -1 simulations that have been run using the temperature abundance transfer function in Fig. 2a are shown in Fig. 7 (without analytical noise) and Fig. 8 (with analytical noise). We find that in all cases, the IFA-derived SST distribution is biased towards too warm values when compared to the TRACE-21ka SST distribution (panels a, d, g and j in Fig. 7 and Fig. 8).
This bias can also be visualised as an oversampling of warmer values (panels b, e, h, k in Fig. 7 and Fig.   8), or bias in a Q-Q plot (panels c, f, i, l in Fig. 7 and Fig. 8). We demonstrate that a species' abundance response to temperature can inherently bias IFA-derived reconstructions of SST distribution, which could have practical consequences for studies in the field. For example, the results in studies that compare IFAderived SST distributions from significantly differing mean climate states (White et al., 2018;White & Ravelo, 2020), may be (partially) attributable to a species' temperature abundance response to the dominating SST profile associated with the differing climate states. Our results demonstrate the importance of incorporating understanding of past temporal changes of species abundance and its relationship to past SST to better constrain such studies.

Conclusion
Our best-case modelling study reveals a number of challenges which inhibit the efficacy of discrete-depth IFA in producing reconstructions of past SST distribution, the latter of which is paramount in reconstructing past ENSO-type climate dynamics. Firstly, we find that bioturbation of sediment archives, combined with typical sample sizes employed in IFA-based studies, can lead to noisy IFA-derived SST distribution reconstructions. This noise leads to poor reproducibility with a potential for artefactual results. We propose that this problem can be addressed by quantifying the total error on IFAreconstructions using two main approaches: (1) Ensemble-based forward model studies, as detailed in this study using best-case scenarios, can be run using the dynamic sediment and species abundance parameters present at a particular archive in the field. This approach will help estimate the total stochastic error associated with the IFA-derived reconstruction. Care must be taken to include uncertainties regarding time-domain estimations of SAR, BD, species abundance, and analytical uncertainty. (2) Replication studies in the field (essentially a real-world ensemble approach) can help to further understand of the the stochastic noise involved with IFA reconstructions. We furthermore have shown in our best-case study that a species' abundance response to SST can inherently bias IFA-derived reconstructions of past SST variance. We propose that the coupling of a single foraminifera sediment model approach to foraminiferal ecological models (Lombard et al., 2011;Metcalfe et al., 2020;Roche et al., 2018) could further help to constrain the total uncertainty associated with IFA-derived SST reconstructions.
We have also demonstrated that SST from uniform periods of time (as humans are accustomed to using) cannot directly be compared to IFA-derived SST which is retrieved from a population with an age distribution characterised by an exponential distribution with a long tail towards older ages. Subsequently, we propose that researchers adjust observational or model SST data to integrate an exponential representation of time when comparing to IFA-derived SST.
Finally, we note that although our study concentrates on the IFA method, our results are also useful for researchers who measure pooled samples of foraminifera (i.e. measurement upon multi-specimen samples) to reproduce a general SST history in the Pacific. Our ensemble results help shine light upon the intra-sample variability that could be expected within such samples, as well as possible noise artefacts present in the mean value.

Author contributions
BCL and BM conceived the study. BCL executed the model runs and wrote the manuscript, with input from BM.

Data availability
No new data have been generated in this study.  Ocean; ARA -Arabian Sea. We have estimated the 1σ value of age in 1 cm of sediment based on the SAR and a BD of 10 cm (Boudreau, 1998), using the following calculation based on (Berger & Heath, 1968): BD/SAR×1000, where SAR is entered in cm ka -1 and BD in cm.     year (12000 month) moving 1σ of the 1.5-7 year filtered monthly SST data (as also shown in Fig. 1c.) The left panels (a, c, e and g) show the output of scenarios with 50, 100, 500 and 10 4 randomly picked foraminifera per discrete 1 cm depth, all with constant species abundance and no assumed analytical error. The right panels (b, d, f and h) show the output of scenarios with 50, 100, 500 and 10 4 randomly picked foraminifera per discrete 1 cm depth, all with constant species abundance and an assumed analytical error of ±1°C in SST. and 10 4 randomly picked foraminifera per discrete 1 cm depth, all with dynamic species abundance (following Fig. 2a) and an assumed analytical error of ±1°C in SST. 425 Figure 5. Simulated single foraminifera SST distributions from 100 ensembles of SEAMUS runs, with SAR of 40 cm ka -1 , BD of 10 cm, no analytical errror and constant abundance. In each ensemble, the single foraminifera SST distribution from a single discrete depth with a simulated median age of 17.5 ka is shown, and compared to the TRACE-21ka SST distribution for the 18 ka to 17 ka period. The left panels (a, d, g and j) show the 100 SEAMUS ensembles as coloured lines in the case of 50, 100, 500 and 10 4 randomly picked foraminifera, with the TRACE-21ka SST distribution is shown as a black line. The middle panels (b, e, h and k) show the the rate of over/undersampling for each of the 100 SEAMUS ensembles (coloured lines) relative to the TRACE-21ka SST distribution (black line) in the case of 50, 100, 500 and 10 4 randomly picked foraminifera. The right panels (c, f, i and l) show Q-Q plots of the 100 SEAMUS ensemble quantiles vs the TRACE-21ka quantiles as coloured lines in the case of 50, 100, 500 and 10 4 randomly picked foraminifera, with a perfect 1:1 correspondence to TRACE-21ka shown for reference as a black line.  plots of the 100 SEAMUS ensemble quantiles vs the TRACE-21ka quantiles as coloured lines in the case of 50, 100, 500 and 10 4 randomly picked foraminifera, with a perfect 1:1 correspondence to TRACE-21ka shown for reference as a black line. Figure 8. Simulated single foraminifera SST distributions from 100 ensembles of SEAMUS runs, with SAR of 40 cm ka -1 , BD of 10 cm, ±1 °C analytical errror and dynamic abundance (following Fig. 2a). In each ensemble, the single foraminifera SST distribution from a single discrete depth with a simulated median age of 17.5 ka is shown, and compared to the TRACE-21ka SST distribution for the 18 ka to 17 ka period. The left panels (a, d, g and j) show the 100 SEAMUS ensembles as coloured lines in the case of 50, 100, 500 and 10 4 randomly picked foraminifera, with the TRACE-21ka SST distribution is shown as a black line. The middle panels (b, e, h and k) show the the rate of over/undersampling for each of the 100 SEAMUS ensembles (coloured lines) relative to the TRACE-21ka SST distribution (black line) in the case of 50, 100, 500 and 10 4 randomly picked foraminifera. The right panels (c, f, i and l) show Q-Q plots of the 100 SEAMUS ensemble quantiles vs the TRACE-21ka quantiles as coloured lines in the case of 50, 100, 500 and 10 4 randomly picked foraminifera, with a perfect 1:1 correspondence to TRACE-21ka shown for reference as a black line.