A spatial reconstruction of Siberian Last Glacial Maximum climate from pollen data

The Last Glacial Maximum (LGM, around 21.000 years before present) was a period with significantly colder global mean temperature, large Northern Hemisphere ice sheets, and lower CO2 concentrations. Siberia was affected by a lower sea level which led to a closed Bering strait and a northward shift of the Arctic Ocean coastline. However, unlike other high-latitude areas, Siberia was not covered by a terrestrial ice sheet at that time. Climate simulations with LGM boundary conditions show large inter-model differences especially in Northern Siberia, which impede a direct analysis of Siberian LGM climate from simulations. Preserved pollen samples provide information on the LGM vegetation and climate state.
We reconstruct mean temperature of the warmest month and mean annual precipitation from a network of pollen samples using weighted averaging partial least squares transfer functions. Combining these local reconstructions with a multi-model ensemble of PMIP3 climate simulations in a Bayesian framework, we obtain the first probabilistic spatial reconstruction of Siberian LGM climate. Our reconstruction shows less cold summer temperature anomalies along the coastlines than in Western Siberia, and a weaker precipitation reduction in Eastern Siberia compared to Western Siberia. Finally, we quantify the mismatch between each ensemble member and the proxy data. By examining common features among similarly performing simulations, we find that most simulations with a comparatively warm and wet Siberian climate score better.
Our data constrained state estimate is well-suited for future analyses of Siberian LGM climate. Potential applications include the investigation of reasons for the absence of a Siberian ice sheet during the LGM, the quantification of land-atmosphere carbon fluxes, and the estimation of past afforestation rates.

Northern Siberia, which impede a direct analysis of Siberian LGM climate from simulations. Preserved pollen samples provide information on the LGM vegetation and climate state.
We reconstruct mean temperature of the warmest month and mean annual precipitation from a network of pollen samples using weighted averaging partial least squares transfer functions. Combining these local reconstructions with a multi-model ensemble of PMIP3 climate 15 simulations in a Bayesian framework, we obtain the first probabilistic spatial reconstruction of Siberian LGM climate. Our reconstruction shows less cold summer temperature anomalies along the coastlines than in Western Siberia, and a weaker precipitation reduction in Eastern Siberia compared to Western Siberia. Finally, we quantify the mismatch between each ensemble member and the proxy data. By examining common features among similarly

Introduction
During the Last Glacial Maximum (LGM), around 21.000 years before present (21ka), the global mean temperature was significantly colder than today. Large areas of the Northern Hemisphere, in particular In the Pacific sector glaciers were near present-day extension during the LGM (Meyer and Barr, 2017;Stauch and Gualtieri, 2008).
For Western Siberia, pollen-based reconstructions from Tarasov et al. (1999) show substantially reduced precipitation during the LGM and colder summer temperature anomalies in mid-latitudes than in high latitudes. The Baikal lake is well studied from lake sediment records (Colman et al., 1995;Williams 70 et al., 1997). Climate changes are mostly inferred from diatoms and biogenic silica but quantitative climate reconstructions are not available. Borehole temperature reconstructions show minimal glacial temperatures with mean annual temperature anomalies of -10 K around 90ka (Dorofeeva et al., 2002), coinciding with the maximal Siberian glacier extent. Subsequent warming leads to an LGM anomaly of around -4 K compared to present-day. 75 In high-latitude Central Siberia and along the Arctic Ocean coastline, dryer conditions than today are reconstructed from a variety of proxies (e.g. Sher et al., 2005;Pitul'Ko et al., 2007;Andreev et al., 2011;Ashastina et al., 2018;Opel et al., 2019). Stable isotope analyses and macrofossils indicate regionally homogeneous colder winter temperatures (e.g. Meyer et al., 2002;Kienast et al., 2005;Andreev et al., 2011;Wetterich et al., 2011;Opel et al., 2019). In contrast, inconsistent results are reported for summer 80 temperature. While Andreev et al. (2011) deduce colder-than-present summers from pollen and insect assemblages along the Arctic Ocean coastline, Kienast et al. (2005) and Sher et al. (2005) interpret macrofossil, insect, and mammal findings from the Lena delta as indicators of warmer or similar to present-day summer temperatures. Pitul' Ko et al. (2007) and Ashastina et al. (2018) infer slightly cooler LGM summer temperatures from pollen records in the Yana region between Lena and Kolyma. 85 In Eastern Siberia, Berman et al. (2011) reconstruct positive LGM summer temperature anomalies for the Kolyma region from invertebrates. Within uncertainties, this range coincides with similar to presentday summer temperature estimates by Alfimov and Berman (2001). In the same study, strongly positive anomalies are reconstructed for Ayon Island, which was part of the Siberian mainland during the LGM. In contrast, Lozhkin et al. (2007) reconstruct substantially negative summer temperature anomalies for 90 the more continental El'gygytgyn Lake. In the Pacific sector,  reconstruct LGM summer temperatures similar to today on Kamchatka peninsula from branched glycerol dialkyl glycerol tetraethers and infer similar or higher precipitation amounts from these estimates via indirect glacier modeling (Meyer and Barr, 2017). Off the Pacific coast, Kiefer et al. (2001) and Meyer et al. (2016) infer summer sea surface temperature anomalies around -1.5 K from Foraminifera assemblages and Tetra Ether 95 index (TEX 86 ) based paleothermometry in the Northwest Pacific, compared to -2 K to -6 K anomalies in MARGO Project Members (2009). For the Sea of Okhotsk, Seki et al. (2009) infer summer SST anomalies of -3 K from TEX 86 .
Despite these efforts, Siberia is currently underrepresented in global climate reconstruction compilations (Bartlein et al., 2011) and no spatial reconstruction of the Siberian LGM climate exists to date. 100 Climate model simulations of present-day conditions feature substantial surface temperature biases and inter-model differences over Siberia (Miao et al., 2014). As a consequence, large uncertainties persist in future temperature projections. Even larger inter-model differences exist for simulations of Siberian climate under LGM conditions (see Fig. 5 and Online Resource 1). To get a better constrained state estimate of LGM climate conditions and to quantify the ability of climate models to simulate the Siberian

105
LGM climate, we extend the previously available spatial synthesis of proxy records.
Due to the absence of large ice sheets, fossil pollen samples from the LGM are preserved in lake sediments and peat bogs. We compile a synthesis of published pollen records covering large parts of Siberia. From this proxy network, we reconstruct mean temperature of the warmest month (MTWA) and mean annual precipitation (P ANN ) during the LGM using the weighted averaging partial least squares 110 (WAPLS) algorithm, which is a well-established statistical pollen-climate transfer function (Birks et al., 2010).
The spatial distribution of our pollen sample network is too coarse to create gridded maps with simple statistical interpolation. Therefore, using Bayesian statistics, we combine the local reconstructions with a multi-model ensemble of climate simulations from the Paleoclimate Modelling Intercomparison 115 Project Phase III (PMIP3, Braconnot et al., 2011, 2012 to compute a spatial reconstruction of Siberian LGM climate. Our results refine previous global-scale estimates by Annan and Hargreaves (2013), and provide the first dedicated spatial reconstruction of Siberian climate at the LGM. A closely related approach was used by Cleator et al. (2019) to reconstruct the European LGM climate. In the spatial reconstruction, the proxy data provides local climate information whereas the simulations control the The structure of this study is as follows: In the next section, we present the pollen data and climate simulation ensemble. The statistical pollen-climate transfer functions, the spatial reconstruction framework, and the simulation evaluation procedure are described in Sect. 3. In Sect. 4, we present results from the local and spatial reconstructions as well as the simulation ensemble evaluation. Finally, we discuss potential implications for atmospheric processes governing the Siberian LGM climate, compare 130 our results with previous work, and discuss potential applications and extensions of our results in Sect. 5. In Online Resource 1, we provide technical descriptions of the statistical algorithms, evaluations of the statistical methods, and additional figures. For local climate reconstructions, we use a compilation of 37 pollen records, which are located between 69 • E and 178 • W and 42 • N and 75 • N. The locations of the cores are depicted by black dots in Fig.  1. For each core, the raw radiocarbon dates as well as pollen percentages had to be available to be included in the synthesis. Age models were recalculated with the Bayesian age model Bacon (Blaauw and Christen, 2011) following Cao et al. (2013). We use all samples with calibrated mean age between 140 19ka and 23ka. This leads to a total of 170 samples. The records are the LGM subset of a larger Siberian pollen compilation (Cao et al., 2020). The taxonomy of the pollen counts is homogenized as described in Cao et al. (2013), Cao et al. (2014), and Cao et al. (2020. To apply statistical pollen-climate transfer functions, a modern pollen and climate calibration dataset is compiled, expanding the previously presented dataset for China and Mongolia (Cao et al., 2014).

145
MTWA, mean temperature of the coldest month (MTCO), and growing degree days above 5 • C (GDD5) climatologies are derived from the Berkeley Earth Surface Temperature dataset (Rohde et al., 2013) for the period 1951 to 1980. P ANN and mean annual temperature (T ANN ) are extracted from the CHELSA high resolution climatologies for the period 1979 -2013 (Karger et al., 2017).
The data collection and processing of the samples is described in detail in Cao et al. (2020) for the 150 fossil data and in Cao et al. (2014) for the calibration data. Further information on the proxy records are provided in Online Resource 1.

Climate simulations
A multi-model ensemble of climate simulations, which was created within PMIP3, is used for interpolation in the spatial reconstructions and evaluated against the paleoclimate data. The boundary conditions 155 in the simulations were adjusted to the LGM, including changed orbital configurations, greenhouse gas concentrations, ice sheets, and topography (Braconnot et al., 2011). Here, we rely on simulations from the CCSM4, COSMOS-ASO, FGOALS-g2, GISS-E2-R, IPSL-CM5A-LR, MIROC-ESM, MPI-ESM-P, and MRI-CGCM3 models in the PMIP3 project. The simulations form a multi-model ensemble with common boundary conditions, i.e. they follow the same experiment protocol. The models were run until 160 an equilibrium state was obtained (spin-up), after which simulations were continued and archived for at least 100 years. We interpolate all simulations bilinearly to a common 2 • by 2 • grid. The climatological means for the respective reconstruction variables are extracted. This minimizes inter-annual to decadal variability, which is not resolved by our proxy compilation and follows the procedure of Annan and Hargreaves 165 (2013). Anomalies with respect to the preindustrial simulations are computed. For P ANN , we use the relative anomalies from the modern values, since they are more comparable across space.
Further properties of the simulations are provided in Online Resource 1.

Methods
The workflow summary in Fig. 2 shows how the datasets described in the Sect. 2 and the estimated sta-170 tistical parameters contribute to the local reconstructions, the spatial reconstructions, and the ensemble member evaluation.

Local reconstructions
Local reconstructions from each pollen sample are performed with WAPLS transfer functions  following the methodology from Cao et al. (2014). WAPLS is a regression-based statistical transfer function that consists of two stages (Birks et al., 2010). First, the climate optimum of each taxa is estimated based on the modern calibration dataset. Second, the climate optima are weighted according to the pollen percentages in the fossil pollen sample to obtain a local climate reconstruction. Partial least squares regression is used to account for residual correlations in the training data not explained by the respective climate variable. Uncertainty estimates are calculated 180 using the bootstrapped RMSEP from cross-validation of the modern calibration dataset (Juggins, 2017). Due to the absence of core top samples for several records, anomalies of the reconstructions are calculated with respect to the calibration climatology. The pollen percentages are square-root transformed prior to computing the transfer functions. For each proxy record, modern samples within a 2000 km radius are used for calibration, in order to include 185 a sufficient environmental gradient for resolving glacial-interglacial differences. The number of used WAPLS components is selected from a randomization t-test (Juggins and Birks, 2012;van der Voet, 1994). Depending on the test results either one or two components for each of the records are chosen.
Climate reconstructions from statistical transfer functions can be biased if the signal of a dominant environmental variable superimposes the reconstructed variable (Juggins, 2013;Rehfeld et al., 2016). 190 Therefore, the selection of important reconstruction variables, which explain independent parts of the variability in the pollen data, needs to be examined statistically. We test the potential to reconstruct MTWA, MTCO, GDD5, T ANN , and P ANN using the coefficient of determination between observed and predicted variables in the calibration data (R 2 ), the root mean square error of prediction in the calibration data (RMSEP), and the significance test of Telford and Birks (2011), which tests the significance of the 195 explained variance in the fossil data against randomized transfer functions. In these tests, MTWA and P ANN perform best for the highest number of records. The performance of P ANN is less correlated with the different temperature variables, which indicates that it explains more independent variance from MTWA than the other temperature variables. Therefore, MTWA and P ANN are chosen for local reconstructions from the pollen samples. These choices are in accordance with previous interpretations 200 of Siberian pollen records as recorders of summer temperature and aridity (e.g. Alfimov and Berman, 2001;Kienast et al., 2005;Sher et al., 2005;Wetterich et al., 2011;Ashastina et al., 2018).
Results from the evaluation of the transfer function models are provided in Online Resource 1.

Spatial reconstruction framework
The spatial reconstruction framework is adapted from Weitzel et al. (2019). The strategy is to use the 205 climate simulation ensemble for spatial interpolation and structural extrapolation of the local climate reconstructions from the pollen samples. The ensemble spread provides an additional constraint on physically reasonable climate states, particularly in regions that are sparsely covered by proxy records. Our strategy to combine the two sources of information is based on Bayesian statistics. This allows us to separately model the proxy-climate relationship, called data stage, and the spatial interpolation, 210 which is called process stage. Moreover, the Bayesian framework provides a natural way to incorporate uncertainties from the local reconstructions and the climate simulation ensemble. The goal is to estimate the so-called posterior probability, which is the probability of the past climate C p , interpolation parameters ϑ, and transfer function parameters θ, conditioned on the proxy data P : (1) The prior stage in Eq.
(1) contains weakly-informative prior distributions of the parameters θ and ϑ 215 which are necessary to ensure that the posterior distribution is a valid probability distribution. The data stage models the incorporation of the information from the local reconstructions using a Gaussian observation model. The best estimates from the transfer functions are employed as data and the bootstrapped RMSEPs are the standard deviations of each proxy sample. The samples are modeled as unbiased estimators of the climate anomaly in their respective grid boxes. They influence 220 the spatial reconstruction in other grid boxes only through the spatial interpolation. As part of the transfer function uncertainty is shared among samples from the same record, for example uncertainties related to the modern calibration dataset, we introduce a correlation parameter which estimates the shared fraction of the uncertainty. As the WAPLS transfer functions do not provide information on the amount of the dependent uncertainties, this parameter is fitted in the spatial reconstruction framework.
We previously tested several strategies to formulate the process stage (Weitzel et al., 2019). Here, we employ the best performing model from Weitzel et al. (2019), which is a multivariate Gaussian distribution estimated from the simulation ensemble with additional parameters. Compared to a purely Gaussian formulation, these parameters allow a more flexible adjustment of the ensemble to the proxy data and they mitigate overconfidence originating from climate simulation inadequacies and the small 230 number of ensemble members. The mean of the Gaussian distribution is given by a weighted mean of the ensemble members. The weights are estimated from the local reconstructions. This gives more importance to ensemble members that fit better to the proxy data. The spatial covariance matrix of the Gaussian distribution combines the empirical covariance matrix of the simulation ensemble with a reference correlation matrix for regularization. Due to the relatively coarse resolution of the simulations, 235 the empirical covariance mainly provides physically motivated estimates of the large-scale spatial correlations and the range of physically reasonable climate states. The reference correlation matrix decays with distance and models the small-scale correlations.
We infer the posterior distribution using Markov chain Monte Carlo methods. A technical description of the spatial reconstruction framework is provided in Online Resource 1. As in Weitzel et al. (2019), 240 the reconstructions of MTWA and P ANN are evaluated with pseudo-proxy experiments using the climate simulation ensemble as reference climatologies and with cross-validation experiments. Details of the evaluation methods and evaluation results are given in Online Resource 1.

Ensemble member evaluation
We use the Bayesian framework to additionally quantify the deviation of proxy data and ensemble 245 members. This is achieved by computing the log-likelihood score (LLS) of the ensemble members given the proxy data P . The LLS is a proper score function for probabilistic predictions (Gneiting and Raftery, 2007) and is therefore well-suited for model evaluations under uncertainties. For a given ensemble member M k , the LLS is given by The LLS is derived from the spatial framework by integrating out C p to get a direct likelihood of an 250 ensemble member given the proxy data. We choose a flat prior for M k , which gives equal prior probability to every possible climate state. A technical description of the evaluation method is provided in Online Resource 1. The model-data comparison allows us to cluster simulations according to their fit with the proxy data. This clustering is used in Sect. 5.1 to study potential changes in the atmospheric circulation which 255 govern the reconstructed MTWA and P ANN anomalies.

Local reconstructions
The local reconstructions are shown in Fig. 3 together with the uncertainty estimates from the bootstrapped RMSEP. The estimated LGM anomalies show mostly cooler conditions with less precipitation 260 compared to present-day. The reconstructions exhibit heterogeneity on large and medium spatial scales. Examples for large scale differences are more negative MTWA anomalies in continental Siberia compared to the coastlines, and more precipitation reduction in Central Siberia (Record IDs 1,2,16,22,24,25,26,30,31) than along the Arctic Ocean (IDs 12,14,17,19,28). On medium scales, we find for example deviations between the colder Yana region (IDs 14,22,25) and the upper Kolyma region (IDs 6,8,9,265 11,15,23,29,37), where MTWA anomalies are near present-day values, and between the wetter Amur region (IDs 3,4,5,20,21,32,35,36) and the dryer upper Kolyma region. Within these regions, the anomalies are relatively consistent, which indicates more homogeneous estimates on small spatial scales.
Mean MTWA anomalies of up to 5 K cooler than present-day are reached in large parts of the domain, but less negative and in some grid boxes even positive anomalies are present near the modern coastlines 270 of the Arctic and Pacific Oceans and in the upper Kolyma region. With standard deviations between 1.0 K and 2.2 K, the temperature uncertainties of the samples are relatively homogeneous in space. However, higher temporal sampling rates can reduce the uncertainty of the mean anomalies of pollen records.
For most samples, lower-than-present precipitation is reconstructed, with a maximum reduction by 275 around 60%. A regionally consistent increase of precipitation is only inferred in the Amur region.
The transfer function uncertainties are typically large with standard deviations between 9.8 and 47.8 percentage points (%p).
In records with high temporal sampling rates, we find mostly small inter-sample variations during the 19ka to 23ka time slice. This supports our assumption of a relatively stationary LGM climate within 280 the defined time period.

Spatial reconstructions
The area weighted mean anomaly of the spatial MTWA reconstruction is -1.9 K (see Table 1) with a 90% credible interval (CI) of (-2.7 K, -1.2 K). This spatial average is the result of regional heterogeneity with most negative LGM anomalies up to -8 K over the Barents-Kara ice sheet in the northwestern 285 corner of the domain, and anomalies around -5 K in the southwestern part of the domain and over the open Pacific Ocean. In contrast, temperatures near modern values are reconstructed along most of the modern coastlines of the Arctic and Pacific Ocean and at the Sea of Okhotsk. An up to +7 K warmer LGM MTWA is featured in Arctic Ocean regions that are below sea level today but were above sea level during the LGM (Fig. 4a). The maximal positive and negative anomalies in the spatial reconstruction 290 are considerably higher than in the local reconstructions (Fig. 3) due to structural spatial extrapolation via the climates simulations. For example, temperatures are near the modern values or slightly warmer along the modern coastline of the Arctic Ocean in the local reconstructions. PMIP3 simulations with similar values along the modern coastline produce pronounced warmer summers over the Siberian shelf that is now below sea level. The negative anomalies in the southeastern and southwestern part of the 295 domain are mostly significant on a 5% level. However, in the central and northern areas anomalies are only significantly negative in regions with strong proxy constraints like the highland between Lena and Kolyma and over the most-northwestern area which was covered by the Barents-Kara ice sheet during the LGM.
Uncertainties (grid box-wise 90% CIs, Fig. 4c) feature a strong northwest-to-southeast gradient with 300 CI sizes down to 2 K in the southeastern part of the domain and up to 14 K at the northwestern modern coastline of the Arctic Ocean. This pattern is similar to the ensemble spread in the PMIP3 simulations, which differ most in Northern Siberia and particularly over the Siberian shelf. The uncertainty in the unconstrained PMIP3 ensemble is reduced most near proxy records, but also large-scale features are better constrained. On average, the grid box-wise uncertainty reduction from the unconstrained PMIP3 305 ensemble to the spatial reconstruction is 58.4%. The effect of proxy data on the posterior distribution is demonstrated by comparing two 20 • times 10 • areas, one in the eastern and one in the western part of the domain (rectangles in Fig. 1). In both regions the posterior mean is warmer than the prior mean and the posterior distribution is more constrained than the prior (Fig. 5). However, more proxy records are located in and near the eastern 310 subdomain. Therefore, the reduction of uncertainty is much larger for this region.
Spatially averaged, the mean P ANN anomaly is negative with 26.2% less precipitation during the LGM ( Table 1). The 90% CI of that estimate is (-30.8%, -21.4%). The most-negative anomalies with up to 50% less precipitation are reconstructed in the northwestern part of the domain. Precipitation amounts similar to present-day climate are found in the Amur region due to the relatively high P ANN values in 315 the local reconstructions. In all other regions, the mean P ANN anomalies are between -15% and -40%. Despite the large uncertainties in the local reconstructions, the anomalies in the spatial reconstruction are significant (p ≤ 0.05) in most areas besides the Amur region as a result of negative anomalies in all PMIP3 simulations (Fig. 4b).
The spatial average of the grid box-wise 90% CI sizes is 34.0 %p, which is just an 8.9% reduction of uncertainty compared to the unconstrained PMIP3 ensemble. At grid boxes with proxy data, the reduction of uncertainty is largest with 21.8%, but the proxy signals are not spread out in space substantially due to much weaker spatial correlations than in the MTWA reconstructions. The smallest uncertainties are found in the same regions than in the unconstrained PMIP3 ensemble. The grid box-wise uncertainties are mostly between 20 %p and 50 %p, but extreme values down to 10 %p and up to 150 %p are featured 325 in some grid boxes (Fig. 4d).
Overall, the MTWA reconstruction is more strongly influenced by the proxy data due to comparatively stronger constraints from the local reconstructions. This leads to higher ensemble member weights for comparatively warm simulations and those with decreasing anomalies from northeast to southwest. On average, the posterior mean is 3.3 K warmer than the PMIP3 ensemble mean and it is warmer across 330 almost the whole domain. The highest weights are placed on the MRI-CGCM3, the COSMOS-ASO, and the IPSL-CM5A-LR simulations, whereas GISS-E2-R and MIROC-ESM have the lowest weights (see Online Resource 1). On the other hand, the spatial structure of the P ANN reconstruction is close to that of the ensemble mean. However, the mean anomaly is 4.6 percentage points wetter than the ensemble mean and highest weights are given to the two wettest models, MPI-ESM-P and IPSL-CM5A-LR. In 335 contrast, the driest model, MIROC-ESM, has the lowest weight.
Looking at the spatial covariance estimates, we find larger spatial correlations for MTWA than for P ANN . We quantify this by computing the posterior mean 1/e-decorrelation length (in degrees longitude/latitude) across space from fitting a smoothing spline to all pairs of grid boxes. For MTWA, the mean decorrelation length is 31.8 • , whereas it is only 6.8 • for P ANN (Table 1). This finding indicates 340 that large-scale spatial structures are better constrained by the proxy data for MTWA than for P ANN . In addition to the stronger proxy constraints for MTWA, the larger spatial correlations lead to the stronger uncertainty reduction from the unconstrained PMIP3 ensemble to the posterior distribution.
The correlation parameter, which estimates the fraction of the dependent part of the transfer function uncertainty between samples from the same sediment core, is determined from discrepancies between 345 local reconstructions of nearby pollen cores and the assumption of a stationary climate during the LGM time slice. The posterior distribution of this correlation parameter is well constrained by the proxy data for both variables. It is concentrated between 0.89 and 0.93 for MTWA and between 0.76 and 0.84 for P ANN (Table 1). This means that among samples from one pollen record only 10-20% of the transfer function uncertainty is independent. As Fig. 3 shows that the assumption of a stationary climate during 350 the LGM is reasonable, it indicates that the majority of the uncertainty is related to shared error sources like the modern calibration. Fig. 6a shows the LLS for the eight ensemble members. In addition, the LLS of the ensemble mean and of a weighted mean, with weights according to the likelihoods of each ensemble member, is provided.

355
For MTWA, the MRI-CGCM3 has the highest score, followed by COSMOS-ASO and IPSL-CM5A-LR. The highest LLS for P ANN are achieved by MPI-ESM-P and IPSL-CM5A-LR. The IPSL-CM5A-R is the only model in the top three for both climate variables, whereas MIROC-ESM and GISS-E2-R are the bottom two for both variables. Besides, there is little coherence in the order of the ensemble members between MTWA and P ANN LLS. 360 We find that the ensemble mean scores better than the mean LLS of the simulations, but below the best ensemble members. In contrast, the weighted mean performs similar to the best ensemble members. This means that the ensemble mean overcomes some of the deficiencies of individual ensemble members, the large inter-model differences still lead to a worse fit than the best ensemble members and the weighted mean.

365
Looking further into the model fit for different subregions, Fig. 6b-i shows the MTWA fit for moving windows of size 22 • times 10 • . In most cases, models with consistently good regional fits are also performing well in the whole domain analysis (e.g. MRI-CGCM3, IPSL-CM5A-LR, and COSMOS-ASO). However, the overall LLS is not just the average of the local fits, but corrects for the non-uniform spatial distribution of proxy records and takes into account potential teleconnections. For example, 370 the MTWA MPI-ESM-P climatology performs above average in most subregions (Fig. 6h), but belowaverage in the whole-domain LLS due mismatches in the large-scale structures. The subregion analysis for P ANN is included in Online Resource 1.

Potential implications for atmospheric processes 375
The physical processes that lead to the large-scale Siberian climate in the models, and that are assessed by the LLS, can be investigated further. Therefore, we compare model-data mismatches and large-scale circulation features in different simulations.
In general, simulations with less cold and fairly wet conditions in Siberia perform better (compare Fig. 6a and Fig. 7a). Examples are the highest MTWA LLS for the MRI-CGCM3, which features the 380 least negative MTWA LGM anomalies, and the highest P ANN LLS for the MPI-ESM-P, which exhibits the smallest precipitation decrease. Exceptions from these relationships can be mostly traced back to deviating teleconnection patterns between proxy samples and climate simulations such as overestimated spatial gradients between subdomains.
Whether Siberia is warmer and wetter in a specific simulation does not relate to the global T ANN 385 anomaly or the global P ANN anomaly in the respective models ( Fig. 7a,b). However, there is a strong concordance between Siberian T ANN and MTWA, although the magnitude of negative MTWA anomalies is smaller in all models. The relative reduction of precipitation is much larger in Siberia than globally, but across the different models, the coherence of P ANN and T ANN changes in Siberia is weak. This indicates that the precipitation decrease is more related to dynamic properties or microphysics than to 390 thermodynamic constraints. The LLS is strongly influenced by large-scale features like the spatial mean anomalies and north-south as well as east-west gradients. Northern Hemisphere polar amplification (Masson-Delmotte et al., 2006) controls the change of the temperature gradient between the Arctic and the Tropics. Therefore, it is potentially an important proxy of some of these large-scale features, for example the latitudinal anomaly 395 gradient. Fig. 7c shows the ratio of the mean anomaly between 70 • N and 90 • N and between 0 • and 20 • N at 500 hPa, 600 hPa, and 700 hPa. These levels are chosen because at this altitude estimates are less influenced by the different representation of orographic features in the climate models and therefore a better indicator of the atmospheric state. During July, polar amplification at 600 hPa varies in the eight models between 1.2 in MRI-CGCM3 and 3.4 in GISS-E2-R. The four models, which perform best in 400 either MTWA or P ANN , are those with the smallest polar amplification factors during July and annually. These models also feature more stable estimates across the different pressure levels.
As Siberian climate is strongly influenced by the structure of the polar jet stream and the large-scale pressure systems, we look at the wind speed and geopotential height anomalies at 250 hPa (Fig. 8). To summarize the differences between simulations with higher and lower LLS, we compare the ensemble 405 mean anomalies and the anomalies of the LLS-weighted mean, with weights from the MTWA evaluation. Plots of the anomalies for all ensemble members are provided in Online Resource 1. Most models show a band-structure with increased 250 hPa LGM wind speeds in the mid-latitudes accompanied by decreased wind speed in the terrestrial high latitudes and subtropics. However, the magnitude of the dipole between increased wind speed in the mid-latitudes and decreased wind speed in the high latitudes varies across 410 the models. Fig. 8a,b show that this gradient is shallower in the weighted mean, which is dominated by simulations with high LLS values, compared to the ensemble mean, where low scoring models contribute equally to the ensemble average. However, both averages feature a slight northward shift of the mean jet stream position. While the unweighted ensemble mean features a strong negative geopotential height anomaly (Fig. 8c), in particular over Europe and Eastern Siberia (i.e. a more pronounced Eastern 415 Siberian low than in the preindustrial simulations), the anomaly magnitudes are much smaller in the weighted mean (Fig. 8d) with just a slight negative anomaly, which might be caused by the overall cooler LGM atmosphere.
Assuming that better fitting ensemble members produce more realistic large-scale atmospheric structures which govern the surface temperature and precipitation patterns, we can deduce that during the LGM the jet stream position was slightly shifted northward with stronger wind speeds over the midlatitudes and reduced wind speeds over high latitudes. However, the intensification of the jet stream and the changes of the pressure structure was less extreme than in several simulations. The deviating magnitude of geopotential height anomaly between eastern and western North Pacific in the weighted mean suggests a westward shift of the North Pacific High, which is also hypothesized by . This might lead to the transport of relatively warm and moist air masses to the Pacific coastline and thus a weaker reduction of precipitation and MTWA in the better fitting simulations. The assumption that near-surface skill of an ensemble member is positively correlated to upper troposphere skill has been tested with mixed results on inter-annual scales in the Twentieth Century Reanalysis (Compo et al., 2011). It is unclear how these results translate to longer timescales. Future research to test these 430 relationships could focus on the consistency of our results with reconstructions of European and North American LGM climates.
The improved performance of models with small negative or even positive MTWA anomalies along the coastlines of the Arctic and Pacific Ocean (cf. Online Resource 1) indicates that in these models the local MTWA anomalies are more directly influenced by the changed local energy balance due to the 435 lower sea level. This explanation would fit with the simulation of reduced cloudiness and thus increased short-wave radiation at the surface by Löfverström et al. (2014). In the models with substantial cold anomalies along the shorelines, these local changes might be superimposed by the large-scale negative LGM anomaly signal as diagnosed also from the more extreme geopotential height and wind speed anomaly anomalies.

Comparison with previous work
In this section, we compare our results with previous climate reconstructions for different parts of Siberia as well as sea surface temperature (SST) reconstructions for the Northwestern Pacific. We structure this comparison geographically by looking at Western Siberia and the Baikal region, the Arctic Ocean coastline, Eastern Siberia, and the Pacific sector. The comparison is summarized in Table 2.

445
Our reconstruction supports the finding of less negative MTWA anomalies in the high latitudes of Western Siberia (+2 K to -3 K by Tarasov et al., 1999) than in the mid-latitudes (-3 K and -8 K by Tarasov et al., 1999) Meanwhile, the coldest anomalies in Tarasov et al. (1999) are a little warmer than in Western Siberia in our reconstruction, where local anomalies below -6 K are unlikely (p ≤ 0.05). The borehole-based T ANN LGM anomaly of around -4 K from Dorofeeva et al. (2002) is colder than 450 our estimate of -1.4 K for the Baikal region. However, considering the likely cooler T ANN than MTWA anomalies as well as local heterogeneity and reconstruction uncertainties, this result is still in concordance with our estimates (-4.1 K to +1.7 K, Table 2).
The complex anomaly pattern of our reconstruction in Central Siberia and along the Arctic Ocean (local anomaly ranges between -3.7 K and +6.6 K, including uncertainty estimates) is reflected in previous 455 work. In the Yana region, we reconstruct moderate but statistically significant (p ≤ 0.05) negative LGM MTWA anomalies. This is in agreement with the estimates of Pitul' Ko et al. (2007) and Ashastina et al. (2018). While Andreev et al. (2011) reports colder summer temperatures along the Arctic Ocean, Kienast et al. (2005) and Sher et al. (2005) find warmer or just slightly cooler summer temperatures in the Lena Delta. While we reconstruct negative but insignificant anomalies for the Taymyr Peninsula, 460 our estimates of similar MTWA values to present-day for the Lena delta support the latter results. It also indicates that the results of Kienast et al. (2005) might not just be the effect of studying sheltered habitats as was suggested previously (Wetterich et al., 2011).
For Eastern Siberia, reconstructed positive MTWA anomalies by Berman et al. (2011) for the Kolyma region, estimates between -4 K and +3 K by Alfimov and Berman (2001), and LGM summer temperatures 465 similar to present-day on Kamchatka Peninsula inferred by , are within the uncertainty ranges of our reconstruction. The strongly positive anomaly for Ayon Island reconstructed by Alfimov and Berman (2001) supports our finding of a warmer Siberian shelf due to the lower sea level. While we find significant negative MTWA anomalies in the more continental parts of Eastern Siberia, our reconstruction does not support the very cold estimate from Lozhkin et al. (2007).

470
For the Northwest Pacific, the inferred summer SST anomalies of around -1.5 K by Kiefer et al. (2001) and Meyer et al. (2016) are in good agreement with our estimate of -2.3 K (with local anomalies between -4.6 K and -0.1 K). Similar to MARGO Project Members (2009), our reconstruction features a northwest to southeast gradient, but our lowest anomalies stay above the -6 K in the coolest MARGO grid box. The estimate of -3 K summer SST anomaly for the Sea of Okhotsk by Seki et al. (2009) is slightly cooler 475 than our estimates. That the latter two estimates are more extreme than our results might be due to the higher spatial consistency of our reconstruction.
In general, most of the discussed summer temperature reconstructions from previous work are within the uncertainty limits of our reconstruction and several of the spatial patterns arising from the literature are reflected in our results. However, some of the coldest reported anomalies are outside of our uncertainty 480 estimates. This is perhaps due to the smoothing effect of the spatial interpolation which leads to a less heterogeneous reconstruction compared to considering only individual records.
Our consistently inferred dryer conditions during the LGM in Siberia are in agreement with previous work (e.g. Tarasov et al., 1999;Sher et al., 2005;Pitul'Ko et al., 2007;Lozhkin et al., 2007;Andreev et al., 2011;Ashastina et al., 2018). However, some of the studies like Tarasov et al. (1999) reconstruct 485 an even stronger reduction of precipitation. We only estimate precipitation similar to present-day for parts of the Pacific sector. Based on indirect glacier modeling, Meyer and Barr (2017) suggest similar or higher precipitation amounts during the LGM for the Kamchatka peninsula. However, for this part of the Pacific sector we still infer reduced precipitation. This conflict might stem from uncertainties in the glacier modeling-based precipitation estimates or an underestimation of the transport of moist pacific 490 air masses by the PMIP3 simulations, which we employ.

Potential applications and extensions
Based on our results, we find that the data-constrained state estimate is likely a better basis for analyses of large-scale features of LGM climate compared to analyzing just individual proxy records or to using only simulations. If a spatial reconstruction of the desired variable is not feasible due to missing direct 495 proxy data (e.g. for wind speed anomalies), our results suggest to use simulations, which score comparatively well against reconstructions of potentially related variables such as temperature and precipitation. Furthermore, our results indicate that a weighted mean of ensemble members is preferable compared to the standard ensemble mean, which is more likely to be disturbed by strongly biased ensemble members in ensembles with very large spreads such as the PMIP3 LGM ensemble.

500
It is well-established in numerical weather prediction and historical climatology that ensemble means tend to perform better than individual ensemble members under the assumption of independent and identically distributed (iid) ensemble members (Krishnamurti et al., 1999). However, it is unclear whether this is still true in paleoclimatology. Our results indicate the opposite perhaps due to stronger large-scale biases compared to historical simulations, which might make results based on iid assumptions invalid.

505
Meanwhile, a simple "superensemble" (Krishnamurti et al., 1999) approach, i.e. a multi-model mean that is weighted according to its fit with paleoclimate data, performs similar to the best ensemble members. We want to highlight three possible applications of our spatial reconstructions and the ensemble member evaluations. The physical mechanisms behind the absence of a large terrestrial Siberian ice sheet during the LGM are still an open question. MTWA and P ANN are important variables as they 510 are major controls of the energy balance of glaciers (Reeh, 1989). While very low P ANN values are the prevalent explanation for missing large ice sheets during the LGM in Western and Central Siberia (Stauch and Gualtieri, 2008), relatively warm summer temperatures were proposed as limiting factors for the extent of glaciers in the pacific sector (Meyer and Barr, 2017). Forcing ice sheet models with our data-constrained estimates or with weighted means of simulation ensembles might result in new insights 515 into the mechanisms behind the glacial history of Siberia.
A large amount of carbon is stored in Siberian soils (Hugelius et al., 2014). To constrain estimates of carbon fluxes under global warming, biogeochemistry models have to be tested against past glacialinterglacial transitions. This requires precise atmospheric forcing. Current estimates of the net carbon balance between the LGM and today are diverging, with Zech et al. Reconciling data-and model-based estimates could be feasible by reducing biases from using single climate simulations as atmospheric forcings and by instead employing data-constrained state estimates.
Siberia has been identified as a potential area for large-scale carbon storage through afforestation 525 (Bastin et al., 2019). To improve model-based estimates of afforestation rates our spatial reconstruction can be used to force vegetation models as a test of their ability to model the afforestation between the LGM and today. The uncertainty estimates of the reconstruction (see Sect. 4.2) and the cross-validation experiments (see Online Resource 1) indicate where new data can clearly improve existing state estimates. The 530 large MTWA uncertainties in Western Siberia between Ob and Lena due to missing proxy constraints and a large PMIP3 ensemble spread indicate that new data in these regions could sharpen our MTWA reconstructions the most. The cross-validation experiments show distinct negative skill at the location of the westernmost proxy sample. More data in this region would help to inspect whether this sample is an outlier or whether model biases need to be overcome in this region. A key issue remains that 535 the uncertainty for P ANN in the pollen-based reconstructions is very high, hence reducing its impact on the state estimate. Here, adding constraints from other aridity-recording proxies like ice wedges, macrofossils, or faunal remains would improve the estimates the most.
Since these archives are also interpreted as proxies for winter temperature, MTCO could be added as third climate variable to our state estimates if a sufficient spatial proxy coverage is obtained. This would 540 additionally allow to better constraint T ANN which is not possible from the MTWA reconstruction alone.

Conclusions
We present new LGM climate reconstructions from an extensive compilation of Siberian pollen records. WAPLS transfer functions are used to compute local reconstruction of MTWA and P ANN for large parts of Siberia. The reconstructions feature mostly colder temperatures and less precipitation. However, 545 several records along the modern coastlines show temperatures near modern values and in the Amur region slightly more precipitation than present-day is reconstructed. While reconstructions are predominantly consistent on local-to-regional scales, the anomalies from individual records are rarely statistically significant.
We combine the local reconstructions with an ensemble of model simulations for the LGM and compute 550 spatial reconstructions of MTWA and P ANN . The reconstruction framework performs well in crossvalidation and pseudo-proxy experiments. It provides a data-constrained state estimate which adjusts the ensemble mean according to the proxy data and reduces the uncertainty in the unconstrained PMIP3 ensemble. While our estimates are cooler than the modern climate with less precipitation in all areas except for some coastal regions and the Siberian shelf, the spatial reconstructions are warmer and wetter 555 than the PMIP3 ensemble mean. By computing the LLS of each ensemble member, we quantify the fit of proxy data and climate simulations. The comparison reveals that simulations with a relatively warm and wet LGM in Siberia perform mostly better, although mismatching medium-to-large scale patterns can diminish the performance. The higher scoring models tend to have a smaller Northern Hemisphere polar amplification, a less enhanced 560 polar jet stream, and just a slighty negative geopotential anomaly compared to the ensemble mean. Our results indicate that while the Siberian atmospheric state was altered during the LGM in response to the large Fennoscandian ice sheet, lower CO 2 levels, lower sea levels, and varied insolation patterns, the changes during the Siberian summer were less extreme than in several PMIP3 simulations.

565
Nils Weitzel, Ulrike Herzschuh, Andreas Hense, and Kira Rehfeld designed the study. Xianyong Cao and Thomas Böhmer computed the local reconstructions. Nils Weitzel implemented the spatial reconstruction and model evaluation framework. All authors discussed the results. Nils Weitzel wrote the first version of the manuscript and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.