LARGE MODEL PARAMETER AND STRUCTURAL UNCERTAINTIES IN GLOBAL PROJECTIONS OF URBAN HEAT WAVES

Urban heat waves (UHWs) are strongly associated with socioeconomic impacts. Reliable projections of these extremes are pressingly needed for local actions in the context of extreme event preparedness and mitigation. Such information, however, is not available because current multi-model projections largely lack a representation of urban areas. Here, we use a newly-developed urban climate emulator framework in combination with global climate simulations to show that, at the urban scale a large proportion of the uncertainty results from choices of model parameter and structural design in projecting UHWs in the next several decades under climate change. Omission of the model parameter and structural uncertainty would considerably underestimate the risk of UHWs. Results show that, for cities in the four high-stake regions, the Great Lakes region of North America, Southern Europe, Central India, and North China, a virtually unlikely (0.1% probability) UHW event is estimated by our model with probabilities of 13.91%, 5.49%, 2.78%, and 13.39% respectively in 2061–2070 under a high-emission scenario. Our findings highlight that for urban-scale extremes, decision-makers and stakeholders have to account for the multi-model uncertainties if decisions are informed based on climate simulations. A NON-PEER REVIEWED MANUSCRIPT IN REVIEW IN X


Main
Heat waves (HWs) are among the most damaging climate extremes to human [1,2,3,4,5] and natural systems [6,7,8,9] globally. Absent effective adaptation, extreme heat stress associated with climate change would cause a substantial increase in human mortality and morbidity [1,10,2], energy demand [11,12] and civil conflicts [13,14], and a large reduction in agricultural yield [15,16,17,18], livestock production [19], and workplace productivity [20]. In recent decades, HWs have been recognized as the deadliest environmental extreme in the United States (U.S.) [21,22]. These risks are further compounded in urban areas by the unique urban climates combined with concentrated population and assets [23]. Climate models agree on the projection of increasing severity, frequency, and duration of HWs at regional to global scales [24,25,26,27] or from an aggregated spatial probability perspective [28] over this century under rising greenhouse gas emissions. These projections, however, are incapable of representing the HW signals for cities because: 1) the state-of-the-art Earth system models (ESMs) that participate in the Coupled Model Intercomparison Project (CMIP) [29,30] almost universally lack urban representation; and 2) the complex synergistic nature between urban heat islands and HWs [31,23] precludes the urban HW signal to be reduced to a simple anomaly on top of the traditional regional background climate projections by ESMs. Cities, as exposure hotspots for humans and infrastructure and fundamental foci of sustainable development and climate adaptation, need local-scale climate extreme projections that are specific to urban areas. Moreover, a multi-model urban framework is essential for managing the risks associated with climate extremes, because urban planning and decision-making for resilience to extremes rely primarily on probabilistic estimates [32,33] which can only be obtained by multi-model projections [34,35,36].
A robust modeling framework to address uncertainty in local-or regional-scale climate change should include the roles of internal variability (natural variability of the climate system resulting from non-linear dynamical processes intrinsic to the atmosphere), model parameter and structural uncertainty (uncertainty from choices in parameter values, representation of unresolved physics and model design, and their effects on climate sensitivity), and scenario uncertainty (uncertainty in prescribing future scenarios) [37]. Quantitative attribution of the structure of these uncertainties is particularly critical for projection of climate extremes, because uncertainties associated with extreme projections are larger than with average-state projections in climate models. The uncertainty analysis has been done for non-urban surfaces at regional scales using multi-modeled grid cell means [28,38]. For local-scale urban climate extremes, the role of internal variability can be addressed by rerunning the simulations a large number of times using a single climate model with small atmospheric initial condition perturbations [39]. However, quantification of the model parameter and structural uncertainty has never been achieved [39] because of the aforementioned near-universal lack of urban representation in ESMs. Nor could existing downscaling techniques (both dynamic and statistical) from a small number of global climate models or observational-based methods account for the full uncertainties associated with the extreme projections in cities. This has been a critical research gap, as the model parameter and structural uncertainty is expected to be the dominant uncertainty sources at time horizons of multiple decades or longer [40].
Here, we use a newly-developed urban climate emulator framework [41] to assess the inter-model variability in projections of local urban heat waves (UHWs) for global urban areas in a future high-emission scenario, and to quantify the relative contributions of uncertainties from internal variability and model parameter and structural variability associated with the projections. This emulator framework combines process-based Earth system modeling and the data-driven machine learning approach. The urban model in the U.S. National Center for Atmospheric Research's Community Earth System Model (CESM) [42] has been evaluated against in situ and remote sensing observations over cities across the globe in previous studies [43,44,45,46,47,48,49]. The emulator framework was further evaluated against both PRISM (Parameter-elevation Relationships on Independent Slopes Model) observation-based climate data (http://www.prism.oregonstate.edu/) and the mesoscale dynamic downscaling results from ref [50] over selected cities in the U.S.. The urban climate emulator is based on daily output of fully-coupled simulations using the CESM. The emulator is then applied to 17 ESMs that participated in the CMIP5 to generate global multi-model projections of local urban daily temperature under the Representative Concentration Pathway (RCP) 8.5 scenarios (see Methods). We use a definition that has been shown to be related to human mortality risk [1,5] to calculate UHWs for present day (defined as 2006-2015) and future projected climate (2061-2070). Note that the "present-day" climate here is technically a projection of "present-day" conditions because we are using RCP 8.5 scenario only, rather than historical simulations. This is a reasonable assumption since the climate has largely followed the RCP 8.5 path in reality (instead of other paths such as RCP 4.5 and RCP 2.6). The internal variability is assessed based on the CESM Large Ensemble (CESM-LE) [51] simulations, whereas the model parameter and structural uncertainty (lumped together) is characterized based on the emulated multi-model projections. In this study, we did not separate out the uncertainty due to model parameters and model structure.
We show that traditional projections from ESMs substantially underestimate the risks of UHWs in almost every aspect including intensity (average daily maximal temperature during the UHWs), frequency (average number of UHW events per year), duration (average number of days per UHW event) and total days (duration multiplied by frequency in days per year) (Fig. 1). These positive anomalies relative to the background non-urban signals are not spatially uniform, confirming that the local-scale UHW projections cannot be simply reproduced by the traditional CMIP multi-model HW projections. Here, the "background non-urban" HW signals are based on the gridcell mean temperatures (i.e. the same as traditional climate projections), whereas the "urban" signals are based on the urban subgrid temperature. According to the multi-model ensemble mean results, the increase in UHW intensity by 2061-2070 is underestimated by 1-2.1 K for 8.3% of the global urban areas compared to the traditional projections (Fig. 1a). These numbers are significantly larger than the average urban-minus-background warming difference under non-extreme conditions (-0.6 to 0.6 K) [41]. The regions of large anomaly of change in UHW intensity generally collocates with those in frequency, duration and total days (Fig. 1), indicating further compounded underestimation of UHW-induced risk. In particular, eastern U.S. and India are noteworthy as they are hotspots with large anomaly in all four aspects. Given that massive urbanization is projected to happen in India in the next few decades [52], the underestimation of UHWs using traditional climate projections would put their city dwellers and infrastructure at a large risk.  Our multi-model results demonstrate the inter-model robustness of increasing severity of UHWs over certain regions under climate change (Fig. 2). CMIP models generally agree better in projections of UHW intensity and frequency compared to duration and total days (Fig. 3), as is also indicated by greater spatial extent in the stippling shown in Fig. 2a This indicates that the model parameter and structural uncertainty have larger impacts on the distribution of daily temperature than on the magnitude. Models project similar magnitude as well as the frequency of temperature extremes, but do not necessarily agree on which day an extreme temperature occurs. Note that the tropical (near-equator) region (15 • S ∼ 15 • N) is projected to have the most substantial increases in UHW frequency, duration, and total days by 2070. This is because minimal seasonal variations in this region means that UHWs can occur at any time of the year (see Methods). According to the multi-model results, four "hot spot" regions are noteworthy: the Great Lakes region of North America, Southern Europe, Central India, and North China (Fig. 4). Models agree, with high inter-model confidence, on the projection of substantial increase in UHW intensity and frequency for cities in these regions under RCP 8.5 scenario ( Fig. 2a and b), indicating large exposure of these cities to high heat-extreme risks. Specifically, UHW intensity for cities in these regions is projected to increase by 2.0, 2.0, 1.5 and 2.0 K on average for the Great Lakes region of North America, Southern Europe, Central India, and North China respectively.
Previous research has demonstrated a dominant role of internal variability in projections of hot extremes on the local and regional scales over the next few decades [28]. We show here that for urban areas, there is a large part of the uncertainties that result from model parameter and structural uncertainty. We define a structural uncertainty fraction (SUF) as the model parameter and structural uncertainty divided by the sum of internal variability plus model parameter and structural uncertainty to measure the contribution of interval variability. Based on the CESM-LE simulations and our emulated multi-model results, we find that the model parameter and structural uncertainty contributes more than 50% of the total variability by 2061-2070 for most of the global urban areas (Fig. 5). The SUF with respect to projections in UHW frequency, duration and total days ( Fig. 5b-d) are even larger than that in UHW intensity (Fig. 5a).
These results indicate that for decision-making in the context of urban heat extremes based on climate modeled results, policymakers or local practitioners will need to consider the large model parameter and structural uncertainty on the local scale. It is particularly important to account for this uncertainty where projections of future UHW frequency, duration and total days are concerned ( Fig. 5b-d).
Increasing number of record-breaking heat extremes has been observed recently over many cities globally [53,54,55,56,57], which raises the question of whether the probability of heat extremes on local scales is largely underestimated by ESMs in a changing climate. We argue that this underestimation is likely for cities because of the lack of sampling a sufficiently wide uncertainty range. To illustrate the effect of accounting for model parameter and structural uncertainty on UHW projections, we calculate the probability density function (PDF) of the 10-year mean increase in UHW intensity (2061-2070 relative to 2006-2015) based on the emulated multi-model and CESM-LE urban results (see Methods). Results demonstrate considerable potential underestimation of heat extreme risks in cities if not accounting for model structural spread (Fig. 6), as shown in illustrative examples using the aforementioned four hotspot regions (Fig. 4). The PDFs generated from multi-model urban projections cover much larger spectra than from the CESM multi-member projections which sample the internal variability only. This indicates that the probability of extreme increase in UHW intensity (higher tail in the PDF) would be significantly underestimated from single-model ensembles.
As has been largely recognized, one of the major benefits of multi-model climate projections is the capability of capturing and quantifying all aspects of model uncertainties [58]. The above discussion demonstrate that the emulated multi-model results are unique in being able to account for the uncertainty associated with the model structural spread, because of the near-universal lack of an urban representation in CMIP5 ESMs [41].
The underestimation of heat extreme risks shown above can be understood in the context of a "grey swan" event [59]. Unlike black swans, grey swans indicate high-impact events that are not completely unanticipated. Here we define "urban heat grey swans" as the heat extremes that cities will face in the future and that are virtually unpredictable based on historical observations or existing models but can be better foreseen and prepared for based on our model. We quantify the likelihood/probability associated with some urban heat grey swans to illustrate how plausible these events are in a changing climate. According to our results, a seemingly unlikely urban heat grey swan event with 0.1% probability estimated by the CESM ensemble mean PDF is actually predicted, by the CMIP ensemble mean PDF, at the probabilities of 13.91%, 5.49%, 2.78% and 13.39% for cities in the Great Lakes region of North America, Southern Europe, Central India, and North China, respectively. These results mean that the once in 10 000 years urban heat grey swan event estimated previously using a single model are likely 7-, 18-, 36-and 7-year events. This underestimation is even more concerning for cities in Central India (Fig. 6c) and North China (Fig. 6d), because largest urban population growth is predominantly projected to concentrate in South and South East Asia and Africa by 2100 [52].
Urban climate extremes often have very consequential socioeconomic implications, especially in future warmer climates. Plausible record-breaking heat extremes may not be predictable by extrapolating the sparse historical extremes in short urban historical records. Robust assessments of the risks associated with UHWs under climate change have been limited by the near-universal lack of urban representation in the state-of-the-art ESMs. This limitation is overcome in our machine learning-enabled multi-model projections anchored in process-based simulations. We demonstrate considerable contribution of model parameter and structural uncertainty in projecting the local-scale urban climate extremes. We note here that the entire emulator framework in this study is based on a single urban model, CLMU, simply because of the lack of other ESMs in CMIP5 that have an urban representation. It is important to include urban parameterizations in future development of various ESMs to more robustly assess the model parameter and structural

CESM and Large Ensemble
The Community Earth System Model (CESM) [42], hosted at the National Center for Atmospheric Research (NCAR), is a fully coupled Earth system model that provides state-of-the-art dynamic simulations of the Earth's climate states. It consists of seven components including Atmosphere, Sea-ice, Land, River, Ocean, Land-ice, and a Coupler that exchanges fluxes between components. The land-atmosphere interactions are represented in its land component -the Community Land Model (CLM) [60]. As a sub-model embedded in CLM, the urban land scheme (CLMU) provides physically-based urban representation and parameterization. More details on the physics and hydrology represented in CLMU can be found in ref [61,48]. A global urban surface dataset prescribing the thermal (e.g., heat capacity and thermal conductivity), radiative (e.g., albedo and emissivity), and morphological (e.g., building height to street width ratio, roof areal fraction, average building height, and pervious ground fraction) characteristics of urban facets for each urban grid cell is provided by ref [62]. The urban spatial extent is derived from a population density dataset at 1 km spatial resolution. The urban property data is compiled by synthesizing a variety of datasets including satellite products, a global database of tall buildings, local building codes data and other municipal documentation, and validated against imagery from Google Earth. With this high-dimensional urban surface dataset and the prognostic forcing fields produced from the Atmosphere component, the CLMU outputs state and flux variables for urban land units in each urban grid cell, laying the groundwork for assessing the urban climate on a large scale.
We utilized the CESM Large Ensemble (CESM-LE) [51] simulations to constitute the training data and to quantify the role of internal variability in urban heat wave projections. CESM-LE dataset includes 40 ensemble members of climate simulations conducted at the resolution of 0.9 • latitude × 1.25 • longitude. These ensemble members were run with slightly different atmospheric initial condition but identical model configuration and climate scenarios. In this study, we utilized 32 members of the CESM-LE ensemble simulations under the Representative Concentration Pathways (RCP) 8.5 scenario.

Emulator
We develop an urban daily temperature emulator, using the framework described in ref [41], that maps atmospheric forcing meteorology to subgrid urban responses. Detailed information on this urban climate emulator framework can be found in ref [41]. The previous emulator function used in the framework was based on multiple linear regression.
Here because this study is on a daily scale, we use a higher-complexity non-linear model to fit the emulator function in order to better capture the daily variations. The urban daily temperature emulator is a location-dependent (grid cell) non-parametric model based on atmospheric forcings and time. It incorporates the daily atmospheric forcing fields from the atmospheric component of the ESM and the month-of-year indicator as the emulator inputs, and then outputs the daily maximum temperature. Specifically, the daily maximum temperature for a particular grid cell that has urban land unit T lat,lon is emulated as T lat,lon = f lat,lon (AF, m) where AF is the vector of atmospheric forcings from the atmospheric component of the ESM; m is a 12-dimension vector of the month-of-year indicator; lat and lon are the latitude and longitude specifying the geospatial information of the grid cell; f lat,lon denotes the location-dependent emulator function. The AF matrix contains all the atmospheric forcing fields that drive CLM in the CESM including net shortwave radiation, net longwave radiation and precipitation (liquid and solid) at the surface, and atmospheric temperature, pressure, specific humidity, and wind speed (zonal and meridional) at the forcing height. These forcing variables are consistently available in other CMIP5 ESMs. The emulator function (f lat,lon ) is a unique nonlinear model for each grid cell that has an urban land unit. Therefore, the impacts of urban surface form on its specific response to forcing meteorology are implicitly embedded in these location-specific functions (f lat,lon ). We employ the XGBoost [63] (a scalable tree boosting method) to fit the emulator function f lat,lon for 2006-2015 and 2061-2070. Therefore, the fitted functions f lat,lon are a set of decision trees which map daily atmospheric forcing fields to urban temperatures at given grid cells. One of the salient features of the XGBoost is that tree-based approaches are inherently able to capture the interactions among predicting variables. Therefore, the nonlinear interactive effects between forcing variables and their seasonality are represented in the XGBoost models.

Training Data
We utilize the fully-coupled CESM-LE simulations as the training sets to build the urban daily temperature emulator. We randomly sample 10% of simulation outputs from each member of the CESM-LE simulations within each grid cell that has an urban land unit. Each member then contributes to the training dataset with the same weight. With this strategy, we ensure a sufficiently large size of training data ( > 3 times of using only one CESM-LE member data).

Validation
The whole framework including the urban modeling in CESM and the emulator itself has been thoroughly evaluated in ref [41]. Importantly, the urban simulation by CLMU was evaluated against both the gridded ground-based observation dataset PRISM (http://www.prism.oregonstate.edu/) and the meso-scale dynamic modeling data over the contiguous U.S. [50]. Results demonstrated the credibility and robustness of our emulator method. In addition, the CLMU has also been widely evaluated against both in situ and remote sensing observations over sites across the globe in previous studies [43,44,45,46,47,48,49]. Here, we further evaluated the statistical robustness of the urban daily temperature emulator by cross-member validation using the data of 32 CESM-LE members under RCP 8.5 that were excluded from the training of the emulator. Results show that our emulator accurately predict daily urban temperatures. The global average root-mean-square-error (RMSE) of urban daily maximum temperature across the tested ensemble members (Fig. 7)

Multi-model projections
The CESM-modeled atmospheric forcing fields that were used to train the emulator can be consistently extracted from other ESMs in CMIP5. We applied the emulator to all other available ESMs in CMIP5 to produce multi-model global urban daily maximum temperature projections over 2006and 2061-2070. We included all ESMs that have their daily archive of outputs available, but excluded results available from those same models at lower spatial resolution. In the end, a total of 17 ESMs were selected under RCP8.5. (Table 1). The first ensemble member simulation of each selected ESM was used. Direct application of the emulator requires the grids of other ESMs align exactly with the CESM grid, because our emulator function, as described above, is location-dependent. Therefore, we regridded the needed atmospheric forcing fields of all 17 ESMs from their native grids to the CESM grid using a Python package "xESMF" [64] with the "patch" interpolation method [65,66]. These regridded atmospheric forcings accompanied by the encoded month-of-year indicator were fed as inputs to the emulator to generate 17 global urban projections of daily maximum temperature. The final multi-model analyses were then based on these 17 emulated projections plus the original CESM-LE simulations.

Heat wave definition and calculation
There are multiple ways to define a heat wave (HW) event [67,68,69], each of which has its own merits and disadvantages for different communities [70]. Here we use a "relative" definition of HWs that has been shown to be closely related to human mortality risk [1]. Within each grid cell, we define an HW event as three or more consecutive days with daily maximum temperature (T max ) higher than the 98th percentile of T max for 2006-2015 for the same grid cell. Using this definition, we calculated urban HW (urban subgrid) and background HW (whole grid cell) for each of the grid cells that has an urban land unit for present day (2006-2015) and a future climate (2061-2070) under RCP 8.5. What should be noted is that we technically use a projection of "present day" based on the RCP 8.5 scenario,

Inter-model robustness
We used the "signal-to-noise" ratio (SN R) as a measure of inter-model robustness to illustrate how different ESMs agree on the projections of urban HWs. The SN R is calculated as the ratio of the multi-model ensemble mean to the inter-models variability as SN R lat,lon = µ lat,lon σ lat,lon where µ denotes the multi-model ensemble means (i.e. means of HW intensity, frequsency, duration, and total days); σ denotes the inter-model variability calculated as the standard deviation of multi-model values; and "lat, lon" denotes each grid cell that has an urban land unit in the CLM. The reciprocal of SN R is the multi-model variance normalized by the mean, therefore it can also indicate multi-model spread. A smaller SN R indicates higher model spread or smaller signal change such as multi-model mean change of HW intensity, frequency, duration and total days. We quantified the SN R for each grid cell that has an urban land unit to assess the inter-model agreement. We use SN R > 2.0 to indicate "high" inter-model agreement in the multi-model urban HW projections. Note that this threshold is somewhat subjective and for illustration purpose only. Different thresholds can be chosen for different applications of urban extreme events research to tighten or relax the tolerance in model disagreement.

Internal variability and model structural uncertainty
The discrepancy between multi-member simulations and between multi-model projections are associated with internal variability and model parameter and structural variability respectively. The internal variability refers to the natural variability of the climate system, which comes from non-linear dynamical processes intrinsic to the Earth-Atmosphere system. The role of this unpredictable internal variability is usually estimated by running the same ESM a number of times with slightly perturbed initial conditions but otherwise identical model configuration. Here we use the CESM-LE simulations to quantify the internal variability associated with the urban HW projections.
On the other hand, the model parameter and structural variability refers to the uncertainty introduced by the choices of parameter values and model design and their effects on climate sensitivity, which can hardly be captured by changing parameters within a single model, no matter how wide the range of initial conditions or parameters are chosen. The underlying reason is that it is fundamentally impossible to describe all the climatology processes within a single ESM, no matter how complex the model itself. Each ESM is suitable for specific applications due to its merit and shortcoming. The multi-model ensembles try to tackle this issue by combining a set of model simulations from structurally independent models. It usually exhibits the better skill, higher reliability, and consistency of the model forecast [71,72]. Here we use our emulated multi-model urban temperatures to compute the multi-model ensemble mean urban HW projection and to estimate the model parameter and structural uncertainty associated with the projection. We compare the estimated internal variability and the model parameter and structural uncertainty to demonstrate their relative contributions. The uncertainty due to model parameters and model structure are not separated out in this study.
Note that the model parameter and structural uncertainty assessed in this study is ultimately those associated with larger-scale model structural design and parameters (such as radiative transfer, cloud microphysics, biogeochemical cycles and ocean processes etc.) rather than those associated with urban model, because the urban emulator is based on a single urban model. It is advantageous to develop emulators from multiple urban models to better assess the uncertainty associated with urban models. Nevertheless, the magnitude of the model parameter and structural uncertainty estimated in this study should be reasonable, because urban parameterizations, unlike ESMs, do not have many internal dynamics, and as such cannot drift far from each other if forced by identical atmospheric forcing [73,74]. Therefore, the major variability is from diverse large-scale atmospheric forcings that are produced by different ESMs. The multi-model structural spread presented here is likely an upper bound encompassing the potential variability resulting from different urban parameterizations.

PDFs of increase in UHW intensity
We aggregated the changes of UHW intensity between 2061-2070 and 2006-2015 for the four hotspot regions (i.e. the Great Lakes region of North America, Southern Europe, Central India, and North China), and estimated the probability density function (PDF) using a non-parametric method (Gaussian kernels) for both CESM-LE multi-member simulations and our emulated CMIP5 multi-model results. The Python package "SciPy" [75] was used to determine the kernel density for the PDF estimation.

Code and Data Availability
Scripts and instruction to develop and apply the urban climate emulator, and analyze the urban heat waves are available at doi:10.5281/zenodo.3872520 or https://github.com/zzheng93/code_uhws.

Acknowledgements
We would like to acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. We thank AWS for providing AWS Cloud Credits for Research.