UNSEEN trends: Detecting decadal changes in 100-year precipitation extremes

Geography and Environment, Loughborough University, Loughborough, UK
The Norwegian Meteorological Institute, Oslo, Norway
Section for Meteorology and Oceanography, Department of Geosciences, University of Oslo, Norway
School of Geography and the Environment, University of Oxford, Oxford, UK
School of Architecture, Building and Civil Engineering, Loughborough, UK
European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
Centre for Ecology and Hydrology, Wallingford, UK

explaining the most severe events possible in the current climate, such as heatwaves in China [20]. However, validating the UNSEEN method is a well-recognised difficulty in existing studies, and UNSEEN has not yet been used to facilitate detection of non-stationarity in climate extremes over short periods of several decades.

Here, we provide a framework to systematically evaluate the robustness of the UNSEEN approach and we present a novel UNSEEN-trends approach, where we aim to provide confident short-term trend estimates by using the larger event sample to better constrain changes in climate extremes. We do this in a storyline context [23], where we take observed flood episodes as a starting point for our analysis. We select the west coast of Norway and the Svalbard Archipelago as study regions: two contrasting areas in terms of precipitation extremes. Western Norway faces the highest extremes within Europe [24] and has a dense station network [25,26], whereas Svalbard is a semi-desert with only a few observation stations [27]. Both regions have faced severe damage from recent extreme events, such as the September 2005 [28] and October 2014 [29] floods over Western Norway and the extreme precipitation event that induced slush avalanches over the Svalbard Archipelago in 2012 [30]. These extreme events were driven by atmospheric rivers (ARs) [27-29], which cause heavy precipitation over a prolonged period. As AR-related floods predominantly occur in autumn and frequently strengthen over a period of several days [28,29], we select autumn (September to November) spatially averaged (Supplementary Fig. 1) three-day extreme precipitation (SON-3DP) as target events.

Previous UNSEEN studies have used the Hadley Centre global climate model [20-22], the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble prediction systems [15-18] and earlier versions of the seasonal prediction system [12,13,19].
Here, we are the first to use the latest ECMWF seasonal prediction system SEAS5 [31], chosen for its high resolution, large ensemble, long homogeneous hindcast period and open access. The ECMWF atmospheric model has shown skill in simulating atmospheric rivers for Northern Europe [32], giving confidence in the realism of these extreme events in SEAS5 and making it a good candidate for the UNSEEN method. We use the 25 ensemble members across lead times of 2-5 months, resulting in a sample of 100 members (called the UNSEEN ensemble), and evaluate the independence and stability of the pooled sample for SON-3DP events across Western Norway and Svalbard. We then use the UNSEEN-trends approach to identify unprecedented extreme precipitation events and to detect trends in 100-year precipitation events over the last 35 years. These findings will help in understanding the robustness of current design levels and may improve our understanding of physical processes driving climate extremes and their non-stationarity.

Ensemble member independence and model stability

The independence of ensemble members is an important requirement for the UNSEEN approach, as dependent members would artificially inflate the sample size without adding new information. Previous studies have assessed the independence of ensemble members for lead times of 9-10 days [15,16,18], but to the best of our knowledge, no independence test has yet been performed in UNSEEN studies of seasonal prediction systems.

For the regions studied here, the ensemble members from lead times beyond one month are not dependent on atmospheric initial conditions, because the synoptic patterns related to ARs are known not to be predictable beyond two weeks [32,33]. However, predictability on a seasonal timescale may be found through slowly varying components of the ocean-atmosphere system.
Therefore, while the ensemble members might represent unique weather events because they are independent of the atmospheric initial conditions, the weather events could have a conditional bias induced by favourable conditions in the slowly varying components of the ocean-atmosphere system.

To test the seasonal dependence of SON-3DP, we first select the seasonal maximum event for each forecast and then concatenate these events to create a 35-year timeseries (Fig. 1a,b,c). To robustly assess the independence between the ensemble members, we calculate the Spearman rank correlation coefficient (ρ) for every distinct pair of ensemble members (Fig. 1d), resulting in 300 ρ values for each lead time. The value of ρ ranges from ca. −0.6 to 0.6, and the median correlation is close to zero for all lead times for both Western Norway and Svalbard (Fig. 1e,f). The range in ρ values is expected due to the large number of correlation tests, and none of the lead times fall outside the range that would be expected for uncorrelated data for the west coast of Norway (Fig. 1e). For Svalbard, slightly higher ρ values are found, with the median correlation still within the expected range, but the interquartile range just exceeding the upper boundary of the confidence intervals for the first two lead times (Fig. 1f). The small correlations found for Svalbard might be driven by the trend that we detect for this region (UNSEEN-trends section), and thus the UNSEEN ensemble members represent unique events that follow the slowly evolving climate signal, as desired.

A second potential issue for generating the UNSEEN ensemble could be a drift in the simulated climatology [34,35], which may alter precipitation extremes over longer lead times. Therefore, model stability is a requirement for pooling lead times. Model stability is assessed by comparing the distribution of predicted SON-3DP events across different lead times.
For both regions, the probability density functions of the pooled SON-3DP events for the considered lead times are remarkably similar (Fig. 2a,b). Moreover, the empirical extreme value distributions of the individual lead times fall within the uncertainty range of the distribution of all lead times pooled together and thus, the model can be considered stable over lead times (Fig. 2c,d).
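The pairwise independence check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Spearman coefficient is computed by hand as the Pearson correlation of ranks, and the 25 member series are synthetic Gaussian noise standing in for the 35-year SON-3DP series of one lead time.

```python
import itertools
import random
import statistics


def ranks(values):
    """Return 1-based ranks of the values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_spearman(members):
    """Correlation for every distinct pair of ensemble members."""
    return [spearman(a, b) for a, b in itertools.combinations(members, 2)]


# Synthetic stand-in data (assumption): 25 members, 35 seasonal maxima each.
rng = random.Random(0)
members = [[rng.gauss(0.0, 1.0) for _ in range(35)] for _ in range(25)]
rhos = pairwise_spearman(members)

# Boxplot-style summary for one lead time.
quartiles = statistics.quantiles(rhos, n=4)
print(len(rhos), "pairs; median rho:", round(quartiles[1], 3))
```

With 25 members there are 25 × 24 / 2 = 300 distinct pairs, which matches the number of ρ values per lead time quoted in the text.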

Fidelity of UNSEEN extremes for Western Norway
Confidence in simulated 'unprecedented extremes' in large ensembles is complicated by the inability to validate extremes, given the limited sample sizes of observations. Here, we evaluate the UNSEEN ensemble with 1) rank histograms, commonly applied in ensemble forecast verification [36] [...] (Supplementary Fig. 2). This is confirmed by the bootstrapping test, which shows that the observed mean and standard deviation fall outside the 95% confidence intervals of the UNSEEN ensemble (Supplementary Fig. 3). The UNSEEN SON-3DP anomalies and standardized anomalies do show rank uniformity, and thus are suggested to be reliable (Supplementary Fig. 2 [...] Fig. 2), the ratio between the mean observed extremes and the mean simulated extremes (1.74) is applied as a constant bias correction to generate the bias-corrected UNSEEN ensemble (henceforth referred to as UNSEEN-BC). Note that we found little sensitivity to using the

UNSEEN-trends in 100-year precipitation over last 35 years
Climate models can be used to detect changes [38-41] or to attribute extreme events to human causes [42], but are less suited to detecting trends over the recent past such as the last 35 years. By design, climate model simulations are initialized once at the beginning of a centennial run. In contrast, here we use seasonal forecasts that are initialized every month, and thus are more constrained by real-world climate variability than climate model simulations. Consequently, seasonal forecasts sample a smaller range of climate conditions but are closer to reality than climate model simulations. This means that their use is consistent with analysing trends over the recent past described by the available forecast period (for SEAS5, currently 35 years). Furthermore, the model setup and version are the same for the entire hindcast simulation, ensuring that, with respect to the models and initialization, SEAS5 is a homogeneous dataset and thus suitable for climate analysis and detection of UNSEEN-trends.

With a 36 km resolution and 25 members, the ECMWF SEAS5 reforecast set used here combines high resolution with a large ensemble compared to current high-resolution global climate models [43]. SEAS5 greenhouse gas radiative forcing captures the long-term trends in emissions [31], and we show that the global mean temperature trend in SEAS5 follows ERA5 [44] (Supplementary Fig. 6). Whilst regionally we find a cold bias over the Norwegian study domain, the trend is consistent with ERA5 for both Western Norway and Svalbard (Supplementary Fig. 6), confirming the capacity of SEAS5 to detect recent trends.

To illustrate the added value of UNSEEN-trends, we extend the GEV distribution to include a time covariate and fit this distribution to the observed and UNSEEN SON-3DP (see Methods).
Using the observations, we find an increase in 100-year SON-3DP of 4% over 1981-2015 in Western Norway, but with large uncertainties ranging from −27% to 34% (Fig. 4a,b). The UNSEEN-trend estimate of 2% is more constrained due to the larger sample size, with confidence intervals ranging from −3% to 7%. A negative trend is thus statistically possible, indicating that the trend over Western Norway is not significant. For Svalbard, we find a significant positive UNSEEN-trend of 8%, with uncertainty bounds ranging from 4% to 12%.

In addition to the trend in 100-year SON-3DP events, we illustrate the change in all return values by plotting the GEV distribution with the covariates 1981 and 2015 (Fig. 4c,d). The likelihood ratio test shows that the GEV distribution including a time covariate improves the model fit for Svalbard (p-value = 2.7e-07). We find that the event that was a 100-year event in 1981 has an expected return period of 41 years in 2015 (Fig. 4c,d). For Western Norway, the GEV distribution including a time covariate does not improve the model fit for either the observed (p-value = 0.58) or the UNSEEN ensemble (p-value = 0.65), and thus the stationary GEV distribution, as presented in Fig. 3, is most appropriate.
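As an illustration of the likelihood ratio test mentioned above, the sketch below compares a stationary GEV fit with a fit that adds a time covariate. It assumes that the non-stationary model adds exactly two parameters (one slope for the location and one for the log-scale, as described in the Methods), so the test statistic is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exp(−D/2). The function name and the log-likelihood values are hypothetical; they are not results from this study.

```python
import math


def lr_test_two_extra_params(loglik_stationary, loglik_trend):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns the deviance D = 2 * (l_trend - l_stationary) and its p-value
    under a chi-square distribution with 2 degrees of freedom, for which
    the survival function has the closed form exp(-D / 2).
    """
    # For nested maximum likelihood fits the deviance cannot be negative;
    # clamp small negative values caused by optimizer tolerance.
    deviance = max(0.0, 2.0 * (loglik_trend - loglik_stationary))
    p_value = math.exp(-deviance / 2.0)
    return deviance, p_value


# Hypothetical maximized log-likelihoods (assumption, for illustration only).
deviance, p_value = lr_test_two_extra_params(-1250.0, -1247.0)
print("deviance:", deviance, "p-value:", p_value)
```

A small p-value indicates that the time covariate improves the fit; a large one, as reported for Western Norway, favours the stationary model.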

Discussion and Conclusion
In this study, we test the robustness of the UNSEEN approach and we use the large sample to constrain short-term UNSEEN-trends in high-impact precipitation events for Western Norway and Svalbard. We show that with SEAS5, the effective sample size of autumn 3-day precipitation (SON-3DP) events in Western Norway and Svalbard can be increased by a factor of 100 compared to observations, because ensemble members are independent and the model is stable over lead times. Validating UNSEEN events and trends is a complex task, but our approach reproduces observed extremes well after bias correction for Western Norway, a region with extensive records [26].

The insights presented in this study are specific to Western Norway and Svalbard SON-3DP, but the independence, model stability and model fidelity tests applied to the UNSEEN approach could be transferred to other regions, temporal resolutions and spatial extents of the events, seasons and climate variables. Global validation of the UNSEEN ensemble will highlight in which regions the approach may enhance the robustness of design level estimation, with a potentially high value in supporting data-scarce regions [45]. Furthermore, the large sample size may allow estimation of extremes using empirical approaches that avoid assumptions about underlying distributions and their non-stationarity, thereby offering the possibility of improved design estimates [10] and empirical attribution of physical mechanisms. A wide range of scientific disciplines might benefit from the UNSEEN method by forcing impact models with seasonal prediction systems to assess unprecedented impacts and improve understanding of the physical mechanisms leading to these events.

The results from the two study areas highlight the value of both the UNSEEN and the UNSEEN-trends approach. For the well-monitored Norwegian domain, we are able to bias correct the UNSEEN

Data. We use the fifth generation of the ECMWF seasonal forecasting system SEAS5 to generate the UNSEEN ensemble. SEAS5 is a global coupled ocean, sea-ice and atmosphere model, which was introduced in autumn 2017 [31]. The atmospheric component is based on cycle 43r1 of the ECMWF Integrated Forecast System. The horizontal resolution is 36 km and the model has 91 vertical levels. The ocean (Nucleus for European Modelling of the Ocean, NEMO [48]) and sea-ice (Louvain-la-Neuve Sea Ice Model, LIM2 [49]) models run at a 0.25-degree resolution. The atmosphere is initialized by ERA-Interim [50] and the ocean and sea-ice components are initialized by the OCEAN5 reanalysis [51]. ECMWF provides a re-forecast (also known as hindcast) dataset for calibration of the operational forecasting [...]

To evaluate the precipitation extremes simulated by SEAS5, we use a 1 × 1 km gridded station-based precipitation product for Norway [25]. The data have recently been corrected for underestimation caused by wind-induced undercatch and use more information in the interpolation scheme for [...]

[...] datapoints randomly picked from all members, years and lead times to remove potential correlations. This randomized dataset is split into four pseudo lead times of 25 timeseries, in order to calculate the boxplot statistics from the same number of correlation coefficients (300) as before. The data are resampled 1000 times (without replacement), resulting in 4000 boxplot statistics (4 pseudo lead times × 1000 resampled series), from which the confidence intervals are calculated based on a 5% significance level (the 2.5 and 97.5 percentiles).

[...] variability and thus the forecasts are under-dispersed (over-dispersed). We create rank histograms for the raw SON-3DP UNSEEN ensemble, for the anomalies from the mean and for the standardized anomalies, where the anomalies are divided by the standard deviation.
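A rank histogram of the kind described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in this study: the function names are hypothetical, ties between observation and ensemble values are ignored, the standardization uses the sample standard deviation, and the numbers are toy values.

```python
import statistics


def rank_histogram(observations, ensemble):
    """Count how often the observation takes each rank within the ensemble.

    observations: one observed value per year.
    ensemble: one list of member values per year (same length every year).
    Returns a list of n_members + 1 counts; a flat histogram suggests that
    the observations are statistically indistinguishable from the members.
    """
    n_members = len(ensemble[0])
    counts = [0] * (n_members + 1)
    for obs, members in zip(observations, ensemble):
        # Rank = number of members strictly below the observation.
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


def standardize(values):
    """Anomalies from the mean, divided by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Toy values (assumption): 4 years, 3 members per year.
observations = [12.0, 30.0, 18.0, 25.0]
ensemble = [
    [10.0, 15.0, 20.0],
    [11.0, 14.0, 22.0],
    [9.0, 16.0, 21.0],
    [13.0, 19.0, 24.0],
]
print(rank_histogram(observations, ensemble))
print(standardize(observations))
```

Applying `rank_histogram` to the raw values, to the anomalies and to the standardized anomalies gives the three variants mentioned in the text; when the model has a mean bias, the raw histogram piles up at one end while the anomaly versions can still be flat.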
To compare UNSEEN to the observed record in more detail, we apply a bootstrap test presented in previous studies [20,22]. We bootstrap 10,000 timeseries of 35 years with replacement from all ensembles (100 × 35 years) and calculate the mean, standard deviation, skewness and kurtosis for each. We test whether the four distribution statistics derived from the observed precipitation time series over the period 1981-2015 fall within the 95% confidence intervals for the statistics derived from the bootstrapped timeseries.

We then fit the Generalized Extreme Value (GEV) distribution, described by a location (−∞ < μ < ∞), scale (σ > 0) and shape (−∞ < ξ < ∞) parameter [60]:

$$F(x) = \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\} \qquad (3)$$

We also test the sensitivity to using the Gumbel distribution with ξ = 0, simplifying the distribution to:

$$F(x) = \exp\left\{-\exp\left[-\frac{x-\mu}{\sigma}\right]\right\} \qquad (4)$$

The quantiles of the distribution can again be obtained by inverting the distribution:

$$x_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - \{-\ln(1-p)\}^{-\xi}\right], & \xi \neq 0 \\ \mu - \sigma \ln\{-\ln(1-p)\}, & \xi = 0 \end{cases} \qquad (5)$$

where the return value $x_p$ corresponds to the return period 1/p, with p the annual exceedance probability. For all statistical model fits in this study (including non-stationary fits described in the next section), we apply Maximum Likelihood Estimation (MLE) to estimate the parameters of the distributions, utilizing the extRemes package [61] in R [58]. The 95% confidence intervals of the distributions are calculated based on the normal approximation, which is the default of the extRemes package.

UNSEEN-trends. In this study, we present the idea of performing trend analysis on seasonal hindcasts, as the seasonal hindcasts provide a larger sample than observations and a higher resolution than climate models (see the UNSEEN-trends section for more details). We apply well-established extreme value theory [60,62,63] by allowing the location (μ) and scale (σ) parameters of the GEV distribution (given in equation 3) to vary linearly with time (t).
Because the scale parameter needs to be positive, a log-link function is used:

$$\mu(t) = \mu_0 + \mu_1 t \qquad (6)$$

$$\ln \sigma(t) = \phi_0 + \phi_1 t \qquad (7)$$

This approach selects one block maximum per year, leading to 35 data points over the years 1981-2015 based on observed records. With UNSEEN-trends, we have 100 times more values for each year and thus increase confidence in the regression analysis (see Fig. 4a,b for illustration). As for the stationary method, we use MLE to estimate the parameters of the distributions and the normal approximation to find the 95% confidence intervals of return values. We focus on the changes in the 100-year quantiles, because these are associated with the design levels mostly used in flood defence [64]. The trend in the 100-year return value is defined as the percentage change between 1981 and 2015:

$$\text{trend} = \frac{x_{p}(2015) - x_{p}(1981)}{x_{p}(1981)} \times 100\% \qquad (8)$$

where the return value $x_p$ is defined by equation (5).

The robustness of the trends to experiment decisions like the block size and the regression method can be further investigated but is beyond the scope of this research. For example, 6-month blocks can be selected at the expense of the ensemble size. This will result in 25 realizations, in comparison with 3-month blocks, which contain 100 realizations. A block size of three months (September-November) is chosen in this study. A linear trend in time is assumed in this study. With the large amount of data, more complex regression methods can be explored. The ECMWF SEAS5 seasonal prediction system is used in this study, but other seasonal prediction systems with available hindcasts could also be assessed to test the sensitivity of return value and trend estimates to the choice of model.
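To make the trend definition concrete, the sketch below evaluates the GEV quantile of equation (5) with time-varying location and log-scale parameters and computes the percentage change in the 100-year return value between 1981 and 2015. It is a minimal illustration under stated assumptions: the function names are hypothetical, time t is measured in years since 1981 (the source does not state how the covariate is scaled), and the parameter values are invented for illustration rather than fitted to SEAS5 or observations. In the study itself the parameters are estimated with the extRemes package in R.

```python
import math


def gev_return_value(mu, sigma, xi, return_period):
    """Quantile of the GEV with exceedance probability p = 1 / return_period."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit (xi = 0)
    return mu - sigma / xi * (1.0 - y ** (-xi))


def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function, Gumbel form when xi is 0."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    base = 1.0 + xi * z
    if base <= 0.0:
        # Outside the support of the distribution.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-base ** (-1.0 / xi))


def params_at(year, mu0, mu1, phi0, phi1, t0=1981):
    """Location and scale for a given year: linear mu, log-linear sigma."""
    t = year - t0  # assumption: covariate is years since t0
    return mu0 + mu1 * t, math.exp(phi0 + phi1 * t)


def percent_trend(mu0, mu1, phi0, phi1, xi,
                  start=1981, end=2015, return_period=100.0):
    """Percentage change in the return value between start and end year."""
    x_start = gev_return_value(*params_at(start, mu0, mu1, phi0, phi1),
                               xi, return_period)
    x_end = gev_return_value(*params_at(end, mu0, mu1, phi0, phi1),
                             xi, return_period)
    return (x_end - x_start) / x_start * 100.0


def return_period_in(year, x, mu0, mu1, phi0, phi1, xi):
    """Return period of the value x under the distribution of a given year."""
    mu, sigma = params_at(year, mu0, mu1, phi0, phi1)
    exceedance = 1.0 - gev_cdf(x, mu, sigma, xi)
    return math.inf if exceedance == 0.0 else 1.0 / exceedance


# Invented parameters (assumption, for illustration only).
mu0, mu1, phi0, phi1, xi = 40.0, 0.05, math.log(10.0), 0.002, 0.1
x_1981 = gev_return_value(*params_at(1981, mu0, mu1, phi0, phi1), xi, 100.0)
print("trend (%):", percent_trend(mu0, mu1, phi0, phi1, xi))
print("2015 return period of the 1981 100-year value:",
      return_period_in(2015, x_1981, mu0, mu1, phi0, phi1, xi))
```

The last call shows how a statement such as "the 1981 100-year event has a shorter return period in 2015" follows from the fitted parameters: the 1981 return value is evaluated under the 2015 distribution and its exceedance probability is inverted.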