Regional SST and SLP conditions related to tornado ‘outbreak’ environments 15 days later

Global climate features are known to influence tornado frequency in the US, but more work is needed to understand the extent to which climate variables contribute to increases in convective available potential energy (CAPE) and shear on days with an outbreak of at least ten tornadoes. Here the authors quantify the conditional relationships between precursor sea surface temperature (SST) and sea level pressure (SLP) variables and localized extremes of CAPE and shear associated with large outbreaks. They do this by regressing CAPE and shear on days with at least ten tornadoes onto global climate variables averaged over the fifteen days before the outbreak. Results show that for every 1 °C increase in the SST gradient between the Gulf of Alaska and the Caribbean, deep-layer bulk shear (DLBS) increases by 0.88 m s⁻¹, shallow-layer bulk shear (SLBS) increases by 0.62 m s⁻¹, and CAPE decreases by 50.6 J kg⁻¹, conditional on at least ten tornadoes and holding the other variables constant. Further, for every 1° E increase in longitude, DLBS increases by 0.15 m s⁻¹, SLBS increases by 0.38 m s⁻¹, and CAPE decreases by 39.3 J kg⁻¹, conditional on at least ten tornadoes and holding the other variables constant. Additionally, SLBS is the only environmental factor that has a significant upward annual trend. This article is in review in Climate Dynamics.

severe convective weather in the US. Research suggests that CAPE will increase as a result of rising surface temperatures [6,11,17,18,31,58], and BS will decrease as a result of the weakened temperature gradient between the equator and the poles [6,12]. Other research suggests that the decrease in BS will be smaller than the increase in CAPE [6,11,31,58]. As a result, environments for severe convective weather will become more frequent in the future [6,11,31,58]. However, more research is needed to understand the direct relationship between climate variables and tornado outbreak environments known to influence tornado and casualty counts.
Large-scale climate features, including surface temperatures, sea surface temperatures, and phases of global wind patterns, are known to influence tornado activity in the US. For example, the phase of the El Niño Southern Oscillation (ENSO) modulates the polar and subtropical jet streams, which directly influence tornado activity in the US [1,9,32,33,34,54]. Additionally, the phase of the Madden-Julian Oscillation (MJO) influences convection and tornado activity in the US [4,28,37,41,51,53]. Although research has shown a relationship between climate variables and tornado activity, more work is needed to quantify the relationship between these climate variables and tornado outbreak environments.
The goal of this paper is to determine which large-scale climate variables precede tornado outbreaks in the US. Here, we define tornado outbreak environments using locally extreme values of CAPE and shear. We are interested in the regional sea surface temperature (SST) and sea level pressure (SLP) conditions fifteen days before an outbreak occurs. However, we do not examine whether these precursor conditions can anticipate the occurrence of an outbreak. Instead, we address the following question: Given an outbreak, what regional SST and SLP patterns fifteen days prior are statistically related to the amount of CAPE and shear associated with the outbreak? The study quantifies the conditional relationships between precursor SST and SLP variables and local extremes of CAPE and shear associated with large outbreaks. First, we assign the values of each global climate variable to their respective cluster. Next, we fit a series of regression models to cluster-level environmental data and climate variables on convective days (24-hour period; 12 UTC to 12 UTC) when the number of tornadoes is at least ten (see [43]). These models quantify the relationships that exist between climate variables and environmental conditions associated with tornado outbreaks.
The paper is outlined as follows. The data and methods are discussed in section 2, including the selection of linear mixed-effects models to statistically explain environmental factors for clusters with at least ten tornadoes. Descriptive statistics, model results, model estimates, and skill tests are discussed in section 3. Finally, a summary of the results with conclusions is given in section 4.

Data and Methods
This section describes the data collation process, organization, and procedures to aggregate values to the cluster level. For this research, we define a cluster as a group of at least ten tornadoes that occur relatively close together in both space and time during a single convective day. We select ten as the cut-off value because it is often used to formally define an outbreak [3,26]. Additionally, a threshold of ten tornadoes helps alleviate the uncertainty caused by too few clusters and the excessive model-fitting time caused by too many clusters [22]. We fit a series of regression models to demonstrate that climate variables can skillfully predict environmental factors aggregated to the level of tornado clusters. We fit one regression model for each environmental factor, resulting in three models. Environmental factors collected from reanalysis data represent the conditions before the first tornado in each cluster. The explanatory variables in these models include climate variables such as sea surface temperatures, the North Atlantic Oscillation, the Pacific North American pattern, and the Madden-Julian Oscillation, and spatio-temporal variables such as latitude, longitude, and year.

Tornado Clusters
We download tornado data from the Storm Prediction Center (SPC; https://www.spc.noaa.gov/gis/svrgis/). We extract the date, time, and genesis location for all tornado reports between 1994 and September 2020. We select 1994 as the start of the analysis because it is the first year of extensive use of the WSR-88D weather radar [30]. During this time frame, there are a total of 33 143 tornado reports. We convert the geographic coordinates of each genesis location to Lambert conformal conic coordinates, with the projection centered on 96° W longitude.
We assign each tornado a cluster identification number, giving the same number to tornadoes that occur close together in space and time. To express separations in space and time in common units, we divide the spatial distance by 15 m s⁻¹, which accounts for the average speed of tornado-producing storms and is commensurate with the magnitude of the steering-level wind field. Clustering ends when the space-time difference between an individual tornado and the existing clusters surpasses 50 000 s (roughly 14 hours). Additional detail about the clustering procedure, along with a comparison of the resulting clusters to well-known outbreaks, is available in [43].
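The space-time difference can be illustrated with a minimal sketch. It assumes that the spatial separation (in meters, from the projected coordinates) is converted to seconds by dividing by 15 m s⁻¹ and then added to the absolute time difference. The additive combination and the function name `space_time_difference` are assumptions made for illustration only; the procedure actually used is described in [43].

```python
import math

STORM_SPEED = 15.0     # m/s, average speed of tornado-producing storms (from the text)
THRESHOLD = 50_000.0   # s, clustering ends beyond this space-time difference

def space_time_difference(x1, y1, t1, x2, y2, t2, speed=STORM_SPEED):
    """Space-time difference in seconds between two tornado reports.

    x and y are projected coordinates in meters; t is time in seconds.
    Summing the two terms is an assumption of this sketch.
    """
    spatial_seconds = math.hypot(x2 - x1, y2 - y1) / speed
    return spatial_seconds + abs(t2 - t1)

# Two hypothetical reports 150 km apart in space and 2 hours apart in time
d = space_time_difference(0.0, 0.0, 0.0, 150_000.0, 0.0, 7_200.0)
same_cluster = d <= THRESHOLD

# The threshold expressed in hours, matching "roughly 14 hours" in the text
threshold_hours = THRESHOLD / 3600.0
```

Dividing distance by a representative storm speed puts both separations in seconds, so a single threshold can be applied to their combination.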
We select only clusters with at least ten tornadoes occurring within the same convective day. In total, 830 clusters containing 18 571 tornadoes are used in this research. The average number of tornadoes per cluster is 22, with a maximum of 173 (27 April 2011). The distribution of cluster size is right-skewed, with 88 clusters containing exactly ten tornadoes. The minimum convex hull that includes all tornado genesis locations defines the cluster area (black polygon in Fig. 1). Figure 1 shows the 20 May 2019 cluster as an example. This cluster had 49 tornadoes over Oklahoma and Texas, covered an area of 155 338 km², and lasted roughly 17 hours. It resulted in a total of 4 casualties (sum of injuries and fatalities).
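As an illustration of how a cluster area can be obtained from genesis locations, the following sketch builds a convex hull with Andrew's monotone chain algorithm and measures its area with the shoelace formula. The point coordinates are hypothetical, and the choice of algorithm is an assumption; the text does not state which hull routine was used.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; area in squared coordinate units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical genesis locations in km (projected coordinates)
genesis = [(0, 0), (400, 0), (400, 300), (0, 300), (200, 150), (100, 50)]
hull = convex_hull(genesis)
area_km2 = polygon_area(hull)
```

Working in projected (Lambert conformal conic) coordinates, as the text describes, is what allows a planar area formula to be applied.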
There were cases where multiple clusters occurred on the same day. Although these clusters may result from the same synoptic system, they do not group together because the minimum tornado space-time distance exceeds the threshold value. Therefore, in this research, we do not attempt to identify the system that produced these tornadoes and use the term cluster instead of outbreak.

Environmental Factors
Convective available potential energy (CAPE), bulk shear (BS), and weak convective inhibition (CIN) are large-scale environmental factors known to influence tornado development [13,42,47,52]. We obtain the environmental factors from the National Centers for Environmental Prediction's North American Regional Reanalysis (NARR) [35]. NARR has a grid spacing of 32 km, and all values are available in 3-hour increments beginning at 0000 UTC. In the severe convective weather literature, researchers often refer to these environmental factors as "parameters." Here, we refer to them as factors because the term "parameter" denotes an unknown coefficient in the statistical modeling methodology employed here. For each cluster, we select the nearest 3-hour NARR time before the occurrence of the first tornado. For example, we select the 1200 UTC environmental factors for a cluster whose first tornado occurs at 1347 UTC. Using the NARR time before the start of each cluster makes the data less likely to be contaminated by deep convection. However, this choice can underestimate the severity of environmental conditions when environments conducive to tornadogenesis are changing rapidly. In total, about 57% of all clusters have an initial tornado between 18 UTC and 00 UTC. On average, however, clusters contain more tornadoes when the first tornado occurs between 18 UTC and 21 UTC.
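The time selection can be sketched as follows. The function name `preceding_narr_time` and the example date are hypothetical, and the sketch assumes that a tornado occurring exactly on a 3-hour mark is paired with that same analysis time, which the text does not specify.

```python
from datetime import datetime

def preceding_narr_time(first_tornado_utc):
    """Return the latest 3-hourly analysis time (00, 03, ..., 21 UTC)
    at or before the first tornado of a cluster."""
    hour = (first_tornado_utc.hour // 3) * 3
    return first_tornado_utc.replace(hour=hour, minute=0, second=0, microsecond=0)

# First tornado at 1347 UTC (the date is arbitrary) maps to 1200 UTC
first = datetime(2019, 5, 20, 13, 47)
narr = preceding_narr_time(first)
```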
The environmental factors considered in this research include CAPE (0-180 hPa above ground level), deep-layer bulk shear (DLBS; 1000-500 hPa), and shallow-layer bulk shear (SLBS; 1000-850 hPa). We calculate the shear variables as the square root of the sum of the squared differences between the u- and v-wind components at the respective levels, consistent with others [57]. Climate researchers often use these specific variables as proxies for more traditional variables used to forecast severe convective weather [2,22,40,43,44,57].
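Written out, the shear calculation described above takes the following form. The subscripts are notation introduced here for illustration: "top" denotes the upper level (500 hPa for DLBS, 850 hPa for SLBS) and "bottom" denotes 1000 hPa.

```latex
\mathrm{BS} = \sqrt{\left(u_{\mathrm{top}} - u_{\mathrm{bottom}}\right)^{2} + \left(v_{\mathrm{top}} - v_{\mathrm{bottom}}\right)^{2}}
```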
For each environmental variable, we select the highest value across the raster grid confined within the area defined by the cluster's convex hull to represent the cluster (Fig. 2). We select the maximum value to capture environments unaltered by deep convection as a result of tornadogenesis. We do not use the mean value because the tornado and non-tornado producing convection within the cluster's convex hull often influence the environmental conditions. Histograms (not shown) of the maximum values show no evidence of extreme behavior.
We do not include storm-relative helicity (SRH), lifted condensation level (LCL), and dewpoint temperature (DEW) in this research, although they are known to indicate favorable environments for tornadogenesis. Eliminating these environmental factors is consistent with other researchers [44], who exclude them because of their correlation with other environmental factors such as CAPE, DLBS, and SLBS. Additionally, we do not use composite parameters, including the significant tornado parameter (STP) and supercell composite parameter (SCP), in this study. The STP and SCP formulas combine CAPE and shear, so a given value of these composite variables could result from high CAPE and low shear or from low CAPE and high shear.

Climate Variables
This research leverages climate variables to statistically explain changes in CAPE, DLBS, and SLBS for the clusters. Sea surface temperatures, the North Atlantic Oscillation, the Pacific North American pattern, and the Madden-Julian Oscillation represent the climate variables in this research because they influence tornado activity in the US. For each climate variable, a single daily average value is obtained for each of the fifteen days before the cluster. We compute the fifteen-day average for each climate variable per cluster by averaging the fifteen individual daily averages. Fifteen days is commensurate with the global propagation of Rossby waves, which follows roughly a two-week cycle.
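A minimal sketch of the averaging step is given below. The function name `window_mean` and the daily series are hypothetical; the window length is left as a parameter so that other averaging periods can be tried with the same function.

```python
from datetime import date, timedelta

def window_mean(daily, cluster_day, n_days=15):
    """Mean of the n_days daily values immediately before cluster_day.

    daily maps datetime.date to the daily average of one climate variable.
    The cluster day itself is excluded from the window.
    """
    days = [cluster_day - timedelta(days=k) for k in range(1, n_days + 1)]
    values = [daily[d] for d in days]
    return sum(values) / len(values)

# Hypothetical daily series whose value equals the day of the month (May 2019)
daily_index = {date(2019, 5, d): float(d) for d in range(1, 32)}

avg15 = window_mean(daily_index, date(2019, 5, 20))        # uses 5-19 May
avg10 = window_mean(daily_index, date(2019, 5, 20), 10)    # uses 10-19 May
```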

Sea Surface Temperatures
We collect sea surface temperature (SST) data from the High-Resolution Blended Analysis through the National Oceanic and Atmospheric Administration's (NOAA) Physical Sciences Laboratory. The data set contains the daily mean sea surface temperature from September 1981 to the present. This research uses sea surface temperatures in three separate zones: the Caribbean, the Gulf of Alaska, and the El Niño 3.4 region (Fig. 3). We select these zones because they have been used to understand tornado events in the US [16,19,23,24]. The first SST zone is the Caribbean (CSST), which extends from 90° W to 70° W and 15° N to 25° N (blue rectangle in Fig. 3). The second SST zone is the Gulf of Alaska (GAKSST), which extends from 157.5° W to 133.1° W and 50.5° N to 60° N (green rectangle in Fig. 3). The authors of [23] found that tornado activity decreases for increasing CSSTs and GAKSSTs. The final SST zone is the El Niño 3.4 (NinoSST) region, which extends from 170° W to 120° W and 5° S to 5° N (yellow rectangle in Fig. 3). We average the SSTs in this region to obtain the value of the NinoSST. We select the NinoSST instead of the well-known El Niño 3.4 Index because the index is a 5-month average value. The averaged NinoSSTs better represent the daily to weekly timescales in this research. NinoSSTs are known to influence tornado development. When NinoSSTs are colder than normal, the Southern Oscillation is in the La Niña phase. During the La Niña phase, tornado frequency, CAPE, moisture, and low-level winds increase, raising the likelihood of tornadoes [2,19,60]. When NinoSSTs are warmer than normal, the Southern Oscillation is in the El Niño phase. During the El Niño phase, more tornadoes occur over the High Plains region, with a seasonal increase in tornadoes along the Gulf Coast [2,19].
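The zone averaging can be sketched as below, using the zone bounds given in the text. The grid cells are hypothetical, and the unweighted mean (no latitude-dependent area weighting) and the name `zone_mean` are simplifying assumptions of this sketch rather than a description of the processing actually used.

```python
# Zone bounds from the text; longitudes in degrees east (negative = west),
# latitudes in degrees north (negative = south).
ZONES = {
    "CSST":    {"lon": (-90.0, -70.0),   "lat": (15.0, 25.0)},
    "GAKSST":  {"lon": (-157.5, -133.1), "lat": (50.5, 60.0)},
    "NinoSST": {"lon": (-170.0, -120.0), "lat": (-5.0, 5.0)},
}

def zone_mean(grid, zone):
    """Unweighted mean SST over grid cells whose centers fall inside a zone.

    grid is an iterable of (lon, lat, sst) tuples; cells with sst None
    (for example land cells) are skipped.
    """
    lon0, lon1 = ZONES[zone]["lon"]
    lat0, lat1 = ZONES[zone]["lat"]
    inside = [sst for lon, lat, sst in grid
              if sst is not None and lon0 <= lon <= lon1 and lat0 <= lat <= lat1]
    return sum(inside) / len(inside)

# Hypothetical grid cells (lon, lat, SST in deg C)
cells = [(-80.0, 20.0, 28.0), (-75.0, 18.0, 27.0), (-80.0, 30.0, 24.0),
         (-150.0, 55.0, 7.0), (-140.0, 58.0, 8.0), (-85.0, 16.0, None)]

csst = zone_mean(cells, "CSST")
gaksst = zone_mean(cells, "GAKSST")
```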

North Atlantic Oscillation
We collect the North Atlantic Oscillation (NAO) data from NOAA's Physical Sciences Laboratory (https://psl.noaa.gov/data/timeseries/daily/NAO/). The data set contains the daily value of the NAO Index from 1948 to the present. The index is calculated by taking the difference in the 500 hPa height patterns between the Azores High [35-45° …] and the Icelandic Low (Fig. 3). The NAO values are standardized by the standard deviation of the monthly NAO index. When the height difference is greater than normal, the NAO is in a positive phase. The probability of tornadoes decreases in the southeastern US when the NAO is positive [19]. When the height difference is weaker than normal, the NAO is in a negative phase. A negative NAO results in weaker westerly winds and a decreased pressure gradient across the North Atlantic.

Pacific North American
We collect the Pacific North American (PNA) data from NOAA's Physical Sciences Laboratory (https://psl.noaa.gov/data/timeseries/daily/PNA/). The data set contains the daily value of the PNA Index from 1948 to the present. The index is calculated from differences in the 500 hPa height patterns between regions over the northern Pacific Ocean [40-50° …] and North America (Fig. 3). The PNA values are standardized by the standard deviation of the monthly PNA index. When the height difference is greater than normal, the PNA is in a positive phase. When this occurs, there are strong fluctuations in the strength and location of the East Asian jet stream, which can lead to above-average temperatures over western Canada and the western US and increased precipitation in the southeastern US [8]. When the height difference is weaker than normal, the PNA is in a negative phase. When this occurs, the East Asian jet retracts westward, leading to high pressure situated over the northern Pacific [8].
Fig. 3 The NAO Index is calculated by taking the difference in the 500 hPa height patterns between the Azores High (green rectangle) and the Icelandic Low (blue rectangle). The third panel highlights the regions used to calculate the PNA Index, which is calculated from the differences in the 500 hPa height patterns for the two Pacific regions (pink rectangles) and for the two North American height regions.

Madden-Julian Oscillation
We collect the Madden-Julian Oscillation (MJO) data from NOAA's Physical Sciences Laboratory through the Australian Bureau of Meteorology (http://www.bom.gov.au/climate/mjo/graphics/rmm.74toRealtime.txt). For this research, the MJO is the amplitude of the wave pattern in units of meters. Strong amplitudes of the MJO indicate enhanced convection along the equator. The temporal range of the MJO extends from weekly to monthly timescales. It is best known for its influence on the strength of global monsoon patterns, variations in wind and precipitation, and hurricanes.

Regression Models
We fit a series of regression models to the cluster-level environmental data. Each environmental variable has a single model. The regression models quantify the effect of each climate factor on the environmental factors (CAPE, DLBS, SLBS) while holding the other variables constant. The random effects (an offset to the intercept term) in the model are the seasonal and hourly variability of the environmental factors. The climate variables for each cluster are the fixed effects in the models. The environmental factors for each cluster are the response variables in the models.
We fit a series of linear regression models to the data having the initial form
$$\mathrm{DLBS} = \beta_0 + \beta_1\,\mathrm{Lat} + \beta_2\,\mathrm{Lon} + \beta_3\,\mathrm{Year} + \beta_4\,\mathrm{CSST} + \beta_5\,\mathrm{GAKSST} + \beta_6\,\mathrm{NINOSST} + \beta_7\,\mathrm{NAO} + \beta_8\,\mathrm{PNA} + \beta_9\,\mathrm{MJO} + \mathrm{Month} + \mathrm{Hour} + \varepsilon,$$
where the cluster center location [latitude (Lat) and longitude (Lon)], year (Year), and the six climate variables (CSST, GAKSST, NINOSST, NAO, PNA, and MJO) are the explanatory variables in the model. The random effects in the model are month and hour. Therefore, Month and Hour are vectors of coefficients with one element for each month of the year and hour of the day. The coefficients are computed using the maximum likelihood approach with the lmer function from the lme4 package in R [5]. We do the same for the initial SLBS and CAPE models. We simplify the initial models through single-term deletion as described in §3.
We evaluate model skill by comparing the observed DLBS, SLBS, and CAPE with estimated values from the model. We obtain these values for each cluster by plugging the values of the explanatory variables into the final model. Predicted values are underdispersed relative to the observed environmental factors. Comparisons are made using the Pearson correlation coefficient and mean absolute error. We evaluate the predictive skill of the models using in-sample and out-of-sample predictions. To compute the in-sample predictions, we fit a single model using the 830 clusters. To compute the out-of-sample predictions, we conduct a hold-one-out cross-validation (see [21]) in which one cluster is held out of the model fitting procedure, and the fitted model then predicts the environmental conditions for that cluster.
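The skill evaluation can be illustrated with a short sketch. To keep it self-contained, an ordinary least squares line with a single predictor stands in for the mixed-effects models, which is a deliberate simplification and not the method of the paper; the data values and all function names are hypothetical.

```python
import math

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def mean_absolute_error(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in for the mixed model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def hold_one_out(x, y):
    """Refit without cluster i, then predict cluster i from the refitted model."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds

# Hypothetical data: SST gradient (deg C) and DLBS (m/s) for six clusters
sst_gradient = [10.0, 14.0, 17.0, 20.0, 22.0, 25.0]
dlbs = [20.5, 24.0, 26.0, 30.5, 31.0, 35.5]

out_of_sample = hold_one_out(sst_gradient, dlbs)
r_oos = pearson_r(dlbs, out_of_sample)
mae_oos = mean_absolute_error(dlbs, out_of_sample)
```

Because each prediction comes from a model that never saw the held-out cluster, the resulting correlation and mean absolute error are less optimistic than their in-sample counterparts.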

Descriptive Statistics
Deep-layer bulk shear does not have a significant diurnal variation but does have a seasonal pattern (Fig. 4). The propagation of the polar jet stream during the winter increases DLBS values [48,49]. DLBS values decrease during the summer months when the polar jet stream retreats northward [7]. The seasonal variability of DLBS must be taken into account when fitting a model to estimate DLBS using climate variables.
The values of maximum DLBS follow a normal distribution (Fig. 5a), with a maximum of 47.9 m s⁻¹ and a minimum of 5.59 m s⁻¹ for clusters with at least ten tornadoes (Table 1). Both the median and the mean DLBS for these clusters are 27.6 m s⁻¹. In total, 50.1% of clusters have less than the mean value of DLBS.
Shallow-layer bulk shear does not have a significant diurnal variation but does follow a seasonal pattern consistent with DLBS (Fig. 4). Similar to DLBS, the propagation of the polar jet stream directly influences the values of SLBS [48,49]. The values of maximum SLBS follow a normal distribution (Fig. 5b), with a maximum of 35.7 m s⁻¹ and a minimum of 1.08 m s⁻¹ for clusters with at least ten tornadoes (Table 1). The median SLBS is 15.1 m s⁻¹, and the mean is 15.2 m s⁻¹. In total, 51.8% of clusters have less than the mean value of SLBS.
CAPE follows both a diurnal and a seasonal pattern (Fig. 4). During the summer months, CAPE values increase as a result of increased air temperatures [7]. These larger CAPE values indicate more buoyant air, leading to a greater potential for convection. CAPE values are highest in the afternoon and early evening hours when the daily temperatures are warmest and lowest during nocturnal events [48].
The values of maximum CAPE do not follow a normal distribution (Fig. 5c). The maximum value of CAPE is 6530 J kg⁻¹, with a minimum value of 0 J kg⁻¹, for clusters with at least ten tornadoes (Table 1). The median value is 2045 J kg⁻¹, and the mean value is 2199 J kg⁻¹. In total, 53.98% of clusters have less than the mean value of CAPE. It is important to note that the median and mean values of CAPE in clusters with at least ten tornadoes are similar. This similarity is taken into account when fitting the CAPE model discussed below.
The explanatory variables in this research are a combination of physical and spatiotemporal variables (Table 1). The range of values for these variables is consistent with the literature. The average CSST is 27.6 °C for the 830 clusters in this research. The average GAKSST is 7.51 °C, and the average NinoSST is 29.6 °C. The maximum standardized geopotential height difference is 26.5 m for the NAO and 36.7 m for the PNA.
Although each of these variables influences tornado activity in the US, collinearity exists between the climate variables. In particular, GAKSST and CSST are strongly correlated (r = 0.897). Therefore, we compute the SST gradient by subtracting the maximum GAKSST from the maximum CSST for all clusters with ten or more tornadoes. After calculation of the gradient (SSTgradient), no collinearity issues remain between the climate variables (Fig. 6).
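As a small worked example, and assuming the gradient is the per-cluster difference of Caribbean minus Gulf of Alaska SST, the sample means reported above imply a typical gradient of about 20 °C. The function name `sst_gradient` is hypothetical.

```python
def sst_gradient(csst, gaksst):
    """Caribbean minus Gulf of Alaska SST, in degrees Celsius."""
    return csst - gaksst

# Average CSST and GAKSST over the 830 clusters, as reported in the text
mean_gradient = sst_gradient(27.6, 7.51)
```

Replacing two highly correlated predictors with their difference leaves a single term in the model, so the regression no longer has to separate their individual contributions.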

Deep-layer bulk shear model
We use data from the 830 clusters to regress DLBS onto the climate variables given in Table 1. The regression model quantifies the effect of these climate variables on DLBS while holding the other variables constant. The random effect in the model is the month because of the significant seasonal variation in DLBS. Climate variables are the fixed effects in the model, as are the latitude (Lat) and longitude (Lon) of the centroid for each cluster. We include Lat and Lon in the model to account for the spatial variability in DLBS values. We add the fifteen-day averages of each climate variable to the initial model (Table 2). Climate variables with large t-values remain in the final model. The null hypothesis is rejected if the t-value on the coefficient estimate is less than −1 or greater than 1. The null hypothesis cannot be rejected for the MJO in the initial DLBS model. All significant climate variables have signs on the coefficients that are physically reasonable (Table 2). DLBS increases for an increase in the SST gradient between the Caribbean and Gulf of Alaska. DLBS also increases for every degree N and degree E increase in Lat and Lon, respectively.
All variables in the final DLBS model are significant. This regression model is preferred because it has the lowest Akaike Information Criterion (AIC) score, which balances goodness of fit against model complexity. The in-sample correlation between the observed DLBS values and the predicted values is r = 0.565 (95% confidence interval (CI): [0.52, 0.61]). The model statistically explains 36.3% of the variation in cluster-level DLBS but tends to overpredict DLBS for clusters with lower observed DLBS values and slightly underpredict DLBS for larger values of observed DLBS (Fig. 7). The conditional standardized residuals between the actual and model-estimated values of DLBS follow a normal distribution, which indicates an adequate model.
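For reference, the standard definition of the AIC is shown below, where k is the number of estimated parameters and L-hat is the maximized value of the likelihood. The symbols are notation introduced here; the text does not give the formula.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

A lower score indicates a better trade-off: adding a term lowers the AIC only if the gain in log-likelihood exceeds the penalty for the extra parameter.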
The model coefficients on the climate variables are consistent with expectations given recent literature. Specifically, DLBS increases for increasing Lat, Lon, and SSTgradient. DLBS decreases for increasing NAO and PNA. Latitude is the most important fixed effect

Shallow-layer bulk shear model
Similar to the DLBS model, a regression model is fit to SLBS using the same explanatory variables (Table 1). A substantial seasonal variation also characterizes SLBS, so the month is included in the model as a random effect. Climate variables are the fixed effects in the model, as are the latitude (Lat) and longitude (Lon) of the centroid for each cluster. We include Lat and Lon in the model to account for the spatial variability in SLBS values. We add the fifteen-day averages of each climate variable to the initial SLBS model, consistent with the DLBS model (Table 3). Climate variables with large t-values remain in the final SLBS model. For the SLBS model, the null hypothesis cannot be rejected for the MJO. The year is significant in the model, indicating a significant upward annual trend in SLBS on average, independent of the other variables. All significant climate variables have signs on the coefficients that are physically reasonable (Table 3). SLBS increases with increasing SSTgradient and with eastward longitude.
All variables in the final SLBS model are significant. The final regression model has the lowest Akaike Information Criterion (AIC) score, which measures the trade-off between fit and overfitting. The in-sample correlation between the observed SLBS values and the predicted values is r = 0.648 (95% uncertainty interval (UI): [0.61, 0.69]). The model statistically explains 37.6% of the variation in cluster-level SLBS but tends to overpredict SLBS for clusters with lower observed SLBS values and slightly underpredict SLBS for larger values of observed SLBS (Fig. 7). The conditional standardized residuals between the actual and estimated values of SLBS follow a normal distribution, indicating an adequate model.

Convective available potential energy model
Finally, a regression model is fit to CAPE using the same explanatory variables (Table 1). The regression model quantifies the effect of these climate variables on CAPE while holding the other variables constant. The random effect terms in the CAPE model are the month and hour of the cluster because of the considerable seasonal and diurnal variation in CAPE. The climate variables are the fixed effects in the model, as are the latitude (Lat) and longitude (Lon) of the centroid for each cluster. To remain consistent, the initial CAPE model uses only the fifteen-day averages of each climate variable (Table 4). Consistent with the shear models, only climate variables with a large t-value remain in the final model. For the CAPE model, the null hypothesis cannot be rejected for NinoSST, NAO, and PNA. All significant climate variables have signs on the coefficients that are physically reasonable (Table 4). (Fig. 7).
The conditional standardized residuals between the actual and estimated CAPE values follow a normal distribution that indicates an adequate model (Fig. 8). However, it is essential to note that there is a more extensive spread in model estimates for larger values of CAPE. The spread depends on the month of occurrence with increased variability of CAPE values during the spring and summer months when more clusters occur over a larger spatial domain on average.

Model estimates
We illustrate the models by estimating the environmental factors across a range of values of the explanatory variables that are significant in all three models (Lon and SSTgradient) (Fig. 9). We hold the remaining explanatory variables in each model constant at their respective mean values. The year is set to 2020 for the SLBS model because it is significant only in this model. The random effects, month (all models) and hour (CAPE model only), are set to April and 18 UTC due to maximum activity during the spring and evening hours. For an SSTgradient of 10 °C and a Lon of 92° W, the models estimate DLBS to be 21 m s⁻¹, SLBS to be 13 m s⁻¹, and CAPE to be 2364 J kg⁻¹. For an SSTgradient of 25 °C and a Lon of 72° W, the models estimate DLBS to be 37 m s⁻¹, SLBS to be 29.9 m s⁻¹, and CAPE to be 818 J kg⁻¹. For an SSTgradient of 18 °C and a Lon of 109° W, the models estimate DLBS to be 25.6 m s⁻¹, SLBS to be 11.4 m s⁻¹, and CAPE to be 2627 J kg⁻¹. Figure 9 is a visual representation of the CAPE and DLBS fields when modeled over a range of values for the SSTgradient and Lon. It is interesting to note that both shear models follow similar patterns, in which shear values increase for every 1° E shift in longitude and every 1 °C increase in the SSTgradient. However, CAPE follows the opposite pattern, decreasing for every 1° E shift in longitude and every 1 °C increase in the SSTgradient.
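Because the models are linear in Lon and SSTgradient, the estimates quoted above can be related to one another with simple arithmetic. The sketch below starts from the estimates at an SSTgradient of 10 °C and a Lon of 92° W and shifts them using the coefficients reported in the abstract and conclusions. The function name `shifted_estimate` is hypothetical, and the calculation assumes all other terms in each model are held fixed.

```python
# Change per 1 deg C of SSTgradient and per 1 deg E of longitude (from the text)
COEF = {
    "DLBS": {"sst": 0.88, "lon": 0.15},     # m/s
    "SLBS": {"sst": 0.62, "lon": 0.38},     # m/s
    "CAPE": {"sst": -50.6, "lon": -39.3},   # J/kg
}

# Model estimates at SSTgradient = 10 deg C and Lon = 92 deg W (from the text)
REFERENCE = {"sst": 10.0, "lon": -92.0, "DLBS": 21.0, "SLBS": 13.0, "CAPE": 2364.0}

def shifted_estimate(factor, sst, lon):
    """Shift the reference estimate linearly, holding all other terms fixed."""
    c = COEF[factor]
    return (REFERENCE[factor]
            + c["sst"] * (sst - REFERENCE["sst"])
            + c["lon"] * (lon - REFERENCE["lon"]))

# SSTgradient = 25 deg C, Lon = 72 deg W
east = {f: shifted_estimate(f, 25.0, -72.0) for f in COEF}
# SSTgradient = 18 deg C, Lon = 109 deg W
west = {f: shifted_estimate(f, 18.0, -109.0) for f in COEF}
```

The shifted values agree with the estimates quoted in the text to within about 0.2 m s⁻¹ for shear and about 1 J kg⁻¹ for CAPE, with the small differences attributable to rounding of the reference values and coefficients.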
The SSTgradient is significant in all three models. An increased SSTgradient value leads to an increase in DLBS and SLBS with a decrease in CAPE. There is a clear distinction between the kinematic (wind-driven) and thermodynamic (temperature-driven) environmental factors. Increased shear is a result of the slope between the geopotential heights for a larger SST gradient. This increase in the gradient leads to cooler temperatures which decrease CAPE values across the US. When the SST gradient between these two regions is smaller, the geopotential heights are similar, resulting in a decrease in shear. This decrease in the gradient leads to warmer temperatures which increase CAPE values across the US.

Sensitivity of the results to the averaging period
To directly test the sensitivity of the results to changes in the averaging period, we refit the models using 10-day and 20-day averages instead of 15-day averages. The 10-day average climate variables do not improve the mean absolute error of the CAPE and DLBS models.

Conclusions
Tornado outbreaks are becoming more destructive on average. Recent studies indicate changes in environmental conditions for tornadoes. This research focuses on the extent to which climate variables contribute to increases in CAPE and shear given an outbreak of at least ten tornadoes. It is important to note that we make no attempt to use climate variables to predict the occurrence of an outbreak. Instead, the study quantifies the conditional relationships between precursor SST and SLP variables and localized extremes of CAPE and shear when associated with outbreaks.
We use statistical models to quantify the relationship between environmental factors and climate variables for clusters with at least ten tornadoes. For this research, we extract CAPE, DLBS, and SLBS from the NARR dataset to represent the environment of clusters before the first tornado, consistent with previous research [43]. We create a regression model for each environmental variable (response) to quantify its change due to climate variables (explanatory). Additional explanatory variables include location and year. As a result of the seasonal and diurnal variability of CAPE, DLBS, and SLBS, the random effects in the model are the month and hour. The DLBS model explains 36.3% of the variation in cluster-level DLBS when using climate variables as explanatory variables. DLBS increases by 0.34 m s⁻¹ for every 1° N increase in latitude, 0.15 m s⁻¹ for every 1° E increase in longitude, and 0.88 m s⁻¹ for every 1° increase in the SSTgradient. DLBS decreases by 0.88 m s⁻¹ for a 1 m increase in the standard deviation of the NAO and 0.55 m s⁻¹ for a 1 m increase in the standard deviation of the PNA. DLBS is location-dependent, with the model indicating increased values in the North and East, consistent with current literature [48,49]. Additionally, DLBS increases with a stronger gradient between the Caribbean and Gulf of Alaska SSTs, consistent with [23].
The SLBS model explains 37.6% of the cluster-level variation in SLBS using the climate variables. The model indicates that SLBS increases by 0.38 m s⁻¹ for every 1° E increase in longitude, 0.09 m s⁻¹ each year, and 0.62 m s⁻¹ for every 1° increase in the SSTgradient. SLBS decreases by 0.16 m s⁻¹ for every 1° N increase in latitude, 0.57 m s⁻¹ for a 1 m increase in the standard deviation of the PNA, and 0.39 m s⁻¹ for a 1 m increase in the standard deviation of the NAO. SLBS is location-dependent, with larger values in the South and East, consistent with the literature [48,49].
The CAPE model explains 35.8% of the cluster-level variation in CAPE using the climate variables. The model indicates that CAPE decreases by 116 J kg⁻¹ for a 1 m increase in the MJO, 50.6 J kg⁻¹ for a 1° increase in the SSTgradient, and 39.3 J kg⁻¹ for a 1° E increase in longitude. These results are consistent with the literature, which suggests that lower values of CAPE are found in the Southeast [25,48].
The models are a first step at understanding the influence of climate variables on environmental factors for clusters with at least ten tornadoes. These findings combined with previous research will aid in understanding the direct influence of climate variables on tornado outbreak characteristics, including tornado and casualty counts [44]. For example, tornado and casualty counts may increase if the preceding climate variables increase DLBS when an outbreak occurs.
The focus on the last 25 years of a much longer tornado record is a limitation of this study. Considering additional tornado cases from earlier years could improve the study. However, including earlier years would lead to greater uncertainty on the estimates of clusters and the associated environmental conditions. Additionally, NARR data tends to unrealistically favor environments for tornadoes in specific convective setups, which could affect the model results [2,27,29]. Additional climate variables and variations in the temporal lag may also improve model performance. Future work will examine how these environmental conditions will influence tornado outbreak characteristics, including tornado and casualty counts.

Declarations
I know of no conflicts of interest associated with this publication, and there has been no financial support for this work. The data is open-source and available through the National Oceanic and Atmospheric Administration. The linear mixed effects models in this paper were implemented with the lmer function from the lme4 R package [5]. Graphics were made with the ggplot2 [59] and tmap [50] framework. The code and data to fit all the models is available on GitHub (https://github.com/zschroder/tornenvir_climate).