Systematic calculation of finite-time mixed singular vectors and characterization of error growth for persistent coherent atmospheric disturbances over Eurasia

Singular vectors (SVs) have long been employed in the initialization of ensemble numerical weather prediction (NWP) in order to capture the structural organization and growth rates of those perturbations or “errors” associated with initial condition errors and instability processes of the large scale flow. Due to their (super) exponential growth rates and spatial scales, initial SVs are typically combined empirically with evolved SVs in order to generate forecast perturbations whose structures and growth rates are tuned for specified lead-times. Here we present a systematic approach to generating finite time or "mixed" SVs (MSVs) based on a method for the calculation of covariant Lyapunov vectors (CLVs) and appropriate choices of thematrix cocycle. We first derive a data-driven reduced ordermodel to characterize persistent geopotential height anomalies over Europe and Western Asia (Eurasia) over the period 1979-present from the NCEPv1 reanalysis. We then characterize and compare the MSVs and SVs of each persistent state over Eurasia for particular leadtimes from a day to over a week. Finally, we compare the spatio-temporal properties of SVs and MSVs in an examination of the dynamics of the 2010 Russian heatwave. We show that MSVs provide a systematic approach to generate initial forecast perturbations projected onto relevant expanding directions in phase space for typical NWP forecast lead-times.

Singular vectors (SVs) have long been employed in the initialization of ensemble numerical weather prediction (NWP) in order to capture the structural organization and growth rates of those perturbations or "errors" associated with initial condition errors and instability processes of the large scale flow. Due to their (super) exponential growth rates and spatial scales, initial SVs are typically combined empirically with evolved SVs in order to generate forecast perturbations whose structures and growth rates are tuned for specified lead-times. Here we present a systematic approach to generating finite time or "mixed" SVs (MSVs) based on a method for the calculation of covariant Lyapunov vectors (CLVs) and appropriate choices of the matrix cocycle. We first derive a data-driven reduced order model to characterize persistent geopotential height anomalies over Europe and Western Asia (Eurasia) over the period 1979-present from the NCEPv1 reanalysis. We then characterize and compare the MSVs and SVs of each persistent state over Eurasia for particular leadtimes from a day to over a week. Finally, we compare the spatio-temporal properties of SVs and MSVs in an examination of the dynamics of the 2010 Russian heatwave. We show that MSVs provide a systematic approach to generate initial forecast perturbations projected onto relevant expanding directions in phase space for typical NWP forecast lead-times. This manuscript has been submitted for publication in Chaos. Please note that the manuscript is currently in peer-review. Subsequent versions of this manuscript may have slightly different content. If accepted, the final version of this manuscript will be available via the 'Peer-reviewed Publication DOI' link. Please feel free to contact the authors with any feedback.

I. INTRODUCTION
When it comes to predicting atmospheric conditions, it has been long known that prediction of the distant future is practically impossible. Famously, Lorenz 1 first showed that small errors in the initial conditions for even very simplified, but nonlinear, models of the atmosphere grow over finite time intervals before saturating, hence imposing a limit on the predictability of any future atmospheric state.
A particular challenge of weather forecasting involves estimating the uncertainty in the atmospheric basic state through predicting the probability density function of states in phase space. This is formally equivalent to estimating an infinite hierarchy of moments or cumulants in spectral space 2 . Lorenz 3 was the first to consider error growth by application of an ensemble generated as random initial errors added to a particular initial (analyzed) state. He showed that even simple reduced models for the atmosphere display varying leading error growth rates over consecutive 4day periods. Lorenz 4 further showed that for the Northern Hemisphere synoptic scale troposphere, small errors in geopotential height states tended to double about every 8 days. In practice, forecast skill varies not simply due to errors in the initial conditions, i.e., the analyzed state after assimilation of observations, but also due to model biases in the representation of physical processes, finite grid resolution, and poor or even absent subgrid scale parameterizations. Most relevant to the present study, the variability of forecast skill is directly related to the instability of the large scale flow itself 5 .
The approach suggested by Lorenz relates to the linearization of the model in question where small errors evolve according to the system's tangent linear dynamics. The magnitude and directions of error growth can therefore be related to the singular values and singular vectors (SVs), respectively, of the tangent linear propagator (see Section II for mathematical details). Today, current methods for ensemble prediction attempt to directly sample rapidly expanding directions in phase space with appropriate initial perturbations whose spatio-temporal scales sample the region of phase space where the dynamics are relevant to the physical processes of interest. For example, the European Centre for Medium-Range Weather Forecasts (ECMWF) began exploring the behavior of SVs within ensemble forecasting in the 90s [6][7][8][9][10][11] , leading to the SVs being formally integrated into the initial perturbations of the forecasting systems 12 . Leutbecher and Palmer 13 provide a detailed description of the ECMWF ensemble prediction system whose perturbations are based on the leading part (vectors) of the singular value decomposition (SVD) of the tangent linear dynamics (linearized as a first order Taylor expansion for a particular solution of the nonlinear model) over a finite time interval.
While the approach using SVs as initial perturbations is optimal for disturbances that grow over the first day or two of the forecast, there is an increasing demand for medium-range forecasts which extend out to 10 days or longer. Such forecasts should reflect changes in the background flow, including large-scale persistent anomalies that develop and decay within that time frame. Beyond weather systems, for climate models whose respective domains extend over a large range of spatial and temporal scales, it is practically necessary to isolate the dynamical behavior on the scales of interest. One approach is through reduced models.
In Quinn, Harries, and O'Kane 14 the authors analyzed a reduced model for the North Atlantic Oscillation (NAO) defined by opposing phases of large atmospheric pressure anomalies over the Atlantic Ocean. This model was obtained from a data-driven clustering method applied to atmospheric reanalysis data. The clustering technique identified the dominant persistent states in the atmosphere and an optimal switching between those states. A reduced model was then constructed using the optimal model parameters and cluster assignments, with the latter corresponding to timedependent switching between states. Each data instance is assigned to one of the clusters however the states themselves are also time dependent and do not correspond to invariant patterns. The model dynamics were then explored through analyzing the directions of local and global growth and decay. The authors used an algorithm for computing the covariant Lyapunov vectors (CLVs), which on short time scales produces sets of mixed singular vectors (MSVs). While each regime was found individually to be asymptotically stable, the MSVs showed positive growth rates over short time periods (3 days) associated with structures of a large spatial extent. The characteristics of these MSVs are potentially useful in forecasting in terms of defining the directions in phase space associated with the evolution of large-scale persistent structures of the NAO.
The relative utility of different dynamical vectors in application to ensemble forecast initialization varies in terms of the types of organized structures they project onto during their growth cycle.
These structures are often norm dependent and those with smaller spatial scales typically show more rapid development than those with initially larger spatial scales. The respective differences in the spatial scales, growth and convergence rates of SVs, MSVs and CLVs associated with Northern Hemisphere anticyclogenesis over Eurasia are examined in this study through a reduced order linear delay model shown to capture the large-scale switching between various identifiable meta-stable atmospheric states.
Specifically, we focus on identifying the persistent states of atmospheric pressure over the European and western Asian continent. Using the same method as in Quinn, Harries, and O'Kane 14 , we construct a reduced model for persistent atmospheric states over the last four decades. We compute SVs and MSVs associated with each persistent state and compare their relative growth rates and spatial structures when optimized over various time windows. This is then put into the context of extreme event forecasting through a case study of the 2010 Russian heat wave. We discuss the potential utility of such MSVs in application to the initialization of NWP ensembles for medium-range forecasts.

II. DYNAMICAL SYSTEMS AND PERTURBATION GROWTH
Consider a dynamical system of the form where x is known as the flow that satisfies Eq. 1 for a given initial condition. For spatially discretized flows, Eq. 1 corresponds to a system of ordinary differential equations. Small disturbances x to the flow x are governed by the tangent linear dynamics: Here A(x, ) is called the tangent linear propagator. The tangent linear propagator can be used to determine the stability properties of the flow, both locally and asymptotically. In particular, one can use compositions of the tangent linear propagator in time to determine the growth rates and directions of growth of perturbations over a specified time window. Over sufficiently long time windows, growth rates define the asymptotic stability of a system and the directions span the subspaces that on average experience the respective growth rate. Rather than asymptotic (or average) behavior, one may in actuality be more interested in local rates and directions of perturbation growth and decay. This is precisely the case in ensemble forecasting. In the following sections we introduce the concepts of local and asymptotic directions of perturbation growth through SVs and CLVs, respectively. We also present another way to measure local growth through the MSVs introduced in Quinn, Harries, and O'Kane 14 . We discuss the computation of each of the vectors and their relations to one another.

A. Singular vectors
For a matrix M, the SVD is given by where V * = V T when M is real. The columns of V are referred to as right-singular vectors v .
These form a non-unique orthonormal basis that gives the directions corresponding to the respective stretching rates , i.e., the diagonal elements of the matrix . Singular vectors and their stretching rates are norm-dependent. In other words, the th right SV (v ) maximizes the following ratio of 2 norms: By definition, the columns of V are normalised, and therefore v = 1. One can choose appropriate weighting for the norms to maximize the stretching if the initial basis is not normalised.
In the context of the tangent linear propagator, one is interested in the SVD of the matrix cocycle over a chosen window , A ( 0 , ). This can be approximated by a composition matrix of the time-discretized system with time step Δ : As x is implicitly dependent on through Eq. 1, we drop the explicit dependence on x in the definition of the matrix cocycle A for simplicity of notation. The right singular vectors obtained from the SVD of Eq. 5 correspond to those perturbations with optimal growth rates over the window = 0 to = 0 + for the chosen norm. Hence, one can identify the perturbation directions associated with the most growth over that time period. This is a method often used when initializing ensemble members in ensemble forecasting (see subsection II D).

B. Covariant Lyapunov vectors
When one is interested in the asymptotic stability of a system, it is useful to consider Lyapunov exponents and Lyapunov vectors. Lyapunov exponents characterize the long-time growth and decay of a dynamical system. If one or more Lyapunov exponents are positive, then we say that the system is unstable. Otherwise, if all are negative then the system is stable. A Lyapunov exponent is defined as follows 15 : The subspaces Φ (x), known as Oseledets subspaces, decompose the phase space according to the long-time growth and decay rates. In other words, vectors that lie within a given subspace will evolve asymptotically with rate forward (-backwards) in time. These subspaces are covariant with the dynamics and therefore the vectors which span each subspace are known as CLVs. They are norm-independent, give the local directions of the corresponding asymptotic growth and decay in tangent space, and are not required to be orthogonal. While the CLVs are dependent on the underlying flow and therefore provide local information, the directions of growth are optimized for very long time scales.
Since the emergence of numerical algorithms for calculating CLVs [16][17][18] , the vectors have been used in many studies to determine the hyperbolicity and stability of chaotic systems in applications to convection 19,20 , turbulence [21][22][23] , and general dissipative systems [24][25][26][27] . Additionally the loss of hyperbolicity, or alignment of the CLVs, has been related to critical transitions or regime shifts in systems [28][29][30] . Recently they have been utilized in the context of ensemble data assimilation through the construction of the background covariance matrix using a subset of the leading CLVs 31 . Most relevant to the study at hand is the application of CLVs to the climate system through the study of stability in quasigeostrophic flow [32][33][34] , blocking events 35 , and climate teleconnections 14 .

C. Mixed singular vectors
Numerical approximations to the CLVs of a given system are obtained by considering the dynamics of the system over a chosen (sufficiently long) finite time window. As the length of this window is reduced beyond the lower limit for ergodicity, the resulting vectors are no longer CLVs.
Instead, these so-called MSVs reflect aspects of the local dynamics, as evidenced by differences in properties such as their alignment and finite-time growth rates 14 . The minimum window length required for the MSVs to converge to CLVs will be different for every system and can roughly be determined through analyzing the finite-time growth rates of the vectors. The averages of the growth rates calculated over many short time intervals should be equal to the asymptotic Lyapunov exponents if the vector is a CLV 36 . We use this property to differentiate between MSVs and CLVs when computing these vectors numerically.
In the following, we compute MSVs and CLVs using Algorithm 2.2 of Froyland et al. 18 applied over progressively longer time windows. The vector will be used to denote the th largest MSV or CLV at time = . First we choose a push forward ( ) and pullback ( ) window length, and set of orthogonalization times N = { , 2 , . . . , } (here the othogonalization time step is ). The approximation to is then obtained utilizing Algorithm 1.
In the subsequent analysis we will use an equal push forward and pullback window ( = ), however they can be chosen independent of one another. The effect of varying the two independently is beyond the scope of this study. The algorithm is initialised using the right singular vectors of the composition matrix from time − to . Here the optimisation time window is = Δ .
The singular vectors are then pushed forward, or evolved, by the tangent linear propagator time steps. At each orthogonalisation step (including at time ), the evolved vectors are enforced to be orthogonal to the leading singular vectors of the forward composition matrix from that time step. This is to avoid collapse onto the leading subspace. In this sense, initial and evolved singular vectors are being "mixed", hence the term MSV to describe the resulting vectors which have not yet converged to CLVs.

D. Use of singular vectors in ensemble weather forecasting
The early 1990s saw the introduction of SVs into numerical weather prediction. Mureau, Molteni, and Palmer 6 used SVs to define the perturbations to the initial states for ensemble We now briefly summarize the implementation of SVs in the ECMWF ensemble prediction system, referring the interested reader to the detailed description in Leutbecher and Palmer 13 .
Firstly, optimal perturbations are those which maximize the ratio of the final time to initial time norm. In this sense the norm is weighted by the estimated covariance C( ) of the perturbations x ∈ R at time : The propagator is constructed for the window of initial to final time which we again notate as A ( 0 , ) for consistency (note in Leutbecher and Palmer 13 this is defined as M). The initial right SVs of the propagator maximize and thus are optimal for the subspace of the estimated initial covariance C( 0 ). The normalized evolved SVs correspond to the left SVs U. The transformed forecast error covariance matrix Given the application-specific nature of this empirical approach, it is therefore of practical interest to seek alternative systematic approaches to determining mixed SVs with appropriate properties, i.e optimal structures with physically realistic growth rates, with potential application to NWP.  To apply the FEM-BV-VAR fitting procedure, one needs to first specify the number of clusters , the order of the VAR models (also referred to as memory), and the average persistence . As these values are typically not known beforehand, one can run the fitting procedure over many sets of the hyperparameters ( , , ) and use a chosen criteria to select the optimal model. As

A. Identifying persistent states
There is a well established association between persistent anomalies in atmospheric pressure and extreme weather events. For example, particularly intense precipitation events producing strong winds occur over mid-latitude coastal regions of southeastern Australia when isolated low pressure systems (cut-off lows) form in an unstable easterly flow on the northern flank of a slow-moving or blocking anti-cyclone 53 . As we are only interested in studying the local growth rates of the persistent states, we first extract all events such that the model affiliation is constant in one state for a minimum of 10 consecutive days.
In Table I  To see how the structure of the identified persistent events compares to all events combined, we show composites of all days associated with each of the respective persistent events in each given state ( Figure 1). The structure of state 1 is dominated by a high pressure anomaly centred over Scandinavia. We also plot for comparison the EOF pattern with the largest pattern correlation, which is EOF 1 at = 0.94. This suggests that persistent events in state 1 are associated with the leading mode of variability seen in the data. The composite of persistent events in state 2 shows a pronounced dipolar structure, with the positive and negative anomalies of the dipole having similar magnitude. The negative anomaly also has a large spatial extent, covering the entirety of the European continent. The largest pattern correlation is still with EOF 1, however it is negative and slightly less correlated than state 1 ( = −0.71). Interestingly, this pattern resembles the negative phase of the NAO when looking at the Atlantic region. While the fitting procedure does not include data from this region, it seems that a negative NAO event drives the persistent widespread negative pressure anomaly over Europe. The composite of persistent events in state 3 appears similar to that of state 1 in that it is characterized by a singular high pressure anomaly except with center over the north west of Europe. This composite has a relatively high pattern correlation with EOF 2.
Persistent occurrences of anomalies similar to the state 1 and state 3 composites have been related to heat wave events throughout Europe 54,55 .

B. Comparison of MSVs, SVs, and CLVs
As persistent events modulate the background state of the atmosphere for an extended period of time, we are interested in the short-term growth behavior during such events. For this we consider the stationary state dynamics for each regime. In Quinn, Harries, and O'Kane 14 we introduced MSVs to characterize finite-time error growth optimized over short periods. Here we investigate how these vectors compare to SVs calculated for the same composition matrix and the corresponding asymptotic CLVs.
In Appendix B we detail the construction of the reduced dynamical model via the FEM-BV-VAR framework. The resulting tangent linear propagator for the reduced model is given as follows: Here the propagators A for ∈ {1, 2, 3} are just constant matrices determined by the parameters corresponding to cluster state , and is the Viterbi path denoting the cluster state with the maximum affiliation value at time (see Appendix A for more details). The composition matrix (Eq. 5) over a given window is then just given by multiplication of the corresponding sequence of constant matrices.
To investigate the dynamics of persistent states we can simply analyze the stationary state dynamics. For this we use compositions of a given cluster's constant propagators from Eq. 9 for various window lengths. As we are interested in the finite-time growth over short periods, we compute sets of MSVs for each of the stationary states. Additionally we compute sets of SVs over the same time windows. More specifically, if a push forward of is used for Algorithm 1, then the SVs will be calculated for the composition matrix A ( 0 − , ) (note here that since we are working with constant dynamics the 0 is irrelevant). We also compute the asymptotic Lyapunov exponents of each stationary state using the QR algorithm 56 . Figure 2 shows the one-day growth rates for the leading 5 MSVs and SVs, as well as the 5 leading asymptotic Lyapunov exponents. The growth rates of the MSVs approach the asymptotic growth rates as the push forward is increased. This is expected as by definition the MSVs should converge to CLVs as increases in Algorithm 1. This means the MSVs will approach the asymptotic subspaces on which the Lyapunov exponents are defined. On the other hand, the leading SVs retain high growth rates even as the push forward is increased. It is known that SVs can commonly grow much faster than Lyapunov exponent growth 7,12 .
In addition to the growth rates, we are also interested in the atmospheric patterns onto which the vectors project. Figure 3 shows the leading SV, MSV, and their difference for stationary state the dynamics at hand are stationary and stable, we know that the model will not have growing modes for long push forwards and that the least-decaying mode will be similar to the composite of persistent events in a given state shown in Figure 1. The emergence of unstable small-scale structures for long push forwards gives further evidence that the leading SVs are not projecting onto the physically relevant atmospheric patterns for the time periods of interest.

IV. CASE STUDY: 2010 RUSSIAN HEAT WAVE
To provide a practical example of the utility of MSVs as potential initial perturbations for ensemble forecasting, we now turn our attention to a specific extreme event, namely the Russian heat wave (RHW) during summer 2010. This heat wave is considered to have taken place over the two month period from mid-June to mid-August, with its peak impacts in late July and early August [57][58][59] . The event is attributed to a strong and persistent atmospheric blocking event centred over Moscow 60 . As blocking is associated with persistent large-scale high pressure anomalies, we would expect the Russian heat wave to be represented in our model by persistence and recurrence to a pattern closely associated with the previously identified metastable state 1.
In order to investigate the RHW we first identify the relevant dates of impact using two standard heat wave indices. The first is a generally accepted definition from the World Meteorological Organization (WMO), which requires 5 or more consecutive days in which the maximum temperature exceeds the climatological average maximum temperature by 5 • C. The second index, introduced by Russo et al. 61  which is characterized by a persistent positive height anomaly over Moscow, with infrequent brief excursions to one of the other two states of never more than 3 consecutive days. This aligns with the conjecture that persistent state 1 events correspond to heat wave events in Europe.
We have previously described the use of combinations of initial and evolved SVs at the ECMWF 12,13 . Here we are interested in whether or not the systematically generated MSVs can also adequately characterize the dynamics and error growth, and hence predictability, when optimized over windows appropriate to medium-range weather forecasting (days to weeks). We take a start date of 15 July 2010 such that the heat wave has been well established and continues for a period of more than 10 days (according to Figure 4), and we compare the behavior of both SVs and MSVs on forward propagation.
We again consider the leading growth rates first, which are plotted in Figure 5. As with the stationary case, the SVs start out with much larger growth rates than the MSVs. The growth rates remain positive over 4 days for most of the push forward values. Even for a very long window of 100 days, which theoretically should be optimizing the structures which grow over that time period, the growth is faster than that of the MSVs optimized over 1 day. Additionally, given that our model is globally stable, we would not expect such growth for structures optimized over such a long window. We know that the growth of any arbitrary vector will converge to the leading Lyapunov exponent after a sufficient amount of time. We therefore show the growth of the leading SVs and MSVs out to 16 days to illustrate the relative behavior of this convergence. We see that the shorter push forward windows show a slower convergence to that growth and still underestimate it by day 16. For a very long window of 100 days, we see that the MSV is already on the asymptotic subspace (as expected), while the SV takes a few days to approach the subspace and still has not quite converged by the end of the 16 days. As can be seen by the values of the asymptotic Lyapunov exponents in Figure 5, our model is globally stable. We therefore expect that as perturbations are optimized over longer windows, those perturbation vectors will approach an increasingly stable subspace. This indicates that the MSVs are better able to characterize error growth optimized over particular finite time windows, however we observe that the SVs consistently initialize in a relatively unstable subspace.
In order to give more insight on the spatial patterns of the vectors, their propagation consistent with the growth rates described earlier in Figure 5, and relationship to the geographical location of the observed RHW structure, in Figure 6 On propagating the vectors we see that the SVs eventually converge to patterns with larger spatial structures. However, the location of the large-scale structures differs from that of the evolved MSVs, and the connection between the initial and evolved SV patterns is unclear. The MSVs appear to evolve as a propagation of their initial states with some growth and decay of the large-scale features. We observe that MSV 1 appears to develop a dipolar structure with a high pressure anomaly over Scandanavia similar to the composite pattern of state 3 (we note that Figure   4 indicates the model is in state 3 on 18 and 19 July). The structures associated with the heat wave in MSVs 2 and 4 remain, albeit smaller for MSV 2, as the two vectors begin to align with a similar spatial pattern. This could potentially be used as evidence for the continuation of the heat wave despite the leading vector developing an opposing anomaly in that region.
In relation to medium-range weather forecasting, the MSVs provide initial conditions isolating large-scale features and their subsequent evolution, and therefore may be optimal to use over SVs in this context.

V. SUMMARY AND DISCUSSION
The study by Quinn it is readily apparent that the non-parametric data-driven reduced order linear model obtained is capable of not simply representing the climatological characteristics of blocking over Eurasia, but also the dynamics of particular events and specifically the 2010 RHW.
We have discussed the use of SVs as deployed in the ECMWF weather forecast system, noting the combination of initial and evolved SVs through the implementation of a total energy norm. We note the complexity of that approach required to obtain the high degree of skill for global weather systems for which the ECMWF system is renowned. Our results show that the MSVs generated While much of climate science is directed towards developing general circulation models which represent the dynamics of the ocean, atmosphere, land, sea ice and cryosphere at ever increasing resolutions and complexity, there is a recognition in the geophysical mathematics community of the increased importance of furthering an understanding of the fundamental dynamical and statistical properties of the climate system. This is often done via reduced order models that capture the essential dynamics of the climate modes and with the ultimate aim of characterizing the causal relationships between these teleconnections, their regime dependencies, and their response to anthropogenic changes in the forcing. Nevertheless, many of the challenges that faced the early pioneers of numerical weather prediction remain in the climate models of increasing complexity.

DATA AVAILABILITY STATEMENT
The NCEP-NCAR reanalysis output used is provided by the NOAA/OAR/ESRL PSL, Boulder, Colorado, and may be accessed at https://psl.noaa.gov/data/reanalysis/reanalysis.shtml. All source code used to perform the analyses presented in this study may be found at https://doi.org/10.5281/zenodo.4035644.

Appendix A: FEM-BV-VAR specifics
The Finite Element Bounded Variation Vector Auto-Regressive method developed by Horenko 39 as also described in Metzner, Putzig, and Horenko 40 , is a data-driven clustering method which extracts metastable states and the associated switching sequence between the states. Each of the states is assumed to be defined by a locally stationary linear VAR model: Here we have introduced two hyperparameters, and . The first refers to the number of metastable regimes while the second defines the memory dependence of the VAR models. The model parameters and P are constant within each cluster state as is the covariance matrix Σ( ).
is random noise required to initiate transitions between states.
We seek to apply a regularized variational minimization of a scalar-valued functional describing the error ( , Θ ) of a particular model for a given observational data instance ( ) characterized by a time dependent set of model parameters Θ( ). We first construct a distance functional for a Here we optimize for fixed Θ followed by optimizing Θ for a fixed Γ. This procedure is repeated iteratively, until the change in (Θ, Γ) is less than some small pre-determined threshold, resulting in monotonic minimization of . In summary, we have used an adaptive finite element method for the numerical minimization of the FEM-BV Eq. A2. The number of different meta-stable states, or clusters , and the model parameters chosen within these regimes, such as memory depth , the indicator functions (·) signaling activation of the respective models, are all determined simultaneously in a global optimization procedure. The optimization yields a judicious compromise between low residuals, reproducing the data of a training set on the one hand, and the demand for the smallest-possible overall number of free parameters of the complete model on the other.

Appendix B: Model specifics
Given the results of the FEM-BV-VAR fitting procedure outlined in Appendix A, we can construct a reduced time-dependent model for the dynamics. We first assign each time step to a cluster state determined by the maximum affiliation at that time: = arg max [Γ( )] for ∈ {1, 2, 3}.
Eq. B1 is known as the Viterbi path in Hidden Markov Model literature. We can then use the Viterbi path and the model parameters to define the VAR model at each time step through Note that we have dropped the stochastic element of the VAR model as we are interested in evaluating the deterministic dynamics.
By extending the state space to account for the memory in the model, we construct a dimension 60 (3 × 20 PCs) discrete linear map whose tangent dynamics are governed by Our tangent linear propagator is then given as or alternatively,