Idealized forecast-assimilation experiments for convective-scale Numerical Weather Prediction

To aid understanding of and facilitate research into forecast–assimilation systems of Numerical Weather Prediction (NWP), idealized models that embody essential characteristics of these systems can be used. This article concerns the use of such an idealized fluid model of convective-scale NWP in inexpensive data assimilation (DA) experiments. The forecast model, introduced in Kent et al. (2017), is a modification of the rotating shallow water equations that includes some simplified dynamics of cumulus convection and associated precipitation. It is of interest owing to (i) its distinctive dynamics, including the disruption of large-scale balanced flows, highly nonlinear behaviour associated with convection and moisture, and other features of convecting and precipitating weather systems, and (ii) its computational efficiency, a crucial factor for an idealized model. When using such intermediate-complexity models for DA research, it is important to justify their relevance in the context of NWP. The process of achieving a well–tuned observing system and filter configuration is described here using a deterministic ensemble Kalman filter. The tuning process involves systematically permuting through parameters of the combined forecast–assimilation system, combinations of which define a single experiment. We conduct numerous experiments, each characterized by a specific combination of parameters pertaining to the filter configuration and observing system, and assess their performance and relevance objectively in a concise graphical manner. We show how to construct well-tuned experiments in which the ensemble provides a good estimate of the forecast error, assessed via a spread–error diagnostic and the Continuous Ranked Probability Score. The forecast–assimilation system has an average observational influence similar to operational NWP (about 30%) and the resulting error–doubling time statistics reflect those of operational convection-permitting models (about 6− 9 hours). We supplement the objective assessment of performance and relevance with a subjective examination of model fields at different forecast lead times, illustrating the impact of data assimilation on the model dynamics and highlighting where improvements are both achieved and lacking. Our approach and results not only demonstrate the model’s suitability for conducting DA experiments in the presence of convection and precipitation, but also offer a formative protocol for conducting data assimilation research in an idealized yet relevant framework.


Introduction
The atmosphere is a driven, dissipative, chaotic system that possesses myriad dynamical processes over a range of temporal and spatial scales. Numerical forecast models of the atmosphere attempt to capture some of these fundamental processes by integrating the primitive equations of motion for larger scales and parameterizing smallscale (unresolved) processes. Numerical Weather Predic-tion (NWP) is essentially an initial value problem comprising a forecast model and suitable initial conditions, with its accuracy depending critically on both. The model needs reinitialising regularly to restore information lost through error growth. Data Assimilation (DA; cf. Kalnay (2003)) attempts to determine the optimal initial conditions (and reinitialisations) for the forecast model by estimating the state of the atmosphere and its uncertainty using a combination of forecast and observational information, while taking into account their respective uncertainties. Despite the imperfections of NWP and its various components (e.g., forecast models and data) and the constraints on predictability owing to chaos (cf. Lorenz (1963)), weather forecasting remains possible, and indeed successful, owing to the regular updates from observations via DA.
Over the last decade there has been a growing use of ensemble forecast and assimilation systems in weather forecasting; concurrently, the spatial resolution of the forecast models has increased -grids on the order of one kilometre are now commonplace in regional models and are able to resolve some of the finer-scale features associated with convection and precipitation -whilst diverse highresolution observations are available (cf. Gustafsson et al. (2018)). It is important to understand how such developments impact on the effectiveness of assimilation algorithms. To this end, idealized models that capture some essential features of atmospheric modeling yet are computationally inexpensive and easy to implement (relative to modern NWP systems) can be used. These allow the investigation and optimization of current and alternative assimilation algorithms in a more concise mathematical and numerical setting to gain insights before considering implementation in a full NWP model (Ehrendorfer 2007). The shift from large-to convective-scale NWP is in some sense a shift from balanced to unbalanced dynamics. Traditional DA systems developed for large-scale NWP exploit the fact that midlatitude dynamics at synoptic scales are close to geostrophic and hydrostatic balance. However, this balance is no longer present at smaller scales where rotation no longer dominates and vertical accelerations modulate the flow -this is "the essence of the convective scale" (Yano et al. 2018). Kent et al. (2017) proposed a fluid dynamical model for investigating DA algorithms at convective scales (extending that of Würsch and Craig (2014)) based on the shallow water equations (SWEs) and modified via switches to incorporate some dynamics of cumulus convection, such as rapid ascent and descent of air, and the transport of moisture via a 'rain mass fraction' variable r. This modified rotating shallow water (modRSW) model exhibits important aspects of convective-scale dynamics relating to the disruption of these large-scale balance principles, as well as other aspects such as non-local convection initiation by gravity-wave propagation and convection downstream from an orographic ridge. Its distinctive dynamics, along with an efficient and robust numerical solver, were reported in detail in Kent et al. (2017); the purpose of this study is to demonstrate that the model provides a useful and relevant testbed for investigating DA algorithms in the presence of complex dynamics associated with convection and precipitation.
We present an idealized forecast-assimilation system using the ensemble Kalman filter assimilation algorithm and elucidate its relevance for convective-scale NWP and DA. The Ensemble Kalman Filter (EnKF;Evensen (1994)) and its variants have been extensively investigated with different models at different scales (cf. Meng and Zhang (2011) and Houtekamer and Zhang (2016)). It follows the same conceptual framework as the standard Kalman filter theory (Kalman 1960) but differs in that it uses Monte Carlo methods to estimate the (timedependent) forecast-error covariances via an ensemble of model integrations. In combination with other techniques to combat undersampling (due to the finite ensemble size), such as inflation and localization, it provides an approximation to the Kalman-Bucy filter (Kalman and Bucy 1961) that is computationally feasible for operational atmospheric DA problems. There are strong arguments for using ensemble-based algorithms at convective scales: having flow-dependent error statistics is beneficial at finer scales where nonlinear error growth proliferates and forecast uncertainty is generally larger. The use of convectivescale ensembles at operational centres is therefore growing (e.g, Hagelin et al. (2017) and Schraff et al. (2016)); Gustafsson et al. (2018) provide a survey of operational convective-scale DA and NWP, including EnKF-based techniques.
Here, we employ the deterministic filter of Sakov and Oke (2008) in our idealized experiments, which is particularly robust for small ensembles and is shown to be equivalent to the 'no-perturbation' EnKF with a particular implementation of adaptive inflation. We emphasize that our present aim is not to compare DA methods, nor to propose suggested changes to operational DA systems, but to provide convincing evidence that DA experiments using this idealized model show consistency with operational forecast-assimilation systems. This is an often overlooked aspect in DA research using idealized models, and should be a prerequisite to be considered useful from an operational perspective.
Achieving an interesting and relevant experimental setup is more nuanced than simply interfacing a model with an assimilation algorithm. It requires consideration of the 'real-life' problem at hand, in this case convectivescale NWP and DA, and should attempt to mimic certain attributes of the whole system, not just the dynamical ('model-only') aspects. Addressing the relevance of DA experiments in an idealized setting is an important aspect of this work: idealized models are by construction (sometimes severely) reduced versions of their state-of-the-art counterparts and it is important to justify that methods and results from idealized forecast-assimilation experiments are valid and apply to the operational problem. This can be achieved by enforcing, where possible, consistency with operational systems in the modeling set-up (e.g., temporal and spatial scales), the assimilation algorithm (e.g., the need for techniques such as localization and inflation in ensemble-based filtering), and the combined forecastassimilation system (e.g., error-growth statistics and observational influence (Cardinali et al. 2004)). In practice, operational forecast-assimilation systems require a great deal of tuning in order to perform optimally, and this must take into account many facets of the forecast model, the observing system, and the assimilation algorithm. Accordingly, the process of developing and arriving at a welltuned system deserves attention in an idealized setting and is a key feature of this study.
In an operational forecast-assimilation system, tuning is performed to produce the lowest forecast error (at a target lead time) for fields of interest given the available observing system. Typically, this involves permuting through various parameters associated with assimilation algorithms, such as ensemble size, inflation factors/methods and localisation length-scales, in an attempt to arrive at the filter configuration with the best performance. This is achieved usually in a systematic fashion (e.g., Poterjoy and Zhang (2015)) that requires subjective comparison of potential parameter combinations. Recent attempts at optimal tuning by Ménétrier et al. (2015a,b) pursue a more objective approach than simply permuting through a prescribed set of parameters. We note also that tuning in an operational setting often comprises a combination of objective and subjective verification.
The process of tuning an idealized forecastassimilation system differs somewhat from the operational case in that we have the freedom to choose the observing system. How it is generated should reflect the problem at hand and in some sense becomes part of the process: the observing system should be tuned alongside the filter configuration to produce an idealized system that demonstrates attributes of an operational system (e.g., Fairbairn et al. (2014); Inverarity (2015)). For example, if an experiment is deemed optimal in a minimum-error sense but has an observational influence of, say, a few percent, it cannot be considered relevant since convective-scale NWP has an average observational influence 20%. Likewise, if error growth rates of ensemble forecasts initialised using model fields from the optimal analysis ensembles are not comparable with operational values, it is difficult to consider the experiment meaningful from an NWP perspective.
In a recent review article, Houtekamer and Zhang (2016) commented that: "the frontier of data assimilation is at the high spatial and temporal resolution of limitedarea systems, where we have rapidly developing precipitating systems with complex dynamics". By combining the nonlinearity due to the onset of convection and precipitation and the genuine hydrodynamic (advective) nonlinearity of the SWEs, the model introduced in Kent et al. (2017) captures some fundamental dynamical processes of weather systems at this frontier and provides an interesting and legitimate testbed for data assimilation research at convective scales. We substantiate this claim further here, by describing in detail the process of arriving at a welltuned and relevant set of idealized forecast-assimilation experiments. In doing so, we define a formative protocol -not a set of rules per se, but a suggested framework -for conducting idealized experiments that directly addresses their relevance for (convective-scale) NWP. The essence of this protocol, and indeed the entire study, is encapsulated in table 1: the first column lists aspects of forecastassimilation systems that idealized systems should seek to replicate where possible, along with criteria to guide the tuning and validation process; the second column gives typical values of these aspects for an operational system (where applicable, cf. Gustafsson et al. (2018)); our results, and an assessment of their relevance, augments the table in the last two columns.
For this demonstration, we have chosen a model resolution and cycling frequency (table 1) that are directly comparable to operational systems (see, e.g., table 5 in Gustafsson et al. (2018)). An idealized system has, by construction, fewer degrees of freedom (n) and number of observations (p) than than an operational system; however, operational systems are characterized by their rankdeficiency, and this should be imposed on our idealized system by choosing the ensemble size N to satisfy N < p < n. The observing system is chosen here to be a simplification (linear operator, uncorrelated errors) whilst having enough flexibility (through observation density and error magnitude) to tune the system, noting that the realistic treatment of observations is not the focus of this study. The filter configuration is defined here by the parameters associated with localization and inflation: we implement horizontal covariance localization, a flavour of additive inflation, and adaptive multiplicative 'Relaxation-To-Prior-Perturbation' (RTPP) and 'Relaxation-To-Prior-Spread' (RTPS) inflation. Well-tuned experiments in the idealized system have a localization lengthscale and inflation factors directly comparable to operational NWP. For an experiment to be considered well-tuned, the ensemble spread (SPR; i.e., the root mean square difference between the ensemble members and the ensemble mean) should be comparable to the root mean square error (RMSE) of the ensemble mean (cf. Whitaker and Loughe (1998)). Thus, we demand the ratio SPR/RMSE ∼ 1 and seek to minimise the RMSE and the continuous ranked probability score (CRPS; Hersbach (2000)). Experiments are optimised on forecasts with a three-hour lead-time (denoted as T+3) since we do not necessarily want to produce the best analysis but the best forecast (lowest RMSE) at a desired lead time, noting that 3hr forecasts are within the 6 hours demanded by short-range/nowcasting ensemble prediction systems (Sun et al. 2014;Ballard et al. 2012).
If an idealized system achieves all of the above, it is validated for its relevance for NWP data assimilation primarily via the overall influence of observations and the errordoubling time statistics. Alongside objective verification measures for system performance, subjective assessment of experiments is common in operational NWP. We therefore examine a well-tuned and relevant case qualitatively via a subjective visual assessment, giving a more holistic  1: Idealized forecast-assimilation experiments for convective-scale Numerical Weather Prediction: protocol, results, and relevance. The protocol is summarized in the first two columns: the first column lists aspects of forecast-assimilation systems that idealized experiments should seek to replicate where possible; the second column gives typical values for an operational system. Our configuration, and an assessment of its relevance, augments the table in the last two columns: column 3 gives the values achieved for a well-tuned idealized experiment using the modRSW model and the ensemble Kalman filter, detailed in section 4; the last column appraises the relevance of these results by comparing the values in columns 2 and 3 and attributing medium (-) or high () relevance where applicable, with N/A used for quantities that are deliberately chosen to be different in order to attain idealized status. illustration of the impact of cycled data assimilation on forecast model fields.
The outline of this article is as follows. In section 2, a brief description of the idealized fluid model is given, along with details of the numerical implementation. The necessary theoretical and practical aspects of ensemble Kalman filtering, including diagnostics for assessing DA systems, are addressed in section 3. The tuning process is detailed in section 4, in which numerous experiments are assessed for performance and relevance, before analysing in depth a particular set-up that is well-tuned and exhibits characteristics of high-resolution NWP. Finally, we conclude with a critical discussion of our results in section 5, returning again to Table 1, and propose potential directions of interest for further research using the model.

a. Shallow water equations and modifications
The model in Kent et al. (2017) is based on the standard shallow water equations on a rotating Cartesian f -plane in which dynamical variables are independent of one spatial coordinate (here the y-coordinate, so that ∂ (·)/∂ y := ∂ y (·) = 0). These standard equations are where h = h(x,t) is the fluid depth, b = b(x) is the prescribed underlying topography (so that h + b is the freesurface height), u(x,t) and v(x,t) are velocity components in the zonal xand meridional y-directions, f is the Coriolis parameter (typically 10 −4 s −1 for midlatitudes), g = 9.81ms −2 is the gravitational acceleration, and t is time. The effective pressure p(h), following the terminology of isentropic gas dynamics, has the form p(h) = 1 2 gh 2 and facilitates the modifications that follow. This system of equations, together with specified initial and, where appropriate, boundary conditions, determines how the flow evolves in time.
The modRSW model introduces a number of alterations to the parent equations in regions where the fluid exceeds certain threshold heights. These two thresholds provide highly nonlinear switches for the onset of convection and precipitation, enabling a simplified representation of cumulus convection to be incorporated in the model without explicitly considering temperature and other thermodynamic properties. An extra variable is also introduced for the idealized transport of moisture (the 'rain mass fraction' r) which is coupled to the hu-momentum equation. A full model description, including its physical basis, and a thorough investigation of the model's distinctive dynamics are given in Kent et al. (2017); here, we reproduce the equations in full and provide a brief overview of the key terms and modified dynamics.
The modRSW model is described by the following equations: where P and Q are defined via the effective pressure p = p(h) = 1 2 gh 2 by: with p denoting the derivative of p with respect to its argument h, and the 'rain switch': The threshold heights H c and H r > H c (units m), appearing in the modified pressure terms (Eq. 3) and 'rain' switch term (Eq. 4), correspond to the onset of convection and precipitation, respectively. The constants α > 0 (units s −1 ) and β > 0 (dimensionless) control the removal and production of model 'rain' respectively; c 2 0 (units m 2 s −2 ) converts the dimensionless r into a potential in the momentum equation and controls the strength of the feedback. Schematic solutions for h and r and the threshold heights are illustrated in figure 1. When the fluid exceeds the first threshold h + b > H c , the pressure terms (Eq. 3) are set to a constant value which is lower than it would be without the modification (see Fig. 1 in Kent et al. (2017)). This essentially reduces the restoring force due to gravity acting on the fluid, and the fluid will rise in this region. If the fluid subsequently ascends above H r and ∂ x u < 0, then the 'rain' switch term β (Eq. 4) is non-zero and acts as a source for the rain mass fraction in (Eq. 2d); the positive wind convergence condition (∂ x u < 0 in the absence of a y derivative) ensures that rain is formed when the fluid is rising only. The presence of rain r in the system modulates the flow via the hc 2 0 ∂ x r term in the hu-momentum equation (Eq. 2b), which can be thought of as a local contribution to the pressure gradient. This enhances the restoring force, thus suppressing the updraft and limiting the growth of convection in the model. It was stressed in Kent et al. (2017) that what is referred to as model 'rain', or the 'rain mass fraction' r, is an artificial representation of precipitation: it should be regarded as the fraction of mass in a column that has precipitated. It is never removed from the system (total mass is conserved according to Eq. 2a), with the sink term αhr in Eq. 2d transferring mass from a precipitated state r to a notional precipitable state 1 − r so that it may precipitate again at a later time. This is important for idealized DA experiments as it means that the model does not run for a finite time only, i.e., it does not 'run out of' moisture and so continues to precipitate for large times.

b. Numerical implementation
The solver derived in Kent et al. (2017) provides a novel, robust and efficient scheme for the numerical integration of the modRSW model. The scheme is based on the discontinuous Galerkin finite element method (DGFEM) and combines methods to ensure wellbalancedness and non-negativity of h and r. It discretizes the flow domain into N el elements (defining the horizontal resolution of the model) and uses a dynamic time-step that guarantees stability while allowing for gains in efficiency (i.e., a larger time step) when possible. We integrate the non-dimensionalised equations (see appendix 1 in Kent et al. (2017)), characterised by two non-dimensional parameters: (i) the Froude number Fr = V 0 √ gH 0 and (ii) the Rossby number Ro = V 0 f L 0 , which control the strength of stratification and rotation, respectively, compared to the inertial terms, using typical values of the fluid depth H 0 , domain size L 0 and flow speed V 0 defined in the next section. Further simulation details are deferred to section 4.

c. Dynamical set up for DA experiments
Motivated by the experiments with topography in Kent (2016) and Kent et al. (2017), non-rotating supercritical flow over topography is considered for the experiments herein with non-dimensional parameter Fr = 1.1. The topography is defined as a superposition of sinusoids in a sub-domain and zero elsewhere: where x p = 0.1, k = {2, 4, 6}, A = {0.1, 0.05, 0.1}. Given a non-zero initial velocity and periodic boundary conditions, this collection of hills (see top panels in figure 4) generates varied and complex dynamics (including gravity-wave excitation) without the need for external forcing or an imposed mean wind field. Periodic boundary conditions mean that waves that leave the domain wrap around again, and so the flow remains energetic; this keeps the flow dynamically interesting without further forcing. The basic (unperturbed) initial conditions used in these experiments are: h(x, 0) + b(x, 0) = 1; hu(x, 0) = 1; hr(x, 0) = 0; (6) noting that since we consider non-rotating flow (effectively setting Ro = ∞) over topography with transverse velocity v initially zero, Eq. 2c is removed from the integration. We address ensemble initialisation in section 3. For a given Froude number, potential characteristic scales of the dynamics can be analysed and, where possible, likened to high-resolution NWP. Consider a fixed length of domain L 0 = 500 km and velocity-scale V 0 = 20 ms −1 , implying a time-scale T 0 = 25000 seconds (∼ 6.94 hours). Thus, one hour is equal to 0.144 nondimensional time units. A Froude number Fr = 1.1 implies gH 0 ∼ 330 m 2 s −2 . We note that gravity g in shallow water models can be reinterpreted as a reduced gravity g = g∆ρ/ρ where ∆ρ is the density difference between two fluid layers (see, e.g., Rogerson (1999)), and impose a height scale of H 0 = 500 m. This implies a reduced gravitational acceleration of g ∼ 0.66 ms −2 and is justified as follows. Taking the average air density in the layers 0-500 m and 500-1500 m from the air density profile of the International Standard Atmosphere 1 , ρ 0−500m = 1.196 kg m −3 and ρ 500−1500m = 1.112 kg m −3 , gives a reduced gravity of g ∼ 0.69 ms −2 . The threshold height H c mimics the Level of Free Convection (LFC, the height at which an air parcel gets warmer than its surroundings and starts to rise freely); we note here that this can be on the order of several hundred meters in convectively unstable conditions, so our choice of magnitude for H c and H r (see Table 2) is reasonable.

Tools for running and assessing an ensemble-based DA system
Monte-Carlo ensemble forecasts attempt to sample the true probability density function (pdf) of the atmosphere starting from a finite number of initial random perturbations (Epstein 1969;Leith 1974). The ensemble provides a discrete estimate of this distribution; the mean (first moment) yields the Best Linear Unbiased Estimator (BLUE) of the future state, while the covariance (second moment) of the ensemble exemplifies the uncertainty in the ensemble forecast (Murphy 1988). These finite sample estimates are known to converge slowly and considerably undersample the true pdf when the number of degrees of freedom is large (Stephenson and Doblas-Reyes 2000). In the context of NWP, the computational cost of integrating the forecast model M limits the size of the ensemble N used operationally, which is typically O(10 − 100), much smaller than the number of degrees of freedom of the model n = O(10 9 ). Consequently, all ensemble DA schemes suffer from sampling errors, as N n. The efficiency and effectiveness of the EnKF algorithm depends on myriad factors, almost all of which stem from this undersampling due to the small ensemble size (Houtekamer and Mitchell 1998;Carrassi et al. 2018).
We introduce the EnKF equations and a particular deterministic formulation in section 3a, and relate these to the model and discretization of section 2. Well-known techniques to combat undersampling, namely inflation, localization, and excluding members from covariance estimates to avoid inbreeding, are addressed in section 3b, followed by objective measures to assess performance of a DA system in section 3c. These techniques and measures are crucial parts of any ensemble-based DA system and are central to the tuning process and results of section 4. An overview of the full sequential forecast-assimilation algorithm is given in appendix A.

a. EnKF equations
We follow where possible the notation of Houtekamer and Zhang (2016): the n-dimensional state vector is denoted x ∈ R n and the p-dimensional vector of observations is denoted y ∈ R p . Superscripts 'f' and 'a' applied to the state vector indicate the forecast and analysis state, respectively. A linear observation operator 2 H : R n → R p generates simulated observations by mapping the state vector x from model to observation space; the nonlinear model operator M : R n → R n is the numerical forecast model. Here, M is the DGFEM discretization outlined in section 2b and the state vector is the column vector of discretized model variables (i.e., h k , (hu) k , and (hr) k for k = 1, ..., N el , so that n = 3N el ).
It should be noted that the state vector in the assimilation (Eq. 8) is defined as x = (h k , u k , r k ) T ∈ R n , where the T superscript denotes vector or matrix transposition, rather than in terms of the flux variablesx = (h k , (hu) k , (hr) k ) T ∈ R n used in the model integration (Eq. 7). Thus, the model operator M acts onx and, before passing the model statẽ x to the analysis step, it is transformed via a mapping Ψ to the state vector: x = Ψ(x). This simply maps h to h, hu to u, and hr to r. For ease of notation, we simply write x in what follows.
For an N-member ensemble, the j th member x j ( j = 1, .., N) is integrated forward from time t i−1 to time t i via the forecast model M : At time t i , observations y are assimilated, yielding an ensemble of analysis states: where the Kalman gain K and the matrices within are evaluated at time t i (this time-dependence of the analysis step is now implicitly assumed and no longer indexed); P f is the forecast-error covariance matrix, defined below in Eq. 11. In the stochastic filter, perturbed observations y j are treated as random vectors whose distribution has mean equal to the unperturbed (i.e., measured) observation y and error covariance matrix R; this is necessary to retain the correct analysis covariance (Burgers et al. 1998). The perturbed observation ensemble is defined as: where N (0, R) denotes a Gaussian distribution with zero mean and covariance matrix R. It is necessary to correct the observations against any sampling bias (i.e., y j = y if o j = 0 where the overbar denotes an average over the ensemble) after the perturbations have been applied, especially when N is small (cf. Carrassi et al. (2018)). As is typical in the Monte Carlo approach to forecasting, the best estimate of the state is given by the ensemble mean: State error covariance matrices in the standard Kalman filter are defined in terms of a (usually unknown) truth state, and must accordingly be modelled in some way. In the EnKF, the forecast-error covariance matrix is approximated using a finite ensemble as: where X f = (X f 1 , ..., X f N ) is the matrix whose columns comprise the ensemble perturbations (x f j ) = x f j − x f . Errors are therefore defined as perturbations from the ensemble mean rather than the (unknown) truth, and the forecasterror approximation is completely characterized by the covariance matrix generated by a finite ensemble of model states.
The Monte-Carlo nature of the EnKF leads to two major sources of sampling error from (i) the finite ensemble size used to model the forecast errors, and (ii) the stochastic perturbation of observations, both of which make the filter suboptimal. To combat this second source, the analysis step (Eq. 8) can be formulated in a deterministic fashion -that is, there are schemes (so-called square-root or deterministic filters) which compute the analysis without perturbing the observations (Tippett et al. 2003) -whilst retaining the expected analysis covariances. Here, we implement the deterministic filter (DEnKF) of Sakov and Oke (2008), which combines the versatility, robustness, and effectiveness of the EnKF with the performance of other square-root filters. This scheme calculates the analysis mean via the update step (Eq. 8) with the unperturbed observation vector y and ensemble covariance matrix P f e : and the analysis perturbations are updated as follows: Each analysis state is then augmented with the mean to complete the process: where (X a ) j is the j th column of the matrix X a . Full details, including theoretical justification for the changes, are found in Sakov and Oke (2008); our implementation uses an equivalent generalized approach described in Appendix B.

b. Localization, inflation, and self-exclusion
Additional techniques are required to compensate for the sampling error associated with the finite ensemble size; in severely rank-deficient cases (i.e., when N n), they are crucial for maintaining satisfactory filter performance, particularly for nonlinear systems. We address each in turn and describe the approach taken here.
Localization attempts to prevent the analysis estimate being degraded by suppressing spurious long-range correlations in the forecast-error covariance matrix (Hamill et al. 2001;Houtekamer and Mitchell 2001;Whitaker and Hamill 2002). In model-space covariance localization (cf. Carrassi et al. (2018)), this is achieved by multiplying the elements of the forecast-error covariance matrix with elements of a covariance taper matrix ρ ρ ρ ∈ R n×n that reduces correlations as a function of distance. Entries of the covariance taper matrix ρ ρ ρ are calculated using a correlation function ρ with compact support (i.e., non-zero in a local region, zero everywhere else), resulting in a localized forecast-error covariance matrix P f loc = ρ ρ ρ • P f e , where • is the Schur product 3 . A common choice for the localizing function ρ is the Gaspari-Cohn function, a piecewise rational function (Gaspari and Cohn (1999); their equation 4.10): where: z is the (absolute) Euclidean distance between 2 grid points, and c is a length-scale that determines the severity of the localization. This function (Eq. 15) has similar shape to a half-Gaussian function, noting that ρ(0) = 1 and ρ decreases as z increases, with correlations reduced to zero beyond twice the characteristic length-scale. We redefine the length-scale parameter c in terms of a dimensionless factor L loc such that the cut-off distance is defined in terms of the number of grid points in model space. Setting c = N el dx/(2L loc ) in Eq. (15), where dx is the grid spacing, we see that correlations are cut off completely after N el /L loc grid points, corresponding to a distance z = N el dx/L loc = 2c: for N el = 200, an L loc value of, say, 2, defines a length in model space of 100 grid points, half of the the domain, beyond which correlations are zero. It follows that broad (narrow) localization is achieved with low (high) L loc values. Figure 2 illustrates the process of covariance localization 4 in our system with 200 grid points. The correlations are filtered out gradually and suppressed completely (due to the compact support) beyond the distance 2c, after which an observation has no influence. We relate this distance to the dimensional distance in kilometres in section 4, after the scaling of the system has been discussed. Ensemble inflation techniques attempt to maintain a sufficient ensemble spread (and satisfactory filter performance) by artificially increasing the spread; typically, a combination of additive, multiplicative, and/or adaptive methods is used in practice. However, it should be noted that the alterations in the ensemble trajectories due to inflation dilute the impact of flow-dependent statistics developed in the EnKF (cf. Houtekamer and Zhang (2016)). We employ a combination of adaptive and additive inflation.
Additive inflation comprises adding random Gaussian perturbations η j ∼ N (0, γ 2 a Q) during, or at the end of, the forecast step: where the model-error covariance matrix Q is prescribed from some knowledge of the modeling system and γ a is a tuneable parameter controlling the overall magnitude of the sample perturbations. The γ a parameter is used to adjust the magnitude of the sampled additive noise in recognition of the fact that estimates for Q are inherently approximate, and mainly intended to represent the relative magnitudes of and correlations between the different components. How one best defines Q is an open questionideally it should be constructed using flow-dependent perturbations (Hamill and Whitaker 2011) but it is often a static matrix developed offline from historical data. Since the aim of this study is not to investigate the role of the Increasing L loc (green and red curves) leads to a more severe localization. Top right: banded localization matrix ρ ρ ρ with L loc = 1.5, which has the same dimension as P f e and is a block 3 × 3 matrix for our three-variable system. Bottom left: a time-dependent forecast-error correlation matrix (i.e., normalized P f e ) generated by an 18-member ensemble (note that this is the average over the N = 18 sample error correlation matrices due to self-exclusion; see step 2iv in appendix A). Bottom right: the localized correlation matrix (i.e., normalized P f loc = ρ ρ ρ • P f e ) combines the top right and bottom left matrices. We also note the strong positive correlation between the h and r variables, apparent in the top right and bottom left blocks, highlighting the concurrent presence of convection and precipitation in h and r, respectively.
Q matrix, we adopt the most straightforward approach for an idealized setup, generating the model-error covariance matrix by exploiting the difference in resolution between the nature run and the forecast. In particular, we define the model error η j as the difference between a deterministic model run and the nature run trajectory at each grid point for a one-hour forecast (matching the assimilation frequency), with both forecasts starting from the same initial conditions taken from the nature run (sub-sampled in the lower-resolution forecast). From the error distribution obtained with 48 forecast pairs, each starting with a different initial condition, we calibrate our model-error covariance matrix Q using the unbiased sample covariance estimator with a denominator of N − 1 and neglecting all the non-diagonal terms for simplicity. We also set to zero the diagonal terms relating to the model rain r, as it is nonlinearly related to h and inflating both might cause r to be over-inflated. In order not to introduce bias into the model state x(t i ), an unbiased additive inflationη j is computed and used in (16) by subtracting the ensemble averaged additive inflation η:η A graphical representation of the diagonal model-error covariance matrix Q is shown in Fig. 3. We note the geographical variation in the model-error variance, with larger errors downstream of the topography (where much of the convection is triggered), and emphasize the approximate nature of this derived Q for the purpose of additive inflation tuning. Motivated by Bowler et al. (2017), additive inflation is implemented in (Eq. 16) via the Incremental Analysis Update method (IAU, see Bloom et al. (1996)) by spreadingη j contributions appropriately (i.e., proportional to the adaptive time-step) throughout the numerical FIG. 3: Spatial structure of the (variance of the) modelerror perturbation vector η j ∼ N (0, γ 2 a Q) used in additive inflation (Eq. 16). The plot shows var(η j ) = diag(γ 2 a Q) = γ 2 a diag(Q), a vector of dimension n = 600, for three candidate values of the scaling factor γ a used in the tuning process. The vertical dashed lines partition the vector into three for the h-, hu-, and hr-components of the variance, respectively. The model-error covariance matrix Q is diagonal and its derivation is described in detail in section 3b.
integration from t i−1 to t i (see Appendix A). Additive inflation does not try to represent the model error explicitly, but provides a lower bound for the forecast-error covariance, thus preventing filter divergence. Moreover, the addition of random Gaussian perturbations can moderate the non-Gaussian higher moments that nonlinear error growth may have generated in the forecast step. Since the optimal EnKF solution assumes Gaussian distributions, this is expected to benefit the quality of the analysis estimate (Houtekamer and Zhang 2016). However, adding random Gaussian noise may also mask useful covariance information pertaining to the model dynamics.
Two popular flavours of adaptive multiplicative inflation are the Relaxation To Prior Perturbation (RTPP) and Relaxation To Prior Spread (RTPS) methods. Zhang et al. (2004) proposed RTPP as an alternative to simple constant multiplicative covariance inflation of Anderson and Anderson (1999) that relaxes analysis perturbations X a back to the forecast perturbations X f independently at each analysis point as follows: where α RT PP ∈ [0, 1] is a tuneable parameter. Whitaker and Hamill (2012) comment that this amounts to a combination of multiplicative and additive inflation. In a similar vein, the RTPS method relaxes the analysis ensemble spread σ a back to the forecast spread σ f (Whitaker and Hamill 2012) as follows: where σ is the spread at each gridpoint measured as the square root of the diagonal entries of Eq. (11) applied to the forecast and analysis ensembles and α RT PS ∈ [0, 1] is a tuneable parameter. We direct the reader to appendix A for implementation details. The RTPS is a purely multiplicative form of inflation. In this study, we employ the RTPS method and note that, as exploited in the operational MOGREPS-G Ensemble of Data Assimilations, the DEnKF is equivalent to the 'no-perturbation' EnKF with implicit RTPP adaptive inflation (Eq. 18 with α RT PP = 1/2); this equivalence is derived in appendix B. The downside of using relatively ad hoc inflation and localization techniques is that there are several parameters that need tuning; the filter configuration in our idealized framework is defined by the combination of parameters L loc , γ a , and α RT PS for localization, additive, and RTPS inflation respectively. Houtekamer and Mitchell (1998) identified the issue of inbreeding that occurs when ensemble perturbations from the ensemble mean used to calculate P f e are themselves updated using this covariance matrix, leading to the spread being underestimated (see also Roth et al. (2017)). They adopted the approach of using two sub-ensembles and updating each sub-ensemble using forecast-error covariances calculated from the other sub-ensemble and proposed the limit of calculating P f e using all members except the one being updated. We have adopted this latter approach (termed self-exclusion by Bowler et al. (2017)).
c. Diagnostics for assessing performance and relevance

1) ENSEMBLE ERROR AND SPREAD
The performance of ensemble methods is typically measured by their accuracy, quantified by the RMSE of the ensemble mean and ensemble spread (e.g., Whitaker and Loughe (1998)). A useful summary measure for our dimensionless system is provided by the RMSE of the ensemble mean (averaged over the state vector), defined as: where x t k and x j,k are the k th components of the nature run vector x t and ensemble member x j , respectively. A natural measure of the typical spread (or dispersion) of the ensemble is provided by the root mean value of Eq. (11) (again averaged over the state vector): where Tr denotes the trace of the matrix, given by the sum of its diagonal entries. It should be noted that the rain component has been multiplied by 100 in these expressions to make it the same order of magnitude as the fluid depth and velocity components (section 4c). Both RMSE and SPR are valid at the same time as the state vector. As diagnostics, we monitor the both the spread and RMSE individually, and also the ratio SPR/RMSE. An ideal ensemble is expected to have the same magnitude of ensemble spread as the RMSE of its mean at the same lead time in order to represent the full uncertainty in the forecast (Stephenson and Doblas-Reyes 2000). However, Leutbecher and Palmer (2008) showed that, for a finite ensemble of N members, the ensemble spread converges to the RMSE if a correction factor N+1 N−1 is applied. Although a ratio of one is considered optimum theoretically, in practice this fluctuates in time and space, as well as between variables, and so we accept the ratio within a certain tolerance (e.g., SPR/RMSE = 1 ± 0.2). This tolerance also takes into account the bias correction described above (Leutbecher and Palmer 2008). These measures provide a simple but important objective diagnostic on the reliability of the generated ensemble in the EnKF (cf. Piccolo et al. (2019)).

2) CONTINUOUS RANKED PROBABILITY SCORE
A well-configured ensemble is crucial to good performance as it is used to estimate flow-dependent forecasterror covariances. Spread and RMSE are a good first check for an adequately performing ensemble, but do not capture the entire permissible range of outcomes, i.e., the full distribution provided by the ensemble. The CRPS verifies the reliability of an ensemble for scalar quantities, and is a popular verification tool for probabilistic forecasts. Reliability represents the degree to which forecast probabilities agree with outcome frequencies. A key feature of the CRPS is that it generalizes the mean absolute error to which it reduces if the forecast is a point measure (i.e., deterministic forecast). It is a negatively-oriented scoring rule that assigns a numerical score (lower scores indicating higher skill with zero being perfect) to probabilistic forecasts. The CRPS is computed for each component of the forecast/analysis ensemble x f /a j at every assimilation time. We use the derivation for ensemble prediction systems (Hersbach 2000) and applied in, e.g., Bowler et al. (2017) (their appendix C).

3) OBSERVATIONAL INFLUENCE DIAGNOSTIC
The global average observation influence diagnostic, related to the Degrees of Freedom for Signal measure, is defined as: where S = ∂ (Hx a )/∂ y = HK is the analysis sensitivity matrix (Cardinali et al. (2004); Inverarity (2015); see appendix C for a derivation generalized for an ensemble system with self-exclusion). It provides a norm for quantifying the overall influence of observations on the analysis estimate expressed in observation space and is normalized by the total number of observations p to express the result as a fraction, with 1 representing an analysis totally determined by the observations and 0 an analysis that is equal to the prior forecast. Thus, increasing the number of observations alone will not necessarily increase their overall influence. Cardinali et al. (2004) have applied this diagnostic to the ECMWF global NWP model and found an observation influence OID = 0.18. This means that observations adjust the prior forecast estimate closer to reality, rather than replace it completely; observations alone are too few and incomplete (compared to the size of the system) to provide a comprehensive picture of the state. However, it should be noted that the prior forecast estimate contains observational information from previous analysis cycles. Brousseau et al. (2014) use the same diagnostic to quantify the observational impact for the AROME (Applications de la Recherche l'Opérationnelà Méso-Echelle) convectivescale forecast-assimilation system. It is expected that the observational influence for convective-scale NWP is higher than for global NWP, but that the analysis estimate is still dominated by the prior forecast. Therefore, we consider our idealized system relevant for convective-scale NWP if it has an influence 20% OID 40%. The OID is computed at each assimilation time and is also expected to vary temporally; it can also be readily partitioned to assess observations of each variable. Throughout this work, we have calculated the OID for each sub-ensemble, taking account of self-exclusion (Appendix C), and averaged these to provide a single measure of the observation influence on the ensemble as a whole.

4) ERROR-GROWTH RATES
The error-doubling time T d of a forecast-assimilation system is the time taken for the error E of a finite perturbation at time T 0 to double: where E is some error norm (taken here to be the RMSE, as in NWP). The error-doubling time is expected to fluctuate somewhat between variables (since certain variables behave more nonlinearly than others) and is controlled by the 'dynamics of the day', i.e., it is temporally and spatially dependent. As such, it is not a fixed number, but by averaging over a number of staggered forecasts covering a range of dynamics and initial perturbations, the mean error-doubling time of the system can be estimated. For global NWP, Buizza (2010) found a doubling time of 1.28 days for the Northern Hemisphere forecast error. Errors in high-resolution NWP grow faster at the smallest scales than in the global case owing to the strong nonlinearities at convective scales. Thus, in order to be relevant for convective-scale NWP, a well-tuned idealized forecastassimilation system should exhibit a mean error-doubling time on the order of hours rather than a day. Moist convection severely limits mesoscale predictability (Zhang et al. 2003), and for limited-area cloud-resolving models, the mean error-doubling time has been found to be around 4 hours (Hohenegger and Schär 2007). Thus, ensemble forecasts initialised with the analysis perturbations from a well-tuned experiment should exhibit characteristic error growth rates on this timescale. This diagnostic is not used in the tuning process but is used to ascertain the relevance of an experiment considered well-tuned with respect to the three criteria of Table 1. We compute T d for an idealized ensemble prediction system in section 4d by running 450 24 hr forecasts initialised with the analysis ensembles from 25 assimilation cycles (noting that 18 members times 25 cycles equals 450 initial conditions for the forecasts).

5) SUBJECTIVE VERIFICATION
The objective measures detailed above are complemented by 'subjective verification' at a later stage of assessment, in order to give a comprehensive understanding of system performance and relevance. Subjective visual evaluation of individual experiments is recognised as playing an important role in the verification process of model and assimilation performance (e.g., Piccolo et al. (2019)), particularly for convection-permitting models and precipitation forecasts. Operational centres use subjective verification, in tandem with more-traditional statistics-based objective measures, to comprehensively assess changes to operational forecast-assimilation systems (e.g., Rawlins et al. (2007); Lean et al. (2008)). Unlike objective measures, subjective verification techniques require both knowledge of meteorology and modelling, as well as experience accrued in assessing previous outputs (Mittermaier et al. 2016). Careful interpretation at this subjective level may provide crucial insight that can not be achieved via objective verification only. However, subjective verification is time-consuming and not systematic. We therefore examine model outputs after the objective measures have identified credible experiments, in order to substantiate and justify the claim that an experiment is both well-tuned and relevant.

Idealized DA experiments
Data assimilation research using idealized models is primarily carried out in a so-called 'twin' experiment setting, whereby the same computational model is used to generate a nature run (which acts as a surrogate truth) and the forecasts. If the forecasts are generated using exactly FIG. 4: Model solutions for h (top), u (middle), and r (bottom), with zoomed insets for h and r, for N el = 200 (blue), N el = 400 (green), and N el = 800 (cyan). In an imperfect model scenario, the forecast model has a coarser resolution than the nature run. Top panel: thick black line is the topography b(x) (Eq. 5) and the black horizontal dotted lines are the threshold heights H c < H r . Bottom panel: black dashed line is r(x) = 0, confirming that the numerical integration scheme ensures non-negativity of r.
the same model configuration as the nature run, the resulting DA experiments are said to be carried out in a perfect model setting. On the other hand, in an imperfect model scenario, forecasts are generated using a different model configuration, e.g., with misspecified model parameters or at a coarser spatial resolution.
The nature run is a single long integration of the numerical model and is a proxy for the true evolving physical system. It is the principal difference between idealized and operational DA experiments and its function is twofold. First, it is used to produce pseudo-observations of the physical system, which are then assimilated into the cycled forecast-assimilation system. These pseudoobservations (also known as synthetic observations) are generated by applying the observation operator H to the state vector from the nature run x t and adding random samples from a specified observational error distribution. Second, it provides a verifying state with which to compare the forecast and analysis estimates and thus quantify the errors in each. Next, we explain in detail how the twin experiments in this study are constructed, before describing the tuning process and assessing the performance and relevance of multiple experiments.

a. Assimilation: experimental set-up
Current high-resolution NWP models are operating with a horizontal gridsize on the order of one kilometre. For example, the Met Office's UKV model has a gridsize of 1.5 km in its interior domain and the MOGREPS-UK (Met Office Global and Regional Ensemble Prediction System -United Kingdom) ensemble runs at 2.2 km (Tang et al. 2013;Hagelin et al. 2017); the Deutscher Wetterdienst's COSMO-DE (COnstortium for Small-scale MOdelling -DEutschland) model has a 2.8 km horizontal grid spacing (Baldauf et al. 2011). Running models at this resolution means that convection is resolved explicitly (albeit imperfectly) and yields more realistic-looking precipitation fields (Lean et al. 2008). With this in mind, a forecast grid spacing of 2.5km is imposed for the idealized model. Thus, given a length L 0 = 500 km of the domain, the computational grid for the forecast model has N el = 200 elements and the total number of degrees of freedom is n = 600 (again noting that hv is removed from the integration since the flow is non-rotating).
Despite the improved representation of clouds and precipitation in models with gridsize O(1km), it is widely recognised that convection is still under-resolved and does not exhibit many aspects of observed convection (Tang et al. 2013). To reflect this, we adopt here an imperfect model scenario, in which the nature run is generated at a finer resolution than the forecast model, namely N nat el = 400 for the nature run 5 . This is the only difference compared with the model configuration used for the forecast integrations. Example model trajectories at different resolutions at a given time are shown in figure 4; conceptually, the basic data assimilation problem can be summarised using this figure: adjust the forecast (lowresolution field) using pseudo-observations of the nature run (high-resolution field) in order to provide a better estimate of the nature run using the imperfect forecast model. 5 Experiments with N nat el = 800 have also been conducted (not shown here) and typically require a shorter assimilation window, due primarily to the increase in model error and subsequent error growth rates. A nonlinear assimilation algorithm may be better suited to this resolution mismatch.
Observations are assimilated hourly, comparable with the cycling frequency of operational ensemble-based convective-scale systems (e.g., Schraff et al. (2016)), over a 48 hour period. All variables are observed directly (making the observation operator H linear) with specified error σ = (σ h , σ u , σ r ) and spatial density of an observation every 25 gridcells (∼ 63 km) for h and 20 gridcells (∼ 50 km) for u and r. This means that p = 28 observations are assimilated at each analysis step. To ensure that our idealized system exhibits rank-deficiency akin to operational systems, i.e., N < p < n, we employ an ensemble with N = 18 members. This relatively low number of observations is further motivation for our choice of the DEnKF over the stochastic 'perturbed-observation' EnKF. The DEnKF avoids introducing further sampling error that would otherwise result from perturbing the small number of observations, which can in turn make it more likely that the state vector has negative r or from the fluid depth being on the wrong side of a convection or precipitation threshold. We have applied an additional quality control step of resetting any unphysical negative h and r pseudoobservations to zero. Observation quality control is a crucial part of operational NWP; in an idealized setting, it can also help to reduce forecast failures and enables a larger range of the parameter space to be explored.

b. Ensemble initialisation and non-negativity constraints
The initial ensemble should ideally be constructed to represent as fully as possible (given the finite ensemble size) the error statistics of the model state (Evensen 2007). Various methods can be used to generate initial ensemble perturbations, which may not be dynamically consistent with the model. However, given enough spin-up time, these initial perturbations usually do not impact the overall EnKF performance since they are only used once at the very first assimilation cycle (Houtekamer and Zhang 2016). This is especially pertinent at higher resolutions where information loss occurs on fast time-scales; a frequently cycling ensemble system can adjust quickly from random initial perturbations and generate an ensemble that adequately samples the forecast error. For convectivescale DA (and especially idealized experiments), spatiallyuncorrelated random Gaussian perturbations can be used to initialize the first cycle (e.g., Zhang et al. (2004)). A range of initial errors σ σ σ ic = (σ ic h , σ ic hu , σ ic hr ) have been trialed and, as noted by Houtekamer and Zhang (2016), the initial perturbations are forgotten promptly and negligible difference is noted between the trials after a few cycles. The perturbations used to generate the initial ensemble for all experiments are σ σ σ ic = (0.1, 0.05, 0), i.e., for j = 1, ..., N: I is the 200-by-200 identity matrix, and h(x, 0) is given in Eq. 6; initial ensembles for for hu and hr are constructed analogously. Note that the rain variable is not perturbed (σ ic hr = 0) since the rain field is initially zero everywhere and adding Gaussian noise is neither desired (producing unphysical negative rain) nor required (perturbations to the h field lead to a random sample of rain fields as t > 0).
Since the Kalman filter and its variants (and indeed most operational DA algorithms) are in essence Bayesian estimators that assume Gaussian statistics, the analysis update may produce a negative value for a state variable which should be strictly non-negative, e.g., rain rate or humidity. Numerical integration schemes that preserve non-negativity ensure this is not the case in the forecast step, but spurious negative values may still result from the analysis step. For the idealized model, the height and rain variables h and r should remain non-negative, with the numerics described in Kent et al. (2017) ensuring this in the forecast step. Negative h is not only unphysical but also causes the subsequent integration to fail (by violating hyperbolicity of the model); negative r poses no problems for the model integration but is clearly unphysical and impacts the other variables via the momentum coupling.
The most straightforward solution is to enforce nonnegativity simply by setting any spurious negative values to zero after the update. Whilst effectively ensuring the desired non-negative analysis states, this artificial modification is a crude approach that violates conservation of mass and may cause an 'initialisation shock' in the subsequent forecasts. More-sophisticated methods exist which incorporate constraints in the assimilation algorithm itself (e.g., Janjić et al. (2014)). However, these methods typically require a variational algorithm whose cost function includes terms to penalise unphysical negative solutions. We have instead adopted the non-variational Kalman gain formulation. In the idealized experiments presented here, any negative h and/or r values after assimilation are reset to zero before the next forecast step.

c. Assessment of filter performance
In this section, we outline the tuning process and give a first assessment of filter performance for a range of experiments. Despite the use of a low-dimensional, inexpensive model, the tuning of this idealized forecast-assimilation system requires many adjustments to most of the parameters of the observing system and filter configuration (see right column of Table 2). The observing system parameters are the observing frequency, total number of observations, their spatial density, and error; filter configuration parameters are ensemble size N, localization length-scale L loc and the inflation factors, α RT PS and γ a . In particular, the opportunity (and indeed need) to vary the observing system greatly increases the dimensionality of the overall tuning process. For the sake of simplicity, though, we restrict our considerations here to the tuning of the filter configuration given a particular observing system and ensemble size, as discussed in section 4a. Therefore, given the model configuration described in section 2 (and summarised in the left column of Table 2) and a prescribed observing system, each experiment is defined by a combination of the tuning parameters; these experiments are now systematically compared with each other and assessed via the diagnostics in section 3c in pursuit of a well-tuned experiment.
Before asking how well a filter performs (and before considering the effect of both localization and inflation), we need to choose the timescale for which the system will be optimized. In particular, since our focus is convectivescale NWP nowcasting, we optimize the system for fore- casts with a three-hour lead time. It is important to note that there is not a single most 'well-tuned' configuration but rather a subset of the parameter space which yields a selection of well-tuned experiments. Our tuning process selects those experiments that satisfy the tuning criteria of Table 1; once this has been achieved, candidate experiments are validated further for relevance (see section 4d). Thus, we focus now on the filter diagnostics with the following targets: , with the L loc values corresponding to a cut-off distance of 1000 km, 500 km, 333 1 3 km, and 250 km, respectively. The matrices are constructed from forecast perturbations valid at T = 36 hrs; note that each matrix is the average over the N sample error correlation matrices due to self-exclusion (see step 2iv in appendix A).
The target value for a well-tuned experiment is indicated by the white/light cells. By comparing the position of the best experiments under each criterion, we seek an area of the parameter space that satisfies all the conditions above. However, as is often the case in tuning a complex system, detecting such a subset of experiments is non-trivial and consensus is hard to achieve. As a compromise, we set a tolerance of ±0.2 on the condition SPR/RMSE ∼ 1 and demand that the experiments must first satisfy this target in order to be evaluated for minimal RMSE and CRPS: these experiments are outlined in red in the top-left panel of The SPR/RMSE measure and CRPS assess the filter configuration of the forecast-assimilation system, in particular the role of the ensemble, but they do not entirely indicate its relevance to the NWP problem (in particular the observing system). To this end, we also show the same graphic for the OID (top-right), which has a target value 20% OID 40% indicated by and centred on white/light shading. Given that the imposed observation error is fixed for these experiments, as well as the update frequency and observation density, the subsequent influence of the observations is controlled by the changing impact of the forecast (due to inflation and localization). The average influence of the observations increases with increasing inflation, as should be expected: higher inflation factors increase the ensemble spread, and consequently lead to a larger estimation of the forecast error covariance P f , representing decreased confidence in the accuracy of the prior forecast. It follows that more weight is given to the observations in the filter and increases their influence on the analysis estimate. Crucially, there is good agreement between the desired SPR/RMSE and OID, indicating at this stage of the assessment that a good filter configuration and meaningful observing system can be achieved within this experimental set-up; this is examined further in section 4d.
As a consequence of the application of the three tuning criteria above, we focus now on 12 candidate well-tuned experiments, indicated by the squares outlined in black in Fig. 5, and examine the role of localization. Figure 6 shows the forecast-error correlation matrices of four different experiments, all drawn from the 12 selected above, one for each L loc value, before and after the application of localization. Although the impact on the experiments of different localization scales is not entirely evident in Fig. 5, here the direct effect on the forecast-error correlation matrix is much clearer. In particular it is clear that, on the one hand, for values of L loc bigger than 1 (bottom panels), much of the signal is suppressed away from the diagonal bands, leaving in place only the narrow diagonal bandlike structures present in the matrices. On the other hand, a value of L loc = 0.5 (top-left panel) is too broad to remove spurious long-distance correlations in P f e . Since the aim of localisation is to remove the spurious long-distance correlations while retaining valid correlations, and since the typical horizontal length-scale in operational convectivescale NWP (see table 2) is on the order of 100km, we focus now on the three experiments with L loc = 1.0, recalling that this defines a length-scale of N el /L loc = 200 grid points (i.e., correlations are suppressed completely beyond 500km). We also note the effect of the four localisation length-scales on the tuning criteria (Fig. 5).
Another key aspect of verification that has not been properly scrutinised yet is to assess whether the data assimilation is actually improving forecast accuracy (and by how much). This is ascertained by comparing forecasts with different lead times, but valid at the same time, in an attempt to highlight the benefit that assimilating new observations had on the most recently initialised forecast. Figure 7 shows, for the three chosen experiments with L loc = 1.0, the domain-averaged time series of ensemble SPR and RMSE of the ensemble mean for each field h, u and r. These values are computed from 3hr (blue lines) and 4hr (red lines) forecasts valid at the same time; the average over these times (excluding the first 12 hours for spin-up) for each variable is shown in the top-left corner of each panel, and detailed further in table 3. Examining forecast errors exemplifies the impact of data assimilation: we expect the 3hr forecast to be more accurate (i.e., lower RMSE values) than the 4hr forecast, since the former has been updated with more-recent observational information than the latter. Indeed, this is evident in all three experiments, with the 3hr forecast error (blue dashed lines) generally lower than the 4hr forecast error (red dashed lines) throughout the 48 hours. This is confirmed quantitatively in time-averaged values for all variables (table 3). The 3hr forecasts are consistently more accurate than the 4hr forecasts, with the percentage improvement ranging between 5-12% between variables, and close to 10% overall. Interestingly, the precipitation field exhibits the largest improvement out of the model variables in all three experiments.
Furthermore, fluctuations in these time series highlight the time-dependence of the ensemble SPR and consequently the flow-dependence of forecast error. We note their quasi-oscillatory nature, which is attributed to the periodic boundary conditions. In general, SPR (solid lines) is comparable with the RMSE of the ensemble mean (dashed lines), indicating that the ensemble is providing an adequate estimation of the forecast-error covariance matrix P f . There is naturally some variability between variables, and we note that h and r tend to be more underspread than u (i.e., SPR/RMSE 1 for h and r, whereas SPR/RMSE ∼ 1 for u). Dynamically this makes sense: the convective behaviour in the flow is driven by h and manifest in r. These variables exhibit highly nonlinear behaviour in regions of convection which the ensemble and filter (which assumes Gaussian statistics) struggle to capture in terms of SPR. We discuss this further in section 4e. Also noteworthy is that these series are stationary in the sense that the spread and error do not drift in time. A poorly configured filter may diverge from 'reality' (despite the frequent update from observations), but we see here that these set-ups do not experience this filter divergence. Of the three experiments examined in Fig. 7, we focus next on experiment (a), which has the largest overall improvement in forecast accuracy (i.e., largest reduction of RMSE: see Table 3, left column). It also has the lowest additive inflation factor, which is desirable from a model-error perspective, and an RTPS value (α RT PS = 0.7) comparable with operational systems (cf. Gustafsson et al. (2018)).

d. Validation and relevance for convective-scale NWP
The CRPS and SPR/RMSE measures ascertain the general performance of the forecast-assimilation system itself, in particular the role of the ensemble, but they are not sufficient to indicate its relevance to the NWP problem. To this end, two additional validation diagnostics are employed: (i) the observational influence diagnostic (OID); (ii) the error-doubling time statistics.
The OID is calculated at each cycle and is expected to vary for the given dynamical situation -the 'weatherof-the-hour'. If at a given time there is a lot of uncertainty in the forecasts, e.g., due to a lot of convective behaviour and associated nonlinearity, then it is to be expected that the observations have a greater influence at this time. On the other hand, a situation without much convection is relatively predictable, suggesting more certainty in the forecasts and less impact from the observations. The OID of h, u, and r is plotted in figure 8 as a function of time. The overall influence (thick black line) is typically 25 − 35%, comparable to operational convectivescale forecast-assimilation systems. The influence of h, u and r observations is also shown: this too fluctuates depending on the 'hourly weather', and we note that the average influence of observing u over 48 hours is higher than for h and r (which are themselves comparable). We do not claim that this is an intrinsic property of our model and recognise that choosing a different observing system will lead to a different balance between the various observational components. Monitoring the observational influence in this way facilitates further investigations of the observing system; this is however not the purpose of this study, but we conclude that this well-tuned experiment has an observing system that is relevant for operational convective-scale NWP (for which 20% OID 40%).
The final objective verification measure of our protocol is the error-doubling time, described in section 3c.    figure 7 (all with L loc = 1.0): comparison of 3hr and 4hr forecast error, and the percentage difference. The % difference is computed with respect to the 4hr forecast: The goal of data assimilation is to provide the best estimate of the state of the atmosphere by merging forecast and observational information. Typically, this best estimate is used to initialise forecasts that run longer than the length of the assimilation window. To evidence further the relevance for convective-scale NWP, the error-doubling time statistics are considered by running numerous forecasts initialised at different times with the analysis ensembles produced in this experiment. Each cycle provides an ensemble of N = 18 analyses and, by taking a range of initial conditions from successive cycles, the resulting forecasts cover a wide range of dynamics. In total, 450 24-hour forecasts are run and the time T d taken for the initial RMSE to double (see equation (23)) is recorded: histograms of the error-doubling times for each variable are shown in figure 9. We note that h and u have similar doubling times (about 9 hrs), while errors in the precipitation field r grow at a faster rate (doubling in about 6 hrs). Dynamically, this is to be expected due to the highly nonlinear behaviour of r. As noted in section 4, the average doubling time in convection-permitting NWP models is around 4 hours (Hohenegger and Schär 2007); the idealized forecast-assimilation system analysed here has error growth properties of the same order of magnitude (i.e., hours) as convective-scale NWP. It should also be noted that the resolution mismatch between forecast and nature runs strongly determines error-growth rates: experiments with a finer nature run (N nat el = 800; cf. Fig. 4), doubling the resolution difference from two to four, have

e. Subjective verification
We conclude the assessment by examining full-domain snapshots of the dynamics, drawn from the well-tuned experiment examined in the previous section (experiment (a) in Table 3) in Figs. 10 and 11. Individual ensemble members (blue lines), the ensemble mean (red lines for the forecasts and cyan lines for the analyses), pseudoobservations (green dots), and the verifying nature solution (green lines) of each variable are displayed at a given time 7 . Since the system's performance has already been evaluated objectively, the purpose of this section is to demonstrate from a qualitative point of view the impact of data assimilation on the system, and to highlight both its successes and struggles in coping with the nonlinear behaviour of the model. Figure 10 compares the 4 hr forecast (initialised at T = 36 hrs; left panel) with the 3 hr forecast (initialised at T = 37 hrs; right panel) valid at T = 40 hrs. Clearly, we expect the latter to be more accurate than the former (i.e., closer to the nature run trajectory), since it benefits from a more-recent initialisation and therefore has had less time to diverge.
The dynamical situation (from the nature run; green lines) at T = 40 hrs is as follows. The flow is moving from west to east (u > 0) with convection downstream of the topography. There is one active region of convection observable in h around x = 0.75 with associated precipitation r. Some precipitation remains over the topography 0.2 < x < 0.5, due to the fluid rising above H r at an earlier time. Sharp gradients in wind u are present where there are 7 Note that other times have also been examined subjectively; we focus here on one particular situation for illustrative purposes. sharp gradients in fluid depth h; recall that precipitation forms when both h + b > H r and ∂ x u < 0, and the rate of formation is proportional to −β ∂ x u. The fluid marginally exceeds the threshold heights around x = 1 (or 0 recalling periodic boundaries). Note that the green circles denote the pseudo-observations assimilated at T = 40 hrs; pseudo-observations assimilated at T = 36 hrs and T = 37 hrs have the same spatial location.
Generally, the forecasts are accurate (close to the nature run with small spread) for all three fields in nonconvecting regions. The two regions of less accuracy (and hence higher uncertainty exemplified by the spread) are the active convection around x = 0.75 and the region close to the thresholds around x = 1. Both the 3 hr and 4 hr forecast capture the location of convection but struggle with its intensity, which we have already seen in Figure 3 is a consequence of using a lower-resolution forecast model. Somewhat unsatisfactorily, there is little noticeable difference between the two forecasts in h and r, suggesting that the more-recently initialised forecast offers only marginal improvement in this case. Perhaps this is not unexpected, given the rapid (< 3hrs) development of convection here. Furthermore, given the limitations of the forecast model and the time since initialisation, it is not able to capture the intensity.
On the other hand, visual examination of the forecasts around x = 0/1 shows greater improvement. Both forecasts predict spurious precipitation, which is not present in the nature run, but its magnitude is greatly reduced in the 3 hr forecast. This is a particularly interesting situation offered by the model: some ensemble members exceed the thresholds while others do not. The model thus exhibits highly nonlinear behaviour that is an inherent characteristic of the onset of convection and precipitation, reflected in the non-Gaussian distribution and large spread of the forecast ensemble. Despite the challenges posed by this behaviour and the limitations of the EnKF, the impact of the data assimilation is positive and leads to a more-accurate forecast.
Overall, the 3 hr forecast offers greater improvement over the 4 hr forecast in u and r with lesser difference in h; this is backed up quantitatively in the spread-error plots (left panel of Fig. 7), where the 3 hr forecast at T = 40 hrs shows a smaller RMSE of the ensemble mean than does the 4 hr forecast in u and r.
Next, we analyse the direct impact of assimilating observations at T = 40 hrs on this dynamical situation. Figure 11 shows the 1 hr forecast ensemble initialized at T = 39 hrs (left panel) and the analysis ensemble (right panel) at T = 40 hrs. Thus, the difference between the ensembles in the left and right panels is the assimilation of observations (green circles). First of all, and in reference to the visual interpretation of the 3 and 4 hr forecasts, we observe that the 1 hr forecast also struggles to capture the intensity of the convection and precipitation around x = 0.75, but does certainly improve on the 3 hr forecast (Fig. 10). However, the effect of assimilating observations at this time is evident and promising: the height and rain fields are adjusted, with the analysis ensemble showing good agreement with the verifying nature run, particularly in the region of convection around x = 0.75, which also coincides with an observation of h. Indeed, the impact of observation proximity on the results is worthy of comment, particularly in regions of convection. Good pseudo-observations of h that coincide with convective updrafts have, unsurprisingly, a greater positive impact than those that do not measure the updraft (not shown but evident at other times). Even though the corresponding peak in model rain r does not coincide with an observation of r, the ensemble mean of the analysis (cyan line) is greatly  Figs. 2 and 6), which is able to spread observational information to unobserved parts of the state vector.
The spread of the analysis ensemble is lower than that of the 1 hr forecast, reflecting the increased confidence. Interestingly, around x = 0/1, the threshold heights bring about diverging ensemble members also in the analysis ensemble. The best estimate (i.e., the ensemble mean; cyan line) offers marginal improvement (in h and r) on the 1 hr forecast, again illustrating the nonlinear model behaviour and the limitations of the EnKF in dealing with this situation with (highly nonlinear) threshold behaviour. The wind field, on the other hand, shows good improvement overall. However, we also note the negative impact on the analysis ensemble of the outlying u observations around x = 0.65 and x = 0.85, highlighting that poor observations can be detrimental to the analysis.
Finally, we stress that this subjective verification, by its definition, is open to interpretation and that many other model outputs at other times have also been examined in this process. However, through a critical assessment of Figs. 10 and 11, we have illustrated some of the interesting situations attainable in this system, highlighting both the positive and neutral/negative effects of EnKF data assimilation in this idealized environment.

Conclusion
High-resolution 'convection-permitting' NWP models are able to resolve some of the finer-scale features associated with convection and precipitation and are now commonplace in national weather centres (cf. Gustafsson et al. (2018)). However, increasing the spatial resolution is not a panacea (Yano et al. 2018); the so-called 'grey-zone' -the range of horizontal scales in which convective processes are being partly resolved dynamically and partly by subgrid parameterizations -poses many challenges for NWP, including how best to tackle the assimilation problem. Idealized models are designed to represent some essential features of a physical system and offer a computationally inexpensive tool for researching assimilation algorithms. A great deal of preliminary analysis on the performance and suitability of potential DA algorithms is conducted using low-order models (e.g., Lorenz (1963Lorenz ( , 2005). However, there is a vast gap between the complexity of these models and operational NWP models which integrate the primitive equations of motion (Kalnay 2003). The idealized fluid dynamical model of Kent et al. (2017) is able to simulate some fundamental dynamical processes associated with convecting and precipitating weather systems, suggesting that it is a suitable candidate for investigating DA algorithms at convective scales. This study exemplifies this further by conducting numerous forecast-assimilation experiments, providing a critical assessment of their performance, and addressing their relevance for convectivescale NWP.
Ensemble-based DA algorithms offer many advantages at convective scales: flow-dependent error statistics are able to capture nonlinear and intermittent aspects of the convective-scale flow that a static covariance matrix method does not. We have implemented a flavour of the (deterministic) ensemble Kalman filter (Sakov and Oke 2008), in combination with well-known techniques to tackle undersampling, and have presented a robust algorithm with which to establish the suitability of the modified rotating shallow water (modRSW) model in idealized forecast-assimilation experiments. We arrived at this particular implementation by considering this 'noperturbation' EnKF in tandem with RTPP and RTPS adaptive multiplicative inflation techniques (as well as additive inflation, localization, and self-exclusion), motivated partially by the Ensemble of Data Assimilations developed by Bowler et al. (2017), an updated version of which became operational in the Met Office's MOGREPS-G global ensemble in December 2019.
Experiments have been carried out in the 'imperfect' twin-model setting: the forecast model runs at a coarse resolution which partially resolves the convection and precipitation fields; pseudo-observations are generated by perturbing a higher-resolution nature run, which exhibits "sharper" features in the model fields. We have shown via an extensive tuning process that there is sufficient error growth in our idealized system, which has an overall observational influence directly comparable to operational systems, for meaningful hourly-cycled DA at kilometre scales.
This tuning process involved making a priori adjustments to the observing system and iterative adjustments to the filter configuration, while monitoring measures for not only system performance (spread, RMSE of the ensemble mean, CRPS) but also relevance (OID and error-doubling time). The performance measures have been computed for 3 hr forecasts, since the goal of convective-scale DA is ultimately to find the analysis that produces the best (short-range) forecast. Our observing system is characterized by the resolution mismatch between the forecast model and the nature run, the frequency of updates from observations, the total number of observations, imposed observation error and observation spatial density; the filter configuration is characterized by the size of the ensemble and parameters pertaining to localization and inflation. Systematically comparing potential set-ups is a crucial process for developing effective forecast-assimilation systems (see, e.g., Poterjoy and Zhang (2015)). In an idealized setting, this process is necessary for assessing both performance and relevance in pursuit of well-tuned experiments. We have synthesized the results of this process in a concise graphical manner that facilitates objective comparison of a large number of experiments, and ascertained how the filter configuration affects the spread-error statistics, CRPS, and overall observational influence. In order to assess a multitude of experimental configurations rapidly, these measures have been computed hourly for 48 hours, and then averaged over time, space, and model variable.
This process identified three candidate experiments (i.e., those with a spread-error ratio close to one and, given this first criterion, minimal RMSE of the ensemble mean and CRPS) to analyse further. Specifically, we focused on the time-dependent spread and RMSE of the ensemble mean of the 3 and 4 hr forecast ensembles for each model variable (see table 3 and figure 7), and in particular calculated the relative improvement of the 3 hr forecast with respect to the previous cycle's 4 hr forecast for each variable. This confirmed that the spread and RMSE of the ensemble mean are of comparable magnitude, and therefore that the time-dependent error statistics were adequately captured by the ensemble via its spread. In terms of the improvement in error, the impact of the data assimilation was greatest on the precipitation field, which is a key quantity in operational meteorology (especially nowcasting). The experiment which exhibited the largest improvement (on average) had a relatively low additive inflation factor and an RTPS value α RT PS = 0.7. Our analysis culminated in a critical assessment, both objective and subjective, of this well-tuned experiment chosen from the three short-listed candidates, and validated it further for its relevance.
The headline results are summarized in table 1, which also includes certain aspects of convective-scale NWP and DA that an idealized system should attempt to mimic and ascribes an appraisal of relevance (medium or high) of the idealized system for each aspect. This is by no means a prescriptive or exhaustive list but provides guidance to conducting DA research using idealized models and, in doing so, emphasizes the need to assess relevance as well as performance. In our set-up, the forecast model has a grid size of 2.5 km and only partially resolves the convection and precipitation fields, while observations are sampled from a better-resolved nature run. The update frequency (1 hr) is comparable to that of operational high-resolution NWP and error-doubling time statistics (6−9 hrs on average) reflect those of convectionpermitting models in a cycled forecast-assimilation system, while noting that lower values are attainable when using an even higher-resolution nature run. A well-tuned observing system and filter configuration is achieved that adequately estimates the forecast error (with a spreaderror ratio close to 1) and has an average observational influence similar to NWP of about 30%. The filter configuration, with a horizontal localization lengthscale on the order of 100km and adaptive multiplicative inflation (with α RT PS = 0.7), is comparable to filter configurations of operational systems such as those of the Met Office and Deutscher Wetterdienst.
We recall here our present aim, outlined in the introduction, was to provide convincing evidence that DA experiments using the modRSW model show consistency with operational forecast-assimilation systems, rather than, say, to compare methods or focus on a particular aspect of the assimilation algorithm. It should be stressed that there is not a single perfect experiment, but rather many potential candidate configurations that the user must critically analyse and choose which is best given their specific needs. It may be that one aspect of the configuration is to be prioritized; for example (and without loss of generality), in our system we have focused on experiments with a localization length-scale L loc = 1.0, equivalent to a Gaspari-Cohn length-scale of 500 km, and looked at three configurations in more detail. In presenting our results in this way, we have attempted to lead the reader through the involved process of conducting idealized DA experiments that are faithful to operational NWP. By demonstrating a model's suitability for conducting idealized DA experiments and, in particular, showing them to be relevant for convective-scale NWP, results obtained in an idealized setting have the best chance of scaling up to more complex systems. Of course this is not without uncertainty and care should be taken when making such claims, but the extra effort made in assessing both performance and relevance gives greater credence in the outcomes. While there are numerous informative studies using idealized models in DA research, what is often lacking is a careful consideration of their relevance to operational NWP. This consideration is a fundamental part of our study: we have scrutinized our idealized system thoroughly for its relevance and in doing so we have provided confidence in the use of the model for convective-scale DA research. We consider this to be a significant contribution to the novelty of this work and conclude with some suggestions for its use in data assimilation research at convective scales. This is by no means an exhaustive list, but brings together some of the ideas that have been stimulated by the model's preliminary use 8 , most of which build on the simplified observing system and observation operator used in this study.
• Nowadays, a vast number of observations come from satellites and they play a critical role in advancing high-resolution DA. However, they pose huge challenges and the question of how best to assimilate 8 Authors TK and LC hosted a users' workshop (http://www1. maths.leeds.ac.uk/~amttk/workshop.html) in May 2019 and some of these suggestions arose in discussion with participants there. such a vast quantity of remotely-sensed observations is a major topic of research. It would be possible to mimic satellite observing systems in an idealized setting by having periodic (in time), nonlinear observations localised in space and time and a simplified radiative transfer model. The model with such an observing system would provide an interesting test-bed for satellite DA research.
• Idealized models are most typically employed in a framework to compare the performance of different assimilation algorithms (e.g., Fairbairn et al. (2014)). The model could be used to compare methods in the presence of convection and precipitation. For example, how does the EnKF perform against a hybrid ensemble-variational method or a fully nonlinear filter? The case for nonlinear data assimilation is growing along with the resolution of NWP models. An idealized model with highly nonlinear convective processes is a useful tool for furthering research in this direction. In particular, dynamics associated with the thresholds for the onset of convection and precipitation provide a stiff test for nonlinear filters.
• Representation, or representativity, error comprises "the difference between the modelled representation of an observation and what is actually observed" ). An essential component of this error comes from the discrepancy between the spatial scales represented in the observations and the model fields. The imperfect model scenario illustrated in the idealized forecast-assimilation system of this study, combined with a more-realistic observation operator, provides a flexible testbed to investigate this type of error.
• Often in operational NWP and DA, a great deal of observations cannot be used due to the sheer volume and diversity of incoming measurements. The observing network is becoming ever denser and this brings about further computational and scientific challenges. Data compression is a technique that aims to extract the maximum information content from observations while reducing the overall amount of data to be assimilated. Preliminary research (Fowler 2019) has been conducted using a low-order ordinary differential equation model and there is scope here for using our more realistic idealized model.
• Through the use of the observational influence diagnostic, we have included an initial assessment of observational impact in this study. There is vast potential for undertaking further research in this direction, building for example on low-order work of Fowler and van Leeuwen (2012) and recent work on both the impact and potential impact of observations in a simplified NWP set-up (Necker et al. 2019

Sequential forecast-assimilation algorithm
A compact algorithm for our implementation of one complete cycle (forecast plus analysis) of the DEnKF is summarised here. Throughout, subscript j denotes the j th ensemble member and subscript i denotes time. Note that prior to the start of the data assimilation algorithm, pseudo-observations y j are generated by stochastically perturbing the nature run x t valid at the observing time t i : R = diag(σ 2 h I h , σ 2 u I u , σ 2 r I r ), for prescribed error variances σ 2 h,u,r and identity matrices I h,u,r with dimension equal to the number of observations of h, u, and r, respectively (so that R is diagonal with dimension p × p). Unphysical negative pseudo-observations of h and r are then reset to zero. A prescribed model-error covariance matrix Q also has to be estimated.

FORECAST STEP:
i An ensemble of initial conditions x ic j is generated by taking the values from Eq. (6) and adding Gaussian noise for each variable according to σ σ σ ic , as per Eq. (24) and section 4b. Unphysical negative initial conditions for hr are reset to zero while negative h values (very rare) are reset to 0.001.
ii The model is integrated forward in time. Additive inflation is drawn from the model-error covariance matrix Q as η j ∼ N (0, γ 2 a Q) and potential biases are removed by applying equation (17). The unbiased model-error vector η j is injected throughout the numerical integration by dividing it into (small) allocations proportional to the duration of each dynamical time-step δt.
For time-step details, we refer to Kent (2016) and Kent et al. (2017), implemented here with a Courant-Friedrichs-Lewy (CFL) number of 0.5. Therefore, within each forecast step of duration ∆t = t i+1 − t i at any timet ∈ [t i ,t i+1 ], we compute: . In order to ensure that the algorithm does not overshoot the time of the next forecast-assimilation cycle t i+1 , we take the final time-step to be the reduced value t i+1 −t when this is smaller than the optimal δt determined by the CFL value.
iii At later times, the forecast uses the analysis ensemble from the previous cycle as initial condition to integrate forward in time. A 12-hour forecast is launched at the end of each analysis step, with the additive inflation resampled and injected hourly as in ii.

ANALYSIS STEP:
i Each j th T + 1 hr model state obtained from the most recently launched T + 12 hr forecast is transformed into the state vector for assimilation: x f j (t i ) = Ψ(x f j (t i )), i.e., (h, hu, hr) → (h, u, r). ii Compute the innovations d j = y j −Hx f j using the forecast states from step 1 and the pre-computed pseudo-observations.
iv Compute the forecast perturbation matrix X and therefore the N forecast-error covariance matrices P f e, j (see Eq. (11)), each of them computed excluding the j th ensemble member of the forecast states from step 1, in order to avoid inbreeding. v Apply (model-space) localisation using the Gaspari-Cohn function ρ for a given length-scale L loc to each forecast-error covariance matrix P f e, j : compute the j th Kalman gain K e, j and the subsequent analysis ensemble: x a j = x f j + K e, j d j .
vi The Deterministic Ensemble Kalman Filter is implemented. The analysis perturbation matrix X a is computed with the N members x a j (as per Eq. (11)); using the RTPP implementation of the DEnKF (Appendix B), X a is then redefined as: vii Relaxation to Prior Spread (RTPS) is applied by recomputing the analysis ensemble spread σ a (as per Eq. (19)) and then the analysis ensemble perturbations X a are adjusted using Eq. (3) from Whitaker and Hamill (2012): viii The final analysis ensemble is recomputed using the redefined perturbation matrix: x a j = x a + (X a ) j , j = 1, ..., N, where (X a ) j is the j th column of X a . ix Return to step 1: analysis states from step 2viii are transformed backx a j (t i ) = Ψ −1 (x a j (t i )) for integration and the sequential cycle continues.
Recall that, in this implementation, the parameter L loc defines the number of grid points in model space (N el /L loc ) beyond which correlations are set to zero. Since H is linear in this setting, the observation location always coincides with the model grid.

APPENDIX B
Equivalence of DEnKF with 'no-perturbation' EnKF with RTPP α RT PP = 0.5 We give here a proof that the 'no-perturbation' EnKF in conjunction with adaptive RTPP inflation and α RT PP = 0.5 (Eq. (18)) is formally equivalent to the Deterministic Ensemble Kalman Filter (DEnKF; Sakov and Oke (2008)). This result is already being exploited in the Met Office's operational MOGREPS-G Ensemble of Data Assimilations but, to our knowledge, has not previously been published. Importantly, this equivalence is preserved when self-exclusion (Section 3b) is applied to counter the effects of inbreeding. First, note that the analysis step of the EnKF without perturbed observations ('no-perturbation' EnKF) can be written as: x a j = (I − K e, j H) x f j + K e, j y, with the ensemble mean x a being: x a = (I − K e, j H) x f + K e, j y.
The j th column of the analysis perturbation matrix (X a ) j can therefore be expressed as: in which X f is the forecast perturbation matrix. The RTPP equation (18) together with the above result yields: (X a ) j = (1 − α RT PP ) (I − K e, j H) (X f ) j + α RT PP (X f ) j .

Observational Influence Diagnostic (OID)
In the update equation (Eq. B1), the gain matrix K e, j weights the information provided by the observations and prior forecast according to their error covariances. The projection of the analysis estimate into observation space, calculated by premultiplying Eq. (B1) by the observation operator H, is: y j = Hx a j = HK e, j y + (I − HK e, j )Hx f j .
The analysis sensitivity with respect to observations, obtained by differentiating Eq. (C1) with respect to y, is given by: The OID (Cardinali et al. 2004), generalized for an ensemble system with self-exclusion (Section 3b), is defined as: and provides a norm for quantifying the overall influence of observations on each analysis estimate of the analysis ensemble.