Decomposition and Inference of Sources through Spatiotemporal Analysis of Network Signals: The DISSTANS Python Package

of the observation records in a user-assisted, semi-automated framework that includes uncertainty propagation. DISSTANS is open-source, includes application programming interface (API) documentation as well as usage tutorials, and is easily extendable. We present two case studies that demonstrate our code, one using a synthetic dataset and one using real GNSS network timeseries.


Introduction
Networks of Global Navigation Satellite System (GNSS) stations enable the direct observation of surface displacement down to millimeter accuracy (e.g., Blewitt, 2015). Originally using only the Global Positioning System (GPS) and consisting of only a handful of stations, modern quasi-permanent deployments sometimes incorporate more than 1,000 receivers and take advantage of other GNSS constellations such as the European Galileo or Chinese BeiDou systems. Analyzing network position timeseries requires awareness of the many processes that affect the observations, both desired and confounding, and an ability to distinguish between them. While dominant constituents like the secular motion of a particular station can usually be inferred by simple linear regression, quantifying less prominent constituents (e.g., displacements due to low-magnitude slow fault slip events or small-volume magma chamber pressurization) requires a better understanding of the contributing processes.
Here, we present the Decomposition and Inference of Sources through Spatiotemporal Analysis of Network Signals (DISSTANS) Python package to facilitate the temporal and spatial decomposition of GNSS timeseries. The code is written in a generic, fully object-oriented fashion with minimal assumptions as to study location, data units, and sampling frequency. Different data loading methods are implemented that interface with common existing timeseries file formats, but are also easily adapted to new formats. All downstream processing is independent of the original format and origin. To make the code as usable and accessible as possible, it is open-source and extensively commented. The repository includes tagged versions, verbose commit messages, and full documentation. The documentation features tutorials based on synthetic and real timeseries data, a subset of which are presented here. DISSTANS already contains many common processing workflows. These workflows are usable with just a few lines of code, and more are in the planned development roadmap. DISSTANS is parallelized for the most demanding tasks, most notably the model fitting component. We also provide extensive plotting options and graphical user interfaces, simplifying interactions with the data.
Section 2 of this report introduces some key structural decisions and presents a brief literature review of previous work, placing this study in the broader scientific context. Section 3 provides an overview of the code design, with Appendix A detailing the lower-level implementation. To validate our processing, Section 4 contains the analysis of a synthetic network of GNSS stations, as well as results from a real-world application using data from the Long Valley Caldera region, California, USA. Section 5 discusses key design choices. Finally, we end in Section 6 with a brief summary and some possible future avenues for extensions to DISSTANS.

Background
The list of scientific questions that can be addressed with GNSS networks is long, and the list of approaches that can be used is even longer (e.g., Blewitt et al., 2018; Herring et al., 2018; Bertiger et al., 2020). For studies of plate motion, surface deformation, and related fields, the key data are displacement timeseries, i.e., the relative movement over time of a receiver with respect to a defined reference frame. To obtain these timeseries, processing centers start from raw receiver observables (time, pseudoranges, and phases) and take into account a large number of physical processes (e.g., tropospheric and ionospheric travel time delays, gravitational effects, relativistic effects) to produce the best estimate of true receiver position for any given sampling interval (e.g., daily or hourly); see, for example, Misra and Enge (2010) or Blewitt (2015).
With these displacement timeseries, we can now interrogate the timeseries: Is the entire signal explained by rigid plate motion (e.g., Altamimi et al., 2017)? What are the causes for short-term or long-term transients (e.g., Houston et al., 2011)? How can we use inter-, co-, and postseismic station velocities to constrain fault locking (e.g., Meade and Hager, 2005)? Similarly, one might also want to identify and characterize noise processes (e.g., power-law noise, Langbein, 2020). All of these questions, however, require the decomposition of the timeseries into components that are the direct effect of specific physical processes (e.g., hydrological seasonal loading, earthquake offsets and transients, plate motion), and a residual component which is the result of noise processes, processing artifacts, and imperfect modeling.
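As a minimal illustration of such a decomposition (with assumed, synthetic constituents and units, not data from this study), a secular trend, an annual seasonal signal, and an earthquake offset can be jointly estimated by ordinary least squares:

```python
import numpy as np

# Synthetic daily timeseries (illustrative values): secular trend,
# annual seasonal signal, coseismic offset, and white noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 6.0, 1 / 365.25)            # time in years
truth = (12.0 * t                               # secular rate [mm/yr]
         + 3.0 * np.sin(2 * np.pi * t + 0.5)    # annual seasonal [mm]
         + 20.0 * (t >= 3.0))                   # coseismic offset [mm]
obs = truth + rng.normal(0.0, 1.5, t.size)

# Design matrix: intercept, rate, annual sine/cosine, step at t = 3 yr.
G = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                     (t >= 3.0).astype(float)])
m, *_ = np.linalg.lstsq(G, obs, rcond=None)     # m[1]: rate, m[4]: offset
```

The residual `obs - G @ m` then contains whatever the assumed constituents cannot explain.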
In this study, we focus on this intermediate step, and refer to it as simply timeseries decomposition. Therefore, we will refer to the displacement timeseries as produced by GNSS network processing centers as the raw or input timeseries, and the decomposition process will aim to isolate timeseries constituents.

Approaches to Timeseries Decomposition
We categorize timeseries decomposition tools using three main criteria.

Process-aware vs. process-agnostic
This first criterion aims to distinguish approaches that either make a priori assumptions about the physical processes affecting the data (expecting a certain structure in the data), or alternatively, assume the least possible. For example, fitting a model containing a complete set of basis functions to a timeseries is, in its most generic form, process-agnostic (e.g., Riel et al., 2014), but fitting a logarithmic decay function to a postseismic transient effectively assumes a specific tectonic process (e.g., Hsu et al., 2009).
Process-agnostic approaches will usually achieve the "best" fit to the observations, at least as measured by the magnitude of the residuals, since that is the principal optimization criterion for such methods. However, overreliance on the data and its residuals makes these methods susceptible to "overfitting", i.e., interpreting noise as signal. Process-agnostic methods also have difficulties determining trade-offs between different source processes, for example in the case of Principal Component Analysis (PCA) when signals manifest in multiple principal components, or when a single principal component mixes different signals. In contrast, process-aware approaches naturally prefer a decomposition that follows the assumed underlying functional forms, and might therefore ignore parts of the observations if they either (a) do not have an appropriate way of describing the observation (e.g., an unexpected transient), or (b) try to fit a signal with an inappropriate model (e.g., mapping postseismic deformation into the coseismic one).

Parametric vs. non-parametric
This second criterion assesses whether one estimates parameters (coefficients) for predetermined models to decompose the timeseries. The models can be as complex as desired (high dimensionality, non-linear). Examples of non-parametric decompositions are linear time-invariant filters used in signal processing (e.g., bandpass or lowpass filters) or basis reprojections like PCA, Independent Component Analysis (ICA), or Singular Spectrum Analysis (SSA).
Note that this criterion ignores the impact of hyperparameters (e.g., regularization penalties, frequency windows). With non-parametric approaches, the assumptions and hyperparameters are minimal compared to model-based methods, thereby simplifying the problem setup immensely. Furthermore, reducing the influence of hyperparameters translates into a reduction of possible sources of errors. On the other hand, parametric approaches enable a straightforward implementation of the formal covariances between model parameters, and by extension, uncertainties in the predicted timeseries. These approaches can also deal naturally with data gaps (i.e., without the need for data imputation). Crucially, a parametric approach is necessary for process-aware studies, because non-parametric approaches have no inherent knowledge about how to group different source processes into components (see above).
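The contrast can be made concrete on a toy series (signal shapes and the filter cutoff are assumed for illustration): the parametric fit yields interpretable coefficients, while the non-parametric filter estimates no model coefficients but retains a cutoff hyperparameter:

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 1461)                  # ~daily sampling over 4 years
signal = 5.0 * t + 2.0 * np.sin(2 * np.pi * t)   # trend plus annual term
obs = signal + rng.normal(0.0, 1.0, t.size)

# Parametric: estimate coefficients of an assumed functional form.
G = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
coeffs, *_ = np.linalg.lstsq(G, obs, rcond=None)  # coeffs[1] is the rate
parametric_fit = G @ coeffs

# Non-parametric: a zero-phase lowpass filter; no model coefficients,
# but the (normalized) cutoff frequency remains as a hyperparameter.
b, a = butter(4, 0.01)
nonparametric_fit = filtfilt(b, a, obs)
```

Note that only the parametric route yields a rate estimate (with formal uncertainty, if desired); the filtered series must be interpreted further before any physical quantity can be extracted.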

Station- vs. network-level
An additional criterion acknowledges the role that spatial information can play in the analysis process. For example, if the same models are fit to every timeseries in a network, regardless of where the stations are located, then the decomposition code is not aware of the spatial context. These local, station-level solutions are therefore independent from one another. If one recognizes, however, that geophysical signals usually have a spatially coherent signature (assuming sufficiently dense networks), then we can and should incorporate that understanding. For example, PCA makes use of the fact that all stations in the network can potentially see the same source signal (even though the network geometry is neglected). Taking advantage of potential spatial structure can be beneficial, although the complexity of the resulting code and additional computational costs are not negligible.
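For example, a transient seen by all stations (here with station-dependent amplitudes; all values are assumed for illustration) can be extracted as the first principal component of the network data matrix, even though station coordinates never enter the computation:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sta, n_t = 10, 800
t = np.linspace(0.0, 2.0, n_t)
common = np.tanh((t - 1.0) / 0.1)            # a network-wide transient
amps = rng.uniform(2.0, 5.0, n_sta)          # station-dependent amplitudes
data = np.outer(amps, common) + rng.normal(0.0, 0.3, (n_sta, n_t))

# PCA via SVD of the demeaned (stations x epochs) data matrix.
demeaned = data - data.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(demeaned, full_matrices=False)
pc1 = Vt[0]                                   # first temporal principal component

# The first PC is highly correlated with the common transient
# (up to an arbitrary sign), despite ignoring station geometry.
corr = np.corrcoef(pc1, common)[0, 1]
```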

Review of Existing Tools
Considering the diversity of possible approaches, the selection of a certain approach (or the design of a hybrid approach) depends on one's goals and the available data. Additional factors include the ease of software implementation, or possibilities to extend the methods to include ancillary datasets (e.g., rainfall, earthquake catalogs, atmospheric pressure). We review selected published work in the field of timeseries decomposition in the context of process-agnostic vs. process-aware, parametric vs. non-parametric, and degree of spatial awareness.
Before high-quality station timeseries became ubiquitous, the QOCA software (Dong et al., 1998) could be used to combine "quasi-observations" (lightly-processed input data from GPS, Electronic Distance Measurements, Satellite Laser Ranging, or Very Long Baseline Interferometry) using a Kalman filter approach. QOCA includes the module analyze_tseri to estimate linear, episodic, and stochastic motion of the different stations individually in a least-squares-based, process-aware, and parametric framework.
With an increasing number of GNSS stations, more GNSS constellations, and more precise understanding of the physical processes affecting GNSS positioning solutions, GNSS networks became common for monitoring surface deformation. Today, the analysis tools developed to produce GNSS displacement timeseries routinely also include simple timeseries decomposition functionality. For example, the current iterations of JPL's GipsyX/RTGx (Bertiger et al., 2020) and MIT's GAMIT/GLOBK (Herring et al., 2018) software both contain methods to estimate position, velocity, seasonal variations, offsets, and postseismic deformations. These Kalman-filter-based methods are parametric and process-aware, but in contrast to QOCA, not spatially aware.
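As a toy illustration of Kalman-filter-based position and velocity estimation (vastly simplified compared to the GipsyX or GLOBK implementations; all rates, noise levels, and matrices below are assumed), a single displacement component can be tracked with a two-element state vector:

```python
import numpy as np

# Minimal random-walk-velocity Kalman filter for one displacement component.
rng = np.random.default_rng(8)
dt = 1 / 365.25                              # daily sampling, time in years
t = np.arange(0.0, 5.0, dt)
obs = 10.0 * t + rng.normal(0.0, 2.0, t.size)  # truth: 10 mm/yr secular rate

F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition: [position, velocity]
Q = np.diag([0.0, 1e-4])                     # small process noise on velocity
H = np.array([[1.0, 0.0]])                   # we observe position only
R = np.array([[4.0]])                        # observation variance, (2 mm)^2

x = np.zeros(2)                              # state estimate
P = np.diag([100.0, 100.0])                  # loose initial covariance
for z in obs:
    x, P = F @ x, F @ P @ F.T + Q            # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)      # update with new observation
    P = (np.eye(2) - K @ H) @ P
# x[1] now holds the filtered velocity estimate (close to 10 mm/yr).
```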
For regions where complex geophysical processes are at play (such as near a volcano or in subduction zones), more complex analysis is necessary to distinguish between different processes. A common example is the impact of an unmodeled transient period on the estimated secular plate velocity. In the following, we present a small (non-exhaustive, unordered) selection of tools that start from raw GNSS displacement timeseries to analyze stations exhibiting more complex behavior.
The Network Inversion Filter (NIF), first proposed by Segall and Matthews (1997) and subsequently expanded upon by a variety of studies (e.g., McGuire and Segall, 2003; Bekaert et al., 2016), estimates slip rates on predetermined fault structures from (GNSS or other) observations using a Kalman filter. It is therefore process-aware, and because slip on the modeled faults affects multiple stations, which are jointly used to estimate the slip coefficients, it is also spatially aware. The NIF estimates slip and therefore transient displacement constituents non-parametrically, but the hyperparameters specifying the fault geometry and the characteristics of fault slip in time and space (e.g., smoothness) play an important role.
The Median Interannual Difference Adjusted for Skewness (MIDAS) algorithm explicitly recognizes the importance of unmodeled steps and short-term transient deformation in the raw timeseries. Not being a traditional regression scheme, it uses the median of velocities computed from data pairs separated by one year, providing a degree of insensitivity to offsets, small data gaps, and annual seasonal signals if the timeseries is sufficiently long. This process-aware, station-level method is mostly defined by its hyperparameters, although other parameters such as known maintenance and earthquake offset times are used. It is therefore a powerful, largely automated method to estimate secular plate velocities, although it does not attempt to extract non-annual seasonal, transient, or decaying signals. MIDAS is at the core of UNR's Nevada Geodetic Laboratory openly accessible global GNSS timeseries repository (Blewitt et al., 2018).
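The core idea, taking the median of velocities computed from data pairs separated by one year, can be sketched as follows (a bare-bones illustration that omits the skewness adjustment and offset handling of the real algorithm; the function name and test signal are our own):

```python
import numpy as np

def midas_like_rate(t, y, pair_sep=1.0, tol=0.01):
    """Median of velocities from data pairs separated by ~pair_sep years
    (simplified; the real MIDAS additionally adjusts for skewness and
    accounts for known offsets)."""
    rates = []
    for i, ti in enumerate(t):
        j = np.searchsorted(t, ti + pair_sep)
        if j < t.size and t[j] - ti - pair_sep < tol:
            rates.append((y[j] - y[i]) / (t[j] - t[i]))
    return np.median(rates)

# Synthetic series: 7 mm/yr trend plus an annual signal that the
# one-year pairing cancels almost exactly.
rng = np.random.default_rng(3)
t = np.arange(0.0, 5.0, 1 / 365.25)
y = 7.0 * t + 2.0 * np.sin(2 * np.pi * t) + rng.normal(0.0, 1.0, t.size)
rate = midas_like_rate(t, y)
```

The one-year pair separation is what makes the estimator insensitive to annual seasonal signals, and the median is what makes it robust against steps and outliers.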
The Señales y Análisis de Ruido Interactivo (Interactive Signal and Noise Analysis, SARI, Santamaría-Gómez, 2019) software performs process-aware, parametric, station-level regression focusing on an interactive user interface. Least-squares or Kalman filtering is used to fit polynomial, sinusoidal, exponential, logarithmic, and step models, allowing for a detailed timeseries decomposition. It also contains useful additional functionality such as automatic discontinuity detection, periodogram visualization, and noise characterization.
The Greedy Automatic Signal Decomposition (GrAtSiD, Bedford and Bevis, 2018) algorithm is an iterative, station-level method that focuses on detecting and modeling transient signals in the timeseries. At each iteration, a least-squares regression is performed that includes a linear trend, sinusoidal oscillations, predefined steps, as well as a selection of sparse, transient functions ("multitransients"). Only multitransients that significantly improve the data fit are kept for the next iteration, until a convergence criterion is reached. GrAtSiD can therefore be classified as a parametric approach that is partly process-aware (for the non-multitransient parts of the regression) and partly process-agnostic (since the multitransients can have a variety of shapes and are not tied to a particular physical source).
MIDAS, SARI, and GrAtSiD are limited to station-level model fit solutions, and do not incorporate spatial awareness.
An example of a non-parametric, process-agnostic, and spatially-aware method to decompose timeseries is the variational Bayesian Independent Component Analysis (vbICA), a modern iteration of basis reprojection algorithms particularly suitable for GNSS networks. Its key distinction from traditional PCA/ICA is that it recognizes that the probability density functions of individual components are generally not normally distributed, and it alleviates this problem by using mixtures of Gaussians. vbICA therefore allows for a more accurate signal separation, as well as a formal way to incorporate component uncertainties. Riel et al. (2014) proposed a method that builds on parametric, process-aware regularized regression. Their approach adds a process-agnostic set of B-spline functions to model transients in a spatially-aware framework. DISSTANS builds on this framework, which we describe in more detail in Sections 3.2 and 3.3.
Here, we have just described a subset of available tools, focusing on publicly available, complete software packages that provide a reasonable level of portability. There are many other studies that have implemented or adapted codes and methods for specific study regions or purposes; an analysis and comparison of which would be beyond the scope of this work.

Code Overview
DISSTANS aims to build on advancements and best practices of previous work, combining them into a single package that adheres to standards of free, extensible, shareable, and scalable software. At its core, it models timeseries as the linear combination of constituent functions, and estimates the functions' coefficients using least-squares. DISSTANS also includes a suite of pre- and postprocessing tools. In this section, we present key properties and design choices made in the DISSTANS package (Section 3.1), as well as two core functionalities (spline-based transient modeling and spatial regularization, see Sections 3.2-3.3). More implementation details can be found in Appendix A.

Goals
Commonly-used workflows included. To allow researchers to focus more on science and less on programming, timeseries decomposition software should include easy-to-use versions of well-established timeseries decomposition workflows. Such software can then be used for generic pre- and postprocessing, as well as serve as a base on which new analysis methods can be developed. An additional benefit of comprehensive software is the lowering of the entry barrier for researchers new to the field. DISSTANS therefore provides a vast array of such workflows, ranging from data cleaning methods and PCA/ICA decomposition to simple least-squares fitting with standard models and residual analysis.
Incorporating process knowledge. Where knowledge about physical processes affecting GNSS timeseries is present (e.g., an inflating magma chamber), such information can theoretically improve model fitting. It is therefore desirable for timeseries decomposition methods to both include models that best represent known physical processes, as well as methods that are flexible enough to account for unmodeled, unknown processes. DISSTANS allows for such a distinction by offering a range of process-aware, as well as process-agnostic models (see Section 3.2 and Appendix A.2).
Spatial awareness. With GNSS networks becoming more widespread (and, more importantly, denser), we should explicitly recognize that nearby stations subject to common geophysical processes may behave similarly. If we only consider each station individually, we may miss the opportunity to identify constituents that manifest themselves around the noise floor. However, if many stations experience the same signal (with different magnitudes), a joint estimation can theoretically enhance our ability to detect them. Such a method would thereby lower the effective signal-to-noise ratio necessary for constituent extraction. DISSTANS allows one to take advantage of the available spatial information by building on the spatiotemporal transient fitting algorithm developed by Riel et al. (2014) (also see Section 3.3 and Appendix A.3).
Scalability. In order to scale well with both the number of stations and the length of the observation record, it is useful to parallelize the computationally demanding parts. DISSTANS includes an option to parallelize the station-level, least-squares solutions, as well as the evaluation of the predicted model timeseries including the full model covariance matrix.
Uncertainty estimation. Given the possible complexities of displacement timeseries, a proper interpretation of signal decomposition results can only be made if the trade-offs between and within models and east-north-up components can be quantified. The full, formal model covariances can be estimated and propagated in the DISSTANS workflow.
Step detection. One omnipresent challenge when analyzing timeseries is the detection and subsequent estimation (or equivalently, removal) of steps in the data. Improper step removal can significantly affect secular plate velocities as well as the character of GNSS noise (e.g., Santamaría-Gómez and Ray, 2021), but there is no fully-automated algorithm that would remove the need for manual inspection (e.g., Gazeaux et al., 2013). DISSTANS contains semi-automated tools that aid in modeling all relevant offsets: a step detector (similar to the one in GipsyX, Bertiger et al., 2020; also see Appendix A.6), a visualization GUI to inspect the data (see Appendix A.8), and loading functions for maintenance records in multiple formats. DISSTANS also features both an empirical and an elastic-half-space-based method to determine whether or not to allow a coseismic offset to be estimated at any given station and time.
Portability and extendability. As new GNSS networks are built, and output formats of data processing centers change, one must be able to easily incorporate these changes. DISSTANS separates the data loading tasks from all other analysis steps, such that the former can easily be updated without affecting the latter. Furthermore, to enable the development and integration of new approaches, DISSTANS is written as a modular, extendable framework (in contrast to single-use collections of scripts, see Appendices A.1 and A.4).
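A minimal illustration of screening candidate step epochs by residual variance reduction (not the actual DISSTANS or GipsyX detector; the candidate times, noise level, and function name are assumed):

```python
import numpy as np

def detect_step(t, y, candidates):
    """Return the candidate epoch whose step model best reduces the
    residual variance relative to a trend-only fit (a minimal sketch)."""
    G0 = np.column_stack([np.ones_like(t), t])
    r0 = y - G0 @ np.linalg.lstsq(G0, y, rcond=None)[0]
    best, best_var = None, np.var(r0)
    for tc in candidates:
        # Augment the trend model with a Heaviside step at tc.
        G = np.column_stack([G0, (t >= tc).astype(float)])
        r = y - G @ np.linalg.lstsq(G, y, rcond=None)[0]
        if np.var(r) < best_var:
            best, best_var = tc, np.var(r)
    return best

rng = np.random.default_rng(4)
t = np.linspace(0.0, 4.0, 1461)
y = 3.0 * t + 8.0 * (t >= 2.5) + rng.normal(0.0, 1.0, t.size)
found = detect_step(t, y, candidates=[0.5, 1.5, 2.5, 3.5])
```

In practice, candidate epochs would come from maintenance records and earthquake catalogs rather than a fixed grid, and a significance test would guard against spurious detections.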

Spline-Based Transient Modeling
To optionally model transient signals in the displacement timeseries of unknown functional shape, DISSTANS includes spline-based models. B-splines in particular are piecewise-polynomial functions that, when constructed in a specific manner, form a full basis for any polynomial function of a given degree over the basis' support. As introduced by Hetland et al. (2012) for geophysical applications, sets of repeated, uniform, integrated B-splines (see Fig. A2 for a visualization) of various timescales and center times can be used to approximate any given unknown transient signal of similar timescales. The ability to approximate arbitrary functions in a process-agnostic framework makes sets of splines useful for timeseries decomposition where standard functions (polynomials, sinusoids, exponential functions, etc.) cannot capture the full breadth of the observations (e.g., aseismic slow slip or volcanic expansion events). A more detailed mathematical description of the available spline-based models in DISSTANS can be found in Appendix A.2.
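As a sketch of such a dictionary (loosely after Hetland et al., 2012; the number of centers, timescale, and normalization below are illustrative assumptions, not the DISSTANS implementation), integrated B-splines can be built numerically and fit to an arbitrary transient:

```python
import numpy as np
from scipy.interpolate import BSpline

def integrated_bspline_dict(t, n_centers=12, degree=3, scale=0.2):
    """Dictionary whose columns are integrated B-splines: smooth ramps
    from 0 to 1 around different center times."""
    cols = []
    for c in np.linspace(t[0], t[-1], n_centers):
        knots = c + scale * np.linspace(-0.5, 0.5, degree + 2)
        b = np.nan_to_num(BSpline.basis_element(knots, extrapolate=False)(t))
        ramp = np.cumsum(b)
        cols.append(ramp / ramp[-1])            # normalize to unit final height
    return np.column_stack(cols)

t = np.linspace(0.0, 1.0, 500)
D = integrated_bspline_dict(t)
transient = 0.5 * (1.0 + np.tanh((t - 0.5) / 0.08))  # unknown transient shape
coef, *_ = np.linalg.lstsq(D, transient, rcond=None)
misfit = np.sqrt(np.mean((D @ coef - transient) ** 2))
```

A real dictionary would combine several timescales (`scale` values) so that both slow and fast transients can be represented; the least-squares fit above would then be replaced by the regularized solvers of the next section.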

Local and Spatial Regularization
Sets (or "dictionaries") created by shifting and scaling a single uniform B-spline are not linearly independent (see Hetland et al., 2012), and therefore do not form a "proper" basis in the mathematical sense. It follows that any signal decomposition using such sets is non-unique, and thus requires regularizing the solution. The most commonly used regularization is based on the squared L2 (Euclidean) vector norm ‖·‖₂², promoting solutions with smaller overall magnitudes. However, in the context of fitting transient signals that may or may not be present in the timeseries, we prefer a regularization scheme that yields sparse solutions, i.e., spline coefficients should be driven to zero if there is not sufficient evidence in the data to warrant usage of any given spline in the overall model fit. L1-norm regularization is such a sparsity-promoting regularization scheme: it penalizes the absolute magnitudes ‖·‖₁ of the estimated parameters, driving many parameters close to zero. L0-norm regularization goes further by penalizing the existence ‖·‖₀ of a parameter, thereby either driving parameters to zero, or not penalizing a parameter at all (Candès et al., 2008). This type of regularization is therefore more suited for physical processes which occur sporadically, are not ubiquitous, and have an "arbitrary", but significant, magnitude. All three regularization schemes are implemented in DISSTANS (see Appendix A.3). Riel et al. (2014) combined the potential of using a dictionary of splines with the benefits of L0 regularization. Using the algorithm introduced by Candès et al. (2008), they proposed a method to extend the regularization from a timeseries at a single station (henceforth referred to as local L0 regularization) to all the timeseries in a network of stations (spatial L0 regularization).
Their approach yields spline-based fits whose estimated model coefficients are sparse in time (i.e., for a single timeseries at one station) and space: transient signals common to multiple stations are decomposed using the same spline functions. An additional benefit of a spatially-coherent set of splines is that it is harder for the solver to fit local noise processes with splines that would only be relevant at isolated stations and times. DISSTANS builds on the method of Riel et al. (2014) (for which the relevant source code was never published), extending it in various ways (most notably, adding parallelization and improving the numerical stability). More details on the implementation of the spatial L0 regularization can be found in Appendix A.3.
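The reweighting idea of Candès et al. (2008) can be sketched generically: repeatedly solve a weighted L1 (lasso) problem, updating each weight to be inversely proportional to the previous coefficient magnitude, so that small coefficients are driven to exactly zero while surviving coefficients are penalized less. The solver below is a plain proximal-gradient (ISTA) iteration on random data, not the DISSTANS implementation:

```python
import numpy as np

def weighted_lasso_ista(G, d, lam, w, n_iter=3000):
    """ISTA solver for min ||G m - d||^2 + lam * sum_i w_i |m_i|."""
    step = 1.0 / (2.0 * np.linalg.norm(G, 2) ** 2)   # 1 / Lipschitz constant
    m = np.zeros(G.shape[1])
    for _ in range(n_iter):
        z = m - step * 2.0 * (G.T @ (G @ m - d))     # gradient step on the misfit
        m = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)  # soft-threshold
    return m

def reweighted_l1(G, d, lam, eps=1e-4, n_outer=5):
    """Iteratively reweighted L1 (after Candès et al., 2008), mimicking L0."""
    w = np.ones(G.shape[1])
    for _ in range(n_outer):
        m = weighted_lasso_ista(G, d, lam, w)
        w = 1.0 / (np.abs(m) + eps)    # small coefficients get huge penalties
    return m

rng = np.random.default_rng(5)
G = rng.normal(size=(200, 50))         # generic dense dictionary
m_true = np.zeros(50)
m_true[[7, 23]] = [3.0, -2.0]          # a sparse "transient" coefficient vector
d = G @ m_true + rng.normal(0.0, 0.02, 200)
m_hat = reweighted_l1(G, d, lam=5.0)   # recovers the two-element support
```

In the spatial variant, the weights are additionally shared across stations, so that a spline either participates network-wide or is suppressed everywhere.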

Validation
We present two validation datasets and results. The first, in Section 4.1, is a synthetic dataset of 16 stations exhibiting some commonly seen patterns in GNSS network timeseries. Using this synthetic network, we demonstrate key capabilities of this code in estimating spatially-coherent complex signals, all while being able to compare fitted models to the true underlying timeseries. The second dataset, in Section 4.2, is a collection of GNSS stations in the Long Valley Caldera region in California, USA. Here, the main goal is to recover the transient caldera inflation signal, and discuss some subtleties in the analysis when dealing with imperfect, real-world data.

Synthetic Dataset
The code for this analysis, as well as additional discussion, can be found in Tutorial 3 of the online documentation.
A key feature of DISSTANS is its ability to use spatial coherence as an additional source of information and constraint. In general, signals like earthquakes, slow fault slip events, or seasonal loading are spatially correlated, as the processes affecting each station have the same underlying sources. By using this knowledge in combination with the enforcement of sparsity, we ensure that the estimated models are consistent between stations. Processes that only affect a single station are considered noise for the purposes of this study (e.g., antenna maintenance or strongly localized displacements).

Setup
The synthetic dataset comprises 16 stations randomly positioned on an elongated, rectangular grid (see Fig. 1). Each station's two-component timeseries is affected by a secular, linear trend, one annual seasonal signal, an earthquake (with both co- and postseismic components), two short-term slow slip events, one long-term transient, common mode error, and measurement error (correlated between the components). The linear trend, coseismic and postseismic constituents are all equal in direction and magnitude, whereas the seasonal constituent is random at each station. The three transients are all equal in onset time, duration, and direction, but differ in magnitude. Furthermore, one station ("Cylon") experiences significant power-law noise, and a different station is affected by an unmodeled maintenance step. Both signals represent site-specific noise processes that the spatial coherence constraint aims to suppress. Lastly, the amplitudes of the three transients decrease exponentially towards the east.
The analysis follows a simplified version of the example workflow presented in Appendix A.7. Because the data is synthetic, no quality metrics need to be applied, nor is step detection necessary. We add the following constituents to our inversion: polynomial, sinusoidal, step, and logarithmic (to recover everything except the transients; unregularized); and a set of splines. To recover the transient episodes, the spatially-regularized splines contain timescales between a month and multiple years, amounting to hundreds of individual splines (see Fig. S4). The fitting converges smoothly onto the final solution (see Fig. S1). In the following, we compare the results obtained using local and spatial L0 regularization to highlight the benefits of promoting spatial coherence.

Results
Fig. 2 shows the East component of a representative station. The inferred model fits the synthetic data well. We find a small trade-off between the secular and transient constituents, although we note that in real-world applications, such an assessment is frequently difficult. (A visualization of the full model parameter correlation matrix can be found in the supplementary material.) Table S1 presents comparison results of our spatial L0 solution with the L0 solution without spatial regularization (see below) and other commonly used methods (simple least-squares, MIDAS), showing that the spatially-aware L0 solution clearly outperforms the other methods. Fig. 3 shows the improvement from local to spatial L0 regularization in map view for all stations: the transient components are smoother (therefore fitting less noise) and more closely follow the true signal (shown in the background). Importantly, the homogeneous displacement field is obtained without degrading the fit to the data (compare Fig. S3). This is enabled by the spatial solver's identification of the set of splines that best describes the transient signal common to all stations (compare Fig. S4), and the better recovery of the secular velocity. Section S.2 explores the dependence of the model error on the number of stations for a different synthetic network, with Fig. S6 further validating our claim that incorporating information from nearby stations improves the quality of the resulting model fit.

Long Valley Caldera
The code for this analysis, as well as additional discussion, can be found in Example 1 of the online documentation.
To demonstrate DISSTANS with real data, we consider timeseries from the Long Valley Caldera (LVC) region in California, USA. Because of the geophysical interest in the magmatic, seismic, and hydrological processes at work there, the LVC has been monitored by an ever-expanding network of GNSS stations since the late 1990s. The displacement timeseries are complemented by detailed maintenance and seismic catalogues, which are crucial for determining the best set of steps to include in the fitting process. In this example, the goals are threefold: (1) to illustrate the example workflow proposed in Section A.7, (2) to present the best-fit transient model to the periods of unrest in the Long Valley Caldera, and (3) to showcase the importance of allowing the seasonal signal models to vary in amplitude over time. Any in-depth physical modeling of the extracted constituents is beyond the scope of this study.

Setup
The data and the corresponding maintenance and seismic event catalogs are downloaded with DISSTANS-included tools from the GNSS timeseries repository maintained by the University of Nevada at Reno's Nevada Geodetic Laboratory (Blewitt et al., 2018). Only stations with a reliability of over 50% (i.e., observations on more than half of the days the station was active) and an observation record at least one year long are considered, and outliers in each timeseries (more than 10 standard deviations away from the median) as well as the common mode error are removed. With the help of the available maintenance catalog, we iteratively identify steps in the data. This process is aided by the step detector and visualization routines included in DISSTANS. For the final fit, we use the following models: polynomial, sinusoidal, and steps (to recover everything except the transients; unregularized); a set of splines (to recover the transient episodes, containing timescales between months and multiple years, hundreds of individual splines; spatially regularized); and a varying-amplitude sinusoid (for deviations from the nominal, unregularized seasonal signal; locally regularized).
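The outlier screening described above (deviations of more than 10 standard deviations from the median) can be sketched as follows; the function name and test values are our own:

```python
import numpy as np

def mask_outliers(y, n_std=10.0):
    """Keep epochs within n_std standard deviations of the median
    (the screening threshold described above)."""
    return np.abs(y - np.nanmedian(y)) <= n_std * np.nanstd(y)

rng = np.random.default_rng(6)
y = rng.normal(0.0, 1.0, 1000)       # well-behaved residual-like series
y[[100, 500]] = [50.0, -60.0]        # two gross outliers
good = mask_outliers(y)              # boolean mask; outliers are False
```

The median-based center keeps the threshold itself robust even when the outliers inflate the standard deviation.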

Transient Signals
The timespan between 2012 and 2015 (approximately) is dominated by a significant expansion of the caldera's dome, as observed by both the GNSS network and Interferometric Synthetic Aperture Radar (InSAR) timeseries. Fig. 4 shows the horizontal component estimated by DISSTANS in map view: the radial extension of the network from the center of the dome is clearly visible. This pattern is broadly consistent with that of Silverii et al. (2020, Fig. S3a), where transients were recovered using non-parametric multiyear filters, even though the directions of maximum displacements are different. Crucially, however, we did not enforce the secular long-term motion to be zero during a specific timespan. As a result, many stations appear to never reach a "steady-state" matching the general plate motion, because the transient motion, even when regularized, is dominant for large parts of each timeseries. (A priori removal of a secular trend can easily be done with DISSTANS, if desired.) The estimation of the transient signal directly affects the secular velocity estimate, for which different published values exist for the stations in and around the Long Valley Caldera; this in turn enables a more straightforward validation than comparing extracted transients between methods or studies. In Section S.3, Figs. S7-S9, we compare our results with the MIDAS-derived secular velocities and the Geodesy Advancing Geosciences and EarthScope (GAGE) facility's secular velocities.

Seasonal Signals
Traditional least-squares model fitting for GNSS timeseries usually either approximates the seasonal signal as having a constant amplitude and phase over the entire timespan considered (or piecewise within that timespan) (e.g., Heflin et al., 2020), or estimates a more accurate seasonal deformation signal using filtering or component-analysis methods. The two approaches are usually acceptable, as either the resulting residuals are insignificant, or the methods are not prone to producing large seasonal residuals in the first place. Because we model transients at even small timescales (down to the order of less than 100 days), our method does suffer from these seasonal residuals, as annual rain- and snowfall can vary widely, especially in the Sierra Nevada. In fact, because seasonal residuals are highly correlated between stations, they are not removed by our spatial L0 regularization. By modeling the seasonal signal as the sum of an unregularized, constant, nominal constituent and a simple, L1-regularized, station-specific deviation constituent of the same nominal frequencies that is allowed to vary in amplitude (and, by construction, instantaneous phase) over time, the solver is once again able to separate seasonal (i.e., periodic) signals from (aperiodic) transient motion (see Appendix A.2). One example of the resulting seasonal fit in the vertical direction is shown in Fig. 6. Variations in the amplitude, and sometimes instantaneous phase, are clearly visible, demonstrating the importance of properly estimating and removing the seasonal signal at stations that are affected by major hydrological processes.

Discussion
The choice to incorporate process-agnostic, spatial awareness into the timeseries decomposition problem by means of a parametric, spline-based model that requires regularization and iteration may appear odd: after all, vbICA and comparable methods already have an inherent sense of space. However, even though basis decompositions have a spatial component, the geometry of the network (e.g., the relative distance between stations) is neglected. Network geometry and extent become relevant when networks are large and some signals are spatially confined: different processes at different locations may be mapped into the same component, complicating its interpretation. Furthermore, in order to obtain a clean decomposition using vbICA or similar methods, maintenance and earthquake coseismic offsets, as well as the linear secular trends, still have to be removed ahead of time. Therefore, not only do these nonparametric decomposition approaches require a significant amount of preprocessing in the first place, the separation of preprocessing and actual decomposition precludes a straightforward way to quantify the covariance between the constituents. By contrast, using parametric models that are both process-aware (such as secular, seasonal, and maintenance offset models) and process-agnostic (using a dictionary of splines for transients and seasonal variations) offers this covariance by design, while the spatial L0 regularization accomplishes the goals of sparsity and spatial awareness. We note that DISSTANS still offers PCA/ICA, used for example for the common mode removal.
Parametric approaches also allow one to include prior knowledge beyond the preprocessing steps. Incorporating such knowledge is already partly possible through the choice of the model functions (e.g., inserting a postseismic displacement model after a large earthquake), but least-squares-based methods such as the one used by DISSTANS also allow the analytic inclusion of a priori model parameter knowledge, which may be added in future versions.
We omit a detailed look at hyperparameters here (e.g., regularization penalties, the number of iterations), as differing goals, as well as different characteristics of the data, will have a large impact on what the "best" choice is; general assertions are therefore not possible. The tools presented here thus do not relieve the user of the task of finding the best set of hyperparameters for their data and problem formulation, although the documentation includes the specific choices for the cases presented in the previous section (based on both analytic and empirical considerations), which may provide a good starting point.
An important caveat of using a fixed dictionary of splines to model transient signals is that such fits are not phase-invariant. Processes that move both in space and time (e.g., slow slip events in Cascadia, Riel et al., 2014) are "discretized" by the onset times of individual splines, such that multiple splines (of possibly different periods) are necessary to capture a potentially simple signal that migrates in time. Failing to account for different onset times throughout the network could impact the quality of fit, as well as reduce the sparsity of the solution. However, experience shows that phase invariance is not as crucial as it may seem. First, observation noise makes exact onset times of transient signals hard to determine, and simultaneously allows the solver to fit splines that are adjacent in time when the "best" onset time would fall somewhere between the splines' onset times. Second, if the problem persists, more splines of different periods or new onset times can easily be inserted into the models (with the main drawback being higher computational costs). In neither the synthetic nor the real data examples presented here did the splines' periods or onset times have to be adjusted from an initial, default configuration to obtain a high-quality decomposition.

Conclusion
Displacement timeseries of regional GNSS networks are commonly used to monitor surface deformation, plate motion, as well as transient signals such as hydrological loading or aseismic slip events. A crucial step in these analyses is the decomposition of the input (raw) timeseries into its constituents: secular motion, periodic seasonal variations, step offsets due to earthquakes, etc. As networks continue to grow in number and size, so does the need to efficiently analyze timeseries. We combine the many features of previously published analysis methods into a single, generic, open-source framework. The DISSTANS Python package includes: (1) incorporation of spatial information through the use of a spatial L0-regularized least-squares solver, (2) CPU-based parallelization for scaling to large networks, (3) formal uncertainty quantification with covariance matrices between components and constituents, (4) a suite of supporting tools including timeseries file and data management, common mode estimation, and simple, automated step detection, as well as (5) visualization methods to accelerate data and model inspection by the user.
Validation with synthetic GNSS network timeseries shows the beneficial effect of fitting transient signals with the spatial, L0-regularized solver: transients in the data are fit sparsely both in time and space, and the solution recovers the true underlying motion better than comparable solutions without spatial awareness. An analysis of GNSS displacement timeseries from the Long Valley Caldera region in the Sierra Nevada, California, USA demonstrates the viability of our approach using real data, jointly decomposing the timeseries into step offsets, secular motion, transient signals, as well as time-varying seasonal displacements.

Computer Code Availability
DISSTANS is available at https://github.com/tobiscode/disstans under the GPL-3.0 License. This appendix focuses on presenting the structure and methodology of the package, with little to no actual sample code. For sample code, please refer to the package documentation. The following typesetting will be used for clarity: classes are capitalized and typeset in bold monospace font (e.g., Station), and attributes, properties, variables, methods, functions, as well as general code are typeset in regular monospace font (e.g., parameters or import disstans), with callables (e.g., functions and methods) additionally being trailed with parentheses (e.g., get_mapping()).

A.1. Structure
Fig. A1 presents the modular structure of DISSTANS. The highest level of abstraction is the Network class, which serves three main purposes. First, for each station in the network, it contains a Station object in its stations dictionary attribute, which enables straightforward access. Its second use is to provide a suite of convenience methods that perform a certain task for each station. Without parallelization enabled, their only advantage is that a user does not have to write explicit for-loops. However, Network methods also implement an automatic switch to parallelized execution using Python's multiprocessing.pool module if the configuration is set accordingly. Finally, the Network class contains methods that interface with all stations simultaneously; for example, the graphical user interface gui() and other plotting functions (more details about visualization methods in Appendix A.8). Plotting functions are based on the standard Matplotlib (Hunter, 2007) and Cartopy (Elson et al., 2022) packages.
One level down in the hierarchy is the Station class. Apart from storing the metadata information name and location, it is the container object for all datasets being assigned to the station; for example, raw or post-processed GNSS displacement timeseries (e.g., Dataset 1 and Dataset 2 in Fig. A1). A network can contain multiple stations, and each station can contain multiple datasets, but not all datasets have to be present at all stations. The Station class also provides functions that directly work on contained timeseries, such as analyze_residuals().
On the third level, for each dataset, a station contains three key elements: the actual data (in the Timeseries object, stored in the Station.timeseries dictionary), the associated models (as a ModelCollection object containing the individual Model objects, stored in the Station.models dictionary), and any fits to the data based on model evaluations (as a dictionary of Timeseries objects, one for each model, plus one for all models jointly, all stored in the Station.fits FitCollection object). Using the methods provided by the Station class ensures that whenever a new dataset is added (or removed), all three elements are initialized (or deleted) appropriately. While this separation might appear somewhat confusing, it is necessary to enable easy access to individual objects while preserving flexibility. For example, a Timeseries object is physically independent of whatever model one wants to apply to it, and therefore the code should reflect this (i.e., the Timeseries object should not change when a model is added or removed, or when an individual model is evaluated to yield a prediction). The separation into data, models, and fits also allows for the same dataset to easily have different models at different stations, or multiple models of the same class (e.g., two sets of step functions, one for maintenance and one for earthquakeinduced steps). Using the Timeseries class also for fits (i.e., model-predicted timeseries) allows for the efficient re-use of practical Timeseries methods such as file storage or mathematical operations.
On the lowest level, the Model and Timeseries objects store their data using standard NumPy arrays and pandas DataFrames (McKinney, 2010; The pandas development team, 2021), respectively, enabling seamless integration with existing Python-based workflows.
The open-source nature of the code, along with a defined hierarchical, object-oriented structure, allows for easy modification and extension by the user through subclassing. For example, storing additional station metadata such as antenna information can easily be implemented by creating a Python class inheriting from the Station class and extending the initialization function to accept additional instance variables. Another example is the implementation of new user-defined models by subclassing Model which then seamlessly integrate into the rest of DISSTANS's workflow. Finally, loading timeseries data from a custom data format can be integrated into DISSTANS by subclassing the Timeseries class. In fact, all of the included models (see below) and timeseries file formats are subclasses of Model and Timeseries, respectively, and can be used as examples by users wishing to extend the code functionality.
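As a minimal illustration of this subclassing pattern (BaseStation and AntennaStation are hypothetical stand-ins, not the actual DISSTANS classes):

```python
class BaseStation:
    """Minimal stand-in for a Station-like container (illustrative only)."""
    def __init__(self, name, location):
        self.name = name
        self.location = location  # e.g., (latitude, longitude, altitude)
        self.timeseries = {}

class AntennaStation(BaseStation):
    """Subclass extending initialization with extra metadata, mirroring
    the antenna-information example in the text."""
    def __init__(self, name, location, antenna=None):
        super().__init__(name, location)
        self.antenna = antenna  # additional instance variable

sta = AntennaStation("CASA", (37.6, -118.9, 2390.0), antenna="choke ring")
print(sta.name, sta.antenna)  # CASA choke ring
```

Because the subclass calls the parent initializer, it remains usable everywhere the parent class is expected.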

A.2. Models
DISSTANS uses a linear combination of parametric models. Parametric models linear in their coefficients (i.e., not necessarily composed of linear functions) allow both simple unregularized as well as more complex L2-, L1-, or L0-regularized least-squares fitting (more detail about regularization schemes is given in Appendix A.3). Furthermore, estimating multiple models jointly is straightforward as they are linearly added together, and the joint mapping (or design) matrix is simply a horizontal stack of all the models' individual mapping matrices (all of which is handled automatically by the ModelCollection class). Lastly, the formal model parameter covariance matrix can usually be computed in closed form.
The individual Model classes included in DISSTANS can be separated into basic and spline models. All models can be used with one or multiple data components. The basic models currently included are: Polynomial, Step, Sinusoid, Logarithmic, Exponential, HyperbolicTangent and Arctangent. The basic models are either single functions (e.g., logarithm), or their functions form orthogonal bases within their class (e.g., polynomials). The spline modeling in the BSpline and ISpline models is based on Hetland et al. (2012) and Riel et al. (2014), with each model containing multiple cardinal B-splines or integrated B-splines (respectively) of the same timescale and order but with different center times. The SplineSet combines several BSpline or ISpline models of different timescales into one large collection, forming a linearly-dependent (overcomplete) spanning set able to approximate arbitrary functions. The AmpPhModulatedSinusoid estimates a sinusoid of a given nominal frequency, but allows the instantaneous amplitude and phase to vary. Time-varying properties are enabled by modeling the linear sine and cosine coefficients of the sinusoid as being defined by a linearly-independent set of B-spline basis functions over the given time interval. Some form of regularization is necessary to obtain a meaningful result when using an overcomplete set of splines.

A.2.1. Joint Mathematical Formulation
In DISSTANS, the joint model m(t) is the sum of all the individual models contained in a ModelCollection. Each individual constituent m_j(t) (described by Model objects) can again be a linear superposition of functions g_i(t) and corresponding coefficients a_i:

m(t) = Σ_j m_j(t) = Σ_{i=1}^{num_parameters} a_i g_i(t)

Here, num_parameters is the total number of all individual functions, and therefore also the number of all coefficients to be estimated. While the models are continuous in time, timeseries decomposition inherently works on discrete observations d at times t_k. Using matrix notation, the least-squares problem can be formulated as

d = G a + ε

where d ∈ ℝ^(num_observations × 1) contains the observations, G ∈ ℝ^(num_observations × num_parameters) is the mapping (i.e., design) matrix with entries G_ki = g_i(t_k), a ∈ ℝ^(num_parameters × 1) is the vector of coefficients, and ε ∈ ℝ^(num_observations × 1) is the column vector of residuals. All solvers start from this formulation to find the best set of a that minimizes a given cost function dependent on ε (potentially including regularization criteria, see Appendix A.3). The choice of the data misfit loss function implicitly defines the assumed distribution from which ε is drawn (e.g., a Normal distribution in the case of unregularized least-squares). The observations can include measurements in all three dimensions, allowing the use of cross-component covariances in the fitting process. In DISSTANS, the mapping matrices G are assembled by the get_mapping() methods, d is represented by Timeseries objects, and a is returned by the solver in Solution objects and added to each Model object.
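This joint formulation can be sketched with plain NumPy: each model contributes columns to a mapping matrix, the columns are stacked horizontally, and the coefficients are estimated jointly by least squares (all names and values are illustrative, not DISSTANS code):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 200)                       # decimal years

G_poly = np.column_stack([np.ones_like(t), t])       # offset + slope columns
G_step = (t >= 2.0).astype(float)[:, None]           # Heaviside step at t = 2
G = np.hstack([G_poly, G_step])                      # joint mapping matrix

a_true = np.array([1.0, 0.5, -0.3])                  # true parameters
d = G @ a_true + 0.01 * rng.standard_normal(t.size)  # synthetic observations

# jointly estimate all coefficients in one least-squares solve
a_hat, *_ = np.linalg.lstsq(G, d, rcond=None)
```

With low noise, a_hat closely recovers a_true; the horizontal stacking is what the ModelCollection class automates.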
In the following three subsections, we detail both the basic and the spline-based models.

A.2.2. Basic Models
The basic models in DISSTANS include functions commonly used to model geodetic timeseries:

m_Polynomial(t) = Σ_i a′_i (t − t_ref)^i
m_Step(t) = Σ_i a′_i H(t − t_step,i)
m_Sinusoid(t) = a′_1 cos(ωt) + a′_2 sin(ωt)
m_Logarithmic(t) = a′ ln(1 + (t − t_ref)/τ)
m_Exponential(t) = a′ (1 − e^(−(t − t_ref)/τ))

(and analogously for the remaining transient functions), where all a′_i are stand-ins for the overall set of coefficients a, the reference times t_ref, timescales τ, and frequencies ω can vary between models, the t_step,i are step times, and H(t) is the Heaviside function. While all of these models are available out-of-the-box, the user still has to actively specify which models to use, how many of each, and with which reference times, timescales, or periods.

A.2.3. Linearly Dependent, Overcomplete Dictionary of Splines: SplineSet
For study areas where significant transients of arbitrary shape can be found, DISSTANS offers spline-based transient modeling.
We start with the formulation of a single cardinal B-spline basis function (spline function) with reference time t_ref and timescale τ.
"Normalized" timestamps t′ can be calculated as follows:

t′ = (t − t_ref) / τ

By default, this single spline function is then shifted to multiple center times by using its timescale τ, leading to a different normalized time vector for each spline function:

t′_j = (t − t_ref) / τ − j,   j = 0 … num_splines − 1

(Here, j only indexes the spline functions.) To create the spline functions of a certain degree p (with order n = p + 1), we can then use the following relation (Butzer et al., 1988; Schoenberg, 1973):

B_n(t′_j) = (1 / p!) Σ_{k=0}^{n} (−1)^k (n choose k) (t′_j − k)_+^p

where (·)_+ denotes the positive part. This is the model represented by BSpline. Based on Hetland et al. (2012), this study uses the integrated form of this spline function to represent transients. Its mathematical representation is:

I_n(t′_j) = ∫ B_n(x) dx evaluated up to t′_j = (1 / n!) Σ_{k=0}^{n} (−1)^k (n choose k) (t′_j − k)_+^n

The final spline model (a single BSpline or ISpline object) over all the available center times is therefore

m(t) = Σ_j a_j B_n(t′_j)   (or, respectively, Σ_j a_j I_n(t′_j))

Within the SplineSet class, this model can then be repeated for different timescales τ. Fig. A2 shows example spline functions.
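The spline construction can be illustrated with a small, self-contained sketch using the standard Cox-de Boor recursion (a generic B-spline evaluation, not the package's own implementation); the integrated spline is approximated here by a cumulative sum:

```python
import numpy as np

def bspline_basis(x, knots, i, p):
    """Cox-de Boor recursion for the i-th B-spline of degree p
    (generic illustration of the cardinal-spline idea in the text)."""
    if p == 0:
        return ((knots[i] <= x) & (x < knots[i + 1])).astype(float)
    left = right = 0.0
    den1 = knots[i + p] - knots[i]
    if den1 > 0:
        left = (x - knots[i]) / den1 * bspline_basis(x, knots, i, p - 1)
    den2 = knots[i + p + 1] - knots[i + 1]
    if den2 > 0:
        right = (knots[i + p + 1] - x) / den2 * bspline_basis(x, knots, i + 1, p - 1)
    return left + right

# cardinal (uniformly spaced) knots; one degree-2 spline supported on [0, 3)
knots = np.arange(6, dtype=float)
x = np.linspace(0.0, 5.0, 501)
B = bspline_basis(x, knots, 0, 2)      # peaks at x = 1.5 with value 0.75
I = np.cumsum(B) * (x[1] - x[0])       # crude integrated spline (ISpline analogue)
```

Shifting the knot vector by integer offsets yields the family of center-time-shifted splines described above.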

A.2.4. Linearly Independent Spline Basis for Time-varying Sinusoids: AmpPhModulatedSinusoid
For study areas where the amplitude of seasonal oscillations varies significantly, DISSTANS offers varying-amplitude sinusoidal modeling. Such modeling can also be used to improve the fitting of short-term transient processes, as potentially periodic parts can then be accommodated by the seasonal model. The simple Sinusoid class models a seasonal signal, given a certain angular frequency ω, as the linear combination of a sine and a cosine, allowing one to estimate both phase and amplitude as a linear problem:

m_Sinusoid(t) = A cos(ωt − φ) = a cos(ωt) + b sin(ωt)    (10)

If we want to allow the overall amplitude to change over time, we can extend the definition of a (and similarly, b):

a(t) = ā + Δa(t)

To keep the problem linear, we can use a spline representation for Δa(t) and Δb(t):

Δa(t) = Σ_k α_k h_k(t),   Δb(t) = Σ_k β_k h_k(t)

where the α_k (and, respectively, β_k) are the parameters to estimate, and the h_k are spline basis functions (more on h_k below). Expanding m_Sinusoid(t) with the extended definitions leads to a natural separation of components:

m(t) = [ā cos(ωt) + b̄ sin(ωt)] + [Σ_k α_k h_k(t) cos(ωt) + Σ_k β_k h_k(t) sin(ωt)]

Here, the first term represents the nominal component, and the second term the deviation component. In DISSTANS, the terms correspond to Sinusoid and AmpPhModulatedSinusoid, respectively. Note that the h_k are not the same as for the dictionary of splines defined above. The dictionary is comprised of a single (cardinal) spline that is of a defined length scale (i.e., period) and centered at specified timestamps. Here, for AmpPhModulatedSinusoid, we do not need the spline to be the same one shifted and scaled; instead, we can default to the more general notion of B-splines: a complete basis for piecewise polynomials of a given degree on a given interval. This relaxation allows us to use SciPy's basis function implementation directly. Fig. A3 shows an example set of spline basis functions h_k, as well as the resulting modulated cosine and sine terms used as the spanning functions for AmpPhModulatedSinusoid.
Although it is not strictly necessary to include ā and b̄ explicitly in a(t) and b(t) (splines can also represent any constant function), the separation is useful because it allows a regularized solver to not penalize the nominal component.
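A numerical sketch of this separation (simple hat functions stand in for the spline basis; all names are illustrative): the nominal and deviation columns are stacked into one design matrix and fit jointly, and the time-varying cosine amplitude is reconstructed from the estimated coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 400)            # decimal years
w = 2.0 * np.pi                            # annual angular frequency

# hat-function basis as a simple stand-in for the B-spline basis h_k
centers = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
H = np.clip(1.0 - np.abs(t[:, None] - centers[None, :]), 0.0, None)

amp = 1.0 + 0.5 * t / 4.0                  # slowly growing cosine amplitude
d = amp * np.cos(w * t) + 0.02 * rng.standard_normal(t.size)

cols = [np.cos(w * t), np.sin(w * t)]      # nominal terms
cols += [H[:, k] * np.cos(w * t) for k in range(H.shape[1])]  # deviation (cos)
cols += [H[:, k] * np.sin(w * t) for k in range(H.shape[1])]  # deviation (sin)
G = np.column_stack(cols)

a_hat, *_ = np.linalg.lstsq(G, d, rcond=None)
a_t = a_hat[0] + H @ a_hat[2:7]            # reconstructed amplitude a(t)
```

The reconstructed a(t) tracks the true growing amplitude, even though the fit itself is a single linear problem.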

A.3. Solver Functions
The provided solver functions are least-squares (therefore parametric) solvers, with varying degrees of added complexity. They each:
1. Build the mapping and observation matrices (G and d, respectively, see Appendix A.2) for a given Timeseries object of observations and a ModelCollection object,
2. Divide the solution process into independent subproblems if there is no data component covariance (decreasing the computational burden),
3. Call a lower-level solver to minimize the cost function ‖d − Ga‖₂² (potentially subject to regularization),
4. Optionally calculate the formal model parameter covariance matrix C_a, and
5. Return a Solution object (containing the best-fitting a).
To prevent convergence or numerical issues, the solvers and the Solution class keep track of model parameters that cannot be estimated (because they are not observable given the timespan of the observations) or should not be estimated (useful, for example, if some splines in a SplineSet are assumed to be zero). The regularized solvers additionally keep track of which model's parameters should be regularized, allowing for a flexible regularization approach.
The first, most basic solver is linear_regression(), which provides the above-mentioned features as a wrapper to the least-squares routine in SciPy. It can therefore be regarded as a minimal code example for new, user-defined solvers. The cost function to be minimized is

J(a) = (d − Ga)ᵀ C_d⁻¹ (d − Ga)

and the posterior covariance matrix given the data covariance matrix C_d is

C_a = (Gᵀ C_d⁻¹ G)⁻¹

where (·)⁻¹ is the generalized pseudo-inverse for matrices. The second provided solver, ridge_regression(), adds L2 regularization, and also relies on the least-squares routine in SciPy. It minimizes the cost function

J(a) = (d − Ga)ᵀ C_d⁻¹ (d − Ga) + λ ‖a_reg‖₂²

where λ is a chosen regularization penalty hyperparameter, and a_reg is the subset of a that should be regularized. Furthermore, λ can vary between data components to account for different noise levels. The posterior covariance matrix takes the regularization into account:

C_a = (Gᵀ C_d⁻¹ G + λ I_reg)⁻¹

where I_reg is the identity restricted to the regularized parameters. The third solver, lasso_regression(), uses CVXPY (Diamond and Boyd, 2016; Agrawal et al., 2018) to provide L1 and, by means of weighted iterations, station-specific L0 regularization (Candès et al., 2008). In its basic form, the solver minimizes

J(a) = (d − Ga)ᵀ C_d⁻¹ (d − Ga) + λ ‖a_reg‖₁

By defining a reweighting function and iterating on the L1-regularized solution, the lasso_regression() solver approximates the solution for the L0-regularized least-squares problem, minimizing

J(a) = (d − Ga)ᵀ C_d⁻¹ (d − Ga) + ‖a_reg‖₀

Because the result of an L0-regularized solution is approximately the same as if an unregularized problem was solved with only a subset of model parameters to be estimated, the posterior covariance matrix for lasso_regression() is the same as for linear_regression(), but setting to zero the covariances which were not estimated.
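The effect of the L2 penalty can be demonstrated with the closed-form ridge solution, a standalone sketch in which the data covariance is taken as the identity (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((100, 5))
a_true = np.array([2.0, -1.0, 0.5, 0.0, 1.5])
d = G @ a_true + 0.1 * rng.standard_normal(100)

def ridge(G, d, lam):
    """Closed-form ridge solution a = (G^T G + lam I)^(-1) G^T d."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ d)

# the parameter norm shrinks monotonically as the penalty grows
norms = [np.linalg.norm(ridge(G, d, lam)) for lam in (0.0, 10.0, 1000.0)]
```

With lam = 0 this reduces to ordinary least squares; increasing lam trades data misfit for smaller parameters.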
Note that the reweighting function does not explicitly appear in the L0 cost function, although it does mimic the role of the regularization penalty λ from the L2 and L1 cost functions. Specifically, during the iteration process, the reweighting function returns penalties that anticorrelate with parameter amplitude: small parameters will be penalized heavily, and large parameters will receive very little penalty. With this approach, the solver converges to the L0-regularized solution where parameters either have near-zero penalty (if the parameters are deemed significant by the solver), or near-zero amplitude (because of their high penalty). Consequently, the final cost function does not contain an explicit penalty hyperparameter λ, although care needs to be taken when specifying the reweighting function such that it is able to distinguish insignificant from significant parameters. Thresholds for this distinction are not hard cut-offs; they are defined within the context of ReweightingFunction objects, and usually correspond to the location of an L-shaped bend in the reweighting function's shape. While the appropriate choice of functions and scales will vary between applications, a good (empirical) starting point is given by functions whose penalties close to an input value of zero are of a similar order of magnitude as the data being fitted. For more details about the implementation of the L0-regularized solver, including examples of reweighting functions, see Candès et al. (2008).
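A minimal sketch of such a reweighting function (a hypothetical shape, not the exact DISSTANS implementation): penalties anticorrelate with parameter magnitude, with eps acting as the scale that separates insignificant from significant parameters.

```python
def inverse_reweight(magnitude, eps=1e-4, scale=1.0):
    """Return a penalty that is large for near-zero parameters and
    small for large parameters (cf. Candes et al., 2008)."""
    return scale / (abs(magnitude) + eps)

# small parameters are penalized heavily, large ones barely at all
w_small = inverse_reweight(1e-5)   # near-zero parameter -> large penalty
w_large = inverse_reweight(10.0)   # significant parameter -> tiny penalty
```

Iterating the L1 solve with these weights drives insignificant parameters toward exactly zero.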
The Network.spatialfit() method extends the possibilities of station-specific L0 regularization to also take into account the weights of a given model at nearby stations. The approach implemented here closely follows Riel et al. (2014), with the goal of identifying signals close to the noise floor, suppressing local noise, and promoting models that are sparse in both time and space. A visual summary of the method is given in Fig. A4. DISSTANS is able to perform the station-specific fits in parallel, resulting in a large runtime improvement. Lastly, Network.spatialfit() can also minimize a jointly L1- and L0-regularized problem, in which an additional L1 penalty term is added to the iteratively reweighted cost function.
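The spatial combination step can be sketched as a median over the per-station weights of the same parameter (a schematic of the idea in Riel et al., 2014; the numbers are made up):

```python
import numpy as np

# rows: stations, columns: penalty weights for the same transient parameters
# (low weight = significant parameter, high weight = insignificant)
weights = np.array([
    [0.1, 9.0, 8.0],   # station A: first spline significant
    [0.2, 9.5, 0.1],   # station B: first and third significant
    [0.1, 8.0, 9.0],   # station C: first significant
])

# shared penalty per parameter: the median across stations
combined = np.median(weights, axis=0)
# the first parameter keeps a low penalty (significant network-wide),
# the third is demoted (only significant at a single station)
```

Using the median makes the combination robust to a single noisy station, which is exactly the local-noise suppression described above.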

A.4. Data Formats
All timeseries datasets are stored as objects of Timeseries subclasses. The Timeseries parent class defines an internal data structure that all further processing done by DISSTANS methods at all levels relies on. DISSTANS also implements properties such as the calculation of a timeseries' length or reliability, the possibility to use Python's built-in mathematical operators to create new timeseries, and convenience functions such as cutting the timeseries or building covariance matrices at a particular timestep.
Subclasses, in turn, define how any particular input file gets loaded to match the common structure. The two provided subclasses are GipsyTimeseries (for JPL's GipsyX .tseries files) and UNRTimeseries (for UNR's .tenv3 files). User-defined classes can easily be created by adhering to the format of the two existing subclasses, and checking the documentation of Timeseries. Figure A4: Flowchart of the spatiotemporal L0-regularized solver as described in Riel et al. (2014). Symbols and colors as in Fig. A5. At each station, an L1-regularized least-squares fit is computed, where each parameter has an associated weight. The weight is inversely correlated to the parameter magnitude. Parameters close to zero are iteratively penalized, whereas significant parameters have their penalty gradually reduced to zero. Iterated L1 regularization effectively approximates an L0-regularized solution (see Candès et al., 2008). By combining the weights between stations with a median in an intermediate step, parameters that are significant at nearby stations as well are promoted, and parameters that are insignificant are demoted.

A.5. Synthetic Data
The creation of synthetic data is another feature directly integrated into DISSTANS. Each Model and ModelCollection object has the two methods read_parameters() and evaluate(), which integrate into existing Python workflows by accepting and returning (respectively) NumPy arrays. A typical workflow to generate datasets is therefore to instantiate Model objects (e.g., a polynomial of a certain order), define and read in the parameters of the model, and finally evaluate the individual models (or a ModelCollection containing the individual ones). If the data is then to be used within DISSTANS, a simple Timeseries constructor exists for NumPy arrays; otherwise, one can use the regular NumPy methods for exporting the data.
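Conceptually, the described workflow reduces to a few lines of plain NumPy when the Model machinery is replaced by direct evaluations (a hedged sketch with illustrative constituents and parameter values):

```python
import numpy as np

t = np.linspace(0.0, 2.0, 100)                   # decimal years

# evaluate each constituent separately, then sum into one timeseries
secular = 0.5 + 2.0 * t                           # polynomial constituent
seasonal = 1.5 * np.sin(2.0 * np.pi * t)          # annual sinusoid
step = 3.0 * (t >= 1.0)                           # maintenance offset at t = 1

synthetic = secular + seasonal + step             # joint evaluation
```

In DISSTANS, the same sum is produced by evaluating a ModelCollection after reading in the chosen parameters.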

A.6. Step Detector
DISSTANS includes the StepDetector class to perform statistics-based assessments on whether step models should be added to a timeseries to estimate offsets due to physical (e.g., earthquakes) or non-physical (e.g., maintenance events) processes. Since there is no fully-automated algorithm that approaches the performance of manual inspection (e.g., Gazeaux et al., 2013), the focus here lies on providing a semi-automated method that is to be used in conjunction with end-user interaction. The method implemented in DISSTANS is based on the Akaike Information Criterion (AIC, e.g., Burnham and Anderson, 2002). For a given window size, the algorithm will evaluate the residuals of fitting the timeseries with two different models: one with a simple linear slope and offset, and the other with a linear slope, offset, and an additional offset in the center of the window. Then, for each timestep, the relative probability of the step versus the no-step model being true is calculated. In the last step, the maxima of the step probabilities are calculated and thresholded. The user can then examine the steps (alongside their respective variance reductions) and determine whether to add the offsets as steps to be estimated.
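The window test can be sketched as follows (a simplified, standalone version of the described AIC comparison; the windowing, thresholding, and maxima extraction of the actual StepDetector are omitted):

```python
import numpy as np

def aic(rss, n, k):
    """AIC for a least-squares fit with k parameters (Gaussian errors)."""
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(3)
t = np.arange(100, dtype=float)
d = 0.01 * t + 2.0 * (t >= 50) + 0.1 * rng.standard_normal(t.size)

G0 = np.column_stack([np.ones_like(t), t])               # slope + offset
G1 = np.hstack([G0, (t >= 50).astype(float)[:, None]])   # ... + centered step

rss0 = np.sum((d - G0 @ np.linalg.lstsq(G0, d, rcond=None)[0]) ** 2)
rss1 = np.sum((d - G1 @ np.linalg.lstsq(G1, d, rcond=None)[0]) ** 2)
aic0, aic1 = aic(rss0, t.size, 2), aic(rss1, t.size, 3)
# a lower AIC for the step model indicates a likely offset in the window
```

Repeating this comparison for every window position yields the per-timestep step probabilities described above.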
Our step detector approach is similar to the one in GipsyX (Bertiger et al., 2020), but where GipsyX considers multiple window sizes, DISSTANS only uses a single one. Of course, StepDetector can be run multiple times with different window sizes, such that their combined results can provide a more robust step probability estimate.

A.7. Example Workflow
Even though DISSTANS is modular and therefore highly flexible, we propose the workflow presented in Fig. A5 as a general starting point for timeseries decomposition with DISSTANS.
The first step is the acquisition and preparation of the raw input datasets, i.e., the GNSS network station displacement timeseries (and, if available, associated maintenance and seismic catalogs). Applying quality metrics such as requiring a minimum number of observations or station reliability (through their respective Timeseries attributes num_observations and reliability) ensures that the fitting process is not hindered by bad data.
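Such a screening step can be sketched as follows (station names and numbers are illustrative; in DISSTANS, the corresponding Timeseries attributes are num_observations and reliability):

```python
# toy station records: number of observed days vs. days the station was active
records = {
    "P123": {"n_obs": 900, "days_active": 1000},   # 90% reliable, long record
    "P456": {"n_obs": 200, "days_active": 1000},   # only 20% reliable
    "P789": {"n_obs": 300, "days_active": 320},    # reliable, but too short
}

kept = [
    name for name, r in records.items()
    if r["n_obs"] / r["days_active"] > 0.5         # reliability above 50%
    and r["days_active"] >= 365                     # record at least one year
]
print(kept)  # ['P123']
```

The thresholds mirror the 50% reliability and one-year record length used in the case study.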
We view the second step as a "preprocessing" one, where we identify and remove statistical outliers and the common mode error (CME, see Dong et al., 2006; Huang et al., 2012) from the observations (see Fig. A6 for more detail). The relevant functions are median(), clean() and common_mode(), which are called on the entire network (respecting parallelization) through the Network methods call_func_ts_return() and call_func_no_return(). The CME is systematic for the entire network, reflects noise in the estimation of the reference frame, and manifests itself as a high-frequency noise realization that should be estimated independently of model fits (which could otherwise create additional systematic errors). To estimate the CME (e.g., using PCA/ICA), we first empirically remove the potentially interesting, low-frequency signal using a low-pass running median. A median filter is robust when handling large steps in the data (which may be present before any step removal is performed). Outlier removal is performed on the residual between the low-pass and the input signal, based on the residual's variance.
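The outlier part of this sub-workflow can be sketched as follows (a standalone running-median filter with a variance-based threshold; the kernel size and threshold are illustrative, not the DISSTANS defaults):

```python
import numpy as np

rng = np.random.default_rng(4)
d = np.sin(np.linspace(0, 6, 300)) + 0.05 * rng.standard_normal(300)
d[100] += 5.0                                     # inject a single outlier

# low-pass the input with a running median (robust to steps and spikes)
kernel = 15
pad = kernel // 2
padded = np.pad(d, pad, mode="edge")
lowpass = np.array([np.median(padded[i:i + kernel]) for i in range(d.size)])

# flag outliers in the residual between input and low-pass signal
residual = d - lowpass
mask = np.abs(residual) > 10 * np.std(residual)   # conservative threshold
clean = d.copy()
clean[mask] = np.nan                               # drop the flagged epochs
```

The CME would then be estimated (e.g., via PCA/ICA) from the high-passed, outlier-free residual, as shown in Fig. A6.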
Offsets (or steps) in the data are major obstacles for model fitting. Left unaccounted for, they will influence every other model component (e.g., the secular plate velocity). Figure A5: Example workflow for using DISSTANS, explained in detail in Appendix A.7. Blue rectangles represent single computational steps, orange rectangles with cut corners represent sub-workflows discussed in more detail elsewhere, and green, rounded rectangles represent datasets at their different stages of processing. The numbered steps in the text correspond to the numbering in the top left corners of the rectangles.
While big jumps in the data can easily be spotted by comparing a measurement with the variance around the mean of previous observations, smaller offsets that are either below or similar to the data variance, and/or are accompanied by transient motion, are more challenging to detect. Ideally, all occurring offsets are known in advance based on ancillary catalogs, and can be categorized into equipment changes and physical processes. Maintenance events (e.g., antenna replacements, software changes, receiver upgrades) usually are well-recorded and accessible. Functions like parse_maintenance_table() and parse_unr_steps() are useful for these purposes. However, not all maintenance events automatically have a visible effect in the data, and therefore there are "grey zones" where the addition of a modeled step may be more harmful than beneficial. In these cases, we can perform an iterative process between fitting larger signals, and then checking again for evidence of smaller offsets.
A similar case can be made for the presence of coseismic displacements. Large, nearby earthquakes produce offsets that can be predicted from seismic catalogs and simple forward modeling of the expected displacement at any given station. (The earthquakes module provides this functionality in DISSTANS.) However, smaller events might not necessarily warrant an additional modeled step, and very fast transients would be better fit by transient models. Therefore, we recommend an iterative approach here as well.
The next steps in the proposed workflow are iterations of step detection and model fitting. In the third step, an unregularized least-squares fit with only a polynomial and some sinusoidal models is performed at each station individually. Using the StepDetector class, extremely prominent offsets in the data are well resolved, and are added to a list of offsets to be fit (with the Step model class).

Figure A6: Preprocessing sub-workflow, following the same symbols and coloring as Fig. A5 (step 2), with rose circles representing mathematical operations. First, a running median of the input is calculated, which results in a lowpass-filtered timeseries. The variance of the input around the lowpass timeseries is used to detect outliers. Removing them from the input yields the outlier-free input. Without common mode estimation, this is also the final output. To remove the common mode, the difference between the lowpassed input and the outlier-free input is calculated, which yields an outlier-free, highpassed input. The dominant component of this timeseries is the best estimate of the common mode error. Removing this from the outlier-free input yields the outlier-free, common-mode-removed output.
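A minimal offset-screening heuristic along these lines (comparing window means against the local scatter) can be sketched as follows; detect_offsets and its parameters are illustrative and not the actual StepDetector implementation:

```python
import numpy as np

def detect_offsets(ts, window=30, threshold=10.0):
    """Flag epochs where the difference between the means of the trailing
    and leading windows greatly exceeds the pooled local scatter; within a
    contiguous run of flagged epochs, keep only the strongest detection.
    Names and parameters are illustrative, not the DISSTANS StepDetector."""
    flagged = []
    for i in range(window, len(ts) - window):
        before, after = ts[i - window:i], ts[i:i + window]
        jump = abs(after.mean() - before.mean())
        scatter = np.sqrt((before.var() + after.var()) / 2) + 1e-12
        if jump / scatter > threshold:
            flagged.append((i, jump / scatter))
    # merge contiguous flagged epochs into single step candidates
    detections, run, prev = [], [], None
    for i, ratio in flagged:
        if prev is not None and i != prev + 1 and run:
            detections.append(max(run, key=lambda x: x[1])[0])
            run = []
        run.append((i, ratio))
        prev = i
    if run:
        detections.append(max(run, key=lambda x: x[1])[0])
    return detections
```

Applied to a series with a single clear offset, the function returns one candidate epoch at the jump; real step detection additionally has to contend with data gaps, transients, and uneven noise.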
In the fourth step, using the initial simple models, the defined list of offsets, and a SplineSet dictionary of long-term transient splines, another unregularized least-squares fit is computed. Together with external maintenance and seismic catalogs, a second run of the StepDetector then aims to identify smaller steps that are to be estimated.
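The idea of a multiscale transient dictionary can be sketched as follows. DISSTANS's SplineSet assembles spline bases over several timescales; this stand-in uses simple smoothstep ramps instead, and all names are illustrative:

```python
import numpy as np

def transient_dictionary(t, scales):
    """Build a design matrix of ramp-like transient basis functions: for
    each timescale in `scales`, place one ramp of that duration on a
    regular grid of centers spanning the timeseries. Smoothstep ramps
    stand in for the spline bases of an actual SplineSet; names are
    illustrative."""
    cols = []
    for s in scales:
        centers = np.arange(t[0], t[-1] + s, s)
        for c in centers:
            # ramp rising from 0 to 1 over [c - s/2, c + s/2]
            x = np.clip((t - c) / s + 0.5, 0.0, 1.0)
            cols.append(x * x * (3.0 - 2.0 * x))  # smoothstep
    return np.column_stack(cols)
```

Stacking such a dictionary next to polynomial, sinusoidal, and step regressors yields the combined design matrix used in the unregularized fit; the regularized fits discussed below then enforce sparsity on the dictionary coefficients.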
For the fifth step of the proposed workflow, the Network.spatialfit() method is used to perform a network-wide fit using the aforementioned polynomial, sinusoidal, and step models, as well as an expanded spline dictionary that also includes shorter-term transients. Only the spline parameters are subject to the spatiotemporal L0 regularization, although the regularization can be extended to all models. Defining an appropriate ReweightingFunction ensures a sparse, yet well-fitting solution. When seasonal effects vary strongly in time, allowing the seasonal signal's amplitude to vary over time (using AmpPhModulatedSinusoid models) can also improve the fit.
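The effect of an L0-style reweighted regularization can be illustrated with a self-contained iteratively-reweighted least-squares sketch. This mimics the role a ReweightingFunction plays (penalizing small coefficients ever more strongly so they collapse to zero, while leaving large ones essentially untouched), but it is not the DISSTANS solver, and all names are illustrative:

```python
import numpy as np

def sparse_fit(G, d, lam=1.0, eps=1e-3, n_iter=20):
    """L0-like sparsity via iteratively reweighted least squares: each
    iteration solves a ridge problem whose per-coefficient weight
    lam / (m_j**2 + eps**2) is large for coefficients that were small in
    the previous iterate, driving them toward zero. Illustrative only."""
    # start from the unregularized least-squares solution
    m = np.linalg.lstsq(G, d, rcond=None)[0]
    for _ in range(n_iter):
        W = np.diag(lam / (m**2 + eps**2))
        m = np.linalg.solve(G.T @ G + W, G.T @ d)
    return m
```

With a dictionary where only a few columns carry true signal, the iteration recovers the support while suppressing the remaining coefficients to near zero, which is the qualitative behavior the spatiotemporal L0 fit exploits.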
The results at each step are a set of model parameters for each data component, together with a complete parameter covariance matrix. They can be evaluated at all stages to yield the overall model-predicted timeseries (including its predicted uncertainty), as well as the individual constituent contributions. The residuals can be computed using the Network.math() method and analyzed using the Network.gui() method to ensure no systematic misfit is present. Of course, there are many variations to this example workflow.

A.8. Visualization
Because the raw data contained within Timeseries objects are standard pandas DataFrames, they can be plotted with standard Matplotlib code using the Timeseries.time and Timeseries.data attributes. Relying on commonly-used Python object formats enables easy inspection of a particular station, timeseries, or fit, and allows for nonstandard, user-defined plotting. Model parameter values and covariances (accessed through the Model.parameters and Model.covariances NumPy array attributes) are also directly plottable with Matplotlib.
There are high-level visualization routines already included in DISSTANS. The core functionality is contained within the Network.gui() method, which provides a clickable map of the network (optionally with a satellite imagery background) and a separate figure with all the timeseries contained at a station. If a timeseries has fitted models, the overall model prediction is plotted and can optionally be split into the different model components; if SplineSet models are present at a station, a scalogram can be shown. All figures can also be saved directly to files.
Furthermore, to visualize station motion in a map view, the Network.wormplot() method can produce still maps and animated videos of station displacements (or individual model constituents of them). Lastly, the Network.graphical_cme() method performs common mode estimation (see Appendix A.7) and presents the temporal and spatial components separately for validation purposes.
Tobias Köhne received his B.S. (2016) in Mechanical Engineering from the Technical University of Munich, Germany, and his M.S. (2018) in Aerospace Engineering from the University of Texas at Austin, USA, focusing on orbital mechanics. He is currently a Ph.D. candidate at the California Institute of Technology, where he is working on modern tools for geodetic timeseries analysis of large-scale, continuously-operating regional GNSS networks. At Caltech, he was also part of an investigation into how machine learning approaches can help extract more information from currently available InSAR timeseries datasets, and worked on a small study of the dynamics and origins of retrograde Jupiter Trojans. His research interests include processes associated with the seismic cycle, migration of magma and water in the subsurface, tides, and glacial rebound; tectonics and the relationship between short- and long-timescale processes; glaciology, particularly basal mechanics and ice rheology; tools and applications using space geodesy, particularly GNSS and SAR; Bayesian methods for large geophysical inverse problems; and the application of space geodesy for monitoring and rapid response to natural disasters. Prof. Simons is a fellow of the American Geophysical Union.
Figure S3: Scalograms of the transient spline coefficients at stations Jeckle and Cylon for the different solvers ((a) Jeckle, L1; (b) Cylon, L1; (c) Jeckle, Local L0; (d) Cylon, Local L0; (e) Jeckle, Spatial L0; patches colored by coefficient value). The horizontal and vertical axes correspond to time and the discrete periods of the splines, respectively.
Patches (colored by the spline coefficient's value) in this time-period space represent a single spline in the dictionary, with their extent in time defined as the active period of the spline (i.e., having non-zero gradient), and their height defined by the relative magnitude of the particular spline compared to all splines active at that time. Using the L1 solver, the transients (two short-term, one long-term) are sparsely fitted in time, but not in space (i.e., each station's timeseries is fit using different splines). The local L0 regularization does not change this general behavior. Spatial L0 regularization leads to the transients being sparsely fitted in time and space (i.e., every station's timeseries is fit with a similar set of splines). Modeling the transients with coefficients sparse in time, space, and period is beneficial in the context of identifying signals close to the noise floor that appear at multiple stations, since the respective coefficients will be penalized less, allowing for a more physically-consistent decomposition. Conversely, the penalization of coefficients that are only seen at isolated stations makes it easier to identify local short-term noise processes.

Table S1: Root-mean-squared error between the secular velocity estimates and the true secular velocity, averaged across the entire network, for the methods presented in Fig. S5 and both data components individually. The spatial L0 solution significantly outperforms the other solutions (including the local L0 solution).

S.2 Influence of Number of Stations
The code for this analysis, the synthetic model parameters, and the exploration of additional hyperparameters can be found in Tutorial 5 of the online documentation.
In this section, we use a synthetic network of N = 20 stations, distributed randomly, that is only affected by a single transient process and white noise, to explore the dependence of the model error on the number of stations used. The noise level relative to the maximum amplitude of the transient signal, r, is one of the hyperparameters we vary. The other variable is the number of stations 2 ≤ n ≤ N used by the spatial L0-regularized solver. For each test case, we therefore subsample the original network to create a subnetwork of smaller size n, comprised of randomly selected stations. (We also calculate the result of using a local L0-regularized solver for comparison, where by construction n = 1.) The number of samples S for each n to test is given by the smaller of the number of possible station combinations and a defined maximum value based on computational considerations (S = 50 in our case).
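The sampling rule above can be written down directly (the function name is illustrative):

```python
from math import comb

def n_samples(n_total, n_sub, s_max=50):
    """Number of random subnetworks drawn for subnetwork size n_sub: the
    smaller of the number of distinct station combinations and the
    computational cap s_max. Illustrative transcription of the text."""
    return min(comb(n_total, n_sub), s_max)
```

For example, with 20 stations there are only 20 distinct single-station subnetworks, so fewer than 50 samples are possible, while for most intermediate sizes the cap of 50 applies.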
For each r and each n, we therefore have S samples to test. The metric we choose to compare is the root-mean-squared true model error (RMSE), calculated from the final fit of each sampled subnetwork (ensuring the solvers iterate long enough to converge). For each n, we then compute the double mean of the RMSE: first across the subnetwork, and then across samples. We also compute the standard deviation across samples of the subnetwork-wide mean RMSEs. Fig. S6 shows the results of our experiment. For all cases, the mean RMSE decreases with an increasing number of stations used in the fitting process (approximately as 1/√n). Furthermore, the variance of the errors decreases as well. Importantly, for the case of r = 3 (i.e., the white noise standard deviation is three times the maximum magnitude of the transient signal), the local L0-regularized solution has a high error variance centered close to the maximum allowable error (defined as not fitting a transient at all). Including multiple stations in the estimation process, however, decreases the mean error and error variance significantly; with 20 stations, the mean error is as low as that of the local L0-regularized solution for r = 1. In the highest-noise case presented here, r = 10, most local L0-regularized solutions actually overfit the data. Incorporating spatial awareness prevents the solver from doing so. Overall, as shown by the reduction of error, error variance, and susceptibility to overfitting, the importance of spatial awareness for transient model fitting becomes clear.
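The double mean and the sample spread described above can be computed as follows (a literal transcription of the description; names are illustrative):

```python
import numpy as np

def summarize_rmse(per_sample_rmse):
    """Double mean of the RMSE: first across the stations of each sampled
    subnetwork, then across samples; plus the standard deviation of the
    per-sample subnetwork means. `per_sample_rmse` is a sequence of 1-D
    arrays (one per sample) of per-station RMSE values."""
    sample_means = np.array([np.mean(r) for r in per_sample_rmse])
    return sample_means.mean(), sample_means.std()
```

Applying this to the S sampled subnetworks of each size n yields the curves and error bars of Fig. S6.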

S.3 Long Valley Caldera: Secular Velocity Comparison
The code for this analysis can be found in Example 2 of the online documentation.

S.3.1 Method
Qualitatively, our results for the transient and seasonal constituents are comparable to previously published ones (e.g., Montgomery-Brown et al., 2015). To quantitatively validate the decomposition of the input timeseries from the Long Valley Caldera Region (LVCR) into its different constituents, we would need published, already-decomposed timeseries for the same study area. However, we are not aware of such products, and reproducing decompositions from individual studies is beyond the scope of this paper. A different way to quantitatively validate our method is to recognize that if it successfully distinguishes motion due to transient processes from long-term, secular motion (while still taking into account seasonal, seismic, and maintenance signals), then the resulting estimates of secular motion should be free of physical influences other than long-term plate motion and deformation. Specifically for the Long Valley Caldera, we would expect that on top of a "background" field of motion, we would not see any influence from the magmatic caldera inflation. (Of course, if the caldera intrusion has a steady-state component, we would have no way of inferring this from GNSS data alone, and we therefore neglect this possibility.) Geodetically, we can fit an average field of motion by assuming our study area (a circle of 100 km radius around the caldera center) is, to first order, approximated by a rigid body moving on a sphere. We can then estimate a best-fit rotation matrix (or equivalently, an Euler pole) using standard weighted least-squares (e.g., Goudarzi et al., 2014).
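The rigid-rotation fit can be sketched as a linear least-squares problem: each station velocity is modeled as v_i = ω × r_i, so stacking the corresponding cross-product matrices of the station positions gives a linear system for the rotation vector ω. The sketch below is unweighted for brevity (the actual estimation uses weighted least squares, cf. Goudarzi et al., 2014), and the function name is illustrative:

```python
import numpy as np

def fit_euler_vector(positions, velocities):
    """Estimate the rotation vector omega of a rigid body on a sphere
    from station positions r_i (ECEF, meters) and velocities v_i, using
    the model v_i = omega x r_i. Unweighted least squares; illustrative
    sketch, not the DISSTANS implementation."""
    rows = []
    for r in positions:
        # omega x r = -[r]_x omega, with [r]_x the cross-product matrix
        rows.append(np.array([[0.0, r[2], -r[1]],
                              [-r[2], 0.0, r[0]],
                              [r[1], -r[0], 0.0]]))
    G = np.vstack(rows)                 # (3n, 3) design matrix
    d = np.asarray(velocities).ravel()  # stacked velocity components
    omega, *_ = np.linalg.lstsq(G, d, rcond=None)
    return omega
```

The predicted background velocity at any station is then ω × r, and subtracting it from the modeled secular velocities yields the residual fields discussed below.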
In this section, we compare the results obtained using DISSTANS and its spatiotemporal L0 regularization approach with published MIDAS-derived secular velocities and the Geodesy Advancing Geosciences and EarthScope (GAGE) facility's secular velocities. We first build a Network object that contains all three velocity models. Then, the Network.euler_rot_field() method calculates the predicted velocity due to best-fit rigid motion on a spherical Earth. Lastly, we remove the best-fit "background" secular velocities from the previously estimated "model" secular velocities to produce "residual" secular velocities. The smaller these residuals, the better our study area can be approximated by a rigid plate.
In our comparison, we do not include methods that rely on a priori removal of a secular velocity in order to extract the transient. These approaches currently represent the majority of transient extraction methods, e.g., in combination with filtering, with vbICA (if the timeseries is strongly correlated), with Singular Spectrum Analysis (SSA, e.g., Walwer et al., 2016), or with stacking (e.g., Kano and Kato, 2020). We omit these methods on the basis that they either do not claim to capture the long-term secular velocity in the first place (e.g., because the analyzed timespans are short, Kano and Kato, 2020) or that they rely heavily on assumptions (e.g., assuming that a certain timespan represents steady-state velocity, Silverii et al., 2020).

S.3.2 Results
We first note that neither the MIDAS nor the GAGE solution explicitly aims to model transient processes, with the exception of postseismic, decaying transient motion (although MIDAS does aim to be robust against short-term transients). In general, our comparison is therefore not a "fair" one: both the MIDAS and GAGE velocity fields are estimated in a fully automated fashion for the majority of global GNSS stations, and they provide high-quality, reliable secular velocity estimates. The goal of this subsection is simply to highlight the differences in model results owing to our explicit modeling of transient processes.

Table S2 (fragment): For stations outside the LVCR, the models produce approximately the same residual RMS (within approx. 3%), but within the LVCR, DISSTANS reduces the residuals by approx. 25-47%.

Figure S8: Background velocity fields as calculated by the best-fit Euler pole for the entire study area and the Long Valley Caldera Region in the upper and lower panels, respectively. Uncertainties, fault outlines, and colors are the same as in Fig. S7.

The DISSTANS-derived background velocity field differs slightly from the MIDAS- and GAGE-derived fields, but exhibits the same overall pattern. Calculating the best-fit velocities assuming rigid-body motion for the three different solutions (independently) yields similar results (Fig. S8), with differences around or below the millimeter-per-year level. Excluding stations inside the LVCR from the estimation process also does not affect the resulting velocities significantly (differences on the order of half a millimeter per year). Fig. S9 shows the residual velocities (the difference between the modeled horizontal and the best-fit background velocities). Qualitatively, the residuals from the DISSTANS solution are visibly reduced inside the LVCR compared to the MIDAS and GAGE solutions. Specifically, the MIDAS and GAGE solutions show a clear expansion component for stations in or near the caldera itself; this expansion pattern is much less prominent in the DISSTANS solution. Outside of the LVCR, all residuals show coherent patterns of motion, indicative of the imperfection of the assumed background velocity field (see below). Table S2 quantifies the differences between model residuals using the root-mean-square (RMS) of the residual magnitudes (i.e., the lengths of the residual vectors). Crucially, in the LVCR, where we expected the residuals to decrease by modeling the transients explicitly, we find that they are reduced by 25-47% (depending on the model). This reduction implies that the modeled secular velocity field produced by DISSTANS more closely approximates the homogeneous background velocity field. The reduction of residuals inside the LVCR is accompanied by only a small increase in residuals of about 3% outside the LVCR.
Interestingly, the residual RMS of our velocity model is more similar between stations outside and inside the LVCR (range of 0.4 mm/a), whereas the residuals of the MIDAS and GAGE solutions show a larger variance (ranges of 0.5 and 1.9 mm/a, respectively).
Overall, we interpret the reduction of residuals in the Long Valley Caldera Region for the DISSTANS solution as demonstrating the benefit of spatial awareness and explicit modeling of transient processes. By separating transient from long-term motion in a spatially-aware framework, the resulting secular velocity field is more homogeneous than the MIDAS and GAGE solutions, and the effect of magmatic inflation periods on the secular velocity estimate in the vicinity of the caldera is significantly diminished.

S.3.3 Note on the assumed background velocity field
By comparing the secular velocities instead of the displacement timeseries during transient episodes, we are able to show quantitative differences between our solution and two other published secular velocity fields, the MIDAS and GAGE models. To demonstrate the effect of explicitly modeling transient processes on the resulting estimated secular station velocities, we have furthermore estimated and removed a "background" field of motion from the secular velocities, and shown the residual velocities. We take the background field of motion to be the best-fit velocity field for our small study area (100 km radius around the Long Valley Caldera center), assuming rigid body motion on a sphere. Note that we do not assume the background velocity field to represent the true underlying long-term velocity (e.g., we expect distributed shearing in our study area because of the remote North America-Pacific plate boundary), only that such a velocity field should be able to reproduce the modeled secular velocities to first order (which is the case). The estimation of the background velocities is performed using simple, unregularized, weighted least-squares. The results obtained from the three different input fields (DISSTANS, MIDAS, and GAGE) are similar, supporting our assumption that the background field captures most of the secular velocity signal. Using the residual velocity fields for our comparison therefore makes it easier to highlight the differences between the processing strategies of the three secular velocity models.