Machine learning thermobarometry and chemometry using amphibole and clinopyroxene: a window into the roots of an arc volcano (Mount Liamuiga, Saint Kitts)

The physical and chemical properties of magma govern the eruptive style and behaviour of volcanoes. Many of these parameters are linked to the storage pressure (P) and temperature (T) of the erupted magma, and the chemistry of the melt phase (X). However, reliable single-phase thermobarometers (P, T) and chemometers (X) which can recover this information remain elusive. We present a suite of new single-phase amphibole and clinopyroxene thermobarometers and chemometers, calibrated using random forest machine learning. These calibrations are used to track the range of pre-eruptive conditions, over the course of a millennial eruptive cycle, on an island arc volcano (Mount Liamuiga, Saint Kitts, Eastern Caribbean). We unpick the recent history of Mount Liamuiga, a stratovolcano that produces a dacitic eruption from the upper crust (~ 2 kbar) prior to the Lower Mansion Series eruptive sequence. This precedes a systematic increase in the temperature of crystallisation recovered by amphibole and clinopyroxene in the middle to upper crust (1.2 ± 0.5 to 5.6 ± 2.4 kbar), which correlates with a remarkable progression of matrix plagioclase chemistry to a less-evolved (more anorthitic) composition in time. Predictions of melt chemistry (SiO2, Al2O3, CaO, Na2O, K2O, FeO, MgO, TiO2) in equilibrium with clinopyroxene and amphibole delineate a liquid line of descent concordant with measured groundmass and whole rock chemistry. We also show that the regression strategy, rather than an inherent insensitivity to pressure, has hindered previous calibrations of amphibole-only barometers. By applying our new calibrations, we construct a quantitative picture of the magma plumbing system beneath an arc volcano.


Introduction
Accurately recording the pressure and temperature distribution of magma storage is critical for our understanding of volcanic igneous plumbing systems (Blundy and Cashman 2008). This includes the ability to compare the pressure and temperature of erupted products with monitoring signals to quantitatively link eruption magnitude, explosivity and duration (eruptive dynamics) to pre-eruptive magma storage conditions (Voight 1988). Assessing such parameters temporally, be it relative time in the form of stratigraphy (Sisson and Vallance 2009) or absolute time from geochronology (e.g. Shane 2013), may improve our appreciation of how, why, and when these dynamics change during an eruptive cycle (Ridolfi et al. 2008; Shane 2013).
Determinations of intensive parameters (pressure, temperature, melt composition; P, T, X) for a magma can be made using the results of equilibrium experiments, run at specific conditions to match the mineral and melt chemistry of erupted products (Sisson et al. 2005; Solaro et al. 2019). Additionally, a thermometer (T), barometer (P), or chemometer (X; recorder of equilibrium melt chemistry) can be calibrated to relate mineral chemistry to experimentally determined P-T-X (Blundy and Cashman 2008; Putirka 2008). Such statistical relationships are derived using an array of approaches: linear regression (Ridolfi et al. 2008; Ridolfi and Renzulli 2012); direct correlation with unit cell parameters (Nimis 1995; Nimis and Ulmer 1998); multi-phase reaction barometry (Pav method; Ziberna et al. 2017); linear least-squares regression with a thermodynamic basis (Putirka 2008); random forest machine learning (this study; Petrelli et al. 2020).
Amphibole and clinopyroxene are mineral phases commonly used as thermobarometers and chemometers. Clinopyroxene is typically coupled with a coexisting melt for thermobarometry (clinopyroxene-melt; Neave and Putirka 2017; Putirka 2008, 2016). The caveat is that an equilibrium melt for each clinopyroxene measurement is required for determination of P and T. Melts can be measured directly in the form of matrix glass (for crystal rims) or melt inclusions (for crystal cores/mantles). Such direct measurement requires the spatial association of liquid with crystal, which may be particularly problematic in the case of melt inclusions that can grow preferentially in certain crystal domains. Alternatively, equilibrium melts may be calculated, measured, or inferred. Schemes for this include calculating liquids by mass balance (Hammer et al. 2016), iterative melt matching (Neave et al. 2019), or pairing a combination of measured glass, calculated equilibrium melts and bulk rock chemistry with clinopyroxene analyses (Scruggs and Putirka 2018). The associated uncertainty of matching a melt with a mineral is rarely propagated. In the case of single-phase (clinopyroxene-only) barometers, an independent estimate of temperature is required (Nimis and Ulmer 1998), which introduces further uncertainty. Clinopyroxene may also be used as part of a mineral-mineral thermometer (e.g. orthopyroxene-clinopyroxene; Putirka 2008) or a multiphase barometer (spinel + clinopyroxene + olivine + plagioclase; Ziberna et al. 2017), mineral assemblage permitting. Amphibole has enormous potential for pressure and temperature determinations due to its common occurrence in hydrous arc magmas (Scaillet and Evans 1999) and compositional diversity (Leake et al. 1997). Temperature estimates may be obtained from amphibole-plagioclase (Blundy and Holland 1990; Holland and Blundy 1994), amphibole-melt (Putirka 2016), and in some cases, amphibole-only (Anderson and Smith 1995; Erdmann et al. 2014; Ridolfi and Renzulli 2012; Shane 2013). Calcic amphibole has also been successfully calibrated as a chemometer for prediction of equilibrium melt chemistry (Zhang et al. 2017). However, its use as a reliable single-phase barometer is debated. It is best applied in multiply saturated granitic systems (T < 800 °C; Fe# 0.40-0.65; Anderson and Smith 1995; Putirka 2016), whereas calibrations of amphibole-only barometers for broader conditions (e.g. Ridolfi and Renzulli 2012) have been widely criticised (Erdmann et al. 2014; Putirka 2016; Shane 2013). Additionally, errors from amphibole barometry are consistently quoted at ± 3-4 kbar, which approaches the edge of usability for identifying the loci of pre-eruptive magma storage (Putirka 2016).
The Eastern Caribbean (Lesser Antilles) island arc has been an area of recent interest to determine crustal structure and composition (Kopp et al. 2011; Melekhova et al. 2019), water flux (Cooper et al. 2020), and the architecture of sub-volcanic systems (Melekhova et al. 2017). This is due to its reputation for significant volcanic hazards (Lindsay 2005) and as an exemplar of a slow subduction zone (Wadge and Shepherd 1984). Particular attention has been paid to constraining the sub-volcanic crustal structure using intrusive fragments (widely referred to as "plutonics", "cumulates", "xenoliths", or "inclusions") along the arc by employing experimental petrology (Melekhova et al. 2015, 2017; Stamper et al. 2014), thermodynamic modelling (Stamper et al. 2014), and a host of thermobarometers (Camejo-Harry et al. 2018). Mineralogically, these intrusive fragments reveal the importance of amphibole-clinopyroxene phase relations within the crust: they near ubiquitously contain one or both phases, typically displaying partially complete reactions between the two (Cooper et al. 2016; Melekhova et al. 2017, 2019). Nonetheless, such intrusive fragments are largely found ex situ (Arculus and Wills 1980) and so lack temporal (with regards to the eruption) or spatial (with regards to the eruptive centre) information. In contrast, the excellent exposure of pyroclastic fall and flow deposits in the Eastern Caribbean (Baker and Holland 1973; Howe et al. 2014) provides a preferable record of time-integrated P-T-X which may be linked to evolving conditions in the sub-volcanic system. However, natural fragmentation in pyroclastic rocks (e.g. Higgins et al. 2021) and compositional diversity in silicate phases at thin-section scale (notably plagioclase; Toothill et al. 2007) allude to a dynamic history of storage and amalgamation prior to eruption.
We present a suite of new thermobarometers and chemometers (Online Resource 1; amphibole-only and clinopyroxene-only), calibrated using random forest machine learning. This approach has proved successful for clinopyroxene thermobarometry applied to alkalic magmas from Iceland (Petrelli et al. 2020). We use a section of the Lower Mansion Series stratigraphy (Baker 1969; Higgins et al. 2021; Roobol et al. 1981) on the island of Saint Kitts, Eastern Caribbean, to temporally track the variation of intensive parameters in pyroclastic samples. These are then linked to the remarkably clear increase of matrix plagioclase anorthite content (An#) in time for the same sequence (Higgins et al. 2021). Where possible, we corroborate the P-T-X results with evidence from geochemistry, petrography, and existing phase equilibrium experiments. Developing reliable single-phase thermobarometers and chemometers is critical for the wider volcanology community as they provide quantitative metrics to directly compare volcanic systems and their dynamic behaviours, a notoriously difficult task in Earth sciences (Cashman and Biggs 2014).

Geological setting
Saint Kitts (Fig. 1a) is a northern island in the Eastern Caribbean island arc, resulting from the slow (2-4 cm/year; Shepherd, 1984), westward subduction of the North American plate beneath the Caribbean plate. Recent volcanism on Saint Kitts (~ 42 ka; Roobol et al. 1981) originates from the Mount Liamuiga stratovolcano in the northwest of the island, depositing the Mansion Series stratigraphy (Baker 1969; Roobol et al. 1981; Fig. 1a). The Mansion Series consists of 6 main units (A-F; Roobol et al. 1981). The older Lower Mansion Series, the focus of this study, comprises the Lower Green Lapilli (A), Cinder Unit (B), and Upper Green Lapilli (C). The Green Lapilli layers are primarily grey-green, angular, aphyric, micro-vesicular lapilli of andesitic composition, a rock type not noted elsewhere in the Eastern Caribbean. The eruptive sequence resumes at 4270 ± 140 BP until 2070 ± 150 BP (Baker 1985) with units D-F. They are composed of interbedded ash, pumice fall deposits and pyroclastic flow deposits, along with intercalations of the Steel Dust Series fall deposits on the western flanks of Mount Liamuiga. Pyroclastic flow deposits, restricted to the north and west of Mount Liamuiga, are morphologically divided into bimodal andesitic block and ash flows and polymodal basaltic-andesite block and ash flows (Tate and Wilson 1988). The excellent exposure of the Saint Kitts stratigraphy has inspired several studies of chemical and physical changes in the volcanic deposits through time (Baker 1980; Baker and Holland 1973; Higgins et al. 2021).

Fieldwork and sample preparation
Samples were collected from a 6.8 m thick stratigraphic section on the east coast of Saint Kitts (Fig. 1b; 17.38725,; midway between the villages of Mansion and Tabernacle; Higgins et al. 2021; Sheldrake and Higgins 2021). This section includes the "Pre-Mansion Series pyroclastic deposits" (> 43,000 BP) and Mansion Series units A-C (> 41,420 to > 41,730 BP), which have been dated using 14C (Harkness et al. 1994; Roobol et al. 1981). Juvenile material (pumice, mafic scoria, or volcanic ash) was collected for chemical analysis, and thicknesses of volcanic units and palaeosoils measured. Beds were sampled at changes in deposit form or macroscopic mineralogy to capture the full variability of the sequence. Samples selected for thermobarometry and chemometry were those which contained phenocrysts of amphibole or clinopyroxene. Data were also included from two in-situ intrusive fragments: one mafic intrusive fragment in SK392 (olivine + plagioclase + clinopyroxene + orthopyroxene + spinel) and one felsic intrusive fragment in SK386B (plagioclase + amphibole + quartz + Fe-oxide). For a more detailed appraisal of the sequence and whole rock textures refer to Higgins et al. (2021).

Electron probe micro-analyzer (EPMA)
In-situ mineral analyses of amphibole and clinopyroxene were made on 30 μm polished thin sections using a JEOL 8200 Superprobe at the University of Geneva and a JEOL JXA-8530F at the University of Lausanne. Both microprobes were equipped with a five-channel wavelength-dispersive spectroscope system (WDS) and were operated at an accelerating voltage of 15 keV, a beam current of 20 nA, and a beam diameter of 3 μm. Quantitative analyses were made using internal standards (orthoclase [Si, K] ). Mineral analyses were acquired as transects or points to capture chemical variability within the mineral phase. Glass analyses were made on all samples with an accelerating voltage of 15 keV, a beam current of 6 nA, and a beam diameter of 10-20 μm depending upon the size of the glass pools available.

Fig. 1
a Geological map of Saint Kitts, Eastern Caribbean, modified after Martin-Kaye (1959). The four volcanic centres young towards the northwest. The active centre is Mount Liamuiga which is responsible for the deposition of the Mansion Series. Peléan style volcanic domes of various ages outcrop across the island (e.g. Baker 1968). Study locality is a sea cliff showing a well-exposed pyroclastic fall sequence between the villages of Mansion and Tabernacle. b Stratigraphic sequence that is the focus of this study. A basal pyroclastic flow deposit (sample SK408) separates the Lower Mansion Series (Units A-C according to Roobol et al., 1981). Figure adapted from Higgins et al. (2021)

Machine learning thermobarometry
Experimental data for the calibration of the single-phase thermobarometers (clinopyroxene-only and amphibole-only) were collected from the Library of Experimental Phase Relations (LEPR) database (Hirschmann et al. 2008) and supplemented with experiments from the geological literature (Online Resource 2-Table S1a, b). Data spanned 0.002-12 kbar and 750-1250 °C, covering the crustal thickness (Kopp et al. 2011; Melekhova et al. 2019) and inferred range of magmatic temperatures (Melekhova et al. 2017; Toothill et al. 2007) of lavas from Saint Kitts and the wider arc. Experiments at 1 atmosphere (clinopyroxene) were excluded as they tend to exhibit an anomalously wide range of Al contents. This has been ascribed to Na-loss during the experiment (Putirka 2008) or coupled substitution of Na and Al for Si, Ca, and Mg in fast-growing, disequilibrium clinopyroxene at low pressure (Mollo et al. 2010; Ziberna et al. 2017). Binary cation plots for all calibrant experiments are shown in Online Resource 3-Fig. S1. Major element cations in amphibole and clinopyroxene were calculated on the basis of 6 (clinopyroxene) and 23 (amphibole) oxygens according to Deer et al. (1997). In both cases all Fe was assumed as ferrous, as per Putirka (2016), as spectroscopic analysis of Fe speciation shows that stoichiometric Fe has little correlation to measured values (Al'meev et al. 2002; Dyar et al. 1992, 1993; Hawthorne and Oberti 2007). Equilibrium partitioning of Fe and Mg between mineral and melt [K_D(Fe-Mg)^(min-liq)] was used as a further discriminator for data quality by removing any analyses that fell outside of the standard deviation of values for clinopyroxene and amphibole (0.39 ± 0.16 and 0.29 ± 0.13, respectively; Putirka 2008). Any experiments that did not report a coexisting liquid were also excluded.
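The Fe-Mg exchange filter described above can be illustrated with a minimal Python sketch. The function and dictionary names are hypothetical, the inputs are assumed to be molar (cation) amounts with all Fe as Fe2+, and only the two K_D windows are taken from the text; the filtering in this study was not necessarily implemented this way.

```python
# Hypothetical helper names; only the K_D windows come from the text.
KD_WINDOW = {
    "clinopyroxene": (0.39, 0.16),  # mean, one standard deviation
    "amphibole": (0.29, 0.13),
}


def kd_fe_mg(min_fe, min_mg, liq_fe, liq_mg):
    """Fe-Mg exchange coefficient: (Fe/Mg) of the mineral over (Fe/Mg) of the liquid.

    Inputs are assumed to be molar amounts with all Fe treated as Fe2+.
    """
    return (min_fe / min_mg) / (liq_fe / liq_mg)


def keep_experiment(phase, mineral, liquid):
    """Return True if a liquid is reported and K_D lies within mean +/- 1 sd."""
    if liquid is None:  # no coexisting liquid reported: run is excluded
        return False
    mean, sd = KD_WINDOW[phase]
    kd = kd_fe_mg(mineral["Fe"], mineral["Mg"], liquid["Fe"], liquid["Mg"])
    return abs(kd - mean) <= sd


if __name__ == "__main__":
    # Invented compositions, for illustration only.
    amphibole = {"Fe": 0.2, "Mg": 0.8}
    melt = {"Fe": 0.5, "Mg": 0.5}
    print(keep_experiment("amphibole", amphibole, melt))
```

A run is retained only when both conditions hold, so experiments without a reported liquid are dropped before any K_D is computed.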
The final filtered datasets (n = 409 for amphibole; n = 615 for clinopyroxene; Online Resource 2-Table S1a, b) were used to train regression models with the extraTrees (v 1.0.5) package (Simm et al. 2014) in R (Team 2013). Note that these datasets are not an exhaustive collection of mineral equilibria but do cover the chemical variability observed along the Lesser Antilles arc. The extraTrees package encompasses a machine learning method for classification and regression, employing a series of uncorrelated decision trees to reach a prediction output (P, T) based on an input (mineral chemistry of our chosen phase). The methodology can be broken down into 4 component parts: testing, training, prediction, and uncertainty (see below). All components require an understanding of decision trees, an example of which can be found in Online Resource 3- Fig. S2 (a single decision tree for temperature prediction in amphibole). A decision tree is a hierarchical flow chart into which data are passed through a series of "yes" or "no" gates (nodes) conditioned upon a statement. In this case the statement is quantitative, in the generalised form: "does the mineral composition have a number less than the number at the node of a given branch for the chosen element?". After passing through all nodes, a prediction output (P, T) is reached at the base of the single decision tree. A random forest is a collection of decision trees, with each tree using different combinations of cations at the nodes to provide feature randomness (Simm et al. 2014). Therefore, the final predicted output (P, T) is taken as the median value predicted from all 300 decision trees. Hyperparameter tuning is shown to have little effect on uncertainty or precision of machine learning thermobarometers (Petrelli et al. 2020) and so we use 300 trees in our forest.
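The logic of passing a composition through threshold nodes and taking the median of all tree outputs can be sketched in a few lines of Python. This is an illustrative toy, not the extraTrees implementation used here: the tuple-based tree layout, the function names, and the three hand-written trees with their thresholds and leaf temperatures are all invented for the example.

```python
from statistics import median

# Assumed toy representation: a leaf is a float (the prediction); an internal
# node is a tuple (element, threshold, branch_if_less, branch_otherwise).


def predict_tree(node, composition):
    """Walk one tree: at each node ask 'is this element below the threshold?'."""
    while isinstance(node, tuple):
        element, threshold, if_less, otherwise = node
        node = if_less if composition[element] < threshold else otherwise
    return node


def predict_forest(trees, composition):
    """Return the median of all tree outputs together with the individual votes."""
    votes = [predict_tree(tree, composition) for tree in trees]
    return median(votes), votes


if __name__ == "__main__":
    # Three invented trees predicting temperature (deg C) from amphibole cations.
    toy_forest = [
        ("Si", 6.5, 950.0, 850.0),
        ("Al", 1.8, 840.0, ("Mg", 2.5, 900.0, 960.0)),
        ("Mg", 3.2, 940.0, 980.0),
    ]
    amphibole = {"Si": 6.2, "Al": 2.1, "Mg": 3.0}
    print(predict_forest(toy_forest, amphibole))
```

In a real forest each tree is grown from the training data with randomised splits; here the trees are fixed by hand purely to show how a prediction is read off.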
The generalised calibration workflow is shown in Online Resource 3- Fig. S3. Testing is used to understand the behaviour of the random forest and quantify the average uncertainty associated with models (Online Resource 3- Fig.  S3a). The experimental datasets of mineral compositions (amphibole, clinopyroxene) were first split into training datasets (used to train the model by correlating compositional changes in minerals with known P and T) and testing datasets (used to verify the models' performance by predicting P and T based on mineral composition and calculating a residual to the known experimental value). Relative to many applications of random forest machine learning, the datasets of equilibrium experiments are sparse. Random forest, unlike linear regression, has limited extrapolative capabilities and so the effect of removing many experiments to the testing dataset can significantly impact model performance. Additionally, equilibrium experiments are more commonly performed at lower pressure (≤ 2 kbar), and high-pressure experiments are typically performed at higher temperature (Fig. 2). Therefore, we used a uniform pressure-temperature grid when sampling the testing dataset (10% of the final filtered dataset), whereby a random experiment was sampled from within each uniform grid square. This ensured a uniform and representative pressure-temperature distribution in the testing dataset. Next, the random forest models are trained using the mineral composition of the phase of interest (amphibole, clinopyroxene) from the training dataset. The experimental composition of the amphibole (Si, Al, Ti, Ca, Na, K, Fe, Mg, Mn) or clinopyroxene (Si, Al, Ti, Ca, Na, Fe, Mg, Mn, Cr) was the input, and the experimental pressure or temperature was the output. The performance of the models was determined using a standard error estimate (SEE; Eq. 1) based on the ability of the algorithm to predict the testing (unknown) datasets.
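The grid-based sampling of the testing dataset and the SEE calculation can be sketched as follows. Two assumptions are made loudly here: the SEE is written as a root-mean-square residual with n in the denominator (Eq. 1 is not reproduced in this section, so the exact form may differ), and one experiment is drawn per occupied grid cell, so the grid spacing rather than a fixed 10% quota controls the size of the testing dataset. Function names and the example grid spacing are hypothetical.

```python
import math
import random


def see(observed, predicted):
    """Standard error estimate, assumed here to be sqrt(sum(residual^2) / n)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def grid_test_split(experiments, p_step, t_step, rng):
    """Draw one random experiment from each occupied P-T grid cell for testing.

    experiments: list of (pressure_kbar, temperature_C) tuples.
    Returns (train_indices, test_indices).
    """
    cells = {}
    for i, (p, t) in enumerate(experiments):
        cells.setdefault((int(p // p_step), int(t // t_step)), []).append(i)
    test = sorted(rng.choice(indices) for indices in cells.values())
    held_out = set(test)
    train = [i for i in range(len(experiments)) if i not in held_out]
    return train, test


if __name__ == "__main__":
    # Invented P-T conditions, clustered at low pressure as in real compilations.
    runs = [(1.0, 800), (1.5, 820), (1.2, 950), (5.0, 1000), (9.0, 1100), (9.5, 1120)]
    train_idx, test_idx = grid_test_split(runs, p_step=2.0, t_step=100.0,
                                          rng=random.Random(0))
    print(train_idx, test_idx)
```

Sampling per cell rather than at random from the whole list prevents the testing dataset from being dominated by the abundant low-pressure experiments.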
To gauge the effect of sampling certain experiments in different testing and training datasets, we randomly resampled both datasets (amphibole and clinopyroxene) 200 times (r = 200) and repeated the calibration (Online Resource 3- Fig. S3a). This effectively yielded 200 individual models per thermobarometer, each encompassing a different testing and training dataset split, where each model consisted of 300 individual decision trees. Assessing uncertainty in this way is a noteworthy difference to linear regression thermobarometers, where a single "best" calibration (lowest SEE) is the end goal and poorly performing models are ignored or discarded.
The largest effect on the SEE recovered by each of the 200 repetitions (for both P and T in clinopyroxene and amphibole) was found to be the resampling of the testing and training datasets (see Results). This is a feature of the non-uniform distribution of equilibrium experiments available to us for training. However, the purpose of splitting a testing and training dataset is to determine the uncertainty of a model calibration. Therefore, having determined the overall uncertainty behaviour of our calibration strategy we train the final predictive models on all available data (Online Resource 3- Fig. S3b). The uncertainty we associate with each final model is the modal SEE from the distribution of the 200 calibration repetitions. This strategy is chosen as a more honest representation of the overall predictive ability as opposed to choosing the best performing model from each of the 200 calibrations and discarding the remaining 199. By doing this, we would have underestimated the SEE, particularly given our finding that the testing-training dataset split is the most important factor influencing the SEE. The result of our calibration strategy for thermobarometry is 4 random forests which have been assigned abbreviations for brevity: clinopyroxene pressure [P(C)], clinopyroxene temperature [T(C)], amphibole pressure [P(A)], and amphibole temperature [T(A)]. Prediction of intensive variables for natural data is achieved by passing a natural mineral composition through each of the 300 trees in the relevant random forest, an example of which can be seen in Online Resource 3- Fig. S3c.
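Taking the modal SEE from a set of calibration repetitions can be illustrated with a simple histogram. How the mode was extracted in this study is not stated, so the binning approach, the bin width, and the function name below are assumptions, and the SEE values are invented.

```python
import math
from collections import Counter


def modal_value(values, bin_width):
    """Centre of the most populated histogram bin (ties go to the lower bin)."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    best_bin, _ = max(bins.items(), key=lambda item: (item[1], -item[0]))
    return (best_bin + 0.5) * bin_width


if __name__ == "__main__":
    # Invented pressure SEEs (kbar) from a handful of train/test resamplings.
    sees = [1.45, 1.52, 1.58, 1.61, 1.63, 1.66, 1.68, 2.1, 0.9]
    print("modal SEE:", modal_value(sees, bin_width=0.1))
    print("best single repetition:", min(sees))
```

The contrast between the two printed numbers is the point of the strategy: quoting the best single repetition would report a lower SEE than the calibration typically achieves.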
A further advantage of a random forest calibration is that uncertainty may be assessed on an individual prediction, as opposed to relying solely on a SEE. However, we perform the SEE determination above for each model as a way for users to cross compare our methods with existing calibrations. As explained in Online Resource 3- Fig. S3d, to obtain a prediction from a random forest model a composition is passed through all 300 trees, each of which has a different structure. Therefore, 300 individual predictions are made which will dominantly reflect the best estimate (wisdom of the crowd), providing that mineral composition can be fundamentally related to P or T. However, some trees initiated on chemical elements which have a poor relation to the intensive variable of interest (P, T) will yield less accurate results. Hence, we take the interquartile range of the voting distribution as an uncertainty associated with a given individual prediction (output). Results predicted with high certainty will have most trees centred on a consistent predictive value, whereas higher uncertainty is reflected in wider voting distributions. We use this as an error bar for natural data.
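The per-prediction uncertainty described above reduces to the interquartile range of the individual tree outputs. A minimal sketch, assuming the votes are already available as a list of numbers; the function name and the choice of the "inclusive" quantile method are illustrative assumptions, and the vote values are invented.

```python
from statistics import quantiles


def prediction_with_iqr(votes):
    """Return (median, interquartile range) of the individual tree predictions."""
    q1, q2, q3 = quantiles(votes, n=4, method="inclusive")
    return q2, q3 - q1


if __name__ == "__main__":
    # Invented pressure votes (kbar): a tight and a wide voting distribution.
    tight = [2.0, 2.0, 2.1, 2.1, 2.2]
    wide = [0.5, 1.5, 2.1, 4.0, 6.5]
    print(prediction_with_iqr(tight))
    print(prediction_with_iqr(wide))
```

Both distributions share the same median, but the wider spread of votes in the second case would be plotted as a larger error bar.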

Machine learning chemometry
A series of clinopyroxene-only and amphibole-only chemometers were calibrated to predict the chemistry of the melt (SiO2, Al2O3, CaO, Na2O, K2O, MgO, FeO, TiO2) in equilibrium with amphibole/clinopyroxene. The calibration strategy and experimental runs were identical to those used for the thermobarometers (Online Resource 2-Table S1a, b), with coexisting liquids normalised to 100 wt% anhydrous (all Fe as FeO). The exception was that temperature (°C) for the experiments was used as an additional input variable as this significantly reduced the SEE. Using the same calibration datasets as for the thermobarometers (n = 409 for amphibole; n = 615 for clinopyroxene; Online Resource 2-Table S1a, b), we could also use the same testing and training dataset splits for calculating chemometer SEEs. As such, although T is an input parameter for chemometers, the SEEs are calculated on testing datasets where the T is entirely unknown to the algorithm. This is critical to provide a fair representation of the uncertainty as temperature is unknown for natural minerals a priori. Hence, when the chemometers are applied to natural minerals, we use the calculated temperature from the machine learning thermometer as a predictor along with the mineral chemistry of the phase of interest.

Fig. 2
Amphibole and clinopyroxene-bearing experiments used to train the models for the thermobarometers presented in this study. Note that amphibole (light-blue triangles) stability is thermally dependent (≤ 1050 °C). Clinopyroxene (green circles) spans a wider range of temperatures at a given pressure
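Two steps of the chemometer calibration, the anhydrous normalisation of coexisting liquids and the use of temperature as an extra predictor, can be sketched as follows. The function names, the list of volatile species, and the example values are assumptions made for illustration; the sketch presumes Fe is already expressed as FeO.

```python
def normalise_anhydrous(oxides, volatiles=("H2O", "CO2")):
    """Drop volatile species and rescale the remaining oxides to sum to 100 wt%."""
    dry = {name: wt for name, wt in oxides.items() if name not in volatiles}
    total = sum(dry.values())
    return {name: 100.0 * wt / total for name, wt in dry.items()}


def chemometer_input(cations, temperature_c, order):
    """Build the predictor vector: mineral cations followed by temperature (deg C).

    For experiments temperature_c is the run temperature; for natural minerals it
    would be the temperature predicted by the thermometer.
    """
    return [cations[element] for element in order] + [temperature_c]


if __name__ == "__main__":
    # Invented hydrous glass analysis (wt%).
    glass = {"SiO2": 62.0, "Al2O3": 16.0, "CaO": 5.0, "Na2O": 4.0, "K2O": 1.5,
             "MgO": 2.0, "FeO": 4.5, "TiO2": 0.5, "H2O": 4.5}
    print(normalise_anhydrous(glass))
    print(chemometer_input({"Si": 6.3, "Al": 2.0}, 925.0, order=["Si", "Al"]))
```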

Equilibrium experiment matching
A common approach to determine pre-eruptive storage conditions of magma is to compare the results of equilibrium experiments with natural mineral chemistry (e.g. Pichavant and Macdonald 2007). Mismatch may arise due to experimental gaps (such as the lack of experiments at certain pressures; Fig. 2) or additional processes (mixing, resorption). To synthesise this method, we used a custom search script written in R (Team 2013) to match each microprobe analysis from Saint Kitts (Online Resource 2- Table S1c) with an experimental run from Online Resource 2-Table S1a, b. We searched for amphibole and clinopyroxene compositions from experiments that matched ≥ (n − 1) of the major elements in each phase to within 5% relative, where n is the number of major elements. K, Cr and Mn in mafic phases were omitted as they tend to be below detection or not consistently reported in experiments. These results can then be compared with P-T-X measured from our thermobarometers and chemometers, which is useful to independently estimate the performance of the trained algorithms. Additionally, the plagioclase compositions in equilibrium with experimentally matched amphibole and clinopyroxene compositions were extracted to understand their equilibrium conditions of crystallisation.
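The matching criterion, at least n − 1 of the n major elements agreeing to within 5% relative, can be sketched in Python (the search script used in this study was written in R). The function names are hypothetical, the 5% tolerance is assumed to be taken relative to the natural analysis, and the compositions are invented.

```python
def is_match(natural, experiment, elements, tolerance=0.05):
    """True if at least n - 1 of the n elements agree within the relative tolerance."""
    hits = sum(
        1 for element in elements
        if abs(experiment[element] - natural[element]) <= tolerance * natural[element]
    )
    return hits >= len(elements) - 1


def find_matches(natural, experiments, elements):
    """Return the labels of all experimental runs matching one natural analysis."""
    return [label for label, composition in experiments.items()
            if is_match(natural, composition, elements)]


if __name__ == "__main__":
    # Amphibole major elements with K and Mn omitted, as described in the text.
    elements = ["Si", "Al", "Ti", "Ca", "Na", "Fe", "Mg"]
    natural = {"Si": 6.3, "Al": 2.0, "Ti": 0.2, "Ca": 1.8, "Na": 0.6, "Fe": 1.7, "Mg": 2.9}
    runs = {
        "run_A": dict(natural, Ti=0.3),          # one element out of tolerance
        "run_B": dict(natural, Ti=0.3, Al=2.5),  # two elements out of tolerance
    }
    print(find_matches(natural, runs, elements))
```

Allowing one element to miss the tolerance keeps the search from rejecting runs over a single poorly constrained or low-abundance element.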

Thermobarometry and chemometry uncertainty estimates
An example of the best performing (lowest SEE) of the 200 model repetitions, where each model repetition employs a different testing and training dataset split, for thermobarometers and SiO2 chemometers can be seen in Fig. 3. Amphibole is a remarkably sensitive thermobarometer, with low residual predictions across the full range of temperature (700-1100 °C; Fig. 3a) and pressure (0.54-12 kbar; Fig. 3b). Liquid SiO2 in equilibrium with amphibole is also recovered throughout the studied compositional spectrum (basalt to rhyolite), although estimates lie closer to the 1:1 line at higher silica contents (> 72 wt% SiO2; Fig. 3c). Equally, clinopyroxene is reliable throughout the magmatic temperature range (Fig. 3d) yet produces decidedly more scatter in predicted pressure values than amphibole at higher pressures (> 7 kbar; Fig. 3e). Liquid SiO2 in equilibrium with clinopyroxene is also recovered consistently across the compositional space, albeit with greater spread than amphibole (Fig. 3f). Best performing models for Al, Ca, Na, K, Mg, Fe, and Ti can be found in Online Resource 3-Fig. S4.
Each resampling of the testing and training datasets generates a slightly different model based on the data present in the resampled training dataset. This produces a distribution of SEEs for all 200 models (see "Methods"; Fig. 4). Therefore, to give a fairer representation of the SEE, we take the modal uncertainty for each of these distributions as the overall model SEE: pressure uncertainties of 1.6 kbar and 2.3 kbar for amphibole and clinopyroxene, respectively (Fig. 4a), and temperature uncertainties of 40 °C and 57 °C for amphibole and clinopyroxene, respectively (Fig. 4b).
Using cations or oxides makes negligible difference to the performance of the thermobarometers (Fig. 4). We argue against using site-specific mineral chemistry for model calibration (e.g. tetrahedral, octahedral) as these rely on assumptions for the filling of cation sites, particularly in amphibole, which leads to covariance (e.g. tetrahedral Si and tetrahedral Al). Both amphibole and clinopyroxene chemometers recover strikingly similar SEE values for all predicted oxides (Fig. 4c, d), except for SiO2 which is higher for the clinopyroxene chemometer (4.7 wt% SiO2), suggesting both phases can be used interchangeably for recovering liquid lines of descent.

Amphibole as a reliable thermobarometer
Amphiboles are host to a wide array of cation substitutions (Leake et al. 1997), attributed to changes in intensive parameters (Blundy and Cashman 2008; Holland and Blundy 1994; Johnson and Rutherford 1989). However, amphibole chemistry is more sensitive to temperature and melt composition than pressure variability. This has historically resulted in good performance of amphibole thermometers (Holland and Blundy 1994; Putirka 2016) and chemometers (Zhang et al. 2017), alongside comparatively poor performance of amphibole barometers (Ridolfi and Renzulli 2012), apart from those applied in multiply saturated granitic systems (Anderson and Smith 1995). The good performance of our amphibole barometer (Figs. 3, 4) is at odds with these previous studies. Indeed, amphibole performs notably better as a barometer than clinopyroxene (lower SEE).
To rationalise this discrepancy, and assess its robustness, we compare the P(A) to an amphibole thermobarometer developed using linear regression (Ridolfi et al. 2010; Ridolfi and Renzulli 2012). We use the test dataset of Erdmann et al. (2014), who questioned the ability of amphibole to recover magmatic pressures, which spans lower-mid-crustal pressures (2-4 kbar) and a range of magmatic temperatures (800-1000 °C). Results from the P(A) show a pressure SEE of 1.2 kbar for the test dataset of Erdmann et al. (2014), with 82% of pressures recovered to within 1.6 kbar (the SEE of the P(A) calibration) of the experimental pressure (Fig. 5a). This represents a substantial improvement compared with the approach of Ridolfi and Renzulli (2012). As is expected from the results of Ridolfi and Renzulli (2012) and Erdmann et al. (2014), temperature estimates (Fig. 5b) from amphibole are good (SEE = 29 °C), with the T(A) showing much less scatter than Ridolfi and Renzulli (2012). As seen in Fig. 5a, predictions from some 4 kbar experiments overlap with predictions of 2 kbar experiments. However, the mean prediction from the P(A) is 2.7 kbar for the 2 kbar experiments and 4.4 kbar for the 4 kbar experiments. Hence, average (by mean or median) estimates should reasonably indicate whether two different natural samples lie at distinct pressures, an approach advocated by other authors (Putirka 2016; Weber et al. 2019).
Multiple-reaction thermobarometers may offer a robust estimate of pressure, with the limitation of requiring a suite of equilibrated, touching phases. The spinel-clinopyroxene-olivine-plagioclase (SCOlP) barometer of Ziberna et al. (2017) provides an equilibration pressure of olivine-bearing intrusive fragments from Dominica and Saint Kitts with uncertainties of 0.9-2.6 kbar. Temperature estimates are also calculated using the results of their approach (Ziberna et al. 2017). To further test our thermobarometer, we take Saint Kitts samples from Ziberna et al. (2017) that contain amphibole and compare the T(A) and P(A) results for amphibole rims to the predicted pressure and temperature derived from SCOlP. Whilst the agreement of two independent thermobarometers does not unequivocally validate results, it does offer a metric to evaluate the performance of our calibrations on natural samples with well-constrained intensive parameters. Figure 5c, d compares the pressure and temperature predicted by the P(A) and T(A) with those of Ziberna et al. (2017), showing excellent agreement between the two systems. The exception is a single point from sample KS31 (interpreted as a fragment of magmatic mush; Melekhova et al. 2017) which has a lower predicted temperature than that of Ziberna et al. (2017). This point is a chemical outlier relative to the other samples, with high Fe, and very high Mn, explaining the offset in temperature. This sensitivity to outliers is a notable problem with single-phase calibrations and is why, where possible, the averaging of several spot analyses should be used to constrain robust estimates of intensive parameters. Pressure estimates (Fig. 5d) [...]

The inferred pressure sensitivity of amphibole suggests that the random forest machine learning algorithm can recover more nuanced, non-linear relationships between phase chemistry, pressure, and temperature than traditional regression approaches.
Hence, we argue that it is the regression strategy, as opposed to the insensitivity to pressure, that hinders previous calibrations of amphibole-only barometers. This is demonstrated by the agreement between our calibrations, equilibrium experiments (Figs. 3; 5a, b), and an independent multiple-reaction barometer (SCOlP; Fig. 5c, d) for a range of compositions and conditions. This conclusion allows us to interrogate the variation of pressure and temperature in the Lower Mansion Series, recorded by amphibole, with confidence.

Mineral chemistry
Amphiboles form two visually discrete clusters (cluster 1 = SK408, SK385, felsic intrusive fragment from SK386B; cluster 2 = SK385, SK386B, SK388, SK390, SK394A, SK394C) for both Si vs Ca (Fig. 6a) and Mg vs Al (Fig. 6b). Both clusters show negative correlation for Si vs Ca, with cluster 1 ranging from 6.6 to 7.2 Si apfu (atoms per formula unit) compared with 6.0-6.5 apfu for cluster 2 (Fig. 6a). Given the strong negative correlation between amphibole Si and temperature (Putirka 2016), this qualitatively suggests that cluster 1 formed at lower magmatic temperature than cluster 2. Cluster 1 shows negative correlation between Mg and Al, whereas cluster 2 shows positive correlation. Typically, in the Eastern Caribbean these elements are positively correlated in amphibole, in both experimental and natural samples (Martel et al. 2013;Melekhova et al. 2017;Pichavant et al. 2002), with a notable exception on Dominica (Solaro et al. 2019 (Fig. 6c). Ca and Al positively correlate in clinopyroxene, with SK392 displaying higher Al for a given Ca compared to SK391 (Fig. 6d). In contrast to the felsic intrusive fragment in SK386B, the mafic intrusive fragment in SK392 spans identical mineral compositional space to its host rock. This implies that the intrusive fragment from SK392 was sampled from a similar source region as its host whereas the SK386B intrusive fragment is antecrystic sensu lato. EPMA spot analyses made in this study can be found in Online Resource 2- Table S1c.

P-T-X estimates for the Lower Mansion Series
The mean and interquartile range of estimated crystallisation pressures for each of the Lower Mansion Series samples from our clinopyroxene and amphibole thermobarometers are shown in Fig. 7a. SK408 (basal pyroclastic flow deposit) has a restricted pressure range of 1.8-2.1 kbar and a mean of 1.9 kbar. The base of the Mansion Series (SK385) yields higher pressure with respect to SK408, spanning from 2.2 to 3.0 kbar (upper and lower quartile) and a mean of 2.4 kbar. Along the stratigraphy from SK385, pressure decreases to a mean of 2 kbar for SK391, before widening significantly in pressure extent for SK392 (Fig. 7a). The uppermost units (SK394A; SK394C) have quartiles that collectively span 2.8-4.5 kbar. Overall, this suggests differentiation associated with the Lower Mansion Series magmatism predominantly takes place in the middle-upper crust. Mean temperature systematically increases from the basal pyroclastic flow deposit (825 °C) to SK391 (1010 °C), with a slight drop in temperature in SK394A (942 °C) and SK394C (980 °C; Fig. 7b). Interquartile ranges of temperature for each sample are relatively tight (< 30 °C), except for SK391 which shows a wider temperature range. Mean temperature positively correlates with modal An# in the matrix plagioclase, suggesting that the An# increase is being thermally controlled (Fig. 7c). However, plagioclase inclusions in amphibole are consistently An# 80-90, excluding SK408, raising questions about whether such inclusions are consistently in equilibrium with their host phase (see "Discussion"). Measured (matrix glass, bulk rock) and estimated (predicted liquids from chemometers) metrics of melt chemistry for Lower Mansion Series samples are shown in Fig. 8. We will refer to matrix glass measurements from this point and throughout as "groundmass chemistry". This is because "glass" (sensu stricto) is defined as an amorphous, non-crystalline solid. 
However, all samples except sample SK408 contain abundant microlites (< 5 µm) of plagioclase, clinopyroxene and Fe-oxides. Therefore, we averaged multiple groundmass measurements to recover a bulk estimate for each sample. Groundmass chemistry (measured with the EPMA) and bulk rock samples (measured by XRF) lie on the same well-defined liquid line of descent. Groundmass chemistry measurements are consistently more evolved than their bulk rock counterparts (Fig. 8), most notably in SiO2 (Fig. 7c). In addition, the chemistry of melts predicted by amphibole and clinopyroxene chemometers (Al2O3, K2O, CaO, Na2O, MgO, FeO, TiO2), plotted versus SiO2 as a measure of differentiation, is shown in Fig. 8. Predicted SiO2 and Al2O3 of amphibole melts negatively correlate (Fig. 8a). Clinopyroxene melts show a marginally shallower decrease of Al2O3 with increasing SiO2, initiating from ~ 19 wt% Al2O3 at ~ 52 wt% SiO2 (Fig. 8a). Predicted melts from SK394C recover liquids consistent with high-Al basalt (> 19 wt% Al2O3; Fig. 8a). Al2O3 is marginally higher for amphibole equilibrium melts for a given SiO2 compared to the liquid line of descent described by groundmass chemistry and bulk rock (Fig. 8a), although this offset is typically within the uncertainty of the chemometer. Melt K2O from amphibole increases with increasing SiO2 from ~ 0.5 to 2.5 wt%, accordant with a low-K system in which K is incompatible (Fig. 8b). The CaO, Na2O, MgO, FeO and TiO2 of predicted melts (Fig. 8c-g) plot concordantly with whole rock trends for the Lower Mansion Series, although calculated melts tend to have slightly lower FeO (Fig. 8f) than a whole rock of equivalent SiO2. The offset in predicted equilibrium FeO from chemometers (Fig. 8f) may be related to treating all Fe as FeO in chemometer calibrations. 
Melts predicted from amphibole in SK408 match closely with groundmass chemistry from the same sample for all elements, implying that amphibole is in equilibrium with the groundmass. It should be noted that results of amphibole chemometers are best compared with groundmass chemistry from SK408 as the lack of microlites makes measurements analogous to pure liquid (glass). In general, some samples (SK408, SK385) display a tight range of melt composition whereas others (SK390, SK391, SK392, SK394C) cover a much wider range, correspondent with their span of mineral chemistry (Fig. 3) (2017), our amphibole chemometer results match closely with their predictions (Fig. 8). In the case of K 2 O, the equation of Zhang et al (2017) and our calibrations differ, although our calibration matches bulk rock and groundmass chemical variability from the Lower Mansion Series much more closely (Fig. 8b).
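The per-sample summaries used in this section (a mean together with the lower and upper quartiles of the thermobarometer output) can be sketched as below. This is only an illustration under stated assumptions: the quartile convention (Python's "inclusive" method) is a choice made here rather than one taken from the study, and the sample names and pressures are hypothetical.

```python
from statistics import mean, quantiles


def summarise(values):
    """Return the mean, quartiles and interquartile range of one sample."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical per-spot pressure estimates in kbar (not data from the study).
pressures_kbar = {
    "sample_A": [1.7, 1.8, 1.9, 2.0, 2.1],
    "sample_B": [2.2, 2.3, 2.4, 2.6, 3.0],
}

summary = {name: summarise(values) for name, values in pressures_kbar.items()}
```

Reporting the quartiles alongside the mean keeps a single outlying spot analysis from dominating the description of a sample.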

Comparison with equilibrium experiments
A comparison between natural, experimental and thermobarometric information derived from silicate phases is shown in Fig. 9. Clinopyroxene and amphibole pressure (Fig. 9a) and temperature (Fig. 9b) experimental matches are extremely close with the results of random forest thermobarometry. The exception is pressure estimates for clinopyroxene-bearing samples (SK391 and SK392; Fig. 9a) which have wider distributions in experimental matches compared with the T(C) and P(C). This perhaps relates to experimental matches excluding minor elements (see Methods) which have been attributed high feature importance in random forest-based clinopyroxene barometry (Petrelli et al. 2020). In general, the An# of plagioclase is shown to increase strongly with temperature, alongside the overprinting effects of water content and starting composition (Sisson and Grove, 1993). Plagioclase in equilibrium with clinopyroxene and amphibole experimental matches (Fig. 9c) can be used to infer the An# of the co-crystalising plagioclase. Plagioclase inclusions in amphibole span An# 80-90, with the exception of SK408 (Fig. 7c). However, experimental plagioclase compositions in apparent equilibrium with natural amphibole and clinopyroxene consistently fall outside of inclusion chemistry (Fig. 9c). Matrix plagioclase from natural samples matches much more closely with experimentally co-crystallising plagioclase, accordant with the correlation between thermometer temperatures and matrix An# (Fig. 7c). The exception is SK386B, for which An# of experimental matches is appreciably higher than matrix An#. However, SK386B also presents as an outlier in the overall temperature and An# increase of the sequence, suggesting that amphibole in SK386B may be antecrystic. Overall, these data show that plagioclase inclusions in Saint Kitts amphiboles are poor records of erupted liquid compositions.

P-T-X-H2O versus time in the Lower Mansion Series
[Figure caption, opening truncated] ... Table S1c) are subsetted from the experimental datasets in Online Resource 2-Table S1a, b. This enables the recovery of pressure (a) and temperature fields (b) for comparison between thermobarometry (filled boxplots) and appropriate experimental compositions (outlined boxplots). An# of natural plagioclase (P = phenocryst; I = plagioclase inclusions in amphibole; M = matrix plagioclase) are then compared to plagioclase in equilibrium with clinopyroxene and amphibole experimental matches (outlined boxes in c)
The crustal column beneath a volcanic edifice is a thermally and chemically stratified sequence of magmatic minerals, melts, volatiles, and crystalline material (Sparks et al. 2019). This is reflected in the array of phenocryst chemistry and textures from lavas, pyroclastics, and intrusive fragments erupted at a single centre (e.g. Klaver et al. 2017, 2018) as well as diverse records of along-arc melt inclusion chemistry and volatile contents (Cooper et al. 2020). Magma may remain resident in the system for long periods, move through relatively unimpeded, or never reach the surface. An evolved crustal column may act as an efficient density (and, by extension, composition) filter for magma (Stolper and Walker, 1980). Understanding these processes necessitates the recovery of the key magmatic variables that control the behaviour of a volcano: pressure (Fig. 7a), temperature (Fig. 7b), melt composition (Fig. 8), and melt water content. The latter requires direct H2O measurement of mineral-hosted melt inclusions (Danyushevsky et al. 1993; Hervig et al. 1989; Zajacz et al. 2005), hygrometry (Waters and Lange 2015), or calculation of melt H2O using partition coefficients derived from nominally anhydrous minerals (e.g. clinopyroxene and orthopyroxene; Edmonds et al. 2016; Hauri et al. 2006). 
Melt inclusions may be entirely absent in crystals or disproportionately represented in certain domains (core or rim) due to a propensity to form along cracked surfaces (Faure and Schiano 2005) or during discrete heating, dissolution, and reprecipitation events (Cashman and Blundy 2013;Edmonds et al. 2016;Nakamura and Shimakita 1998). Equally, melt water contents may not be faithfully recorded due to post-entrapment effects (Gaetani et al. 2012;Massare et al. 2002). Regardless, amphibole-hosted melt inclusions in Saint Kitts samples are sparse. To overcome this, we combine melts from amphibole chemometry (this study) with the modal plagioclase matrix composition ( Fig. 7c; Higgins et al. 2021) to give a series of plagioclase-melt pairs. These plagioclase-melt pairs were used in the hygrometer of Waters and Lange (2015). We perform a full, Monte Carlo error propagation which includes the uncertainty of predicted pressure and temperature from amphibole thermobarometry, predicted liquid from amphibole chemometry, and variability within the interquartile range of matrix plagioclase composition. This yields uncertainty of predicted H 2 O of approximately ± 1wt% H 2 O (mean sample uncertainties 0.72-1.34 wt% H 2 O). Full H 2 O uncertainty results and a description of our error propagation method are presented in Online Resource 4. We assess whether these melts are saturated in a vapour phase using equation 10 in Zhang et al (2007) at the conditions described by our thermobarometers. For clinopyroxene-bearing samples, hygrometer calibrations for specific melt compositions exist (e.g. alkaline Etnean magmas; Armienti et al. 2013;Perinelli et al. 2016), although globally applicable calibrations are lacking. Instead, we use estimates from experimental matches (Fig. 9) to infer melt water contents in equilibrium with Saint Kitts clinopyroxene where possible.
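The Monte Carlo error propagation described above can be sketched as follows. This is a minimal illustration and not the procedure documented in Online Resource 4: it assumes Gaussian uncertainties on temperature, pressure and melt SiO2, and a uniform draw within the interquartile range of matrix plagioclase An#. The function `toy_hygrometer` is a hypothetical linear placeholder and is NOT the Waters and Lange (2015) model; all input values are likewise hypothetical.

```python
import random
from statistics import mean, stdev


def toy_hygrometer(temperature_c, pressure_kbar, melt_sio2, plag_an):
    """Hypothetical placeholder returning melt H2O (wt%); not a published model."""
    return (0.05 * plag_an + 0.03 * (melt_sio2 - 50.0)
            - 0.004 * (temperature_c - 900.0) + 0.1 * pressure_kbar)


def propagate(n_draws, temp, pres, sio2, an_q1, an_q3, seed=0):
    """Propagate input uncertainties through the hygrometer by random sampling.

    temp, pres and sio2 are (mean, standard deviation) pairs; an_q1 and an_q3
    bound the interquartile range of matrix plagioclase An#.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = []
    for _ in range(n_draws):
        t = rng.gauss(temp[0], temp[1])
        p = rng.gauss(pres[0], pres[1])
        s = rng.gauss(sio2[0], sio2[1])
        an = rng.uniform(an_q1, an_q3)
        draws.append(toy_hygrometer(t, p, s, an))
    # The spread of the simulated outputs is the propagated uncertainty.
    return mean(draws), stdev(draws)


# Hypothetical inputs (illustrative only, not values from the study).
h2o_mean, h2o_sd = propagate(
    5000, temp=(900.0, 30.0), pres=(2.0, 1.6), sio2=(60.0, 3.0),
    an_q1=60.0, an_q3=70.0,
)
```

Sampling every input on each draw lets the uncertainties combine without assuming how each one maps onto the predicted H2O, which matters when the real hygrometer is non-linear.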
The samples from the Lower Mansion Series show a progressive temperature increase, evidenced through matrix plagioclase An# and thermometry (Fig. 7b, c), coupled with relatively stable barometric estimates for several deposits (Fig. 7a). We will now place this into the context of the evolving sub-volcanic system, using our full P-T-X-H2O versus (relative) time dataset (Fig. 10).

Clearing out the pipes of the magma plumbing system
The lower pressure of SK408 (~ 2 kbar) compared to most of the sequence (Fig. 7a), along with its experimental phase assemblage matches (Fig. 9), indicates residence in the upper crust: rhyolitic glass, orthopyroxene, amphibole, plagioclase, and Fe-oxides are found in a subset of similar pumice from Dominica that have been linked to upper-crustal differentiation using equilibrium experiments (Solaro et al. 2019). Quartz grains in SK408 can be ascribed to prolonged cooling in the upper crust in equilibrium with rhyolitic glass and albitic plagioclase (Solaro et al. 2019). Late-stage resorption (rounding; Higgins et al. 2021) is invoked by the contraction of the cotectic in the An-Ab-Qtz ternary (granite minimum) towards the quartz end member during decompression (Ghiorso and Gualda, 2015). The match between the predicted liquid from amphibole chemometry (Fig. 8) and the rhyolitic glass from SK408 implies amphibole was a late crystallising phase. Hence amphibole records a robust estimate of final equilibration conditions in SK408. Therefore, we suggest that SK408 acted as an upper-crustal plug to the volcanic plumbing system. As such, the pronounced compositional gap for magmas (pyroclastics or lavas; Online Resource 3- Fig. S5) at 66-72 wt% SiO 2 on Saint Kitts could reflect an intrinsic feature of melt production: melt that evolves beyond ~ 66 wt% SiO 2 is difficult to erupt in large quantities. Therefore, eruptions of magma with > 66 wt% SiO 2 are restricted to rare dome effusions (e.g. the Salt Pond Dome; Baker 1984), as rhyolitic glass in upper-crustal bodies following protracted cooling (SK408), or slivers of evolved, interstitial melt erupted inside fragments of the deeper plutonic system (Melekhova et al. 2017;this study). Mineral-hosted (predominantly in clinopyroxene and plagioclase; Fig. 8) melt inclusions fill this compositional gap and are typically erupted in magmas with < 66 wt% SiO 2 (Online Resource 3- Fig. S5; Cooper et al. 2020;Melekhova et al. 
2017). An analogue to this in the Eastern Caribbean may be the island of Bequia (Camejo-Harry et al. 2018) where interstitial melts measured in intrusive fragments are consistently more evolved than concomitant lavas, revealing a lack of either sufficient size or efficient melt extraction in the magmatic system. A rare, exposed window into an equivalent plutonic regime may be found on Saint Martin, in the north of the extinct Limestone Caribbees, where the granodiorite pluton spans 62-75 wt% SiO 2 (Davidson et al. 1993). Given the assertion that amphibole crystallised in equilibrium with the matrix glass we can also be confident that the predicted melt water content (6.1 ± 0.72 wt% H 2 O; water saturated at 2 kbar; Fig. 10) is indicative of the final, pre-eruptive melt water content for SK408.

Progressive increase of temperature in time
Removing this chemically evolved, upper-crustal plug permitted less-evolved material to be erupted from the middle-upper crust (Figs. 7a; 10). Magmatic temperatures increase consistently along with An# of matrix plagioclase (Fig. 7b, c). Although constraining a timescale from palaeosoils is difficult due to highly variable accretion rates in the Caribbean, the sparse palaeosoils between SK386B and SK390 imply this increase is happening over short (decadal to millennial) timescales (Higgins et al. 2021). As thicker deposits (e.g. SK390 and SK391) tend to sample a wider variety of plagioclase phenocryst chemistry (Higgins et al. 2021), the mechanism that generates large-volume eruptions is not necessarily coupled to the thermal maturation at this temporal scale. Relationships between erupted volume and deposit thickness are likely to hold true at the proximity to the source of the Lower Mansion Series stratigraphy in this study (Fig. 1), as shown by mapping of dispersal characteristics on Saint Kitts (Roobol et al. 1985).
Melt water content from hygrometry spans a relatively limited range, from 5.2 ± 0.79 to 7.1 ± 1.5 wt% H2O. This agrees with water contents of Saint Kitts melt inclusions (Cooper et al. 2020; Melekhova et al. 2017). Amphibole melts from SK385, SK386B, SK388, and SK390 are water undersaturated according to equation 10 of Zhang et al (2007), although this does not preclude saturation in a mixed fluid (e.g. CO2 + H2O). The felsic intrusive fragment in SK386B is an exception, with equilibrium melts recording the same water-saturated conditions as in SK408. In fact, the similarity in mineralogy, water content (Fig. 10), pressure (Fig. 7a), temperature (Fig. 7b), and melt chemistry (Fig. 8) between the felsic intrusive fragment in SK386B and sample SK408 suggests the former could be the plutonic (intrusive) equivalent of the latter, scavenged during ascent through the upper crust.

Peak thermal maturity reflected in clinopyroxene-amphibole phase relations
In SK391 and SK392, clinopyroxene is present as phenocrysts and amphibole is absent. Likewise, there are no phenocrysts of clinopyroxene in amphibole-bearing samples. This agrees with the examination of our experimental database which shows that, at middle to upper-crustal conditions, the Saint Kitts magmas lie outside of any stability field in which amphibole and clinopyroxene coexist. Instead, olivine and clinopyroxene crystallise at the expense of amphibole. This is a well-established incongruent thermal reaction driven by decreasing temperature (liquid + clinopyroxene → amphibole), observed in experimental (Foden and Green 1992) and natural (Smith 2014) samples including Lesser Antilles intrusive fragments (Cooper et al. 2016;Melekhova et al. 2017). Therefore, the crystallisation of clinopyroxene and amphibole from high-Al basalts (Andújar et al. 2015;Foden and Green 1992;Melekhova et al. 2017) and dacites (Marxer and Ulmer, 2019;Pichavant et al. 2002;Solaro et al. 2019) equivalent to inferred Saint Kitts starting compositions can impart information about the thermal balance establishing in the system through time (Fig. 10). Effectively, the amphibole-clinopyroxene transition reflects the thermal state of the magma beneath Saint Kitts (phase boundaries in Fig. 10). Peak temperatures are represented by SK391 and SK392 which have crossed into the clinopyroxene field. Higher temperature amphibole melts (SK386B, SK390) lie closer to the "amphibole in" boundary, with decreasing temperature driving melts in equilibrium with amphibole to higher SiO 2 . More evolved (cooler) melts that lie further from this boundary should crystallise less anorthitic plagioclase (e.g. SK385), in agreement with matrix plagioclase mineral chemistry (Fig. 7c). The interplay between clinopyroxene, amphibole, and plagioclase is mirrored in their equilibrium melt chemistry (Fig. 8). Al 2 O 3 shows a decrease versus SiO 2 for clinopyroxene melts (Fig. 8a). 
This is typical of plagioclase-saturated melts (Grove et al. 2012;Sisson and Grove, 1993). Considering decreasing Mg as an indicator of progressive differentiation, and that calcic plagioclase is one of the dominant sinks of Al in arc magmas, we would expect Al in clinopyroxene to decrease with decreasing Mg in the presence of abundant plagioclase (Klaver et al. 2017). In SK392 the opposite is true (Fig. 3c), despite predicted melt chemistry that attests to plagioclase saturation. Combined, these melt and mineral chemistry features represent two competing effects. First, plagioclase saturation increases Al activity in the melt, increasing Al with decreasing Mg in clinopyroxene, as noted in experimental studies (Nandedkar et al. 2014;Villiger et al. 2007). Second, the clinopyroxene-plagioclase ratio is affected by melt water content. At fixed pressure and temperature, and increasing water activity, the ratio of clinopyroxene to plagioclase increases. This is because water destabilises plagioclase but has less effect on clinopyroxene abundance at these conditions (Andújar et al. 2015;Sisson and Grove, 1993). Therefore, we suggest that the clinopyroxene from SK392 represents crystallisation from a wet, but water-undersaturated, high-Al basalt that cooled between 1025 °C and 950 °C. A realistic experimental analogue would be found in Andújar et al. (2015), for example run s0968-43 (Fig. 10). This cooling would largely explain the range of melt chemistry from the clinopyroxene chemometer (Fig. 8) as well as the increase in Al in the clinopyroxene during differentiation, resulting from an increase in melt H 2 O (partially suppressing plagioclase) during crystallisation. The subtle balance between clinopyroxene and plagioclase mineral fraction is potentially mirrored in the shallower trend in melt Al 2 O 3 versus SiO 2 (Fig. 8a), indicative of a smaller fraction of plagioclase than coexists with the amphibole melts. 
Amphibole melts show the same decrease in Al 2 O 3 , albeit with a trend that initiates from a marginally higher initial Al 2 O 3 than the clinopyroxene melts (Fig. 8a). Amphibole from SK394C records a higher Al 2 O 3 content than any other sample. This suggests a more protracted stage of fractionation of Al-poor phases (olivine, clinopyroxene, orthopyroxene, spinel) prior to plagioclase saturation. The slightly hooked appearance of the Al 2 O 3 versus SiO 2 trend may even indicate the final moments of a melt at the cusp of plagioclase saturation. The larger degree of fractionation of the parent melt to the amphiboles of SK394C is also reflected in its melt water content, the highest of any sample (modal peak of 6.7 ± 0.8 wt% H 2 O; Fig. 10b).

Cryptic fractionation from andesitic melts unveiled by amphibole chemometry
A feature of melt inclusion records in arc magmas is a pronounced bimodality in SiO2 (wt%), whereby melt inclusion chemistry correlates poorly with host bulk rock chemistry in the andesitic range (Reubi and Blundy 2009) due to a paucity of intermediate composition inclusions. However, amphibole chemometry recovers a large proportion of melts with 55-65 wt% SiO2, covering much of the spectrum that is conspicuously absent in melt inclusions (Fig. 8a). Similar observations are made at Lassen Volcanic Center (California; Scruggs and Putirka, 2018). This suggests that amphibole chemometry reveals protracted cryptic fractionation from andesitic melts within the volcanic sub-system, albeit along the same overall liquid line of descent (Fig. 8). Cryptic evolved melts are recorded in plagioclase via a similar process in the monotonous basaltic magmas of the Galápagos (Stock et al. 2020). Upon amphibole crystallisation, the surrounding melts are chemically propelled from basaltic-andesitic composition towards more silicic melts via two effects. First, amphibole exerts a large differentiation effect on the melt due to its low silica content relative to other common fractionating silicates in arc systems (Grove and Donnelly-Nolan 1986). Second, amphibole generally appears on the liquid line of descent at a point where arc magmas spend a relatively short period of time, resulting in rapid differentiation and a low occurrence probability of andesitic melts (Caricchi and Blundy 2015; Marsh 1981; Müntener and Ulmer 2018; Nandedkar et al. 2014). Based on this evidence, amphibole may have a significant effect not just on trace element behaviour (Davidson et al. 2007; Smith 2014) but also on major element behaviour in arc magmas.

Late-stage waning
In the overall trend of the sequence, SK394A and SK394C represent a minor, late-stage thermal waning that results in the re-crossing of the amphibole-clinopyroxene cotectic (Fig. 10b). However, these samples also appear to be more representative of the dominant state of the sub-volcanic system. This is revealed by textural segmentation of quantified chemical maps (Higgins et al. 2021;Sheldrake and Higgins 2021) which show an increased abundance of homogeneous, high-An# crystals in the upper units of the stratigraphy compared with the base, as well as the mean An# of matrix plagioclase and phenocryst plagioclase converging to near uniformity (Fig. 9c). The regression to a less-evolved composition is preserved almost entirely in the mineral chemistry, with the bulk rocks from the Lower Mansion Series broadly overlapping with the most common composition erupted throughout the history of Saint Kitts volcanism (an andesite with ~ 58 wt% SiO 2 and ~ 17.4 wt% Al 2 O 3; Fig. 7c; Online Resource 3- Fig. S5). This contrasts with the varied mineral chemistry and textures, particularly in plagioclase (Higgins et al. 2021;Figs. 7c;9).
The observable disequilibrium between plagioclase inclusions in amphibole and amphibole thermometer temperatures may be explained by a similar process. Essentially, the melt and the crystal column through which it moves are chemically decoupled from one another. Ascending melts that crystallise amphibole may cannibalise small, high-An# crystals that reveal the dominant chemistry in the sub-volcanic region but not necessarily the crystal in equilibrium with the package of melt from which the amphibole crystallised. As the magmas leaving the top of the system transition to a more representative composition of the magmatic system in time (e.g. SK394A, SK394C), the plagioclase inclusion chemistry, phenocryst chemistry and matrix chemistry converge. Such an observation is unsurprising considering an identical process occurs in plagioclase phenocrysts in the form of rare, high-An# cores in SK408 (Higgins et al. 2021). This process is consistent with "petrological cannibalism" whereby chemically and spatially disparate crystals from the plutonic sub-system are scavenged and amalgamated into a final erupted product (Cashman and Blundy 2013; Davidson et al. 2007; Reubi and Blundy 2008). The An# of inclusions would not re-equilibrate with the host amphibole due to the slow rates of CaAl-NaSi interdiffusion in plagioclase at magmatic temperatures (Grove et al. 1984), which may even surpass cooling timescales of some magma reservoirs. Resorption is prevented as the plagioclase is entombed in its host and is, therefore, chemically isolated from the reactive magma volume. Our observations raise questions surrounding the validity of using inclusions in amphibole as equilibrium pairs for thermobarometry.

How representative is "groundmass chemistry" of true Saint Kitts liquids?
A counter-intuitive observation is that average groundmass chemistry does not correlate with average matrix plagioclase chemistry on Saint Kitts (Fig. 7c). As both An# of matrix plagioclase and groundmass chemistry are measured data (not calibrated or calculated), they must be explained via a process operating in the magmatic system. Should the proposed correlation between An# and amphibole temperature be correct, the temperature change associated with the sequence is ~ 200 °C (Fig. 7b), which should induce a change in groundmass chemistry via crystallisation. Such a change is not apparent. We have two potential explanations which are not mutually exclusive. Firstly, individual analyses have been variably affected by microlite content. We observed a significant, and heterogeneous, distribution of plagioclase, clinopyroxene and Fe-oxide microlites in all samples except SK408. This led to wide ranges for all samples except SK408 (Fig. 8; Online Resource 3-Fig. S6). Hence, individual (or averaged) analyses are unlikely to be indicative of the final equilibrated liquid. Therefore, the most evolved liquid recorded by amphibole chemometry for each sample may be more representative of the final liquid in equilibrium with the crystal cargo. In CaO vs Al2O3 space, these most evolved liquids produce a linear, temporally evolving array which would match the increase in plagioclase matrix An#, at least for amphibole-bearing samples (Online Resource 3-Fig. S6). Secondly, there may be a compositional buffering effect. In this scenario, a multi-phase reaction, such as olivine + clinopyroxene + hydrous melt (1) = amphibole + hydrous melt (2), may stabilise melt within a relatively fixed compositional spectrum (Foden and Green, 1992; Stewart et al. 1996). This would leave temperature or water content, both of which vary for Lower Mansion Series samples (Figs. 7; 10), to control matrix plagioclase An# variability.

Evidence for a vertically extensive magmatic system?
Evidence from modelling, geochemistry, and geophysics over the last two decades asserts that magmatic systems should be considered as transcrustal entities (Annen et al. 2006; Christopher et al. 2015; Sparks et al. 2019). One of the key features of these systems is that much of the differentiation is staged in the deep crust, which acts as a factory for cumulate-textured rocks, whereas the middle-upper crust is a vertically extensive system that hosts more ephemeral bodies of melt and crystals (Annen et al. 2006; Cashman et al. 2017; Hildreth and Moorbath, 1988; Jackson et al. 2018). The results of our thermobarometry add support to this view. The depth range of the inferred magmatic system beneath Saint Kitts extends from 1.2 ± 0.5 to 5.6 ± 2.4 kbar, equivalent to a depth of 3.2 ± 1.4 to 15.1 ± 6.5 km (crustal density 2.7 g/cm3; Fig. 10). This is consistent with the vertical extents for the upper portion of transcrustal magmatic systems hypothesised beneath stratovolcanoes in the Eastern Caribbean (Camejo-Harry et al. 2018; Christopher et al. 2015; Cooper et al. 2016; Edmonds et al. 2014) and elsewhere (Cashman and Blundy, 2013). In general, there is a moderate increase in the vertical extent (pressure range) over which crystallisation occurred, coincident with clinopyroxene crystallisation (Fig. 10). Critically, however, most magma released by the investigated eruptions was sourced from a significant range of depth in the crust (Figs. 7; 10).
The sub-vertical pressure-temperature gradient on Saint Kitts (Fig. 10), present throughout the eruption history, shows that magmas were sourced from a thermally mature crust: if the crust was to have a pristine geothermal gradient, shallower magmas should be much cooler than deeper magmas which is not the case on Saint Kitts. Thermal modelling reproduces such sub-vertical PT gradients following magma injection (Karakas et al. 2017). However, the uncorking of the upper-crustal system (SK408) likely removed a significant amount of heat from the crust. Therefore, if we consider a rather constant average long-term magma flux (Caricchi et al. 2014), the eruptive sequence on Saint Kitts reflects a thermal contraction as the system recovers to its steady state. Recovery may be signified by the point at which the matrix plagioclase, phenocryst plagioclase, and inclusion chemistry in amphibole converge (i.e. the upper units of the Lower Mansion Series; Figs. 7; 10; Higgins et al. 2021). The composition at which they converge is An# 80-90, the most abundant phenocryst chemistry observed throughout the Lower Mansion Series sequence (Higgins et al. 2021), and therefore the most sampled inclusion chemistry. The dominance of relatively unzoned, anorthite-rich plagioclase in cumulate-textured intrusive fragments on Saint Kitts supports this view (Melekhova et al. 2017).

Conclusions
We have calibrated clinopyroxene-only and amphibole-only thermobarometers that yield performance comparable to (in the case of clinopyroxene; Fig. 3d, e) and far exceeding (in the case of amphibole; Fig. 3a, b) that of existing thermobarometers (Petrelli et al. 2020;Ridolfi et al. 2010;Ridolfi and Renzulli 2012). Equilibrium experiments, as well as natural samples with well-constrained pressure and temperature, were used to independently verify the performance of the amphibole thermobarometer, in most cases recovering pressures to within ≤ 1.6 kbar of known values (SEE of the P(A); Fig. 4a). This suggests that amphibole acts as a reliable indicator of magma storage pressure and temperature without necessarily requiring multiple saturation conditions with liquid or other solid phases. Random forest machine learning uncovers subtle relationships between intensive parameters and mineral chemistry, unaccounted for by conventional linear regression approaches (e.g. Ridolfi and Renzulli 2012). Chemometers, recorders of melt chemistry in equilibrium with a mineral phase, delineate evolution trends in agreement with those elucidated by whole rock and melt inclusion chemistry from Saint Kitts (Fig. 8). The ability to predict melt chemistry from EPMA analyses has wide-reaching future applications in Earth sciences, including combination with viscosity models (Giordano et al. 2008).
Thermobarometry and chemometry (this study), coupled with hygrometry (Waters and Lange 2015), have allowed us to unpick the P-T-X-H2O evolution through time of the subvolcanic reservoir beneath Saint Kitts (Figs. 7, 10). The basal pyroclastic flow deposit removed an upper-crustal plug that had formed at ~ 2 kbar, allowing progressively hotter, amphibole-phyric magma to move through from depth (1.2 ± 0.5 to 5.6 ± 2.4 kbar). A continuous increase in temperature resulted in the crossing of the amphibole-clinopyroxene cotectic, forming clinopyroxene at the expense of amphibole (Foden and Green 1992). Frozen snapshots of this incomplete reaction can be found in the plutonic fragments erupted on the island (Melekhova et al. 2017). Thermobarometry and chemometry of amphibole and clinopyroxene were paired with equilibrium experiments that closely match their mineral chemistry. This identified disequilibrium in amphibole-hosted plagioclase inclusions for the lowermost units (SK408-SK390). The magma most representative of the subvolcanic reservoir beneath Saint Kitts is reflected in the uppermost units (SK392-SK394C), where plagioclase phenocrysts, matrix and inclusions in amphibole become relatively chemically invariant.
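To give a sense of scale for the pressures quoted above, the sketch below converts pressure to an approximate depth using the lithostatic relation P = ρgh. The crustal density of 2700 kg/m³ is an assumption made here for illustration only and is not a value taken from this study; the function and constant names are likewise hypothetical.

```python
GRAVITY_M_S2 = 9.81  # gravitational acceleration, m/s^2
ASSUMED_CRUSTAL_DENSITY_KG_M3 = 2700.0  # illustrative assumption, not from the study
PA_PER_KBAR = 1.0e8  # 1 kbar = 10^8 Pa


def pressure_to_depth_km(pressure_kbar, density=ASSUMED_CRUSTAL_DENSITY_KG_M3):
    """Approximate lithostatic depth (km) for a pressure in kbar: h = P / (rho * g)."""
    pressure_pa = pressure_kbar * PA_PER_KBAR
    depth_m = pressure_pa / (density * GRAVITY_M_S2)
    return depth_m / 1000.0


# Pressures quoted in the text (central values only, uncertainties ignored).
for p_kbar in (1.2, 2.0, 5.6):
    print(f"{p_kbar} kbar -> {pressure_to_depth_km(p_kbar):.1f} km")
```

Under this assumed density, ~ 2 kbar corresponds to roughly 7.5 km depth; a denser crustal column would give proportionally shallower depths, so the conversion should be read as indicative only.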
Compared with multi-reaction thermobarometers that rely on melt or equilibrium pairs, our calibrations can record the entire history of each crystal, at the expense of marginally higher standard error estimates. This drastically increases the probability of sampling the true extent of the magmatic system. Bias may be introduced when melt or matrix chemistry is used in liquid-dependent thermobarometers: estimates are skewed towards lower pressures because matrix melt crystallised last and is more likely to have equilibrated with the rims of phenocrysts in the shallowest portion of the magmatic system. Hence the average global estimate of 2 kbar for most upper-crustal magma chambers (e.g. Plank et al. 2013) is likely an absolute minimum, representing the average depth of the roof as opposed to the main thermal engine of a volcanic plumbing system.
Our approach has facilitated a link between an eruptive cycle of an arc volcano (Mount Liamuiga, Saint Kitts) and the intensive parameters that govern eruptive behaviour. This is particularly important in a system where bulk rock chemical variation clearly masks many of the nuances uncovered using mineral chemistry (this study; Higgins et al. 2021), which may not be the case in certain unique examples (Gertisser and Keller 2003). By extending this approach to other volcanic systems, we may better uncover the links between the temporally evolving chemical and physical properties of magma and the eruptive behaviour of volcanoes.