SOLUBILITY PRODUCT CONSTANTS FOR NATURAL DOLOMITE (0– 200°C) THROUGH A GROUNDWATER-BASED APPROACH USING THE USGS PRODUCED WATER DATABASE

The calculation of a reliable temperature dependent dolomite solubility product constant (Ksp−dol) has been the subject of much research over the last 70 years. This study evaluates log10( Ca 2+ a / Mg a ) values using PHREEQC (Pitzer approach) for a screened subset (n=11,480) of formation waters in the U.S. Geological Survey National Produced Waters Geochemical Database V2 (PWGD), an extensive inventory of 165,960 formational waters from a range of sedimentary lithologies in North America up to 6.6 km depth (Blondes and others, 2016). Through extensive ground truthing against datasets sourced from Texas Gulf Coast basin and the Mississippi Salt Dome basin we establish both the geochemical data from the PWGD and a new geothermal model of the US that is used to determine temperatures at-formation-depth to be reliable data sources. The vast majority (90%) of PWGD samples have log10( Ca 2+ a / Mg a )temperature values that are interpreted to be indicative of calcite-dolomite equilibrium and buffering by the bulk mineral solubilities. Using statistical models with different parameterisations (different Maier-Kelly formulas, mixed-effects models with various random effects and linear models) log10( Ca 2+ a / Mg a ) values are regressed against the estimated at-formation-depth temperatures to determine Ksp−dol between 0–200°C. This process relies on the well constrained calcite solubility product constant (Ksp−cal). Local effects that modify log10( Ca 2+ a / Mg a ) values are evaluated through the addition of random effects to the mixed model which both improves the statistical reliability of the Ksp−dol model and enables the determination of Ksp−dol values for local dolomite phases. The nature of these local effects is open to interpretation, but we suggest the primary influence on log10( Ca 2+ a / Mg a ) values is the stoichiometry of the dolomite phase systematically modifying log10( Ca 2+ a / Mg a ) values. We discount the influence on log10( Ca 2+ a / Mg a ) values of dolomite order, the solution ionic strength, equilibration with anhydrite and chlorite group minerals, illitization of smectite and albitization of feldspar. For the dolomite solubility equation; CaMg(CO3)2 (s) ↔ Ca 2+ (aq) + Mg 2+ (aq) + 2CO3 2− (aq) (1) the mixed-effects model (model J23) chosen as most representative yields a pKsp−dol (log10Ksp−dol); pKsp−dol = 1. 47545 × 10 1 − 6. 24959 × 10 ∙ T(K) − 3. 99350 × 10 ∙ 1

INTRODUCTION The solubility product constant (Ksp) is defined as 'the product of the ion activities raised to appropriate powers of an ionic solute in its saturated solution, expressed with due reference to the dissociation equilibria involved and the ions present' (McNaught, 1997). Ideally Ksp is determined analytically using a saturated solution under the temperature/pressure conditions for which the constant is desired ( Hefter and Tomkins, 2003). However, even with high degrees of supersaturation, it has proved difficult to unequivocally precipitate well-ordered, stoichiometric dolomite near standard state conditions (25°C, 1 atm) that coincide with conditions near which significant volumes of dolomite formed (Land, 1998;Warren, 2000). This necessitates a Ksp extrapolation from higher temperatures, where equilibrium is achieved within the timescales of laboratory experiments, using a Maier-Kelley regression (Maier and Kelley, 1932). Whilst pragmatic, this approach suffers from increased uncertainty for conditions outside the experimental range. An alternative to the experimental method for determining Ksp−dol is the groundwater method, which assumes calcite-dolomite equilibrium has been attained based on the substantial residence times of subsurface fluids.
This study reviews the complexities in evaluating Ksp−dol and determines Ksp−dol using the groundwater method by evaluating log 10 ( Ca 2+ a / Mg 2+ a )-temperature relationships for a subset (n=11,480) of fluids from the PWGD ( Blondes and others, 2016). Temperatures atformation-depth are estimated by interpolating and merging subsurface geothermal gradients from the SMU Heatflow database ( Blackwell and others, 2011) and mean annual land surface temperatures across North America (Bechtel, 2015). The thermal conditions under which most sedimentary dolomites form overlap with the range of temperatures (0-200°C) for which this study proposes a Ksp−dol suggesting a wide applicability for this regression analysis. For a given temperature, a deviation in log 10 ( Ca 2+ a / Mg 2+ a ) away from the average value is interpreted to reflect a variation in the composition of the equilibrium dolomite phase.

SUMMARY OF PAST WORK
The Ca-Mg-CO2-H2O-Calcite-Dolomite System There are two dominant mechanisms through which dolomite forms. The first is primary precipitation from a dolomite-supersaturated fluid; CaMg(CO 3 ) 2 (s) ↔ Ca 2+ (aq) + Mg 2+ (aq) + 2CO 3 2− (aq) Primary dolomites are found cementing sandstone lithologies (Spötl and Pitman, 1998), in metamorphic rocks (Bucher and Grapes, 2011) and as dolocretes (Khalaf, 2007). In natural 3 systems primary dolomite is much less volumetrically significant than secondary (diagenetic) dolomite mostly found replacing limestones (Warren, 2000). Secondary dolomite forms via a dissolution-reprecipitation mechanism whereby a Mg-rich fluid enters a calcite bearing formation, calcite dissolves, the fluid becomes dolomite supersaturated and precipitates dolomite; 2CaCO 3 (s) + Mg 2+ (aq) ↔ CaMg(CO 3 ) 2(s) + Ca 2+ (aq) The molar ratio of dissolved calcite to precipitated dolomite is a function of both the (temperature dependent) solubility and stoichiometry/order of all equilibrium solid phases (including any common-ion effects), and the unbuffered original fluid composition (particularly with respect to Ca The equilibrium constant (K sp−dz ) of equation (5) which can be rearranged to determine pKsp−dol; log 10 K sp−dol = 2 log 10 K sp−cal − log 10 ( Ca a 2+ Mg a 2+ ) Improvements in estimates of Ksp−cal, and to a lesser extent ion activity models, lead to progressive refinement of Ksp−dol models. For example, early work by Hsu (1963) estimated pKsp°−dol to be -16.69 using a pKsp°−cal of -8.29 ( Garrels and Drever, 1952). Recalculating (supplementary table 1) the data from Hsu (1963) using a pKsp°−cal from SUPCRT92 (table 1; slop07.dat database - Johnson and others, 1992) of -8.48 results in a decrease of pKsp°−dol to -17.07. Sherman and Barak (2000) and Bénézeth and others (2018) both collate and recalculate Ksp−dol values using different reference thermodynamic data and these studies form the basis for our review (supplementary table 2). Utilising Ksp−cal to determine Ksp−dol from the perspective of calcite-dolomite equilibrium has been employed for both experimental studies ( Rosenberg and Holland, 1964; Baker and Kastner, 1981; Morrow and others, 1994;Usdowski, 1994; see 'Solubility (experimental)' supplementary table 2) and groundwater studies (Hsu, 1963; Barnes and Back, 1964;Hyeong and Capuano, 2001; Vespasiano and others, 2014; Blasco and others, 2018;see 'Solubility (groundwater) ' supplementary table  2).
Complicating matters, Möller and De Lucia (2020) argue that because calcite and dolomite dissolve and precipitate incongruently, then log 10 ( Ca 2+ a / Mg 2+ a ) and by extension equation (9), cannot be used to determine the thermodynamic properties of either bulk mineral. Specifically Möller and De Lucia (2020) state that equilibrium is maintained with non-stoichiometric magnesian calcite and calcian dolomite surface phases; 2Ca Mg 1− CO 3 (s) + (2 − )Mg 2+ (aq) ↔ Ca Mg (2− ) (CO 3 ) 2 (s) + (2 − )Ca 2+ (aq) such that the pK sp−dol for the non-stoichiometric dolomite surface phase is; pK sp−dol = 2 pK sp−cal − (2 − )p ( Ca a 2+ Mg a 2+ ) Assuming that the composition of neither surface carbonate phase is known the use of a single value of log 10 ( Ca 2+ a / Mg 2+ a ) generates an inherently non-unique solution. Bulk mineral calcite and dolomite compositions deviating from ideality, both with respect to stoichiometry and order, pose similar non-uniqueness problems. We review the current understanding of the potential bulk and surface mineral contributions to solution chemistry.

Dolomite Ordering and Stoichiometry
Ordering.--The crystallographic structure of ideal dolomite consists of alternating layers of covalently bonded Ca 2+ , Mg 2+ and CO3 2ions ( fig. 1) ( Gregg and others, 2015;Deelman, 2003 and references therein). The level of homogeneity within individual cation monolayers is described as the degree of substitutional order (s) for the dolomite phase, with unity representing total order (s = 1) and zero total disorder (s = 0). The different sizes and Coloumbic attraction profiles of Ca 2+ and Mg 2+ ions mean disordered locations of either ion result in thermodynamically unfavourable (lattice strain) tilting and rotation of carbonate anions (Althoff, 1977;Deelman, 2003; Antao and others, 2004).
There exists a dynamic equilibrium with cations constantly swapping locations with overall ordering remaining constant. This is the mean-field assumption (ordering parameter homogeneity across an infinite lattice) that is the basis of the Bragg-Williams model (Bragg and Williams, 1934; Chaikin and others, 1995) seminally applied to dolomite by Navrotsky and Loucks (1977). The Bragg-Williams model enables both thermodynamic and kinetic components of dolomite ordering to be considered. Order (s) is described by the relationship (Navrotsky and Loucks, 1977); where Tc is the critical temperature at which there is zero order. Higher temperatures decrease the difference in the potential energy between any two sites and the locations of Ca 2+ and Mg 2+ ions tend towards randomness, with the probability of an ion occupying a lattice location proportional to stoichiometry. Disordering is therefore endothermic (positive ∆Hdis) and disorder increases up to the Tc (Tc = 1373K-1473K, Goldsmith and Heard, 1961;Luth, 2001; Antao and others, 2004) whereupon there is total disorder (s → 0 as T → Tc; fig. 2). The loss of XRD-peaks (lattice-plane reflections) 101, 5 015 and 021 at Tc is also interpreted to reflect a second-order phase change to very-high magnesium calcite (VHMC) associated with a space group transformation from R3 ̅ (dolomite) to R3 ̅ (calcite) ( Antao and others, 2004; Gregg and others, 2015). For a synthetic ideal dolomite with a Tc of 1473K ( Goldsmith and Heard, 1961), Helgeson and others (1978) determined the enthalpy of complete disordering (Δ dis°) at reference state conditions to be 12.25kJ mol −1 , whilst Navrotsky and others (1999) determine the average Δ dis° for a suite of dolomites (natural and synthetic) is even higher at 33 ± 6 kJ mol −1 . High enthalpies of disorder are thought to correlate to substantial increases in dolomite solubility (Helgeson and others , 1978).
Based on solutions to equation (12) derived by Chaikin and others (1995), for the Bragg-Williams model well-ordered dolomite (s ≥ 0.96) is the most stable state at temperatures ≤ 500°C ( fig. 2). Helgeson and others (1978) suggested the thermodynamics of natural dolomites, that have (metastable) cation disorder acquired at low-temperatures, could be approximated by extrapolating high-temperature measurements of disorder to reference state conditions. This model yields pKsp°−dol values of -18. 14 and -16.60 for the ordered and disordered dolomite phases respectively (supplementary table 2). Helgeson and others (1978) observed that prior estimates of pKsp°−dol ranged from -16.4 to -19.3, though most were close to -17 (see supplementary table 2) and interpreted that this value corresponded to a partially ordered dolomite with an s of 0.7. The ordered, disordered and naturally-ordered (s = 0.7) dolomite phases from Helgeson and others (1978) are included in a number of thermodynamic databases valid between 0-300°C, particularly those derived from SLOP databases ( Shock and Helgeson, 1988). Most thermodynamic datasets are in some way derived from the SLOP databases including EQ3/6 which is the source of many TOUGHREACT and PHREEQC compatible databases and slop07.dat which, in conjunction with SUPCRT92, generates the solubility data for the disordered and ordered dolomite phases used by this study (Wolery and others, 1990; Parkhurst and Appelo, 1999;Xu, 2008). The ordered, disordered and naturally-ordered dolomite phases have been extensively used in a variety of simulations ( André and others, 2007; Whitaker and Xiao, 2010;Al-Helal and others, 2012;Gomez-Rivas and others, 2014; Blasco and others, 2017;Hirani and others, 2018; Benjakul and others, 2020).
The metastable naturally-ordered (s = 0.7) phase Helgeson and others (1978) interpreted to be "probably typical of modern sedimentary dolomite" is confusing. Most work on what is now considered to be 'modern' (Holocene) dolomite was conducted after Helgeson and others (1978) and pre-Holocene dolomites typically have an s > 0.7 (see below). The range of Ksp°−dol values referenced by Helgeson and others (1978) encompasses both experimental studies and groundwater studies; the groundwater studies are primarily comprised of samples from the Floridan aquifer which is not modern, being Eocene to Miocene in age (Hsu, 1963; Barnes and Back, 1964). We discuss the relationships between, and the complexity of, dolomite ordering and compositions arguing that the later is the most significant property influencing the solubility of (pre-Holocene) dolomites.
Holocene dolomites are found with an s << 1, though in many cases the s > 0.7. For a near surface sediment core Gregg and others (1992) determined through Rietveld analysis that the youngest, nearest surface, calcian (60-54% CaCO3) dolomites had an s of 0.7 and s increased with depth to a maximum of 0.9 (30cm beneath the surface). Rietveld structural analysis iteratively refines a mineralogy to fit XRD data whilst, the simpler more commonly 6 applied technique of Goldsmith and Graf (1958) determines s using the ratio of the intensities of the (105 ̅ ) and (110) reflections. Fang and Xu (2019) account for variations in stoichiometry and suggest that evaluations made using the Goldsmith and Graf (1958) methodology underestimate the s for calcian dolomites finding Holocene dolomite s values to be mostly between 0.5 and 0.8.
Stoichiometric pre-Holocene dolomites are typically completely ordered (i.e. s ∼ 1) and have, though not always (see Miser and others, 1987), homogenous microstructures observed by transmission electron microscopy (TEM) (Reeder and Wenk, 1983;Reeder, 1992;Reeder, 2000). Complete ordering is unsurprising as the thermodynamic drive for order at diagenetic temperatures (0 -300°C) is high ( Helgeson and others, 1978; Navrotsky and others, 1999). Where cation disorder is interpreted it is often associated with calcian dolomite compositions, though Fang and Xu (2019) conclude that, regardless of stoichiometry, a dolomite with an s > 0.5 will have a (105 ̅ ) diffraction peak in XRD analysis. This is supported by experimental evidence of a close correlation between cation disorder and calcian dolomite compositions ( Kaczmarek and Thornton, 2017;Kell-Duivestein and others, 2019).
For pre-Holocene dolomites, from a wide variety of compositions, Pina and others (2020) observe all samples to have a (105 ̅ ) diffraction peak suggesting at a minimum all pre-Holocene dolomites have an s > 0.5, though they also note a range of disorder in stoichiometric dolomite. For Cretaceous calcian dolomites (~54.5 -53.5% CaCO3) Manche and Kaczmarek (2019) determine, using the Goldsmith and Graf (1958) technique, a high degree of disorder (mean of facies 0.43 < s < 0.56; total range 0.32 < s < 0.84). We argue that the calculated disorder for these calcian dolomites would likely increase if reanalysed using the Fang and Xu (2019) methodology and that significant cation disorder, defined as s < 0.5, resolves on short (Holocene) timescales regardless of stoichiometry. Hyeong and Capuano (2001), Vespasiano and others (2014), and Blasco and others (2018) interpolate between the Helgeson and others (1978) ordered and disordered dolomite phases to determine the s value (and Ksp−dol value) for the equilibrium dolomite phase (supplementary table 2). All three studies interpret high levels of disorder; Hyeong and Capuano (2001) determine s to be 0.4, whilst reported thermodynamic properties suggest the dolomites in the studies of Vespasiano and others (2014) and Blasco and others (2018) are marginally more ordered. Respectively the dolomites sampled are interpreted to be Oligocene ( Hyeong and Capuano, 2001), Mesozoic (Vespasiano and others, 2014), and Triassic-Upper Jurassic ( Blasco and others, 2018) in age and these values, particularly that of Hyeong and Capuano (2001), do not appear to be consistent with the aforementioned interpretations of s values for pre-Holocene dolomites being typically > 0.5. Instead, we interpret that these dolomites, and the Floridian aquifer dolomite evaluated by Helgeson and others (1978), reflect equilibrium with a natural dolomite phase of an unknown s, though with a minimum value of 0.5, and a potentially non-stoichiometric composition.
Stoichiometry.--Many dolomites have microstructural heterogeneity (typically lamellarlike modulations with wavelengths ∼100-200Å observed by TEM) and XRD signals that are attenuated and diffuse, and are interpreted to reflect the presence of lattice disorder (Gregg and others, 1992;Navrotsky and others, 1999; Gregg and others, 2015). The degree of this 8 Metastability summary.--In the context of an Ostwald-step model the overall reaction of the precipitation of stoichiometric ordered dolomite remains equation (3), but metastable and increasingly insoluble intermediary phases provide a catalytic pathway (Nordeng and Sibley, 1994;Kaczmarek and Sibley, 2007). A wide variety of intermediary metastable phases are suggested, including magnesite ( Chou and others, 1989;Sherman and Barak, 2000), amorphous phases ( Kaczmarek and Sibley, 2007;Rodriguez-Blanco and others, 2015), VHMC ( Gregg and others, 2015), protodolomite ( Graf and Goldsmith, 1956) and calcian/disordered dolomite ( Goldsmith, 1983;Reeder, 1992;Kaczmarek and Sibley, 2007;Kaczmarek and Thornton, 2017;Kell-Duivestein and others, 2019). Of these calcian dolomite is the most stable and therefore potentially consequential to natural subsurface dolomite thermodynamics.
The concept of dolomite metastabilty was incorporated by Graf and Goldsmith (1956) into their definition of protodolomite; 'single phase rhombohedral carbonates which deviate from the composition of the dolomite that is stable in a given environment, or are imperfectly ordered, or both, but which would transform to dolomite if equilibrium were established'. Protodolomite is commonly used to describe Holocene dolomites with significant cation disorder, though Land (1980) and Gregg and others (2015) reject the term 'protodolomite' on the basis that any phase without order is not considered dolomite. Instead, they define rhombohedral Ca-Mg carbonates with near-dolomite stoichiometry, but lacking cation ordering, as VHMC. Dolomite in contrast must have (XRD) evidence of 'ordering reflections' (Bradley and others, 1953; Graf and Goldsmith, 1956) that cannot occur in the R3 ̅ c space group.
For pre-Holocene dolomites, the vast majority of which have ordering reflections, the present dolomite definition arguably focuses too narrowly on order. Gregg and others (2015) permit a dolomite to have a 'near-dolomite stoichiometry' continuum, stemming from Land (1980)'s assertion that 'Ideal, stoichiometric dolomite is the exception, not the rule'. This statement conflicts with the observation that the modal natural dolomite composition is stoichiometric ( fig. 3). Undoubtedly there is complexity in the relationship between cation ordering and stoichiometry but for the purposes of this study we consider natural pre-Holocene dolomite to either be stoichiometric (and well-ordered) or calcian (with an indeterminate, though probably high, level of cation order).

Calcite Stoichiometry
The incorporation of a variable amount of Mg into the calcite structure is typically described as a non-ideal solid solution between the CaCO3 and MgCO3 end-members (see Deelman (2003) for an alternative mixed crystal theory). The dissolution of MgCO3; has the solubility product constant; where pKsp°−mag = -8.035 (table 1) (16) is said to be at stoichiometric saturation (see Lippmann, 1977;Gresens, 1981a;Gresens, 1981b;Gamsjäger and others, 2000). However metastable or stable thermodynamic equilibrium only occurs at a singular Ca 2+ a : Mg 2+ a : CO a 3 2ratio and when there is thermodynamic equilibrium between every component (end-member) in every phase. The solubility of Mg-calcite solid solutions of varying compositions was first evaluated by Thorstenson and Plummer (1977) through a reinterpretation of the Mg-calcite solubility data of Plummer and Mackenzie (1974); this reinterpretation was notably critiqued by Gresens (1981a) for generating too great a range in equilibrium Ca 2+ a / Mg 2+ a values. The use of the erroneously reinterpreted data from Thorstenson and Plummer (1977) by Möller and De Lucia (2020) is detrimental to their critique of Bénézeth and others (2018) (see below). For reference the solubility of nonideal solid solutions, such as Mg-calcite, is now estimated using Equal-G methods developed by Lippmann (1980) and Königsberger and Gamsjäger (1992).
Mg-calcite is either described as low-Mg calcite (LMC ∼ 1-4% -MgCO3 though commonly around 2%), high-Mg calcite (HMC 4-30% MgCO3) or, typically observed only during experimental synthesis, very high-Mg calcite (VHMC ∼ 30 to 51 % MgCO3 including dolomitic calcite ∼ 50% MgCO3), and huntitic calcite (75% MgCO3) compositions ( Gregg and others, 2015;Möller and De Lucia, 2020). HMC, in contrast to LMC, is only produced by biogenic processes and is poorly preserved, commonly undergoing recrystallisation to LMC ( Bischoff and others, 1993). HMC typically loses ≥ 50% of the initial Mg content within a relatively short time (10 1 − 10 4 years; Dickson, 1995 and references therein). Thus, the vast majority of subsurface lithologies that contain calcite, are likely to be primarily composed of LMC, and typically we consider pure calcite a good approximation for LMC. However in the discussion we do evaluate the effect of LMC on calcite-dolomite equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) values; specifically we assume LMC to be an ideal solid solution as the difference between calcite and mag nesite so lub ilit y is small, LMC compositions are close to the calcite end-member, and the difference between non-ideal and ideal solid solutions decreases as the composition tends towards an end-member ( Königsberger and Gamsjäger, 1992). For the ideal solid solution model equation (16) reduces to; pK sp-cal( ) = (1 − )pK sp-cal + pK sp-mag (17)

Surface Complexities
The dissolution of Mg-calcite should be incongruent; the more soluble magnesite endmember is predicted to dissolve first until no Mg remains whereupon pure calcite dissolves. Due to slow ionic lattice diffusion this does not occur, and instead both [Ca 2+ ] and [Mg 2+ ] increase suggesting that calcite and magnesite end-members dissolve congruently. This phenomena of non-equilibrium apparent congruent dissolution is described by two competing models (Gresens, 1981a); 1) Congruent dissolution; the solid solution dissolves congruently, behaving as a pure one component phase of a fixed stoichiometry. The point at which no further dissolution occurs is known as stoichiometric saturation (which may or may not also be thermodynamic equilibrium).
2) Incongruent dissolution; a thermodynamic equilibrium between the bulk and solution is maintained by an intermediary surface coating of incongruent dissolution products. This mod el is favour ed by Möller and De Lucia (2020) for the dissolution of both Mg-calcite and dolomite.
The experiments of Möller (1973), reinterpreted by Möller and De Lucia (2020), involved monitoring ( 45 Ca Th er e ar e ou ts tand ing qu es tions su rro und ing th e inter p lay b etw een bu lk mineral composition, r eactiv e sur face lay ers, an d the so lutio n du r ing times o f sign if icant mass transf er betw een ph ases . These relationships have been extensively explored for LMC (Sinclair, 2011;Sinclair and others, 2012). At one extreme, if the entire bulk mineral completely dissolves in a closed system [Ca 2+ ]/[Mg 2+ ] (and the log 10 ( Ca 2+ a / Mg 2+ a ) value) equals the bulk mineral [Ca]/ [Mg]. As mass transfers between phases increase, it seems reasonable to conclude that 1) log 10 ( Ca 2+ a / Mg 2+ a ) values will increasingly reflect equilibrium with the bulk composition and 2) the composition of the thin surface phases will respond (irrelevantly) to the solution log 10 ( Ca 2+ a / Mg 2+ a ) value maintaining a role as a catalytic intermediary phase. Möller and De Lucia (2020) suggest that the Möller (1973) log 10 ( Ca 2+ a / Mg 2+ a ) value in equilibrium with a dolomitic-composition surface phase on bulk calcite is the same as the log 10 ( Ca 2+ a / Mg 2+ a ) value determined to reflect equilibrium with the Bénézeth and others (2018) dolomite phase ( fig. 4a). Möller and De Lucia (2020) use this to argue that the determination of dolomite solubility is impossible by classic experimental methods. However the two values are not the same; specifically the log 10 ( Ca 2+ a / Mg 2+ a ) values for the Bénézeth and others (2018) dolomite phase at 25°C in 0.1M NaCl and the maximum value observed by Möller (1973) for a dolomitic surface composition (49.5-50.5% CaCO3) are -0. 01 and -0.30 respectively (fig. 4a). These values appear to be of a similar order of magnitude as Möller and De Lucia (2020) compare to the (erroneous) Thorstenson and Plummer (1977) data.
Equilibration with the Mg-calcite surface phase may be of greater significance to K sp−dol measurements when mass transfers between the bulk and solution are low. Surface phases could therefore be more consequential for experimental, rather than groundwater, Ksp−dol methods due to sluggish kinetics. Using XPS data Bénézeth and others (2018) confirmed the surface dolomite precipitate to have near ideal dolomite stoichiometry, similar to the bulk. However, in contrast to XRD data, XPS does not give information on order and thus whether the surface phase is dolomite, and not VHMC, is equivocal. Möller and De Lucia (2020) argue that the surface phases of calcite or dolomite are Mgrich compared to the bulk since Mg 2+ is more strongly bound to the surface than Ca 2+ . With respect to the experiments of Plummer and Mackenzie (1974) where biogenic calcite consisting of an average bulk HMC (18.4%MgCO 3 ) transitions to HMC (11.7%MgCO 3 ) (fig. 4b) Möller and De Lucia (2020) interpret the end state solution with a log 10 ( Ca 2+ a / Mg 2+ a ) = 0.95 is in equ ilibr iu m only with a HMC (30%MgCO 3 ) surface phase and that log 10 ( Ca 2+ a / Mg 2+ a ) values cannot be used to identify bulk carbonate compositions. There appear to be two problems with this interpretation, which Möller and De Lucia (2020) state has wide implications for determining carbonate solubility; 1-Solution non-uniqueness.--The interpretation by Plummer and Mackenzie (1974)  HMC (11.7%MgCO 3 ) compositions as this represents a non-unique solution. Supporting the interpretation of Plummer and Mackenzie (1974) that log 10 ( Ca 2+ a / Mg 2+ a ) values can be directly correlated to the composition of the bulk phase, spot EMPA/XRD determined compositions of unreacted and reacted carbonate grains correspond well to the estimated HMC (23.8%MgCO 3 ) and HMC (11.7%MgCO 3 ) phases ( Plummer and Mackenzie, 1974 Huang and Fairchild, 2001 and references therein). Such extreme fractionation is consistent with the incongruent precipitation of a nearly pure calcite early in the Plummer and Mackenzie (1974) experiment. The final incongruent steady state HMC (11.7%MgCO 3 ) precipitate that accounts for the bulk of the mass transfer results from the increasing Mg 2+ a in the closed system. Möller and De Lucia's (2020) suggestion of a thin HMC (30%MgCO 3 ) intermediary phase seems at odds with these Mg fractionation observations, though can be reconciled through mineral surface processes including electrostatic interactions.

12
The surface buffering capacity of log 10 ( Ca 2+ a / Mg 2+ a ) not only reflect changes in the surface monolayer composition but also changes in the electrical double layer. The surface potential derives from 'potential determining ions' (PDIs). PDIs are absorbed dehydrated ions with adjacent vacancies caused by preferential dissolution of surrounding ions, adsorbed hydrated surface complexes, and adsorbed charged ions (Prédali and Cases, 1973; Derkani and others, 2019). The surface potential is notably sensitive to; 1) Changes in background ionic strength (I). The accumulation of 'indifferent' ions such as Na + and Clin the electric double layer leads to effective electrostatic screening, such that the zeta potential tends towards zero with increasing I ( Strand and others, 2006;Derkani and others, 2019). As such at high I surface absorption/adsorption would be increasingly irrelevant in buffering log 10 ( Ca 2) The bulk crystal composition. Ca 2+ and Mg 2+ are interpreted as PDIs for calcite whilst they are probably not PDIs for dolomite (see Derkani and others (2019) and references therein). This suggests that the dolomite surface has no potential to buffer log 10 ( Ca 2+ a / Mg 2+ a ) through adsorption/absorption and that only recrystallisation (or ion-exchange with the monolayer) can modify log 10 ( Ca 2+ a / Mg 2+ a ). During early stages of dolomite dissolution there is an initial spike in log 10 ( Ca 2+ a / Mg 2+ a ) and thus, the rate limiting step is interpreted to be the dissolution of Mg-surface complexes (Pokrovsky and Schott, 2001; Gautelier and others, 2007). This is likely to also be the case for (Mg-)calcite even though calcite dissolution kinetics are less well constrained ( Pokrovsky and others, 2009). The reason for the lower solubility of Mgsurface complexes is interpreted to be the larger enthalpy of hydration for Mg 2+ relative to Ca 2+ ( Busenberg and Plummer, 1982;Pokrovsky and Schott, 2001; Gautelier and others, 2007). However, for dolomite immediately after the initial release of Ca 2+ , steady state quickly establishes and the overall dissolution and equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) values appear to reflect congruent dissolution of stoichiometric dolomite, and the surface [Ca]/[Mg] remains broadly constant (Pokrovsky and Schott, 2001). This assertion of a constant surface [Ca]/[Mg] is critiqued by Möller and De Lucia (2020) who argue that the Pokrovsky and Schott (2001) XPS measurements sample 20 -25 layers whilst the Möller (1973) Ca 45 measurements record compositional variation in a single monolayer, albeit of Mg-calcite ( Möller and Sastri, 1974;Möller and De Lucia, 2020). As mixed cation layers are not found in dolomite the surface mechanism for buffering log 10 ( Ca 2+ a / Mg 2+ a ) is likely to be different to a mixed cation calcite surface. We acknowledge calcite surfaces may have potential to buffer log 10 ( Ca 2+ a / Mg 2+ a ) though suggest this is mitigated by significant mass transfers. We support the view of Pokrovsky and Schott (2001) that dolomite dissolution is congruent and surface compositions are c o m p a r a b l e t o t h e bulk. Möller and De Lucia (2020) interpret that equilibrium with a wide variety of differently composed Mg-calcite surface phases has resulted in the scattered range of log 10 ( Ca 2+ a / Mg 2+ a ) values in their meta-analysis (n=242) of groundwaters. This study 13 adopts a similar big data approach and focusses intensely on scatter to evaluate whether surface phases or bulk compositions are buffering log 10 ( Ca 2+ a / Mg 2+ a ).

Determination of Ksp-dol
There are two contrasting approaches to calculating Ksp−dol; 1) Indirectly determining Ksp−dol by combining free energies of ions and compounds (fig. 5a; see 'Thermal decomposition\Database' in supplementary table 2). For example reaction enthalpies (e.g. dolomite-pyroxene equilibria at 700°C from Chai and Navrotsky, 1993) can be extrapolated to lower T/P conditions and combined with other data, such as ionisation/hydration enthalpies for Ca 2+ and Mg 2+ , to determine Ksp−dol. Similarly, refinements to the mineral data ( Holland and Powell, 1990;Holland and Powell, 1998) and aqueous models ( Miron and others, 2017) can lead to updates to Ksp−dol models.
2) Direct experimental determination of Ksp−dol in aqueous solutions near conditions of interest is thought to be comparatively more accurate and applicable than the indirect approach ( Hefter and Tomkins, 2003). Experimental solubility observations are ideally made over a wide temperature range and determined from both conditions of undersaturation via dissolution and supersaturation via precipitation so demonstrating thermodynamic reversibility. Unfortunately sluggish kinetics inhibit the attainment of equilibrium, particularly via precipitation, at lower temperatures (< 50°C) within reasonable experimental time frames (< 10 years) ( Gregg and others, 2015).
The groundwater approach assumes that the residence time of subsurface fluids is sufficient for the attainment of calcite-dolomite equilibrium enabling a direct determinatio n of Ksp−dol over a w id e temper atu re r an ge ( fig. 5a; see 'Solubility (groundwater)' in supplementary table 2). We further review the direct experimental and groundwater methods for deriving Ksp−dol below.
Solubility.--A secondary effect of determining the temperature dependence of the solubility product using a (Maier-Kelley) regression is that the difference between the undersaturation and supersaturation approaches is averaged. This is termed the quasi-static method, and the smaller the difference between the two approaches the more precise the measurement ( Hefter and Tomkins, 2003). Most experimental estimates of pKsp−dol are based only on approaching equilibrium from conditions of undersaturation because of the difficulty in precipitating unequivocally ordered stoichiometric dolomite on realistic experimental timescales, particularly at lower temperatures.
Dolomite equilibrium has been measured at 80°C by Gautelier and others (2007) and at 53°C by Bénézeth and others (2018). However, Gautelier and others (2007) only evaluate equilibrium via dissolution, and there remains uncertainty (see Möller and De Lucia, 2020) around the composition of the precipitated phase in Bénézeth and others (2018). Bénézeth and others (2018) suggest that, aside from Gautelier and others (2007) at 80°C, no other single-phase (dolomite-only) experimentally derived values for Ksp−dol have been reported for temperatures > 25°C. However this ignores the work of Rosenberg and Holland (1964), Baker and Kastner (1981), Morrow and others (1994), and Usdowski (1994)  Groundwater.--The groundwater approach presents an opportunity to 1) evaluate Ksp−dol at near-equilibrium conditions due to long residence times and 2) understand natural variations in calcite-dolomite equilibrium. However, whilst in an experiment both the solute and solvent can be analysed, for the groundwater approach these parameters are often less wellconstrained. Nevertheless most prior studies applying the groundwater approach (Hsu, 1963; Barnes and Back, 1964;Hyeong and Capuano, 2001; Vespasiano and others, 2014;Blasco and others, 2018) are site specific and establish a priori the presence of both calcite and dolomite in host aquifers ( fig. 5a; see 'Solubility (groundwater)' supplementary table 2). This contrasts with the big data approaches of Möller and De Lucia (2020) and this study that seek to interpret the presence of calcite-dolomite equilibrium from log 10 ( Ca 2+ a / Mg 2+ a )temperature relationships alone.
Defining Ksp−dol assuming the system is only-dolomite buffered (eq 4) is complicated by sample degassing and/or equilibration with atmospheric CO2 which compromises the quality of pH and HCO3measurements ( Hyeong and Capuano, 2001). The primary advantage of assuming calcite-dolomite equilibrium (eq 8) is the independence from pCO2 and pH measurements ( Hyeong and Capuano, 2001). Comparing approaches we note a two order of magnitude difference in the estimated Ksp−dol (supplementary table 1). Moreover, by multiplying the activities for the dolomite-only system (eq 4) the errors on each term are cumulated, whereas if calcite-dolomite equilibrium (eq 8) is assumed the errors on each term are reduced and compensatory (Hsu, 1963).
Through regression analysis we interpret that bulk calcite-dolomite equilibrium is extremely common in subsurface fluids and evaluate Ksp−dol. Variations in log 10 ( Ca 2+ a / Mg 2+ a )-temperature relationships are speculated to relate primarily to the composition of the equilibrium dolomite phase.

METHODOLOGY
To evaluate the dolomitizing potential of subsurface fluids, geochemical data is sourced from the U.S. Geological Survey National Produced Waters Geochemical Database V2 (PWGD; Blondes and others, 2016). This is an extensive database of formation waters from a range of sedimentary lithologies across North America to a maximum 6.6 km depth. The PWGD has previously been used for a variety of purposes including investigating the potability of produced water ( Aines and others, 2011), hydrogeological analysis of formational fluids ( Engle and Blondes, 2014), and mineral exploration (Ray, 2016).
The PWGD dataset of 165,960 waters is filtered using a set of criteria (supplementary table 3) adapted from Hitchon and Brulotte (1994) and Blondes and others (2016), leading to the rejection of 93% of samples. The remaining 11,846 samples are categorized into 5 lithological groups; Sandstone, Dolomite, Limestone, Mixed Carbonate and Shale (supplementary table 4).
The PWGD includes the temperature at which the pH was recorded, albeit very rarely (3% of samples with an average of 24°C) and we set this temperature to 25°C if not provided. Of more critical importance the PWGD lacks data on temperatures at-formation-depth. Similar to Shope and others (2012) at-formation-depth temperatures are calculated from sample depth measurements (PWGD) using the corrected geothermal gradients (SMUH) and the mean annual surface temperature.
We use the Southern Methodist University Heat Flow (SMUH) database to estimate temperature at-formation-depth. This database contains both geothermal observations and calculated thermal gradients for 99,044 records ( fig. 6b) of drilled wells across North America ( Blackwell and others, 2011). The database is filtered to include only records that have bottom hole temperature-Harrison correction (BHT-Harrison) geothermal gradients between 0°C/ km and 200°C/km (Harrison and others, 1983). Geothermal gradients are down-sampled to 0.1°x0.1° Lat/Long areas then interpolated (SciPy griddata linear method) to generate local geothermal gradients across the US ( fig. 6a).
Surface temperatures are derived from mean annual land surface temperature (MAST) measurements for North America between 2003-2014 (Bechtel, 2015) and downsampled to a 0.1°x0.1° Lat/Long resolution ( fig. 6e). The combined 0.1°x0.1° surface temperature (MAST) and corrected geothermal gradient (SMUH) datasets are referred to as the geothermal model. Of the filtered PWGD samples, 11,480 samples (96.9%) are coupled to the geothermal model generating an at-formation-depth temperature for each sample. The geochemical composition and lithological classification of samples in this dataset used in the PHREEQC modelling are summarised in supplementary tables 4 and 5. The majority of unmatched samples are from locations outside the contiguous continental states.
Using PHEEEQC with the Pitzer database (Plummer and others, 1988; Parkhurst and Appelo, 1999), which is valid up to 200°C and halite saturation, fluids are driven from the temperature at which the pH was measured (or 25°C) to their estimated at-formation-depth temperature. The Pitzer database is appropriate here as the median (P50) total dissolved solids is 42,745 mg/l and the median (P50) Pitzer calculated I is 0.85 mol/kg (supplementary table 5). The data (n=11,480) is then filtered such that the subset of samples with log 10 ( Ca 2+ a / Mg 2+ a )-temperature values outside of the range between SUPCRT92 (slop07.dat) calciteordered dolomite & calcite -disordered dolomite equilibriums (termed hereafter the 'SUPCRT92-filter') are excluded as they are determined unlikely to be at calcite-dolomite equilibrium. The remaining samples (n=10,343) constitute the main dataset for the principal regression analysis (supplementary table 4).
Both generalized linear mixed-effects models that use fixed and random effects (lme4; Bates and others, 2014), and linear models that contain only fixed effects, were constructed using R (Team, 2000) to evaluate the relationship between fluid temperature (fixed effect) and log 10 ( Ca 2+ a / Mg 2+ a ) (outcome variable) (table 2). The relationship is evaluated using variations of the Maier-Kelley thermodynamic regression formula; Models utilising the full five term Maier-Kelley thermodynamic regression formula are found to produce sizeable spurious changes, particularly at temperatures < 25°C, and are not presented. For most analyses in this study a three-term (a, b and c) expansion, as used by Bénézeth and others (2018), is supported by goodness of fit tests, in particular the Akaike information criterion (AIC) (see below). Typically, a three-term model generates very similar results to a two-term model but also enables the estimation of the heat capacity coefficient (°) and hence is the primary equation used for most models presented.
The application of a mixed-effect model evaluates the variation in the fixed effects for each 'group' of samples which share an identical descriptive attribute such as a commonality in the field or the type of lithology. These descriptive attributes are modelled as random effects. This study evaluates up to 7 different attribute-level random effects present in the majority of all samples; the depth, the formation, the field, the basin, the lithology and the age (broken down into period and series) of a sample. Note that for mixed modelling some missing descriptive attribute data is inconsequential.
As an example, the representative mixed model (J21) that utilises all 7 random effects is coded into R as follows; where lmer is the mixed modelling package used ( Bates and others, 2014).
For a single sample in the PWGD there is a 3-level (nested) hierarchical structure for spatial data. This is implemented in the mixed model as a series of nested random effects at a dataset level, not needing to be explicitly included in equation (20). The 10,343 samples from different wells share 2,168 unique common fields and these fields are nested together by shared basin (n=53). Similarly for the temporal data the time series (n=35) is nested within a geological period (n=13) (i.e. the Tournasian is in the Carboniferous).
Random effects are described as completely crossed if samples possess every combination of random effect attribute, whilst they are partially crossed if there is some degree of nesting. Typically the attributes of lithology, time-period/series and formation operate as partially crossed random effects. For example, in some cases the formation random effect will be substantially crossed against the basin and field random effects; e.g. the Tensleep Fm. (n=659) spans 7 different basins. In other cases local formations, particularly those related to production, are completely nested within a single field.
An additional attribute is created by collating samples from the same field into groups of similar depth under the assumption that they share a common formation/fluid. The assumption is most reliable where, at an intrafield scale, the subsurface geology is laterally homogenous and hydrologically connected. There are 9,049 unique depth groups which suggests that there are 1,294 (10,343-9,049) repeat measurements of the same depth in the same well. By collating samples taken from similar depths the number of groups is significantly reduced. For example, for a 300m depth interval grouping (e.g. all samples from 600 -900m) the total number of groups reduces to 3,296.
For the representative model (model J21) the simplified (i.e. presenting all random effects as crossed random effects) formal statistical notation of this 3-term Maier-Kelly regression, 7 random effect model is as follows; Where log 10 ( Ca a 2+ Mg a 2+ ) ijklmnpq represents the i th value of the response variable log 10 ( Ca 2+ a / Mg 2+ a ), β 0 is the global intercept, β 1 and β 2 are the coefficients for the temperature (K) which is modelled as a fixed effect and U(j…p) are the 7 random intercepts with the subscripts denoting depth(j), field(k), basin(l), formation(m), lithology(n), period(p) and series(q). ∈ ijklmnpq is the residual or error term for the i th value after accounting for the random effect terms.
The mixed model returns three sets of results that are of interest. At the global-level, the model generates a global intercept (β 0 ) and global fixed effect coefficients (β 1 and β 2 ) which effectively describe the properties of the average sample. At the attribute-level the mixed model returns information that describes how much of the model variance can be attributed to each random effect (e.g. 'field' or 'formation') that is implemented. As each random effect is implemented the residual variance (∈ ijklmnpq ) typically reduces.
For each individual group, for example all samples from the Soso field, the model returns coefficients that describe the deviation of the group away from the global-level model. Mixed models that only evaluate the deviation of group data away from the globallevel model by generating a new (β 0 ) intercept for each group are termed random intercept (RI) models. By inserting RI values for different groups into the place of the global intercept, with the global fixed effect coefficients remaining constant, separate local solubility models for different groups can be created. Random slope models further this by incorporating the dependence of the fixed effect coefficients (β 1 and, though not used in this study, β 2 ) on the random effects such that each group has both an RI value and a random gradient (β 2 and/or β 1 ) for the log 10 ( Ca 2+ a / Mg 2+ a )-temperature profile. A primary feature of mixed modelling is partial pooling. Partial pooling stands in contrast to the alternative strategies of complete and no pooling. Complete pooling models all data as a single group; for example the linear model (model K1) of the entire PWGD represents a complete pooling approach. The main problems with complete pooling are that 1) between-group variations are suppressed and 2) both attribute-level and group-level information, which may be of interest, is lost. Moreover suppressing grouping information can have deleterious effects on uncertainty estimates ( German and Hill, 2006). A no-pooling approach models each group separately; for example, for the PWGD this would comprise several thousand linear models of each group of samples with a common attribute. A nopooling approach overstates the between-group variation with significant, though unreliable, differences between individual groups appearing. The partial pooling approach represents a compromise and a significant advance. All three levels of mixed model results (global-, attribute-and group-level) are informed by both the data contained within each group and group size. Low sample size groups possess less information so both influence the attribute-/global-level results less and have group-level coefficients similar to the attributelevel/global-level results (Gelman and Hill, 2006).

18
The application and statistical validity of mixed models for datasets with small numbers of observations and groups is an active area of research as applied research is frequently data limited (Bell andothers, 2010, Hox andothers, 2017). For mixed models there are group and sample size guidelines; Hox and others (2017) suggest that the ratio of the number of groups to the number of individuals in each group be on the order of 100/10 (100 groups of 10 individuals), 50/20 or 30/30. The mixed models (A1 -I1) presented in the ground truthing section below fail these rules of thumb and linear models for these cases are arguably more statistically valid. However the small sample size mixed models correlate well with linear models illustrating the lack of sensitivity of mixed model global-level fixed effect coefficients to limited data ( Bell and others, 2010). The primary reason these models are included is because they demonstrate important features of mixed models at a relatable smaller scale. The models for the PWGD (n=10,343 or 11,480) easily surpass minimum group and sample size requirements. The PWGD contains many small sample size groups (n < 5) though this is not thought to significantly affect fixed and random coefficients (Hox and others, 2017).
Model selection aims for the simplest model to describe the relationship between temperature and log 10 ( Ca 2+ a / Mg 2+ a ), and whilst guided by statistical tests remains an inherently subjective exercise. The AIC, the R 2 , 95% confidence intervals (2σ), and the intraclass correlation coefficient (ICC) are statistical tests that assist selection by evaluating goodness of fit and model uncertainty. In particular the AIC compares differently parameterized models, utilizing identical datasets, with the model possessing the lowest AIC value (AICmin) regarded as the best fitting. AIC differences between models smaller than 10 are not regarded as sufficient criteria for model rejection whilst differences smaller than 2 suggest support for both models ( Burnham and Anderson, 2002).
We frequently observe that samples from an individual group (e.g. all samples from a single field or formation) appear to have strong within-group similarities, but there are significant differences between-groups often including when the groups are closely linked (for example immediately adjacent fields). The intraclass correlation coefficient (ICC) describes how strongly within-group samples resemble each other or, more specifically by extension, the fraction of total variation of a dataset that can be accounted for by betweengroup variation (Gelman and Hill, 2006). The ICC is implemented using the method of Nakagawa and others (2017) and (as is typical) we report only the adjusted ICC value, which is the percentage of total variance explained by groups (excluding fixed effect variance). In essence adjusted ICC reflects the reproducibility of replicate measurements from each group. ICC values < 0.4 indicate poor reproducibility, 0.4 ≤ ICC < 0.75 indicate fair to good reproducibility and ICC ≥ 0.75 indicates excellent reproducibility. For an ICC of zero, observations within-groups are no more similar than observations between-groups and there is no difference between a mixed model (partial pooling) and a linear model (complete pooling). As such the ICC is used as a discriminator for determining whether to apply a mixed model technique. Whilst the fixed effects may change little if moving from a linear to a mixed model, as is the case in many instances in this study, the larger the ICC the greater the impact on the standard error of a regression term and other measures of model uncertainty such as 95% confidence intervals. It is widely recognised that mixed modelling more 19 accurately quantifies uncertainty for clustered datasets over linear models (Gelman and Hill, 2006).
For linear models only, the adjusted R 2 value is reported. For mixed models, the method of Nakagawa and others (2017) is used to determine the marginal and conditional R 2 . The marginal R 2 describes the variance explained by the model that is a product of only the fixed effects and is generally similar to the R 2 of a comparative linear model. The conditional R 2 of a mixed model describes the variance explained by both fixed and random effects and is typically higher than the marginal R 2 as random effects account for some of the variance.

RESULTS
The PWGD samples (n = 11,480) come from a wide range of locations and depths across 30 of the contiguous US continental states, though historically prolific oil producing states such as Wyoming (n = 2,946) and Texas (n = 2,495) are heavily represented ( fig. 6d). The range in annual surface temperatures is small (-5 -21°C) compared to the range of geothermal gradients (11 -88°C/km) (supplementary table 5; fig. 6a and e). With a median formation depth of 1.9km, the contribution to the estimated at-formation-depth temperature (and uncertainty) from the geothermal gradient (σ = 6°C/km) is typically twice that of the surface temperature (σ = 6°C).
The relationship between log 10 ( Ca 2+ a / Mg 2+ a ) and temperature i s i n i t i a l l y e v a l u a t e d for t w o areas where prior studies demonstrate pore waters are at calcitedolomite equilibrium, allowing ground truthing of the approach. The areas are the northeastern section of the Texas Gulf Coast (TGC) basin, studied by Hyeong and Capuano (2001) and to a lesser extent by Morton and Land (1987), and the Mississippi Salt Dome (MSD) basin studied by Kharaka and others (1987). To evaluate the reliability of this study's geochemical and geothermal methodology within these areas we compare both groundwater chemistry data in the PWGD and the at-formation-depth temperatures derived by this study's geothermal model with the published geochemical and temperature data.

Ground Truthing: Thermal Gradients in the Texas Gulf Coast
Datasets from the north-eastern TGC primarily include samples from two formations; the Oligocene Frio Fm. and the Miocene Fm.. The Frio Fm. is comprised of sand-rich fluvio-deltaic sediments that have a variable mineral composition ( Loucks and others, 1984), including both detrital carbonate grains (caliche clasts) and diagenetic carbonate cements ( Loucks and others, 1980). The Miocene Fm., otherwise known as the Miocene Major Stratigraphic Unit, is comprised of similar fluvio-deltaic sandstones. Both formations are locally cemented by carbonate which comprise ≤ 15% of the total rock volume (Land, 1984;Hyeong and Capuano, 2001). Carbonate cements are primarily calcite, ankerite and dolomite; dolomite constitutes 10 -15% of the cement and therefore ≤ 0.8% of the total rock volume (Land, 1984;Hyeong and Capuano, 2001).
Within the Frio Fm. Morton and Land (1987) define four regions of typically distinct end-member water compositions, thought to reflect local geochemical processes, influx of waters from deeper aquifers, and structural discontinuities (faults and diapirs) that restrict regional fluid flow. Our Test Area A (bounding box W94°48' -W96°00'/N28°48' -N29°30' totalling some 9,059km 2 ) lies within the most northerly of Morton and Land's (1987) sub-regions, referred 20 to here as the 'north-eastern TGC' ( fig. 7a and b) and includes the three fields from the seminal high-resolution study of Hyeong and Capuano (2001); the West Columbia, Chocolate Bayou and Halls Bayou fields. Hyeong and Capuano (2001) collated the latter two into the 'Chocolate/Halls Bayou field' presumably due to the small sample size for the Halls Bayou field (n=2), proximity between the fields and the stratigraphic equivalence of the two fields.
For the 21 PGWD samples from 9 different fields within Test Area A the geothermal model yields a mean surface temperature of 16°C (range 15 -16°C) and a mean geothermal gradient of 33°C/km (range 29 -35°C/km) ( fig. 8a). This is steeper than the geothermal gradient of 25.7°C/km determined using the 16°C surface temperature and the subsurface temperature observations of Morton and Land (1987) for the Chocolate Bayou field. In contrast, using the 16°C surface temperature, the more recent and detailed subsurface temperature observations of Hyeong and Capuano (2001) for the Chocolate Bayou field yield a mean local geothermal gradient of 31°C/km (range 27 -34°C/km) which is consistent with both Hyeong and Capuano's (2001) interpretation of the mean local gradient (30°C/km) and the estimate for Test Area A PGWD samples (33°C/km).
Both Hyeong and Capuano (2001) and Morton and Land (1987) suggest the presence of significant local vertical variations in geothermal gradients. Hyeong and Capuano (2001) samples from the West Columbia field have a high local gradient of 39°C/km (range 32 -51°C/km). This is attributed primarily to high temperature samples (43 -58°C) from the shallow Miocene Fm. (651 -921m) which, assuming a 16°C surface temperature, have a geothermal gradient range of 40 -51°C/km. These high gradient values are significantly outside the range generated by the geothermal model for Test Area A (29 -35°C/km). This suggests that the methodology adopted here generates at-formation-depth temperatures broadly consistent with in situ observations, but also highlights limitations when applying simple linear geothermal gradients in areas of complex geothermal regimes.

Ground Truthing: Calcite-Dolomite Equilibrium in the Texas Gulf Coast Frio Formation-
Test Area A All 21 samples from Test Area A in the PWGD are from the Frio Fm. (fig. 7b). These samples are compared with Hyeong and Capuano (2001) data from the West Columbia (43 -85°C; n=16) and Chocolate/Halls Bayou fields (94 -150°C; n=35), the majority (88%) of which are also from the Frio Fm.. This area lies within the north-eastern TGC that is characterized by NaCl-type waters derived primarily from the dissolution of salt diapirs ( Morton and Land, 1987). It is for these NaCl-type waters (Ca 1,490 mg/l, [Ca 2+ ]/[Mg 2+ ] ~ 10) that Hyeong and Capuano (2001) interpret log 10 ( Ca 2+ a / Mg 2+ a )-temperature trends to be consistent with prior estimations of the calcite-dolomite equilibrium.
A primary concern of Hyeong and Capuano (2001) is evaluating the extent to which the log 10 ( Ca 2+ a / Mg 2+ a )-temperature relationship reflects either fluid mixing processes or water-rock reactions. Hyeong and Capuano (2001)   Comparing mixed models of the recalculated Hyeong and Capuano (2001) dataset, a three-term Maier-Kelly expansion (model A3, AIC = -107.4) provides a better fit to the dataset than a two-term expansion (model A4, AIC = -84.4). However, given the small size of the dataset (n = 51), three-term models (such as model A3) generate spurious extrapolations beyond the data range ( fig. 9a). A two-term expansion is the regression formula used for small (n < 400) datasets. There are minimal differences between the random intercept mixed model A4 and a random slope mixed model A5 ( fig. 9a)  ) trend with temperature (model E1), but the significance of this is limited (R 2 of -0.24) due to the small number (n = 6) and limited temperature range (43 -58°C) of Miocene Fm. samples. Hyeong and Capuano (2001) established a complete pooling approach precedent of combining observations from different fields and formations to increase the temperature range of a Ksp−dol model. Application of the mixed model approach to the Hyeong and Capuano (2001) dataset (n=51) (model A4) generates an adjusted ICC value of 0.441, indicating that grouping explains a significant amount of the variance (44.1%; good to fair reproducibility for grouped data -Rosner (2015)). As is expected for grouped data, the conditional R 2 of the mixed model A4 (0.90) is larger than the marginal R 2 (0.83), indicating that the application of random effects improves model fit; note the marginal R 2 of model A4 also approximates the R 2 of the linear model A2 (0.84). Model A4 and model A2 have very similar profiles ( fig. 9a) however the uncertainty is substantially larger for model A4. The standard error on the intercept log 10 ( Ca 2+ a / Mg 2+ a ) value rises from 1.49×10 −1 log unit for model A2 to 2.44×10 −1 log unit for model A4. Similarly, the width of the 95% confidence interval at 25°C rises from 0.13 (model A2) to 0.39 (model A4) ( fig. 9a). This is consistent with uncertainties determined by linear models being erroneous underestimates for clustered data (Gelman and Hill, 2006). However, AIC indicates that the linear model A2 (-107.1) is better than the mixed model A4 (-84.4), reflecting the overparameterization (limited numbers of group/sample size) of the mixed model. For only marginally larger datasets, AIC values favor mixed models over linear models though the statistical validity/reliability of mixed models may not be assured until group/sample size criteria are met (Hox and others, 2017).
Had Hyeong and Capuano (2001) would have observed significant disparities between-groups had they adopted a no-pooling approach (compare linear models C1, B1, D1 and E1; fig. 9b  As is commonly found in the PWGD, samples from Test Area A (n = 21) do not sample multiple depths in each of the 9 fields. Instead there is high representation from a few individual fields and from a narrow range of production related depths (for example see the discussion of Frio Fm. below). This contrasts with the wide range of sampled depths within closely spaced fields in the Hyeong and Capuano (2001) dataset. The typically more limited coverage in the PWGD means it is poorly suited to such single field analyses. However we demonstrate consistency between the log 10 ( Ca 2+ a / Mg 2+ a )temperature profiles/models of the PWGD samples in Test Area A (n = 21; model F1) and the Hyeong and Capuano (2001)  Expanding upon Hyeong and Capuano (2001) . 9d). The large scatter suggests significant local effects which may include variations in the local dolomite composition, or potentially relate to the heterogeneity of Frio Fm. water compositions ( Morton and Land, 1987).
The Frio Fm. dataset (n = 117) covers 48 fields but 35 fields have 2 or less samples, and the largest single field dataset (n = 21 from the Seeligson field) only spans a narrow 15°C temperature range (66 -81°C). Combined with the substantial scatter these two population characteristics preclude using only Frio Fm. samples to conduct a single field type analysis, similar to Hyeong and Capuano (2001) Morton and Land (1987).  Hyeong and Capuano's (2001) interpretation that the formational brines in the Cretaceous Edwards group of south-central Texas ( Land and Prezbindowski, 1981) are at calcitedolomite equilibrium ( fig. 9d). For these brines (49 -176°C) Hyeong and Capuano (2001) note a similar log 10 ( Ca 2+ a / Mg 2+ a )-temperature gradient to that observed in the West Columbia and Chocolate/Halls Bayou fields, but with log 10 ( Ca 2+ a / Mg 2+ a ) values shifted ~ 0.1 log unit higher. The Edwards group primarily consists of dolomitized limestones and Hyeong and Capuano (2001) interpret that a more ordered dolomite phase, than that found in the West Columbia and Chocolate/Halls Bayou fields, is buffering log 10 ( Ca 2+ a / Mg 2+ a ) to high values. As such Hyeong and Capuano, (2001) state that their reference model (model A1) is not valid for equilibrium between calcite and more ordered dolomite phases.

Ground Truthing: Thermal gradients in the Mississippi Salt Dome Basin
The Mississippi Salt Dome (MSD) basin is mostly located in central Mississippi and is separated from the TGC basin to the south by the Wiggins, South Mississippi, and La Salle uplifts. Kharaka and others (1987) sample (n = 17) a range of clastic and carbonate Upper Cretaceous (Stanley Fm.) to Upper Jurassic (Norphlet Fm.) reservoirs (70 -120°C) in 7 MSD basin fields. The Kharaka and others (1987) dataset is much lower resolution than that of Hyeong and Capuano (2001) and only the Reedy (1920 -3485m; 68 -102°C; n = 5) and Soso (1965 -2875m; 68 -89°C; n = 3) fields sample a wide depth/temperature range.
Test Area B, which encompasses the 7 fields sampled by Kharaka and others (1987), spans the Alabama-Mississippi border and has an area of 13,224km 2 ( fig. 7c, bounding box W88°12' -W90°00'/N31°30' -N32°12'). For the 204 PGWD samples, from 35 different oil fields, contained within Test Area B the geothermal model yields a mean surface temperature of 13°C (range 12 -14°C) and a mean geothermal gradient of 27°C/km (range 22 -34°C/km) ( fig. 8b). Assuming a 13°C surface temperature, the mean geothermal gradient for the Kharaka and others (1987) dataset is calculated to be 26°C/km (25 -29°C/km) which is comparable to the mean gradient predicted by this study's geothermal model. Moreover, in both datasets samples from the Soso field are present but critically non-identical. Interestingly the most similar Soso samples from depths of 2031m in the PWGD and 1965m in Kharaka and others (1987) dataset have predicted (PWGD) and observed ( Kharaka and others, 1987) temperatures of 69°C and 68°C respectively.
At-formation-depth temperatures predicted for Test Area B have a greater range than those for Test Area A ( fig. 7a). This probably reflects both a wider sampling area and the substantial faulting and salt tectonics that characterize the MSD basin. In particular salt domes can dramatically increase the lateral heterogeneity of local geothermal gradients ( Daniilidis and Herber, 2017). In Test Area B several fields, including the well represented Raleigh and Soso fields, are proximal (< 5km) to salt domes whilst in Test Area A the majority of fields are ≥ 15km away from a salt dome.

25
For a given water analysis the presence of calcite-dolomite equilibrium has been interpreted using two separate standards. 1) Most commonly calcite-dolomite equilibrium is interpreted through evaluating saturation indices for both minerals. Kharaka and others (1987) interpreted all samples (n = 17) to be at (SOLMINEQ.87) calcite-dolomite equilibrium (-0.5 < SI < 0.5) and attribute this to relate to the substantial dolomitization of the limestone Smackover Fm. (Carpenter and Trout, 1978;Kharaka and others, 1987). 2) Calcite-dolomite equilibrium can also be interpreted if the log 10 ( Ca 2+ a / Mg 2+ a ) or [Ca 2+ ]/[Mg 2+ ]-temperature relationship is consistent with prior estimates ( Land and Prezbindowski, 1981;Dutton, 1987;Hyeong and Capuano, 2001; Vespasiano and others, 2014;Blasco and others, 2018). This method of interpretating the presence of calcite-dolomite equilibrium has typically only been applied whilst evaluating Ksp−dol and/or the order of the natural dolomite phase (see 'Solubility (groundwater)' supplementary table 2). Within the Kharaka and others (1987) fig. 9c and d). Test Area B samples are most different to the Test Area A samples, though interestingly the models for these datasets (models A4 and I1) and others discussed intersect around ~120°C suggesting differences are more pronounced at low temperatures.
Expanding upon Kharaka and others (1987) and Hyeong and Capuano (2001) this study interprets that the high log 10 ( Ca 2+ a / Mg 2+ a ) waters present within the Edwards group, Test Area B and most, but not all, of the Frio Fm. are consistent with calcite-dolomite equilibrium. The range of log 10 ( Ca 2+ a / Mg 2+ a ) values at a specific temperature that can be considered consistent with an interpretation of calcite-dolomite equilibrium is therefore wide, the width being further evaluated below, and attributed to local effects.

Distribution of samples in the PWGD
Geothermal model.--The highest spatial resolution of this study's geothermal model ( fig.  6a) is in hydrocarbon-producing basins as oil and gas wells are the primary data source for the SMUH dataset ( fig. 6b; Blackwell and others, 2011). Ground truthing for two such datarich areas (Test Areas A and B) established the reliability of the geothermal model. Blackwell and others (2011) produced two interpolated maps ( fig. 6c) for heat flow and temperature at two specific depths (6.5km and 9.5km). Comparing this study's interpolation with that of Blackwell and others (2011), we find them to be mostly similar, though note a small number of disparities. For example, in parts of northern Virginia, this study's geothermal model and the Blackwell and others (2011) model respectively resolve opposing high and low geothermal anomalies. However, disparities between the interpolated models are typically in data poor areas, such as Virginia, where interpolations are less reliable. In comparison, more marginal differences are observed in higher data density areas, such as in south Texas, where the Blackwell and others (2011) interpolation resolves geothermal features to a much finer spatial resolution than this study's geothermal model suggesting the lateral resolution could be increased further.
As discussed for the Miocene Fm. samples in Test Area A, linear geothermal gradients over-simplify vertical thermal heterogeneities. Significantly more processing, outside the scope of this study, would be needed to generate an interpolated 3D temperature model. However, as established for Test Areas A and B, we interpret that the geothermal model generates robust at-formation-depth temperatures for most samples. However low kurtosis values could be an artifact as small datasets are rarely consistent with a normal distribution, a necessary assumption for the kurtosis and skewness parametric tests (Rosner, 2015). Kurtosis reflects little about the distribution of samples within 1 σ from the mean and is a poor descriptor of the 'peakedness' of the data (Westfall, 2014). Leptokurtic distributions instead reflect the presence of significant numbers of outliers which is consistent with 10% (n=1,137/11,480) of PWGD samples having log 10 ( Ca  . 10i). Nevertheless support for the presence of systematic additional modes is weak and meaningful identification of additional modes is also challenged by the presence of repeat samples from a single depth/formation. Note that replicate measurements are well accounted for by mixed models where they strengthen group weightings.
In contrast to modal log 10 ( Ca  . 10c) for the limestone population are systematically higher than the dolomite population, particularly between 50 -100°C (which contains 44% of all limestone-hosted and 25% of all dolomite-hosted samples). This disparity is primarily due to the skewed distribution of the limestone population, which for 50 -100°C is categorised (Bulmer, 1979) as either moderately (skewness between -1 and -0.5 or +1 and +0.5) or heavily (skewness < -1 or > 1) skewed, with values up to +2.73 for the 60 -70°C temperature bin ( fig. 10e)

PWGD Mixed Modelling
Mixed modelling for the entire PWGD focuses on developing both a representative statistical model between log 10 ( Ca 2+ a / Mg 2+ a ) and temperature, and a Ksp−dol model. All models below utilize the SUPCRT92-filtered dataset (n=10,343) and a 3-term Maier-Kelly mixed model unless otherwise stated. To appropriately parameterize the mixed model of PWGD data the number of random effects utilized is incrementally increased. The extent to which a particular model over-or under-fits data is evaluated using goodness of fit measures so justifying modelling decisions. For example, we refine the classification scheme by which samples are categorized into different depth groups.
Grouping by depth.--We compare a series of mixed models of the PWGD dataset utilizing only the depth random effect (models J2 -J9) but with varying depth interval widths. Beginning with a model with no depth-dependent grouping of samples (no. of depth groups = 8201; AIC = -2754; model J2) increasing the depth interval width results in AIC values decreasing up to 300m (no. of depth groups = 3,296; AIC = -3197; model J7). Thereafter as the grouping interval increases so also do AIC values, to a maximum reported here of 500m (no. of groups=3,040; AIC=-3153; model J9). The minimum AIC value for the 300m grouping interval suggests this is an optimal balance between over-and underparameterization.
From a hydrogeological perspective a 300m grouping interval seems to be toward the upper end of a reasonable depth range. However, the vast majority of depth groups contain samples from a relatively narrow range of depths (typically <50m) which correspond to particular production zones. The primary purpose of the depth random effect is to incorporate the variance attributable to repeat sampling of similar formational fluids over a narrow range of depths. A secondary purpose is the potential to evaluate variations in log 10 ( Ca 2+ a / Mg 2+ a ) values at different depths in a field, by using group-level RI values. This may be particularly useful in fields where there is either a lack of detail about the formations present or the formations are thick and discretely stratified into isolated reservoirs with potentially different fluid compositions. In comparison to other models that utilize only a single random effect (models J10-J15) the 300m depth-only random effect model is clearly the best model.
We exemplify the relevance and applicability of the depth grouping with the example of the Seeligson field, Texas (n=51). Seeligson field samples come from 2 recognised formations, the Frio Fm. (n=20) and the Vicksburg Fm. (n=1), with the remainder (n=30) from one of 13 less informative production-related formational names such as '3300 Water Sand' (n=7) and 'A' (n=1). In contrast to grouping by one of 15 different formations, the 300m interval grouping generates 8 different groups which appear to reasonably collate samples from similar production horizons. The Frio Fm. samples are primarily split into two groups; depth groups 'AT4' (n = 5; depth of all samples identical at 1.44km) and 'AT5' (n = 13; sampled from depths between 1.79 -1.86 km) which respectively correspond to the Seeligson production zones '14' (channel-fill deposits) and '20' (floodplain deposits with carbonate nodules noted) ( Ambrose and others, 1992).
Below we construct the main reference mixed model of this study (model J21). In comparison to the global-level intercept (-1.91) the model J21 group-level RI value(s) for the Seeligson field is -1.81 and for the 'AT4' and 'AT5' depth groups are -1.58 and -1.89 respectively. As is commonly encountered by this study a lack of detailed mineralogical/lithological information for the Seeligson field makes it hard to interpret the causes of differences in RI values. Nevertheless and as an example, based solely on RI values we would interpret that the dolomite composition corresponding to the 'AT5' depth group is more calcian than that corresponding to the 'AT4' group.
Combining random effects.--In comparison to a linear model (AIC = -1455; model J1) of the PWGD data, all single random effect mixed models (models J10 -J15) have lower AIC values affirming both the use of mixed models and the inclusion of these specific attributes.
There is only limited support for the time-period-only (AIC = -1457; model J14) and time-series-only (AIC = -1501; model J15) models. The PWGD subset possesses time-series and time-period information for only 55 and 59% of samples respectively, contrasting with other attributes where completeness is either a criterion for inclusion (such as the sample lithology) or are near complete (e.g. n = 10,308/10,343 samples have formational information). This may be a significant contributing factor in the observed statistical weakness of the time-period and time-series models.
A general trend is observed that model AIC values decrease as the spatial resolution of the grouping attribute, and by extension the number of groups that describe the data, increases. In decreasing order of AIC values the single random effect models are ranked; time-period model (13 period  There are three exceptions to this 'more groups better model' trend. As aforementioned the depth-only model with no sample groupings (model J2) has a higher AIC value than other depth-only models that group samples by a depth range. Similarly model J2 has a higher AIC than the field-only model (model J10) and this highlights the overparameterization in model J2 and the benefit of using depth interval groupings. Lastly we speculate data completeness may explain why the lithology model (model J13) has a lower AIC than the time-period (model J14) and time-series (model J15) models.
Systematically combining multiple random effects, both model fit and the capability to reliably describe a wide variety of sample attributes are progressively improved. As determined by AIC the best fitting model, the reference model (AIC = -3595; model J21), utilizes all 7 random effects (depth, field, basin, formation, lithology, time-period and timeseries). AIC values are only marginally (< 5) higher for the 4-random effect (Depth-Field-Basin-Formation; -3591; model J18) and 5-random effect (Depth-Field-Basin-Formation-Lithology; -3590; model J19) mixed models indicating some support for these models. The inclusion of lithology and time-period/time-series random effects have a minimal effect on both the global/attribute/group-level results, but the inclusion of these random effects is clearly not penalized by AIC for overparameterization. As such the 7-random effect model (model J21) is selected as the representative model. Global-level model results are relatively similar for all models; the linear model (model J1) and the reference model (model J21) have pK sp°−dol values of -17.24±0.02 and -17.27±0.35 respectively. The width of the confidence intervals typically reduces as mixed models incorporate increasing numbers of random effects; the maximum uncertainty is shared by the lithology and time-period single random effect models (±0.45; models J13 and J14). Note again the narrow uncertainty for the linear model (±0.02) which reflects the erroneous underestimation of uncertainty for clustered data (Gelman and Hill, 2006). The ICC value for the reference model J21 is 46.6% (good to fair reproducibility for grouped data - Rosner, 2015), which indicates that grouping is indeed significant for PWGD data and a mixed modelling approach is appropriate. We evaluate other suggested mechanisms, aside from calcite-dolomite equilibrium, that could buffer fluids to the observed log 10 ( Ca 2+ a / Mg 2+ a ) -temperature profile. Firstly, we critique the interpretations of Möller and De Lucia (2020)  Comparison to the Möller and De Lucia (2020) meta-analysis.--The largest previous meta-/regression analysis of groundwater log 10 ( Ca 2+ a / Mg 2+ a ) values is that of Möller and De Lucia (2020) who consider 242 samples from 17 regions around the world. They similarly find these fluids have log 10 ( Ca 2+ a / Mg 2+ a ) values that are intermediate with respect to the SUPCRT92-filter. However, for the majority of data subsets they observe log 10 ( Ca 2+ a / Mg 2+ a )-temperature profiles to be non-parallel to the SUPCRT92 calcite-dolomite equilibria suggesting this indicates buffering by Mg-calcite surface phases. Möller and De Lucia (2020) also recognise that a minority of samples have log 10 ( Ca 2+ a / Mg 2+ a )-temperature profiles parallel to the SUPCRT92 calcite-dolomite equilibria. These include datasets from the Red Sea, Meizar Wells/Yarmouk Gorge (Israel/Jordan), and the MSD basin. Nevertheless the presence of calcite-dolomite equilibrium is dis coun ted, w ith an alternative interpretation for these sites invoking unspecified differences in environmental conditions and chemical reactions. The MSD basin dataset is from Kharaka and others (1987) and our expanded analysis of Test Area B samples arguably strengthens an interpretation of calcite-dolomite equilibrium.
The Meizar Wells/Yarmouk gorge dataset is from Siebert and others (2014) and Siebert and others (in prep). The dominant lithology in the Yarmouk gorge is limestone and, though Siebert and others (2014) mention that limestones are altered and interpret calcite-dolomite equilibrium for springs at the Shamir Artesian location, diagenetic information for the local carbonate aquifers is limited. Most wells in the Siebert and others (2014) database appear to sample the Turonian-Cenomanian Ajloun group, which further south is described as primarily being composed of fine-grained dolomitized shallow/restricted marine mudstone/wackestones ( Khalifa and Abed, 2010). Locally, dolomitization is identified (well Mukheibeh 4) in the upper part of the Ajloun group (Inbar and others, 2019). Evaluating the Meizar Wells/Yarmouk gorge dataset the log 10 ( Ca 2+ a / Mg 2+ a )-temperature profile does appear to be consistent with calcitedolomite equilibrium ( fig. 9f). However, samples with temperatures < 25°C have scattered log 10 ( Ca 2+ a / Mg 2+ a ) values. For simple 2-term (Maier-Kelly) linear models including samples < 25°C the R 2 is 0.05 and the p-value is 0.08 (model L1; n = 42) whilst removing six < 25°C samples results in an improvement in goodness of fit measurements with the R 2 rising to 0.38 and the p-value falling to 4.23 × 10 −5 (model M1; n = 36). Unlike model L1, model M1 accounts for much of the variation in the dataset (high R 2 ) and is significant (low p-value).
Generally there is a divide in the Möller and De Lucia (2020) dataset whereby regional subsets with sample temperatures > 30°C have log 10 ( Ca 2+ a / Mg 2+ a )-temperature trends largely parallel to the SUPCRT92 calcite-dolomite equilibria, whilst those with temperatures < 30°C are non-parallel. Only 22% of the samples (n = 54) in the Möller and De Lucia (2020) dataset have recorded depths, with the majority derived from surface springs. This is reflected in a mean sample temperature of 32°C (σ = 24°C) which is substantially lower than the 62°C (σ = 29°C) mean of PWGD at-formation-depth temperatures.
In contrast to this study, Möller and De Lucia ( 2020) utilise the recorded sample temperature and do not thoroughly evaluate the temperature at-formation-depth. For example, for the North East German basin Möller and De Lucia ( 2020) Tesmer and others (2007). However we find a significant number of temperatures from Tesmer and others (2007) to be anomalous when compared against a representative geothermal model comprised of a linear geothermal gradient of 32°C/km ( Agemar and other, 2012) and an average surface temperature of 4°C (supplementary figure 2). An exemplar anomaly is from the Rheisberg well where the log 10 ( Ca 2+ a / Mg 2+ a ) is determined at 16.6°C for a sample taken from 1600 m which seems improbably low. Inaccurate equilibrium temperatures have a minimal effect on the calculated log 10 ( Ca 2+ a / Mg 2+ a ) values, as these are primarily dependent on the solution composition, but are extremely consequential when contextualising log 10 ( Ca 2+ a / Mg 2+ a ) values to interpret calcitedolomite equilibrium.
In contrast, sample temperatures from Hyeong and Capuano (2001) and this study's geothermal model appear to be reliable. Specifically Hyeong and Capuano (2001) measure temperatures at the well perforation during the formation shut in test or, lacking that data, derive temperatures from corrected well log measurements. All borehole data is interpreted to have a good control on the sample depth, which is by extension assumed to be the depth at which equilibrium is established.
The samples in the Möller and De Lucia ( 2020) meta-analysis are derived from both surface springs and boreholes which have separate methods of estimating/observing atformation-depth temperatures. For spring water samples, where the equilibrium temperature is less well known, prior analyses have adopted a geothermometrical approach as exemplified by Chiodini and others (1995), Vespasiano and others (2014), and Blasco and others (2018) (supplementary table 2). These studies have evaluated the equilibrium temperature and dolomite order using both isotope data and compositional analysis with respect to multiple reference minerals (such as calcite, dolomite, anhydrite and quartz).
There is a strong similarity ( fig. 5b) between the reference model (model J21) log 10 ( Ca 2+ a / Mg 2+ a )-temperature profile and prior globally distributed groundwater (borehole and spring water) observations (Hsu, 1963;Hyeong and Capuano, 2001; Vespasiano and others, 2014; Blasco and others, 2018). This supports both the study methodology and a widespread applicability of the reference models J21 and J23. Notably the difference between model J21 and the spring water studies of Vespasiano and others (2014) and Blasco and others (2018) are remarkably small at 0.01 and 0.05 log unit respectively. Using model J21 as the reference geothermometer for the Vespasiano and others (2014) and Blasco and others (2018) observations we find the equilibrium temperatures to be 71±72°C and 101±72°C respectively; note the very wide uncertainties reflect the confidence intervals of model J21. For Blasco and others (2018) the new estimate agrees well with their equilibrium temperature estimate (93±14°C) derived using CO2 degassing reconstruction highlighting a potential for log 10 ( Ca 2+ a / Mg 2+ a ) values/model J21 to be used as a geothermometer.

Sources of Local Effects
We further evaluate the mechanisms that could control log 10 ( Ca 2+ a / Mg 2+ a )-temperature profiles both at a local and more global scales. This includes the presence of any common ion equilibria, kinetic effects, I, the solution composition, composition of the equilibrium calcite and dolomite phases, and conceivable sources of error.

A) Equilibrium with additional minerals.--
The addition of any other minerals bearing Ca, Mg or CO3 could conceivably generate substantial changes to the solubility of either calcite or dolomite due to the common-ion effect. For the common ion effect to be of significant concern the additional mineral must have a solubility somewhat comparable to either calcite or dolomite.
The illitization of smectite and albitization of feldspar, processes which release Mg and Ca respectively, have been variably discounted as significant in modifying log 10 ( Ca 2+ a / Mg 2+ a ) values by both Hyeong and Capuano (2001) and Kharaka and others (1987). Specifically, at diagenetic temperatures these processes are irreversible and, therefore, cannot buffer log 10 ( Ca 2+ a / Mg 2+ a ) values. The addition of Mg or Ca to the solution through illitization or albitization should therefore be buffered by calcite-dolomite equilibrium maintaining a constant log 10 ( Ca 2+ a / Mg 2+ a ). Furthermore, as neither illite, smectite, albite or anorthite are very soluble, there is a negligible common ion effect.
Chlorite group minerals such as clinochlore (Mg5Al2Si3O10(OH)8) are more soluble. However, Hyeong and Capuano (2001) conclude chlorite is unlikely to be a significant buffer in Frio Fm. sediments as, though chlorite is abundant, it is only found in samples with atformation-depth temperatures >90°C and there is no divergence in log 10 ( Ca 2+ a / Mg 2+ a ) values above and below 90°C. Nevertheless to evaluate the common ion effect we have simulated the solubility of various chlorite group members at 25, 50, 75 and 150°C using PHREEQC with the LLNL and the Thermoddem ( Blanc and others, 2012) databases for 0.1 mol/kgw NaCl buffered solutions in equilibrium with both calcite and dolomite. Hereafter, unless otherwise stated, we model dolomite in PHREEQC using model J23. Clinochlore is present in both the Thermoddem and the LLNL database, but the latter contains two separate variants, clinochlore-14A and clinochlore-7A, which have identical formulae but slightly different solubility constants. The phase clinochlore-14A was determined to be broadly similar in solubility to chlorite from Flagstaff Hill, California termed 'Chlorite (CCa-2)' ( Zhang and others, 2015). Thermoddem also includes the chlorite mineral sudoite (Mg2Al4Si3O10(OH)8).
In solutions buffered solely by calcite and dolomite, all of the aforementioned chlorite phases are very insoluble. For the most soluble chlorite phase, sudoite, each kg of water dissolves 1.056 × 10 −4 moles of calcite, 1.062 × 10 −4 moles of dolomite, and just 2.095 × 10 −7 moles of sudoite at 25°C (at higher temperatures sudoite is also relatively insoluble compared to calcite and dolomite). In the absence of sudoite the amounts of calcite (1.051 × 10 −4 moles) and dolomite (1.065 × 10 −4 moles) that dissolve are similar and this explains the constant log 10 ( Ca 2+ a / Mg 2+ a ) of 0.31 log unit for both simulations with and without sudoite.
Ca-sulphates such as anhydrite (CaSO4) are common in the subsurface and more soluble than either calcite or dolomite and thus potentially could affect log 10 ( Ca 2+ a / Mg 2+ a ). Again, we evaluate equilibrium conditions at 25, 50, 75 and 150°C using PHREEQC with the Thermoddem database in a 0.1 mol/kgw NaCl buffered solution. At 25°C, for every kg of water 1.205 × 10 −2 moles of dolomite and 3.775 × 10 −2 moles of anhydrite dissolve. Meanwhile 2.407 × 10 −2 moles of calcite precipitate due to the common ion effect as anhydrite is more soluble. However log 10 ( Ca 2+ a / Mg 2+ a ) values remain at 34 0.31, which suggests that the common ion effect has no impact on the ability of calcitedolomite equilibrium to buffer log 10 ( Ca 2+ a / Mg 2+ a ). Cation exchange with mixed-layer illite/smectite could also buffer log 10 ( Ca 2+ a / Mg 2+ a ). Hyeong and Capuano (2001), though noting the presence of mixed-layer illite/smectite, dismiss this geochemical process modifying log 10 ( Ca 2+ a / Mg 2+ a ) in the Frio Fm.. However, Dutton (1987) and Engle and Blondes (2014) suggest that the variable release of Ca 2+ and Mg 2+ ions, as Na + is adsorbed, is a key mechanism modifying brines in the San Andres Fm. of the Permian basin and results in low [Na + ]/[Cl − ] values. Variation in the clay composition results in differences in the release Ca 2+ and Mg 2+ and could therefore buffer log 10 ( Ca 2+ a / Mg 2+ a ) values to a wide range. However whilst this process is challenging to model using thermodynamic codes the main argument against clay minerals buffering log 10 ( Ca 2+ a / Mg 2+ a ) values is similar to that given for illitization/albitization; there appears to be a unidirectionality in clay mineral ion exchange as the binding energies and ion sites for Na + , Mg 2+ and Ca 2+ are different ( Underwood and others, 2016). C) Ionic strength\solution composition.--We evaluate calcite-dolomite equilibrium for a range of I and in both monovalent (NaCl) and divalent (Na2SO4) background electrolytes ( fig. 11). An increase in the background I (either NaCl or Na2SO4) up to ~2.  (Ulfsbo and others, 2015).
We find support for the interpretation that log 10 ( Ca 2+ a / Mg 2+ a ) values at calcitedolomite equilibrium are largely insensitive to both I and solution composition. Baker and Kastner (1981), also referencing the experimental data of Rosenberg and Holland (1964) . 11). Noting identical I values, when comparing brines in the Edwards group and the Smackover Fm. Hyeong and Capuano (2001) attributed substantially different log 10 ( Ca 2+ a / Mg 2+ a ) values to variations in dolomite order. Moreover, Hyeong and Capuano (2001) observe different TDS profiles for both the Chocolate/Halls Bayou (from 33g/L to 132g/L) and West Columbia (from 61g/L to 99g/L) fields yet note similar log 10 ( Ca 2+ a / Mg 2+ a )-temperature profiles suggesting no influence from I.
We modify the reference model J21 to additionally include I as a fixed effect such that the Maier-Kelly formula of (only) the fixed effect terms is; ) at calcitedolomite equilibrium and therefore model J22 is not a reference model. Baker and Kastner (1981) note slower rates of dolomitization at lower I therefore the slight decrease in log 10 ( Ca 2+ a / Mg 2+ a ) with increasing I may reflect a kinetic effect inhibiting attainment of equilibrium. Alternatively Kaczmarek and Sibley (2011) observe variations in the dolomite ordering/composition depending on the initial solution composition suggesting the dolomite phase in equilibrium with the solution reflects the I. D) Error.--There is a component of random measurement error which is assumed to be normally distributed and incorporated into a formulation of the mixed model by the error term, ε, and thus is of little further interest. Systematic errors are of more concern and there are several potential sources.
Errors in the geothermal model for calculating at-formation-depth temperatures could increase log 10 ( Ca 2+ a / Mg 2+ a ) scatter. However extensive ground truthing benchmarked this study's geothermal model at a local scale ( fig. 8) and as the majority of PWGD and SMUH samples come from similar areas (producing basins, fig. 6) at the continental scale we interpret the calculated at-formation-depth temperatures for the majority of PWGD samples to be reliable.
There is no weighting in the model for samples taken earlier in the production history which may reflect compositions closer to the unperturbed groundwater composition. The injection of water to stimulate hydrocarbon production (waterflooding) from sources other than recycled water from the formation of interest -such as oceans, rivers or separate aquifers -may result in a calcite-dolomite disequilibrium. Similarly injection of CO2 or other gases/liquids for reservoir stimulation may result in calcite-dolomite disequilibrium and effect the solubility of other minerals such as chlorite. Han and others (2010) document changes to the fluid composition and the saturation indices of calcite (SI = 0.80 → 0.62) and dolomite (SI = 2.55 → 2.44) as production shifts from the waterflooding to the CO2 injection production phases in the Cisco and Canyon formations of the Scurry Area Canyon Reef Operations Committee (SACROC) Unit in the Midland basin, Texas. It is unclear how much variation in log 10 ( Ca 2+ a / Mg 2+ a ) could be attributed to production influences, though Engle and Blondes (2014) offer a solution that could potentially be used to filter samples through the use of principal component analysis.
Another potential error source relates to the evaluation of activities by PHREEQC-Pitzer. We suggest that at least between 43 -150°C PHREEQC-Pitzer produces reliable log 10 ( Ca  . 5b). In comparison to the value of pK sp°-cal , the pK sp°-cal(4%MgCO 3 ) for a LMC phase consisting of 4% MgCO3 (upper bound of LMC) that results from ideal mixing between calcite and magnesite (eq 17; table 2) is +0.018 log unit higher. A +0.018 log unit increase in the value of pK sp°-cal(4%MgCO 3 ) results in an increase of +0.036 log unit in the equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) value (eq 8) and by extension an identical increase in any estimated value for pKsp°−dol. For reference, 0.036 log unit is an order of magnitude smaller than the uncertainty associated with the reference model J21 (pKsp°−dol = -17.27±0.35). This suggests that slight variations in the Mg-composition of the equilibrium calcite phase are unlikely to significantly inhibit the determination of Ksp−dol.
A more realistic non-ideal Mg-calcite model would result in increases in both the estimated solubility of the Mg-calcite phase and, by extension, the equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) value. The determination of the thermodynamic properties of nonideal Mg-calcite across a wide range of temperatures is desirable (though complex). However, at 73°C Ksp°−cal ≡ Ksp°−mag≡ K sp°-cal(4%MgCO 3 ) and non-ideal behaviour should approach a minimum ( fig. 5b). Moreover, at temperatures 50°C ≤ T ≤ 100°C endmember solubilities are similar and at low (T < 73°C) and high (T > 73°C) temperatures, for a pure calcite end-member assumption, the calculated pKsp°−dol values would be slight under-and over-estimates respectively. As the 25 th and 75 th percentiles of the at-formation-depth temperatures are 40°C and 79°C respectively (supplementary table. 5) it seems reasonable to determine Ksp°−dol assuming a pure calcite end-member.
Were Ca substitution into dolomite ideal, for a 2% CaCO3 dolomite the reference model (model J23) pK sp°−dol would change from -17.27 to -17.10 which translates to a substantial identical change (-0.17) (Sperber and others, 1984) with log 10 ( Ca 2+ a / Mg 2+ a ) values.

Mixed Model Global Level Analysis
The reference model (model J21) log 10 ( Ca 2+ a / Mg 2+ a )-temperature profile appears broadly consistent with prior Ksp−dol models ( fig. 5b). Here we contextualise our results relative to prior studies and discuss derived thermodynamic parameters. In this section we also consider the limitations of mixed models and uncertainty estimates.
Literature Comparison.--The study of Bénézeth and others (2018) represents the most recent experimental study that utilizes the widest experimental range of temperatures (53 -253°C) and approaches Ksp−dol from conditions of both under-and super-saturation. At reference state conditions (25°C) the pKsp°−dol value of -17.27±0.35 from the reference model (model J23) is comparable to the -17.19±0.3 of Bénézeth and others (2018). However, at temperatures > 50°C the models diverge ( fig. 5a) until at 200ºC the Bénézeth and others (2018) model has a pKsp−dol of -24.02 which is much lower than both the reference model and also the ordered dolomite phase of Helgeson and others (1978), which have pKsp−dol values of -23. 26 and -23.71 respectively at 200ºC. As the divergence becomes more significant at higher temperatures where dolomite solubility is lower (though achieved quicker) we suggest the accurate measurement of these equilibrium solutions with lower [Ca 2+ ], [Mg 2+ ] and [CO3 2-] values maybe more prone to error. Moreover, Bénézeth and others (2018) do not leverage the benefits in terms of accurately calculating activities in solution through the use of a thermodynamic program. Möller and De Lucia (2020) re-evaluate the Bénézeth and others (2018) dataset using PHREEQC and determine the pKsp−dol at 200ºC to be either -24.7 (llnl.dat) or -25.5 (pitzer.dat) which, they argue, reflects buffering by Mg-surface phases. In summary the calcitedolomite equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) values generated by the Bénézeth and others (2018) model appear to be spurious in comparison to other profiles ( fig. 5a) and a poor fit for the PWGD log 10 ( Ca 2+ a / Mg 2+ a ) distribution. Furthermore the plausible re-evaluation by Möller and De Lucia (2020) results in an even more insoluble Ksp−dol model suggesting additional reliability issues.
As previously discussed, the consistency between the various global groundwater observations/Ksp−dol models including this study suggests a commonality of calcite-dolomite equilibrium in groundwaters. By extension we suggest this increases the reliability of Ksp−dol models generated using the groundwater approach.
Partial pooling issues.--The partial-pooling approach which we have advocated for as an optimal compromise between the complete-and no-pooling extremes has an inherent flaw. Mixed modelling weights groups based on sample sizes, with greater confidence in grouplevel effects for groups with larger numbers of samples. This has two significant effects which we exemplify using Test Area A and the Hastings field (a representative field within Test Area A); 1) The population average (global-level intercept and gradient) is heavily influenced by larger groups. As increasing amounts of high log 10 ( Ca 2+ a / Mg 2+ a ) data is added to a 38 mixed model the global-level log 10 ( Ca 2+ a / Mg 2+ a ) intercept increases. This is most notable when comparing the F1 (n = 21) and G1 (n = 117) models ( fig. 9d) ) value of 0.48 log unit at 25°C which is significantly higher than model F1.
2) A more significant issue is that RI values for small groups that are significantly different to the rest of the population will tend closer towards the global-level intercept as the population size increases. Samples from the Hastings field (n = 4; T = 73 -76°C) are all from the Frio Fm. and have log 10 ( Ca 2+ a / Mg 2+ a ) values consistent with other fields in Test Area A (fig. 9d). The Test Area A model (model F1), Frio Fm. model (model G1), and the reference model (model J21) contain progressively larger number of samples in addition to those from the Hastings field/Test Area A. The group-level RI values for the Hastings field increase as the overall model population size increases with RI values of 0.00, 0.21, and 0.23 for F1, G1 and J21 models respectively ( fig. 9d). The reference model (model J21) RI value for the Hastings field is one of the lowest values for all fields in the PWGD (RI = -1.98; 2149 th lowest RI value out of 2167 fields) though the difference to the global-intercept is only 0.07. Ranked RI values are interpreted to be useful as a comparative tool, highlighting potentially significant variations away from the population average, though we urge caution using RI values to construct models for individual groups as the significance of the variation can be minimised. Future work could implement a Bayesian framework that better preserves information for small sample size groups. The lowest experimental temperature used by Bénézeth and others (2018) is 53°C, yet the uncertainty for pKsp°−dol at 25°C is reported as ±0.3 without any apparent evaluation of the uncertainty associated with extrapolation outside of the experimental range. We model the Bénézeth and others (2018) dataset, specifically the activities given in Bénézeth and others (2018) and not those later determined by Möller and De Lucia ( 2020), using a 3-term linear model (N1) with uncertainty evaluated using 2σ/95% confidence intervals (identical to reference model J21). At 25°C the pKsp°−dol for model N1 is -17.18±0.53 whilst at 100°C, and generally for values within the experimental range, the pKsp−dol is -19. 50±0.15 (fig. 5a). Clearly the extrapolated uncertainty to reference state conditions is significantly greater than that given by Bénézeth and others (2018) and were the uncertainty evaluated at the 3σ/99.7% confidence interval level the uncertainty estimate would be even larger. We suggest that the pKsp°−dol determined here using groundwater data can broadly be considered to be at least no less uncertain than that derived by prior experimental data.
A single inaccurate thermodynamic property, in particular the ,r°, does not render other parameters to be in error. Following the method of Bénézeth and others (2018) ,r° can be fixed to a constant value and the model refitted; for a ,r° value of 157.51 J mol −1 K −1 ( Robie and Hemingway, 1995) we set the b coefficient to a value of -0.06919. This refitted model (model J24) determines the a and c coefficients to be 1.93×10 1 and -4.75×10 3 respectively; these values equate to minor shifts in the values of Δ° and Δ° to −2161.4 ± 0.66 kJ mol −1 and −2332.67 ± 0.34 respectively. Moreover between 25 -150°C there is a minimal difference ( fig. 5a) between the reference model J23 and model J24, with the greatest divergence primarily occurring at temperatures > 230°C where PWGD data is limited ( fig. 9a). The minor difference between these models is because ,°, unlike Δ° and Δ°, is extremely sensitive to the log 10 ( Ca 2+ a / Mg 2+ a ) -temperature gradient. The use of single field data, some of which show steep increases of log 10 ( Ca 2+ a / Mg 2+ a ) with temperature, may be able to more accurately evaluate ,° though the inclusion criteria requires careful justification to avoid a selection bias.

CONCLUSIONS
This regression analysis of the PWGD reconstructs at-formation-depth temperatures for each sample (n=11,480) using a new geothermal model that combines subsurface geothermal gradients and mean annual land surface temperature measurements. After screening to ensure quality, the log 10 ( Ca 2+ a / Mg 2+ a ) of each sample was evaluated using PHREEQC with the Pitzer database.
• Ground truthing at-formation-depth temperatures and log 10 ( Ca 2+ a / Mg 2+ a ) values to areas of the Texas Gulf Coast basin (Test Area A) and Mississippi Salt Dome basin (Test Area B) we find both are consistent with prior studies suggesting both the new geothermal model and PWGD dataset are reliable data sources.
• The log 10 ( Ca 2+ a / Mg 2+ a )-temperature relationship of the majority (90%) of subsurface waters in the US are interpreted to be indicative of calcite-dolomite equilibrium, suggesting this is likely also true globally.
• Equilibria with bulk calcite and dolomite mineral compositions, rather than Mgcalcite surface phases or other conceivable mechanisms (e.g. albitization), is thought to be the primary control on log 10 ( Ca  Supplementary table 7 is an excel workbook comprised of the group RI results for the reference models J21 and J23. A minimum working example of the workflow (amongst other archival results) can be found at the following GitHub repository: https://github.com/hammytheham/AJS_Robertson_2022 . The new geothermal model, which has been modified to generate local data, can be found at www.co2-gasp.com .   Gregg and others (2015). Fig. 2 Temperature dependency of the dolomite order parameter (s) as defined in equation (12) for a dolomite with a critical temperature (Tc) of 1473K equating to a synthetic, ideal dolomite ( Goldsmith and Heard, 1961). There are four solutions (Chaikin et al., 1995); (1) s = 0 at and above Tc, (2) s = ±√ 3(T c −T) T for the asymptotic approach to Tc, (3) s = 1 at T = 0K, (4) s = tanh(T c /T) for the asymptotic approach to T = 0K. For the Bragg-Williams model well-ordered dolomite (s ≥ 0.96) is the most stable state at conditions ≤ 500°C. Dashed line for Helgeson and others (1978) represents their estimation of naturally ordered (s = 0.7) dolomite. Gregg and others (1992) determine 0.7 < s < 0.9 for Holocene dolomites. Dashed line for Hyeong and Capuano (2001) represents their estimate for a partially ordered (s = 0.4) dolomite. Modified from Helgeson and others (1978) .   Fig. 3 Relative frequencies of dolomite stoichiometry for; a) Pliocene Bonaire dolomite (n=72; samples with > 90% dolomite) ( Laya and others, 2021), b) Phanerozoic dolomites from North America (n=55) ( Sperber and others, 1984), c) Global dolomites based on samples (n=654) from a wide variety of localities and ages compiled by Sperber and others, (1984 Fig. 4 a) The re-evaluation by Thorstenson and Plummer (1977) of data from Plummer and Mackenzie (1974) was notably critiqued by Gresens (1981a) for producing too great a range in equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) values. The inclusion of this data by Möller and De Lucia (2020) led to their erroneous interpretation that the Bénézeth and others (2018) data is comparable to that of Möller (1973). Redrawn from Möller and De Lucia (2020). b) Solution [Ca 2+ ] and [Mg 2+ ] as a function of the square root of time during the dissolution of HMC algal calcite (Plummer and Mackenzie, 1974). [Ca 2+ ] and [Mg 2+ ] are non-linear in stage one suggesting congruent dissolution. In stage two both are linear suggesting release to the solution is controlled by diffusion through a product layer. In stage three [Mg 2+ ] is linear but obeys a parabolic rate law and the non-linear decrease in [Ca 2+ ] indicates a transition towards formation of a more Mg-rich incongruent phase than that which precipitates during stage two (~ pure calcite).  (Möller (1973) (and SUPCRT92 calcite and J23 dolomite). This study's reference model (J21) of log 10 ( Ca 2+ a / Mg 2+ a ) values between the SUPCRT92-filter is solid purple with the model (K1) of values not subject to the SUPCRT92-filter (n=11,480) in dashed green. The model (J24) utilizing a fixed β 1 coefficient of -0.06919 is in dashdot purple. Fig. 6 a) This study's geothermal gradient interpolation (0.1x0.1° Lat/Long resolution) using the SMUH dataset (Blackwell and others, 2011). b) Number of SMUH samples per US county (for counties with ≥ 1 sample). Large counties in historically prolific oil producing states have the highest number of temperature measurements, with a maximum n = 3,739 for Crockett county, Texas. c) The SMUH interpolation (Blackwell and others, 2011) for heat flow which utilizes the same data as this study's interpolation of geothermal gradients, which are most similar in areas with a high spatial resolution of temperature measurements. d) Number of filtered PWGD samples (n=11,480) per US county (for counties with ≥ 1 sample). Large counties in historically prolific oil producing states have the highest number of temperature measurements, with a maximum n = 519 for Fremont county, Wyoming. e) Mean annual land surface temperatures (Bechtel, 2015) interpolated at 0.1x0.1° Lat/Long resolution. All maps, aside from C, are WGS 84.  Beckman and Williamson (1990) and Thieling and Moody (1997). Colours represent this study's geothermal gradients ( fig. 6a). Open triangles are fields contained in the PWGD. Filled triangles are fields for which data has previously been published ( Kharaka and others, 1987;Hyeong and Capuano, 2001). Some fields located outside of the bounding boxes are included in the analysis after sample locations are rounded to the nearest 0.1° Lat/Long. b) Location of fields in Test Area A (W94°48' -W96°00'/N28°48' -N29°30'). c) Location of fields in Test Area B (W88°12' -W90°00'/N31°30' -N32°12'). Unfilled squares (Soso and Raleigh fields) represent fields included in both the literature and PWGD datasets. All maps are WGS 84. Fig. 8 a) Comparison of temperatures measured at-formation-depth observed by Hyeong and Capuano, (2001) and those predicted for PWGD samples in Test Area A by this study's methodology. A geothermal gradient representing the average geothermal gradient (SMUH datamean 33°C/km; range 29-35°C/km shaded grey) and the average surface temperature (MAST data -mean 16°C; range 15-16°C) calculated for the PWGD samples in Test Area A shows a reasonable fit to the measured temperatures. Samples from the shallower (651-921m) Miocene sediments have measured temperatures significantly above the upper range of geothermal gradients calculated for Test Area A, emphasizing the limitations of linear geothermal gradients. b) Comparison of temperatures at-formation-depth measured by Kharaka and others (1987) and those predicted for PWGD samples in Test Area B. A geothermal gradient representing the average geothermal gradient (SMUH datamean 27°C/km; range 22-34°C/km shaded grey) and the average surface temperature (MAST data -mean 13°C; range 12-14°C) calculated for the PWGD samples in Test Area B shows a reasonable fit to the measured temperatures. Fig. 9 All plots include the reference model J21 (95% confidence intervals shaded light grey), this study's mixed model A4 of the Hyeong and Capuano (2001) Land and Prezbindowski (1981), K (1987) - Kharaka and others (1987) Kharaka and others (1987) (n=16) (model H1), including the Reedy (n=5) and Soso (n=4) samples, and in general samples from Test Area B (n=204) (model I1) have higher log 10 ( Ca 2+ a / Mg 2+ a ) values than those present in Test Area A (model F1) being closer to the reference model J21. f) Yarmouk gorge dataset from Möller and De Lucia (2020) with models including all samples (model L1; n=42) and excluding samples with temperatures <25°C (model M1; n=36). Also shown (dotted box) is the area represented in figs a, b and c.  Hyeong and Capuano (2001) dataset calculated by Hyeong and Capuano (2001) (using SOLMINEQ88-Pitzer) and by this study (using PHREEQC-Pitzer). For clarity and, to demonstrate no temperature dependence, we present calculated log 10 ( Ca 2+ a / Mg 2+ a ) values with reference to both the sample ID and temperature.  Plummer and Busenberg (1982), Shock and Helgeson (1988), SUPCRT92- Johnson and others, (1992), Shock and others (1997 Hyeong and Capuano (2001). M&D - Möller and DeLucia (2020). Type: L -linear model; Mmixed model (random intercept); M 1mixed model (random slope). Random effects: Fifield; Foformation; Ddepth; Bbasin; Llithology; Ptime-period; Stime-series. MK: Number  Hsu (1963) and that recalculated in this study using SUPCRT92 ( Johnson and others, 1992). The use of SUPCRT92 Ksp°−cal increases the calculated Ksp°−dol by 0.38 log units compared to that calculated with the Garrells and Drever (1952) constant. This shifts the Ksp°−dol value from -16.69 (Hsu, 1963) to -17.07.
¶The data for Kramer (1959) reported by Sherman and Barak (2000) and Bénézeth and others (2018) this study believes is in error, and instead Ksp°−dol=1.5×10 −17 is reported from the original source. †Baker and Kastner (1981) and Morrow and others (1994) do not regress the high temperature experimental data to reference state conditions and instead report experimental ranges; a single average value determined by this study is used to represent the pKsp°−dol. ‡The value of Ksp°−dol= 2.89 × 10 −17 reported by Sherman and Barak (2000) appears to be a transcription error. Barnes and Back (1964) present a range (Ksp°−dol = 2-3 × 10 −17 ) over which they interpret Ksp°−dol. The value Ksp°−dol=2.87 × 10 −17 represents the maximum ion activity product for dolomite as reported by Barnes and Back (1964) and corresponds to the pKsp°−dol reported by Bénézeth and others (2018). §There is likely a transcription error on the part of either Sherman and Barak (2000) (Δ 298.15°= −2151.9 kJ mol −1 ) or Bénézeth and others (2018) (Δ 298.15°= −2121.9 kJ mol −1 ). ◆ The pKsp°−dol from Sherman and Barak (2000) uses the original Helgeson and others (1978) thermodynamic properties not reported here. ¢ Reported uncertainties associated with the thermodynamic properties are derived using the standard error otherwise uncertainties associated pKsp°−dol values are computed using 95% confidence intervals. $ There are two distinct (but related) methods of estimating ordering parameter; a) the standard Helgeson and others (1978) method as used by Hyeong and Capuano (2001) and easily relatable to crystallographic measurements and b) the Anderson and Crerar (1993) % of ordered dolomite method as used by Vespasiano and others (2014) and Blasco and others (2018). However the Anderson and Crerar (1993) method, though initially easier to calculate is not easily converted to s values (and is not attempted here as this study discounts the influence of natural dolomite order and favoring dolomite stoichiometry). Equilibrium log 10 ( Ca 2+ a / Mg 2+ a ) values are higher for both Vespasiano and others (2014) and Blasco and others (2018) compared to Hyeong and Capuano (2001) which classically suggests the presence of a 'more ordered' or, as this study interprets, a more stoichiometric dolomite phase. ¥ -Most databases, such as slop07, have the same properties for the ordered and natural (i.e. just 'Dolomite') phases suggesting the natural (s=0.7) phase has fallen out of usage though Blanc and others (2012) preserved a natural dolomite phase which we presume is related to the Helgeson (1978) phase.
Supplementary Table 3. Geochemical and geospatial/other rejection criteria used to filter the PWGD and the rejection criteria used by Hitchon and Brulotte (1994) and Blondes and others (2016). Of the entire database (n=165,960) 93% of all samples failed to meet the criteria specified in this study, leaving a final population of n=11,840. N/A is not applicable; this criteria is not used to discriminate against a sample's inclusion in the dataset.  Table 5: Summary statistics describing the PWGD dataset (n=11,480), including the mean value, 1 standard deviation (std), minimum (min) and maximum (max), together with percentiles P25, P50 and P75. Variables denoted by (1) are from the PWGD ( Blondes and others, 2016). Geothermal gradients (2) are derived from the interpolation of SMUH dataset ( Blackwell and others, 2011). Mean annual land surface temperatures at well sites (3) derive from a reprocessing of the MAST dataset for North America (Bechtel, 2015). Temperature at formation-depth (4) is calculated using the depth, geothermal gradient and the surface temperature. Because of the minimal amount of available pressure data and generally negligible effect on log 10 ( Ca 2+ a / Mg 2+ a ), this data is excluded from the PHREEQC analysis. Activities and ionic strengths of fluids (5) Table. 6. For each coefficient estimate (e.g. 'a'); (SE) represents the standard error associated with the estimate. For each coefficient the 't/p' represents; 't'the t-test associated with the significance of the coefficient estimate and 'p'the p-value for that t-test. # The intraclass correlation coefficient (ICC) reports the adjusted and conditional ICC. Both ICC values account for all sources of uncertainty but the conditional ICC differs from the adjusted ICC by incorporating the variance associated with the fixed effects; as is common practice we report only the adjusted value. At small sample sizes, using mixed models suffering from singularity in regression analysis, ICC sometimes fails to report. *Reported R 2 values are the marginal (m) and conditional (c) R 2 values for mixed models and the adjusted (a) R 2 for linear models. † The lower (l) and upper (u) confidence intervals (CI) are accompanied by a calculation of the difference between them (dif).^For each random effect usually only the log 10 ( Ca 2+ a / Mg 2+ a ) intercept 'I.' is reported; these are termed random intercept models and the most common type of model used by this study. For random slope models both the intercept 'I. ' and also the gradient for the increase in log 10 ( Ca 2+ a / Mg 2+ a ) with temperature '(G.)' are reported. If a random effect is implemented it is reported; in some cases, e.g. Field for M2, a random effect has zero effect but is still reported as 0.00. For clarity and convenience the total number of random effects '(RE)' implemented is reported in the bracke ts for the Model Type. Unless otherwise stated models use a 2,3 or 4 term formulation of the Maier-Kelly regression formulae eq (25) with the total number of terms used reflected by the number of coefficients (a, b, c and d) reported. A1. Original Hyeong and Capuano (2001) model for Hyeong and Capuano (2001) dataset. Activities calculated by SOLMNEQ88-Pitzer and fit to eq (28) (note equation is in celsius-see text). A2. Linear model of the Hyeong and Capuano (2001) dataset with activities calculated using PHREEQC-Pitzer. A3. Three term (fixed effect) mixed model of Hyeong and Capuano (2001) dataset. The model incorporates the field and formation attributes as random effects. This three-term model produces a clearly spurious fit ( fig. 5a). A4. Two term (fixed effect) mixed model of Hyeong and Capuano (2001) dataset. A5. A random slope (all others are random intercept) mixed model of the Hyeong and Capuano (2001)  -3.76 (4.87×10 −1 ) -3.64 (5.58×10 −1 ) -3.62 (5.97×10 −1 ) -3.65 (6.34×10 −1 ) -3.58 (6.48×10 −1 ) t/p -7.71, 1.37×10 −14 -6.53, 6.93×10 −11 -6.07, 1.38×10 −9 -5.76, 9.14×10 −9 -5.52, 3.52×10 −8 b (SE) 8.90×10 −3 (7.20×10 −4 ) 8.72×10 −3 (8.26×10 −4 ) 8.61×10 −3 (8.85×10 −4 ) 8.63×10 −3 (9.38×10 −4 ) 8.52×10 −3 (9.58×10 −4 ) t/p -3589.7 J15. Model of SUPCRT92-Filtered PWGD dataset. Only the time-series random effect is used. J16. Model of SUPCRT92-Filtered PWGD dataset. The depth (300m interval) and the field random effects are used. J17. Model of SUPCRT92-Filtered PWGD dataset. The depth (300m interval), field, and formation random effects are used. J18. Model of SUPCRT92-Filtered PWGD dataset. The depth (300m interval), field, formation, and basin random effects are used. J19. Model of SUPCRT92-Filtered PWGD dataset. The depth (300m interval), field, formation, basin and lithology random effects are used. -----K1. Model of PWGD database not filtered using the SUPCRT92-Filter. The model (and K2) are unable to reliably calculate the time series random effect term so it is omitted. K2. Model of pKsp°−dol values for the PWGD database with values not filtered using the SUPCRT92-Filter. L1. Model of Yarmouk gorge samples from Möller and De Lucia ( 2020) dataset; samples are taken from Siebert and others (2014) and Siebert and others (in prep). The p-value of this model is 7.73×10 −2 (identical also to p value on the b-term). We do not report model p-values as they are only output for linear models. The p-values given(reported t/p) are those that describe the significance of model coefficients. M1. Model of Yarmouk gorge samples from Möller and De Lucia ( 2020) without samples <25°C. The pvalue for this model is 4.23×10 −5 (again identical to the to p value on the b-term). N1. Reanalysis of the Bénézeth and others (2018) dataset using activities calculated by Bénézeth and others (2018).