Statistics and segmentation: Using Big Data to assess Cascades Arc compositional variability

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Bradley William Pitcher, Adam J Kent


Primitive lavas erupted in the Cascades arc of western North America display significant along-arc heterogeneity. This compositional diversity may reflect differences in mantle melting processes, subduction geometry, regional tectonics, or the compositions of the slab, mantle, or overlying lithosphere. Previous authors partitioned the arc into four geochemically distinct segments to assess the relative importance of these potential causes (Schmidt et al., 2008). However, despite the immense amount of data available from the Cascades arc, no previous study has applied a statistical approach to a comprehensive dataset to address this fundamental petrologic question. To better characterize the heterogeneity of the entire arc, we compiled >250,000 isotopic, major element, and trace element analyses (glass and whole rock) from nearly 13,000 samples. To minimize inherent sampling bias (the effect whereby well-studied volcanoes disproportionately weight conclusions), we use a weighted bootstrap Monte Carlo approach in which the probability of a sample being selected for the posterior distribution is inversely proportional to the number of samples within its 0.25° latitude bin. This method produces a more uniform, unbiased distribution from which we can assess regional, rather than local, compositional variability in the Cascades arc. Using a multivariate statistical approach, we demonstrate that the four segments designated by Schmidt et al. (2008) are, in fact, statistically distinct. However, using a modified hierarchical clustering method, we objectively divide the arc into six regions whose geochemical differences are up to 6.3 times more statistically significant than those of the previous scheme. Our new, more robust segmentation scheme comprises the Garibaldi (49.75–51°N), Baker (48.5–49.75°N), Glacier Peak (47.75–48.5°N), Washington (45.75–47.75°N), Graben (44.25–45.75°N), and South (41.25–44.25°N) Segments.
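The inverse-bin-count weighting can be illustrated with a minimal Python sketch. It assumes samples are binned by latitude alone, using fixed 0.25° bins anchored at 0°, and that each sample i receives weight w_i = 1 / n_b, where n_b is the number of samples in its bin. The function names, the sample size per resample, and the toy values are hypothetical illustrations, not the authors' code or data.

```python
import math
import random
from collections import Counter
from statistics import mean

BIN_WIDTH_DEG = 0.25  # latitude bin width stated in the abstract


def latitude_bin(lat, width=BIN_WIDTH_DEG):
    """Return the index of the fixed-width latitude bin containing `lat` (assumed binning)."""
    return math.floor(lat / width)


def inverse_bin_weights(latitudes, width=BIN_WIDTH_DEG):
    """Weight each sample by 1 / (number of samples sharing its latitude bin).

    Every occupied bin then carries the same total weight (1.0), so densely
    sampled volcanoes no longer dominate the resampled distribution.
    """
    bins = [latitude_bin(lat, width) for lat in latitudes]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]


def weighted_bootstrap(values, latitudes, n_draws, seed=None):
    """Draw `n_draws` values with replacement, with selection probability
    proportional to each sample's inverse-bin-count weight."""
    rng = random.Random(seed)
    weights = inverse_bin_weights(latitudes)
    return rng.choices(values, weights=weights, k=n_draws)


def bootstrap_means(values, latitudes, n_iter, sample_size, seed=None):
    """Monte Carlo loop: repeat the weighted resample `n_iter` times and record each mean."""
    rng = random.Random(seed)
    weights = inverse_bin_weights(latitudes)
    return [mean(rng.choices(values, weights=weights, k=sample_size))
            for _ in range(n_iter)]


# Toy example with hypothetical numbers: two samples share one bin, one sits alone.
lats = [45.10, 45.20, 45.30]
vals = [1.0, 1.0, 3.0]
print(inverse_bin_weights(lats))  # [0.5, 0.5, 1.0]
print(len(bootstrap_means(vals, lats, n_iter=100, sample_size=50, seed=0)))  # 100
```

In the toy case each of the two occupied bins holds half of the total weight, so roughly half of all draws come from the lone sample at 45.30°N even though it is only one of three samples.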
By partitioning the arc into the most statistically distinct segments and calculating unbiased mean compositions for each, we explore the petrogenetic causes of regional-scale differences in primitive lava compositions. These bootstrapped means indicate significant inter-segment differences in fluid-flux signature, mantle fertility, and depth and degree of melting. We suggest that differences in subduction geometry, regional tectonics, and mantle heterogeneity are the primary causes of these intra-arc differences. This study demonstrates the value of rigorous statistics and big data in petrology.
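Assigning resampled data to the six segments and computing a mean per segment can be sketched as follows. The latitude limits are those listed in the abstract; treating each segment as a half-open interval [south, north), so that a sample on a shared boundary falls in the more northerly segment, is an assumption, as are the function names. In a Monte Carlo workflow one would call `segment_means` on each weighted resample and average the results across iterations.

```python
from collections import defaultdict
from statistics import mean

# Segment limits (degrees N) from the abstract. Half-open intervals
# [south, north) are an ASSUMPTION about how shared boundaries are assigned.
SEGMENTS = [
    ("South", 41.25, 44.25),
    ("Graben", 44.25, 45.75),
    ("Washington", 45.75, 47.75),
    ("Glacier Peak", 47.75, 48.5),
    ("Baker", 48.5, 49.75),
    ("Garibaldi", 49.75, 51.0),
]


def segment_of(lat):
    """Return the segment name for a latitude, or None if it lies outside 41.25-51°N."""
    for name, south, north in SEGMENTS:
        if south <= lat < north:
            return name
    return None


def segment_means(draws):
    """Mean value per segment for an iterable of (latitude, value) pairs,
    e.g. one resample from a weighted bootstrap. Samples outside all segments are skipped."""
    grouped = defaultdict(list)
    for lat, value in draws:
        name = segment_of(lat)
        if name is not None:
            grouped[name].append(value)
    return {name: mean(vals) for name, vals in grouped.items()}


# Toy (latitude, value) pairs, hypothetical and not real analyses.
print(segment_means([(46.0, 1.0), (47.0, 3.0), (42.0, 5.0)]))
# {'Washington': 2.0, 'South': 5.0}
```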



Subjects: Applied Mathematics, Applied Statistics, Earth Sciences, Geochemistry, Geology, Multivariate Analysis, Physical Sciences and Mathematics, Statistics and Probability, Volcanology


Keywords: geochemistry, geostatistics, Subduction zone, volcanology, trace elements, petrology, multivariate statistics, big data, Cascades, Cascades Arc, Igneous Petrology, Volcanic Arc


Published: 2018-09-24 21:58


License: Academic Free License (AFL) 3.0