Sedimentary structures discriminations with hyperspectral imaging on sediment cores

Hyperspectral imaging (HSI) is a non-destructive high-resolution sensor, which is currently under significant development to analyze geological areas with remote devices or natural samples in a laboratory. In both cases, the hyperspectral image provides several sedimentary structures that need to be separated to temporally and spatially describe the sample. Sediment sequences are composed of successive deposits (strata, homogenite, flood) that can be visible or not depending on sample properties. The classical methods to identify them are time-consuming, have a low spatial resolution (millimeter), and are generally based on a naked-eye counting. In this study, we propose to compare several supervised classification algorithms for the discrimination of sedimentological structures on lake sediments. Instantaneous events in lake sediments are generally linked to extreme geodynamical events (e.g., floods, earthquakes), their identification and counting are essential to understand long-term fluctuations and improve hazard assessments. This is done by reconstructing a chronicle of event layer occurrence, including estimation of deposit thicknesses. Here we applied two hyperspectral imaging sensors (Visible Near-Infrared VNIR, 60 μm, 400-1000 nm; Short Wave Infrared SWIR, 200 μm, 1000-2500 nm) on three sediment cores from different lake systems. We highlight that the SWIR sensor is the optimal one to create robust classification models with discriminant analyses. Indeed, the VNIR sensor is impacted by the surface reliefs and structures that are not in the learning set, which lead to miss-classification. These observations are also valid for the combined sensor (VNIR-SWIR). Several spatial and spectral pre-processing were also compared and allowed to highlight discriminant information specific to a sample and a sensor. These works show that the combined use of hyperspectral imaging and machine learning improves the characterization of sedimentary structures in laboratory conditions.

Natural archives, such as sediment cores are composed of a succession of deposits recording past climate and environment biological-physical-chemical variations. There are two main processes; continuous sedimentation that can be interrupted by event layers. The continuous sedimentation allows to create an age-depth model and infer the environment and climate conditions. The event layers, e.g., related to floods (Gaume et al., 2009;Glur et al., 2013), storms , landslides, earthquakes (Wilhelm et al., 2016), tsunamis (Chagué-Goff, 2010), eruptions, record among the most damaging disasters in terms of economic and societal losses. Currently, in an overwhelming majority of laboratory studies on natural archives, the sedimentary structures are first described visually, then numerous physical (X-ray imaging, computed tomography scan, grain-size) and chemical analyses are undertaken. From those results, the event layers are identified and described. Then, layers with the same characteristics (e.g., color and texture) are usually counted by naked-eye observation. This approach is time-consuming, characterized by a low spatial resolution and subject to high uncertainties due to human interpretations (Lotter and Lemcke, 1999). To overcome these limits, several semi-automatic methods were developed to discriminate these sedimentary deposits from RGB images. The main approaches studied the strata from annually laminated sediment to create an age model with their discretization, but they only use a line or a combination of segments, and then a deposit is characterized by the detection of the maxima (Meyer et al., 2006;Weber et al., 2010). Similarly, for the automatic detection of event layers, Vannière et al. (2013) proposed to use a 1/Red signal and threshold. Some studies also used discrimination methods based on labeled pixels to create classification maps, for example, an adaptive neuro-fuzzy inference system (Ebert and Trauth, 2015) or K-nearest neighbor (Ndiaye et al., 2012). Thereby, only the color signal has been investigated, while many other parameters are potentially useful to distinguish event layers, such as texture, grain-size, chemical composition (Fouinat et al., 2017;Gilli et al., 2013;Wilhelm et al., 2018). A more relevant approach would then consist to detect event layers considering all these parameters automatically.

This is a preprint version
Hyperspectral imaging (HSI) is a non-destructive high-resolution laboratory analysis, which allows a semi-automatic description of the natural deposits based on their physical-chemical characterization.
HSI can improve chemical knowledge by analyzing the sample surface. It has already been used to characterize mineralogical fingerprints (Feng et al., 2018;Lorenz et al., 2019;Tusa et al., 2020), organic matter (Jacq et al., 2019b;Van Exem et al., 2019), pigments (Butz et al., 2017;Makri et al., 2020;Schneider et al., 2018), and particle size distribution (Jacq et al., 2019a). Some of these studies highlight sedimentary structures with proxy's estimation but without characterizing them spatially. It has also permitted to characterize spatial variations of volcanic tephra layers with a visible and nearinfrared hyperspectral camera (VNIR, 400-1000 nm) using a supervised classification method based on an artificial neural network (Aymerich et al., 2016).
In this study, we propose to test several discrimination methods based on artificial neural networks (ANN), on classification rules (decision tree DT, random forest RF) and classification models (linear and quadratic discriminant analysis LDA/QDA, partial least squares discriminant analysis PLS-DA). The hyperspectral images used in this study come from a visible and near-infrared camera (VNIR, pixel size: 60 μm) and a short-wave infrared sensor (SWIR, pixel size: 200 μm). The two sensors can also be combined to estimate a VNIR-SWIR composite sensor (pixel size: 200 μm) or reduced to determine pseudo-RGB, HSV, Lab, and grayscale images. Several spectral and spatial pre-processing, and compression algorithms were also compared to improve the discrimination of continuous sedimentation and event layers. Consequently, this study proposes to compare three high-resolution images, seven discrimination methods, raw data, and eight pre-processing to create discrimination models between the continuous sedimentation and event layers. For the three selected cores, the sediment was first described and logged after naked-eye observations. When they were available, the event layers were distinguished from the continuous sedimentation by using sedimentological and geochemical results. As X-ray fluorescence core-logging that was made at 1 mm resolution for ALO09 and 0.5 mm resolution for LDB17, which was interesting to have a higher resolution than with naked eye. Once clearly identified, those layers were counted visually. In ALO09 and LDB17, the sediment sequence consists of homogeneous light gray clay alternating with dark layers, interpreted as flood induced deposits. Whereas for LEM10, the event layers present a pale red color. In the three cores, the frequency, and the thickness of event deposits is variable ( Table 1). After that, the data sets will be referred to by the lake name, but it must be understood that from one core to another from the same site, there may be differences (color, chemical, physical). respectively. The real resolution may vary due to surface roughness. Relevant data were obtained with a flattening and a cleaning of the core to have a plane surface that reveals sediment structures (Butz et al., 2015). Then, a calibration of the camera was realized with a spectralon reference, and the image of a known object to have squared pixels (true shape) and relevant reflectance intensities (color and signal to noise ratio).

Sample sites description
The resolution of the two datasets (VNIR and SWIR) has also been homogenized at a common spatial resolution of 200 μm to get a VNIR-SWIR image. The combination of the two HSI into a unique one (i.e., VNIR-SWIR) was made with image registration (Liu et al., 2011) adapted to HSI with a registration on a wavelength plane characteristic of a similar chemical compound. In previous work, we find that the 970 nm (VNIR) and 1200 nm (SWIR) wavelengths were optimal to combine them (Jacq et al., 2019c). They are related to hydroxyl chemical bonds mostly associated with moisture for sediment cores (Bull, 1991;Viscarra Rossel and Behrens, 2010). Therefore, a composite sensor was created to merge the VNIR and SWIR datasets to cover the range of 400-2500 nm. In this study, we compare the classification performances of three datasets: VNIR at a spatial resolution of 60 μm, SWIR at 200 µm, and VNIR-SWIR at 200 μm.

This is a preprint version
Pre-processing can be used to correct the data from noise or aberrant values, and to highlight discriminant wavelengths. With the three dimensions of the HSI, spectral, and spatial pre-processing can be used (Rinnan et al., 2009;Vidal and Amigo, 2012). Spectral pre-processing compared in this study are normalization (autoscale), baseline correction (detrend), scattering effect correction (Standard Normal Variate (SNV) and Multiplicative Scatter Correction (MSC)) ( Barnes et al., 1989), Savitzky-Golay derivatives (Savitzky and Golay, 1964). Principal Component Analysis (PCA) was also used as a compression of the spectral dimension (Pearson, 1901). Contrast Limited Adaptive Histogram Equalization (CLAHE) was used on the spatial dimensions with an estimation on the grayscale image levels and applied on each image wavelength (Zuiderveld, 1994).

Data Analysis
Seven commonly used machine learning algorithms were used to create classification models with three spectral type data (VNIR, SWIR, VNIR-SWIR) ( Figure 2). First, an RGB image was made from the VNIR HSI to label continuous sedimentation and event layer pixels. Then, machine learning methods were used to estimate discrimination models. Also, eight pre-processing (spatial and spectral) were used and compared with raw data. Consequently, 189 discriminations models were created for each sample to find the optimal sensor (of three) -methods (of seven) -pre-processing (of nine). The Matlab codes used in this study are available in github.com/JacqKevin/HSI_SupervisedClassification.  (6) Estimation of the optimal model depending on the sensor, the pre-processing and the discrimination model.

Data labeling
The hyperspectral VNIR data can be reduced to three planes corresponding to a Red-Green-Blue image of the VNIR sensor (611-549-464 nm, according to sRGB standard) (CIE, 1999). The RGB images are representation modes adapted for eye vision and were used to visualize the sample for the labeling of the sedimentary structures. For that, the user must manually select rectangle areas that correspond to the different studied classes based on the known deposits characterized by geochemical, textural, and mineralogical analysis. The resulting maps were used as a mask with 0 values for non-labeled pixels and other values for labeled pixels (1: event layer, 2: continuous sedimentation). It was also used to extract the labeled pixels and their corresponding spectrum. For the SWIR dataset, these maps were also made with the VNIR data at 60 µm and resized at 200 µm.

Classification modeling
Seven methods were used to create a supervised classification model to discriminate the two sedimentary processes (Figure 3). Decision tree (DT) and random forest (RF) are linear nonparametric methods based on successive rules on the reflectance value of one wavelength at a time (Breiman et al., 1984;Ho, 1995). Artificial neural networks (ANN) are non-linear parametric methods based on data learning with neurons (Ivakhnenko and Lapa, 1965;McCulloch and Pitts, 1943;Rosenblatt, 1958). Linear, Quadratic, and Partial Least Squares Discriminant Analysis (LDA, QDA, PLS-DA) are parametric methods with a mathematical function kernel to separate the classes (Fisher, 1936;Wold et al., 1984). The models were fitted with 70% randomly selected data in calibration and the remaining in the test set (Table 2). This ratio was applied to the class with the smaller number of labeled pixels to have a consistent number on each group. The same calibration and test sets were used for each classification method, but they were different for each pre-processing. The labeled pixels number depends on the ability of the user to see precisely the deposits, so only pixels that we are sure they belong to these classes are labeled so as not to introduce uncertainty with bad labeling. Due to the compression of the high-resolution labeled map, the pixels of the contours may have been removed as being uncertain, which slightly changes the labeling percentages between the two resolutions.
That explains the difference between the cores with labeled pixels between 0.45% and 15.71% for one class (Table 2). It will also allow to see the learning capacity depending on the number of calibration pixels.

This is a preprint version
The quantitative performances of the classification models are estimated with the accuracy in calibration and prediction: with "true positive" corresponding to class 1 pixels classified in class 1, "true negative" corresponding to class 2 pixels classified in class 2, "false positive" corresponding to class 2 pixels classified in class 1, and "false negative" corresponding to class 1 pixels classified in class 2.
The accuracy corresponds to class 1 and 2 pixels that are successfully classified in their corresponding classes. Thus, the closer the value is to 1, the better the prediction.

Qualitative assessment of each pixel image classification
The created model must be visually assessed by an expert to quantify the spatial consistency of the classification map because of possible overfitting. For that, the model was used on all the pixels to estimate the classification map. Then, the user must assess it by looking at the salt and pepper characteristics of the image, with single pixels corresponding to potential miss-classification. Also, the user must check the event layer predicted areas with the RGB image. For this, a qualitative index with values between 0 and 5 (arbitrary) is used. For this, a qualitative index with values between 0 and 5 (arbitrary) is used. A value of 0 corresponds to a classification map with mainly a single value for each image pixel. Whereas a value of 5 shows relevant deposits across the image width and a few single pixels.

Results and interpretations
3.1 Comparison of the classification methods and the pre-processing Eight pre-processing were tested on each sample, each sensor, and each discrimination method to find the optimal combination ( Figure 4). The processing must have the capacity to highlight discriminant information and create performant and robust models. PCA and second derivatives (D2) This is a preprint version present the lowest prediction accuracies for each sensor and sample (Figure 4). The PCA reduction may not keep the discriminant information in the main PCs, as well as the D2. Whereas raw, CLAHE, autoscaled, and detrended data have the highest discriminant properties in all cases (Figure 4). Some pre-processing are sensor dependent, such as SNV and MSC, which models present low accuracies for the VNIR, and high ones for the SWIR sensor. The first derivative (D1) has intermediate accuracies for all cases except for the SWIR sensor and the Allos sample, for which high performances are obtained.
Concerning the discriminant methods, DT presents the lowest prediction accuracies in most of the cases, whereas ANN, QDA, and RF have the highest. For the ANN, few neurons (2-3) seems to be better than numerous. The later has many more parameters and may overfit on the few training data, thus limiting its generalization capability on the test set. The accuracies of the optimal methods are close and must be discussed with all the quantitative values and interpretations of their discriminant spectral properties.
Another consideration to be considered is that the results are sample-specific. They also show that using the LDA or PLS-DA methods with raw data allows to have first performing results, even if adding a pre-processing can slightly improve prediction performances.

Optimal models for each image type
The optimal model's calibration and prediction accuracies are higher than 0.8 and mainly 0.9, which highlights the presence of spectral discriminant information in both sensors (Table 3). ANN with two neurons is often the optimal discriminant method based on the prediction accuracy and to be parsimonious because adding neurons do not improve the accuracies. However, for the qualitative index, discriminant analysis methods (LDA, QDA, PLS-DA) generally present the best performance This is a preprint version levels. The computation time is lower for the LDA, PLS-DA, and DT methods, and all of these methods are fast to compute, so the computation time is not used to select the optimal model. Based on a compromise between these two first properties (accuracies and qualitative), the optimal methods in most of the cases are the LDA or PLS-DA. Another important property is their transparency in the selected discriminant wavelengths. Table 3 highlights an optimal pre-processing that depends mainly on the sample and the sensor. The discrimination methods seem to have a low impact on it, which suggests that an optimal pre-processing can highlight the discriminant wavelengths.
This  The classification maps comparison of the optimal and the worst models highlight that the VNIR sensor is more sensitive to surface reliefs, as it can be seen in the Figure 5.a-b with fissures and holes due to sampling, or the Figure 5.d-e with some kind of shadows. One can notice the darker laminated areas on the Bourget and Geneva sediment sections (on the left of the picture) that were not labeled initially because of the non-possibility to precisely label the two classes. In these areas, the VNIR This is a preprint version sensor predicted most of the pixels in the event layers class might be due to his similar color with this class. In contrast, the SWIR data predict the two groups correctly thanks to the more spectral discriminant information. For the Allos sediment section (Figure 5.a-c), the event layers are easily distinguishable from the continuous sedimentation due to their darker colors. But there is also a color gradient linked to a grain size gradient that is spectrally registered in the SWIR range (Jacq et al., 2019a). That explains the wider layers estimated from the SWIR data than those of the VNIR one, and the lightest layers not visible with the VNIR data.
All these observations and the quantitative validation are agree on the highest discriminant capacities of the SWIR sensor for the separation of event layers from the continuous sedimentation.
Thus, the combination of the two sensors is not better, as we could expect due to the VNIR data sensitivity to surface roughness, and they will not be used further on. Figure 5: Optimal (c, f, i) and worst (b,e,h) models for each sediment core samples (a, b, c: Allos; d, e, f: Bourget, h, i, j: Geneva)

Spectral signatures
The estimation of a SWIR wavelength cumulative occurrence allows to find the discriminant ones and to characterize them ( Figure 6). Five discriminant spectral areas present high cumulative values and can be associated with chemical properties (Viscarra Rossel and Behrens, 2010). Organic components can be related to the spectral wavelengths between (1) 1100-1200 nm and (2)   The three optimal models use similar discriminant wavelengths but with different importance depending on the sediment properties. The two hydroxyl bonds spectral areas have important implications in the three models that can be associated with the moisture and the particle size.

This is a preprint version
Otherwise, the two classes seem to be discriminated by the mineralogy for the Allos sediment sequence, the organic matter for the Bourget and Geneva sequences.

Model transfer
The optimal SWIR models developed on each core were used to estimate the classification maps of the other ones to see the capability of the model transfer. Table 4 shows the results of these classifications with the test accuracies and qualitative indicators. Most of them agree on the difficulty of transferring the model to other cores. Only the model developed with the Bourget data has some capacity to predict on the other, maybe due to its discriminant wavelengths that are less spectrally localized than the two other models (Allos: 2100-2200 nm, Geneva: 1650-1700 nm).

This is a preprint version
Based on these results, the estimation of a classification model is sample-specific. It can be made with a linear discriminant method as LDA or PLS-DA with the raw SWIR dataset, even if some pretreatment can slightly improve the performance.

Event layer classification with classical images
The true-RGB image can also be converted into HSV (Hue, Saturation, Lightness) or L*a*b* (lightness, color) color spaces, and reduce in grayscale level. The HSV and L*a*b* transformations have proved to be more adapted for classification purposes (Bora et al., 2015;Hernández-Hernández et al., 2016).
The grayscale levels are the most straightforward color space. Table 5 shows that the conventional image presents slightly lower performances as VNIR and also the same mispredictions with surface variations, shadows, and laminated areas (Supplementary figure 2).
All these observations highlight that these methods learn too much the color of the deposits with VNIR data and do not learn sufficiently from the chemical composition, contrary to SWIR HSI. Then, machine learning and conventional image can be used in the case of a clean image and sample, with only color differences between the classes and the learning of all kinds of sample events and defects.

Comparison between an HSI and a naked-eye chronicles
The discretization of the event layer allows to estimate their occurrence along the sediment section and to compare it with the naked-eye event chronicle. The classification map was reduced to a summed profile, with pixels equal to 0 for continuous sedimentation and 1 for an event layer. This assumes that deposits are parallel, if this is not the case, image processing needs to be used to correct the deformation when possible. This profile can also be used as an event occurrence probability with the normalization by the pixel number of the image width. It was smoothed with a second-order polynomial Savitzky-Golay filter to reduce miss-classifications (Savitzky and Golay, 1964). A low threshold (lower than 20%) allows to separate all the deposits, but also find some artifacts, and the boundaries can be overestimated due to curvatures. Conversely, a high threshold (higher than 70%) only finds the large deposits, but close ones can be fused. Thus, a double threshold was used. First, to the half-width of the image to classify each column to one class, it allowed to find most of the relevant deposits, but close layers can be seen as a large one. The second threshold of 15% of the image width was used for event layers thicker than 5 mm to divide them potentially.
Finally, the chronicle can be reconstructed with the depth and thickness of each event layer. Naked eye chronicles of the three cores were estimated and can be compared with those from hyperspectral (Jenny et al., 2014b;Wilhelm et al., 2012). Figure 7 compares the identification of event layers in depths and thickness between HSI and nakedeye approaches. In general, HSI results in thicker event layers compared to naked-eye observations. This can be explained by the eye limitation to characterize the event limits (resolution grain) or by the curvature of it (miss-identification). Machine learning also allows to identify new deposits that were not visually detected (due to their small thickness or texture or color differences). They must be assessed by other high-resolution technics as microscopy to verify their relevancies. The number of This is a preprint version detected event layers is higher (Table 1, Table 6) due to the detection of supplementary thinner layers.
Initially, this work was developed to study only flood events that were manually labeled. Still, HSI does not seem to have enough discriminant spectral information to distinguish the different triggering mechanisms of the event layers (e.g., flood, slumps induced by seismic-shaking…).
However, the HSI model corresponds to the first fast and non-destructive method to detect, in a semi-automatic way, event layers in different sedimentary contexts and represent a clear improvement in sedimentology.

Perspectives
It is worth noting that we focus only on event layers and that models are site-specific. Some strategies can be tested on future works. The first one is to create a multi-site database to learn several event types and continuous sedimentation cases (strata or homogeneous). A second could be to use complementary information estimated by other sensors, for example, XRF spectroscopy for elemental composition, which is also a non-destructive, high-resolution (up to 200 µm) analysis.
It will also be interesting using a spatial-spectral approach to add information contained in both dimensions, like color and grain size gradient, along with a flood event. Deep learning with the convolutional neural network (CNN) use this kind of approach and can be interesting for a multi-site and a large database. This method introduces multiscale local features learning and some translation and rotation invariance, which is of interest for image classification (Schmidhuber, 2015). We have tested a CNN1D, that uses only the spectral dimension, but it presents low performances that may be due to a too-small database and to a too simple problem that an ANN with few neurons can model.
In future work, we will use a CNN3D, that use a spatial-spectral approach (Ben Hamida et al., 2018).
We have shown previously that the combination of VNIR and SWIR does not increase the discriminating capacity. Improvements can be made with surface defect correction before the acquisition or after with pre-processing, or detected and removed them from the classification map.
Another way could be to make two separate classifications with both sensors and then to combine them with different weights using e.g., fuzzy or belief functions methods to improve the certainty of the predictions (Lian et al., 2019;Tehrani et al., 2019).
This study has shown that event layers can be estimated at high-resolution with hyperspectral imaging for the cores used in calibration. A first discriminant model, based on LDA or PLS-DA, estimated with the raw hyperspectral data can allows to know the possibility of discrimination between different classes. A similar method has already been used for the detection of tephra layers (Aymerich et al., 2016). So, the proposed method may be used for a case study for other types of events, like tsunamis, earthquakes, landslides, storms, or any other laminae. It can be very useful for paleoenvironment and paleoclimate studies and will allow the creation of event chronicles.

Conclusions
We studied the potential of three hyperspectral sensors (VNIR, SWIR, VNIR-SWIR) to image three sediment cores and created machine learning models. The aim is to automatically discriminate different types of sedimentation (continuous versus event layer) with non-destructive, highresolution, and time-saving methods. Seven discrimination methods, coupled with raw data or eight pre-processing, were used to find an optimal model. We found that the SWIR sensor allows creating the most robust models with discriminant analysis (LDA, PLS-DA). Raw data presents relevant predictions, but the use of a pre-processing can slightly improve the performances and robustness.
The models were assessed quantitatively with prediction accuracies higher than 0.9. For the qualitative one, event layers present colors and textures that differ from the continuous sedimentation and can be seen with a naked-eye observation to check the relevance of the prediction maps. The discriminant wavelengths are associated with organic matters and some mineral bands. Finally, the event chronicles can be estimated from the classification maps with the estimation of the depth and thicknesses of each deposit. Unfortunately, the hyperspectral sensors used in this study do not have enough spectral discriminant information to characterize the trigger of the event layer and if different types of events exist. Future works will allow to characterize triggers by the combination of hyperspectral imaging with other sensors or by the use of spatial-spectral machine learning methods. This study highlights the sediment lithologies discrimination capacity of This is a preprint version hyperspectral imaging with manual labeling. The application of this method on sediment sections will allow to create robust chronicles of events with characteristic wavelengths and thus, enhance the knowledge of the evolution of the frequency of extreme geodynamical events.

Acknowledgements
Coring supports were provided by the French National Research Agency's with the IPER-RETRO program for lake Geneva (ANR-08-VUL 005), the Pygmalion project for lake Allos (ANR BLAN07-2_204489). Hyperspectral imaging was processed at the University of Normandie-Rouen and was funded by the Region Normandie, which supports the scientific consortium SCALE UMR CNRS 3730.