Evaluating single and multi-date Landsat classifications of land-cover in a seasonally dry tropical forest

Accurate information on the land cover is crucial for efficient monitoring and development of environmental studies in the Brazilian Caatinga forest. It is the largest tropical seasonal forest in South America, presenting high biodiversity and is under intense anthropogenic disturbance. Caatinga's land cover is heterogeneous, and rainfall is its primary phenological regulator, presenting mainly deciduous species. Different land-cover patterns show distinct spatial responses to climate and soils changes and modify their physical properties over time. Rainfall is highly variable over time and space, but seasonally concentrated between 2 to 4 months. Therefore, distinguishing the different patterns of land cover through medium spatial-resolution remote sensing, such as the Landsat image series, is challenging, due to the particularities of the climate-vegetation interaction. Two remote sensing approaches have a high potential for efficient land-cover mapping in Caatinga: single and multi-date imagery. The heterogeneity of the land cover of this environment can contribute to a better performance of multispectral approaches, although it is normally applied for singledate images. In a land-cover mapping effort in Caatinga, the temporal factor gains relevance, and the use of time series can bring advantages, but, in general, this approach uses vegetation index, losing multispectral information. This manuscript aims to assess the accuracies and advantages of single-date multispectral and multi-date Normalized Difference Vegetation Index (NDVI) approaches in land-cover classification. Both approaches use the Random Forest method, and the results are evaluated based on samples collected during field surveys. Results indicate that land-cover classification obtained from multi-date NDVI performs better than single-date multispectral data. The lower performance observed for single-date multispectral classification is due to similarities in spectral responses: targets of deciduous vegetation lose their foliage and can be misread as non-vegetated areas. Meanwhile, an accurate classification by time series of plant clusters in seasonal forests allows incorporating seasonal variability of land-cover classes during the rainy and dry seasons, as well as transitions between seasons.


Introduction
The Caatinga is the largest seasonally dry tropical forest in South America (Queiroz et al., 2017), covering an area of about 11% of the Brazilian territory (Brazil-MMA, 2019).
Caatinga's conservation has a direct influence on various environmental processes associated with soil protection, water resources, climate maintenance (Manhães et al., 2016) and economic activities (Brazil-MMA, 2019). The degradation of Caatinga vegetation results from unsustainable exploitation, which, associated with climatic factors, accelerates the desertification process in the region (Drumond, 2004). This ecosystem, of high biodiversity, is under intense anthropogenic disturbance (Ribeiro et al., 2015), and needs accurate information on the land cover for efficient monitoring and development of environmental studies (Gomez et al., 2016).
The Caatinga land cover is heterogeneous, and rainfall is the main phenological regulator of plants in this forest (Moro et al., 2016). The different land-cover patterns are driven by natural and anthropogenic factors, acting on multiple spatial and temporal scales (Moro et al., 2016;Chaves et al., 2008). In these different land-cover patterns, the strategies for adapting to the climate are distinct, resulting in different spatial responses and in the variation of their physical properties over time (Meiado et al., 2012;Vico et al., 2015). The particularities of the climate-vegetation interaction in this forest make it a challenge to distinguish the different landcover patterns through remote sensing (Cunha et al., 2020).
The extraction of land-cover information from remote sensing images is the result of the interaction of the targets on the surface and the electromagnetic radiation in the different spectral bands (Jensen, 2009). The algorithms for distinguishing the different spatial patterns existing in the landscape take advantage of this information to characterize the land cover. The Landsat data structure allows performing temporal analysis in higher spatial resolution (Woodcock et al., 2020), as it provides information on the quality of coverage radiometric, geometric, and identification of clouds and cloud shadows Man et al., 2018), making easier the differentiation of land-cover patterns in high spatial heterogeneity.
Although satellites offer practically continuous monitoring, classification of land cover commonly uses multispectral data at a single observation date (Jia et al., 2014;Mahdianpari et al., 2018;Alhassan et al., 2019). However, this approach can induce confusion in the classification of the different existing land-cover patterns in dry seasonal forests, due to the similarity of the vegetation's spectral response in specific phenological stages (Karnieli, 2002). The use of time series can be an alternative for mapping seasonal dry forests, for allowing the monitoring of the different phenological stages of land cover patterns (Hüttich et al., 2011;Gomez et al., 2016). Moreover, the use of vegetation indices allows synthesizing the spectral bands which are most sensitive to biomass variation and photosynthetic activities, simplifying the number of input variables (Tatsumi et al., 2015). However, most studies using vegetation-index time series are carried out in crop areas (Wardlow and Egbert, 2008;Zheng et al., 2015;Mercier et al., 2020), which facilitates the identification of the phenological cover patterns. In seasonal dry forests, the land cover classes and their phenological patterns are not well defined and anthropogenic changes may impair the mapping (Abdi, 2020).
This study assesses two approaches for supervised classification of the Caatinga forest vegetation, one using multi-date Normalized Difference Vegetation Index (NDVI) data and the other using single-date multispectral data. The objectives of this study are: i) to map the Caatinga land-cover classes using these two approaches, comparing both performances for land-cover classes classification, and ii) to assess the impact of the classifications on land-cover mapping once the outcome results provide information for the forest management and conservation. It is also expected that the findings can contribute to enhancing the techniques for mapping seasonally dry tropical forests.

Study area
The study area is the Sucuru River basin (Fig. 1), with a territorial area of 1,682.87 km², located between the geographical coordinates 7º28'30'' and 7º49'30'' South and 36º34'00''and 37º12'00'' West. In the study area, vegetation degradation has occurred mainly by human activities, such as agriculture and livestock exploitation and wood extraction (Moreira and Targino, 1997;Alves et al., 2017). The climate is hot semi-arid (BSh, Köppen classification), with two distinct seasons: the hot dry season (From June to January) and the very hot rainy season (from February to May), with an average annual rainfall of approximately 520 mm (Cunha et al., 2020). The soils are shallow and stony, which makes it difficult to retain water after the precipitation events (Moro et al., 2015). The river basin is located in the Cariris Velhos desertification nucleus. This nucleus is one of the areas in the region that presents a high risk of desertification (INSA, 2016).  Figure 2 presents the schematic workflow of the methodology applied in this work to evaluate the performance of the classification obtained by single-date multispectral data and the classification by multi-date NDVI data. First, we collected field data and selected satellite images. Then, we reconstructed the smoothed NDVI time series, and identified the temporal patterns of the vegetation cover classes. In the processing step, the Random Forest (RF) method was used for both, single-date multispectral and, multi-date NDVI classification. Finally, it is identified the accuracy and performance of these classifications.

Data collection
Field surveys collected raw data about Caatinga's land-cover classes in 60 previously chosen land-plots. The surveys occurred in early spring when the deciduous vegetation doesn't lose its leaves yet. In this case, it happened from September 28 to October 7, 2016. The landcover identification survey in the 60 land-plots polygons ( Fig. 1) extracted 3,000 pixels randomly, which representing six classes of Caatinga land-cover. The whole set of pixels was randomly grouped into training (2,000 pixels) and validation (1,000 pixels) data sets. Caatinga's land-cover classification followed the methodology proposed by Chaves et al. (2008). Those authors describe and evaluate Caatinga's vegetation in its different stages of anthropization, based on size, morphological features and degrees of coverage. Table 1 shows the used classes according to this methodology. The Bare Soil (BS) class, when there is no vegetation cover, was added, totalling six land-cover classes.  September 2016 that cover the study area (44 from the ETM+ sensor and 44 from the OLI sensor). The combination of images from two sensors results in sampling for the same region at eight-day intervals with thirty meters of spatial resolution. Two different algorithms generate Landsat data at this correction level and depend on the measurement sensor: Landsat 7 ETM+ data are obtained by the LEDAPS software (Masek et al., 2006), and Landsat 8 OLI data are processed by the LaSRC algorithm (Vermote et al., 2016). NDVI is calculated using surface reflectance data from the red and near-infrared (NIR) spectral regions (Tucker, 1979). USGS provides NDVI images with terrain and atmospheric correction, resulting in orthorectified images of high geometric accuracy.

Pre-processing
Digital pre-processing of the images, to perform the classification by time series, was done by a R script (R Development Core Team, 2018) developed by the authors , for noise reduction and removal of cloud shadows, clouds, and water pixels. For a smoothed NDVI time serie, the pre-processing uses the Landsat surface reflectance quality assessment (band pixel_qa) which considers only clear pixels (values 66 and 130 for Landsat 7, or 322 and 386 for Landsat 8, USGS, 2019a, b). Holben (1986) showed that the maximum value is a reliable measure to produce representative compositions of the temporal image on a monthly scale. In this study, the NDVI maximum values are used to reduce the original NDVI time series to monthly composite images. The missing values were filled in by linear interpolation. Also, the Savitzky-Golay linear filter is applied (Cao et al., 2018;Savitzky and Golay, 1964), with a fivemonth window to smooth the width, reducing the noise caused by atmospheric variability.

Classification
The classification step uses the spectral bands blue, green, red, NIR, short wave infrared 1 (SWIR 1) and short wave infrared 2 (SWIR 2) ( Fig. 3) of the image of September 29, 2016, Landsat 8 as input data to the RF method to implement the single-date multispectral approach to classification. In the approach by multi-date NDVI, the NDVI monthly time series from October 2014 to September 2016, totalling 24 composite images (Fig. 3) are the input data to the RF procedure.
Supervised image classification was performed using the Random Forest R package (Liaw and Wiener, 2002). The RF is an ensemble classifier that produces multiple decision trees. The method uses a bootstrap sample of two-thirds of the original training data (in-bag samples) to form trees randomly while the remaining third group of samples known as Out-Of-Bag (OOB), are used to obtain an internal error estimate (Breiman, 2001;Belgiu and Drãgut, 2016;Hüttich et al., 2011). The final classification decision is obtained by the arithmetic mean of the class assignment probabilities calculated by all produced trees, in which each tree votes for a class membership and the final result is the class that obtained the highest number of votes (Belgiu and Drãgut, 2016). This method has been indicated to classify land cover due to its precision (Valbuena et al., 2016). The main parameters of the RF models, defined by the user, are the maximum number of decision trees to be generated in the forest (ntree) and the number of variables used randomly to split each node (mtry) of the tree (Belgiu and Drãgut, 2016;Htitiou et al., 2019). In this study, five hundred decision trees were used and mtry was set to the square root of the number of predictor variables.

Validation and accuracy assessment
The evaluation of the supervised classifications was carried out based on four performance indicators calculated from the confusion matrix: overall accuracy (OA), Kappa coefficient (k), producer's accuracy (PA), and user's accuracy (UA). The confusion matrix describes the pattern of the allocation class relative to the reference data (Foody, 2002). The Kappa coefficient of agreement (k) uses all the elements of the confusion matrix in its calculation (Eq. 1), constituting an important precision evaluator in images analysis.
where k is an estimate of the kappa coefficient (valuing less than 0 means no agreement; close to 1 means perfect agreement); xii is the value in row i and column i; xi+ is the sum of the values in row i; x+i is the sum of the values in column i of the confusion matrix; n is the total number of samples; and m is the total number of classes (Foody, 2002;Congalton and Green, 2008).
OA is the division of the total number of correctly classified samples (sum of the elements of the main diagonal of the confusion matrix) by the total number of reference samples. PA is the division of the total number of correctly classified samples in a class by the total number of reference samples for that class, while UA is the division of the total number of samples that were correctly classified in a class by the total number of samples classified in that class (Congalton, 1991). These performance indicators value between 0% and 100% (worst and best performance, respectively).
The spectral response of the Caatinga coverage classes concerning the two classification approaches was also analyzed. The classes' boxplots were visually examined to evaluate the separability of each class. This graphic technique illustrates how the training data of the coverage classes are related to the inputs used in the classification approaches.

Results
The results of the performance assessment showed that the classification based on NDVI monthly time series was more accurate, with an overall accuracy of 88.8% and a kappa coefficient of 0.86 (Table 3), than the single-date multispectral classification, with an overall accuracy of 81.4% and a kappa coefficient of 0.78 (Table 2). The comparison of the accuracies for each classification approach, shows that the classifications achieved high accuracies (> 70%) among the different classes (Fig. 4, Tables 2 and 3). However, the lower accuracies for some classes can be assumed as critical for mapping land-cover, in particular the single-date multispectral classification for the open vegetation (OSSB, OSS, SSS, BS).

Fig 4:
Radar chart representing the user's and producer's accuracies for the single-date multispectral and multi-date NDVI classification Figure 5 shows the training samples of the six land-cover classes through boxplot graphs for the two approaches and also, that the NDVI monthly time series approach has better revealed distinct patterns for each land cover class than the single-date multispectral approach.
In single-date multispectral classification, similar patterns were observed for OSSB, OSS, SSS, BS classes; these classes were precisely those that presented the lowest performance in this classification approach for the user's accuracy (UA) and producer's accuracy (PA  The values of areas of each land cover class over the studied basin (Fig. 6), classified by the single-date multispectral and multi-date NDVI approaches, show that some classes have different estimates for their areas according to the approach used (Fig. 7). The application of the first approach resulted in the following composition: BS (4.  Fig. 7 show that, although the areas present quite similar magnitudes (Fig. 6), there are spatial differences in the land-cover distribution between the maps. In general, the land-cover maps according to the single-date multispectral classification present higher spatial fragmentation of the land-cover classes when compared with the ones generated by multi-date NDVI classification.

Discussion
The lower performance presented by the single-date multispectral classification can be explained by the similarities in the spectral responses of some classes of coverage (Fig. 5.A).
SWIR 1 band has the highest reflectance value among the land-cover classes, except VDAS class. This is explained by the contribution of the exposed soil in these cover classes (Ciani et al., 2005;Tian and Philpot, 2015), since most of the Caatinga vegetation is sparse. However, VDAS class has very dense vegetation, reducing the contribution of the exposed soil in its spectral response, being the band NIR its highest reflectance value (Ding et al., 2014). Figure   8, generated from the average value of reflectance for the training sample, allows visualizing the similarity between classes BS and SSS, and between classes OSS and OSSB. Deciduous vegetation targets lose their foliage during the dry season in the Caatinga environment, and can be confused with non-vegetated areas (Lima et al., 2012). This feature is more significant in the vegetation areas classified as open, since the vegetation in the upper stratum, when losing its leaves, presents large portions of exposed soil. Each of the Caatinga vegetation species responds differently to precipitation and the amount of water storage in soils (Lima and Rodal, 2010;Moro et al., 2015). In the humid period, most open-ground cover areas are invaded by grasses that have low height, confusing the distinction between areas of different vegetation size, due to the elevation of biomass and high momentary photosynthetic activity. Therefore, the near-infrared band will show high reflectivity to the increase in biomass, and the blue and red bands will absorb solar radiation to perform photosynthesis (Kumar et al., 2001). The high heterogeneity of Caatinga land cover makes it harder to distinguish the different land-cover patterns through the interaction between electromagnetic radiation and the surface recorded by sensors onboard the Landsat satellites. Our results suggest that the use of vegetation-index time series approach performs better for land-cover classification, compared to single-date multispectral classification in a high heterogeneous environment as the Caatinga. As mentioned by Hüttich et al. (2011) and by Silveira et al. (2018), this approach, as considering the seasonal variability of vegetation activity and phenology cycle in the classification process, increases the overall performance of the land-cover classification in dry seasonal forests. Moreover, the phenology behaviour detected by time series has been pointed out as a key factor to the best performance observed in semi-arid environments (Htitiou et al., 2019). The boxplots in Figure 5.B show that the multidate NDVI classification has revealed different patterns of each coverage classes, which help to ensure good discrimination between classes in the study area. It is possible to identify a time signature of the monitored cover classes, revealing a pattern in the behaviour of each type of land cover (Fig. 9). The morphological characteristics of each class can explain the distinction of temporal signatures, mainly those related to vegetation (Arvor et al., 2011). The Caatinga is characterized by significant variation of biomass between the dry and rainy seasons (Barbosa and Kumar, 2016). Therefore, it is necessary to assess the seasonal behaviour of each class (Xia et al., 2017). This will make it possible to distinguish the classes of cover more effectively, as it is possible to verify the response of vegetation throughout the rainy and drought cycles to which the ecosystem is subjected (Levine and Crews, 2019;Gomez et al., 2016).
However, the classes with larger and denser vegetation (VDSA, DSA) showed similar performance in both approaches (Fig. 4), which indicates that the construction of the time series brought low benefit to distinguish these classes, compared with the single-date multispectral approach. VDAS class is characterized by a more stable phenological behaviour (Fig. 9). This can be explained by the proximity of the location of the occurrence of this class to the watercourses, allowing the availability of water throughout the rain and drought cycles. In Figure 5, some classes stand out for the outliers, which can be caused by the difficulty of distinguishing them in the field survey, and also by the anthropic interference in some regions throughout the time series. Anthropogenic activities in Caatinga's land-cover may impair the performance of classification multi-date NDVI (Maldonado et al., 2002;Jianya et al., 2008;Santos et al., 2013). BS class presents outlines with NDVI value from vegetated regions, which suggests that some points may have been used as cropland at certain times (Fig. 5.B). This explains the lowest performance among the classes evaluated by the multi-date NDVI classification. However, when comparing the BS class for both approaches, a lower performance is perceived for the single-date multispectral classification (Fig. 4). The Caatinga vegetation, especially for smaller canopy (SSS and OSS), is scattered with a significant contribution of the soil to the recorded reflectance ( Fig. 5.A).
Remarkable differences in the detection of land-cover, such as those observed between the single-date multispectral and multi-date NDVI classifications ( Fig. 6 and Fig. 7), can interfere in numerous applications of environmental planning and research in this region, sometimes generating misleading approximations of the reality. In this sense, it is relevant to evaluate the time-series patterns of different land-cover classes in seasonal dry forests and, thus, allow their characterization through satellite images. Continuous ground monitoring of different types of land cover is very important, to face the challenge of land-cover classification in the Caatinga and other dry environments (Zhao et al., 2016).

Conclusions
The high spatial heterogeneity and temporal variability of the Caatinga vegetation are important elements to consider in the land-cover classification process. The use of a multi-date NDVI approach for the characterization of the land cover in this environment tends to be an effective alternative, compared to the traditional single-date multispectral approach, that takes