Development of New Index Based Supervised Algorithm for Separation of Built-Up and River Sand Pixels from Landsat7 Imagery: Comparison of Performance with SVM

When extracting "built-up" pixels from satellite imagery, supervised classification algorithms often misclassify "river sand" pixels as "built-up" because the spectral profiles of the two classes are similar. Using the spectral reflectance information in the BLUE & GREEN bands of Landsat satellite imagery, this study introduces a new index, BRSSI (Built-Up & River Sand Separation Index), that efficiently reduces misclassification between these two classes. The results show that the average overall accuracy, F1 score and kappa ($\kappa$) coefficient of the developed index over 3 selected study regions across India are 0.9763, 0.9767 & 0.9527, respectively.


INTRODUCTION
Urban sprawl has commonly been estimated by classifying built-up pixels in satellite imagery with various classification methods. In supervised classification of multi-spectral satellite images, the information stored in the different bands of each pixel is used as "features" to classify the pixel as "built-up" or "non built-up". Because their spectral profiles are similar [1], "river sand" deposited on river banks and beaches is often misclassified as "built-up" by supervised classifiers that rely on spectral information to extract "built-up" pixels. Thakkar et al. [2,3] studied the performance of the Maximum Likelihood Classifier (MLC) on Indian Remote Sensing (IRS) Resourcesat-2 (R2) multi-spectral Linear Imaging Self-Scanning System III (LISS-III) data for the Arjuni & Khan-Kali watersheds, Gujarat, India, and reported significant misclassification between the "built-up" & "river sand" classes. In a Land Use and Land Cover (LULC) change analysis of the coastal area of Rio de Janeiro, Brazil, Avelar et al. [4] observed that supervised classification and machine learning techniques could not accurately differentiate between the "built-up" & "sand" classes in either Landsat-5 (1990) or GeoEye-1 (2012) imagery. In their study of the city of Nanjing, eastern China, Zha et al. [5] noted that, owing to similar spectral responses across multi-spectral bands, the Normalized Difference Built-Up Index (NDBI) cannot separate urban settlement pixels from sandy beach pixels in Landsat imagery. Pesaresi et al. [6] applied Symbolic Machine Learning (SML) to detect "built-up" regions in Sentinel-2 imagery of the city of Porto Viro in the Po river delta, Italy, and reported that "sand dunes" along the coast were misclassified as "built-up" because the spectral characteristics of these two classes are indistinguishable. Since index-based methods are easy to implement and computationally efficient, in this work we have developed a new index-based supervised algorithm that significantly reduces misclassification between the "built-up" and "river sand" classes in Landsat imagery, which researchers widely prefer because of its easy and long-term historical availability and its large spatial coverage. Section 2 describes the study sites and data sources, along with the preparation of the training and testing datasets. Section 3 discusses the development of the proposed index-based algorithm and the performance measures used to gauge how effectively it separates the "built-up" and "river sand" classes. Finally, the findings of this study are presented in Section 4.

DATA & STUDY AREA
To ensure that the proposed algorithm has no region-specific bias and performs satisfactorily across different geographic regions, we have considered 3 study areas (Delhi, Patna & Rajamundry) from various parts of India, each spanning 1° latitude × 1° longitude (an area of ≈ 12,100 sq. km). Each study region has been labelled after the largest urban settlement it contains. These regions are also situated on the banks of different rivers: the Delhi and Rajamundry regions lie along the Yamuna and the Godavari, respectively, while the Ganga, Gandak and Gharghara flow within the Patna region. Ortho-rectified and geo-referenced Landsat7 ETM+ (Enhanced Thematic Mapper Plus) imagery provided by USGS has been used to develop and validate the proposed index-based method. The Landsat7 images for Delhi, Patna & Rajamundry were acquired on 25-Feb-2017, 22-Feb-2017 and 12-Dec-2017, respectively. The images have been atmospherically corrected, and the data gaps caused by the Scan Line Corrector (SLC) failure have been filled using the gap mask files and the inverse distance weighting algorithm implemented in the Geospatial Data Abstraction Library (GDAL) Python library. Manually verified training and testing sets of pixels have been created for both the "built-up" & "river sand" classes using the Google Earth Engine (GEE) platform. For each study site, the training set consists of 100 pixels from each of the "built-up" & "river sand" classes, and the testing set comprises 500 pixels from each of these 2 land cover types. To ensure a fair comparison between the methods, the same training pixels have been used both to set the thresholds for separating the "built-up" class with the proposed index and to train the Support Vector Machine (SVM) classifier. Likewise, the same testing data have been used as the reference for comparing how well the developed index and the SVM classifier separate the "built-up" and "river sand" classes.
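The SLC gap filling described above relies on GDAL's inverse distance weighting routine (exposed in the Python bindings as `gdal.FillNodata`, which performs a four-direction search for valid pixels and can apply smoothing passes). As a rough, self-contained illustration of the idea only, the sketch below fills masked pixels of a single band with a plain inverse-distance-weighted average of the valid pixels inside a square window. The function name `idw_fill`, the window radius and the power of 2 are assumptions for illustration; this is not GDAL's algorithm.

```python
import numpy as np

def idw_fill(band, valid, radius=3, power=2.0):
    """Hypothetical IDW gap filler (not GDAL's implementation).

    band  : 2-D array of reflectance values
    valid : boolean array of the same shape, False where the pixel is a gap
    Each gap pixel is replaced by the inverse-distance-weighted mean of the
    valid pixels in a (2*radius+1) x (2*radius+1) window centred on it.
    Gap pixels with no valid neighbour in the window are left unchanged.
    """
    filled = band.astype(float).copy()
    rows, cols = band.shape
    for r, c in zip(*np.nonzero(~valid)):
        # Clip the search window to the image bounds.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
        win_valid = valid[r0:r1, c0:c1]
        if not win_valid.any():
            continue
        # Positions of valid pixels (row-major order, matching boolean indexing).
        rr, cc = np.nonzero(win_valid)
        dist = np.hypot(rr + r0 - r, cc + c0 - c)  # never 0: centre is a gap
        weights = 1.0 / dist ** power
        values = band[r0:r1, c0:c1][win_valid]
        filled[r, c] = np.sum(weights * values) / weights.sum()
    return filled
```

In practice, `valid` would be a boolean mask derived from the scene's gap mask file, and the function would be applied band by band.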

METHODOLOGY
To understand the patterns in the spectral profiles of "built-up" & "river sand" pixels, we have studied the distributions of 6 Landsat bands for these 2 classes. Figure 1 displays the spectral distributions of the Landsat7 bands for the 2 classes at the Delhi study site. Careful examination of the spectral profiles (Figure 1) reveals that, although the overall patterns of the spectral profiles are similar for the "river sand" & "built-up" classes, the BLUE and GREEN bands each completely separate "river sand" pixels from "built-up" ones. Furthermore, as shown in Figure 2, an analysis of the Receiver Operating Characteristic (ROC) curves of Naive Bayes classifiers built on individual Landsat7 bands to separate the "built-up" & "river sand" classes indicates that the BLUE & GREEN bands are more important than the other Landsat7 bands. Therefore, in this study we have formulated the proposed index as the product of the BLUE & GREEN reflectance values, each raised to an appropriate power, to ensure a high level of separation between the distributions of "built-up" and "river sand" pixels. Note that, for demonstration, this article describes the methodology with data from the Delhi study site only, but the observations are similar for the other 2 study sites (Patna & Rajamundry).
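The per-band ROC analysis above can be sketched as follows. This is a minimal sketch under stated assumptions: each single-band Naive Bayes classifier is taken to be a one-feature Gaussian model, each ROC curve is summarised by its area (AUC, computed with the equivalent Mann-Whitney statistic), and the function names and band labels are hypothetical. The paper's own analysis was run in R and may differ in detail.

```python
import numpy as np

def gaussian_nb_scores(x_train, y_train, x_test):
    """Log-posterior ratio log P(built-up | x) - log P(river sand | x)
    from a one-feature Gaussian Naive Bayes model (y: 1 = built-up, 0 = river sand)."""
    scores = np.zeros(len(x_test))
    for label, sign in ((1, 1.0), (0, -1.0)):
        xs = x_train[y_train == label]
        mu, var = xs.mean(), xs.var() + 1e-9   # small term avoids zero variance
        prior = len(xs) / len(x_train)
        loglik = -0.5 * np.log(2 * np.pi * var) - (x_test - mu) ** 2 / (2 * var)
        scores += sign * (loglik + np.log(prior))
    return scores

def roc_auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def band_auc_ranking(X_train, y_train, X_test, y_test, band_names):
    """Rank bands by the test-set AUC of their single-band Naive Bayes classifier."""
    aucs = {
        name: roc_auc(gaussian_nb_scores(X_train[:, j], y_train, X_test[:, j]), y_test)
        for j, name in enumerate(band_names)
    }
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)
```

With the six Landsat7 reflective bands as the columns of `X_train` and `X_test`, bands whose AUC is closest to 1 would be the most useful for separating the two classes.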
Reflecting its primary purpose, the proposed generic index has been named the "Built-Up & River Sand Separation Index" (BRSSI) and is defined in Equation 1:

$$\mathrm{BRSSI} = \mathrm{BLUE}^{\alpha} \times \mathrm{GREEN}^{\beta} \qquad (1)$$

where BLUE and GREEN denote the reflectance values of the respective Landsat7 bands, and α and β are real-valued parameters that are adjusted to maximize the separation between the distributions of pixels from the 2 classes.
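Equation 1 and the full factorial search over α and β can be sketched as below. The grid (−10 to 10 in steps of 0.5) follows the text, but the separation criterion is not specified in this section, so the Fisher-style score used here (squared difference of class means divided by the sum of class variances) is an assumption, as are the function names. Reflectances are assumed to be strictly positive so that negative exponents are defined, and the default α = β = 0.5 mirrors the index used later in the text.

```python
import itertools
import numpy as np

def brssi(blue, green, alpha=0.5, beta=0.5):
    """Equation 1: BRSSI = BLUE**alpha * GREEN**beta (reflectances assumed > 0)."""
    return np.power(blue, alpha) * np.power(green, beta)

def fisher_separation(a, b):
    """Hypothetical separation score: squared mean gap over summed variances."""
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

def grid_search(blue_bu, green_bu, blue_rs, green_rs, lo=-10.0, hi=10.0, step=0.5):
    """Full factorial search over (alpha, beta); returns (best score, alpha, beta).

    *_bu arrays hold built-up training pixels, *_rs arrays river sand pixels.
    """
    grid = np.round(np.arange(lo, hi + step / 2, step), 10)  # 41 levels from -10 to 10
    best = (-np.inf, None, None)
    for alpha, beta in itertools.product(grid, grid):
        if alpha == 0 and beta == 0:
            continue  # constant index: no separation possible
        score = fisher_separation(brssi(blue_bu, green_bu, alpha, beta),
                                  brssi(blue_rs, green_rs, alpha, beta))
        if np.isfinite(score) and score > best[0]:
            best = (score, alpha, beta)
    return best
```

A different criterion (for example, an overlap measure between the two class histograms) can be swapped in for `fisher_separation` without changing the search loop.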
Next, to select the parameter values, we have carried out a simulated full factorial designed experiment, varying both α and β from −10 to 10 in steps of 0.5; the selected combination, α = β = 0.5, gives $\mathrm{BRSSI} = \sqrt{\mathrm{BLUE} \times \mathrm{GREEN}}$. From Figure 4, we can observe that for the training set, the distribution of BRSSI for "built-up" pixels does not overlap with that for "river sand" pixels. As mentioned previously, all of these observations are similar for the other 2 study sites.

To separate "built-up" pixels from "river sand" ones in the testing set and in the entire satellite image of a study site, lower and upper thresholds have been computed with the bootstrap method [9] from the training "built-up" pixels of the same study area. A pixel $i$ is thus classified as "built-up" (rather than "river sand") if

$$L_{\mathrm{BRSSI}} \le \mathrm{BRSSI}(i) \le U_{\mathrm{BRSSI}}$$

where $L_{\mathrm{BRSSI}}$ and $U_{\mathrm{BRSSI}}$ are the lower and upper bootstrap thresholds for "built-up" pixels, and $\mathrm{BRSSI}(i)$ is the index value of pixel $i$.

To assess the performance of BRSSI, the "built-up" & "river sand" labels assigned to the testing set have been compared with the reference labels of the same set. From the confusion matrix [10], we have computed and reported Sensitivity (Recall), Specificity, Positive Predictive Value (PPV, or Precision), Negative Predictive Value (NPV) and Overall Accuracy. To balance Precision and Recall, we have also reported the F1 score,

$$F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

In addition, Cohen's kappa ($\kappa$) coefficient has been computed to quantify the agreement between the separation results and the ground truth beyond what would be expected by chance.
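The thresholding rule and the performance measures above can be sketched as follows. The bootstrap procedure of [9] is not detailed here, so this sketch assumes one plausible variant: resample the training "built-up" BRSSI values with replacement and average the 2.5th and 97.5th percentiles over the replicates to obtain $L_{\mathrm{BRSSI}}$ and $U_{\mathrm{BRSSI}}$. The percentiles, the number of replicates and all function names are assumptions. The metric formulas are the standard confusion-matrix definitions, with "built-up" as the positive class.

```python
import numpy as np

def bootstrap_thresholds(values, n_boot=2000, q=(2.5, 97.5), seed=0):
    """Assumed bootstrap: mean of the lower/upper percentiles over resamples."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    lows = np.percentile(samples, q[0], axis=1)
    highs = np.percentile(samples, q[1], axis=1)
    return lows.mean(), highs.mean()

def separate_built_up(index_values, lower, upper):
    """True where L_BRSSI <= BRSSI(i) <= U_BRSSI, i.e. pixel i is labelled built-up."""
    return (index_values >= lower) & (index_values <= upper)

def separation_metrics(pred, truth):
    """Confusion-matrix measures; pred/truth are boolean arrays (True = built-up).
    Assumes every denominator is non-zero."""
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)           # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                   # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Chance agreement from the row and column totals of the confusion matrix.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "ppv": ppv, "npv": npv, "overall_accuracy": accuracy,
        "f1": f1, "kappa": kappa,
    }
```

As a worked example with 500 reference pixels per class, 490 true positives, 10 false negatives, 14 false positives and 486 true negatives give an overall accuracy of 0.976, chance agreement $p_e = 0.5$ and therefore $\kappa = (0.976 - 0.5)/(1 - 0.5) = 0.952$.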
As SVM [11,12] is a widely used Machine Learning (ML) method for pixel-based land cover classification in remote sensing, we have compared the performance of the developed index BRSSI with that of SVM. All the performance measures discussed above have been reported for both methods across the 3 study sites. A Radial Basis Function (RBF) kernel has been used in the SVM, and its parameter sigma (σ), along with the cost parameter (C), has been tuned to optimize SVM performance. The R software package and associated libraries have been used for the statistical computations and for calculating the performance measures on the testing set for both methods (BRSSI & SVM).
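The SVM baseline can be sketched in Python with scikit-learn as a stand-in for the R workflow described above. If the R implementation follows kernlab's `rbfdot`, its kernel is $\exp(-\sigma \lVert x - x' \rVert^2)$, so σ plays the same role as scikit-learn's `gamma`. The tuning strategy (5-fold stratified cross-validated grid search on accuracy), the grid values and the feature standardisation step are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def tune_rbf_svm(X_train, y_train, seed=0):
    """Grid-search gamma (kernlab's sigma) and C for an RBF-kernel SVM.

    X_train: (n_pixels, n_bands) reflectances; y_train: 1 = built-up, 0 = river sand.
    Returns the fitted GridSearchCV object (refit on all training pixels).
    """
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    param_grid = {
        "svc__gamma": np.logspace(-3, 1, 5),  # hypothetical sigma grid
        "svc__C": np.logspace(-1, 3, 5),      # hypothetical cost grid
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    return search
```

The tuned model's `predict` output on the testing pixels, converted to a boolean "built-up" mask, can then be scored with the same confusion-matrix measures used for BRSSI, which keeps the comparison on identical reference data.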

RESULTS & DISCUSSIONS
Table 1 shows that for all study regions, both the overall accuracy and the F1 score of the proposed index BRSSI exceed 0.95, indicating a high level of separation between the "built-up" & "river sand" classes. Although the classification performance of SVM is marginally higher than that of BRSSI, BRSSI is fast to implement and computationally less expensive than SVM, whose parameters must be tuned to achieve optimal performance. Visual inspection of the results of applying BRSSI alongside existing supervised classification methods for extracting "built-up" pixels also indicates a significant reduction in misclassification across all 3 selected study regions.