IEEE International Conference On Recent Trends In Electronics Information Communication Technology, May 19-20, 2017, India
Dimension Reduction of Hyperion Data for Improving Classification Performance - an Assessment

Naiwrita Borah, Department of Information Technology, Sikkim Manipal Institute of Technology, Majitar, East Sikkim-737136, India, [email protected]
Dibyajyoti Chutia, North Eastern Space Applications Centre, Dept. of Space, Govt. of India, Shillong, Meghalaya-793103, India, [email protected]
Diganta Baruah, Department of Information Technology, Sikkim Manipal Institute of Technology, Majitar, East Sikkim-737136, India, [email protected]
P. L. N. Raju, North Eastern Space Applications Centre, Dept. of Space, Govt. of India, Shillong, Meghalaya-793103, India, [email protected]
Abstract—Hyperspectral remote sensing data contain a large number of spectral bands along with a large number of features. For classification purposes, it therefore becomes imperative to reduce both the number of spectral bands and the number of features in order to achieve higher performance in terms of accuracy and computational complexity. Principal component analysis (PCA) and minimum noise fraction (MNF) were used for dimension reduction of the spectral bands, while the correlation based feature selection (CFS) technique was employed for selection of features. A parametric classifier, the maximum likelihood classifier (MLC), advanced non-parametric classifiers, the support vector machine (SVM) and multilayer perceptron (MLP), and the ensemble random forest (RF) classifier were investigated on two Hyperion test datasets of different land cover characteristics. It was observed that the classifiers behaved differently with the dimension reduced datasets produced at either the spectral band or the feature level. Most of the classifiers achieved higher performance with the dimension reduced dataset produced by MNF as compared to PCA. On the other hand, the dataset with selected features could not give noticeably better classification performance than the datasets with the entire feature set. However, the computational complexity of the classifiers was greatly reduced when they were trained with selected features.

Keywords—Hyperspectral, PC composite images, selected features, ensemble classifier, MNF

I. INTRODUCTION
The availability of a large number of contiguous spectral bands with narrow bandwidth in hyperspectral data has provided significant potential for material recognition [1]. Classification of hyperspectral satellite data is quite challenging as compared to multispectral data because of the large number of spectral bands [2]. On the other hand, the huge dimensionality of the spectral bands often gives rise to the 'Hughes phenomenon' [3], where the sample size essential for training a classifier increases exponentially with the number of spectral bands. A major drawback in hyperspectral image processing is the large number of spectral bands that must be handled when conventional processing approaches are used [4]. In addition, remotely sensed data are associated with a large number of features characterized by spectral, texture and spatial morphological properties. However, not all of these features are relevant for defining a training dataset. Selection of an optimal set of features is crucial in order to achieve higher predictive accuracy with minimal computational complexity. A number of dimension reduction techniques have been proposed since the beginning of hyperspectral remote sensing; minimum noise fraction (MNF; Green et al. 1988 [5]), principal component analysis (PCA [6]) and independent component analysis (ICA; Hyvärinen [7]) are a few of them. PCA is a well recognized dimensionality reduction technique which has the ability to reduce the data to fewer independent bands carrying non-correlated information that are commonly more interpretable than the source data. MNF is a second major algorithm belonging to this family of techniques and is used effectively in hyperspectral data processing. ICA transforms a set of mixed, random signals into mutually independent components; its advantage over PCA is that the IC transformation can differentiate features of interest even if they occupy only a small portion of the pixels in the image. Both PCA and MNF have been widely used in hyperspectral remote sensing. Chutia et al. [2] have proposed PCA dimension reduction of Hyperion data where the dimension reduced dataset is composed of the first principal component of the bands falling in the respective spectral regions. PCA has also been considered as a pre-processing technique for improving the performance of classifiers [2, 8]. An investigation by Luo et al. [9] demonstrates that MNF-based methods attain higher signal-to-noise ratios than PCA-based methods for signal-dependent noise, whereas PCA-based methods produce higher signal-to-noise ratios than MNF-based methods for Gaussian white noise. In addition, MNF has been used effectively for feature extraction and improvement of classification accuracy in a number of applications [10]. With a large number of features present in the dataset, feature selection becomes imperative, because a reduced set of features not only improves the learning ability of the supervised learning algorithm but also reduces the computational cost incurred. Among the different approaches available, wrappers [11] and filters [12] are the most prevalent. While a wrapper utilizes an intended learning algorithm to assess the relevance of any particular feature, filters do not use any learning algorithm to identify relevant features. In this work, the filter-based CFS technique put forward by Hall [13] is used for feature selection. The main objectives of this work are:
1. To assess the quality of dimension reduced (DR) datasets produced by the PCA and MNF techniques in terms of classification performance.
2. To investigate the performance of classifiers using training datasets with the entire set of features (EF) and training datasets with a selected set of features (SF).
II. DATASETS USED

Here, Hyperion sensor data of NASA's Earth Observing-1 (EO-1) satellite, the first space-borne hyperspectral instrument to acquire both visible/near-infrared (400-1000 nm) and shortwave infrared (900-2500 nm) spectral data, has been used. It has a total of 220 potential bands with a spatial resolution of 30 m and a swath width of 7.6 km. Spectral characteristics of the bands of the sample data are given in Table 1.

TABLE 1: SPECTRAL CHARACTERISTICS OF BANDS OF SAMPLE DATA
Sl. No.  Bands             Wavelength range (nm)   Spectral region
1        8-15              426.81-498.04           Blue (B)
2        16-25             508.22-599.80           Green (G)
3        26-34             609.97-691.37           Red (R)
4        35-57 & 77-184    701.55-1991.96          Near-Infrared (NIR)
5        185-220           2002.06-2355.21         Mid-Infrared (MIR)

The study was carried out using two datasets of different land use information for both PCA and MNF. Here, a PCA dataset means the dimension reduced data produced by PCA and, similarly, an MNF dataset means the dimension reduced data produced by the MNF technique. The two datasets cover the Brahmaputra river basin and the region surrounding its tributaries in Assam, India.

A. Dataset-1

It is an extraction of the region situated along the bank of the river Brahmaputra, Assam, India. Being a river bed area, there is an abundance of agricultural crops. Apart from tree clad and forest areas, this region also contains a lot of grasslands that serve as grazing lands, as well as a number of wetlands. Seven land cover classes were considered in the experiment, for which training and test samples were collected independently (see Table 2).

TABLE 2: INFORMATION ON TRAINING-TEST SAMPLES FOR THE DATASET-1
Class name                                  Train size   Test size
Agricultural areas mixed with settlements      138          144
Agricultural double crops                       70          104
Tree clad areas                                159          142
Grasslands                                     124           80
Sandy areas                                     68           60
Wetlands areas                                  73           48
Water Bodies                                   175          197
Total                                          807          775

B. Dataset-2

It covers agricultural areas and grasslands and some scattered fallow lands. A total of six land cover classes are taken in this dataset (see Table 3).

TABLE 3: INFORMATION ON TRAINING-TEST SAMPLES FOR THE DATASET-2
Class name                                  Train size   Test size
Agricultural areas mixed with settlements      566          504
Agricultural double crops                      122          136
Tree clad areas                                122          165
Grasslands                                      64           68
Fallow lands                                   139          100
Water Bodies                                    99          112
Total                                         1112         1085

III. DIMENSION REDUCTION USING PCA AND MNF

Here, both PCA and MNF were applied independently on the bands of each spectral region of the Hyperion sensor. The theoretical background of PCA and MNF is provided in the following subsections.

A. PCA

Principal Components Analysis (PCA), often referred to as a PC rotation, is a linear transformation of a multivariate dataset into a new coordinate system. Let X be the pixel matrix of an image formed by p spectral bands with n pixels in each band, and let X_adj be the mean-adjusted pixel matrix of X, where the mean value of the respective band is subtracted from each of its pixel values. The PCA on X_adj can be defined as follows [15]:

    Y = X_adj E_m                                                   (1)

where Y is the pixel matrix with m principal components and E_m is the matrix formed by the m selected eigenvectors of the variance-covariance matrix of X_adj. The eigenvector with the highest eigenvalue defines the first PC of the new image, which is formed with m (m < p) components.

B. MNF

MNF (Green et al. [5]) is a linear transformation that orders the output components in terms of image quality, i.e. decreasing signal-to-noise ratio, rather than decreasing variance, and it was applied here to the bands of each spectral region in the same manner as PCA.
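To make the region-wise reduction of Section III concrete, the following is a minimal sketch, not the authors' implementation, of applying PCA independently to the band groups of Table 1 and keeping the leading components of each group. The `cube` array, the `n_keep` parameter and the use of scikit-learn's PCA are assumptions for illustration only.

```python
# Minimal sketch: spectral-region-wise PCA reduction of a Hyperion cube.
# Assumes a NumPy array `cube` of shape (rows, cols, bands) is already loaded and
# that its band axis follows the 1-based Hyperion band numbering used in Table 1.
import numpy as np
from sklearn.decomposition import PCA

SPECTRAL_REGIONS = {                      # band groups from Table 1 (1-based, inclusive)
    "B":   [(8, 15)],
    "G":   [(16, 25)],
    "R":   [(26, 34)],
    "NIR": [(35, 57), (77, 184)],
    "MIR": [(185, 220)],
}

def reduce_by_region(cube, n_keep=1):
    """Apply PCA to each spectral region independently and keep n_keep PCs per region."""
    rows, cols, _ = cube.shape
    reduced = []
    for region, ranges in SPECTRAL_REGIONS.items():
        idx = np.concatenate([np.arange(lo - 1, hi) for lo, hi in ranges])
        X = cube[:, :, idx].reshape(-1, idx.size)          # pixels x bands of this region
        pcs = PCA(n_components=n_keep).fit_transform(X)    # mean-adjusts internally (cf. eq. 1)
        reduced.append(pcs.reshape(rows, cols, n_keep))
    return np.concatenate(reduced, axis=2)                  # (rows, cols, regions * n_keep)

# Example with random data standing in for a Hyperion scene:
# dr_cube = reduce_by_region(np.random.rand(100, 100, 220), n_keep=1)
```

An analogous loop with an MNF transform would yield the MNF datasets; scikit-learn provides no MNF implementation, so that step is omitted from the sketch.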
IV. SELECTION OF FEATURES USING CFS

CFS selects the optimal set of features based on the feature-feature inter-correlation and the feature-class correlation. CFS searches for a subset of features S with merit M_S > 0 such that

    M_S = (k * ρ_fC) / sqrt(k + k(k-1) * ρ_ff)                      (2)

where ρ is the Pearson's correlation coefficient, ρ_fC is the mean feature-class (C) correlation, ρ_ff is the mean feature-feature inter-correlation and k is the number of features in S. Further details can be found in Chutia et al. [15].
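As an illustration of Eq. (2), the sketch below computes the CFS merit of a candidate subset and runs a simple greedy forward search. This is a simplified stand-in for the best-first search used in Hall's CFS (the study itself relied on the WEKA environment), and the `merit` and `cfs_forward_select` names are hypothetical.

```python
# Simplified CFS-style feature selection driven by the merit of Eq. (2).
# Hall's original CFS uses a best-first subset search; a greedy forward search
# is used here only to keep the sketch short.
import numpy as np

def merit(X, y, subset):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff), Pearson correlations."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(X, y):
    """Greedily add the feature that most improves the merit; stop when there is no gain."""
    remaining, selected, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        m, j = max((merit(X, y, selected + [j]), j) for j in remaining)
        if m <= best:
            break
        best, selected = m, selected + [j]
        remaining.remove(j)
    return selected

# Example: select features from a random training matrix (stand-in for the SF step).
# sel = cfs_forward_select(np.random.rand(200, 30), np.random.randint(0, 6, 200))
```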
V. RESULTS AND DISCUSSION

The Kappa index (K) and the receiver operating characteristic (ROC) were used for evaluating performance in terms of classification accuracy, while precision (P) was used for determining the stability of the classifiers over the different datasets. K measures the agreement between the observed values and the values expected by chance. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at numerous threshold settings. P is the proportion of relevant instances among the retrieved instances.

C. Classifiers for investigation

The most widely used parametric classifier, MLC, advanced non-parametric classifiers, SVM and MLP, and the popular ensemble classifier RF were used for the investigation on an Intel(R) Xeon(R) CPU E31245 @ 3.30 GHz (4 core) system.
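A minimal sketch of how the classifier comparison and the K, ROC and P measures could be reproduced with scikit-learn is given below. The study itself was carried out with WEKA, so the estimators, their parameters and the `evaluate` helper are illustrative assumptions; MLC has no direct scikit-learn counterpart and is omitted (QuadraticDiscriminantAnalysis would be the closest analogue).

```python
# Minimal sketch: train RF, MLP, SVM on a dimension reduced dataset and report
# Kappa (K), ROC area and precision (P), in the spirit of Tables 4, 6 and 7.
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score, precision_score, roc_auc_score

def evaluate(clf, X_train, y_train, X_test, y_test):
    t0 = time.time()
    clf.fit(X_train, y_train)
    t = time.time() - t0                                   # training time t(sec)
    y_pred = clf.predict(X_test)
    k = cohen_kappa_score(y_test, y_pred)
    p = precision_score(y_test, y_pred, average="weighted")
    # one-vs-rest ROC area for the multi-class land cover labels
    roc = roc_auc_score(y_test, clf.predict_proba(X_test), multi_class="ovr")
    return {"K": round(k, 2), "ROC": round(roc, 2), "P": round(p, 2), "t(sec)": round(t, 2)}

classifiers = {
    "RF":  RandomForestClassifier(n_estimators=100),
    "MLP": MLPClassifier(max_iter=500),
    "SVM": SVC(probability=True),                          # probability=True enables ROC
}

# Example usage with arrays standing in for the Hyperion training/test samples:
# for name, clf in classifiers.items():
#     print(name, evaluate(clf, X_train, y_train, X_test, y_test))
```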
Fig 1. Image reconstructed from three (G, R, NIR) spectral regions using PCA and MNF
Fig 2. FCC image of PCA for a) Dataset-1, b) Dataset-2

Fig 3. FCC image of MNF for a) Dataset-1, b) Dataset-2

D. Comparative assessment using PCA and MNF

The main objective was to assess the quality of the PCA and MNF datasets in terms of classification performance. It was observed that RF outperformed all the other classifiers for all the PCA and MNF datasets. However, the performance of all the classifiers was comparable in most of the instances for both the MNF and PCA datasets, except that the performance of MLC on the PCA dataset of Dataset-1 was found to be very poor (see Table 4).
TABLE 4: COMPARATIVE ASSESSMENT WITH OTHER CLASSIFIERS

                          PCA                    MNF
Dataset     Classifier    K     ROC   P          K     ROC   P
Dataset-1   RF            0.85  0.99  0.88       0.91  1     0.95
            MLP           0.81  0.98  0.85       0.80  0.97  0.85
            SVM           0.78  0.95  0.82       0.83  0.95  0.87
            MLC           0.76  0.92  0.72       0.71  0.93  0.8
Dataset-2   RF            0.84  0.96  0.85       0.94  0.99  0.96
            MLP           0.84  0.96  0.91       0.90  1     0.93
            SVM           0.72  0.96  0.90       0.75  0.92  0.81
            MLC           0.51  0.81  0.62       0.79  0.95  0.85
It was observed that the performance of the RF classifier improved greatly when it was investigated with the MNF datasets. The performance of MLP was also found satisfactory, followed by SVM. Fig. 4 and Fig. 5 give the classified outputs of the PCA and MNF datasets.
E. Performance analysis on EF and SF data

An attempt was made to investigate the performance of the classifiers using training datasets with a selected set of features (SF). There are a total of 144 and 159 features associated with Dataset-1 and Dataset-2 respectively, for both PCA and MNF (see Table 5). After CFS, it was observed that the number of optimal features is 20 for Dataset-1 and 12 for Dataset-2 in the case of the PCA datasets. Similarly, in the case of the MNF datasets, the SF of Dataset-1 has 11 features and that of Dataset-2 has 20. For both datasets, the number of features in SF is thus significantly reduced (see Table 5). It was observed that most of the classification results improved with SF in the case of the PCA datasets. For example, the performance of RF, MLP and MLC for Dataset-1 and of SVM and MLC for Dataset-2 was found to be enhanced as compared to their results with EF (see Table 6). However, the performance of SVM for Dataset-1 and of RF and MLP for Dataset-2 did not improve with SF (see Table 6). In the case of the MNF data, on the other hand, most of the classification results were found better with EF than with SF. For example, RF and SVM for Dataset-1 and RF, MLP and SVM for Dataset-2 have shown better results with EF, whereas MLP for Dataset-1 and MLC for both datasets have performed better with SF (see Table 7). Thus, some of the classifiers have shown better results with EF while others have performed satisfactorily with SF, depending on the dataset, i.e. PCA or MNF. The advanced classifiers RF, SVM and MLP could not produce consistently higher performance with either SF or EF. The classification performance of MLC was enhanced with SF in all instances. However, classification with SF has greatly reduced the computational cost as compared to classification using EF; MLP in particular benefitted greatly, as it takes the longest time to train.
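A hedged sketch of the EF versus SF comparison described above follows; it reuses the hypothetical `cfs_forward_select` and `evaluate` helpers from the earlier sketches rather than the authors' WEKA workflow, so all names and parameters are assumptions.

```python
# Sketch: compare a classifier trained on the entire feature set (EF) against the
# same classifier trained on the CFS-selected subset (SF), reporting K, ROC, P and
# training time as in Tables 6 and 7. Requires the `evaluate` and
# `cfs_forward_select` helpers defined in the earlier sketches.
from sklearn.ensemble import RandomForestClassifier

def compare_ef_sf(X_train, y_train, X_test, y_test):
    clf = RandomForestClassifier(n_estimators=100)

    # EF: all features of the PCA or MNF training dataset
    ef_scores = evaluate(clf, X_train, y_train, X_test, y_test)

    # SF: CFS-selected subset of features
    sel = cfs_forward_select(X_train, y_train)
    sf_scores = evaluate(clf, X_train[:, sel], y_train, X_test[:, sel], y_test)

    return {"EF": ef_scores, "SF": sf_scores,
            "n_features": {"EF": X_train.shape[1], "SF": len(sel)}}

# Example (arrays standing in for one of the PCA or MNF training datasets):
# print(compare_ef_sf(X_train, y_train, X_test, y_test))
```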
TABLE 5: FEATURES IN THE DATASETS BEFORE AND AFTER PERFORMING SF

                 PCA              MNF
Datasets      EF     SF        EF     SF
Dataset-1     144    20        144    11
Dataset-2     159    12        159    20

TABLE 6: COMPARATIVE ASSESSMENT OF EF AND SF DATA IN CASE OF PCA

                              EF                               SF
                   RF      MLP      SVM     MLC      RF      MLP     SVM     MLC
Dataset-1  K       0.85    0.81     0.78    0.76     0.91    0.82    0.77    0.8
           t(sec)  0.64    85.51    0.38    0.05     0.2     3.95    0.06    0
           P       0.88    0.85     0.82    0.72     0.93    0.86    0.81    0.84
Dataset-2  K       0.84    0.84     0.72    0.51     0.76    0.81    0.73    0.61
           t(sec)  0.28    125.64   0.15    0.07     0.17    2.29    0.05    0
           P       0.85    0.91     0.90    0.62     0.86    0.88    0.79    0.70

TABLE 7: COMPARATIVE ASSESSMENT OF EF AND SF DATA IN CASE OF MNF

                              EF                               SF
                   RF      MLP      SVM     MLC      RF      MLP     SVM     MLC
Dataset-1  K       0.9     0.8      0.83    0.71     0.86    0.81    0.69    0.79
           t(sec)  0.3     6.99     0.17    0.01     0.2     3.99    0.17    0
           P       0.94    0.85     0.87    0.8      0.89    0.87    0.76    0.85
Dataset-2  K       0.93    0.9      0.74    0.78     0.71    0.87    0.71    0.83
           t(sec)  1.12    280.75   2.4     0.15     0.19    2.33    0.06    0
           P       0.96    0.93     0.8     0.85     0.81    0.92    0.81    0.88

Fig 4: Classified results using RF on the PCA based dimension reduced datasets: a) classified image of Dataset-1, b) classified image of Dataset-2

Fig 5: Classified results using RF on the MNF based dimension reduced datasets: a) classified image of Dataset-1, b) classified image of Dataset-2

VI. CONCLUSION

Dimension reduction of Hyperion datasets in both the spectral and the feature space is reported here to assess the classification performance of several popular classifiers. The results obtained during the investigation were found quite encouraging; however, further efforts are required to enhance the PCA and MNF processing techniques. It was observed that the classification results were higher on the MNF datasets as compared to the PCA datasets. Dimension reduction in the feature space, i.e. training datasets with selected features, has reduced the computational complexity; however, the accuracy of the classifiers did not improve significantly over classification with the entire feature set. Feature selection was done with CFS; many other techniques could also be applied. A lot of research has previously been done on dimensionality reduction of the spectral bands and on reduction of the feature set. However, this is the first attempt at first reducing the spectral bands from the collection of contiguous bands and then reducing the features to obtain the optimal set of features.
ACKNOWLEDGMENT
The authors would like to thank the North Eastern Space Applications Centre, Department of Space, Government of India, Umiam, Meghalaya, India for providing the necessary guidance and support during the entire duration of the study. The authors also acknowledge the concerned authorities of the WEKA and ImageJ software for their important role in carrying out the investigation.
REFERENCES

[1] Hsu, P.H. (2007). Feature extraction of hyperspectral images using Wavelet and matching pursuit. ISPRS Journal of Photogrammetry and Remote Sensing, 62(2), 78-92.
[2] Chutia, Dibyajyoti, Bhattacharyya, DK, Kalita, R and Sudhakar, S (2014). OBCsvmFS: Object-Based Classification supported by Support Vector Machine Feature Selection approach for hyperspectral data. Geomatics, 8(1), 12-19.
[3] Hsu, P.H. (2007). Feature extraction of hyperspectral images using wavelet and matching pursuit. ISPRS Journal of Photogrammetry and Remote Sensing, 62(2), 78-92.
[4] Kant, Y., Bharath, B.D., Mallick, J., Atzberger, C. and Kerle, N. (2009). Satellite-based analysis of the role of land use/land cover and vegetation density on surface temperature regime of Delhi, India. Journal of the Indian Society of Remote Sensing, 37(2), 201-214.
[5] Green, A.A., Berman, M., Switzer, P. and Craig, M.D. (1988). A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26(1), 65-74.
[6] Anton, Howard (2010). Elementary Linear Algebra. John Wiley & Sons.
[7] Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4), 411-430.
[8] Panigrahi, N. and Prashnani, M. (2015). Impact Evaluation of Feature Reduction Techniques on Classification of Hyper Spectral Imagery. Journal of the Indian Society of Remote Sensing, 43(1), 1-10.
[9] Luo, Guangchun, Chen, Guangyi, Tian, Ling, Qin, Ke and Qian, Shen-En (2015). Minimum Noise Fraction versus Principal Component Analysis as a Preprocessing Step for Hyperspectral Imagery Denoising. Canadian Journal of Remote Sensing, 42(2), 106-116.
[10] Dian, Y., Pang, Y., Dong, Y. and Li, Z. (2016). Urban Tree Species Mapping Using Airborne LiDAR and Hyperspectral Data. Journal of the Indian Society of Remote Sensing, 1-9.
[11] Kohavi, R. (1995). Wrappers for Performance Enhancement and Oblivious Decision Graphs. PhD Thesis, Stanford University.
[12] Kohavi, R. and John, G. (1996). Wrappers for feature subset selection. Artificial Intelligence, special issue on relevance, 97(1-2), 273-324.
[13] Hall, M.A. (1999). Correlation-based Feature Subset Selection for Machine Learning. PhD thesis, University of Waikato, Hamilton, New Zealand.
[14] Chutia, Dibyajyoti, Bhattacharyya, DK, Kalita, R and Sudhakar, S (2014). A model on achieving higher performance in the classification of hyperspectral satellite data: A case study on Hyperion data. Applied Geomatics (Springer), 6(3), 181-195.
[15] Chutia, Dibyajyoti, Bhattacharyya, DK, Sarma, Jaganath and Raju, PLN (2017). An effective Ensemble Classification Framework using Random Forests and Correlation Based Feature Selection Technique. Transactions in GIS (In Press).