
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 6, NO. 2, APRIL 2013

Non-Uniform Random Feature Selection and Kernel Density Scoring With SVM Based Ensemble Classification for Hyperspectral Image Analysis

Sathishkumar Samiappan, Saurabh Prasad, Member, IEEE, and Lori M. Bruce, Senior Member, IEEE

Abstract—Traditional statistical classification approaches often fail to yield adequate results with hyperspectral imagery (HSI) because of the high-dimensional nature of the data, multimodal class distributions, and the limited ground truth samples available for training. Over the last decade, Support Vector Machines (SVMs) and Multi-Classifier Systems (MCS) have become popular tools for HSI analysis, and Random Feature Selection (RFS) for MCS is a popular approach for producing higher classification accuracies. In this study, we present a Non-Uniform Random Feature Selection (NU-RFS) within an MCS framework using SVM as the base classifier, and we propose a method to fuse the outputs of the individual classifiers using scores derived from kernel density estimation. This study demonstrates the improvement in classification accuracies by comparing the proposed approach to conventional analysis algorithms and by assessing its sensitivity to the number of training samples. These results are compared with those of uniform RFS and regular SVM classifiers. We demonstrate the superiority of the NU-RFS based system with respect to overall accuracy, user accuracies, producer accuracies, and sensitivity to the number of training samples.

Index Terms—Ground cover classification, hyperspectral imagery (HSI), multi-classifier systems (MCSs), random feature selection (RFS), support vector machines (SVMs).

Manuscript received May 15, 2012; revised August 17, 2012 and December 11, 2012; accepted December 13, 2012. Date of publication January 17, 2013; date of current version May 13, 2013.
S. Samiappan is with the Geosystems Research Institute, Mississippi State University, Starkville, MS 39759 USA (e-mail: [email protected]).
S. Prasad is with the Electrical and Computer Engineering Department, University of Houston, Houston, TX 77004 USA (corresponding author; e-mail: [email protected]).
L. M. Bruce is with the Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, MS 39762-9571 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSTARS.2013.2237757

I. INTRODUCTION

Ground cover classification is a challenging and important problem in remote sensing applications. Hyperspectral imagery (HSI) provides a detailed description of ground-cover materials ranging from the visible to the infrared regions of the electromagnetic spectrum. Such a wide spectral range of information has the potential to yield higher classification accuracies. The key to the design of a powerful classification system lies in extracting pertinent features from the high-dimensional data and employing classifiers to exploit those features. Maximum Likelihood (ML), a traditional supervised pattern classification approach, often fails to classify HSI data accurately because of (a) the high dimensionality of the features, (b)

multimodal distributions, and (c) limited ground truth availability. To address the problem of high dimensionality, there are several existing approaches based on dimensionality reduction and feature selection. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Stepwise-LDA (S-LDA) are popular dimensionality reduction techniques [33]. Feature selection can also be performed using metrics such as the Bhattacharyya Distance (BD), the Jeffries-Matusita (JM) distance, entropy, etc. The Gaussian ML classifier assumes that the classes are Gaussian distributed, which is a limitation for the majority of practical HSI datasets. Algorithms based on Gaussian mixture models [1] have been proposed in the past to accommodate multimodal distributions. An alternative approach that has recently become more popular with HSI data is the use of Support Vector Machines (SVMs) [4]. Finally, there are techniques to address limited ground-truth availability, such as sample interpolation and adaptive classifiers [2], [3].

In recent work, to improve on the performance of conventional single classifiers, Multi-Classifier Systems (MCS) have been developed [5]–[8]. MCS are often referred to as ensemble classifier systems, and they can perform better than single classifiers when diversity is established among the classifiers. This diversity can be established in different ways [9]–[11]. Prasad et al. [6], [7] demonstrated that an MCS with ML base classifiers can improve performance compared to single classifiers, and that such a system could be further improved by incorporating nonlinear SVM classifiers. Recently, an MCS based on Random Feature Selection (RFS) [12] proposed by Waske et al. and a dynamic subspace approach [13] proposed by Min et al. were shown to perform well with HSI data. Techniques such as random forests [14] and RFS perform well because they create diversity among the classifiers by re-sampling the spectral bands at the inputs of the classifiers. As proposed in [10], diversity can also be created in other ways. In [15], Breiman demonstrated diversity creation by re-sampling; related strategies such as bagging [16] and boosting [17] have also proven effective. The feature selection in [12] is a uniform random feature selection (RFS). In our recent work [18], we explored the possibility of using a non-uniform RFS (NU-RFS) based MCS with SVMs. We found that a diverse classifier ensemble need not always come from a uniform RFS as proposed in [12], [13]; in [18] we demonstrated that NU-RFS can provide better performance than uniform RFS. In this study, as an extension, we present a fully automated MCS with NU-RFS using SVMs. It is assumed that a diverse set of features leads to higher classification accuracies.



Although diversity can be defined in many ways [10], for the purposes of this study a diverse set of spectral bands is defined as follows:
1) Bands are selected from multiple spectral regions across the entire spectrum of the signature.
2) Cross-correlation between the selected bands is minimized.

The approach proposed in this paper combines the following methods to create diversity within a pool of classifiers and to ensure that the strengths and weaknesses of the individual classifiers are incorporated into the final decision making: a) re-sampling features in the data through RFS; b) manipulation of input features through NU-RFS; and c) manipulation of output classes through scores computed from kernel density estimation. The approach uses spectral band grouping [28] to perform NU-RFS and a decision fusion strategy based on kernel density scores. To verify the effectiveness of this approach, we performed experiments comparing the overall accuracies of SVM, RFS, NU-RFS, SVM with kernel density fusion, and NU-RFS with kernel density fusion. The sensitivity of these approaches to the number of training samples is also studied in this work.

This paper is organized as follows. Section II provides a review of SVMs, MCS, RFS for SVMs, and possible extensions. Section III describes the proposed kernel density based scoring system for fusion in an MCS, the proposed classification system based on NU-RFS, and band grouping. Section IV discusses the experimental setup and provides results. Finally, Section V summarizes the observations and provides concluding remarks.

II. BACKGROUND

A. Support Vector Machines

The effectiveness of SVMs for HSI data has been shown in [4], and they have gained popularity over the last decade. They often provide high classification accuracies compared to other non-parametric and statistical approaches, and they are particularly useful for classifying heterogeneous classes with a limited number of training samples. A detailed tutorial on SVMs can be found in [19]. SVMs are intrinsically binary classifiers; however, multi-class SVM classifiers can be constructed using binary SVMs as basic blocks. One-vs-all and hierarchical tree based approaches are popular techniques for constructing multi-class SVM classifiers; more detailed explanations can be found in [20], [21]. In this paper a non-linear SVM with a Gaussian Radial Basis Function (RBF) kernel is used. An RBF kernel has two parameters, the penalty term $C$ and the kernel width $\gamma$, which we estimate using cross-validation and a grid search, as sketched below.
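The following is a minimal sketch of this model selection step using scikit-learn; the training arrays X and y and the grid values are illustrative assumptions, not the exact settings used in the paper.

# Hedged sketch: cross-validated grid search for the RBF-SVM
# parameters C and gamma. Grid values are illustrative placeholders.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def select_rbf_svm(X, y):
    """Tune an RBF-kernel SVM by 5-fold cross-validated grid search."""
    param_grid = {
        "C": [1, 10, 100, 1000],           # penalty term
        "gamma": [1e-3, 1e-2, 1e-1, 1.0],  # RBF kernel width
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    return search.best_estimator_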


B. Multi-Classifier System

The concept of combining the predictions of multiple classifiers to produce a single classifier has been proposed by various researchers in the past [22], [23]; in the literature, this concept is referred to as ensemble classification or MCS. The resulting MCS is generally more accurate than the individual classifiers that form it. An effective MCS is one where the individual classifiers are accurate and make their classification errors on different parts of the input space. Combining the predictions of identical classifiers yields no improvement, so an ensemble is useful only when there is disagreement among the individual classifiers. In [24], Krogh et al. proved that the overall classification error can be divided into a quantity equal to the average generalization error of the individual classifiers and a quantity proportional to the disagreement among them. From [25], [26] it can be concluded that an ideal MCS should consist of classifiers with the highest disagreement possible. Bagging [15] and boosting [17] are very popular methods for creating diversity among classifiers. Bryll et al. [16] introduced an improved approach called attribute bagging, which was followed by the development of many wrapper based MCS approaches: each classifier is trained with an independently, randomly selected feature subset, and the outputs, expected to be diverse, are combined to form a final decision. Breiman [27] introduced a decision tree (DT) based classification approach with Random Forests (RF), Min [13] proposed a dynamic subspace approach, and Waske proposed the construction of SVM ensembles using RFS [12]; these approaches, inspired by the basic ideas of bagging and boosting, have been successfully used in hyperspectral applications. In [36], Jacobs proposed a mixture-of-experts approach, followed by [37]. In [38], S. Kumar et al. demonstrated the effectiveness of this technique on hyperspectral data with binary classifiers for a multiclass problem; in their work, partitioning of groups of classes is achieved by binary classifiers at different levels.

III. PROPOSED APPROACH

In this paper, we propose an SVM-based MCS in which unequal numbers of features are selected from different spectral regions, resulting in a non-uniform random feature selection.

A. Preliminaries

The hyperspectral dataset is assumed to have $K$ classes, each represented as $\omega_k$, and $N_k$ is the number of samples in $\omega_k$. Samples in $\omega_k$ are denoted as $x_{k,i}$, where $x_{k,i}$ is the $i$-th sample of class $\omega_k$. Samples from different classes can be grouped together to form a super class, represented as $\Omega$. A feature vector is $n$-dimensional, and each feature is represented by $f_j$, i.e., $f = (f_1, \ldots, f_n)$. We define normalized distances between sets of samples with respect to a feature: $d(\omega_i, \omega_j)$, defined in (3), is the distance between two classes $\omega_i$ and $\omega_j$, and $D(\omega_i)$, defined in (4), is the distance between a class and the super class of all remaining classes.

B. Proposed Non-Uniform RFS Strategy

In an RFS based multi-classifier system [12], a subset of features is selected by random sampling from the complete feature set, with the selected indices following a uniform distribution. Fig. 1(a) illustrates two examples of equally likely uniformly distributed spectral band selections, where d1 contains highly correlated bands compared to d2. An obvious way to avoid this situation, as shown in Fig. 1(b), is to divide the spectrum uniformly into smaller regions, perform feature selection within each region, and concatenate the selected features. The outcome of this approach depends on the choice of the number of partitions and the partition boundaries.
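For concreteness, the uniform RFS draw of [12] (the situation of Fig. 1(a)) can be sketched as follows; the band count, subset size, and ensemble size are illustrative stand-ins, and nothing in this draw prevents adjacent, highly correlated bands from landing in the same subset.

# Hedged sketch of uniform random feature (band) selection for an
# ensemble, as in [12]: every classifier draws its band subset
# uniformly at random over the whole spectrum.
import numpy as np

def uniform_rfs(n_bands, n_selected, n_classifiers, seed=0):
    """Return a list of band-index arrays, one per ensemble member."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_bands, size=n_selected, replace=False)
            for _ in range(n_classifiers)]

# example draw (220 bands as in Indian Pines; subset size illustrative)
subsets = uniform_rfs(n_bands=220, n_selected=20, n_classifiers=8)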



Fig. 2. NU-RFS feature selection from original data.

Fig. 1. (a) Two examples of equally likely uniformly distributed spectral band feature selection, where d1 has highly correlated bands compared to d2. (b) Example of uniform partitioning of spectral bands (shown in blue). (c) Non-uniform partitioning of spectral bands with uniformly distributed feature selection per partition.

Features can still be correlated with this approach. As one progresses along the spectrum, adjacent bands in a hyperspectral signature are typically highly correlated, and the rate of change in the correlation of neighboring bands varies. An intelligent way of partitioning the spectrum is therefore to place a partition at a point in the feature set where the correlation of neighboring bands changes drastically. This results in a non-uniform partitioning of the spectral bands, and bands selected from these non-uniform regions are expected to be less correlated, as shown in Fig. 1(c). To obtain an appropriate set of partition boundaries, we use a band grouping strategy. In [28], the authors proposed an intelligent spectral partitioning technique that groups highly correlated bands into distinct contiguous subspaces and then used those partitions with an MCS. This intelligent (non-random) band grouping partitions the spectrum into contiguous subsets; a minimal sketch of the correlation-based boundary placement is given below.
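This sketch assumes the image pixels are in an array X of shape (samples, bands); the fixed threshold is an illustrative stand-in for the band-grouping criterion of [28], not the metric actually used there.

# Hedged sketch: start a new band group wherever the correlation
# between neighboring bands drops sharply.
import numpy as np

def partition_by_correlation(X, drop_threshold=0.9):
    """X: (n_samples, n_bands). Return start indices of the groups."""
    boundaries = [0]  # the first group starts at band 0
    for b in range(1, X.shape[1]):
        # correlation between band b-1 and band b across all samples
        r = np.corrcoef(X[:, b - 1], X[:, b])[0, 1]
        if r < drop_threshold:  # drastic change: place a partition here
            boundaries.append(b)
    return boundaries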

In this approach, diversity among the classifiers is gained by breaking the spectrum up into smaller groups. The region boundaries are automatically selected based on a bottom-up band-grouping strategy: starting with the first band, each successive band is added to the current group; if an addition does not improve the performance metric employed, the growth of that group is stopped and a new group is started, resulting in a contiguous partitioning of the spectrum. The metric employed for band grouping in this work is the product of the Bhattacharyya distance and the correlation. For more details about band grouping, the reader may refer to [28], [30]. In the proposed approach, the feature space is divided by band grouping into distinct but contiguous regions such that, within each region, class separation is maximized and statistical dependence is minimized. Let $r_i$ be the size of region $i$, $n$ the total number of features in the data, and $m_i$ the number of features selected from region $i$, which is directly proportional to $r_i$. Then the total number of features selected for each classifier is

\[ m = \sum_{i} m_i, \qquad m_i \propto r_i. \qquad (1) \]

Fig. 2 illustrates this setup. Since uniform RFS is performed within each region separately, this approach can be thought of as a piece-wise uniform RFS. Since HSI data exhibit high correlation between consecutive bands, there is a good chance of consecutive bands being assigned to the same classifier when using uniform RFS in an MCS. These highly correlated bands would clearly affect the diversity of the ensemble and thus reduce the robustness of the MCS approach. In the proposed NU-RFS, however, features for the individual classifiers in the resulting MCS are drawn in a non-uniform fashion, creating greater diversity among the classifiers than a uniform random selection. Experimentally, we observed that this approach can result in better ensembles whose features are less correlated, owing to the fact that the probability of spectrally close features being assigned to the same learner is very low. This procedure is applied to each classifier in the MCS separately, unlike [12]; a sketch of the per-classifier draw is given below.
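This sketch follows (1), assuming group boundaries from the band-grouping step; the boundary values and feature budget are illustrative.

# Hedged sketch of NU-RFS per eq. (1): each group contributes a share
# of bands proportional to its size; the draws are then concatenated.
import numpy as np

def nu_rfs(boundaries, n_bands, n_total, rng):
    """boundaries: start indices of the groups; n_total: feature budget."""
    edges = list(boundaries) + [n_bands]
    picks = []
    for start, stop in zip(edges[:-1], edges[1:]):
        size = stop - start
        m_i = max(1, int(n_total * size / n_bands))  # m_i proportional to r_i
        picks.append(rng.choice(np.arange(start, stop),
                                size=min(m_i, size), replace=False))
    return np.concatenate(picks)

rng = np.random.default_rng(0)
# one independent draw per classifier in the ensemble (values illustrative)
ensemble_subsets = [nu_rfs([0, 35, 90, 150], n_bands=220, n_total=20, rng=rng)
                    for _ in range(8)]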



Fig. 3. Proposed NU-RFS based Multi-Classifier System.

With the initial experiments presented in [18], we found that the size of the regions plays an important role in the overall classification performance. The size of the selected feature subsets follows the recommendation in [29]. The proposed MCS is shown in Fig. 3.

The random subspace selection demonstrated in [12], [13], [29] for MCS aims to create diversity among classifiers, and the aforementioned techniques construct ensembles using bagging and boosting variants. We believe the optimal subset for creating maximum diversity need not come from a uniform RFS, because there is a very good chance that nearby spectral bands are grouped into the same classifier. This is clearly the case of classifiers having correlated features, and this similar grouping of features can affect diversity by forcing the classifiers to commit similar errors. With the proposed approach described in Fig. 3, we attempt to alleviate this issue. The proposed approach is compared against the regular SVM, RFS, and NU-RFS using band grouping only.

C. Uniform Voting and Kernel Density Decision Fusion

NU-RFS produces a group of features on which each classifier is trained. Each of these feature sets has a unique class separation capability, since each forms a different combination of the original feature set. In order to make use of this uniqueness in our system, we estimate, for each classifier, a set of scores proportional to the ability of that classifier to separate each class from all the other classes. For example, if there are $L$ classifiers and the data has $K$ classes, we generate a score matrix of size $L \times K$. These scores are computed by kernel density estimation across all the features. Oh et al. proposed an approach to estimate class separation for handwriting recognition [31], and our approach is inspired by the algorithm proposed there. After performing NU-RFS, we have a set of training data for each classifier. A probability density is estimated per feature for each class $\omega_k$; the distribution for class $\omega_k$ can be computed by

\[ \hat{p}_k(x) = \frac{1}{N_k h} \sum_{i=1}^{N_k} \kappa\!\left(\frac{x - x_{k,i}}{h}\right) \qquad (2) \]

where $\kappa(\cdot)$ is the kernel function and $h$ is the smoothing parameter. We have tested the rectangular, normal, triangular, and Epanechnikov kernel functions [34] in the proposed system. Let $d(\omega_i, \omega_j)$ be the distance between any two classes and $D(\omega_i)$ the distance between a given class and all other classes, computed by (3) and (4), respectively:

\[ d(\omega_i, \omega_j) = 1 - \int \min\{\hat{p}_i(x), \hat{p}_j(x)\}\, dx \qquad (3) \]

\[ D(\omega_i) = \frac{1}{K - 1} \sum_{j \neq i} d(\omega_i, \omega_j) \qquad (4) \]

where $\hat{p}_i$ and $\hat{p}_j$ are the estimated class distributions of $\omega_i$ and $\omega_j$, respectively. When there is complete overlap between the distributions, (3) gives its minimum, and no overlap gives its maximum; i.e., with complete overlap the feature cannot distinguish the two classes, whereas it distinguishes them best when there is no overlap. Thus (3) defines the ability of a feature to differentiate between any two classes $\omega_i$ and $\omega_j$. Equation (4) is computed for every class, and the values are averaged over all the selected features, resulting in an array of separability scores for each class with respect to the selected features. These scores are sorted in descending order: the higher the score, the higher the ability to classify a class. This is shown in Fig. 3 as "compute class score." This process is repeated for every classifier in the MCS, resulting in a score matrix representing the ability of each classifier to distinguish a particular class from all others. We denote these scores as $s_{l,k}$ for classifier $l$ and class $k$. Each row of this matrix corresponds to the ability of one classifier to distinguish the classes: the higher the value of $s_{l,k}$, the higher the chance of distinguishing class $k$ from all other classes. Although estimating a class probability density function is a harder problem than classification, the aim of these scores is to obtain a coarse estimate of separation that can be used during decision fusion. After estimating the score matrix, the actual classification is performed with all the SVM classifiers in the MCS, resulting in class labels for every test sample from each classifier. A sketch of the scoring step is given below.
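This sketch uses a Gaussian kernel and approximates the overlap integral of (3) on a grid; the density estimator, grid size, and the exact overlap form are our reconstruction of (2)–(4), not code from the paper.

# Hedged sketch of the per-classifier class-separation scores.
import numpy as np
from scipy.stats import gaussian_kde

def class_separation_scores(X, y):
    """X: (n_samples, n_features) restricted to one classifier's feature
    subset; y: class labels. Returns one score per class: the mean
    overlap-based distance to all other classes, averaged over features."""
    classes = np.unique(y)
    scores = np.zeros(len(classes))
    for f in range(X.shape[1]):
        grid = np.linspace(X[:, f].min(), X[:, f].max(), 256)
        dx = grid[1] - grid[0]
        # one kernel density estimate per class for this feature (eq. (2))
        dens = {c: gaussian_kde(X[y == c, f])(grid) for c in classes}
        for i, ci in enumerate(classes):
            d = [1.0 - np.minimum(dens[ci], dens[cj]).sum() * dx  # eq. (3)
                 for cj in classes if cj != ci]
            scores[i] += np.mean(d)                               # eq. (4)
    return scores / X.shape[1]  # average over the selected features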



Let $M$ be the number of test samples and $L$ the number of classifiers in the MCS; the resulting class labels can then be represented as an $L \times M$ matrix. Each column of this matrix holds the predictions for one test sample from the different classifiers in the MCS. In the hard decision fusion scenario, the final classification decision is obtained by a majority vote over the classifiers. The final decision for test sample $m$ is obtained by (5):

\[ \hat{c}_m = \operatorname{mode}\{\, y_{l,m} : l = 1, \ldots, L \,\} \qquad (5) \]

where $y_{l,m}$ is the label assigned to sample $m$ by classifier $l$.

Fig. 4. Sensitivity of various algorithms to number of training samples with AVIRIS Indian Pines data (error bars correspond to 95% confidence intervals).

Mathematically, the mode gives the most frequently occurring event, so a majority vote yields a hard decision fusion that uses only the predictions of the classifiers in the MCS. The voting scheme described in (5) is uniform voting, i.e., each classifier in the MCS has equal strength in deciding the final class label. We propose a voting mechanism based on the scores, where the strength of each classifier is modified according to its ability to separate a particular class from all other classes. This is achieved by creating a modified class-label column vector for each test sample based on the corresponding scores $s_{l,k}$: the vector is extended with an appended array whose elements hold the class label corresponding to the highest score, and whose length varies with $s_{l,k}$ as given in (6). From our experiments with various datasets, we arrived at (6). The extended vector is then used to perform the majority vote; the decisions of the MCS are not modified when the appended array has zero length. These scores bias the majority voting decision according to the strengths and weaknesses of each classifier, as sketched below.
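The fusion step can be sketched as follows, assuming integer class labels 0..K-1; the mapping from score to number of appended labels is a simplified stand-in for (6), which the paper derives experimentally.

# Hedged sketch: plain majority vote (eq. (5)) and a score-biased vote
# in which each classifier's prediction is appended extra times.
import numpy as np

def majority_vote(labels):
    """labels: (n_classifiers, n_samples) integer labels. Eq. (5)."""
    return np.array([np.bincount(col).argmax() for col in labels.T])

def score_biased_vote(labels, scores, extra=lambda s: int(round(2 * s))):
    """scores[l, k] in [0, 1]: ability of classifier l to separate class k.
    extra() is an illustrative stand-in for eq. (6)."""
    fused = np.empty(labels.shape[1], dtype=int)
    for j in range(labels.shape[1]):
        ballot = list(labels[:, j])
        for l, lab in enumerate(labels[:, j]):
            ballot += [lab] * extra(scores[l, lab])  # append copies per score
        fused[j] = np.bincount(ballot).argmax()  # decision unchanged when
    return fused                                 # nothing was appended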

IV. EXPERIMENTAL SETUP AND RESULTS

A. Experimental Dataset and Setup

We have used three hyperspectral datasets representing different analysis tasks: two datasets representing agricultural problems, where classes are vegetation cover types, and a third representing an urban classification problem. The first experimental HSI dataset was acquired using NASA's AVIRIS sensor over northwest Indiana's Indian Pines test site in June 1992 [32]. The image represents a vegetation-classification scenario with 145 × 145 pixels and 220 bands in the 400 to 2450 nm region of the visible and infrared spectrum. This dataset has 16 classes.

The second experimental HSI dataset was acquired over north Mississippi's Blackbelt Experiment Station agricultural test site in June 2008. The dataset has seven classes, each representing a level of chemical stress on a corn crop [35]. The corn crop, grown under controlled conditions, was induced with varying degrees of chemical stress: it was sprayed with seven different concentrations of Glufosinate herbicide diluted with water, where the seven classes or concentrations were 0 (control), 1/32, 1/16, 1/8, 1/4, 1/2, and 1 times the labeled rate of the herbicide. This dataset was acquired using a handheld Analytical Spectral Devices (ASD) sensor, resulting in an HSI dataset with 2151 bands. Since all seven classes in this dataset represent the same species under varying degrees of stress, it makes for a very challenging classification problem.

The third dataset has 102 spectral bands acquired by the ROSIS sensor over Pavia, northern Italy, and has 9 classes: water, trees, asphalt, self-blocking bricks, bitumen, tiles, shadows, meadows, and bare soil. For this dataset, considering the very high number of samples per class, model selection is conducted on a subset of the training samples rather than on all samples from each class.

The classification is performed using an SVM with a Gaussian RBF kernel for all experiments [35]. Model selection for the SVM is performed using cross-validation and a grid search. For all datasets, the RBF parameters C and γ are estimated by selecting 10% of the training samples from each class and performing a grid search with cross-validation, except for the Pavia dataset, where we used 5% of the training data for model selection. The resulting model parameters are used to train the SVM classifiers. We compute a confusion matrix for every classification problem and then estimate user, producer, and overall accuracies.

B. Results With the AVIRIS Indian Pines Dataset

Experimental results demonstrate the superiority of the proposed approach compared to SVM and RFS. The overall accuracy for various numbers of training samples is shown in Fig. 4. At 10% training, NU-RFS with kernel density based fusion achieves an overall accuracy of 93.7% with the rectangular kernel, while RFS and NU-RFS achieve 81.5% and 80.3%, respectively. Interestingly, SVM with kernel scoring performs better than RFS and band grouping based NU-RFS. The proposed kernel scoring based NU-RFS outperforms the other approaches by 10%. The maximum overall accuracy achieved is 97.3% with 50% training. For all our experiments, we have used an ensemble size of 8; we observed that increasing the ensemble size beyond 8 does not provide any significant improvement, similar to an observation made by Waske et al. [12] when using simple RFS. The performance of the various algorithms with respect to sample size follows an interesting trend: the proposed approach clearly outperforms the other techniques compared. In Fig. 4, the rectangular kernel is used to compute the density. Fig. 5 shows the comparison of the various kernel functions with respect to the number of training samples used for training.



Fig. 5. Sensitivity of proposed approach with different kernel functions to number of training samples with AVIRIS Indian Pines data.

Fig. 7. Producer accuracies of different classes of Indian Pines data with various algorithms.

Fig. 6. User accuracies of different classes of Indian Pines data with various algorithms.

In this case, the rectangular and triangular kernels perform almost equally well compared to the normal and Epanechnikov kernels. The error bars shown correspond to 95% confidence intervals. From experimentation, it is found that the performance of the classifier initially increases with an increasing smoothing parameter $h$ and then decreases after reaching a particular value. In order to maintain uniformity among the various experiments, a single value of $h$ that yields the best performance for all three datasets is used.

Figs. 6 and 7 illustrate the user accuracies (UA) and producer accuracies (PA) for each class in the Indian Pines dataset. We observe a consistent improvement (2–40%) in both user and producer accuracies across all classes when employing the proposed kernel density based scoring approach. This is expected, as the confusion between the classes is reduced via the proposed scoring approach. The standard deviation is shown as error bars for the user and producer accuracies; the deviation is approximately 0.8% for both.

C. Results With the Corn Stress Dataset

The study of the sensitivity of the various classifiers to different training sample sizes reveals an interesting pattern. As observed with Indian Pines, the proposed approach handles the small sample size situation better than the other approaches.

Fig. 8. Sensitivity of various algorithms to number of training samples with corn stress data.

Fig. 8 shows a comparison of the overall accuracy versus the number of training samples. Systems based on NU-RFS exhibit a 1.5 to 3% increase in overall performance. It is worth pointing out that the performance of the kernel scoring NU-RFS algorithm is above 99% with a sample size of 10 samples per class, where the single SVM and original RFS algorithms produce accuracies of approximately 93% and 95%, respectively. Fig. 9 shows the performance of the different kernels; the rectangular kernel performs better than the other kernels with small training sample sizes. Figs. 10 and 11 illustrate the user and producer accuracies for each class in the corn stress dataset. A similar increase in the user and producer accuracies is observed as with the Indian Pines data. The standard deviation is shown as error bars for the user and producer accuracies; the deviation is approximately 0.1% for both. Tables I and II show the confusion matrices for classification without feature selection and with kernel scoring NU-RFS (triangular kernel function) with 10% training data, respectively. Both user accuracies (UA) and producer accuracies (PA) are improved with the proposed feature selection. Overall accuracies of the other feature selection approaches and kernels are shown in Figs. 8 and 9.



TABLE II
SVM CLASSIFICATION WITH KERNEL SCORING NU-RFS

Fig. 9. Sensitivity of proposed approach with different kernel functions to number of training samples with corn stress data.

Fig. 12. Sensitivity of various algorithms to number of training samples with Pavia data.

Fig. 10. User accuracies of different classes of corn stress data with various algorithms.

Fig. 11. Producer accuracies of different classes of corn stress data with various algorithms.

TABLE I
SVM CLASSIFICATION WITH NO FEATURE SELECTION

D. Results With the Pavia, Italy Dataset

The experimental results with the Pavia, Italy dataset show an improvement in overall classification accuracy compared to the other algorithms. Fig. 12 shows the performance of the proposed approach for various percentages of training samples. Kernel density based NU-RFS achieves a gain of 7% and also performs well with limited training samples. Both kernel density based approaches (combined with SVM and with NU-RFS) show superior performance over all the other approaches, which demonstrates the effectiveness of the proposed decision fusion approach. Fig. 13 illustrates the performance with different kernel functions. Figs. 14 and 15 show the user and producer accuracies for each class of the Pavia dataset. The water, trees, bitumen, tiles, and bare soil classes gained an improvement of 1 to 5%; this improvement can also be seen for kernel scoring without feature selection. The standard deviation is shown as error bars for the user and producer accuracies; the deviation is approximately 0.2% for both.

V. DISCUSSIONS AND CONCLUSION

An SVM based MCS with NU-RFS is developed in this work for hyperspectral classification. The overall accuracies are significantly higher compared to regular SVM based single classifiers and uniform RFS based MCS. NU-RFS appears to handle the small sample size situation better than the other MCS techniques in our comparison study. SVMs are known to handle small sample size situations better than statistical classifiers such as ML; however, we have observed that a single SVM classifier also suffers from the curse of dimensionality.




Fig. 13. Sensitivity of proposed approach with different kernel functions to number of training samples with Pavia data.

Fig. 14. User accuracies of different classes of Pavia data with various algorithms.

Fig. 15. Producer accuracies of different classes of Pavia data with various algorithms.

The number of features selected for each classifier is consistent with previous RFS implementations [12], [29]. We also conducted experiments with larger numbers of selected features; the accuracy improved in some regions, though the impact was marginal. The user and producer accuracies with the proposed approach show a consistent improvement when compared to other conventional approaches. We believe that a study using better decision fusion strategies, such as the Linear Opinion Pool (LOP), may yield further improvements, because it would provide a soft fusion by using the distances between samples and the SVM hyperplanes (an illustrative sketch of such a pooling rule is given below). It is important to note that the proposed NU-RFS with kernel density scoring performs particularly well in small sample size situations, and hence it will be interesting to explore its use with semi-supervised learning for datasets with few ground truth points. We are testing these aspects in ongoing work.
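As an illustration of the LOP idea (not an implemented part of this work), soft class memberships from each SVM can be averaged instead of hard votes; the softmax conversion of decision values is an illustrative assumption.

# Hedged sketch of a Linear Opinion Pool over an SVM ensemble.
import numpy as np
from scipy.special import softmax

def lop_fuse(decision_values, weights=None):
    """decision_values: (n_classifiers, n_samples, n_classes) one-vs-all
    SVM decision scores. Returns fused labels via a weighted average."""
    probs = softmax(decision_values, axis=2)       # soft class memberships
    if weights is None:
        weights = np.full(probs.shape[0], 1.0 / probs.shape[0])
    pooled = np.tensordot(weights, probs, axes=1)  # linear opinion pool
    return pooled.argmax(axis=1)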

REFERENCES

[1] S. G. Beaven, D. Stein, and L. E. Hoff, “Comparison of Gaussian mixture and linear mixture models for classification of hyperspectral data,” in Proc. IEEE IGARSS, 2000, vol. 4, pp. 1597–1599.
[2] B. Demir and S. Erturk, “Increasing hyperspectral image classification accuracy for data sets with limited training samples by sample interpolation,” in Proc. 4th Int. Conf. Recent Advances in Space Technologies, 2009, pp. 367–369.
[3] Q. Jackson and D. A. Landgrebe, “An adaptive classifier design for high-dimensional data analysis with a limited training data set,” IEEE Trans. Geosci. Remote Sens., vol. 39, pp. 2664–2679, Dec. 2001.
[4] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, pp. 1778–1790, Aug. 2004.
[5] J. A. Benediktsson, C. Garcia, B. Waske, J. Chanussot, J. R. Sveinsson, and M. Fauvel, “Ensemble methods for classification of hyperspectral data,” in Proc. IEEE IGARSS, 2008, pp. I-62–I-65.
[6] S. Prasad and L. M. Bruce, “A robust multi-classifier decision fusion framework for hyperspectral, multi-temporal classification,” in Proc. IEEE IGARSS, 2008, pp. II-273–II-276.
[7] S. Prasad and L. M. Bruce, “A divide-and-conquer paradigm for hyperspectral classification and target recognition,” in Optical Remote Sensing. Berlin, Heidelberg: Springer, 2011, vol. 3, pp. 99–122.
[8] C. Mingmin, K. Qian, J. A. Benediktsson, and R. Feng, “Ensemble classification algorithm for hyperspectral remote sensing data,” IEEE Geosci. Remote Sens. Lett., vol. 6, pp. 762–766, Oct. 2009.
[9] M. S. Haghighi, A. Vahedian, and H. S. Yazdi, “Creating and measuring diversity in multiple classifier systems using support vector data description,” Applied Soft Computing, vol. 11, pp. 4941–4942, Dec. 2011.
[10] R. Ranawana, “Intelligent multi-classifier design methods for the classification of imbalanced data sets—Application to DNA sequence analysis,” Ph.D. dissertation, Univ. of Oxford, Oxford, U.K., 2007.
[11] G. Brown, J. Wyatt, R. Harris, and X. Yao, “Diversity creation methods: A survey and categorisation,” Information Fusion, vol. 6, 2005.
[12] B. Waske, S. van der Linden, J. A. Benediktsson, A. Rabe, and P. Hostert, “Sensitivity of support vector machines to random feature selection in classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, pp. 2880–2889, Jul. 2010.
[13] Y. Jinn-Min, K. Bor-Chen, Y. Pao-Ta, and C. Chun-Hsiang, “A dynamic subspace method for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, pp. 2840–2853, Jul. 2010.
[14] J. Ham, C. Yangchi, M. M. Crawford, and J. Ghosh, “Investigation of the random forest framework for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, pp. 492–501, Mar. 2005.
[15] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, pp. 123–140, Aug. 1996.
[16] R. Bryll, R. G. Osuna, and F. Quek, “Attribute bagging: Improving accuracy of classifier ensembles by using random feature subsets,” Pattern Recognition, vol. 36, pp. 1291–1302, 2003.
[17] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proc. 13th Int. Conf. Machine Learning, Bari, Italy, 1996.
[18] S. Samiappan, S. Prasad, and L. M. Bruce, “Automated hyperspectral imagery analysis via support vector machines based multi-classifier system with non-uniform random feature selection,” in Proc. IEEE IGARSS, Vancouver, Canada, 2011.
[19] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
[20] D. J. Sebald and J. A. Bucklew, “Support vector machines and the multiple hypothesis test problem,” IEEE Trans. Signal Process., vol. 49, pp. 2865–2872, 2001.
[21] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Trans. Neural Networks, vol. 13, pp. 415–425, 2002.
[22] E. Alpaydin, “Multiple networks for function learning,” in Proc. IEEE Int. Conf. Neural Networks, 1993, vol. 1, pp. 9–14.
[23] R. T. Clemen, “Combining forecasts: A review and annotated bibliography,” Int. J. Forecasting, vol. 5, pp. 559–583, 1989.
[24] A. Krogh, “Neural network ensembles, cross validation, and active learning,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1995.



[25] D. W. Opitz, “Generating accurate and diverse members of a neural-network ensemble,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1996.
[26] D. W. Opitz et al., “Actively searching for an effective neural-network ensemble,” Connection Science, vol. 8, pp. 337–353, 1996.
[27] L. Breiman, “Random forests,” Machine Learning, vol. 45, pp. 5–32, 2001.
[28] S. Prasad and L. M. Bruce, “Decision fusion with confidence-based weight assignment for hyperspectral target recognition,” IEEE Trans. Geosci. Remote Sens., vol. 46, pp. 1448–1456, 2008.
[29] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 832–844, 1998.
[30] C. Simin, R. Zhang, W. Cheng, and H. Yuan, “Band selection of hyperspectral images based on Bhattacharyya distance,” WSEAS Trans. Inf. Sci. Appl., vol. 6, pp. 1165–1175, 2009.
[31] I.-S. Oh, J.-S. Lee, and C. Y. Suen, “Analysis of class separation and combination of class-dependent features for handwriting recognition,” IEEE Trans. Pattern Anal. Machine Intell., vol. 21, pp. 1089–1094, 1999.
[32] Purdue University, MultiSpec hyperspectral data [Online]. Available: https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html (accessed Sep. 29, 2011).
[33] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Chichester, U.K.: Wiley, 2006.
[34] V. A. Epanechnikov, “Non-parametric estimation of a multivariate probability density,” Theory of Probability and Its Applications, vol. 14, pp. 153–158, 1969.
[35] M. A. Lee, S. Prasad, L. M. Bruce, T. R. West, D. Reynolds, T. Irby, and H. Kalluri, “Sensitivity of hyperspectral classification algorithms to training sample size,” in Proc. IEEE WHISPERS, Grenoble, France, 2009.
[36] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,” Neural Computation, vol. 3, pp. 79–87, 1991.
[37] M. I. Jordan and R. A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm,” Neural Computation, vol. 6, pp. 181–214, 1994.
[38] S. Kumar, J. Ghosh, and M. M. Crawford, “Hierarchical fusion of multiple classifiers for hyperspectral data analysis,” Pattern Analysis & Applications, vol. 5, pp. 210–220, 2002.

Sathishkumar Samiappan (M’12) received the B.E. degree in electronics and communication engineering from Bharathiar University, Coimbatore, India, in 2003 and the M.Tech degree in computer science, with a major in computer vision and image processing, from Amrita University, Coimbatore, India, in 2006. Since 2009, he has been working toward the Ph.D. degree in electrical and computer engineering at Mississippi State University, Starkville, MS. Until 2009, he was a Lecturer in the Department of Electronics and Communication Engineering, Amrita University, Coimbatore, India. Since 2009, he has been a Graduate Research Assistant with Geosystems Research Institute and Graduate Teaching Assistant with the Department of Electrical and Computer Engineering at Mississippi State University, Starkville, MS. His research interests include big data problems, pattern recognition, image processing, machine learning and hyperspectral image classification.

Saurabh Prasad (S’05–M’09) received the B.S. degree in electrical engineering from Jamia Millia Islamia, India, in 2003, the M.S. degree in electrical engineering from Old Dominion University, Norfolk, VA, in 2005, and the Ph.D. degree in electrical engineering from Mississippi State University, Starkville, MS, in 2008. He is an Assistant Professor in the Electrical and Computer Engineering Department at the University of Houston (UH), and is also affiliated with UH’s Geosensing Systems Engineering Research Center and the National Science Foundation (NSF) funded National Center for Airborne Laser Mapping (NCALM). He is the Principal Investigator/Technical-lead on projects funded by the National Geospatial Intelligence Agency (NGA), National Aeronautics and Space Administration (NASA), and Department of Homeland Security (DHS). His research interests include statistical pattern recognition, adaptive signal processing and kernel methods for medical imaging, optical and SAR remote sensing. In particular, his current research work involves the use of information fusion techniques for designing robust statistical pattern classification algorithms for hyperspectral remote sensing systems operating under low-signal-to-noise-ratio, mixed pixel and small training sample-size conditions. Dr. Prasad is an active Reviewer for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS and the Elsevier Pattern Recognition Letters. He was awarded the GRI’s Graduate Research Assistant of the Year award in May 2007, and the Office-of-Research Outstanding Graduate Student Research Award in April 2008 at Mississippi State University. In July 2008, he received the Best Student Paper Award at IGARSS’2008 held in Boston, MA. In October 2010, he received the State Pride Faculty Award at Mississippi State University for his academic and research contributions. He was the Lead Editor of the book entitled Optical Remote Sensing: Advances in Signal Processing and Exploitation Techniques (2011).

Lori M. Bruce (S’90–M’96–SM’01) received the B.S., M.S., and Ph.D. degrees in electrical and computer engineering from the University of Alabama in Huntsville and the Georgia Institute of Technology, Atlanta. She is the Associate Dean for Research and Graduate Studies in the Bagley College of Engineering at Mississippi State University. Dr. Bruce has been a faculty member for 14 years, during which she has taught approximately 40 engineering courses at the undergraduate and graduate levels. Her research in image processing and remote sensing has been funded by the Department of Homeland Security, the Department of Energy, the Department of Transportation, the National Aeronautics and Space Administration, the National Geospatial-Intelligence Agency, the National Science Foundation, the United States Geological Survey, and industry, resulting in over 100 peer-reviewed publications and the matriculation of more than 75 graduate students (25 as major professor and more than 50 as thesis/dissertation committee member). Dr. Bruce is an active member of the IEEE Geoscience and Remote Sensing Society, and she is a member of the Phi Kappa Phi, Eta Kappa Nu, and Tau Beta Pi honor societies. Prior to becoming a faculty member, she held the prestigious title of National Science Foundation Research Fellow.
