
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 13, NO. 1, JANUARY 2016

Hyperspectral Band Selection Using Improved Firefly Algorithm

Hongjun Su, Member, IEEE, Bin Yong, and Qian Du, Senior Member, IEEE

Abstract—An improved firefly algorithm (FA)-based band selection method is proposed for hyperspectral dimensionality reduction (DR). In this letter, DR is formulated as an optimization problem that searches for a small number of bands in a hyperspectral data set, and a feature subset search algorithm using the FA is developed. To greatly reduce computational cost by avoiding an actual classifier within the band search process, criterion functions that gauge class separability are preferred; specifically, the minimum estimated abundance covariance and Jeffreys–Matusita distances are employed. The proposed band selection technique is compared with an FA-based method that actually employs a classifier, the well-known sequential forward selection, and particle swarm optimization algorithms. Experimental results show that the proposed algorithm outperforms the other methods, providing an effective option for DR.

Index Terms—Band selection, dimensionality reduction (DR), firefly algorithm (FA), hyperspectral imagery.

I. INTRODUCTION

Hyperspectral sensors can acquire hundreds of contiguous bands over a wide electromagnetic spectrum for each pixel. The rich spectral information makes it possible to distinguish materials with subtle spectral differences, but it also tends to cause the “curse of dimensionality.” Dimensionality reduction (DR) has been commonly used to address this issue [1]. Widely employed transformation-based DR techniques include principal component analysis (PCA), the minimum noise fraction transform, and Fisher’s linear discriminant analysis [2]. However, these methods usually change the physical meaning of the original bands, since each channel in the low-dimensional space is a linear combination of bands rather than an individual band. Band selection methods, which select a subset of the original bands, may be preferred when the physical meaning of the bands needs to be maintained [3], [4].

Manuscript received July 26, 2015; revised October 24, 2015; accepted October 29, 2015. Date of publication November 20, 2015; date of current version December 24, 2015. This work was supported in part by the National Natural Science Foundation of China under Grant 41571325 and Grant 41201341; by the Open Research Fund of Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences under Grant 2014LDE003; and by the Fundamental Research Funds for the Central Universities under Grant 2015B16814 and Grant 2014B08514. H. Su and B. Yong are with the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering and the School of Earth Sciences and Engineering, Hohai University, Nanjing 210098, China (e-mail: hjsurs@163.com; [email protected]). Q. Du is with the Department of Electrical and Computer Engineering and the Geosystem Research Institute in High Performance Computing Collaboratory, Mississippi State University, Mississippi State, MS 39762 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2015.2497085

The objective of band selection is to select a subset of the original bands without significantly degrading the performance of subsequent data analysis. Band selection involves three issues: 1) how many bands to select; 2) which criterion (i.e., objective function) to use; and 3) which search strategy to adopt. For the first issue, the concept of virtual dimensionality [5] was proposed to estimate the number of spectrally distinct signatures, which can serve as a reference value, and particle swarm optimization (PSO) has been used to automatically determine the optimal number of bands to select [6]. This letter therefore investigates the last two issues; in particular, it proposes a firefly algorithm (FA)-based search method that is independent of any classifier. For supervised band selection, the objective function measures whether class information is well preserved and whether class separability is maintained. Many supervised band selection algorithms use distance metrics to measure class separability, such as divergence [7], the Bhattacharyya distance [1], or the Jeffreys–Matusita (JM) distance [1], [4]; in this case, class samples are usually required to estimate class statistics. In contrast, the minimum estimated abundance covariance (MEAC) distance [8] can be computed from a representative spectral signature for each class. Sometimes, the classification accuracy of a classifier is used directly as the objective function [10]. A search strategy (combined with an objective function) is adopted to avoid testing all possible band combinations. Subset forward-search strategies, e.g., sequential forward selection (SFS) and sequential forward floating selection, can be used for band searching [8].
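To make the need for a search strategy concrete, the sketch below compares the number of criterion evaluations required by an exhaustive search with the number required by SFS, using the band counts (191 and 103) and subset sizes (6 and 15) of the experiments in Section III. The helper names are illustrative, and SFS is counted under the simple assumption that each remaining band is tried once per step.

```python
from math import comb


def exhaustive_subsets(n_bands: int, b: int) -> int:
    """Number of distinct b-band subsets that an exhaustive search must evaluate."""
    return comb(n_bands, b)


def sfs_evaluations(n_bands: int, b: int) -> int:
    """Criterion evaluations made by sequential forward selection (SFS).

    At step k (k = 0, ..., b - 1), SFS adds each of the n_bands - k
    remaining bands to the current subset once and keeps the best one.
    """
    return sum(n_bands - k for k in range(b))


# Band counts and subset sizes taken from the experiments in Section III.
for n_bands, b in [(191, 6), (103, 15)]:
    print(f"{b} of {n_bands} bands: exhaustive={exhaustive_subsets(n_bands, b):,}, "
          f"SFS={sfs_evaluations(n_bands, b):,}")
```

For 6 of 191 bands, an exhaustive search must evaluate 62,291,483,793 subsets, whereas SFS needs only 1131 criterion evaluations; the gap grows much larger for 15 of 103 bands.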
Recently, the FA, an evolutionary computation technique developed by Yang [9], has been adopted in clustering, multiobjective scheduling, and band selection [10], [11] because of its capability for global search in high-dimensional spaces when solving complex optimization problems [11]. However, band selection with FA searching in [10] (hereafter denoted as FA-classifier) may be uncompetitive because classification must actually be performed during the band selection process. This becomes computationally prohibitive if the chosen classifier is expensive and many training samples are used. Moreover, the selected bands may be optimal only for the classifier involved. In fact, some distances based on statistical features (e.g., MEAC) may be preferable for gauging class separability, and they can serve as the objective function. In this letter, to avoid employing an actual classifier within the band search, the MEAC and JM distances are used as the objective function in the FA-based band selection algorithm. Two real hyperspectral data sets are used for performance evaluation, demonstrating that our method can outperform the compared methods at a much lower computational cost. Although the FA is used in [10] for band selection, our method differs because it is based on a filter approach, in contrast to the wrapper-based band selection approach in [10].

1545-598X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

SU et al.: HYPERSPECTRAL BAND SELECTION USING IMPROVED FA

II. PROPOSED METHOD

A. FA-Based Searching

The FA is an evolutionary optimization algorithm proposed by Yang [9]; it is inspired by the collective search behavior that fireflies exhibit through their flashing (bioluminescence). Fireflies have different flashing behaviors, which are used for communication and for attracting potential prey. Two elements are essential: brightness I and attractiveness β. The brightness observed at a distance r obeys the inverse square law, which means that I decreases as r increases. In the FA, the brightness of a firefly is determined by its current position: a brighter firefly occupies a more preferable position, which corresponds to a larger value of the objective function. Less bright fireflies move toward brighter ones; fireflies with equal brightness move randomly. For simplicity, the following assumptions are made in describing the FA: 1) all fireflies are unisex, so that one firefly is attracted to others regardless of their sex; 2) the attractiveness of a firefly is proportional to its brightness; and 3) the brightness is proportional to the objective function [11], [12]. As the fireflies move, I and β are updated repeatedly, and randomly distributed points are gradually moved toward the extreme points. After a certain number of iterations, the less desirable points are eliminated, and the best positions are retained. The brightness of a firefly at a distance r from its location varies with the value of the objective function and can be defined as

$$I(r) = I_0 e^{-\gamma r^2} \tag{1}$$

where I0 is the maximum brightness at r = 0; it is related to the value of the objective function, and a larger value means a brighter firefly. Here, γ is the light absorption coefficient. The attractiveness of a firefly at a distance r from its location is proportional to its brightness as observed by adjacent fireflies and can be expressed as

$$\beta(r) = \beta_0 e^{-\gamma r^2} \tag{2}$$

where β0 is the attractiveness when the distance between two fireflies is zero. In this letter, we simply set I(r) = β(r). The location of the jth firefly, updated under the attraction of the ith firefly, is

$$x_j^{(i)} = x_j + \beta_0 e^{-\gamma r_{ij}^2}\,(x_i - x_j) + \alpha\left(\mathrm{rand} - \tfrac{1}{2}\right) \tag{3}$$

where xi and xj are the current positions of the ith and jth fireflies, respectively, α is a constant within [0, 1], rand is a random number within [0, 1], and rij is the distance between the ith and jth fireflies. In this letter, a firefly represents a set of selected band indexes, which is a vector. Then, (3) becomes

$$\mathbf{x}_j^{(i)} = \mathbf{x}_j + \beta_0 e^{-\gamma r_{ij}^2}\,(\mathbf{x}_i - \mathbf{x}_j) + \alpha\left(\mathbf{rand} - \tfrac{1}{2}\,\mathbf{1}\right) \tag{4}$$

where the distance rij is the Euclidean distance between the indexes of the two sets of selected bands, i.e., xi and xj, rand is a random vector whose elements are random variables within [0, 1], and 1 is a vector with all entries equal to 1. For m fireflies, the new location of xj is determined after considering all other fireflies as follows:

$$\mathbf{x}_j^{\mathrm{new}} = \frac{1}{m-1}\sum_{i=1,\, i \neq j}^{m} \mathbf{x}_j^{(i)} \tag{5}$$
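As a minimal sketch of one movement step, the Python function below implements (4) and (5) for positions stored as plain lists of floats. The name `move_firefly` and its parameters are illustrative assumptions, not the authors' code. Following the statement that less bright fireflies move toward brighter ones, it applies the attraction term only when firefly i is brighter than firefly j, a rule that (5) leaves implicit; in that case x_j^(i) receives only the random perturbation.

```python
import math
import random


def move_firefly(X, brightness, j, beta0=1.0, gamma=1.0, alpha=0.5, rng=random):
    """New position of firefly j from (4) and (5); illustrative helper.

    X          -- list of positions (lists of floats), one per firefly
    brightness -- objective value of each firefly (larger is brighter)
    Assumption: the attraction term is used only when firefly i is brighter
    than firefly j; otherwise x_j^(i) receives only the random perturbation.
    """
    xj = X[j]
    candidates = []
    for i, xi in enumerate(X):
        if i == j:
            continue
        r2 = sum((a - c) ** 2 for a, c in zip(xi, xj))  # r_ij^2, squared Euclidean distance
        attract = beta0 * math.exp(-gamma * r2) if brightness[i] > brightness[j] else 0.0
        # (4): pull toward x_i plus alpha * (rand - 1/2) in every coordinate
        candidates.append([
            xjk + attract * (xik - xjk) + alpha * (rng.random() - 0.5)
            for xik, xjk in zip(xi, xj)
        ])
    # (5): average the m - 1 candidate positions coordinate by coordinate
    return [sum(c[k] for c in candidates) / len(candidates) for k in range(len(xj))]


# Usage: three fireflies, each holding three (continuous) band indexes.
X = [[10.0, 40.0, 90.0], [12.0, 45.0, 95.0], [30.0, 60.0, 150.0]]
print(move_firefly(X, brightness=[0.2, 0.9, 0.5], j=0, gamma=0.01, rng=random.Random(0)))
```

Because r_ij is measured in raw band-index units here, a small γ (0.01 in the usage line) is needed for the attraction to be noticeable; with γ = 1 and index distances of tens of bands, the exponential term is essentially zero.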

Compared with the PSO algorithm, the FA can deal with multimodal functions more naturally and efficiently [12]. More importantly, from a theoretical perspective, PSO can be regarded as a special case of the FA [12]. According to the preceding formulas, the FA has two limiting cases: γ → 0 and γ → ∞. If γ → 0, then β = β0 is constant, which means that the brightness does not decrease with distance, as in an idealized sky. In this case, a flashing firefly can be seen from anywhere, which makes a global optimum easy to reach. This limit corresponds to a special case of PSO and has the same efficiency as PSO. On the other hand, if γ → ∞, the attractiveness decreases dramatically and is almost zero as seen by other fireflies; that is, the fireflies are effectively short-sighted. In this case, each firefly roams in a completely random way, which corresponds to a completely random search. Because the FA usually operates between these two extremes, the parameters γ and α can be adjusted so that it outperforms both random search and PSO.
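The two limits can be checked numerically with (2). The short sketch below uses a hypothetical helper named `attractiveness` with β0 = 1 and evaluates β(r) at a few distances for γ = 0, 1, and 100.

```python
import math


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness beta(r) = beta0 * exp(-gamma * r**2), as in (2)."""
    return beta0 * math.exp(-gamma * r * r)


distances = [0.0, 0.5, 1.0, 2.0]
for gamma in (0.0, 1.0, 100.0):
    values = [attractiveness(r, gamma=gamma) for r in distances]
    print(f"gamma={gamma:>6}: " + ", ".join(f"{v:.4f}" for v in values))
```

With γ = 0 every value equals β0 = 1, so every firefly is equally attractive at any distance (the PSO-like limit); with γ = 100 the attractiveness is already about 1.4 × 10⁻¹¹ at r = 0.5 and prints as 0.0000, so movement is driven almost entirely by the random term.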

B. Objective Function

In this letter, instead of using the classification accuracy obtained by applying a classifier in each FA iteration, MEAC or JM is adopted as the objective function to measure class separability.

MEAC: Assume that there are p classes in an image scene. Based on the linear mixture model, a pixel y can be regarded as a mixture of the endmembers of the p classes. Let the endmember matrix be S = [s1, s2, ..., sp]. Pixel y can then be expressed as

$$\mathbf{y} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{n} \tag{6}$$

where α = (α1, α2, ..., αp)^T is the abundance vector, and n is uncorrelated white noise with E(n) = 0 and Cov(n) = σ²I (I is an identity matrix). Intuitively, the selected bands should make the deviation of the estimated abundance α̂ from the actual α as small as possible. If only some of the classes are known, this is equivalent to determining

$$\arg\min_{\Phi_S}\ \operatorname{trace}\left[\left(\mathbf{S}^T\boldsymbol{\Sigma}^{-1}\mathbf{S}\right)^{-1}\right] \tag{7}$$

where Φ_S is the selected band subset, and Σ is the data covariance matrix. This method may not need training samples if class representative signatures are known.

JM: The distance between two classes ωi and ωj is defined as

$$d_{i,j} = \int\left[\sqrt{p(\mathbf{y}\,|\,\omega_i)} - \sqrt{p(\mathbf{y}\,|\,\omega_j)}\right]^2 d\mathbf{y} \tag{8}$$


where p(y|ωi) and p(y|ωj) are the two class-conditional probability distributions of y. When p(y|ωi) and p(y|ωj) are Gaussian distributions, the JM distance simplifies to

$$d_{i,j} = 2\left(1 - e^{-b_{i,j}}\right) \tag{9}$$

where

$$b_{i,j} = \frac{1}{8}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T\left(\frac{\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j}{2}\right)^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j) + \frac{1}{2}\ln\left(\frac{\left|\frac{\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j}{2}\right|}{|\boldsymbol{\Sigma}_i|^{1/2}\,|\boldsymbol{\Sigma}_j|^{1/2}}\right) \tag{10}$$

is the Bhattacharyya distance between ωi and ωj. Here, μi and μj are the class means, and Σi and Σj are the class covariance matrices. Class samples are required so that the class means and covariance matrices can be reliably estimated.

TABLE I OPTIMIZED PARAMETERS OF THE FA AND PSO IN THE EXPERIMENT

C. Proposed Improved FA-Based Band Selection

In the proposed algorithm, the selected band indexes are regarded as firefly variables. The improved FA-based band selection algorithm can be described as follows.
1) Parameter initialization: maximum number of iterations t = 100, step size α = 0.5, light absorption coefficient γ = 1, the number of fireflies m, the number of selected bands b, and the objective function [MEAC in (7) or JM in (9)] as I0.
2) Compute the brightness (i.e., attractiveness) with (1). The objective function I0 is evaluated using the b bands whose indexes are contained in a firefly; in total, m fireflies (i.e., m sets of selected bands) are evaluated.
3) Estimate the movement using (4) and (5).
4) Update the objective function according to (7) or (9) with the updated fireflies (i.e., the updated selected band indexes).
5) Repeat steps 2)–4) until the maximum number of iterations is reached.

The final selected bands are the b bands whose indexes are contained in the firefly that yields the largest I0.

D. Complexity Analysis

The FA offers advantages such as easy implementation and low computational complexity. It has two inner loops over the population m and one outer loop over the iterations t, so its worst-case complexity is O(m²t). When m is small and t is large (m = 10 and t = 100 in this letter), the computational cost is relatively low because the complexity is linear in t. In addition, the FA has two major advantages over other evolutionary algorithms, i.e., automatic subdivision and the ability to deal with multimodality, and its parameters can be tuned to control the randomness as the iterations proceed so that convergence can be sped up [8].
By contrast, PSO does not search the solution space linearly; therefore, the probability of finding the optimal solution before exploring all combinations is very low. Yang [12] has provided a comparative study showing that the FA is computationally much cheaper than PSO. In an optimization problem, most of the computational cost lies in evaluating the objective function; the metrics adopted in this letter are much cheaper to evaluate than the classification accuracy used in [10], which significantly reduces the computational cost.
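Putting Sections II-B to II-D together, the following is a minimal NumPy sketch of filter-type FA band selection under stated assumptions; it is not the authors' implementation. Details that the letter does not specify are assumptions here: positions are scaled to [0, 1] so that γ = 1 gives a usable attraction range; rounded positions with duplicate indexes are repaired by a hypothetical `to_bands` helper; the attraction term is applied only when firefly i is brighter than j; β0 = 1; MEAC is negated so that a larger value means a brighter firefly; and the best subset seen over all iterations is retained. For JM with more than two classes, one common choice (again an assumption) is to average `jm_distance` over all class pairs.

```python
import numpy as np


def meac(S, Sigma):
    """MEAC criterion (7): trace[(S^T Sigma^-1 S)^-1]; smaller is better.

    S: (bands x p) class signatures on the candidate bands.
    Sigma: (bands x bands) data covariance on the same bands.
    """
    F = S.T @ np.linalg.solve(Sigma, S)
    return float(np.trace(np.linalg.inv(F)))


def jm_distance(mu_i, mu_j, C_i, C_j):
    """Gaussian JM distance (9), using the Bhattacharyya distance (10)."""
    C = 0.5 * (C_i + C_j)
    d = mu_i - mu_j
    b = d @ np.linalg.solve(C, d) / 8.0
    b += 0.5 * (np.linalg.slogdet(C)[1]
                - 0.5 * (np.linalg.slogdet(C_i)[1] + np.linalg.slogdet(C_j)[1]))
    return float(2.0 * (1.0 - np.exp(-b)))


def to_bands(x, n_bands, b, rng):
    """Hypothetical repair step: map a position in [0, 1]^b to b distinct band indexes."""
    idx = np.unique(np.rint(x * (n_bands - 1)).astype(int))
    if idx.size < b:  # rounding produced duplicates: top up with random unused bands
        unused = np.setdiff1d(np.arange(n_bands), idx)
        idx = np.concatenate([idx, rng.choice(unused, b - idx.size, replace=False)])
    return np.sort(idx)


def fa_band_selection(objective, n_bands, b, m=10, t=100,
                      alpha=0.5, gamma=1.0, beta0=1.0, seed=0):
    """Filter-type FA band search following steps 1)-5); larger objective = brighter."""
    rng = np.random.default_rng(seed)
    # Step 1: m fireflies, each a vector of b band indexes scaled to [0, 1].
    X = np.array([rng.choice(n_bands, b, replace=False) / (n_bands - 1)
                  for _ in range(m)])
    # Step 2: brightness of each firefly = objective on its b bands.
    B = [to_bands(x, n_bands, b, rng) for x in X]
    I = np.array([objective(bands) for bands in B])
    best_bands, best_val = B[int(np.argmax(I))], float(I.max())
    for _ in range(t):  # Step 5: repeat for t iterations, O(m^2) moves each
        X_new = np.empty_like(X)
        for j in range(m):  # Step 3: (4) for every i != j, then average as in (5)
            acc = np.zeros(b)
            for i in range(m):
                if i == j:
                    continue
                r2 = np.sum((X[i] - X[j]) ** 2)
                attract = beta0 * np.exp(-gamma * r2) if I[i] > I[j] else 0.0
                acc += X[j] + attract * (X[i] - X[j]) + alpha * (rng.random(b) - 0.5)
            X_new[j] = np.clip(acc / (m - 1), 0.0, 1.0)
        X = X_new
        # Step 4: re-evaluate the objective on the updated band subsets.
        B = [to_bands(x, n_bands, b, rng) for x in X]
        I = np.array([objective(bands) for bands in B])
        k = int(np.argmax(I))
        if I[k] > best_val:  # keep the best subset seen so far (our addition)
            best_bands, best_val = B[k], float(I[k])
    return best_bands, best_val


# Toy usage with synthetic data (not the HYDICE or Pavia data):
rng = np.random.default_rng(1)
S = rng.random((20, 3))                  # 20 bands, 3 class signatures
A = rng.random((20, 20))
Sigma = A @ A.T + 20 * np.eye(20)        # symmetric positive definite covariance
objective = lambda idx: -meac(S[idx], Sigma[np.ix_(idx, idx)])  # negate: smaller MEAC = brighter
bands, value = fa_band_selection(objective, n_bands=20, b=5, m=6, t=10)
print(bands, value)
```

The nested loops over j and i inside the iteration loop make the O(m²t) movement cost explicit, while the objective itself is evaluated m(t + 1) times. With a closed-form criterion such as MEAC, each evaluation is a small matrix computation rather than a classifier training run.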

Fig. 1. Overall classification accuracy versus the number of fireflies.

III. EXPERIMENTS

A. Experiment Data

The Hyperspectral Digital Imagery Collection Experiment (HYDICE) subimage scene with 304 × 301 pixels over the Washington, DC Mall area was used in the first experiment. After bad band removal, 191 bands were left. There are six classes, i.e., roof, tree, grass, water, road, and trail. The available training and testing samples can be found in [6]. The second data set is an image scene over Pavia University in northern Italy with 610 × 340 pixels acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. It has 103 spectral bands and nine classes. More details are available in [13]. For all ground-truth classes in [13], 20% of the labeled samples are used as training samples and the rest as testing samples. The number of classes in each data set is treated as prior knowledge and used as the true number of classes in the objective function. For performance assessment, the classification accuracy of a support vector machine (SVM) applied to the selected bands is used. A radial basis function is selected as the SVM kernel, whose parameters are determined via fivefold cross validation and grid search.

B. Parameter Tuning

In the proposed method, the parameters (i.e., the number of fireflies m, the maximum number of iterations t, the step size factor α, the light absorption coefficient γ, and the maximum attractiveness β0) are important for its performance. The same is true of PSO, which has similar parameters (i.e., the number of particles m, the maximum number of iterations t, the acceleration coefficients c1 and c2, and the range of the inertia weight w). Following [6], the parameters of the FA and PSO are chosen empirically, as listed in Table I. To examine how the performance of the FA changes with m, m is varied from 5 to 30, and the average accuracy over 20 runs is presented.
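The per-class 20% training split described in Section III-A can be sketched as below. The sketch assumes that training samples are drawn uniformly at random within each class, which the letter does not state; the helper name `stratified_split` is hypothetical, and the SVM training itself is not shown.

```python
import numpy as np


def stratified_split(labels, train_fraction=0.2, seed=0):
    """Split labeled samples per class: train_fraction of each class for training.

    Returns (train_idx, test_idx), index arrays into `labels`.
    Assumption: training samples are drawn uniformly at random within each class.
    """
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffled indexes of class c
        n_train = int(round(train_fraction * idx.size))
        train_parts.append(idx[:n_train])
        test_parts.append(idx[n_train:])
    return np.concatenate(train_parts), np.concatenate(test_parts)


# Usage with synthetic labels for nine classes (the Pavia University data also has nine).
labels = np.repeat(np.arange(9), 50)
train_idx, test_idx = stratified_split(labels)
print(train_idx.size, test_idx.size)  # 90 360
```

Splitting within each class keeps every class represented in the training set at the same 20% rate, which matters for the small classes of ground-truth maps.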


Fig. 2. FA convergence curves. (a) MEAC. (b) JM.

Fig. 4. Band selection performance in the HYDICE DC Mall experiment. (a) MEAC. (b) JM.

Fig. 3. Comparison of the FA with different objective functions. (a) HYDICE DC Mall. (b) Pavia University.

Fig. 1 illustrates the overall classification accuracy of the FA for the different data sets with varying m and the two metrics. The number of fireflies is set to ten, since this value achieves excellent performance at a lower computational cost.

C. Results and Discussion

For comparison, all the original bands, PCA, and PSO- and SFS-based band selection methods are implemented. Fig. 2 shows the convergence curves for the HYDICE data, where the FA converges in fewer than 100 iterations. For better illustration, the number of selected bands is varied from 5 to 15; each experiment is repeated 20 times, and the highest overall classification accuracy is recorded. The proposed improved FA-based band selection methods are compared with the FA-classifier in [10], and Fig. 3 shows that the proposed methods provide better performance. In addition, our FA-based band selection methods are compared with the other methods. As shown in Figs. 4 and 5, the FA performs much better than PSO and SFS, and the FA-based methods also provide better results than PCA and than using all the bands. For example, for the HYDICE DC Mall data with six selected bands, the FA achieves more than 2.6% and 1.7% higher accuracy than PSO using the MEAC and JM distances, respectively. For the Pavia University data, the FA-based method also outperforms the other methods. To further illustrate the performance, the average classification accuracy and the corresponding standard deviation are reported in Tables II and III. According to our previous study [6], the band selection method performed best on the HYDICE DC data when six bands were selected; thus, the number of bands to be selected is fixed at six. For the Pavia University data, the number is fixed at 15. In Tables II and III, the best result for each case (i.e., the highest classification accuracy and κ statistic) is marked in bold. As expected, the proposed method outperforms the other band selection methods considered.

D. Computing Time

To compare the computational complexity of the FA and PSO for band selection, the computing time of each algorithm running on a desktop computer with a 3.6-GHz central processing unit and 8.0 GB of memory was recorded for the Pavia University data and is listed in Table IV. Note that the running time is the average over 20 trials for each method. PSO is much more time-consuming because of its complexity. The FA-classifier method in [10] also takes much more time because of the classification performed during band selection. The proposed improved FA method consumes less time while providing more accurate classification.

Fig. 5. Band selection performance in the Pavia University experiment. (a) MEAC. (b) JM.

TABLE II CLASSIFICATION ACCURACY (%) AND KAPPA COEFFICIENT (κ), MEAN ± STANDARD DEVIATION OVER 20 TRIALS FOR THE HYDICE DC MALL DATA (THE NUMBER OF SELECTED BANDS = 6)

TABLE III CLASSIFICATION ACCURACY (%) AND KAPPA COEFFICIENT (κ), MEAN ± STANDARD DEVIATION OVER 20 TRIALS FOR THE PAVIA UNIVERSITY DATA (THE NUMBER OF SELECTED BANDS = 15)

TABLE IV AVERAGE COMPUTING TIME (IN SECONDS) OF THE FA AND PSO FOR SELECTED BANDS OVER 20 TRIALS FOR THE PAVIA UNIVERSITY DATA

IV. CONCLUSION

In this letter, we have proposed an improved FA for hyperspectral band selection. To reduce the computational cost of objective evaluation without using an actual classifier, two simple distance measures, i.e., MEAC and JM, were adopted as the objective function. The experiments demonstrate that the improved FA-based band selection method outperforms the widely used PSO-based and SFS methods. It also converges quickly. Thus, it offers an alternative for efficient hyperspectral band selection.

REFERENCES

[1] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York, NY, USA: Kluwer, 2003, ch. 2, pp. 15–35.
[2] W. Li, S. Prasad, J. E. Fowler, and L. Bruce, “Locality preserving discriminant analysis in kernel induced feature spaces for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 5, pp. 894–898, Sep. 2011.
[3] Y. Yuan, G. Zhu, and Q. Wang, “Hyperspectral band selection by multitask sparsity pursuit,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 2, pp. 631–644, Feb. 2015.
[4] A. Ifarraguerri, “Visual method for spectral band selection,” IEEE Geosci. Remote Sens. Lett., vol. 1, no. 2, pp. 101–106, Apr. 2004.
[5] C.-I. Chang and Q. Du, “Estimation of number of spectrally distinct signal sources in hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 608–619, Mar. 2004.
[6] H. Su, Q. Du, G. Chen, and P. Du, “Optimized hyperspectral band selection using particle swarm optimization,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2659–2670, Jun. 2014.
[7] A. Martínez-Usó, F. Pla, J. M. Sotoca, and P. García-Sevilla, “Clustering-based hyperspectral band selection using information measures,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 4158–4171, Dec. 2007.
[8] H. Yang, Q. Du, H. Su, and Y. Sheng, “An efficient method for supervised hyperspectral band selection,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 138–142, Jan. 2011.
[9] X. Yang, Nature-Inspired Metaheuristic Algorithms. Bristol, U.K.: Luniver Press, 2008.
[10] R. Nakamura et al., “Nature-inspired framework for hyperspectral band selection,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 4, pp. 2126–2137, Apr. 2014.
[11] X. Yang and X. He, “Firefly algorithm: Recent advances and applications,” Int. J. Swarm Intell., vol. 1, no. 1, pp. 36–50, Aug. 2013.
[12] X. Yang, “Firefly algorithm for multimodal optimization,” Stochastic Algorithms, Found. Appl., vol. 5792, pp. 169–178, 2009.
[13] [Online]. Available: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes