
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 53, NO. 5, MAY 2015

Spectral–Spatial Classification of Hyperspectral Images via Spatial Translation-Invariant Wavelet-Based Sparse Representation

Lin He, Member, IEEE, Yuanqing Li, Senior Member, IEEE, Xiaoxin Li, and Wei Wu, Member, IEEE

Abstract—For hyperspectral image (HSI) classification, it is challenging to adopt the methodology of sparse-representation-based classification. In this paper, we first propose an ℓ1-minimization-based spectral–spatial classification method for HSIs via a spatial translation-invariant wavelet (STIW)-based sparse representation (STIW-SR), wherein both the spectrum dictionary and the analyzed signal are formed with STIW features. Due to the capability of a STIW to reduce both the observation noise and the spatial nonstationarity while maintaining the ideal spectra, which is proved with our signal–interference–noise spectrum model, it is expected that the pixels in the same class congregate in a lower dimensional subspace and that the separations among class-specific subspaces are enhanced, thus yielding a highly discriminative sparse representation. Then, we develop an approach to evaluate the sparsity recoverability of an ℓ1-minimization on HSIs in a probabilistic framework. This approach takes into account not only the recovery probability under the given support length of the ℓ0-norm solution but also the a priori probability of the support length; consequently, it overcomes the inability of traditional mutual/cumulative coherence conditions to address high-coherence HSIs. This paper reveals that the higher sparsity recoverability of a STIW-SR leads to its higher classification accuracy and that increasing coherence does not necessarily lead to a reduced sparsity recovery probability, and it verifies the connection between ℓ0- and ℓ1-minimizations on HSIs. Experimental results on real-world HSIs suggest that our classification method significantly outperforms several representative spectral–spatial classifiers and support vector machines.

Index Terms—Hyperspectral image (HSI), sparse representation, sparsity recoverability, spatial translation-invariant wavelet (STIW), spectral–spatial classification.

Manuscript received June 16, 2013; revised December 21, 2013, April 26, 2014, and August 9, 2014; accepted October 4, 2014. This work was supported by the National High-Tech Research and Development Program of China (863 Program) under Grant 2012AA011601, by the National Natural Science Foundation of China under Grant 91120305 and Grant 61403144, and by the High-Level Talent Project of Guangdong Province of China. L. He, Y. Li, and W. Wu are with the School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China (e-mail: [email protected]; [email protected]). X. Li is with the Center for Computer Vision, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou 510275, China, and also with the College of Computer Science and Technology, Faculty of Information Technology, Zhejiang University of Technology, Hangzhou 310023, China. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2014.2363682

I. INTRODUCTION

HYPERSPECTRAL images (HSIs) contain not only spatial information but also rich spectral information [1], [2]. The methods used to classify such data can be roughly divided into the pixelwise and spectral–spatial categories [3], [4]. The former assigns a pixel to a class exclusively based on its spectrum, a notable example being the support vector machine (SVM) due to its good performance [5], [6], whereas the latter utilizes both the pixel's spectrum and its dependence on its spatial neighbors [4], [7]–[12].

A sparse representation originates from the observation that most natural signals can be effectively represented with a few coefficients on a basis set [13]–[18]. It has been successfully applied in image enhancement [19], signal source separation [20], [21], feature selection [22], compressive sensing [16], and biometric classification [23], [24]. Specifically, sparse-representation-based classification (SRC) has been introduced to the HSI processing community, wherein a spectrum-dictionary-based sparse representation (SDSR) is used [11], [12], [25]. In an SDSR, training pixels are used as the atoms of a dictionary Ψ_tr, and if the involved ℓ0-minimization (i.e., min_α ‖α‖₀, s.t. Ψ_tr α = x_te) is relaxed to an ℓ1-minimization, it can be formulated as

    min_α ‖α‖₁,  s.t.  Ψ_tr α = x_te    (1)

where x_te is the test pixel, and α is the vector of sparse coefficients. No universally best method exists for all scenarios [26], [27]. When the SRC methodology is extended to HSI classification, the special characteristics of HSIs have to be considered. HSIs are characterized by high observation noise and spatial nonstationarity, whereas the SRC assumes that class-specific samples lie in low-dimensional subspaces and that a test sample can be specified as a sparse linear combination of the training samples [24]. Hence, for an HSI, class-specific subspaces heavily interfere with one another. If we use the SRC directly, the yielded nonzero coefficients will spread across all of the classes, implying weak discriminability. The observation noise and the spatial nonstationarity thus have a drastic impact on the SRC; if they can be reduced, the SRC is expected to gain enhanced discriminability. Moreover, to classify a sample, the SRC utilizes the sparsest coefficients, which are in theory solved by an NP-hard ℓ0-minimization and are considered the objective for discriminating [11], [12], [24], [25]. If a convex ℓ1-minimization is used

0196-2892 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


in the SRC for computational tractability, the equivalence of the ℓ0- and ℓ1-minimizations (i.e., the sparsity recoverability of an ℓ1-minimization in this paper) must be evaluated. As hyperspectral data might be of high coherence,¹ the classical mutual coherence (MC)/cumulative coherence (CC) conditions [28]–[30] are no longer valid for evaluating the sparsity recoverability; thus, the principal premise of the ℓ1-minimization-based SRC for HSIs cannot be established.

The contributions of our paper are as follows.

1) We propose a spectral–spatial classification method for HSIs using a spatial translation-invariant wavelet (STIW)-transform-based sparse representation (STIW-SR). In a STIW-SR, both the dictionary and the analyzed signal are formed by STIW features. We prove that a STIW can reduce the spectral observation noise as well as the spatial nonstationarity while maintaining the class-specific truth spectra. These capabilities tend to gather pixels of the same class in a much lower dimensional subspace, and the separations among class-specific subspaces are expected to be greatly enhanced. Thus, given a test pixel, the STIW-SR is very likely to reach the sparsest linear combination of the training samples, leading to its high discriminating ability.

2) We develop a probabilistic approach to evaluate the sparsity recoverability of the ℓ1-minimization on HSIs. The high-coherence characteristic of HSIs prevents the application of the traditional MC/CC conditions. By constructing our evaluation method, we address two research questions. Can the sparsity recoverability on high-coherence HSIs be evaluated? Does a higher sparsity recovery probability correspond to higher classification accuracy? Our evaluation method verifies the sparsity recoverability of the ℓ1-minimization on HSIs and demonstrates that our STIW-SR classifier, which achieves a higher sparsity recovery probability, can indeed reach higher classification accuracy. In particular, our results reveal that, for HSIs, higher MC/CC conditions do not necessarily lead to a lower recovery probability.

3) In addition, we introduce a signal–interference–noise model while proving the effectiveness of the STIW transform. It is derived from the spectral mixture model in the context of HSI classification and offers a framework that is beneficial to the interpretation of the spatial dependence, the spatial nonstationarity, and the spectral noise in HSIs.

The proposed STIW-SR is significantly different from the SRC classifiers in [11], [12], and [25]. The STIW-SR conducts a sparse representation in the STIW domain and is a spectral–spatial classification method, whereas the classifier in [25] is a pixelwise method that employs the original hyperspectral pixels to form a sparse representation. In contrast to the spectral–spatial classifier in [11] and [12], which merely utilizes the spatial neighborhood of the test samples to form a joint sparsity model, the STIW-SR integrates spatial information from both the training and test samples. Compared with those classifiers, the STIW-SR offers the potential to yield significantly higher discriminability. As regards the proposed sparsity recoverability evaluation method, to the best of our knowledge, no previous study has been dedicated to evaluating the sparsity recoverability of the ℓ1-minimization on high-coherence HSIs.

The remainder of this paper is organized as follows. Section II provides an introduction to the related classification methods. The proposed spectral–spatial classification method is described in Section III. The proposed sparsity recoverability evaluation method is introduced in Section IV. Section V demonstrates the effectiveness of our classification method through experimental results on four real-world HSIs. Finally, Section VI presents the conclusions of this paper.

¹The MC/CC conditions defined in [28]–[30] are measures of the correlations among the atoms of a dictionary. In this paper, we use the term "coherence" to denote the correlation among the pixels in an HSI. High coherence of an HSI means that its pixels are highly correlated with one another and leads to high MC/CC conditions of the spectrum dictionary.

II. RELATED METHODS

The methods for spectral–spatial classification can be roughly divided into three categories. In preprocessing-based classification, the spectrum of a pixel and its spatial dependence are jointly preprocessed to extract a spectral–spatial feature, which is then fed into a classifier [as shown in Fig. 1(a)]. In postprocessing-based classification, pixelwise classification is first applied, and then the spatial dependence is used to regularize this preliminary classification [as shown in Fig. 1(b)]. In integrated classification, the spectral feature and the spatial dependence are simultaneously utilized to form an integrated classifier [as shown in Fig. 1(c)], where neither of them can be separately used in different phases.
The composite SVM (CSVM) [7], the integration of subspace multinomial logistic regression and Markov random fields (iSMLR-MRFs) [10], and the simultaneous orthogonal matching pursuit (SOMP) [11], which follow the aforementioned three schemes, respectively, are summarized as follows.

Fig. 1. Categories of the spectral–spatial classification. (a) Preprocessing-based method. (b) Postprocessing-based method. (c) Integrated classification method.

A. CSVM

Suppose that x_i^(w) is the spatial feature extracted from the neighbors of pixel x_i. The CSVM utilizes the composite kernel

    K_C(x_i, x_j) = μ K₁(x_i, x_j) + (1 − μ) K₂(x_i^(w), x_j^(w))    (2)

to conduct the SVM and hence integrates both the spectral information and the spatial information into the classification, where K₁(·, ·) and K₂(·, ·) are two kernel functions, and μ is the regularization parameter.

B. iSMLR-MRF

Initially, the multinomial logistic regression is utilized to learn the a posteriori probability distributions of the labels. Then, contextual information is included using the prior Gibbs-distribution-based MRF; thus, the maximum a posteriori segmentation, i.e.,

    ŷ = arg max_y [ Σ_{i=1}^{n} log p(y_i | x_i, ω̂) + μ Σ_{i,j∈A} δ(y_i − y_j) ]    (3)

is computed, where x_i and y_i are the ith pixel and the associated label, respectively, ω̂ is the estimate of the parameter of a class-specific subspace, A is the set of interacting pairs of pixels, and μ is the regularization parameter. Intuitively, (3) strikes a balance between the fitness of the labels to the observed pixels and the spatial smoothness of the global labeling.

C. SOMP

The SOMP harnesses the joint sparsity model to integrate the spectral information and the spatial information. Suppose that X_te = [x₁, x₂, . . . , x_T] contains T neighboring pixels of a test pixel. In each iteration step, an atom is selected in terms of

    λ_k = arg max_i ‖R_{k−1}ᵀ x_tr,i‖_p    (4)

and is added to the active set A, where R₀ = X_te, and R_{k−1} is the residual calculated in the previous iteration step. Then, the residual between X_te and its projection on A is updated by the least squares estimation. Finally, the class membership is determined by arg min_i ‖X_te − A_i Ŝ_i‖_F, where A_i and Ŝ_i are the ith class subdictionary of A and the associated coefficients, respectively.

III. PROPOSED SPECTRAL–SPATIAL CLASSIFICATION METHOD

In the proposed spectral–spatial classification method, we utilize a STIW transform to extract the spectral–spatial feature and subsequently construct the STIW-SR classifier. The intuitions for introducing a STIW are as follows: 1) the wavelet transform performs adaptive and local time-frequency decomposition

[18], [31], [32], which could superbly separate noise and extract spatially local information in HSIs; 2) unlike the orthogonal discrete wavelet transform (ODWT) [18], which has been used to extract spectral features [33], the STIW transform only satisfies the frame condition [18], [34], [35]; it is thus not maximally compact and can be more flexible when collecting useful information in HSIs; and 3) the STIW transform, which requires no downsampling, is spatially translation invariant, which implies that the original pixel locations will not be destroyed when we utilize it to integrate the spectral information and the spatial information. We will later prove the effectiveness of the STIW transform on HSIs. Based on the STIW, we design a highly discriminating SRC.
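Before developing the STIW-SR itself, the SRC decision machinery of (1) can be made concrete. The sketch below is a minimal illustration, not the paper's implementation: it solves the ℓ1 program as a linear program (basis pursuit) with SciPy and assigns a test sample to the class whose atoms give the smallest reconstruction residual. The data are synthetic toys rather than hyperspectral pixels, and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Psi, x):
    """Solve min ||a||_1 s.t. Psi @ a = x by splitting a = ap - am, ap, am >= 0."""
    n = Psi.shape[1]
    c = np.ones(2 * n)                        # objective: sum(ap) + sum(am) = ||a||_1
    A_eq = np.hstack([Psi, -Psi])             # Psi @ (ap - am) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

def src_label(Psi, x, labels):
    """SRC decision: keep the coefficients of one class at a time and pick the
    class whose atoms reconstruct x with the smallest residual."""
    a = basis_pursuit(Psi, x)
    best, best_res = None, np.inf
    for c in sorted(set(labels)):
        a_c = np.where(np.array(labels) == c, a, 0.0)
        r = np.linalg.norm(x - Psi @ a_c)
        if r < best_res:
            best, best_res = c, r
    return best

# toy dictionary: two well-separated classes, four atoms each
rng = np.random.default_rng(0)
Psi = np.hstack([rng.normal(5.0, 1.0, (50, 4)),     # class-0 atoms
                 rng.normal(-5.0, 1.0, (50, 4))])   # class-1 atoms
labels = [0, 0, 0, 0, 1, 1, 1, 1]
x = Psi[:, :4] @ np.array([0.6, 0.4, 0.0, 0.0])     # sparse mix of class-0 atoms
pred = src_label(Psi, x, labels)
print(pred)
```

Replacing the raw pixels Ψ_tr and x_te with their STIW features would turn this pixelwise SRC into the spirit of the STIW-SR; a production system would use a dedicated ℓ1 solver rather than a generic LP.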

A. Construction of STIW Transform for HSIs

Let h(m, n, l) represent a pixel at the lth band of an HSI, h(m, n) be the spectrum of this pixel as defined in (15), and h_{m,n,l} stand for the entire HSI, where (m, n) ∈ Z², and l (l = 1, . . . , N_B) denotes the index of spectral bands. The image at the lth band, h_{m,n}(l) = Σ_{(m,n)∈Z²} h_{x,y}(l) δ(x − m, y − n), can be viewed as the 2-D sampling of a finite-energy function h_{x,y}(l) ∈ L²(R²), where δ(·, ·) represents the unit impulse, and (x, y) ∈ R².

A dyadic wavelet transform (DYWT) is translation invariant, wherein the time-domain translation is not downsampled [18]. We utilize an extended spatial DYWT, which requires no spatial downsampling, to construct the STIW transform for HSIs. For the spatial case, there is a smooth, differentiable scale function φ(x, y); thus, ψ^(x)(x, y) = dφ(x, y)/dx and ψ^(y)(x, y) = dφ(x, y)/dy are the wavelet functions along the x- and y-directions, respectively. By dilating with {2^j}_{j∈Z} and translating with m and n, the function sequences {φ_{j,m,n}}_{(j,m,n)∈Z³} and {ψ^(·)_{j,m,n}}_{(j,m,n)∈Z³} are generated, where φ_{j,m,n} = (1/2^j) φ((x/2^j) − m, (y/2^j) − n), and ψ^(·)_{j,m,n} = (1/2^j) ψ^(·)((x/2^j) − m, (y/2^j) − n). Such sequences span the approximation space V_j = span{φ_{j,m,n}}_{(m,n)∈Z²} and the detail spaces W^(·)_j = span{ψ^(·)_{j,m,n}}_{(m,n)∈Z²}, where V_{j+1} ⊂ V_j, V_{j+1} ∪ W^(x)_{j+1} ∪ W^(y)_{j+1} = V_j, and V_{−∞} = L²(R²). Thus, a function in V_{j+1} or W^(·)_{j+1} is also in V_j, and we find that

    φ_{j+1,0,0} = Σ_{m,n} g(m, n) φ_{j,m,n}    (5)

    ψ^(·)_{j+1,0,0} = Σ_{m,n} g′(m, n) φ_{j,m,n}    (6)

where g(m, n) and g′(m, n) are the coefficients of the linear combinations. Then, in the HSI scenario, we can project a band h_{x,y}(l) onto the jth scale along the spatial dimension with the inner products ⟨h_{x,y}(l), ψ^(x)_{j,m,n}⟩, ⟨h_{x,y}(l), ψ^(y)_{j,m,n}⟩, and ⟨h_{x,y}(l), φ_{j,m,n}⟩, which represent the spatial details along the x- and y-directions and the spatial approximation at the jth scale, respectively. To retain spatial translation invariance, m and n are not downsampled, in contrast to the ODWT. Thus, {ψ^(·)_{j,m,n}}_{(m,n)∈Z²} and {φ_{j,m,n}}_{(m,n)∈Z²} form the frames of W^(·)_j and V_j, respectively, where ⋃_{j=−∞}^{J} W^(x)_j ∪ ⋃_{j=−∞}^{J} W^(y)_j ∪ V_J = L²(R²). Accordingly, {ψ^(x)_{j,m,n}, ψ^(y)_{j,m,n}}_{j=−∞,...,J, (m,n)∈Z²} and {φ_{J,m,n}}_{(m,n)∈Z²} build a frame of L²(R²) that can represent h_{x,y}(l) in a stable and complete manner. Then, we have

    h_{x,y}(l) = Σ_{(m,n)∈Z²} ⟨h_{x,y}(l), φ_{J,m,n}⟩ φ̄_{J,m,n}
               + Σ_{(m,n)∈Z²} Σ_{j=−∞,...,J} ⟨h_{x,y}(l), ψ^(x)_{j,m,n}⟩ ψ̄^(x)_{j,m,n}
               + Σ_{(m,n)∈Z²} Σ_{j=−∞,...,J} ⟨h_{x,y}(l), ψ^(y)_{j,m,n}⟩ ψ̄^(y)_{j,m,n}    (7)

where φ̄_{j,m,n} and ψ̄^(·)_{j,m,n} are the dual frames with respect to φ_{j,m,n} and ψ^(·)_{j,m,n}, respectively. We will employ the projection ⟨h_{x,y}(l), φ_{j,m,n}⟩ to construct our STIW.

Directly calculating ⟨h_{x,y}(l), φ_{j,m,n}⟩ is inconvenient since we would have to interpolate the observed h_{m,n}(l) to obtain h_{x,y}(l). As ⟨h_{x,y}(l), φ_{j,m,n}⟩ = h_{x,y}(l) ∗ φ_{j,−m,−n}, applying the Fourier transform to both sides of this equation leads to

    A_j(ω_x, ω_y) = Σ_{k=−∞}^{+∞} H_l(ω_x + 2kπ, ω_y + 2kπ) Φ_j(ω_x + 2kπ, ω_y + 2kπ)    (8)

where A_j(·, ·), H_l(·, ·), and Φ_j(·, ·) are the Fourier transforms of ⟨h_{x,y}(l), φ_{j,m,n}⟩, h_{x,y}(l), and φ_{j,−m,−n}, respectively. Similarly, applying the Fourier transform to both sides of (5) results in

    Φ_{j+1}(ω_x, ω_y) = (1/2) G(2^j ω_x, 2^j ω_y) Φ_j(2^j ω_x, 2^j ω_y)    (9)

where G(·, ·) is the Fourier transform of g(m, n). Thus, the relation A_{j+1}(ω_x, ω_y) = G(2^j ω_x, 2^j ω_y) A_j(ω_x, ω_y) can be found. Then, taking the inverse Fourier transform of both sides of this relation yields

    ⟨h_{x,y}(l), φ_{j+1,m,n}⟩ = ⟨h_{x,y}(l), φ_{j,m,n}⟩ ∗ g_(j)    (10)

where g_(j) denotes the filter obtained after inserting 2^j − 1 zeros between the neighboring samples of g(−m, −n). Viewing a band of an HSI as the approximation of h(x, y, l) at the 0th scale, we have

    ⟨h_{x,y}(l), φ_{j,m,n}⟩ = ⟨h_{x,y}(l), φ_{0,m,n}⟩ ∗ g_(0) ∗ ··· ∗ g_(j−1)
                            = ⟨h_{x,y}(l), δ(x − m, y − n)⟩ ∗ g_(0) ∗ ··· ∗ g_(j−1)
                            = h_{m,n}(l) ∗ g_(0) ∗ ··· ∗ g_(j−1).    (11)

Finally, the HSI h_{m,n,l} can be mapped into a STIW scale space by

    φ_j h_{m,n,l} = (1/2^j) ( h_{m,n}(1) ∗ g_(0) ∗ ··· ∗ g_(j−1)
                              h_{m,n}(2) ∗ g_(0) ∗ ··· ∗ g_(j−1)
                              ⋮
                              h_{m,n}(N_B) ∗ g_(0) ∗ ··· ∗ g_(j−1) )    (12)

where the coefficient 1/2^j is used to maintain the truth spectrum (as shown in the following section). For convenience, we hereafter abbreviate the right-hand side of (12) as (1/2^j) h_{m,n,l} ∗ g_(0) ∗ ··· ∗ g_(j−1). The scale function used in the STIW could be the cubic spline, owing to its computational convenience and its excellent ability to separate high-order singularities [18], [36], with the corresponding wavelet function being the quadratic spline. The STIW transform is summarized in Algorithm 1.

Algorithm 1 STIW transform for HSIs

1) Input: an HSI h_{m,n,l} and the filter g(m, n) determined by (9).
2) Insert 2^j − 1 zeros between the neighboring samples of g(−m, −n) to generate g_(j).
3) Repeat: apply the scale approximation at the lth band as follows:

    φ_j h_{m,n}(l) = (1/2^j) h_{m,n}(l) ∗ g_(0) ∗ ··· ∗ g_(j−1)    (13)

Until l = N_B.
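Algorithm 1 can be sketched numerically as follows. Each band is filtered with zero-inserted ("à trous") copies of a base filter and never spatially downsampled, so pixel locations are preserved. This is a hedged sketch, not the paper's implementation: a normalized B3-spline kernel (sum 1) is assumed as a stand-in for the paper's g(m, n), which folds the 1/2^j normalization of (12) into the filter itself, and the cube layout (rows, cols, bands) and function names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def atrous_kernel(base, j):
    """Insert 2^j - 1 zeros between neighboring taps of a 1-D filter (step 2)."""
    k = np.zeros((len(base) - 1) * 2**j + 1)
    k[::2**j] = base
    return k

def stiw_approximation(cube, levels):
    """Spatial scale approximation of every band of an HSI cube
    (rows, cols, bands); no downsampling, so the output keeps the cube's shape."""
    base = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # normalized B3-spline taps
    out = cube.astype(float)
    for j in range(levels):
        k = atrous_kernel(base, j)
        for axis in (0, 1):                              # separable x/y filtering
            shape = [1, 1, 1]
            shape[axis] = len(k)
            out = convolve(out, k.reshape(shape), mode='nearest')
    return out

# toy check: a spatially constant "truth spectrum" plus white noise; filtering
# keeps the truth spectrum (the kernel sums to 1) while shrinking the noise
rng = np.random.default_rng(1)
truth = np.linspace(1.0, 2.0, 8)                         # 8 bands
cube = truth[None, None, :] + 0.3 * rng.normal(size=(32, 32, 8))
approx = stiw_approximation(cube, levels=2)
err_before = np.abs(cube - truth).mean()
err_after = np.abs(approx - truth).mean()
print(err_after < err_before)
```

The toy check anticipates the property proved in the next subsection: under spatially constant truth spectra, the scale approximation suppresses the additive noise while leaving the spectrum intact.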



B. Effectiveness of STIW on Reducing Spectral Noise and Spatial Nonstationarity

Based on an introduced signal–interference–noise model, we prove that the STIW transform is able to reduce both the spectral noise and the spatial nonstationarity in HSIs while maintaining the class-specific truth spectrum.

1) Signal–Interference–Noise Model: Normally, spectral mixture models are utilized to formulate HSI data [37]–[39]. Without loss of generality, consider the bilinear mixture model (BMM)

    h = Σ_{i=1}^{R} α_i s_i + Σ_{i=1}^{R−1} Σ_{j=i+1}^{R} β_i s_i s_j + v

where s_i stands for the ith endmember, s_i s_j represents the Hadamard product of the ith and jth endmembers, α_i and β_i are the associated abundance fractions, and v denotes the observation or sensor noise, which can be formulated as a zero-mean Gaussian white noise sequence [39], [40]. Here, h(m, n), s_i(m, n), and v(m, n) are shortened as h, s_i, and v, respectively, for brevity. If the interactions among the lights of distinct endmember materials are ignored, the BMM degenerates to the linear mixture model.

In the context of HSI classification, each class can be characterized by a set of endmember spectra and the associated set of abundance fractions, since the assumption that a single endmember can exclusively represent a class is generally far from reality [41]. In real HSIs, the abundance fraction of a class varies across distinct pixels and thus can be formulated with random variables. Therefore, for each class, there exists an expectation spectrum (or, loosely speaking, a truth spectrum), with the observed spectra centering around it. In this situation, the aforementioned BMM can be reformulated as

    h = Σ_{i=1}^{R} ᾱ_i s_i + Σ_{i=1}^{R−1} Σ_{j=i+1}^{R} β̄_i s_i s_j
      + Σ_{i=1}^{R} (α_i − ᾱ_i) s_i + Σ_{i=1}^{R−1} Σ_{j=i+1}^{R} (β_i − β̄_i) s_i s_j + v    (14)

where ᾱ_i and β̄_i are the expectation fraction of an endmember material and that of the interaction between various endmembers, respectively. Then, we obtain h = s + Σ_i d_i + v, where s = Σ_{i=1}^{R} ᾱ_i s_i + Σ_{i=1}^{R−1} Σ_{j=i+1}^{R} β̄_i s_i s_j is the truth spectrum, and d_i is the additive spectrum deviation resulting from the uncertainty of the abundance fractions, which is highly correlated over neighboring spectral bands. According to the central limit theorem and several of its variants, the sum of random variables is close to the Gaussian distribution [42], [43]. Therefore, the term Σ_i d_i on the right-hand side of the aforementioned equation can be approximated by a Gaussian random quantity with a nondiagonal covariance. Then, pixel h(m, n) can be expressed with the following signal–interference–noise model:

    h(m, n) = s(m, n) + u(m, n) + v(m, n)    (15)

where the signal term s(m, n) denotes the truth spectrum of a class, and u(m, n) stands for the interference following the zero-mean Gaussian distribution with a nondiagonal covariance matrix. In fact, under the assumption of white observation noise v(m, n), the interference term u(m, n) may also be used to accommodate some other nonwhite additive model errors. Direct observation of real hyperspectral data also suggests that the deviation of real-world spectra from their truth spectrum can be approximately attributed to two constituents, i.e., the observation noise and the interference, which have different appearance patterns (as shown in Fig. 2).

Fig. 2. Observation noise and interference in spectra. (a) Original spectrum. (b) Spectra with noise removed. The red curve in (b) represents the truth spectrum. The observation noise appearing in the original spectrum is connected to the rapid variation across neighboring bands, implying that it can be approximated by white noise. The interference is the difference between the noise-removed spectrum and the truth spectrum, which is associated with the spectrum deviation that is highly correlated over neighboring bands.

Consider the spectrum deviation t(m, n) = u(m, n) + v(m, n). For any given interference u(m, n), we have the conditional probability density p(t|u) = N(u, Λ), where Λ is the covariance matrix of the Gaussian distribution N(u, Λ). Thus, u(m, n) can be viewed as the mean of the deviation t(m, n). Such a first-order moment varies over distinct pixels [as shown in Fig. 2(b)], thus generating the "spatial nonstationarity" of an HSI mentioned earlier in this paper. This nonstationarity formulation is actually from the perspective of Bayesian inference and is somewhat similar to the Bayesian curve fitting in [44], wherein the nonstationarity of a target value is controlled by a random Gaussian weight.

We can further justify our signal–interference–noise model with real-world data. Take the Indian Pines image, for instance. We can easily calculate the sample covariance matrices of each class and find that they are all nondiagonal. Such nondiagonality indicates that interference that is correlated across distinct bands really exists, provided that the observation noise is white and the mean of a class approximates the truth spectrum of the class. Furthermore, consider the aforementioned deviation term t(m, n). If t(m, n) ∼ N(0, Γ + Λ), where Γ is the covariance matrix connected to the interference, then the quadratic form tᵀ(Γ + Λ)⁻¹t follows the central chi-square distribution with N_B degrees of freedom (denoted as the χ²_{N_B} distribution) [45], [46]. Fig. 3 shows the normalized histogram of the quadratic forms of real HSI pixels versus the probability distribution of the χ²_{N_B} distribution. This figure suggests that t(m, n) approximately follows the distribution N(0, Γ + Λ) and thus verifies the zero-mean Gaussianity of the interference under the Gaussian observation noise.

Our signal–interference–noise model (15) is beneficial to the interpretation of the spatial context, the spatial nonstationarity, and the spectral noise in HSIs. Under such a model, the spatial context/dependence in an HSI implies that spatially neighboring



Fig. 3. Distribution of the quadratic forms of real spectra versus the chi-square probability distribution. (a)–(i) Classes 1–9. The blue bins represent the rate of the quadratic-form samples falling into a certain interval, whereas the red curve stands for the probability distribution calculated with 10 000 000 samples generated from the χ²_{N_B} distribution.

pixels are likely to have the same truth spectrum, whereas the interference and the noise vary randomly across these pixels. In the following section, we will show that, after an HSI goes through the STIW transform, this spatial characteristic leads to the reduction of both the spatial nonstationarity and the spectral noise but the preservation of the class-specific truth spectrum. This implies the enhancement of the separation among class-specific subspaces and thus offers the potential to achieve a highly discriminative SRC. Therefore, the benefits of our signal–interference–noise model are justified.

2) Effectiveness of STIW Transform: Taking the integrals of both sides of (5) yields

    ∫_{R²} φ_{j+1,0,0} dx dy = (1/2^j) Σ_{m,n} g(m, n) ∫_{R²} φ((x/2^j) − m, (y/2^j) − n) dx dy
                             = (1/2) ∫_{R²} φ_{j+1,0,0} dx dy · Σ_{m,n} g(m, n)    (16)

which implies that Σ_{m,n} g(m, n) = 2. Let f(m, n) = g_(0) ∗ ··· ∗ g_(j−1) denote the equivalent filter at the jth scale; then

    Σ_{m,n} f(m, n) = 2^j.    (17)

Since spatially neighboring pixels are likely to share the truth spectrum s(m, n), which is thus locally constant over the filter support, convolving model (15) with f(m, n) gives the unnormalized scale approximation

    φ_j h(m, n) = [s(m, n) + u(m, n) + v(m, n)] ∗ f(m, n)
                = 2^j s(m, n) + u(m, n) ∗ f(m, n) + v(m, n) ∗ f(m, n).    (18)

Then, we obtain

    (1/2^j) φ_j h(m, n) = s(m, n) + (1/2^j) u(m, n) ∗ f(m, n) + (1/2^j) v(m, n) ∗ f(m, n).    (19)
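The chi-square diagnostic behind Fig. 3 can be sketched numerically. This is a hedged illustration: the covariances Γ and Λ below are synthetic stand-ins (in the paper, they come from real class statistics), N_B = 20 is an assumed band count, and the quadratic form is taken with the inverse covariance so that it is χ²-distributed.

```python
import numpy as np

rng = np.random.default_rng(2)
NB = 20                                    # number of spectral bands (assumed)
A = rng.normal(size=(NB, NB))
Gamma = 0.1 * A @ A.T                      # nondiagonal interference covariance
Lam = 0.05 * np.eye(NB)                    # white observation-noise covariance
cov = Gamma + Lam

# deviations t ~ N(0, Gamma + Lambda); q = t' (Gamma + Lambda)^{-1} t
t = rng.multivariate_normal(np.zeros(NB), cov, size=20000)
q = np.einsum('ij,jk,ik->i', t, np.linalg.inv(cov), t)

# a central chi-square with NB degrees of freedom has mean NB and variance 2*NB
print(abs(q.mean() - NB) < 0.5, abs(q.var() - 2 * NB) < 5.0)
```

Matching the first two moments of χ²_{N_B} is exactly the behavior the normalized histograms in Fig. 3 display for the real classes.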



The proof then proceeds by expanding the trace of the covariance of the STIW output bandwise, Trace{cov(h₁)} = Σ_{l=1}^{N_B} E{·}, to quantify the noise reduction; the source text is truncated at this point.