Target Detection with a Contextual Kernel Orthogonal Subspace Projection

Luca Capobianco^a and Gustavo Camps-Valls^b

^a Dept. Ingegneria dell'Informazione, Università degli Studi di Siena, Italy.
^b Dept. Enginyeria Electrònica, Universitat de València, Spain.

Further author information: Gustavo Camps-Valls, Dept. Enginyeria Electrònica, Escola Tècnica Superior d'Enginyeria, Universitat de València, C/ Dr. Moliner, 50, 46100 Burjassot (València), Spain. E-mail: [email protected], http://www.uv.es/gcamps. Telephone: +34963160197.
ABSTRACT

The Orthogonal Subspace Projection (OSP) algorithm is essentially a matched filter that requires the evaluation of a prototype for each class to be detected. The kernel OSP (KOSP) has recently demonstrated improved results for target detection in hyperspectral images. The use of kernel methods helps to combat the high dimensionality problem and makes the method robust to noise. This paper incorporates contextual information into KOSP through a family of composite kernels of tunable complexity. The good performance of the proposed methods is illustrated in hyperspectral image target detection problems. The information contained in the kernel and the induced kernel mappings is analyzed, and bounds on generalization performance are given.

Keywords: Target detection, Orthogonal Subspace Projection (OSP), Kernel methods, Composite Kernel, Spatial, Contextual information.
1. INTRODUCTION

The field of remote sensing image classification comprises different machine learning paradigms: supervised, unsupervised, and semi-supervised. In supervised classification, the user is given a number of labeled pixels belonging to different classes to develop a model that extrapolates well to unseen situations. Unsupervised learning does not use any labeled information and tries to find clusters in the data. Lately, high interest has been paid to supervised methods with information only about a class of interest, and several classes of approaches have been devised: 1) one-class classification, where one tries to detect one class and reject the others; 2) anomaly detection, where one tries to identify pixels differing significantly from the background; and 3) target detection, where the target spectral signature is assumed to be known, and the goal is to detect pixels that match the target. In this paper we focus on the target detection problem.

Target detection is of great interest in many applications, and several techniques have been proposed in the literature, such as the Reed-Xiaoli anomaly detector,1 the orthogonal subspace projection (OSP),2 the Gaussian mixture model,3 the cluster-based detector,3 and the signal subspace processor.4 All of them assume a parametric (linear or Gaussian mixture) model. In recent years, many detection algorithms based on spectral matched (subspace) filters have been reformulated under the kernel methods framework: the matched subspace detector (MSD), the orthogonal subspace detector (OSD), the spectral matched filter (SMF), and adaptive subspace detectors (ASD).5 Certainly, the use of kernel methods offers many advantages: they combat the high dimensionality problem in hyperspectral images, make the method robust to noise, and allow for flexible non-linear mappings with controlled (regularized) complexity.6 Kernel methods in general, and kernel detectors in particular, rely on the proper definition of a kernel (or similarity) matrix among pixels, and they have typically used the spectral information only.

This paper pays attention to the kernel OSP (KOSP) target detection algorithm. We incorporate contextual information in the detector through the introduction of a family of composite kernels, which has been successfully used in support vector machine (SVM) hyperspectral image classification,7,8 semi-supervised graph-based classification,9 and multi-source and multi-temporal remote sensing image classification.10 The proposed methodology allows the inclusion of contextual information in KOSP with different levels of sophistication and at low computational cost.

The rest of the paper is outlined as follows. Section 2 reviews the canonical expression of the linear and kernel OSP. Section 3 presents the proposed formulations for the contextual KOSP (CKOSP) algorithms. Section 4 shows the experimental results, and Section 5 concludes and outlines further work.
2. KERNEL ORTHOGONAL SUBSPACE PROJECTION

This section reviews the classical orthogonal subspace projection (OSP) method and its non-linear kernel-based version proposed in Ref. 5.
2.1. Orthogonal Subspace Projection (OSP) Algorithm

In the standard formulation of the Orthogonal Subspace Projection (OSP) algorithm,2 a linear mixing model is assumed for each B-band pixel:

r = Mα + n,    (1)

where M is the mixing matrix of size (B × p), α is a (p × 1) column vector of coefficients that account for the abundances of the p endmember spectra contributing to the mixed pixel r, and n stands for an additive zero-mean Gaussian noise vector.
In order to identify one particular signature in the image, given its spectral signature d with corresponding abundance measurement α_p, the above expression can be reorganized by rewriting the matrix M as two submatrices, M = (U : d), so that

r = dα_p + Uγ + n.    (2)

Figure 1. The standard (linear) OSP method first performs a linear transformation that looks for projections identifying the subspace spanned by the background; background suppression is then carried out by projecting the data onto the subspace orthogonal to the one spanned by the background components. The kernel version of this algorithm consists of the same procedure performed in the kernel feature space.
The columns of U represent the undesired spectral signatures (background), while the entries of γ are the abundances of these undesired spectral signatures.
The effect of the OSP algorithm on the data set can be summarized in two steps (see Fig. 1). First, an annihilating operator rejects the background signatures of each pixel, so that only the desired signature should remain in the spectral component of the data. This operator is given by the (B × B) matrix P_U^⊥ = I − UU^#, where U^# is the right pseudoinverse of U. The second step of the OSP algorithm is represented by the matched filter, w = kd, where k is a constant. The OSP operator is given by q_OSP^⊤ = d^⊤ P_U^⊥, and the output of the OSP classifier is:

D_OSP = q_OSP^⊤ r = d^⊤ P_U^⊥ r.    (3)
By using the singular value decomposition (SVD) of U = BΣA^⊤, the annihilating operator becomes P_U^⊥ = I − BB^⊤, where the columns of B are obtained from the eigenvectors of the covariance matrix of the background spectral samples.
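To fix ideas, here is a minimal NumPy sketch of the two-step linear OSP detector described above. The function name, the synthetic data, and the assumed background subspace dimension n_bg are ours, for illustration only.

```python
import numpy as np

def osp_detector(R, d, bg_samples, n_bg=3):
    """Linear OSP detection statistic of Eq. (3) for a batch of pixels.

    R          : (B, N) array of pixels to score.
    d          : (B,) target spectral signature.
    bg_samples : (B, m) background spectral samples.
    n_bg       : assumed dimension of the background subspace (illustrative).
    """
    # Eigenvectors of the background covariance span the background subspace.
    C = np.cov(bg_samples)                     # (B, B) covariance matrix
    _, eigvec = np.linalg.eigh(C)              # eigenvalues in ascending order
    B = eigvec[:, -n_bg:]                      # keep the n_bg leading eigenvectors
    P_perp = np.eye(len(d)) - B @ B.T          # annihilating operator P_U^perp = I - BB^T
    return d @ P_perp @ R                      # D_OSP = d^T P_U^perp r, per pixel

# Toy usage with synthetic data: target implanted in the first 10 pixels
rng = np.random.default_rng(0)
bands, n_pix = 50, 100
background = rng.normal(size=(bands, 200))
target = rng.normal(size=bands)
pixels = rng.normal(size=(bands, n_pix))
pixels[:, :10] += target[:, None]
scores = osp_detector(pixels, target, background)
print(scores[:10].mean() > scores[10:].mean())  # implanted pixels should score higher
```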
2.2. Kernel OSP

A non-linear kernel-based version of the OSP algorithm can be devised by defining a linear OSP in an appropriate Hilbert feature space, into which samples are mapped through a kernel mapping function Φ(·). Similarly to the previous linear case, let us define the annihilating operator in the feature space as P_Φ^⊥ = I_Φ − B_Φ B_Φ^⊤. Then, the output of the OSP classifier in the feature space is given by

D_OSPΦ = Φ(d)^⊤ (I_Φ − B_Φ B_Φ^⊤) Φ(r).    (4)
The columns of the matrix B_Φ, say b_Φ^i, are the eigenvectors of the covariance matrix of the undesired background signatures. By the Representer Theorem,6 each eigenvector in the feature space can be expressed as a linear combination of the input vectors mapped into the feature space through the function Φ(·). Hence, one can write b_Φ^j = X_bΦ β^j, where the columns of X_bΦ are the background spectral signatures mapped into the feature space, while the columns of X_b are the input spectral signatures; the β^j are the eigenvectors of the (centered) Gram matrix K(X_b, X_b), normalized by the square root of their corresponding eigenvalues. The kernelized version of (4) is given by

D_KOSP = K(X_bd, d)^⊤ Υ Υ^⊤ K(X_bd, r) − K(X_b, d)^⊤ B B^⊤ K(X_b, r),    (5)
where K(X_b, r) and K(X_b, d), referred to as empirical kernel maps in the machine learning literature, are column vectors whose entries are k(x_i, r) and k(x_i, d) for x_i ∈ X_b (x_i ∈ R^B), i = 1, ..., l, where l is the number of labeled samples; B is the matrix containing the eigenvectors β^j described above; and Υ is the matrix containing the eigenvectors υ^j, analogous to the β^j but obtained from the centered kernel matrix K(X_bd, X_bd), where X_bd = X_b ∪ {d}. It is convenient in the following to rewrite the above expression as
D_ω = K_ω(X_bd, d)^⊤ Υ_ω Υ_ω^⊤ K_ω(X_bd, r) − K_ω(X_b, d)^⊤ B_ω B_ω^⊤ K_ω(X_b, r),    (6)

where the subscript 'ω' indicates that the computation of the kernel matrices involves only the spectral information.
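For concreteness, the following is a minimal sketch of the KOSP statistic of Eq. (6) with an RBF kernel. The helper names, the Gram-matrix centering step, and the number of retained eigenvectors n_comp are our assumptions; the authors' MATLAB implementation (linked in Section 4.3) is the reference code.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for rows of X (n, B) and Y (m, B)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def centered_eigvecs(K, n_comp):
    """Leading eigenvectors of the centered Gram matrix, scaled by 1/sqrt(eigenvalue)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    w, V = np.linalg.eigh(H @ K @ H)                 # ascending eigenvalues
    w, V = w[-n_comp:], V[:, -n_comp:]               # keep the leading components
    return V / np.sqrt(np.maximum(w, 1e-12))         # the beta^j (or upsilon^j) vectors

def kosp_score(r, d, Xb, sigma=1.0, n_comp=5):
    """KOSP detection statistic D_omega of Eq. (6) for a single pixel r (shape (B,))."""
    Xbd = np.vstack([Xb, d])                         # X_bd = X_b united with {d}
    B = centered_eigvecs(rbf_kernel(Xb, Xb, sigma), n_comp)
    Ups = centered_eigvecs(rbf_kernel(Xbd, Xbd, sigma), n_comp)
    k_bd_d = rbf_kernel(Xbd, d[None, :], sigma)      # empirical kernel maps
    k_bd_r = rbf_kernel(Xbd, r[None, :], sigma)
    k_b_d = rbf_kernel(Xb, d[None, :], sigma)
    k_b_r = rbf_kernel(Xb, r[None, :], sigma)
    score = k_bd_d.T @ Ups @ Ups.T @ k_bd_r - k_b_d.T @ B @ B.T @ k_b_r
    return score.item()
```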
3. CONTEXTUAL KOSP

The performance of the method strongly depends on the definition of the kernel structural form, which can be cast as a similarity (or distance) measure among samples. Surprisingly, kernel target detectors in the literature have so far considered the spectral information alone, ignoring the contextual information of the target pixels. Including contextual or spatial information in any image classifier has the positive effects of regularizing the classification maps and, at the same time, making the training process more stable and reliable. In the following, we review the framework of composite kernels, and subsequently indicate how different contextual KOSP target detectors can easily be built.
3.1. Framework of Composite Kernels

In order to incorporate the spatial context into kernel-based classifiers, a pixel entity x_i is redefined simultaneously both in the spectral domain using its spectral content, x_i^ω ∈ R^{N_ω}, and in the spatial domain by applying some feature extraction to its surrounding area, x_i^s ∈ R^{N_s}, which yields N_s spatial (contextual) features. These separate entities can be combined either in the original feature space or in the high-dimensional kernel feature space, giving rise to kernel methods of different levels of sophistication, as follows (a code sketch of the simplest compositions is given at the end of this subsection):

1. Stacked features approach. Let us define the mapping Φ as a transformation of the concatenation x_i ≡ {x_i^s, x_i^ω}; then the corresponding 'stacked' kernel matrix is:

k_{s,ω} ≡ k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩,    (7)

which does not include explicit cross relations between x_i^s and x_j^ω.
2. Direct summation kernel. Let us assume two nonlinear transformations ϕ_1(·) and ϕ_2(·) into Hilbert spaces H_1 and H_2, respectively. Then, the following transformation can be constructed:

Φ(x_i) = {ϕ_1(x_i^s), ϕ_2(x_i^ω)}    (8)

and the corresponding dot product can be easily computed as follows:

k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩ = ⟨{ϕ_1(x_i^s), ϕ_2(x_i^ω)}, {ϕ_1(x_j^s), ϕ_2(x_j^ω)}⟩ = k_s(x_i^s, x_j^s) + k_ω(x_i^ω, x_j^ω)    (9)

Note that the solution is expressed as the sum of positive definite matrices accounting for the contextual and spectral counterparts, independently. Also, note that dim(x_i^ω) = N_ω, dim(x_i^s) = N_s, and dim(k) = dim(k_s) = dim(k_ω) = n × n.
3. The weighted summation kernel. By exploiting properties of Mercer's kernels, a composite kernel that balances the spatial and spectral content in (9) can also be created, as follows:

k(x_i, x_j) = µ k_s(x_i^s, x_j^s) + (1 − µ) k_ω(x_i^ω, x_j^ω)    (10)
where µ is a positive real-valued free parameter (0 < µ < 1), which is tuned in the training process and constitutes a trade-off between the spatial and spectral information used to classify a given pixel.

4. Cross-information kernel. The previously addressed kernel classifiers can be conveniently modified to account for the cross-relationship between the spatial and spectral information. Assume a nonlinear mapping ϕ(·) to a Hilbert space H and three linear transformations A_k from H to H_k, for k = 1, 2, 3. Let us construct the following composite vector:

Φ(x_i) = {A_1 ϕ(x_i^s), A_2 ϕ(x_i^ω), A_3 (ϕ(x_i^s) + ϕ(x_i^ω))}    (11)
and compute the dot product

k(x_i, x_j) = ϕ(x_i^s)^⊤ R_1 ϕ(x_j^s) + ϕ(x_i^ω)^⊤ R_2 ϕ(x_j^ω) + ϕ(x_i^s)^⊤ R_3 ϕ(x_j^ω) + ϕ(x_i^ω)^⊤ R_3 ϕ(x_j^s)    (12)

where R_1 = A_1^⊤ A_1 + A_3^⊤ A_3, R_2 = A_2^⊤ A_2 + A_3^⊤ A_3, and R_3 = A_3^⊤ A_3 are three independent positive definite matrices. Similarly to the direct summation kernel, it can be demonstrated that (12) can be expressed as the sum of positive definite matrices, accounting for the contextual, spectral, and cross-terms between contextual and spectral counterparts:

k(x_i, x_j) = k_s(x_i^s, x_j^s) + k_ω(x_i^ω, x_j^ω) + k_sω(x_i^s, x_j^ω) + k_ωs(x_i^ω, x_j^s)    (13)
5. Kernels for improved versatility. Note also that one can build a full family of kernel compositions to account for cross-information between spatial and spectral features. For instance, one could think of the following combination of kernels for improved versatility:

k(x_i, x_j) = k_s(x_i^s, x_j^s) + k_ω(x_i^ω, x_j^ω) + k_{s,ω}(x_i, x_j),    (14)

which combines the summation kernel and the stacked approach (x_i ≡ {x_i^s, x_i^ω}). Similarly, another possibility is to construct the kernel:

k(x_i, x_j) = k_s(x_i^s, x_j^s) + k_ω(x_i^ω, x_j^ω) + k_sω(x_i^s, x_j^ω) + k_ωs(x_i^ω, x_j^s) + k_{s,ω}(x_i, x_j)    (15)
which combines the cross-information and the stacked vector approaches in one similarity matrix. These composite kernels were originally introduced in Ref. 11 for SVM-based non-linear system identification problems, and they were extended to a sparse Bayesian framework in Ref. 12. Contextual and spectral information was combined for kernel-based8 and graph-based9 hyperspectral image classification, while in Ref. 10 the methodology was extended to integrate multi-source remote sensing data for multi-temporal image classification and change detection.
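As a concrete illustration, here is a minimal sketch of the first three compositions (the stacked, summation, and weighted kernels of Eqs. (7), (9), and (10)). The RBF base kernel matches the one used in Section 4.3; the function names and the separate widths are our own choices.

```python
import numpy as np

def rbf(X, Y, sigma):
    """Base RBF kernel matrix between sample sets X (n, d) and Y (m, d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def stacked_kernel(Xs, Xw, Ys, Yw, sigma):
    """Eq. (7): a single kernel on the concatenated {spatial, spectral} vectors."""
    return rbf(np.hstack([Xs, Xw]), np.hstack([Ys, Yw]), sigma)

def summation_kernel(Xs, Xw, Ys, Yw, sigma_s, sigma_w):
    """Eq. (9): sum of a spatial kernel and a spectral kernel."""
    return rbf(Xs, Ys, sigma_s) + rbf(Xw, Yw, sigma_w)

def weighted_kernel(Xs, Xw, Ys, Yw, sigma_s, sigma_w, mu):
    """Eq. (10): convex combination; mu trades off spatial vs. spectral content."""
    return mu * rbf(Xs, Ys, sigma_s) + (1 - mu) * rbf(Xw, Yw, sigma_w)
```

Each of these matrices is positive semi-definite by construction and can directly replace the spectral kernel K_ω in Eq. (6), which is exactly the strategy formalized in Section 3.2.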
3.2. Contextual KOSP with Composite Kernels

According to the previous description, the explicit form of the contextual KOSP is simply obtained from equation (6) by replacing the spectral kernel K_ω with a composite kernel K computed in one of the ways described above:

D_ω,s = K(X_bd, d)^⊤ Υ Υ^⊤ K(X_bd, r) − K(X_b, d)^⊤ B B^⊤ K(X_b, r),    (16)

where the kernel arguments now contain both spectral and contextual information. Note that this strategy yields a full family of contextual KOSP detectors of different versatility. Besides, note that each kernel in the composition may have a different free parameter to be tuned, and thus the intrinsic complexity of tuning the hyperparameters grows. In fact, several Rademacher bounds on generalization performance can be derived for composite kernels,13–15 which demonstrate that the complexity of the machine grows sublinearly with the number of basic kernels used in the composition (see Section 4.5 for an empirical analysis of these issues).
4. EXPERIMENTAL RESULTS

A family of contextual versions of the KOSP algorithm is proposed in this work, in which the surrounding information of the target is included in the standard algorithm by means of composite kernels.8 Results are illustrated in different problems of hyperspectral target detection. In this section, we describe the hyperspectral data used, and pay attention to the experimental setup, model development issues, and the obtained results. We compare OSP, KOSP, and several contextual KOSP detectors through the use of ROC curves. We also pay attention to the selection of free parameters, and to the complexity and generalization capabilities of the different kernel methods.
4.1. Hyperspectral Data

The dataset was acquired by the AVIRIS instrument over the Kennedy Space Center (KSC), Florida (USA), on March 23, 1996. AVIRIS acquires data in 224 bands of 10 nm width with wavelengths from 400 to 2500 nm. The data were acquired from an altitude of 20 km and have a spatial resolution of 18 m. After removing low-SNR and water absorption bands, a total of 176 bands remains for analysis. The wetlands of the Indian River Lagoon (IRL) system, located on the western coast of the KSC, are a critical habitat for several species of waterfowl and aquatic life. The test site for this research consists of a series of impounded estuarine wetlands of the northern Indian River Lagoon that reside on the western shore of KSC. Detection of land cover in this environment is difficult due to the similarity of the spectral signatures of certain vegetation types. The dataset contains 13 labeled classes representing the various land cover types of the environment. Figure 2 shows an RGB composition with the labeled classes highlighted. More information can be found at http://www.csr.utexas.edu/.

Figure 2. RGB composition of the data acquired over the Kennedy Space Center by the NASA AVIRIS instrument. Thirteen labeled classes are superimposed on the image.
4.2. Experimental Setup
Among all the available labeled classes, we focused on the most complex ones. Many class separability criteria exist in the literature, relying on a specific metric, on class PDF modeling, or on information-theoretic measures of class independence. In our case, however, the interest is to evaluate the accuracy gain when the contextual information is used. For this purpose, we ran the spectral version of KOSP for all classes using a reduced number of ∼10 randomly chosen samples, and analyzed the obtained results. These results give us an upper bound of the generalization performance per class using only the spectral information in the image. Table 1 shows the area under the curve (AUC) computed from the ROC for each target class. Results are reported in ascending order, and the three lowest accuracies are highlighted: 'Cattail marsh', the mixed 'Dark/Broadleaf' class, and 'Graminoid marsh', whose AUCs are 6–7% lower than those of the other classes. In the following, however, the analysis is extended to 'Spartina' and 'Salt marsh' in order to perform a more complete analysis, considering all the marsh classes represented in the dataset. This will in turn allow us to evaluate how the different composite kernels exploit the spatial information as a function of the significance of the spectral one.
4.3. Model Development

The spectral samples x_i^ω are, by definition, the spectral signatures of the pixels x_i. The contextual samples, x_i^s, were computed as the mean of a 3 × 3 window surrounding x_i for each band. This simple method is motivated by the assumption of locality in the spatial domain, which has previously produced good results in the context of SVMs.8 In all cases, we used the RBF kernel to construct the similarity matrices, k(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ²)). Depending on the composite kernel used, different parameters need to be tuned: σ_ω, σ_s, σ_stacked, σ_cross, and µ; cf. Section 3. All RBF kernel widths were tuned in the range σ = {10^−2, ..., 10}. In the case of the weighted summation kernel, µ was varied in steps of 0.05 in the range [0, 1]. Complementary material (MATLAB source code, demos, and datasets) is available at http://www.uv.es/gcamps/ckosp/ for interested readers.
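A minimal sketch of this contextual feature extraction is given below, assuming the image is held as a (rows, cols, bands) array; the use of SciPy's uniform filter is our choice and need not match the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contextual_features(cube, window=3):
    """Per-band mean over a (window x window) neighborhood of each pixel.

    cube : (rows, cols, bands) hyperspectral image.
    Returns an array of the same shape holding the spatial features x_i^s.
    """
    # Filter each band independently; size=(w, w, 1) leaves the spectral axis untouched.
    return uniform_filter(cube.astype(float), size=(window, window, 1))

def spectral_spatial_samples(cube, coords):
    """Build (x_i^omega, x_i^s) pairs for a list of (row, col) pixel coordinates."""
    spatial = contextual_features(cube)
    Xw = np.array([cube[r, c, :] for r, c in coords])     # spectral samples
    Xs = np.array([spatial[r, c, :] for r, c in coords])  # contextual samples
    return Xw, Xs
```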
Table 1. Area under the curve (AUC) per class using the spectral KOSP.

Class               AUC
Cattail marsh       0.7925
Dark/Broadleaf      0.8162
Graminoid marsh     0.8164
Salt marsh          0.8725
Spartina marsh      0.8736
Mud flats           0.8845
Slash pine          0.8849
Willow              0.8911
CP/Oak              0.8976
Scrub               0.9025
CP Hammock          0.9249
Hardwood swamp      0.9463
Water               0.9960
The 5211 labeled pixels were randomly split into three sets: a training set (l = 1300), a cross-validation set (1311) for free parameter tuning, and a test set (2600). Furthermore, the training and validation sets were randomly subsampled in order to use only 5% of the original data. The number of pixels used for each class is reported in Table 2. Therefore, classifiers are developed using very few training data, and by combining spectral and spatial information as described above. A one-against-all target detection scheme was adopted.
4.4. Detection Accuracy
Figure 3 shows the ROC curves for all the methods and considered target classes. It can be noticed that, in general, all kernel methods outperform the linear OSP, and that the composite KOSP methods proposed here show improved performance in all classes and ROC regions. It is also observed that for the 'Graminoid' and 'Spartina' classes, a lower performance is obtained with the most complex kernel approaches involving the cross-information kernel. On the contrary, satisfactory results are obtained with these kernels for 'Dark/Broadleaf' and 'Cattail marsh', especially for high values of true positive detections (high sensitivity). Note that, in general, simpler methods such as the stacked, summation or weighted KOSP kernels show excellent performance. These results suggest that the intrinsic spatial-spectral relations are not so complex in this problem, probably due to the high spectral sampling and the simple spatial feature extraction carried out, which introduces high linear correlation between spatial and spectral samples.

From the ROC curves, quantitative accuracy measures are derived: the area under the ROC curve (AUC), reflecting the overall detection probability, and the minimum squared distance from the ROC to the ideal point [0, 1] (∆γ). Note that the AUC is a more standard (and accurate) approach to analyze model robustness, since it measures the average sensitivity over all possible specificities. Table 2 shows the AUC and γ-points obtained for all methods and considered classes, along with the optimal free parameters selected. In general, the family of proposed contextual KOSP detectors outperforms the spectral-based approach, as a systematic gain in detection probability is observed. Table 2 shows that, in most cases, the AUC and ∆γ scores are quite similar for the CKOSP algorithms, which was also observed in the ROC curves in Fig. 3. Contextual KOSP performs equally robustly as a function of training samples or class complexity, suggesting some kind of additional regularization effect, previously observed in Ref. 9.

Differences in AUC (∆γ) between the spectral and the best contextual approaches for each class are numerically significant: gains of 6.90% (8.02%), 5.00% (5.61%), 14.89% (9.19%), 8.99% (9.76%), and 12.42% (9.39%) for the 'Graminoid', 'Spartina', 'Cattail', 'Salt', and 'Dark/Broadleaf' classes, respectively. These numerical results confirm the previous qualitative analysis of the ROC: simpler contextual models (stacked, weighted and summation+stacked) outperform the rest, but the differences are not numerically or statistically significant. The weighted summation approach outperforms the rest in terms of AUC for the 'Spartina' and 'Cattail' classes, while in terms of the γ values the stacked approach is the best method for these two classes. Moreover, greater improvements are obtained when the training samples do not actually represent the class variability. On the contrary, if the class is well represented by the training set, the spectral-based KOSP produces a higher probability of detection and, of course, the gain from including contextual information is reduced. For instance, note that in these experiments the lowest spectral KOSP results are obtained for the 'Cattail' class, whose training samples do not accurately represent it.
When using composite kernels, about 14% improvement in AUC detection accuracy is observed, probably due to the smoothing effect of using the spatial information.
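For reference, a short sketch of how the two figures of merit can be computed from detector scores; the use of scikit-learn's ROC utilities is our choice, and ∆γ follows the definition in the text (minimum squared distance from the ROC to the ideal point [0, 1]).

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_summary(labels, scores):
    """Return (AUC, Delta-gamma) for binary labels (1 = target) and detector scores."""
    fpr, tpr, _ = roc_curve(labels, scores)
    area = auc(fpr, tpr)                              # area under the ROC curve
    delta_gamma = np.min(fpr ** 2 + (1 - tpr) ** 2)   # min squared distance to (0, 1)
    return area, delta_gamma
```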
Figure 3. ROC curves (TPR, or sensitivity, versus FPR, or 1−specificity) for the different KOSP models (OSP, spectral KOSP, stacked, weighted, summation+stacked, cross-info, cross-info+stacked) and target classes: (a) Graminoid marsh, (b) Spartina marsh, (c) Cattail marsh, (d) Salt marsh, (e) Dark/Broadleaf.

Kernel method (σ_ω, σ_s, σ_stacked, σ_cross, µ)   Graminoid (10)     Spartina (12)      Cattail (10)       Salt (10)          Dark/Broadleaf (6)
Spectral KOSP (1, -, -, -, -)                     0.7965 / 0.1005    0.8579 / 0.0515    0.7464 / 0.0894    0.8562 / 0.0408    0.7961 / 0.0987
Weighted (0.8, 1.2, -, -, 0.4)                    0.8645 / 0.0587    0.9079* / 0.0349   0.8953* / 0.0503   0.9046 / 0.0121    0.8991 / 0.0664
Stacked (-, -, 2, -, -)                           0.8627 / 0.0599    0.9024 / 0.0292*   0.8850 / 0.0429*   0.9454 / 0.0116    0.9203* / 0.0485*
Summation+Stacked (0.5, 1.2, 1.85, -, -)          0.8655* / 0.0561*  0.9049 / 0.0305    0.8902 / 0.0436    0.9461* / 0.0109*  0.9068 / 0.0524
Crossinfo (0.5, 1.2, -, 2.1, -)                   0.8616 / 0.0585    0.9024 / 0.0358    0.8882 / 0.0438    0.9349 / 0.0166    0.9060 / 0.0555
Crossinfo+Stacked (0.45, 1.2, 1.8, 2.2, -)        0.8614 / 0.0587    0.9024 / 0.0361    0.8888 / 0.0439    0.9280 / 0.0119    0.9054 / 0.0566

Table 2. Results reported as AUC / ∆γ for the considered classes (number of training samples in parentheses) and the different KOSP-based target detectors: spectral KOSP and the spatial composite KOSP variants, with the selected free parameters. The best AUC and ∆γ per class are marked with an asterisk.

4.5. Analysis of Model Complexity

The proposed kernel methods do not only yield improved performance, but they can also model complex data relationships in a more natural way. Figure 4 illustrates the optimal mappings Φ performed by the different versions of KOSP.
The mappings can be computed from the optimal training kernel K_dr = VDV^⊤ = Φ^⊤Φ, that is, Φ = D^{1/2}V^⊤. The normalized energy of the off-diagonal elements of the kernel, ‖K‖_off = ‖K_ij‖_{i≠j}/‖K‖, and the total scatter energy of the approximated mapping, ‖Φ‖, can be computed as kernel quality measures.
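A small sketch of these computations follows; the eigendecomposition-based mapping is as in the text, while the use of the Frobenius norm is our assumption, since the paper does not spell out the norm used.

```python
import numpy as np

def kernel_quality(K):
    """Empirical mapping Phi = D^{1/2} V^T and the two kernel quality measures."""
    w, V = np.linalg.eigh(K)                          # K = V D V^T (symmetric PSD)
    Phi = np.sqrt(np.maximum(w, 0))[:, None] * V.T    # satisfies Phi^T Phi = K
    off = K - np.diag(np.diag(K))                     # off-diagonal part of K
    k_off = np.linalg.norm(off) / np.linalg.norm(K)   # ||K||_off (assumed Frobenius)
    return k_off, np.linalg.norm(Phi)                 # (||K||_off, ||Phi||)
```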
These empirical scores on kernel analysis can be related to the theoretical results in Refs. 13–15. Computing bounds on the generalization performance of kernel machines is an active research field. In Ref. 14, it is observed that the empirical Rademacher complexity, R, for a given K can be bounded as R ≤ √(tr(K))/(Mn), where n is the number of samples and M is the margin. Note that, for the RBF kernel, the trace of the kernel matrix grows almost linearly with the number of samples and, even if the problem is linearly separable, the margin decreases to a strictly positive constant. In Ref. 15, the empirical Rademacher risk for a composition of N kernels was bounded as R_N ≤ √N R, which states that the generalization capabilities of the kernel are limited by the number of component kernels. In Ref. 14, a tighter bound was also demonstrated for the convex hull of kernel matrices, i.e. linear combinations of kernels K = Σ_{i=1}^N d_i K_i with tr(K) = c and d ≥ 0. Essentially, the complexity can be bounded by R′_N ≤ (1/M) √(2c/n) max_{i=1,...,N} √(‖K_i‖/tr(K_i)). Therefore, the proposed criterion ‖K_dr‖_off and R′_N are equivalent if the margin is fixed: the higher the off-diagonal energy, the lower the empirical Rademacher bound.

The optimal kernels K_dr and their corresponding Φ mappings, along with the empirical and theoretical bounds, are given in Fig. 4 for all kernel methods. According to the quantitative results, the methods with the highest generalization capabilities are the stacked and the summation kernels, but the numerical differences are not significant. It must be noted that, even though ‖K‖_off and R′_N are very similar for all kernels, a higher dispersion of the mapping is observed as we move from spectral to spatio-spectral cross-information kernels. This effect gives rise to more complex (but also more scattered) feature mappings, which suggests that summing up a number of kernels may incur a risk of overfitting.
5. CONCLUSIONS AND FUTURE WORK

In this paper, we developed contextual/spatial versions of the kernel OSP algorithm through composite kernels. Performance was successfully evaluated in several hyperspectral image target detection problems. Results indicated that simple composite kernels can clearly improve the results. Further work will consider more sophisticated spatial feature extraction and testing of the algorithms in other challenging target detection problems.
ACKNOWLEDGMENTS

The authors would like to thank Dr. M. Crawford at Purdue University (USA) for making the image dataset available, and Dr. J. Muñoz-Marí at Universitat de València (Spain) for generating Fig. 2. This work has been partly supported by the Spanish Ministry of Education and Science under projects ESP2005-07724-C05-03 and CSD2007-00018.
Figure 4. Optimal kernels K_dr (top row) and corresponding Φ mappings (bottom row) for the different composite kernels: Spectral, Weighted, Stacked, Sum+Stacked, Crossinfo, and Cross+Stacked. Maps have been exponentiated for proper visualization. The table below gives the normalized energy of the off-diagonal elements of the kernel, ‖K_dr‖_off, the total scatter energy of the corresponding estimated mapping, ‖Φ_dr‖, and the complexity bounds R_N and 1/R′_N.

                 Spectral    Weighted    Stacked     Sum+Stacked   Crossinfo   Cross+Stacked
‖K_dr‖_off       0.9956      0.9962      0.9964      0.9955        0.9967      0.9967
R_N [15]         1.0000      1.4142      1.0000      1.7321        7.6557      10.8195
R′_N [14]        0.0057      0.0062      0.0064      0.0057        0.0068      0.0067
1/R′_N [14]      175.4386    161.2903    156.2500    175.4386      147.0588    149.2537
‖Φ_dr‖           15.00       16.18       16.65       25.70         34.60       38.45
REFERENCES
1. I. S. Reed and X. Yu, "Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution," IEEE Transactions on Signal Processing 38, pp. 1760–1770, Oct. 1990.
2. J. C. Harsanyi and C. I. Chang, "Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection approach," IEEE Transactions on Geoscience and Remote Sensing 32, pp. 779–785, 1994.
3. D. W. J. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaum, and A. D. Stocker, "Anomaly detection from hyperspectral imagery," IEEE Signal Processing Magazine 19, pp. 58–69, Jan. 2002.
4. K. I. Ranney and M. Soumekh, "Signal subspace change detection in averaged multilook SAR imagery," IEEE Transactions on Geoscience and Remote Sensing 44, pp. 201–213, Jan. 2006.
5. H. Kwon and N. M. Nasrabadi, "Kernel matched signal detectors for hyperspectral target detection," in CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops, p. 6, IEEE Computer Society, Washington, DC, USA, 2005.
6. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
7. G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing 43, pp. 1351–1362, June 2005.
8. G. Camps-Valls, L. Gómez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J. Calpe-Maravilla, "Composite kernels for hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters 3, pp. 93–97, Jan. 2006.
9. G. Camps-Valls, T. V. Bandos Marsheva, and D. Zhou, "Semi-supervised graph-based hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing 45, pp. 3044–3054, Oct. 2007.
10. G. Camps-Valls, L. Gómez-Chova, J. Muñoz-Marí, M. Martínez-Ramón, and J. L. Rojo-Álvarez, "Kernel-based framework for multi-temporal and multi-source remote sensing data classification and change detection," IEEE Transactions on Geoscience and Remote Sensing 46, pp. 1822–1835, June 2008.
11. M. Martínez-Ramón, J. L. Rojo-Álvarez, G. Camps-Valls, A. Navia-Vázquez, E. Soria-Olivas, and A. R. Figueiras-Vidal, "Support vector machines for nonlinear kernel ARMA system identification," IEEE Transactions on Neural Networks 17(6), pp. 1617–1622, 2007.
12. G. Camps-Valls, M. Martínez-Ramón, J. L. Rojo-Álvarez, and J. Muñoz-Marí, "Non-linear system identification with composite relevance vector machines," IEEE Signal Processing Letters 14, pp. 279–282, May 2007.
13. G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan, "Learning the kernel matrix with semidefinite programming," Journal of Machine Learning Research 5, pp. 27–72, 2004.
14. O. Bousquet and D. J. L. Herrmann, "On the complexity of learning the kernel matrix," in Advances in Neural Information Processing Systems 15: Proceedings of the 2002 Conference, 2003.
15. G. Camps-Valls, "Rademacher complexities for composite kernels," tech. rep., Dept. Enginyeria Electrònica, Universitat de València, València, Spain, 2003. Available at http://www.uv.es/gcamps.