IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 7, NO. 4, OCTOBER 2010
Spatio-Spectral Remote Sensing Image Classification With Graph Kernels

Gustavo Camps-Valls, Senior Member, IEEE, Nino Shervashidze, and Karsten M. Borgwardt
Abstract—This letter presents a graph kernel for spatio-spectral remote sensing image classification with support vector machines (SVMs). The method considers higher order relations in the neighborhood (beyond pairwise spatial relations) to iteratively compute a kernel matrix for SVM learning. The proposed kernel is easy to compute and constitutes a powerful alternative to existing approaches. The capabilities of the method are illustrated in several multi- and hyperspectral remote sensing images acquired over both urban and agricultural areas.

Index Terms—Graphs, kernel methods, spatio-spectral image classification, support vector machine (SVM).
I. INTRODUCTION

CLASSIFICATION of remote sensing images constitutes a challenging problem because of the potentially high dimensionality of images and low number of training samples, the spatial variability of the spectral signature, and the presence of noise and uncertainty in the data [1]. In this context, the use of classifiers that are robust to the dimensionality and noise is strictly necessary. This is typically guaranteed by using regularized classifiers that not only minimize a cost function but also control the complexity of the classification function. Kernel methods, in general, and support vector machines (SVMs), in particular, are a family of methods that nicely implement these properties. These methods have been successfully used in pixel-based (spectral-based) classification and are among the state-of-the-art remote sensing image classifiers [2], [3]. Despite their good performance, the obtained classification maps are, in general, very noisy, particularly when dealing with very high resolution (VHR) sensors, or in images acquired over urban areas where spatial homogeneity can no longer be assumed. One should note that an image is not a mere collection of independent and identically distributed pixels, as assumed in pixel-based image classification, but constitutes a structured domain: Spatially close pixels should intuitively belong to the same class. Hence, not only spectral but also spatial information should be included in the classifier.
Manuscript received January 5, 2010; revised February 25, 2010. Date of publication May 10, 2010; date of current version October 13, 2010. This work was supported in part by the Spanish Ministry of Education and Science under Projects TEC2009-13696, AYA2008-05965-C04-03, and CONSOLIDER/CSD2007-00018. G. Camps-Valls is with the Image Processing Laboratory, Universitat de València, 46980 València, Spain (e-mail: [email protected]). N. Shervashidze and K. M. Borgwardt are with the Max Planck Institutes, 72076 Tübingen, Germany (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2010.2046618
The field of spatio-spectral image classification is very active, particularly with the advent of VHR sensors, where robust techniques are needed to deal with the high spatial correlation between spectral responses of neighboring pixels [4]. Two basic families of spatio-spectral classifiers are encountered in the literature: 1) feature extraction (or preprocessing) techniques and 2) filtering (or postclassification) methods. In the first case, the spectral signature of a given pixel is combined with the spatial information obtained from its neighborhood through window-based approaches, such as morphological filtering [5]–[7], geometrical features [8], or Markov random fields [9], [10]. The spatial and spectral feature vectors are then jointly classified. These techniques yield good results in general, but several critical parameters need to be tuned, such as the scale and extent of the spatial relations. Even more important, however, is the fact that the quality of the feature extraction may hamper the representation capabilities of the classifier. The second approach essentially performs spatial smoothing of a pixel-based classification map and is often called postregularization. This approach is mainly based on morphological operators, such as the majority voting scheme [11]–[14]. The generally low improvement in accuracy of these methods is due to the strong dependence between the classifier's performance and the filter design. Certainly, considering spectral and spatial information jointly in the classifier training yields better results. Previous approaches to developing spatio-spectral SVM classifiers considered performing a local spatial feature extraction. The spatial feature vector is then concatenated at a pixel level with the spectral signature and subsequently used for classification. The main problem with this approach is that the curse-of-dimensionality problem is worsened, as the feature extraction process is repeated for each spectral channel.
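As a concrete (hypothetical) illustration of the window-based feature-stacking approach just described, the sketch below concatenates each pixel's spectrum with the per-band mean and standard deviation of its w × w neighborhood. All names are illustrative, and real systems would typically use morphological or geometrical features rather than these simple second-order statistics; note how the feature dimension is multiplied, worsening the curse of dimensionality.

```python
import numpy as np

def stack_spatial_features(img, w=3):
    """Illustrative window-based spatio-spectral feature stacking.

    img: (H, W, B) image cube. For each pixel, the per-band mean and
    standard deviation over a w x w window (edge-padded at borders) are
    concatenated with the spectral signature, giving (H, W, 3*B) features.
    """
    H, W, B = img.shape
    r = w // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    feats = np.empty((H, W, 3 * B))
    for i in range(H):
        for j in range(W):
            win = padded[i:i + w, j:j + w, :].reshape(-1, B)
            feats[i, j] = np.concatenate(
                [img[i, j], win.mean(axis=0), win.std(axis=0)])
    return feats

# Tiny demo: a 5x5 image with 4 bands yields 12 features per pixel.
X = stack_spatial_features(np.random.rand(5, 5, 4), w=3)
print(X.shape)  # (5, 5, 12)
```

The stacked cube would then be fed, pixel by pixel, to any standard classifier.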
The latter problem was alleviated with the introduction of composite kernels [15], in which dedicated kernels for the spectral and spatial information are combined. The framework has recently been extended to deal with convex combinations of kernels through multiple-kernel learning. In both cases, however, the methodology still relies on performing an ad hoc spatial feature extraction before kernel computation, typically limited to second-order statistics or morphological operators [16]–[18]. In this letter, we present an alternative framework for spatio-spectral classification with SVMs that alleviates the aforementioned problems. Several advantages over previous developments may be highlighted: 1) the approach takes into account higher order relations in the neighborhood of the pixels (beyond pairwise relations); 2) both spectral and spatial similarities are simultaneously computed, but only one kernel
containing both kinds of information is computed for training or testing the SVM; and 3) the involved kernel generalizes the composite kernel framework approach by computing similarities in a proper feature space. We illustrate its performance in multi- and hyperspectral images of different spatial and spectral resolutions. The remainder of this letter is organized as follows. Section II presents the novel kernel function for spatio-spectral SVM classification. Section III describes the data used in the experiments, and Section IV shows the experimental setup and analyzes the obtained results in detail. Finally, Section V concludes this letter.
Fig. 1. Illustration of (left) a toy graph on samples and (right) an associated feature map. Each sample (node) is mapped into a higher dimensional vector containing both self-description and neighbor relation entries. In this example, the mapping is explicit but can be easily generalized by means of kernels on graph nodes.
II. PROPOSED GRAPH KERNEL

This section presents a kernel approach based on graphs that avoids relying on the calculation of an arbitrary spatial feature extraction and, at the same time, guarantees that all data relations in a spatial neighborhood, beyond pairwise sample relations, are taken into consideration.

A. Limitations of Previous Spatio-Spectral Kernels

In [15], a family of alternative kernel-based approaches was presented. Essentially, kernels are composed by a weighted summation of dedicated kernels on spatial and spectral features. Different combinations were proposed. However, several limitations can be noted. First, as with other methods, composite kernels are limited by the quality of the feature extraction process and thus cannot learn all higher order similarities between neighboring samples directly. For example, using the average or variance of the spatially neighboring pixels as a feature for classification may be useful yet limited, as only second-order statistics are considered. Second, by merely looking at the similarity between samples (or between the sample and the average of the neighbors), one is implicitly assuming that the neighbors of two similar-enough samples should also be similar. This is not necessarily true in remote sensing image processing, where data may lie in complex manifolds [19]. Finally, the computational cost of the method is increased because a different kernel parameter must be tuned for each kernel in the composition.

B. Introduction to Graph Kernels

To fix notation, let us consider an undirected unweighted graph defined as a pair G = (V, E), where V is a set of vertices numbered 1 to n, with n being the size of the graph, and E ⊆ V × V is a set of edges satisfying (i, j) ∈ E ⇔ (j, i) ∈ E. Nodes i and j are neighbors if (i, j) ∈ E. The adjacency matrix of G is an n × n matrix A, with Aij = 1 if i and j are neighbors and Aij = 0 otherwise. We denote by N(i) the neighborhood of i, i.e., the set of vertices {v | (i, v) ∈ E}.

Graph kernels have recently emerged as a new and rapidly developing field of computer science [20]–[24]. There is some ambiguity associated with the term "graph kernels." Sometimes, they refer to functions that, given a pair of graphs, assess their similarity. In other contexts, graph kernels correspond to functions that measure similarity between nodes within one graph. In this letter, we will use the term "graph kernel" in the latter sense. More formally, given a graph G with vertices {1, . . . , n}, a graph kernel is a function k(i, j) = ⟨φ(i), φ(j)⟩ that takes into account the topology of the graph G. See Fig. 1 for a simple illustration.

The most prominent examples of graph kernels on nodes include the diffusion kernel [25], the p-step random walk kernel, the regularized Laplacian kernel, and the inverse cosine kernel [23]. These kernels are all based on the graph Laplacian, i.e., the matrix L = D − A, where A is the adjacency matrix of the graph and D is an n × n diagonal matrix with Dii = Σj Aij. The graph Laplacian has been previously used in remote sensing image classification [26], [27].

C. Proposed Graph Kernel Function

The computation time of most of the previous graph kernels scales as O(n³), where n is the size of the graph, because it involves multiplication or inversion of an adjacency matrix or a graph Laplacian. This is prohibitively expensive when n is large. For this reason, we here propose a scalable graph kernel for comparing pixels in images.

Definition 1: Given a graph G and a kernel k0 on its nodes (for instance, a kernel that checks for the identity of node labels), the recursive graph kernel of depth δ is defined as

    kδ(i, j) = Σ_{u ∈ N(i)} Σ_{v ∈ N(j)} kδ−1(u, v)        (1)
where N(i) is the set of neighbors of i. The recursive graph kernel in (1) can be particularized for image processing. In this setting, nodes become pixels, and the notion of vicinity in the graph is forced to be spatial. The proposed kernel recursively computes a kernel on graph nodes: At each step δ = 1, . . . , S, the kernel function (similarity) is computed first between spatial neighbors and then between the closest neighbors in the previous kernel matrix. In this way, the kernel has a multiscale structure depending on the so-called
Fig. 2. Illustration of the graph kernel in the “two-moon” toy example. Kernels obtained for different δ = {0, . . . , 4} are represented, along with the tenfold cross-validation accuracy (in brackets), when using 50 training samples per class in an SVM with an RBF kernel. Increasing the depth parameter δ gives rise to better results and smoother kernel matrices.
data structural depth parameter δ. For pixels xi and xj, the graph kernel is defined as

    kδ(xi, xj) = (1/w⁴) Σ_{m=1}^{w²} Σ_{n=1}^{w²} kδ−1(xim, xjn)        (2)
where xim ∈ Ωi and xjn ∈ Ωj, with Ωk being the spatial window of width w around pixel xk. The first kernel k0 is computed with all pairs of pixels in Ωi and Ωj. Fig. 2 shows the behavior of the proposed graph kernel in a 2-D toy example. The kernel matrices kδ for different depths are represented, along with the obtained tenfold cross-validation accuracy, when using 50 samples per class. The first kernel k0 represents the sample-based classification, without including locality relations. After the first iteration, a noticeable gain in accuracy is attained (the kernel matrix looks sharper in the diagonal entries). The gain is only slightly improved with more iterations, as the notion of vicinity in this example is well defined and not too deep.

D. Relations to Other Methods

The proposed method has direct connections to other kernel approaches. For example, it can be readily shown that, if w = 1, the graph kernel reduces to the spectral approach since xim = δm,i xi. Moreover, the graph kernel generalizes the cross-information kernel in [15], as not only pairs of individual pixels but all neighbors in a spatial window are considered. Note that the proposed graph kernel also generalizes the mean map kernel [28]: Rather than computing average distances, the kernel estimates higher order deviations of all neighboring samples in the feature space. The notion of distance in the proposed method is based not only on the labeled samples but also on their spatial neighbors: This resembles kernel deformation methods used in semisupervised learning, such as bagged kernels [29], Laplacian SVM [27], or label propagation methods [26].
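The recursion in (1) and its pixel-window particularization in (2) can be sketched directly. The following is a minimal illustration, not the authors' implementation: the base kernel k0 is taken to be an RBF on the spectra (one plausible choice), function and variable names are made up, the recursion is evaluated naively rather than by reusing kernel entries across scales, and border windows are clipped (so the mean over the actual neighbor pairs replaces the 1/w⁴ factor).

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    # Base node kernel k0 (an RBF on the spectral signatures; illustrative).
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def window(img, i, j, w):
    # Coordinates of the w x w spatial neighborhood of pixel (i, j),
    # clipped at image borders.
    r = w // 2
    H, W, _ = img.shape
    return [(u, v)
            for u in range(max(0, i - r), min(H, i + r + 1))
            for v in range(max(0, j - r), min(W, j + r + 1))]

def graph_kernel(img, p, q, delta, w=3, sigma=1.0):
    """Recursive graph kernel between pixels p and q, as in (2).

    At delta = 0 it reduces to the base kernel on the two spectra;
    otherwise it averages k_{delta-1} over all pairs of spatial neighbors.
    Naive evaluation: cost grows as O(w^(4*delta)).
    """
    if delta == 0:
        return rbf(img[p], img[q], sigma)
    Np, Nq = window(img, *p, w), window(img, *q, w)
    return np.mean([graph_kernel(img, u, v, delta - 1, w, sigma)
                    for u in Np for v in Nq])

img = np.random.rand(6, 6, 4)          # toy 6x6 image with 4 bands
K01 = graph_kernel(img, (2, 2), (3, 3), delta=1)
print(K01)
```

Filling a full matrix of such values over the training pixels would give the (precomputed) kernel matrix fed to the SVM.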
E. On the Computational Cost

The proposed method can capture all higher order spatial relations between pixels in a spatial neighborhood. The main problem is that the computational cost increases with the window size as O(w⁴), which can be unaffordable for large window sizes. In practice, however, one starts by building the largest training kernel possible and derives the lower scales from it through properly indexing kernel entries, without needing to recompute the whole kernel matrix. For the test (out-of-sample) phase, the cost is linear with the number of test pixels. An interesting advantage is that, once the kernel is computed (which is fast and simple for a low number of training samples), the training cost is the same as that of the standard SVM, since no additional free parameters are included, as occurred in the composite kernel framework. Finally, note that, for the interesting case of VHR images, window sizes should not be very large to efficiently deal with spatial details.

III. DATA COLLECTION

We illustrate the performance of the algorithm in two different multi- and hyperspectral images:

1) Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Indian Pines: The first experiment deals with the standard AVIRIS image taken over Northwest Indiana's Indian Pines test site in June 1992. Discriminating among the major crops can be difficult (in particular, given the moderate spatial resolution of 20 m), which has made the scene a challenging benchmark to validate the classification accuracy of hyperspectral imaging algorithms. The calibrated data are available online (along with the detailed ground-truth information) from http://dynamo.ecn.purdue.edu/~biehl/. We used the whole scene, consisting of the full 145 × 145 pixels, which contains 16 classes, ranging in size from 20 to 2468 pixels, thus constituting a very difficult situation. Even though 20 noisy bands covering the region of water absorption are typically removed, we decided to keep them here to assess the methods' robustness to noise.

2) DAISEX-1999 data set: This data set consists of labeled pixels of six different hyperspectral images (700 × 670 pixels) acquired with the 128-band HyMap airborne spectrometer during the DAISEX-1999 campaign. This instrument provides 128 bands across the reflective solar wavelength region of 0.4–2.5 μm with contiguous spectral coverage (except in the atmospheric water vapor absorption bands), bandwidths around 16 nm, very high signal-to-noise ratio, and a relatively high spatial resolution of 5 m.

IV. EXPERIMENTAL RESULTS

This section evaluates the proposed method in the considered hyperspectral image classification settings and analyzes its characteristics.

A. Model Development and Free Parameter Selection
We used different rates of randomly selected training samples, n = {0.1%, 1%, 3%, 5%}. In all cases, we used the radial basis function (RBF) kernel for the SVM classifiers. The classifiers were trained with tenfold cross-validation on the training set, using σ parameters in {0.1, 0.25, 0.5, 1, 2, 3}/N and the regularization parameter in C = {10⁰, . . . , 10²}. Different values of the window size were explored, w = {3, 5, 7, 9}, and different depths of the neighborhood parameter were tried
Fig. 3. Results for different remote sensing image classification problems: (a) AVIRIS Indian Pines and (b) HyMap image. The overall accuracy OA[%] is shown for different numbers of training data, window sizes, w, and neighboring depths: (Dotted line) δ = 1, (dashed line) δ = 2, and (solid line) δ = 3.
δ = 1, 2, 3. A one-against-one multiclass classification scheme was adopted. We compare three kernels for SVMs: the one based on the spectral signature; a contextual kernel (C-SVM) that uses stacked spectral and morphological features (specifically, a total of 25 spatial features were extracted by opening and closing morphological filters from specific bands); and the proposed graph kernel (GK). Before training, the data were normalized to zero mean and unit variance. Complementary material is available at http://www.uv.es/gcamps/graphkernel/ for interested readers.

B. Numerical Comparison

Fig. 3 shows the results for the test set formed by all labeled data in the image, for the different images considered, and for different rates of training data, window sizes w, and neighboring depths δ. Several conclusions can be extracted. First, all graph kernels, regardless of w or δ, outperform the standard spectral kernel. Second, as w is increased, the results generally improve, but they typically saturate for w ≈ 7. For larger values of w, a decrease in performance is expected, as neighboring relations would be oversmoothed. Third, the advantage of the proposed kernel approach is clearly noticeable in the Indian Pines image (between 10% and 15% gain in accuracy), mainly due to the high spectral resolution and spatial homogeneity of classes. In the particular case of the HyMap high-resolution image, the gain over the spectral SVM is very noticeable (between 16% and 18% in accuracy). C-SVM provides better results than GK with w = 3, but with higher window sizes, the proposed kernel largely outperforms the contextual kernel. Note that the kernel depth δ improves the results in both cases, but the improvement mainly comes from the proper definition of the spatial extent. Classes are spatially and spectrally homogeneous, and hence, incorporating complex neighboring relations only improves results noticeably in high-spatial-resolution scenarios.

C. Statistical Comparison

To further compare kernels, we have additionally performed a statistical analysis of the differences (in the test set) between all the considered spectral and graph kernels with different w and δ. The comparison was done through McNemar's test [30], [31], which is based on the standardized normal test
statistic.¹ The test can be used not only to assess whether statistical differences between methods exist but also to quantify the statistical gain of one classifier versus the others. The difference in accuracy between two classifiers is said to be statistically significant if |z| > 1.96, and the sign of z indicates which is the more accurate classifier. For the two considered images, the test revealed the following: 1) significant statistical differences between the spectral and spatial kernels were observed (|z| > 20); 2) significant statistical differences between different δ values (|z| > 3.1) were observed only for the higher spatial resolution HyMap image; and 3) as the rate of training examples increases, the z-score decreases. All these conclusions match the numerical results shown before. Summarizing, in both situations, the proposed kernel is significantly different from (and in most cases better than) the standard kernels.

D. Visual Comparison

Fig. 4 shows the classification maps obtained with the standard SVM and the proposed graph kernel for 1% of training samples, w = 11, and δ = 3, in a representative realization. The numerical results are confirmed by visual inspection of the classification maps. For the AVIRIS Indian Pines scene, the supervised spectral-based SVM obtains poor results, mainly due to the low number of training samples and the high number of spectral bands and classes. The graph kernel dramatically improves the accuracy and also yields a more spatially regularized classification map. See, for instance, the high spatial coherence of the different crop fields, such as the large soybean area in the center (orange in the image). The same is true for the class "Soybeans-notill" (in blue, east side of the scene), which is better learned by the graph kernel.
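The z statistic of McNemar's test used here reduces to a simple count of disagreements between the two classifiers (cf. the footnote formula). A minimal sketch, with made-up label vectors standing in for real classifier outputs:

```python
import math

def mcnemar_z(y_true, pred1, pred2):
    """McNemar z-score between two classifiers (illustrative helper).

    f12: samples classifier 1 gets right and classifier 2 gets wrong;
    f21: the reverse. |z| > 1.96 indicates a significant difference at
    the 5% level, and the sign points to the more accurate classifier.
    Assumes the two classifiers disagree on at least one sample.
    """
    f12 = sum(t == p1 and t != p2 for t, p1, p2 in zip(y_true, pred1, pred2))
    f21 = sum(t != p1 and t == p2 for t, p1, p2 in zip(y_true, pred1, pred2))
    return (f12 - f21) / math.sqrt(f12 + f21)

y   = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
svm = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical spectral-SVM output
gk  = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]   # hypothetical graph-kernel output
print(mcnemar_z(y, gk, svm))  # ≈ 0.577 (f12 = 2, f21 = 1)
```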
Interestingly, improvement is typically observed for classes that are spectrally very similar (e.g., the "Soybeans" subclasses), which suggests that the spatio-spectral information helps in identifying subtle critical differences. For the case of the HyMap image, more spatially coherent classification maps are obtained with graph kernels than with both the standard (pixel-based) and the contextual SVM.

V. CONCLUSION

In this letter, we have presented a kernel on graph nodes for spatio-spectral remote sensing image classification. The kernel approach considers spatial neighborhoods of all pixels to compute pairwise similarities in a high-dimensional feature space. The proposed kernel is a powerful alternative to existing approaches and generalizes previous approaches on kernel-based spatio-spectral classification. Since, for spatio-spectral classification, one is interested in computing all possible (high-order) neighboring relations, we here proposed to directly work with the input (spectral) samples in the graph-based feature space. This has the advantage of getting rid of a

¹Computing the z-score of classifiers 1 and 2 reduces to z = (f12 − f21)/√(f12 + f21), where f12 represents the number of samples correctly classified by classifier 1 and incorrectly classified by classifier 2.
Fig. 4. RGB composition and classification maps for the best SVM, contextual SVM (C-SVM), and graph kernel SVM for (top) AVIRIS Indian Pines and (bottom) HyMap images. The overall accuracy and kappa statistic are given in brackets.
spatial preprocessing step and to compute the similarity among both the labeled samples and those in their neighborhood at different scales. Future work will be devoted to devising automatic ways of selecting the window size, relating it to the spatial resolution of the image, and the neighboring depth, related to the spatial arrangement of classes in the scene.

REFERENCES

[1] T. M. Lillesand, R. W. Kiefer, and J. W. Chipman, Remote Sensing and Image Interpretation, 5th ed. New York: Wiley, 2004.
[2] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.
[3] G. Camps-Valls and L. Bruzzone, Eds., Kernel Methods for Remote Sensing Data Analysis. Oxford, U.K.: Wiley, Dec. 2009.
[4] A. Plaza, J. A. Benediktsson, J. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, and J. C. Tilton, "Recent advances in techniques for hyperspectral image processing," Remote Sens. Environ., vol. 113, no. S1, pp. 110–122, Sep. 2009.
[5] P. Soille, Morphological Image Analysis: Principles and Applications. Berlin, Germany: Springer-Verlag, 2003.
[6] J. A. Benediktsson, M. Pesaresi, and K. Arnason, "Classification and feature extraction for remote sensing images from urban areas based on morphological transformations," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 9, pp. 1940–1949, Sep. 2003.
[7] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, "Classification of hyperspectral data from urban areas based on extended morphological profiles," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 480–491, Mar. 2005.
[8] J. Inglada, "Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features," ISPRS J. Photogramm. Remote Sens., vol. 62, pp. 236–248, 2007.
[9] R. C. Dubes and A. K. Jain, "Random field models in image analysis," J. Appl. Stat., vol.
16, no. 2, pp. 131–163, 1989.
[10] Q. Jackson and D. Landgrebe, "Adaptive Bayesian contextual classification based on Markov random fields," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 11, pp. 2454–2463, Nov. 2002.
[11] I. Tomas, "Spatial postprocessing of spectrally classified Landsat data," Photogramm. Eng. Remote Sens., vol. 46, no. 9, pp. 1201–1206, 1980.
[12] J. Ton, J. Sticklen, and A. Jain, "Knowledge-based segmentation of Landsat images," IEEE Trans. Geosci. Remote Sens., vol. 29, no. 2, pp. 222–231, Mar. 1991.
[13] B. Solaiman, R. Koffi, M. Mouchot, and A. Hillion, "An information fusion method for multispectral image classification postprocessing," IEEE Trans. Geosci. Remote Sens., vol. 36, no. 2, pp. 395–406, Mar. 1998.
[14] Y. Zhang, "Detection of urban housing development by fusing multisensor satellite data and performing spatial feature post-classification," Int. J. Remote Sens., vol. 22, no. 17, pp. 3339–3355, Nov. 2001.
[15] G. Camps-Valls, L. Gómez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J. Calpe-Maravilla, "Composite kernels for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[16] G. Camps-Valls, L. Gómez-Chova, J. Muñoz-Marí, J. Martínez-Ramón, and J. L. Rojo-Álvarez, "Kernel-based framework for multitemporal and multisource remote sensing data classification and change detection," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1822–1835, Jun. 2008.
[17] M. Marconcini, G. Camps-Valls, and L. Bruzzone, "A composite semisupervised SVM for classification of hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 234–238, Feb. 2009.
[18] D. Tuia, F. Ratle, A. Pozdnoukhov, and G. Camps-Valls, "Multisource composite kernels for urban-image classification," IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 88–92, Jan. 2010.
[19] C. M. Bachmann, T. L. Ainsworth, and R. A. Fusina, "Exploiting manifold geometry in hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 441–454, Mar. 2005.
[20] T. Gärtner, P. A. Flach, and S. Wrobel, "On graph kernels: Hardness results and efficient alternatives," in Proc. COLT, B. Schölkopf and M. K. Warmuth, Eds., 2003, pp. 129–143.
[21] H. Kashima, K. Tsuda, and A. Inokuchi, "Marginalized kernels between labeled graphs," in Proc. ICML, San Francisco, CA, 2003, pp. 321–328.
[22] N. Shervashidze and K. Borgwardt, "Fast subtree kernels on graphs," in Proc. 23rd Annu. Conf. Neural Inf. Process. Syst., Vancouver, BC, Canada, Dec. 7–10, 2009, pp. 1660–1668.
[23] A. J. Smola and R. I. Kondor, "Kernels and regularization on graphs," in Proc. COLT, 2003, pp. 144–158.
[24] F. Fouss, L. Yen, A. Pirotte, and M. Saerens, "An experimental investigation of graph kernels on a collaborative recommendation task," in Proc. IEEE Int. Conf. Data Mining, Los Alamitos, CA, 2006, pp. 863–868.
[25] R. I. Kondor and J. D.
Lafferty, "Diffusion kernels on graphs and other discrete input spaces," in Proc. ICML, 2002, pp. 315–322.
[26] G. Camps-Valls, T. Bandos, and D. Zhou, "Semisupervised graph-based hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 10, pp. 3044–3054, Oct. 2007.
[27] L. Gómez-Chova, G. Camps-Valls, J. Muñoz-Marí, and J. Calpe-Maravilla, "Semisupervised image classification with Laplacian support vector machines," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 3, pp. 336–340, Jul. 2008.
[28] L. Gómez-Chova, G. Camps-Valls, L. Bruzzone, and J. Calpe-Maravilla, "Mean map kernels for semisupervised cloud classification," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 1, pt. 1, pp. 207–220, Jan. 2010.
[29] D. Tuia and G. Camps-Valls, "Semisupervised remote sensing image classification with cluster kernels," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 224–228, Feb. 2009.
[30] L. L. Lapin, Probability and Statistics for Modern Engineering. Long Grove, IL: Waveland Press, 1998.
[31] G. M. Foody, "Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy," Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, May 2004.