
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS


A Semisupervised Latent Dirichlet Allocation Model for Object-Based Classification of VHR Panchromatic Satellite Images

Li Shen, Hong Tang, Yunhao Chen, Adu Gong, Jing Li, and Wenbin Yi

Abstract—Typically, object-based classification methods are learned using training samples with labels attached to image objects. In this letter, a semisupervised object-based method in the framework of topic modeling is proposed to classify very high resolution panchromatic satellite images using partially labeled pixels. In the stage of training, both topics and their co-occurred distributions are learned in an unsupervised manner from segmented satellite images. Meanwhile, unlabeled pixels are allocated user-provided geo-object class labels based on the learned model. In the stage of classification, each segment is classified as a user-provided geo-object class label with the maximum posterior probability. Experimental results show that the proposed method outperforms several SVM-based supervised classification methods in terms of both spatial consistency and semantic consistency.

Index Terms—Object-based image analysis, probabilistic topic models, semisupervised image classification.

I. INTRODUCTION

WITH the advancement of remote sensing technology, large collections of very high resolution (VHR) satellite images are becoming increasingly available to the public. An accompanying need is to effectively extract thematic information from VHR satellite images. In order to achieve an effective classification of VHR satellite images, many classification approaches based on image segmentation have been proposed to take advantage of both spatial information and spectral information. The majority of these methods, e.g., object-based image analysis [1], typically regard segments as “image objects” and classify unlabeled objects using the classifier learned from a set of fully labeled objects. Although these methods can work well, there exist some

Manuscript received March 7, 2013; revised May 21, 2013 and July 13, 2013; accepted August 28, 2013. This work was supported by the Program for New Century Excellent Talents in University (No. NECT-11-0039), by the National Science Foundation of China (No. 41071259), by the National High Technology Research and Development Program of China (2012AA121302), and by the National Science & Technology Pillar Program of China (2012BAH27B01, 2012BAH27B03). L. Shen, H. Tang, Y. Chen, A. Gong, and J. Li are with the State Key Laboratory of Earth Surface Processes and Resource Ecology and the Key Laboratory of Environment Change and Natural Disaster, Beijing Normal University, Beijing 100875, China (e-mail: [email protected]). W. Yi is with the HSE Information Center, CNPC Institute of Safety and Environment Technology, Beijing 102206, China. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2013.2280298

problems in their practical applications: 1) training samples must be labeled according to the result of image segmentation, making it difficult to collect training samples of image objects in experimental fields before image analysis, and 2) individual ground truth points cannot be used directly to train object-based classifiers. Some methods, e.g., spectral–spatial methods [2], conduct pixel-based classification followed by spatial regularization using a segmentation map, where the segmentation is defined as an adaptive neighborhood for pixels and the informative statistical characteristics of image objects are not fully utilized. These observations motivate us to develop a new object-based semisupervised classifier, in which partially labeled pixels, instead of fully labeled image objects, are used to train an object-based classifier by using a probabilistic topic model.

Benefiting from utilizing both labeled and unlabeled pixels in the stage of training [3], semisupervised learning has proved effective in increasing classification accuracies in many remote sensing applications [4]–[7]. In recent years, probabilistic topic models, e.g., probabilistic latent semantic analysis [8] and latent Dirichlet allocation (LDA) [9], which were originally proposed in the text domain, have gained ground in natural image understanding [10]–[12], as well as satellite image analysis [13]–[17]. It is reported that the classification or clustering results obtained by topic models are more semantically coherent. However, due to the unsupervised nature of basic topic models [18], the aforementioned methods based on topic modeling were developed for unsupervised image classification or clustering. Although much has been done to incorporate supervised information into topic models, most of the existing approaches are trained using labeled documents [19]–[23]. For this reason, they suffer from the same problem as the object-based methods mentioned previously, namely, their inability to utilize partially labeled pixels to train a classifier.

This letter presents a semisupervised object-based method to classify VHR panchromatic satellite images based on the LDA model [9], [24], which is referred to as the ssLDA. In the stage of training, both topics and their co-occurred distributions are learned in an unsupervised manner from segmented satellite images. Meanwhile, unlabeled pixels are allocated user-provided geo-object class labels. In the stage of classification, each segment is classified as a user-provided geo-object class label with the maximum posterior probability.

The remainder of this letter is organized as follows. In Section II, the proposed approach is presented in detail. Experimental results are given in Section III. Finally, the conclusion is drawn in Section IV.

1545-598X © 2013 IEEE


TABLE I
QUANTITIES IN THE ssLDA MODEL

II. METHODOLOGY

This section first defines correspondences of text-related terms in the image domain. Then, both the model and the algorithm are presented.

A. Mapping Image Into Text for the ssLDA Modeling

To introduce techniques used in the text domain to satellite images, it is necessary to build an analog of text-related terms, such as words, vocabulary, and documents, in the image domain. For the proposed processing, the following analogs are defined.

1) Following the definition in [13] and [15], the grayscale value of a pixel is defined as a word.
2) The vocabulary is composed of the unique grayscale values of pixels.
3) Image segments, which are obtained by performing image segmentation on the original satellite image, are regarded as documents. Thus, a corpus of documents is equivalent to a satellite image.

It should be noted that, to avoid destroying the boundaries and structures of geoclasses, the original satellite image is oversegmented [25]. To simplify the expression, a list of some related quantities in this letter is given in Table I.

B. Model

Compared with the graphical model of the LDA, the variable $c_{di}$ of geo-object class label for each word, as shown in Fig. 1(a), is added to model the one-to-one correspondence between pixels and geo-object class labels [9], [24]. The hashed fill pattern is used as “half-shaded” to denote that the class node is partially observed. Given a set of segments originating from a panchromatic satellite image, the generative process of the ssLDA model is as follows.

1) For the $k$th of the $K$ topics:
   a) Sample a topic-specific grayscale distribution $\phi_k$ according to the Dirichlet distribution [$\phi_k \sim \mathrm{Dirichlet}(\beta)$].
   b) Sample a topic-specific class distribution $\pi_k$ according to the Dirichlet distribution [$\pi_k \sim \mathrm{Dirichlet}(\eta)$].
2) For the $d$th of the $D$ segments, sample a topic mixture proportion $\theta_d$ according to the Dirichlet distribution [$\theta_d \sim \mathrm{Dirichlet}(\alpha)$]. Then, for the grayscale value $w_{di}$ and the class label $c_{di}$ of each pixel in the $d$th segment, carry out the following sampling.
   a) Sample a topic $z_{di}$ according to the multinomial distribution over topics [$z_{di} \sim \mathrm{Multinomial}(\theta_d)$].
   b) Sample a grayscale value $w_{di}$ according to the multinomial distribution over grayscale values conditioned on topic $z_{di}$ [$w_{di} \sim \mathrm{Multinomial}(\phi_{z_{di}})$].
   c) If the class label $c_{di}$ is unobserved, it behaves according to the multinomial distribution over class labels conditioned on topic $z_{di}$ [$c_{di} \sim \mathrm{Multinomial}(\pi_{z_{di}})$]. If not, the sampling is ignored, and $c_{di}$ takes the original value of the given training pixel.

C. Algorithm

As shown in Fig. 1(b), Gibbs sampling is used to conduct model inference. For ease of exposition, assume that, if the $i$th pixel in the $d$th segment is indexed by $s = \{d, i\}$, then $w_s$, $z_s$, and $c_s$ are equivalent to $w_{di}$, $z_{di}$, and $c_{di}$, respectively. In addition, the subscript $-s$ denotes a quantity that excludes data on the site $s$. The algorithm of the proposed model can be summarized in the following three steps.

1) Step 1: Preprocessing. In this step, the parameters in the model are initialized, and the satellite image is oversegmented. For the proposed model, symmetric priors are used for all Dirichlet hyperparameters. Thus, as shown in Fig. 1(a), there are six scalars that need to be set. These include the number of segments $D$, the number of topics $K$, the number of classes $C$, and the Dirichlet hyperparameters $\alpha$, $\beta$, and $\eta$. In addition, two matrices with the same size as the original panchromatic satellite image, i.e., matrices of topics $Z$ and class labels $C$, are initialized by assigning random topics to all pixels and random class labels to the unlabeled pixels. Furthermore, a segmentation map is created by oversegmenting the original satellite image to constitute a corpus of documents.

2) Step 2: Training. Topics are learned in an unsupervised manner from the segmented satellite images. As shown in Fig. 1(a), it can be found that the generative process


Fig. 1. Graphical model and algorithm of the ssLDA. The shaded nodes represent observed random variables, and the hashed fill node denotes the partially observed random variable class. The intuitive explanations of the model are shown at the end of the red and dashed arrows. (a) Model. (b) Algorithm.

of a class label is almost the same as that of a word (i.e., grayscale value of the pixel). Analogous to the deduction in [26], for each site $s$, the conditional posterior of the topic $z_s$ can be given by

$$P(z_s = k \mid W, Z_{-s}, C) \propto \frac{n_{k,-s}^{w_s} + \beta}{\sum_{v=1}^{V} n_{k,-s}^{v} + V\beta} \cdot \frac{n_{k,-s}^{c_s} + \eta}{\sum_{c=1}^{C} n_{k,-s}^{c} + C\eta} \cdot \frac{n_{d,-s}^{k} + \alpha}{\sum_{k=1}^{K} n_{d,-s}^{k} + K\alpha} \tag{1}$$

where $n_{k,-s}^{w_s}$ and $n_{k,-s}^{c_s}$ are the numbers of times the grayscale value $w_{di}$ and the class label $c_{di}$ occur with the $k$th topic with the exception of the value on site $s$, respectively. Likewise, $n_{d,-s}^{k}$ refers to the number of times the $k$th topic occurs in the $d$th segment with the exception of the value on site $s$.

Meanwhile, unlabeled pixels are allocated user-provided class labels. Specifically, for each site $s$, the conditional posterior of the class $c_s$ can be given by

$$P(c_s = c \mid W, Z, C_{-s}) \propto \frac{n_{k,-s}^{c} + \eta}{\sum_{c=1}^{C} n_{k,-s}^{c} + C\eta}. \tag{2}$$

It is worth noting here that the class label of a pixel should be kept untouched if the pixel corresponds to a labeled training pixel. Otherwise, the pixel will be allocated a new class label by sampling using the aforementioned equation. The sampling process of the topic $z_s$ and class $c_s$ is repeated until convergence or until the maximum number of iterations has been reached. The final sampling results are used to approximate the parameters of the document topic mixing proportion and the per-topic class distribution

$$\theta_{d,k} = \frac{n_{d}^{k} + \alpha}{\sum_{k=1}^{K} n_{d}^{k} + K\alpha} \tag{3}$$

$$\pi_{k,c} = \frac{n_{k}^{c} + \eta}{\sum_{c=1}^{C} n_{k}^{c} + C\eta}. \tag{4}$$

The topic mixing proportion reflects co-occurrence relationships of topics, and the topic-specific class distribution reflects dependence between latent topics and user-provided geo-object class labels.

3) Step 3: Classification. Each segment is classified as a user-provided geo-object class label with the maximum posterior probability

$$c^{*} = \arg\max_{1 \le c \le C} P(c \mid d) = \arg\max_{1 \le c \le C} \sum_{k=1}^{K} \pi_{k,c} \cdot \theta_{d,k}. \tag{5}$$
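The three steps above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation: the function name `sslda_gibbs`, the toy data, the fixed number of sweeps, and the convention that −1 marks an unlabeled pixel are assumptions made here for illustration. In (2), the topic index $k$ is taken to be the newly sampled topic $z_s$ of the site, which is our reading of the equation. Denominators that do not depend on the sampled variable are dropped because the probabilities are renormalized before sampling.

```python
import numpy as np

def sslda_gibbs(docs, labels, K, C, V, alpha, beta, eta, n_iter=30, seed=0):
    """Collapsed Gibbs sampling sketch for ssLDA (illustrative only).

    docs[d]   : int array of grayscale words w_di in [0, V) for segment d
    labels[d] : int array of classes c_di in [0, C); -1 marks an unlabeled pixel
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_kv = np.zeros((K, V))  # times grayscale value v occurs with topic k
    n_kc = np.zeros((K, C))  # times class c occurs with topic k
    n_dk = np.zeros((D, K))  # times topic k occurs in segment d

    def update(d, w, k, cl, delta):
        n_kv[k, w] += delta
        n_kc[k, cl] += delta
        n_dk[d, k] += delta

    # Step 1: random topics for all pixels, random classes for unlabeled pixels.
    z = [rng.integers(K, size=len(w)) for w in docs]
    c = [np.where(lab >= 0, lab, rng.integers(C, size=len(lab))) for lab in labels]
    for d in range(D):
        for w, k, cl in zip(docs[d], z[d], c[d]):
            update(d, w, k, cl, +1)

    # Step 2: resample z_s by (1) and, for unlabeled pixels only, c_s by (2).
    for _ in range(n_iter):
        for d in range(D):
            for i, w in enumerate(docs[d]):
                update(d, w, z[d][i], c[d][i], -1)  # exclude site s from counts
                p = ((n_kv[:, w] + beta) / (n_kv.sum(axis=1) + V * beta)
                     * (n_kc[:, c[d][i]] + eta) / (n_kc.sum(axis=1) + C * eta)
                     * (n_dk[d] + alpha))
                z[d][i] = rng.choice(K, p=p / p.sum())
                if labels[d][i] < 0:  # labeled training pixels stay untouched
                    q = n_kc[z[d][i]] + eta
                    c[d][i] = rng.choice(C, p=q / q.sum())
                update(d, w, z[d][i], c[d][i], +1)

    # (3) and (4): point estimates; Step 3: classify each segment by (5).
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    pi = (n_kc + eta) / (n_kc.sum(axis=1, keepdims=True) + C * eta)
    return theta, pi, (theta @ pi).argmax(axis=1)

# Toy data: 6 segments of 30 pixels; even segments are dark (class 0),
# odd segments are bright (class 1), and 20% of the pixels carry a label.
rng = np.random.default_rng(1)
docs = [rng.integers(0, 4, size=30) + 4 * (d % 2) for d in range(6)]
labels = [np.full(30, -1) for _ in range(6)]
for d in range(6):
    labels[d][:6] = d % 2
theta, pi, pred = sslda_gibbs(docs, labels, K=3, C=2, V=8,
                              alpha=0.5, beta=0.1, eta=0.1)
print(pred)  # one predicted class index per segment
```

The per-pixel Python loop keeps the correspondence with (1) and (2) explicit; it is far too slow for a 900 × 900 image, for which a vectorized or compiled sampler would be needed.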

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Experimental Data

As shown in Fig. 2(a), a panchromatic QuickBird image of a suburban area with a size of 900 × 900 pixels and 0.6-m spatial resolution was used in the experiments. Fig. 2(b) shows the ground truth map, which includes six geo-object classes of interest, i.e., building, road, shadow, water, tree, and field. The segmentation map was obtained to constitute a corpus of documents using the entropy rate superpixel segmentation algorithm, which has been proven to be both effective and efficient in [25]. The optimal number of segments (i.e., documents) D = 2000 was experimentally derived.
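As a rough illustration of how a segmentation map turns an image into a corpus of documents, the following Python/NumPy sketch groups pixel grayscale values (words) by segment (document). The function name `segments_to_documents` and the toy 4 × 4 image are hypothetical, and regular 2 × 2 blocks stand in for superpixels; the entropy rate superpixel algorithm of [25] is not implemented here.

```python
import numpy as np

def segments_to_documents(image, segment_map):
    """Return {segment id: 1-D array of grayscale words} for one image."""
    words = image.ravel()
    seg = segment_map.ravel()
    order = np.argsort(seg, kind="stable")  # group pixels by segment id
    ids, starts = np.unique(seg[order], return_index=True)
    return dict(zip(ids.tolist(), np.split(words[order], starts[1:])))

# Toy 4 x 4 "image" cut into four 2 x 2 blocks that stand in for superpixels.
image = np.arange(16).reshape(4, 4)
segment_map = np.kron(np.arange(4).reshape(2, 2), np.ones((2, 2), dtype=int))
docs = segments_to_documents(image, segment_map)
vocabulary = np.unique(image)  # the unique grayscale values form the vocabulary
```

With a real segmentation, `segment_map` would hold one integer segment id per pixel, and `len(docs)` would equal the number of segments D.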


TABLE II
CLASSIFICATION ACCURACIES FOR DIFFERENT METHODS
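The accuracy measures named for Table II (overall accuracy, kappa coefficient, and producer accuracies) follow from a confusion matrix. The sketch below uses their standard definitions; the function name `accuracy_report` and the four-pixel toy input are hypothetical, and it is assumed that every class occurs at least once in the reference labels.

```python
import numpy as np

def accuracy_report(y_true, y_pred, n_classes):
    """Overall accuracy, Cohen's kappa, and per-class producer accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: reference, columns: predicted
    n = cm.sum()
    overall = np.trace(cm) / n
    # Chance agreement from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (overall - expected) / (1 - expected)
    producer = np.diag(cm) / cm.sum(axis=1)  # correct / reference total
    return overall, kappa, producer

# Hypothetical test pixels: reference labels and predicted labels.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
overall, kappa, producer = accuracy_report(y_true, y_pred, n_classes=2)
```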

Fig. 2. Classification results. (a) Original QuickBird image. (b) Ground truth. (c) ssLDA. (d) SVM. (e) SVM+MV. (f) Seg-SVM.

B. Comparison With Existing Methods

To evaluate the effectiveness of the proposed method, the performance of the ssLDA has been compared with that of three state-of-the-art classification methods: 1) the pixel-based SVM; 2) the spectral–spatial SVM, where pixel-based SVM classification was followed by a majority voting within the adaptive neighborhoods defined by image segmentation (termed SVM+MV) [2]; and 3) the segment-based SVM, where each region or segment was treated as a basic processing unit and the attribute of a segment was represented by the average grayscale value of all pixels within the segment (termed Seg-SVM). The latter two methods are based on image segmentation.

The ssLDA initialized the Dirichlet priors as symmetric priors, i.e., α = 50/K, η = 50/C, and β = 100. The number of classes was set to six according to the distribution of geo-object classes, and the number of topics was set to ten by fivefold cross validation. For the three SVM-based methods, a multiclass one-versus-one SVM classification with RBF kernel by means of LIBSVM was used [27]. The optimal parameters C and γ were determined by fivefold cross validation. For all methods based on image segmentation, the same segmentation map was adopted.
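The segment-level operations behind the two segmentation-based baselines can be sketched as follows. This is an illustrative Python/NumPy fragment rather than the code used in the experiments: the function names `majority_vote` and `segment_means` and the toy arrays are hypothetical, segment ids are assumed to run from 0 to D − 1 with no empty segment, and the pixelwise labels are assumed to come from any pixel-based classifier (no LIBSVM call is shown).

```python
import numpy as np

def majority_vote(pixel_labels, segment_map, n_classes):
    """SVM+MV post-processing: give each segment its most frequent pixel label."""
    seg = segment_map.ravel()
    votes = np.zeros((seg.max() + 1, n_classes), dtype=int)
    np.add.at(votes, (seg, pixel_labels.ravel()), 1)  # count labels per segment
    return votes.argmax(axis=1)[segment_map]          # broadcast back to pixels

def segment_means(image, segment_map):
    """Seg-SVM attribute: average grayscale value of the pixels in each segment."""
    seg = segment_map.ravel()
    return np.bincount(seg, weights=image.ravel()) / np.bincount(seg)

# Toy example with two segments (ids 0 and 1).
pixel_labels = np.array([[0, 0, 1], [0, 1, 1]])
segment_map = np.array([[0, 0, 1], [0, 0, 1]])
image = np.array([[1, 2, 3], [3, 2, 5]])
smoothed = majority_vote(pixel_labels, segment_map, n_classes=2)
features = segment_means(image, segment_map)
```

Ties in the vote are resolved in favor of the lowest class index, which is a side effect of `argmax` and not something the letter specifies.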

For the ssLDA, the pixel-based SVM, and the SVM+MV, 20% of the labeled pixels were chosen at random for each class from the ground truth for training purposes. For the Seg-SVM, object samples including approximately 20% of ground truth pixels were selected for training. The remaining labeled pixels of ground truth were set aside for testing purposes. Table II gives the global (overall accuracies as well as kappa coefficients) and class-specific (producer accuracies) classification accuracies for different methods. The corresponding classification maps are shown in Fig. 2(c)–(f), respectively, where each geoclass is represented by a color.

From visual inspection, all methods based on image segmentation achieve a more compact and smoother classification result when compared to the pixel-based SVM. Therefore, the advantages of performing image segmentation by enforcing spatial consistency over the classification are confirmed. Furthermore, due to a certain degree of spectral overlap between road and building, they are seriously confused with each other in the classification map of the Seg-SVM method. Similar misclassification can also be observed for tree and field in the classification map of the SVM+MV method. These phenomena might be explained by the fact that, although both the Seg-SVM and the SVM+MV embed the spatial context information from the segments, they can only ensure spatial consistency and lack a mechanism to distinguish between different geo-objects with similar spectral characteristics. However, the proposed ssLDA benefits from the topic modeling and works differently. The co-occurrence relationship of different geo-objects can be modeled by the mixture of topics. Therefore, when these geo-objects are observed frequently in different scenes, the mixture proportion features can be utilized to differentiate them.

Table II shows that the ssLDA method yields the best global as well as a majority of the best class-specific classification accuracies. In particular, the producer accuracies for building, field, road, and tree are significantly improved. In general, the proposed ssLDA method can obtain a better classification result in terms of both spatial consistency and semantic consistency.

C. Influence of Different Sizes of Labeled Pixels

As the number of labeled training pixels may affect the classification results, it is worth exploring how the performance of the proposed method behaves with different settings on the size of labeled pixels. A set of experiments with


different proportions of the ground truth pixels for training, i.e., 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40%, was conducted. Labeled training pixels for each class were acquired at random from the ground truth, and the remaining labeled pixels of ground truth were used for testing. The same labeled pixels were used to fit the SVM+MV, which was used as a baseline method. It should be noted that, in the ssLDA, all of the unlabeled pixels are also used for training.

Fig. 3 shows the overall accuracies of both the ssLDA and the SVM+MV against the proportions of labeled training pixels. Two observations can be made: 1) across all sizes of labeled pixels, the ssLDA method achieves a better performance than the SVM+MV, and 2) the performance difference between the ssLDA and the SVM+MV depends on the size of labeled pixels, i.e., when the size of labeled pixels is relatively small, these two methods perform comparably. In contrast, when the size of labeled pixels is large, the ssLDA significantly outperforms the SVM+MV. This is because SVM-based methods are relatively insensitive to the size of labeled training pixels [28]. By comparison, both unlabeled and labeled pixels are utilized in the mechanism of the proposed ssLDA, and with the increase in size of labeled pixels, the model enforces stronger constraints that align learned topics with user-provided geo-object classes, i.e., class labels of labeled training pixels. Thus, in the proposed method, the classification accuracy can be improved considerably when a large number of labeled training pixels are available. It should be noted that, although the requirement of large sizes of labeled training pixels may sometimes be difficult to achieve in practical applications, the proposed ssLDA method provides a promising solution that fully utilizes the labeled training pixels to improve classification accuracy.

Fig. 3. Overall accuracy versus the proportion of labeled pixels.

IV. CONCLUSION

In this letter, a semisupervised object-based method has been proposed to address the problem of VHR panchromatic satellite image classification. The method extends the LDA model by incorporating the information of partially labeled pixels for training. Segments obtained by image segmentation are used as documents to enforce spatial regularization over the classification results. Experimental results show that the proposed approach is superior to the three aforementioned SVM-based methods. Furthermore, this approach is not tied to a specific segmentation algorithm. Any method that can produce a reasonable oversegmentation of satellite images may meet the requirement of the proposed method.

REFERENCES

[1] T. Blaschke, “Object based image analysis for remote sensing,” ISPRS J. Photogramm. Remote Sens., vol. 65, no. 1, pp. 2–16, Jan. 2010.
[2] Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, “Spectral–spatial classification of hyperspectral imagery based on partitional clustering techniques,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2973–2987, Aug. 2009.
[3] X. Zhu, “Semi-supervised learning literature survey,” Tech. Note, Jul. 2008.
[4] D. Tuia and G. Camps-Valls, “Semisupervised remote sensing image classification with cluster kernels,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 224–228, Apr. 2009.
[5] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4085–4098, Nov. 2010.
[6] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Semisupervised hyperspectral image classification using soft sparse multinomial logistic regression,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 2, pp. 318–322, Mar. 2013.
[7] I. Dópido, J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas Dias, and J. A. Benediktsson, “Semisupervised self-learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 7, pp. 4032–4044, Jul. 2013.
[8] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retrieval, Berkeley, CA, USA, Aug. 15–19, 1999, pp. 50–57.
[9] D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn., vol. 3, pp. 993–1022, Mar. 2003.
[10] X. Wang and E. Grimson, “Spatial latent Dirichlet allocation,” in Proc. NIPS, Vancouver, BC, Canada, Dec. 3–5, 2007, pp. 1577–1584.
[11] B. Zhao, F.-F. Li, and E. P. Xing, “Image segmentation with topic random field,” in Proc. 11th Eur. Conf. Comput. Vis., Part V, Crete, Greece, Sep. 5–11, 2010, pp. 785–798.
[12] H. Tang, N. Boujemaa, Y. Chen, and L. Deng, “Modeling loosely annotated images using both given and imagined annotations,” Opt. Eng., vol. 50, no. 12, p. 127004, Nov. 2011.
[13] W. Yi, H. Tang, and Y. Chen, “An object-oriented semantic clustering algorithm for high resolution remote sensing images using the aspect model,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 522–526, May 2011.
[14] X. Kan, W. Yang, G. Liu, and H. Sun, “Unsupervised satellite image classification using Markov field topic model,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 1, pp. 130–134, Jan. 2013.
[15] H. Tang, L. Shen, Y. Qi, Y. Chen, Y. Shu, J. Li, and D. A. Clausi, “A multiscale latent Dirichlet allocation model for object-oriented clustering of VHR panchromatic satellite images,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 3, pp. 1680–1692, Mar. 2013.
[16] H. Tang, A. Gong, S. Li, and C. Yang, “Similarity measures of satellite images using an adaptive feature contrast model,” Int. J. Geosci., vol. 4, no. 2, pp. 329–343, Mar. 2013.
[17] M. Liénou, H. Maître, and M. Datcu, “Semantic annotation of satellite images using latent Dirichlet allocation,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 28–32, Jan. 2010.
[18] D. Blei, C. Lawrence, and D. Dunson, “Probabilistic topic models,” IEEE Signal Process. Mag., vol. 27, no. 6, pp. 55–65, Nov. 2010.
[19] D. Blei and J. McAuliffe, “Supervised topic models,” in Proc. NIPS Conf., Vancouver, BC, Canada, Dec. 3–5, 2007, pp. 1–8.
[20] S. Lacoste-Julien, F. Sha, and M. I. Jordan, “DiscLDA: Discriminative learning for dimensionality reduction and classification,” in Proc. NIPS Conf., Vancouver, BC, Canada, Dec. 8–13, 2008, pp. 1–8.
[21] D. Ramage, D. Hall, R. Nallapati, and C. D. Manning, “Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora,” in Proc. Conf. Empirical Methods Nat. Lang. Process., Singapore, Aug. 6–7, 2009, pp. 248–256.
[22] J. Zhu, A. Ahmed, and E. Xing, “MedLDA: Maximum margin supervised topic models for regression and classification,” in Proc. 26th Annu. Int. Conf. Mach. Learn., Montreal, QC, Canada, Jun. 14–18, pp. 1257–1264.
[23] L. Cao and F.-F. Li, “Spatially coherent latent topic model for concurrent object segmentation and classification,” in Proc. 11th Int. Conf. Comput. Vis., Rio de Janeiro, Brazil, Oct. 14–20, 2007, pp. 1–8.
[24] M. Sharifi, Semi-Supervised Extraction of Entity Aspects Using Topic Models. Pittsburgh, PA, USA: Carnegie Mellon Univ., 2009.
[25] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, “Entropy rate superpixel segmentation,” in Proc. 13th Int. Conf. Comput. Vis., College Park, MD, USA, Jun. 20–25, 2011, pp. 2097–2104.
[26] G. Heinrich, “Parameter estimation for text analysis,” Tech. Note, Aug. 2008.
[27] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1–27, Apr. 2013.
[28] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3, pp. 247–259, May 2011.