IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 10, NO. 2, MARCH 2013
Interactive Segmentation for Change Detection in Multispectral Remote-Sensing Images
Haikel Hichri, Member, IEEE, Yakoub Bazi, Senior Member, IEEE, Naif Alajlan, Member, IEEE, and Salim Malek, Member, IEEE
Abstract—In this letter, we propose to solve the change detection (CD) problem in multitemporal remote-sensing images using interactive segmentation methods. The user needs to input markers related to change and no-change classes in the difference image. Then, the pixels under these markers are used by the support vector machine classifier to generate a spectral-change map. To further enhance the result, we include the spatial contextual information in the decision process using two different solutions based on Markov random field and level-set methods. While the former is a region-driven method, the latter exploits both region and contour information for performing the segmentation task. Experiments conducted on a set of four real remote-sensing images acquired by low as well as very high spatial resolution sensors and referring to different kinds of changes confirm the attractive capabilities of the proposed methods in generating accurate CD maps with simple and minimal interaction.
Index Terms—Change detection (CD), interactive segmentation, level-set (LS) methods, Markov random fields (MRFs), support vector machine (SVM).
Manuscript received February 18, 2012; revised April 15, 2012 and May 15, 2012; accepted June 11, 2012. Date of publication July 19, 2012; date of current version October 22, 2012. This work was supported by a research grant from the National Plan for Science and Technology, King Saud University (Project Identifier: 10-SPA-1192-02). The authors are with the Advanced Laboratory for Intelligent Systems Research, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2012.2204953

I. INTRODUCTION
CHANGE detection (CD) is one of the most important applications in remote-sensing technology. The aim of CD is to find pixels that correspond to real changes on the ground in pairs of coregistered images acquired over the same geographical area at two different times. Usually, CD methods rely on the computation of the difference image (DI) from two coregistered images, and then, changes are identified by automatically segmenting the DI into two regions associated with changed and unchanged classes, respectively [1]–[4]. However, as these methods are data driven, the fully automatic discrimination between changed and unchanged classes is constrained by the complexity of the statistical distributions characterizing these classes, their degree of overlap, and initialization. Recently, the utilization of semiautomatic methods with user intervention (i.e., interactive segmentation) has become popular in the image processing literature [5]–[9]. They
represent a promising solution for enhancing and generalizing the segmentation result. In other words, the user inputs could be valuable in steering the segmentation process in order to obtain accurate results. Usually, interactive-based segmentation methods start by exploiting the user inputs through a set of strokes, lines, scribbles, or curves for generating labeled pixels for object and background termed as seeds. Then, on the basis of these seeds, the segmentation process is carried out using, for example, adaptive weight distances [5], spline regression [6], and maximal-similarity-based region merging [7]. Obviously, the more user interactions we have, the more accurate is the result, but ideally, the level of interaction is usually preferred to be simple and minimal. In remote sensing, user-based interaction methods have been developed to address supervised classification problems [10]. The basic idea of these semiautomatic methods known as “active learning” is that, starting from a small and suboptimal training set, additional samples, considered important, are selected in some way from a large amount of unlabeled data (learning set). Then, these samples are labeled by the user and then added to the training set. The entire procedure is repeated until a stopping criterion is satisfied. In this letter, we propose to solve the CD problem using the concept of interactive segmentation. In particular, we focus on images characterized by a single change. To this end, the user needs to input markers related to change and no-change classes in the DI. Then, the pixels under these markers are used for training a support vector machine (SVM) classifier in a similar way to supervised remote-sensing image classification. After training, the pixels in the image are initially classified with SVM as change and no change. 
It is a well-known fact that the analysis of image pixels under the spatial independence assumption may lead to inconsistencies for several reasons, which include, for example, the coregistration noise. Deciding the label of a pixel by taking into account its neighborhood often represents an effective way to increase the accuracy of the result. In this context, we propose to further process the obtained CD maps using two different strategies based on Markov random field (MRF) and level-set (LS) methods. The former has proved to be a powerful and successful mathematical framework, as shown by various works dealing with different remote-sensing problems. On the other hand, LS methods have recently gained popularity in image segmentation. They exhibit interesting advantages over classical segmentation methods such as thresholding, edge-based, and region-growing techniques.
1545-598X/$31.00 © 2012 IEEE
HICHRI et al.: INTERACTIVE SEGMENTATION FOR CHANGE DETECTION
II. DESCRIPTION OF THE PROPOSED INTERACTIVE CD METHOD
Let $X_1$ and $X_2$ be two coregistered multispectral remote-sensing images of size $M \times N \times d$ acquired over the same geographical area at two different times. Let $X_D$ be the multispectral DI of dimension $d$ generated from $X_1$ and $X_2$ (i.e., $X_D^i = |X_1^i - X_2^i|$, $i = 1, \ldots, d$). Let $X_{DM}$ be the scalar modulus image obtained from $X_D$ [1], i.e.,
$$X_{DM} = \sqrt{(X_D^1)^2 + (X_D^2)^2 + \cdots + (X_D^d)^2}.$$
The aim of the proposed approach is to generate a CD map representing the changes that occurred between the two acquisition dates of the two images. The following algorithm provides a general view of the proposed method. Detailed descriptions are given in the next subsections.

Algorithm: Interactive CD
Input: $X_1$ and $X_2$
Output: CD map $Y$
1) Compute the DI
2) Mark change and no-change regions in the DI
3) Build a training set using the pixels under these markers
4) Train an SVM classifier using this training set
5) Generate an initial CD map with the trained SVM
6) Enhance the CD map using MRF or multiresolution level-set
End

A. Change and No-Change Region Markings
Given the DI, the user needs to specify pixels related to change and no-change classes. To this end, the user can input the interactive information by drawing markers in the form of lines, strokes, or curves on the DI. Fig. 1 shows an example using line markers. After marking, a learning set $D = \{(x_i, y_i)\}_{i=1}^{n}$ is generated from the multispectral DI, where $n$ represents the number of samples, $x_i$ is the feature vector of dimension $d$, and $y_i \in \{1, -1\}$ is the corresponding pixel label. Change pixels under green marker are assigned label “1,” whereas no-change pixels under blue marker are assigned label “−1.”

Fig. 1. Example of change and no-change markings. (a) For the Elba image, changes are located in one region, whereas for the (b) Riyadh image, changes are situated in many regions.

B. Pixel Analysis With SVM
The learning set $D = \{(x_i, y_i)\}_{i=1}^{n}$ is given as input to a binary SVM classifier. The latter is among the most popular supervised kernel-based classifiers available in the literature. Compared to standard classification methods, it relies on the margin maximization principle that makes it less sensitive to overfitting problems [11]. During the classification stage, a spectral-change map is generated by classifying the remaining pixels in the DI as change or no change. The decision for assigning each pixel is made according to the following decision function:
$$f(x) = \sum_{i \in S} \alpha_i^* y_i K(x_i, x) + b^* \tag{1}$$
where $K(\cdot, \cdot)$ is a kernel function. The set $S$ is a subset of the indices $\{1, 2, \ldots, n\}$ corresponding to the nonzero Lagrange multipliers $\alpha_i$, which define the so-called support vectors, and $b^* \in \mathbb{R}$ is the optimal bias.

C. Spatial Analysis With MRF
Under MRF modeling, the generation of the final change-detection map $Y$ is equivalent to the minimization of the following total energy function [1], [2]:
$$U_T = \sum_{m=1}^{M} \sum_{n=1}^{N} U_{mn} \tag{2}$$
where $U_{mn}$ is the local energy function, given as follows:
$$U_{mn} = U_{data}\big(X_D(m,n), Y(m,n)\big) + U_{context}\big(Y(m,n), Y^S(m,n)\big). \tag{3}$$
The first term represents the spectral energy function from the observed data under the likelihood independence assumption
$$U_{data}\big(X_D(m,n), Y(m,n)\big) = -\ln\big\{P\big(X_D(m,n) \mid Y(m,n)\big)\big\} \tag{4}$$
where $P(X_D(m,n) \mid Y(m,n))$ are the posterior probabilities obtained by applying the Platt procedure to the SVM outputs generated during the classification phase. The term $U_{context}(\cdot)$ represents the spatial local energy function defined over a second-order neighborhood system $S$. It is defined as
$$U_{context}\big(Y(m,n), Y^S(m,n)\big) = \sum_{(p,q) \in Y^S(m,n)} \beta \cdot I\big(Y(m,n), Y(p,q)\big) \tag{5}$$
where $I$ is the indicator function, which allows counting the number of occurrences of $Y(m,n)$ in its neighborhood $Y^S(m,n)$ and is defined as
$$I\big(Y(m,n), Y(p,q)\big) = \begin{cases} -1, & \text{if } Y(m,n) = Y(p,q) \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
and $\beta$ is a regularization parameter that tunes the influence of the spatial contextual information on the change-detection
process. The minimization of (3) can be done using the iterated conditional modes (ICM) algorithm, as it represents a simple and computationally effective solution to the optimization problem. It consists of the following steps.
1) Initialize $Y(m,n)$ with the class label (change or no change) that minimizes the spectral energy function $U_{data}(X_D(m,n), Y(m,n))$ without the spatial energy term (i.e., by setting $\beta = 0$).
2) For i = 1 : Iter (number of MRF iterations)
   Update $Y$ by minimizing, for each pixel $(m,n)$, the local energy function $U_{mn}$ defined in (3), including the spatial energy term $U_{context}(Y(m,n), Y^S(m,n))$.
End
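The two steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the input `prob_change` is assumed to hold a per-pixel probability of the change class (for instance, a Platt-scaled SVM output), labels are 1 for change and −1 for no change, pixels are updated sequentially in raster order, and the function names `neighbors8` and `icm` are hypothetical.

```python
import math


def neighbors8(m, n, rows, cols):
    """Yield the second-order (8-connected) neighbors of pixel (m, n)."""
    for dm in (-1, 0, 1):
        for dn in (-1, 0, 1):
            if (dm, dn) != (0, 0) and 0 <= m + dm < rows and 0 <= n + dn < cols:
                yield m + dm, n + dn


def icm(prob_change, beta=1.0, n_iter=5):
    """Iterated conditional modes on a 2-D list of change probabilities.

    prob_change[m][n] is an assumed per-pixel probability of 'change'.
    Returns a label map with 1 = change and -1 = no change.
    """
    rows, cols = len(prob_change), len(prob_change[0])
    eps = 1e-12  # avoids log(0)

    def u_data(m, n, label):
        # Spectral energy: negative log-probability of the candidate label.
        p = prob_change[m][n] if label == 1 else 1.0 - prob_change[m][n]
        return -math.log(max(p, eps))

    # Step 1: initialization with beta = 0 (spectral energy only).
    y = [[1 if u_data(m, n, 1) < u_data(m, n, -1) else -1
          for n in range(cols)] for m in range(rows)]

    # Step 2: per-pixel minimization of the local energy.
    for _ in range(n_iter):
        for m in range(rows):
            for n in range(cols):
                best_label, best_u = y[m][n], float("inf")
                for label in (1, -1):
                    # The indicator is -1 for each agreeing neighbor,
                    # so the contextual energy is -beta * (number of agreements).
                    agree = sum(1 for p, q in neighbors8(m, n, rows, cols)
                                if y[p][q] == label)
                    u = u_data(m, n, label) - beta * agree
                    if u < best_u:
                        best_label, best_u = label, u
                y[m][n] = best_label
    return y
```

With `n_iter=0` the function returns the purely spectral map of step 1, which makes it easy to compare the map before and after spatial regularization.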
D. Spatial Analysis With LS
An alternative way of incorporating the spatial contextual information in the decision process is the adoption of LS methods. Usually, these methods are based on the optimization of an energy functional composed of global fitting energy terms and a regularization term. They allow the incorporation of prior knowledge in addition to the definition of statistical models associated with change and no-change classes. In this letter, we consider the well-known segmentation model proposed by Chan and Vese (CV) [3]. However, it is interesting to notice that more advanced models could be considered. The CV model is given by
$$F(C, c_1, c_2) = \int_{inside(C)} |X_{DM}(m,n) - c_1|^2 \, dm \, dn + \int_{outside(C)} |X_{DM}(m,n) - c_2|^2 \, dm \, dn + \mu |C| \tag{7}$$
and the related minimization problem is given as follows:
$$\min_{C, c_1, c_2} F(C, c_1, c_2). \tag{8}$$
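As a rough illustration of the CV energy on a discrete image, the sketch below evaluates the two fitting terms for a binary mask standing in for the contour $C$, with $c_1$ and $c_2$ set to the region means (the optimal constants for a fixed contour). The length term $\mu|C|$ is approximated here by counting label transitions between 4-connected neighbors; that discretization and the function names `region_means` and `cv_energy` are assumptions of this sketch, not taken from the letter.

```python
def region_means(image, mask):
    """Return (c1, c2): mean of image where mask is True and where it is False."""
    inside = [v for row, mrow in zip(image, mask)
              for v, k in zip(row, mrow) if k]
    outside = [v for row, mrow in zip(image, mask)
               for v, k in zip(row, mrow) if not k]
    c1 = sum(inside) / len(inside) if inside else 0.0
    c2 = sum(outside) / len(outside) if outside else 0.0
    return c1, c2


def cv_energy(image, mask, mu=0.1):
    """Discrete Chan-Vese style energy of a binary segmentation mask."""
    c1, c2 = region_means(image, mask)
    rows, cols = len(image), len(image[0])
    fit = 0.0
    length = 0  # crude contour length: number of label transitions
    for m in range(rows):
        for n in range(cols):
            c = c1 if mask[m][n] else c2
            fit += (image[m][n] - c) ** 2
            if m + 1 < rows and mask[m][n] != mask[m + 1][n]:
                length += 1
            if n + 1 < cols and mask[m][n] != mask[m][n + 1]:
                length += 1
    return fit + mu * length
```

A mask that follows the true boundary of a two-level image drives the fitting terms to zero, so only the length penalty remains; a mask that cuts across the boundary is penalized heavily by the fitting terms.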
The parameter $\mu$ controls the tradeoff between the goodness of fit and the length of the curve $C$. The two constants $c_1$ and $c_2$ approximate the image $X_{DM}$ in the segments inside($C$) and outside($C$), respectively. For a fixed contour $C$, the optimal values for these parameters are given by the average of $X_{DM}$ over the regions inside($C$) and outside($C$), respectively. The minimization of the functional (8) can be performed within an LS framework. The latter is based on the representation of the curve $C$ by the zero LS of a Lipschitz function $\phi : \Omega \to \mathbb{R}$ such that
$$\begin{cases} C = \partial\omega = \{(m,n) \in \Omega : \phi(m,n) = 0\} \\ inside(C) = \omega = \{(m,n) \in \Omega : \phi(m,n) > 0\} \\ outside(C) = \Omega \setminus \bar{\omega} = \{(m,n) \in \Omega : \phi(m,n) < 0\}. \end{cases} \tag{9}$$
The LS function is typically defined as the signed distance function of spatial points defined on $\Omega$ to the curve $C$. By replacing the unknown variable $C$ in (7) by the LS function $\phi$, the energy functional is transformed to [4]
$$F(\phi, c_1, c_2) = \int_{\Omega} |X_{DM}(m,n) - c_1|^2 H(\phi(m,n)) \, dm \, dn + \int_{\Omega} |X_{DM}(m,n) - c_2|^2 \big(1 - H(\phi(m,n))\big) \, dm \, dn + \mu \int_{\Omega} \delta_0(\phi(m,n)) \, |\nabla\phi(m,n)| \, dm \, dn \tag{10}$$
where $H$ is the Heaviside step function, i.e., $H(z) = 1$ if $z \geq 0$ and $H(z) = 0$ otherwise, and $\delta_0$ is the Dirac delta function $\delta_0(z) = (d/dz)H(z)$. The main advantage of such a transformation is that the minimization of the functional (10) with respect to the LS function $\phi$ can be handled more easily than the minimization of the functional (7) with respect to $C$. In addition, the splitting and merging of the curve can be carried out by simply moving the LS function $\phi$ up and down. To optimize $F(\phi, c_1, c_2)$ with respect to $\phi$ as well as $c_1$ and $c_2$, the two-step alternating approach proposed in [3] is iterated until convergence is reached.

Depending on the initialization of the LS function, the minimization of the energy functional of the original CV model could be easily trapped in a local minimum. In particular, this risk is increased when the DI is corrupted by noise due to coregistration errors. To reduce this effect, an automatic multiresolution approach termed MLS is proposed in [3]. Its basic idea is to analyze the DI at different resolutions, namely, from coarse to fine resolutions, by successively downsampling the image by a factor of two. The spatial downsampling of the DI leads to interesting properties, as it provides a less noisy image and reduces the search space and the number of local minima. The combination of SVM and MLS consists in setting the initial contour $\phi_0$ as the SVM change map and then running the MLS algorithm [3] initialized with $\phi_0$.

III. EXPERIMENTAL RESULTS
A. Data Set Description
To assess the effectiveness of the proposed CD method, different multitemporal remote-sensing images acquired by low as well as very high spatial resolution (VHR) multispectral remote-sensing sensors and referring to different kinds of changes were used in the experiments. The first low-spatial-resolution data set refers to two images of size 300 × 300 pixels acquired by the Landsat-5 Thematic Mapper (TM) in April and May 1994, respectively, over an agricultural area in the basin of the Po River situated in Northern Italy. For this data set, changes are of agricultural type. The second data set represents two images of size 393 × 305 acquired over the island of Elba, Italy, by the Landsat-5 TM sensor in August and September 1994, respectively. Between the two acquisition dates, a fire occurred, destroying an area of Elba's forest. The third data set is composed of two VHR images of size 430 × 401 pixels acquired over the region of Riyadh, Saudi Arabia, by the Ikonos-2 sensor in February and October 2006, respectively. For this data set, changes are of agricultural type. Finally, the last data set is composed of two very high spatial resolution (VHR) images of size 950 × 700 pixels acquired over the region of Mina, Saudi Arabia, by the Ikonos-2 sensor in January and December 2007, respectively. In that period, new residential and commercial buildings were constructed. The DIs obtained from these data sets are shown in Fig. 2.

Fig. 2. Set of real DIs used in the experiments. (a) Po image. (b) Elba image. (c) Riyadh image. (d) Mina image. The brightness and contrast of the original DIs have been enhanced for better visualization.

Ground-truth images (not reported for reasons of space limitation) are used to evaluate the accuracy of the obtained change maps using the following standard error measures (expressed in percentage): 1) error rate ($P_E$: total number of wrongly classified pixels over the total number of pixels); 2) false-alarm rate ($P_F$: number of unchanged pixels classified as changed pixels over the total number of unchanged pixels); and 3) missed-alarm rate ($P_M$: number of changed pixels classified as unchanged pixels over the total number of changed pixels). The first error measure is typically considered as a global detection performance criterion, whereas the last two measures are more specific criteria whose importance depends on the application.
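The three error measures defined above translate directly into code. The following is a minimal sketch, assuming the ground truth and the predicted change map are given as 2-D lists in which the value 1 marks a changed pixel and any other value (for example, −1 or 0) marks an unchanged pixel; the function name `cd_error_rates` is hypothetical.

```python
def cd_error_rates(truth, predicted):
    """Return (P_E, P_F, P_M) in percent for two label maps of equal size.

    A value of 1 means 'changed'; any other value means 'unchanged'.
    """
    false_alarms = missed = changed = unchanged = 0
    for t_row, p_row in zip(truth, predicted):
        for t, p in zip(t_row, p_row):
            if t == 1:
                changed += 1
                if p != 1:
                    missed += 1        # changed pixel labeled unchanged
            else:
                unchanged += 1
                if p == 1:
                    false_alarms += 1  # unchanged pixel labeled changed
    total = changed + unchanged
    p_e = 100.0 * (false_alarms + missed) / total
    p_f = 100.0 * false_alarms / unchanged if unchanged else 0.0
    p_m = 100.0 * missed / changed if changed else 0.0
    return p_e, p_f, p_m
```

Because $P_F$ and $P_M$ are normalized by the sizes of the unchanged and changed classes, respectively, a map can show a small $P_E$ and still have a large $P_M$ when changed pixels are rare.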
B. Results
The results reported in this letter are based on a single interaction step at the beginning, where the user selects only two markers. One marker line is used for each of the change and no-change classes. This will allow us to assess the capability of the proposed method in capturing changes situated in different regions of the image with minimal marking. An example of marking related to the four data sets is shown in Fig. 2. To ensure better performance, it is preferable to put markers with sufficient size and close to each other. The number of (change, no-change) samples under these markers are (169, 214), (203, 215), (158, 204), and (538, 438) for the Elba, Po, Riyadh, and Mina data sets, respectively. These training samples represent 0.42%, 0.34%, 0.20%, and 0.14% of the total size of the images.

Concerning SVM, we used a radial basis kernel. The regularization and kernel width parameters were determined according to a three-fold cross-validation procedure by varying these values in the ranges $[10^{-3}, 200]$ and $[10^{-3}, 2]$, respectively. The SVM parameters in addition to the cross-validation accuracy yielded on the training sets were equal to (150, 0.25, 92.15%), (25, 0.25, 100%), (50, 0.25, 98%), and (200, 0.25, 98.56%) for these data sets, respectively. For MRF, the spatial regularization parameter $\beta$ was varied in the range [1, 10]. For all data sets, no significant change in accuracy was observed, suggesting that the way this parameter is set is not critical. The results reported in Table I are for the value $\beta = 1$ [1]. Regarding MLS, the regularization parameter was set to $\mu = 0.1$ [3]. The number of resolutions was fixed to three levels (i.e., coarse, middle, and fine levels) by downsampling the image by factors of $2^2$, $2^1$, and $2^0$ (i.e., original sampling), respectively. The number of iterations for the coarse, the middle, and the fine resolutions was fixed to 100.

For the sake of comparison, we also provide in Table I the results of two state-of-the-art unsupervised CD methods. The first is based on the fusion of an ensemble of different thresholding algorithms through MRF. The second is the standard MLS method developed in [3], where the initial curve is set as small rectangles uniformly covering the entire DI, so that the likelihood of capturing the changed regions, which may occur at different positions of the image, can be maximized. In addition, we include the results of a recent interactive segmentation method based on maximal-similarity region merging (MSRM) [7].
The latter is based on an initial partitioning of the image into homogeneous regions using the mean-shift algorithm. Then, the user introduces label information via interactive line markers. With the help of these markers, the method calculates the similarity of different regions and merges them according to a maximal-similarity rule. In particular, the authors adopted the Bhattacharyya coefficient to measure the similarity between these regions.

From Table I, we observe clearly the promising capabilities of the proposed interactive SVM–MLS and SVM–MRF methods in obtaining robust CD results for all data sets compared to state-of-the-art methods. The interactive MSRM method worked well for Elba as this image is characterized by one connected change region. By contrast, it provided weak results in the Po and Riyadh data sets as changes are situated in many regions of the DI. A possible solution to this problem is to mark all changed regions, which is either difficult or impractical. By contrast, SVM–MLS followed by SVM–MRF were less sensitive owing to the following: 1) the discrimination power of SVM, which was able to capture these changes, and 2) the ability of the MLS and MRF methods to exploit the spatial information for further enhancing these maps. Fig. 3 shows an example of the CD maps obtained for the Elba image. The worst result was obtained by MLS with random initialization, whereas the best result was achieved by SVM–MLS.
TABLE I. ERROR RATE ($P_E$), FALSE-ALARM RATE ($P_F$), AND MISSED-ALARM RATE ($P_M$) IN PERCENTAGE ACHIEVED ON EACH OF THE FOUR TEST ORIGINAL DIs
Fig. 3. Change maps obtained for the Elba data set: (a) MRF fusion [1]. (b) MLS [3]. (c) MSRM [7]. (d) SVM. (e) SVM–MRF. (f) SVM–MLS.
IV. CONCLUSION
This letter has presented two interactive segmentation methods for solving the problem of CD in remote-sensing images. The first is based on the combination of SVM and MRF, while the second combines the SVM and LS methods. The experimental results obtained on four different multitemporal remote-sensing data sets have shown that the proposed approach has the following characteristics: 1) It is very attractive in generating accurate CD results with minimum interaction, and 2) it is robust against initial markings compared to the interactive MSRM method. The latter fails when confronted with images characterized by changes situated in different regions of the DI. A future development of this approach is its extension to the multichange case that may characterize the DI.

REFERENCES
[1] F. Melgani and Y. Bazi, “Markovian fusion approach to robust unsupervised change detection in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 4, pp. 457–461, Oct. 2006.
[2] Y. Bazi, F. Melgani, L. Bruzzone, and G. Vernazza, “A genetic expectation maximization method for unsupervised change detection in multitemporal SAR imagery,” Int. J. Remote Sens., vol. 30, no. 24, pp. 6591–6610, Dec. 2009.
[3] Y. Bazi, F. Melgani, and H. Alsharari, “Unsupervised change detection in multispectral remotely sensed imagery with level set methods,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 8, pp. 3178–3187, Aug. 2010.
[4] T. Celik and K. K. Ma, “Multitemporal image change detection using undecimated discrete wavelet transform and active contours,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 706–716, Feb. 2011.
[5] A. Protiere and G. Sapiro, “Interactive image segmentation via adaptive weight distances,” IEEE Trans. Image Process., vol. 16, no. 4, pp. 1046–1057, Apr. 2007.
[6] S. Xiang, F. Nie, C. Zhang, and C. Zhang, “Interactive natural image segmentation via spline regression,” IEEE Trans. Image Process., vol. 18, no. 7, pp. 1623–1632, Jul. 2009.
[7] J. Ning, L. Zhang, D. Zhang, and C. Wu, “Interactive image segmentation by maximal similarity based region merging,” Pattern Recognit., vol. 43, no. 2, pp. 445–456, Feb. 2010.
[8] L. Zhang and Q. Ji, “A Bayesian network model for automatic and interactive image segmentation,” IEEE Trans. Image Process., vol. 20, no. 9, pp. 2582–2593, Sep. 2011.
[9] L. Ding, A. Yilmaz, and R. Yan, “Interactive image segmentation using Dirichlet process multiple view learning,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 2119–2129, Apr. 2012.
[10] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Munoz-Mari, “A survey of active learning algorithms for supervised remote sensing image classification,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 606–617, Jun. 2011.
[11] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machine,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.