IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 11, NO. 4, APRIL 2014
A Fast Level Set Algorithm for Building Roof Recognition From High Spatial Resolution Panchromatic Images
Zhongbin Li, Zhizhao Liu, and Wenzhong Shi
Abstract—Traditional level set methods usually require repeated tuning of parameters, which is quite laborious and thus limits their applications. In order to simplify the parameter setting, this letter presents a fast level set algorithm that is a further extension of the original Chan–Vese model. For computational efficiency, we start by initializing the level set function in our algorithm as a binary step function rather than the often used signed distance function. Then, we eliminate the curvature-based regularizing term that is commonly used in traditional models. Thus, we can use a relatively larger time step in the numerical scheme to expedite our model. Furthermore, to keep the evolving level curves smooth, we introduce a Gaussian kernel into our algorithm to convolve the updated level set function directly. Finally, compared with other existing popular algorithms in an experiment of recognizing building roofs from high spatial resolution panchromatic images, the proposed model is much more computationally efficient while object recognition performance is comparable to other popular models.
Index Terms—Building roof recognition, Chan–Vese (CV) model, fast level set algorithm, high spatial resolution, panchromatic image.
I. INTRODUCTION
The level set method was first proposed by Osher and Sethian [1] in the field of image processing and computer vision. Because of its capability of handling topological changes naturally, it has been extensively used for 2-D building recognition from remotely sensed data [2]–[5]. Based on the driving force that controls the evolution of the level curve, existing level set methods can be categorized into two classes [6]: region-based models [2]–[4], [7] and edge-based models [5], [8]. Region-based models use the image statistical information (e.g., image mean and variance) to guide the level curves to desired boundaries. Edge-based models, on the other hand, mainly exploit the image gradient to control the evolution of level curves. Currently, most of the level set methods are derived from the region-based Chan–Vese (CV) model [7], e.g.,
Manuscript received March 23, 2013; revised June 25, 2013; accepted August 3, 2013. Date of publication September 6, 2013; date of current version December 2, 2013. This work was supported in part by the Hong Kong Research Grants Council under projects B-Q28F and B-Q32M. The authors are with the Department of Land Surveying and GeoInformatics, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2013.2278342
the two-phase models [2], [3], which are used for panchromatic images, and the multiphase model [4] for multispectral images. Meanwhile, edge-based models have also received significant attention. For instance, the distance regularized level set evolution (DRLSE) model proposed by Li et al. [8] was used to extract rooftops from color aerial imagery [5].
Despite the extensive application of level set methods, their computational cost still needs investigation. In [2], prior shapes were incorporated into the level set scheme to overcome occlusion by shadows. However, the computational speed of that algorithm can be improved further: it takes nearly 2 h to search for an appropriate building template in an image of half a million pixels [2]. Since the CV model detects all object boundaries, two extra constraints were added to it in [3] to ensure that only the desired objects are extracted. However, the added constraints bring extra parameters and thus make parameter tuning more tedious. In [5], although several models were compared in terms of accuracy, their computational times were not compared. In addition, the DRLSE method used in [5] also suffers from tedious parameter setting. In [9], a level set-based fast two-cycle (FTC) algorithm was proposed for real-time applications. Because of its computational efficiency, we use it as a benchmark against which the computational speed of our proposed model is evaluated.
This study aims to develop a fast level set method with fewer parameters to identify building roofs from panchromatic images. At this stage, we focus on the two-phase level set model rather than the multiphase one. We therefore make two assumptions about the building roofs: 1) they have distinctive outlines; and 2) the spectral property within a single roof is approximately homogeneous.
The proposed algorithm is elaborated below and then tested in building roof recognition experiments. It should be useful for those interested in object recognition from remote-sensing data using level set methods. Its advantages are twofold: 1) it is more practical to implement because it uses fewer parameters; and 2) it is computationally fast while its object recognition accuracy remains comparable to that of other popular models.
The outline of this letter is as follows. Section II describes the proposed level set algorithm and the details of its implementation. Section III applies the proposed algorithm to building roof recognition from high spatial resolution panchromatic images and compares it quantitatively with existing popular models. Finally, the discussion and conclusion are given in Section IV.
1545-598X © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. METHODOLOGY
In traditional level set methods, curve evolution is controlled by two components [8]–[12], i.e., a data term and a regularization term. The former attracts the curve toward the boundary, and the latter controls the regularity of the curve [10]. In this section, we first describe our algorithm with the proposed data term and then elaborate on its implementation.
A. Level Set Algorithm
The energy functional F(c1, c2, C) of the CV model [7] is defined as follows [13]–[15]:
$$F(c_1, c_2, C) = \mu \cdot \mathrm{Length}(C) + \lambda_1 \int_{\mathrm{inside}(C)} |I - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{outside}(C)} |I - c_2|^2 \, dx \, dy \tag{1}$$
where Length(C) is the length of the level curve C and serves as the regularizing term that keeps the evolving level curve smooth, μ ≥ 0 and λ1, λ2 > 0 are weighting coefficients, and c1 and c2 are the mean intensities of image I inside and outside the level curve, respectively. The corresponding level set formula of (1) is given as follows:
$$\frac{\partial \phi}{\partial t} = \delta_\varepsilon(\phi) \left[ \mu \cdot \mathrm{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right) - \lambda_1 (I - c_1)^2 + \lambda_2 (I - c_2)^2 \right] \tag{2}$$
where φ is the level set function, normally selected as a signed distance function (SDF), t denotes the artificial time, δε(φ) is the regularized Dirac delta function, div(∇φ/|∇φ|) is the mean curvature of the level curve, and μ · div(∇φ/|∇φ|)δε(φ) is the mean curvature-based regularization term [9], [12], [16] obtained by minimizing the term μ · Length(C) in (1).
As mentioned above, the CV model [7] has become increasingly popular in the processing of remote-sensing images. However, as one can see in (2), there are up to five parameters, i.e., ε, μ, λ1, λ2, and the time step Δt, that need to be tuned before the model can be used effectively. This is a hurdle for many applications. Particularly for feature extraction from large-scale remotely sensed images, it is labor intensive for users to tune the parameters repeatedly. In [10], the authors also pointed out that a model with fewer parameters is advantageous. Thus, we intend to simplify some parameters in the CV model and devise a more practical and efficient model. Motivated by the example in [17], we let μ = 0 and λ1 = λ2 = λ, and then (2) can be rewritten as
$$\frac{\partial \phi}{\partial t} = \delta_\varepsilon(\phi) \left[ \lambda (c_1 - c_2)(2I - c_1 - c_2) \right]. \tag{3}$$
In (2), the traditional mean curvature-based regularization term is a parabolic term [16], and thus, the choice of time step for the numerical scheme should satisfy the Courant–Friedrichs–Lewy condition [8], i.e., the time step should be sufficiently small. Assigning μ = 0 eliminates this mean curvature-based regularization term, thus allowing the use of a relatively large time step to expedite our model.
Also, it is reasonable to have λ1 = λ2 = λ because we have assumed that the boundaries between object and background regions are clear. For the applications of λ1 = λ2 , we refer the readers to [18].
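The step from (2) to (3) is a difference of squares. Written out as a short worked derivation, with μ = 0 and λ1 = λ2 = λ substituted into (2) and all symbols as defined for (1) and (2):

```latex
\begin{align*}
\frac{\partial \phi}{\partial t}
  &= \delta_\varepsilon(\phi)\,\lambda \left[ (I - c_2)^2 - (I - c_1)^2 \right] \\
  &= \delta_\varepsilon(\phi)\,\lambda \left[ (I - c_2) - (I - c_1) \right]
     \left[ (I - c_2) + (I - c_1) \right] \\
  &= \delta_\varepsilon(\phi)\,\lambda\,(c_1 - c_2)(2I - c_1 - c_2).
\end{align*}
```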
Fig. 1. Implementation of the proposed algorithm.
In (c1 − c2)(2I − c1 − c2) on the right-hand side of (3), the effective part is the data term (2I − c1 − c2), whose sign controls the direction of the propagating level curve. The term (c1 − c2) can be removed because of its insignificant contribution. In addition, it is feasible to replace δε(φ) with |∇φ|, as mentioned in [7]. Based on these steps, the following much simpler region-based level set formula is obtained:
$$\frac{\partial \phi}{\partial t} = \lambda (2I - c_1 - c_2) |\nabla \phi|. \tag{4}$$
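One way to read the data term is to rewrite it as 2I − c1 − c2 = 2[I − (c1 + c2)/2]: its sign tells whether a pixel is brighter or darker than the midpoint of the two region means. The following NumPy snippet is a minimal sketch of that observation. The toy image, the initial region, and the helper name `data_term` are hypothetical and chosen only for illustration.

```python
import numpy as np

def data_term(image, inside_mask):
    """Return (2I - c1 - c2, c1, c2) for a boolean inside-mask."""
    c1 = image[inside_mask].mean()    # mean intensity inside the curve
    c2 = image[~inside_mask].mean()   # mean intensity outside the curve
    return 2.0 * image - c1 - c2, c1, c2

# Toy image (hypothetical): a bright 2x2 "roof" on a dark background.
image = np.full((4, 4), 50.0)
image[1:3, 1:3] = 200.0

# Hypothetical initial region: the top-left 3x3 block.
inside = np.zeros((4, 4), dtype=bool)
inside[:3, :3] = True

term, c1, c2 = data_term(image, inside)
midpoint = 0.5 * (c1 + c2)

# Positive entries mark pixels brighter than the midpoint of c1 and c2.
print(np.sign(term).astype(int))
```

In this toy case the term is positive exactly on the bright block, even though the initial region only partly covers it.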
In (4), a larger value of λ can expedite the evolution of the level curve. However, to reduce the number of parameters, we speed up this model by using a larger time step Δt instead of a larger λ. Therefore, we remove λ from (4). The definitions of c1 and c2 are the same as those in (1). As in [19], to obtain more stable results, we normalize the data term (2I − c1 − c2), and thus, the proposed formula can be written as follows:
$$\frac{\partial \phi}{\partial t} = \frac{2I - c_1 - c_2}{\max\left(|2I - c_1 - c_2|\right)} |\nabla \phi|. \tag{5}$$
In this way, we obtain a model with only one parameter, i.e., the time step Δt. It is also a cost-effective model suitable for feature extraction from panchromatic images, as shown in the experimental results. Other related work has been proposed in [16] and [20] and more recently in [15]. In the next section, we give details of the implementation of the proposed model, including the regularization of the level curve.
B. Implementation
As shown in Fig. 1, the implementation of the proposed algorithm (5) is an iterative process that mainly involves five steps: initialization, evolution, reinitialization, regularization, and iteration. The major improvements of this letter over traditional level set methods [7], [10] are highlighted in Fig. 1.
LI et al.: FAST LEVEL SET ALGORITHM FOR BUILDING ROOF RECOGNITION FROM PANCHROMATIC IMAGES
The five major steps are detailed as follows.
1) Initialization. The initialization consists of two parts. One is to choose the level set function, and the other is to locate the initial position of the level curve and determine the signs of the level set function inside and outside the level curve. In traditional methods, the level set function is commonly initialized as an SDF [7], [10]. However, this is computationally expensive because of its reinitialization [9]. For computational efficiency, in our model, we initialize the level set function as a binary step function (BSF) [8], [16] as follows:
$$\phi(X, t = 0) = \begin{cases} -1, & \text{if } X \in R \\ 1, & \text{otherwise} \end{cases} \tag{6}$$
where R is a region in the image domain. The signs of the level set function (6) control the direction of the evolving level curves, and they can be adjusted according to different illuminations of the object and background regions. Additionally, the initial position of the level curve is of significant importance when implementing a level set algorithm. Particularly for edge-based models [8], [10], object recognition may fail because of incorrect initialization. The initialization of our model is detailed in the following experiments.
2) Evolution. Compute the values c1, c2, and |∇φ|. Then, update the level set function (6) according to the finite difference equation φ^{n+1} = φ^n + Δt · L(φ^n), where n is the iteration number, Δt is the time step, and L(φ^n) is the right-hand side of (5).
3) Reinitialization. Similar to that in [7], this step is optional for our model. Reinitialization is unnecessary when multiple objects are to be extracted simultaneously. In this case, |∇φ| is nonzero over the entire image, which makes (5) valid for every pixel. In contrast, when only a single object is to be extracted from a complicated background containing unwanted objects whose intensities are similar to that of the object of interest, one has to reinitialize the level set function periodically to ensure that |∇φ| ≠ 0 in the vicinity of the zero level curve and |∇φ| = 0 elsewhere. This makes (5) valid only around the desired object boundary. Specifically, φ is reinitialized as the BSF (6), i.e., set φ = 1 if φ > 0; otherwise, set φ = −1.
4) Regularization. The widely used mean curvature-based regularization term div(∇φ/|∇φ|)δε(φ) in (2) has been removed from our model because of the time step limitation mentioned previously. To keep the evolving level curve smooth, we introduce a Gaussian kernel to smooth the updated level set function directly [9], [20]. This regularization step stems from earlier work, such as diffusion-generated motion [21] and convolution-generated motion [22].
5) Iteration. Check whether the iteration converges. The iteration stops if the evolution converges; otherwise, go back to step 2.
III. EXPERIMENTAL RESULTS
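Before the comparisons, the five steps of Section II-B can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation. It assumes central differences (`np.gradient`) for |∇φ|, edge replication at the image border, reinitialization at every iteration, and "the inside region no longer changes" as the convergence test, none of which are specified in the letter. It uses the positive-inside sign convention stated in the parameter settings below, which is the opposite of the signs written in (6). The time step Δt = 15 and the 5 × 5 Gaussian kernel with σ = 1 are the values reported in Section III-B. The function names and the toy image are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian kernel (size x size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def smooth(phi, kernel):
    """'Same'-size filtering with edge replication (assumption).
    The kernel is symmetric, so correlation equals convolution."""
    r = kernel.shape[0] // 2
    padded = np.pad(phi, r, mode="edge")
    h, w = phi.shape
    out = np.zeros_like(phi)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def evolve(image, init_mask, dt=15.0, max_iter=200, reinit=True):
    """Sketch of steps 1)-5); returns a boolean mask of the inside region."""
    image = image.astype(float)
    kernel = gaussian_kernel(5, 1.0)
    # Step 1: binary step function, positive inside (assumed convention).
    phi = np.where(init_mask, 1.0, -1.0)
    for _ in range(max_iter):
        inside = phi > 0
        if inside.all() or not inside.any():
            break  # one region vanished; c1 or c2 would be undefined
        # Step 2: evolution with the normalized data term of (5).
        c1 = image[inside].mean()
        c2 = image[~inside].mean()
        data = 2.0 * image - c1 - c2
        scale = np.abs(data).max()
        if scale == 0.0:
            break  # constant image: nothing drives the curve
        gy, gx = np.gradient(phi)
        phi = phi + dt * (data / scale) * np.hypot(gx, gy)
        # Step 3: optional reinitialization to a binary step function.
        if reinit:
            phi = np.where(phi > 0, 1.0, -1.0)
        # Step 4: regularization by Gaussian smoothing of phi itself.
        phi = smooth(phi, kernel)
        # Step 5: stop when the inside region is unchanged (assumed test).
        if np.array_equal(phi > 0, inside):
            break
    return phi > 0

# Hypothetical toy example: bright 20x20 "roof" on a dark background.
image = np.full((40, 40), 50.0)
image[10:30, 10:30] = 200.0
init = np.zeros((40, 40), dtype=bool)
init[5:20, 5:20] = True   # initial curve overlaps the roof only partly
roof_mask = evolve(image, init)
```

Because the smoothing rounds sharp corners, the recovered region in this toy case need not match the square roof pixel for pixel.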
The algorithm presented in this letter is compared against several widely used models, namely, the CV [7], DRLSE [8], and FTC [9] models, in terms of parameter tuning, accuracy, and CPU time for building roof recognition from panchromatic images. Note that the DRLSE model has the inherent limitations of edge-based models, i.e., it is a one-way curve evolution model [4] and is sensitive to the initial location of the level curve [5]. For a fair comparison, we therefore place all the initial level curves at the same position in each experiment. In addition, the proposed and CV models need periodic reinitialization, whereas neither the DRLSE nor the FTC model needs reinitialization because of their intrinsic properties. The MATLAB codes for the CV, DRLSE, and FTC models were downloaded from [23]–[25], respectively. All the algorithms were run under 64-bit MATLAB R2012a on Windows 8 on a MacBook Pro with an Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz and 4 GB of RAM. The CPU time consumed by each algorithm was recorded.
A. Data
In Fig. 2, the original images marked with letters (a) to (e) are listed from top to bottom in the left column. Images (a), (b), and (e) are GeoEye-1 panchromatic images with a spatial resolution of 0.5 m/pixel and sizes of 618 × 467, 412 × 297, and 650 × 510 pixels, respectively. Images (c) and (d) are aerial images with a spatial resolution of 0.15 m/pixel and sizes of 867 × 623 and 994 × 648 pixels, respectively. The buildings in images (a), (b), (c), and (e) have sloping roofs. Because of the slanted incidence of sunlight, the roofs appear to have varied illumination. In comparison, the roofs in image (d) are flat and appear to have approximately homogeneous intensities. In addition, the building roofs and facades in image (e) have extremely similar intensities, which generally poses a great challenge for building roof recognition.
B. Parameter Setting
After trial and error, we set the following parameters to obtain the best performance: Δt = 4, μ = 0.2, v = 0, and λ1 = λ2 = 1 for CV; Δt = 5, μ = 0.04, λ = 5, and α = −3 for DRLSE; default values [25] for FTC; and Δt = 15 for the proposed model.
To keep the evolving level curves smooth in our model, we use a 5 × 5 Gaussian kernel with standard deviation σ = 1 to convolve the level set function directly at each iteration [9], [20]. For the CV and the proposed models, we use level set functions that take positive values inside the zero level curves and negative values outside. In contrast, the level set functions in the DRLSE and FTC models are negative inside and positive outside. Note that the DRLSE model is to some extent sensitive to noise, and thus, a 15 × 15 Gaussian kernel with standard deviation σ = 1.5 is used as preprocessing to smooth the original images. The CV, FTC, and proposed models, however, are relatively robust to noise, and no denoising is performed for them.
C. Qualitative Evaluation
In Fig. 2, the original images (a) to (e) are shown with green initial level curves. The images in the subsequent columns display the outlines of the building roofs recognized by the CV, DRLSE, FTC, and proposed models, respectively. The images in the last column show the ground truth data, i.e., the red regions. Specifically, in images (a) and (b), the shadows cast by the buildings themselves result in overdetection by the CV, FTC, and proposed models. In contrast, the DRLSE model performs slightly better and recognizes more complete outlines of building roofs. Disregarding the shape a priori information,
Fig. 2. First column shows the original images with green initial level curves. Subsequent images in each column display the outlines of building roof recognized by CV [7], DRLSE [8], FTC [9], and the proposed model, respectively. The ground truth (i.e., red region) of each test image is shown in the last column.
all the algorithms fail to recognize the building roofs occluded by trees in images (c) and (d). Also, they are unable to distinguish building roofs from facades in image (e) because of the similar intensities. Besides, DRLSE cannot recognize the darker building roofs in image (e), whereas the other models perform better.
D. Quantitative Evaluation
The recognized building roofs are compared with the ground truth data, which are manually digitized from the original images. To evaluate the quality of these models, we use the following two indices:
$$\mathrm{Completeness} = A_o / A_r \tag{7}$$
$$\mathrm{Correctness} = A_o / A_e \tag{8}$$
where Ao is the total area of recognized building roof that matches the ground truth data, Ar is the total area of the ground truth data, and Ae is the total area of the extracted building roofs. In addition, the index of greatest interest in this study is CPU time, which represents the computational efficiency of each model.
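As an illustration of (7) and (8), the two indices can be computed from a pair of binary masks. The sketch below assumes that areas are measured by counting pixels and that "matched" means pixelwise overlap. The letter does not state how areas were measured, so both points are assumptions, and the function name and toy masks are hypothetical.

```python
import numpy as np

def completeness_correctness(extracted, truth):
    """Completeness = Ao/Ar and Correctness = Ao/Ae from boolean masks."""
    extracted = np.asarray(extracted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    a_o = np.logical_and(extracted, truth).sum()  # extracted area matching truth
    a_r = truth.sum()                             # ground truth area
    a_e = extracted.sum()                         # extracted area
    completeness = a_o / a_r if a_r else float("nan")
    correctness = a_o / a_e if a_e else float("nan")
    return completeness, correctness

# Hypothetical masks: a 6x6 ground truth roof and a shifted, wider extraction.
truth = np.zeros((10, 10), dtype=bool)
truth[2:8, 2:8] = True          # 36 pixels
extracted = np.zeros((10, 10), dtype=bool)
extracted[4:10, 2:10] = True    # 48 pixels, 24 of which overlap the truth

completeness, correctness = completeness_correctness(extracted, truth)
print(completeness, correctness)
```

Here 24 of the 36 ground truth pixels are recovered (completeness 24/36), and 24 of the 48 extracted pixels are correct (correctness 24/48).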
TABLE I
QUANTITATIVE EVALUATION RESULTS OF CV, DRLSE, FTC, AND THE PROPOSED MODELS
Table I shows the quantitative evaluation results of all the models. In general, the proposed model performs comparably to the other three models in terms of completeness and correctness. Note that the completeness of the DRLSE model, 0.680, is relatively low, as indicated in bold text, since it is an edge-based model, which is generally insensitive to weak edges. In contrast, the CV, FTC, and proposed models are based on image region information, and thus, they perform better. In terms of CPU time, our model is considerably faster than the other three models. As indicated in bold text, the proposed model takes only 4.1, 1.0, 5.5, 4.6, and 5.6 s for the five test images, respectively.
IV. DISCUSSION AND CONCLUSION
This letter has presented a fast region-based level set algorithm that is a further development of the CV model. Specifically, four improvements have been made in this study: 1) based on the CV model, we have devised a more practical and efficient data term with fewer parameters; 2) for computational efficiency, we have initialized the level set function in our model as a BSF rather than the often used SDF; 3) in our model, a larger time step can be used in the numerical scheme because the traditional curvature-based regularizing term has been eliminated; and 4) to maintain the smoothness and regularity of the evolving curves, a Gaussian kernel has been introduced into the regularization process. The combination of these steps results in a considerable improvement in computational efficiency. The experimental results have shown that the proposed model is significantly faster than the other three models while achieving comparable performance in terms of completeness and correctness. However, further improvement can be considered in the following aspects: 1) modifying this algorithm to process multispectral images; 2) addressing the recognition of occluded building roofs; and 3) handling multiphase objects directly.
These limitations can be addressed in future research. The test results have shown that this letter is useful for selecting the most appropriate level set method for feature extraction. Given its efficiency and accuracy, the proposed model is suitable for object recognition and extraction from remote-sensing data in applications such as road extraction and vehicle extraction.
ACKNOWLEDGMENT
The authors would like to thank all the anonymous reviewers for their detailed and valuable comments and suggestions.
REFERENCES
[1] S. Osher and J. A. Sethian, “Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations,” J. Comput. Phys., vol. 79, no. 1, pp. 12–49, Nov. 1988.
[2] K. Karantzalos and N. Paragios, “Recognition-driven two-dimensional competing priors toward automatic and accurate building detection,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 1, pp. 133–144, Jan. 2009.
[3] S. Ahmadi, M. J. V. Zoej, H. Ebadi, H. A. Moghaddam, and A. Mohammadzadeh, “Automatic urban building boundary extraction from high resolution aerial images using an innovative model of active contours,” Int. J. Appl. Earth Observ. Geoinf., vol. 12, no. 3, pp. 150–157, Jun. 2010.
[4] K. Kim and J. Shan, “Building roof modeling from airborne laser scanning data based on level set approach,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 4, pp. 484–497, Jul. 2011.
[5] M. Cote and P. Saeedi, “Automatic rooftop extraction in nadir aerial imagery of suburban regions using corners and variational level set evolution,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 313–328, Jan. 2013.
[6] C. Li, C. Y. Kao, J. C. Gore, and Z. Ding, “Minimization of region-scalable fitting energy for image segmentation,” IEEE Trans. Image Process., vol. 17, no. 10, pp. 1940–1949, Oct. 2008.
[7] T. F. Chan and L. A. Vese, “Active contours without edges,” IEEE Trans. Image Process., vol. 10, no. 2, pp. 266–277, Feb. 2001.
[8] C. Li, C. Xu, C. Gui, and M. D. Fox, “Distance regularized level set evolution and its application to image segmentation,” IEEE Trans. Image Process., vol. 19, no. 12, pp. 3243–3254, Dec. 2010.
[9] Y. Shi and W. C. Karl, “A real-time algorithm for the approximation of level-set-based curve evolution,” IEEE Trans. Image Process., vol. 17, no. 5, pp. 645–656, May 2008.
[10] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours,” Int. J. Comput. Vis., vol. 22, no. 1, pp. 61–79, Feb./Mar. 1997.
[11] K. Zhang, L. Zhang, H. Song, and D. Zhang, “Reinitialization-free level set evolution via reaction diffusion,” IEEE Trans. Image Process., vol. 22, no. 1, pp. 258–271, Jan. 2013.
[12] L. Bertelli, S. Chandrasekaran, F. Gibou, and B. S. Manjunath, “On the length and area regularization for multiphase level set segmentation,” Int. J. Comput. Vis., vol. 90, no. 3, pp. 267–282, Dec. 2010.
[13] E. Brown, T. Chan, and X. Bresson, “Completely convex formulation of the Chan-Vese image segmentation model,” Int. J. Comput. Vis., vol. 98, no. 1, pp. 103–121, May 2012.
[14] X.-F. Wang, D.-S. Huang, and H. Xu, “An efficient local Chan-Vese model for image segmentation,” Pattern Recognit., vol. 43, no. 3, pp. 603–618, Mar. 2010.
[15] Y. Yuan and C. He, “Adaptive active contours without edges,” Math. Comput. Model., vol. 55, no. 5/6, pp. 1705–1721, Mar. 2012.
[16] F. Gibou and R. Fedkiw, “A fast hybrid k-means level set algorithm for segmentation,” Stanford Univ., Stanford, CA, USA, Tech. Rep., 2002.
[17] S. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces. New York, NY, USA: Springer-Verlag, 2002, pp. 123–124.
[18] M. Maska, P. Matula, O. Danek, and M. Kozubek, “A fast level set-like algorithm for region-based active contours,” in Proc. 6th Int. Conf. Adv. Vis. Comput., 2010, Lect. Notes Comput. Sci., vol. 6455, pp. 387–396.
[19] L. D. Cohen and I. Cohen, “Finite-element methods for active contour models and balloons for 2-D and 3-D images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 1131–1147, Nov. 1993.
[20] K. Zhang, L. Zhang, H. Song, and W. Zhou, “Active contours with selective local or global segmentation: A new formulation and level set method,” Image Vis. Comput., vol. 28, no. 4, pp. 668–676, Apr. 2010.
[21] B. Jawerth and P. Lin, “Shape recovery by diffusion generated motion,” J. Vis. Commun. Image Represent., vol. 13, no. 1/2, pp. 94–102, Mar. 2002.
[22] S. J. Ruuth and B. Merriman, “Convolution-generated motion and generalized Huygens’ principles for interface motion,” SIAM J. Appl. Math., vol. 60, no. 3, pp. 868–890, Feb./Mar. 2000.
[23] S. Lankton, Shawn Lankton Online, New York, NY, USA, 2007. [Online]. Available: http://www.shawnlankton.com/?s=active+contour
[24] C. Li, Chunming Li’s Homepage, 2010. [Online]. Available: http://www.engr.uconn.edu/~cmli/DRLSE/
[25] O. Bernard, CREASEG, Villeurbanne Cedex, France, 2010. [Online]. Available: http://www.creatis.insa-lyon.fr/~bernard/creaseg/files/download.html