Building outline extraction from Digital Elevation Models using marked point processes

Mathias Ortner, Xavier Descombes, and Josiane Zerubia
Abstract— This work presents an automatic algorithm for extracting vectorial land registers from altimetric data in dense urban areas. We focus on elementary shape extraction and propose a method that extracts rectangular buildings. The result is a vectorial land register that can be used, for instance, to perform precise roof shape estimation. Using a spatial point process framework, we model towns as configurations of an unknown number of rectangles. An energy is defined, which takes into account both the low level information provided by the altimetry of the scene and geometric knowledge about the disposition of buildings in towns. Estimation is done by minimizing the energy using simulated annealing. We use an MCMC sampler that combines general Metropolis-Hastings-Green techniques with the Geyer and Møller algorithm for point process sampling. We define some original proposal kernels, such as birth or death in a neighborhood, and define the energy with respect to an inhomogeneous Poisson point process. We present results, obtained automatically, on real data provided by the IGN (French National Geographic Institute). These results consist of configurations of rectangles describing a dense urban area. Index Terms— Image processing, inhomogeneous Poisson point process, stochastic geometry, dense urban area, Digital Elevation Models, Laser data, land register, building detection, MCMC, RJMCMC, simulated annealing.
I. INTRODUCTION

A. Urban areas, the third dimension and remote sensing

The automatic reconstruction of precise 3-dimensional maps of towns is of prime importance for an increasing number of applications, including cartography, urban planning and military intelligence. Of course, there are several ways of representing a town in 3 dimensions. The simplest is raster data: each pixel of an image stands for a height. The most complex is a precise vectorial description: each building is described by a set of 3D primitives, and other features such as cars or trees can be included. The advent of high resolution (HR) data in remote sensing has made aerial and satellite images very useful for the analysis of urban areas. A lot of sensors are now available, at different costs. For instance, the commercial US satellite Ikonos provides images at 1 m resolution and the European satellite SPOT5 provides images at 2.5 m, while aerial imagery gives centimetric resolution. LASER or SAR imaging provide specific HR data. In this work, we partly focus on the French city of Amiens, for which we were provided with a dataset by the
Ariana - Joint research group CNRS / INRIA / UNSA - INRIA, 2004 route des Lucioles, BP 93, Sophia Antipolis, France. Tel : +33 492 387 857, Fax : +33 492 387 643, Email :
[email protected]
Fig. 1. Aerial image (0.25m) of Amiens provided by the French National Geographic Institute (IGN).
French National Geographic Institute (IGN). Figure 1 presents part of an aerial image of this town. Depending on the application, the data provided and the precision required, several methods and algorithms have already been proposed for dealing with urban areas. General overviews can be found in [1], [2] or in the introduction of [3]. Concentrating on automatic methods, a general synopsis of the existing approaches emerges. Automatic methods are mostly made up of three steps: focalisation on an area of interest, low level primitive extraction, and building reconstruction. Focalisation on a precise area consists in the search for a relevant area using either a pre-segmentation (ground/above ground) or some external vectorial data like a vectorial land register. Primitive detection and building reconstruction are usually closely linked, since they depend on two complementary processes: the former creates aggregations of primitives (bottom-up step, as illustrated in [1]), while the latter matches building models from a database with the previously obtained hypotheses (top-down step). During the matching step, these methods face a combinatorial problem. To improve this approach, many proposed methods tend to increase the amount of data used, for example by using multiple color images, Laser data, register maps or hyperspectral images (see examples in [3]–[5]). Original proposals have been presented, for example using grammars on graphs or Bayesian networks (see respectively [6] and [7]). We present here a slightly different approach based on the
Mathias Ortner
No. 07038
following precepts:
• We propose to fuse the first two steps of focalisation and primitive detection. Our goal is to build an automatic method that extracts simple shapes of buildings and furnishes a result in vectorial form.
• We propose to use only the data given by a Digital Elevation Model (DEM).
• We aim at introducing some spatial knowledge about the arrangement of buildings in dense urban areas.
In this paper, we do not deal with the third step, namely the precise reconstruction of 3D shape.

B. Digital Elevation Models and Building Extraction

DEMs are raster data, for which the value associated to each pixel represents a height. Figure 2 presents such a model, provided by IGN. Each pixel represents an area of 20 cm by 20 cm, and the height resolution is 10 cm.
Fig. 2. Digital Elevation Model of Amiens (© IGN).
Our goal is to produce an automatic method for the extraction of simple shapes from such Digital Elevation Models. We present in this paper results on three different kinds of DEM: two optical (high and low quality) and one laser. The extraction of as much vectorial information as possible from a DEM has been explored in various ways. In [8] and [9], S. Vinson and L. D. Cohen extract rectangles from a DEM using an orthoimage, a pre-segmentation of the DEM and a template approach. In [10], U. Weidner presents a set of morphological tools. In [11], a robust approach based on planar segmentation has been proposed by Vestri and Devernay. Also well known is the work of H. Maas and G. Vosselman in [12], which introduces invariant moments of clouds of points in a model-oriented approach and focuses on laser scanner data points without requiring any interpolation. These different methods prove to be efficient in different contexts, and provide different kinds of vectorial representation of the area. The work presented here tries to detect simple shapes from DEMs using the physical information present in the discontinuities of the DEM. The goal is to obtain a vectorial representation of a dense urban area that can be used to
Article submitted to the journal IJCV
initialise precise roof and building reconstruction methods like the algorithms presented by H. Jibrini in [3]. This work can thus be seen as a medium-level feature extractor. The results are presented as a vectorial land register map.

C. Spatial point processes

In order to introduce spatial statistics into our model, we use a marked point process representation where points interact. Such models were introduced to image processing by A. Baddeley and M.N.M. van Lieshout in [13]. Further work has been done by H. Rue ([14], [15]), while more complex applications have been studied in [16]–[18]. Our approach consists in modeling an urban area by a set of an unknown number of interacting particles, where each particle stands for a building hypothesis. A particle is actually a geometrical object that can be compared to the data. A theoretical analysis of possible mathematical models can be found in [19], [20]. Simulated annealing is performed on the density of the defined point process, under its Gibbs representation. This simulated annealing requires an MCMC (Markov Chain Monte Carlo) sampler, such as those provided by C. Geyer and J. Møller in [21], [22], P.J. Green in [23], and van Lieshout in [24]. The Monte Carlo technique we have implemented uses a Metropolis-Hastings-Green update. The "Green" part comes from the fact that the number of particles is unknown, so that jumps between configurations of different sizes are needed. This recalls Grenander and Miller's Jump and Diffusion algorithm [25]. The major differences with their work come from the statistical physics model with strongly interacting particles that we have chosen, and from the object oriented sampling approach we have implemented. A general presentation of the proposed model is provided in section II. The following two sections detail the model proposed in this paper.
Section III presents the external field that quantifies the pertinence of a building hypothesis with respect to the data, while section IV focuses on the internal field made up of strong interactions between buildings. The optimization algorithm is then described in section V. Finally, some results are shown in section VI. We conclude with some comments on the relevance and the reliability of our approach in section VII.

II. GENERALITIES ABOUT THE MODEL

A. Point processes

1) Definition: First, let us consider a point process X living in K = [0, Xmax] × [0, Ymax]. X is a measurable mapping from a probability space (Ω, A, P) to configurations of points of K:

∀ω ∈ Ω, X(ω) = {x1, . . . , xn, . . . }, xi ∈ K

Since K is bounded and included in R², this mapping defines a point process (see [19] for details). Basically, a point process is a random variable whose realizations are random configurations of points. K can also be a torus.
Finally, let us denote by C the set of all finite configurations of points of S:

x ∈ C iff ∃n ∈ N such that x = {x1, . . . , xn} with xi ∈ S for all i
Fig. 3. Realizations of two different Poisson point processes: (a) homogeneous, (b) inhomogeneous.
2) Poisson point process: The most random (w.r.t. the entropy) point process is the Poisson point process. Let ν(.) be a positive measure on K and consider a Poisson point process X with intensity measure ν(.) on K. Consider the discrete Poisson distribution with mean ν(K):

pn = e^{−ν(K)} ν(K)^n / n!

A realization of X can be generated by the following two steps:
a) Number of points: generate N(K) = n, the number of points in the configuration, following the previously described discrete distribution;
b) Conditional distributions: generate {x1, . . . , xn} by n independent simulations of the distribution given by (ν/ν(K))(.). We denote by |.| or |.|K the Lebesgue measure on K.

For a point process on K, the intensity measure gives the average number of points falling in every Borelian set of K. Let NA(X) be the random number of points of X falling in A ⊂ K:

E[NA(X)] = ν(A)

Fig. 3 shows realizations of two Poisson point processes: the first case, 3(a), corresponds to a uniform measure ν(.), while the second one, 3(b), corresponds to an intensity measure that gives three times more weight to the upper left corner of K.

3) Marked point process: Point processes were introduced in image processing because they easily allow modeling scenes made of objects. A marked point process adds some marks (i.e. random parameters) to each point. Let S = K × M with M ⊆ R^{d−2}. For instance, to describe rectangles, we use the parametrization (center position, orientation, length and width of the rectangle):

M = [−π/2, π/2] × [Lmin, Lmax] × [lmin, lmax]

which corresponds to the natural parameterization of a rectangle:

u = (x(u), y(u), θ(u), L(u), l(u))
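The two-step simulation, combined with the rectangle marks above, can be sketched as follows (a minimal sketch; the uniform intensity measure, the numeric bounds and all function names are ours):

```python
import math
import random

def sample_marked_poisson(nu_K, K, M, rng):
    """Simulate a marked Poisson point process of rectangles.

    Step a) draws the number of points N(K) ~ Poisson(nu(K));
    step b) draws that many independent points, here uniformly on
    K x M (a uniform intensity measure), each point carrying the
    marks (theta, L, l) of a rectangle.
    """
    # Step a: Poisson number of points, by inversion of the CDF.
    u = rng.random()
    n = 0
    p = math.exp(-nu_K)
    cdf = p
    while u > cdf:
        n += 1
        p *= nu_K / n
        cdf += p
    # Step b: n independent uniform draws of (x, y, theta, L, l).
    (xmax, ymax) = K
    (theta_lo, theta_hi), (Lmin, Lmax), (lmin, lmax) = M
    return [(rng.uniform(0, xmax), rng.uniform(0, ymax),
             rng.uniform(theta_lo, theta_hi),
             rng.uniform(Lmin, Lmax), rng.uniform(lmin, lmax))
            for _ in range(n)]

rng = random.Random(0)
K = (100.0, 100.0)                     # [0, Xmax] x [0, Ymax], in meters
M = ((-math.pi / 2, math.pi / 2), (5.0, 30.0), (3.0, 15.0))
config = sample_marked_poisson(50.0, K, M, rng)
```

An inhomogeneous process would only change step b, by drawing the positions from ν(.)/ν(K) instead of the uniform distribution.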
4) Density of a spatial point process: We consider the distribution µ(.) of a given Poisson point process with a non-atomic intensity measure ν(.). It is possible to define a point process X by specifying its probability distribution through a density with respect to the dominating distribution µ(.). Let us consider a mapping h(.) from the space of configurations of points C to [0, ∞[, and a real parameter Z such that:

(1/Z) ∫_C h(x) dµ(x) = 1

For instance, let us assume:

h(x) = ∏_{i=1}^{n(x)} β(xi)

where β(.) is an intensity function defined on S. A point process X specified by such a density is a Poisson point process with intensity:

ν′(A) = ∫_A β(u) dν(u)    (1)
This example belongs to a more general class of densities defined by exponential families: supposing a mapping t(.) from C to some R^k is defined, it is possible to describe a class of densities using a parameter θ ∈ R^k and the scalar product ⟨., .⟩:

h(x) = e^{−⟨θ, t(x)⟩}
In this work we introduce a density where points are not independent but are correlated through interaction energies.

5) Estimator and MCMC: If we are given a point process X defined by an unnormalized density h(.) and an intensity measure ν(.) defining a reference Poisson point process distribution µ(.), it is possible to build a Markov chain that ergodically converges to the distribution of X. In [26] we presented an algorithm producing a Markov chain (Xt)t≥0 with the following properties:
• (Xt)t≥0 ergodically converges to the distribution of X,
• (Xt)t≥0 is Harris recurrent (every starting point is suitable),
• the ergodic convergence is geometric.
Once such a sampler is defined, any Monte Carlo value is computable, including moments or more complex statistics. Another way of using this sampler is within a simulated annealing algorithm, which gives a global maximum of the density h(.), as described in [24]. In this case, the estimator we obtain is given by:

x̂ = arg max h(.)
B. Point processes to model urban areas

In this section, we define the objects and the class of densities we consider to model dense urban areas.

1) Silhouettes and third dimension: The first choice we have to make concerns the kind of objects we want to detect. They could have been lines or polygons, but we have chosen rectangles for the following reasons. Firstly, rectangles have few parameters: limiting the dimension of S limits the size of C and makes the optimization easier. Secondly, tests done with lines revealed two difficulties: there are a lot of lines of interest in a DEM, and a non closed line is hard to interpret from a semantic point of view. Finally, land registers are often presented as collections of polygons that appear to be close to rectangles, which hence seem to be a natural pattern to be detected on a DEM.

2) Configurations of objects, image and energy: We define here the strategy we have adopted to define the density h(.). In image processing, two main types of models are generally used. The first one is the Bayesian model, which requires exhibiting a likelihood function describing the distribution of an image I given a configuration of rectangles x:

L(I|X = x) = L(x, I)
h(x) ∝ L(x, I) hprior(x)
In this case the optimization is carried out over the a posteriori density, which is obtained by multiplying the likelihood, which provides the correspondence between the data and a configuration, by some a priori density, chosen either for the suitability of its statistical properties for the application at hand or because it is mathematically well behaved. However, in our setup, such a Bayesian model would require us to accurately describe the distribution of heights in every area of the DEM. A precise model of the height distribution on both the foreground and the background would thus be needed. Examples of Bayesian point process models for image processing are given in [14] or [27]. For our application, we use a second class of models widely used in image processing and define a density under its Gibbs form:

h(x) = (1/Z) e^{−U(x)},  U(x) = Uint(x) + ρ Uext(x)    (2)
Here, Uint(x) stands for an internal energy giving a spatial structure to the configuration x, while Uext(x) is the external field quantifying the quality of a configuration given the data. The positive parameter ρ allows tuning the relative weight of the two terms. Focusing on the data term, the simplest way of defining it is to expand it as a sum over the objects of a configuration:

Uext(x) = Σ_{u∈x} Ud(u)    (3)
Ud(.) is thus a mapping from S to R quantifying the relevance of an object with respect to the DEM. If Ud(u) ≤ 0, the object u is attractive, since the chosen estimator is the maximizer of the probability density h(.). However, care is needed to avoid the superposition of points. In equation (3), it is obvious that if Ud(u) ≤ 0, then successive additions of u to the configuration x increase the density of the current configuration: h(x ∪ u ∪ u) ≥ h(x ∪ u) ≥ h(x). An exclusive term avoiding such a
Fig. 4. A DEM, a rectangle and the slices construction: (a) disposition of the measured slices (parameters r and Lext), (b) sub-sampling (parameter e).
superposition is thus needed, and we finally propose to define a model with the following structure:

U(x) = ρ Σ_{u∈x} Ud(u) + Uint(x) + Uexcl(x)    (4)

where Uexcl(x) is such that:

U(x ∪ u ∪ u) > U(x ∪ u)  ∀ (u, x) ∈ S × C
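The role of the exclusion term in (4) can be demonstrated with toy components (all three component functions below are illustrative stand-ins, not the paper's definitions):

```python
def total_energy(x, U_d, U_int, U_excl, rho):
    """Energy of equation (4): a data term summed over objects, an
    internal field, and an exclusion term penalizing superposition."""
    return rho * sum(U_d(u) for u in x) + U_int(x) + U_excl(x)

# Toy components: every object is attractive (U_d <= 0), and the
# exclusion term charges each duplicated object.
U_d = lambda u: -1.0
U_int = lambda x: 0.0
U_excl = lambda x: 10.0 * (len(x) - len(set(x)))

x = [("a",)]
u = ("b",)
e1 = total_energy(x + [u], U_d, U_int, U_excl, rho=1.0)
e2 = total_energy(x + [u, u], U_d, U_int, U_excl, rho=1.0)
assert e2 > e1  # adding u twice raises the energy despite U_d(u) <= 0
```

Without the exclusion term, e2 would be lower than e1, and the minimizer would pile up infinitely many copies of any attractive object.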
III. DATA TERM

In this section, we focus on the analysis of the altimetric data. In particular, we explain how the data term Ud(.), a mapping from S to R, has been designed. The first purpose is to decide what an "attractive object" (Ud(u) ≤ 0) is, and the second is to introduce a potential such that minimizing Ud(u) locally gives the closest attractive object.

A. Low level procedure

1) Use of slices: The filter we need should recognize elements of buildings on a DEM. Since objects have local properties, the filter should involve some local detector. In [18] we made two hypotheses on the grey level distribution in a building's silhouette. However, this filter appeared to be inefficient when trying to detect more complex structures. We propose here to process slices of the DEM taken orthogonally to the principal axis of a rectangle. Fig. 4(a) shows a part of a DEM, a rectangle on it, and evenly spaced slices. Fig. 5 shows some examples of profiles measured on the DEM. As described in this figure, we propose to extract some points of interest from each profile and to check the coherence between these points and the rectangular shape of the object. Thus, to compute Ud(u), we use a low level algorithm that relies on the following idea: for a given rectangle u ∈ S, we first measure all related slices and then, for each slice, detect points of interest. This allows us to compute:
Fig. 5. Slices, points of interest and rectangular shape.

• a binary test deciding whether the object is relevant (see the discussion leading to (4): this gives the support of the negative part of Ud(.)),
• a cost function corresponding to the values of Ud(u), reflecting the distance between u and the closest attractive object if Ud(u) > 0, and between u and its best version if it is already attractive.
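In Python, the extraction of points of interest on a single profile could look like the following (a minimal sketch based on our reading of the simplification procedure described in the next subsection; the run-based opening and all names are ours):

```python
def keep_long_runs(mask, k):
    """1-D binary opening with a segment of k samples: keep only
    runs of consecutive True values of length >= k."""
    out = [False] * len(mask)
    i = 0
    while i < len(mask):
        if mask[i]:
            j = i
            while j < len(mask) and mask[j]:
                j += 1
            if j - i >= k:
                for t in range(i, j):
                    out[t] = True
            i = j
        else:
            i += 1
    return out

def interest_points(profile, r, sigma_l, sigma_h, l_regul):
    """Sketch of the point-of-interest detection on one profile.
    `profile` holds heights sampled every r meters."""
    diffs = [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]
    # slope |d|/r above sigma_l / r, i.e. height step |d| above sigma_l
    steep = [abs(d) >= sigma_l for d in diffs]
    # opening with a structuring segment of length l_regul
    steep = keep_long_runs(steep, max(1, round(l_regul / r)))
    # accumulate each steep run into one gradient; keep it if >= sigma_h
    points, i = [], 0
    while i < len(diffs):
        if steep[i]:
            j = i
            while j < len(diffs) and steep[j]:
                j += 1
            if abs(profile[j] - profile[i]) >= sigma_h:
                points.append(i)  # index where the height jump starts
            i = j
        else:
            i += 1
    return points

# A flat ground profile with one 3 m-high building facade:
profile = [0.0] * 10 + [3.0] * 10
print(interest_points(profile, r=0.5, sigma_l=0.5, sigma_h=2.0, l_regul=0.5))
# → [9]
```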
2) Profile simplification and point detection: For a given rectangle u, we use three parameters that define a mask of points: a resolution e standing for the distance between two successive profiles, a resolution r corresponding to the sampling resolution of the elevation on a profile, and a parameter Lext corresponding to the length to be explored in the neighborhood of the rectangle. Fig. 4(b) shows the mask of points used to compute the profiles on the DEM (e and r are the two parameters of the sub-sampling). To detect points of interest, we apply the following procedure to each profile. First, slopes higher than a threshold σl/r are accumulated using a regularization parameter lregul. Then an opening is performed using the same structuring element (a segment of length lregul). Finally, gradients larger than a threshold σh are selected, which draws up, for each profile, a set of points of interest. This procedure can be implemented efficiently: only two passes over each profile are necessary to extract the points of interest. Fig. 6 presents the results of these different steps. In practice, the profiles considered are smaller than the one presented in Fig. 6. The filter relies on 4 parameters: r, σl, σh and lregul, all defined in meters since they all have a physical meaning. In practice, they appeared to be quite robust. They are easy to learn if a training set is provided; importantly, they are also easy to re-learn when changing the kind of data used (for instance when switching from an optical DEM to a Laser one). 3) Point selection and potential computation: The algorithm previously described gives, for each profile of a rectangle u, a set of detected points of interest. Fig. 7 presents a rectangle hypothesis on a true DEM. Fig. 7(a) shows the detected gradients after the profile simplification procedure. After this step, for each profile, the two closest gradients from the two
Fig. 6. Steps of the profile simplification algorithm, from top to bottom: a) true profile and sub-sampled profile, b) detected slopes (≥ σl), c) detected gradients after the slope accumulation and opening steps, d) true profile and simplified profile. Example on real data (Amiens, © IGN), with r = 0.50 m, σh = 2 m, lregul = 4 m, σl/r = 1.
rectangle sides are selected. If only one gradient (or none) has been found, a fictive one (or two) is considered at distance Lext. 4) Hit length: The hit length corresponds to the number of gradients close enough to the corresponding rectangle side, multiplied by the resolution parameter e. Fig. 7(d) illustrates the meaning of the hit length Lg(u) on an example: multiplying the number of boxed gradients by e (the length of a box) gives the hit length, i.e. the length of detected discontinuities on the DEM. Lg(u) is computed by means
of a sensitivity parameter δr (the width of a box). Since there are two sides, we actually compute two values that can be ordered: Lmin ≤ Lg1(u) ≤ Lg2(u) ≤ Lmax. 5) Other useful values: A volume rate v̄(u) is also computed. This rate corresponds to the total length of the grey segments shown in Fig. 7(b), over the number of profiles multiplied by the width. Maximizing this volume rate makes a rectangle evolve until it is well localized with respect to the detected structure. However, the expression of this volume rate recalls a translation potential (up to the sign). So, in order to measure the distance between a hypothesis and the interesting structure with respect to a rotation, we have introduced the moment rate m̄(u), which consists of the average of the squared lengths presented in Fig. 7(c).

B. Data term design

1) Attractive objects: We recall that the two basic functions Lg1(.) and Lg2(.) stand for the lengths of detected gradients along the sides of a rectangle. We first focus on what an attractive object is. Let γ0 be the set of attractive objects. We define an attractive object as an object with enough gradients detected along its sides. Let th1 ≤ th2 be two thresholds, both living in ]0, 1[, and L(u) and l(u) the length and width of the rectangle u; γ0 is then the following set:

γ0 = { u ∈ S s.t. Lg1(u) ≥ th1 · L(u) and Lg2(u) ≥ th2 · L(u) }    (5)
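The membership test of equation (5) is direct to express in code (a sketch; the threshold values in the example are illustrative, not the paper's):

```python
def is_attractive(Lg1, Lg2, L, th1, th2):
    """Membership test for gamma_0, cf. equation (5): both ordered hit
    lengths Lg1 <= Lg2 must cover a large enough fraction of the
    rectangle length L."""
    return Lg1 >= th1 * L and Lg2 >= th2 * L

# A 20 m-long rectangle with 12 m and 18 m of detected discontinuities
# along its two sides, with hypothetical thresholds th1, th2:
print(is_attractive(12.0, 18.0, 20.0, th1=0.5, th2=0.8))  # → True
print(is_attractive(8.0, 18.0, 20.0, th1=0.5, th2=0.8))   # → False
```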
2) Cost function: We associate to this set the following energy function J0 : S → [−1, 0]:

J0(u) = −[ (1/4) (Lg1(u) + Lg2(u))/Lmax + (1/2) l(u)/lmax ]  if u ∈ γ0,  0 otherwise    (6)

Thus, an object is attractive if and only if the detected hit lengths are large enough. Basically, we define Ud(u) ≈ J0(u). Minimizing the cost function maximizes the length of detected gradients and the perimeter of the rectangle. Over the total configuration, since Ud(x) = Σ_{u∈x} Ud(u), the reward function evolves linearly with the total detected gradient length. However, the ratio between |γ0| and |S| is really small. We thus propose to add other levels to the data term, allowing us to order objects that are not attractive, to ease the optimization.

3) Partition of S: We introduce two thresholds vmin and mmax, and the following three sets:

γ1 = { u ∈ S : u ∉ γ0, v̄(u) ≥ vmin and m̄(u) ≤ mmax }    (7)
γ2 = { u ∈ S : u ∉ γ1 ∪ γ0, v̄(u) ≥ vmin or m̄(u) ≤ mmax }    (8)
γ3 = { u ∈ S : u ∉ γ2 ∪ γ1 ∪ γ0 }    (9)

The four sets γi form a partition of S. Table I presents the relative sizes of these sets in a practical case. Our algorithm is based on the random generation of rectangle hypotheses through a uniform distribution over S. Nevertheless, the attractive rectangles are those living in γ0, and the probability of hitting this set is almost insignificant. We will see later that we propose
TABLE I
DECOMPOSITION OF S INTO THE 4 SETS γi. SIZES WERE OBTAINED BY AN MCMC ESTIMATION.

i    |γi|/|S|
0    0.015%
1    0.177%
2    2.950%
3    96.858%
to re-equilibrate the weights of these sets in order to improve the behavior of the algorithm. 4) Final data term: To each of these sets γi, we associate an energy function Ji(.) such that Ji(u) ∈ ]0, 1] if u ∈ γi and Ji(u) = 0 otherwise. Details of these functions are given in [28]. We define the data term as:

Ud(x) = Σ_{u∈x} [ J0(u) + 0.001(1γ1(u) + J1(u)) + 0.01(1γ2(u) + J2(u)) + 0.1(1γ3(u) + J3(u)) ]
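This data term can be sketched as follows (assumption: a `level` function classifying each object into its γi and cost functions `J[i]` are available; both are hypothetical stand-ins for the functions detailed in [28]):

```python
def data_term(x, level, J):
    """Sketch of the final data term U_d over a configuration x.
    `level(u)` returns the index i of the set gamma_i the object
    belongs to; J[i](u) lies in ]0, 1] for i >= 1, and J[0](u) in
    [-1, 0] rewards attractive objects."""
    weights = {1: 0.001, 2: 0.01, 3: 0.1}
    total = 0.0
    for u in x:
        i = level(u)
        if i == 0:
            total += J[0](u)
        else:
            # the indicator 1_{gamma_i}(u) equals 1 here by construction
            total += weights[i] * (1.0 + J[i](u))
    return total

# Hypothetical labels and costs: one attractive object, one in gamma_3.
level = lambda u: u["g"]
J = {0: lambda u: -0.5, 1: lambda u: 0.5,
     2: lambda u: 0.5, 3: lambda u: 0.5}
x = [{"g": 0}, {"g": 3}]
print(data_term(x, level, J))  # -0.5 + 0.1 * 1.5 = -0.35
```

Note that the `1 + J_i(u)` structure reproduces the ordering stated below: objects in γ3 cost more than 0.1, those in γ2 more than 0.01, and so on.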
The important points to be highlighted are the following:
• an object is attractive (Ud(u) ≤ 0) if and only if it belongs to γ0,
• if the object is not attractive, it is at least slightly repulsive,
• the data energy of an object decreases with the quality of this object.
This last point implies, among other properties, that Ud(u3) ≥ 0.1 > Ud(u2) ≥ 0.01 > Ud(u1) ≥ 0.001 ≥ 0 > Ud(u0) ≥ −1 if u3, . . . , u0 belong to γ3, . . . , γ0 respectively. When performing simulated annealing, repulsive objects (i.e. those belonging to γ1 ∪ γ2 ∪ γ3) are progressively removed.

IV. INTERNAL FIELD

The prior model can be seen as a regularizing term. In our framework, it is essentially composed of interactions between objects, which reduces to the definition of second order energy terms U(u, v), where u and v are different objects. Our goal with this regularizing term is to favor some interactions and to penalize some others.

A. A general model for interactions

1) Definitions: For a given symmetric relation ∼ on S, let us define by R(x) the set of interacting pairs of x:

R(x) = {{u, v} : u ∈ x, v ∈ x, u ≠ v, u ∼ v}
We also define the neighborhood N(x, u) of a point u ∈ x as the set of points of x in relation with u:

N(u, x) = {v ∈ x : u ∼ v}
First we consider the following mapping V:

V(x, u) = 1(N(u, x) ≠ ∅)

V(x, u) is null only if u has no neighbor in x. This function will be included in the model in order to favor or penalize the presence of an interaction. We might also be interested in qualitatively favoring or penalizing some interactions with respect to some others. Let us suppose that a mapping Ψ(., .) from S × S to [−1, 1] is defined on interacting objects. This function should quantify the quality of a pair of interacting points (an example is given later in this paper):

Ψ(u, v) = Ψ(v, u) ∈ [−1, 1] if u ∼ v, and Ψ(u, v) = 0 otherwise    (10)

A given object might be involved in several interactions. The first idea is to sum the Ψ functions over all such interactions. However, this solution leads to a combinatorial phenomenon, since the number of second order interactions evolves, in the worst case, with the square of the number of points (see [22] or [26] for a proof). We thus propose to compute the following maximum W(x, u):

W(x, u) = max_{v∈N(x,u)} Ψ(u, v) if N(x, u) ≠ ∅, and W(x, u) = 0 otherwise

W(x, u) is the reward of the best interaction among those involving u. Note that for a repulsive interaction it might be better to compute the worst one (minimizing Ψ).

2) Local energies: It is now possible to define an energy that is the sum of local energies describing the state of each object u ∈ x with respect to the relation ∼. Let us define the local energy of an object u ∈ x as:

U∼loc(x, u) = −(a V(x, u) + b W(x, u))    (11)

where a and b are two real parameters, and write the total energy of the configuration x:

U∼(x) = Σ_{u∈x} U∼loc(x, u)

3) Generalization: When using several interactions ∼1, . . . , ∼k, the above described model becomes:

Uint(x) = Σ_{i=1}^{k} U^i(x)    (12)
        = Σ_{u∈x} Σ_{i=1}^{k} U^i_loc(x, u)    (13)
        = −Σ_{u∈x} Σ_{i=1}^{k} [ ai V^i(x, u) + bi W^i(x, u) ]    (14)

Expressed this way, it is easy to verify that the global energy of the configuration evolves linearly with the number of points in the configuration. Thus combinatorial problems disappear and the balance of the different terms is eased. The last important advantage of this interaction model is its scale invariance: the weights ai, bi depend neither on the size of the considered area nor on the number of objects in the scene to be detected.

Fig. 7. A DEM, a rectangle hypothesis and the different values computed to measure the relevance of this rectangle hypothesis: (a) Detection: gradients detected by the profile simplification procedure. (b) Volume rate: grey lines represent the segments used to compute the volume rate. (c) Moment rate: grey lines represent the segments used to compute the moment rate. (d) Localization: a gradient is boxed if it is close enough to the corresponding rectangle side.

B. Alignment interactions

In a town, buildings are usually aligned. Hence, we design an interaction that favors such alignments. Since our data term detects discontinuities that can be seen as walls, we take into account the two kinds of alignments described by Fig. 8. Fig. 8(a) presents an example of alignment, while Fig. 8(b) shows the values used to define an alignment. Denoting by dC1(u, v) the distance between appropriate corners and by dt(u, v) the angle difference between the two rectangles (modulo π), we define the first interaction by the three following conditions:

u ∼al1 v  ⇐⇒  dC1(u, v) ≤ dCmax,  dt(u, v) ≤ dtmax,  (u, v) ∈ γ0²    (15)

We only detect alignments between two attractive objects (belonging to γ0). Since a rectangle has four corners, we define an alignment relation for each of them. These four
Fig. 8. Alignment interactions: (a) example of a single alignment, (b) values needed to define an alignment (corner distance dC and angle difference dt between corners C1, . . . , C4), (c) example of a rectangle interacting with three other ones with respect to the 4 possible alignment relations.

Fig. 9. Illustrations of the completion relations. On the left, a rectangle interacts with another one with respect to the four possible completion relations; on the right, a rectangle interacts with two others with respect to two completion relations.
relations are denoted by ∼1al to ∼4al . An object can be aligned with another one by two corners, in which case we consider that two different interactions are acting on them. Fig. 8(c) shows an example of a rectangle that is related to three other ones under the four relations. To define the reward function associated with the alignment relationship, we first introduce the following real valued function $(.): $:
R2
→
Fig. 10.
Illustration of the paving interactions.
The mapping $, with values in [0, 1], is defined by:

$(x, xmax) = ((1 + xmax²)/xmax²) · ( 1/(1 + x²) − 1/(1 + xmax²) ),   |x| ≤ xmax

$ was designed such that $(0, xmax) = 1 and $(xmax, xmax) = 0. The associated reward function evaluated on a pair of related points is:

Ψ(u, v) = ½ $(dt(u, v), dtmax) + ½ $(dC(u, v), dCmax)

This reward function is important: the goal is not only to promote the presence of alignments, but also to favor alignments of good quality.

C. Completion interactions
Simple buildings are usually made of four sides. Since our data term detects discontinuities along only two of the four sides of a rectangle, it is useful to add a completion relationship favoring the detection of discontinuities with respect to two orthogonal directions. We use the same type of conditions as in equation (15): related objects should both belong to γ0, the distance between suitable corners should be less than dCmax, and the difference between angles should be close to π/2 (with a dtmax tolerance). Once again, we actually define four different relations ∼5comp to ∼8comp, each of them being related to one of the corners. Fig. 9 presents examples of related rectangles.

D. Paving interactions
The last kind of relation favors parallel rectangles which are located side by side, as illustrated in Fig. 10. This interaction is essentially introduced in order to favor clean arrangements of buildings. The definition of the relation is similar to those previously defined, and the reward function Ψ is also defined in a similar way. This leads to interactions ∼9pav to ∼12pav.

E. Exclusion interaction
An exclusion term that avoids redundant objects is needed for the following reasons: we need to avoid redundant explanations of the data, and we need to ensure that the attractive interactions will not make the set of particles collapse into an infinite accumulation of points. Furthermore, a condition used in the proof of convergence of the algorithm requires the variation of the energy induced by adding a point to a given configuration to be bounded.
1) Definition: The simplest exclusion interaction we can use is the following intersection relation, which acts only on parallel rectangles:

u ∼excl v  iff  dt(u, v) ≤ dtmaxexcl  and  Surf(u) ∩ Surf(v) ≠ ∅   (16)

This exclusion interaction has the following property with respect to the attractive interactions defined previously: there exists a finite number Nover such that if an object u has Nover neighbors (with respect to one of the attractive relations ∼i), it is impossible to add a new neighbor that does not intersect one of the former ones. This property is easily verified, since the number of non-overlapping rectangles bigger than lmin × Lmin is limited, because the total surface of interest (the image) is finite.
2) Associated energy: We use a simple model, homogeneous to the general interaction model presented in section IV-A:

Uexcl(x) = −aexcl Σ_{u∈x} Vexcl(x, u),   aexcl < 0
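As a concrete illustration, the quality function $ and the pairwise reward Ψ described above can be sketched as follows (function names are ours; the closed form used for $ is the one satisfying $(0, xmax) = 1 and $(xmax, xmax) = 0):

```python
def quality(x: float, x_max: float) -> float:
    # Quality function $ of the alignment reward: equals 1 at x = 0,
    # decreases smoothly, and vanishes at |x| = x_max (0 beyond).
    if abs(x) > x_max:
        return 0.0
    scale = (1.0 + x_max ** 2) / x_max ** 2
    return scale * (1.0 / (1.0 + x ** 2) - 1.0 / (1.0 + x_max ** 2))

def reward(dt: float, dc: float, dt_max: float, dc_max: float) -> float:
    # Psi(u, v): average of the angular quality and the corner-distance
    # quality of a related pair (dt: angle difference, dc: corner distance).
    return 0.5 * quality(dt, dt_max) + 0.5 * quality(dc, dc_max)
```

A perfectly aligned pair (dt = dc = 0) receives the maximal reward 1, and the reward decays to 0 as either distance reaches its tolerance.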
Mathias Ortner, No. 07038

Fig. 11. Illustration of the influence of the internal field: (a) at medium temperature; (b) at low temperature.
aexcl is taken small enough that it is impossible to have redundant objects; we detail later how it is tuned. It is possible to choose more refined terms that avoid penalizing intersecting rectangles too strongly, but this would cost computational time.

F. Visual results
We present in Fig. 11 two results. The first one (Fig. 11(a)) is a realization of a point process whose density is defined by (Uint(.) + Uexcl(.))/T at a medium temperature. The second one (Fig. 11(b)) is a configuration minimizing Uint(.) + Uexcl(.), obtained by simulated annealing. These results show how the internal field constrains object positioning.

V. OPTIMIZATION ALGORITHM
In this section, we present the algorithm used to optimize the energy defined in equation (4). A general presentation and proofs of convergence were given in [26].

A. Generalities
We have defined a point process X by its energy U(.). Through the Gibbs relation, this energy leads to a density h known up to a normalizing constant. This density and the distribution µ(.) of the reference Poisson point process define the distribution π(.) of X. To sample from π(.) (i.e. to obtain configurations of rectangles distributed according to π(.)), a solution is to use a Markov chain Monte Carlo (MCMC) method. Such a procedure builds a Markov chain (Xt)t≥0 on C, the space of finite configurations of rectangles, using a starting point (in our case, the empty configuration X0 = {∅}) and a Markovian transition kernel K(x, .) giving the distribution of Xt+1 | Xt = x. Of course, K(., .) is designed to make the Markov chain converge ergodically to the desired distribution, meaning that the final elements of any trajectory are distributed according to π(.):

‖K^n({∅}, .) − π(.)‖_TV → 0

The Markov chain generated by the following algorithm satisfies this property.
We actually have stronger results: we know that we can start from any configuration (Harris recurrence) and that the total variation (TV) distance tends to zero geometrically (geometric ergodicity), as proved in [26].
Article submitted to the journal IJCV.
1) MCMC procedure for point processes: The sampler we use is based on the work of Geyer and Møller (see [21]). It also uses the work of Green on Metropolis–Hastings samplers in general state spaces (see [23]). This technique uses proposition kernels Qm(x, .), which generate a new configuration y from an old configuration x, and an acceptance rate αm(x, y), which is the probability of accepting y in place of x. Denote by x = {x1, . . . , xn(x)} a configuration of points. We consider three kinds of proposition kernels.
a) Birth or death: This kind of perturbation first chooses, with probability pb, whether a point should be added to the current configuration or, with probability pd = 1 − pb, whether a point should be removed from it. If death is chosen, the kernel chooses one point u in x at random and proposes y = x \ u, while if birth is chosen, it generates a new point u according to the uniform measure |.|/|S| and proposes y = x ∪ u.
b) Non jumping transformations: Such transformations randomly select one point u in the current configuration x, perturb it to obtain a new point v, and propose y = x \ u ∪ v. We present all the perturbations we have used later on.
c) Birth or death in a neighborhood: We introduced this kind of transformation in [26]. The idea is to propose the removal or addition of interacting pairs of points with respect to one of the previously defined relations of alignment, completion or paving.
To each of these proposition kernels, a mapping Rm(., .) from C × C to (0, ∞) is associated. For a given selected perturbation and two configurations x and y, we call Rm(x, y) Green's ratio.
Algorithm : For a given state Xt = x : 1. choose one of the previously described proposition kernels Qm (., .) with probability pm (x), 2. sample y according to the chosen kernel : y ∼ Qm (x, .), 3. compute the associated Green’s ratio Rm (x, y) and the acceptance rate αm (x, y) = min(Rm (x, y), 1), 4. accept the proposition Xt+1 = y with probability αm (x, y), and reject it otherwise.
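The four steps above can be sketched as follows (a minimal illustration, not the paper's implementation; each kernel is assumed to return its candidate together with its own Green's ratio, in which the density ratio is already folded):

```python
import random

def mh_green_step(x, kernels):
    """One Metropolis-Hastings-Green transition (a sketch).

    `kernels` is a list of (p_m, propose_m) pairs: p_m is the mixture
    probability of the kernel, and propose_m(x) returns a candidate
    configuration y together with Green's ratio R_m(x, y)."""
    r, cumulative = random.random(), 0.0
    for p_m, propose_m in kernels:          # 1. pick a proposition kernel
        cumulative += p_m
        if r < cumulative:
            y, green_ratio = propose_m(x)   # 2.-3. candidate + Green's ratio
            if random.random() < min(green_ratio, 1.0):
                return y                    # 4. accept the proposition...
            return x                        # ...or reject it
    return x  # residual probability: propose nothing
```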
2) Reversibility and Green's ratio: The Markov chain we are designing should converge to the desired distribution. The algorithm actually builds a π(.)-reversible Markov chain (a standard property states that if a Markov chain is π(.)-reversible and converges, then it converges to π(.)). The π(.)-reversibility comes from the expression of Green's ratio Rm(x, y). Assume that the proposition kernel Q(x, .) can be written as a mixture of distributions:

Q(x, .) = Σ_m pm(x) Qm(x, .)
Then, if y is obtained from x using Qm(x, .) (i.e. y ∼ Qm(x, .)), the expression of Green's ratio can be formally denoted as:

Rm(x, y) = π(dy) Qm(y, dx) / π(dx) Qm(x, dy)
where the division symbol corresponds to a Radon–Nikodym derivative. Such a notation supposes that if y ∼ Qm(x, .), then x can be obtained from y through Qm(y, .). For instance, a translation perturbation that can only translate a point to its left is not suitable, since it does not allow the reverse move.
3) Simulated annealing: Our goal is to find a minimizer of the energy U(.). To do so, we use simulated annealing: instead of simulating according to h(.), we simulate according to h^{1/Tt}(.), where Tt is a temperature parameter which tends to zero as t tends to ∞. These techniques have been widely used in image processing (see [29] for instance). If Tt decreases at a logarithmic rate, then Xt tends to one of the global maximizers of h(.). In practice, a logarithmic decrease is too slow, and we actually use a geometric one. This makes the quality of the proposition kernels an important issue: a poorly correlated trajectory is much better, since it ensures that the space has been sufficiently explored. In what follows, we aim at building a trajectory that is not overly correlated.

B. Non jumping transformations
Non jumping transformations are transformations that: first, select randomly a point u in the current configuration; second, perturb this point to obtain a new version v; and third, propose replacing u by v: y = x \ u ∪ v.
1) Translations, rotations, dilations: We have implemented the transformations shown in Figure 12. Each of these transformations uses a parameter z that is randomly chosen in some set Σ. For instance, as shown in Figure 13(a), a rotation uses z ∈ Σ = [−∆ϕ, ∆ϕ] to generate the new angle of the selected object. Under the following conditions:
• u is chosen uniformly in x,
• the distribution of Z giving z is symmetric over the symmetric set Σ = [−∆ϕ, ∆ϕ],
the suitable ratio R(x, y) is given by:

R(x, y) = h(y) / h(x)
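The simulated annealing scheme with geometric cooling described above can be sketched as follows (a toy illustration with a single symmetric proposal instead of the paper's full kernel mixture; with a symmetric proposal the acceptance ratio at temperature T_t reduces to exp(−(U(y) − U(x))/T_t)):

```python
import math
import random

def annealed_minimize(x0, propose, energy, t0=10.0, alpha=0.999, n_iter=10000):
    """Simulated annealing with geometric cooling (a sketch)."""
    x, t = x0, t0
    for _ in range(n_iter):
        y = propose(x)
        delta = energy(y) - energy(x)
        # accept downhill moves always, uphill moves with prob exp(-delta/T)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = y
        t *= alpha  # geometric decrease (the log rate is impractically slow)
    return x

# toy usage: minimize U(x) = x^2 over the integers with +/-1 moves
random.seed(0)
x_min = annealed_minimize(40, lambda x: x + random.choice((-1, 1)),
                          lambda x: float(x * x))
```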
2) Pre-explorative transformations: We also have implemented some Gibbs-like versions of such transformations. An example is provided in Fig. 13(b). Instead of using a uniformly generated z, it is possible to pre-compute a distribution depending on the target distribution h(.). The random variable Z which is used to generate the new object v is not symmetrically distributed. Its distribution follows an estimated approximation of π(x \ u ∪ v(z)). The advantage of such transformations is that they fit the target distribution by exploring it. As detailed in [26], Green’s ratio for this kind of transformation is close to 1: R(x, y) ≈ 1. The exact expression for R(x, y) is derived in [26]. C. Birth or death in neighborhoods We have defined some configurations that are favored by the prior model. It proves useful to improve the exploration ability of the algorithm in parts of the space where such alignments occur. For a given relevant rectangle (i.e. living in γ0 ) we propose to add a new rectangle in its neighborhood.
Fig. 12. Simple non jumping transformations: (a) rotation; (b) translation; (c) dilation 1 (length); (d) dilation 2 (width).
Fig. 13. Improvement of the rotation scheme: (a) rotation; (b) pre-explorative rotation.
1) Birth or death of an aligned point: The corresponding kernel proposes either to create an alignment or to remove one, choosing between the two perturbations with probabilities pb and pd = 1 − pb respectively:
a) birth: it randomly selects a point u of x ∩ γ0, generates a new point v aligned with u in the sense defined by ∼1, ∼2, ∼3 or ∼4, and proposes y = x ∪ v.
b) death: it selects a pair of aligned points such that at least one of them is in γ0, chooses an object v in this pair with probability 0.5, and proposes to remove v: y = x \ v.
Reversibility is ensured by taking into account two phenomena: generating one alignment can create several others, and a building created in the neighborhood of another one can be repulsive. The expression for the Green's ratio associated with this kind of transformation is detailed in Appendix II.
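The generation step of such a neighborhood birth (detailed in Appendix II) can be sketched as follows; the rectangle encoding is ours, with (cx, cy) standing for the corner of interest C(u) in this simplified form:

```python
import random

def propose_aligned_birth(u, d_max, dt_max, l_range, L_range):
    """Sketch of the generation step of a 'birth in a neighborhood' kernel.

    The candidate's corner of interest is drawn uniformly in the disk of
    radius d_max around that of u (by rejection from the enclosing square,
    since a naive polar parametrization would be biased), its angle within
    dt_max of u's angle, and its sides uniformly in the mark space."""
    cx, cy, theta, _, _ = u
    while True:  # rejection sampling: uniform in the square, keep disk hits
        dx = random.uniform(-d_max, d_max)
        dy = random.uniform(-d_max, d_max)
        if dx * dx + dy * dy <= d_max * d_max:
            break
    d_theta = random.uniform(-dt_max, dt_max)
    length = random.uniform(*L_range)
    width = random.uniform(*l_range)
    return (cx + dx, cy + dy, theta + d_theta, length, width)
```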
2) Other birth or death in a neighborhood: This type of updating method has been implemented for each of the 12 cases corresponding to the relations of alignment, paving interaction and completion.
D. Reference measure
1) Comments: For computational convenience, the reference intensity measure ν(.) is usually taken uniform. This intensity measure defines the reference Poisson point process distribution µ(.). Take for instance:

ν(.) = |.|_K × |.|_M / |M|   (17)

It describes a homogeneous Poisson point process that puts on average |K| objects in S. The configuration maximizing h(x) does not depend on the chosen reference measure. The advantage of using a simple intensity measure is that it makes the birth of a point easier: using (17), the birth step reduces to a uniform generation over S. However, in our setup, the points of interest are those living in γ0, which is of small Lebesgue measure, as already mentioned in table I. To improve the exploration of γ0, a solution is to use another reference measure that favors this set. It is possible to do so and still keep uniform generation: it is sufficient to add a "measure term" to the energy:

h(.) ∝ exp( −(Uint(x) + Uexcl(x) + ρUext(x))/T + Σ_{u∈x} log(β(u)) )   (18)

This is equivalent to changing the reference intensity measure, except that it permits retaining uniform generation. We denote by ν'(.) the corresponding reference measure, given by equation (1).
2) Utility of the partition: As detailed in section III-B.3, we have defined a partition of S as S = γ0 ∪ · · · ∪ γ3, where the sizes of the γi in one example are given by table I. We propose to use the following function β:

β(u) = Π_{i=0}^{3} βi^{1γi(u)}   (19)

which gives the following property:

Ei = E[Nγi(X)] = ν'(γi) = βi ν(γi) = βi |K| |γi| / |S|   (20)

We can tune the weights such that the Ei are of the same order, making all the γi approximately equivalent with respect to the exploration ability of the Markov chain. Another advantage of this parameterization is that it is independent of the observed area K, since ν'(.) is proportional to the surface measure (Lebesgue measure in R²). Using an MCMC run, we computed the values |γi|/|S| given in table I, and then set the βi to values such that:

Ei ≈ |K|

where |K| corresponds to the area, in square meters, of the urban zone under consideration.

E. Convergence of the algorithm
Let us verify that the conditions derived in [26] ensuring the convergence of the algorithm hold.
1) Stability conditions on the density: A real bound Rg is needed such that:

∀(x, u) ∈ C × S,   h(x ∪ u) ≤ Rg h(x)   (21)

This condition states that adding a point to a configuration should not decrease its energy too much. This is verified if the following condition on the prior model holds, as proved in Appendix III:
aexcl < − Σ_{i=1}^{12} (ai + bi)   (22)
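Condition (22) is a simple arithmetic check on the model weights; it can be sketched as follows (the weights below are hypothetical round values for illustration, not the paper's exact table):

```python
def stability_margin(weight_pairs, a_excl):
    """Numerical check of the sufficient stability condition (22):
    a_excl < -sum(a_i + b_i) over the attractive relations.

    `weight_pairs` lists the (a_i, b_i) pairs; returns (holds, bound)."""
    bound = -sum(a + b for a, b in weight_pairs)
    return a_excl < bound, bound

# hypothetical weights: 8 relations at (0.04, 0.1), 4 at (0.0, 0.5)
pairs = [(0.04, 0.1)] * 8 + [(0.0, 0.5)] * 4
holds, bound = stability_margin(pairs, a_excl=-4.0)
```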
2) Stability conditions on the sampler: The next three sets of conditions concern the sampler:
a) for each perturbation kernel, the probability pm(x) of choosing one kind of transformation does not depend on the state Xt = x, and the probability of proposing to do nothing is strictly positive;
b) some bounds are needed on the distributions used by the birth or death proposition kernels (these distributions are detailed in Appendix II);
c) for each relation used to build a birth or death in a neighborhood kernel, the following condition holds:

∃Ri such that ∀u ∈ S,   |{v ∈ S : v ∼i u}|_S > Ri
In this framework, all these conditions are valid.
3) Convergence property: Since these conditions hold, the sampler converges at a fixed temperature: we obtain the ergodic convergence and Harris recurrence of the Markov chain. Given this convergence of the sampler, decreasing the temperature at a logarithmic rate in the simulated annealing yields a realization x that belongs to the set of global maxima of the density h(.) (see [24] for a proof, using Dobrushin's conditions).

VI. RESULTS
We show here some practical results obtained using the proposed method on optical and Laser data. The starting configuration is the empty set. We first focus on the parameters used, and then present results on different DEMs.

A. Parameters of the Model
Parameters were tuned by trial and error. As already mentioned, the parameters βi describing the true underlying reference measure were computed using an MCMC procedure. Table II presents all the parameters that do not depend on the kind of data provided, its size, or the subsampling resolution used.
1) Definition of the internal field: The relations of alignment, paving and completion require two physical parameters: the maximum accepted angle difference dtmax, and the maximum accepted distance between the two considered corners of the two rectangles.
Fig. 14. Result on the first zone of interest (250 × 250 m²), using a LASER DEM (© IGN): (a) Laser DEM (50cm); (b) ground truth (20cm); (c) estimated land register (20cm); (d) classification errors (black: missed, white: overdetected).
TABLE II
PARAMETERS OF THE MODEL
Space of marks: lmin = 4m, lmax = 40m, Lmin = 6m, Lmax = 40m
Attractive relations: dtmax = 20°, dCmax = 4m
Intersection: dtmax = 60°
Minimum length of detected gradient: th1 = 80%, th2 = 40%
Definition of γ1, . . . , γ3: vmin = 90%, mmax = 0.02
Alignments: a_al = 0.04, b_al = 0.1
Completion relations: a_comp = 0.04, b_comp = 0.1
Paving relations: a_pav = 0, b_pav = 0.5
Exclusion relation: a_excl = −3
2) Definition of γi sets for the data term: The data term relies on some thresholds. Of first importance are the levels defining what is γ0 , i.e. th1 and th2 . 3) Model parameters of the internal field: The model parameters ai , bi and aexcl were tuned and chosen such that
TABLE III
PARAMETERS OF THE SAMPLER
Birth or death: pBD = 1/20
Birth or death w.r.t. ∼al: pBDNal = 5/20
Birth or death w.r.t. ∼pav: pBDNpav = 1/20
Birth or death w.r.t. ∼comp: pBDNcomp = 1/20
Non jumping transformations: pT = 3/20, pR = 3/20, pDL = 3/20, pDl = 3/20
the condition (22) holds. The prior model proved to be robust. Only the relative weight between the internal field and the external one needs to be re-tuned, depending on the desired regularization and the noise level in the data. Here we took ρ = 1. B. Parameters of the sampler There are two main types of parameter : the mixture parameters (pm ) and the size of the spaces used to generate new components. For birth or death perturbation kernels, the
Fig. 15. Second result: optical DEM of zone 1 (250 × 250 m², © IGN): (a) optical DEM (20cm); (b) ground truth (20cm); (c) estimated land register (20cm); (d) classification errors (20cm).
TABLE IV
LOW LEVEL FILTER PARAMETERS USED FOR LASER AND OPTICAL DATA
Grid of points: r = 0.5m, e = 1m, lext = 10m
Filter     | Laser  | Optical 1 (Amiens) | Optical 2 (Rennes)
lregul     | 2m     | 4m                 | 4m
σl         | 0.7·r  | 1·r                | 1·r
σh         | 1.5m   | 2m                 | 3m
δr         | 2m     | 2.5m               | 2.5m
TABLE V
MISCLASSIFICATION RATES FOR THE FIRST THREE PRESENTED RESULTS
Result | Missed area (black) | False alarm (white)
1      | 5%                  | 11%
2      | 8.9%                | 10.6%
3      | 11.4%               | 5.9%
probabilities were taken to be equal (pd = pb = 0.5). Table III presents the mixture parameters. C. Data and low level parameters 1) LASER Data: We first present an example on LASER data. The main problem encountered here is a problem of smoothness. Such data are built using an aircraft and several flights
over the area. Measured points are not equidistant: some areas are precisely covered while others are not, and the reconstruction is obtained by interpolation between measured points. This makes the profiles much smoother than those provided by a correlation algorithm, and thus the low level filter parameters are tuned to deal with smooth discontinuities, as described in table IV. The DEM has a planimetric resolution of 0.5m and an altimetric resolution of 0.1m.
2) Optical Data (1 and 2): We consider two DEMs, both provided by the French National Geographic Institute. The first one corresponds to the French town of Amiens. Its planimetric resolution is 20cm, while the altimetric one is around 10cm. It is a high quality DEM, and our objective with this data is to detect low buildings, a task on which simpler methods fail. We present results on different parts of this DEM (second and third results). Ground truth is available for some subparts of this data. The second optical DEM is of the French town of Rennes. This DEM is a lot noisier, and our objective is to test whether the proposed approach still works. However, ground truth is not available for this data.
3) Low level filter: All parameters were kept the same for all the results except those related to the low level filter. For LASER data, the filter was tuned in order to deal with
Fig. 16. Third zone of interest, in Amiens (250 × 250 m², © IGN): (a) aerial photography (25cm); (b) ground truth (20cm); (c) estimated land register (20cm); (d) classification errors (20cm).
smooth discontinuities, and thus, for instance σl is lower than for the other data. For optical data 1, the goal is to detect small discontinuities, and σh was set to a low value. In contrast, for optical data 2, this parameter was taken to be larger, since this avoids the detection of discontinuities due to noise. These parameters are summarized in table IV.
D. Validation of the method
We encountered difficult problems when trying to validate our method. The ground truth actually consists of a very precise DEM built semi-automatically by IGN using precise aerial optical data. The DEMs we use do not contain enough information to detect artifacts as precisely. Moreover, some buildings are missing in the ground truth. Finally, and perhaps most importantly, it is hard to compare two vectorial representations of the same urban area.
1) Hausdorff Distance: For a given result (i.e. an estimated configuration of rectangles x), it is possible to compute the average surface of missed buildings and the average surface of over-detected buildings with respect to the ground truth. This corresponds to the two parts of the Hausdorff distance. We compare the ground truth, which is at 20cm resolution, to the automatically obtained land register by projecting the vectorial representation extracted from the DEM. Once this image has been correctly shifted (registration step), it is possible to compute the misclassified surface. This gives three classes: pixels that were well classified (in grey in the resulting images), missed pixels (in black), and over-detected pixels (in white).
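The misclassification measure just described can be sketched as follows, assuming both representations have been rasterized onto the same registered grid (function and variable names are ours, and the normalization shown here, by the ground-truth building area, may differ from the paper's):

```python
import numpy as np

def misclassification_rates(truth, estimate):
    """`truth` and `estimate` are boolean building masks on the same
    registered grid.  Returns (missed, false_alarm) rates as fractions
    of the ground-truth building area."""
    missed = truth & ~estimate          # black pixels in the error maps
    false_alarm = ~truth & estimate     # white pixels in the error maps
    n_truth = truth.sum()
    return missed.sum() / n_truth, false_alarm.sum() / n_truth
```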
2) Visual appreciation: Another way of estimating the quality of the results is to build a refined DEM from the estimated configuration of rectangles. To achieve this, we introduced the following procedure. For each detected rectangle, we compute two values: an estimate of the mean height inside the rectangle (roof height) and an estimate of the mean height along each side, giving an estimate of the local ground elevation. For each rectangle, we then replace elevation values inside the rectangle by the ground estimate. This gives a crude DSM (Digital Surface Model). We then perform an opening, a closing and a Gaussian smoothing in order to obtain a representation of the surface underlying the town. We finally inject the mean height estimate for each rectangle. This gives a refined DEM where buildings are represented by parallelepipeds.
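A sketch of this reconstruction, under the simplifying assumption that each rectangle is given as a boolean raster mask over the DEM grid (the paper works with vectorial rectangles, and its side-based ground estimate is approximated here by a ring of pixels just outside each rectangle); SciPy's `ndimage` is assumed available:

```python
import numpy as np
from scipy import ndimage

def refine_dem(dem, rect_masks):
    """Build a refined DEM in which each detected building is a
    parallelepiped sitting on a smoothed ground surface (a sketch)."""
    ground = dem.copy()
    roofs = []
    for mask in rect_masks:
        # ring of pixels just outside the rectangle ~ local ground estimate
        ring = ndimage.binary_dilation(mask, iterations=2) & ~mask
        roof_h = dem[mask].mean()
        ground_h = dem[ring].mean()
        ground[mask] = ground_h          # crude DSM: flatten each building
        roofs.append((mask, roof_h))
    # opening + closing + Gaussian smoothing: surface underlying the town
    ground = ndimage.grey_opening(ground, size=5)
    ground = ndimage.grey_closing(ground, size=5)
    ground = ndimage.gaussian_filter(ground, sigma=2.0)
    refined = ground.copy()
    for mask, roof_h in roofs:
        refined[mask] = roof_h           # re-inject the roof heights
    return refined
```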
Fig. 17. Result on a large area (1000 × 750 m²). DEM of Amiens (© IGN): (a) original DEM; (b) reconstruction; (c) photograph of a subpart; (d) corresponding extracted rectangles; (e) crude DEM; (f) reconstruction. The selected part shows a result on an area where trees are of first importance. Disjoint trees are well separated, while overlapping ones disturb the detection.
Fig. 18. Another area of Amiens (250 × 280 m², © IGN): (a) crude DEM; (b) reconstruction. The result shows that the method can take into account small discontinuities.
Fig. 19. Result on a low quality DEM (40cm, 10cm), Rennes (© IGN): (a) original DEM; (b) reconstruction.
E. First result: LASER data
The LASER data are at a resolution of 0.5m per pixel and of 0.01m in height. The area of interest, on which the second result also focuses, is the area presented in Figure 1.
1) Extraction: Figure 14 shows the data, the ground truth, the estimated land register map and the misclassified pixels. Quite frequently, the detected rectangles are bigger than required, due to the low level filter and the smoothness of the data.
2) Misclassification rates and comments: Table V presents the detection errors obtained. The image corresponds to an area which is approximately 200m by 200m in size. Since the resolution of the data is 0.5m, the size of the processed image is about 400 × 400 pixels. The automatic extraction of the land register requires 3 hours (10 million iterations), which is quite a long time, given that the algorithm was implemented in C++ on a Linux machine running at 1 GHz with 1 GB of RAM. However, it is important to notice that the first three results all took approximately the same time, even though the subparts of the optical DEM are around 1000 × 1000 pixels (resolution of 20cm). The size of the data matters for the quality of the results, but what influences the computation time the most is the complexity of the observed scene.

F. Second result: Optical DEM 1
The second result was obtained on a DEM built using optical data and a correlation algorithm. The resolution is 0.2m per pixel, and 0.1m per unit height. Fig. 15 presents the estimated configuration. It is worth noticing that some trees have been classified as buildings. It is nevertheless possible to apply some post-processing to improve this result.

G. Third result: Optical DEM 1
A similar result is presented in Figure 16. The optical DEM is smooth in the backyards of buildings, where discontinuities are difficult to detect. Our method behaves correctly, except in these backyards and over the trees.

H. Fourth result: Optical DEM 1, large area
Figure 17 presents a result obtained on a large area (1000 × 750 m²). This area includes some trees. We focus on a subpart of the result in order to show how the method behaves with respect to trees. When trees are well separated from buildings, they are correctly distinguished by the detector. However, when there is a large overlap between trees and buildings, our method considers them as a single element.
I. Fifth result: Optical DEM 1, small discontinuities
In Figure 18 we present another result. The goal was to show the algorithm's ability to detect small discontinuities.

J. Sixth result: Optical DEM 2, low quality
In Figure 19 we present a result on a much noisier DEM. This result is far from perfect. However, it should be noted that the main lines of the urban area were respected by the method.

VII. CONCLUSION
We have presented an effective algorithm for the difficult problem of extracting buildings from urban areas. The main advantages of the proposed algorithm are the following. It extracts simple rectangular shapes from altimetric data. It is an automatic approach, which means that there is no need for any interaction with an operator and, in particular, no initial conditions are needed. The algorithm is effective even on dense urban areas, and the parametrization is quite robust, since the same model was used for both LASER and optical DEMs. The results obtained may be very useful for more precise 3D reconstruction algorithms such as those proposed in [3]. Nevertheless, there are some drawbacks: the algorithm is quite slow, as it still requires a few hours to process a 1000 by 1000 pixel image. Another issue is the ad hoc tuning of some parameters. We have nonetheless presented an elegant way of mixing low level information and high level knowledge by using point processes. We have also experimentally ascertained that a non Bayesian model can be powerful. Finally, we have introduced a relevant way of using non-homogeneous Poisson point processes. Future work will involve implementing an MCMC-ML procedure for the estimation of parameters. We also plan to perform data fusion and use more data. Another direction involves the use of more complex building models to improve the quality of the results. To achieve this goal, some improvements in computational time may be needed: a hierarchical approach could be faster, by first detecting higher buildings and then refining the description; another possibility would be to use some primitives and introduce a non-homogeneous birth or death kernel that makes a building appear only in relevant positions, making our approach more data-driven, as proposed by Tu and Zhu in [30].

ACKNOWLEDGMENT
The authors would like to thank the French National Geographic Institute (IGN) for providing the data, M. C. Van Lieshout from CWI, Amsterdam, and R. Stoica from University Jaume I, Castellon, for several interesting discussions, and H. Le Men, G. Maillet and N. Chehata from IGN for fruitful comments. The work of the first author has been partially supported by the French Defense Agency (DGA) and CNRS.
APPENDIX I
THE PROPOSED MODEL AS AN EXPONENTIAL FAMILY
We have defined 12 attractive interactions ∼1al, . . . , ∼4al, ∼5comp, . . . , ∼8comp, ∼9pav, . . . , ∼12pav, an exclusion interaction ∼excl, and a partition of the object space S defined by γ0, . . . , γ3. We now define the following mappings and
parametrization:

tint(x) = Σ_{u∈x} ( V^1(x, u), W^1(x, u), . . . , V^i(x, u), W^i(x, u), . . . , V^12(x, u), W^12(x, u) )^T

td(x) = Σ_{u∈x} ( 1γ0(u), J0(u), . . . , 1γ3(u), J3(u) )^T

texcl(x) = Σ_{u∈x} Vexcl(x, u)

θint = (a1, b1, . . . , ai, bi, . . . , a12, b12)^T
θext = (0, 1, 0.001, 0.001, 0.01, 0.01, 0.1, 0.1)^T
θmes = (log(β0), 0, . . . , log(βi), 0, . . . , log(β3), 0)^T
θexcl = aexcl
It is easy to verify that the density at temperature Tt can be written as:

h(x) ∝ exp( <θmes, td(x)> + [ <θext, td(x)> + <θint, tint(x)> + <θexcl, texcl(x)> ] / Tt )

APPENDIX II
GREEN'S RATIOS
We present here the expressions of Green's ratio for the birth or death kernel and for the birth or death within a neighborhood. These expressions were derived in detail in [26]. Their validity was experimentally verified in [26], by checking the convergence of the empirical distributions of the Markov chain to reference distributions. This empirical checking is important, since MCMC methods are deeply sensitive to coding errors.

A. Birth or death
For a given state Xt = x, with probability pb this kernel proposes to add a point to the current configuration, and with probability pd = 1 − pb it proposes to remove a point of the current configuration, except if Xt = {∅}, in which case Xt+1 = Xt:
1) Birth: Generate a new point u ∈ S according to ν(.)/ν(S), propose y = x ∪ u, and compute:

R = (h(y)/h(x)) · ν(S)/n(y) = (h(y)/h(x)) · |K|/n(y)   (23)
2) Death: Choose v uniformly in x, propose y = x \ v, and compute:

R = (h(y)/h(x)) · n(x)/ν(S) = (h(y)/h(x)) · n(x)/|K|   (24)

B. Birth or death within a neighborhood
We briefly present here how we have implemented this kind of transformation and the associated Green's ratio. For two given rectangles, all attractive relations are described by a distance constraint over the two corners of interest C(u) and C(v), and a constraint over the angle difference of the rectangles.
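Equations (23) and (24) combine into a single update, sketched below; the configuration is a plain Python list, and `sample_uniform` and `area_K` are illustrative stand-ins for generation according to ν(.)/ν(S) and for ν(S) = |K|:

```python
import random

def birth_or_death_step(x, h, sample_uniform, area_K, p_b=0.5):
    """Sketch of the uniform birth-or-death kernel with the Green's
    ratios of equations (23) and (24); names are ours."""
    if random.random() < p_b:                  # birth, equation (23)
        u = sample_uniform()
        y = x + [u]
        ratio = (h(y) / h(x)) * area_K / len(y)
    else:                                      # death, equation (24)
        if not x:                              # empty configuration: stay
            return x
        v = random.choice(x)
        y = [w for w in x if w is not v]
        ratio = (h(y) / h(x)) * len(x) / area_K
    if random.random() < min(ratio, 1.0):      # accept or reject
        return y
    return x
```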
1) Birth: If birth has been chosen (with probability pb), the procedure is the following:
1. Randomly select an object u in γ0 ∩ x. If this set is empty, do nothing.
2. Once u has been chosen, generate a new rectangle v such that v ∼ u. A solution is to simulate Z ∈ R^5 such that z = (x, y, δθ, L, l), where (x, y) are the coordinates of the corner of interest C(v) of the generated object. This point should lie in the disk of center C(u) and radius dmax; we propose to generate it uniformly in the disk. A good solution is to uniformly generate points in a square of side 2dmax until one of them falls in the disk (a polar parametrization should be avoided, as pointed out in [26]). δθ is the angle difference between u and v: it is generated in [−dtmax, dtmax]. L and l are respectively the length and the width of v, generated in [Lmin, Lmax] and [lmin, lmax].
3. Using z, construct the unique associated v.
4. Compute R+(x ∪ v), the set of pairs of related points in x ∪ v such that one of the points is in γ0:

R+(x) = { {w, w'} ∈ x² : w ∼ w' and {w, w'} ∩ γ0 ≠ ∅ }

then compute s(x ∪ v) = card R+(x ∪ v) and the jd term:

jd^{x∪v}(v) = [ Σ_{u∈N(v,x∪v)} ( ½ 1γ0(u) 1γ0(v) + 1γ0(u)(1 − 1γ0(v)) ) ] / s(x ∪ v)

and then the jb term:

Σ_{u∈N(v,x)} jb^x(u) = Σ_{u∈N(v,x)} 1γ0(u) / cardγ0(x)

5. Finally, Green's ratio is given by:

R(x, x ∪ v) = (h(x ∪ v)/h(x)) · (pd/pb) · ( 2π d²max dtmax · jd^{x∪v}(v) ) / ( Σ_{u∈N(v,x)} jb^x(u) )
2) Death: If death has been chosen, the procedure is the following:
1. Randomly select a pair {w, w'} of related objects in R+(x).
2. Choose a point v in the chosen pair:
• if both w and w' are in γ0, choose one of the two with probability 0.5;
• otherwise, v is taken as the unique object in the pair belonging to γ0.
3. Compute Green's ratio:

R(x, x \ v) = (h(x \ v)/h(x)) · (pb/pd) · ( Σ_{u∈N(v,x\v)} jb^{x\v}(u) ) / ( 2π d²max dtmax · jd^x(v) )

Following this procedure is important. For instance, as pointed out in [26], if the death step uniformly chose an object instead of a pair of interacting objects, the convergence property might be lost.
APPENDIX III
STABILITY CONDITIONS
In [26] we expressed some sufficient conditions under which the algorithm converges, among them the stability condition given by equation (21). This condition states that adding a point to any configuration should not decrease its energy too much. One of the consequences of this condition is the integrability of the density h(x), since it can be bounded by h({∅}) Rg^{n(x)}, which is integrable.
Article submitted to the journal IJCV
A. Term involving t_d
It is important to notice that:

t_d(x ∪ u) − t_d(x) = t_d(u), with ‖t_d(u)‖ ≤ √2.

This leads immediately to:

| ⟨ θ_mes + θ_ext/T , t_d(x ∪ u) − t_d(x) ⟩ | ≤ √2 ( ‖θ_mes‖ + ‖θ_ext‖/T ).

B. Prior model
The stability condition also holds with respect to the prior model, but this property relies on the fact that the exclusion relation penalizes superpositions of rectangles. Let us take the practical example of an attractive relation ∼_i. Recalling that a_excl < 0, we suppose that:

a_excl < −(a_i + b_i)    (25)

The energy variation induced by adding a point u to a configuration is:

U^i(x ∪ u) − U^i(x) = U^i_loc(x ∪ u, u) + Σ_{v ∈ x} [ U^i_loc(x ∪ u, v) − U^i_loc(x, v) ],

where U^i_loc stands for:

U^i_loc(x, v) = −[ a_i V^i(x, v) + b_i W^i(x, v) + a_excl V_excl(x, v) ].

Thus the negative part of the energy variation corresponds to:
• the part related to U^i_loc(x ∪ u, u), which is bounded by |a_i| + |b_i|;
• the variation induced by u over each v belonging to x.
Condition (25) gives that the negative part of the energy only involves the objects of x that are in attractive interaction ∼_i, but not in exclusive relation ∼_excl, with u. We have seen that the number of such objects is uniformly bounded by N_over, and:

e^{−U^i(x ∪ u)} / e^{−U^i(x)} = exp( −|U^i(x ∪ u) − U^i(x)|_+ + |U^i(x ∪ u) − U^i(x)|_− )
≤ exp |U^i(x ∪ u) − U^i(x)|_−
≤ exp( |a_i| + |b_i| + N_over (|a_i| + |b_i|) )
≤ exp( (N_over + 1)(|a_i| + |b_i|) ).

Finally, to be able to sum this result over the k attractive relations, we need the following condition:

a_excl < −( Σ_{i=1}^{k} (a_i + b_i) )    (26)

and, through these two results, the stability condition is proved.

C. Optimisation
Since the goal is to obtain a minimizing configuration, the energy has to be bounded. A similar analysis shows that the following condition is sufficient:

a_excl < −( Σ_{i=1}^{k} (a_i + b_i) ) − √2 ‖θ_ext‖    (27)

REFERENCES
[1] A. Fischer, T. H. Kolbe, F. Lang, A. B. Cremers, W. Förstner, L. Plümer, and V. Steinhage, "Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D," Computer Vision and Image Understanding, vol. 72, no. 2, pp. 185–203, 1998.
[2] H. Mayer, "Automatic object extraction from aerial imagery - a survey focusing on buildings," Computer Vision and Image Understanding, vol. 74, no. 2, pp. 138–149, 1999.
[3] H. Jibrini, "Reconstruction automatique des bâtiments en modèles polyédriques 3D à partir de données cadastrales vectorisées 2D et d'un couple d'images aériennes à haute résolution," Ph.D. dissertation, ENST, Paris, France, 2002.
[4] M. Fradkin, M. Roux, and H. Maître, "Building detection from multiple views," in ISPRS Conference on Automatic Extraction of GIS Objects from Digital Imagery, 1999.
[5] M. Fradkin, M. Roux, H. Maître, and U. Leloglu, "Surface reconstruction from multiple aerial images in dense urban areas," in Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition, vol. 1, Fort Collins, Colorado, USA, June 1999, pp. 262–267.
[6] F. Fuchs, "Contribution à la reconstruction du bâti en milieu urbain, à l'aide d'images aériennes stéréoscopiques à grande échelle. Étude d'une approche structurelle," Ph.D. dissertation, Université René Descartes, Paris V, France, 2001.
[7] A. Stassopoulou and T. Caelli, "Building detection using Bayesian networks," International Journal of Pattern Recognition and Artificial Intelligence, vol. 14, no. 6, pp. 715–733, 2000.
[8] S. Vinson and L. D. Cohen, "Multiple rectangle model for buildings segmentation and 3D scene reconstruction," in Proc. of ICPR Int. Conf. on Pattern Recognition, Québec, Canada, August 2002.
[9] S. Vinson, L. D. Cohen, and F. Perlant, "Extraction of rectangular buildings using DEM and orthoimage," in SCIA, Bergen, Norway, June 2001.
[10] U. Weidner, "Building extraction from digital elevation models," Institut für Photogrammetrie, Bonn, Tech. Rep., 1995.
[11] C. Vestri and C. Devernay, "Using robust methods for automatic extraction of buildings," in CVPR, 2001.
[12] H. Maas and G. Vosselman, "Two algorithms for extracting building models from raw laser altimetry data," vol. 54, no. 2-3, pp. 153–163, 1999. [Online]. Available: http://www.sciencedirect.com/science/article/B6VF4-3WY9SWX-D/1/2ab469c94d1f33bc54ae6f9cb55d5c99
[13] A. Baddeley and M. N. M. Van Lieshout, "Stochastic geometry models in high-level vision," Statistics and Images, vol. 1, pp. 233–258, 1993.
[14] H. Rue and M. Hurn, "Bayesian object identification," Biometrika, vol. 86, no. 3, pp. 649–660, 1999.
[15] H. Rue and A. R. Syversveen, "Bayesian object recognition with Baddeley's delta loss," Adv. Appl. Prob., vol. 30, pp. 64–84, 1998.
[16] R. Stoica, X. Descombes, and J. Zerubia, "A Gibbs point process for road extraction from remotely sensed images," Int. Journal on Computer Vision, vol. 37, no. 2, pp. 121–136, 2004.
[17] C. Lacoste, X. Descombes, and J. Zerubia, "A comparative study of point processes for line network extraction in remote sensing," INRIA Research Report 4516, 2002.
[18] M. Ortner, X. Descombes, and J. Zerubia, "Building detection from Digital Elevation Models," in ICASSP, vol. III, Hong Kong, April 2003.
[19] M. N. M. Van Lieshout, Markov Point Processes and their Applications. Imperial College Press, London, 2000.
[20] O. E. Barndorff-Nielsen, W. Kendall, and M. N. M. Van Lieshout, Eds., Stochastic Geometry: Likelihood and Computation. Chapman and Hall, 1999.
[21] C. Geyer and J. Møller, "Simulation and likelihood inference for spatial point processes," Scandinavian Journal of Statistics, vol. 21, pp. 359–373, 1994.
[22] C. J. Geyer, "Likelihood inference for spatial point processes," in Stochastic Geometry: Likelihood and Computation, O. Barndorff-Nielsen, W. Kendall, and M. N. M. Van Lieshout, Eds. Chapman and Hall, 1999.
[23] P. Green, "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination," Biometrika, vol. 82, pp. 711–732, 1995.
[24] M. N. M. Van Lieshout, "Stochastic annealing for nearest-neighbour point processes with application to object recognition," CWI Research Report BS-R9306, ISSN 0924-0659, 1993.
[25] A. Srivastava, U. Grenander, G. Jensen, and M. Miller, "Jump-diffusion Markov processes on orthogonal groups for object recognition," Journal of Statistical Planning and Inference, 1999.
[26] M. Ortner, X. Descombes, and J. Zerubia, "Improved RJMCMC point process sampler for object detection by simulated annealing," INRIA Research Report 4900, August 2003.
[27] A. Pievatolo and P. Green, "Boundary detection through dynamic polygons," Journal of the Royal Statistical Society, Series B, vol. 60, pp. 609–626, 1998.
[28] M. Ortner, X. Descombes, and J. Zerubia, "Automatic 3D land register extraction from altimetric data in dense urban areas," INRIA Research Report 4919, August 2003.
[29] G. Winkler, Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction. Springer-Verlag, 2003.
[30] Z. Tu and S. Zhu, "Image segmentation by Data-Driven Markov Chain Monte Carlo," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 657–673, May 2002.