Driving 3D Geometric Modelling for Building Roof Reconstruction by Semantics Stephan Scholze1 , Theo Moons2,3 and Luc Van Gool1,2 1
2
ETH Z¨urich, Computer Vision Laboratory (BIWI), Z¨urich, Switzerland Email: {scholze,vangool}@vision.ee.ethz.ch Katholieke Universiteit Leuven, Center for Processing Speech and Images (ESAT/PSI), Leuven, Belgium 3 Katholieke Universiteit Brussel, Group of Exact Sciences, Brussel, Belgium Email:
[email protected]
Abstract This paper investigates into model-based reconstruction of complex polyhedral scenes consisting of a structured ensemble of planar polygonal faces. The modelling is done in two different regimes. One focuses on geometry, whereas the other is ruled by semantics. Inside the geometric regime, 3D line segments are grouped into planes and further into faces using a Bayesian analysis. In the second regime, the preliminary geometric models are subject to a semantic interpretation. The knowledge gained in this step is used for inferring missing parts of the model (by invoking the geometric regime once more) and for adjusting the overall model topology. A main application of the presented method is for automated building roof reconstruction from multiview aerial imagery in densely built up urban areas.
1
Introduction
Automated building roof reconstruction from aerial imagery poses several demanding 3D modelling problems. The goal of building roof reconstruction is to recover the precise 3D geometry of building roofs. Although research in the field of fully automated building reconstruction has been conducted over the past decades, none of the developed systems made it into industrial production. However, very powerful systems have been proposed, able to work fully automatic in specific scenarios. Two main aspects make modelling a difficult VMV 2002
task. Firstly, input data might be missing, incomplete, erroneous and will always show some inevitable geometric imprecision. Secondly, roof shapes show a great variability which has to be met by an appropriate model. Consequently, two key elements seem to be crucial to successfully deal with the complicated reconstruction task. On the one hand, a balance between generic and detailed building models has to be found, in order to cope with the virtually infinite variability of roof types on one hand and with the need to support modelling with prior knowledge on the other. On the other hand, a building reconstruction system, able to deal with missing or erroneous input data will need some sort of intelligent control to perform its task successfully.
1.1
Previous Work
Despite the long lasting need for cheap, reliable and up-to-date 3D city models for technical and environmental planning tasks, automated building reconstruction is still an only partially solved task. An excellent overview of important developments and state-of-the-art in the field can be found in the Proceedings of the Ascona Workshops on Automatic Extraction of Man-Made Objects from Aerial and Space Images [1, 2, 3]. Some very renown approaches, able to reconstruct complex buildings with minimal user interaction include [4, 5, 6, 7]. The roof models used in the literature cover the range from very generic ones (polyhedral structures) up to very specific ones (parametrized modErlangen, Germany, November 20–22, 2002
from the image data up to a preliminary geometric model of the roof. A detailed description how the geometric model is augmented with semantic meaning is given in Section 4. The semantic interpretation is used to complete and refine the roof model. Finally, results of the approach are presented in Section 5.
els of different building types). For an overview see [8]. For handling uncertainty and imprecision of the input data, some building reconstruction systems already have a probabilistic underpinning [9, 10, 11].
1.2 A Geometric Model in a Probabilistic Setup In this paper, a probabilistic formulation for geometric building reconstruction is presented. As a novelty, a semantic interpretation of generic roof parts is used to guide the geometric reconstruction. The conjunction of robust geometric reasoning in 3D space together with a semantic interpretation allows to reconstruct complex building roofs completely and with correct topology. Figure 1 gives an overview of the proposed methodology.
2
A roof is geometrically modelled as an ensemble of planar polygonal faces (patches). It suffices to distinguish between triangular and quadrangular patches, since more complex patch shapes (e.g. Lshapes) are obtained by patch composition. Theoretically, it would suffice to use triangular patches only. However, since quadrangular patches are very common in building roofs, including them is advantageous for the practical application. The 3D line segments forming the border of a patch will be referred to as patch segments. The term edge is deliberately not used for its context in computer vision.
3D line segments (a)
seeds from geometry (b)
geometric reconstruction (c)
Geometric Roof Model
2.1
seeds from semantics (d)
Roof Atoms: Patches
In Figure 2(a) the parametrization of a quadrangular patch is shown. The advantages of the parametric representation are that the quantities involved have an obvious meaning, and that probability distributions for the parameters can be obtained (e.g. from a test dataset). Additionally, this representation allows to incorporate symmetries between different patches of a roof model. In parallel to the parametric representation, a representation based on the 3D world coordinates of the corner points P0 , . . . , P3 of the patch is kept as well. Although this dual representation is redundant, the coordinate based representation is especially useful when introducing coincidence constraints to ensure topological connectivity, as discussed in the next section.
semantic labelling (e)
semantic model refinement (f)
Figure 1: Structure of geometric and semantic modelling system. From the input data (a), consisting of 3D line segments, a set of seed segments (b) is chosen by virtue of some geometric properties. In the neighbourhood of these, preliminary geometric models consisting of planar faces in 3D space are reconstructed (c). Firstly, the semantic labelling performed in (e) allows to detect missing parts of the model and thus to determine additional seed segments (d) which in turn are used for geometric model instantiation. Secondly, the semantic labels are used to improve the overall model topology (f).
2.2
Roof =
P
Patches + Constraints
A roof model consists of its constituting patches and additional (geometric) constraints which describe the relations between the patches. As an illustrative example, consider the L-shaped patch in Figure 2(b), constructed out of two quadrangular patches. In order to compose the two patches to represent
The organisation of this paper is as follows. In Section 2 the proposed roof model is presented. Section 3 briefly summarizes the processing steps 666
built up urban areas. An example stereo pair is show in Figure 3(a,b). Using the input images, a set of 3D line segments is derived using feature based multiview correspondence analysis [12, 13]. These line segments (Figure 3(c)) form the input data for the following reconstruction process.
one L-shaped patch, two constraints are imposed. By making use of the parametric representation, a unique slant angle can be achieved by requiring (1) (2) φ2 = φ2 . Additionally, the coordinate based representation allows to glue corner points together (1) (2) by setting P0 = P3 . Another important benefit in using constraints transpires during model topology refinement described in Section 4.5.
P3 z0 y
T
z x
y
φ2 P2
α2
w 0
x0
α1
P0 h
(a)
P1
(a)
(b)
(1)
φ2
(1) P (2)0
P3
(2)
φ2
(b)
(c) Figure 2: (a): Quadrangular patch model. T is the transformation from world coordinates to the local patch coordinate system including a translation, a rotation around the z-axis by φ1 , and a rotation around the y 0 -axis by φ3 . The slant of the patch plane is given by φ2 . The inclination of the bordering segments is denoted with α1 and α2 respectively. The parametrization of the patch is completed by specifying the width w and the height h. The orientation of the line given by the points {P1 , P2 } is always parallel to the x0 -axis. (b): Illustration of the two different types of constraints. (For this example, the inclination of the bordering segments is fixed to α = π/2.) Further description in text.
3
Figure 3: (a,b): Example aerial image stereo pair used for 3D line segment reconstruction. (c): Resulting 3D line segments (after geometric verification in a third view not shown here).
3.1 Plane and Patch Hypotheses Given a set of 3D line segments, planes which could correspond to roof structures, supported by these line segments, are sought. The general approach is to rotate a half plane around some reference 3D line segment, e. g. a tentative ridge line. (The choice of the reference segments is based on geometric measurements, basically corresponding to the unary attributes which will be introduced in Section 4.1.) For each inclination of the half plane, segments in its neighbourhood, which approximately lie in this plane are collected. To determine the number of planes and their positions, a Bayesian model selection procedure is applied. The result is a set (usually 1-4) of plane hypotheses in the neighbourhood of a
Patch Reconstruction: the Geometric Regime
This section discusses the geometry-based reconstruction of roof patches. The raw data consists of multiview, high resolution aerial imagery of densely 666
reference line together with their probabilities. During the Bayesian plane selection procedure, 3D line segments are associated with different plane hypotheses. The next step now is to determine the outlines of possible patches, containing the segments in the individual plane hypotheses. The outlines are found by computing the extremal points of the convex hull of the segments in a tentative plane. The extremal points correspond to the end points of the associated line segments, projected into the patch plane, with minimal/maximal coordinates. Then, the patch parameters are adjusted to contain the extremal points. At this stage of the reconstruction, different patch and plane configurations exist in parallel. To select the optimal configuration Utility Theory is applied. The Utility Function used consists of two parts. One part quantifies the reliability of the patch hypothesis [14], whereas the second part takes into account the compatibility between different patches. Finally, the patch and plane configuration with Maximum Expected Utility is entered into a preliminary roof model. (A detailed description of Bayesian plane and patch instantiation will be published elsewhere.)
gap
missing patches (a)
(b)
Figure 4: Results of geometry based patch reconstruction of parts of the corner house in Figure 3(c). Reference lines, used at different passes of geometric reconstruction algorithm, are indicated bold. See discussion in text.
4.1
Semantic Labels by Geometric Attributes
Five different semantic labels for patch segments are distinguished. The names of the labels are chosen to be coherent, although they should not be taken literally. For example, a gutter segment just corresponds to the lower boundary of a patch, no matter if there is a gutter in the scene or not. Figure 5 gives an overview. For convenience each semantic label (e. g. gutter) is represented by a variable ω (i) , where i encodes the actual label. The five possible labels form the set Ω:
3.2 Results of Geometry-based Reconstruction
Ω = { ω (1) = ridge, ω (2) = gutter,
Up till now, only geometric information conveyed by the 3D line segments has been used in the reconstruction process. Figure 4 shows some reconstructed patches for the corner building in Figure 3. Although the reconstructed patches capture the roof geometry, two aspects need to be improved. First of all, the pure geometric reconstruction fails to determine whether the entire roof structure has been retrieved or if patches are still missing. Secondly, small gaps appear between neighbouring segments which should be adjacent. Hence, the roof topology is not correct at this stage of reconstruction.
4
gap
ω (3) = gable, ω (4) = convex, ω (5) = concave}
(1)
In order to assign the semantic labels (later also referred to as classes) to the segments in a patch, geometric measurements are used. We distinguish measurements characterizing individual segments ui (unary attributes) and measurements between adjacent segments bij (binary attributes). The actual measurements of
Roof Reconstruction: the Semantic Regime
ui
(1)
:
length of segment i
(2) ui
:
slant angle of segment i relative to a horizontal plane
(1) bij
To obtain a complete and topological correct reconstruction of entire building roofs, the following reconstruction steps are driven by a semantic interpretation of the geometric patch models obtained so far.
:
angle enclosed by adjacent (coplanar) segments i and j
(2) bij
:
(signed) mid point height difference of segments i and j
666
(2)
form the components of the attribute vectors ui = (1) (2) (1) (2) (ui , ui ) and bij = (bij , bij ). For future convenience, the unary and binary attributes of a patch are represented in a cyclic graph where the nodes of the graph represent the patch segments. A node i holds the unary attribute vector ui . Between each two nodes of adjacent segments i and j, two edges are introduced in the graph, representing the binary attribute vectors bij and bji (cf. Figure 6). Although the current set of measured attributes is complete in a sense that it reliably allows correct label assignment, the system could be easily extended by introducing additional attributes. Note that currently only intra patch relations are being used.
labels, the unary and binary attributes have been recorded.
4.3
We now concentrate on the problem of assigning semantic labels to the patch segments obtained from the geometry-based reconstruction. In a first step, unary and binary attributes are computed for each patch individually. Using these attributes to assign semantic labels involves a classification of the measured attributes according to the attribute distributions of the test dataset. For the unary attributes, the possible classes correspond to the semantic labels given in Equation (1). Since the binary attributes describe relations between pairs of line segments, their possible classes are given by label combinations formed out of Ω×Ω (e.g. (ridge;gable) ). In other words, unary and binary attributes have to be classified separately, according to their different label sets.
ridge concave connection gable
convex connection
gable
gutter
Figure 5: Functional parts of a roof and their semantic labels. The label ridge is generally used for the upper boundary of a patch, gutter for the lower one. The boundaries of a patch are either labelled gable, convex connection or concave connection depending on the neighbourhood.
ui
bij
Distribution of Attributes In order to choose an appropriate classification method the distribution of the attributes obtained from the test dataset has to be considered. As a illustrative example the Normal Probability plot for the unary relations u(1) and u(2) of gutter line segments is depicted in Figure 7 (a,b). The figures are obtained by plotting the quantiles of the distribution of u(i) against the quantiles of a normal distribution with the same mean and the same variance as u(i) . If the distribution of u(i) would correspond to a normal distribution, the points would lie on a straight line. Deviations from this line indicate a different distribution. The points in Figure 7 (a,b) clearly deviate from straight lines, thus implying that one cannot assume that they are normal distributions. For line segments with different labels and for binary attributes the analysis yields comparable results.
uj
bji i
j
Figure 6: Fragment of cyclic graph used to represent unary and binary attributes. See description in the text.
4.2
Classification of New Measurements
Test Dataset
Linear Discriminant Analysis A non-parametric classification technique, namely Linear Discriminant Analysis (LDA), is used to classify the unknown observations [16]. LDA is chosen because it does not impose restrictions on the number of classes in a dataset (which would allow to use more attributes, if needed) and because the underlying distributions need not to be normal [17].
To learn the probabilistic structure for the geometric measurements at test dataset has been used. The dataset shows urban and sub-urban areas with fourfold overlap at an image scale of approximately 1:5000 [15]. Nearly 150 roofs, consisting of about 80 triangular and 270 quadrangular patches have been labelled manually using the semantic labels from Equation (1). Additionally to the semantic 666
shown), it is not possible to unambiguously assign labels using unary or binary attributes alone. This deficiency will be overcome in the next section.
u(2) 0.004
10
14
0.008
u(1)
(b)
0.002
N (µ, σ)
convex gable
gutter
gable convex concave
0
0.000
−2
−0.002
ridge
−6
−4
−2
0
2
4
1
w2
(a)
−6
−4
−2
0
2
4
w2
(b)
Figure 8: Schematic representation of unary attributes of the test dataset in the space of maximal separation. The clusters are shown as ellipses for visualization only – no normal distribution is assumed for classification. (a): Trigonal patches. (b): Quadrangular patches.
−2
3.9
0
4.1
gutter
−4
w2 2
w1
−4
−2
N (µ, σ)
(a)
2
15
0
10
2
0.000
2
5
w1
4
6
w1
3.85
3.95
4.05
4.15
N (µ, σ)
(c)
−2
−1
0
1
N (µ, σ)
2
(d)
Figure 7: Normal Probability plots of gutter line segments from test dataset. (a): Unary attribute length. (b): Unary attribute slant. (c): First discriminant function. (d): Second discriminant function. See description in text.
4.4
Probabilistic Relaxation Labelling (l)
The posterior probability distribution P (ωi |ui ) of (l) patch segment i having label ωi ∈ Ω (l = 1, . . . , 5) (cf. Equation (1) ) is known from LDA. However, unary attributes alone do not allow an unambiguous labelling. This is overcome by an iterative procedure which determines the labels for all segments in a patch in such a way, that the entire label assignment (per patch) has maximal probability, exploiting unary and binary attributes simultaneously. For this purpose, a probabilistic relaxation labelling algorithm is used [19]. The patch segments are represented as nodes in a cyclic graph (cf. Figure 6). Each node i holds the conditional probabil(l) ities P (ωi |ui ). Between each two nodes of adjacent segments i and j, two edges are introduced in the graph, representing the binary attributes bij and bji . Conditional probabilities of the form (m) (l) p(bij |ωi , ωj ) (l, m = 1, . . . , 5) are stored in these edges. These probabilities quantify the com(l) patibility between the label assignments ωi and (m) ωj . Assuming that the noise in the binary attributes bij is Gaussian and that the errors associated with each measurement are statistically independent one may write the density functions as
During LDA, the input data (unary and binary attributes respectively) are transformed as to obtain maximal separation between the classes. Optimal classification results would be achieved, when the decision boundaries perfectly separate the classes. Unfortunately, this might not be achieved in practice. However, the optimal separation between the different classes is obtained by maximizing the ratio λ=
between class scatter within class scatter
(3)
This calculation leads to a set of eigenvectors wi , which are called discriminant functions. These eigenvectors represent the directions of maximum separation between the classes [18]. Besides better separation, the transformed attribute vectors also show a distribution which can be reasonably well approximated by a multivariate normal distribution. This is illustrated by Figure 7 (c,d), showing the Normal Probability plot for the unary attribute vectors of gutter line segments in the space spanned by the discriminant functions w1 and w2 . Figure 8 shows a schematic representation of the clusters formed by the unary attributes of trigonal and quadrangular patches of the test dataset in the transformed space. Since the clusters are overlapping for both, unary and binary attributes (not
(l)
(m)
p(bij |ωi , ωj 666
1 )= Q √ × 2πσk k
exp
1X − 2 k
(k)
(k)
bij − ˜b{l,m}
!2 !
σk
means of geometric attributes to very probably correspond to ridge segments. Figure 9(a) shows the reconstruction result for the upper left building in Figure 3 which was completely reconstructed from its seed (here: ridge) line by one pass of the reconstruction algorithm. The reconstruction results are detailed and topologically correct. For instance the small difference in the slope of the two patches on the right side of the roof has been correctly detected. Figure 9(b) shows the reconstruction of the roof of the corner building in Figure 3. The triangular patches on the front side do not lie in a plane given by the seed line. However, driven by the semantic labels attributed to the partially reconstructed roof (cf. Figure 4 (a)), the missing patches could be successfully found.
(4)
where k denotes the number of binary attributes. ˜b(k) is the mean value of the corresponding bi{l,m} nary relation in the test dataset. Incompatible relations, i. e. relations which are not present in the test dataset, are given zero probability. In an iterative scheme the relaxation algorithm searches for a labelling which maximizes the overall patch probability. For all examples we have investigated, the algorithm converges rapidly (≈ 10 iterations) to a stable and correct label assignment.
4.5
Semantic Model Completion and Refinement
After having determined the semantic labels for all patch segments present in the geometric roof model, the roof is refined using a set of simple rules. The most important aspect here is inference of missing patches. Missing patches are indicated by segments corresponding to convex or concave connections, which do not have patches with compatibly labelled segments in their direct neighbourhood. For these segments (now used as reference segments, cf. Figure 1 (d)), the geometric patch reconstruction described in Section 3 is invoked again. If no more missing patches can be inferred, the semantic labels are used to glue corresponding segments together, closing possible gaps and therefore ensuring a consistent roof topology. The constraints introduced in Section 2 ensure, that already joined segments will not be separated again by following gluing operations. Note that in this step, topological correctness is preferred over geometric precision.
5
Results and Conclusion
The presented results are obtained using a stateof-the-art dataset, produced by Eurosense Belfotop n.v. The image characteristics are 1:4000 image scale with a pixel size corresponding to 8 × 8 cm2 on ground. The 3D line segments are obtained using three overlapping views. The precise sensor orientation is known. To emphasise the quality of the reconstructed roof geometry no texture mapping is applied. For the presented set of experiments the initial reference line segments have been determined by
Figure 9: (a,b): Three-dimensional view of reconstructed building roofs. The non-trivial roof structures are captured to their full extent. Thin lines show the input 3D segments. 666
To conclude, we feel that the combination of probabilistic geometric and semantic reasoning presented in this paper leads to encouraging reconstruction results. The proposed roof model seems to meet a reasonable balance between being general and specific. Future work will focus on exploiting the knowledge available in the form of test datasets to a broader extent, with the goal to initialize roof models with even less evidence in 3D space. Another research direction, which is pursued at the moment, is oriented towards the optimization of the geometric accuracy of the models using backprojection into the input images.
[9]
[10]
[11]
References [1] A. Gr¨un, O. K¨ubler, and P. Agouris. Automatic Extraction of Man-Made Objects from Aerial and Space Images. Birkh¨auser-Verlag, Basel / Boston / Berlin, 1995. [2] A. Gr¨un, E.P. Baltsavias, and O. Henricsson. Automatic Extraction of Man-Made Objects from Aerial and Space Images. Birkh¨auserVerlag, Basel / Boston / Berlin, 1997. [3] E. Baltsavias, A. Gr¨un, and L. Van Gool. Automatic Extraction of Man-Made Objects from Aerial and Space Images. Balkema Publishers, Rotterdam, 2001. [4] O. Henricsson. The Role of Color Attributes and Similarity Grouping in 3-D Building Reconstruction. Computer Vision and Image Understanding, 72(2):163–184, 11 1998. [5] T. Moons, D. Frere, J. Vandekerckhove, and L. Van Gool. Automatic Modelling and 3D Reconstruction of Urban House Roofs from High Resolution Aerial Imagery. ECCV, 1:410–425, 6 1998. [6] C. Baillard and A. Zisserman. A PlaneSweep Strategy for the 3D Reconstruction of Buildings from Multiple Images. In International Archives of Photogrammetry and Remote Sensing, volume 33 part B2, pages 56– 62, 7 2000. [7] A. Fischer, T.H. Kolbe, F. Lang, A.B. Cremers, W. F¨orstner, L. Pl¨umer, and V. Steinhage. Extracting Buildings from Aerial Images Using Hierarchical Aggregation in 2D and 3D. Computer Vision and Image Understanding, 72(2):185–203, 11 1998. [8] H. Mayer. Automatic Object Extraction from Aerial Imagery – A Survey Focusing on
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
666
Buildings. Computer Vision and Image Understanding, 74(2):138–149, 1999. K. Kulschewski. Building Recognition with Bayesian Networks. In W. F¨orstner and L. Pl¨umer, editors, SMATI, 1997. M. Cord, M. Jordan, and J.-P. Cocquerez. Accurate Building Structure Recovery from High Resolution Aerial Imagery. Computer Vision and Image Understanding, 82(2):138– 173, 2001. S. Heuel and W. F¨orstner. Topological and Geometrical Models for Building Reconstruction from Multiple Images. In E. Baltsavias, A. Gr¨un, and L. Van Gool, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images. Balkema Publishers, Rotterdam, June 2001. S. Scholze, T. Moons, F. Ade, and L. Van Gool. Exploiting Color for Edge Extraction and Line Segment Stereo Matching. In IAPRS, volume 33, pages 815–822, 7 2000. S. Scholze. Exploiting Color for Edge Extraction and Line Segment Multiview Matching. Technical Report BIWI–TR–257, Communication Technology Laboratory, Computer Vision Group, ETH, Z¨urich, Switzerland, 2000. S. Scholze, T. Moons, and L. Van Gool. A probabilistic approach to roof patch extraction and reconstruction. In E. Baltsavias, A. Gr¨un, and L. Van Gool, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images. Balkema Publishers, Rotterdam, June 2001. Institute of Geodesy and Photogrammetry. Dataset of Z¨urich Hoengg. Swiss Federal Institute of Technology (ETH) Z¨urich, CH-8093 ETH Hoenggerberg, Switzerland, 2001. R. A. Fisher. The use of Multiple Measurements in Taxonometric Problems. Ann. Eugenics, 7:179–188, 8 1936. D. J. Hand. Construction and Assessment of Classification Rules. John Wiley & Sons, 1997. R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973. W. J. Christmas, J. Kittler, and M. Petrou. Structural Matching in Computer Vision Using Probabilistic Relaxation. PAMI, 17:749– 764, 8 1995.