Extraction of building footprints from airborne laser scanning: Comparison and validation techniques (Invited Paper)

Norbert Pfeifer∗, Martin Rutzinger†, Franz Rottensteiner‡, Werner Muecke∗ and Markus Hollaus§

∗ Institute of Photogrammetry and Remote Sensing, Vienna University of Technology, Austria. Email: [email protected]
† alp-S, Centre for Natural Hazard Management and Institute of Geography, University of Innsbruck, Austria. Email: [email protected]
‡ Cooperative Research Centre for Spatial Information, University of Melbourne, Australia. Email: [email protected]
§ Christian Doppler Laboratory for Spatial Data from Laser Scanning and Remote Sensing, Vienna University of Technology. Email: [email protected]

Abstract—Many applications in urban planning and analysis require the position and extent of buildings, the so-called building footprint. Cadastral maps acquired by ground based surveying are often not up-to-date or may not be available at all. Airborne laser scanning, on the other hand, offers the possibility to detect roof outlines in an automated manner. This subject has been researched over the previous years, but a conclusive comparison of the quality of the available algorithms has not been performed. Additionally, the methods for assessing the quality of building footprint extraction should consider the special characteristics of airborne laser scanning data. This paper gives an overview of the algorithms developed for the task of house detection from airborne laser scanning and evaluates two algorithms experimentally. The methods of quality assessment are discussed, and in addition to the standard method of pixel based comparison, the object based comparison is studied, which provides more insight.

I. INTRODUCTION

Airborne Laser Scanning (ALS), helicopter or fixed wing aircraft based, has received considerable attention for the characterization of surface areas and the extraction of topographic objects, especially in urban regions. On the one hand it is attractive because of the simplicity of obtaining 3d information by the direct polar measurement method [1]. Compared to 3d information derived from aerial images by correlation of stereo image pairs [2], it has the advantage that no problems occur in areas without texture, e.g. areas of dark shadows, or in areas with repeated texture, another typical problem in urban areas. Both methods, i.e. laser scanning and image correlation, provide an unstructured set of points that can be interpolated to yield a digital surface model. For ALS, common point densities over urban areas are currently between 0.5 and 15 points/m². On the other hand, airborne laser scanning provides more than a digital surface model (DSM), also termed digital canopy model (DCM). It delivers information beyond points on the first visible surface from above: ALS has the ability to see through small gaps in the foliage of vegetation and record points on the ground below trees.

Additionally, from each laser shot a number of echoes can be detected if the laser pulse, which has a diameter of roughly 50 cm in typical projects, is reflected at different surfaces distributed along the vertical, i.e. shot, direction. Often, but not always, the last detected echo comes from the ground. The classification of the measured points into ground points and off-terrain points is called filtering [3] and provides a digital terrain model (DTM). These algorithms were first developed for forested areas, and their application in urban areas is not without problems [4], meaning that off-terrain points will, in some cases, be classified as ground points and vice versa. One problem is that in city areas the definition of the terrain elevation becomes less precise. Also, the characteristic of points off the ground, namely that they lie only above the terrain in the vegetation, does not fully hold for off-terrain points in city areas. Points below ground level, e.g. in underpasses, occur frequently and violate the basic assumptions of many of these algorithms. New approaches are being investigated which provide more than only the classification into ground and off-terrain points.

The advantages of ALS have raised expectations in the field of urban remote sensing, where a practicable and reliable method for the fully automated derivation of building models is still missing. A large volume of research on that topic, i.e. the automatic recognition of houses from aerial images and their reconstruction, exists [5]. However, only semi-automated methods [6], [7] are used routinely. In the airborne laser scanning community many algorithms for the extraction of vegetation (i.e. trees as a set, or single trees [8]), of buildings, and of roof forms have been published. Concerning the reconstruction of houses from airborne laser scanning data, much attention has been given to the reconstruction of roof forms. Many of these algorithms assume that the footprint of the building is given, e.g. by a digital cadastral map. One comparison of the accuracy of roof modeling with given building footprints from cadastral plans and airborne laser scanning data was published in a EuroSDR test [9].


While cadastral plans are available in many parts of the world, they are not available in all parts. Issues of up-to-dateness, completeness, and, finally, differing interpretations between ground based surveying and aerial data acquisition also call for reliable methods for the extraction of building footprints. The task of map updating likewise requires the detection of houses, (partly) without given ground plans. A number of algorithms have been published on building detection, but neither has the meaning of 'successfully detected' been investigated in detail, nor has a comparison between different methods shed light on the relative performance of these algorithms. Therefore, the quality of building footprint extraction methods is largely unknown. Finally, the problem is not solved satisfactorily, and upcoming regulations within the European Union (e.g., on noise emission) require more attention to be given to urban vegetation and building models.

The contribution of this paper is to put the extraction of building footprints into focus. So far, no experimental comparison of algorithms working primarily with airborne laser scanning data has been performed. Furthermore, many algorithms have shortcomings in one situation or the other. Additionally, the method of assessing the quality of the results is not as straightforward as it may seem at first sight. The quality assessment can be performed on the iconic level (point or pixel) or on the object level (the entire house or block). This paper presents and compares different assessment methods.

The paper is organized as follows. The next section gives an overview of building detection algorithms and presents two algorithms, which will be compared in more depth. Section 3 elaborates on the assessment criteria. Section 4 presents the data sets used and the results, i.e. the relative and absolute comparison of the algorithms, measured by different methods. Conclusions are drawn in the last section.

II. PREVIOUS WORK

A. Overview of methods for building detection in airborne laser scanning data

Algorithms for the extraction of houses often classify the data into a number of urban classes. The published methods differ in:
• derived classes, e.g., building, vegetation, open area, and other objects,
• methods of data acquisition, i.e., with additional data (aerial imagery) or without,
• input data type, i.e., point cloud or raster data,
• granularity of the detected classes,
• type of classification, i.e., fuzzy or hard.
Before presenting the methods used in this comparison, a comprehensive overview of existing approaches will be given.

[10] suggest extracting objects from the normalized DSM (nDSM = DSM - DTM) with a height threshold chosen so that small objects (cars, ...) are not included. Then, the mean intensity of the return echo is used to separate buildings (lower reflectance) from vegetation (higher reflectance).

As this separation is not perfect, the variance of height, the variance of the surface normal, and its distribution are considered further. [11] construct the nDSM and extract elevated objects. The GLCM (grey level co-occurrence matrix) contrast in the x-, y-, and diagonal directions is computed, which is a means to find tree pixels, as these have a high edge contrast in all directions. The objects which are not trees (houses, cars, sheds, etc.) are then split by an unsupervised k-means clustering on the contrast texture and the height. [12] first apply a segmentation based on bottom-up region merging, locally minimizing the growth of height heterogeneity. Their first approach detects the class 'building and tree' by comparing the mean segment height to neighboring pixels and then uses color information from an aerial image or the recorded intensity of the laser range finder for distinguishing between vegetation and buildings. In an alternative classification approach, the GLCM homogeneity for height and for intensity and a shape criterion (edge length) on the boundary polygon are the features extracted for each segment. A fuzzy approach for combining these cues is then applied in order to arrive at the final classification. [13] apply a region growing based segmentation where the height difference between a pixel and its neighbors has to stay below a threshold. The relations to neighboring segments and shape descriptors are used for the classification by a set of rules, and the possibility to reconstruct the roof structure from the point set is also considered. [14] use a volumetric approach, where the DSM is not interpreted as a surface but as a stack of horizontal binary slices, in which the bit at a certain pixel and height slice is set if the DSM elevation is larger. In each slice connected component labeling is performed. Such a component is classified as belonging to a building if the components in the (vertically) neighboring slices have a similar size and center of gravity. [15] suggest using morphological filters with different window sizes on the height of the laser points in order to detect elevated objects. Then, a size criterion and the requirement that a large portion of the points can be modeled by planes are used to verify or falsify these building hypotheses. [16] first use the first minus last echo height differences and then the roughness compared to an adjusting plane in a small neighborhood to remove pixels in vegetated areas. This leads to a DSM containing buildings and small objects such as cars; based on a DTM the nDSM is computed and a height threshold is used to finally extract the buildings. [17] perform a segmentation of the nDSM by region growing in the 8-neighborhood to those pixels where the height difference to a candidate point is below a threshold. For each segment a number of features is extracted: gradients on the segment boundary, height texture and first to last echo height differences, segment size and shape encoded in the length of the longest edge, and laser echo intensities. Based on these features a fuzzy classification or a maximum likelihood classification is performed. [18] use exclusively first echo data, because the last echoes are 'not reliable' in all cases.

Based on edge detection, off-terrain segments are detected and cleaned using mathematical morphology. The above ground objects are subjected to a plane detection algorithm, and the height differences of all points within one raster cell as well as morphological operations are applied in order to separate vegetation from buildings. [19] start from an nDSM and use the first-last echo differences and a roundness measure to discriminate between vegetation and building objects. The ground plans of the building objects are approximated and generalized with recursively determined rectangles aligned with the boundary edges. [20] start with an nDSM. Thresholds on step edges and the variance of surface normals are used together with morphological operators and area size thresholds to detect and remove vegetation areas. Remaining objects in the nDSM are considered to be buildings. In the proposed approach Bayesian networks are used to overcome the problem of fixed thresholds.

Most of the above methods use an nDSM with subsequent application of a height threshold to detect elevated objects. Buildings are separated from the rest by size and roughness criteria. Common to all methods are the problems of steep terrain, of errors in DTM generation which propagate to the nDSM, and of trees next to houses, especially if data acquisition is performed under leaf-on conditions. Depending on the specific method, very large or very small objects can cause additional problems. Furthermore, the quality of extraction is typically analyzed only briefly for the above methods, and the evaluation methodology differs. The setting of parameters is also often not discussed.

In the next sections two methods are described that are subsequently used for a comparison of different building detection algorithms. These two algorithms are very different, so they can give an indication of the expected range of possible results when other algorithms are applied.

B. Building detection based on hydrological raster GIS tools

This method uses GIS tools from hydrological analysis to detect buildings. The input data sets are a raster DSM and a model describing the difference between the first and the last reflected echo [21].

1) Method description: The algorithm is embedded in the Open Source Geographic Information System GRASS GIS. From the raw point cloud different raster layers are derived: a DSM and a first-to-last echo difference model [22], [23]. The latter is used primarily for detecting vegetation. The DSM is inverted in the vertical direction and a hydrological algorithm for filling sinks is used to detect high objects. The derived segments are enhanced by morphological filtering and an area criterion to remove noise. Then features based on the point cloud or raster layers are calculated to derive buildings from these high objects. Significant features for building classification are, e.g., shape index, surface roughness, mean first-to-last-echo difference, or area. The classification is done by user defined rules which contain the value ranges for each feature and a weight. The result can be refined by the iterative adaptation of the feature selection and value range definition in the rule base.
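The following is a minimal sketch of this pipeline outside of GRASS, with numpy, scipy and scikit-image standing in for the hydrological and raster modules; the sink filling is approximated here by greyscale reconstruction, and the two-feature rule base as well as the default parameter values (which loosely follow Tab. I) are illustrative simplifications, not the original implementation.

```python
# Sketch of the sink-filling segmentation and a simplified rule-based classification.
import numpy as np
from scipy import ndimage
from skimage.morphology import reconstruction

def detect_buildings(dsm, fl_diff, cell=1.0, min_height=0.5, closing_px=2,
                     min_area=70.0, shape_index_range=(0.085, 0.7),
                     fl_diff_range=(1.9, 9.5)):
    """dsm: raster DSM [m]; fl_diff: first-minus-last echo difference raster [m]."""
    inv = -dsm                                   # invert the DSM: high objects become sinks
    seed = inv.copy()
    seed[1:-1, 1:-1] = inv.max()                 # border-fixed seed for filling sinks
    filled = reconstruction(seed, inv, method='erosion')   # greyscale reconstruction
    height = filled - inv                        # depth of the filled sinks = object height
    high = ndimage.binary_closing(height > min_height,
                                  structure=np.ones((2 * closing_px + 1,) * 2))
    labels, n = ndimage.label(high)              # candidate high-object segments
    buildings = np.zeros(dsm.shape, dtype=bool)
    for i in range(1, n + 1):
        seg = labels == i
        area = seg.sum() * cell ** 2
        if area < min_area:                      # area criterion removes noise segments
            continue
        boundary = seg.sum() - ndimage.binary_erosion(seg).sum()
        shape_index = (boundary * cell) / area   # perimeter per area [1/m]
        mean_fl = fl_diff[seg].mean()            # mean first-to-last echo difference
        if (shape_index_range[0] <= shape_index <= shape_index_range[1]
                and fl_diff_range[0] <= mean_fl <= fl_diff_range[1]):
            buildings |= seg                     # segment passes the simplified rule base
    return buildings
```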

2) Practical application of the method: The parameter selection runs via training areas, where correct house footprints are available, e.g. from another source or from manual digitization. Care has to be taken to select a typical area in order to get globally optimal results. The optimal parameters for the segmentation (height, morphological closing, minimum area) are found by an exhaustive search in the solution space. The range and step width of the segmentation parameters have to be supplied. The object features (e.g. shape index) used within the classification have to be defined by the user. The classification thresholds derived from the reference data set are adjusted iteratively until the best producer's accuracy and user's accuracy values (see also Sec. III-A) are reached. Weights are applied to the individual feature layers to derive a quality measure showing how well the individual segments fit the building class. Only segments with a sufficiently high quality measure are used for the building classification. Manual adjustments of the classification rule base enhance the classification results for the whole test site.

C. Building detection by Dempster-Shafer fusion of ALS data and multispectral images

This is a probabilistic method for building detection that combines a set of parameters derived from airborne laser scanner data and the Normalised Difference Vegetation Index (NDVI) generated from multispectral images. Dempster-Shafer fusion is favoured over Bayesian classification because it provides a framework to overcome incomplete knowledge about the distribution of the input data with respect to the classes to be discerned [24].

1) Method description: The minimum input to this method is a DSM generated from the last pulse laser points. Optionally, a DSM derived from the first pulse points and an NDVI image can be generated, too. The first processing stage is the generation of a coarse DTM, which can either be accomplished by hierarchical morphological filtering [25] or, in an offline process, by any other method, e.g. by hierarchic robust linear prediction [3]. The DTM is used to obtain the height differences between the DSM and the DTM, the first input parameter for building detection. Two parameters related to surface roughness, the height differences between the first and last pulse DSMs, and the NDVI are combined with it in a Dempster-Shafer fusion process to distinguish the four classes building (B), tree (T), grassland (G), and bare soil (S). In this fusion process, a simple yet realistic model for the a priori probability masses is used, where each classification cue is modelled to distinguish two complementary subsets of the set of classes Θ = {B, T, G, S}. This fusion process is carried out for each pixel of the DSM independently, and after some postprocessing to eliminate incorrectly classified isolated pixels, initial building regions are found as connected components of pixels classified as “buildings”.
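Because the per-pixel fusion itself is compact, a minimal sketch is given below. The ramp mass functions, their thresholds, and the assignment of cues to complementary subsets (elevation supporting {B, T}, large first-to-last differences supporting {T}, high NDVI supporting {T, G}) are illustrative assumptions for this sketch; the models actually used by the method are described in [24], [25].

```python
# Minimal per-pixel Dempster-Shafer fusion over Theta = {B, T, G, S} (illustrative models).
from itertools import product

THETA = frozenset('BTGS')

def ramp(x, lo, hi):
    """Piecewise-linear probability mass in [0, 1]."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def cue(subset, m):
    """Assign mass m to 'subset' and 1-m to its complement; clamp to avoid total conflict."""
    m = min(max(m, 0.01), 0.99)
    s = frozenset(subset)
    return {s: m, THETA - s: 1.0 - m}

def combine(m1, m2):
    """Dempster's rule of combination: conjunctive pooling with conflict renormalisation."""
    out, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            out[inter] = out.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    return {s: v / (1.0 - conflict) for s, v in out.items()}

def classify_pixel(ndsm_height, fl_diff, ndvi):
    """Fuse three cues and pick the class with the largest (pignistic-style) support."""
    m = cue('BT', ramp(ndsm_height, 1.0, 3.0))          # elevated -> building or tree
    m = combine(m, cue('T', ramp(fl_diff, 0.5, 3.0)))   # large first-last diff -> tree
    m = combine(m, cue('TG', ramp(ndvi, 0.2, 0.5)))     # high NDVI -> vegetation
    support = {c: 0.0 for c in THETA}
    for s, v in m.items():
        for c in s:
            support[c] += v / len(s)                    # split mass equally over members
    return max(support, key=support.get)

print(classify_pixel(ndsm_height=6.0, fl_diff=0.1, ndvi=0.05))   # -> 'B'
```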

A second fusion process is carried out for each of the individual building regions using another set of input parameters representing average values per building region, and finally the building outlines are found as the thinned-out boundary polygons of the building regions in the rasterised building label image. More details on the method can be found in [25].

2) Practical application: Two sets of control parameters have to be provided for the method. The first stage of processing, i.e. the generation of an approximate DTM, requires some thresholds that are actually used to detect large buildings and then exclude them from further processing. Some of these parameters are closely related to physical properties such as the minimum building height and area, but others (e.g. those related to surface roughness) have to be determined in a training phase or by analyzing intermediate results. For the second phase of the process, the actual Dempster-Shafer classification, the parameters for the models of the probability masses have to be set. Except for the NDVI related parameters, which require training areas, these parameters can be derived from estimates of the minimum building height in the scene and the approximate percentage of the scene covered by trees.

III. MEASURING DETECTION SUCCESS

The description of an algorithm for the detection of houses or buildings in urban areas is of little value if no assessment of its performance is given. This raises the question of the assessment method, and different aspects have to be considered:

• The definition of 'house' is not clear, depends on local administrative requirements, and is usually given in natural (ambiguous) language.
• The definition of 'house' usually does not refer to the view from an airborne perspective but to the object usage. Of two objects that are identical from above, only one may be considered a house due to specific regulations, whereas the other one may not.
• Reference data sets are often acquired by different methods. If data is acquired terrestrially (e.g. by total station), the positions of the walls are typically measured, whereas the aerial view includes the roof overhang.
• Systematic errors in the georeferencing, e.g. a shift in a coordinate direction, in either data set typically have no influence on the detection or measurement process, but very well on the comparison.
• Properties of the aerial data, e.g. point density, may prevent the successful detection of all buildings beforehand, e.g. due to size limitations.
• Depending on the application the building footprint is used for, different types of errors, e.g. commission or omission, may weigh differently.
• The data sets may be from different epochs.
• The granularity of the reference data may be different from that of the automatically extracted data. One example is that the reference data set may be organized according to street numbers, whereas the laser data set may be organized according to height jumps relative to the ground.

Assessing the difference between two data sets of building footprints is therefore an assessment of the georeferencing accuracy, of the appearance similarity in the data sets, of the appropriateness of the 'house' definition for the aerial viewpoint, of the suitability of the data set, of the change between the epochs, of the compatibility of the representation units in either data set, and eventually of the detection success. Only the detection success should be measured, whereas the other factors have to be taken care of by other means. It is therefore necessary to measure 'detection success' in a number of ways.

A. Low level comparison

The output of the detection process can either be a set of closed polygons or a raster map. The same holds for the reference data set. Each pixel or intersection polygon of the entire region can be assigned to one of four groups: both data sets indicate building, neither data set indicates building, or either one or the other data set indicates building. This low level (pixel) comparison method of assessment takes all the above factors into account. From a practical point of view it is useful because it gives a guideline on how much work is necessary to transform one data set into the other. As is standard, the producer's accuracy (PA) and user's accuracy (UA) will be used for comparing the classification to the reference data. The PA is the percentage of the reference class pixels that are correctly classified. A high PA means that most reference areas are covered by correctly classified areas, but the classification may still overestimate the class of interest. The UA is defined as the ratio of correctly classified areas to all classified areas within a class. A high UA means that the classified areas match the reference areas very well, but it provides no evidence about the completeness of the coverage (underestimation is possible). Therefore a good classification result should reach both a high PA and a high UA. These values are also termed completeness and correctness [26] and will also be used in the comparisons based on objects.
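For two binary building masks these pixel-level measures reduce to a few lines; the sketch below is a minimal illustration with assumed array names.

```python
# Pixel-level producer's and user's accuracy (completeness / correctness) of Sec. III-A.
import numpy as np

def pixel_pa_ua(classified, reference):
    """classified, reference: boolean rasters of equal shape (True = building)."""
    tp = np.logical_and(classified, reference).sum()   # building in both data sets
    pa = tp / reference.sum()     # producer's accuracy = completeness
    ua = tp / classified.sum()    # user's accuracy = correctness
    return pa, ua
```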

B. Comparison on the object level

An object is a connected set of pixels in the raster domain and a polygonal face in vector data. Comparing objects has the advantage that the unit of interest, e.g. a building, is analyzed and not pixels, which are arbitrary concerning their placement, size, etc. In pixel based comparisons the spatial connections are not investigated. The object comparison approach requires a method that declares two objects as identical. The location in the first place, but additionally size, shape, and orientation, are properties to consider. While a lower threshold value on the similarity measure may be obtained from the georeferencing precision and typical roof overhang sizes in the region, upper threshold values are typically more arbitrary. For comparing objects it is furthermore important to have a comparable level of detail in both data sets, not only with respect to a single object, but also concerning the total size of each object. Therefore, objects in the reference data set that cannot be detected in the extracted set should be removed beforehand. A typical example is an isolated wall in the reference data set which may be denoted as a building, whereas the laser scanning data set has a density too low for detecting a wall (Fig. 1). The object based comparison also accounts for small georeferencing errors and differences in the measurement method, i.e. rising wall vs. roof overhang. In a general setup, i.e. without specialization to airborne laser scanning data and cadastral maps, this subject is also investigated in [27], where especially the relations of the boundary polygons are analyzed.

Fig. 1. A reference data set is shown, where thin structures and areas smaller than 10 m² are removed (darker regions).

C. One implementation of an object level comparison

To make a comparison between reference and classified data on the object level, a spatial relation between corresponding objects must be established. In a first step a unique ID is assigned to all objects, which are converted to vector polygons in GRASS GIS. Then for each polygon the 'central point' [22] coordinates are calculated from the polygon outline vertices with a Python script. A positive result of a point in polygon test between the reference polygon and the central point of the classification polygon, and vice versa, marks the objects which are assumed to be mapped in both data sets. An alternative, though more complex, measure of object identity, based on the overlapping area, is given in [25]. Depending on the percentage of overlap, it is categorized into four classes (strong, partial, weak, none). Based on the simple spatial relation, i.e. identical or not, different types of error assessment can be calculated. First of all the PA and UA are calculated (cf. Sec. III-A) assuming that all objects with a successful point in polygon test are correctly classified. The size of the correctly classified areas is taken from the whole reference object for PA and from the whole classification object for UA, respectively. With this method the errors caused by different object representations in the data sets are suppressed. The second method calculates PA and UA not based on areas but on the number of objects with a positive point in polygon test. Using this method, every object contributes to the error assessment independently of its size, while the first method is weighted by the object area. The last method investigates the object representation and classification accuracy for every single object. If the point in polygon test was successful, the ID of the corresponding polygon is stored in the attribute table of the reference and classification GRASS vector layer, respectively. By intersecting both geometries, correctly classified, over- and underestimated areas (i.e., producer's and user's accuracy, resp.) are calculated for each object.
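A sketch of this mutual point-in-polygon test is given below, with shapely standing in for the GRASS vector tools; the 'central point' of [22] is replaced by shapely's representative_point, and the PA/UA computation shown is the count-based variant.

```python
# Object identity by mutual point-in-polygon test, and count-based PA/UA (Sec. III-C).
from shapely.geometry import Polygon

def match_objects(reference, classified):
    """reference, classified: lists of shapely Polygons.
    Returns (i, j) index pairs whose inner points lie inside each other."""
    pairs = []
    for i, ref in enumerate(reference):
        ref_pt = ref.representative_point()        # a point guaranteed to lie inside
        for j, cls in enumerate(classified):
            if ref.contains(cls.representative_point()) and cls.contains(ref_pt):
                pairs.append((i, j))
    return pairs

def object_pa_ua(reference, classified):
    """PA/UA on object counts: matched objects are treated as correctly detected."""
    pairs = match_objects(reference, classified)
    pa = len({i for i, _ in pairs}) / len(reference)
    ua = len({j for _, j in pairs}) / len(classified)
    return pa, ua

ref = [Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])]
cls = [Polygon([(1, 1), (9, 1), (9, 9), (1, 9)])]
print(match_objects(ref, cls), object_pa_ua(ref, cls))   # [(0, 0)] (1.0, 1.0)
```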

IV. RESULTS

The derived buildings will be compared with reference data sets and classification results from other methods. Differences between two data sets can be caused by i) the time of data acquisition, ii) the object representation in the different data sets, or iii) differences in the detection methods. This must be taken into account when analyzing the results of the error assessment.

A. Test area Haselgraben

1) Airborne laser scanning data: For the test site Haselgraben the ALS data were acquired for the Upper Austrian Federal Government. As the ALS flight campaign took place on March 24, 2003, leaf-off and snow free conditions prevailed for the whole area. The ALS data cover an area of about 70 km². The ALS campaign was carried out by the company TopScan, Germany, which employed a first- and last-echo Airborne Laser Terrain Mapper (ALTM) system from Optech Inc., Canada. The flying height above ground was about 1,000 m and the pulse repetition rate was 25 kHz. The average point density is 1 point per m². The maximum scan angle was ±20°. The beam divergence of the ALTM system results in a mean footprint size of 0.3 m in diameter. The whole investigation area is covered by 20 flight strips comprising 140 million first- and last-echo points. Within the test site the elevations range between 250 and 950 m.

2) Reference data: The reference building layer is based on a manual interpretation of aerial photos acquired in 1997. Furthermore, the digital cadastral map from the year 2005, provided by the Upper Austrian Federal Government, Spatial Planning Department, was integrated into this building layer. The local authorities estimated that 60% of all existing buildings are correctly included in this reference data, which can clearly be seen in Fig. 2a and b. For the comparison, small and thin objects like walls and small buildings are removed. For the comparison on the object level, inner polygons (islands) and connected polygons are dissolved in order to be able to perform the point in polygon test (Sec. III-C). The reference data set includes thin walls and very small buildings which are not represented in the 1 m resolution DTM raster. Therefore long, thin elements are removed by converting the reference data to a raster layer and applying morphological erosion in 3 iterations with a cell size of 0.25 m. Then all polygons from the original reference data set that overlay cells with a value are selected. Finally the polygons are dissolved and areas smaller than 10 m² are removed.

3) Classification with GRASS: To optimize the segmentation settings and the classification rule base, 77 buildings are digitized from the last pulse DSM.
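The segmentation parameters are then found by the exhaustive search described in Sec. II-B; a sketch is shown below with the tested ranges of Tab. I. The 'segment' callable stands for the sink-filling segmentation (e.g. a closure over the DSM), and scoring by the sum PA + UA is an assumption, since the text only states that the best PA and UA values are sought.

```python
# Exhaustive search over the segmentation parameters against the digitized training buildings.
from itertools import product
import numpy as np

def optimize_segmentation(segment, reference,
                          heights=np.arange(0.0, 3.6, 0.5),       # tested ranges, cf. Tab. I
                          closings=range(0, 4),
                          min_areas=np.arange(0.0, 101.0, 10.0)):
    """segment(min_height, closing_px, min_area) -> boolean candidate mask;
    reference: boolean raster of the digitized training buildings."""
    best_params, best_score = None, -1.0
    for h, c, a in product(heights, closings, min_areas):
        mask = segment(h, c, a)
        if mask.sum() == 0:
            continue                                  # nothing segmented, skip
        tp = np.logical_and(mask, reference).sum()
        pa = tp / reference.sum()                     # completeness on the training area
        ua = tp / mask.sum()                          # correctness on the training area
        if pa + ua > best_score:
            best_params, best_score = (h, c, a), pa + ua
    return best_params, best_score
```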

TABLE I
TESTED AND USED PARAMETERS FOR SEGMENTATION AND CLASSIFICATION FOR BUILDING DETECTION IN GRASS.

Segmentation    Parameter     Tested range / Steps    Used value
                height        0–3.5 / 0.5 m           0.5 m
                opening       0–3 / 1 pixel           2 pixel
                area          0–100.0 / 10.0 m²       70.0 m²

Classification  Feature       Range from reference    Adjusted range
                shape index   0.14–1.23 m⁻¹           0.085–0.7 m⁻¹
                fplp diff     -0.17–4.00 m            1.9–9.5 m
                area          79.0–1462 m²            70.0–18000.0 m²

The optimization of the segmentation settings is done for the parameters height from 0.0–3.5 m, minimum segment area from 0.0–100 m², and the radius for morphological closing from 0–3 pixels. The features used are (i) the segment area, (ii) the shape index (perimeter per area) and (iii) the mean first-last pulse difference. The value ranges used in the classification rule base were first derived from the reference data set, then iteratively adjusted for the training areas, and finally adjusted again. A weight factor of two was applied to the features. In the final classification only segments with a quality measure of 2 to 4 were used to classify buildings, while segments with a quality measure of 1 were excluded. Tab. I gives an overview of the parameter settings.

4) Classification with Dempster-Shafer: The region where buildings should be detected was restricted to a manually digitized polygon loosely following the outline of the residential area. The parameters were set on the basis of the expected building height (etc.) as specified above. Computation was performed in tiles of 2.5×2.5 km² augmented by an overlap. Fine tuning of the parameters is feasible because the processing of a 7 km² tile is completed on a current PC within five minutes; this resulted in three different parameter sets. Each tile was processed with all parameter sets and the optimal result per tile was chosen by visual analysis. NDVI data was not available.

5) Evaluation: The results of the building footprint extraction are shown in Fig. 2 and, for a selected area, in Fig. 3d and e. As the reference data was not very reliable, a typical area comprising 77 houses was digitized manually in the shaded relief view (Fig. 3a and b). This process took approximately one hour. Fig. 2c shows the results of the GRASS GIS approach, which were derived without manual editing. The results of the Dempster-Shafer extraction are shown in Fig. 2d. At first glance it becomes apparent that the digital cadastral data is not a complete record of the buildings. Both methods for automatic detection find many obvious buildings in the south western part of the area where the cadastral map shows no houses. A comparison on the pixel level can be performed between the cadastral data and the two laser scanning based results, but also between one laser scanning based result and the other. Tab. II gives the PA and UA on the pixel level. The low UA with respect to the cadastral reference shows the high number of houses missing in the cadaster. Comparing the two automatic approaches to each other shows that Dempster-Shafer finds 15% more building pixels than the method with GRASS tools.

The two laser scanning based methods provide similar results, whereas the comparison to the cadastral map gives poor values. The Dempster-Shafer results fit the cadastral data set slightly better, especially concerning the producer's accuracy. Therefore, in a small area the results were additionally compared to a digitization of the house outlines in the shaded relief view (last column in Tab. II), which was also used to determine the parameters of the GRASS method. Accuracies around 75% can be achieved.

The above analysis gives an overall view, but it does not tell whether the results hold for each house, i.e. whether each house is approx. 75% correct, or the other extreme, namely that 75% of the houses are 100% correct. This can be concluded from the object based comparison. The values in Tab. III give the correctness of house detection as such, i.e. not considering the footprint geometry. With respect to the manual digitization the GRASS based method detects 83% of the houses correctly, whereas on the pixel level it was only 73%. However, 30% of the detected objects are not houses. The Dempster-Shafer method finds 63% of the houses. This low value may be caused by the aggregation of multiple houses into one object, which would trigger a failure in the object identity test. On the other hand, only 7% of the detected objects are not houses. Weighting the results by building area tends to improve them, which shows that larger buildings are detected more reliably.

For regional applications the total derived building area is also relevant. The results are very similar for the small area and the entire area. In the test region of altogether 186,100 m², the cadaster shows 20,500 m² of built-up area, the manual digitization 26,300 m², and the two automatic approaches deliver 22,600 m² (GRASS) and 30,100 m² (Dempster-Shafer). Assuming that the manual digitization is error free, this is an under- and overestimation of 14% and 15%, respectively.

As mentioned above, PA and UA can also be computed on an object basis. The 529 building objects from the cadaster were compared to the objects derived with the GRASS tools. The objects were grouped into class A: (PA ≥ 70%) ∧ (UA ≥ 70%), class B: (PA ≥ 20%) ∧ (UA ≥ 20%) and not in class A, and the rest into class C. In class A the objects are, more or less, correctly detected; differences like the roof overhang have to be considered here, too. This applies to 50% of the buildings in the cadaster. In class B the objects are correctly found, but there is a considerable difference between the cadaster and the automatically derived result. This can be caused by a change of the house or by a failure in the automatic detection. Houses aggregated in the automatic derivation but separated in the cadaster are also likely to fall into this class. It applies to 43% of the cadaster buildings. The remaining 7% of the buildings, in class C, were not found automatically, did not show up in the identity test, or were broken up.
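The grouping into classes A, B, and C can be written down directly; in the sketch below, pa_ua is assumed to map each cadastral object ID to its (PA, UA) pair in percent.

```python
# Per-object grouping into classes A, B and C by PA/UA thresholds.
def group_objects(pa_ua):
    groups = {'A': [], 'B': [], 'C': []}
    for obj_id, (pa, ua) in pa_ua.items():
        if pa >= 70 and ua >= 70:
            groups['A'].append(obj_id)   # essentially correctly detected
        elif pa >= 20 and ua >= 20:
            groups['B'].append(obj_id)   # found, but with considerable differences
        else:
            groups['C'].append(obj_id)   # missed, broken up, or failed the identity test
    return groups

print(group_objects({1: (85, 90), 2: (35, 60), 3: (5, 10)}))
# {'A': [1], 'B': [2], 'C': [3]}
```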

Fig. 2. Subfigure (a) shows the shaded relief view of the DSM from the airborne laser scanning point cloud, (b) shows the digital cadastral map, (c) shows the extraction with GRASS tools, and (d) shows the extraction based on Dempster-Shafer.

(Annotation labels overlaid on Fig. 3: similar error, missing buildings detected, missing buildings, aggregation, false detection, under- and overestimation.)

Fig. 3. Subfigure (a) shows the shaded relief view of the DSM from the airborne laser scanning point cloud, (b) shows the digitized roof facets from the last pulse DSM, (c) shows the digital cadastral map, (d) shows the extraction with GRASS tools, and (e) shows the extraction based on Dempster-Shafer.

TABLE II
PA AND UA ON PIXELS FOR THE BUILDING DETECTION IN GRASS AND BASED ON DEMPSTER-SHAFER WITH RESPECT TO THE DIGITAL CADASTER, THE OTHER METHOD, AND A MANUAL DIGITIZATION IN THE ALS DATA SET.

Reference →        cadastral map   GRASS,          Dempster-Shafer,   hill shading
Classification ↓                   entire region   entire region      digitization
GRASS              84.3/15.6       –               74.1/65.6          72.9/84.9
Dempster-Shafer    87.9/18.4       65.6/74.1       –                  83.6/72.8

TABLE III
PA AND UA FOR BUILDING DETECTION. UPPER ROWS REFER TO THE NUMBER OF OBJECTS, LOWER ROWS ARE WEIGHTED WITH THE BUILDING SIZE.

Reference →                                  cadastral map   GRASS,          Dempster-Shafer,   hill shading
Classification ↓                                             entire region   entire region      digitization
GRASS (number of houses)                     51.9/16.1       –               70.7/33.9          82.9/70.0
Dempster-Shafer (number of houses)           45.7/31.2       33.7/70.7       –                  63.2/92.6
GRASS (area of correct houses)               71.0/23.4       –               80.9/67.2          80.4/86.4
Dempster-Shafer (area of correct houses)     64.4/29.0       67.2/80.9       –                  68.3/97.4

B. Test area Hohenems

The second test site will be treated in less detail; especially aspects differing from the previous results will be presented. Parameters for both methods were determined as before (see Fig. 5). The NDVI was computed from a color and near infrared ortho photo and used in the Dempster-Shafer approach. No manual delineation of the residential area was performed. The training area for the GRASS approach was also digitized in the ortho photo.

1) Airborne laser scanning and reference data: The test site Hohenems is located in Vorarlberg, Austria. It covers an area of 10.4 km² containing different land cover types such as industrial and residential areas, roads and railways, forests and agricultural land. The data was collected for the Federal Survey Institute Feldkirch in November 2003 with an ALTM 2050 by the company TopScan GmbH. The average point density within the test site is 16 points/m² and it is covered by 17 flight strips. The cadastral data was acquired with terrestrial methods and is kept regularly up-to-date. The differences between the automatically derived building footprints and the cadastral reference are therefore expected to be much smaller than for the area Haselgraben.

2) Evaluation: On the pixel level both methods show a PA of 85% with respect to the cadastral map, similar to the values for the test site Haselgraben. The UA for the GRASS approach is 73% and somewhat lower, 63%, for the Dempster-Shafer approach. This shows that the cadaster is indeed in better “shape” for this test site. Misclassifications in the hilly forested region in the south east cause the reduced UA of the Dempster-Shafer approach. Concerning the number of detected houses, the PA is lower by at least 10% for both approaches, but weighting these values with the area of the detected objects restores the original PA values. Again, this confirms that larger houses are detected with greater reliability. The total built-up area in the reference data set is approx. 750,000 m², which is overestimated by 17% with the GRASS approach; in the previous example it showed an underestimation. For the Dempster-Shafer approach the results are consistent. For the results of the GRASS approach the individual PA and UA were computed on an object basis. Fig. 4 shows the frequency distribution of (PA+UA)/2, which is very similar to the distributions of PA and of UA. Of 505 objects, 245 have an average of PA and UA between 80 and 90%. This means that if a house is detected, the chances that it is almost entirely “correct” are larger than 50%. There was no strong correlation with object size, indicating that the outlines of larger houses are not necessarily detected with higher precision.

Fig. 4. Frequency of (PA+UA)/2 for individual objects detected with the GRASS approach. The first bin includes the values between 0 and 10%, etc.

V. CONCLUSIONS

The paper demonstrated the use of airborne laser scanning data, in the form of rasterized point clouds, for the derivation of building footprints in built-up areas. Also in highly developed regions, existing cadastral data is often outdated or may not be sufficient for other reasons (e.g. the legal definition of a house in the cadaster vs. the need to map all buildings). Compared to other results published in the literature (Sec. II-A), the results are less optimistic regarding the fully automatic derivation of a building map. This is attributed to two reasons. Firstly, the data sets cover an extended area which is larger than the areas used in other studies. Additionally, a considerable part of the area is not covered by buildings but shows undulating terrain with different degrees of vegetation density. Secondly, the main focus of this paper was not to advocate a specific algorithm, but to shed more light on the problems associated with assessing the quality of building detection.

Quality parameters can be determined by a simple comparison on the pixel level between reference data and automatically generated classification results. However, as demonstrated in the paper, this falls short for a number of reasons. Cadastral maps are typically acquired by terrestrial methods, measuring the rising walls, whereas airborne methods measure the eaves, causing a difference because of the roof overhang. A pixel based comparison is, furthermore, ignorant of georeferencing errors between the reference data and a classification automatically derived from another source. Errors cannot be localized, and, finally, the different granularity and detectability of objects should be considered. Therefore, the evaluation should be made on objects, i.e. detected buildings, and not on the pixel level. This has been demonstrated, and two algorithms for building detection were compared. It was shown that the algorithms behave differently concerning the over- and underestimation of building footprints. Additionally, detection works better for larger houses; more details are given above. The difference in point density (a factor of ten) between the two test sites did not have a strong influence on the classification. The producer's and user's accuracy for each detected object allow the errors of a specific algorithm to be investigated if reference data is available; these values point directly to the problematic objects. For making a comparison on the object level, a method for declaring two objects identical is necessary. A number of measures were proposed, but only a comparatively simple method, based on the mutual inclusion of an inner polygon point, was used for the presented study. The possibilities should be investigated further, and overlap areas as well as shape similarity should be considered.

As Figs. 3 (including annotations) and 5 demonstrate, a fully automatic derivation of building footprints has not been achieved yet. However, accuracy in the order of 80% appears realistic. The test is not comprehensive in that only two areas were investigated and only two algorithms were compared.

Fig. 5. Subfigure (a) shows the shaded relief view of the DSM from the airborne laser scanning point cloud over Hohenems, (b) shows the digital cadastral map, (c) shows the extraction with GRASS tools, and (d) shows the extraction based on Dempster-Shafer.

Including more city types (e.g. those available from EuroSDR data sets) and testing more algorithms would be a first step towards an improved judgement of the performance of various building extraction algorithms. Assessment should then be performed with object based methods as presented in this contribution.

REFERENCES

[1] A. Wehr and U. Lohr, “Airborne laser scanning - an introduction and overview,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 54, 1999.
[2] W. Förstner, Computer and Robot Vision. Addison-Wesley Longman Publishing Co., Inc., 1993, vol. 2, ch. Image Matching.
[3] K. Kraus and N. Pfeifer, “Determination of terrain models in wooded areas with airborne laser scanner data,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 53, 1998.
[4] G. Sithole and G. Vosselman, “Comparison of filter algorithms,” in International Archives of Photogrammetry and RS, 34/3-W13, Dresden, Germany, 2003.
[5] W. Förstner, “3d-city models: Automatic and semiautomatic acquisition methods,” in Photogrammetric Week 99, D. Fritsch and R. Spiller, Eds. Wichmann Verlag, Heidelberg, 1999.
[6] C. Brenner, “Interactive modelling tools for 3d building reconstruction,” in Photogrammetric Week 99, D. Fritsch and R. Spiller, Eds. Wichmann Verlag, Heidelberg, 1999.
[7] F. Rottensteiner and M. Schulze, “Performance evaluation of a system for semi-automatic building extraction using adaptable primitives,” in International Archives of Photogrammetry and RS, 34/3-W8, Munich, Germany, 2003.
[8] J. Hyyppä, H. Hyyppä, P. Litkey, X. Yu, H. Haggren, P. Roennholm, U. Pyysalo, J. Pitkaenen, and M. Maltamo, “Algorithms and methods of airborne laser-scanning for forest measurements,” in International Archives of Photogrammetry and RS, 36/8-W2, Freiburg, Germany, 2004.
[9] H. Kaartinen, J. Hyyppä, E. Gülch, G. Vosselman, H. Hyyppä, L. Matikainen, A. Hofmann, U. Mäder, Å. Persson, U. Söderman, M. Elmqvist, A. Ruiz, M. Dragoja, D. Flamanc, G. Maillet, T. Kersten, J. Carl, R. Hau, E. Wild, L. Frederiksen, J. Holmgaard, and K. Vester, “Accuracy of 3d city models: EuroSDR comparison,” in International Archives of Photogrammetry and RS, 36/3-W19, Enschede, The Netherlands, 2005.
[10] C. Hug and A. Wehr, “Detecting and identifying topographic objects in imaging laser altimetry data,” in International Archives of Photogrammetry and RS, 32/3-4W2, 1997.
[11] S. Oude Elberink and H.-G. Maas, “The use of height texture measures for the segmentation of airborne laser scanner data,” in International Archives of Photogrammetry and RS, 33/3A, Amsterdam, The Netherlands, 2000.
[12] L. Matikainen, J. Hyyppä, and H. Hyyppä, “Automatic detection of buildings from laser scanner data for map updating,” in International Archives of Photogrammetry and RS, 34/3-W13, 2001.
[13] G. Forlani and C. Nardinocchi, “Building detection and roof extraction in laser scanning data,” in International Archives of Photogrammetry and RS, 34, 2001.
[14] Q. Zhan, M. Molenaar, and K. Tempfli, “Building extraction from laser data by reasoning on image segments in elevation slices,” in International Archives of Photogrammetry and RS, 35, 2002.
[15] M. Morgan and A. Habib, “Interpolation of Lidar data and automatic building extraction,” in ASPRS Annual Conference Proceedings, 2002.
[16] A. Alharthy and J. Bethel, “Heuristic filtering and 3d feature extraction from Lidar data,” in International Archives of Photogrammetry and Remote Sensing, 34/3A, Graz, Austria, 2002.
[17] D. Tóvári and T. Vögtle, “Object classification in laserscanning data,” in International Archives of Photogrammetry and RS, 36/8-W2, 2004.
[18] F. Tarsha-Kurdi, T. Landes, P. Grussenmeyer, and E. Smigiel, “New approach for automatic detection of buildings in airborne laser scanner data using first echo only,” in International Archives of Photogrammetry and RS, 36/3, Bonn, Germany, 2006.
[19] H. Gross, U. Thoennessen, and W. v. Hansen, “3d-modeling of urban structures,” in International Archives of Photogrammetry and RS, 36/3-W24, Vienna, Austria, 2005.
[20] A. Brunn and U. Weidner, “Hierarchical bayesian nets for building extraction using dense digital surface models,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 53, 1998.
[21] M. Rutzinger, B. Höfle, T. Geist, and J. Stötter, “Object-based building detection based on airborne laser scanning data within GRASS GIS environment,” in Proceedings of UDMS 2006, 25th Urban Data Management Symposium, Fendel and Rumor, Eds., Aalborg, Denmark, 2006.
[22] GRASS Development Team, “Geographic Resources Analysis Support System (GRASS) software,” ITC-irst, Trento, Italy, 2006. [Online]. Available: http://grass.itc.it
[23] B. Höfle, M. Rutzinger, T. Geist, and J. Stötter, “Using airborne laser scanning data in urban data management - set up of a flexible information system with open source components,” in Proceedings of UDMS 2006, 25th Urban Data Management Symposium, Fendel and Rumor, Eds., Aalborg, Denmark, 2006.
[24] L. Klein, Sensor and Data Fusion, Concepts and Applications, 2nd ed. SPIE Optical Engineering Press, 1999.
[25] F. Rottensteiner, J. Trinder, S. Clode, and K. Kubik, “Using the Dempster-Shafer method for the fusion of LIDAR data and multi-spectral images for building detection,” Information Fusion, vol. 6, no. 4, 2005.
[26] C. Heipke, H. Mayer, C. Wiedemann, and O. Jamet, “Evaluation of automatic road extraction,” in International Archives of Photogrammetry and Remote Sensing, vol. 32.
[27] L. Ragia and S. Winter, “Contributions to a quality description of areal objects in spatial data sets,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 55, 2000.