Polygon-based Approach for Extracting Multilane Roads from OpenStreetMap Urban Road Networks

Qiuping Liᵃ, Hongchao Fanᵇ,*, Xuechen Luanᶜ, Bisheng Yangᶜ, Lin Liuᵃ,ᵈ

ᵃ Center of Integrated Geographic Information Analysis, School of Geography and Planning, Sun Yat-sen University, Guangzhou, China
ᵇ Chair of GIScience, University of Heidelberg, Berlinerstr. 48, 69120, Heidelberg, Germany
ᶜ State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, P.R. China
ᵈ Department of Geography, University of Cincinnati, Cincinnati, OH, USA
ABSTRACT: This study proposes a novel approach for extracting multilane roads from urban road networks in OpenStreetMap (OSM) datasets as functional high-level roads, which allows comparative analyses of the differences between this functional hierarchy and other hierarchies. OSM road networks have high levels of detail and complex structures, but they also contain large numbers of duplicated lines for the same road features, which makes conventional methods based on the analysis of line segments difficult and inefficient to apply when extracting multilane roads. To overcome these deficiencies, a polygon-based method grounded in shape analysis and Gestalt theory is proposed, which treats the polygons enclosed by roads as the operating elements. First, shape descriptors are calculated for each polygon in the network. Second, a support vector machine (SVM) uses these shape descriptors to classify candidate multilane polygons, which serve as seeds. Finally, starting from the seed polygons, a region-growing method connects and fills the multilane features according to Gestalt theory. Experiments using OSM data from different urban networks verified the validity of the proposed method, which achieved effective extraction performance regardless of the complexity and duplication in the datasets. In addition, a comparative analysis with high-level roads extracted based on road type attributes and structural analysis was performed to demonstrate the differences between the constructed road levels and other hierarchies.

KEY WORDS: hierarchy, multilane road, OpenStreetMap, road network.
* Corresponding author. hongchao.fan@geog.uni-heidelberg.de
1. Introduction

In high level of detail (LoD) datasets, multilane roads normally represent the high-level roads constructed in urban road networks. These multilane roads have a high transportation capacity and form the major pattern of urban movement flows, which is crucial for many modern applications such as geographical analysis, traffic planning, location-based services (LBS), and emergency evacuation. Because roads are complex man-made objects, a road network can be viewed as a collection of typical structures and patterns, and in recent years road network generalization algorithms have therefore aimed to preserve these characteristic patterns. Owing to their crucial role in the functional hierarchy, which reflects their high transportation capacity within the overall road network, multilane roads make the main contribution to a city's road network patterns at the coarse level of the hierarchy, and they are treated as undeletable (or maintained) roads in generalization (Zhang, 2004; Heinzle et al., 2006; Heinzle and Anders, 2007; Luan and Yang, 2010). Based on this undeletable pattern, a collapsing method has been proposed to extract single centerlines from multilane roads (Thom, 2005). For road network hierarchy analysis, however, another hierarchy, called the structural hierarchy, has been researched widely in recent years; it is represented by connectivity measures of the entire urban road network, such as centrality measures (Freeman, 1977). Roads with very important connection roles are regarded as being at a high structural level (Jiang and Claramunt, 2004; Jiang and Harrie, 2004; Porta et al., 2006a, 2006b; Tomko et al., 2008; Jiang and Liu, 2009; Yang et al., 2011). For example, Tomko et al.
(2008) examined the hierarchical organization of urban street networks using betweenness centrality among regions, which proved to be a credible method for exploring the hierarchical characteristics of urban street networks. In this setting, a node is a stroke composed of continuous line segments, and an edge is a connection between strokes. The betweenness centrality of a node is a global measure of the mediating role of that stroke in the street network: it equals the number of shortest paths between all pairs of other nodes that pass through the node. In general, the structural hierarchy indicates whether a road should be important, whereas the functional hierarchy indicates whether it was built as an important road. Comparing these two hierarchies is therefore informative. The proportion of roads common to both hierarchies indicates how well the constructed road network fits its structure, while the differences between them can identify multilane roads that are not used frequently at present, as well as single-lane roads that should be widened in the future. To the best of our knowledge, such a comparison has not been reported previously. The detection of multilane roads is a prerequisite for this comparison. However, in some of the datasets used in our research, such as
OpenStreetMap (OSM), multilane attributes may not be stored explicitly for many regions. To derive multilane roads from the most detailed road networks, most current approaches use line-segment analysis, in which closely spaced lines with appropriate angles, lengths, and distances are selected and tracked through the road network (Zhang, 2009; Yang et al., 2011). The typical tracking method for multilane road extraction proposed by Yang et al. (2011) includes three main steps. First, closely spaced road lines are detected using a buffer-growing method. Second, these lines are paired by angle and distance as candidate multilane roads. Finally, the candidates are connected along smooth angles using a heuristic tracking approach that handles the branching problem. This approach is effective for detecting dual carriageways because it addresses specific difficulties, such as removing tracks that follow dual carriageways and the ramps used to access highways. In practice, however, these three steps are time consuming, especially the final tracking step, when data quality is low, as in volunteered geographic information (VGI) datasets that lack data quality control. First, road lines created by different volunteers may be drawn repeatedly on the same road feature. Because no data checking process exists, these duplicated lines coexist independently in the dataset, which increases both the difficulty of extracting multilane roads and the time required. Moreover, some road lines in non-professional OSM data contain errors such as tangles, broken roads, and singular angles. Owing to the high LoD, applying conventional 'cleaning' operations from GIS software to OSM data can disrupt many correct digitizations of the ground truth on a large scale, which may make the problem even more complex.
These issues make it difficult and inefficient to extract reasonable multilane roads from such messy linear data, so a different strategy is required to improve the efficiency of multilane road extraction. In this study, a novel approach is proposed for efficiently extracting multilane roads from OSM road networks in which the road type attributes are recorded only as primary, secondary, and tertiary roads, but not as multilane roads. Instead of using line analysis techniques (Yang et al., 2011), the proposed method extracts multilane roads based on polygon analysis. The concept rests on the fact that a city's road network can also be viewed as a set of closed polygons bounded by road line segments. Because multilane roads are often digitized as multiple parallel lanes, they form long and thin polygons that can be clearly distinguished from block polygons. In our approach, these long and thin polygons are detected and used as seeds to extract multilane roads from urban road networks. First, our approach planarizes the road network and builds the polygons from it. Next, it extracts multilane polygons based on their shape and topological characteristics in two key steps. 1) Shape-based multilane polygon extraction: candidate multilane polygons are classified as seeds using a support vector machine (SVM), with shape descriptors
used as input vectors. 2) Topology-based multilane polygon extraction: starting from the seed polygons, road features are connected and filled according to the Gestalt principles by a region-growing algorithm that uses geometric and topological information, and falsely extracted regions are eliminated.

The remainder of this paper is structured as follows. Section 2 introduces the polygon shape descriptors and the detection of multilane polygon seeds by SVM classification. Section 3 describes the spatial multilane polygon extraction technique based on a region-growing process. In Section 4, the feasibility and validity of our approach are evaluated and compared with road type and structural hierarchies for three case networks based on OSM data. Our conclusions are given in Section 5.

2. Shape-based multilane polygon extraction

If the overall road network is viewed as a set of closed polygons bounded by road line segments, then multilane roads comprise a number of long and thin polygons, because they are often digitized as multiple parallel lanes. In our approach, these long and thin polygons are detected and used as seeds to extract multilane roads from urban road networks. For this purpose, the shapes of the polygons are analyzed first. In general, polygon shape can be described using basic measures such as area, eccentricity, Euler number, convexity, compactness, and aspect ratio (Sonka et al., 1993), or more accurately by ellipticity, rectangularity, and triangularity, as suggested by Rosin (2003). These measures have been shown to be effective for describing and classifying basic shapes. To discriminate multilane polygons in road networks, however, other shapes need to be identified, namely long and narrow polygons and small polygons interlocked with long and narrow polygons (Figure 1) (Touya, 2010).
Fig. 1 Categories of potential multilane polygons: (1) long-straight, (2) long-curving, (3) grained
These polygons are more complex than basic shapes and are not combinations of them. Thus, additional shape descriptors are needed to discriminate these shapes. In our approach, five descriptors are proposed for these long and narrow
polygons; they are calculated in the first step. Next, the SVM classification algorithm analyzes all of the descriptors jointly. If sufficiently varied descriptors are available, combining them should facilitate shape classification and discrimination during multilane polygon extraction.

2.1 Calculation of polygon shape descriptors

To describe the shapes of polygons in a road network, five descriptors, namely area, perimeter, compactness, parallelism, and width, are defined as follows.

Area, perimeter, and compactness

The relationship among these three descriptors is defined as follows.

$$\mathrm{compactness} = \frac{4\pi \times \mathrm{area}(\mathrm{polygon})}{\mathrm{perimeter}^{2}(\mathrm{polygon})} \tag{1}$$
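Equation (1) can be evaluated directly from a polygon's vertex list. The sketch below is a minimal illustration, assuming each polygon is given as a list of (x, y) vertices in a projected (metric) coordinate system, without repeating the first vertex at the end; area is computed with the shoelace formula and perimeter by summing edge lengths. The function names are illustrative and are not taken from the authors' implementation.

```python
import math


def polygon_area(coords):
    """Shoelace formula for a simple polygon.

    Assumption: coords is a list of (x, y) vertices in a metric CRS,
    without the first vertex repeated at the end.
    """
    n = len(coords)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(coords):
    """Sum of the lengths of all boundary edges, including the closing edge."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))


def compactness(coords):
    """Eq. (1): 4*pi*area / perimeter**2, equal to 1 for a circle."""
    p = polygon_perimeter(coords)
    return 4.0 * math.pi * polygon_area(coords) / (p * p)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
strip = [(0, 0), (100, 0), (100, 5), (0, 5)]  # long, thin, lane-like polygon
print(round(compactness(square), 3))  # 0.785 (= pi / 4)
print(round(compactness(strip), 3))   # 0.142
```

The 100 m × 5 m strip scores far lower than the square, which matches the observation that multilane polygons tend to have small compactness values.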
The value of compactness ranges from 0 to 1, where a circle has the highest compactness value of 1; the more complex the outline of a polygon, the smaller its compactness. Compared with other polygons (e.g., blocks), most multilane polygons have small areas, perimeters, and compactness values.

Parallelism of polygons

In addition to their small areas, perimeters, and compactness, most multilane polygons have two primary parallel road lines, and when viewed at an urban scale most of these parallel lines are straight. Thus, a parallelism descriptor for straight multilane polygons is proposed in this study. It is defined as follows.

$$\mathrm{parallelism}(\mathrm{polygon}) = \frac{\sum_{i=1}^{n} \mathrm{length}(\mathrm{line}_i) \cdot x_i}{\mathrm{perimeter}(\mathrm{polygon})} \tag{2}$$
Suppose that n line segments make up the boundary of the polygon. If the angle between $\mathrm{line}_i$ and the polygon's primary direction is less than a threshold, set empirically to 30 degrees in our approach, then $x_i$ is 1; otherwise, $x_i$ is 0. The polygon's primary direction is defined as the direction of the long edge of the polygon's minimum bounding rectangle (MBR); if the MBR is a square, either direction can be regarded as the primary direction. The parallelism descriptor is therefore the ratio of the total length of the boundary lines aligned with the polygon's primary direction to the polygon's perimeter. Its value also ranges from 0 to 1, and it tends to be larger for multilane polygons with straight shapes.

Width of polygons

Another important characteristic of multilane polygons is their narrow shape. Thus, the width descriptor is defined as the ratio of a polygon's area to the length of its central line, calculated as follows.

$$\mathrm{Width}(\mathrm{polygon}) = \frac{\mathrm{area}(\mathrm{polygon})}{\mathrm{length}(\mathrm{centerline})} \tag{3}$$
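Equations (2) and (3) can be sketched as follows, again assuming a vertex list in metric coordinates. To obtain the MBR primary direction, the sketch relies on the standard property that one side of a minimum-area enclosing rectangle is collinear with an edge of the convex hull, so it simply tests every hull-edge orientation; this search strategy is our choice and not necessarily the authors'. The centerline length needed for Eq. (3) is passed in as a number because its extraction is described separately. All function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mbr_primary_direction(coords):
    """Angle in radians, in [0, pi), of the long side of the minimum-area MBR."""
    hull = convex_hull(coords)
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Extent of the hull along the edge direction (u) and its normal (v)
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        du, dv = max(us) - min(us), max(vs) - min(vs)
        if best is None or du * dv < best[0]:
            primary = theta if du >= dv else theta + math.pi / 2
            best = (du * dv, primary % math.pi)
    return best[1]


def acute_angle(a, b):
    """Acute angle in radians between two undirected directions."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def parallelism(coords, threshold_deg=30.0):
    """Eq. (2): share of the perimeter formed by boundary segments whose
    direction is within threshold_deg of the MBR primary direction."""
    primary = mbr_primary_direction(coords)
    n = len(coords)
    aligned = total = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = coords[i], coords[(i + 1) % n]
        seg_len = math.hypot(x2 - x1, y2 - y1)
        total += seg_len
        if acute_angle(math.atan2(y2 - y1, x2 - x1), primary) < math.radians(threshold_deg):
            aligned += seg_len  # x_i = 1 for this segment
    return aligned / total


def width(area, centerline_length):
    """Eq. (3): polygon area divided by the length of its central line."""
    return area / centerline_length


strip = [(0, 0), (100, 0), (100, 5), (0, 5)]
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(round(parallelism(strip), 3))  # 0.952 (200 / 210)
print(width(500.0, 100.0))           # 5.0
```

For the strip, only the two 100 m sides are aligned with the primary direction, so parallelism is 200/210; a square scores 0.5 because only two of its four sides can be aligned.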
The central line of a polygon is extracted from the Delaunay triangles within the polygon. To achieve this, a constrained Delaunay triangulation is computed within each polygon, which forces the boundary of the polygon into the triangulation. The triangulation is performed using the open-source polygon triangulation code HGRD (http://www.codeproject.com/Articles/8160/Polygon-Triangulation). The central line can then be extracted from the triangles inside the polygon (Ai and Guo, 2000). As shown in Figure 2, the triangles (drawn with bold lines) are classified into three types according to the number of adjacent triangles, and central-line segments are created for each type. Finally, the longest route through all of the central-line segments is selected as the central line of the polygon, as shown in Figure 3.
Figure 2 Triangle categories (triangle-I, triangle-II, and triangle-III) and centerline segment extraction according to the number of adjacent triangles. For triangle-I, the central-line segment is drawn from the midpoint P1 of its only edge shared with another triangle to the opposite vertex A; for triangle-II, it is drawn between the midpoints P1 and P2 of its two shared edges; and for triangle-III, segments are drawn from the centroid O of the triangle to the midpoints P1, P2, and P3 of its three edges (Ai and Guo, 2000).
Figure 3 Example of centerline generation. Dark red segments show the triangles inside the polygon. Red and green segments are the central-line segments of each triangle, and the red route is the final central line of the polygon.
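The triangle rules of Figure 2 and the longest-route selection can be sketched as follows. This is a minimal illustration, assuming the triangulation is already available as vertex-index triples (the paper uses HGRD to compute a constrained Delaunay triangulation, which is not reproduced here) and that the polygon has no holes and at least two triangles. Because the dual graph of a triangulated simple polygon is a tree, the longest route can be found with two farthest-node searches. The function name and input format are hypothetical.

```python
import math
from collections import defaultdict


def centerline_from_triangles(points, triangles):
    """Return (route, length) of the longest central-line route.

    points: list of (x, y); triangles: list of (i, j, k) index triples that
    triangulate the polygon interior (assumed given, e.g. by a constrained
    Delaunay triangulation). Assumes a hole-free polygon with >= 2 triangles.
    """
    edge_tris = defaultdict(list)  # undirected edge -> triangles using it
    for t, (a, b, c) in enumerate(triangles):
        for e in ((a, b), (b, c), (c, a)):
            edge_tris[frozenset(e)].append(t)

    def midpoint(edge):
        i, j = tuple(edge)
        return ((points[i][0] + points[j][0]) / 2, (points[i][1] + points[j][1]) / 2)

    graph = defaultdict(dict)  # node (x, y) -> {neighbour: segment length}

    def add_segment(p, q):
        graph[p][q] = graph[q][p] = math.dist(p, q)

    for tri in triangles:
        edges = [frozenset(e) for e in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0]))]
        shared = [e for e in edges if len(edge_tris[e]) == 2]
        if len(shared) == 1:    # triangle-I: shared-edge midpoint -> opposite vertex
            (opposite,) = set(tri) - shared[0]
            add_segment(midpoint(shared[0]), points[opposite])
        elif len(shared) == 2:  # triangle-II: midpoint -> midpoint
            add_segment(midpoint(shared[0]), midpoint(shared[1]))
        elif len(shared) == 3:  # triangle-III: centroid -> every edge midpoint
            centroid = tuple(sum(points[v][k] for v in tri) / 3 for k in (0, 1))
            for e in shared:
                add_segment(centroid, midpoint(e))

    def farthest(start):
        # The segment graph is a tree, so each node is reached by a unique path
        dist, prev, stack = {start: 0.0}, {start: None}, [start]
        while stack:
            u = stack.pop()
            for v, w in graph[u].items():
                if v not in dist:
                    dist[v], prev[v] = dist[u] + w, u
                    stack.append(v)
        end = max(dist, key=dist.get)
        return end, dist[end], prev

    a, _, _ = farthest(next(iter(graph)))
    b, length, prev = farthest(a)  # second sweep gives the longest route
    route = [b]
    while prev[route[-1]] is not None:
        route.append(prev[route[-1]])
    return route, length


# A 10 m x 2 m strip split into four triangles
pts = [(0, 0), (5, 0), (10, 0), (10, 2), (5, 2), (0, 2)]
tris = [(0, 1, 5), (1, 4, 5), (1, 2, 4), (2, 3, 4)]
route, length = centerline_from_triangles(pts, tris)
print(round(length, 3))  # 10.385
```

Dividing the strip's area (20 m²) by this length gives a width of about 1.93 m, close to the true 2 m; the small overestimate of the length comes from the two end segments running to the corner vertices of triangle-I.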
2.2 Multilane road classification with the SVM

The five shape descriptors proposed in Section 2.1 can be used to discriminate multilane polygons. As mentioned above, each descriptor captures only one characteristic of a multilane polygon. Setting a threshold for each descriptor is also difficult because of the complex urban structure. In some cases, the thresholds cannot be predefined uniformly because the threshold of one descriptor
may vary with the values of the other descriptors. Thus, multilane polygons need to be classified automatically without setting thresholds manually, and in our approach an SVM is used for this purpose. Instead of setting a threshold for each descriptor, some multilane polygons are selected manually as the initial training data. Using these selected multilane polygons, the SVM package libsvm (www.csie.ntu.edu.tw/~cjlin/) generates a classification model that extracts potential multilane polygons from all polygons in the overall road network. SVMs are among the most useful two-class classification approaches; they find a hyperplane that separates most of the positive examples from the negative examples and are used widely in pattern recognition (Boser et al., 1992; Cortes and Vapnik, 1995). In the present study, the input to the SVM is a 5-dimensional vector comprising all of the polygon shape descriptors. The SVM classification process using libsvm includes three main steps. 1) Shape descriptor scaling. It is important to scale all of the shape descriptors before applying the SVM, to prevent descriptors with large ranges from dominating those with small ranges. In our method, the area, perimeter, and width values, which have large ranges, are linearly scaled to [0, 1]. The same scaling must be applied to both the training and testing data, with the upper and lower scaling limits derived from all of the polygons in the entire road network rather than from the training area alone. 2) Cross-validation of the SVM parameters. Some SVM parameters need to be determined by cross-validation so that the classifier is as accurate as possible. The common v-fold cross-validation strategy divides the training dataset into v subsets of equal size; one subset is treated as unknown and tested using a classifier trained on the remaining v − 1 subsets.
Each subset is predicted once in this way, and the cross-validation accuracy is the percentage of the data that are classified correctly. This procedure also helps to prevent overfitting. Using cross-validation, various combinations of parameter values are tested, and the combination with the best cross-validation accuracy is selected. The search can be parallelized easily because each combination of parameters is evaluated independently. This step is performed with the grid-search tool in libsvm. 3) SVM model generation and test data classification. After the best parameters are found, the entire training set is trained again to generate the final classification model. Using this model, every polygon in the urban network is assessed with its scaled geometric descriptors to determine whether it is a multilane polygon.

3. Complementation of spatial multilane polygons based on Gestalt theory

The SVM classification can identify typical multilane polygons. However, this strategy considers only the shape and not the spatial relationships among polygons,
which means that some multilane polygons will be missed and the road features will be incomplete, particularly given the uncontrolled data quality of VGI. Thus, the polygons discriminated by the SVM need to be complemented by connecting and filling operations based on their spatial relationships. In the proposed method, region-growing and gap-closing strategies based on Gestalt theory are applied, with the polygons detected by the SVM as seeds, to complete the multilane roads. Gestalt theory focuses on the idea of “grouping” in human cognition, that is, on the self-regulating adjustments used to structure or interpret a visual field (Wertheimer, 1923). The primary factors that determine grouping are: (1) proximity, where elements tend to be grouped together according to their nearness; (2) similarity, where items that are similar in some respect tend to be grouped together; (3) closure, where items are grouped together if they tend to complete some entity; and (4) simplicity, where items are organized into simple figures according to their symmetry, regularity, and smoothness. Based on these four factors, the complementation process is divided into two steps, polygon connecting and gap closing, which consider the shape similarity between adjacent polygons before complementation and the shape after complementation. The polygon connection process uses proximity and similarity to group multilane polygons together: the multilane polygons are used as seeds and are grouped with adjacent polygons by comparing their spatial similarity. Next, other polygons that are surrounded by the detected multilane polygons are regarded as holes inside the multilane polygons and are filled according to the principles of closure and simplicity, making the shapes of the multilane polygons complete and simple. These two complementation processes are described as follows.
3.1 Polygon connecting with an iterative region-growing strategy

Using the polygons detected by the SVM as seeds, the proposed region-growing connection process iteratively compares the spatial similarity of adjacent polygons, which includes the arrangement of the polygons and the new width and length of the connected polygons. Figure 4 shows the workflow of the region-growing connection method.
Figure 4 Workflow of the region-growing connection method (input polygon data (seeds) → Step 1: search for adjacent candidate polygons → Step 2: judge the arrangement as side-side, end-end, side-end, or irregular, and update the new length accordingly → Step 3: update the new width and push to the input data set → end when no candidate is found or the input set is empty)
The key steps of the spatial connection method are as follows. Step 1: Retrieve a seed polygon Pi discriminated by the SVM and search for an adjacent candidate polygon Pj that shares the common boundary Bij with Pi. Step 2: Calculate the two acute angles Aij and AiB to determine the arrangement of the two polygons Pi and Pj. The angle Aij is the acute angle between the major orientations of Pi and Pj, while AiB is the acute angle between the common boundary Bij and the major orientation of Pi. The arrangement of the two polygons can then be categorized into three cases, and for each case the road length and width are updated. The straightforward way to measure the new width is to merge the two adjacent polygons and measure the width directly. However, extracting the centerline of the merged polygon, including the polygon triangulation, is time consuming for every pairwise comparison, particularly when the merged polygon is highly concave, like a dead tree with many branches. Thus, an approximate solution is proposed, as follows. Suppose that the length of the common boundary Bij is Lboundary, and the lengths and widths of polygons Pi and Pj are Li, Wi and Lj, Wj, respectively. The length of a polygon is the length of the centerline extracted from the Delaunay triangles, as described in Section 2.1, and the width of a polygon is its area divided by its length. For each of the three cases, the updating process is described as follows.
Case 1: If the angles Aij and AiB are both smaller than 45 degrees, as shown in Figure 5-1, the polygons are regarded as a side-by-side pair, and their new length is calculated as follows.

$$L_{\mathrm{new}} = L_i + L_j - L_{\mathrm{boundary}}$$

Case 2: If Aij is smaller than 45 degrees but AiB is not, as shown in Figure 5-2, the polygons are regarded as an end-to-end pair, and their new length is calculated as follows.

$$L_{\mathrm{new}} = L_i + L_j$$

Case 3: If Aij is larger than 45 degrees, the polygons are regarded as a side-to-end or end-to-side pair. If AiB is smaller than 45 degrees, as shown in Figure 5-3, the new length is calculated as follows.

$$L_{\mathrm{new}} = L_i + W_i + L_j - L_{\mathrm{boundary}}$$

Otherwise, if AiB is larger than 45 degrees, as shown in Figure 5-4, the new length is calculated as follows.

$$L_{\mathrm{new}} = L_i + W_j + L_j - L_{\mathrm{boundary}}$$

For all three cases, the new width of the two polygons is calculated as follows.

$$W_{\mathrm{new}} = \frac{\mathrm{area}_i + \mathrm{area}_j}{L_{\mathrm{new}}}$$

Figure 5 Arrangement cases of two polygons: (1) side-side, (2) end-end, (3) side-end, (4) end-side (legend: major orientation of polygon, seed polygon Pi, candidate polygon Pj, common boundary)
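The case analysis above, together with the width test described in Step 3 (10 m in this study), can be sketched as a single function. This is an illustrative sketch under stated assumptions: each polygon is a dict with hypothetical keys 'dir' (major orientation in degrees, undirected), 'length' (centerline length), 'width', and 'area', and an angle of exactly 45 degrees is treated as "not smaller than 45 degrees", which the text leaves open.

```python
def acute_angle_deg(dir_a, dir_b):
    """Acute angle in degrees between two undirected directions given in degrees."""
    d = abs(dir_a - dir_b) % 180.0
    return min(d, 180.0 - d)


def try_connect(seed, cand, boundary_dir, boundary_len,
                angle_limit=45.0, width_threshold=10.0):
    """Classify the arrangement of seed Pi and candidate Pj, approximate the
    merged length and width, and decide whether to connect them.

    seed, cand: dicts with hypothetical keys 'dir', 'length', 'width', 'area'.
    boundary_dir, boundary_len: orientation (degrees) and length of Bij.
    """
    a_ij = acute_angle_deg(seed["dir"], cand["dir"])   # Pi vs Pj orientation
    a_ib = acute_angle_deg(boundary_dir, seed["dir"])  # Bij vs Pi orientation
    li, lj = seed["length"], cand["length"]
    if a_ij < angle_limit and a_ib < angle_limit:      # Case 1
        arrangement, l_new = "side-side", li + lj - boundary_len
    elif a_ij < angle_limit:                           # Case 2
        arrangement, l_new = "end-end", li + lj
    elif a_ib < angle_limit:                           # Case 3, side-end
        arrangement, l_new = "side-end", li + seed["width"] + lj - boundary_len
    else:                                              # Case 3, end-side
        arrangement, l_new = "end-side", li + cand["width"] + lj - boundary_len
    w_new = (seed["area"] + cand["area"]) / l_new
    return arrangement, l_new, w_new, w_new < width_threshold


lane_a = {"dir": 0.0, "length": 50.0, "width": 8.0, "area": 400.0}
lane_b = {"dir": 5.0, "length": 40.0, "width": 7.0, "area": 280.0}
# Boundary perpendicular to lane_a: an end-end pair with L_new = 90.0 and
# W_new of about 7.56 m, which is below 10 m, so the pair is connected.
print(try_connect(lane_a, lane_b, boundary_dir=90.0, boundary_len=8.0))
```

Avoiding a re-triangulation of the merged polygon is the point of this approximation: each pairwise test costs only a few arithmetic operations.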
Step 3: A width threshold is defined according to the road construction specifications and is set to 10 m in our method. If the new width Wnew of the two polygons is smaller than this threshold, Pi and Pj are connected as a newly detected multilane polygon and pushed into the seed polygon set. Otherwise, the connecting algorithm assesses the other candidate polygons adjacent to Pi. Step 4: The connecting algorithm retrieves the seed polygons in turn and stops when no seeds remain.

3.2 Closing gaps in multilane roads

Because the data quality of OSM is not guaranteed, there may still be some missing
multilane polygons after the region-growing connection process. These missing polygons are usually surrounded by the detected multilane polygons, leaving holes where they are missing. To fill these holes, the approach first clusters all of the undetected polygons according to their connectivity, treating the detected multilane polygons as separators. As shown in Figure 6, the red polygons are the multilane polygons detected by the SVM and the region-growing method, whereas the green and blue polygons represent two clusters of undetected polygons separated by the detected red polygons. The green clusters, which belong to the multilane features, are much smaller than the other clusters. Thus, clusters with small areas that are completely enclosed by the detected multilane polygons are reselected as multilane polygons. The area statistics show a clear gap between the areas of multilane holes and those of other clusters, so an area threshold can easily be set to exclude large clusters such as urban blocks.
Figure 6 Different sizes of cluster holes
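The hole-filling step can be sketched as a breadth-first clustering over polygon adjacency in which detected multilane polygons act as separators. This is a minimal sketch under stated assumptions: the adjacency graph and the set of polygons touching the outer boundary of the network (used here to decide whether a cluster is completely enclosed) are assumed inputs, and because the text does not give a numeric area threshold for this step, the value in the example is arbitrary. The function name is hypothetical.

```python
from collections import deque


def fill_holes(adjacency, areas, detected, touches_exterior, area_threshold):
    """Return ids of undetected polygons to reselect as multilane polygons.

    adjacency: dict id -> iterable of neighbouring polygon ids
    areas: dict id -> area (needed for undetected polygons)
    detected: set of ids already classified as multilane (separators)
    touches_exterior: hypothetical input, ids on the outer network boundary
    """
    seen, filled = set(detected), set()  # pre-marking detected ids splits clusters
    for start in adjacency:
        if start in seen:
            continue
        cluster, queue = [], deque([start])
        seen.add(start)
        while queue:
            pid = queue.popleft()
            cluster.append(pid)
            for nb in adjacency[pid]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # Enclosed: every neighbour outside the cluster is a detected polygon,
        # and no member reaches the outer boundary of the network
        enclosed = not any(pid in touches_exterior for pid in cluster)
        if enclosed and sum(areas[pid] for pid in cluster) < area_threshold:
            filled.update(cluster)
    return filled


adjacency = {
    1: [2, 4, 5, 6], 2: [1, 3, 5], 3: [2, 4, 5], 4: [1, 3, 5],
    5: [1, 2, 3, 4],    # hole surrounded by four detected lane polygons
    6: [1, 7], 7: [6],  # block polygons reaching the network edge
}
areas = {5: 150.0, 6: 20000.0, 7: 30000.0}
print(fill_holes(adjacency, areas, {1, 2, 3, 4}, {7}, area_threshold=1000.0))  # {5}
```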
Finally, connected and filled multilane regions with small areas are not regarded as functional high-level roads, because high-level multilane roads in road networks are usually connected with each other and have relatively large areas. By contrast, long and thin polygons with small areas are usually short, isolated parallel roads or even incorrectly detected polygons, and they should be eliminated from the final results. In our approach, the area threshold used to eliminate such small detections is set empirically to 2000 m².

4. Experiments and Discussion

4.1 Experimental data

To verify the validity of the proposed approach, the road networks of Munich, Frankfurt, and Stuttgart were tested using OSM datasets. All of these maps were derived from precisely captured real-world roads, and the data volumes were large. These three OSM datasets were selected because they include several divided highways and complex street junctions, whereas the multilane roads in these urban networks cannot be identified exactly from the OSM attributes. Moreover, the lack of data
quality control meant that several duplicated lines were drawn on the same road features. Therefore, line-based multilane road extraction methods were too complex and performed inefficiently on these road networks. There were 240321 roads in the Munich OSM data, 94745 in Frankfurt, and 40944 in Stuttgart, and the numbers of polygons generated were 84247 in Munich, 32583 in Frankfurt, and 13168 in Stuttgart.

4.2 Experimental results

To apply the SVM method, a training area was first selected from each road network, containing as many different shapes of multilane polygons as possible. Figure 7 shows the training area in Munich, which covered about 1500 m × 2500 m and contained 1180 polygons, including long and narrow polygons as well as small polygons interlocked with long and narrow ones. We manually selected 613 multilane polygons as the training sample to generate the SVM classification model. The selection was made by box selection, because it was easy for the user to simply draw boxes over the multilane area. After training, the accuracy ratio of this model on the training data was 77.16%, computed as the area of the manually selected multilane polygons divided by the area of the polygons detected by the SVM. The same selection process was applied to training data for Frankfurt and Stuttgart, where the accuracy ratios of the SVM models were 72.78% and 73.77%, respectively. These accuracies indicate that the SVM results alone need to be improved by subsequent processing.
Figure 7 Training area of Munich
After SVM classification, the detection results were refined according to the process described in Section 3, with a width threshold of 10 m for the aggregated roads and an area threshold of 2000 m² for eliminating incorrect detections. The final extraction results are shown in Figure 8, where cyan indicates the roads detected by the SVM that do not belong to multilane roads and the dark color
indicates the final extracted multilane roads. The multilane roads reflect the overall structure of the entire road network. Figure 9 shows the detection results in detail in local and global views. In Figure 9-1, there are several duplicated lines, but the multilane roads were still detected well, showing that our approach can detect multilane roads even when the data quality is low. However, some multilane roads were excluded from the detection results because they were not connected to the main parts of the road network. In addition, the results included some incorrect detections because the area and width thresholds specified in Section 3, which were set as constants based on our experience, may not be appropriate for certain road structures. For example, some very thin polygons (Figure 9-2) adjacent to multilane roads were connected incorrectly by the region-growing algorithm described in Section 3, while the roads in grid-like regions of Frankfurt (Figure 10) were actually duplicated lines that measured more than 10 m. These incorrect detections depended on the data structure and quality, so they were difficult to correct automatically, and they were also difficult to eliminate manually without auxiliary data. Therefore, further manual modification assisted by ground-truth imagery may be required after the detection process proposed in this study.
Figure 8 Extraction results: (1) Munich, (2) Frankfurt, (3) Stuttgart
Figure 9 Details of detection results: (1) local view of a road and junction; (2) global view of a network. Light grey indicates the multilane polygons detected by the SVM, and dark grey indicates the connected and filled multilane polygons.
Figure 10 Incorrect detections in Frankfurt
4.3 Analysis and discussion

After detection, 30587 polygons and 81474 corresponding roads were selected as multilane roads in Munich, which comprised 12.73% of all roads after weighting by road length. Using manual selections in sample areas other than the training area as a benchmark, the length-weighted accuracy ratio was 93.79%. Similarly, the accuracy ratios for Frankfurt and Stuttgart were 89.44% and 91.14%, respectively, with ramps regarded as parts of the multilane roads. To demonstrate the efficiency of our approach, three comparisons were performed, based on the OSM data quality and on the differences between the functional and structural hierarchies, as follows.

4.3.1 Comparison with a typical multilane detection approach

Our polygon-based approach was compared with the typical approach proposed by Yang et al. (2011). The execution times of the line-based and polygon-based approaches depend on the difference in data volume between road lines and polygons, the complexity of the algorithms, the data exchange rate, the efficiency of third-party packages, and other factors. Table 1 compares the runtimes of the line-based and polygon-based approaches for three urban areas. The line-based approach comprised two key steps, stroke generation and betweenness calculation, whereas the polygon-based approach comprised shape descriptor calculation, SVM classification, and the polygon connection and filling steps. Subordinate processes such as topology construction and data exchange among platforms were not included in the statistics. In the three experimental urban areas, the line-based approach took approximately four times as long as our polygon-based approach.
The most time-consuming steps of the line-based approach were stroke generation and betweenness calculation, because of the duplicated line data in OSM. One reason our polygon-based approach is more efficient is that the number of polygons is usually much smaller than the number of corresponding line segments (roughly one third in Table 1).
Table 1. Comparison of efficiency between line-based and polygon-based approaches
Approach        Measure                    Munich    Frankfurt   Stuttgart
Line-based      Number of line segments    240321    94745       40944
                Computation time           7h18m     2h32m       1h3m
Polygon-based   Number of polygons         84247     32583       13168
                Computation time           1h45m     40m         16m
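The "approximately four times" figure and the polygon-to-segment ratio can be checked directly from Table 1. The following is a minimal Python sketch that simply transcribes the table values; the dictionary layout and function names are illustrative assumptions, not part of the study's software.

```python
import re

# Values transcribed from Table 1 (layout is an illustrative assumption).
TABLE_1 = {
    "Munich":    {"segments": 240321, "line_time": "7h18m", "polygons": 84247, "poly_time": "1h45m"},
    "Frankfurt": {"segments": 94745,  "line_time": "2h32m", "polygons": 32583, "poly_time": "40m"},
    "Stuttgart": {"segments": 40944,  "line_time": "1h3m",  "polygons": 13168, "poly_time": "16m"},
}


def to_minutes(text: str) -> int:
    """Convert durations written as '7h18m' or '40m' into minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


def summarize(table):
    """Return (speed-up of polygon-based over line-based, polygons per line segment) per city."""
    result = {}
    for city, row in table.items():
        speedup = to_minutes(row["line_time"]) / to_minutes(row["poly_time"])
        ratio = row["polygons"] / row["segments"]
        result[city] = (speedup, ratio)
    return result


for city, (speedup, ratio) in summarize(TABLE_1).items():
    print(f"{city}: {speedup:.2f}x faster, {ratio:.2f} polygons per line segment")
```

For the three cities this gives speed-ups of about 4.2, 3.8 and 3.9, and roughly 0.32 to 0.35 polygons per line segment, consistent with the statements above.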
4.3.2 Comparison based on road type attributes
It is often assumed that multilane roads are well represented by certain road type attributes, such as primary roads and expressways. Thus, we compared the multilane roads with the road type attributes in OSM and with the structural high-level roads extracted by a graph analysis approach. The multilane roads and road type attributes were compared first. Figure 11 shows, for each city, the distribution of OSM road types among the multilane roads. Among the higher road levels, primary roads were more prominent than secondary and tertiary roads, which shows that the road type attribute is essentially consistent with multilane roads. Cycleways and paths are usually constructed along primary and secondary roads, and they can serve as temporary vehicle routes if an emergency evacuation is required.
Figure 11 Statistical results of different road types among the multilane roads: (1) Munich, (2) Frankfurt, (3) Stuttgart
Notably, residential streets were the most common road type among the multilane roads in all three cities. A detailed check of the comparative results showed that several residential streets near squares, gardens, and motorway entrances were digitized as one carriageway of a multilane road. For example, as shown in Figure 12, Donau Street in Munich was digitized as a multilane road, but its two lanes carried the attributes footway and residential. Our approach detected this type of road as a multilane road. This reflects the fact that OSM lacks complete road digitizing rules, so separate lanes of the same road may be digitized as parallel lines with inconsistent type attributes. Tracks are also digitized as two or more parallel lines in OSM data, and they likewise play an important role in urban transportation networks.
Figure 12 Comparison between OSM and Google Earth
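The per-type statistics in Figure 11 are length-weighted shares over the extracted multilane roads. The following is a minimal Python sketch, assuming each road is available as a (highway type, length, multilane flag) record. The sample records are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def length_share_by_type(roads):
    """Share of total multilane road length per OSM highway type.

    `roads` is an iterable of (highway_type, length, is_multilane) tuples.
    Roads not flagged as multilane are ignored; the shares sum to 1 whenever
    at least one multilane road with positive length is present.
    """
    totals = defaultdict(float)
    for highway_type, length, is_multilane in roads:
        if is_multilane:
            totals[highway_type] += length
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Sort by descending length so the dominant types come first.
    ranked = sorted(totals.items(), key=lambda kv: -kv[1])
    return {highway_type: length / grand_total for highway_type, length in ranked}


# Hypothetical records (highway type, length in metres, multilane flag).
sample = [
    ("residential", 400.0, True),
    ("primary", 300.0, True),
    ("footway", 100.0, True),
    ("secondary", 250.0, False),
    ("primary", 200.0, True),
]
print(length_share_by_type(sample))
```

Weighting by length rather than counting ways avoids bias from OSM's arbitrary way splitting, in line with the length-weighted ratios used elsewhere in this section.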
4.3.3 Comparison with the structural hierarchy
Moreover, because the structural hierarchy is effective for knowledge discovery and traffic flow prediction in road networks, a main aim of the present study was to use the multilane roads to evaluate how rational the existing high transportation-capacity roads are. Thus, the structural urban skeleton was selected based on betweenness centrality, computed with the graph analysis package igraph for R (http://cran.r-project.org/package=igraph). The green road lines in Figure 13 represent the skeleton selection results for Munich, Frankfurt, and Stuttgart.
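The study used igraph in R for this step. As an illustration only, the sketch below implements Brandes' algorithm for edge betweenness in plain Python and keeps the edges whose betweenness exceeds a threshold, analogous to the > 5,000,000 cut-off in Figure 13. It rests on several assumptions: road segments are treated as graph edges, all edges have equal weight (the study may have used length or other weights), each unordered node pair is counted once, and the toy network is hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness in an unweighted, undirected graph.

    `adj` maps each node to an iterable of its neighbours (both directions listed).
    Summing over every source counts each unordered pair twice, so results are halved.
    """
    scores = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from source
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from source
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):        # accumulate dependencies back towards source
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    return {edge: value / 2.0 for edge, value in scores.items()}


def skeleton(adj, threshold):
    """Edges whose betweenness exceeds `threshold`: the structural high-level roads."""
    return {edge for edge, value in edge_betweenness(adj).items() if value > threshold}


# Hypothetical four-node street a-b-c-d: the middle segment carries most paths.
toy_network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(skeleton(toy_network, 3.5))
```

On real OSM networks, thresholds such as 5,000,000 only make sense for the raw (unnormalized) betweenness of the full network, so the toy threshold here is scaled down accordingly.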
Figure 13 Skeleton selections based on betweenness centrality > 5,000,000: (1) Munich, (2) Frankfurt, (3) Stuttgart
From a cartographic viewpoint, these skeletons appear to reflect the structure of the road networks more reasonably than the multilane roads do. This demonstrates the gap between “what the high-level roads are” and “what the high-level roads should be.” In a well-planned urban road network, the overlap between multilane roads and structural high-level roads should be large. Figure 14 shows the overlap between multilane roads and structural high-level roads for different betweenness thresholds. The red line shows the overlap ratio and the green line shows the recall ratio, which are defined as follows:

$$\mathrm{overlap} = \frac{\mathrm{length}(\mathit{Multilanes} \cap \mathit{Structural\_level})}{\mathrm{length}(\mathit{Structural\_level})} \tag{4}$$

$$\mathrm{recall} = \frac{\mathrm{length}(\mathit{Multilanes} \cap \mathit{Structural\_level})}{\mathrm{length}(\mathit{Multilanes})} \tag{5}$$

where Multilanes denotes the extracted multilane roads and Structural_level denotes the structural high-level roads selected at a given betweenness threshold.
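The overlap and recall ratios defined above, equations (4) and (5), are both length-weighted and differ only in their denominators. The following is a minimal Python sketch. It assumes that multilane roads and structural high-level roads are defined on the same segmentation, so the intersection is simply the multilane segments above the threshold. The `Segment` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    length: float       # segment length (any consistent unit)
    betweenness: float  # betweenness centrality of the segment
    multilane: bool     # True if the segment belongs to an extracted multilane road


def overlap_and_recall(segments, threshold):
    """Length-weighted overlap (Eq. 4) and recall (Eq. 5) for one betweenness threshold."""
    structural = [s for s in segments if s.betweenness > threshold]
    shared = sum(s.length for s in structural if s.multilane)   # Multilanes ∩ Structural_level
    structural_length = sum(s.length for s in structural)
    multilane_length = sum(s.length for s in segments if s.multilane)
    overlap = shared / structural_length if structural_length else float("nan")
    recall = shared / multilane_length if multilane_length else float("nan")
    return overlap, recall


# Hypothetical segments; sweeping the threshold mimics the curves in Figure 14.
segments = [
    Segment(100.0, 9e6, True),
    Segment(200.0, 6e6, False),
    Segment(300.0, 1e6, True),
    Segment(400.0, 3e7, True),
]
for t in (0, 5e6, 25e6):
    ov, rec = overlap_and_recall(segments, t)
    print(f"threshold={t:.0e}: overlap={ov:.2f}, recall={rec:.2f}")
```

Even on this toy input, raising the threshold increases the overlap ratio and lowers the recall ratio. This is the same qualitative pattern discussed below for Munich.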
All three statistical results have similar characteristics. Using Munich as an example, the three characteristics shown in Figure 14(1) are as follows.
1) As the betweenness threshold increased from 0 to 25,000,000, the overlap ratio increased dramatically. This means that although a large number of roads were excluded by the betweenness threshold, the remaining structural high-level roads increasingly consisted of multilane roads extracted by the polygon-based approach. However, the recall ratio decreased dramatically, which means that several multilane roads were not regarded as structural high-level roads. These multilane roads may be located in structurally less important regions, and their traffic capacity may not be fully utilized.
2) When the betweenness threshold exceeded 25,000,000, the overlap ratio stabilized above 80%, which shows that the roads selected by betweenness centrality consisted largely of multilane roads extracted by the polygon-based approach.
3) Throughout the comparison, the recall ratio was lower than expected. This is because several roads with high structural significance were digitized as single-line features, which cannot be extracted by the polygon-based approach. Because OSM urban networks are high-LoD datasets, these single-line features reveal the gap between structural significance and the actual constructed road levels. This gap can guide road construction to improve the traffic capacity of roads that have high structural significance but attract little attention.
Figure 14 Comparisons of constructed arterial roads and structural hierarchies: (1) Munich, (2) Frankfurt, (3) Stuttgart
5. Conclusions
In this study, we proposed a polygon-based approach for extracting multilane roads from high-LoD road networks. Experiments demonstrated that the proposed approach is suitable for road networks with a high LoD. In our experiments, polygon shapes reflected several characteristics of the construction level and structural significance of urban road networks better than road line information such as strokes. By representing and selecting multilane polygons with shape descriptors and detecting roads based on Gestalt theory, the method provides an efficient solution for high-LoD urban street networks. Complex shapes such as “grained” intersections and long curved lines can be detected as multilane roads. Three OSM road network datasets were used to verify the validity of the proposed approach. The major contributions of this approach are as follows.
1) The proposed method depicts multilane polygons using several shape descriptors.
2) The method classifies multilane polygons as seed polygons using an SVM to select multilane road features.
3) A region-growing approach connects and fills polygons as multilane features based on Gestalt theory.
4) The method can extract multilane roads from networks with a high LoD but low quality.
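The seed classification and region-growing steps listed among the contributions above can be sketched as breadth-first growth over a polygon adjacency graph. This is a minimal sketch, not the paper's method. It assumes that polygons are given as coordinate rings and that adjacency (shared road edges) is precomputed. It also replaces the paper's shape descriptors and Gestalt-based connecting and filling rules with a single hypothetical criterion: the isoperimetric compactness 4πA/P², with a cut-off of 0.3, which favours long, thin polygons.

```python
import math
from collections import deque

Point = tuple[float, float]


def area_perimeter(ring: list[Point]) -> tuple[float, float]:
    """Shoelace area and perimeter of a simple polygon given as an open ring."""
    twice_area, perimeter = 0.0, 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]   # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def compactness(ring: list[Point]) -> float:
    """Isoperimetric quotient: 1 for a circle, close to 0 for long thin strips."""
    area, perimeter = area_perimeter(ring)
    return 4.0 * math.pi * area / perimeter ** 2


def grow_regions(polygons, adjacency, seeds, max_compactness=0.3):
    """Grow from SVM seed polygons into adjacent polygons that are long and thin.

    `polygons` maps ids to rings, `adjacency` maps ids to neighbouring ids,
    `seeds` lists seed ids. The compactness cut-off is a hypothetical stand-in
    for the paper's Gestalt-based rules.
    """
    selected = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in selected and compactness(polygons[neighbour]) <= max_compactness:
                selected.add(neighbour)
                queue.append(neighbour)
    return selected
```

Breadth-first growth keeps the selection connected to at least one seed, which loosely mirrors the Gestalt principle of proximity. A compact block (such as a square city block) stops the growth.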
The polygon-based approach identified multilane roads in VGI urban networks with a high LoD but low semantic quality, such as OSM datasets. However, some non-road polygons with long and thin shapes may be incorrectly selected as multilane roads because of the complex structures of urban networks. We plan to address this with a more refined algorithm that balances effectiveness and efficiency. The matched nodes can be used for OSM data enrichment, such as generalizing dual-line roads into single lines and simplifying complex junctions into single nodes. The enriched dataset can also be compared with data sources at other LoDs to perform spatiotemporal analyses of urban pattern development. Our future research will concentrate on multilane road collapsing, the analysis of historical VGI data, and other features.
Acknowledgements
This work is supported by the Klaus Tschira Foundation (KTS) in Germany, the NSFC (National Natural Science Foundation of China) projects (41101443, 41201425 and 41171140), and the National High Technology Research and Development Program of China (863 Program) (Grant No. 2012AA121402, 2013AA122302).
References
Ai, T.H. and Guo, R.Z., 2000. Extracting center-lines and building street network based on constrained Delaunay triangulation. Acta Geodaetica et Cartographica Sinica, 29(4), pp. 348-354. [In Chinese]
Boser, B.E., Guyon, I. and Vapnik, V., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM Press, pp. 144-152.
Chang, C.-C. and Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), Article No. 27.
Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20, pp. 273-297.
Edwardes, A.J. and Mackaness, W.A., 2000. Intelligent generalization of urban road networks. In: Proceedings of GIS Research UK Conference (GISRUK 2000), York, UK, pp. 81-85.
Freeman, L.C., 1977. A set of measures of centrality based on betweenness. Sociometry, 40(1), pp. 35-41.
Heinzle, F., Anders, K.-H. and Sester, M., 2006. Pattern recognition in road networks on the example of circular road detection. In: Proceedings of the 4th International Conference GIScience 2006, Münster, Germany, September 2006, pp. 153-167.
Heinzle, F. and Anders, K.-H., 2007. Characterising space via pattern recognition techniques: identifying patterns in road networks. In: Mackaness, W., Ruas, A. and Sarjakoski, T. (eds), The Generalisation of Geographic Information: Models and Applications. Elsevier.
Hu, Y.G., Chen, J., Li, Z.L. and Zhao, R.L., 2007. Selection of streets based on mesh density for digital map generalization. In: Proceedings of the Fourth International Conference on Image and Graphics.
Jiang, B. and Claramunt, C., 2004. Topological analysis of urban street networks. Environment and Planning B, 31, pp. 151-162.
Jiang, B. and Harrie, L., 2004. Selection of streets from a network using self-organizing maps. Transactions in GIS, 8(3), pp. 335-350.
Jiang, B. and Liu, C., 2009. Street-based topological representations and analyses for predicting traffic flow in GIS. International Journal of Geographical Information Science, 23(9), pp. 1119-1137.
Jones, C.B., 1997. Geographical Information Systems and Computer Cartography. Essex: Addison Wesley Longman Limited.
Liu, X.J., Zhan, B.J. and Ai, T.H., 2009. Road selection based on Voronoi diagrams and 'strokes' in map generalization. International Journal of Applied Earth Observation and Geoinformation, 12(2), pp. S194-S202.
Luan, X. and Yang, B., 2010. Generating strokes of road networks based on pattern recognition. In: 13th ICA Workshop on Generalisation and Multiple Representation, 12-13 September 2010, Zurich, Switzerland.
Mackaness, W. and Edwards, G., 2002. The importance of modelling pattern and structure in automated map generalisation. In: Proceedings of Joint Workshop on Multi-scale Representations of Spatial Data, Ottawa, Canada.
McMaster, R.B. and Shea, K.S., 1992. Generalization in Digital Cartography. Washington, D.C.: Association of American Geographers.
Porta, S., Crucitti, P. and Latora, V., 2006a. The network analysis of urban streets: a dual approach. Physica A: Statistical Mechanics and its Applications, 369(2), pp. 853-866.
Porta, S., Crucitti, P. and Latora, V., 2006b. The network analysis of urban streets: a primal approach. Environment and Planning B: Planning and Design, 33(5), pp. 705-725.
Rosin, P.L., 2003. Measuring shape: ellipticity, rectangularity, and triangularity. Machine Vision and Applications, 14(3), pp. 172-184.
Sonka, M., Hlavac, V. and Boyle, R., 1993. Image Processing, Analysis, and Machine Vision. Chapman and Hall.
Thom, S., 2005. A strategy for collapsing OS Integrated Transport Network(tm) dual carriageways. In: Proceedings of the 8th ICA Workshop on Generalisation and Multiple Representation, La Coruña, Spain.
Tomko, M., Winter, S. and Claramunt, C., 2008. Experiential hierarchies of streets. Computers, Environment and Urban Systems, 32(1), pp. 41-52.
Touya, G., 2010. A road network selection process based on data enrichment and structure detection. Transactions in GIS, 14(5), pp. 595-614.
Wertheimer, M., 1923. Laws of organization in perceptual forms. First published as Untersuchungen zur Lehre von der Gestalt II, Psychologische Forschung, 4, pp. 301-350. Translation published in Ellis, W. (1938), A Source Book of Gestalt Psychology (pp. 71-88). London: Routledge & Kegan Paul. [available at http://psy.ed.asu.edu/~classics/Wertheimer/Forms/forms.htm]
Yang, B.S., Luan, X.C. and Li, Q.Q., 2011. Generating hierarchical strokes from urban street networks based on spatial pattern recognition. International Journal of Geographical Information Science, 25(12), pp. 2025-2050.
Zhang, M., 2009. Methods and implementations of road-network matching. PhD thesis. Institute for Photogrammetry and Cartography, Technical University of Munich, Munich, Germany.
Zhang, Q., 2004. Modelling structure and patterns in road network generalization. In: Proceedings of ICA Workshop on Generalisation and Multiple Representation, 20-21 August 2004, Leicester.
Zhang, X., Ai, T.H. and Stoter, J., 2008. The evaluation of spatial distribution density in map generalization. In: ISPRS Congress, Beijing.