Document not found! Please try again

Improving the Quality of Citizen Contributed Geodata through ... - MDPI

2 downloads 0 Views 11MB Size Report
Jun 28, 2018 - Keywords: OpenStreetMap history file; Voronoi diagram; intrinsic quality ...... of the Knowledge Discovery and Data Mining, Portland, OR, USA,.
International Journal of

Geo-Information Article

Improving the Quality of Citizen Contributed Geodata through Their Historical Contributions: The Case of the Road Network in OpenStreetMap Afsaneh Nasiri 1 , Rahim Ali Abbaspour 1, * and Jamal Jokar Arsanjani 2 1 2

*

ID

, Alireza Chehreghan 1

ID

School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, 1439957131 Tehran, Iran; [email protected] (A.N.); [email protected] (A.C.) Geoinformatics Research Group, Department of Planning, Aalborg University Copenhagen, A.C. Meyers Vænge 15, DK-2450 Copenhagen, Denmark; [email protected] Correspondence: [email protected]; Tel.: +98-21-8208-4519

Received: 17 April 2018; Accepted: 27 June 2018; Published: 28 June 2018

 

Abstract: OpenStreetMap (OSM) has proven to serve as a promising free global encyclopedia of maps with an increasing popularity across different user communities and research bodies. One of the unique characteristics of OSM has been the availability of the full history of users’ contributions, which can leverage our quality control mechanisms through exploiting the history of contributions. Since this aspect of contributions (i.e., historical contributions) has been neglected in the literature, this study aims at presenting a novel approach for improving the positional accuracy and completeness of the OSM road network. To do so, we present a five-stage approach based on a Voronoi diagram that leads to improving the positional accuracy and completeness of the OSM road network. In the first stage, the OSM data history file is retrieved and in the second stage, the corresponding data elements for each object in the historical versions are identified. In the third stage, data cleaning on the historical datasets is carried out in order to identify outliers and remove them accordingly. In the fourth stage, through applying the Voronoi diagram method, one representative version for each set of historical versions is extracted. In the final stage, through examining the spatial relations for each object in the history file, the topology of the target object is enhanced. As per validation, a comparison between the latest version of the OSM data and the result of our approach against a reference dataset is carried out. Given a case study in Tehran, our findings reveal that the completeness and positional precision of OSM features can be improved up to 14%. Our conclusions draw attention to the exploitation of the historical archive of the contributions in OSM as an intrinsic quality indicator. Keywords: OpenStreetMap history file; Voronoi diagram; intrinsic quality analysis; positional accuracy; volunteered geographic information

1. Introduction The widespread availability of advanced web technologies, the improvement of mapping devices and positioning systems, as well as the escalating growth of users’ need of spatial information have led to an emerging era of massive citizen-collected geographic data. OSM and Wikimapia [1] can be named as well-known examples of this, which evolve with users’ contributions [2]. OSM has turned into one of the largest and most successful examples of volunteered geographic information (VGI) projects by creating and maintaining a free online map that could be revised by the users while having global coverage [3]. The increase in the number of users and an extensive volume of compiled data in this project during recent years prove the rising community behind the contributions [4]. It is of note

ISPRS Int. J. Geo-Inf. 2018, 7, 253; doi:10.3390/ijgi7070253

www.mdpi.com/journal/ijgi

ISPRS Int. J. Geo-Inf. 2018, 7, 253

2 of 21

that all the OSM data, such as the ways, nodes, and relations, besides its previous versions, are kept in the OSM data history, which itself entails all the spatial and semantic information of a certain object along with information related to the users of that object [5]. Unlike other VGI projects, OSM not only allows the users to have access to the latest version of the information about the object, but also allows them to download the previous versions and the history file of the edits [6]. Due to the nature of data collection in OSM and its openness for everyone with any level of mapping experience to contribute, its quality is subject to careful investigation prior to tailoring it for any application [7,8]. Therefore, investigating data quality, and the mechanism to improve and enrich it, has been on the research agenda of the VGI community [9]. The research relevant to VGI quality, in particular OSM, can be divided into three categories, as illustrated in Figure 1. In the first category, OSM data is compared against a reference dataset produced by a governmental or private organization, focusing on quantitative analysis of data quality [10–18]. This category includes a considerable volume of the research dealing with the VGI quality. For instance, Haklay [12] compared OSM data of London with its official dataset generated by Ordnance Survey. Mohammadi and Malek [17] compared OSM data with that of Iranian National Cartographic Center by extracting the model from the corresponding reference data, they estimated the accuracy of the corresponding reference OSM data. Lyu et al. [18] proposed an evaluative approach on the basis of symmetric arc similarity and assessed the geometric quality of the VGI road network based on the difference with the land road network corresponding reference. Brovelli et al. [19] presented an approach for evaluating two data quality elements, namely, completeness and positional accuracy, based on the difference with the reference data, which was capable of setting the parameters of quality assessment consistent with the users’ needs. These studies have been conducted across different countries with diverse contribution patterns, e.g., Germany [13,14,16,20], Iran [15,17], England [12], France [11], and Greece [10,18]. Nonetheless, these methods are disadvantaged with high costs and limitations in acquiring highly-accurate proprietary data while the quality of reference data was also in question. In the second category, the researchers elaborated on the data quality by linking the data quality with users’ behavior. For example, Arsanjani et al. [21] investigated several quality-related elements, including positional accuracy, completeness, and semantic accuracy, linked with contributing users who produced the data. In this category, the researchers endeavored to explore the impact of the number of contributors on the data quality [22–24]. The research falling in the third category does not employ the reference dataset for quality assessment, i.e., the recent research has proposed methods for the VGI quality assessment on the basis of data history rather than a comparison with reference data [25,26]. As an alternative, the research in this category has made use of the information captured in the history of OSM data as a tool for assessing data quality [27–29]. For instance, Keßler and De Groot [27] recommended a set of indicators based on historical data, i.e., the number of versions, contributors, confirmations, tag corrections, and rollbacks for quality evaluation. Another study, by Sehra et al. [30], was executed using the history file information in India to assess the completeness of the spatial data by utilizing intrinsic indicators. They used the data history files from January 2016 and February 2017 and through comparing the two sets of data, determined the length of the roads, completeness of the road name, the maximum speed, and the semantic accuracy. D’Antonio et al. [31] introduced a model to evaluate VGI feature trustworthiness and user reputation. They evaluated the data validity on the basis of the history of the information, e.g., creation, modification, deletion, and topology. In another study, Touya et al. [32] analyzed and interpreted the VGI quality, as well as examined their evolution over time, by proposing a combined method using the reference data, the objects’ history, and spatial relationships. In the third category, the information quality is investigated directly while taking into account investigating components other than the geometric information. They investigated the OSM history databases in England and Ireland in 2012. The authors reviewed the number and information of the users, the edits,

ISPRS Int. J. Geo-Inf. 2018, 7, 253

3 of 21

the number of geometric changes to each object, and the labels attributed to each object. Their results ISPRS Int. that J. Geo-Inf. 7, x FOR PEER REVIEW 3 of 21 showed 11%2018, of users created or edited 87% of spatial data more than 15 times [33,34].

Figure 1. 1. Categorization Categorization of of VGI VGI quality quality assessment assessment methods. methods. Figure

Although there are a number of VGI-related studies elaborating data quality, quality Although there are a number of VGI-related studies elaborating data quality, quality enhancement enhancement has been sidelined so far. The current research aims at proposing an approach to has been sidelined so far. The current research aims at proposing an approach to enhance the positional enhance the positional accuracy of the OSM road network through producing new data using data accuracy of the OSM road network through producing new data using data history and to minimize history and to minimize the uncertainty in the version provided to the users. Hence, this research the uncertainty in the version provided to the users. Hence, this research presents a five-stage approach presents a five-stage approach based on a Voronoi diagram for improving the quality of the linear based on a Voronoi diagram for improving the quality of the linear road network in OSM through the road network in OSM through the historical edits within a long-term history. For the implementation historical edits within a long-term history. For the implementation of the proposed method, a case of the proposed method, a case study of Tehran, Iran has been selected and the achieved results are study of Tehran, Iran has been selected and the achieved results are presented. presented. This paper consists of the following sections: after the introduction in Section 1, the data history This paper consists of the following sections: after the introduction in Section 1, the data history file is introduced in Section 2. Section 3 presents the theories related to this research. The proposed file is introduced in Section 2. Section 3 presents the theories related to this research. The proposed method is thoroughly explained in Section 4. The case study, the proposed framework, as well as the method is thoroughly explained in Section 4. The case study, the proposed framework, as well as the obtained results are presented in Section 5. The research draws some conclusions and suggestions for obtained results are presented in Section 5. The research draws some conclusions and suggestions for future study in Section 6. future study in Section 6. 2. OSM History File 2. OSM History File The history file of the OSM data encompasses every single edit in the lifetime of OSM, even the Thecontributions history file of from the OSM encompasses every singlethe editchanges in the lifetime OSM,objects even the deleted laterdata contributors. Furthermore, on theofOSM in deleted contributions from later contributors. Furthermore, the changes on the OSM objects in terms terms of attributes and geometry or even the variations whose uid (i.e., user ID) and username of attributes and geometry or even whose uid OSM (i.e., user username are unknown are unknown have been stored inthe thisvariations file [5]. The entire dataID) is and publicly accessible on http: have been stored in this file [5]. The entire OSM data is publicly accessible on //planet.openstreetmap.org/ in the form of XML and PBF. This file has a substantial volume due http://planet.openstreetmap.org / inhistory the form of which XML and PBF. This has a substantial volume due to the complete information of the file, is updated onfile a weekly basis. As an example, to the complete information of the history file, which is updated on a weekly basis. As an example, the history file updated on 29 June 2017 is roughly 37 GB [35]. the history on 29data Juneof2017 is roughly 37 GB [35].and processing such information would In casefile theupdated users require a certain area, analyzing In case the users require data of a certain area, analyzing processing information be time-consuming and complicated. To access the data for aand small area, it issuch possible to accesswould some be time-consuming To access the fortoa save smallthe area, it is possible tofor access some subset of the historyand filecomplicated. from other repositories indata order processing time, example, subset of the history file from other repositories in order to save the processing time, for example, http://osm.personalwerk.de/full-history-extracts/ and Geofabrik.de [5]. The information in the http://osm.personalwerk.de/full-history-extracts/ and (node, Geofabrik.de [5]. TheID, information in the history file is organized based on the type of the objects way, relations), and the version of history file is organized based on the type of the objects (node, way, relations), ID, and the version of the objects. the objects. As illustrated in Figure 2, the information related to an object can be categorized into (a) geometric As illustrated Figure the information related object can be categorized intouser (a) information of the in object; (b) 2,semantic information; andto(c)anthe information related to the geometric information of the object; (b) semantic of information; and (c) the to the producing the object. The geometric information the object includes theinformation ID of the setrelated of the nodes user producing the object. The geometric information of the object includes the ID of the set of the forming a linear object while the semantic information encompasses the ID of the object, its version, nodes forming a linear object while the semantic information encompassesrelated the ID to of two the object, its timestamp, road type, and road name. To illustrate this, the information revisions version, timestamp, road type, road name. illustrate in this, information to two undertaken in 2010 and 2012 for and Sepand Street (IDTo = 85479570) thethe OSM history filerelated is depicted in revisions undertaken in 2010 and 2012 for Sepand Street (ID = 85479570) in the OSM history file is Figure 3. depicted in Figure 3.

ISPRS Int. J. Geo-Inf. 2018, 7, 253 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

4 of 21 4 of 21 4 of 21

Figure 2. 2. Category of of information in in the the OSM OSM history history file. file. Figure Figure 2.Category Category of information information in the OSM history file.

Figure 3. 3. (a)(a) AA part the Figure partofof thestreet streetnetwork networkfrom fromOSM OSMofofTehran; Tehran;(b) (b)aasample sampleofofthe theOSM OSMhistory historyfile filefor Sepand (yellow in version 1. 1. Figure 3.Street (a) A part(yellow of line) the street from OSM of Tehran; (b) a sample of the OSM history file for Sepand Street line) innetwork version

for Sepand Street (yellow line) in version 1.

Containing spatial data of the object users’ information, the OSM history file Containing allall thethe spatial data of the object andand the the users’ information, the OSM history file allows allows us to conduct statistical and quantitative analyses. Additionally, it is possible to identify thefile all the and spatial data of the object and the users’itinformation, OSM the history us toContaining conduct statistical quantitative analyses. Additionally, is possible tothe identify changes changes and evaluate theofstability ofquantitative the spatial information over time by retaining the lineage of thethe allows us to conduct statistical analyses. Additionally, itthe is possible tothe identify and evaluate the stability the and spatial information over time by retaining lineage of historical historical data given the by the users.of Until now, this file has been employed toretaining evaluate the the accuracy changes and evaluate stability the spatial information over time by lineage ofofthe data given by the users. Until now, this file has been employed to evaluate the accuracy of the dataset, the dataset, to replace the reference dataset, and to perform statistical analyses mostly on the spatial historical given bydataset, the users. Until now, this file hasanalyses been employed to evaluate the accuracy of to replace data the reference and to perform statistical mostly on the spatial aspects or the aspects or to thereplace information of one ofdataset, the contributors [33,34]. statistical analyses mostly on the spatial the dataset, the reference and to perform information of one of the contributors [33,34].

aspects or the information of one of the contributors [33,34]. 3. Theoretical Framework 3. Theoretical Framework This section presents the concepts and theories underpinning this research. 3. Theoretical Framework This section presents the concepts and theories underpinning this research. section 3.1.This Medial Axis presents the concepts and theories underpinning this research. 3.1. Medial Axis The Axis medial axis was first proposed by Blum [36] as a tool in image analysis. The medial axis is 3.1. Medial The medial axis was first proposed by Blum [36] as a tool in image analysis. The medial axis unique for each object being topologically equivalent to the original object. The medial axis is defined is unique for each object being topologically equivalent theinoriginal object. The medial axis axis was by points Blum [36] as object’s ato tool image analysis. medial as The a setmedial of all points that first haveproposed at least two on the boundary [37–39].The In other words, is defined as a set of all points that have at least two points on the object’s boundary [37–39]. In other unique for each object being topologically equivalent to the original object. The medial axis is defined medial axis of a curve in two-dimensional space is the geometric location of the centers of all the words, medial axis of a curve in two-dimensional space is the geometric location of the centers of all as circles a set oflocated all points that least two ontothe In other words, inside thehave curveatwhile beingpoints tangent at object’s least twoboundary boundary[37–39]. points (see Figure 4). theThe circles inside curve while being tangent at least two boundary points (seeshapes, Figure 4). medial axislocated ofaxis a curve inthe two-dimensional space thetogeometric location of the centers of all the medial has diverse applications, such as is pattern analysis and identification of the The medial axis has diverse applications, such as pattern analysis and identification of the shapes, circles located inside the curve while being tangent to at least two boundary points (see Figure 4). image compression, curve fitting, extraction of geometric objects, extraction of the central line and The medial axis diverse applications, such as pattern analysis and identification of the shapes, extraction of thehas streets’ central lines [37,40–42]. image compression, curve fitting, extraction of geometric objects, extraction of the central line and extraction of the streets’ central lines [37,40–42].

ISPRS Int. J. Geo-Inf. 2018, 7, 253

5 of 21

image compression, curve fitting, extraction of geometric objects, extraction of the central line and ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 5 of 21 extraction of the streets’ central lines [37,40–42]. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

5 of 21

Figure 4. Medial axis for a 2D curve (adapted from [43]). Figure 4. Medial axis for a 2D curve (adapted from [43]). Figure 4. Medial axis for a 2D curve (adapted from [43]).

There Thereare arenumerous numerousmethods methodsfor forextracting extractingthe themedial medialaxis. axis.Using UsingDelaunay Delaunaytriangulation triangulationand and There are numerous methods for extracting the medial axis. Using Delaunay triangulation and sampling points from the object boundary is the most widely-used method. On the condition sampling points from the object boundary is the most widely-used method. On the conditionthat thatthe the sampling pointsfrom fromthe the object object boundary isare thedense most widely-used method. On the condition that about the sampling points boundary enough, they could contain information sampling points from the object boundary are dense enough, they could contain information about sampling[38,44]. points from the object boundary are denseaxis enough,Delaunay they could contain information about that thatobject object [38,44].Extraction Extractionmethods methodsof ofthe themedial medial axisvia via Delaunaytriangulation triangulationand andthe theVoronoi Voronoi that object [38,44]. Extraction methods of from the medial axis via Delaunay triangulation and theofVoronoi diagram make use of the sampling points the object boundary. As such, the structure Delaunay diagram make use of the sampling points from the object boundary. As such, the structure of diagram make usesampling of the sampling points from the object boundary. As circles such, the structure ofthe triangulation of the is inpoints such a way thatathe peripheral passing from Delaunay thepoints sampling such way that peripheral circles passing Delaunaytriangulation triangulation of of the sampling points isisininsuch a way that thethe peripheral circles passing vertices of each triangle do not encompass any other points. Delaunay triangulation edges are divided from donot notencompass encompass anyother other points. Delaunay triangulation edges fromthe thevertices vertices of of each each triangle triangle do any points. Delaunay triangulation edges into three categories as follows: the first category includes the edges, which restructure the shape are as follows: follows:the thefirst firstcategory category includes edges, which restructure aredivided dividedinto into three three categories categories as includes thethe edges, which restructure boundary, the second category hascategory the edgeshas being located completely inside the shape, andthe theshape, third the shape boundary, the second the edges being located completely inside the shape boundary, the second category has the edges being located completely inside the shape, has the edges being totally outside the shape. The triangulation edges being located totally outside and being totally totallyoutside outsidethe theshape. shape.The The triangulation edges being located andthe thethird third has has the the edges edges being triangulation edges being located the boundary are omitted. Having omitted the extra edges, the structure of the Voronoi diagram is totally are omitted. omitted.Having Havingomitted omittedthe the extra edges, structure of the Voronoi totallyoutside outsidethe theboundary boundary are extra edges, thethe structure of the Voronoi formed by utilizing the peripheral circles of the triangles inside the Delaunay triangulation structure diagram theperipheral peripheralcircles circlesofofthe the triangles inside Delaunay triangulation diagramisisformed formedby by utilizing the triangles inside thethe Delaunay triangulation while the Voronoi diagram vertices are extracted. Yet, these points fail to have a specific order are structure while the Voronoi diagram vertices are extracted. Yet, these points fail to have a specific structure while the Voronoi diagram vertices are extracted. Yet, these points fail to have aand specific merely a set of points. In this research, Algorithm 1, as presented in Figure 5, is employed to connect orderand and are are merely merely a set of 1, 1, as as presented in Figure 5, is5, is order of points. points. In In this thisresearch, research,Algorithm Algorithm presented in Figure the Voronoito diagram In this algorithm, the distance between the points is calculated first, employed to connectvertices. Voronoi diagram vertices. InInthis thethe distance between the the points employed connect the Voronoi diagram vertices. thisalgorithm, algorithm, distance between points and the two points thepoints greatest distance are considered as the first and theas last of the calculated first, that and have the two that have greatest distance areare considered the firstfirst and isiscalculated first, and the two points that havethe the greatest distance considered aspoints the and object. Then the other points are selected according to the shortest distance from their previous points. the last points of the object. Then the other points are selected according to the shortest distance from the last points of the object. Then the other points are selected according to the shortest distance from This continues reaching the ending point of thethe object. their previousuntil points. This continues until reaching ending point of of thethe object. their previous points. This continues until reaching the ending point object.

Figure 5. Pseudo-code representing Algorithm 1. Figure 5. Pseudo-code representing Algorithm 1.

Figure 5. Pseudo-code representing Algorithm 1.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

6 of 21

3.2. Topological Spatial Relationships ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

6 of 21

In order to investigate the topological relationships between the objects, diverse methods have been proposed so far.Spatial The Relationships four-intersection matrix is one of the most extensively applied in this 3.2. Topological context. This matrix a comprehensive report the binary topological spatial In order is to considered investigate theas topological relationshipsmodel betweentothe objects, diverse methods have beenbetween proposed regions, so far. Thelines, four-intersection is one of theon most applied in this relation relationships and pointsmatrix [45,46]. Based thisextensively model, the topological context. matrix is considered as is a comprehensive to report thevalue binary(empty, topological spatial Ra,b between twoThis linear objects A and B characterizedmodel by the binary non-empty) of the relationships between regions, lines, and points [45,46]. Based on this model, the ◦topological relation ◦ set intersection of A’s interior (A ) and boundary (∂A), with the interior (B ) and boundary (∂B) of B R a,b between two linear objects A and B is characterized by the binary value (empty, non-empty) of (Equation the (1)): set intersection of A’s interior (A°) and (B°) and boundary (∂B) of "boundary (∂A), with the interior # ∂A ∩ ∂B ∂A ∩ B◦ B (Equation (1)): Ra,b = (1) ◦ ◦ A◦∂A∩∩∂B ∂A ∩ ∂B B° A ∩ B R a,b = [

A° ∩ ∂B A° ∩ B°

]

(1)

Figure 6 illustrates the possible relationships between two linear objects in a two-dimensional Figure 6 illustrates the possible relationships between two linear objects in a two-dimensional space based onbased this model. space on this model.

Eight kinds of topological relations for in aintwo-dimensional space [46]. Figure 6.Figure Eight6.kinds of topological relations forlines lines a two-dimensional space [46].

3.3. Data Quality Elements

3.3. Data Quality Elements

A large body of research has been conducted to evaluate the quality based on methods which

A large body of research has been to evaluate the quality basedaon methods which involve comparing the VGI with a setconducted of spatial information reference. In these studies, number of elements have been used for quality and by comparing the reference data with the contributed ones, involve comparing the VGI with a set of spatial information reference. In these studies, a number a quantitative estimation is calculated concerning these elements. International Organization for of elements have been used for quality and by comparing the reference data with the contributed Standards [47] defines five quality elements: completeness, logical consistency, positional accuracy, ones, a quantitative estimation is calculated concerning these International for temporal accuracy, and thematic accuracy. In this section, we elements. provide an overview of the Organization method estimatefive positional accuracy and completeness. Standardsused [47]todefines quality elements: completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy. In this section, we provide an overview of the method used 3.3.1. Positional Accuracy to estimate positional accuracy and completeness. In order to estimate the positional accuracy, the proximity between lines in OSM and the reference data is measured and compared. To do so, the following four criteria of Hausdorff distance, 3.3.1. Positional Accuracy orientation, shape, and the overlapped buffer area are considered. For normalization of criteria, Chehreghan and Ali [48] have assessed fuzzy membership functions In order to estimate theAbbaspour positional accuracy, the proximity between lines (Z-shape in OSMfunction) and the reference for checking similarity. The reason for doing so is that it is a suitable function for quantifying data is measured and compared. To do so, the following four criteria of Hausdorff distance, orientation, similarity and dissimilarity between the linear objects from two sources [48]. Hence, the positional shape, andaccuracy the overlapped buffer area are considered. For normalization of criteria, Chehreghan and Ali for each object is computed using Equation (2):

Abbaspour [48] have assessed fuzzy membership functions (Z-shape function) for checking similarity. ∑ 4 𝑤𝑖 𝑐𝑖 𝑄𝑝 = ∑𝑖=1 × 100 4 𝑤 The reason for doing so is that it is a suitable function for quantifying similarity and (2) dissimilarity 𝑖=1 𝑖 between the linear objects from two sources Hence, thethe positional accuracy for each where 𝑄𝑝 represents the spatial similarity[48]. relation between objects, and 𝑤𝑖 represents the object is weights given to the criterion. The values of the criteria are calculated after normalization by a computed using Equation (2): 4 membership function [48]. What follows is a description of the criteria being used: ∑ i =1 wi c i Q p to = estimate × 100 in spatial sciences is the Hausdorff (2) Distance: One of relevant methods the distance 4 distance that estimates the closeness between ∑ thei =linear 1 wi objects [49]. The Hausdorff distance has been the greatest distance between the shortest distance existing between each point from where Q p introduced representsasthe spatial similarity relation between the objects, and wi represents the weights given to the criterion. The values of the criteria are calculated after normalization by a membership function [48]. What follows is a description of the criteria being used: Distance: One of relevant methods to estimate the distance in spatial sciences is the Hausdorff distance that estimates the closeness between the linear objects [49]. The Hausdorff distance has been introduced as the greatest distance between the shortest distance existing between each point from the first object and the set of the points of the second objects [50]. Nonetheless, the Hausdorff distance

ISPRS Int. J. Geo-Inf. 2018, 7, 253

7 of 21

is sensitive to the shape of the two objects, especially to the parts far from the center. Tong et al. [51] introduced the short-line median Hausdorff distance (SMHD), indicating that, in comparison with Hausdorff and median Hausdorff, this method has a lower variance and a more efficient performance when facing complex data to measure the distance between the linear objects [52]. The SMHD between two linear objects can be calculated through Equation (3) where L( PL1 ) and L( PL2 ) are the length of ISPRS Int. J. Geo-Inf.while 2018, 7, xm FOR PEER REVIEW 7 of 21 the two linear objects, ( PL 1 , PL2 ) and m ( PL2 , PL1 ) signify the median of the shortest distance between each point ofand thethe first object andofthe of the[50]. corresponding object. These are calculated the first object set of the points thepoints second objects Nonetheless, the Hausdorff distance using Equations (4)toand (5): of the two objects, especially to the parts far from the center. Tong et al. [51] is sensitive the shape ( Hausdorff distance (SMHD), indicating that, in comparison with introduced the short-line median m f L( PL ( PL ), ai lower ( PL 2 ) efficient performance 1 , PL2has 1) < L Hausdorff and median C Hausdorff, this method variance and a more (3) 1 = m( PL2the , PLdistance when facing complex data to measure between theLlinear ( PL1 ) ≥ ( PL2 )objects [52]. The SMHD 1 ), i f L between two linear objects can be calculated through Equation (3) where 𝐿(𝑃𝐿1 ) and  𝐿(𝑃𝐿2 ) are the P − Lb the median of the PL2 ) =while median min𝑚(𝑃𝐿 (4) ( PL1 ,objects, length of the twomlinear 𝑚(𝑃𝐿 , 𝑃𝐿 Pa1∈ PL21) and Pb ∈ PL 2 ,2𝑃𝐿1 )a signify of the corresponding  and the points shortest distance between each point of the first object object. m(using PL2 ,Equations PL1 ) = median (5) Pb ∈ PL2 min Pa ∈ PL1 Pb − L a These are calculated (4) and (5): 𝑚(𝑃𝐿 𝑖𝑓 the 𝐿(𝑃𝐿linear 1 , 𝑃𝐿2 ),of 1 ) < 𝐿(𝑃𝐿 2) In these Equations, Lb and L a are edges objects PL1 and PL2 , Pa and { 𝐶1 =two (3) Pb are the 𝑚(𝑃𝐿2 , 𝑃𝐿1 ), 𝑖𝑓 𝐿(𝑃𝐿1 ) ≥ 𝐿(𝑃𝐿2 ) points belonging to PL1 and PL2 , respectively. ||.|| denotes the Euclidean distance between a point on 𝑚(𝑃𝐿1 , 𝑃𝐿2 ) = 𝑚𝑒𝑑𝑖𝑎𝑛𝑃𝑎∈𝑃𝐿1 {𝑚𝑖𝑛𝑃𝑏∈𝑃𝐿2 ‖𝑃𝑎 − 𝐿𝑏 ‖} (4) object and one of the edges of another object. ‖ ‖ ) 𝑚(𝑃𝐿 , 𝑃𝐿 = 𝑚𝑒𝑑𝑖𝑎𝑛 {𝑚𝑖𝑛 𝑃 − 𝐿 } (5) 1 𝑃𝑎the ∈𝑃𝐿1 local 𝑏 𝑎 𝑏 ∈𝑃𝐿2 Orientation: This criterion is 2employed to 𝑃compare orientation of two linear objects. In these Equations, 𝐿𝑏 and 𝐿two two edges of the linear objects The orientation difference between objects can play a 𝑃𝐿 substantial role as𝑃𝑏aare geometrical 𝑎 arelinear 1 and 𝑃𝐿2 , 𝑃 𝑎 and ‖. ‖ denotes thethe points belongingoftodata 𝑃𝐿1 and 𝑃𝐿2 , [53]. respectively. the the Euclidean distance from between criterion in evaluation quality The angle between line formed thea beginning point on object and one of the edges of another object. and ending points of the object and the horizontal axis determine the orientation of a linear object. Orientation: This criterion is employed to compare the local orientation of two linear objects. See PL1 and 7, where two linear areplay considered to be while the angle 2 in Figure The PL orientation difference between two linearobjects objects can a substantial roleparallel, as a geometrical in β the evaluation of data quality Thedifference angle between theobjects line formed fromtothe differencecriterion of α and is close to zero. In case the[53]. angle of the is close π, it implies beginning and ending points of the object and the horizontal axis determine the orientation of a linear that these two linear objects are parallel, but have different orientations. The orientation difference object. See 𝑃𝐿1 and 𝑃𝐿2 in Figure 7, where two linear objects are considered to be parallel, while the between two linear objects can be estimated using Equation (6) [54]: angle difference of α and β is close to zero. In case the angle difference of the objects is close to π, it   → different implies that these two linear objects are parallel, → but have orientations. The orientation V . V PL −1  PLusing 2 1 difference between two linear objects be estimated Equation (6) [54]: C2 can = cos (6) → → ⃗ 𝑃𝐿 .𝑉 ⃗ 𝑃𝐿 𝑉 . V 1 2 −1 V 𝐶 = cos ( PL1 PL )2 (6) 2

⃗ 𝑃𝐿 ‖.‖𝑉 ⃗ 𝑃𝐿 ‖ ‖𝑉

1 2 → ⃗ ⃗ where V PL and are the vectors formed from the first and second the V where 𝑉 and 𝑉 are the vectors formed from the first and second nodes of the nodes first and of second PL 𝑃𝐿1 2 𝑃𝐿2 1 objects. second objects.



first and

7. Orientationdifference difference between twotwo linear objects. FigureFigure 7. Orientation between linear objects.

Shape: The linear objects might vary in terms of shape so that the difference in the shape of two

Shape: Theis linear varycriterion in terms of shape so of that difference intwo the shape of objects used asobjects another might well-known for the evaluation the the difference between polylines or close). well-known One of the functions related theevaluation object shape of is the two objects is used(open as another criterion fortothe thecumulative differenceangle between two function [55,56]. To calculate the shape difference between two linear objects, Equation (7) could be polylines (open or close). One of the functions related to the object shape is the cumulative angle adopted in which 𝜃𝑃𝐿1 and 𝜃𝑃𝐿2 are the cumulative angle functions of the linear objects 𝑃𝐿1 and 𝑃𝐿2 , function [55,56]. To [54]: calculate the shape difference between two linear objects, Equation (7) could be respectively adopted in which θ PL1 and θ PL2 are the cumulative angle functions of the linear objects PL1 and PL2 , |𝜃𝑃𝐿1 − 𝜃𝑃𝐿2 |, 𝑖𝑓|𝜃𝑃𝐿1 − 𝜃𝑃𝐿2 | ≤ π 1 (7) respectively [54]: 𝐶3 = ∫0 𝑓(𝜃𝑃𝐿1 , 𝜃𝑃𝐿2 )𝑑𝑠, 𝑓(𝜃𝑃𝐿1 , 𝜃𝑃𝐿2 ) = {2 − |𝜃 − 𝜃 |, 𝑖𝑓|𝜃 − 𝜃 | > π 𝑃𝐿1

C3 =

Z 1 0

(   f θ PL1 , θ PL2 ds, f θ PL1 , θ PL2 =

𝑃𝐿2

𝑃𝐿1

𝑃𝐿2

θ PL − θ PL , i f θ PL − θ PL ≤ π 2 2 1 1 2π − θ PL1 − θ PL2 , i f θ PL1 − θ PL2 > π

(7)

ISPRS Int. J. Geo-Inf. 2018, 7, 253

8 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

8 of 21

3.3.2. Completeness 3.3.2. Completeness The road length is generally used to investigate the completeness of the OSM data in comparison The road length is generally used to investigate the completeness of the OSM data in comparison with the reference data. In order to determine the completeness of a road network, the lengths of roads with the reference data. In order to determine the completeness of a road network, the lengths of from the OSM are compared with the reference data. [57]. This metric is calculated as the ratio of the roads from the OSM are compared with the reference data. [57]. This metric is calculated as the ratio length the OSM object the) length of the of reference objectobject (LRef )(Lin the same area using OSM ) to of theoflength of the OSM(Lobject (LOSM to the length the reference Ref ) in the same area Equation (8) and presented as a percentage. The closer this ratio is to one-hundred, the more using Equation (8) and presented as a percentage. The closer this ratio is to one-hundred, thecoverage more of coverage the OSM of data is obtained: the OSM data is obtained: L QC = LOSM × 100 (8) QC = LOSM Ref × 100 (8) L Ref

4. The Proposed Approach 4. The Proposed Approach Our method proposes an approach based on a Voronoi diagram for enhancing the positional Our proposes approach based on a Voronoi diagram enhancing the positional accuracy ofmethod road network in an OSM. The adopted method includes fivefor stages, as shown in Figure 8. accuracy of road network in OSM. The adopted method includes five stages, as shown in Figure Restoring the historical versions of an object was a prerequisite in this research. With reference to 8. this, Restoring the historical versions of an object was a prerequisite in this research. With reference to during the first stage, the history file of the OSM data, which encompasses the entire information about during the firsthad stage, thecompiled. history fileHaving of the OSM data, which encompasses the entire information allthis, the OSM objects, to be compiled the history file of Tehran, the information about all the OSM objects, had to be compiled. Having compiled the history file of Tehran, the related to the latest version existing in OSM was also extracted. Afterwards, for each object in the information related to the latest version existing in OSM was also extracted. Afterwards, for each latest version, the corresponding objects in all the existing versions in the history file were identified. object in the latest version, the corresponding objects in all the existing versions in the history file To do so, two types of information, namely, semantic and geometric, related to the objects were used. were identified. To do so, two types of information, namely, semantic and geometric, related to the During the third stage, after matching the corresponding objects, preprocessing was performed with objects were used. During the third stage, after matching the corresponding objects, preprocessing thewas intention of identifying and removing the outlier data from the historical versions of each object. performed with the intention of identifying and removing the outlier data from the historical The fourth of stage extracting a representative object afor each object among alleach the object existing versions eachaimed object.atThe fourth stage aimed at extracting representative object for versions. As such, we made use of the medial axis extraction method by implementing the Voronoi among all the existing versions. As such, we made use of the medial axis extraction method by diagram. In the last thediagram. topologyInofthe thelast extracted object for each of the corresponding implementing the stage, Voronoi stage, the topology of category the extracted object for each objects in the roads network was enhanced. To fulfill this,was theenhanced. objects inTo thefulfill extracted version category of the corresponding objects in the roads network this, the objectsare connected to eachversion other by the topology of the objectsthe intopology the history file, and this may in the extracted areinvestigating connected to each other by investigating of the objects in the increase the certainty in the output version. What follows is an explanation of the method involved history file, and this may increase the certainty in the output version. What follows is an explanation in of the method involved in identifying the history corresponding objects in as thewell history filemethod (Section 4.1), as in identifying the corresponding objects in the file (Section 4.1), as the involved as the method and removal thewell identification andinvolved removalin ofthe theidentification outlier data (Section 4.2). of the outlier data (Section 4.2).

Figure8.8.Flowchart Flowchartof of the the implementation implementation of Figure ofthe theproposed proposedmethod. method.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

9 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

9 of 21

4.1. Object Object Matching Matching 4.1. Object matching matching refers refers to to the of the Object the identification identification of the objects objects with with an an equivalent equivalent entity entity in in several several datasets [58–61]. The main aim of object matching was to identify the corresponding objects across datasets [58–61]. The main aim of object matching was to identify the corresponding objects across historical versions versionsin inthe theOSM OSMhistory historyfile. file. general, three aspects, as semantic, topologic, historical In In general, three aspects, suchsuch as semantic, topologic, and and geometric, are considered for the object matching process based on which similarity of the objects geometric, are considered for the object matching process based on which similarity of the objects is is evaluated to identify the corresponding objects [62]. evaluated to identify the corresponding objects [62]. Thereafter, we corresponding objects objects in in the the linear linear dataset dataset using using two two types types of of Thereafter, we identified identified the the corresponding semantic and geometric information of the objects through two stages. In the first stage, the semantic semantic and geometric information of the objects through two stages. In the first stage, the semantic information attributed the spatial spatial data data in in OSM OSM was was used. used. Figure Figure 99 shows shows the the semantic semantic information information information attributed to to the about aa linear object, including including the the ID ID of of the the object, object, the the type type of of the the object object (primary (primary road, road, secondary secondary about linear object, road, etc.), the traffic route (one-way or not), and maximum speed, among others. In many cases, road, etc.), the traffic route (one-way or not), and maximum speed, among others. In many cases, the the semantic information related to an object varies in historical versions. This is because the producers semantic information related to an object varies in historical versions. This is because the producers of such such data data might might have have an of aa certain certain object object or or might might only only be be able to of an improper improper understanding understanding of able to diagnose and perceive limited aspects of a certain spatial object [9]. For example, many users might diagnose and perceive limited aspects of a certain spatial object [9]. For example, many users might mistakenly label label the the road road type Therefore, the the semantic mistakenly type (highway, (highway, trunk trunk road, road, primary primary road, road, etc.). etc.). Therefore, semantic information related to the OSM objects is often deficient, implying that a mere reliance on the semantic information related to the OSM objects is often deficient, implying that a mere reliance on the information for identification of the corresponding objects is not sufficient. semantic information for identification of the corresponding objects is not sufficient.

Figure 9. An example of tags of Mirza Shirazi Street in OSM. Figure 9. An example of tags of Mirza Shirazi Street in OSM.

In the second stage, buffer analysis was conducted as a method to identify the corresponding In the stage, buffer was conductedbetween as a method to identify the from corresponding objects. Thesecond common buffer areaanalysis is the area overlapped the regions resulted the buffer objects. The common buffer area is the area overlapped between the regions resulted from theobjects buffer created for two linear objects [63]. For example, in Figure 10, for identifying the corresponding created for two linear objects [63]. For example, in Figure 10, for identifying the corresponding objects with a different semantic label, the common buffer area was used. with Equation a different(9) semantic label, the common buffer was used. was used in order to estimate the area common buffer area of the objects, where 𝐴𝑃𝐿1 is Equation (9) was used in order to estimate the common of the objects, A PL1 is the resultant buffer area for the first linear object, while 𝐴𝑃𝐿2 buffer standsarea for the buffer areawhere of the second the resultant forarea the first linear object, while Aby stands the buffer areaThe of the second PL2the linear object, buffer and 𝐴𝑖area is the of the region overlapped two for resultant buffers. closer the linear object, and A is the area of the region overlapped by the two resultant buffers. The closer the i obtained results are to each other, the more similar the two objects are in terms of geometry: obtained results are to each other, the more similar the two objects are in terms of geometry: 2𝐴 𝐶4 = 𝐴 +𝐴𝑖 (9) 𝑃𝐿1 2Ai𝑃𝐿2 C4 = (9) A PL1 + A PL2

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

ISPRS Int. J. Geo-Inf. 2018, 7, 253

10 of 21

10 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

10 of 21

Figure 10. An example of the buffer method. Figure 10.10.An buffermethod. method. Figure Anexample example of of the the buffer

4.2. Identification and Removal of Outliers 4.2. Identification Removal of Outliers 4.2. Identification and and Removal of Outliers The process involved in the identification and removal of the outlier data is typically done with The process involved in the identification and removal of the outlier data is typically done with The process involved aspects in the identification and removal of the outlier data for is typically done with a a focus on the geometrical of of data and techniques cleansing a focus on the geometrical aspects data andcommonly commonly includes includes techniques for cleansing datadata fromfrom focus on thetogeometrical of data and commonlyofincludes for cleansingthe data from errors. Due the thataspects all the versions certaintechniques objectwas was accessible, criteria errors. Due tofact the fact that all the versionsand andchanges changes of aa certain object accessible, the criteria errors. Due to the fact that all the versions and changes of a certain object was accessible, the criteria to to choose the invalid object waswas thethe number ofan anobject’s object’s spatial location. to choose the invalid object numberofofthe the repetition repetition of spatial location. ThisThis choose the invalid object was the number of the repetition of an object’s spatial location. This process process performed in two stages(Figure (Figure11). 11). During During the linear object was was process was was performed in two stages the first firststage, stage,each each linear object was performed in two stages (Figure 11). During the first stage, each linear object was converted converted into a point set using the sampling method. Using a sampling technique, we take the point converted into a point set using the sampling method. Using a sampling technique, we take the point into abetween point the method. Using a the sampling technique, we take the point between theusing two points of the linear object, except for the beginning beginning and end at specified intervals. between the set two points of sampling the linear object, except for andthe the end at specified intervals. In the second stage, the object, outlier data in the corresponding objects of end a specific object are identified the two points of the linear except for the beginning and the at specified intervals. In the In the second stage, the outlier data in the corresponding objects of a specific object are identified using density-based spatial clustering of applications with noise (DBSCAN) algorithm [64]. seconddensity-based stage, the outlier dataclustering in the corresponding objectswith of a noise specific(DBSCAN) object are identified using spatial of applications algorithm using [64]. Accordingspatial to Algorithm 2, presented in Figure 11, each(DBSCAN) sample point is primarily weighted. density-based clustering of applications with noise algorithm [64]. According to According to Algorithm 2, presented in Figure 11, each sample point is primarily weighted. Weighting is done based on the number of points in each point’s buffer, i.e., the points which are less Algorithm is2,done presented in 11, each sample is primarily weighted. Weighting is done Weighting onFigure theasnumber ofexperiments points in point each point’s buffer,receive i.e., the points which are less than the radiusbased (e.g., valued 5 m in the developed) would a significance degree based on the number of points in each point’s buffer, i.e., the points which are less than the radius than the as 5 m in the experiments developed) receive as a significance degree andradius in case(e.g., theirvalued significance is less than a certain threshold, they would are considered outliers and are (e.g.,inconsequently valued as 5significance m in the experiments would receive a significance degree andand in and case their is lessweights thandeveloped) ago certain threshold, theythreshold, are considered outliers are removed. If their beyond the specified they willas be retained in case their significance is less than a certain threshold, they are considered as outliers and are consequently the main dataset. consequently removed. If their weights go beyond the specified threshold, they will be retained in removed. If their weights go beyond the specified threshold, they will be retained in the main dataset. the main dataset.

Figure 11. Pseudo-code depicting Algorithm 2.

Figure 11. 11. Pseudo-code Pseudo-code depicting depicting Algorithm Algorithm 2. 2. Figure

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW ISPRS Int. J. Geo-Inf. 2018, 7, 253

11 of 21 11 of 21

5. Implementation 5. Implementation 5.1. Case Studies 5.1. Case Studies In this study, district 6 of Tehran’s municipality with about 21 km2 area was selected for our In this study, 6 of Tehran’s with about 21 km area selected for was our investigation. For district implementation, the municipality suggested comprehensive file2 of thewas OSM history investigation. implementation, the suggested comprehensive file To of the OSM history was file extracted extracted fromFor http://planet.openstreetmap.org/ in the PBF format. separate the history of the from http://planet.openstreetmap.org/ in the PBF format. To separate file of thesoftware chosen chosen study area from the comprehensive file, the history osmconvert study area from the comprehensive file, osmconvertwas software (http://wiki.openstreetmap.org/wiki/Osmconvert) used.(http://wiki.openstreetmap.org/wiki/ The format of the history file was Osmconvert) wasPBF used. format of the file different was converted from into OSM. In this converted from intoThe OSM. In this file,history there are versions of PBF the points, ways, and file, there respectively. are different We versions of the points, ways, and Weand concentrated relations, concentrated on enhancing therelations, quality ofrespectively. the linear data the linear on enhancing the quality the linearfile. dataFigure and the datathe were separated the history file. data were separated fromof the history 12alinear presents history file of from the study area while Figure thelatest history file ofofthe area while Figure 12b shows the latest version the Figure 12a 12b presents shows the version thestudy information about the objects existing in OSM. Theof users information in OSM. users have 24 labels classify the streets have used 24about labelsthe to objects classifyexisting the streets in theThe study area. Theused objects havingtothe following labels in the study area. Theobjects objects having following motorway, labels which include 37,238secondary, objects that were which include 37,238 that were the investigated: trunk, primary, tertiary, investigated: motorway, trunk, primary, secondary, tertiary, and residential. In order to evaluate data and residential. In order to evaluate data quality, a reference dataset generated by Tehran quality, a reference dataset generated a cartographic 1:2000 used, Municipality at a cartographic scaleby of Tehran 1:2000 Municipality was used, as at shown in Figurescale 12c. of The OSMwas datasets as shown in Figure 12c. The OSM datasets were provided in the Coordinate Systemwhich with were provided in the Geographical Coordinate System with theGeographical WGS-1984 reference ellipsoid, the reference ellipsoid, which zone was projected onto the UTM projection, zone 39. wasWGS-1984 projected onto the UTM projection, 39.

Figure 12. 12. Study area: (a) the OSM data history file; (b) the latest version of the OSM; and and (c) (c) the the Figure reference dataset. dataset. reference

5.2. Identification Identification of of the the Corresponding Corresponding Objects Objects in in the the History History File File 5.2. At this thisstage, stage,atatfirst firstthe thecorresponding corresponding objects history were identified; afterwards, of At objects in in thethe history filefile were identified; afterwards, of the the corresponding objects one object, a corresponding feature was produced for that object by corresponding objects of oneofobject, a corresponding feature was produced for that object by extracting extracting medial axis. Thehistory complied filethe includes thecontributions historical contributions an object the medial the axis. The complied filehistory includes historical of an objectoffrom 2007 from 2007 to 2017. to 2017. Identifyingthe thecorresponding corresponding objects for each in historical versions was conducted Identifying objects for each objectobject in historical versions was conducted through through two In step, the first step, by investigating the semantic information in thefile, history file, it two steps. In steps. the first by investigating the semantic information in the history it became became evident that the label of the object ID was identical in almost all versions of the changes of an evident that the label of the object ID was identical in almost all versions of the changes of an object. object. this reason, to identify the corresponding objects, in thefile history file was For For thisFor reason, to identify the corresponding objects, the ID inthe theID history was used. Forused. example, example, in for Figure 13L1 forwith object with object ID 429262, object L2 was to considered to bebased equivalent in Figure 13 object ID L1 429262, L2 was considered be equivalent on thebased same on The the same The second step, identifying which involves identifying the corresponding objects with a ID. secondID. step, which involves the corresponding objects with a different semantic different semantic buffer label, the was used as the fulfil this, for both label, the common areacommon was usedbuffer as thearea criterion. To fulfil this,criterion. for both To linear objects, a buffer linear objects, buffer analysis with a distance of 10 m (valued in the experiments to developed) analysis with a adistance of 10 m (valued in the experiments developed) corresponding its width corresponding itsoverlapping width was created andtwo thebuffers overlapping area ofEquation the two (9) buffers is estimated. was created andtothe area of the is estimated. was employed to

ISPRS Int. J. Geo-Inf. 2018, 7, 253 ISPRSInt. Int.J.J.Geo-Inf. Geo-Inf.2018, 2018,7,7,xxFOR FORPEER PEERREVIEW REVIEW ISPRS

12 of 21 12of of21 21 12

compute the(9) objects common to area. In case this valuecommon is largerarea. thanIn the threshold, theisistwo objects could Equation (9) was employed employed to compute compute the objects common area. In case thisvalue value larger thanthe the Equation was the objects case this larger than bethreshold, identified as two corresponding objects. threshold, the thetwo two objects objects could couldbe beidentified identified as as two two corresponding corresponding objects. objects.

Figure 13. An exampleofof ofidentifying identifying the the corresponding corresponding objects: objects: (a) using the IDID inin the history file; Figure An example identifying the the ID in the history file; Figure 13.13. An example corresponding objects:(a) (a)using using the the history file; and(b) (b)calculating calculating the thecommon commonbuffer buffer area. area. and and (b) calculating the common buffer area.

5.3. Preprocessing Preprocessing 5.3. 5.3. Preprocessing Since OSM OSM data data are are retrieved retrieved with with some some topological topological errors, errors, data data preprocessing preprocessing is is important important to to Since Since OSM data are retrieved with some topological errors, data preprocessing is important minimize the the possible possible outliers. outliers. The The preprocessing preprocessing encompasses encompasses the the reduction reduction and and removal removal of of the the to minimize minimize the possible outliers. The preprocessing encompasses the reduction and removal of outliers from from the the spatial spatial information. information. In In practice, practice, the the geometric geometric information information of of an an object object would would be bethe outliers outliers from the spatial information. In practice, the geometric information of an object would be considered when when identifying identifying and andremoving removingthe the error error from from the thehistorical historicalversions versions of of an an object. object. considered considered when identifying the error from the historical versions an object. and To eliminate eliminate the outlier outlierand dataremoving in the the corresponding corresponding objects of aa certain certain object, aaof point-based and To the data in objects of object, point-based To eliminate thewas outlier data inin the corresponding a certain object, a corresponding point-based and two-stage solution was introduced in Section 3.2, during duringobjects which,of firstly, the set set of the the corresponding two-stage solution introduced Section 3.2, which, firstly, the of two-stage solution was introduced in Section 3.2, during which, firstly, the set of the corresponding objects for for each each object object was was converted converted into into points points by by sampling, sampling, while while during during the the second second stage stage the the objects objects fordata eachwere object was converted into points by on sampling, while during the second stage the outlier outlier data were identified and removed removed based on the the predefined predefined threshold using the the DBSCAN DBSCAN outlier identified and based threshold using method [65]. For For instance, instance, is shown shown inon Figure 14 that that five fivethreshold points (P (P11using to PP55the with weightmethod less than than method itit is Figure 14 points to with aa weight less data were [65]. identified and removed basedin the predefined DBSCAN [65]. 2) were identified as the outlier points and were, accordingly, removed from the set of the points. 2) were identified as the outlier points and were, accordingly, removed from the set of the points. For instance, it is shown in Figure 14 that five points (P1 to P5 with a weight less than 2) were identified Having identified and removed removed the outlier outlier points, the the medial axis process was carriedidentified out on on the the identified the points, medial axis process was carried out as Having the outlier points and and were, accordingly, removed from the set of the points. Having and data. the outlier points, the medial axis process was carried out on the data. data. removed

Figure 14. outlier pointsin inall allversions versionsof an object. Figure 14.An Anexample exampleof ofidentifying identifyingoutlier outlier points points in all versions ofof an object. Figure 14. An example of identifying an object.

5.4. Extracting the MedialAxis Axis 5.4. Extracting the Medial Axis 5.4. Extracting the Medial The aim of this stagewas wasto toextract extractaarepresentative representative feature feature for an object among allall the versions The aim this stage was to extract representative all the versions The aim ofof this stage featurefor foran anobject objectamong among the versions existing in the history file. Therefore, the method proposed in Section 3.1 was used to extract the existing in the history file. Therefore, the method proposed in Section 3.1 was used to extract the existing in the history file. Therefore, the method proposed in Section 3.1 was used to extract the medial

ISPRS Int. J. Geo-Inf. 2018, 7, 253 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

13 of 21 13 of 21

axis. In the first stage, a buffer was created (e.g., valued as 10 m in the experiments developed) around themedial pointsaxis. sampled each aobject 15a).(e.g., In the second stage, triangulation In the from first stage, buffer(Figure was created valued as 10 m inthe the Delaunay experiments developed) of the points sampled each of object (Figure points 15a). Inwas theformed second(Figure stage, 15b). the Delaunay thearound set of points resultant from from the buffer the sampled In the third triangulation setformed of points resultantinfrom the buffer of the sampled points formed (Figure stage, the edgesofofthe the triangles the Delaunay triangulation, whichwas are determined from In the third stage, the edges of the 15c). formed triangles the Delaunay triangulation, which are the15b). object’s border, are removed (Figure In the fourthinstage, after triangulation of the sampled determined from the object’swas border, are removed 15c). Inthe the next fourthstage, stage,the after triangulation points, the Voronoi diagram created (Figure (Figure 15d). During Voronoi diagram of the sampled points, the Voronoi diagram was created (Figure 15d). During the next stage, axis the is vertices were extracted (Figure 15e). Finally, the diagrams vertices are connected and the medial Voronoiusing diagram vertices 2were extracted extracted Algorithm (Figure 15f). (Figure 15e). Finally, the diagrams vertices are connected and the medial axis is extracted using 2 (Figure 15f).executed to simplify and smoothen the However, it should be noted that aAlgorithm refinement stage was However, it should be noted that a refinement stage was executed to simplify and smoothen the border of the buffer before estimating the medial axis because of the fact that the number of polygon border of the buffer before estimating the medial axis because of the fact that the number of polygon points obtained from the buffer of the sampled points was high. After simplifying the boundary of points obtained from the buffer of the sampled points was high. After simplifying the boundary of the shape, the medial axis was extracted. The medial axis also had numerous details and nuances the shape, the medial axis was extracted. The medial axis also had numerous details and nuances and and the oversampling problem. Therefore, a post-processing step was required after calculating the the oversampling problem. Therefore, a post-processing step was required after calculating the medial axis to remove additional branches of the medial axis and maintain the main branches. In this medial axis to remove additional branches of the medial axis and maintain the main branches. In this study, two radial distancesimplification simplificationalgorithms algorithms were implemented. study, two radialdistance distanceand andperpendicular perpendicular distance were implemented. The radial points in in the thevicinity vicinityofofthe thespecific specific radius, while The radialdistance distancealgorithm algorithmeliminates eliminates the the points radius, while thethe perpendicular distance algorithm is an iterative process so that a point among three points is removed, perpendicular distance algorithm is an iterative process so that a point among three points is if the distance of the midpoint from the line passing from the other is less thanisthe removed, if the distance of the midpoint from the line passing fromtwo thepoints other two points lessspecified than threshold value. the specified threshold value.

Figure 15.15. The the Voronoi Voronoidiagram diagrammethod: method:(a)(a) a sample point Figure Themedial medialaxis axisapproximation approximation using using the a sample point of of thethe shape boundary; (b) Delaunay triangulation of the boundary points; (c) discarding triangles that shape boundary; (b) Delaunay triangulation of the boundary points; (c) discarding triangles that areare outside the shape; (d) applying the Voronoi diagram; (e) extracting the Voronoi diagram vertices; outside the shape; (d) applying the Voronoi diagram; (e) extracting the Voronoi diagram vertices; and (f)(f) connecting on Algorithm Algorithm2.2. and connectingthe theVoronoi Voronoidiagram’s diagram’svertices vertices based based on

ISPRS Int. J. Geo-Inf. 2018, 7, 253

14 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

14 of 21 14 of 21

5.5. Topological Check ofofof the 5.5. Topological Check theObjects Objects 5.5. Topological Check the Objects At thisstage, stage,the thetopology topologyof of the object extracted forfor each ofthe the corresponding objects the in AtAt this topology ofthe theobject object extracted each ofcorresponding the corresponding objects this stage, the extracted for each of objects ininthe road network has to be checked. The topological errors in the created version are eliminated by the road network has to be checked. The topological errors in the created version are eliminated road network has to be checked. The topological errors in the created version are eliminated by examining how the objects are connected to each other in the history of that object. Hence, the byexamining examining how objects connected to each inhistory the history of object. that object. how thethe objects are are connected to each otherother in the of that Hence, Hence, the connections ofof each object toto other objects were examined thethe history filefile sothat, that, forfor each object, the connections each object other objects were examined in history so that, each object, connections of each object to other objects were examined ininthe history file so for each object, every version was inspected to determine what type of topological relation they have with the other every version was inspected to determine what type of topological relation they have with the other every version was inspected to determine what type of topological relation they have with the other objects. A four-intersection topological model was used to determine the topological relationships objects. modelwas wasused usedtotodetermine determinethe thetopological topological relationships objects.AAfour-intersection four-intersection topological topological model relationships between linear objectsin two-dimensionalspace. space.Using Usingthis this model, the topological information of of between linear objects ininaaatwo-dimensional two-dimensional topological information of between linear objects space. Using thismodel, model,the the topological information the dataset in the history file was extracted. The topological information extracted from the history, the datasetininthe thehistory historyfile file was was extracted. extracted. The extracted from thethe history, the dataset Thetopological topologicalinformation information extracted from history, alongwith withthe thetype typeof oftheir theirtopological topologicalrelationship, relationship,was wasrecorded recordedin inaatable. table.For Forexample, example,ininFigure Figure along along with the type of their topological relationship, was recorded in a table. For 16, 16,the thetopology topologyrelationships relationshipsbetween betweenthe the fourversions versionsof ofthe theLLline line and LL2 line linewere weremarked marked and 16, 2 were the topology relationships between the fourfour versions of the L line andand L2 line marked andand stored storedin inaadescriptive descriptivetable. table.According Accordingto to thetable, table,the thefirst firstversion versionof ofthe the lineLLhas hasaa “disjoint” “disjoint” instored a descriptive table. According to the table,the the first version of the line L hasline a “disjoint” topological topological relationship with line L ; nonetheless, the other three versions, which also include the topological with relationship line L22; nonetheless, the versions, other three versions, which also the relationship line L2 ; with nonetheless, the other three which also include theinclude latest version latest version of the site, have a “meet” topological relationship with L line. Since these data were version ofathe site, have a “meet”relationship topological relationship with L22these line. Since data were by oflatest the site, have “meet” topological with L2the line. Since data these were recorded recorded bycontributors contributors inOSM, OSM,the thebasis basisfor forrecognizing recognizing topological relationship between two recorded by in the topological relationship between two contributors in OSM, the basis for recognizing the topological relationship between two linear objects linearobjects objectswas wasthe thenumber numberof ofrepetitions repetitionsof ofaatopological topologicalrelationship relationshipbetween betweenthe thetwo twoobjects objectsinin linear was theeditions numberofof repetitions of a topological relationship between the two objects in two editions two two object. According to the results of the descriptive table, it is expected that thetwo two of two editions of two object. According to the results of the descriptive table, it is expected that the two object. According to the results of the descriptive table, it is expected that the two newly-created newly-created versions versions from from the the two two linear linear objects objects LL and and LL2 will will have have the the “meet” “meet” topological topological newly-created 2 versions from the two linear objects L and L2 will have the “meet” topological relation. relation. relation.

Figure 16.An Anexample exampleof ofthe thetopology topologyrelationships relationshipsbetween between two linear objects. Figure 16. the topology relationships between two linear objects. Figure 16. An example two linear objects.

Afterdiscovering discoveringthe thespatial spatialrelationships relationshipsbetween between theobjects, objects, it isnecessary necessaryto tocorrect correctthe themap map After After discovering the spatial relationships betweenthe the objects,it itis is necessary to correct the map and fix the errors. Two examples of common errors in the linear data are overshoot and undershoot and theerrors. errors.Two Twoexamples examples of of common common errors and undershoot and fixfixthe errorsin inthe thelinear lineardata dataare areovershoot overshoot and undershoot errors,which whichboth both implyaalack lackof ofprecise precisealignment alignmentof ofthe thelines linesininthe theconnection connectionspot. spot.IfIfany anyobject object errors, errors, whichovershoot both imply imply aundershoot lack of precise alignment of the lines inerrors. the connection spot. If any encounters and errors, it is necessary to fix these For example, in Figure encounters overshoot and undershoot errors, iterrors, is necessary to fix thesetoerrors. For example, in Figure object encounters overshoot and undershoot it is necessary fix these errors. For example, 17,line lineL1 L1has hasan anovershoot overshooterror, error,while whileL2 L2line linehas hasan anundershoot undershooterror, error,ininwhich whichthe thedistances distancesof of 17, in Figure 17, line L1 has an overshoot error, while L2 line has an undershoot error, in which the overshootand andundershoot undershoothave haveto tobe beresolved. resolved. overshoot distances of overshoot and undershoot have to be resolved.

Figure17. 17.Overshoot Overshootand undershoot errors. Figure Overshoot andundershoot undershooterrors. errors. Figure 17.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

15 of 21

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

15 of 21

5.6. 5.6. Evaluation Evaluation of of the the Approach Approach The The proposed proposed method method for for improving improving the the positional positional accuracy accuracy of of the the data data exploits exploits the the inherent inherent characteristics of the historical contributions (i.e., lineage) from multiple users. In Figure 18, characteristics of the historical contributions (i.e., lineage) from multiple users. In Figure 18, several several examples are displayed thewhich objects, which were enhanced our approach. proposed examples are displayed showingshowing the objects, were enhanced using our using proposed approach. Linear objects are closer to the reference data in the compiled version, while having an Linear objects are closer to the reference data in the compiled version, while having an enhanced enhanced quality compared to the latest version in OSM. Furthermore, we attempted to improve quality compared to the latest version in OSM. Furthermore, we attempted to improve the topological the topologicalofrelationships of the the produced objects in version the produced version by therelationships topological relationships the objects in by examining theexamining topological relationships between the linear objects in the history file, which is shown in Figure 18. between the linear objects in the history file, which is shown in Figure 18.

Figure 18. An example example of of aa reference Figure 18. An reference dataset, dataset, the the latest latest version version of of OSM, OSM, and and extracted extracted datasets. datasets.

In this quality elements, namely, completeness and positional accuracy, were In this study, study, two twodata data quality elements, namely, completeness and positional accuracy, estimated to evaluate the proposed method. To fulfil this, the quality of the latest version of the were estimated to evaluate the proposed method. To fulfil this, the quality of the latest version dataset in OSM and the extracted by the solution was was determined based on the of the dataset in OSM anddataset the dataset extracted byproposed the proposed solution determined based on difference with the reference dataset. the difference with the reference dataset. The positional accuracy of the dataset was estimated, while the latest version of information of the object available in OSM, in comparison with the reference dataset, was created by calculating the geometric differences. Four criteria, such as distance, shape, orientation, and the common buffer area

ISPRS Int. J. Geo-Inf. 2018, 7, 253

16 of 21

The positional accuracy of the dataset was estimated, while the latest version of information of the object in7,OSM, in comparison with the reference dataset, was created by calculating the ISPRS Int. J. available Geo-Inf. 2018, x FOR PEER REVIEW 16 of 21 geometric differences. Four criteria, such as distance, shape, orientation, and the common buffer area were were considered considered to to numerically numerically evaluate evaluate the the geometric geometric quality quality of the objects and a weight of 0.62, 0.7, 0.31, and 0.65 was given to each of these parameters, respectively, as proposed by [48]. After 0.31, and 0.65 was given to each of these parameters, respectively, as proposed by calculating [48]. After the effect of the different the difference of the positional Equation (2) was used to calculating effectparameters of differenton parameters on the difference of precision, the positional precision, Equation estimate the positional accuracy. The result obtained through this evaluation is illustrated in Figure (2) was used to estimate the positional accuracy. The result obtained through this evaluation 19. is Our early findings positional accuracy of OSM features can beofimproved through illustrated in Figureconfirm 19. Ourthat earlythe findings confirm that the positional accuracy OSM features can retrieving their historical contributions from OSM. be improved through retrieving their historical contributions from OSM.

Figure 19. 19. (a) The computed positional accuracy of the latest version of OSM dataset; dataset; and (b) (b) the the Figure computed positional positional accuracy accuracy of of the the enhanced enhanced dataset. dataset.

To evaluate the completeness, the typical approach of comparing the length of the two datasets To evaluate the completeness, the typical approach of comparing the length of the two datasets was applied. Completeness of the dataset was determined by calculating the total length of the roads was applied. Completeness of the dataset was determined by calculating the total length of the roads in the latest OSM version compared with the extracted dataset and, subsequently, compared with the in the latest OSM version compared with the extracted dataset and, subsequently, compared with reference map in the same area. The results obtained through these comparisons are exhibited in the reference map in the same area. The results obtained through these comparisons are exhibited in Figure 20, proving that the proposed method improved the completeness in the compiled version. Figure 20, proving that the proposed method improved the completeness in the compiled version.

ISPRS Int. J. Geo-Inf. 2018, 7, 253 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

17 of 21 17 of 21

Figure 20. 20. (a) (a) Length Length percentage percentage of of the the latest latest version version of of the the OSM OSM dataset; dataset; and and (b) (b) the the length length percentage percentage Figure of the enhanced dataset. of the enhanced dataset.

The evaluation results in Figures 19 and 20 reveal that the proposed method has considerably The evaluation results in Figures 19 and 20 reveal that the proposed method has considerably enhanced the positional accuracy and completeness of OSM. Statistically speaking, the mean value enhanced the positional accuracy and completeness of OSM. Statistically speaking, the mean value of positional accuracy was enhanced from 82.5% to 95.3% using our proposed method. Additionally, of positional accuracy was enhanced from 82.5% to 95.3% using our proposed method. Additionally, from a completeness viewpoint, the completeness of roads in the study area was improved from from a completeness viewpoint, the completeness of roads in the study area was improved from 86.2% 86.2% to 97.1% compared to the reference dataset. This implies that our proposed method can to 97.1% compared to the reference dataset. This implies that our proposed method can substantially substantially reduce the quality differences between the raw OSM data and its reference dataset. reduce the quality differences between the raw OSM data and its reference dataset. 6. Discussion and Conclusions 6. Discussion and Conclusions OSM, as the most successful instance of VGI projects, offers numerous unique merits, in OSM, as the most successful instance of VGI projects, offers numerous unique merits, in particular, particular, retrieval of historical contributions. On the other hand, studies have proven that a larger retrieval of historical contributions. On the other hand, studies have proven that a larger number of number of contributors leads to better data quality in VGI based on the fact that data quality is better contributors leads to better data quality in VGI based on the fact that data quality is better if more if more people get involved and contribute. However, there is a high probability that some people get involved and contribute. However, there is a high probability that some contributors might contributors might reduce the quality of a given feature by eliminating some information around a reduce the quality of a given feature by eliminating some information around a feature. Therefore, feature. Therefore, the historical contributions should include some useful information for improving the historical contributions should include some useful information for improving data quality. Hence, data quality. Hence, the current research was designed to propose an approach to improve the quality the current research was designed to propose an approach to improve the quality of the contributions of the contributions in OSM through the long-term historical edits. In this study, all the historical in OSM through the long-term historical edits. In this study, all the historical edits from 2007 to 2017 edits from 2007 to 2017 were used to extract representative features for the street network of the study were used to extract representative features for the street network of the study area. The choice of area. The choice of Tehran as a case study aimed at exploring the feasibility of the proposed approach Tehran as a case study aimed at exploring the feasibility of the proposed approach in areas with in areas with limited contributors and the lack of a strong mapping community. Finally, the results limited contributors and the lack of a strong mapping community. Finally, the results of the study of the study were evaluated against a high-quality reference dataset. The resultant maps and quality were evaluated against a high-quality reference dataset. The resultant maps and quality measures measures reveal that our approach was successful to enhance the positional accuracy and completeness of OSM data considerably while resolving topological errors.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

18 of 21

reveal that our approach was successful to enhance the positional accuracy and completeness of OSM data considerably while resolving topological errors. The results prove the capability of the proposed method to improve the quality of VGI data. However, it should be noted that the results are sensitive to temporality of the contributions because, in a dynamic landscape, some features might have changed in the chosen timeframe, while the reference data may not have been updated accordingly. Therefore, the results of our approach, as well as the quality of the reference data, might have been biased by data temporality. Future studies should focus on determining the weights of OSM features based on their currency so that the latest contributions receive higher weights. One of the unique strengths of OSM is the availability of historical edits, which allows us to see behind the scenes of contributions and how they have evolved over time. Therefore, the history of OSM contributions is certainly a rich resource for a number of purposes, including (a) tracking the quality of contributions over time in order to enhance the latest version; (b) monitoring changes across the landscape since the OSM life cycle exceeds 14 years so far; (c) subsequently, generating multi-temporal datasets instead of representing a static time; (d) mining users’ behaviors in mapping practices, i.e., what they map correctly and incorrectly; (e) how an object constantly remains persistent/edited over time; (f) historical completion of semantic/attribute information of objects; (g) cross-linking data quality with user behaviors and their socioeconomic history; and (h) ranking users’ activities in OSM. Future studies should focus on exploiting the history of objects towards addressing these issues. Finally, the role of a number of contributors is evident while looking at historical edits for quality enhancement. However, this factor might still be challenging in some areas with inadequate contributors contributing to creating maps of that particular region. This is more crucial in developing countries with scarce mapping communities compared to active mapping communities across the globe. Thus, the OSM mapping communities should extend the mapping events and strengthen the motivating mechanisms for further user engagement across the globe. Author Contributions: A.N. is a master’s student at University of Tehran supervised by R.A.A. and advised by J.J.A., A.N., R.A.A., and A.C. conceived and designed the methodology while A.N. and A.C. implemented the methodology. A.N. and A.C. wrote the main draft of the paper; while R.A.A. and J.J.A. have critically revised and extended the paper. Funding: This research received no external funding. Conflicts of Interest: The authors declare no conflict of interest.

References 1. 2. 3. 4. 5. 6. 7. 8. 9.

Ballatore, A.; Jokar Arsanjani, J. Placing Wikimapia: An exploratory analysis. Int. J. Geogr. Inf. Sci. 2018, 32, 1–18. [CrossRef] Hashemi, P.; Abbaspour, R.A. Assessment of logical consistency in OpenStreetMap based on the spatial similarity concept. In Openstreetmap in Giscience; Springer: Berlin, Germany, 2015; pp. 19–36. Goetz, M. Towards generating highly detailed 3D CityGML models from OpenStreetMap. Int. J. Geogr. Inf. Sci. 2013, 27, 845–865. [CrossRef] OSM Statistics. Available online: http://wiki.Openstreetmap.Org/wiki/stats (accessed on 24 June 2017). Full History Dump. Available online: http://wiki.Openstreetmap.Org/wiki/planet.Osm/full (accessed on 23 June 2017). Barron, C.; Neis, P.; Zipf, A. A comprehensive framework for intrinsic OpenStreetMap quality analysis. Trans. GIS 2014, 18, 877–895. [CrossRef] Grira, J.; Bédard, Y.; Roche, S. Spatial data uncertainty in the VGI world: Going from consumer to producer. Geomatica 2010, 64, 61–72. Feick, R.; Roche, S. Understanding the value of VGI. In Crowdsourcing Geographic Knowledge; Springer: Berlin, Germany, 2013; pp. 15–29. De Longueville, B.; Ostländer, N.; Keskitalo, C. Addressing vagueness in volunteered geographic information (VGI)—A case study. Int. J. Spat. Data Infrastruct. Res. 2010, 5, 1725–0463.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

10.

11. 12. 13.

14. 15. 16.

17. 18. 19. 20. 21. 22. 23. 24. 25.

26. 27.

28.

29. 30. 31.

19 of 21

Kounadi, O. Assessing the Quality of OpenStreetMap Data. Master’s Thesis, Geographical Information Science, Department of Civil, Environmental and Geomatic Engineering, University College of London, London, UK, 2009. Girres, J.F.; Touya, G. Quality assessment of the french OpenStreetMap dataset. Trans. GIS 2010, 14, 435–459. [CrossRef] Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703. [CrossRef] Zielstra, D.; Zipf, A. A comparative study of proprietary geodata and volunteered geographic information for germany. In Proceedings of the 13th AGILE International Conference on Geographic Information Science, Guimaras, Portugal, 11–14 May 2010. Neis, P.; Zielstra, D.; Zipf, A. The street network evolution of crowdsourced maps: Openstreetmap in germany 2007–2011. Future Internet 2011, 4, 1–21. [CrossRef] Forghani, M.; Delavar, M.R. A quality study of the OpenStreetMap dataset for tehran. ISPRS Int. J. Geo-Inf. 2014, 3, 750–763. [CrossRef] Arsanjani, J.J.; Mooney, P.; Zipf, A.; Schauss, A. Quality assessment of the contributed land use information from OpenStreetMap versus authoritative datasets. In Openstreetmap in Giscience; Springer: Berlin, Germany, 2015; pp. 37–58. Mohammadi, N.; Malek, M. VGI and reference data correspondence based on location-orientation rotary descriptor and segment matching. Trans. GIS 2015, 19, 619–639. [CrossRef] Lyu, H.; Sheng, Y.; Guo, N.; Huang, B.; Zhang, S. Geometric quality assessment of trajectory-generated VGI road networks based on the symmetric arc similarity. Trans. GIS 2017, 21, 984–1009. [CrossRef] Brovelli, M.A.; Minghini, M.; Molinari, M.; Mooney, P. Towards an automated comparison of OpenStreetMap with authoritative road datasets. Trans. GIS 2017, 21, 191–206. [CrossRef] Dorn, H.; Törnros, T.; Zipf, A. Quality evaluation of VGI using authoritative data—A comparison with land use data in Southern Germany. ISPRS Int. J. Geo-Inf. 2015, 4, 1657–1671. [CrossRef] Arsanjani, J.J.; Barron, C.; Bakillah, M.; Helbich, M. Assessing the quality of OpenStreetMap contributors together with their contributions. In Proceedings of the AGILE, Vienna, Austria, 3–7 June 2013; pp. 14–17. Haklay, M.; Basiouka, S.; Antoniou, V.; Ather, A. How many volunteers does it take to map an area well? The validity of linus’ law to volunteered geographic information. Cartogr. J. 2010, 47, 315–322. [CrossRef] Leeuw, J.D.; Said, M.; Ortegah, L.; Nagda, S.; Georgiadou, Y.; DeBlois, M. An assessment of the accuracy of volunteered road map production in Western Kenya. Remote Sens. 2011, 3, 247–256. [CrossRef] Neis, P.; Zipf, A. Analyzing the contributor activity of a volunteered geographic information project—The case of OpenStreetMap. ISPRS Int. J. Geo-Inf. 2012, 1, 146–165. [CrossRef] Antoniou, V.; Touya, G.; Raimond, A.-M. Quality analysis of the parisian osm toponyms evolution. In European Handbook of Crowdsourced Geographic Information; Capineri, C., Haklay, M., Huang, H., Antoniou, V., Kettunen, J., Ostermann, F., Purves, R., Eds.; University of Zurich: Zurich, Switzerland, 2016; pp. 97–112. Jonietz, D.; Zipf, A. Defining fitness-for-use for crowdsourced points of interest (POI). ISPRS Int. J. Geo-Inf. 2016, 5, 149. [CrossRef] Keßler, C.; De Groot, R.T.A. Trust as a proxy measure for the quality of volunteered geographic information in the case of OpenStreetMap. In Geographic Information Science at the Heart of Europe; Springer: Berlin, Germany, 2013; pp. 21–37. Keßler, C.; Trame, J.; Kauppinen, T. Tracking editing processes in volunteered geographic information: The case of OpenStreetMap. In Proceedings of the Workshop on Identifying Objects, Processes and Events in Spatio-Temporally Distributed Data (IOPE 2011), Belfast, ME, USA, 12–16 September 2011. Rehrl, K.; Gröchenig, S. A framework for data-centric analysis of mapping activity in the context of volunteered geographic information. ISPRS Int. J. Geo-Inf. 2016, 5, 37. [CrossRef] Sehra, S.S.; Singh, J.; Rai, H.S. Assessing OpenStreetMap data using intrinsic quality indicators: An extension to the qgis processing toolbox. Future Internet 2017, 9, 15. [CrossRef] D’Antonio, F.; Fogliaroni, P.; Kauppinen, T. VGI edit history reveals data trustworthiness and user reputation. In Proceedings of the AGILE 2014, Castellón, Spain, 3–6 June 2014.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

32.

33.

34. 35. 36. 37. 38.

39. 40.

41. 42. 43. 44. 45.

46. 47. 48. 49. 50. 51. 52. 53. 54. 55.

20 of 21

Touya, G.; Antoniou, V.; Olteanu-Raimond, A.-M.; Van Damme, M.-D. Assessing crowdsourced poi quality: Combining methods based on reference data, history, and spatial relations. ISPRS Int. J. Geo-Inf. 2017, 6, 80. [CrossRef] Van Exel, M.; Dias, E.; Fruijtier, S. In The impact of crowdsourcing on spatial data quality indicators. In Proceedings of the 6th GIScience International Conference on Geographic Information Science, Zurich, Switzerland, 14–17 September 2010; p. 213. Mooney, P.; Corcoran, P. Characteristics of heavily edited objects in OpenStreetMap. Future Internet 2012, 4, 285–305. [CrossRef] Wiki. Available online: http://wiki.Openstreetmap.Org/wiki/planet.Osm (accessed on 25 July 2017). Blum, H. A transformation for extracting descriptors of shape. In Models for the Perception of Speech and Visual Form; MIT Press: Cambridge, MA, USA, 1967. Blum, H.; Nagel, R.N. Shape description using weighted symmetric axis features. Pattern Recognit. 1978, 10, 167–180. [CrossRef] Attali, D.; Boissonnat, J.-D.; Edelsbrunner, H. Stability and computation of medial axes-a state-of-the-art report. In Mathematical Foundations of Scientific Visualization, Computer Graphics, and Massive Data Exploration; Springer: Berlin, Germany, 2009; pp. 109–125. Armstrong, C.G. Modelling requirements for finite-element analysis. Comput. Aided Des. 1994, 26, 573–578. [CrossRef] Hisada, M.; Belyaev, A.G.; Kunii, T.L. A skeleton-based approach for detection of perceptually salient features on polygonal surfaces. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2002; pp. 689–700. Brandt, J.W.; Jain, A.K.; Algazi, V.R. Medial axis representation and encoding of scanned documents. J. Vis. Comm. Image Represent. 1991, 2, 151–165. [CrossRef] Gold, C.; Dakowicz, M. The crust and skeleton—Applications in GIS. In Proceedings of the 2nd International Symposium on Voronoi Diagrams in Science and Engineering, Seoul, Korea, 10–13 October 2005; pp. 33–42. Xia, H.; Tucker, P.G. Fast equal and biased distance fields for medial axis transform with meshing in mind. Appl. Math. Model. 2011, 35, 5804–5819. [CrossRef] Cheng, S.-W.; Funke, S.; Golin, M.; Kumar, P.; Poon, S.-H.; Ramos, E. Curve reconstruction from noisy samples. Comput. Geom. 2005, 31, 63–100. [CrossRef] Egenhofer, M.J.; Herring, J. Categorizing Binary Topological Relations between Regions, Lines, and Points in Geographic Databases; Technical Report; Department of Surveying Engineering, University of Maine: Orono, ME, USA, 1991. Egenhofer, M.J.; Franzosa, R.D. Point-set topological spatial relations. Int. J. Geogr. Inf. Syst. 1991, 5, 161–174. [CrossRef] ISO 19157:2013: Geographic information—Data Quality; International Organization for Standardization (ISO): Geneva, Switzerland, 2013. Chehreghan, A.; Ali Abbaspour, R. An assessment of spatial similarity degree between polylines on multi-scale, multi-source maps. Geocarto Int. 2017, 32, 471–487. [CrossRef] Abbas, I. Base de Données Vectorielles et Erreur Cartographique: Problèmes Posés par le Contrôle Ponctuel, une Méthode Alternative Fondée sur la Distance de Hausdorff: Le Contrôle Linéaire; ABES: Montpellier, France, 1994. Li, L.; Goodchild, M.F. An optimisation model for linear feature matching in geographical data conflation. Int. J. Image Data Fusion 2011, 2, 309–328. [CrossRef] Tong, X.; Liang, D.; Jin, Y. A linear road object matching method for conflation based on optimization and logistic regression. Int. J. Geogr. Inf. Sci. 2014, 28, 824–846. [CrossRef] Chehreghan, A.; Ali Abbaspour, R. An assessment of the efficiency of spatial distances in linear object matching on multi-scale, multi-source maps. Int. J. Geogr. Inf. Sci. 2017, 1–20. [CrossRef] Olteanu Raimond, A.-M.; Mustière, S. Data matching—A matter of belief. In Headway in Spatial Data Handling; Springer: Berlin, Germany, 2008; pp. 501–519. Zhang, M. Methods and Implementations of Road-Network Matching. Ph.D. Dissertation, Technical University of Munich, Munich, Germany, 2009. Veltkamp, R.C. Shape matching: Similarity measures and algorithms. In Proceedings of the International Conference onShape Modeling and Applications, Herzliya, Israel, 22–24 June 2001; IEEE: Piscataway, NJ, USA, 2001; pp. 188–197.

ISPRS Int. J. Geo-Inf. 2018, 7, 253

56. 57. 58. 59. 60. 61. 62. 63. 64.

65.

21 of 21

Ali Abbaspour, R.; Chehreghan, A.; Karimi, A. Assessing the efficiency of shape-based functions and descriptors in multi-scale matching of linear objects. Geocarto Int. 2017, 33, 1–14. [CrossRef] Wang, M.; Li, Q.; Hu, Q.; Zhou, M. Quality analysis of OpenStreetMap data. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, 2, W1. Samal, A.; Seth, S.; Cueto, K. A feature-based approach to conflation of geospatial sources. Int. J. Geogr. Inf. Sci. 2004, 18, 459–489. [CrossRef] Chehreghan, A.; Ali Abbaspour, R. A geometric-based approach for road matching on multi-scale datasets using a genetic algorithm. Cartogr. Geogr. Inf. Sci. 2017, 1–15. [CrossRef] Chehreghan, A.; Ali Abbaspour, R. A new descriptor for improving geometric-based matching of linear objects on multi-scale datasets. GISci. Remote Sens. 2017, 54, 836–861. [CrossRef] Dehghani, A.; Chehreghan, A.; Ali Abbaspour, R. Matching of urban pathways in a multi-scale database using fuzzy reasoning. Geod. Cartogr. 2017, 43, 92–104. [CrossRef] Mustière, S.; Devogele, T. Matching networks with different levels of detail. GeoInformatica 2008, 12, 435–453. [CrossRef] Fan, H.; Yang, B.; Zipf, A.; Rousell, A. A polygon-based approach for matching OpenStreetMap road networks with regional transit authority data. Int. J. Geogr. Inf. Sci. 2016, 30, 748–764. [CrossRef] Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques: Concepts and Techniques; Elsevier: New York, NY, USA, 2011. © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).