Use of Artificial Neural Networks for Selective Omission in Updating ...

The Cartographic Journal Vol. 51 No. 1 © The British Cartographic Society 2014

pp. 38–51

February 2014

REFEREED PAPER

Use of Artificial Neural Networks for Selective Omission in Updating Road Networks

Qi Zhou1,2 and Zhilin Li2,3

1Faculty of Information Engineering, China University of Geosciences, Wuhan, China. 2Department of Land Surveying and Geo-Informatics, Hong Kong Polytechnic University, Kowloon, Hong Kong. 3School of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China
Email: [email protected]

An important problem faced by national mapping agencies is the need for frequent map updates. An ideal solution is to update only the large-scale map and let the smaller-scale maps be updated automatically. This process may involve a series of operators, among which selective omission has received much attention. This study focuses on selective omission in a road network using an artificial neural network (specifically, a back propagation neural network, BPNN). Another type of artificial neural network (a self-organizing map, SOM) is investigated for comparison. Both neural networks are tested for selective omission on a real-life road network, and the BPNN is further tested in a practical road-updating application. The results of selective omission are evaluated by overall accuracy. It is found that (1) a BPNN can adaptively determine which and how many roads are to be retained at a specific scale, with an overall accuracy above 80%; and (2) it may be hard to determine which and how many roads should be retained at a specific scale using an SOM. Therefore, the BPNN is more effective for selective omission in road updating.

Keywords: map generalisation, road updating, artificial neural networks

INTRODUCTION

National mapping agencies may have map series at different scales. As an example, the topographic maps of Hong Kong are produced by the Land Department of Hong Kong at six general scales (i.e. 1 : 1000, 1 : 5000, 1 : 20 000, 1 : 50 000, 1 : 100 000 and 1 : 200 000). These maps need to be updated every few years due to changes in the earth’s surface that may result from either natural phenomena (e.g. earthquakes and floods) or human activities (e.g. newly built roads and buildings). Currently, these maps are updated manually or semi-automatically, which is a labour-intensive, time-consuming and costly procedure. As a result, maps become outdated soon after they are updated. This problem is particularly serious in developing countries like China and India due to their rapid economic and social development. An ideal solution is to update only the map at the largest scale, with the maps at smaller scales being updated automatically (Kilpeläinen and Sarjakoski, 1995; Harrie and Hellström, 1999; Li, 2006). The process of transforming the map from the largest scale to the smaller scales is called map generalisation, and it always involves a series of operators, such as collapsing, selective omission, simplification, smoothing, displacement, and so on (Li, 2006).

DOI: 10.1179/1743277413Y.0000000042

This paper focuses on selective omission in a road network (or a set of roads) because (1) roads are one of the major geographical features on a map and (2) among the above operators, selective omission, which means retaining the more important objects (e.g. roads), is necessary for transforming a road network and has thus attracted much attention. Selective omission in a road network involves two main issues: (1) which roads are selected and (2) how many roads are selected.

1. For the first issue, some researchers assigned an importance value to each of the road segments (Mackaness and Beard, 1993; Mackaness, 1995; Thomson and Richardson, 1995; Kreveld and Peschier, 1998) or road intersections (Mackaness and Mackechnie, 1999) for selection, as the database always stores road networks in the form of intersections and segments. Some built ‘strokes’, which were defined as ‘a set of one or more arcs in a non-branching and connected chain’ (Thomson and Richardson, 1999), and based selections on those strokes. Indeed, the use of a ‘stroke’ makes it possible to analyse road networks based on the importance of individual roads, even in the absence of all other thematic information (Thomson and Brooks, 2007). The importance of each stroke may be determined according to various properties, such as road length (Chaudhry and


Mackaness, 2005), stroke connectivity (Zhang, 2004) and topological centralities (Jiang and Claramunt, 2004; Jiang and Harrie, 2004). Others proposed using ‘areal partition’ (Edwardes and Mackaness, 2000) or ‘mesh density’ (i.e. the total road length in a unit area) (Chen et al., 2009) to determine whether road segments are retained.

2. For the second issue, some research has focused on the relationship between map scales and the numbers of features represented on maps, and some laws were formulated. The representative one is the so-called ‘Principle of Selection’ or ‘Radical Law’ proposed by Töpfer and Pillewizer (1966). The ‘Principle of Selection’ can be used to determine how many objects are retained for the representation at a specific scale. Yu (1993) further extended the ‘Principle of Selection’ by adopting the concept of ‘Fractal Dimension’. Some (Leitner and Buttenfield, 1995; Li and Choi, 2002) analysed the percentages of the retained features on real-life maps at various scales. Other researchers (Chen et al., 2009; Li and Zhou, 2012) used scale-related parameters (e.g. mesh density threshold and/or road length threshold) to determine a representation at a specific scale. However, Chen et al. (2009) found that the number or percentage of retained roads computed from one specific case may not be suitable for other cases (e.g. with different contents or scales). Li and Zhou (2012) found that the appropriate parameter values may vary from case to case. Thus, it is very desirable to adaptively determine the number or percentage of objects (e.g. roads) to be retained in each case.

In this study, rather than finding an adaptive way to determine the appropriate parameters or laws, a new approach to selective omission is developed, which can adaptively determine which and how many roads are to be retained.
This new approach is proposed because the national mapping agencies already have a series of representations of road networks at different scales. Therefore, the basic idea of this approach is to use existing representations to infer updated representations. This is achieved by adopting artificial neural networks, which are computational models used to solve complex problems. The artificial neural network (ANN) is inspired by biological nervous systems and has been widely used in many fields, such as data analysis, data classification and pattern recognition (Fischer and Leung, 2001; Hearty and Gibney, 2008; Bação et al., 2008; Ito and Murata, 2009). There are various types of ANNs. In the field of map generalisation, Werschlein and Weibel (1994) discussed the theoretical feasibility of applying neural networks to tasks such as line generalisation. Jiang and Harrie (2004) used a self-organizing map (SOM), which is a type of ANN, for the selection of roads in a network. This type of technique has also found applications in building typification (Sester, 2005) and amalgamation (Allouche and Moulin, 2005). Balboa and López (2008) applied another type of ANN, the back propagation neural network (BPNN), to the classification of road lines according to their sinuosities. This paper further investigates both types of ANN for map generalisation, especially for selective omission in a road network. Although the SOM has been used for this purpose before (Jiang and Harrie, 2004), it is found in this study that this approach cannot adaptively determine the number of roads to be retained for a specific representation (or at a specific scale). Therefore, this study develops a BPNN-based approach and proposes its use.

This paper is structured as follows. The section on ‘Two typical artificial neural networks’ introduces the basics of the two typical ANNs (i.e. SOM and BPNN). The section on ‘Input attributes and output classes of applying artificial neural networks to selective omission in a road network’ presents the possible input attributes and output classes of using them for selective omission in a road network. The section on ‘Testing of artificial neural networks used for selective omission in a road network’ tests the two approaches used for selective omission in a road network. The section on ‘Testing of artificial neural networks (ANNs) applied to selective omission in road updating’ further validates the BPNN applied to selective omission in road updating. The section on ‘Conclusions and discussions’ gives the conclusions and discussions.

TWO TYPICAL ARTIFICIAL NEURAL NETWORKS

This section introduces the basic theories of the two typical types of ANNs. A more detailed explanation can be found in Masters (1993), Werbos (1994) and Kohonen (2001).

Self-organizing map (SOM)

The SOM (also called a Kohonen neural network) can be used to cluster a high-dimensional dataset into a low-dimensional one. An SOM consists of two layers: the input layer and the output layer (Figure 1). Each layer consists of neurons, and all the neurons in the input layer are connected to all the neurons in the output layer. Usually, each neuron in the input layer records an input vector of high dimension, and each neuron in the output layer records an output vector of the same dimension, but the neurons in the output layer are arranged in an output space (or an output map), which can be a one-dimensional line or a two-dimensional plane. As an example, Figure 1 shows a 3×3 output map, which is a two-dimensional plane.

Figure 1. Self-organizing map. All the neurons in the input layer are connected to all the neurons in the output layer (only one example is shown here for clarity)

The process of training an SOM is to group input vectors with similar properties into the same neuron in the output map. Before training, all the output vectors are randomly initialized and one input vector is randomly chosen. The distance between this input vector and any output vector is calculated as follows (equation (1)):

$$d_{ij} = \lVert x_i - w_j \rVert \qquad (1)$$

where $x_i$ is a randomly chosen input vector in the neuron i of the input layer; $w_j$ is an output vector in the neuron j of the output layer; and $\lVert \cdot \rVert$ is the distance measure (the Euclidean distance is a common choice). After calculating the distances between this input vector ($x_i$) and all the output vectors, the output vector with the minimum distance to the input vector is called the winning vector or the Best-Matching Unit.

Then, the winning vector and the output vectors in its neighbouring neurons are updated to be closer to this input vector ($x_i$), according to the formula below (equation (2)):

$$w(t+1) = w(t) + \alpha(t)\,h(t)\,[x_i - w(t)] \qquad (2)$$

where $w(t)$ is the winning vector or the output vector of one of its neighbouring neurons; $w(t+1)$ is the corresponding updated output vector; $\alpha(t)$ is the learning rate function; $h(t)$ is the neighbourhood function, which determines how strongly the neurons are connected to each other; and t is the iteration number. The Gaussian neighbourhood function is often adopted, and is defined by (equation (3)):

$$h(t) = e^{-\lVert r_j - r_m \rVert^2 / 2\sigma^2(t)} \qquad (3)$$

where $r_j$ and $r_m$, respectively, are the coordinates of the winning neuron j and its neighbouring neuron m on the output map; and $\sigma(t)$ is the neighbourhood radius at the iteration step t (Figure 2). The learning rate function can be defined by (equation (4)):

$$\alpha(t) = \alpha(0)\left(1 - \frac{t}{T}\right) \qquad (4)$$

where $\alpha(0) \in (0,1)$ is the initial learning rate and T is the total number of iteration steps. The above process is repeated many times until the changes in the output vectors become very small.

Back propagation neural network (BPNN)

The BPNN is a multilayer ANN, and it consists of an input layer, an output layer and one or more hidden layers (Figure 3). The hidden layers are located between the input and output layers. Each layer is composed of at least one neuron (marked as a circle in Figure 3). Each neuron is connected to all the neurons in the following layer, but there are no connections between neurons in the same layer. It is called a feed forward network because the ‘information’ only flows from the input layer to the output layer, with no feedback. The feed forward network can be used to produce output values according to the input and intermediate values recorded in each neuron, and the weights recorded in each connection. The input values are recorded in the neurons of the input layer, and the values in the neurons of the other layers are computed by summing the products of the values in the neurons of the previous layer and the weights of the connections from those neurons. Figure 4 gives a specific example to illustrate the computation. In this figure, the values of $H_1$ and $O_1$ can be computed, respectively, as follows (equations (5) and (6)):

$$H_1 = L_1 \times w_{l_1 h_1} + L_2 \times w_{l_2 h_1} + L_3 \times w_{l_3 h_1} \qquad (5)$$

$$O_1 = H_1 \times w_{h_1 o_1} + H_2 \times w_{h_2 o_1} + H_3 \times w_{h_3 o_1} \qquad (6)$$
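Equations (5) and (6) amount to one weighted sum per neuron, before any threshold or activation function is applied. The following minimal Python sketch computes the hidden and output values for a network shaped like the one in Figure 4. The function name `layer_values`, the variable names and all numbers are illustrative assumptions and are not taken from the paper.

```python
def layer_values(previous, weights):
    """Weighted sums for one layer.

    previous: values of the neurons in the previous layer.
    weights[i][j]: weight of the connection from neuron i of the previous
    layer to neuron j of the current layer.
    """
    n_out = len(weights[0])
    return [sum(previous[i] * weights[i][j] for i in range(len(previous)))
            for j in range(n_out)]


# Hypothetical values for a 3-3-1 network (not from the paper).
inputs = [1.0, 2.0, 3.0]                    # L1, L2, L3
w_input_hidden = [[0.1, 0.4, 0.7],          # weights leaving L1
                  [0.2, 0.5, 0.8],          # weights leaving L2
                  [0.3, 0.6, 0.9]]          # weights leaving L3
w_hidden_output = [[1.0], [0.5], [0.25]]    # weights leaving H1, H2, H3

hidden = layer_values(inputs, w_input_hidden)      # equation (5) for H1, H2, H3
outputs = layer_values(hidden, w_hidden_output)    # equation (6) for O1
print(hidden, outputs)
```

Storing the weights as one row per source neuron keeps the index order the same as in the subscripts of equations (5) and (6).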

In order to obtain the desired output values, all of the weights need to be determined. In the beginning, all the weights of the connections are randomly initialized; then they are adjusted step by step. The process of adjusting the weights is called a learning process, which always involves a set of known input and output values for training. The differences between the actual output (of using this neural network) and the known output (or desired output) are propagated back through the network, and all the weights are readjusted to reduce the error (see equations (7)–(12); Basheer and Hajmeer, 2000; Fischer and Leung, 2001). The whole learning process is repeated until the error is lower than a given threshold, or the number of iterations reaches a preset limit. Some formulas related to the BPNN are listed below:

$$w_{ij}(t+1) = w_{ij}(t) + L\,\Delta w_{ij} \qquad (7)$$

$$\Delta w_{ij} = \eta\,\delta_j\,y_i \qquad (8)$$

where $w_{ij}$ denotes the weight between neuron i of a layer and neuron j of the next layer; $\Delta w_{ij}$ denotes the change in $w_{ij}$; t denotes the training time; $\eta$ is the learning rate, which affects the speed of learning (the smaller the learning rate, the longer it takes to meet the stopping criterion; the larger the learning rate, the faster learning proceeds, but poor solutions may be produced); $L$ is the momentum rate, which can be helpful in accelerating the convergence and avoiding local minima (Dai and Macbeth, 1997); and $\delta_j$ is the error gradient, which is obtained at the output layer as:

$$\delta_j = y_j\,(1 - y_j)\,(y_j^e - y_j) \qquad (9)$$

where $y_j^e$ and $y_j$ are the desired output and the actual output, respectively. At the hidden layer, $\delta_j$ is obtained as:

$$\delta_j = y_j\,(1 - y_j) \sum_k \delta_k\,w_{kj} \qquad (10)$$

where $y_j$ is the actual output of neuron j of the hidden layer, and $\delta_k$ is the error gradient at neuron k of the previously processed layer (the layer closer to the output), which connects to neuron j of the hidden layer. The value $y_j$ is obtained as:

$$y_j = F\left[\sum_i w_{ij}\,y_i - \theta_j\right] \qquad (11)$$

where $y_i$ is the value of neuron i; $F[\cdot]$ is an activation function that describes the output of a neuron; and $\theta_j$ is the activation threshold of neuron j. For instance, a step function can be used:

$$y_j = F\left[\sum_i w_{ij}\,y_i - \theta_j\right] = \begin{cases} 1 & \text{if } \sum_i w_{ij}\,y_i \ge \theta_j \\ 0 & \text{if } \sum_i w_{ij}\,y_i < \theta_j \end{cases} \qquad (12)$$

There are also other activation functions, such as the sigmoid function, the radial basis function and the conic section function (Karlik and Olgac, 2011).

To sum up, the BPNN builds an ANN model by training on known input and output values and adjusting the weights step by step within this model. Once all the weights are fixed, the trained model can be applied to determine other unknown output values, based on their inputs.

Figure 2. The winning vector and its neighbouring vectors of different radiuses

Figure 3. A three-layer feed forward network

Differences between these two artificial neural networks

There are two main differences between the SOM and BPNN.

1. Training mode: the BPNN is trained in a supervised mode and thus needs known input and output data as samples, whereas the SOM is trained in an unsupervised mode, which does not involve any samples with known outputs.
2. Classification/clustering: the BPNN is a classification approach, and the label of each classified group is known, whereas the SOM is a clustering approach, and the label of each clustered group is unknown.
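The contrast between the two training modes can be made concrete by writing one update step of each network. The sketch below is a simplified illustration rather than the configuration used in this study: it assumes a one-dimensional SOM output map with integer neuron coordinates, a sigmoid activation for the BPNN (the factor $y_j(1 - y_j)$ in equations (9) and (10) is the sigmoid derivative), thresholds fixed at zero and no momentum term. The function names `som_step` and `bpnn_step` and all numbers are hypothetical. Note that only `bpnn_step` takes a `target` argument.

```python
import math


def sigmoid(v):
    """Sigmoid activation (assumed here; the text lists several alternatives)."""
    return 1.0 / (1.0 + math.exp(-v))


def som_step(weights, x, alpha, sigma):
    """One unsupervised SOM update on a 1-D output map; no label is needed.

    weights: list of output vectors, modified in place; x: one input vector.
    Returns the index of the best-matching unit.
    """
    dists = [math.dist(x, w) for w in weights]                  # equation (1)
    bmu = dists.index(min(dists))
    for m, w in enumerate(weights):
        h = math.exp(-((bmu - m) ** 2) / (2.0 * sigma ** 2))    # equation (3)
        for k in range(len(w)):
            w[k] += alpha * h * (x[k] - w[k])                   # equation (2)
    return bmu


def bpnn_step(w_ih, w_ho, x, target, eta):
    """One supervised update of a one-hidden-layer network; needs `target`.

    w_ih[i][j]: input i -> hidden j; w_ho[j][k]: hidden j -> output k.
    Both weight tables are modified in place. Thresholds are fixed at 0 and
    momentum is omitted. Returns the outputs computed before the update.
    """
    hidden = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(len(x))))
              for j in range(len(w_ho))]
    out = [sigmoid(sum(hidden[j] * w_ho[j][k] for j in range(len(hidden))))
           for k in range(len(target))]
    d_out = [out[k] * (1 - out[k]) * (target[k] - out[k])       # equation (9)
             for k in range(len(out))]
    d_hid = [hidden[j] * (1 - hidden[j])
             * sum(d_out[k] * w_ho[j][k] for k in range(len(out)))  # equation (10)
             for j in range(len(hidden))]
    for j in range(len(hidden)):
        for k in range(len(out)):
            w_ho[j][k] += eta * d_out[k] * hidden[j]            # equation (8)
    for i in range(len(x)):
        for j in range(len(hidden)):
            w_ih[i][j] += eta * d_hid[j] * x[i]                 # equation (8)
    return out


# SOM: repeated steps with the linearly decaying learning rate of equation (4).
som_weights = [[0.0, 0.0], [1.0, 1.0]]
T = 10
for t in range(T):
    cluster = som_step(som_weights, [0.2, 0.0], alpha=0.5 * (1 - t / T), sigma=0.5)
print("cluster index (no class label attached):", cluster)

# BPNN: the same kind of input, but each sample carries a known output.
w_ih = [[0.0, 0.0], [0.0, 0.0]]
w_ho = [[0.0], [0.0]]
for _ in range(3):
    print("output before update:", bpnn_step(w_ih, w_ho, [1.0, 0.5], [1.0], eta=1.0))
```

The SOM step returns only a cluster index, whose meaning has to be interpreted afterwards, while the BPNN step moves the output towards a class that is known in advance.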

INPUT ATTRIBUTES AND OUTPUT CLASSES OF APPLYING ARTIFICIAL NEURAL NETWORKS TO SELECTIVE OMISSION IN A ROAD NETWORK

As mentioned in the section on ‘Two typical artificial neural networks’, the BPNN needs known input and output data as training samples, but the SOM only needs input data. Thus, this section introduces the possible input attributes and output classes for applying the two ANNs to selective omission in a road network.

Input attributes (for both SOM and BPNN)

Figure 4. Computation of values in the neurons

Normally, for selective omission in a road network, the relatively important roads are selected. As mentioned in the section on ‘Introduction’, a number of properties can be used to describe the importance of roads in a network. These properties may be used as input attributes, and they are described here according to three types: geometric, topological and thematic.

1. Geometric properties: ‘Length and width are two important geometric properties. Common sense tells us that long and wide roads tend to be more important’ (Jiang and Harrie, 2004). However, in the database, the road network is often represented by single lines, especially at middle or small scales. Thus, length is more widely used than width and, normally, the longer the road, the more important it is considered to be.

2. Topological properties: these have been widely used to determine the important nodes or links within a network, such as a social network (Freeman, 1979) or a road network (Jiang and Claramunt, 2004; Crucitti et al., 2006). The latter can be represented as a dual graph in which individual roads are taken as nodes and road intersections are taken as links (Porta et al., 2006). Then, the importance of each road can be computed according to different topological properties. Three widely used properties, i.e. degree, closeness and betweenness, are introduced here (Jiang and Harrie, 2004; Crucitti et al., 2006).

• Degree measures the number of connections of a given road to other roads within a network. The degree of a given road i can be calculated as (equation (13)):

$$C_i^D = \sum_{j \in N} a_{ij} \qquad (13)$$

where N is the total number of roads within a network and $a_{ij}$ equals 1 if there is a connection between road i and road j; otherwise it equals 0.

• Closeness is based on the shortest distances from a given road to all other roads. The closeness of a given road i can be calculated as (equation (14)):

$$C_i^C = \frac{N - 1}{\sum_{j \in N,\, j \neq i} d_{ij}} \qquad (14)$$

where $d_{ij}$ is the shortest distance between road i and road j.

• Betweenness measures the extent to which a given road is located on the paths that connect all other pairs of roads. The betweenness of a given road i can be calculated as (equation (15)):

$$C_i^B = \frac{1}{(N - 1)(N - 2)} \sum_{j,k \in N;\, j \neq k;\, k \neq i} \frac{n_{jk}(i)}{n_{jk}} \qquad (15)$$

where $n_{jk}$ is the number of shortest paths from j to k and $n_{jk}(i)$ is the number of shortest paths from j to k that pass through road i.

Normally, the larger the value of a topological property, the more important the road is considered to be.

3. Thematic properties: these are stored in the database, and a number of them may be used to determine the importance of an individual road. Li and Choi (2002) investigated six different ones: road type (or class), length, number of lanes, number of traffic directions, width and connectivity. However, from our observations, not all of them are available.
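The three topological measures in equations (13)–(15) can be sketched on a small dual graph, in which each node is a road and each link is an intersection. The sketch below is a plain breadth-first-search implementation that assumes a connected graph with at least three roads, counts every hop as distance 1, and sums equation (15) over ordered pairs (j, k) that both differ from i. The road names and the helper names `shortest_paths` and `centralities` are hypothetical.

```python
from collections import deque


def shortest_paths(graph, source):
    """BFS from `source`: hop distance and number of shortest paths per road."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count


def centralities(graph):
    """Degree, closeness and betweenness of every road in a connected dual graph."""
    n = len(graph)
    paths = {r: shortest_paths(graph, r) for r in graph}
    degree = {r: len(graph[r]) for r in graph}                           # eq. (13)
    closeness = {r: (n - 1) / sum(paths[r][0][s] for s in graph if s != r)
                 for r in graph}                                         # eq. (14)
    betweenness = {}
    for i in graph:
        dist_i, count_i = paths[i]
        total = 0.0
        for j in graph:
            dist_j, count_j = paths[j]
            for k in graph:
                if len({i, j, k}) < 3:
                    continue
                # Road i lies on a shortest j-k path only if the two legs add up.
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    total += count_j[i] * count_i[k] / count_j[k]
        betweenness[i] = total / ((n - 1) * (n - 2))                     # eq. (15)
    return degree, closeness, betweenness


# Hypothetical dual graph: road "B" intersects roads "A", "C" and "D".
dual_graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
degree, closeness, betweenness = centralities(dual_graph)
print(degree, closeness, betweenness)
```

In this example the central road "B" obtains the largest value for all three measures, which matches the reading that a larger value indicates a more important road.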

Output classes (for BPNN only)

The output class decides whether a road is retained or not for a specific representation (or at a specific scale), as selective omission usually involves transforming a road network from one scale to another. The decision for each road has two possible outcomes:

• the road is retained for a specific representation (or at a specific scale); and
• the road is not retained (i.e. it is eliminated) for a specific representation (or at a specific scale).

TESTING OF ARTIFICIAL NEURAL NETWORKS USED FOR SELECTIVE OMISSION IN A ROAD NETWORK

This section tests the two ANNs (i.e. SOM and BPNN) used for selective omission in a road network. The testing results are shown and analysed.

Data and benchmark for testing

The source data used for testing was the digital road network of Lower Hutt City, New Zealand, at 1 : 50 000 scale (Figure 5a), and the benchmark was the corresponding map of the same region at a smaller scale, i.e. 1 : 250 000. Both datasets were retrieved from the Land Information of New Zealand (http://www.linz.govt.nz/topography/topo-maps). In the database for the source data, the road segments that are retained at 1 : 250 000 scale according to this benchmark (Figure 5b) were marked accordingly.

Experimental design for testing

The methods for using the BPNN and SOM for selective omission in a road network are described as follows:

1. Input attributes and output classes:

• input attributes: as mentioned in the section on ‘Input attributes and output classes of applying artificial neural networks to selective omission in a road network’, both the SOM and BPNN involve input variables for training. In this test, six properties were used as input attributes: length, degree, closeness, betweenness, number of lanes and road surface (e.g. ‘sealed’ and ‘unmetalled’). For the first four properties (i.e. length, degree, closeness and betweenness), road segments in the network were first built into named roads (Jiang and Claramunt, 2004), as the road name attribute existed in the database, and then the properties of each named road were computed. For the last two attributes (i.e. number of lanes and road surface), each road segment already has them recorded in the database, so they can be obtained directly. As a named road may consist of road segments with different numbers of lanes, the road segment was viewed as the minimum element for selective omission in this test. This means it is necessary to determine whether each road segment is retained or not. Moreover, the four properties of each named road were assigned to all the road segments comprising it. For instance, if the length of a named road is 1000 m, all the road segments comprising this named road are also assigned 1000 m;
• output classes: only the BPNN involves output classes for training. In this test, all road segments were recorded with an output class. More precisely, if a road segment is retained in the benchmark at 1 : 250 000 scale, it was recorded with the class ‘R_Y’ in the database; otherwise, it was recorded with the class ‘R_N’ (Table 1);

Figure 5. Source data (a) and benchmark (b) for testing: (a) 1 : 50 000 scale; (b) 1 : 250 000 scale

Table 1. Output classes

Scale of source map: 1 : 50 000
Scale of target map: 1 : 250 000
Output classes:
  ‘R_Y’: the road segment is retained at 1 : 250K scale
  ‘R_N’: the road segment is not retained at 1 : 250K scale

2. Training of neural networks: the two neural networks were trained with TANAGRA, which is a free data mining software package for academic and research purposes. TANAGRA offers several data mining methods from exploratory data analysis, statistical learning, machine learning and database areas (Rakotomalala, 2005):

• training of the BPNN: because the BPNN involves samples for training, a certain percentage of named roads were randomly chosen from the whole network for training, and the others were used for validation. In previous research, Crowther and Cox (2005) recommended removal of one-third of the data for validation; Refaeilzadeh et al. (2009) suggested that 90% of the data should be used to build a model, with the remaining 10% for validation. In this test, 10% and 30% of named roads were chosen for validation, respectively. For each percentage, 10 groups of random samples were chosen in order to reduce the sampling bias (Refaeilzadeh et al., 2009). During training, most of the parameters in TANAGRA were set as default (Figure 6);
• training of the SOM: the use of the SOM for selective omission in a road network has been proposed by Jiang and Harrie (2004). As suggested by them, in this test, all the road segments in the network were used for training. During training, different sizes of output maps (i.e. 2×2, 3×3, 4×4 and 5×5) were considered to observe the variation of outputs. Again, most of the parameters in TANAGRA were set as default (Figure 7);

Figure 6. Parameter settings for training a BPNN (a screenshot of the parameters panels in TANAGRA): (a) number of neurons; (b) learning parameters; (c) stopping rule

3. Evaluation of the performance of trained models: for the BPNN, after each training run, the trained model was applied to the corresponding validation data to evaluate the performance of the model. The road segments retained by the BPNN were compared with the ones in the benchmark, and a road segment was counted as correctly classified only when its output value (‘R_Y’ or ‘R_N’) was the same as the value recorded for the benchmark. The overall accuracy can be calculated (equation (16)) and used as a measure to evaluate the performance of a trained model (Balboa and López, 2008).

$$\text{Overall accuracy} = \frac{\text{number of correctly classified road segments}}{\text{total number of road segments}} \times 100\% \qquad (16)$$
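The sampling and evaluation steps above can be sketched as follows. This is an illustration only: the training itself was done in TANAGRA, so the classifier is replaced here by a placeholder that omits every road, and the record layout (`id`, `road_name`), the helper names and the toy data are all hypothetical. The sketch holds out whole named roads for validation, labels segments with ‘R_Y’ or ‘R_N’, and computes the overall accuracy of equation (16).

```python
import random


def output_class(segment_id, retained_ids):
    """'R_Y' if the segment is retained in the benchmark, otherwise 'R_N'."""
    return "R_Y" if segment_id in retained_ids else "R_N"


def split_by_named_road(segments, validation_share, seed):
    """Randomly hold out whole named roads for validation.

    segments: list of dicts with at least a 'road_name' key (assumed schema).
    """
    names = sorted({s["road_name"] for s in segments})
    n_val = round(len(names) * validation_share)
    held_out = set(random.Random(seed).sample(names, n_val))
    training = [s for s in segments if s["road_name"] not in held_out]
    validation = [s for s in segments if s["road_name"] in held_out]
    return training, validation


def overall_accuracy(predicted, benchmark):
    """Equation (16): share of correctly classified road segments, in percent."""
    correct = sum(p == b for p, b in zip(predicted, benchmark))
    return 100.0 * correct / len(benchmark)


# Toy data: 10 named roads with 2 segments each; 4 segments are retained.
segments = [{"id": 2 * r + s, "road_name": f"road_{r}"}
            for r in range(10) for s in range(2)]
retained_ids = {0, 1, 4, 5}
benchmark = {s["id"]: output_class(s["id"], retained_ids) for s in segments}

for group in range(10):                      # 10 random groups per percentage
    training, validation = split_by_named_road(segments, 0.3, seed=group)
    predicted = ["R_N" for _ in validation]  # placeholder for a trained BPNN
    truth = [benchmark[s["id"]] for s in validation]
    print(group, len(training), len(validation), overall_accuracy(predicted, truth))
```

Splitting by named road rather than by segment keeps all segments of one road on the same side of the split, which matches the description that a percentage of named roads was chosen for validation.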

Experimental results and analyses

Results of using the SOM

The outputs from using the SOM were a number of clusters of road segments. It can be seen in Table 2 that the number of clusters increases with the size of the output map. As an example, in the case of a 3×3 map, all road segments were grouped into 8 clusters. The number of road segments in each cluster is listed in Table 3, and the corresponding road segments are visualized by a program developed in VC++ and shown in Figure 8. In this figure, it was found that the road segments of the same cluster have very similar properties. For instance, the road segments in Cluster (1, 3) seem to be short and relatively unimportant. However, it is not easy to distinguish the importance of road segments among some of these clusters (e.g. Cluster (2, 1) and Cluster (2, 2)), and it is also not easy to determine which and how many road segments should be retained for a specific representation (e.g. at 1 : 250 000 scale).

Table 2. The numbers of clusters for different sizes of an output map

Map size              2×2     3×3     4×4     5×5
Number of clusters    4       8       14      22

Table 3. The numbers of road segments in the clusters of an output map

Row \ Column    1       2       3
1               1222    1928    840
2               318     906     7
3               284     0       1207

Figure 7. Parameters setting for training an SOM (a screenshot of the parameter panel in TANAGRA)

Figure 8. Output results of using an SOM with a 3×3 output map (road segments in each cluster are highlighted with a darker colour)

Results of using the BPNN

Firstly, Table 4 lists the overall accuracies obtained with different percentages of road segments as validation data and different random samples for each percentage. It can be found in Table 4 that:

1. In terms of the overall accuracies, for each random sampling, the overall accuracy is above 80%. That means that more than 80% of the road segments in the validation data were correctly classified.
2. In terms of the basic statistics, the average overall accuracies are 83.36% and 86.32% for the ratios 7 : 3 and 9 : 1, respectively; that is, the larger the share of training data (e.g. 9 : 1), the higher the average overall accuracy. However, the smaller the share of validation data, the higher the standard deviation (i.e. 0.0388) and the wider the range of overall accuracies in the 10 groups of random samples. This is because the smaller the validation set, the greater the possibility of that data being either too easy or too difficult to classify. In the extreme case, if there is only one road segment in the validation data, the overall accuracy is either 100% or 0%.

Then, for instance, Figure 9 shows two groups of random samples (each containing 10% of all the named roads), as well as the road segments retained at 1 : 250 000 scale by a BPNN model trained with the other 90% of roads. The results in the benchmark are also shown for comparison. It can be seen in this figure that most of the road segments in each sample have been correctly classified. This finding is not only consistent with that shown in Table 4, but also indicates that the BPNN can adaptively determine which and how many road segments are to be retained for a specific representation (e.g. at 1 : 250 000 scale).

Figure 9. Output results of using a BPNN
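The averages and standard deviations quoted for Table 4 can be re-derived from the ten overall accuracies of each ratio. The short sketch below assumes that the reported standard deviation is the sample standard deviation (n − 1 in the denominator) expressed as a fraction rather than a percentage; under that assumption the printed values are reproduced up to rounding in the last digit.

```python
from statistics import mean, stdev

# Overall accuracies (%) of the 10 random samples, as listed in Table 4.
acc_7_3 = [81.30, 83.90, 83.40, 83.60, 83.30, 84.30, 85.70, 81.60, 82.90, 83.60]
acc_9_1 = [82.90, 81.80, 82.00, 83.10, 85.40, 85.60, 90.40, 90.50, 90.60, 90.90]

for label, acc in (("7 : 3", acc_7_3), ("9 : 1", acc_9_1)):
    # stdev() is the sample standard deviation; dividing by 100 turns
    # percentage points into a fraction, as in Table 4.
    print(label,
          f"average = {mean(acc):.2f}%",
          f"standard deviation = {stdev(acc) / 100:.4f}",
          f"range = {max(acc) - min(acc):.2f} percentage points")
```

The wider range for the 9 : 1 ratio is the numerical counterpart of the observation that smaller validation sets give less stable accuracy estimates.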

BPNN involves samples for training, and the national mapping agencies already have a series of representations of road networks at different scales, which may be used as samples.

TESTING OF ANNS APPLIED TO SELECTIVE OMISSION IN ROAD UPDATING

Data for testing

In the last section, the two neural networks (i.e. SOM and BPNN) used for selective omission in a road network have been tested by experiment. It was found that the BPNN can adaptively determine which and how many road segments are to be retained for a specific representation (or at a specific scale); however, this is not the case for the SOM. This section follows with a further test of the BPNN with practical application road updating, due to the fact that the

The data for testing were a series of maps produced by the Land Department of Hong Kong in 1994 and 2007, respectively. Each series of maps involved four different scales, i.e. 1 : 20 000, 1 : 50 000, 1 : 100 000 and 1 : 200 000. The Kowloon Peninsula district from Hong Kong was chosen as the study area because there were largescale changes and updates to the roads in this area between 1994 and 2007. Figure 10 shows the digitalized road network retrieved from the map at 1 : 20 000 scale,

Table 4. Results of overall accuracies Ratio of training data and validation data

Overall accuracy

Statistics

Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7 Sample 8 Sample 9 Sample 10 Average Standard deviation

7 : 3 (or 30% for validation)

9 : 1 (or 10% for validation)

81.30% 83.90% 83.40% 83.60% 83.30% 84.30% 85.70% 81.60% 82.90% 83.60% 83.36% 0.0126

82.90% 81.80% 82.00% 83.10% 85.40% 85.60% 90.40% 90.50% 90.60% 90.90% 86.32% 0.0388

47

Use of ANN in Updating Road Networks

2. training and evaluation: the old road segments (existing on a map produced in 1994) were used for training in TANAGRA with the same parameters (Figure 6). After training, both the updated road segments (only existing on a map produced in 2007), and all road segments were used for validation, and the overall accuracy (equation (16)) was calculated for each case.

Experimental results and analyses

Table 6 lists the overall accuracies in consideration of different target scales. Figures 11 and 12 highlight the retained road segments at three different scales of applying BPNN to the updated segments and all segments, respectively. The results in the corresponding benchmark are also shown for comparison. The results in both Table 6 and Figures 11 and 12 show that:

Figure 10. The digitized road network of Kowloon Peninsula at 1 : 20 000 scale, in the year 2007 (the updated road segments are highlighted in a darker colour)

produced in the year 2007. The updated roads are recorded in the database and highlighted in a darker colour. The corresponding roads existing at different scales were manually identified in the database, although it is also possible to establish their links automatically (Sester et al., 1998; Haunert, 2005; Anders et al., 2007).

Experimental design for testing

This experiment is very similar to that described in the section on ‘Testing of artificial neural networks used for selective omission in a road network’. The differences are as follows:

1. input and output variables:

1. Most (i.e. more than 80%) of the road segments used for validation were correctly classified. This indicates that the BPNN is effective for selective omission in updating a road network.

2. The overall accuracies obtained by applying the BPNN to all the segments are higher than those obtained for the updated segments only. This indicates that the overall accuracies calculated for the old road segments (used for training) are higher than those for the updated segments (used for validation).

3. Although most of the road segments were correctly classified, the output results may still have limitations: new dead-end roads (which may cause large detours for map users) may be produced (see the dotted circle in Figure 12d), and a few disconnected roads may also be produced (see the dotted triangle in Figure 12b). However, these limitations can be mitigated by other algorithms/approaches (Peschier, 1997; Touya, 2007).
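The inference in point 2 follows from a weighted average. As a sketch, write n_o and n_u for the numbers of old and updated segments, and A_o, A_u and A_all for the corresponding overall accuracies. These symbols are introduced here for illustration only, and the argument assumes that overall accuracy is the fraction of correctly classified segments.

```latex
A_{\mathrm{all}} = \frac{n_o A_o + n_u A_u}{n_o + n_u}
\quad\Longrightarrow\quad
A_o - A_u = \frac{n_o + n_u}{n_o}\,\bigl(A_{\mathrm{all}} - A_u\bigr)
```

Since n_o > 0, a value of A_all above A_u, as at every target scale in Table 6, implies A_o > A_u.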

Table 6. Results of overall accuracies

Applying to              Target scale    Overall accuracy
The updated segments     1 : 50K         80.0%
                         1 : 100K        81.9%
                         1 : 200K        92.1%
All the segments         1 : 50K         82.4%
                         1 : 100K        87.0%
                         1 : 200K        98.6%

• input variables: in this test, five properties were used as input variables, namely length, degree, closeness, betweenness and road class. Road class was used instead of number of lanes or road surface because it was the attribute available in the database;

• output variables: the test data involve three pairs of source and target scales, which result in three cases of output variables (Table 5);
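Two of the centrality inputs listed above, degree and closeness, can be illustrated on a small connectivity graph. The sketch below assumes the standard Freeman (1979) definitions with unit edge weights and a graph whose nodes are road segments; the definitions actually used in the experiment are given in an earlier section and may differ. Betweenness is left out to keep the example short, and `toy` is an invented network.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def degree(graph: Graph, node: Hashable) -> int:
    """Number of segments directly connected to `node`."""
    return len(graph[node])


def closeness(graph: Graph, node: Hashable) -> float:
    """Freeman closeness: (n - 1) / sum of shortest-path lengths from `node`."""
    distance = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search with unit edge weights
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    if total == 0 or len(distance) < len(graph):
        return 0.0  # design choice here: isolated node or disconnected graph
    return (len(graph) - 1) / total


# Invented connectivity graph: nodes are segments, edges join segments that meet.
toy = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
```

In the toy graph, segment `b` touches the three other segments, so `degree(toy, "b")` is 3 and `closeness(toy, "b")` is 1.0, while the peripheral segment `a` has degree 1 and closeness 0.6.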

Table 5. Three cases of output classes

Case   Scale of source map   Scale of target map   Output classes
1      1 : 20 000            1 : 50 000            ‘R_Y_50’ – the road segment is retained at 1 : 50K scale
                                                   ‘R_N_50’ – the road segment is not retained at 1 : 50K scale
2      1 : 20 000            1 : 100 000           ‘R_Y_100’ – the road segment is retained at 1 : 100K scale
                                                   ‘R_N_100’ – the road segment is not retained at 1 : 100K scale
3      1 : 20 000            1 : 200 000           ‘R_Y_200’ – the road segment is retained at 1 : 200K scale
                                                   ‘R_N_200’ – the road segment is not retained at 1 : 200K scale
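Each case in Table 5 is a two-class problem whose labels encode the decision and the target scale. A minimal sketch of how such labels could be generated is given below; the helper `output_class` and the dictionary `TARGET_SCALES` are illustrative names and not part of the original workflow.

```python
# Case number -> target scale in thousands, as listed in Table 5.
TARGET_SCALES = {1: 50, 2: 100, 3: 200}


def output_class(retained: bool, case: int) -> str:
    """Build the Table 5 class label for one road segment."""
    scale = TARGET_SCALES[case]
    return f"R_{'Y' if retained else 'N'}_{scale}"
```

For example, `output_class(True, 1)` gives `'R_Y_50'` and `output_class(False, 3)` gives `'R_N_200'`.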


Figure 11. Output results of applying trained BPNNs to the updated road segments

CONCLUSIONS AND DISCUSSIONS

This paper discusses the issue of selective omission in a road network for the purpose of map updates. Two neural networks (i.e. SOM and BPNN) were investigated, and the BPNN was further applied to road updating. The performance of the BPNN was validated with two experiments. In the first experiment, the road network of Lower Hutt City in New Zealand was used as the study area, in which a certain percentage of roads were randomly chosen for training a neural network model, and the other roads were then used for validation of this model. The performance of the SOM was also tested with this study area, but all road segments were used for training. In the second experiment, the road network of Kowloon in Hong Kong was used as the study area, in which the old roads were used for training and the updated roads were used for validation. The results of both experiments were compared with a corresponding benchmark at a smaller scale and evaluated by overall accuracy. The above experiments showed that:

1. In terms of using the SOM, this method only divides the roads/road segments in a network into a number of clusters; however, it may be hard to determine which and how many roads/road segments should be retained at a specific scale.

2. In terms of using the BPNN, this method can determine which and how many roads/road segments are to be retained at a specific scale, and more than 80% of the road segments used for validation can be correctly classified. This method is also effective for selective omission in road updating.

Figure 12. Output results of applying trained BPNNs to all the road segments

However, this study still has some limitations, which are worth discussing in the future:

• Firstly, as introduced in the section on ‘Input attributes and output classes of applying artificial neural networks to selective omission in a road network’, the use of the BPNN involves input attribute(s). In the study area of Lower Hutt City, six properties (i.e. length, degree, closeness, betweenness, number of lanes and road surface) of each road segment were used as input attributes, but five (i.e. length, degree, closeness, betweenness and road class) were used in the study area of Kowloon. Different input attributes were considered because the thematic properties recorded in the two databases were different. However, the use of different input attributes may result in different performances of the selective omission. Thus, which combination of input attributes results in the best performance of selective omission will be investigated in the future.

• Secondly, a number of parameters (e.g. number of neurons and learning rate) need to be determined when training a BPNN. Different parameters may also result in different performances of selective omission. Thus, which parameter set results in the best performance of selective omission is also worth investigating.

• Thirdly, the output results of a BPNN may still have flaws, such as new dead-end roads and disconnected roads, although these flaws can be remedied by other algorithms or approaches. Therefore, the output results may need further refinement.

• Last but not least, this paper focused only on selective omission in a road network; other operators such as simplification, smoothing and displacement may also be needed for a cartographic representation of a road network. In future work, other operators will be considered for a true cartographic representation of a road network.
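The two flaws named in the third limitation can at least be detected automatically. The sketch below is a simple check and not one of the refinement approaches cited in the paper (Peschier, 1997; Touya, 2007). It assumes that each road segment is represented by its two end junctions, and all function names are illustrative.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # a road segment given by its two end junctions


def adjacency(edges: Iterable[Edge]) -> Dict[Hashable, Set[Hashable]]:
    """Map each junction to the set of junctions it is joined to."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def new_dead_ends(source: Iterable[Edge], retained: Iterable[Edge]) -> Set[Hashable]:
    """Junctions that are dead ends after omission but were not before."""
    before, after = adjacency(source), adjacency(retained)
    return {n for n, nbrs in after.items()
            if len(nbrs) == 1 and len(before[n]) > 1}


def components(edges: Iterable[Edge]) -> List[Set[Hashable]]:
    """Connected components of a network, found by breadth-first search."""
    graph, seen, result = adjacency(edges), set(), []
    for start in graph:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            for nbr in graph[queue.popleft()]:
                if nbr not in group:
                    group.add(nbr)
                    queue.append(nbr)
        seen |= group
        result.append(group)
    return result
```

More than one component in the retained network signals disconnected roads, and any junction returned by `new_dead_ends` marks a place where an omitted segment has cut a through route.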

BIOGRAPHICAL NOTES

Qi Zhou received a master's degree in Photogrammetry and Remote Sensing from the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of Wuhan University in China in June 2008. He received a PhD degree from the Department of Land Surveying & Geo-Informatics of the Hong Kong Polytechnic University in 2012. He is now working at China University of Geosciences. His research domain lies in the fields of automated map generalisation, data mining and network analysis.

ACKNOWLEDGEMENTS

The project was supported by the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan), and by the China Postdoctoral Science Foundation (no. 2012M521496). The authors are thankful to the Land Department of Hong Kong and the Land Information of New Zealand for providing the experimental data. The authors would also like to express special thanks to all the anonymous reviewers and the editor for their valuable comments.

REFERENCES

Allouche, M. K. and Moulin, B. (2005). ‘Amalgamation in cartographic generalization using Kohonen’s feature nets’, International Journal of Geographical Information Science, 19, pp. 899–914.
Anders, K. H., Sester, M. and Bobrich, J. (2007). ‘Incremental Update in an MRDB’, in 23th International Cartography Association Conference, Moscow, Aug 4–10.
Bação, F., Lobo, V. and Painho, M. (2008). ‘Applications of different self-organizing map variants to geographical information science problems’, in Self-Organizing Maps: Applications in Geographic Information Science, ed. by Agarwal, P. and Skupin, A., Chapter 2, John Wiley & Sons, New York.
Balboa, J. L. G. and López, F. J. A. (2008). ‘Generalization-oriented road line classification by means of an artificial neural network’, Geoinformatica, 12, pp. 289–312.
Basheer, I. A. and Hajmeer, M. M. (2000). ‘Artificial neural networks: fundamentals, computing, design, and application’, Journal of Microbiological Methods, 43, pp. 3–31.
Crowther, P. S. and Cox, R. J. (2005). ‘A Method for Optimal Division of Data Sets for Use in Neural Networks’, Lecture Notes in Computer Science, 3684, pp. 1–67.
Chaudhry, O. and Mackaness, W. (2005). Rural and urban road network generalization deriving 1 : 250000 from OS MasterMap, http://www.era.lib.ed.ac.uk/bitstream/1842/1137/1/ochaudry 001.pdf (accessed 31 January 2009).
Chen, J., Hu, Y. G., Li, Z. L., Zhao, R. L. and Meng, L. Q. (2009). ‘Selective omission of road features based on mesh density for automatic map generalization’, International Journal of Geographical Information Science, 23, pp. 1013–1032.
Crucitti, P., Latora, V. and Porta, S. (2006). ‘Centrality measures in spatial networks of urban roads’, Physical Review E, 73, pp. 0361251–5.
Dai, H. and Macbeth, C. (1997). ‘Effects of learning parameters on learning procedure and performance of a BPNN’, Neural Networks, 10, pp. 1505–1521.
Edwardes, A. and Mackaness, W. A. (2000). ‘Intelligent Generalisation of Urban Road Network’, in Geographical Information Systems Research UK 2004 Conference (GISRUK 2000), pp. 81–85, York, Apr 5–7.
Fischer, M. M. and Leung, Y. (2001). Geocomputational Modelling: Techniques and Applications, Springer, Berlin/Heidelberg.
Freeman, L. C. (1979). ‘Centrality in social networks: conceptual clarification’, Social Networks, 1, pp. 215–239.
Harrie, L. and Hellström, A. K. (1999). ‘A prototype system for propagating updates between cartographic data sets’, The Cartographic Journal, 36, pp. 133–140.
Haunert, J. H. (2005). ‘Link Based Conflation of Geographic Datasets’, in 8th ICA Workshop on Generalisation and Multiple Representation, A Coruña, Jul 7–8.
Hearty, Á. P. and Gibney, M. J. (2008). ‘Analysis of meal patterns with the use of supervised data mining techniques – artificial neural networks and decision trees’, The American Journal of Clinical Nutrition, 88, pp. 1632–1642.
Ito, F. and Murata, A. (2009). ‘Artificial neural network model estimating land use change in the southwestern part of Nagareyama City, Chiba Prefecture’, in New Frontiers in Urban Analysis: In Honor of Atsuyuki Okabe, pp. 65–79, CRC Press (Taylor & Francis Group), Boca Raton, FL.
Jiang, B. and Claramunt, C. (2004). ‘A structural approach to the model generalization of urban street network’, GeoInformatica, 8, pp. 157–173.
Jiang, B. and Harrie, L. (2004). ‘Selection of roads from a network using self-organizing maps’, Transactions in GIS, 8, pp. 335–350.
Karlik, B. and Olgac, A. V. (2011). ‘Performance analysis of various activation functions in generalized MLP architectures of neural networks’, International Journal of Artificial Intelligence and Expert Systems, 1, pp. 75–122.
Kilpeläinen, T. and Sarjakoski, T. (1995). ‘Incremental generalization for multiple representations of geographic objects’, in GIS and Generalization, ed. by Müller, J. C., Lagrange, J. P. and Weibel, R., pp. 209–218, Taylor & Francis, London.
Kohonen, T. (2001). Self-Organizing Maps, 3rd ed., Springer, Berlin.
Kreveld, M. and Peschier, J. (1998). ‘On the Automated Generalization of Road Network Maps’, in 3rd International Conference on GeoComputation, http://www.geocomputation.org/1998/21/gc_21.htm (accessed 31 January 2010).
Leitner, M. and Buttenfield, B. P. (1995). ‘Acquisition of procedural cartographic knowledge by reverse engineering’, Cartography and Geographic Information Systems, 22, pp. 232–241.
Li, Z. L. (2006). Algorithmic Foundation of Multi-scale Spatial Representation, CRC Press (Taylor & Francis Group), Boca Raton, FL.
Li, Z. L. and Choi, Y. H. (2002). ‘Topographic map generalization: association of road elimination with thematic attributes’, The Cartographic Journal, 39, pp. 153–166.
Li, Z. L. and Zhou, Q. (2012). ‘Integration of linear- and areal-hierarchies for continuous multi-scale representation of road networks’, International Journal of Geographical Information Science, 26, pp. 855–880.
Mackaness, W. A. and Beard, M. K. (1993). ‘Use of graph theory to support map generalization’, Cartography and Geographic Information Systems, 20, pp. 210–211.
Mackaness, W. (1995). ‘Analysis of urban road networks to support cartographic generalization’, Cartography and Geographic Information Systems, 22, pp. 306–316.
Mackaness, W. and Mackechnie, G. (1999). ‘Automating the detection and simplification of junctions in road networks’, GeoInformatica, 3, pp. 185–200.
Masters, T. (1993). Practical Neural Network Recipes in C++, Academic Press, New York.
Peschier, J. (1997). Computer aided generalization of road network maps. MSc thesis, Department of Computer Science, Utrecht University, Utrecht, The Netherlands.
Porta, S., Crucitti, P. and Latora, V. (2006). ‘The network analysis of urban roads: a dual approach’, Physica A: Statistical Mechanics and Its Applications, 369, pp. 853–866.
Rakotomalala, R. (2005). ‘TANAGRA: A Free Software for Research and Academic Purposes’, in EGC 2005, RNTI-E-3, Vol. 2, pp. 697–702, Amsterdam, Feb 14–16 (in French).
Refaeilzadeh, P., Tang, L. and Liu, H. (2009). ‘Cross-validation’, in Encyclopedia of Database Systems, pp. 532–538, Springer, Berlin.
Sester, M. (2005). ‘Optimization approaches for generalization and data abstraction’, International Journal of Geographical Information Science, 19, pp. 871–897.
Sester, M., Anders, K. H. and Walter, V. (1998). ‘Linking objects of different spatial data sets by integration and aggregation’, GeoInformatica, 2, pp. 335–358.
Thomson, R. and Richardson, D. (1995). ‘A Graph Theory Approach to Road Network Generalisation’, in 17th International Cartographic Conference, pp. 1871–1880, Barcelona, Sep 3–9.
Thomson, R. and Richardson, D. (1999). ‘The “Good Continuation” Principle of Perceptual Organization Applied to the Generalization of Road Networks’, in 19th International Cartographic Conference, pp. 1215–1223, Ottawa, Ont., Aug 14–21.
Thomson, R. and Brooks, R. (2007). ‘Generalisation of geographical networks’, in Generalization of Geographic Information: Cartographic Modeling and Applications, ed. by Ruas, A., Mackaness, W. A. and Sarjakoski, L. T., pp. 255–267, Elsevier, Oxford.
Touya, G. (2007). ‘A Road Network Selection Process Based on Data Enrichment and Structure Detection’, in 10th ICA Workshop on Generalisation and Multiple Representation, pp. 595–614, Moscow, Aug 2–3.
Töpfer, F. and Pillewizer, W. (1966). ‘The principle of selection’, The Cartographic Journal, 3, pp. 10–16.
Werbos, P. J. (1994). The Roots of Backpropagation, John Wiley & Sons, New York.
Werschlein, T. and Weibel, R. (1994). ‘Use of Neural Networks in Line Generalization’, in EGIS ’94, pp. 76–85, Paris, Mar 30–Apr 1.
Yu, Z. (1993). The effects of scale change on map structure. Doctoral thesis, Department of Geography, Clark University, Worcester, MA.
Zhang, Q. (2004). ‘Road Network Generalization Based on Connection Analysis’, in 11th International Symposium on Spatial Data Handling, pp. 343–353, Leicester, Aug 23–25.
