International Journal of Geo-Information
Article

A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis

Yingya Zhang 1, Ning Ye 1,2,*,†, Ruchuan Wang 3,† and Reza Malekian 4,*

1 Department of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [email protected]
2 Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210003, China
3 Key Lab of Broadband Wireless Communication and Sensor Network Technology of Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [email protected]
4 Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago, Chile
* Correspondence: [email protected] (N.Y.); [email protected] (R.M.); Tel.: +86-138-1389-2478 (N.Y.); +27-12-420-4305 (R.M.); Fax: +86-025-83492470 (N.Y.); +27-12-420-362-5000 (R.M.)
† These authors contributed equally to this work.

Academic Editors: Chi-Hua Chen, Kuen-Rong Lo and Wolfgang Kainz Received: 17 January 2016; Accepted: 9 May 2016; Published: 18 May 2016

Abstract: Traffic congestion clustering judgment is a fundamental problem in the study of traffic jam warning. However, it is not satisfactory to judge traffic congestion degrees using vehicle speed only. In this paper, we collect traffic flow information with three properties (traffic flow velocity, traffic flow density and traffic volume) of urban trunk roads, which is used to judge the traffic congestion degree. We first define a grey relational clustering model by leveraging grey relational analysis and rough set theory to mine relationships of multidimensional-attribute information. Then, we propose a grey relational membership degree rank clustering algorithm (GMRC) to discriminate clustering priority and further analyze the urban traffic congestion degree. Our experimental results show that the average accuracy of the GMRC algorithm is 24.9% greater than that of the K-means algorithm and 30.8% greater than that of the Fuzzy C-Means (FCM) algorithm. Furthermore, we find that our method can be more conducive to dynamic traffic warnings.

Keywords: urban traffic; grey relational membership degree; traffic congestion judgment

1. Introduction

With the rapid development of urban traffic, urban vehicle numbers surge and the pressure on traffic capacity is increasing sharply. Traffic problems are therefore becoming serious and constrain the development of a city. In China, the conditions of roads and vehicles are quite inconvenient, and traffic congestion has caused substantial social and economic problems. Traffic jams not only waste time, delay work, and reduce efficiency but also cause a substantial waste of fuel, increase the probability of accidents and exacerbate the already serious difficulties facing traffic control and management. Since the 1980s, intelligent transportation systems (ITSs), consisting of integrated computer technology, automatic control technology, communication technology and information processing technology, have achieved remarkable results worldwide. Many aspects of ITSs are based on traffic information, and traffic information processing has become an important aspect of ITSs [1]. The critical function of an ITS is to manage and control traffic flows and avoid the development of traffic jams. When traffic jams occur, such systems should provide timely and effective solutions and ease traffic pressure. Therefore, clustering and evaluating urban traffic congestion is of great importance and is thus a prerequisite for correctly inspecting traffic congestion.

To determine the road congestion degree, different definitions of traffic congestion have been formulated. Rothenberg defined the traffic congestion rank as the number of vehicles on the road exceeding the carrying capacity at the generally acceptable road service level [2]. Under such a definition, U.S. authorities divide Level of Service (LOS) into six levels from A to F based on the ratio of the actual vehicle flow (volume) to road capacity, V/C. In Virginia, when V/C is less than 0.77, LOS is at the D level, and the traffic situation is considered to be a high-density but stable traffic flow. When LOS is at the E level, traffic begins to deteriorate and results in a serious traffic jam. One method of assessing the traffic congestion level (rank) is to compare a certain traffic parameter with a threshold: when the parameter is greater than the threshold, a traffic jam is considered to have formed. Such a method can detect whether traffic congestion occurs but cannot serve as a comprehensive information evaluation method for traffic congestion. In this paper, we collect traffic flow information based on three properties for Nanjing urban trunk roads to comprehensively weigh the level of traffic congestion in the same time period. In addition, we judge which road is allowing smooth traffic flows, which is suffering from a light traffic jam, which is suffering from a traffic jam, and which is suffering from a heavy traffic jam. Figure 1 represents the Nanjing transport network area in its geographical context. Using the above information, we can provide reference values for traffic management.

At present, methods of judging traffic congestion can be divided into three categories: (1) direct detection methods, such as video detection, which require installing many cameras and are therefore costly; (2) indirect detection methods, which detect an event's existence mainly according to its influence on traffic flow, and which are low-cost, simple and easy to operate but suffer from a lower detection rate and a high false alarm rate; and (3) theory-model-based methods, in which an algorithm is designed to judge traffic congestion. The third category is widely used, and some mature theories have been applied, such as cluster analysis, grey system theory, and rough set theory. Our work concentrates on clustering traffic flow information based on grey relational analysis to judge traffic congestion situations [3].

ISPRS Int. J. Geo-Inf. 2016, 5, 71; doi:10.3390/ijgi5050071; www.mdpi.com/journal/ijgi

Figure 1. Nanjing transport network area in its geographical context.

The paper is organized as follows. First, we give a brief summary of previous related work in Section 2. Next, we introduce how to build the grey relational clustering model in Section 3.


In Section 4, the grey relational membership degree rank clustering algorithm (GMRC) is described. Then, we illustrate experimental results in Section 5. Finally, we conclude the paper in Section 6.

2. Related Work

2.1. Related Theories

Our work particularly involves grey system theory. Grey system theory was proposed by Professor Julong Deng in 1982 and includes many aspects such as grey generation, grey analysis, grey modeling and grey prediction [4]. This theory has been widely used in image processing, network security and logistics management. Grey relational analysis is not only an important aspect of grey theory but also a measurement method for studying the similarity of data. The analysis has no strict requirement on sample size and is usually used for the analysis and comparison of the geometric forms of curves described by several points in space; the closer a curve is to the referenced standard array, the higher the relational degree between the two and the higher its rank. Next, we briefly introduce rough set theory, a mathematical method used to address uncertain, imprecise and fuzzy information. This theory has been applied in machine learning, data mining, decision-making analysis, etc. In addition, rough set theory is an important branch of uncertainty calculations, which also include theories such as fuzzy sets, neural networks and evidence theory. In this paper, we analyze urban road traffic information using grey relational clustering and combine the results with rough set theory to establish a decision table system. Finally, we judge the degree of urban traffic congestion.

2.2. Clustering Techniques

Clustering analysis is the subject of active research in several fields such as statistics, pattern recognition, machine learning, and data mining.
It aims to partition a large set of data objects into different subsets or groups so that the requirements of homogeneity and heterogeneity are fulfilled. Homogeneity requires that data in the same cluster be as similar as possible, and heterogeneity means that data in different clusters should be as different as possible. At present, a number of clustering methods are widely used. Among them, a weighted adaptive algorithm based on the equilibrium approach has been proposed, wherein the grey method is introduced into a spectral clustering algorithm [5] to measure the similarity between the data. The experimental results show that the proposed algorithm can effectively overcome the shortcomings of spectral clustering concerning the sensitivity of parameters. Another clustering algorithm, based on entropy, can automatically determine the number of clusters from the distribution characteristics of the data sample and reduce the user's participation [6]. The result is more objective, and the algorithm can find large and small clusters of any shape. The disadvantages of that algorithm are the selection of the initial points and the effects of noise and outliers. However, these shortcomings can be overcome by screening the original data for noise and outliers, excluding false data and improving the reliability and separation of the data to minimize the influence of noise and outliers. These two methods inspire our work in that we consider an approach that evaluates data comprehensively; therefore, we use grey relational theory, which can effectively process multidimensional attribute data. However, the traditional algorithms mostly perform a simple clustering of similar data and do not consider which indicator characteristics the data attributes have. Moreover, their clustering results cannot reflect the rank of the data within each cluster.

2.3. K-Means Algorithm and FCM Algorithm

Currently, the K-means algorithm and the Fuzzy C-Means (FCM) algorithm are commonly used to cluster data. We compare these two algorithms with our proposed algorithm in this work. First, we provide a brief introduction to the K-means algorithm. The K-means algorithm is a classical clustering algorithm that is widely used in different subject areas, and various improved algorithms have evolved from it. The K-means objective, however, is an NP-hard optimization problem (generally speaking, problems that cost polynomial time to solve and are easy to address are commonly regarded as P problems; problems that cost super-polynomial time to solve are considered difficult and are known as NP-hard problems), which means that in many cases a global optimum cannot be obtained [7]. The Euclidean distance is typically used as the criterion function of the K-means algorithm, where the distance between two data points with different units is sometimes calculated. The Euclidean distance is the real distance between two points in an m-dimensional space and is mainly determined by the heavily fluctuating elements. Slightly fluctuating elements are often neglected, and this phenomenon becomes more obvious as the ratio of the differences between corresponding elements increases.

The FCM algorithm is a fuzzy clustering algorithm based on an objective function. First, we introduce the concept of fuzziness: fuzziness is uncertainty, and the uncertainty indicates the similarity between two objects [8,9]. For example, suppose we regard twenty years of age as the standard for judging whether an individual is young. A 21-year-old person is then not young according to this crisp division; however, 21 years of age is very young in our opinion. Thus, we can vaguely assign the possibility of a 21-year-old belonging to "young" as 90% and to "not young" as 10%. Here, 90% and 10% are not probabilities; rather, they are degrees of similarity. Although the FCM algorithm can effectively perform clustering, it does not differentiate the rank to which each cluster belongs. In view of the above problems, we mine traffic flow information relations with different attributes via grey relational clustering.
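The scale sensitivity of the Euclidean criterion noted above is easy to demonstrate: when one attribute fluctuates on a much larger numeric scale (e.g., traffic volume in veh/h versus density in veh/km), it dominates the distance. A minimal sketch with invented sample values:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# (velocity km/h, density veh/km, volume veh/h): invented sample points
a = (60.0, 20.0, 1800.0)   # free-flowing road
b = (20.0, 80.0, 1810.0)   # congested road, but a similar volume reading
c = (59.0, 21.0, 2100.0)   # almost the same state as a, different volume

# The large-scale volume axis dominates the raw Euclidean distance,
# so a appears closer to the congested b than to the near-identical c.
print(dist(a, b) < dist(a, c))  # True
```

This is one reason the GMRC pipeline normalizes every attribute to a common (0, 1) range before comparing objects (Section 4.1.2).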
In addition, we propose the GMRC algorithm to judge the clustering priority. Simultaneously, we apply the K-means and FCM clustering algorithms and contrast their results with those of the GMRC algorithm to evaluate the performance of our algorithm.

3. Grey Relational Clustering Model

In this paper, we suppose that the traffic congestion state of roads is divided into four ranks (not four clusters), namely, smooth, light jam, jam and heavy jam, which is our precondition, where smooth characterizes the best condition, belonging to the first rank, and heavy jam characterizes the worst condition, belonging to the fourth rank. According to the different definitions of traffic congestion described above, traffic congestion is not only related to certain parameters [10] (such as traffic volume and traffic speed) but also involves many factors. Therefore, a single parameter is insufficient to describe traffic congestion states: when a vehicle's speed is zero, the road may be blocked by too many vehicles that cannot move, or the road may in fact be smooth, with no vehicles driving on it. Therefore, a light traffic flow can match two states: heavy jam and smooth. Likewise, low density may characterize traffic that is smooth but may also arise when more trucks and other large vehicles are on the road. However, a comprehensive analysis of the three variables of traffic flow (traffic flow velocity, traffic flow density and traffic volume) can reflect the real situation concerning traffic jams. Our purpose is to evaluate the traffic jam degree of different roads based on a multiple-attribute index; thus, we introduce the three variables of traffic flow to evaluate the degree of traffic jams. Moreover, to address data with multidimensional mixed attributes and obtain data clustering levels, we use grey relational analysis theory. Traffic flows are characterized by three properties [11]: traffic flow velocity, traffic flow density and traffic volume.
Traffic flow velocity indicates the average speed of vehicles on the road, in units of km/h. The traffic flow density, namely, the density of vehicles, indicates the number of vehicles per unit length of road. The traffic volume is defined as the number of vehicles traveling through a certain road section in unit time. This paper uses these three properties of traffic flows to judge the traffic congestion state.

Definition 1. Assume the analysis domain data set X = {Xi | Xi = (Xi1, Xi2, Xi3)}, i ∈ N, as the comparative object set, where Xi represents the ith data object, each of which has three attributes: traffic flow velocity, traffic flow density, and traffic volume.


Definition 2. Suppose a reference standard array set Y = {Yi | Yi = (Yi1, Yi2, Yi3)} (i = 1, 2, ..., p) (the referenced object set). We regard the traffic flow information (when the road is in the smooth state or vehicles are traveling at the free stream velocity) as a reference standard array set. Each group of a reference sequence can obtain a clustering result when combined with a comparative object set by the grey relational clustering method; therefore, p groups of referenced arrays will produce p clustering results. In addition, the group of referenced standards extracted from the data is used for clustering when combined with comparative object sets, which can produce a clustering result. Thus, over the whole process, we can obtain p + 1 clustering results.

Definition 3. The evaluation information M is regarded as the output information of the grey relational clustering system S, and F is defined as the evaluation function. Then, the relationship between S and M is denoted as

S = ((X, Y), G),  F : X × Y → M    (1)

where G represents the grey relational similar matrix. The evaluation function F of the above model leverages the decision table system of rough set theory to weigh the degree of contribution of the cluster members to the clustering results. F is also called the grey relational membership function, whose inputs are X and Y and whose output is M, which reflects the similarity between elements inside a class and the similarity between classes.

In this paper, we propose the GMRC algorithm, oriented to multidimensional attribute data, to judge clustering rank priority. The method's procedure includes pre-processing the data, transforming the data into matrix form, setting the threshold, and filtering and deleting abnormal data objects. Next, we perform cluster analysis on the analysis domain data objects via grey relational clustering analysis and then apply weighted decision analysis from rough set theory. Finally, the clustering results are calculated using probability theory.

4. GMRC Algorithm

The architecture of the GMRC algorithm consists of the following phases (Figure 2):

[Figure 2 flowchart: traffic flow data (Road 1, Road 2, ...) → data preprocessing → grey relational clustering → establish the decision table system → the probability of each class → output: the rank set of the analysis domain data objects.]

Figure 2. The architecture of the grey relational membership degree rank clustering algorithm (GMRC).

According to the index features of the three properties of the traffic flow data, the optimal referenced standard is extracted from the data set of the analysis domain. Then, the optimal referenced standard and the referenced array set are used to calculate the grey relational degree in combination with the dataset of the analysis domain. Subsequently, the grey relational matrix can be obtained, and the cluster members can be calculated. Next, we use rough set theory, in which the cluster members are applied to build the decision information table and complete the fusion for the weighting of the cluster members, according to which we calculate the rank of each data object using probability theory. We regard the first category as the highest priority, which means that the data object is closer to the referenced standard and simultaneously that the object is better, corresponding to the lightest traffic congestion degree.

4.1. Grey Relational Clustering Steps


4.1.1. Extracting the Optimal Referenced Standard according to the Characteristics of Traffic Flow Data

Because the problem of clustering analysis is solved based on a given indicator system, it is very important to choose an appropriate indicator to achieve reasonable and appropriate clustering. There are three properties for each Xi from the data source, and based on the indicator attributes of the data objects, the optimal referenced standard X0 can be extracted from the data source. Specific instructions are as follows. The three properties have different units; therefore, we first need to convert the data to the same format. "↑" indicates that a greater property value is better; "↓" indicates the opposite. (a, b), where a and b are numbers, indicates that a property value is better when it lies in this interval. We extract the optimal referenced standard X0 = {X01, X02, X03} according to the characteristics of the three properties, as follows.

Considering the characteristic of the traffic flow velocity as "↑", we use

X_0j = max_i X_ij  (i ∈ N, j = 1)    (2)

Considering the characteristic of the traffic flow density as "↓", we use

X_0j = min_i X_ij  (i ∈ N, j = 2)    (3)

Considering the characteristic of the traffic volume as the interval type (a, b), we use

X_0j = { X_ij | min_i |X_ij − (a + b)/2| }  (i ∈ N, j = 3)    (4)
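The extraction of the optimal referenced standard in Equations (2)-(4) can be sketched as follows; the preferred volume interval (a, b) and the sample triples are invented for illustration:

```python
def optimal_reference(X, a, b):
    """Extract X0 = (X01, X02, X03) from data objects X = [(v, d, q), ...]:
    Eq. (2) max velocity, Eq. (3) min density, Eq. (4) the volume closest
    to the midpoint of the preferred interval (a, b)."""
    x01 = max(x[0] for x in X)                                 # Eq. (2), "up" type
    x02 = min(x[1] for x in X)                                 # Eq. (3), "down" type
    mid = (a + b) / 2
    x03 = min((x[2] for x in X), key=lambda q: abs(q - mid))   # Eq. (4), interval type
    return (x01, x02, x03)

# Invented (velocity, density, volume) triples and preferred interval:
data = [(60, 20, 1800), (35, 45, 2200), (12, 90, 900)]
print(optimal_reference(data, 1600, 2600))  # -> (60, 20, 2200)
```

Note that X0 assembles the best value of each attribute independently, so it need not coincide with any observed data object.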

4.1.2. Data Normalization Processing

Because there are different types of data, the units are also different. According to the characteristics of the properties, the data of the analysis domain are processed using different measures, and the data are compressed to (0, 1). The processing is as follows.

For the traffic flow velocity of the whole data set, we use

X_i(j) = |X_ij − X_jmin| / (X_jmax − X_jmin)  (i = 0, 1, ..., n, j = 1)    (5)

For the traffic flow density of the whole data set, we use

X_i(j) = |X_ij − X_jmax| / (X_jmax − X_jmin)  (i = 0, 1, ..., n, j = 2)    (6)

For the traffic volume of the whole data set, we use

X_i(j) = 1 − |X_ij − X_0j| / |X_0j|  if |X_ij − X_0j| / |X_0j| ≤ 1;  X_i(j) = 0 otherwise  (i = 0, 1, ..., n, j = 3)    (7)

X_ij is the original matrix element in Equations (5)-(7) and is normalized to X_i(j), where X_jmax represents the maximum of the jth column, X_jmin represents the minimum of the jth column, and anything inside the braces is the limiting condition. After normalization, we obtain (p + 1) matrices, namely, A0, A1, ..., Ap, where A0 is acquired by normalizing the optimal referenced standard X0 together with the comparative object set, while A1, A2, ..., Ap are acquired by normalizing the referenced array set together with the comparative object set, where A0 and Ap are defined as
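A minimal sketch of the normalization in Equations (5)-(7), assuming the optimal referenced standard x0 from Section 4.1.1 is appended as the last row (sample values invented):

```python
def normalize(rows, x0):
    """Normalize data objects [(v, d, q), ...] plus the reference x0,
    following Equations (5)-(7); every value is compressed into [0, 1].

    Velocity, Eq. (5): (x - col_min) / (col_max - col_min)   bigger is better
    Density,  Eq. (6): (col_max - x) / (col_max - col_min)   smaller is better
    Volume,   Eq. (7): 1 - |x - x0_q| / |x0_q|, floored at 0
    """
    data = rows + [x0]                 # x0 becomes the last row
    v = [r[0] for r in data]
    d = [r[1] for r in data]
    out = []
    for r in data:
        nv = (r[0] - min(v)) / (max(v) - min(v))
        nd = (max(d) - r[1]) / (max(d) - min(d))
        ratio = abs(r[2] - x0[2]) / abs(x0[2])
        nq = 1 - ratio if ratio <= 1 else 0.0
        out.append((nv, nd, nq))
    return out

rows = [(60, 20, 1800), (12, 90, 900)]      # invented samples
print(normalize(rows, (60, 12, 1800))[-1])  # reference row -> (1.0, 1.0, 1.0)
```

By construction, the optimal referenced standard normalizes to (1, 1, 1), which is why the grey relational degree to the last row measures closeness to the ideal. (The sketch assumes each column is non-constant so the denominators are nonzero.)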

A0 =
[ X1(1) X1(2) X1(3)
  X2(1) X2(2) X2(3)
  ...
  Xn(1) Xn(2) Xn(3)
  X0(1) X0(2) X0(3) ]

Ap =
[ X1(1) X1(2) X1(3)
  X2(1) X2(2) X2(3)
  ...
  Xn(1) Xn(2) Xn(3)
  Yp(1) Yp(2) Yp(3) ]    (8)

The last row of the matrix A0 is the normalized optimal referenced standard sequence, and the last row of the matrix Ap is the pth normalized referenced standard array.

4.1.3. Calculating the Grey Relational Degree and Generating the Grey Relational Similar Matrix

(1) Calculate the grey relational degree. The grey relational degree reflects the degree of similarity between two comparative objects. For example, we focus on calculating the grey relational degree of the matrix A0. The formulas are as follows (Equations (9) and (10)):

γ_0i(k) = [min_i min_k |X0(k) − Xi(k)| + σ max_i max_k |X0(k) − Xi(k)|] / [|X0(k) − Xi(k)| + σ max_i max_k |X0(k) − Xi(k)|]  (k = 1, 2, 3)    (9)

γ_0i = (1/m) Σ_{k=1..m} γ_0i(k)    (10)

where σ is the resolution coefficient, which has a range of 0 to 1; generally, we assume that σ is 0.5. γ_0i(k) is the correlation coefficient between Xi and X0 at the kth point, and the grey relational degree is expressed by γ_0i, where X0 is regarded as the referenced sequence and Xi is regarded as the comparative sequence. Similarly, we can obtain γ_ij when Xi is regarded as the referenced sequence and Xj is regarded as the comparative sequence. Then, X1, X2, ..., Xn, X0 are each regarded as the referenced sequence; meanwhile, the (n + 1) sequences are regarded as comparative sequences (the (n + 1) sequences are not only referenced sequences but also comparative sequences). Finally, we calculate the (n + 1) × (n + 1) grey relational degree matrix Γ0 according to the grey relational analysis, where Γ0 is obtained based on A0 and consists of the elements γ_ij. Similarly, we can calculate the grey relational degree matrices Γ1, Γ2, ..., Γp based on A1, A2, ..., Ap.

(2) Calculate the grey relational similar matrix G. We calculate the similarity elements g_ij = (γ_ij + γ_ji)/2 in G. G0, for example, is the grey relational similar matrix obtained when X0 is regarded as the referenced sequence, and the (n + 1)th row of grey relational similarity elements of G0 is calculated when X0 is regarded as the optimal referenced sequence. Similarly, we can obtain p + 1 grey relational similar matrices: G0, G1, ..., Gp.

4.1.4. Grey Relational Degree Clustering Analysis

Based on G, i.e., the grey relational similar matrix, we construct the maximal relational tree, which consists of the values of the last row elements ordered in G. This is shown in Figure 3.
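A sketch of Equations (9) and (10) for one reference sequence against a set of comparative sequences, with σ = 0.5 as in the text (the numeric sequences are invented and assumed already normalized):

```python
def grey_relational_degrees(ref, comps, sigma=0.5):
    """Equations (9)-(10): grey relational degree of each comparative
    sequence in comps with respect to the reference sequence ref.

    sigma is the resolution coefficient (0.5 in the text); the min/max
    deltas are global over all sequences and attributes, as in Eq. (9).
    """
    deltas = [[abs(r - c) for r, c in zip(ref, comp)] for comp in comps]
    flat = [delta for row in deltas for delta in row]
    dmin, dmax = min(flat), max(flat)
    degrees = []
    for row in deltas:
        # Eq. (9) per attribute, then Eq. (10) averages over the m attributes.
        coeffs = [(dmin + sigma * dmax) / (delta + sigma * dmax) for delta in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees

# A sequence identical to the reference attains the maximal degree 1.0:
print(grey_relational_degrees((1.0, 1.0, 1.0),
                              [(1.0, 1.0, 1.0), (0.2, 0.4, 0.9)]))
```

The symmetrized similarity elements of G would then be g_ij = (γ_ij + γ_ji)/2, averaging the two degrees obtained when each sequence takes its turn as the reference.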

Figure 3. Maximal relational tree (the objects Xa—Xb—Xc—...—Xk—Xn linked in a chain, ordered from good to bad).

Based on G^0, we can obtain gij, which is also called the closeness degree between Xi and Xj. The greater gij is, the closer Xi and Xj are; conversely, the smaller gij is, the further apart Xi and Xj are. Based on Figure 3, the maximal relational tree with closeness degrees is generated, as shown in Figure 4, where αi represents the closeness degree between Xi and Xj.

ISPRS Int. J. Geo-Inf. 2016, 5, 71



Figure 4. Maximal relational tree with closeness degrees (each edge of the chain Xa—Xb—...—Xn is labeled with its closeness degree α1, α2, ..., αn; objects ordered from good to bad).
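The closeness-degree chain of Figure 4 is what the clustering step below cuts apart: edges whose closeness degree falls under a threshold λ separate clusters. A minimal sketch (our own helper name; the adjacent-branch-difference check described in the text is omitted for brevity):

```python
def cut_relational_chain(objects, closeness, lam):
    """Split the ordered chain of objects into clusters by removing
    every edge whose closeness degree is below the threshold lam.
    closeness[i] is the degree between objects[i] and objects[i + 1]."""
    clusters, current = [], [objects[0]]
    for obj, a in zip(objects[1:], closeness):
        if a < lam:            # weak link: start a new cluster (level)
            clusters.append(current)
            current = [obj]
        else:                  # strong link: stay in the same cluster
            current.append(obj)
    clusters.append(current)
    return clusters

chain = ["X1", "X2", "X3", "X4", "X5"]
degrees = [0.95, 0.60, 0.91, 0.55]   # closeness between neighbouring objects
print(cut_relational_chain(chain, degrees, lam=0.85))
# -> [['X1', 'X2'], ['X3', 'X4'], ['X5']]
```

Because the chain is ordered from good to bad, the leftmost segment corresponds to the first (smooth) level and the rightmost to the kth level.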

Based on Figure 4, we set the isolated point coefficient λ, which lies in the interval (0, 1). We cut the tree when the closeness degree is less than λ and when adjacent branches exhibit substantial differences. The disconnected tree thus makes its connected branches form k levels of clusters along the horizontal. We consider the closest branch as the first level and the loosest branch as the kth level (if k ≥ 4, we also classify the fifth branch, the sixth branch, the seventh branch, and so on up to the kth branch as the fourth rank; that is, smooth corresponds to the first rank and heavy jam corresponds to the fourth rank). In this way, we can obtain a cluster member of G. Therefore, we use Γ^0, Γ^1, ..., Γ^p to compute p + 1 grey relational similarity matrices G^0, G^1, ..., G^p, from which we can find the closeness relationship among the objects. Therefore, we can obtain p + 1 clustering results, namely, cluster members.

4.2. Establishing the Evaluation Function for the Grey Relational Clustering System

According to the above initial clustering results, we use rough set theory to establish the decision table system that is applied to weight the contribution of cluster members to the clustering results and to give weights to the cluster members.

4.2.1. Establishing the Decision Table System

First, the optimal referenced standard and the referenced standard set are combined with the comparative object set through the grey relational clustering method to construct the decision table system F = <U, C, D, V, f>, where U = {X1, X2, ..., Xn} represents the analysis domain data; C = {c1, c2, ..., cp} are the conditional attributes, i.e., the cluster members formed by the referenced standard set; D = {d}, the decisional attribute, is the cluster member obtained using the optimal referenced standard; V = VC ∪ VD is the range set of traffic flow properties, where Vch represents the level in cluster member ch (h = 1, 2, ..., p); f represents the evaluation function, f: U × (C ∪ D) → V; and f(Xi, ch) ∈ Vch represents the level of Xi in cluster member ch.

4.2.2. Calculating Information Entropy

In the decision table system, the information entropy weight I(ch, D) indicates how important the cluster member ch (conditional information) is for the result D (decisional information) when the optimal referenced standard is chosen to calculate the cluster member. According to the information entropy of rough set theory, I(ch, D) is described as follows:

I(c, D) = H(D) − H(D|{c})    (11)

H(c) = −Σ_{i=1}^{k} P(RCi) log(P(RCi))    (12)

H(D|{c}) = −Σ_{i=1}^{k} P(RDi|RCi) log(P(RDi|RCi))    (13)

P(RCi) = |RCi| / |X|    (14)

P(RDi|RCi) = |RDi ∩ RCi| / |RCi|    (15)


where i = 1, 2, ..., k (k is the number of clusters), RCi represents the ith divided cluster of cluster member c, |RCi| represents the number of elements in the ith cluster, and |X| = n. A larger I(c, D) indicates that conditional attribute c is more important to the decisional information D. In addition, H(c) and H(D|{c}) are determined by the conditional information entropy of rough set theory. Thus, the relative weight of each cluster member can be determined, and a more important cluster member corresponds to a greater weight.

4.3. Calculating the Level of Clustering Membership of Data Objects

Step 1: Calculate the importance of the attribute information entropy Eh = I(ch, D) for each cluster member ch in the decision system, where h = 1, 2, ..., p.

Step 2: Set the relative weight of each cluster member:

ωh = Eh / Σ_{h=1}^{p} Eh    (16)
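Equations (11)–(16) can be sketched numerically. Assuming each cluster member and the decisional attribute are given as lists of level labels, the entropy weights I(ch, D) and the normalized relative weights ωh might be computed as follows (the function names are ours, not from the paper):

```python
from math import log
from collections import Counter

def entropy_weight(member, decision):
    """I(c, D) = H(D) - H(D | {c}) for one cluster member.
    member[i] / decision[i] are the level labels of data object i."""
    n = len(decision)
    # H(D): entropy of the decisional partition
    h_d = -sum(c / n * log(c / n) for c in Counter(decision).values())
    # H(D | {c}): entropy of D within each block of the member's partition
    h_d_c = 0.0
    for level in set(member):
        idx = [i for i in range(n) if member[i] == level]
        p_c = len(idx) / n
        for cnt in Counter(decision[i] for i in idx).values():
            p = cnt / len(idx)
            h_d_c -= p_c * p * log(p)
    return h_d - h_d_c

def relative_weights(members, decision):
    e = [entropy_weight(m, decision) for m in members]  # Eq. (11)
    s = sum(e)
    return [x / s for x in e]                           # Eq. (16)

decision = [1, 1, 2, 2]
members = [[1, 1, 2, 2],   # agrees perfectly with D -> large I(c, D)
           [1, 2, 1, 2]]   # independent of D -> I(c, D) = 0
print(relative_weights(members, decision))  # -> [1.0, 0.0]
```

A cluster member that reproduces the decisional partition receives all the weight; one that carries no information about it receives none, which matches the intended "more important member, greater weight" behaviour.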

Step 3: Use probability theory to calculate the probability of each data object emerging in every cluster based on the relative weights, and choose the level for which the probability is maximized; this yields the final clustering results. The probability of data object Xi belonging to the jth level (j = 1, 2, ..., k) is defined as

P(Xi^j) = Σ_{h=1, Mih=j}^{p} ωh    (17)

where Mih represents the level of data object Xi in cluster member ch, which has been computed in the grey relational clustering. Thus, the grey relational membership degree level of Xi can be expressed as:

Level(Xi) = {j | max_{j=1→k} P(Xi^j)}    (18)

The final result is C = {C^1, C^2, ..., C^k}, where C^k includes all data objects whose grey relational membership degree level is the kth level:

C^k = {Xi | Level(Xi) = k, Xi ∈ X}    (19)
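Equations (17)–(19) amount to a weighted vote: each cluster member contributes its weight to the level it assigned to Xi, and the level with the maximum accumulated probability wins. A minimal sketch under that reading (hypothetical helper name):

```python
def membership_level(levels_for_xi, weights):
    """levels_for_xi[h] = level of X_i in cluster member c_h (M_ih);
    weights[h] = relative weight of c_h. Implements Eqs. (17)-(18)."""
    score = {}
    for level, w in zip(levels_for_xi, weights):
        # P(X_i^j) = sum of the weights of members assigning level j
        score[level] = score.get(level, 0.0) + w
    return max(score, key=score.get)  # Level(X_i) = argmax_j P(X_i^j)

# three cluster members assign X_i to levels 2, 2 and 3 with these weights
print(membership_level([2, 2, 3], [0.4, 0.25, 0.35]))  # -> 2
```

Applying this to every Xi and grouping by the returned level produces the final partition C of Equation (19).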

4.4. GMRC Algorithm Detailed Description

In this paper, we study the problem of multidimensional-attribute information clustering for traffic flow and propose the GMRC algorithm. First, we transform the dataset into matrix form, extract the optimal referenced standard from the dataset, and then perform normalization to eliminate the effects of different units. Next, we obtain the preliminary clustering results according to grey relational analysis. Finally, we build a decision table system to calculate the relative weight of each cluster member. The algorithm is described as follows.

Input: Analysis domain data set X = {Xi | Xi = (Xi1, Xi2, Xi3)}, i ∈ N,
Referenced array set Y = {Yi | Yi = (Yi1, Yi2, Yi3)} (i = 1, 2, ..., p).
Output: The level (rank) set of analysis domain data objects.


Algorithm 1: GMRC (X, Y)
1   Level = null; Weight = null; Member = null; Entropy = null; // Initialize sets Level, Weight; matrices Member, Entropy
2   Data_Preprocessing (X, Y); // Data pre-processing
3   X0 = ExtractOptimal (X); // Extract the optimal referenced standard
4   S = Normalization (X, Y); // Normalization processing
5   T = MaxRelTrees (S); // Construct the maximum relational trees
6   T' = ClosenessTrees (T); // Construct the maximum relational tree with closeness degrees
7   Member0 = GreyCluster (X, X0); // Regard X0 as the referenced standard to compute a cluster member
8   Foreach (Yi in Y)
9       Memberi = GreyCluster (X, Yi); // Regard Yi as the referenced standard to compute a cluster member
10  End
11  F = DecisionSystem (Members); // Establish the decision table system F
12  Entropy = CalculateEntropy (F); // Calculate the information entropy for each cluster member
13  Weight = CalculateWeight (Entropy); // Calculate the relative weight of each cluster member
14  CalculateLevel (X); // Calculate the membership degree level of each Xi in X

The steps of the algorithm are described as follows:
Step 1: Initialize the parameters, pre-process the traffic flow data, and set the threshold to filter and delete abnormal data objects (lines 1 and 2).
Step 2: According to the features of the three properties of the traffic flow data, extract the optimal referenced standard from the analysis domain data (line 3).
Step 3: Normalize the analysis domain data set in combination with the referenced standard sequences (line 4).
Step 4: Compute the grey relational degree matrices, determine the grey relational similarity matrices, and then construct the maximum relational trees based on the (n + 1)th row elements of those matrices (line 5).
Step 5: Based on Step 4, construct the maximum relational tree with closeness degrees (line 6).
Step 6: Compare the closeness degrees between data objects, cut the tree where the closeness degree is less than λ and adjacent branches exhibit large differences, and then obtain k levels of clustering results. Similarly, we obtain in total p + 1 cluster members from the p + 1 referenced arrays (lines 7–10).
Step 7: Establish the decision table system with the p cluster members obtained from the referenced standard array set as conditional attributes; the single cluster member obtained from the optimal referenced standard is regarded as the decisional attribute (line 11).
Step 8: Compute the information entropy of the cluster members for decision making, which is used to weigh the contribution of each cluster member to the clustering results (line 12).
Step 9: Calculate the weight of each cluster member (line 13).
Step 10: Calculate the probability of each data object emerging in every cluster, choose the level for which the probability is maximized, and obtain the final clustering results (line 14).

4.5. Grey Relational Membership Function

The performance of traditional clustering results depends on the distances between elements inside classes and the distances between classes. Shorter distances between elements inside a class indicate better classes; conversely, longer distances between classes indicate better classes [12]. Our purpose for clustering is to obtain the membership degree rank of classes and to judge the rank of data objects. Thus, based on the grey relational similarity degree, we construct a membership function reflecting the similarity between elements inside a class and the similarity between classes. Assume that γX,Y denotes the grey relational similarity degree between object X and object


Y, and C = {C^1, C^2, ..., C^k} represents the divided clusters. SC, which is used to weigh the division C, is defined as

SC = (1/k) Σ_{i=1}^{k} (1/(|C^i| × |C^i|)) Σ_{X,Y∈C^i} γXY + (1/(k(k−1))) Σ_{i=1}^{k} Σ_{l=1,l≠i}^{k} (1/(|C^i| × |C^l|)) Σ_{X∈C^i,Y∈C^l} γXY    (20)
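Under our reading of Equation (20), SC averages the mean within-class similarity over the k clusters and adds the averaged between-class similarity. A numerical sketch (the helper name and the gamma callback are our own; this is not the authors' code, and it assumes k ≥ 2):

```python
def evaluation_s(clusters, gamma):
    """S_C of Eq. (20) under our reading: mean within-cluster grey
    relational similarity plus averaged between-cluster similarity.
    gamma(x, y) returns the similarity degree of two objects."""
    k = len(clusters)
    total = 0.0
    for i, ci in enumerate(clusters):
        # mean similarity of all (ordered) pairs inside cluster i
        within = sum(gamma(x, y) for x in ci for y in ci) / (len(ci) * len(ci))
        # mean similarity between cluster i and every other cluster
        between = 0.0
        for l, cl in enumerate(clusters):
            if l == i:
                continue
            between += sum(gamma(x, y) for x in ci for y in cl) / (len(ci) * len(cl))
        total += within + between / (k - 1)
    return total / k
```

With a constant similarity γ ≡ 1, SC evaluates to 2.0 regardless of the partition, which is a quick sanity check on the normalization; a partition whose between-class similarities vanish scores lower than one whose clusters overlap.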

5. Experimental Results and Analysis

In this paper, we collected traffic data for Nanjing trunk roads, which connect the main commercial areas and which carry high traffic volumes through high-density residential areas. Congestion on these roads must be relieved in a timely manner before traffic jams form. Generally speaking, people's daily routines are very regular [13]: they go to work in the morning and go home at dusk. This can quickly lead to traffic congestion during rush hours. A previous study noted that a single parameter, such as velocity, is insufficient to describe traffic congestion states. To obtain detailed knowledge for judging traffic congestion, a comprehensive analysis of the three traffic flow variables can be used to determine the traffic flow state and reflect the real conditions of the road for predicting traffic jams [14]. To test our algorithm, we experimentally collected traffic flow data from 50 monitoring points along Nanjing's trunk roads during the time periods of approximately 7:00–9:30 a.m. and 4:30–7:00 p.m., which correspond to the rush hours. In addition, we chose 30 drivers with more than five years of driving experience as testers and watched their vehicle driving videos to obtain their evaluation of the traffic flows' four states (smooth, light jam, jam, and heavy jam) [15]. Then, we evaluated the clustering results to validate the accuracy of our algorithm compared with other clustering methods, namely, the K-means algorithm and the FCM algorithm. In this experiment, we assumed that the resolution coefficient σ is 0.5 and that the number of levels is 4. Because the algorithms are stochastic in nature, the average results of 20 tests for each algorithm on each dataset are used as the experimental results. Table 1 shows the corresponding information entropy of the four cluster members obtained from the primary clustering data samples, namely, the relative weights of the cluster members.
To verify the performance of our algorithm, six sample points were randomly selected from the dataset, as shown in Table 2. In Table 2, we find that the clustering of the traffic state is most closely related to traffic flow velocity; however, the velocity cannot fully determine the traffic state. For example, the traffic flow speed in the 459th group is 40.3 km/h, and the state is a light jam. However, the traffic flow speed in the 812th group is 45.5 km/h, which is greater than 40.3 km/h, but the state is jammed. This phenomenon occurs mainly because the traffic volume of the latter is greater than that of the former. The three properties of the traffic flow should therefore be comprehensively considered rather than only the velocity.

Table 1. Relative weights of cluster members.

Cluster Members    Relative Weights
C1                 0.3746
C2                 0.2644
C3                 0.2128
C4                 0.2481

Table 2. Random samples.

Samples    Traffic Flow Velocity (km/h)    Traffic Flow Density    Traffic Volume    Results
15         66.0                            58.2                    16.2              Smooth
35         35.2                            70.3                    25.6              Jam
106        50.8                            66.9                    23.9              Smooth
459        40.3                            68.3                    31.2              Light Jam
689        10.8                            81.9                    41.7              Heavy Jam
812        45.5                            63.6                    58.9              Jam


5.1. Complexity Analysis

We suppose k levels, n data objects, m dimensions, and p referenced standard sequences. The complexity of the GMRC algorithm is described in the following.

5.1.1. Time Complexity Analysis

The time for constructing the initial matrix is O(m × n). Extracting the optimal referenced standard requires access to all elements in the initial matrix, so its time complexity is also O(m × n). The time complexity of the matrix normalization is O(p × n × m). The grey relational degree calculation requires access to all the elements in the n grey relational similarity degree matrices; therefore, its time complexity is approximately O(m × p × n²). The time complexity of sorting the (n + 1)th row grey relational degree elements of the p grey relational similarity degree matrices is O(p × n²). Calculating the information entropy for the cluster members needs O(p × k). The time needed to calculate the clustering membership degree of the analysis domain data is O(n × p × k). According to the above analysis, the average time complexity of the GMRC algorithm is O(k × m × p × n²).

5.1.2. Space Complexity Analysis

The space complexity of the analysis domain initial matrix is O(m × n), and that of the grey relational similarity degree matrices is O(p × n²). In addition, the space complexity of determining the grey relational membership degree in the GMRC algorithm is O(k × p). Therefore, the total space complexity of the algorithm is O(k × p × n²).

5.2. Impact of the Isolated Point Coefficient λ on the Clustering Results

From Figure 5, we find that, for different numbers of data samples, the grey relational membership function value gradually increases with increasing λ. This is because, as λ increases, the grey relational similarity degree between the elements inside classes increases and occupies the dominant position. Clearly, when λ is between 0.84 and 0.87, the function value is maximized. As λ continues to increase beyond this range, the function value decreases, because the grey relational similarity degree between classes comes to occupy the dominant position. From the perspective of the grey relational similarity degree between data objects, we analyze the closeness degree inside classes and among classes and fully utilize the multi-dimensional information features and the overall change in the three properties to better describe the closeness degree between data objects. Therefore, in the next experiment, the range of λ is set as (0.84, 0.87).

Figure 5. Grey relational membership function as a function of λ.

5.3. Comparison with Other Algorithms


Figure 6 illustrates the accuracy rate of the GMRC, K-means and FCM clustering algorithms in each class. We find that the accuracy of GMRC in each class and under different λ is higher than that of the K-means and FCM algorithms. This is because the GMRC algorithm uses the data attribute index features to compute the grey relational similarity degree and takes the quality of the cluster members into consideration. Thus, the algorithm effectively improves internal class similarity to achieve ranked clustering.

Figure 6. Comparison of the accuracy rate of each class among the GMRC, K-means and FCM algorithms.

Figure 7 shows the average accuracy of our GMRC algorithm under different λ and that of the K-means and FCM algorithms. From Figures 6 and 7, we can conclude that the average accuracy of the GMRC algorithm is higher than that of the K-means and FCM algorithms; specifically, it is 24.9% higher than that of the K-means algorithm and 30.8% higher than that of the FCM algorithm. In addition, our new algorithm exhibits better stability: because it does not need to randomly choose an initial center point, as the K-means algorithm does, its results are not affected by stochastic initialization.

Figure 7. Comparison of the average accuracy rate among the GMRC algorithm with different λ and the K-means and FCM algorithms.


6. Conclusions

Judging traffic congestion states is the premise and basis for dynamic traffic congestion warning, traffic guidance, actively avoiding traffic congestion and ensuring smooth roads. However, traffic jams are usually judged by experience. This paper collects traffic flow data and provides a more effective judgment method. We introduce both grey relational analysis and rough set theory into the GMRC algorithm and weigh the membership degree of data object clustering using comprehensive information about the data. In this process, we construct the maximum relational tree with closeness degrees and compare the closeness degrees between data objects. We cut the tree when the closeness degree is less than λ and when adjacent branches exhibit large differences. Consequently, we obtain p + 1 cluster members. Next, we establish a decision table system based on the p cluster members, as conditional attributes, obtained from the referenced standard array sets. Then, we calculate the probability of each data object emerging in every cluster, choose the rank for which the probability is maximized, and finally obtain the final clustering results. Thus, our algorithm fills a gap in the literature: the K-means and FCM algorithms cannot differentiate which rank a cluster belongs to. The experimental results show that the proposed algorithm, which takes the characteristics of the multidimensional data object attributes into consideration, is a superior algorithm. Next, we plan to apply grey relational similarity to other algorithms and to reduce the computational complexity of the algorithm.

Acknowledgments: This research was performed in cooperation with the Institution. The research is supported by the National Natural Science Foundation of China (No. 61572260, No. 61373017, and No. 61572261), the Peak of Six Major Talent in Jiangsu Province (No. 2010DZXX026), the China Postdoctoral Science Foundation (No.
2014M560440), the Jiangsu Planned Projects for Postdoctoral Research Funds (No. 1302055C), the Scientific & Technological Support Project of Jiangsu Province (No. BE2015702), the Jiangsu Provincial Research Scheme of Natural Science for Higher Education Institutions (No. 12KJB520009), and the Science & Technology Innovation Fund for Higher Education Institutions of Jiangsu Province (No. CXZZ11-0405). The authors are grateful to the anonymous referee for a careful review of the details and for their helpful comments, which improved this paper. Author Contributions: All the authors conceived of and designed the study. Furthermore, Yingya Zhang designed the GMRC algorithms presented in this paper and produced the results. Ning Ye provided guidance on modeling, Ruchuan Wang provided guidance on theory and publication fee, and Malekian provided guidance on simulation. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Courtney, R.L. A broad view of ITS standards in the U.S. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Boston, MA, USA, 9–12 November 1997; pp. 529–536.
2. Arnold, E.D., Jr. Congestion on Virginia's Urban Highways. Available online: http://ntl.bts.gov/DOCS/arnold.html (accessed on 17 April 1998).
3. Mok, P.Y.; Huang, H.Q.; Kwok, Y.L.; Au, J.S. A robust adaptive clustering analysis method for automatic identification of clusters. Pattern Recognit. 2012, 45, 3017–3033.
4. Yu, B.; Zhou, Z.; Xie, M. New grey comprehensive correlation degree model and its application. Technol. Econ. 2013, 32, 108–114.
5. Hu, F.; Xia, G.-S.; Wang, Z.; Huang, X.; Zhang, L.; Sun, H. Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2015–2030.
6. Li, K.; Li, P. Fuzzy clustering with generalized entropy based on neural network. In Unifying Electrical Engineering and Electronics Engineering; Springer: New York, NY, USA, 2014; pp. 2085–2091.
7. Zhang, H.R.; Zhang, F. The traditional K-means clustering algorithm research and improvement. J. Xianyang Norm. Univ. 2010, 25, 138–144.
8. Zheng, Y.; Jeon, B.; Xu, D.; Wu, Q.M.J.; Zhang, H. Image segmentation by generalized hierarchical fuzzy C-means algorithm. J. Intell. Fuzzy Syst. 2015, 28, 961–973.
9. Li, K.; Cui, L. A kernel fuzzy clustering algorithm with generalized entropy based on weighted sample. Int. J. Adv. Comput. Res. 2014, 4, 596–600.
10. Wen, H.; Sun, J.; Zhang, X. Study on traffic congestion patterns of large city in China taking Beijing as an example. Procedia Soc. Behav. Sci. 2014, 138, 482–491.
11. Stefanello, F.; Buriol, L.S.; Hirsch, M.J.; Pardalos, P.M.; Querido, T.; Resende, M.G.C.; Ritt, M. On the minimization of traffic congestion in road networks with tolls. Ann. Oper. Res. 2015.
12. Guan, X.; Sun, X.W.; He, Y. A novel feature association algorithm based on grey correlation grade and distance measure. Radar Sci. Technol. 2013, 4, 363–367, 374.
13. He, H.; Tang, Q.; Liu, Z. Adaptive correction forecasting approach for urban traffic flow based on fuzzy C-mean clustering and advanced neural network. J. Appl. Math. 2013, 2013, 633–654.
14. Lu, X.; Song, Z.; Xu, Z.; Sun, W. Urban traffic congestion detection based on clustering analysis of real-time traffic data. J. Geo-Inf. Sci. 2012, 14, 775–780.
15. Elefteriadou, L.; Srinivasan, S.; Steiner, R.L.; Tice, P.C.; Lim, K. Expanded transportation performance measures to supplement level of service (LOS) for growth management and transportation impact analysis. Congest. Manag. Syst. 2012, 19, 977–991.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
