An Approach Based on Association Rules Mining to Improve Road Safety in Morocco
Ait-Mlouk Addi¹, Agouti Tarik¹
¹ Department of Computer Science, Faculty of Science Semlalia, Cadi Ayyad University, Marrakech, Morocco
[email protected],
[email protected]
Abstract: Traffic accidents have a great impact on the socio-economic development of a society. This work applies large-scale data mining methods, in particular association rules combined with a multi-criteria analysis approach, to discover new knowledge from historical data about traffic accidents on one of Morocco's busiest roads, in order to assist police decision makers in formulating new policies and traffic rules for highway management. The study focuses on the outcomes of accidents, using real data obtained from the Ministry of Equipment and Transport of Morocco. Empirical results show that the developed models could provide new information that can help the authorities improve road safety.
Gharnati Fatima²
² Department of Physics, Faculty of Science Semlalia, Cadi Ayyad University, Marrakech, Morocco
[email protected]
According to the traffic accident report provided by the World Health Organization (WHO), 1.24 million people die each year on the world's roads and between 20 and 50 million are injured in road traffic accidents [2]. Moreover, the Centers for Disease Control and Prevention announced that road traffic accidents cost $100 billion in medical care every year, that emergency rooms treat an accident victim every 10 seconds, and that almost 40,000 people are killed in road traffic [3]. Recently, the Ministry of Equipment and Transport of Morocco announced about 68,279 accidents, including 3,021 fatal accidents, with 3,489 people killed, 10,185 people seriously injured, and 91,057 people slightly injured [4].
Keywords: Data mining, traffic accident, frequent itemsets, association rules, algorithms, quality measurements, multiple criteria analysis (MCA).
I. INTRODUCTION
Association rule mining is a technique that allows the user to discover correlations between different items in a database [1]. The results are presented in the form of an antecedent and a consequent, $X_1 \wedge X_2 \wedge \dots \wedge X_n \rightarrow Y_1$. For
example, an association rule extracted from a traffic accident database: “Driver_age5
Attribute | Values | Description
… | … | Nationality of driver
… | … | Service year of the vehicle
Vehic_Type | Pedestrian, Car, Trucks, Motorcycles, other | The type of the vehicle
Light_Cond | Night, day | Light condition
Weather_Cond | Rain, clear, wind | The weather condition
Acci_Cause | Vitesse, Alcohol, Sleep | Causes for the accident
According to data published by the Ministry of Transport, Accident_type is derived from the accident record and classifies each case as fatal or injury. Driv_Age is derived from the driver record and classifies the driver's age as less than 20, between 21 and 27, between 28 and 60, or more than 60. Drive_Exp is derived from the driver record and classifies driving experience as less than 1, between 2 and 4, or more than 5. Vehic_Age is derived from the vehicle record and classifies the age of the vehicle involved in the accident as 1, 2, 3, 4, or more than 5. Vehic_Type is also derived from the vehicle record and classifies the vehicle as car, truck, motorcycle, etc. The light_cond factor takes two values, day and night, and weather_cond takes three values: clear, rain, and wind.
In this section, we apply the proposed approach to a real dataset [20] that identifies the factors of traffic accidents in Morocco, as a case study to illustrate its performance. We used the thresholds support = 0.40, confidence = 0.75, maximum rule length = 3, and lift = 1.10 to extract frequent itemsets, and obtained the 40 rules presented in Table 2.
TABLE II. SET OF EXTRACTED RULES
N: 1, 2, 3, 4, 5, …, 38, 39, 40
Antecedent: "Drive_Nationality=M" "Accident_Type=Fatal" "Accident_Cause=Sleep" "Drive_Nationality=M" "Accident_Cause=Sleep" "Accident_Cause=Sleep" "Accident_Type=Fatal" … "Accident_Cause=Vitesse" "Drive_Nationality=M" "Accident_Cause=Vitesse" "Accident_Cause=Vitesse"
Consequent: "Accident_Cause=Sleep" "Drive_Sex=M" - "Accident_Type=Fatal" "Accident_Type=Fatal" "Drive_Nationality=M" - "Accident_Type=Fatal" "Drive_Nationality=M" - "Accident_Cause=Sleep" … "Accident_Type=Injury" "Accident_Type=Injury" "Drive_Nationality=M" - "Accident_Type=Injury"
Lift: 2.22, 2.22, 2.22, 2.22, 2.22, …, 1.62, 1.62, 1.62
Sup: 0.40, 0.40, 0.40, 0.40, 0.40, …, 0.40, 0.40, 0.40
Conf: 0.88, 0.10, 0.10, 0.10, 0.88, …, 0.88, 0.88, 0.88
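The extraction step can be illustrated with a minimal sketch of a level-wise (Apriori-style) search that filters rules by support, confidence, lift, and maximum rule length, using the threshold values quoted above as defaults. The function names `support` and `mine_rules` and the five toy transactions are assumptions made for illustration; they are not the paper's dataset or implementation.

```python
from itertools import combinations

# Hypothetical toy records. The attribute=value labels mimic Table II,
# but these transactions are NOT the paper's data.
transactions = [
    {"Accident_Type=Fatal", "Accident_Cause=Sleep", "Drive_Nationality=M"},
    {"Accident_Type=Fatal", "Accident_Cause=Sleep", "Drive_Nationality=M"},
    {"Accident_Type=Fatal", "Accident_Cause=Sleep", "Drive_Nationality=M"},
    {"Accident_Type=Injury", "Accident_Cause=Vitesse", "Drive_Nationality=M"},
    {"Accident_Type=Injury", "Accident_Cause=Vitesse", "Drive_Nationality=M"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def mine_rules(transactions, min_support=0.40, min_confidence=0.75,
               min_lift=1.10, max_len=3):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}                      # frozenset -> support
    candidates = [frozenset([i]) for i in items]
    size = 1
    while candidates and size <= max_len:
        # Keep only the candidates that reach the minimum support.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: unions of two frequent itemsets with exactly one more item.
        size += 1
        candidates = list({a | b for a in level for b in level
                           if len(a | b) == size})
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]   # P(Y | X)
                lift = conf / frequent[consequent]  # P(Y | X) / P(Y)
                if conf >= min_confidence and lift >= min_lift:
                    rules.append((antecedent, consequent, sup, conf, lift))
    return rules


for antecedent, consequent, sup, conf, lift in mine_rules(transactions):
    print(sorted(antecedent), "->", sorted(consequent),
          f"sup={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

The sketch omits the subset-pruning optimisation of the full Apriori algorithm; on a dataset of realistic size a dedicated library would be the more sensible choice.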
We used the set of rules previously extracted by the Apriori algorithm (Table 2) as the actions to be evaluated according to the decision maker's preferences (support, confidence, lift, …). For each criterion, an indifference threshold q, a preference threshold p, and a veto threshold v are estimated, and each criterion is given a weight k reflecting its contribution to the final decision. The following table gives the decision matrix obtained:
TABLE III. DECISION MATRIX
Rule\Criteria | C1 | C2 | C3
R_Acci_1 | 0.40 | 0.80 | 2.22
R_Acci_2 | 0.40 | 0.10 | 2.22
R_Acci_3 | 0.40 | 0.10 | 2.22
R_Acci_4 | 0.40 | 0.10 | 2.22
R_Acci_5 | 0.40 | 0.80 | 2.22
… | … | … | …
R_Acci_38 | 0.40 | 0.80 | 1.62
R_Acci_39 | 0.40 | 0.80 | 1.62
R_Acci_40 | 0.40 | 0.80 | 1.62
The next step is to define a set of profiles that can be compared with the extracted association rules.
TABLE IV. PROFILES TABLE
Profile | C1 | C2 | C3
b1 | 1 | 0.5 | 1
b2 | 1 | 1.1 | 1.2
The importance of each criterion in the decision-making is expressed through predefined weights and thresholds, as follows:
TABLE V. THRESHOLD TABLE
Parameter | C1 | C2 | C3
weight (K_j) | 3 | 2 | 1
q_j(b1) | 0.1 | 0.1 | 0.2
p_j(b1) | 0.3 | 0.3 | 0.3
v_j(b1) | 0.4 | 0.4 | 0.4
q_j(b2) | 0.5 | 0.5 | 0.5
p_j(b2) | 0.6 | 0.6 | 0.7
v_j(b2) | 0.8 | 0.9 | 0.9
We set the λ-cut index to its default value of 0.76; this parameter determines the preference situation between a rule a and a profile b_h. The implementation of the case study provides a preference relation between rules and profiles, and an assignment of the association rules through two procedures, pessimistic and optimistic. The results of this assignment are presented as follows:
TABLE VI. ASSIGNMENT PROCEDURES
Class C1
Optimistic | R_Acci_1, R_Acci_2, R_Acci_3, R_Acci_4, R_Acci_5, R_Acci_6, R_Acci_7, R_Acci_9, R_Acci_10, R_Acci_30, R_Acci_32, R_Acci_34
Pessimistic | R_Acci_30, R_Acci_32, R_Acci_34
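The sorting step can be sketched with the textbook form of ELECTRE TRI: partial concordance and discordance indices per criterion, a credibility index, a λ-cut, and the pessimistic and optimistic assignment procedures. Several points are assumptions rather than statements from the paper: all three criteria are treated as maximised, C1, C2, C3 are read as support, confidence, and lift, categories are numbered from 1 (worst) to 3 (best), and the function names are hypothetical. The profile, threshold, and weight values are those of the profiles and threshold tables; the sketch is not claimed to reproduce the assignments reported above.

```python
# Profiles b1, b2 and, per profile, the (q, p, v) threshold vectors.
PROFILES = [(1.0, 0.5, 1.0), (1.0, 1.1, 1.2)]
THRESHOLDS = [
    ((0.1, 0.1, 0.2), (0.3, 0.3, 0.3), (0.4, 0.4, 0.4)),  # b1
    ((0.5, 0.5, 0.5), (0.6, 0.6, 0.7), (0.8, 0.9, 0.9)),  # b2
]
WEIGHTS = (3, 2, 1)
LAMBDA_CUT = 0.76


def partial_concordance(ga, gb, q, p):
    """c_j(a, b): support for 'a is at least as good as b' on one criterion."""
    if ga >= gb - q:
        return 1.0
    if ga <= gb - p:
        return 0.0
    return (p + ga - gb) / (p - q)


def partial_discordance(ga, gb, p, v):
    """d_j(a, b): opposition to 'a is at least as good as b' on one criterion."""
    if ga >= gb - p:
        return 0.0
    if ga <= gb - v:
        return 1.0
    return (gb - ga - p) / (v - p)


def credibility(a, b, q, p, v, k):
    """sigma(a, b): weighted concordance, weakened by strong discordances."""
    c = [partial_concordance(x, y, qj, pj) for x, y, qj, pj in zip(a, b, q, p)]
    d = [partial_discordance(x, y, pj, vj) for x, y, pj, vj in zip(a, b, p, v)]
    concordance = sum(kj * cj for kj, cj in zip(k, c)) / sum(k)
    sigma = concordance
    for dj in d:
        if dj > concordance:
            sigma *= (1.0 - dj) / (1.0 - concordance)
    return sigma


def electre_tri(a, profiles, thresholds, k, lam):
    """Return (pessimistic, optimistic) categories; profiles go from worst to best."""
    relations = []
    for b, (q, p, v) in zip(profiles, thresholds):
        a_s_b = credibility(a, b, q, p, v, k) >= lam   # a outranks b_h
        b_s_a = credibility(b, a, q, p, v, k) >= lam   # b_h outranks a
        relations.append((a_s_b, b_s_a))
    # Pessimistic: scan from the best profile down, stop at the first one a outranks.
    pessimistic = 1
    for h in range(len(profiles), 0, -1):
        if relations[h - 1][0]:
            pessimistic = h + 1
            break
    # Optimistic: scan from the worst profile up, stop at the first one strictly preferred to a.
    optimistic = len(profiles) + 1
    for h in range(1, len(profiles) + 1):
        a_s_b, b_s_a = relations[h - 1]
        if b_s_a and not a_s_b:
            optimistic = h
            break
    return pessimistic, optimistic


rule_1 = (0.40, 0.80, 2.22)  # R_Acci_1 row of the decision matrix
print(electre_tri(rule_1, PROFILES, THRESHOLDS, WEIGHTS, LAMBDA_CUT))
```

Because the outcome depends on the thresholds and on the λ-cut, re-running the sketch with different values of `LAMBDA_CUT` or of the (q, p, v) vectors is a simple way to probe how stable an assignment is.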
According to the generated rules, fatal and injury accidents are mostly associated with the driver's condition, in particular when the driver is asleep. When the multi-criteria analysis approach, specifically the Electre Tri method, was applied to the set of forty rules extracted by the Apriori algorithm, we obtained the 12 best rules after eliminating redundant and uninteresting rules. The remainder belong to the third category. For our case study of 40 rules, the most relevant rules are given in the following table:
"Accident_Type=Fatal" "Accident_Cause=Sleep" "Accident_Type=Fatal" "Accident_Type=Fatal" "Drive_Sex=M" "Accident_Cause=Sleep" "Accident_Type=Injury" "Drive_Nationality=M" "Accident_Type=Injury" "Accident_Type=Injury"
"Drive_Sex=M" "Accident_Type=Fatal" "Accident_Type=Fatal" "Drive_Nationality=M" "Accident_Type=Fatal" "Drive_Nationality=M" "Accident_Cause=Sleep" "Accident_Type=Fatal" "Accident_Cause=Sleep" "Drive_Sex=M" "Accident_Cause=Sleep" "Accident_Type=Fatal" "Drive_Sex=F" "Drive_Sex=F" "Drive_Nationality=M" "Drive_Sex=F"
The assignment statistics are presented below.
TABLE VIII. ASSIGNMENT STATISTICS
Class | Pessimistic assignment | Optimistic assignment | Nbr
C1 | 8% [3 of 40] | 30% [12 of 40] | 12
C2 | 90% [36 of 40] | 2% [1 of 40] | 1
C3 | 2% [1 of 40] | 68% [27 of 40] | 27
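The percentages in the assignment statistics follow directly from the class counts over the 40 rules. A small sketch, with a hypothetical helper name `assignment_shares`, recomputes them; the exact shares behind the rounded figures 8%, 2%, and 68% are 7.5%, 2.5%, and 67.5%.

```python
# Class counts read from the assignment statistics table (40 rules in total).
PESSIMISTIC = {"C1": 3, "C2": 36, "C3": 1}
OPTIMISTIC = {"C1": 12, "C2": 1, "C3": 27}


def assignment_shares(counts):
    """Percentage of rules assigned to each class."""
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()}


for name, counts in (("pessimistic", PESSIMISTIC), ("optimistic", OPTIMISTIC)):
    shares = assignment_shares(counts)
    print(name, {cls: f"{share:.1f}%" for cls, share in shares.items()})
```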
After running the Apriori algorithm on the traffic accident dataset, we obtained forty rules, some of which are uninteresting and others redundant. By integrating the multi-criteria analysis approach, specifically the Electre Tri method, we reduced the number of rules from forty to twelve, selecting the relevant rules and discarding the redundant ones. Moreover, the assignment results are always sensitive to the values of the thresholds $q_j$, $p_j$, $v_j$ and to the decision maker's preferences.
TABLE VII. Most interesting rules
Relevant Rules "Drive_Nationality=M" "Accident_Type=Fatal" "Accident_Cause=Sleep"
In conclusion, by applying the multi-criteria analysis approach, we selected twelve useful and relevant rules from the large set of extracted rules; in other words, we reduced the set of extracted rules and eliminated the redundant ones.
VI. CONCLUSION
In this paper, we have discussed the problem of the usefulness and relevance of the rules produced by a KDD process. We addressed this problem with a multi-criteria analysis approach in order to make recommendations that respond to the decision maker's preferences. We studied forty rules according to three criteria commonly used in the literature for selecting relevant association rules. In particular, the use of the Electre Tri method confirmed that the preferences of experts have a direct impact on the sorting order and on the selection of relevant rules. We also observe that applying multi-criteria analysis, specifically the Electre Tri method, to a set of extracted rules addresses the problems of redundant and uninteresting rules that can occur with traditional algorithms. Applied to a traffic accident database, this result may help the department in charge of traffic accidents to formulate new rules and policies for road safety based on hidden patterns.
REFERENCES
[1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A. Verkamo, “Fast discovery of association rules”, Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, 1996, pp. 307-328.
[2] WHO, http://www.who.int/gho/road_safety/en/
[3] http://www.marylandinjurylawyerblog.com/2010/09/car_accident_statistics_from_t.html
[4] http://www.equipement.gov.ma/en/Pages/home.aspx
[5] Paul J. Ossenbruggen, Jyothi Pendharkar, John Ivan, “Roadway safety in rural and small urbanized areas”, Accident Analysis and Prevention 33(4): 485-498, 2001.
[6] L. Chang and H. Wang, “Analysis of traffic injury severity: An application of non-parametric classification tree techniques”, Accident Analysis and Prevention 38(5): 1019-1027, 2006.
[7] S. Srisuriyachai, “Analysis of road traffic accidents in Nakhon Pathom province of Bangkok using data mining”, Graduate Studies, Bangkok, Mahidol University, 2007.
[8] J. Wong and Y. Chung, “Comparison of Methodology Approach to Identify Causal Factors of Accident Severity”, Transportation Research Record 2083: 190-198, 2008.
[9] T. Beshah and S. Hill, “Mining Road Traffic Accident Data to Improve Safety: Role of Road-related Factors on Accident Severity in Ethiopia”, Proceedings of AAAI Artificial Intelligence for Development (AI-D'10), 2010.
[10] N. Galvão, Fátima Marin H, “Traffic accident in Cuiabá-MT: an analysis through the data mining technology”, Federal University of Mato Grosso-UFMT, Brazil, 2010.
[11] Vandana Munde, Sachin Deshpande, S. K. Shinde, “Data Mining for Traffic Accident Analysis”, International Conference on Advances in Computing and Management, 2012.
[12] Abdelaziz Araar, Amira A. El Tayeb, “Mining Road Traffic Accident Data to Improve Safety in Dubai”, Journal of Theoretical and Applied Information Technology, Vol. 47 No. 3, 31st January 2013, pp. 911-925.
[13] K. Gouda and M. J. Zaki, “Efficiently mining maximal frequent itemsets”, In Proceedings of 1st IEEE International Conference on Data Mining, San Jose, November 2001.
[14] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A. Verkamo, “Mining Association Items in Large Databases”, In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, Washington, DC, 1993, pp. 207-216.
[15] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A. Verkamo, “Fast discovery of association rule”, Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, 1996, pp. 307-328.
[16] M. J. Zaki and C. Hsiao, “CHARM: An efficient algorithm for closed association rule mining”, In 2nd SIAM International Conference on Data Mining, Arlington, April 2002.
[17] A.-M. Addi, A. Tarik, G. Fatima, “Comparative survey of association rule mining algorithms based on multiple-criteria decision analysis approach”, in Control, Engineering & Information Technology (CEIT), 2015 3rd International Conference on, pp. 1-6, 25-27 May 2015.
[18] L. C. Dias and V. Mousseau, “IRIS: A DSS for Multiple Criteria Sorting Problems”, Journal of Multi-Criteria Decision Analysis, (12): 285-298, 2003.
[19] Y. Wei, “Aide multicritère à la décision dans le cadre de la problématique du tri: Concepts, méthodes et applications”, Doctoral thesis, Université Paris Dauphine, Paris, France, 1992.
[20] http://www.aitmlouk-addi.info/2015/10/24/data-mining/