Extracting Association Rules from the SHRP2 Naturalistic Driving Data: A Market Basket Analysis
Saleh R. Mousa, M.Sc
Sherif Ishak, Ph.D., P.E
Louisiana State University
University of Alabama in Huntsville
Osama A. Osman, Ph.D.
Louisiana State University
Presentation Outline
Problem
Introduction
Objectives
Methodology
Results
Conclusions
Problem
Annual traffic accidents: six million
35,000 human lives lost
2.4 million injured
Total annual cost of traffic accidents (fatalities + injuries): $594 billion
These alarming rates underline the importance of studying the factors associated with Crash/Near-Crash (CNC) events and distracted driving conditions
Introduction
Parametric Modeling
Features
Basic statistical structure
Specific assumptions
Certain relationships between input and output variables
Examples
Linear Regression
Poisson Regression
Negative Binomial Regression
Contingency Tables
Non-Parametric Modeling
Features
No fixed structure
Model learns from data
Model becomes more complex to accommodate the complexity of the data
Examples
Classification Trees
Ensemble tree-based methods
Neural Networks
Clustering Analysis
Introduction
Non-Parametric (Data mining) approaches
NDS data make it possible to examine driving behavior and understand the likely causes of crashes
Crash records Only
Market Basket Analysis
Introduction
Market Basket Analysis (MBA)
Objectives
Perform a comprehensive MBA to extract useful association rules
Use the entire SHRP 2 NDS data (crash/near-crash and normal/baseline events)
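Methodology
Rules structure and evaluation criteria
A rule has the form X → Y, where X is the left-hand side (LHS) and Y is the right-hand side (RHS).
\[ \mathrm{SUPPORT}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N} \]
\[ \mathrm{CONFIDENCE}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)} \]
\[ \mathrm{LIFT}(X \rightarrow Y) = \frac{\mathrm{CONFIDENCE}(X \rightarrow Y)}{\mathrm{SUPPORT}(Y)} \]
Here \sigma(\cdot) denotes the number of events that contain the given items, and N is the total number of events.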
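The three evaluation criteria can be illustrated with a minimal Python sketch. It assumes each event is stored as a set of "Variable=Value" items; the helper names and the toy events are hypothetical and are not SHRP 2 records.

```python
from typing import FrozenSet, Iterable, List

Event = FrozenSet[str]


def count(itemset: Iterable[str], events: List[Event]) -> int:
    """Number of events that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for event in events if items <= event)


def support(lhs: Iterable[str], rhs: Iterable[str], events: List[Event]) -> float:
    """Share of all events that contain both the LHS and the RHS."""
    return count(set(lhs) | set(rhs), events) / len(events)


def confidence(lhs: Iterable[str], rhs: Iterable[str], events: List[Event]) -> float:
    """Share of events containing the LHS that also contain the RHS.

    Assumes the LHS occurs at least once (otherwise this divides by zero).
    """
    return count(set(lhs) | set(rhs), events) / count(lhs, events)


def lift(lhs: Iterable[str], rhs: Iterable[str], events: List[Event]) -> float:
    """Confidence of the rule divided by the overall share of the RHS."""
    return confidence(lhs, rhs, events) / (count(rhs, events) / len(events))


# Hypothetical toy events for illustration only.
toy_events: List[Event] = [
    frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"}),
    frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"}),
    frozenset({"Driver Behavior=Distracted", "Event=Normal"}),
    frozenset({"Driver Behavior=None", "Event=Normal"}),
    frozenset({"Driver Behavior=None", "Event=Normal"}),
]
x = {"Driver Behavior=Distracted"}
y = {"Event=Crash/near-Crash"}
print(support(x, y, toy_events), confidence(x, y, toy_events), lift(x, y, toy_events))
```

In this toy set the rule holds in 2 of 5 events and in 2 of the 3 events that contain the LHS, so the lift is above 1: the RHS is more frequent when the LHS is present than it is overall.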
Methodology
Description of Data
Data from all six sites were used (New York, Pennsylvania, Florida, Washington, North Carolina, and Indiana)
Baseline, crash, and near-crash events (23,710 events)
Event details table and driver demographics questionnaire (24 variables per event)
Methodology
Description of Data
Variables per event
Age
Gender
Years of Driving
Annual Miles Traveled
Relation to Junction
Training
Event Duration
Intersection Influence
Secondary Task 1
Income
License Age
Education
State
Working Status
Vehicle
Front seat passengers
Occupied lane
Marital Status
Driver Behavior
Rear seat passengers
Locality
Secondary Task 2
Alignment
Grade
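Market basket analysis works on sets of items, so each event's variables have to be turned into "Variable=Value" items first. The sketch below shows one way this could be done in Python. The function names, the sample record, and the 20-year bin width for driving experience are assumptions for illustration; the slides do not state how the data were actually encoded.

```python
from typing import Dict, FrozenSet, Union

Value = Union[str, int, float]


def bin_years_driving(years: float, width: int = 20) -> str:
    """Map a number of years onto a half-open interval label such as "[0,20)".

    The bin width of 20 is an assumption made for this example.
    """
    low = int(years // width) * width
    return f"[{low},{low + width})"


def to_items(record: Dict[str, Value]) -> FrozenSet[str]:
    """Turn one event record into a set of "Variable=Value" items."""
    items = set()
    for variable, value in record.items():
        if variable == "Years Driving":
            # Continuous variables need to be discretized before mining.
            value = bin_years_driving(float(value))
        items.add(f"{variable}={value}")
    return frozenset(items)


# Hypothetical record, not taken from the SHRP 2 data.
record = {
    "Driver Behavior": "Distracted",
    "Grade": "Level",
    "Years Driving": 7,
    "Rear Seat Passengers": 0,
}
print(sorted(to_items(record)))
```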
Methodology
Thresholds for extracting rules
Rule Support ≥ 3%
Confidence ≥ 75%
Lift > 1
Rule Length ≤ 3
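The thresholds above can be sketched as a brute-force rule search in Python. This is an illustrative enumeration and not the algorithm used in the study (which the slides do not name). It assumes single-item right-hand sides and that rule length counts LHS and RHS items together; the `Rule` type, `mine_rules`, and the toy events are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: str
    support: float
    confidence: float
    lift: float


def mine_rules(events: List[FrozenSet[str]], min_support: float = 0.03,
               min_confidence: float = 0.75, min_lift: float = 1.0,
               max_length: int = 3) -> List[Rule]:
    """Enumerate rules LHS -> RHS that pass all four thresholds."""
    n = len(events)

    def count(itemset: FrozenSet[str]) -> int:
        return sum(1 for event in events if itemset <= event)

    all_items = sorted(set().union(*events))
    rules = []
    for rhs in all_items:
        rhs_count = count(frozenset({rhs}))
        others = [item for item in all_items if item != rhs]
        # k is the LHS size, so the rule length is k + 1.
        for k in range(1, max_length):
            for lhs in combinations(others, k):
                lhs_set = frozenset(lhs)
                both = count(lhs_set | {rhs})
                if both == 0:
                    continue
                s = both / n
                c = both / count(lhs_set)
                l = c / (rhs_count / n)
                if s >= min_support and c >= min_confidence and l > min_lift:
                    rules.append(Rule(lhs_set, rhs, s, c, l))
    return rules


# Hypothetical toy events for illustration only.
toy_events = [
    frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"}),
    frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"}),
    frozenset({"Driver Behavior=Distracted", "Event=Normal"}),
    frozenset({"Driver Behavior=None", "Event=Normal"}),
    frozenset({"Driver Behavior=None", "Event=Normal"}),
]
for rule in mine_rules(toy_events):
    print(sorted(rule.lhs), "->", rule.rhs, rule.support, rule.confidence, rule.lift)
```

Exhaustive enumeration is only practical for small item sets; with 24 variables and 23,710 events a frequent-itemset algorithm such as Apriori would normally be used instead.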
Removing the redundant rules
A specific rule is considered redundant if it is equally or less predictive than a more general rule.
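A minimal Python sketch of this redundancy check follows. It assumes "predictive" is measured by confidence and that a "more general" rule has the same RHS and a strict subset of the LHS; the slides do not specify either point. The `Rule` type and the item names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: str
    confidence: float


def remove_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop every rule that a more general rule in the list matches or beats.

    Only rules present in `rules` are used as candidates for the more
    general rule.
    """
    kept = []
    for rule in rules:
        redundant = any(
            other.rhs == rule.rhs
            and other.lhs < rule.lhs  # strict subset: `other` is more general
            and other.confidence >= rule.confidence
            for other in rules
        )
        if not redundant:
            kept.append(rule)
    return kept


# Hypothetical rules for illustration only.
general = Rule(frozenset({"A=1"}), "Y=1", 0.80)
same_confidence = Rule(frozenset({"A=1", "B=1"}), "Y=1", 0.80)
higher_confidence = Rule(frozenset({"A=1", "C=1"}), "Y=1", 0.85)
print(remove_redundant([general, same_confidence, higher_confidence]))
```

Here the second rule is dropped because adding `B=1` to the LHS does not raise confidence, while the third is kept because the extra item makes the rule more predictive.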
Methodology
Non-redundant rules (4,754 rules)
Methodology
Useless
Obvious rules
(S = support, C = confidence, L = lift)
LHS | RHS | S | C | L
SecondaryTask1=Passenger Interaction | Front Seat Passengers=2 | 6% | 99% | 3.4
Relation to Junction=Interchange | Locality=Interstate/Highway | 10% | 95% | 3.5
SecondaryTask1=None | SecondaryTask2=None | 47% | 100% | 1.2
Grade=Level, Working Status=Full-time | Alignment=straight | 29% | 88% | 1.1
Crash Association Rules
# | LHS | RHS | S | C | L
1 | Driver Behavior=Improper actions, Rear Seat Passengers=0 | Event=Crash/near-Crash | 5% | 75% | 4.2
2 | Driver Behavior=Improper actions, Grade=Level | Event=Crash/near-Crash | 5% | 76% | 4.2
3 | Driver Behavior=Distracted | Event=Crash/near-Crash | 5% | 79% | 4.4
4 | Driver Behavior=Distracted, Grade=Level | Event=Crash/near-Crash | 4% | 79% | 4.4
5 | Driver Behavior=Distracted, Alignment=straight | Event=Crash/near-Crash | 4% | 79% | 4.4
6 | Driver Behavior=Distracted, Front Seat Passengers=1 | Event=Crash/near-Crash | 4% | 79% | 4.5
7 | Driver Behavior=Distracted, Rear Seat Passengers=0 | Event=Crash/near-Crash | 4% | 80% | 4.5
8 | Driver Behavior=Distracted, Years Driving=[0,20) | Event=Crash/near-Crash | 3% | 80% | 4.5
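As a rough reading aid, the lift definition from the Methodology can be rearranged to estimate how common crash/near-crash events are overall. Using the rounded confidence and lift reported for rule 3 (Driver Behavior=Distracted), and therefore only approximately:

\[
\mathrm{SUPPORT}(Y) = \frac{\mathrm{CONFIDENCE}(X \rightarrow Y)}{\mathrm{LIFT}(X \rightarrow Y)} \approx \frac{0.79}{4.4} \approx 0.18
\]

Under this reading, roughly 18% of the events are crash/near-crash events, and a lift of about 4.4 means such events are about 4.4 times as frequent among distracted-driver events as in the data overall. This figure is inferred from the rounded table values and is not reported on the slides.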
Driver Characteristics Rules
Socioeconomic Related Rules
# | LHS | RHS | S | C | L
1 | Age=30-34 | SecondaryTask2=None | 5% | 87% | 1.00
2 | Age=45-49 | SecondaryTask2=None | 4% | 88% | 1.01
3 | Age=50-54 | SecondaryTask2=None | 4% | 90% | 1.04
4 | Age=60-64 | SecondaryTask2=None | 4% | 90% | 1.03
5 | Age=65-69 | SecondaryTask2=None | 5% | 92% | 1.05
6 | Age=70-74 | SecondaryTask2=None | 4% | 94% | 1.08
7 | Age=75-79 | SecondaryTask2=None | 5% | 93% | 1.07
8 | Age=80-84 | SecondaryTask2=None | 4% | 93% | 1.07
9 | Age=35-39 | Event Type=Normal | 3% | 83% | 1.01
10 | Age=40-44 | Event Type=Normal | 3% | 86% | 1.04
11 | Age=45-49 | Event Type=Normal | 4% | 85% | 1.04
12 | Driver Behavior=Distracted | Rear Seat Passengers=0 | 5% | 93% | 1.01
13 | Age=55-59 | Event Type=Normal | 3% | 83% | 1.03
14 | Age=60-64 | Event Type=Normal | 3% | 82% | 1.03
15 | Age=65-69 | Event Type=Normal | 5% | 88% | 1.07
16 | Age=70-74 | Event Type=Normal | 4% | 90% | 1.10
17 | Age=75-79 | Event Type=Normal | 5% | 89% | 1.08
18 | Age=80-84 | Event Type=Normal | 3% | 82% | 1.00
Conclusions
When driving experience is less than 20 years, the driver is more likely to engage in cell phone texting/reading/writing, and the likelihood of a crash/near-crash event increases if a driver in this group gets distracted.
Strong association between the likelihood of a Crash/Near-Crash event and each of the following:
A. Improper actions
B. Driver is distracted by a secondary task
Conclusions
Association between normal driver behavior or normal/baseline events and each of the following:
a) Driving locality is Interstate/Highway/Residential
b) Driver is not near any intersection
c) Driver is married
MBA can be applied in safety research as a more reliable and accurate tool for analysing naturalistic driving data, especially for a comprehensive database with high dimensionality (a large number of variables) and multicategorical variables such as the SHRP 2 NDS data.