Extracting Association Rules from the SHRP2 Naturalistic Driving Data: A Market Basket Analysis

Saleh R. Mousa, M.Sc., Louisiana State University

Sherif Ishak, Ph.D., P.E., University of Alabama in Huntsville

Osama A. Osman, Ph.D., Louisiana State University

Presentation Outline Problem Introduction

Objectives Methodology

Results Conclusions

Problem

Annual traffic accidents:
Six million traffic accidents
35,000 lives lost
2.4 million injured

Total annual cost of traffic accidents (fatalities and injuries): $594 billion

These alarming figures underscore the importance of studying the underlying factors associated with Crash/Near-Crash (CNC) events and distracted driving conditions.

Introduction

Parametric Modeling
Features: basic statistical structure; specific assumptions; certain relationships between input and output variables
Examples: Linear Regression, Poisson Regression, Negative Binomial Regression, Contingency Tables

Non-Parametric Modeling
Features: no fixed structure; model learns from data; model becomes more complex to accommodate the complexity of the data
Examples: Classification Trees, Ensemble Tree-Based Methods, Neural Networks, Clustering Analysis

Introduction

Non-Parametric (Data mining) approaches

NDS data make it possible to examine driving behavior and understand the likely causes of crashes

Crash records Only

Market Basket Analysis

Introduction

Market Basket Analysis (MBA)

Objectives

Perform a comprehensive MBA to extract useful association rules

Use the entire SHRP 2 NDS data (crash/near-crash and normal/baseline events)

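Methodology

Rules structure and evaluation criteria

A rule has the form X ⇒ Y, where X is the left-hand side (LHS) and Y is the right-hand side (RHS).

\[
\mathrm{SUPPORT}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{N}
\]

\[
\mathrm{CONFIDENCE}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
\]

\[
\mathrm{LIFT}(X \Rightarrow Y) = \frac{\mathrm{CONFIDENCE}(X \Rightarrow Y)}{\mathrm{SUPPORT}(Y)}
\]

Here σ(·) is the number of events that contain the given itemset and N is the total number of events.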
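The three measures can be computed directly from event counts. The following is a minimal sketch, assuming each event is stored as a set of "Variable=value" strings; the function names and the five toy events are hypothetical and are not taken from the SHRP 2 data.

```python
from typing import FrozenSet, List, Tuple

Event = FrozenSet[str]


def count(itemset: Event, events: List[Event]) -> int:
    """Number of events that contain every item in `itemset`."""
    return sum(1 for e in events if itemset <= e)


def rule_metrics(lhs: Event, rhs: Event, events: List[Event]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule lhs => rhs.

    Assumes lhs and rhs each occur in at least one event.
    """
    n = len(events)
    both = count(lhs | rhs, events)
    support = both / n
    confidence = both / count(lhs, events)
    lift = confidence / (count(rhs, events) / n)
    return support, confidence, lift


# Hypothetical toy events, for illustration only.
events = [
    frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"}),
    frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"}),
    frozenset({"Driver Behavior=Distracted", "Event=Normal"}),
    frozenset({"Driver Behavior=None", "Event=Normal"}),
    frozenset({"Driver Behavior=None", "Event=Normal"}),
]

support, confidence, lift = rule_metrics(
    frozenset({"Driver Behavior=Distracted"}),
    frozenset({"Event=Crash/near-Crash"}),
    events,
)
```

In this toy set the rule holds in 2 of 5 events and in 2 of the 3 distracted events, so the lift compares a confidence of 2/3 against a baseline crash/near-crash share of 2/5.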

Methodology

Description of Data

Data from all six sites were used (New York, Pennsylvania, Florida, Washington, North Carolina, and Indiana)

Baseline, crash, and near-crash events (23,710 events)

Event details table and driver demographics questionnaire (24 variables per event)

Methodology

Description of Data: variables recorded for each event

Age

Gender

Years of Driving

Annual Miles Traveled

Relation to Junction

Training

Event Duration

Intersection Influence

Secondary Task 1

Income

License Age

Education

State

Working Status

Vehicle

Front seat passengers

Occupied lane

Marital Status

Driver Behavior

Rear seat passengers

Locality

Secondary Task 2

Alignment

Grade
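Market basket analysis works on sets of items, so each event's variables have to be turned into "Variable=value" items first. The sketch below shows one way this could be done. The helper names, the sample record, and the 20-year bin width (chosen to reproduce the "Years Driving=[0,20)" label that appears in the rules) are assumptions, not the authors' preprocessing.

```python
from typing import Any, Dict, FrozenSet


def bin_years_driving(years: float) -> str:
    """Map years of driving to a half-open interval label (assumed 20-year bins)."""
    low = int(years // 20) * 20
    return f"[{low},{low + 20})"


def to_items(record: Dict[str, Any]) -> FrozenSet[str]:
    """Convert one event record into a set of 'Variable=value' items."""
    items = set()
    for variable, value in record.items():
        if value is None:
            continue  # skip variables with no recorded value
        if variable == "Years Driving":
            value = bin_years_driving(value)  # discretize the numeric variable
        items.add(f"{variable}={value}")
    return frozenset(items)


# Hypothetical record, for illustration only.
record = {"Age": "30-34", "Years Driving": 12, "Grade": "Level", "Income": None}
items = to_items(record)
```

Numeric variables need this kind of discretization because a rule can only refer to a finite set of categories.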

Methodology

Thresholds for extracting rules:

Rule support ≥ 3%

Confidence ≥ 75%

Lift > 1

Rule length ≤ 3
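The four thresholds can be applied with a brute-force search, sketched below. This is not the mining procedure used in the study: it assumes a single-item RHS and counts rule length as the number of LHS items plus the RHS item, and dedicated algorithms such as Apriori are far more efficient on large data sets. The toy events are hypothetical.

```python
from itertools import combinations

MIN_SUPPORT = 0.03     # rule support >= 3%
MIN_CONFIDENCE = 0.75  # confidence >= 75%
MIN_LIFT = 1.0         # lift > 1
MAX_LENGTH = 3         # LHS items + RHS item <= 3


def mine_rules(events):
    """Return (lhs, rhs, support, confidence, lift) for rules passing all thresholds."""
    n = len(events)
    counts = {}
    # Count every itemset of up to MAX_LENGTH items that occurs in some event.
    for event in events:
        for size in range(1, MAX_LENGTH + 1):
            for combo in combinations(sorted(event), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    rules = []
    for itemset, both in counts.items():
        if len(itemset) < 2 or both / n < MIN_SUPPORT:
            continue
        # Try each item of the itemset as the single-item RHS.
        for rhs_item in itemset:
            lhs = itemset - {rhs_item}
            rhs = frozenset({rhs_item})
            confidence = both / counts[lhs]
            lift = confidence / (counts[rhs] / n)
            if confidence >= MIN_CONFIDENCE and lift > MIN_LIFT:
                rules.append((lhs, rhs, both / n, confidence, lift))
    return rules


# Hypothetical toy events, for illustration only.
events = (
    [frozenset({"Driver Behavior=Distracted", "Event=Crash/near-Crash"})] * 3
    + [frozenset({"Driver Behavior=Distracted", "Event=Normal"})]
    + [frozenset({"Driver Behavior=None", "Event=Normal"})] * 6
)
rules = mine_rules(events)
```

The lift threshold is what discards rules whose RHS is simply common in the data: a rule with high confidence but lift near 1 says little beyond the baseline frequency of its RHS.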

Removing the redundant rules

A specific rule is considered redundant if it is equally or less predictive than a more general rule.
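One possible reading of this criterion is sketched below. It assumes that "more general" means the same RHS with a proper subset of the LHS, and that "predictive" is measured by confidence; both are assumptions, since the slides do not state the measure. The item names and confidence values are hypothetical.

```python
def remove_redundant(rules):
    """Drop rules that a more general rule predicts at least as well.

    `rules` is a list of (lhs, rhs, confidence) tuples with frozenset sides.
    """
    kept = []
    for lhs, rhs, confidence in rules:
        redundant = any(
            other_rhs == rhs
            and other_lhs < lhs              # proper subset: a more general LHS
            and other_confidence >= confidence
            for other_lhs, other_rhs, other_confidence in rules
        )
        if not redundant:
            kept.append((lhs, rhs, confidence))
    return kept


# Hypothetical rules, for illustration only.
rules = [
    (frozenset({"A"}), frozenset({"Y"}), 0.80),
    (frozenset({"A", "B"}), frozenset({"Y"}), 0.80),  # no gain over A => Y
    (frozenset({"A", "C"}), frozenset({"Y"}), 0.90),  # improves on A => Y
]
kept = remove_redundant(rules)
```

Here the rule {A, B} ⇒ Y is dropped because adding B does not raise confidence above that of A ⇒ Y, while {A, C} ⇒ Y is kept.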

Methodology

Non-redundant rules (4,754 rules)

Methodology

Useless / obvious rules

S = support, C = confidence, L = lift

LHS | RHS | S | C | L
SecondaryTask1=Passenger Interaction | Front Seat Passengers=2 | 6% | 99% | 3.4
Relation to Junction=Interchange | Locality=Interstate/Highway | 10% | 95% | 3.5
SecondaryTask1=None | SecondaryTask2=None | 47% | 100% | 1.2
Grade=Level, Working Status=Full-time | Alignment=straight | 29% | 88% | 1.1

Crash Association Rules

# | LHS | RHS | S | C | L
1 | Driver Behavior=Improper actions, Rear seat passengers=0 | Event=Crash/near-Crash | 5% | 75% | 4.2
2 | Driver Behavior=Improper actions, Grade=Level | Event=Crash/near-Crash | 5% | 76% | 4.2
3 | Driver Behavior=Distracted | Event=Crash/near-Crash | 5% | 79% | 4.4
4 | Driver Behavior=Distracted, Grade=Level | Event=Crash/near-Crash | 4% | 79% | 4.4
5 | Driver Behavior=Distracted, Alignment=straight | Event=Crash/near-Crash | 4% | 79% | 4.4
6 | Driver Behavior=Distracted, Front Seat Passengers=1 | Event=Crash/near-Crash | 4% | 79% | 4.5
7 | Driver Behavior=Distracted, Rear Seat Passengers=0 | Event=Crash/near-Crash | 4% | 80% | 4.5
8 | Driver Behavior=Distracted, Years Driving=[0,20) | Event=Crash/near-Crash | 3% | 80% | 4.5
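As a rough reading aid for the crash rules, the lift definition can be inverted to estimate the baseline share of crash/near-crash events in the data. This is a back-of-the-envelope sketch based on the rounded confidence and lift reported for rule 3; the resulting figure is not stated in the slides.

\[
\mathrm{SUPPORT}(Y) = \frac{\mathrm{CONFIDENCE}(X \Rightarrow Y)}{\mathrm{LIFT}(X \Rightarrow Y)} \approx \frac{0.79}{4.4} \approx 0.18
\]

Under this estimate, roughly 18% of all events are crash/near-crash events, and a lift of 4.4 means that an event with a distracted driver is about 4.4 times as likely to be a crash/near-crash as an arbitrary event.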

Driver Characteristics Rules

Socioeconomic Related Rules

# | LHS | RHS | S | C | L
1 | Age=30-34 | SecondaryTask2=None | 5% | 87% | 1.00
2 | Age=45-49 | SecondaryTask2=None | 4% | 88% | 1.01
3 | Age=50-54 | SecondaryTask2=None | 4% | 90% | 1.04
4 | Age=60-64 | SecondaryTask2=None | 4% | 90% | 1.03
5 | Age=65-69 | SecondaryTask2=None | 5% | 92% | 1.05
6 | Age=70-74 | SecondaryTask2=None | 4% | 94% | 1.08
7 | Age=75-79 | SecondaryTask2=None | 5% | 93% | 1.07
8 | Age=80-84 | SecondaryTask2=None | 4% | 93% | 1.07
9 | Age=35-39 | Event Type=Normal | 3% | 83% | 1.01
10 | Age=40-44 | Event Type=Normal | 3% | 86% | 1.04
11 | Age=45-49 | Event Type=Normal | 4% | 85% | 1.04
12 | Driver Behavior=Distracted | Rear Seat Passengers=0 | 5% | 93% | 1.01
13 | Age=55-59 | Event Type=Normal | 3% | 83% | 1.03
14 | Age=60-64 | Event Type=Normal | 3% | 82% | 1.03
15 | Age=65-69 | Event Type=Normal | 5% | 88% | 1.07
16 | Age=70-74 | Event Type=Normal | 4% | 90% | 1.10
17 | Age=75-79 | Event Type=Normal | 5% | 89% | 1.08
18 | Age=80-84 | Event Type=Normal | 3% | 82% | 1.00

Conclusions

When driving experience is less than 20 years, the driver is more likely to engage in cell phone texting, reading, or writing, and the likelihood of a crash/near-crash event increases if a driver in this group becomes distracted.

There is a strong association between the likelihood of a crash/near-crash event and each of the following:

A. Improper actions

B. Driver distracted by a secondary task

Conclusions

Normal driver behavior and normal/baseline events are associated with each of the following:

a) Driving locality is Interstate/Highway/Residential

b) Driver is not near any intersection

c) Driver is married

The results support applying MBA in safety research as a more reliable and accurate tool for analyzing naturalistic driving data, especially for a comprehensive database with high dimensionality (a large number of variables) and multicategorical variables, such as the SHRP 2 NDS data.