Machine Learning project: Identify a Car's Driver from Driving Behavior

Fan Yang, Chunjing Jia

December 12, 2013

1 Introduction

Each individual has a personal driving behavior, which could be used as an identifying characteristic, similar to handwriting. Under this hypothesis, we propose a learning study of the connection between a driver's identity and the vehicle's measured signals, such as acceleration, heading, and speed, which can usually be collected through the vehicle's electronic system or with additional sensors. The dataset includes real-time, high-frequency accelerometer, heading, speed, odometer, and gas usage readings. We first convert the time-dependent data into a large number of time-independent features, which can then be used to train vehicle-against-vehicle classifiers. We aim to obtain a reliable supervised learning algorithm for the case of a single driver driving the same car, as well as unsupervised clustering to detect when a vehicle has multiple drivers.

2 Data Collecting

The data were collected by MetroMile, Inc. and saved in CSV (comma-separated values) format, which can be viewed and manipulated in Microsoft Excel and Matlab. Each CSV file holds the data for one car, collected over a number of trips. Each trip is a continuous stretch of time, sampled every second or every few seconds. The recorded information includes the speed in mph, the heading of the car, the acceleration in three dimensions, and the instantaneous gas usage. See figure 1 for the first trip collected for car #133000249. A typical car has a few thousand trips; for example, car #133000249 has 2281 trips when any two consecutive data points separated by more than 60 seconds are treated as belonging to two different trips. This gives us a large amount of information with which to study the driving behavior of each driver. With the further assumption that driving behavior is unique to each person, we can identify a driver just by looking at the way he/she drives. We note that we assume each driver's driving behavior is independent of the car's make, model, and condition, just as the kind of pen used is ignored when people recognize a signature. The same data collection procedure was performed for 18 different cars. We know from the data provider that some of the cars are driven by a single driver, while others are driven by multiple people in a family. Table 1 lists the car models, the corresponding car numbers in the study, and the number of drivers.
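The 60-second rule for splitting a car's record into trips can be illustrated with a minimal sketch. The function name `split_into_trips`, the assumption that timestamps are given in seconds and sorted, and the sample values are all hypothetical and not taken from the MetroMile data.

```python
from typing import List, Sequence


def split_into_trips(timestamps: Sequence[float],
                     max_gap_s: float = 60.0) -> List[List[int]]:
    """Group sample indices into trips.

    A new trip starts whenever the gap to the previous sample is
    strictly greater than max_gap_s (60 seconds in the text).
    Timestamps are assumed to be in seconds and sorted ascending.
    """
    trips: List[List[int]] = []
    for i, t in enumerate(timestamps):
        if i == 0 or t - timestamps[i - 1] > max_gap_s:
            trips.append([])          # open a new trip
        trips[-1].append(i)           # add the sample to the current trip
    return trips


# Hypothetical timestamps (seconds) with two long gaps.
timestamps = [0, 1, 2, 4, 100, 101, 103, 300]
print(split_into_trips(timestamps))  # [[0, 1, 2, 3], [4, 5, 6], [7]]
```

Returning index lists rather than copies of the data lets the same grouping be applied to every recorded channel (speed, heading, acceleration, gas usage).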

[Figure 1 plot: three panels, each with a horizontal axis running from 0 to 800: heading (degrees); speed (mph) and gas (mpg); acceleration along x, y, and z (g).]

Figure 1: The information of the first trip/section that has been collected for car #133000249.

3 Feature selection

Extracting the key features from the large volume of data we have obtained is one of the central questions of this study. We treat each trip as one data point and extract a vector $x$ containing all the useful features that represent it. For car #133000249, for example, this yields 2281 data points. This provides a large enough data set for either regression in the single-driver case or multi-class classification in the multiple-driver case. Finding good and useful features turns out to be a hard problem, especially given the complexity of the collected data and of the problem itself. The features that we propose are:

(1) average speed in each section, $x_1$
(2) max speed in each section, $x_2$
(3) average speed on the ramp when entering the highway, $x_3$
(4) average speed on the ramp when leaving the highway, $x_4$
(5) frequency of lane changing, $x_5$
(6) speed at 1 second before stop, $x_6$
(7) speed at 2 seconds before stop, $x_7$
(8) speed at 3 seconds before stop, $x_8$
(9) speed at 1 second after start, $x_9$
(10) speed at 2 seconds after start, $x_{10}$
(11) speed at 3 seconds after start, $x_{11}$

The feature vector is $x = [x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}]^T$. We find that including these features neither oversimplifies the model nor makes it so complicated that it overfits.

Table 1: The list of car models, with the car number and the number of drivers, used for the data collection.

Car make and model                                Car number  Driver(s)
2005 Volkswagen GTI 2-door Hatchback, 4-cylinder  133000249   2
2004 Honda Pilot 6-cylinder, 4WD                  133000250   2
2012 Toyota Prius v 4-door Wagon, 4-cylinder      133000251   2
2003 Toyota Corolla 4-door Sedan, 4-cylinder      133000252   1
2011 Infiniti G37 4-door Sedan, 6-cylinder        133000253   Family of 3 drivers
2011 Mercedes-Benz GL450 8-cylinder, 4WD          133000254   same as 254
2008 Subaru Outback 4-door Wagon, 4-cylinder      133000257   Family of 3 drivers
2003 Honda Accord 4-door Sedan, 4-cylinder        133000258   same as 257
2005 Toyota Camry 4-door Sedan, 4-cylinder        133000259   1
2012 Subaru Impreza 4-door Wagon, 4-cylinder      133000261   1
2011 Volkswagen Jetta 4-door Sedan, 5-cylinder    133000263   1
2011 Nissan Versa 4-door Hatchback, 4-cylinder    133000265   2
2007 Acura MDX 6-cylinder, 4WD                    133000284   1
2000 Toyota Camry 4-door Sedan, 4-cylinder        133000374   1
2007 BMW 335 4-door Sedan, 6-cylinder             133000381   2
2001 BMW X5 8-cylinder, 4WD                       133000386   1
2006 Honda Civic 2-door Coupe, 4-cylinder         133000485   1
2003 Cadillac CTS 4-door Sedan, 6-cylinder        133000623   1
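A few of the features proposed in this section can be computed from a single trip's speed trace, as in the rough sketch below. It assumes a 1 Hz sampling rate, takes a "stop" to be the first zero-speed sample that follows motion and a "start" to be the first zero-speed sample that precedes motion, and uses a hypothetical function name `trip_features`. The text does not give the exact definitions, so these are illustrative choices only. The ramp and lane-changing features are omitted because they would need heading and acceleration rules that the text does not specify.

```python
from typing import Dict, List, Optional


def trip_features(speed_mph: List[float]) -> Dict[str, Optional[float]]:
    """Compute speed-based features for one non-empty trip (assumed 1 Hz)."""
    feats: Dict[str, Optional[float]] = {
        "avg_speed": sum(speed_mph) / len(speed_mph),   # x1
        "max_speed": max(speed_mph),                    # x2
    }
    n = len(speed_mph)
    # Assumed definition: first zero-speed sample preceded by motion.
    stop = next((i for i in range(1, n)
                 if speed_mph[i] == 0 and speed_mph[i - 1] > 0), None)
    # Assumed definition: first zero-speed sample followed by motion.
    start = next((i for i in range(n - 1)
                  if speed_mph[i] == 0 and speed_mph[i + 1] > 0), None)
    for k in (1, 2, 3):
        before = stop - k if stop is not None else -1
        feats[f"speed_{k}s_before_stop"] = (
            speed_mph[before] if before >= 0 else None)   # x6, x7, x8
        after = start + k if start is not None else n
        feats[f"speed_{k}s_after_start"] = (
            speed_mph[after] if after < n else None)      # x9, x10, x11
    return feats


# Hypothetical speed trace in mph, one sample per second.
trace = [0, 5, 12, 20, 25, 25, 18, 10, 4, 0]
for name, value in trip_features(trace).items():
    print(name, value)
```

Features that cannot be computed for a trip, for instance when the car never stops, are returned as `None` so that later fitting code can decide how to handle them.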

4 Supervised learning

We performed supervised learning for the cars with a single driver. The internal relations among the features can be modeled as:

$$
\begin{aligned}
x_2 &\sim N(a_1 x_1^2 + a_2 x_1 + a_3,\; a_4 x_1 + a_5), \\
x_3 &\sim N(a_6,\; a_7), \\
x_4 &\sim N(a_8,\; a_9), \\
x_5 &\sim N(a_{10},\; a_{11}), \\
x_6 &\sim N(a_{12} x_7 + a_{13},\; a_{14} x_7 + a_{15}), \\
x_7 &\sim N(a_{16} x_8 + a_{17},\; a_{18} x_8 + a_{19}), \\
x_9 &\sim N(a_{20} x_{10} + a_{21},\; a_{22} x_{10} + a_{23}), \\
x_{10} &\sim N(a_{24} x_{11} + a_{25},\; a_{26} x_{11} + a_{27}).
\end{aligned}
$$

For each single-driver car, we fit the features with the model above and obtain the parameter vector $a = [a_1, a_2, \ldots, a_{27}]^T$. The parameter vector $a$ can then be used as an identifier for the driver. A model fit of the features for car #133000259 is shown in figure 2.
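As an illustration of how one line of this model might be fitted, the sketch below estimates the four parameters of the relation between the speeds 1 second and 2 seconds before a stop. It is a two-stage least-squares heuristic, not necessarily the fitting procedure used in this study. It assumes that the second argument of $N$ is a variance, which the text does not state, and the helper names and sample values are hypothetical.

```python
from typing import List, Tuple


def linear_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_stop_pair(x7: List[float],
                  x6: List[float]) -> Tuple[float, float, float, float]:
    """Estimate (a12, a13, a14, a15) for x6 ~ N(a12*x7 + a13, a14*x7 + a15).

    Assumption: the second argument of N is a variance.
    Stage 1 fits the mean line; stage 2 regresses the squared residuals
    on x7 to get a linear model of the spread.
    """
    a12, a13 = linear_fit(x7, x6)
    sq_resid = [(y - (a12 * x + a13)) ** 2 for x, y in zip(x7, x6)]
    a14, a15 = linear_fit(x7, sq_resid)
    return a12, a13, a14, a15


# Hypothetical per-trip features in mph: speed 2 s and 1 s before a stop.
x7 = [6.0, 8.0, 10.0, 12.0, 14.0]
x6 = [2.5, 4.2, 4.8, 6.5, 6.9]
print(fit_stop_pair(x7, x6))
```

Repeating such fits for every relation in the model and stacking the results would give one parameter vector per car, which is the quantity the text proposes as a driver identifier.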

5 Unsupervised learning

For cars with multiple drivers, we use the k-means clustering algorithm to separate the different drivers. For example, for car #133000249, shown in figure 3, the frequency of lane changing (highlighted by the dotted circles) forms two clusters that can be used directly to separate the two drivers. This approach is very useful for separating drivers whose lane-changing frequencies differ markedly, but it may not work well when different drivers have similar lane-changing frequencies.
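The clustering step can be sketched with a small one-dimensional k-means (Lloyd's algorithm) applied to the lane-changing frequency of each trip. The function name `kmeans_1d`, the deterministic initialisation, and the sample values are assumptions made for illustration; the text does not describe the implementation that was used.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int = 2,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Cluster scalar values with Lloyd's algorithm.

    Centroids start evenly spaced over the data range (an assumed,
    deterministic initialisation), then assignment and update steps
    alternate until the centroids stop changing.
    """
    lo, hi = min(values), max(values)
    if k > 1:
        centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    else:
        centroids = [sum(values) / len(values)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each value.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: mean of each cluster (empty clusters keep their centroid).
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical lane-changing frequencies (# per 5000 s), one per trip.
lane_change_freq = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, labels = kmeans_1d(lane_change_freq, k=2)
print(centroids, labels)
```

With well-separated frequencies the two labels correspond to the two candidate drivers. When the frequencies overlap, the assignment becomes unreliable, which matches the limitation noted above.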

[Figure 2 plot. Recoverable axis and legend labels: max speed (mph) against average speed (mph); average (accel) speed on ramp (mph); average (decel) speed on ramp (mph); frequency of lane changing (#/5000 s); speed last second and speed next second against speed this second; legend entries 1 second before stop, 2 seconds before stop, 1 second after stop, 2 seconds after stop.]

Figure 2: Features and the model parameters for car #133000259 (1 driver).

[Figure 3 plot. Recoverable axis and legend labels: max speed (mph) against average speed (mph); average (accel) speed on ramp (mph); average (decel) speed on ramp (mph); frequency of lane changing (#/5000 s); speed last second; speed next second; legend entries 1 second before stop, 2 seconds before stop, 1 second after stop, 2 seconds after stop.]

Figure 3: Features and the model parameters for car #133000249 (2 drivers).
