Predicting Visual Distraction Using Driving Performance Data

Katja Kircher & Christer Ahlstrom
VTI (Swedish National Road and Transport Research Institute)

__________________________________

ABSTRACT – Behavioral variables are often used as performance indicators (PIs) of visual or internal distraction induced by secondary tasks. The objective of this study is to investigate whether visual distraction can be predicted by driving performance PIs in a naturalistic setting. Visual distraction is here defined by a gaze-based real-time distraction detection algorithm called AttenD. Seven drivers used an instrumented vehicle for one month each in a small-scale field operational test. For each of the visual distraction events detected by AttenD, seven PIs, such as steering wheel reversal rate and throttle hold, were calculated. Corresponding data were also calculated for time periods during which the drivers were classified as attentive. For each PI, the means of the distracted and attentive states were compared using t-tests for different time-window sizes (2–40 s), and the window width with the smallest resulting p-value was selected as optimal. Based on the optimized PIs, logistic regression was used to predict whether the drivers were attentive or distracted. The logistic regression resulted in predictions which were 76 % correct (sensitivity = 77 %, specificity = 76 %). The conclusion is that there is a relationship between behavioral variables and visual distraction, but the relationship is not strong enough to accurately predict visual driver distraction. Instead, behavioral PIs are probably best suited as complements to eye tracking based algorithms, making them more accurate and robust.
__________________________________

INTRODUCTION

Driver distraction is a widespread phenomenon and may contribute to numerous crashes [Gordon, 2009; Klauer, et al., 2006; Olson, et al., 2009]. It would, therefore, be both cost and safety beneficial if a distraction detection algorithm based on vehicle performance data available through the vehicle controller area network (CAN) could be developed.

Driver distraction can be defined as “the diversion of attention away from activities critical for safe driving toward a competing activity” [Lee, et al., 2009]. This is a very general definition where the diversion of attention can be visual, auditory, physical or cognitive and where the competing activity can be anything from mobile phone usage to daydreaming. Since this definition is impossible, or at least very difficult, to apply to large sets of naturalistic data where manual inspection is impractical and expensive, a more pragmatic and easily operationalized definition of driver distraction was used in this study. We were only interested in visual distraction, which is defined as not looking at what is classified to be relevant for driving. More specifically, a recently developed distraction warning system called AttenD [Kircher & Ahlstrom, 2009] was used to determine when a driver was considered to be attentive or distracted.
Distraction is often measured with eye movements [Donmez, et al., 2007; Kircher & Ahlstrom, 2009; Victor, et al., 2005; Zhang, et al., 2006], and even though the relationship between visual distraction and eye movements is self-evident, there are some general issues related to using eye movements as a predictor of driver distraction. Although the technical development of eye movement systems is occurring rapidly, it is still difficult to track a driver’s gaze accurately and precisely over a long period of time and with large coverage. Clothing, mascara, certain facial features, eye glasses, large head rotations and a number of other factors can impede tracking, and vibrations, sun glare and other situational factors can disturb the tracking equipment. So far, high-performance eye tracking systems typically used for research are still prohibitively expensive, and even one-camera systems, which have inferior performance, entail considerable costs. Last but not least, relying on a single sensor alone poses a risk because the sensor can fail, and then no backup data are available.

One solution to this problem would be to establish additional indicators of visual distraction that are based on sensors other than eye tracking, preferably sensors that are already present in the vehicles. Over the years, numerous PIs based on longitudinal and lateral vehicle control dynamics have been developed and refined in order to track effects of visual or internal distraction. Some of these PIs, such
as headway or lateral position metrics, require advanced hardware. However, other PIs only require measures that are readily available on the CAN. Examples include steering wheel reversal rate [MacDonald & Hoffman, 1980] and throttle hold rate [Zylstra, et al., 2003].

The objective of this study is to investigate whether visual distraction, as measured via eye movements, can be predicted by behavioral PIs in a naturalistic driving setting.

METHOD

This study is based on data acquired during an extended field study performed in 2008 [Kircher & Ahlstrom, 2009; Katja Kircher, et al., 2009]. All participants gave their informed consent, and the study was approved by the ethical committee for studies with human subjects in Linköping, Sweden.

The main requirement for participation was that the drivers were non-professional and regularly drove about 200 km per day, such as commuters. Further requirements were that the drivers should be at least 25 years of age and should have held a valid driving license for at least seven years. To ensure good eye tracking results, the participants should not wear eyeglasses or heavy mascara and should not have a beard. Seven drivers participated in the study, four men and three women. Their mean age was 42 years (sd = 10.9 years), and they had held their driving license for 25 years on average (sd = 10.9 years). One participant did not report his age.

The test car was instrumented with an autonomous data acquisition system which logged CAN data, GPS position and video films of the forward scene
and the vehicle cabin. Furthermore, the vehicle was equipped with the non-obtrusive remote eye tracker SmartEye Pro 4.0 (SmartEye AB, Gothenburg, Sweden). Two cameras observing the driver’s face were installed in the vehicle, one on the A-pillar and the other in the middle console. To ensure good eye tracking quality, the cameras were automatically recalibrated every hour based on the head model. The system logged eye movements and head movements with a frequency of 42 Hz.

Each participant drove a baseline phase lasting approximately ten days. During this time data were logged, but would-be warnings from the distraction warning system were inhibited. After the baseline phase the driver was informed that the vehicle was equipped with a distraction warning system. During the following three weeks each participant drove with AttenD activated. In both phases the participants used the car just as they would their own, without any restrictions regarding where, when or how they could drive.

Distraction Warning System

The distraction warning algorithm AttenD was developed at the Swedish National Road and Transport Research Institute in cooperation with Saab Automobile AB. A short description of AttenD is included here, but detailed information can be found in Kircher and Ahlstrom [2009]. An illustration of how AttenD operates is given in Figure 1. AttenD is based on a 3D model of the cockpit, dividing the car into different zones such as the windshield, the speedometer, the mirrors and the dashboard, and on the time the driver spends looking within these zones.
Figure 1. Example of a time trace illustrating the development of the attention buffer for three consecutive one-second glances away from the field relevant for driving (FRD), marked dark gray, with half-second glances back to the FRD in between. Note the 0.1 s latency period before the buffer increases again. A glance to the rear view mirror is exemplified between -1.8 s and 0 s; note the 1 s latency period before the buffer starts to decrease.
A time based attention buffer with a maximum value of two seconds is decremented over time when the driver looks away from the field relevant for driving (FRD), which is represented by the intersection between a circle of a visual angle of 90° and the vehicle windows, excluding the area of the mirrors. When the driver’s gaze resides inside the FRD, the buffer is incremented until the maximum value is reached. One-second latencies are implemented for the mirrors and the speedometer before the buffer starts decrementing. There is also a 0.1 second delay before increasing the buffer after a decrement phase. This latency is meant to reflect the adaptation phase of the eye and the mind to the new focusing distance and the driving scene. When no eye tracking but only head tracking is available, a somewhat simplified algorithm based on head direction takes over.
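As an illustration, the buffer logic described above can be sketched as follows. This is a simplified reading of the published algorithm, not its reference implementation; the zone names, the per-sample update structure and the handling of the latencies are assumptions made for the sketch.

```python
# Simplified sketch of the AttenD attention buffer update. Zone names and
# the per-sample update structure are illustrative assumptions, not the
# reference implementation.

BUFFER_MAX = 2.0       # buffer ceiling in seconds
REENTRY_LATENCY = 0.1  # s of re-adaptation before the buffer refills
GLANCE_LATENCY = 1.0   # s grace period for mirror/speedometer glances

def update_buffer(buffer, gaze_zone, time_in_zone, dt):
    """Advance the buffer by one sample of duration dt (seconds).

    gaze_zone    -- 'frd', 'mirror', 'speedometer' or 'other'
    time_in_zone -- how long the gaze has resided in the current zone (s)
    """
    if gaze_zone == 'frd':
        # Refill only after the 0.1 s re-adaptation latency has passed.
        if time_in_zone >= REENTRY_LATENCY:
            buffer = min(BUFFER_MAX, buffer + dt)
    elif gaze_zone in ('mirror', 'speedometer'):
        # Driving-related glance: hold the buffer for 1 s, then decrement.
        if time_in_zone >= GLANCE_LATENCY:
            buffer = max(0.0, buffer - dt)
    else:
        # Glance away from the field relevant for driving: decrement.
        buffer = max(0.0, buffer - dt)
    return buffer
```

With this scheme a full buffer is exhausted by two seconds of continuous off-road glancing, while mirror glances shorter than one second leave the buffer untouched.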
When the buffer reaches zero the driver is considered to be distracted, and when further conditions are met (e.g. direction indicators not activated, speed above 50 km/h, no brake activation, no extreme steering maneuvers), a warning is issued. When the buffer value is between 1.8 s and 2.0 s, the driver is considered to be attentive. For the AttenD algorithm, raw gaze data were used instead of fixations in order to reduce the computational complexity.

CAN Based Performance Indicators

Seven different PIs, which have been introduced in the literature as potential indicators of driver distraction, were included in this study. Each PI was calculated based on time series data from a particular time window, see Figure 2. The size of the window is different for different PIs. In addition to the seven PIs, the speed measured at the end of each window, called point speed, was included for control purposes, since several of the other PIs depend on the point speed [Green, et al., 2007].

Figure 2. Visualisation of the different window sizes (SD speed, point speed, SWRR, SD SWA, max. reversal, steering entropy, high frequency steering, throttle hold rate) in relation to a distraction event as defined by the AttenD algorithm. The PIs marked in grey are included in the logistic regression.

The variability in speed is an indicator of longitudinal driving behavior. Here it is represented as the standard deviation of speed, calculated in the current window (measured in km/h).

Throttle hold rate is another indicator of longitudinal driving behavior, which is based on the observation that speed adaptation is diminished when the driver is distracted [Zylstra, et al., 2003]. Throttle hold rate is based on the position of the accelerator pedal (measured in per cent) according to Green et al. [2007]. If the throttle position deviated by less than 0.2 % within a one-second wide sliding window, a throttle hold was marked at this particular window location. The throttle hold rate is then determined as the percentage of throttle hold instances within the current PI window. Note that two different windows are used to calculate the throttle hold rate: a shorter window which determines whether there are pedal movements, and a longer window which determines the percentage of throttle holds.

Increased steering wheel activity has been linked to visually and cognitively distracting tasks, and a variety of steering wheel metrics have been proposed for quantifying this activity. They range from simple indicators such as the standard deviation of steering wheel angle [Liu & Lee, 2006] to more advanced metrics such as steering wheel reversal rate, high frequency steering and steering entropy.

Steering wheel reversal rate (SWRR) measures the number of steering wheel reversals per minute. The steering wheel angle signal is filtered with a 2nd order Butterworth low pass filter (cut-off frequency = 0.6 Hz) before reversals that are larger than 1° are extracted according to Östlund et al. [2005]. Max steering wheel reversal is calculated alongside SWRR and is defined as the largest steering wheel reversal, measured in degrees, within the current window.

High frequency steering is sensitive to variations in both primary and secondary task load [McLean & Hoffmann, 1971], and is calculated as the total power in the frequency band 0.35–0.6 Hz. The power spectral density of the steering wheel signal is estimated with the fast Fourier transform, and the power in the considered frequency band is calculated by numerical integration.

Steering entropy represents the predictability of steering wheel movements [Boer, et al., 2005]. The steering wheel signal is downsampled to 4 Hz and the residuals from a one-step prediction based on autoregressive modeling are used to determine the entropy. During visual distraction, corrective actions result in a less predictable steering profile and consequently a higher entropy.

Data Processing

All data preparation and processing were carried out using MATLAB 7.9, except for the logistic regression and the evaluation, which were carried out using SPSS 17.0. The driving behavior based PIs were calculated for each distraction event, i.e. when AttenD gave a warning (treatment) or would have given a warning (baseline), and for each attentive event, that is, when the attention buffer exceeded 1.8 seconds across the complete duration of the time window in question.
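The reversal extraction behind SWRR can be sketched as below. The sketch assumes the steering wheel angle signal has already been low-pass filtered as described above, and the min/max tracking scheme is one common way to implement gap-based reversal counting, not necessarily the exact procedure of Östlund et al.

```python
# Sketch of gap-based steering wheel reversal counting (illustrative; assumes
# the angle signal was already Butterworth low-pass filtered as in the text).

def reversal_rate(angles, duration_s, gap_deg=1.0):
    """Return steering wheel reversals per minute.

    angles     -- filtered steering wheel angle samples (degrees)
    duration_s -- length of the analysis window (seconds)
    """
    reversals = 0
    extreme = angles[0]  # running extreme in the current direction
    direction = 0        # +1 increasing, -1 decreasing, 0 undecided
    for a in angles[1:]:
        if direction == 0:
            if abs(a - extreme) >= gap_deg:
                direction = 1 if a > extreme else -1
                extreme = a
        elif direction == 1:
            if a > extreme:
                extreme = a          # still turning in the same direction
            elif extreme - a >= gap_deg:
                reversals += 1       # turned back by more than the gap size
                direction, extreme = -1, a
        else:
            if a < extreme:
                extreme = a
            elif a - extreme >= gap_deg:
                reversals += 1
                direction, extreme = 1, a
    return reversals * 60.0 / duration_s
```

Angle excursions smaller than the 1° gap size are ignored, so the metric counts only deliberate corrective reversals rather than filter ripple.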
In cases where several warnings occurred consecutively, and where the buffer did not reach its maximum value in between, only the first distraction event was used for the analyses. Similarly, only one attentive event was allowed between two occurrences in which the buffer value dropped below 1.8 s, and the PI window was centered in this interval. Data windows representing attentive events were not allowed to overlap with the windows resulting from distraction events. Also, the location of the largest attentive windows governed the location of the shorter windows, so that the number of windows was the same regardless of window size.
Several of the PIs that were used can be computed with different parameters, which can influence their outcome. Typical examples include cut-off frequencies for different filters, the gap size in SWRR and the choice of model order in steering entropy. In this study, parameter values suggested in the literature were used for all parameters except window size. To find an optimal window size, the PIs were calculated using sizes in the range 2–40 seconds, and the size giving the largest difference between attentive and distracted was selected. T-tests were used to analyze this difference, and the optimization was carried out using data from the baseline dataset.

A logistic regression, based exclusively on the driving performance related PIs, was calculated. Based on the resulting values, each event was classified as either “distracted” or “attentive”. The PIs were entered stepwise (floating forward selection), with an inclusion criterion of a p-value below 0.05 and an exclusion criterion of a p-value above 0.10. The classification threshold was set to 0.5, and the maximum number of iterations was set to 20. The baseline dataset was used to determine the coefficients in the logistic regression equation. By doing so it was ensured that the training set was not influenced by any distraction warnings. The treatment dataset, produced by the same participants at a different point in time, was used to validate the coefficients. The possibility of splitting the baseline dataset into two subsets for (i) determination and (ii) evaluation of the coefficients was rejected, in order to avoid the risk of too few data points in each set for some of the participants.
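The classification step can be sketched as follows: the fitted model maps the selected PIs to a probability of distraction, which is thresholded at 0.5. The coefficient values in the usage example are hypothetical placeholders, not the fitted coefficients of the study.

```python
import math

# Sketch of the logistic classification step. The coefficients in the example
# below are hypothetical placeholders, not the fitted values from the study.

def predict_distracted(pis, coefficients, intercept, threshold=0.5):
    """pis and coefficients are dicts keyed by PI name."""
    z = intercept + sum(coefficients[k] * v for k, v in pis.items())
    p = 1.0 / (1.0 + math.exp(-z))   # logistic function
    return p > threshold, p

# Hypothetical example with two of the selected PIs.
is_distracted, p = predict_distracted(
    {'steering_entropy': 0.8, 'swrr': 12.0},
    {'steering_entropy': 2.0, 'swrr': 0.05},
    intercept=-2.0)
```

Sweeping the threshold away from 0.5 trades sensitivity against specificity, which is exactly what the ROC analysis below visualizes.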
The sensitivity and specificity of the classifier were evaluated using contingency tables, indicating the percentage of correctly and incorrectly classified events, and receiver operating characteristic (ROC) curves, indicating how the selection of the classification threshold would influence the percentage of correct classifications.

RESULTS

The seven participants provided 1644 attention windows and 1808 distraction windows during the baseline phase, according to the described procedure and specification. The corresponding figures for the treatment phase were 1931 attention windows and 1881 distraction windows. The number of attention and distraction windows differed widely between participants, both in the baseline phase and in the treatment phase (see Table 1).
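For reference, the contingency table metrics used in this study (sensitivity, specificity and the percentage of correct classifications) can be computed as in the following generic sketch:

```python
# Generic sketch of the contingency table evaluation. Labels: True means
# "distracted" (the positive class), False means "attentive".

def evaluate(truth, predicted):
    tp = sum(t and p for t, p in zip(truth, predicted))          # hits
    tn = sum(not t and not p for t, p in zip(truth, predicted))  # correct rejections
    fp = sum(not t and p for t, p in zip(truth, predicted))      # false alarms
    fn = sum(t and not p for t, p in zip(truth, predicted))      # misses
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(truth)
    return sensitivity, specificity, accuracy
```

Re-evaluating these quantities while the classification threshold is swept from 0 to 1 yields the points of the ROC curve.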
Table 1. Number of attention and distraction cases and the mileage driven.

            |            Baseline dataset             |            Treatment dataset
Participant | Attention  Distraction  Quota  Mileage  | Attention  Distraction  Quota  Mileage
            | cases      cases               (km)     | cases      cases               (km)
1           | 219        182          1.20   1047     | 470        241          1.95   1987
2           | 277        277          1.00   3138     | 318        348          0.91   2299
3           | 38         89           0.43   708      | 90         203          0.44   833
4           | 210        58           3.62   2137     | 461        158          2.92   4450
5           | 764        1044         0.73   4290     | 346        684          0.52   3117
6           | 72         131          0.55   890      | 104        152          0.68   1679
7           | 64         25           2.56   710      | 142        95           1.48   3169
Total       | 1644       1808         0.91   12920    | 1931       1881         1.03   17534
For each window size (2–40 seconds in 1 second intervals) and each PI, means were compared between the attentive and the distracted states using t-tests. For all PIs, a number of window sizes existed that yielded significant differences in the mean value between attentive and distracted. See Figure 3 and Figure 4 for an example plot of the standard deviation of the steering wheel angle. Even though the means do differ significantly for several window sizes (Figure 3), it is notable that the standard deviations for the two driver states overlap markedly (Figure 4). The window sizes, means, standard deviations and p-values for the optimal windows for each PI are presented in Table 2.

The logistic regression on the 3452 baseline cases showed that steering entropy provided the highest explanation of variance, followed by SWRR, standard deviation of steering wheel angle and point speed (see Table 3). The variance explained did not change much when new variables were entered, and no variables were removed in the process. The resulting logistic regression coefficients are presented in Equation 1. Setting the threshold to 0.5 resulted in a sensitivity of 0.76, a specificity of 0.79 and 77.5 % correct classifications.
Table 2. Mean and standard deviation for each PI, where the selected window size is chosen based on the minimum p-value from a t-test between the groups distracted and attentive. The results were obtained from the baseline dataset.

PI              | Win size (s) | Mean attentive | Mean distracted | p-value
SD Speed (km/h) | 6            | 1.2±1.1        | 1.6±12.0        |