Screening Naturalistic Driving Study Data for Safety-Critical Events

Kun-Feng Wu and Paul P. Jovanis

This study responds to the need to screen events observed during naturalistic driving studies to derive a set of crashes and near crashes with common etiologies; these crashes are referred to as “well-defined surrogate events.” Two factors are critical to the identification of these well-defined surrogate events: selection of screening criteria and designation of a time window to be used for the event search. Testing conducted by using an algorithm developed in a previous study is described. The algorithm allows for the use of a range of search criteria to identify events with common etiology from unrefined naturalistic driving data. A range of kinematic search criteria was used to screen events, including lateral and longitudinal accelerations averaged over different time windows and characterized by average as well as maximum values during a time window. The testing was conducted with data from road departure events collected during the completed 100-car naturalistic driving study. Fifty-one nonintersection and 12 intersection-related run-off-road events were included in the testing. Different sets of events were identified with different search criteria and different time windows. Diagnostic tools borrowed from medicine identified the best screening criteria and time windows. The methods allowed for enhanced identification of well-defined surrogates by using covariates such as driver attributes, context, and driver fatigue. The research illustrates a flexible procedure that uses a variety of statistical methods shown to effectively screen crashes and near crashes.
K.-F. Wu, Turner–Fairbank Highway Research Center, FHWA, U.S. Department of Transportation, 6300 Georgetown Pike, McLean, VA 22101. P. P. Jovanis, Civil and Environmental Engineering, Pennsylvania Transportation Institute, Pennsylvania State University, 212 Sackett Building, University Park, PA 16802-1408. Corresponding author: K.-F. Wu, [email protected].

Transportation Research Record: Journal of the Transportation Research Board, No. 2386, Transportation Research Board of the National Academies, Washington, D.C., 2013, pp. 137–146. DOI: 10.3141/2386-16

Naturalistic driving is a research technique that electronically observes how people drive by using a variety of cameras and sensors such as vehicle accelerometers, gyroscopes, and the Global Positioning System. The advantage of the technique is that it yields observations of driving, including crash and near-crash events, which can be used in a range of safety studies. The technique has been applied in the United States in intelligent transportation system technology assessments for many years (1–4). It has also been applied to safety studies of the general driving public (5) as well as to studies of specific driver populations such as truck drivers (6), young drivers (7), and motorcyclists (8). A common attribute of the technique is the need to identify what some have called “safety-critical events” (e.g., crashes and near crashes) by using a combination of kinematic measures from accelerometers, gyroscopes, and radar along with visual verification and confirmation of event occurrence with video data from any of multiple cameras located onboard the vehicle. Clearly, the longer the duration of the study and the larger the number of subjects, the larger the database of video and kinematic data to be searched for the relatively rare safety-critical events of interest. Table 1 illustrates an example of the search criteria used in a recent study; many kinematic triggers may be used (e.g., longitudinal or lateral acceleration, yaw rate) with a range of triggering values (e.g., longitudinal deceleration greater than 0.6 g; lateral acceleration greater than 0.7 g) (5). Table 1 also indicates that some kinematic triggers may be used in combination (e.g., time to collision obtained from onboard radar and longitudinal deceleration). Experience with the data sets produced from naturalistic studies consistently points to the need to screen the data set with kinematic measures and then verify event occurrence by using video.

BACKGROUND

The need for a generalizable framework and the issues that can arise from naturalistic data can be observed by using two examples from the recent U.S. 100-car study (5). Figure 1 shows two cases for two different events; several kinematic signatures are recorded for each. Data are collected at 10 Hz, so the time scale of −300 to +100 in Figure 1a depicts 400 samples, or 40 s of data. Recorded for each event are the lateral and longitudinal acceleration (shown by a dashed line with three dots and a solid line, respectively) along with speed in miles per hour. Figure 1a is a curb strike that occurs while the driver (who appears to be drowsy on the video) is placing an item in the glove box. The event shown in Figure 1b is one in which the driver looks in the rearview mirror while the vehicle runs off the road to the right.

Considering Case 1, it is clear that the lateral acceleration identifies the event as occurring just after the zero point on the graph. Longitudinal acceleration changes very little, either before or after the event. Speed first increases gradually from 55 to 60 mph during Time −300 to −200, remains constant for about 5 s (−200 to −150), and then gradually drops until it reaches approximately 42 mph at Time 100. For this event, the lateral acceleration would clearly be a good triggering criterion, and use of longitudinal acceleration or speed would miss the event entirely.

Case 2 is more complex. Speed varies during the event, first increasing (up to Time −200) and then decreasing, increasing again from about −70 to zero, and then dropping before increasing for the final time from about Time 80 to 180. The lateral acceleration shows a series of peaks and troughs, with the first between −300 and −200, followed by a steep drop at −100 at a rate of −0.4 g for less than a second. There is an increase and some oscillation between zero and 0.2 g until about Time +50, where there is a rapid acceleration of 0.4 g to the right followed by an acceleration of about 0.3 g to the left (negative direction). Longitudinal acceleration indicates acceleration at a decreasing rate until about Time −170, when the acceleration drops to zero at about −110. There is some oscillation until about Time +50, where there is a rapid and sharp deceleration followed by a gradual acceleration. From Figure 1, one can identify three separate lateral acceleration changes during this 50-s period, which may or may not be part of the same event.

FIGURE 1 Two cases illustrating vehicle kinematics: (a) simple crash event and (b) more complex event. [Lateral acceleration (g, positive to the left), longitudinal acceleration (g), and speed (mph) are plotted against measurement duration.]

These two cases serve to graphically illustrate several important points:

1. Use of different kinematic search criteria can reveal conflicting outcomes, which may be resolved only by considering context [see the paper by Jovanis et al. (9) for the importance of context].
2. There is a need to explore the basic time scale or window of measurement. A window as small as 1/10 of a second (the resolution of the accelerometers in this case) may have identified only one event in Case 1 but could have identified as many as three events in Case 2.

These important points can be addressed by using an iterative process to systematically screen the data set, identify the potential safety-critical events, and validate the events for similar etiology. The existing studies have laid the foundation for using naturalistic data, but a more generalizable screening framework is needed for a broader range of naturalistic studies. This need is particularly crucial when future studies are conducted that compare the United States with other countries.
TABLE 1 Summary of Kinematic Search Criteria for Events (5)

Trigger Type: Description
• Lateral acceleration: Lateral acceleration ≥ 0.7 g.
• Longitudinal acceleration: Acceleration or deceleration ≥ 0.6 g; or acceleration or deceleration ≥ 0.5 g and forward TTC ≤ 4 s; or 0.4 g ≤ longitudinal deceleration < 0.5 g, forward TTC ≤ 4 s, and forward range at the minimum TTC ≤ 100 ft.
• Event button: Activated by the driver by pressing a button located on the dashboard when an event occurred that he or she deemed critical.
• Forward time to collision: Acceleration or deceleration ≥ 0.5 g and forward TTC ≤ 4 s; or 0.4 g ≤ longitudinal deceleration < 0.5 g, forward TTC ≤ 4 s, and forward range at the minimum TTC ≤ 100 ft.
• Rear time to collision: Rear TTC ≤ 2 s, rear range ≤ 50 ft, and absolute acceleration of the following vehicle > 0.3 g.
• Yaw rate: Any value greater than or equal to a plus AND minus 4-degree change in heading (i.e., the vehicle must return to the same general direction of travel) within a 3-s window of time.

Note: TTC = time to collision.
EVENT SCREENING AND VERIFICATION OF EVENT ETIOLOGY

Event Screening

Screening methods include the identification of sudden evasive maneuvers that reflect changes in vehicle kinematic variables such as longitudinal or lateral acceleration (5). Some researchers have used physics-based metrics reflecting the ongoing collision process, such as a short time to collision (3) or the maximum additional time that a following vehicle could have waited to brake and still avoid a rear-end crash (referred to as lagged time) (3, 10, 11). Lagged time was used to determine event severity in these last three studies, not to screen events of interest. The events and conflict events identified by Dingus et al. (5), Battelle (3), and Fitch et al. (10) are analogous to events of interest and surrogate events in studies by Wu and Jovanis (12, 13) and Wu (14). Table 1 is an example of the search criteria used to identify events for the Virginia Tech Transportation Institute (VTTI) 100-car study (5). In studies by Battelle (3) and Fitch et al. (10), events of interest are events in which a deceleration greater than 0.25 g is required to avoid a collision with a lead vehicle within 1.5 s, and the trigger must persist for at least 0.7 s. These studies then filter out nonthreatening triggered events by using video review to confirm the presence of a rear-end conflict.
The algorithms implemented in the field operational tests by researchers at the University of Michigan Transportation Research Institute can also be considered another set of search criteria. A portion of these algorithms involves operation of proprietary equipment, so some of the details of their rationale are rather vague (2, 4). Unfortunately, the foregoing studies provide little discussion of the selection of the screening criteria, the weaknesses and strengths of alternative criteria, and the implications for event identification when screening criteria are modified.

Once events of interest are detected, it is critical to determine the duration of those events. VTTI researchers determined event duration through the use of forward and face video (5). The events began at the onset of the precipitating factors and ended after the evasive maneuvers. This determination is costly and resource intensive because it requires a researcher to view every frame of the driving video. Relying solely on forward and face video also raises questions about reliability among raters, and any change in study goals requires additional video reduction and additional cost.
Verification of Event Etiology

Systematic verification of event etiology has received only cursory treatment in past naturalistic driving data analyses. Typically the data sets are screened only once (3–5, 10, 13). If one thinks about the use of naturalistic driving event data in safety analyses, this one-time screening seems a weak choice. The rationale for surrogate events is that they can be used to assess the expected number of crashes and their contributing factors (10, 12–21). Searching for different crash types or assessing different countermeasures for effectiveness may lead to different search criteria and different sets of crash or near-crash events.

Assessing the similarity among surrogate events is challenging. Consider an event of interest with T time frames recorded and N variables, including kinematic variables, event attributes, driving environment, and driver attributes; the basic set of variables is dimensioned T by N. As an example, for two events of interest with an exact event duration of 30 s recorded every 0.1 s, T is 300; meanwhile, suppose that “only” 100 variables are recorded (e.g., vehicle speed, lane position), so N is 100. Hence, there are 300 × 100 = 30,000 dimensions for evaluating the similarity between these two events of interest. Even if these 30,000 dimensions were similar at the beginning of the events, changes in predictors over time would make an exact match between two events virtually impossible. There is a clear need to reduce the dimensionality of this problem. Therefore, one can argue that the quality of a set of surrogate events may be best assessed by the similarity of the events and their ability to predict crash occurrence. Clearly, employing a search method that is ad hoc or makes minimal use of context has the potential weakness of yielding an inconsistent number of events, possibly with different etiologies from the crashes they are being compared with. Although assessing similarity among surrogate events is difficult, the prediction accuracy for crash occurrence is relatively easy to compare by using safety statistical models that have practical interpretations.
Proposed Method for Classifying and Verifying Event Etiology

This paper describes additional testing of an algorithm previously described and preliminarily tested (13, 14). Figure 2 shows the algorithm, which seeks to identify surrogate events in naturalistic driving data. In the study by Wu and Jovanis (13), a multistage procedure is described:

1. The first screening stage detects safety-relevant events, referred to as events of interest, from unrefined naturalistic driving data;
2. The classification stage separates events of interest from the larger number of possible events identified in the initial screening and is the first assessment of a common etiology applied to the naturalistic driving events; this classification reduces the number of events passed to the next stage in the process;
3. Because the number of events has changed from those obtained in the first screening, a second screening stage is needed to further refine the set of similar events; and
4. Finally, there is a verification of event etiology by using statistical models to quantitatively test for surrogate event similarity.

The feedback loop shown in Figure 2 has not been described in previous papers except for a dissertation (14). Feedback is described in this paper, with specific attention paid to the use of alternative screening criteria and to searching for events with time windows of varying durations.

FIGURE 2 Analytical procedure for analysis and validation of surrogate events (13) (ROC = receiver operating characteristic). The procedure moves from unrefined naturalistic driving data recorded by kinematic sensors through a first screening (select events of interest from raw data), classification (identify a refined set of events by using a counterpart to the Chow test to identify initially similar events), a second screening (determine specific conditions for surrogate events by searching different time-varying variables and thresholds with survival analysis and ROC curves), and verification of etiology (verify the common etiologies between candidate surrogate events and crash outcomes), ending with verified surrogate events used to estimate conditional crash probabilities with an event-based model. At each stage events may be set aside as no longer of interest, and a feedback loop allows the parameters that identify events of initial interest to be varied.
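To make the flow of Figure 2 concrete, the sketch below outlines the multistage procedure as a simple driver loop. It is an illustrative skeleton only, assuming the analyst supplies the stage functions; the callables (first_pass, classify, second_pass, verify) and the params dictionary are hypothetical placeholders, not code from the original algorithm.

```python
def screen_surrogates(trips, first_pass, classify, second_pass, verify,
                      params, max_iter=3):
    """Skeleton of the multistage procedure in Figure 2.

    All callables and the `params` dict are analyst-supplied placeholders.
    Events dropped at any stage are treated as no longer of interest.
    """
    verified = []
    for _ in range(max_iter):
        candidates = first_pass(trips, params)      # first screening of raw kinematic data
        refined = classify(candidates)              # keep events with a similar generating process
        surrogates = second_pass(refined, params)   # refine time windows and thresholds
        verified, ok, params = verify(surrogates, params)  # test common etiology; adjust criteria
        if ok:
            return verified                         # verified surrogate events
    return verified                                 # best set found within the iteration budget
```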
OBJECTIVE OF RESEARCH

This research seeks to systematically study the screening of naturalistic driving event data, with the goal of producing a method that is reproducible across many sites and that stresses the use of statistics as screening tools; the use of video is minimized where possible. The tests of the method verify its efficacy by using video collected and analyzed in the original data set (5). A user of the method with new data should not have to undertake these comparisons, or should need them only to a limited degree.
METHODOLOGY

The first screening seeks to detect possible events of interest from unrefined naturalistic driving data by using information collected by the data acquisition system. One way to think about the screening of crash and near-crash events is by comparison with medical diagnosis. The result of a diagnostic test can be classified as a true positive (TP), a true negative, a false positive, or a false negative. At this stage it is desirable to correctly diagnose as many crash and near-crash events (TPs) as possible while correctly excluding non-safety-related events (true negatives). The screening is a function of the variable used for screening (typically kinematic), the threshold selected for inclusion as an event of interest, and the duration of time (i.e., the time window) used for the screening. At the first screening, it is desirable to cast a broad net for possible events of interest; subsequent steps in the algorithm of Figure 2 refine the search.

The test threshold determines the number of TPs, true negatives, false positives, and false negatives. One way to examine trade-offs among the four outcomes is with the receiver operating characteristic (ROC) curve, which can be conceptualized as determining the optimal diagnostic point (13, 22–24). First, a threshold c is defined for a marker Z: the test is positive if Z ≥ c and negative if Z < c. The marker is the variable used to identify the event of interest in the first screening. Let the corresponding true and false positive rates at the threshold c be TPR(c) and FPR(c), as shown in Equations 1 and 2, respectively:

TPR(c) = true positive rate(c) = P(Z ≥ c | Y = 1)   (1)

FPR(c) = false positive rate(c) = P(Z ≥ c | Y = 0)   (2)

As the threshold c increases, both the false positive rate and the TP rate decrease. Generally, the thresholds of the criteria should be set to include a high proportion of events of interest (i.e., high sensitivity). The desired goal is to achieve an acceptable sensitivity (correctly detecting an event of interest), say at least 90%, at the best specificity (minimum false alarm rate).
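As a minimal illustration of this first-screening logic, the sketch below computes TPR(c) and FPR(c) from Equations 1 and 2 over a grid of candidate cutoffs and keeps the cutoff with the best specificity among those that meet a 90% sensitivity target. The marker values, labels, and cutoff grid are hypothetical; only NumPy is assumed.

```python
import numpy as np

def pick_threshold(marker, is_event, candidate_cutoffs, min_sensitivity=0.90):
    """Return (cutoff, TPR, FPR) with the lowest FPR among cutoffs whose
    sensitivity meets the target; None if no cutoff qualifies."""
    marker = np.asarray(marker, dtype=float)
    is_event = np.asarray(is_event, dtype=bool)
    best = None
    for c in candidate_cutoffs:
        flagged = marker >= c                 # screening test fires at this cutoff
        tpr = flagged[is_event].mean()        # Equation 1: P(Z >= c | Y = 1)
        fpr = flagged[~is_event].mean()       # Equation 2: P(Z >= c | Y = 0)
        if tpr >= min_sensitivity and (best is None or fpr < best[2]):
            best = (c, tpr, fpr)
    return best

# Hypothetical marker values (g) for 60 verified events and 500 normal windows
rng = np.random.default_rng(0)
marker = np.concatenate([rng.normal(0.6, 0.2, 60), rng.normal(0.2, 0.1, 500)])
labels = np.concatenate([np.ones(60, dtype=bool), np.zeros(500, dtype=bool)])
print(pick_threshold(marker, labels, np.arange(0.0, 1.05, 0.1)))
```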
To select markers that can achieve the best TPR and FPR, performance can be compared and tested by quantifying the true positive fraction (TPF, sensitivity) and false positive fraction (FPF, 1 minus specificity):
TPF(Z) = P(Y1 = 1 | Y2 = 1, Z)   (3)

FPF(Z) = P(Y1 = 1 | Y2 = 0, Z)   (4)
where Y1 = 1 represents an event satisfying a preset screening criterion, Z, and Y2 = 1 represents the event ending in an event of interest. A generalized linear model for binary outcomes can be used to develop separate models for TPF and FPF (23):

gY2(TPF(ZY2)) = βY2 XY2   (5)

gY1(FPF(ZY1)) = βY1 XY1   (6)
Popular choices for the link function are the logit link, g(t) = log(t/(1 − t)), and the log link, g(t) = log(t), which is easier to interpret than the logit link.

The performance of TPF and FPF can also be used to explore the incremental value of a test for prediction. It is not viable to rely on a single measure to detect an event of interest; hence it is necessary to understand how to enhance the screening results. Convenient frameworks exist for evaluating the incremental predictive value of a test beyond the information contained in one source (23). Suppose that the result of a simple test, denoted YA, is available and that another test, YB, can be performed in addition. One can quantify the additional information provided by the second test as P(Y1 = 1 | YA = 1, YB) compared with P(Y1 = 1 | YA = 1). One can also compare P(Y1 = 1 | YA = 0, YB) with P(Y1 = 1 | YA = 0). By comparing P(Y1 = 1 | YA = 1, YB) with P(Y1 = 1 | YA = 1), one can determine, among events testing positive with YA, whether YB provides additional predictive information. This procedure is used in the development of the TPF and FPF relationships in this research.

Medical researchers (25, 26) use covariates associated with disease to improve the discriminatory accuracy of the marker (i.e., the ROC curve). As a hypothetical application to the problem here, if every male driver depresses the brake pedal harder than female drivers (meaning that gender is associated with the marker for deceleration force), the brake pedal force may better discriminate events of interest for female drivers than for male drivers. More generally, consider M as an attribute that inherently affects the discriminatory accuracy of the marker. In this example, Z is the braking force, and M is gender. Suppose that larger Z is associated with greater crash risk; further, M = 0 and M = 1 indicate female and male drivers, respectively. An example of distributions of braking force for female and male drivers can be seen in Figure 3a. The different braking forces for male and female drivers suggest different patterns of distribution.
FIGURE 3 Hypothetical example reproduced from work by Janes et al. (26): (a) distributions of braking forces for (left) female and (right) male drivers and (b) ROC curves for the hypothetical example.

When one selects a threshold (Z = 1 in this example, the reference line) without taking the gender difference into consideration, the events of interest will include more female-normal events than male-normal events, which automatically induces selection bias. More generally, the same marker may have different discriminatory performance for different groups of drivers. As shown in Figure 3b, given the same threshold, this marker performs differently for the two groups of drivers. The greater the area under the ROC curve, the better the discriminatory accuracy of the marker. Figure 3b suggests that the marker would perform better for female drivers than for male drivers. Methodologically this finding is important because gender can now be used directly as part of the screening process.

Formally, ROC regression methods can be used to test and handle this situation, in which covariates affect the separation between cases (events of interest) and controls (events not of interest). The methodology models a marker’s ROC curve as a function of covariates (24, 26). Implementation proceeds in two steps: (a) model the distribution of the marker among controls as a function of covariates and calculate the case percentile values, and (b) model the cumulative density function of the ROC curve as a function of covariates (25). The ROC curve can therefore be modeled parametrically by using

ROCZ(f) = F{α0 + α1F^(−1)(f) + α2M}   (7)
where F = the standard normal cumulative distribution function, f = the set of discrete FPR points, and α0, α1, α2 = estimated parameters for the ROC regression. If α2 is positive, an increase in the covariate M enhances the accuracy of the marker. Such a model is estimated in the data analysis section, where the use of fatigue as the predictor M is illustrated.

DATA

The VTTI 100-car naturalistic driving study data set is used for empirical testing (5); it includes 241 primary and secondary drivers and 12 to 13 months of data collection for each vehicle. A data acquisition system consisting of cameras for video recording, kinematic sensors, radar, lane-tracking devices, and a hard drive for data storage was installed in each vehicle. These naturalistic data were collected under two important conditions:

1. Vehicles were instrumented with video camera technologies that observe the driver and the road ahead of the vehicle continuously during driving but with minimal driver awareness. In addition to the video, other onboard sensors continuously recorded vehicle accelerations in three dimensions as well as rotational motion along the same axes. Radar was often present to record proximity to other vehicles and potential obstacles on the roadway or roadside.
2. Drivers were asked to drive as they normally would (i.e., without specific experimental or operational protocols and not in a simulator or on a test track).

The period of observation for most drivers in the study was approximately 1 year. The database used in this study started with the identification of events by VTTI researchers. Once the triggering events were found in the data, VTTI researchers saved kinematic data from 30 s before to 10 s after the onset of the precipitating event. These data were obtained
from the VTTI website (http://forums.vtti.vt.edu/index.php?/index) and combined with data obtained during a recently completed SHRP 2 safety study (27). On the basis of the event criteria in Table 1, VTTI researchers identified 69 crashes and 761 near crashes of all types that occurred during the 100-car study. To refine the scope of the investigation, road departure events were selected as a focus of this research, building on the results from previous research by this team (9, 27, 28). The focus on road departure events resulted in an initial sample size of 13 crashes and 38 near crashes as defined in the original VTTI study. However, the availability of kinematic data 30 s before each event and 10 s afterward provides the capability to reanalyze the data and define different numbers of events when the data are searched with different time windows. This concept was illustrated in the discussion of the event shown in Figure 1. In that example a different number of events could be defined depending on the time window used for event screening. As a result, the number of events defined as an outcome of each data screening may vary somewhat, but all the observations come from the initial set of 13 crashes and 38 near crashes.

Six specific measures for road departure event screening are tested in this study (including the definition of the duration of the time window used in the analysis):

• Maximum lateral acceleration difference within a 1-s window, Lat10D;
• Maximum lateral acceleration difference within a 3-s window, Lat30D;
• Maximum instantaneous lateral acceleration, Lat01M;
• Maximum lateral acceleration within a 1-s window, Lat10M;
• Maximum lateral acceleration within a 3-s window, Lat30M; and
• Maximum yaw rate difference within a 3-s window, Yaw30D.

The time scale is referenced as 10 for 1-s and 30 for 3-s time windows, to remind the reader that kinematic measures in the data are recorded 10 times per second.
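The six screening measures listed above can be computed directly from the 10-Hz kinematic traces. The sketch below shows one plausible implementation using trailing windows of 10 samples (1 s) and 30 samples (3 s); the function names and the simulated traces are illustrative assumptions, since the original extraction code is not published with the paper.

```python
import numpy as np

def rolling_max_diff(x, window):
    """Max minus min of x within each trailing window of `window` samples."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        seg = x[i - window + 1 : i + 1]
        out[i] = seg.max() - seg.min()
    return out

def rolling_max_abs(x, window):
    """Maximum absolute value of x within each trailing window."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        out[i] = np.abs(x[i - window + 1 : i + 1]).max()
    return out

# Hypothetical 10-Hz traces (g for lateral acceleration, degrees/s for yaw rate)
lat_acc = np.random.default_rng(1).normal(0.0, 0.1, 400)
yaw_rate = np.random.default_rng(2).normal(0.0, 1.0, 400)

lat10d = rolling_max_diff(lat_acc, 10)   # 1-s window (10 samples at 10 Hz)
lat30d = rolling_max_diff(lat_acc, 30)   # 3-s window
lat01m = np.abs(lat_acc)                 # instantaneous magnitude
lat10m = rolling_max_abs(lat_acc, 10)
lat30m = rolling_max_abs(lat_acc, 30)
yaw30d = rolling_max_diff(yaw_rate, 30)
```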
DATA ANALYSIS

The goal of the data analysis stage is to detect as many real events of interest as possible (maximum sensitivity) without obtaining too many false alarms (acceptable specificity). The ROC curve was applied to determine which measure is best at detecting events of interest and to select a threshold for each measure. The greater the ROC area, the better a measure performs in correctly discriminating events of interest from the unrefined naturalistic driving data. As shown in Figure 4, in terms of the ROC area, Lat10D (maximum lateral acceleration difference within a 1-s window) was found to be the best measure for detecting events of interest, followed by Lat01M, Lat10M, Lat30D, Lat30M, and then Yaw30D. Lat10D was significantly better than Lat01M, Lat01M was significantly better than Lat10M, Lat10M was significantly better than Lat30D, and Lat30D was significantly better than Lat30M. These results illustrate that the choice of time window has an effect on the performance of the screening: both the measure used (i.e., maximum lateral acceleration versus maximum lateral acceleration difference) and the duration of the window influence performance, and the 1-s window is generally preferred for these data. Use of Lat10D to detect road departure events during the first pass through the algorithm is therefore recommended.
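A comparison of ROC areas like the one summarized in Figure 4 can be reproduced in outline with standard tools. The sketch below assumes scikit-learn is available and uses hypothetical labeled data; a real analysis would substitute the verified event labels and the observed values of each candidate trigger.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=300)          # 1 = verified event of interest
noise = rng.normal(0.0, 0.1, size=300)

# Hypothetical marker values: stronger separation stands in for a better trigger
measures = {
    "Lat10D": 0.45 * labels + 0.15 + noise,    # well-separated marker
    "Lat30M": 0.15 * labels + 0.15 + noise,    # weakly separated marker
}

# Rank candidate measures by area under the ROC curve, as in Figure 4
ranked = sorted(((name, roc_auc_score(labels, z)) for name, z in measures.items()),
                key=lambda kv: kv[1], reverse=True)
for name, auc in ranked:
    print(f"{name}: ROC area = {auc:.3f}")
```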
FIGURE 4 ROC curves at first screening. ROC areas: Lat10D = 0.948, Lat01M = 0.8953, Lat10M = 0.8573, Lat30D = 0.7512, Lat30M = 0.7096, Yaw30D = 0.6549.
A threshold for each measure was chosen in terms of sensitivity and specificity, as shown in Table 2: values greater than 0.4 g for Lat10D, greater than 0.3 g for Lat01M, greater than 0.3 g for Lat10M, greater than 0.4 g for Lat30D, greater than 0.3 g for Lat30M, and greater than 4 degrees/s for Yaw30D.

It is well known that crash risk varies with driver attributes, event attributes, and context. It seems reasonable that the performance of a screening criterion should vary similarly. Therefore, it is important to understand the influence of such factors in identifying the best screening criteria. ROC regression was applied to assess how the presence of driver fatigue affects the overall ROC area. As shown in Table 3, the presence of driver fatigue improves the ability to correctly discriminate events of interest from normal driving for Lat10D and Lat30M. The model is formulated as shown in Equation 7, with a dichotomous indicator of fatigue used as the covariate M. Use of driver fatigue enhanced the ability to correctly detect events of interest; that is, it increased the TPF.
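For illustration, the covariate-adjusted ROC curve of Equation 7 can be evaluated with the Table 3 estimates for Lat10D (α0 = 2.098, α1 = 0.893, and α2 = 5.226 for the fatigue indicator M). The sketch below only evaluates the fitted curve rather than refitting the ROC regression; SciPy is assumed for the normal distribution functions.

```python
import numpy as np
from scipy.stats import norm

def roc_curve_eq7(fpr_grid, a0, a1, a2, m):
    """Covariate-adjusted ROC curve of Equation 7:
    ROC(f) = Phi(a0 + a1 * Phi^{-1}(f) + a2 * M)."""
    fpr_grid = np.asarray(fpr_grid, dtype=float)
    return norm.cdf(a0 + a1 * norm.ppf(fpr_grid) + a2 * m)

# Table 3 estimates for Lat10D; M = 1 indicates a fatigued driver
a0, a1, a2 = 2.098, 0.893, 5.226
fpr = np.linspace(0.01, 0.99, 99)
tpr_no_fatigue = roc_curve_eq7(fpr, a0, a1, a2, m=0)
tpr_fatigue = roc_curve_eq7(fpr, a0, a1, a2, m=1)
print(f"TPR at 10% FPR: no fatigue {tpr_no_fatigue[9]:.3f}, fatigue {tpr_fatigue[9]:.3f}")
```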
TABLE 2 Sensitivity and Specificity for Each Measure of Lateral Acceleration

Cutoff Point (g)   Lat10D          Lat01M          Lat10M          Lat30D          Lat30M
≥0.0               100.00/0.00     100.00/0.00     100.00/0.00     100.00/0.00     100.00/0.00
≥0.1               100.00/8.00     100.00/18.40    100.00/9.60     100.00/2.40     100.00/4.80
≥0.2               100.00/44.80    96.83/55.20     96.83/42.40     100.00/13.60    98.41/22.40
≥0.3               95.24/75.20     90.48/69.60     90.48/60.80     96.83/28.00     92.06/33.60
≥0.4               92.06/89.60     68.25/91.20     69.84/84.80     93.65/41.60     71.43/60.80
≥0.5               77.78/93.60     44.44/96.00     44.44/94.40     84.13/56.80     46.03/77.60
≥0.6               55.56/97.60     38.10/99.20     38.10/97.60     63.49/72.80     38.10/84.00
≥0.7               44.44/98.40     26.98/100.00    26.98/99.20     49.21/80.00     26.98/89.60
≥0.8               36.51/98.40     12.70/100.00    12.70/99.20     41.27/83.20     12.70/95.20
≥0.9               33.33/99.20     7.94/100.00     7.94/99.20      36.51/84.80     7.94/96.80
≥1.0               23.81/99.20     4.76/100.00     4.76/100.00     25.40/90.40     4.76/98.40
>1.0               0.00/100.00     0.00/100.00     0.00/100.00     0.00/100.00     0.00/100.00

Note: Values given are percentages; each cell shows sensitivity/specificity.
TABLE 3 Factors Affecting ROC Performance

Variable                  Lat10D            Lat30D            Lat01M            Lat10M            Lat30M            Yaw30D
α0 in Equation 7          2.098 (1.363)     1.109*** (0.284)  1.751*** (0.334)  1.385*** (0.263)  0.719*** (0.221)  0.867*** (0.233)
α1 in Equation 7          0.893 (0.967)     1.435*** (0.242)  1.317*** (0.244)  1.197*** (0.204)  1.315*** (0.180)  1.477*** (0.182)
M, presence of fatigue    5.226** (2.162)   0.314 (0.282)     1.850 (1.957)     1.341 (0.873)     0.693*** (0.261)  −0.104 (0.282)
Number of observations    188               188               188               188               188               188

Note: Standard error given in parentheses. *p < .1; **p < .05; ***p < .01.
Figure 5 shows that different screening criteria essentially lead to different sets of events. To demonstrate how varying the definition of event duration affects analysis results, the simplest case is considered, in which the duration of an event of interest begins at the onset of a certain measure exceeding a predetermined threshold and ends when the same measure falls below that threshold. As shown in Figure 5, the line at the top indicates the original trip covering the before, during, and after periods; two events of interest were detected when a threshold of Lat10D greater than 0.4 g was applied during the event. One lasted for 0.3 s, within the 30-s before period; the other lasted for about 1 s around the event end. The reference line indicates the time point at which the event ended according to the identification by VTTI researchers. Similarly, three events of interest were detected when a threshold of Lat10M greater than 0.3 g was applied, and so on. The implications of this definition are that (a) the comparison across events is standardized and (b) these events are kinematically similar.

Seventy-six events of interest satisfying the threshold of Lat10D greater than 0.4 g during an event were carried to the second stage; 203 events satisfied Lat01M greater than 0.3 g; 101 satisfied Lat10M greater than 0.3 g; 99 satisfied Lat30D greater than 0.4 g; 90 satisfied Lat30M greater than 0.3 g; and 188 events of interest satisfying Yaw30D greater than 4 degrees/s were carried to the classification and second screening stage. The use of different triggers and time windows thus clearly results in different sample sizes and different events for subsequent analysis using the method in Figure 2.
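Under the simplest-case duration definition described above, events of interest are contiguous runs in which a measure stays at or above its threshold. The sketch below extracts such runs from a 10-Hz trace; the synthetic Lat10D series is hypothetical and is shaped only to mimic the two detected events (about 0.3 s and about 1 s) mentioned in the text.

```python
import numpy as np

def threshold_events(series, threshold, hz=10):
    """Return (start_s, end_s, duration_s) for each contiguous run in which the
    measure meets or exceeds the threshold."""
    above = np.asarray(series) >= threshold
    # Rising and falling edges of the boolean run-length structure
    edges = np.diff(above.astype(int), prepend=0, append=0)
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]          # exclusive end index
    return [(s / hz, e / hz, (e - s) / hz) for s, e in zip(starts, ends)]

# Hypothetical Lat10D trace sampled at 10 Hz, screened with the 0.4-g cutoff
lat10d = np.concatenate([np.full(50, 0.1), np.full(3, 0.5), np.full(100, 0.1),
                         np.full(10, 0.6), np.full(37, 0.1)])
print(threshold_events(lat10d, 0.4))   # two events: ~0.3 s and ~1.0 s long
```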
FIGURE 5 Detected events of interest after first screening for each measure. The figure plots, against measurement duration (every 100 units = 10 s), the original trip and the segments during which each measure exceeded its selected threshold (Lat10D, Lat01M, Lat10M, Lat30M, Yaw30D, and Lat30D).
Although several large-scale naturalistic studies are under way, knowledge of the appropriate use of the data to enhance traffic safety is still limited. To demonstrate, suppose that the only screening measure in this study was Yaw30D greater than 4 degrees/s (Yaw30D04). As shown in Table 4, the predictive value of Yaw30D04 alone is 35% [exp(−1.072)]. If events that are detected by Yaw30D04 are also tested with Lat30D04 (Lat30D greater than 0.4 g), a positive result increases the probability that the event is a true event of interest by a multiplicative factor of 1.18 to 1.43 [exp(0.173) and exp(0.36), respectively]. A negative Lat30D04 test decreases the probability by a multiplicative factor of 0.10 to 0.56. Thus Lat30D04 provides statistically significant predictive information when combined with Yaw30D04.

TABLE 4 Incremental Value of Lat30D on Yaw30D

Variable        Coeff.    SE      z         P > z    95% CI
Lat30D04 = 0    −1.493    0.462   −3.230    0.001    −2.398, −0.588
Lat30D04 = 1     0.267    0.048    5.600    0.000     0.173, 0.360
Constant        −1.072    0.102   −10.460   0.000    −1.273, −0.871

Note: Coeff. = coefficient; SE = standard error; CI = confidence interval.
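The Table 4 comparison amounts to asking how P(event of interest | Yaw30D04 positive) changes once the Lat30D04 result is known. An empirical analogue can be sketched as below with hypothetical per-event indicator data; it is not the generalized linear model actually fitted in the paper, only a simple illustration of the incremental-value idea.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 188
is_event = rng.random(n) < 0.4                       # verified event of interest
yaw30d04 = is_event | (rng.random(n) < 0.5)          # Yaw30D > 4 deg/s trigger fires
lat30d04 = is_event & (rng.random(n) < 0.8)          # Lat30D > 0.4 g trigger fires

def p_event(mask):
    """Empirical P(event of interest | mask)."""
    return is_event[mask].mean()

base = p_event(yaw30d04)                              # value of Yaw30D04 alone
with_pos = p_event(yaw30d04 & lat30d04)               # Yaw30D04 positive, Lat30D04 positive
with_neg = p_event(yaw30d04 & ~lat30d04)              # Yaw30D04 positive, Lat30D04 negative
print(f"P(event | Yaw30D04) = {base:.2f}")
print(f"multiplicative change, Lat30D04 positive: {with_pos / base:.2f}")
print(f"multiplicative change, Lat30D04 negative: {with_neg / base:.2f}")
```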
SUMMARY AND DISCUSSION

Although several large-scale naturalistic driving studies are under way, knowledge of the appropriate use of the data to enhance traffic safety is still limited. For every different research question that seeks answers in naturalistic driving data analysis, it is likely that the event identification would need to be redone; this step would add significant cost to a study. There is a need to develop and test a more generalizable screening framework that can be applied consistently to data from naturalistic driving studies exploring similar hypotheses in different locations. This research seeks to systematically study the screening of naturalistic driving event data with the goal of producing a method that is reproducible in a range of applications and that stresses the use of statistics as screening tools.

This study explored and discussed issues in the screening and analysis of naturalistic driving data. Among the methodological findings are the following:

• Different screening criteria lead to different sets of events; the proposed method, which uses ROC curves, readily accommodates a comparison of different triggers.
• First screening criteria should include covariates such as event attributes, context, and driver attributes. Even with this limited sample of events, the use of ROC regression demonstrated improved event detection and the flexibility to support a wide range of analysis goals.
• The proposed analysis framework supports the quantitative use of multiple triggers within the same statistical model; this method is a substantial advance over methods relying on one trigger and visual observation alone.

Empirical recommendations are provided with the caution associated with a limited sample size. Nevertheless, it appears that screening with a 1-s time window using the maximum lateral acceleration difference within that window (Lat10D) provided the best initial screening of events. Analyses with ROC curves indicated a clear order of preference for screening: Lat10D greater than 0.4 g, Lat01M greater than 0.3 g, Lat10M greater than 0.3 g, Lat30D greater than 0.4 g, Lat30M greater than 0.3 g, and Yaw30D greater than 4 degrees/s were found to be suitable as first screening criteria. Specifically related to ROC regression, consideration of driver fatigue increased the performance of the Lat10D and Lat30M screening criteria. Combining search criteria also proved beneficial: Lat30D04 added predictive information when Yaw30D04 was used for screening.

The goal of this research is to develop an algorithm that makes the application of video review more efficient. Although video review is a requirement for this study (in order to obtain some of the modeling variables),
the hope is that common criteria would evolve as events of the same type are searched for. For example, once the algorithm is tested on a sample of road departure crashes from the SHRP 2 naturalistic driving study, a more refined set of search criteria for well-defined surrogates may be at hand. These criteria could then be applied with more limited verification by video. In this way the method would reduce the time and cost of video review and increase the likelihood of identifying statistically similar crashes and near crashes.

Future research should focus on continued application of these screening concepts to other data sets. In addition, there is a need to complete the entire analysis structure contained in Figure 2 for additional data sets and to follow the feedback loop completely through all steps. One missing step is the comparison of crashes and comparable near crashes (i.e., events with comparable etiology). This comparison was the original goal of the Figure 2 framework, but the application of these concepts has been limited (12–14). In particular, the need to statistically verify similarity among events is, in the view of the authors, an important yet not well-studied task.
REFERENCES

1. Volvo Trucks Field Operational Test: Evaluation of Advanced Safety Systems for Heavy Truck Tractors. Volvo, Washington, D.C., 2005.
2. Sayer, J. R., M. L. Mefford, K. Shirkey, and L. Lantz. Driver Distraction: Naturalistic Observation of Secondary Behaviors with the Use of Driver Assistance Systems. In Driver Assessment 2005: Third International Driving Symposium on Human Factors in Driver Assessment, Training, and Vehicle Design, University of Iowa, Iowa City, 2005.
3. Evaluation of the Volvo Intelligent Vehicle Initiative Field Operational Test Version 1.3. Battelle, Columbus, Ohio, 2007.
4. Leblanc, D., J. Sayer, C. Winkler, R. Ervin, S. Bogard, J. Devonshire, M. Mefford, M. Hagan, Z. Bareket, R. Goodsell, and T. Gordon. Road Departure Crash Warning System Field Operational Test: Methodology and Results. University of Michigan Transportation Research Institute, Ann Arbor, 2006.
5. Dingus, T. A., S. G. Klauer, V. L. Neale, A. Petersen, S. E. Lee, J. Sudweeks, M. A. Perez, J. Hankey, D. Ramsey, S. Gupta, C. Bucher, Z. R. Doerzaph, J. Jermeland, and R. R. Knipling. The 100-Car Naturalistic Driving Study, Phase II—Results of the 100-Car Field Experiment. Report DOT HS 810 593. NHTSA, U.S. Department of Transportation, 2005.
6. FMCSA’s Naturalistic Truck Driving Research Program. FMCSA, U.S. Department of Transportation, 2008. http://www.fmcsa.dot.gov/factsresearch/media/webinar-08-06-23-slides.pdf.
7. 40–Teen Naturalistic Driving Study. Virginia Tech Transportation Institute, Blacksburg, Va., 2012. http://www.vtti.vt.edu/casr-research/Naturalistic_Teenage_Driving_Study.php.
8. 100-Motorcyclist Naturalistic Study. Motorcycle Safety Foundation, New York, 2012. http://wiki.fot-net.eu/index.php?title=100-Motorcyclist_Naturalistic_study.
9. Jovanis, P. P., J. Aguero-Valverde, K.-F. Wu, and V. Shankar. Analysis of Naturalistic Driving Event Data: Omitted Variable Bias and Multilevel Modeling Approaches. In Transportation Research Record: Journal of
the Transportation Research Board, No. 2236, Transportation Research Board of the National Academies, Washington, D.C., 2011, pp. 49–57.
10. Fitch, G. M., H. A. Rakha, M. Arafeh, M. Blanco, S. A. Gupta, R. P. Zimmermann, and R. J. Hanowski. Safety Benefit Evaluation of a Forward Collision Warning System. Report DOT HS 810 910. U.S. Department of Transportation, 2008.
11. Martin, P. G., and A. L. Burgett. Rear-End Collision Events: Characterization of Impending Crashes. Proc., First Human-Centered Transportation Simulation Conference, University of Iowa, Iowa City, 2001.
12. Wu, K., and P. P. Jovanis. The Relationships of Crashes and Crash-Surrogate Events in Naturalistic Driving. Accident Analysis and Prevention, Vol. 45, 2012, pp. 507–516.
13. Wu, K., and P. P. Jovanis. Defining, Screening, and Validating Crash Surrogate Events Using Naturalistic Driving Data. Accident Analysis and Prevention, Vol. 45, 2012, pp. 507–516.
14. Wu, K. Defining, Screening, and Testing Crash Surrogates Using Naturalistic Driving Data. Doctoral dissertation. Department of Civil and Environmental Engineering, Pennsylvania State University, 2011.
15. Hauer, E. Traffic Conflicts and Exposure. Accident Analysis and Prevention, Vol. 14, No. 5, 1982, pp. 359–364.
16. Hydén, C. The Development of a Method for Traffic Safety Evaluation: The Swedish Traffic Conflicts Technique. Department of Traffic Planning and Engineering, Lund University, Lund, Sweden, 1987.
17. Chin, H. C., and S. T. Quek. Measurement of Traffic Conflicts. Safety Science, Vol. 26, No. 3, 1997, pp. 169–185.
18. Archer, J. Methods for the Assessment and Prediction of Traffic Safety at Urban Intersections and Their Application in Micro-simulation Modeling. Thesis. Royal Institute of Technology, Stockholm, Sweden, 2004.
19. Tarko, A., G. Davis, N. Saunier, T. Sayed, and S. Washington. Surrogate Measures of Safety. White Paper. Presented at 88th Annual Meeting of the Transportation Research Board, Washington, D.C., 2009.
20. McGehee, D. V., L. N. Boyle, S. J. Hallmark, D. Lee, D. M. Neyens, and N. J. Ward. SHRP 2 S02 Integration of Analysis Methods and Development of Analysis Plan—Phase II Report. SHRP 2, Transportation Research Board of the National Academies, Washington, D.C., 2010.
21. Datta, T. K. Accident Surrogates for Use in Analyzing Highway Safety Hazards. Proc., Second International Traffic Conflict Technique Workshop, Transport Research Laboratory, Crowthorne, Berkshire, England, 1979, pp. 4–20.
22. Hanley, J. A., and B. J. McNeil. The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve. Radiology, Vol. 143, 1982, pp. 29–36.
23. Pepe, M. S. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, New York, 2003.
24. Peat, J., and B. Barton. A Guide to Data Analysis and Critical Appraisal. Blackwell Publishing, Oxford, United Kingdom, 2005.
25. Janes, H., and M. S. Pepe. Adjusting for Covariate Effects on Classification Accuracy Using the Covariate-Adjusted Receiver Operating Characteristic Curve. Biometrika, Vol. 96, No. 2, 2009, pp. 371–382.
26. Janes, H., G. Longton, and M. S. Pepe. Accommodating Covariates in Receiver Operating Characteristic Analysis. The Stata Journal, Vol. 9, No. 1, 2009, pp. 17–39.
27. Jovanis, P. P., V. Shankar, J. Aguero-Valverde, K. Wu, and A. Greenstein. Analysis of Existing Data: Prospective Views on Methodological Paradigms. SHRP 2, Transportation Research Board of the National Academies, Washington, D.C., 2010.
28. Shankar, V. N., P. P. Jovanis, J. Aguero-Valverde, and F. Gross. Analysis of Naturalistic Driving Data: Prospective View on Methodological Paradigms. In Transportation Research Record: Journal of the Transportation Research Board, No. 2061, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 1–8.

The Safety Data, Analysis, and Evaluation Committee peer-reviewed this paper.