Department of Psychiatry and Behavioral Neurosciences
Wayne State University School of Medicine

To: The Honorable David L. Strickland, Administrator
National Highway Traffic Safety Administration
1200 New Jersey Avenue, SE
Washington, DC 20590

From: Richard A. Young, Ph.D.,1 Research Professor

Re: Comment on: Visual-Manual National Highway Traffic Safety Administration Driver Distraction Guidelines: In-Vehicle Electronic Devices (Docket No. NHTSA-2010-0053-0009)

Date: May 18, 2012

1 Preface on Avoiding Bias

Driver distraction has been a subject of national debate for a long time. Many people (both the lay public and experts in the field) hold strong views on the subject, leading to possible confirmation bias, meaning "that information is searched for, interpreted, and remembered in such a way that it systematically impedes the possibility that the hypothesis could be rejected – that is, it fosters the immunity of the hypothesis."2 (chapter 4) Confirmation bias does not mean that deceptive strategies are used or that facts are intentionally manipulated. Rather, it means that biased forms of information processing take place unintentionally or subconsciously. That is, people are not consciously aware that they are biased in this way, that this bias is influencing their opinions, or even that it shapes what information they selectively attend to and process. Many other forms of bias exist; Pohl2 lists dozens. Hindsight bias2 (chapter 20) plays a major role in categorizing crashes as caused by driver distraction, particularly when only single cases are examined rather than the hundreds or thousands required in an unbiased scientific investigation. A whimsical illustration of bias is given in the Appendix to this comment.

It might be hoped that an unbiased examination of the scientific evidence concerning distracted driving would circumvent the errors in judgment and cognitive illusions that arise from such

1 Disclaimer: The opinions expressed in this comment and its attachments are my personal views, and do not necessarily reflect the views of my employer, Wayne State University. This document and its attachments were written on my personal time without financial support from any company, government agency, organization, or individual.
2 Pohl, R. (2004). Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory. New York, NY: Psychology Press.


biases. Unfortunately, these same biases have affected not just the general public and professionals with non-scientific backgrounds, but also many of the scientific experts in driver distraction; hence, these biases are unfortunately also evident in the scientific literature. To further advance the science of driver distraction, it would be beneficial, when encountering new data and information, to keep an open mind and carefully examine the data on their own merits, rather than engaging pre-existing belief systems that may cause one to reject or shy away from ideas or data that do not readily agree with one's pre-existing beliefs. I have attempted to write in such an unbiased spirit (as much as one person can) in this comment and its attachments concerning NHTSA's proposed "Visual-Manual Driver Distraction Guidelines for In-Vehicle Electronic Devices" (hereafter, NHTSA Distraction Guidelines).

2 NHTSA Distraction Guidelines

On February 24, 2012, NHTSA3 published the NHTSA Distraction Guidelines.4,5 These draft guidelines have a limited scope of application: only information, navigation, and entertainment systems with visual-manual interfaces that are integrated into vehicles as original equipment. The draft NHTSA Distraction Guidelines apply neither to speech interfaces nor to portable electronic devices that can be carried into a vehicle.

A "docket" has been opened on the web to collect public comments.6 Many items have been posted, including transcripts from three public hearings and a public technical workshop; comments from private citizens, corporations, and public organizations; and several technical reports sponsored by NHTSA intended to support the draft NHTSA Distraction Guidelines.

The best that can be done in this brief comment is to cover the highlights – the good and the bad of the proposed NHTSA Distraction Guidelines from a high-level point of view, and in as unbiased a manner as possible. A limited amount of technical documentation for these comments is provided in the attachments to this comment.

3 I refer in this document and its attachments simply to "NHTSA" rather than "The NHTSA" or "The Agency" as is common in some technical documents. The standalone term "NHTSA" is now in common usage – see http://en.wikipedia.org/wiki/National_Highway_Traffic_Safety_Administration.
4 NHTSA, "Visual-Manual NHTSA Driver Distraction Guidelines for In-Vehicle Electronic Devices," Docket No. NHTSA-2010-0053, 77 Federal Register, 2012, https://federalregister.gov/a/2012-6266.
5 The PDF of the official Federal Register posting of the draft Guidelines can be downloaded directly from http://www.gpo.gov/fdsys/pkg/FR-2012-02-24/pdf/2012-4017.pdf.
6 http://www.regulations.gov/#!docketDetail;rpp=100;so=DESC;sb=docId;po=0;D=NHTSA-2010-0053.


3 Good Things about the NHTSA Distraction Guidelines

3.1 A Level Playing Field for Automakers and Suppliers

NHTSA has made a good first effort at setting driver distraction guidelines for visual-manual tasks. It is beneficial to have common guidelines for all automakers and their suppliers so that there is a "level playing field." Not all automakers selling vehicles in North America signed the "letter of commitment" to the Alliance Guidelines,7 which means that those that did not sign are able to sell vehicles that do not necessarily adhere to the Alliance Guidelines, which has the potential to compromise public driving safety. NHTSA is therefore to be commended for attempting to develop a common set of guidelines to help ensure that all automakers who sell vehicles in the U.S. "come on board" with built-in electronic devices that meet the NHTSA Distraction Guidelines.

3.2 Attempts Needed Updates to Alliance Guidelines

The Alliance Guidelines were not intended to be a document "frozen" for all time after their publication in 2006.7 As new advances are made in the area of driver behavior and performance, the document is intended to be updated and adapted to incorporate relevant new knowledge and data:

"The Guidelines are 'a work in progress' and will continue to be refined as resources and scientific support become available. There is extensive ongoing relevant research in the area of driver distraction and workload management and as new information becomes available the document will need to be reviewed for possible updating." (Alliance, 2006, p. 6)

It has now been six years since the publication of the Alliance Guidelines, and it is widely agreed by driver distraction experts that they are in need of updating, given the many additional research findings in driver distraction since 2006. NHTSA has taken this opportunity to provide such an update, for which it is to be commended for at least making the attempt, even if one disagrees with technical aspects of the update. Other positive aspects of the NHTSA Distraction Guidelines are documented in Attachment 1.

4 Bad Things about the NHTSA Distraction Guidelines

4.1 No Empirical Validation of Tests

There are serious concerns about the dearth of empirical validation that NHTSA provides for the test procedures put forward in support of the proposed Distraction Guidelines. Indeed, no

7 Alliance of Automobile Manufacturers Driver Focus Telematics Working Group, "Statement of Principles, Criteria and Verification Procedures on Driver Interactions with Advanced In-Vehicle Information and Communication Systems, June 26, 2006 Version," Alliance of Automobile Manufacturers, Washington, DC, 2006, http://autoalliance.org/files/DriverFocus.pdf.


validation data were presented for any of their seven proposed tests. By validation, I mean an examination of whether the classification of a task as meeting or not meeting a criterion in a specific laboratory or simulator test is confirmed (or not) by an experimental test of that same task conducted on an open road or test track. If a laboratory or simulator test classifies a task as meeting criteria, but the task does not meet criteria during real driving on a road, that is a "false negative" result (see Attachment 2 for details). If a laboratory or simulator test classifies a task as not meeting criteria, but the task does meet criteria during actual driving on an open road or closed track, that is a "false positive" result (see Attachment 3 for details). So far, no examination of false positive or false negative results with the NHTSA tests and criteria has been presented in any NHTSA document cited by the NHTSA Guidelines4 or posted to the docket.6 The preliminary examination in Attachments 2 and 3 indicates a high likelihood that both types of errors would be present for a high percentage of tested visual-manual tasks if the NHTSA Guidelines were accepted and implemented in their current form by automakers or anyone else.

The ultimate validation of a laboratory or simulator test is of course real-world crash data, such as those collected in crash databases or "naturalistic" studies, but the current state of knowledge and technology permits this form of validation only to a limited extent. A naturalistic study with around 2,000 vehicles, known as SHRP2, is currently underway. However, it will be a number of years before those data are fully collected and analyzed, and the limited number of crashes recorded will still not permit definitive conclusions about the many hundreds if not thousands of features and functions present in today's electronic in-vehicle systems, or in portable devices carried into a vehicle. Experimental testing, rather than naturalistic or crash studies, will therefore be required for the foreseeable future to ensure the safety of electronic information, communication, and entertainment devices, whether built in by automakers, carried in by drivers, or some combination thereof.

Unfortunately, no convincing validation data for the proposed tests were cited or presented in the draft NHTSA Guidelines or in the supporting research reports published in the NHTSA docket at the time of writing of this comment. Extensive research was done by NHTSA internally, or contracted by them, in an attempt to validate the proposed criteria in the NHTSA Guidelines, but there was no attempt to validate the test procedures themselves. To do so would have required a new set of tasks (not used to develop the criteria) to be tested according to the test procedures in the laboratory or simulator, and then classified by the proposed criteria.8 An examination would then need to be made as to whether those tasks would have been classified in the same way had the test been conducted using actual driving on an actual road, rather than simulated driving on a simulated road or a surrogate laboratory test (e.g., the occluded goggles test).

8 Hallett, C., Regan, M.A., and Bruyas, M.-P., "Development and Validation of a Driver Distraction Impact Assessment Test," 2nd International Conference on Driver Distraction and Inattention, Gothenburg, Sweden, September 5-7, 2011, http://www.chalmers.se/safer/ddi2011en/program/program/downloadFile/attachedFile_f0/DDI2011_FINAL_program?nocache=1315170096.36.
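To make the false positive / false negative terminology used above concrete, the following minimal sketch (in Python) tallies lab-versus-road task classifications under the convention used in this comment. The task names and pass/fail outcomes are hypothetical, invented purely for illustration; they are not NHTSA or CAMP-DWM results.

```python
# Hypothetical lab-vs-road task classifications, following the convention used
# in this comment: "positive" = flagged as NOT meeting criteria (needs lockout
# or redesign). All task names and outcomes below are invented for illustration.

tasks = {
    # task name:          (meets_criteria_in_lab, meets_criteria_on_road)
    "radio_preset":       (True,  True),   # true negative: correctly passed
    "destination_entry":  (False, False),  # true positive: correctly flagged
    "phone_dialing":      (True,  False),  # FALSE NEGATIVE: unsafe task passed by lab test
    "hvac_adjust":        (False, True),   # FALSE POSITIVE: safe task flagged by lab test
}

false_negatives = [t for t, (lab, road) in tasks.items() if lab and not road]
false_positives = [t for t, (lab, road) in tasks.items() if not lab and road]

print("False negatives (unsafe tasks the lab test would let on the road):", false_negatives)
print("False positives (safe tasks the lab test would lock out):", false_positives)
```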


NHTSA did make a limited attempt to validate the occluded goggles test against NHTSA's STISIM simulator data,9 but no data concerning validation of the NHTSA simulator itself were presented. Such validation may have been done, but no such data were cited in the Guidelines or published in any of the supporting documents. One non-validated or uncalibrated instrument cannot be used to calibrate or validate another.

The many studies conducted or sponsored by NHTSA, as cited in the NHTSA Guidelines document, were used solely to develop the criteria NHTSA proposed; again, none of those studies were actual validation tests in which a large number of tasks were scored against the criteria in both the laboratory or simulator test and a road or track test, with the two sets of results then compared. NHTSA did sponsor a study at VTTI10 to show that the criteria NHTSA selected (based on a radio tuning task) were consistent between track data and NHTSA's simulator data. Indeed, a similar 85th percentile eyes-off-road time of about 12 seconds was found in both venues for a 2010 Toyota Prius tuning task, after pooling all subjects who performed radio tuning with 10 other radios in 10 other vehicles, despite the large differences between the results for individual radios. However, a single data point matching between simulator and road for a single task does not ensure that similar results will be achieved for other tasks. Other tasks need to be tested with the same test procedures and scored according to the same criteria in order to determine whether a test and criterion are valid. While NHTSA made some attempt to present road study data on these issues, the tests themselves had not been validated in the data so far presented by NHTSA.

In other words, no data were presented in any cited NHTSA supporting report that validate the NHTSA-proposed criteria in combination with the proposed tests. That is, no study examined to what extent visual-manual tasks that met or did not meet the NHTSA criteria in any of the 7 laboratory or simulator tests would have a similar result if tested during actual driving on an actual road or track. In short, validating a criterion with road data does not mean that the test itself is valid, only the criterion (assuming no other problems are present). A large number of tasks using a range of in-vehicle systems need to be tested against the criterion in the laboratory or simulator, and then also on the road, to determine whether the simulator or laboratory test yields predictable road results. Such validation testing was not reported by NHTSA, although it is common protocol when validating a laboratory or simulator surrogate test to

9 Ranney, T.A., Baldwin, G.H.S., Smith, L.A., Martin, J., and Mazzae, E.N., "Driver Behavior During Visual-Manual Secondary Task Performance: Occlusion Method Versus Simulated Driving," National Highway Traffic Safety Administration, Washington, DC, 2012, http://www.regulations.gov/#!documentDetail;D=NHTSA-2010-0053-0077, accessed May 5, 2012.
10 Perez, M., Owens, J., Viita, D., Angell, L., Ranney, T.A., Baldwin, G.H.S., Parmer, E., Martin, J., Garrott, W.R., and Mazzae, E.N., "Summary of Radio Tuning Effects on Visual and Driving Performance Measures – Simulator and Test Track Studies," National Highway Traffic Safety Administration, 2012, http://www.regulations.gov/#!documentDetail;D=NHTSA-2010-0053-0076.


then use those results to predict on-road driver performance.11,12,13 As a consequence of this lack of demonstrated predictive validity, it is hypothesized that the NHTSA tests will have a high percentage of errors in classifying tasks as "safe" or "not safe" for drivers to perform while driving. The reasons such errors occur are explained in the following comments and documented in the accompanying Attachments 2 and 3.

It is obviously impossible to test the countless secondary tasks that are present, or will be present in the future, in a laboratory or simulator test and then also on the road. The point of having a validated laboratory or simulator test is that the range of response types, and the types of interference those secondary tasks may produce with driver performance, can be evaluated without road-testing every task. This concept is well known in calibration laboratory best practices: a secondary standard (such as a one-gallon container) cannot be used as a laboratory standard unless it has been calibrated against a standard with a traceable calibration to an accepted primary standard (e.g., the one-gallon primary standard at the National Institute of Standards and Technology). Ideally, the "gold standard" here would be the relative crash risk estimated from thousands of crashes in crash data. In the absence of such data for the foreseeable future, the next best thing is experimental data collected during actual driving on an actual open road or closed track, a predictive validation method that has been amply demonstrated in many studies11,12,13 (also see the Attachments).
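As an aside on the 12-second benchmark discussed above, the 85th percentile eyes-off-road time is a simple order statistic over per-subject totals. A minimal sketch in Python, with invented glance totals (not NHTSA data):

```python
# Compute an 85th-percentile total eyes-off-road time (EORT) across subjects,
# as used for the radio-tuning benchmark criterion discussed above.
# The per-subject totals here are invented for illustration only.
import statistics

eort_seconds = [7.9, 9.4, 10.2, 11.1, 11.8, 12.3, 13.0, 14.6, 15.2, 18.7]

# statistics.quantiles with n=20 gives the 5%, 10%, ..., 95% cut points;
# index 16 (the 17th cut point) is the 85th percentile.
p85 = statistics.quantiles(eort_seconds, n=20)[16]
print(f"85th percentile EORT: {p85:.1f} s")  # compared against a 12 s criterion
```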

4.2 High Percentage of False Positive Errors

Tasks that do not meet criteria in the laboratory or simulator, but do meet criteria in on-the-road conditions, are known as false positives, meaning that the tasks are incorrectly identified by a laboratory or simulator test as needing "lock-out" while moving, or costly redesign, when they may actually be relatively safe to perform during actual driving on an actual (not simulated) road. Unfortunately, a quick check with some prior public data from the NHTSA-sponsored Crash Avoidance Metrics Partnership Driver Workload Metrics project shows that the proposed NHTSA occlusion test likely produces about 40% false positive errors with the 9-second TSOT criterion NHTSA proposes (see Attachment 3), leading to unnecessary redesign or

11 Angell, L.S., Young, R.A., Hankey, J.M., and Dingus, T.A. (2002). An evaluation of alternative methods for assessing driver workload in the early development of in-vehicle information systems. Paper presented at the Society of Automotive Engineers Government/Industry Meeting, Washington, DC. http://www.sae.org/technical/papers/2002-01-1981.
12 Young, R.A., Angell, L., Sullivan, J.M., Seaman, S., and Hsieh, L., "Validation of the Static Load Test for Event Detection During Hands-Free Conversation," Proceedings of the Fifth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design 5:268-75, 2009, http://drivingassessment.uiowa.edu/DA2009/037_YoungAngell.pdf.
13 Young, R.A., Aryal, B., Muresan, M., Ding, X., Oja, S., and Simpson, S.N., "Road-to-Lab: Validation of the Static Load Test for Predicting On-Road Driving Performance While Using Advanced Information and Communication Devices," Proceedings of the Third International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Rockport, Maine, June 2005, http://drivingassessment.uiowa.edu/DA2005/PDF/35_DickYoungformat.pdf.


lockouts. Creating unnecessary costs for automakers means that fewer resources are available to address the tasks and issues that really do need attention in order to make in-vehicle devices safer for on-road use (assuming that automakers would assign the freed-up expenditures to improving the safety of in-vehicle devices). Also, if unnecessary lockouts are implemented, customers are more likely to turn to portable devices to perform those same tasks, also potentially increasing safety risks, as those devices were never designed for use by drivers while driving (see Attachment 5).

4.3 High Percentage of False Negative Errors

Tasks that do meet criteria in the laboratory or simulator, but do not meet criteria in on-the-road conditions, are known as false negatives, meaning the tasks really do need lock-out or redesign before release to the general driving public. If the results from the laboratory or simulator test are accepted as final validation for a task, and a road test is never performed or the laboratory test or simulator is not validated, these false negative tasks could be made available to the general public "as is," when in fact they may be unsafe to perform on the road. Unfortunately, 6 of the 7 proposed NHTSA tests may have a high proportion of false negative errors (see Attachment 2).

It is obvious that false negative errors carry potentially serious negative safety consequences. These consequences arise if a simulator or laboratory test (such as those proposed by NHTSA) shows a task to be relatively safe according to the NHTSA Guidelines, and the automaker in good faith lets the task go on the road without redesign or lockout; if the task is in fact unsafe, crashes may then increase. Those increased crashes could have been avoided if the simulator had been properly validated for driver distraction testing in the first place. (Note that validation for driver distraction testing is entirely different from validation of simulated vehicle dynamics in a simulator. Having a simulator validated for vehicle dynamics does not ensure it is validated for driver distraction testing. In fact, a complete vehicle cab and simulated vehicle dynamics are not necessary to achieve high predictive validity for driver distraction from a laboratory test.11,12,13)

In particular, any of the NHTSA-proposed tests for visual-manual distraction that do not include some sort of peripheral detection task (PDT) will not address the attention dimension as it relates to detection of and response to on-road events, and are therefore likely to produce false negative errors. In addition, the radio tuning reference task used in several tests has a long single glance duration in NHTSA's own data, which may contribute to crash causation through attention capture, a cognitive distraction phenomenon in which a driver's attention becomes excessively focused on one object or event.

Attention capture can occur with or without conscious awareness. For example, a ringing cellphone provides an external stimulus that could capture attention; the driver could be aware of the ringing phone without having to look at it, but not necessarily aware that their cognitive attentional "spotlight" has shifted to some degree to the auditory input rather


than the visual input from the forward roadway. For radio tuning, attentional capture is more subtle, and the driver may be less likely to be consciously aware that their attention has been captured. For example, the radio tuning task was shown by the VTTI data10 to be relatively benign on the three glance metrics and on the lateral and longitudinal deviations of the vehicle. But the long maximum single glance that tends to be associated with radio tuning, at least some of the time in some subjects (see Attachment 2), may not be "benign" for event detection and response, reflecting underlying attentional problems that are captured neither by the longitudinal and lateral vehicle measures nor by the three glance metrics proposed by NHTSA. A test using fixed criteria that measures glance properties as well as event detection, in the same test of driver performance while the driver performs a secondary visual-manual task, is therefore the minimum test that I would recommend for final validation of a task (given, of course, that the test method and its associated acceptance criteria have been successfully validated against road data).
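To make the glance metrics at issue concrete, the following minimal sketch (Python) computes per-task glance statistics – total eyes-off-road time, mean glance duration, and maximum single glance duration – from a list of individual glance durations. The durations are invented for illustration:

```python
# Glance metrics for one task performance, from a list of individual
# eyes-off-road glance durations (seconds). Values are invented for illustration.
glances = [0.8, 1.1, 2.6, 0.9, 1.4, 3.2, 1.0]  # 3.2 s is a "long single glance"

total_eort = sum(glances)                # total eyes-off-road time
mean_glance = total_eort / len(glances)  # mean single glance duration
max_glance = max(glances)                # maximum single glance duration

print(f"glances: {len(glances)}, total EORT: {total_eort:.1f} s, "
      f"mean: {mean_glance:.2f} s, max: {max_glance:.1f} s")

# A task can look benign on mean glance duration and even total EORT while a
# single long glance (here 3.2 s) leaves the driver exposed to missed events --
# the false-negative concern raised above.
```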

4.4 New Test Methods Are Needed to Reduce Classification Errors

Both false positive and false negative classification errors create problems, and both types should be minimized to the extent possible. A test can produce both false positive and false negative errors at the same time. Changing a criterion can alter the balance between false negative and false positive errors (i.e., the proportion of each in the overall error total), but a simple change in the criterion cannot reduce both types of errors simultaneously. Only a more valid test can reduce both types of errors at the same time.
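A small numerical sketch (Python) illustrates why shifting a criterion merely trades one error type for the other; the surrogate scores and road outcomes are invented for illustration:

```python
# Sweep a pass/fail criterion over a noisy surrogate measure and count errors.
# "Positive" = flagged as not meeting criteria, per the convention in this comment.
# Scores and road outcomes below are invented for illustration only.

tasks = [  # (surrogate test score in seconds, truly_safe_on_road)
    (6.5, True), (7.8, True), (8.4, True), (10.1, True),        # safe tasks
    (8.9, False), (9.6, False), (11.3, False), (13.0, False),   # unsafe tasks
]

for criterion in (8.0, 9.0, 10.0, 11.0):
    flagged = lambda score: score > criterion  # task fails the surrogate test
    fp = sum(1 for s, safe in tasks if flagged(s) and safe)          # safe but flagged
    fn = sum(1 for s, safe in tasks if not flagged(s) and not safe)  # unsafe but passed
    print(f"criterion {criterion:>4} s -> false positives: {fp}, false negatives: {fn}")

# Because the safe and unsafe score distributions overlap (8.9-10.1 here),
# no criterion eliminates both error types; only a surrogate measure that
# separates the two groups better (a more valid test) can reduce both at once.
```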

4.5 The NHTSA DFD-FC Test May Not Reduce Classification Errors

At first encounter, the NHTSA DFD-FC test (Dynamic Following and Detection Protocol with Fixed Acceptance Criteria) seems to be the only one of the 7 NHTSA-proposed tests that measures glance properties and event detection with fixed criteria, as well as longitudinal and lateral metrics, and is therefore (upon cursory examination) the only NHTSA test apparently suitable for final validation of tasks for driver distraction (again, assuming it had been properly calibrated and validated using road data). It should minimize both false negative and false positive errors compared to the other tests because it has the most comprehensive set of metrics. The test uses fixed criteria and does not use the radio tuning task as a benchmark, as the DFD-BM test does, so the relatively poor event detection associated with radio tuning need not lead to false negative errors. The event detection criteria could be set separately using some task other than radio tuning, thereby minimizing the false negative problem that arises for event detection when using a radio tuning task.

However, a closer examination of the methods of the proposed DFD-FC test reveals several problems with this test as well. First, the DFD-FC test sets its criteria based on the radio tuning task, which is of concern because of the tendency for radio tuning to have a long single glance and poor event detection (see Section 4.3 and Attachment 2 on false negative errors). Second, the DFD-FC test uses six lights in its PDT, which may produce smaller effect sizes in selective


attention tests based on the PDT. Third, the use of the "coherence" metric in the car-following scenario of the DFD-FC test has serious limitations. Fourth, simply repeating a short task many times in a row has potential side effects that may cloud the test's predictive validity for on-road experiments, particularly for object and event detection. These concerns about the DFD-FC test are further clarified in Attachment 2 on false negative errors.

4.6 Practical Consideration Questions Not Answered

Another concern is that, in addition to the lack of validation tests, NHTSA provided no information regarding a number of practical considerations to which automakers would likely need answers before they could implement the draft NHTSA Guidelines. These have been documented by Angell14 (slides 29-40) and Angell15 (pp. 67-68) (see Attachment 1, Section 3.2 for details):

1. "Does the assessment [test] provide a diagnostic of task properties?"
2. "Are effect sizes to which the test method is sensitive of sufficient size to have 'real-world' meaning?"
3. "Can effects be acted upon with appropriate outcomes?"
4. Other practicality issues: "cost of equipment/software needed, cost of using the method, number and level of staff required, ease of collecting and analyzing data, speed of obtaining results, ease of interpreting data, clarity/precision of discriminations between task effects, credibility/level of support for criteria."

4.7 Portable Device Hardware and Software Developers Not in Scope

Unfortunately, portable device hardware and software developers (as well as the information and web-based service companies that supply content for those portable devices) – the major ones being Apple, Google, Microsoft, and Facebook – are not affected by, nor within the intended scope of, the proposed NHTSA Distraction Guidelines. There is therefore at present no legal obligation for these companies to take the proposed NHTSA Guidelines into consideration. In fact, addressing driver distraction is not part of their current business models, nor have these companies shown any inclination in their public statements, public documents, or web postings to deal with the concerns of automakers, government, or the general public about driver distraction arising from use of their products as currently designed. The avoidance of these issues by these companies compounds the driving safety risk, but some background information is needed to understand why (see Attachment 5 to this comment).

14 Angell, L., "Conceptualizing Effects of Secondary Task Demands on Event Detection during Driving: Surrogate Methods & Issues," paper presented at the Driver Performance Metrics Workshop, San Antonio, Texas, 2008, http://drivingassessment.uiowa.edu/drivingmetrics/P_Conceptualizing%20Event%20Response%20Linda.pdf.
15 Angell, L., "Conceptualizing Effects of Secondary Task Demands During Driving: Surrogate Methods and Issues," chap. 3 in Performance Metrics for Assessing Driver Distraction: The Quest for Improved Road Safety, edited by Gary L. Rupp, 42-72. Warrendale, PA: SAE International, 2010.


By placing the burden on automakers in its first phase, and none on portable device makers and the app developers for those devices, the proposed NHTSA Distraction Guidelines may unfortunately worsen driving safety. The reason is that the NHTSA Distraction Guidelines may accelerate the trend toward use of portable devices and apps in vehicles, for which there are at present no safety controls for use while driving. This problem is particularly acute among the vulnerable population of younger drivers, for whom NHTSA's own figures document that the prevalence of use of these devices while driving is higher than for any other age group. In its Driver Distraction Program,16 NHTSA has stated its intention to release distraction guidelines for portable device makers within two years (by the end of 2013), but whether NHTSA has jurisdiction in this area is uncertain (see Section 3 in Attachment 5). It is therefore a major challenge for society, NHTSA, Congress, and all those concerned with driving safety to determine how to bring the companies making billions of dollars from portable electronic devices to fulfill at least an ethical or moral obligation to improve driving safety, as well as to allay public concern (see Attachment 5 for further discussion of this issue).

4.8 Insufficient Time to Review and Comment

The proposed NHTSA Distraction Guidelines and their associated supporting documents constitute a large amount of information that, in order to meet the revised comment deadline of May 18, 2012, must be analyzed and absorbed quickly. It is unfortunate that government reports are not peer-reviewed before they are made public, in the way that documents published in scientific journals are required to be; the only chance for public comment is during this brief comment period. Reviewing multiple publications spanning several years of prior work that NHTSA has published or sponsored is therefore difficult to accomplish in the short time permitted. The situation is even worse for internal company documents, which are rarely published in public journals. The resulting lack of scientific peer review and the compressed time for review have unfortunately allowed a number of technical errors to creep into the proposed NHTSA Guidelines and many of the supporting documents. Such mistakes happen in the scientific literature as well, but there are fewer of them, because many are caught by the peer review process before publication. Given the lack of a proper peer review system, numerous errors arise in many government reports and documents, and the NHTSA Guidelines and associated supporting documents are no exception, as documented in Section 4.9 below and in Attachment 4.

4.9 Data Values in Graphs and Tables Are Inconsistent

Attachment 4 is a collection of additional editorial and technical errors in the NHTSA Distraction Guidelines and supporting documents. For example, Error 9 in Attachment 4 shows that some key values in the graphs and tables of the key track data report by Perez et al. (2012) do not

16 http://www.nhtsa.gov/staticfiles/nti/distracted_driving/pdf/811299.pdf


match up, despite representing the same conditions and therefore the same data points. Numerous other errors and inconsistencies lead one to conclude that the data as presented are internally inconsistent. Without knowing which data are correct, it is hard to be convinced that the data really do support the test procedures, criteria, and conclusions reached by the NHTSA investigators.

5 Discussion

The NHTSA Guidelines attempt to create a "level playing field" for all automakers selling vehicles in the U.S., which almost all would agree would be advantageous for driving safety if successfully accomplished. No automaker could then attempt to gain a competitive advantage by allowing relatively unsafe tasks on the road in the belief that they would increase vehicle sales, while increasing crash risk for that customer as well as others on or near the roadways. The NHTSA Guidelines also have the potential to provide the needed update to the Alliance Guidelines. However, there is a legitimate concern (well documented in the automakers' postings to this docket) that if the "level playing field" created by the proposed NHTSA Distraction Guidelines does not include portable device makers and the software and information technology developers (like Facebook and Google) for those devices, then these Guidelines could lead to a net reduction in driving safety (see Attachment 5).

My main concern, however, is that because no on-road validation results were presented for any of the 7 NHTSA laboratory or simulator tests, it cannot be determined from the available data whether any of the 7 surrogate tests would give the same task classifications (as meeting or not meeting criteria) if the tasks had been tested on the road or track, or if naturalistic study data had been available to determine the odds ratios or relative risks of those tasks when performed under real-world conditions. Such validations have been done in the past using tests that bear some resemblance to the NHTSA-proposed tests (e.g., Angell et al., 2006). The extent to which there are false negative and false positive errors associated with the 7 tests is therefore indeterminable from the data and studies that NHTSA has presented in support of its laboratory test suite and criteria.

A preliminary examination based on previous data in the current analysis found a high percentage of false positive errors (Attachment 3) and false negative errors (Attachment 2), leading to concerns about driving safety if these proposed NHTSA Distraction Guidelines were implemented as currently written. As a consequence, implementation of the tests in the proposed Guidelines, if adopted by manufacturers, would have arbitrary and unknown effects on system design, the associated distraction potential, and crashes. Further technical details supporting these concerns are provided in the attachments to this comment. In general, these technical issues make the data insufficiently convincing in support of the proposed tests, criteria, and conclusions reached by NHTSA.


6 Conclusions

1. The 7 test procedures proposed in the NHTSA Distraction Guidelines need on-road validation of the complete test procedures (not just the criteria) before they can be successfully used by automakers as final acceptance tests for visual-manual secondary tasks.
2. NHTSA should address the validity and practicality issues raised here before releasing the final version of these Distraction Guidelines.
3. Numerous errors and a lack of clarity in the methods and results of many of the supporting documents make it impossible to determine whether the data as presented support the conclusions that NHTSA has drawn from them.
4. These errors and contradictions need to be addressed before automakers can successfully use the NHTSA Guidelines as final acceptance tests to ensure the driving safety of visual-manual secondary tasks.

7 Acknowledgments

I thank Christine Hallett of the University of Lyon and IFSTTAR for editorial and technical comments.

Respectfully submitted,

Richard A. Young, Ph.D.
Research Professor

Attachment 1: Compliments and Concerns on NHTSA Visual-Manual Driver Distraction Guidelines
Attachment 2: Possible False Negative Errors in NHTSA Driver Distraction Tests
Attachment 3: Possible False Positive Errors in NHTSA Occlusion Test
Attachment 4: Additional Editorial, Technical, and Procedural Errors in NHTSA Driver Distraction Guidelines and Supporting Documents
Attachment 5: Portable vs. Built-In Devices in Vehicles


Appendix. Whimsical Illustration of Bias

The following illustration is purely didactic. Any resemblance to actual people, living or dead, is purely coincidental. ☺

Scenario 1: NHTSA issues a Distraction Guideline that says, "Apples are Apples."

• Person 1 says, "I do not agree."
  o Biased. Person 1 has incorrectly disagreed with a true statement.
• Person 2 says, "I agree."
  o Unbiased. Person 2 has correctly agreed with a true statement.
• Person 3 says, "I agree."
  o Unbiased. Person 3 has correctly agreed with a true statement.

Scenario 2: NHTSA issues a separate Distraction Guideline that says, "Apples are Oranges."

• Person 1 says, "I do not agree."
  o Unbiased. Person 1 has correctly disagreed with a false statement.
• Person 2 says, "I agree."
  o Biased. Person 2 has incorrectly agreed with a false statement.
• Person 3 says, "I do not agree."
  o Unbiased. Person 3 has correctly disagreed with a false statement.

Appendix Discussion

The responses of the three people to the two scenarios show that Persons 1 and 2 are biased, and Person 3 is unbiased. However, that cannot be determined unless Guidelines of both types are issued (at least one correct and one incorrect Guideline) and responses to both have been made (a person remaining silent does not reveal whether they are biased or unbiased). If only a truly correct Guideline, as in Scenario 1, is issued by NHTSA, then Person 1 is revealed to be biased, and Persons 2 and 3 appear unbiased. If only a truly incorrect Guideline, as in Scenario 2, is issued by NHTSA, then Person 2 is revealed to be biased, and Persons 1 and 3 appear unbiased. Again, it is only when at least two separate Guidelines are issued, one truly correct and one truly incorrect, and responses to both have been made, that Person 3 is correctly shown to be the only unbiased responder, and Persons 1 and 2 are correctly shown to be biased.


Of course, in Scenario 1, Person 1 may (incorrectly) accuse Persons 2 and 3 of bias, and Persons 2 and 3 may (correctly) accuse Person 1 of bias. In Scenario 2, Persons 1 and 3 may (correctly) accuse Person 2 of bias, and Person 2 may (incorrectly) claim that Persons 1 and 3 are biased.

At the moment, it is my opinion that we are in Scenario 2. NHTSA has good intentions to improve driving safety, but has issued Guidelines with technical flaws (as I have tried to demonstrate in my rather lengthy attachments). I like to think I am among those attempting as unbiased a critique as possible of the proposed NHTSA Guidelines, which means, at least in my own mind ☺, that I fall in the category of "Person 3" (as would any other primarily technical person attempting to evaluate the Guidelines objectively on their inherent merits or lack thereof). Person 1 is perhaps best characterized as the automakers or any others inherently opposed to any government guidelines or regulations, and Person 2 as various non-profit safety organizations, or individuals who are absolutely convinced from personal experience that any use of electronic devices in vehicles by anyone is inherently unsafe.

I am therefore guessing that many automakers will likely agree with the negative comments I (and many others) have posted, while some safety organizations (and others) who post comments agreeing with the NHTSA Guidelines may accuse me (and others) of bias. However, if Scenario 1 (technically correct Guidelines) turns out to be the case in the final revised Guidelines, then I (and other unbiased technical people) will support them as strongly as we now criticize the current draft. In that scenario, some safety organizations (and others) will then agree with my position, and some automakers (and others) may then accuse me of bias. So either way, I (and others who attempt to make objective technical assessments) will be accused of bias! ☺


Attachment 1: Compliments and Concerns on NHTSA Visual-Manual Driver Distraction Guidelines

Richard A. Young, Ph.D.
Research Professor
Dept. of Psychiatry and Behavioral Neurosciences
Wayne State University School of Medicine
Detroit, MI USA
May 18, 2012
[email protected]

Attachment 1 to "Comment on: Visual-Manual National Highway Traffic Safety Administration Driver Distraction Guidelines: In-Vehicle Electronic Devices (Docket No. NHTSA-2010-0053-0009)."

1 Compliments

1. With the release of the draft Distraction Guidelines, NHTSA (2012) has made a bold attempt to address driver distraction, which is sorely needed. It is important for driving safety that drivers direct their attention to things that need attending to for safe driving, and not to things that do not. Some driving-related tasks should "distract" the driver because they need urgent attention, such as a tire blowing out. By "driving-related," I mean related to real-time control of vehicle position in lane and speed, and to responding in appropriate ways to roadway objects and events. Not paying attention to those driving-related tasks can be an error that is a causative factor in crashes. Other tasks, however, are secondary to driving itself, such as tuning a radio, looking at a Facebook page on a "smart phone," or entering a destination using visual-manual alphanumeric data entry. Paying attention to those secondary tasks can be a driver error that is a causative factor in crashes. "Driver distraction" in this sense is one of the many types of "human error" while driving. Human driving error in general (of which driver distraction error is just one component) is a definite or probable cause of 92.6% of vehicle crashes, based on the findings of Treat (1980) (Fig. 1).


Figure 1. Proportion of human, environmental, or vehicular definite or probable causes of 420 property damage or injury crashes from in-depth crash investigations (Treat, 1980).

Estimates of the proportion of driver errors preceding crashes that involve a "driver distraction" component (i.e., the prevalence of driver distraction as a crash causation factor) vary widely from study to study, in part due to variations in the definition of driver distraction (Young, 2012a). Whether the prevalence of driver distraction in vehicle crashes has been increasing over the last 10 years is difficult to determine for similar reasons. It cannot be questioned, however, that reducing driver distractions that degrade driver performance is important to improving driving safety. (I use the term driver performance as a more general category than driving performance. Driving performance is traditionally used to refer solely to the driver's real-time control of lane position (lateral variations) and speed (longitudinal variations). Driver performance includes, in addition, the ability of the driver to respond to objects and events in the roadway; driving performance is therefore a special case of driver performance.1)

2. NHTSA has resisted a "clean sheet" approach and instead has offered "continuous improvement" of the foundation laid down by the Alliance2 (2006) Guidelines.

3. NHTSA has spent considerable resources to conduct its own internal studies, as well as to fund contractors to conduct studies, to evaluate the Alliance (2006) Guidelines and other test methods and criteria developed by NHTSA (2012). These studies were intended to provide

1 I thank Paul Green of the University of Michigan Transportation Research Institute for suggesting this distinction.
2 The Alliance is a trade association of twelve automobile manufacturers, including BMW Group, Chrysler Group LLC, Ford Motor Company, General Motors Company, Jaguar Land Rover, Mazda, Mercedes-Benz USA, Mitsubishi Motors, Porsche, Toyota, Volkswagen, and Volvo.


supporting data for NHTSA's recommended changes and updates to the Alliance (2006) Guidelines.

4. The draft NHTSA (2012) Guidelines are based upon detailed and resource-intensive studies with many subjects performing several visual-manual tasks. NHTSA used one of those tasks, radio tuning, as its primary benchmark, both for setting criteria and for comparing other tasks to it directly in some of its tests. Other test methods were developed using destination entry as a benchmark. Based on these results, NHTSA put forward three glance criteria as well as an occluded goggles test as its top two recommended tests and criteria for visual-manual tasks.

2 Are These NHTSA Guidelines Reasonable and Applicable for Meeting Their Intended Goals?

The intended goal of the draft NHTSA Guidelines is "…to promote safety by discouraging the introduction of excessively distracting devices in vehicles" (NHTSA, 2012, p. 11200). To achieve this goal, NHTSA lists seven laboratory tests with associated criteria (NHTSA, 2012, their Table 3, p. 11222; their Table 10, p. 11241). These tests and criteria may be applied as screening tests (in conjunction with the decision criteria articulated in the NHTSA Guidelines) to classify tasks that should be "locked out" except when the vehicle is in Park.3

Several supporting studies were conducted by NHTSA to help establish numerical values for the simulator and occluded goggles criteria, based on experimental data in track studies (Ranney et al., 2009, 2011a,b,c). Data from these studies are quoted in the draft NHTSA (2012) Guidelines (e.g., their Tables 5, 6, 7); however, those studies were not published by NHTSA before the original comment deadline of April 24, 2012. Three of these studies have been published since the original deadline, and the deadline was extended to May 18, 2012. Some of the data in these studies are still missing, incomplete, or contradictory (see Attachment 4). For these and other reasons, it is difficult or impossible to make an objective evaluation of the validity of the draft NHTSA Guidelines with the data and studies so far provided by NHTSA.

2.1 NHTSA Guidelines Have Not Been Demonstrated to Have Predictive Validity

Predictive validity involves the ability of a laboratory or "surrogate" simulator test to make a valid prediction of whether or not an automaker should lock out a task during real driving.

3 The NHTSA requirement for being in "Park" differs from the Alliance DFT restriction of "above 5 mph." As soon as a stopped vehicle starts to move and reaches a speed of 5 mph, the Alliance requirement causes lock-outs to be implemented automatically, with little crash risk. The NHTSA requirement that the vehicle be in Park will prevent a driver from performing such tasks when at a stop light or stuck in traffic, increasing the tendency of the driver to use portable devices untested for driving safety, and thus increasing crash risk (see Attachment 5).


Predictive validity has two aspects. The first is the ability of the laboratory or simulator test to predict what the result would be if the test were conducted on an open road or closed test track instead of in the laboratory. The second is the ability of the laboratory test to predict the effects of a task on driver performance in real-world driving – with specific reference to the prevalence of a task in association with crashes in crash databases, or to estimates of the relative crash/near-crash risk or odds ratios of a task in "naturalistic" studies. In such naturalistic studies, cameras and other instruments are placed in vehicles to record driver and vehicle performance, along with the environmental and traffic conditions, prior to a crash, as well as during "baseline driving" with no crash. An estimate of relative risk can be obtained in such naturalistic studies under real-world driving conditions recorded over a long period of time: the prevalence of a task in association with a crash is compared to the prevalence of the task in baseline driving. Both aspects of validity are important for a simulator or laboratory study – on-road experimental validity, as well as naturalistic or real-world driving validity measured in terms of relative crash risk.

2.2 False Positive Errors (Attachment 3)

A task4 may fail to meet criteria in a laboratory or track experiment, and still not increase relative crash risk in crash databases or naturalistic studies (i.e., a false positive – see Attachment 3 on possible false positive results from the occlusion goggles test and criteria in the draft NHTSA Guidelines). A good example is a cellphone conversation, which has been shown to increase response time to visual events in typical laboratory and track experiments (Horrey and Wickens, 2006), but does not increase relative crash risk in naturalistic driving studies (Klauer et al., 2006; Olson et al., 2009; Hickman et al., 2009) or real-world driving (Young and Schreiner, 2009). Early case-crossover analyses (Redelmeier and Tibshirani, 1997; McEvoy et al., 2005) had a bias which caused relative risk estimates for cellphone conversation to be overestimated by about four times (Young, 2012c). The 4-fold cellphone conversation crash risk is a prime example of a "false positive" result that has probably not been accepted as such because of confirmation bias. (A worked sketch of the prevalence-based odds-ratio arithmetic used in such studies follows the footnote below.)

Again, false positives are safe tasks classified as unsafe for driver distraction by a laboratory or simulator test and criteria, leading to false lockouts or unnecessary redesign on the part of automakers and their suppliers. False positives may seem to have no safety consequences – merely an overabundance of caution. But there are safety consequences, because if customers believe a task to be safe and cannot perform it on an automaker's system, they may perform it on a portable device, which has not been developed or tested for use while driving (see Attachment 5), thereby increasing their crash risk.

4 For a definition of task, see the Alliance (2006, p. 88): a task is defined as a sequence of control operations (i.e., a specific method) leading to a goal, at which the driver will normally persist until the goal is reached. This task definition is further refined in Angell et al. (2012).
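For concreteness, here is a minimal sketch (Python) of the prevalence-based odds-ratio estimate described in Section 2.1, comparing task prevalence in crash events versus baseline driving epochs. All counts are invented for illustration and are not from any of the studies cited above:

```python
# Odds ratio for a secondary task from naturalistic-style counts:
# prevalence of the task in crash/near-crash events vs. in baseline epochs.
# All counts are invented for illustration; they are not from any cited study.

crash_with_task, crash_without_task = 30, 270         # task present in 30 of 300 events
baseline_with_task, baseline_without_task = 100, 900  # task present in 100 of 1000 baselines

odds_ratio = (crash_with_task / crash_without_task) / (
    baseline_with_task / baseline_without_task
)
print(f"odds ratio = {odds_ratio:.2f}")  # 1.0 means no elevated risk

# With these counts the task is exactly as prevalent in crashes as in baseline
# driving, so OR = 1.0: a task can fail a surrogate laboratory test yet show no
# elevated real-world risk -- the "false positive" pattern discussed above.
```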


Also, if automakers spend their resources on unnecessary redesign of tasks that are really safe to perform on the road, they will have fewer resources available to redesign those tasks that really do need redesign.

Unfortunately, nothing in the information NHTSA has so far posted to the docket5 presents validation test results showing the extent to which the laboratory or simulator tests in Table 3 (NHTSA, 2012, p. 11222) produce false positives (i.e., safe tasks incorrectly classified as unsafe for performance while driving). Some preliminary analyses with available CAMP-DWM data (Angell et al., 2006a,b) raise the concern that at least some of the recommended NHTSA (2012) laboratory tests (in particular the occlusion test) give rise to a greater number of false positive errors than other laboratory surrogate driving tests in the literature (see Attachment 3). Making the criteria less strict in such cases will reduce the number of false positive errors, but it will increase the number of "false negative" errors (described in Section 2.3 below and Attachment 2). As noted earlier, only an improvement in the test itself (or combining the test with additional tests) can reduce both types of error simultaneously.

Attachment 3 shows that a number of false positive task classification errors were observed after applying the recommended NHTSA (2012) occlusion test to the occlusion data collected from 55 subjects in the CAMP-DWM project (Angell et al., 2006a,b). The goal was to see how well the NHTSA laboratory criterion of 9 seconds total shutter open time (TSOT) classifies tasks as meeting or not meeting the NHTSA eyes-off-road time (EORT) criterion of 12 seconds on the track or open road in the CAMP-DWM data.6 Fortunately, as far as predicting glance variables on the road, there were no false negatives or misses (i.e., no unsafe tasks undetected by the test) for the occlusion test. Hence a sensitivity score of 100% was achieved, indicating that the occluded goggles test is at the maximum possible sensitivity – a successful result. However, there were 4 false positive misclassifications among the tasks tested on the road and track: these tasks were falsely predicted by the occlusion test to be unsafe for use on the road or track, when in fact they were among the set of 10 tasks that met the on-road criterion of 12 seconds or less EORT (see Attachment 3). This result gives a false positive rate of 40% (4/10) and a "specificity" of only 60%. Specificity is the proportion of truly safe tasks that are identified as safe by the screening test in the laboratory; it measures the probability of correctly identifying a safe task with the screening test (i.e., the true negative rate). Hence automotive OEMs or suppliers using the occluded goggles test recommended by NHTSA, with the proposed 9-second TSOT criterion

5 http://www.regulations.gov/#!docketDetail;rpp=100;so=DESC;sb=docId;po=0;D=NHTSA-2010-0053.
6 The CAMP-DWM study used eyes-off-road time to the device, whereas the NHTSA metric is total eyes-off-road time anywhere off the forward roadway, not just to the device. Eyes-off-road time to the device is thus a subset of the NHTSA eyes-off-road time, and is always less than or equal to it. It is therefore slightly easier for an on-road task to meet a 12-second eyes-off-road-to-task criterion using the CAMP-DWM metric than the NHTSA metric, indicating a small bias toward more false positives in the CAMP-DWM occlusion data when compared to the CAMP-DWM track and road data.


they recommend, is predicted to have about 40% of "safe" tasks unnecessarily locked out or redesigned when there is no need for such lockout or redesign – a high enough proportion to be of concern (a worked sketch of this sensitivity/specificity arithmetic appears at the end of Section 2.3 below). Resources spent on such unnecessary redesign will then not be available for redesigning tasks that truly need it, leading to a potential net reduction in driving safety.

2.3 False Negative Errors (Attachment 2)

If a task meets a comprehensive set of criteria in a validated laboratory or simulator test, or in a road test, it is unlikely to increase crash risk in a detectable manner in real-world driving. But if the tests are not comprehensive enough to cover both dimensions of driver performance (see Attachment 2), then false negative task classifications can occur. That is, tasks can be classified as "safe" in a simulator test, but be "unsafe" for use in real-world driving.

Conducting well-designed on-road test studies is expensive and time-consuming, and such studies are not practical for every one of the hundreds of tasks that are possible with modern information, navigation, and entertainment systems. A naturalistic study is even more time-consuming and expensive, and produces so much data that it cannot be processed in a timely manner. The goal of developing surrogate laboratory tests is to allow reasonably valid predictions of what would happen in a well-designed road test if one were conducted. They represent a good compromise between practical and safety concerns, because if a task really is unsafe to perform on the road, it is questionable to allow drivers to do it on an open road, even with safety precautions. Surrogate laboratory tests can then be at least partially validated against on-road test results, as has been done in a number of previous studies (e.g., Angell et al., 2002; Angell et al., 2006a,b; Young et al., 2005, 2009). The degree to which the test actually predicts road results can then be directly estimated. Ideally, one should also validate laboratory tests using tasks that have been validated in real-world settings, ideally in naturalistic studies. Once a test procedure has been validated against on-road or naturalistic studies, the laboratory test is considered to be calibrated (or validated). The laboratory or simulator test can then be used to evaluate new tasks with new devices, to determine whether they are safe to perform while driving, or need to be redesigned or "locked out" while the vehicle is in motion.

Again, false negatives are misses: unsafe tasks that meet criteria on the laboratory or simulator test and so are incorrectly permitted on the road. The undesirable safety consequences of false negatives are obvious. The data that would permit such validation tests to be conducted by researchers or engineers concerned with these issues have unfortunately not been provided by NHTSA (2012). To re-emphasize this point: there is a lack of such validation of the NHTSA test methods, and a lack of available data which would allow potential users of the tests to perform the validation themselves. Hence, it is difficult or impossible to determine whether the draft NHTSA Guidelines are reasonable and applicable for meeting their intended goals.
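As noted at the end of Section 2.2, here is a minimal worked sketch (Python) of the sensitivity/specificity arithmetic. The pass/fail pattern below is constructed to reproduce the 100% sensitivity and 60% specificity figures reported above; the tasks themselves are hypothetical, not the actual CAMP-DWM tasks:

```python
# Sensitivity/specificity of a lab screening test against a road-test reference.
# "Positive" = flagged as not meeting criteria (TSOT > 9 s in the lab;
# EORT > 12 s on the road). The pass/fail pattern is constructed to reproduce
# the 100% sensitivity / 60% specificity reported above; tasks are hypothetical.

# (flagged_by_lab_test, truly_unsafe_on_road) for 14 hypothetical tasks:
results = (
    [(True, True)] * 4      # true positives: unsafe tasks correctly flagged
    + [(True, False)] * 4   # false positives: 4 of the 10 road-safe tasks flagged
    + [(False, False)] * 6  # true negatives: road-safe tasks correctly passed
)                           # no (False, True) entries: no false negatives

tp = sum(1 for lab, unsafe in results if lab and unsafe)
fp = sum(1 for lab, unsafe in results if lab and not unsafe)
tn = sum(1 for lab, unsafe in results if not lab and not unsafe)
fn = sum(1 for lab, unsafe in results if not lab and unsafe)

sensitivity = tp / (tp + fn)  # 4/4  = 100%
specificity = tn / (tn + fp)  # 6/10 = 60%
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```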


To reiterate, there are well-established quantitative methods for determining the predictive validity of driver distraction data in the laboratory in terms of driver distraction data on the road or test track (Angell et al., 2002; Angell et al., 2006a,b; Young et al., 2005, 2009). Validity testing of laboratory or track experimental studies against naturalistic driving data is considered the "gold standard," but this has not yet been implemented in any peer-reviewed study that I am aware of.

Invalid screening tests can produce an excessive number of false positive and false negative errors. This excess occurs regardless of the criterion: changing the criterion value merely shifts the balance between false positive and false negative errors. A better test is needed to reduce both types of errors at the same time. Even fully validated screening tests, when coupled with decision criteria (or redlines), produce some pattern of classification errors (no test is perfect), so it becomes important to understand the number and type of classification errors made by a test. Several brief reviews of the recommended NHTSA test protocols for false negative (Attachment 2) and false positive errors (Attachment 3) were conducted using existing data from the Crash Avoidance Metrics Partnership Driver Workload Metrics (CAMP-DWM) project, which was conducted from 2001 to 2006 and sponsored by NHTSA. Five of the 7 NHTSA proposed tests do not make use of a peripheral detection task or its variants. A preliminary evaluation in Attachment 2 shows that these 5 tests therefore tend to make false negative errors, because the test metrics employed do not screen visual-manual tasks for poor event detection. Thus they would incorrectly find that potentially unsafe visual-manual tasks meet the NHTSA criteria.

2.4 Lack of Validation of NHTSA Proposed Tests

To further emphasize this point, it is unfortunate that none of the docketed or published NHTSA-sponsored studies, nor any material in the NHTSA (2012) Guidelines draft, evaluate the extent to which the NHTSA-proposed laboratory tests produce valid predictions for any on-road test or naturalistic data. Many of the NHTSA test methods are similar to, or derive from, methods used in industry or in the scientific literature, some of which have formal validation studies associated with them (e.g., Angell et al., 2002; Angell et al., 2006a,b; Young et al., 2005, 2009). However, NHTSA has developed its own set of criteria and test methods that includes variations on these methods, such as the number of repetitions of the test or the number of subjects and their ages. These test variations have not been given formal validation testing by NHTSA, at least none that NHTSA has published. It is important that each specific test method NHTSA recommends, if it does not represent an already-validated method, be evaluated for validity. Differences in methodological details can affect which tasks meet or do not meet criteria, and the percentage of false positive and false negative results from the laboratory or simulator test, as described below.


In particular, NHTSA has not provided any data showing the extent to which the 7 laboratory tests in Table 3 (NHTSA, 2012, p. 11222) produce classification errors for tasks assessed in a simulator. Again, these two types of classification errors, false positives and false negatives, were described in Sections 2.2 and 2.3 above with reference to driver performance testing.

2.5 Limitations of Experimental Track and Road Studies

There are general limitations to any experimental track or road study. The CAMP-DWM project (Angell et al., 2006a,b) illustrates some of these generic difficulties. The CAMP-DWM project related laboratory surrogates to both on-road and test-track experimental trials. These driving trials (even those conducted on open public roads) were part of an experiment, and were not in any meaningful sense "naturalistic." Sessions were of short duration (about 1 to 1.5 hours). The test participant was accompanied by an experimenter and a test engineer while driving. A "requested-task" paradigm was used, in which the experimenters scheduled events (CHMSL onset, lead-vehicle coastdown, following-vehicle turn signal onset as seen through the left or center rear-view mirror, and task performances without events) to co-occur with the requested tasks. Participants were indeed driving real cars in a three-car platoon on public roads or at the Michigan Proving Ground; however, as previously mentioned, one cannot say that these conditions are representative of "natural" driving.

Why were the CAMP-DWM trials on the open road or test track not representative of "naturalistic driving"? Some reasons are described by Angell (2008, slides 36-37):

1. Events are often "scheduled" to occur once or more per task, regardless of task length.
2. Events often mimic real events, but are not fully natural and may not elicit a fully natural response.
3. Responses to events in an experiment may differ from the natural response (e.g., a button press rather than a braking or steering response, and perhaps under a lower sense of risk).
4. Research participants either receive a single surprise trial per experiment, or receive more and are aware that objects will occur, but unaware of exactly where or when.
5. Task effects on event detection are not by themselves indicative of crash risk. Crash risk reflects many things besides cognitive, manual, or visual demand as measured in an experiment: for example, the frequency and duration with which drivers engage in the task, the co-occurrence of environmental events with the task (whose probability is affected by task length), the environmental conditions under which the task is done, the individual driver state at the time the task is performed, etc.

Other limitations of experiments (whether in the laboratory or on the road) in generalizing to real-world driving are:

6. There is no long-duration exposure to tasks (weeks or months).


7. Experimental protocols rarely allow for driver choice and decision-making that depends upon the context. In an experiment, drivers are not given choices regarding whether, when, and often how they might perform the requested tasks. For example, in on-road tests, drivers are required to follow particular routes. The environmental context (road conditions, weather, etc.) is constrained, and drivers are not allowed to make choices in experiments, or even to "opt out" of performing a task. Naturalistic and real-world driving studies show that driver choice, and the environmental context in which tasks are performed (or not performed), matter.

It has been argued by a prominent researcher in driving safety (personal communication) that when it comes to highway safety and the evaluation of relative crash risk of different tasks, environmental context, driver choice, and their interaction will affect the results and estimates of relative crash risk. In other words, evidence from real-world studies is accumulating that drivers in general do a reasonable job of self-regulating their secondary task behavior in everyday driving (e.g., Young and Schreiner, 2009). That is, drivers don't usually perform secondary tasks during periods of high driving demand. Similarly, some real-world events happen so randomly, unexpectedly, and suddenly that it is impossible for any driver to avoid a crash, no matter how conscientious and cautious they are, even if they are fully attentive to the forward roadway (a deer jumping out, a box falling off a truck, a brick falling off a bridge). Building such events into an experimental design ensures that no task will meet criteria. It then becomes impossible for automakers to design an interface that would satisfy such test conditions, because no design could meet criteria. A PDT-like task in the laboratory has been shown, however, to place no undue burden on the participant, and it can predict on-road track or open-road event detection results well, such as a lead-vehicle brake light illumination (Angell et al., 2006a,b). Therefore, the on-road validation test results of the NHTSA test methods and criteria (as shown in Attachments 2 and 3) should be considered preliminary only. These preliminary validation tests, however, were not encouraging as to the predictive validity of the NHTSA test methods and criteria that were examined.

As stated, the ideal data set for validating tasks for real-world driving safety would be real-world crash experience and naturalistic driving data that capture actual crash and near-miss events in detail (i.e., assuming these data are properly analyzed to combine near-crash and crash events in an epidemiologically sound way). However, the link between event response data on the road and track and real-world crash data is difficult to establish, given that the conditions and events are usually specific to an individual and to circumstances that seldom if ever re-occur in an identical manner for other drivers (Angell, 2008, slide 35). Nevertheless, the relative crash risk that may be associated with engagement in a given task while driving is important to determine in order to improve driving safety, as is the prevalence of engagement in that task during normal driving 7 as well as in crashes (both are necessary to estimate relative risk).

7 By "normal" driving, I here mean a driving baseline without performing the secondary task in question, which is the common epidemiological usage of baseline. NHTSA and others interpret "baseline" to mean not engaging in any "secondary" tasks. This latter definition is impractical, because although observable secondary tasks can be screened out from baseline, "cognitive distraction" from thoughts unconnected with driving cannot be (Young, 2012a).


A valid estimate of crash risk also requires specification of the element of time: crash risk is the probability that a crash will occur within a stated period of time. The risk of a particular driver crashing in the next minute is small; the risk of that driver crashing sometime in their lifetime is not.

The prevalence of an activity before a crash is not the same as the relative crash risk of that activity. For instance, 100% of all crashes are preceded by the driver breathing; however, it would be wrong to say that the high prevalence of breathing before a crash indicates that breathing is the cause of the crash. In order to estimate relative risk, the prevalence of a task being engaged in before a crash must be compared with the prevalence of the same task being performed during normal driving with no crash having occurred. To put it another way, since 100% of drivers breathe during normal driving, the relative crash risk associated with breathing while driving is no different from that of normal driving (that is, the relative risk is 1). If a cellphone conversation occurs 6.7% of the time during normal driving (Funkhouser and Sayer, 2012), and if the cellphone conversation percentage in the 5 seconds before a crash is also 6.7%, then the relative crash risk of cellphone conversation is no different from that of normal driving. The prevalence of integrated electronic systems in police-reported crash reports is about 0.5%; the point made here is that this prevalence would need to be compared against the prevalence of such usage in normal driving to estimate the relative crash risk of usage of integrated electronic systems.
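The relative-risk logic of the preceding paragraph can be written compactly; the following restatement of the cellphone-conversation example uses only the prevalence figures already cited.

```latex
% Relative risk of engaging in a task while driving, as defined above:
\[
RR \,=\, \frac{P(\text{task engagement} \mid \text{pre-crash window})}
              {P(\text{task engagement} \mid \text{baseline driving, no crash})}
\]
% Cellphone-conversation example from the text:
\[
RR_{\text{conversation}} \,=\, \frac{0.067}{0.067} \,=\, 1
\]
```

A value of 1 means the task carries the same crash risk as baseline driving; the breathing example yields RR = 1 by the same arithmetic.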

A preliminary attempt to validate some of the NHTSA test protocols was made in a pilot study using the public CAMP-DWM data (Angell et al., 2006a,b) and is provided in Attachments 2 and 3. This examination was tentative, because the full CAMP-DWM data set with individual subject data and individual glance durations was not available to the investigator. For example, 85th percentile values had to be interpolated from the 50th percentile and 75th percentile values to determine whether individual visual-manual tasks in the CAMP-DWM data met the NHTSA glance criteria on the road and track. These analyses used data derived from the CAMP-DWM study, and thus were subject to the limitations noted above. I also attempted, in a preliminary manner, to relate these findings to those from crash studies. Although these findings are preliminary and tentative, they represent the only validation test of the proposed NHTSA (2012) Guidelines known to this investigator. As such, these findings, preliminary as they are, are of interest to those attempting to evaluate the "validity" of the proposed NHTSA laboratory tests. They give a rough indication of the extent to which the NHTSA test methods and criteria may be subject to false positive and false negative errors in their classification of tasks.
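The percentile estimation just described can be sketched as a linear extrapolation from the two published percentiles. The exact procedure used in Attachments 2 and 3 may differ, and the input values below are hypothetical.

```python
# Minimal sketch of the percentile estimation described above: with only the
# 50th and 75th percentile values available, the 85th percentile is estimated
# by linear extrapolation. Assumption for illustration only; the values below
# are hypothetical.

def extrapolate_p85(p50: float, p75: float) -> float:
    """Linearly extrapolate the 85th percentile from the 50th and 75th."""
    slope = (p75 - p50) / (75.0 - 50.0)   # change per percentile point
    return p75 + slope * (85.0 - 75.0)

# Hypothetical total eyes-off-road times (seconds) for one task:
print(extrapolate_p85(p50=8.2, p75=10.6))   # ~11.56 s
```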


2.8 Discussion

In sum, the NHTSA Guidelines have not yet been shown to be reasonable and applicable for meeting their intended goals, because they lack published validity testing showing that they correctly predict track or on-road experimental results (not to mention real-world crash data or naturalistic study data). Quick pilot checks in Attachments 2 and 3 with publicly available CAMP-DWM data show that five of the NHTSA-proposed tests have potentially excessive false negative errors (Attachment 2), and that the recommended occlusion test for visual-manual tasks has potentially excessive false positive errors (Attachment 3).

2.9 Conclusion

The NHTSA Guidelines lack validity testing, and therefore have not been demonstrated to be reasonable and applicable for meeting NHTSA's intended goals.

3 How Likely Are Automakers to Adopt These NHTSA Guidelines?

3.1 Validity Issues Decrease Probability of Adoption

As mentioned in Section 2 above, NHTSA has not provided data to permit evaluation of the validity of its laboratory tests for predicting on-road results, including the sensitivity and specificity of its 7 tests. Attachment 2 shows that a quick pilot check of the 3 NHTSA glance criteria indicates that 5 of the tests may give rise to a "false negative" result for many tasks: the tasks will meet the NHTSA glance criteria, but they may have long single glances and poor event detection and response. Some potentially unsafe tasks may therefore be incorrectly classified as safe to perform while driving if the NHTSA-proposed tests and criteria were used for final validation of visual-manual tasks.

Also, a quick check of the recommended occlusion test with the NHTSA-sponsored CAMP-DWM data indicates about a 40% false positive rate. That means automakers may perform unnecessary lockouts or redesigns for tasks given final acceptance testing with the occlusion method. Unless this preliminary result can be contradicted by more extensive data and analyses, it is unlikely that automakers would adopt the occlusion test protocol and criteria as recommended in the NHTSA Guidelines. Because the occlusion test is recommended in the Alliance (2006) guidelines, it is possible that some automakers have been using it, and may have internal data showing that the occlusion test is valid in predicting on-road test results with few false positives. If so, those automakers may adopt at least the NHTSA (2012) occlusion test protocol, which is virtually the same as the ISO 16673:2007 occlusion standard. But it is unlikely that automakers will do so with the NHTSA-proposed 9-second TSOT criterion, which NHTSA's own analysis found was too stringent relative to its other test data. Also, the report issued on May 3, 2012 by Perez et al. (2012), a survey of experts, found that all agreed the Alliance 15-second TSOT criterion was reasonable.

Until the other 6 NHTSA-proposed test protocols have their validity evaluated, it is unknown to what extent these other tests may also give rise to similar false positive task classifications. Likewise, 5 of the proposed tests have no direct metric for event detection (although mean single glance duration may be indirectly associated with event detection, as per Young, 2012b).


Therefore, tasks with poor event detection could be found to meet criteria in the laboratory test, and could be put on the road without lockout or redesign. NHTSA therefore does not have a sufficient technical basis to support automakers' use of any of these 5 tests for final validation testing of a task.

3.2 Level Playing Field Increases Probability of Adoption

In a positive vein, one of the primary reasons why so many automakers signed on to the Alliance (2006) Guidelines was to create a "level playing field," so that no vehicle manufacturer could decline to "lock out" or redesign unsafe tasks and thereby gain a putative competitive advantage over automakers who adopted the more prudent and safety-conscious approach. There are still some automakers selling vehicles in North America who have not adopted the Alliance Guidelines. Therefore there may be an incentive for at least some automakers to support and adopt the NHTSA Guidelines, even given their lack of validation, in order to create a "level playing field."

3.3 Threat of Defect Investigations Leads to Uncertain Probability of Adoption

In a negative vein, the proposed NHTSA Guidelines contain language going beyond "voluntary" guidelines, despite NHTSA's insistence in meetings, and the wording in sections of the Guidelines document, that the guidelines are "voluntary." For example, on p. 11218 the NHTSA Guidelines state:

New in-vehicle technologies are being developed at an extremely rapid pace. NHTSA does not have the resources to evaluate the safety implications of every new device before it is introduced into vehicles. Such a practice would dramatically slow the rate of introduction of new technology into vehicles. Finally, and most importantly, adopting such a practice is unnecessary in light of the National Traffic and Motor Vehicle Safety Act of 1966's requirement that each manufacturer bears primary responsibility for products that they produce that are in motor vehicles. A manufacturer that produces a vehicle or item of motor vehicle equipment that either does not comply with the FMVSSs or contains a defect creating an unreasonable risk to safety must recall the vehicle or equipment and provide the owner a remedy. 49 U.S.C. 30118–30120. Accordingly, a section has been included in the NHTSA Guidelines emphasizing that, to protect the general welfare of the people of the United States, manufacturers are responsible for refraining from introducing new in-vehicle devices that create unreasonable risks to the safety of the driving public.

The above language is repeated on p. 11234, Section I.1. It is further repeated regarding aftermarket and portable devices that manufacturers "produce" (p. 11250). Note that the term "manufacturers" is not preceded by "auto," so technically the clause below could also apply to portable device makers such as Apple and Google (see Attachment 5):

VIII.2 Unreasonable Risks with Aftermarket and Portable Devices. NHTSA reminds manufacturers that they are responsible for ensuring that aftermarket and portable devices they produce which may reasonably be expected to be used by vehicle drivers do not create unreasonable risks to the driving public.



And likewise for auditory-vocal interfaces:

IX.2 Unreasonable Risks with Auditory-Vocal Interfaces. NHTSA reminds manufacturers that they are responsible for ensuring that devices they produce which have auditory-vocal portions of their interfaces do not create unreasonable risks to the driving public.

Some automakers may feel concern with this language associating the Distraction Guidelines with the FMVSS laws regarding "defects" that create an "unreasonable risk." If an automaker chooses not to follow these "voluntary" guidelines, does that open it to recalls because of alleged product defects that are not compliant with the guidelines? If automakers judge that they are vulnerable to recalls based on this language, or if NHTSA indicates that it is going to pursue the "unreasonable risk" clause in the document, then automakers may end up locking out many features and functions (many unnecessarily, because of "false positives"), hastening the migration to portable devices, which may be in the "doughnut hole" for regulatory action (see quotes from David Strickland, NHTSA Administrator, in Attachment 5).

Furthermore, some of the automakers within the Alliance may have product development processes that call for the use of some of the recommended tests during different phases of product development. For example, occlusion testing may be used early in design to identify tasks that may need further attention from the development team (e.g., evaluation for possible design improvements, or consideration of further testing). For such manufacturers, the process would call for occlusion testing to be followed up by further testing later in the development process (since it would not be used as a final "verification" test to determine whether a task does or does not meet glance criteria).

3.4 Practicality Issues Decrease Probability of Adoption

There is also a practical side that goes beyond the question of validity. Even if experimental results were perfectly valid in predicting real-world crash risk and other safety consequences, practical considerations may independently reduce or delay acceptance of the NHTSA Guidelines by automakers and suppliers. These practical considerations have been well described by Angell (2008, slides 29-40; 2010, pp. 67-68), as summarized below:

1. Does the assessment provide a diagnostic of task properties? For example, is it diagnostic of "type of loading" or "amount of loading" or some other task property that can be changed during product development to improve situation awareness and event detection-and-response? Note that the emphasis here is on an assessment providing clues about what task properties could be improved (versus providing very specific identification of detailed reasons for the outcome of the evaluation). If an assessment clearly indicates that the visual demand of a task is high, then it provides some diagnostic value.


Of course, additional evaluations could also be done in such a case (such as content analysis of the screen) to determine why eyes-off-road time is long. However, with assessments of event detection, it may be less straightforward to identify the diagnostic information than for some other dimensions of assessment.

2. Are the effect sizes to which the test method is sensitive of sufficient size to have "real-world" meaning? This is a very difficult issue, given that there is still very limited knowledge of the transfer function from lab, simulator, and test-track venues to the real-world situations in which drivers experience difficulty or do not. Nonetheless, it is an issue of practicality that requires attention and careful thought to assure diligence in decisions made on test outcomes.

3. Can effects be acted upon with appropriate outcomes? That is, once the effects are known, can steps be taken to reduce the task's effects on the measured attribute, that is, to improve system design or operation so that any interference with driving is reduced or eliminated? And if such changes were made, and the task/system were re-tested, would they result in an improved test outcome?

• Example: If a task scores poorly on "event detection" measures, is it clear what can be modified to reduce its interference with event detection?
o e.g., will reducing a secondary task's "amount of workload" lead to better event detection performance?
o The answer for event detection is: not always!
o Based on Young and Angell (2003), only a portion of event detection is affected by or related to "amount of workload" (the first component in their analysis). Thus, sometimes, changing the amount of workload does not substantially affect event detection performance. Instead, changes in other task properties are necessary (e.g., changing the type of resources that the task demands, or changing the pattern of demands over the period of the task's length). This underscores the critical importance of understanding clearly what it is that a method, test, or metric is "measuring."

4. Other practicality issues include:
• Cost of equipment/software needed
• Cost of using the method
• Number and level of staff required
• Ease of collecting and analyzing data
• Speed of obtaining results
• Ease of interpreting data
• Clarity/precision of discriminations between task effects




• Credibility/level of support for criteria

Unfortunately, the draft NHTSA Guidelines (2012) do not provide guidance on these practical considerations, just as they do not provide data on validity for either road testing or real-world crash effects.

3.5 Competitive Issues Increase Probability of Adoption by Some Automakers

Some automakers may feel that they have a competitive advantage because of their programs and experience with internal driver distraction testing for meeting the Alliance (2006) Guidelines. They may therefore conclude that they can easily adapt their internal processes to accommodate the new NHTSA Guidelines, and achieve a competitive advantage over other car companies that do not have the experience and procedures in place to handle driver distraction guidelines beyond rudimentary conformance with the Alliance (2006) Guidelines. Such companies may not have the in-house expertise to make sophisticated analyses of the new NHTSA Guidelines, and so may not realize the many drawbacks that have been outlined in this and many other responses to the NHTSA Guidelines. They may therefore support the Guidelines without realizing the difficulties.

3.6 Concern for Public Safety Image Increases Probability of Adoption by Some Automakers

No automaker can be seen in the public eye as compromising on vehicle safety. Almost every automaker has internal slogans along the lines of "safety is our first priority." Despite the many technical limitations of the NHTSA (2012) Guidelines, of which they may well be cognizant, these automakers may consider themselves forced into acceptance of the Guidelines, because they are concerned that any rejection or public criticism of the Guidelines would lead to a poor public safety image for their company.

3.7 Conclusions

• Unless these validity and practicality issues are addressed by NHTSA, the automakers, academia, or some combination thereof, the adoption of the NHTSA Guidelines by the automakers (and/or their suppliers) selling vehicles in the U.S. is questionable.

• Even if the technical issues in the draft release are fully addressed in the final release of the NHTSA Guidelines, it may still be the case that some proportion of automakers will not sign on to them, simply because some automakers oppose any form of government guidelines or regulations affecting their industry. Automakers who signed on to the Alliance Guidelines (2006) may simply wish to continue using those (which have arguably improved driving safety over the last 10 years), despite the fact that many of the principles need to be updated with new research.


4 How Likely Are Equipment Suppliers to Adopt These NHTSA Guidelines?

Equipment suppliers will generally follow the wishes of the vehicle manufacturers, but not always. For example, a manufacturer may request that a supplier unlock a feature or function which the supplier's data indicate may not be safe for use on the road. The supplier may choose in such circumstances not to unlock the feature or function without indemnification by the vehicle manufacturer. However, the vehicle manufacturer may be unwilling to provide that indemnification. The reverse case is also possible. Moreover, some automakers may not test the in-scope products for meeting or not meeting the NHTSA Guidelines, and may require the suppliers to do so. But suppliers will only do that if the testing costs were encompassed in their original bid. Finally, given that this is a safety-related matter, automakers may not wish to rely solely upon the test results given to them by a supplier, without some internal verification of the supplier's test results.

Suppliers were not signatories to the Alliance documents (nor to the letters of commitment which accompanied them). However, many automakers expected suppliers to follow the Alliance (2006) Guidelines, and, in practice, a number of them did. Major Japanese suppliers such as Denso or Panasonic might in some cases follow the JAMA Guidelines, which are nominally more stringent than the Alliance (2006) Guidelines. 8

4.1 Conclusion

Because suppliers did not have to adopt the Alliance document, it cannot be determined from past experience to what extent equipment suppliers will adopt the NHTSA (2012) Guidelines.

5 How Should NHTSA Monitor Adoption of These NHTSA Guidelines to Evaluate Their Effectiveness? How Should It Make Public the Results of That Monitoring?

Once its test procedures and criteria are validated, NHTSA should assess conformance 9 of the in-scope products of automakers and suppliers with the NHTSA Guidelines. One way is to test products, either internally at NHTSA or through contractors, and assign safety ratings, as is done now with NCAP. Safety ratings would need to address both dimensions of driver performance while conducting visual-manual tasks. 10

8 Some Japanese suppliers have gotten around the stringency of the JAMA 7-second TSOT. Apparently, they do so by taking advantage of a loose definition of "task," which lets them test what the Alliance (2006) would define as subtasks (Perez et al., 2012, p. 17).

9 Conformance is a better term here than compliance, which is used in association with regulations, which are mandatory.

10 There is a potential hidden problem with this suggestion, however. If the guidelines are truly voluntary, then some automakers may choose not to adopt them (particularly since they appear to contain so many problems). Devices from automakers that are not in conformance will then presumably obtain safety ratings that are worse than those from automakers who do conform. However, there is a vocal minority of customers who have little regard for safety considerations, along with a strong distaste for lockouts, and who will therefore deliberately look to purchase vehicles that have poor driver distraction safety ratings, since they will prefer more functional telematics devices (fewer lockouts) when driving their vehicles.


A single safety rating for driver distraction should not be done, because it is not technically correct -- driver distraction is 2-dimensional, not 1-dimensional (Young and Angell, 2003; Angell et al., 2006a; Angell, 2010; Young, 2012b). A minimum of 2 separate safety ratings will therefore be necessary: one for driver workload, and one for event detection and response, the 2 main dimensions of driver distraction (Young and Angell, 2003; Young, 2012b). In other words, object and event detection performance is distinct from vehicle control performance. Eyes-off-road metrics can be used as a surrogate for object-and-event detection only to the extent of the obvious situation of eyes-off-road: "did not look, so could not see." 11 Object and event detection performance is largely independent of lane-keeping, for example, since lane-keeping, as a highly over-practiced psychomotor skill, can make use of learned processing requiring little top-down attentional control, supported by peripheral vision. 12 Obviously, looking away from the road scene at sufficient visual eccentricity 13 will make even brake light onset in a lead vehicle ineffective as a cue (for the following vehicle) to brake. Furthermore, perception of closing rates and some other changes in the forward road scene (e.g., sudden lane encroachment by another vehicle) may not be perceived without foveal vision. However, the second dimension of driver performance is independent of eyes-off-road time. It largely represents object and event detection performance, independent of driver workload. Some form of PDT-like test is necessary to probe and measure tasks that score highly on this dimension of driver performance (Young and Angell, 2003; Young, 2012b). A new ISO standard in preparation details several variants of the PDT test, now called the "Detection-Response Task" or DRT (ISO NP 17488, in preparation; see also Hsieh et al., 2012; Young et al., 2012d).

Assuming that a laboratory test has been validated with road and track data, meaning that a task can be predicted with that laboratory test (with a reasonable degree of certainty) to meet or not meet criteria during actual rather than simulated driving, the problem with excessive "false positives" in laboratory-to-road predictions of experimental data will largely disappear. If both driver workload and event detection metrics are measured, then the problem with false negatives (failure to measure the event detection properties of the task) will also largely disappear, at least for laboratory-to-road experimental predictions. There may still be difficulties in generalizing to real-world relative or absolute crash risk, as described next.

11 However, mean single glance duration or maximum single glance duration is strongly associated with object and event detection; see Young (2012b).

12 On the other hand, lane-keeping metrics may be more sensitive to manual load than are visual metrics.

13 Visual eccentricity is defined as "the visual angle, relative to the center of the fovea, at which a certain visual stimulus impinges on the retina" (ISO NP 17488, in preparation).


5.1 Generalizing to Real-World Crashes

Ultimately, all driver distraction metrics and methods should be validated against real-world naturalistic driver behavior and real-world crash experience. Assuming that neither the laboratory nor the on-road experimental test protocol has been validated with real-world crash or naturalistic driving data, the question arises: can anything at all be concluded from experimental results, or must one wait for the ultimate validation of the test procedure with real-world crash data?

I suggest that if a given task meets experimental test criteria for both dimensions of driver performance (driver workload and event detection), then ten years of experience with the Alliance (2006) Guidelines indicates it is plausible that the task will be safe to perform in real-world driving. There is as yet no known task that was tested as "safe" in a validated laboratory or closed-road experimental test by automakers subscribing to the Alliance (2006) Guidelines, but that real-world driving data (crash investigations or naturalistic driving data) found to be unsafe after release of the device to the public. Of course, if the experimental test procedures are not comprehensive enough to cover the primary dimensions of driver performance (Young, 2012b), then false negatives will likely occur (see Attachment 2).

The converse is not necessarily true, however. Just because a task does not meet criteria in an experimental test does not mean it will necessarily lead to real-world crashes, unless there has been some real-world validation of the test procedure. The reasons have to do with driver choice, and the differences between a real-world context and an experimental context. One good example is cellphone conversation, which consistently shows an increase in response time to a visual event of 150-200 msec in both simulator and closed-road experimental studies (Horrey and Wickens, 2006). However, cellphone conversation does not increase crash risk beyond that of normal driving, as shown in real-world (Young and Schreiner, 2009) and naturalistic studies (Klauer et al., 2006; Olson et al., 2009; Hickman et al., 2010). Studies claiming that cellphone conversation increases crash risk to four times that of baseline driving (Redelmeier and Tibshirani, 1997; McEvoy et al., 2005) contain a substantial bias arising from part-time driving in control windows (Young, 2012c). When that bias is adjusted for, the risk of cellphone conversation is near that of baseline driving (Young, 2012c).

5.2 Conclusions

• NHTSA should monitor adoption of these NHTSA Distraction Guidelines to evaluate their effectiveness by testing systems itself, or by contracting with external entities to do so.

• NHTSA should make public the results of that monitoring by public posting of test results, along with other safety ratings such as NCAP. 14

14 Note, however, as mentioned elsewhere in these comments and attachments, that a vocal minority of the auto-buying public may deliberately purchase vehicles with low driver distraction safety ratings in order to avoid lockouts, defeating the purpose of the rating system at least for those customers.


6 References

Alliance of Automobile Manufacturers Driver Focus Telematics Working Group, "Statement of Principles, Criteria and Verification Procedures on Driver-Interactions with Advanced in-Vehicle Information and Communication Systems, June 26, 2006 Version," Alliance of Automobile Manufacturers, Washington, DC, 2006, http://autoalliance.org/files/DriverFocus.pdf.

Angell, L.S., Young, R.A., Hankey, J.M., and Dingus, T.A., "An Evaluation of Alternative Methods for Assessing Driver Workload in the Early Development of in-Vehicle Information Systems," Paper presented at: Society of Automotive Engineers Government/Industry Meeting, Washington, DC, May 2002, http://www.sae.org/technical/papers/2002-01-1981.

Angell, L., Auflick, J., Austria, P., Biever, W., Diptiman, T., Hogsett, J., Kiger, S., Kochhar, D., and Tijerina, L., "Driver Workload Metrics Project, Task 2 Final Report," National Highway Traffic Safety Administration, 2006a, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/Driver%20Workload%20Metrics%20Final%20Report.pdf.

Angell, L., Auflick, J., Austria, P., Biever, W., Diptiman, T., Hogsett, J., Kiger, S., Kochhar, D., and Tijerina, L., "Driver Workload Metrics Project, Task 2 Final Report, Appendices," National Highway Traffic Safety Administration, 2006b.

Angell, L., "Conceptualizing Effects of Secondary Task Demands on Event Detection during Driving: Surrogate Methods & Issues," Paper presented at: The Driver Performance Metrics Workshop, San Antonio, Texas, 2008, http://drivingassessment.uiowa.edu/drivingmetrics/P_Conceptualizing%20Event%20Response%20Linda.pdf.

Angell, L., "Conceptualizing Effects of Secondary Task Demands During Driving: Surrogate Methods and Issues," Chap. 3 in Performance Metrics for Assessing Driver Distraction: The Quest for Improved Road Safety, edited by Gary L. Rupp, 42-72, Warrendale, PA, USA: SAE International, 2010.

Angell, L., Cook, J., and Perez, M., "Support for NHTSA's Development of Guidelines on Distraction-Potential from Visual-Manual Interfaces: An Examination of the Definition of 'Task' and Task Taxonomies Based on Interviews with Experts," National Highway Traffic Safety Administration, Washington, D.C., 2012.

Funkhouser, D., and Sayer, J., "A Naturalistic Cellphone Use Census," Transportation Research Board 2012 Annual Meeting, Washington, DC, Paper 12-4104, http://amonline.trb.org/1soumj/1, accessed March 13, 2012.

Hickman, J., Hanowski, R.J., and Bocanegra, J., "Distraction in Commercial Trucks and Buses: Assessing Prevalence and Risk in Conjunction with Crashes and Near-Crashes," Report No. FMCSA-RRR-10-049, Washington, DC: US Department of Transportation, Federal Motor Carrier Safety Administration, 2010.

Horrey, W.J., and Wickens, C.D., "Examining the Impact of Cellphone Conversations on Driving Using Meta-Analytic Techniques," Human Factors 48(1):196-205, 2006, doi:10.1518/001872006776412135.

Hsieh, L., Young, R., and Seaman, S., "Development of the Enhanced Peripheral Detection Task: A Surrogate Test for Driver Distraction," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1), 2012, doi:10.4271/2012-01-0965.

ISO NP 17488, "Detection-Response Task for Assessing Selective Attention in Driving," Road Vehicles - Transport Information and Control Systems - Man-Machine Interface, TC13, WG8 (in preparation).


Klauer, S.G., Dingus, T.A., Neale, V.L., Sudweeks, J.D., and Ramsey, D.J., "The Impact of Driver Inattention on Near-Crash/Crash Risk: An Analysis Using the 100-Car Naturalistic Driving Study Data," Report No. DOT HS 810 594, Washington, DC: National Highway Traffic Safety Administration, 2006.

McEvoy, S.P., Stevenson, M.R., McCartt, A.T., et al., "Role of Mobile Phones in Motor Vehicle Crashes Resulting in Hospital Attendance: A Case-Crossover Study," BMJ 331:428-430, 2005, doi:10.1136/bmj.38537.397512.55.

NHTSA, "Visual-Manual NHTSA Driver Distraction Guidelines for in-Vehicle Electronic Devices," Docket No. NHTSA-2010-0053, 77 Federal Register, 2012, https://federalregister.gov/a/2012-6266.

Olson, R.L., Hanowski, R.J., Hickman, J.S., and Bocanegra, J., "Driver Distraction in Commercial Vehicle Operations," Report No. FMCSA-RRR-09-042, Washington, DC: US Department of Transportation, 2009.

Perez, M., Hulse, M., and Angell, L., "Support for NHTSA Visual-Manual Guidelines: Expert Review of the Visual Occlusion Method and How It Compares to Driver Eye Glance Behavior," National Highway Traffic Safety Administration, 2012.

Ranney, T.A., Baldwin, G.H.S., Vasko, S.M., and Mazzae, E.N., "Measuring Distraction Potential of Operating in-Vehicle Devices," DOT HS 811 231, 2009.

Ranney, T.A., Baldwin, G.H.S., Parmer, E., Domeyer, J., Martin, J., and Mazzae, E.N., "Developing a Test to Measure Distraction Potential of in-Vehicle Information System Tasks in Production Vehicles," DOT HS 811 463, 2011a.

Ranney, T.A., Baldwin, G.H.S., Parmer, E., Martin, J., and Mazzae, E.N., "Distraction Effects of Manual Number and Text Entry While Driving," DOT HS 811 510, 2011b.

Ranney, T.A., Baldwin, G.H.S., Parmer, E., Martin, J., and Mazzae, E.N., "Distraction Effects of in-Vehicle Tasks Requiring Number and Text Entry Using Auto Alliance's Principle 2.1b Verification Procedure," NHTSA Technical Report Number DOT HS 811 571, 2011c.

Redelmeier, D., and Tibshirani, R., "Association between Cellular-Telephone Calls and Motor Vehicle Collisions," N Engl J Med 336:453-458, 1997, http://www.medicine.mcgill.ca/epidemiology/hanley/temp/practicum/redelmeierAllPlusLetter.pdf, accessed March 19, 2012.

Treat, J.R., "A Study of Precrash Factors Involved in Traffic Accidents," 0146-8545, 10(6) May-June, 11(1) July-Aug, University of Michigan Highway Safety Research Institute, Ann Arbor, MI, USA, 1980, http://psycnet.apa.org/psycinfo/1981-13775-001.

Young, R.A., and Angell, L.S., "The Dimensions of Driver Performance During Secondary Manual Tasks," Driving Assessment 2003: Second International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Park City, Utah, July 2003, http://drivingassessment.uiowa.edu/DA2003/pdf/25_Youngformat.pdf.

Young, R.A., Aryal, B., Muresan, M., Ding, X., Oja, S., and Simpson, S.N., "Road-to-Lab: Validation of the Static Load Test for Predicting on-Road Driving Performance While Using Advanced Information and Communication Devices," Proceedings of the Third International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Rockport, Maine, June 2005, http://drivingassessment.uiowa.edu/DA2009/037_YoungAngell.pdf.

Young, R.A., and Schreiner, C., "Real-World Personal Conversations Using a Hands-Free Embedded Wireless Device While Driving: Effect on Airbag Deployment Crash Rates," Risk Analysis 29:187-204, 2009.


Young, R.A., Angell, L., Sullivan, J.M., Seaman, S., and Hsieh, L., "Validation of the Static Load Test for Event Detection During Hands-Free Conversation," Proceedings of the Fifth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design 5:268-275, 2009, http://drivingassessment.uiowa.edu/DA2009/037_YoungAngell.pdf.

Young, R.A., "Cognitive Distraction While Driving: A Critical Review of Definitions and Prevalence in Crashes," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1), 2012a, doi:10.4271/2012-01-0967.

Young, R.A., "Event Detection: The Second Dimension of Driver Performance for Visual-Manual Tasks," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1), 2012b, doi:10.4271/2012-01-0964.

Young, R.A., "Cellphone Use and Crash Risk: Evidence for Positive Bias," Epidemiology 23(1):116-118, 2012c, eAppendix available at http://links.lww.com/EDE/A535, accessed February 13, 2012.

Young, R., Seaman, S., and Hsieh, L., "Measuring Cognitive Distraction on the Road and in the Lab with the Wayne State Detection Response Task," Transportation Research Board 2012 Annual Meeting, Washington, D.C., 2012d.


Attachment 2: Possible False Negative Errors in NHTSA Driver Distraction Tests

Richard A. Young, Ph.D.
Research Professor
Dept. of Psychiatry and Behavioral Neurosciences
Wayne State University School of Medicine
Detroit, MI USA
May 18, 2012
[email protected]

Attachment 2 to "Comment on: Visual-Manual National Highway Traffic Safety Administration Driver Distraction Guidelines: In-Vehicle Electronic Devices (Docket No. NHTSA-2010-0053-0009)."

Abstract

The proposed glance metrics in the draft NHTSA (2012) Guidelines are given a preliminary test of the extent to which they permit false negative errors (errors that might lead to allowing unsafe tasks to be performed on the road while driving). Most of the proposed tests for visual-manual distraction do not address the attention dimension as it relates to detection of, and response to, on-road events. Furthermore, the radio tuning task used as a reference task exhibits long single glance durations that may contribute to crash causation through "attention capture." The NHTSA (2012) Guidelines test called DFD-FC (Dynamic Following and Detection Protocol with Fixed Acceptance Criteria) therefore appears at first glance to be the only one of the seven NHTSA (2012) tests that is unlikely to give rise to a great many "false negatives": tasks that meet criteria in a laboratory test, but would not meet criteria in a track or road test. However, a closer examination of the protocols in the DFD-FC test reveals a number of weaknesses in that test as well. The fixed criteria for that test set by NHTSA may need to be re-evaluated, at least for the event detection metrics, because the DFD-FC test relies on the radio tuning task for its reference criteria.

1 Introduction

There are three NHTSA (2012) glance criteria that must be met in three of their seven recommended tests:


EGDS (Eye Glance Testing Using a Driving Simulator), DFD-BM (Dynamic Following and Detection Protocol with Benchmark), and DFD-FC (Dynamic Following and Detection Protocol with Fixed Acceptance Criteria) (see NHTSA, 2012, their Table 10). These three criteria are:

1.1 Limiting Mean Duration of Single Glances

The NHTSA (2012) criterion for mean duration of single glances is similar to that of the Alliance (2006).

NHTSA. For at least 21 of the 24 test participants, the mean duration of all individual eye glances away from the forward road scene should be less than 2.0 seconds while performing the secondary task (NHTSA, 2012, p. 11229).

Alliance. A task will be considered to meet criterion [for single glance durations] if the mean of the average glance durations to perform a task is ≤ 2.0 sec for 85% of the test sample (Alliance, 2006, p. 53).

1.2 Limiting Total Eyes-off-Road Time

The NHTSA (2012) criterion for eyes-off-road time is stricter than that of the Alliance (2006). The total eyes-off-road time criterion is tightened from 20 seconds to 12 seconds, and is also measured as total eyes-off-road time (i.e., anywhere off the road), rather than total glance time to the device. Glances off the road to, for example, the mirrors or speedometer during a task's performance will therefore count against the task in the NHTSA (2012) method.

NHTSA. For at least 21 of the 24 test participants, the sum of the durations of each individual participant's eye glances away from the forward road scene should be less than, or equal to, 12.0 seconds while performing the secondary task one time (NHTSA, 2012, p. 11230).

Alliance. A task will be considered to meet criterion [for total glance time] if the mean total glance time to perform a task is ≤ 20 sec for 85% of the sample of test participants (Alliance, 2006, p. 53).

1.3 Limiting Percentage of Long Single Glances

This is a new criterion, not present in the Alliance (2006) Guidelines. It is intended to reduce the prevalence of long single glances in the "tail" of a glance duration histogram.

NHTSA. For at least 21 of the 24 test participants, no more than 15 percent (rounded up) of the total number of eye glances away from the forward road scene should have durations of greater than 2.0 seconds while performing the secondary task (NHTSA, 2012, p. 11230).

1.4 Occluded Goggles Total Shutter Open Time

The NHTSA (2012) criterion is reduced to 9 seconds of total shutter open time (TSOT), required for at least 85% of test participants. It is stricter than the Alliance (2006) criterion of 15 seconds TSOT.

NHTSA. OCC.6.a For at least twenty-one of the twenty-four test participants, the task was successfully completed during six or fewer viewing intervals (i.e., a maximum of 9.0 seconds of shutter open time) (NHTSA, 2012, p. 11243).


Alliance. If a task can be successfully completed with total shutter open time < 15 sec (with reasonable statistical confidence), the task would be considered to meet both criteria [for single glance durations] and [total glance time] (Alliance, 2006, p. 52).
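The shutter-open arithmetic implied by the two quoted criteria is worth making explicit. The 1.5-second viewing interval follows from the figures quoted above (9.0 s over six intervals); expressing the Alliance limit in the same intervals is an assumption for comparison only.

```latex
\[
\mathrm{TSOT}_{\text{NHTSA}} = 6 \times 1.5\,\mathrm{s} = 9.0\,\mathrm{s}
\qquad\text{vs.}\qquad
\mathrm{TSOT}_{\text{Alliance}} = 15\,\mathrm{s} = 10 \times 1.5\,\mathrm{s}
\]
```

That is, the proposed criterion allows only 60% of the shutter-open time permitted under the Alliance criterion.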

2 Concern: Five of the NHTSA (2012) Tests for Assessing Visual-Manual Distraction Do Not Address the Attention Dimension

The portion of distraction due to attentional phenomena associated with visual-manual interactions while driving is not addressed by five of the NHTSA (2012) test protocols proposed for visual-manual tasks. Two of these are the recommended test protocols for visual-manual tasks: EGDS (Eye Glance Testing Using a Driving Simulator) and OCC (Occlusion Testing). In addition, the test protocols STEP (Step Counting), DS-BM (Driving Test Protocol with Benchmark), and DS-FC (Driving Test Protocol with Fixed Acceptance Criteria) contain no test for object and event detection. Therefore, visual-manual tasks can meet criteria on all five of these tests and still have poor event detection and response.

This lack of a test for event detection is a matter for concern, because about one third of performance decrements during visual-manual tasks are due to attentional phenomena, above and beyond eyes-off-road time or other driver workload variance. These attentional phenomena are not addressed by these five tests because they do not test for object and event detection. An object and event detection test is included only in the two test procedures DFD-BM (Dynamic Following and Detection Protocol with Benchmark) and DFD-FC (Dynamic Following and Detection Protocol with Fixed Acceptance Criteria). These are not the NHTSA (2012) recommended tests for visual-manual tasks; NHTSA indicates that the tests it favors for measuring distraction from visual-manual tasks are the EGDS and OCC tests. Thus, the EGDS and OCC recommended tests in the NHTSA (2012) Guidelines provide only a part of the protection that they are intended to provide for visual-manual tasks.

2.1 The New Glance Criterion Restricting Long Single Glances Does Not Sufficiently Address Attentional Issues in Distraction

Recent studies suggest that long single glances make an independent contribution to crash risk, above and beyond that of eyes-off-road time (Horrey and Wickens, 2007; Victor, 2011; Victor and Dozza, 2011; Young, 2012b; Liang et al., 2009, in press; Victor et al., in preparation). Long single glances may reflect an underlying attentional process in attention shifts. These analyses indicate that the mechanistic aspect of eyes off the road is not the sole problem in missed events or crash causation. The attentional processes underlying long single glances play an independent role in event detection, and probably in crash causation as well. It is therefore important to ensure that long single glances are adequately covered by the criteria in the NHTSA (2012) Guidelines.


The draft NHTSA (2012) Guidelines have attempted an important advancement over the Alliance Guidelines in this regard, by adding a third glance criterion intended to limit long glances, listed in Section 1.3 above as "Limiting Percentage of Long Single Glances." Unfortunately, a question remains about whether the NHTSA proposed method and criterion are, by themselves, adequate to limit long single glances.

2.1.1 Hypothetical Data: The NHTSA (2012) Glance Criteria Still Permit a Long Single Glance

If the criteria above are applied to hypothetical data, it becomes apparent that, in theory, tasks with 7 to 10 glances averaging 1 sec each could have one single glance as long as 3-6 sec and still meet all three NHTSA glance criteria (Table 1). For example, a long glance (3-6 seconds) could occur at any one of the glance locations up through the last red location indicated in rows 7 through 10 in Table 1 (shown in red).

Table 1. Hypothetical data demonstrating that the three NHTSA (2012) glance criteria still permit a long single glance of 3 to 6 seconds.
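The arithmetic behind Table 1 can be checked directly. The following minimal sketch applies the three glance criteria quoted in Sections 1.1-1.3 to a single participant's glance record; the glance durations are the hypothetical values from Table 1, and the full protocol applies each criterion across 24 participants.

```python
import math

# Minimal sketch of the three NHTSA (2012) glance criteria applied to one
# participant's glance record. Durations are hypothetical (from Table 1).

def meets_nhtsa_glance_criteria(glances_s):
    """glances_s: durations (s) of individual off-road glances during one
    task performance. Returns pass/fail for each of the three criteria for
    this one record; the protocol requires 21 of 24 participants to pass."""
    n = len(glances_s)
    mean_ok = sum(glances_s) / n < 2.0               # mean single glance < 2.0 s
    total_ok = sum(glances_s) <= 12.0                # total eyes-off-road <= 12.0 s
    n_long = sum(1 for g in glances_s if g > 2.0)    # glances longer than 2.0 s
    long_ok = n_long <= math.ceil(0.15 * n)          # <= 15% long (rounded up)
    return mean_ok, total_ok, long_ok

# Ten glances averaging 1.2 s, one of them 3 s long: all three criteria met.
print(meets_nhtsa_glance_criteria([1.0] * 9 + [3.0]))   # (True, True, True)
# Seven glances, one lasting a full 6 s: still meets all three criteria.
print(meets_nhtsa_glance_criteria([1.0] * 6 + [6.0]))   # (True, True, True)
```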

2.1.2 Real Data: Radio Tuning Tasks Have Long Single Glance Durations

Fig. 1 shows that radio tuning has a long tail in its distribution of eye glance durations, with glances greater than 2 seconds (Rockwell, 1988; also quoted in Alliance, 2006, p. 57).


Figure 1. Distribution of eye glance durations when manually tuning a radio (Rockwell, 1988).

2.1.3 Real Data: NHTSA's (2012) Data Show That Radio Tuning Tasks Have a Long Glance Duration

Table 2. Data from NHTSA (2012, their Table 7, p. 11228; as revised April 3, 2012 by Maddox, 2012) from the VTTI Smart Road demonstrate that for the radio tuning tasks of all nine radios tested, there is a long single glance ranging from 2.1 to 3.2 seconds (red box), as predicted by the analysis in Table 1 and the on-road data in Fig. 1.


2.2 Why Are Long Single Glances a Concern?

Long single glances may reflect attention capture: a prolonged engagement of attention at an in-vehicle location. When there is no subjective cue or external cue to interrupt attention to a secondary task, a glance to the task can linger if processing is not complete (see Altmann and Trafton, 2002; Lee et al., 2012). This effect can occur in short tasks, for example, where there is no subjective feeling of heavy workload such as is typically associated with relatively long visual-manual tasks (Young and Angell, 2003; Fig. 2A below). Hence drivers can maintain a long single glance without being aware of it during relatively short, low-workload tasks. These long single glances are associated with poor event detection and response, even more so than eyes-off-road time or other driver workload metrics.

2.3 "Attention Capture" Is Not Itself Caught by Driver Workload Variables

Event detection explains about one-third of the variance in driver performance, orthogonal to the variance in driver workload metrics, including eyes-off-road time (EORT), number of glances, lane keeping, speed maintenance, headway, and other conventional driver workload metrics (Young and Angell, 2003; Young, 2012b; also see Angell, 2007, 2010). Fig. 2 shows that "mean single glance duration" (glncedur or GlnceDur) is associated with the attentional processes underlying the event detection response time and miss rate variables (red box) during visual-manual

tasks, but is not associated with EORT, subjective workload, task time, lane deviations, speed deviations, subjective situation awareness, or any other driver workload variable measured (black box). That is, the longitudinal and lateral movements of the vehicle do not capture the key attentional component associated with event detection and response, because it is a separate dimension.
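For readers unfamiliar with how loadings such as those shown in Fig. 2 are obtained, the following minimal Python sketch runs a principal-components analysis on a task-by-metric matrix. The data here are random placeholders, not the CAMP-DWM or Smart Road values; only the procedure (standardize each metric, decompose, read off the loadings on the first two dimensions) is the point.

    # Hedged sketch: principal-components loadings for a task-by-metric
    # matrix (random placeholder data, not the published values).
    import numpy as np

    rng = np.random.default_rng(0)
    metrics = ["EORT", "n_glances", "task_time", "glance_dur", "miss_rate", "RT"]
    X = rng.normal(size=(77, len(metrics)))        # 77 tasks x 6 metrics (placeholder)

    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each metric
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:2].T * (s[:2] / np.sqrt(len(Z) - 1))  # loadings on P1 and P2

    for name, (p1, p2) in zip(metrics, loadings):
        print(f"{name:10s}  P1={p1:+.2f}  P2={p2:+.2f}")
    # In the real data, workload metrics (EORT, number of glances, task time,
    # lane and speed deviations) load on P1, while the event detection miss
    # rate and RT, and mean single glance duration, load on the orthogonal P2.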

Figure 2. Loadings of driver performance variables on the first two dimensions of driver performance. A (left). Principal components for 77 visual-manual tasks in five vehicles, on VTTI Smart Road (Young and Angell, 2003). B (right). Principal components for CAMP-DWM data (Angell et al., 2006a,b) for 13 visual-manual tasks on the track venue (Young, 2012b). The distinct groupings of the metrics in the two dimensions are quite similar.

NHTSA Comment Attachment 2

p. 7 of 32

Young

Figure 3. A (left). Of 13 visual-manual tasks in the CAMP-DWM track data, with the dimensions shown in Fig. 2B, 5 tasks (38%, red bars) have low workload (small EORT, low TSOT, few glances) but poor event detection and long single glance durations (Young, 2012b). B (right). Two-dimensional plot of the task scores on dimension P1 (horizontal axis) and dimension P2 (vertical axis). The HVAC, radio tuning, MapHard, and RouteTrace tasks have relatively low workload scores (P1 dimension) but poor event detection (positive score on P2). The RadioHard task is similar to the Alliance (2006) and NHTSA (2012) radio tuning benchmark task.

Table 2 shows that for the CAMP-DWM data (Angell et al., 2006b), the RadioHard, RadioEasy, and HVAC tasks meet the NHTSA (2012) Occlusion TSOT criterion in the laboratory (first column of numbers). These tasks also meet the NHTSA (2012) glance criteria for 85th percentile single glance duration and 85th percentile EORT on the road. They would thus meet the criteria for both of the NHTSA (2012) recommended tests, OCC and EGDS. Yet these tasks have a mean number of glances in the 7-10 glance caution range per the hypothetical model in Table 1 (second column of numbers). Indeed, they all had a maximum single glance duration of more than 2 seconds (fifth column of numbers). The maximum single glance duration shown here is not even the longest single glance experienced by any subject in the experiment; it is the longest average single glance duration across the subjects. The longest maximum glance would be even longer than shown here. Tasks causing a long maximum average single glance duration in a subject were in turn associated with poor event detection (high miss rate and long RT to the CHMSL event) relative to other visual-manual tasks. None of these tasks meets the NHTSA (2012) miss rate and response time (RT) criteria for Test DFD-FC. The CAMP-DWM RadioHard task [a close equivalent to the NHTSA (2012) and Alliance (2006) benchmark radio task] has a CHMSL miss rate of 33%, meaning that when a participant was performing this task on the road, they missed about 1/3 of the brake light activations of the forward vehicle while on the open highway in the CAMP-DWM study (Angell et al., 2006a). Missing event lights while driving on an open highway in an experimental setting is no guarantee that the same thing would happen in real-world, non-experimental driving. Nonetheless, recent data from real-world driving suggest that such long single glances may be associated with real-world crashes and near-crashes, as described in the next section.

Table 2 (next page). CAMP-DWM (Angell et al., 2006b) data demonstrating that the RadioHard, RadioEasy, and HVAC tasks on the road meet the NHTSA (2012) mean single glance and EORT test criteria, as well as the occlusion test criterion in the lab. Yet these tasks still show a long (maximum) single glance of more than 2 seconds on the road, as well as poor event detection (high miss rates and long RTs) for the CHMSL forward vehicle brake light activation.


2.4 Long Single Glances Away From the Road May Be Associated with Crashes

Recent analyses (Victor and Dozza, 2011; Liang et al., in review) suggest an increased prevalence of long single glances immediately preceding a precipitating factor 1 (such as a forward vehicle's brake lights coming on) just before a crash into the rear end of the forward vehicle (Fig. 4). Victor and Dozza (2011) imply that the long glance overlapping the precipitating factor is the proximate cause of the crash. However, the time between the precipitating factor and the crash is not specified. If that time is short (e.g., less than a second), the analysis by Victor and Dozza (2011) has not strictly ruled out that a reflex aversion of the eyes to the impending crash may be what they are seeing. Such reflex aversions (flinching) have occasionally been observed in crash videos from the commercial vehicle naturalistic studies, although they are not common (Rich Hanowski, personal communication).

1 "Precipitating Factor (GES Variable V26, Critical Event). The driver behavior or state of the environment that begins the event and the subsequent sequence of actions that result in a crash, near-crash, or incident, independent of who caused the event (driver at fault). The precipitating factor occurs outside the vehicle and does not include driver distraction, drowsiness, or disciplining child while driving." (Klauer et al., 2006, p. 157)


Figure 4. Victor and Dozza (2011) re-analysis of 100-car rear-end crash data showing that a long single glance in the last second precedes a rear-end crash, suggesting that it is the proximate cause of the crash.

Crashes can occur without a precipitating factor (e.g., a single vehicle simply drifts off the road and into a ditch), although this is not always recognized. Typically, however, if these long single glances immediately precede (or co-occur with) a conflict or some other precipitating event on the roadway, the relative risk of a crash increases compared to baseline driving. Fig. 5 shows that long single glances increase crash odds ratios when they co-occur with a conflict (Victor, 2011). In particular, Fig. 5 shows that single glances > 1.0 sec occurring just before the precipitating event (just before a rear-end crash) are associated with a 2- to 4-fold increase in the crash odds ratio compared to shorter glances, an effect size larger than that predicted by eyes-off-road time (EORT) (see Fig. 5) (Victor, 2011).


Figure 5. Re-analysis of 100-car data (Victor, 2011) showing that long glances and the last single glance time (red oval) before a crash have a stronger effect on the odds than does eyes-off-road time (total glance time) or “glance history.”

2.5 Caveats

It is important to point out that these analyses by Victor and Dozza (2011) and Victor (2011) are all retrospective (working backward from crash outcomes), rather than prospective (working forward from long single glances away from the road that exceed a given duration). The baseline data in Fig. 5 (blue bars) suggest that long single glances were not as common in the baseline data examined as they were in the second immediately before a rear-end crash. However, the prevalence of long single glances away from the road in baseline driving is not yet fully known, and only some of those long glances may be associated with secondary tasks. Even fewer long glances may co-occur with conflicts that increase the probability of a crash. It is also possible that the glance away from the road just before a crash is a protective reflex as the driver sees they are about to crash, and is thus not a proximate cause of the crash at all but rather a reflex response to the imminent crash; the effect seen in Figs. 4 and 5 would then be an artifact. Nonetheless, these preliminary findings by Victor and Dozza (2011), Victor (2011), and Liang et al. (in press) may be important in underscoring the potential role of both long glances and attention shifts as risk factors in crashes, and should be investigated further.

3 Discussion

The top two NHTSA (2012) recommended test procedures (EGDS, OCC) do not include a test for event detection and response, and also do not sufficiently limit long single glances (see Tables 1 and 2). Hence, the NHTSA (2012) recommended tests do not address the attentional component of driver distraction. They allow visual-manual tasks with potentially poor event

detection performance to meet the NHTSA (2012) criteria (false negatives). Indeed, a number of common, relatively low workload tasks can "escape" the filter of the three eye glance criteria in the NHTSA (2012) recommended EGDS and OCC tests, as shown by the CAMP-DWM data and NHTSA's own data (Ranney et al., 2012).

3.1 Recommendation

The glance criteria put forward by NHTSA (2012) are a necessary and important step for controlling driver distraction from high workload tasks with long eyes-off-road time, but there is a need to do more. Simply tightening the single glance duration limit below the 15% criterion is not recommended, because it does not address the underlying problem of the attentional shifts that give rise to long single glance durations. Instead, it is recommended that an additional event detection and response test (above and beyond glance measures) be required, to evaluate the effect that a device or task has on the underlying attentional processes that contribute to controlling long single glances. The only NHTSA (2012) tests which provide that are DFD-BM (Dynamic Following and Detection with Benchmark) and DFD-FC (Dynamic Following and Detection with Fixed Criteria). The DFD-BM test uses the radio tuning task as a benchmark, and that task already tends to contain a long duration glance (Table 2), making that benchmark (and test) unsuitable for this purpose. Therefore, after a quick initial evaluation, the DFD-FC test seems to be the only suitable test for final validation of visual-manual tasks. However, closer examination finds that this test too has some serious drawbacks, as detailed in later sections below. In any case, it is also recommended that a simple PDT-like test be added to the test suite, compatible with ISO NP 17488 (in preparation), to allow for evaluation of event detection and response by automakers. 2 These PDT-like tests can be combined with eye-movement analysis to form a complete test suite, without a full-blown simulator, that has predictive validity (a correlation of 0.9 with on-road driver performance of visual-manual tasks) for the full set of eye glance and event detection and response metrics being considered by NHTSA (e.g. Angell et al., 2002; Young and Angell, 2003; Young et al., 2005; Young et al., 2009; Hsieh et al., 2012; Young, 2012b,c).
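As an illustration of what a PDT-like test measures, the sketch below scores miss rate and mean response time from stimulus and button-press timestamps. The 0.1 to 2.5 sec response window is an assumed parameter in the spirit of the draft ISO DRT method, not a quotation from ISO NP 17488.

    # Hedged sketch: scoring a PDT/DRT-like detection test. The response
    # window is an assumed value, not quoted from the draft ISO standard.
    def score_drt(stimulus_times, response_times, window=(0.1, 2.5)):
        hits, rts = 0, []
        responses = sorted(response_times)
        for t in stimulus_times:
            # the first response falling inside the window counts as the hit
            match = next((r for r in responses if window[0] <= r - t <= window[1]), None)
            if match is not None:
                hits += 1
                rts.append(match - t)
                responses.remove(match)  # a response answers only one stimulus
        miss_rate = 1 - hits / len(stimulus_times)
        mean_rt = sum(rts) / len(rts) if rts else float("nan")
        return miss_rate, mean_rt

    # Example: 5 stimuli; the third is missed (no response within the window).
    miss, rt = score_drt([10.0, 14.2, 18.9, 23.1, 27.4],
                         [10.45, 14.8, 23.6, 28.05])
    print(f"miss rate = {miss:.0%}, mean RT = {rt * 1000:.0f} ms")  # 20%, 550 ms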

3.2 Radio Tuning Alternatives and DFD-FC Test Criteria

The three glance criteria for the DFD-FC test, while fixed, are nonetheless based on the radio tuning task, and may be associated with unacceptable attentional performance, because the radio tuning task may itself contain a long single glance (and still meet the 3 NHTSA glance criteria). Therefore, the rationale for the use of the radio tuning task in establishing the fixed glance

2 If a simple PDT-like test is added to the Guidelines, the language should perhaps employ the terminology now being used within the International Organization for Standardization (ISO), where "PDT" is referred to as "DRT" (for Detection Response Task) and a standard method is under development (ISO NP 17488, in preparation; see also Hsieh et al., 2012; Young et al., 2012c).


criteria for the DFD-FC test is in question. How can those values be valid if they are based on a task that reflects unacceptable attentional performance? (It should be noted that when the radio tuning task was selected for use as a reference task by the Alliance, it was before the finding that there is an attentional element to driver performance for visual-manual tasks that goes beyond what is reflected in eyes-off-road time or mean single glance duration metrics; see Sections 2.2 and 2.3.) Now, however, it would be preferable (and more logical) to base the visual demand criteria on a reference task that reflects acceptable visual and attentional performance. (And if no such reference task can be found, then perhaps it would be better to base the criteria on an alternative approach altogether.) In short, the basis for the fixed criteria recommended in the Guidelines for visual demand, and for the DFD-FC test, may need re-examination.

One alternative is for NHTSA to provide a different set of criteria, or a new rationale for establishing such a set (perhaps requiring the collection of new data), given that the basis for the current criteria set has been questioned by the arguments made above. However, this solution might delay the adoption of the NHTSA Guidelines until the new criteria set could be developed, tested, and validated. One way to define this new criterion set or benchmark task is to use naturalistic study data, looking across such studies to obtain enough crash data to be reliable; for example, combining naturalistic findings from the 100-car study and the heavy truck studies sponsored by VTTI, and the teen driving studies sponsored by the University of Iowa. This method would allow an alternative approach based on real-world naturalistic driving data. Data from SHRP 2 could then be used to refine this methodology in the years to come.

A second alternative is to consider the radio tuning task as acceptable, but only for setting one of the three NHTSA glance criteria: EORT. That is the only glance variable that is clearly related to dimension 1 of driver performance 3 (see Fig. 2). Radio tuning would also be suitable for those other metrics that are related solely to the first driver performance dimension, which includes longitudinal and lateral vehicle control measures, task time, etc. (see Fig. 2, variables in the lower right of each graph). An additional benchmark task (besides radio tuning) would then be needed to set the criteria for mean single glance duration and the event detection and response variables (Fig. 2, variables in the upper left of each graph). Event detection and response performance (as measured, for example, by a PDT-like test) should then be required in combination with the glance measures. In the CAMP-DWM track and open road data analyzed by Young (2012b), and in the 79 visual-manual tasks on the Smart Road analyzed by Young and Angell (2003), lateral and longitudinal control metrics were highly correlated with eyes-off-road time and number of glances, so only one dimension (a single parameter) is needed to capture all the variance in those variables. In the supporting data provided by NHTSA (2012), it is not as clear that the lateral and longitudinal control metrics are as closely associated with EORT and number of glances.

3 I here use the term "driver" performance to include attentional mechanisms of the driver, as well as "driving" performance, which is simply keeping the vehicle in the lane and maintaining speed. A driver can have good driving performance and poor driver performance, because they can still miss events and crash even while staying in their lane and maintaining speed. That is why two dimensions are needed to describe driver performance, and only one to describe driving performance.


Correlational analyses, which have not been reported in the published NHTSA studies to date supporting the NHTSA Guidelines, would be required to determine that. Regardless of what other measures are used, event detection performance should be measured, and event detection criteria met, for any visual-manual task before it can be considered unlockable when the vehicle is in motion. Event detection performance was not included in the original Alliance (2006) document, but it is one of the ways that NHTSA can improve upon the foundation laid by the Alliance, in light of the new data about event detection found since the Alliance completed its Guidelines in 2006.

Both the Alliance Guidelines and the NHTSA Guidelines rely almost completely upon radio tuning as the benchmark for setting all criteria. Unfortunately, as this attachment shows, radio tuning is not suitable for setting benchmarks for event detection, because it has poor event detection even though it has relatively low driver workload (Young, 2012b; see Table 2 in this Attachment). Some task other than radio tuning needs to be used to set event detection criteria (or to serve as a reference task for event detection). Exactly what that benchmark task should be will require further research, but the analysis of the CAMP-DWM data by Young (2012b), and the analysis by Young and Angell (2003) of 79 visual-manual tasks, reveal a number of candidate tasks which have both low workload and good event detection (radio tuning not being one of them).

The Alliance Guidelines (which also apply solely to visual-manual tasks) contain this same imperfection: they have no criteria for event detection. The Alliance Guidelines, despite this imperfection, have served some useful purposes in the last decade. Those companies that adhere to the Alliance Guidelines, the general public, and NHTSA (representing the public's interest) all have an interest in at least getting the remaining (non-Alliance) vehicle manufacturers to follow such Guidelines, creating a "level playing field." And some would argue that even imperfect Guidelines are a reasonable and prudent interim step, likely to bring the vehicle manufacturers and their suppliers on board. A larger step could then be made a number of years later, when there is a more robust set of naturalistic data (e.g. from the SHRP 2 data set) than we currently have. The trouble with this argument is that the automakers who signed a letter of intent to comply with the Alliance (2006) Guidelines have already had six years' experience with those Guidelines. Yet those Guidelines remain incomplete, missing about one-third of the variance in driver performance, because they lack event detection and response performance criteria that would screen out tasks scoring high on the second dimension of driver performance (Young, 2012b). That is, there is little question that many overly distracting tasks (on the driver workload dimension of driver performance) have been "caught" in the Alliance (2006) Guidelines safety net, and have consequently been locked out or redesigned over the past 6 years or longer. On the other hand, tasks may have been unknowingly "escaping" through the safety net for that same number of years, due to poor object and event detection performance. If

NHTSA simply adopts its NHTSA (2012) Guidelines for visual-manual tasks without object and event detection performance being adequately measured against appropriate criteria during driving, then such tasks will continue to be released for use on the road without lockout or redesign. Now would seem an ideal opportunity for NHTSA to add a PDT-like test as part of any final validation test for visual-manual tasks (as well as for auditory-vocal tasks in its intended future guidelines). This added test may be particularly important because of recent data suggesting that long single glances may be important if they co-occur with a precipitating factor (such as the onset of braking in a lead vehicle) that precedes a crash or near-crash, even more so than total eyes-off-road time or glance history (Liang, 2009; Liang et al., in press; Victor and Dozza, 2011; see Sections 2.4 and 2.5 above). This possible relationship with real-world crashes is examined further in the following sections.

3.3 Radio Tuning and Crash Causation

In an analysis of the in-depth field investigation data from 4,536 crashes in the Crashworthiness Data System (CDS), Wang et al. (1995, their Table 1, p. 6) found that distraction "from adjusting radio, cassette, or CD" was the number 3 distraction crash "causal factor," in 1.2% of drivers (2.1% of crashes), ranking only behind the "looked-but-did-not-see" crash "causal factor" in 5.6% of drivers (9.7% of crashes), and "distracted by outside person, object, or event" in 2% of drivers (3.2% of crashes). 4 Note that the CDS identifies these as "causal factors" on the basis of its crash investigation methods, but the percentage figures indicated are technically simply prevalences, albeit based on the expert judgment of the crash investigators. To establish causation in a scientific sense, these prevalence figures must be compared against prevalence in baseline driving where no crash was involved. For example, there is 100% prevalence of the driver breathing in the seconds just before a crash, but there is also 100% prevalence of the driver breathing in normal baseline driving; therefore, breathing should not be considered a crash causation factor. Hence, prevalence alone is necessary but not sufficient to establish causation. Note also that the prevalence rate of distraction in crashes is necessarily always higher than the prevalence rate of distraction in drivers, because crashes often involve more than one vehicle, and therefore more than one driver. For example, if distraction is involved in 18% of crashes, but there is an average of 2 vehicles (and drivers) per crash, then distraction is involved in only 9% of drivers. Also, the figures reported are likely conservative because of the high number of crashes with "unknown" causation (38.5% of drivers, 46% of crashes). The 1.2% figure represents an upper bound on the prevalence for radio tuning in this crash causation study, because the Wang et al. (1995) figures include all radio adjustments, which would include volume and preset adjustments as well as tuning.

4 The percentage is higher for crashes than for drivers because many crashes involve more than 1 vehicle, and therefore more than 1 driver; see the discussion later in this section.


The NHTSA (2008) National Motor Vehicle Crash Causation Survey was a nationally representative sample of 5,471 crashes investigated during a 2½-year period from July 3, 2005, to December 31, 2007. This study found that "adjusting radio/CD player/other vehicle controls" had a weighted prevalence of 0.9% in crash-involved drivers (NHTSA, 2008, their Table 10, p. 27), slightly decreased from the Wang et al. (1995) figure of 1.2% of drivers. This ranked fifth, below conversing (11.6%), other interior non-driving activities (3%), and looking at movements/actions of other occupants (1.3%). Again, this 0.9% figure is an upper bound because it could include volume, presets, and other radio operations besides tuning. And again, these are only prevalence figures and are not sufficient by themselves to determine the relative risk associated with these tasks. For example, if cellphone conversation occurs during, say, 10% of normal driving (without crashes), then there is no increase in relative crash risk for cellphone conversation if it has a prevalence of 10% or lower just before a crash. Likewise, the 0.9% to 1.2% figure for radio tasks among drivers involved in crashes says nothing about the relative risk of radio tuning without knowing the prevalence of radio tuning in normal baseline driving (without crashes). The relative risk of radio tuning could be higher than, equal to, or lower than that of normal baseline driving. The prevalence of radio tuning in crashes must be compared with its prevalence in baseline driving to determine the relative risk.

Such a comparison was attempted with the baseline control data in the naturalistic case-control study conducted at VTTI known as the "100-car" study (Klauer et al., 2006). Although there were only 61 crashes in that study, the study summed the crash and near-crash counts in an attempt to achieve sufficient power to estimate odds ratios with a narrow confidence interval. The crash/near-crash crude odds ratio for radio tuning was 0.55 (95% confidence limits 0.13 to 2.22) (Klauer et al., 2006, their Table 2.5, p. 30). The wide confidence interval indicates that there were insufficient instances of radio tuning in the crash, near-crash, and baseline data to achieve a precise estimate. The recent studies linking long single glances to increased odds ratios of a crash (Victor and Dozza, 2011; Victor, 2011; Liang, 2009; Liang et al., in press) have not yet provided a stratification of the odds ratios according to the task being performed just before the crash. Because there are only 61 crashes in the 100-car study, there may not be sufficient crashes within a given task stratum to achieve an acceptably narrow confidence interval.
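For concreteness, the sketch below computes a crude odds ratio and its Woolf (log-based) 95% confidence interval of the kind reported by Klauer et al. (2006). The 2x2 counts are hypothetical placeholders chosen to yield an odds ratio near 0.55; they are not the actual 100-car counts.

    # Hedged sketch: crude odds ratio with a Woolf 95% CI from a 2x2 table.
    # The counts below are hypothetical, not the 100-car study values.
    import math

    def crude_or(a, b, c, d):
        # a, b = task present/absent in crash and near-crash epochs;
        # c, d = task present/absent in baseline epochs.
        odds_ratio = (a * d) / (b * c)
        se = math.sqrt(1/a + 1/b + 1/c + 1/d)            # SE of log odds ratio
        lo = math.exp(math.log(odds_ratio) - 1.96 * se)
        hi = math.exp(math.log(odds_ratio) + 1.96 * se)
        return odds_ratio, lo, hi

    # Hypothetical: task seen in 4 of 904 event epochs vs. 8 of 1000 baseline epochs.
    print("OR = %.2f (95%% CI %.2f-%.2f)" % crude_or(4, 900, 8, 992))
    # A confidence interval spanning 1.0 (as with radio tuning's 0.55,
    # 0.13-2.22) cannot distinguish increased from decreased risk.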

Other studies show that radio tuning tasks have a low frequency of occurrence in baseline driving. In an instrumented-vehicle study of radio tasks, Curry et al. (1998) found that "volume" was the most frequent adjustment, with more than 16 operations/hour on a standard radio. "Preset" operations (setting and recalling station presets) and the "Seek" function fell into a second tier of controls, accessed on average between 4 and 5 times per hour. All other operations were less frequent, with "Auto Tone" (1.5/hour), "band," and "power" leading radio tuning. Tuning the radio manually was done at about 1 operation/hour (about the same frequency as turning the radio on), whether using a standard radio or a radio with an equalizer. 5 Once per hour seems low compared to "Volume," "Preset," or "Seek," but one per hour is still a non-trivial frequency. Unfortunately, prevalence (% of driving time) in baseline driving cannot be determined without knowledge of the driving time for these drivers and the duration of each task, neither of which was provided in the Curry et al. (1998) study.

Neurauter et al. (2007) analyzed 700 hours of video data, representing approximately 1.7% of the available data in the 100-car database. They found that volume adjustment was the most frequent adjustment, at 6.11 operations per hour. Preset buttons were next at 3.16/hour, then changing source at 0.57/hour, and finally radio tuning at only 0.56 operations per hour. Again, no figures for driving time and task duration were provided in the Neurauter et al. (2007) study, so the prevalence and exposure data needed to estimate relative risk are not calculable from the published study data.

Angell et al. (2008) studied two vehicles with 17 participants in naturalistic driving (each driving for 4 weeks), for a total of 694 hours of driving with aftermarket infotainment systems, including radio and navigation functions. Participants spent, on average, 2.43% of their time in the vehicle manipulating the system controls (SD=1.77). No crashes occurred during the study. This usage estimate for radio tasks observed by Angell et al. (2008) is higher than the Stutts et al. (2003) estimate for radios, which showed only 1.1%, and the Neurauter et al. (2007) estimate for radio controls using 100-Car data, which was 1.4%. The higher usage in the Angell et al. (2008) study may have been due to the inclusion of satellite radio (which was free of charge to participants), because this type of radio has many more stations to choose from than an AM/FM radio. Again, adjusting volume was the most common task, at 5-10% of all radio interactions. "Listening for something else" was the second most common task, at 4-5% of radio interactions. "Listening for something else" would combine the presets, tuning the conventional radio, tuning satellite radio, use of CD, and use of iPod, which were not broken out as separate categories in the 2008 report. "Adjust volume" had an average task duration of about 3 sec, while "listening for something else" was the longest event duration at about 17 seconds. "Adjusting volume" occurred about 5 times/hour, and "listening for something else" between 3 and 4 times per hour. "Adjust volume" had a mean glance duration of about 0.5 sec, with an upper error limit (1 standard deviation) of about 1.2 sec, and an eyes-off-road time of about 1 second. "Listen for something else" had a mean glance duration of about 1.2 sec, with an upper error limit near 3 seconds, and an eyes-off-road time of about 7 sec, with an upper error limit of about 18 seconds. Exposure time as a percentage of driving time was not reported in the 2008 briefing on this study. Miguel Perez (one of the co-investigators on the study) has since separately examined predictive functions for crash risk using these data (report in preparation).

5 An equalizer strengthens (boosts) or weakens (cuts) the energy of specific frequency bands.


3.4 Next Steps for Further Research Regarding the Long Single Glance Concern

As noted in Fig. 3B, some relatively short tasks with low workload and little eyes-off-road time (such as radio tuning and some HVAC tasks) nevertheless give rise to poor event detection and long mean single glance durations (Young and Angell, 2003; Angell, 2010; Young, 2012b), as well as one long single glance. More research is needed on why as many as one-third of such low workload tasks have long mean single glance durations and poor event detection, while the other two-thirds of short tasks have acceptable event detection performance (Young, 2012b). These low or moderate workload tasks with poor event detection include radio tasks similar to the NHTSA/Alliance radio reference task. If the attentional processes underlying these short tasks with long single glance durations could be known, these tasks could perhaps be redesigned to reduce the problem (or, alternatively, coupled with technology countermeasures that cue driver attention back to the road when needed).

The various limitations in the crash data concerning radio tuning noted above unfortunately mean that existing study data do not allow a definitive determination of the relative crash risk of radio tuning and multi-knob HVAC tasks. Odds ratios for radio tuning and multi-knob HVAC tasks in more advanced implementations of radio and HVAC user interfaces could perhaps be calculated more precisely in future naturalistic driving studies such as SHRP 2, although this may be difficult without a sufficiently detailed video image of the radio or HVAC faceplates or screen images. (Most of the video bandwidth in the SHRP 2 study is unfortunately directed to the color image of the forward road scene.) Although the faceplates and screen images themselves can be known from information about the vehicle model, VIN, and year, it will be difficult to discern which features or functions on the radio or HVAC are being accessed by the driver's hand. Also, no sound is available in the SHRP 2 recordings, so what the driver is listening to (if anything) cannot be determined either.

In short, radio tuning (at least with the older style of radio intended by the Alliance) is widely regarded today as a "socially acceptable" reference task. Undoubtedly this is in part because it is a relatively short and familiar task, and the workload perceived by drivers is low. The number of glances and total eyes-off-road time associated with radio tuning are also low, so radio tuning can be suitable for setting criteria for those variables. The difficulty with using radio tuning to set every criterion is the recent discovery by Young (2012b) that radio tuning has excessively long maximum single glance times, in combination with the new findings that such long single glances may be a contributing factor in crashes. Thus radio tuning may reflect "attention capture" that is observed, for reasons still to be determined, in radio tuning but not in many other short tasks. By definition, the driver is not aware that their attention has been captured, and because there is no subjective feeling of high workload (as there is in destination entry, for example), the driver has no underlying uncomfortable feeling prompting them to return their eyes to the road, resulting in an excessively long glance during radio tuning (Young, 2012b) and poor event detection.
Given these findings, it is then a puzzle why radio tuning does not show up in police crash reports (but then "bee in a vehicle" also does not show up in police reports, and yet "bee in a vehicle" has a known high crash odds ratio from naturalistic studies). Now that this phenomenon has been identified, more research can be done on it, and appropriate redesigns or countermeasures eventually identified. There are those who will argue that there is no solution for long single glances other than crash avoidance and crashworthiness technology, or even, eventually, automated vehicles. Given that the average age of a vehicle on the road today is now 10.8 years, 6 it will be many years until crash avoidance and the latest crashworthiness technologies have sufficient market penetration to make much difference in crash risk. Crash avoidance technologies add considerably to new vehicle cost, and will be selected as an option only by a subset of vehicle purchasers. Automated vehicles, despite the enormous sums of money currently being poured into their development by automakers and the government, will likely take even longer, probably many decades, to achieve significant market penetration in common everyday driving. A more practical solution is to study the newly identified issue in more depth, and then to develop appropriate near-term design guidelines, test procedures, and countermeasures that will reduce the problem, just as has been done for the first dimension of driver performance, namely driver workload.

4 Concerns about the DFD-FC Test

On initial examination, the DFD-FC test seems ideal: it combines measurements of glance behavior, driving performance (lateral and longitudinal variation), and event detection all in one test. In fact, in my presentation at the NHTSA Technical Forum in Ohio, I recommended the DFD-FC test as the only one suitable for final validation (as long as some task other than radio tuning was used to set the event detection criteria). Upon closer examination, however, I have identified several concerns with the DFD-FC test. The detailed protocols and results for the DFD-FC test in a simulator are contained in Ranney et al. (2011), and I discuss the protocols of the DFD-FC test as described there.

4.1 Use of the Radio Tuning Task to Set Criteria

The concern about the DFD-FC test basing its criteria on the radio tuning task was discussed in Section 3.2 above. Because the DFD-FC test relies on the radio tuning task for its reference criteria, and that task tends to have a long single glance and poor event detection, a better method for setting the fixed criteria even for this test is necessary.

4.2 Coherence Concerns

Ranney et al. (2011) describe the use of coherence in a car-following paradigm as a critical element of the DFD-FC test. Ranney et al. (2011, p. vi) define coherence as "a measure of car-following performance." Coherence is used both as a measure of car-following performance and as a test of whether the associated measure of phase shift (car-following delay) is interpretable. The calculation of coherence requires a car-following paradigm in which the lead vehicle

6 http://www.usatoday.com/money/autos/story/2012-01-17/cars-trucks-age-polk/52613102/1


changes speed. A detailed discussion of coherence calculations is presented in Ranney et al. (2007). In that study, they presented data indicating that car-following delay measured by a coherence paradigm increased during a complex cell phone conversation secondary task (that is, the driver increased the distance to the lead vehicle). One problem with coherence is that following a lead car closely for a certain minimum time period is necessary to obtain a valid coherence measurement. That is, the subject driver in the vehicle behind the lead vehicle must drive sufficiently close to the lead vehicle that the subject driver is likely to adjust the subject vehicle's speed as the lead vehicle changes its speed. This close following of a lead vehicle is not typical of driving while distracted by a secondary visual-manual or auditory-vocal task, as many studies have shown (e.g. Angell et al., 2006a). Drivers typically slow down and increase the headway to the forward vehicle. For example, Ranney et al. (2007) used coherence in their test track assessments of the effects of voice-based interfaces on driver performance. They reported difficulties in getting their test participants to follow sufficiently closely that coherence measures could be obtained. They even found they had to pay participants extra money to follow closely, against their normal driving inclinations:

In response to difficulties experienced by some pilot participants in maintaining a close following distance, the training protocol was modified to include additional training and feedback about the range of following distances considered acceptable. However, as in the previous study, because of documented individual differences in comfort associated with close following distances, a narrow range of following distances was not enforced. During the experiment, participants received feedback and monetary incentives based on their ability to maintain a consistent and relatively close following distance. (Ranney et al., 2007, p. 12)

The difficulty with coherence measures is that if the driver falls back and essentially quits following the lead car in a precise manner, the coherence metrics are not meaningful. As mentioned, drivers under workload typically back off from a lead vehicle to allow themselves a greater safety margin. If this safety margin is not allowed because of forced experimental protocol requirements (e.g. the subject is told, or even rewarded, to maintain a particular distance to the lead vehicle), the coherence results become meaningless as indicators of normal driving behavior in non-experimental conditions. Another problem with coherence measures is that Ranney et al. (2007, p. 47) found that the "modulus" or gain of the following driver's responses had mean values generally close to 1 for the secondary task trials, indicating accurate following during secondary task trials. During baseline driving, modulus values were greater than 1, indicating overly aggressive following during baseline trials. So subjects actually exhibited better car-following during secondary tasks, contrary to what would be predicted if coherence were a valid metric for distraction.
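For readers unfamiliar with the coherence computation, the following minimal sketch uses simulated speed traces: magnitude-squared coherence between lead- and following-vehicle speeds, with the phase of the cross-spectrum converted to a following delay. The signal parameters (10 Hz sampling, a 0.05 Hz lead-vehicle speed oscillation, a 2-sec follower lag) are illustrative assumptions, not the Ranney et al. (2007) protocol.

    # Hedged sketch: car-following coherence and phase delay from
    # simulated speed traces (parameters are illustrative assumptions).
    import numpy as np
    from scipy import signal

    fs = 10.0                                    # 10 Hz speed samples
    t = np.arange(0, 180, 1 / fs)                # a 3-minute car-following run
    rng = np.random.default_rng(1)
    lead = 25 + 2.0 * np.sin(2 * np.pi * 0.05 * t)             # lead speed (m/s)
    follow = (25 + 1.8 * np.sin(2 * np.pi * 0.05 * (t - 2.0))  # follower lags ~2 s
              + 0.3 * rng.normal(size=t.size))

    f, Cxy = signal.coherence(lead, follow, fs=fs, nperseg=400)
    _, Pxy = signal.csd(lead, follow, fs=fs, nperseg=400)

    i = np.argmin(np.abs(f - 0.05))              # bin at the 0.05 Hz forcing
    delay = -np.angle(Pxy[i]) / (2 * np.pi * f[i])   # phase shift -> delay (s)
    print(f"coherence at {f[i]:.3f} Hz = {Cxy[i]:.2f}, delay ~ {delay:.1f} s")
    # If the follower disengages from tracking the lead vehicle's speed
    # changes, coherence drops toward 0 and the delay estimate becomes
    # uninterpretable -- the concern raised above.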


4.3 Multiple PDT Stimuli

Ranney et al. (2007, p. 8) used 23 locations of a target light on a windshield. Ranney et al. (2011) used 6 locations of target lights on the windshield, which they called the "Multiple Target Detection Task" (MDT). The 6 targets are presented at different locations in the simulated roadway display. The MDT is proposed as an improvement on the Peripheral Detection Task (PDT), which has only 1 light. In the MDT, targets (red circles approximately the same size as the LED reflections in the traditional PDT) appeared at one of 6 locations on a single horizontal line near the horizon in the driving scene (Fig. 6), as described below:

DFD–FC.9.a As explained in Subsections VI.2.f and VI.2.g, for the visual detection task, the driving simulator should display a series of targets to be detected. Each target consists of a filled-in, solid, red circle that is displayed in any one of six positions. Target dimensions and positions are described in VI.2.f. (NHTSA, 2012, p. 11247)

VI.2.f.i The target to be detected consists of a filled-in, red circle. The target should be sized such that it subtends a visual angle of 1.0±0.2 degrees. It may be displayed in any one of six positions. These positions are: vertically—all approximately at horizon height, and horizontally, with respect to the driver's head position—9±1.5, 5±1.5, and 1±1.5 degrees to the left of straight ahead, and 10±1.5, 14±1.5, and 17±1.5 degrees to the right of straight ahead. (NHTSA, 2012, p. 11239)

Figure 6. Multiple-Target Detection Task (MDT) from Ranney et al. (2011, their Fig. 3, p. 8).

The three methods in the main body of the ISO NP 17488 Detection Response Task (DRT) standard (document in preparation) have only a single stimulus (the Head-Mounted and Remote DRT have a single red light stimulus, and the Tactile DRT has one tactile stimulus on the shoulder near the neck). In the Annex to that ISO standard, the Wayne State Detection Response Task has two red lights, in left and center positions (Hsieh et al., 2012), as does the GM "Static Load Test," another variant of the PDT (Angell et al., 2002). The 20+ light PDT has not been in

use in the European versions of the PDT since the first experiments there in the 1990s (researchers there have since migrated to the single-stimulus ISO DRT standard), and two lights are the most that have ever been used in published U.S. studies (Angell et al., 2002; Hsieh et al., 2012). Ranney et al. (2011) give some supporting data for why they chose to employ six lights instead of 1 or 2, but the advantages they found for multiple lights over a single light have not been replicated in an ISO test evaluation at Wayne State University (Young et al., 2012c). Showing six lights will increase response times (RTs) a great deal, because it is well-established that increasing the number of choices increases RT. This effect is seen in Fig. 7 below, extracted from Ranney et al. (2011, their Fig. 10, p. 23). The "HDT" is the head-mounted display with only one light. The "MDT" is NHTSA's proposed six-light PDT display. The task is the n-Back task. Comparing the 2-Back task (red bars) to the 1-Back task (blue bars) is a pure test of the ability of the detection task to pick up a cognitive distraction effect (or, more properly, a selective attention effect). The listening and vocalization parts of the task are identical for 2-back vs. 1-back; the subject simply has a slight increase in memory load, remembering 2 digits rather than 1. The RTs are clearly larger with the 6-light MDT display, as expected from well-established principles in the field of experimental psychology.
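The choice-RT effect referred to here is classically described by the Hick-Hyman law, in which RT grows with the information content log2(n) of an n-alternative choice. The sketch below is purely illustrative; the intercept and slope are assumed textbook-order values, not parameters fitted to the Ranney et al. (2011) data.

    # Hedged sketch: Hick-Hyman prediction of RT growth with the number
    # of alternatives (intercept and slope are illustrative assumptions).
    import math

    def hick_rt(n, a=0.30, b=0.15):          # a: base RT (s); b: s per bit
        return a + b * math.log2(n + 1)      # +1 allows a "no event" alternative

    for n in (1, 2, 6):
        print(f"{n} target location(s): predicted RT ~ {hick_rt(n) * 1000:.0f} ms")
    # 1 -> ~450 ms, 2 -> ~538 ms, 6 -> ~721 ms: going from one light to six
    # adds a few hundred ms, in the direction of the longer MDT RTs in Fig. 7.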

Figure 7. Response time means (± se) across 40 subjects for single light (HDT, n.s.) vs. six light (MDT, p = 0.019). Note the longer response times for the multiple light (MDT) vs. single light (HDT) conditions (from Ranney et al., 2011, their Fig. 10, p. 23).

Curiously, Ranney et al. (2011) found a statistically significant difference between the 2-Back and 1-Back tasks in the 6-light condition (MDT) but not in the 1-light condition (HDT). For several reasons, one would not expect a larger effect size when the baseline RTs are

relatively long. First, there can be ceiling effects, where long RTs cannot get any longer without turning into missed events. Second, if internal attentional processes are directed to multiple targets (either in parallel or serially), then unless there is a prior cue indicating the correct target, as in the "Posner" paradigm, the effect of removing attention (as a result of "cognitive distraction" or "selective attention" effects) might not be as large as it would be with fewer targets. An attempt to duplicate the Head-Mounted DRT results of Ranney et al. (2011) did not succeed. In Fig. 8, all methods of Young et al. (2012c) (first four sets of bars) are able to correctly and easily discriminate the pure cognitive load when comparing the 1-Back to the 0-Back condition. These methods include the single-stimulus Tactile DRT, Head-Mounted DRT, and Remote DRT, and the 2-light Enhanced DRT (see Young et al., 2012c for further details). The data in Fig. 7 above were digitized and plotted on the same scale in Fig. 8. For unknown reasons, the 1-Back task with the Head-Mounted DRT in Ranney et al. (2011) (labeled "NHTSA Head" in Fig. 8) gave rise to a longer RT (775 msec) than the same 1-Back task in Young et al. (2012c) (515 msec). It is likely that NHTSA did not follow the draft ISO DRT standard (in preparation) for the head-mounted display. Furthermore, Ranney et al. (2011) did not find a statistically significant difference between the 2-Back and 1-Back tasks for the Head-Mounted DRT. The 6-light NHTSA-developed multiple-event DRT (rightmost red bars) had the longest mean RT of any condition in Fig. 8, but did find about a 100 msec difference between the 2-Back and 1-Back tasks. Ranney et al. (2011) therefore concluded that the NHTSA 6-light display was better, and it ended up as the recommended NHTSA DRT task. However, their relatively poor results for the Head-Mounted DRT were not replicated in the Wayne State studies for any of the DRT types tested (Young et al., 2012c). The difficulty is that if NHTSA goes forward with its own 6-light DRT version, then automakers, if they choose to agree to the voluntary NHTSA guidelines, will be forced to use the NHTSA 6-light version, and could not adopt or even continue to use the ISO 1-stimulus DRT standard, making it effectively moot for automakers.


Figure 8. Comparison of response times for different numbers of events. The first three sets of bars are for one event (Tactile, Head-Mounted, and Remote DRT): baseline (no secondary task), 0-back, and 1-back. The blue bars are for two events (Enhanced DRT) on the same tasks, from updated data (see Young et al., 2012c for details of conditions). For comparison, the Ranney et al. (2011) data in Fig. 7 of this attachment are replotted as the four rightmost bars.

The effect size comparisons are shown in Fig. 9. The first three sets of bars (Tactile, Head, Remote DRT) compare the effect sizes for the baseline, 0-Back, and 1-Back tasks with 1 event (red light or tactile stimulus); these have similar effect sizes. The blue bar is the effect size for 2 events (the Wayne State Enhanced DRT) on the same tasks, and shows a higher effect size. The NHTSA Head-Mounted version (last two red bars) does not have an improved effect size compared to the other DRT methods. If the NHTSA 6-light DRT method is adopted as the recommended method, then automakers will be unable to pursue improved or enhanced versions of the DRT with larger effect sizes. Effect sizes are important measures of a test's sensitivity to small differences in cognitive distraction.
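For reference, the Cohen's d compared in Fig. 9 is simply the mean difference between two conditions divided by their pooled standard deviation. A minimal sketch with placeholder per-subject mean RTs (not the Wayne State or NHTSA data):

    # Hedged sketch: Cohen's d for a between-condition RT difference
    # (sample values are placeholders, not the published data).
    import math

    def cohens_d(x, y):
        nx, ny = len(x), len(y)
        mx, my = sum(x) / nx, sum(y) / ny
        vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
        vy = sum((v - my) ** 2 for v in y) / (ny - 1)
        pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
        return (my - mx) / pooled

    rt_0back = [0.42, 0.45, 0.40, 0.47, 0.44]   # per-subject mean RTs (s)
    rt_1back = [0.50, 0.55, 0.49, 0.58, 0.52]
    print(f"d = {cohens_d(rt_0back, rt_1back):.1f}")  # d = 2.8
    # Larger d means the test separates the two load levels more cleanly,
    # which is why effect size, not just significance, matters when
    # comparing DRT variants.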


Figure 9. Comparison of Cohen's d measure of effect sizes for different venues and comparisons. [Bar chart: "Lab DRT RT n-Back Task Cohen's d Effect Size," y-axis "Lab DRT Cohen's d Effect Size for RT to Event" (0-10), with bars for the baseline, 0-back, and 1-back conditions of the Tactile, Head-Mounted, and Remote DRT (1 event each) and the Enhanced DRT (2 events), and the baseline, 1-back, and 2-back conditions of the NHTSA Head-Mounted (1 event) and NHTSA Multiple (6 events) versions.]

In short, rather than concluding that the 6-light method is better than the 1-light head-mounted method, as Ranney et al. (2011) did, the better conclusion might be that both methods produce about the same effect sizes (Fig. 9), and that their method of implementing the head-mounted display seems to produce longer RTs than the ISO methods (Fig. 8). There is a concern that the Ranney et al. (2011) methods may make their version of the DRT less sensitive to small changes in cognitive load than the ISO methods. Again, the longer RTs in the Ranney et al. (2011) head-mounted apparatus (Fig. 8) are also cause for concern, and likely indicate unknown differences in head-mounted PDT methodology between the two labs. The methodology in Young et al. (2012c) followed the draft ISO DRT standard, and used the head-mounted apparatus and software from TNO in the Netherlands, which was instrumental in the development of the head-mounted DRT. The differences in procedure between Ranney et al. (2011) and the draft NHTSA (2012) Guidelines, on the one hand, and the draft ISO standard, on the other, that might give rise to this difference in results are unknown. These different results leave unanswered concerns about the Ranney et al. (2011) data and procedures, which form an important basis for the DFD-FC test and criteria. Furthermore, the RT criterion for the NHTSA (2012) Guidelines, fixed at 1 second as in the quote below, is not credible if different laboratories produce widely different response times:

“J11.b.iv The mean Visual Detection Task Response Time during the Data Intervals is not statistically significantly greater, at the 95 percent confidence level, than 1.0 second.” (NHTSA, 2012, p. 11250)

4.4 Continuous Performance of Secondary Tasks for 3 Minutes

Because obtaining a valid coherence measurement requires the driver to follow a vehicle for some time, relatively short tasks will tend not to yield a reliable coherence measurement. Therefore, the Ranney et al. (2011) DFD-FC test method requires repeating relatively short tasks for three minutes so that a coherence measure can be obtained:

DFD–FC.8 Continuous Task Performance. Each secondary task is continuously performed for 3 minutes during the car following portion of the test (see Subsection VI.3.b.iv). (NHTSA, 2012, p. 11247)

Any such back-to-back repetition of a single task is questionable due to its lack of real-world validity. The duration of a visual-manual task is a fundamental characteristic of the task, and the highest-loading variable on the first dimension of driver performance (Young and Angell, 2003). Thus, a short duration is a positive characteristic of a task as far as driver workload (the first dimension of driver performance) is concerned. NHTSA (2012) seems contradictory in artificially repeating a short task many times simply to obtain a measurement that demonstrates a high workload effect. It is especially contradictory given that NHTSA (2012, p. 11210) states: "Although destination entry was no more demanding than radio tuning when task duration effects were eliminated with DFD metrics, it exposes drivers to more risk than radio tuning and phone tasks due to its considerably longer duration."

5 Recommended Updates to NHTSA Guidelines

1. Four tests (EGDS, OCC, STEP, and DS-FC) should be recommended solely for early design of tasks, to help limit "driver workload," and not for final validation of tasks to criteria.

2. Tests using a radio benchmark (DS-BM, DFD-BM) should be removed from the list of recommended tests, because the radio tuning reference task is associated with poor attentional processes (poor event detection and a long maximum single glance).

3. To ensure that driver attentional processes are captured as well as driver workload, the test for final validation of compliance to criteria for visual-manual tasks should also include event detection. (That is, event detection tests should be used just as much if not more with visual-manual tasks as with auditory-vocal tasks, to deal with "cognitive distraction" from visual-manual tasks, which is typically even larger than that for auditory-vocal tasks; see Angell et al., 2006a.)

4. The DFD-FC test is the only one of the seven tests meeting the requirement for a test that measures both driver workload and event detection and also uses fixed criteria. It therefore appears at first to be the only test suitable for final validation of tasks. However, closer examination revealed several new concerns for this test, including that at least

some of the fixed criteria for this test are based on radio tuning, which is on shaky footing for the reasons noted in recommendation 2 above.

5. NHTSA should consider developing an alternative way to create a substitute set of criteria, on the basis of a visual-manual reference task that reflects acceptable event detection performance as well as visual performance.

6. Alternatively, the radio tuning task could be used solely to set eyes-off-road time and other criteria related to the driver workload dimension. Another task would then need to be found by NHTSA to set criteria related to the event detection dimension. Several candidate tasks with good event detection as well as low workload can be found in Young (2012b).

7. A simple PDT-like test with a simple simulator or a video of driving with a tracking task (Hsieh et al., 2012) should be included in the NHTSA (2012) test suite (see the PDT-like tests in the new preliminary work item ISO NP 17488, "Detection-Response Task for assessing selective attention in driving," in preparation). This test can be used as an early screening test or as a final validation test (Young et al., 2009, 2012c).

6 Recommendations for Protocols and Acceptance Criteria

These recommendations are summarized in the colored boxes in Table 3 below, adapted from the table in NHTSA (2012, p. 11241).

Table 3 (below). Summary of recommendations for NHTSA (2012) test methods. None of the tests is suitable for final validation, for the reasons indicated in the main text. The last test, DFD-FC, may be suitable for final validation (in the sense of predicting the results of an experimental on-road test) if its fixed criteria for event detection are set by some means other than the radio tuning task.


7 Conclusions

1. NHTSA deserves credit for a good first effort at proposing draft national Guidelines for driver distraction.

2. However, most of the tests, including the two most recommended ones for visual-manual tasks (OCC and EGDS), do not sufficiently address attentional criteria relevant to poor event detection and long single glances.

3. On the surface, the DFD-FC test is the only one that potentially addresses all of the different aspects of driver performance. It can perhaps be developed further and could potentially be the recommended NHTSA (2012) test for final validation of visual-manual tasks to criteria, at least for validation defined as predicting experimental on-road studies. However, several difficulties are listed that will need to be addressed before it can be recommended for use by automakers and suppliers for final validation purposes. In particular, the radio tuning task cannot be used to set the benchmarks for the event detection criteria for this test or others.

4. A simple PDT-like test in the context of a driving scenario (without necessarily even requiring a simulator) that allows glance behavior to be measured should also be adopted by NHTSA, such as those in the new preliminary work item ISO NP 17488, "Detection-Response Task for assessing selective attention in driving." These tests

have been fully validated with on-road driving, and are suitable for use for early screening as well as final validation.

5. For the many reasons mentioned in this attachment and the other attachments accompanying my main comment, NHTSA has not yet demonstrated that its recommended changes to the Alliance (2006) Guidelines are empirically valid.

6. In particular, NHTSA has not taken the opportunity to deal with the long maximum glances that are associated with the radio tuning task; hence, using that task to set all criteria (including those for event detection) may well continue to permit many "false negative" tasks to escape the screening tests, thereby losing an opportunity to improve driving safety.

8 References

Alliance of Automobile Manufacturers Driver Focus Telematics Working Group, "Statement of Principles, Criteria and Verification Procedures on Driver Interactions with Advanced In-Vehicle Information and Communication Systems, June 26, 2006 Version," Alliance of Automobile Manufacturers, Washington, DC, 2006, http://autoalliance.org/files/DriverFocus.pdf.

Altmann, E. M., and Trafton, J. G., "Memory for Goals: An Activation-Based Model," Cognitive Science: A Multidisciplinary Journal 26:39-83, 2002.

Angell, L.S., Young, R.A., Hankey, J.M., and Dingus, T.A., "An Evaluation of Alternative Methods for Assessing Driver Workload in the Early Development of In-Vehicle Information Systems," Society of Automotive Engineers Government/Industry Meeting, Washington, DC, May 2002, http://www.sae.org/technical/papers/2002-01-1981.

Angell, L., Auflick, J., Austria, P., Biever, W., Diptiman, T., Hogsett, J., Kiger, S., Kochhar, D., and Tijerina, L., "Driver Workload Metrics Project, Task 2 Final Report," National Highway Traffic Safety Administration, 2006a, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/Driver%20Workload%20Metrics%20Final%20Report.pdf.

Angell, L., Auflick, J., Austria, P., Biever, W., Diptiman, T., Hogsett, J., Kiger, S., Kochhar, D., and Tijerina, L., "Driver Workload Metrics Project, Task 2 Final Report, Appendices," National Highway Traffic Safety Administration, 2006b.

Angell, L.S., "Effects of Secondary Task Demands on Drivers' Responses to Events during Driving: Surrogate Methods and Issues (Abstract)," Driving Assessment 2007: Fourth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Stevenson, Washington, July 2007, http://drivingassessment.uiowa.edu/DA2007/PDF/005_Angell.pdf.

Angell, L.S., Perez, M., and Hankey, J., "Driver Usage Patterns for Secondary Information Systems," First Human Factors Symposium on Naturalistic Driving Methods and Analyses, Blacksburg, Virginia, August 25-28, 2008, http://www.vtti.vt.edu/PDFs/ndmas_ppt_PDFs/angellGM-perezVTTI.pdf.

Angell, L.S., "Conceptualizing Effects of Secondary Task Demands During Driving: Surrogate Methods and Issues," Chap. 3 in Performance Metrics for Assessing Driver Distraction: The Quest for Improved Road Safety, edited by Gary L. Rupp, 42-72, Warrendale, PA: SAE International, 2010.

Curry, D.G., and Jaworski, T., "Frequency of Use of Automotive Stereo Controls," Human Factors and Ergonomics Society 42nd Annual Meeting, Chicago, IL, 1998.

Horrey, W.J., and Wickens, C.D., "In-Vehicle Glance Duration: Distributions, Tails, and Model of Crash Risk," Transportation Research Record: Journal of the Transportation Research Board 2018:22-28, Washington, DC, 2007.

Hsieh, L., Young, R., and Seaman, S., "Development of the Enhanced Peripheral Detection Task: A Surrogate Test for Driver Distraction," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1), 2012, doi:10.4271/2012-01-0965.

ISO NP 17488, "Detection-Response Task for Assessing Selective Attention in Driving," Road Vehicles - Transport Information and Control Systems - Man-Machine Interface, TC13, WG8 (in preparation).

Klauer, S.G., Dingus, T.A., Neale, V.L., Sudweeks, J.D., and Ramsey, D.J., "The Impact of Driver Inattention on Near-Crash/Crash Risk: An Analysis Using the 100-Car Naturalistic Driving Study Data (Report No. DOT HS 810 594)," National Highway Traffic Safety Administration, Washington, DC, 2006, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/810594.pdf. Accessed Jan 24, 2011.

Lee, J.D., Roberts, S.C., Hoffman, J.D., and Angell, L.S., "Scrolling and Driving," Human Factors: The Journal of the Human Factors and Ergonomics Society 54(2):250-63, April 2012, doi:10.1177/0018720811429562, http://hfs.sagepub.com/content/54/2/250.abstract.

Liang, Y., "Detecting Driver Distraction," Doctoral Thesis, The University of Iowa, Iowa City, 2009.

Liang, Y., Lee, J.D., and Yekhshatyan, L., "How Dangerous Is Looking Away from the Road? Algorithms Predict Crash Risk from Glance Patterns in Naturalistic Driving," Human Factors, in review.

Maddox, J., "Technical Correction to 77 FR 11200, February 24, 2012, Visual-Manual NHTSA Driver Distraction Guidelines for In-Vehicle Electronic Devices, Notice of Proposed Federal Guidelines," posted May 9, 2012, http://www.regulations.gov/#!documentDetail;D=NHTSA-2010-0053-0079.

Neurauter, M.L., Hankey, J.M., and Young, R.A., "Radio Usage: Observations from the 100-Car Naturalistic Driving Study (SAE 2007-01-0441)," SAE 2007 Transactions Journal of Passenger Cars: Mechanical Systems V116-6, 2007, http://www.sae.org/technical/papers/2007-01-0441.

NHTSA, "National Motor Vehicle Crash Causation Survey: Report to Congress (DOT HS 811 059)," National Highway Traffic Safety Administration, Washington, DC, 2008, http://www-nrd.nhtsa.dot.gov/Pubs/811059.PDF.

NHTSA, "An Examination of Driver Distraction as Recorded in NHTSA Databases (DOT HS 811 216)," NHTSA's National Center for Statistics and Analysis, Washington, DC, 2009, http://www-nrd.nhtsa.dot.gov/Pubs/811216.pdf.

NHTSA, "Visual-Manual NHTSA Driver Distraction Guidelines for In-Vehicle Electronic Devices," Docket No. NHTSA-2010-0053, 77 Federal Register 11200, February 24, 2012, https://federalregister.gov/a/2012-6266.

Ranney, T.A., Mazzae, E.N., Baldwin, G.H.S., and Salaani, M.K., "Characteristics of Voice-Based Interfaces for In-Vehicle Systems and Their Effects on Driving Performance," National Highway Traffic Safety Administration, 2007, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/2007/DOT-HS-810867.pdf.

Ranney, T.A., Baldwin, G.H.S., Parmer, E., Domeyer, J., Martin, J., and Mazzae, E.N., "Developing a Test to Measure Distraction Potential of In-Vehicle Information System Tasks in Production Vehicles (DOT HS 811 463)," National Highway Traffic Safety Administration, 2011.

Ranney, T.A., Baldwin, G.H.S., Parmer, E., Martin, J., and Mazzae, E.N., "Distraction Effects of Number and Text Entry Using the Alliance of Automotive Manufacturers' Principle 2.1B Verification Procedure," NHTSA Technical Report DOT HS 811 571, February 2012.

Rockwell, T.H., "Spare Visual Capacity in Driving - Revisited: New Empirical Results for an Old Idea," in Vision in Vehicles II, edited by A.G. Gale, M.H. Freeman, C.M. Haslegrave, P. Smith, and S.P. Taylor, Burlington, MA: Elsevier, 1988.

Stutts, J., Feaganes, J., Rodgman, E., Hamlett, C., Reinfurt, D., Gish, K., Mercadante, M., and Staplin, L., "The Causes and Consequences of Distraction in Everyday Driving," Annu Proc Assoc Adv Automot Med 47:235-51, 2003, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12941228.

Victor, T., "Inattention-Risk Function (for Lead Vehicle Crashes) - a SHRP2 S08 Analysis Project," SHRP2 Summer Symposium, Washington, DC, July 2011.

Victor, T., and Dozza, M., "Timing Matters: Visual Behavior and Crash Risk in the 100-Car On-Line Data," Driver Distraction and Inattention 2011, Gothenburg, Sweden, 2011, http://www.chalmers.se/safer/ddi2011-en/program/papers-presentations.

Victor, T.W., Dozza, M., and Lee, J.D., "Timing Matters: Distraction, Glances, and Crash Risk," manuscript in preparation.

Wang, J., Knipling, R.R., and Goodman, M.J., "The Role of Driver Inattention in Crashes: New Statistics from the 1995 Crashworthiness Data System," 40th Annual Proceedings: Association for the Advancement of Automotive Medicine, 377-92, 1996, http://www-nrd.nhtsa.dot.gov/departments/Human%20Factors/driver-distraction/PDF/Wang.PDF.

Young, R.A., and Angell, L.S., "The Dimensions of Driver Performance During Secondary Manual Tasks," Driving Assessment 2003: Second International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Park City, Utah, July 2003, http://drivingassessment.uiowa.edu/DA2003/pdf/25_Youngformat.pdf.

Young, R.A., Aryal, B., Muresan, M., Ding, X., Oja, S., and Simpson, S.N., "Road-to-Lab: Validation of the Static Load Test for Predicting On-Road Driving Performance While Using Advanced Information and Communication Devices," Proceedings of the Third International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Rockport, Maine, June 2005, http://drivingassessment.uiowa.edu/DA2009/037_YoungAngell.pdf.

Young, R.A., Angell, L., Sullivan, J.M., Seaman, S., and Hsieh, L., "Validation of the Static Load Test for Event Detection During Hands-Free Conversation," Proceedings of the Fifth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design 5:268-75, 2009, http://drivingassessment.uiowa.edu/DA2009/037_YoungAngell.pdf.

Young, R., "Cognitive Distraction While Driving: A Critical Review of Definitions and Prevalence in Crashes," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1), 2012a, doi:10.4271/2012-01-0967.

Young, R.A., "Event Detection: The Second Dimension of Driver Performance for Visual-Manual Tasks," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1), 2012b, doi:10.4271/2012-01-0964.

Young, R., Seaman, S., and Hsieh, L., "Measuring Cognitive Distraction on the Road and in the Lab with the Wayne State Detection Response Task," Transportation Research Board 2012 Annual Meeting, Washington, DC, 2012c.


Addendum: Recent NHTSA Results Confirming False Negative Task Predictions

Subsequent to the writing of this attachment, a new study was added to the docket 7 on May 3. It was expected that this study by Perez et al. (2012) would confirm the high false negative rate reported here for test and road data. However, few conditions were matched between the simulator study and the track study. In addition, it was nearly impossible to determine which data from that study were the correct ones to use, because of numerous inconsistencies between the data as presented in the graphs and in the main table 8 (see Attachment 4, Error 9).

A further difficulty was that it was often impossible to determine what a number of the variables actually meant, because the ways in which they were calculated were never specified. For example, this attachment on false negatives makes some new points about the possible effect of long maximum single glances in producing poor event detection and response, at least in experimental studies, with some possible corroborating evidence from crash/near-crash naturalistic studies. A close examination was therefore made of the "maximum long single glance" variable in the NHTSA Guidelines and supporting documents. However, as per Attachment 4, Error 6, there is no clear definition of maximum single glance, nor of how the values shown were calculated.

Addendum References

Perez, M., Owens, J., Viita, D., Angell, L., Ranney, T.A., Baldwin, G.H.S., Parmer, E., Martin, J., Garrott, W.R., and Mazzae, E.N., "Summary of Radio Tuning Effects on Visual and Driving Performance Measures – Simulator and Test Track Studies," National Highway Traffic Safety Administration, 2012, http://www.regulations.gov/#!documentDetail;D=NHTSA-2010-0053-0076.

7 http://www.regulations.gov/#!docketDetail;dct=FR%252BPR%252BN%252BO%252BSR;rpp=25;so=DESC;sb=postedDate;po=0;D=NHTSA-2010-0053.


Attachment 3: Possible False Positive Errors in NHTSA Occlusion Test

Richard A. Young, Ph.D.
Research Professor
Dept. of Psychiatry and Behavioral Neurosciences
Wayne State University School of Medicine
Detroit, MI USA
May 18, 2012
[email protected]

Attachment 3 to "Comment on: Visual-Manual National Highway Traffic Safety Administration Driver Distraction Guidelines: In-Vehicle Electronic Devices (Docket No. NHTSA-2010-0053-0009)."

Abstract

The occlusion test proposed in the NHTSA (2012) Guidelines produces a substantial proportion of "false positive" errors (almost 40%) when laboratory results are used to predict whether a task meets the criteria of an experimental on-road test. This high false positive error rate is based on an analysis of the Crash Avoidance Metrics Partnership Driver Workload Metrics (CAMP-DWM) Project dataset (Angell et al., 2006a,b), applying the metrics of the NHTSA (2012) Guidelines as closely as possible. [Note added after completion of the foregoing analysis: The addendum to this attachment shows that this result is consistent with the recent report by Ranney et al. (2012), which used the occluded goggles test to predict simulator outcomes for four visual-manual tasks. Radio tuning and 7-digit phone dialing did not meet the occluded goggles criterion but met all seven simulator criteria in the EGDS test (Ranney et al., 2012, their Table 1, p. xiii), a false positive result for 50% of the tasks tested (see Addendum).]

1 Introduction

An analysis of classification accuracy for the occlusion test (OCC) was undertaken to examine how effectively the draft NHTSA (2012) criterion for occlusion identifies tasks that are high in visual demand. An analysis of classification errors was done originally by the Crash Avoidance Metrics Partnership Driver Workload Metrics (CAMP-DWM) Project, the results of which were published in Table 5.5, on page 5-19 of the final CAMP-DWM report (Angell et al., 2006a). The original CAMP-DWM analysis examined seven criteria or decision rules related to occlusion and other visual demand metrics that were under consideration at that time (six years ago). These decision rules were compared to the way tasks "should" have been sorted, based on a combination of empirical findings in the literature and expert judgment.

The analysis below differs in that it examines the specific occlusion criterion proposed in the draft Phase 1 NHTSA (2012) Guidelines for application with Visual Occlusion (TSOT < 9 sec), and uses empirical on-road data exclusively for comparison with similar data (eyes-off-road time and other glance data) from the CAMP-DWM road and track studies. It should be kept in mind that, in general, no criterion is perfect. Most criteria result in some number of classification errors (some number of false positive or false negative identifications, or both). What is of interest, therefore, is how many classification errors result from any given criterion, as well as the nature and pattern of the errors that are found. We now examine the predictive validity of the Occluded Goggles Test.

2 Method

Note: All data examined here were based on publicly available sources [in particular, the CAMP-DWM reports (Angell et al., 2006a,b)]. Interpolation and other techniques had to be adopted in some cases to determine the 85th percentile values, to be compatible with the criteria in the NHTSA (2012) draft Guidelines. A more complete examination based on individual subject and glance data could be done with the full CAMP-DWM dataset, which is available to the participants in the CAMP-DWM project, as well as to NHTSA if it retained the CAMP-DWM dataset provided at the completion of the study. However, the results would likely not differ much from the preliminary findings here.

The 85th percentile total shutter open time (TSOT) data for the occluded goggles test were transcribed from Angell et al. (2006a, their Table 5-5, p. 5-19). These TSOT data were based on 55 subjects, with at least 6 subjects of balanced gender in each of the four age ranges recommended by NHTSA (2012): 18-24, 25-39, 40-54, and 55-75. This data sample therefore met the age and gender requirements of the NHTSA (2012) tests.

The 85th percentile eyes-off-road time (EORT) on the road was estimated by linear extrapolation to the 85th percentile from the median (50th percentile) and third quartile (75th percentile) in the road data of Angell et al. (2006b, their Table Q-31, p. Q-31). The 85th percentile EORT on the track was similarly estimated from the median and third-quartile data in Angell et al. (2006b, their Table Q-45, p. Q-45). 1
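To make the extrapolation explicit, the sketch below shows one way to implement it, assuming the quantile function is locally linear between and beyond the reported quantiles. This is one plausible reading of the "linear prediction" described above, not necessarily the exact formula used; the numeric values in the example are illustrative, not taken from the CAMP-DWM tables.

```python
def estimate_85th(q50: float, q75: float) -> float:
    """Estimate the 85th percentile from the 50th and 75th percentiles by
    linear extrapolation of the quantile function (an assumed formula)."""
    slope = (q75 - q50) / (0.75 - 0.50)  # change in the metric per unit quantile
    return q75 + slope * (0.85 - 0.75)   # equals q75 + 0.4 * (q75 - q50)

# Illustrative values only: median EORT = 8.0 s, third quartile = 10.5 s
print(estimate_85th(8.0, 10.5))  # -> 11.5 (estimated 85th percentile EORT, sec)
```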

1 As noted, it would have been desirable to calculate the actual 85th percentile data directly from the individual subject data, or to calculate the actual number of participants that met the criteria or did not meet the criteria, but the


The 85th percentile TSOT values were then matched against the 85th percentile eyes-off-road times (EORT) from the road and track data for the same tasks. 2 A task was scored "green" (negative, meets criterion) in the occluded goggles test if, per the NHTSA (2012) draft criterion, it had a TSOT < 9 seconds, and "red" (positive, does not meet criterion) if it had a TSOT >= 9 seconds. A task was scored "green" in the eyes-off-road time (EORT) open-road and track tests if, per the NHTSA (2012) draft criterion, it had an EORT < 12 seconds, and "red" if its EORT was >= 12 seconds.

If a task was "red" in the occluded goggles test but "green" on the road and/or track, it is a "false positive": the lab test incorrectly classified the task as not meeting the criterion when it did meet the criterion on the open road or track. A task is a "false negative" if it is "green" in the laboratory test but "red" on the track and/or open-road test. A task was scored a "true positive" in the occluded goggles test if it was "red" in both the lab and on the road, and a "true negative" if it was "green" in both venues. 3
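The scoring scheme can be summarized in a few lines of code. The thresholds below come directly from the draft criteria just described; the function names and the example task values are my own illustrative assumptions.

```python
def classify_occlusion(tsot_85th_sec: float) -> str:
    """Draft NHTSA (2012) occlusion criterion: 'green' (meets criterion)
    if 85th percentile TSOT < 9 s, otherwise 'red'."""
    return "green" if tsot_85th_sec < 9.0 else "red"

def classify_road(eort_85th_sec: float) -> str:
    """Draft NHTSA (2012) road/track criterion: 'green' if 85th percentile
    EORT < 12 s, otherwise 'red'."""
    return "green" if eort_85th_sec < 12.0 else "red"

def confusion_cell(lab: str, road: str) -> str:
    """Map a (lab, road) pair of classifications onto the four cells of the
    confusion matrix described in the text."""
    if lab == "red":
        return "true positive" if road == "red" else "false positive"
    return "false negative" if road == "red" else "true negative"

# Illustrative task (hypothetical values): fails occlusion, passes on the road.
print(confusion_cell(classify_occlusion(10.2), classify_road(11.0)))
# -> "false positive"
```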

3 Results

Table 1 shows the predictive validity of the 85th percentile total shutter open time (TSOT) in the occluded goggles test for predicting the 85th percentile eyes-off-road time (EORT) on the road and track in the CAMP-DWM study (Angell et al., 2006a,b). A positive or "red" classification in the laboratory test (TSOT >= 9 sec) correctly matched a positive "red" classification in the road and track tests (EORT >= 12 sec) 10 times (red class "a" in Table 1). A negative or "green" classification in the laboratory test (TSOT