Acting Together by Mutual Control: Evaluation of a Multimodal Interaction Concept for Cooperative Driving

Markus Zimmermann, Stefan Bauer, Niklas Lütteken, Iris M. Rothkirch and Klaus J. Bengler
Institute of Ergonomics, Technische Universität München, Munich, Germany
[email protected]

The work described in this paper is funded by the ARTEMIS Joint Undertaking under the number 269336-2 (www.d3cos.eu).

Abstract—This paper presents a study on the evaluation of a proposed interaction concept for cooperative driving in a lane-change scenario. First, cooperative driving is set into the context of human-machine cooperation. Second, for designing the system, the interaction between driver and car is established (based on mutual control) and the cooperation among different vehicles is elaborated; a timing sequence is presented for both. The corresponding multimodal user interface is introduced, focusing on augmented reality via the contact analogue head-up display. During its design phase, certain mode aspects and design patterns are considered in order to improve the cooperation. Third, the implementation is outlined. Fourth, the evaluation is presented, discussing the within-subjects experiment with 25 participants by means of three aspects: user interface quality, interaction timing, and workload measurement as a basis for user state inference. We obtained evidence that the proposed interaction concept improves cooperative behavior and increases safety. We furthermore verified a U-shaped relation between workload and performance by using a variety of different metrics. In a fifth step, future iterations are outlined.

Keywords—Human-Machine Cooperation; Cooperative Adaptive Cruise Control; Human-Machine Interface and Interaction (HMI) in Vehicles; Augmented Reality; Multimodality; Driver State and Intent Recognition

I. MOTIVATION: COOPERATIVE SYSTEMS

Human-machine cooperation for cars targets both optimizing the in-vehicle interaction on board (between driver and automation) and the traffic interaction (between several road users). This advances conventional driver assistance systems like adaptive cruise control that only inform and support the driver on vehicle control. Such limited assistance systems can lead to unsatisfying, dangerous, stressful, and unpredictable situations, for example during an unexpected lane-change situation [1]. Human-machine cooperation between a car and its driver has been previously defined as "a relation between human and machine where the interaction […] occurs with shared authority in dynamic situations" [2] or, more generically, from a multi-agent perspective by two minimal conditions forming a symmetric definition: "each [agent] strives towards goals and can interfere with the other", and "each one tries to manage the interference to facilitate the individual activities and/or the common task" [3].

An increasing number of research projects currently deal with cooperative vehicle guidance, such as the project H-Mode (mainly from the in-vehicle perspective) or D3CoS (from a cross-domain traffic perspective) [4, 5]. As cooperative behavior in driving (e.g. a lane-change) currently relies on a voluntary basis [6, 7], an important research question is how to design interaction concepts for driving that coordinate, motivate, and improve the cooperation. This mode of interaction involves the mutual mediation of all user and system modes between a driver and the system. Such system modes include, for example, the task allocation between the human and machine agents in order to reach their goals and the transition between responsibilities. The user mode, on the other hand, comprises the user's state (e.g. the assessment whether the driver is able to take over a certain task). The question is investigated on the basis of a lane-change scenario.

This research question is deconstructed in the following sections, first in the design phase (section II). As for the system mode, the traffic (II.A) as well as the in-vehicle (II.B) cooperation need to be designed, followed by a definition of the underlying automation (II.C). Part of the interaction is, on the one hand, the interaction sequence and its timing (II.D), and on the other hand, the communication of the aspect of cooperation itself by using several paradigms (II.E). In order to communicate the interaction, the interface directed from the system to the user consists of the communication of information by augmented reality (II.F) and multimodal aspects (II.G). The resulting implementation of this system design is presented afterwards (III), followed by the evaluation of the concept in a driving simulator (IV). In an experiment, both the timing of the interaction (IV.B) and the interface directed from the user to the machine in terms of workload assessment (IV.C) have been determined. Finally, the iteration (V) is discussed: the system is intended to dynamically adapt to the driver's state (i.e. the dynamic task distribution based on workload and interface reconfiguration based on visual availability, as shown in section V.A), and a metric for assessing the quality of cooperation (V.B) is outlined. To point out the final idea of the proposed interaction concept: the mutual communication of user and system state is expected to measurably improve the quality of cooperation.
II. DESIGNING THE INTERACTION CONCEPT

The interaction concept is designed in several steps. In order to establish an interaction for cooperative driving, it first needs to be set into the context of a cooperative scenario.
Figure 1. The cooperative lane-change scenario: traffic cooperation between two vehicles (red Vl and blue Vr) in order to establish a gap (hachure) and in-vehicle cooperation in terms of mutual control.

A. Traffic Cooperation

The scenario for the interaction concept has been chosen – because of its transferability – to be a cooperative lane-change scenario (see Fig. 1): a car (blue, Vr) is driving in the right lane of a motorway and approaching an obstacle (O, in this case a slower vehicle in the form of a truck, 22 m/s). Keeping the desired speed of 33 m/s requires the vehicle to initiate a lane-change maneuver, but the left lane already carries dense traffic. The cooperative (contrary to a free or forced) lane-change, as shown in [6], depends on another vehicle being able and willing to cooperate (i.e. slowing down and generating a gap, symbolized by the hachure in the figure). In normal driving without a cooperative interaction concept, this would force the driver of Vr to set the indicator and hope for a cooperative road user. The proposed improvement is to engage this cooperation technically by establishing a vehicle-to-vehicle communication (symbolized by the green dashed connection) between the traffic participants, already preselecting a suited cooperation partner (i.e. Vl, red, by criteria such as timeliness or cost) and allowing the interaction described later. This leads to a gap which is coordinated and – ideally – best suited for the timing of the particular maneuver.
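The preselection of a cooperation partner can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical scoring routine: the criteria timeliness and cost are named in the text above, but the candidate attributes, the weighting, and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    vehicle_id: str
    time_to_gap: float   # estimated seconds until a usable gap exists (assumed attribute)
    cost: float          # e.g. required deceleration of the candidate (assumed attribute)

def preselect_partner(candidates, w_time=1.0, w_cost=1.0):
    """Return the candidate with the lowest weighted sum of timeliness
    and cost; the weights are illustrative assumptions."""
    if not candidates:
        return None  # no cooperation possible, Vr keeps following O
    return min(candidates, key=lambda c: w_time * c.time_to_gap + w_cost * c.cost)

# Example: Vl is preferred because it can open a gap sooner and at lower cost.
partner = preselect_partner([
    Candidate("Vl", time_to_gap=9.0, cost=6.0),   # decelerates from 36 to 30 m/s
    Candidate("V2", time_to_gap=15.0, cost=9.0),
])
print(partner.vehicle_id)  # -> "Vl"
```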
B. In-Vehicle Cooperation and Mutual Control

There are four cooperative agents involved in the two cars, namely the two automated systems (responsible for the arbitration of the traffic cooperation) as well as the two drivers (responsible for the final common decisions). This involves the in-vehicle cooperation between human and machine. Five layers important for this kind of cooperation have therefore been defined: intention, cooperation mode, allocation, interface, and contact [2]. The lane-change scenario has been deconstructed by means of those levels [1]. For example, the intention mode apprehends the different agents as sources for recognizing, requesting and performing the lane-change; in the end, there are intentional cascades present for the onboard and traffic intention. The allocation, for instance, targets the distribution of un-sharable resources at a certain point in time, which here is the lane. Most importantly, relating to the cooperation mode, the "action suggestion mode" has been chosen as a part of the "mutual control mode" with a medium level of invasiveness [8]. Action suggestion has been shown to have a positive effect on the effectiveness of risk management in critical situations by an interpreted information transfer. A lane-change maneuver tends to be a critical situation (under certain circumstances, depending e.g. on traffic speed and flow, as previously shown [9]). Hence, no actions should be performed by the vehicle on its own; instead, the goal is management by consent within the vehicle, i.e. a common course of action between driver and automated system. On the one hand, the voluntary decision by the driver to open a gap and let a car cut in ahead is expected to increase the system acceptance. On the other hand, through management by consent, the driver's mode awareness is increased and thereby the risk of running into mode errors is minimized [10].

C. Highly-Automated Driving

The whole cooperation is initiated and supported by a highly automated system in the sense of Gasser [11]; his definition of highly automated driving is translated as "a system which takes over longitudinal and lateral control for a specific period and in specific situations. The driver does not have to continuously monitor the system, but if required the system can request a take-over. The driver has to be given sufficient time for the take-over. Furthermore, system boundaries are reliably detected by the system." [12]

Hence, in the setting of the presented mutual control, the longitudinal and lateral vehicle control for the situation cruise on the motorway is operated by the machine (no supervision necessary). The upcoming situation lane-change (because of O) is suggested to the driver. This action requires the driver's acknowledgement, with a sufficient amount of time for reaction. In this sense the automation fulfils the requirement for management by consent without provoking a system boundary at the same time: when the action (change lane) is accepted (by all involved agents), the lane-change is performed in a highly automated manner. If not, the automation of Vr simply follows O at a safe distance. Later on (II.G), the driver's actions towards the automation are described as being compatible to manual driving in a comparable lane-change situation. Those are: pushing the brake to let someone in, accelerating to deny cooperation, and turning the wheel to change the lane. While the maneuver takes place on the guidance level, manual driving would also involve the stabilization level for steering the car [13]. The scenario benefits from the automation by reducing the complete stabilization level for those three actions to single accept/deny actions with a short duration. On the guidance level, "misunderstandings between the interacting traffic participants and resulting conflict situations indicate that today's vehicles' communicative devices such as direction indicators, brake lights, horn and headlight flasher are insufficient to clarify intended driving maneuvers mutually between the road users" (own translation [13]). With a common course of action between automation and driver, the guidance level and therefore the cooperation accordingly benefit from more reliable and explicit information.

At the same time, the highly automated setup allows better control over the scenario, which makes it more reproducible during the simulated evaluation (see section IV). If all the simulated cars' velocities are constant, the scenario timing is more predictable and the arbitration between Vl and Vr is therefore less complex.
D. The Interaction Sequence and its Timing

In order to transport traffic and in-vehicle cooperation through the automated system, a sequence of five phases of interaction was elaborated in several workshops [1]. These phases carry the cooperation: request, suggest preparation, prepare, suggest action, and finally action. The positive phases build upon each other and are shown in the Unified Modeling Language (UML) activity diagram in Fig. 2.
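Read as a state machine, the five phases form a linear chain in which every transition requires consent and any agent may abort. The following Python sketch mirrors Fig. 2 only schematically: the phase names come from the text, while the consent/abort mechanics are simplified assumptions for illustration.

```python
# Minimal sketch of the five-phase sequence (phase names from the paper;
# the consent/abort mechanics are simplified assumptions).
PHASES = ["request", "suggest preparation", "prepare", "suggest action", "action"]

def run_cooperation(consents):
    """consents maps phase name -> True (all agents agree) or False.
    Returns the reached outcome; any refusal aborts the cooperation."""
    for phase in PHASES:
        if not consents.get(phase, False):
            return f"aborted during '{phase}'"  # no common course of action
    return "lane-change completed"

print(run_cooperation({p: True for p in PHASES}))                        # completed
print(run_cooperation({"request": True, "suggest preparation": False}))  # aborted
```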
Figure 2. The UML activity diagram of the cooperative lane-change scenario.

In the request phase, the driver of Vr is informed about the obstacle O and the initiated search for a suited cooperation partner. The driver of Vl, who in our scenario is a suitable candidate, receives the request and needs to accept it. If accepted, the next phase begins, which is to suggest preparation: here the required preparative action (i.e. opening a gap) is suggested to Vl and again needs to be confirmed. In the prepare phase, the automation then prepares the gap by increasing the distance between Vl and its preceding vehicle. Both drivers of Vl and Vr are informed about the maneuver's progress. As soon as the gap is prepared and the time has come to perform the lane-change maneuver, this action is suggested to the driver of Vr, who can then accept. The lane-change itself, during the final phase action, is executed by the automation of Vr and communicated to both drivers. All phases may be aborted or interrupted by all agents whenever no common course of action can be established.

It has been shown that an improvement of the driver's mode awareness in automated systems (for supporting the common course of action) requires a detailed process analysis [10]. In order to gain a precise understanding of the interaction primitives occurring in the cooperative scenario, each of the UML activity's phases from Fig. 2 has been modeled as an interaction sequence [1]. As an example, for the suggest action phase, the interaction flow starts in Vr between human (Hr) and machine (Mr, the automation): Mr informs, via the human-machine interface (Ir, e.g. visually via the head-up display), about the request "Searching cooperation partner…", which Hr needs to perceive from Ir and to process cognitively. Aboard the other vehicle Vl, at the same time, automation Ml requests "Do you want to cooperate?" over the interface Il, again perceived and then decided upon by the human Hl. This is followed by a confirmation (e.g. "yes" via a button press), which is then replied to Mr and Ml…

All those interaction primitives during the five phases require in total a certain amount of time, entailing deadlines beyond which cooperation is no longer reasonable and therefore aborted. This amount (unknown at that point) depends on human processing (e.g. how long it takes to answer the question "Do you want to cooperate?" positively or negatively), on the interface modality (the chosen communication channel, like augmented reality) and above all on the scenario itself (e.g. velocities of the vehicles and their distances). While the scenario and the multimodal user interface (see sections II.F and II.G) have been designed by reference to the interaction phases, the timing and abort conditions have been estimated using a literature- and model-based approach [14].

The durations of the request preparation phase and the suggest action phase were estimated on the basis of a study [4], which examined the time it took drivers to take over control of a fully automated vehicle after being completely distracted from the driving task. To achieve this, the authors confronted subjects with a take-over request 4 s, 6 s, or 8 s before the system boundaries were reached and human intervention was necessary. The experimenters found that 4 s provided enough time for the majority of subjects to successfully take over control of the vehicle on a low level of vehicle control. This study was expected to be comparable, as accepting the lane-change is on an even higher level of control than conducting the lane-change itself. Following the study [4, 12], the request preparation phase and the suggest action phase were estimated to last about 4 s as well.

The timing of the prepare phase ($t_{gap}$) was established by using a model-based approach, cf. (1). The variable $d_{gap}$ corresponds to the needed gap width between Vl and the dummy vehicle (D) driving ahead to safely perform a lane-change. The initial distance between Vl and D at the beginning of the prepare phase is represented by $d_{follow}$. $\Delta x_{brake}$ equals the distance that builds up between Vl and D during the deceleration of Vl from $v_{left}$ (36 m/s) to $v_{min}$ (30 m/s) within $t_{brake}$ (2 s) [14]:

$t_{gap} = \frac{d_{gap} - d_{follow} - \Delta x_{brake}}{v_{left} - v_{min}} + t_{brake}$   (1)

With these parameters, the estimated timing of the prepare phase for the lane-change scenario was calculated, cf. (2), to require a total time of 9 seconds:

$t_{gap} = \frac{81\ \text{m} - 33\ \text{m} - 6\ \text{m}}{36\ \text{m/s} - 30\ \text{m/s}} + 2\ \text{s} = 9\ \text{s}$   (2)

By summing up the individual durations for each interaction phase – request preparation (4 s), prepare (9 s), suggest action (4 s) and action (6 s) – an estimated total duration of 23 s was obtained [14]. Whenever the criteria for a successful lane-change cannot be met, the cooperation is aborted. The drivers are informed "cooperation aborted" over the interface multimodally. Succeeding actions (e.g. open gap) are not suggested and at the same time prevented (e.g. change lane). The automation in Vr then resumes normal highly automated driving, which leads to Vr falling into follow mode behind O. Such an abort could happen because of timeouts during all phases (such as missing responses or changed minds) or altered frame conditions (e.g. sudden speed differences). Once the action phase has begun, it is considered irreversible; in case of unexpected events (e.g. stopping vehicles), all involved automations would induce an emergency safe state (i.e. stopping the vehicle).
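The timing model of (1) and (2) translates directly into code. The following sketch is a plain restatement of the formula with the parameter values given above; it is not part of the original simulation software.

```python
def gap_time(d_gap, d_follow, dx_brake, v_left, v_min, t_brake):
    """Duration of the prepare phase according to (1): the time Vl needs,
    at the speed difference (v_left - v_min), to grow the following
    distance d_follow into the required gap d_gap, plus the braking time."""
    return (d_gap - d_follow - dx_brake) / (v_left - v_min) + t_brake

# Parameter values from (2): 81 m gap, 33 m initial distance, 6 m braking
# offset, deceleration from 36 m/s to 30 m/s within 2 s.
t_gap = gap_time(d_gap=81, d_follow=33, dx_brake=6, v_left=36, v_min=30, t_brake=2)
print(t_gap)  # -> 9.0 s, matching the estimate in the text
```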
E. Patterns For and Mode Aspects of Cooperation

Within the project D3CoS, several design patterns were defined for designing and implementing human-machine interaction and interfaces for distributed cooperative systems [15]. Those patterns provide cross-domain, generic, "recipe-like" solutions for recurring problems, such as the design problem that "more than one agent could be affected by the communicated demands of other agents". Agents, as previously defined, may be human and machine: Mr, Hr, Ml, and Hl in this specific case. But several more agents are considered for the lane-change situation, specifically all the vehicles in the left lane with their corresponding drivers and machine agents. If Hr or Mr needs help executing a specific task, e.g. conducting a lane-change, cooperative behavior is needed. One option is to ask: "Could someone please help me in performing the lane-change?" Setting the turn indicator on the road is a metaphor for asking for help in the form of a gap. Generally speaking, the problem is twofold: a diffusion of responsibility on the one hand (nobody might feel responsible for opening gaps), and the addressing of unsuited partners for cooperation on the other (a car too far ahead could respond). The solution this exemplary design pattern provides is called explicit addressing: selecting the best-suited partner for cooperation and targeting the communication/interaction flow directly at that particular agent. In terms of the given example, asking "Driver Hl, could you please assist me by opening a gap?" is expected to produce higher success rates. The efficiency of explicit addressing has been shown in many psychological studies [16]. Nine such design patterns [15] have been used for iterating the interaction sequence and approaching the user interface design for the phases initiation, maintenance, and completion of cooperation.
As mentioned earlier, the improvement of the driver's mode awareness is an important goal of this research. In order to improve mode awareness, Sarter identified three mode aspects that an automated system needs to communicate accordingly [10]: what mode (including tasks and actions) the system is currently running in, an explanation why the system/interaction is in that specific mode, and a projection of the future, i.e. what the next mode or action will be. We extended this concept for cooperative systems by adding the aspect who, in order to communicate the information "who is the cooperation partner" [1]. All interaction primitives of the interaction sequence (see II.D) were ensured to communicate those mode aspects. For example, during the phase request in Vr, the semantic information "searching cooperation partner…" needs to transport the request itself (what) for the upcoming cooperation (next), necessitated by the obstacle ahead requiring a lane-change (why), towards another partner, namely Vl (who).

At the same time, possibilities were researched on how to communicate those cooperative solutions (design patterns) and mode aspects towards the user. The design pattern Group Allocation (which agent is cooperating with whom [15]), for instance, which is closely linked to the mode aspect who, could be communicated by means of a dyadic link [17]. All the following user interface considerations were iterated by and rely on those patterns and aspects.
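The four mode aspects can be thought of as mandatory fields of every message the interface presents. The following data structure is our illustration of that idea, not the implemented interface; only the aspect names and the example content come from the text above.

```python
from dataclasses import dataclass

@dataclass
class ModeAspectMessage:
    # The three mode aspects after Sarter [10], extended by "who" [1]:
    what: str   # current mode, task, or action
    why: str    # explanation for being in that mode
    next: str   # projection of the upcoming mode or action
    who: str    # the cooperation partner the information refers to

# Example from the request phase aboard Vr, as described above:
msg = ModeAspectMessage(
    what="searching cooperation partner...",
    why="lane-change required because of obstacle O",
    next="upcoming cooperation (suggest preparation)",
    who="Vl",
)
```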
F. Augmented Reality

While designing solutions for cooperative interaction and user interfaces that support mode awareness, considerations like dyadic links or techniques for marking who or projecting next quickly demand advanced interface techniques, such as augmented reality (AR), which "allows the user to see the real world, with virtual objects superimposed upon or composited with the real world" [18]. This idea can be transferred technically to information display in cars by using a head-up display (HUD, illustrated in Fig. 3) [19]. The simplest AR display in cars is feasible with a static HUD (Fig. 3, left), where the projected information's position is fixed; more advanced is the contact analogue HUD (Fig. 3, right), which allows linking the projection to the scene dynamically. This technique was introduced for dynamically projecting the braking distance onto the road ahead [20]. Studies show that scene-linked information interfaces allow the user to process scenery and augmented information simultaneously [21]. Scene-linked AR therefore increases mode awareness [10] and has thus been selected for the further user interface design process. The corresponding requirements analysis has been conducted [22].

Figure 3. Superimposed static, position-fixed head-up information (left) vs. scene-linked AR (right) [22].

Furthermore, mode awareness can be influenced by the graphical user interface itself, in the way it displays the scene-linked information. Various aspects such as style, color, size, position and transparency affect the driver's attention and mode awareness. Transparency, for instance, may guide the attention between different projected pieces of information [23]. Coloring can be used in order to group specific information which is not necessarily projected next to each other [24]. Consistency of symbols and shapes supports the aesthetics on the one hand; on the other hand, a consistent presentation can be used to increase the contrast between HUD projection and scenery by embedding symbols in elementary shapes like circles or squares. The projected warning triangle in Fig. 3 (right) is integrated in a red circle, which leads to better contrast and readability; moreover, it generates a uniform visual impression.

Concept studies for all interaction phases and interface perspectives (Il, Ir) were elaborated in different workshops. An example of the iteration is shown in Fig. 4, where the development stages of the idea behind the design pattern explicit addressing can be observed. It is implemented by an AR indicator which, contrary to the classic light indicator, is only displayed to a partner ideally suited for the cooperation (relying on the timing of the interaction sequence in II.D). The mode aspects can be seen as well: why is presented directly by the scenario and the scene-link, who is additionally emphasized by the intensity of the augmentation, next is displayed by tendencies (e.g. the hemisphere) and carpets on the road, and finally what is represented iconographically (e.g. the action suggestion by arrows).

Figure 4. Concept studies of the augmented user interface (action phase, Il).
The graphical user interface was developed through various stages [22]. An important aspect of the final user interface is the use of specific colors for different classes of information. There are three color-coded classes of information: white for non-informative graphical objects (e.g. the carpet for the gap, see Fig. 4 right), blue for passive information (e.g. the blue sphere marking O, see Fig. 6) and green for active information and activity (e.g. the green arrow for the lane-change action, see Fig. 6). In order to establish a border between graphical objects and scenery and to improve the contrast, the user interface uses white borders for each graphical object. The design essence, in return, influenced the creation of design patterns for cooperative user interfaces like "Augmented Reality" [15].

The final set of graphical objects is a tradeoff between a minimal number of elements for reducing distraction and a maximal transport of all interaction semantics. The interface design finally consists of seven elements recurring in both perspectives, namely four scene-linked objects (sphere and suggestion carpet for both perspectives, action arrow left for Vr, see Fig. 6, and action arrow back for Vl) and three statically projected status objects (searching partner, cooperation established, cooperation aborted). The statically projected objects are mainly used within Vr to inform the driver about the preparation process for the lane-change in the traffic behind. The concept of marking vehicle Vr by an indicating (hemi-)sphere was kept through all steps of the iteration (Fig. 4). Additionally, the final interface fades the transparency of all graphical objects.

This reduced set of augmented interface elements led to an amount of accentuated and additional information that was rated acceptable in the workshops. Due to the fact that the cooperation never involves more than one cooperation partner and the interaction is sequential, at most one element is added per phase. Even in high-traffic situations, the user interface thus never overwhelms the driver with information.

G. Multimodality

According to Wickens' S-C-R (stimulus–central processing–response) theory, a compatibility principle for single spatial tasks is to combine human visual input with human manual output modalities [25]. A lane-change conducted by action suggestion is a spatial task. Hence, the user interface was designed multimodally, involving besides the augmented visual channel also the (bidirectional) haptic and auditory channels. This multimodality was achieved as part of the interaction sequence [14]. Within Vl, temporarily overriding the automation's set point for keeping the desired speed on the accelerator pedal (resulting in an acceleration of the vehicle) is used to reject a request (during the request phase) or to abort the cooperation (in all later phases). Overriding the brake pedal and therefore decelerating is used to confirm a request (again during the initial request phase). This matches the driver's mental model (primary inner compatibility [26]) in terms of expected actions and their consequences while driving: opening or closing a gap would be initiated in manual driving in the same way. In the case of Vr, the brake pedal is again used to abort the cooperation (during all phases) or to reject a suggestion (in the suggest phases). A (again temporary) actuation of the steering wheel towards the target direction (i.e. left, which matches the primary outer compatibility [26]) may be used by the driver to confirm the action suggestion and therefore trigger the automated lane-change. For the haptic channel, only deviations from the set point on the inceptors (which keeps the speed or lane) were used to trigger accept or reject, in order to avoid inadvertent actions. In both perspectives, the auditory channel is used by the machine to announce additional action and abort sounds. This kind of multimodal action suggestion cue (on the augmented visual and auditory sensory channels) was used in order to (re)capture the driver's attention. Inherent in the definition of highly automated driving (no surveillance necessary) is the chance of a hypovigilant driver or a user dedicated to activities other than driving (e.g. reading a newspaper). Multimodal cues have been shown to be effective for capturing spatial attention [27].
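As a sketch, the mapping from inceptor deviations to interaction events might look as follows. The assignment of pedals and steering wheel to accept/reject follows the description above (perspective Vl during the request phase, Vr during the suggest action phase), while the signal names and the threshold are assumptions for illustration.

```python
def classify_input_vl(accel_override, brake_override, threshold=0.1):
    """Vl, request phase: braking confirms the request, accelerating
    rejects it. Only deviations from the automation's set point count,
    so that inadvertent actions are ignored (threshold is an assumption)."""
    if brake_override > threshold:
        return "accept"    # open a gap, as in manual driving
    if accel_override > threshold:
        return "reject"    # close the gap / deny cooperation
    return None            # no decision yet

def classify_input_vr(steer_left, brake_override, threshold=0.1):
    """Vr, suggest action phase: steering towards the target lane
    confirms the lane-change suggestion, braking rejects/aborts."""
    if steer_left > threshold:
        return "accept"
    if brake_override > threshold:
        return "reject"
    return None
```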
III. IMPLEMENTING THE SIMULATED COOPERATIVE COURSE

The implementation of the previously discussed cooperative interaction and user interface concept was done in the static driving simulator at the Institute of Ergonomics. The fully instrumented BMW 640i mockup is surrounded by a ca. 270° view on three front and three rear screens. The simulation environment SILAB was used, which is expandable by data processing units (DPUs). Those DPUs can be sophisticated graphical 3D objects, predefined logical blocks, or custom C++ libraries to realize more complex calculations and simulation control flows.

Figure 5. Relationship between scenario (green), vehicle (yellow), interaction (red) and interface (blue) control.

In this way, DPUs were implemented as shown in Fig. 5. The central interaction control (red) is the main component, which, on the one hand, coordinates the scenario timing (calculation of the vehicle interplay from the scenario, green) and the interaction timing (when to switch between phases). On the other hand, the interaction control is responsible for the arbitration (between traffic, again green) and for the execution of actions and suggestions (open gap, get slower, change lane, etc.). This again is done for the traffic (traffic control) and the own vehicle, which is controlled by automation (lateral and longitudinal control, orange). The interaction control interfaces with the user interface control (violet), whose main task is the coordination of the ten graphical AR objects (blue). The haptic inceptors brake, active accelerator, and active steering wheel (all blue) are handled by the automation control, which passes specific interaction requests from and to the interaction control. An example of the transfer from the user interface design prototype (see Fig. 6 left) to the implemented final simulation (see Fig. 6 right) is shown for the suggest action phase from the perspective of Vr. As the specific interaction/interface flow in combination with the timing is hard to imagine, a video of the simulated scenario is available (http://perm.ly/d3cos) from both vehicle perspectives.
Figure 6. Exemplary design (left) and implementation (right) in the simulator.

IV. EXPERIMENTAL EVALUATION

The experimental evaluation was conducted in the same driving simulator with a sample of 25 paid subjects, mostly university students, with a mean (M) age of 22.24 and a standard deviation (SD) of 1.96 years. We used a within-subjects design; each of the subjects ran through ten randomized conditions in one single trial of ca. 52 minutes. There were no breaks between the conditions. All participants did an additional five-minute acclimatization drive. Of 24 possible combinations depicting both perspectives Vl and Vr, conditions with and without the cooperative system (CoS), short and long timing (20 s vs. 30 s), and finally three different workload conditions (no, medium, high in terms of a response task), we chose a subset of ten conditions. We operationalized five of the research aspects outlined in section I for evaluation: gaze patterns, the efficiency of the design, the quality of the user interface, the accuracy of the estimated timing, as well as a workload measurement. The latter three are discussed in detail in the following subsections IV.A to IV.C.

The experimental procedure was – for each of the ten conditions – first to induce workload (in the different levels) while driving highly automated on the motorway without additional tasks (ca. 180 s). Second, the subject was activated by means of a gong sound when the request interaction phase started. Third, the subjects were prompted (in suggest preparation and suggest action) to perform the described highly automated cooperative lane-change by mutual control (in the different perspectives, timings, and with/without CoS, 20–30 s) by accepting the suggestion (or ignoring/rejecting it). Finally, we collected feedback by questionnaires ten times while the highly automated drive continued (ca. 120 s). The measurements taken were experiment time, condition, interaction phase, interface activity (e.g. steering), vehicle positions and distances, gaze positions, glanced objects (as a video), different visual metrics (elaborated in IV.C) and two questionnaires (NASA-TLX and a qualitative one).

A. Quality of User Interface

The evaluation of the user interface [22] was based on a post-experiment questionnaire consisting of three parts. The subjects were asked to review general aspects of the interface, to interpret the subjectively perceived meaning of each interface view, and to respond to additional questions about some specific views on a seven-point ordinal scale (−3 to +3). For the perceived meaning, free-form comments were collected as well.

Results show that the graphical interface supported the test subjects during the cooperative lane-change scenario. The view for the phase suggest action aboard the vehicle Vr (see Fig. 6) was rated highest (M = 2.8, SD = 0.6). Especially the scene-linked green arrows in combination with the carpet (symbolizing the gap) helped the test persons to perform convincingly. Nevertheless, there was one negatively rated view (M = −0.1, SD = 1.7), which was presented in the static HUD to inform about the search for an appropriate cooperation partner. A central issue in this view is an included animation, which moves a dot in a circle around two cars to symbolize the connection being established and which was difficult to recognize.

B. Timing

Based on the estimated overall duration (23 s) of the cooperative lane-change scenario, a general theory T1 about the timing was inferred [14]. T1 states that 30 s provides enough time for drivers to successfully complete a cooperative lane-change, whereas 20 s is not sufficient. To make assumptions about the accuracy of T1, three measurable hypotheses were postulated. The first hypothesis (H1.1) proposed success rates greater than 90% if participants were given 30 s to complete the cooperative lane-change scenario. The second hypothesis (H1.2) stated that – compared to 30 s – success rates would significantly drop if the whole duration of the scenario was limited to 20 s. Lastly, the third hypothesis (H1.3) predicted a significant increase in perceived workload if test participants were given 20 s to perform the cooperative lane-change instead of 30 s.

The data collected in the simulator experiment supports H1.1. In the left and right perspective (see Fig. 7), the success rates for the long timing are around 90%. Additionally, success rates in the left perspective showed a significant drop (48% vs. 88%) when drivers were given only 20 s to perform the cooperative lane-change. Since success is dichotomous data collected in paired samples, we used McNemar's test for testing significance. This outcome supports H1.2, whereas the data collected in the right perspective contradicts H1.2, since in this case success rates were overall very high. Looking at the subjects' perceived workload, it was found that it did not differ significantly between scenarios with durations of 20 s or 30 s; thus H1.3 had to be rejected.

Figure 7. Percentage of success rate for the two cooperation perspectives (left and right) for no CoS, 20 s CoS (short timing) and 30 s CoS (long timing): significantly higher lane-change success rates for the long timing on the left lane. McNemar test with p < 0.01 (**) and p < 0.001 (***).
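Since success is a dichotomous measure collected in paired samples (each subject drove both timing conditions), McNemar's test is the appropriate significance test. The following is a minimal sketch of the exact variant, which operates on the discordant pairs only; the example counts are invented for illustration and are not the study data.

```python
from scipy.stats import binom

def mcnemar_exact(b, c):
    """Exact McNemar test: b and c count the discordant pairs (success in
    one condition, failure in the other). Under H0 the discordant pairs
    split 50/50, so the p-value follows a binomial distribution."""
    n, k = b + c, min(b, c)
    return min(1.0, 2 * binom.cdf(k, n, 0.5))

# Hypothetical paired outcomes: 10 subjects succeeded at 30 s but failed
# at 20 s, 1 subject the other way around.
print(mcnemar_exact(b=10, c=1))  # small p -> significant drop in success
```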
The overall high success rates in the right perspective might be rooted in the experiment design. By deflecting the steering wheel, subjects were able to confirm the action suggestion and to initiate the lane-change [14]. However, subjects were only able to overrule the automation and to weigh in with their actions – steering, braking and accelerating – if the system allowed it (referring to the common course of action). In addition, the right perspective was overrepresented in the experiments. Thus, learning effects within the group of subjects could not be avoided completely: they presumably started to respond to the scenario, the CoS and its user interface with automated behavior.

These two aspects of the experiment design allowed subjects to trick the CoS. They simply started steering before the action suggestion phase had begun. If subjects kept the steering wheel deflected, the CoS would initiate the lane-change as soon as the action suggestion phase had started. Consequently, subjects were not required to react to the suggestion by the CoS. The early steering led to consistently high success rates in the right perspective that contradict H1.2. However, in the left perspective subjects were not able to outsmart the CoS and were forced to properly react to the cooperation requests. That is why the success rates in these cases show the expected drop between the 30 s and 20 s timing conditions and support H1.2.

In conclusion, a duration of 30 s should provide enough time for the majority of drivers to successfully perform a cooperative lane-change in both perspectives (left/right). Moreover, 20 s should be considered the minimum duration for the proposed scenario, since drivers have to react rather quickly (< 4 s) to the suggest action or the request preparation phase. Although only the left perspective supports T1, it may be assumed that under the right circumstances (automation can be overruled) a similar outcome might be observed in the right scenarios as well. One further theory (T2) examined the influence of the CoS on the subjects' willingness to cooperate [13]. It postulated that with the CoS in place, subjects would be more likely to help other drivers to facilitate a lane-change, since they are directly asked by the CoS to cooperate. However, findings show no significant difference in the left-lane drivers' willingness to cooperate whether the CoS was active or not.

The last theory (T3) evaluated the safety benefits of the CoS by comparing the clearances between the involved vehicles – Vl, Vr and O – at the beginning of the action phase. The mean distances (see Fig. 8) were significantly larger when the CoS was active. Vehicle distances were collected as independent interval data, but appeared not to be normally distributed (p < 0.001) according to the Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) tests in the right scenario while the CoS was active. This was expected, as a lane-change was not suggested/executed by the automation if the safety clearances were not met. The right scenario condition without CoS, however, was normally distributed (KS, SW, n.s.). Levene's test furthermore showed a heterogeneity of variance (p < 0.001). Consequently, the Wilcoxon signed-rank test was deployed. Following this, drivers accepted significantly smaller distances between following and leading vehicles (at 33 to 36 m/s) during the lane-change at the beginning of the action phase if no CoS intervened. Therefore, an increase in safety was confirmed as long as the CoS was deployed, as hazardous actions were prevented by the system.

Figure 8. Evaluation of timing theory T3 for the mean safety clearances (error indicator: SD) in the right scenario: significantly lower distances without CoS and a significantly earlier (worse coordinated) lane-change. Wilcoxon test with p < 0.05 (*), p < 0.01 (**), p < 0.001 (***).
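Because the distance data were not normally distributed and the variances were heterogeneous, the non-parametric Wilcoxon signed-rank test was used for the paired comparison, as stated above. A minimal sketch with scipy; the sample values are invented placeholders, not the recorded clearances.

```python
from scipy.stats import wilcoxon

# Paired safety clearances per subject at the beginning of the action
# phase, without CoS vs. with CoS (values are illustrative only).
without_cos = [28.1, 30.5, 27.9, 29.4, 26.8, 31.0, 28.7, 27.5]
with_cos    = [44.9, 46.2, 45.1, 45.6, 43.8, 47.0, 46.5, 44.2]

stat, p = wilcoxon(without_cos, with_cos)
print(stat, p)  # small p -> significantly larger clearances with CoS
```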
As the automation inhibited a precipitated lane-change in the CoS condition, we furthermore analyzed the distances at the moment of the first steering wheel deflection (which in 92% of the cases occurred earlier than the action suggestion). The results suggest an opposite effect: for instance, the distance between Vr and D was significantly shorter in the CoS condition (Mdn = 23.9) compared to the condition without CoS (Mdn = 28.9; T = 86, p < 0.05, r = −0.29). We assume that the confirmed willingness to cooperate increased the drivers' confidence and certainty, so that changing the lane earlier was perceived as safe with the CoS activated.

C. Workload

Two hypotheses were proposed concerning the driver's workload. The first one assumed that mental workload (MWL) increases with growing complexity of the n-back task (a delayed response task with simple auditory stimuli to distract the subject from driving). The second hypothesis suggests both a U-shaped relation between MWL and performance and a U-shaped function between MWL and prospective MWL. Thus, the lowest reaction time (best performance) and the least stressful situation for the driver are achieved in a medium workload condition, as this reflects the optimal activation level. Both a low- and a high-demand situation cause longer reaction times: in the former, the driver is mentally out of the situation, his activation level is low, and he is overstrained by a critical event; in the latter, his mental abilities are already challenged, so no further resources remain for the actual event, which leads to worse performance and more stress as well [28, 29].

In the literature, many ocular activity parameters serve as indicators of increased mental workload. The data suggests that during higher levels of MWL, operators tend to blink quickly and less often (shorter blink duration and lower blink frequency), experience dilated pupils, keep their eyes less closed, exhibit more but shorter saccades, and fixate more often and longer on important areas of interest [30, 31, 32].
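Most of the listed indicators can be derived from a stream of eye-tracker samples. The following sketch computes blink frequency, mean blink duration, and mean pupil diameter from a simple sample format; the record layout and the sampling rate are assumptions, not the format of the eye tracker actually used in the experiment.

```python
def ocular_metrics(samples, rate_hz=60.0):
    """samples: list of (pupil_diameter_mm, eye_closed) tuples at a fixed
    sampling rate (layout assumed). Returns simple MWL indicators:
    blink frequency [1/s], mean blink duration [s], mean pupil diameter [mm]."""
    blinks, durations, run = 0, [], 0
    for _, closed in samples:
        if closed:
            run += 1              # ongoing eye closure
        elif run:
            blinks += 1           # closure ended -> one blink counted
            durations.append(run / rate_hz)
            run = 0
    open_pupils = [d for d, closed in samples if not closed]
    return {
        "blink_freq": blinks / (len(samples) / rate_hz),
        "mean_blink_dur": sum(durations) / len(durations) if durations else 0.0,
        "mean_pupil_mm": sum(open_pupils) / len(open_pupils) if open_pupils else 0.0,
    }
```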
The analysis of the workload during the n-back task supported the first hypothesis quite well. Most of the mentioned visual metrics, such as the pupil diameter, the closure of the eye, the blink frequency and the duration of the saccades, were investigated. Figure 9 depicts the significantly increased pupil diameter of the right eye with growing workload (a diameter of 4.43 mm for the no n-back task compared to 5.84 mm for the 3-back task). Post-hoc tests showed a significant difference between the no and the medium as well as the no and the high workload condition. Also significant were the results for the pupil diameter of the left eye and the blink duration. Although the other variables did not show statistically significant differences, the descriptive data supported the idea of an increase in MWL.

Figure 9. Evaluation of increased workload during the n-back condition (before user activation) by means of the mean pupil diameter (error indicator: SD). ANOVA with p < 0.001 (***).

In addition, much evidence was found for the second hypothesis. The steering response latency and the total duration of the lane-change were investigated with regard to the reaction time. Although the results were not statistically significant, the descriptive data indicated that the medium workload condition is associated with the best reaction time. Compared to the steering response latency in the no workload (13.24 s) and the high workload condition (13.18 s), the mean value of 10.25 s for the medium workload condition is markedly smaller. As the total duration correlates with the steering response latency, a similar trend is observable for this parameter.

Some of the above-mentioned visual metrics were also analyzed concerning the prospective mental workload. For example, significantly fewer and shorter gazes on the dynamic AOIs were found for the medium workload condition. Additionally, the pupil diameter of the right eye, the number of saccades and the number of blinks showed significant results. Figure 10 illustrates the U-shaped trend with the smallest number of saccades (M = 7.2) for the medium workload condition compared to the low (M = 12.7) and high (M = 9.6) workload conditions.

Figure 10. Evaluation of U-shaped performance during the lane-change condition (after user activation) using the mean number of saccades for estimating different levels of workload (error indicator: SD). ANOVA with p < 0.05 (*).
D. Discussion

Regarding the augmented user interface, the qualitative feedback offers clues about its benefit, which essentially lay in supporting cooperative actions and their suggestions by using the hemisphere, arrow, and carpet objects. Considering the certainty about the system status during the initialization of the cooperation, the static head-up elements (e.g. searching) did not work well. The hypotheses corresponding to the timing theory T1 could only partially be confirmed, but provide evidence that 30 s is an appropriate amount of time for a cooperative lane-change interaction. The interaction itself was not able to improve the willingness to cooperate (T2), so additional motivational strategies will become important. The cooperative paradigm mutual control supported safety (T3) by prohibiting risky actions, but this strict overruling should be attenuated in the future in order to increase comparability. A variety of visual workload assessments has been used, formulated as a design pattern [15], and will be used for the future individual modification of the interaction timing based on user availability, as it affects the prospective workload. In this regard, a U-shaped relation between MWL, performance and prospective MWL was confirmed.

V. ITERATION AND NEXT STEPS

Although it would add complexity to the scenario's operationalization, participants of future studies should be able to overrule the automation and the cooperative system. Thus, it might be possible to collect unbiased data on perception response times for both perspectives. Furthermore, it might be helpful to create a more diversified scenario design with even more diversified experimental conditions in order to reduce learning effects and signs of fatigue within the sample and to gain a more accurate estimation of the needed time.

For additional studies on the drivers' willingness to cooperate, it must be assured that subjects are not biased at the beginning of the experiments. The introduction to the experiment should not contain any implications for the participants to drive cooperatively. But as the willingness to cooperate is expected to be linked to the necessity to cooperate, criticality should be introduced in the form of varying speed differences between the vehicles and between Vr and O. Higher levels of criticality will be included in the follow-up study in terms of modified scenarios and an updated interaction concept, in order to measure the benefit in safety. To further motivate cooperation, a rewarding system will be introduced in our further studies.

A. Ring Closure of State Inference and Adaptation

The perspective for improving the mutual control is to adapt the system dynamically based on the driver's state (which needs to be inferred by the machine). We intend to reach this in three steps: from the current post-hoc analysis of gaze patterns on dynamic areas of interest (like other cooperation partners, obstacles, or AR), where we depict the human visual processing in the lane-change scenario, via an online state inference of the driver's visual attention during the simulation and an online estimation of workload (which needs to be calibrated individually per subject), to a real-time adaptation of the system mode and interface. The adaptation based on workload would be a task redistribution (in terms of mutual control: the machine decides if the driver is overloaded), and the adaptation based on visual processing targets visual cues for guiding the driver's attention. Currently we are running a pretest with a simple two-dimensional search task.
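A minimal sketch of the intended ring closure; the thresholds, the calibration structure, and the concrete adaptation actions are assumptions for illustration, not the planned implementation.

```python
def adapt(workload_estimate, gaze_on_relevant_aoi, subject_calibration):
    """Real-time adaptation step (simplified): redistribute tasks when the
    individually calibrated workload threshold is exceeded, and cue the
    driver's attention when relevant AOIs are not being looked at."""
    actions = []
    if workload_estimate > subject_calibration["overload_threshold"]:
        actions.append("machine takes over task")   # mutual control: machine decides
    if not gaze_on_relevant_aoi:
        actions.append("show visual cue in AR")     # guide visual attention
    return actions

print(adapt(0.9, False, {"overload_threshold": 0.7}))
```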
B. Measuring Cooperation

The success rates shown in IV.B suggest that the interaction concept at least supports cooperative behavior. In order to answer the question "Does the interaction concept improve the cooperation?", a precise metric is needed, because success rates are not sufficient for rating cooperativeness. For arbitrary vehicle groups (e.g. two vehicles), we expect non-cooperative behavior to lead to more frequent changes in longitudinal and lateral speed (the signal) and to observable phase shifts. This is measurable by using digital signal processing methods [33]. We are currently establishing such a measurement method in order to test three assumptions for better cooperation: lower frequencies are visible after a Fourier transform, the cross-correlation of the considered signals is higher, and their phase shift is smaller.
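A minimal numpy sketch of the three signal-based indicators named above, applied to two vehicles' speed traces; the synthetic signals and the sampling rate are illustrative assumptions, not the measurement method under development.

```python
import numpy as np

def cooperation_indicators(v1, v2, rate_hz=10.0):
    """Speed traces v1, v2 of two vehicles (equal length, fixed rate).
    Returns the dominant frequency of each signal, the peak of the
    normalized cross-correlation, and the corresponding lag in seconds."""
    a = (v1 - v1.mean()) / (v1.std() or 1.0)
    b = (v2 - v2.mean()) / (v2.std() or 1.0)
    freqs = np.fft.rfftfreq(len(a), d=1.0 / rate_hz)
    dom = lambda x: freqs[1:][np.argmax(np.abs(np.fft.rfft(x))[1:])]  # skip DC
    xcorr = np.correlate(a, b, mode="full") / len(a)
    lag = (np.argmax(xcorr) - (len(a) - 1)) / rate_hz  # phase shift estimate
    return dom(a), dom(b), xcorr.max(), lag

# Cooperative pair: slow, nearly in-phase speed changes with a 1 s shift.
t = np.arange(0, 60, 0.1)
v1 = 30 + 0.5 * np.sin(2 * np.pi * 0.05 * t)
v2 = 30 + 0.5 * np.sin(2 * np.pi * 0.05 * (t - 1.0))
print(cooperation_indicators(v1, v2))
```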
ACKNOWLEDGMENTS

We thank Johann Kelsch, Matthias Heesen and Gerald Temme (DLR) for fruitful discussions on the interaction design; Michael Krause, Moritz Körber and the TUM Graduate School for proofreading this article.
REFERENCES

[1] M. Zimmermann and K. Bengler, "A Multimodal Interaction Concept for Cooperative Driving," in 2013 IEEE Intelligent Vehicles Symposium (IV), 2013, pp. 1285–1290.
[2] K. Bengler, M. Zimmermann, D. Bortot, M. Kienle, and D. Damböck, "Interaction Principles for Cooperative Human-Machine Systems," it – Information Technology, vol. 54, no. 4, pp. 157–164, 2012.
[3] J.-M. Hoc, "Towards a cognitive approach to human-machine cooperation in dynamic situations," International Journal of Human-Computer Studies, vol. 54, no. 4, pp. 509–540, 2001.
[4] D. Damböck, "Automationseffekte im Fahrzeug – von der Reaktion zur Übernahme," Dissertation, Institute of Ergonomics, Technische Universität München, Munich, 2013.
[5] A. Lüdtke, D. Javaux, F. Tango, R. Heers, K. Bengler, and C. Ronflé-Nadaud, "Designing Dynamic Distributed Cooperative Human-Machine Systems," in Work: A Journal of Prevention, Assessment and Rehabilitation, IOS Press, Ed., 2012, pp. 4250–4257.
[6] P. Hidas, "Modelling vehicle interactions in microscopic simulation of merging and weaving," Transportation Research Part C: Emerging Technologies, vol. 13, no. 1, pp. 37–62, 2005.
[7] M. Heesen, M. Baumann, J. Kelsch, D. Nause, and M. Friedrich, "Investigation of Cooperative Driving Behaviour during Lane Change in a Multi-Driver Simulation Environment," Human Factors and Ergonomics Society (HFES) Europe Chapter Conference, Toulouse, 2012.
[8] J.-M. Hoc, "Human and automation: a matter of cooperation," HUMAN 07, Timimoun, Algeria, 2007.
[9] T. F. Golob and W. W. Recker, "A method for relating type of crash to traffic flow characteristics on urban freeways," Transportation Research Part A: Policy and Practice, vol. 38, no. 1, pp. 53–80, 2004.
[10] N. B. Sarter and D. D. Woods, "How in the World Did We Ever Get into That Mode? Mode Error and Awareness in Supervisory Control," Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 37, no. 1, pp. 5–19, 1995.
[11] T. M. Gasser, "Rechtsfolgen zunehmender Fahrzeugautomatisierung: Gemeinsamer Schlussbericht der Projektgruppe," Berichte der Bundesanstalt für Straßenwesen, no. F 83, 2012.
[12] C. Gold, D. Damböck, L. Lorenz, and K. Bengler, "'Take over!' How long does it take to get the driver back into the loop?," Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 57, no. 1, pp. 1938–1942, 2013.
[13] E. Donges, "Aspekte der aktiven Sicherheit bei der Führung von Personenkraftwagen," in Automobil-Industrie, Würzburg: Vogel, 1982, pp. 183–190.
[14] S. Bauer, "Evaluating the Timing Conditions of a Cooperative Lane Change Scenario," Diploma Thesis, Lehrstuhl für Ergonomie, Technische Universität München, Munich, 2013.
[15] M. Zimmermann, Ed., "Public Deliverable D3-10: Reference Designs and Design Patterns for Multi-Modal Human Machine Interfaces – Final Version," ARTEMIS JU 269336-2, Munich, 2014.
[16] G. Barron and E. Yechiam, "Private e-mail requests and the diffusion of responsibility," Computers in Human Behavior, vol. 18, no. 5, pp. 507–520, 2002.
[17] J. Kelsch, G. Temme, and J. Schindler, "Arbitration based framework for design of holistic multimodal human-machine interaction," in AAET 2013, Braunschweig, Germany, 2013.
[18] R. T. Azuma, "A Survey of Augmented Reality," Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pp. 355–385, 1997.
[19] N. J. Ward, A. M. Parkes, and P. R. Crone, "The effect of background scene complexity on the legibility of head-up-displays for automotive applications," IEEE Vehicle Navigation and Information Systems Conference Proceedings, pp. 457–462, 1994.
[20] H. Bubb, "Untersuchung über die Anzeige des Bremsweges im Fahrzeug," Dissertation, Technische Universität München, 1975.
[21] D. C. Foyle, R. S. McCann, and S. G. Shelden, "Attentional Issues with Superimposed Symbology: Formats for Scene-Linked Displays," Proceedings of the Eighth International Symposium on Aviation Psychology, pp. 98–103, 1995.
[22] N. Lütteken, "Design und Implementierung eines augmentierten Anzeigekonzeptes für die Mensch-Computer Interaktion in einem kooperativen Fahrstreifenwechsel-Szenario," Bachelor Thesis, Lehrstuhl für Ergonomie, Technische Universität München, Munich, 2013.
[23] B. L. Harrison, G. Kurtenbach, and K. J. Vicente, "An experimental evaluation of transparent user interface tools and information content," in Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, Pittsburgh, PA, USA: ACM, 1995, pp. 81–90.
[24] F. Hall, "Software-Gestaltung: Farbe auf dem Bildschirm," Computer und Arbeit, vol. 8, no. 9, 2007.
[25] C. D. Wickens, D. L. Sandry, and M. Vidulich, "Compatibility and Resource Competition between Modalities of Input, Central Processing, and Output," Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 25, no. 2, pp. 227–248, 1983.
[26] H. Bubb, "Systemergonomie," in Ergonomie, H. Schmidtke, Ed., München: Hanser, 1993.
[27] C. Spence and V. Santangelo, "Capturing spatial attention with multisensory cues: A review," Multisensory integration in auditory and auditory-related areas of cortex, vol. 258, no. 1–2, pp. 134–142, 2009.
[28] I. M. Rothkirch, "Evaluation of the driver mode with eye tracking in a highly automated lane-changing situation," Bachelor Thesis, Fakultät für Informatik & Lehrstuhl für Ergonomie, Technische Universität München, Munich, 2013.
[29] M. Zimmermann, I. M. Rothkirch, and K. Bengler, "Reading the Driver: Visual Workload Assessment in Highly Automated Driving Scenarios," Proceedings of the 5th International Conference on Applied Human Factors and Ergonomics (AHFE 2014), 2014.
[30] M. S. Jessee, "Ocular Activity as a Measure of Mental and Visual Workload," Human Factors and Ergonomics Society Annual Meeting Proceedings, vol. 54, no. 18, pp. 1350–1354, 2010.
[31] S. T. Iqbal and B. P. Bailey, "Using eye gaze patterns to identify user tasks," The Grace Hopper Celebration of Women in Computing, 2004.
[32] U. Ahlstrom and F. J. Friedman-Berg, "Using eye movement activity as a correlate of cognitive workload," International Journal of Industrial Ergonomics, vol. 36, no. 7, pp. 623–636, 2006.
[33] S. W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing. San Diego, CA: California Technical Publishing, 1997.