Proceedings of the Human Factors and Ergonomics Society 58th Annual Meeting - 2014
Modeling the Effect of Loudness and Semantics of Speech Warnings on Human Performances

Yiqi Zhang and Changxu Wu
State University of New York (SUNY) at Buffalo, United States

The quantitative prediction and understanding of human performance in response to speech warnings is essential to improving warning effectiveness. The queuing network-model human processor (QN-MHP), a computational architecture, enables researchers to model dual-task information processing. The current study enhanced QN-MHP by modeling the effects of loudness and semantics on human responses to speech warning messages. The model's predictions of crash rate were validated against two empirical studies of collision warning systems, with resulting R squares of 0.73 and 0.77, respectively. The developed mathematical model could be further used to optimize the design of speech warnings for the greatest safety benefit.

Copyright 2014 Human Factors and Ergonomics Society. DOI 10.1177/1541931214581172

INTRODUCTION

Warning messages play an important role in communicating information about potential hazards to avoid accidents and injuries (Laughery, 2006). To facilitate communication between humans and machines and to improve the safety of the entire system, it is increasingly important to develop effective methods for evaluating warning effectiveness in alarm systems. To date, much work has studied warning effectiveness empirically (Laughery, 2006; Mortimer, 2007). However, behavioral approaches to assessing speech warning effectiveness can be highly task-dependent, time-consuming, and costly. A model that predicts human responses to warning systems is still missing from current practice but is indispensable for complex human-machine systems. Repeated application of such a model can show designers the effects of different design parameters, and of their interactions, on human performance. The present work attempted to develop a mathematical model of human responses to speech warning messages. Two message parameters were modeled: the acoustic properties and the semantic content. The validity of the model was tested against two empirical studies. The validated model can then be used to predict the effectiveness of speech warning messages for different warning systems.

MATHEMATICAL MODELING OF HUMAN PERFORMANCES TO SPEECH MESSAGES

Overview of Queuing Network-Model Human Processor

Queuing Network-Model Human Processor (QN-MHP) is a computational model which integrates three discrete serial stages (perceptual, cognitive, and motor processing) into three continuous-transmission sub-networks of servers (see Figure 1).

Since this architecture was established, QN-MHP has been applied to quantify various aspects of human cognition and performance, for instance, human driving speed control (Zhao, Wu, & Qiao, 2013), human mental workload (Wu, Liu, & Quinn-Walsh, 2008), and the reinforcement learning process (Wu, Berman, & Liu, 2010). The integration of the mathematical modeling work of QN-MHP to quantify various aspects of human performance can be found at its webpage (“Mathematical Modeling of Human Performance”, 2013).
Enhancements of Queuing Network-Model Human Processor (QN-MHP)
Figure 1. Enhanced QN-MHP architecture

The message parameters modeled in this work, acoustic properties and semantic content, have significant effects on human behavior (Baldwin, 2011; Jang, 2007). It has been shown that the hemodynamic response signal increases with increasing sound level in the auditory cortex, as well as at lower processing levels (Uppenkamp et al., 2006). Therefore, the effect of the acoustic properties of messages is modeled at Server 6. The semantic content of messages, which is recognized at the superior temporal sulcus, is modeled at Server 8. Memory decay may occur because of interference from the auditory information in the messages of on-going tasks. The effect of message parameters on memory decay is modeled in the working memory system for auditory processing (i.e., Servers B and C). Two stages are involved in processing warning signal words associated with hazard perception (Ma, Jin, & Wang, 2010). The first stage is an early, rapid, automatic activity influenced by stimulus strength, and the second stage is the activation of hazard evaluation. The route choice is modeled at the posterior system represented by Server B. A previous fMRI study also indicated that hazard rating activated the medial prefrontal cortex, the inferior frontal gyrus, the cerebellum, and the amygdala (Vorhold et al., 2007), which is modeled at Server F.
MATHEMATICAL MODELING MECHANISMS

QN-MHP is used to model responses to speech messages, from the human's perception of the speech message to the reception of neural signals at the primary motor cortex. To estimate reaction time and error rates, the route of speech message responses in QN-MHP is defined as:

5 → 6/7 → 8 → B → C → F → C → W → Z → 21/22/23/24/25
Modeling the Effect of Message Parameters on Subjective Rating (Perceived Urgency and Annoyance)

Loudness has been reported to have a positive relationship with expressed urgency (Hellier et al., 2002). Quantifying this relationship with Stevens' power law, perceived urgency (U) and annoyance (A) were modeled as follows:

$$\log(x_L) = m_x \log(L) + k_x, \quad x \text{ is } U \text{ or } A \tag{1}$$

where $m_x$ quantifies the relationship between the perceived value and the objective acoustic change, and $L$ denotes the loudness level. The parameters are quantified as $m_U = 1.33$, $m_A = 1.45$, $k_U = -0.64$, and $k_A = -0.91$ (Gonzalez, Lewis, Roberts, Pratt, & Baldwin, 2012; Parmanen, 2007).

The relationship between semantic and perceived urgency

Wogalter and Silver (1990, 1995) had participants rate how careful they would be in response to a set of signal words. Similar sets of words have been studied before in detail (Hollander & Wogalter, 2000) using the word “notice” rather than “note”. The perceived urgency of “danger”, “caution”, and “notice” spoken by a female speaker, on a 100-point scale, was quantified as 90.53, 72.40, and 46.81, respectively (Hellier, Edworthy, Weedon, Walters, & Adams, 2002).

Modeling the Effect of Message Parameters on the Probability of Route Choice

The effects of message parameters on reaction time ($I_{RT,i}$) and response error rate ($I_{E,i}$) were modeled with different route choices in message information processing. A shorter route (Route I) is chosen when humans have learned, from daily life, the proper responses to messages in urgent situations. However, this short route can be inhibited when the perceived urgency level is below the threshold needed to trigger it. In that case, a longer route (Route II) that includes the decision-making process is taken. When processing information through Route II, humans take longer to respond properly, and the probability of making errors decreases correspondingly because the speech messages are processed in detail.
As an example, drivers tend to brake rather than accelerate to avoid a potential collision in an emergency situation. Studies of speech messages have established criteria for the message parameters relevant to emergency situations. In terms of message loudness, Berg (1973) reported that the 50% probability threshold of a startle response was 85 dB(A) SPL. In terms of semantic content, different signal words express different levels of perceived urgency (Hollander & Wogalter, 2000). Therefore, a speech message that is louder than 85 dB and contains a certain signal word (e.g., “Danger”) would represent an emergency situation.

Route I: 5 → 6/7 → 8 → B → W → Z → 21/22/23/24/25
Route II: 5 → 6/7 → 8 → B → C → F → C → W → Z → 21/22/23/24/25

The route choice probability ($P_i$) at a certain server was modeled in a previous work (Wu & Liu, 2008):

$$P_i = \frac{1/S_i}{\sum_{j=1}^{U} 1/S_j}$$

where $S_i$ is the travel time of route $i$. The effect of perceived urgency on route choice was modeled in the phonological loop (Server B). Both acoustic properties (e.g., loudness) and semantic content (e.g., signal words) affect the urgency level of the situation that the message describes. When humans perceive different urgency levels from the loudness and from the signal words, the details of the message must be processed through the longer route (Route II) in the cognitive subnetwork. The difference between the urgency conveyed by a speech message's loudness ($U_L$) and by its semantics ($U_S$) was therefore added to the model of route choice probability ($P_i$):

$$P_i = \frac{1/S_i}{\sum_{j=1}^{U} 1/S_j} \left( \frac{|U_S - U_L|}{k} \right) \tag{2}$$

where $i$ refers to Route II, $P_{\mathrm{I}} = 1 - P_{\mathrm{II}}$, and $k$ is a constant parameter.
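The route choice step can be illustrated with a short numerical sketch. It assumes that equation (1) uses base-10 logarithms and that equation (2) scales the baseline probability of Route II by $|U_S - U_L| / k$. The travel times, the value of `k`, and the clamping to the interval [0, 1] are hypothetical choices made only for illustration; none of them comes from the paper.

```python
import math
from typing import Sequence

# Parameters reported with equation (1) for perceived urgency.
M_U, K_U = 1.33, -0.64


def urgency_from_loudness(level_db: float) -> float:
    """Equation (1) solved for U_L, assuming base-10 logarithms."""
    return 10 ** (M_U * math.log10(level_db) + K_U)


def baseline_route_probability(travel_times: Sequence[float], i: int) -> float:
    """P_i = (1/S_i) / sum_j(1/S_j), the form attributed to Wu & Liu (2008)."""
    inverse = [1.0 / s for s in travel_times]
    return inverse[i] / sum(inverse)


def route_ii_probability(travel_times: Sequence[float], u_s: float,
                         u_l: float, k: float) -> float:
    """Equation (2), read as baseline P_II times |U_S - U_L| / k.

    Clamping to [0, 1] is an assumption added here, not part of the source.
    """
    scaled = baseline_route_probability(travel_times, 1) * abs(u_s - u_l) / k
    return min(1.0, max(0.0, scaled))


# Hypothetical inputs: travel times (s) for Route I and Route II, and k.
travel_times = [0.5, 1.0]
k = 50.0
u_l = urgency_from_loudness(85.0)  # loudness at the 85 dB startle threshold
u_s = 90.53                        # reported urgency rating of "danger"

p_ii = route_ii_probability(travel_times, u_s, u_l, k)
p_i = 1.0 - p_ii
print(f"U_L = {u_l:.2f}, P_II = {p_ii:.3f}, P_I = {p_i:.3f}")
```

With these placeholder values the loudness and the signal word convey similar urgency, so the sketch assigns most of the probability to Route I, in line with the qualitative description above.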
Modeling the Effect of Message Parameters on Message Reaction Time

The reaction time was defined as the duration from the moment the message occurred to the moment the human started to react. QN-MHP assumes that entity processing time at an individual server is independent of entity arrivals and that routing is independent of the state of the system. Therefore, the reaction time ($RT_i$) to a speech auditory stimulus can be modeled by summing the processing times of all the servers on route $i$:

$$RT_i = \begin{cases} T_5 + T_6 + T_8 + T_B + T_W + T_Z, & i = \mathrm{I} \\ T_5 + T_6 + T_8 + T_B + T_C + T_F + T_C + T_W + T_Z, & i = \mathrm{II} \end{cases} \tag{3}$$
where $T_k$ is the processing time of a stimulus at Server $k$ ($k$ = 5-8, B-Z). The effect of loudness on reaction time was modeled in the initial processing of auditory stimuli at Server 6, and the effect of semantics on reaction time was modeled at Server B:

$$T_6 = T_{6(0)} / U_L \tag{4}$$

where $T_{6(0)}$ is the initial entity processing time at Server 6 and $U_L$ denotes the effect of loudness on perceived urgency.

$$T_B = n_i \times T_{B(0)} / U_S \tag{5}$$

where $T_{B(0)}$ is the entity processing time at Server B, $n_i$ is the number of words in the $i$th warning message, and $U_S$ denotes the urgency level expressed by the initial words (e.g., signal words) of the speech message. Altogether, equation (3) for the reaction time to auditory messages through route $i$ is updated as:

$$RT_i = \begin{cases} T_5 + \dfrac{T_{6(0)}}{U_L} + T_8 + n_i \times \dfrac{T_{B(0)}}{U_S} + T_W + T_Z, & i = \mathrm{I} \\ T_5 + \dfrac{T_{6(0)}}{U_L} + T_8 + n_i \times \dfrac{T_{B(0)}}{U_S} + T_C + T_F + T_C + T_W + T_Z, & i = \mathrm{II} \end{cases} \tag{6}$$
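A minimal sketch of the reaction time calculation in equation (6) follows. The server processing times and the urgency values are placeholders chosen for illustration, not values from the paper, and the dictionary keys (`T6_0`, `TB_0`, and so on) are hypothetical names.

```python
from typing import Mapping


def reaction_time(route: str, t: Mapping[str, float], u_l: float,
                  u_s: float, n_words: int) -> float:
    """Equation (6): sum of server processing times along Route I or II."""
    t6 = t["T6_0"] / u_l            # equation (4): loudness shortens Server 6
    tb = n_words * t["TB_0"] / u_s  # equation (5): semantics shortens Server B
    shared = t["T5"] + t6 + t["T8"] + tb + t["TW"] + t["TZ"]
    if route == "I":
        return shared
    if route == "II":
        # Route II visits Server C twice (B -> C -> F -> C).
        return shared + 2 * t["TC"] + t["TF"]
    raise ValueError(f"unknown route: {route!r}")


# Placeholder processing times in milliseconds (not values from the paper).
times = {"T5": 40.0, "T6_0": 40.0, "T8": 20.0, "TB_0": 20.0,
         "TC": 20.0, "TF": 20.0, "TW": 25.0, "TZ": 25.0}

rt_i = reaction_time("I", times, u_l=1.2, u_s=1.5, n_words=3)
rt_ii = reaction_time("II", times, u_l=1.2, u_s=1.5, n_words=3)
print(f"RT Route I = {rt_i:.1f} ms, RT Route II = {rt_ii:.1f} ms")
```

The two routes differ only by the extra terms $2T_C + T_F$, so in this sketch Route II is always slower than Route I by a fixed amount for a given set of processing times.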
Modeling the Error Rate in Message Response

Message parameters influence the response error rate differently at different stages of message response. When humans process messages mainly through Route I in an emergency situation, the error rate is mainly influenced by the effect of acoustic properties and initial semantics (e.g.
signal words for warning messages) on message perception. In a non-emergency situation, messages are mainly processed through Route II. Therefore, the error rate in message responses is also influenced by hazard evaluation and potential memory decay of the messages.

Effect of message parameters on message responses error rate

The effect of message parameters on error rate ($I_E$) can be modeled as the joint effect of speech message loudness and semantics. Compared with Route I, Route II involves detailed processing of messages (e.g., hazard evaluation) at the decision-making Server F, so the perceived urgency and annoyance conveyed by the initial semantic content are diminished. Therefore, the error rate ($I_{E,i}$) of route $i$ was modeled with the following equation:

$$I_{E,i} = \begin{cases} \dfrac{1}{2} \left( \dfrac{U_L}{100} + \dfrac{U_S}{100} \right), & i = \mathrm{I} \\ \left( \dfrac{U_L}{100} - \dfrac{A_L}{100} \right), & i = \mathrm{II} \end{cases} \tag{7}$$
where $U_L$ and $U_S$ denote the perceived urgency levels expressed by message loudness and semantics, respectively, and $A_L$ is the annoyance level caused by message loudness. Based on the route choice probability modeled in equation (2), the overall error rate was modeled as:

$$I_E = \sum_{i=1}^{2} I_{E,i} \times P_i \tag{8}$$

where $P_i$ denotes the probability of information processing through route $i$.

The impact of memory decay and hazard evaluation on error rate

Humans react to speech messages while monitoring the environment. When no potential hazard is present after a response, humans return to normal driving operations and respond to the speech message again when the estimated hazard approaches. During this process, hazard estimation and potential memory decay of the message content may also increase the error rate. Servers B and C in QN-MHP represent the working memory system for auditory information processing, where the memory decay equation (9) is integrated. The probability of information retrieval is modeled as $p = e^{at}$ (Laughery, 1970), with $a = -0.02$ based on the parameter setting of the MHP (Melton, 1963). The effect of memory decay on speech messages ($I_{MD}$) is computed as:

$$I_{MD} = \frac{1}{e^{-0.02\, t_{lead}}} \tag{9}$$

where $t_{lead}$ denotes the lead time for message responses, defined as the time available for responding, from the occurrence of the message until the response is no longer necessary. In terms of hazard estimation, humans react to messages when the perceived hazard reaches a certain threshold. The effect of hazard evaluation on error rate ($I_H$) can be modeled by the equation:

$$I_H = \frac{H_p}{H_0} \tag{10}$$

where $H_p$ denotes the perceived value of the hazard and $H_0$ denotes the actual value of the hazard. In summary, the error rate ($r$) in message responses is modeled as follows:

$$r = I_E \times I_{MD} \times I_H \tag{11}$$
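The chain from equation (7) to equation (11) can be sketched as below. The sketch assumes that equation (7) reads as half the sum of $U_L/100$ and $U_S/100$ for Route I and as $U_L/100 - A_L/100$ for Route II. All input values in the example (urgency, annoyance, route probability, lead time, hazard values) are hypothetical and are not taken from the paper.

```python
import math


def route_error_effect(route: str, u_l: float, u_s: float, a_l: float) -> float:
    """Equation (7) as read here; inputs are ratings on a 100-point scale."""
    if route == "I":
        return 0.5 * (u_l / 100 + u_s / 100)
    if route == "II":
        return u_l / 100 - a_l / 100
    raise ValueError(f"unknown route: {route!r}")


def overall_error_effect(p_ii: float, u_l: float, u_s: float, a_l: float) -> float:
    """Equation (8) with two routes and P_I = 1 - P_II."""
    p_i = 1.0 - p_ii
    return (p_i * route_error_effect("I", u_l, u_s, a_l)
            + p_ii * route_error_effect("II", u_l, u_s, a_l))


def memory_decay_effect(t_lead: float, a: float = -0.02) -> float:
    """Equation (9): I_MD = 1 / exp(a * t_lead), with a = -0.02."""
    return 1.0 / math.exp(a * t_lead)


def hazard_effect(h_p: float, h_0: float) -> float:
    """Equation (10): ratio of perceived to actual hazard value."""
    return h_p / h_0


def error_rate(i_e: float, i_md: float, i_h: float) -> float:
    """Equation (11): product of the three effects."""
    return i_e * i_md * i_h


# Hypothetical inputs for illustration only.
i_e = overall_error_effect(p_ii=0.3, u_l=80.0, u_s=90.0, a_l=60.0)
i_md = memory_decay_effect(t_lead=3.0)
i_h = hazard_effect(h_p=0.4, h_0=0.5)
print(f"I_E = {i_e:.3f}, I_MD = {i_md:.3f}, I_H = {i_h:.3f}, "
      f"r = {error_rate(i_e, i_md, i_h):.3f}")
```

Keeping each effect in its own function mirrors the structure of the model, where the three factors are estimated separately and combined only in equation (11).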
The Model Application in Driving and Speech Warning Message Responses

The following section presents the application of the message response models to human responses to speech warning messages in intelligent transportation systems (e.g., V2V/V2I communication systems and in-vehicle information systems). Corresponding responses to warnings in a driving task include releasing the accelerator pedal when drivers are accelerating, and changing the brake pedal input when drivers are already braking (i.e., foot on the brake pedal) or on their way to brake (i.e., accelerator released). QN-MHP was used to model responses to warning messages, from perceiving the information in the warning message to transmitting neural signals from the primary motor cortex to the foot server. The route of warning message responses in QN-MHP is defined as follows:

5 → 6/7 → 8 → B → C → F → C → W → Z → 25

The hazard evaluation in the driving tasks

Previous work investigated the effects of motion factors (e.g., optical flow rate, optical density of texture, and edge rate) and cognitive factors (e.g., perceived time, actual speed) on the estimation of traversed distance (Frenz & Lappe, 2005; Montello, 1976; Redlick, Jenkin, & Harris, 2001). Drivers continuously evaluate the potential hazard based on information obtained from visual perception and from warning messages (e.g., distance to the hazard location). Previous work also suggested a significant effect of actual speed on distance estimation. The relationship between actual distance and estimated traversed distance ($D_P$) was quantified with Stevens' power law (Witmer, 1998):

$$D_P = D_0^{\,b^{v}} \tag{12}$$

where $D_0$ denotes the distance between the current position of the message-receiving vehicle and the potential hazard location when the message is presented, and $v$ denotes the instant speed ($b = 0.955$; Witmer, 1998).
$$D_0 = v_0 t_{lead} + \frac{1}{2} a_0 t_{lead}^2 \tag{13}$$

When the perceived distance is shorter than the minimum safety headway, drivers may react to the warning message directly. Otherwise, drivers continue to drive and react to the warning message only when the perceived distance ($D_P$) reaches the threshold ($D_h = D_P$). The effect of hazard evaluation on crash rate is modeled as:

$$I_H = \frac{D_h}{D_0} = D_0^{\,b^{v(t)} - 1} \tag{14}$$
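The distance-based hazard evaluation can be sketched numerically as follows. The sketch assumes one reading of the typeset equations, namely $D_P = D_0^{\,b^{v}}$ for equation (12) and $I_H = D_0^{\,b^{v(t)} - 1}$ for equation (14). The speed, acceleration, and lead time used in the example are hypothetical, and the units (metres, seconds) are also an assumption.

```python
B = 0.955  # exponent parameter attributed to Witmer (1998)


def initial_distance(v0: float, a0: float, t_lead: float) -> float:
    """Equation (13): distance covered during the lead time."""
    return v0 * t_lead + 0.5 * a0 * t_lead ** 2


def perceived_distance(d0: float, v: float, b: float = B) -> float:
    """Equation (12), read here as D_P = D_0 ** (b ** v)."""
    return d0 ** (b ** v)


def hazard_effect(d0: float, v: float, b: float = B) -> float:
    """Equation (14), read here as I_H = D_h / D_0 = D_0 ** (b ** v - 1)."""
    return d0 ** (b ** v - 1.0)


# Hypothetical scenario: 20 m/s initial speed, no acceleration, 3 s lead time.
d0 = initial_distance(v0=20.0, a0=0.0, t_lead=3.0)
d_p = perceived_distance(d0, v=20.0)
i_h = hazard_effect(d0, v=20.0)
print(f"D_0 = {d0:.1f} m, D_P = {d_p:.2f} m, I_H = {i_h:.3f}")
```

Under this reading the hazard effect is simply the ratio of perceived to actual distance, so `hazard_effect(d0, v)` and `perceived_distance(d0, v) / d0` agree by construction.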
The instant speed ($v(t)$) and acceleration ($a_t$) at time $t$ are modeled as follows: $v(t) = v_0 + a_t(\Delta t)$, where $v_0$ denotes the initial speed and $a_t$ denotes the acceleration at time $t$. The constant rate of deceleration ($a_t(\Delta t)$) is modeled as $a_t(\Delta t) = m \times \phi \times \dfrac{\dot{\theta}}{\theta}$ (Fajen, 2008), where $\phi$ is the global optic flow rate of the textured ground surface, which is constant in a braking task. Novices tended to initiate emergency braking earlier than necessary when initial speed was slow and to a lesser extent, which brought in a parameter m of driving experience (0