Effects of Alarm System Error Bias and Reliability on Performance Measures in a Multitasking Environment: Are False Alarms Really Worse than Misses?

Eric T. Chancey, Leidos, Dayton, OH
Yusuke Yamani, Old Dominion University, Norfolk, VA
J. Christopher Brill, Air Force Research Laboratory, Dayton, OH
James P. Bliss, Old Dominion University, Norfolk, VA
Researchers suggest that signaling system false alarms damage human performance more than misses, yet the evidence is inconsistent. Therefore, we examined the effects of false alarm and miss rates on concurrent task and detection task performance. Method: Eighty-eight participants interacted with two primary flight simulation tasks and a secondary detection task, aided by a signaling system that varied by error bias (false alarm-prone, miss-prone) and reliability (90%, 60%). Results: Higher reliability led to better detection task performance, and the miss-prone group had higher dependence rates. Misses, however, degraded detection task performance more than false alarms, and Bayes analyses indicated no differential effect of error type on primary task performance. Conclusion: The frequent alarms in the false alarm-prone group appeared to aid participants in switching from the primary tasks to the secondary detection task. Participants may have used the information acquisition function of the aid, but not its information analysis function.
Copyright 2017 by Human Factors and Ergonomics Society. DOI 10.1177/1541931213601890
Sensor-based signaling systems are prevalent in numerous multitasking environments, such as aviation (Pritchett, Vándor, & Edwards, 2002), power plant operation (Carvalho, do Santos, Gomes, Borges, & Guerlain, 2008), healthcare (Cvach, 2012), and ground transportation (Lees & Lee, 2007). Broadly, signaling systems represent a category of automation that employs stimuli, such as alarms, alerts, and warnings, to direct a human operator to hazards that may require intervention or further inspection (Bliss & Gilson, 1998). Unfortunately, these systems are not perfectly reliable and may generate errors, which manifest as either misses or false alarms. The purpose of this work is to explore which error more negatively affects performance in a multitasking environment.

Sensor-Based Signaling Systems

Signaling systems are sometimes referred to as signal detection systems because they are compatible with signal detection theory (SDT) and analysis, where the system monitors noisy input data for abnormal events (Sorkin & Woods, 1985). This perspective assumes system output takes the form of a hit (signal present and system signals event), correct rejection (signal absent and system remains silent), miss (signal present and system remains silent), or false alarm (signal absent and system signals event). Within the SDT paradigm, signal event detection is described by the system's sensitivity (d′) and response criterion (β or c) parameters. The sensitivity parameter represents the system's capability to distinguish abnormal from normal events. Although designers should attempt to maximize this parameter, it is determined by technological capability and the knowledge required to inform what constitutes an abnormal event. Alternatively, the response criterion represents the sensor threshold setting, or the amount of evidence required to signal an event. Unlike the sensitivity parameter, this threshold can be set to any level desired by the designer (Sorkin & Woods, 1985). Importantly, the response criterion determines the system's error bias. If the sensor threshold is set too conservatively (i.e., much evidence is required to signal an event), then false alarms will be minimized and misses will increase. If the threshold is set too liberally (i.e., little evidence is required to signal an event), then missed events will be minimized and false alarms will increase. Together, sensitivity and response criterion determine the system's reliability, reflected in the number of errors that occur during a given time period; these errors can negatively impact user performance (Green & Swets, 1966).

System Errors on Human Performance

Generally, higher reliability leads to higher signal dependence rates (Manzey, Gerard, & Wiczorek, 2014), quicker signal response times (Chancey, Bliss, Proaps, & Madhavan, 2015), and greater user sensitivity (Rice, 2009; for a review, see Wickens & Dixon, 2007). Yet, beyond reliability, system error bias also affects human responses and task performance. False alarms tend to result in a cry wolf effect, whereby the user reduces, slows, or even ignores responses to system alarms, alerts, and warnings (Bliss & Gilson, 1998). Alternatively, misses can result in the operator having to continuously cross-check the system's accuracy to ensure no events are overlooked. Misses can, therefore, force the operator to divide attention among tasks, leading to increased workload and deterioration in task performance (Dixon, Wickens, & McCarley, 2007). Although high error rates of either type negatively affect performance, research is somewhat inconsistent as to which error is more damaging. Wickens and McCarley (2008) suggest that false alarm-prone (FP) systems are as disruptive of task performance as miss-prone (MP) systems, if not more so (p. 37). The authors offer two explanations: 1) a signal, whether a hit or a false alarm, will usually be responded to in some form, leading to more frequent shifts away from a concurrent task than with an MP system; and 2) false alarms are more salient than misses, which makes the unreliability of the system more obvious and leads to a decline in automation dependence. Consistent with the position that FP systems are more detrimental to detection task performance than equally reliable MP aids, Rice and McCarley (2011) reported that, even when false alarms and misses were matched for perceptual salience, an FP system led to lower participant sensitivity and dependence than an equally reliable MP system. When false alarms were framed as neutral messages, however, the false alarm and miss asymmetry was reduced. Similarly, Dixon et al. (2007) reported that participants using an MP system performed better (i.e., higher d′) and were quicker to detect system failures than those using
an FP system. Concurrent tracking task performance, however, did not differ between the MP and FP systems. Yet, in a single-task experiment, Rice (2009) found no effect of error bias on participant sensitivity for detecting a tank in an aerial image when using systems that ranged in reliability from 95% to 55% in 5% increments. Unsurprisingly, higher reliability led to higher sensitivity. Chancey et al. (2015) found that, although the effect of miss rate on participant sensitivity was more pronounced than that of false alarm rate, signaling system dependence rate, alarm response time, and primary task performance were not significantly affected by error bias. Finally, Madhavan, Wiegmann, and Lacson (2006) did not find differences between the effects of misses and false alarms on participant sensitivity.

Purpose and Research Questions

The purpose of this study was to determine which error is more harmful to both detection performance and concurrent task performance. We expected higher reliability to result in better detection task performance and higher signal dependence (Chancey et al., 2015). Yet, because the literature is somewhat inconsistent as to which error more negatively affects performance, we generated two research questions: RQ1) Will the effect of increasing error rate on concurrent (i.e., primary) task performance be more damaging for participants interacting with the MP system or the FP system? RQ2) Will the effect of increasing error rate on detection task performance (i.e., secondary task) be more damaging for participants interacting with the MP system or the FP system?

METHOD

Participants

Eighty-eight undergraduate students (56 female) from Old Dominion University (ODU) participated for research credit. Participants self-reported an average age of 19.29 years (SD = 2.13), playing video games an average of 2.69 hours/week (SD = 5.08), and using computers (work/recreation) an average of 17.55 hours/week (SD = 13.09). All participants indicated having normal or corrected-to-normal visual acuity at the time of participation. No participant indicated having a color deficiency or hearing impairment. This research complied with the American Psychological Association Code of Ethics and was approved by the Institutional Review Board at ODU.

Experimental Tasks

Experimental tasks were hosted on two desktop computers with two 12-inch monitors. Auditory alarms were presented through a RadioShack® PRO-100 communication headset.

Primary flight simulation tasks. Participants performed two tasks from the Multi-Attribute Task Battery (MATB-II; Santiago-Espada, Myer, Latorella, & Comstock, 2011). Participants performed the compensatory tracking task using a Microsoft SideWinder Precision 2 joystick, attempting to keep a continuously drifting reticle at the center of a pair of crosshairs (Figure 1A). Participants performed the resource management task using a ten-key number pad to activate pumps that transferred fuel among tanks that were continuously depleting fuel, to maintain levels at a pre-specified amount of 2,500 units in the top two tanks (Figure 1B). (A sketch of how performance on these two tasks was scored follows Figure 1.)
Figure 1. Screenshots of the MATB-II’s Tracking Task (A) and Resource Management Task (B).
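Both primary-task scores are formally described later in the Method as deviations from each task's target state. As referenced above, here is a minimal scoring sketch in R; the function names and the exact deviation formulas are our assumptions for illustration (the paper reports only RMS deviation in pixels and fuel deviation from 2,500 units):

    # Assumed scoring functions for the two primary tasks (illustrative only).
    tracking_rmsd <- function(dx, dy) {
      # Root mean square deviation of the reticle from the crosshair center,
      # in pixels; the paper samples this every 10 seconds.
      sqrt(mean(dx^2 + dy^2))
    }

    fuel_deviation <- function(tank_a, tank_b, target = 2500) {
      # Deviation of the two monitored tanks from the 2,500-unit target;
      # the paper samples fuel levels every 30 seconds.
      mean(abs(c(tank_a, tank_b) - target))
    }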
Secondary signaling system task. We developed the signaling system task with SuperCard 4.7™, modeled after the task used by Rice (2009). Participants judged the presence or absence of a tank in a series of aerial pictures. We created 30 aerial pictures of Baghdad, Iraq, using screenshots from Google Earth. Tank images were embedded within each of these pictures, and across conditions participants were presented with the same 30 pictures with and without an embedded tank (resulting in 60 pictures). There was a delay between each picture (randomized: 10, 14, or 18 seconds). Pictures appeared for 3 seconds before transitioning to a screen asking for a response (i.e., "Tank Present" or "No Tank"). Participants were aided by a "Tank-Spotting Aid," which diagnosed the presence or absence of a tank. If the aid diagnosed the presence of a tank, it issued an alarm by surrounding the suspected quadrant of the picture in red and sounding an auditory tone (a sinewave increasing in frequency from 700 to 1,700 Hz in 0.85 seconds, with an interruption interval of 0.12 seconds; MIL-STD-411F, March 1997; a sketch of this tone profile follows Figure 2). If the aid diagnosed the absence of a tank, no alarm occurred while the aerial picture was presented. The aid's diagnosis was also presented at the response screen. We presented a point bank in the task window, where a point was added for each correct response and subtracted for each incorrect response (Figure 2).
Figure 2. Secondary tank spotting task sequence. 1) Delay screen (randomized: 10, 14, or 18 seconds). 2) Aerial picture screen (3 seconds). 3) Response screen (no time limit).
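The alarm tone's profile can be illustrated with a short sketch. The 44.1 kHz sample rate and the linearity of the frequency sweep are our assumptions; the text specifies only the 700-1,700 Hz range, the 0.85-second rise, and the 0.12-second interruption interval:

    # Sketch of one cycle of the auditory alarm (assumed linear sweep).
    sr    <- 44100                            # sample rate (assumed)
    t     <- seq(0, 0.85, by = 1 / sr)        # 0.85-second sweep
    freq  <- 700 + (1700 - 700) * (t / 0.85)  # instantaneous frequency in Hz
    phase <- 2 * pi * cumsum(freq) / sr       # integrate frequency to get phase
    sweep <- sin(phase)
    gap   <- numeric(round(0.12 * sr))        # 0.12-second silent interruption
    alarm_cycle <- c(sweep, gap)              # one repetition of the alarm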
Design

We used a 2 (error bias: FP, MP signaling system) × 2 (reliability: 90%, 60%) × 2 (risk: High, Low) split-plot design, where all variables were fixed effects. We manipulated risk to support a separate research effort based on this study (see Chancey, Bliss, Yamani, & Handley, 2017, for details). We manipulated error bias and risk between subjects and reliability within subjects. Reliability indicated the percentage of images for which the system correctly alarmed when a tank was present
or silent when a tank was absent; error bias indicated whether the system would err by making false alarms or misses (Table 1). We instructed the high risk group that poor task performance would cause additional time to be added to the experiment (this consequence was not ultimately enforced). We provided no instructions about the consequences of poor task performance to the low risk group.

Table 1
Detection response matrix for the false alarm-prone (FP) and miss-prone (MP) systems according to reliability for the signaling system task

                        90% FP     60% FP     90% MP     60% MP
    Hits                30 (.50)   30 (.50)   24 (.40)    6 (.10)
    False Alarms         6 (.10)   24 (.40)    0 (.00)    0 (.00)
    Misses               0 (.00)    0 (.00)    6 (.10)   24 (.40)
    Correct Rejections  24 (.40)    6 (.10)   30 (.50)   30 (.50)
    d′                   2.97       1.29       2.97       1.29
    c                    -.64      -1.48        .64       1.48

Note. Numbers outside of parentheses represent the raw number of responses per category during the session. Numbers in parentheses represent the proportions of responses out of the total number of responses during each session.
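The d′ and c values in Table 1 follow from the hit and false alarm counts. A minimal sketch that reproduces them, assuming the common convention of correcting proportions of 0 or 1 by half a response (1/(2N)) before the z-transform (the paper does not state which correction was used):

    # d' = z(H) - z(F); c = -[z(H) + z(F)]/2, with extreme proportions
    # nudged away from 0 and 1 by 1/(2N). The correction is an assumption,
    # but it reproduces the Table 1 values.
    sdt_params <- function(hits, false_alarms, n_signal = 30, n_noise = 30) {
      h <- hits / n_signal
      f <- false_alarms / n_noise
      h <- min(max(h, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
      f <- min(max(f, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
      c(d_prime = qnorm(h) - qnorm(f), c = -(qnorm(h) + qnorm(f)) / 2)
    }

    sdt_params(30, 6)    # 90% FP: d' = 2.97, c = -.64
    sdt_params(30, 24)   # 60% FP: d' = 1.29, c = -1.48
    sdt_params(24, 0)    # 90% MP: d' = 2.97, c =  .64
    sdt_params(6, 0)     # 60% MP: d' = 1.29, c =  1.48

    # Reliability check: (hits + correct rejections) out of 60 pictures.
    (30 + 24) / 60   # 90% FP -> 0.9
    (6 + 30) / 60    # 60% MP -> 0.6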
For the signaling system task performance metrics, we recorded participant sensitivity (d′), response bias (c), response time (RT; seconds from the onset of the choice screen to the response), and alarm score (the final number of points accumulated at the end of each session). Dependence rate was the number of times the participant's response matched the system's diagnosis out of the total number of aerial pictures presented. For compensatory tracking task performance, we recorded the root mean square deviation of the drifting reticle from the center point, in pixel units, sampled every 10 seconds. For resource management task performance, we recorded the deviation in fuel units from 2,500 units, sampled every 30 seconds.

Procedure

Participants first completed informed consent and demographics forms. Participants then received task instructions, where half received instructions indicating "High Risk" and half "Low Risk" associated with poor task performance (randomized). Participants then practiced the primary tasks and searched 10 aerial pictures containing a tank. Participants completed a perceived risk questionnaire, followed by a 10-minute practice session with the primary and secondary tasks (the signaling system was 100% reliable). Following this familiarization session, participants completed two 20-minute sessions with either a 60% or 90% reliable aid (counterbalanced). Half of the participants used FP systems and the other half used MP systems (randomized). Following the completion of the two sessions, participants were thanked and debriefed.

RESULTS

We inspected the data for normality, confirmed that no data were missing, and consulted Levene's tests to assess homogeneity of variance. We used multiple 2 (error bias: FP, MP signaling system) × 2 (reliability: 90%, 60%) × 2 (risk: High, Low) split-plot analyses of variance (ANOVAs) to test for main effects and interactions. We established an alpha
level of p < .05 to indicate statistical significance. The results of the main effect of risk on tracking task performance are published in another venue (see Chancey et al., 2017). All other analyses, however, are novel contributions. To examine whether the data supported any null hypotheses, we employed default Bayesian tests (Rouder & Morey, 2012). In this analysis, Bayes factors served as the measure of evidence for the effect of interest, referred to as B10. Briefly, the Bayes factor measures the magnitude of each effect by comparing the full model, including the effect of interest, to the reduced model excluding that effect. A value greater than 3 is generally considered substantial evidence for the effect, and a value less than 1/3 substantial evidence against it (Jeffreys, 1961). The default Bayesian tests were conducted using the BayesFactor package in R (Morey & Rouder, 2015); a minimal sketch of this analysis approach appears after Figure 3.

Tracking Task Performance. A significant main effect of risk on tracking task performance was observed, F(1, 84) = 10.42, p = .002, partial η² = .11. Participants in the high risk group (M = 39.89, SE = 1.37) kept the drifting reticle significantly more stable than those in the low risk group (M = 46.16, SE = 1.37). No other main effects or interactions were observed (p > .05). The data favored neither the null nor the full model for the main effect of reliability and the interactions involving risk [1/2.90 < B10 < 1/1.31], and gave substantial evidence against the main effect of error bias [B10 = 1/3.8] and the error bias × reliability interaction [B10 = 1/3.52].

Resource Management Performance. No main effects or interactions were observed for resource management performance (p > .05). Moreover, the data indicated substantial evidence against all main effects and the error bias × reliability interaction [all B10 < 1/3.12]. Using a term from Jeffreys (1961), all remaining effects were anecdotal [1/2.26 < B10 < 1/1.04].

Dependence Rate. A significant interaction between reliability and error bias on dependence rate was observed, F(1, 84) = 17.76, p < .001, partial η² = .18. A follow-up analysis of simple effects indicated a significant effect of reliability in both the FP group, Wilks' λ = .22, F(1, 84) = 300.37, p < .001, partial η² = .78, and the MP group, Wilks' λ = .39, F(1, 84) = 129.29, p < .001, partial η² = .61. Additionally, in the 60% reliable condition, participants in the MP group agreed with the alarm system more often than those in the FP group, F(1, 84) = 45.38, p < .001, partial η² = .35 (see Figure 3).
Figure 3. Average dependence rate as a function of reliability (90% and 60%) and error bias (error bars represent standard errors).
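As an illustration of the analysis approach referenced above (a sketch, not the authors' actual code), the following pairs a split-plot ANOVA with a default Bayesian model comparison using the BayesFactor package. The data frame and variable names are hypothetical:

    # Hypothetical data frame `dat`: one row per participant x reliability
    # cell, with factors subject, bias (FP/MP), risk (High/Low), and
    # reliability (90%/60%), plus a dependent variable dv.
    library(BayesFactor)

    # Split-plot ANOVA: bias and risk between subjects, reliability within.
    summary(aov(dv ~ bias * reliability * risk + Error(subject / reliability),
                data = dat))

    # Default Bayes factor for the error bias x reliability interaction:
    # compare the model with the interaction to the model without it.
    bf_full <- lmBF(dv ~ bias + reliability + bias:reliability + subject,
                    data = dat, whichRandom = "subject")
    bf_reduced <- lmBF(dv ~ bias + reliability + subject,
                       data = dat, whichRandom = "subject")
    bf_full / bf_reduced  # B10; > 3 supports the effect, < 1/3 opposes it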
Response Time. A significant interaction between reliability and error bias on RT was observed, F(1, 84) = 3.98, p = .049, partial η² = .05. A follow-up analysis of simple effects indicated a significant effect of reliability on RT, but only in the FP group, Wilks' λ = .934, F(1, 84) = 5.90, p = .017, partial η² = .07. Additionally, there was a significant effect of error bias, but only in the 60% reliability condition, F(1, 84) = 11.16, p = .001, partial η² = .12 (see Figure 4). No other main effects or interactions were observed for RT (p > .05).

Figure 4. Average RT (seconds) as a function of reliability (90% and 60%) and error bias (error bars represent standard errors).

Alarm Score. A significant interaction between reliability and error bias on alarm score was observed, F(1, 84) = 22.46, p < .001, partial η² = .21. A follow-up analysis of simple effects indicated a significant effect of reliability on score for both the FP group, Wilks' λ = .78, F(1, 84) = 23.47, p < .001, partial η² = .22, and the MP group, Wilks' λ = .39, F(1, 84) = 133.34, p < .001, partial η² = .61. Additionally, there was a significant effect of error bias, but only in the 60% reliability condition, F(1, 84) = 22.53, p < .001, partial η² = .21 (see Figure 5). No other main effects or interactions were observed for alarm score (p > .05).

Figure 5. Average secondary alarm-task score as a function of reliability (90% and 60%) and error bias (error bars represent standard errors).

Sensitivity. A significant interaction between reliability and error bias on sensitivity was observed, F(1, 84) = 19.72, p < .001, partial η² = .19. A follow-up analysis of simple effects indicated a significant effect of reliability on sensitivity for both the FP group, Wilks' λ = .76, F(1, 84) = 26.99, p < .001, partial η² = .24, and the MP group, Wilks' λ = .39, F(1, 84) = 131.69, p < .001, partial η² = .61. Additionally, there was a significant effect of error bias, but only in the 60% reliability condition, F(1, 84) = 19.31, p < .001, partial η² = .19 (see Figure 6). No other main effects or interactions were observed for sensitivity (p > .05).

Figure 6. Average sensitivity (d′) as a function of reliability (90% and 60%) and error bias (error bars represent standard errors).

Response Bias. A significant interaction between reliability and error bias on response bias was observed, F(1, 84) = 10.61, p = .002, partial η² = .11. A follow-up analysis of simple effects indicated a significant effect of reliability on response bias, but only for the FP group, Wilks' λ = .88, F(1, 84) = 11.91, p = .001, partial η² = .12. Specifically, participants in the 90% FP condition tended to respond as though a tank was present (M = -.08, SE = .05), and those in the 60% FP condition tended to respond as though there was no tank (M = .11, SE = .04). Additionally, there was a significant effect of error bias for both the 90% condition, F(1, 84) = 49.42, p < .001, partial η² = .37, and the 60% condition, F(1, 84) = 12.24, p = .001, partial η² = .13 (see Figure 7). No other main effects or interactions were observed for response bias (p > .05).

Figure 7. Average response bias (c) as a function of reliability (90% and 60%) and error bias (error bars represent standard errors).
DISCUSSION

Previous research has suggested that false alarms are more damaging than misses to both concurrent task performance and, because of a greater negative impact on dependence rate, detection task performance (Dixon et al., 2007; Rice & McCarley, 2011; Wickens & McCarley, 2008). Contrary to this position, we found that detection task performance was actually worse for those using an MP system, even though false alarms did result in lower dependence rates than misses. Indeed, the fact that the FP system was continually calling
attention to the pictures (i.e., alarming) appeared to help performance (cf. Breznitz, 1984). Moreover, primary task performance was unaffected by error bias. In fact, there was substantial evidence to indicate no effect of error bias on concurrent task performance at all. This evidence opposes the assertion that false alarms are more damaging than misses to human performance. Interestingly, participants may not have been using the FP system to help them diagnose the presence or absence of a tank, but instead used it to help determine when to inspect an image. Based on Parasuraman, Sheridan, and Wickens' (2000) taxonomy, the signaling system in this experiment supported both information acquisition (i.e., Stage 1 automation) and information analysis (i.e., Stage 2 automation). Yet participants using the FP system seemed to depend on the Stage 1 function more heavily when the system was very unreliable. Clearly, the MP system did not support the Stage 1 function well at the lower reliability (i.e., the 60% FP system issued 54 alarms, whereas the 60% MP system issued only 6). The effects of this difference were reflected in the MP system producing significantly longer response times than the FP system. Because participants using the MP system were not regularly signaled to evaluate an image, they were slow to notice that there was a picture to evaluate. This functional difference also likely resulted in participants depending upon the MP system diagnosis (i.e., the Stage 2 function) more than the FP system diagnosis. If participants did not shift their attention to evaluate the image, they may have been more likely to simply agree with the system than to guess. The 60% reliability level falls below the 70% cutoff proposed in Wickens and Dixon's (2007) quantitative literature review, which is the point at which the human would perform better without the automation. In this case, participants using the MP system had little choice but to depend upon the system diagnosis, as they had less opportunity to evaluate the images themselves. Alternatively, the frequent false alarms (supporting information acquisition) buffered against a more severe detection performance decrement from the frequent diagnostic errors by allowing participants to do the task manually, thus avoiding the "concrete life preserver" effect (Wickens & McCarley, 2008, p. 37).

Conclusion

Although we found evidence disputing the notion that false alarms are more detrimental to concurrent and detection task performance than misses, laboratory research (Dixon et al., 2007; Rice & McCarley, 2011) and research on operational signaling systems (e.g., Bliss, 2003) show that false alarms can have a greater negative impact on task performance. Yet, operationally, sensor thresholds are often set to minimize missed events, which can be disastrous (e.g., a missed fire alarm, a missed aviation collision avoidance alert, a missed hospital physiological monitor alarm). Clearly, the purpose of signaling systems is to direct the human to hazards, and highly conservative threshold settings are akin to having no system at all. In that case, the human would be continuously monitoring for abnormal events, a task for which humans are poorly suited. Designers, therefore, must anticipate how the signaling system will be used and where to set the threshold if high reliability cannot
be achieved. Though the FP and MP systems were equally sensitive, participants in this study were able to use the liberal threshold setting (i.e., frequent alarms) to their advantage, showing that false alarms are not always worse than misses.

Author Note: The views expressed are those of the authors and do not necessarily reflect the official policy or position of the Air Force, the Department of Defense, or the U.S. Government.
REFERENCES
Bliss, J. P. (2003). Investigation of alarm-related accidents and incidents in aviation. The International Journal of Aviation Psychology, 13(3), 249-268.
Bliss, J. P., & Gilson, R. D. (1998). Emergency signal failure: Implications and recommendations. Ergonomics, 41(1), 57-72.
Breznitz, S. (1984). Cry wolf: The psychology of false alarms. Hillsdale, NJ: Lawrence Erlbaum Associates.
Carvalho, P. V., do Santos, I. L., Gomes, J. O., Borges, M. R., & Guerlain, S. (2008). Human factors approach for evaluation and redesign of human-system interfaces of a nuclear power plant simulator. Displays, 273-284.
Chancey, E. T., Bliss, J. P., Proaps, A. B., & Madhavan, P. (2015). The role of trust as a mediator between system characteristics and response behaviors. Human Factors, 57(6), 947-958.
Chancey, E. T., Bliss, J. P., Yamani, Y., & Handley, H. A. H. (2017). Trust and the compliance-reliance paradigm: The effects of risk, error bias, and reliability on trust and dependence. Human Factors, 59(3), 333-345.
Cvach, M. (2012). Monitor alarm fatigue: An integrative review. Biomedical Instrumentation & Technology, 268-277.
Dixon, S. R., Wickens, C. D., & McCarley, J. S. (2007). On the independence of compliance and reliance: Are automation false alarms worse than misses? Human Factors, 49(4), 564-572.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press.
Lees, M. N., & Lee, J. D. (2007). The influence of distraction and driving context on driver response to imperfect collision warning systems. Ergonomics, 50(8), 1264-1286.
Madhavan, P., Wiegmann, D. A., & Lacson, F. C. (2006). Automation failures on tasks easily performed by operators undermine trust in automated aids. Human Factors, 48(2), 241-256.
Manzey, D., Gerard, N., & Wiczorek, R. (2014). Decision-making and response strategies in interactions with alarms: The impact of alarm reliability, availability of alarm validity information and workload. Ergonomics, 57(12), 1833-1855.
Morey, R. D., Rouder, J. N., & Jamil, T. (2015). Package "BayesFactor". https://cran.r-project.org/web/packages/BayesFactor/BayesFactor.pdf
Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30(3), 286-297.
Pritchett, A. R., Vándor, B., & Edwards, K. (2002). Testing and implementing cockpit alerting systems. Reliability Engineering and System Safety, 75, 193-206.
Rice, S. (2009). Examining single- and multiple-process theories of trust in automation. The Journal of General Psychology, 136(3), 303-319.
Rice, S., & McCarley, J. S. (2011). Effects of response bias and judgment framing on operator use of an automated aid in a target detection task. Journal of Experimental Psychology: Applied, 17(4), 320-331.
Rouder, J. N., & Morey, R. D. (2012). Default Bayes factors for model selection in regression. Multivariate Behavioral Research, 47, 877-903.
Santiago-Espada, Y., Myer, R. R., Latorella, K. A., & Comstock, J. R. (2011). The Multi-Attribute Task Battery II (MATB-II) software for human performance and workload research: A user's guide (NASA/TM-2011-217164). Hampton, VA: National Aeronautics and Space Administration, Langley Research Center.
Sorkin, R. D., & Woods, D. D. (1985). Systems with human monitors: A signal detection analysis. Human-Computer Interaction, 1, 49-75.
Wickens, C. D., & Dixon, S. R. (2007). The benefits of imperfect diagnostic automation: A synthesis of the literature. Theoretical Issues in Ergonomics Science, 8(3), 201-212.
Wickens, C. D., & McCarley, J. S. (2008). Applied attention theory. Boca Raton, FL: CRC Press.