Do Velocity Vectors Support Multiple Object Tracking? - Cognitive ...

PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 53rd ANNUAL MEETING—2009


Do Velocity Vectors Support Multiple Object Tracking?

Krista Oinonen¹, Lauri Oksama¹, Esa Rantanen² and Jukka Hyönä¹

¹University of Turku, Finland
²Rochester Institute of Technology, Rochester, NY, USA

We examined the effect of velocity vectors on the human ability to extrapolate the movement of multiple moving objects. We used an identity tracking task in which objects (line drawings of familiar objects) first move for 5 s and then temporarily disappear from view. To examine different aspects of vector information, we compared four velocity vector conditions: (1) no vectors were displayed; (2) only a history trail was displayed; (3) the end of the velocity vector pointed exactly to where the objects would reappear after masking; (4) a velocity vector of constant (heading) length was displayed. Based on recent studies by Keane and Pylyshyn (2006) and Oinonen et al. (2009), we hypothesized that the condition presenting exact location information would improve performance compared to baseline. The results supported this hypothesis: neither the history trail nor the constant-length directional vector improved performance, and only complete visual information about the future location helped observers anticipate movement in multiple object tracking.

Copyright 2009 by Human Factors and Ergonomics Society, Inc. All rights reserved. 10.1518/107118109X12524440832665

INTRODUCTION

In many real-life dynamic visual environments, such as traffic, sports, or air traffic control (ATC), it is important to anticipate in what direction critical objects are about to move and where they will be positioned in the near future. To help predict targets' trajectories, velocity vectors (track velocity and direction vector lines) are common in ATC plan view displays (PVDs). Velocity vectors accomplish a simple but important function: they make the dynamic characteristics of targets (i.e., aircraft) visually available on the controller's display. Without such visualization, the controller would carry a heavy cognitive burden, having to rely on past experience of target dynamics (memory) and to integrate a large number of circumstantial factors to predict the targets' future trajectories. The putative benefits of visually representing target dynamics appear to have been an almost incidental by-product of old radar technology and cathode ray tube (CRT) displays. Because the radar antenna rotates continuously and thus 'sweeps' the atmosphere for returns from solid objects, a target is 'illuminated' only at the moment the antenna points at it; the phosphor coating of CRT displays provided an afterglow of the radar return (Cole, 1985), forming a history trail of the past 3 to 5 positions of a moving target (depending on the phosphor decay rate). From such history trails controllers can glean a target's direction, velocity (from the distance between successive past positions), and rate of turn, and extrapolate from these the target's future trajectory. In subsequent digital radar PVDs, target history continued to be displayed, together with a computed velocity vector projected ahead of the target (Nolan, 1998). However, remarkably little research has been published on the theoretical underpinnings of predictor displays in ATC. But are vectors really helpful?
Do they really help in movement extrapolation? We raise these questions because recent studies on tracking multiple moving objects that temporarily disappear from the observer's field of view suggest that observers are not able to update or anticipate the objects' locations (Keane & Pylyshyn, 2006; Oinonen, Oksama & Hyönä, 2009). Our ability to extrapolate the movement of multiple objects during interruption or occlusion hence appears to be quite poor. According to Keane and Pylyshyn (2006), observers can only remember where the critical objects were located at the time of their disappearance or slightly before. Even knowing the objects' movement directions does not help observers extrapolate and predict the future locations of multiple objects. According to Keane and Pylyshyn's displacement hypothesis, predictions of an object's location become poorer the farther the object moves while invisible, because it is then displaced farther from its last visible location. These results cast doubt on the intuitive idea that observers can easily anticipate the movement of multiple objects and benefit from velocity vector information. In particular, it follows from these studies that vectors carrying only directional information (heading or history) may not be beneficial at all, as they give no exact visual information on targets' future locations; a cognitively demanding movement extrapolation process is still required. Only vectors that provide exact information on targets' future locations may be truly helpful. Because these experimental tasks are clearly analogous to those of air traffic controllers, where the occlusion of targets corresponds to the controller looking away from the PVD to perform other tasks, an ATC task offers an attractive route to a deeper understanding of how velocity vectors contribute to human multiple object tracking performance.

Past Research on Predictor Displays

Much of the predictor display literature pertains to control engineering (e.g., Kelley, 1962, 1968) and active vehicular control, with applications in nuclear submarines (Berbert & Kelley, 1962) and lunar landing (Fargel & Ulbrich, 1963).
For a flavor of this body of literature, and for further examples involving driving an automobile in traffic, a blind pedestrian using a cane or electronic obstacle detector, and remote manipulation of solid objects using artificial sensors and effectors, see Sheridan (1966). Textbook descriptions of predictor displays are quite short and lack any treatment of theoretical background. Kantowitz and Sorkin (1983) cite only Palmer, Jago, Baty, and O'Connor (1980) on the benefits of predictor displays, and Sanders and McCormick (1993) cite Kelley (1962, 1968) as well as Roscoe, Corl, and Jensen (1981) and Jensen (1981) on quickened displays. Empirical human-in-the-loop (HITL) studies come from either direct vehicular control or airborne collision detection settings. Kelley (1968) cites several studies that demonstrated the benefits of predictor displays in vehicular control, for example, controlling a spacecraft and correcting rapidly changing thrust disturbances (Besco, 1964) and using submarine displays in a collision avoidance task (McLane & Wolf, 1966). In the aviation domain, most predictor display research has concerned advanced cockpit displays, either perspective flight displays (e.g., Grunwald, 1981; Jensen, 1981; Merwin & Wickens, 1996; Wickens, Haskell, & Harte, 1989) or cockpit displays of traffic information (CDTIs; Hart & Loomis, 1980; Kelly, 1983; Kelly & Abbott, 1984; Kelly & Williams, 1983; Palmer, Baty, & O'Connor, 1979; Palmer et al., 1980; Williams, 1983). These studies include HITL simulations with strong evidence that predictor displays benefit human performance in estimating future events (e.g., conflicts with other aircraft). Palmer et al. (1980) examined straight and curved predictor lines in an airborne conflict detection task in which participants estimated whether an intruder would pass in front of or behind the ownship. Predictors clearly aided the pilots' performance when both aircraft were moving along straight trajectories, and also when either aircraft was turning, provided that the predictors displayed turn-rate information.
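The difference between a straight predictor and one that displays turn-rate information can be illustrated geometrically. The following is a minimal sketch under simplifying assumptions of our own, not the algorithm of any particular CDTI: flat 2-D motion at constant speed, heading measured counterclockwise from the x-axis (a mathematical rather than compass convention), and a turn rate held constant over the prediction interval. The function names are hypothetical. A straight predictor holds the current velocity vector fixed; a turn-rate predictor rotates it at a constant rate, which traces a circular arc.

```python
import math


def straight_predictor(x, y, heading_rad, speed, t):
    """Endpoint of a straight predictor line: current velocity held constant for t.

    heading_rad is measured counterclockwise from the +x axis (an assumption).
    """
    return (x + speed * t * math.cos(heading_rad),
            y + speed * t * math.sin(heading_rad))


def curved_predictor(x, y, heading_rad, speed, turn_rate_rad_s, t):
    """Endpoint of a predictor that also holds the current turn rate constant.

    Integrating a velocity vector that rotates at turn_rate_rad_s gives a circular
    arc; a positive turn rate means a counterclockwise (left) turn. Falls back to
    the straight predictor when the turn rate is effectively zero.
    """
    w = turn_rate_rad_s
    if abs(w) < 1e-9:
        return straight_predictor(x, y, heading_rad, speed, t)
    r = speed / w  # signed turn radius
    return (x + r * (math.sin(heading_rad + w * t) - math.sin(heading_rad)),
            y + r * (math.cos(heading_rad) - math.cos(heading_rad + w * t)))


if __name__ == "__main__":
    # Hypothetical aircraft: unit speed, turning through a quarter circle in one time unit.
    s = straight_predictor(0.0, 0.0, 0.0, 1.0, 1.0)
    c = curved_predictor(0.0, 0.0, 0.0, 1.0, math.pi / 2, 1.0)
    print("straight:", s, "curved:", c, "gap:", math.dist(s, c))
```

In this hypothetical quarter-circle turn, the straight predictor ends at (1, 0) while the curved one ends near (0.64, 0.64), so the straight line misplaces the endpoint by about 0.73 units. A gap of this kind grows with turn rate and look-ahead time, which is one way to see why straight predictors offer limited help in curved flight path scenarios.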
Interestingly, the presence or absence of flight path history and the information update rate did not influence performance (although the pilots preferred having history trails displayed and continuous ownship position updates). In a similar experiment, Hart and Loomis (1980) explored the effect of CDTI velocity vectors on pilots' ability to judge lateral relationships between their own and other aircraft. The pilots judged as quickly as possible whether the other aircraft would pass ahead of or behind the ownship at the closest point of approach. Straight predictor lines reduced response time from about 38 s to 31 s and errors from about 15% to 0% in straight flight path scenarios, and from 40 s to 34 s and from 28% to 5%, respectively, in curved flight path scenarios. Straight predictors in curved flight path scenarios did not help much. Palmer (1983) also manipulated the quality of the information provided by the predictor vector in simulated conflict avoidance maneuvering tasks. The pilots avoided triggering the conflict avoidance system (CAS) alarm 90% of the time when the predictive information was free of noise, but their performance suffered substantially (avoiding the CAS alarm only 78% of the time) when the predictor information was degraded. In an eye-movement study with a CDTI, Ellis and Stark (1986) showed that the most common points of interest on that display were the leader line predictors, with the aircraft symbol and the history trail receiving progressively fewer visual fixations. These results attest to the value of velocity vectors to the participants (8 experienced airline pilots) in their task (judging whether an intruder would pass in front of or behind the ownship). Although there seems to be ample empirical evidence that predictor lines support task performance, the experimental tasks reviewed above were typically very short (quick judgments) and involved only a few targets (typically two: the ownship and an intruder). They were hence quite different from the ATC-like tasks of interest in our study.

Velocity Vectors in ATC

There is little empirical research on different predictor displays in ATC, despite their ubiquitous presence on most modern ATC systems. Perhaps this is due to the long evolution of predictor information on ATC displays and the recent research focus on cockpit displays (i.e., CDTIs). Unfortunately, results from CDTI research generalize poorly to ATC because the air traffic controller's task is fundamentally different from the pilot's. In all of the experiments reviewed in the previous section, the task was to estimate the relative positions of at most two targets in the immediate future, or to compare a vehicle's state to some criterion value in direct vehicular control. While controllers do need to estimate aircraft trajectories and judge potential conflicts between pairs of aircraft, they must also maintain situation awareness on a much larger scale, involving tens of aircraft in busy sectors. The paradigm of tracking multiple moving objects that are temporarily masked therefore appears uniquely suitable for examining the effects of velocity vectors on tracking performance in a task closely analogous to that of air traffic controllers.
The present study

In the present study we were interested in the influence of velocity vectors on movement extrapolation. We employed a version of the multiple identity tracking (MIT) task (see Oksama & Hyönä, 2004, 2008) in which, at some point, all moving targets temporarily disappear from the screen (Oinonen, Oksama, & Hyönä, 2009). Participants were instructed to continue tracking the invisibly moving targets and to extrapolate their movement. When the targets reappeared, the participants' task was to indicate as fast as possible the current location of one auditorily probed target. In MIT, the moving elements all have distinct identities, analogously to real-life tracking tasks; the task thus requires continuous updating of what-where bindings. The task was designed to be a sensitive measure of the observers' immediate situation awareness. To examine different features of vector information, we compared four velocity vector conditions (no vectors, only a history trail, velocity vectors corresponding to the masking duration, and heading vectors of constant length) over three target masking durations. Based on Keane and Pylyshyn (2006) and Oinonen et al. (2009), we expected that the vectors corresponding to the masking duration, which visually present the exact future target location, would improve performance the most compared to the baseline.

METHOD

Participants

Twenty undergraduate psychology students (median age 24 years) from the University of Turku, Finland, volunteered for the study.

Apparatus and Experimental Task

The stimuli were presented on a 19-inch Samsung SyncMaster monitor with a resolution of 1280 × 1024 pixels, driven by a computer with an AMD Sempron 2400+ processor (1.66 GHz), 1.0 GB of RAM, and an NVIDIA GeForce FX5200 graphics card. The experiment was programmed with the E-Prime software (Schneider, Eschman, & Zuccolotto, 2002a, 2002b). The program that generated the motion sequences was written in Visual Basic. The stimuli were 8 black-and-white line drawings (familiar objects and animals) chosen from the picture corpus of Snodgrass and Vanderwart (1980). The pictures were chosen so that their Finnish names began with a consonant and comprised 5 letters and 2 syllables. Their pronunciation time was about 700 ms. The names were also matched for frequency in a written language corpus (Laine & Virtanen, 1999). Pictures were 75 × 39 pixels in size and subtended a visual angle of 1.9 × 1.0 degrees. The objects' movement directions were randomly chosen from the four intermediate compass directions (northeast, southeast, southwest, and northwest). Object speed was 2.42 degrees/s. Objects were allowed to move along intersecting trajectories. The objects moved for 5 s, after which they disappeared from the display but continued to move. Participants were instructed to continue tracking the invisibly moving targets and to extrapolate their movement during masking (disappearance). After a given time the objects reappeared, and the participants' task was to immediately locate and indicate the auditorily designated target on the screen.

Design

Independent variables.
We manipulated the masking duration (1680, 2520, and 3360 ms, resulting in object displacements during masking of approximately 4, 6, and 8 degrees of visual angle, respectively) and the number of targets to be tracked (2 or 3 out of the total of 8 objects). We also employed four velocity vector conditions: (1) in the baseline (B) condition, no vectors were displayed; (2) in the track (T) condition, only a history trail was displayed, indicating the objects' movement direction but not their future positions; (3) in the congruent (C) condition, the velocity vectors corresponded to the masking duration, that is, the end of each velocity vector pointed exactly to where the object would reappear after masking; (4) finally, in the heading (H) condition, the velocity vectors had a constant length, predicting the objects' positions 2520 ms into the future. Note that this matched the congruent condition when the masking duration was 2520 ms, whereas with a masking duration of 1680 ms the vector pointed beyond, and with 3360 ms short of, the point where the objects would reappear. This condition tested whether the length of the vector aided anticipation of the targets' movement. In all vector display conditions (track, congruent, and heading), vectors were shown for all eight objects.

Dependent variable. Movement extrapolation performance was measured by the response time (RT) to locate a requested target at the moment the targets reappeared.

Design. The experimental design included three within-participant variables: set-size (2 or 3), masking duration (1680, 2520, or 3360 ms), and vector condition (B, T, C, or H). Each of the resulting 24 conditions was replicated 10 times, for a total of 240 trials per participant.

Procedure

A total of 60 different object trajectories were created, 10 for each combination of set-size and masking duration. The same trajectories were used in all vector conditions. The target pictures to be identified and their mean distance from the screen center at the moment they reappeared were matched across all experimental conditions. At the beginning of each trial, the objects were displayed stationary for 1 s. A black frame then flashed on and off ten times (3 s) around the target objects. All objects then began to move in a random and continuous fashion on the screen. The tracking period lasted 5 s, after which the objects disappeared for one of the three masking durations. When the objects reappeared on the screen (they no longer moved), participants simultaneously heard the name of one target picture through earphones. They were instructed to locate and point to the specified target with the computer mouse as quickly as possible. The mouse cursor was initially positioned in the center of the screen.
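The design figures given above can be checked with a short script. This is a minimal sketch, assuming straight-line motion at the constant speed of 2.42 degrees/s reported in the Method; the constant and function names are hypothetical, chosen for illustration. It enumerates the within-participant cells, derives the trial and trajectory counts, converts each masking duration into the distance an object travels while invisible, and computes how far the tip of a fixed 2520 ms heading vector lies from the actual reappearance point.

```python
from itertools import product

# Parameters as reported in the Method section.
SPEED_DEG_PER_S = 2.42             # object speed (degrees of visual angle per second)
MASKING_MS = (1680, 2520, 3360)    # masking durations
HEADING_HORIZON_MS = 2520          # fixed look-ahead of the heading (H) vectors
SET_SIZES = (2, 3)                 # number of targets out of 8 objects
VECTOR_CONDITIONS = ("B", "T", "C", "H")
REPLICATIONS = 10                  # trials per cell


def displacement_deg(duration_ms: float) -> float:
    """Distance (deg) an object moving in a straight line covers in duration_ms."""
    return SPEED_DEG_PER_S * duration_ms / 1000.0


def heading_vector_error_deg(masking_ms: float) -> float:
    """Signed distance, along the direction of motion, from reappearance point to vector tip.

    Positive values mean the tip lies beyond the reappearance point (overshoot),
    negative values mean it falls short (undershoot). A congruent (C) vector
    would always give 0 because its length tracks the masking duration.
    """
    return displacement_deg(HEADING_HORIZON_MS) - displacement_deg(masking_ms)


# Every within-participant cell: set-size x masking duration x vector condition.
cells = list(product(SET_SIZES, MASKING_MS, VECTOR_CONDITIONS))
n_trials = len(cells) * REPLICATIONS
# Trajectories were reused across vector conditions, so only set-size and
# masking duration multiply the number of distinct trajectories.
n_trajectories = len(SET_SIZES) * len(MASKING_MS) * REPLICATIONS

if __name__ == "__main__":
    print(f"{len(cells)} cells, {n_trials} trials, {n_trajectories} trajectories")
    for m in MASKING_MS:
        print(f"{m} ms masking: displacement {displacement_deg(m):.2f} deg, "
              f"heading-vector error {heading_vector_error_deg(m):+.2f} deg")
```

Running it should report 24 cells, 240 trials, and 60 trajectories; displacements of about 4.07, 6.10, and 8.13 degrees (the approximately 4, 6, and 8 degrees stated above); and a heading-vector overshoot of about 2.03 degrees at 1680 ms, no error at 2520 ms, and an equal undershoot at 3360 ms.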
Participants' eye-to-screen distance (57 cm) was controlled with a chin rest. The baseline, track, congruent, and heading conditions were presented in separate blocks. The other variables were randomized within these blocks. The order of blocks was counterbalanced across participants. There was a short rest period between blocks. The entire session took about 100 min.

RESULTS

The results supported our hypothesis. The congruent condition, in which the vector length corresponded to the masking duration and thus predicted the target position upon reappearance, resulted in the fastest identification (see Figure 1). Before analysis, erroneous responses (1.3%) were removed from the data. A few response time outliers (0.6%) were also removed (z < -3.29 or z > 3.29). The remaining response time data were analyzed with a 2 (Set-Size) × 3 (Masking Duration) × 4 (Vector Condition) repeated measures analysis of variance (ANOVA). A Greenhouse-Geisser correction was applied to the p-values whenever needed.
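The outlier criterion above (z beyond ±3.29, roughly the two-tailed p < .001 cutoff of the standard normal distribution) can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the text does not state whether z-scores were computed over all data, per participant, or per participant and condition cell, so the hypothetical function `trim_rt_outliers` simply standardizes whatever list of correct-response RTs it is given, using the sample standard deviation.

```python
from statistics import mean, stdev


def trim_rt_outliers(rts, z_crit=3.29):
    """Split correct-response RTs into (kept, removed) using a |z| > z_crit rule.

    z-scores use the mean and sample SD of the list passed in; call the function
    once per grouping (e.g., per participant) if that is the intended
    standardization. The grouping itself is an assumption, not taken from the paper.
    """
    rts = list(rts)
    if len(rts) < 2:
        return rts, []          # sample SD undefined: nothing can be flagged
    m, sd = mean(rts), stdev(rts)
    if sd == 0:
        return rts, []          # all RTs identical: no outliers
    kept, removed = [], []
    for rt in rts:
        (removed if abs(rt - m) / sd > z_crit else kept).append(rt)
    return kept, removed


if __name__ == "__main__":
    # Hypothetical RTs (ms) for one grouping; the last value is an extreme slow response.
    rts = [1000 + 10 * i for i in range(19)] + [4000]
    kept, removed = trim_rt_outliers(rts)
    print(f"removed {removed}, kept {len(kept)} of {len(rts)}")
```

One design consequence worth keeping in mind: with the sample SD, the largest attainable |z| in a group of n values is (n - 1)/sqrt(n), so in groupings of 12 or fewer RTs this rule can never flag anything. The choice of grouping therefore matters as much as the cutoff.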


[Figure 1 plot: response time (ms; axis 1000 to 1300) by masking duration (1680, 2520, 3360 ms) for the Baseline, Track, Congruent, and Heading conditions.]

Figure 1. Response times (ms) as a function of masking duration (ms) and vector condition (baseline, track, congruent, and heading). Vertical lines depict the standard error of the mean (±1 SEM).

The main effect of set-size was significant, F(1, 19) = 233.53, p < .001; RTs lengthened as set-size increased. The main effect of masking duration was also significant, F(2, 38) = 10.77, p < .001; RTs lengthened as masking duration increased. The main effect of vector condition was significant as well, F(3, 57) = 4.49, p < .01. Planned comparisons contrasted the baseline condition with each of the other vector conditions. The congruent vector condition differed from the baseline, F(1, 19) = 6.51, p < .05; RTs were faster in the congruent vector condition than in the baseline condition. The other vector conditions did not differ from the baseline: track, F(1, 19) = 1.68, p > .20; heading, F < 1. The interaction between set-size and masking duration was significant, F(2, 38) = 5.84, p < .05, with the effect of masking duration more robust for set-size 3 than for set-size 2. The other interactions were not significant (Set-Size × Vector Condition, F(3, 57) = 1.87, p > .10; Masking Duration × Vector Condition, F < 1; three-way interaction, F < 1).

DISCUSSION

Our study examined the influence of velocity vectors on movement extrapolation when tracking multiple invisible objects. We tested this with the MIT task, in which at some point all moving targets temporarily disappeared from view. The participants' task was to immediately locate and indicate a requested target after it reappeared on the screen. We compared four velocity vector conditions (no vector, only a history trail, a velocity vector corresponding to the masking duration, and a vector of constant length) over three target masking durations. Our results indicate that only the vector providing complete information about the target's future location improved movement extrapolation compared to the condition with no visual vectors. Surprisingly, neither the history trail nor the forward heading vector of constant length (which did not point to the future location on all trials) improved performance. If anything, the history trail condition was slightly worse than the baseline condition (no velocity information at all). Our results suggest that only explicit visual information about the future locations of invisibly moving targets helps observers anticipate their locations after occlusion. Movement history or directional information about the targets' trajectories, on the other hand, does little to aid movement extrapolation with multiple targets.

Our results are consistent with Keane and Pylyshyn's (2006) displacement hypothesis. According to this hypothesis, when objects disappear from view, observers maintain the last visible locations of the target objects in visual short-term memory (VSTM). When the objects reappear, this VSTM representation is used as a reference in relocating the targets. This view is also supported by Oksama and Hyönä (2008), whose model of multiple identity tracking (MOMIT) suggests that, to maintain an up-to-date representation of multiple moving targets, focal attention is serially switched between the targets. When focal attention is directed to a target, its identity-location binding is refreshed and updated. When a target becomes unattended, or in this case invisible, its last visible and attended location is stored in VSTM. Therefore, relocating a recently unattended target relies on stored information about that target's last position. Congruent vectors provide explicit information about the targets' positions after occlusion, which can then be stored in VSTM for later use. This is why congruent vectors were superior to the other vector conditions in multiple identity tracking.
This may also explain why the trail condition produced the worst performance: the trail may draw observers' visual attention toward it and thus bias movement extrapolation in the wrong direction.

REFERENCES

Berbert, A. G., & Kelley, C. R. (1962). Piloting nuclear submarines with controls that look into future. Electronics, XXXV (June 8, 1962).
Besco, R. O. (1964). Manual attitude control systems: Vol. II. Display format considerations (Technical report to NASA). Culver City, CA: Hughes Aircraft Company.
Cole, H. W. (1985). Understanding radar. London: Collins.
Ellis, S. R., & Stark, L. (1986). Statistical dependency in visual scanning. Human Factors, 28(4), 421-438.
Fargel, L. C., & Ulbrich, E. A. (1963). Predictive controllers applied to lunar landing. Instrumentation and Control Systems, 36, 130-131.
Grunwald, A. J. (1981). Predictor symbology in computer-generated perspective displays. In J. Lyman & A. Bejczy (Eds.), Proceedings of the 17th Annual Conference on Manual Control (NASA-JPL Pub 81-95).
Hart, S. G., & Loomis, L. L. (1980). Evaluation of the potential format and content of a cockpit display of traffic information. Human Factors, 22(5), 591-604.


Jensen, R. S. (1981). Prediction and quickening in perspective flight displays for curved landing approaches. Human Factors, 23, 355-363.
Kantowitz, B. H., & Sorkin, R. S. (1983). Human factors: Understanding people-systems relationships. New York: John Wiley & Sons.
Keane, B. P., & Pylyshyn, Z. W. (2006). Is motion extrapolation employed in multiple object tracking? Tracking as a low-level, non-predictive function. Cognitive Psychology, 52, 346-368.
Kelley, C. R. (1962). Predictor instruments look to the future. Control Engineering, 86, 86-90.
Kelley, C. R. (1968). Manual and automatic control. New York: Wiley.
Kelly, J. R. (1983). Effect of lead-aircraft ground-speed quantization on self-spacing performance using a cockpit display of traffic information (NASA TP-2194).
Kelly, J. R., & Abbott, T. S. (1984). In-trail spacing dynamics of multiple CDTI-equipped aircraft queues (NASA TM-85699). Hampton, VA: NASA Langley.
Kelly, J. R., & Williams, D. H. (1983). Factors affecting in-trail following using CDTI. In F. L. George (Ed.), Proceedings of the Eighteenth Annual Conference on Manual Control (AFWAL-TR-83-3021, pp. 485-513). U.S. Air Force.
Laine, M., & Virtanen, P. (1999). WordMill Lexical Search Program. Turku, Finland: Centre for Cognitive Neuroscience, University of Turku.
McLane, R. C., & Wolf, J. D. (1966). Symbolic and pictorial displays for submarine control. Paper presented at the MIT-NASA Working Conference on Manual Control, Cambridge, MA, February 28 to March 2, 1966.
Merwin, D. H., & Wickens, C. D. (1996). Evaluation of perspective and coplanar cockpit displays of traffic information to support hazard awareness in free flight (Technical Report ARL-96-5). Savoy, IL: Aviation Research Laboratory.
Nolan, M. S. (1998). Fundamentals of air traffic control (3rd ed.). Pacific Grove, CA: Brooks/Cole-Wadsworth.
Oinonen, K., Oksama, L., & Hyönä, J. (2009). Movement extrapolation and identity-location binding for invisible objects. In submission.
Oksama, L., & Hyönä, J. (2004). Is multiple object tracking carried out automatically by an early vision mechanism independent of higher-order cognition? An individual difference approach. Visual Cognition, 11, 631-671.
Oksama, L., & Hyönä, J. (2008). Dynamic binding of identity and location information: A serial model of multiple identity tracking. Cognitive Psychology, 56, 237-283.
Palmer, E. A. (1983). Conflict resolution maneuvers during near miss encounters with cockpit traffic displays. Proceedings of the 27th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human Factors Society.
Palmer, E., Baty, D., & O'Connor, S. (1979). Perception of aircraft separation with various symbols on a cockpit display of traffic information. Proceedings of the 15th Annual Conference on Manual Control. Wright-Patterson Air Force Base, OH.
Palmer, E. A., Jago, S. J., Baty, D. L., & O'Connor, S. L. (1980). Perception of horizontal aircraft separation on a cockpit display of traffic information. Human Factors, 22(5), 605-620.
Roscoe, S. N., Corl, L., & Jensen, R. S. (1981). Flight display dynamics revisited. Human Factors, 23, 341-353.
Sanders, M. A., & McCormick, E. J. (1993). Human factors in engineering and design. New York: McGraw-Hill.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002a). E-Prime user's guide. Pittsburgh, PA: Psychology Software Tools, Inc.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002b). E-Prime reference guide. Pittsburgh, PA: Psychology Software Tools, Inc.
Sheridan, T. B. (1966). Three models of preview control. IEEE Transactions on Human Factors in Electronics, HFE-7(2), 91-102.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174-215.
Wickens, C. D., Haskell, I., & Harte, K. (1989). Ergonomic design for perspective flight path displays. IEEE Control Systems Magazine, 9(4), 3-8.
Williams, D. H. (1983). Time-based self-spacing techniques using cockpit display of traffic information during approach to landing in a terminal area vectoring environment (NASA TM-84601).
