International conference on Driver Distraction and Inattention, Sept-2011, Gothenburg, Sweden
Effects of aging and a cognitive competing task on the setting of the salience-relevance balance in visual search
C. Gabaude1, J.E. Cerf1,2, C. Jallais1, E. Douissembekov1,2, D. Letisserand1, L. Paire-Ficout1 and G.A. Michael2
1 Université de Lyon, F-69622, Lyon - IFSTTAR, LESCOT, F-69675, Bron, France
2 Département de Psychologie Cognitive & Neuropsychologie, Institut de Psychologie, Laboratoire d'Étude des Mécanismes Cognitifs, Université Lyon 2, Lyon, France
Abstract
Goal-directed and stimulus-driven processes interact during the deployment of attention while driving. Using a road-element localization task, we investigate the salience/relevance balance (SRB) in visual search by seeking to understand the influence of age, task demands and scene presentation conditions. Twenty-four older drivers (mean age = 71.3) and 24 younger experienced drivers (mean age = 35.4) were asked to localize a target (a pedestrian, a vehicle, a traffic light or a road marking) as quickly as possible in a fixed driving scene. Pictures of intersections were segmented into 9 sections and shown in random order under three different presentation conditions: original (scenes presented in their original form), partly jumbled (the target appeared in its original position, but the remaining sections were jumbled), or fully jumbled (all locations were jumbled, ensuring that the target was not in its original position). Half of the trials were administered while participants performed a competing semantic judgment task (dual task), and the remainder without it (single task). Eye movement strategies were analyzed. Two indices, Normalized Scanpath Salience and Normalized Scanpath Relevance, were developed to compare salience maps and relevance maps with visual scanpaths. Results provide evidence for the existence of adjustments of the SRB and for the need for available attentional resources to make such adjustments. Furthermore, we show that, despite the greater driving experience of elderly drivers, the competing cognitive task has a deleterious effect on the organization of their visual search, which is not compensated by a relevance-oriented adjustment.
Introduction
Among human activities, driving requires a broad range of skills and competencies that cannot be reduced to a simple sequence of motor tasks, because it recruits attention-demanding cognitive functions (Horberry et al., 2006). New research themes have emerged since the 1980s because of the aging of the population in industrialized countries, particularly regarding driving: demographic projections indicate that the elderly will represent a much larger share of the population than today, 30% in 2040 against 16.5% today. In addition, aging is accompanied by several declines, such as slower reaction times and reduced visual performance (narrower visual field, lower acuity and contrast sensitivity, and longer adaptation to darkness), but also a decrease in attentional capacity and working memory as well as increased difficulty in dual tasks. Nevertheless, some literature shows that experience can compensate for the declines that occur with age (Marquié and Isingrini, 2001). As older drivers have longer driving experience, they have developed specific strategies to avoid difficulties (Gabaude, Marquié, Obriot-Claudel, 2010). A previous study (Jallais, Paire-Ficout & Gabaude, 2009) showed that the elderly can compensate for declines through an operative strategy reflecting expectations, probably related to their greater driving experience.
Driving a car supports social bonds and guarantees an autonomy that is necessary, or even indispensable. Given the societal and economic challenges represented by car use among the elderly, particularly in the fight against isolation and dependence, it is necessary to better understand and prevent the effects of aging by studying its impact on attentional control while driving.
Most information about the environment is received through the visual modality. Because of differences in acuity between the central and peripheral visual fields, the eyes make constant movements to bring the image of the object of interest onto the fovea (Fiori, 2006). Thus, the perceived image is an assembly of images of different parts of the scene formed on the fovea and nearby regions. Michael et al. (2006; 2007) explain that the amount of information provided by the visual environment poses serious problems for organisms evolving in it when they need to use such information to develop coherent and adapted behaviors. Among the information available at once, which should be selected and processed first, and why? It is widely accepted that attention is the mechanism determining these priorities.
Visual attention is the ability of biological systems to mobilize resources for processing important parts of a scene in order to reduce the amount of data to be analyzed further on by more complex cognitive functions, such as the mapping of various characteristics and object recognition. The ability to explore the environment quickly and efficiently for important information is crucial for car drivers. The road is a rather unstable environment, since drivers must understand the road signs and the behavior of other drivers while controlling the vehicle. In such an environment, attentional capacity can be quickly saturated, and cognitive overload may have dramatic consequences.
Psychologists conceive of cognitive effort in terms of cognitive cost, that is to say, the resources allocated at a given time to the processing system to solve a particular task using a certain strategy. Since Shiffrin and Schneider (1977), the cost of processing has been considered to be related to the level of control over the system. Thus, the more a process is controlled, the more attentional resources it requires. In contrast, automatic processes are less resource-consuming. We are interested in the possibility of establishing a link between the operator's capacities for control, cognitive effort and attention by observing how this adjustment can be made.
Selection and prioritization of visual signals depend on their intrinsic nature and/or their importance for the task at hand. Exogenous processes, controlled by sensory inputs and functioning in a rather automatic and involuntary mode, and endogenous processes, guided voluntarily by the expectations and goals of the observer, jointly determine processing priorities on the basis of a balanced combination. These mechanisms of visual attention guide eye movements towards the most salient or the most task-relevant part of the scene (Van der Stigchel, Peters and Theeuwes, 2009).
The Master Activation Map model of attention (MAMm; Michael et al., 2006; 2007) is hybrid in that it is based on previous models of attention and, as such, it includes those models. The basic and novel feature is the existence of a computational map (the “master activation map” or MAM; see Figure 1), which integrates activities related to visually salient signals and those related to the relevance of items or locations for the current task. Spatial locations and basic feature dimensions (color, luminance, and so forth) are processed in distinct computational maps, and feature differences are computed for each stimulus dimension. Their sum results in a salience map depicting how different each item is from each of its neighbors (Koch & Ullman, 1985; Theeuwes, 1992). A relevance map is activated if subjects intend to find the target by using the available knowledge regarding this item. The integration of both the salience and relevance activities is thereafter computed within the MAM along with their respective spatial coordinates obtained via the interactions with the spatial map. The MAM output is the only input to an orienting process and serves to guide attention to the strongest signal in the visual field, then the next highest, and so on (Koch & Ullman, 1985; Michael & Galvez-Garcia, 2011). If salience and relevance signals share the same spatial code (i.e., the most salient signal in the visual field is also the one subjects have been searching for), they are integrated (Fecteau & Munoz, 2006) and orienting towards the source of these signals is faster.
If, however, salience and relevance signals have different spatial codes, then they compete for selection. The involvement of an independent active inhibitory process (Michael et al., 2001; 2006) tends to weaken the propensity of bottom-up signals to call for attention if they are not task-relevant (i.e., distractors). Lastly, top-down activities (be it the generation of relevance or the activation of the inhibitory process) depend on higher-order operations that control the distribution of resources (i.e., commodities or pools of energy to be 'spent' on task performance; Norman & Bobrow, 1975) and the maintenance of task-related goals (Watson & Humphreys, 1997). For instance, resource availability will indirectly affect the ability to activate and maintain inhibition for resisting interference. Thus, three subsystems cooperate to produce attention-related phenomena: a computational subsystem made up of maps that process either bottom-up or top-down signals; an operatory subsystem made up of the independent and indirectly interacting processes of orienting and inhibition; and a directing subsystem that generates and maintains task-related goals and controls the top-down flow of information through capacity-limited resources (Michael et al., 2007). The main contribution of the MAM model is the theorization of interactions between bottom-up processing of inputs and top-down goal-directed biases that describe normal human attention-related behaviors as the result of a balance between what is salient and what is relevant for the current task at a given moment.
Figure 1: the Master Activation Map model of attention
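As a purely illustrative sketch of the integration and orienting steps described above, the following Python fragment combines a salience map and a relevance map into a single activation map and then visits locations in decreasing order of activation. The weighted sum, the weights `w_salience` and `w_relevance`, and the function names are assumptions made for this example; the description of the MAM model given here does not specify them, and the inhibitory process is not modeled.

```python
import numpy as np


def master_activation_map(salience, relevance, w_salience=0.5, w_relevance=0.5):
    """Combine salience and relevance activities location by location.

    The weighted sum and the weights are hypothetical; they only serve to
    illustrate a balance that can be tilted one way or the other.
    """
    salience = np.asarray(salience, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    return w_salience * salience + w_relevance * relevance


def orienting_sequence(mam, n_shifts=3):
    """Return the (row, col) locations visited, strongest signal first."""
    activity = np.array(mam, dtype=float)  # work on a copy
    visited = []
    for _ in range(min(n_shifts, activity.size)):
        idx = np.unravel_index(np.argmax(activity), activity.shape)
        visited.append((int(idx[0]), int(idx[1])))
        activity[idx] = -np.inf  # do not select the same location twice
    return visited


# Toy 2 x 2 scene: the most salient cell and the task-relevant cell differ.
salience = np.array([[0.9, 0.1], [0.2, 0.3]])
relevance = np.array([[0.0, 0.2], [1.0, 0.1]])
balanced = orienting_sequence(master_activation_map(salience, relevance))
```

In this toy scene, shifting weight from `w_relevance` to `w_salience` changes which location is visited first, which is one simple way to picture a balance tilted towards salience.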
The role of spatial structure and context in visual search has been demonstrated with the jumbled image paradigm developed by Biederman and colleagues (Biederman et al., 1973; Biederman, 1981), and this paradigm has also been used to investigate the way drivers locate targets on the road (Chapman et al., 2002). During visual search, a schema can be activated to provide a spatial representation of the organization of the scene. This guides selection of the probable locations of the targets in an otherwise ordinarily arranged scene. However, a scene in which the elements are presented at unusual locations would contradict the schema, and visual search would be disturbed (Biederman et al., 1973). Jallais et al. (2009) used such a protocol with intersection pictures to measure, in single and dual tasking, the number of eye fixations in predefined areas and the localization times for road elements in younger and older participants. The results revealed that older participants were slower than younger participants to find the visual element, especially under dual tasking. Furthermore, the visual search of older participants was found to be particularly disorganized under dual tasking. The authors wondered whether a voluntary exploratory strategy (i.e., top-down guidance) was employed by older drivers to compensate for their age-related attentional decline. It has been shown that cognitive tasks completed during driving increase gaze frequency in the middle of the road (Recarte et al., 2003), probably indicating an increase in information selectivity in central vision before action and a decrease in the attractiveness of peripheral targets, but also a deterioration in the acquisition of the information necessary for safe driving.
The aim of this work was to investigate the influence of age, task demands and scene presentation conditions on visual search for elements located at salient or relevant locations in complex visual scenes. The jumbled picture paradigm was used to investigate the way salience and relevance guide visual search for targets in fixed scenes of intersections. Indeed, scenes activate schemata and guide attention in a top-down fashion towards relevant locations. Presenting these elements at unusual locations would contradict the activated schemata, and salient elements rather than relevant locations would attract attention. To this end, two indexes were computed, one reflecting the bottom-up use of salience, the other the top-down use of relevance. Based on the work of Peters et al. (2005), we computed the degree of correspondence between a salience map and the visual scanpath of participants (Normalized Scanpath Salience index, or NSS). Similarly, an index of the degree of correspondence between the relevance of the targets' locations and visual fixations was computed (Normalized Scanpath Relevance index, or NSR). It can show the influence of spatial structure in visual search.
Our first hypothesis was that the more jumbled a scene is, the higher the NSS and the lower the NSR will be. Second, older people would compensate for physiological losses in vision with more efficient visual strategies involving top-down processes (Porter et al., 2009). They would therefore tend to search for items located in highly relevant locations exhibiting low salience, and the balance should thus tilt towards relevance. The NSR index should be higher while the NSS index would be reduced.
Third, dual tasking causes an additional cognitive load, and the associated decrement of available attentional resources would favor bottom-up mechanisms, which are thought to be less costly in attentional resources. In this case, the balance would be tilted towards salient visual elements. The value obtained with our NSS index would thus be higher in the dual task, while the NSR would be higher in the single task. Finally, under the joint effects of aging and the competing task, the elderly would compensate by relying on expectations developed through their driving experience. We therefore hypothesized that the attentional resources available during the dual task would still be sufficient for them to use top-down strategies, in this case setting the balance in favor of relevance.
Material and method
Participants
Twenty-four older drivers (30% women, M = 71.33 years, SD = 4.5; Mini Mental State Examination or MMSE = 28.75, SD = 1.03, min = 26, max = 30) and 24 young drivers (46% women, M = 35.38 years, SD = 3.35) participated in this experiment. Of the 24 young drivers who agreed to participate, 4 were excluded because of technical problems during the eye tracking data collection. Among the older drivers, one participant was excluded due to the poor quality of the eye tracking data. In the end, 20 young participants and 23 older participants were included in the data analysis. The MMSE was used to verify the absence of cognitive impairment in older participants. All participants were experienced drivers traveling at least 3000 km per year and had a visual acuity of at least 5/10.
Stimuli
Twelve pictures of intersections were used in this experiment, each one containing a car, a pedestrian, a traffic light and a road marking (an arrow). For each trial, only one of these objects was the target. The scenes were divided into 9 sections (Chapman et al., 2002) and shown in three possible presentation conditions: original (the original picture was left unchanged, Figure 2), partly jumbled (the target object appeared in its original position, while the remaining 8 sections were rearranged, Figure 3) and fully jumbled (all sections were mixed so that the target was not in its original location, Figure 4).
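The three presentation conditions can be sketched as permutations of the 9 sections. The Python fragment below is a minimal illustration, not the procedure actually used to build the stimuli: the function name `jumbled_layout`, the row-by-row numbering of sections from 1 to 9, and the use of random shuffling are assumptions, and the sketch only enforces the constraint stated above on the target's position (it does not, for instance, force the target to change horizontal level).

```python
import random


def jumbled_layout(target, condition, rng):
    """Return a list where layout[p - 1] is the original section shown at position p.

    target    -- original section (1..9) containing the target object
    condition -- "original", "partly" or "fully"
    rng       -- a random.Random instance (hypothetical source of randomness)
    """
    sections = list(range(1, 10))
    if condition == "original":
        return sections
    if condition not in ("partly", "fully"):
        raise ValueError("unknown condition: %r" % condition)
    while True:
        layout = sections[:]
        rng.shuffle(layout)
        if condition == "partly":
            # Put the target section back in its original position.
            i = layout.index(target)
            layout[i], layout[target - 1] = layout[target - 1], layout[i]
            if layout != sections:  # the other sections must actually move
                return layout
        elif layout[target - 1] != target:
            # Fully jumbled: the target must leave its original position.
            return layout


rng = random.Random(0)
partly = jumbled_layout(5, "partly", rng)
fully = jumbled_layout(5, "fully", rng)
```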
Figure 2: Driving scene displayed in the original condition.
Figure 3: Image displayed in the partly jumbled condition.
The section containing the target (here, the vehicle) retains its original position.
Figure 4: Image displayed in the fully jumbled condition.
The section containing the target (here, the vehicle) is moved by changing the horizontal level.
As the driving scenes corresponded to the driver's view, the traffic lights were located at the top of the image (sections 1, 2 or 3), the pedestrians and the cars in the middle areas (sections 4, 5 or 6) and the road markings in the bottom sections (sections 7, 8 or 9). A projector (EPSON Full HD model EMPTW 1000) displayed the image on a non-reflective screen (120 x 80 cm) placed at a distance of 130 cm from the participant. The image subtended a horizontal visual angle of 30°. The lighting conditions were constant during the experiment.
Procedure
Participants had to localize a specific target (pedestrian, vehicle, traffic light or road marking) as quickly as possible by indicating the section containing the target object. In each new trial, the target's name was displayed for 2000 ms as a text message. Then, a mask appeared for 800 ms, followed by the driving scene (ISI = 0 ms). The image was displayed until the participant indicated, by pressing a button, that he/she had located the target. As soon as the participant pressed the button, a grid was displayed on the screen to reveal the numbers of all sections. The participant then had to indicate the number of the section where he/she had seen the target (Figure 5).
A competing task was added to the localization task in order to increase cognitive load and decrease the available attentional resources. In this semantic judgment task, participants listened to short sentences and were asked to decide whether the statements were true or false. Thirty-six sentences were used; the number of syllables of each sentence was controlled and each sentence lasted about 2 seconds. The order of appearance of all kinds of targets was counterbalanced.
Each participant received 72 trials obtained by combining 2 tasks (single or dual task), 3 presentation conditions (original, partly jumbled and fully jumbled) and 4 targets (car, pedestrian, traffic light and road marking) across the twelve scenes. In order to counterbalance all pictures, targets and conditions, we created 8 scenarios with the PsyScope software (4 started with the single task and finished with the dual task, and the same 4 scenarios were run with the dual task first). Participants were randomly assigned to one of the 8 scenarios. For each participant, the twelve intersection scenes were displayed six times (in single and dual task, and in the three presentation conditions, i.e., 12 × 6 = 72 trials; each time the participant was looking for a different target). Eye tracking (FaceLab™ 4.3, Seeing Machines, Australia) was used to record participants' visual strategies. In the dual task, the presentation of the distracting sentences started at the onset of the word prime.
Indexes of salience and relevance
Definition of salience maps
For each picture, a salience map was obtained with Matlab® 7.0 using the algorithm developed by Itti and Koch (2000), and it was subsequently compared with the visual scanpath of the participants (Peters et al., 2005). We used the default settings of the algorithm to process all images (in all presentation conditions). Some examples of salience maps are given in Figures 6 to 8. The most salient areas are shown in dark red.
Figure 5: Time course of displaying images in a single task.
Figure 6: Salience map in the original condition.
Figure 7: Salience map in the partly jumbled condition.
Figure 8: Salience map in the fully jumbled condition.
Definition of relevance maps
When computing relevance maps, we set relevance scores depending on the prime presented beforehand and the potential location of the target within a scene. Greater values were attributed to those segments of the scene potentially containing the target, and values decreased progressively towards irrelevant segments. The targets are distributed in the scene over three horizontal levels. A different coding was applied depending on the type of target. However, the vehicle and pedestrian targets had to be eliminated because they were mostly located in segment No. 5, which contained the word prime indicating the target at the beginning of each trial and thus guided the participant's attention to that specific segment. Locations potentially containing the target "traffic light", located at the top of the image, received 3 points, the intermediate horizontal locations (just below) received 1 point and those on the lower horizontal level received 0 points. Similarly, locations potentially containing the target "arrow", located on the lower horizontal level, received 3 points, the intermediate horizontal locations (just above) received 1 point and those on the upper horizontal level received 0 points.
Eye gaze data processing
The data characterizing the visual scanpath were collected with the FaceLab™ system, and signal processing was applied to realign the collected data and to clean, filter and smooth the scanpath data. Based on data collected for the left eye and the right eye, FaceLab™ provides the gaze vector in two dimensions (coordinates x and y). The FaceLab technology detects visual saccades but does not characterize visual fixations in real time. The system provides interpolations of the x and y data during saccades or blinks. We wrote a Matlab® algorithm that cleans, filters and smooths the data in order to locate the visual fixations on each of the scenes used. We first eliminated the saccades (eye movements taking place at speeds between 300 and 800 °/sec) and eye blinks, and then characterized visual fixations with reference to the criteria used by Tobii©. We used the threshold for a mixed-content task (reading and observing pictures), i.e., a fixation is defined when the eye movement stays under 30 pixels for 100 ms. We therefore used this threshold to calculate, in a time window of at least 100 ms, the barycenter of all points collected within a circle 30 pixels in diameter. The coordinates of each point collected at 60 Hz are shown in blue on the images; eye fixations are represented by red crosses.
The quality of the data collected by FaceLab™ fluctuated during the experiment. The system provides a quality index for each eye that varies from 0 to 3. At each time step, we added these two indices and kept for the fixation analysis only the data whose summed quality was above 3 (7% of trials were removed in this way). Sometimes FaceLab™ malfunctioned and the eye data froze for relatively long periods of time. To eliminate these artifacts, each of the trials carried out by our participants was inspected visually in order to remove altered trials (4% of trials were removed in this way).
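The fixation criterion described above (gaze staying within a circle 30 pixels in diameter for at least 100 ms, i.e., 6 samples at 60 Hz) can be sketched as a simple dispersion-based detector. The Python fragment below is one possible reading of that criterion, not the Matlab algorithm used in the study: the function names, the circle test centered on the barycenter of the window, and the greedy growth of the window are assumptions, and saccade removal, blink removal and quality filtering are assumed to have been done beforehand.

```python
import numpy as np


def _fits_in_circle(window, diameter_px):
    """True if every sample lies within diameter_px / 2 of the window barycenter."""
    center = window.mean(axis=0)
    return np.max(np.linalg.norm(window - center, axis=1)) <= diameter_px / 2.0


def detect_fixations(x, y, diameter_px=30.0, min_samples=6):
    """Return a list of (x_barycenter, y_barycenter, n_samples) fixations.

    x, y        -- gaze coordinates in pixels, sampled at a fixed rate
    min_samples -- minimum fixation duration in samples (6 samples = 100 ms at 60 Hz)
    """
    pts = np.column_stack([x, y]).astype(float)
    n = len(pts)
    fixations = []
    start = 0
    while start + min_samples <= n:
        end = start + min_samples
        if not _fits_in_circle(pts[start:end], diameter_px):
            start += 1  # slide the window by one sample
            continue
        # Grow the window while the samples still fit in the circle.
        while end < n and _fits_in_circle(pts[start:end + 1], diameter_px):
            end += 1
        cx, cy = pts[start:end].mean(axis=0)
        fixations.append((float(cx), float(cy), end - start))
        start = end
    return fixations


# Synthetic trace: two stable periods separated by a short transition.
x = [100] * 8 + [300] * 3 + [500] * 7
y = [100] * 18
fixations = detect_fixations(x, y)
```

In this synthetic trace, the three samples at x = 300 last less than 100 ms and are therefore not reported as a fixation.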
Choice of indexes to estimate the salience and relevance
To measure the degree of correspondence between a salience map and visual fixations, the NSS score from Peters et al. (2005) was used. First, each salience map is linearly normalized to have zero mean and unit standard deviation. Then the normalized salience values are extracted at each point corresponding to a fixation along the scanpath. The NSS score is the sum of all these normalized salience values divided by the number of fixations. The stronger the correspondence between the salience map and the scanpath, the higher the NSS score (NSS > 0). A score of zero indicates no correspondence: gaze is guided at chance. By contrast, an anti-correspondence gives a negative score (NSS