Journal of Experimental Psychology: Human Perception and Performance 2007, Vol. 33, No. 4, 927–937

Copyright 2007 by the American Psychological Association 0096-1523/07/$12.00 DOI: 10.1037/0096-1523.33.4.927

Spatial Attention and Audiovisual Interactions in Apparent Motion

Daniel Sanabria
Universidad de Granada and University of Oxford

Salvador Soto-Faraco
Institució Catalana de Recerca i Estudis Avançats, Parc Científic de Barcelona, and Universitat de Barcelona

Charles Spence
University of Oxford

In this study, the authors combined the cross-modal dynamic capture task (involving the horizontal apparent movement of visual and auditory stimuli) with spatial cuing in the vertical dimension to investigate the role of spatial attention in cross-modal interactions during motion perception. Spatial attention was manipulated endogenously, either by means of a blocked design or by predictive peripheral cues, and exogenously by means of nonpredictive peripheral cues. The results of 3 experiments demonstrate a reduction in the magnitude of the cross-modal dynamic capture effect on cued trials compared with uncued trials. The introduction of neutral cues (Experiments 4 and 5) confirmed the existence of both attentional costs and benefits. This attention-related reduction in cross-modal dynamic capture was larger when a peripheral cue was used compared with when attention was oriented in a purely endogenous manner. In sum, the results suggest that spatial attention reduces illusory binding by facilitating the segregation of unimodal signals, thereby modulating audiovisual interactions in information processing. Thus, the effect of spatial attention occurs prior to or at the same time as cross-modal interactions involving motion information.

Keywords: spatial attention, cross-modal interactions, apparent motion, audition, vision

It is now widely accepted that the representation of the external world in the brain relies on extensive multisensory information processing (e.g., see Calvert, Spence, & Stein, 2004; Spence & Driver, 2004). Behavioral, neuroimaging, and neurophysiological research have repeatedly demonstrated that attention can influence the processing of sensory information (e.g., Reynolds & Chelazzi, 2004; Spence, Shore, & Klein, 2001; Tata & Ward, 2005; Treue, 2004). However, according to many researchers, cross-modal interactions (i.e., multisensory integration) occur independently of the locus of spatial attention. Indeed, it has been found that spatial attention can be directed within representations of external space that already result from multisensory integration (e.g., Bertelson & de Gelder, 2004; Driver & Spence, 2004; Soto-Faraco, Ronald, & Spence, 2004; Spence, McDonald, & Driver, 2004). One line of support for this view, for example, comes from failures to modulate the ventriloquism effect by spatial attention (Bertelson, Vroomen, de Gelder, & Driver, 2000; Vroomen, Bertelson, & de Gelder, 2001a). For instance, Bertelson et al. (2000) conducted a study in which they investigated whether spatial ventriloquism depends on the direction of endogenous spatial attention. Participants had to point to the perceived location of a rapidly presented sequence of auditory stimuli, which could originate from the right, from the left, or centrally. To induce spatial ventriloquism, they presented a large square either to the right or left of central fixation at the same time as the auditory stimuli. In order to manipulate the direction of spatial attention, they also required participants to monitor visual targets (i.e., small squares), which were presented either at the same location as the visual attractor or centrally, for occasional changes in their shape. The magnitude of the ventriloquism effect (attraction of the perceived sound location toward the location of the big squares) was unaffected by the attentional manipulation (i.e., by variations in the location of the small squares that participants had to monitor). That is, cross-modal interactions appeared to occur regardless of the focus of participants' selective spatial attention. It is important to note, however, that given that participants in Bertelson et al.'s study had to monitor a visual stimulus for changes in its shape, one might wonder whether object-based rather than space-based attentional processes were actually involved. In fact, object-based attentional effects have been reported irrespective of an object's spatial location (e.g., Vecera & Farah, 1994). This aspect of Bertelson et al.'s study, therefore, leaves open the question of whether endogenous spatial attention modulates spatial ventriloquism.

Daniel Sanabria, Departamento de Psicología Experimental, Universidad de Granada, Granada, Spain, and Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom; Salvador Soto-Faraco, Institució Catalana de Recerca i Estudis Avançats, Parc Científic de Barcelona, Barcelona, Spain, and Universitat de Barcelona, Barcelona, Spain; Charles Spence, Department of Experimental Psychology, University of Oxford.

This study was supported by a Network Grant from the McDonnell-Pew Centre for Cognitive Neuroscience in Oxford to Salvador Soto-Faraco and Charles Spence. We thank David I. Shore and Ana Chica for helpful comments on a draft of this article.

Correspondence concerning this article should be addressed to Daniel Sanabria, Departamento de Psicología Experimental, Universidad de Granada, Campus Cartuja s/n, 18074, Granada, Spain. E-mail: [email protected]


The potential influence of stimulus-driven (exogenous) spatial attention on spatial ventriloquism effects was addressed by Vroomen et al. (2001a) in a subsequent study. Vroomen et al. (2001a) used a staircase methodology (cf. Bertelson & Aschersleben, 1998; Caclin, Soto-Faraco, Kingstone, & Spence, 2002), which reduces the likelihood that any response biases that are present will influence the performance data. The visual attractor consisted of a display with four squares horizontally aligned (with 7.1° separation between them) presented on the screen. One of the squares (i.e., the singleton) was smaller than the others and was always presented at one or the other side of the visual display. The authors reasoned that if attention is exogenously attracted toward the singleton before spatial ventriloquism occurs, then a consistent auditory bias toward the location of the singleton should be found. In contrast with this hypothesis, the results showed a significant bias toward the bigger squares rather than toward the singleton (replicating the results of Bertelson et al.'s, 2000, Experiment 2). In a subsequent control experiment, Vroomen et al. (2001a) demonstrated that the singleton effectively attracted their participants' spatial attention. Although this study seems more conclusive than the one by Bertelson et al. (2000), there is still one possible caveat: as studies of stimulus-driven spatial attention have demonstrated, singletons capture attention to a lesser extent than do visual onsets (see Ruz & Lupiáñez, 2002, for a review). Therefore, it seems likely that if any modulation of cross-modal interactions by exogenous spatial attention exists, it may be found using visual onsets.
These repeated failures to demonstrate any modulatory effect of spatial attention on audiovisual interactions stand in contrast to the extensive body of empirical evidence demonstrating the importance of attention for spatial (e.g., Colby & Goldberg, 1999; Klein & Shore, 2000; Posner, Snyder, & Davidson, 1980; Sengpiel & Hübener, 1999) and perceptual processing within individual sensory modalities (e.g., Briand & Klein, 1987; Driver & Frith, 2000; Hawkins et al., 1990; Klein & Shore, 2000; Reynolds & Chelazzi, 2004; Spence et al., 2001; Tata & Ward, 2005; Treisman & Gelade, 1980; Treue, 2004). In the present study, we report a series of five experiments designed to investigate more thoroughly whether cross-modal (perceptual) spatiotemporal interactions are necessarily preattentive in nature.

The processing of information regarding movement is critical for the construction of a coherent representation of the environment (e.g., Gibson, 1979). And, just as for the case of the processing of static events, recent research has revealed substantial cross-modal links in the perception of motion signals (e.g., see Soto-Faraco & Kingstone, 2004, for a review). Nevertheless, to date, all attempts to show an attentional modulation of cross-modal spatial interactions have involved the presentation of static events (see Bertelson & de Gelder, 2004, for a review). In the present study, therefore, we used a variant of the cross-modal dynamic capture task (e.g., Soto-Faraco, Lyons, Gazzaniga, Spence, & Kingstone, 2002), a behavioral measure of cross-modal interactions in motion, combined with a spatial cuing paradigm (e.g., Posner et al., 1980; Spence & Driver, 1996, 1997) in order to assess whether shifts of covert spatial attention can modulate audiovisual interactions.
In a typical experiment involving the cross-modal dynamic capture task, participants have to report the direction in which an apparent motion stream in a predefined target modality (e.g., audition) appears to move while an irrelevant apparent motion stream is presented in a different modality (e.g., vision). The apparent motion streams consist of the sequential presentation of two static stimuli from different spatial locations. The target (auditory in the present study) and distractor (visual in this case) apparent motion streams can either move in the same (congruent) or opposite (incongruent) directions. Soto-Faraco and colleagues (e.g., Soto-Faraco et al., 2002; Soto-Faraco, Spence, & Kingstone, 2004) have primarily studied interactions between auditory and visual stimuli, although this effect also has been generalized to the case of cross-modal interactions involving visuotactile and audiotactile apparent motion displays (see Lyons, Sanabria, Vatakis, & Spence, 2006; Sanabria, Soto-Faraco, Chan, & Spence, 2005; Soto-Faraco & Kingstone, 2004; Soto-Faraco, Spence, & Kingstone, 2004; see also Craig, 2006). The cross-modal dynamic capture effect is defined as the difference in performance between congruent and incongruent trials. According to the authors (Soto-Faraco, Spence, & Kingstone, 2004), the audiovisual congruency effect reflects visual capture of the direction in which the auditory stimulus appears to move (although the same presumably applies to other pairings of sensory modalities as well, e.g., audiotactile); that is, ventriloquism in motion occurred, analogous to the effects seen in static spatial and temporal ventriloquism reported in other studies (for a review, see Bertelson & de Gelder, 2004; see also Morein-Zamir, Soto-Faraco, & Kingstone, 2003; Sanabria, Spence, & Soto-Faraco, 2007; Vroomen & de Gelder, 2004). Soto-Faraco, Spence, and Kingstone's (2004) results (see also Soto-Faraco, Spence, & Kingstone, 2005) suggested that, on incongruent trials, participants often commit errors because they hear the sound as moving in the same direction as the visual stimulus, even though the two apparent motion streams actually moved in opposite directions.
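To make the dependent measure concrete, the capture effect could be computed from trial-level data roughly as follows. This is an illustrative sketch with a hypothetical data layout, not the authors' analysis code:

```python
# Illustrative sketch (hypothetical data layout): the cross-modal dynamic
# capture effect is the difference in accuracy between congruent and
# incongruent trials, expressed as a percentage.

def capture_effect(trials):
    """trials: list of dicts with keys 'congruent' (bool) and 'correct' (bool)."""
    def accuracy(subset):
        subset = list(subset)
        return 100.0 * sum(t["correct"] for t in subset) / len(subset)

    congruent = accuracy(t for t in trials if t["congruent"])
    incongruent = accuracy(t for t in trials if not t["congruent"])
    return congruent - incongruent

# Example: 99% correct on congruent trials and 43% on incongruent trials
# (the cued condition of Experiment 1) yields a 56% capture effect.
trials = ([{"congruent": True, "correct": i < 99} for i in range(100)]
          + [{"congruent": False, "correct": i < 43} for i in range(100)])
print(capture_effect(trials))  # -> 56.0
```

A larger value thus indicates stronger visual capture of the perceived direction of auditory motion.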
The research conducted by Soto-Faraco, Spence, and Kingstone (2004) provides further evidence to support the existence of substantial links between the perception of movement in different sensory modalities. Their studies to date have shown that cross-modal dynamic interactions are subject to similar rules as cross-modal spatial interactions involving static events. Indeed, spatial and temporal coincidence increases the likelihood of obtaining significant cross-modal dynamic capture effects (Soto-Faraco et al., 2002). Moreover, and relevant to our present purposes, they showed that the behavioral effects obtained using motion stimuli are considerably larger in magnitude than those obtained using static events (Soto-Faraco, Spence, & Kingstone, 2004), which leaves a wider window for detecting a potential modulation of cross-modal interactions by attention at a behavioral level.

Scope of the Present Study

In order to investigate the role of spatial attention in audiovisual interactions, in our experimental manipulation, we cued participants' spatial attention to one of two spatial locations prior to the presentation of the audiovisual apparent motion stream, which could appear afterward at either the cued or the uncued location (cf. Posner et al., 1980). Given the potential differences between the mechanisms of voluntary and reflexive orienting of spatial attention (e.g., Corbetta & Shulman, 2002; Müller & Rabbitt, 1989; Prinzmetal, McCool, & Park, 2005), we investigated the role played by endogenous spatial attention when oriented by experimental instructions (and blocking of the likely target location; Experiment 1; cf. Mazza, Turatto, Rossi, & Umiltà, 2007; Spence & Driver, 1996) or by spatially informative peripheral cues (Experiments 2 and 4; cf. Mondor & Amirault, 1998), as well as the effect of cuing attention by means of noninformative peripheral cues (Experiments 3 and 5). In the present experiments, attention was oriented in a dimension (vertical) that was orthogonal to the direction of stimulus movement (horizontal) and to the participant's response set (left vs. right) in order to minimize any behavioral effects that might be attributable to response biases (cf. Spence & Driver, 1996, 1997). In contrast to other studies of cross-modal motion (e.g., Beer & Röder, 2004), attention was allocated to a particular region of space where the cross-modal stimuli might be presented, rather than to a particular sensory modality and/or direction of motion.

According to most authors (e.g., Paul & Schyns, 2003; Quinlan, 2003; Reynolds & Desimone, 1999; Treisman, 1982, 1996; Treisman & Gelade, 1980; Wolfe, 2003), attention is necessary for the integration of the various different visual features belonging to a given object. A direct extrapolation of this notion to the case of cross-modal perception would predict stronger binding of cues from different modalities (i.e., cross-modal integration processes) at attended locations than at unattended locations (e.g., Driver & Spence, 2004; Treisman & Davies, 1973). Nevertheless, the cross-modal dynamic capture task (as well as the static ventriloquism effect) presumably involves the binding of stimuli whose features are incongruent, as in the perception of illusory conjunctions. Given that an individual object cannot move in opposite directions at the same time, the cross-modal dynamic capture effect would therefore appear to reflect the brain's (in)ability to maintain some form of perceptual coherence (cf. Bedford, 1999, 2004). Therefore, as for the case of illusory conjunctions in the visual domain (where features belonging to different objects can be combined erroneously because of the lack of attention; cf. Treisman & Gelade, 1980; Treisman & Schmidt, 1982), in the present study one might expect a reduced effect of visual information on auditory motion perception at attended locations compared with unattended locations when the two motion streams move in opposite directions. Such a result would appear to suggest that attention reduces the likelihood of the illusory binding of (cross-modal) features from different objects (but see Alsius, Navarra, Campbell, & Soto-Faraco, 2005, for contradictory results in the case of speech perception).

General Method

Seventy-two participants (24 in each of the first three experiments) from the University of Oxford took part in this study. All of the participants reported normal hearing and normal or corrected-to-normal vision, and each received a £5 ($9.85) gift voucher or course credit in exchange for their participation. The participants gave their informed consent prior to the start of the experiment and were fully debriefed at the end of their session. Experiments 1 and 2 took approximately 30 min to complete, whereas Experiment 3 took around 20 min.

Apparatus and Stimuli

Four loudspeaker cones, attached to two metallic supports, were mounted on a table and arranged in a virtual square facing the participant (30 cm center to center), centered 40 cm above the tabletop. An orange and a green LED were positioned at the center of each loudspeaker (see Figure 1). The orange LEDs (1.8 cm in diameter) were used to present the visual apparent motion stimuli, and the green LEDs (0.9 cm in diameter) were used to present the visual cues in Experiments 2 and 3. A red LED situated at the center of the display served as the fixation point. The activation of the loudspeakers and LEDs was controlled using E-Prime software (Schneider, Eschman, & Zuccolotto, 2002a, 2002b). The experiments were conducted in a completely dark, sound-attenuated experimental booth. Auditory apparent motion streams consisted of the sequential presentation of two white noise bursts (50-ms duration, 65 dB[A]), one from each of the two loudspeakers situated either above or below fixation, separated by a 75-ms silent interval. Visual apparent motion streams consisted of the sequential presentation of two light flashes (50-ms duration), one from each of the upper (or lower) orange LEDs, again separated by an interval of 75 ms. Previous research has shown that these spatiotemporal parameters elicit the illusory perception of motion (e.g., Strybel & Vatakis, 2005). The visual and auditory apparent motion streams moved horizontally from left to right or from right to left (either above or below fixation), in the same or opposite directions (congruent and incongruent trials, respectively), and were always presented in temporal synchrony. The visual cue consisted of the synchronous presentation of two light flashes (100-ms duration) from either the top or bottom green LEDs, 200 ms before the appearance of the apparent motion stimuli (i.e., a stimulus onset asynchrony of 300 ms was used).

Figure 1. Illustration of the setup used in Experiments 1–5. The neutral cues were used only in Experiments 4 and 5.
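The stimulus parameters above imply a fixed trial timeline, which can be sketched as follows (times in ms; the event names are ours, not taken from the authors' E-Prime scripts):

```python
# Sketch of the trial timeline implied by the stated stimulus parameters.
CUE_DURATION = 100   # green LED cue flash
CUE_SOA = 300        # cue onset to first apparent-motion stimulus onset
STIM_DURATION = 50   # each white noise burst / orange LED flash
ISI = 75             # silent/dark gap between the two stimuli in a stream

def motion_stream_onsets(cue_onset=0):
    """Return the onset times of the two apparent-motion stimuli,
    measured from the onset of the peripheral cue."""
    first = cue_onset + CUE_SOA
    second = first + STIM_DURATION + ISI
    return first, second

print(motion_stream_onsets())  # -> (300, 425)
```

Note that the 200-ms gap quoted in the text (cue offset to stimulus onset) plus the 100-ms cue duration gives the 300-ms stimulus onset asynchrony.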

Design

Each experiment constituted a within-participants repeated-measures design, with the independent variables of cuing (cued, neutral, and uncued in Experiment 1; cued vs. uncued in Experiments 2 and 3) and congruency (congruent vs. incongruent motion direction). These conditions were fully randomized within each of the three experiments. Half of the trials were congruent and the other half incongruent in each of the cuing conditions and experiments. The cuing variable was manipulated within each experiment as explained below. All participants completed 12 practice trials in which the sounds were presented in the absence of any visual stimuli to ensure that they were able to discriminate the direction of the auditory motion streams when presented in isolation. This practice block was repeated until participants performed at above 90% correct. Additionally, before the start of the experimental blocks in each experiment, the participants completed a block of 16 further practice trials in order to familiarize themselves with the task. None of the participants took part in more than one experiment in this study.

Procedure

The participants sat in a comfortable chair and rested their head on a chinrest placed 65 cm from the central fixation light. The participants were instructed to ignore the irrelevant visual apparent motion stream as much as possible and to respond (with their left index finger using the z key or their right index finger using the m key to indicate leftward vs. rightward auditory motion, respectively) in an unspeeded manner to the target auditory motion as accurately as possible. They were informed about the independence between the direction of the auditory and visual apparent motion streams prior to the start of the experiment. The experimenter (Daniel Sanabria) instructed participants to fixate the central red LED and to refrain as much as possible from making eye movements (this instruction was stressed at the beginning of each block of trials). Eye position was monitored throughout the experiment in a subset of participants (N = 31 across the three experiments) using an infrared video camera placed behind the fixation light (to ensure adherence to the central fixation instructions), which revealed no differences in the pattern of results compared with the unmonitored group (with the exception of a few rare occasions, the monitored participants adhered to the experimental instructions).

A trial started with the onset of the red fixation light, which remained illuminated until the end of the trial. The apparent motion stimuli were presented after an interval of 1,000 ms in Experiment 1. In Experiments 2 and 3, a visual cue was presented 1,000 ms after the onset of the fixation point, followed by the stimulus onset asynchrony of 300 ms and the audiovisual apparent motion stimuli. The fixation light was turned off when participants responded, and there was a 1,000-ms interval before the start of the next trial.

Experiment 1 (informative blocked cuing). Twenty-four participants (13 women; age range: 19–34 years, M = 25 years) completed three blocks of trials: a neutral block of 48 trials in which participants were instructed to divide their attention equally between the upper and lower locations (with the audiovisual motion stimuli being just as likely to appear from above or below fixation); an upper block of 112 trials in which participants were instructed to attend to the upper location because the audiovisual stimuli would appear there on 75% of trials (and from the opposite location on the remaining 25% of trials; cued vs. uncued trials, respectively); and a lower block of 112 trials in which the participants were instructed to attend to the lower location, as the audiovisual stimuli were presented from that elevation on 75% of trials (and from the opposite location on the remaining 25% of trials). The order of the blocks was pseudorandomized (the neutral block was presented either before or after the other two blocks).

Experiment 2 (informative peripheral cue). Twenty-four participants (11 women; age range: 19–33 years, M = 24 years) completed two blocks of 80 trials in which a peripheral visual cue (the top or bottom two green LEDs flashing) indicated the likely location of the audiovisual stimuli (75% cue validity). Participants were informed in advance about this contingency.

Experiment 3 (uninformative peripheral cue). Twenty-four participants (14 women; age range: 18–35 years, M = 21 years) completed a block of 80 trials in which a spatially noninformative visual cue (the two upper or lower green LEDs flashing) was presented before the onset of the audiovisual stimulus. Participants were told that the peripheral cue was not informative with regard to either the position of the subsequent audiovisual stream or the relative congruency between the two unimodal streams, because the cue and apparent motion stimuli were just as likely to come from the same, as from opposite, elevations.
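A cued block like that of Experiment 2 can be sketched as a trial-list generator. This is inferred from the Design and Procedure text (75% cue validity, congruency balanced within each cuing condition), not taken from the authors' scripts:

```python
import random

# Illustrative generator for one 80-trial block with 75% cue validity,
# balancing congruent and incongruent motion within each cuing condition.

def make_block(n_trials=80, validity=0.75, seed=None):
    rng = random.Random(seed)
    n_cued = round(n_trials * validity)
    trials = []
    for i in range(n_trials):
        cued = i < n_cued               # first n_cued slots are valid-cue trials
        congruent = (i % 2 == 0)        # alternate to balance congruency
        trials.append({"cued": cued, "congruent": congruent})
    rng.shuffle(trials)                 # randomize presentation order
    return trials

block = make_block(seed=1)
print(sum(t["cued"] for t in block))    # 60 of 80 trials are cued
```

With these defaults, congruent and incongruent trials each make up half of the cued trials and half of the uncued trials, matching the constraint stated in the Design section.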

Statistical Analyses and Results

The proportions of correct responses in the sound motion discrimination task (see Footnote 1) from Experiments 1–3 were each submitted to a repeated-measures analysis of variance (ANOVA), with the variables of cuing (cued, uncued, and neutral in Experiment 1; cued vs. uncued in Experiments 2 and 3) and motion congruency (congruent vs. incongruent). The main effects of cuing in Experiment 1, F(2, 46) = 7.25, in Experiment 2, F(1, 23) = 38.27, and in Experiment 3, F(1, 23) = 33.72, and of congruency in Experiment 1, F(1, 23) = 110.98, in Experiment 2, F(1, 23) = 72.10, and in Experiment 3, F(1, 23) = 53.46, were significant in all three experiments (all ps < .01). Participants responded significantly more accurately on cued trials than on uncued trials and on congruent trials than on incongruent trials overall (see Table 1). Critically, the magnitude of the cross-modal congruency effect (defined as the difference between the accuracy of participants' responses on congruent and incongruent trials) was larger in the uncued condition than in the cued condition, resulting in a significant interaction between cuing and congruency in all three experiments: Experiment 1: F(2, 46) = 5.76, p = .005; Experiment 2: F(1, 23) = 28.73, p < .0001; Experiment 3: F(1, 23) = 19.18, p = .0002 (see Figure 2). There was no significant difference in the magnitude of the congruency effect between the cued and neutral trials in Experiment 1 (t < 1). Subsequent t test analyses revealed that the significant difference in performance between cued and uncued trials was restricted to the incongruent trials in Experiment 1, t(23) = 3.26, p = .003, in Experiment 2, t(23) = 14.12, p < .0001, and in Experiment 3, t(23) = 5.38, p < .0001, whereas there were no significant differences between cued-congruent trials and uncued-congruent trials in any of the first three experiments (all ps > .10; see Table 1). The changes in the magnitude of the congruency effect as a function of cuing were not dependent on the particular location (i.e., upper vs. lower) from which the auditory targets (and visual distractors) were presented: Experiment 1, F < 1; Experiment 2: F(1, 23) = 1.10, p = .30; and Experiment 3: F(1, 23) = 2.65, p = .11.

In order to test whether the magnitude of the cross-modal dynamic capture effect varied between experiments (see Figure 2), we conducted a between-experiments ANOVA, with the variables of experiment (1, 2, and 3), cuing (cued vs. uncued; note that in order to compare the three experiments, we did not include the neutral trials from Experiment 1), and congruency (congruent vs. incongruent). The analysis showed a significant interaction among experiment, cuing, and congruency, F(2, 69) = 3.53, p = .03 (see Figure 2). Crucially, larger differences in the magnitude of the congruency effects between cued and uncued trials were obtained in Experiments 2 and 3 (22% and 20%, respectively) than in Experiment 1 (8.5%; p < .05 in the two comparisons). The analysis also showed a significant main effect of experiment, F(2, 69) = 5.18, p < .01, with participants responding more accurately overall in Experiments 2 and 3 than in Experiment 1 (Ms = 78%, 77%, and 68%, respectively), whereas there was no significant difference between performance in Experiments 2 and 3 (F < 1). This main effect is explained by the significant interaction reported above.

Table 1
Percentage of Correct Responses and Cross-Modal Dynamic Capture Effects as a Function of Cuing and Congruency in Experiments 1–3

Condition         M (% correct)   SE     % congruency effect
Experiment 1
  Cued                                   56
    Congruent     99              0.31
    Incongruent   43              5.67
  Uncued                                 64
    Congruent     98              0.52
    Incongruent   34              5.94
  Neutral                                57
    Congruent     99              0.35
    Incongruent   42              5.92
Experiment 2
  Cued                                   26
    Congruent     98              0.70
    Incongruent   72              4.21
  Uncued                                 48
    Congruent     97              0.88
    Incongruent   49              4.85
Experiment 3
  Cued                                   25
    Congruent     96              1.38
    Incongruent   71              4.83
  Uncued                                 45
    Congruent     94              1.57
    Incongruent   49              5.45

Figure 2. Mean cross-modal dynamic capture effects ([% congruent – % incongruent] + SE, denoted by the error bars) as a function of cuing in Experiments 1–3.

Footnote 1. We also analyzed the z-transformed accuracy data in order to address the concern that the use of direct proportion scores in parametric statistical tests may lead to an overestimation of significance values in certain situations due to the artificially reduced variability found when accuracy scores fall close to 0 or 1. However, in the present study (see also Soto-Faraco, Spence, & Kingstone, 2004), the pattern of statistical results obtained using the transformed data was identical to that derived from the untransformed accuracy data. Therefore, for ease of understanding, the data in the text and figures are described in terms of the percentages of correct responses.
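The z transformation of proportions is conventionally a probit (inverse-normal) transform; a minimal sketch, assuming that transform and a standard half-trial boundary correction (the authors' exact procedure is not stated):

```python
from statistics import NormalDist

def z_transform(p_correct, n_trials):
    """Probit-transform a proportion correct, nudging 0 and 1 off the
    boundary with a 1/(2n) correction (a common convention; the authors'
    exact correction is not stated)."""
    eps = 1.0 / (2 * n_trials)
    p = min(max(p_correct, eps), 1 - eps)
    return NormalDist().inv_cdf(p)

# Proportions near ceiling are stretched out on the z scale, counteracting
# the compressed variance that motivates the transformation.
print(z_transform(0.50, 80))  # -> 0.0
print(z_transform(0.99, 80))
```

Because the transform is monotonic, condition orderings (cued vs. uncued, congruent vs. incongruent) are preserved, consistent with the identical pattern of results reported in the footnote.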


One might argue that the larger modulation of the congruency effect by spatial cuing observed in Experiments 2 and 3 compared with Experiment 1 reflects a masking effect elicited by the presentation of the cue (i.e., the spatially static green light flashes) on each of the visual events making up the visual apparent motion stream (i.e., orange light flashes), rather than a true attentional effect. That is, as a consequence of this potential forward visual masking, the strength of the visual motion stream on cued trials might have been weaker than on uncued trials, thus resulting in a reduced congruency effect of the lights on the sound motion in Experiments 2 and 3. It is important to stress that, by itself, visual masking cannot fully explain the attentional effect reported in our study, given the results of Experiment 1 (in which no peripheral cue was used). Nevertheless, it is still a logical possibility that masking could have contributed to the effects reported in Experiments 2 and 3, given that each of the green cue LEDs was situated just below each of the visual apparent motion orange LEDs (see Figure 1). In order to rule out this potential confound, we conducted a further control experiment (N = 10) using a noninformative peripheral cue (just as in Experiment 3) in which the green cue LEDs were each located in the middle of the outer side of each of the loudspeaker cones, 7 cm away from the center of the loudspeaker (and from the respective orange LED). Thus, in this control experiment, there was no possibility of pure sensory masking, whereas the attentional manipulation was similar to that present in the previous three experiments. The results of this control study mirrored those obtained in Experiments 2 and 3, with a significantly larger congruency effect on uncued than on cued trials (42% and 23%, respectively), F(1, 9) = 8.04, p = .019.
Note also that the magnitude of the congruency effect obtained on the cued trials is comparable with that seen in Experiments 2 and 3. Therefore, it seems more likely that our results reflect a truly spatial attention effect rather than simply visual masking.

We ran a between-experiments ANOVA using the reaction time data from correct response trials in Experiments 1–3, except for the data from 1 participant who was excluded because of not having sufficient correct responses in the uncued-incongruent condition. The ANOVA revealed a significant main effect of cuing, F(1, 68) = 12.90, p < .0001, with participants responding more rapidly on cued trials than on uncued trials overall (854 vs. 1,026 ms, respectively). Participants' responses were also faster on congruent trials than on incongruent trials overall (722 vs. 1,158 ms, respectively), resulting in a significant main effect of congruency, F(1, 68) = 38.90, p < .00001. As was found in the analysis of the accuracy data, there was a significant interaction between cuing and congruency, F(1, 68) = 5.68, p = .01, highlighting the reduction in the cross-modal congruency effect (incongruent minus congruent reaction times) on cued trials with respect to uncued trials (320 vs. 552 ms, respectively). None of the other terms in the analysis reached statistical significance (all ps > .05). It is worth noting that although participants' responses were unspeeded, the reaction time analysis demonstrates that the larger congruency effects obtained on uncued trials (compared with any of the other conditions) were not caused by any speed–accuracy trade-off in the data.

Attentional Benefit or Attentional Cost?

Experiment 1 revealed no differences between the neutral and cued conditions in terms of the modulation of the cross-modal dynamic capture effect. This result might lead one to conclude that the spatial attention modulation obtained in Experiment 1 (and in Experiments 2 and 3) reflected only attentional costs, and not attentional benefits. That is, our results might not reflect the beneficial effect of attending to a spatial location on performance in a cross-modal task, but rather the cost of attracting attention away from the location in which the cross-modal stimuli were presented. However, as noted by some authors (e.g., Jonides & Mack, 1984), the interpretation of the results of the neutral condition in many experimental designs is controversial. For instance, in our case, it is possible that during the neutral blocks, participants successfully divided their attention between the upper and lower locations, giving rise to the same modulation of the cross-modal dynamic capture effect as that obtained on the cued trials of the two remaining blocks of trials. In Experiment 1, presenting the neutral condition in a block of trials (not intermingled with cued and uncued trials) may have enabled participants to develop a successful strategy of attending to both the upper and lower locations. Moreover, endogenous attention was manipulated by varying the proportion of cued and uncued trials and by informing participants of this manipulation in advance. It is possible, then, that mixing neutral, cued, and uncued trials on a trial-by-trial basis would discourage this kind of strategy and reveal both significant attentional costs and significant benefits. In order to address this question, we used a predictive peripheral cue (Experiment 4), replicating Experiment 2, and a nonpredictive peripheral cue (Experiment 5), replicating Experiment 3, but this time we included a neutral condition. The neutral condition consisted of two green LEDs flashing simultaneously from the center of the setup (one green LED attached to either side; see Figure 1), presented intermingled with the cued and uncued trials.

Experiment 4 (Informative Peripheral Cue With Neutral Condition)

Twenty-four participants (17 women, age range: 19–33 years, M = 24 years) completed two blocks of 100 trials, of which 60 were cued trials (75% cue validity), 20 were neutral trials, and 20 were uncued trials. Participants were informed about this contingency in advance. The ANOVA performed on the accuracy data from Experiment 4 revealed a significant interaction between cuing and congruency, F(2, 46) = 19.63, p < .0001 (see Figure 3). Subsequent t tests showed that participants performed more accurately on incongruent trials in the cued condition (M = 86% correct) than in the neutral condition (M = 75% correct), t(23) = 3.65, p = .001, and that performance on incongruent trials in the neutral condition was significantly better than in the uncued condition (M = 65% correct), t(23) = 3.56, p = .001. There were no significant differences in performance on congruent trials between the cued and neutral conditions, between the neutral and uncued conditions, or between the cued and uncued conditions (all ps > .05). Just as in Experiment 2, the analysis of the data from Experiment 4 also revealed significant main effects of congruency, F(1, 23) = 21.51, p < .001, and cuing, F(2, 46) = 22.83, p < .0001.
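The t tests above are paired-samples comparisons across the 24 participants. As an illustrative sketch only (the per-participant percentages below are invented, not the published data), the underlying computation is:

```python
# Illustrative paired-samples t statistic (df = n - 1), of the kind used to
# compare incongruent-trial accuracy between two cuing conditions.
# The per-participant percentages below are hypothetical.
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev = sample SD (n - 1)
    return t, n - 1

cued_acc   = [88, 90, 85, 92, 80, 87]  # hypothetical % correct, cued incongruent
uncued_acc = [70, 75, 68, 80, 66, 72]  # hypothetical % correct, uncued incongruent

t, df = paired_t(cued_acc, uncued_acc)
print(round(t, 2), df)
```

With real data one would also obtain the p value from the t distribution; in practice a call such as `scipy.stats.ttest_rel` returns both in one step.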

Figure 3. Mean cross-modal dynamic capture effects ([% congruent – % incongruent] + SE, denoted by the error bars) as a function of cuing (cued, neutral, uncued) in Experiments 4 and 5.

Experiment 5 (Uninformative Peripheral Cue With Neutral Condition)

Twenty-four participants (11 women, age range: 18–33 years, M = 24 years) completed a block of 120 trials, of which 40 were cued, 40 were neutral, and 40 were uncued trials. The participants were told that the peripheral cue was not informative with regard to either the position of the subsequent audiovisual stream or the relative congruency between the two unimodal streams, because the cue and apparent motion stimuli were just as likely to come from the same as from opposite elevations. The ANOVA performed on the accuracy data from Experiment 5 revealed the same pattern of results as in Experiment 4. Participants responded more accurately on incongruent trials in the cued condition than in the neutral condition (M = 88% vs. 81% correct, respectively), t(23) = 2.46, p = .02, and more accurately in the neutral condition than in the uncued condition (M = 73% correct), t(23) = 2.81, p = .009. There was also a significant interaction between congruency and cuing, F(2, 46) = 8.19, p = .001 (see Figure 3). As in Experiment 4, there were no differences in performance on congruent trials among the cued, neutral, and uncued conditions (all ps > .05). The main effects of congruency and cuing were also significant, F(1, 23) = 23.69, p < .0001, and F(2, 46) = 13.72, p < .0001, respectively, thus replicating the overall pattern of results reported in Experiment 3.

The results of Experiments 4 and 5 confirm the existence of both attentional benefits and costs as a result of the attentional manipulations that we introduced. One might argue that these positive results, together with the null effect (no difference between performance on the cued and neutral trials) reported in Experiment 1, reflect a dissociation between the effects of endogenous and exogenous spatial attention on cross-modal perception. However, as noted above, it is possible that blocking the neutral condition in Experiment 1 may have allowed participants to divide their attention more successfully (see Spence & Driver, 1996, on this point). This account is supported by the parallel between the results of Experiments 2 and 3 and those of Experiments 4 and 5, in which neutral trials were absent or intermingled with the cued and uncued trials, respectively. We discuss the qualitative and quantitative differences found between the endogenous and exogenous manipulations further in the General Discussion.

General Discussion

The present study provides the first empirical demonstration of a behavioral modulation of audiovisual interactions as a function of the locus of covert spatial attention. Crucially, the influence of visual apparent motion on the perceived direction of auditory apparent motion (as measured with the cross-modal dynamic capture task; cf. Soto-Faraco, Spence, & Kingstone, 2004) was significantly reduced on cued trials compared with uncued trials. This result challenges the oft-made claim that cross-modal interactions are necessarily preattentive in nature (see Bertelson & de Gelder, 2004, for a recent review). Consistent with the hypothesis outlined in the introduction, it would appear that in Experiments 1–5, spatial attention helped participants to segregate the auditory target information from the visual distractors more effectively, partially preventing the illusory conjunction of features (cf. Sanabria et al., 2005). This idea is supported by the reduced effect of incongruent visual information on auditory directional discrimination on cued (i.e., valid) trials compared with uncued (i.e., invalid) trials. As in other recent studies (e.g., Reynolds & Chelazzi, 2004), the results of the experiments reported here highlight the important role that spatial attention plays in perceptual organization and go further by generalizing this notion to the case of cross-modal perceptual processing.

Binding or Segregation?

Although a sensory conflict is present on incongruent trials in the cross-modal dynamic capture task, congruent trials may be assumed to provide redundant sensory information from the same object. If spatial attention helps to bind the different features of an object (cf. Treisman & Gelade, 1980), one would therefore have expected an improvement in performance on congruent trials in the cued condition compared with the uncued condition in the present experiments. Note, however, that performance on congruent trials was at ceiling in all of the experimental conditions (see Table 1; although accuracy was numerically higher on congruent-cued trials than on congruent-uncued trials in all three experiments). This would have made any change in accuracy on congruent trials between the cued and uncued conditions difficult to detect. Lowering performance on congruent trials should therefore allow one to assess the prediction of enhanced cross-modal binding on congruent trials at cued locations with respect to uncued locations. This prediction was tested in a final control experiment in which we replicated the design of Experiment 3 (with a nonpredictive peripheral cue) but presented background white noise throughout the experimental session in order to bring participants' sound direction discrimination performance off ceiling (see Sanabria et al., 2007, for a similar methodology). The results were surprising: Although a significant difference was again obtained between cued and uncued incongruent trials (39% vs. 32% correct, respectively), t(25) = 2.14, p = .02, replicating the results of Experiments 1–5, there was no difference whatsoever in performance between cued and uncued trials in the congruent condition (82% correct in both conditions). Thus, although presenting background white noise successfully brought participants' accuracy off ceiling on congruent trials (both cued and uncued), we failed to find any benefit of spatial attention on performance on those trials.


Therefore, although spatial attention seems to effectively reduce illusory cross-modal grouping in sensory conflict (i.e., incongruent) situations (at least in the cross-modal dynamic capture paradigm used here), it does not seem to produce an enhancement in the binding of congruent information. One tentative account for this particular set of results may be that although the sensory binding of congruent information occurs preattentively, attention can nevertheless still help to resolve situations in which there is sensory conflict.

The outcome of a recent study questions the latter argument and contrasts to some degree with the results of our final control experiment (cf. Driver, 1996). Using event-related potentials, Talsma and Woldorff (2005; see also Senkowski, Talsma, Herrmann, & Woldorff, 2005) demonstrated that the degree of audiovisual integration under congruent spatiotemporal conditions can be modulated by the locus of endogenous spatial attention. In Talsma and Woldorff's study, participants were presented with auditory, visual, and audiovisual stimuli (always spatiotemporally congruent) in either the right or left hemifield, and they were instructed to respond to infrequent target stimuli. Their behavioral data revealed that participants responded more rapidly to audiovisual stimuli than to either visual or auditory stimuli, replicating a number of previous studies (e.g., Gondan, Niederhaus, Rösler, & Röder, 2005). The crucial result to emerge from Talsma and Woldorff's study was a modulation of different event-related potential components, ranging from early potentials (100 ms after stimulus onset) to late potentials (500 ms after stimulus onset), for attended versus unattended audiovisual stimuli.
These results support the idea that endogenous spatial attention can modulate the cross-modal integration of auditory and visual spatiotemporally congruent stimuli at different levels of information processing (from perceptual through more decisional/response-related levels of stimulus processing). Of note, however, the analysis of the behavioral data in Talsma and Woldorff's study revealed only a significant main effect of target modality (i.e., faster reaction times in response to multisensory stimuli than in response to either unimodal visual or auditory stimuli) but no interaction between target modality and attentional condition, a result very similar to what was found in the present study (for the congruent condition, which was the only situation tested by Talsma & Woldorff, 2005). Before making any specific claims regarding the apparent inconsistency between Talsma and Woldorff's (2005) event-related potentials results and the outcome of our final control experiment, we should note that in Talsma and Woldorff's study, endogenous spatial attention was manipulated and only static audiovisual stimuli were presented. By contrast, in our control experiment, we manipulated exogenous attention and presented audiovisual stimuli that gave rise to the perception of apparent motion. Therefore, it remains possible that a dissociation between endogenous and exogenous spatial attention exists in the processing of congruent cross-modal information, or that these attentional effects depend on the type of stimuli that participants have to respond to. Moreover, the modulation of cross-modal integration reported in Talsma and Woldorff's study was obtained only in the event-related potentials data, not at a behavioral level. It would therefore be interesting to conduct a full behavioral version of their experiment in order to reach more definitive conclusions on this issue.

The results of one recent investigation (Alsius et al., 2005) also appear to be inconsistent with one of our main conclusions. In Alsius et al.'s (2005) study, increasing the attentional demands of the task (using a dual-task paradigm) reduced the magnitude of the effect of lip movements on auditory speech perception (the well-known McGurk effect; McGurk & MacDonald, 1976). In further control experiments, the authors showed that neither auditory speech discrimination nor the lip-reading task (when presented in isolation) was affected by the concurrent dual task. The authors concluded that although both the visual and auditory inputs were unaltered by increasing the demands of the task, audiovisual interactions were somewhat impaired under conditions of high attentional load. This result would appear to be inconsistent with the conclusion outlined here (i.e., that spatial attention can partially inhibit illusory cross-modal binding). Note, however, that the attentional manipulation adopted by Alsius and colleagues is quite different from the one used in the present study. Whereas they investigated the effect of attentional load, here we manipulated spatial attention (cf. Spence et al., 2001; Vibell, Klinge, Zampini, Spence, & Nobre, 2007, on this issue). Moreover, and perhaps more important, in the case of the McGurk illusion, the visual and auditory information are not totally contradictory, as they share certain features. By contrast, in incongruent situations in the cross-modal dynamic capture task, the auditory and visual inputs provide totally inconsistent information regarding the task-relevant perceptual dimension (direction of motion). Taken together, these differences may help to explain the apparent contradiction between our results and those of Alsius et al. In any case, the present study, as well as the studies of Talsma and Woldorff (2005) and Alsius et al., opens up an interesting line for future research that should lead to a better understanding of the role that attention may play in cross-modal integration and multisensory interactions.

Locus of Modulation

An alternative explanation for why audiovisual integration was reduced at the locus of spatial attention in the present study is based on the beneficial effect of attention on the unimodal processing of auditory stimuli. Orienting attention to a location in space has been shown to enhance the perception of auditory stimuli appearing at the cued location (e.g., Spence & Driver, 1994; Tata, Prime, McDonald, & Ward, 2001; Tata & Ward, 2005). Thus, in the present study, spatial attention may have improved auditory spatial processing, thereby reducing any effect of the distracting visual motion information by increasing the reliability of the target auditory information (cf. Battaglia, Jacobs, & Aslin, 2003; Ernst & Bülthoff, 2004; Welch & Warren, 1980). Because spatial attention was allocated to a region of space in which both the auditory and visual signals were presented, one might argue that the visual signal (and not only the auditory signal) may also have been enhanced by spatial attention (cf. Carrasco, Williams, & Yeshurun, 2002; Spence & Driver, 1996), and, as a consequence, we should not have found any effect of spatial attention on the audiovisual interaction. However, because of the salient nature of the visual stimuli, the perception of visual direction was not at all ambiguous (cf. Soto-Faraco, Spence, & Kingstone, 2004). Therefore, any enhancement of stimulus processing may only have been detectable in the auditory modality (note that perceptual sensitivity to the direction of auditory apparent motion
can be reduced in the presence of visual apparent motion; e.g., Soto-Faraco, Spence, & Kingstone, 2005). The results of the control experiment described earlier in the General Discussion, in which the level of auditory performance was brought off ceiling by the introduction of background white noise, would appear to rule out this explanation. Given that spatial attention did not improve performance on congruent trials, it seems that, at least in the present paradigm, spatial attention did not affect auditory or visual processing in isolation; rather, it affected the interaction between the two inputs.

The reductions in the magnitude of the cross-modal dynamic capture effect were larger in Experiments 2–5 (in which a peripheral cue was used) than in Experiment 1 (in which attention was oriented between blocks of trials without the use of a peripheral cue). According to Klein and Shore (2000; see also Briand, 1998; Briand & Klein, 1987), only peripheral (and not central) cues influence the perceptual binding of sensory information (though see also Prinzmetal, McCool, & Park, 2005; Prinzmetal, Park, & Garrett, 2005). In line with this idea, it is possible that our hypothesis, based on the effect of attention on perceptual sensory conflict, applies only to Experiments 2–5, in which a peripheral cue was used. In fact, the reduction in the magnitude of the congruency effect on cued trials with respect to that seen on uncued trials did not differ significantly between Experiments 2 and 3, regardless of the predictability of the cue with respect to the target location (for similar examples, see Briand, 1998; Briand & Klein, 1987).
On the other hand, the reduction of the cross-modal congruency effect reported in Experiment 1 could be accounted for by a sensory enhancement (weak in comparison with the effect produced by the peripheral cue) produced by endogenous attention (e.g., Driver & Frith, 2000; Prinzmetal, McCool, & Park, 2005; Spence et al., 2001; Tata & Ward, 2005), or even by a modulation of audiovisual interactions solely at the response-selection stage of information processing (cf. Briand & Klein, 1987; see below for further discussion of this issue). The results of Experiments 4 and 5 suggest that attentional benefits in our paradigm are found only when a peripheral cue is used. Moreover, the overall level of performance on incongruent trials was less accurate in Experiment 1 than in any of the other experiments presented here. Taken together, these results support the existence of a dissociation between endogenous and exogenous spatial attention in our study. However, as noted before, participants in Experiment 1 could have successfully divided their spatial attention during the blocked neutral trials, in contrast to the trial-by-trial cuing manipulation used in the remaining experiments of this study. The change in overall performance on incongruent trials between the experiments could then simply be due to between-participants variability. Thus, although our data suggest a dissociation between endogenous and exogenous attention in the modulation of cross-modal perception, we believe that, at present, any hypothesis regarding the specific role played by endogenous and exogenous (i.e., peripherally cued) spatial attention in the audiovisual interactions described here must remain speculative.

Levels of Processing

The level of processing at which visual information affects auditory dynamic processing has been a matter of much debate over the past few years. Although the results of recent investigations suggest that visual distractor information about motion direction can affect auditory perceptual processing prior to response selection and execution (Sanabria et al., 2007; Soto-Faraco et al., 2005; Vroomen & de Gelder, 2000), it has also been shown that visual motion distractors can affect the perception of auditory motion information at a decisional (or response-selection) level as well (e.g., see Sanabria et al., 2007; Soto-Faraco et al., 2005; Wuerger, Hofbauer, & Meyer, 2003). This debate is relevant to the present study, given that one might argue that the effects reported here reflect an attentional modulation of cross-modal interactions at the level of response selection and execution rather than an effect of attention at the level of perceptual processing prior to the response stage (for instance, Talsma & Woldorff, 2005, reported electrophysiological evidence demonstrating that attentional modulations can take place at both early and late stages of information processing; see also Senkowski et al., 2005). In fact, in Experiments 1 and 2, our participants were able to predict (on the basis of the experimental instructions) the likely location at which the audiovisual stimuli would be presented. As they were instructed to ignore the moving visual distractor and respond only to the direction of the auditory target, one might argue that they may have set a stricter response criterion on expected (cued) than on unexpected (uncued) trials, thus resulting in better performance. That is, participants' responses to the auditory target would have been less affected by the direction of the visual distractor stream on cued than on uncued trials. However, although this explanation can account for the results obtained in Experiments 1 and 2, it does not hold for Experiments 3 and 5, in which a nonpredictive peripheral cue was used (i.e., participants could not use any prior information to set their response criterion, as the cue was nonpredictive).
In terms of conscious strategies, one could equally argue that participants may have set themselves a stricter criterion for responding on the more infrequent and unexpected trials (rather than on the expected trials), which would have resulted in the reverse pattern of results (i.e., a smaller cross-modal dynamic capture effect on uncued than on cued trials). Therefore, given that a similar general pattern of results was obtained in all of the experiments (i.e., reduced congruency effects on cued compared with uncued trials), even in those experiments in which a nonpredictive peripheral cue was used, we believe that both endogenous and exogenous spatial attention modulated the interaction between auditory and visual apparent motion information principally at a perceptual level of information processing (although, of course, this does not totally exclude the possibility of a modulation at the response stage as well). Beyond the fact that an explanation of our results based solely on conscious strategies (instead of attentional effects operating at a perceptual level of information processing) does not seem to hold here, the use of an orthogonal cuing paradigm should have reduced the likelihood of changes in response criterion taking place on the basis of stimulus–response compatibility effects.

The pattern of data reported in the present study stands in sharp contrast to the results of previous attempts to modulate cross-modal interactions by means of the orienting of spatial attention. Indeed, Bertelson and colleagues (Bertelson et al., 2000; Vroomen, Bertelson, & de Gelder, 2001b) have repeatedly failed to show any effect of either endogenous or exogenous attention on the ventriloquism effect. Methodological factors may, however, help to account for the apparent inconsistency between previous studies and
the results observed here. In Experiments 1, 2, and 4, endogenous attention was oriented to a region in space prior to the appearance of the audiovisual stimuli, whereas in Bertelson et al.'s (2000) study, it is unclear whether attention was oriented to a particular location (as the authors suggested) or to a particular object. It is therefore possible that only location-based spatial attention, and not object-based spatial attention, modulates cross-modal interactions (cf. Vecera & Farah, 1994). One might still argue that the attentional manipulation used in Experiments 1, 2, and 4 may have helped observers to focus their attention on the target auditory object (and not on a particular region of space), as they had prior knowledge regarding the target location. However, this explanation cannot account for the results of Experiments 3 and 5, in which a nonpredictive peripheral cue was used instead.

A peripheral cue was used in Experiments 2–5 to attract spatial attention exogenously, whereas Vroomen et al. (2001b) used singletons to capture attention. Given that onsets have been shown to be more effective than singletons in capturing exogenous attention (e.g., see Ruz & Lupiáñez, 2002), it is possible that attentional capture was more effective in the experiments described here than in Vroomen et al.'s (2001b) study, thus enabling a modulation of cross-modal interactions that would otherwise go unnoticed.

In conclusion, the results of the present study suggest that cross-modal interactions can be modulated by spatial attention (see Alsius et al., 2005, for a similar argument regarding manipulations of attentional load). This conclusion has both theoretical and methodological implications and contradicts previous claims that cross-modal binding is necessarily preattentive in nature (e.g., Bertelson & de Gelder, 2004).

References Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high attention demands. Current Biology, 15, 839 – 843. Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America A, 20, 1391–1397. Bedford, F. L. (1999). Keeping perception accurate. Trends in Cognitive Sciences, 2, 4 –11. Bedford, F. L. (2004). Analysis of a constraint on perception, cognition, and development: One object, one place, one time. Journal of Experimental Psychology: Human Perception and Performance, 30, 907–912. Beer, A. L., & Ro¨der, B. (2004). Attention to motion enhances processing of both visual and auditory stimuli: An event-related potential study. Cognitive Brain Research, 18, 205–225. Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review, 5, 482– 489. Bertelson, P., & de Gelder, B. (2004). The psychology of multimodal perception. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 141–179). Oxford, England: Oxford University Press. Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics, 62, 321–332. Briand, K. A. (1998). Feature integration and spatial attention: More evidence of a dissociation between endogenous and exogenous orienting. Journal of Experimental Psychology: Human Perception and Performance, 24, 1243–1256. Briand, K. A., & Klein, R. M. (1987). Is Posner’s “beam” the same as Treisman’s “glue”? On the relation between visual orienting and feature

integration theory. Journal of Experimental Psychology: Human Perception and Performance, 13, 228 –241. Caclin, A., Soto-Faraco, S., Kingstone, A., & Spence, C. (2002). Tactile “capture” of audition. Perception & Psychophysics, 64, 616 – 630. Calvert, G. A., Spence, C., & Stein, B. E. (Eds.). (2004). The handbook of multisensory processes. Cambridge, MA: MIT Press. Carrasco, M., Williams, P. E., & Yeshurun, Y. (2002). Covert attention increases spatial resolution with or without masks: Support for signal enhancement. Journal of Vision, 2, 467– 479. Colby, C., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Review of Neuroscience, 22, 319 –349. Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215. Craig, J. C. (2006). Visual motion interferes with tactile motion perception. Perception, 35, 351–367. Driver, J. (1996, May 2). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66 – 68. Driver, J., & Frith, C. (2000). Shifting baselines in attention research. Nature Reviews Neuroscience, 1, 147–148. Driver, J., & Spence, C. (2004). Crossmodal spatial attention: Evidence from human performance. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 179 –220). Oxford, England: Oxford University Press. Ernst, M. O., & Bu¨lthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8, 162–169. Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin. Gondan, M., Niederhaus, B., Ro¨sler, F., & Ro¨der, B. (2005). Multisensory processing in the redundant-target effect: A behavioral and event-related potential study. Perception & Psychophysics, 67, 713–726. Hawkins, H. L., Hillyard, S. A., Luck, S. J., Mouloua, M., Downing, C. J., & Woodward, D. P. (1990). Visual attention modulates signal detectability. 
Journal of Experimental Psychology: Human Perception and Performance, 16, 802–811.
Jonides, J., & Mack, R. (1984). On the cost and benefit of cost and benefit. Psychological Bulletin, 96, 29–44.
Klein, R. M., & Shore, D. I. (2000). Relations among modes of visual orienting. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII (pp. 195–208). Cambridge, MA: MIT Press.
Lyons, G., Sanabria, D., Vatakis, A., & Spence, C. (2006). The modulation of crossmodal integration by unimodal perceptual grouping is unaffected by posture change: A visuo-tactile apparent motion study. Experimental Brain Research, 174, 510–516.
Mazza, V., Turatto, M., Rossi, M., & Umiltà, C. (2007). How automatic are audiovisual links in exogenous spatial attention? Neuropsychologia, 45, 514–522.
McGurk, H., & MacDonald, J. (1976, December 23–30). Hearing lips and seeing voices. Nature, 264, 746–748.
Mondor, T. A., & Amirault, K. J. (1998). Effect of same- and different-modality spatial cues on auditory and visual target identification. Journal of Experimental Psychology: Human Perception and Performance, 24, 745–755.
Morein-Zamir, S., Soto-Faraco, S., & Kingstone, A. (2003). Auditory capture of vision: Examining temporal ventriloquism. Cognitive Brain Research, 17, 154–163.
Müller, H. J., & Rabbitt, P. M. A. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315–330.
Paul, L., & Schyns, P. G. (2003). Attention enhances feature integration. Vision Research, 43, 1793–1798.
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160–174.
Prinzmetal, W., McCool, C., & Park, S. (2005). Attention: Reaction time and accuracy reveal different mechanisms. Journal of Experimental Psychology: General, 134, 73–92.
Prinzmetal, W., Park, S., & Garrett, R. (2005). Involuntary attention and identification accuracy. Perception & Psychophysics, 67, 1344–1353.
Quinlan, P. T. (2003). Visual feature integration theory: Past, present, and future. Psychological Bulletin, 129, 643–673.
Reynolds, J. H., & Chelazzi, L. (2004). Attentional modulation of visual processing. Annual Review of Neuroscience, 27, 611–647.
Reynolds, J. H., & Desimone, R. (1999). The role of neural mechanisms of attention in solving the binding problem. Neuron, 24, 19–29.
Ruz, M., & Lupiáñez, J. (2002). A review of attentional capture: On its automaticity and sensitivity to endogenous control. Psicológica, 23, 283–309.
Sanabria, D., Soto-Faraco, S., Chan, J., & Spence, C. (2005). Intramodal perceptual grouping modulates multisensory integration: Evidence from the crossmodal dynamic capture task. Neuroscience Letters, 377, 59–64.
Sanabria, D., Spence, C., & Soto-Faraco, S. (2007). Perceptual and decisional contributions to audiovisual interactions in the perception of motion: A signal detection study. Cognition, 102, 299–310.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002a). E-Prime reference guide [Computer software manual]. Pittsburgh, PA: Psychology Software Tools.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002b). E-Prime user's guide [Computer software manual]. Pittsburgh, PA: Psychology Software Tools.
Sengpiel, F., & Hübener, M. (1999). Spotlight on the primary visual cortex. Current Biology, 9, 318–321.
Senkowski, D., Talsma, D., Herrmann, C. S., & Woldorff, M. G. (2005). Multisensory processing and oscillatory gamma responses: Effects of spatial selective attention. Experimental Brain Research, 166, 411–426.
Soto-Faraco, S., & Kingstone, A. (2004). Multisensory integration of dynamic information. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processes (pp. 49–69). Cambridge, MA: MIT Press.
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., & Kingstone, A. (2002). The ventriloquist in motion: Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research, 14, 139–146.
Soto-Faraco, S., Ronald, A., & Spence, C. (2004). Tactile selective attention and body posture: Assessing the contribution of vision and proprioception. Perception & Psychophysics, 66, 1077–1094.
Soto-Faraco, S., Spence, C., & Kingstone, A. (2004). Cross-modal dynamic capture: Congruency effects in the perception of motion across sensory modalities. Journal of Experimental Psychology: Human Perception and Performance, 30, 330–345.
Soto-Faraco, S., Spence, C., & Kingstone, A. (2005). Assessing automaticity in the audio-visual integration of motion. Acta Psychologica, 118, 71–92.
Spence, C., & Driver, J. (1996). Audiovisual links in endogenous covert spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 22, 1005–1030.
Spence, C., & Driver, J. (1997). Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics, 59, 1–22.
Spence, C., & Driver, J. (Eds.). (2004). Crossmodal space and crossmodal attention. Oxford, England: Oxford University Press.
Spence, C. J., & Driver, J. (1994). Covert spatial orienting in audition: Exogenous and endogenous mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 20, 555–574.


Spence, C., McDonald, J., & Driver, J. (2004). Exogenous spatial-cuing studies of human crossmodal attention and multisensory integration. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 247–277). Oxford, England: Oxford University Press.
Spence, C., Shore, D. I., & Klein, R. M. (2001). Multisensory prior entry. Journal of Experimental Psychology: General, 130, 799–832.
Strybel, T. Z., & Vatakis, A. (2005). A comparison of auditory and visual apparent motion presented individually and with crossmodal moving distractors. Perception, 33, 1033–1048.
Talsma, D., & Woldorff, M. G. (2005). Selective attention and multisensory integration: Multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience, 17, 1098–1114.
Tata, M. S., Prime, D. J., McDonald, J. J., & Ward, L. M. (2001). Transient spatial attention modulates distinct components of the auditory ERP. NeuroReport, 12, 3679–3682.
Tata, M. S., & Ward, L. M. (2005). Spatial attention modulates activity in a posterior "where" auditory pathway. Neuropsychologia, 43, 509–516.
Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194–214.
Treisman, A. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171–178.
Treisman, A., & Davies, A. (1973). Divided attention to ear and eye. In S. Kornblum (Ed.), Attention and performance (Vol. 4, pp. 101–117). New York: Academic Press.
Treisman, A., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Treisman, A., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107–141.
Treue, S. (2004). Perceptual enhancement of contrast by attention. Trends in Cognitive Sciences, 8, 435–437.
Vecera, S. P., & Farah, M. J. (1994). Does visual attention select objects or locations? Journal of Experimental Psychology: General, 123, 146–160.
Vibell, J., Klinge, C., Zampini, M., Spence, C., & Nobre, A. C. (2007). Temporal order is coded temporally in the brain: Early ERP latency shifts underlying prior entry in a crossmodal temporal order judgment task. Journal of Cognitive Neuroscience, 19, 109–120.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001a). Directing spatial attention towards the illusory location of a ventriloquized sound. Acta Psychologica, 108, 21–33.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001b). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception & Psychophysics, 63, 651–659.
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583–1590.
Vroomen, J., & de Gelder, B. (2004). Temporal ventriloquism: Sound modulates the flash-lag effect. Journal of Experimental Psychology: Human Perception and Performance, 30, 513–518.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.
Wolfe, J. M. (2003). Moving towards solutions to some enduring controversies in visual search. Trends in Cognitive Sciences, 7, 70–76.
Wuerger, S. M., Hofbauer, M., & Meyer, G. F. (2003). The integration of auditory and visual motion signals at threshold. Perception & Psychophysics, 65, 1188–1196.

Received January 23, 2006
Revision received August 15, 2006
Accepted August 21, 2006