NEUROREPORT

AUDITORY AND VESTIBULAR SYSTEMS

Exploring the role of visual perceptual grouping on the audiovisual integration of motion

Daniel Sanabria,1,CA Salvador Soto-Faraco1,2 and Charles Spence1

1 Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD, UK; 2 Parc Científic de Barcelona, Universitat de Barcelona, Spain

CA Corresponding Author: [email protected]

Received 23 July 2004; accepted 18 October 2004

In the present study, we explored the role of visual perceptual grouping on audiovisual motion integration, using an adaptation of the crossmodal dynamic capture task developed by Soto-Faraco et al. The principles of perceptual grouping were used to vary the perceived direction (horizontal vs vertical) and extent of apparent motion within the visual modality. When the critical visual stimuli, giving rise to horizontal local motion, were embedded within a larger array of lights, giving rise to the perception of global motion vertically, the influence of visual motion information on the perception of auditory apparent motion (moving horizontally) was reduced significantly. These results highlight the need to consider intramodal perceptual grouping when investigating crossmodal perceptual grouping. NeuroReport 15:2745–2749 © 2004 Lippincott Williams & Wilkins.

Key words: Auditory; Motion perception; Multisensory integration; Perceptual grouping; Visual

INTRODUCTION

The traditional approach to the study of perception involved examining each of the sensory modalities in isolation. By contrast, current research has more often considered perception as a consequence of multisensory integration [1]. The study of motion processing is no exception, and there is now a wealth of literature demonstrating that the perceptual system benefits from convergent motion information available to different sensory modalities. Neuroimaging studies have highlighted the multisensory convergence of motion information in a number of different brain areas in humans. For example, using fMRI, researchers have identified a number of sites of activation common to motion in the visual, auditory and tactile modalities [2,3]. Other studies have reported the activation of what have traditionally been thought of as unimodal auditory areas during different types of unimodal visual motion [4]. Consistent with these neurophysiological data, recent behavioral studies have documented the existence of robust crossmodal links in motion perception [5].

However, despite the growing evidence for crossmodal interactions in the perception of audiovisual motion, little is known about the influence of processes occurring at the level of unimodal perceptual grouping; for instance, it is unclear to what extent changes in the intramodal grouping of visual stimuli modulate the crossmodal grouping of audiovisual stimuli [5,6]. In the present experiment, we explored the role of intramodal perceptual grouping on multisensory integration [6]. Motion integration in humans was examined using an adaptation of Soto-Faraco et al.'s crossmodal dynamic capture task [7]. To date, the crossmodal dynamic capture task has most frequently involved participants responding to the direction of an auditory apparent motion stream (elicited by the sequential presentation of two sounds) while trying to ignore a synchronously-presented visual apparent motion stream (elicited by the sequential presentation of two visual stimuli). Typically, an impairment in performance is observed on directionally incongruent trials (i.e., when the visual and auditory motion streams move in opposite directions) as compared to directionally congruent trials (i.e., when the visual and auditory motion streams move in the same direction).

In order to explore the influence of visual perceptual grouping on audiovisual integration, we manipulated the number and spatial distribution of the visual stimuli presented. In the crucial experimental condition, we presented a visual apparent motion stream with two potential perceptual interpretations that had orthogonal directions: one consisting of two lights moving in the horizontal plane, and the other consisting of four lights moving vertically. According to the classic grouping principles, we expected the local horizontal visual motion to be subsumed within the larger global vertical visual stream [8]. Because of the different directions of the visual (vertical) and auditory (horizontal) motion streams, we therefore expected to find a reduced effect of the visual motion information on the perception of the direction of the auditory motion in the critical experimental condition (i.e., when more visual stimuli were presented) compared to the typical 2 lights–2 sounds condition. We therefore hoped to demonstrate a modulation of audiovisual motion integration as a consequence of changes in perceptual grouping taking place within vision [6,9,10].
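The logic of the congruency manipulation can be made concrete with a short sketch. This is our illustration, not code from the original study, and all names are hypothetical: it simply enumerates how crossing the directions of the two streams yields the congruent and incongruent trial types.

```python
# Hypothetical sketch of the congruency factor in the crossmodal dynamic
# capture task: congruency depends on whether the auditory stream and the
# (local) visual stream move in the same direction.
from itertools import product

DIRECTIONS = ("left", "right")

trial_types = []
for auditory_dir, visual_dir in product(DIRECTIONS, DIRECTIONS):
    congruency = "congruent" if auditory_dir == visual_dir else "incongruent"
    trial_types.append(
        {"auditory": auditory_dir, "visual": visual_dir, "congruency": congruency}
    )

for t in trial_types:
    print(t)
# Typical finding: accuracy in reporting the auditory direction is lower on
# incongruent than on congruent trials (crossmodal dynamic capture).
```

In the present experiment, these four audiovisual pairings were further crossed with the number of visual stimuli (2 vs 4 lights), as described in the Methods below.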


MATERIALS AND METHODS

Participants: Eighteen participants (age range 21–30 years; mean 27 years) took part in this experiment. All reported normal hearing and normal or corrected-to-normal vision, and all received a £5 gift voucher in exchange for their participation. The experiment took approximately 30 min to complete, and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki. All of the participants gave their informed consent prior to their inclusion in the study.

Materials: Two loudspeakers were positioned in a row (30 cm separation, centre-to-centre) at ear level, 90 cm in front of the seated participants. An orange LED was placed directly above each loudspeaker cone. A third orange LED was placed 30 cm above the left loudspeaker, and a fourth 30 cm below the right loudspeaker (Fig. 1). A constantly-illuminated red LED, situated midway between the two loudspeakers, served as the central fixation point. Participants sat in complete darkness, in front of the fixation point, holding a response keypad. The auditory apparent motion stimuli consisted of two 100 ms white noise bursts (60 dB(A), as measured from the participant's head position), one burst presented from each of the two loudspeakers, separated by an inter-stimulus interval (ISI) of 50 ms. The visual apparent motion stimuli consisted of a sequence of 100 ms light flashes presented from different LEDs, each separated by an ISI of 50 ms. In the 2 lights condition, the visual apparent motion stream consisted of the sequential activation of the two central LEDs, in time with the two sound bursts (Fig. 1). In the 4 lights condition, each visual onset consisted of the simultaneous activation of two LEDs (100 ms duration). The top-left LED was always flashed in time with the middle-right LED, while the bottom-right LED was always flashed with the middle-left LED. When the middle LEDs flashed from right-to-left in the 4 lights condition, it appeared as if two groups of two lights (one on the left and the other on the right) moved from top-to-bottom. When the central LEDs flashed from left-to-right, it appeared as if the two groups of two lights moved from bottom-to-top instead. Note that while the global motion of the 4 lights display was either upward or downward, the local motion of the two middle lights (when considered in isolation) was still either leftward or rightward, just as in the 2 lights condition.

Fig. 1. Four of the trial types presented in the present experiment, resulting from the crossing of the number of visual stimuli (2 vs 4 lights) and congruency (incongruent vs congruent) factors. There were also four more trial types (not shown) in which the stimuli moved in the opposite direction (i.e., with the auditory apparent motion moving from left to right). Legend: T1−T2 = time; active LED(s) at T1; active LED(s) at T2; inactive LED; auditory stimulus; direction and path of local visual, global visual, and auditory apparent motion.

Procedure: Every trial contained an auditory apparent motion stream moving to either the left or the right, and either a 2 or 4 light visual apparent motion stream. The directions of motion of the visual and auditory streams were independent. The type of visual stream (2 vs 4 lights) and the directional congruency between the auditory and visual motion streams (incongruent vs congruent) were randomized on a trial-by-trial basis. Note that congruency was always defined with respect to the direction of motion of the two central lights placed directly in front of the loudspeakers (i.e., with respect to the local visual motion). Participants were instructed to make a spatially-compatible keypress response (using their left or right thumb) to indicate leftward vs rightward auditory motion, respectively, while attempting to ignore the irrelevant visual stimuli as much as possible. The participants were instructed to prioritize response accuracy over response speed, and they were also informed about the independence of the directions of the auditory and visual apparent motion streams. No time constraints were imposed for the completion of the trials. The response-stimulus interval was 2000 ms. Participants completed 12 practice trials in which the sounds were presented alone (these trials were repeated if participants made more than one error). After the practice trials, participants completed two blocks of 96 experimental trials.
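For concreteness, here is a minimal sketch of the stimulus timing and LED pairings described above. This is our illustration, not the authors' presentation software: the LED labels and function name are hypothetical, while the 100 ms durations, 50 ms ISI, and flash pairings are taken from the text.

```python
# Hypothetical sketch of the audiovisual event schedule in the Methods:
# 100 ms stimuli separated by a 50 ms ISI, so the second onset occurs at
# t = 150 ms. LED labels are illustrative, not from the paper.
DURATION_MS = 100
ISI_MS = 50
SOA_MS = DURATION_MS + ISI_MS  # onset-to-onset interval: 150 ms

def schedule(direction: str, n_lights: int) -> list[dict]:
    """Build the list of timed events for one trial.

    direction: direction of the local (middle-LED) visual motion; here it
    is yoked to the sound for simplicity, whereas in the experiment the
    auditory and visual directions varied independently.
    """
    first, second = ("left", "right") if direction == "right" else ("right", "left")
    events = []
    for i, side in enumerate((first, second)):
        events.append({"t_ms": i * SOA_MS, "sound": side, "leds": [f"middle-{side}"]})
        if n_lights == 4:
            # Pairings from the Methods: top-left flashes with middle-right,
            # bottom-right flashes with middle-left, yielding global vertical
            # motion while the middle LEDs still move horizontally.
            partner = "top-left" if side == "right" else "bottom-right"
            events[-1]["leds"].append(partner)
    return events

# Example: right-to-left local motion in the 4 lights condition appears as
# top-to-bottom global motion (top pair at t = 0 ms, bottom pair at 150 ms).
for ev in schedule("left", n_lights=4):
    print(ev)
```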

RESULTS

The accuracy data were submitted to a 2 (number of visual stimuli) × 2 (congruency) repeated-measures ANOVA. This analysis revealed a significant main effect of congruency (F(1,17) = 34.06, MSE = 0.04, p < 0.01), with participants performing more accurately on congruent trials than on incongruent trials overall (95% vs 65%, respectively). Participants also responded less accurately in the 2 lights condition (mean = 79%) than in the 4 lights condition (82%), resulting in a significant main effect of the number of visual stimuli (F(1,17) = 5.39, MSE = 0.00, p = 0.03). Crucially, however, a larger crossmodal dynamic capture effect (measured as the difference in accuracy between congruent and incongruent trials) was observed in the 2 lights condition than in the 4 lights condition (36% vs 23%, respectively; Fig. 2), resulting in a significant interaction between congruency and the number of visual stimuli (F(1,17) = 12.51, MSE = 0.00, p < 0.01). Post-hoc comparisons (Tukey honest significant difference) revealed that this interaction was driven by a significant difference in participants' performance between the 2 lights and 4 lights conditions on incongruent trials (61% vs 71%, respectively; p < 0.01), while no such difference was found for the congruent trials (p = 0.59). Elsewhere [11], using the same auditory apparent motion stimuli, we have demonstrated that participants' performance in a unimodal auditory motion discrimination task was near perfect (99%; just as in the practice trials of the main experiment presented here). This result, added to the non-significant difference between the 2 and 4 lights conditions on congruent trials in the main experiment, suggests that the visual distractors primarily impaired performance on incongruent trials, rather than facilitating it on congruent trials. Note, however, that it is still possible that any improvement in performance caused by congruent stimulus movement was masked by participants responding at ceiling levels on these trials.

Fig. 2. Mean accuracy (± s.e.) in discriminating the direction of auditory apparent motion as a function of the number of visual stimuli (2 vs 4 lights) and congruency (incongruent vs congruent) factors. Note that congruency is defined with respect to the movement of the middle two lights.
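For readers who wish to run this style of analysis on their own data, the following is a minimal sketch of a 2 × 2 repeated-measures ANOVA and of the capture-effect computation. The paper does not specify its analysis software, so statsmodels is our assumption, and the data generated below are fabricated placeholders shaped only loosely like the reported condition means.

```python
# Hypothetical sketch: 2 (number of visual stimuli) x 2 (congruency)
# repeated-measures ANOVA on accuracy. All data values are made up.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subject in range(1, 19):  # 18 participants, as in the experiment
    for n_lights in ("2 lights", "4 lights"):
        for congruency in ("congruent", "incongruent"):
            # Fabricated accuracies, loosely shaped like the reported means
            base = 0.95 if congruency == "congruent" else (
                0.61 if n_lights == "2 lights" else 0.71)
            rows.append({
                "subject": subject,
                "n_lights": n_lights,
                "congruency": congruency,
                "accuracy": np.clip(base + rng.normal(0, 0.05), 0, 1),
            })
df = pd.DataFrame(rows)

# Main effects and the congruency x number-of-lights interaction
print(AnovaRM(df, depvar="accuracy", subject="subject",
              within=["n_lights", "congruency"]).fit())

# Crossmodal dynamic capture effect: congruent minus incongruent accuracy,
# computed separately for the 2 lights and 4 lights conditions
wide = df.pivot_table(index=["subject", "n_lights"],
                      columns="congruency", values="accuracy")
capture = (wide["congruent"] - wide["incongruent"]).groupby(level="n_lights").mean()
print(capture)
```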

DISCUSSION

Our results clearly show the influence of the direction of visual apparent motion on the perception of auditory apparent motion. Indeed, performance on incongruent trials (in both experimental conditions) was significantly worse than on congruent trials, where performance was near perfect. This result is consistent with previous findings supporting the existence of a crossmodal dynamic capture effect [5–7,12]. The important new result to emerge from the present study is that the magnitude of the crossmodal dynamic capture effect was reduced significantly in the 4 lights condition as compared to the 2 lights condition, despite the fact that the motion of the two central lights was the same in both conditions. Because of the difference in the direction of motion between the auditory apparent motion stream and the global visual apparent motion stream, the influence of visual motion on the perception of the direction of the auditory apparent motion was reduced in the 4 lights condition compared to the 2 lights condition.

The claim that any local horizontal motion was dominated by the global vertical motion in the 4 lights condition was supported by the results of a follow-up experiment, in which vertical global motion was also shown to prevail over horizontal local motion in terms of participants' subjective experience. Seven additional participants were tested to assess their subjective experience of motion using the stimulus displays from the main experiment. The participants were asked to categorize the visual stimuli as moving horizontally, as moving vertically, as moving both horizontally and vertically at the same time, or as not moving at all. They also rated how confident they were in their responses on a 5-point Likert scale (0, not confident, to 5, very confident). Crucially, participants perceived vertical motion on 90% of the 4 lights trials, and horizontal motion on 98% of the 2 lights trials. Other responses (both vertical and horizontal motion, or no motion) accounted for <4% of trials overall. Finally, the confidence ratings revealed that participants were typically highly confident in their responses (mean of 4 for both the vertical and horizontal motion responses).

Even though the perception of vertical motion appears to have dominated horizontal motion in the display used here (at least in terms of participants' subjective experience), we still found a significant crossmodal dynamic capture effect in the 4 lights condition. Given that participants seemed to be unaware of the presence of any horizontal visual motion in our display, their judgments regarding the direction of auditory apparent motion could hardly have been biased by their subjective perception of the direction of visual horizontal motion. Therefore, this latter result appears to support a multisensory integration account of the crossmodal dynamic capture effect reported here, rather than an explanation based on a response bias elicited by the direction of the visual stream [12–14]. Our results therefore strongly suggest that the crossmodal grouping of auditory and visual motion signals can be modulated by the perceptual grouping taking place within vision [6,9,10].

However, one might also consider an alternative explanation for our results. It could be argued that the reduced congruency effect reported in the 4 lights condition (compared to the 2 lights condition) does not reflect visual perceptual grouping modulating multisensory integration, but instead exogenous attention being attracted away from the central lights. However, this alternative explanation would conflict directly with the results of many other researchers [15–17], who have argued that perceptual grouping processes (which in our study gave rise to a global vertical motion stream) occur preattentively. Therefore, it is more parsimonious to interpret our data as a modulation produced by perceptual grouping occurring before any effects that displacements of spatial attention might have elicited.

In the light of the results in the 4 lights condition, we argue that visual perceptual grouping precedes multisensory integration, thus supporting the idea that the multisensory integration of motion information depends upon the configuration of unimodal perceptual information. In fact, this is consistent with recent findings from our laboratory showing that, in order to modulate the crossmodal dynamic capture effect, the conditions for unimodal visual perceptual grouping must be established before the presentation of the multisensory audiovisual event [6].

The data presented here are also in keeping with neurophysiological and neuroimaging research that has uncovered the principles governing audiovisual interactions, such as temporal synchrony and spatial coincidence [18–22]. Together with the results of Sanabria et al.'s study [6], the present data also suggest that, as well as temporal synchrony and spatial coincidence, perceptual numerosity is a key factor in the multisensory integration of audiovisual apparent motion events. We have shown that larger crossmodal effects are found when the visual and auditory streams consist of the same number of stimuli than when different numbers of stimuli are presented in each modality [23]. This may explain the lack (or weakness) of any significant crossmodal effect (either facilitation or interference) in experiments in which random-dot kinematograms were used as visual moving stimuli and manipulations of interaural level differences were used to elicit moving auditory stimuli [24]. In those particular cases, the visual and auditory events were often temporally (if not always spatially) aligned, but were different in number and kind, and, as a consequence, the conditions for multisensory binding may not have been optimal (see [14] for a similar argument).

We believe that the present results reflect the outcome of multisensory integration in motion perception. One might wonder, though, whether the use of real moving auditory and visual stimuli would result in a different pattern of results. In a recent investigation, Soto-Faraco et al. [12] addressed this question directly, reporting nearly the same pattern of results (i.e., the visual capture of the perceived direction of auditory motion) for both real and apparent motion audiovisual stimuli. Moreover, in terms of the perception of motion, the results of our follow-up study demonstrated that participants actually perceived motion (either vertical or horizontal) in all of the experimental conditions tested.

Finally, in neural terms, our results regarding the modulation of the multisensory integration of motion information by the perceptual grouping taking place within vision (i.e., unimodally) seem to be consistent with neurophysiological and neuroimaging studies that have investigated both processes independently. For instance, relevant to the present study is Francis and Grossberg's suggestion that interactions between the primary visual areas V1, V2 and hMT may provide the locus of perceptual grouping by apparent motion [25]. Meanwhile, other studies have shown that a number of higher-level association areas (e.g., ventral premotor cortex, lateral parietal cortex, lateral frontal cortex) appear to play a critical role in the multisensory integration of motion information (see [5] for a review). Therefore, added to the behavioral evidence reported here, the neurophysiological data seem to further support the idea that the multisensory integration of motion information depends on how the unisensory information is carved up into units. However, it is worth mentioning that other researchers [2] have suggested that low-level perceptual areas, such as area hMT, may also be implicated in the multisensory integration of motion information. Therefore, an important issue for future research will be to uncover exactly how the modulation of multisensory integration by unimodal perceptual grouping established in the present study is implemented neurally.

CONCLUSIONS

The results of the present experiment show that one needs to consider the nature of any perceptual grouping taking place within the individual sensory modalities when studying crossmodal grouping, as such intramodal grouping can clearly determine the final outcome of multisensory integration.

REFERENCES
1. Calvert GA, Spence C and Stein BE (eds). The Handbook of Multisensory Processes. Cambridge, MA: MIT Press; 2004.
2. Hagen MC, Franzen O, McGlone F, Essick G, Dancer C and Pardo JV. Tactile motion activates the human middle temporal/V5 (MT/V5) complex. Eur J Neurosci 2002; 16:957–964.
3. Lewis JW, Beauchamp MS and DeYoe EA. A comparison of visual and auditory motion processing in human cerebral cortex. Cerebr Cortex 2000; 10:873–888.
4. Howard RJ, Brammer M, Wright I, Woodruff PW, Bullmore ET and Zeki S. A direct demonstration of functional specialization within motion-related visual and auditory cortex of the human brain. Curr Biol 1996; 6:1015–1019.
5. Soto-Faraco S, Kingstone A and Spence C. Multisensory contributions to the perception of motion. Neuropsychologia 2003; 41:1847–1862.
6. Sanabria D, Soto-Faraco S, Chan J and Spence C. When does intramodal perceptual grouping affect multisensory integration? Cogn Affect Behav Neurosci 2004; 4:218–229.
7. Soto-Faraco S, Lyons J, Gazzaniga M, Spence C and Kingstone A. The ventriloquist in motion: illusory capture of dynamic information across sensory modalities. Cogn Brain Res 2002; 14:139–146.
8. Navon D. Forest before trees: the precedence of global features in visual perception. Cogn Psychol 1977; 9:353–383.
9. Vroomen J and de Gelder B. Sound enhances visual perception: cross-modal effects of auditory organization on vision. JEP:HPP 2000; 26:1583–1590.
10. Watanabe K and Shimojo S. When sound affects vision. Psych Sci 2000; 12:109–116.
11. Soto-Faraco S, Spence C and Kingstone A. Assessing automaticity in the audio-visual integration of motion. Acta Psychol 2004; in press.
12. Soto-Faraco S, Spence C and Kingstone A. Crossmodal dynamic capture: congruency effects in the perception of motion across sensory modalities. JEP:HPP 2004; 30:330–345.
13. Kitagawa N and Ichihara S. Hearing visual motion in depth. Nature 2002; 416:172–174.
14. Meyer GF, Wuerger SM, Röhrbein F and Zetzsche C. Low-level integration of auditory and visual motion signals requires spatial co-localisation. Exp Brain Res 2004; in press.
15. Marr D. Vision. San Francisco: Freeman; 1982.
16. Neisser U. Cognitive Psychology. New York: Appleton-Century-Crofts; 1967.
17. Treisman A. Properties, parts, and objects. In: Boff KR, Kaufman L and Thomas JP (eds), Handbook of Perception and Human Performance. New York: Wiley; 1986, pp. 1–70.
18. Bushara KO, Hanakawa T, Immisch I, Toma K, Kansaku K and Hallett M. Neural correlates of crossmodal binding. Nat Neurosci 2003; 6:190–195.
19. Calvert GA, Campbell R and Brammer MJ. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol 2000; 10:649–657.
20. Calvert GA. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cerebr Cortex 2001; 11:1110–1123.
21. Macaluso E, George N, Dolan R, Spence C and Driver J. Spatial and temporal factors during processing of audiovisual speech: a PET study. Neuroimage 2004; 21:725–732.


22. Schroeder CE, Smiley J, Fu KMG, McGinnis T, O'Connell MN and Hackett TA. Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing. Int J Psychophysiol 2003; 50:5–18.
23. Morein-Zamir S, Soto-Faraco S and Kingstone A. Auditory capture of vision: examining temporal ventriloquism. Cogn Brain Res 2003; 17:154–163.

24. Meyer GF and Wuerger SM. Cross-modal integration of auditory and visual motion signals. Neuroreport 2001; 12:2557–2560.
25. Francis G and Grossberg S. Cortical dynamics of form and motion integration: persistence, apparent motion, and illusory contours. Vis Res 1996; 36:149–173.

Acknowledgements: This study was supported by a Network Grant from the McDonnell-Pew Centre for Cognitive Neuroscience in Oxford to S.S.-F. and C.S.
