Effects of Experience, Training and Expertise on Multisensory Perception: Investigating the Link between Brain and Behavior

Scott A. Love, Frank E. Pollick, and Karin Petrini

Indiana University, Department of Psychological and Brain Sciences, Bloomington IN 47405, USA
University of Glasgow, School of Psychology, 58 Hillhead Street, Glasgow G12 8QB, UK
University College London, Institute of Ophthalmology, UK

Abstract. The ability to successfully integrate information from different senses is of paramount importance for perceiving the world and has been shown to change with experience. We first review how experience, in particular musical experience, brings about changes in our ability to fuse together sensory information about the world. We next discuss evidence from drumming studies that demonstrate how the perception of audiovisual synchrony depends on experience. These studies show that drummers are more robust than novices to perturbations of the audiovisual signals and appear to use different neural mechanisms in fusing sight and sound. Finally, we examine how experience influences audiovisual speech perception. We present an experiment investigating how perceiving an unfamiliar language influences judgments of temporal synchrony of the audiovisual speech signal. The results highlight the influence of both the listener's experience with hearing an unfamiliar language and the speaker's experience with producing non-native words.

Keywords: multisensory, audiovisual, perception, expertise, drumming.

1 Introduction

Every day we receive a large amount of sensory information, much of it redundant. Some of this information comes from the same source and needs to be combined, whilst other information arises from different sources and needs to be kept separate. The brain's ability to process multisensory information and make sense of it is essential for our wellbeing and for conducting everyday tasks. Many internal and external factors dictate whether, and how, two sensory signals will be integrated. For example, in a situation in which sound localization is limited (e.g. when walking along a very noisy street), combining sound and sight can help us make better decisions and keep us from harm (e.g. when the pedestrian traffic light starts beeping and concomitantly turns green). Thus combining multiple cues can reduce uncertainty and enhance our ability to make better estimates of the situation [1, 2]. Because of this realization, cognitive neuroscience has seen something of a paradigm shift away from trying to explain human perception by isolating and understanding each of our senses separately. Indeed, the field has moved towards a more holistic approach that considers the interaction between the senses to be at least as important as unimodal perception. Mirroring this shift, we begin this article by briefly describing behavioral, functional and neuroanatomical evidence of experience-dependent plasticity of unimodal and multimodal processing. Subsequently, we outline a relatively fresh research strand aiming specifically to understand how expertise can enhance, fine-tune, and alter multisensory processing; in particular, how musical training or experience with a particular language can influence audiovisual synchrony perception.

A. Esposito et al. (Eds.): Cognitive Behavioural Systems 2011, LNCS 7403, pp. 304–320, 2012. © Springer-Verlag Berlin Heidelberg 2012

2 Effects of Experience on Unimodal and Multimodal Processing

Several behavioral studies have now demonstrated that experience can modify sensitivity to unimodal sensory information such as vision, touch and sound [3-5], as well as the way we integrate this information [6, 7]. For example, Green and Bavelier [3] reported that playing action video games enhances the spatial resolution of visual processing, and Atkins, Fiser and Jacobs [6] that experiencing haptic information can modify observers' reliability estimates of visual cues during three-dimensional visual perception. This multisensory malleability is not confined to visual-haptic interaction, but extends to audiovisual temporal processing. Powers, Hillock and Wallace [8], for example, found that multisensory perceptual training can decrease our tolerance of audiovisual asynchrony by reducing the size of the temporal integration window (TIW). When we integrate information from our various senses, experience matters not only because it can affect the way we estimate cue reliability and subsequently combine the cues, but also because it can affect the role of prior knowledge in cue combination. Prior knowledge stems from previous experience of the world, and in certain instances it may be innate [9-13]. A striking example of prior knowledge that is often reported is the 'hollow-face' illusion [14], in which a concave mask elicits the percept of a convex face. This happens because the prior belief that faces are convex overrides the sensory information of concavity. In an elegant study, Adams, Graf and Ernst [15] also showed that the prior assumption of light-from-above can be changed by repeated haptic feedback and that this adaptive mechanism extends to different situations and tasks. These findings clearly suggest that our behaviors and assumptions are not only dependent on sensory information but are constantly shaped by prior experience.
Along with human behavior, brain structure and function are remarkably plastic; moreover, recent reviews have outlined the general nature of the experience-dependent organization of both cortical and sub-cortical brain regions [16-18]. These reviews highlight that neuroanatomical organization and behavior can be modified by many different types of expertise as well as by learning over both long and short
periods of time. For example, learning the identity of unfamiliar voices over a short period, i.e., around six 20-minute learning sessions, significantly improves vocal identity discrimination performance and alters how voices are processed in the inferior frontal cortex [19]. The N170 electrophysiological component, most often referenced in regard to face processing [20], is also modulated by expertise with objects other than faces, such as 'greebles' [21, 22], dogs and birds [23], and fingerprints [24, 25]. Effects of expertise on brain activity have also been reported using functional magnetic resonance imaging (fMRI) to compare expert dancers to non-experts [26-28]. For example, Calvo-Merino and collaborators found that expert dancers and non-experts differed in the level of activation in fronto-temporal-parietal cortex when viewing dance actions [26]. That is, dancers had greater activation for movements that they had been trained to perform than for those that they had not, while non-experts did not display any difference, demonstrating that specific motor expertise was key to explaining changes in brain responses. As regards brain structure, the length of time London taxi drivers have been in the job correlates with how much larger their posterior hippocampi are than those of controls [29, 30]. Similarly, the posterior hippocampi of dancers and slackliners are larger than those of controls [31]. Moreover, the gray matter volume of several cortical areas is known to increase after just 40 hours of golf training [32]. These are just a few examples of how learning and expertise in various areas can produce skilled behavior associated with plastic reorganization of brain structure and function. Several studies have now shown that inter-individual variation in white matter reflects behavioral variation [33, 34]; however, these studies were not able to determine a causal role of experience or training on white matter structure.
However, the existence of such a causal role for grey matter changes was demonstrated in 2004 by Draganski and collaborators [35]. In a longitudinal MRI study, the authors compared changes in grey matter structures in a group of adults trained to juggle (a motor skill that requires accurate bimanual movements) to a group of non-jugglers. Jugglers and non-jugglers were scanned before and after training, and using voxel-based morphometry (VBM) Draganski and collaborators showed an enhancement of grey matter in a mid-temporal area (hMT/V5) and in the left posterior intraparietal sulcus of the jugglers after the training. In a more recent longitudinal study, this evidence of a causal relationship between training and changes in brain structure was extended to white matter [36], using a similar task and diffusion tensor imaging (DTI). Although it cannot be excluded that in some instances experts naturally have larger brain structures, the aforementioned findings provide strong evidence for the direction of causation between changes in behavior and changes in brain structure and function. Nevertheless, the real extent of training-induced brain changes, and their causal relationship to training, can only be completely understood by studying skilled performers after long-term training. This is one reason why musicians are a very useful group for studying brain plasticity and the neural correlates of skilled performance. Differences between musicians and non-musicians in both gray and white matter have been consistently outlined [37-45]. For example, the cerebellum of male professional keyboard players is larger than that of controls [40], and both singers and instrumentalists
have larger white-matter tract volume and fractional anisotropy in the bilateral arcuate fasciculi [41]. Interestingly, a review of this literature pointed out that the specific neuroanatomical changes that occur depend on the particular domain of musical expertise [46]. Further to these structural differences, the authors of [47] used proton MR spectroscopy to highlight differences between musicians and non-musicians in the concentration of the N-acetylaspartate metabolite within the planum temporale.

3 Effects of Musical Experience on Unimodal and Multimodal Processing

Over the last two decades musical expertise has been extensively used as a model to investigate brain plasticity [48-52]. Musical expertise is achieved over many years of extensive training, which fine-tunes and enhances perceptual, cognitive and motor abilities. Crucially, musical expertise is also a specialization that not every individual undertakes; this enables researchers to study plasticity by comparing two different cohorts, i.e., musicians and non-musicians, or by longitudinally observing musical novices as they become musical experts [53]. Furthermore, musical training does not only improve musical ability; it can also enhance behavioral performance on a variety of other cognitive abilities: speech perception and linguistic ability [54], second-language linguistic skills [55], verbal working memory [56], musical and non-musical auditory imagery [57], visuospatial perception and imagery [58] and even mathematical ability [59]. Hence, the musician's brain is an ideal model not only for exploring experience-dependent plasticity in regard to musical experience but also for studying how experience in one domain transfers to others. How musical expertise transfers to speech perception and linguistic ability is arguably one of the most extensively studied areas of experience-dependent plasticity [55, 60, 61]. The extensive work cited above focuses mainly on how musical experience can shape unimodal processing; however, musical experience is inherently multisensory, and it would therefore be prudent to use this type of training to explore the experience-dependent nature of multisensory processing. Interestingly, the importance of experience for the development of multisensory integration has been observed at the single-neuron level in non-human animal research [62, 63]. For example, when neurons in the superior colliculus of the cat are deprived of input from the cortex, they fail to develop the ability to integrate multisensory information [64, 65].
Over the past five years we have been building upon the small body of research that has used musical experience to explore how multisensory processing can be altered after many years of musical training [66-73]. One of the first studies investigating the effects of musical training on multimodal information processing and cortical plasticity was conducted by Schulz and collaborators [73]. They used magnetoencephalography (MEG) to compare a group of professional trumpet players with a group of non-musicians and showed that the musicians processed multimodal (haptic/auditory) information differently from non-musicians: when the lower lip was stimulated simultaneously with a tone, a different response was elicited compared to when either the lip or the tone was stimulated separately. A subsequent study [74] extended these findings by showing that even short periods (2 weeks) of multimodal musical training (i.e., playing musical
sequences on piano) can induce cortical brain plasticity and elicit differential responses compared to unimodal training (listening to, and making judgments about, the musical sequences played by the multimodally trained group). Specifically, the multimodally trained group showed an enlarged mismatch negativity (MMNm) in magnetoencephalographic measurements after training when compared to the unimodally trained group. The influence of musical training on multimodal processing, brain plasticity and reorganization is not limited to cortical sensory structures or to a specific age. Indeed, modifications elicited by musical training extend to subcortical sensory structures responding to auditory and audiovisual information [72], and can be detected from early childhood [37, 49, 75].

3.1 Effects of Musical Experience on Perception of Audiovisual Synchrony and Congruency

One way that expertise in multisensory processing of music has been studied is to examine differences between novices and musicians [66-70] in their sensitivity to audiovisual asynchrony [76-82], to audiovisual congruency [83-86], and to the interaction between these two processes [68, 78, 87, 88]. Since the audio and visual channels have different processing latencies, due to dissimilarities in physical and neural transmission [76, 89-91], the problem of how they are combined to obtain a unitary percept is not trivial. The amount of asynchrony that can be tolerated while still perceiving the audio and visual streams as unitary is known as the "Temporal Integration Window" (TIW). This window provides a good behavioral measure of training-induced changes in the way musicians process multisensory information, and can be usefully linked to changes in brain activation. In a recent study, Lee and Noppeney [82] examined the TIW of 18 pianists and 19 non-musicians viewing audiovisual videos of either speech or piano actions. They showed, in agreement with previous results [68-70], that musicians were less tolerant of audiovisual asynchrony (had a smaller TIW) than non-musicians, and also that this higher sensitivity was specific to the piano displays. Having ascertained this, Lee and Noppeney then used fMRI to examine the neural correlates of this behavioral difference between pianists and non-musicians, and reported enhanced asynchrony effects for musicians in left superior precentral sulcus, right posterior superior temporal sulcus/middle temporal gyrus, and left cerebellum, as well as effective connectivity for music in an STS-premotor-cerebellar circuit. Based on these findings, the authors concluded that piano practice affords an internal forward model that enables more precise predictions of the relative timings of the audiovisual signals.
This idea builds upon initial observations that musical conductors have more finely tuned auditory and temporal processes than non-conductors, as assessed using behavioral and fMRI methods [67]. In the next section we explore this issue further by reviewing a series of studies comparing drummers' and non-musicians' sensitivity to asynchrony.

3.2 Effects of Drumming Experience on Perception of Audiovisual Synchrony and Congruency

For their studies, Petrini and colleagues chose drumming, because drumming movements are highly salient visually, in contrast to those involved in playing some other musical instruments, for which asynchrony can be much harder to detect. Motion-capture data of drummers playing a swing groove [92] were shown as point-light displays [93] in combination with a sound synthesized from an impact model using input from the motion data (see the description of the display in Figure 1). Point-light displays (PLDs) allow one to isolate the effects of perceiving biological motion from contextual factors, and the specific rhythmic pattern of the swing groove provided a simple stimulus well suited to differentiating between novices and experts.

Fig. 1. Schematic of the stimulus conditions used in the fMRI study [71]. In the top center of the figure a single frame from the point-light display is presented. The point-light dots represent the drummer's arm beginning at the shoulder joint. Note that the white line outlining the drummer is presented here for clarity only and did not appear in the presented stimulus. For both experiments (left and right columns), the attributes of the visual motion, in terms of the relationship of the original motion velocity relative to implied velocity, appear in the top plots, and the produced sound waveforms appear directly under them. The lowermost panels depict the relative timing of the auditory (A) and visual (V) stimuli. In Experiment 1 (left column), the audio maintained its natural covariation with the visual signal but was presented either in synchrony (bottom left plot) or in asynchrony (bottom right plot). In Experiment 2, the audio was always in synchrony with the visual signal, but in one case covaried with it (top left plot) and in the other case did not (top right plot). Reproduced with kind permission from Elsevier: Petrini et al. (2011), Action expertise reduces brain activity for audiovisual matching actions: An fMRI study with expert drummers, NeuroImage 56, 1480-1492, Fig. 1. Copyright (2011) Elsevier.

Petrini, Dahl et al. [68] showed that not only are drummers more sensitive to asynchrony (i.e., less tolerant of audiovisual asynchrony), but also that, unlike novices, their sensitivity depends less on the manipulation of other physical
characteristics, such as drumming tempo [68, 77] or audiovisual incongruency [68, 78, 94, 95]. Indeed, while novices are better at detecting asynchrony for drumming displays with faster tempos [68, 69, 77], and also for drumming displays in which the covariation between the sound and the drummer's movement has been eliminated [68], drummers are not. The evidence that musicians can tap at slower tempos than non-musicians [96] may explain why drummers are not affected by changes in drumming tempo when judging audiovisual simultaneity. Through practice, drummers acquire the ability to perform drumming actions at a wide range of tempos, which could be why changes in tempo do not affect the way drummers bind the familiar biological motion and its sound. These findings seem to indicate that, after a long period of musical practice, the binding of biological motion and its sound changes in such a way that additional factors are no longer used by the neural system to integrate the multisensory information, probably because the system by itself reaches a very high and unbiased level of precision. Recent findings further corroborate this conclusion. Petrini, Holt & Pollick [70], for instance, found that only novices' simultaneity judgments were affected by rotation of a drumming display (the point-light display in Figure 1 rotated by 90, 180 and 270 degrees), while drummers' were not. That is, the tolerance to asynchrony of novices increased when viewing rotated audiovisual drumming displays, while that of the drummers remained relatively unchanged. This extends the findings of Saygin et al. [97] to another kind of audiovisual biological motion event, and indicates that the gestalt of upright point-light drumming enhances the detection of audiovisual asynchrony for musical novices but not for expert drummers.
Hence, the nature of the visual stimulation can affect the perceived synchrony between the two sensory signals, but the extent of this effect is constrained by the level of experience with a particular multisensory event. If drummers are better able to detect asynchrony because of their experience and familiarity with that particular biological motion and its resulting sound, then they should still be better than novices when only part of the body information is presented in the drumming displays. In other words, while the drummers could have acquired, through practice, internal models specific to drumming biological motion that they can use to predict the sound occurrence when no impact point is presented, this should not be the case for the novices. In a further study [69] we addressed this possibility and demonstrated that this is exactly what happens. Not only were drummers better than novices at detecting asynchrony between the drummer's biological motion and the sound, but they were also the only group that could still bind the information from the two sensory domains when the impact point was removed. Indeed, novices were completely unable to discriminate between synchronous and asynchronous drumming displays when the impact point was eliminated. However, when presented with either the intact drumming information (Figure 1) or only the impact point, drummers showed no difference in sensitivity to asynchrony, indicating that as long as the impact point is present they use it just as novices do, although maintaining a narrower audiovisual temporal integration window. Thus, while drummers can use both kinds of information, novices can only rely on the impact point when deciding whether or not the sound and the drummer's movement are part of the same action. These findings suggest that
expertise with a certain action enhances the ability to maintain a coherent representation of the multisensory aspects of biological motion. This assumption is strengthened by the finding that when drummers judged the simultaneity between the drumming biological motion and the sound in the aforementioned display from which the impact-point information was eliminated, their results were reminiscent of tapping tasks [98, 99], indicating that the acquired information for that specific action was used. In other words, when presented with only the point-light arm information of the drumming display, drummers' points of subjective simultaneity in some instances occurred when the sound led the sight, showing the same anticipatory effect as that found in tapping tasks [98, 99]. This interpretation suggests that drummers do not only possess a generally enhanced ability to determine the co-occurrence of auditory and visual information for any kind of multisensory event; they also possess a more specific ability to use the representation of that action to bind sight and sound. These examinations of the temporal integration window of drummers suggest that the narrow tuning for audiovisual asynchrony exhibited by the drummers [68-70], and potentially also their ability to fuse sight and sound from incomplete visual displays [69], result both from the involvement of higher-order (cognitive) processes in novices when fusing the audio and visual tracks and from enhanced perceptual and simulation processes in drummers when detecting asynchronous events. The neural basis of these differences was studied using brain imaging techniques [71]. Specifically, we used functional Magnetic Resonance Imaging (fMRI) to measure the brain activity of a group of drummers and a group of novices while they watched synchronous and asynchronous PLD drumming displays.
Their task was to determine whether the biological motion of the drummer and the sound matched or not (see Figure 1 for a detailed description of the stimuli and design used in the fMRI experiments). The timings for the synchronous and asynchronous displays were determined separately for each participant in a behavioral experiment immediately prior to entering the MRI scanner. This predetermination of the optimal timings was necessary to exclude any difference in brain processes between drummers and novices that could be due to differences in task difficulty rather than in multisensory processing. Behavioral results from subjects in the scanner indicated that both groups were almost perfect in detecting when the drummer's movement and the corresponding sound mismatched in Experiment 1; despite this, the patterns of activation in the detected brain regions were markedly different between the groups [71]. For example, during the task there was reduced overall activation in bilateral middle frontal gyrus (MFG) for experts compared to novices. Moreover, there was an interaction effect in both the cerebellum and the parahippocampal gyrus: drummers activated these regions less than novices, but only for the synchronous stimulus displays. In line with these findings, drummers were found to have reduced activation in fronto-temporal-parietal regions in Experiment 2, where the congruency between the drummer's movements and the resulting sound was manipulated (in Experiment 2 the sound was always synchronized with the drummer's movements, but the natural covariation between sound intensity and velocity of the drumming strike was manipulated). Our results are complementary to those of Lee and Noppeney [82] in showing finer tuning in musicians than non-musicians when processing audiovisual synchrony information.


Indeed, not only did we find differences between musicians and non-musicians in similar areas (e.g., cerebellum and precentral gyrus), despite examining two different types of musical expertise (piano and drum players), but whereas Lee and Noppeney [82] showed enhanced asynchrony effects for musicians, we showed a reduced synchrony effect for musicians. Taken together, these findings provide evidence for a two-way training-induced mechanism, whereby musical practice increases sensitivity to audiovisual asynchrony by reducing the brain resources required when multisensory information is obviously synchronous, while fine-tuning and increasing precision when a delay is present between the incoming auditory and visual information.

4 Effects of Language Experience on Perception of Audiovisual Synchrony

The vast majority of humans can be regarded as experts in speech perception; in general, however, individuals are only experts in their own native language. Navarra, Alsius and colleagues [100] took advantage of this fact to explore how expertise with a particular language influences an individual's perception of audiovisual speech synchrony. Synchronous and asynchronous audiovisual stimuli containing either English or Spanish sentences were presented to native English and native Spanish participants, whose task was to decide whether the audio and visual streams were in synchrony or not. Native language experience was found to increase the amount of visual lead required for the audio and visual streams to be optimally perceived as synchronous, i.e., when the speech was in the participants' native language their point of subjective simultaneity (PSS) was larger than when it was in the foreign language. Interestingly, this effect was not present in a group of participants who had experience with both English and Spanish [100]. Thus being a speech expert appears to increase participants' tolerance of audiovisual asynchrony, while being a music expert appears to decrease such tolerance [101]. This raises the interesting question of whether musical and speech expertise have different effects on brain plasticity, leading to different PSS values when either the visual or the auditory information is unfamiliar (e.g. from a non-native language) to the listener. Here we build on the Navarra, Alsius and colleagues [100] study and describe a similar experiment that aimed to explore how mismatches between visual and auditory language (e.g. seeing the facial movements for the native language coupled with the sound of a non-native language) influence the perception of audiovisual speech synchrony.

4.1 Methods

Participants. Eighteen monolingual native English speakers took part in the experiment. All participants had between two and four years of French lessons in school, none had taken Italian, and all described themselves as monolinguals with very little experience of any other language. Nine of the participants were female and the age range was between 17 and 29 (mean = 22).


Stimuli. Stimuli were dynamic audiovisual movies of either a native English speaker or a native Italian speaker saying the words "tomorrow", "domani" and "andesker". "Domani" is Italian for "tomorrow", and "andesker" is a made-up nonsense word. It is worth noting that the English speaker had no experience of speaking Italian, while the native Italian speaker was actually a bilingual Italian/English speaker. All three words have the same number of syllables (three), a similar spoken duration (about 1 second) and can easily be spoken with neutral affect. Ten cue onset asynchronous (COA) versions of each movie were created: the audio was shifted to begin either before the video (-400, -320, -240, -160, -80 ms) or after it (+400, +320, +240, +160, +80 ms), in 80 ms (2-frame) increments. Two versions of each stimulus were created: one containing the full face of the actor, the other only the lower half of the face, i.e., the mouth region. In total, there were 132 movies: 2 (speaker - native English or Italian) x 2 (stimulus view - full face or mouth region) x 3 (word - "tomorrow", "domani", "andesker") x 11 (COA levels - ±400, ±320, ±240, ±160, ±80, 0 ms).

Procedure. Participants completed three sessions on separate days, each lasting around 45 minutes. Each session contained six experimental blocks, each consisting of one presentation of all 132 movies in random order, giving a total of 18 repetitions of each asynchrony level for each condition. After each movie the task question, "Were the audio and visual streams in synchrony with each other?", and the possible answers, "1 for in synch and 3 for out of synch", remained on screen until the participant responded, at which point the next trial began. Analysis first involved, for each subject and each of the twelve conditions, finding the Gaussian curve that best fit the number of synchronous responses at each COA level.
From this fit, two parameters of interest were derived: the point of subjective simultaneity (PSS) and the temporal integration window (TIW). The PSS is the millisecond COA value corresponding to the peak of the best-fitting Gaussian and is generally interpreted as the COA that is perceived as optimally synchronous [77, 101]. The standard deviation (SD) of the fitted distribution was taken as an estimate of the TIW; this window represents the range of COA, around the PSS, within which participants are unable to reliably perceive asynchrony. The PSS and TIW data were then entered into separate 3-factor repeated measures analysis of variance (ANOVA) tests: 2 (speaker - native English or Italian) x 2 (stimulus view - full face or mouth region) x 3 (word - "tomorrow", "domani", "andesker"). These tests revealed no significant differences involving the stimulus view factor. We therefore collapsed across this factor before refitting each subject's data and re-estimating the PSS and TIW for each of the six remaining conditions: 2 (speaker - native English or Italian) x 3 (word - "tomorrow", "domani", "andesker").
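The fitting procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the 11 COA levels match those of the experiment, but the response counts, starting guesses and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(coa, amplitude, pss, tiw):
    """Gaussian over cue onset asynchrony: peak location = PSS, SD = TIW."""
    return amplitude * np.exp(-((coa - pss) ** 2) / (2 * tiw ** 2))

def fit_pss_tiw(coa_levels, n_sync_responses):
    """Fit a Gaussian to 'synchronous' response counts; return (PSS, TIW) in ms."""
    p0 = [max(n_sync_responses), 0.0, 150.0]  # rough starting guesses
    params, _ = curve_fit(gaussian, coa_levels, n_sync_responses, p0=p0)
    _, pss, tiw = params
    return pss, abs(tiw)  # sigma's sign is arbitrary, so take its magnitude

# The 11 COA levels used in the experiment (audio lead negative, audio lag positive).
coa = np.array([-400, -320, -240, -160, -80, 0, 80, 160, 240, 320, 400], dtype=float)

# Hypothetical counts of "synchronous" responses out of 18 repetitions per level.
counts = np.array([1, 2, 5, 10, 15, 17, 16, 12, 7, 3, 1], dtype=float)

pss, tiw = fit_pss_tiw(coa, counts)
print(f"PSS = {pss:.0f} ms, TIW (SD) = {tiw:.0f} ms")
```

In this sketch a positive PSS means the audio must lag the video for the pair to be perceived as optimally synchronous, matching the sign convention of the COA levels above.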

4.2 Results

The 2-factor repeated measures ANOVA on the PSS data revealed a significant speaker-by-word interaction (F(2, 34) = 158.87, p < 0.001) and a significant main effect of word (F(2, 34) = 8.15, p = 0.01). The main effect of speaker was not significant (F(1, 17) = 1.47, p = 0.242). To further explore the interaction, 1-factor ANOVAs on the word factor were run separately for each speaker. Significant main effects were found for both the native English (F(2, 34) = 59.83, p < 0.001) and the native Italian speaker (F(2, 34) = 21.76, p < 0.001). Bonferroni-corrected pairwise comparisons highlighted the cause of the significant interaction (see also Figure 2): when the native English speaker said "domani" the PSS (126 ms) was significantly larger than that for "tomorrow" (36 ms, p < 0.001) or "andesker" (40 ms, p < 0.001); in contrast, when the native Italian speaker said "domani" the PSS (35 ms) was significantly smaller than that for "tomorrow" (70 ms, p < 0.001) and "andesker" (80 ms, p < 0.001). There was never a significant difference between tomorrow and andesker (p = 0.753 for the English speaker and p = 0.673 for the Italian speaker). The 2-factor repeated measures ANOVA on the TIW data revealed a significant speaker-by-word interaction (F(2, 34) = 13.59, p < 0.001) as well as significant main effects of word (F(2, 34) = 4.54, p = 0.018) and speaker (F(1, 17) = 5.02, p = 0.039). To further explore the interaction, 1-factor ANOVAs on the word factor were again run separately for each speaker. The ANOVA for the native English speaker produced a significant main effect (F(2, 34) = 3.69, p = 0.035); however, Bonferroni-corrected pairwise comparisons showed no significant difference between any of the words. The ANOVA for the native Italian speaker was also significant (F(2, 34) = 7.63, p = 0.002), and follow-up tests showed that the TIW for andesker was significantly larger than that of both tomorrow (p = 0.021) and domani (p = 0.036).
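The repeated measures ANOVAs described above could be run, for example, with statsmodels' `AnovaRM`. The data below are simulated (the original per-subject PSS values are not reproduced here) with cell means loosely mimicking the reported crossover interaction:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical per-subject PSS values: 18 participants x 2 speakers x 3 words,
# with cell means resembling the pattern reported in the text (in ms).
rng = np.random.default_rng(0)
means = {("English", "tomorrow"): 36, ("English", "domani"): 126,
         ("English", "andesker"): 40, ("Italian", "tomorrow"): 70,
         ("Italian", "domani"): 35, ("Italian", "andesker"): 80}
rows = [{"subject": s, "speaker": sp, "word": w,
         "PSS": means[(sp, w)] + rng.normal(0, 15)}
        for s in range(18) for (sp, w) in means]

# 2 (speaker) x 3 (word) repeated measures ANOVA on PSS.
res = AnovaRM(pd.DataFrame(rows), depvar="PSS", subject="subject",
              within=["speaker", "word"]).fit()
print(res.anova_table)  # F tests for speaker, word and speaker:word
```

With the simulated crossover built in, the speaker-by-word interaction dominates the table, just as in the reported analysis.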

Fig. 2. Average PSS (a) and TIW (b) values for each word spoken by a native English speaker (dashed line with square markers) and a native Italian speaker (solid line with diamond markers). Error bars represent standard errors of the mean.

4.3 Discussion

To investigate the role of experience with a particular language in synchrony perception, we studied synchrony judgments made by native English speakers on audiovisual movies of two speakers (one a native English speaker, the other a native Italian speaker) uttering three different words. Regarding the PSS, a speaker-by-word interaction was found: while "domani" uttered by the English speaker led to the largest PSS, it led to the smallest PSS when uttered by the Italian speaker. Furthermore, there was no difference between the English word and the nonsense word regardless of which speaker produced them. Measures of the TIW also highlighted an interaction, mostly due to the nonsense word producing a particularly large TIW for the Italian speaker compared to the English speaker. Interestingly, the smallest PSS for each speaker occurred for the speaker's native language. This indicates that rather than the participants' experience of their native language being solely responsible for our results, the speaker's experience in producing the words greatly influenced synchrony judgments. This is further supported by the fact that the difference in PSS between the two real words (tomorrow and domani) was larger for the monolingual English speaker than for the bilingual Italian/English speaker, suggesting that the bilingual speaker produced a more coherent and congruent relationship between the visual and auditory cues for his non-native real word (tomorrow) than the monolingual English speaker did for his non-native real word (domani). Therefore, our results demonstrate that while "experience" is often taken to refer to the participants' familiarity with the items being studied, in the particular case of audiovisual speech the speaker's experience also plays a major role in how their speech will be perceived. This, along with methodological differences, may help to explain why we failed to replicate the results of Navarra et al. [100]. Our native English observers required a smaller PSS for native speech than for non-native speech, which is the opposite of the result of Navarra et al. Also, when the display portraying the Italian speaker (i.e., non-native language movements for the English listeners) was coupled with the native-language word "tomorrow", the PSS was smaller than when the display portraying the English speaker (i.e.,
native language movements for the listener) was coupled with the non-native word "domani", meaning that the increased tolerance to audiovisual asynchrony was mostly driven by the non-nativeness/unfamiliarity of the auditory information. No differences were observed between the English word and the non-word for either speaker. Experience with the non-word was similar for both speakers, in that neither had experience with the "word" andesker; however, participants reported that andesker sounded like it could be a possible English word. The comparison between the English word and the non-word therefore demonstrates the role of the participants' experience and expectations in their synchrony judgments: when the speakers' ability is similar, the listeners' expectation/experience drives their judgments, making the perception of the non-word similar to that of the English word. Finally, introducing a mismatch between visual and auditory language information causes speech experts to be more tolerant of audiovisual asynchrony (i.e., to produce larger PSSs) than when the language of the visual and auditory information matches. This result is the opposite of what we and others found with music and object-action displays, for which the mismatch between visual and auditory information either reduced the tolerance to audiovisual asynchrony [101] or had no effect on temporal discrimination accuracy [102, 103]. Our findings, however, are similar to those of other speech studies in which a gender mismatch between speaker and produced syllables increased how much the visual information had to lead the auditory for the two to be perceived as simultaneous ([95], see Experiments 1 and 3). Hence, the results of the present study support Vatakis and Spence's [102] hypothesis that the "unity assumption" (i.e., the observer's assumption that two different sensory signals refer to the same multisensory event) may not have the same effect on the multisensory integration of speech and non-speech stimuli. This supports the idea that the effect of the "unity assumption" can be driven by both top-down and bottom-up factors contributing to multisensory integration.

5 Summary

In this chapter we discussed the role of experience in the human ability to integrate sight and sound, and related behavioral results to potential neural mechanisms. In particular, we focused on synchrony perception in the domains of musical and speech expertise. The evidence presented provides clues to some possible mechanisms of multisensory integration that are common across different domains. Our results, taken together with the broader literature, indicate that experience does alter the neural mechanisms underlying heightened abilities to discriminate temporal properties. In the future, we believe that studies using methods other than those involving temporal synchrony will be helpful for understanding differences between these domains. For example, musical training might change the ways in which we weight particular sensory cues, and understanding this weighting could provide important insights into the development of expertise.

References

1. Landy, M.S., et al.: Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Res. 35(3), 389–412 (1995)
2. Ernst, M.O., Banks, M.S.: Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415(6870), 429–433 (2002)
3. Green, C.S., Bavelier, D.: Action-video-game experience alters the spatial resolution of vision. Psychol. Sci. 18(1), 88–94 (2007)
4. Simmons, R.W., Locher, P.J.: Role of extended perceptual experience upon haptic perception of nonrepresentational shapes. Percept. Mot. Skills 48(3 Pt. 1), 987–991 (1979)
5. Kisilevsky, B.S., et al.: Effects of experience on fetal voice recognition. Psychol. Sci. 14(3), 220–224 (2003)
6. Atkins, J.E., Fiser, J., Jacobs, R.A.: Experience-dependent visual cue integration based on consistencies between visual and haptic percepts. Vision Res. 41(4), 449–461 (2001)
7. Jacobs, R.A., Fine, I.: Experience-dependent integration of texture and motion cues to depth. Vision Res. 39(24), 4062–4075 (1999)
8. Powers III, A.R., Hillock, A.R., Wallace, M.T.: Perceptual training narrows the temporal window of multisensory binding. J. Neurosci. 29(39), 12265–12274 (2009)
9. Mamassian, P., Goutcher, R.: Prior knowledge on the illumination position. Cognition 81(1), B1–B9 (2001)
10. Mamassian, P., Landy, M.S.: Interaction of visual prior constraints. Vision Res. 41(20), 2653–2668 (2001)
11. Mondloch, C.J., et al.: Face perception during early infancy. Psychol. Sci. 10(5), 419–422 (1999)
12. Turati, C.: Why faces are not special to newborns: an alternative account of the face preference. Current Directions in Psychological Science 13(1), 5–8 (2004)


13. Hershberger, W.: Attached-shadow orientation perceived as depth by chickens reared in an environment illuminated from below. Journal of Comparative and Physiological Psychology 73(3), 407–411 (1970)
14. Gregory, R.L.: Knowledge in perception and illusion. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences 352(1358), 1121–1127 (1997)
15. Adams, W.J., Graf, E.W., Ernst, M.O.: Experience can change the 'light-from-above' prior. Nat. Neurosci. 7(10), 1057–1058 (2004)
16. Dayan, E., Cohen, L.G.: Neuroplasticity subserving motor skill learning. Neuron 72(3), 443–454 (2011)
17. May, A.: Experience-dependent structural plasticity in the adult human brain. Trends Cogn. Sci. 15(10), 475–482 (2011)
18. Pascual-Leone, A., et al.: The plastic human brain cortex. Annu. Rev. Neurosci. 28, 377–401 (2005)
19. Latinus, M., Crabbe, F., Belin, P.: Learning-induced changes in the cerebral processing of voice identity. Cereb. Cortex 21(12), 2820–2828 (2011)
20. Bentin, S., et al.: Electrophysiological studies of face perception in humans. J. Cogn. Neurosci. 8(6), 551–565 (1996)
21. Rossion, B., et al.: Expertise training with novel objects leads to left-lateralized facelike electrophysiological responses. Psychol. Sci. 13(3), 250–257 (2002)
22. Bukach, C.M., et al.: Does acquisition of Greeble expertise in prosopagnosia rule out a domain-general deficit? Neuropsychologia 50(2), 289–304 (2012)
23. Tanaka, J.W., Curran, T.: A neural basis for expert object recognition. Psychol. Sci. 12(1), 43–47 (2001)
24. Busey, T.A., Vanderkolk, J.R.: Behavioral and electrophysiological evidence for configural processing in fingerprint experts. Vision Res. 45(4), 431–448 (2005)
25. Busey, T.A., Parada, F.J.: The nature of expertise in fingerprint examiners. Psychon. Bull. Rev. 17(2), 155–160 (2010)
26. Calvo-Merino, B., et al.: Action observation and acquired motor skills: an fMRI study with expert dancers. Cereb. Cortex 15(8), 1243–1249 (2005)
27. Calvo-Merino, B., et al.: Seeing or doing? Influence of visual and motor familiarity in action observation. Curr. Biol. 16(19), 1905–1910 (2006)
28. Cross, E.S., Hamilton, A.F., Grafton, S.T.: Building a motor simulation de novo: observation of dance by dancers. Neuroimage 31(3), 1257–1267 (2006)
29. Maguire, E.A., et al.: Navigation-related structural change in the hippocampi of taxi drivers. Proc. Natl. Acad. Sci. USA 97(8), 4398–4403 (2000)
30. Woollett, K., Maguire, E.A.: Acquiring "the Knowledge" of London's layout drives structural brain changes. Curr. Biol. 21(24), 2109–2114 (2011)
31. Hufner, K., et al.: Structural and functional plasticity of the hippocampal formation in professional dancers and slackliners. Hippocampus 21(8), 855–865 (2011)
32. Bezzola, L., et al.: Training-induced neural plasticity in golf novices. J. Neurosci. 31(35), 12444–12448 (2011)
33. Johansen-Berg, H., et al.: Integrity of white matter in the corpus callosum correlates with bimanual co-ordination skills. Neuroimage 36(suppl. 2), T16–T21 (2007)
34. Tuch, D.S., et al.: Choice reaction time performance correlates with diffusion anisotropy in white matter pathways supporting visuospatial attention. Proc. Natl. Acad. Sci. USA 102(34), 12212–12217 (2005)
35. Draganski, B., et al.: Neuroplasticity: changes in grey matter induced by training. Nature 427(6972), 311–312 (2004)
36. Scholz, J., et al.: Training induces changes in white-matter architecture. Nat. Neurosci. 12(11), 1370–1371 (2009)


37. Bengtsson, S.L., et al.: Extensive piano practicing has regionally specific effects on white matter development. Nat. Neurosci. 8(9), 1148–1150 (2005)
38. Bermudez, P., et al.: Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cereb. Cortex 19(7), 1583–1596 (2009)
39. Gaser, C., Schlaug, G.: Brain structures differ between musicians and non-musicians. J. Neurosci. 23(27), 9240–9245 (2003)
40. Hutchinson, S., et al.: Cerebellar volume of musicians. Cereb. Cortex 13(9), 943–949 (2003)
41. Halwani, G.F., et al.: Effects of practice and experience on the arcuate fasciculus: comparing singers, instrumentalists, and non-musicians. Front. Psychol. 2, 156 (2011)
42. Imfeld, A., et al.: White matter plasticity in the corticospinal tract of musicians: a diffusion tensor imaging study. Neuroimage 46(3), 600–607 (2009)
43. Schmithorst, V.J., Wilke, M.: Differences in white matter architecture between musicians and non-musicians: a diffusion tensor imaging study. Neurosci. Lett. 321(1-2), 57–60 (2002)
44. Schlaug, G., et al.: In vivo evidence of structural brain asymmetry in musicians. Science 267(5198), 699–701 (1995)
45. Ozturk, A.H., et al.: Morphometric comparison of the human corpus callosum in professional musicians and non-musicians by using in vivo magnetic resonance imaging. J. Neuroradiol. 29(1), 29–34 (2002)
46. Tervaniemi, M.: Musicians – same or different? Ann. N. Y. Acad. Sci. 1169, 151–156 (2009)
47. Aydin, K., et al.: Quantitative proton MR spectroscopic findings of cortical reorganization in the auditory cortex of musicians. AJNR Am. J. Neuroradiol. 26(1), 128–136 (2005)
48. Elbert, T., et al.: Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307 (1995)
49. Hyde, K.L., et al.: Musical training shapes structural brain development. J. Neurosci. 29(10), 3019–3025 (2009)
50. Hyde, K.L., et al.: The effects of musical training on structural brain development: a longitudinal study. Ann. N. Y. Acad. Sci. 1169, 182–186 (2009)
51. Kraus, N., Chandrasekaran, B.: Music training for the development of auditory skills. Nat. Rev. Neurosci. 11(8), 599–605 (2010)
52. Munte, T.F., Altenmuller, E., Jancke, L.: The musician's brain as a model of neuroplasticity. Nat. Rev. Neurosci. 3(6), 473–478 (2002)
53. Bangert, M., Altenmuller, E.O.: Mapping perception to action in piano practice: a longitudinal DC-EEG study. BMC Neurosci. 4, 26 (2003)
54. Magne, C., Schon, D., Besson, M.: Musician children detect pitch violations in both music and language better than nonmusician children: behavioral and electrophysiological approaches. J. Cogn. Neurosci. 18(2), 199–211 (2006)
55. Milovanov, R., Tervaniemi, M.: The interplay between musical and linguistic aptitudes: a review. Front. Psychol. 2, 321 (2011)
56. Chan, A.S., Ho, Y.C., Cheung, M.C.: Music training improves verbal memory. Nature 396(6707), 128 (1998)
57. Aleman, A., et al.: Music training and mental imagery ability. Neuropsychologia 38(12), 1664–1668 (2000)
58. Brochard, R., Dufour, A., Despres, O.: Effect of musical expertise on visuospatial abilities: evidence from reaction times and mental imagery. Brain Cogn. 54(2), 103–109 (2004)


59. Schmithorst, V.J., Holland, S.K.: The effect of musical training on the neural correlates of math processing: a functional magnetic resonance imaging study in humans. Neurosci. Lett. 354(3), 193–196 (2004)
60. Besson, M., Chobert, J., Marie, C.: Transfer of training between music and speech: common processing, attention, and memory. Front. Psychol. 2, 94 (2011)
61. Patel, A.D.: Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Front. Psychol. 2, 142 (2011)
62. Wallace, M.T., Stein, B.E.: Sensory and multisensory responses in the newborn monkey superior colliculus. J. Neurosci. 21(22), 8886–8894 (2001)
63. Wallace, M.T., Stein, B.E.: Development of multisensory neurons and multisensory integration in cat superior colliculus. J. Neurosci. 17(7), 2429–2444 (1997)
64. Wallace, M.T., Stein, B.E.: Cross-modal synthesis in the midbrain depends on input from cortex. J. Neurophysiol. 71(1), 429–432 (1994)
65. Jiang, W., Jiang, H., Stein, B.E.: Neonatal cortical ablation disrupts multisensory development in superior colliculus. J. Neurophysiol. 95(3), 1380–1396 (2006)
66. Haslinger, B., et al.: Transmodal sensorimotor networks during action observation in professional pianists. J. Cogn. Neurosci. 17(2), 282–293 (2005)
67. Hodges, D.A., Hairston, W.D., Burdette, J.H.: Aspects of multisensory perception: the integration of visual and auditory information in musical experiences. Ann. N. Y. Acad. Sci. 1060, 175–185 (2005)
68. Petrini, K., et al.: Multisensory integration of drumming actions: musical expertise affects perceived audiovisual asynchrony. Exp. Brain Res. 198(2-3), 339–352 (2009)
69. Petrini, K., Russell, M., Pollick, F.: When knowing can replace seeing in audiovisual integration of actions. Cognition 110(3), 432–439 (2009)
70. Petrini, K., Holt, S.P., Pollick, F.: Expertise with multisensory events eliminates the effect of biological motion rotation on audiovisual synchrony perception. J. Vis. 10(5), 2 (2010)
71. Petrini, K., et al.: Action expertise reduces brain activity for audiovisual matching actions: an fMRI study with expert drummers. Neuroimage 56, 1480–1492 (2011)
72. Musacchia, G., et al.: Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc. Natl. Acad. Sci. USA 104(40), 15894–15898 (2007)
73. Schulz, M., Ross, B., Pantev, C.: Evidence for training-induced crossmodal reorganization of cortical functions in trumpet players. Neuroreport 14(1), 157–161 (2003)
74. Lappe, C., et al.: Cortical plasticity induced by short-term unimodal and multimodal musical training. J. Neurosci. 28(39), 9632–9639 (2008)
75. Schlaug, G., et al.: Effects of music training on the child's brain and cognitive development. Ann. N. Y. Acad. Sci. 1060, 219–230 (2005)
76. Spence, C., Squire, S.: Multisensory integration: maintaining the perception of synchrony. Curr. Biol. 13(13), R519–R521 (2003)
77. Arrighi, R., Alais, D., Burr, D.: Perceptual synchrony of audiovisual streams for natural and artificial motion sequences. J. Vis. 6(3), 260–268 (2006)
78. van Wassenhove, V., Grant, K.W., Poeppel, D.: Temporal window of integration in auditory-visual speech perception. Neuropsychologia 45(3), 598–607 (2007)
79. Vatakis, A., Spence, C.: Audiovisual synchrony perception for music, speech, and object actions. Brain Res. 1111(1), 134–142 (2006)
80. Vatakis, A., Spence, C.: Audiovisual synchrony perception for speech and music assessed using a temporal order judgment task. Neurosci. Lett. 393(1), 40–44 (2006)


81. Dixon, N.F., Spitz, L.: The detection of auditory visual desynchrony. Perception 9(6), 719–721 (1980)
82. Lee, H., Noppeney, U.: Long-term music training tunes how the brain temporally binds signals from multiple senses. Proc. Natl. Acad. Sci. USA 108(51), E1441–E1450 (2011)
83. Petrini, K., et al.: The music of your emotions: neural substrates involved in detection of emotional correspondence between auditory and visual music actions. PLoS One 6(4), e19165 (2011)
84. Petrini, K., McAleer, P., Pollick, F.: Audiovisual integration of emotional signals from music improvisation does not depend on temporal correspondence. Brain Res. 1323, 139–148 (2010)
85. Hein, G., et al.: Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. J. Neurosci. 27(30), 7881–7887 (2007)
86. Kim, R.S., Seitz, A.R., Shams, L.: Benefits of stimulus congruency for multisensory facilitation of visual learning. PLoS One 3(1), e1532 (2008)
87. Munhall, K.G., et al.: Temporal constraints on the McGurk effect. Percept. Psychophys. 58(3), 351–362 (1996)
88. Vatakis, A., et al.: Temporal recalibration during asynchronous audiovisual speech perception. Exp. Brain Res. 181(1), 173–181 (2007)
89. Fain, G.L.: Sensory Transduction. Sinauer Associates, Sunderland (2003)
90. King, A.J.: Multisensory integration: strategies for synchronization. Curr. Biol. 15(9), R339–R341 (2005)
91. King, A.J., Palmer, A.R.: Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Exp. Brain Res. 60(3), 492–500 (1985)
92. Waadeland, C.H.: Strategies in empirical studies of swing groove. Musicologia Norvegica 32, 169–191 (2006)
93. Jansson, G., Johansson, G.: Visual perception of bending motion. Perception 2(3), 321–326 (1973)
94. McGurk, H., Macdonald, J.: Hearing lips and seeing voices. Nature 264(5588), 746–748 (1976)
95. Vatakis, A., Spence, C.: Crossmodal binding: evaluating the "unity assumption" using audiovisual speech stimuli. Percept. Psychophys. 69(5), 744–756 (2007)
96. Drake, C., Jones, M.R., Baruch, C.: The development of rhythmic attending in auditory sequences: attunement, referent period, focal attending. Cognition 77(3), 251–288 (2000)
97. Saygin, A.P., Driver, J., de Sa, V.R.: In the footsteps of biological motion and multisensory perception: judgments of audiovisual temporal relations are enhanced for upright walkers. Psychol. Sci. 19(5), 469–475 (2008)
98. Aschersleben, G., Prinz, W.: Synchronizing actions with events: the role of sensory information. Percept. Psychophys. 57(3), 305–317 (1995)
99. Miyake, Y., Onishi, Y., Poppel, E.: Two types of anticipation in synchronization tapping. Acta Neurobiol. Exp. (Wars) 64(3), 415–426 (2004)
100. Navarra, J., et al.: Perception of audiovisual speech synchrony for native and non-native language. Brain Res. 1323, 84–93 (2010)
101. Petrini, K., et al.: Multisensory integration of drumming actions: musical expertise affects perceived audiovisual asynchrony. Exp. Brain Res. 198(2-3), 339–352 (2009)
102. Vatakis, A., Spence, C.: Evaluating the influence of the 'unity assumption' on the temporal perception of realistic audiovisual stimuli. Acta Psychol. (Amst.) 127(1), 12–23 (2008)
103. Vatakis, A., Ghazanfar, A.A., Spence, C.: Facilitation of multisensory integration by the "unity effect" reveals that speech is special. J. Vis. 8(9), 14, 1–11 (2008)