Scandinavian Journal of Psychology, 2009, 50, 437–444
DOI: 10.1111/j.1467-9450.2009.00741.x
Visual and Audiovisual Speech

Role of speechreading in audiovisual interactions during the recovery of speech comprehension in deaf adults with cochlear implants

K. STRELNIKOV,1,2 J. ROUGER,1,2 P. BARONE1,2 and O. DEGUINE1,2,3

1 Université Toulouse, CerCo, Université Paul Sabatier, France
2 CNRS, UMR 5549, Faculté de Médecine de Rangueil, Toulouse, France
3 Service d'Oto-Rhino-Laryngologie, Hôpital Purpan, Toulouse, France
Strelnikov, K., Rouger, J., Barone, P. & Deguine, O. (2009). Role of speechreading in audiovisual interactions during the recovery of speech comprehension in deaf adults with cochlear implants. Scandinavian Journal of Psychology, 50, 437–444.

Speechreading is an important form of communicative activity that improves social adaptation in deaf adults. Cochlear implantation allows interaction between the visual speechreading abilities developed during deafness and the auditory sensory experiences acquired through use of the cochlear implant. Crude auditory information provided by the implant is analyzed in parallel with conjectural information from speechreading, thus creating new profiles of audiovisual integration with implications for brain plasticity. Understanding the peculiarities of change in speechreading after cochlear implantation may improve our understanding of brain plasticity and provide useful information for functional rehabilitation of implanted patients. In this article, we present a generalized review of our recent studies and indicate perspectives for further research in this domain.

Key words: Speechreading, lip-reading, deafness, cochlear implantation, brain plasticity.

Pascal Barone, CNRS-Université Paul Sabatier Toulouse 3, Centre de Recherche Cerveau et Cognition UMR 5549, Faculté de Médecine de Rangueil, 31062 Toulouse CEDEX 9, France. E-mail: [email protected]
GENERAL ASPECTS

Sensorineural hearing loss occurs when various factors damage the hair cells of the inner ear, so that sound waves are no longer converted into the electrical impulses conveyed through the auditory nerve. During the last three decades, cochlear implantation has become the only efficient method of helping patients with profound bilateral sensorineural hearing loss retain speech-based communicative abilities when hearing aids are ineffective (Copeland & Pillsbury, 2004; Deggouj, Gersdorff, Garin, Castelein & Gerard, 2007). Cochlear implants bypass the damaged hair cells and are designed to convert acoustic waves into electric stimulation of the relatively intact auditory nerve. The implant's external microphone and processor convert the acoustic signal into electric pulses that are transferred through the skin, using electromagnetic waves, to the subcutaneous receiver, from which the signal is sent to the electrode array implanted in the cochlea. Thus, the cochlear implant (CI) converts the acoustic signal into a specific pattern of auditory nerve stimulation. This yields meaningful auditory sensations that can lead to the understanding of speech.

The evolution of cochlear implants during the last 30 years has led to considerable success in the functional rehabilitation of deafness (Moller, 2006). Modern cochlear implants allow deaf individuals to understand spoken speech and environmental sounds, and in some cases even to listen to music, although music perception usually remains poor (Drennan & Rubinstein, 2008; Pressnitzer, Bestel & Fraysse, 2005). As CI technology improves and safer surgical techniques are implemented, indications for cochlear implantation are being extended (Deggouj et al., 2007). Standard guidelines have been developed for postlingually deaf adults; however, adults with residual hearing have also been shown to benefit from cochlear implantation (Di Nardo, Cantore, Cianfrone, Melillo, Rigante & Paludetti, 2007), and cochlear implant surgery is now being performed in children aged 1 to 2 years or even younger (Calmels, Saliba, Wanna et al., 2004).
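To make this signal chain concrete, the sketch below illustrates an envelope-based front end loosely inspired by CIS-type coding strategies. It is a schematic only, not any manufacturer's actual processor: the function name, channel count and filter settings are our own assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def electrode_envelopes(speech, fs, n_electrodes=12, f_lo=200.0, f_hi=7000.0):
    """Band-split the waveform and keep one slowly varying envelope per
    electrode; in a CIS-like processor each envelope would be compressed
    and turned into current-pulse amplitudes on the matching electrode."""
    edges = np.geomspace(f_lo, f_hi, n_electrodes + 1)  # log-spaced band edges
    env_lp = butter(2, 160.0, btype="lowpass", fs=fs, output="sos")
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        # envelope: rectify the band-passed signal, then smooth it
        env = sosfiltfilt(env_lp, np.abs(sosfiltfilt(band, speech)))
        envelopes.append(np.log1p(100.0 * np.clip(env, 0.0, None)))  # crude loudness compression
    return np.stack(envelopes)  # shape: (n_electrodes, n_samples)
```

The point to note is that only the slow envelope of each band survives; the fine temporal structure that the next section identifies as missing from CI hearing is discarded at this stage.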
Speech comprehension and cochlear implants

It is important to remember that auditory information sent to the brain from the implant is spectrally degraded (Shannon, Zeng, Kamath, Wygonski & Ekelid, 1995) and lacks some of the fine temporal acoustic structure important for speech comprehension (Friesen, Shannon, Baskent & Wang, 2001; Lorenzi, Gilbert, Carn, Garnier & Moore, 2006). This limitation in the coding strategy used by the implant processor contributes to a period of adjustment during which perceived sounds remain largely indecipherable, and hearing may be poor during the first months after implantation (Tyler, Parkinson, Woodworth, Lowder & Gantz, 1997). Many implant recipients reach an acceptable level of understanding for speech and other environmental sounds only after several months of implant experience ("Criteria of candidacy for unilateral cochlear implantation in postlingually deafened adults. I: Theory and measures of effectiveness," 2004). We have been able to follow the evolution of speech comprehension performance in a longitudinal study with a large cohort of postlingually deaf adult CI users that extended across a period of 8 years following implantation (Rouger, Lagleyre, Fraysse, Deneve, Deguine & Barone, 2007).

Fig. 1. Longitudinal evolution of performance in word comprehension in a large cohort of cochlear implanted patients (n = 97) during a follow-up of over 8 years after implantation. Performance is presented according to the sensory modality of the test: auditory-only (with the implant, green), visual-only (speechreading, blue) or bimodal (audiovisual, red). Note that recovery of auditory comprehension is progressive during the first year post-implantation and reaches a plateau thereafter. Note also that speechreading performance is stable in spite of the auditory recovery. Further, in bimodal conditions, word recognition scores are at a near-optimal level at any period after cochlear implantation (adapted from Rouger et al., 2007). Error bars represent standard deviation.

Figure 1 shows how performance in word comprehension changed in this cohort of cochlear implanted patients over several years. Immediately following implant activation, CI users showed significant improvement in word recognition in the auditory modality, with a performance level of 47% in quiet conditions. This average performance level was much higher than levels obtained with conventional external hearing aids before implantation (mean 10% correct). Auditory performance continued to increase significantly during the subsequent months before reaching a plateau at about seven months after implantation, showing no significant change during the following years. Given this plateau in auditory speech comprehension, even after several years of CI experience, and the degraded auditory signal from the cochlear implant, one can expect the visual modality to play a compensatory role in speech perception in deaf CI patients. Details concerning speechreading of words in this study are discussed in the next section of the article.

In normally-hearing (NH) adults, most speech information can be obtained through the auditory channel, yet visual information can improve speech intelligibility when presented simultaneously with a noisy or degraded acoustic signal (Benoit, Mohamadi & Kandel, 1994; Grant & Braida, 1991; MacLeod & Summerfield, 1987; Ross, Saint-Amour, Leavitt, Javitt & Foxe, 2007; Sumby & Pollack, 1954; Summerfield, 1979). Moreover, visual information is useful for speech detection and intelligibility even when acoustic information is perfectly clear (Arnold & Hill, 2001; Campbell, 2008; Reisberg, McLean & Goldfield, 1987). For example, visual cues decrease the acoustic threshold for speech detection (Grant & Seitz, 2000) and allow better perception of speech presented in an unfamiliar foreign language or
pronounced with a foreign accent. Visual cues also facilitate speech perception when speech is perfectly clear but characterized by a high level of semantic complexity (Kim & Davis, 2003; Reisberg et al., 1987). Though there is some dispute concerning the sources of inter-individual variability in speechreading and the utility of speechreading training (Bernstein, Auer & Tucker, 2001; Summerfield, 1992), several studies, including recent studies from our group, provide evidence that deaf people can attain higher speechreading performance than normally hearing subjects (Bernstein et al., 2001; Bernstein, Demorest & Tucker, 2000; Rouger et al., 2007; Strelnikov, Rouger, Lagleyre, Fraysse, Deguine & Barone, 2009).

Although speechreading and auditory speech perception through a cochlear implant involve very different processes, they share at least one feature: both provide insufficient sensory cues for reliable speech analysis and therefore require predictive and integrative cognitive strategies to compensate for the lack of reliable information in the input (Stenfelt & Rönnberg, 2009). We do not yet know what happens if the brain receives insufficient information from both modalities (auditory and visual), as often occurs with CI patients. One can suppose that CI patients would benefit from cognitive integration of the two types of information for optimal performance. However, there is no intuitive answer to the question of how speechreading abilities may change following cochlear implantation. Speechreading ability could drop off with time, becoming less necessary as auditory processes recover; it could remain at pre-implant levels; or it could become more important, helping to disambiguate the degraded auditory signal provided by the implant. The last two possibilities are of special interest for rehabilitation and for the relevance of incorporating visual cues into auditory training procedures for CI patients. To answer these questions, we carried out a series of studies in our laboratory, which we summarize below before suggesting some directions for future research.
Role of speechreading in speech comprehension in cochlear implanted deaf patients

There is considerable confusion between the terms "lipreading" and "speechreading" in the literature. We adopt the distinction proposed by Summerfield (1992): lipreading is the perception of speech purely by observing the talker's articulatory gestures, while speechreading is the understanding of speech by observing the talker's articulatory, facial and manual gestures. In our studies, however, we consider speechreading without manual gestures.

Speechreading of phonemes after cochlear implantation. In a first set of observations, we tested 33 CI adults with postlingually acquired deafness for their ability to recognize isolated phonemes using only visual information (Rouger, Fraysse, Deguine & Barone, 2008). We found no difference (z = −0.85) between CI subjects and NH controls for speechreading of isolated phonemes (27% correct responses for both NH and CI participants).
Moreover, we did not find any difference (z = −0.14) between CI and NH participants when we analyzed the categorization of phonemes according to place of articulation (i.e., bilabials, dentals, velars) or mode of articulation (i.e., voiceless stops, voiced stops and nasals). Thus, our results showed that CI and NH subjects performed at similar levels on visual-only phonetic identification at three levels of phonetic analysis (correct phoneme identification, global phonetic categorization and detailed phonetic categorization), which makes the findings highly consistent.

Our results were rather unexpected, given that long periods of deafness tend to be accompanied by the acquisition of high proficiency in word speechreading (Rouger et al., 2007; Strelnikov et al., 2009). Note, however, that in our study the stimuli were short, meaningless phonetic structures without semantic cues. In such a context, CI users showed the same visual phonetic abilities as NH subjects. Visual speech recognition in CI users may rely to a great extent on top-down influences involving higher contextual levels of cognitive processing (e.g., lexical, syntactic and semantic levels), as suggested by speechreading studies on deaf and NH subjects (Bernstein et al., 2000; Garstecki & O'Neill, 1980; Lind, Erber & Doyle, 1999; Rubinstein, Cherry, Hecht & Idler, 2000); see also Rönnberg, Samuelsson, Lyxell and Arlinger (1996) for a comparison of different contextual influences. Further, our results were obtained in a specific experimental situation with a closed set of stimuli (the number of possible phonemes in the language). Visual identification of phonemes under experimental conditions may differ from the ecological situation in which a person sees some movements of the face and does not know whether they represent a phoneme or a compound of several phonemes.

Nonetheless, vision plays an important role during audiovisual identification of isolated phonemes: the scores of CI patients were significantly higher in the audiovisual phonetic condition (about 60%) than in the auditory-only condition. They were also higher than in the vision-only condition; that is, CI patients were able to integrate auditory and visual phonemic information. In NH and CI participants, cross-modal interactions between vision and audition can enhance speech intelligibility even at a phonological level (Sams, Manninen, Surakka, Helin & Kättö, 1998; Schwartz, Berthommier & Savariaux, 2004).

To further evaluate the automatic integration of speechreading and acoustic cues during audiovisual speech perception, we performed a study of the McGurk effect in both NH and CI participants (Rouger et al., 2008). In this widely known effect (McGurk & MacDonald, 1976), presenting visual speech simultaneously with an incongruent auditory counterpart can lead to an illusory audiovisual percept. For example, when a clear auditory /ba/ is dubbed with a synchronous visual /ga/, NH subjects often report hearing /da/, even though the auditory part of the stimulus is reliably identified as /ba/ when presented in the auditory-only condition. Thus, the visual portion of the incongruent stimuli strongly affects the perceived place of articulation (e.g., anterior or posterior) because of its high visibility, so that during audiovisual integration of incongruent McGurk
stimuli, auditory cues for place of articulation can be dominated by visual cues for place of articulation (Brancazio, 2004; McGurk & MacDonald, 1976).

In our study of the integration of speechreading and acoustic cues, auditory dentals (/ada/, /ata/, /ana/) were dubbed with a visual bilabial (/ama/). We found that 82% of NH participants' responses integrated articulatory components of the incongruent auditory and visual stimuli to form hybrid percepts containing auditory and visual features of place of articulation (e.g., a visual bilabial (/ama/) with an auditory dental (/ata/) elicited /apta/ responses). Only 10% of NH participants' percepts were purely auditory (e.g., /ata/), and 7% corresponded to purely visual percepts (e.g., /apa/). The CI users' responses were strongly driven by the visual modality: more than 98% of CI users reported purely visual percepts (e.g., bilabial answers /apa/, /aba/ and /ama/), with the perceived place of articulation of their answers corresponding to the place of articulation of the visual part of the ambiguous stimuli. Desai, Stickney and Zeng (2008) obtained similar results in a comparable study, which additionally showed that the strength of the visual influence depended on the duration of CI use. Their results might be related to the development of a functional synergy between the visual and auditory modalities, as will be discussed later (see also Giraud, Price, Graham, Truy & Frackowiak, 2001b).

In general, the analyses of the McGurk percepts show that, for these speech tokens at least, in both NH and CI participants nasality and voicing are deduced from the auditory part of the stimuli, while place of articulation is integrated from auditory and visual information. However, regarding the perceived place of articulation for incongruent audiovisual stimuli, we observed that CI users tended to place more weight on visual cues than on auditory cues, whereas NH participants tended to balance the weight of cues coming from both modalities. In conclusion, because of ambiguity in the stimuli and uncertainty in the auditory signal, CI participants' perceptual decisions appear to be dominated by vision, their most reliable sensory channel. This reliance on the visual modality is consistent with reports that CI users develop strong visual speechreading skills during the pre-implant period of deafness and maintain them for several years after cochlear implantation, despite progressive recovery of auditory function (Rouger et al., 2008). NH people, who can rely on good speech information from the auditory modality, show dominance of that modality, often ignoring visual cues. The finding that CI patients show greater visual influence than NH participants when auditory and visual tokens are incongruent suggests that when the auditory speech signal is (imperfectly) delivered by the implant, speechreading remains an important influence in speech processing.

Speechreading of words after cochlear implantation. Research with humans and animals has shown that loss of a given sensory modality may lead to compensatory mechanisms and increased reliance on the remaining modalities (Bavelier, Dye & Hauser, 2006; Bavelier & Neville, 2002; Putzar, Goerendt, Lange, Rosler & Roder, 2007; Rauschecker, 1991; Roder, Teder-Salejarvi,
Sterr, Rosler, Hillyard & Neville, 1999). In the case of profound deafness, the acquisition of speechreading skills is one of the sensory substitution strategies developed by deaf patients in order to recover a degree of speech comprehension (Grant, Walden & Seitz, 1998; Kaiser, Kirk, Lachs & Pisoni, 2003; Summerfield, 1992; Tyler et al., 1997). However, in CI patients there are contradictory reports about the way this compensatory skill evolves following the recovery of auditory function. In some cases a progressive increase in speechreading ability was reported during the first years after implantation (Giraud, Price, Graham & Frackowiak, 2001a; Giraud et al., 2001b), whereas others have reported only mild improvement (Bergeson, Pisoni & Davis, 2005) or no change at all (Gray, Quinn, Court, Vanat & Baguley, 1995).

To address this issue, we performed a longitudinal study of speechreading performance in a population of about 100 adults, encompassing post-implantation times of up to 8 years (Rouger et al., 2007) (see Fig. 1 for the visual modality, in blue). These were postlingually deafened subjects (mean age 56 years, range 19–82) who received a CI after profound deafness (defined as a hearing loss of ≥ 90 dB) of diverse etiologies (meningitis, chronic otitis, otosclerosis, neurinoma) and durations (mean 22 years, range 1–57). All CI patients were recipients of a Nucleus (Cochlear) implant (CI-22 or CI-24) and used a range of different sound-coding strategies. The set of isolated words used for stimulation in this study differed across testing times and presentation modalities to exclude practice effects. Words were chosen randomly without replacement from the Fournier French speech therapist list.

As previously reported (Gray et al., 1995), at the time the CI was switched on, word speechreading performance in CI patients was much higher (35%) than that of NH participants (9%) (p < 0.05, unpaired t-test). These scores were collected while patients had almost no auditory experience with their implant, so they mainly reflect word speechreading abilities in deaf patients without a CI; indeed, they are not significantly different from the pre-implantation scores (p = 0.62, paired t-test). The most important outcome of this quantitative analysis was that CI users' speechreading abilities remained unchanged across all post-implantation periods tested (i.e., greater than 35% even after several years), even though they had reached their maximal auditory performance (see Fig. 1). We interpreted these results as evidence that CI users developed mechanisms of bisensory integration as a strategy to maintain high levels of speech recognition in noisy auditory environments, CI patients being highly susceptible to interference from noise (Fu, Shannon & Wang, 1998; Munson & Nelson, 2005). These results, obtained from a large population of deaf CI patients, were sufficiently robust to support preservation of speechreading performance in CI patients despite the recovery of auditory speech comprehension after several years of CI experience.

Interestingly, at the time of CI activation, visual speech perception scores did not correlate with auditory speech perception scores (r = 0.01, p = 0.76). This may suggest that the two
scores reflect different processing subskills rather than a common speech-processing ability. However, auditory scores were very poor, which may have lowered the possible correlation.

This conclusion was reconsidered when the gender of subjects was taken into account (Strelnikov et al., 2009). In NH participants, even though overall performance was low (less than 10% for word recognition), women significantly outperformed men at word speechreading (12% vs. 7%, respectively). A similar female superiority was observed in deaf patients at the time of cochlear implantation. Interestingly, this superiority of women disappeared in the late period after implantation (24–36 months) and was not observed during subsequent years, when male and female CI patients reached similar levels of recovery of visual speech perception. Thus, male CI patients show a progressive increase in speechreading performance, a phenomenon not observed in women, who had already reached their maximal level of performance before cochlear implantation. This suggests different processes and strategies for speech perception recovery in the two populations. Though our talker was female, and there was therefore greater congruence between talker and observer for women than for men, we believe this is unlikely to be the cause of the observed gender difference in learning.

We propose that the improvement for males derived from the involvement of multisensory integration in unisensory perceptual learning. In support of this, it has been shown that brain reorganization in CI users is strongly related to audio-visual coupling (Doucet, Bergeron, Lassonde, Ferron & Lepore, 2006). Furthermore, recent psychophysical studies have shown that audio-visual training can increase the rate of learning and perceptual performance in the auditory or visual modality alone (Frassinetti, Bolognini, Bottari, Bonora & Ladavas, 2005; Lippert, Logothetis & Kayser, 2007; Seitz, Kim & Shams, 2006), though these studies did not use speech material. We hypothesize that in deaf male patients the progressive recovery of auditory and audio-visual speech comprehension following cochlear implantation provides strong positive feedback that consolidates the decoding of visual cues with speech information, resulting in an increase in speechreading performance. Recent fMRI work has shown that NH women use a more extended network than men for speechreading (Ruytjens, Albers, van Dijk, Wit & Willemsen, 2006), which may explain their better initial performance. However, for this very reason, the potential for further recruitment of more widespread neural resources during speechreading learning is smaller in women than in men. This could lead to the greater training effects in men and the eventual equal performance of men and women in experienced CI users.

Speechreading and predictive strategies. Our studies provide evidence of a clear distinction between the visual perception of words and of phonemes in CI patients: for words, CI patients outperform NH adults, whereas for phonemes they show no advantage. This difference may be related to the different ecological and adaptive values of these types of stimuli, as well as to the different levels of speech integration they require (cf. Auer, 2009).
A phoneme embedded in a word may generate predictions about the preceding or following phonemes according to the word's representation in linguistic memory. Such context-based cues, which underlie the integrative analysis of speech, are much weaker when isolated phonemes are perceived. Because word speechreading improves after cochlear implantation for men but phoneme speechreading does not, this difference may confirm the increased importance of predictive and integrative strategies for speech processing in CI patients. The augmented role of integrative strategies in CI users may exist not only in comparison with NH controls but also in comparison with deaf patients before implantation. Our studies thus show the importance of distinguishing between levels of speech processing during speechreading.

Beyond the phonetic and lexical levels discussed so far, the next processing level concerns speechreading of connected speech, including simple sentences as well as sentences with complex syntactic structures and prosodic features. One can predict that in situations of richer linguistic context, like those provided by connected speech, CI patients will make increased use of predictive and integrative strategies. For example, it would be interesting to systematically compare the perception of isolated words with that of words embedded in sentences. Stimuli could be separate sentences as well as sentences forming a coherent story about familiar situations. Within this word-sentence-story sequence of conditions, the importance of context increases progressively; an increase in speechreading scores can therefore be expected. Non-verbal (Garstecki & O'Neill, 1980) and verbal (Lind et al., 1999) situational contexts, as well as predictive strategies (Rubinstein et al., 2000), have been shown to improve speechreading performance. Speechreading of words has been shown to be more effective for words in sentences than for isolated words, although no such facilitation was observed for phonemes (Bernstein et al., 2000).

The approaches described above, which allow manipulation of different levels of speech integration during speechreading, can be implemented not only in behavioral but also in neuroimaging studies. According to a recent model of continuous speech processing (Strelnikov, 2008), the more prediction and integration are used for speech processing, the greater the expected involvement of right-hemispheric structures, especially the posterior frontal and posterior temporal areas of the right hemisphere. Extensive activations in the posterior parts of the superior and middle temporal regions and in the posterior parts of the middle and inferior frontal regions have been reported bilaterally for speechreading of isolated words, both for congenitally deaf subjects and for hearing controls (Capek, MacSweeney, Woll et al., 2008).
Role of audiovisual interactions in speech comprehension in cochlear implanted deaf patients

A large body of psychophysical studies has demonstrated that simultaneous polysensory stimulation improves perception by reducing ambiguity. Benefits gained from multisensory processing can affect a range of different measures, including reaction
times, detection rates, accuracy of stimulus identification, and learning effects on stimulus processing (Stein & Meredith, 1993). Multisensory effects for speech stimuli are also well known (Bernstein, Auer & Takayanagi, 2004), and studies have shown large improvements in speech comprehension when the speaker can be seen, especially in noisy environments (Ross et al., 2007; Sumby & Pollack, 1954).

As expected, given the classical perceptual benefit derived from multisensory integration, we have been able to show that the same large cohort of deaf patients achieved higher performance in audiovisual conditions than in auditory-alone conditions (Rouger et al., 2007; see also Bergeson et al., 2005; Grant & Seitz, 2000; Grant, Walden & Seitz, 1998; Gray et al., 1995). This effect was observed at each period of testing, before and after cochlear implantation (Fig. 1). When the implant was activated, CI patients showed a high level of word recognition in bimodal conditions (86% correct). From that time, audiovisual recognition improved slightly with practice, with CI users reaching near-perfect performance levels (94%) as early as the second month post-implantation. Thus, considering that CI patients maintained the speechreading abilities acquired during the period of deafness while reaching a high level of auditory recognition, we showed that speech intelligibility in CI users was greatly improved under audiovisual conditions, especially during the first months post-implantation, when auditory performance had not yet reached maximal recovery.

We hypothesized that this high visual aptitude might induce in CI users an improvement of multisensory integration mechanisms, leading to greater audiovisual benefits than those observed in NH subjects. To address this issue, we compared the audiovisual benefits shown by CI patients during a word recognition task to those obtained in naïve NH subjects exposed to the same stimuli degraded using a noise-band vocoder paradigm (Shannon, Zeng & Wygonski, 1998). In these stimuli, which simulate the processing strategy of cochlear implants, fine temporal cues within each spectral component are removed while global temporal and spectral acoustic information is preserved. This specific processing largely affects speech comprehension (Desai et al., 2008). This CI simulation protocol allowed us to make direct comparisons of audiovisual performance at equivalent ranges of non-optimal auditory performance for both NH and CI groups.

When the visual-auditory performance of CI patients was compared to that of the NH participants exposed to vocoded speech, the audiovisual gain in CI patients was higher than that observed in NH subjects, especially in conditions of low auditory performance (Fig. 2). A normalization of the bimodal benefit with respect to auditory performance showed an audiovisual gain in CI patients nearly twice as high as that observed in NH participants.

Fig. 2. Relationship between word recognition performance in auditory (x axis) and audiovisual (y axis) conditions in NH subjects (NHS) with vocoder simulation (green) and CI patients (blue). The population of CI users lies clearly above the NH group, suggesting that patients benefit more from sensory fusion. The lower graph confirms this: the normalized bimodal gain (expressed as [(VA − A)/(100 − A)]) is much higher in CI patients whatever their auditory level of performance (adapted from Rouger et al., 2007). Asterisks indicate statistically significant differences in audiovisual benefits between CI users and NH subjects (p < 0.05, unpaired t-test). Error bars represent standard deviation.
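For readers who want to experiment with this paradigm, the sketch below shows a minimal noise-band vocoder following the general scheme of Shannon et al. (1998), reusing the same filterbank idea as the earlier sketch, together with the normalized bimodal gain from Fig. 2. The function names, channel count and filter settings are our own illustrative assumptions, not the exact parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noiseband_vocode(speech, fs, n_channels=4, f_lo=100.0, f_hi=7000.0, env_cut=50.0):
    """Noise-band vocoder: keep each band's slow envelope but replace its
    fine temporal structure with band-limited noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    env_lp = butter(4, env_cut, btype="lowpass", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(speech.size)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = sosfiltfilt(env_lp, np.abs(sosfiltfilt(band, speech)))    # band envelope
        carrier = sosfiltfilt(band, rng.standard_normal(speech.size))  # noise in the same band
        out += np.clip(env, 0.0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-12)  # peak-normalize

def normalized_av_gain(av, a):
    """Normalized bimodal gain (VA - A) / (100 - A) from Fig. 2 (scores in percent)."""
    return (av - a) / (100.0 - a)
```

With few channels the vocoded output is intelligible only with effort, which is what places naïve NH listeners in the low-auditory-performance range where the CI advantage in bimodal gain is largest.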
Finally, to determine whether the audiovisual performance of CI users was due to higher visual skill or to a better capacity for integrating visual and auditory information, we developed an optimal-integration model. In this model, the spatio- and spectro-temporal auditory and visual cues are combined across modalities in such a way that the amount of information required for correct word recognition is minimized. We found that the model fitted the performance of CI patients very well, indicating that they integrate visual and auditory inputs very efficiently, in a nearly optimal way. By contrast, the bisensory performance of NH participants tested with CI simulations was far below the model's performance levels. Altogether, this suggests that the performance of CI patients is due not only to higher efficiency in speechreading but also to the development of specific audiovisual skills that allow them to match the visual speech information provided by lip and face movements with the impoverished auditory information (Rouger et al., 2007).
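The optimal-integration model is described here only at a conceptual level. As one deliberately simplified illustration of what near-optimal integration means, the toy example below fuses hypothetical per-word evidence from a noisy auditory channel and a speechreading channel under an assumption of conditional independence; all probabilities are made up for the example.

```python
import numpy as np

# Hypothetical per-word evidence for a three-word set: a degraded,
# CI-like auditory channel and a speechreading channel, each ambiguous
# on its own (each vector sums to 1 over the candidate words).
p_auditory = np.array([0.50, 0.30, 0.20])
p_visual = np.array([0.45, 0.10, 0.45])
prior = np.array([1.0, 1.0, 1.0]) / 3

# Optimal fusion under conditional independence of the two channels:
# P(word | A, V) is proportional to P(word | A) * P(word | V) / P(word).
fused = p_auditory * p_visual / prior
fused /= fused.sum()
print(np.round(fused, 3))  # [0.652 0.087 0.261]: sharper than either channel alone
```

CI users' bimodal scores sat close to such an optimal predictor, whereas NH listeners tested with vocoded speech fell well below it.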
PERSPECTIVES

There is now a large body of work showing cross-modal compensation in profoundly deaf subjects at the cortical level (Bavelier et al., 2006). For example, it has been demonstrated that the auditory cortex in deaf patients may participate in the processing of visual information that is linguistic or non-linguistic in nature (Capek
et al., 2008; Finney, Clementz, Hickok & Dobkins, 2003; Finney, Fine & Dobkins, 2001; Nishimura, Hashikawa, Doi et al., 1999; Petitto, Zatorre, Gauna, Nikelski, Dostie & Evans, 2000). This form of brain plasticity after deprivation of one sensory modality may be explained by changes in the efficiency of existing heteromodal connections that directly link areas involved in different sensory modalities (Cappe & Barone, 2005; Falchier, Clavagnier, Barone & Kennedy, 2002; Rockland & Ojima, 2003).

Despite the increasing use of cochlear implants, little is known about the changes in cortical networks involved in sound processing after cochlear implantation (Giraud, Truy & Frackowiak, 2001c). Neuroimaging studies with PET have reported progressive activation of auditory areas and other cortical regions involved in speech processing after cochlear implantation (Green, Julyan, Hastings & Ramsden, 2005; Ito, Momose, Oku et al., 2004; Mortensen, Mirz & Gjedde, 2006; Nishimura, Doi, Iwaki et al., 2000; Wong, Miyamoto, Pisoni, Sehgal & Hutchins, 1999). However, as reviewed above, the use of the auditory neuroprosthesis in postlingually deaf CI patients involves long-term adaptive processes to build coherent percepts from the coarse information delivered by the implant. Consequently, postlingually deaf CI patients present different levels of activation in cortical areas involved in semantic and/or phonological speech processing (Giraud et al., 2001a; Giraud, Truy, Frackowiak, Gregoire, Pujol & Collet, 2000). Further, the cross-modal compensation observed at the behavioral level is accompanied by plastic changes in the visual and auditory networks involved in visual and audiovisual speech integration (Giraud et al., 2001b; Lagleyre, Rouger, Laborde et al., 2006; Lee, Giraud, Kang et al., 2007).

Lastly, our behavioral data in deaf CI patients suggest a synergy between vision and audition during the first years after implantation, leading to an increase of performance in each single modality, at least for men, whose speechreading abilities immediately post-implant were poorer than those of women (Strelnikov et al., 2009). Such a synergistic phenomenon can also be observed at the cortical level, as expressed by a progressive increase in the activity of visual and auditory areas in response to auditory words following the recovery of speech comprehension (Giraud et al., 2001b). Consequently, we suggest that the activation of cortical areas involved in auditory speech processing by visual cues could explain the increased responsiveness of these auditory areas to auditory speech stimuli.

These results are of importance from a clinical standpoint, since they provide a theoretical framework for clinicians and speech therapists, suggesting the development of appropriate therapeutic strategies favoring speech rehabilitation focused on visual and audiovisual interactions through speechreading. We strongly believe that rehabilitation strategies built on visual and audiovisual training will improve and hasten the recovery of auditory speech comprehension.

This work was supported by ACI Neurosciences Intégratives et Computationnelles (to OD, PB, JR), Fondation pour la Recherche Médicale (to
JR), Fédération pour la Recherche sur le Cerveau (to PB), ANR Hearing Loss, ANR-06-Neuro-021-01 (OD, PB), CNRS Atip+ program (KS, PB) and Fondation de l'Avenir (OD). The authors would also like to thank C. Marlot for help with the bibliography, M.-L. Laborde and S. Lagleyre for their help with the data for the manuscript, and the anonymous reviewers for their helpful comments and suggestions.
REFERENCES

Arnold, P. & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology, 92(2), 339–355.
Auer, E. T., Jr (2009). Spoken word recognition by eye. Scandinavian Journal of Psychology, 50, 419–425.
Bavelier, D., Dye, M. W. & Hauser, P. C. (2006). Do deaf individuals see better? Trends in Cognitive Sciences, 10(11), 512–518.
Bavelier, D. & Neville, H. J. (2002). Cross-modal plasticity: Where and how? Nature Reviews Neuroscience, 3(6), 443–452.
Benoit, C., Mohamadi, T. & Kandel, S. (1994). Effects of phonetic context on audio-visual intelligibility of French. Journal of Speech and Hearing Research, 37(5), 1195–1203.
Bergeson, T. R., Pisoni, D. B. & Davis, R. A. (2005). Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants. Ear and Hearing, 26(2), 149–164.
Bernstein, L. E., Auer, E. T., Jr & Takayanagi, S. (2004). Auditory speech detection in noise enhanced by lipreading. Speech Communication, 44(1–4), 5–18.
Bernstein, L. E., Auer, E. T., Jr & Tucker, P. E. (2001). Enhanced speechreading in deaf adults: Can short-term training/practice close the gap for hearing adults? Journal of Speech, Language, and Hearing Research, 44(1), 5–18.
Bernstein, L. E., Demorest, M. E. & Tucker, P. E. (2000). Speech perception without hearing. Perception & Psychophysics, 62(2), 233–252.
Brancazio, L. (2004). Lexical influences in audiovisual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 30(3), 445–463.
Calmels, M. N., Saliba, I., Wanna, G., Cochard, N., Fillaux, J., Deguine, O., et al. (2004). Speech perception and speech intelligibility in children after cochlear implantation. International Journal of Pediatric Otorhinolaryngology, 68(3), 347–351.
Campbell, R. (2008). The processing of audio-visual speech: Empirical and neural bases. Philosophical Transactions of the Royal Society of London, 363(1493), 1001–1010.
Capek, C. M., MacSweeney, M., Woll, B., Waters, D., McGuire, P. K., David, A. S., et al. (2008). Cortical circuits for silent speechreading in deaf and hearing people. Neuropsychologia, 46(5), 1233–1241.
Cappe, C. & Barone, P. (2005). Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. The European Journal of Neuroscience, 22(11), 2886–2902.
Copeland, B. J. & Pillsbury, H. C. (2004). Cochlear implantation for the treatment of deafness. Annual Review of Medicine, 55, 157–167.
Criteria of candidacy for unilateral cochlear implantation in postlingually deafened adults. I: Theory and measures of effectiveness. (2004). Ear and Hearing, 25(4), 310–335.
Deggouj, N., Gersdorff, M., Garin, P., Castelein, S. & Gerard, J. M. (2007). Today's indications for cochlear implantation. Acta Oto-Rhino-Laryngologica Belgica, 3(1), 9–14.
Desai, S., Stickney, G. & Zeng, F. G. (2008). Auditory-visual speech perception in normal-hearing and cochlear-implant listeners. The Journal of the Acoustical Society of America, 123(1), 428–440.
Di Nardo, W., Cantore, I., Cianfrone, F., Melillo, P., Rigante, M. & Paludetti, G. (2007). Residual hearing thresholds in cochlear implantation and reimplantation. Audiology & Neuro-otology, 12(3), 165–169.
Doucet, M. E., Bergeron, F., Lassonde, M., Ferron, P. & Lepore, F. (2006). Cross-modal reorganization and speech perception in cochlear implant users. Brain, 129(Pt 12), 3376–3383.
Drennan, W. R. & Rubinstein, J. T. (2008). Music perception in cochlear implant users and its relationship with psychophysical capabilities. Journal of Rehabilitation Research and Development, 45(5), 779–789.
Falchier, A., Clavagnier, S., Barone, P. & Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience, 22(13), 5749–5759.
Finney, E. M., Clementz, B. A., Hickok, G. & Dobkins, K. R. (2003). Visual stimuli activate auditory cortex in deaf subjects: Evidence from MEG. Neuroreport, 14(11), 1425–1427.
Finney, E. M., Fine, I. & Dobkins, K. R. (2001). Visual stimuli activate auditory cortex in the deaf. Nature Neuroscience, 4(12), 1171–1173.
Frassinetti, F., Bolognini, N., Bottari, D., Bonora, A. & Ladavas, E. (2005). Audiovisual integration in patients with visual deficit. Journal of Cognitive Neuroscience, 17(9), 1442–1452.
Friesen, L. M., Shannon, R. V., Baskent, D. & Wang, X. (2001). Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. The Journal of the Acoustical Society of America, 110(2), 1150–1163.
Fu, Q. J., Shannon, R. V. & Wang, X. (1998). Effects of noise and spectral resolution on vowel and consonant recognition: Acoustic and electric hearing. The Journal of the Acoustical Society of America, 104(6), 3586–3596.
Garstecki, D. C. & O'Neill, J. J. (1980). Situational cue and strategy influence on speechreading. Scandinavian Audiology, 9(3), 147–151.
Giraud, A. L., Price, C. J., Graham, J. M. & Frackowiak, R. S. (2001a). Functional plasticity of language-related brain areas after cochlear implantation. Brain, 124(Pt 7), 1307–1316.
Giraud, A. L., Price, C. J., Graham, J. M., Truy, E. & Frackowiak, R. S. (2001b). Cross-modal plasticity underpins language recovery after cochlear implantation. Neuron, 30(3), 657–663.
Giraud, A. L., Truy, E. & Frackowiak, R. (2001c). Imaging plasticity in cochlear implant patients. Audiology & Neuro-otology, 6(6), 381–393.
Giraud, A. L., Truy, E., Frackowiak, R. S., Gregoire, M. C., Pujol, J. F. & Collet, L. (2000). Differential recruitment of the speech processing system in healthy subjects and rehabilitated cochlear implant patients. Brain, 123(Pt 7), 1391–1402.
Grant, K. W. & Braida, L. D. (1991). Evaluating the articulation index for auditory-visual input. The Journal of the Acoustical Society of America, 89(6), 2952–2960.
Grant, K. W. & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. The Journal of the Acoustical Society of America, 108(3 Pt 1), 1197–1208.
Grant, K. W., Walden, B. E. & Seitz, P. F. (1998). Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration. The Journal of the Acoustical Society of America, 103(5 Pt 1), 2677–2690.
Gray, R. F., Quinn, S. J., Court, I., Vanat, Z. & Baguley, D. M. (1995). Patient performance over eighteen months with the Ineraid intracochlear implant. The Annals of Otology, Rhinology & Laryngology, 166, 275–277.
Green, K. M., Julyan, P. J., Hastings, D. L. & Ramsden, R. T. (2005). Auditory cortical activation and speech perception in cochlear implant users: Effects of implant experience and duration of deafness. Hearing Research, 205(1–2), 184–192.
Ito, K., Momose, T., Oku, S., Ishimoto, S., Yamasoba, T., Sugasawa, M., et al. (2004). Cortical activation shortly after cochlear implantation. Audiology & Neuro-otology, 9(5), 282–293.
Kaiser, A. R., Kirk, K. I., Lachs, L. & Pisoni, D. B. (2003). Talker and lexical effects on audiovisual word recognition by adults with cochlear implants. Journal of Speech, Language and Hearing Research, 46(2), 390–404.
Kim, J. & Davis, C. (2003). Hearing foreign voices: Does knowing what is said affect visual-masked-speech detection? Perception, 32(1), 111–120.
Lagleyre, S., Rouger, J., Laborde, M. L., Demonet, J. F., Fraysse, B., Deguine, O., et al. (2006). Role of visuo-auditory integration in speech comprehension in deaf subjects with cochlear implants. Paper presented at the 2nd Meeting of the European Societies of Neuropsychology.
Lee, H. J., Giraud, A. L., Kang, E., Oh, S. H., Kang, H., Kim, C. S., et al. (2007). Cortical activity at rest predicts cochlear implantation outcome. Cerebral Cortex, 17(4), 909–917.
Lind, C., Erber, N. P. & Doyle, J. (1999). Effects of related and unrelated questions on the speechreading of sentences. Journal of the Academy of Rehabilitative Audiology, 32, 61–83.
Lippert, M., Logothetis, N. K. & Kayser, C. (2007). Improvement of visual contrast detection by a simultaneous sound. Brain Research, 1173, 102–109.
Lorenzi, C., Gilbert, G., Carn, H., Garnier, S. & Moore, B. C. (2006). Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proceedings of the National Academy of Sciences of the United States of America, 103(49), 18866–18869.
MacLeod, A. & Summerfield, Q. (1987). Quantifying the contribution of vision to speech perception in noise. British Journal of Audiology, 21(2), 131–141.
McGurk, H. & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5588), 746–748.
Moller, A. R. (2006). History of cochlear implants and auditory brainstem implants. Advances in Oto-Rhino-Laryngology, 64, 1–10.
Mortensen, M. V., Mirz, F. & Gjedde, A. (2006). Restored speech comprehension linked to activity in left inferior prefrontal and right temporal cortices in postlingual deafness. Neuroimage, 31(2), 842–852.
Munson, B. & Nelson, P. B. (2005). Phonetic identification in quiet and in noise by listeners with cochlear implants. The Journal of the Acoustical Society of America, 118(4), 2607–2617.
Nishimura, H., Doi, K., Iwaki, T., Hashikawa, K., Oku, N., Teratani, T., et al. (2000). Neural plasticity detected in short- and long-term cochlear implant users using PET. Neuroreport, 11(4), 811–815.
Nishimura, H., Hashikawa, K., Doi, K., Iwaki, T., Watanabe, Y., Kusuoka, H., et al. (1999). Sign language "heard" in the auditory cortex. Nature, 397(6715), 116.
Petitto, L. A., Zatorre, R. J., Gauna, K., Nikelski, E. J., Dostie, D. & Evans, A. C. (2000). Speech-like cerebral activity in profoundly deaf people processing signed languages: Implications for the neural basis of human language. Proceedings of the National Academy of Sciences of the United States of America, 97(25), 13961–13966.
Pressnitzer, D., Bestel, J. & Fraysse, B. (2005). Music to electric ears: Pitch and timbre perception by cochlear implant patients. Annals of the New York Academy of Sciences, 1060, 343–345.
Putzar, L., Goerendt, I., Lange, K., Rosler, F. & Roder, B. (2007). Early visual deprivation impairs multisensory interactions in humans. Nature Neuroscience, 10(10), 1243–1245.
Rauschecker, J. P. (1991). Mechanisms of visual plasticity: Hebb synapses, NMDA receptors, and beyond. Physiological Reviews, 71(2), 587–615.
Reisberg, D., McLean, J. & Goldfield, A. (1987). Easy to hear but hard to understand. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lipreading (pp. 97–114). Hillsdale, NJ: Erlbaum.
Rockland, K. S. & Ojima, H. (2003). Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology, 50(1–2), 19–26.
Roder, B., Teder-Salejarvi, W., Sterr, A., Rosler, F., Hillyard, S. A. & Neville, H. J. (1999). Improved auditory spatial tuning in blind humans. Nature, 400(6740), 162–166.
Rönnberg, J., Samuelsson, S., Lyxell, B. & Arlinger, S. (1996). Lipreading with auditory low-frequency information: Contextual constraints. Scandinavian Audiology, 25(2), 127–132.
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C. & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17(5), 1147–1153.
Rouger, J., Fraysse, B., Deguine, O. & Barone, P. (2008). McGurk effects in cochlear-implanted deaf subjects. Brain Research, 1188, 87–99.
Rouger, J., Lagleyre, S., Fraysse, B., Deneve, S., Deguine, O. & Barone, P. (2007). Evidence that cochlear-implanted deaf patients are better multisensory integrators. Proceedings of the National Academy of Sciences of the United States of America, 104(17), 7295–7300.
Rubinstein, A., Cherry, R., Hecht, P. & Idler, C. (2000). Anticipatory strategy training: Implications for the postlingually hearing-impaired adult. Journal of the American Academy of Audiology, 11(1), 52–55.
Ruytjens, L., Albers, F., van Dijk, P., Wit, H. & Willemsen, A. (2006). Neural responses to silent lipreading in normal hearing male and female subjects. The European Journal of Neuroscience, 24(6), 1835–1844.
Sams, M., Manninen, P., Surakka, V., Helin, P. & Kättö, R. (1998). McGurk effect in Finnish syllables, isolated words, and words in sentences: Effects of word meaning and sentence context. Speech Communication, 26(1–2), 75–87.
Schwartz, J. L., Berthommier, F. & Savariaux, C. (2004). Seeing to hear better: Evidence for early audio-visual interactions in speech identification. Cognition, 93(2), B69–B78.
Seitz, A. R., Kim, R. & Shams, L. (2006). Sound facilitates visual learning. Current Biology, 16(14), 1422–1427.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J. & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270(5234), 303–304.
Shannon, R. V., Zeng, F. G. & Wygonski, J. (1998). Speech recognition with altered spectral distribution of envelope cues. The Journal of the Acoustical Society of America, 104(4), 2467–2476.
Stein, B. E. & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.
Stenfelt, S. & Rönnberg, J. (2009). The Signal-Cognition interface: Interactions between degraded auditory signals and cognitive processes. Scandinavian Journal of Psychology, 50, 385–393.
Strelnikov, K. (2008). Activation-verification in continuous speech processing: Interaction of cognitive strategies as a possible theoretical approach. Journal of Neurolinguistics, 21, 1–17.
Strelnikov, K., Rouger, J., Lagleyre, S., Fraysse, B., Deguine, O. & Barone, P. (2009). Improvement in speech-reading ability by auditory training: Evidence from gender differences in normally hearing, deaf and cochlear implanted subjects. Neuropsychologia, 47(4), 972–979.
Sumby, W. H. & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26, 212–215.
Summerfield, Q. (1979). Use of visual information for phonetic perception. Phonetica, 36(4–5), 314–331.
Summerfield, Q. (1992). Lipreading and audio-visual speech perception. Philosophical Transactions of the Royal Society of London, 335(1273), 71–78.
Tyler, R. S., Parkinson, A. J., Woodworth, G. G., Lowder, M. W. & Gantz, B. J. (1997). Performance over time of adult patients using the Ineraid or Nucleus cochlear implant. The Journal of the Acoustical Society of America, 102(1), 508–522.
Wong, D., Miyamoto, R. T., Pisoni, D. B., Sehgal, M. & Hutchins, G. D. (1999). PET imaging of cochlear-implant and normal-hearing subjects listening to speech and nonspeech. Hearing Research, 132(1–2), 34–42.

Received 22 April 2009, accepted 30 April 2009