Cognitive Correlates of Visual Speech Understanding in Hearing-Impaired Individuals

Cognitive Correlates of Visual Speech Understanding in Hearing-Impaired Individuals

Ulf Andersson, Björn Lyxell, and Jerker Rönnberg, Linköping University and Örebro University; Karl-Erik Spens, Royal Institute of Technology

This study examined the extent to which different measures of speechreading performance correlated with particular cognitive abilities in a population of hearing-impaired people. Although the three speechreading tasks (isolated word identification, sentence comprehension, and text tracking) were highly intercorrelated, they tapped different cognitive skills. In this population, younger participants were better speechreaders, and, when age was taken into account, speech tracking correlated primarily with (written) lexical decision speed. In contrast, speechreading for sentence comprehension correlated most strongly with performance on a phonological processing task (written pseudohomophone detection) but also on a span measure that may have utilized visual, nonverbal memory for letters. We discuss the implications of this pattern.

Journal of Deaf Studies and Deaf Education 6:2 Spring 2001

This work is supported by grants from the Swedish Council for Social Research awarded to Björn Lyxell (97-0319) and grants awarded to Jerker Rönnberg (30305108). We thank Ulla-Britt Persson for checking the language. Correspondence should be sent to Ulf Andersson, Department of Behavioural Sciences, Linköping University, S-581 83 Linköping, Sweden (e-mail: [email protected]). © 2001 Oxford University Press

Individuals who rely on speechreading because of hearing impairment communicate to different degrees of success by means of speechreading and speech tracking (Arnold, 1997; Demorest & Bernstein, 1992; Dodd & Campbell, 1987). Much of this variation can be accounted for by individual differences in specific cognitive skills (Arnold & Köpsel, 1996; Conrad, 1979; Gailey, 1987; see Rönnberg, 1995, for a review). However, many of these findings might be valid only for sentence-based speechreading, as this task has usually been employed as the measure of visual speech understanding (e.g., Arnold & Köpsel, 1996; see Rönnberg, 1995). Because the tasks differ in contextual constraints (e.g., speech tracking vs. sentence-based speechreading; see De Filippo & Scott, 1978, and Demorest, Bernstein, & DeHaven, 1996) and linguistic complexity (e.g., visual word-decoding vs. sentence-based speechreading; Lyxell & Rönnberg, 1991a), it seems plausible that different measures of speechreading engage different sets of cognitive skills.

In this study we will re-analyze data from the Rönnberg, Andersson, Lyxell, and Spens (1998) speech tracking training study. Speech tracking refers to a procedure in which a talker reads a text, sentence by sentence, and the speechreader's task is to repeat each sentence verbatim (De Filippo & Scott, 1978). The Rönnberg et al. study examined the relationship between cognitive skills and visual and visual-tactile speech tracking. It found that tasks of visual word-decoding, lexical decision speed, and phonological processing speed accounted for a substantial portion of the individual differences in visual and visual-tactile speech tracking performance. That study focused on speech tracking only, whereas our study used several tasks of visual speech understanding and included some additional cognitive measures.

Previous research has identified four cognitive correlates of context-bound sentence-based speechreading (see Rönnberg, 1995, for review). In addition to the three skills of visual word-decoding, lexical identification speed, and phonological processing that also correlated with speech tracking, verbal inference-making ability correlated strongly with the sentence comprehension task (De Filippo, 1982; Lyxell & Rönnberg, 1987, 1989, 1991a, 1992; Lyxell, Rönnberg, & Samuelsson, 1994; Öhngren, 1992). Why should these particular aspects of cognition be involved in these speechreading tasks? As lip movements and facial gestures create a speech signal that is incomplete and poorly specified, as well as transient (Berger, 1972; Dodd, 1977; Rönnberg, 1990), rapid access to a lexical address for identification (i.e., lexical identification speed) is an important cognitive operation (Lyxell, 1989; Lyxell & Rönnberg, 1992; Rönnberg, 1990). Slowed lexical identification would impede processing of new incoming visual information, as well as divert resources from other important processes (e.g., inference making), with negative consequences for speech understanding (Lyxell, 1989). Similarly, especially when "watching for meaning," as in the context-bound sentence-speechreading task, the speechreader must try to infer missing pieces of verbal information, using stored knowledge (i.e., verbal inference-making ability; Lyxell & Rönnberg, 1987, 1989). Phonological processing includes a number of cognitive operations in which phonological aspects of language are processed, represented, and used during spoken or seen language (i.e., speechreading and reading; Cutler, 1989; Marslen-Wilson, 1987; Rönnberg, 1995; Share & Stanovich, 1995). Does the strength of correlations of these skills vary with the requirements of the different speechreading tasks? Rönnberg (1995) found that lexical identification speed and phonological processing speed were relatively more important in relation to visual sentence-based speechreading and visual word-decoding than to visual speech tracking. Phonological representations were, on the other hand, relatively more important in relation to visual speech tracking.
A recent study by Lidestam, Lyxell, and Andersson (1999) has also shown that the ability to speechread long sentences, compared to short sentences, is more strongly associated with the individual's working memory capacity. Thus, previous studies indicate that different tasks of visual speechreading relate in different degrees to particular cognitive abilities. To examine this further, our study used three measures of speech tracking and two speechreading tasks: sentence-based speechreading and visual word-decoding (Rönnberg et al., 1999). Concerning speech tracking, some of these tasks have been used frequently in previous studies (cf. Lyxell, 1989), whereas some are relatively new (cf. Spens, 1995). The optimum wpm rate and the blockage index constitute the two new measures of speech tracking, in addition to the conventional wpm rate measure. The rationale for including these newer measures is that they assess contrasting aspects of speech tracking: the optimum wpm rate provides an estimate of a speech process that runs efficiently and smoothly (i.e., no blockages), whereas the blockage index provides an estimate of a speech process that breaks down.

Speech tracking and sentence-based speechreading differ in the degree and nature of contextual constraint. In speech tracking (De Filippo & Scott, 1978) a coherent story is presented, sentence by sentence, to the individual; thus, context works cumulatively during the tracking task. In contrast, during sentence-based speechreading, as used here, context is provided by a title describing the scenario to which the sentence belongs. Finally, a relatively context-free and less linguistically complex speechreading task is speechreading of single words (a visual word-decoding test). All three are used in this study.

The cognitive tests in this study are the same written-material tasks as in Rönnberg, Andersson, Lyxell, et al. (1998). The rationale for presenting the test material visually is, besides the obvious reason of not testing hearing-impaired individuals auditorily, that these tasks all tap central, amodal cognitive functions common to the processing of both spoken and seen language/speech (Rönnberg, Andersson, Andersson, et al., 1998). Lexical identification speed was examined by means of a lexical decision test (cf. Lyxell & Rönnberg, 1992). Three rhyme-judgment tasks were employed to assess phonological processing ability (Campbell, 1992; Hanson & McGarr, 1989; Leybaert & Charlier, 1996; Lyxell et al., 1996).

To further examine the nature of the relationship between phonological processing, speech tracking, and speechreading, we also included a phonological lexical decision test and a letter span test (cf. Baddeley & Wilson, 1985). The letter span test was included specifically to examine the phonological loop component of working memory (i.e., phonological coding and articulatory-phonological rehearsal; see Baddeley, 1997, for a review). We accomplished this by manipulating the phonological similarity of the stimulus material used in a short-term memory task (cf. Baddeley, 1966; Conrad & Hull, 1964). We administered a sentence completion task to obtain a measure of verbal inference making (i.e., the ability to fill in missing pieces of information; Lyxell, 1989). General working memory capacity (i.e., the ability to simultaneously process and store information) was tested by means of a reading span task (Daneman & Carpenter, 1980; Towse, Hitch, & Hutton, 1998). We used an analogy test and an antonym test as two indices of verbal ability (Lyxell et al., 1994).

In sum, our study examines whether performance in different visual speech understanding tasks, varying in linguistic complexity and contextual constraint, is related to a single set of cognitive abilities or whether some tasks are more strongly associated with a specific cognitive skill or skills. Based on previous research, we think it is reasonable to assume that lexical identification speed and phonological processing speed will be related to performance in almost all visual speech understanding tasks (Lyxell, 1989; Lyxell & Rönnberg, 1992; Lyxell et al., 1994; Rönnberg, 1990; Öhngren, 1992). Linguistically complex speechreading tasks (i.e., sentence-based speechreading and speech tracking), rather than speechreading of single words, should show this relationship especially strongly, because linguistically complex tasks require faster throughput of larger amounts of speech signal than does discrete word identification. As contextual support has a positive effect on speechreading performance (Samuelsson & Rönnberg, 1991, 1993), it could also reduce some of the cognitive demands that the individual must manage during visual speech understanding. Thus, speech tracking performance may be more weakly associated with lexical identification speed and phonological processing speed than sentence-based speechreading (Rönnberg, 1995).

Method

Participants

Eighteen severely hearing-impaired individuals (one male), ages 21-76 years (M = 53 years, SD = 16), participated in this study. Of these, 14 had participated in the Rönnberg, Andersson, Lyxell, et al. (1998) training study. Characteristics for each participant are displayed in Table 1. The mean duration of the participants' hearing impairment was 31 years (SD = 14), and the pure tone average hearing loss was 75 dB (SD = 16) for the better ear according to the most recent available medical records. All participants were native speakers of Swedish and preferred an oral communication mode.

General Procedure and Tests

All participants were tested individually during two separate test sessions, one week apart. The cognitive tests and the speechreading tests were administered in the first session and the speech tracking test in the second session. All speechreading tests were presented on a Finlux 26″ color TV and a video cassette recorder (JVC HR-7700EG). The cognitive tests were run on a Macintosh SE 30 computer with computer-controlled display and collection of results (TIPS [Text-Information-Processing-System]; Ausmeel, 1988). All test instructions were given in writing and complemented with further oral instructions.

Lexical Identification Test

Lexical decision speed. The task was to decide whether a string of three letters constituted a real word. One hundred items were used as test material, 50 of which were monosyllabic real words (e.g., "snö," SNOW) and 50 were not. Of the 50 lures, 25 were pronounceable (e.g., "GAR") and 25 unpronounceable (e.g., "NCI"). The letter string was displayed in the center of the screen for a maximum of 2 sec, and the intertrial interval was 2 sec. The participants responded by button press; latency was measured from onset of the stimulus display to the button press, or until the maximum response time had expired. Accuracy and speed of performance were measured.

Table 1 Participant characteristics

| Participant | Age (yrs) | Age of onset (yrs) | Years with hearing aid | Years with hearing loss | Three-frequency average, 500/1000/2000 Hz (dB) |
|---|---|---|---|---|---|
| 1 | 41 | 31 | 10 | 10 | 83 |
| 2 | 52 | 36 | 8 | 16 | 70 |
| 3 | 61 | 1 | 19 | 61 | 57 |
| 4 | 61 | 34 | 18 | 27 | 88 |
| 5 | 76 | 61 | 9 | 15 | 93 |
| 6 | 40 | 7 | 12 | 33 | 93 |
| 7 | 72 | 32 | 25 | 40 | 85 |
| 8 | 39 | 3 | 36 | 36 | 60 |
| 9 | 21 | 3 | 17 | 18 | 57 |
| 10 | 57 | 7 | 34 | 50 | 75 |
| 11 | 64 | 39 | 22 | 25 | 77 |
| 12 | 68 | 33 | 32 | 35 | 73 |
| 13 | 45 | 1 | 40 | 44 | 87 |
| 14 | 37 | 1 | 15 | 37 | 70 |
| 15 | 51 | 36 | 14 | 15 | 113 |
| 16 | 45 | 31 | 14 | 14 | 65 |
| 17 | 60 | 15 | 37 | 46 | 53 |
| 18 | 60 | 29 | 11 | 31 | 58 |
| Mean | 53 | 22 | 21 | 31 | 75 |
| SD | 14 | 18 | 11 | 14 | 16 |

Phonological Processing

The rhyme judgment tests. Three rhyme judgment tests were administered. The task was to decide whether two simultaneously presented words or nonwords rhymed. The procedure for presentation of the items and response collection was the same as for the lexical decision speed task, except that the maximum response time was set at 5 sec. The first test list was composed of 50 pairs of monosyllabic and bisyllabic Swedish words. The word pairs were of four different types: 12 orthographically similar rhyming word pairs (e.g., "meter-eter," METRE-ETHER), 12 orthographically dissimilar rhyming word pairs (e.g., "kurs-dusch," COURSE-SHOWER), 13 orthographically similar nonrhyming word pairs (e.g., "klubba-stubbe," STICK-STUMP), and 13 orthographically dissimilar nonrhyming word pairs (e.g., "cykel-päron," BICYCLE-PEAR). In the second test, the material consisted of 50 pairs of bisyllabic words, and in each pair there was one real word and one nonword (e.g., "citron-mirol," LEMON-MIROL). The third test contained 30 monosyllabic pairs of nonwords (e.g., PRET-BLET).

Phonological lexical decision test. The participants were instructed to decide, as quickly as possible, whether a string of letters (a nonword) sounded like a real word when read silently. Fifty pseudohomophones and 50 nonwords (all pronounceable) were employed as test material. One item at a time was displayed on the computer screen, and the participants responded "yes" or "no" by pressing predefined buttons. The maximum response time was set at 5 sec, and the string disappeared when participants pressed the button. Accuracy and speed of performance were measured.

Working Memory Capacity

Reading span test. The participant's task was to read sequences (3, 4, 5, or 6 sentences) of three-word sentences, judge whether each sentence was semantically sensible or absurd, and respond by saying "yes" or "no." At the end of each sequence, the participants were instructed to recall orally, in correct serial order, the first or the final word of each sentence. The words in each sentence were presented sequentially at a rate of one word per 0.8 sec, with an interword interval of 0.075 sec. The intersentence interval within each sequence was 1.75 sec, during which time the participants had to respond "yes" or "no." Half of the sentences were absurd (e.g., "Fisken körde bilen," THE FISH DROVE THE CAR), and half were normal sentences (e.g., "Kaninen var snabb," THE RABBIT WAS FAST). Three different sequences were presented for each span size. The response interval was set at 80 sec; however, no participant needed more than 30 sec to respond. The experimenter started the next sequence of sentences by pushing a button. The participants' responses were scored by the experimenter in terms of the total number of recalled words.

Letter span. The test material consisted of two different lists of letters, one of phonologically similar letters (B, C, D, E, G, P, T, V) and one of phonologically dissimilar letters (F, H, J, K, M, N, R, S). The presentation order of the two lists was counterbalanced across participants. The letters were presented at a rate of one letter per 0.8 sec with an interitem interval of 0.075 sec. Each list contained 12 sequences of letters (three sequences for each span size), with span sizes ranging from four to seven letters. The participants were asked to recall the letters orally in correct serial order. As no participant needed more than 30 sec to respond (the maximum was 120 sec), presentation of the next series of letters started after the experimenter pushed a button. The participants' responses were scored in terms of the total number of letters recalled in correct serial order.
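The serial-order scoring used for the span tests (one point per item recalled in its presented position, summed over sequences) can be sketched as follows. This is an illustrative reconstruction, not the authors' scoring software; the function name and the example sequences are invented.

```python
def score_serial_recall(presented, recalled):
    """One point for each item recalled in its correct serial position."""
    return sum(1 for pos, item in enumerate(recalled)
               if pos < len(presented) and item == presented[pos])

# A participant's letter span score is summed over all 12 sequences;
# two invented trials for illustration:
trials = [
    (("F", "H", "J", "K"), ("F", "H", "K", "J")),            # 2 in position
    (("M", "N", "R", "S", "F"), ("M", "N", "R", "S", "F")),  # all 5 correct
]
total = sum(score_serial_recall(p, r) for p, r in trials)  # total == 7
```

The same positional rule applies to the reading span test, with the to-be-recalled sentence words in place of letters.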

Verbal Inference Making

Sentence completion test. Twenty-eight incomplete sentences constituted the test material (e.g., "Kan jag . . . ett par byxor?," MAY I . . . PAIR OF TROUSERS?). From each sentence, two to four words were omitted. The sentences were 4 to 13 words long, and all words were common Swedish words (Allén, 1970). Half the sentences were related to a restaurant scenario and the other half to a shop scenario. The task was to complete each sentence by filling in the missing words. The incomplete sentence was displayed on the computer screen for 7 sec; immediately afterwards, a 30-sec response interval started. The participants completed the sentence orally, and the experimenter transcribed the responses, scoring them as correct if they were semantically and syntactically appropriate. The number of correct words was divided by the maximum number of deleted words in each sentence to generate a sentence-based score, and the mean over the 28 sentences was the dependent variable.
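The scoring rule just described (correct restorations divided by deleted words per sentence, then averaged) amounts to the sketch below. The counts are invented for illustration, and the real test used 28 sentences rather than three.

```python
def sentence_completion_score(correct_counts, deleted_counts):
    """Mean per-sentence proportion of appropriately restored words.

    correct_counts[i] -- appropriate words produced for sentence i
    deleted_counts[i] -- number of words omitted from sentence i (2-4)
    """
    per_sentence = [c / d for c, d in zip(correct_counts, deleted_counts)]
    return sum(per_sentence) / len(per_sentence)

# Three hypothetical sentences with 2, 4, and 3 deleted words:
score = sentence_completion_score([2, 1, 3], [2, 4, 3])
# per-sentence proportions 1.0, .25, 1.0 -> mean = .75
```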

Verbal Ability

Analogy test. This was a paper-and-pencil test in which the participants were required to judge which two words out of five were related to each other in a way analogous to two target words (e.g., "SUGAR-SWEET: SUN, DAY, WHITE, NIGHT, DARK"). The test included 27 five-word strings and was performed under time pressure (maximum 5 min 30 sec). The proportion of correct responses was recorded.

Antonym test. This was also a paper-and-pencil test. It consisted of 29 five-word strings, and the participants' task was to decide which two words out of five were antonyms (e.g., "BEAUTIFUL, OLD, SAD, FAST, YOUNG"). The participants had a maximum of 5 min to perform the test, and the scoring procedure was the same as for the analogy test.

Tactile Aids

One single-channel aid and one multichannel aid were used in this study.

MiniVib 3 (AB Specialinstrument, Stockholm). The single-channel aid was the MiniVib 3 device. This aid extracts the acoustic signal between 500 and 1800 Hz, but its time/intensity variations are presented to the user at a fixed 230 Hz frequency. The tactile signal is transmitted to the user via a bone conductor attached to the user's wrist by a wristband.

Tact aid 7. The Tact aid 7 device was the multichannel aid used in this study. It was equipped with seven vibrators. Of its seven channels, four covered the first formant frequencies and four covered the second formant frequencies; thus, one channel was shared by the first and second formant analyzers. The seven vibrators were attached to a specially designed hand prototype, which the participant held in his or her right hand. Five of the vibrators stimulated the fingertips, and the two remaining vibrators stimulated the palm. (See the Tact aid 7 user's manual [1991] for technical details.)

Visual Speech Understanding

Sentence-based speechreading test. The participants' task was to speechread sentences with and without tactile support. Eighteen sentences, subdivided into three separate scenarios of six sentences each, were used as test material. The scenarios were a "train scenario," a "shop scenario," and a "restaurant scenario." Within each scenario, the participants speechread two sentences in each of the three speechreading conditions. The sentences were presented by a female native speaker of Swedish. The participants wrote their responses on an answer sheet. The order of the aids employed and the presentation order of the sentences were counterbalanced. Performance was measured as the proportion of words correctly perceived.

Visual word-decoding test. In this test the participants' task was to speechread single words presented by a female actor. The test material included 30 common Swedish bisyllabic nouns. Otherwise, the same test routine was used as in the sentence-based speechreading test. The proportion of correct responses was measured.

Speech tracking test. A computerized speech tracking procedure (De Filippo & Scott, 1978; Gnosspelius & Spens, 1992) was employed in this test. The test material was a simplified and easy-to-read book by Per-Anders Fogelström, Mina drömmars stad ("The town of my dreams"). The following test routine was used: the experimenter read a text from a computer screen, sentence by sentence, and the participants were instructed to speechread and repeat as many words as possible following each sentence presentation. Words the participant was unable to perceive were repeated orally twice and, if necessary, after approximately 4-5 sec, finally presented orthographically on an electronic display. All participants performed the task with their hearing aids turned off. The participants speechread for 10 min (2 × 5 min) in each of the three test conditions, and the test order was counterbalanced across participants.

Three measures of speech tracking performance were employed in this study:

1. Conventional wpm rate: the total number of words speechread during the test session divided by the time elapsed, giving a words-per-minute rate calculated automatically by the computer.
2. Optimum wpm rate: the corresponding rate based on error-free sentences only, that is, sentences in which all words were correctly perceived after the first presentation; also calculated automatically by the computer program.
3. Blockages: words that the participants could not correctly perceive after the first presentation constituted blockages and were registered automatically by the computer program; the total number of blockages encountered during tracking constituted an additional measure of speech tracking performance.
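Given a per-sentence log, the three measures can be computed as sketched below. This is our reconstruction, not the original tracking program (Gnosspelius & Spens, 1992); in particular, the log format is invented, and the assumption that the optimum rate is words per minute within error-free sentences only is ours.

```python
def tracking_measures(log):
    """Compute conventional wpm, optimum wpm, and total blockages.

    `log` holds one (n_words, seconds_elapsed, n_blocked_words) tuple
    per presented sentence; a sentence counts as error-free when none
    of its words was blocked on the first presentation.
    """
    total_words = sum(w for w, _, _ in log)
    total_secs = sum(s for _, s, _ in log)
    conventional_wpm = 60 * total_words / total_secs

    # Optimum rate: restricted to error-free sentences, so it is at
    # least as high as the conventional rate (cf. the Table 2 means).
    error_free = [(w, s) for w, s, b in log if b == 0]
    optimum_wpm = 60 * sum(w for w, _ in error_free) / sum(s for _, s in error_free)

    blockages = sum(b for _, _, b in log)
    return conventional_wpm, optimum_wpm, blockages

# Example: three sentences, the second containing two blocked words.
conv, opt, blocks = tracking_measures([(10, 15, 0), (8, 30, 2), (12, 18, 0)])
# conv is about 28.6 wpm, opt = 40.0 wpm, blocks == 2
```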

Results

The results are presented in two sections: the first examines the intercorrelations among the five tasks of visual speech understanding, and the second the relationship between the cognitive tests and the speech understanding tasks.

Different Measures of Visual Speech Understanding

Descriptive statistics for and intercorrelations among the five tasks of visual speech understanding, pooled over the three conditions to reduce the number of variables (i.e., MiniVib 3, Tact aid 7, and visual), are shown in Table 2. The conventional speech tracking rate and the sentence-based speechreading test correlated with almost all the other measures, indicating that they constitute composite indexes of visual speech understanding. Our sample is relatively heterogeneous with respect to background variables such as chronological age, age of onset of hearing loss, years with hearing loss, number of years with hearing aid, and level of dB loss (see Table 1), and it is thus possible that these variables affect the correlational patterns (cf. Dancer, Krain, Thompson, Davis, et al., 1994; Lyxell & Rönnberg, 1991b; Rönnberg, 1990; Shoop & Binnie, 1979). Correlations were therefore calculated between these variables and visual speech understanding. Although previous studies have failed to obtain any effect of some of these variables (number of years with hearing loss and dB loss; Rönnberg, 1990), they were included to validate previous results. The outcome of these analyses showed that chronological age was the only variable significantly related to visual speech understanding (see Table 2; Lyxell & Rönnberg, 1991b; Rönnberg, 1990). To control for the effect of age, we then computed partial correlations among the five tasks of visual speech understanding. As can be seen in Table 3, controlling for age reduced the magnitude of the correlation coefficients in general, but all significant correlation coefficients remained significant.

Table 2 Descriptive statistics for and correlations among different measures of visual speech understanding pooled over the three conditions and chronological age

| Measure | M (SD) | Wpm conv. | Wpm opt. | Blockages | Sent.-based | Word decod. |
|---|---|---|---|---|---|---|
| Chron. age (yrs) | 53 (14) | −.50* | −.51* | .31 | −.51* | −.57* |
| Speech tracking: Wpm conv. | 34.92 (17.72) | | .94** | −.79** | .72** | .76** |
| Speech tracking: Wpm opt. | 45.11 (17.97) | | | −.62** | .63** | .67** |
| Speech tracking: Blockages | 41.76 (16.54) | | | | −.81** | −.79** |
| Speechreading: Sent.-based | .34 (.29) | | | | | .87** |
| Speechreading: Word decod. | .35 (.23) | | | | | |

*p < .05. **p < .01.
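The age-controlled coefficients reported here are first-order partial correlations: for measures x and y with control variable z, r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). A minimal sketch of that computation, with invented data (the study would apply it over the 18 participants' scores, with z their chronological ages):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))
```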

Table 3 Partial correlations among different measures of visual speech understanding pooled over the three conditions, holding chronological age constant

| Measure | Wpm opt. | Blockages | Sent.-based | Word decoding |
|---|---|---|---|---|
| Wpm conv. rate | .92** | −.77** | .62** | .69** |
| Wpm opt. rate | | −.57* | .47* | .53* |
| Blockages | | | −.82** | −.79** |
| Sent.-based | | | | .81** |

*p < .05. **p < .01.

Cognitive Correlates of the Visual Speech Understanding Tasks

The purpose of this study was to identify possible cognitive correlates of visual speech understanding. To accomplish this aim, we computed correlations between the cognitive tests and the visual speech understanding tests. As the correlational patterns for the three different speechreading conditions (i.e., MiniVib 3, Tact aid 7, and visual) did not deviate from each other in any systematic way, the three conditions were pooled to obtain one composite measure for each of the five speechreading tasks. Descriptive statistics for the cognitive tests and the results of the correlation analyses are displayed in Table 4. Lexical identification speed, phonological lexical decision speed, and rhyme judgment speed all showed

systematic and significant correlations with the speech understanding tests, whereas the tests of working memory, verbal inference making, and verbal ability did not correlate significantly with the visual speech tests. Additional correlation analyses were calculated between the accuracy scores on the reaction time tests and the speech tests; these analyses did not reveal any significant correlations. One interesting feature of the correlational pattern was that the blockage index did not show any systematic relation with the cognitive tests. Furthermore, the speed measure of the phonological lexical decision test was significantly correlated only with the sentence-based speechreading test and optimum speech tracking rate.

Table 4 Descriptive statistics, expressed as proportion and speed, for the cognitive tests used in this study, and correlations between the cognitive tests and the different measures of visual speech understanding, pooled over the three conditions

| Cognitive task | M (SD) | Wpm | Wpm opt. | Blockages | Sent.-based | Word decoding |
|---|---|---|---|---|---|---|
| Working memory: Reading span | .41 (.10) | −.10 | −.05 | −.08 | .05 | .09 |
| Letter span: phonologically dissimilar | .34 (.20) | −.24 | .08 | .08 | .04 | −.04 |
| Letter span: phonologically similar | .21 (.20) | .32 | .38 | −.20 | .11 | .29 |
| Verbal ability: Antonym test | .46 (.21) | .06 | .22 | −.19 | .24 | .10 |
| Verbal ability: Analogy test | .43 (.27) | .16 | .29 | −.29 | .46 | .43 |
| Inference making: Sentence completion | .71 (.11) | .06 | .16 | −.31 | .27 | .30 |
| Lexical decision speed | .77 (.15) | −.58* | −.66** | .38 | −.61** | −.54* |
| Phonological lexical decision | 1.98 (.61) | −.41 | −.50* | .43 | −.61** | −.34 |
| Rhyme judgment: word pairs | 1.40 (.41) | −.47* | −.63* | .27 | −.51* | −.49* |
| Rhyme judgment: word/nonword pairs | 1.54 (.48) | −.53* | −.65** | .39 | −.62** | −.57* |
| Rhyme judgment: nonword pairs | 1.59 (.51) | −.52* | −.62** | .43 | −.68** | −.55* |

*p < .05. **p < .01.

As chronological age affects both cognitive functions (see Birren & Fisher, 1995; Salthouse, 1985, for reviews) and visual speech understanding, partial correlations were calculated to examine whether significant correlations would remain when the effect of age was partialled out (see Table 5). The rhyme judgment tests (speed) continued to be significantly associated with optimum speech tracking rate and sentence-based speechreading, but not with the other measures of visual speech understanding. The correlations between the speed measures of the lexical decision and phonological lexical decision tests also remained significant for optimum speech tracking rate and sentence-based speechreading, respectively. The most important aspect of the partial correlational pattern was that the letter span task (i.e., phonologically similar letters) was now significantly correlated with almost all visual speech understanding tests. To examine the contribution of phonological processing in these correlations, we calculated a phonological similarity effect index (PSE) by subtracting performance in the phonologically similar condition from the phonologically dissimilar condition. This new variable (X = .23,

SD ⫽ .32, min ⫽ ⫺.34, max ⫽ .78) constitutes a measure of relative phonological processing (i.e., coding) in verbal working memory (Baddeley, 1966; 1997; Baddeley & Wilson, 1985; Conrad & Hull, 1964). The correlations between the PSE and the speech tasks when the effect of age was partialled out are presented in Table 5. Significant negative correlations were now evident with conventional and optimum speech tracking rate. That is, relative resistance to the PSE (low scores on this index) were associated with higher speechreading performance levels and vice versa. The PSE, on the other hand, was significantly and positively correlated with performance on the letter span task condition including phonologically dissimilar letters (r ⫽ .54, p ⬍ .05) and also showed a tendency to a positive and significant correlation with the reading span task (r ⫽ .46, p ⬍ .06). These correlations are consistent with the assumption that phonological coding is particularly efficient for temporal order recall of verbal information (Hanson, 1982; 1990; Logan, Maybery, & Fletcher, 1996), and as such the correlations also validate the assumption that the phonological similarity effect vari-

Visual Speech Understanding and Cognition

111

Table 5 Partial correlations among different measures of visual speech understanding, pooled over the three conditions, and cognitive tests, controlling for chronological age Speechtracking Cognitive tasks

Wpm

Wpm opt.

Reading span Letter span Phonologically dissimilar Phonologically similar Phonological similarity effect (dissimilar-similar) Antonym test Analogy test Sentence-completion Lexical decision speed Phonological lexical decision Rhyme-judgment Word-pairs Pairs of words/nonwords Pairs of nonwords

⫺.40

⫺.35

Speechreading Sentencebased

Word decoding

⫺.20

⫺.19

⫺.08 .07 ⫺.04 .58* .65** ⫺.36 ⫺.61** ⫺.52* .29

.27 .47* ⫺.15

.20 .63** ⫺.39

Blockages .06

.03 ⫺.05 ⫺.05 ⫺.38 ⫺.26

.21 .11 .07 ⫺.50* ⫺.38

⫺.18 ⫺.17 ⫺.26 .24 .31

.23 .32 .21 ⫺.41 ⫺.52*

.07 .26 .24 ⫺.24 ⫺.13

⫺.27 ⫺.39 ⫺.36

⫺.48* ⫺.54* ⫺.48*

.13 .30 .33

⫺.32 ⫺.50* ⫺.57*

⫺.25 ⫺.43 ⫺.37

*p ⬍ .05. **p ⬍ .01.

able taps phonological processing in working memory (i.e., coding). Finally, to examine whether the letter span test and the rhyme judgment tests tap similar aspects of phonological processing, we calculated intercorrelations. These correlation coefficients were, however, nonsignificant (⫺.02 ⬍ r ⬍ ⫺.22). Discussion The purpose of this study was to examine how cognitive abilities relate to different measures of visual speech understanding. The results demonstrate that the pattern of cognitive association varies with the different tests of visual speech. Before partialling for age, lexical identification speed proved, once again (cf. Lyxell, 1989; Lyxell & ¨ hngren, 1992), to be strongly related Ro¨nnberg, 1992; O to almost all tests of visual speech understanding, as well as phonological processing speed (i.e., rhyme judgment and phonological lexical decision). In con¨ hngren, trast to previous studies (Lyxell et al., 1994; O 1992), the accuracy measures of the rhyme judgment and phonological lexical decision tests were not associ-

ated with visual speech understanding. In line with previous studies, however, chronological age was associated with visual speech understanding (Lyxell & Rönnberg, 1991b; Rönnberg, 1990; Shoop & Binnie, 1979). There was no relationship between the blockage index of visual tracking fluency and any of the cognitive tests. This finding is in sharp contrast to the patterns of correlations for the other measures of visual speech understanding. Moreover, considering the strong correlation between the blockage index and sentence-based speechreading, one would expect a similar pattern to arise. This negative finding suggests that individual differences in blockages during speech tracking are not related to the cognitive tasks used in this study, whereas the index of a speech process that runs efficiently and smoothly (i.e., optimum wpm rate) is well accounted for by these cognitive functions. To account for breakdowns in the processing of visual speech input that result in blockages, other measures of cognitive and perceptual functions may be required. As the word-decoding tests were related to the blockage index, the early stages of visual speech understanding (e.g., visual feature extraction) may be implicated here (cf. Gailey, 1987; Lyxell & Rönnberg,


Journal of Deaf Studies and Deaf Education 6:2 Spring 2001

1991a). In line with this, the lack of relationship with the cognitive tests is logical, as the tasks used in this study all tap central amodal cognitive functions (cf. Rönnberg, Andersson, Andersson, et al., 1998) that come into play at a later stage of the processing of the speech signal. Thus, when blockages occur, the flow of speech information is cut off or considerably reduced, and these central amodal cognitive functions are not required because there is no information, or only a small amount, left for processing. The suggestion that providing contextual information during visual speech understanding may reduce some of the cognitive demands imposed by the task was not strongly supported. Lexical identification speed and phonological processing speed were relatively stronger correlates of sentence-based speechreading than of conventional speech tracking rate; however, these two cognitive skills were also strongly correlated with the optimum speech tracking rate. The provision of contextual information may not be responsible for the different strengths of association between sentence-based speechreading and conventional speech tracking rate. Rather, it may reflect the fact that conventional speech tracking rate is a composite measure that includes, among other things, blockages, which, as we have seen, are uncorrelated with any cognitive test. Thus, the inclusion of blockages reduces the strength of the associations compared to optimum speech tracking rate and sentence-based speechreading. The results of the partial correlations corroborate this line of reasoning, as the sentence-based speechreading test and the optimum rate measure were the only measures that retained significant correlations when the contribution of chronological age was eliminated. All significant correlations with the word-decoding test disappeared after partialling out age, except for that with the letter span task. 
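The computations behind these analyses are simple and can be made concrete. The following is a minimal sketch with synthetic data and hypothetical variable names, not the study's actual procedure: the tracking measures in the real study follow De Filippo and Scott (1978) and Gnosspelius and Spens (1992), and the 10-second blockage criterion used here is an assumption for illustration only.

```python
import numpy as np

def pse_index(dissimilar, similar):
    """Phonological similarity effect: span performance with phonologically
    dissimilar letters minus performance with phonologically similar letters."""
    return np.asarray(dissimilar, dtype=float) - np.asarray(similar, dtype=float)

def partial_corr(x, y, covariate):
    """Correlation between x and y controlling for a covariate (e.g., age):
    correlate the residuals after regressing each variable on the covariate."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, covariate))
    Z = np.column_stack([np.ones_like(z), z])           # intercept + covariate
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # residuals of x
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # residuals of y
    return float(np.corrcoef(rx, ry)[0, 1])

def tracking_measures(word_times, blockage_threshold=10.0):
    """word_times: seconds needed to receive each word of the tracked text.
    Conventional wpm counts all elapsed time (blockages included); optimum
    wpm excludes blocked words and the time spent stalled on them."""
    n = len(word_times)
    conventional = 60.0 * n / sum(word_times)
    blocked = [t for t in word_times if t > blockage_threshold]
    optimum = 60.0 * (n - len(blocked)) / (sum(word_times) - sum(blocked))
    return conventional, optimum, len(blocked)

# Synthetic demo: two blocked words depress the conventional rate
# but leave the optimum (smooth-process) rate untouched.
conv, opt, n_block = tracking_measures([1.0, 1.5, 12.0, 1.0, 1.5, 25.0, 1.0])
print(round(conv, 1), round(opt, 1), n_block)  # → 9.8 50.0 2
```

The residual-regression form of the partial correlation is algebraically equivalent to the textbook formula for r with one covariate partialled out; any statistics package's partial-correlation routine yields the same values.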
This pattern of results is consistent with the finding of Lidestam et al. (1999) that the ability to speechread long sentences, compared to short sentences, depends more on the individual's working memory capacity. Thus, the prediction that lexical identification and phonological processing are less powerful correlates of linguistically less complex speech tests than of linguistically complex speech tests is supported. Word-decoding in speechreading

may tap relatively early (perceptual) stages of lexical identification (see Lyxell & Rönnberg, 1991a), prior to any phonological processing. Furthermore, the on-line processing constraints of the word-decoding task are relatively small compared to speechreading of sentences (cf. Lidestam et al., 1999). Partialling for age revealed a sharper pattern of distinct associations of cognitive variables with the speechreading tasks. Phonological processing speed remained significantly related to optimum speech tracking rate and sentence-based speechreading, whereas lexical identification speed was still related to optimum speech tracking rate. A novel finding is that the letter span test, including phonologically similar letters, emerged as a significant correlate of all visual speech tests (except the blockage index). None of the other associations reached significance. The correlations between the letter span task, including phonologically similar letters, and almost all visual speech tests suggest that working memory contributes to the speechreading process. However, the finding that the PSE index correlated negatively with speech tracking implicates reliance on a visual nonverbal component of short-term memory in this task (cf. De Filippo, 1982), as well as in visual word decoding, which showed a similar, though nonsignificant, pattern. Thus, these findings suggest that skilled speechreading requires the ability to use both phonological and visual working memory coding strategies. Our results indicate that phonological processing permeates visual speech understanding. For example, performance on the phonological lexical decision task correlated highly with sentence-based speechreading. Latencies on the rhyme judgment tests were associated with visual speech understanding (e.g., optimum speech tracking rate and sentence-based speechreading). 
The rhyme judgment tasks were all strongly associated with the lexical decision task (.81 < r < .92), implying that rhyme judgment speed taxes aspects of phonological processing related to lexical identification processes. From an auditory spoken word recognition approach (Luce & Pisoni, 1998; Marslen-Wilson, 1987), phonological processing may contribute to the speechreading process in a number of ways. According to models of auditory spoken word recognition (Luce & Pisoni, 1998; Marslen-Wilson, 1987), the initial pho-


neme is critical for identification of spoken words. A study by Lyxell and Rönnberg (1991a) shows that the initial phoneme is also critical in visual speechreading. Thus, the relationships between the rhyme judgment tasks and speechreading performance may reflect a stage in the speechreading process when the extracted initial speech segments of a word are converted into abstract phoneme or syllable representations (cf. Pisoni & Luce, 1987). The correlations with rhyme judgment speed may also manifest the subsequent process when these phonemic or syllabic representations activate the phonological-lexical items that share this word-initial information in the lexicon, which eventually results in lexical identification of the speechread word (cf. Marslen-Wilson, 1987). As rhyme judgment tasks involve manipulation and comparison of suprasegmental information (i.e., syllables), including syllabic stress (see Gathercole & Baddeley, 1993), it is possible that the associations with speechreading performance also reflect the processing of prosodic information during speechreading. The importance of word prosody (i.e., number of syllables and syllabic stress) and sentence prosody (i.e., rhythm and intonation) in auditory speech processing is well established (Cutler, 1989; Kjelgaard & Speer, 1999; Lindfield, Wingfield, & Goodglass, 1999a, 1999b; Norris, McQueen, & Cutler, 1995). Spoken word recognition is facilitated when word stress, not just initial phonology, is taken into account; that is, the word-initial cohort is constrained only when the words included share both stress pattern and initial phonology with the stimulus word (Lindfield et al., 1999a, 1999b). Sentence prosody contributes to auditory speech processing by solving syntactic ambiguities and identifying syntactic boundaries (Kjelgaard & Speer, 1999; Pisoni & Luce, 1987; Schepman & Rodway, 2000; Steinhauer, Alter, & Friederici, 1999). 
Word stress patterns (syllabic stress) and the rhythm of an utterance may be extracted from lip and face movements, as these speech cues have a direct correspondence to the energy of the acoustic signal (cf. Gathercole & Baddeley, 1993), and, as such, this energy should also be displayed in the face of the talking person. Thus, the speechreader may be able to obtain these speech cues and use them to facilitate lexical identification and to disambiguate speechread utterances (cf. Rönnberg, An-


dersson, Andersson, et al., 1998). The amount of prosodic information is greater in speechreading of sentences than of words, which would explain why rhyme judgment speed was associated with optimum speech tracking rate and sentence-based speechreading only after controlling for age. Inference-making ability did not correlate with visual speech understanding in this study. However, this study used a different methodology and a different population than did previous studies (Lyxell & Rönnberg, 1987, 1989) in which such a relationship has been obtained. Our negative result is, on the other hand, consistent with other previous studies (cf. Rönnberg, 1990). As in previous studies on hearing-impaired participants, no direct relationships between general working memory capacity (i.e., reading span) and visual speech understanding were obtained (Lyxell & Rönnberg, 1989). The studies finding such a relationship have typically employed populations of normal-hearing participants (Lidestam et al., 1999; Lyxell & Rönnberg, 1993). One reason for this difference between the populations is that speechreading in hearing-impaired individuals may be relatively automatized compared to that of normal-hearing individuals. As automatized processes presumably require less working memory capacity (e.g., LaBerge & Samuels, 1985), working memory capacity consequently becomes a less critical cognitive prerequisite for hearing-impaired individuals' speechreading. In sum, these results demonstrate that specific cognitive skills relate to individual differences in visual speech understanding. The major cognitive correlates proved to be lexical identification speed and phonological processing. We observed the contribution of visual as well as phonological working memory coding strategies. Furthermore, we elaborated on the importance and function of phonological processing in visual speech understanding. 
As hypothesized, the pattern and strength of correlations with cognitive skills varied among the different visual speech measures. Particularly after partialling out age, linguistically complex tests (i.e., sentence-based speechreading, speech tracking) were associated with measures of lexical identification speed and phonological processing speed, but the (linguistically) simpler word-decoding task was not. Letter span, when measured either in terms of performance on similar-


sounding letters or as strength of PSE, correlated with all tests of visual speech understanding. In contrast to the other speech tests, none of these cognitive tests was associated with individual differences in number of blockages encountered during speech tracking. Prediction of this type of measure may require tests tapping the early stages of visual speech processing (i.e., perceptual functions). Received October 6, 1999; revision received April 13, 2000; accepted October 5, 2000

References

Allén, S. (1970). Frequency dictionary of present-day Swedish. (In Swedish: Nusvensk frekvensbok.) Stockholm: Almqvist & Wiksell.
Arnold, P. (1997). The structure and optimization of speechreading. Journal of Deaf Studies and Deaf Education, 2, 199–211.
Arnold, P., & Köpsel, A. (1996). Lipreading, reading and memory of hearing and hearing-impaired children. Scandinavian Audiology, 25, 13–20.
Ausmeel, H. (1988). TIPS (Text-Information-Processing-System): A user's guide. Linköping: Department of Education and Psychology, Linköping University, Sweden.
Baddeley, A. D. (1966). Short-term memory for word sequences as a function of acoustic, semantic and formal similarity. Quarterly Journal of Experimental Psychology, 18, 362–365.
Baddeley, A. D. (1997). Human memory: Theory and practice (Rev. ed.). Hove, England: Psychology Press.
Baddeley, A. D., & Wilson, B. (1985). Phonological coding and short-term memory in patients without speech. Journal of Memory and Language, 24, 490–502.
Berger, K. W. (1972). Visemes and homophonous words. Teacher of the Deaf, 70, 396–399.
Birren, J. E., & Fisher, L. M. (1995). Speed of behavior: Possible consequences for psychological functioning. Annual Review of Psychology, 46, 329–353.
Campbell, R. (1992). Speech in the head? Rhyme skill, reading, and immediate memory in the deaf. In D. Reisberg (Ed.), Auditory imagery (pp. 73–94). Hillsdale, NJ: Erlbaum.
Conrad, R. (1979). The deaf schoolchild. London: Harper & Row.
Conrad, R., & Hull, A. J. (1964). Information, acoustic confusion and memory span. British Journal of Psychology, 55, 429–432.
Cutler, A. (1989). Auditory lexical access: Where do we start? In W. Marslen-Wilson (Ed.), Lexical representation and processes (pp. 342–356). Cambridge, MA: MIT Press.
Dancer, J., Krain, M., Thompson, C., Davis, P., et al. (1994). A cross-sectional investigation of speechreading in adults: Effects of age, gender, practice, and education. Volta Review, 96, 31–40. 
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
De Filippo, C. L. (1982). Memory for articulated sequences and

lipreading performance of hearing-impaired observers. Volta Review, April, 134–146.
De Filippo, C. L., & Scott, B. L. (1978). A method for training and evaluating the reception of ongoing speech. Journal of the Acoustical Society of America, 63, 1186–1192.
Demorest, M. E., & Bernstein, L. E. (1992). Sources of variability in speechreading sentences: A generalizability analysis. Journal of Speech and Hearing Research, 35, 876–891.
Demorest, M. E., Bernstein, L. E., & DeHaven, G. P. (1996). Generalizability of speechreading performance on nonsense syllables, words, and sentences: Subjects with normal hearing. Journal of Speech & Hearing Research, 39, 697–713.
Dodd, B. (1977). The role of vision in the perception of speech. Perception, 6, 31–40.
Dodd, B., & Campbell, R. (1987). Hearing by eye: The psychology of lipreading. London: Erlbaum.
Gailey, L. (1987). Psychological parameters of lipreading skill. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lipreading (pp. 115–141). London: Erlbaum.
Gathercole, S. E., & Baddeley, A. D. (1993). Working memory and language. Hove, England: Erlbaum.
Gnosspelius, J., & Spens, K.-E. (1992). A computer-based speech tracking procedure. Speech Transmission Laboratory Quarterly Progress and Status Report, 2, 131–137.
Hanson, V. L. (1982). Short-term recall by deaf signers of American Sign Language: Implications of encoding strategy for order recall. Journal of Experimental Psychology: Learning, Memory and Cognition, 8, 572–583.
Hanson, V. L. (1990). Recall of ordered information by deaf signers: Phonetic coding in temporal order recall. Memory & Cognition, 18, 604–610.
Hanson, V. L., & McGarr, N. S. (1989). Rhyme generation by deaf adults. Journal of Speech and Hearing Research, 32, 2–11.
Kjelgaard, M. M., & Speer, S. H. (1999). Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. Journal of Memory and Language, 40, 153–194.
LaBerge, D., & Samuels, S. J. (1985). 
Toward a theory of automatic information processing in reading. In H. Singer & R. B. Ruddell (Eds.), Theoretical models and processes of reading (3rd ed.). Newark, DE: International Reading Association.
Leybaert, J., & Charlier, B. (1996). Visual speech in the head: The effect of cued-speech on rhyming, remembering and spelling. Journal of Deaf Studies and Deaf Education, 1, 234–248.
Lidestam, B., Lyxell, B., & Andersson, G. (1999). Speechreading: Cognitive predictors and displayed emotion. Scandinavian Audiology, 28, 211–217.
Lindfield, K. C., Wingfield, A., & Goodglass, H. (1999a). The contribution of prosody to spoken word recognition. Applied Psycholinguistics, 20, 395–405.
Lindfield, K. C., Wingfield, A., & Goodglass, H. (1999b). The role of prosody in the mental lexicon. Brain & Language, 68, 312–317.
Logan, K., Maybery, M., & Fletcher, J. (1996). The short-term memory of profoundly deaf people for words, signs and abstract spatial stimuli. Applied Cognitive Psychology, 10, 105–119.
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words:

The neighborhood activation model. Ear & Hearing, 19, 1–36.
Lyxell, B. (1989). Beyond lips: Components of speechreading. Doctoral dissertation, University of Umeå, Sweden.
Lyxell, B. (1994). Skilled speechreading: A single-case study. Scandinavian Journal of Psychology, 35, 212–219.
Lyxell, B., Andersson, J., Arlinger, S., Bredberg, G., Harder, H., & Rönnberg, J. (1996). Verbal information-processing capabilities and cochlear implants: Implications for preoperative predictors of speech understanding. Journal of Deaf Studies and Deaf Education, 1, 190–201.
Lyxell, B., & Rönnberg, J. (1987). Guessing and speech-reading. British Journal of Audiology, 21, 13–20.
Lyxell, B., & Rönnberg, J. (1989). Information-processing skill and speech-reading. British Journal of Audiology, 23, 339–347.
Lyxell, B., & Rönnberg, J. (1991a). Visual speech processing: Word-decoding and word-discrimination related to sentence-based speechreading and hearing-impairment. Scandinavian Journal of Psychology, 32, 9–17.
Lyxell, B., & Rönnberg, J. (1991b). Word discrimination and chronological age related to sentence-based speech-reading skill. British Journal of Audiology, 25, 3–10.
Lyxell, B., & Rönnberg, J. (1992). The relationship between verbal ability and sentence-based speechreading. Scandinavian Audiology, 21, 67–72.
Lyxell, B., & Rönnberg, J. (1993). The effects of background noise and working memory capacity on speechreading performance. Scandinavian Audiology, 22, 67–70.
Lyxell, B., Rönnberg, J., & Samuelsson, S. (1994). Internal speech functioning and speechreading in deafened and normal hearing adults. Scandinavian Audiology, 23, 179–185.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25, 71–102.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
Norris, D., McQueen, J. M., & Cutler, A. (1995). 
Competition and segmentation in spoken-word recognition. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 1209–1228.
Öhngren, G. (1992). Touching voices: Components of direct tactually supported speechreading. Doctoral dissertation, University of Uppsala, Sweden.
Pisoni, D. B., & Luce, P. A. (1987). Acoustic-phonetic representations in word recognition. Cognition, 25, 21–52.
Rönnberg, J. (1990). Cognitive and communicative function:


The effects of chronological age and "Handicap Age." European Journal of Cognitive Psychology, 2, 253–273.
Rönnberg, J. (1995). What makes a skilled speechreader? In G. Plant & K.-E. Spens (Eds.), Profound deafness and speech communication (pp. 393–416). London: Whurr.
Rönnberg, J., Andersson, J., Andersson, U., Johansson, K., Lyxell, B., & Samuelsson, S. (1998). Cognition as a bridge between signal and dialogue: Communication in the hearing impaired and deaf. Scandinavian Audiology, 27(Suppl. 49), 101–108.
Rönnberg, J., Andersson, J., Samuelsson, S., Söderfeldt, B., Lyxell, B., & Risberg, J. (1999). A speechreading expert: The case of MM. Journal of Speech, Language, and Hearing Research, 42, 5–20.
Rönnberg, J., Andersson, U., Lyxell, B., & Spens, K.-E. (1998). Vibrotactile speech tracking support: Cognitive prerequisites. Journal of Deaf Studies and Deaf Education, 3, 143–156.
Salthouse, T. (1985). A theory of cognitive aging. Amsterdam: North-Holland.
Samuelsson, S., & Rönnberg, J. (1991). Script activation in lipreading. Scandinavian Journal of Psychology, 32, 124–143.
Samuelsson, S., & Rönnberg, J. (1993). Implicit and explicit use of scripted constraints in lipreading. European Journal of Cognitive Psychology, 5, 201–233.
Schepman, A., & Rodway, P. (2000). Prosody and parsing in coordination structures. Quarterly Journal of Experimental Psychology, 53A, 377–396.
Share, D. L., & Stanovich, K. E. (1995). Cognitive processing in early reading development: Accommodating individual differences into a model of acquisition. Issues in Education, 1, 1–57.
Shoop, C., & Binnie, C. A. (1979). The effect of age upon the visual perception of speech. Scandinavian Audiology, 8, 3–8.
Spens, K.-E. (1995). Evaluation of speech tracking results: Some numerical considerations and examples. In G. Plant & K.-E. Spens (Eds.), Profound deafness and speech communication (pp. 417–437). London: Whurr.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). 
Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2, 191–196.
Tactaid 7: User's manual. (1991). Somerville, MA: Audiological Engineering Corporation.
Towse, J. N., Hitch, G. J., & Hutton, U. (1998). A re-evaluation of working memory capacity in children. Journal of Memory and Language, 39, 195–217.