J Am Acad Audiol 6: 414-424 (1995)

Speech Recognition in Noise by Hearing-impaired and Noise-masked Normal-hearing Listeners

Alyssa R. Needleman*
Carl C. Crandell†

Abstract

A prevailing complaint among individuals with sensorineural hearing loss (SNHL) is difficulty understanding speech, particularly under adverse listening conditions. The present investigation compared the speech-recognition abilities of listeners with mild to moderate degrees of SNHL to those of normal-hearing individuals with simulated hearing impairments, accomplished using spectrally shaped masking noise. Speech-perception ability was assessed using the predictability-high sentences from the Speech Perception in Noise test. Results revealed significant differences between groups in sentential-recognition ability, with the hearing-impaired subjects performing more poorly than the masked-normal listeners. These findings suggest the presence of a secondary distortion degrading sentential-recognition ability in the hearing impaired. Implications of these data will be discussed concerning the mechanism(s) responsible for speech perception in the hearing impaired.

Key Words: Auditory distortions, noise masking, speech perception

Listeners with sensorineural hearing loss (SNHL) often complain of difficulty understanding speech, particularly in adverse listening conditions (e.g., Cooper and Cutts, 1971; Keith and Talis, 1972; Dirks et al, 1982; Suter, 1985; Plomp, 1986; Gordon-Salant, 1987; Crandell, 1991; Crandell et al, 1991). At present, however, the auditory and/or cognitive mechanisms responsible for these perceptual deficits are not well understood. Two predominant hypotheses have been proposed to explain diminished speech perception in the hearing impaired. One theory suggests that reduced speech-recognition abilities are the result of secondary distortions accompanying the loss of pure-tone sensitivity, such as reductions in cochlear processing (i.e., reduced frequency selectivity and temporal resolution) (Dirks et al, 1982; Plomp, 1986; Trees and Turner,

*Callier Center for Communication Disorders, University of Texas-Dallas, Dallas, Texas; and †Department of Communication Processes and Disorders, University of Florida, Gainesville, Florida

Reprint requests: Alyssa R. Needleman, Callier Center for Communication Disorders, University of Texas at Dallas, 1966 Inwood Road, Dallas, TX 75235


1986; Gordon-Salant, 1987; Irwin and McAuley, 1987; Summerfield, 1987; Turner et al, 1987; Tyler, 1988; Crandell, 1991; Crandell et al, 1991; van Rooij and Plomp, 1991). In contrast, a second hypothesis proposes that perceptual deficits in the hearing impaired are predominantly due to the attenuation of the speech signal resulting from the reduction of pure-tone sensitivity (Kamm et al, 1985; Humes et al, 1986, 1987, 1988; Humes and Roberts, 1990; Humes, 1991).

One procedure used to evaluate these hypotheses is to compare speech-recognition ability in hearing-impaired listeners with that of normal-hearing subjects in whom hearing loss has been simulated via spectrally shaped masking noise. In such procedures, spectrally shaped masking noise is presented to normal-hearing individuals to produce masked thresholds similar to those of hearing-impaired persons (Fabry and van Tasell, 1986; Humes et al, 1987, 1988; Zurek and Delhorne, 1987; Humes and Roberts, 1990; Needleman and Crandell, 1992, 1993a, b). It has been suggested that noise-masking simulation produces a threshold elevation of cochlear origin and also simulates the loudness recruitment typically found in sensorineural pathology (Humes et al, 1987; Humes and Roberts, 1990). In theory, a comparison of the speech-perception

Speech Recognition in Noise/Needleman and Crandell

performance of noise-masked normals with hearing-impaired listeners would suggest whether deficits in speech recognition are the result of secondary distortions accompanying the loss of audibility or simply reduced pure-tone sensitivity. That is, if the noise-masked normals perform the same as or more poorly than the individuals with SNHL, it would imply that reduced performance was due to attenuated pure-tone sensitivity. Conversely, if the noise-masked normals obtain better speech-recognition scores than the hearing-impaired listeners, an auditory distortion would be suggested as an underlying causal factor.

Several investigators have utilized modeling paradigms to examine the effects of reduced pure-tone sensitivity on speech perception under quiet listening conditions (Humes, 1980; Florentine and Buus, 1984; Fabry and van Tasell, 1986; Humes et al, 1987, 1988). Fabry and van Tasell (1986) studied the effects of masking and filtering on consonant recognition for six unilaterally hearing-impaired adults. Filtering is typically thought to provide a more accurate simulation of a conductive hearing loss, in which the signal is attenuated before it reaches the cochlea (Milner, 1982; Humes et al, 1987). Consonant-recognition performance and confusion error patterns were compared between the hearing-impaired and normal ear using Sequential Information Analysis (SINFA) (Wang and Bilger, 1973; Bilger and Wang, 1976; Wang et al, 1978; Walden, 1984). Auditory thresholds using both the filtering and masking procedures were matched to within ±3 dB of the impaired threshold. Consonant-vowel (CV) syllables from the Nonsense Syllable Test (NST) (Resnick et al, 1975) served as the speech stimuli, presented unfiltered to the impaired ear and in both filtered and masked conditions to the normal ear at 65 dB SPL. In general, consonant-recognition performance for the normal ears was similar to that of the impaired ears under both filtering and masking conditions.
With respect to the consonant error patterns, four of six subjects showed similar error patterns between the impaired and the simulated ear via filtering, three displayed comparable error patterns via masking, and three subjects exhibited similar error patterns via both procedures. For the remaining subjects, neither simulation paradigm was successful in producing similar scores between the impaired and simulated ear. Results from this investigation suggested that masking proved no more effective in simulating SNHL than did filtering. That is, the additional simulation of

recruitment achieved through noise masking had no effect on the manner in which suprathreshold speech cues were processed by the normal ear. Based on these findings, the authors concluded that while not all hearing-impaired subjects could be successfully simulated by either masking or filtering paradigms, the major effect of SNHL on speech perception is the attenuation of the speech signal resulting from the loss of audibility.

Humes et al (1987) utilized a noise-masking paradigm to assess speech recognition in listeners with normal hearing and with SNHL. Specifically, 12 normal-hearing subjects were divided into four groups, such that 3 normal-hearing individuals were matched with each hearing-impaired subject. Spectrally shaped masking noise was used for the hearing loss simulation. All thresholds were matched within ±3 dB of the hearing-impaired ear. Nine of the 11 subtests of the NST, presented at levels of 56, 66, 76, and 86 dB HL, were used as the speech stimuli. Results indicated that all four of the hearing-impaired subjects performed the same as or better than the noise-masked normal subjects in terms of percent-correct scores. Interestingly, however, a "gross analysis" of consonant error rates across individual NST subtests showed differences in performance among groups, which varied with presentation level. Despite these findings, the authors suggested that the loss in pure-tone sensitivity was the primary factor in the speech-recognition difficulties noted in individuals with SNHL.

Overall, the results of these investigations suggest that hearing-impaired subjects perform as well as or better than noise-masked normal subjects on speech-recognition tasks in quiet listening conditions, thus indicating that the attenuation of pure-tone sensitivity is the major factor in speech-perception deficits.
However, it should be noted that each of the aforementioned investigations examined speech perception in quiet listening conditions only, a listening environment infrequently encountered in everyday life. Due to the redundancy of the speech signal, suprathreshold distortions in the auditory system may have minimal influences on speech-perception ability under quiet listening situations (e.g., Plomp, 1986; Crandell, 1991).

Few studies have investigated the speech-recognition abilities of listeners with simulated hearing loss in the presence of background noise (Zurek and Delhorne, 1987; Humes and Roberts, 1990; Needleman and Crandell, 1992, 1993a, b). Zurek and Delhorne (1987) investigated the

Journal of the American Academy of Audiology/Volume 6, Number 6, November 1995

effects of noise on the speech-recognition abilities of 15 hearing-impaired listeners and 15 noise-masked normals. Subjects were divided into five groups based on the severity/configuration of their hearing loss (mild shallow-rising, mild steep-rising, mild falling, moderate rising, and moderate flat). Three sets of CV syllables, low-pass filtered at 4500 Hz, served as the speech stimuli. The speech stimuli were presented in a background of speech-spectrum noise at signal-to-competition ratios of -20 dB to +20 dB. Speech-perception ability was assessed by percent-correct scores and consonant error patterns. Results revealed no differences in consonant-recognition performance for the hearing-impaired subjects and the noise-masked normal controls, indicating that the greatest source of degradation in the speech signal for hearing-impaired individuals was the combined effect of the hearing loss and external noise. A detailed analysis of the consonant error patterns, however, was not performed. The authors stated only that "the patterns of errors [were] roughly the same for the two groups" (p. 1553). The authors concluded that the effects of suprathreshold auditory deficits on speech intelligibility were negligible in comparison to the effects of the loss of audibility and that amplification should provide satisfactory restoration of the speech signal.

Humes and Roberts (1990) investigated the recognition of temporally degraded speech for elderly individuals (aged 65-75 years) with SNHL. The mean audiogram of 13 elderly hearing-impaired subjects was simulated on 10 normal-hearing young adults. The 11 subtests of the NST served as the speech stimuli. The stimuli were administered in quiet and at a +5 dB signal-to-noise ratio (SNR), presented at 0° and 90° azimuth. Cafeteria noise was used as the competing signal.
Results showed that mean speech-recognition performance of the noise-masked normal-hearing subjects closely approximated that of the hearing-impaired subjects. Specifically, no significant differences were found between the hearing-impaired and the masked-normal groups across any of the listening conditions. Though the authors concluded that the loss of audibility was the major contributing factor degrading the speech signal, they noted that these conclusions may only be valid when considering the perception of nonsense syllables.

Needleman and Crandell (1993b) evaluated the syllabic recognition abilities of 10 hearing-impaired and 10 normal-hearing listeners. All hearing-impaired subjects exhibited mild to moderate degrees of SNHL for durations of

greater than 20 years. The 11 subtests of the NST served as the speech stimuli, accompanied by cafeteria noise as the competing signal. An adaptive procedure was used to assess the speech reception threshold (SRT), or 50 percent correct performance level (Levitt and Rabiner, 1967; Plomp and Mimpen, 1979). Results revealed that the masked normals performed similarly to or slightly poorer than the hearing-impaired listeners on all tests. Specifically, no significant differences were found in recognition performance between the two groups (F[1,19] = 1.37, p = .257), suggesting again that the attenuation of the speech signal was the major factor affecting recognition. The investigators are currently examining the consonant error and feature patterns utilizing SINFA multidimensional scaling analysis to determine if any differences in syllabic recognition exist between the masked-normal and hearing-impaired groups.

Although the aforementioned investigations indicate that reductions in auditory sensitivity are the major contributing factor to degraded speech-recognition performance in noise and reverberation, a more detailed examination of these studies reveals potential confounds. Foremost among these confounds is that each of the preceding studies has utilized nonsense syllable tests to assess speech-perception ability. It is well recognized that there is a strong correspondence between highly constrained stimuli, such as nonsense syllables, and audiometric configuration (e.g., Miller and Nicely, 1955; Rosen, 1962; Wang and Bilger, 1973; Walden et al, 1975; Chari et al, 1977; Levitt, 1982; Walden, 1984; Needleman and Crandell, 1992, 1993a, b). Moreover, nonsense syllables do not require listeners to make use of semantic or syntactic information to interpret the speech signal.
Thus, it is not surprising that past investigations have shown similar performance between hearing-impaired and masked-normal groups on tests of nonsense stimuli, as there is minimal requirement for higher-level processing. Since sentential stimuli are more representative of everyday speech and require listeners to make use of contextual information, the use of such stimuli may provide a more "real-world" estimate of recognition ability. To date, no investigation has used sentence recognition in comparing perceptual ability between hearing-impaired and masked-normal listeners.

With these considerations in mind, the present investigation compared the sentential-recognition abilities of individuals with long-standing SNHL to normal listeners with simulated SNHL produced via noise masking. Findings will suggest


whether diminished speech perception in the hearing impaired is the result of reductions in auditory sensitivity or secondary distortions accompanying the loss of pure-tone sensitivity. Hearing-impaired subjects consisted of 10 listeners with varying degrees and configurations of mild-to-moderate SNHL. Twenty normal-hearing subjects were given simulated hearing losses that were matched to each hearing-impaired subject. All speech stimuli were presented in a background of speech-spectrum noise.

METHOD

Subject Selection

Thirty adult listeners, 10 with SNHL and 20 with normal-hearing sensitivity, served as subjects for this investigation. Each of the hearing-impaired subjects exhibited bilateral, symmetrical degrees of mild-to-moderate SNHL. Table 1 presents the pure-tone thresholds, from 250 Hz to 8000 Hz, of the hearing-impaired subjects. Audiologic tests (tympanometry, acoustic reflexes, acoustic reflex decay, and Performance Intensity Phonetically Balanced [PIPB] tests) indicated the absence of conductive and/or retrocochlear pathology. The hearing-impaired subjects ranged in age from 21 to 54 years, with a mean age of 40.2 years. None of the subjects had prior experience in psychoacoustic experimentation. Furthermore, each hearing-impaired individual reported hearing loss since birth or for greater than 20 years.

In order to control for individual variability, two normal-hearing subjects served as noise-masked normal listeners for each hearing-impaired subject. A spectrally shaped masking noise was presented to the right ear of each normal hearer to simulate the hearing loss of one of the ears of each hearing-impaired listener. Each normal-hearing subject exhibited pure-tone thresholds better than 15 dB for octave intervals in the frequency region from 250 to 8000 Hz. Normal listeners ranged in age from 22 to 39, with a mean age of 27.3 years. In addition, each of the hearing-impaired and normal-hearing subjects met the following criteria: (1) normal middle ear function as demonstrated by tympanometry (+100 mm H2O); (2) present acoustic reflexes from 500 to 2000 Hz; (3) excellent word recognition in quiet, 90 percent or better, when listening to NU-6 words presented at 80 dB HL; (4) native speakers of American English; and (5) good health with no history of chronic illness or disease.

Speech Stimuli

Speech perception was assessed by the predictability-high sentences from the revised Speech Perception in Noise test (PH-SPIN) (Kalikow et al, 1977; Bilger, 1984). The revised SPIN test comprises eight lists containing 50 sentences per list. The lists were designed to be comparable in phonetic balance, average sentence length, and number of syllables (Bilger, 1984). The revised SPIN test consists of two types of stimuli: predictability-high (PH) and predictability-low (PL) sentences. In the PH sentences, the final noun is highly predictable from syntactical and semantic cues within the sentence (e.g., "She made the bed with clean sheets."). The PL sentences are contextually neutral so that identification of the final noun is based solely on acoustic information contained in the target word (e.g., "We're discussing the sheets."). The sentences were developed in this manner to

Table 1

Pure-tone Thresholds, in dB SPL, and Etiology and Duration of Hearing Loss for All 10 Hearing-impaired Subjects

                     Frequency (Hz)
Subject   250   500   1000   2000   4000   8000   Etiology        Duration
1          32    23     29     37     35     62   Unknown         > 20 years
2          47    24     23     36     56     68   Unknown         > 25 years
3          22    14     27     46     64     66   Unknown         > 25 years
4          31    31     51     35     24     10   Unknown         From birth
5          32    29      5      4     10     14   Unknown         From birth
6          28    25     28     36     56     55   Noise induced   > 20 years
7          36    33     38     61     57     44   Noise induced   > 35 years
8          30    23     27     56     72     68   Unknown         From birth
9          71    46     51     39     35     72   Unknown         > 20 years
10         29    17     18     55     58     67   Unknown         From birth


permit separate assessment of a listener's ability to use linguistic and acoustic information. Since the PH sentences are more representative of real-world listening situations, only those sentences were utilized in this investigation.

The following procedures were implemented to generate test PH-SPIN lists appropriate for use in this study. The analog recordings of each sentence from the four PH sentence lists were played on a reel-to-reel tape recorder (Revox A77) and delivered to a laboratory computer (LSI11/73) for digitization. The sentences were low-pass filtered at 6.3 kHz and digitized using a sampling rate of 20 kHz and a sampling duration of 3 seconds. The digitized sentences were randomized and stored in files of 25 sentences. Once the sentences were digitized, a computer program randomized the 25 sentences in a list, making a total of eight different lists. The randomized lists were then converted back to their analog form via D/A conversion, low-pass filtered at 6.3 kHz, and recorded on magnetic cassette tapes using a cassette tape recorder (Hitachi D-E33). The PH sentences on cassette contained eight lists of 25 sentences. A 1000-Hz narrow-band calibration noise was placed at the beginning of each tape. The level of the calibration noise for the PH-SPIN sentences was equal to the long-term root mean square (RMS) level of the speaker's voice.

The SPIN test was chosen as the speech-perception stimuli for a number of reasons. First, the sentences from the SPIN test allowed assessment of each individual's ability to recognize speech under real-world listening conditions. In addition, the SPIN test has been shown to have high test-retest reliability for both normal and hearing-impaired listeners (Bilger, 1984; Dubno et al, 1984; Kamm et al, 1985; Irwin and McAuley, 1987; Crandell, 1991).

Competing Noise

The SPIN stimuli were accompanied by the multitalker babble recording from the revised SPIN test (Kalikow et al, 1977).
The multitalker babble recording accompanies the SPIN commercially. The noise was generated by recording six adults (three males, three females) reading a passage in an anechoic chamber. The six recordings were then combined with a second recording of the same speakers, producing a 12-speaker babble. The SPIN noise has a long-term spectrum equivalent to the long-term spectrum of speech. The babble is characterized by a relatively flat spectrum below approximately 800

Hz and an attenuation rate of 9 to 10 dB/octave above 800 Hz. The multitalker babble noise was recorded onto the second track of the PH-SPIN tapes. A 1000-Hz narrow-band calibration noise was placed at the beginning of the noise stimuli, equal to the long-term RMS level of the noise.

Procedures

Subsequent to the audiologic evaluation, additional pure-tone testing for the normal-hearing and hearing-impaired subjects was conducted to assess precise pure-tone thresholds in 1/3-octave frequencies from 63 to 8000 Hz. All pure-tone testing was conducted in a double-walled IAC sound-treated room under TDH-49 earphones mounted in MX-41/AR supra-aural cushions. A laboratory computer (Zenith, Model 386/25) controlled the generation and presentation of stimuli, adaptive procedures, and online collection of subject responses. Specifically, pure-tone stimuli were generated by the laboratory computer to have a 400-msec duration with a 10-msec rise-fall time. Data for the pure-tone threshold estimation were obtained using a two-alternative forced-choice paradigm designed to estimate the 70.7 percent correct threshold level (Levitt, 1971). In such a paradigm, two successive correct responses caused a decrease in signal level, while one incorrect response caused an increase in signal level. A total of 14 reversals in signal level was used for a single threshold estimate. Signal increments/decrements were 5 dB for the first 3 reversals and 2 dB for the last 11 reversals. The final 10 reversals were averaged to estimate threshold. The two observation intervals of each trial were separated by 500 msec. Practice trials were given to each subject in order to familiarize the listener with the experimental task. In addition, the order of presentation for stimulus frequency was randomized.
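As an illustration of the two-down, one-up rule described above, the following Python sketch tracks signal level under the stated parameters (14 reversals, 5-dB steps until the third reversal, 2-dB steps thereafter, mean of the final 10 reversals). The rule converges where the probability of two successive correct responses equals .5, that is, p² = .5 and p ≈ .707. The function name, the `respond` callback, the starting level, and the idealized listener are assumptions made for illustration only; they do not reproduce the laboratory software used in this investigation.

```python
from typing import Callable, List


def two_down_one_up(respond: Callable[[float], bool],
                    start_level_db: float,
                    n_reversals: int = 14,
                    n_large: int = 3,
                    large_step_db: float = 5.0,
                    small_step_db: float = 2.0,
                    n_averaged: int = 10) -> float:
    """Estimate a threshold with a two-down, one-up adaptive track.

    `respond(level)` is a hypothetical callback that runs one
    forced-choice trial at `level` (dB) and returns True if correct.
    """
    level = start_level_db
    correct_run = 0          # consecutive correct responses at this level
    last_direction = 0       # -1 after a decrease, +1 after an increase
    reversals: List[float] = []

    while len(reversals) < n_reversals:
        if respond(level):
            correct_run += 1
            if correct_run < 2:
                continue     # need a second correct response before stepping down
            direction = -1
        else:
            direction = +1
        correct_run = 0

        # A reversal is a change in the direction of the level track.
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
        last_direction = direction

        # Large steps until the third reversal, small steps afterwards.
        step = large_step_db if len(reversals) < n_large else small_step_db
        level += direction * step

    # Threshold estimate: mean of the final reversal levels.
    return sum(reversals[-n_averaged:]) / n_averaged


def ideal_listener(level_db: float) -> bool:
    """Assumed noiseless listener: always correct at or above 20 dB."""
    return level_db >= 20.0


estimate = two_down_one_up(ideal_listener, start_level_db=40.0)
print(f"Estimated threshold: {estimate:.1f} dB")
```

With the idealized listener, the track settles into alternating reversals at 19 and 21 dB, so the estimate coincides with the assumed 20-dB threshold. A real listener responds probabilistically, in which case the estimate approximates the 70.7 percent correct point. Requiring only one correct response before each decrease would give the simple up-down rule, which targets 50 percent correct.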
Noise-masking simulation for the normal-hearing subjects was accomplished by routing the output of a white noise generator (Coulbourn, Model S81/02) to be spectrally shaped by a 1/3-octave-band multifilter (Bruel & Kjaer, Model 1612/SP, 1612/SIA), amplified (Crown, Model CA 150), and output to both ears of the normal-hearing subject (Fig. 1). The multifilter has an attenuation of 20 dB at ±1/3 octave from the center frequency. All simulated thresholds were within ±5 dB of the actual hearing-impaired threshold from 63 to 8000 Hz in 1/3-octave intervals. Masking noise was presented to the left ear


Figure 1 Block diagram of experimental instrumentation. (Adapted from Zurek and Delhorne, 1987.) [Diagram labels: Speech Stimuli; Multitalker Babble; White Noise; Spectral Shaping; Hearing-impaired Thresholds; TDH-49.]

of the normal-hearing subjects to avoid participation of the better (nonmasked) ear in the listening task.

Subjects listened to all speech stimuli under TDH-49 earphones mounted in MX-41/AR supra-aural cushions while seated in a sound-treated room (Tracor, Model RS 253C). Speech and noise stimuli were played on separate channels of a stereo cassette tape recorder (Nakamichi, Model BX-106). The speech stimuli and the competing noise were separately attenuated, mixed, amplified, and presented to the right ear of each subject. The subjects' task was to repeat the sentence presented. Mean SRTs were measured for each subject. To achieve appropriate playback levels, the output level for the 1000-Hz narrow-band noise calibration signal at the earphone was measured in a 6-cm³ coupler. The attenuators were then adjusted, with the VU meter set at 0 dB, to read desired SPLs. The attenuator settings necessary to achieve these levels were noted and used throughout the playback procedure.

An adaptive procedure to assess the SRT, or 50 percent correct performance (Levitt and Rabiner, 1967; Plomp and Mimpen, 1979), was utilized to assess performance for both stimuli. Adaptive procedures circumvent the difficulties associated with percent-correct scores by limiting all recognition scores to the linear portion of the performance-intensity function and avoiding floor and ceiling effects. In addition, high intrasubject reliability has been reported for such procedures (e.g., Duquesnoy and Plomp, 1983; Festen and Plomp, 1983; Plomp, 1986; Crandell, 1991). Using this procedure, the noise level was kept constant while the speech signal was varied in 1-dB steps. Specifically, the following procedure was utilized for this paradigm: (1) presentation of the first stimulus item began at an inaudible level and was increased in 2-dB

steps until it could be correctly repeated. Subjects were encouraged to guess when necessary; (2) the next stimulus was repeated at a level 1 dB lower than the first stimulus; (3) if this stimulus was correctly recognized, the presentation level for the following stimulus was decreased by 1 dB. If, however, this stimulus was incorrectly recognized, the presentation level for the following stimulus was raised by 1 dB; (4) the above steps were repeated for all remaining stimuli. The SRT was determined by averaging the 50 percent response level over the last 20 SPIN sentences, such that approximately 12 to 13 sentences were presented to determine the SRT. The speech stimuli were administered in the presence of multitalker babble, presented at 75 dB SPL, to simulate typical environmental noise levels. It should be noted that the noise level (75 dB SPL) was close to or above threshold across the frequency range of all hearing-impaired subjects, thus ensuring that the SRTs were determined primarily by the competing noise, rather than by the listeners' thresholds (absolute or masked). Total test time per subject was approximately 5 hours, completed over two sessions.

RESULTS

The mean SRTs and standard deviations for the SPIN sentences in dB SPL are shown in Figure 2 and Table 2, for the hearing-impaired subjects and the average performance for each pair of masked-normal subjects. Masked normals (shaded bars) are presented next to the hearing-impaired subject (black bars) he/she modeled. Recall that the higher the SRT, the greater the recognition difficulty in noise. Statistical analysis revealed a significant difference in mean performance between the two groups (F[1,19] = 32.27, p < .0001), with masked normals


obtaining better thresholds in noise than the hearing-impaired listeners. Post-hoc analyses utilizing the Tukey test revealed that all performance differences between subjects, with the exception of subject 1 data, were significant at the 0.05 level. No significant differences in performance were exhibited between the two masked normals modeling individual hearing-impaired subjects.

While the performance differences between the hearing-impaired listeners and the noise-masked normals may initially appear inconsequential, relatively small changes in SNR can equate to large differences in percent-correct scores (Crandell, 1991). Preliminary data for normal-hearing subjects have indicated that a 1-dB improvement in SNR for the SPIN sentences equates to an improvement of approximately 10 percent in percent-correct scores (Crandell, 1991; Crandell et al, 1991).

DISCUSSION

The results of the present investigation indicated that hearing-impaired subjects obtained poorer sentence recognition scores than did masked normals. These results indicate that the hearing-impaired subjects exhibit greater susceptibility to noise than noise-masked normals, suggesting that factors other than the loss of pure-tone sensitivity have degraded performance. Specifically, the decrement in recognition performance of the hearing-impaired listeners as compared to the noise-masked normals on the SPIN suggests the presence of a secondary distortion affecting perception. At present, however, the origin of the secondary distortions

Table 2 Speech Reception Thresholds, in dB SPL, and Standard Deviations for the SPIN Test

Subject                      Hearing Impaired   Noise Masked
1                            71.45              72.18
2                            76.85              72.38
3                            75.60              70.75
4                            76.05              71.95
5                            77.65              73.65
6                            80.10              72.80
7                            76.00              72.13
8                            80.00              70.48
9                            76.25              72.30
10                           78.90              71.85
Mean (Standard Deviation)    76.89 (2.53)       72.05 (0.91)

affecting sentential recognition remains uncertain. Several hypotheses are offered to explain these findings.

First, cochlear and/or central auditory distortions such as impaired frequency selectivity, temporal resolution, frequency discrimination, intensity discrimination, and loudness recruitment may have affected individuals' discrimination of an incoming speech signal, particularly in the presence of noise. In support of this hypothesis, a number of investigations have reported significant relationships between cochlear or central auditory distortions and speech perception in noise (e.g., Olsen et al, 1975; Konkle et al, 1977; Orchik and Burgess, 1977; Bonding, 1979; Leshowitz and Lindstrom, 1979; Florentine et al, 1980; Chung, 1981; Tyler et al, 1982, 1983; Stelmachowicz et al, 1985; Trees and Turner, 1986; Horst, 1987; Irwin and McAuley, 1987; Turner et

Figure 2 Speech reception thresholds, in dB SPL, for the SPIN test. Average performance for each pair of masked-normal subjects (shaded bars) is presented next to hearing-impaired subjects (black bars) modeled. Higher SRT values reflect greater difficulty in sentence recognition. Mean performance for each group is indicated in the last row. [Bar graph: horizontal axis, Subject Number (1 to 10 and Mean); vertical axis, SRT in dB SPL.]
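The group summary values in Table 2 can be rechecked from the individual SRTs. The Python sketch below does so; the pairing of each value with a subject number follows our reading of the table layout and should be treated as an assumption, although the group means and standard deviations do not depend on that pairing. Variable names are illustrative only.

```python
from statistics import mean, stdev

# SRTs in dB SPL as listed in Table 2, ordered by subject number 1-10
# (subject-to-value pairing assumed from the table layout).
hearing_impaired = [71.45, 76.85, 75.60, 76.05, 77.65,
                    80.10, 76.00, 80.00, 76.25, 78.90]
noise_masked = [72.18, 72.38, 70.75, 71.95, 73.65,
                72.80, 72.13, 70.48, 72.30, 71.85]

# Group means and sample standard deviations (n - 1 in the denominator).
hi_mean, hi_sd = mean(hearing_impaired), stdev(hearing_impaired)
nm_mean, nm_sd = mean(noise_masked), stdev(noise_masked)

# Per-subject SRT difference: positive values mean the hearing-impaired
# listener needed a higher speech level than the matched masked normals.
differences = [hi - nm for hi, nm in zip(hearing_impaired, noise_masked)]
n_poorer = sum(d > 0 for d in differences)

print(f"Hearing impaired: mean {hi_mean:.3f}, SD {hi_sd:.3f}")
print(f"Noise masked:     mean {nm_mean:.3f}, SD {nm_sd:.3f}")
print(f"Mean difference:  {mean(differences):.3f} dB")
print(f"Hearing-impaired SRT higher in {n_poorer} of {len(differences)} subjects")
```

The computed means and standard deviations agree with the tabled values to within rounding, and the only subject whose SRT is not higher than that of the matched masked normals is subject 1, the one exception noted in the post-hoc analysis.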


al, 1987; Gagne, 1988; Jerger et al, 1989; Stach et al, 1991; Jerger, 1992). For example, Bonding (1979) investigated critical bandwidths derived from loudness summation, psychoacoustic tuning curves, and speech-discrimination scores in noise for listeners with SNHL. Findings from Bonding's investigation revealed a monotonic relationship between poor speech-recognition performance in noise and degraded frequency selectivity in listeners with SNHL. In a similar investigation, Horst (1987) examined frequency discrimination, frequency selectivity, and speech perception in individuals with SNHL. Findings from Horst's investigation revealed a significant correlation between speech perception in noise and frequency selectivity (r = -0.80, p < .01), and between speech recognition and frequency discrimination (r = -0.76, p < .01).

Second, the hearing-impaired listeners may be exhibiting diminished cognitive/linguistic processing, particularly in extracting lexical meaning from the stimuli. While sentences provide more redundancy to the acoustic signal, as well as linguistic information, they require more complex processing for recognition (Giolas and Epstein, 1963; Giolas, 1966; Kalikow et al, 1977; Levitt, 1982). That is, sentential recognition requires the listener to extract both semantic and syntactic information from the speech signal, as well as acoustic information. While there is currently no evidence in the literature to support this hypothesis, it is possible that the processing strategies utilized by hearing-impaired listeners are compromised when higher levels of complexity are required for comprehension of speech stimuli. A possible extension of this investigation would be to compare the recognition abilities for the difference score on the PL versus PH sentences of the SPIN test, to get an indication of use of cognitive/linguistic information in these listeners.
However, Owen (1981) showed that SPIN difference scores are related more to the audibility of the speech stimulus than to the use of context. Another potential procedure for assessing the use of linguistic information has been described by Boothroyd and Nittrouer (1988) and Nittrouer and Boothroyd (1990). These investigators examined the effect of context on word and sentence recognition in normal-hearing children, elderly adults, and young adults. The authors attempted to quantify the effects of context based on predictions of simple probability theory (Boothroyd, 1978, 1985; Schiavetti et al, 1984; Benoit, 1990). Results suggested that sentence context was more important than word context and that semantic constraints were the most important contextual factor, given that the listener had sufficient knowledge of the language and had normal hearing. Unfortunately, to date, no investigation has attempted to quantify the effects of context utilized in speech recognition by hearing-impaired listeners. It is quite likely that hearing-impaired listeners utilize context differently than do normal-hearing listeners, because the distortion of the incoming speech signal degrades their immediate processing performance. It is reasonable to assume that, while normal-hearing listeners are able to use all of the contextual cues in a sentence to derive its meaning, hearing-impaired listeners may not make full use of these cues, or may use them in a different manner, to synthesize meaning from sentences. Thus, when more demanding auditory processing is required for perceiving sentences, hearing-impaired listeners' inefficient use of context makes their sentential perception poorer than that of normal hearers. In contrast, processing of nonsense syllables does not require listeners to make use of context. Thus, no differences are seen in recognition scores between hearing-impaired and masked-normal listeners.

A third hypothesis is that response bias differences exist between the two groups. That is, the hearing-impaired subjects may adopt markedly different response strategies than the masked normals, particularly in recognizing sentential material. For example, if the listener understands a message correctly but mistrusts what he/she has heard, the flow of conversation is impeded. Further, if the listener responds to a message that he/she has heard incorrectly, the conversation is additionally interrupted (Yanz, 1984; Yanz et al, 1985). Hence, a performance decrement may have been demonstrated by the hearing-impaired listeners because their biases in responding to sentential material differ from those of normal-hearing listeners.
A well-recognized procedure for examining response bias differences between groups is provided by the Theory of Signal Detection (TSD) (Pollack and Decker, 1958; Broadbent, 1967; Yanz, 1984; Yanz et al, 1985; Gordon-Salant, 1986; Jerger et al, 1988). Yanz (1984) described the application of TSD to assessing speech perception. This method quantifies an individual's ability to determine the accuracy of his/her identifications on a speech task as well as his/her bias towards trusting these identifications. The subject's task is first to identify the speech signal and then to assess the correctness of that identification. Needleman and Crandell (1994) utilized TSD to investigate the response biases between the two groups. Subjects responded to syllabic and sentential stimuli presented in backgrounds of noise, then judged the correctness of their responses using a binary decision task. TSD was applied to determine differences in self-assessment ability (d') and response bias (β) between groups for both syllabic and sentential stimuli. Results for syllabic recognition indicated no significant differences in response bias between groups and greater self-assessment ability by the hearing-impaired group, suggesting the absence of any additional psychoacoustic distortion. In contrast, tests of sentential recognition revealed that the masked-normal group exhibited a significantly stricter criterion for self-assessment, as well as greater sensitivity. These contrasting findings suggest that response biases deleteriously affect intelligibility for individuals with SNHL when listening to sentential material.

Journal of the American Academy of Audiology/ Volume 6, Number 6, November 1995

A fourth hypothesis proposes that hearing-impaired listeners could be experiencing difficulty in the processing of intonational/prosodic cues in the speech signal. These deficits may be the result of cochlear, central, and/or cognitive distortions. Intonational/prosodic cues consist of variations in the vowel pitch, duration, spectrum, and intensity of sounds and in the fundamental frequency of voiced sounds (Kalikow et al, 1977; Pickett, 1980). It has been shown that these features, particularly stress and intonation, are used by listeners as cues for understanding sentences (Kozhevnikov and Chistovich, 1965; Speaks, 1967; Pickett, 1980). Kozhevnikov and Chistovich (1965) examined the use of rhythmic structure in perceiving words and sentences in normal-hearing young adult listeners. Analysis of the errors showed that listeners made use of the rhythm of the message and the features of the individual sounds. The authors concluded that listeners reached decisions about the words without waiting for the termination of the whole sentence.
Further, they suggested that the unit for which final decisions are made concerning sentential meaning exceeds the length of the syllable.

A final hypothesis suggests that the use of multitalker babble combined with the spectrally shaped masking noise for the normal hearers provided them with an advantage over the hearing-impaired listeners in performance on the SPIN sentences. Investigators have shown that SRTs are higher when SPIN sentences are presented in multitalker babble as opposed to white noise (e.g., Dirks et al, 1986; Lewis et al, 1988; Dubno and Schaefer, 1992; Baer et al, 1993). The combination of the babble with the spectrally shaped noise may alter the temporal characteristics of the competing noise for the normal hearers, thus lowering SRTs for the masked-normal subjects. While the combination of noises may have provided an advantage to the masked-normal listeners, extensive testing performed by Lewis et al (1988) revealed that babble was a more effective masker than random noise by approximately 2.3 dB. Results of the current investigation show a difference of 4.8 dB between masked-normal listeners (combination maskers) and hearing-impaired listeners (babble), substantially greater than that found by Lewis et al (1988). This would suggest that, even in the presence of the combination masker, additional distortion(s) was serving to degrade recognition ability for the hearing-impaired listeners. Furthermore, past investigations have utilized this modeling paradigm for examining the effects of cafeteria noise on the recognition abilities of listeners with simulated hearing impairments (Zurek and Delhorne, 1987; Humes and Roberts, 1990; Needleman and Crandell, 1992, 1993a, b). While no differences in performance were found, it has already been established that a strong correlation exists between the pure-tone audiogram and highly constrained syllabic information (e.g., Miller and Nicely, 1955; Rosen, 1962; Walden et al, 1975; Chari et al, 1977; Levitt, 1982; Walden, 1984). It is unlikely that the combination of the two maskers in these studies would affect only the multitalker babble noise and not the cafeteria noise.

In summary, the results of this investigation indicate that differences exist between the sentential-recognition abilities of individuals with simulated hearing impairment and those with SNHL. These results may suggest the presence of a secondary distortion(s) degrading the accurate perception of speech. To date, the origin of these distortions remains uncertain. Hypotheses include cochlear and/or central auditory distortions, deficits in cognitive/linguistic function, differences in response bias, and inefficient utilization of prosodic cues. Future research must attempt to isolate the cochlear, central, cognitive, and/or intonational distortions that influence speech-recognition ability in hearing-impaired listeners. Certainly, it is reasonable to assume that only when we have identified the origin of the secondary distortions will we be able to implement appropriate remediational strategies for hearing-impaired individuals.
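The self-assessment analysis described under the third hypothesis can be illustrated with a short sketch. It assumes the standard equal-variance Gaussian model of signal detection, in which sensitivity is d' = z(H) - z(F) and response bias is β = exp((z(F)^2 - z(H)^2) / 2), where H is the hit rate and F the false-alarm rate. The text does not state which model or correction Needleman and Crandell (1994) used, so the function name, the log-linear correction, and the example counts below are illustrative assumptions, not data from the study. Here a "hit" is a correct identification that the listener judged to be correct, and a "false alarm" is an incorrect identification that the listener judged to be correct.

```python
import math
from statistics import NormalDist


def self_assessment_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) for a binary correctness-judgment task.

    Assumes an equal-variance Gaussian signal detection model.
    A log-linear correction (add 0.5 to each count) keeps the rates
    away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa                        # sensitivity
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2)  # likelihood-ratio bias
    return d_prime, beta


# Hypothetical counts for two listeners (not taken from the study).
neutral = self_assessment_measures(40, 10, 10, 40)
strict = self_assessment_measures(20, 30, 2, 48)
print("neutral listener: d' = %.2f, beta = %.2f" % neutral)
print("strict listener:  d' = %.2f, beta = %.2f" % strict)
```

Under this model, β near 1 indicates a neutral criterion, and β above 1 indicates a stricter criterion, that is, a listener who is reluctant to accept his/her own responses as correct. Separating d' from β in this way is what allows a group difference in criterion to be distinguished from a group difference in sensitivity.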


REFERENCES

Baer T, Moore BC, Gatehouse S. (1993). Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times. J Rehabil Res Dev 30:49-72.

Benoit C. (1990). An intelligibility test using semantically unpredictable sentences: towards the quantification of linguistic complexity. Speech Communication 9:293-304.

Bilger R. (1984). Speech recognition test development. In: Elkins E, ed. Speech Recognition by the Hearing Impaired. ASHA Reports 14:2-15.

Bilger R, Wang M. (1976). Consonant confusions in patients with sensorineural hearing loss. J Speech Hear Res 19:718-748.

Bonding P. (1979). Frequency selectivity and speech discrimination in sensorineural hearing loss. Scand Audiol 8:205-215.

Boothroyd A. (1978). Speech perception and sensorineural hearing loss. In: Ross M, Giolas TG, eds. Auditory Management of Hearing-Impaired Children. Baltimore: University Park Press, 117-144.

Boothroyd A. (1985). Measurement of speech production in hearing-impaired children: some benefits of forced-choice testing. J Speech Hear Res 28:185-196.

Boothroyd A, Nittrouer S. (1988). Mathematical treatment of context effects in phoneme and word recognition. J Acoust Soc Am 84:101-111.

Broadbent DE. (1967). Word-frequency effect and response bias. Psychol Rev 74:185-196.

Chari NCA, Herman G, Danhauer JL. (1977). Perception of one-third octave-band filtered speech. J Acoust Soc Am 61:576-580.

Chung DY. (1981). Masking, temporal integration and sensorineural hearing loss. J Speech Hear Res 24:514-520.

Cooper J, Cutts B. (1971). Speech discrimination in noise. J Speech Hear Res 14:332-337.

Crandell CC. (1991). Individual differences in speech recognition ability: implications for hearing aid selection. Ear Hear 12(Suppl):100S-108S.

Crandell CC, Henoch MA, Dunkerson KA. (1991). A review of speech perception and aging: some implications for aural rehabilitation. J Acad Rehabil Audiol 24:121-132.

Dirks DD, Bell TS, Rossman RN, Kincaid GE. (1986). Articulation index predictions of contextually dependent words. J Acoust Soc Am 80:82-92.

Dirks DD, Morgan DE, Dubno JR. (1982). A procedure for quantifying the effects of noise on speech recognition. J Speech Hear Disord 47:114-123.

Dubno JR, Dirks DD, Morgan DE. (1984). Effects of age and mild hearing loss on speech recognition in noise. J Acoust Soc Am 76:87-96.

Dubno JR, Levitt H. (1981). Predicting consonant confusions from acoustic analysis. J Acoust Soc Am 69:249-261.

Duquesnoy A, Plomp R. (1983). The effect of a hearing aid on the speech-recognition threshold of hearing-impaired listeners in quiet and in noise. J Acoust Soc Am 73:2166-2173.

Fabry DA, van Tasell DJ. (1986). Masked and filtered simulation of hearing loss: effects on consonant recognition. J Speech Hear Res 29:170-178.

Festen J, Plomp R. (1983). Relations between auditory functions in impaired hearing. J Acoust Soc Am 73:652-662.

Florentine M, Buus S. (1984). Temporal gap detection in sensorineural and simulated hearing impairments. J Speech Hear Res 27:449-455.

Florentine M, Buus S, Scharf B, Zwicker E. (1980). Frequency selectivity in normal-hearing and hearing-impaired observers. J Speech Hear Res 23:646-669.

Gagne J. (1988). Excess masking among listeners with a sensorineural hearing loss. J Acoust Soc Am 83:2311-2322.

Giolas TG. (1966). Comparative intelligibility scores of sentence lists and continuous discourse. J Aud Res 6:31-38.

Giolas TG, Epstein A. (1963). Comparative intelligibility of word lists and continuous discourse. J Speech Hear Res 6:349-358.

Gordon-Salant S. (1986). Effects of aging on the response criteria in speech-recognition tasks. J Speech Hear Res 29:155-162.

Gordon-Salant S. (1987). Consonant recognition and confusion patterns among elderly hearing-impaired subjects. Ear Hear 8:270-276.

Horst J. (1987). Frequency discrimination of complex signals, frequency selectivity, and speech perception in hearing-impaired subjects. J Acoust Soc Am 82:874-884.

Humes L. (1980). Temporary threshold shift for masked pure tones. Audiology 19:335-345.

Humes LE. (1991). Understanding the speech-understanding problems of the hearing-impaired. J Am Acad Audiol 2:59-69.

Humes LE, Dirks DD, Bell TS, Ahlstrom C, Kincaid GE. (1986). Application of the articulation index and the speech transmission index to the recognition of speech by hearing-impaired and normal-hearing listeners. J Speech Hear Res 29:447-462.

Humes LE, Dirks DD, Bell TS, Kincaid GE. (1987). Recognition of nonsense syllables by hearing-impaired listeners and by noise-masked normal hearers. J Acoust Soc Am 81:765-773.

Humes LE, Espinoza-Varas B, Watson CS. (1988). Modeling sensorineural hearing loss. I. Model and retrospective evaluation. J Acoust Soc Am 83:188-202.

Humes LE, Roberts L. (1990). Speech-recognition difficulties of the hearing-impaired elderly: the contributions of audibility. J Speech Hear Res 33:726-735.

Irwin RJ, McAuley SF. (1987). Relations among temporal acuity, hearing loss, and the perception of speech distorted by noise and reverberation. J Acoust Soc Am 81:1557-1565.

Jerger J. (1992). Can age-related decline in speech understanding be explained by peripheral hearing loss? J Am Acad Audiol 3:33-38.

Jerger J, Jerger S, Oliver T, Pirozzolo F. (1989). Speech understanding in the elderly. Ear Hear 10:79-89.

Jerger J, Johnson K, Jerger S. (1988). Effect of response criterion on measures of speech understanding in the elderly. Ear Hear 9:49-56.

Kalikow DN, Stevens KN, Elliott LL. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J Acoust Soc Am 61:1337-1351.

Kamm CA, Dirks DD, Bell TS. (1985). Speech recognition and the Articulation Index for normal and hearing-impaired listeners. J Acoust Soc Am 77:281-288.

Keith R, Talis H. (1972). The effects of white noise on PB scores of normal-hearing and hearing-impaired listeners. Audiology 11:177-186.

Konkle DF, Beasley DS, Bess FH. (1977). Intelligibility of time-altered speech in relation to chronological aging. J Speech Hear Res 20:108-115.

Kozhevnikov VA, Chistovich LA. (1965). Speech: Articulation and Perception. [Translated by the Joint Publications Research Service (Washington, DC: No. JPRS 30543)].

Leshowitz B, Lindstrom R. (1979). Masking and speech-to-noise ratio. Audiology and Hearing Education 5:5-8.

Levitt H. (1971). Transformed up-down methods in psychoacoustics. J Acoust Soc Am 49:467-477.

Levitt H. (1982). Speech discrimination ability in the hearing impaired: Vanderbilt Hearing Aid Report. Monographs in Contemporary Audiology: 32-43.

Levitt H, Rabiner LR. (1967). Use of a sequential strategy in intelligibility testing. J Acoust Soc Am 42:609-612.

Lewis HD, Benignus VA, Muller KE, Malott CM, Barton CN. (1988). Babble and random-noise masking of speech in high and low context cue conditions. J Speech Hear Res 31:108-114.

Miller GA, Nicely PE. (1955). An analysis of perceptual confusions among some English consonants. J Acoust Soc Am 27:338-352.

Milner P. (1982). Perception of Filtered Speech by Hearing-impaired Listeners and by Normal Listeners with Simulated Hearing Loss. Unpublished doctoral dissertation, City University of New York.

Needleman AR, Crandell CC. (1992, November). Speech Recognition in Noise by Listeners with Simulated Hearing Loss. Presented at the American Auditory Society annual meeting, San Antonio, TX.

Needleman AR, Crandell CC. (1993a, February). Speech Recognition in Individuals with Simulated Sensorineural Hearing Loss. Presented at the University Texas Southwestern Allied Health Research Forum, Dallas, TX.

Needleman AR, Crandell CC. (1993b, April). Speech Perception by Listeners with Simulated Sensorineural Hearing Loss. Presented at the American Academy of Audiology annual meeting, Phoenix, AZ.

Needleman AR, Crandell CC. (1994). Effects of response bias on speech perception in hearing-impaired and noise-masked normal listeners. Manuscript in preparation.

Nittrouer S, Boothroyd A. (1990). Context effects in phoneme and word recognition by young children and older adults. J Acoust Soc Am 87:2705-2715.

Olsen WO, Noffsinger D, Kurdziel S. (1975). Speech discrimination in quiet and in white noise by patients with peripheral and central lesions. Acta Otolaryngol (Stockh) 80:375-382.

Orchik DJ, Burgess J. (1977). Synthetic sentence identification as a function of the age of the listener. J Am Audiol Soc 3:42-46.

Owen JH. (1981). Influence of acoustical and linguistic factors on the SPIN test difference scores. J Acoust Soc Am 70:672-682.

Pickett JM. (1980). The Sounds of Speech Communication. Baltimore: University Park Press.

Plomp R. (1986). A signal-to-noise ratio model for the speech-reception threshold of the hearing impaired. J Speech Hear Res 29:146-154.

Plomp R, Mimpen AM. (1979). Improving the reliability of testing the speech reception threshold for sentences. Audiology 18:43-52.

Pollack I, Decker LR. (1958). Confidence ratings, message reception, and the receiver operating characteristic. J Acoust Soc Am 30:286-292.

Resnick SB, Dubno JR, Hoffnung S, Levitt H. (1975). Phoneme errors on a nonsense syllable test. J Acoust Soc Am 58(Suppl 1):115.

Rosen R. (1962). Phoneme Identification in Sensorineural Deafness. Unpublished doctoral dissertation, Stanford University.

Schiavetti N, Sitler RW, Metz DE, Houde RA. (1984). Prediction of contextual speech intelligibility from isolated word intelligibility measures. J Speech Hear Res 27:623-626.

Schmitt JR, Carroll MR. (1975). Older listeners' ability to comprehend speaker-generated rate alteration of passages. J Speech Hear Res 28:309-312.

Speaks C. (1967). Intelligibility of filtered synthetic sentences. J Speech Hear Res 10:289-298.

Stach BA, Loiselle LH, Jerger J. (1991). Special hearing aid considerations in elderly patients with auditory processing disorders. Ear Hear 12(Suppl):131S-137S.

Stelmachowicz P, Jesteadt W, Gorga M, Mott J. (1985). Speech perception ability and psychophysical tuning curves in hearing-impaired listeners. J Speech Hear Res 77:620-627.

Summerfield Q. (1987). Speech perception in normal and impaired hearing. Br Med J 43:909-925.

Suter A. (1985). Speech recognition in noise by individuals with mild hearing impairments. J Acoust Soc Am 78:887-900.

Trees D, Turner C. (1986). Spread of masking in normal subjects and subjects with high-frequency hearing loss. Audiology 25:70-83.

Turner CW, Holte LA, Relkin E. (1987). Auditory filtering and the discrimination of spectral shapes by normal and hearing-impaired subjects. J Rehabil Res Dev 24:229-238.

Tyler RS. (1988). Signal processing techniques to reduce the effects of impaired frequency resolution. Hear J 41:34-47.

Tyler RS, Wood E, Fernandes M. (1982). Frequency resolution and hearing loss. Br J Audiol 16:45-63.

Tyler RS, Wood E, Fernandes M. (1983). Frequency resolution and discrimination of constant and dynamic tones in normal and hearing-impaired listeners. J Acoust Soc Am 74:1190-1199.

van Rooij JCGM, Plomp R. (1991). Auditive and cognitive factors in speech perception by elderly listeners. II: multivariate analyses. J Acoust Soc Am 88:2611-2624.

Walden BE. (1984). Speech perception of the hearing-impaired. In: Jerger J, ed. Hearing Disorders in Adults. San Diego: College Hill Press, 263-309.

Walden BE, Prosek RA, Worthington DW. (1975). Auditory and audiovisual feature transmission in hearing-impaired adults. J Speech Hear Res 18:272-280.

Wang M, Bilger R. (1973). Consonant confusions in noise: a study of perceptual features. J Acoust Soc Am 54:1248-1266.

Wang M, Reed C, Bilger R. (1978). A comparison of the effects of filtering and sensorineural hearing loss on patterns of consonant confusions. J Speech Hear Res 21:5-36.

Yanz JL. (1984). The application of the theory of signal detection in the assessment of speech perception. Ear Hear 5:64-71.

Yanz JL, Carlstrom JE, Thibodeau LM. (1985). Self-assessment of communication skills: toward the development of a new audiometric tool. Ear Hear 6:211-215.

Zurek PM, Delhorne LA. (1987). Consonant reception in noise by listeners with mild and moderate sensorineural hearing impairment. J Acoust Soc Am 82:1548-1559.
