Cochlear generation of intermodulation distortion revealed by DPOAE frequency functions in normal and impaired ears
Lisa J. Stover,a) Stephen T. Neely, and Michael P. Gorga
Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131
(Received 29 January 1999; accepted for publication 22 July 1999)
Distortion product otoacoustic emission (DPOAE) frequency functions were measured in normal-hearing and hearing-impaired ears. A fixed-f2/swept-f1 paradigm was used with f2 fixed at half-octave intervals from 1 to 8 kHz. L1 was always 10 dB greater than L2, and L2 was varied from 65 to 10 dB SPL in 5-dB steps. The responses were quantified by the frequency and amplitude of the peak response. Peak responses were closer to f2 in higher frequency regions and for lower intensity stimulation. Results from hearing-impaired subjects suggest that audiometric thresholds at the distortion product frequency, fdp, in addition to hearing status at f2, can affect DPOAE results. Results are discussed in terms of several manifestations of a second resonance model, as well as a dual source model for the generation of DPOAEs as measured in the ear canal of humans. It appears that a dual source model accounts for the data better than second filter models. © 1999 Acoustical Society of America. [S0001-4966(99)03911-9]
PACS numbers: 43.64.Jb [BLM]
INTRODUCTION
It has long been known that the cochlea produces intermodulation distortion. It is most easily demonstrated when two sinusoids, or primaries, of slightly different frequency (f1, f2; f1 < f2) are presented to the ear. The psychophysical perception of this distortion, particularly the cubic distortion product (2f1-f2), has been documented and studied for many years (e.g., Goldstein, 1967; Smoorenburg, 1972). Distortion has also been demonstrated in the responses of primary afferent nerve fibers (Goldstein and Kiang, 1968), in the basilar membrane response (Robles et al., 1993), and in the cochlear microphonic (Dallos et al., 1969). Kemp (1979) reported that this distortion could also be measured acoustically in the ear canal as a distortion product otoacoustic emission (DPOAE).
One characteristic of DPOAEs is that they exhibit a bandpass shape when the frequency of either primary is varied. If one holds f2 constant and varies f1, then the DPOAE will reach a maximum absolute amplitude when the distortion product (fdp) occurs at a specific frequency below f2, and will decrease as fdp either approaches or further separates from f2 (Brown et al., 1992). Similar bandpass shapes in DPOAE amplitude have been reported when f1 is held constant and f2 is swept (Gaskill and Brown, 1990; O’Mahoney and Kemp, 1995), and if fdp is held constant and both primary frequencies are changed (Harris et al., 1989).
There are two reasons why this bandpass characteristic is of interest. First, in trying to define clinical test parameters there is a need to determine the "optimal" frequency ratio, defined as that ratio which most often produces a response, or the f2/f1 ratio which produces the maximum amplitude DPOAE in normal-hearing ears (Harris et al., 1989).1 Although the frequency at which this maximum occurs depends on the overall and relative levels of the primaries, as well as
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 106 (5), November 1999
the frequency region being tested, an f2/f1 ratio of approximately 1.2 generally yields a DPOAE amplitude within 3 dB of the maximum (Harris et al., 1989; Gaskill and Brown, 1990), and has been generally adopted in clinical protocols (Nielson et al., 1993).
The bandpass nature of DPOAEs is also important for the more theoretical study of basic cochlear mechanics. This feature has been investigated to better understand the cochlear mechanism(s) that might underlie its generation. There is considerable empirical evidence that the primary source of distortion is closely linked to the f2 cochlear place (Harris et al., 1992; Martin et al., 1987; Zwicker and Harris, 1990). Theoretically, the primary source generates distortion proportional to the displacement of the basilar membrane distributed around the point of maximal interaction between the two primaries. This means that at the f2 place, distortion is generated that decreases in amplitude monotonically as the primaries are separated, as Fig. 1(b) would suggest and as has been corroborated by empirical data (Robles et al., 1993). This distortion propagates to the fdp place and generates a neural response which leads to the perceived distortion. Thus the psychophysical distortion monotonically decreases in amplitude as the primaries are separated. The question then becomes, "How does a low-pass characteristic at the f2 place become a bandpass characteristic in the ear canal?" Three different hypotheses will be examined herein.
Several cochlear models have been developed which incorporate a second resonance (Neely and Stover, 1993; Allen and Fahey, 1993). It has been suggested that the second resonance may be found in the structure of the tectorial membrane (TM) (Allen and Fahey, 1993; Zwislocki and Kletsky, 1980). This TM theory and model, however, is not supported by the findings of Taschenberger et al. (1995), who found DPOAE frequency functions in owls and lizards that are similar to those seen in mammalian ears. In avian ears, the TM internal structure and its connection to the hair cells are significantly different from mammalian ears. Furthermore,
0001-4966/99/106(5)/2669/10/$15.00
FIG. 1. (a) A schematic of the dual source model. Primary distortion generation occurs at the point of maximal interaction on the basilar membrane between the f1 and f2 stimuli. (1) That distortion then propagates basally to the stapes and apically to the fdp place. (2) At the fdp place a stimulus frequency emission is generated which propagates to the stapes. Those two signals combine at the stapes and for certain frequency (distance) relationships a cancellation of the primary distortion occurs. (b) A schematic of basilar membrane displacement in a fixed-f2/swept-f1 paradigm. The left panel shows a hypothetical displacement pattern for a single f2 stimulus and several f1 stimuli. The point of maximal interaction is indicated by the horizontal dashed lines. The right panel shows the amount of basilar membrane displacement as a function of the distance between f1 and f2. This displacement is assumed to be proportional to the amount of distortion generated.
reptilian ears lack a TM altogether over the higher frequency range. This makes it less likely that the TM is responsible for DPOAE tuning in mammalian ears for which the model was developed. In addition, the idea of a second resonance to explain DPOAE behavior is also questioned by the model of Matthews (1986), which produces typical DPOAE bandpass characteristics without the use of a second resonance.
Whatever the physiological manifestation of the second resonance may be, the models also differ in how the resonance affects the DPOAE response. One theory is that the resonance is associated with the frequency of the peak amplitude of the DPOAE bandpass (Allen and Fahey, 1993). In this model, which we will refer to as the second-filter enhancing model, the resonance enhances the response at its resonant frequency in the basalward traveling wave and thus increases the amplitude in the ear canal. The second theory associates the second resonance with the high-frequency side of the "bandpass." In this model, which we will refer to as the second-filter canceling model, the resonance absorbs energy from the response. Thus the dip on the high-frequency side of DPOAE frequency functions is the result of energy removed from the high-pass nature of the distortion generator that is evident in the psychophysical and basilar membrane data.
An alternative explanation for this phenomenon that also accounts for psychophysical distortion product data is the idea that there are multiple sources for the distortion generated in the cochlea by two tones (Wilson, 1980; Furst et al., 1988; Whitehead et al., 1992). In this model, which is illustrated in Fig. 1(a), distortion is initially generated at or very near to the f2 place and then propagates both basally toward the stapes and apically to the fdp place, where it causes a psychophysical percept. At the fdp place, however, a stimulus frequency emission is also generated which in turn propagates toward the stapes. (It should be noted here that stimulus frequency emission is also generated at the f1 and f2 places; however, the energy generated there does not contain DP frequencies. The primary stimulus frequency emission probably impacts on the relationship between the two primaries but not on the distortion product directly.) Thus the signal measured in the ear canal at the DP frequency represents some combination of inputs from the f2 and fdp places in the cochlea. Support for this hypothesis can be seen in the complex latency that has been observed when DPOAE "filter" functions are converted into "impulse" responses (Stover et al., 1996). For certain distance/time/phase relationships between f2 and fdp, the primary distortion, generated very near the f2 place, and the stimulus frequency emission, generated around the place where fdp is represented, might partially cancel each other, thereby causing a decrease in amplitude of the acoustic distortion as fdp approaches f2. Although this multiple source model is as yet not established, empirical data demonstrating DPOAE bandpass characteristics would be consistent with its predictions.
A resonant tectorial membrane (RTM) model of cochlear mechanics was first proposed nearly two decades ago.
However, the physiology of the tectorial membrane has not been well defined and the TM as a second resonance source remains controversial. RTM models of the cochlea have also been largely abandoned in favor of cochlear amplifier models to explain other phenomena such as spontaneous otoacoustic emissions or the fine structure observed in various otoacoustic emissions (Shera and Zweig, 1993). Nevertheless, the TM appears to be an important structure in the cochlea, and until its function is better understood, the idea of a RTM will remain. On the other hand, it is also well accepted that the source for DPOAEs is distributed across multiple places on the BM (Furst et al., 1988; Harris et al., 1989; Brown et al., 1992). How these sources interact and whether that interaction may be responsible for the overall shape of DPOAE frequency responses or for the microstructure of those responses is still not well understood and completely satisfactory models remain to be developed. Both mechanisms could theoretically be operating simultaneously, either in concert or in opposition.
The present study was designed to further explore the bandpass characteristics of DPOAEs in humans in the hopes of providing additional insights into the underlying mechanisms responsible for this feature. Data from normal-hearing subjects are compared to those from subjects with well-defined cochlear hearing losses, which will be used to "restrict" the cochlear region(s) contributing to any measured response. These data should enable us to directly test the extent to which the multiple source hypothesis is correct.
TABLE I. Audiometric thresholds, age, and test ear for each of 15 hearing-impaired subjects. Thresholds in boldface indicate the f2 frequencies for which data were collected.
Subject: 100 107 113 131 133 134 138 139 141 143 145 146 153 200 203
Age: 62 74 64 67 79 39 54 47 46 36 18 55 57 12 14
Ear: R R R L L R L L L R L R L L R
Audiometric thresholds (dB HL)
500 Hz: 35 35 5 45 5 10 15 5 30 45 45 20 15 0 35
750 Hz: 25 10 30 5 5 15 5 40 50 45 10 30 0 30
1000 Hz: 20 10 10 20 0 5 20 10 35 45 40 5 35 20 40
1500 Hz: 15 20 10 15 0 0 25 5 15 45 20 10 10 25 40
2000 Hz: 0 30 15 10 10 0 30 5 10 30 5 5 10 45 35
3000 Hz: 25 30 40 15 20 0 25 5 10 25 10 25 25 45 25
4000 Hz: 35 40 45 15 30 0 20 20 10 20 5 40 55 10 5
6000 Hz: 55 45 25 25 65 45 35 10 10 20 5 65 65 0
8000 Hz: 35 75 45 20 25 75 40 35 5 10 10 25 65 55 5
I. METHODS
Fourteen young adults with normal hearing and 15 adolescents and adults with hearing impairment served as subjects. Normal hearing was defined as audiometric thresholds of 20 dB HL or better for half-octave frequencies from 500 to 8000 Hz. All hearing losses were cochlear in origin and stable. Both groups of subjects had normal middle ear function at the time of DPOAE data collection, determined by standard clinical acoustic immittance procedures. All subjects were seated in a comfortable recliner in a sound-treated room. During data collection they were asked only to sit quietly and were allowed to read or sleep.
Considerable effort was made to include subjects with atypical audiometric configurations that were thought to possibly isolate DPOAE sources. (The audiometric thresholds for the 15 hearing-impaired subjects are given in Table I.) Subjects with steeply rising audiometric configurations were of particular interest because these subjects presumably could have normal cochlear function at the primary generation site for distortion (i.e., very near to f2), while having abnormal cochlear function at the site where a stimulus frequency distortion product would be generated (i.e., the fdp place). In contrast to data from subjects with normal cochlear function at both the f2 and fdp places, or data from subjects with more typical sloping configurations where fdp would be less compromised than f2, the DPOAEs from subjects with rising audiometric configurations should include distortion generated only at the f2 place.
A fixed-f2/swept-f1 paradigm was used with f2 fixed at one of seven frequencies in half-octave steps from 1000 to 8000 Hz. For each f2 frequency, f1 was moved from an f2/f1 ratio of 1.01 to 1.5 in 25-Hz steps. L1 was always 10 dB greater than L2, and L2 was decreased in 5-dB steps from 65 dB SPL until no discernible response could be observed. Stimuli have been described in more detail elsewhere (Stover et al., 1996). Stimulus generation and data acquisition were controlled by locally written software (EMAV, Neely and Liu, 1993). Stimuli were delivered to the ear canal via Etymotic ER-2 transducers and the ear canal sound pressure was measured using an Etymotic ER-10B microphone system. For each f2/f1 combination, the following data were obtained: L1, L2, and amplitude, phase, and noise floor at each of four DP frequencies (2f1-f2, 3f1-2f2, 4f1-3f2, and either 5f1-4f2 or 2f2-f1). These data were then used to construct DPOAE frequency functions, which are plots of DPOAE amplitude as a function of fdp (see Fig. 2 for example). For each DPOAE frequency function, the following parameters were measured: the peak amplitude, DP frequency at which peak amplitude occurred, the low and high DP frequency edges of the complete response, and the average amplitude of the complete response. Although an attempt was made to analyze a 3-dB bandwidth, it was not possible to measure in all cases, especially when there was considerable microstructure in the response or if spontaneous otoacoustic emissions (SOAEs) were present. (While SOAEs were present in some ears, only two or three cases arose where SOAE frequencies were within the range being measured. In no case did SOAEs change the overall shape or interpretation of the data.) Data collection required approximately 10 hours in normal-hearing subjects divided into 1–2-hour sessions and anywhere from 2 to 10 hours in hearing-impaired subjects, depending on amount of hearing loss and dynamic range of their DPOAE response. Further details of data collection have been reported previously (Stover et al., 1996).
II. RESULTS
A. Normal-hearing subjects
Figure 2 displays mean data from all 14 subjects with normal hearing. Although the mean functions were smoother, lacking some of the fine structure of data from individual subjects, the relevant features remain consistently salient. Therefore, most of the results from the normal subjects will be illustrated with mean data. In each panel, DPOAE amplitude is plotted as a function of fdp on a logarithmic scale (i.e., in octaves relative to f2) in order to allow comparison of data across the full range of f2 frequencies.2 The f2 frequency is noted in the upper-left corner of each panel. Within each panel, individual functions represent data for different primary levels, ranging from L2 = 65 to 35 dB SPL in 5-dB steps.
As Fig. 2 demonstrates, the peak of the DPOAE frequency function approaches f2 and the bandwidth narrows as f2 increases in frequency. For example, at 8 kHz with L1/L2 = 65/55 dB SPL, the peak occurred at a DP frequency 0.5 octaves below 8 kHz (i.e., approximately 5600 Hz). In contrast, the peak occurred at an fdp about 0.75 octaves below f2 when f2 was fixed at 2 kHz for the same stimulus levels. The peak of the functions systematically became more distant (on a logarithmic scale) as f2 decreased. This feature of the data is summarized in Fig. 3(a), which shows the mean frequency functions for each f2 for stimulus intensities of 65/55 dB SPL. The symbols and bar at the bottom of the figure indicate the frequencies and range of amplitude maxima. These results are consistent with previous reports (Harris et al., 1989) and with what is known about tuning of the peripheral auditory system. Specifically, psychophysical tuning curves, neural tuning curves, and basilar membrane motion studies all indicate that the auditory system is more broadly tuned, on a logarithmic frequency scale, as frequency decreases.
A second feature of the frequency functions in Fig. 2 is a shift in peak frequency as stimulus level was reduced. This pattern is summarized for all stimulus intensities and frequencies in Fig. 3(b), which plots the mean DP frequency at which the peak amplitude occurred as a function of stimulus level. The parameter is f2 frequency.
FIG. 2. Mean DPOAE frequency responses from all normal subjects for five different f2 frequencies indicated at the top left of each panel. The amplitude of the DPOAE is plotted as a function of the DP frequency. The level of stimulation used to elicit the response is indicated by the symbol as described in the lower panel. Note the broadening of the response in lower frequency regions and higher levels of stimulation. Note also the relative stability of the high side of the function as opposed to either the low side or the peak. (Error bars are not included in this figure for the sake of clarity; however, a general idea of the variation in the individual data included in these mean data can be seen in Figs. 4–6.)
FIG. 3. (a) Mean DPOAE frequency functions with L1/L2 = 65/55 dB SPL for five f2 frequencies from 2 to 8 kHz. The frequencies are indicated by the symbols as given in the top left of the figure. The bars and symbols at the bottom of the figure indicate the systematic shift of the frequencies of peak amplitudes toward f2 as f2 increases. Note also the similarity among frequency functions on the high-frequency side. (b) Mean peak frequencies as a function of L2 for five f2 frequencies indicated by the symbols as described in (a). There is a systematic shift away from f2 as the level of stimulation is increased.
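The octave distances used to describe these frequency functions follow from simple arithmetic on the fixed-f2/swept-f1 paradigm. The sketch below is only an illustration of that arithmetic, not the analysis software used in the study; the function names `sweep_point` and `ratio_for_octaves` are hypothetical. It converts an f2/f1 ratio into the 2f1-f2 frequency and its distance below f2 in octaves, and inverts that relation.

```python
import math


def sweep_point(f2_hz, ratio):
    """Return (f1, f_dp, octaves_below_f2) for one f2/f1 ratio.

    f_dp is the cubic distortion product 2*f1 - f2 (hypothetical helper,
    written only to illustrate the arithmetic).
    """
    f1 = f2_hz / ratio
    f_dp = 2.0 * f1 - f2_hz
    octaves_below = math.log2(f2_hz / f_dp)
    return f1, f_dp, octaves_below


def ratio_for_octaves(octaves_below):
    """Return the f2/f1 ratio that places 2*f1 - f2 a given number of
    octaves below f2. From f_dp/f2 = 2/ratio - 1."""
    rel_dp = 2.0 ** (-octaves_below)  # f_dp expressed as a fraction of f2
    return 2.0 / (1.0 + rel_dp)


if __name__ == "__main__":
    # Sweep limits from the Methods: f2/f1 ratios of 1.01 and 1.5.
    for ratio in (1.01, 1.2, 1.5):
        f1, f_dp, octs = sweep_point(8000.0, ratio)
        print(f"ratio {ratio:.2f}: f1 = {f1:.0f} Hz, "
              f"f_dp = {f_dp:.0f} Hz, {octs:.2f} octaves below f2")
    # Peak locations quoted in the text: 0.5 and 0.75 octaves below f2.
    for octs in (0.5, 0.75):
        print(f"{octs} octaves below f2 -> f2/f1 = {ratio_for_octaves(octs):.3f}")
```

Under these definitions, a peak 0.5 octaves below f2 corresponds to an f2/f1 ratio of about 1.17, and a peak 0.75 octaves below f2 to a ratio of about 1.25, which brackets the ratio of approximately 1.2 adopted in clinical protocols.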
The peak occurs between 0.22 and 0.8 octaves below f2; however, it migrates systematically toward f2 as f2 is increased or L2 is decreased. As with the previously discussed frequency effect, this primary level effect might be expected based on psychophysical and physiological data. Specifically, masking patterns and the low-frequency shift of tuning curves both indicate a loss of tuning with increased intensity of stimulation.
Perhaps the most striking feature in Fig. 2 is the asymmetry of both the frequency and level effects. As intensity is increased in each panel, the peak shifts away from f2, but this effect is primarily due to the broadening of the low-frequency side of the functions. While the high-frequency side remains relatively stable, particularly in terms of slope, the low-frequency side changes in both slope and amplitude. Although it is not as readily apparent in Fig. 2, the same statement can be made for the peak shift across f2. It can be seen more easily in Fig. 3(a) that the broadening occurs only on the low-frequency side of the function, while the high-frequency slope remains relatively constant across f2.
There are two possible explanations for this asymmetry. First, this pattern is very consistent with a dual source model. The two sources, distributed near the f2 and fdp places, do not change cochlear places with changes in level. Therefore, the frequency of cancellation would also remain constant, thus resulting in the "dip" occurring at the same frequency across level. The amount of cancellation might change with level because of the different saturation rates and amplitudes of the two sources. Specifically, the stimulus frequency generator saturates at input levels approximating 40 dB SPL (Kemp and Chum, 1980), while the primary generator (presumably very close to f2) does not saturate, at least for the levels used in this study. Thus, at high levels of stimulation, the level of the "interfering" stimulus (from fdp place) is reduced relative to the primary generation and the "dip" becomes less pronounced (see Fig. 2).
Because frequency is mapped logarithmically along the cochlear partition, one would expect that the distance, in octaves, between f2 and fdp, for which cancellation would occur, would be similar regardless of f2. The data shown in Fig. 3(a) are consistent with this idea. The second explanation for this asymmetry in the data is that a second resonance exists, which would, however, be measured by the "dip" in the function rather than the peak. That is, the second resonance would change the high-pass characteristic of the primary generator by absorbing energy rather than adding energy. Such a second resonance would be located at the zero of the function rather than at the peak. This resonance also would have to be sharply tuned and relatively independent of both f2 and stimulus intensity. No such model has, as yet, been proposed, perhaps because of such unusual requirements.
In summary, the normal cochlea produces distortion in response to two tones of slightly different frequency. DPOAE frequency functions, in which f2 is held constant and f1 is varied, reveal an amplitude peak whose frequency location is determined by both the frequency region being stimulated and the intensity of that stimulation. The high-frequency slopes of these functions, however, are relatively constant across both frequency and intensity. Because various arguments can be made for each of the models regarding these features, however, data from normal-hearing subjects cannot adequately differentiate between the three models under consideration.
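The level dependence of the "dip" under a dual source account can be illustrated with a two-phasor sum. This is a deliberately simplified sketch, not a cochlear model and not the authors' implementation: each source is reduced to a single amplitude and relative phase at the stapes, all numerical levels are hypothetical, and the function names are invented for the example.

```python
import cmath
import math


def combined_level_db(primary_db, secondary_db, phase_rad):
    """Level (dB) of the sum of two components at the DP frequency.

    primary_db   : level of the component from the f2 place (hypothetical)
    secondary_db : level of the component from the fdp place (hypothetical)
    phase_rad    : phase of the secondary relative to the primary
    """
    a1 = 10.0 ** (primary_db / 20.0)
    a2 = 10.0 ** (secondary_db / 20.0)
    total = a1 + a2 * cmath.exp(1j * phase_rad)
    # Floor the magnitude so that exact cancellation does not give log10(0).
    return 20.0 * math.log10(max(abs(total), 1e-12))


def dip_depth_db(primary_db, secondary_db):
    """How far the antiphase sum falls below the primary component alone."""
    return primary_db - combined_level_db(primary_db, secondary_db, math.pi)


if __name__ == "__main__":
    # A saturated secondary source stays fixed while the primary grows,
    # so the secondary falls from 6 to 20 dB below the primary.
    for gap_db in (6.0, 12.0, 20.0):
        print(f"secondary {gap_db:.0f} dB below primary: "
              f"dip = {dip_depth_db(0.0, -gap_db):.2f} dB")
    # With the secondary source effectively removed, no dip remains.
    print(f"secondary removed: dip = {dip_depth_db(0.0, -200.0):.2f} dB")
```

In this sketch the dip shrinks as the secondary component falls further below the primary, which is the qualitative behavior attributed above to saturation of the stimulus frequency generator at high stimulus levels.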
III. HEARING-IMPAIRED SUBJECTS
Data from subjects with hearing loss provide a unique opportunity to evaluate different explanations for the shape of DPOAE frequency functions. For instance, one might expect a broadened "filter" response if these frequency functions reflect changes in frequency selectivity similar to the changes observed in other measures of auditory function (i.e., tuning curves, critical bands) following cochlear damage. On the other hand, if there are multiple sources, one might be able to isolate those sources through frequency-specific hearing losses. Therefore, attempts were made to include hearing losses in the mild range (when some damage exists but not enough to completely eliminate DPOAE responses) and those with audiometric configurations in which DPOAEs might be eliminated from one source but not the other. (It should be noted that audiometric thresholds at half-octave frequencies do not supply the precision or accuracy needed to evaluate cochlear function for the purposes stated here. However, if the effects we predict can be seen with such a gross measure of cochlear status, then the argument is only strengthened further.)
The nature of these data is such that it is not possible to collapse results across hearing-impaired subjects. We are constrained to illustrate response patterns by presenting data from individual subjects with hearing loss. These individual sets of data are compared with the overall response patterns that were seen in subjects with normal hearing. Subjects were selected to represent hearing losses whose underlying cochlear conditions might represent one of the following situations: (1) intact cochlea at both f2 and fdp regions with damage in other regions; (2) intact cochlea in fdp region with damage at f2; and (3) intact cochlea at f2 with damage in fdp regions. The first situation was chosen to ensure that areas of the cochlea remote to the frequency regions of interest are not involved in the generation of DPOAEs.
This can be seen in areas of normal hearing in subjects with high-frequency hearing loss (i.e., S100, S113, S133, S134, S139, and S146 for f2 = 2 kHz; S131, S134, S139, and S141 for f2 = 4 kHz, etc.). An example, shown in Fig. 4, demonstrates that, if the f2 place and the fdp region are intact, then the patterns of acoustic distortion are essentially normal. The panels on the left show the frequency functions for a subject with hearing loss, whose audiogram is shown on the right. Data from four primary levels are shown for illustration and are compared to the mean frequency functions for the 14 normal-hearing subjects (solid lines). The dotted lines represent ± one standard deviation based on the normative data set. The star in the audiogram represents f2 and the short and long horizontal bars indicate the ranges of f1 and fdp, respectively. For this subject, hearing thresholds (and thus presumably cochlear status) are normal throughout the frequency range of all three parameters (f2, f1, fdp). The DPOAE frequency functions also fall within the normal range in terms of peak amplitude, peak frequency, overall shape, and the dynamic range of the response. Thus, these data are consistent with the prediction that remote regions of the cochlea do not contribute to the DPOAE.
The second situation was chosen such that only the primary distortion generation site (f2) had sustained some damage. This situation is one in which predictions differ between the second resonance models and the dual source model. For all three models, primary generation must happen at the f2 place. However, a second resonance model would predict a broadening of the frequency response similar to the changes that have been observed in other measures of frequency selectivity in damaged cochleae. Thus one might expect either a broadening of the peak (second-filter enhancing) or the dip (second-filter canceling), or a shifting of either peak or dip further away from f2. In a dual source model, the relative contributions from the f2 and fdp places would change because the output of the f2 generator would be reduced, while the output from the fdp place would be reduced very little, if at all, because of the assumed saturation of the stimulus frequency emission generator. Thus with greater relative contribution from the "interfering" stimulus (from fdp place), one would predict that there would be more cancellation, which would manifest as a broader dip in the frequency function. Because the primary generator at f2 is affected, all three models would predict reduced amplitude and smaller dynamic range. This situation can be found in the typical sloping hearing loss (i.e., S107 at 2 kHz; S100 and S113 at 3 kHz; S133, S146, and S153 at 4 kHz; S134, S138, and S139 at 6 kHz). It should be noted that when threshold was over 40 dB HL for f2, either no response or only a weak response at the highest levels of stimulation could be measured, with the exception of one case (S200 at 3 kHz). In several cases, no response could be seen when threshold at f2 was between 20 and 40 dB HL (S139 at 8 kHz and S143 at 2 kHz). In all other cases of mild hearing loss, a response could be measured with sufficient dynamic range to evaluate the data.
FIG. 4. Frequency responses from a hearing-impaired subject with normal hearing in the frequency range being measured. The subject’s audiogram is given on the right with the frequency indicated by the large star symbol. The bars at the top of the audiogram indicate the frequency ranges of f1 and fdp. The left column shows the subject’s DPOAE response by the symbols for a range of stimulus intensities (given to the left of each function). The amplitude range is given in the bottom panel and the horizontal dashed line in each panel indicates the -20-dB SPL noise floor reference. For comparison, the mean normal frequency function is given in each panel by the solid line, ± one standard deviation by the dotted lines. For this subject the response is essentially normal in shape and amplitude.
Figure 5 shows audiometric and DPOAE data for a subject in whom the primary generation site was functionally abnormal but still capable of producing distortion. The representations follow the same conventions as were used in Fig. 4. The overall amplitude is, indeed, reduced compared to data from normal subjects, as is the dynamic range due to an elevated threshold of the DPOAE. The shape of the response, however, does not indicate any broadening of the peak, nor is it shifted in frequency, as would be expected from a model in which a second resonance is manifested in the peak of the response. The "dip," however, is broader, as predicted by both the second resonance canceling model and the dual source model.
The final case was chosen to represent the situation in which the region of primary generation within the cochlea (f2) was normal, but the regions where the secondary generation might occur (i.e., the regions where fdp is represented) were damaged. This is the situation that will most clearly distinguish between the two explanations. Both second resonance models would predict normal DPOAE frequency functions as long as the cochlea was intact in the region of primary distortion generation (i.e., the f2 place) because the second resonance is assumed to be located at, or near, the f2 place. The dual source model, on the other hand, would predict a more high-pass characteristic to the DPOAE frequency function because the regions responsible for generating the secondary energy (i.e., the fdp place) would not produce a response. Thus little or no cancellation might occur. In this instance, a more atypical upward sloping audiogram is required, even one with a fairly steep slope in order to remove the lower frequency fdp contributions. This criterion was met in several of the subjects (i.e., S141, S145, and S153 at 1500 Hz; S145 at 2 kHz; and S200 and S203 at 4 kHz). While there were several other instances of reverse sloping audiograms, the slopes were probably too shallow to remove the fdp component. The results for this situation are demonstrated in Fig. 6 following the conventions of Figs. 4 and 5. There is clearly a difference between the shape of this response and that from normal-hearing subjects. The response has a high-pass characteristic, as a dual source model would predict.
Note, however, that the peak amplitude (which in this case occurs at a frequency very close to f2) and dynamic range are within the normal range of variability, again consistent with the predicted pattern for a dual source model if the cochlea is intact at f2.

FIG. 5. As in Fig. 4. In this case the subject has a mild hearing loss at f2 with improving thresholds in the range of fdp. The DPOAE is reduced in amplitude but the peak is not systematically shifted from the normal response. However, the high-frequency side "dip" is broader than the normals.

In summary, the data from hearing-impaired subjects would appear to favor a dual source model of DPOAE generation. Specifically, it was observed that regions remote from f2 and fdp are not involved in distortion generation; that even if the f2 region was damaged, the response does not broaden, as a "filter" model would predict; and that if the fdp region is damaged, thus eliminating the "interfering" stimulus, the apparent bandpass characteristic is eliminated. If input from a second DP source is intact, relatively more cancellation occurs (seen as a broadening of the dip), and if the second DP source is removed, the frequency response changes, and reveals a high-pass characteristic of the primary f2 source alone. Although the broadening of the dip could also be explained by a second resonance theory if the action of the resonance is energy absorption, the results from
the upward sloping hearing losses cannot be explained by either of the second resonance hypotheses.

IV. DISCUSSION
We have considered three classes of models in attempts to account for DPOAE frequency functions: a second resonance filtering distortion which is bandpass in nature and defined by the peak of the function; a second resonance which is energy absorbing and defined by the "dip" of the function as the primaries approach each other; and a dual source model in which two sources of energy combine in such a way as to cancel the distortion as measured in the ear canal. While the present data do not disagree with previous data, our conclusions regarding underlying mechanisms involve different models. The underlying assumptions of each model can be assessed by evaluating the present data in different ways. For instance, the peak is only relevant to the enhancing second resonance model, while the "dip" is intrinsically more important to the other two models. We should be able to determine which model is most consistent with these data by examining the underlying assumptions of each model and comparing all aspects of the data rather than simply the peak frequency and amplitude.

FIG. 6. As in Figs. 4 and 5. This subject has normal hearing at f2 but reduced hearing in the fdp region. Without the contribution from the fdp region the response is characterized by a high-pass shape and is significantly different than the normal frequency functions. This is difficult to explain in terms of a second filter model but is predicted by a dual source model.

The concept of a second cochlear filter has been considered for some time. A second filter was hypothesized to account for the difference between broad basilar membrane tuning and sharp neural tuning (Evans and Wilson, 1973). The need for a second filter was eliminated as better measurement techniques for BM motion were developed and the difference between mechanical and neural tuning decreased. Most of the phenomena which were attributed to a second resonance are now believed to be due to the action of the motile outer hair cell system. Since many measures of cochlear response can now be accounted for without involving a second filter, perhaps such a "filter" is not needed to account for DPOAE data either.

Two arguments have been made for second resonance models. The original explanations for the DPOAE frequency functions involved either a suppression or a cancellation process between the two primary stimuli. The second filter idea was resurrected when higher-order distortion data refuted the primary interaction theory (Brown et al., 1992; Brown and Williams, 1993). The argument was that if the results were not due to the two primaries interacting, then there may be a "filter" associated with the f2 place which would act on all distortion products similarly. However, the original explanation could be altered such that the DPOAE frequency functions involve a cancellation process between f2 and fdp, rather than f1.
In that case, "filter functions" would be independent of DP order (at least in terms of the frequency of the peak or the "dip") but would depend on the frequency at which the DP occurs, be it 2f1 − f2, 3f1 − 2f2, or any other DP. This is essentially the dual source model.

A second argument for a DPOAE filter revolved around the half-octave shift between the peak frequency and f2. The half-octave difference is consistent with several well-studied psychophysical and physiological phenomena, such as noise-induced temporary threshold shift (TTS), contralateral efferent effects, and the tip/tail juncture in tuning curves. The first and third of these phenomena (TTS and the tip/tail juncture) have been explained by a second resonance hypothesis (Allen and Fahey, 1993); there are no current theories regarding efferent effects which involve a second cochlear resonance. Furthermore, the observation of effects at the "half-octave" may be an oversimplification of the data. The present data do not indicate an exact "half-octave" between f2 and the peak, but rather a range from a quarter-octave to one full octave. There is a similar imprecision to the half-octave relationships in the other types of data as well.

At a first pass, it would seem that some features of DPOAE frequency functions are consistent with other frequency tuning characteristics. The changes in "tuning" of the second filter (both in terms of peak-to-f2 and filter bandwidth) with frequency are similar to those seen in both psychophysical and physiological data. The primary resonance
of the basilar membrane is indeed more sharply tuned at the base than the apex. If the bandwidth of a second resonance is also frequency dependent, then the two frequency maps will not be parallel but rather closer together at the base and increasingly more separated further toward the apex. Such a relationship has been reported in estimates of a second frequency map for cat (Allen and Fahey, 1993), although those calculations are subject to the argument, above, regarding the imprecision and subjectivity of tip/tail measurement.

Basilar membrane, neural, and psychophysical data also show that the auditory system loses some tuning as stimulus level increases. However, the high degree of tuning for low level stimuli results from the action of the outer hair cells (OHCs) when they are in their linear mode of operation. As stimulus level is increased, the relative input from the OHCs becomes less, due to their saturating nonlinearity. The basilar membrane remains broadly tuned across levels of stimulation. Thus the level effect is not intrinsic to the basilar membrane, but rather is due to the added energy of the OHCs. Invoking the same line of reasoning for a second resonance, in order for the same level effect to be seen, the mechanism of that second resonance (presumably the TM) will also probably be affected by the OHCs rather than intrinsically changing its tuning with level of stimulation. It is not clear whether the coupling of the stereocilia with the TM is strong enough to actually change its motion as would be required by this line of reasoning.

The fact that there is a high-pass characteristic seen in frequency functions of basilar membrane motion and in the psychophysical distortion product literature suggests a two-step process in DPOAE generation. The second "filtering" must occur either after the primary generation in time or basal to it in order for the high-pass stimulus to propagate to the DP place and generate the psychophysical percept.
Given this constraint, the input to the second resonance is not the same as the input to the primary distortion generator (the representation of the two primary tones near f2 place), but rather the output of the distortion generator which is characteristically high pass in nature. The TM is intrinsically involved in the primary generation of distortion as a consequence of its role in the shear displacement of the stereocilia of the OHCs. Thus for the TM to be the source of a second resonance it would have to be responsible for both a high-pass response and also a bandpass second resonance response at the same time.

The previous discussion applies to a second filter defined by the peak of DPOAE frequency functions. However, a second resonance might be more appropriately described by the "dip" of the frequency function. The second resonance must account for the difference between primary generation, distributed at the f2 place, which has a high-pass characteristic (according to psychophysical and physiological data), and the acoustic distortion in the ear canal, which has a bandpass characteristic. The primary difference between the two "filter responses" is a removal of energy for fdp's approaching f2. Even a peak-defined filter will have to include a "zero" to account for this feature. The second resonance would thus be absorbing energy rather than adding energy to the response.
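Because the argument turns on how closely fdp approaches f2, it may help to make the frequency arithmetic explicit. The following is a minimal sketch, assuming the cubic distortion product fdp = 2f1 − f2 and a fixed f2 with f1 set by the ratio f2/f1. The function names and the example ratios of 1.1 and 4/3 are illustrative choices, not values taken from the measurement procedure.

```python
import math


def cubic_dp_frequency(f1_hz, f2_hz):
    """Frequency of the cubic distortion product, fdp = 2*f1 - f2."""
    return 2.0 * f1_hz - f2_hz


def dp_octaves_below_f2(f2_hz, ratio):
    """Distance of fdp below f2, in octaves, for a given f2/f1 ratio."""
    if not 1.0 < ratio < 2.0:
        # fdp is positive and below f2 only when f1 < f2 < 2*f1.
        raise ValueError("f2/f1 ratio must lie between 1 and 2")
    f1_hz = f2_hz / ratio
    fdp_hz = cubic_dp_frequency(f1_hz, f2_hz)
    return math.log2(f2_hz / fdp_hz)


f2_hz = 4000.0  # one of the f2 frequencies used in the study
for ratio in (1.1, 1.2, 4.0 / 3.0):
    fdp_hz = cubic_dp_frequency(f2_hz / ratio, f2_hz)
    octaves = dp_octaves_below_f2(f2_hz, ratio)
    print(f"f2/f1 = {ratio:.3f}: fdp = {fdp_hz:7.1f} Hz, {octaves:.2f} octave below f2")
```

With these definitions, a ratio of 1.2 places fdp about 0.58 octave below f2, a ratio of 1.1 places it about 0.29 octave below, and a ratio of 4/3 places it exactly one octave below. The distance in octaves depends only on the ratio, not on f2 itself.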
A second filter of this type would be inherently different from one defined by the peak. First of all, such a filter would necessarily be highly tuned, as the often seen increase in DPOAE amplitude for very close f2/fdp relationships on the other side of the dip would suggest. Although the "dip" was not quantified in the present data (and would be more difficult to quantify than the peak), qualitative observation indicates that it is relatively independent of f2, or cochlear location, a pattern that would be consistent with a second frequency map parallel to the BM frequency map. The frequency of the dip is also independent of stimulus intensity, although the depth of the dip seems to be less at higher intensities. These characteristics of a "dip"-defined second filter allow it to be less intrinsically attached to the primary distortion generator, which avoids some of the difficulties just discussed. However, there are currently no second resonance models of this type.

The dual source model is consistent with the data from normal-hearing subjects. This model can account for the difference between psychophysical and acoustic distortion, because it is inherently a cancellation model and thus is manifested by the "dip" rather than the peak. The peak is simply a function of the juxtaposition of the high-pass primary distortion generator and the cancellation from the second source. Because frequency is logarithmically mapped, the frequency of the dip should be relatively independent of f2 frequency, when plotted on an octave scale, as shown in Fig. 3. The frequency of the dip should also be independent of stimulus intensity. The model would, however, predict changes in the relative contributions from the two sources as stimulus intensity changed, because the stimulus frequency emission from the fdp place would be expected to saturate more and to lower levels than the primary generator at f2.
Thus there would not be changes in the frequency of the effect but the amount of cancellation would be less for higher intensity stimulation. Although a general trend toward shallower dips at high intensities was observed, further study is needed prior to any conclusive statements.

The data from the hearing-impaired subjects present perhaps the strongest argument in favor of a dual source model. The fact that predicted high-pass frequency functions were obtained from subjects in which hearing loss was used to isolate the primary source (at f2 place) is difficult for a second filter model to explain. If a second filter is associated with the generator at f2 place, cochlear damage at fdp place should not affect the output. Damage at the f2 place may limit the output magnitude but the overall frequency function is unchanged as long as there remain two cochlear sources. This type of data requires further exploration, particularly with a more precise measure of hearing (cochlear) status, such as threshold microstructure with laboratory threshold procedures rather than half-octave thresholds with clinical threshold procedures. However, the preliminary results from these few subjects present a challenge for any second-filter model.

In summary, the arguments for a second resonance explanation for DPOAE frequency functions are not compelling. Furthermore, although it is not clear what effect such a second resonance would have on modeling cochlear processes other than DPOAEs, there are some inherent difficulties in modeling an underlying mechanism as a second filter. We do not argue, necessarily, that there is no second resonance in the cochlea, only that it is not a necessary, nor the most parsimonious, explanation for DPOAE frequency functions. A dual source model accounts for the data and does not have far-reaching implications in terms of other auditory processing.

This does not mean that DPOAE frequency functions are not worthy of further investigation. There are significant clinical implications in these findings. The I/O functions and the data from hearing-impaired subjects indicate that a standardized test with an f2/f1 ratio of 1.2 may require further examination, and that configuration of the hearing loss, not just cochlear status at f2, can affect the measured DPOAE response (although if no distortion is generated at f2, the status at the fdp place is irrelevant). Furthermore, the source of the frequency microstructure in the response and its effect on clinical interpretation need to be addressed. These questions require further investigation.

ACKNOWLEDGMENTS
We wish to thank Doug Keefe for helpful comments on an earlier draft of this manuscript. Portions of this work were previously presented at the ARO Midwinter Meeting in 1996 and 1994. This work was supported by grants from the NIDCD (P60DC00982 and R01DC02251).

1 Although defining the stimulus characteristics yielding the largest response in normal ears is useful, from a clinical perspective, optimal stimulus conditions are those that result in the greatest separation between the distribution of responses from ears with normal hearing and ears with hearing loss.

2 The data for f2 at 1000 and 1414 Hz were sparse and often contaminated with a high noise floor. While the same trends presented below were present in these data they are not specifically included in further discussion.
Allen, J. B., and Fahey, P. F. (1993). "A second cochlear-frequency map that correlates distortion product and neural tuning measurements," J. Acoust. Soc. Am. 94, 809–816.
Brown, A. M., and Williams, D. M. (1993). "A second filter in the cochlea," in Biophysics of Hair Cell Sensory Systems, edited by H. Duifhuis, J. W. Horst, P. van Dijk, and S. M. van Netten (World Scientific, London), pp. 72–77.
Brown, A. M., Gaskill, S. A., and Williams, D. M. (1992). "Mechanical filtering of sound in the inner ear," Proc. R. Soc. London, Ser. B 250, 29–34.
Dallos, P., Schoeny, Z. G., Worthington, D. W., and Cheatham, M. A. (1969). "Some problems in the measurement of cochlear distortion," J. Acoust. Soc. Am. 46, 356–361.
Evans, E. F., and Wilson, J. P. (1973). "The frequency selectivity of the cochlea," in Basic Mechanisms of Hearing, edited by A. R. Moller (Academic, New York), pp. 519–554.
Furst, M., Rabinowitz, W. M., and Zurek, P. M. (1988). "Ear canal acoustic distortion at 2f1 − f2 from human ears: Relation to other emissions and perceived combination tones," J. Acoust. Soc. Am. 84, 215–221.
Gaskill, S. A., and Brown, A. M. (1990). "The behavior of the acoustic distortion product, 2f1 − f2, from the human ear and its relation to auditory sensitivity," J. Acoust. Soc. Am. 88, 821–839.
Goldstein, J. L. (1967). "Auditory nonlinearity," J. Acoust. Soc. Am. 41, 676–689.
Goldstein, J. L., and Kiang, N. Y.-S. (1968). "Neural correlates of the aural combination tone 2f1 − f2," Proc. IEEE 56, 981–999.
Harris, F. P., Probst, R., and Xu, L. (1992). "Suppression of the 2f1 − f2 otoacoustic emission in humans," Hear. Res. 64, 133–141.
Harris, F. P., Lonsbury-Martin, B. L., Stagner, B. B., Coats, A. C., and Martin, G. K. (1989). "Acoustic distortion products in humans: Systematic changes in amplitude as a function of f2/f1 ratio," J. Acoust. Soc. Am. 85, 220–229.
Kemp, D. T. (1979). "Evidence of mechanical nonlinearity and frequency selective wave amplification in the cochlea," Arch. Otorhinolaryngol. 224, 37–45.
Kemp, D. T., and Chum, R. (1980). "Observations on the generator mechanism of stimulus frequency acoustic emissions - two tone suppression," in Psychophysical, Physiological and Behavioural Studies in Hearing, edited by G. van den Brink and F. A. Bilsen (Delft University Press, The Netherlands), pp. 34–42.
Martin, G. K., Lonsbury-Martin, B. L., Probst, R., Scheinin, S. A., and Coats, A. C. (1987). "Acoustic distortion products in rabbit ear canal. II. Sites of origin revealed by suppression contours and pure tone exposures," Hear. Res. 28, 191–208.
Matthews, J. W. (1986). "Modeling intracochlear and ear canal distortion product (2f1 − f2)," in Peripheral Auditory Mechanisms, edited by J. B. Allen, J. L. Hall, A. Hubbard, S. T. Neely, and A. Tubis (Springer, New York), pp. 258–265.
Neely, S. T., and Liu, Z. (1993). "EMAV: Otoacoustic emission averager," Tech. Memo No. 17 (Boys Town National Research Hospital, Omaha).
Nielsen, L. H., Popelka, G. R., Rasmussen, A. N., and Osterhammel, P. A. (1993). "Clinical significance of probe-tone frequency ratio on distortion product otoacoustic emissions," Scand. Audiol. 22, 159–164.
O'Mahoney, S., and Kemp, D. T. (1995). "Distortion product otoacoustic emission delay measurement in human ears," J. Acoust. Soc. Am. 97, 3721–3735.
Robles, L., Ruggero, M. A., and Rich, N. C. (1993). "Distortion products at the basilar membrane of the cochlea: Dependence on stimulus frequency and intensity and effect of acoustic trauma," Neurosci. Abstracts 19, 1421.
Shera, C. A., and Zweig, G. (1993). "Noninvasive measurement of the cochlear traveling-wave ratio," J. Acoust. Soc. Am. 93, 3333–3352.
Smoorenburg, G. F. (1972). "Audibility region of combination tones," J. Acoust. Soc. Am. 52, 603–614.
Stover, L. J., Neely, S. T., and Gorga, M. P. (1996). "Latency and multiple sources of distortion product otoacoustic emissions," J. Acoust. Soc. Am. 99, 1016–1024.
Taschenberger, G., Gallo, L., and Manley, G. A. (1995). "Filtering of distortion-product otoacoustic emissions in the inner ear of birds and lizards," Hear. Res. 91, 87–92.
Whitehead, M. L., Lonsbury-Martin, B. L., and Martin, G. K. (1992). "Evidence for two discrete sources of 2f1 − f2 distortion-product otoacoustic emission in rabbit: I. Differential dependence on stimulus parameters," J. Acoust. Soc. Am. 91, 1587–1607.
Wilson, J. P. (1980). "The combination tone, 2f1 − f2, in psychophysics and ear-canal recording," in Psychophysical, Physiological and Behavioural Studies in Hearing, edited by G. van den Brink and F. A. Bilsen (Delft University Press, The Netherlands), pp. 43–52.
Zwislocki, J. J., and Kletsky, E. J. (1980). "Micromechanics in the theory of cochlear mechanics," Hear. Res. 2, 205–212.