Minneapolis, Minnesota
NOISE-CON 2005, October 17-19, 2005
Psychophysical uncertainty estimates for real ear attenuation at threshold measurements in naïve subjects

William J Murphy and David Byrne
National Institute for Occupational Safety and Health
Hearing Loss Prevention Team
4676 Columbia Parkway MS C-27
Cincinnati, OH 45226-1998

Brad Witt and Jesse Duran
Howard Leight Industries, a Division of Bacou-Dalloz
7828 Waterville Road
San Diego, CA
Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Institute for Occupational Safety and Health.
ABSTRACT
The International Organization for Standardization (ISO) TC43 SC1 has directed its working groups to use the “Guide to the expression of Uncertainty in Measurement” (GUM) in the development of new acoustical standards. In light of this resolution, Working Group 17 on hearing protection has added an annex to a proposed standard, ISO 4869-7, describing a subject-fit method for estimation of noise reduction. Data from two laboratories, the NIOSH Pittsburgh Research Laboratory (PRL) and Howard Leight Industries (HLI), were analyzed to examine the within-test variance and between-test variance when subjects were tested using a Bekesy tracking paradigm. The NIOSH data demonstrated that the within-test variance, calculated from the reversals of the Bekesy tracks, did not agree with the between-test variability of the subjects’ repeated measures. This result prompted a revision of the testing software to determine the within-test variance from the midpoints of the Bekesy tracks. The revised software has been installed at both the NIOSH PRL and Howard Leight facilities. This paper compares the within-test and between-test variances determined with reversal levels and with midpoints from these two labs. The variability has been modeled with a simulated subject and Monte Carlo techniques.
1. INTRODUCTION
The American National Standard for Measurement of Real Ear Attenuation of Hearing Protectors [1] has been the subject of considerable debate within the hearing protector testing community. The International Organization for Standardization is in the process of adopting the standard as ISO 4869-7 [2]. The working group has drafted an annex on the uncertainties for Real Ear Attenuation at Threshold (REAT) measurements. Uncertainties and bias errors may arise from equipment calibration, the diffusivity or uniformity of the sound field, subject panel selection, fitting instructions, the threshold measurement paradigm and the fitting of the protector by the subject(s). While the systematic bias errors of equipment and sound field are expected to be small, less than a decibel, the effect of fitting and instruction could be on the order of several decibels. The REAT measurement with uncertainties can be expressed as

$$\mathrm{REAT}(f) = L_{\mathrm{occluded}}(f) - L_{\mathrm{unoccluded}}(f) + \sum_i \delta_i, \tag{1}$$
where the $\delta_i$ are the systematic bias errors for subject selection, sound field, equipment, and fitting instructions. The errors for many of these parameters are unknown or might not be measurable, and yet they must be estimated. Measurement of the systematic bias error of signal generation equipment with a single set of subjects under multiple fits of a protector is prohibitively expensive, if not impossible.

In addition to attempting to understand the sources of systematic errors, an uncertainty budget was developed for each term in Eq. 1. Each source is assigned a standard uncertainty ($u_i$, in dB), a probability distribution (Normal, Uniform, etc.), and a sensitivity coefficient ($c_i$), which together yield its contribution to the overall expanded uncertainty of the measurement ($c_i u_i$, in dB). For instance, the uncertainty of each threshold might be 1 dB, the calibration of the sound field 0.5 dB, the equipment 0.5 dB, and the fit 3 to 6 dB. In the estimate of REAT, the sensitivity coefficient would be 1 for all factors and the errors are likely to be normally distributed. Treating the sources as uncorrelated, their variances add:

$$\sigma_{\mathrm{REAT}}^2 = \sigma_{\mathrm{Occluded}}^2 + \sigma_{\mathrm{Unoccluded}}^2 + \sigma_{\mathrm{Fit}}^2 + \sigma_{\mathrm{Calibration}}^2 + \sigma_{\mathrm{Equipment}}^2 \tag{2}$$
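The budget in Eq. 2 can be evaluated numerically. The following is a minimal sketch using only the example values quoted in the text (1 dB per threshold, 0.5 dB for calibration, 0.5 dB for equipment, 3 to 6 dB for fit, all sensitivity coefficients equal to 1). The function names `combined_uncertainty` and `reat_budget` are illustrative and do not come from the NIOSH software.

```python
import math

def combined_uncertainty(components):
    """Root-sum-of-squares of (sensitivity coefficient * standard uncertainty), in dB."""
    return math.sqrt(sum((c * u) ** 2 for c, u in components))

def reat_budget(fit_db):
    """Combine the example budget terms of Eq. 2 for a given fit uncertainty (dB)."""
    # Each entry is (sensitivity coefficient, standard uncertainty in dB).
    components = [
        (1.0, 1.0),     # occluded threshold
        (1.0, 1.0),     # unoccluded threshold
        (1.0, fit_db),  # fit of the protector
        (1.0, 0.5),     # sound-field calibration
        (1.0, 0.5),     # equipment
    ]
    return combined_uncertainty(components)

low = reat_budget(3.0)
high = reat_budget(6.0)
print(f"Combined uncertainty: {low:.1f} dB to {high:.1f} dB")
```

With these inputs the fit term dominates: the other four terms together contribute only 2.5 dB² of variance, against 9 to 36 dB² from fit.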
With the example values above, Eq. 2 yields an overall measurement uncertainty of about 3.4 to 6.2 dB.

An interlaboratory study of four test labs found evidence of bias due to subject selection and fitting instructions [3, 4]. The subjects in one laboratory were better able to fit a foam hearing protection device and in general performed better than the subjects in the other three labs. The effect was demonstrated to be on the order of 6-10 decibels. Joseph [5] demonstrated that group and individual instruction of a test panel produced increases of 9 and 13 decibels, respectively, in the Noise Reduction Rating over the naïve subject-fit test. Systematic effects due to subject selection and instruction are expected to dominate other potential sources of uncertainty.

The testing laboratory uses a Bekesy tracking procedure in which the subject adjusts the stimulus level by means of a switch. A computer records the levels and responses. When the subject has met the threshold criteria, the average of the ascending and descending tracks is stored and the next stimulus frequency is tested.

During 2000 and 2001, the National Institute for Occupational Safety and Health (NIOSH) developed a laboratory system for measuring REAT for hearing protection devices [6]. The NIOSH software has been installed in three laboratories: NIOSH Taft Laboratories, NIOSH Pittsburgh Research Laboratories, and Howard Leight Industries. One feature that was incorporated was the ability to track the standard deviation of a subject’s individual threshold estimate. In 2004, a study of the variability of repeated threshold measurements using the Bekesy paradigm was conducted. This paper describes the results of the analysis of variance on the subjects’ hearing thresholds for unoccluded and occluded conditions and examines how the within-trial variance can be used to track subject performance.

2. METHODS
In the NIOSH HPDLab software, the experimenter sets the initial number of reversals to be discarded (typically 2), the total number to collect (typically 8 reversals), the minimum range of successive reversals (3 dB), the maximum range of reversals (20 dB), the duty cycle (milliseconds on and off) and the rate of change (dB/sec). If a subject’s responses failed the range criteria, the subject was retested immediately. If the subject failed the task
three times, the experimenter reinstructed the subject on the psychophysical task. Upon completion, the mean and standard deviation of the last six reversals were calculated, stored in the database and reported onscreen to the experimenter.

The NIOSH Pittsburgh Research Laboratory (PRL) tested three protectors for a study during 2004 using the ANSI S12.6 Method B protocol. The specific devices (an earmuff, a foam earplug and a flanged earplug) are not relevant to this analysis and discussion. Thirteen naïve subjects were recruited for the study. Subjects had normal hearing (< 20 dB HL re ANSI S3.6-1996) and no visible otoscopic abnormalities in either ear. Subjects had no experience with the use of hearing protectors or with the testing of protectors as described in ANSI S12.6-1997. Each subject demonstrated proficiency in the psychophysical task by completing five unoccluded threshold measurements using the one-third octave band noise stimuli. The range of acceptable performance was 6 decibels for repeated measurements at all frequencies. The unoccluded and occluded thresholds were then measured twice for each protector. The rate of change of the stimulus level was 5 dB per second and the duty cycle was 250 ms on and 250 ms off.

Howard Leight Industries (HLI) is a division of Bacou-Dalloz, a multinational manufacturer of safety equipment. HLI installed the NIOSH HPDLab software and has been testing subjects since early 2004. Fifteen subjects were recruited to participate in an ongoing study of several hearing protection devices, among which were two earmuffs, an expandable foam earplug and a premolded earplug. Subjects were required to meet the same standards of hearing thresholds listed above and
Figure 1: The variability of REAT threshold measurements determined using standard deviations of reversals and midpoints. The within-trial standard deviations are determined from the average of the standard deviations of the reversals (PRL, red squares) and the standard deviations of the midpoints of a Bekesy track (HLI, blue circles). The between-trial standard deviations are determined from the standard deviation of five unoccluded threshold measurements.
to demonstrate proficiency with the Bekesy testing paradigm. The rate of change of the stimulus level was 5 dB per second and the duty cycle was 275 ms on and 275 ms off. Other than the specific protectors, the only substantial difference between the two laboratories was the way in which the within-trial standard deviation was calculated and stored. The HLI software was updated following an analysis of the PRL data to calculate the within-trial standard deviation based upon the midpoints of the Bekesy tracks.

Figure 1 compares the standard deviations of all threshold estimates collected at PRL (red squares) and HLI (blue circles) during the training portion of the testing. The abscissa is the average of the subject’s within-trial standard deviations determined with the reversal method (PRL) or the midpoint method (HLI), and the ordinate is the standard deviation of the repeated trials for the unoccluded training thresholds. Little difference in the variability of the subjects’ repeated thresholds is evident between the two laboratories; however, the method of estimating the within-trial variance has a significant effect. This result was expected.
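The difference between the two within-trial estimates can be illustrated with a short sketch. The reversal levels below are hypothetical, and the definition of a midpoint as the average of two successive reversals (a peak and the adjacent valley) is an assumption; the exact calculation used in HPDLab is not described here.

```python
from statistics import mean, stdev

def midpoints(reversals):
    """Average each pair of successive reversals (assumed midpoint definition)."""
    return [(a + b) / 2.0 for a, b in zip(reversals, reversals[1:])]

# Hypothetical kept reversal levels (dB) from a single Bekesy track,
# alternating between upper and lower turning points.
reversals = [2.0, -2.0, 3.0, -1.0, 2.0, -2.0]
mids = midpoints(reversals)

threshold = mean(reversals)        # threshold estimate for the trial
sd_reversals = stdev(reversals)    # reversal method (as at PRL)
sd_midpoints = stdev(mids)         # midpoint method (as at HLI)

print(f"threshold estimate:  {threshold:.2f} dB")
print(f"SD of reversals:     {sd_reversals:.2f} dB")
print(f"SD of midpoints:     {sd_midpoints:.2f} dB")
```

The standard deviation of the reversals is dominated by the excursion between turning points, whereas the midpoints cancel most of that excursion and leave only the drift of the track centre.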
Figure 2: The within and between-trial standard deviations for two devices estimated from reversals and midpoints. Occluded trials are blue-filled diamonds and unoccluded trials are white or red diamonds. Each frequency is shown shifted slightly about the stimulus center frequency. The within-trial standard deviations from reversals for Muff 1 range between 6 and 11 dB. For Plug 2, the within-trial standard deviations range from 0.3 to 3 dB. The between-trial standard deviations were not significantly different for the muff while the plug exhibited significant group differences at 250, 500 and 1000 Hz indicated by red symbols.
An analysis of variance was conducted on the test results for the subjects from the two facilities. The standard deviations for the between- and within-trial variance were extracted for each subject using the mean thresholds and standard deviations recorded by the HPDLab software. In Figure 2, a representative set of standard deviations is plotted for the data measured with reversals (PRL Muff 1) and with midpoints (HLI Plug 2). The standard deviations for between- and within-trial variability from an earmuff are plotted in
the two left panels and the between- and within-trial variability data from an earplug are plotted in the right panels. The open diamonds are the unoccluded results and the filled diamonds are the occluded data. Occluded and unoccluded results are offset from the associated octave-band frequency to facilitate interpretation. The lower panels show the within-trial variability. The within-trial standard deviations derived from the extreme values of the reversals (PRL) are consistently higher (about 6-10 dB), whereas the standard deviations from midpoints are lower (about 0.3-2.5 dB). For the unoccluded thresholds, these results are consistent with what was derived from the training threshold data in Figure 1.

Examination of the within-trial standard deviations reveals that the performance of the subjects is quite similar regardless of the condition of occlusion. That is, the subjects are able to perform the task of detecting the stimulus equally well. However, as soon as the protector is fit to the subject’s ear or head, the between-trial variability increases. For the earmuff condition, the effect is minimal. For the earplug shown in Figure 2, the fitting of the plug in the ear generally increased the between-trial variability. The differences between the occluded and unoccluded standard deviations were significant at 250, 500 and 1000 Hz. The symbols for the unoccluded frequencies have been filled with red to indicate significant effects. Since the standard deviations derived from the reversals overestimate the within-trial variability, the remaining PRL data will not be discussed further except with respect to the simulation of a Bekesy threshold test.
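The two quantities compared above can be extracted from the stored per-trial results as in the following sketch. The record layout (a trial mean and a within-trial standard deviation per trial) and all numeric values are hypothetical; the sketch only computes the two summary statistics and does not attempt the analysis of variance itself.

```python
from statistics import mean, stdev

def summarize(trials):
    """Summarize repeated trials given as (mean threshold dB, within-trial SD dB)."""
    trial_means = [m for m, _ in trials]
    trial_sds = [s for _, s in trials]
    return {
        "between_sd": stdev(trial_means),  # spread of repeated threshold estimates
        "within_sd": mean(trial_sds),      # average within-trial SD
    }

# Hypothetical results for one subject at one frequency, two trials each.
unoccluded = [(5.0, 0.8), (6.0, 1.1)]
occluded = [(30.0, 1.0), (38.0, 0.9)]

for label, trials in (("unoccluded", unoccluded), ("occluded", occluded)):
    s = summarize(trials)
    print(f"{label:>10}: between-trial SD = {s['between_sd']:.2f} dB, "
          f"within-trial SD = {s['within_sd']:.2f} dB")
```

In this made-up example the within-trial values are the same for both conditions while the occluded between-trial value is much larger, which is the pattern the text attributes to refitting the protector.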
Figure 3: The within and between-trial standard deviations for two devices estimated from midpoints. Occluded trials are blue-filled diamonds and unoccluded trials are white or red diamonds. Each frequency is shown shifted slightly about the stimulus center frequency. Occluded and unoccluded conditions were significantly different for Muff 2 at 1000 Hz and Plug 1 at 4000 Hz. The between-trial standard deviations were typically larger than the within-trial deviations.
Similar HLI results for the remaining three devices, tested with the Bekesy paradigm and with the standard deviations calculated by the midpoint method, are plotted in Figure 3. For all of the protectors, the within-trial standard deviations are not significantly different across frequencies or between occluded and unoccluded conditions. The subjects exhibited the same ability to perform the task whether the ears were occluded or unoccluded. For the earmuffs, the between-trial variability did not exhibit any statistically significant effects except for Muff 2 at 1000 Hz. For the earplugs, however, the effect of fitting the device was statistically significant. The significant differences for Plug 2 have been discussed, and for Plug 1 the difference at 4000 Hz between occluded and unoccluded conditions was statistically significant.

3. DISCUSSION
The PRL data demonstrated that the standard deviation of the reversals was not representative of the standard deviation of the repeated threshold measurements. A simulation of the Bekesy test was developed to investigate the different approaches to reporting an uncertainty parameter for an individual threshold estimate. The subject response probability was modeled with a pair of logistic psychometric functions [7]. If the stimulus level was well above the “threshold”, the probability of a positive response was near one. The slope of the psychometric function controlled the “performance” of the subject during the task. A steep slope resulted in a perfect subject and a shallow slope in a poor subject. In order to model differing ascending and descending criteria, the logistic curves were shifted to provide lower and upper detection limits.
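A pair of shifted logistic functions of this kind can be sketched as follows. The parameterization (a slope in units of 1/dB and two curves placed symmetrically about the threshold) and the names `p_detect` and `detection_limits` are assumptions made for illustration, not the form used in the authors' simulation.

```python
import math

def p_detect(level_db, midpoint_db, slope):
    """Logistic probability of a positive response, written to avoid overflow."""
    x = slope * (level_db - midpoint_db)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def detection_limits(threshold_db, separation_db):
    """Return the assumed (lower, upper) 50% points of the two shifted curves."""
    half = separation_db / 2.0
    return threshold_db - half, threshold_db + half

lower, upper = detection_limits(threshold_db=0.0, separation_db=4.0)
for level in (-6.0, -2.0, 0.0, 2.0, 6.0):
    # Descending runs use the lower curve, ascending runs the upper curve.
    print(f"{level:+5.1f} dB  descending p={p_detect(level, lower, 1.0):.3f}  "
          f"ascending p={p_detect(level, upper, 1.0):.3f}")
```

Increasing `slope` sharpens both curves toward step functions (the "perfect" subject), while a slope near zero flattens them toward a probability of 0.5 at every level.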
Figure 4: Schematic design of a Bekesy paradigm threshold simulation. Two psychometric functions are depicted in the left panel corresponding to the differential probability for detecting stimuli for ascending and descending sequences. The right panel illustrates a simulated response. The plus signs (+) indicate the virtual subject detected the stimulus; the open circles (o) indicate the stimulus was not detected. As the track goes from the base of the figure to the top, the reversals are apparent and a stopping criterion for threshold identification has been reached: ignore the first 2 reversals and use the last 6 reversals.
In Figure 4, the left panel shows the pair of functions and the right panel shows a set of simulated responses using the Bekesy paradigm. The subject starts off near
threshold, 0 dB, hears the stimulus and responds until the level reaches -2 dB. The level then increases until it reaches +2 dB, and so on. Increasing the separation of the 50% probability points increases the distance between the minima and maxima. For the simulations in Figure 5, the slope was varied from 100 to 0.01 and 200 threshold tests were simulated for each slope.

Figure 5 demonstrates the effect of changing the sensitivity on the estimates of the standard deviation of the threshold when calculated from the reversals and from the midpoints of the Bekesy tracks. For the steepest slope, 100, the simulated response exhibits the smallest standard deviations for the midpoints and for the mean thresholds computed from the reversals and from the midpoints. The standard deviation of the reversals is also greatest for this condition. The simulation yields ascending and descending responses that hit or miss at the same level each time. As the slope was decreased, the standard deviation of the reversals decreased monotonically. While the standard deviation from the midpoints does not exactly track the standard deviation of the mean, it does follow it more closely than does the standard deviation of the reversals. Therefore, the standard deviation of an individual threshold test measured with the Bekesy tracking procedure should be estimated by the standard deviation of the midpoints.
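A Monte Carlo simulation of this kind can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 0.5 dB step, the 10 dB separation of the two curves, the rule that a detection on an ascending run or a miss on a descending run produces a reversal, and the midpoint definition. The stopping rule (discard 2 reversals, keep 6) and the 200 tests per slope follow the text. The sketch is not the authors' simulation and is not expected to reproduce Figure 5 numerically.

```python
import math
import random
from statistics import mean, stdev

def p_detect(level, midpoint, slope):
    """Overflow-safe logistic probability of a positive response."""
    x = slope * (level - midpoint)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def simulate_track(rng, slope, threshold=0.0, separation=10.0, step=0.5,
                   n_discard=2, n_keep=6, max_steps=100_000):
    """Return the kept reversal levels (dB) of one simulated Bekesy track."""
    upper = threshold + separation / 2.0  # assumed ascending criterion
    lower = threshold - separation / 2.0  # assumed descending criterion
    level, going_up, reversals = threshold, True, []
    for _ in range(max_steps):
        if going_up:
            # Subject is silent; a detection reverses the track downward.
            if rng.random() < p_detect(level, upper, slope):
                reversals.append(level)
                going_up = False
        else:
            # Subject is responding; a miss reverses the track upward.
            if rng.random() >= p_detect(level, lower, slope):
                reversals.append(level)
                going_up = True
        if len(reversals) == n_discard + n_keep:
            return reversals[n_discard:]
        level += step if going_up else -step
    raise RuntimeError("track did not reach the required number of reversals")

def summarize(slope, n_tracks=200, seed=0):
    """Mean SD of reversals, mean SD of midpoints, and SD of threshold estimates."""
    rng = random.Random(seed)
    sd_rev, sd_mid, thresholds = [], [], []
    for _ in range(n_tracks):
        rev = simulate_track(rng, slope)
        mid = [(a + b) / 2.0 for a, b in zip(rev, rev[1:])]
        sd_rev.append(stdev(rev))
        sd_mid.append(stdev(mid))
        thresholds.append(mean(rev))
    return mean(sd_rev), mean(sd_mid), stdev(thresholds)

for slope in (100.0, 1.0, 0.01):
    rev_sd, mid_sd, thr_sd = summarize(slope)
    print(f"slope={slope:g}: SD(reversals)={rev_sd:.2f} dB, "
          f"SD(midpoints)={mid_sd:.2f} dB, SD(thresholds)={thr_sd:.2f} dB")
```

For a steep slope the turning points sit near the two criteria, so the standard deviation of the reversals reflects the separation of the curves rather than the repeatability of the threshold, while the standard deviation of the midpoints stays small.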
Figure 5: The effect of changing the sensitivity of the psychometric function’s slope on the standard deviations of threshold estimates. The slopes of psychometric functions were varied from steep (100) to shallow (0.01) to simulate differing levels of subject performance. Steep slopes resulted in highly repeatable simulations. Shallow slopes resulted in noisy responses. At about a slope of 0.9, the standard deviation of the midpoints (green line) reaches a maximum. The standard deviation of the reversals (blue line) starts off at the separation of the ascending and descending psychometric functions and the number of reversals used in the estimate (10 dB, 6 reversals: 0,10, 0, 10, 0, 10). The standard deviations of the thresholds determined by the simulation from the means of both the reversals and the midpoints (red and black lines, respectively) are plotted as well and tracked closely the standard deviation of the midpoints.
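The starting value of the reversal curve quoted in the caption can be checked by hand for six reversals alternating between 0 and 10 dB. The calculation assumes the sample (n - 1) form of the standard deviation; the form used in the simulation is not stated, and the population form would give exactly 5 dB.

$$\bar{x} = 5\ \mathrm{dB}, \qquad s = \sqrt{\frac{1}{6-1}\sum_{i=1}^{6}\left(x_i - 5\right)^2} = \sqrt{\frac{6 \cdot 25}{5}} = \sqrt{30} \approx 5.48\ \mathrm{dB}$$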
From the analysis of between and within-trial variability, the effect of fitting the hearing protection device can be extracted. In the case of Plug 2 in Figure 2, the effect
of the between-trial variability seems to be about 6-9 dB. For other devices, such as Muff 1 and Muff 2 in Figure 3, the between-trial variability is more on the order of 3-5 dB. However, for the devices examined with the midpoint method, the within-trial standard deviations were about 1 dB or less. Although a formalized method of extracting the true uncertainty has not been completely developed, this research gives an indication that such an analysis can be conducted.

4. CONCLUSION
How a subject wears a hearing protector and the subject’s noise exposure spectrum are the most critical aspects of choosing appropriate protection. For the purpose of developing a standard and comparing results measured in different laboratories or at different times, the REAT measurement must be quantified and analyzed using statistical tests of significance. Repeated tests of a group of subjects for an earmuff may result in statistically significant differences of 1 dB at all frequencies. Conversely, repeated tests of another protector may yield mean differences across frequencies and tests of 5 dB which are not significantly different due to the greater variability in fit. This paper shows that the variability of the individual thresholds is useful for understanding both the occluded and unoccluded performance of a REAT measurement. Furthermore, the error in the occluded and unoccluded thresholds might be limited to 1 or 2 dB depending upon the paradigm. Variability of the protector’s fit can be confounded with the variability of the thresholds. Hearing protector testing laboratories need to assess the within-trial variability to adequately determine the uncertainties associated with the REAT measurement process.

REFERENCES
1. ANSI S12.6-1997 (R2002). American National Standard Methods for the Measurement of Real-ear Attenuation of Hearing Protectors. American National Standards Institute, New York, (2002).
2. ISO/CD/TS 4869-7 "Acoustics - Hearing protectors - Part 7: Method for estimation of noise reduction using fitting by inexperienced test subjects" International Organization for Standardization, Geneva, (2005).
3. J.D. Royster, E.H. Berger, C.J. Merry, C.W. Nixon, J.R. Franks, A. Behar, J.G. Casali, C. Dixon-Ernst, R.W. Kieper, B.T. Mozo, D. Ohlin, and L.H. Royster. Development of a new standard laboratory protocol for estimating the field attenuation of hearing protection devices. Part I. Research of Working Group 11, Accredited Standards Committee S12, Noise. J. Acoust. Soc. Am., 99:1506-1526, 1996.
4. W.J. Murphy, J.R. Franks, E.H. Berger, A. Behar, J.G. Casali, C. Dixon Ernst, E.F. Krieg, B.T. Mozo, D.W. Ohlin, J.D. Royster, L.H. Royster, S.D. Simon, and C. Stephenson. Development of a new standard laboratory protocol for estimation of the field attenuation of hearing protection devices: Sample size necessary to provide acceptable reproducibility. J. Acoust. Soc. Am., 115:311-323, 2004.
5. Antony R. Joseph. Attenuation of Passive Hearing Protection Devices as a Function of Group Versus Individual Training, Doctoral Dissertation, Mich. St. Univ., (Lansing, MI) 2004.
6. W.J. Murphy and J.R. Franks. Software development for NIOSH hearing protector testing, J. Acoust. Soc. Am., Vol. 112 No. 5 Pt. 2, 2295 (2002).
7. N. Jeremy Hill. Testing Hypotheses About Psychometric Functions: An investigation of some confidence interval methods, their validity, and their use in assessment of optimal sampling strategies, Doctoral Dissertation, St. Hugh’s College, Univ. (Oxford, UK) 2001.