Title: The development and cross-validation of a self-report inventory to assess pure-tone threshold hearing sensitivity. Authors: Coren, Stanley, Hakstian, A. Ralph Psychology Dept., University of British Columbia, Vancouver, Canada Source: Journal of Speech & Hearing Research, Vol 35(4), Aug, 1992. pp. 921-928. Abstract: Using a pool of 108 items and with 384 Ss (aged 17–75 yrs), a self-report inventory was developed in Exp 1 for group testing or survey administration, which appeared to have high correlation with pure-tone hearing thresholds (PTHTs). The stability of the instrument was assessed in Exp 2 with 102 students (mean age 19.1 yrs), and crossvalidation of the instrument occurred in Exp 3 with 422 Ss (aged 17–76 yrs). The resulting 12-item Hearing Screening Inventory (HSI) was reliable with an internal consistency coefficient of .89 and test–retest stability coefficient of .88. The correlation between PTHTs in the better ear and the HSI scores for the combined samples was .81. The correct classification rate for the HSI was 92.1% for a low fence of 25-db hearing level and 93.4% for a high fence of 55-db hearing level. A conversion equation with estimated variability is provided for point estimates of PTHTs from the HSI scores.
THE DEVELOPMENT AND CROSS-VALIDATION OF A SELF-REPORT INVENTORY TO ASSESS PURE-TONE THRESHOLD HEARING SENSITIVITY Stanley Coren and A. Ralph Hakstian Previous attempts to assess hearing loss by means of self-report survey items have shown only low to moderate correlations with actual audiometric measures, probably because these attempts used items with high face validity rather than laboratory-tested validity. Beginning with a pool of 108 items used with 384 individuals, we developed a self-report inventory (see Appendix) suitable for group testing or survey administration, which appears to have high correlation with pure-tone hearing thresholds. The inventory was then cross-validated against laboratory audiometric measures in a separate sample of 422 subjects. The resulting 12-item Hearing Screening Inventory (HSI) was shown to be reliable with an internal consistency coefficient (alpha) of 0.89 and test-retest stability coefficient of 0.88. The correlation between pure-tone hearing thresholds in the better ear and the HSI scores for the combined samples was r = 0.81. The correct classification rate for the HSI was 92.1% for a low fence of 25-dB hearing level and 93.4% for a high fence of 55-dB hearing level. A conversion equation with estimated variability is also provided for point estimates of pure-tone hearing thresholds from the HSI scores. A copy of the inventory and scoring procedure is appended to this report. KEY WORDS: hearing assessment and screening, hearing thresholds, hearing measurement, questionnaires/surveys, audiology The most popular means of testing for hearing disabilities is still pure-tone audiometry. Although such testing is relatively simple in a clinical setting, it still requires specialized equipment operated by trained personnel, a quiet environment, and face-to-face contact with one test subject at a time. 
Since the mid-1930s, in an attempt to reduce the time and expense involved in such audiological testing for epidemiological and screening purposes, various self-report inventories have been introduced. Some of the earliest contained only a single questionnaire item with response categories indicating the severity of the loss with a measure approximating a Guttman scale (e.g., Schein, Gentile, & Haase, 1970; United States Public Health Service, 1938). The vast majority of such questionnaires have used a set of face-valid items with multiple-response categories (e.g., High, Fairbanks, & Glorig, 1964; Kell, Pearson, Acton, & Taylor, 1971; Silverman, Thurlow, Walsh, & Davis, 1948). By the end of the 1970s, such self-report inventories of auditory function were beginning to reflect a shift in emphasis from a direct prediction of hearing loss to the prediction of hearing handicap. Hearing handicap was defined in terms of the effect that the individual's hearing impairment has upon his or her everyday activities, such as communication and social interaction, rather than in terms of dB of hearing sensitivity loss (Alpiner, 1979; Giolas, Owens, Lamb, & Schubert, 1979; Noble & Atherley, 1970; Rosen, 1979). By the early '80s, two books had been written on the subject, both with a major focus on the self-report methodology for measuring hearing handicap (Giolas, 1982; Noble, 1978). This emphasis on hearing handicap is understandable, since it plays an important
role in the medicolegal realm, especially in connection with questions of compensation for occupational hearing loss (Giolas, 1982; Kryter, 1985; Noble, 1978, 1983; Pawing, 1985). Another reason self-report inventories have come to focus on hearing handicap rather than hearing loss is that the questionnaire approach has proven not to yield very high correlations with performance measures such as pure-tone threshold testing. For instance, the pencil and paper inventory that has been the most successful to date in predicting actual audiometric measures is the Hearing Handicap Scale (HHS). This questionnaire was introduced by High, Fairbanks, and Glorig (1964) and recently has been modified (the HHSE) for specific use with elderly samples (Lichtenstein, Bess, & Logan, 1988; Marcus-Bernstein, 1986; Newman & Weinstein, 1988; Sever, Harry, & Rittenhouse, 1989; Weinstein & Ventry, 1983). The HHS was specifically designed to measure hearing handicap, so it may not be surprising that it has only low to moderate correlation with clinical performance measures of hearing sensitivity. If we use the 19 samples reviewed in Giolas (1982) to assess the relationship between the HHS and pure-tone audiometry, we find an average correlation of only 0.57. This correlation is certainly large enough to demonstrate the feasibility of designing a questionnaire that is significantly related to pure-tone testing. On the other hand, since this relationship accounts for only 32% of the variance, the statistical efficiency of predicting actual pure-tone threshold scores from HHS scores is probably too low to warrant general use in this way. Thus the HHS may serve a useful general screening purpose, but lacks the precision necessary for epidemiological, survey, and assessment purposes, for which it would be desirable to have a measure whose scores were transformable into a direct estimate of pure-tone threshold sensitivity. 
It should be possible to develop a self-report inventory that produces scores that closely predict audiometric performance if the scale is developed following appropriate psychometric and statistical procedures. This is particularly important in terms of item selection. In the previous attempts at scale development, item selection focused upon face validity of the individual inventory items as determined by the researcher or other audiological professionals who may have been consulted. Unfortunately, face-valid items are not always the most useful in practice, since face validity does not ensure predictive validity (Cronbach, 1984). A recent set of studies by Coren and Hakstian (1987, 1988, 1989) have shown that it is possible to develop self-report inventories with a high degree of classification accuracy when compared to clinical tests of sensory capacity. The goal of these authors has been to develop self-report tests of sensory function that are (a) valid, when tested against standard laboratory and clinical instruments, (b) psychometrically reliable, (c) based upon information reported by the person being evaluated and not dependent upon previous clinical diagnoses or direct knowledge of previous testing, (d) capable of measuring the broad range of sensory function, (e) independent of specific disease states, (f) applicable across all adult age groups (e.g., 18 years and older), (g) brief and easy to comprehend, (h) suitable for self-administration, (i) easily reproducible, and (j) capable of producing scores that can be interpreted in terms similar to standard clinical measures. The methodology that Coren and Hakstian (1987, 1988, 1989) used started with a large number of items in the original item pool. Based upon the item correlations with actual clinical testing, these authors then empirically selected those that were most predictive of objectively measured sensory status. Finally, using statistical procedures, these researchers selected the set of items
that produced the best composite prediction of actual laboratory measures of sensory function. After the initial development sample was tested, Coren and Hakstian then cross-validated the inventories on a completely independent sample. Working in the realm of vision, these authors were able to produce a 10-item scale that diagnosed color vision deficiencies with a correct classification rate of 89%, corresponding to a specificity of 89% and a sensitivity of 81% (Coren & Hakstian, 1988). They were also able to produce a 12-item scale to measure uncorrected binocular visual acuity with a correct classification accuracy (within one Snellen line) of 91% (Coren & Hakstian, 1989). The study reported below represents the development and cross-validation of a self-report inventory designed to predict pure-tone thresholds. The procedures followed were similar to those used by Coren and Hakstian for the development of sensory inventories that have been successful for predicting functional levels in the visual modality. The resulting scale and interpretive guidelines are also presented in what follows.

EXPERIMENT 1: DEVELOPMENT OF THE HEARING SCREENING INVENTORY

Method

Materials

The first step in the development of this instrument involved the assembly of an initial item pool. The original set consisted of 216 items selected from frequently reported situations in which hearing loss impairs performance, from symptom lists, perceptual assays, previous hearing inventories, and case histories. Before actual testing, this set was reduced to 108 items by eliminating items that required specialized experience, items with too much overlap, and items that were misinterpreted or were confusing to a pilot sample of 14 university student volunteers. The set of items used for the development phase of testing contained several types.
Some were fairly global in nature (e.g., "Are you ever bothered by feelings that your hearing is poor?"), whereas others were fairly situation-specific (e.g., "Can you hear the telephone ring when you are in the room next door?"). All items were chosen to have some degree of face validity. Each was stated as a five-alternative question with one of two types of response categories. For the majority of the questions, subjects were given the graded response series Never, Seldom, Occasionally, Frequently, and Always. This set of alternatives appears to work well with most task-related items. For some types of global questions the response categories provided were Good, Average, Slightly below average, Poor, and Very poor. Subjects Previous attempts at self-report inventory development for hearing impairment have often used samples of subjects known to be clinically normal, who are then contrasted to samples of individuals who had been clinically diagnosed (prior to inclusion in the sample) as having a particular type and degree of hearing loss. This procedure, although having the advantage of guaranteeing a large number of cases of defective hearing in the sample, has some statistical and interpretive problems. Such advance case selection to augment the number of defective hearing cases effectively alters the base rate for the deficit relative to the population at large. This
artificially inflates the observed correlations and measures of sensitivity and specificity (see Griner, Mayewski, Mushlin, & Greenland, 1981, for a discussion of this problem, and Khan & Khan, 1982, and Watson & Tang, 1980, for applied examples). For this reason, subjects were not pre-selected on the basis of previous hearing tests. The only factors considered in sample composition were maintaining an adequate balance of men and women and achieving a reasonably broad sampling of the adult age range. The development sample consisted of 384 subjects (231 female and 153 male) who responded to advertisements for a "Vision and Hearing Survey" in a university setting and from the general surrounding urban community in Vancouver, BC, Canada. Subjects ranged in age from 17 to 75 years with a mean age of 35.5 years (age distribution: 17 to 25 years = 241; 26 to 64 = 84; 65 to 75 = 59). The tested American Medical Association (AMA) average hearing level (HL) (re ANSI, 1969), which is described below, was distributed as follows: hearing level less than 10 dB = 167; 10 to 24 dB = 125; 25 to 39 dB = 59; 40 to 54 dB = 15; 55 dB or more = 18. Information on occupation and environmental exposure to noise was not available for this sample.

Procedure

Testing took place on 2 separate days. In the first session, the set of 108 hearing-related self-report items was administered. If subjects typically wore a hearing aid, they were asked to respond as if they were not wearing it. A second session was used to obtain an objective assessment of pure-tone hearing thresholds. The separation of testing into two sessions and the order of these sessions were designed to minimize potential confounding between self-reports and knowledge of performance on the objective measure of auditory function. For pure-tone audiometry, all subjects were tested individually in a sound-deadened room.
A MALCO (MA-24) audiometer was used to obtain pure-tone air-conduction hearing thresholds for test frequencies of 500, 1000, 2000 and 4000 Hz. Each ear was tested separately, using an initial descending sequence of tones, followed by three ascending sequences with 5-dB steps (Hodgson, 1980). The median of the three ascending measures served as an estimate of threshold sensitivity for that ear. Results and Discussion Before further analyses could be performed, a decision was required about the way to summarize each subject's pure-tone threshold audiogram. A number of different formulas are currently in use to provide a single number that can describe an individual's auditory sensitivity. These involve averages over different subsets of frequencies going from a full-range average of the pure-tone thresholds for six standard audiometric test frequencies arranged in octaves (cf. Coren, 1989) through several other averages based upon three to four frequencies that center in the speech range. It was decided to use one of these latter measures, namely the American Medical Association (AMA) formula, which uses the frequencies 0.5, 1, 2, and 4 kHz (American Medical Association, 1947). This formula is widely used in the United States and has been adopted as the official measure for legal assessment of hearing loss in several states and by Australia and Italy. In addition, Coren and Hakstian (1990a) have provided simple conversion formulas that allow pure-tone thresholds based upon AMA averages to be easily converted to estimates of the other popular measures for summarizing pure-tone audiograms, including the AAOO (American
Academy of Ophthalmology and Otolaryngology, 1969), BAOL/BSA (British Association of Otolaryngologists and the British Society of Audiology, 1983), Industrial Scale and Veterans Administration Scale (Suter & von Gierke, 1975), and Full Range Average (Coren, 1989). The AMA average for the better ear and the mean AMA index for both ears combined were used in the subsequent analyses (cf. Coren & Hakstian, 1990b). To score the questionnaire, individual item responses were numerically coded with a 1, 2, 3, 4, or 5, running from a 1 for "never" or "good" up to 5 for "always" or "very poor." For each of the original 108 items, Pearson product-moment correlations were computed between the item responses and the laboratory measure of pure-tone hearing thresholds. The items with the highest correlations were retained. Items that seemed to overlap other items too closely or that might be inappropriate for use on a broader cross section of the general population than used here were next eliminated. The remaining candidate items were then analyzed using the Furnival-Wilson algorithm (Furnival & Wilson, 1974) to isolate the subsets of items that produced the highest regression with the pure-tone threshold measure. After we had analyzed each of these resulting subsets for internal consistency reliability and criterion-related validity, a 12-item set, which we labeled the Hearing Screening Inventory (or HSI), was decided upon. The items of the HSI appear as an appendix to this paper. To score the HSI, one takes a simple sum over the 12 item responses. For seven of the items (1, 5, 6, 9, 10, 11, 12), "never" or "good" is coded 1 up through "always" or "very poor," which is coded 5, whereas for five of the items (2, 3, 4, 7, 8) "never" is coded 5, through "always," which is coded 1. This procedure yields a consistent scale score in which higher totals are indicative of self-reports of poorer hearing ability.
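The scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring procedure: the function name, the dictionary input format, and the convention that each raw response is the position (1 to 5) of the chosen alternative in its printed series are assumptions made here for the example.

```python
# Items scored as given: 1 = "never"/"good" ... 5 = "always"/"very poor".
FORWARD_ITEMS = {1, 5, 6, 9, 10, 11, 12}
# Items scored in reverse: "never" = 5 ... "always" = 1.
REVERSED_ITEMS = {2, 3, 4, 7, 8}


def score_hsi(responses):
    """Return the HSI total (12 to 60) from raw responses.

    `responses` maps item number (1-12) to the position of the chosen
    alternative in its printed series (1 = first, 5 = last). This input
    format is a hypothetical convention for the sketch.
    """
    if set(responses) != FORWARD_ITEMS | REVERSED_ITEMS:
        raise ValueError("responses must cover items 1 through 12 exactly")
    total = 0
    for item, raw in responses.items():
        if raw not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {item}: response must be 1-5, got {raw!r}")
        # Reverse-keyed items are flipped so that a higher score always
        # means a self-report of poorer hearing.
        total += raw if item in FORWARD_ITEMS else 6 - raw
    return total


# A respondent who marks the first alternative on every item:
# 7 forward items contribute 1 each, 5 reversed items contribute 5 each.
all_first = {item: 1 for item in range(1, 13)}
print(score_hsi(all_first))  # 32
```

Because five items are reverse-keyed, a uniform response pattern does not produce an extreme total; the minimum of 12 and maximum of 60 require opposite raw responses on the two item groups.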
The first step in examining the properties of the HSI was to determine an estimate of the psychometric reliability of the inventory. The internal consistency reliability of the scale was assessed by means of Cronbach's alpha coefficient (Cronbach, 1951). The obtained estimate of 0.91 is very high and is acceptable for all measurement purposes. With the internal consistency of this short hearing sensitivity scale established, our next concern was with its validity. The criterion-related validity of the HSI scale was assessed by comparing the total scale scores to the laboratory measures of pure-tone hearing threshold. The validity coefficients, represented by the product-moment correlations between the simple sum of questionnaire item responses (as described above) and the objectively measured pure-tone thresholds (summarized here as AMA average values), were computed. We should note that simply summing the questionnaire scale items as noted is sometimes referred to as unit weighting, that is, not weighting the items differentially, but rather equally (in raw, unstandardized form). Unit weighting of the items in this way greatly simplifies computation, of course, and also minimizes capitalization upon chance (Dawes & Corrigan, 1974). Because there were no significant differences between the HSI scores of men and women in this sample, all of the analyses presented below are based on a pooled-gender sample. The correlation between the HSI scale total and mean pure-tone AMA score for both ears combined was 0.83 (p < 0.001). The HSI correlation with better-ear AMA average was 0.82 (p < 0.001). These correlations are considerably higher than those usually reported for the Hearing Handicap Scale of High, Fairbanks, and Glorig (1964) discussed earlier (which, of course, was designed to measure
hearing handicap rather than sensitivity), and suggest that the HSI may prove to be a very useful instrument for predicting an individual's pure-tone hearing sensitivity thresholds. The criterion-related utility of the HSI may be illustrated in another manner. Let us establish a criterion or cutoff (a low fence) for what we will call a hearing impairment, and see how well the HSI can discriminate individuals on either side of that fence. The American Academy of Ophthalmology and Otolaryngology sets the fence between "no handicap" and "slight hearing handicap" at 25-dB hearing level (H. Davis, 1965). This same 25-dB fence for pure-tone threshold testing is used by the Veterans Administration as the criterion for normal hearing (Noble, 1978; Suter & von Gierke, 1975). Although some authors note measurable degradation in speech intelligibility with hearing losses less than 25 dB (e.g., Kryter, 1985), there appears to be a consensus that people with thresholds of less than 25 dB generally do not experience problems with normal auditory functioning, whereas hearing levels greater than 25 dB produce a small but measurable effect on behavior, particularly with faint speech (e.g., A.C. Davis, 1983; Giolas, 1982, Noble, 1978). Requiring the scale to discriminate impairments at this low fence is setting it a difficult task. However, if this instrument is ultimately to be used as a screening device, it should have some sensitivity for minimal hearing loss. Measured better-ear pure-tone thresholds were cross-tabulated against various HSI scale totals in order to determine an optimal cutoff value. Optimal classification was found to occur when HSI scores between 12 and 27 were classified as normal, and scores between 28 and 60 were classified as hearing impaired. 
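The cutoff search described above can be sketched as a scan over candidate HSI totals. The source does not state which criterion defined "optimal"; the sketch below assumes overall classification accuracy, and the function names and the small data set are hypothetical illustrations, not study data.

```python
def classification_accuracy(hsi_scores, thresholds_db, cutoff, fence_db=25.0):
    """Fraction of subjects whose HSI class matches their audiometric class.

    A subject is predicted impaired when the HSI total exceeds `cutoff`
    and counted as actually impaired when the better-ear average is at
    or above `fence_db`.
    """
    correct = 0
    for hsi, db in zip(hsi_scores, thresholds_db):
        predicted_impaired = hsi > cutoff
        actually_impaired = db >= fence_db
        correct += predicted_impaired == actually_impaired
    return correct / len(hsi_scores)


def best_cutoff(hsi_scores, thresholds_db, fence_db=25.0):
    """Return the lowest cutoff (12-59) that maximizes overall accuracy."""
    return max(
        range(12, 60),
        key=lambda c: classification_accuracy(hsi_scores, thresholds_db, c, fence_db),
    )


# Hypothetical toy data, invented for illustration only.
toy_hsi = [14, 18, 22, 26, 27, 29, 33, 41]
toy_db = [5, 8, 12, 20, 22, 28, 35, 60]
print(best_cutoff(toy_hsi, toy_db))  # 27 for this toy data
```

Other criteria (for example, maximizing the sum of sensitivity and specificity) could yield a different cut point on real data, so the choice of criterion should be stated whenever a cutoff is reported.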
With this optimal HSI cut point and the pure-tone threshold low fence established, we then computed the sensitivity (the probability that an individual with a hearing level of 25 dB or greater will have an HSI score exceeding the cut point of 27) and the specificity (the probability that a person with an AMA average hearing threshold below 25 dB will have an HSI score of 27 or less). With the actual hearing impairment quantified by the better-ear AMA average, the specificity (correct detection of normal hearing subjects) was 92.6%. The sensitivity (correct classification of impaired subjects) was 86.7%. The HSI's overall correct classification ability, therefore, was 91.4% for this sample. On the basis of the foregoing results from Experiment 1, based upon a single development sample, it was demonstrated that the HSI has high internal consistency and criterion-related validity, with attendant high sensitivity and specificity. Let us next consider the stability of the instrument.

EXPERIMENT 2: ASSESSMENT OF STABILITY OF THE HSI

Method

The HSI, consisting of the 12 items chosen in Experiment 1, was administered to 102 students in an introductory psychology course (mean age of 19.1 years). A re-administration of the inventory occurred 4 weeks later, resulting in a full data set (both initial test and retest results) from 93 subjects. The re-administration of the HSI used a slightly altered presentation (consisting of the same items, but in a different order). Subjects were not given prior warning that the inventory would be administered a second time and were told, upon the second administration, to respond as if they were assessing their own hearing "today."
Results and Discussion

The HSI was scored for both administrations as indicated in Experiment 1. There was no significant difference between the means of the HSI for the two administrations [means of 20.8 and 21.1; t(92) = 0.68]. To determine test-retest reliability, or stability, the Pearson product-moment correlation was computed between the HSI scores on the two administrations, with a resulting r of 0.88 (p < 0.001). The high test-retest reliability observed in the HSI scores suggests that the inventory yields a stable measurement of the hearing status of the individual, which would, of course, be expected to remain relatively unchanged over the period of a month. In addition to providing reassurance as to the psychometric reliability of the HSI, this result also suggests that the inventory might be useful as a means of longitudinally monitoring hearing ability in any given individual. Given the degree of test-retest reliability observed here, any marked change in an individual's retest score would likely reflect an actual change in auditory sensitivity that occurred during the interval between the initial and subsequent testing. Using the reliability measure obtained in Experiment 2, for instance, we can compute that an HSI score change of 4 points or more would exceed the 95% confidence interval (+/- 3.24), whereas a change of 5 points would exceed the 99% confidence interval (+/- 4.26). Hence a shift in an individual's HSI score by 4 or 5 points probably represents a significant change in hearing status.

EXPERIMENT 3: CROSS-VALIDATION OF THE HEARING SCREENING INVENTORY

Before we accept the HSI as a demonstrably useful instrument for screening for hearing loss, it must be recognized that the procedures used in Experiment 1 might be expected to have capitalized somewhat on chance, causing an artificial inflation of the apparent validity of the scale.
It will be recalled that we began by first selecting the questionnaire items that individually predicted the objective measure of pure-tone thresholds, and then created a scale consisting of these items. By means of Experiment 1 we demonstrated the predictive validity of this composite scale, using exactly the same sample as we had used to select the individual items. Good psychometric practice requires that we cross-validate this scale--that is, assess its validity when applied with an independent sample of subjects. Cross-validation corrects for the capitalization upon chance relationships between individual items and the objective criterion that may have occurred in the development sample. To effect such a cross-validation, Experiment 3 was conducted. Method Procedure The procedures used in the cross-validation were very similar to those used in the original development study. Two separate test sessions were employed (the first for the self-report items and the second for audiometric testing) as in Experiment 1. In this cross-validation we did not use a large inventory of items since item selection was no longer an issue. Instead, in the first session, subjects received only the 12-item HSI. Pure-tone threshold testing was conducted in the second session, and was identical to that employed in Experiment 1.
Subjects

The subject sample was demographically similar to that used in Experiment 1, and similar procedures were used to solicit volunteers. The cross-validation sample was somewhat larger than the development sample, totalling 422 subjects. It contained 254 women and 168 men, with a mean age of 35.6 years and an age range of 17 to 76 (age distribution: 17 to 25 years = 261; 26 to 64 = 96; 65 to 76 = 65). The tested AMA average hearing levels were distributed as follows: hearing level less than 10 dB = 213; 10 to 24 dB = 113; 25 to 39 dB = 57; 40 to 54 dB = 21; 55 dB or more = 18. Information on occupation and environmental exposure to noise was not available for this sample.

Results and Discussion

The statistical analyses followed a pattern similar to that used in the development of the HSI, except that the item selection phase was not needed. Each individual's HSI was scored as in Experiments 1 and 2. The reliability of the scale was again assessed by means of Cronbach's alpha coefficient. The obtained value of 0.89 for the cross-validation sample was nearly as high as that obtained (0.91) for the original development sample. As in Experiment 1, the criterion-related validity of the HSI scale was assessed by relating the total (12-item) HSI scale scores to the laboratory measure of pure-tone hearing threshold. The validity coefficients, represented by the product-moment correlations between the simple sum of questionnaire item responses and the objectively measured pure-tone thresholds (the AMA average), were computed. The correlation between the HSI scale total and the mean pure-tone AMA average for both ears combined was 0.81 (p < 0.001). The correlation with better-ear AMA average was 0.80 (p < 0.001). One expects some reduction in criterion-related validity between that obtained in the development sample and that in the cross-validation sample. In this instance this reduction was remarkably small, only 0.02 against both criteria.
To further examine criterion-related utility, we applied the same criterion categorization used in Experiment 1, that is, one that dichotomizes individuals into "normal" versus "slightly impaired." This low fence, or cutoff, occurs at an AMA average of 25 dB HL. We also used the same HSI cutting point that we established in the development phase, that is, an HSI score of 27. Individuals with scores between 12 and 27 were classified as normal, and individuals with scores between 28 and 60 were classified as hearing impaired. With the better-ear pure-tone AMA average as the criterion, the specificity (correct detection of normal subjects) was 93.0%. The sensitivity (correct classification of impaired subjects) was 91.3%. The HSI's overall correct classification rate, then, was 92.6%, which actually slightly exceeds the parallel value obtained in the development sample and confirms the high classification accuracy of the instrument.
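The classification statistics reported above can be illustrated with a short Python sketch. The first function computes specificity, sensitivity, and overall accuracy from paired scores; the second checks that the reported Experiment 3 percentages are mutually consistent. The function names and the four-subject data set are hypothetical. The counts of normal and impaired subjects are taken from the reported hearing-level distribution, under the assumption (not stated in the source) that this distribution refers to the same better-ear average used as the criterion.

```python
def sensitivity_specificity(hsi_scores, thresholds_db, cutoff=27, fence_db=25.0):
    """Return (sensitivity, specificity, accuracy) for one cutoff and fence."""
    tp = fn = tn = fp = 0
    for hsi, db in zip(hsi_scores, thresholds_db):
        predicted_impaired = hsi > cutoff
        if db >= fence_db:            # audiometrically impaired
            if predicted_impaired:
                tp += 1
            else:
                fn += 1
        else:                         # audiometrically normal
            if predicted_impaired:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def overall_accuracy(sensitivity, specificity, n_impaired, n_normal):
    """Overall correct classification as a count-weighted average."""
    return (sensitivity * n_impaired + specificity * n_normal) / (n_impaired + n_normal)


# Hypothetical four-subject example, invented for illustration.
print(sensitivity_specificity([20, 25, 30, 35], [10, 30, 20, 40]))  # (0.5, 0.5, 0.5)

# Consistency check for Experiment 3, using the reported distribution.
n_normal = 213 + 113         # hearing level below 25 dB
n_impaired = 57 + 21 + 18    # hearing level of 25 dB or more
acc = overall_accuracy(0.913, 0.930, n_impaired, n_normal)
print(round(100 * acc, 1))   # 92.6
```

The weighted average reproduces the reported overall rate of 92.6%, which shows how the overall figure depends on the base rate of impairment in the sample as well as on sensitivity and specificity.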
Interpretation of Hearing Screening Inventory Scores
Because analysis of the results from both the development and the cross-validation samples produced virtually identical patterns of relationship between the HSI totals and pure-tone hearing thresholds, it appears warranted to combine these two samples in order to specify some of the normative characteristics of the inventory. These characteristics should provide the potential users of this scale with a clearer interpretation of HSI scores obtained for particular individuals or groups. The combined sample comprises a grand total of 806 subjects. Validity coefficients between HSI scores and pure-tone thresholds (the AMA average) were computed. The correlation between the HSI scale and the mean pure-tone AMA score for both ears combined was 0.82 (p < 0.001). The correlation with better-ear AMA score was 0.81 (p < 0.001). With this larger sample we can specify the relationship between pure-tone threshold sensitivity and HSI scores with greater confidence. The 95% confidence intervals for the correlations thus become (0.80, 0.84) for the mean pure-tone AMA average for both ears combined, and (0.78, 0.83) for the better-ear AMA average. This high correlation permits a regression-based estimate of AMA average (here for the better ear) as follows:

pure-tone AMA average = 2.07 (HSI total) - 26.20

The standard error of estimate (which quantifies the scatter of the actual individual AMA average hearing level scores around the predictions made from the regression) is 10.12 dB. Thus 80% of the actual lab-tested pure-tone AMA averages will lie within +/- 12.95 dB of the AMA average predicted on the basis of HSI scores, 90% will lie within +/- 16.67 dB, and 95% will be within +/- 19.84 dB. Generally, in interpreting HSI scores for screening purposes, it may be more advisable to use score ranges rather than exact point estimates. To do this we can categorize HSI scores using both a low fence and a high fence.
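As an illustration of the conversion equation and its stated variability, the following Python sketch turns an HSI total into a point estimate with an approximate band. It assumes normally distributed residuals with constant spread and ignores uncertainty in the regression coefficients, so the bands only approximate the reported values (which appear to rest on rounded normal multipliers). The second function applies the standard Fisher z interval to a correlation. Whether the authors used that method is not stated, although it yields (0.80, 0.84) for r = 0.82 and n = 806. All function names are hypothetical.

```python
import math
from statistics import NormalDist

SLOPE = 2.07        # dB per HSI point, from the reported equation
INTERCEPT = -26.20  # dB, from the reported equation
SEE_DB = 10.12      # reported standard error of estimate, in dB


def predict_ama_db(hsi_total):
    """Point estimate of the better-ear AMA average (dB HL) from an HSI total."""
    return SLOPE * hsi_total + INTERCEPT


def prediction_band(hsi_total, coverage=0.95):
    """Approximate (low, high) band around the point estimate.

    Assumes normal residuals with constant spread equal to SEE_DB.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    centre = predict_ama_db(hsi_total)
    return centre - z * SEE_DB, centre + z * SEE_DB


def fisher_ci(r, n, coverage=0.95):
    """Confidence interval for a correlation via the Fisher z transform."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    centre = math.atanh(r)
    half_width = z / math.sqrt(n - 3)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)


print(round(predict_ama_db(27), 2))               # 29.69
low, high = fisher_ci(0.82, 806)
print(round(low, 2), round(high, 2))              # 0.8 0.84
```

For example, an HSI total of 27 maps to a predicted better-ear average of about 29.7 dB, with a 95% band of roughly 20 dB on either side, which is why score ranges are often more appropriate than point estimates for screening.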
We have already discussed the low fence of 25-dB hearing level, which has been interpreted as defining the boundary between "no significant hearing loss" and "slight hearing impairment." This low fence thus serves as a reasonably sensitive criterion between "normal" and "impaired" hearing. For the combined sample, using the cutoff value of 27 established above, we find that the overall specificity (defined as the correct detection of normal subjects with AMA average hearing level less than 25 dB) is 92.8%. The sensitivity (the correct detection of impaired hearing using the definition of hearing level greater than or equal to 25 dB) is equal to 89.0%. The total classification accuracy thus becomes 92.1% for this low fence.
Selecting a high fence to indicate major hearing loss is always a bit arbitrary. For example, for purposes of compensation for hearing loss, various venues differ as to the appropriate criterion. The United Kingdom uses a fence of 50 dB HL, Belgium uses 55 dB HL, and Japan and Denmark use 60 dB HL for purposes of compensation. The American Academy of Ophthalmology and Otolaryngology sets the fence between "mild hearing handicap" and "marked hearing handicap" at a 55-dB hearing level (H. Davis, 1965). Various reviews of the literature on hearing impairment (e.g., A. C. Davis, 1983; Giolas, 1982; Noble, 1978) agree that hearing losses at this level are quite substantial in their behavioral implications. With a 55-dB hearing level, the individual has difficulty understanding loud speech. This level corresponds to a 45% hearing loss (H. Davis, 1965). Let us define our high fence, for screening purposes, at 55 dB. When we do so, using the same procedures as in Experiment 1, we can set the optimal cutoff value for the HSI at 37. To evaluate the classification accuracy around this high fence, we computed the same statistics as before. Thus the specificity (defined here as the correct detection of individuals with AMA average hearing level less than 55 dB) is 93.2%. The sensitivity (here the correct detection of marked hearing impairment using a criterion of hearing level greater than or equal to 55 dB) is 100% for this sample. The total classification accuracy thus amounts to 93.4% for this high fence. Thus, if precise AMA estimates of hearing loss are not needed, HSI scores can be used to categorize subjects as to whether they have significant hearing loss and, if so, whether that hearing loss is mild or marked, by use of these low and high fences based upon HSI cutoff values set at 27 and 37, respectively.
Summary and Implications
The research presented in this report describes the development and cross-validation of a brief, behaviorally validated screening instrument, which can classify individuals according to their pure-tone hearing sensitivity using self-report survey items. This scale has been shown to have high reliability (both internal consistency and stability) and to correlate highly with clinical audiometric test results, yielding validity coefficients of approximately 0.80 when related to pure-tone thresholds. Unlike previous auditory questionnaires, this scale was constructed using empirically selected items, which results in higher criterion-related validity and predictive utility. Our aim was to maximize prediction of direct audiometric measures, rather than to assess the handicapping effect of hearing loss upon occupational, social, communication, or other behavioral activities, as has been done in some other hearing questionnaires reviewed above.
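The specificity, sensitivity, and total classification accuracy reported for the two fences can be computed from paired HSI totals and audiometric averages along the following lines. This is a minimal sketch, not the authors' analysis code: the function name and the sample records are hypothetical, and the rule that a score strictly above the cutoff counts as a positive screen is an assumption read from the score ranges in the appendix table (12 to 27, 28 to 37, 38 or more).

```python
def screening_stats(records, cutoff, fence_db):
    """Sensitivity, specificity, and accuracy of an HSI cutoff.

    records  -- list of (hsi_total, ama_db) pairs
    cutoff   -- HSI cutoff; totals above it are flagged (assumed rule)
    fence_db -- hearing level (dB) at or above which hearing is impaired
    """
    if not records:
        raise ValueError("at least one record is required")
    tp = tn = fp = fn = 0
    for hsi_total, ama_db in records:
        impaired = ama_db >= fence_db   # audiometric criterion
        flagged = hsi_total > cutoff    # questionnaire criterion
        if impaired and flagged:
            tp += 1
        elif impaired:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(records),
    }


# Hypothetical (HSI total, better-ear AMA average in dB) pairs,
# invented for illustration only; they are not data from the study.
SAMPLE = [(15, 5), (22, 10), (30, 20), (26, 30),
          (33, 40), (41, 60), (45, 70), (29, 28)]

if __name__ == "__main__":
    print("low fence :", screening_stats(SAMPLE, cutoff=27, fence_db=25))
    print("high fence:", screening_stats(SAMPLE, cutoff=37, fence_db=55))
```

The misclassification rate is simply one minus the accuracy returned here.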
The resulting scale, which yields high specificity and sensitivity, provides a total score that can be converted to a predicted pure-tone threshold range or a point estimate with known confidence intervals. Using the low fence (25-dB loss) we have a misclassification rate of only 7.9%, and with the high fence (55-dB hearing level), the misclassification rate is only 6.6% (with HSI cutoffs of 27 and 37 respectively). The resulting HSI is brief, requiring only a few minutes to complete and to score. It could easily be administered by mail or in large group settings. It could be administered by itself as a simple auditory screening measure or as a brief hearing assessment component in larger health-related surveys. Since there are no requirements concerning any kind of audiometric apparatus, specific printing formats, controlled and quiet environments, or other testing-session desiderata, the HSI should be very attractive for studies in which the assessment of hearing ability of large samples of individuals is desirable for normative, epidemiological, or health monitoring reasons. The high test-retest reliability of the HSI may make it useful for detecting systematic changes in auditory sensitivity in targeted groups over extended periods of time, because a 5-point change in the HSI score exceeds the 99% confidence interval for retest reliability. Thus the HSI provides a potentially useful tool for researchers and clinicians who desire a quick, valid, and inexpensive method to estimate hearing sensitivity.
Acknowledgments
This research was supported in part by grants from the British Columbia Health Care Research Foundation, the Medical Research Council of Canada, and the Natural Sciences and Engineering Research Council of Canada. The authors would like to acknowledge the assistance of Wayne Wong, David Wong, Tania Jackson, Joan Coren, Geof Donelly, Lynda Berger, Steve Park, Debbie Aks, and Dereck Atha, who assisted in the collection of data. The Hearing Screening Inventory is copyrighted by SC Psychological Enterprises Ltd., and reprinted by permission. It may be reproduced for research purposes only. We would appreciate receipt of copies of any data collected using this instrument, because we are trying to establish population norms to assist researchers in interpretation of data collected with the HSI.
References
Alpiner, J. G. (1979). Psychological and social aspects of aging as related to hearing rehabilitation of elderly clients. In M. Henoch (Ed.), Aural rehabilitation for the elderly (pp. 169-184). New York: Grune & Stratton.
American Academy of Ophthalmology and Otolaryngology (1969). American Academy of Ophthalmology and Otolaryngology guide for conservation of hearing. New York: AAOO.
American Medical Association (1947). American Medical Association Council on Physical Medicine Tentative standard procedure for evaluating the percentage of hearing in medicolegal cases. Journal of the American Medical Association, 133, 396-397.
American National Standards Institute (1970). Specifications for audiometers (ANSI S3.6-1969). New York: ANSI.
British Association of Otolaryngologists and British Society of Audiology (1983). British Association of Otolaryngologists and British Society of Audiology BAOL/BSA method for assessment of hearing disability. British Journal of Audiology, 17, 91-94.
Coren, S. (1989). Summarizing pure tone hearing thresholds: The equipollence of components of the audiogram. Bulletin of the Psychonomic Society, 27, 42-44.
Coren, S., & Hakstian, A. R. (1987). Visual screening without the use of technical equipment: Preliminary development of a behaviorally validated questionnaire. Applied Optics, 26, 1468-1472.
Coren, S., & Hakstian, A. R. (1988). Color vision screening without the use of technical equipment: Scale development and cross-validation. Perception and Psychophysics, 43, 115-120.
Coren, S., & Hakstian, A. R. (1989). A behaviorally validated self-report inventory of the measurement of visual acuity. International Journal of Epidemiology, 18, 451-456.
Coren, S., & Hakstian, A. R. (1990a). Conversion between systems of hearing handicap measurement: An empirically determined computational procedure. Annals of Otology, Rhinology & Laryngology, 99, 977-979.
Coren, S., & Hakstian, A. R. (1990b). Methodological implications of inter-aural correlation: Count heads not ears. Perception and Psychophysics, 48, 291-294.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Cronbach, L. J. (1984). Essentials of psychological testing (4th ed.). New York: Harper and Row.
Davis, A. C. (1983). Hearing disorders in the population: First phase findings of the MRC National Study of Hearing. In M. E. Lutman & M. P. Haggard (Eds.), Hearing science and hearing disorders. London: Academic Press.
Davis, H. (1965). Guidelines for classification and evaluation of hearing handicap in relation to international audiometric zero. Transactions of the American Academy of Ophthalmology and Otolaryngology, 69, 740-751.
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.
Furnival, G. M., & Wilson, R. W. (1974). Regression by leaps and bounds. Technometrics, 18, 499-511.
Giolas, T. G. (1982). Hearing-handicapped adults. Englewood Cliffs, NJ: Prentice-Hall.
Giolas, T. G., Owens, E., Lamb, S. H., & Schubert, E. D. (1979). Hearing Performance Inventory. Journal of Speech and Hearing Disorders, 44, 169-195.
Grinner, P. F., Mayewski, R. J., Mushlin, A. I., & Greenland, P. (1981). Selection and interpretation of diagnostic tests and procedures. Annals of Internal Medicine, 94, 553-600.
High, W. S., Fairbanks, G., & Glorig, A. (1964). Scale of self-assessment of hearing handicap. Journal of Speech and Hearing Disorders, 29, 215-230.
Hodgson, W. R. (1980). Basic audiologic evaluation. Baltimore: Williams & Wilkins.
Kell, R. L., Pearson, J. C. G., Acton, W. I., & Taylor, W. (1971). Social effects of hearing loss due to weaving noise. In D. W. Robinson (Ed.), Occupational hearing loss. London: Academic Press.
Khan, M. A., & Khan, M. K. (1982). Diagnostic value of HLA-B27 testing in ankylosing spondylitis and Reiter's syndrome. Annals of Internal Medicine, 96, 70-76.
Kryter, K. D. (1985). The effects of noise on man (2nd ed.). Orlando, FL: Academic Press.
Lichtenstein, M. J., Bess, F. H., & Logan, S. A. (1988). Diagnostic performance of the Hearing Handicap Inventory for the elderly (screening version) against differing definitions of hearing loss. Ear and Hearing, 9, 208-211.
Marcus-Bernstein, C. (1986). Audiologic and nonaudiologic correlates of hearing handicap in black elderly. Journal of Speech and Hearing Research, 29, 301-312.
Newman, C. W., & Weinstein, B. E. (1988). The Hearing Handicap Inventory for the elderly as a measure of hearing aid benefit. Ear and Hearing, 9, 81-85.
Noble, W. G. (1978). Assessment of impaired hearing: A critique and a new method. New York: Academic Press.
Noble, W. G. (1983). Hearing, hearing impairment, and the audible world: A theoretical essay. Audiology, 22, 325-338.
Noble, W. G., & Atherley, G. R. C. (1970). The Hearing Measurement Scale: A questionnaire for the assessment of auditory disability. Journal of Auditory Research, 10, 229-250.
Parving, G. S. A. (1985). Hearing disability and communication handicap for compensation purposes based on self-assessment and audiometric testing. Audiology, 24, 135-145.
Rosen, J. K. (1979). Psychological and social aspects of the evaluation of acquired hearing impairment. Audiology, 18, 238-252.
Schein, J. D., Gentile, A., & Haase, K. W. (1970). Development and evaluation of an expanded hearing loss scale questionnaire. National Center for Health Statistics (Series 2, No. 37). Rockville, MD: U.S. Department of Health, Education and Welfare.
Sever, J. C., Jr., Harry, D. A., & Rittenhouse, T. S. (1989). Using a self-assessment questionnaire to identify probable hearing loss among older adults. Perceptual and Motor Skills, 69, 511-514.
Silverman, S. R., Thurlow, W. R., Walsh, T. E., & Davis, H. (1948). Improvement in the social adequacy of hearing following the fenestration operation. Laryngoscope, 58, 607-631.
Suter, A., & von Gierke, H. E. (1975). Evaluation and compensation of occupational hearing loss in the United States. In G. Rossi & M. Vignone (Eds.), L'Uomo e il rumore (pp. 359-366). Turin: Minerva Medica.
United States Public Health Service (1938). Preliminary analysis of audiometric data in relation to clinical history of impaired hearing. The National Health Survey, Hearing Study Series, Bulletin No. 2. Washington DC: U.S. Public Health Service.
Watson, R. A., & Tang, D. B. (1980). The predictive value of prostatic acid phosphatase as a screening test for prostatic cancer. New England Journal of Medicine, 303, 497-499.
Weinstein, B. E., & Ventry, I. M. (1983). Audiometric correlates of the Hearing Handicap Inventory for the elderly. Journal of Speech and Hearing Disorders, 48, 379-384.
Accepted November 4, 1991
Contact author: Stanley Coren, PhD, Department of Psychology, University of British Columbia,
[email protected]
Appendix with Hearing Screening Inventory follows
Hearing Screening Inventory
Coren and Hakstian (1992)
Instructions: This questionnaire deals with a number of common situations. For each question you should select the response that describes you and your behaviors best. You can select from the following alternatives:
Never (or almost never)
Seldom
Occasionally
Frequently
Always (or almost always)
Response labels for Questions 1 through 8: Never, Seldom, Occasionally, Frequently, Always
1) Are you ever bothered by feelings that your hearing is poor?
2) Is your reading or studying easily interrupted by noises in nearby rooms?
3) Can you hear the telephone ring when you're in the same room in which it is located?
4) Can you hear the telephone ring when you are in the room next door?
5) Do you find it difficult to make out the words in recordings of popular songs?
6) When several people are talking in a room, do you have difficulty hearing an individual conversation?
7) Can you hear the water boiling in a pot when you are in the kitchen?
8) Can you follow the conversation when you are at a large dinner table?
For the last four questions use these labels as your answers: Good, Average, Slightly Below Average, Poor, Very Poor
9) Overall, I would judge my hearing in my right ear to be...
10) Overall, I would judge my hearing in my left ear to be...
11) Overall, I would judge my ability to make out speech or conversation to be...
12) Overall, I would judge my ability to judge the location of things by the sound they are making alone to be...
Scoring instructions: For Questions 1, 5, and 6, responses are scored from 1 for “Never,” 2 for “Seldom,” 3 for “Occasionally,” 4 for “Frequently,” and 5 for “Always.” Questions 2, 3, 4, 7, and 8 are reverse scored, from 1 for “Always” up to 5 for “Never.” For Questions 9 through 12, scoring goes from 1 for “Good” to 5 for “Very Poor.” Your hearing score is just the sum of these 12 items. The table below provides the predicted best-ear hearing sensitivity (this prediction is 92 percent accurate).
HSI scale total    Predicted best-ear sensitivity
12 to 27           Hearing is normal
28 to 37           Hearing loss 25 dB or more (some conversational hearing loss)
38 or more         Hearing loss 55 dB or more
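The scoring instructions and the table above can be expressed as a short sketch. The item groupings, the 1-to-5 scoring, and the score ranges come from the instructions and table; the function names, the dictionary-based input format, and the use of the short labels ("Never", "Always") in place of the longer "(or almost never)" wording are assumptions made for illustration.

```python
# Label-to-score maps taken from the scoring instructions.
FREQUENCY = {"Never": 1, "Seldom": 2, "Occasionally": 3,
             "Frequently": 4, "Always": 5}
QUALITY = {"Good": 1, "Average": 2, "Slightly Below Average": 3,
           "Poor": 4, "Very Poor": 5}

FORWARD_ITEMS = {1, 5, 6}          # scored 1 ("Never") to 5 ("Always")
REVERSED_ITEMS = {2, 3, 4, 7, 8}   # scored 1 ("Always") to 5 ("Never")
RATING_ITEMS = {9, 10, 11, 12}     # scored 1 ("Good") to 5 ("Very Poor")


def score_hsi(responses):
    """Sum the 12 item scores.

    responses -- dict mapping question number (1-12) to a response label
                 (hypothetical input format, not specified by the source).
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("responses must cover questions 1 through 12")
    total = 0
    for question, label in responses.items():
        if question in FORWARD_ITEMS:
            total += FREQUENCY[label]
        elif question in REVERSED_ITEMS:
            total += 6 - FREQUENCY[label]  # reverse the 1-to-5 scale
        else:
            total += QUALITY[label]
    return total


def interpret(total):
    """Map an HSI total to the category given in the table."""
    if total <= 27:
        return "hearing is normal"
    if total <= 37:
        return "hearing loss 25 dB or more"
    return "hearing loss 55 dB or more"
```

A respondent who answers "Never" to Questions 1, 5, and 6, "Always" to Questions 2, 3, 4, 7, and 8, and "Good" to Questions 9 through 12 receives the minimum total of 12.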
*The Hearing Screening Inventory is copyrighted by SC Psychological Enterprises Ltd. and is reprinted here with permission