Anatomic Pathology / INTEROBSERVER VARIATION IN ASSESSING SQUAMOUS CELLULARITY

Interobserver Variability in Assessing Adequacy of the Squamous Component in Conventional Cervicovaginal Smears

Matthew V. Sheffield, MD,1 Aylin Simsir, MD,2 Lynya Talley, PhD,3 A. Janie Roberson, SCT(ASCP),1 Paul A. Elgert, CT(ASCP), CMIAC,2 and David C. Chhieng, MD1

Key Words: Cervicovaginal smears; Squamous cellularity; Adequacy; Bethesda System; Interobserver reproducibility

DOI: 10.1309/08J6MDLCJPPHJND1

Abstract

We compared the interobserver reproducibility of estimating the adequacy of the squamous component of conventional Papanicolaou (Pap) smears using traditional and newly proposed criteria. Forty conventional Pap smears with varying degrees of squamous cellularity were reviewed by 13 observers, who classified adequacy (satisfactory vs unsatisfactory) using the traditional criterion of 10% slide coverage. After being introduced to the new criterion and the reference images, the observers reevaluated adequacy on the same set of smears using the new criterion and the reference images. With the original criterion of 10% slide coverage, 15 smears had a unanimous designation; the overall kappa value was 0.49 (P < .001). With the newly proposed adequacy criterion and reference images, 17 smears had a unanimous designation; the overall kappa value was 0.60 (P < .001). The difference in the kappa coefficients was statistically significant (P = .007). Although both the traditional and the newly proposed criteria resulted in fair interobserver agreement, the newly proposed criterion, used together with reference images, appeared to yield better interobserver reproducibility in evaluating the adequacy of the squamous component of conventional Pap smears.

The assessment and reporting of squamous cellularity has become an integral part of the evaluation of cervicovaginal (Papanicolaou, or Pap) smears. The 1991 Bethesda System for reporting gynecologic cytology diagnoses proposed that the criterion for adequate cellularity of conventional Pap smears should be the presence of well-preserved and well-visualized squamous epithelial cells covering more than 10% of the slide surface.1 Although the 10% slide coverage criterion was somewhat arbitrary, its purpose was to encourage consistency and facilitate intralaboratory and interlaboratory reproducibility in Pap smear reporting. However, the reproducibility and validity of determining and reporting the adequacy of the squamous component of Pap smears based on the 10% criterion were later scrutinized by several investigators.2,3 The 2001 Bethesda System revised the criterion for adequate squamous cellularity of conventional Pap smears: an adequate conventional Pap smear should have an estimated minimum of approximately 8,000 to 12,000 well-preserved and well-visualized squamous epithelial cells.4 It was emphasized that this estimate was not to be based on an actual cell count; cytologists should not count individual cells. Instead, the specimen should be compared with a set of reference images of known cellularity to determine squamous cell adequacy. We undertook the present study to determine whether the newly proposed Bethesda 2001 guidelines for squamous component adequacy, together with a set of reference images, result in improved interobserver reproducibility compared with the 10% criterion.

© American Society for Clinical Pathology

Am J Clin Pathol 2003;119:367-373 DOI: 10.1309/08J6MDLCJPPHJND1

Sheffield et al / INTEROBSERVER VARIATION IN ASSESSING SQUAMOUS CELLULARITY

Materials and Methods

We selected 40 conventional Pap smears from the files of the Department of Pathology, University of Alabama at Birmingham. Based on the 1991 Bethesda System, 29 smears originally were interpreted as satisfactory but limited by (SBLB) scant squamous cellularity and 2 as unsatisfactory because of an inadequate squamous component. The remaining 9 smears originally were reported as satisfactory specimens with an adequate squamous component. There were 13 observers, all cytotechnologists, whose experience ranged from 3 to 29 years (mean, 10 years). The observers were asked to classify the smears as satisfactory or unsatisfactory based on the adequacy of squamous cellularity on 2 separate occasions approximately 4 weeks apart. Smears were randomized between occasions. Only a binary classification was used in this study because the category “satisfactory but limited by” has been eliminated from the latest Bethesda System.4

During the first round, the observers were asked to use 10% slide coverage as the criterion: slides with 10% or more coverage by squamous cells were classified as satisfactory and those with less than 10% coverage as unsatisfactory. During the second round, the observers were introduced to the new criterion for an adequate squamous component proposed in the Bethesda 2001 guidelines. Through a didactic seminar, the observers became familiar with the reference images (❚Image 1❚, ❚Image 2❚, and ❚Image 3❚) with which the specimens were to be compared to determine whether there was a sufficient number of fields with cellularity similar to or greater than that of the reference images. These computer-generated reference images exemplified the density of coverage needed, using a 4× objective, over 10 fields, over 20 fields, and over an entire slide. For example, using a 4× objective, at least 10 fields must demonstrate cellularity similar to or greater than that depicted in Image 1 for the slide to contain a minimum of 10,000 squamous cells. Similarly, at least 20 fields, or the entire slide, must demonstrate cellularity similar to or greater than that depicted in Image 2 or Image 3, respectively, for the specimen to be considered adequate for squamous cells. Examples using “actual” Pap smears were presented to the participants to illustrate this method. The observers were given a copy of the computer-generated images for reference and were asked to reclassify the 40 smears as satisfactory or unsatisfactory based on the adequacy of squamous cellularity using the new criterion and the reference images.

❚Image 1❚ A, Representation of microscopic fields with the following parameters: objective, 4×; ocular, 10×; field number, 20; number of “cells,” 1,000. The size of the cells relative to the entire field corresponds to that of superficial squamous cells (50 µm); 10 fields would need to be covered at this density to have an estimated minimum of 10,000 cells. B and C, On a Papanicolaou smear, 10 microscopic fields with cellular density similar to that illustrated would be needed to have an estimated minimum of 10,000 cells.

The kappa statistic was used to test the null hypothesis that there was no agreement among multiple observers. Estimates of agreement were computed using the MAGREE macro (SAS Institute, Cary, NC). P values of .05 or less were considered statistically significant. The degree of agreement for ranges of kappa values has been described by Landis and Koch.5 A kappa value of 0.70 or more was interpreted as strong positive agreement, a value of 0.40 or less as weak positive agreement, and a value between 0.40 and 0.70 as fair positive agreement.
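The multirater kappa described above can be illustrated with a short Python sketch. It assumes that the agreement estimate corresponds to Fleiss' kappa for 2 categories (the article names only the MAGREE macro), and it works from ordered pair counts of the kind reported in Tables 1 and 2, where every designation is compared with the other 12 observers' designations on the same smear. The names `PairTable`, `pair_table_from_counts`, `fleiss_kappa`, and `describe` are hypothetical, and `describe` simply restates the cutoffs given in this article.

```python
from dataclasses import dataclass


@dataclass
class PairTable:
    """Ordered-pair agreement counts for a binary adequacy call.

    Each designation is compared with every other observer's designation on the
    same smear, so the matrix is symmetric: the unsatisfactory-vs-satisfactory
    cell equals sat_unsat.
    """
    sat_sat: int      # both designations satisfactory (concordant)
    sat_unsat: int    # satisfactory vs unsatisfactory (discordant)
    unsat_unsat: int  # both designations unsatisfactory (concordant)

    @property
    def total(self) -> int:
        return self.sat_sat + 2 * self.sat_unsat + self.unsat_unsat


def pair_table_from_counts(sat_votes: list[int], n_raters: int) -> PairTable:
    """Build the symmetric agreement matrix from per-smear 'satisfactory' vote counts."""
    ss = su = uu = 0
    for k in sat_votes:
        u = n_raters - k   # unsatisfactory votes on this smear
        ss += k * (k - 1)  # each satisfactory call vs the other satisfactory calls
        su += k * u        # each satisfactory call vs each unsatisfactory call
        uu += u * (u - 1)  # each unsatisfactory call vs the other unsatisfactory calls
    return PairTable(ss, su, uu)


def fleiss_kappa(table: PairTable) -> float:
    """Fleiss-type multirater kappa for 2 categories, computed from ordered pair counts."""
    observed = (table.sat_sat + table.unsat_unsat) / table.total
    # Share of all designations that were satisfactory (row total / all pairs).
    p_sat = (table.sat_sat + table.sat_unsat) / table.total
    expected = p_sat ** 2 + (1 - p_sat) ** 2  # agreement expected by chance
    return (observed - expected) / (1 - expected)


def describe(kappa: float) -> str:
    """Agreement bands as stated in this article."""
    if kappa >= 0.70:
        return "strong"
    if kappa <= 0.40:
        return "weak"
    return "fair"


# Table 1 (10% slide coverage rule): 3,394 / 710 / 1,426 ordered pairs.
table1 = PairTable(sat_sat=3394, sat_unsat=710, unsat_unsat=1426)
k1 = fleiss_kappa(table1)
print(table1.total, f"{k1:.3f}", describe(k1))  # 6240 0.495 fair

# Hypothetical votes for 3 smears rated by 13 observers (not study data).
print(pair_table_from_counts([13, 0, 7], n_raters=13))
# PairTable(sat_sat=198, sat_unsat=42, unsat_unsat=186)
```

Applied to the Table 1 counts, this sketch gives about 0.495, close to the reported overall kappa of 0.490. The per-smear vote counts needed to rebuild either table from raw ratings are not reported, so the `pair_table_from_counts` call uses hypothetical votes only.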

Results

The frequency distribution of smears designated as satisfactory by each observer based on the Bethesda 1991 and 2001 criteria is given in ❚Figure 1❚. The frequency of smears designated as satisfactory using the criterion of 10% slide coverage ranged from 0.5 to 0.9 (mean, 0.66; SD, 0.11), whereas the frequency of satisfactory smears based on the presence of a minimum of 10,000 squamous cells ranged from 0.325 to 0.675 (mean, 0.49; SD, 0.09). The difference was statistically significant (P = .001, t test).

❚Table 1❚ presents a symmetrical agreement matrix based on the 10% slide coverage criterion. Because there were 40 smears and 13 observers, a total of 520 (40 × 13) interpretations were made. Each interpretation was compared with those made by the other 12 observers on the same smear, resulting in a total of 6,240 pairs (40 smears × 13 observers × 12 comparisons). For example, in Table 1, 342 designations were satisfactory. Each of these 342 designations was compared with the designations made by the other 12 observers on the same smear, resulting in 4,104 (342 × 12) comparisons. In 3,394 instances, the other observer also made a designation of satisfactory (ie, concordant); in 710 instances, the designation was unsatisfactory. ❚Table 2❚ presents a symmetrical agreement matrix based on the newly proposed criterion (Bethesda 2001) and the use of reference images. The number of concordant pairs was higher with the newly proposed criterion (5,548) than with the “10% rule” (4,820). The difference was statistically significant (P < .001, z test).

Based on the 1991 Bethesda System criterion for an adequate squamous component, 15 smears (38%) had a unanimous interpretation of the adequacy of the squamous component. Thirteen smears were classified as satisfactory, including 6 smears that originally were interpreted as satisfactory and 7 that were interpreted as SBLB scant squamous cells. All observers agreed that 2 smears, 1 originally designated as unsatisfactory and 1 as SBLB, were unsatisfactory because of the lack of an adequate squamous component. In addition, all except 1 observer agreed on 5 smears (13%); 2 were classified as satisfactory and 3 as unsatisfactory. The degree of agreement for the result categories “satisfactory” and “unsatisfactory” was fair (kappa coefficients, 0.490 and 0.490, respectively). These results were statistically significant (P < .001 for each category). The overall kappa value for this method was 0.490, and it was statistically significant (P < .001).

When using the newly proposed criterion for an adequate squamous component and the reference images, there was complete agreement among all observers in the classification of 17 smears (43%). Twelve smears were classified as satisfactory, including 8 that originally were designated as satisfactory and 4 as SBLB scant squamous cellularity. The remaining 5 smears, including 1 originally designated as unsatisfactory and 4 as SBLB scant squamous cellularity, were designated unanimously as unsatisfactory because of the lack of an adequate squamous component. In addition, all except 1 observer agreed on 7 smears (18%); 2 were classified as satisfactory and 5 as unsatisfactory. Although the number of smears with unanimous agreement based on the new criterion was higher than that obtained with the old criterion, the difference did not reach statistical significance (P > .05, chi-square test). The kappa values for the result categories satisfactory and unsatisfactory were both 0.606, signifying fair agreement. These results were statistically significant (P < .001 for each category) compared with those obtained with the old criterion. The overall kappa value for this method was 0.606, and it was statistically significant (P < .001). The difference in kappa coefficients between the 2 sets of criteria was statistically significant (P = .007).

❚Table 1❚ Comparison of Variation in Designations Among 13 Reviewers Using the 10% Slide Coverage Rule

                                      Satisfactory  Unsatisfactory   Total
No. of designations                            342             178     520
Paired comparisons
  Satisfactory                               3,394             710   4,104
  Unsatisfactory                               710           1,426   2,136
  Total No. of paired comparisons            4,104           2,136   6,240

❚Table 2❚ Comparison of Variation in Designations Among 13 Reviewers Using the Newly Proposed Criterion and Reference Images

                                      Satisfactory  Unsatisfactory   Total
No. of designations                            276             244     520
Paired comparisons
  Satisfactory                               2,988             346   3,334
  Unsatisfactory                               346           2,560   2,906
  Total No. of paired comparisons            3,334           2,906   6,240

❚Figure 1❚ Frequency distribution of smears designated as satisfactory by each observer (observers 1 to 13 on the horizontal axis; frequency, 0.0 to 1.0, on the vertical axis) based on the old “10% rule” (white bars) and the new Bethesda 2001 criterion (black bars).

❚Image 2❚ A, Representation of microscopic fields with the following parameters: objective, 4×; ocular, 10×; field number, 20; number of “cells,” 500. The size of the cells relative to the entire field corresponds to that of superficial squamous cells (50 µm); 20 fields would need to be covered at this density to have an estimated minimum of 10,000 cells. B and C, On a Papanicolaou smear, 20 microscopic fields with cellular density similar to that illustrated would be needed to have an estimated minimum of 10,000 cells.

❚Image 3❚ A, Representation of microscopic fields with the following parameters: objective, 4×; ocular, 10×; field number, 20; number of “cells,” 157. The size of the cells relative to the entire field corresponds to that of superficial squamous cells (50 µm); the entire slide would need to be covered at this density to have an estimated minimum of 10,000 cells. B and C, The entire smear should demonstrate cellular density similar to that illustrated for a Papanicolaou smear to contain an estimated minimum of 10,000 cells.

Discussion

One of the major contributions of the Bethesda System for reporting gynecologic cytology is the establishment of standard criteria for reporting the specimen adequacy of Pap smears. Before publication of the Bethesda guidelines, an objective definition and quantifiable morphologic criteria to define and report unsatisfactory smears were lacking. Interobserver agreement on what constituted an unsatisfactory Pap smear was poor.6 In addition, on retrospective review, a substantial portion of the false-negative Pap smears that originally were designated as negative or within normal limits were deemed unsatisfactory for interpretation.7-9 Henry and Wadehra10 correlated smear quality, in terms of squamous cellularity and the presence of an endocervical component, with the detection of epithelial abnormalities in a series of 68,328 Pap smears. Superior smear quality, in particular the presence of an adequate squamous component, was associated with a higher detection rate of epithelial abnormalities. In another study, Ransdell et al11 reported that for 16% of patients with unsatisfactory Pap smears, follow-up revealed a diagnosis of squamous intraepithelial lesion or neoplasia.

In 1991, the Bethesda Conference developed guidelines for assessing Pap smear adequacy. An adequate squamous component for conventional Pap smears was defined as 10% or greater slide coverage by well-preserved and well-visualized squamous cells. According to a survey conducted by the College of American Pathologists in 1996 and 1997, 92% of the 2,000 laboratories surveyed used the Bethesda criterion for designating a specimen as unsatisfactory.12 Despite wide acceptance by many laboratories, the 10% rule for reporting squamous cell adequacy is not perfect. First, it is an arbitrary value that is unsupported by scientific studies. In addition, it is difficult to translate cellularity into a percentage of surface coverage because of the nonuniform distribution of cells on conventional Pap smears. Renshaw et al2 reported that cytologists often visually overestimated the percentage of slide coverage by squamous cells compared with results obtained using an image analysis system. Gill3 stated that 10% slide coverage was equivalent to an average of 190 squamous cells per ×100 field of view and, therefore, was too high. Another criticism is that the interpretation of 10% or greater slide coverage varies among cytologists, and the reproducibility of determining scant cellularity is poor. In 1 study, 5 reviewers were asked to independently evaluate 114 Pap smears, which included 14 smears designated unsatisfactory because of less than 10% slide coverage by squamous cells.13 The mean kappa value for adequacy assessment was 0.61. However, the authors stated that one problematic area was the evaluation of adequate minimum squamous cellularity.
In another study, 4 observers (2 cytopathologists and 2 cytotechnologists) were asked to estimate the percentage of the slide covered by squamous cells in 83 smears prepared from buccal scrapes.2 The kappa values ranged from very poor (0.02) to fair (0.55). Because of the controversy associated with the 10% rule, a new numeric criterion for determining an adequate squamous component was proposed during the 2001 Bethesda workshop: a conventional Pap smear should have an estimated minimum of approximately 8,000 to 12,000 well-preserved and well-visualized squamous epithelial cells.4 Some participants expressed concern that it would be difficult, if not impossible, to actually count the cells, because most laboratories do not have sufficient time and resources to do so. However, in most instances, the presence or absence of adequate squamous cellularity is readily apparent, and only for the small number of smears with questionable adequacy should the cellularity be estimated. Even then, laboratory personnel should not count individual cells.4 Instead, computer-generated images of known cellular density should be used as a reference against which to compare the smears in question.
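The comparison with reference images rests on simple arithmetic: a field density multiplied by the number of fields at or above that density gives the rough cell estimate. The Python sketch below is an illustration of that arithmetic, not a counting protocol (the guidelines say individual cells should not be counted). It reproduces the 10-field and 20-field figures from the captions of Images 1 and 2 and shows that the Image 3 density of 157 cells per field needs about 64 fields. It also assumes the usual relation field diameter = field number / objective magnification to estimate the area of one 4× field; the function names are hypothetical, and whether about 64 fields corresponds to an entire smear depends on smear size, which the article does not state.

```python
import math

# Estimated minimum the three reference images in this study were built around.
TARGET_CELLS = 10_000


def field_area_mm2(field_number: float, objective_mag: float) -> float:
    """Area of one circular field of view (assumes diameter in mm = field number / objective)."""
    diameter_mm = field_number / objective_mag
    return math.pi * (diameter_mm / 2) ** 2


def fields_needed(cells_per_field: float, target: int = TARGET_CELLS) -> int:
    """Number of fields at (at least) this density needed to reach the target estimate."""
    return math.ceil(target / cells_per_field)


def meets_estimate(cells_per_field: float, n_fields: int, target: int = TARGET_CELLS) -> bool:
    """True if n_fields at the reference density add up to the target estimate."""
    return cells_per_field * n_fields >= target


# Densities depicted in the 3 computer-generated images (4x objective, field number 20).
for density in (1000, 500, 157):
    print(density, "cells/field ->", fields_needed(density), "fields")
# 1000 -> 10, 500 -> 20, 157 -> 64

area = field_area_mm2(field_number=20, objective_mag=4)
print(f"one 4x field = {area:.1f} mm^2; 64 fields = {64 * area:.0f} mm^2")
# one 4x field = 19.6 mm^2; 64 fields = 1257 mm^2
```

Rounding up with `math.ceil` reflects that a partial field does not count toward the minimum; at the Image 3 density, 63 fields would give only 9,891 cells.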

Recently, using a method similar to the one proposed in Bethesda 2001, Haroon et al14 showed that the interobserver reproducibility of assessing the squamous cellularity of liquid-based preparations was excellent. However, the criterion and methods proposed for liquid-based preparations cannot be applied to conventional smears, because in the former, squamous cells are evenly distributed within a well-defined circle, whereas in the latter, they are not uniformly distributed over the entire slide. To our knowledge, no published reports have addressed interobserver reproducibility using the reference images proposed in Bethesda 2001 to assess squamous adequacy in conventional Pap smears. In the present study, we demonstrated that with the new criterion, 17 smears (12 satisfactory and 5 unsatisfactory) had a unanimous designation, compared with 15 smears with the 10% rule. The interobserver agreement, expressed as the overall kappa value, was 0.60 with the new guidelines, a significant improvement over the 10% rule. Other investigators have made similar observations regarding the use of a reference or physical standard in the evaluation of squamous component adequacy. Renshaw et al2 created slides illustrating smears with known levels of cellularity and instructed the observers to use them as references to estimate the percentage of the slide covered by squamous cells. Interobserver agreement increased dramatically after the reference slides were provided. Although we did not specifically measure the feasibility of using reference images to estimate squamous cellularity, all observers found the reference images easy to understand and apply. Other authors have proposed different methods of assessing squamous cellularity on conventional Pap smears.
For example, one group of investigators proposed dividing each smear into 15 equal areas using a lined template and then assigning each rectangle a score of 0 or 1, depending on whether at least 50% of its area contained squamous cells.15 A semiquantitative total score, ranging from 0 to 15, was generated for each slide. The authors showed that false-negative smears had significantly lower scores than true-positive smears. However, all scoring was performed by a single observer, and the feasibility of the method was not discussed. Notably, with the new criterion and the reference images, the proportions of smears designated as unsatisfactory were higher than those obtained using the 10% rule. This was true for all except 1 observer, and the differences were statistically significant. It seemed that observers tended to overestimate the percentage of slide coverage by squamous cells under the old criterion. Others have reported similar findings: a visual estimate of 10% coverage corresponded to a “true” median coverage of 3% as determined by computerized image analysis.2 When the evaluation was repeated with reference smears showing different extents of coverage provided for comparison, the median coverage estimated by 3 of 4 observers was similar to the results obtained using computerized image analysis. Based on our experience and that of others,2 the advantage of using reference images and/or smears to determine the squamous cellularity of conventional Pap smears is apparent. Although it is beyond the scope of the present study, the question of whether an estimated range of 8,000 to 12,000 squamous cells is the most appropriate cutoff for an adequate squamous component remains unanswered. Since no substantial evidence is available in the literature to support the superiority of the newly proposed numeric criterion over the 10% slide rule, future studies are warranted to examine the clinical relevance of the newly proposed numeric criterion or of alternative thresholds.

Interobserver agreement in assessing squamous adequacy on conventional smears is fair, whether the old or the newly proposed method is used. However, application of the Bethesda 2001 criterion, along with the use of reference images, to determine the adequacy of squamous cellularity on conventional Pap smears significantly improved interobserver reproducibility compared with the 10% rule.

From the 1Department of Pathology and 3Biostatistics Unit, University of Alabama at Birmingham; and 2Department of Pathology, New York University Medical Center, New York, NY.

Presented in part at the 91st Annual Meeting of the United States and Canadian Academy of Pathology, Chicago, IL, February 23 to March 1, 2002.

Address reprint requests to Dr Chhieng: Dept of Pathology, University of Alabama at Birmingham, 619 19th St S, KB 627, Birmingham, AL 35249-6823.
Acknowledgment: We thank the cytotechnologists who participated in this study: Kay Alexander, Jun Chen, Kathy Connolly, Norma Driver, Sandra Gallaspy, Jon Gidley, Donald Kosatka, Dawn Lucas, Kathleen Randall, Rebecca Robertson, Kay St. John, Tina Taylor, and Leisa Whitlow; and George Birdsong, MD, Emory University, Atlanta, GA, for permission to use and reproduce the reference images in this report.

References

1. The 1988 Bethesda System for reporting cervical/vaginal cytological diagnoses: National Cancer Institute Workshop. JAMA. 1989;262:931-934.
2. Renshaw AA, Friedman MM, Rahemtulla A, et al. Accuracy and reproducibility of estimating the adequacy of the squamous component of cervicovaginal smears. Am J Clin Pathol. 1999;111:38-42.
3. Gill GW. Pap smear cellular adequacy: what does 10% coverage look like? What does it mean [abstract]? Acta Cytol. 2000;44:873.
4. Solomon D, Davey D, Kurman R, et al. The 2001 Bethesda System: terminology for reporting results of cervical cytology. JAMA. 2002;287:2114-2119.
5. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-174.
6. Yobs AR, Plott AE, Hicklin MD, et al. Retrospective evaluation of gynecologic cytodiagnosis, II: interlaboratory reproducibility as shown in rescreening large consecutive samples of reported cases. Acta Cytol. 1987;31:900-910.
7. Pairwuti S. False-negative Papanicolaou smears from women with cancerous and precancerous lesions of the uterine cervix. Acta Cytol. 1991;35:40-46.
8. van der Graaf Y, Vooijs GP, Gaillard HL, et al. Screening errors in cervical cytologic screening. Acta Cytol. 1987;31:434-438.
9. Paterson ME, Peel KR, Joslin CA. Cervical smear histories of 500 women with invasive cervical cancer in Yorkshire. Br Med J (Clin Res Ed). 1984;289:896-898.
10. Henry JA, Wadehra V. Influence of smear quality on the rate of detecting significant cervical cytologic abnormalities. Acta Cytol. 1996;40:529-535.
11. Ransdell JS, Davey DD, Zaleski S. Clinicopathologic correlation of the unsatisfactory Papanicolaou smear. Cancer. 1997;81:139-143.
12. Davey DD, Woodhouse S, Styer P, et al. Atypical epithelial cells and specimen adequacy: current laboratory practices of participants in the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology. Arch Pathol Lab Med. 2000;124:203-211.
13. Spires SE, Banks ER, Weeks JA, et al. Assessment of cervicovaginal smear adequacy: the Bethesda System guidelines and reproducibility. Am J Clin Pathol. 1994;102:354-359.
14. Haroon S, Samayoa L, Witzke D, et al. Reproducibility of cervicovaginal ThinPrep cellularity assessment. Diagn Cytopathol. 2002;26:19-21.
15. Valente PT, Schantz HD, Trabal JF. The determination of Papanicolaou smear adequacy using a semiquantitative method to evaluate cellularity. Diagn Cytopathol. 1991;7:576-580.
