Virtual Microscopy for Cytology Proficiency Testing: Are We There Yet?
Jimmie Stewart III, MD,1 Kayo Miyazaki, BS, CT,2 Kristen Bevans-Wilkins, BA, SCT,3 Changhong Ye, BS, SCT,2 Daniel F.I. Kurtycz, MD,1,2 Suzanne M. Selvaggi, MD1
BACKGROUND. The objective of this study was to investigate the potential of virtual microscopy (VM) as an avenue for the delivery of mandatory cytology proficiency tests.
METHODS. Three senior cytotechnologists and 2 board-certified cytopathologists participated in 3 virtual proficiency tests. Each set consisted of 10 ThinPrep slides that were digitized by an Aperio T3 ScanScope. The cytologic diagnoses covered the range of interpretive guidelines provided by the Centers for Medicare and Medicaid Services (CMS). Each cytotechnologist followed the requirement of a primary screener, with the cytopathologists utilizing the secondary screener option.
RESULTS. Analysis of the diagnostic interpretation of the first proficiency test showed correct classification of 100% of normal and abnormal cells for primary and secondary screeners. The second proficiency test analysis revealed a 93.3% correct classification (100% using CMS guidelines) among the primary screeners. The secondary screeners gave a 100% correct classification. The final proficiency test had primary screeners and secondary screeners with 100% correct classification.
CONCLUSIONS. The current results confirmed the feasibility of VM for proficiency tests, with 2 main problems noted. First, primary screeners had difficulty meeting the mandatory time allocation; however, with increased familiarity with the software, the screening times decreased. Second, the 3-dimensional nature of certain lesions made them difficult to interpret even on monolayered, liquid-based preparations. Creation of a more user-friendly software interface and better methods to capture depth of focus should make this a valid measure of cervicovaginal cytopathologic interpretive competence. Cancer (Cancer Cytopathol) 2007;111:203–9. © 2007 American Cancer Society.

1 Department of Pathology and Laboratory Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin.
2 Wisconsin State Laboratory of Hygiene, Madison, Wisconsin.
3 Department of Laboratory Medicine, Swedish Covenant Hospital, Chicago, Illinois.
KEYWORDS: Centers for Medicare and Medicaid Services, cytology, proficiency test, virtual microscopy.
Address for reprints: Jimmie Stewart, III, MD, Cytopathology Laboratory/D4-207b, University of Wisconsin Hospital and Clinics, 600 Highland Avenue, Madison, WI 53792; Fax: (608) 263-6453; E-mail: [email protected]

Received January 26, 2007; revision received March 6, 2007; accepted March 9, 2007.

© 2007 American Cancer Society. DOI 10.1002/cncr.22766. Published online 19 June 2007 in Wiley InterScience (www.interscience.wiley.com).

CANCER (CANCER CYTOPATHOLOGY) August 25, 2007 / Volume 111 / Number 4

Over the past decade, health care professionals in the field of cytopathology have had to withstand many changes. Some changes have been met with a degree of enthusiasm, whereas others have been more contentious. The most notable of these changes include liquid-based technology for gynecologic cytology, the newly approved human papillomavirus vaccine, technologic advances in digital imaging and virtual microscopy, and the application of the first federally mandated proficiency test in cytopathology.

The most controversial change in the field of cytopathology is the national cytopathology proficiency testing (PT) program. In 2004, the Midwest Institute for Medical Education fulfilled the federal government mandate, which required the periodic confirmation and evaluation of the proficiency of individuals involved in the screening or interpretation of cytologic preparations.1 The Centers for Medicare and Medicaid Services (CMS) approved their PT program and required all clinical cytology programs in the United States to enroll. The initial charge for individualized PT was issued under the Clinical Laboratory Improvement Amendments of 1988 (CLIA 88). The creation of a PT program requires a large supply of field-validated gynecologic cytology slides and a host of administrators to ensure the validity and integrity of a program of this nature.

PT is not a new concept. European countries have testing for a variety of areas in pathology. In the United Kingdom, there are quality-assurance testing programs for a variety of areas called National External Quality Assessment Schemes; whereas, in the European Union, there is an aptitude test for cervical cytology that was created by the Committee for Quality Assurance, Training, and Education (QUATE).2 It is important to learn from their examples. These programs have struggled to create testing that is equal for each laboratory or individual. The objective of the QUATE committee was to test each individual using the exact same slides to ensure equity of examination. However, a problem of reproducibility regarding cervical cytology slides was noted that was similar to the difficulties noted in the United States today. This problem was solved by gathering the test takers in a single place where they could use the same material placed on different microscopes to permit quick, sequential examination by several individuals.2 Although it was fair, this meant that only a small number of participants could take the test at any given time, creating a logistical nightmare.

Currently, national cytology PT committees in the United States are discussing critical issues with the current PT program. Recent advances in digital slide-scanning technology again have raised the possibility that PT could be enhanced if it were performed on a computer.
This possibility was studied first in the late 1990s by both the Centers for Disease Control and Prevention (CDC) and the International Academy of Cytology (IAC). At the time, concerns were raised regarding the ability of a digital test to mimic the laboratory experience closely. Current advances in this technology have enabled those of us interested in digital imaging to revisit this idea. In this article, we discuss our experience in the development and evaluation of PT using virtual microscopy.
MATERIALS AND METHODS

Slide Acquisition
For this crucial step, we decided to use the liquid-based screening method that was available in our laboratory, gynecologic ThinPrep (Cytyc Corporation, Marlborough, Mass) specimens. Our decision was based on previous knowledge of the difficulty of imaging 3-dimensional groups appropriately with our scanner, an Aperio Technologies T3 ScanScope CS (Aperio Corporation, Vista, Calif). A previous report indicated that liquid-based preparations may have fewer problems with depth of focus than conventional gynecologic slides.3 The ThinPrep (Cytyc Corporation) slides were acquired from a collection of 1000 examination slides that had been used by the School of Cytotechnology at the Wisconsin State Laboratory of Hygiene (Madison, Wis). The coordinator of the mock virtual-PT program pulled all slides that had a 100% diagnostic concordance rate among the cytology students from the previous class and a concordance with the pathologist's actual diagnosis. The slides that were compiled followed the diagnostic guidelines of the standard CLIA-mandated proficiency test with specimens grouped as follows: category A, unsatisfactory; category B, negative for intraepithelial lesion with or without infectious agents; category C, low-grade squamous intraepithelial lesions (LSIL); category D, high-grade squamous intraepithelial lesions (HSIL), adenocarcinoma in situ, and all slides that were positive for carcinoma. The slides were assigned to 1 of 3 proficiency tests for slide scanning.
Image Acquisition
The Aperio T3 ScanScope (Aperio Corporation) instrument was used to digitize the cellular area of a ThinPrep (Cytyc Corporation) slide, or 24 mm × 25 mm. The ScanScope console is equipped with an Olympus ×20/0.75 U Plan-Fluor objective, which generates a 0.5-µm per pixel scanning resolution, and a ×40 scanning optical magnification changer, which produces a 0.25-µm per pixel scanning resolution. Line scanning (US Patent 6,711,283) is used to create a 24-bit, color, contiguous TIFF slide file.4 The manual slide-scanning process, as outlined in the ScanScope User Guide, was used. Upon loading, a macro image of the slide and its label were taken for archival purposes. The slides were scanned at an absolute magnification of ×200. A prescan operation that calibrated the background illumination levels was performed, and the operator initiated large-scale scanning of all the examination slides. The acquired virtual image was evaluated for image quality. The virtual slides were labeled according to PT number and position in the test and were saved on the local host machine as 24-bit, color, contiguous TIFF files that were readily viewable.4 The slides were not altered in any way.
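To make these scanning resolutions concrete, a back-of-envelope calculation (ours, not from the ScanScope documentation) shows why full-area slide imaging produces large files. The helper function and figures below are illustrative assumptions: the full 24 mm × 25 mm cell area, 24-bit RGB, and the quoted ×20 resolution of 0.5 µm per pixel.

```python
# Rough, uncompressed size of a full-area virtual slide.
# Assumptions (illustrative): 24 mm x 25 mm ThinPrep cell area,
# 24-bit (3-byte) RGB pixels, 0.5-um/pixel scanning resolution.

UM_PER_MM = 1000

def raw_slide_bytes(width_mm: float, height_mm: float,
                    um_per_pixel: float, bytes_per_pixel: int = 3) -> float:
    """Uncompressed image size, in bytes, for a rectangular scan area."""
    width_px = width_mm * UM_PER_MM / um_per_pixel
    height_px = height_mm * UM_PER_MM / um_per_pixel
    return width_px * height_px * bytes_per_pixel

raw = raw_slide_bytes(24, 25, 0.5)
print(f"{raw / 1e9:.1f} GB uncompressed")  # 7.2 GB uncompressed
```

The approximately 200-MB stored files reported in the Discussion therefore imply compression on the order of 35:1 in the saved TIFF images.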
TABLE 1
Correct (Target) Diagnoses per 10-Slide Virtual Proficiency Test

Virtual proficiency test   Cytotechnologist 1   Cytotechnologist 2   Cytotechnologist 3   Cytopathologist 1   Cytopathologist 2
1                          10                   10                   10                   10                  10
2                          10                   9*                   9*                   10                  10
3                          10                   10                   10                   10                  10

* The target diagnosis was not met by the cytotechnologist.
TABLE 2
Distribution of Target Diagnosis Versus Interpreter Diagnosis: Virtual Proficiency Test 2

Slide No.   Reference interpretation        Responses given                    No. of responses, CT   No. of responses, PATH
1           NILM                            NILM                               3                      2
2           HSIL                            HSIL; LSIL                         1; 2                   2; 0
3           NILM                            NILM                               3                      2
4           NILM                            NILM                               3                      2
5           HSIL                            HSIL                               3                      2
6           NILM                            NILM                               3                      2
7           HSIL                            HSIL                               3                      2
8           LSIL                            LSIL                               3                      2
9           NILM                            NILM                               3                      2
10          Positive for malignancy         Positive for malignancy; HSIL      3; 0                   1; 1

CT indicates cytotechnologist; PATH, pathologist; NILM, negative for intraepithelial lesion or malignancy; HSIL, high-grade squamous intraepithelial lesion; LSIL, low-grade squamous intraepithelial lesion.
PT Creation, Testing, and Grading
Ten virtual slides were assigned to each of the 3 proficiency tests. An answer sheet was created for the participating cytotechnologists and pathologists. Each sheet contained the slide number, appropriate case history, and an area for diagnostic interpretation. The answers given by the participants followed the categories of A through D described above. Three senior cytotechnologists and 2 board-certified cytopathologists participated in all 3 proficiency tests. The cytotechnologists followed the requirements of a primary screener, and the cytopathologists utilized the secondary screener option, because it reflects their laboratory duties and is permitted under current PT regulations. Each participant was instructed to adhere to the current CMS guidelines for time allocation and diagnostic categorization. After it was determined that the initial virtual screening took significantly longer than the CMS guideline allows, the time limit was waived, and the participants were instructed to record the time it took to complete the test. The factors that led to the variance from the testing regulation of 10 slides completed within 2 hours, and the actual screening time per slide for the primary screeners, are incorporated into this report (see Results, below).
Upon completion of the test, the coordinator scored the results based on the current gynecologic proficiency test scoring for cytotechnologists and for the cytopathologists. A grade of pass or fail was issued.
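As a sketch of how grid-based grading of this kind works, the snippet below scores a 10-slide answer sheet against a (reference category, response category) point grid. The point values are hypothetical stand-ins chosen only to reproduce one behavior reported in the Results, namely that a category-D slide answered as category C can still earn full credit; they are NOT the actual CMS grid values.

```python
# Illustrative (not official) point-grid scoring for a 10-slide test:
# each response earns points from a (reference, response) grid, and
# 90 of 100 points are needed to pass. Grid values are hypothetical.

CATEGORIES = ["A", "B", "C", "D"]  # unsat, negative, LSIL, HSIL+

# Hypothetical GRID[reference][response] -> points out of 10. It encodes
# one behavior the study reports: swapping C and D (both abnormal) still
# earns full credit, while calling an abnormal slide negative is
# penalized heavily.
GRID = {
    "A": {"A": 10, "B": 10, "C": 0, "D": 0},
    "B": {"A": 10, "B": 10, "C": 0, "D": 0},
    "C": {"A": 0, "B": 0, "C": 10, "D": 10},
    "D": {"A": 0, "B": -5, "C": 10, "D": 10},
}

def score_test(references, responses, passing=90):
    """Return (total points, pass/fail) for one 10-slide answer sheet."""
    total = sum(GRID[ref][resp] for ref, resp in zip(references, responses))
    return total, total >= passing

# Test 2 reconstructed for a cytotechnologist who called slide 2 (an
# HSIL, category D) as LSIL: under this grid the under-call scores 100.
refs  = ["B", "D", "B", "B", "D", "B", "D", "C", "B", "D"]
resps = ["B", "C", "B", "B", "D", "B", "D", "C", "B", "D"]
print(score_test(refs, resps))  # (100, True)
```

This mirrors the Results below, where 2 primary screeners missed the target diagnosis on 1 slide (Table 1) yet still scored 100% on the CMS grid (Table 3).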
RESULTS

Only 1 slide caused the primary screeners difficulty, leading to a single missed diagnosis and a grade of 90% (Table 1). It is noteworthy that this was Slide 2, which showed HSIL (Table 2). This may represent difficulty in evaluating hyperchromatic, crowded groups on the slide. It also is worth noting that the secondary screeners differed on the severity of a squamous cell carcinoma. Although this was a difference in final diagnosis, the clinical significance was the same, and the response was graded equally. All participants scored 100% using the CMS grading scale (Table 3).

Participants were given instruction on the utilization of the software, and each had some practice using it. However, when they were given the first 2 proficiency tests, primary screeners uniformly could not finish in the allotted time (Table 4). The mean time for primary screeners for the first 2 virtual proficiency tests was 189.3 minutes for the first test and 150.3 minutes for the second test. However, by the last test, the mean for primary screeners had fallen to 74.6 minutes. The inability to adhere to the time limitations was a significant factor. The reasons given were unfamiliarity with the software in a testing situation, eye strain from viewing the monitor, and a lack of ergonomically appropriate controls for screening. These difficulties were noted only among primary screeners. Secondary screeners had a mean proficiency test time of 76 minutes for the first test, 73.5 minutes for the second test, and 43.5 minutes for the third test.

TABLE 3
Scored Results Using the Centers for Medicare and Medicaid Services Point-Value Grid for Cytotechnologists and Cytopathologists

Virtual proficiency test   Cytotechnologist 1   Cytotechnologist 2   Cytotechnologist 3   Cytopathologist 1   Cytopathologist 2
1                          100                  100                  100                  100                 100
2                          100                  100*                 100*                 100                 100
3                          100                  100                  100                  100                 100

* On the basis of the current Centers for Medicare and Medicaid Services scoring grid, the participants were able to pass with a rate of 100%.

TABLE 4
Actual Time to Evaluate Virtual Proficiency Test

Virtual proficiency test   Cytotechnologist 1, min   Cytotechnologist 2, min   Cytotechnologist 3, min   Cytopathologist 1, hr   Cytopathologist 2, hr
1                          200                       158                       143                       <2                      <2
2                          135                       177                       139                       <2                      <2
3                          85                        79                        60                        <2                      <2
DISCUSSION

The current study represents a natural progression in the examination of digital and computerized PT. In the past, other studies3,5,6 focused on conventional smears and partial digitization of the diagnostic material on a slide. After taking their analyses into consideration, we chose to use existing technology to emulate the glass-slide PT with a current, widely used, liquid-based preparation. All diagnostic material on a slide was imaged, which forced screeners to screen the digital representation completely. This allowed us to examine aspects that previously were not discussed in considerable detail, such as the design of the test and time allotment. We believe that our findings will be important in any future development of virtual PT.

The development of virtual PT began in March 1994, when a Clinical Laboratory Improvement Advisory Committee recommended that the CDC develop a plan for computer-imaging PT as an alternative.6 CytoView, a prototype computer image-based gynecologic cytology proficiency test, was born out of those initial efforts.6 The CytoView system was among the first of its kind with virtual partial-slide imaging rather than static imaging; ie, a viewer could view a portion of a whole slide rather than selected images. Thus, for the first time, locator skills could be evaluated in addition to diagnostic ability on a computerized system. A 5 mm × 10 mm area of a slide that contained the most diagnostic material was scanned by taking hundreds of images at a particular magnification and then stitching the individual frames together to create a representation of the slide.6 Although it provided acceptable images, drawbacks included large file size because of the imaging process, compounded by the creation of z-stacks (imaging of 5 complete levels of focus on a slide to overcome the 3-dimensional effect of conventional smears); tiling defects, in which the background coloration of a particular frame would differ from that of an adjacent frame; and, finally, time factors, in that it took 24 hours to scan a partial slide manually with the z-stacks for a case.6 Although the CytoView system had significant drawbacks, it showed that, with future advances in technology, computer-based PT was feasible.

An evaluation of computerized PT by the IAC in 1998 highlighted many of the potential advantages of utilizing a digital system over glass slides. The examination of both locator skills and diagnostic skills was put forth by many7–9 as a necessity for any
computerized PT program. Glass-slide PT may test adequately for diagnostic skills, but it lacks testing for locator skills. It has been reported in the literature that the majority of laboratory-based, false-negative results arise from screening error.10 If PT is truly to be beneficial to patients, then assessment and improvement of locator skills will be vital. Virtual microscopy, with the appropriate software, offers an increased chance of evaluating locator skills over conventional PT methods.

The IAC also discussed the need for PT validity. There are different types of validity: construct validity, content validity, and face validity.7 Many have expressed the need to create PT that mirrors performance in the real world, that is, PT that has construct validity. If this is an objective, then the current basic format for PT is flawed: a 10-slide test cannot recapitulate the workplace environment. The most obvious validity in PT is content validity, or agreement among knowledgeable personnel on a correct answer. The current CLIA-mandated rule, 42 CFR 493.945(b)(1), states that agreement of 3 board-certified anatomic pathologists with tissue confirmation constitutes adequate content validity. Field validation provides a higher level of content validity, because many different trained individuals evaluate slide material.11 An adequate PT slide ideally would have 90% agreement among cytopathologists and cytotechnologists. Virtual PT has an additional burden, in that it must be proven that a diagnosis rendered on a glass slide directly correlates with its digital analog to have true content validity. Face validity involves creating a test with materials that emulate working materials. This is difficult for a virtual cytopathology proficiency test: a test on a computer will not be face valid, because the actual work is performed with a microscope.

Digital imaging has improved vastly over the last 5 to 10 years. Today's systems have resolution that is close to diagnostic quality.
Our Aperio T3 ScanScope produced clear, 2-dimensional images of a ThinPrep slide in 9 minutes per slide. Through the use of Aperio's patented scanning method (Fig. 1), most of the slide remained in focus. The scanner picks many different focal points on a slide; and, as it scans a "strip," it brings these points into focus (Fig. 2). This lessens the need for depth of focus, or "z-stacks," but does not eliminate it.

We sought to create a test that mirrored the glass-slide PT as much as possible. Whereas other reports in the literature3,5,6 have limited the testing material to only a portion of the slide, we chose to image the entire cellular area of the ThinPrep slide. This was not problematic technically because of the limited diagnostic area of a ThinPrep slide (24 mm × 25 mm). The size of our slides averaged 200 MB. Storage space is inexpensive today, so this is no longer a limitation: hard drives are available that can store data from hundreds of gigabytes to terabytes. In addition, because our virtual slides were stored on a hard drive, problems related to bandwidth and the transmission of large amounts of data also were avoided. Our entire virtual PT can be stored on 1 DVD.

Although the technical problems of file size were overcome, other more pertinent issues with virtual PT were discovered. Improvements in technology have made virtual PT technically feasible, but it is far from perfected. Although liquid-based preparations, new focusing methods, and better software designs have improved our ability to digitize images, there still are instances in which creating a high degree of depth of focus is imperative. Table 2 shows that the only slide that was not classified correctly by all of our primary screeners was an HSIL that was under-called by 2 screeners as LSIL; and, although all of our participants passed, most if not all noted the loss of depth of focus.

FIGURE 1. This is an illustration of the Aperio T3 ScanScope (Aperio Corporation, Vista, Calif) scanning method, which uses strips to scan the diagnostic area of the slide.

FIGURE 2. This is an illustration of the prescan stage. Numerous focal points are chosen by the computer; and, as the scanner moves down a strip, its focal point changes levels to bring more of the cells on a slide into focus.
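The focal-point selection illustrated in Figure 2 relies on some per-region sharpness score. The snippet below sketches one common focus measure, the variance of a Laplacian; this measure is our illustrative assumption, not the method Aperio's documentation specifies.

```python
import numpy as np

# Sketch of per-region focal-plane selection: given the same tile imaged
# at several focal depths (a "z-stack"), keep the plane that maximizes a
# sharpness score. Variance of a 4-neighbor Laplacian is a common focus
# measure; it is an illustrative choice, not Aperio's documented method.

def laplacian_variance(tile):
    """Sharpness score: variance of a 4-neighbor Laplacian of a grayscale tile."""
    lap = (-4.0 * tile[1:-1, 1:-1]
           + tile[:-2, 1:-1] + tile[2:, 1:-1]
           + tile[1:-1, :-2] + tile[1:-1, 2:])
    return float(lap.var())

def best_focus_plane(z_stack):
    """Index of the sharpest plane in a list of grayscale tiles."""
    return int(np.argmax([laplacian_variance(t) for t in z_stack]))

# Toy check: smoothing a tile lowers its sharpness score, so the
# unsmoothed plane is selected.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = (sharp + np.roll(sharp, 1, axis=0) + np.roll(sharp, 1, axis=1)) / 3
print(best_focus_plane([blurred, sharp]))  # 1
```

A scanner that keeps one such plane per region yields a mostly in-focus 2-dimensional image, which is why this approach lessens, but does not eliminate, the need for full z-stacks on thick cell groups.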
FIGURE 3. This screen capture image of the Aperio T3 ScanScope (Aperio Corporation, Vista, Calif) software interface shows a digitally annotated or "dotted" area.
In liquid-based preparations, there is difficulty in recognizing HSIL, which often appears as hyperchromatic, crowded groups.12 Even though there are fewer large masses of cells in liquid-based preparations, there are more clumps of small groups of cells. This can make accurate diagnosis impossible without the ability to focus through the groups. Unfortunately, most commercial software packages do not allow an entire slide to be "z-stacked" automatically. We believed that the 3-dimensional revisit software provided by Aperio would be a solution; however, although it does allow depth of focus through preselected areas, the area remains visibly annotated, thus negating the test. There are a few proprietary software packages that may offer a solution to this problem. A study using CytoView II from the CDC showed that scores on virtual PT correlated well with glass-slide PT (mean, 99.2% on glass slides vs 96.8% on virtual microscopy).5 That patented system allowed for z-stacks created through an automated system, but only a portion of the slide was imaged (5 mm × 10 mm).5

The most notable limitation that we identified was that a reliable correlation was not established between virtual PT and the actual work environment. This problem was noticed first when the first test was administered. Table 4 shows that the primary screeners were far over the mandated 2-hour time limit for the first 2 tests, but they were within the time allotment for the last test. The examinees were not familiar with the software environment or the controls used for adjustments of lighting, color, magnification, and annotations (Fig. 3). When those issues were combined with the elimination of the (manual) focusing performed on a microscope, there was an absence of real workplace simulation. This may have
led to a form of bias in the test, in that computer knowledge was an unintentionally measured skill.13 Those who had good computer dexterity, but not necessarily screening competence, were able to use the software better than others. Some participants complained of increased fatigue, with reports of eye strain and wrist pain related to screening a full slide on the computer. Anecdotally, even when they were given instructions on how to use the equipment to simulate the microscope, some did not or could not utilize the information adequately.

Any virtual microscopy testing program needs to address the issue of a user-friendly software interface for cytopathology professionals. Controls for lighting, contrast, and magnification need to be readily apparent. All of our examinees passed the test without difficulty; however, they all noted problems with finding basic controls. Any virtual proficiency test developer needs to ensure that no bias is introduced that detracts from the objective of measuring locator/screening and diagnostic ability.

In conclusion, the creation of a virtual proficiency test is possible. However, technical issues, such as producing a full slide with appropriate depth of focus, must be addressed before professional careers are put at risk. The psychological aspects of creating a workplace simulation cannot be ignored. The PT experience is stressful enough without the added pressure of dealing with an unfamiliar environment. Any vendor with an appropriate vision, using field-validated slides and working with motivated, highly trained cytopathology professionals, could create a test with user-friendly software and slides with full depth of focus. Currently, that would take a commitment to develop an automated system for cytology slide scanning or a committed staff willing to undergo the arduous task of manually focusing large numbers of field-validated diagnostic slides.
So, currently, the answer to the question, "Are we there yet?" is, "We are not, but we are getting there."
REFERENCES
1. US Department of Health and Human Services, Centers for Medicare and Medicaid Services and Centers for Disease Control. 42 CFR Part 493.855(a), January 24, 2003. Bethesda, Md: US Department of Health and Human Services; 2003.
2. Demichelis F, Della Mea V, Fort S, Dalla Palma P, Beltrami CA. Digital storage of glass slides for quality assurance in histopathology and cytopathology. J Telemed Telecare. 2002;8:138–142.
3. Marchevsky AM, Khurana R, Thomas P, Scharre K, Farias P, Bose S. The use of virtual microscopy for proficiency testing in gynecologic cytopathology. Arch Pathol Lab Med. 2006;130:349–355.
4. Aperio Technologies. ScanScope CS User Guide. Vista, Calif: Aperio Corporation; 2005.
5. Taylor RN, Gagnon M, Lange J, Lee T, Draut R, Kujawski E. CytoView: a prototype computer image-based Papanicolaou smear proficiency test. Acta Cytol. 1999;43:1045–1051.
6. Gagnon M, Inhorn S, Hancock J, et al. Comparison of cytology proficiency testing: glass slides versus virtual slides. Acta Cytol. 2004;48:788–794.
7. Vooijs GP, Davey DD, Sumrak TM, et al. Computerized training and proficiency testing: International Academy of Cytology Task Force summary. Acta Cytol. 1998;42:141–147.
8. Nagy GK, Newton LE. Cytopathology proficiency testing: where do we go from here? Diagn Cytopathol. 2006;34:257–264.
9. Breyer FJ. A computerized model-based approach for assessing cytotechnologists' recognition and referral skills. Lab Med. 1994;25:264–266.
10. Nguyen HN, Nordqvist SRB. The Bethesda System and evaluation of abnormal Pap smears. Semin Surg Oncol. 1999;16:217–221.
11. Renshaw AA, Wang E, Mody DR, Wilbur DC, Davey DD, Colgan TJ; Cytopathology Resource Committee, College of American Pathologists. Measuring the significance of field validation in the College of American Pathologists interlaboratory comparison program in cervicovaginal cytology: how good are the experts? Arch Pathol Lab Med. 2005;129:609–613.
12. Renshaw AA, Young NA, Birdsong GG, et al. Comparison of performance of conventional and ThinPrep gynecologic preparations in the College of American Pathologists Gynecologic Cytology Program. Arch Pathol Lab Med. 2004;128:17–22.
13. Friedman CP, Wyatt JC. Evaluation Methods in Medical Informatics. New York, NY: Springer; 1997:155–200.