Anatomic Pathology / Variability in Lymph Node Counting
To Count and How to Count, That Is the Question
Interobserver and Intraobserver Variability Among Pathologists in Lymph Node Counting
Vinita Parkash, MD,1 Carlo Bifulco,1 Richard Feinn, PhD,2,3 John Concato, MD,2,3 and Dhanpat Jain, MD1
CME/SAM
Key Words: Lymph node counting; Cancer staging; Interobserver variability; Intraobserver variability
DOI: 10.1309/AJCPO92DZMUCGEUF
Upon completion of this activity you will be able to:
• identify the most common reasons for variability in lymph node counts among pathologists.
• list the factors that could result in low lymph node counts in patients.
• describe the reasons why low total lymph node counts are associated with poorer progression-free survival.
Abstract
Optimal cancer staging requires retrieval of a minimal number of nodes. However, variability among pathologists in counting on a slide has not been studied. To study the differences in node counting among pathologists, 10 pathologists counted nodes on 15 slides on 2 occasions. They also opined on whether selected “structures” represented countable nodes. There was no slide on which all pathologists agreed on all occasions. The greatest variability was on slides on which the number of nodes exceeded 8. There was disagreement on the size of the smallest countable node, on how to count 2 closely related structures, and when the gross disagreed with the microscopic finding. With a mean count of 5.7 nodes per slide, the 95% confidence interval was ± 2.6, which could be clinically significant when the count approaches the set minimum. Uniform criteria are necessary to allow for meaningful comparisons between studies on minimal nodal counts for cancer lymphadenectomies.
The ASCP is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians. The ASCP designates this educational activity for a maximum of 1 AMA PRA Category 1 Credit ™ per article. This activity qualifies as an American Board of Pathology Maintenance of Certification Part II Self-Assessment Module. The authors of this article and the planning committee members and staff have no relevant financial relationships with commercial interests to disclose. Questions appear on p 163. Exam is located at www.ascp.org/ajcpcme.
The traditional survival prognostic criteria in cancer are extent of tumor (T), nodal status (N), and metastatic disease (M), forming the core of both major cancer-staging systems in the world (the American Joint Committee on Cancer and the International Union Against Cancer). The N has traditionally been based on the number of “positive” lymph nodes (LNs), unrelated to the total number of LNs detected. Recent studies have suggested, however, that the total number of LNs removed may also be an important prognostic variable for long-term and progression-free survival.1-4 Hence, it is now considered necessary to harvest a certain minimal number of LNs to deem a lymphadenectomy adequate. Although the disease in node-negative patients with fewer LNs than that considered adequate is still classified as N0, there is poorer local control, leading some to recommend additional adjuvant therapy.1 Hence, there is pressure on surgeons and pathologists to retrieve the requisite number of LNs. This presupposes that retrieval and counting of LNs is a standardized, reproducible process, assuming an adequate excision. The current study was undertaken to determine whether standardization exists regarding the process of LN retrieval and counting and to assess interobserver and intraobserver differences in LN counting at 2 institutions.
Materials and Methods
This project was approved by the institutional review board.
LN Retrieval
A questionnaire was given to prosectors at 2 separate but affiliated institutions. One is an academic hospital with a cancer
Am J Clin Pathol 2010;134:42-49 DOI: 10.1309/AJCPO92DZMUCGEUF
© American Society for Clinical Pathology
Anatomic Pathology / Original Article
center, where the prosectors are predominantly residents; the second is a large community hospital with a cancer center where the prosectors are pathology assistants with 7 to 15 years of experience.
LN Counting
Fifteen slides with gross descriptions relating to each block (slide) were circulated among 10 pathologists, 3 of whom practice at the academic center. The slides were circulated in batches of 6, 5, and 4 to avoid inadvertent mix-up between paperwork and slides. After the first set of results was received, the slides were circulated for a second time; the time between reviews ranged from 2 days to 6 weeks. No additional instructions were given, other than to count LNs as if in routine practice.
Follow-up Assessments
After completion of the study, 4 areas in 3 slides were marked and circulated with a questionnaire to better determine what each pathologist might count in a very specific setting. A brief roundtable discussion was conducted at the end of the study to clarify issues about the definition of and when to count LNs.
Statistical Analysis
To investigate the interrater and intrarater reliability of the number of LNs identified by different pathologists on different slides, generalizability theory was used.5 Generalizability theory is an extension of classical test theory in that it partitions the variance of observed scores (counts in this case) into systematic variability among components (facets) to better identify the sources of unreliability in measurement. The 10 pathologists counting LNs on 15 slides on 2 different occasions provide 2 facets, one for pathologists and the other for occasions. The variability among pathologists is a measure of interrater reliability, whereas the variability between occasions is a measure of intrarater reliability. Two measures of reliability commonly reported are the generalizability coefficient (ρ²) and the index of dependability (Φ). The generalizability coefficient is reported when relative decision making is the primary focus, ie, how the object of measurement (slide) is rank-ordered without regard to the absolute score. The index of dependability is appropriate when absolute decision making is the primary concern, ie, how the object of measurement (slide) is rated based on an absolute score (number of LNs detected). Because the actual number of LNs counted should be an exact number, the index of dependability was used for analyzing the current data set.
Results
The gross dissection of LNs was relatively standardized (per self-report) at both institutions ❚Table 1❚, using a combination of entire submission, fat clearing, and visualization and palpation techniques for detection of LNs. Virtually all large
❚Table 1❚ Prosector Questionnaire and Responses

Questions
1. Do you use manual palpation (MP) or Carnoy solution for lymph node retrieval?
2. Do you cut fat into multiple pieces and then “smush” each smaller fragment and look for LNs (vs keeping entire fat intact and smushing)?
3. If you cut and then smush, do you ensure that a portion of a LN was not cut into half and present in 2 different fragments of fat?
4. Carnoy only: Do you palpate the fat for large lymph nodes before fixing in Carnoy solution?
5. If yes, do you remove these and fix remainder in Carnoy?
6. Do you slice the fat before fixing in Carnoy solution?
7. If so, then at what intervals: < or >1 cm?
8. When you slice fat to look for lymph nodes (by either method), do you check both sides of the slice to ensure that the lymph node is not counted twice?
9. If tissue is small (fits into ≤3 cassettes), do you submit entire specimen?
10. If entire specimen is submitted, do you still palpate lymph nodes and submit detected nodes separately?
11. If you slice the specimen and submit in ≤3 cassettes, do you make sure that a lymph node was not sliced in half and submitted in 2 cassettes and erroneously counted as 2?

Responses
Question | PA 1 | PA 2 | PA 3 | PA 4 | PAS 1 | PAS 2 | RES 1 | RES 2 | RES 3 | RES 4
1 | Both | Both | Both | Both | Both | Both | Both | MP; rarely Carnoy | MP; occasionally Carnoy | Both
2 | Y | N | Y | Y | Y | N | Y | N | N | Y
3 | Y | Y | Y | Y | Y | Y | N | Y | Y | Y
4 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
5 | Y | N | Y | Y | Y | Y | Y | Y | Y | Y
6 | Y | Y | Y | Y | Y | Y | N | N | N | N
7 | Less | Less | Less | Less | Less | Less | Less | NA | NA | Less
8 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
9 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
10 | Sometimes | Y | Y | Y | Y | Y | Y | N | N | N
11 | Y | N | Y | Y | Y | Sometimes | N | N | Y | Y

LN, lymph node; N, no; NA, not applicable; PA, pathology assistant; PAS, pathology assistant student; RES, resident; Y, yes.
Parkash et al / Variability in Lymph Node Counting
specimens are subjected to fat clearing, although some variation exists in how fat clearing is used. Some prosectors chose to slice and clear, without initially attempting to look for LNs, whereas others chose to dissect out grossly appreciable nodes and then subject the remainder to fat clearing with a repeated dissection for smaller LNs. Smaller specimens (requiring 5 or fewer blocks) were subject to greater variation in dissection technique. Some prosectors chose to dissect out the LNs from the fat; others used entire submission without any attempt to dissect out grossly appreciable nodes. Although most prosectors looked for bisected nodes at the time of gross sectioning for larger specimens, at least some did not do the same with entire submission. All prosectors submitted LNs in their entirety, unless there was gross involvement by tumor. ❚Table 2❚ shows the actual observations of LN counts. With combined results from all 15 slides, the total LN count ranged from 62 to 101. Two pathologists had the same total count for 1 occasion but did not agree on all slides; 4 pathologists had a count greater than 90, 4 ranged between 80 and 90, and 1 pathologist each had a count of 79 and 66 (average of 2 readings). (One pathologist did not give numbers on 2 cases and only a single total was used in that case.) No slide was found to have agreement among all pathologists as to the number of LNs on either occasion. Two cases had the greatest interobserver variability; both had more than 10 “lymphoid” fragments on the slide. The gross description on the first case was “multiple LNs” ❚Image 1A❚ (case 4), and the count ranged between 5 and 16. The other case was of “tissue submitted in its entirety with only a single LN palpated at gross examination” ❚Image 1B❚
(case 13). The count on this slide ranged between 1 and 11. No factor was readily evident to explain these differences. Anecdotally, however, 3 of 4 pathologists with counts of more than 90 had fellowship training at a cancer center, whereas only 1 of 6 with counts of fewer than 90 had formal oncologic pathology fellowship training.
In formal generalizability analysis, most of the variability in the counts, as expected, was attributed to the objects of measurement (slides) themselves; the index of dependability was Φ = 0.89. This result suggests that the correlation between the observed number of counted LNs and a theoretical “true” number of LNs is 0.89. Conversely, approximately 11% (1 minus 0.89) of the variability in the counts reflects measurement error, and by looking at the variance components, the sources contributing to misclassification can be identified. By comparing the variability associated with pathologists with the variability associated with occasions, it becomes evident that the difference among pathologists is an important source of error. Specifically, 2.5% of the variance was found between the 2 occasions on which the same pathologist reviewed the slides (intrarater variability), whereas a larger 5.8% was found between pairs of pathologists (interrater variability). The remaining variability was mainly attributable to residual variance (“error term”).
This same analysis can be used to calculate a standard error of measurement and produce a confidence interval. The mean of all measurements (LN counts) was 5.7, and the standard error of measurement was 1.3, with a 95% confidence interval of ± 2.6. Thus, for any single reading, there is 95% confidence that the true number of LNs on that particular slide lies within ± 2.6 of the number counted. This magnitude of result is
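❚Table 2❚ Slide Description and Results of Pathologists’ Counts*

Pathologist (2 columns each, 1 per review)
Slide No./Description | A | A | B | B | C | C | D | D | E | E | F | F | G | G | H | H | I | I | J | J
1/Multiple LNs | 4 | 4 | 4 | 4 | 5 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 4 | 4
2/Single LN on palpation | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1
3/Multiple LNs | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 | 3 | 3 | 4 | 4 | 3 | 3 | 4 | 4 | 3 | 3
4/Multiple LNs | 13 | 9 | 10 | 11 | 13 | 14 | 14 | 16 | 8 | 9 | 5 | 8 | 11 | 12 | 10 | 10 | 12 | 12 | 10 | 10
5/2 LNs bisected (1 inked black) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 4 | 4 | 2 | 2 | 2 | 4 | 2 | 2 | 4 | 4 | 2 | 2
6/Multiple LNs | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 6 | 5 | 5 | 5 | 5 | 4 | 6 | 5 | 5 | 5 | 5 | 5 | 5
7/Multiple LNs | 3 | 2 | 3 | 4 | 3 | 4 | 4 | 3 | 2 | 2 | 1 | 1 | 3 | 3 | 2 | 2 | 3 | 3 | 2 | 1
8/Multiple LNs | 1 | 1 | 2 | 1 | 3 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1
9/Entirely submitted in 1 block | 2 | 2 | 2 | 2 | 5 | 5 | 4 | 1 | 3 | 2 | 1 | 1 | 3 | 1 | 4 | 4 | 6 | 6 | 4 | 2
10/Multiple LNs | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 9 | 9 | 10 | 9 | 10 | 10 | 9 | 9 | 10 | 10
11/Multiple LNs | 11 | 11 | 12 | 13 | 12 | 13 | 13 | 12 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11
12/Multiple LNs | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 10 | 10 | 10 | 10 | 10 | 10 | 12 | 12 | 12 | 12 | 12 | 12
13/Entirely submitted in 1 block; 1 node palpated on gross examination | † | 10 | 11 | 10 | 9 | 10 | 9 | 11 | 4 | 9 | 1 | 3 | 4 | 5 | 5 | 4 | 9 | 9 | 5 | 4
14/Entirely submitted in 1 block | † | 3 | 4 | 3 | 4 | 4 | 4 | 3 | 1 | 3 | 1 | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 3 | 3
15/Multiple LNs | 8 | 8 | 9 | 8 | 8 | 8 | 8 | 8 | 8 | 10 | 7 | 8 | 8 | 7 | 7 | 7 | 8 | 8 | 7 | 7
Total | - | 83 | 92 | 92 | 98 | 101 | 97 | 95 | 75 | 85 | 62 | 69 | 80 | 82 | 81 | 81 | 94 | 95 | 81 | 76

LN, lymph node.
* Slides were circulated among pathologists twice; the time between reviews ranged from 2 days to 6 weeks.
† Pathologist did not count lymph nodes and had a query.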
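As a cross-check on Table 2, the following Python sketch recomputes the per-slide range of counts, the total for each complete reading, and the overall mean count, and shows how the reported standard error of measurement (1.3) translates into an interval around a single count. It is an illustration only, not the authors' analysis. The names `COUNTS`, `slide_range`, `reading_totals`, `overall_mean`, and `interval` are hypothetical; numbering each pathologist's two columns as readings 1 and 2 assumes the columns appear in review order; and the multiplier k = 2 is an assumption chosen because 2 × 1.3 reproduces the reported ± 2.6 (1.96 × 1.3 would give about 2.5).

```python
from statistics import mean

# Counts transcribed from Table 2. One row per slide; 20 readings in the
# order A, A, B, B, ..., J, J (two readings per pathologist). None marks
# the two slides on which pathologist A gave no count at one reading.
COUNTS = {
    1:  [4, 4, 4, 4, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4],
    2:  [2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1],
    3:  [4, 3, 4, 4, 4, 4, 4, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3],
    4:  [13, 9, 10, 11, 13, 14, 14, 16, 8, 9, 5, 8, 11, 12, 10, 10, 12, 12, 10, 10],
    5:  [2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 4, 2, 2, 4, 4, 2, 2],
    6:  [5, 5, 5, 6, 6, 6, 6, 6, 5, 5, 5, 5, 4, 6, 5, 5, 5, 5, 5, 5],
    7:  [3, 2, 3, 4, 3, 4, 4, 3, 2, 2, 1, 1, 3, 3, 2, 2, 3, 3, 2, 1],
    8:  [1, 1, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1],
    9:  [2, 2, 2, 2, 5, 5, 4, 1, 3, 2, 1, 1, 3, 1, 4, 4, 6, 6, 4, 2],
    10: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 10, 9, 10, 10, 9, 9, 10, 10],
    11: [11, 11, 12, 13, 12, 13, 13, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
    12: [12, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12],
    13: [None, 10, 11, 10, 9, 10, 9, 11, 4, 9, 1, 3, 4, 5, 5, 4, 9, 9, 5, 4],
    14: [None, 3, 4, 3, 4, 4, 4, 3, 1, 3, 1, 2, 3, 3, 3, 4, 4, 4, 3, 3],
    15: [8, 8, 9, 8, 8, 8, 8, 8, 8, 10, 7, 8, 8, 7, 7, 7, 8, 8, 7, 7],
}
PATHOLOGISTS = "ABCDEFGHIJ"
SEM = 1.3  # standard error of measurement reported in the Results


def slide_range(slide):
    """Lowest and highest count recorded for one slide, ignoring missing readings."""
    values = [c for c in COUNTS[slide] if c is not None]
    return min(values), max(values)


def reading_totals():
    """Total count per complete reading, keyed by (pathologist, reading number)."""
    totals = {}
    for col in range(20):
        column = [COUNTS[slide][col] for slide in sorted(COUNTS)]
        if None not in column:  # skip the reading with missing counts
            totals[(PATHOLOGISTS[col // 2], col % 2 + 1)] = sum(column)
    return totals


def overall_mean():
    """Mean of every recorded count across slides, pathologists, and readings."""
    return mean(c for row in COUNTS.values() for c in row if c is not None)


def interval(count, k=2.0):
    """Interval count +/- k * SEM; k = 2 is an assumption (see lead-in)."""
    return count - k * SEM, count + k * SEM


if __name__ == "__main__":
    for slide in sorted(COUNTS):
        low, high = slide_range(slide)
        print(f"slide {slide:2d}: counts ranged from {low} to {high}")
    totals = reading_totals()
    print("reading totals range:", min(totals.values()), "to", max(totals.values()))
    print("mean count:", round(overall_mean(), 1))
    print("interval around a count of 12:", interval(12))
```

With the table values as transcribed, slide 4 spans 5 to 16 and slide 13 spans 1 to 11, the complete-reading totals run from 62 to 101, no slide has identical counts across all readings, and the overall mean rounds to 5.7, all of which agree with the figures quoted in the text.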
clinically meaningful, especially when considering the minimal node count requirements for determining the adequacy of a nodal dissection. Intraobserver variability was much less pronounced than interobserver variability. Only 2 pathologists were in complete agreement with themselves on the 2 occasions with respect to the total count, but not for all 15 slides. Five had a difference in count between 2 reviews of 3 or fewer, and 3 had a variation of more than 5, one of whom counted 13 cases
for both occasions. There was greater intraobserver variability on slides with more than 8 fragments of “lymphoid tissue,” accounting for about 49% of the episodes of disagreement. Differences were found within specific scenarios as to what each pathologist considered to be a LN ❚Table 3❚. Five pathologists did not consider a dispersed collection of lymphocytes without a defined capsule to be a LN ❚Image 2A❚. Only 4 pathologists counted 2 “LNs” in a single fragment
❚Image 1❚ Slide scans of the 2 cases with the greatest interobserver variation in count among pathologists. A (Case 4), The fragments of “lymph node tissue” exceeded 10, and pathologist counts ranged between 5 and 16. An arrow points to a fragment of fat with 2 “lymph nodes” that 6 pathologists counted as 2 separate nodes (H&E). B (Case 13), Grossly described as “entirely submitted tissue, with one palpable lymph node” (H&E). Pathologist counts ranged between 1 and 11.

❚Table 3❚ Pathologist Questionnaire Relating to Marked Slides and Responses

Pathologist
Slide No./Question | A | B | C | D | E | F | G | H | I | J
1/Does this organized collection of lymphocytes with no definite capsule count as a lymph node? (Image 2A) | Y | Y | Y | Y | N | Y | Y | Y | Y | N
2/Does this “lymphoid follicle” with no definite capsule count as a lymph node? (Image 2B) | Y | N | Y | Y | N | Y | Y | N | N | N
3/If multiple nodes are submitted in a single cassette and there are 2 “lymph nodes” in a single fragment of fat, do you count this as 1 node or more? Please count. (Image 1A) | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 1 | 2
4/If the gross description says a single node is detected and microscopic examination reveals 1 grossly appreciable node and 1 minute lymph node, do you count this as 1 or 2? | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 2
5/If the gross description reads as “2 × 1 × 1-cm aggregate of adipose tissue which is entirely submitted,” and the slide shows several “lymph nodes,” is this a single lymph node? | ?Y | N | N | N | N | N | N | N | N | N
6/Should this count as a lymph node? | N | ?Y | Y | Y | N | Y | Y | Y | Y | N
7/Count the number of lymph nodes (positive/total number) as if in a case. (Image 3) | 1/2 | 1/2 | 1/2 | 1/2 | 1/2 | 1/2 | 1/2 | 1/2 | 1/2 | 1/2

N, no; Y, yes.
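The split of opinion on each Table 3 question can be tallied with a few lines of Python. This is a minimal sketch for illustration, not part of the study: the names `RESPONSES`, `tally`, and `unanimous` are hypothetical, and the entry “?Y” is kept exactly as it appears in the table rather than being merged with “Y”.

```python
from collections import Counter

# Responses transcribed from Table 3, pathologists A through J in order.
# "?Y" is kept verbatim from the table and counted as its own answer.
RESPONSES = {
    1: ["Y", "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", "N"],
    2: ["Y", "N", "Y", "Y", "N", "Y", "Y", "N", "N", "N"],
    3: ["1", "2", "2", "2", "2", "1", "2", "1", "1", "2"],
    4: ["1", "1", "2", "2", "2", "2", "2", "1", "1", "2"],
    5: ["?Y", "N", "N", "N", "N", "N", "N", "N", "N", "N"],
    6: ["N", "?Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", "N"],
    7: ["1/2"] * 10,
}


def tally(question):
    """Number of pathologists giving each answer to one question."""
    return Counter(RESPONSES[question])


def unanimous():
    """Questions on which all 10 pathologists gave the same answer."""
    return [q for q, answers in RESPONSES.items() if len(set(answers)) == 1]


if __name__ == "__main__":
    for question in RESPONSES:
        print(question, dict(tally(question)))
    print("unanimous questions:", unanimous())
```

On the transcribed data, question 3 splits 4 to 6 and question 4 splits 4 to 6 between counting 1 node and counting 2, and only question 7 is answered identically by all 10 pathologists.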
of fat as 1 node (Image 1A, arrow; Question 3, Table 3). Similarly, there was disagreement on how to count multiple “LNs” in a single fragment of fat (Image 1B). When a second microscopic “LN” was detected in a block in which only a single LN was palpated grossly, 4 pathologists thought that the gross description should take precedence and counted this as a single LN. Five pathologists thought that minute “LN” structures (