Annals of Surgical Oncology, 12(8): 654–659

DOI: 10.1245/ASO.2005.06.037

Predicting Nonsentinel Node Status After Positive Sentinel Lymph Biopsy for Breast Cancer: Clinicians Versus Nomogram

Michelle C. Specht, MD (1), Michael W. Kattan, PhD (2, 3), Mithat Gonen, PhD (3), Jane Fey, MPH (1), and Kimberly J. Van Zee, MD, FACS (1)

(1) Department of Surgery, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, MRI 1026, New York, New York 10021
(2) Department of Urology, Memorial Sloan-Kettering Cancer Center, 353 East 68th Street, New York, New York 10021
(3) Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 307 East 63rd Street, New York, New York 10021

Background: With increasing frequency, breast cancer patients and clinicians are questioning the need for completion axillary lymph node dissection (ALND) in the setting of a positive sentinel lymph node (SLN). We previously developed a nomogram to estimate the likelihood of residual disease in the axilla after a positive SLN biopsy result. In this study, we compared the predictions of clinical experts with those generated by the nomogram and evaluated the ability of the nomogram to change clinicians' behavior.

Methods: Pathologic features of the primary tumor and SLN metastases of 33 patients who underwent completion ALND were presented to 17 breast cancer specialists. Their predictions for each patient were recorded and compared with results from our nomogram. Subsequently, clinicians were presented with clinical information for eight patients and asked whether they would perform a completion ALND before and after being presented with the nomogram prediction.

Results: The predictive model achieved an area under the receiver operating characteristic curve of .72 when applied to the test data set of 33 patients. In comparison, the clinicians as a group were associated with an area under the receiver operating characteristic curve of .54 (P < .01 vs. nomogram). With regard to performing a completion ALND, providing nomogram results did not alter surgical planning.

Conclusions: Our predictive model seemed to substantially outperform clinical experts. Despite this, clinicians were unlikely to change their surgical plan based on nomogram results. It seems that most clinicians can improve their predictive ability by using the nomogram to predict the likelihood of additional non-SLN metastases in a woman with a positive SLN biopsy result.

Key Words: Sentinel lymph node biopsy; Nomogram; Predictions; Breast cancer; Completion axillary lymph node dissection.

Health care decisions have become complex. Medical technology now provides us with improved diagnostics and new treatments, as well as an overwhelming amount of information and number of patient choices. Staging systems and decision trees are examples of tools that we use on a daily basis to help make clinical decisions.1 Clinicians are presumably able to process this information in their heads or "on the back of an envelope." However, these instruments are limited in that they often oversimplify the answers to complex questions.

Received July 1, 2004; accepted February 15, 2005; published online June 16, 2005. Address correspondence and reprint requests to: Kimberly J. Van Zee, MD, FACS. Published by Springer Science+Business Media, Inc. © 2005 The Society of Surgical Oncology, Inc.


FIG. 1. Nomogram for predicting the likelihood of additional nodal metastases in breast cancer patients with a positive sentinel lymph node (SLN) biopsy result.3 NUCGRADE, tumor type and nuclear grade (ductal, nuclear grade I; ductal, nuclear grade II; ductal, nuclear grade III; lobular); LVI, lymphovascular invasion; MULTIFOCAL, multifocality of primary tumor; ER, estrogen-receptor status; NUMNEGSLN, number of negative SLNs; NUMSLNPOS, number of positive SLNs; PATHSIZE, pathologic size in centimeters; METHDETECT, method of detection of SLN metastases (frozen section, routine, serial hematoxylin and eosin [HE], or immunohistochemistry [IHC]). The first row (Points) is the point assignment for each variable. Rows 2 to 9 represent the variables included in the model. For an individual patient, each variable is assigned a point value on the uppermost scale (Points) according to the histopathologic characteristics: a vertical line is drawn from the appropriate variable value up to the Points line. The points for all 8 variables are summed, and the total is located in row 10 (Total Points). A vertical line is then drawn from that total down to the final row, row 11 (Predicted Probability of + Non-SLN). Reprinted with permission.3
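The reading procedure in the Fig. 1 caption (assign points per variable, sum them, then read the probability off the bottom scale) amounts to a sum followed by a lookup. The Python below is a minimal sketch under stated assumptions: the per-patient point values and the tick marks pairing Total Points with Predicted Probability are hypothetical placeholders, not values from the published nomogram, and linear interpolation between neighbouring tick marks only approximates reading the nonlinear probability scale by eye.

```python
from bisect import bisect_right

# HYPOTHETICAL (total points, predicted probability) tick marks standing in
# for rows 10 and 11 of a nomogram; these are NOT the published Fig. 1 values.
TICKS = [(40, 0.05), (80, 0.10), (120, 0.20), (150, 0.30),
         (175, 0.40), (200, 0.50), (225, 0.60), (250, 0.70)]


def total_points(points_by_variable):
    """Sum the points read off the 'Points' row for each variable (row 10)."""
    return sum(points_by_variable.values())


def predicted_probability(total, ticks=TICKS):
    """Read row 11 by interpolating linearly between neighbouring tick marks.

    Totals outside the tick range are clamped to the first or last tick.
    """
    xs = [x for x, _ in ticks]
    if total <= xs[0]:
        return ticks[0][1]
    if total >= xs[-1]:
        return ticks[-1][1]
    i = bisect_right(xs, total)  # guarantees xs[i - 1] <= total < xs[i]
    (x0, p0), (x1, p1) = ticks[i - 1], ticks[i]
    return p0 + (p1 - p0) * (total - x0) / (x1 - x0)


# HYPOTHETICAL points for one patient, keyed by the Fig. 1 variable names.
patient = {"NUCGRADE": 15, "LVI": 20, "MULTIFOCAL": 0, "ER": 5,
           "NUMNEGSLN": 10, "NUMSLNPOS": 0, "PATHSIZE": 30, "METHDETECT": 20}
total = total_points(patient)
print(total, round(predicted_probability(total), 3))  # 100 0.15
```

A web or handheld version of a nomogram would normally skip the lookup and evaluate the underlying model directly; the lookup is shown only because it mirrors what a reader does with the printed chart.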

Prognostic nomograms are another type of clinical prediction instrument. In the statistical sense, a nomogram is a particular graphical representation of a continuous prediction.2 Nomograms are often built from a multivariable logistic regression analysis in order to predict an outcome accurately. Nomogram calculations cannot be performed in a clinician's head, but they are easily brought to the bedside as a simple graphical tool or on a personal digital assistant. We recently introduced a nomogram to the practice of breast surgery. This nomogram incorporates eight characteristics of the primary breast cancer and the sentinel lymph node (SLN) to predict the likelihood of non-SLN metastases in patients with a positive SLN biopsy result3 (Fig. 1; http://www.nomograms.org). SLN biopsy alone, without complete axillary lymph node dissection (ALND), has been adopted at many institutions as an accurate method of staging the axilla that avoids much of the morbidity4 associated with a complete ALND. However, the standard of care for breast cancer patients with SLN metastases remains completion ALND. Still, many question the need for completion ALND in every patient with detectable SLN metastases, particularly those in whom the perceived risk of additional disease is low. We sought to determine whether this prognostic nomogram was more accurate than clinical experts in predicting the likelihood of residual disease.
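Because a nomogram of this kind is a graphical form of a logistic regression model, its points scale is a rescaled linear predictor. The sketch below shows one common way such a scale can be constructed (the variable with the widest coefficient-weighted range spans 0 to 100 points) and checks that the probability recovered from total points equals the probability computed from the logistic model directly. The intercept, coefficients, and variable ranges are hypothetical placeholders, only three of the eight variables are included for brevity, and we do not claim this is exactly how Fig. 1 was drawn.

```python
import math

# HYPOTHETICAL logistic-regression fit, for illustration only (NOT the published
# nomogram): intercept, then (coefficient, lowest value, highest value) per variable.
INTERCEPT = -3.0
VARIABLES = {
    "PATHSIZE": (0.5, 0.0, 5.0),    # pathologic size, cm
    "LVI": (0.8, 0.0, 1.0),         # 0 = absent, 1 = present
    "NUMNEGSLN": (-0.4, 0.0, 5.0),  # number of negative SLNs
}


def scale_factor(variables):
    """Widest coefficient-weighted range; that variable spans 0 to 100 points."""
    return max(abs(b) * (hi - lo) for b, lo, hi in variables.values())


def points(name, x, variables=VARIABLES):
    """Points for one variable: its contribution b*x shifted to start at 0, then rescaled."""
    b, lo, hi = variables[name]
    lowest = min(b * lo, b * hi)  # smallest contribution over the variable's range
    return 100.0 * (b * x - lowest) / scale_factor(variables)


def probability_from_points(total, intercept=INTERCEPT, variables=VARIABLES):
    """Undo the rescaling to recover the linear predictor, then apply the logistic function.

    Assumes `total` includes the points of every variable in `variables`.
    """
    offset = sum(min(b * lo, b * hi) for b, lo, hi in variables.values())
    lp = intercept + offset + total * scale_factor(variables) / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


def probability_direct(patient, intercept=INTERCEPT, variables=VARIABLES):
    """Logistic-regression probability computed straight from the coefficients."""
    lp = intercept + sum(variables[name][0] * x for name, x in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"PATHSIZE": 2.0, "LVI": 1, "NUMNEGSLN": 1}
total = sum(points(name, x) for name, x in patient.items())
print(round(total, 6))                                # 136.0
print(round(probability_from_points(total), 4),
      round(probability_direct(patient), 4))          # 0.168 0.168
```

The two probabilities agree because the points are an exact linear transformation of the linear predictor; the chart adds no information beyond the model, it only makes the arithmetic doable without a computer.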

METHODS

Construction of the nomogram was previously described in detail.3 In brief, 702 cases of primary breast cancer in which the SLN was positive for metastasis were identified from a prospectively collected SLN database. Using primary tumor and SLN metastasis characteristics, a multivariate model was created to predict the likelihood of additional, non-SLN metastases being found at completion ALND. The model was subsequently applied prospectively to an additional 373 patients (validation population) and was found to accurately predict the likelihood of residual disease (area under the receiver operating characteristic [ROC] curve, .77).

For experiment 1, 33 women were selected at random from the validation population used to confirm the original nomogram. The characteristics of these women were supplied to 17 participating clinicians for their prediction (Fig. 2). Clinicians were asked, for each patient, "If 100 women with these characteristics were to have a positive sentinel node and then receive a full axillary dissection, how many of them would you expect to have one or more positive non-sentinel lymph nodes?" The clinicians were breast cancer specialists who attended a weekly multidisciplinary breast conference: surgeons, medical oncologists, radiation oncologists, radiologists, and pathologists. These clinicians were unfamiliar with the nomogram and had not yet incorporated it into their clinical practice when they participated in this experiment.

FIG. 2. Handout given to clinicians who were asked to estimate the likelihood of positive nonsentinel lymph nodes after a positive sentinel lymph node (SLN) biopsy result. TLOC, tumor location; PATH SIZE, pathologic size in centimeters; NUC GRADE, tumor type or nuclear grade; LVI, lymphovascular invasion; HOW SLN + DETECT, how positive SLNs were detected; NUM SLN POS, number of positive SLNs; NUM SLN NEG, number of negative SLNs; ER, estrogen receptor; PR, progesterone receptor; ALND, axillary lymph node dissection; IHC, immunohistochemistry; UIQ, upper inner quadrant; HE, hematoxylin and eosin; UOQ, upper outer quadrant; LOQ, lower outer quadrant; LIQ, lower inner quadrant.

Subsequently, in experiment 2, clinicians were presented with tumor and patient characteristics from eight patients in the validation set. Clinicians were specifically asked, "Would you perform a completion axillary dissection on this patient?" After they answered, they were presented with the nomogram prediction of residual axillary disease in that patient and were asked again, "Would you perform a completion axillary dissection?" Twenty-four clinicians participated in experiment 2. This study was conducted during a multidisciplinary breast cancer conference. The room was equipped with audience participation software, and each participant was given a handheld device; responses were recorded for each device. Not all participants responded to all questions: there were 187 responses to 8 questions (of a possible 192), a 97.4% response rate.

For experiment 1, the clinician and nomogram predictions were evaluated by calculating the area under the ROC curve (AUC). The ROC curve is a visual representation of the trade-off between the sensitivity and specificity of a diagnostic test and describes the inherent predictive ability of the test. Each point along the curve corresponds to the sensitivity and specificity at one test threshold. In our study, the curve describes the sensitivity and specificity of the nomogram at different levels of likelihood of residual disease. The AUC is a single summary measure that allows the predictive ability of two tests to be compared. If the AUC is .5, the curve is approximately the diagonal line, and the test is no better than a coin flip in predicting the desired outcome. If the AUC is 1.0, the test is perfect and correctly identifies all true positives and true negatives.

FIG. 3. Receiver operating characteristic (ROC) curves for clinicians as a group and for the nomogram. The ROC curve is a visual representation of the trade-off between the sensitivity and specificity of a diagnostic test and describes the inherent predictive ability of the test. Each point along the curve corresponds to the sensitivity and specificity at one test threshold. In our study, the curve describes the sensitivity and specificity of the nomogram at different levels of likelihood of residual disease. The area under the ROC curve (AUC) is a single summary measure that allows the predictive ability of two tests to be compared. If the AUC is .5, the curve is approximately the diagonal line, and the test is no better than a coin flip in predicting the desired outcome. If the AUC is 1.0, the test is perfect and correctly identifies all true positives and true negatives. Sensitivity is the proportion of women with positive nonsentinel lymph nodes who were predicted to have positive nonsentinel lymph nodes. Specificity is the proportion of women with negative nonsentinel lymph nodes who were predicted to have negative nonsentinel lymph nodes. The AUC for the nomogram was .72, and the AUC for clinicians as a group was .54 (P < .01).

To compare the accuracy of nomogram and clinician predictions of lymph node positivity, we used an ROC approach. Specifically, we estimated the parameters of the ROC curves by using a latent-variable binormal model.5 A random-effects term was added to account for the fact that each patient was evaluated several times by different physicians. Model estimates were obtained by restricted maximum likelihood with SAS PROC MIXED,6 and the accuracies of the nomogram and clinician predictions were compared by a likelihood ratio test. For experiment 2, we again used a mixed model, this time to evaluate whether a statistically significant shift in clinician judgment occurred after the nomogram predictions were viewed. The effect of the nomogram on the decision was assessed by McNemar's test, adjusted for clustering.7
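To make the AUC concrete, the sketch below computes it in two ways: nonparametrically, as the probability that a randomly chosen woman with positive non-SLNs receives a higher predicted risk than a randomly chosen woman without them (tied pairs count one half), and from the closed form for a binormal ROC curve, TPF = Φ(a + b·Φ⁻¹(FPF)), whose area is Φ(a/√(1 + b²)). It does not reproduce the study's analysis, which fitted the binormal parameters in a latent-variable mixed model with a random effect in SAS PROC MIXED; the scores are hypothetical. Because the AUC depends only on how the scores rank the patients, clinician answers on the "out of 100 women" scale and nomogram probabilities can be compared directly.

```python
import math


def empirical_auc(scores, labels):
    """Nonparametric (Mann-Whitney) AUC: share of positive/negative pairs in which
    the positive case has the higher score; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binormal_auc(a, b):
    """Area under the binormal ROC curve TPF = Phi(a + b * Phi^-1(FPF)),
    which equals Phi(a / sqrt(1 + b**2)); Phi is the standard normal CDF."""
    z = a / math.sqrt(1.0 + b * b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# HYPOTHETICAL data: 1 = positive non-SLNs found at completion ALND.
outcome = [0, 0, 1, 1]
nomogram_prob = [0.10, 0.40, 0.35, 0.80]   # probabilities on a 0-1 scale
clinician_count = [10, 40, 35, 80]         # "how many of 100 women" answers
print(empirical_auc(nomogram_prob, outcome))    # 0.75
print(empirical_auc(clinician_count, outcome))  # 0.75: same ranking, same AUC
print(binormal_auc(0.0, 1.0))                   # 0.5: the chance diagonal
```

The binormal form is what makes a parametric comparison possible: once a and b are estimated for each predictor, the AUCs follow directly, and a mixed model can let those estimates account for repeated evaluations of the same patient.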

RESULTS

In experiment 1, 17 clinicians made predictions for 33 patients. The nomogram predicted more accurately (AUC = .72) than the clinicians as a group (AUC = .54; P < .01; Fig. 3). One clinician outperformed the nomogram, whereas the remaining 16 made predictions that were inferior to those of the nomogram.

In experiment 2, 24 clinicians responded to 8 questions, yielding 187 responses (97% response rate). Clinicians rarely changed their surgical decision after being presented with the nomogram prediction of non-SLN metastases (P = .67; Table 1). Ninety percent (168 of 187) of responses represented no change in the surgical plan. Among recommendations not to proceed with completion ALND, half (7 of 14) were changed after presentation of the nomogram prediction. Among recommendations for completion ALND, only 7% (12 of 173) were changed to no ALND after presentation of the nomogram prediction.
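McNemar's test uses only the discordant responses: the 12 that switched from ALND to no ALND and the 7 that switched from no ALND to ALND. The sketch below computes the ordinary, unadjusted McNemar chi-square and exact binomial versions from those two counts. It deliberately ignores the clustering of up to eight responses per clinician that the study adjusted for,7 so it does not reproduce the reported P = .67; it only illustrates what the statistic compares.

```python
import math


def mcnemar_chi2(b, c):
    """Uncorrected McNemar statistic from the two discordant cell counts."""
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper-tail p-value for a chi-square with 1 df, i.e. P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value: binomial test of the discordant pairs at p = 0.5."""
    n, k = b + c, min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Discordant responses (responses, not independent clinicians).
alnd_to_no_alnd = 12
no_alnd_to_alnd = 7
stat = mcnemar_chi2(alnd_to_no_alnd, no_alnd_to_alnd)
print(round(stat, 3))                                              # 1.316
print(round(chi2_1df_pvalue(stat), 2))                             # 0.25
print(round(mcnemar_exact(alnd_to_no_alnd, no_alnd_to_alnd), 2))   # 0.36
```

The 161 and 7 concordant responses do not enter either statistic; they affect the conclusion only through the clustering adjustment, which treats the responses from one clinician as correlated rather than independent.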

TABLE 1. Clinician responses to whether they would recommend a completion axillary lymph node dissection after a positive sentinel lymph node biopsy result (n = 187 responses)

Recommendation before nomogram prediction | ALND after nomogram prediction | No ALND after nomogram prediction
ALND | 161 | 12
No ALND | 7 | 7

ALND, axillary lymph node dissection.

DISCUSSION

Outside of a clinical trial, we continue to recommend completion ALND after a positive SLN biopsy result. However, when the perceived risk of residual disease is low, some patients (and clinicians) believe that the risks of ALND outweigh its benefit and choose not to have a completion ALND. For such patients, the nomogram was developed to provide an accurate risk estimate that can help in weighing the pros and cons of completion ALND. Uncertainties in medical decision making are plentiful and include inaccuracies in diagnosis, uncertainty about the natural progression of disease, and variation in the effect of treatment in an individual patient. In discussing the risks and benefits of performing a completion ALND after a positive SLN biopsy result, the physician must process abundant data to arrive at the best prediction of the likelihood of finding residual disease. We have shown that, when presented with similar input data to answer this question, a statistical model (nomogram) predicted the status of the axilla after a positive SLN biopsy result more accurately than clinical experts. This study supports a previous finding, in prostate cancer, that nomograms outperform human experts.8 Human judgment is subject to inherent biases that make predicting outcomes difficult. Clinicians are prone to recall bias: they remember the unusual patient rather than the routine one. Control bias is the tendency to predict the outcomes we want to come true. In practice, clinicians stratify patients with simple rules rather than with a continuous regression analysis.9 For example, a clinician might use the heuristic that tumors >5 cm indicate a high risk for nodal metastases rather than using size as a continuous variable. Therefore, it is not surprising that the nomogram performed better than clinical experts. The nomogram is a predictive instrument that can accurately weigh multiple individual variables simultaneously and without bias.
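The cost of a simple rule can be made concrete. When a continuous predictor such as tumor size is replaced by a yes/no rule (for example, >5 cm), every patient on the same side of the cutoff receives the same score, and the ROC curve collapses to a single operating point whose AUC equals (sensitivity + specificity)/2. The eight patients below are hypothetical, chosen only to illustrate the arithmetic; they are not study data, size is used on its own rather than as the nomogram uses it, and in this example, not in general, the rule happens to lose discrimination.

```python
def empirical_auc(scores, labels):
    """Mann-Whitney AUC; tied positive/negative pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# HYPOTHETICAL patients for illustration only (not study data).
size_cm = [0.8, 1.2, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0]
nonsln_positive = [0, 0, 1, 0, 1, 1, 0, 1]

# Heuristic: "tumors > 5 cm are high risk" turns size into a yes/no score.
rule = [1 if s > 5.0 else 0 for s in size_cm]

sensitivity = sum(r for r, y in zip(rule, nonsln_positive) if y == 1) / nonsln_positive.count(1)
specificity = sum(1 - r for r, y in zip(rule, nonsln_positive) if y == 0) / nonsln_positive.count(0)

print(empirical_auc(size_cm, nonsln_positive))  # 0.75  (size as a continuous score)
print(empirical_auc(rule, nonsln_positive))     # 0.625 (size as a > 5 cm rule)
print((sensitivity + specificity) / 2)          # 0.625
```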
Does this mean that the nomogram should replace the clinical expert in making the decision about completion ALND after a positive SLN biopsy result? Of course not, but the nomogram could be added as a tool in the decision-making process. It could provide a numerical estimate to help both the clinician and the patient weigh the pros and cons of this decision. A nomogram has no inherent ability to weigh risks against benefits; therefore, it cannot replace clinical judgment.

In experiment 2, we were unable to demonstrate that the nomogram changed physicians' behavior (P = .67). It is possible that the clinicians were unfamiliar with the nomogram as a prediction tool and that, with increased use, they would become more comfortable making its results part of the decision-making process. Alternatively, clinical decisions reported outside the clinic may not be reliable, and only with a patient present could a true change in behavior be measured. However, the most likely explanation is that clinicians at our institution regard completion ALND after a positive SLN biopsy result as the standard of care and only rarely consider not recommending it (14 of 187 responses; 7%). It is informative that of these 14 responses favoring no completion ALND, half (n = 7) changed to recommend completion ALND after the nomogram estimate of the likelihood of residual disease was presented. Conversely, of the 173 recommendations for ALND, only 12 (7%) were changed to recommend no ALND after the nomogram estimates were presented.

It is interesting to note that in two scenarios, the decision about whether to dissect the axilla differed even though the risk of residual disease was nearly the same. In the case of a postmenopausal woman with a 9% risk of residual disease, two clinicians changed their response to not perform completion ALND; presumably, the nomogram estimate was lower than they had anticipated. Conversely, in the case of a 38-year-old woman with an 8% risk of residual disease, four clinicians changed their response to dissect the axilla; presumably, the nomogram estimate was higher than they had anticipated. Clearly, age plays a role in the decision about returning to the operating room, even though age had no predictive role in the nomogram's estimate of the likelihood of non-SLN metastases. A patient's age seemingly influences the amount of risk clinicians are willing to accept.

Estimating the risk of non-SLN metastases in the axilla after a positive SLN biopsy result is an important issue that clinicians face with increasing frequency. Accurate estimates of this likelihood may improve risk stratification in future clinical trials of the utility of completion ALND. Furthermore, accurate estimates of risk are essential for an informed discussion with patients regarding the pros and cons of completion ALND in the setting of a positive SLN biopsy result. Nomogram predictions seem to be substantially more accurate than clinical predictions; therefore, clinicians can improve their predictive ability by using the nomogram to predict the likelihood of additional non-SLN metastases in a woman with a positive SLN biopsy result who is considering not pursuing completion ALND.


REFERENCES

1. Hunink M, Glasziou P. Decision Making in Health and Medicine: Integrating Evidence and Values. Cambridge, UK: Cambridge University Press, 2001.
2. Weisstein EW. Nomogram. In: MathWorld: A Wolfram Web Resource. Available at: http://mathworld.wolfram.com/Nomogram.html.
3. Van Zee KJ, Manasseh DE, Bevilacqua JB, et al. A nomogram for predicting the likelihood of additional nodal metastases in breast cancer patients with a positive sentinel node biopsy. Ann Surg Oncol 2003;10:1140–51.
4. Temple LKF, Baron R, Cody HS III, et al. Sensory morbidity after sentinel lymph node biopsy and axillary dissection: significant and persistent sequelae. Ann Surg Oncol 2002;9:654–62.
5. Pepe M. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press, 2003.
6. SAS/STAT User Manual. Version 9. Cary, NC: SAS Institute, 2003.
7. Obuchowski NA. On the comparison of correlated proportions for clustered data. Stat Med 1998;17:1495–507.
8. Ross PL, Gerigk C, Gonen M, et al. Comparisons of nomograms and urologists' predictions in prostate cancer. Semin Urol Oncol 2002;20:82–8.
9. Ross PL, Scardino PT, Kattan MW. A catalog of prostate cancer nomograms. J Urol 2001;165:1562–8.
