Breast Cancer Res Treat (2006) 100:309–318 DOI 10.1007/s10549-006-9252-6
ORIGINAL REPORT
Variation in false-positive rates of mammography reading among 1067 radiologists: a population-based assessment

Alai Tan · Daniel H. Freeman Jr. · James S. Goodwin · Jean L. Freeman
Received: 10 April 2006 / Accepted: 12 April 2006 / Published online: 4 July 2006
© Springer Science+Business Media B.V. 2006
Abstract

Background: The accuracy of mammography reading varies among radiologists. We conducted a population-based assessment of radiologist variation in false-positive rates of screening mammography and the associated radiologist characteristics.

Methods: A total of 27,394 screening mammograms interpreted by 1067 radiologists were identified from a 5% non-cancer sample of Medicare claims during 1998–1999. The data were linked to the American Medical Association Masterfile to obtain radiologist characteristics. Multilevel logistic regression models were used to examine radiologist variation in false-positive rates of screening mammography and the associated radiologist characteristics.

Results: Radiologists varied substantially in the false-positive rates of screening mammography (ranging from 1.5 to 24.1% after adjusting for patient characteristics). A longer time since graduation was associated with lower false-positive rates (odds ratio [OR] for every 10-year increase: 0.87; 95% confidence interval [CI], 0.81–0.94), and female radiologists had higher false-positive rates than male radiologists (OR = 1.25; 95% CI, 1.05–1.49), adjusting for patient and other radiologist characteristics. Unmeasured factors contributed about 90% of the between-radiologist variance.

Conclusions: Radiologists varied greatly in accuracy of mammography reading. Female and more recently trained radiologists had higher false-positive rates. The variation among radiologists was largely due to unmeasured factors, especially unmeasured radiologist factors. If our results are confirmed in further studies, they suggest that system-level interventions would be required to reduce variation in mammography interpretation.

Keywords: False-positive · Mammography · Medicare claims · Older women

A. Tan (corresponding author) · D. H. Freeman Jr. · J. L. Freeman
Department of Preventive Medicine and Community Health, Office of Epidemiology and Biostatistics, University of Texas Medical Branch, 301 University Boulevard, Galveston, Texas 77555-1148, USA
e-mail: [email protected]

J. S. Goodwin · J. L. Freeman
Department of Internal Medicine – Geriatrics, University of Texas Medical Branch, Galveston, Texas, USA

J. S. Goodwin · J. L. Freeman
Sealy Center on Aging, University of Texas Medical Branch, Galveston, Texas, USA
Introduction

Over the past decade, the use of screening mammography has increased substantially [1, 2]. Based on statewide data from the Behavioral Risk Factor Surveillance System [2], the median percentage of women aged 65 and older who reported having a mammogram in a 2-year period increased from 54.3% in 1990 to 77.1% in 2002. The increase in mammography use was associated with a shift to smaller tumor size at diagnosis and a substantial reduction in mortality from breast cancer [3–5]. However, concerns remain about variation in the quality of mammography [6]. For example, it has been estimated that one-third of regularly screened women experience at least one false-positive screening mammogram over a period of 10 years [7].
The adverse physical, psychological and economic effects of false-positive results are well documented [8–11]. Whether the current level of false positives is necessary to achieve high sensitivity is unclear. In fact, a comparison of U.S. screening performance with that of the United Kingdom found that recall rates for additional testing and negative open surgical biopsy rates are twice as high in the U.S. as in the United Kingdom while cancer detection rates are similar, suggesting that current breast cancer detection rates in the U.S. could be achieved with lower recall rates for further evaluation of screening mammograms [12].

Previous research on factors affecting the accuracy of mammography interpretation has largely focused on patient characteristics, such as breast density, age, race, use of hormone replacement therapy, body mass index, prior screening history and family history of breast cancer [13, 14]. Current efforts to improve the quality of cancer care have drawn increasing attention to variability among radiologists in mammography interpretation. Substantial variability in mammography reading among radiologists has been documented by measuring agreement among radiologists interpreting the same sets of films [15–17]. However, such studies were conducted in a testing environment and may not represent radiologists' performance in actual practice [18]. Few studies have assessed variability in mammography performance among radiologists in real practice [19–21], and those studies included small numbers of radiologists from relatively small geographic areas (up to 209 radiologists from four mammography registries [21]), so they may not be representative of radiologists across the United States. In the present study, we used a population-based dataset, the 5% non-cancer sample from the SEER-Medicare database for 1998–1999, to (1) describe the variation among radiologists in false-positive rates of screening mammography for older women and (2) investigate the effects of selected radiologist characteristics.
Materials and methods
Data source

Data were from the 5% non-cancer sample of Medicare claims for the period 1998–1999. As part of the SEER-Medicare data linkage project [22], a 5% random sample of Medicare beneficiaries residing in the SEER areas is selected each year from the Medicare enrollment file, after excluding beneficiaries who ever linked to SEER tumor registry cases through 1999. We used the non-cancer sample because the outcome of interest is the false-positive rate (1 − specificity), which measures the extent to which disease-free individuals are falsely identified as diseased. To exclude breast cancer cases diagnosed in 2000 from the 1999 5% non-cancer sample, we additionally screened the 1999 claims data to exclude any woman who had (1) a breast cancer treatment-related procedure (mastectomy, partial mastectomy, nodal dissection, chemotherapy or radiation therapy) AND a breast cancer diagnosis in the 2000 claims, or (2) breast cancer as the principal diagnosis on the 2000 inpatient claims. Based on our previous work [23], these criteria have a high likelihood of distinguishing breast cancer cases from breast cancer-free women.

Information on each woman's demographics, Medicare entitlement, coverage under Parts A (hospital care) and B (physician and outpatient services) and HMO membership was obtained from the Medicare enrollment files. Screening status and follow-up diagnostic testing (diagnostic mammography, breast ultrasound or biopsy) were determined from (1) the Carrier File, (2) the Hospital Outpatient Standard Analytic File (SAF) and (3) the Medicare Provider Analysis and Review File (MEDPAR). The Carrier File contains claims for physician and other medical services covered under Part B; diagnoses are coded in the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and procedures are coded in the HCFA (Health Care Financing Administration) Common Procedure Coding System (HCPCS). The SAF contains claims for facility-based outpatient services; diagnoses are coded in ICD-9-CM and procedures are coded in both ICD-9-CM and HCPCS. The MEDPAR File contains claims for inpatient hospital stays; diagnoses and procedures are coded in ICD-9-CM.

Information on radiologist characteristics was obtained from the American Medical Association (AMA) Masterfile. The AMA Masterfile is a source of physician characteristics data that can be linked with Medicare claims via UPINs (Unique Physician Identification Numbers) in the performing-provider field of the claims [24].
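To make the linkage step concrete, the sketch below shows the general shape of a claims-to-Masterfile merge on UPIN. It is an illustration only: the DataFrame layouts and column names are hypothetical, and the actual files and exclusion rules are those described above and in [24].

# Hypothetical, simplified illustration of linking claims to the AMA
# Masterfile by UPIN; the real Medicare files have different layouts.
import pandas as pd

claims = pd.DataFrame({
    "claim_id": [1, 2, 3],
    "performer_upin": ["U100", "U200", "U999"],
})
ama_masterfile = pd.DataFrame({
    "upin": ["U100", "U200"],
    "specialty": ["radiology", "radiology"],
    "medschool_grad_year": [1975, 1992],
    "sex": ["M", "F"],
})

linked = claims.merge(
    ama_masterfile, left_on="performer_upin", right_on="upin", how="left"
)
# Claims whose UPIN fails to link (claim 3 here) would be excluded, as
# would claims with missing or multiple performer UPINs (see Subjects).
print(linked)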
Subjects
The screened women were identified as female Medicare beneficiaries aged 65 years or older who had a screening mammogram claim (CPT code 76092) in the Carrier File [25] during 1998–1999. Women were excluded if they were members of an HMO or were not covered under both Parts A and B of
Medicare from the date of the mammogram through 3 months after the mammogram; the claims of these women for certain services may not be included in the Medicare database. We also excluded women whose screening mammogram claim could not be linked to a unique radiologist in the AMA Masterfile because of a missing performer UPIN, multiple performer UPINs, a failed link of the performer UPIN to the AMA Masterfile, or a performer UPIN that was not identified as a radiologist by either the AMA data or the claim's specialty field. These criteria identified 24,746 women with 31,839 screening mammograms interpreted by 2448 radiologists. The number of mammograms interpreted by each radiologist ranged from 1 to 151. To obtain relatively reliable estimates of the false-positive rate of individual radiologists, we included in the analyses only the 1067 radiologists who each interpreted 10 or more screening mammograms. These 1067 radiologists interpreted 27,394 (86%) of the 31,839 screening mammograms initially identified.

Definition of a false-positive screening mammogram

All women in our study are assumed to be free of breast cancer 1 year after the screening mammogram. A screening mammogram was defined as a false positive if the screened woman had a claim for follow-up diagnostic testing within 3 months after the mammogram. The diagnostic testing included diagnostic mammograms (CPT code 76090 or 76091), breast ultrasound (CPT code 76645) and breast biopsies (CPT codes 19120, 19100 and 19101, or ICD-9-CM procedure codes 8511, 8512, 8520 and 8521). For mammograms conducted after September 30, 1999, we used Medicare claims through March 31, 2000 in order to have three full months of follow-up time to assess whether they were positive.
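To illustrate this rule, the following sketch flags false positives in simplified claim tables. The table layouts and column names are hypothetical; the real analysis draws the codes listed above from the Carrier, SAF and MEDPAR files.

# Minimal sketch of the false-positive flagging rule on hypothetical,
# simplified claim tables (bene_id, svc_date, code); illustration only.
import pandas as pd

FOLLOWUP_CPT = {"76090", "76091", "76645", "19120", "19100", "19101"}
FOLLOWUP_ICD9_PROC = {"8511", "8512", "8520", "8521"}

screens = pd.DataFrame({
    "bene_id": ["A", "A", "B"],
    "svc_date": pd.to_datetime(["1998-03-01", "1999-04-10", "1998-06-15"]),
    "cpt": ["76092"] * 3,                      # screening mammography
})
followups = pd.DataFrame({
    "bene_id": ["A", "B"],
    "svc_date": pd.to_datetime(["1998-04-20", "1999-01-05"]),
    "code": ["76090", "8511"],                 # Dx mammogram; open biopsy
})

fu = followups[followups["code"].isin(FOLLOWUP_CPT | FOLLOWUP_ICD9_PROC)]

# A screen counts as a false positive if the (cancer-free) woman has any
# follow-up diagnostic claim within 3 months (~92 days) of the screen.
merged = screens.merge(fu, on="bene_id", suffixes=("_scr", "_fu"))
delta = (merged["svc_date_fu"] - merged["svc_date_scr"]).dt.days
hits = merged[(delta >= 0) & (delta <= 92)][["bene_id", "svc_date_scr"]]

screens["false_positive"] = screens.set_index(["bene_id", "svc_date"]).index.isin(
    hits.set_index(["bene_id", "svc_date_scr"]).index
)
print(screens)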
Patient and radiologist characteristics

Patient characteristics included age, race, socioeconomic status (SES), SEER area and family history of breast cancer. In the analysis, age was categorized as 65–74 vs. 75+. Race was categorized as white, black and other. The median household income of the subject's zip code was used as a proxy for patient SES, as recommended by Bach et al. [26]. Since only 2–3% of the subjects had a zip code median household income lower than $25,000, the variable was categorized as <$50,000 and ≥$50,000. Three SEER areas (San Francisco, San Jose and Los Angeles) were combined into one category as California; each of the other eight SEER areas remained a separate category. Family history of breast cancer was recorded as yes or no based on whether diagnosis code V163 appeared in the claims.

Radiologist characteristics included age, year-since-graduation from medical school, board certification status, type of practice and volume of mammograms. Radiologist age was categorized as <40, 40–49, 50–59 and 60+. Year-since-graduation from medical school was categorized as <10, 10–19, 20–29 and 30+ years. Board certification status was categorized as with or without board certification in radiology. Type of practice was categorized as engaging in direct patient care versus indirect patient care; indirect patient care includes medical education, medical research and other medical activities. The mammogram volume of a radiologist was estimated from the claims data, computed as the number of claims for any mammogram (CPT code 76090, 76091 or 76092) with his or her UPIN in the Carrier File during the year of study. It was categorized into five groups: low (≤10), somewhat low (10–13), moderate (14–18), somewhat high (19–27) and high (28+), corresponding to actual annual Medicare volumes of ≤200, 200–260, 280–360, 380–540 and 560+. The groups were defined so that each contains about 20% of the radiologists.

Statistical analysis

Descriptive statistics were used to describe the patient and radiologist characteristics. χ²-statistics were used to test the bivariate association between each variable and the false-positive rates. Because radiologist age and year-since-graduation are highly correlated (Pearson correlation coefficient = 0.97, P < 0.001), only year-since-graduation was used in the subsequent regression analysis. Also, because the association between year-since-graduation and false-positive rates appeared to be linear, year-since-graduation was entered as a continuous variable in the regression analysis.

The observed false-positive rate for a radiologist was computed as the number of false positives divided by the total number of screening mammograms the radiologist interpreted in breast cancer-free women. The risk-adjusted false-positive rate was generated from a multilevel logistic regression model [27]. Although our sample has a relatively small number of screening mammograms for each radiologist, multilevel modeling can reduce sampling variation due to small sample sizes by generating shrunken estimates of radiologist false-positive rates [28, 29]. In this study, the adjusted patient risk factors included patient age, race, SES and family history of breast cancer.
The variability among radiologists in their false-positive rates was illustrated graphically with boxplots. To evaluate the effects of selected radiologist characteristics, two multilevel logistic regression models were used: Model 1 included only radiologist characteristics, and Model 2 included both patient and radiologist characteristics. We used SAS (version 9.1; SAS Institute, Cary, NC) for these analyses.
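Although the multilevel models themselves were fit in SAS, the shrinkage idea behind the adjusted rates can be illustrated with a deliberately simplified empirical-Bayes calculation. The sketch below substitutes a method-of-moments beta-binomial model for the paper's multilevel logistic regression and ignores patient covariates, so it is an approximation of the approach rather than a reproduction of it.

# Illustrative empirical-Bayes shrinkage of radiologist false-positive
# rates. NOT the paper's SAS multilevel logistic model: it uses a
# beta-binomial (method-of-moments) prior and no patient covariates,
# purely to show how small-sample rates get pulled toward the mean.
import numpy as np

rng = np.random.default_rng(0)
n = rng.integers(10, 60, size=1067)          # mammograms per radiologist
true_p = rng.beta(2.0, 30.0, size=1067)      # latent false-positive rates
x = rng.binomial(n, true_p)                  # observed false positives

raw = x / n                                  # observed (unadjusted) rates

# Method-of-moments fit of a Beta(a, b) prior to the raw rates.
m, v = raw.mean(), raw.var()
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

# Posterior mean per radiologist: counts are shrunk toward the prior
# mean a/(a+b); radiologists with few mammograms are shrunk the most.
shrunk = (x + a) / (n + a + b)

print(f"prior mean   {a / (a + b):.3f}")
print(f"raw range    {raw.min():.3f}-{raw.max():.3f}")
print(f"shrunk range {shrunk.min():.3f}-{shrunk.max():.3f}")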
Results

A total of 21,576 women received 27,394 screening mammograms during 1998–1999. The overall rate of false-positive screening mammography was 6.3%. About 4.4% of the mammograms were followed by a diagnostic mammogram, 2.7% by a breast ultrasound and 0.9% by a breast biopsy (Table 1). In general, false-positive rates were higher among younger women (6.69% vs. 5.63% for women aged 75 or older, P < 0.001), non-black women (6.36% for white and 6.43% for other race/ethnicity vs. 4.46% for black women, P = 0.01) and women living in certain SEER areas (9.12% in Hawaii vs. 5.39–8.28% in the other SEER areas, P < 0.001) (Table 2).

Table 1 Rates of follow-up diagnostic testing within three months after a screening mammogram during 1998–1999

                          N        Rate (%)   95% CI
Total                     27,394
Any follow-up work^a      1,717    6.27       (5.98, 6.55)
Dx mammography            1,197    4.37       (4.13, 4.61)
Breast ultrasound         749      2.73       (2.54, 2.93)
Breast biopsy             245      0.89       (0.78, 1.01)

^a Follow-up work includes one or more of the following diagnostic tests for breast cancer: (1) diagnostic (Dx) mammography, (2) breast ultrasound and (3) breast biopsy

The 27,394 screening mammograms were read by 1067 radiologists. Most of the radiologists were between 40 and 59 years of age (70.4%), male (84.0%), board certified (91.3%) and 10–29 years past graduation (64.6%). Radiologists who were younger, female or more recently trained had significantly higher false-positive rates (P values < 0.001) (Table 3).

Figure 1 displays the variability of the radiologist-specific false-positive rates. Radiologists' observed false-positive rates varied substantially, from 0 to 39.1%. The variation in false-positive rates was reduced greatly after adjusting for sampling error. Further adjusting for the patient characteristics of age, race, family history of breast cancer and SES reduced the variation only minimally.
Even with the reduction in variation from adjusting for sampling error and patient characteristics, the radiologist-specific false-positive rates still varied considerably, from 1.5 to 24.1%. We categorized radiologists into deciles based on their ranks in observed and adjusted false-positive rates. Over 70% of the radiologists had observed false-positive rates lower than 10%, and over 90% of the radiologists had adjusted false-positive rates lower than 10% (Fig. 2). Figure 2 also shows that the top 10% of radiologists, those with the highest false-positive rates (observed rates higher than 16% and adjusted rates higher than 8%), also had the greatest variation, indicating that the substantial variation in false-positive rates shown in Fig. 1 was largely due to these radiologists.

Radiologist gender and year-since-graduation were significantly associated with false-positive rates both before and after adjusting for patient characteristics (Table 4). Female radiologists had significantly higher false-positive rates than male radiologists (odds ratio [OR] = 1.25; 95% confidence interval [CI], 1.05–1.49) and radiologists with a longer time since graduation had lower false-positive rates (OR for every 10-year increase = 0.87; 95% CI, 0.81–0.94), adjusting for patient and other radiologist characteristics. The intraclass correlation of 0.37 from the null model indicates that about 37% of the total variation in false-positive rates on the logit scale is attributable to between-radiologist variation (Table 4). Of the between-radiologist variance, only about 10% was explained by patient and radiologist characteristics.
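As a point of reference, this intraclass correlation is consistent with the standard latent-variable formula for a two-level logistic model, in which the level-1 residual variance is fixed at $\pi^2/3$; the formula below is our addition for clarity and does not appear in the paper:

\[
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3},
\]

where $\sigma_u^2$ is the between-radiologist variance on the logit scale. An ICC of 0.37 then corresponds to $\sigma_u^2 \approx 0.37 \times (\pi^2/3)/(1 - 0.37) \approx 1.9$.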
Discussion

The 1067 radiologists varied substantially in their false-positive rates of screening mammography. The large variation was mostly due to the fewer than 10% of radiologists who had adjusted false-positive rates higher than 8%. Female and more recently trained radiologists had higher false-positive rates. Most of the between-radiologist variation was attributable to unmeasured factors.

Consistent with other studies [19–21], we found that radiologists with a longer time since graduation had significantly lower false-positive rates. A possible explanation is that current training emphasizes sensitivity over specificity, making more recently trained radiologists more meticulous, so that they miss fewer cancers than radiologists who are more distant from their training. If this is the case, the sensitivity of screening mammography among more recently trained radiologists might also be higher.
Table 2 Description of patient characteristics and false-positive rates of screening mammography for older women

Patient characteristics            No. (%)          No. of screening    False-positive   P value
                                                    mammograms (%)      rate (%)
Age                                                                                      <0.001
  65–74                            13,068 (60.57)   16,477 (60.15)      6.69
  ≥75                              8,508 (39.43)    10,917 (39.85)      5.63
Race                                                                                     0.01
  White                            18,615 (86.28)   23,745 (86.68)      6.36
  Black                            1,231 (5.71)     1,456 (5.32)        4.46
  Others                           1,730 (8.02)     2,193 (8.01)        6.43
Zip code median income                                                                   0.19
  <$50,000                         12,088 (56.03)   15,398 (56.21)      6.10
  ≥$50,000                         9,488 (43.97)    11,996 (43.79)      6.49
Family history of breast cancer                                                          0.62
  Yes                              543 (2.52)       730 (2.66)          6.71
  No                               21,033 (97.48)   26,664 (97.34)      6.26
SEER area                                                                                <0.001
  California                       4,910 (22.76)    6,137 (22.40)       5.39
  Connecticut                      2,850 (13.21)    3,598 (13.13)       5.42
  Detroit                          3,188 (14.78)    3,906 (14.26)       6.58
  Hawaii                           798 (3.70)       1,075 (3.92)        9.12
  Iowa                             3,980 (18.45)    5,338 (19.49)       5.53
  New Mexico                       1,036 (4.80)     1,293 (4.72)        8.28
  Seattle                          2,103 (9.75)     2,666 (9.73)        7.54
  Utah                             1,593 (7.41)     2,009 (7.33)        6.72
  Atlanta                          1,113 (5.16)     1,372 (5.01)        7.14
Total (N)                          21,576           27,394

Table 3 Description of radiologist characteristics and false-positive rates of screening mammography for older women

Radiologist characteristics        No. (%)          No. of screening    False-positive   P value
                                                    mammograms (%)      rate (%)
Age                                                                                      <0.001
  <40                              200 (18.74)      4,847 (17.69)       7.55
  40–49                            394 (36.93)      10,551 (38.52)      6.55
  50–59                            357 (33.46)      9,227 (33.68)       5.57
  60+                              116 (10.87)      2,769 (10.11)       5.27
Gender                                                                                   <0.001
  Female                           170 (15.93)      4,529 (16.53)       7.90
  Male                             897 (84.07)      22,865 (83.47)      5.94
Type of practice                                                                         0.89
  Direct patient care              970 (90.91)      25,122 (91.71)      6.34
  Indirect patient care            97 (9.09)        2,272 (8.29)        6.26
Year-since-graduation                                                                    <0.001
  <10                              129 (12.09)      2,827 (10.32)       7.92
  10–19                            326 (30.55)      8,989 (32.81)       6.90
  20–29                            363 (34.02)      9,145 (33.38)       5.84
  30+                              249 (23.34)      6,433 (23.48)       5.27
Volume of mammogram                                                                      0.77
  Low                              197 (18.46)      2,382 (8.70)        6.34
  Somewhat low                     238 (22.31)      3,809 (13.90)       6.17
  Moderate                         215 (20.15)      4,630 (16.90)       6.16
  Somewhat high                    213 (19.96)      6,165 (22.50)       6.62
  High                             204 (19.12)      10,408 (37.99)      6.13
Board certification in radiology                                                         0.46
  Yes                              974 (91.28)      25,039 (91.40)      6.23
  No                               93 (8.72)        2,355 (8.60)        6.62
Total radiologists (N)             1067             27,394
Results from other studies [20, 30] support this explanation. Barlow et al. [20] found that fewer years of experience were associated with higher sensitivity but lower specificity. Choudhry et al. [30] systematically reviewed articles published between 1966 and 2004 that studied the relationship between clinical experience and quality of health care, and found that longer time in practice was reported to have a negative impact on quality of care in almost all studies.
Fig. 1 Boxplots of observed and adjusted radiologist false-positive rates (FPRs). A: observed; B: adjusted for sampling error; C: further adjusted for the patient characteristics of age, race, socio-economic status and family history of breast cancer

Fig. 2 Boxplots of radiologist-specific false-positive rates (FPRs) by deciles of radiologists (observed: left, medians joined by the solid line; adjusted: right, medians joined by the dashed line). The adjusted false-positive rates are adjusted for sampling error and the patient characteristics of age, race, socio-economic status and family history of breast cancer. The deciles of radiologists are based on radiologists' ranks in observed and adjusted false-positive rates
We concur with the conclusion of Barlow et al. that more years of practice are not necessarily associated with better performance but may affect sensitivity and specificity by determining the threshold for calling a mammogram positive. The same argument can be made about the effect of radiologist gender on the accuracy of mammography reading. It is likely that female radiologists are more meticulous than male radiologists; as a result, they might have higher false-positive rates but miss fewer cancers. However, the few studies that have evaluated the relationship between radiologist gender and mammography performance have not found a significant association [19, 20]. Further studies are needed to confirm our finding.
Similar to Christiansen et al. [31], we found that unmeasured factors contributed most of the variance in mammography performance. These unmeasured factors were interpreted by Christiansen et al. as a radiologist's tendency to call a mammogram positive [31]. However, they could reflect a more complex process involving factors at both the patient and radiologist level. Several important patient-level factors were not examined in this study, such as breast density, hormonal therapy status and the availability of previous mammogram films for comparison. Some radiologist-level factors are difficult to measure, such as a radiologist's mood or focus while reading mammography films. We suspect that a large proportion of the between-radiologist variance would remain even if the unmeasured patient-level factors were taken into account. If this is true, then modification of current protocols for mammography reading may be needed to achieve more consistent readings. This need for system-level intervention is also supported by Smith-Bindman et al.'s finding that recall rates for additional testing and negative open surgical biopsy rates are twice as high in the U.S. as in the United Kingdom although cancer detection rates are similar [12].

It is not clear whether the differences in false-positive rates among SEER areas are a function of other unmeasured patient characteristics, reflect geographical differences in practice patterns among radiologists, or arise from both of these factors. Although the patient's SEER area of residence was adjusted for as a patient-level factor in evaluating the effect of radiologist characteristics (Table 4, Model 2), it was not adjusted for in generating the adjusted radiologist-specific false-positive rates.

Our study has a number of limitations, primarily attributable to the use of claims data to estimate screening mammography performance. First, previous studies found that mammograms conducted for screening purposes may have been billed as diagnostic in claims data [25, 32, 33]. This practice has declined considerably since Medicare began to reimburse for annual screening mammography in January 1998 [25]. Nevertheless, we estimate that up to 20% of the screening mammograms billed through Medicare may be missing from this analysis [34]. Second, not all screening mammograms provided to Medicare beneficiaries are billed to Medicare; these would include, for example, mammograms obtained through free screening programs or billed to private insurance. It is estimated that the overall sensitivity of Medicare claims for screening mammography is 85%, and this estimate is lower for beneficiaries who are younger, African-American or Hispanic, or have some college education [35].
Table 4 The association between radiologist characteristics and false-positive rates of screening mammography for older women

                                          OR^a (95% CI^b)
Radiologist characteristics               Model 1              Model 2^c
Gender*
  Female                                  1.24 (1.04, 1.49)    1.25 (1.05, 1.49)
  Male                                    Reference            Reference
Type of practice
  Indirect patient care                   0.99 (0.80, 1.22)    0.99 (0.80, 1.22)
  Direct patient care                     Reference            Reference
Year-since-graduation (per 10 years)**    0.87 (0.81, 0.94)    0.87 (0.81, 0.94)
Volume of mammogram
  Low                                     1.04 (0.84, 1.29)    1.01 (0.81, 1.27)
  Somewhat low                            1.03 (0.85, 1.29)    1.00 (0.82, 1.22)
  Moderate                                0.98 (0.81, 1.18)    0.93 (0.77, 1.34)
  Somewhat high                           1.07 (0.90, 1.29)    1.05 (0.88, 1.26)
  High                                    Reference            Reference
Board certification in radiology
  Yes                                     0.98 (0.78, 1.23)    0.99 (0.79, 1.24)
  No                                      Reference            Reference
Intraclass correlation (null model): 37.14%
Percent of between-radiologist variance explained: 4.50 (Model 1); 10.52 (Model 2)

^a OR, odds ratio; ^b CI, confidence interval; ^c ORs from Model 2 were also adjusted for the patient characteristics of age, race, SES, family history of breast cancer and SEER area of residence
* P = 0.01; ** P < 0.001
It is not clear whether the inclusion of these mammograms in the study would change our estimates of false-positive rates or the findings with respect to radiologist characteristics. However, the sensitivity of claims for screening mammography does not appear to be related to mammography facility [35]. Third, unlike data from mammography registries, a radiologist's assessment and recommendations after a mammogram are not directly available in Medicare claims. It is reasonable to infer that additional imaging and biopsies following the initial screening mammogram indicate a positive result. However, our definition of a false-positive result would misclassify a "positive" mammography interpretation as "negative" in the absence of claims for follow-up work. Another concern is that the 3-month follow-up period might be too short to capture follow-up procedures provided over a longer interval. The Breast Imaging Reporting and Data System (BI-RADS) classifies mammogram results into six categories (0–5) [36]. Concerns arise for "probably benign abnormalities" (BI-RADS assessment 3), for which follow-up within 6 months is recommended. However, pooled data from the Breast Cancer Surveillance Consortium show that only a small proportion (4.1%) of mammograms receive a BI-RADS assessment of 3, and 37% of those were recommended for immediate follow-up [37]. Analogous to other studies [20, 21] (Table 5), our approach should accurately identify mammograms with BI-RADS assessments 1 and 2 ("not suspicious for breast cancer, normal follow-up is recommended") as "negative", and
BI-RADS assessments 0, 4 and 5 ("incomplete or suspicious for breast cancer, immediate follow-up is recommended") and BI-RADS assessment 3 with immediate follow-up as "positive". Fourth, measures of family history of breast cancer and radiologist volume of mammography could be underestimated: physicians might not record family history of breast cancer routinely unless it affects claims reimbursement, and a radiologist would appear to be a low-volume provider if he or she provided mammography services primarily to patients younger than 65 or to patients outside the SEER areas. Finally, other performance measures, such as sensitivity, positive predictive value and negative predictive value, should be considered in combination with false-positive rates (1 − specificity) to evaluate radiologists' quality of screening mammography.

Although these limitations may affect the accuracy of our estimates of screening mammography false-positive rates, the overall 6.3% false-positive rate estimated in this study appears reasonable. First, population-based studies have reported a range of 2.7–13.8% for screening mammography false-positive rates [38–40], and our estimate falls within this range. Second, for SEER site-specific comparisons, the false-positive rate estimate for New Mexico (8.3%) is close to that reported by the New Mexico Mammography Project (9.5%) [39]. The lower rates we estimated compared with other studies may be attributable to the inclusion of older women (Table 5).
Table 5 Comparison of the present study with other population-based studies that have evaluated radiologist variability in false-positive rates of screening mammography

Elmore et al. [19]
  Data source: an HMO in New England
  Study period: 1985–June 1993
  Number of radiologists: 24
  Caseload of each radiologist: 50+ in 8.5 years
  Age of the screened women: 40–69
  Definition of a positive: indeterminate or suspicious for cancer, or a recommendation for nonroutine follow-up within 12 months
  Follow-up for a breast cancer diagnosis: 1 year follow-up
  Overall false-positive rate: —
  Radiologist false-positive rate, observed: 2.6–15.9%
  Radiologist false-positive rate, adjusted: 3.5–7.9%^d

Barlow et al. [20]
  Data source: three mammography registries
  Study period: 1996–2001
  Number of radiologists: 124
  Caseload of each radiologist: 480+/year
  Age of the screened women: 40+
  Definition of a positive: BI-RADS 0, 4, 5 and 3^b with immediate further assessment
  Follow-up for a breast cancer diagnosis: 1 year follow-up
  Overall false-positive rate: 9.9%
  Radiologist false-positive rate, observed: 2–26%
  Radiologist false-positive rate, adjusted: —

Smith-Bindman et al. [21]
  Data source: four mammography registries
  Study period: 1995–2000
  Number of radiologists: 209
  Caseload of each radiologist: 480+/year
  Age of the screened women: all ages
  Definition of a positive: BI-RADS 0, 4, 5 and 3^b with immediate further assessment
  Follow-up for a breast cancer diagnosis: 1 year follow-up
  Overall false-positive rate: 6.5–15%
  Radiologist false-positive rate, observed: 1–29%
  Radiologist false-positive rate, adjusted: —

Present study
  Data source: 5% non-cancer sample of Medicare claims (including 11 SEER^a areas)
  Study period: 1998–1999
  Number of radiologists: 1067
  Caseload of each radiologist: 10+ in 2 years
  Age of the screened women: 65+
  Definition of a positive: a claim for follow-up diagnostic testing within 3 months after the screening
  Follow-up for a breast cancer diagnosis: all cases are free of cancer 1 year after the screening^c
  Overall false-positive rate: 6.4%
  Radiologist false-positive rate, observed: 0–40%
  Radiologist false-positive rate, adjusted: 1.5–28.9%^e

^a The Surveillance, Epidemiology, and End Results (SEER) Program, an authoritative source of information on cancer incidence and survival in the United States
^b BI-RADS (Breast Imaging Reporting and Data System): 0, 4, 5: incomplete or suspicious for breast cancer, immediate follow-up is recommended; 3: probably benign finding, short-interval follow-up (<6 months) is recommended; 1, 2: no concern of breast cancer, normal follow-up is recommended
^c The 5% non-cancer sample of Medicare claims is purged of cancer cases diagnosed through 1999. To exclude breast cancer cases diagnosed in 2000 from the 1999 5% non-cancer sample, we screened the 1999 claims data to exclude any woman whose 2000 claims (1) had a breast cancer treatment-related procedure (mastectomy, partial mastectomy, nodal dissection, chemotherapy or radiation therapy) AND a breast cancer diagnosis, or (2) had breast cancer as the principal diagnosis on an inpatient claim. Based on our previous work [23], these criteria have a high likelihood of distinguishing breast cancer cases from breast cancer-free women
^d Adjusted for patient, testing and radiologist characteristics
^e Adjusted for sampling error and patient characteristics
In conclusion, radiologists varied greatly in the false-positive rates of screening mammography. Female and more recently trained radiologists had higher false-positive rates. The variation among radiologists was largely due to unmeasured factors, especially unmeasured radiologist factors. If our results are confirmed in further studies, they suggest that system-level interventions would be required to reduce variation in mammography interpretation.

Acknowledgements This study was supported by grants from the National Cancer Institute, National Institutes of Health (R01CA072076 and P50CA105631) and the Agency for Healthcare Research and Quality (R24HS011618). The study used the linked SEER-Medicare database. The interpretation and reporting of the data are the sole responsibility of the authors. The authors acknowledge the efforts of the Applied Research Branch, Division of Cancer Prevention and Population Science, NCI; the Office of Information Services and the Office of Strategic Planning, HCFA; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database.
References

1. Breen N, Wagener DK, Brown ML et al (2001) Progress in cancer screening over a decade: results of cancer screening from the 1987, 1992, and 1998 National Health Interview Surveys. J Natl Cancer Inst 93(22):1704–1713
2. BRFSS (2005) http://apps.nccd.cdc.gov/brfss/Trends/agechart.asp?qkey=10060&state=US. Cited Jan. 21, 2005
3. Randolph WM, Goodwin JS, Mahnken JD et al (2002) Regular mammography use is associated with elimination of age-related disparities in size and stage of breast cancer at diagnosis. Ann Intern Med 137(10):783–790
4. McCarthy EP, Burns RB, Coughlin SS et al (1998) Mammography use helps to explain differences in breast cancer stage at diagnosis between older black and white women. Ann Intern Med 128(9):729–736
5. Kerlikowske K, Grady D, Rubin SM et al (1995) Efficacy of screening mammography. A meta-analysis. JAMA 273(2):149–154
6. Randall T (1993) Varied mammogram readings worry researchers. JAMA 269(20):2616–2617
7. Elmore JG, Barton MB, Moceri VM et al (1998) Ten-year risk of false positive screening mammograms and clinical breast examinations. N Engl J Med 338(16):1089–1096
8. Gram IT, Slenker SE (1992) Cancer anxiety and attitudes toward mammography among screening attenders, nonattenders, and women never invited. Am J Public Health 82(2):249–251
9. Lerman C, Trock B, Rimer BK et al (1991) Psychological and behavioral implications of abnormal mammograms. Ann Intern Med 114(8):657–661
10. Velanovich V (1995) Immediate biopsy versus observation for abnormal findings on mammograms: an analysis of potential outcomes and costs. Am J Surg 170(4):327–332
11. Barton MB, Moore S, Polk S et al (2001) Increased patient concern after false-positive mammograms: clinician documentation and subsequent ambulatory visits. J Gen Intern Med 16(3):150–156
12. Smith-Bindman R, Chu PW, Miglioretti DL et al (2003) Comparison of screening mammography in the United States and the United Kingdom. JAMA 290(16):2129–2137
13. Kerlikowske K, Grady D, Barclay J et al (1993) Positive predictive value of screening mammography by age and family history of breast cancer. JAMA 270(20):2444–2450
14. Carney PA, Miglioretti DL, Yankaskas BC et al (2003) Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 138(3):168–175
15. Feldman J, Smith RA, Giusti R et al (1995) Peer review of mammography interpretations in a breast cancer screening program. Am J Public Health 85(6):837–839
16. Beam CA, Layde PM, Sullivan DC (1996) Variability in the interpretation of screening mammograms by US radiologists. Findings from a national sample. Arch Intern Med 156(2):209–213
17. Kerlikowske K, Grady D, Barclay J et al (1998) Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data System. J Natl Cancer Inst 90(23):1801–1809
18. Rutter CM, Taplin S (2000) Assessing mammographers' accuracy. A comparison of clinical and test performance. J Clin Epidemiol 53(5):443–450
19. Elmore JG, Miglioretti DL, Reisch LM et al (2002) Screening mammograms by community radiologists: variability in false-positive rates. J Natl Cancer Inst 94(18):1373–1380
20. Barlow WE, Chi C, Carney PA et al (2004) Accuracy of screening mammography interpretation by characteristics of radiologists. J Natl Cancer Inst 96(24):1840–1850
21. Smith-Bindman R, Chu P, Miglioretti DL et al (2005) Physician predictors of mammographic accuracy. J Natl Cancer Inst 97(5):358–367
22. Warren JL, Klabunde CN, Schrag D et al (2002) Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care 40(8 Suppl):IV-3–18
23. Freeman JL, Zhang D, Freeman DH et al (2000) An approach to identifying incident breast cancer cases using Medicare claims data. J Clin Epidemiol 53(6):605–614
24. Baldwin LM, Adamache W, Klabunde CN et al (2002) Linking physician characteristics and Medicare claims data: issues in data availability, quality, and measurement. Med Care 40(8 Suppl):IV-82–95
25. Freeman JL, Klabunde CN, Schussler N et al (2002) Measuring breast, colorectal, and prostate cancer screening with Medicare claims data. Med Care 40(8 Suppl):IV-36–42
26. Bach PB, Guadagnoli E, Schrag D et al (2002) Patient demographic and socioeconomic characteristics in the SEER-Medicare database: applications and limitations. Med Care 40(8 Suppl):IV-19–25
27. Ash AS, Shwartz M, Pekoz EA (2003) In: Iezzoni LI (ed) Risk adjustment for measuring health care outcomes, 3rd edn. Health Administration Press, Chicago, Illinois
28. Raudenbush SW, Bryk AS (2002) Hierarchical linear models: applications and data analysis methods, 2nd edn. Sage Publications, Thousand Oaks, California
29. Christiansen CL, Morris CN (1997) Improving the statistical approach to health care provider profiling. Ann Intern Med 127(8 Pt 2):764–768
30. Choudhry NK, Fletcher RH, Soumerai SB (2005) Systematic review: the relationship between clinical experience and quality of health care. Ann Intern Med 142(4):260–273
31. Christiansen CL, Wang F, Barton MB et al (2000) Predicting the cumulative risk of false-positive mammograms. J Natl Cancer Inst 92(20):1657–1666
32. Blustein J (1995) Medicare coverage, supplemental insurance, and the use of mammography by older women. N Engl J Med 332(17):1138–1143
33. Office of Technology Assessment (1987) Breast cancer screening for Medicare beneficiaries: effectiveness, cost to Medicare and medical resources required. U.S. Congress, Office of Technology Assessment, Health Program
34. Randolph WM, Mahnken JD, Goodwin JS et al (2002) Using Medicare data to estimate the prevalence of breast cancer screening in older women: comparison of different methods to identify screening mammograms. Health Serv Res 37(6):1643–1657
35. Mouchawar J, Byers T, Warren M et al (2004) The sensitivity of Medicare billing claims data for monitoring mammography use by elderly women. Med Care Res Rev 61(1):116–127
36. BI-RADS (2003) BI-RADS—Mammography. American College of Radiology, Reston, VA
37. Taplin SH, Ichikawa LE, Kerlikowske K et al (2002) Concordance of breast imaging reporting and data system assessments and management recommendations in screening mammography. Radiology 222(2):529–535
38. Moseson D, Meharg K (1994) Tumor registry audit of mammography in community practice. Am J Surg 167(5):505–508
39. Rosenberg RD, Lando JF, Hunt WC et al (1996) The New Mexico Mammography Project. Screening mammography performance in Albuquerque, New Mexico, 1991 to 1993. Cancer 78(8):1731–1739
40. Gill KS, Yankaskas BC (2004) Screening mammography performance and cancer detection among black women and white women in community practice. Cancer 100(1):139–148