Comparison of Digital Mammography and Screen-Film ... - AJR

Women’s Imaging • Original Research
Hambly et al.
FFDM Versus Screen-Film Mammography for Screening

Niamh M. Hambly1,2, Michelle M. McNicholas1, Niall Phelan3, Gormlaith C. Hargaden1, Ann O’Doherty4, Fidelma L. Flanagan1

Hambly NM, McNicholas MM, Phelan N, Hargaden GC, O’Doherty A, Flanagan FL

Keywords: breast cancer, breast cancer screening, breast imaging, digital imaging, digital mammography, screen-film mammography

Comparison of Digital Mammography and Screen-Film Mammography in Breast Cancer Screening: A Review in the Irish Breast Screening Program

OBJECTIVE. Clinical trials to date into the use of full-field digital mammography (FFDM) for breast cancer screening have shown variable results. The aim of this study was to review the use of FFDM in a population-based breast cancer screening program and to compare the results with screen-film mammography.

MATERIALS AND METHODS. The study included 188,823 screening examinations of women between 50 and 64 years old; 35,204 (18.6%) mammograms were obtained using FFDM. All films were double read using a 5-point rating scale to indicate the probability of cancer. Patients with positive scores were recalled for further workup. The recall rate, cancer detection rate, and positive predictive value (PPV) of FFDM were compared with screen-film mammography.

RESULTS. The cancer detection rate was significantly higher for FFDM than screen-film mammography (6.3 vs 5.2 per 1,000, respectively; p = 0.01). The cancer detection rate for FFDM was higher than screen-film mammography for initial screening and subsequent screening, for invasive cancer and ductal carcinoma in situ, and across all age groups. The cancer detection rate for cancers presenting as microcalcifications was significantly higher for FFDM than for screen-film mammography (1.9 vs 1.3 per 1,000, p = 0.01). The recall rate was significantly higher for FFDM than screen-film mammography (4.0% vs 3.1%, p < 0.001). There was no significant difference in the PPVs of recall to assessment for FFDM and screen-film mammography (15.7% and 16.7%, p = 0.383).

CONCLUSION. FFDM resulted in significantly higher cancer detection and recall rates than screen-film mammography in women 50–64 years old. The PPVs of FFDM and screen-film mammography were comparable. The results of this study suggest that FFDM can be safely implemented in breast cancer screening programs.

DOI:10.2214/AJR.08.2157

Received November 23, 2008; accepted after revision March 26, 2009.

1 Department of Radiology, Irish National Breast Screening Programme, Eccles Screening Unit, Dublin, Ireland.

2 Present address: Jefferson-Honickman Breast Imaging, Thomas Jefferson University Hospital, 1100 Walnut St., Philadelphia, PA 19107. Address correspondence to N. M. Hambly ([email protected]).

3 Department of Medical Physics, Irish National Breast Screening Programme, Eccles Screening Unit, Dublin, Ireland.

4 Department of Radiology, Irish National Breast Screening Programme, Merrion Screening Unit, Dublin, Ireland.

AJR 2009; 193:1010–1018 0361–803X/09/1934–1010 © American Roentgen Ray Society


Mammography is a well-established screening tool, and screening has been shown to reduce breast cancer mortality due to earlier detection. To date, screen-film mammography has been the reference standard for use in breast cancer screening programs, and all previous randomized controlled trials into population-based breast cancer screening programs were performed using screen-film mammography [1–3]. Since it first gained U.S. Food and Drug Administration approval in 2000, digital mammography has gained in popularity because of its many advantages over screen-film mammography, including elimination of film processing, storage, copying, and retrieval; the ability to manipulate images after acquisition; and the more efficient use of computer-aided detection and telemammography.

There were initial concerns that the more limited spatial resolution of digital mammography compared with the reference standard of screen-film mammography might lead to a reduced sensitivity for cancer detection [4]. However, experimental studies have shown that digital systems have a higher detective quantum efficiency and dynamic range, leading to improved contrast resolution [5]. Also, concerns that the lower spatial resolution of digital imaging would limit the detection of microcalcifications have been discounted by studies reporting that full-field digital mammography (FFDM) shows improved image quality with higher reliability in characterizing calcifications compared with screen-film mammography [6, 7].

Clinical studies to date into the use of FFDM for breast cancer screening have shown variable results. In early trials, Lewin et al. [8, 9] and Skaane et al. [10] showed a nonsignificant higher cancer detection rate for screen-film mammography than for FFDM. However, the Digital Mammographic Imaging Screening Trial (DMIST) [11, 12], which represents the largest trial of digital mammography to date, concluded that FFDM was more accurate in screening pre- or perimenopausal women younger than 50 years with dense breasts. In a more recently published article describing the final results of the Oslo II study, Skaane et al. [13] reported a significantly higher cancer detection rate in women screened on FFDM, and a number of recent European studies have yielded results that are more favorable for digital mammography than for screen-film mammography [14–17]. Table 1 summarizes the main findings of these studies.

The Irish National Breast Screening Programme (INBSP) was launched in 2000. It invites women ranging in age from 50 to 64 years to undergo breast cancer screening every 2 years. Screening is performed both on-site in static screening units and off-site in mobile units, with all reading performed centrally in the static units. Digital mammography was first introduced into the screening program on a phased basis in January 2005. Between January 2005 and December 2007, 18.6% (35,204 of 188,823) of patients were screened on digital systems. In April 2008, the INBSP became the first national screening center in Europe to be fully digitized.

The aim of this study was to retrospectively review the performance of FFDM in a population-based screening program and to compare its performance with that of the standard of screen-film mammography with respect to recall rate, cancer detection rate, and positive predictive value (PPV).

AJR:193, October 2009

TABLE 1: Previous Studies Comparing Full-Field Digital Mammography (FFDM) to Screen-Film Mammography (SFM)

| Study [reference no.] | Retrospective or Prospective | Population | Imaging Technique(s) | Patient Age (y) | Study Size (no. of examinations) | Recall Rate (p) | Cancer Detection Rate (p) | PPV1 (p) |
|---|---|---|---|---|---|---|---|---|
| INBSP Study [current study] | Retrospective review | Population-based screening program; initial and subsequent screenings | FFDM or SFM | 50–64 | 188,823 total: 35,204 FFDM + 153,619 SFM | FFDM > SFM: 4.0% vs 3.1% (< 0.001) | FFDM > SFM: 0.63% vs 0.52% (0.01) | FFDM = 15.7%, SFM = 16.7% (0.383) |
| Oslo I [10] | Prospective case control study | Population-based screening program; subsequent screenings only | FFDM and SFM | 50–69 | 3,683 paired examinations: 3,683 FFDM and 3,683 SFM | FFDM > SFM: 4.6% vs 3.5% (NS) | SFM > FFDM: 0.76% vs 0.62% (NS) | SFM = 20%, FFDM = 12% (NA) |
| Oslo II [13] | Prospective randomized control trial | Population-based screening program; initial and subsequent screenings | FFDM or SFM | 45–69 | 23,929 total: 16,985 SFM + 6,944 FFDM | FFDM > SFM: 4.2% vs 2.5% (< 0.001) | FFDM > SFM: 0.59% vs 0.38% (0.02) | SFM = 15.1%, FFDM = 13.9% (0.68) |
| DMIST [11] | Prospective study | Population recruited over 2 years at 33 sites in the United States and Canada | FFDM and SFM | | 42,760 paired examinations: 42,760 FFDM and 42,760 SFM | 8.4% for both FFDM and SFM | Accuracy of FFDM was significantly higher in pre- or perimenopausal women < 50 y with dense breasts^a | |
| Lewin et al. [9] | Prospective study | Women presenting for screening at two institutions | FFDM and SFM | ≥ 40 | 6,736 paired examinations: 6,736 FFDM and 6,736 SFM | SFM > FFDM: 14.9% vs 11.8% (< 0.001) | SFM > FFDM: 0.49% vs 0.4% (NS) | FFDM = 3.4%, SFM = 3.3% (NS) |
| Heddson et al. [17] | Retrospective review | Population-based screening program; subsequent screening only | PCDR, CR, or SFM | 40–74 | 52,172 total: 25,901 SFM + 9,841 PCDR + 16,430 CR | SFM > PCDR: 1.4% vs 1.0% (< 0.001); SFM > CR: 1.4% vs 1% (< 0.001) | PCDR > SFM: 0.49% vs 0.31% (0.01); CR > SFM: 0.38% vs 0.31% (0.22) | PCDR = 47%, CR = 39%, SFM = 22% (< 0.001) |
| Del Turco et al. [16] | Retrospective review | Population-based screening program; initial and subsequent screenings | FFDM or SFM | 50–69 | 28,770 total: 14,385 FFDM + 14,385 SFM | FFDM > SFM: 4.56% vs 3.96% (0.01) | FFDM > SFM (NS): 0.72% vs 0.58% (0.14) | FFDM = 15.9%, SFM = 14.7% (0.65) |
| Vigeland et al. [15] | Retrospective review | Population-based screening program; initial screenings only | FFDM only; compared with SFM data from previous 9 y | 50–69 | 343,002 total: 18,239 FFDM + 324,763 SFM | FFDM = SFM: 4.09% vs 4.16% (NS) | FFDM > SFM for all cancers: 0.77% vs 0.65% (0.058); DCIS: 0.21% vs 0.11% (< 0.001) | FFDM = 16.6%, SFM = 13.5% (0.014) |

Note: PPV1 = positive predictive value of recall to assessment, NS = nonsignificant, NA = not available, PCDR = photon-counting direct radiography, CR = computed radiography, DM = digital mammography, DMIST = Digital Mammographic Imaging Screening Trial, INBSP = Irish National Breast Screening Programme, DCIS = ductal carcinoma in situ.
^a The end point of the study was diagnostic accuracy.


Materials and Methods

Between January 1, 2005, and December 31, 2007, 245,863 invitations to undergo breast cancer screening in the INBSP were sent to 188,823 women. This group represented all eligible women (age range, 50–64 years) in the catchment area of the INBSP. The uptake rate was 76.8%, and 188,823 screenings of 146,114 women were performed during the study period. Women were selected to undergo screen-film mammography or FFDM when they presented for screening. Assignment was based on the time of check-in; that is, every third or fourth patient was assigned to digital mammography depending on the screening center. Patient age, breast density, or menopausal status did not influence patient selection. Women were not offered a choice of screening technique. Of the 188,823 screening mammograms, 35,204 (18.6%) were obtained using FFDM and 153,619 (81.4%) using screen-film mammography. These examinations were initial (prevalence) screenings in 53,702, of which 9,546 (17.8%) were performed on digital systems, and were subsequent (incidence) screenings in 135,121, of which 25,658 (19%) were performed digitally.

The composition of both groups was comparable in terms of screening round and age distribution. Twenty-seven percent (9,546 of 35,204) of digital screenings were initial studies compared with 28.7% (44,156 of 153,619) of analog screenings. For women undergoing their first screening, the average age was 53.5 years for FFDM and 54.1 years for screen-film mammography. For women undergoing their second or subsequent screening, the average age was 58.6 years for FFDM and 58.5 years for screen-film mammography. Table 2 compares the two groups in terms of age and screening round. Information regarding breast density, socioeconomic status, and cancer risk factors such as menopausal status or hormone replacement therapy use is not collected by the INBSP and is therefore not included for comparison.

A retrospective analysis was performed to determine if screening was performed on a digital system or screen-film system, if the woman was recalled for further assessment, if a biopsy was performed, and if cancer was diagnosed. The recall rate, biopsy rate, cancer detection rate, and PPV for women who underwent FFDM were then determined and compared with those values for women who underwent standard screen-film mammography screening. All women signed a consent form to participate in the screening program and agreed in writing to the collection, storage, and exchange of their health records for audit and quality assurance. Institutional review board approval was obtained.

TABLE 2: Age Distribution and Screening Round Compared for Screen-Film Mammography (SFM) and Full-Field Digital Mammography (FFDM)

| Characteristic | SFM (n = 153,619), No. | SFM, % | FFDM (n = 35,204), No. | FFDM, % |
|---|---|---|---|---|
| Age (y): 50–54 | 53,229 | 34.7 | 12,188 | 34.6 |
| Age (y): 55–59 | 56,209 | 36.6 | 12,493 | 35.5 |
| Age (y): 60–64 | 44,181 | 28.8 | 10,523 | 29.9 |
| Screening round: Initial | 44,156 | 28.7 | 9,546 | 27.1 |
| Screening round: Subsequent | 109,463 | 71.3 | 25,658 | 72.9 |
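The percentages quoted for the two groups follow directly from the screening counts. The short sketch below recomputes them as a cross-check; the counts are taken from the text and Table 2, while the function and key names are illustrative and not part of the study.

```python
# Counts quoted in the Materials and Methods text and in Table 2.
INVITATIONS = 245_863
SCREENINGS = {
    "FFDM": {"initial": 9_546, "subsequent": 25_658},
    "SFM": {"initial": 44_156, "subsequent": 109_463},
}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def summarize(screenings: dict, invitations: int) -> dict:
    """Derive the headline percentages from the raw screening counts."""
    totals = {arm: sum(rounds.values()) for arm, rounds in screenings.items()}
    grand_total = sum(totals.values())
    initial_total = sum(rounds["initial"] for rounds in screenings.values())
    subsequent_total = grand_total - initial_total
    return {
        "uptake_pct": pct(grand_total, invitations),
        "ffdm_share_pct": pct(totals["FFDM"], grand_total),
        "ffdm_share_of_initial_pct": pct(screenings["FFDM"]["initial"], initial_total),
        "ffdm_share_of_subsequent_pct": pct(
            screenings["FFDM"]["subsequent"], subsequent_total
        ),
        "initial_within_ffdm_pct": pct(screenings["FFDM"]["initial"], totals["FFDM"]),
        "initial_within_sfm_pct": pct(screenings["SFM"]["initial"], totals["SFM"]),
    }


summary = summarize(SCREENINGS, INVITATIONS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Keeping the counts in one structure makes it easy to confirm that the per-round and per-modality figures all derive from the same 188,823 examinations.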

Image Acquisition

The screen-film mammography images were acquired on one of two units: a GE 800 T (GE Healthcare) using a molybdenum anode and molybdenum–rhodium filter or a Mammomat 3000 (Siemens Healthcare) using a molybdenum–tungsten anode and molybdenum–rhodium filter. The FFDM images were acquired using one of three machines: Sectra MDM (Sectra), Lorad Selenia (Hologic), or GE Essential (GE Healthcare).

Image Interpretation

Seven specialist breast radiologists with a minimum of 5 years’ experience in reading mammography and reading an average of 20,000 examinations per year participated in image interpretation. Screen-film mammography images were read using standard motorized mammography alternators. A magnifying glass was offered for reading. Old films were available and displayed under the current studies.

FFDM soft-copy images were read using a PACS mammography review workstation (IDS5, Sectra). The workstation included two high-resolution monitors (2,000 × 5,000 pixels). Initially, all four views were displayed. The images were then displayed at full resolution with one mediolateral oblique view on each monitor followed by one craniocaudal view on each monitor. Any further image manipulation such as zooming or windowing was left to the discretion of the radiologist interpreting the study. The review workstation was located in a darkened room away from the mammography alternators. Previous mammograms were reviewed on a standard viewbox adjacent to the workstation.
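The double-reading rule used in this program (each of two readers assigns an R category from 1 to 5; agreement on R1 or R2 leads to routine rescreening, agreement on R3–R5 leads to recall, and a split between the two ranges goes to a consensus meeting) can be written as a small decision function. This is a minimal sketch of that rule as described in the Image Interpretation section; the names `Outcome`, `is_positive`, and `triage` are hypothetical and do not come from the INBSP.

```python
from enum import Enum


class Outcome(Enum):
    ROUTINE = "routine screening in 2 years"
    RECALL = "recall for further workup"
    CONSENSUS = "discuss at consensus meeting"


def is_positive(r_category: int) -> bool:
    """R3, R4, and R5 count as positive reads; R1 and R2 as negative."""
    if r_category not in (1, 2, 3, 4, 5):
        raise ValueError(f"R category must be 1-5, got {r_category}")
    return r_category >= 3


def triage(reader1: int, reader2: int) -> Outcome:
    """Combine two readers' R categories into a screening outcome."""
    positive1, positive2 = is_positive(reader1), is_positive(reader2)
    if positive1 and positive2:
        return Outcome.RECALL
    if not positive1 and not positive2:
        return Outcome.ROUTINE
    # One reader negative (R1-R2), the other positive (R3-R5).
    return Outcome.CONSENSUS


print(triage(1, 2).value)  # both readers negative
print(triage(3, 5).value)  # both readers positive
print(triage(2, 4).value)  # readers disagree
```

The consensus meeting itself is a human decision, so the sketch stops at flagging the case for discussion.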

TABLE 3: Recall Rates, Cancer Detection Rates, and Positive Predictive Values of Recall to Assessment for Screen-Film Mammography (SFM) and Full-Field Digital Mammography (FFDM)

| Performance Result | All Screenings (n = 188,823): SFM (n = 153,619) | All Screenings: FFDM (n = 35,204) | p | First Screenings (n = 53,702): SFM (n = 44,156) | First Screenings: FFDM (n = 9,546) | p | Second and Subsequent Screenings (n = 135,121): SFM (n = 109,463) | Second and Subsequent Screenings: FFDM (n = 25,658) | p |
|---|---|---|---|---|---|---|---|---|---|
| Recall rate^a | 3.1 | 4.0 | < 0.001 | 5.7 | 7.3 | < 0.001 | 2.0 | 2.8 | < 0.001 |
| Cancer detection rate^b | 5.2 | 6.3 | 0.01 | 7.0 | 7.9 | 0.483 | 4.4 | 5.7 | 0.008 |
| Invasive cancer detection rate^b | 4.2 | 5.0 | 0.054 | 5.7 | 6.4 | 0.63 | 3.6 | 4.4 | 0.047 |
| DCIS detection rate^b | 0.95 | 1.3 | 0.072 | 1.3 | 1.5 | 0.66 | 0.8 | 1.2 | 0.036 |
| PPV1 (%) | 16.7 | 15.7 | 0.383 | 12.2 | 10.8 | 0.337 | 21.9 | 20.5 | 0.455 |

Note: DCIS = ductal carcinoma in situ, PPV1 = number of cancers detected as a percentage of the women recalled to assessment.
^a Percentage of screenings recalled for further assessment.
^b Number of cancers detected per 1,000 screenings.

TABLE 4: Recall Rates and Cancer Detection Rates for Screen-Film Mammography (SFM) and Full-Field Digital Mammography (FFDM) According to Age Distribution

| Age (y) | No. of Women Screened: SFM | No. of Women Screened: FFDM | Recall Rate^a: SFM | Recall Rate^a: FFDM | p | Cancer Detection Rate^b: SFM | Cancer Detection Rate^b: FFDM | p |
|---|---|---|---|---|---|---|---|---|
| 50–54 | 53,229 | 12,188 | 4.4 (2,343) | 5.3 (649) | < 0.001 | 4.9 (261) | 5.7 (70) | 0.27 |
| 55–59 | 56,209 | 12,493 | 2.3 (1,306) | 3.4 (421) | < 0.001 | 4.7 (263) | 5.8 (72) | 0.13 |
| 60–64 | 44,181 | 10,523 | 2.4 (1,080) | 3.2 (336) | < 0.001 | 6.1 (268) | 7.5 (79) | 0.11 |

Note: Numbers in parentheses are the actual number of women recalled or diagnosed with cancer.
^a Percentage of screenings recalled for further assessment.
^b Number of cancers detected per 1,000 screenings.

TABLE 5: Cancer Detection Rates for Screen-Film Mammography (SFM) and Full-Field Digital Mammography (FFDM) Based on the Type of Abnormality Detected

Cancer Detection Rate^a (Actual No. of Cancers Detected)

| Mammographic Abnormality Detected | All Cancers: SFM | All Cancers: FFDM | p | Invasive Cancers: SFM | Invasive Cancers: FFDM | p | DCIS: SFM | DCIS: FFDM | p |
|---|---|---|---|---|---|---|---|---|---|
| Microcalcifications | 1.3 (202) | 1.9 (66) | 0.01 | 0.6 (93) | 0.7 (25) | 0.48 | 0.7 (109) | 1.2 (41) | 0.009 |
| Mass | 2.4 (370) | 2.7 (96) | 0.27 | 2.3 (350) | 2.6 (93) | 0.2 | 0.1 (20) | 0.08 (3) | 0.49 |
| Architectural distortion | 0.7 (109) | 1.0 (36) | 0.06 | 0.7 (100) | 1.0 (36) | 0.03 | 0.06 (9) | 0 (0) | 0.16 |
| Asymmetry | 0.7 (111) | 0.68 (23) | 0.66 | 0.7 (103) | 0.6 (21) | 0.63 | 0.05 (8) | 0.06 (2) | 0.9 |

Note: Numbers in parentheses are actual number of cancers detected. DCIS = ductal carcinoma in situ.
^a Per 1,000 screening examinations.

TABLE 6: Size of Invasive Tumors Detected by Screen-Film Mammography (SFM) and Full-Field Digital Mammography (FFDM)

| Tumor Size (mm) | SFM: No. | SFM: % | FFDM: No. | FFDM: % | p |
|---|---|---|---|---|---|
| ≤ 10 | 187 | 28.9 | 43 | 24.6 | 0.25 |
| ≤ 15 | 361 | 55.9 | 103 | 58.9 | 0.48 |
| > 15 | 285 | 44.1 | 72 | 41.1 | 0.48 |
| Total | 646 | | 175 | | |

All mammograms were double read by two radiologists. Reader 2 was not blinded to reader 1’s recommendations. Old films were available at the time of reading. Each mammogram was assigned an R category 1–5. The R classification system is a 5-category rating scale used to define the probability of cancer on a radiologic study. The categories are defined as follows: R1, normal study; R2, benign finding; R3, indeterminate finding requiring further workup; R4, likely malignant finding; and R5, highly suspicious for malignancy. The R classification system is similar to BI-RADS with the main difference being that all patients with R3 findings are recalled for further assessment, whereas a diagnosis of BI-RADS category 3 usually prompts short-interval follow-up [18]. Six-month recall is not practiced in our screening program.

For each case, if both readers assigned a category of R1 or R2, the woman was listed for routine screening in 2 years. If both readers assigned a category of R3–R5, the woman was automatically recalled for further workup. If there was a discrepancy in R category (that is, if one reader assigned a category of R1 or R2 but the other reader assigned a category of R3, R4, or R5), the patient was listed for discussion at a consensus meeting. This process in INBSP has recently been described in an article by Shaw et al. [19]. The consensus meeting was held twice weekly. All radiologists were invited to attend, and a minimum of two was required. All cases with a discrepancy in R category from the previous week were reviewed, and a consensus was reached as to whether the patient should be recalled for assessment or listed for routine screening.

Further Workup

Further diagnostic workup for all women recalled to assessment was performed in the central screening units. Diagnostic workup involved acquiring additional mammographic views including spot compression, true lateral, and magnification views and performing an ultrasound examination and MRI as required.

All solid masses, R3–R5 microcalcifications, and other suspicious lesions underwent core biopsy using a 14-gauge biopsy device. An 11-gauge suction device (Mammotome, Ethicon Endo-Surgery) was used in some cases depending on radiologist preference. Biopsy was performed under ultrasound guidance if possible and stereotactic guidance was used if necessary. On-site specimen radiographs were obtained to confirm adequate sampling of microcalcifications. All samples were evaluated by a dedicated breast pathologist. All lesions biopsied were discussed at a multidisciplinary meeting within 1 week of biopsy. The multidisciplinary meeting was attended by breast surgeons, breast radiologists, breast pathologists, and medical and radiation oncologists as well as nursing staff, administration staff, and radiographers.

Statistical Analysis

The recall rate, cancer detection rate, biopsy rate, and PPV were calculated for digital mammography and were compared with those values for standard screen-film mammography. The studies were subdivided into initial screenings and subsequent screenings and according to patient age. The indications for recall were noted in all women diagnosed with cancer, and the cancer detection rate based on the abnormality detected was compared for the two groups. The recall and cancer detection rates were also compared as a function of time for the digital group to determine whether a learning curve was apparent.

The recall rate was defined as the percentage of women screened who were recalled for further diagnostic workup. The cancer detection rate was defined as the number of cancers detected per 1,000 women screened. The PPV1 was the number of cancers detected as a percentage of the women recalled for assessment. The PPV2 was the number of cancers detected as a percentage of the women who underwent biopsy.

Statistical analysis was performed using a statistical software program (SigmaStat 3.0, Systat Software). A chi-square test was used to compare recall rate, cancer detection rate, and PPVs in FFDM and screen-film mammography. The significance level was set at a p value of < 0.05.
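The rate definitions and the chi-square comparison described under Statistical Analysis can be illustrated with the overall counts reported in this study. The sketch below is an assumption-laden illustration, not the authors' SigmaStat procedure: it uses a plain Pearson chi-square test on a 2 × 2 table without continuity correction (the source does not say whether a correction was applied), and the helper names are hypothetical. For one degree of freedom the p value is obtained from the complementary error function.

```python
import math


def percent(part: int, whole: int) -> float:
    """Recall rate, PPV1, and PPV2 are all expressed as percentages."""
    return 100 * part / whole


def rate_per_1000(cancers: int, screened: int) -> float:
    """Cancer detection rate: cancers detected per 1,000 women screened."""
    return 1000 * cancers / screened


def chi_square_2x2(events_a: int, total_a: int, events_b: int, total_b: int):
    """Pearson chi-square test (1 df, no continuity correction; an assumption).

    Returns (statistic, p_value) for the comparison of two proportions.
    """
    observed = [
        [events_a, total_a - events_a],
        [events_b, total_b - events_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [events_a + events_b, (total_a - events_a) + (total_b - events_b)]
    grand_total = total_a + total_b
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Overall counts reported in the Results section of this study.
FFDM = {"screened": 35_204, "recalled": 1_406, "biopsied": 470, "cancers": 221}
SFM = {"screened": 153_619, "recalled": 4_729, "biopsied": 1_698, "cancers": 792}

for label, arm in (("FFDM", FFDM), ("SFM", SFM)):
    print(
        label,
        f"recall {percent(arm['recalled'], arm['screened']):.1f}%",
        f"CDR {rate_per_1000(arm['cancers'], arm['screened']):.1f}/1,000",
        f"PPV1 {percent(arm['cancers'], arm['recalled']):.1f}%",
        f"PPV2 {percent(arm['cancers'], arm['biopsied']):.1f}%",
    )

statistic, p_value = chi_square_2x2(
    FFDM["cancers"], FFDM["screened"], SFM["cancers"], SFM["screened"]
)
print(f"cancer detection rate: chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

The same function can be applied to any pair of counts in Tables 3 to 5; small differences from the published p values are possible if the original software applied a continuity correction.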

Results

Table 3 summarizes the number of women screened, the recall rate, the cancer detection rate, and the PPV1 both in total and for initial and subsequent screenings. The rates for FFDM and screen-film mammography are compared.

Recall Rate

A total of 6,135 women (of 188,823 screening mammograms) were recalled for further diagnostic workup, giving a recall rate of 3.2% overall. For women undergoing screen-film mammography, 3.1% (4,729 of 153,619) were recalled compared with 4.0% (1,406 of 35,204) of women who underwent FFDM. This difference was statistically significant (p < 0.001).

The recall rate was higher in women undergoing their first screening: 6.0% (3,220 of 53,702). For first-screen women undergoing screen-film mammography, the recall rate was 5.7% (2,526 of 44,156) compared with 7.3% (694 of 9,546) in those undergoing FFDM (p < 0.001). As expected, the recall rate was lower in women undergoing a subsequent screening: 2.2% (2,915 of 135,121 screenings). For subsequent-screen women undergoing screen-film mammography, the recall rate was 2.0% (2,203 of 109,463) compared with 2.8% (712 of 25,658) of women undergoing FFDM. This difference was also statistically significant (p < 0.001).

The recall rate for FFDM was significantly higher for women of all ages (Table 4). For women 50–54 years old, the recall rate for FFDM was 5.3% (649 of 12,188) versus 4.4% for screen-film mammography (2,343 of 53,229; p < 0.001). For women 55–59 years old, the recall rate for FFDM was 3.4% (421 of 12,493) versus 2.3% for screen-film mammography (1,306 of 56,209; p < 0.001). For women 60–64 years old, the recall rate for FFDM was 3.2% (336 of 10,523) versus 2.4% for screen-film mammography (1,080 of 44,181; p < 0.001).

Cancer Detection Rate

During the study, 1,013 cancers were detected, giving an overall cancer detection rate of 5.4 per 1,000. The rate was higher in those undergoing their first screening round; 384 cancers were detected in 53,702 screenings (7.2 per 1,000). Six hundred twenty-nine cancers were detected in the 135,121 second or subsequent screenings (4.7 per 1,000).

The cancer detection rate was significantly higher in those screened on FFDM: 221 cancers were detected in 35,204 digital screenings (6.3 per 1,000), and 792 cancers were detected in 153,619 analog screenings (5.2 per 1,000) (p = 0.01). When the study cohort was subdivided into women undergoing their first screening and those undergoing subsequent screening rounds, the detection rate remained higher in the digital group. Of the women undergoing their initial screening, 75 cancers were detected in 9,546 digital screenings (7.9 per 1,000) and 309 cancers were detected in 44,156 analog screenings (7.0 per 1,000). This difference did not achieve statistical significance (p = 0.483). Of the women undergoing a second or subsequent screening, a significantly higher number of cancers were detected in those screened on FFDM. One hundred forty-six cancers were detected in 25,658 digital screenings (5.7 per 1,000) and 483 cancers were detected in 109,463 analog screenings (4.4 per 1,000) (p = 0.008).

The cancer detection rate was higher for FFDM across all age categories, but this difference in detection rates did not achieve statistical significance for individual groups (Table 4). For women 50–54 years old, the cancer detection rate was 5.7 per 1,000 (70 cases in 12,188 screenings) for FFDM versus 4.9 per 1,000 (261 in 53,229) for screen-film mammography (p = 0.27). For women 55–59 years old, the cancer detection rate was 5.8 per 1,000 (72 cases in 12,493 screenings) for FFDM versus 4.7 per 1,000 (263 in 56,209) for screen-film mammography (p = 0.13). For women 60–64 years old, the cancer detection rate for FFDM was 7.5 per 1,000 (79 cases in 10,523 screenings) versus 6.1 per 1,000 (268 in 44,181) for screen-film mammography (p = 0.11).

In the women screened using screen-film mammography, 146 (18.4%) of the 792 cancers detected were ductal carcinoma in situ (DCIS).
In the women screened on FFDM, 46 (20.8%) of the 221 cancers detected were DCIS (p = 0.48). The cancers were subdivided into invasive cancers and DCIS, and the detection rates for screen-film mammography and FFDM were compared. The detection rates for both invasive cancers and DCIS were higher in women screened on FFDM. The detection rate for invasive cancers was 5.0 per 1,000 women screened using FFDM (175 of 35,204) and 4.2 per 1,000 women screened using screen-film mammography (646 of 153,619) (p = 0.054). The detection rate for DCIS was 1.3 per 1,000 women screened on FFDM (46 of 35,204) and 0.95 per 1,000 for women screened on screen-film mammography (146 of 153,619) (p = 0.072).

PPV1

The PPV based on the number of women recalled for assessment (PPV1) who were subsequently diagnosed with cancer was 792 of 4,729 (16.7%) women undergoing screen-film mammography and 221 of 1,406 (15.7%) women undergoing FFDM (p = 0.383). In women undergoing their initial screening, the PPV1 was 309 of 2,526 (12.2%) for screen-film mammography and 75 of 694 (10.8%) for FFDM (p = 0.337). In women undergoing a subsequent screening, the PPV1 was 483 of 2,203 (21.9%) for screen-film mammography and 146 of 712 (20.5%) for digital mammography (p = 0.455).

Biopsy Rate

The biopsy rate was similar in both groups: 470 of 1,406 women (33.4%) recalled after FFDM screening underwent biopsy and 1,698 of 4,729 women (35.9%) recalled after screen-film mammography screening underwent biopsy (p = 0.09). The PPV based on the number of women who underwent biopsy (PPV2) was also determined. For those screened on FFDM, 221 cancers were diagnosed in 470 women who underwent a biopsy, giving a PPV2 of 47%. In those screened on screen-film mammography, 792 cancers were diagnosed in 1,698 women biopsied, giving a PPV2 of 46.6%. This difference was not significant (p = 0.93).

Effect of Time and Experience on Recall Rate and Cancer Detection Rate

The patients were subcategorized according to the date of screening, and the recall rate and cancer detection rate were compared over time.

In year 1, 7.8% (4,759 of 60,636) of women were screened on FFDM. Thirty-three percent (1,580 of 4,759) of these women were attending an initial screening. The recall rate in the FFDM group was 4.1% (193 of 4,759), and the cancer detection rate was 6.7 per 1,000 (32 of 4,759). The recall rate and cancer detection rate for screen-film mammography during this period were 3.2% (1,769 of 55,877) and 5.2 per 1,000 (292 of 55,877), respectively. The difference in recall rates was significant (p = 0.001), but the difference in cancer detection rates was not (p = 0.21).

In year 2, 17.3% (10,963 of 63,403) of women were screened using FFDM, 24.3% (2,664 of 10,963) of whom were undergoing their first screening. The recall rate was 3.7% (407 of 10,963), and the cancer detection rate was 6.3 per 1,000 (69 of 10,963). The recall rate and cancer detection rate for screen-film mammography were 2.9% (1,507 of 52,440) and 4.9 per 1,000 (257 of 52,440), respectively. Again, the difference in recall rates was significant (p < 0.001), but the difference in cancer detection rates was not (p = 0.07).

In year 3, 30.1% (19,482 of 64,784) of women were screened using FFDM and 27.2% (5,302 of 19,482) of these were initial screenings. The recall rate was 4.1% (806 of 19,482), and the cancer detection rate was 6.2 per 1,000 (120 of 19,482). The recall rate and cancer detection rate for screen-film mammography were 3.2% (1,453 of 45,302) and 5.4 per 1,000 (243 of 45,302), respectively. The difference in recall rates was again significant (p < 0.001), but the difference in cancer detection rates was not (p = 0.23).

There was no significant difference in the recall rate (p = 0.19) or cancer detection rate (p = 0.91) for FFDM over time.

Mammographic Abnormality Detected

The type of mammographic abnormality detected was recorded for all women who were diagnosed with cancer and was compared for screen-film mammography and FFDM. Table 5 shows the cancer detection rates for FFDM and screen-film mammography based on the type of mammographic abnormality detected. The cancer detection rate due to the detection of microcalcifications was significantly higher for FFDM for all cancers (i.e., invasive and DCIS combined) (1.9 per 1,000 for FFDM vs 1.3 per 1,000 for screen-film mammography, p = 0.01) and for DCIS alone (1.2 per 1,000 for FFDM vs 0.7 per 1,000 for screen-film mammography, p = 0.009). For invasive cancers, the cancer detection rate due to the detection of architectural distortion was significantly higher for FFDM than for screen-film mammography (1.0 vs 0.7 per 1,000, respectively; p = 0.03). There was no other significant difference between the two groups.


Tumor Size Table 6 compares the sizes of the tumors detected by FFDM and screen-film mammography. Of the 821 invasive tumors detected, 465 (56.6%) measured ≤ 15 mm at diagnosis and 232 (28.3%) measured ≤ 10 mm. For those screened using FFDM, 103 of 175 (58.9%) invasive cancers measured ≤ 15 mm and 43 (24.6%) measured ≤ 10 mm. For those screened using screen-film mammography, 361 of 646 (55.9%) invasive cancers measured ≤ 15 mm and 187 (28.9%) measured ≤ 10 mm. These differences were not statistically significant. Discussion Table 1 summarizes the main findings of the previously published studies comparing digital and screen-film mammography. The recent publication of the final results of the Oslo II study [13] showed a significantly higher cancer detection rate in women between 45 and 69 years old screened on FFDM (5.9 vs 3.8 per 1,000, respectively; p = 0.02). The results of our study support this finding, showing a significantly higher cancer detection rate for FFDM compared with screen-film mammography (6.3 vs 5.2 per 1,000, respectively; p = 0.01). These results suggest that digital mammography may be superior to screenfilm mammography for cancer detection in women older than 50 years. In our study, the two groups of patients were drawn from the same population and were similar in terms of age and screening round. The mammograms were interpreted using the same protocol and by the same radiologists throughout the 3-year period. When digital mammography was introduced, the screening program was well established with a reproducible practice of radiologist recall, reading policies, and guidelines. Therefore, it is likely that no other factors influenced the higher cancer detection rate in the women who underwent digital mammography. When the screening examinations were subdivided into initial and subsequent screenings, the difference in cancer detection rates remained. 
This difference was significant for women undergoing a second or subsequent screening (p = 0.008) but not for those undergoing initial screening (p = 0.40). This latter finding is probably because fewer women underwent initial screening, so the data lack the statistical power to show a significant difference (Table 3). The cancer detection rate was higher for FFDM than screen-film mammography for both invasive cancer (p = 0.054) and DCIS (p = 0.072), and these results approached statistical significance. Although the cancer detection rate was significantly higher in women undergoing digital mammography overall, when broken down into age groups the difference did not reach statistical significance (Table 4). Again, this finding is almost certainly due to the smaller numbers and we anticipate that a significant difference will be shown when larger numbers are compared.

Our results are concordant with the final results of the Oslo II trial [13, 14]. In that study, 23,929 women between 45 and 69 years old attending a population-based screening program were randomized to undergo either screen-film mammography or FFDM, with approximately 29% (6,944) undergoing FFDM. Images were interpreted using independent double reading, and positive results were discussed at a consensus meeting before recall. The investigators reported a cancer detection rate of 3.8 per 1,000 for screen-film mammography and 5.9 per 1,000 for FFDM, a difference that was statistically significant (p = 0.02, chi-square test). Our screening program has many similarities to that used in the Oslo II study [13], such as biennial screening, independent double reading, and consensus review of reader discrepancies. However, our study represents a larger study population, with 35,204 digital screenings in our study versus 6,944 in theirs. Overall 1,013 cancers were detected in our study versus 105 cancers in Oslo II. The Oslo II investigators reported no significant difference in PPVs, which is also in accordance with our study. Their study included women younger than 50 years old, in whom FFDM has already been shown to be more effective than screen-film mammography for cancer detection [11].

In another Norwegian study, Vigeland et al.
[15] compared 18,239 women screened on FFDM with 324,763 women screened on screen-film mammography and found a nonsignificantly higher cancer detection rate for FFDM than screen-film mammography (7.7 vs 6.5 per 1,000, respectively; p = 0.058). The results achieved statistical significance for DCIS detection (2.1 vs 1.1 per 1,000, p < 0.001). A limitation of that study was that the large screen-film mammography group consisted of merged data from 18 different counties collected over a 9-year period and read by different radiologists. None of the radiologists who read the screen-film mammography screening examinations were involved in reading the digital screening examinations. In our study, seven experienced radiologists read both FFDM and screen-film mammography and remained constant throughout the 3-year period.

In the current study, the cancer detection rate and DCIS detection rate were significantly higher for cancers presenting as microcalcifications. The invasive cancer detection rate was significantly higher for cancers presenting as architectural distortion. These findings are concordant with those of another European study by Del Turco et al. [16]. In that study, 28,770 women between 50 and 69 years old undergoing biennial screening in a Florence screening program underwent either FFDM or screen-film mammography. Films were double read using the R classification system. Those investigators found a higher cancer detection rate for FFDM than screen-film mammography (7.2 vs 5.8 per 1,000, respectively; p = 0.14), but this difference did not reach statistical significance, likely because of the small sample size; 104 cancers were detected in the digital group and 84 cancers in the analog group. However, Del Turco et al. did report a significantly higher detection rate of cancers depicted as microcalcifications for FFDM than screen-film mammography (2.6 vs 1.2 per 1,000, p = 0.007). This finding suggests that the higher cancer detection rate for FFDM may be secondary to the improved detection of microcalcifications and architectural distortion.

Another European study showing favorable results for FFDM involved the comparison of three techniques: screen-film mammography, photon-counting direct radiography, and computed radiography (CR) [17]. For this retrospective study of a population-based screening program, investigators compared 52,172 screening studies. The cancer detection rates were 3.1 per 1,000 for screen-film mammography, 4.9 per 1,000 for photon-counting direct radiography (p = 0.01), and 3.8 per 1,000 for CR (p = 0.22).
Unlike our study, they reported a significantly higher PPV for digital mammography: 22% for screen-film mammography, 47% for photon-counting direct radiography (p < 0.001), and 39% for CR (p < 0.001).

DMIST by Pisano et al. [11] represents the largest clinical trial of digital mammography published to date [12]. In that study, 42,760 asymptomatic women were recruited at 33 different sites to undergo both screen-film mammography and FFDM. Both examinations were read independently by two single readers, one reader for screen-film mammography and one for FFDM. The readers rated the mammograms using a 7-point malignancy scale suitable for receiver-operating-characteristic curve analysis and using BI-RADS. Further workup was performed if either reader recommended it. They found no significant difference between digital and screen-film mammography except in pre- and perimenopausal women younger than 50 years with dense breasts, in whom FFDM showed greater diagnostic accuracy. It is difficult to compare our study with DMIST because ours is a retrospective study and we are unable to subcategorize patients by breast density and menopausal status. However, what is of interest is that because the INBSP does not currently offer screening to women younger than 50 years, the population reported to benefit from digital mammography by Pisano et al. [11] was not included in our study. We can therefore exclude that category of women as accounting for the significant difference in the cancer detection rates. Our study results suggest that a broader category of women than previously thought may benefit from the use of digital mammography in breast cancer screening.

Earlier trials by Lewin et al. [8, 9] and the Oslo I trial by Skaane et al. [10] reported a slightly higher cancer detection rate in women screened on screen-film mammography, although this difference was not statistically significant. In a study by Lewin et al. [9], 6,736 paired examinations were performed on women 40 years old and older. Forty-two cancers were detected in total: 33 by screen-film mammography and 27 by FFDM. The difference in these results was not significant (p > 0.1, McNemar chi-square test). Probably the most limiting aspect of that study in its ability to show true differences in cancer detection is the relatively small numbers of cancers diagnosed (42 vs 1,013 in the INBSP study).
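
To make the McNemar comparison in the Lewin et al. study concrete, the sketch below reconstructs the discordant pairs from the counts quoted above (42 cancers in total, 33 seen on screen-film mammography, 27 on FFDM). It assumes that every one of the 42 cancers was detected by at least one of the two techniques; the derived split (both, screen-film only, FFDM only) and the function name `mcnemar_p` are illustrative assumptions, not figures or methods taken from Lewin et al. The uncorrected form of the McNemar statistic is used, and the published analysis may have differed in detail.

```python
import math


def mcnemar_p(only_a, only_b):
    """Uncorrected McNemar chi-square test (1 degree of freedom).

    only_a, only_b: counts of discordant pairs, i.e. cancers seen on
    one technique but not on the other.
    """
    chi2 = (only_a - only_b) ** 2 / (only_a + only_b)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Counts quoted in the text for the Lewin et al. study [9].
total, sfm, ffdm = 42, 33, 27

# Assumption: all 42 cancers were seen on at least one technique,
# so the overlap follows from inclusion-exclusion.
both = sfm + ffdm - total
sfm_only = sfm - both
ffdm_only = ffdm - both

chi2, p = mcnemar_p(sfm_only, ffdm_only)
print(f"both={both}, screen-film only={sfm_only}, FFDM only={ffdm_only}")
print(f"McNemar chi-square={chi2:.2f}, p={p:.2f}")
```

Under these assumptions the p value is above 0.1, in line with the nonsignificant result quoted above; with only 24 discordant cancers, the test has little power to show a modest difference between the techniques.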
Also, in the study by Lewin et al., the digital images were acquired using a prototype unit, and a prototype workstation with more limited spatial resolution (1,800 × 2,300 pixels) was used for soft-copy display. The authors commented that the workstation interface was not user-friendly, which may have been a source of distraction to the reader. The studies were also read by a single reader only. Lewin et al. performed a discrepancy analysis of all cases in which the interpretation of the screen-film examination differed from that of the digital examination. The most common reasons cited for discrepancy were fortuitous positioning and minor differences in opinion rather than any factor inherent to the technique used.

The Oslo I study by Skaane et al. [10] also reported a nonsignificantly higher cancer detection rate in women screened on screen-film mammography than those screened on FFDM (7.6 vs 6.2 per 1,000, respectively; p = 0.23, McNemar test). However, the numbers in that study were also relatively small (3,683 paired examinations), and the authors commented that the reading environment for the digital studies was suboptimal with high ambient lighting. They performed a conspicuity analysis for all cancers detected and concluded that both techniques were equal overall, with 61% of tumors showing equal conspicuity and 19.3% showing superior conspicuity on both screen-film mammography and FFDM. They, therefore, concluded that the cancers missed on FFDM were not due to poor image quality because the cancers were visible in retrospect, and they attributed the misses to both a suboptimal reading environment and a learning curve effect. The number of cancers detected in that study was also relatively small with 28 cancers detected on screen-film mammography and 23 detected on FFDM, again limiting its power to show small differences. In this study, we compared the cancer detection rates for digital mammography over time to look for a learning curve effect but found no significant difference (p = 0.91).

In our study, the recall rate was significantly higher for digital mammography in all age categories and for women undergoing both initial and subsequent screenings. Again, these findings are similar to the Oslo studies [10, 13], both of which reported a higher recall rate for the FFDM group. This difference was significant in the Oslo II study (4.2% for digital mammography vs 2.5% for screen-film mammography, p < 0.001). Del Turco et al. [16] also reported a significantly higher recall rate overall for digital mammography than screen-film mammography (4.56% vs 3.96%, p = 0.01).
In the Italian study, the recall rate for FFDM was significantly higher than screen-film mammography for the detection of microcalcifications but not for masses or architectural distortion. They found that although the recall rate was higher for women of all ages and all breast density categories, the difference in recall rates was significant only for women 50–59 years old and for women with dense breasts (> 75% density). We did not record breast density, but with large numbers in this study of a very homogeneous population, we think that it is reasonable to assume that the distribution of breast density would be similar for both groups.

The higher recall rate for FFDM than screen-film mammography in our study could be attributable to improved conspicuity of abnormalities with digital mammography. It could also reflect a degree of unfamiliarity with a new technique. However, if the latter is the case, then the recall rate should have decreased over time, which was not apparent in our study; we detected no significant difference in recall rate over time (p = 0.19).

The higher recall rate in the digital group may account for the higher cancer detection rate because previous studies have shown that cancer detection rates increase with increasing recall rates. This increase in cancer detection rates occurs at the expense of increasing false-positive rates. Otten et al. [20] reported that breast cancer detection rates can be increased by lowering the threshold for recall, especially for recall rates of 1–4%. However, with further increases in recall rate, the cancer detection rate levels off with an associated disproportionate increase in false-positives. According to the study by Otten et al., for each 1% incremental increase in recall rate above 5%, the detection rate increases by only 0.03%, whereas PPVs decrease to less than 10%.

The aim of a screening program is to increase cancer detection while avoiding unnecessary morbidity and cost associated with increasing false-positive rates. The INBSP operates within strict quality assurance guidelines. Acceptable ranges for recall rate, biopsy rate, and cancer detection rate have been established nationally and are in keeping with international guidelines. The increased recall rate associated with FFDM in our study (from 3.1% with screen-film mammography to 4.0% with FFDM) is still within the acceptable range.
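
As a rough consistency check on how recall rate, PPV, and cancer detection rate relate, the detection rate can be approximated as the recall rate multiplied by the PPV of recall, assuming every screen-detected cancer was found through a recall. The sketch below is illustrative only: the function name `detection_rate_per_1000` is hypothetical, and the inputs are the rounded recall rates and PPVs reported for this study (4.0% and 15.7% for FFDM; 3.1% and 16.7% for screen-film mammography), so the results agree with the published detection rates only to within rounding.

```python
def detection_rate_per_1000(recall_rate, ppv):
    """Approximate cancers detected per 1,000 screens.

    recall_rate: fraction of screened women recalled (e.g. 0.040).
    ppv: fraction of recalled women found to have cancer (e.g. 0.157).
    Assumes all screen-detected cancers come from recalled women.
    """
    return recall_rate * ppv * 1000.0


# Rounded figures reported in this study.
ffdm_rate = detection_rate_per_1000(0.040, 0.157)
sfm_rate = detection_rate_per_1000(0.031, 0.167)

print(f"FFDM: {ffdm_rate:.1f} per 1,000")        # about 6.3
print(f"Screen-film: {sfm_rate:.1f} per 1,000")  # about 5.2
```

Because the two PPVs are close, almost all of the difference between the two products comes from the recall rates, which is the arithmetic behind the suggestion that the higher recall rate may account for the higher detection rate.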
The PPVs for FFDM and screen-film mammography were comparable, which implies that the increased recall rate was not associated with an unacceptable increase in false-positives. The recall rate of a screening program depends on a number of factors including the skill of the readers, factors inherent to the screening population such as age and screening round, national health policy, and medicolegal issues. During our study, the same experienced readers read the FFDM and screen-film mammography screening examinations with no change in protocol or recall practice. The threshold for recall of suspicious findings was not lowered for FFDM; therefore, it is likely that the increased recall rate was due to increased detection of mammographic abnormalities. Whether this increased detection was due to the increased perception of subtle abnormalities or to the increased interpretation of perceived abnormalities as being suspicious is a topic for ongoing research.

In previous studies from the United States by both Lewin et al. [8, 9] and the DMIST group [11], the recall rates were much higher than the recall rates reported in the INBSP and other European studies [12]. This phenomenon has been previously described by Smith-Bindman et al. [21]; they reported that recall rates are twice as high in the United States as in the United Kingdom. Lewin et al. [9] found a significantly lower recall rate in the FFDM group than in the screen-film mammography group (11.8% vs 14.9%, respectively; p < 0.001, McNemar chi-square model). In the DMIST trial, the recall rate was 8.4% for both FFDM and screen-film mammography. The lower recall rate in our study and other European studies may be representative of inherent differences between the screening systems in Europe and the United States, as previously commented on by Skaane et al. [10]. The lower threshold for recall of subtle abnormalities in the United States is believed to reflect a difference in the medicolegal environment.

A potential criticism of our study is that because the women screened in 2006 and 2007 have not yet had their 2-year follow-up, early false-negative studies cannot be excluded. However, the main premise of our study is that the cancer detection rate is higher for digital mammography and although we cannot evaluate for false-negative studies, this is true of both digital and analog groups.

Some women (~ 25%) underwent two screening mammography examinations during the study period.
We do not think that this would have influenced the results of the study because women are removed from the screening population to a symptomatic service once they are diagnosed with cancer and were recalled for a specific abnormality only once. Assignment to digital or analog mammography was not influenced by the type of mammography examination previously performed and was based only on the time of check-in.

Another potential criticism is the possibility of bias introduction during randomization. However, patients were not questioned about menopausal status or hormone replacement therapy use before assignment and their breast density was not reviewed. We therefore have no reason to believe that women were preferentially assigned to one technique over the other. There were no differences between the two groups in terms of screening round or age distribution. Information regarding breast density, menopausal status, and other risk factors such as hormone replacement therapy use, parity, and age of menarche is not recorded by the INBSP. This is a limitation of our study, and subtle differences between the two groups cannot be definitively excluded. However, the large size of our study population should minimize any potential hidden bias introduced by small differences in these factors.

This study represents the largest review of the use of digital mammography in a population-based breast screening program to date. The results are very favorable for FFDM, which showed a cancer detection rate significantly higher than that for screen-film mammography (6.3 vs 5.2 per 1,000, p = 0.01). The benefit was apparent from the outset and was maintained in both the initial and subsequent screening groups. These findings support the previous findings of Skaane et al. [13]. Women younger than 50 years who have been shown to benefit from digital mammography in the DMIST trial were not included in our study. The results of this study indicate that the benefit of digital mammography can be extended to a broader group of women up to the age of 64 years. These findings further suggest that FFDM with soft-copy reading can be safely implemented in large-scale breast cancer screening programs.

Acknowledgments

We thank Albert Winston and Donal Kiernan for their contribution to data collection.

References

1. Berry DA, Cronin KA, Plevritis SK, et al. Effect of screening and adjuvant therapy on mortality from breast cancer. N Engl J Med 2005; 353:1784–1792
2. Tabár L, Fagerberg CJ, Gad A, et al. Reduction in mortality from breast cancer after mass screening with mammography: randomised trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare. Lancet 1985; 1:829–832
3. Nyström L, Andersson I, Bjurstam N, Frisell J, Nordenskjöld B, Rutqvist LE. Long-term effects of mammography screening: updated overview of the Swedish randomised trials. Lancet 2002; 359:909–919
4. Pisano ED, Yaffe MJ. Digital mammography. Radiology 2005; 234:353–362
5. Suryanarayanan S, Karellas A, Vedantham S, Ved H, Baker S, D’Orsi C. Flat-panel digital mammography system: contrast-detail comparison between screen-film radiographs and hard-copy images. Radiology 2002; 225:801–807
6. Fischer U, Baum F, Obenauer S, et al. Comparative study in patients with microcalcifications: full-field digital mammography vs screen-film mammography. Eur Radiol 2002; 12:2679–2683
7. Fischmann A, Siegmann K, Wersebe A, Claussen C, Muller-Schimpfle M. Comparison of full-field digital mammography and film-screen mammography: image quality and lesion detection. Br J Radiol 2005; 78:312–315
8. Lewin JM, Hendrick RE, D’Orsi CJ, et al. Comparison of full-field digital mammography with screen-film mammography for cancer detection: results of 4,945 paired examinations. Radiology 2001; 218:873–880
9. Lewin JM, D’Orsi CJ, Hendrick RE, et al. Clinical comparison of full-field digital mammography and screen-film mammography for detection of breast cancer. AJR 2002; 179:671–677
10. Skaane P, Young K, Skjennald A. Population-based mammography screening: comparison of screen-film and full-field digital mammography with soft-copy reading—Oslo I study. Radiology 2003; 229:877–884
11. Pisano ED, Gatsonis C, Hendrick E, et al.; DMIST Investigators Group. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005; 353:1773–1783 [Erratum in N Engl J Med 2006; 355:1840]
12. Pisano ED, Hendrick RE, Yaffe MJ, et al.; DMIST Investigators Group. Diagnostic accuracy of digital versus film mammography: exploratory analysis of selected population subgroups in DMIST. Radiology 2008; 246:376–383
13. Skaane P, Hofvind S, Skjennald A. Randomized trial of screen-film versus full-field digital mammography with soft-copy reading in population-based screening program: follow-up and final results of Oslo II study. Radiology 2007; 244:708–717
14. Skaane P, Skjennald A. Screen-film mammography versus full-field digital mammography with soft-copy reading: randomized trial in a population-based screening program—the Oslo II study. Radiology 2004; 232:197–204
15. Vigeland E, Klaasen H, Klingen TA, Hofvind S, Skaane P. Full-field digital mammography compared to screen film mammography in the prevalent round of a population-based screening programme: the Vestfold County Study. Eur Radiol 2008; 18:183–191
16. Del Turco MR, Mantellini P, Ciatto S, et al. Full-field digital versus screen-film mammography: comparative accuracy in concurrent screening cohorts. AJR 2007; 189:860–866
17. Heddson B, Ronnow K, Olsson M, Miller D. Digital versus screen-film mammography: a retrospective comparison in a population-based screening program. Eur J Radiol 2007; 64:419–425
18. American College of Radiology. Breast imaging reporting and data system (BI-RADS), 4th ed. Reston, VA: American College of Radiology, 2003
19. Shaw CM, Flanagan FL, Fenlon HM, McNicholas MM. Consensus review of discordant findings maximizes cancer detection rate in double-reader screening mammography: Irish National Breast Screening Program experience. Radiology 2009; 250:354–362
20. Otten JD, Karssemeijer N, Hendriks JH, et al. Effect of recall rate on earlier screen detection of breast cancers based on the Dutch performance indicators. J Natl Cancer Inst 2005; 97:748–754
21. Smith-Bindman R, Chu PW, Miglioretti DL, et al. Comparison of screening mammography in the United States and the United Kingdom. JAMA 2003; 290:2129–2137 [Erratum in JAMA 2004; 291:824]
