J Cancer Surviv DOI 10.1007/s11764-014-0383-1
The Concerns About Recurrence Questionnaire: validation of a brief measure of fear of cancer recurrence amongst Danish and Australian breast cancer survivors
Belinda Thewes, Robert Zachariae, Søren Christensen, Tine Nielsen, Phyllis Butow
Received: 4 November 2013 / Accepted: 30 June 2014
© Springer Science+Business Media New York 2014
Abstract
Purpose Fear of cancer recurrence (FCR) is prevalent amongst survivors, and breast cancer survivors are particularly vulnerable. Currently, there are few well-validated brief measures of FCR and none specific to breast cancer. This manuscript describes the development of a new measure of FCR for breast cancer survivors, the Concerns about Recurrence Questionnaire (CARQ), and reports its initial validation in Australian and Danish population-based samples of breast cancer survivors.
Methods Classical test theory (CTT) analyses explored scale reliability and validity; Rasch analyses explored model fit statistics, item bias (differential item functioning, DIF) and local dependency. Three-item, four-item and five-item versions were considered.
B. Thewes (*) · P. Butow
Centre for Medical Psychology and Evidence-Based Decision-Making, School of Psychology, Transient Building, University of Sydney, Sydney, NSW 2006, Australia
e-mail: [email protected]

R. Zachariae · S. Christensen
Unit for Psychooncology and Health Psychology, Department of Psychology and Behavioural Sciences, Aarhus University, Aarhus, Denmark

R. Zachariae · S. Christensen
Department of Oncology, Aarhus University Hospital, Aarhus, Denmark

T. Nielsen
Department of Psychology, University of Copenhagen, Copenhagen, Denmark

P. Butow
Psycho-Oncology Cooperative Research Group (PoCOG), School of Psychology, University of Sydney, Sydney, Australia
Results Two hundred eighteen Australian women aged 28–45 years diagnosed with early-stage breast cancer (stages 0–2) and 2001 Danish women diagnosed with breast cancer (stages 1–3) aged 26–70 completed the CARQ. Based on the results of both CTT and IRT analyses, the four-item English version of the scale performed best. Although the CTT analyses suggested that the CARQ-4 was reliable and valid in both samples, Rasch analyses identified item bias relative to age, and local dependence, both of which may be remedied by further scale development.
Conclusions The CARQ-4 English version is currently one of the most rigorously tested brief scales of FCR available.
Implications for Cancer Survivors The availability of more valid and reliable brief measures of FCR will help to promote research and screening of FCR amongst cancer survivors.
Keywords Cancer · Oncology · Fear of recurrence · Psychological assessment · Measurement scale

Currently, about one in every 20 adults has survived cancer, including nearly one fifth of all people over 65 [1]. The number of cancer survivors in the USA alone is currently estimated at 13.7 million and projected to reach over 20 million by 2022 [1]. In light of these increasing numbers, understanding and addressing the psychological as well as physical problems of cancer survivorship is becoming increasingly important. The need for help with fear of cancer recurrence (FCR) is the most commonly reported unmet need, with one quarter to one third of cancer survivors reporting moderate to high levels of unmet need for help with FCR [2–4]. Fear of cancer recurrence, which has been defined as ‘the fear that cancer could progress or return in another part of the body’ [5], can be a chronic problem for some survivors. Breast cancer survivors are particularly vulnerable to FCR [6].
Although the literature on FCR has been expanding in recent years [6], there are very few well-validated brief measures of FCR and none specific to breast cancer [7]. A recent systematic review of self-report measures of FCR by the present authors identified 20 multi-question scales and 7 single-question measures of FCR [7], and found that high-quality brief measures of FCR are still lacking. In 2002, the Medical Outcomes Trust (MOT) defined a set of attributes and review criteria for patient-reported outcome questionnaires [8], which are widely accepted and recommended for the assessment of measurement tools used in cancer [9]. The quality attributes specified by the MOT include definition of concept and measurement model, reliability, validity, responsiveness to change, interpretability, patient burden, alternate forms, and cultural and language adaptations. In the aforementioned review of FCR measures [7], instruments were evaluated against MOT attributes and assigned a quality rating. Four longer validated measures (>10 items) achieved low to moderate quality ratings ranging from 1 to 4.5 (mean=3.75, possible range of 0 to 7). However, scores for the brief measures (2–10 items) were low, ranging from 0 to 2.5 (mean=0.7). Of the brief FCR measures currently available, only one [10] has psychometric data available from multiple samples, and only one, the Assessment of Survivor Concerns (ASC) [11], has undergone thorough validation work. However, the ASC is not specific to fear of cancer recurrence and includes the assessment of health anxiety more broadly. While the Concerns about Recurrence Scale [5], another commonly used FCR scale, has established reliability and validity, at 30 items it is too long for use in a screening context. The Fear of Cancer Recurrence Inventory (FCRI) is another valid and reliable 42-item measure, which has been developed for a mixed cancer population [12].
Based on strong correlations between the nine-item FCRI severity subscale and total scores (r=0.8), the authors of this scale proposed the severity index as a potential screening tool for FCR in a mixed cancer sample. While the severity subscale of the FCRI may have promise as a screening instrument, there are currently few published data to support its use as a screening tool, and no validation data have been published on the English version of the scale. Thus, there are currently no well-validated brief measures in English which specifically focus on fear of cancer recurrence in breast cancer survivors.

Fear of cancer recurrence is an emerging area of interest in survivorship research, and developing better methods of screening for FCR that are suitable for use in clinical settings is of high importance to both clinicians and researchers, to help identify individuals who may benefit from emerging interventions for this common problem. Only one previous scale has a cut-off score for clinical levels of FCR, and there is very little information available to researchers to help determine clinically meaningful differences for intervention research. Given the prevalence of the problem in breast cancer survivors [6] and the lack of suitable existing instruments, there is a need for a psychometrically sound brief measure of FCR in breast cancer survivors, which is suitable for international use and appropriate for both research and clinical purposes [7].

The aims of the present study, therefore, were to describe the development of a new measure of fear of cancer recurrence for breast cancer survivors, the Concerns about Recurrence Questionnaire (CARQ), and to report on its initial validation in Australian and Danish samples. Most self-report scales in the field of psycho-oncology have been validated using exclusively classical test theory (CTT) methods. However, in the present study, we chose to supplement a CTT validation with Rasch analysis, in order to identify areas for possible scale improvement and to suggest item modifications that could yield incremental gains in validity.
Methods
Participants
Participants in this study were recruited as part of two existing studies of recurrence-free women previously diagnosed with early breast cancer. The first, focused specifically on FCR, was a cross-sectional study of Australian women (n=218) aged 45 years or less at diagnosis, and diagnosed at least 1 year previously (stages 0–2) [13]. The second was a sample of Danish women (n=2001) aged 26–70 years at time of surgery (stages 1–3) included in a 7–9 year follow-up of a previously established nationwide cohort of women treated for primary breast cancer (for further details see [14]). An additional sample of 93 out of 129 Danish women aged 35–89 years with stages 1–2 breast cancer attending routine surgical review clinics was recruited in order to examine test–retest reliability.
Procedure
Ethical approvals were obtained from the appropriate institutional ethics committees. In the Australian sample, consecutive patients seen at one of seven metropolitan cancer centres and members of one of two breast cancer consumer groups received a written invitation to participate in the study. Interested and eligible participants contacted the researchers and completed a web-based questionnaire. In the Danish sample, the eligible women had been informed about the study at surgical departments. Information on eligibility, histopathology, treatment and disease status was obtained directly from surgical departments, and/or from the Danish Breast Cancer
Cooperative Group (DBCG) and the Danish Cancer Registry. In addition, women reporting a recurrence or secondary cancer were excluded from the analyses. Demographic data and socioeconomic variables were collected from six nationwide Danish registries through a linkage serviced by Statistics Denmark [14]. Additional data were collected by a mail-out questionnaire 7–9 years post-surgery. In order to explore test–retest reliability, participants were asked to complete the CARQ at surgical clinic reviews and a second time by mail 14 days later.

Development and scoring of the CARQ
The original CARQ consisted of five items: three were adapted from an existing measure of FCR [15] and assessed the frequency, intrusiveness and degree of distress caused by FCR. Adapted items were made specific to breast cancer and the response format was modified to make it consistent across items. Two supplementary items assessed perceived risk of recurrence and perceived risk of recurrence relative to others. Ten healthy Australian women were asked to complete the questionnaire for initial feedback on interpretability and response burden, after which further minor revisions were made to the wording (see Appendix 1 for the final version in English). The first three items are rated on an 11-point Likert scale ranging from 0 ‘not at all’ to 10 ‘a great deal’. In item 4, respondents are asked to quantify perceived risk of recurrence as a number ranging from 0 to 100 %, and in item 5, they are asked to judge their risk relative to others using a multiple choice format with five response options ranging from ‘considerably less than other women in my situation’ to ‘considerably greater than other women in my situation’. Scores on item 4 are transformed to an 11-point scale ranging from 0 to 10 for consistency of scoring with items 1–3, whilst item 5 is scored on a scale ranging from 1 to 5. A total score is obtained by summing individual item scores.
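The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring code: the function names are hypothetical, and because the text does not state exactly how the item 4 percentage is mapped onto the 0–10 scale, dividing by 10 and rounding to the nearest integer is an assumption.

```python
from typing import Dict, Sequence


def percent_to_0_10(percent: float) -> int:
    """Map a 0-100 % risk estimate onto 0-10.

    ASSUMPTION: divide by 10 and round half up; the source only says the
    item is "transformed to an 11-point scale ranging from 0 to 10".
    """
    if not 0 <= percent <= 100:
        raise ValueError("item 4 must lie between 0 and 100 %")
    return int(percent / 10 + 0.5)


def carq_totals(items_1_to_3: Sequence[int], item4_percent: float, item5: int) -> Dict[str, int]:
    """Return CARQ-3, CARQ-4 and CARQ-5 totals by summing item scores."""
    if len(items_1_to_3) != 3 or any(not 0 <= s <= 10 for s in items_1_to_3):
        raise ValueError("items 1-3 must be three ratings between 0 and 10")
    if not 1 <= item5 <= 5:
        raise ValueError("item 5 is scored from 1 to 5")
    carq3 = sum(items_1_to_3)                         # items 1-3, each 0-10
    carq4 = carq3 + percent_to_0_10(item4_percent)    # adds rescaled item 4
    carq5 = carq4 + item5                             # adds item 5 (1-5)
    return {"CARQ-3": carq3, "CARQ-4": carq4, "CARQ-5": carq5}


# Hypothetical respondent: ratings 5, 3 and 7, a 40 % perceived risk, option 3 on item 5.
example = carq_totals([5, 3, 7], 40, 3)
```

Because item 5 is scored from 1 to 5, the lowest CARQ-5 total this sketch can return is 1, which matches the possible range of 1–45 given for the CARQ-5 in Table 2.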
Possible ranges for total scores are 0–30 for the CARQ-3 (items 1–3), 0–40 for the CARQ-4 (items 1–4) and 1–45 for the CARQ-5 (items 1–5; the minimum is 1 because item 5 is scored from 1 to 5). The CARQ-5 was translated into Danish using a forward–backward translation method by two independent bilingual translators. Discrepancies in translation were resolved through consensus.

Additional measures for validity testing collected in the Australian sample
All additional measures used for validity testing have established reliability and validity. Fear of cancer recurrence was additionally assessed using the 42-item Fear of Cancer Recurrence Inventory (FCRI) as a gold standard measure of FCR. The FCRI is a comprehensive, multi-dimensional measure [12] which includes seven subscales: triggers, severity,
psychological distress, functional impairment, insight, reassurance and coping strategies. Higher scores indicate greater FCR. A score of 13 or greater on the severity subscale (range=0–36) has been recommended as a cut-off score for clinical FCR based on a receiver operating characteristic (ROC) analysis using clinical interviews as a gold standard [16]. Generalised anxiety (GAD) was assessed with the Generalised Anxiety Disorders Questionnaire 4th Edition (GAD-Q-IV) [17]. Depression, anxiety and stress were measured with the Depression Anxiety Stress Scale-Short form (DASS-21) [18]. Health anxiety was assessed using the seven-item Whiteley Index-Short Form (WI-7) [19]. As we aimed to assess general health anxiety, items 1 and 5 were reworded for our purpose to specify fears about other illnesses by adding the words ‘other than cancer’ (for example ‘Do you often worry about the possibility that you have got a serious illness other than cancer?’).
Statistical analysis
Classical test theory (CTT) analyses
All CTT analyses were conducted with SPSS V20 and LISREL V4. Reliability of the five-item CARQ (CARQ-5) was assessed in both the Australian and Danish samples using Cronbach’s alpha and inter-item and item-total correlations. Further tests of reliability and screening performance were conducted on the three-item (CARQ-3) and four-item (CARQ-4) versions only, due to a lack of homogeneity identified between item 5 and other items of the scale in preliminary stages of the analysis. For the analysis of test–retest reliability, correlations between two administrations of the CARQ-3 and CARQ-4, 14 days apart, were calculated. In order to identify the underlying factor structure of the CARQ and latent construct(s), exploratory factor analyses (EFA) using the maximum likelihood extraction method were conducted for the three-, four- and five-item English versions using the Australian data. Confirmatory factor analyses (CFA) of all versions of the scale were then conducted on the Danish data to examine the extent to which the factor structure identified by EFA was replicated in the Danish version. Concurrent validity of the CARQ with the Fear of Cancer Recurrence Inventory (FCRI) was tested in the Australian sample using Spearman’s rho correlations. Convergent validity was also examined with the other psychological measures in the Australian sample using the same method. Moderate to high correlations between the CARQ and other measures of psychological morbidity were expected. To gather preliminary data on the potential of the CARQ-3 and CARQ-4 as screening instruments for clinical levels of FCR, a Receiver Operator
Characteristic (ROC) analysis was performed to find optimal cut-off scores to detect clinical FCR according to the FCRI Severity Index. Likelihood ratios (sensitivity/(1−specificity)) were calculated for all possible cut-off scores. Sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) are also reported for the optimal CARQ-3 and CARQ-4 cut-offs. Readability statistics were calculated for the instructions and item wording of the English version of the scale using Flesch reading ease scores and Flesch–Kincaid grade level scores [20].

Item response theory (IRT) analyses
The item analyses were conducted within the family of Rasch models related to item response theory (IRT) [21, 22]. These models all include the same five fundamental assumptions about the measurement scale in question [23, 24]:
1. Uni-dimensionality (the set of items measure a single latent construct).
2. Monotonicity (the probability of a “high” item response increases monotonically as a function of the latent variable).
3. Homogeneity (the ordering of items with regard to difficulty is the same for all respondents with the same level on the latent variable).
4. Local independence (LD) (items are conditionally independent given the latent variable, that is, the associations between items are fully explained by the latent variable).
5. Absence of differential item functioning (DIF) (items and exogenous variables are conditionally independent given the latent variable, that is, the associations between items and exogenous variables are fully explained by the latent variable).
The statistical software DIGRAM [25] was used for all Rasch analyses. The strategy of analysis for both samples consisted of the following steps: First, fit of the data to the pure Rasch model (RM) was tested.
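For orientation, the pure Rasch model for polytomous items is commonly written in the partial credit form below. This is a sketch of a standard textbook parametrisation and not necessarily the exact one implemented in DIGRAM; the symbols $\theta_v$ (person parameter), $\tau_{ik}$ (threshold $k$ of item $i$) and $m_i$ (highest response category of item $i$) are notation assumed here rather than taken from the paper.

```latex
P(X_{vi} = x \mid \theta_v)
  = \frac{\exp\!\left(x\,\theta_v - \sum_{k=1}^{x} \tau_{ik}\right)}
         {\sum_{h=0}^{m_i} \exp\!\left(h\,\theta_v - \sum_{k=1}^{h} \tau_{ik}\right)},
  \qquad x = 0, 1, \dots, m_i ,
```

where empty sums are taken as zero. Under this model the raw score $R_v = \sum_i X_{vi}$ is a sufficient statistic for $\theta_v$, which is the sufficiency property referred to in this section.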
Second, when fit to the RM could not be established, graphical loglinear Rasch models (GLLRMs) were fitted to test whether the data fitted a more complicated model with uniform DIF and/or uniform LD. Analyses by GLLRMs thus provide a catalogue of the departures from the Rasch model, which in turn can be used to suggest improvements that may lead to a measurement scale with better psychometric qualities by pointing to items that might be eliminated, replaced or modified, as suggested in Nielsen and Kreiner’s scale improvement strategy [26]. When the departures from the RM only consist of uniform LD and/or uniform DIF, statistical sufficiency of the raw score is retained, and measurement may still be regarded as essentially objective and valid. However, uniform LD may still lead to reduced reliability, and uniform DIF will affect the validity of the scale in question and limit its use, if scores are not equated across groups showing DIF [26]. In case of non-fit to the RM and subsequent fit to a GLLRM, reliability was calculated using a Monte Carlo version of the estimation method proposed by Hamon and Mesbah [27], as the usual estimation of Cronbach’s alpha tends to overestimate the lower bound of reliability in these cases.

Overall fit of the models was tested by Andersen’s [28] conditional likelihood ratio (CLR) tests. Fit of individual items was tested by comparing observed and expected correlations between item scores and the rest-score across all items except the one in question. In the GLLRMs, LD and DIF were tested using conditional tests of independence, which also provided partial gamma correlation coefficients giving the strength of the association between items in the case of LD and between items and exogenous variables in the case of DIF [29]. Specifically, DIF was tested in relation to all the exogenous variables (covariates) described below. In addition, a critical level of 0.05 was used for all tests, except the overall test of fit where 0.01 was used, and the Benjamini–Hochberg procedure was applied to correct for false detection due to multiple testing when appropriate [30]. See Kreiner and Christensen [29, 31, 32] for additional statistical and technical details of the analyses conducted.

Items were recoded identically for all Rasch analyses. Items 1–3 were recoded from the original 11-category variables as follows: 0=0–1, 1=2–3, 2=4–6, 3=7–8, 4=9–10. Item 4 was recoded as follows: 0=0–19 %, 1=20–39 %, 2=40–59 %, 3=60–79 %, 4=80–100 %. Item 5 was retained in its original five-category version. All items are directed so that the higher the value, the higher the degree of FCR. Differing covariates were available for analysis of DIF in the Australian (AU) and Danish (DK) samples.
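The item recoding described above can be expressed compactly as a lookup on category boundaries. The following is a minimal sketch only; the function names are hypothetical, and the handling of non-integer percentages on item 4 (values between 19 and 20 fall into category 0) is an assumption, since the text lists integer bins.

```python
from bisect import bisect_right

# Lowest raw value belonging to recoded categories 1, 2, 3 and 4.
ITEM_1_TO_3_LOWER_BOUNDS = [2, 4, 7, 9]      # 0=0-1, 1=2-3, 2=4-6, 3=7-8, 4=9-10
ITEM_4_LOWER_BOUNDS = [20, 40, 60, 80]       # 0=0-19 %, ..., 4=80-100 %


def recode_item_1_to_3(score: int) -> int:
    """Collapse an 11-point rating (0-10) into five categories (0-4)."""
    if not 0 <= score <= 10:
        raise ValueError("items 1-3 are rated from 0 to 10")
    return bisect_right(ITEM_1_TO_3_LOWER_BOUNDS, score)


def recode_item_4(percent: float) -> int:
    """Collapse a 0-100 % risk estimate into five categories (0-4)."""
    if not 0 <= percent <= 100:
        raise ValueError("item 4 must lie between 0 and 100 %")
    return bisect_right(ITEM_4_LOWER_BOUNDS, percent)


# Recoded values for every possible raw rating on items 1-3.
recoded_ratings = [recode_item_1_to_3(s) for s in range(11)]
```

Item 5 needs no recoding in this sketch because it already has five ordered categories.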
In the AU sample, three covariates were included in the DIF analysis: time since diagnosis (0=up to 23 months, 1=24–47 months, 2=48–95 months, 3=96+ months), age at diagnosis (median cut; 0=28–39, 1=40–45) and educational level (0=year 10, 1=year 12, 2=Diploma, 3=Bachelor degree, 4=Higher degree). In the DK sample, for women with complete item and covariate responses for items 1–3 (N=1948), one covariate was included in the analyses: age at diagnosis (0=lowest–44, 1=45–50, 2=51–60, 3=61–highest). Based on the initial findings of differential item functioning relative to age in the separate analyses of the AU and DK samples, and in order to facilitate analyses that would provide more definite evidence of the possible biasing effect of age on FCR, a reduced sample of Danish women with a comparable age range of 28–45 years was selected. For the analyses of the combined Australian and Danish (AU-DK) sample (N=532 with complete item and covariate responses), two covariates could be included: country (0=AU, 1=DK) and age at diagnosis (0=28–39, 1=40–45).
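The screening statistics named in the CTT analyses above (sensitivity, specificity, PPV, NPV and the likelihood ratio sensitivity/(1−specificity)) can be computed for a single candidate cut-off as sketched below. This is an illustration under stated assumptions, not the authors' SPSS procedure: the function name and the example data are hypothetical, and treating a score at or above the cut-off as a positive screen is an assumption.

```python
from typing import Dict, Optional, Sequence


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def screening_stats(scores: Sequence[float], is_case: Sequence[bool], cutoff: float) -> Dict[str, Optional[float]]:
    """Screening accuracy of `scores` against a binary reference standard.

    ASSUMPTION: a score >= cutoff counts as a positive screen.
    """
    tp = fp = tn = fn = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff
        if positive and case:
            tp += 1
        elif positive:
            fp += 1
        elif case:
            fn += 1
        else:
            tn += 1
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    likelihood_ratio = None
    if sensitivity is not None and specificity is not None:
        # Positive likelihood ratio: sensitivity / (1 - specificity).
        likelihood_ratio = _ratio(sensitivity, 1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "likelihood_ratio": likelihood_ratio,
    }


# Hypothetical data: eight total scores and whether each respondent is a
# clinical case according to the reference standard.
example_scores = [2, 5, 8, 12, 15, 20, 25, 30]
example_cases = [False, False, False, True, False, True, True, True]
example_stats = screening_stats(example_scores, example_cases, cutoff=12)
```

Calling the function once per possible cut-off and comparing the resulting rows is one simple way to tabulate candidate cut-offs; how the optimal cut-off was chosen in the study is not specified beyond the ROC analysis.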
Results
CARQ scores
There were very few missing data for the primary outcome measures in both samples, ranging from 0.5 % (CARQ-3) to 1.8 % (CARQ-5) in Australian women and from 0.5 % (CARQ-3) to 4 % (CARQ-5) in Danish women. The Flesch reading ease scores of the English version of the CARQ-5 and CARQ-4 were 63.8 and 65.7, respectively, and the Flesch–Kincaid grade level scores were 7.4 (corresponding to a 12- to 13-year-old reading level) and 6.8 (corresponding to an 11- to 12-year-old reading level). A summary of the descriptive data for the CARQ-3, CARQ-4 and CARQ-5 total scores from women in both samples is shown in Table 2.

Clinical and demographic characteristics of the sample
Due to ethical requirements, an opt-in method was used in the Australian study, and 218 of the 702 eligible women returned a completed survey (response rate 31 %). Amongst the Danish sample, 2,097 of 2,316 eligible women approached returned a completed survey (response rate 91 %). Ninety-six reported a recurrence or a secondary cancer, leaving 2001 eligible for analysis. Of these, a total of 1,885 women completed all five items of the CARQ. A summary of the clinical and demographic characteristics of both samples is shown in Table 1. Australian participants were on average 39 years at the time of diagnosis (range 28–45 years) and were approximately 4 years post-diagnosis (range=1–20 years). In the Danish sample, women were on average 54 years (range=26–70) and were on average 8 years post-diagnosis (range=7–9 years).

Table 1 Demographic and clinical characteristics of the sample

| Characteristic | Australian sample (n=218): Mean (SD) | Australian sample: Range | Danish sample (n=1,885): Mean (SD) | Danish sample: Range |
|---|---|---|---|---|
| Age^a (years) | 39.3 (4.6) | 28–45 | 54.3 | 26–70 |
| Time since diagnosis (years) | 4.2 (2.7) | 1–20 | 8.4 | 7–9 |

| Characteristic | Australian sample: Number | Australian sample: Percentage | Danish sample: Number | Danish sample: Percentage |
|---|---|---|---|---|
| Marital status at diagnosis | | | | |
| Married/de facto | 159 | 72.9 | 1,474 | 78.2 |
| Never married | 37 | 17.0 | 72 | 3.8 |
| Divorced/separated | 22 | 10.1 | 211 | 11.2 |
| Widow | | | 105 | 5.6 |
| Missing | | | 23 | 1.2 |
| Education^b | | | | |
| Secondary | 62 | 28.4 | 1,286 | 68.2 |
| Tertiary | 113 | 51.8 | 484 | 25.7 |
| Post-graduate | 30 | 13.8 | 95 | 5.0 |
| Other | 13 | 6.0 | – | – |
| Missing | | | 20 | 1.1 |
| Number of children | | | | |
| No children | 73 | 33.5 | 209 | 11.1 |
| 1 or more children | 145 | 66.5 | 1,676 | 88.9 |
| Nodal status | | | | |
| Positive | 132 | 60.5 | 874 | 46.4 |
| Negative | 85 | 39.0 | 1,007 | 53.4 |
| Don’t know/not sure/missing | 1 | 0.5 | 4 | 0.2 |
| Adjuvant treatment | | | | |
| Chemotherapy | 168 | 77.1 | 813 | 43.7 |
| Radiotherapy | 171 | 78.4 | 1,493 | 79.4 |
| Hormonal | 150 | 68.8 | 1,215 | 65.3 |
| Herceptin | 21 | 9.6 | – | – |
| Oophorectomy | 10 | 4.6 | – | – |

a Age at diagnosis for Australian women and age at time of surgery for Danish women
b Highest level of education, data unavailable or not recorded on registries

Reliability
Table 3 reports inter-item correlation and item-total correlations of all versions of the CARQ. The three- and four-item versions of the CARQ were found to be relatively homogeneous in both
Table 2 Descriptive statistics for CARQ-3, CARQ-4 and CARQ-5 total scores

| | CARQ-3: Australian (n=217) | CARQ-3: Danish (n=1,948) | CARQ-4: Australian (n=215) | CARQ-4: Danish (n=1,892) | CARQ-5: Australian (n=214) | CARQ-5: Danish (n=1,885) |
|---|---|---|---|---|---|---|
| Mean | 13.0 | 5.3 | 16.6 | 7.0 | 19.5 | 9.6 |
| SD | 7.7 | 6.5 | 9.5 | 7.8 | 9.8 | 8.1 |
| Range | 0–30 | 0–30 | 0–40 | 0–40 | 1–43 | 1–43 |
| Possible range | 0–30 | 0–30 | 0–40 | 0–40 | 1–45 | 1–45 |
| Median | 12.0 | 3.0 | 17.0 | 4.0 | 20.0 | 7.0 |
samples, with satisfactory inter-item correlations (r=0.42–0.83 p