Research article
Performance of three cognitive screening tools in a sample of older New Zealanders
Cheung G.1, Clugston A.2, Croucher M.3, Malone D.4, Mau E.5, Sims A.6 and Gee S.3
1 Dr Gary Cheung, Department of Psychological Medicine, The University of Auckland, Private Bag 92019, Auckland Mail Centre, Auckland 1142, New Zealand
2 Dr April Clugston, Auckland District Health Board, Private Bag 92189, Auckland Mail Centre, Auckland 1142, New Zealand
3 Dr Matthew Croucher and Dr Susan Gee, Princess Margaret Hospital, PO Box 800, Cashmere, Christchurch, New Zealand
4 Dr Darren Malone, Rotorua Hospital, Private Bag 3023, Rotorua Mail Centre, Rotorua 3046, New Zealand
5 Dr Etuini Mau, Waikato Hospital, Private Bag 3200, Hamilton 3240, New Zealand
6 Dr Adam Sims, Wellington Hospital, Private Bag 7902, Wellington South, New Zealand
CORRESPONDING AUTHOR
Dr Gary Cheung. Email: [email protected]; Telephone: +64 9 373 7599; Fax: +64 9 373 7013; Postal address: Private Bag 92019, Auckland Mail Centre, Auckland 1142, New Zealand.
ABSTRACT
Background: With the ubiquitous Mini Mental State Examination now under copyright, attention is turning to alternative cognitive screening tests. The aim of the present study was to investigate three common cognitive screening tools: the Montreal Cognitive Assessment (MoCA), the Rowland Universal Dementia Assessment Scale (RUDAS), and the recently revised Addenbrooke’s Cognitive Examination III (ACE-III).
Method: The ACE-III, MoCA and RUDAS were administered in random order to a sample of 37 participants with diagnosed mild dementia and 47 comparison participants without dementia. The diagnostic accuracy of the three tests was assessed.
Results: All three tests showed good overall accuracy as assessed by the area under the ROC curve: 0.89 (95% CI = 0.80 - 0.95) for the ACE-III, 0.84 (0.75 - 0.91) for the MoCA, and 0.86 (0.77 - 0.93) for the RUDAS. The three tests were strongly correlated: r(84) = 0.85 (0.78 - 0.90) between the ACE-III and MoCA, 0.70 (0.57 - 0.80) between the ACE-III and RUDAS, and 0.65 (0.50 - 0.76) between the MoCA and RUDAS. The data-derived optimal cut-points were lower than the published recommendations for the ACE-III (optimal cut-point ≤76, sensitivity = 81.1%, specificity = 85.1%) and the MoCA (≤20, sensitivity = 78.4%, specificity = 83.0%), but similar for the RUDAS (≤22, sensitivity = 78.4%, specificity = 85.1%).
Conclusions: All three tools discriminated well overall between cases of mild dementia and controls.
To inform interpretation of these tests in clinical settings, it would be useful for future research to address more inclusive and potentially age-stratified local norms. (Word count = 243)
KEYWORDS Dementia, cognitive assessment, diagnostic accuracy, ACE-III, MoCA, RUDAS
RUNNING TITLE Performance of three cognitive screen tools
Introduction
The assessment of cognitive function is one of the most important assessments made by clinicians in old age psychiatry and geriatric medicine, and provides a key to detecting dementia and delirium (Ballard et al., 2013). The Mini Mental State Examination (MMSE; Folstein et al., 1975) became the de facto gold standard for cognitive screening internationally for over thirty years after its development (Newman and Feldman, 2011; Strauss et al., 2012). However, the decision to enforce copyright on the MMSE and the imposition of a US$1.36 per copy charge (PAR, 2014) have created a significant change in the landscape. Given this financial impetus, increased attention is now being focused on viable alternatives to the MMSE for routine use.
A number of recent guidelines have provided guidance on alternatives to the MMSE. A UK expert group endorsed by the Department of Health and the Alzheimer’s Society has recommended the Addenbrooke’s Cognitive Examination III (ACE-III) and the Montreal Cognitive Assessment (MoCA) for memory clinic and specialist outpatient assessments (Ballard et al., 2013). In Australia, the locally developed Rowland Universal Dementia Assessment Scale (RUDAS) has also been recommended by an expert clinical reference group for aged care assessment (Sansoni et al., 2010).
The MoCA was developed as a cognitive screening tool that could be sensitive in detecting mild cognitive impairment as well as dementia (Nasreddine et al., 2005). It is well validated in a range of dementing illnesses and has high sensitivity and specificity for detecting mild cognitive impairment (Nasreddine et al., 2005; Smith et al., 2007; Dalrymple-Alford et al., 2010). A recent study suggests that the MoCA has advantages over the MMSE for screening and monitoring mild cognitive impairment and Alzheimer’s disease in clinical settings (Freitas et al., 2013).
The RUDAS developers aimed to provide a brief cognitive screening tool that could easily be translated into other languages and be fair across diverse cultural backgrounds (Storey et al., 2004). The RUDAS correlates well with the MMSE and is as accurate as the MMSE in predicting cognitive impairment against the DSM-IV-TR criteria. Performance on the RUDAS appears to be less influenced by language and education than the MMSE (Pang et al., 2006).
The Addenbrooke’s Cognitive Examination (ACE) and its later version, the Addenbrooke’s Cognitive Examination-Revised (ACE-R), were developed to provide a more comprehensive bedside cognitive assessment than the MMSE that was sensitive to mild dementia and could differentiate frontotemporal dementia (FTD) from Alzheimer’s disease (Pigliautile et al., 2011). A number of studies have concluded that the ACE-R has good validity and reliability as a screening tool for dementia (Mioshi et al., 2006; Larner, 2007; Terpening et al., 2011). The MMSE is embedded in the ACE-R and, because of copyright issues, the developers have advised clinicians to start using a new version, the ACE-III. The validity of the revised subscales of the new ACE-III and its correlation with the ACE-R have been assessed in participants with frontotemporal dementia and Alzheimer’s disease (Hsieh et al., 2013); however, as the authors note, “the extent to which the ACE-III compares with other cognitive screening tools such as the Rowland Universal Dementia Assessment Scale or the Montreal Cognitive Assessment is an avenue to pursue in the future” (Hsieh et al., 2013, p.249).
The present study aims to investigate the diagnostic accuracy and generalizability of the recommended cut-points for three leading alternative cognitive screening tools (ACE-III, MoCA, and RUDAS) in a New Zealand secondary care population.
ACE-III and MoCA were chosen for this study because the New Zealand Framework for Dementia Care (Ministry of Health, 2013) puts forward these two tools as good practice examples for clinical cognitive assessment. The RUDAS was included because of the relevance of a culturally fair cognitive
assessment tool given an increasing population of culturally and linguistically diverse older people in New Zealand.
Method
The present study assesses the ability of the three cognitive screening tests (ACE-III, MoCA, and RUDAS) to differentiate between individuals with a known diagnosis of mild dementia and a group known not to have dementia. This was a cross-sectional multicentre study which took place between November 2013 and February 2014 in five localities of New Zealand. In each locality there was an old age psychiatrist supervisor and an undergraduate medical student. The localities were Auckland, Christchurch, Hamilton, Rotorua and Wellington.
ETHICAL APPROVAL AND CONSENT
Approval for this project was given by the Ministry of Health’s Health and Disability Ethics Committee. Written consent was obtained from control group participants. For the mild dementia group, the next of kin provided written consent and assent was sought from the participant.
PARTICIPANTS
Mild dementia group: The mild dementia group consisted of community-dwelling participants (65 years or older) with an established diagnosis of dementia made by a geriatrician or old age psychiatrist. “Mild” dementia in this study was defined by an MMSE score of 20 or above in conjunction with the diagnosis. No upper limit of MMSE score was set for the mild dementia group.
A total of 37 participants with mild dementia were identified across the five localities from memory clinic databases, if available, or old age psychiatry service databases. The subtypes of dementia represented in the mild dementia group included Alzheimer’s disease (n=10),
vascular dementia (n=3), frontotemporal dementia (n=2), Lewy body dementia (n=1), mixed dementia (n=5) and unspecified (n=16). Exclusion criteria were: inability to complete the tasks due to visual, hearing or physical impairment; current delirium; current major mental illness; current known alcohol or drug abuse or dependence; a Geriatric Depression Scale (GDS) score of 10 or more; not having English as a first or bilingual language; and living in residential care.
Control group: The control group consisted of rehabilitation ward and psychiatric day hospital patients (aged 65 or older) specifically nominated as cognitively intact by the specialist responsible for their care (geriatricians/old age psychiatrists). An additional check was conducted to ensure that the control participants had an MMSE score of 24 or above. A total of 47 participants were recruited across the five localities. The rehabilitation ward patients were recruited just prior to their discharge to their home (not a residential care facility) in the community. Exclusion criteria were the same as those for participants with mild dementia.
INSTRUMENTS
Addenbrooke’s Cognitive Examination III (ACE-III): Like its predecessor, the well-validated ACE-R, the ACE-III covers five cognitive domains for which individual subscale scores can be derived. The memory and fluency domains are unchanged. The attention, language, and visuospatial sections have been modified and no longer contain items from the Mini Mental State Examination. The original Australian validation study, based on a total of 86 participants (12 with behavioural variant frontotemporal dementia, mean age 64.7; 21 with primary progressive aphasia, mean age 64.7; 28 with Alzheimer’s disease, mean age 69.9; and 25 controls, mean age 66.1), provided evidence of concurrent validity between the revised subscales and relevant neuropsychological tests and a very strong association with the ACE-R (Hsieh et al., 2013). The ACE-III takes approximately 20 minutes to complete. It is scored
out of a maximum of 100 and retains the recommended cut-points of 88 / 89 (sensitivity = 100%; specificity = 96%) and 82 / 83 (sensitivity = 93%; specificity = 100%). The ACE-III is freely available from www.neura.edu.au/frontier/research/test-downoads/ . The New Zealand version was used.
The Montreal Cognitive Assessment (MoCA): The MoCA is a one-page, 10-minute cognitive screening tool. It covers multiple cognitive domains to provide a single score with a maximum of 30. The validation study compared healthy volunteers recruited from the community (n = 90, mean age = 72.8) with groups with mild cognitive impairment (MCI; n = 94, mean age = 75.2) and Alzheimer’s disease (n = 93, mean age = 76.7) recruited from memory clinics. The developers add 1 point to the score of individuals with 12 or fewer years of education to correct for education effects. There was evidence of test-retest reliability from a subsample of participants. The MoCA differentiated well between the control group and both the MCI and Alzheimer’s groups. The recommended cut-point is 25 / 26 (sensitivity for dementia = 100%; sensitivity for MCI = 90%; specificity = 87%). The MoCA is freely available from http://www.mocatest.org . The correction for education was used in the present study.
The Rowland Universal Dementia Assessment Scale (RUDAS): The RUDAS is a 6-item cognitive screening assessment encompassing multiple cognitive domains, taking around 10 minutes to complete, which gives a single score with a maximum of 30. The validation study (Storey et al., 2004) involved 90 community-dwelling older people referred to an aged-care service, stratified by age and cognitive status (45 participants with dementia, mean age = 81.4; 45 control participants, mean age = 78.1). It reported high inter-rater and test-retest reliability, and excellent diagnostic accuracy (AUC = 0.95). The RUDAS has a recommended cut-point of 22 / 23 (sensitivity = 89%; specificity = 98%). The test is freely available from
https://fightdementia.org.au/sites/default/files/20110311_2011RUDASAdminScoringGuide.pdf
DATA COLLECTION
We collected the following demographic data: age, date of birth, gender, ethnicity, education level (years), and subtype of dementia (if applicable and known). Two screening measures for exclusion criteria were used at the beginning of the data collection session: the 15-item Geriatric Depression Scale (GDS; Sheikh and Yesavage, 1986) and the Mini-Mental State Examination (MMSE; Folstein et al., 1975). The three cognitive assessment measures (ACE-III, MoCA and RUDAS) were administered face-to-face. The order of administration of the cognitive screening tools was randomised. Where an identical item was repeated in more than one tool, the participant was asked to complete the item only once and the score was carried across to the other measures, unless the item was used as a distraction task. Data were gathered by medical student researchers. These students underwent local training in screen administration and attended group video-conference seminars.
STATISTICAL ANALYSIS
The Statistical Package for the Social Sciences (SPSS) Versions 20 and 21 and MedCalc 14.8.1 were used for data analysis. The normality of the distribution of the ACE-III, MoCA and RUDAS scores for the mild dementia and control groups was assessed by the Shapiro-Wilk test, with no significant deviations from normality indicated. To assess the equivalence of the descriptive statistics between the mild dementia group and the controls, independent t-tests (2-sided) were used for continuous variables and Chi-square tests (2-sided) for discrete variables. Fisher’s exact test was used for discrete variables when cells contained fewer than 5 cases. The ethnic background of the groups was compared for European versus non-European (collapsing across ethnicities).
A standard multiple regression was performed for each of the ACE-III, MoCA and RUDAS, with test score as the criterion and group (mild dementia, control), education and age as predictor variables. T-tests were used to compare the performance of the control and mild dementia groups on each of the five subscales of the ACE-III, with the appropriate Bonferroni correction (p set at 0.01).
To assess concurrent validity, Pearson correlations between the three cognitive tests were calculated, with appropriate Bonferroni correction for significance (p set at 0.017). Standard scores (z-scores) were used to allow for the difference in the scoring systems of the tests. Given the attenuation of validity coefficients due to unreliability inherent in the scales, a correlation coefficient of 0.6 or more was judged as indicating an extremely strong association (McDowell, 2006).
The overall diagnostic accuracy of the tests was assessed by the area under the receiver operating characteristic curve (AUC). In the present study the AUC represents the probability that a randomly selected individual from the mild dementia group will have a lower cognitive test result than a randomly selected individual from the control group. An AUC between 0.9 and 1.0 was judged as indicating ‘excellent’ accuracy, 0.8 to 0.9 as ‘good’, 0.7 to 0.8 as ‘not good’, and 0.6 to 0.7 as ‘worthless’ (Zhu, 2010). The statistical method of DeLong et al. (1988) was used to test whether there were any significant differences (z statistics) between the AUCs.
Diagnostic accuracy for clinical diagnosis using cut-points was assessed by sensitivity, specificity, and positive and negative likelihood ratios (LRs). As a rule of thumb, sensitivity or specificity values of 0.8 or more were considered good and 0.95 or more very good. A positive likelihood ratio of greater than 10 or a negative likelihood ratio of less than 0.1 was considered a large change in probability, 5 to 10 or 0.1 to 0.2 moderate, 2 to 5 or 0.2 to 0.5
was considered small (but sometimes important), and 1 to 2 or 0.5 to 1 small and rarely important (Furukawa et al., 2008). Positive predictive value (PPV) and negative predictive value (NPV) are not reported for this case-control study, as these values depend on the relative size of the groups and would not be comparable across settings with a naturally occurring prevalence of dementia.
SAMPLE SIZE
The initial target sample size could not be met, and a revised minimum sample size of 80 was set. With power set at 0.8 and the significance level at 0.05, a sample size of 80 is adequate to be confident that a ‘good’ AUC of 0.8 is greater than chance (N ≥ 27 using the observed ratio between group sizes; MedCalc), that a correlation between the tests of 0.4 (which McDowell, 2006, states as the minimum threshold of commonly observed validity coefficients) is greater than zero (N ≥ 46; MedCalc), and that a multiple regression with three predictors can detect a medium effect size (N ≥ 80; Field, 2009). However, the small sample size means that qualitative descriptions of findings should be interpreted in conjunction with the corresponding confidence intervals.
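The interpretation of the AUC given in the statistical analysis can be illustrated with a minimal Python sketch. The scores below are hypothetical and are not study data, and the function name is chosen here for illustration only. The function counts, over all case-control pairs, how often the case scores lower than the control (ties count as half), which equals the area under the empirical ROC curve.

```python
def auc_cases_lower(case_scores, control_scores):
    """Proportion of case-control pairs in which the case scores lower.

    Ties contribute 0.5, so the result equals the area under the
    empirical ROC curve when lower scores indicate impairment.
    """
    favourable = 0.0
    for case in case_scores:
        for control in control_scores:
            if case < control:
                favourable += 1.0
            elif case == control:
                favourable += 0.5
    return favourable / (len(case_scores) * len(control_scores))


# Hypothetical test totals (assumption: illustrative values only).
hypothetical_cases = [70, 74, 76, 81]
hypothetical_controls = [76, 83, 88, 90]

auc = auc_cases_lower(hypothetical_cases, hypothetical_controls)
print(f"AUC = {auc:.3f}")
```

A value of 0.5 would mean that a case is no more likely than a control to have the lower score, and a value of 1.0 would mean that every case scores below every control.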
Results
PARTICIPANT CHARACTERISTICS
The demographic details of the groups are shown in Table 1. The mild dementia group and the control group did not differ significantly in age (t(82) = 1.43, p = 0.15, effect size = 0.16), gender (χ2(1) = 0.47, p = 0.49, phi = 0.08), or education (t(82) = 0.81, p = 0.42, effect size = 0.09), and both groups were predominantly of European ethnicity (European vs non-European, Fisher’s exact p = 1.0).
OVERALL PERFORMANCE OF TESTS
Table 2 shows the mean ACE-III, MoCA and RUDAS scores in the dementia group and the control group. Group membership (dementia group, control group) was a significant predictor
of test scores for all three tests. Age group (≤79, 80+) was also a significant predictor of test score for the ACE-III (see Table 3).
The three tests were strongly correlated with each other: r(84) = 0.85 (95% CI = 0.78 - 0.90, p < 0.001) between the ACE-III and MoCA, r(84) = 0.70 (95% CI = 0.57 - 0.80, p < 0.001) between the ACE-III and RUDAS, and r(84) = 0.65 (95% CI = 0.50 - 0.76, p < 0.001) between the MoCA and RUDAS.
In our sample, all three measures showed good classification accuracy as measured by the AUC (Zhu, 2010). The AUC was 0.89 (95% CI = 0.80 - 0.95, p < 0.001) for the ACE-III, 0.84 (95% CI = 0.75 - 0.91, p < 0.001) for the MoCA, and 0.86 (95% CI = 0.77 - 0.93, p < 0.001) for the RUDAS. No statistically significant differences were found between the AUCs of the three tests: ACE-III vs MoCA (z = 1.46, p = 0.48), ACE-III vs RUDAS (z = 0.59, p = 0.56), MoCA vs RUDAS (z = 0.33, p = 0.74). Figure 1 shows the ROC curves of the ACE-III, MoCA and RUDAS.
The mean scores for each of the subscales of the ACE-III were significantly higher for the control group than for the mild dementia group: attention (t(82) = 4.75, p < 0.001, effect size = 0.46), memory (t(82) = 7.73, p < 0.001, effect size = 0.65), fluency (t(82) = 2.70, p = 0.001, effect size = 0.37), language (t(82) = 3.83, p = 0.003, effect size = 0.33), and visuospatial (t(82) = 2.79, p < 0.002, effect size = 0.24).
CUT-POINTS
Sensitivity, specificity and likelihood ratios for a range of cut-points are available online as supplementary tables [link to be inserted]. For convenience, Table 4 includes only the optimal cut-points derived from the data and the original recommended cut-points from the validation studies. The optimal cut-points derived from the data were 76 / 77 for the ACE-III (95% CI 72 – 80), 20 / 21 for the MoCA (95% CI 18 – 21) and 22 / 23 for the RUDAS (95% CI 20 – 23). As can be seen in Table 4, the optimal cut-point derived for this population was
the same as the published standard for the RUDAS, but considerably lower than the established standard for the ACE-III and MoCA. Using the optimal scores all three tests performed moderately well, as did the RUDAS when the standard cut-point was used (with values at or approaching 0.8 for sensitivity and specificity, and 5 for the +LR). Sensitivity was poor and positive likelihood ratios indicated only a small and likely to be unimportant shift in probability for the ACE-III and MoCA at the recommended cut-points.
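As a worked illustration of how likelihood ratios follow from sensitivity and specificity, the following Python sketch recomputes them from the sensitivity and specificity values reported in the abstract for the data-derived cut-points. The function and variable names are illustrative assumptions, and because the inputs are rounded percentages the results may differ slightly from those in Table 4.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for proportions between 0 and 1."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity at the data-derived cut-points,
# taken from the abstract and expressed as proportions.
reported = {
    "ACE-III (<=76)": (0.811, 0.851),
    "MoCA (<=20)": (0.784, 0.830),
    "RUDAS (<=22)": (0.784, 0.851),
}

results = {name: likelihood_ratios(se, sp) for name, (se, sp) in reported.items()}

for name, (lr_pos, lr_neg) in results.items():
    print(f"{name}: +LR = {lr_pos:.2f}, -LR = {lr_neg:.2f}")
```

Under the rule of thumb described in the statistical analysis, a positive likelihood ratio near 5 sits at the boundary between a small and a moderate shift in probability.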
Discussion
This study investigated three cognitive screening tools, the ACE-III, the MoCA, and the RUDAS, in an older New Zealand secondary care sample. The results supported the overall diagnostic accuracy of the three measures, but raised concerns about the generalizability of the established cut-points for the ACE-III and MoCA.
The present study was, at the time of writing, the first to investigate the performance of the new ACE-III following its initial validation study in 2013. The ACE-III showed an overall ability to discriminate between the dementia and control groups that was similar to the MoCA and RUDAS, and the ACE-III correlated strongly with both of these measures. While the MoCA and RUDAS provide a brief global screen for dementia ideal for primary care settings, the ACE-III is designed to be more comprehensive. The potential for a greater range of scores and differential diagnosis may give the ACE-III a wider range of potential uses in secondary care settings.
The RUDAS is not commonly used in New Zealand currently (Strauss et al., 2012). In this study the RUDAS had a comparable discriminant accuracy to the other tests. The RUDAS has previously been shown to be relatively culturally unbiased (Storey et al., 2004; Basic et al., 2009) and this coupled with its ease and speed of administration makes the RUDAS
worthy of evaluation as a recommended screening tool for dementia, particularly in culturally diverse settings. With the Maori, Pacific and Asian population over the age of 65 years projected to double in the next decade in New Zealand and an estimated 8% of the population who do not speak English (Statistics New Zealand, 2013a,b), the availability of a culturally neutral cognitive screening tool is important.
To interpret the scores obtained from the cognitive screening tools, a comparison to what constitutes ‘normal’ performance is needed (Callow, 2013). The mean total scores in our control group were notably lower than those of their respective validation studies for the ACE-III and the MoCA, with 85% falling below the standard screening cut-point of the ACE-III (88 / 89) and 83% falling below the established cut-point for the MoCA (25 / 26) (Hsieh et al., 2013; Nasreddine et al., 2005). The optimal derived cut-points were correspondingly lower. These low scores may stem in part from the awaiting-discharge control group having impaired performance while recovering from their illness, particularly with multiple tests administered in the same session, despite the minimum MMSE score criterion. The control group participants were selected on the basis of the responsible specialist being confident they were cognitively intact; however, it is possible that participants with undiagnosed mild cognitive impairment were still included in the control group. However, these factors are unlikely to provide a full explanation of the pattern of results, as the RUDAS, which has been shown to be sensitive to mild impairment (Rowland et al., 2007), did not show this strong discrepancy in means and cut-points compared to the relevant validation study (Storey et al., 2004).
A stronger explanation may be that our sample was older than the control groups for validation studies for the ACE-III and MoCA (mean age of 79.7 versus 66.1 and 72.8 respectively), but similar in age to the control group for the RUDAS validation study (77.9). A number of previous studies have also reported a discrepancy between the published cut-points and the performance of population groups for the MoCA and ACE family of tests.
The initial cut-point recommendations were based on relatively small normative samples (a control group of 25 for the ACE-III and 90 for the MoCA) which did not allow for full stratification for age or education. Studies seeking to establish Portuguese and US norms for the MoCA reported mean scores for many of the normative sample subgroups that fall below the established cut-point (Freitas et al., 2011; Rossetti et al., 2011) and other studies have reported optimal cut-points for the MoCA that are lower than the established cut-points (e.g., Roalf et al., 2013; Hoops et al., 2009; Smith et al., 2007). Likewise, a number of studies of earlier versions of the ACE have found mean scores for control groups or normative subgroups that fall below the recommended screening cut-point (Garcia-Caballero et al., 2006; Mathuranath et al., 2006; Banerjee et al., 2006; Pigliautile et al., 2011) and reported lower optimal cut-points than the established recommendations (e.g., Kwak et al., 2010; Larner, 2007; Strauss, 2012).
Studies from New Zealand (Callow, 2013) and Italy (Pigliautile et al., 2011) have found that ‘older-old’ age (75+) is associated with lower performance on the ACE-R, suggesting that it may be appropriate to provide age norms. In their ACE-R validation study, Mioshi et al. acknowledge that the “patient group was relatively young, which reflects the bias of the Cambridge clinics. It is not clear whether these findings would apply equally to an older patient group” (Mioshi et al., 2006, p. 1084). The present study suggests this issue may be carried across to the new ACE-III revision, with age being an independent predictor of performance. While acknowledging the limitations of the present study, the results add to an accumulating literature suggesting caution when generalising the established cut-off scores for the MoCA and ACE family of tests to different population groups.
Given that cognitive impairment becomes more prevalent with age, ensuring that age-appropriate norms are available for older age-groups is a priority for enabling confidence in routine use of cognitive screening tests in clinical practice.
This study had a number of strengths and limitations. (1) Recruitment across multiple locations means that findings can be generalized across the country. However, with all but 4 of the participants being European, it may not be possible to generalize the results to other ethnic groups. (2) The clinical diagnosis of dementia had high credibility as the diagnosis was made by a geriatrician or old age psychiatrist, in the context of a multidisciplinary memory service assessment. (3) Using an MMSE of