Detecting feigned cognitive impairment with a computerized test battery: Comparison of seven proposed techniques for identifying malingering.
Alexander Collie1,2, Marian Kolta3, Paul Maruff2,3, Michael McStephen2 & David G Darby1,2.
From:
1. Centre for Neuroscience, The University of Melbourne, Parkville, Victoria, Australia
2. CogState Ltd, Carlton South, Victoria, Australia
3. School of Psychology, La Trobe University, Bundoora, Victoria, Australia
Supported by research grants from CogState Ltd, 51 Leicester St, Carlton VIC 3053 and the Australian Federal Government START Grant scheme. Authors Collie, Maruff, McStephen & Darby are currently employees of, or hold equity in, CogState Ltd.
Acknowledgment: The authors would like to thank Marina Falleti for her assistance with data collection.
Keywords: Malingering, feigned cognitive impairment, fatigue, CogState, computerized.
Running head: Detecting feigned cognitive impairment.
Address correspondence to: Alex Collie, Centre for Neuroscience, The University of Melbourne, C/o 51 Leicester Street, Carlton South 3053, Victoria, Australia. Phone: +61 3 9349 1300. Fax: +61 3 9348 2689. Email: [email protected]
Abstract
In forensic and medico-legal settings, neuropsychologists are increasingly required to determine whether an individual is feigning cognitive impairment (i.e., malingering) in pursuit of financial compensation or treatment, or to evade prosecution. All published neuropsychological studies of malingering have required individuals to malinger complex conditions that may encompass many signs and symptoms (e.g., head injury), and of which they have no first-hand experience. In the current study, healthy young individuals were required to malinger a single common symptom of such conditions (i.e., fatigue), both before and after experiencing the symptom first-hand. Of seven recently proposed neuropsychological criteria for detecting malingering, slowing of response times on simple cognitive tests provided the greatest sensitivity and specificity to malingered impairment. When performance on multiple criteria was considered, sensitivity to malingering increased.
Introduction
In forensic and medico-legal settings, individuals may exaggerate or feign impairment on cognitive tests (i.e., enact or malinger; Walsh & Darby, 1999). Malingering is a pejorative term imputing conscious or volitional motivation for simulating illness. Such malingering appears to be in pursuit of a clearly identifiable external goal, such as avoiding work, obtaining financial compensation or treatment, or evading criminal prosecution. In clinical settings, the possibility that an individual is malingering must be considered carefully prior to inferring brain dysfunction on the basis of neuropsychological results. Neuropsychological evidence is being increasingly requested in such settings to effectively differentiate malingered from ‘true’ cognitive impairment.
There are now numerous neuropsychological tests and techniques designed specifically to detect malingered performance (see Vickery et al, 2001 for review). Our review of this literature indicates that these can be categorized into a number of broad approaches encompassing seven different criteria for detection of malingering (Table 1). The most prominent approach in the published literature involves the application of forced-choice tests and “symptom-validity testing”, such as the Digit Memory Test (DMT: Hiscock & Hiscock, 1989) and the Portland Digit Recognition Test (PDRT: Binder, 1990). For memory recognition tasks that require individuals to differentiate the previously learned target from a single distracter, malingering is inferred when the accuracy of their performance falls below 50%, as this indicates performance worse than would be expected by chance alone, that is, worse than even a person with severe true cognitive impairment would achieve by guessing (Martin et al, 1998; Nies & Sweet, 1994). Another approach is to administer very simple cognitive tasks and examine performance for abnormal decreases in response accuracy (Tenhula & Sweet, 1996; Rey, 1964) and, where reaction time is recorded, abnormal increases in response latency (Guilmette et al, 1994) or variability (Binder et al, 1992; Ellwanger et al, 1999).
With this approach, malingering is suspected when the individual's performance is worse than that expected of someone with true cognitive impairment. Two similar approaches are to inspect performance on more difficult cognitive tasks for abnormally slow response times (Guilmette et al, 1994; Hiscock & Hiscock, 1989), and to inspect performance on simple memory recognition tasks for increased error rates even if accuracy remains greater than chance (Tenhula & Sweet, 1996). Finally, Frederick and Crosby (2000) have proposed that malingering may manifest as a catastrophic increase in reaction time as individuals perceive task difficulty to increase.
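The below-chance criterion for two-alternative forced-choice tests can be illustrated with a short worked calculation. The sketch below is not taken from any of the cited tests; the function name, the trial count and the score are hypothetical, and it simply computes the binomial probability of obtaining a given number of correct responses or fewer when every response is a guess.

```python
from math import comb


def prob_at_or_below(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """Binomial probability of `correct` or fewer correct responses by guessing."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct + 1)
    )


# Hypothetical example: 72 two-alternative trials, 25 answered correctly.
p = prob_at_or_below(25, 72)
print(f"P(25 or fewer of 72 correct by guessing) = {p:.4f}")
```

A small probability suggests that the score is unlikely to arise from guessing alone, which is the logic behind inferring deliberate selection of incorrect answers.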
[Insert Table 1 about here]
To date, all published experimental studies of malingering have employed cross-sectional designs. The most valid studies compare the performance of individuals later diagnosed as malingerers to that of a healthy control group or to the normative data for that test. However, this post-hoc approach provides little opportunity for comparing methods of detecting malingering or for developing new methods using conventional experimental designs. One approach to overcome this limitation has been to compare performance on putative malingering tests or analytic methods between two groups of healthy people, one of which has been instructed or coached about the consequences of a specific injury, disease or condition and then asked to simulate the effects of this condition on the test or method under investigation. However, these experimental designs are also limited in that healthy participants cannot simulate the complete range of illness-related factors that accompany a specific injury or disease. Other limitations include that participants may emphasize different aspects of the malingered condition and that some participants will malinger more realistically than others. Furthermore, this approach cannot be used as a model for symptom exaggeration as the exclusion criteria for such studies generally include a history of head injury, neurological or psychiatric disease.
It is possible to overcome these limitations by modifying experimental designs in two ways. First, rather than having individuals simulate conditions with complex and multiple signs and symptoms, such as head injury or toxic exposure, experimental models of malingering could concentrate on single symptoms. For example, symptoms such as fatigue, forgetfulness, inattention and anxiety present commonly as part of neurological, psychiatric and systemic illness and are also reported commonly by individuals who are later classified as malingerers. The reduction of complex syndromes to investigation at the level of single symptoms is used commonly in neurological and psychiatric research. For example, there have been studies into the specific nature of disinhibition in head injury (Reiger & Gauggel, 2002), delusions of alien control and auditory hallucinations in schizophrenia (Maruff, Wilson & Currie, in press), memory impairment in Alzheimer’s disease (Eustache et al, 2001) and fatigue in multiple sclerosis (Krupp & Elkins, 2000). In each case, the symptom has been studied in an attempt to understand the nature of the disorder when it is characterized clinically by a heterogeneous presentation.
Focusing on single symptoms also allows the use of ‘within-groups’ study designs, which overcome limitations associated with selecting appropriate control groups. For example, if the symptom of fatigue is investigated, individuals can be asked to simulate fatigue and then at some later time, fatigue can actually be induced in the same participants. The effect of the actual and feigned impairment can then be compared in the same individuals. The authors are unaware of any studies in which normal individuals have been asked to feign an impairment associated with a condition of which they have first-hand experience. Consequently, no study has investigated whether expectation of the effects of an actual impairment, and memory of the effects of an actual impairment, produce different patterns of malingering. This requires a repeated measures design where the individual is required to: (a) anticipate the effects of a reversible condition that causes cognitive impairment and with which they have personal experience; (b) be exposed to the actual reversible condition; and (c) remember the effects of the reversible condition at some later time. Methodologically, the soundest way of directly comparing these three conditions (anticipate impairment, actual impairment, remember impairment) is to record the individual's normal or baseline level of performance and then determine the magnitude of change from baseline in each condition (e.g., using a standardized score or z-score).
This study aimed to compare directly the ability of seven recently proposed neuropsychological test criteria to detect individuals who are feigning cognitive impairment associated with fatigue, a symptom of many neurological and psychiatric illnesses. In order to determine whether direct experience of impairment alters an individual’s ability to feign the same impairment, the second aim was to compare the malingering of an anticipated impairment to simulation of a remembered impairment. Many clinicians and researchers propose that individual tests are inadequate for detecting malingering, and that consideration of performance on multiple tests simultaneously may aid clinical detection (Vickery et al, 2001; Slick, Sherman & Iverson, 1999). Therefore, a third aim of this study was to determine whether considering performance on multiple criteria allows more accurate detection of malingering than consideration of performance on individual tests. A repeated-measures design was implemented in which participants were asked to anticipate the effects of fatigue, experience 24 hours of sustained wakefulness resulting in fatigue, and then remember the effects of fatigue. A novel computerized cognitive test battery was administered that allowed reliable serial testing and included measures that addressed each of the seven theoretical models outlined above (Collie et al, in press; Makdissi et al, 2001; Westerman et al, 2001).
Method
Participants
Forty healthy young participants (16 males, 24 females) were recruited from two university campuses to participate in this study. The participants had a mean age of 21.64 ± 3.79 years (range = 18-40 years), and mean education of 14.92 ± 1.42 years (range = 13-20 years). Inclusion criteria for the study were age of at least 18 years and normal neurological function as determined by a consultant neurologist. Exclusion criteria included a history of loss of consciousness, psychiatric or neurological illness, dementia, head injury, the presence of significant health-related (including emotional) disorders, and the use of medications that might have altered cognitive status. Participants were also excluded if they regularly used drugs or alcohol, or reported difficulties sleeping in the few nights before each experimental session. Each participant was paid $40 for each experimental condition completed. All participants were naïve to the aims and hypotheses of the experimental conditions, and gave informed consent prior to enrolment. The protocol was approved by an institutional ethics committee prior to the beginning of the study.
Materials
The CogState computerized cognitive test battery (version 1.0) was administered on Apple Macintosh PowerPC computers in a large assessment laboratory, in which fifteen computers were designated for test administration. This battery takes approximately 15-20 minutes to complete depending upon the speed of response. CogState consists of eight different tasks (Westerman et al, 2001; Collie et al, in press; Makdissi et al, 2001), four of which were considered in the present study [Simple Reaction Time (SRT), Choice Reaction Time (ChRT), Complex Reaction Time (CxRT) and One-back working memory (One-Back)]. The stimuli for each task consist of playing cards, and all tasks are presented in a game-like fashion. Importantly, stimulus presentation is randomized within the test battery, ensuring that there are many hundreds of equivalent alternate forms of the tasks. Immediately prior to the beginning of each task, both written and visual instructions were presented on the computer screen. These instructions disappeared once the participant had demonstrated they were aware of the task requirements by making three successive correct responses. The task then proceeded with response speed and accuracy being recorded. Responses were indicated by pressing one of three keys on the computer keyboard ('d', 'k', and 'spacebar'). A buzzer indicated incorrect responses, that is, failures to respond or responses faster than 100 milliseconds (ms). Incorrect responses were omitted from further analysis. Correct responses received no auditory feedback.
The SRT task requires the participant to press the spacebar as quickly as possible once a presented card turns face-up. The SRT task is presented at the beginning (SRT1), in the middle (SRT2) and at the end of the test battery (SRT3). The ChRT task requires the participant to decide whether a presented card is red or black. The CxRT task requires the participant to decide whether two presented cards are the same color or different colors. The One-Back task requires the participant to attend to the color, suit and number of a presented card, and decide whether that card is the same as the one presented immediately prior to it. All instructions are framed as questions. For example, the question for the One-Back task is "Does the face-up card exactly match the one before?". For the ChRT, CxRT and One-Back tasks, the 'k' key is pressed to indicate a yes response, and the 'd' key to indicate a no response. Response speed (average response time on correct trials), accuracy (percentage of correct responses) and throughput (number of correct responses per minute) are reported for all tasks. The other tasks within the CogState battery have been described in detail previously (Westerman et al, 2001; Collie et al, 2001). A self-rated fatigue questionnaire was administered during the fatigue study. This questionnaire required each participant to rate their current level of fatigue on a scale from 0 to 10, where 0 indicated 'not fatigued at all' and 10 indicated 'very fatigued'.
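The three summary measures defined above (speed, accuracy and throughput) can be sketched for a single task as follows. This is an illustrative sketch only and not the CogState implementation: the `Trial` record, the `summarize` function and the use of total task duration as the denominator for throughput are assumptions, since the source does not state how the per-minute rate is computed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    rt_ms: float   # response time in milliseconds
    correct: bool  # True if the response was scored as correct


def summarize(trials, task_duration_s):
    """Return (speed in ms, accuracy in %, throughput in correct responses/min)."""
    correct_rts = [t.rt_ms for t in trials if t.correct]
    speed = mean(correct_rts)                                  # mean RT, correct trials only
    accuracy = 100.0 * len(correct_rts) / len(trials)          # % of responses correct
    throughput = len(correct_rts) / (task_duration_s / 60.0)   # assumed denominator
    return speed, accuracy, throughput


# Hypothetical trials for one task lasting 30 seconds.
trials = [Trial(300, True), Trial(500, True), Trial(90, False), Trial(400, True)]
print(summarize(trials, task_duration_s=30))
```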
Procedure
The study employed a within-subjects factorial design in which participation required the completion of four sessions scheduled approximately one week apart over four weeks. Brief written and verbal instructions were given prior to the commencement of each session. The first session was designed to familiarize participants with task requirements and to allow an optimal baseline performance level to be reached from which any perturbations in cognitive function would be evident. Participants were instructed to 'try their best' in all tasks. Previous studies indicated that practice effects were evident between the first and second test administration but not thereafter (Collie et al, in press). CogState was therefore administered four times and data from the fourth administration were taken to represent the baseline against which subsequent comparisons would be made. In the second session (Anticipated Impairment), participants were instructed to perform the CogState tasks as if they had been awake continuously for 24 hours. The specific requirements were that they should “malinger fatigue in order to increase the likelihood of winning a lawsuit”. Participants were further instructed that “very obvious faking would result in a loss of settlement and severe court penalty”. In the third session (Actual Impairment), participants were required to complete the test battery every 2 hours for 24 hours of sustained wakefulness. The first assessment was conducted at 9 am and the last (13th) assessment at 9 am the next morning. For the duration of this session participants were restricted from drinking caffeine or ingesting other stimulants. Immediately prior to each cognitive assessment, participants were also required to rate their level of fatigue on the self-rated fatigue scale. Entertainment was provided between testing sessions to keep the participants occupied. In the final session (Remembered Impairment), participants were instructed to ‘remember’ the effect that 24 hours of sleep deprivation had on their ability to perform the CogState tasks, and to re-enact the most fatigued they had felt in the previous fatigue session. Specifically, they were required “to perform the test battery as if they were experiencing the same level of fatigue as they had experienced recently when they were awake for a whole night”. All participants were tested in the same laboratory on the same computers in all four testing sessions.
Data Analysis
Calculation of performance measures
This analysis was guided by the theoretical models proposed for detecting malingering outlined in the introduction and detailed in Table 1. For each experimental condition, participants' data were examined and excluded from further analysis if they had not completed the test battery. Only participants with complete data sets were included in this analysis. The analysis proceeded in a number of stages. First, for each task, the reaction time (RT) of each individual correct response (i.e., true positive and true negative) was base 10 logarithmically (Log10) transformed to normalize RT distributions. Each participant's mean and standard deviation Log10 RT were then calculated for all correct responses for each of the following tasks: SRT, ChRT, CxRT and One-Back. Second, for each individual participant a linear regression equation was fitted to their mean Log10 RT on the SRT, ChRT and One-Back tasks. The slope of that regression equation was then calculated. Finally, all responses (i.e., true positive, true negative, false positive, false negative, anticipation, abnormally slow) were used to calculate accuracy for the SRT, ChRT and One-Back tasks. Specifically, accuracy was defined as the number of true positive and true negative responses divided by the total number of responses attempted, and expressed as a percentage (%) value.
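The stages described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' analysis code: all function names are hypothetical, the sample standard deviation (n - 1 denominator) is assumed, and the three tasks are assumed to be coded 1, 2 and 3 (SRT, ChRT, One-Back) as the predictor in the regression, because the source does not specify the coding.

```python
from math import log10
from statistics import mean, stdev


def log_rt_summary(correct_rts_ms):
    """Mean and SD of Log10-transformed RTs for correct responses on one task."""
    logs = [log10(rt) for rt in correct_rts_ms]
    return mean(logs), stdev(logs)


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def accuracy_pct(true_pos, true_neg, attempted):
    """Correct responses as a percentage of all responses attempted."""
    return 100.0 * (true_pos + true_neg) / attempted


# Hypothetical participant: mean Log10 RT on SRT, ChRT and One-Back.
task_codes = [1, 2, 3]              # assumed coding of increasing task difficulty
mean_log_rts = [2.40, 2.60, 2.80]
print(regression_slope(task_codes, mean_log_rts))
print(accuracy_pct(true_pos=18, true_neg=20, attempted=40))
```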
Actual Fatigue data
For the purposes of this analysis it was necessary to determine the data point that most appropriately represented performance when fatigued. Self-rated levels of fatigue were used to identify the point at which each participant considered themselves the most fatigued. In almost all cases, the most severe self-rated impairment was observed after 22 hours of the 24 hour sustained wakefulness condition (i.e., at 7 am; Figure 1). Therefore, the 22 hour assessment was selected as the fatigue data point against which malingering data would be compared.
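The selection of the fatigue data point can be illustrated as follows. The sketch assumes 13 assessments at 0, 2, ..., 24 hours of wakefulness, as described in the procedure; the function names, the tie-breaking rule (the first assessment with the highest rating) and the ratings themselves are hypothetical.

```python
from collections import Counter

ASSESSMENT_HOURS = list(range(0, 25, 2))  # 13 assessments: 0, 2, ..., 24 hours awake


def peak_fatigue_hour(ratings):
    """Hours awake at the first assessment with the highest self-rating (0-10)."""
    return ASSESSMENT_HOURS[ratings.index(max(ratings))]


def modal_peak_hour(all_ratings):
    """Most common peak-fatigue time point across participants."""
    counts = Counter(peak_fatigue_hour(r) for r in all_ratings)
    return counts.most_common(1)[0][0]


# Hypothetical self-ratings for three participants (one rating per assessment).
ratings = [
    [1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9],
    [0, 1, 1, 2, 3, 3, 4, 6, 7, 8, 8, 9, 8],
    [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 9],
]
print(modal_peak_hour(ratings))
```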
[Insert Figure 1 about here]
Comparison of theoretical models of malingering
To determine the effects of malingering as measured by each of the performance measures in Table 1, z-scores were calculated for each malingering index for each participant by comparing performance in the anticipated impairment, remembered impairment and actual impairment conditions to that at baseline. Z-scores were employed to standardize data against performance at baseline and to allow direct comparison between malingering and actual impairment conditions. The following formulae were employed:
Anticipated Impairment (ANI):

$$z_{\mathrm{ANI}} = \frac{\text{Performance}_{\mathrm{ANI}} - \text{Group mean performance}_{\mathrm{Baseline}}}{\text{Group standard deviation performance}_{\mathrm{Baseline}}}$$

Remembered Impairment (RMI):

$$z_{\mathrm{RMI}} = \frac{\text{Performance}_{\mathrm{RMI}} - \text{Group mean performance}_{\mathrm{Baseline}}}{\text{Group standard deviation performance}_{\mathrm{Baseline}}}$$

Actual Impairment (ACI):

$$z_{\mathrm{ACI}} = \frac{\text{Performance}_{\mathrm{ACI}} - \text{Group mean performance}_{\mathrm{Baseline}}}{\text{Group standard deviation performance}_{\mathrm{Baseline}}}$$

This analysis reduced the number of conditions for each performance measure to three. For RT measures, a positive z-score indicated a slowing of responses, an increase in variability or an increase in the slope of the linear regression equation, while a negative z-score indicated faster mean responses, a decrease in variability or a decrease in the slope of the linear regression equation. For accuracy measures, positive z-scores indicated an increase in accuracy and negative z-scores a decrease in accuracy. Direct comparisons of the actual impairment, anticipated impairment and remembered impairment conditions were then conducted by submitting each participant's standardized performance scores to a series of one-way repeated measures analyses of variance (ANOVAs). For this analysis, the Type 1 error rate was set at 0.007 (0.05/7) to protect against experimentwise error. Where significant effects were observed, three paired samples t-tests were employed to compare: (1) actual impairment and anticipated impairment; (2) actual impairment and remembered impairment; and (3) anticipated impairment and remembered impairment. For this analysis, the Type 1 error rate was set at 0.017 (0.05/3) to protect against experimentwise error. Effect sizes for each of these comparisons were calculated using Cohen's d (Zakzanis, 2001). Next, for each performance measure, receiver operating characteristic (ROC) curves were plotted where significant differences were observed between experimental conditions. For each ROC curve, a cut-score was calculated that represented the point at which both sensitivity and specificity were highest, and the area under the curve (AUC) was also calculated.
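The standardization against baseline and the two corrected error rates described above can be checked with a short sketch. The function names and data are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, as the source does not state which form was used.

```python
from statistics import mean, stdev


def z_scores(condition_scores, baseline_scores):
    """Standardize each participant's score against the baseline group mean and SD."""
    mu = mean(baseline_scores)
    sd = stdev(baseline_scores)  # assumed: sample SD of the baseline group
    return [(x - mu) / sd for x in condition_scores]


def corrected_alpha(family_alpha, n_tests):
    """Bonferroni-style per-test Type 1 error rate."""
    return family_alpha / n_tests


# Hypothetical mean Log10 RT values for three participants.
baseline = [2.40, 2.50, 2.60]
anticipated = [2.70, 2.50, 2.65]
print(z_scores(anticipated, baseline))
print(round(corrected_alpha(0.05, 7), 3), round(corrected_alpha(0.05, 3), 3))
```

For RT measures, a positive value returned by `z_scores` corresponds to slowing relative to baseline, in line with the interpretation given in the text.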
The AUC is an indicator of the diagnostic accuracy of a test, with a score of 1 indicating a perfectly sensitive and specific diagnostic test, and a score of 0.5 indicating a test that is unable to differentiate between two clinically distinct conditions. Finally, we also sought to determine whether combinations of performance measures allowed more accurate detection of malingering than individual performance measures. To do this, we determined the two individual performance measures with the highest sensitivity and specificity in each of the anticipated impairment vs. actual impairment and remembered impairment vs. actual impairment comparisons. The bivariate distributions of these measures under both malingering and actual impairment conditions were then plotted by placing, for each individual, performance on one measure on the x-axis and performance on the other measure on the y-axis. Cut-scores derived from the above analysis were applied and the number of participants in both malingering and actual impairment conditions meeting the following criteria was determined: (a) above cut-score on both performance measures; and (b) above cut-score on one performance measure only. These values were then converted to percentages to determine the sensitivity and specificity of the combined performance measures.
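The ROC and combined-criteria steps described above can be sketched as follows. This is an illustration only: the function names and data are hypothetical, higher z-scores are assumed to indicate malingering, and "the point at which both sensitivity and specificity were highest" is interpreted here as the cut-score maximizing their sum, which is one possible reading of the text.

```python
def sens_spec(malingered, actual, cut):
    """Sensitivity and specificity when scores above `cut` are classed as malingered."""
    sens = sum(x > cut for x in malingered) / len(malingered)
    spec = sum(x <= cut for x in actual) / len(actual)
    return sens, spec


def best_cut(malingered, actual):
    """Observed score maximizing sensitivity + specificity (assumed criterion)."""
    candidates = sorted(set(malingered) | set(actual))
    return max(candidates, key=lambda c: sum(sens_spec(malingered, actual, c)))


def auc(malingered, actual):
    """Probability that a malingered score exceeds an actual one (ties count 0.5)."""
    wins = sum((m > a) + 0.5 * (m == a) for m in malingered for a in actual)
    return wins / (len(malingered) * len(actual))


def combined_rates(pairs, cut_x, cut_y):
    """Percentage of (x, y) pairs above both cut-scores, and above exactly one."""
    both = sum(x > cut_x and y > cut_y for x, y in pairs)
    one_only = sum((x > cut_x) != (y > cut_y) for x, y in pairs)
    return 100.0 * both / len(pairs), 100.0 * one_only / len(pairs)


# Hypothetical z-scores on one measure under each condition.
malingered = [3.0, 2.5, 1.8, 0.9]
actual = [1.0, 0.5, 0.2, -0.1]
print(auc(malingered, actual))
```

Applying `combined_rates` to the malingering condition gives the sensitivity of each combined criterion, while applying it to the actual impairment condition gives the corresponding false-positive rates, from which specificity follows.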
Results
The group means and standard deviations for the standardized performance measures under each experimental condition are reported in Table 2. The results of the repeated measures ANOVAs conducted on these data are also shown. Significant differences between experimental conditions were observed for six of the seven malingering performance measures. The only performance measure not to show a significant within-subjects effect was SRT accuracy. This measure was therefore not included in any further analyses. Post-hoc paired samples t-tests revealed significant group differences between the anticipated impairment and actual impairment conditions on the following performance measures: SRT mean Log10 RT [t(39)=5.8,p