Pergamon

Archives of Clinical Neuropsychology, Vol. 11, No. 4, pp. 283-293, 1996 Copyright © 1996 National Academy of Neuropsychology Printed in the USA. All rights reserved 0887-6177/96 $15.00 + .00

SSDI 0887-6177(95)00038-0

Comparison of Multiple Scoring Methods for Rey's Malingered Amnesia Measures

Manfred F. Greiffenstein and W. John Baker
Psychological Systems, Inc.

Thomas Gola
Wayne State University

The predictive accuracy of Andre Rey's malingered amnesia measures (Memory for Fifteen Items and Word Recognition List) was examined. Discriminant function and crosstabulation analytic strategies were applied to predict membership in either a traumatically brain injured group (n = 60) or in a litigated minor head injury group claiming permanent severe disability (n = 90). Satisfactory hit rates were achieved with Rey's original scoring methods, but improved hit rates were obtained with scoring modifications. Removal of dense amnesiacs from the sample resulted in even better hit rates. Rey's measures appear to be valid for the assessment of cognitive malingering in settings where litigated disability claims are out of proportion to injury characteristics. Rey's measures are not appropriate for globally and severely impaired patients in clinical settings.

Lezak (1983) introduced the American neuropsychology community to the malingered amnesia measures of French neuropsychologist Andre Rey. These measures include the Fifteen Item Memory Test (Rey-15) and the Word Recognition List (WRL). The Rey-15 is a visual immediate memory test that was first described in Rey (1964). Psychometrically, it is a very easy task that requires immediate drawn reproduction of overlearned number, letter, and shape series. Rey suggested a score of three rows (nine items) or less as suggestive of feigned amnesia. The WRL is a verbal immediate recognition memory task (Rey, 1941). The WRL was specifically designed to discriminate "traumatic encephalopathy" from factitious brain injury. A finding of WRL words recognized equal to or less than words recalled on the first trial of Rey's Auditory Verbal Learning Test (AVLT) signals feigning of memory deficit. Rey created these measures on a face valid and rational basis. He did not employ an empirical validating strategy.

Partial data from this study were presented as a poster at the 1994 Annual Convention of the American Psychological Association, Los Angeles, California. Address correspondence to: M. Frank Greiffenstein, PhD, 217 South Woodward, Suite 102, Royal Oak, MI 48067.


M. F. Greiffenstein, W. J. Baker, and T. Gola

There have been a number of clinical trials testing the validity of the Rey-15. Guilmette, Hart, Giuliano, and Leininger (1994) examined the sensitivity of the Rey-15 to volunteer simulation of organic amnesia. The authors found Rey-15 performance to be worse in individuals with mixed forms of severe brain damage than in volunteer dissimulators. Schretlen, Brandt, Krafft, and Van Gorp (1991) found that litigating minor accident victims scored lower on the Rey-15 than individuals with major head injuries, but they also found depressed Rey-15 performance in anoxic patients with severe memory and other cognitive deficits. Unfortunately, Schretlen et al.'s sample of suspected fakers was small, making generalization difficult. Greiffenstein, Baker, and Gola (1994) found the Rey-15 to have acceptable sensitivity and specificity in separating real TBI from factitious brain injury, but only one scoring method was tested. There has been one clinical trial of the WRL (Greiffenstein et al., 1994), but only one scoring method was examined.

There are two basic methods for validation of malingered amnesia measures, termed simulation and known group methodologies (Rogers, 1990). Simulation methodology involves administering putative faking measures to nonclinical volunteers asked to fake a serious brain disorder. Known group methodology involves selection of individuals who are strongly suspected on clinical grounds of manufacturing severe disability (Binder, 1993; Binder & Willis, 1991). Known group studies are preferable because results are more generalizable to real samples, as opposed to the limited generalization that can be made from the college student samples used in the typical simulation study.

The main problem with the known group method is the absence of reliable and uncontaminated criteria for malingering. The grouping criteria stated in known group analyses are often vague and difficult to replicate in an independent laboratory.
Descriptions such as "nine malingerers, classified on the basis of independent clinical judgement" (Rogers, 1988) and "suspected faking" (Schretlen et al., 1991) are stated without any guidelines as to how these determinations are made. In addition, the danger of criterion contamination is not addressed. Criterion contamination results when the methods used to construct comparison groups overlap with the dependent measures. The result is illusory correlation and confirmatory bias.

Greiffenstein, Baker, and Gola (1994) developed criteria for assignment of malingering status to real world clients that are reliable, replicable, and distinct from the malingering measures being validated. This method can be termed convergent improbable outcome selection. The method involves identification of healthy but litigating postconcussion cases who present with more than one improbable outcome. For example, vocational and educational status in a prospective series of mild traumatic brain injury is not different from that of controls at 1 year follow-up (Dikmen, Temkin, Machamer, Holubkov, Fraser, & Winn, 1994; McLean, Dikmen, & Temkin, 1993). Thus, a claim of minor head trauma being the proximate cause of joblessness a year later can be termed an improbable outcome. Cognitive recovery in minor head trauma takes place between 3 days and 1 month (McLean, Temkin, Dikmen, & Wyler, 1983; Ruff et al., 1989), with more complex attention recovering by 12 weeks in more severe cases of mild head injury (up to 24 hours of PTA; Gronwall & Wrightson, 1974). Thus, very poor neuropsychological test scores more than 1 year after MHI can also be termed an improbable outcome (Larrabee, 1990). There is nothing new about this concept. Neurologists have long used such a "discrepancy method" to detect malingerers (Adams & Victor, 1993), that is, the detection of mismatches between a patient's behavior and disease history.
It is argued that the joint occurrence of improbably poor test performance (relative to documented severe TBI) with other improbable outcomes in the compensation seeking postconcussion patient justifies classification as probable malingering (PM).

A more thorough evaluation of Rey's tasks with large sample sizes appears to be appropriate at this point in the evolution of neuropsychology as an applied science. Serious questions have been raised regarding the ability of expert neuropsychologists to distinguish real from factitious neurological impairment (Faust, Ziskin, & Hiers, 1991; Heaton, Smith, Lehman, & Vogt, 1978). The purpose of this study was to examine the validity of Rey's two malingered amnesia methods by use of a known group methodology in a forensic context. The predictive accuracies of eight scoring methods were examined for their ability to separate documented severe traumatic brain injury cases from chronically complaining mild head injury (MHI) litigants claiming severe and permanent disability.

METHOD

Subjects

The traumatic brain-injury group (TBI) consisted of a consecutive series of 60 patients with severe traumatic brain injury referred by physicians, insurance companies, and attorneys. The criteria for inclusion were (1) Glasgow Coma Score of twelve or less more than 48 hours after trauma, (2) posttraumatic amnesia (PTA) longer than 48 hours, (3) hospital stays of greater than 1 week, and (4) positive CT scan and/or focal neurologic findings. Sixty percent of these patients had GCS scores in the severe range on admission (GCS = 3-8) and the remainder were moderate (GCS = 9-12). Forty-four patients (73%) had combined coma and PTA longer than 1 week.

The probable malingering group (PM) consisted of 90 litigating postconcussion patients who claimed severe and permanent disability more than 1 year after nontraumatic deceleration or mildly traumatic deceleration head injuries. They were selected from a larger series of postconcussion patients seen in 1992, 1993, and 1994. Postconcussion patients who sustained long bone fractures or internal injuries were eliminated. Subjects with positive serial EMGs read by board certified electromyographers were also eliminated. PCS patients who still had subjective symptoms more than 1 year postinjury but had returned to work and were seeking only first party payments (retroactive payment of doctor and hospital bills) were also eliminated.
The 90 PM subjects were selected from the remaining series of postconcussion patients on the following basis: (1) two or more neuropsychological test scores in the severely impaired range (more than 3 SD below the age-education reference group), and the presence of one or more additional improbable outcomes such as (2) total disability in work or a major social role lasting greater than 1 year, (3) subjective claims of severe remote memory loss in at least one area (spelling, reading, overlearned motor skills, childhood memory), and (4) striking contradictions between self-report and collateral sources (e.g., surveillance films, hospital records). With regard to criterion #4, small discrepancies between documented and claimed problems were ignored (e.g., records state "seconds" of PTA, patient says "minutes" the following year). Only obvious lies and misrepresentations were coded, for example, claiming 48 hours of amnesia when records showed the patient drove to the hospital and gave a detailed accident description to police. This method has proven interrater reliability (Greiffenstein et al., 1995).

All PM Ss were involved in either a third party lawsuit alone (seeking compensation from the offending driver or corporate entity) or both first and third party lawsuits. The objective injury characteristics recorded in medical documents were generally minor: 69% had posttraumatic amnesia less than 20 minutes, 11% had benign head trauma (no amnesia), and the remaining 20% had nontraumatic deceleration injuries (also known as "whiplash"). All patients had Glasgow Coma Scale scores of 15 (maximum = 15) at the time of arrival at the emergency room. No hospital stay was greater than 24 hours.

The neuropsychological measures on which improbably poor score rankings were based included the Logical Memory and Visual Reproduction subtests of the Wechsler Memory Scale-Revised (WMS-R; Wechsler, 1987), Rey's Auditory Verbal Learning Test (Lezak, 1983), the Stroop Test (Spreen & Strauss, 1991), Speech Sounds Perception (Reitan & Wolfson, 1985), Mesulam Letter Cancellation (Mesulam, 1987), Halstead Grip strength (also known as the Smedley Dynamometer; Reitan & Wolfson, 1985), Finger Oscillation (also known as Finger Tapping; Reitan & Wolfson, 1985), Lafayette Grooved Pegboard (Heaton, Grant, & Matthews, 1991), Trailmaking Test A and B (Reitan & Wolfson, 1985), Wisconsin Card Sorting Test (Heaton, Grant, & Matthews, 1991), Luria 3-Step (Rothlind & Brandt, 1993), Rey Complex Figure (Lezak, 1983; Spreen & Strauss, 1991), Aphasia Screening Test (Reitan & Wolfson, 1985), 60-item Boston Naming Test (Kaplan, Goodglass, & Weintraub, 1983) and the Wide Range Achievement Test-Revised (Jastak & Jastak, 1984).
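The selection rule for the PM group can be sketched as a simple predicate. This is only an illustration of the stated logic (criterion 1 plus at least one of criteria 2 to 4); the function and parameter names are hypothetical, and the sketch assumes the exclusion steps described above have already been applied.

```python
def meets_pm_criteria(n_severe_scores, total_disability_over_1yr,
                      remote_memory_loss_claim, record_contradiction):
    """Return True when a case satisfies the stated PM selection rule.

    n_severe_scores: count of test scores in the severely impaired range.
    The three booleans correspond to criteria (2), (3), and (4) in the text.
    All names are illustrative and do not come from the article.
    """
    # Criterion 1: two or more severely impaired test scores.
    has_poor_scores = n_severe_scores >= 2
    # At least one additional improbable outcome (criteria 2-4).
    has_other_outcome = any([
        total_disability_over_1yr,
        remote_memory_loss_claim,
        record_contradiction,
    ])
    return has_poor_scores and has_other_outcome


# Hypothetical case: three severe scores plus total work disability.
example = meets_pm_criteria(3, True, False, False)
```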

Procedure

The Rey-15 and WRL stimuli were obtained from Lezak (1983). The Rey-15 was administered according to Lezak's (1983) guidelines: the client was asked to study the 3 × 5 symbol array printed on an 8-1/2 × 11" card for 10 seconds. The stimulus card was withdrawn and the client was asked to draw what they saw from memory.

The WRL administration consisted of a study phase and a recognition memory probe. The study phase consisted of instructing the client to listen to a list of words. The technician read the 15-word WRL (half, camel, mistake, toy, morning, hair, wax, grain, cookie, fly, place, cherry, door, knee, state). There was no 5-second delay between the study period and the recognition memory probe, the only modification from Lezak's instructions. This brief delay was eliminated in order to make the test easier. The recognition memory probe consisted of supplying the client with a typed list of 30 words containing the original 15 words and 15 recognition foils. The client was asked to circle the words from the learning list but avoid circling words not on the original list. Clients with reading difficulties were read the list and asked to answer "yes" and "no" in response to each item.

The authors later subjected scores on the Rey-15 and WRL to four different scoring methods after clients had been classified as TBI or PM on the basis of the criteria stated earlier. The Rey-15 was scored in the following manner. Accuracy scoring involved summing the number of correctly recalled symbols with a maximum score of 15. The spatial position of the reproduced row or the spatial position of a symbol within a row was irrelevant. For example, recalling "A C B D" would receive a score of 3. Adjusted accuracy scoring involved subtracting intrusions (symbols not from the original stimulus vocabulary) from the number correctly recalled. Recalling "A C B D" would receive a score of 2. Spatial scoring was calculated by summing the number of symbols accurately placed within a row. In this case, "A C B D" would receive a score of only 1. Adjusted spatial scoring involved subtracting intrusions from the spatial accuracy score. "A C B D" would, thus, earn a score of 0.

The WRL was scored four different ways. The first scoring method was a sign approach (WRL-1) as suggested by Rey (cited in Lezak, 1983). Malingered amnesia was rated as present if the number of correctly recognized words was less than or equal to the words freely recalled on the first trial of Rey's Auditory Verbal Learning Test. The second approach was a modification of Rey's sign, with PM rated as present if the number of correctly identified words minus intrusion errors was less than or equal to the first trial AVLT free recall score (WRL-2). The third method was termed accuracy scoring and entailed a simple sum of correctly identified words. The fourth and final method was termed the adjusted accuracy score. This was the sum of words recognized minus false positive errors.
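The eight scoring rules can be illustrated with a minimal sketch. It assumes that the "A C B D" example refers to a target row "A B C", that response rows are paired with target rows by position, and that a repeated symbol is credited only once; the function names and the foil words are hypothetical and do not come from the article.

```python
from collections import Counter


def rey15_scores(target_rows, response_rows):
    """Four Rey-15 scores; each row is a list of symbols."""
    vocabulary = Counter(s for row in target_rows for s in row)
    produced = Counter(s for row in response_rows for s in row)
    # Accuracy: correctly recalled symbols, position ignored.
    accuracy = sum(min(n, vocabulary[s]) for s, n in produced.items())
    # Intrusions: symbols that are not part of the stimulus vocabulary.
    intrusions = sum(n for s, n in produced.items() if s not in vocabulary)
    # Spatial: symbols reproduced in the correct position within a row.
    spatial = sum(
        1
        for target, response in zip(target_rows, response_rows)
        for t, r in zip(target, response)
        if t == r
    )
    return {
        "accuracy": accuracy,
        "adjusted_accuracy": accuracy - intrusions,
        "spatial": spatial,
        "adjusted_spatial": spatial - intrusions,
    }


def wrl_scores(circled, study_list, avlt_trial1_recall):
    """Two sign scores and two continuous scores for the WRL."""
    circled, study = set(circled), set(study_list)
    hits = len(circled & study)              # correctly recognized words
    false_positives = len(circled - study)   # foils circled in error
    adjusted = hits - false_positives
    return {
        "wrl1_sign": hits <= avlt_trial1_recall,
        "wrl2_sign": adjusted <= avlt_trial1_recall,
        "accuracy": hits,
        "adjusted_accuracy": adjusted,
    }


STUDY_LIST = ["half", "camel", "mistake", "toy", "morning", "hair", "wax",
              "grain", "cookie", "fly", "place", "cherry", "door", "knee",
              "state"]

# The worked example from the text: target row A B C, response A C B D.
rey = rey15_scores([["A", "B", "C"]], [["A", "C", "B", "D"]])

# Hypothetical WRL protocol: six study words and two placeholder foils
# circled, with five words recalled on AVLT trial 1.
wrl = wrl_scores(STUDY_LIST[:6] + ["foil_1", "foil_2"], STUDY_LIST, 5)
```

For the text's example this yields scores of 3, 2, 1, and 0. In the hypothetical WRL protocol the original sign is absent (6 hits exceed 5 recalled), while the adjusted sign is present (6 minus 2 false positives is 4, which does not exceed 5).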


RESULTS

Analysis of Group Characteristics

The means and standard deviations of group characteristics are listed in Table 1. The group differences were examined with independent t-tests (experiment-wide critical level of p < .05). The two groups did not differ in age [t(148) = 1.95, p = .053], years of education [t(148) = 1.13], reading level as measured by the Wide Range Achievement Test-Revised [t(148) = -1.02] or presumed preinjury intelligence as measured by errors on the North American Adult Reading Test [t(148) = 1.44, p = .309]. There were significant differences in the number of days of hospitalization [t(148) = -8.13, p < .0001], and duration [months since injury; t(148) = -4.13, p < .0001]. These differences indicate the TBI group consisted predominately of patients who were neurologically and medically very unstable upon initial admission.

Analysis of group differences on standard episodic memory measures with independent t-tests (df = 148) indicated consistently poorer performance by the PM group: WMS-R Logical Memory I (t = -4.32, two-tailed p < .0001), WMS-R Logical Memory II (t = -3.46, two-tailed p < .001), WMS-R Visual Reproduction I (t = -3.17, two-tailed p < .002), WMS-R Visual Reproduction II (t = -2.12, two-tailed p < .036), and AVLT total words recalled (t = -3.29, two-tailed p < .001). This pattern of worse performance by litigating minor head trauma clients justifies a label of probable malingering.
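As a rough check on the reported statistics, an independent-samples t value can be recomputed from the summary data in Table 1. The sketch below assumes the pooled-variance (Student) form, which is consistent with the reported df of 148, and subtracts the PM mean from the TBI mean; the function name is illustrative, and the sign convention used in the article is not stated.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t from summary statistics, with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Age row of Table 1: TBI 34.15 (SD 10.79, n = 60), PM 37.77 (SD 11.31, n = 90).
t_age, df_age = pooled_t(34.15, 10.79, 60, 37.77, 11.31, 90)
```

Under these assumptions the age comparison gives a t of about 1.96 in magnitude on 148 degrees of freedom, close to the reported value of 1.95.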

Analysis of Predictive Accuracy

Group means and standard deviations of the eight scoring methods are listed in Table 2. The validity of the two sign approaches to WRL was tested with a chi-square. The presence or absence of the sign WRL-1 (Rey's original method) was unequally distributed between

TABLE 1
Means and Standard Deviations of Group Characteristics

                              Traumatic Brain Injury     Prob. Malingerers
                                     (n = 60)                (n = 90)
                                 Mean        SD           Mean        SD
Age                             34.15     10.79          37.77     11.31
Education                       11.87      2.49          12.51      3.92
WRAT-R reading                  84.59     16.95          81.57     18.08
NAART errors                    43.31     12.24          46.21     12.49
Hospital days*                  65.80     76.85           0.08      0.27
Duration illness* (months)      47.07     45.11          25.19     18.09
WMS-R LM I*                     18.90      7.52          13.82      6.73
WMS-R LM II*                    13.25      9.38           8.78      6.44
WMS-R VR I*                     28.78      8.49          24.30      8.56
WMS-R VR II*                    19.76     11.10          16.02     10.35
AVLT total*                     40.63     11.79          34.89      9.42

*Groups significantly different at p < .001 level, two tailed. WRAT-R reading = Reading subtest standard score from the Wide Range Achievement Test-Revised, NAART = North American Adult Reading Test, WMS-R LM I = Logical Memory I from the Wechsler Memory Scale-Revised, WMS-R LM II = Logical Memory II, VR I = Visual Reproduction I, VR II = Visual Reproduction II, AVLT Total = Total words recalled from the Auditory Verbal Learning Test.


TABLE 2
Group Means and Standard Deviations of Six Scoring Approaches to Rey's Malingering Measures

                              Traumatic Brain Injury     Probable Malingerers
                                 Mean        SD           Mean        SD
Fifteen Item Memory
  Accuracy Score                12.55      3.58           9.68      3.58
  Adjusted Accuracy             12.53      3.40           9.26      4.17
  Spatial Score                 11.90      3.31           7.29      5.07
  Adjusted Spatial Score        11.88      3.38           6.87      6.04
Word Recognition List
  Accuracy Score                 8.54      2.62           5.17      2.52
  Adjusted Accuracy Score        7.34      2.96           3.98      3.02

the two groups (χ² = 27.58, p < .00001). Sign WRL-2 (Rey's original method with adjustment for false positive errors) was also distributed unequally between the TBI and PM groups (χ² = 28.86, p < .00001). Crosstabulations of group membership by sign presence, expressed in terms of sensitivity and specificity, are presented in Table 3.

Discriminant function analyses were carried out on the six remaining scoring methods. All six variables produced significant canonical correlations as tested by a chi-square (df = 1): WRL Accuracy scoring (r = .544, χ² = 51.406, p < .00001), WRL Adjusted Accuracy (r = .4836, χ² = 39.02, p < .00001), Rey-15 Accuracy (r = .370, χ² = 21.57, p < .001), Rey-15 Adjusted Accuracy (r = .3791, χ² = 22.73, p < .00001), Rey-15 Spatial Accuracy (r = .4517, χ² = 33.43, p < .00001), Rey-15 Adjusted Spatial (r = .4304, χ² = 30.01, p < .00001). Cutting scores were calculated by setting the discriminant function score to zero, multiplying the unknown raw score by the unstandardized canonical discriminant coefficient, and adding the constant. Table 3 contains the results of these analyses, expressed in terms of sensitivities, specificities, and incremental hit rates (a priori base rate for PM = 0.6). The cutting scores for the six measures treated to discriminant function analysis are presented in Table 4.
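The cutting score calculation amounts to solving 0 = b·x + c for the raw score x, where b is the unstandardized coefficient and c the constant. The sketch below shows that step together with one plausible reading of the incremental hit rate, namely the overall hit rate minus the accuracy obtained by always guessing the more frequent group. That definition is an assumption, since the text does not spell it out, and the coefficient and constant in the example are hypothetical values, not figures from the study.

```python
def cutting_score(coefficient, constant):
    """Raw score x at which D = coefficient * x + constant equals zero."""
    if coefficient == 0:
        raise ValueError("coefficient must be nonzero")
    return -constant / coefficient


def incremental_hit_rate(sensitivity, specificity, base_rate):
    """Overall hit rate minus the hit rate of guessing the majority group.

    Assumed definition; base_rate is the a priori proportion of PM cases.
    """
    hit_rate = base_rate * sensitivity + (1 - base_rate) * specificity
    return hit_rate - max(base_rate, 1 - base_rate)


# Hypothetical discriminant function D = 0.25 * x - 2.5 (illustrative values).
cut = cutting_score(0.25, -2.5)

# WRL accuracy row of Table 3: sensitivity 80%, specificity 85%, base rate 0.6.
increment = incremental_hit_rate(0.80, 0.85, 0.6)
```

With the WRL accuracy figures, this reading gives an overall hit rate of 0.82 and an increment of 0.22, which agrees with the 22% reported for that row.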

Analysis With Dense Amnesiacs Removed

Table 3 suggests some specificity levels that are unacceptably low. For example, the false positive error rate for the Rey-15 is 33%. The other specificity levels indicate false positive error rates ranging from 12% to 28%. One way to reduce false positive errors is to lower the

TABLE 3
Sensitivity, Specificity, and Incremental Hit Rates for Eight Scoring Approaches to Rey's Malingering Measures

                           Sensitivity   Specificity   Increment
Fifteen Item Memory
  Accuracy                     64%           72%           7%
  Adjusted Accuracy            60%           77%           7%
  Spatial                      69%           77%          12%
  Adjusted Spatial             69%           77%          12%
Word Recognition List
  WRL-1                        58%           88%           8%
  WRL-2                        62%           82%          12%
  Accuracy                     80%           85%          22%
  Adjusted Accuracy            72%           80%          15%


TABLE 4
Proposed Cutting Scores for Six Scoring Approaches to Rey's Methods

                           Proposed Cutting Score
Rey Fifteen Item Memory
  Accuracy
  Adjusted Accuracy
  Spatial
  Adjusted Spatial
Word Recognition List
  Accuracy
  Adjusted Accuracy

< 10
