
JOURNAL OF CLINICAL AND EXPERIMENTAL NEUROPSYCHOLOGY, 2017 VOL. 39, NO. 4, 369–383 http://dx.doi.org/10.1080/13803395.2016.1230181

The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy

Laszlo A. Erdodi(a,b), Bradley T. Tyson(b,c), Ayman G. Shahein(d), Jonathan D. Lichtenstein(b), Christopher A. Abeare(a), Chantalle L. Pelletier(b), Brandon G. Zuccato(a), Brittany Kucharski(a), and Robert M. Roth(b)

(a) Department of Psychology, University of Windsor, Windsor, ON, Canada; (b) Neuropsychology Services, Department of Psychiatry, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA; (c) Western Washington Medical Group, Everett, WA, USA; (d) The Schulich School of Medicine, Western University, London, ON, Canada

ARTICLE HISTORY: Received 1 February 2016; Accepted 3 August 2016

KEYWORDS: Embedded validity indicators; Performance validity assessment; Recognition Memory Test; Time-to-completion cutoffs; Word Choice Test

CONTACT: Laszlo A. Erdodi, [email protected], Department of Psychology, University of Windsor, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada

ABSTRACT

Introduction: The Recognition Memory Test (RMT) and Word Choice Test (WCT) are structurally similar, but psychometrically different. Previous research demonstrated that adding a time-to-completion cutoff improved the classification accuracy of the RMT. However, the contribution of WCT time-cutoffs to improving the detection of invalid responding has not been investigated. The present study was designed to evaluate the classification accuracy of time-to-completion on the WCT compared to the accuracy score and the RMT. Method: Both tests were administered to 202 adults (Mage = 45.3 years, SD = 16.8; 54.5% female) clinically referred for neuropsychological assessment, in counterbalanced order as part of a larger battery of cognitive tests. Results: Participants obtained lower and more variable scores on the RMT (M = 44.1, SD = 7.6) than on the WCT (M = 46.9, SD = 5.7). Similarly, they took longer to complete the recognition trial on the RMT (M = 157.2 s, SD = 71.8) than on the WCT (M = 137.2 s, SD = 75.7). The optimal cutoff on the RMT (≤43) produced .60 sensitivity at .87 specificity. The optimal cutoff on the WCT (≤47) produced .57 sensitivity at .87 specificity. Time-cutoffs produced comparable classification accuracies for both the RMT (≥192 s; .48 sensitivity at .88 specificity) and the WCT (≥171 s; .49 sensitivity at .91 specificity). They also identified an additional 6–10% of the invalid profiles missed by accuracy score cutoffs, while maintaining good specificity (.93–.95). Functional equivalence was reached at accuracy scores ≤43 (RMT) and ≤47 (WCT), or time-to-completion ≥192 s (RMT) and ≥171 s (WCT). Conclusions: Time-to-completion cutoffs are valuable additions to both tests. They can function as independent validity indicators or enhance the sensitivity of accuracy scores without requiring additional measures or extending standard administration time.

Forced-choice word recognition tests provide a well-established paradigm for performance validity tests (PVTs; Boone, 2013). The Word Choice Test (WCT), a relatively recent addition to the existing toolbox, is the stand-alone PVT of the Advanced Clinical Solutions (Pearson, 2009) published in conjunction with the fourth edition of the Wechsler Adult Intelligence Scale (WAIS–IV) and Wechsler Memory Scale (WMS–IV). Since it is modeled after the Warrington Recognition Memory Test (RMT; Warrington, 1984), it shares several features with its predecessor: format (50 items, binary forced-choice recognition), appearance (presented as a verbal memory test), and administration protocol (during encoding, each target word is shown for 3 s and a response from the examinee is required; then a page with 50 word pairs—one target, one foil—is presented during the recognition trial). The WCT also has a few unique features. During the encoding trial, it requires semantic categorization of each word (man-made or natural), promoting a greater depth of processing. Also, responses during this encoding phase can
be evaluated for accuracy, which might serve as an incentive for examinees to attend to the target words more closely. On the other hand, RMT instructions simply ask examinees to report whether their associations with the target word are pleasant or unpleasant. As the pleasantness rating is an inherently subjective and, thus, unverifiable response, ultimately it is not possible to determine whether the examinee is in fact attending to the task or just responding at random. In addition, the WCT requires the examiner to read the target stimulus aloud as the printed word is presented, adding a redundant encoding mechanism in the form of auditory input. Finally, the font size is larger with the WCT both during the initial stimulus presentation and on the recognition card. Given these similarities and differences, the question of equivalence has been raised. Erdodi, Kirsch, Lajiness-O’Neill, Vingilis, and Medoff (2014) examined the classification accuracy of both tests in a mixed clinical sample of 237 patients referred for neuropsychological testing. Half of the sample (n = 120) were administered the RMT, and half (n = 117) were administered the WCT. At the level of accuracy scores, the WCT was significantly easier and less variable (M = 47.1, SD = 4.6) than the RMT (M = 44.3, SD = 6.8). Correcting for this difference between the means by adding three points to the RMT accuracy score appeared to achieve functional equivalence in terms of failure rates and signal detection performance. Using the RMT ≤42 as the reference cutoff (M. S. Kim et al., 2010), an adjusted WCT cutoff of ≤45 produced essentially the same base rate of failure (BRFail; 20.8% and 20.5%), and comparable sensitivity (.40 and .46) and specificity (.87 and .94) against a composite of four PVTs. The most recent investigation comparing the two tests was performed by Davis (2014) in a sample of 46 physiatry patients who were administered both measures. At the level of accuracy scores, patients performed better on the WCT (M = 47.7, SD = 4.3) than on the RMT (M = 44.9, SD = 6.8). The majority of patients (61%) had zero PVT failures, and they obtained higher and less variable scores than the overall sample (MWCT = 49.4, SDWCT = 1.0; MRMT = 47.6, SDRMT = 2.5). In contrast, those who failed two PVTs had notably poorer and more variable performance on both the WCT (M = 42.0, SD = 10.4) and the RMT (M = 37.0,
SD = 13.5), suggesting that these measures are sensitive to invalid responding. The Davis (2014) study also offered the first empirically grounded explanations for the persisting differences in mean raw scores between the two tests. Examining the item-level characteristics of the two instruments revealed that the target words of the WCT had lower word frequency as well as higher imageability and concreteness than those of the RMT. Moreover, there were significant differences between targets and foils (target words had higher imageability and concreteness) during the recognition trial of the WCT, but not the RMT. Since these discrepant features predict that the WCT items would be easier to recognize, they provide a compelling reason for the consistently higher raw scores on the WCT.

Although the roughly 3-point mean difference between the two tests observed in these studies may suggest that, once corrected for the difference in their inherent difficulty level, the two instruments function similarly, certain questions about equivalence remain. First, it is unclear whether the results would generalize to different clinical samples using different reference PVTs. Second, as Erdodi and colleagues (Erdodi, Kirsch, et al., 2014) acknowledged, their study was based on a between-groups design, and, as such, it did not allow for a direct comparison of the two tests within the same examinee. While the study by Davis (2014) was the first to use a within-group design, his sample was relatively small and relied on the Word Memory Test as the free-standing reference PVT, which was administered to only 78% of the patients. Finally, the RMT currently has the advantage of an additional published cutoff based on time-to-completion. M. S. Kim and colleagues (2010) reported that a time of ≥207 s to complete the recognition trial is an effective emergent indicator of invalid performance on its own (.65 sensitivity at .91 specificity). This classification accuracy outperforms the accuracy cutoffs (≤42 and ≤45) associated with both the RMT and WCT in the Erdodi, Kirsch, et al. (2014) study. If this can be replicated, the time-cutoff alone would make the RMT a more attractive choice, since this feature is not available with the WCT.

The present study was conducted to address some of these questions regarding the two instruments. It was hypothesized that the mean time-to-completion for the WCT recognition trial would be
lower than that of the RMT recognition trial, in line with previous reports on the discrepancy in difficulty level reflected by mean differences in accuracy scores. In addition, we predicted that introducing a time-cutoff for the WCT would enhance its overall classification accuracy, as observed with the RMT.

Method

Participants

Archival data were collected from a consecutive series of 202 outpatients clinically referred for neuropsychological assessment by a physician between December 2012 and June 2014 at the Department of Psychiatry of a northeastern academic medical center. Only those clinical files that contained data on both the RMT and the WCT were included in the study. The majority of the sample was female (54.5%) and right-handed (87.1%). Mean age was 45.3 years (SD = 16.8). Mean level of education was 13.9 years (SD = 2.7). Mean full-scale IQ (FSIQ) was 99.0 (SD = 15.9). The main diagnoses were psychiatric (44.1%), traumatic brain injury (TBI; 37.1%), stroke (5.4%), neurological (10.4%), and general medical conditions (3%). Within the psychiatric subsample, the majority (83.6%) had two or more psychiatric diagnoses. The most common conditions were depression (77.1%) and anxiety (57.4%), followed by obsessive-compulsive disorder (OCD; 13.1%), posttraumatic stress disorder (PTSD; 13.1%), and personality disorders (11.5%). The majority of the TBI sample (85.3%) had sustained a mild head injury. The most common neurological disorder was epilepsy (61.9%), followed by Parkinson's disease (19%) and Huntington's disease (9.5%).

Materials

In addition to the RMT and WCT, patients were administered a core battery of neuropsychological tests including the Wechsler Adult Intelligence Scale–Fourth Edition (WAIS–IV; Wechsler, 2008) and Wechsler Memory Scale–Fourth Edition (WMS–IV; Wechsler, 2009), the Wide Range Achievement Test, 4th edition (WRAT–4; Wilkinson & Robertson, 2006) Reading subtest, the California Verbal Learning Test, 2nd edition (CVLT–II; Delis, Kramer, Kaplan, & Ober, 2000), a letter fluency task (FAS; Spreen & Benton, 1977),
the Wisconsin Card Sorting Test (WCST; Heaton, Chelune, Talley, Kay, & Curtiss, 1993), the Finger Tapping Test (FTT), Conners's Continuous Performance Test, 2nd edition (CPT–II; Conners, 2004), and the Rey 15-Item Test (Rey–15; Rey, 1964). Although the Rey–15 is one of the most widely used PVTs (Sharland & Gfeller, 2007), it has been criticized for low sensitivity to invalid responding (Reznek, 2005). Since it was the only free-standing PVT consistently administered in addition to the RMT and WCT, three composite measures were developed based on embedded validity indicators (EVIs) to complement it. Combining data from multiple measures administered throughout the testing session is consistent with the ideal of ongoing monitoring of test-taking effort (Boone, 2009). Moreover, aggregating multiple measures provides cumulative evidence of invalid responding, with each additional EVI failure increasing the confidence in the correct detection of noncredible performance (Larrabee, 2008).

The first composite was based on nine different PVTs and was labeled the EI-9. Only measures administered to at least 60% of the sample were included in the EI-9. If a component test was not administered to a given patient, it was counted as a Pass, so that one missing data point would not forfeit the entire composite. Each component of the EI-9 was dichotomized along published cutoffs. Some PVTs contained several indicators; failing any of them was counted as failing the entire PVT. The value of the EI-9 was obtained by summing the number of failed component PVTs. As such, it ranges from 0 (no EVI failure) to 9 (all EVIs failed). Table 1 lists the components of the EI-9 and references for the cutoffs used. The majority of the sample (55.9%) failed zero or one PVT, which is considered a Pass according to commonly accepted forensic standards (Boone, 2013). Given that the most sensitive of the available cutoffs was used on all EI-9 components, a score of two was classified as Borderline (14.4%) and was excluded from further analyses to protect against false positives. Therefore, ≥3 was considered the first level of failure. Since the EI-9 is an aggregate measure that represents a variety of sensory modalities and testing paradigms, it was used as the main reference PVT in cross-validation analyses. The EI-9 is an extension of the methodology described by Larrabee (2003), who demonstrated that a combination of five EVIs produced respectable overall classification accuracy (88–92%) separating credible and noncredible performance in a mixed sample of malingered neurocognitive dysfunction, moderate-to-severe TBI, and patients with mixed neurological and psychiatric diagnoses. More importantly, failing ≥2 EVIs was associated with perfect specificity.
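
To make the aggregation rule concrete, the sketch below implements the EI-9 logic described above in Python. It is an illustration rather than the authors' scoring code: the component labels and the example profile are hypothetical, while the missing-data rule and the Pass/Borderline/Fail bands follow the text.

```python
# Minimal sketch of the EI-9 aggregation rule described above (illustrative).
# (a) Each component PVT is dichotomized along its published cutoff.
# (b) A component that was not administered (None) counts as a Pass,
#     so one missing data point does not forfeit the entire composite.
# (c) The EI-9 is the number of failed components, ranging from 0 to 9.

def ei9_score(component_failed: dict) -> int:
    """Maps each of the nine component PVTs to True (failed),
    False (passed), or None (not administered)."""
    return sum(1 for failed in component_failed.values() if failed is True)

def ei9_classification(score: int) -> str:
    if score <= 1:
        return "Pass"        # 0-1 failures: commonly accepted forensic Pass
    if score == 2:
        return "Borderline"  # excluded to protect against false positives
    return "Fail"            # >=3: first level of failure

# Hypothetical profile: two failures, one component not administered.
profile = {"Rey-15": False, "Digit Span": True, "WCST": False,
           "WMS-IV LM": False, "WMS-IV VR": False, "CVLT-II": False,
           "RCFT": None, "FAS": True, "Animals": False}
score = ei9_score(profile)
print(score, ei9_classification(score))  # -> 2 Borderline
```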


Table 1. Base rates of failure for EI-9 components, cutoffs, and references for each indicator.

Test | BRFail | Indicator | Cutoff | Reference
Rey–15 | 12.4 | Recall | ≤9 | Lezak (1995); Boone, Salazar, Lu, Warner-Chacon, and Razani (2002)
Digit Span | 29.2 | RDS | ≤7 | Greiffenstein, Baker, and Gola (1994); Pearson (2009)
 | | ACSS | ≤6 | Axelrod et al. (2006); Spencer et al. (2013); Trueblood (1994)
 | | LDF | ≤4 | Heinly, Greve, Bianchini, Love, and Brennan (2005)
WCST | 17.3 | FMS | ≥2 | Larrabee (2003); Suhr and Boyer (1999)
 | | LRE | >1.9 | Greve, Bianchini, Mathias, Houston, and Crouch (2002); Suhr and Boyer (1999)
WMS–IV LM | 19.8 | LM I ACSS | ≤3 | Bortnik et al. (2010)
 | | LM II ACSS | ≤4 | Bortnik et al. (2010)
 | | Recognition | ≤20 | Bortnik et al. (2010); Pearson (2009)
WMS–IV VR | 19.3 | Recognition | ≤4 | Pearson (2009)
CVLT–II | 16.8 | Recognition hits | ≤10 | Bauer, Yantz, Ryan, Warden, and McCaffrey (2005); Greve, Curtis, Bianchini, and Ord (2009); Wolfe et al. (2010)
 | | FCR | ≤15 | Bauer et al. (2005); D. Delis, personal communication, May 2012; Erdodi, Kirsch, et al. (2014)
RCFT | 34.2 | Copy raw | ≤26 | Meyers and Meyers (1995); Lu, Boone, Cozolino, and Mitchell (2003); Reedy et al. (2013)
 | | 3-min raw | ≤9.5 | Lu et al. (2003); Reedy et al. (2013)
 | | Recognition TP | ≤6 | Lu et al. (2003); Reedy et al. (2013)
 | | Atyp RE | ≥1 | Blaskewitz, Merten, and Brockhaus (2009); Lu et al. (2003)
FAS | 12.4 | T score | ≤33 | Curtis, Thompson, Greve, and Bianchini (2008); Sugarman and Axelrod (2015)
Animals | 18.3 | T score | ≤33 | Hayward, Hall, Hunt, and Zubrick (1987); Sugarman and Axelrod (2015)

Note. BRFail = base rate of failure (% of the sample that failed one or more indicators within the test); RDS = reliable digit span; ACSS = age-corrected scaled score; LDF = longest digit span forward; WCST = Wisconsin Card Sorting Test; FMS = failure to maintain set; UE = unique errors; LRE = logistic regression equation; WMS–IV = Wechsler Memory Scale, 4th edition; LM = Logical Memory; VR = Visual Reproduction; CVLT–II = California Verbal Learning Test, 2nd edition; FCR = Forced-Choice Recognition; RCFT = Rey Complex Figure Test; TP = recognition true positives; Atyp RE = atypical recognition errors.

Considering the duality of accuracy scores and time-to-completion within the RMT and WCT, two additional validity composites were developed to more accurately match the underlying abilities measured by each of these indicators. The first one was based exclusively on five EVIs nested within the recognition memory paradigm and was labeled EI-5REC. The second one was constructed using five processing-speed-based EVIs and was therefore labeled EI-5PSP. Components of the EI-5s were recoded onto a 4-point scale. A score reflecting an incontrovertible Pass was assigned the value of zero. Failing the most conservative cutoff was assigned the value of three, with the values one and two indicating intermediate levels of failure. This model of validity assessment recognizes the dual nature of cognitive effort: categorical (passing or failing a cutoff) and dimensional (the extent to which an indicator was failed). As such, the EI-5s simultaneously monitor both the number and severity of PVT failures. This approach has been shown to produce classification accuracy comparable to well-established stand-alone PVTs (Erdodi, Tyson, et al., 2016; Erdodi, Roth, Kirsch, Lajiness-O'Neill, & Medoff, 2014).

Table 2 lists the components of the EI-5s, the BRFail at each of the three levels of failure, and references for the cutoffs used. As cross-validation analyses require a dichotomous reference PVT, the continuous scale of the EI-5s was divided into a passing and a failing range. The first two levels were considered a Pass, since they indicate at most one failure at the most liberal cutoff. EI-5 scores of two and three are harder to interpret, as they could reflect either a single failure at the most conservative cutoff (which by itself would be insufficient evidence to render the entire profile invalid) or multiple failures at the most liberal cutoff (the clinical meaning of which is often unclear). This indeterminate range of performance was labeled Borderline and was excluded from further analyses in the interest of establishing diagnostically pure categories on the reference PVTs, following previously published methodological guidelines (Axelrod, Meyers, & Davis, 2014; Greve & Bianchini, 2004; Sugarman & Axelrod, 2015). Table 3 provides frequency distributions and classification ranges for both versions of the EI-5.

The EI-9 and the EI-5s were specifically developed to provide calculated method variance. Given the engineered differences among them, they represent various approaches to performance validity assessment nested in the same basic methodology as that established in clinical research (i.e., using a combination of individual dichotomized PVTs and the Pass/Fail pattern across multiple indicators to determine the credibility of a neurocognitive profile). As such, their simultaneous use allows one to model potential instrumentation artifacts while cross-validating target measures. Converging results across the composites would indicate stable parameter estimates. Conversely, diverging results may suggest that the target construct is volatile and vulnerable to random fluctuations in the reference PVT.
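
The four-point recoding can be expressed compactly. The sketch below is illustrative: it uses the WMS–IV Logical Memory recognition bands from Table 2 as the worked component and the Pass/Borderline/Fail ranges from Table 3; the example profile is hypothetical.

```python
# Minimal sketch of the EI-5 scoring model described above (illustrative).
# Each component is recoded 0-3: 0 = incontrovertible Pass, 3 = failure at
# the most conservative cutoff; the composite is the sum of the five values.

def recode_lm_recognition(raw: int) -> int:
    # Bands from Table 2 (WMS-IV Logical Memory recognition):
    # >20 -> 0; 18-20 -> 1; 17 -> 2; <=16 -> 3.
    if raw > 20:
        return 0
    if raw >= 18:
        return 1
    if raw == 17:
        return 2
    return 3

def ei5_classification(total: int) -> str:
    # Ranges from Table 3: 0-1 = Pass; 2-3 = Borderline (excluded); >=4 = Fail.
    if total <= 1:
        return "Pass"
    if total <= 3:
        return "Borderline"
    return "Fail"

# Hypothetical profile: LM recognition of 17 (level 2) plus one level-3 failure.
component_levels = [recode_lm_recognition(17), 3, 0, 0, 0]
total = sum(component_levels)
print(total, ei5_classification(total))  # -> 5 Fail
```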


Table 2. The components of the EI-5 with base rates of failure corresponding to each cutoff. Cell entries are score ranges, with base rates (%) in parentheses.

EI-5REC component | EI-5 value 0 | 1 | 2 | 3
LM Recognition | >20 (84.7) | 18–20 (8.9) | 17 (2.5) | ≤16 (4.0)
VPA Recognition | >35 (85.1) | 32–35 (8.4) | 28–29 (4.5) | ≤27 (2.0)
VR Recognition | >4 (83.2) | 4 (7.4) | 3 (4.0) | ≤2 (5.4)
FCR (CVLT–II) | 16 (85.6) | 15 (4.6) | 14 (3.1) | ≤13 (6.7)
RCFT Recognition TP | >6 (86.7) | 6 (6.1) | 4 (4.4) | ≤3 (2.8)

EI-5PSP component | EI-5 value 0 | 1 | 2 | 3
FTT Failures | 0 (92.7) | 1 (5.8) | 2 (1.5) | — (—)
FAS | >33 (86.1) | 32–33 (5.9) | 28–31 (4.0) | ≤27 (4.0)
Animals | >33 (81.7) | 25–33 (9.9) | 21–24 (4.5) | ≤20 (4.0)
CPT–II Failures | 0 (71.8) | 1 (12.4) | 2 (4.5) | ≥3 (11.4)
CD ACSS | >5 (85.8) | 5 (4.1) | 4 (6.1) | ≤3 (4.1)

Note. EI-5REC = recognition-memory-based EI-5; EI-5PSP = processing-speed-based EI-5; LM = Logical Memory (Bortnik et al., 2010; Pearson, 2009); VPA = Verbal Paired Associates (Pearson, 2009); VR = Visual Reproduction (Pearson, 2009); FCR (CVLT–II) = California Verbal Learning Test, 2nd Edition, Forced-Choice Recognition (Bauer et al., 2005; D. Delis, personal communication, May 2012; Erdodi, Kirsch, et al., 2014; Erdodi, Roth, et al., 2014); RCFT Recognition TP = Rey Complex Figure Test recognition true positives (Lu et al., 2003; Reedy et al., 2013); FTT Failures = Finger Tapping Test, number of scores at ≤35/28 dominant hand and ≤66/58 combined mean raw scores (Arnold et al., 2005; Axelrod et al., 2014); FAS = letter fluency T score (Curtis et al., 2008; Sugarman & Axelrod, 2015); Animals = category fluency T score (Sugarman & Axelrod, 2015); CPT–II Failures = Conners's Continuous Performance Test, 2nd edition, number of T scores >70 on omissions, hit reaction time standard error, variability, and perseverations (Erdodi, Roth, et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010); CD ACSS = Coding age-corrected scaled score (Etherton, Bianchini, Heinly, & Greve, 2006; N. Kim et al., 2010; Trueblood, 1994).

Table 3. Frequency distribution of the EI-5s with classification ranges.

Score | EI-5REC f | % | Cumulative % | EI-5PSP f | % | Cumulative % | Classification
0 | 116 | 57.4 | 57.4 | 96 | 47.5 | 47.5 | Pass
1 | 22 | 10.9 | 68.3 | 37 | 18.3 | 65.8 | Pass
2 | 22 | 10.9 | 79.2 | 22 | 10.9 | 76.7 | Borderline
3 | 19 | 9.4 | 88.6 | 26 | 12.9 | 89.6 | Borderline
4 | 8 | 4.0 | 92.6 | 3 | 1.5 | 91.1 | Fail
5 | 2 | 1.0 | 93.6 | 5 | 2.5 | 93.6 | Fail
6 | 6 | 3.0 | 96.5 | 3 | 1.5 | 95.0 | FAIL
7 | 1 | 0.5 | 97.0 | 2 | 1.0 | 96.0 | FAIL
8 | 2 | 0.5 | 98.0 | 5 | 2.5 | 98.5 | FAIL
9 | 2 | 1.0 | 99.0 | 1 | 0.5 | 99.0 | FAIL

Note. EI-5REC = recognition-memory-based validity composite; EI-5PSP = processing-speed-based validity composite.

Procedure

The tests were administered by staff psychometricians or postdoctoral fellows under the supervision of clinical neuropsychologists. The RMT and the WCT were administered to patients following standard instructions in counterbalanced order, once in the morning at the beginning and once in the afternoon near the end of the testing appointment. The average time elapsed between the two administrations was 5.0 hours (SD = 1.4). The time taken to complete the recognition trials was recorded for both tests. Data were collected from a consecutive series of referrals through a retrospective chart review using the clinical archives of the neuropsychology service at a northeastern academic medical center. The project was approved by the Institutional Review Board. Ethical guidelines regulating research with human participants were followed throughout the study.

Data analysis

Descriptive statistics (frequency, percentage, mean, standard deviation) were computed for the variables of interest. The main inferential analyses were within-subjects t tests, F tests for two-sample variances, and area under the curve (AUC). All significance tests were two-tailed. Effect size estimates, sensitivity (SENS), and specificity (SPEC) were calculated using standard formulas. SENS is
the probability that an individual is classified as having a condition of interest given that he or she has that trait (proportion of correct detections). On the other hand, SPEC is the probability that an individual is classified as not having a condition of interest given that he or she does not have that trait (proportion of correct rejections). Finally, two logistic regression classifiers were developed to predict performance validity based on EI-9, EI-5REC, and EI-5PSP criteria, using time-to-completion and accuracy score for both the RMT and WCT.
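
To illustrate these standard formulas with a worked example (the counts below are hypothetical and chosen only to mirror the magnitude of the results reported later):

```python
# SENS = TP / (TP + FN): proportion of truly invalid profiles that are flagged.
# SPEC = TN / (TN + FP): proportion of truly valid profiles that pass.

def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple:
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: 50 invalid profiles (30 flagged, 20 missed) and
# 100 valid profiles (87 passed, 13 incorrectly flagged).
sens, spec = sens_spec(tp=30, fn=20, tn=87, fp=13)
print(round(sens, 2), round(spec, 2))  # -> 0.6 0.87
```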

Results

Accuracy score distributions were J-shaped on both the RMT and WCT. As Figure 1 illustrates, the frequency of RMT scores drops steadily as the scale slides from right to left, with small spikes at 45 and 43. In contrast, the WCT scores show a precipitous drop in frequency from 50 to 47, plateau between 45 and 47, and then reach the asymptote. The scale was truncated at 32, the upper limit of mathematically defined chance-level performance. Patients obtained significantly higher and less variable accuracy scores on the WCT (M = 46.9, SD = 5.7; median = 49) than on the RMT (M = 44.1, SD = 7.6; median = 47), with a medium effect (d = 0.52). Likewise, they completed the recognition trial faster on the WCT (M = 137.2 s, SD = 75.7) than they did on the RMT (M = 157.2 s, SD = 71.8), with a small effect (d = 0.31).

Next, the classification accuracy of eight accuracy score cutoffs (RMT ≤44 through ≤37 and WCT ≤41 through ≤48) was calculated against the reference PVTs. The RMT produced the highest AUC (.73) against the EI-9 at ≤43. This cutoff had .60 SENS at .87 SPEC. AUC values were notably higher against the EI-5REC, peaking at .90, associated with the ≤42 cutoff, which produced .88 SENS at .91 SPEC. AUC values were lower against the EI-5PSP, and while they peaked at ≤43 (.69), the first cutoff to reach the minimum acceptable SPEC (.84; Larrabee, 2003) was ≤42. On the WCT, AUC peaked at ≤48 against all three reference PVTs. However, this cutoff consistently failed to reach .84 SPEC, so it received no further consideration. The next level of failure (≤47) produced a good combination of SENS (.57) and SPEC (.87) against the EI-9 as well as against the EI-5REC (.87 SENS at .86 SPEC). Classification accuracy was markedly lower against the EI-5PSP: the first cutoff to reach minimum SPEC (.86) was ≤45, at .33 SENS. Further details are displayed in Table 4.

Given that completion time does not lend itself to natural demarcation lines the same way raw scores do, cutoffs were established in reference to five BRFail values: 30%, 25%, 20%, 15%, and 10%. The RMT time cutoff ≥192 s (25% BRFail) was the first one to reach the minimum SPEC standard (.88) at .48 SENS, even though in terms of AUC it is equivalent to the cutoff associated with the 30% BRFail (≥177 s). It narrowly missed the benchmark SPEC against the EI-5REC (.83), but performed well against the EI-5PSP (AUC = .71; .57 SENS at .86 SPEC).
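
Because the time cutoffs were anchored to target base rates of failure rather than to natural score breakpoints, deriving one amounts to reading a percentile off the completion-time distribution. A sketch of that derivation, with hypothetical data:

```python
# Derive a time cutoff that a target proportion of the sample fails
# (e.g., a 25% BRFail corresponds to the 75th percentile of completion times).

def time_cutoff_for_brfail(times_s: list, target_brfail: float) -> float:
    """Return the completion time (s) at or above which approximately
    `target_brfail` of the sample falls (i.e., fails the cutoff)."""
    ordered = sorted(times_s)
    index = int((1.0 - target_brfail) * len(ordered))
    return ordered[index]

# Hypothetical completion times in seconds:
times = [95, 110, 120, 133, 140, 150, 158, 170, 190, 240, 300, 350]
cutoff = time_cutoff_for_brfail(times, 0.25)
print(cutoff)  # -> 240; the slowest 3 of 12 examinees (25%) fail at >= 240 s
```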

Figure 1. Frequency polygon of Recognition Memory Test (RMT) and Word Choice Test (WCT) raw scores between 32 and 50. To view a color version of this figure, please see the online issue of the Journal.


Table 4. SENS and SPEC of various RMT and WCT accuracy score cutoffs against the reference PVTs. (Criterion BRFail: EI-9 = 34.7; EI-5REC = 14.3; EI-5PSP = 10.4.)

Test | Cutoff | BRFail | EI-9 AUC | SENS | SPEC | EI-5REC AUC | SENS | SPEC | EI-5PSP AUC | SENS | SPEC
RMT | ≤37 | 14.9 | .65** | .33 | .97 | .82** | .70 | .95 | .58 | .25 | .92
RMT | ≤38 | 16.8 | .67** | .38 | .96 | .84** | .74 | .94 | .65* | .38 | .92
RMT | ≤39 | 18.3 | .68** | .42 | .95 | .86** | .78 | .94 | .64* | .38 | .90
RMT | ≤40 | 20.3 | .68** | .43 | .92 | .87** | .83 | .91 | .66* | .43 | .89
RMT | ≤41 | 21.8 | .69** | .47 | .91 | .87** | .83 | .91 | .65* | .43 | .87
RMT | ≤42 | 24.3 | .71** | .52 | .89 | .90** | .88 | .91 | .66* | .47 | .85
RMT | ≤43 | 28.2 | .73** | .60 | .87 | .89** | .91 | .86 | .69** | .57 | .81
RMT | ≤44 | 33.2 | .71** | .62 | .81 | .85** | .91 | .79 | .67* | .57 | .76
WCT | ≤41 | 11.4 | .65** | .30 | .99 | .82** | .65 | .99 | .53 | .14 | .92
WCT | ≤42 | 12.9 | .67** | .35 | .98 | .84** | .70 | .98 | .58 | .24 | .92
WCT | ≤43 | 14.4 | .67** | .37 | .98 | .86** | .73 | .98 | .57 | .24 | .91
WCT | ≤44 | 15.3 | .68** | .38 | .97 | .86** | .74 | .97 | .59 | .29 | .90
WCT | ≤45 | 19.3 | .69** | .43 | .94 | .84** | .74 | .94 | .60 | .33 | .86
WCT | ≤46 | 23.8 | .71** | .50 | .91 | .87** | .83 | .91 | .60 | .38 | .81
WCT | ≤47 | 29.2 | .72** | .57 | .87 | .86** | .87 | .86 | .62 | .48 | .76
WCT | ≤48 | 39.6 | .73** | .68 | .78 | .86** | .95 | .76 | .65* | .62 | .68

Note. EI-9 = validity composite based on nine independent embedded indicators (Pass ≤1, Fail ≥3); EI-5REC = recognition-memory-based validity composite (Pass ≤1, Fail ≥4); EI-5PSP = processing-speed-based validity composite (Pass ≤1, Fail ≥4); BRFail = base rate of failure (% of sample that failed the test at the given cutoff); RMT = Recognition Memory Test–Words; WCT = Word Choice Test; AUC = area under the curve (overall classification accuracy); SENS = sensitivity; SPEC = specificity.

On the WCT, the highest AUC against the EI-9 was produced by the time cutoff associated with 30% BRFail (≥156 s), resulting in .56 SENS at .86 SPEC. Despite having the highest AUC (.80) against the EI-5REC, it narrowly missed the benchmark SPEC (.83). However, the next level of cutoff (≥171 s, 25% BRFail) produced a good combination of SENS (.68) and SPEC (.87). This cutoff also produced the best combination of SENS (.48) and SPEC (.84) against the EI-5PSP.

To examine the unique contribution of time-to-completion, the subsample that passed the accuracy cutoff but failed the time cutoff was reevaluated. Within the RMT, 21 patients failed the ≥192-s time cutoff, but passed the ≤43 accuracy score cutoff. The majority of them (95.2%) failed ≥2 independent PVTs; 85.7% failed ≥3, and 47.6% failed ≥4 PVTs. Within the WCT, 14 patients failed the ≥171-s time cutoff, but passed the ≤47 accuracy cutoff. The majority of these patients (92.9%) failed ≥2 independent PVTs; 85.7% failed ≥3, and 71.4% failed ≥4 PVTs.

Next, we examined the effect of combining accuracy scores and time-to-completion to impose more stringent cutoffs (Table 5). Requiring both an accuracy score >43 and a time-to-completion <192 s to pass the RMT, and both an accuracy score >47 and a time-to-completion <171 s to pass the WCT, increased SENS while maintaining high SPEC. The classification accuracy of the LREs, which combine accuracy score and time-to-completion into a single probability of invalid responding, was then evaluated across a range of cutoffs. The lowest RMTLRE cutoff to meet the minimum threshold for SPEC against all three criterion PVTs was ≥.40, but the more liberal cutoff of ≥.25 also produced high SPEC (.93–.94) against the EI-5s. The RMTLRE achieved .91 SPEC against the EI-5REC even at ≥.15, the lowest cutoff investigated (Table 6).
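
A sketch of the combined decision rule evaluated here; the cutoffs are those reported above, while the function and variable names are illustrative:

```python
# Combined validity check: a profile passes only if the accuracy score is
# above the accuracy cutoff AND the recognition trial beats the time cutoff.

CUTOFFS = {
    "RMT": {"min_accuracy": 44, "max_time_s": 191},  # fail if <=43 or >=192 s
    "WCT": {"min_accuracy": 48, "max_time_s": 170},  # fail if <=47 or >=171 s
}

def passes_combined_cutoff(test: str, accuracy: int, time_s: float) -> bool:
    c = CUTOFFS[test]
    return accuracy >= c["min_accuracy"] and time_s <= c["max_time_s"]

# Hypothetical examinee: WCT accuracy of 48 completed in 150 s -> Pass.
print(passes_combined_cutoff("WCT", accuracy=48, time_s=150))  # -> True
# The same accuracy score with a 200-s completion time fails on time alone.
print(passes_combined_cutoff("WCT", accuracy=48, time_s=200))  # -> False
```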

Similar trends were observed with the WCTLRE. The lowest cutoff to meet the minimum threshold for SPEC against all three criterion PVTs was ≥.40, but the more liberal cutoff of ≥.20 also produced high SPEC (.93–.96) against the EI-5s. As with the RMTLRE, the WCTLRE achieved high SPEC (.95) against the EI-5REC even at the most liberal cutoff investigated (≥.15). This discrepancy in signal detection profile across criterion PVTs is likely driven by the variability in the BRFail associated with a given cutoff. Against the EI-9, both LREs achieved the Larrabee limit (.50 SENS and .90 SPEC; Erdodi, Kirsch, et al., 2014; Lichtenstein, Erdodi, & Linnea, 2016) between ≥.40 and ≥.55, with BRFail around 20–30%. A similar balance in SENS and SPEC was achieved against the EI-5PSP at lower cutoffs (≥.15–.25). The EI-5REC was a notable exception, as both LREs produced a classification accuracy that was consistently above the Larrabee limit across the full range of cutoffs examined. Conversely, the WCTLRE never truly achieved the Larrabee limit against the EI-5PSP. Since choosing a more sensitive cutoff failed to reliably increase SENS against the EI-5s, the clinical utility of the LREs was further evaluated by examining cases that passed both accuracy score and time-to-completion cutoffs, but failed the LRE ≥.35.

Table 6. P(INV), SENS, and SPEC across a range of cutoffs on the logistic regression equation.

Test | P(INV) | EI-9 BRFail | SENS | SPEC | EI-5REC BRFail | SENS | SPEC | EI-5PSP BRFail | SENS | SPEC
RMT | ≥.15 | 75.3 | .95 | .37 | 21.8 | .83 | .91 | 31.2 | .62 | .82
RMT | ≥.20 | 60.4 | .87 | .57 | 18.8 | .74 | .93 | 22.3 | .57 | .89
RMT | ≥.25 | 48.0 | .77 | .69 | 16.3 | .70 | .94 | 17.3 | .48 | .93
RMT | ≥.30 | 40.1 | .73 | .78 | 14.4 | .70 | .96 | 13.9 | .43 | .96
RMT | ≥.35 | 33.2 | .66 | .83 | 13.4 | .68 | .97 | 12.4 | .43 | .96
RMT | ≥.40 | 29.7 | .58 | .85 | 13.4 | .68 | .97 | 11.4 | .38 | .97
RMT | ≥.45 | 25.3 | .49 | .88 | 11.9 | .68 | .99 | 9.9 | .33 | .97
RMT | ≥.50 | 21.3 | .44 | .92 | 11.9 | .68 | .99 | 6.9 | .29 | .97
RMT | ≥.55 | 19.3 | .41 | .93 | 10.9 | .64 | .99 | 5.5 | .24 | .99
RMT | ≥.60 | 17.8 | .37 | .94 | 10.4 | .59 | .99 | 3.5 | .14 | .99
RMT | ≥.65 | 16.8 | .34 | .94 | 9.9 | .55 | 1.00 | 2.5 | .14 | .99
WCT | ≥.15 | 75.7 | .97 | .38 | 18.3 | .74 | .95 | 23.3 | .43 | .84
WCT | ≥.20 | 55.5 | .85 | .64 | 16.8 | .70 | .96 | 13.4 | .19 | .93
WCT | ≥.25 | 45.5 | .77 | .75 | 15.4 | .70 | .96 | 8.9 | .14 | .95
WCT | ≥.30 | 39.1 | .73 | .81 | 13.4 | .70 | .97 | 6.9 | .10 | .96
WCT | ≥.35 | 34.7 | .64 | .83 | 13.4 | .68 | .97 | 4.5 | .10 | .96
WCT | ≥.40 | 28.2 | .53 | .87 | 13.4 | .68 | .97 | 3.5 | .10 | .97
WCT | ≥.45 | 26.7 | .51 | .88 | 13.4 | .68 | .97 | 3.5 | .10 | .98
WCT | ≥.50 | 23.8 | .48 | .90 | 12.9 | .68 | .97 | 2.5 | .10 | .99
WCT | ≥.55 | 21.3 | .44 | .93 | 12.4 | .64 | .99 | 2.0 | .10 | .99
WCT | ≥.60 | 18.8 | .41 | .96 | 10.9 | .59 | .99 | 2.0 | .10 | 1.00
WCT | ≥.65 | 16.8 | .37 | .97 | 10.4 | .59 | .99 | 1.5 | .05 | 1.00

Note. EI-9 = validity composite based on nine independent embedded indicators (Pass ≤1, Fail ≥3); EI-5REC = recognition-memory-based validity composite (Pass ≤1, Fail ≥4); EI-5PSP = processing-speed-based validity composite (Pass ≤1, Fail ≥4); BRFail = base rate of failure; P(INV) = probability of invalid performance as defined by the given criterion; RMTLRE = logistic regression equation combining Recognition Memory Test–Words accuracy score and time-to-completion to estimate the probability of invalid responding; WCTLRE = logistic regression equation combining Word Choice Test accuracy score and time-to-completion to estimate the probability of invalid responding; SENS = sensitivity; SPEC = specificity.


Only one patient fell in this category on the RMT: a 45-year-old male with a complex history (childhood abuse and neglect, multiple TBIs and psychiatric hospitalization, repeated arrests for drunk driving). He obtained an accuracy score of 44 with a time-to-completion of 171 s and failed five independent EVIs. Therefore, his profile can be considered invalid and the RMTLRE a correct detection. Three patients passed both accuracy score and time-to-completion cutoffs, but failed the WCTLRE ≥.35. They obtained scores just on the passing side of both cutoffs (48 and 146–148 s) and failed at least two other PVTs. Therefore, their profiles can be considered invalid and the WCTLRE a correct detection.

As shown by the individual scatter plots in Figure 2, the majority of cases clustered in the lower right corner, indicating that they produced both high accuracy scores and low time-to-completion. In other words, their clinical classification is an unequivocal Pass. Conversely, the distribution fans out to the left and upwards, indicating that patients with invalid performance tend to have lower accuracy scores and higher time-to-completion. As expected based on existing research on the psychometric properties of the two tests, the WCT distribution was less variable and was compressed around the area of valid performance, whereas the RMT scores were more scattered. This was especially pronounced against the EI-5PSP, which may explain the markedly low SENS of the WCTLRE.

Discussion

The present study investigated the psychometric and functional equivalence of the RMT and WCT. The mean difference in accuracy scores between the tests (2.8) was the same as that reported by Davis (2014) and Erdodi and colleagues (Erdodi, Kirsch, et al., 2014). The convergence of existing data suggests that an approximately 3-point difference between the RMT and WCT reflects an inherent discrepancy in difficulty level, likely due to the combination of differences in stimulus presentation as well as the word frequency, concreteness, and imageability of the words used in each test. Likewise, the ratio of the variability within the distributions was comparable to the RMT/WCT ratio of standard deviations reported by previous
investigations (Davis, 2014: 6.8/4.3; Erdodi, Kirsch, et al., 2014: 6.8/4.6). These findings suggest that the distributional properties of the two tests are remarkably stable across settings. Thus, they likely reflect inherent differences in their psychometric properties.

As predicted, mean time-to-completion was shorter for the WCT than for the RMT. On average, patients took 20 s less to complete the WCT recognition trial. However, the effect size (d = 0.31) was smaller than that observed with accuracy score differences (d = 0.52). This finding (slower completion time on the RMT) provides further evidence that the RMT is a more difficult test than the WCT.

As shown by previous investigators, both instruments are sensitive to invalid responding as defined by other PVTs. Patients who passed the EI-9 produced markedly higher and less variable accuracy score distributions than those who failed it. This may indicate less consistent test-taking effort or, alternatively, different malingering strategies (Cottingham, Victor, Boone, Ziegler, & Zeller, 2014) or variable success rates in avoiding detection by patients in the invalid subsample (Erdodi, Kirsch, et al., 2014). Interestingly, the fundamental differences in test characteristics (higher and less variable scores on the WCT than on the RMT) were preserved even across the valid/invalid demarcation line, providing further evidence that they are stable properties of these instruments, likely due to the systematic differences in item content described by Davis (2014).

The optimal accuracy score cutoff on the RMT was ≤43, at a BRFail of 28.2%, correctly identifying 73% of the sample characterized as valid or invalid by the EI-9, with .60 SENS at .87 SPEC. This classification accuracy is lower than the numbers associated with the ≤42 cutoff (.89 SENS at .92 SPEC) in the study by M. S. Kim and colleagues (2010). The discrepancy may be partly due to a much greater divide in accuracy scores between the credible and noncredible groups in their study (47.2 vs. 31.9) than in our sample (47.0 vs. 39.0). The difference is apparently driven by the higher frequency of extremely low scores in their sample, which in turn makes it easier to separate the criterion groups. On the WCT, the optimal accuracy score cutoff was ≤47, at a BRFail of 29.2%, achieving an overall classification accuracy of 72% against the EI-9, with .57 SENS at .87 SPEC.

Figure 2. Scatter plots, sensitivity (SENS), and specificity (SPEC) produced by the logistic regression equations combining the accuracy scores and time-to-completion against the EI-9, EI-5REC, and EI-5PSP. PVT = performance validity test.


While the two accuracy score distributions are separated by a consistently replicated 2.8-point difference, their BRFail and classification accuracy are synchronized around the cutoffs of RMT ≤43 and WCT ≤47, respectively, at a 4-point difference. Although this gap is slightly narrower, and the cutoffs are more liberal, than the values (RMT ≤42 and WCT ≤45) reported by Erdodi, Kirsch, et al. (2014), the optimal LRE cutoffs had BRFail similar to the earlier study (i.e., hovering around 20%). These findings illustrate the importance of distinguishing between psychometric and functional equivalence when comparing similar instruments. Despite apparent differences in difficulty level, once the cutoffs are adjusted to synchronize the classification accuracy, the two tests perform similarly as PVTs.

In terms of time-to-completion, the optimal cutoff on the RMT was ≥192 s, correctly identifying 68% of the sample characterized as valid or invalid by the EI-9, with .48 SENS at .88 SPEC. This classification accuracy is also lower than the numbers associated with the ≥207-s cutoff (.66 SENS at .91 SPEC) in the study by M. S. Kim and colleagues (2010). Again, the discrepancy likely reflects a much greater divide in time-to-completion between the credible and noncredible groups in their study (131.7 s vs. 296.9 s) than in our sample (133.8 s vs. 201.4 s). As before, the difference is clearly driven by the higher frequency of extremely long completion times in their sample, which improves the signal detection properties by virtue of a better separation between targets and nontargets. The optimal time cutoff on the WCT was ≥171 s, correctly identifying 70% of the sample characterized as valid or invalid by the EI-9, with .49 SENS at .91 SPEC.

Time cutoffs appear to be an equally valuable addition to both tests. Although they performed slightly below accuracy score cutoffs matched for BRFail, they can function as effective independent validity indicators. Also, they make unique contributions to improving the overall classification accuracy of both tests. Specifically, when used in combination with accuracy scores, time cutoffs increase SENS by 6% (WCT) to 10% (RMT), while maintaining high levels of SPEC (.93–.95). Given the relatively low frequency of extreme values, however, cutoffs based on time-to-completion are vulnerable to chance variation and may be sample-specific. Therefore, instead of reifying the numbers produced by the present analyses, a more
ecologically valid approach might be to introduce a general rule of thumb that a completion time approaching 3 minutes on either the RMT or the WCT raises questions about the credibility of the performance.

Comparing the reference PVTs with one another revealed a complex pattern of shifting signal detection properties as a function of cognitive domain (recognition memory vs. processing speed), test (RMT vs. WCT), cutoff, and type of score (accuracy vs. time-to-completion). In terms of accuracy scores, both tests performed notably better against the EI-5REC than against the other two reference PVTs. Also, they performed the worst against the EI-5PSP. This pattern is consistent with previous studies reporting that processing-speed-based PVTs are less sensitive to noncredible performance than recognition-memory-based indicators (Inman & Berry, 2002; M. S. Kim et al., 2010; N. Kim et al., 2010; Trueblood, 1994). However, within the EI-5PSP, RMT cutoffs consistently outperformed WCT cutoffs matched for BRFail.

The divergence in SENS and SPEC across the three reference PVTs serves as a reminder that the choice of criterion influences the signal detection performance of the measure being calibrated (Erdodi, Abeare, et al., 2016). Such evidence of instrumentation artifacts echoes Bigler's (2012, 2015) concerns about the cross-validation methodology used in PVT research in general, and specifically its reliance on arbitrary criterion measures. In the absence of a true gold standard PVT, strategic methodological pluralism appears to be the best defense against such criticisms (Boone, 2013; Larrabee, 2012).

In terms of time-to-completion, RMT cutoffs produced slightly, but consistently, better classification accuracy against the EI-5PSP than against both the EI-5REC and the EI-9. The opposite pattern was observed with WCT time cutoffs: They had the highest AUC values against the EI-5REC, followed by the EI-9 and the EI-5PSP. Surprisingly, the markedly different BRFail between the EI-9 and the EI-5s had little apparent effect on classification accuracy. Since AUC is a quantitative summary of the concordance rate between two dichotomous instruments, part of the pattern described above follows a predictable trend. One would expect a higher level of agreement between scales that measure similar constructs, such as the number of words correctly recognized (i.e., RMT or WCT accuracy scores) and a composite based on five recognition memory tests (i.e., the EI-5REC),
or time-to-completion and a composite based on five processing speed tests (i.e., the EI-5PSP). Conversely, a lower level of agreement would be predicted when there is a mismatch between the constructs measured by the two instruments. While that was generally the case, a few inexplicable deviations from this expected pattern were also observed. This cautions researchers and clinicians against drawing definitive conclusions from a limited number of observations, regardless of how strong the relationship between the given variables appears to be. Until the mechanisms behind such fluctuation across measurement models are better understood, the best defense against spurious findings may be methodological pluralism (Boone, 2013; Larrabee, 2008). Examining the same construct from multiple angles will, at the minimum, alert the investigator to inconsistencies potentially indicative of additional measurement error that may not be apparent from a unidimensional analysis, but that will likely limit the internal and external validity of the statistical inferences.

Finally, a multivariate approach combining both accuracy scores and time-to-completion into a single indicator produced results that were consistent with the univariate analyses. In addition to supplying further evidence that the RMT and WCT can reliably differentiate between credible and noncredible responding, the LREs also provide an alternative method to evaluate performance validity beyond the traditional accuracy score or even the novel time-cutoff. More importantly, LREs have the capability to recognize the additive effect of repeated subthreshold performance as sufficient evidence for invalid responding, resulting in overall classification accuracy that is superior to the sum of their components (Sugarman & Axelrod, 2015; Wolfe et al., 2010). As such, they can make a unique contribution to performance validity assessment.

There was a strong linear relationship between incremental changes in both raw score and completion time cutoffs and AUC values. Specifically, as the cutoffs shifted across the conservative (low SENS, high SPEC) to liberal (high SENS, low SPEC) continuum, they tended to produce increasingly better overall classification accuracy. Nevertheless, the score associated with the minimum acceptable SPEC was consistently chosen over the one with the highest AUC. Although this overt beta-bias (i.e., prioritizing true negatives over true positives) is a core principle behind performance validity assessment both in the calibration and in the clinical interpretation of PVTs (Larrabee,
2012), concerns about false-positive errors persist (Bigler, 2012, 2015). The present analyses serve as yet another reminder that protecting examinees from false-positive errors is a foundational feature of PVT research. Therefore, subjective post hoc attempts to discount a PVT failure on the grounds of imperfect specificity reveal either a lack of awareness or deliberate ignorance of this built-in mechanism to minimize false positives while knowingly sacrificing sensitivity.

Overall, both the RMT and the WCT seem to be effective tools to discriminate between valid and invalid responding. Once their cutoffs are matched for BRFail, they produce comparable classification accuracy, using both accuracy scores and time-to-completion. The performance of the optimal cutoffs hovered around the Larrabee limit (.50 SENS at .90 SPEC).

Inevitably, the study has a number of weaknesses. Its sample size was smaller than that of previous studies (Erdodi, Kirsch, et al., 2014; M. S. Kim et al., 2010). Further, while the use of a mixed clinical sample has ecological validity for many neuropsychology practices, it might constrain the generalizability of the findings, as it remains unclear whether these tests will behave the same way when applied to other, more homogeneous populations in terms of diagnosis (TBI, depression, neurological disorders), demographic characteristics (age, education, race, geographic region), or assessment context (clinical vs. forensic). Finally, there were no data available from a robust stand-alone PVT to reference alongside the EIs.

The present study meaningfully contributes to the knowledge base of clinical neuropsychology by providing a head-to-head comparison between the two tests in a large clinical sample. By virtue of its within-subject design, it controlled for interindividual variability, which is a major confound in validity assessment. Also, it introduced and cross-validated a time cutoff for the WCT, a feature that was previously reserved for the RMT. Finally, it demonstrated that the choice of reference PVT influences the classification accuracy of the instrument under evaluation. The methodological and forensic implications of such instrumentation artifacts are far reaching and deserve more systematic investigation in the future.

Disclosure statement

No potential conflict of interest was reported by the authors.


References

Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPherson, S. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105–120. doi:10.1080/13854040490888567
Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., & Wertheimer, J. C. (2006). Detecting incomplete effort with digit span from the Wechsler Adult Intelligence Scale – Third Edition. The Clinical Neuropsychologist, 20, 513–523. doi:10.1080/13854040590967117
Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger tapping test performance as a measure of performance validity. The Clinical Neuropsychologist, 28(5), 876–888. doi:10.1080/13854046.2014.907583
Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D., & McCaffrey, R. J. (2005). An examination of the California Verbal Learning Test II to detect incomplete effort in a traumatic brain-injury sample. Applied Neuropsychology, 12, 202–207. doi:10.1207/s15324826an1204_3
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9, 421–444. doi:10.1007/s11682-015-9409-1
Bigler, E. D. (2012). Symptom validity testing, effort and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 632–642. doi:10.1017/S1355617712000252
Blaskewitz, N., Merten, T., & Brockhaus, R. (2009). Detection of suboptimal effort with the Rey Complex Figure Test and recognition trial. Applied Neuropsychology, 16, 54–61. doi:10.1080/09084280802644227
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23(4), 729–741. doi:10.1080/13854040802427803
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York, NY: Guilford.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002). The Rey 15-item recognition trial: A technique to enhance sensitivity of the Rey 15-item memorization test. Journal of Clinical and Experimental Neuropsychology, 24(5), 561–573. doi:10.1076/jcen.24.5.561.1004
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Victor, T. L., & Zeller, M. A. (2010). Examination of various WMS–III Logical Memory scores in the assessment of response bias. The Clinical Neuropsychologist, 24(2), 344–357. doi:10.1080/13854040903307268
Conners, K. C. (2004). Conners' Continuous Performance Test (CPT–II). Version 5 for Windows. Technical guide and software manual. North Tonawanda, NY: Multi-Health Systems.
Cottingham, M. E., Victor, T. L., Boone, K. B., Ziegler, E. A., & Zeller, M. (2014). Apparent effect of type of compensation seeking (disability vs. litigation) on performance validity test scores may be due to other factors. The Clinical Neuropsychologist, 28(6), 1030–1047. doi:10.1080/13854046.2014.951397
Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini, K. J. (2008). Verbal fluency indicators of malingering in traumatic brain injury: Classification accuracy in known groups. The Clinical Neuropsychologist, 22, 930–945. doi:10.1080/13854040701563591
Davis, J. J. (2014). Further consideration of Advanced Clinical Solutions Word Choice: Comparison to the Recognition Memory Test – Words and classification accuracy in a clinical sample. The Clinical Neuropsychologist, 28(8), 1278–1294. doi:10.1080/13854046.2014.975844
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (2000). California Verbal Learning Test (2nd ed.). San Antonio, TX: The Psychological Corporation.
Erdodi, L. A., Kirsch, N. L., Lajiness-O'Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7(3), 255–263. doi:10.1007/s12207-014-9197-8
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29(5), 456–466. doi:10.1093/arclin/acu026
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2016). WAIS-IV processing speed scores as measures of non-credible responding – The third generation of embedded performance validity indicators. Psychological Assessment. Advance online publication. doi:10.1037/pas0000319
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material – A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120. doi:10.1007/s12207-016-9254-6
Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W. (2006). Pain, malingering, and performance on the WAIS-III Processing Speed Index. Journal of Clinical and Experimental Neuropsychology, 28, 1218–1237. doi:10.1080/13803390500346595
Greiffenstein, M. F., Baker, J. W., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218–224. doi:10.1037/1040-3590.6.3.218
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19(4), 533–541. doi:10.1016/j.acn.2003.08.002
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction with the Wisconsin Card Sorting Test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16(2), 179–191. doi:10.1076/clin.16.2.179.13241
Greve, K. W., Curtis, K. L., Bianchini, K. J., & Ord, J. S. (2009). Are the original and second edition of the California Verbal Learning Test equally accurate in detecting malingering? Assessment, 16(3), 237–248. doi:10.1177/1073191108326227
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can localised brain impairment be simulated on neuropsychological test profiles? Australian and New Zealand Journal of Psychiatry, 21, 87–93. doi:10.3109/00048678709160904
Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., & Curtiss, G. (1993). Wisconsin Card Sorting Test manual: Revised and expanded. Odessa, FL: Psychological Assessment Resources.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A. (2005). WAIS digit-span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444. doi:10.1177/1073191105281099
Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1–23. doi:10.1093/arclin/17.1.1
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., . . . Zeller, M. A. (2010). The Warrington Recognition Memory Test for Words as a measure of response bias: Total score and response time cutoffs developed on "real world" credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70. doi:10.1093/arclin/acp088
Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C. (2010). Sensitivity and specificity of a digit symbol recognition trial in the identification of response bias. Archives of Clinical Neuropsychology, 25, 420–428. doi:10.1093/arclin/acq040
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., & French, L. M. (2013). Clinical utility of the Conners' Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25(2), 339–352. doi:10.1037/a0030915
Larrabee, G. J. (2012). Assessment of malingering. In G. J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach (2nd ed.). New York, NY: Oxford University Press.
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17(3), 410–425. doi:10.1076/clin.17.3.410.18089
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 410–425. doi:10.1080/13854040701494987
Lezak, M. D. (2004). Neuropsychological assessment. Oxford: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2016). Introducing a forced-choice recognition task to the California Verbal Learning Test – Children's Version. Child Neuropsychology. Advance online publication. doi:10.1080/09297049.2015.1135422
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers recognition trial in the detection of suspect effort. The Clinical Neuropsychologist, 17(3), 426–440. doi:10.1076/clin.17.3.426.18083
Meyers, J. E., & Meyers, K. R. (1995). Rey Complex Figure Test under four different administration procedures. The Clinical Neuropsychologist, 9(1), 63–67.
Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners' Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32(4), 380–387. doi:10.1080/13803390903066881
Pearson. (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-IV – Technical manual. San Antonio, TX: Author.
Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H., Victor, T. L., . . . Wright, M. J. (2013). Cross validation of the Lu and colleagues (2003) Rey-Osterrieth Complex Figure Test effort equation in a large known-group sample. Archives of Clinical Neuropsychology, 28, 30–37. doi:10.1093/arclin/acs106
Rey, A. (1964). L'examen clinique en psychologie [Clinical examination in psychology]. Paris: Presses Universitaires de France.
Reznek, L. (2005). The Rey 15-item memory test for malingering: A meta-analysis. Brain Injury, 19(7), 539–543. doi:10.1080/02699050400005242
Sharland, M. J., & Gfeller, J. D. (2007). A survey of neuropsychologists' beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22, 213–223. doi:10.1016/j.acn.2006.12.004
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable digit span is no more accurate than age-corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mTBI. The Clinical Neuropsychologist, 27(8), 1362–1372. doi:10.1080/13854046.2013.845248
Spreen, O., & Benton, A. L. (1977). Neurosensory Center Comprehensive Examination for Aphasia. Victoria, BC: Neuropsychology Laboratory, University of Victoria.
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult, 22(2), 141–146. doi:10.1080/23279095.2013.873439
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21(5), 701–708.
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 14(4), 597–607.
Warrington, E. K. (1984). Recognition Memory Test manual. Berkshire, UK: NFER-Nelson.
Wilkinson, G. S., & Robertson, G. (2006). Wide Range Achievement Test (4th ed.) (WRAT-4). Lutz, FL: Psychological Assessment Resources.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV). San Antonio, TX: Pearson.
Wechsler, D. (2009). Wechsler Memory Scale – Fourth Edition (WMS–IV) technical and interpretive manual. San Antonio, TX: Pearson.
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24(1), 153–168. doi:10.1080/13854040903107791


Appendix

Components of the logistic regression equations (LREs)

$$P(\text{Invalidity} \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}$$

where β0, β1, and β2 are the LRE-derived parameters, and x1 and x2 represent the accuracy score and time-to-completion, respectively.

Test | Criterion | β0 | β1 | β2
RMT | EI-9 | 5.2617 | −0.1588 | 0.0070
RMT | EI-5REC | 17.2877 | −0.4074 | −0.0131
RMT | EI-5PSP | −2.2876 | −0.0369 | 0.0124
WCT | EI-9 | 14.0900 | −0.3340 | 0.0092
WCT | EI-5REC | 27.5578 | −0.6147 | −0.0078
WCT | EI-5PSP | 2.1967 | −0.0914 | 0.0020

Note. RMT = Recognition Memory Test; WCT = Word Choice Test.
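
The appendix equation can be evaluated directly. The sketch below uses the tabled coefficients and assumes the conventional negative exponent shown in the reconstructed formula above; the example inputs are hypothetical.

```python
import math

# LRE coefficients from the appendix table: (beta0, beta1, beta2).
LRE_COEFFICIENTS = {
    ("RMT", "EI-9"):    ( 5.2617, -0.1588,  0.0070),
    ("RMT", "EI-5REC"): (17.2877, -0.4074, -0.0131),
    ("RMT", "EI-5PSP"): (-2.2876, -0.0369,  0.0124),
    ("WCT", "EI-9"):    (14.0900, -0.3340,  0.0092),
    ("WCT", "EI-5REC"): (27.5578, -0.6147, -0.0078),
    ("WCT", "EI-5PSP"): ( 2.1967, -0.0914,  0.0020),
}

def p_invalid(test: str, criterion: str, accuracy: float, time_s: float) -> float:
    """P(Invalidity | x1, x2) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2))),
    with x1 = accuracy score and x2 = time-to-completion (s)."""
    b0, b1, b2 = LRE_COEFFICIENTS[(test, criterion)]
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * accuracy + b2 * time_s)))

# Hypothetical examinee: RMT accuracy of 40 in 200 s, scored against the EI-9.
p = p_invalid("RMT", "EI-9", accuracy=40, time_s=200)
print(round(p, 2))  # -> 0.58, above the P(INV) >= .35 cutoff discussed in the text
```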
