This article was downloaded by: ["Queen's University Libraries, Kingston"] On: 30 September 2013, At: 08:34 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Applied Neuropsychology: Child Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hapc20

WISC-IV Unusual Digit Span Performance in a Sample of Adolescents with Learning Disabilities

Allyson G. Harrison & Irene Armstrong

Regional Assessment and Resource Centre, Queen's University, Kingston, Ontario, Canada

Published online: 30 Sep 2013.

To cite this article: Allyson G. Harrison & Irene Armstrong, Applied Neuropsychology: Child (2013): WISC-IV Unusual Digit Span Performance in a Sample of Adolescents with Learning Disabilities, Applied Neuropsychology: Child

To link to this article: http://dx.doi.org/10.1080/21622965.2012.753570


APPLIED NEUROPSYCHOLOGY: CHILD, 0: 1–9, 2013 Copyright © Taylor & Francis Group, LLC ISSN: 2162-2965 print/2162-2973 online DOI: 10.1080/21622965.2012.753570

WISC-IV Unusual Digit Span Performance in a Sample of Adolescents With Learning Disabilities

Allyson G. Harrison and Irene Armstrong Regional Assessment and Resource Centre, Queen’s University, Kingston, Ontario, Canada

Accurate identification of symptom magnification is essential when determining whether or not obtained test data are valid or interpretable. Apart from using freestanding symptom validity tests, many researchers encourage use of embedded measures of test-related motivation, including ones derived from the Digit Span subtest of the Wechsler scales. Such embedded measures are based on identification of performance patterns that are implausible if the test taker is investing full effort; however, it is unclear whether or not persons with preexisting cognitive difficulties such as specific learning disabilities (LD) might be falsely accused of poor test motivation due to actual but impaired working-memory skills. This study examined the specificity of such measures by reviewing performance of 86 adolescents with LDs on three measures embedded in the Wechsler Intelligence Scale for Children-Fourth Edition, namely Digit Span (DS), Vocabulary–DS differences, and Reliable Digit Span (RDS) scores. Results indicate that while RDS is likely insensitive to impairments associated with LD, other DS measures may have an unacceptably high false-positive rate, especially if Canadian normative data are used to calculate scores.

Key words: adolescents, assessment, Digit Span, effort testing, embedded measures, symptom validity

It is now expected practice to include symptom validity tests (SVTs) when completing a neuropsychological assessment with an adult (American Academy of Clinical Neuropsychology, 2007; Bush et al., 2005; Harrison, Edwards, & Parker, 2007, 2008). In an effort to ensure that the assessment is not too lengthy, clinicians may often employ symptom validity measures embedded in the Wechsler Intelligence tests rather than use multiple freestanding SVTs as a means of identifying low test-taking effort or noncredible symptoms (Babikian & Boone, 2007; Babikian, Boone, Lu, & Arnold, 2006). In doing so, however, it is essential that the false-positive rate be minimal so that individuals with bona-fide impairments are not wrongfully accused of deceit.

Address correspondence to Allyson G. Harrison, Regional Assessment and Resource Centre, Queen’s University, Mackintosh-Corry Hall, 68 University Ave., Kingston, ON K7L 3N6, Canada. E-mail: [email protected]

The scaled score of the Wechsler Digit Span (DS) subtest from the Wechsler Adult Intelligence Scale (WAIS)-Fourth Edition (Wechsler, 2008) has proven effective as a means of detecting exaggerated or feigned performance in a variety of adult populations (e.g., Greve et al., 2007; Iverson & Franzen, 1996; Trueblood, 1994). In addition, Mittenberg and his colleagues (Mittenberg, Theroux-Fichera, Zielinski, & Heilbronner, 1995) recommended assessing DS performance relative to performance on the Vocabulary (VOC) subtest of the Wechsler scales. Millis, Ross, and Ricker (1998) further supported use of this discrepancy as an embedded SVT and noted that persons believed to be malingering were accurately differentiated from those with true brain injuries 79% of the time when employing the VOC–DS difference score. In examining the normative data from the WAIS-Third Edition (WAIS-III), Iverson and Tulsky (2003) determined appropriate cutoff scores to maximize sensitivity and specificity of these two indexes. They proposed that “a scaled [Digit Span] score of 5, 4, or less” “and/or a Vocabulary–Digit Span difference score of 5 or 6 (or greater)” are indicative of a possible negative response bias (Iverson & Tulsky, 2003, p. 7).

The raw scores from the DS subtest have also provided a useful index of response bias. Indeed, Greiffenstein, Baker, and Gola (1994) found that the raw scores for the longest digit strings forward and backward were also sensitive to exaggeration or feigning, a concept that has been expanded upon as an SVT by a number of researchers. The Reliable Digit Span (RDS) score, as it has come to be called, is calculated by summing the longest forward and the longest backward digit strings for which both trials of the DS test were recalled without error, and it has demonstrated strong specificity and sensitivity (Mathias, Greve, Bianchini, Houston, & Crouch, 2002). In fact, RDS scores of 6 or less are said to be rare or nonexistent in nonmalingering patients (Greve et al., 2010; Heinly, Greve, Bianchini, Love, & Brennan, 2004), and many research studies have identified the sensitivity and specificity of various RDS cutoff scores when used to determine noncredible performance in various complainant groups (e.g., Etherton, Bianchini, Ciota, & Greve, 2005; Greve et al., 2010; Greve et al., 2007; Heinly et al., 2004; see also Schroeder, Twumasi-Ankrah, Baade, & Marshall, 2012, for a review of the research on RDS when used to assess various populations). Overall, it appears that the DS subtest, alone or in combination, may offer clinicians a reliable embedded method for determining suboptimal effort.

Despite the promise of DS as a useful embedded measure of effort, it is important to ensure that the false-positive rate of any DS-derived measure is not elevated in persons who have genuine neurological impairments.
For instance, Harrison, Rosenblum, and Currie (2010) presented evidence that large VOC–DS score differences may be common in young adults with true neurological impairments such as a learning disability (LD) or attention-deficit hyperactivity disorder (ADHD) and that even low DS scores were not uncommon in this population. As such, care must be taken to ensure that unusual scores on DS measures truly reflect noncredible performance behavior.

Although adult assessments now typically include measures of effort and motivation, it is only recently that clinicians have been urged to include such measures in assessment of children and adolescents (Green & Flaro, 2003; Salekin, Kubak, & Lee, 2007). Indeed, until recently, it was widely held that children did not engage in noncredible behavior during psychological or neuropsychological assessments (Kirkwood, Yeates, Randolph, & Kirk, 2012; Salekin et al., 2007). This is despite research conducted more than 20 years ago that showed that low effort or avoidance of disliked tasks could negatively influence children’s performance on achievement tests and result in inaccurate diagnoses (Adelman, Lauber, Nelson, & Smith, 1989). Many recent studies have demonstrated that children are indeed capable of feigning cognitive impairment during a neuropsychological evaluation (e.g., Flaro & Boone, 2009; Kirkwood, Hargrave, & Kirk, 2011; Kirkwood, Kirk, Blaha, & Wilson, 2010; McCaffrey & Lynch, 2009) and that parental coaching or persuasion can also produce false or exaggerated symptoms in such assessments (e.g., Lu & Boone, 2002). Given that clinicians are not able to identify suboptimal effort accurately using clinical judgment alone (Faust, Hart, & Guilmette, 1988; Faust, Hart, Guilmette, & Arkes, 1988), clinicians are now beginning to include SVTs in their evaluation of children and adolescents.

Very little research has been published regarding accurate identification of response bias in this age group. Further, the research that exists has typically examined performance on freestanding, as opposed to embedded, SVTs. For example, Blaskewitz, Merten, and Kathmann (2008) found that the use of cutoff scores derived from adult research may not be appropriate for use with younger children. One recent study examining the use of DS-derived measures as embedded SVTs (Kirkwood et al., 2011) found that both the DS scaled score and RDS have excellent specificity when employing cutoff scores of 5 or less for the former measure and 6 or less for the latter measure. Neither was particularly sensitive to noncredible performance. Further research is needed to determine whether adult-derived criteria for interpreting unusual DS-related scores can be applied to data from children.

One group of children who may normally produce low scores on DS-related measures includes those with specific LDs. Research has demonstrated that children with LDs are at higher risk for experiencing disabling accidents relative to the general population, and when injured, they are more likely to suffer a permanent injury of some type (Sigmundsson, 2005; Uzi & Mona, 2006).
Hence, children with such preexisting disorders may often be referred for neuropsychological assessment in litigious circumstances where secondary gain is inferred or expected. No research to date, however, has investigated the premorbid performance of such individuals on these DS indicators, specifically investigating how frequently children or adolescents with LDs produce extremely low DS or RDS scores, or if their VOC–DS scores are abnormally large relative to the normal population. Given research suggesting that children with LDs have specific impairments in working-memory functions (McCallum et al., 2006; The Psychological Corporation, 1997), it stands to reason that individuals with such preexisting disabilities may normally produce scores on the DS subtest that could be misinterpreted as the results of low effort or exaggeration. Further, given that WAIS-III scores were consistently lower when employing the Canadian norms for this test (Iverson, Lange, Viljoen, & Brink, 2006) and that use of Canadian normative data from the Wechsler Intelligence Scale for Children (WISC) typically produces lower scores for children with LDs relative to American normative data (Beal, Dumont, Cruse, & Branche, 1996), it is not clear whether younger respondents with LDs might be misclassified as investing low effort if clinicians employ Canadian as opposed to American norms. The question, therefore, is how clinicians can accurately identify adolescents with LDs who are not investing maximal effort in testing, while also minimizing the risk for misattributing poor but honest performance to low motivation/effort.

The purpose of the present study was to assess the influence of severe reading and learning difficulties on DS performance. We therefore examined DS performance in a group of seventh-grade students with identified LDs. These students were participating in a transition program, a component of which involved updated psychoeducational testing to identify their specific learning needs as they made the transition to secondary school. Given that these students already had academic accommodations in place within their schools and given that each student had a documented history of learning problems dating back to the early grades, it was felt that they were an ideal LD group on whom to evaluate this question as they had little external incentive to feign or exaggerate their learning problems. The goal of this study was to investigate whether three different DS-based indexes (low DS, high VOC–DS, and low RDS) might potentially misidentify students with known disabilities as possible malingerers or whether they were insensitive to the underlying neurological impairments in persons with such diagnoses. Classification accuracy was investigated for each of the DS-based indexes, allowing for comparison of accuracy across these scores.
DS and VOC scaled scores were calculated using both the Canadian and American norms for the WISC-Fourth Edition (WISC-IV; Wechsler, 2003a, 2003b) to examine whether use of the Canadian norms altered the classification of students. The results are presented in frequency tables, which can be referenced easily in clinical practice.
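As a rough illustration of how the three embedded indexes described in this introduction are scored, the following Python sketch computes an RDS score from trial-level results and applies the cutoffs quoted above. The function names, the trial-record format (span length mapped to a pair of pass/fail results), and the dictionary keys are hypothetical choices made for illustration; the cutoffs are simply those cited from Iverson and Tulsky (2003) and the RDS literature, and nothing here reflects the scoring software used in the study.

```python
from typing import Dict, Tuple

# Cutoffs quoted in the text; illustrative defaults, not clinical guidance.
DS_SCALED_CUTOFF = 5   # DS scaled score of 5 or less
VOC_DS_CUTOFF = 5      # VOC minus DS scaled-score difference of 5 or greater
RDS_CUTOFF = 6         # RDS of 6 or less

# Assumed record format: span length -> (trial 1 correct, trial 2 correct)
Trials = Dict[int, Tuple[bool, bool]]


def longest_reliable_span(trials: Trials) -> int:
    """Longest span length at which BOTH trials were recalled without error."""
    passed = [length for length, (t1, t2) in trials.items() if t1 and t2]
    return max(passed, default=0)


def reliable_digit_span(forward: Trials, backward: Trials) -> int:
    """RDS: longest reliable forward span plus longest reliable backward span."""
    return longest_reliable_span(forward) + longest_reliable_span(backward)


def embedded_flags(ds_scaled: int, voc_scaled: int, rds: int) -> Dict[str, bool]:
    """Return which of the three DS-based indexes fall beyond their cutoffs."""
    return {
        "low_ds": ds_scaled <= DS_SCALED_CUTOFF,
        "high_voc_ds": (voc_scaled - ds_scaled) >= VOC_DS_CUTOFF,
        "low_rds": rds <= RDS_CUTOFF,
    }
```

For example, a student who passes both trials up to four digits forward and two digits backward would obtain an RDS of 6 under this sketch. Because the DS and VOC scaled scores depend on the normative table used, the first two flags can differ between Canadian and American norms, whereas the RDS flag cannot.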

METHOD

Participants

Participants were 86 adolescents (70% male) with a mean age of 12.2 years (SD = 0.55) who completed psychoeducational assessments at a university assessment center between 2008 and 2010; 65 were given a final diagnosis of an LD (Table 1). These students were referred by their schools to participate in a comprehensive program designed to assist students with LD with their transition from elementary to secondary school. All students had been assessed and identified under the Ontario Ministry of Education criteria for exceptionality as “learning disabled”¹ some time prior to fifth grade. As shown in Table 1, adolescents in the current sample were of average intelligence (Mean Full-Scale IQ, Canadian Norms = 92.2, SD = 9.5; American Norms = 96.5, SD = 9.7). All students participated in three testing sessions lasting approximately 3 hr each during the course of 2 to 3 weeks. The current sample did not include students with any other known neurological or behavioral problems apart from possible comorbid ADHD.

¹ The Ontario Ministry of Education classifies individuals as learning disabled as “having a learning disorder evident in both academic and social situations that involves one or more of the processes necessary for the proper use of spoken language or the symbols of communication, and that is characterized by a condition that: 1. Is not primarily the result of impairment of vision, impairment of hearing, physical disability, developmental disability, primary emotional disturbance, or cultural difference; 2. Results in a significant discrepancy between academic achievement and assessed intellectual ability, with deficits in one or more of the following: receptive language (listening, reading), language processing (thinking, conceptualizing, integrating), expressive language (talking, spelling, writing), or mathematical computations; and 3. May be associated with one or more conditions diagnosed as: a perceptual handicap, a brain injury, minimal brain dysfunction, dyslexia, developmental aphasia.” (http://snow.idrc.ocad.ca/index.php?option=com_content&task=view&id=40&Itemid=53)

TABLE 1
Demographics of the Sample of Students With Learning Disabilities Excluding Students Who Failed Criterion A of an SVT

Dyslexic Students
                 n           Mean           SD
Sex (male)       50 (73%)
Age (years)      69          12.2           .55
FSIQ             68          92.2/96.5      9.5/9.7
GAI              68          99.1/102.3     11.0/10.9

Note. Paired values are given as Canadian/U.S.; Canadian scores are bold-faced in the original, and U.S. scores are not. FSIQ = Full-Scale IQ; GAI = General Ability Index; SVT = Symptom Validity Test.

Measures

Students in the transition program were administered a comprehensive test battery (see the Appendix) that included the WISC-IV (Wechsler, 2003a, 2003b). The WISC-IV is a widely used standardized test of cognitive ability for children and adolescents. It consists of 15 subtests that address a variety of cognitive areas. Scores for the DS and VOC subtests were reported using both the American (Wechsler, 2003a) and Canadian (Wechsler, 2003b) normative data for this test.

Currently, protocols for assessing children for a possible LD do not typically include use of SVTs. However, it was important in the present study to ensure, as much as possible, that participants had invested maximal effort during the assessment. As such, participants in the present study were screened using the criteria proposed by Slick, Sherman, and Iverson (1999). Criterion A states that an individual can only be identified as malingering if a substantial external incentive is present. In the present study, we sought to minimize the incentive for malingering by assuring the students and their parents that the child’s school-based classification and accommodations would not change as a result of our assessment. Hence, we felt that the chances of meeting Criterion A were minimal. Criterion B states that when other necessary diagnostic criteria are met, then below-chance performance on symptom validity (neuropsychological) tests is sufficient for a diagnosis of definite malingering.

One such measure that is frequently administered as an SVT is the Green Word Memory Test (WMT; Green, 2003). The WMT was specifically designed to measure the degree of effort put forth during testing by asking participants to learn a list of 20 words and recall them in a variety of formats (Bauer, O’Bryant, Lynch, McCaffrey, & Fisher, 2007). Flaro, Green, and Robertson (2007) found that participants of significantly lower cognitive ability showed better performance on the WMT than did those who were of higher intelligence who had incentive to feign impairment. This supports the use of the WMT as a valid measure of effort rather than ability. In the current study, the WMT was administered to all students in an effort to evaluate symptom credibility (see Flaro et al., 2007, and Kirkwood et al., 2010, for more detailed descriptions of the WMT). Briefly, individuals are shown a list of word pairs on a computer screen and are then asked to recognize all words immediately and after a delay of 30 min. In addition, the consistency between the immediate and delayed responses is measured.
A score of less than 83% correct on any of these validity indexes of the test is usually considered a failure of Criterion A of the WMT (Green & Flaro, 2003). Although there is evidence to support that children with severe phonological decoding deficits may perform poorly on this test (Green & Flaro, 2003; Larochette & Harrison, 2012), we wanted to ensure that the scores being evaluated in this study were reflective of good effort and motivation. As such, 10 participants were placed into a “questionable effort” group as they performed below the published threshold on one or more of the first three indicators. Even though all but 1 of these adolescents produced a genuine memory impairment profile (GMIP; Green, Flaro, & Courtney, 2009) and would thus not usually be classified by the WMT as malingering (i.e., if known to have a genuine neurological impairment that interfered with their performance, then they would not be classified as investing low effort), we felt that any student with questionable validity status should not be included in the main analyses; their data are reported separately.

In keeping with the criteria for malingered neurocognitive dysfunction outlined by Slick et al. (1999), assessments also included evaluation of consistency, specifically whether or not the current performance was consistent with the documented history of the individual; whether the pattern of reported symptoms was consistent with known patterns of neurological impairment in those with LDs; whether their reported symptoms were consistent with behavior both in and outside the test environment; and whether or not their symptoms were consistent with those reported by collateral informants. Part of the assessment process also involved review of old report cards or other objective documentation from childhood (such as previous assessments or other medical reports) and observer evaluations completed by parents and teachers. No student was removed from the study due to inconsistent data. In addition, students were informed at the outset of the evaluation that they needed to put forth their best effort and that failure to do so might result in an inability to participate in all of the components of the transition program.

Procedure

Students with a previous identification of a specific LD were referred by their schools to participate in a research project designed to assist in their successful transition from elementary to secondary school curricula. An updated psychoeducational assessment was provided as part of the research program to help the students better understand their specific learning needs, and where appropriate, to inform school personnel regarding the interventions and supports suggested for each student once they began high school. The assessment did not occur within the school, and so the results were not shared with school personnel without the express permission of the student and his/her family.
Two local school boards were informed that a total of 25 students from each board could participate in the program each year, and each board was provided with application forms to distribute to students in Grade 7 with a longstanding LD identification. Only students whose initial assessment had occurred more than 3 years previously were invited to participate. Students came with their parents for an initial intake interview and provided their informed consent to participate in the study at this time and allow their test data to be used for research purposes. Students were administered a full battery of tests to determine their current level of functioning, including the WISC-IV and the WMT. Due to technical errors, 7 participants did not complete the WMT. Given that we could not be certain of their effort status, their scores were also not included in the main analysis.
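The screening rule described above, under which students were placed in the main analysis or in the “questionable” group, can be sketched as a small function. This is only an assumed encoding of the rule as stated in the text (a score below 83% on any of the three primary WMT validity indexes, or a missing WMT, leads to exclusion from the main analysis); the function name, group labels, and input format are hypothetical.

```python
from typing import Optional, Sequence

# Threshold quoted in the text: less than 83% correct counts as a failure.
WMT_PASS_THRESHOLD = 83.0


def effort_group(wmt_scores: Optional[Sequence[float]]) -> str:
    """Assign a participant to the "main" or "questionable" group.

    wmt_scores holds percent-correct values for the three primary validity
    indexes (assumed here to be immediate recognition, delayed recognition,
    and consistency), or None when the WMT was not completed.
    """
    if wmt_scores is None:
        return "questionable"  # effort status could not be established
    if any(score < WMT_PASS_THRESHOLD for score in wmt_scores):
        return "questionable"  # below threshold on at least one index
    return "main"
```

Applied to the counts reported in this study, a rule of this kind would send the 10 students who failed an index and the 7 who were not given the WMT to the separately reported group, leaving 69 in the main analysis.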

Data Analysis

Scores on the DS, VOC–DS, and RDS were sorted for summary in a frequency table. Subsequent single-sample t test analyses compared students who performed below cutoffs with the standardized mean for each test.
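The two analysis steps named above can be sketched in a few lines of Python using only the standard library. The helper names and the sample values in the usage note are hypothetical, and the reference mean is left as a parameter because it depends on the test's score metric; this is a minimal sketch of a cumulative frequency table and a one-sample t statistic, not the authors' analysis script.

```python
import math
from collections import Counter
from statistics import mean, stdev
from typing import Dict, Sequence


def cumulative_percent_at_or_below(scores: Sequence[int]) -> Dict[int, int]:
    """Percentage of the sample scoring at or below each observed value."""
    counts = Counter(scores)
    total = len(scores)
    running = 0
    table = {}
    for value in sorted(counts):
        running += counts[value]
        table[value] = round(100 * running / total)
    return table


def one_sample_t(sample: Sequence[float], reference_mean: float) -> float:
    """t statistic comparing a sample mean with a fixed reference mean.

    Uses the sample standard deviation (n - 1 denominator); the statistic
    has len(sample) - 1 degrees of freedom.
    """
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - reference_mean) / standard_error
```

For a hypothetical set of ten scaled scores such as `[4, 5, 5, 7, 8, 8, 9, 10, 10, 12]`, the first function reports that 30% of the sample scored 5 or less. The VOC–DS table runs in the opposite direction (at or above each difference score), which can be obtained by negating the scores before tabulating.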

RESULTS

Of the 86 participants, 10 failed at least one of the three primary validity indexes on the WMT, and the WMT was not completed for 7 additional students. As noted, all but 1 of the 10 students who failed one of the first three WMT subtests produced a GMIP that the test classifies as a false positive. Even though all participants had a history of significant reading impairment and continued to obtain scores on measures of word decoding that were at or below the 1st percentile for their age (see Larochette & Harrison, 2012), scores from these subjects (WMT failures and those who were not given this test) are reported separately in an effort to ensure that the scores reviewed in the present study were unequivocally from adolescents who invested full effort.

DS and VOC–DS performance were evaluated based on the recommendations of Iverson and Tulsky (2003). Data on DS performance were calculated using both the American and Canadian normative data for the WISC-IV and are presented in Table 2, excluding those in the “questionable” group (i.e., those who either failed or were not given the WMT). Noteworthy discrepancies were observed depending on the normative data employed. Indeed, scaled scores of 5 or less on the DS were somewhat uncommon in this sample (17% in total) when using the American normative data, but 26% of the sample was identified as investing low effort when using the Canadian norms. Data on DS performance for those in the “questionable” group are presented in Table 3. As may be seen, this group had an almost identical rate of DS failure to that of the main subject pool.

TABLE 2
Cumulative Percentages of Participants With LD at or Below Scores as Indicated Including Only Students Who Passed the Word Memory Test (n = 69)

Digit Span
DS ≤      Can     U.S.
3           3       1
4          11       6
5          26      17
6          37      31
7          68      43
8          77      73
9          87      84
10         93      90
11         99      97
12         99     100
13        100     100

Vocabulary–Digit Span Difference
VOC–DS ≥  Can     U.S.
9           1       1
8           6       6
7           7       7
6          13      14
5          23      21
4          31      31
3          44      44
2          61      60
1          71      73
0          80      86
–1         91      91
–2         94      93
–3         96      97
–4        100     100

Reliable Digit Span
RDS ≤     %
6           1
7          13
8          39
9          67
10         89
11         99
12         99
13        100

Note. Canadian scores beyond recommended cutoffs are bold-faced.

VOC–DS difference scores were calculated separately for students who passed the SVT and for those in the “questionable” group. Difference scores of 5 or more were fairly common and occurred in 21% of the total group (using normative data from America) and 23% (Canadian norms) of those with a final diagnosis of LD. Difference scores of 6 or greater were also fairly common, with 14% and 13% of the sample achieving such scores (based on American and Canadian norms, respectively). Of the participants in the questionable group, only 17% (American norms) and 11% (Canadian norms) returned a score of 5, and none had scores greater than 5 (Table 3).

RDS scores were evaluated based on Greiffenstein and colleagues’ (1994) recommendations and are presented in Table 2, excluding those in the “questionable” group. Scores of 6 or less were almost nonexistent in this sample and occurred in only 1% of the participants with LDs. RDS scores for those in the “questionable” group are presented in Table 3 and show that none of the students in this group scored at or below this level.
Students who scored beyond the cutoffs on the DS and the VOC–DS scores were also below average on measures of word reading, pseudoword decoding, and numerical operations on the Wechsler Individual Achievement Test-Second Edition (WIAT-II; The Psychological Corporation, 2001) and on measures of visual matching on the Woodcock-Johnson-Third Edition (WJ-III; Woodcock, McGrew, & Mather, 2001), with the students beyond the VOC–DS criterion also performing significantly below average on the Letter–Word Identification and Reading Fluency subtests of the WJ-III (all t tests, p < .001).

TABLE 3
Cumulative Percentages at or Below Scores Shown for Participants Who Either Failed Criterion A of the Word Memory Test (n = 10) or Who Were Not Given the Word Memory Test (n = 7)

Digit Span
DS ≤      Can     U.S.
5          22      17
6          44      28
7          56      50
8          72      61
9          83      78
10         89      89
11        100     100

Vocabulary–Digit Span Difference
VOC–DS ≥  Can     U.S.
6           —       —
5          11      17
4          28      28
3          44      39
2          50      50
1          61      61
0          67      78
–1         83      89
–2         94      94
–3        100     100

Reliable Digit Span
RDS ≤     %
6           —
7          11
8          39
9          61
10         83
11         94
12        100

Note. Canadian scores beyond recommended cutoffs are bold-faced.
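Because the main sample passed a freestanding SVT and was treated as investing full effort, the percentages beyond each cutoff in Table 2 can be read as false-positive rates, and specificity follows as their complement. The short Python sketch below makes that arithmetic explicit using the percentages reported in Table 2; the dictionary layout and function names are hypothetical, and the head counts are approximations back-calculated from rounded percentages rather than figures reported by the authors.

```python
# Percentage of the main sample (n = 69) falling beyond each cutoff,
# copied from Table 2. Keys are (index and cutoff, norms used).
FALSE_POSITIVE_PERCENT = {
    ("DS <= 5", "Canadian"): 26,
    ("DS <= 5", "U.S."): 17,
    ("DS <= 4", "Canadian"): 11,
    ("DS <= 4", "U.S."): 6,
    ("VOC-DS >= 5", "Canadian"): 23,
    ("VOC-DS >= 5", "U.S."): 21,
    ("VOC-DS >= 6", "Canadian"): 13,
    ("VOC-DS >= 6", "U.S."): 14,
    ("RDS <= 6", "raw score"): 1,
}


def specificity_percent(false_positive_percent: float) -> float:
    """Specificity is the complement of the false-positive rate."""
    return 100 - false_positive_percent


def approximate_count(percent: float, n: int = 69) -> int:
    """Rough head count implied by a rounded percentage (an approximation)."""
    return round(percent * n / 100)


def specificity_table() -> dict:
    """Specificity (%) for every cutoff and norm set listed above."""
    return {key: specificity_percent(value)
            for key, value in FALSE_POSITIVE_PERCENT.items()}
```

Under these assumptions, a DS cutoff of 5 or less corresponds to a specificity of about 74% with Canadian norms and 83% with American norms, whereas an RDS cutoff of 6 or less corresponds to about 99%.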

DISCUSSION

The current study examined DS performance of motivated adolescents with LD to evaluate the influence of severe reading and learning problems on DS performance. Previous research has suggested that persons suspected of biased or feigned responding show poor performance on the DS subtest of the Wechsler scales. The purposes of the present study were to investigate whether the indexes of low DS scores, high VOC–DS difference scores, and low RDS scores from the WISC-IV might incorrectly identify persons known to suffer from LD as having produced an exaggerated performance and to investigate classification accuracy when employing either Canadian or American normative data.

Results demonstrate that a sizeable minority of adolescents with LD obtained scaled scores of 5 or less on DS, and more than one quarter of the sample returned a score at this level if Canadian norms were employed. Although a cutoff score of 4 only misclassified 6% of the sample when American norms were employed, 11% of these same students were misclassified when Canadian norms were used. This suggests that employing cutoff scores recommended by Iverson and Tulsky (2003) may not be appropriate when evaluating an individual with a documented history of LD, especially when Canadian norms are used to calculate the scaled score.

Similarly, VOC–DS difference scores using American norms identified 14% to 21% of participants, depending on whether a cutoff score of 5 or 6 was used, and 13% to 23% when using Canadian norms; all of these individuals had longstanding LDs and had not met the Slick et al. (1999) criteria for malingered neurocognitive dysfunction. As such, this embedded index, too, appears vulnerable to falsely accusing adolescents with true LD of investing low effort. Employing a cutoff score of 7 or more appears to be associated with a more acceptable false-positive rate of around 7%.
By contrast, RDS had a very low false-positive rate and appears to correctly identify the majority of students with LD as unlikely to be feigning cognitive deficits. The RDS cutoffs showed strong classification accuracy and are therefore likely to be effective measures of symptom magnification even in clinical samples. This index is also not affected by use of different normative data, as it relies solely on the raw scores produced by the participant.

All of the students who passed the WMT but scored beyond recommended cutoffs on the DS and VOC–DS had additional test scores that suggested that they were severely learning-disabled. First, each of these students also had a documented history of severe learning difficulties dating back to Grade 1, and their current test scores were consistent with those obtained in their original assessment. Second, all had extremely weak scores on measures of either phonological decoding or numeracy, and the group who failed the VOC–DS also performed poorly on tests measuring reading speed. These results suggest that those students who are most disabled phonologically or numerically may be at greatest risk for being falsely accused of low effort if one employs either of these indexes without taking into account actual neurological impairment.

One might suggest that the students identified by these two DS indexes were in fact performing below capabilities and therefore were accurately classified as noncredible. Although this is indeed a possibility, it is the case that none of these students met Slick et al.’s (1999) criteria with the exception that they were beyond the cutoff on the DS or VOC–DS. Students in the current sample did not have any apparent external incentives for purposefully providing low effort, as they were already receiving supports for their LDs in school. Although lack of external incentive does not preclude investment of low effort in such testing (e.g., low or avoidance motivation as discussed by Adelman et al., 1989; Marinak & Gambrell, 2008), it seems unlikely that one quarter of our previously diagnosed sample were investing such low effort and yet were not identified by either the WMT or by application of the other Slick et al. criteria. Thus, it seems more plausible that for most of these students, their actual severe LD affected their performance on these measures and increased their chances for being falsely accused.
The suspect identification of 13% to 23% of the present sample as malingerers based on VOC–DS scores suggests that cutoff recommendations for DS indexes may not be appropriate for individuals with a history of a congenital, neurologically based disorder such as LD. Given that the RDS cutoff score correctly identified almost all (99%) of the sample as nonmalingerers and that this sample also passed a freestanding SVT, it is reasonable to suggest that neither the DS score nor the VOC–DS difference score is an effective way to detect malingering in persons previously diagnosed with LD, because the disorder itself could affect their performance on the VOC and DS subtests of the WISC-IV. A number of studies suggest that working memory is a relative deficit in people with LD (McCallum et al., 2006; The Psychological Corporation, 1997) and that performance on tests of this nature may be disproportionately low relative to the overall intellectual ability of such individuals. Others have suggested that individuals with LD may perform more poorly on tests of processing speed and working memory (Dumont & Willis, 2001a, 2001b; McCallum et al., 2006; Prifitera, Weiss, & Saklofske, 1998), primarily because of the nature of their underlying condition. For example, in many children with LD, impairments in the speed of information processing or the ability to hold and manipulate large quantities of information in short-term memory, rather than academic ability per se, are the primary cause of current scholastic or occupational difficulties (Learning Disabilities Association of Ontario, 2003). Given that persons with LDs tend to be unimpaired in areas of general thinking and reasoning ability, the difference between their expressive vocabulary and their working-memory skills may be greater than that of most nondisabled individuals. Hence, it is not that DS is unexpectedly low in itself; rather, set against intact crystallized ability, the difference score is greater than that found in most nondisabled people.

Also of note was the dramatic difference in classification based on whether American or Canadian normative data were employed in calculating scores for these adolescents. There has certainly been some discussion of the fact that the Canadian normative data tend to return lower scores than do the American normative data for the Wechsler Intelligence scales (e.g., Beal et al., 1996; Iverson et al., 2006); however, the results from this study serve as a caution to clinicians not to interpret low DS scores or large VOC–DS differences as indicative of low effort if Canadian norms were used in calculating WISC-IV scores. Indeed, there was a strong likelihood that honest but impaired adolescents would be misidentified as investing low effort if Canadian norms were employed. This finding demonstrates that any embedded test score should be investigated thoroughly before being recommended for general use as an SVT; if separate normative data are available for a test, it is incumbent upon clinicians to demonstrate that classification accuracy is comparable regardless of the norms employed.

Detecting malingered performance is a complicated and difficult task for clinicians.
Although poor performance on the DS subtest of Wechsler scales has been shown to be indicative of biased responding, it is important that clinicians interpret low scores with caution. The findings from the present study suggest that the presence of certain neurological disorders such as LD may affect performance on the VOC and DS subtests of the WISC-IV such that neither DS scaled scores nor VOC–DS difference scores can be used as effective indicators of low effort or symptom exaggeration in this population.

An issue of particular interest is that scores on the WMT were not predictive of scores on the DS measures (kappa = .011, p > .8 for the DS cutoff using American norms; kappa = .013, p > .75 for the DS cutoff using Canadian norms). One would expect that a majority of the participants who were placed in the “questionable” group would score very poorly on the DS-derived scales; however, this was not the case. One conclusion that can be drawn from this is that the WMT and DS tests evaluate different constructs and that both can be valid measures of malingering without having concurrent validity. Alternatively, it might be that all participants but one in the present study invested maximal effort and that the WMT accurately classified severely impaired adolescents with the greatest word-decoding impairments as having a GMIP (see Larochette & Harrison, 2012, for more information about this speculation). For instance, although the WMT is not recommended for use with children whose reading level is below a Grade 3 level, half of the students who had questionable effort were reading below a Grade 3 level and half were reading at exactly a Grade 3 level on the Word Reading subtest of the WIAT-II. As such, it is possible that the almost-identical rates of performance on DS indexes between those who passed or did not pass the WMT are due to the effects of a severe neurological impairment that interfered with accurate reading and word decoding rather than to one group having invested low test-taking effort.

Limitations of this study must also be noted. The data obtained from the clinical groups in this study are based on relatively small sample sizes. Replication of this study with larger samples is necessary before conclusively ruling out the use of DS or VOC–DS difference scores to detect suboptimal effort in those with a documented history of LD. This study also looked only at the specificity of DS indexes and did not directly address issues of sensitivity when identifying low effort in children. Additional research investigating sensitivity of these indexes in child populations is necessary. In addition, although the Slick et al. (1999) criteria were employed in an effort to ensure that obtained scores truly represented good effort in our subjects, it is possible that students in this sample may still have invested low effort when taking the WISC-IV.
It is understood that one can never be completely certain that a respondent has given their best effort; however, when other evidence was examined, including participants’ scores on other administered tests, a review of their prior assessment performance, and information provided by collateral sources, there was enough evidence to suggest that the students were accurately demonstrating their abilities on the tests administered. The fact that their responses are consistent over time provides strong evidence for effortful responding. However, future research would benefit from evaluating whether or not existing freestanding SVTs are able to differentiate motivated malingerers from adolescents with bona fide LD.

In conclusion, the results from this study indicate that clinicians should interpret DS or VOC–DS effort indicators cautiously if the scores are obtained from individuals with known histories of neurologically based LDs, as the rate of false-positive identification of malingering when using these embedded SVT measures is excessively high. It may be that other embedded measures, in addition to freestanding SVTs, are more appropriate for use in populations in which a prior history of LD is present. Further, use of one embedded SVT alone as an indicator of low effort is not recommended (Larrabee, 2008), as this practice can lead to a high false-positive rate in general (Boone, 2009).
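The agreement statistic reported in the discussion, Cohen's kappa between the WMT outcome and a DS cutoff, can be computed from two pass/fail classifications of the same participants. The sketch below is illustrative only: the function name and the toy data are assumptions, and it reproduces neither the study's values nor its significance test.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary (True/False) classifications of the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: proportion of cases on which both measures agree.
    p_observed = sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / n
    # Chance agreement, from each measure's marginal rate of flagging a case.
    rate_a = sum(1 for x in a if x) / n
    rate_b = sum(1 for y in b if y) / n
    p_chance = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Toy data: True means the measure flagged the examinee as suspect.
wmt_flag = [True, True, False, False]
ds_flag = [True, False, True, False]
print(cohens_kappa(wmt_flag, ds_flag))
```

A kappa near zero, as in this toy example, means the two measures agree no more often than their marginal flag rates would produce by chance, which is the pattern the discussion describes for the WMT and the DS cutoffs.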


REFERENCES

Adelman, H. S., Lauber, B., Nelson, P., & Smith, D. (1989). Minimizing and detecting false positive diagnoses of learning disabilities. Journal of Learning Disabilities, 22, 234–244.
American Academy of Clinical Neuropsychology. (2007). American Academy of Clinical Neuropsychology (AACN) practice guideline. Clinical Neuropsychology, 21(2), 209–231.
Babikian, T., & Boone, K. (2007). Intelligence tests as measures of effort. In K. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 103–127). New York, NY: Guilford.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various Digit Span scores in the detection of suspect effort. Clinical Neuropsychologist, 20, 145–159. doi:10.1080/13854040590947362
Bauer, L., O’Bryant, S. E., Lynch, J. K., McCaffrey, R. J., & Fisher, J. M. (2007). Examining the Test of Memory Malingering Trial 1 and Word Memory Test Immediate Recognition as screening tools for insufficient effort. Assessment, 14, 215–222.
Beal, A. L., Dumont, R., Cruse, C. L., & Branche, A. H. (1996). Practical implications of differences between the American and Canadian norms for WISC-III and a short form for children with learning disabilities. Canadian Journal of School Psychology, 12, 7–14.
Blaskewitz, N., Merten, T., & Kathmann, N. (2008). Performance of children on symptom validity tests: TOMM, MSVT, and FIT. Archives of Clinical Neuropsychology, 23, 379–391.
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. Clinical Neuropsychologist, 23, 729–741. doi:10.1080/13854040802427803
Bush, S., Ruff, R., Troster, A., Barth, J., Koffler, S., Pliskin, N., … Silver, C. (2005). Symptom validity assessment: Practice issues and medical necessity. NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426.
Dumont, R., & Willis, J. (2001a). Use of the Tellegen and Briggs formula to determine the Dumont-Willis Indexes (DWI-1 and DWI-2) for the WISC-IV. Retrieved from http://alpha.fdu.edu/psychology/WISCIV_DWI.htm
Dumont, R., & Willis, J. (2001b). Using the DWI or GIA. Retrieved from http://alpha.fdu.edu/psychology/using_the_dwi_or_gia.htm
Etherton, J. L., Bianchini, K. J., Ciota, M. A., & Greve, K. W. (2005). Reliable Digit Span is unaffected by laboratory-induced Pain: Implications for clinical use. Archives of Clinical Neuropsychology, 12(1), 101–106.
Faust, D., Hart, K. J., & Guilmette, T. J. (1988). Pediatric malingering: The capacity of children to fake believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 56, 578–582.
Faust, D., Hart, K. J., Guilmette, T. J., & Arkes, H. R. (1988). Neuropsychologists’ capacity to detect adolescent malingerers. Professional Psychology: Research and Practice, 19, 508–515.
Flaro, L., & Boone, K. (2009). Using objective effort measures to detect noncredible cognitive test performance in children and adolescents. In J. E. Morgan & J. J. Sweet (Eds.), Neuropsychology of malingering casebook (pp. 369–376). New York, NY: Psychology Press.
Flaro, L., Green, P., & Robertson, E. (2007). Word Memory Test failure 23 times higher in mild brain injury than in parents seeking custody: The power of external incentives. Brain Injury, 21, 373–383.
Green, P. (2003). Word Memory Test for Windows: User’s manual and program. Edmonton, AB, Canada: Green’s Publishing.
Green, P., & Flaro, L. (2003). Word Memory Test performance in children. Child Neuropsychology, 9, 189–207.
Green, P., Flaro, L., & Courtney, J. (2009). Examining false positives on the Word Memory Test in adults with mild traumatic brain injury. Brain Injury, 23, 741–750. doi:10.1080/02699050903133962
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesic measures with a large clinical sample. Psychological Assessment, 6, 218–224.
Greve, K. W., Bianchini, K. J., Etherton, J. L., Meyers, J. E., Curtis, K. L., & Ord, J. S. (2010). The Reliable Digit Span test in chronic pain: Classification accuracy in detecting malingered pain-related disability. Clinical Neuropsychologist, 24, 137–152. doi:10.1080/13854040902927546
Greve, K. W., Springer, S., Bianchini, K. J., Black, F. W., Heinly, M. T., Love, J. M., … Ciota, M. A. (2007). Malingering in toxic exposure: Classification accuracy of Reliable Digit Span and WAIS-III Digit Span scaled scores. Assessment, 14, 12–21.
Harrison, A. G., Edwards, M. E., & Parker, K. P. (2007). Identifying students faking ADHD: Preliminary findings and strategies for detection. Archives of Clinical Neuropsychology, 22, 577–588.
Harrison, A. G., Edwards, M. E., & Parker, K. P. (2008). Identifying students feigning dyslexia: Preliminary findings and strategies for detection. Dyslexia, 14, 228–246.
Harrison, A. G., Rosenblum, Y., & Currie, S. (2010). Examining unusual Digit Span performance in a population of postsecondary students assessed for academic difficulties. Assessment, 17, 283–293. doi:10.1177/1073191109348590
Heinly, M., Greve, K., Bianchini, K., Love, J., & Brennan, A. (2004). WAIS Digit Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12, 429–444.
Iverson, G. L., & Franzen, M. D. (1996). Using multiple objective measures to detect simulated malingering. Journal of Clinical and Experimental Neuropsychology, 18, 38–51.
Iverson, G., Lange, R., Viljoen, H., & Brink, J. (2006). The WAIS-III General Ability Index in neuropsychiatry and forensic psychiatry inpatient samples. Archives of Clinical Neuropsychology, 21, 77–82.
Iverson, G. L., & Tulsky, D. S. (2003). Detecting malingering on the WAIS-III: Unusual Digit Span performance patterns in the normal population and in clinical groups. Archives of Clinical Neuropsychology, 18, 1–9.
Kirkwood, M. W., Hargrave, D. D., & Kirk, J. W. (2011). The value of the WISC-IV Digit Span subtest in detecting noncredible performance during pediatric neuropsychological examinations. Archives of Clinical Neuropsychology, 26, 377–384. doi:10.1093/arclin/acr040
Kirkwood, M. W., Kirk, J. W., Blaha, R. Z., & Wilson, P. (2010). Noncredible effort during pediatric neuropsychological exam: A case series and literature review. Child Neuropsychology, 16, 604–618. doi:10.1080/09297049.2010.495059
Kirkwood, M. W., Yeates, K. O., Randolph, C., & Kirk, J. W. (2012). The implications of symptom validity test failure for ability-based test performance in a pediatric sample. Psychological Assessment, 24, 36–45. doi:10.1037/a0024628
Larochette, A., & Harrison, A. G. (2012). Word Memory Test performance in Canadian adolescents with learning disabilities: A preliminary study. Applied Neuropsychology: Child, 1, 38–47.
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679.


Learning Disabilities Association of Ontario. (2003). Recommended best practices for assessment, diagnosis and documentation of learning disabilities. Retrieved from http://www.ldao.ca/documents/Assessment%20Protocols_Sept%2003.pdf
Lu, P. H., & Boone, K. B. (2002). Suspect cognitive symptoms in a 9-year-old child: Malingering by proxy? Clinical Neuropsychology, 16, 90–96.
Marinak, B., & Gambrell, L. B. (2008). Intrinsic motivation and rewards: What sustains young children’s engagement with text? Literacy Research and Instruction, 47, 9–26.
Mathias, C. W., Greve, K. W., Bianchini, K. J., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction using the Reliable Digit Span in traumatic brain injury. Assessment, 9, 302–308.
McCaffrey, R. J., & Lynch, J. K. (2009). Malingering following documented brain injury: Neuropsychological evaluation of children in a forensic setting. In J. E. Morgan & J. J. Sweet (Eds.), Neuropsychology of malingering casebook (pp. 377–385). New York, NY: Psychology Press.
McCallum, R. S., Bell, S., Wood, M., Below, J., Choate, S., & McCane, S. (2006). What is the role of working memory in reading relative to the big three processing variables (orthography, phonology, and rapid naming)? Journal of Psychoeducational Assessment, 24, 243–259.
Millis, S. R., Ross, S. R., & Ricker, J. H. (1998). Detection of incomplete effort on the Wechsler Adult Intelligence Scale-Revised: A cross-validation. Journal of Clinical and Experimental Neuropsychology, 20, 167–173.
Mittenberg, W., Theroux-Fichera, S., Zielinski, R., & Heilbronner, R. L. (1995). Identification of malingered head injury on the Wechsler Adult Intelligence Scale-Revised. Professional Psychology: Research and Practice, 26, 491–498.
Prifitera, A., Weiss, L. G., & Saklofske, D. H. (1998). The WISC-III in context. In A. Prifitera & D. H. Saklofske (Eds.), WISC-III: Clinical use and interpretation (pp. 1–38). New York, NY: Academic.
The Psychological Corporation. (1997). WAIS-III/WMS-III technical manual. San Antonio, TX: Harcourt Brace.
The Psychological Corporation. (2001). Wechsler Individual Achievement Test-Second edition (WIAT II). San Antonio, TX: Author.
Salekin, R. T., Kubak, F. A., & Lee, Z. (2007). Deception in children and adolescents. In R. Rogers (Ed.), Clinical assessment of malingering and deception (3rd ed., pp. 343–364). New York, NY: Guilford.
Schroeder, R. W., Twumasi-Ankrah, P., Baade, L. E., & Marshall, P. S. (2012). Reliable Digit Span: A systematic review and cross-validation study. Assessment, 19, 21–30. doi:10.1177/1073191111428764


Sigmundsson, H. (2005). Do visual processing deficits cause problems on response time task for dyslexics? Brain and Cognition, 58, 213–216.
Slick, D. J., Hopp, G., Strauss, E., & Spellacy, F. J. (1996). Victoria Symptom Validity Test: Efficiency for detecting feigned memory impairment and relationship to neuropsychological tests and MMPI-2 validity scales. Journal of Clinical and Experimental Neuropsychology, 18, 911–922.
Slick, D. J., Sherman, E. M., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. Clinical Neuropsychology, 13(4), 545–561. doi:10.1076/1385-4046(199911)13:04;1-Y;FT545
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 16, 597–607.
Uzi, B., & Mona, B. (2006). Adolescents with attention deficit and hyperactivity disorder/learning disability and their proneness to accidents. The Indian Journal of Pediatrics, 73, 299–303.
Wechsler, D. (2003a). Wechsler Intelligence Scale for Children-Fourth Edition: American manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (2003b). Wechsler Intelligence Scale for Children-Fourth Edition: Canadian manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale-Fourth Edition. San Antonio, TX: Pearson.
Woodcock, R., McGrew, K., & Mather, N. (2001). Woodcock-Johnson III. Chicago, IL: Riverside.

APPENDIX

The following tests were administered to participants in all 3 years, 2008 to 2010:

Wechsler Intelligence Scale for Children-Fourth Edition
Wechsler Individual Achievement Test
Test of Word Reading Efficiency
Woodcock-Johnson-Third Edition
Wide Range Assessment of Memory and Learning
Gray Oral Reading Tests-Fourth Edition
Comprehensive Test of Phonological Processing