
Psychological Assessment 1998, Vol. 10, No. 4, 435-443

Copyright 1998 by the American Psychological Association, Inc. 1040-3590/98/$3.00

Psychometric Properties of the MacArthur Competence Assessment Tool-Criminal Adjudication

Randy K. Otto and Norman G. Poythress, University of South Florida

Robert A. Nicholson, University of Tulsa

John F. Edens, University of South Florida

John Monahan, Richard J. Bonnie, Steven K. Hoge, and Marlene Eisenberg, University of Virginia

This article describes the development of a new clinical instrument for use in assessments of adult criminal defendants' competence to proceed to adjudication, the MacArthur Competence Assessment Tool-Criminal Adjudication (MacCAT-CA). The MacCAT-CA was derived from a more comprehensive research instrument (MacArthur Structured Assessment of Competencies of Criminal Defendants; Hoge, Bonnie, Poythress, Monahan, & Eisenberg, 1997) on the basis of considerations of face validity for use in legal contexts, psychometric analyses, and advice from mental health experts who reviewed an earlier prototype. This article presents the results from a National Institute of Mental Health-sponsored validation study that investigated the psychometric properties of the MacCAT-CA.

At least 25,000 criminal defendants are referred annually for evaluation of their competence to participate in legal proceedings (Steadman & Hartstone, 1983). Although a number of measures have been specifically designed to assess defendants' capacities in this area (e.g., Competence Screening Test [CST], Lipsitt, Lelos, & McGarry, 1971; Georgia Court Competency Test [GCCT], Wildman et al., 1980; CADCOMP, Barnard et al., 1991; Competency Assessment Instrument [CAI], Laboratory of Community Psychiatry, 1974; Interdisciplinary Fitness Interview [IFI], Golding, Roesch, & Schreiber, 1984; Fitness to Stand Trial Interview, Menzies, Webster, Roesch, Jensen, & Eaves, 1984), each exhibits various limitations (Grisso, 1991; Melton, Petrila, Poythress, & Slobogin, 1997; Nicholson, 1992). Briefly, some measures were designed to serve primarily as screening instruments (e.g., CST and GCCT) and appear mainly to be collections of items having various legal content but little or no underlying conceptual structure. Others (e.g., CAI and IFI) lack standardized administration and criterion-based scoring, relying instead on the judgment of the clinician to determine, on a case-by-case basis, which questions to pose to the defendant and how to rate the responses received. None is designed to provide quantitative indexes of discrete competence-related abilities, nor have interpretative norms based on large, national samples been developed for any of these measures.

Randy K. Otto, Norman G. Poythress, and John F. Edens, Department of Mental Health Law and Policy, University of South Florida; Robert A. Nicholson, Department of Psychology, University of Tulsa; John Monahan and Richard J. Bonnie, School of Law, University of Virginia; Steven K. Hoge, School of Law and School of Medicine, University of Virginia; Marlene Eisenberg, Institute of Law, Psychiatry, and Public Policy, University of Virginia.

John F. Edens is now at the Department of Psychology, Sam Houston State University.

This research was supported by the MacArthur Foundation Research Network on Mental Health and the Law, and by Grant R01MH54517-01A1 from the National Institute of Mental Health. We are grateful to Tom Feucht-Haviar for his assistance on a pilot test of this measure.

Norman G. Poythress, John Monahan, Richard J. Bonnie, and Steven K. Hoge are authors of the MacArthur Competence Assessment Tool-Criminal Adjudication (MacCAT-CA) published by Psychological Assessment Resources. Randy K. Otto, Norman G. Poythress, Robert A. Nicholson, John F. Edens, John Monahan, Richard J. Bonnie, and Steven K. Hoge are authors of the MacCAT-CA manual, also published by Psychological Assessment Resources.

Correspondence concerning this article should be addressed to Randy K. Otto, Department of Mental Health Law and Policy, Florida Mental Health Institute, University of South Florida, 13301 Bruce B. Downs Boulevard, Tampa, Florida 33612-3899. Electronic mail may be sent to [email protected].

Given the pivotal role of a defendant's competence, one of the major research initiatives of the MacArthur Foundation Research Network on Mental Health and the Law was the development of a standardized research instrument for evaluating criminal defendants' psycholegal abilities related to competence to proceed to adjudication. The MacArthur Structured Assessment of Competencies of Criminal Defendants (MacSAC-CD; Hoge, Bonnie, Poythress, Monahan, & Eisenberg, 1997) was based on Bonnie's (1992, 1993) comprehensive theory of legal competence, and it contains measures of discrete competence-related abilities. Items in the MacSAC-CD were modeled after those created by Grisso, Appelbaum, Mulvey, and Fletcher (1995) for the assessment of competence to consent to treatment (see, more generally, Grisso & Appelbaum, 1998). Each measure in the MacSAC-CD involves standardized administration and criterion-based scoring, thus reducing much of the discretionary (and potentially idiosyncratic) scoring that plagues existing measures. Hoge et al. (1997) recently described the MacSAC-CD and presented the primary findings from a field study that involved both incompetent and competent defendants as research participants. The results of this field study indicated that the


MacSAC-CD had satisfactory psychometric properties, construct validity, and potential classification utility for research applications. Significant differences in the expected direction between mean scores of hospitalized incompetent (HI), jail unscreened (JU), and jail treated (JT; i.e., jail inmates who were receiving mental health services but whose competence was not raised as an issue) participants were obtained on each measure. Within a subset of the HI group, the measures were sensitive to changes in the patients' or defendants' clinical condition; reassessment of those patients or defendants who had been deemed clinically restored to competence revealed that scores on the MacSAC-CD increased, reflecting gains in competence-related abilities. Although the primary field study used only male defendants (N = 366), a second study involving 106 female defendants produced similarly encouraging findings concerning the validity of the MacSAC-CD (Poythress, Hoge, et al., 1998).

Having demonstrated the potential utility of the MacSAC-CD for the assessment of competence-related abilities, the network's next priority was to develop a clinically portable measure for actual use in clinical practice. The field study revealed that administration time for the MacSAC-CD (approximately 1.5 to 2 hours) considerably exceeded that required by other currently available measures; further, the need to reduce the length of the measure was indicated by some duplication in scoring procedures and redundancy across separate measures (see Hoge et al., 1997). In light of these considerations, and keeping in mind the importance of face validity for application in the legal context, the MacSAC-CD, an instrument comprising 7 measures containing 47 items, was streamlined to one comprising 3 measures containing 22 items. Decisions about item retention were informed by an examination of (a) the item-total correlations for each item with its original MacSAC-CD measure, and (b) the impact on coefficient alpha for the MacSAC-CD measure if the item was removed.

The resulting 22-item clinical instrument, the MacArthur Competence Assessment Tool-Criminal Adjudication (MacCAT-CA; Poythress, Nicholson, et al., in press), evaluates three discrete competence-related abilities: understanding (the ability to understand general information related to the law and adjudicatory proceedings), reasoning (the ability to discern the potential legal relevance of information, and capacity to reason about specific choices that confront a defendant in the course of adjudication), and appreciation (rational awareness of the meaning and consequences of the proceedings in one's own case). These abilities are assessed through three separate measures on the MacCAT-CA. The content of both the Understanding (eight items) and Reasoning (eight items) measures refers to a hypothetical legal scenario in which a defendant is charged with aggravated assault. Questions comprising the Appreciation measure (six items) refer specifically to the examinee's own case. Items are scored on a 0-2 scale, with higher scores indicative of higher levels of capacity. The specific legal content of each item is summarized in the Appendix. The MacCAT-CA was reviewed by experienced forensic clinicians for ease of administration and was pilot tested to determine administration time, which ranges from 25 to 55 minutes. Because the MacCAT-CA was not normed on persons with IQ estimates below 60 (see below), it is not recommended for use with this population.

From 1996 to 1998, we conducted a multistate study, funded by the National Institute of Mental Health, to establish the clinical utility of the MacCAT-CA. In this article, we report the findings regarding the psychometric properties and construct validity of the instrument. Our clinical protocol, comprising the MacCAT-CA and other measures relevant to the investigation of the validity of the MacCAT-CA, was administered to 729 pretrial felony defendants in eight states.

The selection of states as potential sites for data collection was informed by two considerations. First, state systems for providing forensic evaluation services vary considerably in their organizational structure. For example, the typology of forensic service delivery systems proposed by Grisso, Cocozza, Steadman, Fisher, and Greer (1994) consisted of five system types (i.e., traditional inpatient, modified-traditional, community-based, private practitioner, and mixed evaluation systems) as well as a "not classified" category. Second, states differ in the presence or absence of specific training requirements for forensic examiners and in the availability of state-sponsored training programs designed to meet those training requirements (Farkas, DeLeon, & Newman, 1997). Both of these variables (organizational structure for forensic service delivery and the existence of state-sponsored forensic training) could influence the thresholds for determining incompetence that are applied across states. Therefore, gathering data from each system type and from states with and without forensic examiner training was necessary to inform judgments about the generalizability of findings based on the MacCAT-CA. To provide representative national data, we selected individual states that represented each of the six categories identified by Grisso et al. (1994). In addition, where possible, within types of service delivery systems, states with and without forensic examiner training were selected.
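The two item-retention criteria described above (item-total correlations and the change in coefficient alpha when an item is removed) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the toy 0-2 item scores are hypothetical, and the sketch uses the "corrected" item-total correlation (each item against the sum of the remaining items), which is an assumption because the article does not say which variant was used.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Coefficient alpha; `items` is a list of item columns (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def item_analysis(items):
    """For each item: (item-rest correlation, alpha if that item is deleted)."""
    report = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_totals = [sum(scores) for scores in zip(*rest)]
        report.append((pearson_r(col, rest_totals), cronbach_alpha(rest)))
    return report


# Hypothetical 0-2 item scores for six respondents (not study data).
toy_items = [
    [2, 2, 1, 1, 0, 0],
    [2, 1, 1, 1, 0, 0],
    [2, 2, 2, 1, 1, 0],
    [0, 2, 1, 0, 2, 1],  # deliberately inconsistent with the other items
]

print(f"alpha (all items) = {cronbach_alpha(toy_items):.2f}")
for number, (r, alpha_without) in enumerate(item_analysis(toy_items), start=1):
    print(f"item {number}: item-rest r = {r:.2f}, alpha if deleted = {alpha_without:.2f}")
```

In this toy data the fourth item correlates negatively with the rest of the scale, and alpha rises when it is dropped, which is the kind of pattern that would flag an item for removal.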

Method

Eligibility Criteria and Recruitment Procedures

Data were collected from three groups of felony defendants: (a) defendants admitted to forensic psychiatric units after being adjudicated incompetent to proceed (HI); (b) defendants in jail who were receiving treatment for mental health problems but who were presumed competent (JT); and (c) randomly selected jail inmates who were presumed competent (JU). Participants were recruited from six states representing categories in the Grisso et al. (1994) typology: Washington (traditional); Louisiana (private practitioner); Oklahoma (community-based); South Carolina (modified-traditional); Wisconsin (mixed); and Utah (not classified). According to the survey by Farkas et al. (1997), none of these six states reported the availability of extensive, state-sponsored forensic evaluator training programs. Therefore, two additional states (Michigan [modified-traditional] and Alabama [community-based]) that provide systematic training for forensic examiners also were included.

Both male and female felony defendants were eligible for participation in the study; participants were paid $10 for their time. Potential participants had to be English-speaking, between the ages of 18 and 65, without a diagnosis suggesting organicity, and they had to have a prorated Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981) Full Scale IQ of 60 or greater. For the HI sample, participants were recruited within 14 days of hospital admission to minimize the effects of treatment on their performance on the research measures. All but 4 (1%) participants were assessed within 10 days of admission. Potential HI participants were identified as they entered the forensic hospital for competence restoration treatment; essentially all eligible


participants were approached by research assistants. Jail inmate participants were recruited from lists of defendants provided by cooperating public defender offices in each state. Jail inmates receiving mental health services (potential JT participants) were identified by the research assistant with the assistance of jail mental health staff and randomly selected for invitation to participate in the study. The remaining inmates were randomly selected for participation as members of the JU group.

Potential participants were approached by a research assistant and informed about the study using informed consent procedures approved by a university institutional review board. Those expressing an interest in participating were screened by the research assistant with respect to their competence to consent to the research project and their cognitive functioning (see below). After participants completed the research protocol, $10 was placed in their institutional accounts. Refusal rates for the HI, JT, and JU samples were 22%, 9%, and 9%, respectively. Although the 22% refusal rate is higher than desirable, it is lower than is often found in studies of acutely mentally ill people. For example, in studies of competence to consent in civil contexts (consent to voluntary hospitalization; consent to treatment), Appelbaum, Mirkin, and Bateman (1981) and Grisso, Appelbaum, and Hill-Fotouhi (1997) reported refusal rates of 33% and 43%, respectively.

Table 1
Demographics by Group

Variable               JU       JT       HI
Age
  M                  31.33    32.46    36.01
  SD                  9.18     9.35    10.66
Sex
  Male (%)            93.9     88.8     89.4
  Female (%)           6.1     11.2     10.6
Ethnicity
  White (%)           35.0     50.2     49.8
  Non-White (%)       65.0     49.8     50.2
Education
  M                  11.26    11.35    11.44
  SD                  1.98     2.51     2.68
SES
  M                  56.10    56.25    57.94
  SD                 10.96    12.45    13.13

Note. JU = unscreened jail inmates; JT = jail inmates receiving mental health services; HI = hospitalized incompetent defendants; SES = socioeconomic status.

Measures

Measures gathered on each participant consisted of historical and demographic information, estimated IQ, measures of current adjustment and psychopathology, the MacCAT-CA, and a treating clinician's global estimate of competence to proceed (HI participants only). Measures were administered in the order listed below.

Estimate of cognitive-intellectual functioning. To estimate participants' cognitive abilities and screen out potential participants with markedly limited abilities, we administered to all potential participants the Information and Picture Completion subtests of the WAIS-R. Performance on these two scales is highly correlated with WAIS-R Full Scale IQ scores (r = .86 and r = .92, respectively; Kaufman, Ishikuma, & Kaufman-Packer, 1991). Because cognitive functioning can be impaired by underlying mental disorder, this score is best considered as an index of current functioning rather than a measure of baseline intellectual capacity. Thirty-six potential HI participants were excluded because they obtained WAIS-R Full Scale IQ estimates below 60, whereas 8 potential JT participants and 3 potential JU participants were excluded on these grounds.

Background interview data. Background information on the participants was gathered through a brief structured interview. Information obtained included history of psychiatric treatment, arrest history, educational level, and highest socioeconomic status achieved.

Brief Psychiatric Rating Scale-Anchored (BPRS). Scored after the administration of a 15-minute interview, the BPRS (Overall & Gorham, 1962) provides a reliable and valid estimate of the presence and severity of psychopathology. We used the anchored version (Woerner, Mannuzza, & Kane, 1988), in which the severity of 18 symptoms of mental disorder (e.g., depression and orientation) is rated on a 7-point, anchored Likert scale. Ratings on the 18 items are summed to provide a global measure of current psychopathology, and four subscale scores that provide indices of psychoticism, depression, emotional withdrawal, and hostility can be calculated (Overall & Porterfield, 1963).

MMPI-2 Psychoticism scale. The 25-item Psychoticism scale was derived from the MMPI-2 item pool. Based on a five-factor model of personality especially relevant for pathological populations (Harkness & McNulty, 1994), the Psychoticism scale measures the gross degree of correspondence between an individual's internal models of reality and the external social and physical world (Harkness, McNulty, & Ben-Porath, 1995). Estimates of internal consistency (coefficient alpha) ranging from .70 to .84 have been reported across diverse nonclinical and clinical samples. The scale also yields appropriate mean differences across these samples, as well as expected patterns of correlations with measures of theoretically relevant and divergent constructs. This measure was added to the research protocol after data collection already had been initiated; nevertheless, data were available for 89% (n = 647) of the participants.

MacCAT-CA. The MacCAT-CA, the clinical instrument derived from the MacSAC-CD research instrument, was administered to all participants.

Chart data. Relevant background information on the participants was gathered, in part, by way of institutional chart review. A review of the hospital chart or jail records provided information on participants' age, race, admission date, and current psychiatric status including diagnosis, treatment status, and medications.

Clinical judgment regarding competence. After HI participants completed the research protocol, a hospital mental health professional familiar with the participant's mental state and adjustment offered a global judgment of the participant's competence to proceed using an anchored 6-point Likert scale (1 = clearly incompetent, 6 = clearly competent).

Results

Demographic Data

Table 1 provides descriptive information about the hospitalized and jailed defendant samples recruited for this study (for HI, n = 283; for JT, n = 249; for JU, n = 197).¹ Participants were predominantly male (90%), and gender was distributed similarly across the HI, JT, and JU samples, χ²(2, N = 729) = 3.90, p = .14. Although there was a balance of White and non-White participants in the study, they were somewhat disproportionately distributed across the three samples, χ²(2, N = 729) = 12.99, p < .002. HI participants as a group were significantly older than participants in the JU sample, t(724) = 25.01, p < .001, and the JT sample, t(724) = 16.40, p < .001. The mean years of education reported by participants was 11.36 (SD = 2.45), and level of education did not vary significantly across groups. Using Hollingshead and Redlich's (1958) two-factor index of social position, participants identified the highest level of employment ever held. The mean scores (see Table 1) equate with Social Class 4 (e.g., clerical and sales workers, technicians). The group means did not differ significantly in terms of highest socioeconomic status attained by participants.

The participants in the various samples reported similar amounts and types of involvement with the criminal justice system. The mean number of estimated misdemeanor and felony arrests for the total sample was 5.24 (SD = 9.52) and 1.91 (SD = 2.79), respectively.² There were no significant differences between the groups in terms of their estimated number of felony arrests, F(2, 721) = .35, p = .71, or misdemeanor arrests, F(2, 714) = 1.11, p = .33.

Clinical Measures

Group characteristics on measures of psychopathology and cognitive functioning are summarized in Table 2. The three samples did not differ in terms of their estimated current cognitive-intellectual functioning. There was a significant group effect as measured by the MMPI-2 Psychoticism scale, with the HI sample demonstrating greater levels of psychopathology than both the JT, t(646) = 2.55, p = .011, and JU, t(646) = 6.46, p < .001, samples. Similarly, participants in the HI sample obtained higher scores on the BPRS total score than the JT and JU samples, ts(726) = 3.65 and 11.58, respectively, as well as higher scores on the BPRS Psychoticism, ts(722) = 10.49 and 15.74, Hostility, ts(722) = 2.61 and 6.85, and Emotional Withdrawal, ts(722) = 3.69 and 6.61, subscales, respectively (all ps < .01). Similar to findings in the MacSAC-CD field study (Hoge et al., 1997), HI participants were less depressed than JT participants, t(722) = 6.75, p < .001, but did not differ from JU participants, t(722) = 1.30, p = .19.

Table 2
Mental Status Characteristics by Group

                                        JU              JT              HI
Variable                            M      SD       M      SD       M      SD     Univariate F     df
Estimated WAIS-R Full Scale IQ    85.23  11.99    85.49  14.61    83.21  14.05        2.17       2, 726
MMPI-2 Psychoticism               63.32  14.13    70.48  18.40    74.68  19.64       20.93*      2, 646
BPRS total                        29.16   6.99    35.95   9.07    38.80  10.04       68.26*      2, 726
BPRS Psychoticism                  3.41   1.16     4.96   2.61     7.55   3.72      131.96*      2, 722
BPRS Depression                    8.70   3.96    10.76   4.55     8.20   3.99       24.48*      2, 722
BPRS Hostility                     5.00   2.26     6.15   2.69     6.74   3.09       23.53*      2, 722
BPRS Emotional Withdrawal          3.91   1.50     4.62   2.33     5.40   2.95       22.27*      2, 722

Note. JU = unscreened jail inmates; JT = jail inmates receiving mental health services; HI = hospitalized incompetent defendants; WAIS-R = Wechsler Adult Intelligence Scale-Revised; MMPI-2 = Minnesota Multiphasic Personality Inventory-2; BPRS = Brief Psychiatric Rating Scale-Anchored.
*p < .001.

¹ Different degrees of freedom for the analyses that follow reflect missing data on these variables for a small number of participants.
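As a rough check on the chi-square tests reported under Demographic Data, the statistics can be approximated from Table 1. The sketch below is illustrative only: the cell counts are an assumption, reconstructed by multiplying the rounded Table 1 percentages by each group's n and rounding to whole participants, and the helper names are hypothetical. For 2 degrees of freedom the upper-tail probability of a chi-square statistic has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p value of a chi-square statistic with exactly 2 df."""
    return math.exp(-stat / 2)


# Assumed counts: Table 1 percentages times group n, rounded to whole people.
# Rows are JU (n = 197), JT (n = 249), HI (n = 283).
sex_counts = [[185, 12], [221, 28], [253, 30]]          # male, female
ethnicity_counts = [[69, 128], [125, 124], [141, 142]]  # White, non-White

for label, counts in (("sex", sex_counts), ("ethnicity", ethnicity_counts)):
    stat, df = chi_square_independence(counts)
    print(f"{label}: chi-square({df}) = {stat:.2f}, p = {p_value_df2(stat):.4f}")
```

Under these reconstructed counts the statistics come out at about 3.90 for sex and 12.99 for ethnicity, in line with the reported values; small discrepancies would be expected wherever the true counts differ from the rounded reconstruction.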

MacCAT-CA Performance and Comparisons

Internal consistency. Cronbach's alpha and the mean and range of interitem correlations for each of the MacCAT-CA measures are provided in Table 3. These values serve as indices of scale reliability and item homogeneity, respectively (Fiske, 1971). The alphas were .81 (Reasoning), .85 (Understanding), and .88 (Appreciation), indicating good internal consistency for these measures. These estimates are comparable to those obtained using the MacSAC-CD research measure (Hoge et al., 1997). The mean interitem correlations were .36, .42, and .54 for Reasoning, Understanding, and Appreciation, respectively, indicating appropriate homogeneity of item content for all three measures. Although the Appreciation measure has the fewest items, its relatively greater item homogeneity enabled it to produce the highest estimate of scale reliability.

Table 3
Internal Consistency of MacArthur Competence Assessment Tool-Criminal Adjudication Measures

                              Interitem correlation
Measure          Alpha        M        Range
Understanding     .85        .42      .31-.53
Reasoning         .81        .36      .21-.53
Appreciation      .88        .54      .45-.62

Interrater reliability. To evaluate interrater reliability, we drew a sample of 48 protocols from the database (2 HI, 2 JT, and 2 JU from each state). Scoring assigned by the original research assistant who completed the protocol was removed, and 42 protocols were mailed to each research assistant for rescoring.³ One protocol was not returned; hence, the interrater reliability analysis was based on 47 total cases scored by each of the eight research assistants. Three analyses of variance (ANOVAs) were conducted on the total scores from the Understanding, Reasoning, and Appreciation measures, respectively, to compute an intraclass correlation for each measure. In each of the ANOVAs, rater was treated as a random variable (see Shrout & Fleiss, 1979, Model 2). As several authors have emphasized, the intraclass correlation is superior to the traditional product-moment correlation as an index of reliability (e.g., see Bartko & Carpenter, 1976). Interscorer reliability for the three measures as estimated by this procedure ranged from very good to excellent, with intraclass R = .75 for Appreciation, .85 for Reasoning, and .90 for Understanding (see Table 4). Although explicit scoring guidelines are provided for each item on each measure, the scoring criteria are more rigorous for the Understanding items, and less so for the Appreciation measure (where a judgment about the plausibility of a defendant's "reasons" is required). The pattern of intraclass correlations for the three measures is consistent with the degree of rigor in the scoring guidelines.

² One participant who reported 1,000 misdemeanor arrests was deleted from this analysis.

³ Research assistants did not rescore the six protocols they originally completed that were subsequently mailed to the remaining research assistants. Therefore, only 42 new protocols were mailed to them.
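Two of the reliability quantities discussed in this section can be illustrated with a short sketch. The first function shows why a six-item scale can out-perform an eight-item scale: the standardized (Spearman-Brown) form of alpha depends only on the number of items and the mean interitem correlation. The second computes a single-rater intraclass correlation under the two-way random effects model of Shrout and Fleiss (1979). Both are offered under stated assumptions: the article reports raw alpha, which the standardized form only approximates, and it does not say whether the single-rater or average-rater form of the Model 2 coefficient was used. The function names and the ratings matrix are illustrative, not study data.

```python
def standardized_alpha(k, mean_r):
    """Spearman-Brown (standardized) alpha from k items and mean interitem r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater (Shrout & Fleiss, 1979).

    `ratings` has one row per protocol (target) and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    bms = ss_rows / (n - 1)  # between-protocol mean square
    jms = ss_cols / (k - 1)  # between-rater mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Item counts and mean interitem correlations as reported in Table 3.
for name, k, r in (("Understanding", 8, .42), ("Reasoning", 8, .36), ("Appreciation", 6, .54)):
    print(f"{name}: standardized alpha = {standardized_alpha(k, r):.2f}")

# Illustrative ratings: 6 protocols scored by 4 raters (not study data).
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(toy_ratings):.2f}")
```

With the Table 3 inputs the standardized values land within about .01 of the reported alphas, and Appreciation comes out highest despite having the fewest items. In the toy ratings, large differences between rater means pull ICC(2,1) down, because Model 2 counts systematic rater differences as error.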

Correlations Between MacCAT-CA Scales and Clinical Measures Support for the construct validity of the MacCAT-CA is found in the pattern of correlations between MacCAT-CA measures and select clinical variables. Ideally, measures of competencerelated abilities should correlate positively with estimated IQ, but they should correlate negatively with measures of psychopathology, particularly with measures of psychoticism. As revealed in Table 5, this is the pattern of relationships obtained in the present study, providing evidence of convergent validity for the MacCAT-CA. Consistent with the findings of Hoge et al. (1997) with the research instrument (MacSAC-CD), the MacCAT-CA measures correlate negatively with disturbed thinking as measured by the

Table 4 Interrater Reliability ofMacArthur Competence Assessment Tool-Criminal Adjudication Measures Measure/Item no.

ICC

Understanding

.90 .82 .88 .80 .88 .82 .92 .90 .86 .85 .93 .75 .87 .34 .90 .74 .72 .45 .75 .73 .66 .57 .70 .53 .42

1 2 3 4 5 6 7 g Reasoning

9 10 11 12 13 14 15 16 Appreciation

17 18 19 20 21 22

439

Table 5 Correlations Between MacArthur Competence Assessment Tool-Criminal Adjudication and Clinical Measures Variable Estimated WAIS-R Full Scale IQ MMPI-2 Psychoticism BPRS Total BPRS Psychoticism BPRS Depression BPRS Hostility BPRS Emotional Withdrawal Clinician Ratings of Competence'

Understanding

Reasoning

.41 -.33 -.23 -.40

.34 -.31 -.29 -.48

.18

.18

Appreciation

.14 -.21 -.36 -.52

.18

-.01

-.10

-.28

-.34

-.27

-.19

.36

.42

.49

Note. WAIS-R = Wechsler Adult Intelligence Scale-Revised; MMPI2 = Minnesota Multiphasic Personality Inventory-2; BPRS = Brief Psychiatric Rating Scale-Anchored, p < .001 for all correlations except Hostility-Understanding correlation (p = .71) and Hostility-Reasoning correlation (p = .007). " Correlations for clinician ratings are based only on data from the 283 hospitalized incompetent participants.

BPRS Psychoticism subscale and MMPI-2 Psychoticism scale, and with psychopathology generally as measured by the BPRS total score. Present cognitive ability, as measured by performance on select subtests of the WMS-R, correlated positively with performance on the MacCAT-CA. Similar to results obtained with the research version of the MacCAT-CA (the MacSAC-CD; Hoge et al., 1997), presumed competent jail defendants showed higher levels of depression as measured by the BPRS than hospitalized incompetent defendants. Global ratings of the HI participants' competence to proceed were offered by forensic clinicians who were knowledgeable about their clinical conditions. As expected, clinicians' ratings of competence were moderately correlated with performance on the MacCAT-CA (Understanding, r = .36; Reasoning, r = .42; Appreciation, r = .49) presumably due, in part, to the restriction in range (i.e., only hospitalized incompetent defendants received such clinician ratings). These results provide evidence of concurrent validity for the MacCAT-CA. Sample differences. Table 6 provides the mean and the standard deviation of each sample for each of the three MacCAT-CA psycholegal ability measures: Understanding, Reasoning, and Appreciation. These means differed significantly in a multivariate analysis of covariance (MANCOV\) in which type of pretrial forensic evaluation service delivery system and mandated training requirements were entered as covariates, multivariate F(18, 2,160) = 9.79, p < .001." For each measure, planned orthogonal comparisons revealed that the HI sample scored significantly lower (i.e., were more impaired regarding their cam-

' Although these two variables are important to assess the generalizability of MacCAT-CA scores, they are potential confounds that might obscure within-state group differences (HI vs. JT and HI vs. JU) across states with differing organizational structures and training requirements. It should be noted, however, that not including these covariates in this analysis resulted in essentially similar results, with defendants in the HI group obtaining significantly lower scores than those in the JT and JU

Note.

ICC = intraclass correlation.

groups across all three MacCAT-CA measures.

OTTO ET AL.

440

Tible 6 MacArthur Competence Assessment Tool-Criminal Adjudication Descriptive Statistics by Group and Multivariate Analysis of Covariance (MANCOVA) Results JU (n = 197)

IT (n = 249)

HI(R) (n = 90)

HI (n = 283)

HI(C) (n = 170)

Measure

M

SD

M

SD

M

SD

M

SD

M

SD

Univariate F' (2, 720)

Univariate F" (3,696)


12.50 13.27 11.44

3.08 2.64 1.01

12.56 12.90 11.02

3.25 2.90 1.63

9.11 9.33 7.89

4.19 4.31 4.01

10.78 11.14 9.87

3.41 3.75 3.00

8.14 8.23 6.58

4.17 4.23 4.08

79.12 95.92 121.07

73.35 88.27 129.08

Note. Clinician ratings of competence were unavailable for 23 hospitalized incompetent (M) defendants, who were not included in the second MANCOVA. JU = unscreened jaU inmates, JT = jail inmates receiving mental health services; HI(C) = clinically affirmed incompetent ffl defendants; ffl(R) = residual HI defendants. All F values significant at p < .001. • Univariate F for MANCOVA comparing JU, JT, and HI samples. " Univariate F for MANCOVA comparing JU, JT, HI(R), and HI(C) samples.

petence-related abilities) than the JU sample [Understanding, ((720) = 11.37; Reasoning, ((720) = 12.72; Appreciation, ((720) = 13.74, allps < .001]. The planned comparisons contrasting the means from the HI and JT samples provide a more stringent test of group discrimination. As expected, and lending support for the claim that the MacCAT-CA can discriminate between competent and incompetent mentally ill defendants, the HI group mean was significantly lower than the JT group mean for each MacCAT-CA measure [Understanding, ((720) = 10.04; Reasoning, ((720) = 10.66; Appreciation, /(720) = 12.84, all ps < .001]. This MANOTvA was repeated after separating the HI sample into two groups on the basis of clinicians' ratings of competence. The HI(C) sample comprised those HI participants who were clinically confirmed as incompetent on the basis of independent ratings provided by a treatment-team clinician at the time that the research protocol was administered. The HI(R) sample comprised the residual HI participants whose clinician ratings reflected a clinical impression that the individuals were at least marginally competent at the time of protocol administration. A significant multivariate effect was obtained, F(18, 2,088) = 9.82, p < .001, for this analysis. Planned comparisons indicated that the HI(C) sample obtained lower mean scores on each of the MacCAT-CA measures when compared to the HI(R) group, rs(696) = 6.05, 6.98, and 9.93 for Understanding, Reasoning, and Appreciation, respectively; all ps < .001. 
Furthermore, despite obtaining higher scores than the HI(C) sample, the HI(R) sample obtained lower mean scores on each of the MacCAT-CA measures when compared with both the JU sample, ts(696) = 5.19, 5.53, and 4.70 for Understanding, Reasoning, and Appreciation, respectively; all ps < .001, and the JT sample, ts(696) = 3.88, 3.66, and 3.62 for Understanding, Reasoning, and Appreciation, respectively; all ps < .001, suggesting greater impairment in competence-related abilities (see Table 6).

We note that we did not include race as a covariate in these analyses, although race does have a statistically significant bivariate correlation with Understanding (-.15) and Reasoning (-.09).⁵ Partial correlations, controlling for cognitive functioning (i.e., prorated IQ scores), indicated nonsignificant correlations between race and MacCAT-CA performance. Thus, we attribute the difference in MacCAT-CA scores between White

and non-White participants to differences in current cognitive functioning.

The preceding analyses documented the statistical significance of the differences between groups of participants on the MacCAT-CA measures. Table 7 provides information regarding the magnitude of those between-group differences. The measure of effect size used to estimate group differences was Cohen's (1977) d, which is defined as follows:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the means of the two groups being compared, and $s$ is the pooled within-group standard deviation. Thus, Cohen's d expresses the difference between means relative to within-group variation. As the first two columns of Table 7 show, comparisons involving all hospitalized incompetent defendants yielded effect sizes of approximately 1 SD across the three MacCAT-CA measures. Further, as can be seen in the third and fourth columns of the table, substantially larger effect sizes were obtained from comparisons involving the clinically affirmed incompetent defendants. In every case, the between-group differences exceeded 1 SD, ranging from almost 1.25 to more than 1.5 SDs. In contrast, more moderate effects were observed in comparisons involving incompetent defendants who were not clinically affirmed as such. The latter effect sizes clustered around one half of a standard deviation.

Discussion

We have described the derivation of a new clinical tool for use in the assessment of criminal defendants' adjudicative competence, the MacCAT-CA. We also have presented research findings concerning its psychometric properties and potential clinical utility in evaluating psycholegal abilities relevant to competence determinations. These results suggest that the MacCAT-CA measures of Understanding, Reasoning, and Appreciation have good interrater reliability (intraclass Rs ranging from .75 to .90) and strong internal consistency (αs > .80, mean

⁵ It is noted, however, that race accounted for only a small percentage of the variance associated with the Understanding (2%) and Reasoning (1%) scores.
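
As a quick check on the percentages in this footnote, and assuming that variance accounted for is taken as the squared bivariate correlation (an assumption; the computation is not spelled out in the text), the two correlations reported for race work out as follows:

$$r^2_{\text{Understanding}} = (-.15)^2 = .0225 \approx 2\%, \qquad r^2_{\text{Reasoning}} = (-.09)^2 = .0081 \approx 1\%$$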


Table 7

Effect Sizes (Cohen's d) for Comparisons Between Samples of Jailed and Hospitalized Incompetent Defendants

Measure          JU-HI    JT-HI    JU-HI(C)    JT-HI(C)    JU-HI(R)    JT-HI(R)
Understanding     .90      .91      1.20        1.21         .54         .54
Reasoning        1.06      .96      1.45        1.33         .70         .56
Appreciation     1.13     1.00      1.69        1.54         .84         .55

Note. JU = unscreened jail inmates; JT = jail inmates receiving mental health services; HI = hospitalized incompetent defendants; HI(C) = clinically affirmed incompetent HI defendants; HI(R) = residual HI defendants.
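
The effect sizes in Table 7 follow the definition of Cohen's d given in the text: the difference between two group means divided by the pooled within-group standard deviation. The sketch below illustrates that calculation under stated assumptions: the pooled variance weights each group's sample variance by its degrees of freedom (a common convention, not confirmed by the article), the function name `cohens_d` is hypothetical, and the scores are invented for illustration and are not data from the study.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Difference between two group means in pooled-SD units (Cohen's d)."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance is the sample variance (n - 1 denominator).
    # Assumption: pool the two variances weighted by degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


# Hypothetical scores for two groups (illustration only, not study data).
jail_scores = [12, 14, 11, 15, 13]
hospital_scores = [9, 11, 8, 12, 10]

print(round(cohens_d(jail_scores, hospital_scores), 2))
```

Because d is expressed in standard deviation units, values from scales with different numbers of items (such as Understanding, Reasoning, and Appreciation) can be compared directly, which is how the table is read in the text.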

interitem rs ranging from .36 to .54), as well as evidence of construct validity (expected patterns of correlation with measures of cognitive ability, psychopathology, and clinical judgments of degree of impaired competence).

The psychometric properties of the MacCAT-CA scales observed in this study compare favorably with those reported for other measures of adjudicative competence, such as the 22-item CST (Lipsitt et al., 1971) and the 21-item GCCT-Mississippi State Hospital revision (GCCT-MSH; Nicholson, Briggs, & Robertson, 1988). Although the individual MacCAT-CA scales contain fewer items, they produced estimates of internal consistency reliability that are similar to those reported for the other competency measures. In addition, differences between competent and incompetent defendants on the MacCAT-CA scales are comparable to those obtained with the CST and GCCT-MSH. Reanalysis of data reported in a quantitative review comparing competent and incompetent defendants (Nicholson & Kugler, 1991) yielded a mean effect size of 1.05 SDs across six studies (n = 429) involving the CST and a mean effect size of 1.20 SDs across four studies (n = 539) involving the GCCT. In the present study, effect sizes for the MacCAT-CA scales ranged from almost 1 SD for comparisons with all incompetent defendants to more than 1.5 SDs for comparisons with clinically affirmed incompetent defendants. Interestingly, the Understanding scale, which covers information similar to that assessed by other knowledge-based competency measures (e.g., the characteristics of criminal prosecution and defense, the nature of criminal charges, the consequences of conviction), tended to yield smaller effects than either the Reasoning or Appreciation scales, which tap domains of competence-related abilities not represented on other measures.

The MacCAT-CA will be available to the field after development of clinical interpretive norms, currently in progress.
Norms will be derived using this eight-state sample of 729 defendants, providing data about the distribution of MacCAT-CA performance among competent and incompetent criminal defendants. These norms will serve as guideposts to inform individualized judgments regarding the competence of defendants, more specifically, defendants' abilities to understand and reason about legally relevant information, and to appreciate the significance of such information in the context of their own legal situation. Although the MacCAT-CA is useful in evaluating criminal defendants whose competence has been questioned, it is not offered as a measure of legal competence per se. That is, we do not anticipate the derivation of a single optimal cutoff score

for the empirical classification of a defendant as competent or incompetent. Other psycholegal abilities are relevant to the ultimate determination of competence (e.g., memory for specific events related to one's own case; coherency of speech), and a court may consider other nonpsychological factors (e.g., legal complexity of the case) in making the dichotomous judgment about "competence." As a result, no single measure, no matter how nuanced, can encompass all of the potentially relevant factors.

Additionally, the MacCAT-CA includes no scales designed to assess the examinee's test-taking set. Some defendants may be motivated to appear to lack capacity when, in fact, they do not. As is the case with any forensic evaluation, the examiner must consider the possibility of exaggerated and fabricated deficits or psychopathology. When such deception is suspected, it is necessary to use other measures designed to assess the defendant's test-taking set.

Nevertheless, the MacCAT-CA, through its measurement of understanding, reasoning, and appreciation capacities, offers a more systematic and differentiated analysis of the psycholegal abilities relevant to adjudicative competence than do existing measures. Grounded in a comprehensive theory of adjudicative competence, the MacCAT-CA offers the additional features of standardized administration and criterion-related scoring that are missing from other interview-based measures. Used as part of a comprehensive evaluation that includes a relevant history (e.g., psychiatric history), current mental-status evaluation, and inquiry about case-specific memories, the MacCAT-CA should enhance the thoroughness and quality of clinical evaluations of adjudicative competence.

References

Appelbaum, P. S., Mirkin, S. A., & Bateman, A. L. (1981). Empirical assessment of competency to consent to psychiatric hospitalization. American Journal of Psychiatry, 138, 1170-1176.
Barnard, G., Thompson, J. W., Freeman, W. C., Robbins, L., Gies, D., & Hankins, G. C. (1991). Competency to stand trial: Description and initial evaluation of a new computer-assisted assessment tool (CADCOMP). Bulletin of the American Academy of Psychiatry and Law, 19, 367-381.
Bartko, J. J., & Carpenter, W. T. (1976). On the methods and theory of reliability. Journal of Nervous and Mental Disease, 163, 307-317.
Bonnie, R. J. (1992). The competence of criminal defendants: A theoretical reformulation. Behavioral Sciences and the Law, 10, 291-316.
Bonnie, R. J. (1993). The competence of criminal defendants: Beyond Dusky and Drope. University of Miami Law Review, 47, 291-316.


Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press.
Farkas, G. M., DeLeon, P. H., & Newman, R. (1997). Sanity examiner certification: An evolving national agenda. Professional Psychology: Research and Practice, 28, 73-76.
Fiske, D. W. (1971). Measuring the concepts of personality. Chicago: Aldine.
Golding, S. L., Roesch, R., & Schreiber, J. (1984). Assessment and conceptualization of competency to stand trial: Preliminary data on the Interdisciplinary Fitness Interview. Law and Human Behavior, 9, 321-334.
Grisso, T. (1991). Clinical assessment for legal decision making: Research recommendations. In S. A. Shah & B. D. Sales (Eds.), Law and mental health: Major developments and research needs (DHHS Publication No. ADM 91-1875, pp. 49-80). Rockville, MD: National Institute of Mental Health.
Grisso, T., & Appelbaum, P. S. (1998). Assessing competence to consent to treatment. New York: Oxford University Press.
Grisso, T., Appelbaum, P. S., & Hill-Fotouhi, C. (1997). The MacCAT-T: A clinical tool to assess patients' capacities to make treatment decisions. Psychiatric Services, 48, 1415-1419.
Grisso, T., Appelbaum, P. S., Mulvey, E. P., & Fletcher, K. (1995). The MacArthur treatment competence study. II: Measures of abilities related to competence to consent to treatment. Law and Human Behavior, 19, 127-148.
Grisso, T., Cocozza, J., Steadman, H., Fisher, W., & Greer, A. (1994). The organization of pretrial forensic evaluations services: A national profile. Law and Human Behavior, 18, 377-394.
Harkness, A. R., & McNulty, J. L. (1994). The personality psychopathology five (PSY-5): Issues from the pages of a diagnostic manual instead of a dictionary. In S. Strack & M. Lorr (Eds.), Differentiating normal and abnormal personality. New York: Springer.
Harkness, A. R., McNulty, J. L., & Ben-Porath, Y. S. (1995). The personality psychopathology five (PSY-5): Constructs and MMPI-2 scales. Psychological Assessment, 7, 104-114.
Hoge, S. K., Bonnie, R. J., Poythress, N. G., Monahan, J., & Eisenberg, M. (1997). The MacArthur Adjudicative Competence Study: Development and validation of a research instrument. Law and Human Behavior, 21, 141-179.
Hollingshead, A. R., & Redlich, F. C. (1958). Social class and mental illness. New York: Guilford Press.
Kaufman, A. S., Ishikuma, T., & Kaufman-Packer, J. L. (1991). Amazingly short forms of the WMS-R. Journal of Psychoeducational Assessment, 9, 4-15.
Laboratory of Community Psychiatry (1974). Competency to stand trial and mental illness (DHEW Publication No. ADM 74-103). Rockville, MD: Department of Health, Education, & Welfare.
Lipsitt, P., Lelos, D., & McGarry, A. L. (1971). Competency for trial: A screening instrument. American Journal of Psychiatry, 128, 105-109.
Melton, G. B., Petrila, J., Poythress, N., & Slobogin, C. (1997). Psychological evaluations for the courts: A handbook for mental health professionals and lawyers (2nd ed.). New York: Guilford Press.
Menzies, R. J., Webster, C. D., Roesch, R., Jensen, F., & Eaves, D. (1984). The Fitness Interview Test: A semi-structured instrument for assessing competency to stand trial. Medicine and Law, 3, 151-162.
Nicholson, R. (1992, August). Defining and assessing criminal competencies. Paper presented at the 100th Annual Convention of the American Psychological Association, Washington, DC.
Nicholson, R. A., Briggs, S. R., & Robertson, H. C. (1988). Instruments for assessing competency to stand trial: How do they work? Professional Psychology: Research and Practice, 19, 383-394.
Nicholson, R. A., & Kugler, K. E. (1991). Competent and incompetent criminal defendants: A quantitative review of comparative research. Psychological Bulletin, 109, 355-370.
Overall, J., & Gorham, D. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799-812.
Overall, J., & Porterfield, J. (1963). Power vector method of factor analysis. Psychometrika, 28, 415-422.
Poythress, N. G., Hoge, S. K., Bonnie, R. J., Monahan, J., Eisenberg, M., & Feucht-Haviar, T. (1998). The competence related abilities of women criminal defendants. Journal of the American Academy of Psychiatry and Law, 26, 215-222.
Poythress, N. G., Nicholson, R., Otto, R. K., Edens, J. F., Bonnie, R. J., Monahan, J., & Hoge, S. K. (in press). Manual for the MacArthur Competence Assessment Tool-Criminal Adjudication (MacCAT-CA). Odessa, FL: Psychological Assessment Resources.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Steadman, H. J., & Hartstone, E. (1983). Defendants incompetent to stand trial. In J. Monahan & H. J. Steadman (Eds.), Mentally disordered offenders: Perspectives from law and social science. New York: Plenum.
Wechsler, D. (1981). The Wechsler Adult Intelligence Scale-Revised: Manual. San Antonio, TX: Psychological Corporation.
Wildman, R., Batchelor, E., Thompson, L., Nelson, F., Moore, J., Patterson, M., & DeLaosa, M. (1980). The Georgia Court Competency Test: An attempt to develop a rapid quantitative measure for fitness for trial. Unpublished manuscript, Forensic Services Division, Central State Hospital, Milledgeville, Georgia.
Woerner, M. G., Mannuzza, S., & Kane, J. M. (1988). Anchoring the BPRS: An aid to improved reliability. Psychopharmacology Bulletin, 24, 112-118.


Appendix

Legal Content of the MacCAT-CA Measures

A. Legal Content of the Understanding Measure

1. Understanding the roles of defense attorney and prosecutor.
2. Understanding both the act and mental elements of a serious offense.
3. Understanding the elements of a less serious offense.
4. Understanding the role of a jury.
5. Understanding the responsibilities of a judge at a jury trial.
6. Understanding sentencing as a function of the severity of the offense.
7. Understanding the process of a guilty plea.
8. Understanding the rights waived in pleading guilty.

B. Legal Content of the Reasoning Measure

9. Reasoning about evidence suggesting self-defense.
10. Reasoning about evidence related to criminal intent.
11. Reasoning about evidence of provocation.
12. Reasoning about motivation for one's behavior.
13. Reasoning about the potential impact of alcohol on one's behavior.
14. Capacity to identify information that might inform the decision to plead guilty versus plead not guilty.
15. Capacity to identify both potential costs and potential benefits of a legal decision (e.g., pleading guilty).
16. Capacity to compare one legal option (e.g., accepting a plea bargain) with another legal option (e.g., going to trial) in terms of advantages and disadvantages.

C. Legal Content of the Appreciation Measure

17. Plausibility of defendant's beliefs about the likelihood of being treated fairly by the legal system.
18. Plausibility of defendant's beliefs about likelihood of being helped by his/her lawyer.
19. Plausibility of defendant's beliefs about whether to disclose case information to his/her attorney.
20. Plausibility of defendant's beliefs about likelihood of being found guilty.
21. Plausibility of defendant's beliefs about likelihood of being punished if found guilty.
22. Plausibility of defendant's beliefs about whether to accept a plea bargain.

Received April 23, 1998
Revision received June 21, 1998
Accepted July 30, 1998
