Journal of Counseling Psychology Clinical Validity of the Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62): Further Evaluation and Clinical Applications Andrew A. McAleavey, Samuel S. Nordberg, Jeffrey A. Hayes, Louis G. Castonguay, Benjamin D. Locke, and Allison J. Lockard Online First Publication, September 3, 2012. doi: 10.1037/a0029855
Journal of Counseling Psychology 2012, Vol. 59, No. 4, 000
© 2012 American Psychological Association 0022-0167/12/$12.00 DOI: 10.1037/a0029855
Clinical Validity of the Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62): Further Evaluation and Clinical Applications

Andrew A. McAleavey, Samuel S. Nordberg, Jeffrey A. Hayes, Louis G. Castonguay, Benjamin D. Locke, and Allison J. Lockard
The Pennsylvania State University

Self-report instruments of psychological symptoms are increasingly used in counseling centers, but their use depends on rigorous evaluation of their clinical validity. Three studies reported here (total N = 26,886) investigated the validity of the Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62; Locke et al., 2011) as an assessment and screening instrument. In Study 1, initial evidence regarding the concurrent validity of the CCAPS-62 was replicated and extended in a naturalistic clinical sample of clients from 16 counseling centers. Using this sample, convergent validity of the subscales was examined in counseling center clients, the range of sensitivity of the subscales was investigated using item-response theory, and the presence of second-order factors was preliminarily examined. In Study 2, 7 of the 8 CCAPS-62 subscales statistically significantly differentiated between students in counseling and those who were not, using data collected from a large national survey, although most differences were small and the groups’ distributions overlapped considerably. Cut scores based on the differences between these clinical and nonclinical populations showed limited utility due to overall similarities between these broadly defined groups. In Study 3, therapist-rated diagnoses collected from 5 university counseling centers were used to further examine the validity of subscale scores. In addition, cut points for diagnostic screening using receiver operating characteristic curves were evaluated.
Overall, these studies support the use of the CCAPS-62 as an initial measure of psychological symptoms in college counseling settings, provide additional information about its psychometric performance, develop cut scores, and illustrate the potential for collaboration between practitioners and researchers on a large scale.

Keywords: counseling, CCAPS-62, validity, item-response theory, receiver operating characteristic (ROC) curves
Andrew A. McAleavey and Samuel S. Nordberg, Department of Psychology, The Pennsylvania State University; Jeffrey A. Hayes, Counseling Psychology Program, The Pennsylvania State University; Louis G. Castonguay, Department of Psychology, The Pennsylvania State University; Benjamin D. Locke, Center for Assessment and Psychological Services, The Pennsylvania State University; Allison J. Lockard, Counseling Psychology Program, The Pennsylvania State University. Correspondence concerning this article should be addressed to Andrew A. McAleavey, Department of Psychology, The Pennsylvania State University, 140 Moore Building, University Park, PA 16802. E-mail: [email protected]

Several authors have called for more rigorous and efficient assessment of psychological symptoms, treatment needs, and treatment outcomes in college and university counseling centers to improve both clinical practice and research on psychotherapy with this population (e.g., Erdur-Baker, Aberson, Barrow, & Draper, 2006; Kettmann et al., 2007; Kopta & Lowry, 2002; Lambert & Hawkins, 2004; Sharkin, 2004; Sharkin & Coulter, 2005). These calls for standardized assessment have arisen in the context of reports from counseling center directors of increasing burdens on staff, hiring freezes, and a lack of consistent assessment procedures (e.g., Barr, Rando, Krylowicz, & Winfield, 2010; Gallagher, 2009).

Additionally, several lines of research over the past two decades have converged on the need for increased rigor of assessment in psychotherapy practices in order to provide valuable practice-based evidence about the nature of therapeutic work. For instance, Barkham, Stiles, Lambert, and Mellor-Clark (2010) suggested that regular institutional use of repeated measures of distress can provide opportunities for assessment of therapy quality and have important implications both for individual cases and for larger entities such as mental health agencies. Other authors (e.g., Howard, Moras, Brill, Martinovich, & Lutz, 1996) have suggested that routine assessment in clinical practice can help address the research question, “What particular treatment may work most efficiently for a particular patient?” as well as provide feedback to clinicians on whether their patients are on track for successful outcome (e.g., Lambert, Hansen, Umpress, et al., 2001). Several authors have suggested that college counseling centers are an ideal location for research into naturalistic psychotherapy (e.g., Kopta & Lowry, 2002; Lambert & Hawkins, 2004), further encouraging the adoption of more rigorous assessment in counseling centers. There is also strong indication of a convergence between the needs of counselors related to assessment, triage, treatment planning, and quality improvement and the interests of researchers who assess treatment effectiveness and the impact of feedback. The potential clinical benefits of improved assessment and outcome monitoring practices that have been identified include reduction of treatment deterioration through outcome monitoring (e.g., Lambert, Hansen, & Finch, 2001), more efficient and evidence-based allocation of resources within counseling centers, demonstration of clinical effectiveness, revealing potential areas for training (e.g., Boswell, McAleavey, Castonguay, Hayes, & Locke, in press; Kraus, Castonguay, Boswell, Nordberg, & Hayes, 2011), and opportunities for standardized comparisons between counseling center clients across different institutions. In addition, regular assessment during treatment should provide clinicians with additional information with which to make case formulations and guide treatment. Empirically, frequent and valid assessment has been used to demonstrate, for instance, that at least some counseling centers appear to be providing services that are roughly as efficacious as many controlled outcome studies of psychotherapy efficacy (e.g., Minami et al., 2009) and that individual counselors account for a meaningful percentage of the variation in counseling outcomes (Okiishi et al., 2006). Additionally, researchers have indicated that early change in clients may be a powerful predictor of overall change and a promising indicator of when treatment is working and when it is not (Stulz, Lutz, Leach, Lucock, & Barkham, 2007). However, achieving these converging clinical and research goals depends on reliable and valid measurement of clinical problems. Several instruments have been developed to help reach these goals, including the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1992), the Behavioral Health Monitor-20 (BHM-20; Kopta & Lowry, 2002), the Outcome Questionnaire-45 (OQ-45; Lambert, Hansen, Umpress, et al., 2001), and the Treatment Outcome Package (TOP; Kraus, Seligman, & Jordan, 2005). Each of these measures has a well-established psychometric profile and a considerable research literature to recommend it, but these measures were designed with general psychiatric patients as the intended population, rather than college counseling clients in particular.
In contrast, a small number of measures have been developed specifically for use in college counseling, but widespread use of many of these scales is limited by various factors. For instance, the K-State Problem Identification Rating Scales (Robertson et al., 2006) were designed for use in college counseling centers and assess common student problems using a relatively brief, standard form. However, there is limited information regarding their psychometric performance across multiple counseling centers. Similarly, the Presenting Problem Checklist (Diemer, Wang, & Dunkle, 2009) was developed for use within one counseling center, and information regarding its convergent validity is lacking. Another example is the Millon College Counseling Inventory (Millon, Strack, Millon-Niedbala, & Grossman, 2008), which was designed to be a multidimensional assessment of specific concerns appropriate to counseling center populations and was normed using multiple counseling centers. However, this instrument is long (150 items, comprising 36 subscales), and data regarding its use as a repeated measure in counseling are not available.

The Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62; Locke et al., 2011) was developed as a standardized instrument for use with college counseling center clients. It provides a multidimensional assessment of psychological symptoms that were identified as important by counseling center staff and experienced psychotherapy researchers. Further development of this instrument was undertaken by the Center for Collegiate Mental Health (CCMH) and was based on a large (n = 22,060) and nationally representative sample of counseling center clients, which has contributed to its rapid adoption in the field; Locke et al. (2012) reported that at least 100 counseling centers in the United States and Canada use either the CCAPS-62 or its short form, the CCAPS-34. Both of the CCAPS instruments are offered to college and university counseling centers free of charge for clinical and research purposes. A subset of the counseling centers that administer the CCAPS instruments also regularly collect informed consent from their clients to share de-identified information with CCMH. Given its broad distribution, it is all the more essential to continue to evaluate the suitability of the CCAPS-62 for use in counseling centers and to improve research-based tools designed to facilitate interpretation and decision making.

The CCAPS-62 has eight factor-derived subscales: Depression, Eating Concerns, Substance Use, Generalized Anxiety, Hostility, Social Anxiety, Family Distress, and Academic Distress. As a first step in the validation process, Locke et al. (2011) used a large sample (n = 11,106) of counseling center clients to derive these factors in an exploratory factor analysis and confirmed the measure’s structure using a cross-validation sample (n = 10,954). Results of that study demonstrated excellent confirmatory model fit (comparative fit index [CFI] = .97; Tucker–Lewis index [TLI] = .97; root-mean-square error of approximation [RMSEA] = .051; standardized root-mean-square residual [SRMR] = .057) and good internal consistency of subscale scores (range: .78–.92). They also provided evidence regarding the stability of the subscales over time with a nonclinical sample of university students (n = 117). One-week test–retest stability coefficients were between .78 and .93 for each of the eight subscales. Two-week test–retest stability coefficients were between .76 and .92, indicating that, for students not attending psychotherapy (and from whom little change would be expected in mean level of symptoms), the scales are stable over short periods of time.

In addition, Locke et al. (2011) tested the concurrent validity of the subscales with a sample of nonclinical students (n = 499) from a single mid-Atlantic university. Because each subscale of the CCAPS-62 assesses a conceptually different set of psychological concerns, these authors had participants complete several referent measures, one for each subscale of the CCAPS-62. Results from this initial study revealed that each CCAPS-62 subscale correlated most highly with its referent measure. While these initial results indicated that individual subscales assess their intended constructs in this sample of nonclinical students, additional work is required before the CCAPS-62 can be accepted as a well-validated instrument for use with college counseling populations. Indeed, now that several basic psychometric properties of the instrument have been reported, examining the utility and applicability of the CCAPS-62 in counseling centers is an essential step in evaluating this measure, both for practice and research. As described by Cizek (2012), test “validity” is frequently invoked to imply one of two distinct concerns: score meaning (i.e., does a test provide an accurate measure of its designed construct?) and justification of test use (i.e., is it justified to use the test in a specific way?). While there is obvious overlap between these validity questions, they must be kept distinct. Many of the initial studies reported by Locke et al. (2011) addressed the first definition of validity; based on these studies we can be reasonably certain that the scores of CCAPS-62 subscales at least adequately measure the constructs that they were intended to assess. However,
more work remains toward this aim, and important questions are left to answer regarding the recommended applications of the CCAPS-62.
The Present Report: Three Clinically Focused Validation Studies

In this article, we report results of three studies designed to further assess the validity of the CCAPS-62, as well as to examine its clinical usefulness in counseling centers. The CCAPS-62 is designed for use in counseling centers, to assist with intake, triage, follow-up assessments, and outcome monitoring. Locke et al. (2011) provided preliminary evidence of adequate construct validity and reliability with a nonclinical sample at a single university, and if the CCAPS-62 is to be used in counseling centers, its subscale score meanings must be validated in this population. The first study extends that preliminary work to the setting for which the CCAPS-62 was designed, using data from clients at multiple counseling centers. Another question addressed in the first study is whether the CCAPS-62’s subscales are sensitive at the levels of distress experienced by most clients seeking counseling. In essence, the subscale scores should be able to detect differences in scores at typical levels of distress for clients, rather than only extremely high or low levels of distress. To examine this, we used item-response-theory (IRT) analysis of the CCAPS-62 and plotted test information functions (TIFs) for each of the subscales in this clinical sample. The final topic addressed in this study is the higher order organization of CCAPS-62 subscales. Since subscale scores on the CCAPS-62 are correlated, they may be influenced by one or multiple organizing constructs, such as negative affect or internalizing problems. If there are higher order organizing factors on the CCAPS-62, these factors may be clinically informative along with the domain-specific subscale scores (McAleavey, Nordberg, Kraus, & Castonguay, 2012). One potential use of the CCAPS-62 is in triage and outcome monitoring.
Some other commonly used instruments in counseling centers (e.g., the Outcome Questionnaire-45; Lambert, Hansen, Umpress, et al., 2001) use scores to differentiate between “clinical” and “nonclinical” levels of distress. If the CCAPS-62 is to be an acceptable tool to use alongside or as an alternative to these instruments, the use of its subscale scores for this purpose must be justified. While researchers have noted that many college students with mental health concerns do not seek treatment (e.g., Blanco et al., 2008), several studies have nevertheless found meaningful differences in the reported distress of students who seek psychotherapy and those who do not (Eisenberg, Hunt, Speer, & Zivin, 2011; Hayes, Youn, et al., 2011). Thus, as another evaluation of the CCAPS-62’s clinical usefulness, the second study examined its ability to discriminate between students in counseling and those not in counseling. We would expect that, if the CCAPS-62 subscales are sensitive to psychological distress, students seeking counseling would report higher distress, on average, than students not seeking such services. This would potentially allow for creation of cut scores that could optimally differentiate between “clinical” and “nonclinical” ranges of the CCAPS-62 subscales, which is commonly done using methods developed by Jacobson and Truax (1991) and described by Lambert and Ogles (2009). In addition, a better understanding of differences in CCAPS-62 scores between those seeking treatment and those not may offer
important avenues for research into treatment-seeking behavior, with potential applications to outreach and preventative services. Finally, if the CCAPS-62 is to be used during initial and follow-up assessments, it must be justified for the purpose. Evidence that the CCAPS-62 subscales correspond to and predict commonly assessed clinical disorders could serve as justification for this test use. Further, it would be optimal if the CCAPS-62 could be used explicitly as a diagnostic screening instrument with which counselors could make initial diagnostic impressions for further evaluation. Since the CCAPS-62 was not designed originally as a diagnostic instrument, its performance must be carefully evaluated in this regard. In particular, adequate diagnostic predictions by the subscales of the CCAPS-62 would not only help validate the subscales’ score meanings but could also facilitate intake procedures at counseling centers. To address these important clinical issues, the third study examined whether the CCAPS-62 subscales correspond to and predict commonly assessed clinical disorders.
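The Jacobson and Truax (1991) approach to cut scores mentioned above defines, among other criteria, a cutoff c at the SD-weighted midpoint between the clinical and nonclinical group means. The following is a minimal sketch of that criterion; the means and standard deviations shown are hypothetical, not values from this article.

```python
def jacobson_truax_c(m_clin, sd_clin, m_nonclin, sd_nonclin):
    """Criterion c of Jacobson & Truax (1991): the point between the two
    group means, weighted so that c lies closer to the mean of the group
    with the smaller standard deviation."""
    return (sd_clin * m_nonclin + sd_nonclin * m_clin) / (sd_clin + sd_nonclin)

# Hypothetical subscale means/SDs (illustration only):
cut = jacobson_truax_c(m_clin=2.1, sd_clin=0.9, m_nonclin=1.0, sd_nonclin=0.7)
# cut = 1.48125, which falls between the two means
```

When the two distributions overlap heavily, as Study 2 found for most subscales, scores near c are common in both groups and the cutoff classifies cases poorly, which is the limitation the authors report.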
Study 1: Concurrent Validity With Established Measures in a Counseling Sample, Ranges of Test Information, and Exploration of Higher Order Factors

The aim of Study 1 was primarily to replicate and extend the design of Locke et al. (2011), using a clinical sample drawn from multiple counseling centers and an additional referent measure. Further, this sample was used to examine the effective ranges of sensitivity of the CCAPS-62’s subscales in the clinical population. Last, because several CCAPS-62 subscales are known to correlate significantly with one another, this study sought to explore the presence of higher order latent constructs that might explain some of the interrelatedness.
Method

Participants. Participants in Study 1 were 3,470 graduate and undergraduate students attending an intake for counseling services at 16 college and university counseling centers across the United States. The mean age of the sample was 22.4 years (range = 18–62). Of the participants, 560 (18.0%) classified themselves as freshman, 604 (16.9%) as sophomore, 659 (21.2%) as junior, 704 (22.6%) as senior, 525 (16.9%) as graduate student, and 418 did not report their academic standing or reported another option. Women represented 61.1% of the sample (n = 2,070), men 38.4% (1,302), eight (0.3%) identified as transgender, and 90 did not report gender. In terms of ethnicity, 307 (9.1%) identified as African American, 227 (6.8%) as Asian American, 167 (5.0%) as Latino/a, 2,418 (71.7%) as European American, 103 (3.0%) identified as multiracial, and 238 (4.0%) identified as “other” or preferred not to answer.

Measures. To facilitate comparisons between the clinical sample in this study and the nonclinical sample examined by Locke et al. (2011), the same referent measures were used in this study with the addition of one other validation instrument, the PHQ-9.

Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62; Locke et al., 2011). The CCAPS-62 is a 62-item measure designed to assess a range of psychological symptoms common in Axis I disorders most applicable to the college population. It has eight subscales: Depression, Generalized Anxiety, Social Anxiety, Eating Concerns, Substance Use, Hostility, Academic Distress, and Family Distress. Respondents are asked to rate themselves over the previous 2 weeks, with each item rated on a 5-point Likert-type scale anchored at 0 (not at all like me) and 4 (extremely like me). Nine items on the CCAPS-62 are reverse-scored. The subscale scores are calculated by summing item scores such that higher scores indicate more distress. This is based on the simple structure of the CCAPS-62 (see Locke et al., 2011), and the subscale scores derived in this way have demonstrated acceptable internal consistency and test–retest reliability estimates, as well as initial evidence of convergent validity in a largely nonclinical college population (Locke et al., 2011). In this sample, the Cronbach’s alphas were α = .92 for Depression, α = .85 for Generalized Anxiety, α = .85 for Social Anxiety, α = .87 for Eating Concerns, α = .85 for Substance Use, α = .85 for Hostility, α = .83 for Academic Distress, and α = .84 for Family Distress.

Alcohol Use Disorders Identification Test (AUDIT; Saunders, Aasland, Babor, de la Fuente, & Grant, 1993). The AUDIT is a widely used 10-item measure of problematic alcohol use and has demonstrated adequate validity and internal consistency reliability (α = .88 in an American sample; Saunders et al., 1993). In this sample the Cronbach’s alpha was α = .85.

Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). The BDI is a 21-item self-report measure designed to assess symptoms of depression. The BDI has demonstrated strong convergent validity with other measures of depression symptoms (Roberts, Lewinsohn, & Seeley, 1991). Its internal consistency has been reported at α = .86 on average (Beck, Steer, & Garbin, 1988). In this sample the Cronbach’s alpha was α = .91.

Beck Anxiety Inventory (BAI; Beck, Epstein, Brown, & Steer, 1988).
The BAI is a 21-item self-report measure designed to assess symptoms and severity of anxiety. The BAI has demonstrated adequate internal consistency reliability (α = .92) and concurrent validity (Steer, Ranieri, Beck, & Clark, 1993). In this sample the Cronbach’s alpha was α = .89.

Social Phobia Diagnostic Questionnaire (SPDQ; Newman, Kachin, Zuellig, Constantino, & Cashman-McGrath, 2003). The SPDQ is a 25-item self-report measure designed to assess the symptoms of social phobia, as defined in the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM–IV–TR; American Psychiatric Association, 2000). In samples of undergraduate college students, the SPDQ has demonstrated strong clinical validity and correlated highly with other measures of social anxiety, with excellent internal consistency (α = .95). In this sample the Cronbach’s alpha was α = .96.

Student Adaptation to College Questionnaire (SACQ; Baker & Siryk, 1986). This is a 67-item Likert-type self-report measure designed to assess undergraduate students’ adjustment to the college environment. The SACQ is scored using four subscales, and only the SACQ Academic Adjustment scale (internal consistency reliability has been reported at α = .84; Baker & Siryk, 1986) was used in the present analysis. In this sample the Cronbach’s alpha was α = .84.

Eating Attitudes Test-26 (EAT-26; Mintz & O’Halloran, 2000). This is a 26-item measure designed to assess problematic attitudes and behaviors related to eating. The EAT-26 has demonstrated strong internal consistency reliability (α = .90) and correlates highly with other measures of eating concerns (Garner, Olmsted, Polivy, & Garfinkel, 1984). In this sample the Cronbach’s alpha was α = .85.

Self-Report Family Inventory (SFI; Beavers, Hampson, & Hulgus, 1985, 1990). The SFI is a 36-item inventory designed to assess an individual’s perception of his or her family functioning. The SFI has demonstrated good internal consistency (overall α between .84 and .88) and retest reliability and has been shown to correlate highly with other self-report measures of family functioning (Tutty, 1995). In this study only the total scale score was used, since the CCAPS-62 Family Distress subscale assesses a relatively broad sense of family distress and the overall scale has good internal consistency reliability (Tutty, 1995). In this sample the Cronbach’s alpha was α = .64.

State-Trait Anger Expression Inventory-2 (STAXI-2; Spielberger, 1999). The STAXI-2 is a 57-item inventory designed to measure the experience, expression, and control of anger in adolescents and adults. Because the CCAPS Hostility subscale asks respondents to rate themselves over the past 2 weeks (rather than at the present moment), only the Trait Anger subscale of the STAXI-2 was used; in a study of college students the Trait Anger subscale had good internal consistency reliability (α = .84; Agliata & Renk, 2009). In this sample the Cronbach’s alpha was α = .86.

Marlowe-Crowne Social Desirability Scale-Short Version (MCSD; Reynolds, 1982). The MCSD is a 13-item inventory designed to assess socially desirable responding. The Marlowe-Crowne has demonstrated good internal consistency and validity (Reynolds, 1982) and is frequently used in self-report research. In this sample the Cronbach’s alpha was α = .73.

Patient Health Questionnaire-9 (PHQ-9; Kroenke, Spitzer, & Williams, 2001). The PHQ-9 is a nine-item measure of depressive symptoms consistent with the DSM–IV–TR (American Psychiatric Association, 2000) definition of major depressive disorder.
It has demonstrated good reliability (α = .86–.89 in a general health setting; Kroenke et al., 2001) as well as convergent validity with other measures of depression. This measure was added to the design of Locke et al. (2011) because it is directly based on the DSM–IV–TR criteria for diagnosis of major depressive disorder (MDD) and has demonstrated strong sensitivity and specificity to this diagnosis (sensitivity = .88; specificity = .88) in a sample of 580 patients (Kroenke et al., 2001), whereas some other measures may be said to assess depressed mood more generally. In this sample the Cronbach’s alpha was α = .86.

Procedure. Participants’ CCAPS-62 scores are included in the CCMH 2010–2011 data set (Center for Collegiate Mental Health, 2012), although these data were collected through a separate study. Participating centers were recruited on a voluntary basis via e-mail to member centers of the CCMH. Prior to their scheduled intake appointment at the counseling center, each participant completed the CCAPS-62 and one of the referent measures. All new clients seeking services during the data collection period at each counseling center were invited to participate. Each participating counseling center was assigned to collect 100 cases of the CCAPS-62 and one referent measure, although due to the specifics of data collection procedures across sites, not all sites contributed 100 cases of complete data. Each site administered only one referent measure at a time, and most (13) collected data from only one measure other than the CCAPS-62. One institution collected data from two referent measures during separate data
collection periods. In addition, data were collected from one counseling center at which it was standard practice to administer the PHQ-9 and the CCAPS-62 at intake. Data from this university were available for a 1-year period from August 2009 to August 2010, so all available cases were included.

Data analysis. Consistent with the methods used in the earlier validation study of the CCAPS-62 (Locke et al., 2011), data for any client who had completed fewer than 50% of the CCAPS-62 items were dropped from the study prior to analysis (n = 7). In addition, when referent measures provided heuristics in their respective manuals for how many items must be complete in order for an administration to be considered valid, we used these heuristics to identify invalid administrations of the referent measures and deleted these cases as well (n = 93). Thus, the final sample size for this study was 3,470. Independent-samples t tests indicated that the excluded cases did not differ significantly from the retained cases in severity on the CCAPS-62 subscale scores.

The primary analysis for this study required the computation of Pearson product-moment correlation coefficients between each CCAPS-62 subscale and the referent measures, aggregated across collection sites that used the same referent measure. Consistent with Locke et al. (2011), the peak correlations between CCAPS-62 subscales and the referent measures were inspected. It was hypothesized that all measures would correlate significantly with one another, given the tendency for self-report measures of psychological symptoms to correlate in meaningful ways, but that the highest correlations for each CCAPS-62 subscale would be with the referent measure(s) of the same construct. For instance, the correlation between the Eating Concerns subscale of the CCAPS-62 and the total score on the EAT-26 was hypothesized to be the highest correlation between the Eating Concerns subscale and any other measure.
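The peak-correlation criterion described here can be sketched as follows. This is an illustration with simulated scores, not the study data or the authors' analysis code.

```python
import numpy as np

def peak_correlation_holds(subscale, referent, other_measures):
    """True if the subscale's Pearson correlation with its referent
    measure exceeds its correlation with every non-target measure."""
    r_target = np.corrcoef(subscale, referent)[0, 1]
    return all(r_target > np.corrcoef(subscale, m)[0, 1]
               for m in other_measures)

# Simulated illustration: a subscale built from its referent plus noise
# should correlate most highly with that referent.
rng = np.random.default_rng(0)
referent = rng.normal(size=500)
subscale = referent + 0.5 * rng.normal(size=500)   # convergent pair
unrelated = rng.normal(size=500)                   # non-target measure
```

With these simulated scores, `peak_correlation_holds(subscale, referent, [unrelated])` returns True, mirroring the pattern hypothesized for each CCAPS-62 subscale and its referent measure.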
In addition, the MCSD was hypothesized to correlate negatively with self-report measures of psychological symptoms, consistent with Locke et al.

To evaluate the ranges of test information, TIFs for each of the subscales were computed. Mplus v. 6.11 (L. K. Muthén & Muthén, 1998–2011) was used to fit a graded-response model (Samejima, 1969) to the subscales, with each CCAPS-62 subscale analyzed separately. All items that primarily load on a given subscale were entered as polytomous ordinal variables, and each subscale was thus modeled as a simple one-factor solution using maximum-likelihood estimation. The main outcome of this analysis was test information, which is analogous to internal consistency. TIFs are graphical representations of test information. In IRT, a test’s (in this case, a subscale’s) ability to make meaningful discriminations between scores varies across levels of the latent trait being measured (in this case, levels of symptom severity). The TIF is the inverse of the conditional error variance for all items in a subscale and, therefore, is a measure of an estimate’s precision. That is, the Depression subscale may be more sensitive at certain levels of depression (say, differentiating severe depression from a milder depression) but relatively less sensitive to differences in depression among healthy individuals (say, differentiating euthymia from contentment). When information is high, error variance is low, and estimates are more precise. Importantly, in the process of developing a short form of the CCAPS-62, a similar IRT analysis of the CCAPS-62 items was conducted by Locke et al. (2012) using the original validation data for the CCAPS-62. However, these authors did not present TIFs
and were not primarily concerned with examining the ranges of sensitivity of the CCAPS-62 subscales; rather, these authors selected items for the CCAPS-62’s short form (i.e., the CCAPS-34) based in part on item and test information. In the present study, we explicitly evaluated subscales, rather than items, and used a new sample of counseling clients who completed the CCAPS-62, rather than the CCAPS-70, as in Locke et al. Based on the item-level information presented by Locke et al., we expected that most subscales would have their highest information near the mean of the latent trait or slightly above it. TIFs were plotted from −3 SD to +3 SD on theta and generated directly by Mplus (see B. O. Muthén, 1998–2004; B. O. Muthén & Asparouhov, 2002).

Finally, an exploratory factor analysis (EFA) was conducted on the subscale scores of the CCAPS-62. The analysis was conducted using Mplus v. 6.11 (L. K. Muthén & Muthén, 1998–2011) with the default maximum likelihood estimator, as well as the default oblique GEOMIN rotation. The CCAPS-62 subscale scores (computed using unit weighting, as above) were treated as continuous indicator variables, with values ranging from 0 to 4. A cutoff of 0.40 was used to determine a meaningful factor loading (Stevens, 1996). Models with increasing numbers of factors were compared using the chi-square difference test for improved model fit as well as meaningful theoretical interpretability.
Results

Means and standard deviations of the subscales of the CCAPS-62 and the referent measures are in Table 1. The Pearson product-moment correlations for the eight CCAPS subscales and all referent measures can be found in Table 2. Consistent with expectations, each subscale correlated most highly with the associated referent measure(s). The target correlations ranged from r = .576 (for Eating Concerns and the EAT-26) to r = .821 (for Depression and the BDI). Also, as expected, there were significant negative correlations between the MCSD and each of the CCAPS subscales, ranging in size from r = −.298 to r = −.577. TIFs for the subscales are presented in Figure 1. As can be seen, all subscales have maximal information near the average severity of clients in this sample, between −0.02 SD (for Academic Distress) and +0.64 SD (for Substance Use) of their respective distributions. This suggests that the CCAPS-62 subscales are maximally sensitive around the average levels of distress reported by clients in counseling centers, with some subscales (i.e., Hostility, Substance Use, Eating Concerns) displaying maximal sensitivity at somewhat higher levels of the latent traits. Substance Use in particular seems to contain most of its information above the mean. In addition, it is clear from the relative heights of the TIFs that Depression, Eating Concerns, and Substance Use contain more information than the other subscales at some ranges of severity and therefore provide the most precise estimates at those ranges. Academic Distress, Family Distress, and Social Anxiety in particular appear to contain less information across most of the distribution than the other subscales. These differences are at least in part due to subscale length, as Depression and Eating Concerns are the two longest subscales on the CCAPS-62 (with 13 and 9 items, respectively).
Most subscales (except for, perhaps, Depression) seem to contain little information below −1.5 SD; Substance Use in particular lacks information below about −0.5 SD. The upper limit of information varies across subscales, ranging from about 1 SD for Academic Distress to about 2.5 SD for Depression and Eating Concerns. Item-level parameters, as well as the points on theta at which test information peaks for each subscale, are available from the first author on request.

Table 1
Means and Standard Deviations for CCAPS-62 Subscales and Referent Measures in Study 1

Measure                  N        M        SD
AUDIT                    234      4.96     5.41
BDI                      201      15.95    10.68
BAI                      316      15.64    10.04
EAT-26                   101      5.60     5.86
MCSD                     107      7.55     2.94
A.A.                     69       5.79     1.28
SFI                      57       106.7    11.55
SPDQ                     169      8.07     6.11
Trait Anger              186      14.38    4.75
PHQ-9                    2,030    10.65    6.36
CCAPS-62 subscales
  Depression             3,470    1.62     0.93
  Eating Concerns        3,470    1.00     0.89
  Substance Use          3,470    0.79     0.89
  Generalized Anxiety    3,470    1.66     0.93
  Hostility              3,470    1.08     0.88
  Social Anxiety         3,470    1.83     0.94
  Family Distress        3,470    1.27     0.95
  Academic Distress      3,470    1.94     1.03

Note. CCAPS-62 = Counseling Center Assessment of Psychological Symptoms-62 (Locke et al., 2011); AUDIT = Alcohol Use Disorders Identification Test (Saunders, Aasland, Babor, de la Fuente, & Grant, 1993); BDI = Beck Depression Inventory (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961); BAI = Beck Anxiety Inventory (Beck, Epstein, Brown, & Steer, 1988); EAT-26 = Eating Attitudes Test-26 (Mintz & O'Halloran, 2000); MCSD = Marlowe-Crowne Social Desirability scale (Reynolds, 1982); A.A. = Academic Adjustment subscale of the Student Adaptation to College Questionnaire (Baker & Siryk, 1986); SFI = Self-report Family Inventory total score (Beavers, Hampson, & Hulgus, 1985, 1990); SPDQ = Social Phobia Diagnostic Questionnaire (Newman, Kachin, Zuellig, Constantino, & Cashman-McGrath, 2003); Trait Anger = Trait Anger subscale of the STAXI-2 (Spielberger, 1999); PHQ-9 = Patient Health Questionnaire-9 (Kroenke, Spitzer, & Williams, 2001).

The second-order EFA identified a two-factor solution as optimal. Although the three- and four-factor solutions provided even better fit than the one- and two-factor solutions according to the chi-square difference test, each included a factor on which only one subscale loaded above our cutoff of 0.40, indicating that these factors did not truly represent higher order dimensions. Thus, these models were discarded in favor of the more parsimonious two-factor model. It should be noted that the one-factor model demonstrated nearly adequate fit, as indicated by the CFI (0.960), TLI (0.944), SRMR (.032), and RMSEA (.068). However, the two-factor solution fit the data significantly better than the one-factor solution, χ²(7, N = 3,470) = 201.706, p < .001, and had superior fit indices: CFI = 0.984 and TLI = 0.966, both above the 0.95 cutoff suggested by Hu and Bentler (1999); SRMR = .021, below the suggested criterion of .08 (Hu & Bentler, 1999); and RMSEA = .054 (90% confidence interval [0.046, 0.062]), close to the .05 cutoff recommended by Browne and Cudeck (1993). The rotated subscale loadings for the two-factor solution can be found in Table 3. The first factor seemed to represent “general distress” or “internalizing distress,” with above-threshold
loadings from the Depression, Generalized Anxiety, Social Anxiety, and Academic Distress subscales. The second factor comprised the Substance Use and Hostility subscales and may be indicative of externalizing problems. The two factors were significantly correlated (r = .77, p < .01).
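The nested-model comparison used here can be checked numerically. As an illustrative sketch (not the authors' Mplus output), the chi-square difference statistic reported in the text converts to a p value with SciPy:

```python
from scipy.stats import chi2

# Chi-square difference test between nested factor models (plain ML
# estimation, as in this study; scaled estimators would need a corrected
# difference statistic).  Values are those reported in the text.
delta_chi2 = 201.706
delta_df = 7
p_value = chi2.sf(delta_chi2, delta_df)   # survival function = upper-tail p
```

Here `p_value` is far below .001, consistent with the reported preference for the less constrained (two-factor) model.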
Discussion

The results of the first aim of Study 1 largely replicate and extend the findings of Locke et al. (2011). This study provides evidence that, within a clinical population, the CCAPS-62 has good convergent validity with established measures of psychological symptoms. These correlational results are broadly similar to those obtained by Locke et al. The peak correlations for each subscale in this study and in Locke et al., respectively, were as follows: Depression, .821 and .721; Eating Concerns, .576 and .648; Substance Use, .604 and .811; Generalized Anxiety, .692 and .643; Social Anxiety, .747 and .733; Family Distress, .616 and .648; Academic Distress, −.690 and −.680; Hostility, .674 and .566. Taken together, these data suggest that the CCAPS-62 subscale scores are comparably accurate in assessing their intended constructs for college students who are and are not in counseling, suggesting that the instrument is justifiable for use with both populations. Further, the Depression subscale also correlated quite highly (r = .767) with the PHQ-9, a newer measure of depressive symptoms, further supporting this subscale as a measure of depression. Two cautionary notes should be added. First, several subscales of the CCAPS-62 seem to correlate meaningfully with multiple referent measures. This may be particularly true for the Generalized Anxiety subscale, which had high correlations (r > .5) with the BDI and PHQ-9. This lower specificity of some CCAPS-62 subscales may be due to several causes, including the brevity of many subscales, but also seems to reflect actual clinical phenomena. Anxiety and depression frequently co-occur, and discriminating perfectly between the two with self-report forms may not be possible (e.g., Stulz & Crits-Christoph, 2010).
Thus, while the CCAPS-62 Generalized Anxiety subscale does not appear to be a “pure” measure of anxiety (i.e., free from covariation with depressive and other symptoms), its pattern of correlations is common when assessing anxiety via self-report, and the subscale may still be clinically meaningful. Second, the correlations between the MCSD and CCAPS-62 subscales were in the expected direction but were particularly strong for the Hostility subscale (r = −.577). This value is notably larger in this counseling sample than that found by Locke et al. (2011). It may be that many college students have difficulty acknowledging their hostile impulses and feelings, and this may be especially true of those seeking counseling. Future research is needed to better address this topic. The results of the second aim of Study 1 further inform the use of the CCAPS-62 subscales. A particular strength of the CCAPS-62 subscales appears to be their ability to detect differences between scores near the average level of client distress. This is potentially very valuable, as it suggests that the CCAPS-62 may be informative when little is known about a client (as in initial assessment) and does not require a clinical suspicion of a problem before screening. That is, the subscales provide their best estimates of distress for clients who are near or slightly above the mean level of distress on a given subscale.
Table 2
Pearson Product-Moment Correlation Coefficients Between CCAPS-62 Subscales and Referent Measures in Study 1

Measure        Dep       EC        SU        GA        SA        FD        AD        Hos
BDI            .821**    .375**    .137      .651**    .497**    .480**    .506**    .516**
PHQ-9          .767**    .347**    .193**    .613**    .405**    .310**    .655**    .482**
EAT-26         .150      .576**    −.480     .158      .039      .263*     .112      .118
AUDIT          .074      .030      .604**    .171*     .041      .075      .167*     .067
BAI            .454**    .295**    .153*     .692**    .240**    .188**    .294**    .320**
SPDQ           .510**    .358**    .100      .460**    .747**    .155      .326**    .225*
SFI            .474**    .252      .053      .449**    .275      .616**    .080      .107
A.A.           −.440**   .053      −.224     −.173     −.435**   −.202     −.690**   −.309**
Trait Anger    .423**    .176*     .064      .315**    .355**    .371**    .267*     .674**
MCSD           −.439**   −.346**   −.298*    −.332**   −.385**   −.339**   −.356**   −.577**
CCAPS-62 subscales
Dep            1.00
EC             .402**    1.00
SU             .215**    .192**    1.00
GA             .660**    .340**    .187**    1.00
SA             .551**    .301**    .077**    .466**    1.00
FD             .382**    .233**    .123**    .339**    .242**    1.00
AD             .582**    .269**    .195**    .437**    .321**    .229**    1.00
Hos            .570**    .290**    .270**    .501**    .332**    .378**    .371**    1.00

Note. Dep = Depression; EC = Eating Concerns; SU = Substance Use; GA = Generalized Anxiety; SA = Social Anxiety; FD = Family Distress; AD = Academic Distress; Hos = Hostility. CCAPS-62 = Counseling Center Assessment of Psychological Symptoms-62 (Locke et al., 2011); BDI = Beck Depression Inventory (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961); PHQ-9 = Patient Health Questionnaire-9 (Kroenke, Spitzer, & Williams, 2001); EAT-26 = Eating Attitudes Test-26 (Mintz & O'Halloran, 2000); AUDIT = Alcohol Use Disorders Identification Test (Saunders, Aasland, Babor, de la Fuente, & Grant, 1993); BAI = Beck Anxiety Inventory (Beck, Epstein, Brown, & Steer, 1988); SPDQ = Social Phobia Diagnostic Questionnaire (Newman, Kachin, Zuellig, Constantino, & Cashman-McGrath, 2003); SFI = Self-report Family Inventory total score (Beavers, Hampson, & Hulgus, 1985, 1990); A.A. = Academic Adjustment subscale of the Student Adaptation to College Questionnaire (Baker & Siryk, 1986); Trait Anger = Trait Anger subscale of the STAXI-2 (Spielberger, 1999); MCSD = Marlowe-Crowne Social Desirability scale (Reynolds, 1982). Target correlations (shown in boldface in the original table) lie at the intersection of each subscale and its primary referent measure.
* p < .01. ** p < .001.
However, when a particular subscale is not a clinical issue because a client's reported distress is appreciably lower than average (e.g., the Substance Use subscale for nonabusing clients), the subscales appear to be noticeably less able to differentiate and may provide uninformative scores. This suggests that below these points, counselors should not expect or attempt to meaningfully interpret different scores (or changes at these levels); instead, a simpler interpretation, such as “not at risk,” may be more appropriate than a numeric score. In addition, and for some subscales more than others, very high scores (higher than 1.5 or 2 SD) may not be very precise. These upper limits on the precision of the CCAPS-62 subscales suggest that above certain points, it would be wise to assess the clinical problem further, as the CCAPS-62 subscale score may not be sufficient to meet clinical needs. For researchers, these upper and lower limits may inform interpretation as well. For instance, in evaluating the degree of change associated with a particular treatment, clients who begin treatment below the information “floor” for a particular subscale may appear to change, but the changes may be more related to measurement imprecision than to meaningful change, especially if the changes do not bring the clients into the more average range. Further work is needed to provide confidence levels for change significance across CCAPS-62 subscales. Last, the second-order EFA provides some preliminary structure to help explain the significant correlations between subscales of the CCAPS-62, as well as some of the high correlations between subscales and measures other than the primary referent (e.g., Depression and the BAI, Generalized Anxiety and
the BDI). Some CCAPS-62 subscales appear to assess shared underlying constructs in addition to their primary construct, which can account for some of the covariation between subscales. One such structure is the distinction between internalizing and externalizing psychopathology, which indicates whether symptoms are primarily related to behaviors (externalizing problems, including substance use and conduct problems) or to emotional well-being (internalizing problems, including depression and anxiety disorders). Studies of contemporaneous diagnoses have found evidence supporting this basic distinction between observed symptoms in clinical situations (e.g., Slade & Watson, 2006). Although we address these issues only briefly, these preliminary analyses highlight the importance of more thoroughly examining the higher order structure of the CCAPS-62 and indicate the possibility of generating new scales that reflect these additional latent structures.
Study 2: Differentiating Clinical and Nonclinical Groups

The aim of Study 2 was to determine the extent to which subscale scores on the CCAPS-62 meaningfully differentiated between clinical and nonclinical samples of college students and to assess the utility of cut scores developed from the two samples, using Jacobson and Truax's (1991) equation c, which is recommended for overlapping populations. The primary clinical group of interest for the CCAPS-62 is counseling center clients, as the instrument was designed for use in this context.
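Jacobson and Truax's equation c places the cut score between the two group means, weighted by the groups' standard deviations. As a sketch (using the Depression subscale values reported in Table 4), the calculation is:

```python
def jacobson_truax_c(m0, s0, m1, s1):
    """Cut score c for overlapping distributions (Jacobson & Truax, 1991):
    a weighted midpoint, with each group's mean weighted by the *other*
    group's standard deviation."""
    return (s0 * m1 + s1 * m0) / (s0 + s1)

# Depression subscale: No Treatment (m0, s0) vs. Counseling (m1, s1),
# means and standard deviations as reported in Table 4.
cut = jacobson_truax_c(m0=0.812, s0=0.737, m1=1.430, s1=0.895)
# round(cut, 2) reproduces the Depression cut score of 1.09 in Table 4
```

Because equation c weights by spread rather than sample size, it is appropriate when the two distributions overlap, as they clearly do here.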
[Figure 1: test information (y-axis, 0–20) plotted against theta (x-axis, −3 to 3), with one curve for each subscale: Depression, Family Distress, Eating Concerns, Academic Distress, Hostility, Substance Use, Generalized Anxiety, and Social Anxiety.]

Figure 1. Test information functions for the subscales of the Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62; Locke et al., 2011) in Study 1.
Method Participants. Data for this study were derived from a large, previously reported survey data set (Hayes, Locke, & Castonguay, 2011). The overall sample of this study included 21,686 college students from 43 colleges and universities. Prior to analyses, 2,961 participants were removed from the sample due to missing data on the CCAPS-62, and an additional 1,303 participants were removed from the sample due to missing data on the current treatment services items. Listwise deletion was selected because missing data were sparse (range: 0.2%–1.1% missing per item of the CCAPS-62), and the sample with complete data was large. Although listwise deletion negatively affects statistical power and may produce biased estimates when sensitive information is assessed, individuals who completed all survey items may be motivated to respond more accurately than individuals who did not complete the survey.
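The listwise-deletion step described above can be sketched in a few lines of pandas (a toy frame with hypothetical column names, not the actual survey data):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data; column names are hypothetical.
df = pd.DataFrame({
    "ccaps_item_1": [1.0, np.nan, 2.0, 3.0],
    "ccaps_item_2": [0.0, 1.0, np.nan, 2.0],
    "treatment_status": ["none", "counseling", "none", None],
})

# Listwise deletion: keep only respondents with complete CCAPS-62 and
# treatment-status data, as in the study.
complete = df.dropna(subset=["ccaps_item_1", "ccaps_item_2", "treatment_status"])

# Per-item missingness rate (the study reports 0.2%-1.1% per CCAPS-62 item).
per_item_missing = df[["ccaps_item_1", "ccaps_item_2"]].isna().mean()
```

With sparse missingness and a very large sample, dropping incomplete rows costs little power, which is the rationale given in the text.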
Table 3
Exploratory Second-Order Factor Analysis of Data in Study 1

Scale                  Factor 1    Factor 2
Depression             .918        .004
Generalized Anxiety    .593        .167
Academic Distress      .618        −.001
Social Anxiety         .698        −.121
Family Distress        .131        .368
Eating Concerns        .354        .117
Substance Use          −.089       .419
Hostility              .005        .786

Note. Loadings in boldface are above the inclusion cutoff of .400.
The remaining 17,422 participants were divided into three groups. The two groups of interest were No Treatment (15,027 participants who indicated that they were not presently receiving psychiatric or psychological care for a mental or emotional concern of any kind) and Counseling (846 participants who indicated that they were presently receiving psychological counseling on campus at the time they completed the survey). The third group, termed Other Services, comprised 1,549 individuals who were not in counseling on campus at the time of the survey but who indicated current psychiatric medication, off-campus counseling, or both. These respondents were removed from the sample because they did not represent either group of interest, and the heterogeneity of clinical problems within this group may be wider than that seen in counseling centers (e.g., students receiving only pharmacologic treatment for attention-deficit/hyperactivity disorder, students engaged in long-term outpatient psychotherapy in the community), making the results for this group less relevant to the present purposes. However, when this group was included in the analyses, all interpretations were unaffected. Thus, the total sample size used for analysis was 15,873. The Counseling and No Treatment groups were compared using multivariate analysis of variance (MANOVA), with the eight subscales of the CCAPS-62 as the dependent variables and planned comparisons between the two groups on each subscale. The final sample was composed of 9,977 (62.9%) female participants, 5,676 (36.1%) male participants, 23 (0.1%) transgender participants, and 43 (0.3%) who indicated “other.” The majority (11,825; 75.2%) of the sample was White/Caucasian, with 1,128 (7.2%) Asian/Pacific Islander, 743 (4.7%) Latino(a)/Hispanic, 545 (3.5%) Black/African American, 548 (3.5%) multiracial, 149 (0.9%) Middle Eastern, 53 (0.3%) Indigenous/Native American, and 740 (4.7%) of respondents preferring not to answer regarding race/ethnicity. Most of the sample consisted of undergraduates: 3,536 (22.5%) freshmen, 3,401 (21.7%) sophomores, 3,927 (25.0%) juniors, and 3,858 (24.6%) seniors, while 834 (5.3%) were graduate students and 151 (0.9%) selected another option.

Procedure. The methods of the overall survey have been reported elsewhere (Hayes, Locke, & Castonguay, 2011). Briefly, college students from 43 colleges and universities in the United States were invited to participate in an online, anonymous study during the Spring 2010 semester. Participants completed the CCAPS-62 and a brief demographics questionnaire via the Internet.
Results

Means, standard deviations, and the results of the planned comparisons are reported in Table 4. The omnibus F test was significant, F(8, 15,864) = 84.656, p < .001, indicating that differences between groups existed on the subscales. Examination of the univariate F tests for each subscale revealed that seven of the eight CCAPS-62 subscales differentiated between the No Treatment and Counseling groups; Substance Use was the only subscale that did not. In general, the effect sizes for the group mean differences were moderate, ranging from d = 0.283 for Eating Concerns to d = 0.829 for Depression. While these values do indicate some differentiation between groups, they also show clear and substantial overlap between the groups' distributions on these scores.
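The effect sizes reported here are standardized mean differences using a pooled standard deviation. As a sketch (using the Depression subscale means, standard deviations, and group sizes reported in this study), the computation is:

```python
import math

def cohens_d(m0, s0, n0, m1, s1, n1):
    """Cohen's d: mean difference standardized by the pooled SD."""
    pooled_var = ((n0 - 1) * s0**2 + (n1 - 1) * s1**2) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var)

# Depression subscale: No Treatment (n = 15,027) vs. Counseling (n = 846),
# means and SDs from Table 4.
d = cohens_d(m0=0.812, s0=0.737, n0=15_027, m1=1.430, s1=0.895, n1=846)
# d works out to roughly 0.83, in line with the 0.829 reported in Table 4
```

With two overlapping normal distributions, a d of 0.83 still implies considerable overlap, which is the caution the text raises about individual-level prediction.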
Discussion

The results of Study 2 suggest that nearly all of the CCAPS-62 subscales are somewhat sensitive to average differences between in-treatment and not-in-treatment students, although it is also clear that there is substantial overlap between these groups on all subscales. This is particularly noteworthy because the subscales of the CCAPS-62 are designed to be brief measures of specific symptomatic constructs; not every counseling center client would be expected to have elevated scores on every subscale. For instance, some clients may endorse high Depression but low Academic Distress compared to their nonclinical peers, which would tend to lessen the group differences on Academic Distress. Further, some subscales of the CCAPS-62 assess relatively low base-rate clinical phenomena, even in counseling centers. For instance, Hoyt and Ross (2003) surveyed counseling center clients at one university and identified 12.9% of the sample as at risk on either of two common indicators (the EAT-26 and body mass index), but only two people (out of 555) were identified as at risk on both measures. So, even for identified risk areas, many clients at counseling centers would not be expected to fall in the clinical range. Because counseling centers treat clients with a broad range of psychological problems, the discrimination achieved by the subscales of the CCAPS-62 may actually reflect relatively strong sensitivity to differences between populations. Still, some of the effect sizes for these discriminations are small, which suggests that although there are group differences, predictions about individuals based on these group means are not likely to be effective. Although this may be due to a number of issues, including the low base-rate issue just discussed, it seems that none of these subscales, except perhaps Depression, can be interpreted to clearly differentiate clinical from nonclinical populations. Thus, cut scores derived using the Jacobson and Truax (1991) formula, which are frequently used in counseling psychology, may not profitably be applied to such broad groups as “clinical” and “nonclinical,” or “client” and “nonclient,” with a multidimensional measure like the CCAPS-62. It remains possible that the Depression cut score of 1.1 may usefully, if not perfectly, be applied in research and clinical settings, given the larger effect size found with this subscale. It is likely that larger subscale score differences may be found by selecting subgroups of clients with known clinical problems.
For instance, clients who seek treatment for eating disorders or body image issues, and/or who meet diagnostic criteria for an eating disorder, would make a more appropriate clinical group for the Eating Concerns subscale: if the subscale is performing well, it ought to discriminate more meaningfully between this group and a nonclinical group. A related issue may contribute to the Substance Use subscale's inability to discriminate in this sample: alcohol use is commonplace, and often problematic, even among nonclinical college students. In fact, evidence suggests that alcohol use disorders are more prevalent among college students than among similarly aged adults who are not enrolled in college, although college students are less likely to receive treatment for alcohol use disorders (Blanco et al., 2008). This indicates that more research is needed to determine what, if any, clinical populations can be meaningfully differentiated using the subscales of the CCAPS-62.

Table 4
Means, Standard Deviations, F Tests, Confidence Intervals, and Effect Sizes for Groups in Study 2

                        No Treatment       Counseling
CCAPS-62 subscale       M       SD         M       SD        F         p        95% CIdiff         d        Cut score
Depression              0.812   0.737      1.430   0.895     550.63    <.001    [0.556, 0.680]     0.829    1.09
Generalized Anxiety     0.983   0.802      1.539   0.890     454.55    <.001    [0.495, 0.617]     0.689    1.25
Social Anxiety          1.517   0.838      1.928   0.901     191.17    <.001    [0.348, 0.473]     0.488    1.72
Academic Distress       1.217   0.840      1.660   0.969     218.65    <.001    [0.375, 0.509]     0.523    1.42
Eating Concerns         0.981   0.802      1.210   0.912     64.16     <.001    [0.165, 0.291]     0.283    1.09
Hostility               0.652   0.688      1.030   0.849     236.02    <.001    [0.320, 0.436]     0.543    0.82
Substance Use           0.695   0.832      0.697   0.837     0.004     ns       [−0.056, 0.059]    0.002    0.70
Family Distress         0.771   0.766      1.251   0.962     304.93    <.001    [0.414, 0.546]     0.618    0.98

Note. CCAPS-62 = Counseling Center Assessment of Psychological Symptoms-62 (Locke et al., 2011); CIdiff = confidence interval of the mean difference. Cut scores are based on Jacobson and Truax's (1991) equation c.
Study 3: Diagnosis

Given the multidimensional structure of the CCAPS-62 and the heterogeneous population of counseling clients, one reason that the differences observed in Study 2 were relatively small may be that many clients in counseling do not report elevated scores on all subscales. One advantage of the CCAPS-62, relative to unidimensional measures of distress such as the OQ-45, is its potential ability to identify the specific types of distress for which clients need help. Thus, knowing a client's presenting concerns and reasons for seeking treatment would greatly influence expected scores on the subscales. In addition, if the CCAPS-62 is to be maximally beneficial to initial assessments, the subscales must be able to differentiate, at least to an adequate degree, subpopulations of clients with and without relevant diagnoses. In this third study we sought to determine whether the CCAPS-62 subscales are related to therapist-rated diagnosis and, if so, how accurate diagnoses predicted from these subscales are.
Method

Data for this study were derived from the ongoing data collection of the Center for Collegiate Mental Health (CCMH), a collaborative network of counseling centers that have agreed to use standardized self-report assessment instruments and contribute de-identified data for the purposes of research in college counseling. This infrastructure has been described elsewhere (Castonguay, Locke, & Hayes, 2011; Hayes, Locke, & Castonguay, 2011).

Participants. For the purposes of this study, five member centers of CCMH contributed de-identified diagnostic information for a period beginning July 1, 2009, and ending December 13, 2011. These clients' CCAPS-62 scores are included in the larger CCMH 2010–2011 data, but the diagnostic information was collected specifically for the purpose of this study. The five contributing centers regularly provide diagnoses based on the DSM–IV–TR (American Psychiatric Association, 2000) following routine intakes or first sessions of counseling. In all cases, only clients who attended at least one session, completed one CCAPS-62, and were assigned a diagnosis were included in the data set. The CCAPS-62 and the first session were required to have taken place during the same week, but the diagnosis may have been assigned up to 3 weeks following the intake or first session (to accommodate the frequent backlog of paperwork in high-volume clinical settings); in these cases the diagnosis was still likely to be largely based on information obtained in the first session. Of the five participating centers, three were at large state schools (all in rural environments, two in the mid-Atlantic region and one in the Southwest United States), contributing data from 3,209, 1,253, and 2,468 clients, respectively. The other two were private, urban institutions located in the same major Eastern U.S. city, contributing data from 473 and 140 clients, respectively.
Each counseling center treats a different number of clients per year, and not every center had been collecting diagnostic data for the entire period, which accounts for the large range of representation across schools. Prior to combining data across schools, univariate analyses of variance (ANOVAs) were conducted on each of the
CCAPS-62's subscales to examine differences between the sites. Although each of the omnibus F tests was significant (p < .01), the effect sizes of these differences were trivial, with partial eta-squared values ranging from .002 (for the effect of center on Social Anxiety and Academic Distress) to .014 (for Family Distress). Thus, differences between centers accounted for roughly 1% or less of the total variability in CCAPS-62 subscale scores; the significance of the F tests was likely due to the large number of participants in the analyses. The total sample consisted of 7,543 students in counseling centers. Of these, 4,413 (58.5%) were female, 3,090 (40.9%) were male, 11 (0.1%) were transgender, and seven (0.1%) preferred not to answer. A majority, 5,475 (72.6%), were European American; 590 (7.8%) were Hispanic or Latino/a, 476 (6.3%) were Asian American or Asian, 461 (6.1%) were African American or Black, 216 (2.9%) were multiracial, 25 (0.3%) were American Indian or Alaskan Native, 13 (0.2%) were Native Hawaiian or Pacific Islander, and the remaining 287 (3.8%) specified another race/ethnicity or preferred not to answer.

Procedure. Five diagnoses and groups of diagnoses were identified as face-valid diagnostic analogues of CCAPS-62 subscales: the Depression subscale with Major Depressive Episodes and Dysthymic Disorder, Generalized Anxiety with Generalized Anxiety Disorder, Eating Concerns with Eating Disorders, Social Anxiety with Social Phobia, and Substance Use with Alcohol Use Disorders (the incidence of nonalcohol substance use disorders was low, and only one question on the CCAPS-62 Substance Use subscale refers to “drugs,” while the others refer to alcohol). Hostility, Family Distress, and Academic Distress were not assigned face-valid diagnostic groups. To determine whether CCAPS-62 scores varied significantly based on client diagnoses, we used a series of ANOVAs.
Each diagnostic grouping was considered separately, and clients given the diagnosis were compared to clients who did not have the diagnosis in question. If the omnibus test was significant (Bonferroni-corrected p < .05), the univariate tests on each subscale were examined. It was hypothesized that the target subscale would be elevated in the diagnostic group and that any other significantly elevated subscales would show smaller effect sizes, likely reflecting the higher general distress of clients who are given diagnoses (McAleavey et al., 2012). To predict diagnoses, additional analyses were conducted. First, because the sample size was large, it was randomly divided into two subsamples. The first subsample was used to develop cut scores by plotting receiver operating characteristic (ROC) curves. ROC curves, derived from signal detection theory (McFall & Treat, 1999), plot sensitivity against 1 − specificity at every conceivable cut score on a given measure. Sensitivity is the likelihood of a CCAPS-62 subscale score higher than the cut score, given a therapist-rated diagnosis; specificity is the likelihood of a subscale score lower than the cut score, given the absence of a diagnosis. The further the ROC curve deviates from the diagonal, the more predictive value the test has. From the ROC curve, an optimal cut score can be derived, defined here a priori as the cut score maximizing the sum of sensitivity and specificity. In addition, the total area under the curve (AUC), a value between 0 and 1, has the distinct interpretation of being the likelihood that an individual with a diagnosis
will have a higher score on the CCAPS-62 subscale of interest than an individual who was not given that diagnosis. The optimal cut points found in the first subsample were used to classify members of the second subsample, and the resulting predictions were examined for sensitivity and specificity, providing replications of these values. Further, the second-subsample predictions were also evaluated for positive predictive power (PPP) and negative predictive power (NPP). PPP is the likelihood of an individual actually being assigned a diagnosis, given a CCAPS-62 subscale score above the cut score; NPP is the likelihood of an individual actually not being assigned a diagnosis, given a subscale score below the cut score. PPP and NPP are much more sensitive to low base rates, so for clinical problems such as psychiatric diagnoses they represent a more stringent test of a cut score's efficiency, as well as a major determinant of a cut score's utility in practice.
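As an illustrative sketch of these computations (plain NumPy with made-up function names, not the software used in the study), the cut-score selection, AUC, and predictive-power calculations described above can be written as:

```python
import numpy as np

def roc_analysis(scores, diagnosed):
    """Return the Youden-style optimal cut (max sensitivity + specificity)
    and the AUC, computed in its Mann-Whitney form."""
    scores = np.asarray(scores, float)
    diagnosed = np.asarray(diagnosed, bool)
    cuts = np.unique(scores)
    sens = np.array([(scores[diagnosed] >= c).mean() for c in cuts])
    spec = np.array([(scores[~diagnosed] < c).mean() for c in cuts])
    best = cuts[np.argmax(sens + spec)]     # a priori criterion in the text
    # AUC = P(score_diagnosed > score_not_diagnosed), ties counted half.
    pos, neg = scores[diagnosed], scores[~diagnosed]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return best, greater + 0.5 * ties

def predictive_power(scores, diagnosed, cut):
    """PPP = P(diagnosis | score >= cut); NPP = P(no diagnosis | score < cut)."""
    scores = np.asarray(scores, float)
    diagnosed = np.asarray(diagnosed, bool)
    above, below = scores >= cut, scores < cut
    ppp = diagnosed[above].mean() if above.any() else np.nan
    npp = (~diagnosed[below]).mean() if below.any() else np.nan
    return ppp, npp

# Toy demonstration: diagnosed clients score uniformly higher.
toy_scores = [0, 1, 2, 3, 4, 5]
toy_dx = [False, False, False, True, True, True]
best_cut, auc = roc_analysis(toy_scores, toy_dx)
ppp, npp = predictive_power(toy_scores, toy_dx, best_cut)
```

Because PPP and NPP condition on the test result rather than the diagnosis, they shift with the base rate even when sensitivity and specificity are fixed, which is why they form the more stringent test described above.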
Results

The frequencies of the diagnostic groups in this sample are presented in Table 5, along with the means and standard deviations of the CCAPS-62 subscales for each group. Each diagnostic group significantly differed from its nondiagnostic comparison group on the CCAPS-62 subscales, with all omnibus F tests significant at p < .001. Univariate significance tests are represented in the table. As can be seen, in each case the most highly elevated subscale score was the hypothesized target variable. In addition, while more than one subscale was elevated for each diagnostic group, nontarget subscales that were significantly higher in the diagnostic groups than in the nondiagnostic groups showed fairly small effect sizes. The clear exception was depression (both when considering clients with a major depressive episode [MDE] of any severity and when considering only those experiencing severe MDEs), for which several subscales besides Depression were substantially elevated. In addition, it should be noted that there were unexpected findings for the Alcohol Use Disorders group: although these clients reported substantially higher Substance Use, they also reported slightly lower Depression, Generalized Anxiety, and Social Anxiety than clients who were not diagnosed with Alcohol Use
Disorders. Effect sizes of the CCAPS-62 subscale score differences between the diagnostic groups and clients not assigned the particular diagnoses are shown in Table 6. These effects were large, ranging from d = 0.76 for Generalized Anxiety to d = 1.98 for Eating Concerns. The distributions of those diagnosed with a disorder and those not given the diagnosis still overlapped somewhat, but the group differences were greater than those between the clinical and nonclinical individuals in Study 2. ROC curves from the first subsample are plotted in Figure 2, and the optimal cut points and AUCs for each subscale are presented in Table 6. As can be seen, the Generalized Anxiety subscale did not perform very well in predicting diagnoses of generalized anxiety disorder, whereas the other subscales performed adequately (as in the case of the Depression subscale) to fairly well (the Substance Use and Eating Concerns subscales both achieved sensitivity and specificity values near .8), especially considering that these subscales were not initially designed as screening instruments for DSM–IV diagnoses. Results from the second subsample are also presented in Table 6. The sensitivity and specificity found in the second subsample were very similar to those of the first subsample for all subscales. Importantly, the NPPs for each subscale were very high, suggesting that when a subscale score is below the cut score, researchers and clinicians can have a high degree of certainty in a nondiagnosis. However, the PPP values were much lower, indicating that the CCAPS-62 subscales will produce many false positives: In particular, the Social Anxiety cut score produced nearly six times as many false positives as correct positives when predicting social phobia.
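One common way to derive an "optimal" ROC cut point is to sweep candidate cuts and maximize Youden's J (sensitivity + specificity − 1); the original analyses may have used a different criterion, so the sketch below (with toy scores) is illustrative only:

```python
def youden_cut(diag_scores, nondiag_scores):
    """Sweep every observed score as a candidate cut point and return
    the (cut, J) pair maximizing Youden's J = sensitivity + specificity - 1."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(diag_scores) | set(nondiag_scores)):
        sens = sum(s >= cut for s in diag_scores) / len(diag_scores)
        spec = sum(s < cut for s in nondiag_scores) / len(nondiag_scores)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy data: diagnosed clients tend to score higher on the target subscale.
diagnosed = [2.1, 2.5, 2.8, 1.9, 3.0, 2.4]
not_diagnosed = [0.8, 1.2, 1.5, 0.9, 1.7, 2.0]
cut, j = youden_cut(diagnosed, not_diagnosed)  # cut = 1.9
```

With large samples this brute-force sweep traces the same curve that an ROC plot summarizes; the AUC then reflects how separable the two score distributions are overall.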
As mentioned above, both NPP and PPP are heavily and directly influenced by base rates, and the pattern of high NPP with low PPP is largely driven by the low base rates of these disorders, even in this clinical population. Interestingly, the predictions of depression provided the best PPP values, both for the relatively common MDE and dysthymic disorder diagnoses and for the uncommon severe MDE diagnoses. Although the sensitivity and specificity of these predictions were not as good as those of some other subscales, the relatively high PPP, even for a very low-frequency diagnosis, suggests that these
Table 5
Frequencies and Means of the CCAPS-62 Subscales for the Diagnostic Groups in Study 3

Diagnostic groups (percentage of total sample): Alcohol Use Disorders (AUD; n = 794; 10.53%), Social Phobia (SP; n = 431; 5.71%), GAD (n = 1,149; 15.23%), Eating Disorders (ED; n = 504; 6.68%), MDE and DD (n = 1,834; 24.31%), Severe MDE (n = 162; 2.15%), and the total sample (N = 7,529). Cells are M (SD).

CCAPS-62 subscale     AUD          SP           GAD          ED           MDE and DD   Severe MDE   Total
Depression            1.52 (0.93)  1.89 (0.80)  1.79 (0.93)  1.93 (0.86)  2.34 (0.74)  2.70 (0.71)  1.66 (0.93)
Generalized Anxiety   1.51 (0.79)  1.86 (0.77)  2.23 (0.79)  1.78 (0.90)  2.02 (0.83)  2.10 (0.83)  1.67 (0.90)
Social Anxiety        1.62 (0.90)  2.96 (0.67)  2.11 (0.90)  2.02 (0.84)  2.22 (0.90)  2.42 (0.95)  1.84 (0.95)
Academic Distress     1.90 (0.96)  1.99 (0.99)  2.03 (0.96)  1.74 (0.98)  2.47 (0.92)  2.72 (0.90)  1.92 (1.03)
Eating Concerns       1.02 (0.91)  1.00 (0.89)  1.12 (0.91)  2.50 (0.90)  1.26 (0.94)  1.22 (0.95)  1.02 (0.90)
Family Distress       1.15 (0.90)  1.19 (0.93)  1.19 (0.90)  1.20 (0.95)  1.49 (0.95)  1.65 (0.95)  1.21 (0.93)
Hostility             1.16 (0.83)  1.06 (0.78)  1.14 (0.93)  1.01 (0.78)  1.41 (0.90)  1.59 (0.95)  1.07 (0.86)
Substance Use         2.13 (0.89)  0.75 (0.90)  0.81 (0.89)  0.96 (0.94)  0.96 (0.97)  1.01 (0.95)  0.85 (0.91)

Note. CCAPS-62 = Counseling Center Assessment of Psychological Symptoms-62 (Locke et al., 2011); GAD = generalized anxiety disorder; MDE = major depressive episode; DD = dysthymic disorder. Boldface indicates a diagnostic group that was significantly higher on that subscale than individuals not in that diagnostic group, p < .05.
Table 6
CCAPS-62 Subscale Scores, Group Differences, Effect Sizes, Cut Points, and Predictive Utility for the Diagnostic Groups in Study 3

Disorder               CCAPS-62 subscale    Diag. M (SD)  Nondiag. M (SD)  d     Cut   Subsample 1          Subsample 2
                                                                                      Sens.  Spec.  AUC    Sens.  Spec.  PPP   NPP
MDE and DD             Depression           2.34 (0.74)   1.44 (0.87)      1.08  1.73  .82    .63    .79    .78    .63    .40   .90
Severe MDE             Depression           2.70 (0.71)   1.64 (0.92)      1.16  2.12  .84    .67    .82    .62    .76    .45   .86
Social phobia          Social Anxiety       2.96 (0.67)   1.77 (0.92)      1.31  2.54  .78    .77    .84    .77    .77    .17   .98
GAD                    Generalized Anxiety  2.23 (0.79)   1.57 (0.88)      0.76  1.71  .74    .59    .71    .74    .59    .24   .97
Eating disorders       Eating Concerns      2.50 (0.90)   0.91 (0.80)      1.98  1.84  .79    .85    .89    .77    .85    .27   .98
Alcohol use disorders  Substance Use        2.13 (0.89)   0.69 (0.79)      1.80  1.42  .83    .82    .89    .75    .82    .32   .97

Note. MDE = major depressive episode; DD = dysthymic disorder; GAD = generalized anxiety disorder; AUC = area under the curve; PPP = positive predictive power; NPP = negative predictive power.
predictions have some merit. Nevertheless, the optimal cut points derived through ROC curves for all CCAPS-62 subscales produce predictions that can serve, at best, as initial screening tests for diagnoses of these disorders, requiring follow-up assessment.
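The base-rate dependence of PPP discussed above can be made concrete with Bayes' rule: even holding sensitivity and specificity fixed, PPP collapses as the diagnosis becomes rarer. A small illustrative sketch (the prevalence values are hypothetical, not the study's):

```python
def ppp_from_base_rate(sens, spec, base_rate):
    """Positive predictive power via Bayes' rule:
    P(diagnosis | positive screen) at a given prevalence."""
    true_pos = sens * base_rate              # P(positive & diagnosed)
    false_pos = (1 - spec) * (1 - base_rate)  # P(positive & not diagnosed)
    return true_pos / (true_pos + false_pos)

# With sensitivity and specificity both fixed at .80, PPP falls steeply
# as prevalence drops (illustrative rates):
for rate in (0.25, 0.10, 0.02):
    print(rate, round(ppp_from_base_rate(0.80, 0.80, rate), 2))
# → 0.25 0.57, 0.10 0.31, 0.02 0.08
```

This is why a subscale can replicate its sensitivity and specificity across subsamples and still flag several false positives for every true case of a rare diagnosis.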
Discussion

The results of Study 3 suggest that these five CCAPS-62 subscales are relevant to diagnosis and, when administered in the general outpatient setting of counseling centers, can provide adequate to good predictions of diagnosis. It is meaningful that, in each case, the most elevated subscale was the CCAPS-62 subscale of interest. In most cases, only the target subscale was meaningfully elevated, providing some evidence that the subscales of the CCAPS-62 contain unique information related to qualitatively different types of symptoms. The exception was diagnoses of depressive disorders, for which many subscales appeared to be elevated. This may reflect the broad impact that symptoms of depression can have: Depression adversely affects interpersonal functioning, family matters, and other symptom domains (LeMoult, Castonguay, Joorman, & McAleavey, in press). In addition, given the high degree of comorbidity between depression and other disorders, these elevated subscales may reflect the influence of co-occurring syndromes and disorders. In the least discriminating diagnostic group, generalized anxiety disorder (GAD), the effect size on the Generalized Anxiety subscale between those with GAD and those without (d = 0.76) was almost as large as the largest effect size found in Study 2 between those in counseling and those not (d = 0.829). This may suggest that the subscales of the CCAPS-62 fail to discriminate between those in counseling and non-treatment-seeking individuals partly because of the heterogeneity of problems within the treatment-seeking group. For individuals with known presenting problems and diagnoses related to the CCAPS-62's areas of assessment, the subscales are more robustly discriminating. These observed group differences were further tested by creating diagnostic cut scores, which overall performed well.
Though these predictions were far from perfect, and in particular were likely to produce many false positives, these results may be considered promising given that the CCAPS-62 was not designed as a diagnostic screening instrument and that each of the subscales examined in this study is relatively short (ranging from 6 to 13 items). The clear exception is the Generalized Anxiety cut score, which was inferior to the other cut scores in AUC, sensitivity, and specificity and did not earn as much support for use as a diagnostic screening tool for GAD. The cut scores on the Depression subscale produced mixed results as well, achieving acceptable sensitivity, AUCs, and PPPs, but it is possible that the rarity of the severe MDE diagnosis caused the cut score's properties to differ across the two subsamples. This may explain why the sensitivity of this prediction was .84 in the first subsample but .62 in the second, whereas the specificity was .67 in the first and .76 in the second. More cases of severe MDE are required to achieve more reliable results. To reiterate the implications of the overall high NPP values and low PPP values, the diagnostic predictions made by the cut scores on the CCAPS-62 subscales are likely to produce many false positive diagnoses. This means that, at best, these predictions may be used in clinical practice as pointers to potential areas for further
[Figure 2 appears here: ROC curves plotting sensitivity against 1 − specificity for the Substance Use, Social Anxiety, Generalized Anxiety, Eating Concerns, MDE and DD, and MDE (severe) predictions, with a diagonal baseline for reference.]
Figure 2. Receiver-operating-characteristic (ROC) curves of the subscales of the Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62; Locke et al., 2011) for the first subsample in Study 3. MDE ⫽ major depressive episode; DD ⫽ dysthymic disorder.
assessment and should not be taken to be definitive diagnostic tests. However, the high NPP values should confer a degree of confidence on assessors that when a subscale is not elevated, the chance of missing a true diagnosis is limited. Two important points regarding the diagnoses used in this study raise uncertainty about the results. The first is that we collected data from five different institutions and pooled them, on the grounds that the overall means of the CCAPS-62 subscales were highly similar. We expect that this protected the results from being overly influenced by the diagnostic biases of one center, or even of a limited number of counselors at a center, although it may also have obscured center-level differences in disorder prevalence. In addition, it would have been ideal to have "gold standard" diagnoses for this study, which would entail a trained, blind assessor with known reliability conducting a structured interview. Instead, we used diagnoses assigned as part of routine practice, which may increase the external validity but decrease the internal validity of the study. In each participating center, DSM–IV–TR diagnoses were assigned following an informal, usually 1-hr, interview. These interviews may vary across centers and, because they were not structured interviews, likely varied considerably between counselors even within centers. Further, we cannot know to what degree assessors used the CCAPS-62 when assigning diagnoses, or whether they saw CCAPS-62 scores before the assessment, after the assessment, or at all. Given this, the diagnoses used in this study likely contain much more "noise" than diagnoses assigned in more controlled research, which increases the uncertainty regarding the findings but does not necessarily mean that the results are inflated.
In fact, in some ways it is surprising that these noisy diagnoses were nevertheless so strongly predicted by the CCAPS-62.
General Discussion

The CCAPS-62 has several potential uses in clinical practice: as an intake screening measure designed to facilitate triage and treatment planning; as a clinical tool for use with individual clients, tracking change over time and assessing problem areas as well as recovery or deterioration; and as an evaluative instrument to help administrators assess both the needs and the performance of their center. Whereas a previous study (Locke et al., 2011) demonstrated important psychometric qualities of this instrument, the current article addressed a number of other issues related to its construct validity and clinical utility. Overall, the results of the studies reported here provide detailed information regarding the meaning of scores on the CCAPS-62 subscales, as well as some appropriate and inappropriate uses of the CCAPS-62 in counseling centers. As such, these studies inform both clinicians and researchers about the interpretation and use of this instrument within its intended clinical context. Broadly, the eight subscales appear reasonably justified for use in counseling center settings (based on the pattern of correlations with other self-report instruments in the counseling population) and perform best (based on the results of IRT analyses) with clients who are roughly in the average to slightly elevated range of distress. This is evidence of score-meaning validity and suggests that the CCAPS-62 may be useful for assessing clients in counseling, at least as a first or preliminary assessment. With the goal of simplifying score interpretation in applied settings, cut scores were also derived for each of the CCAPS-62 subscales.
However, our findings suggested that most of the cut scores were not very effective in discriminating between students in counseling and those not seeking treatment, indicating that it may be misguided to attempt to discriminate between such broadly defined groups using specific subscales (with the possible exception of the Depression subscale). In contrast, when clients were grouped by
diagnosis, five subscales showed distinct elevations among clients given subscale-specific diagnoses compared to other clients. This not only provides some evidence of cross-rater (client and therapist) validity but also suggests that the subscales may be able to identify when one of the areas of distress is elevated enough to be a specific clinical concern. Four of these subscales appeared to provide adequate predictions of diagnoses, which may be useful in initial assessment at counseling centers. Thus, cut scores based on domain-specific samples, rather than on all clients, appear promising as tools for facilitating intake and triage and for tracking change over time, whereas cut scores based on broader groupings appear less useful. These findings suggest an important difference between the CCAPS-62 and some other popular self-report measures of distress used in counseling centers (e.g., the Outcome Questionnaire-45 [OQ-45]; Lambert, Hansen, Umpress, et al., 2001): The individual subscales of the CCAPS-62 do not seem to assess the type of general distress that measures like the OQ-45's total score frequently assess. This is by design: The CCAPS-62 was intended to measure not general distress but discrete groups of psychological symptoms common in college counseling centers.
In light of this important difference, it does not appear viable to assume that all clients' scores on all of the CCAPS-62's subscales will be elevated, nor, therefore, is it likely useful to examine CCAPS-62 subscale differences between broad groups such as "clinical" and "nonclinical." Instead, to use the CCAPS-62's subscales optimally, researchers and clinicians can refer to specific ranges of an individual subscale, using groupings such as "not likely to be specifically problematic" or "at risk of a specific clinical problem." That is, if a client has an elevated score on one of the subscales, that domain should be considered a potential problem area, to be further assessed by the clinician. In addition, a course of treatment might be considered "successful" if a client's subscale score moves from the elevated range to the less elevated range, representing potential recovery (with the obvious caveat that diagnosis is not ensured by a high score, nor is it certainly a nonissue with a low score). This is also one area in which the current studies can inform treatment monitoring and quality assessment at the level of a counseling center or for research programs. These results suggest that clients with scores in the lowest range for a given subscale should not be considered candidates for meaningful recovery. Counseling centers assessing their effectiveness in treating, for example, alcohol use disorders should not aggregate all clients for analyses but should select only those clients who exceed the cutoff determined in Study 3. The same might be said for future research examining the effectiveness of counseling, in which the lowest-distressed clients may be examined for potential deterioration but would not be expected to show improvement on all CCAPS-62 subscales.
Given these promising indications of valid score meanings and justifiable application to counseling practice for initial assessment and diagnosis, future directions for the CCAPS-62 should include using the subscales in multivariate applications rather than as the univariate predictors we have presented here. This could include developing profiles of distress or examining relationships between subscales for specific subpopulations, such as diagnostic or demographic groups. At present, multivariate methods, such as latent profile analysis, growth mixture modeling, and latent transition analysis, are becoming more common but have not become standard in naturalistic counseling research. Given the multidimensional structure of the CCAPS-62, it may be a useful tool for evaluating these methods. In addition, further research should assess the sensitivity of the CCAPS-62 to change in psychological treatment, in order to validate it as a measure of treatment effectiveness. While there are always additional areas for measure development and assessment, as well as several important limitations of the present studies, these studies provide valuable information regarding the validity and clinical utility of the CCAPS-62 subscale scores.
References

Agliata, A. K., & Renk, K. (2009). College students' affective distress: The role of expectation discrepancies and communication. Journal of Child and Family Studies, 18, 396–411. doi:10.1007/s10826-008-9244-8
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
Baker, R. W., & Siryk, B. (1986). Exploratory intervention with a scale measuring adjustment to college. Journal of Counseling Psychology, 33, 31–38. doi:10.1037/0022-0167.33.1.31
Barkham, M., Stiles, W. B., Lambert, M. J., & Mellor-Clark, J. (2010). Building a rigorous and relevant knowledgebase for the psychological therapies. In M. Barkham, G. E. Hardy, & J. Mellor-Clark (Eds.), Developing and delivering practice-based evidence in the psychological therapies (pp. 21–61). Chichester, England: Wiley-Blackwell. doi:10.1002/9780470687994.ch2
Barr, V., Rando, R., Krylowicz, B., & Winfield, E. (2010). The Association for University and College Counseling Center Directors annual survey. Retrieved from http://aucccd.org/img/pdfs/directors_survey_2009_nm.pdf
Beavers, W. R., Hampson, R. B., & Hulgus, Y. F. (1985). Commentary: The Beavers systems approach to family assessment. Family Process, 24, 398–405. doi:10.1111/j.1545-5300.1985.00398.x
Beavers, W. R., Hampson, R. B., & Hulgus, Y. F. (1990). Beavers systems model manual. Dallas, TX: Southwest Family Institute.
Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893–897. doi:10.1037/0022-006X.56.6.893
Beck, A. T., Steer, R. A., & Garbin, M. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77–100. doi:10.1016/0272-7358(88)90050-5
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571. doi:10.1001/archpsyc.1961.01710120031004
Blanco, C., Okuda, M., Wright, C., Hasin, D. S., Grant, B. F., Liu, S.-M., & Olfson, M. (2008). Mental health of college students and their non-college-attending peers: Results from the National Epidemiologic Study on Alcohol and Related Conditions. Archives of General Psychiatry, 65, 1429–1437. doi:10.1001/archpsyc.65.12.1429
Boswell, J. F., McAleavey, A. A., Castonguay, L. G., Hayes, J. A., & Locke, B. D. (in press). The effect of previous mental health service utilization on change in counseling clients' depressive symptoms. Journal of Counseling Psychology.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 445–455). Newbury Park, CA: Sage.
Castonguay, L. G., Locke, B. D., & Hayes, J. A. (2011). The Center for Collegiate Mental Health: An example of a practice-research network in university counseling centers. Journal of College Student Psychotherapy, 25, 105–119. doi:10.1080/87568225.2011.556929
Center for Collegiate Mental Health. (2012). 2011 annual report (Publication No. STA 12-59). Retrieved from http://ccmh.squarespace.com/storage/CCMH_AnnualReport_2011.pdf
Center for the Study of Collegiate Mental Health. (2009). Standardized data set [Unpublished instrument]. Retrieved from http://ccmh.squarespace.com/storage/SDS-Client-REV-6-1-2009-Details.pdf
Cizek, G. J. (2012). Defining and distinguishing validity: Interpretations of score meaning and justifications of test use. Psychological Methods, 17, 31–43. doi:10.1037/a0026975
Derogatis, L. R. (1992). SCL-90-R administration, scoring, and procedures manual II. Towson, MD: Clinical Psychometric Research.
Diemer, M. A., Wang, Q., & Dunkle, J. H. (2009). Counseling center intake checklists at academically selective institutions: Practice and measurement implications. Journal of College Student Psychotherapy, 23, 135–150. doi:10.1080/87568220902743728
Eisenberg, D., Hunt, J., Speer, N., & Zivin, K. (2011). Mental health service utilization among college students in the United States. Journal of Nervous and Mental Disease, 199, 301–308. doi:10.1097/NMD.0b013e3182175123
Erdur-Baker, O., Aberson, C. L., Barrow, J. C., & Draper, M. R. (2006). Nature and severity of college students' psychological concerns: A comparison of clinical and nonclinical national samples. Professional Psychology: Research and Practice, 37, 317–323. doi:10.1037/0735-7028.37.3.317
Gallagher, R. P. (2009). National survey of counseling center directors. Alexandria, VA: International Association of Counseling Services.
Garner, D. M., Olmsted, M. P., Polivy, J., & Garfinkel, P. E. (1984). Comparison between weight-preoccupied women and anorexia nervosa. Psychosomatic Medicine, 46, 255–266.
Hayes, J. A., Locke, B. D., & Castonguay, L. G. (2011). The Center for Collegiate Mental Health: Practice and research working together. Journal of College Counseling, 14, 101–104. doi:10.1002/j.2161-1882.2011.tb00265.x
Hayes, J. A., Youn, S. J., Castonguay, L. G., Locke, B. D., McAleavey, A. A., & Nordberg, S. (2011). Rates and predictors of counseling center use among college students of color. Journal of College Counseling, 14, 105–116. doi:10.1002/j.2161-1882.2011.tb00266.x
Howard, K. I., Moras, K., Brill, P. L., Martinovich, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51, 1059–1064. doi:10.1037/0003-066X.51.10.1059
Hoyt, W. D., & Ross, S. D. (2003). Clinical and subclinical eating disorders in counseling center clients: A prevalence study. Journal of College Student Psychotherapy, 17, 39–54. doi:10.1300/J035v17n04_06
Hu, L.-T., & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19. doi:10.1037/0022-006X.59.1.12
Kettmann, J. D. J., Schoen, E. G., Moel, J. E., Cochran, S. V., Greenberg, S. T., & Corkery, J. M. (2007). Increasing severity of psychopathology at counseling centers: A new look. Professional Psychology: Research and Practice, 38, 523–529. doi:10.1037/0735-7028.38.5.523
Kopta, S. M., & Lowry, J. L. (2002). Psychometric evaluation of the Behavioral Health Questionnaire-20: A brief instrument for assessing global mental health and the three phases of psychotherapy outcome. Psychotherapy Research, 12, 413–426. doi:10.1093/ptr/12.4.413
Kraus, D. R., Castonguay, L. G., Boswell, J. F., Nordberg, S. S., & Hayes, J. A. (2011). Therapist effectiveness: Implications for accountability and patient care. Psychotherapy Research, 21, 267–276. doi:10.1080/10503307.2011.563249
Kraus, D. R., Seligman, D. A., & Jordan, J. R. (2005). Validation of a behavioral health treatment outcome and assessment tool designed for naturalistic settings: The Treatment Outcome Package. Journal of Clinical Psychology, 61, 285–314. doi:10.1002/jclp.20084
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606–613. doi:10.1046/j.1525-1497.2001.016009606.x
Lambert, M. J., Hansen, N. B., & Finch, A. E. (2001). Patient-focused research: Using patient outcome data to enhance treatment effects. Journal of Consulting and Clinical Psychology, 69, 159–172. doi:10.1037/0022-006X.69.2.159
Lambert, M. J., Hansen, N. B., Umpress, V., Lunnen, K., Okiishi, J., Burlingame, G. M., & Reisinger, C. W. (2001). Administration and scoring manual for the OQ-45. Orem, UT: American Professional Credentialing Services.
Lambert, M. J., & Hawkins, E. J. (2004). Measuring outcome in professional practice: Considerations in selecting and using brief outcome instruments. Professional Psychology: Research and Practice, 35, 492–499. doi:10.1037/0735-7028.35.5.492
Lambert, M. J., & Ogles, B. M. (2009). Using clinical significance in psychotherapy outcome research: The need for a common procedure and validity data. Psychotherapy Research, 19, 493–501. doi:10.1080/10503300902849483
LeMoult, J., Castonguay, L. G., Joorman, J., & McAleavey, A. A. (in press). Depression: Basic research and clinical implications. In L. G. Castonguay & T. Oltmanns (Eds.), Psychopathology: Bridging the gap between basic empirical findings and clinical practice. New York, NY: Guilford Press.
Locke, B. D., Buzolitz, J. S., Lei, P.-W., Boswell, J. F., McAleavey, A. A., Sevig, T. D., . . . Hayes, J. A. (2011). Development of the Counseling Center Assessment of Psychological Symptoms-62 (CCAPS-62). Journal of Counseling Psychology, 58, 97–109. doi:10.1037/a0021282
Locke, B. D., McAleavey, A. A., Zhao, Y., Lei, P.-W., Hayes, J. A., Castonguay, L. G., . . . Lin, Y.-C. (2012). Development and initial validation of the Counseling Center Assessment of Psychological Symptoms-34 (CCAPS-34). Measurement and Evaluation in Counseling and Development, 45, 151–169. doi:10.1177/0748175611432642
McAleavey, A. A., Nordberg, S. S., Kraus, D., & Castonguay, L. G. (2012). Errors in outcome monitoring: Implications for real-world psychotherapy. Canadian Psychology, 53, 105–114. doi:10.1037/a0027833
McFall, R. M., & Treat, T. A. (1999). Quantifying the information value of clinical assessments with signal detection theory. Annual Review of Psychology, 50, 215–241. doi:10.1146/annurev.psych.50.1.215
Millon, T., Strack, S., Millon-Niedbala, M., & Grossman, S. D. (2008). Using the Millon College Counseling Inventory to assess student mental health needs. Journal of College Counseling, 11, 159–172. doi:10.1002/j.2161-1882.2008.tb00032.x
Minami, T., Davies, D. R., Tierney, S. C., Bettmann, J. E., McAward, S. M., Averill, L. A., . . . Wampold, B. E. (2009). Preliminary evidence on the effectiveness of psychological treatments delivered at a university counseling center. Journal of Counseling Psychology, 56, 309–320. doi:10.1037/a0015398
Mintz, L. B., & O'Halloran, M. (2000). The Eating Attitudes Test: Validation with DSM–IV eating disorder criteria. Journal of Personality Assessment, 74, 489–503. doi:10.1207/S15327752JPA7403_11
Muthén, B. O. (1998–2004). Mplus technical appendices. Los Angeles, CA: Muthén & Muthén.
Muthén, B. O., & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus (Mplus Web Notes No. 4, v. 5). Retrieved from http://www.statmodel.com
Muthén, L. K., & Muthén, B. O. (1998–2011). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Newman, M. G., Kachin, K. E., Zuellig, A. R., Constantino, M. J., & Cashman-McGrath, L. (2003). The Social Phobia Diagnostic Questionnaire: Preliminary validation of a new self-report diagnostic measure of social phobia. Psychological Medicine, 33, 623–635. doi:10.1017/S0033291703007669
Okiishi, J. C., Lambert, M. J., Eggett, D., Nielsen, S. L., Dayton, D. D., & Vermeersch, D. A. (2006). An analysis of therapist treatment effects: Toward providing feedback to individual therapists on their patients' psychotherapy outcome. Journal of Clinical Psychology, 62, 1157–1172. doi:10.1002/jclp.20272
Reynolds, W. M. (1982). Development of reliable and valid short forms of the Marlowe-Crowne Social Desirability Scale. Journal of Clinical Psychology, 38, 119–125. doi:10.1002/1097-4679(198201)38:1<119::AID-JCLP2270380118>3.0.CO;2-I
Roberts, R. E., Lewinsohn, P. M., & Seeley, J. R. (1991). Screening for adolescent depression: A comparison of depression scales. Journal of the American Academy of Child & Adolescent Psychiatry, 30, 58–66. doi:10.1097/00004583-199101000-00009
Robertson, J. M., Benton, S. L., Newton, F. B., Downey, R. G., Marsh, P. A., Benton, S. A., . . . Shin, K.-H. (2006). K-State Problem Identification Rating Scales for College Students. Measurement and Evaluation in Counseling and Development, 39, 141–160.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114.
Saunders, J. B., Aasland, O. G., Babor, T. F., de la Fuente, J. R., & Grant, M. (1993). Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption-II. Addiction, 88, 791–804. doi:10.1111/j.1360-0443.1993.tb02093.x
Sharkin, B. S. (2004). Assessing changes in categories but not severity of counseling center clients' problems across 13 years: Comment on Benton, Robertson, Tseng, Newton, and Benton (2003). Professional Psychology: Research and Practice, 35, 313–315. doi:10.1037/0735-7028.35.3.313
Sharkin, B. S., & Coulter, L. P. (2005). Empirically supporting the increasing severity of college counseling center client problems: Why is it so challenging? Journal of College Counseling, 8, 165–171. doi:10.1002/j.2161-1882.2005.tb00083.x
Slade, T., & Watson, D. (2006). The structure of common DSM–IV and ICD-10 mental disorders in the Australian general population. Psychological Medicine, 36, 1593–1600. doi:10.1017/S0033291706008452
Spielberger, C. D. (1999). STAXI-2: The State-Trait Anger Expression Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Steer, R. A., Ranieri, W. F., Beck, A. T., & Clark, D. A. (1993). Further evidence for the validity of the Beck Anxiety Inventory with psychiatric outpatients. Journal of Anxiety Disorders, 7, 195–205. doi:10.1016/0887-6185(93)90002-3
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Stulz, N., & Crits-Christoph, P. (2010). Distinguishing anxiety and depression in self-report: Purification of the Beck Anxiety Inventory and Beck Depression Inventory-II. Journal of Clinical Psychology, 66, 927–940. doi:10.1002/jclp.20701
Stulz, N., Lutz, W., Leach, C., Lucock, M., & Barkham, M. (2007). Shapes of early change in psychotherapy under routine outpatient conditions. Journal of Consulting and Clinical Psychology, 75, 864–874. doi:10.1037/0022-006X.75.6.864
Tutty, L. M. (1995). Theoretical and practical issues in selecting a measure of family functioning. Research on Social Work Practice, 5, 80–106. doi:10.1177/104973159500500107
Received August 29, 2011
Revision received July 14, 2012
Accepted July 19, 2012