
Alternative Approaches to the Study of Change: When Syndromal Assessments Conflict with Adults’ and Children’s Impressions of Improvement

By Sophia C. Choukas-Bradley

HONORS THESIS April, 2008

Submitted in partial fulfillment of the requirements for the degree of Bachelor of Arts with Honors in the Department of Psychology at Brown University

Approaches to Studying Change


Acknowledgments

It would be impossible to adequately thank Jack Wright for his exceptional dedication as an advisor. Thank you for never faltering in your support of this thesis that has taken an inordinate amount of time from other parts of your life over the past two years. Our work has been a joy, and it has been the most important part of my college experience. I feel prepared for and excited about a future in this field, and I cannot thank you enough for what you have taught me. I would also like to thank Audrey Zakriski for her support over these past two years, and for the laughs we have shared at the end of long days in the lab. A big thank you to their kids, Emily and Ethan: You have always been so friendly and patient on all research-related expeditions, you were invaluable pretesters of my interview protocol, and I loved our movie-watching night at Wediko. Thank you to Steph, Annie, Kristen, Natalie, the members of the summer research teams and my clinical team, and all the wonderful people of Wediko who have helped make this project happen. A special thank you goes to Lindsay Metcalfe. You have literally been there every step of the way and I can’t imagine this project without your friendship, advice, and support. Finally, thank you to all my friends and family who have listened to me talk endlessly about Wediko and my thesis and have supported me through it all, and thank you especially to Mom, Dad, Jesse, and Emily.


Table of Contents

Acknowledgments
Table of Contents
Abstract
Introduction
Method
Results
Discussion
References
Tables
Figure Captions
Figures
Appendices



Abstract

This research examined how context-sensitive assessments of behavior can reveal patterns of change in both adaptive and problem behaviors that are obscured by syndromal assessments. In a field study of children at a short-term residential treatment program, counselors’ impressions of change suggested improvement in all areas of functioning. These findings conflicted with the results from standardized syndromal assessments, which suggested increases in some problem behaviors and no change in others. Structured child interviews designed to parallel adults’ field observations revealed that children perceived improvement at the end of the program. However, further analyses showed that their self-assessments did not change from the first to the second interview administration in their ratings of their overall behavior frequencies, their reactions to events, or the frequencies of encountering those events. Extensive fine-grained field observations revealed robust increases in overall prosocial behavior, and complex patterns of change in children’s responses to specific events, which demonstrated how children can simultaneously “worsen” in some areas and “improve” in others. The results underscore the importance of developing treatment outcome measures that are contextually sensitive and that assess changes in positive as well as negative behaviors.



Introduction

At a time when 14 million of the country’s youth are estimated to have serious emotional or behavioral problems, the field of child clinical psychology is moving towards evidence-based treatments (Kazdin, 2003). There is surprisingly little agreement, however, about what constitutes “evidence” in the study of change and about how such evidence should be used. Questions remain unresolved about the types of changes we should examine, the types of informants we should involve in assessments, when such assessments should occur, and how we should interpret them. Traditionally, change has been defined “as the simple difference of two measurements, such as the difference between pre- and postintervention therapist ratings of client symptomatology” (Eddy, Dishion, & Stoolmiller, 1998, p. 53). More recently, reviews of psychosocial interventions have advocated for an expansion of the study of change “beyond the exclusive focus on symptom change” (Kazdin, 2003, p. 269), to include greater attention to the situational factors in children’s environments that may mediate change processes (Hoagwood, Jensen, Petti, & Burns, 1996). Nevertheless, evaluation studies that define treatment outcomes beyond ratings of overall “symptom reduction” continue to be surprisingly rare (Hoagwood et al., 1996, p. 1056), leading to a “stymied” study of change (Eddy et al., 1998).

Controversies in Assessment

The field’s overemphasis on symptomatology can be traced to the framing of personality in terms of the frequency of “acts” over some period of time. Buss and Craik (1983) defined the “act frequency approach” to personality as a method of summarizing past behavior and predicting future behavior based on “act trends,” that is, rates of



specified, decontextualized actions, collapsed over situations. In this and related approaches to personality assessment, variations in behavior across situations are viewed as deviations from one’s “true” nature, which should be reduced through aggregation (Mischel & Shoda, 1995; Wright & Mischel, 1987). The “five factor approach” to personality (McCrae & Costa, 1987) claims that personality can be measured based on the “Big Five” traits (e.g., extraversion, agreeableness, neuroticism). Traits are measured through mostly acontextual statements (e.g., “she really enjoys talking to people”; “she is not a worrier”), and participants are provided with post-assessment summaries of their personalities that state: “Unless you experience major life changes or make deliberate efforts to change yourself, this summary should apply to you throughout your adult life” (Costa & McCrae, 1991). Such trait theories favor a view of personality as a set of relatively stable and enduring characteristics within the person that are relatively free from contextual influences.

Although trait theories that emphasize aggregated act frequency statements have continued to receive support and empirical testing, researchers have also developed other theories of personality that emphasize the complexity of person-situation interactions. For example, although Roberts and Caspi (2001, p. 105) argue that there is continuity in traits from childhood to adulthood, they also acknowledge that personality involves “complex patterns of behavior,” based on “if-stimulus-then” relationships. This formulation of personality as a series of if…then… links between situations and behaviors has been discussed and tested over the past two decades (see Mischel & Shoda, 1995; Zakriski, Wright, & Underwood, 2005). This view suggests that personality is stable, but it also acknowledges the importance of the environment in shaping behavior, treating variations



across situations as meaningful clues in our attempt to solve the puzzle of personality, rather than as “noise” that we must filter out. Behavioral observations collected at a summer program for emotionally and behaviorally disturbed youth revealed that children displayed stable reaction patterns in response to situations that demanded similar competencies (Shoda, Mischel, & Wright, 1993). Mischel and Shoda (1995, p. 257) claimed that the underlying theory, that personality can be understood as a person’s pattern of if…then… situation-behavior relations, helps to resolve the person-situation debate by recognizing the importance of both stable behavioral tendencies and situational factors.

Implications for Assessment

It has been difficult for the field to make the translation from theory into the practice of assessment. In spite of the theoretical questions raised about acontextual approaches to assessment, the field continues to rely on methods that are designed to provide highly aggregated statements about people’s behavioral tendencies, because such assessments are reported to show high reliability and validity (Achenbach & Rescorla, 2001). Cervone, Shadel, and Jencius (2001, p. 36) note that, despite repeated concerns about context-free approaches, “the central elements of personality assessment are measures of global, decontextualized psychological qualities, and individuals are characterized by their average tendency to exhibit each of the qualities.” This disconnect reflects the ambivalence of a field that wants to acknowledge the variability of people’s behaviors, but is reluctant to depart from widely tested assessment tools. The disconnect between theory and practice can often be seen within papers, and sometimes within sentences. For example, Roberts and Caspi (2001, p. 105) claim that the “complex



patterns of behavior,” mentioned previously, “need to be examined, interpreted, and aggregated across numerous situations, places and times, to arrive at a reliable and valid index of a personality trait.” This statement calls for two contradictory actions: interpreting a person’s different responses to different situations, and aggregating those responses across time and place. We cannot understand these complex patterns of behavior if we filter out the very contextual information that makes an interpretation necessary.

The need for highly context-sensitive measurements is arguably greatest in the field of child clinical psychology, because children may differ significantly in their responses to peers and adults, and in their home and school settings. And yet, the majority of assessment tools ask raters to retrospectively assess children’s behaviors through act statements that are insensitive to situational variability. One widely used instrument is the Conners parent and teacher scales (Conners, Sitarenios, Parker, & Epstein, 1998), which consist of 38 items that reflect a range of behavior problems (e.g., oppositional; anxious/shy). For each item (e.g., “difficulty being quiet”), the rater is asked how often the behavior has occurred (i.e., “just a little,” “pretty much,” or “very much”). A similar instrument is the Behavioral Assessment System for Children (BASC, Reynolds & Kamphaus, 2002), which asks informants to rate a range of behaviors on a frequency scale of “never” to “almost always.” The behaviors probed in these assessments are typically acontextual statements, which are aggregated into scale scores or “syndromes” that attempt to summarize children’s dispositions.

The most widely used standardized instruments in this field, with over 6,000 citations, are Achenbach’s (2001) Teacher Report Form (TRF) and the parallel Child Behavior Checklist (CBCL), which is used by parents. Each instrument consists of 118



items, the majority of which are decontextualized act statements (e.g., “threatens people”; “teases a lot”). Raters are asked to assess these behaviors on a scale of 0-2 (“not true,” “somewhat or sometimes true,” and “very true or often true”). Some items take context into account (e.g., “argues when denied own way”), and such items could even be reframed in terms of if…then… statements (i.e., “if denied own way, then argues”). However, this contextual information is lost in the aggregation of all relevant items into syndromal scale scores. These scores are converted to T scores so that children can be compared to means for “normal” or “clinical” samples.

Various studies have highlighted the importance of using context-sensitive measures in the assessment of behavior in children and adolescents. Wright, Lindgren, and Zakriski (2001) found that, in both a field study and a controlled laboratory study, the TRF failed to discriminate between children with similar behavioral base rates who were functionally different. Specifically, a target child who was highly aggressive when provoked, but encountered few aversive events, received the same aggression scores as a functionally opposite target child, who was less reactive but embedded in a hostile social environment. The researchers found that raters were capable of encoding information about the children’s social environments when forming global personality impressions. The raters recognized that the children differed in their event rates and reaction rates, and in forming their global judgments of the children’s personalities, they weighted reaction rates more heavily than event rates. It was only in completing the TRF that raters seemed to “filter out” this important information and focus only on the children’s overall act frequencies.

Syndromal assessments can fail to recognize important differences in individual children, and also in groups of children. Zakriski, Wright, and Underwood (2005) found

that gender differences emerged in girls’ and boys’ responses to specific events and in their rates of encountering those events. Their results also elucidate how complex interactions can occur between the person and his/her environment. For example, older girls (age 12 and up) encountered more prosocial peer talk than did older boys, and they responded with a higher rate of prosocial reactions in the face of such events. Thus, older girls appeared to be involved in a positive reciprocal cycle, in which their positive reactions encouraged future positive events, and vice-versa. Older boys, on the other hand, seemed to be involved in a negative cycle; they more frequently encountered aversive peer events, and “appear[ed] to be enmeshed in a cycle in which their counterattacks to peer provocation [led] to escalated coercion from peers” (Zakriski et al., 2005, p. 852).

Implications for the Study of Change

Discrepancies between “evidence” and impressions. Assessment measures that do not examine the person-situation interactions that contribute to behavior may fail to uncover the change processes that influence treatment outcomes (Choukas-Bradley, Banducci, Metcalfe, Wright, & Zakriski, 2008). Indeed, if it is difficult to assess behaviors, it is likely to be even more challenging to assess behavioral change. A common approach to measuring children’s change in response to treatment uses difference scores from two administrations of a syndromal measure. A smaller number of studies use a more direct approach to measuring change, by asking observers, such as clinicians or parents, to reflect on children’s improvement over the full course of treatment, and to make summary judgments about this improvement. One such measure is the Clinical Global Impressions (CGI, Guy, 1976) instrument, which asks raters to

provide impressions of change in six areas (global improvement, relationships with adults, relationships with peers, aggressive behavior, withdrawn behavior, and prosocial behavior). A study of residential treatment outcomes by Connor, Miller, Cunningham, and Melloni (2002) found that greater improvement was reported by the CGI than by a standardized syndromal instrument similar to the TRF (the Devereux Scales of Mental Disorder, or DSMD). The researchers attributed this discrepancy to the fact that teachers completed the DSMD forms and residential clinical staff completed the CGI forms, and pointed out that the low level of staff agreement from these two measures “highlight[ed] the difficulties in defining and reliably assessing outcome for youth in residential care” (Connor et al., 2002). In analyses of multiple measures of children’s behavioral changes during short-term residential treatment, Metcalfe (2007) and Choukas-Bradley et al. (2008) found results consistent with those of Connor et al. (2002). Counselors’ impressions of change appeared to contradict pre-post difference scores based on the TRF, even when the same raters completed both forms. Specifically, the CGI reported improvement in all areas, but the TRF did not show a linear change in aggression over time.

Problem behaviors vs. adaptive functioning. In trying to understand the differences between raters’ global impressions of change and standardized syndromal measures, it is important to note that instruments such as the CGI ask raters to assess changes in both problem and prosocial behaviors, while the syndromal instruments focus on “symptoms,” that is, problem behaviors. Indeed, the trend in the study of children’s behavioral change, and in the study of children’s problem behaviors in general, is to

focus on the reduction of symptoms. As Kazdin (2003, p. 269) points out, “Although symptom change is important, it is difficult to find compelling evidence that symptom change, as opposed to reduced impairment or improvements in prosocial functioning, family interaction, or peer relations, is the best predictor of long-term adjustment and functioning.” Some evidence suggests that during treatment, children improve more in prosocial functioning than in their problem or antisocial behaviors. In the Metcalfe (2007) study, counselors’ judgments of improvement for prosocial behavior were higher than for problem behaviors such as aggression. Moreover, field observations revealed increases in prosocial behavior, whereas aggression showed relatively little change. To summarize, global impressions of change and field observations of behavior reveal improvements in children’s adaptive, prosocial behaviors, and such changes are not revealed in syndromal assessments that focus on problem behaviors.

Contextual specificity of change. A second possible explanation for the discrepancy between counselors’ impressions of change and syndrome change scores is that the latter may obscure important context-specific changes in children’s reactions to social events. Wright and Zakriski (in press) found that complex change patterns occurred during residential treatment. For example, prosocial reactions to positive events increased, yet they decreased in response to aversive events (e.g., adult discipline and peer provocation). Similarly, the overall rate of aggressive behavior decreased, yet aggressive reactions to aversive events increased. Metcalfe’s (2007) study also revealed complicated patterns of change. Children’s “total adaptive functioning” (a measure that included increases in prosocial behavior and decreases in problem behaviors) improved in response to adult events, but changed little if at all in their reactions to peer events.
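The pattern reported by Wright and Zakriski (in press), in which the overall rate of aggression decreased while aggressive reactions to aversive events increased, can be illustrated by treating overall act frequency as reaction rates weighted by event rates. The sketch below is illustrative only: the two event categories, the numbers, and the names `ContextRates` and `overall_rate` are assumptions made for this example, not data or procedures from the studies cited above. It also assumes that the event categories are mutually exclusive and exhaustive.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ContextRates:
    """Hypothetical rates for one event type in one observation window."""
    event_rate: float     # share of observation periods in which the event occurred
    reaction_rate: float  # P(aggressive behavior | the event occurred)


def overall_rate(contexts: Dict[str, ContextRates]) -> float:
    """Overall act frequency: reaction rates weighted by how often each event occurs."""
    return sum(c.event_rate * c.reaction_rate for c in contexts.values())


# Invented numbers for a single hypothetical child (not study data).
early = {
    "aversive": ContextRates(event_rate=0.40, reaction_rate=0.30),
    "non_aversive": ContextRates(event_rate=0.60, reaction_rate=0.05),
}
late = {
    "aversive": ContextRates(event_rate=0.20, reaction_rate=0.40),
    "non_aversive": ContextRates(event_rate=0.80, reaction_rate=0.05),
}

# Change in the decontextualized act frequency (what a syndromal score tracks).
overall_change = overall_rate(late) - overall_rate(early)

# Change in the if...then... reaction rate to aversive events.
reaction_change = late["aversive"].reaction_rate - early["aversive"].reaction_rate

print(f"overall change: {overall_change:+.2f}")
print(f"reaction change (aversive events): {reaction_change:+.2f}")
```

In this toy case the overall rate drops because aversive events become less frequent, even though the child reacts aggressively to a larger share of them.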

Choukas-Bradley et al. (2008) found that children rated as “most improved” in counselors’ impressions of change showed significant decreases in aggression over time, but that their aggression increased in response to aversive events. Other research found that counselors were able to incorporate important contextual information about a target child in their impressions of the child’s personality, whereas their TRF ratings did not discriminate between functionally different children (Wright et al., 2001). It seems that similar processes may mediate the discrepancies between people’s impressions of change and changes in TRF scores; counselors’ impressions of change reflect their internalization of contextual specificity, but their TRF assessments may not.

Children’s Interviews

Though previous research has provided evidence of the relation between extensive field observations and people’s impressions of change, such studies neglect a key group of raters: the children themselves. If we are to view inter-rater discrepancies as informative rather than problematic, and if we hope to incorporate the ratings of many different types of informants in order to understand differences in children’s behaviors across a wide range of situations, then it follows that we should incorporate children’s views of their own behaviors in our model of change. Achenbach, McConaughy, and Howell (1987, p. 213) argue: “Because we lack definitive criteria against which to validate measures of childhood problems, it is essential to preserve the contributions of different informants, even if they do not correlate well with each other.” Achenbach, who developed the TRF/CBCL, also developed the Youth Self Report (YSR), the child version of the behavioral checklist. He reasoned that the low mean correlation (.22) between children’s self-ratings and their ratings by parents, teachers, and mental health

workers, demonstrates that children’s self-ratings should not stand alone as measures of their behavioral problems, but that neither should they be ignored, because “[l]ow correlations between informants may indicate that the target variables differ from one situation to another, rather than that the informants’ reports are invalid or unreliable” (Achenbach et al., 1987, p. 213).

Giving voice to children in the assessment of their own behaviors is an important step towards understanding their behavioral changes, but important information could be lost in the aggregation that occurs in existing child versions of syndromal scales. A context-sensitive assessment that measures children’s perceptions of their behaviors would help clarify the processes that mediate change, and would help us understand whether children’s self-perceptions of change agree with adults’ judgments. In his critique of the act frequency approach’s (AFA) lack of regard for intentionality, Block (1989, p. 238) writes, “…if I affirm the context-unspecifying statement ‘I demand a backrub’ (a dominance ‘act’ statement), the AFA is not interested in and cannot know the meaning of my yes response.” Context-sensitive ratings by adults could help us to understand the situations in which a person is most likely to demand a backrub, but only in talking to the person directly could we learn more about the “meaning of his yes response.” In the context of a residential treatment program, an observer might recognize that a child responds aggressively to peer prosocial talk, but only the child herself could reveal that she believes other children are mocking her when, from the adults’ perspectives, they are simply trying to engage her in conversation.

Children are valuable informants because they can provide us with insights into their behaviors to which they have better access than outside observers. Kazdin (2001, p.

789) states that “aggression is not merely triggered by environmental effects, but rather through the way in which these events are perceived and processed.” His claims are complemented by considerable research on social information processing models of aggression, which suggest that aggressive children may interpret their social surroundings in distinctive ways. Social Information Processing (SIP) theory, as reformulated by Crick and Dodge (1994), suggests that, when children are confronted with a social situation, they (1) encode external and internal cues, (2) interpret and mentally represent those cues, (3) select a goal, (4) access or construct responses, (5) decide on a response, and (6) enact that response. The most common method of studying children’s social information processing involves an interview or questionnaire in which children are presented with hypothetical situations and are asked questions that probe their processing patterns at various steps of the SIP model, such as how the situation might make them feel and what they might do next (Crick & Dodge, 1994). As support for the SIP model has increased, studies have attempted to link children’s vignette responses to actual behaviors, by using data from peer nominations about social preference and/or aggression (e.g., Dodge, Lansford, Burks, Bates, Pettit, Fontaine, & Price, 2003; Parker, Hubbard, Ramsden, Relyea, Dearing, Smithmyer, & Schimmel, 2001) or teacher and parent reports (e.g., Dodge et al., 2003).

Few researchers have studied the SIP patterns of children with “clinically severe aggressive behavior problems” (Orobio de Castro, Merk, Koops, Veerman, & Bosch, 2005, p. 105). The extant studies have compared samples of clinically referred children with nonclinical samples (e.g., Coy, Speltz, DeKlyen, & Jones, 2001; Orobio de Castro et al., 2005) or have examined differences between different groups of aggressive children, such as

children with conduct disorder versus oppositional defiant disorder (Dunn, Lochman, & Colder, 1997). Findings indicate that children with clinically aggressive behavior have trouble encoding social cues (Coy et al., 2001), and tend to attribute hostile intentions, generate more aggressive responses, and evaluate aggressive responses less negatively than their nonaggressive peers (Orobio de Castro et al., 2005).

My child interview method resembled SIP studies in its use of vignettes to probe children’s reactions to hypothetical social events, but its main purpose differed from the goals of most related studies in several key ways. Firstly, most studies seek to explore the cognitive processes underlying children’s social perceptions and decisions (see Crick & Dodge, 1994). Studies that focus on distinguishing the processing patterns of clinical samples from non-clinical samples generally try to isolate the specific processing step at which children with maladaptive social patterns deviate from the norm in their thinking (e.g., Orobio de Castro et al., 2005). The current study focused on the less studied question of whether children’s verbal responses to hypothetical situations are related to the behaviors they display in response to real social events. Secondly, rather than comparing the responses of a clinical sample to a normative sample, my study focused on the relationship between children’s interview responses and extensive field observations recorded by adults in a short-term residential treatment program for children with emotional and behavioral disorders. The vignettes depicted social events commonly encountered by children in their social interactions with peers and adults (e.g., teasing from peers or instructions from adults). My review of the literature revealed no studies that directly compared children’s vignette responses to field observations of their reactions to similar events. The current study also differed from past work in the field of

child vignette research by asking children straightforward questions about their overall behavior ratings, event ratings, and impressions of their own behavioral changes. Finally, the current study sought to embed the acquired information about children’s self-perceptions into an extensive study of behavioral change.

Hypotheses of the Current Study

The study examined counselors’ global impressions of change through the CGI, as in Metcalfe (2007) and Choukas-Bradley et al. (2008). Based on past research, I hypothesized that counselors would perceive children as having improved in all areas of functioning. Secondly, in contrast to people’s impressions of change, I hypothesized that standardized assessments (TRF) at three time points would not reveal changes in problem behaviors. Thirdly, I expected that extensive fine-grained field observations would help elucidate the discrepancies between the CGI and TRF judgments of change, by revealing complex patterns of change in children’s reactions to events. Fourthly, based on findings about low correlations between children’s self-reports and observer reports (Achenbach et al., 1987), and findings that aggressive children do not have accurate perceptions of their social environments (e.g., Coy et al., 2001), I hypothesized that children’s perceptions of their own behaviors, social events, and behavioral changes would differ from adults’ ratings at the individual difference level. Such discrepancies were not expected to act as hindrances to our understanding of children’s behavioral changes, but rather to help us untangle the complicated web of change.
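The fourth hypothesis distinguishes agreement at the group level from agreement at the individual difference level. A minimal sketch of that distinction follows, using invented ratings for six hypothetical children on an arbitrary 1-6 scale. The helper `pearson_r` and the variable names are written out for illustration only and are not the analysis used in this thesis.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented ratings, one entry per hypothetical child (not study data).
adult_ratings = [1, 2, 3, 4, 5, 6]
child_ratings = [3, 5, 1, 2, 6, 4]

# Group level: do the two informant groups report the same average?
group_gap = mean(adult_ratings) - mean(child_ratings)

# Individual difference level: do the informants rank the same children
# as high or low?
agreement = pearson_r(adult_ratings, child_ratings)

print(f"difference in group means: {group_gap:.2f}")
print(f"cross-informant correlation: {agreement:.2f}")
```

Here the two groups of informants report identical averages while agreeing only weakly about which children are high or low, which is the kind of discrepancy the hypothesis anticipates.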

Method

The data for this research were collected as part of the second year of a three-year study at Wediko Children’s Services’ residential summer treatment program for children with

emotional and behavioral problems. The program admits approximately 125-150 children each summer. Many of the children are referred through the Boston public school system and the majority of the children come from the New England region. The children are 7 to 19 years old and live on site for 45 days during July and August in clinical groups of 8-12 same-sex, roughly same-aged peers. They participate in a range of daily activities including classroom learning, recreational activities (e.g., athletics, art), and group therapy. Approximately 140 counselors and teachers work on site and participate in data collection during the summer. During the first year of data collection, when these methods were developed, I worked as a counselor and as a member of the research team. During the current year of study, I developed the child interview protocol, oversaw all interviews, and helped with data management and staff training for the overarching study. Consent to use children’s data for research was obtained from parents or guardians during the pre-summer interview process.

Child participants. For field observations and standardized assessment forms, data were obtained for a total of 148 children during the summer. Children who arrived late, departed early, were placed in a new clinical group after their arrival, or spent more than three days away from their group at any point during the summer were excluded. The final sample consisted of 139 children; 55.4% White, 30.2% African-American, 12.9% Hispanic, 0.7% Asian, and 0.7% other. For analysis purposes, children were placed in two age groups, using the age guidelines of past research (Achenbach & Rescorla, 2001). Specifically, children over age 11 were categorized as “older” and all others as “younger.” The gender and age groups were as follows: younger boys (N = 29, M = 9.14 years, SD = 1.55), younger girls (N = 9, M = 10.22, SD = .97), older boys (N =

67, M = 14.42, SD = 1.93), older girls (N = 34, M = 14.26, SD = 1.96). For every analysis reported in this thesis, gender and age were tested as factors, but no significant effects were found, and thus these factors will not be discussed further.

Adult participants. A total of 144 counselors and teachers (mainly college students or recent graduates) were employed at Wediko over the summer. Teachers and activity counselors ran classes and activities for six hours each day, and other residential counselors as well as supervisory staff spent the majority of the day with their clinical groups of children. End of summer assessments from counselors who had arrived late to or departed early from Wediko were excluded from analyses, as they were not in a position to provide impressions of children’s changes over the full summer. Hourly observations of behavior (see below) from all counselors were included in the behavior data analyses.

Standardized Syndromal Assessment

Teacher Report Form (TRF). The TRF (Achenbach & Rescorla, 2001) is a standardized measure that assesses behavior problems in children. It is one of the most widely used instruments of its type in research and clinical settings. It consists of 118 items (see Appendix D) grouped into eight narrowband measures and two broadband measures. The narrowband measures include Aggression, Withdrawal, Somatic Complaints, Anxiety, Thought Disorder, Attention Problems, Rule Breaking, and Social Problems; the broadband measures include Internalizing and Externalizing. “Internalizing” is the aggregate of Withdrawal, Somatic Complaints, and Anxiety; “Externalizing” is the aggregate of Aggression and Rule Breaking (called “Delinquency” in earlier versions of the TRF). The “Total Problem Behavior” (TPB) scale was also used

Approaches to Studying Change 20 in my analyses; in addition to the eight narrowband scales, it incorporates additional problem behaviors such as “bedwetting.” Raters are asked to assess these behaviors on a scale of 0-2 (“not true,” “somewhat or sometimes true,” and “very true or often true”). All results are reported in the form of T scores, using Achenbach and Rescorla’s (2001) procedure. Scores are calculated based on age and gender groups, with children over 11 categorized as “older,” as mentioned before. Reliability information is available from Achenbach and Rescorla (2001). Procedure. With the exception of supervisory staff, all staff—including teachers, residential counselors, and activity counselors—completed standardized assessments of behavior every two weeks and participated in daily behavioral observations. There were three TRF administrations during the summer: July 15 (10 days after the children arrived), July 29 (24 days after the children arrived), and August 12 (38 days after the children arrived). At each administration, counselors were assigned 1-2 children from their clinical team (or in the case of teachers, from their classroom) to rate using the TRF. For the first assessment, they were instructed to assess the children’s behavior since the beginning of the session (a 10 day period). For the second and third assessments, they were instructed to base their ratings on the most recent two weeks. The completion of each child’s TRF assessment took approximately 30 minutes. If a child was rated by more than one adult, the raters’ scores were averaged. Field Observations Daily behavior observations were recorded for up to six days per week during hourly activity periods. Coders recorded behaviors at the end of the period, using 79 hand-held computers (“Palm”© Z-22s) running SatForm software written for this project.

Approaches to Studying Change 21 Each day, the 79 Palms were updated with the appropriate schedule of activities, including the names of the children and staff who were scheduled to be present at that activity. The Palms were distributed each morning to activity sites, classrooms, and living groups and they were collected at the end of each day. Each night, they were docked, the data were uploaded to a desktop computer, the batteries were charged for use the next day, and the new schedule was uploaded. 18,649 observation sessions were obtained. A manual provided specific code definitions and explained Palm input procedures (see Appendix A). Training at the beginning of the summer (before the children had arrived) included group presentations and demonstrations, hands-on practice in coding hypothetical situations, and a quiz of the codes used in the Palms. Counselors were also provided with “Quick Guides” summarizing the codes, which they were encouraged to carry with them for reference. Each coder typically coded 2-6 children per period and each child was observed for 2-6 periods per day. A coder started each coding session by logging into the software with his/her assigned username and password. The coder was then presented with the list of activities he/she was scheduled to attend during the day. At this point, the coder could select the activity to be coded, and would then be presented with a list of children expected to be present at that activity. He/she would coordinate the coding of the children with the other counselors present. Overall behavior frequencies. The coder was then prompted to record an overall assessment of eight of the target child’s behaviors using a 0-3 scale (“not at all,” “somewhat,” “moderately,” and “a lot”). The eight behaviors were: “withdrew/isolated,” “looked sad/cried,” “whined/complained,” “argued/quarreled,” “teased/bossed,”

Approaches to Studying Change 22 “hit/pushed/attacked,” “talked prosocially,” and “complied/cooperated.” If the rater failed to enter a rating for any of the eight behaviors, he/she was prompted to do so, as the Palm program required a rating for each of the eight behaviors (for more details, see Appendix A). Event frequencies. The coder was then prompted to record overall frequencies of eight social events that the target child encountered, using the same 0-3 scale. The social events were: “peer teased/bossed,” “peer argued/quarreled,” “peer asked/told,” “peer talked prosocially,” “adult disciplined/punished,” “adult warned/reprimanded,” “adult gave instructions,” and “adult talked prosocially.” As with the overall behaviors, the coder was prompted to enter a rating for each of the eight social events if he/she had not done so (see Appendix A). Target’s reactions to events. At this point in the coding procedure, the software selected two of the social events that the coder reported had occurred (i.e., an event receiving a code other than “0”) so as to obtain details about the event. Due to the time constraints faced by the counselors and teachers coding the children’s behaviors, it was not feasible for the Palms to probe coders for details about all of the reported events. The following procedure occurred for each of the two events probed. Once an event was selected to be probed, the coder was instructed to select the name of the peer or adult who had most recently (or most memorably) engaged the target child in the “event” in question. For example, if the coder was describing an event in which the target child “Max” was “teased/bossed” by a peer, the program would prompt the coder for the name of the peer who had teased or bossed Max (e.g., “John”). The coder would then be asked to assess which of the eight possible responses Max had

Approaches to Studying Change 23 displayed when John teased/bossed him, and to what extent. For responses that did occur, the coder used a 1-3 scale (“somewhat,” “moderately,” or “a lot”). For responses that did not occur, the coder did not enter any information. The possible responses were the same eight behaviors that were used earlier in the “overall behavior” part of the coding procedure. Interactant’s responses to target’s behavior. The data from the final part of the coding procedure were not used in analyses for the present study, and thus are described briefly. In the final step, the coder was prompted to assess how John (the peer interactant) responded to Max’s reactions (those described above), using the same 1-3 scale. The possible responses of peers were as follows: “ignored/no response,” “talked prosocially,” “showed positive emotion,” “asked/told,” “argued/quarreled,” “teased/bossed,” “hit/pushed/attacked,” and “yielded/gave in”; the possible adult responses were: “ignored/no response,” “talked prosocially,” “showed positive emotion,” “gave instructions,” “warned/reprimanded,” “disciplined/punished,” “physically controlled,” and “yielded/gave in.” After the coder entered information about the first social event, he/she was prompted about the second one with the same procedure. Once two behavioral sequences were coded, the coder was brought back to the original prompt screen and could begin to code another child or terminate the session. Preliminary analyses of behavioral data. As in previous work (Zakriski et al., 2005), individual items from the overall behavior ratings were grouped into three behavioral categories: aggression (argued/quarreled, teased/bossed, whined/complained, and hit/pushed/attacked), withdrawal (withdrew/isolated, looked sad/cried), and prosocial

Approaches to Studying Change 24 behavior (talked prosocially, complied/cooperated). As in Zakriski et al. (2005), adult warned/reprimanded and adult disciplined/punished were combined to form a single category called “adult warned/disciplined,” and peer teased/bossed and peer argued/quarreled were combined to form “peer argued/teased.” A total of six event categories resulted: adult prosocial talk, adult instruct, adult warn/discipline, peer prosocial talk, peer argue/tease, and peer ask/tell. For purposes of this thesis, adult prosocial talk and peer prosocial talk will be referred to as “positive events,” adult instruct and peer ask/tell as “neutral events,” and adult warn/discipline and peer argue/tease as “negative events.” Aggressive, prosocial, and withdrawn reaction rates were calculated based on data acquired from the stage of the Palm coding process in which coders enter information about the target child’s reactions to a social event. These data were grouped according to the three behavioral categories described above (i.e., aggression, withdrawal, and prosocial behavior), and expressed in terms of reactions to specific social events (e.g., “aggression to adult instruct”). Impressions of Change At the end of the session, all teachers, residential counselors, and activity counselors rated children’s overall “improvement” from the summer. Materials. Each adult rated all children in his/her clinical group using six items of a revised version of the CGI (Guy, 1976) scale mentioned in the Introduction section (see Appendix B). Items 1 through 3 examined the children’s overall adjustment (“global”), relationships and interactions with adults (“adults”), and relationships and interactions with peers (“peers”). Items 4 through 6 examined changes in aggressive behavior

Approaches to Studying Change 25 (“aggression”), withdrawn behavior (“withdrawal”), and prosocial/friendly behavior (“prosocial”). Aggressive behavior was defined as “arguing, teasing, bossing, bullying, threatening, pushing, and/or hitting.” Withdrawn behavior was defined as “isolating self from others, looking sad, and/or crying.” Prosocial/friendly behavior was defined as “talking, attending, and/or showing positive emotion to others.” Staff made their evaluations based on a 7-point scale: “very much worse” (1), “much worse,” “minimally worse,” “no change,” “minimally improved,” “much improved,” and “very much improved” (7). Internal consistency was high for the six items of the adult CGI; Cronbach’s α was .92 and the mean inter-item r was .65. Therefore an overall index of change (“overall”) was calculated for each child by averaging the six other items. Procedure. All raters were instructed to use their full knowledge of the child over the entire summer when evaluating his/her changes. The forms were individualized so that each staff member received a form with the names of the children who had been in his/her clinical group. The rater circled a number for each of the six items (using the 1-7 scales) for each of the children in his/her clinical group. The completion of these six items took approximately ten minutes. Children’s Self Assessments A structured interview protocol was developed to probe children’s perceptions of their social environments and their own behavioral changes. Children were interviewed at two points during the summer using a protocol that was designed to parallel as closely as possible the procedures used to collect field observations of children’s behavior in context.
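The internal-consistency statistics reported above for the adult CGI (Cronbach’s α, the mean inter-item r, and the “overall” index) can be illustrated with a minimal sketch. The function names and the layout of the data (one list of ratings per item, ordered by child) are assumptions made for illustration; the thesis does not state which software or which variant of α was used. As a rough check, the standardized form of α computed from six items with a mean inter-item r of .65 comes out at about .92.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_inter_item_r(items):
    """Average correlation over all pairs of items.

    `items` is a list of columns (hypothetical layout): one list per
    CGI item, each holding that item's rating for every child.
    """
    return fmean([pearson_r(a, b) for a, b in combinations(items, 2)])


def cronbach_alpha(items):
    """Raw Cronbach's alpha from item variances and total-score variance."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def standardized_alpha(k, r_bar):
    """Standardized alpha from the number of items and the mean inter-item r."""
    return k * r_bar / (1 + (k - 1) * r_bar)


def overall_index(item_ratings):
    """Overall change index for one child: the mean of the six CGI items."""
    return fmean(item_ratings)
```

For example, `overall_index([5, 6, 5, 4, 4, 6])` averages one child’s six ratings on the 1-7 scale, and `standardized_alpha(6, 0.65)` shows how α depends jointly on the number of items and their average inter-correlation.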

Participants. A sample of 40 children was used for purposes of interviews and related analyses. Based on the lower number of girls relative to boys attending the Wediko summer program, children aged 9 to 13 were targeted. An equal number of boys and girls were selected based on the first week’s observational data for aggressive, withdrawn and prosocial behaviors. Girls and boys might differ in their reasons for referral and in the severity of their behavior problems, which could confound comparisons of their behavior change. Therefore matched samples of girls and boys were identified based on their overall behavioral frequencies (using field observations) during their first week in the program. Specifically, each girl was matched to the boy who was the same age (i.e., within one year) and whose overall rates of aggressive, withdrawn, and prosocial behavior were the most similar to hers. One child was ill during the second interview administration, and thus was excluded from all data analyses, yielding a final sample of 39 children. The gender and age breakdown was as follows: younger boys (N = 6, M = 10.67 years, SD = .52), younger girls (N = 8, M = 10.50, SD = .53), older boys (N = 13, M = 12.38, SD = .51), older girls (N = 12, M = 12.42, SD = .52).

Qualitative Responses. Interviews were organized into four parts (see Appendix C). The first part of the interview included four vignettes, each of which I developed to represent a specific social event that paralleled one of the event categories used in the field observations (see above). Due both to the nature of the population (e.g., limits in the children’s attention spans), and to the nature of the setting (e.g., limits to the amount of time that children could spend away from their clinical groups and structured activities), it was not possible to probe all eight social events examined in the field observations.
Four events were chosen based on their frequency and the variety of behavioral responses

Approaches to Studying Change 27 evoked by them, according to Palm data from Year 1 of the study: “peer argue/tease,” “adult warn/discipline,” “peer ask/tell,” and “adult instruct.” Each of the four vignettes was created to exemplify one of the events, based on definitions and examples provided in the coding manual (see Appendix A). Five free-form questions followed the presentation of each vignette (see Appendix C). Children were asked: (1) “If a kid/adult did something like this to you, how would it make you feel?” (2) “Why would a kid/adult do something like this to you?” (3) “If a kid/adult did something like this to you, what would you do next?” (4) “Let’s say you did the things you just said you would do. What would the kid/adult do next to you?” (5) “Let’s say the kid/adult did the things you just said the kid/adult would do. How would that make you feel?” Questions 3 and 4 were designed to reflect Palm pilot probes: Question 3 corresponded with “target’s reactions to event,” and Question 4 corresponded with “interactant’s response to target’s behavior.” Social Event Rates. The second part of the interview asked children for discrete responses. Interviewees were asked to rate how often they had encountered the eight social events probed by the Palm. The language used in describing the social events to the children was as simple as possible without compromising the meaning of the code (see Appendix C). Interviewees could respond using a five-point scale: “Almost Never,” “Sometimes,” “Often,” “Very Often,” or “Almost Always,” which was converted to a 1-5 scale for the purposes of analysis. Overall Behavior Rates. The third part of the interview asked children to use the same scale to rate how often they themselves had displayed the eight overall behaviors probed by the Palms. As was the case for the social event ratings, the questions about

Approaches to Studying Change 28 behaviors were asked in language that was both child-friendly and true to the nature of the code. The same behavioral scales were formed for the interview data as were formed for the Palm data: aggression (argued/quarreled, teased/bossed, whined/complained, and hit/pushed/attacked), withdrawal (withdrew/isolated, looked sad/cried), and prosocial behavior (talked prosocially, complied/cooperated). Perceptions of Change. The fourth and final part of the interview was only used during the second interview administration, as it assessed children’s perceptions of their behavioral changes over the course of the summer. Children were asked six questions based on the six items of the Clinical Global Impressions (CGI) assessment completed by adults at the end of the summer. The seven-point scale was adapted from the scale used for adults, reworded with child-friendly language: “very much worse” (1), “much worse,” “a little worse,” “no change,” “a little improved,” “much improved,” and “very much improved” (7). The items remained functionally equivalent to those asked of adults but were revised with child-friendly language. To parallel the scales of the adult CGI, an overall index of change (“overall”) was calculated for each child by averaging the six other items (see Results for a discussion of internal consistency and the additional subscale created). Interview Procedure Two interviewers conducted each interview: a primary interviewer to ask the questions and a recorder to transcribe the interview. Such an arrangement reduced the total time of the interview and was clinically important. Due to the time-sensitive nature of the data and the special demands of the population and setting, it was not feasible for the same pairing of interviewer and recorder to conduct all interviews. Interviewers were

Approaches to Studying Change 29 senior staff members, and recorders were members of the research team; I served as the recorder in over 75% of the interviews. I carefully trained every interviewer in the vocal delivery of the vignettes so as to minimize inconsistencies. The first set of interviews was conducted 10 to 12 days into the summer session, on or within two days of the administration of the first TRF. The second set of interviews was conducted four to six days before the end of the summer session, on or within two days of the third TRF, and within six to eight days of the administration of the adult CGI. Children were never asked to leave school or group therapy for the interviews; appropriate times included activities (e.g., swimming or art), mealtimes, and free time. At the beginning of each interview, the child was told that he/she would be asked about things that kids or adults might say or do to him/her. The interviewee was informed that his/her answers would remain confidential and that there were no right or wrong answers. During the vignette portion of the interview, the child responded verbally to all questions. The interviewer prompted him/her to provide details about his/her answers, and summarized his/her previous responses when asking the final two questions. For the social event frequency section and the overall behavior frequency sections, all instructions about the questions and the five-point response scale were carefully explained to the interviewee and he/she was given a printed version of the scale with the text in large print. An example question was provided to ensure that the child understood the instructions (see Appendix C). During the first interview administration, each social event and overall behavior question was preceded by the phrase “since Wediko started,” and for the second interview administration, each question was preceded by the phrase “over the last two weeks.” This wording was based on the

instructions given to adults during the corresponding administrations of standardized forms. The child was allowed either to verbally state his/her responses or to point to them, in which case the interviewer would repeat each response aloud to ensure the intended choice had been made. For the first administration, the interview ended after the overall behavior ratings section. In the case of the second administration, the child completed one final section: Perceptions of Change. The new scale was presented and explained, the interviewee again was given a piece of paper with the words of the scale printed in large type, and he/she was asked another example question followed by the six change questions. Interviews in the first administration were completed in approximately 15 to 20 minutes and interviews in the second administration were completed in approximately 17 to 25 minutes.

Coding of Responses

Vignette responses involving the interviewee’s own behaviors were coded as a target child’s reactions to events would have been coded in field observations, then grouped based on the behavioral scales used for Palm data (i.e., “aggression,” “withdrawal,” “prosocial”; see “Behavior Coding”). Questions regarding affect, intent, and the interactant’s reactions were not used in the analyses for this thesis. Children’s social event frequency ratings were grouped into three scales: positive (peer prosocial talk, adult prosocial talk), neutral (peer ask/tell, adult instruct), and negative (peer tease/boss, peer argue/quarrel, adult warn/reprimand, adult discipline/punish). Children’s overall behavior frequency ratings were grouped in the same way as the behavioral vignette responses.
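The groupings described above can be sketched as a simple lookup from scale names to item labels. This is an illustration under stated assumptions: the dictionary names and the function `scale_scores` are hypothetical, and scoring each scale as the mean of its items is an assumption, since the text names the groupings but not the aggregation rule.

```python
from statistics import fmean

# Item groupings taken from the text; the variable names are hypothetical.
EVENT_SCALES = {
    "positive": ["peer prosocial talk", "adult prosocial talk"],
    "neutral": ["peer ask/tell", "adult instruct"],
    "negative": [
        "peer tease/boss",
        "peer argue/quarrel",
        "adult warn/reprimand",
        "adult discipline/punish",
    ],
}

BEHAVIOR_SCALES = {
    "aggression": [
        "argued/quarreled",
        "teased/bossed",
        "whined/complained",
        "hit/pushed/attacked",
    ],
    "withdrawal": ["withdrew/isolated", "looked sad/cried"],
    "prosocial": ["talked prosocially", "complied/cooperated"],
}


def scale_scores(ratings, scales):
    """Collapse one child's item ratings (1-5) into scale scores.

    Assumption: each scale score is the mean of its items.
    `ratings` maps item label -> rating for a single interview.
    """
    return {
        name: fmean(ratings[item] for item in items)
        for name, items in scales.items()
    }


# Hypothetical ratings for one child at one interview administration.
example_events = {
    "peer prosocial talk": 4,
    "adult prosocial talk": 2,
    "peer ask/tell": 3,
    "adult instruct": 3,
    "peer tease/boss": 1,
    "peer argue/quarrel": 2,
    "adult warn/reprimand": 1,
    "adult discipline/punish": 2,
}
event_scores = scale_scores(example_events, EVENT_SCALES)
```

Keeping the groupings in data rather than in code makes it easy to apply the same function to the interview ratings and to the Palm ratings, which use the same behavioral categories.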

Approaches to Studying Change 31 Results Overall Impressions of Change (CGI) Means and standard errors for adults’ and children’s assessments of change are provided in Figure 1. For adults’ ratings of all children (Panel A), mean ratings for each item were reliably higher than 4 (“no change”), ts(138) ranged from 10.94 to 23.97, all ps < .001. A repeated measures ANOVA revealed that scales varied, F(5, 725) = 93.93, p < .001. (All significance levels provided in this report for repeated-measures analyses were based on the Greenhouse-Geisser adjustment.) Mean improvement was highest for global behavior, relationships with adults, relationships with peers, and prosocial behavior, with means between 5 (“minimally improved”) and 6 (“much improved”). Ratings were lower for improvement in aggressive behavior and withdrawn behavior, with means between 4 (“no change”) and 5 (“minimally improved”). For adults’ ratings of the interviewed children (Figure 1, Panel B), mean ratings for each item were again reliably higher than 4, ts(38) ranged from 7.12 to 13.07, all ps < .001. A repeated-measures ANOVA again revealed that scales varied, F(5, 38) = 17.91, p < .001. As was the case for the ratings of all children, improvement was lowest for aggression and withdrawal, with means between 4 and 5, and all other means were between 5 and 6. To assess whether the sample of interviewed children was representative of the full sample of children, I tested whether adults’ ratings of the interviewed children differed from their mean ratings of all children. Two-sample t tests would not have been appropriate given that the interviewed children were part of the full sample. One-sample t tests were performed against nulls of the mean level over all children for each scale, and

no comparison was significant, with ts(38) ranging from .261 to 1.82 and ps ranging from < .08 to > .70.

For children’s ratings (Figure 1, Panel C), mean ratings for each item were again reliably higher than the midpoint, with ts(38) ranging from 3.52 to 9.88, ps < .01. A repeated-measures ANOVA revealed that scales also varied for children’s assessments, F(5, 190) = 3.08, p < .02. An inspection of Figure 1 reveals the similarities in the pattern of variation over scales between adults’ and children’s assessments. Children’s self-assessments yielded patterns of greatest and least improvement that were similar to the adults’ ratings, with mean improvement scores between 4 and 5 for aggression and withdrawal and between 5 and 6 for all other scales. Pairwise comparisons were performed to test whether children’s ratings differed for each scale when compared with adults’ ratings of the same children. No comparison was significant, with the largest t(38) = 1.87, p < .07, and all other ts < 1.

Note that the mean level similarities discussed thus far are different from the discrepancies hypothesized to occur at the level of individual children. Although the adults’ CGI ratings of the interviewed children agreed with the children’s assessments of their own behavioral changes at the mean level, correlation analyses revealed no reliable relationship at the individual level for any scale; rs ranged from -.28 to .19, all ps > .08, with a mean r of .01. An inspection of the bivariate analyses revealed no obvious outliers or nonlinearities.

Because internal consistency was low for the children’s items (mean inter-item r = .29), to further explore correlations between children’s and adults’ ratings, I created an additional subscale consisting of the four most highly inter-correlated items (i.e.,

“global,” “adults,” “aggression,” and “withdrawal”). For this subscale, mean inter-item r was higher (.43), but α showed little increase (.73), due to the reduced number of items. For adults’ ratings of the interviewed children, α was .84, and for adults’ ratings of all children α was .86. However, using this subscale to compare the interviewed children’s own ratings to those made by adults, the correlation remained non-significant (r = -.11). Thus, I found little evidence of agreement between adults’ and children’s ratings at the individual level.

Summary. For both children and adults, improvement was reported for each scale, the degree of improvement varied over scales, and children and adults agreed as to the scales for which the greatest and least improvement was shown. Adults and children showed little if any agreement about where change occurred at the level of the individual.

Standardized Syndromal Assessments

Figure 2 shows results for the Teacher Report Form (TRF) for four scales of interest: the narrowband Aggression scale, the broadband Externalizing and Internalizing scales, and the Total Problem Behavior scale. In contrast to children’s and adults’ judgments of improvement, Externalizing and Total Problem Behavior showed significant linear increases over time. The Aggression scale increased slightly, but nonsignificantly. Internalizing, like most of the TRF scales, did not show change over time. Table 1 provides scale means and results of repeated-measures ANOVAs of all the narrowband and broadband scales of the TRF over time. The TRF showed no evidence of “improvement” in children, which would be revealed through decreases in their problem behaviors.

For the subgroup of interviewed children, the TRF did not show significant changes for any of the narrowband or broadband scales, Fs ranged from .074 to 2.59. To further check possible differences between interviewed and non-interviewed children, one-way ANOVAs were performed to determine whether the group of interviewed children differed from the non-interviewed children at Time 1, the time of the first interview and the first TRF administration. A reliable difference was found for one scale of 11, the Aggression subscale, F(1, 138) = 4.85, p < .03.

Summary. The TRF suggested that little change occurred, and changes that were revealed showed increases in problem behaviors. These results contrasted with adults’ and children’s impressions of change, which suggested improvement in all areas, including those assessed by the TRF (i.e., aggression and withdrawal).

Field Observations of Behavior

One possible explanation for why the TRF did not reveal “improvement” is that it focused on children’s problem behaviors at isolated time points, rather than continuously assessing both problem behaviors and adaptive behaviors over time. Extensive fine-grained field observations examined changes in children’s overall behaviors, reactions to events, and rates of encountering those events.

Linear regression. Mean behavior rates were computed over all behaviors in a given category for each child for each day of the summer. Raw scores were converted to z-scores, $z_i = (X_i - M)/s_{\text{pooled}}$, where $X_i$ was each child’s score for a given measure on a given day, $M$ was the mean for that measure for all children over all coded days, and $s_{\text{pooled}}$ was the standard deviation for that measure over days, pooled over children. This transformation was performed because it preserves information about changes over time

and differences among children in their amount of variation, but it removes differences in the overall elevations of the scales. Similarly, event rates were computed for each child for each day of the summer, and reactions to each event were computed by pooling all behavioral reactions in that category in response to that event, over all days, over all children.

Analyses using distinct time periods. Because various forms of assessment (e.g., the TRF, and the child interviews to be discussed) assess behavioral change by comparing data acquired at different time points, the Palm data were divided into time periods for the purposes of certain analyses. “Days” refers to days when coding occurred. The first time period was defined as Days 1 through 9, the second time period was Days 10 through 22, and the third period was Days 23 through 36. For each of these periods, means were calculated for children’s overall behavior rates, reaction rates, and event rates. The first and third time periods corresponded with the two interview administrations. Day 36 was not the final day of coding, but it was the first day of the final interview administration, so the remaining Days were excluded from time period analyses.

Overall Rates of Behavior

Figure 3 provides the results of linear regression analyses for children’s daily global behavior ratings over time, for both the full sample of children and the sample of interviewed children. Overall behavior rates showed a significant increase in prosocial behavior, r² = .46, b = .27, F(1, 36) = 29.27, p < .001, and a significant decrease in withdrawal, r² = .09, b = -.11, F(1, 36) = 4.43, p < .05. Aggression did not show significant linear change, r² = 0, b = .02, F(1, 36) < 1.
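The z-score transformation and the linear trend over days described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. It assumes that “pooled over children” means a within-child standard deviation pooled across children, weighted by degrees of freedom, and that the regression is fit to one mean value per coded day; the function names and the data layout (a dictionary mapping each child to a list of daily scores) are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pooled_sd(scores_by_child):
    """Within-child SD over days, pooled over children (assumed definition)."""
    sum_sq = 0.0
    df = 0
    for scores in scores_by_child.values():
        child_mean = fmean(scores)
        sum_sq += sum((x - child_mean) ** 2 for x in scores)
        df += len(scores) - 1
    return sqrt(sum_sq / df)


def z_transform(scores_by_child):
    """Apply z = (X_i - M) / s_pooled to every child-day score.

    M is the grand mean over all children and coded days, so differences
    among children and changes over days are both preserved.
    """
    grand_mean = fmean([x for s in scores_by_child.values() for x in s])
    s = pooled_sd(scores_by_child)
    return {
        child: [(x - grand_mean) / s for x in scores]
        for child, scores in scores_by_child.items()
    }


def linear_trend(days, values):
    """Ordinary least-squares slope b and r-squared of values on days."""
    mx, my = fmean(days), fmean(values)
    sxx = sum((x - mx) ** 2 for x in days)
    syy = sum((y - my) ** 2 for y in values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, values))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical data: two children, three coded days each.
raw = {"child_a": [1, 2, 3], "child_b": [3, 4, 5]}
z = z_transform(raw)
daily_means = [fmean(day) for day in zip(*z.values())]
slope, r_squared = linear_trend([1, 2, 3], daily_means)
```

In this toy example the two children differ in elevation but rise at the same rate, so the daily means increase perfectly linearly. Real data would also need handling for children who were not coded on a given day, which this sketch omits.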

Overall behavior rates of the interviewed children showed similar patterns to those of the full sample. As was the case for the group of all children, the interviewed children showed a significant increase in prosocial behavior, r² = .31, b = .38, F(1, 36) = 14.97, p < .001, and a decrease in withdrawal, though it did not reach significance, r² = .08, b = -.13, F(1, 36) = 2.81, p < .20. Although aggression for the interviewed children showed clear nonlinearity, with a peak at time 2, the measure of linear change was not significant, r² = 0, b = -.04, F(1, 36) < 1.

Reactions to Events

Another possible explanation for why “improvement” was not revealed through the TRF involves the measure’s focus on “act frequencies,” and its inability to assess contextually specific behaviors. Overall behavior rates aggregate over contexts and thus can mask significant changes in children’s behavioral responses to specific situations (e.g., Choukas-Bradley et al., 2008). Aggressive, withdrawn, and prosocial reaction rates in response to individual events can reveal narrow yet important behavioral changes. In the analyses that follow, for each type of reaction, figures showing the children’s changes in responses to two events are shown (i.e., reactions to peer prosocial talk and to adult warn/discipline). These events were chosen so as to show the diversity of reactions and forms of reaction change in response to a positive peer event and an aversive adult event. For each type of reaction, complete analyses for reactions to all six events are shown in table form.

Aggressive reactions to events. Although children’s overall aggression did not change significantly over time (see Figure 3), an examination of their aggressive responses to events revealed a complex pattern of context-specific changes. Children’s

aggressive reactions decreased in response to positive events, increased in response to aversive events, and did not change in response to neutral events. Figure 4 shows the decrease in children’s aggressive reactions to peer prosocial talk and the increase in their aggression to adult warn/discipline; results of reactions to all events, with summary statistics, are shown in Table 2. The subgroup of interviewed children showed patterns that resembled those for the full sample. The main exception was that the changes in children’s responses to positive events were not significant.

Withdrawn reactions to events. Although the overall rate of withdrawn behavior showed modest but significant change (see Figure 3), children decreased significantly in their withdrawn reactions to all events except to peer argue/tease, to which their withdrawal did not change. Figure 5 shows their significant decreases in withdrawal in response to peer prosocial talk and adult warn/discipline; Table 3 shows the results of analyses of reactions to all events, with relevant statistical summaries. The interviewed children showed a similar pattern of reactions in terms of direction and significance of change, with the possible exception of their withdrawn reactions to peer ask/tell, for which the change was not significant.

Prosocial reactions to events. Although children’s overall prosocial behaviors increased significantly (see Figure 3), their changes in prosocial reactions to most events were not significant. Strong positive increases were found only in response to peer prosocial talk (see Figure 6) and to adult prosocial talk. Children showed no significant changes in their prosocial reactions to peer ask/tell, adult instruct, or adult warn/discipline (Figure 6). Moreover, children decreased significantly over time in their prosocial responses to peer argue/tease. Table 4 provides the results of all analyses of the children’s

prosocial reactions. The interviewed children’s prosocial reactions also decreased in response to peer argue/tease, but not significantly, and all other reactions followed the same patterns of directionality and significance as those of the full sample of children.

Event Rates

Table 5 shows the results of linear regression analyses of children’s event rates. No event showed significant linear changes except for adult instruct, which decreased. Interestingly, the interviewed children differed from the full group in their changes in event rates. They experienced significant positive changes in their peer positive and peer neutral events.

Summary. In contrast to the TRF’s finding of no “improvement” over time, linear regressions of data from field observations revealed important changes in children’s overall behaviors and reactions to events. Overall behavior rates showed strong improvement in prosocial behaviors and less improvement in problem behaviors. Children’s reaction rates further clarify change processes by revealing increases in adaptive behaviors in response to positive events, and increases in problem behaviors in response to aversive events. Such context-specific changes may help us understand adults’ and children’s reports of improvement in their global impressions of change.

Children’s Self-Assessments

An analysis of children’s interview responses can help clarify their own impressions of their behavioral changes; how those impressions may or may not be linked to changes in their ratings of overall behaviors, event rates, and reactions to events; and how their self-perceptions may or may not agree with adults’ assessments.

Overall Behavior Ratings

Repeated-measures analyses of the children’s ratings of their own overall aggression, withdrawal, and prosocial behaviors at the two interview administrations yielded a time x behavior interaction, F(2, 76) = 3.96, p < .04. Withdrawal decreased significantly from time 1 to time 2, F(1, 38) = 8.37, p < .007, aggression decreased but not significantly, F(1, 38) = 2.04, p > .1, and prosocial increased but not significantly, F(1, 38) < 1.

Children’s responses were given at two discrete time points and were not collected continuously. To make a more direct comparison of children’s results to field observations, overall behavior ratings were calculated for time points that corresponded with the first and second interview administrations. Using data from these two time points, field observations from the Palms did not reveal a time x behavior interaction for the interviewed children, F(2, 76) = 2.42, p > .10. Thus, there was a lack of correspondence between children’s and adults’ ratings of overall behavior, and children reported more change than did adults’ field observations based on these two time periods.

Reactions to Events

Children’s judgments of their reactions to events were assessed using their freeform responses to vignettes (see Method). Figure 7 shows the children’s self-reported hypothetical aggressive, withdrawn, and prosocial reactions to the vignettes designed to mirror the aversive and neutral events in the field observations. Panel A shows the results at the first interview administration, and Panel B shows the results at the second

interview administration. Analyses revealed only a vignette x reaction interaction effect, F(6, 228) = 12.25, p < .001.

To again compare the children’s responses directly to field observations, Figure 8 shows the children’s behavioral reactions to the four events based on field observations from the Palms, for the two corresponding time periods. A time x reaction interaction was revealed, F(2, 70) = 4.51, p < .03, with aggressive reactions increasing, F(1, 35) = 14.28, p < .002. As with the interview data, the behavioral reactions yielded by field observations did not show a time x reaction x vignette interaction, F(6, 210) < 1.

Figures 7 and 8 appear to reveal similarities in the patterns between the children’s verbally generated responses to hypothetical situations and their recorded behavioral responses to actual events. To assess whether the patterns were in fact similar at the time of the first interview administration, the means depicted in Figure 7A were correlated with the means depicted in Figure 8A, yielding a high correlation, r = .84, p < .002. The analyses for the second administration (Figures 7B and 8B) revealed a stronger relationship, r = .906, p < .001.

Event Rates

The means of children’s event ratings reported in the interviews did not change over time, F(3, 114) < 1. Corresponding analyses of event data from adults’ field observations at the two corresponding time points also yielded results that were not significant, F(3, 114) < 1.

Discussion

This research supports past work in clarifying how complex behavioral changes emerge when multiple, context-sensitive measures are used (Choukas-Bradley et al.,

2008; Metcalfe, 2007; Wright & Zakriski, in press), and reinforces questions about the field’s overemphasis on measures of treatment outcomes that focus on the reduction of symptoms (Mash & Hunsley, 2005) and that neglect to measure changes in adaptive functioning (Kazdin, 2003). Four main findings emerged.

First, counselors’ and children’s overall impressions of change indicated improvement in all areas of functioning. These findings are consistent with Metcalfe’s (2007) and Choukas-Bradley et al.’s (2008) in showing that residential counselors believe children have changed during treatment, and my findings add to our understanding of change by demonstrating that the children themselves also believe such improvement has occurred.

Second, adults’ and children’s impressions of change conflicted with multiple administrations of the TRF, which showed increases in the Externalizing and Total Problem Behavior broadband scales and no changes in other scales. These findings agree with past work showing that overall impressions of change yielded higher ratings of improvement than pre-post difference scores from standardized syndromal assessments (e.g., Connor et al., 2002; Metcalfe, 2007). Such differences may be due in part to the syndromal assessments’ focus on problem behaviors. Recall that the CGI asks raters for assessments of change in adaptive behaviors as well as problem behaviors. Mean ratings were higher for prosocial behavior than for aggression and withdrawal in both the current study and in Metcalfe’s (2007) and Choukas-Bradley et al.’s (2008). Specific CGI mean ratings were not provided in the Connor et al. (2002) article.

Third, extensive field observations of children’s overall behaviors, social environments, and changes in context-specific reactions revealed complex patterns of behavioral changes. Overall behavior rates showed strong improvement in prosocial

behaviors and less improvement in problem behaviors, with robust increases in prosocial behavior, more modest decreases in withdrawal, and no linear change in aggression. Children’s reaction rates further clarify change processes by revealing increases in adaptive behaviors in response to positive events, and increases in problem behaviors in response to aversive events. Such context-specific changes illuminate how pre-post difference scores from standardized syndromal assessments can obscure complicated change patterns, and may help explain adults’ and children’s reports of improvement in their global impressions of change.

Fourth, although children reported improvement in their overall self-assessments of change, their other responses did not change from the first to the second interview administration. Little if any change was found for their overall behavior ratings (with the exception of their mean overall behavior rate of withdrawal, which decreased) or their reported reactions to events described in the vignettes. Thus, the children’s interviews further highlight the complicated nature of change and the difficulties in measuring it, and reinforce the importance of using multiple forms of measurement that are sensitive to contextual specificity, that consider changes in reactions to positive as well as negative events, and that do not rely solely on retrospective ratings.

Implications of contextual specificity for “iatrogenesis”

My findings demonstrate how different instruments can reveal either overall “improvement” or “worsening” in the same group of children. Though many different kinds of discrepancies were revealed, the current section focuses on the lack of agreement in counselors’ ratings of aggression change. We cannot attribute these conflicting findings to the assessments’ having been completed by different raters who interacted

with children in different environments, which was the case in the Connor et al. (2002) study. In the current study, the TRF and CGI forms and the extensive Palm coding of children’s behaviors were all completed by residential counselors. The counselors’ impressions of change suggested that children improved in their aggression and in all other areas of functioning. In contrast, according to the TRF, the children worsened during treatment: They did not change for some problem behaviors and they increased in their Externalizing and Total Problem Behavior broadband scales. Furthermore, data from field observations revealed that the overall frequency of children’s aggression did not change. Children themselves claimed that they improved in their aggression in their global impressions of change, but no change in aggression was revealed in their other interview responses. Without further information about changes in children’s context-specific reactions, I might have reviewed my data and concluded that children had shown no improvement in their overall aggression, in contrast to children’s and adults’ impressions of change. However, the fine-grained field observations of children’s reactions revealed that children’s aggression should not be measured only as one aggregated behavior, because it increased in response to some events and decreased in response to others.

These intricate patterns of context-specific improvement and worsening may help answer questions raised in the field of child clinical psychology about “iatrogenesis,” or worsening through treatment. Although the evidence is mixed, some studies focusing on problem behaviors have found that children appear to worsen in response to group treatment programs (e.g., Dishion, McCord, & Poulin, 1999). Such “iatrogenic effects” have been shown to occur when children increase in their aggression, presumably

because of “peer deviancy training,” in which aggression and other deviant behaviors (such as tobacco use) are rewarded by peers (Dishion & Dodge, 2005). Viewed in isolation, these findings are troubling. However, it may be helpful to view aggression as one piece of a behavioral change mosaic, in which children can improve in some areas and worsen in others (Wright & Zakriski, in press). Such contextual “unpacking” may help to explain why some studies found worsening in aggressive behavior (e.g., Dishion et al., 1999) while others did not (see Weiss, Caron, Ball, Tapp, Johnson, & Weisz, 2005); changes in children’s aggression may be contextually narrow and thus undetectable by certain assessment methods. Studies that found worsening in overall rates of aggression may have failed to detect contextually specific improvements. Conversely, studies that did not find overall worsening may have failed to recognize that aggressive reactions to certain events increased over time. The findings of Wright and Zakriski (in press) further emphasize the need for assessments that focus on context, because the overall behavior ratings from field data showed that aggression decreased during residential treatment, but that aggressive reactions to aversive events nevertheless increased. Those contextually narrow but clinically important changes made by children are aggregated and filtered out by standardized syndromal measures, but counselors seem to internalize them and form global impressions accordingly (Wright et al., 2001), possibly weighting the improvements more heavily.

The importance of prosocial behavior

My findings highlight the need for assessment measures that consider adaptive functioning as an important indicator of treatment outcomes. Studies that emphasize the

presence of iatrogenic effects may not have recognized context-specific improvements in aggression, and may also have neglected to measure improvements in prosociality. In support of Kazdin’s (2003) assertion that a decrease in problem behaviors is not necessarily a better predictor of long-term adjustment than improvements in prosocial functioning, the current study suggests that counselors may be especially sensitive to increases in prosocial behavior when forming global impressions of change. My analyses of field observations reveal robust changes in children’s overall prosocial functioning and in their prosocial reactions to positive events. Results of the current study are in agreement with Metcalfe (2007), who found robust changes in overall prosocial behavior, including in the group of children who were “least improved” according to the CGI. My results also converge with Wright and Zakriski (in press), who found increases in overall prosocial behavior and in prosocial reactions to adult instruct and adult prosocial talk. Overall prosocial behaviors, prosocial reactions to events, and various responses to positive events appear to be important areas in the study of change, and they are all areas that cannot be understood using syndromal methods that emphasize “symptoms” and ignore contextual influences.

Implications of children’s interview responses

Field observations revealed that children showed greater improvement in response to positive events than aversive ones. This may help explain the lack of change I found in children’s responses to the vignettes. It should be noted that adults’ field observations revealed that all forms of children’s reactions improved in response to positive events: prosocial behavior increased over the course of the summer, and withdrawal and aggression decreased. The opposite pattern was found for aversive events: prosocial

reactions decreased, aggression increased, and withdrawal decreased to adult warn/discipline but did not change to peer argue/tease. Note that my interview protocol included four of the six event categories in the form of vignettes, and that the two events I excluded were the two positive events. I made this decision on the basis of previous research showing that children’s reactions to positive events yielded less variability than their reactions to neutral and aversive events (Wright & Zakriski, 2001, 2003), and I believed that such effects might be exaggerated in children’s responses to interview vignettes, due to possible social desirability biases. However, my analyses of field data revealed a rich variety in responses to positive events. Including these events in the interview vignettes might have led to interesting interview responses, and potentially to more hypothetical prosocial reactions and fewer aggressive and withdrawn reactions in the second interview administration. Another important point to note is that, although the linear regressions from Palm data revealed changes in reactions to events, no changes were revealed when we analyzed reactions to the aversive and neutral events based on average rates from the two time periods corresponding to the interviews. This finding suggests that children’s responses should not necessarily have been surprising.

Children’s ratings of their overall behaviors and social event frequencies revealed further methodological limitations. The lack of change in children’s social event ratings between the first and second interview administrations was unsurprising given that adults’ field observations also did not reveal changes in event ratings over the course of the summer. More surprising was the absence of change in children’s ratings of overall prosocial behavior. However, two factors must be kept in mind.
Firstly, when we examine adults’ ratings of children’s overall behaviors as a function of two time periods equal to those we asked the children to consider in their responses, no overall behavioral changes were revealed. Secondly, children were interviewed beginning on the 36th day of treatment, six days prior to the end of the summer session. When I performed linear regression analyses of their behaviors for the first 36 days of treatment, no changes were significant. In sum, little if any change was revealed through children’s ratings of their overall behavior and social event frequencies or through their responses to vignettes, but when adults’ field observations were viewed as means over blocked periods of time, or in terms of linear change for the appropriate time period, the apparent discrepancies between adults’ and children’s perceptions were reduced.

What remains unclear is why children’s global impressions of change showed improvement. The children formed these impressions at a time when the field observations, the TRF pre-post difference scores, and the children’s own ratings all suggested that their behaviors had not improved over the course of the summer. The fact that the children still believed they had improved may suggest that they were not capable of accurately assessing their behavioral changes. The social information processing models suggest that aggressive children interpret their surroundings in skewed ways (Coy et al., 2001; Orobio de Castro et al., 2005), and it is possible that clinically aggressive children are unable to accurately assess their behaviors, let alone any changes in those behaviors. Indeed, even carefully trained clinicians have trouble agreeing about which children have changed (Connor et al., 2002), and research suggests that children’s reports generally have low correlations with adults’ reports (Achenbach et al., 1987).

What we should not do is conclude that children are poor judges who should not be used in future studies of behavioral change. As Kazdin (2003) points out, the way

children interpret their surroundings can have a significant impact on their behaviors, and the children’s perceptions of their behavioral changes could predict their post-treatment adjustment more strongly than the actual changes measured by adults. Future studies should compare children’s perceptions of their own improvement to parents’ and teachers’ ratings in the months following treatment.

Retrospective ratings of symptomatology

Taken together, the findings from adults’ and children’s various forms of assessment highlight the need for measurements that do not depend solely on retrospective ratings; the unpacking of children’s complex changes in reactions may require the collection of fine-grained field observations. Two possible explanations for the difficulty of recognizing change through syndromal assessments are the lack of attention to prosociality and an inability to encode contextual specificity. A third possible explanation is that such measures depend on retrospective analysis and the conceptualization of long, yet specific, time periods. It is important to recognize that the children’s responses to questions about their overall behavior rates and social event rates required that they conceptualize their behavior over a specified time period, a task that may have been challenging for children as well as adults.

In the current research and in the Metcalfe (2007) and Choukas-Bradley et al. (2008) studies, counselors rated children with the TRF ten days into the treatment program, again two weeks after that, and a third time two weeks after that. If raters confused the time periods (for example, by remembering an event from three weeks ago as having occurred two weeks ago, a mistake that seems easy to make), the difference scores would not accurately reflect behavioral changes. Additionally, assessing behavior every two weeks leaves room for a

significant amount of information to be forgotten. Moreover, the standard period of time for TRF ratings is two months (Achenbach & Rescorla, 2001), and Achenbach, Dumenci, and Rescorla (2002) report that parents are sometimes asked to consider the past six months in filling out the CBCL. Thus, errors in TRF and CBCL ratings of change may be more pronounced in most studies than in the current work.

An advantage of the extensive field observation approach is that it tracks behaviors continuously, so that raters are asked to recall information only over the past hour. Such data fulfill the standard called for by trait psychologists who wish to see behaviors recorded over multiple contexts, and yet they preserve information about the social environments in which children’s behavioral changes are embedded. At the end of the summer, global impressions of change and standardized syndromal measures can be compared to numerous observations of behaviors, events, and reactions over all days from the summer.

It must be noted that global change impression ratings such as the CGI share this one limitation of syndromal measures, in that they ask raters to retrospectively assess behaviors over lengthy time periods. However, the CGI does not share the other two limitations of the TRF in assessing behavioral change: As the Wright et al. (2001) study suggests, raters are capable of internalizing information about changes in children’s environments and context-specific reactions, and the CGI asks raters to assess the degree of improvement in prosocial, adaptive functioning, as well as in problem behaviors. Moreover, it seems that the CGI’s prompts for impressions of change allow informants to consider all the information they know about a child, including contextual information about changes in response to specific events, when forming their judgments, whereas the TRF focuses raters’ attention on the task of aggregating over situational information to

form judgments about act frequencies. The CGI seems to be an especially useful assessment tool when used in conjunction with fine-grained field observations. Information from these hourly observations can help us to disentangle counselors’ posttreatment impressions of change, and the CGI can in turn allay the concern that field observations alone may be too atomistic (Wright & Zakriski, in press).

Shortcomings and Future Directions

The current research furthers our understanding of how complex children’s behavioral changes can be, and how difficult it can be to measure them. It does not reveal how we can predict which children will transfer their treatment gains to their home and school settings. Parent CGIs as well as parent and teacher reports (CBCLs and TRFs) were collected during the autumn following treatment, but such data were not yet available at the time of this thesis. We need to understand whether long-term adjustment can be predicted by TRF difference scores, CGI impressions, field observations, and/or children’s interview responses. Thus, future research should examine the relationship between various forms of summer data and assessments completed by parents, teachers, and the children themselves in the fall. Of course, challenges in methodology will make change assessment difficult on the other end of treatment as well.

A limitation of the current study was its use of a small sample of children for the interview portion of the data. Ideally, the full sample of children would have been interviewed, but the special nature of the setting and population made this impossible for the current year of study. With more interviewers, it might be possible in future years to interview a larger sample of children within two or three days of the TRF administrations. However, a greater number of interviewers would exacerbate a problem

with the current study: ideally, all children would be interviewed by the same pair of adults. It may never be possible to conduct interviews with a large sample of children using the same two adults within a limited period of time. However, a simpler improvement to the current study design would be to ensure an equal number of older and younger children. Though I was careful to include an equal number of boys and girls in the original sample, the sample included more older children than younger children.

Future studies should also examine whether children’s responses to positive interview vignettes change from the first to the second interview administration in ways that reflect the changes revealed in adults’ field observations. Future work should also attempt to use multiple raters to code children’s interview responses, in order to establish coding reliability. Finally, future research should analyze the full range of data provided in the children’s vignette responses, including the evoked emotions, the perceived intent of the interactant, and the imagined response of the interactant to the interviewee’s behavior.

Conclusions

My findings highlight the difficulty of measuring and explaining children’s behavioral changes during residential treatment, and underscore the importance of assessing change with multiple instruments, including context-sensitive assessments that measure adaptive behaviors and do not require retrospective analysis. Connor et al. (2002) called for a “best estimate” approach in the assessment of children’s improvement, which combines all available information from different forms of assessment. I argue that the “best estimate” of change must include context-sensitive measurements, because important changes in narrow but clinically informative areas may help us understand which children change, why and how they change, and whether those

changes will be transferred to post-treatment home and school settings. The difficulty of measuring change simultaneously calls for a highly sensitive contextual assessment system, and reinforces the tendency of the field to depend on syndromal measurements that are claimed to be “reliable” and “valid.” Such assessments are easy to access and administer, they are cost-effective, and they have been widely used in the field. In contrast, the use of context-sensitive field observations has not been extensively tested and is costly and labor-intensive. However, if we wish to help emotionally and behaviorally disturbed youth learn how to respond appropriately to their surroundings, we must understand which environments pose the greatest challenges, and which types of reactions improve during residential treatment. If we believe that treatment helps children, then we cannot continue to rely on measurements that aggregate over important information about the changes that are occurring, and that often lead to the interpretation that children are worsening in response to our attempts to help.
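The concern about measurements that aggregate over context can be illustrated with a minimal sketch. It is not a reanalysis of the thesis data: every number below is invented, the helper `ols_slope` is a hypothetical name, and the overall rate is taken as a simple mean across two event contexts, which assumes the two events occur equally often. Under those assumptions, aggressive reactions fall in response to one event and rise in response to another, while the slope of the aggregated rate stays at approximately zero.

```python
# Hypothetical illustration only: all rates are invented, not thesis data.
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope (b) of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


weeks = [1, 2, 3, 4, 5, 6]

# Invented proportions of events that were followed by an aggressive reaction.
aggression_by_event = {
    "peer prosocial talk": [0.30, 0.26, 0.22, 0.18, 0.14, 0.10],    # decreases
    "adult warn/discipline": [0.20, 0.24, 0.28, 0.32, 0.36, 0.40],  # increases
}

# Aggregate over contexts: simple mean per week (assumes equal event frequencies).
overall = [mean(rates) for rates in zip(*aggression_by_event.values())]

slopes = {event: ols_slope(weeks, rates)
          for event, rates in aggression_by_event.items()}
overall_slope = ols_slope(weeks, overall)

for event, b in slopes.items():
    print(f"{event}: b = {b:.3f}")
print(f"overall aggression: b = {overall_slope:.3f}")
```

In this constructed case the two context-specific slopes are equal in size and opposite in sign, so they cancel in the aggregate. Real data would rarely cancel so exactly, and unequal event frequencies would weight the contexts differently.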

References

Achenbach, T. M., Dumenci, L., & Rescorla, L. A. (2002). Ten-year comparisons of problems and competencies for national samples of youth: Self, parent, and teacher reports. Journal of Emotional and Behavioral Disorders, 10, 194-203.

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA school-age forms & profiles. Burlington, VT: University of Vermont.

Block, J. (1989). Critique of the act frequency approach to personality. Journal of Personality and Social Psychology, 56, 234-245.

Buss, D., & Craik, K. H. (1983). The act frequency approach to personality. Psychological Review, 90, 105-126.

Cervone, D., Shadel, W. G., & Jencius, S. (2001). Social-cognitive theory of personality assessment. Personality and Social Psychology Review, 5, 33-50.

Choukas-Bradley, S. C., Banducci, A. N., Metcalfe, L. M., Wright, J. C., & Zakriski, A. L. (2008). Reassessing the assessment of change: Disentangling the social interactional processes that mediate behavior change in at-risk youth. Poster presented at the Eastern Psychological Association’s 2008 Conference.

Conners, C. K., Sitarenios, G., Parker, J. D. A., & Epstein, J. M. (1998). Revision and restandardization of the Conners Teacher Rating Scale (CTRS-R): Factor structure, reliability, and criterion validity. Journal of Abnormal Child Psychology, 26, 279-291.

Conners, C. K., Sitarenios, G., Parker, J. D. A., & Epstein, J. M. (1998). The revised Conners’ Parent Rating Scale (CPRS-R): Factor structure, reliability, and criterion validity. Journal of Abnormal Child Psychology, 26, 257-268.

Connor, D. F., Miller, K. P., Cunningham, J. A., & Melloni, R. H. (2002). What does getting better mean? Child improvement and measures of outcome in residential treatment. American Journal of Orthopsychiatry, 72, 110-117.

Costa, P. T., & McCrae, R. R. (1991). Your NEO Summary. Psychological Assessment Resources, Inc.

Coy, K., Speltz, M. L., DeKlyen, M., & Jones, K. (2001). Social-cognitive processes in preschool boys with and without oppositional defiant disorder. Journal of Abnormal Child Psychology, 29, 107-119.

Crick, N. R., & Dodge, K. A. (1994). A review and reformulation of social information processing mechanisms in children’s social adjustment. Psychological Bulletin, 115, 74-101.

Dishion, T. J., McCord, J., & Poulin, F. (1999). When interventions harm: Peer groups and problem behavior. American Psychologist, 54, 755-764.

Dodge, K. A., Lansford, J. E., Burks, V. S., Bates, J. E., Pettit, G. S., Fontaine, R., & Price, J. M. (2003). Peer rejection and social information-processing factors in the development of aggressive behavior problems in children. Child Development, 74, 374-393.

Dunn, S. E., Lochman, J. E., & Colder, C. R. (1997). Social problem-solving skills in boys with conduct and oppositional defiant disorders. Aggressive Behavior, 23, 457-469.

Eddy, J. M., Dishion, T. J., & Stoolmiller, M. (1998). The analysis of intervention change in children and families: Methodological and conceptual issues embedded in intervention studies. Journal of Abnormal Child Psychology, 26, 53-69.

Guy, W. (1976). Clinical global impressions. ECDEU Assessment Manual for Psychopharmacology, revised. Rockville, MD: National Institutes of Health.

Hoagwood, K., Jensen, P. S., Petti, T., & Burns, B. (1996). Outcomes of mental health care for children and adolescents: I. A comprehensive conceptual model. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 1055-1063.

Kazdin, A. E. (1991). Effectiveness of psychotherapy with children and adolescents. Journal of Consulting and Clinical Psychology, 59, 785-798.

Kazdin, A. E. (2003). Psychotherapy for children and adolescents. Annual Review of Psychology, 54, 253-276.

Mash, E. J., & Hunsley, J. (2005). Evidence-based assessment of child and adolescent disorders: Issues and challenges. Journal of Clinical Child and Adolescent Psychology, 34, 362-379.

McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52, 81-90.

Metcalfe, L. A. (2007). Contextual versus syndromal assessment of child behavior change: Disentangling components of change impression. Unpublished Honors Thesis for the Department of Psychology at Brown University.

Mischel, W., & Shoda, Y. (1995). A cognitive-affective system theory of personality: Reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review, 102, 246-268.

Orobio de Castro, B., Merk, W., Koops, W., Veerman, J. W., & Bosch, J. D. (2005). Emotions in social information processing and their relations with reactive and proactive aggression in referred aggressive boys. Journal of Clinical Child and Adolescent Psychology, 34, 105-116.

Parker, E. H., Hubbard, J. A., Ramsden, S. R., Relyea, N., Dearing, K. F., Smithmeyer, C. M., & Schimmel, K. D. (2001). Children’s use and knowledge of display rules for anger following hypothetical vignettes versus following live peer interaction. Social Development, 10, 529-557.

Reynolds, C. R., & Kamphaus, R. W. (2002). Behavior Assessment System for Children – Second Edition. Circle Pines, MN: American Guidance Service.

Roberts, B. W., & Caspi, A. (2001). Personality development and the person-situation debate: It’s déjà vu all over again. Psychological Inquiry, 12, 104-109.

Shoda, Y., Mischel, W., & Wright, J. (1993). The role of situational demands and cognitive competence in behavior organization and personality coherence. Journal of Personality and Social Psychology, 65, 1023-1035.

Weiss, B., Caron, A., Ball, S., Tapp, J., Johnson, M., & Weisz, J. R. (2005). Iatrogenic effects of group treatment for antisocial youth. Journal of Consulting and Clinical Psychology, 73, 1036-1044.

Wright, J. C., Lindgren, K., & Zakriski, A. L. (2001). Syndromal versus contextualized personality assessment: Differentiating environmental and dispositional determinants of boys’ aggression. Journal of Personality and Social Psychology, 81, 1176-1189.

Wright, J. C., & Mischel, W. (1987). A conditional approach to dispositional constructs: The local predictability of social behavior. Journal of Personality and Social Psychology, 53, 1159-1177.

Wright, J. C., & Zakriski, A. L. (2003). When syndromal similarity obscures functional dissimilarity: Distinctive evoked environments of externalizing and mixed syndrome children. Journal of Consulting and Clinical Psychology, 71, 516-527.

Wright, J. C., & Zakriski, A. L. (in press). Reassessing the assessment of change: Conceptualizing behavior change in children as patterns of adaptation. Manuscript under review.

Zakriski, A. L., Wright, J. C., & Underwood, M. K. (2005). Gender similarities and differences in children’s social behavior: Finding personality in contextualized patterns of adaptation. Journal of Personality and Social Psychology, 88, 844-855.

Table 1. Means and repeated measures ANOVA results for TRF

Time 1    Time 2    Time 3    F¹    p²    F_linear³    p

Anxiety

62.03

61.71

61.82