CONSORTIUM FOR POLICY RESEARCH IN EDUCATION University of Pennsylvania • Harvard University • Stanford University University of Michigan • University of Wisconsin-Madison

The Framework-Based Teacher Performance Assessment Systems in Cincinnati and Washoe

Anthony Milanowski
Associate Researcher
Consortium for Policy Research in Education
University of Wisconsin-Madison
Madison, WI 53706
(608) 262-9872
[email protected]

Steven M. Kimball
Assistant Researcher
Consortium for Policy Research in Education
University of Wisconsin-Madison
(608) 265-6201
[email protected]

April 2003
CPRE-UW Working Paper Series TC-03-07

This paper was prepared for presentation at the 2003 Annual Meeting of the American Educational Research Association held in Chicago, IL, April 21, 2003. An earlier version of this paper was presented at the 2003 Annual Meeting of the American Education Finance Association, Orlando, FL, March 28, 2003. The research reported in this paper was supported in part by a grant from the U.S. Department of Education, Office of Educational Research and Improvement, National Institute on Educational Governance, Finance, Policymaking and Management, to the Consortium for Policy Research in Education (CPRE) and the Wisconsin Center for Education Research, School of Education, University of Wisconsin-Madison (Grant No. OERI-R308A60003). The opinions expressed are those of the authors and do not necessarily reflect the view of the National Institute on Educational Governance, Finance, Policymaking and Management, Office of Educational Research and Improvement, U.S. Department of Education, the institutional partners of CPRE, or the Wisconsin Center for Education Research.

Wisconsin Center for Education Research, University of Wisconsin-Madison 1025 West Johnson Street, Room 653, Madison, WI, 53706-1796 ■ Phone 608.263.4260 ■ Fax 608.263.6448

Abstract

This article provides an overview of the development and implementation of two standards-based teacher evaluation systems based on the Framework for Teaching (Danielson, 1996), in the Cincinnati Public Schools and the Washoe County (Nevada) School District. The systems were intended to improve both formative and summative evaluation and, in Cincinnati, to serve as the basis for a knowledge- and skill-based pay schedule for teachers. The article also summarizes teacher reactions to the systems. It concludes with lessons learned about standards-based teacher evaluation from our research at these two sites.


1. Standards-Based Teacher Evaluation

The evaluation of teacher performance is generally recognized as a problematic process. Common problems identified in the literature include low validity and high subjectivity, an emphasis on following procedures rather than improving performance, limited attention to the quality of teachers' instruction, low utility to teachers as a means for improving performance, and pervasive mistrust or at best apathy on the part of teachers toward evaluation. These problems have made teacher evaluation appear to be a relatively ineffective process for improving instruction and have contributed to a legacy of low evaluation expectations among teachers and administrators. Some (e.g., Peterson, 2000) have gone as far as to suggest that teacher evaluation be eliminated or radically changed to promote teacher self-development rather than be used for administrative purposes.

Another alternative that has emerged is to base the process on explicit, detailed standards for teacher performance that try to capture the content of quality instruction. Consistent with the movement for standards for students, this reform has been called standards-based teacher evaluation. Standards-based evaluation has the following characteristics:

a. a comprehensive model or description of teacher performance reflecting the current consensus on good teaching;

b. explicit standards and multiple levels of performance (rather than simply pass/fail) defined by detailed behavioral rating scales (usually called rubrics) that provide guidance to evaluators on how to rate and to teachers on what behaviors are expected of high performers;

c. more frequent observations of actual classroom practice and use of multiple lines of evidence, such as lesson plans and samples of student work, to provide a richer picture of teacher performance.

These features have the potential to address some of the problems of teacher evaluation.
Comprehensive standards reflecting current notions of quality instruction provide a goal for
practice that should be perceived as relevant and worth pursuing, and a vocabulary to discuss practice. Explicit standards and rating scales also provide more guidance on desired behaviors to teachers and evaluators, making expectations clear and reducing subjectivity in performance judgments. Specific, behaviorally-defined rating scales should also provide a better framework for feedback from the evaluator that teachers will find useful to improve their practice. Multiple levels of performance provide a goal for 'satisfactory' teachers to work toward. More frequent observation and the use of multiple lines of evidence should increase accuracy and make the results more credible to teachers. Combined, these features may help challenge low evaluation expectations among educators and help improve accountability and instructional performance.

These evaluation systems may also have the potential to improve student learning, if the standards truly capture quality instruction. Figure 1 represents potential links between standards-based teacher evaluation and improved student achievement.

Figure 1
Theory of Action Linking Standards-Based Teacher Evaluation with Improved Student Learning

[Figure 1 is a flow diagram: standards-based evaluation (a model of desired teaching performance, performance assessments, and incentives and consequences) promotes both (a) attraction and retention of high-performing teachers and improved performance of average performers, and (b) a shared conception of quality instruction, which together lead to improved instruction and, in turn, improved student achievement.]

Standards-based evaluation can help retain teachers who teach in the desired manner (through the recognition the system accords them and rewards associated with higher performance, such as pay differentiation) and help remove those who do not (through terminations based on poor performance and self-selection of poorer performers out of the teaching force). Over time, the average skill level of a faculty should increase, improving the average quality of instruction. The standards and performance levels also provide guidance for practice improvement of average-performing teachers, by making District expectations clear, by identifying areas where performance can be improved, and by motivating teachers to try the behaviors described by the higher levels of the rubrics.

Standards-based teacher evaluation can also contribute to a shared conception of quality instruction by providing a detailed description of good practice. To the extent that staffing, evaluation, training and development, and perhaps even compensation programs are aligned with this model, the organization communicates a normative vision of quality instruction. Such a shared conception supports teachers' skill seeking and efforts to improve practice. Of course, a major assumption underlying the theory is that the model of teaching on which the standards are based is efficacious in helping students learn. If the behaviors in the model do not constitute good instruction, then implementing standards-based teacher evaluation is unlikely to lead to improved student achievement.

In 1999, CPRE began to research standards-based teacher evaluation systems. This paper describes the systems at two of our sites, the Cincinnati Public Schools and the Washoe County (Nevada) School District. After a brief overview of the origin of these evaluation systems, the paper describes their operation, then summarizes our research on teacher reactions to them. It concludes with some lessons we believe we have learned about how to implement standards-based evaluation systems.


2. The Origin of the Systems

District Backgrounds

Cincinnati Public Schools (CPS) is a large urban district with 48,000 students and 3,000 teachers in more than 70 schools and programs. Like most urban districts, it has low levels of student achievement relative to suburban districts, and a high proportion of students eligible for free or reduced-price lunch. CPS has also had a history of school reform activity, including the introduction of new whole-school designs (e.g., Success for All), school-based budgeting, and the use of team structures to run schools and deliver instruction. The union-management relationship has generally been positive, and close working relationships have been developed between key administrators and union leaders. Teachers are generally paid more than in surrounding districts, giving the district the potential to attract better teachers. Like some other urban districts, CPS sees a substantial percentage of eligible students attend private schools. This is of concern to both District administrators and teachers union leaders, because loss of students reduces the funding available under the state school finance system.

The Washoe County School District (WCSD) encompasses the cities of Reno and Sparks, Nevada, and their outlying areas. The district is the second largest in the state, with over 58,000 students and 84 schools. Seven of the elementary schools are on a multi-track, year-round schedule and eight operate a year-round schedule with a single track. Thirty-eight percent of the students are non-white, with Hispanic students representing the largest non-majority group (25%). There are over 3,700 certified staff and about 270 administrators working in the district. To keep pace with a growing student population and teacher turnover, the district has hired an average of 400 teachers annually. A persistent shortage of qualified teachers has existed due to a
lack of teacher candidates graduating from the state's universities and competition for teachers from Clark County (Las Vegas) and California (Sawyer, 2001). The relationship between the district and the teachers' association has been positive, with collaboration on the teacher evaluation system among other instructional initiatives. A literacy initiative was recently implemented, and two additional days dedicated to professional development were added to the teacher contract.

Impetus for Changes in Teacher Evaluation in the Two Districts

Cincinnati. There were several reasons why CPS developed and implemented a new teacher evaluation system. First, both the District and the local Federation of Teachers were dissatisfied with the old evaluation system. It was perceived as cumbersome, its language as outdated, and its emphasis on instruction as relatively limited. Over the years, changes in the evaluation process made via the collective bargaining agreement, and the establishment of a peer assistance and evaluation program (which appears to have made minimal if any use of the dimensions and elements of the old system), reduced the importance of the performance dimensions and the goal-setting process. It was felt that most teachers and evaluators no longer understood how the system was supposed to operate. In addition, the single annual observation used to assess most teachers was little more than a spot check to ensure minimally acceptable performance, providing neither real performance accountability nor assistance to teachers in improving performance.

As a result of these concerns, the District and the Federation agreed in 1997 to develop a new teacher evaluation system. A District-Federation committee was set up and charged with recommending a new system. At the same time, the state changed the teacher licensing system to require periodic relicensing based on teachers' completion of a professional development plan approved by a District professional development committee. This committee needed to consider
the relationship of professional development to teacher performance in order to develop policies for approving teacher development plans. A third impetus for change was the District's strategic plan, which set ambitious goals for improving student achievement on state proficiency tests. To support achievement of these goals, the Federation and District agreed to explore changing the pay system to reward development of the knowledge and skills needed to improve instruction and in turn student achievement. A third Federation-District committee was set up to develop this 'knowledge and skill-based' pay system. These three committees were merged in 1998 because all were concerned with the central question of defining good teaching. The resulting larger steering committee, consisting of teachers, school principals, and central office administrators, developed both a new evaluation system and a proposed knowledge and skill-based pay system. A more detailed description of the design process can be found in a paper by Kellor & Odden (2000).

Washoe County. In 1997, the Washoe County School District and the Washoe County Education Association took opposing positions on state legislation intended to set uniform requirements for teacher evaluation. Although the district and teachers' association were in opposition on the merits of the state proposal (which failed that year), both agreed that current evaluation practices in the district were deficient. Despite fairly detailed procedures spelled out under the prior district evaluation system, there was a belief among district and association officials that evaluations were not facilitating teacher growth or strengthening instructional accountability. First, it was believed that teachers were not empowered by evaluation or provided opportunities for input into the evaluation process (Sawyer, 2001). Communication was seen as one-way and top-down. As a result, evaluation was not viewed as encouraging teachers
to reflect on their teaching or focus on self-improvement. Second, evaluation quality varied greatly depending on who was conducting the evaluation. According to an association official, because the prior evaluation instrument was overly vague and inconsistently used, the district was losing challenges in dismissal and disciplinary proceedings. Another perceived deficiency of the prior system was that it was undifferentiated for novice and experienced teachers. Veteran teachers did not feel evaluations provided meaningful feedback or encouraged them to attempt new challenges, and new teachers were often left without structured feedback and support before adverse decisions were made relating to contract renewal. Further, there was no differentiation based on levels of performance; marginally performing and strongly performing teachers were both given satisfactory ratings, which did not allow teachers to clearly gauge steps of improved performance. Both the district and the association realized that changes needed to occur.

Later in 1997, the district and association began a process to change teacher evaluation in Washoe County. Working through a task force made up of members of the teachers' association, principals' association, district office, and school board, the parties were able to collaboratively restructure their system for evaluating teachers. Following the review of a district survey assessing teacher and administrator perceptions of the prior evaluation system, the task force decided to focus the new system on teacher growth, to more clearly specify district performance expectations, and to improve the system's usefulness for teachers and administrators.
In addition, the task force wanted to create a process that would provide a common set of expectations through performance standards; foster teachers' self-reflection, self-modification, and self-renewal; encourage professional conversations among teachers and administrators; and be rewarding to all educators involved. With the feedback of administrators and teachers in hand and agreement on
outcomes to be achieved through a new evaluation system, the task force researched evaluation design alternatives and settled on an adaptation of the Framework for Teaching (Danielson, 1996). After one year of planning and two years of field-testing, the district launched a new standards-based evaluation system in 2000. Additional details on the design process can be found in Kimball (2001).

3. How the Systems Operate

Cincinnati's Teacher Evaluation System

The Cincinnati system was based on a set of teaching standards derived from the Framework for Teaching (Danielson, 1996). Sixteen performance standards are grouped into four domains: planning and preparation, creating an environment for learning, teaching for learning, and professionalism. (See Appendix Table 1.) Teachers' performance on each standard is rated at one of four levels (unsatisfactory, basic, proficient, and distinguished), each defined by a behaviorally-anchored rating scale that includes a description of performance at each level. These scales, called 'rubrics', were intended to guide evaluator judgment and provide a basis for helping teachers understand how to improve their practice.

The new system was divided into two processes: a 'comprehensive evaluation', to be used in a teacher's first and third years and every five years thereafter, and an 'annual assessment' for use in all other years. The more intensive comprehensive process required evaluation of teachers' performance on all 16 standards. Evidence used to evaluate teachers under this process came from six classroom observations and a portfolio prepared by the teacher. The portfolio was to include artifacts such as sample lesson and unit plans, attendance records, student work, family contact
logs, and documentation of professional development activities. After the completion of six observations and a review of the teacher's portfolio, a final summative evaluation was made using the rubrics for the appropriate standards, and a domain score was established based on the rubric scores on the standards within each domain. For beginning teachers (those evaluated in their first or third years), the consequence of a poor evaluation could be nonrenewal. For tenured teachers, the consequences of the evaluation included determining eligibility for the continuing contract and, if the evaluation was poor, placement in the peer assistance program. And as will be discussed below, the results of the comprehensive evaluation were intended to influence teachers' salaries under the new pay schedule beginning in 2002-2003.

The annual assessment, required for most other teachers, was to be used as both a formative tool for teacher professional development and as an evaluation for accountability purposes. During the annual assessment, the teacher selected two standards at the beginning of the year (one of which had to be in the teaching for learning domain) on which to be evaluated. The evaluator was expected to conduct at least one classroom observation, and the teacher to develop a portfolio documenting performance with respect to the chosen standards, if appropriate. A summary conference was conducted and a final summative evaluation was made on the two standards chosen by the teacher. This process was also intended to give teachers the opportunity to work on standards on which they might need to improve before their next comprehensive evaluation and to choose developmental activities consistent with the individual professional development plans the state required for re-licensure.
The new pay system, which was intended to be based on the results of the comprehensive evaluation, was narrowly approved by members of the Cincinnati Federation of Teachers in September 2000, to be effective in the fall of the 2002-03 school year. Thus by 2002-2003, the
evaluation system could have had relatively high stakes for many of the District's teachers. The pay system defined five career levels for teachers, with pay differentiated by level. As Table 1 shows, a new, inexperienced teacher would enter at the 'Apprentice' level, then move to the 'Novice' level by passing a state licensing assessment and achieving ratings of '2' ('basic') on all four domains.

Table 1
Cincinnati Public Schools Performance-Based Pay Schedule

Teacher Level | Performance Requirements | Pay
Accomplished | Ratings of 4 in all 4 performance domains | $60,000 + 2 steps to $62,500 maximum
Advanced | Rating of 4 in Domain 3 and in one other domain; ratings of 3 in the other 2 domains | $52,500 + 3 steps to $55,000 maximum
Career (no maximum years) | Ratings of 3 in all 4 domains | $38,750 + 4 steps to $49,250 maximum
Novice (up to 5 years) | Ratings of 2 in all 4 domains; has passed state licensing test | $32,000 + 3 steps to $35,750 maximum
Apprentice | Entry level for inexperienced teachers (BA/BS degree, temporary state license); teachers can remain at this level for a maximum of 2 years | $30,000

A teacher could accomplish this in her/his first year, or have another chance in the second. If these ratings were not achieved in the second year, the teacher would be terminated. From the 'Novice' level, a teacher could move to the 'Career' level if ratings of '3' ('proficient') in all domains
were achieved in the comprehensive evaluation at the end of her/his third year. If a teacher did not achieve these ratings, s/he could remain at the Novice level for up to five years. At the end of that period, the teacher would receive another comprehensive evaluation, and if the required scores were not achieved, would be terminated. Once the Career level was achieved, the teacher could stay at that level for her/his entire career, as long as s/he continued to achieve ratings of '3' on all domains on the comprehensive evaluation undergone every five years. Remaining at the Career level limited pay progression, however, since after 4 yearly pay steps a salary maximum was reached. Teachers could also move to the 'Advanced' or 'Accomplished' levels based on the results of a comprehensive evaluation by achieving the ratings shown in Table 1, resulting in a pay increase and eligibility for additional annual pay increases as shown.

Unlike the traditional seniority-based teacher pay schedule, under this system teachers' salaries at the Career level or above could be reduced if the required performance ratings were not achieved in subsequent comprehensive evaluations. For example, if an Accomplished teacher did not achieve ratings of 4 on all domains, s/he would receive a comprehensive evaluation again the next year, providing the opportunity to improve her/his scores. Failure to do so, however, would lead to movement down to the level associated with the domain scores obtained. As will be discussed below, this feature of the system also affected some teachers' reactions to the evaluation system. Though the most senior teachers (those with 22 years or more of service) were to be exempted from the new pay and evaluation systems, the new system held the potential for a pay cut for some senior teachers.
Those near the maximum of the old seniority-based pay schedule would have to earn ratings of 4 on all domains in order to get a salary on the new system equivalent to or higher than their current salary under the old.


Development of the System. Before implementing the new evaluation system in all schools, the District and Federation decided to field test it in ten schools during the 1999-2000 school year. Based on the results of the field test, several major changes were made in the evaluation system. The most important change was the hiring and training of eight specialists, called Teacher Evaluators, to participate in the comprehensive evaluation process. The Teacher Evaluators were hired from the ranks of the teaching force and released from classroom teaching for three years. They were in effect 'peer' evaluators, since they remained members of the Federation during this assignment. Each Teacher Evaluator was assigned a group of teachers to evaluate, based on the match between the evaluator's and teacher's subject matter expertise and/or grade level. The Teacher Evaluator was to do four of the six classroom observations for the comprehensive evaluation, while administrators (principals or assistant principals) did the other two. Using the results of their observations and summaries of those of the administrators, the Teacher Evaluators assigned the rubric scores for Domains 2 and 3. Administrators assigned scores for Domains 1 and 4, based primarily on the teacher portfolios. The Teacher Evaluators were intended to reduce the classroom observation burden on the site administrators, add content expertise to the evaluations, and increase consistency in evaluation across schools.

To address problems of timeliness of feedback uncovered in the field test, Teacher Evaluators and administrators were required to provide written feedback on each observation within 5 days. However, they were no longer required to hold a formal post-observation conference to provide feedback to the teacher. The standards and rubrics were revised to make them easier to apply, and more extensive training for administrators and teachers was developed and provided.
Further changes were made after the first year of implementation. Teacher concerns about the Comprehensive process were an issue in the election for local union president in the
Spring of 2001, and the incumbent, who had participated in designing the evaluation system and in negotiating the new pay system, was soundly defeated by a challenger who made an issue of problems with the evaluation process. The new president asked that the steering committee overseeing the evaluation system be disbanded, and that changes be made by direct negotiation between the Federation and the administration. In order to respond to teacher stress and the need to better prepare experienced teachers for more rigorous evaluation, it was agreed that the comprehensive evaluation process would be applied only to newly hired teachers, teachers completing the probationary period, and volunteers until 2005-2006. All other teachers would come under only the annual process until that time.

In the meantime, the District committed to develop ways to better prepare senior teachers for the comprehensive evaluation. One approach was a program of study groups, in which teachers worked together to improve skills on each domain, starting with Domain 1 (Planning and Preparing for Student Learning) in 2001-2002. Another was to make the annual assessment a better preparation for the comprehensive evaluation by focusing it on one domain at a time, beginning with Domain 1. The District also improved the training on the evaluation system it made available to teachers. Teachers with between 16 and 22 years of service were exempted from the comprehensive evaluation and from the new pay system, and the pay system's effective date was moved back to 2005-06.

Other changes included the addition of a conference between the teacher and Teacher Evaluator after the first classroom observation (to provide more feedback and allow teachers to provide information about the school and classroom context), clarification of portfolio requirements and development of example portfolios for teachers to review, and revisions to the standards and rubrics to make distinctions between levels depend more on observable behavior.
To increase consistency among evaluators, the District instituted a certification process based on
the calibration training used the first year. All evaluators, including principals, were required to meet a certain standard of agreement with a set of reference or expert evaluators in rating videotaped lessons. Those who could not do so after additional training were not to be allowed to evaluate after the 2001-02 school year.

In May of 2002, teachers voted overwhelmingly against the new pay schedule. This will require the District and Federation to negotiate a new system or return to the old seniority-based pay schedule. However, the District is continuing to use the evaluation system for new teachers, teachers in their third year, and volunteers during the 2002-03 school year. Wording changes were made in the standards and rubrics, the most important of which was the deletion of one part of a standard in Domain 3 referring to depth of content knowledge, which evaluators had found hard to judge in classroom observations. The District continued to train administrators in the use of the system, and continued with the teacher study groups. Further changes in the system are expected to be a topic of collective bargaining between the District and Federation.

Washoe County's Evaluation System

Washoe County's teacher evaluation system was closely modeled on the standards and suggested procedures included in the Framework for Teaching (Danielson, 1996). There are four domains of practice, with 23 components of professional practice and 68 elements elaborating behavioral descriptions of the components. Each element includes separate behavioral descriptions on a four-level rubric: unsatisfactory, target for growth (level 1), proficient (level 2), and area of strength (level 3). An example of one element, with its rubrics, is presented in Table 2 below.


Table 2
Example of Washoe County TPES Rubric
Domain 1: Planning and Preparation; Component 1a: Demonstrating Knowledge of Content and Pedagogy

Element: Knowledge of Prerequisite Relationships
Unsatisfactory: Teacher displays little understanding of prerequisite knowledge important for student learning of the content.
Target for Growth (Level 1): Teacher indicates some awareness of prerequisite learning, although such knowledge may be incomplete or inaccurate.
Proficient (Level 2): Teacher's plans and practices reflect understanding of prerequisite relationships among topics and concepts.
Area of Strength (Level 3): Teacher actively builds on knowledge of prerequisite relationships when describing instruction or seeking causes for student misunderstanding.

In addition to the core standards, a supplemental teacher evaluation form includes four performance standards, which is a composite of elements from Domain 1 (Planning and Preparation) and Domain 3 (Instruction). The form is intended to ensure that each teacher is evaluated on their instructional practice annually and is used for those teachers who, because of their stage in the evaluation cycle, may not be evaluated on the Instruction Domain (Domain 3). The content of the supplemental evaluation form for instruction is included in Table 3. Table 3 Supplemental Teacher Performance Evaluation Form Performance Standard

Unsatisfactory

The teaching displays solid content knowledge and uses a repertoire of current pedagogical practices for the discipline being taught. Reference: components 1a, 1c, 3e The teaching is designed coherently, using a logical sequence, matching materials and resources appropriately, and using a welldefined structure for connecting the individual activities to the entire unit. Instruction links student assessment data to instructional planning and implementation. Reference: components 1f, 1e, 3f

17

Target for Growth

Proficient

Area of Strength

Performance Standard

Unsatisfactory

Target for Growth

Proficient

Area of Strength

The teaching provides for adjustments in planned lessons to match the students’ needs more specifically. The teacher is persistent in using alternative approaches and strategies for students who are not initially successful. Reference: component 3e The teaching engages students cognitively in activities and assignments, groups are productive, and strategies are congruent to instructional objectives. Reference: components 3c (source: WCSD Supplemental Teacher Performance Evaluation form) The evaluation system calls for multiple sources of evidence to demonstrate teacher performance relative to the standards. Evidence may include a teacher self-assessment, a preobservation data sheet (lesson plan), classroom observations, pre- and post-observation conferences, alternative analysis of instruction (e.g., observations at IEP meetings, parent conferences or faculty meetings), samples of teaching work and instructional artifacts, a reflection form, a three-week unit plan, a knowledge of student’s and resources form, and logs of professional activities. Evaluators have discretion in the specific sources of evidence to gather and there is no instructional portfolio requirement in the District. The combined sources of evidence are intended to provide the material on which evaluators may make formative and summative evaluation decisions and offer related performance feedback. Teachers advance through three evaluation stages: probationary, post-probationary major, and post-probationary minor. All teachers undergo one of these three stages each year. Teachers in their probationary years are those who are novice teachers or new to the district. Probationary teachers are evaluated on all four of the performance domains and must meet at least level 1 (target for growth) scores on all 68 elements. They are observed at least nine times over three periods of the year and a written evaluation is provided at the end of each period. Based on their
performance, probationary teachers may be required to undergo a second probationary year, advance to post-probationary status, or be dismissed from teaching in the district.

Teachers in post-probationary status are evaluated in a "major evaluation" covering two performance domains. They are formally observed three times over the course of the year and receive one written evaluation at the end of the year (in April). Once teachers are successfully evaluated under the major, they move to two minor evaluation years. Teachers on the post-probationary minor cycle are evaluated on one domain and are formally observed at least once during the year. Each year of the two-year minor evaluation results in one written evaluation, also issued in April. An optional evaluation process is available to teachers on the minor evaluation who have at least five years in the district and have received one successful major evaluation. These teachers may choose from six professional growth options (e.g., action research, mentoring a new teacher, or pursuit of National Board Certification) that must be linked to an evaluation domain but are less structured than the typical minor or major evaluation.

For each cycle, any evaluation standard rated unsatisfactory results in an unsatisfactory evaluation, and the teacher undergoes a structured intervention process. Teachers in this process receive an intensive evaluation that focuses on components from all domains that are not being satisfactorily met. Teachers remain in focused assistance until they reach all objectives of their focused assistance plan, at which point they move back into the regular major-minor cycle.

4. Summary of CPRE Research on Teacher Reactions

Reactions in Cincinnati

The results of CPRE research on teacher reactions have been reported previously in papers by Milanowski and Heneman (2001, 2002) and Heneman and Milanowski (2002). Data
was collected using both on-site interviews and mail surveys. Table 4 below summarizes interview results by theme.

Table 4: Cincinnati Teacher Interview Results by Theme

Standards and Rubrics: Most teachers understood the standards and rubrics and agreed that they represented good teaching. Some concerns were expressed about the applicability of the standards to subjects like art, music, and special education. A substantial minority of teachers expressed concerns that the highest level of the rubrics was not attainable by them because of student limitations.

Implementation: Teachers were aware of and troubled by implementation glitches, mid-year procedural changes, and lack of clarity in procedures and deadlines in the first field test and first year of implementation. These glitches led many to question the validity of the system and caused stress for some teachers. Few complaints about these problems were made in the second year.

Fairness: Many teachers were concerned about the fairness of the comprehensive evaluation. Concerns included evaluator expertise, subjectivity of evaluator judgments, and the representativeness of the class periods during which they were observed. Few concerns were expressed about the fairness of the annual assessment.

Usefulness/Feedback: Most teachers felt the system encouraged reflection and provided clear guidance on district expectations. Many complained that the written feedback did not help them improve their performance. The elimination of the post-observation conferences encouraged teachers to see the system as more focused on accountability than performance improvement.

Stress: Most teachers on the comprehensive evaluation felt stressed by the process. Sources of stress included the workload associated with the portfolio, observation by someone from outside their school, not knowing when the evaluator would be observing, and, in the field test and first year, uncertainties about procedures and deadlines. While few teachers undergoing the annual assessment reported feeling stressed, many felt the comprehensive process would be stressful based on conversations they had with teachers undergoing it.

Pay System: Teacher reactions were largely negative. Concerns were expressed about potential pay cuts and future pay uncertainty due to the link to performance; rumors of district quotas on the number of high ratings and the likely inability of the district to pay for many teachers at the highest career level; and putting future pay in the hands of an inherently subjective evaluator.

Survey results were largely consistent with interview information. Teachers undergoing the comprehensive process were on average neutral to slightly positive about the fairness of the process, the accuracy and fairness of their own evaluation results, the usefulness of the evaluation process to them, and their interactions with the evaluators. Average levels of satisfaction with the system were low, and reported stress was high. On average, teachers agreed that the system was more work than the results were worth.

Reactions in Washoe County

Teachers and administrators were interviewed during the first year of full implementation and again during the following year to gauge their reactions to the evaluation system. Questions were asked about views on evaluation purposes, standards, procedures, utility, and the fairness of the evaluation system. Attitude surveys were also administered during the first year of program implementation.

Purposes, standards, and procedures. There is some debate in the literature about separating formative and summative evaluation purposes. The Washoe County evaluation system encompasses both. The system does not separate formative and summative procedures by time or by evaluator, which was seen as appropriate by the vast majority of those interviewed. However, this reaction could be due in part to the low-stakes nature of the evaluation system. The system in Washoe County has potential consequences for contract renewal, but not for pay. Indeed, while the system was intended to improve instructional accountability, with the exception of non-renewal for probationary teachers, most teachers did not believe that there were substantial consequences for teachers under the system. It was seen as very difficult to remove poor-performing post-probationary teachers under both the new and old evaluation systems. Administrators also spoke of the low-stakes nature of the evaluation system.

Teachers and administrators largely agreed that the system of standards and performance descriptors was understandable. While some teachers and administrators spoke of certain
aspects of the descriptors that were ambiguous and open to interpretation, most said that the standards were generally easy to understand, either on their face or after some further reading or consultation with evaluators or peers.

Teachers also indicated support for the evaluation standards. With its four domains and 68 performance descriptors, the new system was seen as representing a comprehensive picture of teaching on which to base evaluations and structure dialog during evaluations. Teachers and evaluators mentioned that, rather than serving as a generic evaluation tool, the system spelled out specific performance expectations. Many spoke of the standards helping to clarify district expectations, guide instruction, and foster self-reflection.

While most evaluators and teachers expressed support for the evaluation standards, there were concerns that some of the elements created unrealistic expectations for student performance. At the elementary level, for example, several teachers mentioned that involving young children in planning lessons, structuring the classroom, or facilitating discussion was difficult given the developmental level of the students. These concerns existed even though the district attempted to address the issue of developmental appropriateness by inserting the word "may" into standards regarding student-initiated activities or behaviors.

Evaluators and teachers also expressed support for the structural requirements of the evaluation system, including the goal-setting session, the number of evaluation observations, and the evidence collected to support evaluation decisions. The structural requirements were seen as appropriate for the evaluation system and retained some similarities to the basic structure of the prior system (e.g., the probationary, major, and minor cycles). Given the structure, specificity, and timelines of the new evaluation system, evaluators and teachers agreed that it provided a
better idea of what the district expected of them, both in teacher evaluations and in teaching performance generally.

Utility. Teachers were asked about the utility of their evaluations with respect to evaluation dialog, the feedback provided, and the impact of the process on professional development and teaching practice. Teachers and evaluators mentioned that dialog during evaluations had improved under the new system, citing the specificity and comprehensive nature of the standards, as well as the varied sources of evidence used in evaluation decisions, as factors leading to improved dialog. Using the common standards, teachers and evaluators were able to set focused goals after teachers shared their self-evaluations and discussed weak areas. Evaluators also agreed that the system allowed for a common dialog between themselves and teachers based on the standards.

While teachers spoke of improved dialog with their evaluators, they did not believe that dialog around the evaluation standards extended to other teachers beyond evaluation interactions. Teachers commented that curriculum changes due to new student academic standards were more likely to frame teacher dialog than were the teacher performance standards of the evaluation system. Even so, though teachers may not consciously recognize it, such discussions of student standards can enter teacher evaluations if they come up during evaluation conferences or are observed by evaluators: a teacher's ability to build instruction on student standards and assessments is one dimension included in the teacher evaluation standards.

When asked about the nature of feedback and its influence on evaluation and practice, many teachers mentioned that they were mainly given affirmative comments on positive aspects of their teaching. Such feedback was perceived as nice or welcome, but perhaps
not of great influence on their performance. Other teachers were given more detailed feedback on their practice and offered specific suggestions about how they could improve. The newness of the system, combined with the time pressure to complete evaluations, may have had a negative impact on the quality of feedback provided by evaluators. However, several teachers indicated that feedback on observations led them to think about their teaching in a different way or to focus on elements of instruction they had not attended to previously. Other teachers suggested that the level of detail and depth of practice captured in the evaluation elements themselves helped them reflect on their teaching. These teachers stated that the new evaluation system helped them guide their own growth.

Professional development and teaching. The annual goal-setting session gave teachers and evaluators the opportunity to link professional development opportunities to the evaluation standards, which may not have occurred under the prior evaluation system. However, some evaluators streamlined or eliminated goal-setting sessions due to time constraints. Teachers made some changes in their professional development as a result of the new system, though such changes were certainly not universal. A number of teachers mentioned that their evaluator did not make specific professional development suggestions, while others were guided to district workshops or learned on their own about inservice activities that provided useful information about how to improve on the new teacher evaluation standards.

Similarly, the perceived impact on teaching appears to be limited. Still, there were many examples of changes in practice due to evaluation feedback or self-reflection on performance. These changes often related to improved classroom management or to curriculum planning and organization. Deep instructional change due to the evaluations was not the norm. Of those who
spoke of substantial changes to their practice, probationary teachers were more likely to cite examples than more experienced teachers.

Increased paperwork/burden. The new system resulted in greater time demands on teachers and administrators. Principals and other evaluators made adjustments to their administrative routines (and often to their personal lives) to fit in all the required evaluation steps. Both evaluators and teachers spoke of increased evaluation encounters and paperwork requirements. The increased burden appeared to have a greater impact on administrators, especially elementary principals, who were often the sole evaluators in their schools. Strategies used by some evaluators to fit in all of the evaluations included focusing primarily on those teachers who needed more intensive assistance, streamlining the amount and types of performance evidence gathered (e.g., lesson plans, examples of student work), or limiting time spent observing classes or discussing performance with teachers. This approach may have helped evaluators manage the process and complete evaluations on time, but it contradicted the system's goal of using comprehensive evidence for each teacher evaluation. Other evaluators adjusted other aspects of their work during the day or worked after school hours in order to maintain a high level of engagement in the evaluation process.

Fairness. Most teachers indicated that they perceived the evaluation system as fair. Factors influencing perceived fairness included that post-probationary teachers were able to choose goals to pursue, including the domains on which they were to be evaluated and the specific standards focused upon during their observation phases. Some evaluators also allowed teachers to indicate before observations the elements on which they would be performing. In addition, teachers were able to provide additional evidence of their performance relative to a standard if they believed evaluators
might have missed pertinent evidence. This flexibility allowed teachers to fit the system to their needs and may enhance ownership of the system.

Survey results confirmed the major themes that emerged from the interviews and demonstrated that the evaluation system and its standards were understood, largely accepted as representing good teaching, and perceived as fair. Teachers were satisfied with evaluation interactions and evaluator feedback, and felt the system was appropriately focused on teacher growth and accountability goals. However, teachers had yet to see the system substantially influence their own professional growth, teaching, student achievement, or dialog beyond evaluations.

5. Evaluator Decision Making

These evaluation systems introduce different judgment tasks for evaluators in comparison with traditional teacher evaluation, which is often based on a few walk-through observations; holistic, if not subjective, conceptions of good teaching; few if any detailed rating scales or rubrics; and little use of portfolios or teaching artifacts. The standards-based evaluation systems in these districts are similar to some of the performance assessments used in teacher licensing (notably PRAXIS III and Connecticut's portfolio system based on the INTASC standards; see Porter, Youngs, and Odden, 2001). This section discusses some of the evaluator decision making and measurement issues that we (and the districts) have encountered in the implementation of these standards-based evaluation systems.

Cincinnati

One basic measurement issue for standards-based evaluation systems that depend on the professional judgment of evaluators is inter-rater agreement. The degree to which ratings depended upon evaluator characteristics (e.g., values about teaching styles, subject expertise) rather than teacher behavior was of concern to both teachers and district administrators. Besides training
evaluators, the District attempted to promote consistency through a series of evaluator "calibration" sessions during which evaluators scored videotaped vignettes of teacher performance and compared their scores with those of expert raters. Initial results were mixed. While almost all evaluators rated within plus or minus one category of the master judges, only a minority agreed exactly more than 60% of the time. (Because accuracy depended both on evaluator judgment and on the clarity of the training tapes' representation of performance at different levels, 100% absolute agreement was not to be expected.) These results motivated the District to increase the amount of evaluator training and continue the calibration sessions. After several sessions, most evaluators were able to agree with the master raters at the level desired by the District (60% absolute agreement). In the second year of implementation, the District instituted a certification process based on the calibration training. All evaluators, including principals, were required to meet a certain standard of agreement with a set of reference or expert evaluators in rating videotaped lessons. Those who could not do so after additional training were not to be allowed to evaluate after the 2001-02 school year.

Because rating videotaped lessons differs from the judgment task of making a domain score judgment based on multiple observations under field conditions, we also estimated the level of inter-rater agreement between Teacher Evaluators and administrators on Domain 2 and 3 scores based on the classroom observations for a sample of teachers. (Recall that administrators did two of the six required observations, while Teacher Evaluators did four.) As described by Milanowski and Heneman (2002), the average percentage of absolute agreement for a sample of 54 teachers evaluated in 2000-01 was 69% for Domain 2 and 78% for Domain 3. For a sample of 45 teachers in 2001-02, the corresponding percentages were 78% and 80%. Further analyses suggested that much of the lack of agreement was due to the administrators and Teacher Evaluators observing at different times. These results suggest that agreement between Teacher Evaluators and administrators was fairly strong, certainly higher than many concerned teachers appear to have expected.

We also interviewed both principals and Teacher Evaluators about their decision-making process. In general, most evaluators told us that they did not have major problems using the rubrics to differentiate teacher behavior into different levels. It was difficult and time consuming, however, to integrate evidence from multiple observations and multiple facets of each standard to come up with a standard-level score. For Domains 2 and 3, many evaluators seemed to have based their decisions on the 'preponderance of evidence,' counting the times rubric language at different levels was observed. Several evaluators mentioned tension between their desire to be objective and a desire to be as fair as possible to a teacher, especially in light of potential negative consequences. These evaluators cited two major reasons why they felt they had to be objective, even if it meant giving a low score to a teacher: 1) fairness to other teachers being evaluated; and 2) the need to ensure District students had quality teachers. Some principals also cited the fact that the Teacher Evaluators would be reviewing their observation summaries and comparing them to what the Teacher Evaluator saw. These principals felt this prevented them from being lenient. In the samples we reviewed for the inter-rater agreement analysis, there was little evidence of leniency on the part of principals as a group, but it was clear that principals on average tended to cite less relevant evidence in their observation summaries.

The District's experience also suggests that evaluation of teacher content (subject) knowledge may require more than classroom observation. Reviewing observation summaries made it clear that many evaluators had a difficult time evaluating the level of a teacher's content knowledge.
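The agreement statistics discussed above are simple proportions, and the 'preponderance of evidence' rule is essentially a frequency count over observed rubric levels. The sketch below illustrates both; the ratings are invented for illustration, and the tie-breaking rule in the preponderance function is our assumption, not a documented district procedure.

```python
from collections import Counter

def absolute_agreement(rater_a, rater_b):
    """Proportion of teachers on whom two raters gave identical scores."""
    assert len(rater_a) == len(rater_b)
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def adjacent_agreement(rater_a, rater_b):
    """Proportion of teachers on whom scores differed by at most one rubric category."""
    assert len(rater_a) == len(rater_b)
    close = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= 1)
    return close / len(rater_a)

def preponderance_score(observed_levels):
    """Standard-level score as the most frequently observed rubric level across
    observations ('preponderance of evidence'). Ties go to the lower level,
    a conservative assumption on our part."""
    counts = Counter(observed_levels)
    level, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return level

# Hypothetical domain scores (1-4 scale) for ten teachers from two raters.
teacher_evaluator = [3, 2, 4, 3, 3, 2, 1, 4, 3, 2]
administrator     = [3, 3, 4, 3, 2, 2, 1, 3, 3, 2]

print(absolute_agreement(teacher_evaluator, administrator))  # 0.7
print(adjacent_agreement(teacher_evaluator, administrator))  # 1.0
print(preponderance_score([3, 3, 2, 3, 4]))                  # 3
```

Cincinnati's certification standard would correspond, in these terms, to requiring an evaluator's absolute agreement with the reference raters to reach at least 0.6.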
Most teachers whose summaries we reviewed received scores of ‘3’ (or proficient) on the 1-4 scale, unless clear content errors were made. It was often hard to tell from
the summary why teachers received '4's' on this standard. The District recognized this problem and, in modifying the rubrics for the 2002-03 school year, eliminated content knowledge as a distinction between the '3' and '4' levels on the relevant standard. Though some aspects of content are still evaluated using evidence from the teacher portfolio, it appears that teacher portfolios varied considerably and that many teachers did not like the extra work associated with preparing them.

Interviews with the Teacher Evaluators also suggested that theirs was a difficult job with relatively few intrinsic rewards. Teacher Evaluators do not get to provide much developmental assistance to teachers, have a demanding schedule requiring a lot of travel between schools, and are the objects of fear and resentment from some evaluatees. These issues could well lead these specialized evaluators to job "burnout." Under such circumstances, it may be quite difficult to secure enough qualified teachers to serve as Teacher Evaluators; in fact, CPS was unable to find an evaluator for middle and high school math teachers for the 2000-01 school year.

Washoe

Because the Washoe evaluation system uses a single rater (primarily the principal), no information on inter-rater agreement was available. Additionally, the district did not conduct rater-consistency exercises as in Cincinnati. However, principal and assistant principal interviews provided some information about evaluator decision processes.

Most evaluators adhered to the basic procedures outlined in the evaluation manual, which included a goal-setting process, pre-observation conversations, observations, and post-observation conferences. Evaluators also believed that the set of evaluation standards was useful in examining instruction and fostering evaluation dialog. They understood the standards, but acknowledged that some standards were somewhat vague and required evaluator interpretation.
Although most evaluators followed the basic procedures per the system design, evidence gathering and analysis varied. Several evaluators indicated that the evidence gathered depended on the domain selected (consistent with the system design), but it appeared that evidence was primarily drawn from classroom observations and related discussions. Lesson plans and student artifacts were not consistently gathered and analyzed, as called for in the evaluation manual.

The content of the written evaluations also varied considerably. Some evaluators provided detailed written commentary, with evidence cited, links to goals on specific elements, and specific recommendations for teachers. These evaluators often conducted more observations than required under the system. In contrast, other evaluators provided only two to four sentences following ratings on an entire domain, with no specific evidence cited, no link to specific elements, and few, if any, recommendations for improvement. Written evaluations were largely used to highlight positive aspects of teacher performance. The written evaluations also allowed evaluators to document contextual issues that may have had an impact on a teacher's performance (e.g., the teacher took on too many responsibilities outside the class), which could enhance perceived fairness.

As indicated in evaluation write-ups and interviews, evaluators attempted to avoid negative perceptions of their feedback. If criticism was offered, evaluators delivered it in a positive fashion or first pointed out positive aspects of performance or other characteristics of the teacher. Unsatisfactory ratings were rare. In some cases, recommendations for improvement were provided, but not consistently for the same teacher or across other teachers in the same school. In some cases, teachers receiving a level 1 (basic performance) rating were provided little elaboration of why such ratings were given or how they could improve on the element.
It should be noted that, for most teachers interviewed, more feedback was provided during discussions with the evaluator than was reflected in the written evaluations. However, there were a number of cases in which teachers received minimal or no feedback in either format (other than general performance affirmation). This was particularly the case for some teachers in the minor evaluation cycle.

As these findings suggest, the extent to which principals incorporated the evaluation system into their instructional leadership approach also varied. Some principals actively modeled their actions around the evaluation system. They indicated that the domains and standards helped them identify areas of teaching that needed improvement and provided suggestions for teachers to change. A few principals did not like the structure of the system and believed it impinged on their own approach, which tended to be more open and narrative.

Another possible reason for the variation in evaluation approaches and evidence gathering may be that the evaluation system was widely perceived by principals and teachers as low stakes. The predominant approach taken by the principals was formative. Principals tended to emphasize praise and provide "soft" criticisms. Feedback was affirmative and intended to foster reflection and growth. Indeed, the system appears to have been used by principals largely as a strategy to build professional community within schools rather than to make performance distinctions among teachers. Finally, given the increased burden the system created for evaluators, some were probably unable to collect all of the evidence required by the system or to spend much time composing the written evaluation narratives.

Consistency may represent the greatest challenge thus far in the implementation of the teacher evaluation system in Washoe County. The lack of specific portfolio requirements may have
contributed to inconsistent evidence gathering and use in evaluations. The issue of consistency has not been a primary focus of evaluator training in Washoe County thus far in program implementation. Several district respondents indicated that this was partly a result of limited resources for training, an issue that needed to be addressed in the future.

6. System Management Issues

Implementing rigorous standards-based teacher evaluation poses many challenges for district program administrators. Our research in these sites suggests that implementation is just as important as system design in determining whether the systems work as intended, and indeed, whether they survive at all. In this section, we discuss five key implementation issues that we believe districts considering rigorous standards-based evaluation should be prepared to confront. Similar issues are also explored in Kimball (2002) and Milanowski and Heneman (2001, 2002).

1. Getting teachers to fully understand the system and preparing them to succeed

Just putting the booklet describing the system in teachers' mailboxes and inviting them to a group orientation does not seem to get the message across about district intentions and what teachers should know and do in the evaluation process. If a new evaluation system is a significant departure from its predecessor, teachers need to engage with the standards early in the year and begin a process of reflection on their own teaching. In CPS, though teachers understood and accepted the standards, they did not really know what evidence evaluators were looking for until they experienced the process. In Washoe, the district relied primarily on principals to explain the standards, procedures, and expectations of the new evaluation system, supplemented by sessions held by the teachers' association and information on the district's website. But here, too, teachers have come to understand and accept the standards mostly
through experience with the system and through some related professional development training. Districts considering this type of teacher performance assessment should develop multiple methods of communication. District-level training for teachers at program initiation and periodic follow-up sessions could help resolve uncertainties and prepare teachers and administrators, as well as signify district commitment to the evaluation system. Principals or other evaluators need to engage in one-to-one communication with teachers being assessed about specific performance expectations and the evidence demonstrating they are being met.

Rigorous evaluation, especially when stakes are attached, is likely to be perceived as a threat by teachers, many of whom may not have had a serious evaluation since they got tenure. To reduce the threat, and to encourage teachers to believe that by their efforts they can meet the standards, districts need to provide more support. As suggested by McLaughlin and Pfeifer (1988), training teachers and evaluators together before program initiation may help ease suspicions about evaluation purposes and provide some common understandings about evaluation processes and how to succeed. While both districts developed some formal ways to help prepare teachers (e.g., orientation sessions, training on the evaluation system), it may also be necessary to ensure that evaluators give quality feedback and to get administrators and peers at the school more involved in providing developmental feedback and coaching.

Teachers need specific, concrete feedback that tells them not only the rating they will receive but also exactly what needs to be done to do better next time. Because some teachers may not be motivated to improve on evaluation ratings in a low-stakes system, they need to see the value of going through the process for their instruction and student learning, beyond just getting a higher evaluation score. Concrete feedback should be accompanied by suggestions about techniques to try and whom to observe to see good performance exemplified, modeling of aspects of desired performance, and information about
relevant professional development opportunities. This requires that evaluators be trained in providing feedback, that teachers have a trained coach or mentor to go to for help, and that training and development programs be available to provide the skills needed to do well on the evaluation system.

2. Ensuring Consistency Across Schools and Evaluators

Experience in Cincinnati's field test and in Washoe suggests that basic evaluator training does not ensure consistency of evaluation across evaluators or eliminate evaluator leniency. There were many reasons that principals had different frames of reference for evaluation, ranging from different subject discipline training to differences in the socio-economic status of the student body. Similarly, principals had powerful motivations for leniency in the desire to avoid demotivating teachers and to avoid damaging fragile cooperative relationships. For example, some evaluators used a low-key approach that stressed positive affirmation of performance to reduce stress about the evaluation process and build school community.

At the least, some form of frame-of-reference rater training (Bernardin and Buckley, 1981; Lievens, 2001) needs to be required. This training aims to develop consensus on a normative understanding of good performance, the critical behaviors that exemplify it, and the process of gathering, evaluating, and weighing evidence of performance. Program designers should also consider checks like the calibration sessions used in Cincinnati and the inter-rater agreement analyses described above. Once trained, evaluators also need to be held accountable for, and perhaps given incentives for, accurate evaluation. These changes do not necessarily require a principal to break from using a positive and affirmative approach, but they would help ensure that evaluators are holding teachers to the same standards of performance. Specialist evaluators from outside the school, like Cincinnati's Teacher Evaluators, can also add

34

consistency and reduce leniency, but they add expense and introduce dynamics that may provoke negative teacher reactions.

3. Getting Principals "On Board" in Support of the Evaluation System.

In both districts, the new evaluation systems created more work for principals than the systems they replaced. Many principals seemed to feel that the evaluation system was a burden and just one more demand from the central office. The structure of the principal's job, the tradition of isolated practice in teaching, and a crisis-management atmosphere in some schools all tend to push evaluation to the periphery of many principals' management efforts. Yet beyond following the procedures of the system and rating accurately, the principal has an important role in making these new systems work. Our field work suggests that where principals took the system seriously, provided quality feedback and performance improvement suggestions, and used it as part of their program of instructional leadership, teachers took it more seriously, were more accepting of evaluation, and gained more from the process. In our sites, principals who saw the value of the system as a tool for improving instruction tended to be more supportive and communicated this to teachers.

Since the current structure of the principal's job in most districts provides little time or encouragement for working with teachers on quality instruction, or for using the evaluation system as a tool for instructional improvement, districts might want to consider some restructuring of the job. They may also want to consider providing incentives for principals to use the evaluation system for instructional improvement, and designing the system so that it measures some of the same behaviors and skills required by other instructional improvement initiatives. Increasing professional development opportunities for teachers and principals that are aligned to
the evaluation standards could also help build a more systemic instructional focus for the evaluation system.

4. Working Out the Glitches.

Implementing a rigorous standards-based evaluation system, especially one with high stakes, appears more difficult than might be expected. Even though Cincinnati conducted a field test, implementation problems remained in the first year; these turned teachers off to the system and influenced some to vote against the new pay schedule. First-year implementation problems included confusion about portfolio requirements and deadlines, misunderstandings about the Teacher Evaluators and what they were looking for, overestimation of the feasible Teacher Evaluator workload, and uncertainty about how to score observations when no relevant evidence for a standard was observed. These stumbles reduced the credibility of the system for many teachers, and some perceived them as reasons to doubt its validity. The CPS experience certainly suggests the need to test all major changes and to make sure procedures are worked out in advance. It may also be necessary to appoint a 'czar' who can make decisions and concentrate on ironing out problems. Cincinnati divided the leadership of its program among an oversight committee, two administrators, and a union liaison during the first two years, but has now concentrated the TES program under one administrator.

Unlike in Cincinnati, the glitches that occurred in Washoe County do not appear to have negatively affected teacher and administrator acceptance of the evaluation system. One reason may be that, because of the lower stakes, teachers were less concerned about them. The district also field tested the system for two years, which appeared to remove most administrative glitches, though some inconsistencies remain in what evidence is collected and how written narratives are composed. District leaders have recognized that additional evaluator
training on decision-making and writing evaluations is needed, but such training has not yet been mandated for all evaluators.

5. Taking the Next Step: Assessing Content Knowledge and Content-Specific Pedagogy.

Some educational researchers believe that content knowledge, and especially knowledge of content-specific pedagogy, is necessary for making major improvements in student learning. If this is so, then these two Framework-based systems may put less emphasis on these skills than is desirable, since most of the standards address generic pedagogy and classroom management. It is clear from our research that evaluators have difficulty assessing content knowledge and content-specific pedagogy through classroom observations. Not only are these hard to observe, but when observations occur at random, without prior arrangement with the teacher, evaluators may not see enough connected lessons to get a good picture of how the teacher develops the content concepts. It also seems unlikely that, beyond the primary grades, principals can be expert enough in all content areas to provide an in-depth assessment of teachers' skills in content-specific pedagogy.

How, then, can districts seeking to assess these important aspects of instruction proceed? Some suggestions were made by Odden (2003). One approach would be to structure observations around an instructional unit, so that the evaluator sees the entire range of instructional skills in development, from planning to assessment and feedback, and can understand the content/pedagogy interaction. Another is to collect more artifacts in a portfolio, asking teachers to provide examples of planning, activities, and assignments and to describe how these address the content. Content experts, rather than principals, could evaluate these artifacts 'off-line' in the summer; this is essentially what the State of Connecticut does in its teacher licensing assessment. The downsides to this option include
additional expense, the difficulty of finding enough qualified evaluators, and the tendency, apparent in our sites, for teachers to see such portfolios as burdensome extra work. Other potential issues are the perceived credibility of external evaluators who are not familiar with a teacher's class or school context, and the potential loss of relevance if there is a large time gap between submission of portfolios and results (e.g., if they are scored over the summer and returned the following school year).

7. Concluding Comments

The standards-based teacher evaluation systems implemented by these districts appear to address several of the concerns raised in the teacher evaluation literature. As the companion papers by Milanowski and by Kimball, White, and Milanowski will show, there is positive evidence for their criterion-related validity. They appear to focus on the substance of teaching, and to have some utility to teachers in encouraging reflection and driving improvement in practice. But the systems require considerable investment. (A conservative estimate of the direct cost of the Cincinnati system would be at least $1.6 million per year, about 0.4% of the 2001 operating budget.) This does not include the value of administrator and teacher time. No comparable figures are available for Washoe, but there was a considerable investment of time and personnel in designing and implementing the system. And in each of our sites, still further investment is needed to improve the impact of the systems on actual instruction. The next few years will show whether the districts are able to make that investment.

One way to leverage an investment in teacher standards and standards-based evaluation would be to align more aspects of the human resource management system around the teaching standards. The standards can provide a foundation for recruitment and selection, professional development, and compensation. Such alignment would send a coherent message to teachers
about what they should know and be able to do, and encourage the development of a shared conception of good instruction. The paper by Heneman presents an alignment model and discusses the efforts these districts have made to align aspects of their HR systems around the standards. Another source of leverage could come from learning more about the relationship between the evaluation system and student achievement. If it can be demonstrated that scores on the evaluation correlate with student achievement scores and can help predict student achievement gains, greater interest in investment and greater acceptance may follow.
References

Bernardin, H.J., and Buckley, M.R. (1981). Strategies in rater training. Academy of Management Review, 6, 205-212.

Danielson, C. (1996). Enhancing Professional Practice: A Framework for Teaching. Alexandria, VA: Association for Supervision and Curriculum Development.

Gallagher, H.A. (2002). The Relationship Between Measures of Teacher Quality and Student Achievement: The Case of Vaughn Elementary. Paper presented at the 2002 annual meeting of the American Educational Research Association, New Orleans, LA.

Heneman, H.G. III, and Milanowski, A.T. (2002). Evaluation of Teachers' Reactions to a Standards-Based Teacher Evaluation System. Paper presented at the Sixteenth Annual Conference of the American Evaluation Association, November 8, 2002, Washington, D.C.

Kellor, E., and Odden, A. (2000). How Cincinnati developed a knowledge- and skill-based salary schedule. Madison, WI: Consortium for Policy Research in Education. Available at http://www.wcer.wisc.edu/cpre/papers/.

Kimball, S.M. (2001). Washoe County Teacher Performance Evaluation System: A Case Study. Wisconsin Center for Education Research, Consortium for Policy Research in Education. Available at http://www.wcer.wisc.edu/cpre/papers/.

Kimball, S.M. (2002). Analysis of feedback, enabling conditions and fairness perceptions of teachers in three school districts with new standards-based evaluation systems. Journal of Personnel Evaluation in Education, 16(4), 241-268.

Lievens, F. (2001). Assessor training strategies and their effects on accuracy, inter-rater reliability, and discriminant validity. Journal of Applied Psychology, 86(2), 255-264.

McLaughlin, M.W., and Pfeifer, R.S. (1988). Teacher Evaluation: Improvement, Accountability, and Effective Learning. New York, NY: Teachers College Press.

Milanowski, A.T., and Heneman, H.G. III. (2001). Assessment of teacher reactions to a standards-based teacher evaluation system: A pilot study. Journal of Personnel Evaluation in Education, 15(3), 193-212.

Milanowski, A.T., and Heneman, H.G. III. (2002). Transforming Teacher Evaluation: A Standards-Based Teacher Evaluation System. Paper presented at the Sixteenth Annual Conference of the American Evaluation Association, November 8, 2002, Washington, D.C.

Odden, A.R. (2003, forthcoming). An Early Assessment of Comprehensive Teacher Compensation Change Plans. In D. Monk and M. Plecki (Eds.), School Finance and Teacher
Quality: Exploring the Connections. 2003 Annual Yearbook of the American Education Finance Association. Philadelphia: Eye on Education.

Peterson, K.D. (2000). Teacher Evaluation: A Comprehensive Guide to New Directions and Practices. Thousand Oaks, CA: Corwin Press.

Porter, A.C., Youngs, P., and Odden, A. (2001). Advances in teacher assessments and their uses. In V. Richardson (Ed.), Handbook of Research on Teaching (pp. 259-297). Washington, D.C.: American Educational Research Association.

Sawyer, L. (2001). Revamping a teacher evaluation system. Educational Leadership, 58(5), 44-47.
Table 1: Cincinnati Teaching Standards

Domain 1: Planning and preparing for student learning
The teacher will:
1.1 Acquire and use knowledge about students as individual learners in preparing lessons which consider the student's cultural heritage, interests, and community.
1.2 Write clear instructional objectives that will enable all students to meet or exceed Promotion/Credit Granting standards, establish high expectations, address individual learning needs, and make connections within and among disciplines.
1.3 Design lessons and use clearly defined assessments that align with standards, and select/adapt instructional resources appropriate for the developmental level of students.

Domain 2: Creating an environment for learning
The teacher will:
2.1 Create an inclusive and caring environment in which each individual is respected and valued.
2.2 Establish a classroom culture where high expectations for learning and achievement are communicated to students and all students are invited and encouraged to participate.
2.3 Establish, maintain, and manage a safe and orderly environment in which time is used to maximize student learning.

Domain 3: Teaching for learning
The teacher will:
3.1 Know the content, content-specific pedagogy, and the background knowledge and skills students need prior to learning new concepts.
3.2 Communicate learning objectives, performance standards for those objectives, procedures, and assessments effectively.
3.3 Pose thought-provoking questions, foster classroom discussion, and provide opportunities for each student to listen and speak for many purposes.
3.4 Engage all students in relevant learning activities that encourage conceptual understanding and connections, challenge student thinking, and address real-life situations.
3.5 Provide timely, constructive information on student performance through a variety of assessment strategies.
3.6 Reflect upon and adjust instruction to respond to differences in student knowledge, experiences, cultural heritage and traditions, and persist in finding effective instructional strategies to meet individual needs.

Domain 4: Professionalism
The teacher will:
4.1 Track student progress toward Promotion/Credit Granting Standards, maintain records to show how decisions are made about rubric scores and grades, and keep accurate noninstructional records.
4.2 Inform families about the academic and social progress of their child and events in the classroom, and encourage parental involvement in the child's education.
4.3 Establish and maintain professional relationships with peers/teams, function as a member of an instructional team, department, or level, and participate in school and district initiatives.
4.4 Improve content knowledge and pedagogical skills by participating in professional development activities and applying what is learned.