ED/GEMR/MRT/2017/P1/6

Background paper prepared for the 2017/8 Global Education Monitoring Report

Accountability in education: Meeting our commitments

Making use of assessments for creating stronger education systems and improving teaching and learning

This paper was commissioned by the Global Education Monitoring Report as background information to assist in drafting the 2017/8 GEM Report, Accountability in education: Meeting our commitments. It has not been edited by the team. The views and opinions expressed in this paper are those of the author(s) and should not be attributed to the Global Education Monitoring Report or to UNESCO. The papers can be cited with the following reference: “Paper commissioned for the 2017/8 Global Education Monitoring Report, Accountability in education: Meeting our commitments”. For further information, please contact [email protected].

Prema Clarke

2017

Abstract

The education sector uses learning assessment data as an accountability measure for quality education and to guide sector reform. This paper begins with an overview of assessment systems across countries in terms of their experience, coverage of school populations and subjects, and institutional responsibility. The use of assessment data in education reform in three groups of countries is then reviewed. The first group includes countries with established systems of assessment. These countries provide rich examples of how the use of assessment data can positively impact student learning, as reflected in higher PISA and TIMSS scores. In these countries, based on an initial and critical alignment of curricular objectives with assessment content, test data are systematically used in three components relevant to the functioning of schools – infrastructure and instructional aids management, personnel management and support, and school oversight and support. Harmonization across these three components and coherence across the entire system positively influence teaching and learning in the classroom. The second group comprises countries with evolving assessment systems. In evolving systems, information on the status of two of the components (personnel management and school oversight) is weak, and therefore the basis and direction for using assessment data appropriately to improve teaching and learning are unclear. The third group of countries operates learning assessments in federal systems with dispersed authority structures. In federal systems, when the national level is authorized to support harmonization and coherence at sub-national levels, there is potential for using assessment data for accountability and improvement in sector performance. The paper ends by highlighting three facets in the reform experience of countries that have successfully created harmonization and coherence for quality teaching and learning.
First, the availability of appropriate and in-depth analyses of the status of infrastructure, personnel and schools facilitates and steers harmonization and coherence (balanced knowledge base). Second, depending on the history of education, technical capacity and the political, socio-cultural and economic context, the process of harmonization is expedited by prioritizing what requires the most attention (negotiating embedded constraints). Finally, countries exhibiting a high degree of adaptability, adopting different strategies and timeframes to ensure coherence across the system (deliberate but adaptive sequencing), are better poised for improved student learning.

‘The sparrow may be small, but all its vital organs are present [a Chinese proverb]. The nation took pains to ensure not just that all its parts were working well, but also that they are working in tandem. Having the best teachers, schools, or policies is not sufficient; it is about having a broader understanding so that the parts work together.’ Pak Tee Ng, Associate Dean of Leadership and Learning at the National Institute of Education, Singapore.1

1 Roozen and Faruqi (2014) summarize Mr. Ng’s words expressed in an interview with students at the Harvard Graduate School of Education.


Outline

Abstract
Introduction
Overview of assessments
Possible uses of learning assessments to promote accountability
Patterns of assessment data use across the globe
  3.1 Established systems of assessment
  3.2 Evolving Systems
  3.3 Dispersed systems
Conclusion
Annex 1
Annex 2
References


Introduction

Achieving satisfactory learning outcomes for most of the population, which leads to a thoughtful and productive citizenry, is the fundamental goal of education.2 Results from the testing of student learning represent what and how much learning is taking place through the school system. The importance of testing student learning was highlighted first at the World Conference on Education for All (EFA) in Jomtien, Thailand (1990). It was reiterated ten years later at the EFA conference in Dakar, Senegal (2000). Over the last few decades, the conduct of student evaluations of learning has received considerable attention across the globe (OECD, 2013; Benavot and Tanner, 2007). Setting up effective systems for testing student learning, with a defined policy framework, organizational structure, and suitable operational procedures, is important for education reform in both established and evolving economies (OECD, 2013; Greaney and Kellaghan, 2012; Clarke, 2012; Ravela et al., 2008). Moreover, the fourth Sustainable Development Goal (SDG) for equitable quality education along with lifelong learning opportunities for all by 2030 calls for a continued and sustained focus on monitoring learning (United Nations, 2015).3

This paper explores the twin questions of whether and how large-scale testing or evaluation of student learning influences the arduous, resource-intensive, and complex task of improving and sustaining education quality. Unlike other sectors, providing quality education for learning is significantly multifaceted. It involves the availability of good quality infrastructure and instructional materials, serving students in a variety of age groups, managing a range of public institutions and processes, and, most importantly, the continuous mediation between different stakeholders (students, teachers, administrators, and parents).
The level of student learning is an accountability measure for whether such a system is operating effectively over the entire school year. When the sector is not functioning satisfactorily, learning levels can also play an important role in guiding reform. Scholars have highlighted the challenges involved in dealing with the extent and nature of the use of test data for improving teaching and learning. According to Kellaghan et al. (2009), there is evidence that test data are not widely used, even though this information is seen to have the possibility ‘for sparking reform and despite the expense incurred in obtaining such information’ (p. 21). In Latin America, for example, test ‘….data are alarmingly underused in designing strategies to improve educational quality….’ (Ferrer, 2006, p. 27). Mindful that there appears to be an under-use of test results, the question that this paper seeks to answer is when and how countries have used this information. The paper explores the extent to which test data ‘are used as tools for understanding better how well students are learning, for providing information to parents and society at large about educational performance and for improving schools, school leadership and teaching practices’ (OECD, 2013, p. 30). The objective of this paper is to draw lessons from the available experience of a group of countries that have made use of assessment data, on the one hand, to judge system effectiveness and, on the other hand, to direct reform for improved learning. The paper attempts to home in on how the design, decision-making trajectory, and program implementation for improved education quality were guided by system-level data on student performance. The limited use of test results in directing education reform, especially in developing countries, and the associated challenges with improving quality and learning are also discussed.
The analysis, it must be asserted, is dependent on the level of detail provided in the available literature. The intention of this review is to summarize the conditions and context in which assessment data can contribute to accountability and improved service delivery in the education sector. This paper has four sections. It begins with an overview of system-level student evaluations of learning. This is followed by a review of country systems under three categories: experience, scale, and agency. A third section

2 Hanushek and Woessmann (2008) show the connections between the cognitive abilities of a population and increases in individual earnings, more equity in income distribution, and enhanced economic growth.
3 The SDGs were established in 2015 and include 17 goals and 169 targets covering all the relevant sectors. See http://www.un.org/sustainabledevelopment/sustainable-development-goals/


discusses the different ways in which countries have used test data for directing and implementing reform to impact student learning. Countries are grouped into those with established, evolving, and dispersed evaluation systems. A concluding section summarizes the experience of the three groups and engages with a framework that fosters the use of assessment data to propel student learning.

Overview of assessments

There are different ways in which students are tested in school systems across the world. The ongoing monitoring or testing of student learning in the classroom, often referred to as formative evaluation, allows teachers to monitor individual students’ mastery of the curriculum in response to instruction. This paper does not deal with classroom-level ongoing testing of student learning. Instead, it examines large-scale testing of student learning at the country level.4 Large-scale testing of student learning at the country level includes examinations and assessments (Clarke, 2012; Greaney and Kellaghan, 2008) administered to students in different grades and in a variety of disciplines. Examinations are norm-referenced: the performance of a student is evaluated in the context of the performance of others who take the same test. Because the stakes are high in examinations, there is a risk of malpractice and gaming of the system. National and international assessments, on the other hand, are criterion-referenced: they measure whether a population has mastered a set of knowledge and skills expected of an age group within a discipline. The knowledge and abilities tested are often generic and can be applied across populations. Assessments, especially those administered at intervals and on sample populations, are not related to individual students but reflect the effectiveness of systems in fostering learning. Both norm- and criterion-referenced tests can be standardized, depending on the uniform conditions in which these instruments are administered. Though this analysis is focused on national assessments that are criterion-referenced, the line between norm-referenced examinations and criterion-referenced assessments is becoming less clear as large-scale assessments become more regular, census-based, and widespread.
Moreover, systems are focused on comparing one student with another as well as on subject mastery of knowledge, skill, and application. Furthermore, the distinction between examinations and assessments as being high and low stakes, respectively, is also becoming blurred as large-scale assessments are increasingly being used to make judgements on system effectiveness, influence financing decisions, and inform agendas for improvement and program designs. In other words, as soon as the accountability function of assessments is applied, they appear to become high stakes either for the individual student or for the functioning of different administrative levels.

Large-scale assessments of student learning can be conducted internationally, regionally, or nationally. There are several international assessments, each with its own psychometric approach and objectives. These tests enable cross-country comparisons of learning.5 While international assessments will be used to show evidence of the successful use of data, the focus in this paper is confined to country-led large-scale testing of student learning. The assumption is that if national-level test data are not available to inform the education reform process, it is unlikely that either regional or international test data will be used to promote quality education. This assumption is based not just on the importance of ownership, relevance, and proximity in triggering reform, but also on the experience of countries that have successfully developed and used assessment data in building

4 Continuous evaluation can also be summative if the results are combined or used as part of the end evaluation and, therefore, in the decisions associated with student transition or promotion to the next level of learning.
5 One of the first international assessments was the Monitoring Learning Achievement (MLA) project created by UNESCO and UNICEF for Grades 4 and 8. MLA was implemented in about 72 countries, mostly between the Jomtien and Dakar years. International large-scale assessments include (i) Trends in International Mathematics and Science Study (est. 1995); (ii) Progress in International Reading Literacy Study (est. 2001); (iii) Programme for International Student Assessment (est. 2000); (iv) Southern and Eastern Africa Consortium for Monitoring Educational Quality (est. 1995); (v) Programme d’Analyse des Systèmes Educatifs de la Confemen (est. 1991); and (vi) Latin American Laboratory for Assessment of the Quality of Education (est. 1997). See Gove, 2015.


robust education systems. Country-led evaluations (by national governments, sub-national entities, or non-governmental institutions) can be directly applied to government policies and programs intended to improve education quality and learning. It is important to note that the paper does not address the quality of assessments – whether they are reliable, verifiable, and transparent.6

Country Assessment Systems: Experience, Scale, and Agency

An overall analysis of the types of country-led learning assessments seen across the globe is described in this section. Countries can be classified based on the length of time they have conducted assessments along with the frequency with which they administer them (experience), the scope of coverage in terms of school populations and subjects (scale), and the site of institutional responsibility for this task (agency).7 Table 1 below portrays the proportion of countries that have undertaken assessments in each region since the 1990s. The regions of Central Asia and Central East Europe (86 percent) and North America and Western Europe (88 percent) have the highest proportion of countries doing assessments. Asia and the Pacific and the Arab States are next, with about 74 percent and 70 percent of countries, respectively, testing learning. The numbers are lower for Latin America (66 percent) and Sub-Saharan Africa (62 percent).

Table 1: Distribution of countries doing assessments

Region                                   Total countries   Countries doing   % doing
                                         in the region     assessments       assessments
Arab States                                         20              14              70
Asia and the Pacific                                39              29              74
Central Asia and Central East Europe                29              25              86
Latin America                                       41              27              66
North America and Western Europe                    24              21              88
Sub-Saharan Africa                                  45              28              62

Source: EFA Global Monitoring Report, 2015
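The percentages in Table 1 follow directly from the two count columns. As a quick illustrative check (the counts are those reported in the table; rounding to the nearest whole percent is an assumption about the source's convention), a short Python sketch reproduces the reported figures:

```python
# Illustrative check of Table 1: share of countries doing assessments.
# Each entry is (total countries in region, countries doing assessments),
# copied from the table above; whole-percent rounding is assumed.
table1 = {
    "Arab States": (20, 14),
    "Asia and the Pacific": (39, 29),
    "Central Asia and Central East Europe": (29, 25),
    "Latin America": (41, 27),
    "North America and Western Europe": (24, 21),
    "Sub-Saharan Africa": (45, 28),
}

for region, (total, assessing) in table1.items():
    pct = round(100 * assessing / total)  # e.g. 14/20 -> 70
    print(f"{region}: {pct}%")
```

Running this reproduces the right-hand column of Table 1 (70, 74, 86, 66, 88, and 62 percent).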

Experience refers to the length of time countries have carried out assessments and the frequency with which these tests are administered. Establishing a national assessment system is a challenging task requiring considerable resources and skills. The longer a country has carried out assessments, the more likely it is that considerable buy-in and expertise have developed over time (Lockheed et al., 2015). Assessments can be carried out annually or at intervals. Table 2 captures the experience of countries in the different regions. Of the countries in North America and Western Europe doing assessments, 76 percent do so annually and most of the rest (19 percent) at regular intervals. The pattern is similar for the Latin American region: 74 percent of countries do annual assessments and 15 percent test at regular intervals. These two regions have the most countries conducting annual assessments for nearly two decades (an average of 18 years). About 36 percent of the countries in Central Asia and Central East Europe and 41 percent in Asia and the Pacific do annual assessments. Those doing assessments at intervals in these two regions are 48 and 55 percent, respectively. Less than a third of the countries doing assessments in the Arab States (29 percent) and Sub-Saharan Africa (18 percent) evaluate learning annually. About half the countries in the Arab States test students at intervals, and in Sub-Saharan Africa this number drops to 36 percent. In both these regions, the period between assessments is uneven and extended.

6 Reliable in that the test is indeed measuring what it is supposed to, and verifiable in terms of what is intended, with consistency in measurement over time (Kellaghan et al., 2009).
7 The analysis is based primarily on the tables found in the EFA Global Monitoring Report, “Education for All 2000-2015. Achievements and Challenges” (2015). The tables can be found in Annex 1, entitled “National Learning Assessments by Country and Region.” Relevant information from the UIS database and the SABER database is incorporated where possible.

Table 2: Experience with learning assessments

Region                                   Annually (%*)   At intervals (%*)   Average years with
                                                                             annual assessments
Arab States                                  4 (29%)          7 (50%)                 9
Asia and the Pacific                        12 (41%)         16 (55%)                15
Central Asia and Central East Europe         9 (36%)         12 (48%)                12
Latin America                               20 (74%)          4 (15%)                18
North America and Western Europe            16 (76%)          4 (19%)                18
Sub-Saharan Africa                           5 (18%)         10 (36%)                 6

Source: GEM, 2015. *Of those countries doing assessments.

Scale refers to the grades and the subjects covered in the assessment. There is considerable variation across countries. Assessments can be confined to the primary section (Grades 1-5), the upper primary section (Grades 6-8) or the secondary section (Grades 9-12). Within each section, students in all grades, in a selection of grades, or in a single grade may be tested. Assessments can include a combination of sections or all the sections in a school system. The latter ensures comprehensiveness and consistency, with the emphasis on learning as students proceed from grade to grade. Regarding the subjects covered, assessments can focus on just mathematics and reading or on a wider group of subjects.

Figure 1 portrays the variation across countries with regard to the sections included in the assessment system. More than half the countries in Latin America (56 percent) assess the learning of students in all three sections: primary, upper primary and secondary grades. The smallest share of countries doing this is in Sub-Saharan Africa (11 percent); in the other regions, it is about a third of the countries. Almost all the countries in Latin America (96 percent), North America and Western Europe (90 percent), and Sub-Saharan Africa (96 percent) test students in primary grades. Central Asia and Central East Europe is the lowest at 52 percent, and the remaining two regions are at about 80 percent. In Asia and the Pacific and in North America and Western Europe, 97 percent and 95 percent of countries, respectively, assess students in upper primary grades. Latin America is next with 85 percent of countries, and the lowest is Sub-Saharan Africa with only 50 percent. In the remaining two regions, around 68 percent of countries include upper primary grades. There is a dramatic decrease across regions in the proportion of countries that assess students in the secondary grades, which is likely due to the existence of school leaving examinations at this level.

Examinations at the secondary level are in place to certify students who have completed the required course of study. In countries where there are no examinations, assessments are more likely to act as a substitute for certifying students. Except for Sub-Saharan Africa, around 50 percent of countries include testing of students in secondary grades. In Sub-Saharan Africa, only 18 percent of countries include secondary grades. Within each level and across levels, a variety of permutations are observed: (i) different grades could be tested at different points in time; (ii) different sections could be the focus at different points in time; (iii) all the grades could be tested within a specific section; and (iv) a few grades in each of the sections could be assessed.

Almost all the countries across the different regions that test students in primary schools include assessment of mathematics and language proficiency. Where upper primary grades are included, about half the countries also administer tests in science and about a third in social studies and/or life sciences. Similarly, of the countries that include assessment of secondary school students, about half test in science and about a third in social studies. A few countries also assess student performance in additional subjects. For example, Hungary tests students on cognitive reasoning and New Zealand on visual arts and information skills.

[Figure 1: School sections covered by country assessments, by region. Source: GEM, 2015]

Agency refers to the institutional base or entity responsible for carrying out assessments. Assessments can be carried out by a non-governmental agency, an entity authorized by the government, or a government agency. According to Ferrer (2006), entities established by the government but outside ministries of education are most effective, as they have the administrative and technical autonomy to systematically conduct and report on results. When the ministry of education is responsible, since this entity is both the financing and implementing authority, there is a tendency to emphasize successes rather than provide an authentic portrayal of test scores. On the other hand, autonomy does carry a significant risk, especially if these entities become disconnected from what is required by the ministry. Such external agencies might well focus on their own priorities, which might be technically sound but pay limited attention to the implications of assessment results for sector reform. In some cases, research entities work with the Ministry of Education or the Examination Board in either conducting assessments or analysing results. When non-governmental agencies are responsible, there are issues with sustainability and ownership. In a federal system, sub-national entities and/or the central government are responsible for this task.

Figure 2 displays the distribution of responsibility for assessments across countries among three entities: Ministries, Boards, and Research Entities. The proportion of countries where Ministries of Education are responsible for student assessment ranges from 40 percent in Central Asia and Central East Europe to 67 percent in Latin America. Examination Boards are authorized to do assessments in about half the countries in Sub-Saharan Africa (54 percent) and Central Asia and Central East Europe (44 percent). Boards are relatively less responsible in Asia and the Pacific, the Arab States, and Latin America (around 35 percent). In North America and Western Europe, Boards (19 percent) are least responsible, but research entities are significantly involved (43 percent). Research entity involvement in Asia and the Pacific (28 percent) and Latin America (30 percent) is about a third; it is lowest in the Arab States (14 percent), Central Asia and Central East Europe (20 percent), and Sub-Saharan Africa (11 percent). Student assessments in six countries, two in Asia (Pakistan and India) and four in Sub-Saharan Africa (Mali, Kenya, Uganda, and Tanzania), are undertaken and financed by NGOs.8 No assessments in North America and Western Europe are supported with external funds (Figure 3). On the other hand, assessments in a number of countries in Sub-Saharan Africa (36 percent) and the Arab States (50 percent) are financed by external donors such as UNICEF, USAID and the World Bank. External financing leading to limited national ownership may account for the uneven and sporadic nature of assessments in these countries. In Central Asia and Central East Europe and in Asia and the Pacific, external financing is at around 30 percent. Only about 11 percent of countries in Latin America have external support for assessments.

[Figure 2: Entity responsible for assessments, by region. Source: GEM, 2015]

[Figure 3: External financing of assessments, by region. Source: GEM, 2015]

8 Except for Mali, the Annual Status of Education Report in India and Pakistan and UWEZO in Kenya, Uganda, and Tanzania were founded by a non-governmental agency in India called “Pratham.” This large-scale assessment is unique in that it is based at the village level and administered to a sample of students attending school. Enumerators administer the test at village centers.


Possible uses of learning assessments to promote accountability

The relationship between quality education and assessment of learning is symbiotic. While system-level student achievement data represent an outcome of an effective system, they also serve as feedback for directing the work of improving quality. Assessment results are a critical tool for promoting accountability and for improving effective system functioning (OECD, 2015). Assessments as an accountability measure help differentiate schools, districts or provinces, and states in terms of the quality of services. Assessments for improvement inform the design of new initiatives as well as the revision and adjustment of existing interventions to improve quality. There are several ways in which assessments can serve as a tool for raising school quality. Before discussing these implications, it must be reiterated that it would not be appropriate to apply large-scale assessments of learning directly to what is going on in schools and classrooms, especially if they are sample based. Formative assessments and continuous evaluations of student learning play this role and are critical complements to large-scale assessments. Large-scale assessment data, on the other hand, are indispensable tools for the different administrative levels that provide oversight and support to schools across sub-national levels and the state. They monitor and facilitate harmonization across three broad components: school infrastructure and instructional materials management, personnel management and support, and school oversight and support. Basic dimensions of each of these components are identified in Box 1.

Box 1: Potential areas for assessment data use (sub-national and state levels)

Infrastructure and instructional materials management: While the availability and quality of school infrastructure and materials are taken for granted in high-income countries, this might not be the case in middle-income countries and is usually not the case in low-income countries. Though student assessment information may not be as relevant to the conditions and instructional materials available in schools, since these are much more related to enrolment outcomes, it is important to determine (at a minimum) whether the quality of education and learning outcomes are constrained by inadequate classroom infrastructure (insufficient furniture and personnel, unusable blackboards, etc.) and instructional materials (teacher guides, textbooks, writing instruments, paper, etc.). Earlier school effectiveness research has pointed to the basic threshold requirements of school infrastructure for learning (Teddlie and Reynolds, 2000).

Personnel management and support: Assessment data can be used to support and improve the management of school personnel (administrators, principals, and teachers). Researchers have highlighted, specifically, the significance of a teacher’s knowledge and skill for student learning and future earnings (Chetty et al., 2010; Hanushek, 2010). Personnel management includes (i) the way in which administrators, school principals, and teachers are recruited; (ii) how they are deployed across schools; (iii) their service conditions, including oversight; and (iv) how they are supported in the work they do (Clarke and Singh, 2015; Göttelmann-Duret and Tournier, 2008). When regions or geographical areas where student performance is inadequate are identified, the system can be alerted to recruit and deploy better principals and teachers to these schools. Assessment data can point to whether teachers’ service conditions are facilitating or constraining student learning. Teachers’ service conditions that include performance appraisals can help identify teachers who require support. Based on a more nuanced review of teachers’ pedagogical and subject content knowledge, each teacher could receive tailor-made professional development (Ree, 2016; Altinok, 2013; Bill and Melinda Gates Foundation, 2010). Similarly, assessment results can help identify principals who require further support and training to improve outcomes. In addition, student learning assessment systems can provide data for designing and introducing an appropriate incentive structure for personnel at different levels of the system. This is described further when I discuss examples from select countries.

School oversight and support: Assessment data can be used effectively for school oversight and support. School oversight and support relate to time-on-task, instruction and curriculum implementation, formative evaluation of student learning, maintenance of school statistics, leadership activities, and parent participation. Assessment data can alert authorities to whether the activities associated with school oversight and support are taking place. Moreover, by reviewing assessment data, authorities can gauge whether these activities account for all that must be included in school monitoring (De Grauwe, 2008; OECD, 2015). Weak student performance in assessments would suggest that school oversight is non-existent or partial.
Oversight is partial when teacher presence in school is monitored but not whether teachers are actually teaching, or when school leadership is assessed but continuous evaluation of student learning is not. It is also partial when it consists of a one-time review rather than a comprehensive review of curriculum implementation throughout the school year. Where there is sufficient oversight of schools and yet weak student performance, assessment data can help guide the support given to schools (for example, leadership training or guidance for parent participation) or the design of incentives to improve low-performing schools. In this way, learning data can also be used to address inequity among schools, especially those in poorer or ethnically marginalized communities. If the assessment is census-based, the data can be used to corroborate trends and patterns in ongoing formative assessments of student learning. Consolidating schools' internal reporting by the principal with provincial school reviews and external assessments can provide a clearer and more in-depth portrait of school performance.

If these three areas are guided by assessment data, the assumption is that sector budgets and expenditure for quality education would also be indirectly informed by patterns of students' learning achievements. Expenditure, especially on staff salaries, infrastructure and instructional materials, which often constitute a substantial portion of a country's education budget, can be influenced by the results achieved in student learning. Consistently underperforming sub-national entities or states can be given additional resources. More micro-level targeting can also take place in well-performing districts or provinces that have pockets of schools with weak test scores.
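To make the targeting logic above concrete, the following sketch is purely illustrative: the district and school names, score values, and the national benchmark are all invented for the example. It shows how census-based assessment data might be used to flag underperforming districts for additional resources and, within otherwise well-performing districts, pockets of weak schools for micro-level targeting.

```python
# Illustrative sketch only: hypothetical census assessment records.
from statistics import mean

# Hypothetical school-level mean scores (all names and values invented).
records = [
    {"district": "North", "school": "N1", "mean_score": 520},
    {"district": "North", "school": "N2", "mean_score": 515},
    {"district": "North", "school": "N3", "mean_score": 430},  # weak pocket
    {"district": "South", "school": "S1", "mean_score": 410},
    {"district": "South", "school": "S2", "mean_score": 405},
]

NATIONAL_MEAN = 470  # assumed national benchmark for this example

def flag_targets(records, national_mean):
    """Return (underperforming districts, weak schools inside strong districts)."""
    by_district = {}
    for r in records:
        by_district.setdefault(r["district"], []).append(r)
    weak_districts, weak_pockets = [], []
    for district, schools in by_district.items():
        district_mean = mean(s["mean_score"] for s in schools)
        if district_mean < national_mean:
            weak_districts.append(district)  # whole-district support
        else:
            # district performs well overall, but flag individual weak schools
            weak_pockets += [s["school"] for s in schools
                             if s["mean_score"] < national_mean]
    return weak_districts, weak_pockets

districts, pockets = flag_targets(records, NATIONAL_MEAN)
print(districts)  # ['South']
print(pockets)    # ['N3']
```

In practice such rules would be far more nuanced (trends over years, socio-economic controls, sampling error), but the sketch captures the two-level logic described above: district-level resource allocation first, then school-level targeting within districts.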
Again, if assessment data are used appropriately, with the aim of a harmonized and coherent approach spanning infrastructure and personnel management and school oversight, this also provides a locus for public action based on achievement data. When information is available that students are not performing according to expectation, parents, communities and the general public know where to go to protest and demand answers. Conversely, when there is insufficient information on how schools and personnel are functioning, and these areas operate in silos disconnected from each other, there is no locus at which to apply public pressure for reform and little clarity on how to act on the concerns raised. Box 1 summarizes the three areas, and the financing and locus for public pressure, in this harmonized and coherent approach to reform.


Patterns of assessment data use across the globe

The use of learning assessment data is discussed for three groups of countries. The first group includes countries that have longstanding and established systems of student assessment. The second group comprises countries with evolving assessment systems; these systems may have been functioning for a few years and are often supported by external partners, and countries in this category are usually middle- and low-income countries. The third group includes countries operating within a federal system of government with sovereign sub-national entities such as states. The discussion is based on a literature review of sources captured through citation and keyword searches of recent studies and books. The approach taken in reviewing sources was to look for ways in which assessment data generated in-country influenced policies, planning, and implementation for quality education. Evidence of such beneficial relationships between assessment data and the modification of policy and performance is often embedded in the history of education reform in a country. The extent to which sources are relevant also depends on the level of detail in their discussions of the deliberate use of assessment data.

3.1 Established systems of assessment

Many countries across the world have had assessment systems for several years, especially those in Latin America, North America, and Western Europe. The experience in other regions is mixed, with some countries having sustained evaluations of student learning. Many of the countries with significant experience and scale in conducting assessments disaggregate this information along several categories (gender, socio-economic status, location etc.) and systematically disseminate the results.
The countries chosen for discussion in this paper were selected primarily on the basis of how much relevant information is available on the trajectory of reform over the decades and whether the utilization of assessment data in these efforts is clear.9 The incorporation of assessment data findings also provides pathways and options for using learning data effectively for system accountability and improvement. Secondly, the countries discussed in this segment of the paper show evidence of the successful impact of the use of assessment data in programming. Evidence of impact is represented in their performance in international assessments, including the Program for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS). In addition, attempts were made to include countries from different regions and those with successful and partially successful reforms. The countries included in this analysis are Singapore, Finland, Shanghai, Chile, Uruguay, Mexico, and Qatar. PISA and TIMSS results for these countries are portrayed in Table 3.10 Of the 144 countries undertaking assessments, about half participated in PISA and fewer than a third in TIMSS. The mean score for PISA (OECD, 2016) is 493 in science and reading and 490 in math. Singapore, Finland, and Shanghai are well above the average scores in all three subjects compared to the other participating countries. In spite of a decreasing trend (-11 in science, -5 in reading and -10 in math), Finland remains a high performer. Though Chile, Uruguay, Mexico and Qatar are below the PISA average, the drift is toward increasing scores across subjects, except for reading in Mexico and math in Uruguay. Only three of the countries discussed in this paper took part in TIMSS 2015 (IEA, 2016) in both the Grade 4 and Grade 8 tests (Singapore, Chile and Qatar), with Finland participating in Grade 4 only. The range in performance across the countries participating in TIMSS is significant: the difference between the highest and lowest performing countries is more than 200 points. Singapore and Finland are top performers, while Chile and Qatar are towards the middle.

9 Even though the experience and scale in doing assessments might be extensive, there are few specific accounts of how countries are using data in education reform. In Ferrer's analysis (2006) of the practices and challenges of assessment systems in 19 countries in the Latin American region, Chile, Mexico and Uruguay are identified as examples of the use of student learning data in improving quality (2006, p. 39).

10 PISA assessed 540,000 students in 2015, representing about 29 million 15-year-olds in 72 countries. This test examines what students know and how they apply this knowledge creatively and for problem-solving. TIMSS assessed 580,000 students from 57 countries overall. In mathematics, 49 countries participated in Grade 4 and 39 in Grade 8. In science, 47 countries participated in Grade 4 and 39 in Grade 8. With a strong curricular focus, the goal of TIMSS is to improve teaching and learning and thereby influence policy.


Table 3: Evidence of impact from PISA and TIMSS

PISA 2015 (average 3-year trend in score shown in parentheses):

Country | Science | Reading | Math
Singapore | 556 (7) | 535 (5) | 564 (1)
Finland | 531 (-11) | 526 (-5) | 511 (-10)
B-S-J-G China* | 518 | 494 | 531
Chile | 447 (2) | 459 (5) | 423 (4)
Uruguay | 435 (1) | 437 (5) | 418 (-3)
Qatar | 418 (21) | 402 (15) | 402 (26)
Mexico | 416 (2) | 423 (-1) | 408 (5)

TIMSS 2015 (increases and decreases in average scores relative to earlier rounds shown in parentheses):

Country | Math Grade 4 | Math Grade 8 | Science Grade 4 | Science Grade 8
Singapore | 618 (increase from 2011 and 1995) | 621 (increase from 2011 and 1995) | 590 (no change from 2011; increase from 1995) | 597 (no change from 2011; increase from 1995)
Finland | 535 (decrease from 2011) | not tested | 554 (decrease from 2011) | not tested
Chile | 459 (no change from 2011) | 427 (increase from 2011) | 478 (no change from 2011) | 454 (no change from 2011)
Qatar | 439 (increase from 2011) | 437 (increase from 2011) | 436 (increase from 2011) | 457 (increase from 2011)
Range across all TIMSS countries | 618-353 | 621-368 | 590-337 | 597-358

Source: PISA, 2016; TIMSS, 2016

* Beijing-Shanghai-Jiangsu-Guangdong (China).

Singapore is a top performer in both the TIMSS and PISA 2015 international tests. Soon after its independence in 1965, the country started its quest for quality education. Testing of student learning, which focused on examination results in the early years, played a significant role throughout the reform process. Singapore's experience with achieving quality education included two phases. An initial phase heralded the switch from enrollment to an "efficiency driven system." A critical activity during this phase was to establish a national curriculum based on "bilingualism and moral, civics, science, mathematics, and technical education" (Lee et al., 2008, p. 23). Curriculum reform was accompanied by systematic connections made, initially, with examinations and then with student assessments. Instructional materials were closely aligned to the new curriculum. This phase also included (i) the streamlining of procedures and processes in schools; (ii) effective monitoring of the work of school personnel (Staff Confidential Reports); (iii) mandatory principal reporting on teacher performance; (iv) establishing a pupil data bank, which included information from pupil report cards; (v) internal and external school reviews; and (vi) use of this information for assessing school performance and planning. The second phase started in 1997 with the move toward "ability driven education" and the fostering of creative and critical thinking. The "Thinking Schools, Learning Nation" program was introduced, which involved a reduction in subject content, revision of assessment templates, and an emphasis on monitoring school performance (Tan and Gopinath, 2000). The "Teach Less, Learn More" (TLLM) initiative, which began in 2004, focused on further improving the system. More recently, the TEACH framework was established for teacher-led professional development (National Center for Education and the Economy). Curricular objectives, teacher education, school and teacher appraisals, and learning assessments were progressively better aligned. Using information from different sources, officials across the system observed and analyzed what was going on in schools and classrooms. This information was consistently used to develop relevant and timely tools and guides to support teaching and learning. Ng lists four accountability concepts upheld in the Singapore experience: 'accountability as performance reporting; as a technical process; as a political process; and as an institutional process' (2010, p. 275).

Finland has been a consistently high performer, as evidenced by its PISA and TIMSS scores. In most countries with advanced economies, the basic elements of quality education are in place: progressive curricula related to assessments of student learning; teacher management and appraisal systems; and school monitoring and evaluation frameworks.
Student learning, the ubiquitous goal of education, impacts both personnel and school monitoring activities in some form or the other.11 Unlike the other countries discussed in this section, Finland does not have extensive standardized large-scale assessments of student learning. National assessments are sample-based and are not publicized by the Finnish National Board of Education. Municipalities are responsible for education, and each municipality closely monitors the functioning of schools in its jurisdiction. Several studies of Finland's success in education have pointed to an environment surrounding schools that upholds trust and responsibility. Based on the principles of decentralization, self-evaluation, and external evaluations, the system "steers through information, support and funding" (Finnish Education in a Nutshell, p. 13). There is a seamless connection between school choice, school functioning, teacher recruitment and school financing in the Finnish education system. Municipalities support schools only if there is enrolment, which depends on whether parents choose the school for their children, which in turn is connected to the performance of the school (Bruns, 2015). Schools become accountable for performance, as funding, which includes personnel salaries, is adversely affected if there are no students. Teacher management and support are localized and continuous (starting with preservice), focused on both individual activities (teacher autonomy) and group activities (designing curriculum implementation and designing projects) based on the performance of their students. According to Niemi (2015): '…. teachers learn to take an analytical and open-minded approach to their work and they develop teaching and learning environments in a systematic way…. [Thus,] teachers learn alternative ways of working, reflecting, dialoguing, and gaining feedback for their work' (p. 284).
Shanghai is one of the largest cities in China and is at the forefront in student performance on PISA tests.12 China has a centralized syllabus but allows local levels the freedom to choose textbooks. After being given permission from central authorities in 1985 to introduce changes to its own examinations, Shanghai adjusted exam questions to focus on application, problem-solving, and cross-disciplinary creativity.

11 The depth and extent of this harmonization and coherence is unclear, as there is no literature available on this topic at this time. For example, a key question relates to what is included in the assessments: how they impact formative assessments, teacher and principal management (especially appraisal and training), and school evaluations.

12 One in four students in Shanghai can handle the most complex computations in mathematics using symbolic representations (PISA 2015).

An interdisciplinary approach with three levels (basic, enriched and inquiry-based) was introduced in the curriculum
and students could select subjects. These changes were accompanied by teaching reform (the Curriculum and Teaching Approaches Reform) in the 1990s (Tucker, 2014), aligned with ongoing formative evaluations of learning in the classroom (Liang et al., 2016).13 The city also introduced a comprehensive program for equalizing its schools on the following criteria: (i) based on a detailed evaluation of infrastructure and services, schools were classified into four categories depending on how far they met government standards; (ii) financial transfers were made to schools based on their categorization; (iii) teachers and principals were transferred from urban schools to rural schools; and (iv) partnerships and/or takeovers were organized between strong and weak schools. Shanghai was the first district in China to mandate 240 hours of professional development over a period of five years. China has a unique system of professional development (Tucker, 2014). Teachers are expected to belong to a "Teaching-Research Group" or "Grade Group" that meets regularly each week to develop lesson plans together. The plans are used as a tool for teacher appraisal. The groups are mentored and observe each other teaching, sharing the experiences, challenges and possible solutions encountered in teaching and learning.

Chile is a well-performing country, one of the top achievers in the Latin American region, and participates in international assessments. Apart from its PISA and TIMSS 2015 performance, according to Gonzalez (2008), between 2000 and 2006 the country showed the largest increases in PISA reading scores. Chile also has high graduation rates at the secondary level, and internal efficiency has improved steadily. Several developments in the education sector have coherently and holistically facilitated the quest for improved student learning. Three key reforms are worth mentioning.
The first initiative was to establish the "Basic Objectives and Minimum Obligatory Contents for primary and secondary education," approved by law in 1990, which included a mandate for national curricular standards and the monitoring of student learning (Ferrer, 2006).14 Curricular reform was implemented and aligned to the assessment system, i.e., the Sistema de Medición de la Calidad de la Educación (SIMCE) or System to Measure the Quality of Education (Santiago et al., 2013; Greaney and Kellaghan, 2008). Earlier norm-referenced tests, which evaluated learning gradually, morphed over time into criterion-referenced tests (Ferrer, 2006). Assessments are census-based, with different grades tested each year. Second, curriculum and assessment reforms were accompanied by a suite of changes in teacher management, beginning with the introduction of the "Teacher Statute," also legislated in the early 1990s. The Statute ensured civil service status for teachers and initiated a centralized system of teacher management with teacher appraisals (the Sistema Nacional de Evaluación de Desempeño de los Establecimientos Educacionales, SNED, or National System of Performance Assessment).15 Collective performance incentives introduced in the 1990s were revised to include incentives (bonus pay) for individual teachers by the late 2000s (Mizala and Schneider, 2014).16 A third relevant initiative was the use of SIMCE data to promote equity. In the P-900 program, 900 schools in a municipality that were in the lowest 10 percent in learning assessments, based on SIMCE together with self-initiated school evaluations, received 'infrastructure, textbooks, classroom libraries, teaching material and in-service school-based workshops' (Greaney and Kellaghan, 2008, p. 100). Once scores improved, these schools no longer received this focused support.
These three programs (curricula reform and assessments; teacher management and appraisal; and school performance evaluations), each with its own database that included student performance data, contributed to accountability and helped identify where and what kind of support was needed for quality education and learning.17 Overall, data on student learning played a critical role in improving both the teacher workforce and school performance (Santiago et al., 2013). More recently, the Quality of Education Agency was established to bring together student assessment, the monitoring of system performance, and the reporting of performance to all stakeholders (OECD, 2015).18

Though not as extensive as in Chile, Mexico is also at the forefront in the use of assessment: 'The role of evaluation and assessment as key tools to achieve quality and equity in education is reinforced by a range of policy initiatives' (Santiago et al., 2012, p. 9). Mexico's education reform began with the establishment of an agreement between the teacher unions and the federal government in the "Alliance for Education Quality."19 Curricular reform was also initiated in 1993, specifying the content and outcomes for each grade level (Ferrer, 2006). This reform, in turn, informs the content of the criterion-referenced National Assessment of Academic Achievement in Schools (Evaluación Nacional del Logro Académico en Centros Escolares, ENLACE), which is administered annually to third and ninth graders in language and mathematics (Santiago et al., 2012).20 The Ministry also tests student learning through the Educational Quality and Achievement Tests (Examen para la Calidad y el Logro Educativo, EXCALE), administered to small samples of students from Grades 3, 6, 9, and 12, covering language, mathematics, science, and social studies (Santiago et al., 2012; World Bank, 2013). This was accompanied by the Comprehensive Reform of Basic and Secondary Education (Reforma Integral de la Educación Básica, RIEB, and Reforma Integral de la Educación Media Superior, RIEMS). The RIEB program introduced the "Quality School Program" in 2001, under which schools were chosen for additional resources based on a detailed self-evaluation and improvement plan (school reviews with results from ENLACE and EXCALE).

13 The OECD report (2010) highlights the hierarchy in the culture and the expectation of consistent attention facilitating successful learning. Remedial instruction and supplementary programs are also emphasized.

14 Since this point an independent government entity is responsible for this task.

15 The previous regime had transferred teachers to the municipalities, banned collective bargaining, and instituted vouchers instead of salaries.

16 Teacher unions are powerful entities in Latin America. This has resulted in partial or incomplete reforms in teacher management. Also, differentiated packages are a challenge to implement in these situations. Chile's success in this regard is significant. Individual incentives ranged from a 15-25 percent increase in base pay (Mizala and Schneider, 2014).

17 A number of other programs were also introduced based on SIMCE. Data helped government identify schools that would participate in a school improvement program, which involved the allocation of competitive funds for development (Ferrer, 2006). Incentives were also
given to the 25 percent of highest performing schools, to schools that showed the most improvement within this group, and to those that did not have any pre-selection criteria (Gonzalez, 2008).

Equity is addressed in this program with targeted support (additional resources) given to schools in marginalized communities and isolated rural locations (World Bank, 2012).21 ENLACE results are used to appraise teacher performance (Santiago et al., 2012). Improving instruction includes a systematic evaluation of teachers during and after preservice training; a standardized examination for permanent posts (Examen Nacional de Conocimientos y Habilidades Docentes, ENCHD, or National Examination of Teaching Knowledge and Skills); and voluntary evaluations for promotion to management jobs, salary increases, and individual and collective monetary subsidies. A mandatory teacher evaluation system is in the process of being established.22 This new system is expected to place equal emphasis on performance appraisal and support. A developmental evaluation is to be introduced that is internal to the school but will inform in-service training courses for each teacher. In addition, the school monitoring program, according to Santiago et al. (2012), needs improvement as the "principles and practices" (p. 11) of school evaluation are not entirely clear.23 School monitoring is conducted as an internal exercise at this time. The possibility of introducing an external evaluation is being considered.

In Uruguay, the Education Results Mapping Unit is responsible for assessment. This unit, which is technically autonomous, is administratively attached to the National Administration for Public Education. Both census and sample-based tests have been used. Questionnaires covering a variety of areas are also administered to teachers, principals, students, and families. Several rounds of curriculum reform have taken place with changes
18 Additional reforms are consistently being introduced to improve each of the initiatives, independently and as a whole. For example, the teacher management reforms now include a revised career path for teachers based not on seniority but on performance, which would necessarily include the performance of the students they teach (Bruns and Luque, 2015).

19 The Teacher Union in Mexico is one of the largest in Latin America and is exceptionally strong, with significant political connections (31 education ministers are union appointees) and the power to veto any education reform (Bruns and Luque, 2015).

20 There are external examinations at the end of primary and lower secondary (Santiago et al., 2012).

21 The Scottish self-assessment influenced this program, with 33 indicators on school management, teaching practices, and infrastructure (Ferrer, 2006).

22 In the face of protest from teacher unions, the most recent political leadership is in the process of introducing integral reforms surrounding teachers, including mandatory teacher evaluations, probationary periods, and expertise-based promotions (Bruns and Luque, 2015). Teacher appraisals will include a section on suggestions for professional development done by the principal and a review of performance for career progression (Santiago et al., 2012).

23 Santiago et al. (2012) indicate the need to include an external evaluation of schools that corroborates the self-evaluation.


in theoretical perspectives and academic content. As a result, the most recent initiative, in the 1980s, appears to include some contradictions, which the results mapping unit responsible for developing assessments had to resolve through several rounds of consultations with practitioners (Ferrer, 2006). In 2008, the government did attempt curriculum reform to resolve the contradictions and to move from discrete subjects to a more interdisciplinary approach; this initiative was opposed by the teachers and abandoned (Bruns and Luque, 2015). Teachers are keen to understand the results of assessments and are provided test samples to use in classrooms. Test results direct the content of teacher training activities. Results are also used to continue and expand equity programs that were found to have an impact in full-time schools. Having said this, however, according to Ferrer, 'Accountability is not a goal of the assessment system. Efforts to legitimize the system in the eyes of the teaching community focus on confidentiality and on using the data solely for pedagogical and curricular purposes' (2006, p. 127). There is no information on how schools are evaluated and supported in Uruguay. Nevertheless, 'the assessment system enjoys a substantial degree of legitimacy in the eyes of Uruguay's teachers, who view the data as valid and useful input for improving teaching practices and school management' (Ferrer, 2006, p. 128).

In 2001, Qatar introduced its world-class program "Education for a New Era." The initiative (Watkins, 2006) was based on the principles of:

- Autonomy (allowing innovation at the school level) within the framework of an international curriculum;
- Accountability, by implementing an objective and transparent assessment of student learning that holds schools responsible;
- Encouraging variety, by supporting different kinds of schools and instructional programs; and
- Enabling school choice.
This approach was supported with teacher professional development programs, the consistent use of data, and regular studies on the reform process and its impact on schools (Brewer et al., 2006).24 According to Alkhater's review of the initiative (2016), there is complete dissatisfaction with and rejection of this initiative, especially among charter schools, and attempts are being made to re-establish the earlier system: 'more than 13 years after launching the reform, there is one common sentiment that underpins the reaction of the educators…..: bitterness. …..All of the initial reform policies have been completely reversed after causing unprecedented social controversy and after years of policy instability' (Alkhater, 2016, Abstract). Notwithstanding this response, there is an expectation that the focused data collection and analysis that this initiative introduced will inform the redesign.

The recent OECD (2015) publication "Making Reforms Happen" analyzes the experience of 34 countries. One of the chapters in this book deals with how countries are using student learning data, based on questionnaires to school officials. Responses to these questionnaires indicate a more fine-grained use of learning assessment data. Principals report on whether they are using results to inform parents of their child's progress, to evaluate the performance of their schools, and to review problems in curriculum implementation. In a majority of countries, students' assessment results are shared with their parents (p. 95). About 80 percent of 15-year-olds across OECD countries are in schools where principals report that assessment results are being used to improve curriculum and instruction. In addition, principals in 63 percent of schools report that school results are compared with national performance (p. 101). Countries are also using assessment data to improve the monitoring of school and personnel performance.
Summary observations: Each of the systems described above has followed a different pathway for reform based on its institutional history, political decision-making, and bold interventions taken at critical junctures over time. Across countries, though, there is a clear pattern of using assessment information for accountability and for designing interventions to improve teacher management and support and school oversight and support.

24 Charter schools were also introduced in this program. The Qatar Supreme Education Council developed the charter for these independent corporatized schools (Watkins, 2006).


Though this is a dynamic portrait, several common directions are apparent in the experience of countries with established systems of assessment. First, assessment systems do not exist in isolation. In most countries, assessment systems were established on the basis of detailed and clearly defined curricular reform that stipulates the objectives for each grade level and the expected progress in skill and knowledge from grade to grade. Second, there are variations in how assessments are organized across countries. Finland's review of student learning is located mainly at the municipality level, and this decentralized monitoring carries different implications from assessment organized at the national level. In Chile and Singapore, the system did not duplicate ongoing examinations with an assessment system but over time ensured that these tests became more aligned with reviewing content knowledge, problem-solving, and application. Third, curricular reform and assessments are accompanied by realignment of school oversight and support, even as these depend on how personnel are managed and supported. Overall, this multifaceted set of interventions in each country focused not just on inputs but on processes and sustained engagement with the iterative course of policy development. These were accompanied by rigorous reviews of policy implementation and reporting, leading, in due course, to a new program or a redesign. In other words, a menu of evaluations of service delivery, especially school evaluations and teacher performance appraisals using student assessment information, completes the feedback loop to improve system functioning. Evaluations lead to further curricula revision and assessments, and to further adjustments in how teachers and schools are supported. For effective instruction, all of these pieces of the puzzle have to be harmonized and continuously fitted together for coherence.
3.2 Evolving Systems In most of the developing countries, student assessment systems are in the process of being established. This may also be the case for some of the middle-income countries and for a few of the more advanced economies. The potential and limitations that incomplete systems impose on the interdependent relationship between assessments and its use in programming for improved learning are discussed here. The analysis is severely constrained by the quality of information available on the reform experiences in these countries. Quantitative data, which would necessarily be input driven and easily measurable, is what is most available. In addition, there are detailed studies on the implementation and impact of development partner financing and programs in these countries. However, process-based analysis associated with government policy formulations, implementation, and monitoring of impact are rare in these countries. As a result, it is not easy to capture the complexity and multifaceted nature of this exercise. It must also be noted that this absence does not lead to the conclusion that there is no use of assessment data in these countries; only that there is no information about such use at this time. Most of the countries that have joined the Global Partnership for Education (GPE) fall into this category of evolving systems.25 The GPE is built on the principles of harmonization and coherence in the support provided by development partners to governments for education reform. The Education Sector Plans (ESPs) of countries that have joined the GPE are publicly available on the GPE Secretariat website. The policies, strategies and activities described in these sector plans are examined in this paper as they represent the intentions and potential use of student learning data for quality improvement and accountability. Within the GPE model, annual or bi-annual Joint Sector Reviews (JSRs) organized by the government and development partners are expected

25 Countries receive funds from the Global Partnership when they have finalized an Education Sector Plan (ESP) that outlines the government’s policies, strategies, and activities to improve the system. The sector plan goes through a process of “appraisal,” after which donors are expected to provide coherent and coordinated support for the implementation of these plans. 65 countries have joined the GPE.

to monitor sector plan implementation.26 Ideally, review reports on the extent to which ESPs are implemented would have been useful and relevant to this discussion. However, most of these review reports are not available. For the purposes of this paper, the plans in English and French with an end-date of 2017 and beyond are analyzed.27 ESPs from federal states are not included. Sector plans are expected to have at a minimum (i) a vision for sector development, (ii) a sector diagnosis describing the status, prospects, and constraints, (iii) a list of interventions to address sector limitations and problems, and (iv) a framework for monitoring and evaluating sector outcomes and program implementation. There is a strong assumption that if countries diagnose educational sector issues appropriately, the interventions will also be suitably designed and outcomes achieved.

The vision statements found in the country ESPs are broad and general, conveying the country’s intention to raise quality, bridge equity gaps, build knowledge and life skills, etc. For example, Zimbabwe’s Ministry of Education (MoE) aims ‘to be the leading provider of inclusive, quality education for socio-economic transformation by 2020’ (Zimbabwe, MoE, 2011, p. 24). Uzbekistan’s vision is to develop a national model of personnel training in order to form harmoniously developed persons that are adapted to modern society and lifelong learning (Uzbekistan, MoE, 2013). Cameroon intends to train its citizens, rooted in culture but open to the world, to be respectful, imbued with dignity, values, honor, and integrity, supporting regional integration, and civic responsibility (Cameroon, MoE, 2013). None of the plans, however, translates these broad vision statements into grade-appropriate curriculum content and objectives or discusses their implications. It is likely that countries have gone through curricula development and that the plans simply do not capture this.
As a result, connections between curricula and assessment frameworks are not known.

The ESP sector diagnosis primarily focuses on the analysis of the country’s progress on outcomes. This includes an analysis of enrollment, transition, retention, and completion rates examined in the light of the student population and in the context of equity. The identification of out-of-school children and the magnitude of school expansion required for universal enrollment are included in this diagnosis. Relevant to this discussion is whether there is an analysis of learning levels. Most of the plans reviewed did include an analysis of learning levels and even the performance of the country in regional assessments. Based on the forthcoming publication by the GPE Secretariat reviewing 41 sector plans (2017):

- 83 percent of the ESPs identified low learning levels as an issue; and
- 78 percent analyzed learning achievement results from national assessments.28

General causes for the low levels of learning are stated (with brief discussions), such as the lack of trained teachers, the need for new curricula, few textbooks in the classroom, and high pupil-teacher ratios. Even when plans do discuss weak learning achievement results, what this means for sector reform is vague. No connections are made between assessment data and the plan’s vision statement, or to the curriculum being implemented in schools. Perhaps the notion, prevalent over the last decade, that assessments test generic skills has reduced the perceived importance of alignment with the syllabus being taught in classrooms.

26 JSRs were implemented initially in the multi-donor supported District Primary Education Program in India, described in the next section of this paper. JSRs, led jointly by the government and development partners, are expected to enable consistent and systematic monitoring of program implementation. This model was transplanted to many countries in Sub-Saharan Africa through study tours to India. JSRs of sector plan implementation were expected to bring ownership and accountability to ministries of education and development partners. Whether this expectation has been met is unclear, partly because of the complexities and lack of specificity in the interventions described in the plans, and partly due to the limited expertise available in conducting implementation research and analysis. Moreover, the effectiveness of the JSR mechanism depends on whether timely and relevant data on plan implementation are available to identify constraints and take action. Within the framework of the project design in India this did take place: constraints were identified and actions taken.
27 Annex 1 includes the list of sector plans reviewed.
28 The GPE Secretariat’s forthcoming report also states that 37 percent of the ESPs analyzed results from regional learning assessments, and 5 percent from international learning assessments. Annex 2 includes the SACMEQ and PASEC results of GPE countries participating in these tests.

There is no analysis of the status of schools and classrooms in terms of teaching and learning. Apart from teacher absenteeism, there is little information on, for example, the number of classrooms where teaching and learning are taking place satisfactorily or partially satisfactorily, or are fully dysfunctional. Moreover, it is unclear how administrators and school personnel (teachers and principals) are managed and whether the support provided is adequate. Such information would help answer the question of the nature and magnitude of reform required in this area. So even if student assessment results were carefully analyzed, given the limitations in sector diagnosis, one would not know where to begin and what the targets could be. If there were a quality rating of schools and personnel, appropriate and achievable targets could be identified for sector plan implementation.

The approach taken during sector diagnosis has implications for the list of interventions outlined in ESPs. Three levels of ESPs can be identified. The first level is evident in the ESPs of post-conflict and fragile countries, such as the Central African Republic and South Sudan, which include, as expected, a predominant focus on the provision of school inputs: classroom construction, furniture, curricula and textbook development, additional teachers, and teacher training. These initial inputs are necessary and non-negotiable aspects of promoting student learning. Together with these non-negotiables, it would be useful for a new or post-conflict country to begin making connections for accountability between inputs and the “soft” interventions, including curricula reform, student assessment, teacher and administrative staff management and support, and school oversight and support.
The second level of ESPs conveys the intention, but not the specificity, to improve learning: the interventions listed do include most of the important dimensions of quality, especially those associated with management and processes. The sequencing and connections, though, remain unclear and are therefore limited in terms of promoting harmonization and coherence. Three illustrative examples can be highlighted where it is difficult to discern the connections between critical interventions.

Niger will introduce mother-tongue instruction in the early grades, improve supervision in the classroom, supply relevant instructional materials, establish new recruitment and deployment policies, and work on curricula revisions (Niger, MoE, 2014).

To improve quality, Ghana (Ghana, MoE, 2012) intends first to reduce the textbook-to-student ratio. Second, the country will strengthen the internal monitoring and supervision of literacy and numeracy teaching in schools by improving school leadership, developing instructional materials, teaching in English and the Ghanaian languages, increasing time-on-task, and introducing learning assessments. Third, Ghana aims to revise curricula and textbooks, update syllabi, train teachers, and introduce examinations based on the new curricula. School report cards and an inspectorate (to reduce teacher absenteeism) will ensure accountability.

Tanzania (Tanzania, MoE, 2008) plans to improve education quality by recruiting and retaining teachers through an Educational Qualification Framework and a Teacher Development and Management Strategy. In-service training is to be provided for learner-centered pedagogy, and scholarship grants continued. A performance management system will also be established and the inspectorate system strengthened.

In the illustrative examples of Niger, Ghana, and Tanzania, even though the components that would strengthen assessment results are present, the interventions are fragmented.
First, due to the limitations in sector diagnosis, it appears as though the interventions will be implemented on a clean slate rather than building on decades of school improvement. If there were sufficient analysis of the institutions and processes that already exist, what is being proposed would be better understood and the insertion of student assessment data could be appropriately organized. Second, though a number of interventions are listed, it is not clear whether existing capacity levels in the system and the challenges inherent in the work culture of state institutions would allow implementation of new initiatives or changes to existing practice. Finally, the harmonization and coherence across interventions needed to impact teaching and learning in classrooms are lacking. For example, one is not sure what the connection would be between teacher performance management and in-service training in the Tanzanian plan, or how there would be alignment between increasing time on task and the proposed school report cards in the Ghanaian plan.

The third-level ESPs display a more coherent approach to education reform and the use of assessments to improve quality. Here are illustrative examples from four countries. The ESP for Bhutan includes the following (Bhutan, MoE, 2014):

- Curriculum revision to meet international standards.
- A revamped assessment system to fit with the curriculum.
- Teachers trained in a holistic approach to assessment.
- Teachers, principals, and schools to achieve minimum quality standards.

Cambodia, in response to the challenge of insufficient and weak student learning, will initiate the development of a national framework to ensure a quality monitoring and response mechanism…[to include] assessing student achievement through regular classroom testing and initiating student assessment in Grade 3 with a view to expanding to Grades 6 and 8, and enhancing the quality and relevance of learning through regular and systematic curriculum, textbook and learning materials review (Cambodia, MoE, 2014, p. 8).

Though performance reviews are not included in its list, Eritrea’s program will improve quality through curricula revisions focused on learning outcomes, the provision of revised textbooks, increasing the competencies of teachers and school leaders, improving teaching and time on task, and the tracking of learning through assessments (Eritrea, MoE, 2013).

Rwanda outlines a similar approach: At primary level, a continued focus will be placed on the acquisition of basic numeracy and literacy in the early grades. This will be monitored through national and systematic assessment of learning achievement in core subjects at key points. The results of assessment will be utilized to inform teaching practice to improve reading and numeracy skills. This will provide a strong foundation for students to move from ‘learning to read’ to ‘reading to learn’ across all subjects in the curriculum (Rwanda, MoE, 2013, p. 39).

While there appears to be harmonization and coherence in the interventions described in these plans, the limited sector diagnosis will pose problems in knowing where to start and how to build on existing systems, as well as in identifying goals for implementation. Furthermore, whether teacher management and support and school oversight and support are holistically addressed and appropriately aligned is unclear in these plans.
Information on two other initiatives, in Pakistan and Vietnam respectively, holds pointers to the iterative use of assessments in evolving systems.

The “Punjab Education Reform Road Map” was initiated in 2010 (Barber, 2013). Instead of a global approach to performance, the review of the 32 districts that participated in the program captures variation and detail on the status of schools in each district. Teachers were given guides and training, while district officials were held accountable for school performance. Administrators visited 97 percent of the schools each month. A variety of indicators were judged on four criteria: whether they were on or above trajectory, close to the set standard, attained some progress, or were significantly off track. The detailed information made available on the status of schools across the province allowed targeted and relevant attention to be given to schools and districts that were off track. Attention could take the form of additional support to teachers, training for principals, or ensuring better instructional materials and sufficient teaching time. The Program Monitoring and Implementation Unit, responsible for “stocktaking” every two months, made the connections between the different components critical to improving quality in classrooms, including instruction, school administration, school and teacher evaluations, and learning. This is only the beginning, but the regular monitoring and reporting to the highest levels of government on what has worked, what has not, and the actions to be taken based on this information bodes well for progress. This approach avoids one size fits all and enables the specific constraints in a school to be understood and acted upon. ASER data show increases in student learning of the basics of about seven percent (Barber, 2013).

The longitudinal Young Lives study on educational opportunity, inequality, and learning outcomes in Ethiopia, Peru, India, and Vietnam highlights the achievements of Vietnam. Most students in Grade 8 in Vietnam had mastered the basic skills when compared to the other countries (Rolleston et al., 2013). PISA scores for 2012 and 2015 also show Vietnam’s progress in learning (OECD, 2014) and equity (Glewwe et al., 2014). Though there is a decrease in average scores from PISA 2012, Vietnam was a top performer in science (525) in PISA 2015, and around average in reading (487) and mathematics (495). A possible explanation for this success is the systematic reform of school curricula starting in 2000 and the distribution of new textbooks based on this curriculum: The overall view is that this renovation of the general education curricula has met the objectives and requirements of educational content and methodology at different levels as laid down in the Education Law. The revised curricula has improved consistency in learning and has also facilitated continuation and development among levels…The revised curricula have made for better harmonization between a subject’s content and the teaching/learning methodology, and it has improved links between curriculum/textbook and teaching equipment. (World Bank, n.d., p. 21)

These activities were supported by the principle of "school-close-to-local people," and the ‘establishment of an enabling cultural environment that promotes and maintains all achievements already gained, is an integral part of the development of minimum knowledge foundation for the entire population’ (World Bank, n.d., p. 25). Clear expectations were broadcast for transition and curriculum mastery. Vietnam has tried to ensure a reasonable pupil-teacher ratio (PTR), with teachers and staff qualified according to the Education Law. Hardship allowances were provided, and equity is given importance in terms of salaries and allowances. Reforms are now being directed at pre-service and in-service training, and at the deployment and utilization of teachers and educational managers.

Summary observations: A few summary observations can be made on the use of learning assessments in countries with evolving systems. First, the sector diagnoses are rich in data on infrastructure, instructional aids, student and teacher numbers, and salaries. However, they offer limited analysis of the status of schools and of how personnel are managed. It is therefore difficult to judge the adequacy of the building blocks for quality education and how the parts fit together. Second, since the sector diagnosis is weak, the starting point for improving teaching and learning is vague. The starting point is critical here, as all these countries do have some sort of management system in place, often emanating from their colonial backgrounds (Clarke and Perrot, 2011; Mulkeen, 2007). Moreover, there will be strong cultural and cognitive affiliations to systems that have been in place for significant periods of time. To avoid failure, it is important for any new initiative to evolve from these systems that have existed for decades. Finally, most of these systems do have regular and intense testing of student learning in the examination system.

External partner support is driving the student assessments being undertaken across these countries, and the extent of ownership and sustainability is a concern as assessment becomes a parallel initiative. A fully established system of student assessment is likely to take time. In addition, for political buy-in (and thereby financial allocations), assessments at the higher grade levels are important, and these will take even longer to establish. In the meantime, in order to avoid the general and global approach to education reform evident across sector plans, the existing examination system could be used for targeted interventions for quality education. As noted earlier, both Chile and Singapore point to the use of examination data in their initial reform efforts, as the system progressed into a more robust review of student learning.


3.3 Dispersed Systems

Carrying out and using student assessment data is further complicated in a federal system with sovereign and independent sub-national entities responsible for education service delivery. Several countries fall into this category, such as India, Ethiopia, Canada, and the United States. This section focuses on the experience of two countries with a federal system of government, namely the United States of America and India. The United States (U.S.) represents an established assessment system and India, an evolving one. In both countries, education is the responsibility of both federal and state governments. The U.S. and India have had large-scale national or federal reform programs to improve education quality and student learning for nearly two decades. The U.S. has had both national (National Assessment of Educational Progress)29 and state assessments for more than 40 years. India, however, has had a national system of assessment (National Achievement Survey) for just over a decade. There is a push for states in India to have their own systems, and certain states such as Gujarat and Andhra Pradesh are already doing so. In addition, in the case of India, a non-government entity (Pratham) has been conducting assessments (ASER) since 2004.

United States: Initially, the states were fully responsible for education. The unsatisfactory performance of states on student assessments led to the Elementary and Secondary Education Act (ESEA) of 1965. The reforms undertaken have had a somewhat positive impact, evident in the PISA and TIMSS scores (Table 4). The PISA score in science (496) and reading (497) is above the mean score across countries, which is 493, but in math (470) it is lower than the overall mean score, which is 490. With regard to the TIMSS test scores, there is no change since 2011 except for Grade 8 math, which shows an increasing trend.
Considering the 1995 results, except for Grade 4 science, there is a steady increase in average scores.

Table 4: US results in PISA and TIMSS

PISA 2015 (score, with average 3-year trend in parentheses):
  Science: 496 (2)    Reading: 497 (-1)    Math: 470 (-2)

TIMSS 2015 (Trends in International Mathematics and Science Study):
  Math, Grade 4:    539 (no change from 2011; increase from 1995)
  Math, Grade 8:    518 (increase from 2011 and 1995)
  Science, Grade 4: 546 (no change from 2011 and 1995)
  Science, Grade 8: 530 (no change from 2011; increase from 1995)

Source: PISA, 2016; TIMSS, 2016
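The score comparisons discussed in the text are simple arithmetic and can be restated as a small check. The snippet below is purely illustrative: the dictionaries restate the US scores from Table 4 and the cross-country mean scores quoted in the text, and the variable names are this paper's own.

```python
# US PISA 2015 scores as quoted in Table 4, and the cross-country
# mean scores cited in the text (493 for science and reading, 490 for math).
us_scores = {"science": 496, "reading": 497, "math": 470}
mean_scores = {"science": 493, "reading": 493, "math": 490}

# Report, for each subject, how far the US score sits from the mean.
for subject in us_scores:
    gap = us_scores[subject] - mean_scores[subject]
    status = "above" if gap > 0 else "below"
    print(f"{subject}: {us_scores[subject]} ({status} the mean by {abs(gap)} points)")
```

Run as-is, the check confirms the pattern described in the text: science and reading sit slightly above the cross-country mean, while math falls well below it.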

The ESEA mandated federal government involvement with education provision in the states, leading to financial transfers.30 States were expected to demonstrate improvement by testing a single grade in primary, middle, and high school. Due to challenges with implementation and compliance, there were several reauthorizations of the ESEA. The 1994 reauthorization, called the “Improving America’s Schools Act,” required states to establish standards and align assessments accordingly.

29 The National Assessment Governing Board is appointed by the national-level Secretary of Education. The Board consists of state officials, academics, and representatives from business and the public (Greaney and Kellaghan, 2008).
30 Title I, Part A of the ESEA stipulates this and is often referred to as Title I funds.

Another significant reauthorization of federal involvement with the states was the “No Child Left Behind” law, introduced in 2002 in response to these continuing challenges (Education Week, 2016). The No Child Left Behind (NCLB) program emphasized, on the one hand, the setting of goals and measurement and, on the other, transparency and accountability. NCLB was meant to give a boost to education in the states, especially for children with English as a second language, children with special needs, and children from ethnic and racial minorities. States were required to test their students to establish a baseline in Grades 3 to 8 in reading and math. Disaggregation of results was a requirement. A “remedy cascade” of federal sanctions was introduced for failing schools, with a mandate for yearly progress toward 100 percent proficiency in reading and math scores by 2014. Sanctions relief was offered to states with failing schools if federal policies were adopted, such as the establishment of clear standards and teacher performance evaluations (American Enterprise Institute, 2015).

The NCLB encouraged states to increase the monitoring of student learning (Sawchuk, 2011) by paying attention to the assessment data of weaker groups in the state. States have used assessment data to examine their own systems of teaching and learning in the light of this information and to become accountable for reform (Dee et al., 2013; Gosnell-Lamb et al., 2013; Dee and Jacob, 2011). With regard to teaching and learning, the system started paying attention to the alignment of curriculum, instruction, and assessment; the amount of time spent on teaching core subjects; and the appraisal and professional development of teachers (Jennings and Rentner, 2006). Highlighting success, supporting schools that need improvement, and imposing consequences for nonperformance were part of the package of state initiatives that resulted from the NCLB.
In sum, ‘…states have also committed to better supports for educators to adapt to the new standards, better assessments to measure student learning, and better accountability systems to understand where schools are struggling and how to help them improve’ (Center for American Progress (CAP) and the Council of Chief State School Officers (CCSSO), 2014, p. 1).

The NCLB program also had its critics (Ravitch, 2009). The federal government was considered to be imposing common expectations across states, with the assumption that there existed a shared commitment and interest in reform. It was also assumed that states knew how to improve and had the capacity to implement change. The sanctions imposed on the states were strong and led to gaming of the system: there was inaccurate reporting of teacher qualifications and student achievement results and/or lowering of cut-off marks (American Enterprise Institute, 2015; Johnson and Howley, 2015). For instance, there was a provision for additional tutoring for weaker students, but in some states there were not enough tutors to meet the requirement. ‘The common denominator is that each of these seemed like a simple and sensible idea to congressional staffers but each became counterproductive when it hit the complexities of governance in the federal system’ (American Enterprise Institute, 2015, p. 4). An alternative approach was suggested: to introduce a “facilitating team” to identify reform strategies at the state level, considering each state’s traditions, politics, and socio-cultural environment.

As issues with the NCLB entered the discourse on education reform and the presidential term of George W. Bush (2001-2009) was ending, there was a natural slow-down of the program. Interestingly, by then NCLB had done enough to help states realize the importance of accountability and the need to build their systems based on extensive analysis, goal setting, and monitoring of learning.
State leadership started working together to establish their own reforms. The CCSSO, comprising state leaders, came together in 2011 to propose the Common Core State Standards (CAP and CCSSO, 2014), which outline shared curricular objectives across the states. To address bottlenecks within NCLB, a program entitled “Race to the Top” was introduced (Riddle, 2012). The Common Core and the Race to the Top programs highlight the following broad areas adopted by the states:31

31 Au (2016) describes a fracturing of the group that came together around NCLB and the introduction of the Common Core in the context of the Race to the Top.

- Measuring progress toward college and career readiness through multiple measures and more robust systems of assessment;
- Measuring and supporting school-based quality improvement;
- Rethinking state systems of support and intervention for struggling schools;
- Promoting resource accountability; and
- Promoting professional accountability of teachers and leaders.

In 2015, the national program was reauthorized as the “Every Student Succeeds Act” (ESSA), focusing on a more balanced approach that includes both accountability and improvement (Darling-Hammond, 2016; Mann, 2016). The reauthorization focuses on developing relevant indicators of school progress, strategies for supporting schools, and the use of evidence-based interventions. States develop their own reform programs and use federal resources flexibly to suit their own realities.

India: For several decades after independence in 1947, India played catch-up, trying to reduce the knowledge gap with the West and to increase its literacy rate.32 Since independence, school reform efforts were left to the states. Due to severe resource constraints and a resistance to the use of external donor funds, programs to improve education were scarce. Small-scale innovative programs supported by external donors were introduced in a few states (Clarke and Jha, 2006). Following the World Conference on Education for All held in Jomtien in March 1990, the District Primary Education Program (DPEP) was launched in 1994. This program, implemented by the Government of India (GOI) with the help of resources from external partners (multilateral and bilateral), was one of the first large-scale reform programs undertaken by GOI. Though it was financed by donors, the program was strongly aligned to India’s ethos of self-sufficiency and ownership (Bashir and Ayyar, 2001; Abadzi, 2002). The DPEP was considered to be conceptualized and implemented entirely by the Indian national and state governments.

An important goal for DPEP was to increase learning in math and language by 25 percentage points over baseline and to ensure a minimum score of 40 percent in other subjects. Another objective of the program was to build institutional capacity throughout the system by introducing administrative entities closer to the school (Block and Cluster Resource Centers). Other targeted outcomes of the program included an increase in enrollment and a reduction of the enrollment gap between the general population and groups such as girls and children from socially disadvantaged communities (the Scheduled Castes and Scheduled Tribes) to less than five percent. The program covered 271 low female literacy districts in 18 states (Hirshberg, 2002).

States were required to share program expenditure: 85 percent from federal and donor financing and 15 percent from state governments. States were expected to submit annual budget and expenditure plans for the funds. The federal government stipulated what the money could be spent on, which included infrastructure development, improvement in textbooks, in-service training for teachers, training for Village Education Committees (VECs), grants to teachers for preparing low-cost teaching aids, and support to VECs to improve the school environment. A cost ceiling of 33 percent was put on infrastructure development. A “State Implementation Society” was established parallel to each State Ministry of Education for the purposes of introducing an alternative implementation culture, financial ring-fencing (so that fund flows could be monitored and accounted for instead of disappearing into the state machinery), and ease of monitoring of implementation.

DPEP morphed into the Sarva Shiksha Abhiyan (SSA) in early 2000.33 Except for the proportionate contribution of states, which was increased to 25 percent, and the inclusion of upper primary grades (6 to 8) for support, the SSA essentially expanded the interventions introduced in the DPEP to the entire

32 This approach accounts for the significant advances that India has made in technology and the high proportion of graduates competitive in the various technology and professional fields.
33 The Rashtriya Madhyamik Shiksha Abhiyan (RMSA) is the Government of India’s secondary education program. The RMSA aims to ensure universal access to quality secondary education. The model being implemented is similar to the SSA.

country (UNESCO, 2015b). This program continues to operate, now supported only by the World Bank (World Bank, 2014). Both DPEP and SSA have contributed to the increase in resources available at the school and community levels, which swelled the number of schools, textbooks, and teachers in India’s education sector. DPEP and SSA built extensive sub-district-level institutions (Block and Cluster Resource Centers) critical for all training activities. Both have made substantial contributions to school participation, especially of girls and children from socially disadvantaged groups (UNESCO, 2015).

Learning, though, has continued to stagnate: ‘Learning outcomes for children in Indian schools are low and the learning trajectories for children who remain in school are almost flat… Reports indicate that learning achievement has been decreasing over the years….’ (World Bank, 2014, p. 2).34 More than 30 percent of children drop out before they reach Grade 6. Interventions associated with teacher professional development, which were diligently introduced across the subcontinent, have had little impact.35 This lack of impact is discernible in the student achievement data published each year by Pratham since 2004. According to the ASER survey (Pratham, 2014), there was an actual decline between 2008 and 2014 in the proportion of students in Grade 5 who can read a Grade 2 text, especially in public schools. Another longitudinal study, by Young Lives (2014), also shows a decline in learning of 14 percentage points for 12-year-olds in 2013 compared to 2006 in the state of Andhra Pradesh.

Summary observations: A few general comments can be made on the initiatives introduced by the U.S. and India and the benefits of using assessment data. The U.S. approach appears more appropriate in the context of federalism in that specific expenditures are not stipulated in the financial transfers to the states.
Therefore, states could design programs that were appropriately harmonized and targeted. In the SSA, the parallel system established by State Implementation Societies, and the specificity in how finances could be spent, weakened the capacity of state ministries of education to harmonize the system. In this context, it was a challenge for Indian states to bring together state curriculum objectives and the NAS assessment framework. At the same time, the uniform treatment of states in both countries led to what Bashir and Ayyar (2001) called the "tortoise-hare dilemma": a common reform approach applied across sovereign states that may be at different stages of education development. In the U.S., while states were given the freedom to design their own programs, it was assumed that sufficient capacity and expertise for implementation existed across states. This issue may have been addressed by the appointment of a facilitation team for each state. In India, the common approach did not allow for more nuanced, state-specific financing of education reform. Ownership and reform drive, which existed in some states ready to tackle more complex issues, were left untapped. Among the extensive reviews of the U.S. program, it has been found that the Common Core is too difficult for at-risk students and that the gap between high and low performers is increasing (Lauen and Gaddis, 2016). Critics want a move away from core subjects and standardized testing alone toward a more creative and non-standardized approach to learning (Tucker, 2014b; Wei et al., 2015). NCLB's preoccupation with testing is also seen to reduce attention to the complexities involved in evaluating school quality (Price, 2016).
Specific to India, and similar to the ESPs discussed above, even if the progressive states wanted to address the learning crisis by using assessment data in programming, it would be difficult to do so given the limited information available on district and state governments' school oversight and support functions and personnel management and appraisal systems. The first comprehensive teacher management and development studies in primary and secondary education in the states were undertaken only in 2015 (Ramachandran et al., 2015; Clarke and Singh, 2015, 2015a; Singh and Clarke, 2015). In the absence of this information on areas key to the effective functioning of schools for an entire school year, officials would not know what kind of reforms would be most appropriate or how they should be sequenced.36 In this scenario, the ability of communities and parents to influence the system using available achievement data (for example, ASER) is also weakened.37 Furthermore, given the limited harmonization and coherence across the key areas, even if parents were to express their concerns, what needs to be done would remain unclear. In both countries, the passing of the baton from one political party to another will likely bring substantial changes to the roles of the federal government and the states.

34 The most recent effort to address this situation is the introduction of "Continuous, Comprehensive Evaluation (CCE)" in classrooms. Again, this is a federal initiative, disconnected from other quality initiatives and constraints at the state level. Early reports indicate that states are struggling to implement the CCE, likely for the same reasons of insufficient harmonization and coherence with other streams of school functioning.
35 It could be that the program's substantial increases in enrollment have had a negative impact on learning. However, a recent study by Kassouf (2015) on Brazil found that large increases in enrollment were not associated with declines in learning proficiency.

Conclusion

Whether experience, scale, and agency have implications for the use of test data is unclear (Table 5). The performance of countries with established systems such as Singapore, Chile, and the U.S., which have had assessments for more than two decades, is similar to that of a country like Finland, which does not have standardized, publicly shared national-level assessments. Moreover, the length of time over which countries have conducted assessments does not appear to determine performance: consider Vietnam, where assessments have been done irregularly for about a decade, alongside others that have conducted regular census surveys of learning for significantly longer. The considerable variation in scale also makes it difficult to determine whether this dimension has implications for the use of data: countries test different grades and different combinations of subjects, and they are all high performers. It is likely that the impact of a country's experience, scale, and agency in carrying out assessments is linked to other forms of evaluation of student learning (such as how examinations and formative assessments are organized), which are not discussed in this paper.

Table 5: Experience, Scale, and Agency of countries reviewed

Country | Start Year | Frequency | Subjects | Grades | Agency
Chile | 1988 | Annually | Language, English, Math, Science, Social Studies, Physical Education, ICTs | 2, 4, 6, 8, 10, 11 | Board
China | 2007 | Annually | Chinese, English, Math, Science, Psychological Health, Physical Education, Health | 4, 8 | Research Entity
Finland | 1998 | Irregular | Language, Math, Physical Education | 6, 9 | Ministry
India | 2002 | Irregular | Language, Math, Science, Social Sciences | 3, 5, 8 | Research Entity
Mexico | 1994 | Annually | Language, English, Math, Physics, Chemistry, Biology, Social Studies, Geography, Civics, Ethics, Reasoning | 3, 4, 5, 6, 7, 8, 9, 12 | Ministry, Research Entity
Qatar | 2004 | Annually | Language, English, Math, Science, Social Studies, Muslim Education | 4 to 11 | Board
Singapore | 1960 | Annually | Language, Math, Science, Social Studies, Applied Subjects | 10, 12 | Board
United States | 1969 | Annually | Science, Social Studies, Geography, History, Civics, Arts, Technology, Engineering | 4, 8, 12 | Research Entity
Uruguay | 1996 | Annually | Language, Math, Science, Social Studies, Cognitive Behavior | 1, 2, 3, 4, 6 | Board
Viet Nam | 2001 | Irregular | Vietnamese, English, Math, Physics, Biology | 5, 6, 9, 11 | Ministry

Source: GEM, 2015

36 In some of the progressive states, especially in southern India, independent attempts were made to improve system functioning critical to improving quality (such as in Andhra Pradesh; see Jain, 2014). But there is no systematic analysis of what was and is being done.
37 Over the last decade, there has been an assumption that if sufficient information were available to parents and communities, they would demand reform. This has not been the case in India. The ASER survey, administered at the community level, with data now available for about 10 years, has not prompted deep reform, as is evident in the weak results. Randomized control trials of whether information can make a difference also show a mixed picture (Pandey et al., 2011; Keefer and Khemani, 2014).

Though the impact of experience, scale, and agency may not be clear, this analysis shows how the high-scoring TIMSS and PISA countries have used student achievement data to build accountability for quality education and to move toward better teaching and learning in classrooms. Starting with the recognition, based on test data, that their systems were weak and lacked accountability for student achievement, these countries worked on several key components to increase learning. Broadly, the emphasis began with the curricular objectives that underlie assessment frameworks, followed by test data informing three main components of system effectiveness – managing the provision of infrastructure and instructional aids, personnel management and support, and school oversight and support. The basic dimensions that constitute each component or building block of system effectiveness were listed earlier (Box 1). The model for this complex use of assessment data in these countries is portrayed in Figure 4.


Figure 4: Making use of assessments for quality education

In Figure 4, the school and classroom are at the center. Curricular objectives connected to the assessment system form the frame within which schools and classrooms are located. The three components of system effectiveness (lower right-hand corner of Figure 4), which directly affect teaching and learning in classrooms, are continuously harmonized or synchronized with each other. Harmonization takes place at the different administrative levels that have decision-making power and resources: information emerging from one component is considered in light of what emanates from the other two. For example, while classrooms might have sufficient time-on-task and formative evaluations of student learning, information on the abilities of the teacher, which is collected and organized in the teacher management component, is also important. Similarly, for effective instruction and curriculum implementation, teachers depend on good-quality instructional materials and infrastructure as well as school leadership. Coherence characterizes the system when this consistency extends across the entire system. One way to avoid disconnects and promote consistency would be for the three components identified in this paper –

infrastructure provision, managing personnel, and overseeing school functioning – to consistently inform the organization of curricula and assessments of student learning. Though each country's pathway to reform is different, the end result is a holistic system representing harmonization between the parts directed toward overall coherence. The whole matters, as do the component parts. Tucker (2014), in his introduction, summarizes points made by Ben Jensen and Kai-Ming Cheng on Shanghai's success in education. There is a tendency to 'focus on a few key facets of the system in a search for magic bullets, ignoring the fact that the success of these particular factors is made possible only by the myriad other features of the system that gives it its particular gestalt.' He adds that 'We cannot really understand how the Shanghainese built such an effective education system unless we understand it as a system that is more than the sum of its parts' (Tucker, 2014, p. 3).38 I submit that three aspects stand out in the experience of countries creating harmonized and coherent systems. First, countries can harmonize the parts when there is enough information available on each component: a balanced knowledge base is a basic building block for education reform. Second, countries may choose to emphasize one part over another depending on a variety of context-specific factors: systemic educational reform involves identifying and negotiating embedded constraints. Third, countries adopt different strategies and take varying amounts of time to put the components of the system together: education sector reform calls for deliberate but adaptive sequencing. An elaboration of these three aspects is in order.
Balanced knowledge base: To chart a course for improving quality, an important question is whether sufficient information is available on the inputs and processes associated with schools, personnel, and infrastructure at different levels of the system. The answer is, to a large extent, "yes" for the countries with established systems of assessment. Basic infrastructure and human resource management systems (with performance appraisals) are in place and, in most of these countries, extensive school monitoring systems are either in place or in the process of being established. All of these systems generate rich and detailed information incorporating student assessment data. With sufficient political will and resources, the pieces can be, and are being, put together. In countries with dispersed assessment systems, while the national system might be collecting information, putting the pieces together depends on the states, and there is considerable variation in the comprehensiveness of the information available at that level for advancing harmonization and coherence. That said, demand from the federal government, accompanied by financial incentives, provides a context for sub-national entities to create a coherent picture. In countries with evolving assessment systems, adequate information is largely available on infrastructure and instructional aids; this is not the case, however, for schools or personnel. The "Country Status Reports" in several countries in Sub-Saharan Africa formed the analytical base for developing sector plans. These reports provided extensive and critical data on equity trends in enrolment, transition, and completion, strengthening the capacity of governments to make decisions on access.
Augmenting this information with descriptive statistics and qualitative information on how schools and teachers were functioning and being managed across the system would have constituted a base for relevant design and planning for quality education. School report cards, now being produced in many countries (Cheng et al., 2016), are trying to provide more detail. However, these reports may not include sufficient information on the more process-oriented dimensions, such as how the oversight of schools takes place or how data on student test performance inform personnel management. Moreover, it is unclear whether the information from these reports is being consolidated to influence decision makers.

38 Scholars discussing the experience of Latin America, Finland, Singapore, etc. convey similar cautions.


The search for certainty and conclusiveness about what explains and raises outcomes using quantitative data is also partly responsible for the limited information available in countries with evolving systems. Recent reviews of these studies were undertaken by Glewwe (2013) and Evans (2015).39 These studies, while significantly resource-intensive, did not provide information on the existing system – especially the fit between curricula and assessments, what was going on in schools, and how personnel were being managed. As Tucker points out when discussing Shanghai's success: 'People educated in the analytical methods of the sciences typically analyze everything, decomposing systems into their constituent parts and try to estimate the contribution of each to the effect on student achievement. By all means, do that, say Cheng and Jensen, but, unless you grasp holistically the way the whole comes together, unless you grasp the motivating spirit of the system, you do not really understand anything very important' (2014, p. 3).

A balanced knowledge base for managers across the system (initial sector diagnosis and implementation analysis) would allow decision makers more opportunity to use assessment and examination data and to consistently fit the parts together for harmonization and coherence across the system, thus providing a platform for effective teaching and learning.

Negotiating embedded constraints: In countries with established systems, depending on the history of education, technical capacity, and the political, socio-cultural, and economic context, there are consistent and ongoing negotiations for quality education over a variety of context-specific, embedded constraints. The initial emphasis has largely been on the alignment of curriculum and assessment. Countries began by looking at their examination systems and adjusting the content to become more criterion-based. Undertaking the process of aligning curriculum content with the testing of student learning in countries with evolving systems will require considerable negotiation. Historically, in many countries, it is the examinations that have been closely aligned with curricula. Since no single assessment has "dominant scientific superiority" (Wagner, 2010), an efficient approach would be to review and adjust examinations (which are connected to curricula) to become criterion-referenced, or to include a proportion of norm- and criterion-referenced items in these tests.40 This approach would also take into account the cost of doing assessments (Wagner et al., 2011), especially where a parallel system operates with both examinations and assessments. As in other countries with established systems, this strategy would allow governments to begin using data emanating from examinations to guide reform even as robust assessment systems are put in place. What happened alongside or after curricula-assessment alignment seems to vary.
In some cases, it involved the system's capacity to review teacher performance and to conduct appropriately positioned teacher support activities (for example, in-service training); in other cases, it was the need to improve school performance monitoring and develop programs to improve school functioning. In countries where assessments are still evolving, the emphasis has been mainly on inputs – infrastructure, community mobilization, recruiting sufficient teachers, and providing enough textbooks. Negotiations have often focused on increasing the resource envelope and building capacity for providing basic inputs. These countries are only beginning to address the embedded constraints (political, cultural, economic, and technical) associated with establishing school oversight and teacher management systems for accountability. Furthermore, as more fine-grained data begin to be generated, designing targeted and relevant programs to improve teaching and learning, rather than generic interventions, will be complex and challenging but essential for student learning.

Deliberate yet adaptive sequencing: Countries have taken different amounts of effort and time to harmonize components and ensure coherence. The process always seems to begin with country leadership deciding that it is time to do something about the learning crisis; for harmonization to start and proceed efficiently, political buy-in is critical. Once reform begins, in countries with established assessment systems and complete databases of input and process information, managers are able to comprehend the extent of deficits and the effectiveness of implementation, and respond accordingly. In other words, when parts are perceived to get out of sync based on assessment data, a knowledge base exists for making corrections or conveying the need for a complete redesign to get back on track. Reviewing sector plan implementation will be invaluable for countries with evolving assessment systems as they progress toward quality education. As discussed above, there is no perfect system, and countries across the globe are consistently trying to put the pieces together deliberately and adaptively. Data on implementation enable continuous review of achievements and bottlenecks associated with learning outcomes, focusing attention on what implementers and development professionals are actually doing and have control over. External reviews of activities being implemented for quality education also reveal constraints that are not immediately observable to the insider.

39 Randomized Controlled Trials (RCTs) have gained significant popularity over the last decade, well beyond the earlier production function approach. Verger and Zancajo (2015), in their review of Glewwe's work, highlight the limitations of RCTs in education. First, considering the complexity of enabling learning, it would be difficult to "control" for the totality of individual and contextual characteristics. Second, the focus is narrow, mainly on school-level policies and incentives, and systemic issues are difficult to include in this kind of research. Finally, RCTs are often restricted to economic justifications and ignore findings from other disciplinary approaches that influence reform in the education sector.
40 In this way, the predominance of rote teaching and learning in countries with evolving systems could also be addressed, as the revised high-stakes examinations would demand this.
Essentially, the intention of wanting to know why learning is not taking place and what can be done assumes a posture of deliberateness in problem-solving and an openness to adaptation – both enabling conditions for making progress toward quality education and accountability for learning. This paper has tried to capture pathways in the use of student learning data for accountability and for improving education quality in the classroom. Curriculum reform and alignment with assessment frameworks form the basis, influencing the key components of this task: infrastructure and instructional aids management, personnel management and support, and school oversight and support. Facilitating and sustaining the interactive process of reform for harmonization and coherence across these components, based on assessment data, is critical for a well-functioning education system.


Annex 1

Bhutan Government, Ministry of Education. 2014. Bhutan Education Blueprint 2014-2024: Rethinking Education. Thimphu, Bhutan: Ministry of Education.
Burkina Faso Government, Ministry of Education. 2012. Programme Sectoriel de l'Education et de la Formation, 2012-2021, Version finale. Ouagadougou, Burkina Faso: Ministry of Education.
Burundi Government, Ministry of Education. 2012. Plan sectoriel de développement de l'éducation et de la formation, 2012-2020. Bujumbura, Burundi: Ministry of Education.
Cambodia Government, Ministry of Education. 2014. Education Strategic Plan, 2014-2018, Draft. Phnom Penh, Cambodia: Ministry of Education.
Cameroon Government, Ministry of Education. 2013. Document de Stratégie du Secteur de l'Education et de la Formation, 2013-2020. Yaoundé, Cameroon: Ministry of Education.
Central African Republic Government, Ministry of Education. Plan de Transition 2014-2017. Bangui, Central African Republic: Ministry of Education.
Congo Government, Ministry of Education. 2015. Stratégie Sectorielle de l'Education 2015-2025. Brazzaville, Congo: Ministry of Education.
Djibouti Government, Ministry of Education. 2014. Plan d'Action de l'Education 2014-2016. Djibouti: Ministry of Education.
Democratic Republic of Congo Government, Ministry of Education. 2015. Le système éducatif congolais: Diagnostic pour une revitalisation dans un contexte macroéconomique plus favorable, 2016-2025. Kinshasa, Democratic Republic of Congo: Ministry of Education.
Eastern Caribbean States. 2012. Organization of Eastern Caribbean States Education Sector Strategy, 2012-2021.
Eritrea Government, Ministry of Education. 2013. Education Sector Development Plan 2013-2017. Asmara, Eritrea: Ministry of Education.
Gambia Government, Ministry of Education. 2013. Draft Education Sector Strategic Plan 2014-2022. Banjul, Gambia: Ministry of Education.
Ghana Government, Ministry of Education. 2012. Education Strategic Plan 2010-2020, Volumes 1 and 2. Accra, Ghana: Ministry of Education.
Guinea Government, Ministry of Education. 2015. Programme Sectoriel de l'Education 2015-2017. Conakry, Guinea: Ministry of Education.
Guyana Government, Ministry of Education. 2014. Guyana Education Sector Plan 2014-2018. Georgetown, Guyana: Ministry of Education.
Kenya Government, Ministry of Education. 2012. National Education Sector Plan 2013/2014-2017/2018. Nairobi, Kenya: Ministry of Education.
Kyrgyz Republic Government, Ministry of Education. 2012. Education Development Strategy of the Kyrgyz Republic for 2012-2020. Bishkek, Kyrgyz Republic: Ministry of Education.
Liberia Government, Ministry of Education. 2010. The Education Sector Plan of Liberia: A Commitment to Making a Difference, 2010-2020. Monrovia, Liberia: Ministry of Education.
Malawi Government, Ministry of Education. 2008. National Education Sector Plan 2008-2017. Lilongwe, Malawi: Ministry of Education.
Mauritania Government, Ministry of Education. 2011. Programme National de Développement du Secteur Educatif, 2011-2020, Plan d'Action Triennal, 2012-2014. Nouakchott, Mauritania: Ministry of Education.
Niger Government, Ministry of Education. 2013. Programme Sectoriel de l'Education et de la Formation, 2014-2024. Niamey, Niger: Ministry of Education.
Rwanda Government, Ministry of Education. 2013. Education Sector Strategic Plan 2013/14-2017/18. Kigali, Rwanda: Ministry of Education.
Senegal Government, Ministry of Education. 2013. Programme d'Amélioration de la Qualité, de l'Equité et de la Transparence, Secteur Education Formation, 2013-2025. Dakar, Senegal: Ministry of Education.
Sierra Leone Government, Ministry of Education. 2013. Education Sector Plan, 2014-2018. Freetown, Sierra Leone: Ministry of Education.
South Sudan Government, Ministry of Education. 2012. General Education Strategic Plan, 2012-2017. Juba, South Sudan: Ministry of Education.
Tanzania Government, Ministry of Education. 2013. Education Sector Development Plan 2008-2017. Dodoma, Tanzania: Ministry of Education.
Tajikistan Government, Ministry of Education. National Strategy of Education Development of the Republic of Tajikistan till 2020.
Timor-Leste Government, Ministry of Education. 2011. National Education Strategic Plan, 2011-2030. Dili, Timor-Leste: Ministry of Education.
Togo Government, Ministry of Education. 2014. Plan Sectoriel de l'Education 2014-2025: Amélioration de l'accès, de l'équité et de la qualité de l'éducation. Lomé, Togo: Ministry of Education.
Uzbekistan Government, Ministry of Education. 2013. Education Sector Plan, 2013-2017. Tashkent, Uzbekistan: Ministry of Education.
Zimbabwe Government, Ministry of Education. 2011. Education Sector Plan, 2011-2015. Harare, Zimbabwe: Ministry of Education.


Annex 2
SACMEQ SCORES – Anglophone Sub-Saharan Countries

Country | SACMEQ I (1995-1999) Reading | SACMEQ II (2000-2004) Reading | SACMEQ II (2000-2004) Mathematics | SACMEQ III (2006-2011) Reading | SACMEQ III (2006-2011) Mathematics
Kenya | 543 | 547 | 563 | 543 | 557
Lesotho | – | 451 | 447 | 468 | 477
Malawi | 463 | 429 | 433 | 434 | 447
Mozambique | – | 517 | 530 | 476 | 484
Swaziland | – | 530 | 517 | 549 | 541
Tanzania | – | 546 | 522 | 578 | 553
Zanzibar | 489 | 478 | 478 | 537 | 490
Uganda | – | 482 | 506 | 479 | 482
Zambia | 477 | 440 | 435 | 434 | 435
Zimbabwe | 504 | – | – | 508 | 520

Source: SACMEQ, 2015

PASEC SCORES – Francophone Sub-Saharan Countries Level
