Program on Education Policy and Governance Working Papers Series

Understanding the Divergent Trends in PISA Test Results for Poland and the Czech Republic

Mikolaj Herbst, University of Warsaw
Daniel Munich, CERGE-EI
Steven G. Rivkin, University of Illinois at Chicago, University of Texas at Dallas, and NBER
Jeffrey C. Schiman, University of Illinois at Chicago

PEPG 12-06  

July 2012

VERY PRELIMINARY DO NOT CITE

I. Introduction

Fifteen-year-old students in the Czech Republic outscored their peers in Poland on the 2000 PISA examinations in both mathematics and reading, but subsequent test score increases in Poland, together with decreases in the Czech Republic after 2003, reversed that order by 2009. Many credit the 1999 education reform in Poland, which delayed tracking until 10th grade and substantially increased class time devoted to academic subjects, for the rise in Polish test scores. However, a number of other changes in Poland during this period could also have affected school quality and student achievement, including the adoption of national




standardized tests in 2002, the announcement that all 12th graders would have to pass a national mathematics examination at the conclusion of secondary school beginning in 2010, an increase in teacher salaries, a sharp enrollment decline that caused a roughly 30 percent fall in the average student-teacher ratio, and a large expansion in university enrollment that may have altered expectations and motivation to achieve academically in primary and lower-secondary school.

The Czech Republic exhibits more complex trends in mathematics and reading achievement, suggesting that no single factor accounts for the observed changes. We would expect the observed decline in the student-teacher ratio and increase in time devoted to academic subjects to increase achievement, the opposite of the substantial test-score decline between 2006 and 2009. Factors that potentially offset this effect are an increase in the share of fifteen-year-olds in 9th rather than 10th grade, a decline in average parental education for cohorts born immediately following the transition, a fall in relative teacher salaries, and a school decentralization reform that may have devolved authority to administrative bodies with little capacity for such decision making.

In addition, a close inspection of the PISA data raises concerns about the accuracy with which they capture national trends in academic achievement in Poland and the Czech Republic and about potential deficiencies of the PISA data more generally. Deviations between reported parental education distributions in PISA and in Czech and Polish national surveys indicate problems with student reports, the sampling and weighting schemes, or both. In Poland, discrepancies between student-reported and parent-reported educational attainment appear to be systematically related to test scores. Moreover, the form of the parental education question has changed over time, as has the approach to reweighting due to survey non-response. Although potentially beneficial, such changes could compromise comparisons over time.


In this paper we undertake a comprehensive investigation of achievement trends in Poland and the Czech Republic in order to better understand the divergence in achievement trends. Our objectives are two-fold: to understand the contributions of specific factors to achievement changes in these two countries and to learn more about the deficiencies and fragility of the PISA data. In order to accomplish these objectives we supplemented the information in PISA with national census and labor force survey data on educational attainment and salaries for teachers and non-teachers, administrative data with information on school operations, and institutional detail on education reforms in the two countries. Purposeful sorting of students and school personnel into schools and the non-random timing and adoption of education reforms certainly complicate identification of the effects of specific factors. The regressions illustrate the conditional relationships between included factors and achievement. In the case of class time we build on Lavy (2010) by using differences in the time devoted to mathematics and reading instruction to estimate the effects of class time. Finally, we will examine test score patterns in order to learn more about the potential importance of reforms that cannot be easily quantified. The next section describes the data used in the analyses, focusing on the PISA test information and sampling methods. Section 3 presents national trends in test scores, maternal education, and school inputs, showing changes over time in all variables by maternal education and school type. Section 4 reports the regression analysis of test scores, focusing on the school characteristics that exhibit substantial changes over time. This section also builds on Lavy (2010) and conducts a student fixed-effect analysis of the effects of instructional time on achievement. Section 5 describes potential deficiencies in the sampling methods and potential implications for


the results, and Section 6 summarizes the analysis and considers implications for education policy and use of the PISA data.

II. Data

This study combines several sources in order to paint a richer picture of the changes over time in schooling outcomes and systems in the Czech Republic and Poland. The PISA data provide the primary source on students and schools, and government censuses, labor force surveys, and other administrative data provide additional information on labor markets and demographics. We describe the PISA data in detail, focusing on the structure of the assessments, sampling methodology, and evolution of question formats and structure.

Beginning in 2000, the Programme for International Student Assessment (PISA) tested 15-year-old students from 43 different countries in mathematics, science, and reading. The test was administered again in 2003, 2006, and 2009. By 2009, students from 65 countries around the world took the test. An important feature of the PISA test is that it focuses on applications of knowledge rather than memorization, which may paint a more accurate picture of the type of skills most closely related to economic outcomes.

Student and school surveys provide a wealth of information about home and school life. Each tested student responds to questions about family, home, study habits, school, and classroom instruction, and a representative from each school answers questions about the principal, teaching staff, school resources, curriculum, school climate, and policies. In 2006 and 2009, Polish parents responded to questions similar to those answered by their children, enabling comparisons between student and parent answers.


The PISA sample is organized by age rather than grade, meaning that there is substantial variation in grade distributions across countries and over time. Poland, which concentrates on students in 9th grade, provides an exception to this approach. Therefore the increasing rate at which families delay school entry appears as an increase in the share of students in 9th grade in the Czech Republic but not in Poland.

PISA uses a two-stage stratified sampling design in order to produce nationally representative statistics and have adequate sample sizes for groups of particular interest. In the first stage, at least 150 schools within the country are selected at random from each stratum. Following this selection, thirty-five students are sampled randomly at each school. School sampling weights reflect the inverse probability of a school being chosen for the sample, and student sample weights adjust for non-response. As we illustrate below, the method for adjusting for non-response changes over time in order to account for differences in attrition by gender and grade.

Calculations of descriptive statistics and regression coefficients using the assessment data also require the use of replicate weights in addition to the sampling weights. Adams and Wu (2002) describe the treatment of test scores and generation of standard errors. Given the finite sample sizes in each country, the generation of unbiased standard errors requires the use of these weights.
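To make the role of the replicate weights concrete, the following is a minimal sketch of a replicate-weight standard error for a weighted mean. It assumes the balanced repeated replication scheme with Fay's adjustment (factor 0.5, 80 replicates) used in the PISA public-use files. The data, the function names, and the way the toy replicates are built are all hypothetical: real replicates come from pairing schools within strata, not from random draws, and the sketch ignores plausible values.

```python
import math
import random


def weighted_mean(values, weights):
    """Weighted mean of student-level values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def brr_fay_standard_error(values, final_weights, replicate_weights, fay_k=0.5):
    """Return (estimate, standard error) from Fay-adjusted replicate weights.

    replicate_weights is a list of weight vectors, one vector per replicate.
    """
    estimate = weighted_mean(values, final_weights)
    replicate_estimates = [weighted_mean(values, rw) for rw in replicate_weights]
    n_reps = len(replicate_estimates)
    squared_deviations = sum((r - estimate) ** 2 for r in replicate_estimates)
    variance = squared_deviations / (n_reps * (1.0 - fay_k) ** 2)
    return estimate, math.sqrt(variance)


# Toy data (hypothetical): 200 students in 20 schools of 10 students each.
rng = random.Random(0)
school_ids = [i // 10 for i in range(200)]
scores = [rng.gauss(500, 90) for _ in range(200)]
final_weights = [rng.uniform(50, 150) for _ in range(200)]

# Toy replicates: each school's weights are scaled by 0.5 or 1.5 at random.
# This only mimics the form of Fay-adjusted weights; it is not a balanced design.
replicate_weights = []
for _ in range(80):
    factors = {school: rng.choice((0.5, 1.5)) for school in range(20)}
    replicate_weights.append(
        [w * factors[s] for w, s in zip(final_weights, school_ids)]
    )

estimate, se = brr_fay_standard_error(scores, final_weights, replicate_weights)
print(f"weighted mean = {estimate:.1f}, replicate standard error = {se:.1f}")
```

Because whole schools are reweighted together, the replicate estimates vary with the school-level clustering of scores, which a formula that treats students as independent draws would miss.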

III. Trends in Test Scores, Family Background and School Characteristics

This section describes changes over time in mathematics and reading achievement scores, family background, and selected school characteristics for students in the Czech Republic and Poland. Based on both extensive evidence of its importance as a determinant of educational attainment and its high response rate, we use maternal education as a proxy for family background. The school characteristics include class-time devoted to mathematics and language arts instruction and the pupil-teacher ratio. We focus on these variables because of their availability, central role in education policy discussions, and substantial changes observed in these countries.

Figure 1 presents trends in mathematics and reading scores for 9th and 10th graders in the Czech Republic and 9th graders in Poland. In 2000, Czech students outscored their Polish peers by 37 points in mathematics and 25 points in reading, but in 2009 just the opposite was true: Polish students outscored Czech students by 3 points in mathematics and 23 points in reading. Although Polish reading scores were already higher in 2003, it was not until 2009 that Polish math scores exceeded those in the Czech Republic. Figures 2 and 3 illustrate that these test score movements occurred throughout the distribution. In mathematics Polish gains at the lower end of the distribution preceded those at the upper end, while in reading Polish gains appear to be more consistent and evenly distributed across the distribution.

Table 1 reports the distribution of maternal education by year and country based on student surveys. Because of the very small number of children whose mothers have less than a lower secondary school education, we divide students into four categories: lower secondary school or lower; upper secondary school without a matriculation examination; upper secondary school with a matriculation examination; and university, the latter of which includes technical college and university graduates.1 Perhaps most striking is that neither country exhibits monotonic trends between 2000 and 2009, with the shares of university graduates and graduates of upper-secondary schools with matriculation exams fluctuating from year to year. The decline in the share with only a lower-secondary education or less appears to be larger in the Czech Republic, and the Czech Republic also shows a decline in the share that completed upper-secondary programs without examinations. Between 2000 and 2009 the share that completed an upper-secondary program with exams, university, or technical college rose from roughly 60 to 67 percent in the Czech Republic but fell from 66 percent to 63 percent in Poland, suggesting that changes in family background contributed little, if anything, to the improvement in Polish achievement.

1 This includes ISCED categories 5A, 5B, and 6.

Importantly, changes over time in both the wording and structure of parental education questions give rise to some skepticism about the validity of these figures. In 2000 students were first asked if their mother completed upper secondary school. They were asked to tick one of the following: did not attend school; primary only; lower secondary only; upper secondary without exam; upper secondary with exam. Then students were asked if the parent completed any substantial post-secondary school, yes or no.

In 2003 the questions were restructured. Students were asked “(W)hich of the following did your mother complete at school?” They were asked to check off each, beginning with upper secondary with examinations and ending with none of the above. Then students were asked if their mother had any of the following post-secondary degrees: university; other substantial post-secondary; insubstantial post-secondary.

In 2006 the questions were again restructured. First, students were asked what was the highest level of secondary schooling completed by their mother. They were asked to check one box only and to ask the test administrator for help if they were not sure which box to choose. The post-secondary question had a format quite similar to that of 2003, except that students were again encouraged to solicit help if they were uncertain. This format was basically retained for 2009.
In sum, the substantial modifications to question structure and instructions for these later years raise doubts about the comparability of responses across years and about the information gleaned in the early surveys.

We now describe changes over time in the aforementioned school characteristics in Table 2. Given the high possibility that maternal education influences schooling choices, we present the patterns by maternal education category. We also separate 9th and 10th grade students in the Czech Republic, because many students transition to an upper-secondary school or apprenticeship program between these two grades.

The most striking changes early in the period occurred for class-time devoted to mathematics and reading instruction in Poland following the reform that eliminated tracking into vocational programs in 9th grade. The average number of minutes of mathematics instruction per week in Polish classrooms rose by 80 minutes for students with less educated mothers, 70 minutes for students whose mothers had completed an upper secondary school with examinations, and over 60 minutes for students with university educated mothers. This across the board increase denotes a major change in the structure of schooling; a similar though not quite as pronounced pattern emerges for language arts.

Czech 9th graders began at a much higher quantity of math instruction in 2000, but the much larger increases in Poland leave 9th graders in the two countries with similar distributions of class-time devoted to these two subjects. Tenth grade Czech students do experience slightly larger increases in minutes devoted to mathematics and language arts instruction for all maternal education levels. Nonetheless, there remain substantial differences by maternal education even in 2009.

Similar to the case for instructional time, changes over time in the pupil-teacher ratio experienced by Poland and the Czech Republic exhibit some differences and some similarities.


In Poland the pupil-teacher ratio rises between 2000 and 2003 before falling back below its 2000 level in 2006 and 2009. Note that the magnitude of the decline rises along with maternal education. There is no similar decline in the Czech Republic, and the magnitude of the decline in the Czech Republic decreases rather than increases monotonically with parental education. This reverses the ordering of 10th grade average pupil-teacher ratio in the Czech Republic.

Disaggregation by school type illuminates the sources of the observed changes in the Czech Republic. Table 3 shows that in 10th grade the pupil-teacher ratio remained quite stable in the gymnasiums and vocational upper secondary schools with matriculation examinations but fell by forty percent in the less selective school sectors. Apparently a large enrollment decline not accompanied by a proportional decline in teaching staff led to a shift of students toward more selective schools with lowered entry cut-offs and a decline in the student-teacher ratio in the less selective sector. In fact, between 2000 and 2009 the proportion of 10th grade students in apprenticeship and vocational programs declined 34 percent while the proportion in all other school types increased. Note that class-time devoted to academic subjects increased more in the least and most selective sectors, leaving the vocational upper secondary schools with examinations with the lowest amount of instructional time devoted to mathematics and reading. This indicates a continued emphasis on career preparation despite the increasing attention paid to mathematics and language arts education.

IV. Test Score Changes, Family and School Characteristics

A clearer understanding of the contributions of family background, class time, the pupil-teacher ratio, and other factors to the observed test score changes over time would be quite valuable for education policy. In general the PISA data do not provide exogenous sources of


variation with which to identify the causal effects of specific factors. However, because there is within student variation in class-time devoted to each subject, the use of student fixed effects regressions to identify the effect of class-time is feasible. We therefore build on the work of Lavy (2010) and estimate such models. Because maternal education is likely to affect achievement directly through family inputs and indirectly through the choice of school we begin by presenting test scores by maternal education. Table 4 shows that both mathematics and reading average achievement are much higher for students with mothers that complete an upper secondary school requiring matriculation examinations or university. The differential by maternal education declines over time in Poland (excluding the small and declining number of students whose mothers do not have some kind of upper-secondary degree). Between 2000 and 2009 the average mathematics scores increase by 22 points for those with university educated mothers, 27 points for those whose mothers completed an upper-secondary school with examinations and 31 points for those whose mothers completed an upper-secondary school without examinations; a very similar pattern emerges for reading. Note that the increases in class time take on a similar pattern, while the decreases in student-teacher ratio rose along with maternal education. Mathematics and reading test scores in the Czech Republic fluctuate much more than in Poland, and differences emerge by subject, grade and maternal education. In the large maternal education category of upper-secondary education without examination average mathematics achievement increases between 2000 and 2003, remains constant between 2003 and 2006 and then declines substantially between 2006 and 2009, decreasing by almost 20 points in 9th grade and by 13 points in 10th grade. 
This pattern of an increase in average math achievement between 2000 and 2003 and a large decline between 2006 and 2009 also emerges for children with


mothers who completed upper-secondary school with examinations. In contrast, children with university educated mothers experience larger average decreases in mathematics achievement between 2003 and 2006.

Reading achievement in the Czech Republic exhibits a somewhat different pattern than mathematics achievement. First, it decreases in all maternal education categories and both grades between 2000 and 2003. Second, there are pronounced differences by grade in the changes observed between 2006 and 2009: in 10th grade all but the category of upper-secondary school with examinations show an increase during this time, and scores in this category fall only slightly between 2006 and 2009. In 9th grade, the upper-secondary categories show large declines in average achievement, and the university category also shows a decrease, albeit smaller. Only the lower secondary and below category shows a large increase in both 9th and 10th grades.

The divergence between mathematics and reading score trends and differences by grade suggest that multiple factors contribute to achievement changes in the Czech Republic, and differences by school type may provide guidance as to the importance of particular factors. Table 5 shows somewhat more systematic patterns than Table 4. Math test scores increase substantially between 2000 and 2003 for all school types and both grades, remain fairly stable between 2003 and 2006 except for the less academically oriented schools in each grade, where achievement declines, and then decline precipitously between 2006 and 2009 in all types except the least academic and rigorous 10th grade schools, where achievement remains relatively constant after falling by 18 points between 2003 and 2006. In reading, achievement in gymnasia increases substantially between 2000 and 2006, while achievement in other schools decreases during this period.
Following 2006 achievement in gymnasia declines back to levels below those observed in 2000, 9th grade scores for students in


primary and lower secondary schools continue to fall, 10th grade scores for students in vocational upper-secondary schools with exams decrease substantially, and achievement in the least academic and rigorous schools increases substantially. Average 2009 reading achievement in all types of schools lies below its 2000 level, and the difference is particularly pronounced outside of the gymnasium sector.

Together these patterns by maternal education and school type suggest that something deleterious to achievement occurred between 2003 and 2009, though the timing varies by school type. Class time devoted to mathematics and reading instruction rose and the pupil-teacher ratio declined, but any positive effects of such changes were more than offset by other factors. Although changes in the family background and grade distributions likely contributed to the decline, the fact that test scores fall within maternal education and grade categories strongly suggests the presence of factors that adversely affected the quality of education post-2003.

V. The Effects of Instructional Time

The nonrandom allocation of children among schools and purposeful choices of schools regarding instructional time and the pupil-teacher ratio impede efforts to estimate causal effects of these variables in the Czech Republic and Poland, as unobserved student and school factors can introduce bias. As Lavy (2010) discusses, there is limited compelling evidence on the effects of the length of the school day or year and even less on subject specific instructional time. However, the variation in instructional time across students and subjects raises the possibility of using within student variation in class time devoted to the tested subjects. Building on Lavy (2010) we estimate student fixed effects models that account for school and student effects during the tested grade that are common to both subjects. Unlike many models where the student


fixed effects account only for student heterogeneity that is time invariant, this framework focuses on differences across subjects in a single time period and thereby accounts for time invariant as well as time varying student and school effects as long as they are common to both subjects. Although we begin by focusing on time in class, we also consider the effect of time spent in other instructional settings and study time outside of school.

Equation 1 models achievement $A_{ijs}$ for student $i$ in subject $j$ in school $s$ as a function of a student by year effect $\alpha_i$, a school by year effect $\gamma_s$, weekly class time in subject $j$, $X_{ijs}$, a school by subject effect $\lambda_{sj}$, a student by subject effect $\theta_{ij}$, and a random error $\varepsilon_{ijs}$:

$$A_{ijs} = \beta X_{ijs} + \alpha_i + \gamma_s + \theta_{ij} + \lambda_{sj} + \varepsilon_{ijs} \qquad (1)$$

The presence of both reading and mathematics test scores and class time information enables the estimation of student fixed effects models. Such specifications control not only for the student by year effect but also for the school by year effect specific to the respective student. However, this specification does not account for any of the three final error components: the student by subject effect, the school by subject effect, and $\varepsilon_{ijs}$. In order to obtain an unbiased estimate of $\beta$, class time must be uncorrelated with both the school by subject and student by subject effects.

A number of factors could violate these assumptions. For example, the compensation of low quality math instruction with additional class time devoted to mathematics instruction would violate the strict exogeneity assumption by introducing a negative relationship between $X_{ijs}$ and $\lambda_{sj}$. Assuming that $\lambda_{sj}$ is positively related to achievement in the subject, this would bias $\beta$ toward zero. Similarly, if schools arrange for struggling students to have additional class time in their weaker subjects, $X_{ijs}$ and $\theta_{ij}$ would be negatively related, again biasing $\beta$ toward zero. On the other hand, if schools were to increase class time in the subject in which teachers or students are stronger, $\beta$ would be biased upward.
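The logic of the within-student comparison can be illustrated with a small simulation. This is a sketch under stated assumptions, not the estimation used in the paper: all parameter values and names are hypothetical, class time is made to correlate with a student effect common to both subjects, and the subject-specific errors are drawn independently of class time (the identifying assumption). With two subjects per student, the student fixed effects estimator is equivalent to regressing the math-minus-reading score difference on the math-minus-reading class time difference.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n_students=5000, true_beta=0.08, seed=0):
    """Return (pooled OLS slope, within-student slope) on simulated data."""
    rng = random.Random(seed)
    pooled_x, pooled_y = [], []
    diff_x, diff_y = [], []
    for _ in range(n_students):
        # Effect common to both subjects (ability, family, overall school quality).
        common = rng.gauss(0, 50)
        # Sorting: students with a higher common effect get more class time overall.
        base_minutes = 200 + 0.5 * common
        x_math = base_minutes + rng.gauss(0, 30)
        x_read = base_minutes + rng.gauss(0, 30)
        # Subject-specific errors are independent of class time by construction.
        a_math = 500 + true_beta * x_math + common + rng.gauss(0, 20)
        a_read = 500 + true_beta * x_read + common + rng.gauss(0, 20)
        pooled_x += [x_math, x_read]
        pooled_y += [a_math, a_read]
        # Differencing across subjects removes everything common to both.
        diff_x.append(x_math - x_read)
        diff_y.append(a_math - a_read)
    return ols_slope(pooled_x, pooled_y), ols_slope(diff_x, diff_y)


pooled, within = simulate()
print(f"pooled OLS slope:     {pooled:.3f}")
print(f"within-student slope: {within:.3f}")
```

In this setup the pooled slope absorbs the sorting of stronger students into more class time, while the within-student slope stays close to the simulated coefficient. Making the subject-specific errors depend on class time would reintroduce the biases discussed in the text.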

Some steps can be taken to mitigate such biases. Specifically, student reported class time in each subject can be aggregated to the school level, effectively eliminating within school variation in class time assignments related to relative skill in a particular subject. Even so, school average differences in math versus language arts skills that are related to differences in class time could still bias the estimates. Therefore the key identifying assumption is that subject differences in class time are not related to other factors that contribute to subject differences in achievement, conditional on student fixed effects.

The feasibility of this estimation approach rests on the availability and accuracy of subject specific information on class time and differences by subject. The PISA survey solicits information on mathematics class time for all years and reading class time for all but 2003. Importantly, class time in 2000, 2003, and 2009 is based on information on the typical number of classes in a given subject attended each week (provided by students) and the number of minutes in a typical class (provided by teachers). In 2006, however, students were asked “How much time do you typically spend per week studying the following subjects (science, mathematics, and test language)?” Students were asked to indicate time spent in regular class lessons, out of school-time lessons, and self-study time. The possible responses for each were “no time,” “less than 2 hours a week,” “2 or more but less than 4,” “4 or more but less than 6,” and “6 or more.” An hour refers to 60 minutes and not to a class period.

We begin by estimating a series of specifications for 2006 that differ according to the approach taken to convert the intervals to numbers of minutes per week. Lavy (2010) takes the midpoint of the intervals, and we replicate his approach but also use the empirical distributions for each interval in 2009 to impute minutes. The two imputation methods produce markedly different between-category differences in average minutes.
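The two ways of converting the 2006 intervals to minutes can be sketched as follows. This is an illustration only: the reference values are hypothetical rather than PISA data, the function names are invented for the example, and the value assigned to the open-ended top category under the midpoint rule (7 hours) is an assumption, since the text does not say how that category is treated.

```python
from statistics import mean

# 2006 response categories with [lower, upper) bounds in minutes per week.
# "no time" is handled separately; the top category has no upper bound.
INTERVALS = [
    ("less than 2 hours", 0, 120),
    ("2 to less than 4 hours", 120, 240),
    ("4 to less than 6 hours", 240, 360),
    ("6 or more hours", 360, None),
]
TOP_CATEGORY_MINUTES = 420  # assumption: 7 hours for the open-ended category


def category_of(minutes):
    """Map continuous weekly minutes into the 2006 response categories."""
    if minutes <= 0:
        return "no time"
    for label, _lower, upper in INTERVALS:
        if upper is None or minutes < upper:
            return label


def midpoint_minutes(label):
    """Midpoint rule: assign the center of the interval to every response."""
    if label == "no time":
        return 0.0
    for name, lower, upper in INTERVALS:
        if name == label:
            if upper is None:
                return float(TOP_CATEGORY_MINUTES)
            return (lower + upper) / 2
    raise KeyError(label)


def empirical_minutes(reference_minutes):
    """Empirical rule: mean of continuous reports within each category."""
    grouped = {}
    for value in reference_minutes:
        grouped.setdefault(category_of(value), []).append(value)
    return {label: mean(values) for label, values in grouped.items()}


# Hypothetical continuous reports from a reference year such as 2009.
reference_2009 = [90, 135, 180, 180, 225, 225, 270]
empirical = empirical_minutes(reference_2009)

for label, _lower, _upper in INTERVALS:
    # Fall back to the midpoint when the reference year has no reports in a category.
    imputed = empirical.get(label, midpoint_minutes(label))
    print(f"{label}: midpoint = {midpoint_minutes(label)}, empirical = {imputed}")
```

When the reference distribution is concentrated near one edge of an interval, the empirical mean moves away from the midpoint, which changes the spacing between categories and therefore the estimated class time coefficient.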


Even more striking, Table 6 illustrates the dramatic effect of question structure on the responses by converting the continuous responses in years other than 2006 into the 2006 categories. For example, between 2000 and 2003 the growth in mathematics class-time in Poland shifts the distribution almost entirely into the 2 to 4 hours per week category, with a small fraction of students having weekly mathematics class time between 4 and 6 hours. This concentration in the 2 to 4 hours per week category is even more pronounced in 2009. However, the distribution for 2006, the year with a distinct question structure, differs dramatically. Not only does it place over 10 percent of the students in the less than 2 hours per week category (the corresponding share is zero in 2009), it places over 75 percent in the categories of four hours or above, in contrast to only 4.5 percent in 2009. The distribution for 2006 conflicts with the Polish educational reform and would appear to be highly misleading.

Table 7 reports OLS and fixed effect class time coefficients for the various specifications for 2006, by country. The estimates vary considerably by the imputation method; for both countries the Lavy imputation method produces much larger and, in the case of the Czech Republic, statistically significant effects, while the imputation based on the 2009 empirical distributions within categories tends to produce much smaller estimates. Clearly any imputation method introduces measurement error, and even the measures of minutes for the other years contain error. Classical measurement error would most likely attenuate the estimates, but the problems introduced by non-classical error are much harder to predict. Therefore we turn now to estimates based on reports of actual minutes in class for the years 2000 and 2009.

Column 1 of Table 8 shows that the estimated effect of time in class is positive and statistically significant at the one percent level.
The coefficient of 0.079 suggests that the roughly 80 minutes per week average increase between 2000 and 2009 in mathematics class time


accounted for over six points (20 percent) of the increases in average mathematics test scores in Poland. In addition, the approximately 60 minute per week increase in language arts class time accounted for almost 5 points of the increase in average Polish language arts scores. In general the class time increases were more modest in the Czech Republic, though 10th graders in the less selective school sectors did experience average class time increases of around 30 minutes per week in both subjects, and these increases helped to offset the effects of other factors that adversely affected achievement in the Czech Republic.

Finally, time spent on learning outside of the regular classroom would also be expected to increase achievement, and any changes over time could contribute to test score trends. Unfortunately, PISA collects information on weekly time studying specific subjects only in 2000 and weekly time receiving instruction outside of the school only in 2009. Therefore we are not able to quantify changes over time in these time-use categories. Nonetheless, it is important to control for time in other instructional settings if possible, because parents may compensate for a lack of regular class time with instructional time in other settings.2 Such endogenous response could introduce bias into the estimates of regular class time. The case of time studying is somewhat different. Given the inclusion of student fixed effects, study time differences by subject may accompany class time differences and provide an additional pathway through which longer time in class translates into higher achievement.

2 Todd and Wolpin (2003) describe the endogenous response of parents to observed school inputs.

PISA collects information on time studying or in other instructional settings in categories, and we construct two sets of variables for each of these time uses. The first includes the shares of students from each school and subject in each of the categories, excluding one. The second includes dummy variables for answers in each category, excluding one. As discussed earlier, the share structure mitigates bias introduced by endogenous responses to realized performance in a subject.

The final four columns in Table 8 report estimates for class time and one of the two other time uses from separate regressions for 2000 or 2009. There is no strong evidence that study time has a significant effect on achievement or accounts for a sizeable portion of the effect of regular class time; none of the coefficients are significant, and the inclusion of information on study time does not decrease the class time coefficient. In contrast, there is suggestive evidence that time spent in other instructional settings increases achievement. A higher share of children that spend more than six hours per week in other instructional settings is positively related to achievement. Note that regressions that use individual information on time spent in other instructional settings find a negative relationship, consistent with the notion that struggling in a subject induces some students to seek outside instruction in that subject.
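As a check on the attribution arithmetic earlier in this section, the contribution of class time to the Polish score gains is simply the estimated coefficient multiplied by the change in weekly minutes. The sketch below uses the figures quoted in the text (a coefficient of 0.079 points per weekly minute, and increases of roughly 80 and 60 minutes); the function name is hypothetical, and applying the same coefficient to language arts is an assumption that the quoted figures appear to imply.

```python
def attributed_points(coefficient_per_minute, change_in_weekly_minutes):
    """Test score points attributed to a change in weekly class time."""
    return coefficient_per_minute * change_in_weekly_minutes


CLASS_TIME_COEFFICIENT = 0.079  # points per weekly minute, as quoted in the text

# Roughly 80 more weekly minutes of mathematics in Poland between 2000 and 2009.
math_points = attributed_points(CLASS_TIME_COEFFICIENT, 80)
# Approximately 60 more weekly minutes of language arts.
reading_points = attributed_points(CLASS_TIME_COEFFICIENT, 60)

print(f"mathematics: {math_points:.2f} points")
print(f"language arts: {reading_points:.2f} points")
```

The products, about 6.3 and 4.7 points, match the "over six points" and "almost 5 points" stated in the text.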

VI. Other potential factors

Family background, class time and the pupil-teacher ratio likely contributed to the observed test score changes, but several other factors likely affected test scores as well. Some, such as the school decentralization reform in the Czech Republic and growth in the share of 15 year olds in 9th grade, constitute real changes, while others, such as changes in weighting methods and sample coverage, likely affected the connection between true national achievement and PISA test results.

VIa. Representativeness of national samples

PISA uses a two-stage stratified probability sample design to generate nationally representative statistics, and a number of factors determine the magnitude of any deviation from this objective. These include the initial stratification, the response to school refusals, the adherence to random sampling in schools, the magnitude and character of student nonparticipation and attrition, and the construction of the weights. We begin with a comparison of the weighted PISA maternal education distributions with those produced by national censuses and household surveys. Because differences could result from either deficiencies in the sample design or inaccuracies in the information provided by students, we compare student and parent responses. Subsequently we examine the evolution of the adjustment to sample weights in the presence of student non-response, describing both the sensitivity of test score patterns to the structure of the weights and changes over time in the distribution of student weights.
Table 9 reports the differences in maternal education distributions by country and year, and it shows that maternal education distributions in PISA based on student reports are far more concentrated in the higher educational attainment categories than are the national surveys. Importantly, the divergence with the national surveys fluctuates from year to year in ways that might explain some of the observed test score trends. In Poland for example, the divergence between samples generally falls between 2006 and 2009, a period in which the average reading score declines and the average math score levels out. This might have concealed additional test score improvement. In the case of the Czech Republic the opposite appears to hold: in 2009 the PISA sample seems to be drawn even more disproportionately from the upper end of the maternal schooling distribution, but test scores decline substantially anyway. This suggests that the true achievement decline was larger than the observed decline, and this appears to hold for the period between 2003 and 2006 as well.
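The comparison between weighted PISA distributions and national survey distributions can be sketched as follows. All numbers, category labels, and function names are hypothetical and serve only to show the calculation; they are not the values reported in Table 9.

```python
def weighted_shares(observations):
    """Weighted share of each category.

    observations: iterable of (category, sampling_weight) pairs.
    """
    totals = {}
    for category, weight in observations:
        totals[category] = totals.get(category, 0.0) + weight
    grand_total = sum(totals.values())
    return {c: w / grand_total for c, w in totals.items()}


def share_gaps(sample_shares, benchmark_shares):
    """Sample share minus benchmark share, in percentage points."""
    return {
        c: 100 * (sample_shares.get(c, 0.0) - benchmark_shares[c])
        for c in benchmark_shares
    }


# Hypothetical student-level sample: (maternal education, final weight).
sample = [
    ("basic", 1.0),
    ("secondary", 2.0),
    ("secondary", 1.0),
    ("tertiary", 4.0),
]
# Hypothetical national survey distribution for the same cohort of mothers.
benchmark = {"basic": 0.25, "secondary": 0.5, "tertiary": 0.25}

shares = weighted_shares(sample)
gaps = share_gaps(shares, benchmark)
```

A positive gap in the top category, as in this made-up example, corresponds to a sample that is more concentrated in the higher attainment categories than the national benchmark.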
An alternative and not mutually exclusive explanation for the divergence between PISA and national surveys is a tendency for students to overstate maternal education. Fortunately, Poland collected parent surveys in 2006 and 2009 as part of PISA, and Table 10 reports the distribution of parent responses conditional on student responses for these years. Although a substantial share of students overstate maternal education (a far smaller share understate it), there is little evidence that such mistakes explain the divergence between the distributions in PISA and the national surveys. For example, a lower share of Polish students misclassified their mothers as university graduates in 2009 than in 2006 despite the fact that the difference between the PISA and national survey maternal education distributions is actually larger in 2009. A slightly higher share of the students who classify their mothers as having completed an upper-secondary school with examinations overstate maternal attainment in 2009, but this is not nearly enough to explain the substantial decline in the PISA-national survey difference in shares classified in this category observed between 2006 and 2009.
Modifications to the weighting algorithm used to deal with non-respondents could also contribute to test score changes over time. In 2000 and 2003 there is no variation in weights within schools; initial school weights are adjusted for non-response without consideration of the gender or grade of the non-respondent. In 2006 and 2009, however, the weighting algorithm uses information on the grade and gender of non-respondents, and sizeable differences in weights emerge within schools. Nonetheless, a recalculation of annual test score means that gives all students in a school the school average student weight yields results virtually identical to those based on the student weights. Although the change in weighting procedures may affect some results, it has little or no effect on the time pattern of mathematics or reading achievement.

VIb. Increase in 9th grade share in the Czech Republic

The trend toward a larger share of 15 year olds attending 9th rather than 10th grade likely contributes to the decline in average Czech PISA scores, as these students will have had one less year of school. The share of the sample in 9th grade is around 42 percent in 2000, 45 percent in 2003 and 2006 and 49 percent in 2009. Given that the average test score of students in 10th grade is substantially higher than the 9th grade average, this reallocation of students likely contributes to the achievement decline. Of course, 9th graders tend to be younger than 10th graders. Moreover, an increasing number of retentions shifts the age distributions of both 9th and 10th grade students to the right over time. Therefore the magnitude of this effect is likely to be smaller than the average grade difference in 2000. Moreover, one would expect this shift to have a positive effect on average achievement in 10th grade as average age in 10th grade rises.

VIc. Teacher Salaries and School Decentralization in the Czech Republic

Notwithstanding the increase in the 9th grade share and potential data deficiencies, the marked decrease in mathematics achievement following 2003 suggests the emergence of factors that adversely affected schools throughout the Czech Republic. Average ninth and tenth grade mathematics achievement fell in all maternal schooling categories and school types, with decreases of at least twenty points except for tenth grade students in extended gymnasia. Ninth grade reading scores also decrease in all maternal education and school type categories, though the magnitudes tend to be far smaller than those observed for mathematics. Finally, the change in 10th grade reading achievement fluctuates more across categories and on average is much smaller.
Note that the set of negative factors appears to have been powerful enough to offset the benefits of substantial increases in class time devoted to mathematics and substantial decreases in the pupil-teacher ratio in the least selective school sectors.

Although there may be a number of contributing factors, two primary candidates for explaining the decline are the fall in relative teacher salaries and the school decentralization reform adopted in 2001. Figures 4a and 4b illustrate the substantial decline in teacher salaries between 2004 and 2009; a similar decline did not occur in Poland. The figures plot the shares of workers with tertiary education that earn less than the average upper-secondary (Figure 4a) and lower-secondary (Figure 4b) teachers, by age. A decrease in these shares corresponds to teacher movement down the earnings distribution. The movement down the earnings distribution is particularly pronounced for younger teachers between the ages of 21 and 40. Younger upper and lower secondary teachers move roughly 10 percentile points down the non-teacher earnings distribution, a quite pronounced change in labor market position. This almost certainly had an adverse effect on the pool of potential teachers and composition of those choosing not to remain in teaching, each contributing to an erosion of teacher quality. However, the period between 2004 and 2009 was a period of sharp demographic decline, with limited hiring of new teachers. In addition, evidence suggests that job mobility is fairly low in the Czech Republic. Therefore, the short term effect of the salary decline is not likely to account for much of the dramatic decrease in test scores between 2003 and 2009, though over the longer term the contribution could be quite large. School decentralization may well have had a larger negative effect given the immaturity of Czech political institutions and processes, particularly at the local level. However, we do not have any direct measures of the consequences of the decentralization. Therefore we simply describe the decentralization in some detail, and in the future we hope to be able to bring more evidence to bear on this issue.
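The earnings-position measure plotted in Figures 4a and 4b, the share of tertiary-educated workers earning less than the average teacher in an age group, can be sketched as below. The wage figures and function names are hypothetical and are not drawn from the Czech earnings data behind the figures.

```python
def share_below(wages, threshold):
    """Fraction of wages strictly below the threshold."""
    if not wages:
        raise ValueError("wages must be non-empty")
    return sum(w < threshold for w in wages) / len(wages)


def teacher_position(teacher_wages, other_tertiary_wages):
    """Share of tertiary-educated workers earning less than the average
    teacher wage within one age group and year."""
    mean_teacher_wage = sum(teacher_wages) / len(teacher_wages)
    return share_below(other_tertiary_wages, mean_teacher_wage)


# Hypothetical wages of tertiary-educated workers in one age group.
others = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]

# Hypothetical teacher wages in an earlier and a later year.
early = teacher_position([38, 42], others)
later = teacher_position([28, 32], others)

# Change in position, in percentage points; negative means teachers
# moved down the earnings distribution.
change_pp = 100 * (later - early)
```

For simplicity the comparison group's wages are held fixed across the two years here; in practice both the teacher and the comparison wage distributions would change from year to year.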

In 2001 the Czech Republic implemented a major school decentralization reform that transferred much of the authority over public school operations from the National Ministry of Education to local municipalities. Prior to 2001 each of the 76 Czech school districts and Prague had a District Schooling Office (DSO) that was subordinate to the National Ministry of Education. The DSOs supervised schools, provided pedagogical consultations and guidance, and transmitted the educational guidelines and instructions put forth by the Education Ministry. The DSOs employed school principals, as local schools did not have the autonomy to hire staff. The 2001 reform abolished the DSOs and devolved much of the authority over local schools from the National Ministry to the municipalities. Locally elected officials now had authority over the hiring and firing of principals and expanded responsibilities and control over school operations. In the case of upper-secondary schools, regional governmental institutions were created and elections held, while in the case of the lower-secondary schools more relevant to the analysis of PISA, the existing municipal political bodies generally were granted authority over the local schools. In many locales, particularly very small towns and villages, the local officials had little experience with local school operations and little knowledge of curriculum and the key schooling issues. Governance appears to have focused on funding and staffing. Many staff from the abolished DSOs accepted administrative positions in the regional and municipal governments. Schooling committees (Committee for education and upbringing - komise pro vychovu a vzdelani) were also established as advisory bodies for municipal councils in all municipalities with 5+ schools. Only a minority of seats on these committees is reserved for school representatives.
Anecdotal evidence suggests that local politicians have larger representation on the committees of regional capitals, because these are paid positions. In smaller municipalities where members work pro bono, this evidence suggests that committee members are more likely to represent current or past teachers and principals. There are no direct measures of the effects of decentralization, but anecdotal evidence suggests the loss of expertise, weakened monitoring, and a growing administrative emphasis on the allocation of money and positions and the fulfillment of legal regulations. This may have had a particularly deleterious effect on new teachers, schools with less parental involvement and less well-run municipalities. It seems that the transition of external school governance from DSOs to municipalities occurred over the period 2002-2005, a period of institutional uncertainty and lack of school accountability. Following 2005, schools adjusted to the new governance by municipalities. Given that any deterioration in the quality of school governance likely has a cumulative effect on students, both because knowledge acquisition is a cumulative process and because school quality depends upon decisions over time that cumulate in their effects, one might expect the magnitude of any negative effect to increase between 2006 and 2009.

VII. Summary

The divergence in Czech and Polish achievement trends appears to reflect the influences of many factors including family background, class time devoted to mathematics and language arts, teacher salaries, the structure of school governance, and PISA sampling and weighting methods. Although it is difficult to pinpoint the extent to which changes over time in family background and sample composition account for the observed test score trends, the preponderance of evidence suggests that school quality accounts for a substantial portion of the observed divergence in educational outcomes.

Large Polish increases in class time devoted to academic subjects appear to have had a substantial effect on achievement and to have accounted for a sizeable share of test score growth. The student fixed effect regressions produce strong evidence of the causal effect of class time, and the imprecision in the measurement of minutes in mathematics and language arts classes suggests that the coefficients may be attenuated. The much smaller class time increases in the Czech Republic likely had a much smaller effect except in the case of students attending schools in the less selective sectors.
Both countries also experienced substantial decreases in the pupil-teacher ratio, particularly in less selective Czech schools. The PISA data do not permit the production of compelling estimates of pupil-teacher ratio effects, and evidence on effects in 9th and 10th grade is far weaker than evidence on class size in early grades. Nonetheless, based on the evidence one would expect the decreases to have a positive albeit small effect in most circumstances.
The influences of national labor market changes and educational reforms are more difficult to measure, but the pronounced decline in Czech test scores following decentralization in the early 2000s and the steady decline in relative teacher salaries certainly raise the possibility that they accounted for a substantial share of the test score decreases. Future work will focus on trying to gain a better understanding of the contributions of these two factors to the Czech test score decline.
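The attenuation argument made above for the class time coefficients can be stated with the textbook errors-in-variables result. This is only a sketch under the classical measurement error assumption (error uncorrelated with true class time and with the regression disturbance), which is an assumption introduced here and not something the analysis tests; the symbols are likewise introduced for illustration. Let $x_{is}$ denote true class time for student $i$ in subject $s$, $\tilde{x}_{is}$ the reported value, and $u_{is}$ the reporting error:

```latex
\tilde{x}_{is} = x_{is} + u_{is}, \qquad
\operatorname{plim}\hat{\beta} = \beta \cdot \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}
```

Because the ratio is below one whenever $\sigma_u^2 > 0$, the estimated coefficient is biased toward zero. With student fixed effects, $\sigma_x^2$ and $\sigma_u^2$ refer to within-student variation across subjects, and the expression is exact only for a single mismeasured regressor.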

References (Incomplete)

Adams, R., and M. Wu, eds. 2002. PISA 2000 Technical Report. Paris: OECD.

Lavy, Victor. 2010. Do Differences in School’s Instruction Time Explain International Achievement Gaps in Math, Science, and Reading? Evidence from Developed and Developing Countries. Working Paper. National Bureau of Economic Research. http://www.nber.org/papers/w16227.

Programme for International Student Assessment. 2009. PISA Data Analysis Manual: SAS. OECD Publishing.

Todd, Petra E., and Kenneth I. Wolpin. 2003. On the Specification and Estimation of the Production Function for Cognitive Achievement. The Economic Journal 113 (485): F3–F33. doi:10.1111/1468-0297.00097.

Figure 1: PISA Test Scores by Country and Year
[Line chart. Series: Poland Math, Poland Reading, Czech Republic Math, Czech Republic Reading. Vertical axis: Test Score, 440 to 530. Horizontal axis: Year (2000, 2003, 2006, 2009).]

Figure 2: Math Test Score Distributions in Poland and the Czech Republic by Year

Figure 3: Reading Score Distributions in Poland and the Czech Republic by Year

Figure 4a. Share of tertiary educated workers with wages below average wage of upper-secondary school teachers in given group (%)
[Line chart. Series by age group: 20-29, 30-39, 40-49, 50-59, 60+. Vertical axis: percent, 30% to 100%. Horizontal axis: year, 2001 to 2010.]

Figure 4b. Share of tertiary educated workers with wages below average wage of lower-secondary school teachers in given group (%)
[Line chart. Series by age group: 20-29, 30-39, 40-49, 50-59, 60+. Vertical axis: percent, 30% to 75%. Horizontal axis: year, 2001 to 2010.]

Table 1: Distributions of Mother's Education in Poland and the Czech Republic by Year UpperUpper Student report

92.0 8.1 100

17.4 80.9 1.8 100

16.2 83.8 100

88.4 11.6 100

18.6 77.2 4.2 100

11.1 88.9 100

Notes:
