NORTHWESTERN UNIVERSITY

Decision Making in Education: Returns to Schooling, Uncertainty, and Child-Parent Interactions

A DISSERTATION

SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

for the degree

DOCTOR OF PHILOSOPHY

Field of economics

By Pamela Giustinelli

EVANSTON, ILLINOIS

June 2010

UMI Number: 3402415


UMI 3402415 Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106-1346


© Copyright by Pamela Giustinelli 2010

All Rights Reserved


ABSTRACT

Decision Making in Education: Returns to Schooling, Uncertainty, and Child-Parent Interactions

Pamela Giustinelli

This dissertation is composed of two related parts. Chapter 1 studies identification of a pre-specified α-th quantile of a distribution of potential outcomes under weaker and more credible assumptions than those usually maintained in analogous treatment-response settings, and obtains partial identification results. On the theoretical side, the paper adds to the existing results on non-parametric bounds on quantiles with no prior information and under monotone treatment response by introducing and studying the identifying properties of α-quantile monotone treatment selection, α-quantile monotone instrumental variables, and their combinations. On the empirical side, the theoretical results are illustrated through an application to the Italian education returns. The second part collects new data to aid identification of family members’ preferences, expectations, and interactions in high school curriculum choice with curricular tracking, conceptualized as a choice made under uncertainty and with heterogeneous family decision protocols. In particular, Chapter 2 uses these data within a random subjective expected utility framework to answer the following questions: (1) Among the aspects that are uncertain at the moment of the choice, which are the most important determinants of curriculum choice? (2) Conditional on the observed decision protocols, to what extent are parents’ beliefs transmitted to children in the choice, and do parents’ preferences affect it? (3) Is it important to account for multiple decision makers and heterogeneous family decision protocols for this choice? Chapter 3 proposes that the family decision protocol in curriculum choice be interpreted and analyzed as the outcome of a parenting problem, which could be formalized either as a straight parental choice or as a game between the parent and the child. As first steps towards this goal, the chapter reviews existing empirical evidence and modeling approaches in analyses of parenting and child outcomes across fields, and provides new evidence on family decision making in curriculum choice. The chapter concludes by clarifying how the parenting problem generating decision protocol heterogeneity can be inserted in the curriculum choice framework of Chapter 2 and which structural channels may produce an association between decision protocol selection and curriculum choice.


Acknowledgements

I wish to thank all those who contributed to this work; they are mentioned individually in each chapter, and I hope I have not forgotten anyone. I am especially grateful to Chuck Manski, David Figlio, Joel Mokyr, and Elie Tamer for their outstanding service on my dissertation committee, as well as to Paola Dongili, Federico Grigis, Diego Lubian, and Aldo Heffner for their invariable and invaluable support. I feel fortunate and honored to have had the guidance of Chuck Manski. His training and inspiration have accompanied me since my first year at Northwestern. His encouragement and the faith he put in me, even when I had nearly lost it myself, have carried me through the dissertation and the job market. With exceptional availability and flexibility, David Figlio and Elie Tamer joined the committee last spring. Their concrete, to-the-point help with specific issues in my job market paper and with preparation for the job market was invaluable. Joel Mokyr generously agreed to serve on the committee from the very beginning, even though my research interests did not lie in the field of economic history. He provided insightful comments and pragmatic suggestions that greatly helped to improve my job market paper, and always showed a lively enthusiasm for the project. I heartily thank Diego Lubian for encouraging me to undertake this journey and for never leaving me alone in it. He and Paola Dongili made my hope for original data collection come true, and helped a great deal with the organization and administration of the survey. I cannot forget to thank Federico Grigis for exchanging and sharing so many ideas on research and life with me. It is mostly thanks to him that I survived my first year of graduate school. Finally, I wish to thank Aldo Heffner for living and sharing most of our graduate school experience with me, and for always being there for me and helping throughout.

It is with the deepest gratitude that I dedicate my effort to Paola Dongili, Federico Grigis, Aldo Heffner, Diego Lubian, and Chuck Manski.


Contents

1 FIRST CHAPTER
1.1 Introduction
1.2 Existing Results
1.2.1 Identification of Quantiles under Empirical Evidence Alone
1.2.2 Identification of Quantiles under Monotone Treatment Response
1.2.3 More on Response-Based Restrictions and Common Confusions with IV-Type Assumptions
1.3 Monotonicity Restrictions on α-Quantiles
1.3.1 Introducing α-Quantile Monotone Treatment Selection
1.3.2 Generalizing to α-Quantile Monotone Instrumental Variables
1.4 Increasing the Identification Power: Combined Assumptions
1.4.1 α-QMTS&MTR
1.4.2 α-QMIV&α-QMTS&MTR
1.4.3 Relating QR and Bounds under α-QMTS&MTR
1.5 Bounding the Italian Returns to Educational Qualifications: an Empirical Illustration
1.5.1 Data and Variables
1.5.2 Estimation
1.5.3 Findings
1.6 Conclusions

2 SECOND CHAPTER
2.1 Introduction
2.2 Motivation and Related Literature
2.2.1 Application
2.2.2 Methodological Aspects
2.3 Curriculum Choice and the Identification Problem
2.3.1 Conceptualizing Curriculum Choice
2.3.2 The Identification Problem, Idealized
2.4 Decision Regimes within the Family
2.4.1 Unilateral Decision Making With No Child-Parent Interaction
2.4.2 One Party Chooses After Listening to the Other Party
2.4.3 Child and Parent Choose Jointly
2.5 Collecting Field Data on Curriculum Choice: Report and Methodological Considerations
2.5.1 Survey Design and Administration
2.5.2 Eliciting Subjective Data
2.6 Estimation Samples
2.7 Empirical Analysis
2.7.1 Curriculum Choice by a Representative Decision Maker
2.7.2 Child Chooses Unilaterally
2.7.3 Child Chooses After Listening to the Parent
2.7.4 Child and Parent Make a Joint Decision
2.8 Policy Experiments
2.9 Conclusions

3 THIRD CHAPTER
3.1 Introduction
3.2 Literature
3.2.1 Conceptualization of Parenting in Developmental Psychology
3.2.2 Empirical Studies of Parenting and Child Behavior in Psychology
3.2.3 The Economics of Child-Parent Interactions
3.2.4 High School Curriculum Choice in the Italian Context: Evidence from Psychology and Sociology
3.3 A First Look at the Data
3.3.1 Decision Protocols Data
3.3.2 Individual Choice Preferences and Family Decision Protocols
3.3.3 Child’s and Parent’s Perceptions of Each Other’s Choice Preferences, “Negotiation Windows”, and Family Decision Protocols
3.4 Final Remarks for Future Directions

A Appendix for Chapter 1
A.1 Tables and Figures for Chapter 1
A.2 α-QMTS Bounds: Derivation and Sharpness
A.3 α-QMIV Bounds: Derivation
A.4 Semi-monotonicity
A.4.1 α-Quantile Semi-Monotone Treatment Selection
A.4.2 α-Quantile Semi-Monotone Instrumental Variables
A.5 Relating Quantile Regression and Bounds under α-QMTS&MTR

B Appendix for Chapter 2
B.1 Tables and Figures for Chapter 2
B.2 Useful Institutional Details
B.3 Ex-Post Conditioning and Choice-Based Sampling
B.4 Online Appendix for “Understanding Choice of High School Curriculum: Subjective Expectations and Child-Parent Interactions”
B.4.1 Student Questionnaire (English Version)
B.4.2 Parent Questionnaire (English Version)
B.4.3 Data Description
B.4.4 Tables and Figures for “Data Description”

C Appendix for Chapter 3
C.1 Tables and Figures for Chapter 3


List of Tables

1 Empirical Quantiles and Mean of y = ln(wage) and Distribution of z
2 No-Assumption Bounds on Qα[y(t)|ν] and Distribution of ν
3 Bounds on 0.10-Quantile of ln(wage) and 0.10-QTEs
4 Bounds on 0.25-Quantile of ln(wage) and 0.25-QTEs
5 Bounds on 0.50-Quantile of ln(wage) and 0.50-QTEs
6 Bounds on Expectation of ln(wage) and ATEs
7 Bounds on 0.75-Quantile of ln(wage) and 0.75-QTEs
8 Bounds on 0.90-Quantile of ln(wage) and 0.90-QTEs
9 QR and OLS with Education Dummies only - α = {0.10, 0.25, 0.75, 0.90}
10 IVQR Estimates of Marginal Returns to Education
11 Curriculum Choices and Decision Protocols
12 Table 11 Continued
13 Some Demographic Characteristics in the Estimation Samples
14 Some Demographic Characteristics in the Estimation Samples (Continued)
15 Comparing RP and SP-child
16 Comparing RP and SP-parent
17 RP, SP-Child, and JH - R1
18 RP, SP-Child, SP-Parent, and P.O. - R3
19 RP, JH, and P.O. - R3
20 Representative Decision Maker - All Children and Parents
21 Representative Decision Maker - Matched Children and Parents
22 Marginal Effects - Representative Models (Not Matched)
23 Marginal Effects - Representative Models (Matched)
24 “Child Chooses Unilaterally” (Child’s Version)
25 Marginal Effects - Child Chooses Unilaterally
26 “Child Chooses After Listening to the Parent” (Child’s Version)
27 Marginal Effects - Child Chooses after Listening to the Parent
28 “Child and Parent Choose Together” (Child’s Version)
29 Table 28 Continued
30 Marginal Effects - Child and Parent Choose Jointly
31 Policy Experiments
32 Policy Experiments (Continued)
33 Policy Experiments (Continued)
34 Enrollment in 9th Grade - Verona Municipality, School Year 2007-2008
35 Choice-Based Samples
36 Students’ Full Sample
37 Students’ Full Sample
38 Students’ Full Sample (Continued)
39 Students’ Full Sample (Continued)
40 Responding Parents’ Sample: Characteristics
41 Non-Responses to Expectations’ Questions
42 Extent of Monotonicity Violations in Students’ Sample
43 Extent of Monotonicity Violations in Parents’ Sample
44 Percent Chance of Daily Average Homework and Study Time ≥ 2.5 hours - Students’ Expectations
45 Table 44 Continued
46 Percent Chance of Daily Average Homework and Study Time ≥ 2.5 hours - Parents’ Expectations
47 Table 46 Continued
48 Percent Chance of Attaining the Diploma in the Regular Time - Students’ Expectations
49 Table 48 Continued
50 Percent Chance that Child Attains the Diploma in the Regular Time - Parents’ Expectations
51 Table 50 Continued
52 Percent Chance of Passing 9th Grade at the End of the Current School Year - Students’ Expectations
53 Percent Chance of Passing 9th Grade at the End of the Current School Year - Parents’ Expectations
54 Percent Chance of Attaining the Diploma with a Yearly GPA ≥ 7.5/10 - Students’ Expectations
55 Table 54 Continued
56 Percent Chance that Child Attains the Diploma with a Yearly GPA ≥ 7.5/10 - Parents’ Expectations
57 Table 56 Continued
58 Percent Chance of Going to College - Students’ Expectations
59 Percent Chance that Child Goes to College - Parents’ Expectations
60 Percent Chance of Flexibly Choosing between College and Work after the Diploma - Students’ Expectations
61 Percent Chance that Child Can Flexibly Choose between College and Work after the Diploma - Parents’ Expectations
62 Percent Chance of Flexibly Facing the University Field’s Choice (i.e. Among a Wide Range of Fields)
63 Statistics: Graduation’s Curriculum and Field in College
64 Percent Chance of Finding a Liked Job Immediately After Graduation - Students’ Expectations
65 Percent Chance that the Child Finds a Liked Job Immediately After Graduation - Parents’ Expectations
66 Choice Data
67 Choice Data (Continued)
68 Choice Data (Continued)
69 Choice Data (Continued)
70 Choice Data (Continued)
71 Choice Data (Continued)
72 Stated Preferences Data
73 Junior High Orientation Data
74 Family Decision Protocols by Sample and Reporting Person
75 Decision Maker Under No Agreement
76 Individuals and Sources Children Listened to
77 Family Decision Protocols and Place of Birth
78 Family Decision Protocols: Child’s and Parent’s Versions Compared
79 Three Categories of Family Decision Protocol
80 Parental Participation Rates by Choice and Protocol
81 Family Decision Protocols and Curriculum Choice - All
82 Children Listening to “Experts”
83 Family Decision Protocols and Gender - All
84 Family Decision Protocols and Gender - Matched
85 Family Decision Protocols and Education
86 Family Decision Protocols and Family Structure
87 Child-Parent Choice Alignment and Background Characteristics
88 Family Decision Protocols and Child-Parent Choice Alignment
89 Child’s Perception of Parent’s Favorite Curriculum Before the Choice
90 Parent’s Perception of Child’s Favorite Curriculum Before the Choice
91 Child’s Perception of Parent’s Favorite Curriculum Before the Choice (Cont.)
92 Parent’s Perception of Child’s Favorite Curriculum Before the Choice (Cont.)
93 Child’s and Parent’s Reciprocal Perception of Their Favorite Curriculum
94 Family Decision Protocols and Child-Parent Choice Alignment as Perceived by the Child
95 Table 94 Continued
96 Family Decision Protocol and Child-Parent Choice Alignment as Perceived by the Parent
97 Table 96 Continued
98 Family Decision Protocol and “Negotiation Windows”


List of Figures

1 Comparing the Identifying Power of Different Monotonicity Assumptions
2 Students’ and Parents’ Use of the 0-100 Scale - Percent Chances that the Child Graduates in the Regular Time with a Yearly GPA ≥ 7.5
3 Curriculum Locations by Ward and Public Transport Network within the Municipality
4 Students’ and Parents’ Use of the 0-100 Percent Chance Scale in Answering Expectations Questions


1 FIRST CHAPTER

Non-Parametric Bounds on Quantiles under Monotonicity Assumptions: With an Application to the Italian Education Returns¹

¹ The paper in this chapter is forthcoming in the Journal of Applied Econometrics. Acknowledgements: I heartily thank Chuck Manski for his helpful comments on this paper and for his encouragement and guidance throughout my dissertation, of which this paper constitutes a chapter. I have also benefited from valuable feedback by Erich Battistin, Federico Grigis, Aldo Heffner, Diego Lubian, Rosa Matzkin, Aaron Sojourner, Chris Taber, and Alex Tetenov at different stages of the project. Comments and suggestions by Manuel Arellano and three anonymous referees helped improve the paper enormously. Financial support from the Marco Fanno Fellowship (MCC) and MIUR (COFIN 2005, under contract 2005132539-001) is gratefully acknowledged. All remaining errors are mine.


1.1 Introduction

In any treatment-response analysis, counterfactual outcomes are unobservable, which makes the analysis a typical missing-outcomes problem; in the absence of assumptions on the nature of this missingness, the distribution of potential outcomes cannot be point-identified. It is well established, both theoretically and empirically (e.g., Manski and Pepper (2000), Gonzalez (2005), Okumura and Usui (2006)), that a small number of weak and credible assumptions on the individual response functions and on the population selection process can generate identification sets narrow enough for the non-parametric bound estimates of the parameters of interest to be comparable to the point estimates obtained from parametric models based on strong and controversial assumptions. Within the non-parametric partial-identification literature that analyzes the tension between the credibility of assumptions and their identifying power in treatment-response settings, more theoretical results and empirical illustrations have been provided for the mean outcome and the average treatment effect (ATE) than for other features of the outcome distribution, such as the α-quantiles and the α-quantile treatment effects (α-QTE). At the same time, the parametric point-identification literature on quantile regression (QR) (Koenker and Bassett, 1978) has grown considerably over the past few decades, emphasizing the advantages of quantiles over the mean, such as their existence (quantiles always exist, whereas the mean may not) and their robustness to outliers, as well as their ability to allow for more general behavior of the “effects” of regressors on the outcome variable (Koenker, 2000). Against this background, and adopting a non-parametric partial-identification approach, I focus on identification of the α-quantiles of a distribution of potential outcomes, Qα[y(t)], and of the α-QTEs.
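To make the two objects of interest concrete, the sketch below computes an empirical α-quantile as the smallest value c at which the empirical CDF reaches α, an α-QTE as the difference of two such quantiles, and contrasts the median with the mean when one observation is an outlier. It assumes, purely for illustration, that samples from both potential-outcome distributions are available, which is exactly what the selection problem rules out in practice; the function names and the wage figures are hypothetical.

```python
import math


def alpha_quantile(values, alpha):
    """Empirical alpha-quantile: the smallest c with F_n(c) >= alpha."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    xs = sorted(values)
    # F_n(xs[k-1]) >= k/n, so the answer is the ceil(alpha * n)-th order statistic;
    # round() absorbs float noise such as 0.3 * 10 == 3.0000000000000004.
    k = max(1, math.ceil(round(alpha * len(xs), 9)))
    return xs[k - 1]


def alpha_qte(y_t1, y_t0, alpha):
    """alpha-QTE, Q_alpha[y(t1)] - Q_alpha[y(t0)], from two hypothetically fully observed samples."""
    return alpha_quantile(y_t1, alpha) - alpha_quantile(y_t0, alpha)


def sample_mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    wages = [10, 12, 13, 15, 18]
    wages_outlier = [10, 12, 13, 15, 1000]  # the largest wage replaced by an extreme value
    print(alpha_quantile(wages, 0.5), sample_mean(wages))                  # 13 13.6
    print(alpha_quantile(wages_outlier, 0.5), sample_mean(wages_outlier))  # 13 210.0
    print(alpha_qte([20, 22, 25, 30], [10, 12, 13, 15], 0.5))             # 10
```

Replacing the largest wage with an extreme value leaves the median unchanged but moves the mean from 13.6 to 210.0, which is the robustness property mentioned above.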

To formalize this problem, consider a population J of individuals (j = 1, ..., J), characterized by a distribution of potential outcomes P[y(t)] across j, each of whom receives some treatment t ∈ T according to a uniform treatment rule z : J → T. Let y_j(·) : T → Y be the response function mapping the mutually exclusive and exhaustive treatments t ∈ T into outcomes y_j(t) ∈ Y, so that the observed outcome a person experiences under the received treatment is y_j = y_j(z_j).² For example, y_j may be the wage of individual j and z_j her level of education. The major difficulty in predicting which distribution of outcomes P[y(t̃)] would occur if every individual in this population were to receive a particular treatment t̃ lies in the fact that one cannot observe the outcomes a person would experience under treatments other than the one received, since treatments are mutually exclusive. Stated differently, for each individual j ∈ J, the response function y_j(·) is unknown except at the single realized point y_j = y_j(z_j): we observe an individual's wage given her level of education, but not her entire wage-schooling locus y_j(·). Assuming that realized treatments and realized outcomes are observable, random sampling from J reveals the status quo distribution P(y, z), but not the conjectural outcome distribution P[y(·)]. The researcher therefore faces an identification problem, called the selection problem, which requires combining empirical knowledge of P(y, z) with prior information to learn about P[y(·)]. Prior information may involve shape restrictions on the response function as well as distributional assumptions on the relation between the conjectural outcomes and some variable playing the role of an instrument, possibly the treatment itself. For example, OLS estimation of a Mincer mean regression (Mincer, 1958, 1974) implies linearity of each individual wage-schooling locus (i.e., linear treatment response) with homogeneous slopes, exogeneity of schooling (i.e., exogenous treatment selection (Manski and Pepper, 2000)), and the rank or invertibility condition.
Together these assumptions point-identify the schooling coefficient. When exogeneity of schooling and homogeneity of its coefficient in the population, given the observables, are doubted, IV estimation replaces exogeneity of the treatment with exogeneity of some instrument, which must also be correlated with schooling to ensure invertibility. More credibly, linearity of the wage-schooling locus may be relaxed by assuming monotonicity (Manski and Pepper, 2000) or concavity (Okumura and Usui, 2006), as formalized by monotone treatment response (MTR) and concave-monotone treatment response (Manski, 1997). Similarly, exogeneity may be replaced by monotone instrumental variables (MIV) (Manski and Pepper, 2000), under which the mean response varies weakly monotonically across sub-populations of persons with different values of an observable covariate ν, or by monotone treatment selection (MTS), the special case in which ν = z. The price to be paid for this greater credibility is that the parameter of interest is only partially identified.

² Each member j of population J also has a vector of covariates, x_j ∈ X. To keep the exposition simple, I leave the conditioning on x implicit throughout the theoretical sections; all identifying assumptions analyzed should be understood as holding conditional on x.
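The selection problem described above can be made concrete with the standard worst-case (no-assumption) bounds on a quantile from the partial-identification literature. Writing p = P(z = t) and [y0, y1] for the assumed support of the outcome, the lower bound on Qα[y(t)] is y0 when α ≤ 1 − p and otherwise the [(α − (1 − p))/p]-quantile of y given z = t; the upper bound is the (α/p)-quantile of y given z = t when α ≤ p and otherwise y1. The sketch below computes the sample analogues of these bounds; the function names, the support values, and the toy data are hypothetical, and no estimation or inference refinements are attempted.

```python
import math


def quantile(values, level):
    """Smallest c with empirical CDF F_n(c) >= level, for level in (0, 1]."""
    xs = sorted(values)
    k = max(1, math.ceil(round(level * len(xs), 9)))  # round() absorbs float noise
    return xs[k - 1]


def no_assumption_bounds(y, z, t, alpha, y0, y1):
    """Worst-case bounds on Q_alpha[y(t)] when y(t) is observed only if z == t.

    y0 and y1 are the (assumed known) lower and upper ends of the outcome support.
    """
    y_t = [yi for yi, zi in zip(y, z) if zi == t]
    if not y_t:
        raise ValueError("no observations with z == t")
    p = len(y_t) / len(z)  # sample analogue of P(z = t)
    # Lower bound: place every missing y(t) at the bottom of the support.
    lower = y0 if alpha <= 1 - p else quantile(y_t, (alpha - (1 - p)) / p)
    # Upper bound: place every missing y(t) at the top of the support.
    upper = quantile(y_t, alpha / p) if alpha <= p else y1
    return lower, upper


if __name__ == "__main__":
    # Toy wage data: three college graduates (C), two high school graduates (HS).
    y = [20, 25, 30, 12, 15]
    z = ["C", "C", "C", "HS", "HS"]
    print(no_assumption_bounds(y, z, "C", 0.5, y0=0, y1=100))  # (20, 30)
    print(no_assumption_bounds(y, z, "C", 0.9, y0=0, y1=100))  # (30, 100)
```

With only three of five outcomes observed under C, the median is already confined to an interval, while at α = 0.9 the upper bound collapses to the support limit: without prior information, the data alone cannot rule out that every unobserved y(C) lies at the top.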

This paper embraces the “credibility philosophy” of Manski (1995) and identifies the α-quantile of a distribution of interest P [y(t)], denoted as Qα [y(t)], by introducing and studying the identifying properties of two main assumptions: α-quantile monotone treatment selection (α-QMTS) and α-quantile monotone instrumental variables (α-QMIV). These assumptions respectively apply the ideas of MTS and MIV to conditional α-quantile functions of the form Qα [y(t)|z] and Qα [y(t)|ν]. Notice that these assumptions refer to a single quantile α ∈ (0, 1), which would be subsequently specified in applications. In fact, assuming monotonicity for all conditional quantiles, i.e. assuming first order stochastic dominance, would presumably lead to stronger identification results at the cost of a stronger distributional assumption and a more complex analysis. To provide an example of the economic content of α-QMTS and α-QMIV, imagine that the population J is composed of individuals that hold either a high school diploma (zj = HS),

or a college degree ($z_j = C$). Assume that one wishes to know what the wage distribution of this population would look like if all individuals were to receive a college degree instead.3 In particular, assume that one wishes to identify $Q_\alpha[y(C)]$ for some fixed $\alpha \in (0,1)$. Clearly, college-degree wages are not observed for individuals with a high school diploma, and $Q_\alpha[y(C)]$ is not point-identified absent any prior information. For this example, however, α-QMTS says that the α-quantile of wages among workers with HS, had they obtained C instead, would not be higher than the α-quantile of wages among workers who actually graduated from college. Relative to the no-assumption case, this assumption provides some identification power on the upper bound of $Q_\alpha[y(C)]$: it implies that, in the counterfactual scenario of interest in which everybody gets a college degree, at least a fraction α of individuals with HS would earn a wage no higher than the observed quantity $Q_\alpha[y(C) \mid z = C] = Q_\alpha[y \mid z = C]$. α-QMTS, by mimicking MTS, assumes that persons who select higher levels of schooling have weakly higher α-quantile (instead of mean) wage functions than do those who select lower levels of schooling, perhaps because they have higher ability. It should not be surprising, therefore, that if we were interested in identification of $Q_\alpha[y(HS)]$, α-QMTS would imply that the α-quantile of wages among workers with C, had they obtained HS, would not be lower than the α-quantile of wages among workers who actually obtained HS. Taking this to its extreme, if we could completely wipe out any degree of education from all population members, α-QMTS would restrict the α-quantile wage functions of those observed studying longer in the status quo to be greater than or equal to those of individuals without any educational qualification. One rationale for this is that the former have higher ability than the latter, thereby facing a higher wage-schooling locus, i.e. higher wages at all possible levels of education.4

3 This counterfactual exercise may seem to evoke an unrealistic scenario. However, it is merely a consequence of the uniform treatment rule assumption, typical in this setting. As I pointed out in earlier versions of the paper, it would be of great interest to study identification of $P[y(t)]$ under non-uniform treatment rules. The mathematical analysis of the latter turns out to be different from, and more complex than, the one needed for the uniform case, and features a structure similar to that of the mixing problem studied in Manski (1995, ch. 3).

In a similar fashion, α-QMIV says that the α-quantile of the potential outcome distribution conditional on some instrument ν is weakly monotonic across sub-groups defined by ν. If, for example, ν were some measure of a person's ability and $y(\cdot)$ were wages, using ν as an α-QMIV would be equivalent to assuming that the α-quantile wage functions of persons with higher measured ability cannot be lower than those of individuals with lower measured ability. On the one hand, α-QMTS and MTS share the idea that assuming monotonicity of some feature of the potential outcome distribution conditional on treatment is weaker and more credible than the classical exogeneity assumption, which requires constant conditional expectation or quantile functions across treatment sub-groups. In fact, monotone selection encompasses exogeneity as a special case and, like exogeneity, it is not refutable. A similar argument applies to α-QMIV and MIV. On the other hand, applying monotone selection or monotone instruments to the α-quantiles of a distribution of potential outcomes, as opposed to the mean, allows one to take advantage of the usual existence and robustness properties of quantiles,5 as well as of the capability of Quantile Regression (QR) models to characterize the heterogeneous impact of variables at different points of an outcome distribution.

4 This story is fully consistent with the basic human capital theory taught in undergraduate labor economics (e.g., Borjas, 2008). The simplified model, elaborated from Mincer (1958), assumes that individuals select their optimal level of schooling by maximizing the present discounted value of their earnings in an environment with full information and in which the only costs incurred are foregone earnings. The individual-specific wage-schooling loci, which are market-determined, are assumed to be concave and identical for all individuals with the same ability, even if their discount factors differ. The slope of the wage-schooling locus traces out the marginal rate of return to schooling over the relevant schooling range. Hence, in the model two key factors lead different workers to obtain different levels of education and, therefore, to have different earnings: workers either have different discount rates, and are thereby located at distinct points along the same wage-schooling locus, or they differ in ability, with higher-ability individuals located on higher wage-schooling loci.

5 For instance, one might prefer assuming that the median conditional wage functions given schooling are non-decreasing, should there be a few individuals with particularly high wages among the less educated, or some highly educated workers with very low wages.

The paper is structured as follows. Section 1.2 describes the existing partial identification results on $Q_\alpha[y(t)]$ under no prior information (Manski, 1994) and MTR (Manski, 1997), using a notation consistent with the treatment-response set-up introduced above. I also define MTS and MIV formally and discuss some variations and extensions. Section 1.3 studies the identification properties of α-QMTS and α-QMIV. I also consider their semi-monotone counterparts, in which the treatment and instrument variables are vectors rather than scalars, and provide expressions for the bounds in Appendix A.4. I find that analytical bounds for $Q_\alpha[y(t)]$ can be computed by inverting the bounds that define the identification region for the corresponding distribution function only in the case of α-QMTS. Under α-QMIV, the bounds on the distribution function take the form of mixtures of empirical distributions and cannot be inverted analytically, although they can be inverted numerically and used in applications. In Section 1.4 I analyze identification of $Q_\alpha[y(t)]$ under combined assumptions (α-QMTS&MTR and α-QMIV&α-QMTS&MTR). I find that, as for the mean (see Manski and Pepper (2000) and Gonzalez (2005)), MTR and α-QMTS work in a complementary fashion, so that applying them jointly greatly increases identification power. Furthermore, I provide a theoretical result relating bounds on particular α-QTEs under α-QMTS&MTR to the corresponding coefficients of the treatment dummies in a linear QR model. The identification power of α-QMTS&MTR is illustrated by the findings of the empirical application presented in Section 1.5. Specifically, I use data pooled from 5 waves of the Survey of Households' Income and Wealth (SHIW) of the Bank of Italy to estimate non-parametric bounds on various quantiles of the distribution of ln(wage) for different educational qualifications, and on the corresponding α-QTEs.
I perform the exercise for all assumptions discussed in the theoretical sections in order to gauge the identification power of each assumption relative to the baseline no-assumption case. As a benchmark, I also estimate bounds on the mean under the corresponding "E-Assumptions" (EE, MTR, MTS, MTS&MTR, MIV, MIV&MTR, MIV&MTS, MIV&MTS&MTR) to assess whether results for quantiles display patterns similar to those for the mean; I find that they do. Finally, I compare the α-QMTS&MTR bounds, which have the greatest identifying power, with parametric α-IVQR point estimates on the same sample based on Chernozhukov and Hansen (2005, 2006). As for the mean (Manski and Pepper, 2000), the α-QMTS&MTR upper bounds on the α-QTE comparing a college degree with elementary education imply, remarkably, year-by-year returns to college completion that are generally smaller than the corresponding α-IVQR point estimates.

1.2

Existing Results

1.2.1

Identification of Quantiles under Empirical Evidence Alone

An upper (lower) bound on $Q_\alpha[y(t)]$ can be derived by first determining a lower (upper) bound on the corresponding distribution function $P[y(t) \le \tilde y_\alpha]$ and subsequently inverting it. In the no-information or empirical-evidence-alone case (EE), this procedure yields the bounds stated in the following proposition.

Proposition 1.1. Let $\alpha \in (0,1)$. Define $r^{EE}(\alpha, t)$ and $s^{EE}(\alpha, t)$ as

$$
r^{EE}(\alpha, t) =
\begin{cases}
Q_{1 - \frac{1-\alpha}{P(z=t)}}(y \mid z = t) & \text{if } P(z \ne t) < \alpha < 1,\\
y^{INF} & \text{otherwise,}
\end{cases}
$$

$$
s^{EE}(\alpha, t) =
\begin{cases}
Q_{\frac{\alpha}{P(z=t)}}(y \mid z = t) & \text{if } 0 < \alpha \le P(z = t),\\
y^{SUP} & \text{otherwise,}
\end{cases}
$$

where $y^{INF}$ and $y^{SUP}$ are, respectively, the lower and upper bounds of the support of $y(t)$. Then, for every $t \in T$, $r^{EE}(\alpha, t) \le Q_\alpha[y(t)] \le s^{EE}(\alpha, t)$.

proof: See Manski (1994).

As $r^{EE}(\alpha, t)$ and $s^{EE}(\alpha, t)$ are both non-decreasing in α, the identification set for $Q_\alpha[y(t)]$ tends to move to the right along the support of $y(t)$ as α increases. Moreover, the quantiles of the observable conditional distribution $P[y \mid z = t]$ that identify the lower and upper bounds of $Q_\alpha[y(t)]$ are, respectively, a lower and a higher quantile than α. That is, $1 - \frac{1-\alpha}{P(z=t)} \le \alpha$ and $\frac{\alpha}{P(z=t)} \ge \alpha$ for any $P(z = t) \in (0, 1]$. Clearly, at the extremes, if $P(z = t) = 1$ then $Q_\alpha[y(t)] \equiv Q_\alpha[y \mid z = t]$ and point identification is achieved; if $P(z = t) = 0$ then $Q_\alpha[y(t)]$ is totally unidentified, with an identification region of width $y^{SUP} - y^{INF}$. This also makes transparent a trade-off in identifying the various $Q_\alpha[y(t)]$, since $\sum_{t \in T} P(z = t) = 1$. For instance, it can be shown that, for any value of α, a necessary but not sufficient condition for both $r^{EE}(\alpha, t)$ and $s^{EE}(\alpha, t)$ to be informative is that more than half the population receives treatment t, i.e. $P(z = t) > \frac{1}{2}$: the two conditions in Proposition 1.1 jointly require $1 - P(z = t) < \alpha \le P(z = t)$.

For intuition on how identification works in this case, let us maintain that individuals can receive either treatment HS (high school diploma) or treatment C (college degree), and that we are interested in their wage outcomes. In particular, we ask what the distribution of wages would look like under the hypothetical uniform treatment rule assigning a college degree to everybody in the population. When no prior information is available, only the observed outcomes $y_j$ of individuals with $z_j = C$ contribute to identification of $P[y(C)]$ or of any feature of it, while a worst-case analysis is applied to the counterfactual outcomes $y_{-j}(C)$ of individuals $-j$ with $z_{-j} = HS$. Specifically, a logical lower bound on the distribution function $P[y(C) \le \tilde y_\alpha]$ obtains when the conjectural wages that individuals with a high school diploma would earn with a college degree are all assumed to be greater than $\tilde y_\alpha$, i.e. $P[y(C) \le \tilde y_\alpha \mid z = HS] = 0$; conversely, the logical upper bound is reached when all the counterfactual wages are smaller than or equal to $\tilde y_\alpha$, i.e. $P[y(C) \le \tilde y_\alpha \mid z = HS] = 1$. Note that the dependence of $\tilde y_\alpha$ on t, which in this example is the specific treatment C, has been dropped for notational simplicity. Finally, lower and upper bounds on α-QTEs of the type $Q_\alpha[y(t_2)] - Q_\alpha[y(t_1)]$ can be computed as $LB\{\alpha\text{-QTE}(t_2 - t_1)\} = r^{EE}(\alpha, t_2) - s^{EE}(\alpha, t_1)$ and $UB\{\alpha\text{-QTE}(t_2 - t_1)\} = s^{EE}(\alpha, t_2) - r^{EE}(\alpha, t_1)$.
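As an illustration only, the Python sketch below computes sample analogues of $r^{EE}$ and $s^{EE}$ from Proposition 1.1, and the implied α-QTE bounds, by replacing $P(z = t)$ and the conditional quantiles with their empirical counterparts. The function names, the toy data and the support endpoints `y_inf`/`y_sup` are hypothetical, and no sampling error or inference is handled.

```python
import math


def empirical_quantile(values, tau):
    """Smallest observed value whose empirical CDF is >= tau (0 < tau <= 1)."""
    xs = sorted(values)
    # Small tolerance so float round-off (e.g. tau * n = 1.0000000000000002)
    # does not push the index one step too far.
    k = math.ceil(tau * len(xs) - 1e-9)
    return xs[max(k, 1) - 1]


def ee_bounds(z, y, t, alpha, y_inf, y_sup):
    """Sample analogue of (r^EE(alpha, t), s^EE(alpha, t)) in Proposition 1.1."""
    y_t = [yi for zi, yi in zip(z, y) if zi == t]
    p_t = len(y_t) / len(y)  # estimate of P(z = t)
    if p_t == 0:
        return y_inf, y_sup  # nobody observed under t: no information at all
    # Lower bound: informative only if P(z != t) < alpha.
    lower = empirical_quantile(y_t, 1 - (1 - alpha) / p_t) if 1 - p_t < alpha else y_inf
    # Upper bound: informative only if alpha <= P(z = t).
    upper = empirical_quantile(y_t, alpha / p_t) if alpha <= p_t else y_sup
    return lower, upper


def ee_qte_bounds(z, y, t1, t2, alpha, y_inf, y_sup):
    """Bounds on Q_alpha[y(t2)] - Q_alpha[y(t1)] obtained by differencing."""
    r1, s1 = ee_bounds(z, y, t1, alpha, y_inf, y_sup)
    r2, s2 = ee_bounds(z, y, t2, alpha, y_inf, y_sup)
    return r2 - s1, s2 - r1


# Toy data: three college graduates and one high school graduate.
z = ["C", "C", "C", "HS"]
y = [3.0, 1.0, 2.0, 0.5]
print(ee_bounds(z, y, "C", 0.5, 0.0, 10.0))            # (1.0, 2.0)
print(ee_bounds(z, y, "HS", 0.5, 0.0, 10.0))           # (0.0, 10.0)
print(ee_qte_bounds(z, y, "HS", "C", 0.5, 0.0, 10.0))  # (-9.0, 2.0)
```

In the toy data only one person in four has HS, so $P(z = HS) < \frac{1}{2}$ and both bounds for HS collapse to the support endpoints, which in turn makes the QTE bounds wide.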

1.2.2

Identification of Quantiles under Monotone Treatment Response

Let us continue with the education-wages example to illustrate monotone treatment response (MTR). In this setting, the assumption of MTR says that an individual's wage is a non-decreasing function of her education level. Formally, for all $j \in J$ and $t_1, t_2 \in T$,

$$t_2 \ge t_1 \implies y_j(t_2) \ge y_j(t_1).$$

Assume the goal is again to identify $Q_\alpha[y(C)]$. MTR implies $y_j(C) \ge y_j(HS)$ for all individuals $j \in J$. Hence, MTR restricts the counterfactual wages $y_{-j}(C)$ of individuals with a high school diploma ($z_{-j} = HS$), the only ones whose college wages are unobserved. This in turn yields a tighter upper bound on $P[y(C) \le \tilde y_\alpha]$ (and hence a tighter lower bound on the α-quantile) than under EE, since a logical upper bound for $P[y(C) \le \tilde y_\alpha \mid z = HS]$ is now given by $P[y(HS) \le \tilde y_\alpha \mid z = HS] = P[y \le \tilde y_\alpha \mid z = HS] \le 1$. The bounds on $P[y(C) \le \tilde y_\alpha]$ are finally inverted to obtain the identification region for $Q_\alpha[y(C)]$. The general result is stated in the following proposition.

Proposition 1.2 (MTR). Let T be an ordered set of treatments. Let the response function $y_j(\cdot)$, with $j \in J$, be non-decreasing on T. Let $\alpha \in (0,1)$, and define $r^{MTR}(\alpha, t)$ and $s^{MTR}(\alpha, t)$ as

$$
r^{MTR}(\alpha, t) =
\begin{cases}
Q_{1 - \frac{1-\alpha}{P(z \le t)}}(y \mid z \le t) & \text{if } P(z > t) < \alpha < 1,\\
y^{INF} & \text{otherwise,}
\end{cases}
$$

$$
s^{MTR}(\alpha, t) =
\begin{cases}
Q_{\frac{\alpha}{P(z \ge t)}}(y \mid z \ge t) & \text{if } 0 < \alpha \le P(z \ge t),\\
y^{SUP} & \text{otherwise,}
\end{cases}
$$

where $y^{INF}$ and $y^{SUP}$ are the lower and upper bounds of the support of $y(t)$. Then, for every $t \in T$, $r^{MTR}(\alpha, t) \le Q_\alpha[y(t)] \le s^{MTR}(\alpha, t)$.

proof: See Manski (1997).

Consistent with the intuition of the education-wages example, the MTR assumption generically yields at least one informative bound on the quantiles of $y(t)$. In particular, it can be shown that, for any value of α, a necessary but not sufficient condition for both $r^{MTR}(\alpha, t)$ and $s^{MTR}(\alpha, t)$ to be informative is that a positive fraction of the population receives treatment t, i.e. $P(z = t) > 0$. Since this condition is much weaker than its EE counterpart $P(z = t) > \frac{1}{2}$, MTR has greater identification power than EE. The difference between the widths of the identification regions for the distribution function under EE and under MTR is the non-negative quantity

$$
P(z < t) - \sum_{t' < t} P[y \le \tilde y_\alpha \mid z = t']\, P(z = t') + \sum_{t'' > t} P[y \le \tilde y_\alpha \mid z = t'']\, P(z = t'').
$$

Finally, under MTR, sharp bounds on the α-QTE$(t_2 - t_1)$, with $t_2 \ge t_1$, are given by

$$
0 \le Q_\alpha[y(t_2)] - Q_\alpha[y(t_1)] \le Q_\alpha[y^{SUP}(t_2)] - Q_\alpha[y^{INF}(t_1)],
$$

where $y^{SUP}(t)$ and $y^{INF}(t)$ are random variables in the population J such that $y_j^{INF}(t) = y_j$ if $z_j \le t$ and $y_j^{INF}(t) = y^{INF}$ otherwise, while $y_j^{SUP}(t) = y_j$ if $z_j \ge t$ and $y_j^{SUP}(t) = y^{SUP}$ otherwise.
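A minimal Python sketch of the sample analogue of Proposition 1.2 and of the MTR upper bound on the α-QTE, assuming numerically coded, ordered treatments (the coding 0, 1, 2 below is hypothetical, e.g. elementary, high school, college) and known support endpoints. It relies on the fact, which follows from the definitions above, that the upper bound on the distribution function under MTR is the distribution function of $y^{INF}(t)$ and the lower bound is that of $y^{SUP}(t)$, so $r^{MTR}(\alpha, t)$ and $s^{MTR}(\alpha, t)$ equal the α-quantiles of $y^{INF}(t)$ and $y^{SUP}(t)$. Toy data and function names are illustrative.

```python
import math


def empirical_quantile(values, tau):
    """Smallest observed value whose empirical CDF is >= tau (0 < tau <= 1)."""
    xs = sorted(values)
    k = math.ceil(tau * len(xs) - 1e-9)  # tolerance absorbs float round-off
    return xs[max(k, 1) - 1]


def mtr_bounds(z, y, t, alpha, y_inf, y_sup):
    """Sample analogue of (r^MTR(alpha, t), s^MTR(alpha, t)) in Proposition 1.2."""
    n = len(y)
    y_le = [yi for zi, yi in zip(z, y) if zi <= t]  # y | z <= t
    y_ge = [yi for zi, yi in zip(z, y) if zi >= t]  # y | z >= t
    p_le, p_ge = len(y_le) / n, len(y_ge) / n
    p_gt = 1 - p_le
    lower = (empirical_quantile(y_le, 1 - (1 - alpha) / p_le)
             if p_gt < alpha else y_inf)
    upper = (empirical_quantile(y_ge, alpha / p_ge)
             if p_ge > 0 and alpha <= p_ge else y_sup)
    return lower, upper


def y_inf_sup(z, y, t, y_inf, y_sup):
    """Construct the random variables y^INF(t) and y^SUP(t) defined above."""
    lo = [yi if zi <= t else y_inf for zi, yi in zip(z, y)]
    hi = [yi if zi >= t else y_sup for zi, yi in zip(z, y)]
    return lo, hi


def mtr_qte_upper(z, y, t1, t2, alpha, y_inf, y_sup):
    """Upper bound Q_alpha[y^SUP(t2)] - Q_alpha[y^INF(t1)]; the lower bound is 0."""
    lo_t1, _ = y_inf_sup(z, y, t1, y_inf, y_sup)
    _, hi_t2 = y_inf_sup(z, y, t2, y_inf, y_sup)
    return empirical_quantile(hi_t2, alpha) - empirical_quantile(lo_t1, alpha)


z = [0, 0, 1, 1, 2, 2]
y = [1, 2, 3, 4, 5, 6]
print(mtr_bounds(z, y, 1, 0.5, 0, 10))        # (1, 5)
print(mtr_qte_upper(z, y, 1, 2, 0.5, 0, 10))  # 9
```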

1.2.3

More on Response-Based Restrictions and Common Confusions with IV-Type Assumptions

The set-up so far has been one in which treatments are quantities that can be manipulated exogenously,6 inducing variation in response, while covariates are simply realized quantities associated with the members of the population, on the same footing as $z_j$. In fact, treatments and covariates can be usefully integrated, and a generalized response function $y_j^*(t, \nu): T \times V \to Y$ can be introduced to represent the conjectural outcome of person j, should she receive treatment pair $(t, \nu)$ (Manski, 1997). Her realized outcome under the actual treatment vector $(z_j, v_j)$ is then given by $y_j = y_j^*(z_j, v_j)$, and the response function $y_j(\cdot): T \to Y$, previously regarded as a primitive, becomes a derived sub-response function obtained by evaluating the generalized response with its second argument set at the realized value: $y_j(\cdot) = y_j^*(\cdot, v_j)$.7 By formalizing a covariate as a realized treatment, this framework makes the usefulness of the distinction between t and ν transparent whenever one wants to perform a thought experiment that varies t but not ν. Moreover, to describe scenarios in which some components of the instrumental vector also vary, the definition of T need simply be generalized to a set of treatment vectors.

6 This is the main reason for distinguishing between conjectural treatments $t \in T$ and actual treatments $z_j \in T$.

7 This can be further generalized to allow for variation in covariates induced by variation in treatments, by introducing the covariate response function $v_j(\cdot): T \to V$, so that $y_j(t) = y_j^*[t, v_j(t)]$ (Manski, 1997).

Accordingly, Manski (1997) studies identification of parameters of $P[y(t)]$ that respect stochastic dominance ("D-parameters") when the individual response $y_j(\cdot)$ is assumed to be a function of a K-dimensional vector of treatments $(t_1, \ldots, t_K)$ and T is only semi-ordered. In the human capital application considered in this paper, one could sensibly define the treatment vector in terms of a schooling component and an on-the-job training or experience component.8 In this generalized framework, confusion may easily arise about the statements "covariate ν does not affect response" and "response is monotone in ν", for both can be given an interpretation based on either response-type or IV-type restrictions. Namely, if one interprets the two quoted statements as restricting individual responses, then they correspond, respectively, to an assumption of constant treatment response (Manski and Pepper, 1998):

$$y_j^*(t, \tilde v) = y_j^*(t, v_j) \quad \forall\, \tilde v \in V,$$

and monotone treatment response:

$$v_2 \ge v_1 \implies y_j^*(t, v_2) \ge y_j^*(t, v_1), \qquad v_1, v_2 \in V.$$

That is, an individual's wage is either a constant function of her measured ability or non-decreasing in it.

8 While actually implementing this case in the empirical illustration is beyond the scope of this paper, theoretical results have been derived both for the assumption of semi-monotone treatment response, by Manski (1997), and for α-QSMTS and α-QSMIV, in Appendix A.4 of this paper. An additional extension, which has already been explored by Okumura and Usui (2006) in the returns-to-schooling context, is the assumption of concave-monotone treatment response (concave-MTR). Analytical bounds for D-outcomes and ∆DTEs under concave-MTR are derived in Manski (1997). Okumura and Usui (2006) study identification under concave-MTR combined with MTS and estimate bounds on average treatment effects of schooling on log(wages) that are significantly tighter than those previously obtained by Manski and Pepper (2000) on the same data. Concave-MTR seems a plausible assumption for the returns-to-schooling application, being well rooted in human capital theory and substantiated by recent empirical evidence (see Heckman et al., 2003).

If the second interpretation is meant instead, then the two statements correspond to IV and MIV assumptions. IV assumes the existence of an observable ν such that

$$E[y(t) \mid \nu = \tilde v] = E[y(t)] \quad \forall\, \tilde v \in V.$$

MIV relaxes this statement to one of weak monotonicity, i.e. given an ordered set of instruments V,

$$v_2 \ge v_1 \implies E[y(t) \mid \nu = v_2] \ge E[y(t) \mid \nu = v_1], \qquad v_1, v_2 \in V.$$

Exogenous treatment selection (ETS) and MTS (Manski and Pepper, 2000) constitute, in turn, the special cases in which the instrument ν is the realized treatment z. Applied to the wage-schooling relationship, MTS says that persons who select higher levels of schooling have weakly higher mean wage functions than do those who select lower levels of schooling, for instance because they have higher ability; MIV, instead, maintains that the conditional mean wage function given some measure of a person's ability, ν, is weakly monotonic across sub-groups defined by values of ν. Then, should the type of ability measured by ν be valued in the labor market, MIV would seem far more plausible than IV, for the latter holds that persons with different measured ability have the same mean wage functions. α-QMTS and α-QMIV, introduced in the next section, apply monotonicity to the α-quantile function, conditional on z and ν respectively, in place of the mean.

1.3

Monotonicity Restrictions on α-Quantiles

1.3.1

Introducing α-Quantile Monotone Treatment Selection

Assumption α-QMTS holds that the α-quantile of the distribution of potential outcomes $y(t)$, conditional on the realized treatment z, is non-decreasing in z. Formally,

Assumption 1.3 (α-QMTS). Let T be an ordered set and $\alpha \in (0,1)$. Then, for all $t, t_1, t_2 \in T$,

$$t_2 \ge t_1 \implies Q_\alpha[y(t) \mid z = t_2] \ge Q_\alpha[y(t) \mid z = t_1].$$

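The key to deriving the identification result for $Q_\alpha[y(t)]$ under α-QMTS is to realize that $P[y(t) \le \tilde y_\alpha \mid z = t] = P[y \le \tilde y_\alpha \mid z = t]$ represents a logical

$$\text{upper bound for all } P[y(t) \le \tilde y_\alpha \mid z = \tilde t] \text{ such that } \tilde t \ge t \tag{1}$$

and a

$$\text{lower bound for all } P[y(t) \le \tilde y_\alpha \mid z = \tilde t] \text{ such that } \tilde t \le t, \tag{2}$$

which yield the following identification region for $P[y(t) \le \tilde y_\alpha]$:

$$P[y \le \tilde y_\alpha \mid z = t]\, P(z \le t) \le P[y(t) \le \tilde y_\alpha] \le P[y \le \tilde y_\alpha \mid z = t]\, P(z \ge t) + P(z < t). \tag{3}$$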

It is then apparent how, compared to the no-assumption case, (1) improves identification of $UB\{P[y(t) \le \tilde y_\alpha]\}$ by restricting the behavior of the conditional counterfactual distributions beyond the treatment sub-group $z = t$, to include $z = \tilde t > t$. Symmetrically, (2) improves identification of $LB\{P[y(t) \le \tilde y_\alpha]\}$ by additionally restricting the behavior of the conditional counterfactual distributions for $z = \tilde t < t$. The corresponding analytical sharp bounds on $Q_\alpha[y(t)]$ are finally obtained by inversion of (3), as before. They are formally established in the following proposition.

Proposition 1.4. Let the α-QMTS Assumption 1.3 hold. Let $\alpha \in (0,1)$, and define $r^{QMTS}(\alpha, t)$ and $s^{QMTS}(\alpha, t)$ as

$$
r^{QMTS}(\alpha, t) =
\begin{cases}
Q_{1 - \frac{1-\alpha}{P(z \ge t)}}(y \mid z = t) & \text{if } P(z < t) < \alpha < 1,\\
y^{INF} & \text{otherwise,}
\end{cases}
$$

$$
s^{QMTS}(\alpha, t) =
\begin{cases}
Q_{\frac{\alpha}{P(z \le t)}}(y \mid z = t) & \text{if } 0 < \alpha \le P(z \le t),\\
y^{SUP} & \text{otherwise.}
\end{cases}
$$

Then, for every $t \in T$, $r^{QMTS}(\alpha, t) \le Q_\alpha[y(t)] \le s^{QMTS}(\alpha, t)$.

proof: In Appendix A.2. A lower (upper) sharp bound on $Q_\alpha[y(t_2)] - Q_\alpha[y(t_1)]$ can be constructed, as usual, by subtracting the upper (lower) bound on $Q_\alpha[y(t_1)]$ from the lower (upper) bound on $Q_\alpha[y(t_2)]$.
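A hedged Python sketch of the sample analogue of Proposition 1.4, together with the differencing rule for α-QTE bounds just described. Treatments are assumed to be numerically coded and ordered; function names, support endpoints and toy data are hypothetical.

```python
import math


def empirical_quantile(values, tau):
    """Smallest observed value whose empirical CDF is >= tau (0 < tau <= 1)."""
    xs = sorted(values)
    k = math.ceil(tau * len(xs) - 1e-9)  # tolerance absorbs float round-off
    return xs[max(k, 1) - 1]


def qmts_bounds(z, y, t, alpha, y_inf, y_sup):
    """Sample analogue of (r^QMTS(alpha, t), s^QMTS(alpha, t)) in Proposition 1.4.

    Only the distribution of y | z = t is inverted; the other treatment groups
    enter through the shares P(z >= t) and P(z <= t).
    """
    n = len(y)
    y_t = [yi for zi, yi in zip(z, y) if zi == t]
    if not y_t:
        return y_inf, y_sup
    p_lt = sum(1 for zi in z if zi < t) / n
    p_gt = sum(1 for zi in z if zi > t) / n
    p_ge, p_le = 1 - p_lt, 1 - p_gt
    lower = empirical_quantile(y_t, 1 - (1 - alpha) / p_ge) if p_lt < alpha else y_inf
    upper = empirical_quantile(y_t, alpha / p_le) if alpha <= p_le else y_sup
    return lower, upper


def qte_bounds(bounds_t1, bounds_t2):
    """(LB(t2) - UB(t1), UB(t2) - LB(t1)) for Q_alpha[y(t2)] - Q_alpha[y(t1)]."""
    (r1, s1), (r2, s2) = bounds_t1, bounds_t2
    return r2 - s1, s2 - r1


z = [0, 0, 1, 1, 2, 2]
y = [1, 2, 3, 4, 5, 6]
print(qmts_bounds(z, y, 1, 0.5, 0, 10))  # (3, 4)
print(qte_bounds(qmts_bounds(z, y, 1, 0.5, 0, 10),
                 qmts_bounds(z, y, 2, 0.5, 0, 10)))  # (-4, 2)
```

With the same toy data, the MTR-only sample bounds at $t = 1$ would be $[1, 5]$, so here α-QMTS is tighter on both sides; the text notes that this ranking is not general.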

Let us now compare α-QMTS with EE and MTR. Under α-QMTS, the conditions for the UB and LB to be jointly informative are generally easier to meet than without any prior information, since

$$P(z \le t) \ge P(z = t) \quad \text{and} \quad P(z \ge t) \ge P(z = t).$$

Moreover, the greater identification power of α-QMTS with respect to EE can easily be seen by comparing the results of Propositions 1.4 and 1.1:

$$
\begin{aligned}
Q_{1 - \frac{1-\alpha}{P(z \ge t)}}(y \mid z = t) &\ge Q_{1 - \frac{1-\alpha}{P(z = t)}}(y \mid z = t),\\
Q_{\frac{\alpha}{P(z \le t)}}(y \mid z = t) &\le Q_{\frac{\alpha}{P(z = t)}}(y \mid z = t),
\end{aligned}
$$

i.e. the LB and the UB computed under α-QMTS are, respectively, a higher and a lower quantile of the same known conditional distribution than those under EE. Comparison of MTR and α-QMTS is less clear-cut, both conceptually and in practice. On the one hand, it is possible to show that MTR and α-QMTS imply simultaneously informative identification regions for $Q_\alpha[y(t)]$ either when

$$
\begin{cases}
P(z < t) < \alpha < P(z \ge t),\\
P(z = t) > P(z < t) - P(z > t) > 0,
\end{cases}
\qquad \text{(Case A)}
$$

or when

$$
\begin{cases}
P(z > t) < \alpha < P(z \le t),\\
P(z = t) > P(z > t) - P(z < t) > 0.
\end{cases}
\qquad \text{(Case B)}
$$

When (Case A) holds, α-QMTS yields a smaller UB on $Q_\alpha[y(t)]$ than MTR does, while nothing definitive can be said about the LB without further restrictions or assumptions. The opposite is true for (Case B). On the other hand, MTR and α-QMTS are better regarded as two distinct monotonicity-type assumptions that are likely to hold simultaneously in many applications. In the schooling-wages setting, for instance, MTR and (α-Q)MTS provide two different interpretations of the statement "wages increase with schooling". Hence, rather than asking whether one dominates the other in terms of identification power, a more interesting question may be whether their joint application yields a substantial gain in identification. I address this question in Section 1.4.
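To make the two cases concrete, the small Python helper below (hypothetical, for illustration only) transcribes (Case A) and (Case B) exactly as stated, given the treatment shares $P(z < t)$, $P(z = t)$, $P(z > t)$ and α, and returns `None` when neither case applies.

```python
def simultaneous_informativeness_case(p_lt, p_eq, p_gt, alpha):
    """Return "A" or "B" if the corresponding case holds, else None.

    p_lt, p_eq, p_gt stand for P(z < t), P(z = t), P(z > t) and should sum to 1.
    """
    p_ge, p_le = p_eq + p_gt, p_lt + p_eq
    # Case A: P(z<t) < alpha < P(z>=t) and P(z=t) > P(z<t) - P(z>t) > 0.
    if p_lt < alpha < p_ge and p_eq > p_lt - p_gt > 0:
        return "A"
    # Case B: P(z>t) < alpha < P(z<=t) and P(z=t) > P(z>t) - P(z<t) > 0.
    if p_gt < alpha < p_le and p_eq > p_gt - p_lt > 0:
        return "B"
    return None


print(simultaneous_informativeness_case(0.4, 0.4, 0.2, 0.5))  # A
print(simultaneous_informativeness_case(0.2, 0.4, 0.4, 0.5))  # B
print(simultaneous_informativeness_case(0.4, 0.4, 0.2, 0.9))  # None
```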

1.3.2

Generalizing to α-Quantile Monotone Instrumental Variables

Assumption α-QMIV generalizes α-QMTS exactly as MIV generalizes MTS.

Assumption 1.5 (α-QMIV). Let V be an ordered set of instruments. Then, for all $v_1, v_2 \in V$,

$$v_2 \ge v_1 \implies Q_\alpha[y(t) \mid \nu = v_2] \ge Q_\alpha[y(t) \mid \nu = v_1].$$

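The identifying power of Assumption 1.5 is established in the following proposition.

Proposition 1.6. Let the α-QMIV Assumption 1.5 hold. Then the identification region for $P[y(t) \le \tilde y_\alpha]$ is

$$
\begin{aligned}
&\sum_{v \in V} P(\nu = v)\, P[y \le \tilde y_\alpha \mid \nu = \hat{\hat v}^*(v), z = t]\, P(z = t \mid \nu = \hat{\hat v}^*(v)) \le P[y(t) \le \tilde y_\alpha] \\
&\quad \le \sum_{v \in V} P(\nu = v) \left\{ P[y \le \tilde y_\alpha \mid \nu = \hat v^*(v), z = t]\, P(z = t \mid \nu = \hat v^*(v)) + P(z \ne t \mid \nu = \hat v^*(v)) \right\},
\end{aligned}
\tag{4}
$$

where $\hat{\hat v}^*(v)$ and $\hat v^*(v)$ are defined as

$$\hat{\hat v}^*(v) = \arg\max_{\hat{\hat v} \ge v} P[y \le \tilde y_\alpha \mid \nu = \hat{\hat v}, z = t]\, P(z = t \mid \nu = \hat{\hat v})$$

and

$$\hat v^*(v) = \arg\min_{\hat v \le v} \left\{ P[y \le \tilde y_\alpha \mid \nu = \hat v, z = t]\, P(z = t \mid \nu = \hat v) + P(z \ne t \mid \nu = \hat v) \right\},$$

with $Q_\alpha[y(t)] = \min \tilde y_\alpha$ s.t. $P[y(t) \le \tilde y_\alpha] \ge \alpha$.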

proof: In Appendix A.3.

As for the mean (see Manski and Pepper, 2000), these bounds are informative if the no-assumption bounds for $Q_\alpha[y(t) \mid \nu]$ are not monotone increasing in ν. When they are, the α-QMIV and the no-assumption bounds on $P[y(t) \le \tilde y_\alpha]$ coincide. Inspection of (4) reveals two problematic features of the identification region of $P[y(t) \le \tilde y_\alpha]$ under α-QMIV. First, its bounds are mixtures of empirical conditional distributions, so it is not possible to invert them analytically and express the identification region for $Q_\alpha[y(t)]$ in terms of quantiles of known distributions and probabilities. Second, the bounds on $Q_\alpha[y(t)]$ are "implicit" not only because the latter is embedded in $\tilde y_\alpha$ on the left side of the conditioning sign of the distribution function, but also because it appears on the right side. Namely, $\tilde y_\alpha$ enters the maximization and minimization problems that determine $\hat{\hat v}^* = \hat{\hat v}^*(v, \tilde y_\alpha)$ and $\hat v^* = \hat v^*(v, \tilde y_\alpha)$, respectively. Hence $\hat{\hat v}^*$ and $\hat v^*$ cannot be solved for analytically, because $\tilde y_\alpha$ is unknown. An exception is the binary instrument case, in which the bounds on $Q_\alpha[y(t)]$ simplify remarkably, as stated in the following corollary.
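Since the bounds in (4) cannot be inverted analytically, one option is to evaluate them on a grid and invert numerically. The Python sketch below does this with sample frequencies, assuming a discrete, ordered instrument and a finite sample. Because the sample bounds are step functions that can only jump at observed outcomes (plus $y^{INF}$, where the worst-case mass of the untreated sits), scanning those points suffices. Function names and toy data are hypothetical, and exact `Fraction` arithmetic is used to avoid spurious ties from floating-point round-off.

```python
from fractions import Fraction


def qmiv_quantile_bounds(z, v, y, t, alpha, y_inf, y_sup):
    """Numerically invert the sample analogue of the alpha-QMIV bounds in (4)."""
    n = len(y)
    levels = sorted(set(v))
    n_v = {u: sum(1 for vi in v if vi == u) for u in levels}
    untreated = {u: sum(1 for zi, vi in zip(z, v) if vi == u and zi != t)
                 for u in levels}

    def cdf_bounds(c):
        # No-assumption bounds on P[y(t) <= c | nu = u] within each instrument cell:
        # low[u] = P(y <= c, z = t | nu = u), up[u] = low[u] + P(z != t | nu = u).
        low, up = {}, {}
        for u in levels:
            hits = sum(1 for zi, vi, yi in zip(z, v, y)
                       if vi == u and zi == t and yi <= c)
            low[u] = Fraction(hits, n_v[u])
            up[u] = Fraction(hits + untreated[u], n_v[u])
        # alpha-QMIV: best lower bound among cells at or above u,
        # best upper bound among cells at or below u, weighted by P(nu = u).
        lower = sum(Fraction(n_v[u], n) * max(low[w] for w in levels if w >= u)
                    for u in levels)
        upper = sum(Fraction(n_v[u], n) * min(up[w] for w in levels if w <= u)
                    for u in levels)
        return lower, upper

    grid = sorted(set(y) | {y_inf})
    bounds = [(c, *cdf_bounds(c)) for c in grid]
    # Lower bound on Q_alpha: first point where the upper CDF bound reaches alpha.
    q_low = next(c for c, _, up in bounds if up >= alpha)
    # Upper bound on Q_alpha: first point where the lower CDF bound reaches alpha.
    q_high = next((c for c, lo, _ in bounds if lo >= alpha), y_sup)
    return q_low, q_high


z = [1, 1, 0, 0, 1, 1, 1, 0]
v = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 4, 1, 1, 1, 3, 5, 2]
print(qmiv_quantile_bounds(z, v, y, 1, 0.5, 0, 10))  # (1, 3)
```

On the same toy data, the no-assumption bounds of Proposition 1.1 are $[1, 4]$, so the instrument tightens the upper bound here; with a single instrument level the routine reduces to those no-assumption bounds.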

Corollary 1.7. Let $v \in V \equiv \{0, 1\}$ and $\alpha \in (0,1)$. Define $r^{BQMIV}(\alpha, t)$ and $s^{BQMIV}(\alpha, t)$ as

$$
\begin{aligned}
r^{BQMIV}(\alpha, t) = \max \Big\{ & Q_{1 - \frac{1-\alpha}{P(z = t \mid \nu = 0)}}(y \mid \nu = 0, z = t), \\
& Q_\alpha\big\{ P[y \le \tilde y_\alpha \mid \nu = 0, z = t]\, P(\nu = 0, z = t) + P[y \le \tilde y_\alpha \mid \nu = 1, z = t]\, P(\nu = 1, z = t) + P(z \ne t) \big\} \Big\},
\end{aligned}
$$

$$
\begin{aligned}
s^{BQMIV}(\alpha, t) = \min \Big\{ & Q_{\frac{\alpha}{P(z = t \mid \nu = 1)}}(y \mid \nu = 1, z = t), \\
& Q_\alpha\big\{ P[y \le \tilde y_\alpha \mid \nu = 0, z = t]\, P(\nu = 0, z = t) + P[y \le \tilde y_\alpha \mid \nu = 1, z = t]\, P(\nu = 1, z = t) \big\} \Big\}.
\end{aligned}
$$

Then, for every $t \in T$, under Assumption 1.5,

$$r^{BQMIV}(\alpha, t) \le Q_\alpha[y(t)] \le s^{BQMIV}(\alpha, t).$$

proof: By substitution of $v \in \{0, 1\}$ in (4).

As in the previous results, the above max and min should be taken only over the ranges of α for which the above expressions represent proper quantiles. Outside those ranges the bounds are non-informative and should be replaced by $y^{INF}$ and $y^{SUP}$, respectively.

Consider now the set-up in which $y_j^*(t, \nu) = g(t, \nu, \eta_j, \varepsilon_j)$, with $t \in T$, $\nu \in V$, $\eta \in N$, and $\varepsilon \in E$. Here t and ν are treatments, while $\eta_j$ and $\varepsilon_j$ are conceptually distinct unobservable covariates; T, V and N are ordered sets. The following set of sufficient conditions, proposed by Manski and Pepper (1998) for a variable ν to be a valid MIV, also guarantees that ν is a valid α-QMIV:9

1. $g(t, \cdot, \cdot, \varepsilon): V \times N \to \mathbb{R}$ is weakly increasing in ν and η;
2. ε is statistically independent of $(\nu, \eta)$;
3. $P(\eta \mid \nu)$ is weakly increasing in ν, i.e. $v_2 \ge v_1 \implies P(\eta \mid \nu = v_2)$ stochastically dominates $P(\eta \mid \nu = v_1)$.

9 The derivation is easily obtained by appropriate modification of Manski and Pepper's (1998) proof and is available from the author upon request. In fact, none of these conditions specifically involves the mean; after all, both quantiles and the mean respect stochastic dominance.

In the empirical application I use mother's education as an α-QMIV. In terms of the above conditions, this amounts to assuming that: (i) given some fixed level of education t, unobservable ability η, and the other unobservable term ε, an individual's wage $g(t, \cdot, \eta, \varepsilon)$ is weakly increasing in her mother's educational level ν; and (ii) the distribution of ability among persons with higher mother's education weakly dominates the distribution of ability among persons with lower mother's education. On the one hand, mother's education is assumed to have either no direct impact on wages or a positive direct impact; on the other hand, education of one's mother is not required to be equivalent to ability, but just to be a weakly positive predictor of it. Indeed, while the achievement measures pointed to by Manski and Pepper (1998) are natural and appealing MIVs or α-QMIVs in this context, measures of parents' education also seem plausible when the former are not available. Having a mother with a higher educational level will presumably not hurt an individual's wage. Moreover, it will plausibly have some predictive power for her ability, via the nurturing process or simply through the mother's own ability, if education and ability are positively related.

1.4

Increasing the Identification Power: Combined Assumptions

1.4.1

α-QMTS&MTR

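When MTR and α-QMTS are assumed to hold jointly, the identification region for the distribution function $P[y(t) \le \tilde y_\alpha]$ shrinks significantly. This finding parallels that for the mean obtained by Manski and Pepper (2000). Formally, by the law of total probability,

$$
\begin{aligned}
P[y(t) \le \tilde y_\alpha] &= P[y(t) \le \tilde y_\alpha \mid z \ge t]\, P(z \ge t) + \sum_{t' < t} P[y(t) \le \tilde y_\alpha \mid z = t']\, P(z = t') \\
&= P[y(t) \le \tilde y_\alpha \mid z \le t]\, P(z \le t) + \sum_{t'' > t} P[y(t) \le \tilde y_\alpha \mid z = t'']\, P(z = t'').
\end{aligned}
\tag{5}
$$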

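Then, by MTR,

$$P[y(t) \le \tilde y_\alpha \mid z \ge t] \ge P[y(z) \le \tilde y_\alpha \mid z \ge t] = P[y \le \tilde y_\alpha \mid z \ge t]$$

and

$$P[y(t) \le \tilde y_\alpha \mid z \le t] \le P[y(z) \le \tilde y_\alpha \mid z \le t] = P[y \le \tilde y_\alpha \mid z \le t],$$

while, by α-QMTS, $P[y \le \tilde y_\alpha \mid z = t]$ is a lower bound for $P[y(t) \le \tilde y_\alpha \mid z = s]$ for all $s = s' < t$, and an upper bound for it for all $s = s'' > t$. Hence (5) becomes

$$
\begin{aligned}
&P[y \le \tilde y_\alpha \mid z \ge t]\, P(z \ge t) + P[y \le \tilde y_\alpha \mid z = t]\, P(z < t) \le P[y(t) \le \tilde y_\alpha] \\
&\quad \le P[y \le \tilde y_\alpha \mid z \le t]\, P(z \le t) + P[y \le \tilde y_\alpha \mid z = t]\, P(z > t).
\end{aligned}
\tag{6}
$$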
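Because the bounds in (6) are mixtures of observable conditional distributions, $Q_\alpha[y(t)]$ can be bounded by inverting them numerically. The Python sketch below evaluates sample analogues of both sides of (6) at each observed outcome and returns the smallest points at which the upper and lower distribution-function bounds reach α. Unlike EE and MTR, no support endpoints are needed, since (6) involves only observable distributions (provided some observations have $z = t$). Names and toy data are hypothetical.

```python
from fractions import Fraction


def qmts_mtr_quantile_bounds(z, y, t, alpha):
    """Numerically invert the sample analogue of the alpha-QMTS&MTR bounds in (6)."""
    n = len(y)
    y_t = [yi for zi, yi in zip(z, y) if zi == t]
    if not y_t:
        raise ValueError("the bounds in (6) need observations with z == t")
    p_lt = Fraction(sum(1 for zi in z if zi < t), n)
    p_gt = Fraction(sum(1 for zi in z if zi > t), n)

    def f_t(c):  # P[y <= c | z = t]
        return Fraction(sum(1 for yi in y_t if yi <= c), len(y_t))

    def lower_cdf(c):  # P[y <= c | z >= t] P(z >= t) + P[y <= c | z = t] P(z < t)
        joint = sum(1 for zi, yi in zip(z, y) if zi >= t and yi <= c)
        return Fraction(joint, n) + f_t(c) * p_lt

    def upper_cdf(c):  # P[y <= c | z <= t] P(z <= t) + P[y <= c | z = t] P(z > t)
        joint = sum(1 for zi, yi in zip(z, y) if zi <= t and yi <= c)
        return Fraction(joint, n) + f_t(c) * p_gt

    grid = sorted(set(y))  # both bounds are step functions jumping only here
    q_low = next(c for c in grid if upper_cdf(c) >= alpha)
    q_high = next(c for c in grid if lower_cdf(c) >= alpha)
    return q_low, q_high


z = [1, 1, 1, 0, 0, 2, 2]
y = [1, 5, 9, 2, 2, 6, 6]
print(qmts_mtr_quantile_bounds(z, y, 1, 0.5))  # (2, 6)
```

In this toy example the α-QMTS bounds of Proposition 1.4 alone are $[1, 9]$ and the MTR bounds of Proposition 1.2 are $[2, 6]$; the combined bounds coincide with the latter here, which is a feature of the tiny discrete sample rather than a general result.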

As in the case of α-QMIV, the bounds of the identification region for $P[y(t) \le \tilde y_\alpha]$ under α-QMTS&MTR are mixtures of empirical conditional distributions that cannot be inverted analytically. Despite the lack of a closed form for the identification region of $Q_\alpha[y(t)]$, comparing the identification regions of the distribution function under α-QMTS&MTR, MTR and α-QMTS suffices to show that the composite assumption has greater identification power than either MTR or α-QMTS applied alone. This should be intuitive, since using more information cannot worsen identification. The width of the identification region under MTR is

$$P[y \le \tilde y_\alpha \mid z < t]\, P(z < t) + (1 - P[y \le \tilde y_\alpha \mid z > t])\, P(z > t),$$

under α-QMTS it is

$$P[y \le \tilde y_\alpha \mid z = t]\, P(z > t) + (1 - P[y \le \tilde y_\alpha \mid z = t])\, P(z < t),$$

and, finally, under α-QMTS&MTR it is

$$(P[y \le \tilde y_\alpha \mid z < t] - P[y \le \tilde y_\alpha \mid z = t])\, P(z < t) + (P[y \le \tilde y_\alpha \mid z = t] - P[y \le \tilde y_\alpha \mid z > t])\, P(z > t).$$

The last width is smaller than the MTR width by $P[y \le \tilde y_\alpha \mid z = t]\, P(z < t) + (1 - P[y \le \tilde y_\alpha \mid z = t])\, P(z > t) \ge 0$, and smaller than the α-QMTS width by $(1 - P[y \le \tilde y_\alpha \mid z < t])\, P(z < t) + P[y \le \tilde y_\alpha \mid z > t]\, P(z > t) \ge 0$.
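As a quick numerical illustration of how the three widths above compare, the hypothetical Python helper below evaluates the three width formulas at a single point $\tilde y_\alpha$ (called `c`) using sample frequencies on toy data. In a sample, a negative value for the α-QMTS&MTR width would mean that the estimated region is empty at that point.

```python
from fractions import Fraction


def cdf_bound_widths(z, y, t, c):
    """Widths at y_tilde = c of the CDF identification regions under
    MTR, alpha-QMTS and alpha-QMTS&MTR, using the three formulas above."""
    n = len(y)

    def share(cond):  # P(cond(z))
        return Fraction(sum(1 for zi in z if cond(zi)), n)

    def cdf(cond):  # P[y <= c | cond(z)]; 0 if the group is empty (its weight is 0 too)
        ys = [yi for zi, yi in zip(z, y) if cond(zi)]
        return Fraction(sum(1 for yi in ys if yi <= c), len(ys)) if ys else Fraction(0)

    p_lt, p_gt = share(lambda s: s < t), share(lambda s: s > t)
    f_lt = cdf(lambda s: s < t)
    f_eq = cdf(lambda s: s == t)
    f_gt = cdf(lambda s: s > t)
    mtr = f_lt * p_lt + (1 - f_gt) * p_gt
    qmts = f_eq * p_gt + (1 - f_eq) * p_lt
    both = (f_lt - f_eq) * p_lt + (f_eq - f_gt) * p_gt
    return mtr, qmts, both


z = [0, 0, 1, 1, 2, 2]
y = [1, 3, 2, 4, 3, 5]
print(cdf_bound_widths(z, y, 1, 3))  # (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
```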

Importantly, α-QMTS&MTR is refutable, while neither α-QMTS nor MTR alone is, because jointly they imply

$$
\hat t \le \hat{\hat t} \implies Q_\alpha[y \mid z = \hat t] = Q_\alpha[y(\hat t) \mid z = \hat t] \le Q_\alpha[y(\hat{\hat t}) \mid z = \hat t] \le Q_\alpha[y(\hat{\hat t}) \mid z = \hat{\hat t}] = Q_\alpha[y \mid z = \hat{\hat t}], \tag{7}
$$

where the first inequality follows from MTR, the second from α-QMTS, and $Q_\alpha[y \mid z = \hat t]$ and $Q_\alpha[y \mid z = \hat{\hat t}]$ are known. Hence, (7) is a necessary, verifiable condition for α-QMTS&MTR to hold. It would also open up the possibility of estimating the bounds in (6), and hence those on $Q_\alpha[y(t)]$, more efficiently by incorporating the information in (7) into estimation in the form of constraints.10 The point can be seen more easily if (7) is thought of in terms of observable conditional cumulative distributions of the type $P[y \le \tilde y_\alpha \mid z]$. On the one hand, for local asymptotics this would matter only if the true $P[y(t) \le \tilde y_\alpha]$ were very close to $P[y \le \tilde y_\alpha \mid z = t]$, since the lower bound in (6) is at most equal to $P[y \le \tilde y_\alpha \mid z = t]$ and the upper bound in (6) is at least equal to $P[y \le \tilde y_\alpha \mid z = t]$.11 On the other hand, it may have a sizeable effect on efficiency in finite samples.
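Condition (7) can be checked directly on the observed conditional quantiles. The Python sketch below is a point check on sample α-quantiles of $y \mid z$, not a formal statistical test: it ignores sampling variability, so a small violation in a sample need not refute the assumptions. Names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def empirical_quantile(values, tau):
    """Smallest observed value whose empirical CDF is >= tau (0 < tau <= 1)."""
    xs = sorted(values)
    k = math.ceil(tau * len(xs) - 1e-9)  # tolerance absorbs float round-off
    return xs[max(k, 1) - 1]


def check_condition_7(z, y, alpha):
    """Return (holds, quantiles): whether the sample alpha-quantiles of y | z are
    non-decreasing in z, as (7) requires, and the quantiles by treatment level."""
    groups = defaultdict(list)
    for zi, yi in zip(z, y):
        groups[zi].append(yi)
    levels = sorted(groups)
    q = [empirical_quantile(groups[s], alpha) for s in levels]
    holds = all(a <= b for a, b in zip(q, q[1:]))
    return holds, dict(zip(levels, q))


z = [0, 0, 1, 1, 2, 2]
print(check_condition_7(z, [1, 2, 3, 4, 5, 6], 0.5))  # (True, {0: 1, 1: 3, 2: 5})
print(check_condition_7(z, [5, 6, 1, 2, 3, 4], 0.5))  # (False, {0: 5, 1: 1, 2: 3})
```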

Noticeably, Manski and Pepper (2000) use their estimates of the MTR&MTS bounds on average 10

I thank Manuel Arellano for pointing this out. Note, nonetheless, that this is not likely to be true in practice. For example, in the sample used for the empirical application, condition (7) is satisfied in its strong form, as can be seen in Table 1. 11

treatment effects of schooling on log(wage) to implement a specification test on the schooling coefficients obtained by estimating standard linear IV models. The underlying idea is as follows. Monotonicity of response weakens linearity, MTS/MIV weakens the exogeneity assumed by OLS/IV, and no other assumption is made. Hence the estimated upper bounds on ATEs under MTR&MTS should not be smaller than the corresponding OLS/IV estimates. If they are smaller, at least one of the assumptions of the linear OLS/IV regressions must fail. This idea could also be applied to quantile regression, although one should carefully consider all the assumptions on which the parametric model rests, as opposed to the monotone one. For instance, in Chernozhukov and Hansen's (2005, 2006) model of QTE,12 point identification is achieved through an assumption of statistical independence between the potential outcomes and an appropriate instrument, together with one of rank invariance, which restricts the behavior of ranks across treatments. Comparing their set of assumptions with the ones analyzed here reveals that, on the one hand, they do not postulate any shape restriction, such as linearity or monotonicity, on the response functions. On the other hand, rank invariance requires that the relative position of individuals in the potential outcome distributions does not change across treatment states, conditional on the instrument. For instance, in the basic human capital model, unobservable ability or "proneness to earn" may provide a person's rank. Rank invariance would then imply that individuals' relative positions, as given by this latent variable, are the same in all potential earnings distributions, i.e. under all conjectural levels of education.13 Their independence condition is close in spirit and form to a standard IV assumption. Rank invariance, however, neither implies nor is implied by any shape restriction on the response function, because individuals' relative positions may be maintained across treatments without any necessary constraint on the relative wage levels. The two types of assumptions are fundamentally different in nature. The former (rank invariance) requires cross-person restrictions on the outcome distributions across different treatments. The latter (shape restrictions) affects the individual response as a function of the treatment, without any restriction relating different persons of the population.14 In the empirical application, I compare the α-QMTS&MTR bounds on a number of α-QTEs with parametric IVQR point estimates on the same sample, based on Chernozhukov and Hansen's (2006) model plus a linear specification.

12 For other related models dealing with identification and estimation of structural response functions and effects, see for instance Chesher (2003), Imbens and Newey (2002), Newey and Powell (2003), and Chernozhukov, Imbens and Newey (2007).
13 In passing, notice also that, as opposed to mean regression, QR models allow for limited heterogeneity of effects across quantiles. The framework is therefore consistent with the idea that individuals with the same level of ability face the same returns schedule, while individuals with higher ability face a higher and steeper wage-schooling locus than those with lower ability levels.
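The claim that rank invariance and a shape restriction such as MTR are logically distinct can be made concrete with a few conjectural outcomes. The plain-Python sketch below checks both properties on hypothetical potential log wages for four people. The vectors and function names are illustrative assumptions, not data or code from the application. Only one potential outcome per person is ever observed, so this is a thought experiment rather than a testable procedure. Rank invariance is also checked here deterministically, whereas in Chernozhukov and Hansen's model it is stated conditionally on the instrument.

```python
def ranks(values):
    """0-based rank of each element (ties broken by original position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def rank_invariant(y_t1, y_t2):
    """True if every person keeps the same rank under both treatments."""
    return ranks(y_t1) == ranks(y_t2)


def satisfies_mtr(y_t1, y_t2):
    """True if each person's outcome weakly increases from t1 to t2 (t2 > t1)."""
    return all(b >= a for a, b in zip(y_t1, y_t2))


# Hypothetical potential log wages of four people under t1 < t2.
y_t1 = [2.0, 2.5, 3.0, 3.5]
case_a = [1.5, 2.4, 3.2, 4.0]  # ranks preserved, but two people earn less under t2
case_b = [3.6, 2.6, 3.1, 3.5]  # nobody earns less under t2, but ranks reshuffle

print("case A: rank invariant =", rank_invariant(y_t1, case_a),
      "| MTR =", satisfies_mtr(y_t1, case_a))
print("case B: rank invariant =", rank_invariant(y_t1, case_b),
      "| MTR =", satisfies_mtr(y_t1, case_b))
```

Case A keeps every person's rank while two people earn less under t2, so rank invariance holds without MTR. Case B satisfies MTR while reshuffling ranks, so MTR holds without rank invariance.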

A final note concerns bounding QTEs. A UB on Qα[y(t2)] − Qα[y(t1)] can be computed in the usual way, as the difference between the upper bound of Qα[y(t2)] and the lower bound of Qα[y(t1)]. By contrast, computing the LB by subtracting the UB of Qα[y(t1)] from the LB of Qα[y(t2)] may, in general, lead to a negative result. Therefore, following what Manski and Pepper (2000) do for the mean, in the empirical application I use 0 as a lower bound for the quantile treatment effects. The justification is that MTR implies Qα[y(t2)] − Qα[y(t1)] ≥ 0 for t2 ≥ t1, although in general this bound will not be sharp.
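The QTE bounds just described can be sketched as a small helper. The sketch assumes that bounds on each quantile have already been computed and that t2 > t1; the function name is hypothetical. Following the text, under MTR the lower bound is set to 0 whenever LB[Qα(y(t2))] − UB[Qα(y(t1))] is negative. Taking the maximum of the two valid lower bounds, rather than always using 0, is a small variation added here; it can only tighten the bound.

```python
def qte_bounds(q_t1, q_t2, mtr=True):
    """Bounds on Q_alpha[y(t2)] - Q_alpha[y(t1)] for t2 > t1.

    q_t1, q_t2: (lower, upper) bounds on Q_alpha[y(t1)] and Q_alpha[y(t2)].
    """
    lb1, ub1 = q_t1
    lb2, ub2 = q_t2
    upper = ub2 - lb1  # largest gap compatible with the two quantile bounds
    lower = lb2 - ub1  # valid, but often negative
    if mtr:
        # MTR makes y(t2) stochastically dominate y(t1), so the QTE is >= 0.
        lower = max(0.0, lower)
    return lower, upper


# Hypothetical quantile bounds (e.g. on median log wages).
print(qte_bounds((1.8, 2.3), (2.1, 2.6)))             # MTR: negative LB replaced by 0
print(qte_bounds((1.8, 2.3), (2.1, 2.6), mtr=False))  # raw difference, negative here
```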

1.4.2

α-QMIV&α-QMTS&MTR

It can be shown that, by simultaneously applying the restrictions implied by MTR, α-QMTS and α-QMIV to P[y(t) ≤ ỹα], once it has been rewritten in terms of the law of total probability, the following identification region is obtained:15

$$
\begin{aligned}
&\sum_{v \in V} P(\nu = v)\Big\{ P\big[y \le \tilde{y}_\alpha \mid \nu = \hat{\hat{v}}^*(v),\, z \ge t\big]\, P\big(z \ge t \mid \nu = \hat{\hat{v}}^*(v)\big) + P\big[y \le \tilde{y}_\alpha \mid \nu = \hat{\hat{v}}^*(v),\, z = t\big]\, P\big(z < t \mid \nu = \hat{\hat{v}}^*(v)\big) \Big\} \\
&\qquad \le P[y(t) \le \tilde{y}_\alpha] \le \\
&\sum_{v \in V} P(\nu = v)\Big\{ P\big[y \le \tilde{y}_\alpha \mid \nu = \hat{v}^*(v),\, z \le t\big]\, P\big(z \le t \mid \nu = \hat{v}^*(v)\big) + P\big[y \le \tilde{y}_\alpha \mid \nu = \hat{v}^*(v),\, z = t\big]\, P\big(z > t \mid \nu = \hat{v}^*(v)\big) \Big\},
\end{aligned}
\tag{8}
$$

where, for all v ∈ V, v̂̂*(v) and v̂*(v) are defined as before:

$$
\hat{\hat{v}}^*(v) = \arg\max_{\hat{\hat{v}} \ge v} \; P\big[y \le \tilde{y}_\alpha \mid \nu = \hat{\hat{v}},\, z = t\big]\, P\big(z = t \mid \nu = \hat{\hat{v}}\big)
$$

and

$$
\hat{v}^*(v) = \arg\min_{\hat{v} \le v} \; \Big\{ P\big[y \le \tilde{y}_\alpha \mid \nu = \hat{v},\, z = t\big]\, P\big(z = t \mid \nu = \hat{v}\big) + P\big(z \ne t \mid \nu = \hat{v}\big) \Big\}.
$$

14 To be more precise, the basic QR model (Bassett and Koenker, 1982) typically assumes that the individual responses and, hence, the conditional quantile functions are linear in parameters. It also assumes that the slopes are homogeneous within groups identifying each quantile, although they are allowed to differ across quantiles. In quantile regressions with K parameters, as the quantile indicator α increases from 0 to 1, the coefficient estimates describe an ascending sequence of (K-1)-dimensional hyperplanes, each of which passes through at least K points, according to one of the results in Koenker and Bassett (1978). That is, the conditional quantile functions should not cross. In fact, these assumptions jointly end up delivering rank invariance.

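15 A derivation of this result is available from the author upon request.

Things noticeably simplify again when ν is binary. The LB of P[y(t) ≤ ỹα] under α-QMIV&α-QMTS&MTR is then given by the max between

$$
\begin{aligned}
&P[y \le \tilde{y}_\alpha \mid \nu = 0, z \ge t]\, P(\nu = 0, z \ge t) + P[y(t) \le \tilde{y}_\alpha \mid \nu = 0, z = t]\, P(\nu = 0, z < t) \\
&\quad + P[y \le \tilde{y}_\alpha \mid \nu = 1, z \ge t]\, P(\nu = 1, z \ge t) + P[y(t) \le \tilde{y}_\alpha \mid \nu = 1, z = t]\, P(\nu = 1, z < t)
\end{aligned}
$$

and

$$
P[y \le \tilde{y}_\alpha \mid \nu = 1, z \ge t]\, P(z \ge t \mid \nu = 1) + P[y(t) \le \tilde{y}_\alpha \mid \nu = 1, z = t]\, P(z < t \mid \nu = 1), \tag{9}
$$

while its UB is given by the min between

$$
\begin{aligned}
&P[y \le \tilde{y}_\alpha \mid \nu = 0, z \le t]\, P(\nu = 0, z \le t) + P[y(t) \le \tilde{y}_\alpha \mid \nu = 0, z = t]\, P(\nu = 0, z > t) \\
&\quad + P[y \le \tilde{y}_\alpha \mid \nu = 1, z \le t]\, P(\nu = 1, z \le t) + P[y(t) \le \tilde{y}_\alpha \mid \nu = 1, z = t]\, P(\nu = 1, z > t)
\end{aligned}
$$

and

$$
P[y \le \tilde{y}_\alpha \mid \nu = 0, z \le t]\, P(z \le t \mid \nu = 0) + P[y(t) \le \tilde{y}_\alpha \mid \nu = 0, z = t]\, P(z > t \mid \nu = 0). \tag{10}
$$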

Notice that the first maximand and the first minimand constitute the identification region for P [y(t) ≤ y˜α ] under α-QMTS&MTR. Intuitively, using the additional information yielded by α-QMIV cannot be worse than not using it.
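The bounds in (9) and (10) are mixtures of empirical conditional probabilities, so they can be computed directly from a sample. The sketch below does so for a binary instrument ν. Its names (`cell_bounds`, `binary_qmiv_bounds`) and the toy data set of (y, z, ν) triples are hypothetical. It assumes that every ν-cell contains observations with z = t, so that P[y ≤ ỹα | ν = v, z = t] is defined, and it ignores sampling error.

```python
def cell_bounds(data, t, y_tilde, v):
    """alpha-QMTS&MTR bounds on P[y(t) <= y_tilde] within the cell nu == v.

    data: list of (y, z, nu) triples (observed outcome, realized treatment,
    binary monotone instrument).
    """
    cell = [(y, z) for (y, z, nu) in data if nu == v]
    at_t = [y for (y, z) in cell if z == t]
    if not at_t:
        raise ValueError("no observations with z == t in cell nu == %r" % (v,))
    n = len(cell)
    cdf_at_t = sum(y <= y_tilde for y in at_t) / len(at_t)           # P[y <= y~ | nu=v, z=t]
    joint_ge = sum(y <= y_tilde and z >= t for (y, z) in cell) / n   # P(y <= y~, z >= t | nu=v)
    joint_le = sum(y <= y_tilde and z <= t for (y, z) in cell) / n   # P(y <= y~, z <= t | nu=v)
    share_lt = sum(z < t for (_, z) in cell) / n                     # P(z < t | nu=v)
    share_gt = sum(z > t for (_, z) in cell) / n                     # P(z > t | nu=v)
    return joint_ge + cdf_at_t * share_lt, joint_le + cdf_at_t * share_gt


def binary_qmiv_bounds(data, t, y_tilde):
    """alpha-QMIV&alpha-QMTS&MTR bounds on P[y(t) <= y_tilde], eqs. (9)-(10)."""
    n = len(data)
    p0 = sum(nu == 0 for (_, _, nu) in data) / n
    p1 = sum(nu == 1 for (_, _, nu) in data) / n
    lb0, ub0 = cell_bounds(data, t, y_tilde, 0)
    lb1, ub1 = cell_bounds(data, t, y_tilde, 1)
    lower = max(p0 * lb0 + p1 * lb1, lb1)  # first maximand vs. cell nu = 1
    upper = min(p0 * ub0 + p1 * ub1, ub0)  # first minimand vs. cell nu = 0
    return lower, upper


# Hypothetical toy sample: treatments z in {1, 2, 3}, instrument nu in {0, 1}.
toy_data = [
    (1.0, 1, 0), (1.5, 2, 0), (2.0, 2, 0), (2.5, 3, 0),
    (1.2, 1, 1), (1.8, 2, 1), (2.4, 2, 1), (3.0, 3, 1),
]
print(binary_qmiv_bounds(toy_data, t=2, y_tilde=2.0))
```

In this toy sample the mixture terms (the first maximand and minimand) bind on both sides, so the α-QMIV restriction does not tighten the α-QMTS&MTR region. With other data the second maximand or minimand can bind.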

1.4.3

Relating QR and Bounds under α-QMTS&MTR

Gonzalez (2005) explores the relation between the parametric linear OLS model and the non-parametric monotone model based on MTR and MTS, formally deriving a number of results linking the two in the case of discrete treatments. Remarkably, analogous results seem to hold also for quantiles. Specifically, it can be shown that the α-QMTS&MTR upper bound on the only α-QTE in the binary treatment case coincides with Koenker and Bassett's (1978) α-QR coefficient on the highest treatment dummy. The same holds for the upper bound on the α-QTE comparing the highest and the lowest treatment in the case of multiple discrete treatments. Unfortunately, the full set of relationships that Gonzalez (2005) found for the mean cannot easily be established for quantiles in the case of multiple discrete treatments, i.e. for all α-QTEs. These are the relationships between the standard parametric model, based on linearity and quantile independence, and the non-parametric model, based on response and selection monotonicity. Once again, this is because the bounds on P[y(t) ≤ ỹα] cannot be inverted analytically.
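Since the distribution-function bounds cannot be inverted analytically, one practical route is a grid search. This is a sketch, not necessarily the procedure used in the application. If F_L ≤ F ≤ F_U pointwise, the quantile Qα = inf{y : F(y) ≥ α} lies between inf{y : F_U(y) ≥ α} and inf{y : F_L(y) ≥ α}. For nondecreasing step-function bounds built from empirical distributions, the search is exact when the grid contains every observed outcome plus the endpoints of the outcome support; otherwise it is an approximation. The function name is hypothetical.

```python
def quantile_bounds_from_cdf_bounds(cdf_lower, cdf_upper, grid, alpha):
    """Bounds on Q_alpha = inf{y : F(y) >= alpha} given F_L <= F <= F_U.

    Because F <= F_U, Q_alpha >= inf{y : F_U(y) >= alpha}; because F >= F_L,
    Q_alpha <= inf{y : F_L(y) >= alpha}. Both infima are searched on `grid`.
    None means the bound was not reached on the grid.
    """
    points = sorted(grid)
    q_lower = next((y for y in points if cdf_upper(y) >= alpha), None)
    q_upper = next((y for y in points if cdf_lower(y) >= alpha), None)
    return q_lower, q_upper


# Hypothetical CDF bounds evaluated on an integer grid.
print(quantile_bounds_from_cdf_bounds(
    cdf_lower=lambda y: min(1.0, 0.2 * y),
    cdf_upper=lambda y: min(1.0, 0.2 * y + 0.3),
    grid=range(5),
    alpha=0.45,
))
```

In an application, `cdf_lower` and `cdf_upper` would be the α-QMTS&MTR (or α-QMIV&α-QMTS&MTR) bounds on P[y(t) ≤ ỹ] evaluated at each candidate ỹ.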

I report the theoretical results below, and their derivation in Appendix A.5.

Binary Treatment Consider formulating the basic α-QR model for the binary treatment case and some quantile α ∈ (0, 1) as

$$
Q_\alpha(Y_i \mid D_i) = \gamma(\alpha) + \beta(\alpha) D_i,
$$

where Di = 1 indicates an individual with treatment z = 1 (or treated), while Di = 0 indicates an individual with treatment z = 0 (or control). The α-QTE, β(α), can be estimated by solving

$$
\big(\hat{\gamma}(\alpha), \hat{\beta}(\alpha)\big)' = \arg\min_{\gamma,\,\beta} \Bigg[ \sum_{\{i:\, y_i \ge \gamma + \beta d_i\}} \alpha\, |y_i - \gamma - \beta d_i| + \sum_{\{i:\, y_i < \gamma + \beta d_i\}} (1 - \alpha)\, |y_i - \gamma - \beta d_i| \Bigg].
$$

[...]

[...] there is no treatment for which both the LB and the UB are simultaneously informative under EE. A glance at the typical identification pattern is also provided in Figure 1. It shows estimated identification regions, under all analyzed assumptions, of the returns at the median from obtaining a college or higher degree versus a high school diploma. Under EE, more informative bounds are obtained for extreme quantiles than for the median. Indeed, neither the LB nor the UB of Q0.50[ln(wage(·))] is informative when no prior information is assumed. This implies that the 0.50-QTE(C+ - HS) is bounded between -5.6914 and 5.6914, which is the maximum possible width of 11.3828, as shown also in Figure 1. By contrast, MTR always guarantees at least one informative bound, as well as tighter bounds for all treatments and quantiles than under EE. In particular, it implies an LB of 0 for all α-QTEs and a UB that is usually smaller than the one under EE. For instance, for the 0.50-QTE(C+ - HS), MTR yields an identification region that implies non-negative returns to college. Its width is 2.7055, about 4 times smaller than the one under EE. α-QMTS also delivers tighter bounds than EE does, and it likewise guarantees at least one informative bound for all quantiles and treatments. Notice, though, that it does not necessarily imply non-negative returns. Consider again the median treatment effect of a college or higher degree versus a high school diploma: the 0.50-QMTS bounds have a width of 4.2269, i.e. 2.7 times more identification power than EE. The relationship in magnitude between the widths of the bounds under MTR and α-QMTS depends on the considered treatment. Specifically, MTR implies informative LBs for the higher treatments and informative UBs for the lower treatments, while assumption α-QMTS does the opposite. That is why, exactly as for the mean, when MTR and α-QMTS are applied jointly, the identification power on Qα[y(t)] and on α-QTEs increases significantly. The bounds' widths shrink appreciably compared to those obtained assuming either MTR or α-QMTS alone. For the median treatment effect comparing college and high school degrees, MTR has greater overall identification power than 0.50-QMTS and yields a significantly smaller identification region. However, the combined assumption 0.50-QMTS&MTR is far more powerful than either one (6.3 and 10 times more than MTR and α-QMTS alone, respectively). Its region has a width of 0.4276, which is finally comparable with the corresponding point estimates shown in Tables 9 and 10. The bound implies a maximum premium of 0.4276 at the median from obtaining a college or higher degree versus a high school diploma, while the α-IVQR estimate shows a marginal return of 0.05607. Notice that the per-year return at the median implied by the 0.50-QMTS&MTR upper bound ranges between 0.05345 and 0.1425. The first figure assumes 8 years of formal education following high school (the maximum in the sample); the second assumes a 3-year college degree (the minimum possible). Across quantiles, α-QMTS&MTR implies non-negative returns from obtaining a college degree, as opposed to a high school diploma. Their upper bound is approximately 28% at the first decile (see Table 3), almost 43% at the median (see Table 5), and 48% at the ninth decile. While this does not exclude the possibility of lower actual returns at higher quantiles, it is consistent with the idea that individuals with higher ability also have higher returns to education. As far as assumption α-QMIV is concerned, results confirm the expectations of Sub-section 1.5.1. More precisely, the use of mother's education as an (α-Q)MIV improves identification mainly at low quantiles (α = 0.10 and α = 0.25) for treatments C+ and HS, and in all cases for the mean. The indicator for the mother having a high school diploma or higher clearly lacks any identifying power as a 0.50-QMIV, as can be easily deduced by inspection of Figure 1 and Table 5. These show that the identification regions of 0.50-QTE(C+ - HS) are identical under EE and 0.50-QMIV, and under 0.50-QMTS&MTR and 0.50-QMIV&0.50-QMTS&MTR, respectively.27

[...] indicate that the bounds are not informative, and they are also reported in parentheses as bounds of the confidence intervals. The corresponding extreme lower and upper bounds for the QTE are in turn QTE_MIN ≡ −5.6914 and QTE_MAX ≡ 5.6914.
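The width comparisons above are simple ratios of the reported widths for the 0.50-QTE(C+ - HS). The snippet below recomputes them, together with the per-year premia implied by the 0.4276 upper bound, using only numbers quoted in the text. Nothing is re-estimated, and the dictionary keys are just labels.

```python
# Widths of the identification regions for the 0.50-QTE(C+ - HS), as reported in the text.
widths = {
    "EE": 2 * 5.6914,  # bounds [-5.6914, 5.6914]
    "MTR": 2.7055,
    "0.50-QMTS": 4.2269,
    "0.50-QMTS&MTR": 0.4276,
}


def power_ratio(weaker, stronger):
    """How many times narrower the region under `stronger` is than under `weaker`."""
    return widths[weaker] / widths[stronger]


pairs = [("EE", "MTR"), ("EE", "0.50-QMTS"),
         ("MTR", "0.50-QMTS&MTR"), ("0.50-QMTS", "0.50-QMTS&MTR")]
for weaker, stronger in pairs:
    print(f"{weaker} -> {stronger}: {power_ratio(weaker, stronger):.2f} times narrower")

# Per-year premium implied by the upper bound, spread over 8 or 3 years after high school.
per_year = {years: widths["0.50-QMTS&MTR"] / years for years in (8, 3)}
print(per_year)
```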

The result of Claim 1.9, that the α-QMTS&MTR identification region for α-QTE(C+ - E-) is [0, α-QR_C+], is confirmed by the empirical estimates. To see this, it is sufficient to compare, for each quantile, the α-QR coefficient on C+ in Table 9 with the upper bound on α-QTE(C+ - E-) under α-QMTS&MTR for the same quantile. For instance, 0.50-QR_C+ = 0.5615, which equals the upper bound on 0.50-QTE(C+ - E-) under 0.50-QMTS&MTR. For the other α-QTEs, i.e. those not comparing the last and the first treatments, I find that the α-QR coefficients are lower than the corresponding α-QMTS&MTR bounds, or at most not statistically different from them.28 This is empirically consistent with Gonzalez's (2005) theoretical results on the mean.
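The coincidence stated in Claim 1.9 is easiest to see in the binary case. There the α-QR objective separates into one check-function sum per group. With an intercept and a single dummy, the fitted intercept is therefore the control group's α-quantile, and the slope is the difference between the treated and control α-quantiles. On the α-QMTS&MTR side, at the highest treatment all units with z ≥ t have z = t, so the lower bound on the distribution function reduces to P(y ≤ ỹα | z = 1). At the lowest treatment, the upper bound likewise reduces to P(y ≤ ỹα | z = 0). The upper bound on the α-QTE is thus the same difference of group quantiles.

The sketch below computes the closed-form fit and confirms by brute force that it minimizes the objective. It uses hypothetical data and function names and the lower (left-continuous) definition of the sample quantile. It is an illustration under these assumptions, not the derivation in Appendix A.5.

```python
import math


def check_loss(u, alpha):
    """Check function: alpha*|u| if u >= 0, (1 - alpha)*|u| if u < 0."""
    return alpha * u if u >= 0 else (alpha - 1.0) * u


def qr_objective(gamma, beta, y, d, alpha):
    """Objective minimized by (gamma_hat(alpha), beta_hat(alpha)) in the binary model."""
    return sum(check_loss(yi - gamma - beta * di, alpha) for yi, di in zip(y, d))


def lower_quantile(values, alpha):
    """inf{q : F_n(q) >= alpha}, i.e. the ceil(alpha * n)-th order statistic."""
    s = sorted(values)
    k = max(math.ceil(alpha * len(s)), 1)
    return s[k - 1]


def binary_qr(y, d, alpha):
    """Closed-form alpha-QR fit with an intercept and one 0/1 dummy."""
    q0 = lower_quantile([yi for yi, di in zip(y, d) if di == 0], alpha)
    q1 = lower_quantile([yi for yi, di in zip(y, d) if di == 1], alpha)
    return q0, q1 - q0  # intercept, slope = difference of group quantiles


# Hypothetical log wages: five controls (d = 0) and five treated (d = 1).
y_example = [1.0, 2.0, 3.0, 4.0, 5.0, 2.5, 3.5, 4.5, 6.0, 7.0]
d_example = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

gamma_hat, beta_hat = binary_qr(y_example, d_example, 0.5)
# Each group's optimum lies at a data point, so searching gamma and
# gamma + beta over the observed outcomes finds the global minimum.
best = min(qr_objective(g, h - g, y_example, d_example, 0.5)
           for g in y_example for h in y_example)
print(gamma_hat, beta_hat, qr_objective(gamma_hat, beta_hat, y_example, d_example, 0.5), best)
```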

27 Fathers' education displayed similar patterns. The fact that these variables do not extensively improve identification may be at least partially due to the fact that they are defined as binary. In other words, allowing for more educational categories in ν, were observations available, would perhaps generate more variability in the pattern of the no-assumptions bounds, which need only fail to be monotonically increasing in ν for the MIV to have identifying power. Notably, though, Manski and Pepper (1998) also found that the AFQT score had very little identifying power when used as an MIV in their application.
28 To see this, again compare the α-QR coefficients on JH and HS in Table 9 with the α-QMTS&MTR bounds on the corresponding α-QTEs.
29 This is fully analogous to the findings of Manski and Pepper (2000).
30 Notice that the finding holds as long as 12 or more years of education are used to compute the implied average yearly returns. In fact, the logically possible minimum number of years needed to obtain a (3-year) college degree,

Finally, I compare the α-IVQR estimates in Table 10, based on the Chernozhukov and Hansen (2006) model plus a linear specification, with the non-parametric bounds under α-QMTS&MTR. Consider the average year-by-year returns obtained by dividing the α-QMTS&MTR upper bounds on Qα[ln(wage(C+))] − Qα[ln(wage(E−))] by 12 years of formal education acquired after elementary school. I find that these returns are generally lower than the marginal effects estimated within the α-IVQR parametric model.29 The estimates are displayed in tabular format for ease of comparison:30

α                                          0.10      0.25      0.50      0.75      0.90
α-QMTS&MTR (UB on α-QTE(C+ - E-) / 12)    0.03769   0.03665   0.04679   0.05947   0.06606
α-IVQR (marginal effect)                   0.03641   0.04433   0.05607   0.06249   0.07745

This means that at least one of the assumptions underlying the parametric models does not hold in this application. Of course, the failing assumption need not be independence or rank invariance, and may well be linearity, which is not necessary for identification of Chernozhukov and Hansen (2005)’s original model.
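The comparison in the table can be reproduced mechanically by flagging each α at which the α-IVQR marginal effect exceeds the per-year return implied by the α-QMTS&MTR upper bound. The sketch uses only the table's numbers and ignores sampling uncertainty; a formal version would at least use the bootstrap upper bounds mentioned in footnote 30. The helper name is hypothetical.

```python
alphas = [0.10, 0.25, 0.50, 0.75, 0.90]
# Average yearly returns implied by the alpha-QMTS&MTR upper bound on
# Q_alpha[ln wage(C+)] - Q_alpha[ln wage(E-)], spread over 12 years.
bound_per_year = [0.03769, 0.03665, 0.04679, 0.05947, 0.06606]
# Marginal effects from the linear alpha-IVQR model.
ivqr = [0.03641, 0.04433, 0.05607, 0.06249, 0.07745]


def exceeds_upper_bound(points, upper_bounds):
    """Flag point estimates lying above the non-parametric upper bound."""
    return [p > u for p, u in zip(points, upper_bounds)]


flags = exceeds_upper_bound(ivqr, bound_per_year)
for a, f, p, u in zip(alphas, flags, ivqr, bound_per_year):
    status = "above UB" if f else "within UB"
    print(f"alpha={a:.2f}  IVQR={p:.5f}  UB/yr={u:.5f}  {status}")

# Consistency check with the text: 12 * 0.04679 recovers the median UB 0.5615.
print(round(12 * bound_per_year[2], 4))
```

In these numbers the IVQR effect lies above the implied bound at every quantile except α = 0.10. This is consistent with the statement that the implied yearly returns are generally lower than the IVQR marginal effects.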

1.6

Conclusions

Within the inferential context of predicting a distribution of outcomes P[y(t)] under a uniform treatment assignment t ∈ T, this paper deals with partial identification of the α-quantile of the distribution of interest, Qα[y(t)], under the assumptions of response and selection monotonicity. On the theoretical side, I introduce and study the identifying properties of α-quantile monotone treatment selection (α-QMTS), α-quantile monotone instrumental variables (α-QMIV), and their combinations. This adds to the existing results on non-parametric bounds on quantiles with no prior information and under monotone treatment response (Manski (1994) and Manski (1997)). I then illustrate the theoretical results through an empirical application to the returns to educational qualifications in Italy. The application explores the identification power of the assumptions MTR, α-QMTS and α-QMIV, and of their combinations, on several quantiles of the distribution of ln(wage) under different treatments, i.e. individuals' educational qualifications. It also examines a number of α-QTEs derived from them. As a benchmark case, bounds on the expectation are also computed under the corresponding "E-Assumptions" (EE, MTR, MTS, MTR&MTS, MIV, MTR&MTS&MIV).

30 (cont.) after graduating from elementary school, is 11. Using 11 years would not always yield average yearly returns, as implied by the α-QMTS&MTR bounds, smaller than the corresponding point estimates of Table 10. Yet, in the sample 778 individuals out of the 832 with a college degree have at least a 4-year college degree, implying 12 or more years of formal education after elementary school. The result still holds when, more conservatively, one considers the bootstrap UB instead of the estimated one. Additionally, for some quantiles, this applies also to bounds on α-QTEs comparing intermediate treatments.

When studying identification of Qα[y(t)], I find that its non-parametric bounds can be expressed analytically only when the monotone instrument coincides with the treatment itself, i.e. under α-QMTS. Analytically here means in terms of different quantiles of known conditional distributions and known probabilities. When the bounds are instead computed under assumptions involving more general monotone instruments, or when several assumptions are applied jointly (e.g. α-QMTS&MTR), I obtain bounds on the distribution functions that cannot be inverted analytically. These can still be used in applications, since they are easily expressible as mixtures of known conditional distributions, with weights given by the probabilities of the different instrument or treatment levels. Remarkably, the main result parallels Manski and Pepper's (2000) finding for the mean: MTR and α-QMTS aid identification in a complementary fashion, so that combining them greatly increases identification power. The empirical findings bear out the theoretical results. In particular, the joint assumption α-QMTS&MTR has substantial identification power, yielding bounds on the α-QTEs that are comparable with parametric α-IVQR point estimates obtained on the same data. This is in turn consistent with previous results for the mean (see Manski and Pepper (2000), Gonzalez (2005)). Finally, I provide a formal result on the relationship between the bounds under response and selection monotonicity and the parametric α-QR estimates based on linearity and α-quantile independence. Specifically, and analogously to the case of the mean (Gonzalez, 2005), consider the upper bound under α-QMTS&MTR on the only α-QTE of the binary treatment case, or on the α-QTE comparing the last and the first treatment in the multiple-treatment case. This upper bound coincides with the α-QR coefficient on the highest treatment dummy.


2

SECOND CHAPTER

Understanding Choice of High School Curriculum: Subjective Expectations and Child-Parent Interactions31

31 The paper in this chapter is my job market paper. Acknowledgements: I heartily thank Chuck Manski for his helpful comments on this work and for his constant encouragement and guidance throughout the dissertation. I also thank the other members of my committee, David Figlio, Joel Mokyr, and Elie Tamer, for their availability and feedback. I am enormously indebted to Paola Dongili and Diego Lubian, without whose support and friendship this project would not have come into existence. I have greatly benefited from patient and insightful discussions with Federico Grigis. I have also benefited from input on different issues and at different stages from Ivan Canay, Adeline Delavande, Jon Gemus, Aldo Heffner, Aviv Nevo, Zahra Siddique, Chris Taber, Basit Zafar, Claudio Zoli, and participants at the MOOD doctoral workshop at Collegio Carlo Alberto and the econometrics seminar at NU. Data collection was funded by the SSIS of Veneto, whose financial support I gratefully acknowledge together with that of the University of Verona. All remaining errors are mine.


"If someone studies humanities in a general high school but after 5 years he no longer wishes to go to university, what can he do? And after studying art in a general high school? Because when one is 14 he makes a choice, and thinks that perhaps he will go to college afterwards... But after 5 years he might change his mind. And if he is fed up with school, then he can go to work [if he has attended a technical or vocational school]." (a brother) (IARD, 2001, p.62)32

"As for the high school curriculum, she decided what to study. She chose the school, but only after we had talked together. Her father, for instance, preferred another [type of] school and, perhaps, I hoped for yet a different one. But she chose in the end, after a series of discussions we had together." (a mother) (IARD, 2001, p.39)

32 From the sociological survey IARD (2001). My translation from the Italian.

2.1

Introduction

Every day, individuals make choices within their groups that involve uncertainty. Criminals in a gang decide whether to commit crimes with partial knowledge of their probability of being arrested. Executives on a company's board decide whether to manufacture a new product with partial knowledge of the prospective market share and of the time before a competitor enters the market. Sexually active partners make contraceptive choices with partial knowledge of effectiveness and side effects. Because of the private and public impact of these and other behaviors, researchers and policy makers have long analyzed them and sought to understand them. However, any research concerned with predicting behavior to inform policy faces endemic identification problems. These are caused by insufficient knowledge and lack of adequate information on how individuals and their groups make decisions with uncertain outcomes (Manski, 2004a, 2000). Imposing strong assumptions on individual and group behavior is the prevailing approach in economics to achieving identification and making inference on behavior. The price paid is that the content and implications of the inference depend on the content and implications of the maintained assumptions. Aiding identification through the collection of new data is the main available alternative. It is costly and methodologically challenging, but it has some advantages. First, it moves the locus of the assumptions from things researchers neither know to be true nor can test (the behavioral processes of individuals and groups) to things over which they have some control or, at least, better information (the data collection). Second, it can be used to shed light on the plausibility of some assumptions commonly maintained for identification (e.g., Pareto optimality, and perfect knowledge by each decision participant of the others' preferences and expectations).

My work collects new data to aid the analysis of high school curriculum choice under curricular tracking, with a focus on uncertainty and on heterogeneity of family decision protocols.33 Families generally do not have perfect knowledge of the child's tastes, ability, and future opportunities. High school curriculum choice under curricular tracking is therefore an important and consequential family decision, characterized by uncertainty about the quality of the match between the child and the available curricula. Curriculum choice has also been regarded as a channel through which a given societal structure may replicate itself, thereby allowing for little intergenerational mobility (Checchi and Flabbi, 2007). In fact, it is a channel through which parents may try to shape their children in their own image (à la Bisin and Verdier (2001)) or to improve their children's condition (as in Doepke and Zilibotti (2008)), i.e., via "cultural transmission" of preferences and beliefs. Yet little is known about how children and their parents perceive the uncertain dimensions of this choice, and about the roles children, parents, and teachers play in it. Existing empirical work is mostly descriptive in nature (e.g., Flabbi (2001) and Checchi and Flabbi (2007)). Though useful and informative on questions such as "which families choose what", it is suitable neither for making predictions and informing policy nor for uncovering the channels through which parents influence their children's paths.
First, modeling uncertainty requires disentangling the preferences of individual decision makers from their expectations.34 Second, the presence of multiple decision makers introduces additional identification challenges. In particular, when a choice under uncertainty is made collectively rather than unilaterally, the type of interactions among family members determines how their expectations and preferences combine to generate the observed choice. If these interactions are not accounted for, the parameters of the model will capture aspects of the interactions together with members' preferences and expectations, without any possibility of separating them.

33 Throughout the paper I refer to the family decision protocol interchangeably as the decision or choice process, the decision regime, or the mode of interaction among choice participants.
34 Manski (2002) provides an illustration based on the ultimatum game.

In this paper, I use a novel data set that I collected in high schools of the Municipality of Verona, Italy, to estimate a behavioral model of curriculum choice that accounts for uncertainty and for heterogeneity of decision protocols across families. In particular, the paper asks:
• What are the most important determinants of curriculum choice among those aspects that are generally regarded as potentially relevant and uncertain at the moment of the choice?
• Conditional on the observed decision protocol, to what extent are parents' beliefs transmitted to children in the curriculum choice? Do parents' preferences affect curriculum choice?
• Is it important to account for heterogeneity of family decision protocols and for multiple decision makers in curriculum choice?

I attempt to answer these questions by modeling the black boxes of the decision maker and of the family. To do so, I gather information on some of the usually unobserved constructs of a decision process (beliefs, preferences, and decision protocol), in the spirit of McFadden (1989). Data collection was carried out by surveying individuals retrospectively during the first weeks of school. This has two main advantages. The first is that actual choices are observed. The second is that children and parents participating in the survey could separately report their expectations and preferences as they were before the decision, a point in time that is likely to vary across families.35 The disadvantage of the retrospective approach is that its validity relies on respondents' ability to recall their expectations as they were before any main interaction between child and parent occurred, if one occurred at all.36 Finally, I focus on situations in which either one party chooses unilaterally or family behavior is cooperative in nature. This is justified by the fact that in my setting children and parents are assumed to try to solve the very same problem: choosing the curriculum that suits the child best.37

The data set includes:
• the probabilistic expectations of children and parents before the choice on a number of uncertain outcomes potentially relevant for it;
• the stated preferences (SP) of children and parents before the choice, in the form of curriculum rankings;
• the family's actual choice, or revealed preference data (RP);
• the child-parent mode of interaction in making the choice, recorded as one of several verbal categories describing whether they employed some form of joint decision or one of the two parties chose unilaterally;
• the curriculum suggested for the child by his junior high school teachers.

A descriptive analysis of the data reveals that children had a greater role in the choice than their parents, at least explicitly. Most children and participating parents reported that the child chose alone (27%), that he chose after listening to his parent (35%), or that child and parent chose together (38%). Only a handful of respondents indicated that the parent chose alone or after listening to her child. (Notice that throughout the paper I refer to any one child as a "he", any one parent as a "she", any one planner as a "he", and any one researcher as a "she", and that in my setup families are child-parent dyads.) These answers are consistent with the evidence obtained by comparing each child's and parent's stated preferred curricula with the actual choice. In the overall sample, 14% of children did not have their own way in the choice, versus 40% of parents. When the sample is disaggregated by reported decision protocol, the first figure intuitively increases with parental participation in the choice, while the second decreases with it. Moreover, whenever the choice entailed some form of interaction between child and parent, family members generally had a better sense of the other members' preferences.38

Decision protocol and chosen curriculum are statistically related. However, it remains to be established whether this relationship is structural in nature. It would be, e.g., if the selection of a family decision protocol for the curriculum choice depended on the child's and parent's structures of preferences and beliefs.39 It is well known that family background characteristics such as parental education and the child's ability are systematically related to curriculum choice. My data suggest that those characteristics are related to the selection of a family decision protocol as well. In particular, most children enrolled in the artistic track and in vocational curricula reported having made their choice unilaterally. By contrast, most children enrolled in technical curricula reported having made a joint decision with their parents, and children attending curricula of the general track indicated having chosen after listening to their parents or jointly with them. Another relevant finding is that only a small fraction of cooperative families (3-5%) in the sample selected a dominated choice, given the child's and parent's initial preferences.

My data are cross-sectional and do not allow me to carry out any formal test of rational expectations. Nonetheless, in the online appendix to the paper I provide a detailed comparison between children's and parents' expectations, and between their expectations and available realizations for similar populations and samples. Encouragingly, (mean and median) responses are not very far from the realized statistics, though heterogeneity of beliefs is prevalent within both the children's and the parents' groups. As for the comparison between children and parents, my sample confirms previous findings that parents are generally more optimistic than their children (see, e.g., Fischhoff et al. (2000), Dominitz et al. (2001), and Attanasio and Kaufmann (2009)).

In the empirical analysis I first estimate a preliminary logit model of curriculum choice that does not account for heterogeneity of decision protocols across families. It assumes that children and parents, alternatively, are the "relevant" or "representative" decision makers for the curriculum choice.40 I use elicited probabilistic expectations directly in the econometric model to separately identify the effects of decision makers' preferences and expectations on the choice. This follows the suggestion of Manski (1993, 2004a,b) and the empirical practice of Delavande (2008) and Zafar (2008). It allows me to relax standard but strong assumptions on expectations that are typically employed as identification devices, such as rationality (e.g., Willis and Rosen (1979) and Checchi and Flabbi (2007)), myopia (Freeman, 1971), or others (Manski and Wise, 1983).

35 The way in which the choice process unfolds over time may vary, too. E.g., in cooperative families children and parents may choose after a series of repeated interactions, or they may "sit down around a table" and settle on the choice once and for all.
36 It is assumed that individual preferences do not change.
37 It has been noticed elsewhere (Lundberg et al., 2007) that economic models of child-parent interactions tend to be based on non-cooperative game theory (e.g., Hao et al. (2008)). The reason is that the standard assumption of (spousal) bargaining, namely that binding, costlessly enforceable agreements can support an efficient solution, is generally not considered plausible in the child-parent context. This appears to be reinforced by those works modeling children's behavior as one that can strategically limit parental control and cause the "Rotten Kid Theorem" (Becker, 1981) to fail (e.g., Bergstrom (1989)). While an approach based on non-cooperative games would be worth exploring, it is left for future research. Interesting examples of non-cooperative models of family decision-making within a discrete choice framework are Hiedemann and Stern (1999) and Engers and Stern (2002). The IO literature on identification and estimation of non-cooperative games represents another promising modeling approach.
38 I could assess this by comparing the stated preferred curriculum of each party with the perception of the other party. Nonetheless, parents appear much more knowledgeable about their children's preferences than children are about their parents', independently of the reported decision protocol.
39 This may depend on the gains and costs from cooperation, as in Del Boca and Flinn (2006, 2009); on a specific parental behavior, as in Bisin and Verdier (2001) and Doepke and Zilibotti (2008); or on other factors. Indeed, it would be extremely important to model decision protocol selection and curriculum choice jointly. Should they be structurally related, a policy targeting expectations of family members would affect curriculum choice both directly and through the channel of choice protocol selection. Hence, a model of curriculum choice in isolation would not enable one to evaluate the effects of such a policy. I consider this paper and Giustinelli (2010a) to be first necessary steps toward this goal.
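The preliminary model can be sketched as a conditional logit. In it, the subjective expected utility of each curriculum is linear in the decision maker's elicited outcome probabilities. Choice-based sampling is handled by weighting each observation by the population share of its chosen alternative over that alternative's sample share, as in weighted exogenous sample maximum likelihood (Manski and Lerman, 1977). This is a minimal sketch under assumptions made here: homogeneous utility weights β, i.i.d. type-I extreme value errors (hence IIA), known population shares, and hypothetical function names and data. It only evaluates the weighted log-likelihood, leaving maximization to a numerical optimizer, and it is not the author's estimation code.

```python
import math


def choice_probs(beta, expectations):
    """Logit choice probabilities under subjective expected utility.

    expectations[j][k]: elicited probability that outcome k occurs if
    curriculum j is chosen; beta[k]: utility weight of outcome k.
    """
    utils = [sum(b * p for b, p in zip(beta, probs)) for probs in expectations]
    top = max(utils)  # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


def wesml_weights(choices, population_shares):
    """Weight Q(j)/H(j): population share over sample share of the chosen alternative."""
    n = len(choices)
    sample_shares = {j: choices.count(j) / n for j in set(choices)}
    return [population_shares[c] / sample_shares[c] for c in choices]


def wesml_loglik(beta, sample, population_shares):
    """Weighted log-likelihood; `sample` is a list of (chosen index, expectations)."""
    weights = wesml_weights([c for c, _ in sample], population_shares)
    return sum(w * math.log(choice_probs(beta, exp_i)[c])
               for w, (c, exp_i) in zip(weights, sample))


# Hypothetical example: two curricula, two outcomes, curriculum 0 oversampled.
example_expectations = [[0.8, 0.6], [0.4, 0.7]]
example_sample = [(0, example_expectations)] * 3 + [(1, example_expectations)]
population_shares = {0: 0.5, 1: 0.5}
print(wesml_weights([c for c, _ in example_sample], population_shares))
print(wesml_loglik([0.0, 0.0], example_sample, population_shares))
```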
Overall the approach is reasonable because of a specific institutional feature of Italian secondary education: open enrollment. That is, the lack of selectivity on the school side eliminates

40 The model is estimated by weighted exogenous sampling maximum likelihood (WESML) (Manski and Lerman, 1977) to account for choice-based sampling. Relaxation of the homogeneous-preferences and IIA assumptions is proposed for the next version of the paper.

potential identification problems from the interplay of demand and supply in producing observed choices. Finally, the size of the area, the location of the schools within the area, and the characteristics of the public transport network make all curricula in the defined choice set available to everybody. For the basic model I find that the outcomes “child likes the subjects”, “child faces the college field choice flexibly”, and “child graduates in the regular time” are the most valued ones, based on both children’s and parents’ expectations. However, some differences between children and parents exist in the relative magnitude of preference parameters for pairs of outcomes and in the importance ranking of preferences over less important outcomes. The model based on parents’ expectations seems to capture families’ actual behavior less well, consistent with children’s prominent role in the choice. I then use the decision protocols reported by respondents to classify sampled families into the three main observed types (child chooses alone, child chooses after listening to the parent, and child and parent make a joint decision), and estimate a distinct behavioral model of curriculum choice for each type.41 In estimation I again use probabilistic expectations to separate the preferences and expectations of decision participants. I additionally use information on the stated preferences (SP) of children and parents together with their revealed preferences (RP) to disentangle preference parameters from parameters capturing different types of child-parent interactions. That is, estimates are obtained through a combined SP-RP framework (e.g., Hensher et al. (1999)) which, though routinely used in transportation and marketing to study static discrete choices under certainty (e.g., Dosman and Adamowicz (2006)), is not standard in

41 Here I re-interpret the idea of decision protocol heterogeneity in situations of individual decision-making (e.g., Gopinath (1995) and Swait and Adamowicz (2001)) as one of decision protocol heterogeneity in family decision-making. This idea is rooted in the structure of random utility models (McFadden, 1974) laid out by Manski (1977), with utility maximization as the decision rule and a decision maker and a choice set constituting the choice-problem generating process. An alternative interpretation comes from the psychological model of random utilities (Luce and Suppes, 1965), which assumes that each decision maker carries a distribution of utility functions internally and selects one at random whenever a decision must be made. Then, each family may be thought of as first drawing a choice protocol and then making a choice according to that protocol, as in Steckel et al. (1988).

economics and, to the best of my knowledge, has never been employed for estimation of group choices under uncertainty. Estimates of the basic specifications reveal both similarities and differences in preferences across groups of families behaving according to distinct choice protocols and between children and parents.42 For instance, “child likes the subjects” is consistently the most important outcome for both children and parents and for all decision protocol groups. However, the importance ranking of the other outcomes and the relative magnitude of their weights tend to vary across groups. Notably, children who had listened to their parents partially incorporated parental expectations into their own when making the choice, and in cooperative families parents’ preferences and expectations substantially affected the final choice. An important feature of the empirical model is that it is specified directly in terms of the primitives of the decision problem. Hence, despite its simplicity, the model is capable of generating policy-relevant information on the relative roles of preferences, expectations, and interactions in driving observed choices, and can be used to answer basic counterfactual questions relevant for education policy. Based on current estimates, the differences between the predictions of a “representative decision maker”-type model and those of a model with heterogeneous decision protocols across a number of counterfactual scenarios suggest that accounting for decision protocol heterogeneity may be important for policy analysis.

The paper is organized as follows. In Section 2.2, I motivate the main components of my work and relate them to the existing literature. In Section 2.3, I first conceptualize the curriculum choice problem for the child, the parent, and the family as a whole. (Supporting institutional details are summarized in Appendix B.2.) Then, by means of a simple example with 2 alternatives, 2 factors, 2 potential decision makers, and 2 potential decision protocols, I

42 The proposed empirical models account for possible correlation across different sources of data in a form that makes explicit the relation between the econometrician’s observational difficulty and the decision protocol. However, current estimates are obtained from basic specifications with homogeneous preferences within groups of children and within protocols and with no correlation across data sources.

illustrate the main identification problem faced by a researcher seeking to analyze curriculum choice, and highlight its relevance for policy analysis. In Section 2.4, I formalize a small number of behavioral models of curriculum choice under uncertainty, corresponding to three classes of decision protocols employable by dyads of decision makers facing a choice under uncertainty. In Section 2.5, I report on data collection, and I illustrate the key questions of the surveys while motivating the main methodological choices behind them. In Section 2.6, I describe the estimation samples. In Section 2.7, I present the basic econometric models of curriculum choice, and discuss estimation results. In Section 2.8, I provide a number of illustrative counterfactual exercises based on current estimates. Conclusions follow.

2.2 Motivation and Related Literature

2.2.1 Application

Curriculum Choice. In the majority of OECD countries, children are sorted into curricular tracks at some point during high school and in some cases even earlier; the modal age of first tracking is between 15 and 16 (Brunello and Checchi, 2006).43 The purpose of curricular tracking is to provide educational specialization, so that children with different aptitudes and aspirations may pursue careers involving different areas and types of expertise by receiving training in specific curricula within the general, technical, or vocational track (Malamud, 2007). However, the institutional features of a stratified system have been shown to interact non-trivially with one another in affecting the efficiency and equity of the system. For example, Brunello and Giannini (2004) analyze efficiency, Brunello and Checchi (2006) and Checchi and Flabbi (2007)

43 Tracking may be purely ability-based (as in the U.S.), curricular, or a combination of the two (as in many European countries). Germany is an example of “curricular tracking by testing”. Italian tracking has been referred to as tracking “by family background” (Checchi and Flabbi, 2007), since families are ultimately responsible for the choice. The age of first tracking ranges from 10 in Austria and Germany to 18 in the U.S. and Canada (Brunello and Checchi, 2006). Indeed, the U.S. schooling system is considered to be “de-tracked” curricular-wise; however, the state of Florida has recently introduced the requirement that high school students declare a major in their 9th grade.

focus on equity under curricular tracking, and Figlio and Page (2002) under ability grouping. The tension between breadth and depth of education is a prominent issue (see Brunello et al. (2007), Ariga et al. (2005), and Ariga et al. (2006)). Intuitively, if a central planner had perfect knowledge of the ability of each child, he might wish to track students as early as possible. But since, like families, he does not, he faces a clear trade-off between the timing of tracking and the probability of misallocation. On the one hand, awareness that the type of training in high school carries consequences for future education and work opportunities, and that “wrong choices” may not be easily or costlessly corrected, makes it very important for adolescents to be able to make a sensible choice (whenever choice is the allocation mechanism). On the other hand, the earlier the age at first tracking, the longer the future to be anticipated, and the less experience with past educational performance is available to form expectations about future opportunities, choices, and outcomes.44 This opens the way to conceptualizing curriculum choice as one made under uncertainty. Bamberger (1986) and Altonji (1993) model choice of track as one that requires a large investment in training and is made by individuals under uncertainty about their ability and investment returns. With the same motivations, Zafar (2008) estimates a static model of college major choice under uncertainty on a sample of Northwestern University students using elicited expectations data. The main differences between Zafar’s (2008) work and mine are the choice analyzed and the fact that I account for heterogeneity of decision regimes across families. Zafar (2008) assumes that college students are the main decision makers for the choice of major, which appears sensible given their age, and yet finds evidence of a possibly strong influence by parents on the choice. This relates directly to the problem of the relative roles of parents and children in family decisions concerning the latter.

44 A girl enrolled in a vocational school for tourism motivates her choice thus: “I chose this school for the training [languages] and because afterwards I would like to study law in university. But assume that something happens to me, [with this diploma] I can still find a job in a travel agency... I am not lost. It [this school] provides me with several job opportunities.” And her mother confirms, “Perhaps, once A. has gotten her diploma she may change her mind, and decide she does not want to go to college any more... Yet, [thanks to this training] she will hold a diploma that allows her to find a job. A piece of paper is chased.” (IARD, 2001, p. 38)

Children in Family Decision Making. Since children are tracked into different curricula during adolescence, identifying the decision maker for the curriculum choice is far from straightforward. On the one hand, adolescents undergo development of their preferences and of their capability for communication, formal reasoning, and independent action; on the other hand, they still rely on their parents for guidance and support.45 An adolescent seems old enough to play an active role in the high school choice, but the rate and level of his autonomy acquisition will presumably vary depending on his traits and abilities, his environment, the preferences and resources of his parents, and their parenting style (Lundberg et al., 2007). Indeed, it seems natural to hypothesize the existence of heterogeneous decision processes across families, from a unilateral-type decision to more interactive protocols.46 However, this sort of middle-ground situation is not easily accommodated by the traditional economic literature (Lundberg et al., 2007), and virtually no study of school choice in economics has challenged the unitary view of household behavior (Becker, 1974, 1981).47 An important exception is Attanasio and Kaufmann (2009), who find that both youths’ and

45 A child’s say increases gradually with age. Prior research establishes that children’s involvement in decisions increases over ages 9-13 (Yee and Flanagan, 1985), while decision autonomy increases over ages 12-17 (Dornbusch et al., 1985). The formal reasoning skills needed to generate and weigh alternatives have been found to develop rapidly from age 8-9 to age 15-16 (Keating, 1990).

46 A girl explains, “They never wanted to influence me too much, I think because, should it have turned out that the choice they had imposed was a mistake, they would have regretted it. Hence, they let me free.” (IARD, 2001, p. 63) A mother, referring to the fact that her child had autonomously proposed his choice and provided a clear supporting argument for it, says: “I liked such a clear idea, and I agreed!” (IARD, 2001, p. 59) A daughter: “My mom wanted me to attend the artistic high school, my father the accounting curriculum, and I chose the teacher-training school, instead. So, I gave them both the sack.” (IARD, 2001, p. 61)

47 Becker (1981, p. 298) argues, “Even altruistic parents do not merely accept the utility functions of young children who are too inexperienced to know what is ‘good for them’ ” and further, “the basic utility functions of young children would be accepted, but the children could not be trusted to maximize their utility because they would be poorly informed about household production functions.” Though he admits, “Of course children (in modern times, especially adolescents) may believe that they do know enough and that their parents are out of touch with important changes (...) The conflict with older children is usually less severe, and altruistic parents are more willing simply to contribute dollars that children can spend as they wish (...) [This conflict] means that a common utility function for the family does not exist; different members maximize different utility functions.” In fact, the problems of the decision maker’s identity and of the heterogeneity of processes in schooling decisions have received some recognition in sociology from Gambetta (1987), and seem to have been in the minds of designers of sociological and governmental surveys (e.g., NELS88 (1988), IARD (2001), and the 1959 ISTAT survey (Gambetta, 1987)). Nonetheless, to the best of my knowledge, they have never received any formal treatment.

parents’ expectations matter for the high school attendance decision in rural Mexico, while only the former are relevant for enrollment in college. However, they do not model interactions between children and their parents. Hence, the role of adolescents in family decision-making needs formal accommodation.48 Research in the economics of the family has reached a point at which it seems compelling to bring the child in as an active participant in family decision-making. This is also supported by the results of ongoing research testing the collective model in the context of multiple decision-makers, including children aged 16 or older (Dauphin et al., 2008), and by the first few attempts recently made to model child-parent interactions, such as Burton et al. (2002), Hao et al. (2008), and Cosconati (2009).

Group Decision Making Across Fields. Streams of empirical studies of family and group decision-making have been developing in different fields, such as household economics, marketing, and transportation. Current studies of family discrete choice in marketing generally employ conjoint-type data and Bayesian estimation techniques to estimate the preferences of choice participants together with attribute-level parameters capturing the interaction among them, typically for situations of choice under certainty (e.g., Arora and Allenby (1999), Aribarg et al. (2002), Arora (2006), and Aribarg et al. (2009)).49 The recent empirical studies of family decision in the field of transportation relate more closely to the economic literature on collective models of the household (see Bhat and Pendyala (2005) and Timmermans and Zhang (2009) for summaries). Some of these studies have also

48 Contrast this with the studies of intra-household decision-making in the marriage or fertility literature, which commonly analyze interactions among family members in terms of cooperative or non-cooperative game theory or based on the assumption of Pareto efficiency, à la Chiappori (1988). Starting with Manser and Brown (1980) and McElroy and Horney (1981), researchers have been employing cooperative bargaining models of household behavior. Subsequently, other researchers have proposed models of household behavior based on non-cooperative games (e.g., Chen and Woolley (2001)). More recently, the two approaches have been integrated to accommodate situations where the assumption of Pareto efficiency fails (e.g., Del Boca and Flinn (2006, 2009)).

49 These works are the new generation of a pre-existing literature on group decision-making extensively reviewed by Steckel et al. (1991) and Corfman and Gupta (1993), the latter covering modeling approaches to group decision-making proposed in a variety of fields (including economics) and for a variety of choice situations.

entertained the idea of heterogeneous processes of joint decision-making across households (e.g., Zhang and Fujiwara (2006) and Zhang et al. (2009)), and have incorporated it into the econometric model through a latent class approach since, unlike in my case, information on the actual process used by each household was not available to them.
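The practical difference between observing each family's protocol and treating it as a latent class can be sketched in a few lines of Python. This is a minimal illustration, not the estimator used in this chapter or in the cited transportation studies. It assumes that each protocol k implies a multinomial logit over curricula with its own utility vector. When the protocol is reported, a family contributes log P_k(choice); when it is latent, protocol-specific probabilities are mixed with class shares pi_k. All function names, utilities, and shares are hypothetical.

```python
import math
from typing import Sequence


def logit_probs(utilities: Sequence[float]) -> list[float]:
    """Multinomial logit choice probabilities (max-shifted for numerical stability)."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def loglik_known_protocol(choice: int, utils_by_protocol: Sequence[Sequence[float]],
                          protocol: int) -> float:
    """Protocol reported by the family: only that protocol's choice model is used."""
    return math.log(logit_probs(utils_by_protocol[protocol])[choice])


def loglik_latent_class(choice: int, utils_by_protocol: Sequence[Sequence[float]],
                        shares: Sequence[float]) -> float:
    """Protocol unobserved: mix protocol-specific probabilities with class shares."""
    mixed = sum(pi * logit_probs(u)[choice]
                for pi, u in zip(shares, utils_by_protocol))
    return math.log(mixed)


# Hypothetical utilities of 3 curricula under 2 protocols (e.g., child alone, joint).
utils = [[1.0, 0.0, 0.5], [0.2, 0.8, 0.1]]
print(loglik_known_protocol(1, utils, protocol=1))
print(loglik_latent_class(1, utils, shares=[0.6, 0.4]))
```

With a reported protocol, the class shares never enter the likelihood. This is one way to see why direct information on each family's protocol removes a layer of estimation that latent-class models must carry.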

2.2.2 Methodological Aspects

Discrete Choice Models with Subjective Expectations Data. On the methodological side, this paper relates to the studies that collect and analyze expectations of children in their teens (e.g., Dominitz and Manski (1996) and Fischhoff et al. (2000)), sometimes together with those of their parents (e.g., Quadrel et al. (1993), Dominitz et al. (2001), and Attanasio and Kaufmann (2009)). It is also related to works employing subjective expectations data in the estimation of structural models of discrete choice, such as Delavande (2008) and Zafar (2008), from which I borrow the setup.50 However, children’s expectations seem never to have been used in the estimation of discrete choice models before.51

Discrete Choice Models with Revealed and Stated Preference Data. In this paper, I use an integrated Revealed Preference (RP)-Stated Preference (SP) approach to gain identification power over structural parameters of the curriculum choice model that would not be identified using RP data alone. Ben-Akiva and Morikawa (1990) and Morikawa et al. (1991)

50 See also Van der Klaauw (2000) and Erdem et al. (2005) for applications to dynamic discrete choices.

51 One issue may be that children of this age are considered too young for their behavior to be captured in any meaningful way by models based on rationality and, in particular, on expected utility maximization. While I acknowledge this concern, I refer to recent experimental studies that have evaluated the development of children’s ability to make rational economic decisions, and have found that the behavior of adolescents is very similar to that usually observed in adults. E.g., Harbaugh et al. (2001) find that the choices of children as young as 11 seldom violate the generalized axiom of revealed preference. And though the ability to weight high-probability and low-probability events appropriately in choices under risk seems to develop more slowly (Harbaugh et al., 2002), children aged 14 to 20 and adults behaved very similarly in Harbaugh et al.’s (2002) sample. Moreover, in studies of how children judge the expected value of complex gambles in which alternative outcomes have different prizes, Schlottmann and Anderson (1994) and Schlottmann (2001) find that children as young as 6 (and always by 8-9) use the normatively prescribed multiplication rule for integrating the probability and value of each individual outcome. And though a significant fraction of participants deviated from the normative addition rule for integration over outcomes, their patterns of risk-seeking and risk-averse judgment turned out to be similar to those generally found with adults.

first noticed the complementary properties of revealed preference (RP) and stated preference (SP) data, and proposed a simple method to combine them in the estimation of discrete choice models of travel demand in order to exploit their relative advantages. The method was subsequently extended by Morikawa (1994) to incorporate serial correlation and state dependence between the two data sources. Ben-Akiva and Morikawa (1997) and Hensher et al. (1999) provide up-to-date summaries of the literature, illustrate the main theoretical framework, and discuss a number of conceptual and methodological issues. More recently, Train and Wilson (2007) have analyzed instances in which SP data collection is designed on the basis of the corresponding RP choice situation. They describe a general estimation method that accounts for the non-independence between RP and SP data generated by that practice, and show conditions under which standard estimation methods are consistent despite the non-independence.
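The idea of pooling the two data sources can be made concrete with a short Python sketch. It shows the commonly described scale-parameter version of RP-SP pooling: both sources share the taste vector beta, the RP scale is normalized to one, and SP utilities are multiplied by a relative scale mu > 0 to absorb the different error variance of stated responses. This is an illustration of the general idea under these assumptions, not the specification estimated in this chapter. Names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (index of the chosen alternative, attribute vector of each alternative)
Observation = Tuple[int, Sequence[Sequence[float]]]


def logit_loglik(choice: int, attributes: Sequence[Sequence[float]],
                 beta: Sequence[float], scale: float = 1.0) -> float:
    """Log probability of `choice` in a logit with utilities scale * beta'x_j."""
    utils = [scale * sum(b * x for b, x in zip(beta, xj)) for xj in attributes]
    top = max(utils)
    log_denom = top + math.log(sum(math.exp(u - top) for u in utils))
    return utils[choice] - log_denom


def joint_rp_sp_loglik(beta: Sequence[float], mu: float,
                       rp_obs: Sequence[Observation],
                       sp_obs: Sequence[Observation]) -> float:
    """Pooled log-likelihood: RP scale normalized to 1, SP scale mu > 0.

    Both data sources share the taste vector beta; mu absorbs the
    different error variance of stated-preference responses.
    """
    if mu <= 0:
        raise ValueError("relative scale mu must be positive")
    ll_rp = sum(logit_loglik(y, X, beta) for y, X in rp_obs)
    ll_sp = sum(logit_loglik(y, X, beta, scale=mu) for y, X in sp_obs)
    return ll_rp + ll_sp
```

In practice (beta, mu) would be chosen to maximize this function with a numerical optimizer. Only the ratio of the two error scales is identified, which is why the RP scale is fixed at one.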

Discrete Choice Models with Endogenously Stratified Samples. As my sample is choice-based, this work is also related to the literature on estimation of discrete choice models with endogenously stratified samples (see Manski and McFadden (1981) and Cosslett (1993) for a systematic treatment). Choice-based sampling means that a random sample of individuals was drawn within each available alternative and their characteristics observed. Hence, unlike under random sampling, the rates at which individuals are sampled from different alternatives need not coincide with the population choice shares. In estimation I use Manski and Lerman’s (1977) Weighted Exogenous Sampling Maximum Likelihood (WESML) estimator (see technical details in Section 2.7 and Appendix B.3).
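The weighting behind WESML can be sketched in Python. Each observation choosing alternative j is weighted by Q(j)/H(j), the ratio of the population share of j to its share in the choice-based sample. The weighted log-likelihood is then maximized over the model parameters. The sketch assumes the population shares Q(j) are known, as WESML requires. The curriculum labels, sample sizes, and 80/20 shares below are hypothetical and are not the data of this study.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def wesml_weights(sample_choices: Sequence[Hashable],
                  population_shares: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Weight for alternative j: population share Q(j) over sample share H(j)."""
    n = len(sample_choices)
    counts = Counter(sample_choices)
    return {j: population_shares[j] / (counts[j] / n) for j in counts}


def wesml_loglik(sample_choices: Sequence[Hashable],
                 prob_of_observed_choice: Sequence[float],
                 weights: Dict[Hashable, float]) -> float:
    """Weighted log-likelihood sum_i w(y_i) * log P(y_i | x_i, beta).

    prob_of_observed_choice[i] is the model probability of observation i's
    actual choice, evaluated at a candidate parameter vector.
    """
    return sum(weights[y] * math.log(p)
               for y, p in zip(sample_choices, prob_of_observed_choice))


# Hypothetical choice-based sample: 50 families drawn per curriculum,
# whereas the (assumed) population shares are 80/20.
choices = ["general"] * 50 + ["technical"] * 50
weights = wesml_weights(choices, {"general": 0.8, "technical": 0.2})
print(weights)  # {'general': 1.6, 'technical': 0.4}
```

Over-sampled alternatives get weights below one and under-sampled ones weights above one. As a result, the weighted sample reproduces the population choice shares.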


2.3 Curriculum Choice and the Identification Problem

2.3.1 Conceptualizing Curriculum Choice

I conceptualize and model curriculum choice within the following environment.52 Notation is such that when a letter indexing individuals is used as a subscript, it indicates that the variable or parameter to which it refers is individual-specific; when it is used as a superscript, it indicates that the variable or parameter pertains to the group of individuals identified by the index. There is a population of families, $\mathcal{F}$, indexed by $f = 1, \dots, F$. Each family is formed by one adolescent child, $c = c(f)$, and one parent, $p = p(f)$, facing the choice of high school curriculum for the child over a set of available curricula, $\mathcal{J}$, indexed by $j = 1, \dots, J$.53 Families view this choice as one that entails making an optimal child-curriculum match. In practice, they wish to maximize a parameter, $\{\theta_{jc}\}_{j \in \mathcal{J}}$, representing the unknown quality of the match between the child and each available curriculum. $\theta_{jc}$ may be thought of as a composite parameter encompassing aspects related to the immediate quality of the choice as well as to post-graduation choices and opportunities. Assuming separability of the components of $\theta_{jc}$ yields a convenient representation of uncertainty through a set of binary outcomes, $B^i = \{\{b_n \in \{0, 1\}\}_{n=1}^{N^i}\}_{i \in \{c, p\}}$.54 For example, outcomes may indicate whether the student would enjoy the core subjects of curriculum $j$, how well he would perform in curriculum $j$, which opportunities curriculum $j$ would provide him with after graduation, and so forth. Family members do not necessarily know the child-curriculum specific objective realization probabilities of the outcomes, say $\{\{\{\Pi_{cj}(b_n)\}_{j=1}^{J}\}_{n=1}^{N^i}\}_{i \in \{c, p\}}$, but they hold subjective assessments of them, $\{\{\{P_{ij}(b_n)\}_{j=1}^{J}\}_{n=1}^{N^i}\}_{i \in \{c, p\}}$.55 Specifically, $P_{ijn}$ indicates the subjective probability of member $i \in \{c, p\}$ that outcome $b_n = 1$ would occur if curriculum $j$ were selected. Hence, for each family member $i \in \{c, p\}$, $N^i \times J$ subjective probabilities are defined.56

52 Grounded on previous discussion and institutional details provided in Appendix B.2, I assume (1) a hierarchical process of (a) selection of family decision protocol, (b) curriculum choice, and (c) school choice; and (2) separability of curriculum choice from other family choices. This allows me to analyze curriculum choice in isolation.

53 I consider child-parent dyads rather than child-parents triads for the purpose of the empirical application, since the parent questionnaire was filled in by one parent only. Theoretically, this amounts to assuming that the parental role in the choice can be represented through the primitives (preferences, expectations, and decision protocol) of one parent only, the “relevant” or “representative” parent. Moreover, for simplicity each family is assumed to face the same choice set, whereas in general that may vary across families, and should be indicated more precisely as $\mathcal{J}_f$.

54 $B^i$ may generally differ both between and within groups, but for simplicity the latter level of generality is dispensed with here.

The Child Problem. Let us momentarily assume that the child is the unilateral decision maker of the high school curriculum choice. Under expected utility maximization he would evaluate the available curricula and determine his most preferred one by solving

$$\max_{j \in \mathcal{J}} EU_{cj}\left(\{b_n\}_{n=1}^{N^c}, \{x_{cjm}\}_{m=1}^{M^c}, z_c; \varepsilon_{cj}\right), \qquad (11)$$

which is a function of the vector of uncertain outcomes ($b^c = (b_1, \dots, b_{N^c})$), of an $M^c \times 1$ vector of child-curriculum specific attributes not subject to uncertainty ($x_{cj} = (x_{cj1}, \dots, x_{cjM^c})'$), of a vector of individual characteristics ($z_c$), and of a random term unobservable to the econometrician ($\varepsilon_{cj}$). Applying additive separability of the binary outcomes, the attributes, and the unobserved component leads to

$$\max_{j \in \mathcal{J}} EU_{cj} = \sum_{n=1}^{N^c} \sum_{b_n \in \{0,1\}} P_{cj}(b_n) \cdot u(b_n, z_c) + x_{cj}'\delta^c(z_c) + \varepsilon_{cj} = \sum_{n=1}^{N^c} P_{cjn} \cdot \Delta u_n(z_c) + U_{c0} + x_{cj}'\delta^c(z_c) + \varepsilon_{cj}, \qquad (12)$$

with $P_{cjn} = P_{cj}(b_n = 1)$. Each structural preference parameter over outcomes, $\Delta u_n(z_c) = u(b_n = 1, z_c) - u(b_n = 0, z_c)$, is the difference in the utility child $c$ derives from the occurrence of outcome $n$ (i.e., $b_n = 1$) relative to its non-occurrence (i.e., $b_n = 0$). In this formulation preference parameters are assumed to be identical for all children sharing the same observable

55 I.e., individuals are not assumed to have rational expectations, but the possibility is allowed for.

56 This requires elicitation of $\{\{\{P_{ij}(b_n = 1)\}_{j=1}^{J}\}_{n=1}^{N^i}\}_{i \in \{c, p\}}$ in place of the more complicated objects $\{\{P_{ij}(b_1, \dots, b_{N^i})\}_{j=1}^{J}\}_{i \in \{c, p\}}$. Moreover, notice that if multiple discrete or continuous outcomes were allowed for, multiple points in the subjective distributions of beliefs would have to be elicited. Hence, the choice of modeling uncertainty as a set of distinct events with uncertain binary outcomes is dictated purely by the feasibility of data collection.

characteristics, $z_c$. Being constant over alternatives, $U_{c0} = \sum_{n=1}^{N^c} u(b_n = 0, z_c)$ drops out of the choice.57
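The step from the first to the second expression in (12), and the claim that $U_{c0}$ drops out of the choice, can be checked numerically. The Python sketch below computes both forms of the expected utility. For illustration it assumes that $\varepsilon_{cj}$ is i.i.d. type-I extreme value, so choice probabilities take the logit form; the estimation details of Section 2.7 are not reproduced here. The belief matrix and preference values are hypothetical.

```python
import math
from typing import Sequence


def eu_levels(probs: Sequence[float], u_if_1: Sequence[float],
              u_if_0: Sequence[float]) -> float:
    """First form in (12): sum over n and b_n in {0,1} of P(b_n) * u(b_n)."""
    return sum(p * a + (1.0 - p) * b for p, a, b in zip(probs, u_if_1, u_if_0))


def expected_utility(probs: Sequence[float], delta_u: Sequence[float],
                     u0: float = 0.0, x: Sequence[float] = (),
                     delta: Sequence[float] = ()) -> float:
    """Second form in (12), net of eps: sum_n P_cjn * Delta u_n + U_c0 + x'delta."""
    return (sum(p * du for p, du in zip(probs, delta_u)) + u0
            + sum(a * b for a, b in zip(x, delta)))


def logit_shares(eus: Sequence[float]) -> list[float]:
    """Choice probabilities if eps_cj were i.i.d. type-I extreme value (assumption)."""
    top = max(eus)
    exps = [math.exp(v - top) for v in eus]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical child beliefs P_cjn: rows = curricula j, columns = outcomes n.
beliefs = [[0.8, 0.6, 0.9], [0.5, 0.9, 0.7]]
delta_u = [2.0, 1.0, 0.5]
print(logit_shares([expected_utility(p, delta_u) for p in beliefs]))
print(logit_shares([expected_utility(p, delta_u, u0=3.0) for p in beliefs]))  # same shares up to rounding
```

Because $U_{c0}$ is added to every alternative's utility, it cancels from the logit ratios. Only the differences $\Delta u_n$ can be recovered from choices.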

The Parent Problem. Under the assumption that parents put themselves into their children’s “shoes”, the maximization problem they face can be formalized as in (12) by simply replacing the individual index c with p. This is because parents solve the same problem as their children, though they do it through their own lenses, i.e., through their subjective expectations and their preference parameters. This assumption echoes the parents’ “imperfect empathy” of Bisin and Verdier (2001).

The Family Problem. A family-level decision protocol for the curriculum choice may either consist of a unilateral decision by a single family member or entail some form of interaction between members. Specifying a particular form of interaction requires knowledge of, or assumptions about, how the expectations and preferences of the members are aggregated in the decision process and whether and how the choice set and possible constraints are modified by the interaction itself. In Section 2.4, I describe a few models of group choice of high school curriculum under uncertainty which, though closely related to those analyzed in marketing and transportation (see Section 2.2.1), feature an additional layer of aggregation, that of expectations, as in Raiffa’s (1968) panel-of-experts problem. The proposed models differ from one another with respect to the nature of child-parent interactions, i.e., with respect to the rule used to aggregate the members’ preferences and expectations. First, however, I present an idealized illustration of the main identification problems that a researcher encounters if she wishes to analyze curriculum choice as I have conceptualized it in this section.

57 Linearity of expected utility implies that decision makers are risk-neutral. However, some students may wish to make a curriculum choice that enables them to “insure” against the presently uncertain outcomes of the future college and work choices. To account for this aspect, in the empirical model I include individuals’ perception of the degree of flexibility that different curricula would give them in the future choices of college versus work and of the field in college (see description in Subsection 2.5.2).

2.3.2 The Identification Problem, Idealized

Let us consider the following “binary world” with 2 potential decision makers, 2 alternatives, 2 factors, and 2 protocols. A dyad of an adolescent child (c) and a parent (p) face the choice of high school curriculum for the child. Choice is between two alternatives, the Michelangelo (M) and the Galileo (G) curricula. The following two factors are potentially relevant for the choice: 1. the level of difficulty of the program (D), measured as the perceived probability that the child would graduate in the regular time from each curriculum if he attended it; 2. the degree of flexibility the program will provide to the child when he faces the subsequent choice of field in college (F), measured as the perceived probability that the child would receive a training that would allow him to choose among a wide range of fields if he graduated from the curriculum.58 For the purpose of the illustration, let us maintain the following set of assumptions: 1. The objective probabilities are {(ΠM D , ΠM F ); (ΠGD , ΠGF )} = {(95, 30); (70, 90)}. That is, an M-diploma is easier to obtain than a G-diploma, since ΠM D = 95 > ΠGD = 70 (math at Galileo is really hard). Also, an M-diploma provides less flexibility than a Gdiploma, since ΠM F = 30 < ΠGF = 90 (Michelangelo’s artistic training is somewhat narrow; good only if the child wants to study architecture or some art-related field in college). 2. The subjective realization probabilities, {(PiM D , PiM F ); (PiGD , PiGF )} with i ∈ {c, p}, may or may not coincide with the objective ones. 3. The expected utilities are EUij = PijD · ∆uiD + PijF · ∆uiF , j ∈ {M, G} and i ∈ {c, p}. 4. The two following choice protocols are possible: either the child chooses unilaterally or child and parent choose jointly. 
When the child chooses unilaterally he maximizes EUcj 58 “I knew I would go to college and I could do well in any type of generic high school, then they [parents] said: ‘The scientific school is better because you have more options afterwards.’ That is, it was a school that would allow me to choose among a large number of university fields.” (IARD, 2001, p. 39)

75 over {M, G}. When child and parent choose jointly they weight their expected utilities with weights {wc , wp } and maximize the following family-level expected utility:

max wc [PcjD · ∆ucD + PcjF · ∆ucF ] + wp [PpjD · ∆upD + PpjF · ∆upF ] .59

j∈{M,G}

5. The final choice is Michelangelo (M).

A researcher and a planner observe the final choice and some background characteristics of the child-parent family, but they observe neither the members' subjective expectations nor their decision protocol. The researcher is in charge of informing the planner on how the choice was made, and the planner is in charge of implementing useful policy, if desirable. The researcher faces an uncomfortably wide range of competing explanations, all of them consistent with the selection of Michelangelo. She illustrates the issue to the planner through the following three scenarios:
1. The child may have unilaterally chosen M by (i) holding rational expectations, i.e., $\{(P_{cMD}, P_{cMF}); (P_{cGD}, P_{cGF})\} = \{(95, 30); (70, 90)\}$, and (ii) caring only about difficulty, e.g., $\{\Delta u_{cD}, \Delta u_{cF}\} = \{10, 0\}$. Indeed, under this scenario, the linear compensatory rule trading off difficulty and flexibility implies $EU_{cM} = 95 \cdot 10 + 30 \cdot 0 = 950 > EU_{cG} = 70 \cdot 10 + 90 \cdot 0 = 700$.
2. Alternatively, the child may have unilaterally chosen M by (i) holding rational expectations on the difficulty levels while erroneously perceiving the two alternatives as providing the same degree of flexibility, e.g., $\{(P_{cMD}, P_{cMF}); (P_{cGD}, P_{cGF})\} = \{(95, 90); (70, 90)\}$, and (ii) caring equally about difficulty and flexibility, say, $\{\Delta u_{cD}, \Delta u_{cF}\} = \{5, 5\}$. This yields $EU_{cM} = 95 \cdot 5 + 90 \cdot 5 = 925 > EU_{cG} = 70 \cdot 5 + 90 \cdot 5 = 800$.
3. Finally, child and parent may have chosen together by (i) weighting their expected utilities with a greater overall weight on the parent, e.g., $\{w_c, w_p\} = \{1/3, 2/3\}$, and (ii) holding preferences and expectations such that (a) both cared equally about difficulty and flexibility, say, $\{\Delta u_{cD}, \Delta u_{cF}\} \equiv \{\Delta u_{pD}, \Delta u_{pF}\} = \{5, 5\}$; (b) the child had rational expectations, i.e., $\{(P_{cMD}, P_{cMF}); (P_{cGD}, P_{cGF})\} = \{(95, 30); (70, 90)\}$; and (c) the parent, instead, erroneously perceived M and G as providing the same degree of flexibility, e.g., $\{(P_{pMD}, P_{pMF}); (P_{pGD}, P_{pGF})\} = \{(95, 90); (70, 90)\}$. All this implies

$$EU_{fM} = \tfrac{1}{3}[95 \cdot 5 + 30 \cdot 5] + \tfrac{2}{3}[95 \cdot 5 + 90 \cdot 5] = 825 > EU_{fG} = \tfrac{1}{3}[70 \cdot 5 + 90 \cdot 5] + \tfrac{2}{3}[70 \cdot 5 + 90 \cdot 5] = 800.$$

[59] The assumption that individuals maximize a separable-in-outcomes, linear expected utility clearly affects identification. Therefore, the proposed illustration and the subsequent discussion hold within this class of problems, which is actually a prominent one, especially in empirical analysis. In fact, parametric-type assumptions should, if anything, ease identification rather than complicate it. Hence, the proposed discussion of the problems of disentangling the role of preferences, expectations, and interactions in driving a simple (discrete) choice may be thought of as applying to a relatively favorable setup for a researcher facing such problems. A different question is whether assumptions like separability and linearity are reasonable in the context of a specific application.
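The arithmetic of the three scenarios can be checked with a short Python sketch. It assumes the linear, separable expected utility of assumption 3 and encodes probabilities in percentage points, as in the text; the dictionary names and helper functions are illustrative only and are not part of the author's analysis.

```python
# Sketch: the three "binary world" scenarios, assuming the linear,
# separable expected utility of assumption 3. Probabilities are in percent.

OBJECTIVE = {"M": {"D": 95, "F": 30}, "G": {"D": 70, "F": 90}}
# Beliefs that correctly rank difficulty but treat M and G as equally flexible.
MISPERCEIVED = {"M": {"D": 95, "F": 90}, "G": {"D": 70, "F": 90}}


def expected_utility(probs, prefs):
    """EU_ij = P_ijD * du_iD + P_ijF * du_iF."""
    return sum(probs[n] * prefs[n] for n in ("D", "F"))


def family_eu(members):
    """Weighted sum of expected utilities; members is a list of (weight, probs, prefs)."""
    return sum(w * expected_utility(p, u) for w, p, u in members)


def choice(scores):
    """Alternative with the highest criterion value."""
    return max(scores, key=scores.get)


# Scenario 1: child alone, rational expectations, cares only about difficulty.
s1 = {j: expected_utility(OBJECTIVE[j], {"D": 10, "F": 0}) for j in ("M", "G")}

# Scenario 2: child alone, misperceived flexibility, equal preference weights.
s2 = {j: expected_utility(MISPERCEIVED[j], {"D": 5, "F": 5}) for j in ("M", "G")}

# Scenario 3: joint choice with {w_c, w_p} = {1/3, 2/3}; the child has rational
# expectations, the parent misperceives flexibility, preferences are identical.
equal = {"D": 5, "F": 5}
s3 = {
    j: family_eu([(1 / 3, OBJECTIVE[j], equal), (2 / 3, MISPERCEIVED[j], equal)])
    for j in ("M", "G")
}

for label, scores in (("scenario 1", s1), ("scenario 2", s2), ("scenario 3", s3)):
    print(label, {j: round(v, 2) for j, v in scores.items()}, "->", choice(scores))
```

Each scenario selects M, which is why the observed choice alone cannot discriminate among them.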

Comparison of scenarios 1 and 2 illustrates how preference-driven choices have different policy implications than expectations-driven choices. Specifically, if the planner were to intervene by providing the child with the correct information (optimistically assuming that he knows it himself), his policy would be meaningful and possibly effective only under the second scenario. That is, if the informed decision maker of scenario 2 were to “comply” and use the objective realization probabilities, he would now switch to G (since $95 \cdot 5 + 30 \cdot 5 = 625 < 70 \cdot 5 + 90 \cdot 5 = 800$). The decision maker of scenario 1, instead, will choose M as long as he does not value flexibility and correctly perceives M as easier, even without holding rational expectations. Scenario 3 shows that knowledge of the decision process dynamics, such as the presence or absence of interpersonal interactions, is also fundamental to inform policy. Under the third scenario, for information provision to be meaningful in the first place, it should target and reach a specific decision participant, the parent. Furthermore, assessing whether and to what extent disclosing certain information may be effective (which the planner may wish to know, given that information provision may be costly) requires knowledge of the relative importance of each participant and of his or her preferences. In scenario 3, the importance and preference weights of the parent are such that disclosure of the objective probabilities on the flexibility outcomes, if feasible, may effectively induce a change in behavior, since

$$\tfrac{1}{3}[95 \cdot 5 + 30 \cdot 5] + \tfrac{2}{3}[95 \cdot 5 + 30 \cdot 5] = 625 < \tfrac{1}{3}[70 \cdot 5 + 90 \cdot 5] + \tfrac{2}{3}[70 \cdot 5 + 90 \cdot 5] = 800,$$

but it need not do so in general. Finally, let us imagine for a moment that child and parent are totally aligned and both prefer M based on the wrong perception that it provides the same degree of flexibility as G, i.e., $\{\Delta u_{cD}, \Delta u_{cF}\} \equiv \{\Delta u_{pD}, \Delta u_{pF}\} = \{5, 5\}$ and $\{(P_{cMD}, P_{cMF}); (P_{cGD}, P_{cGF})\} \equiv \{(P_{pMD}, P_{pMF}); (P_{pGD}, P_{pGF})\} = \{(95, 90); (70, 90)\}$. Alignment should “make them indifferent” with respect to the decision protocol, at least within a certain class of behavioral models, e.g., those requiring the outcome of a group decision to satisfy Pareto optimality.[60] Indeed, given the primitives, any group decision rule linearly combining the expected utilities of the two decision makers, including the unilateral cases $\{w_c, w_p\} = \{0, 1\}$ and $\{1, 0\}$, would result in the choice of M. However, from the point of view of the planner, knowing which process is employed in the choice is still very important. Assume he does not know it. If the family decision process for the curriculum choice is that the child chooses unilaterally (as in scenario 2), then providing the correct information may be useful. If instead the employed decision process entails weighting the expected utilities of child and parent with weights 1/3 and 2/3 (as in scenario 3), targeting the child alone would not be effective, since

$$\tfrac{1}{3}[95 \cdot 5 + 30 \cdot 5] + \tfrac{2}{3}[95 \cdot 5 + 90 \cdot 5] = 825 > \tfrac{1}{3}[70 \cdot 5 + 90 \cdot 5] + \tfrac{2}{3}[70 \cdot 5 + 90 \cdot 5] = 800,$$

but targeting only the parent or both may be, e.g.,[61]

$$\tfrac{1}{3}[95 \cdot 5 + 90 \cdot 5] + \tfrac{2}{3}[95 \cdot 5 + 30 \cdot 5] = 725 < \tfrac{1}{3}[70 \cdot 5 + 90 \cdot 5] + \tfrac{2}{3}[70 \cdot 5 + 90 \cdot 5] = 800.$$

[60] As opposed to the Bayesian-type behavioral model of group choice under uncertainty discussed by Raiffa (1968), which does not require Pareto optimality and can accommodate situations in which the final choice may differ from the one on which all choice participants would otherwise agree based on their individual preferences and expectations.
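A similar sketch reproduces the information-provision counterfactuals for the aligned family. It keeps the same assumptions (linear family rule, identical preferences $\{5, 5\}$, weights 1/3 and 2/3 unless stated otherwise); the helper names are hypothetical, and "informing" a member is modeled simply as replacing his or her beliefs with the objective probabilities.

```python
# Sketch: information-provision counterfactuals, assuming the linear family
# rule of the binary world and identical preferences {du_D, du_F} = {5, 5}.

OBJECTIVE = {"M": {"D": 95, "F": 30}, "G": {"D": 70, "F": 90}}
MISPERCEIVED = {"M": {"D": 95, "F": 90}, "G": {"D": 70, "F": 90}}
PREFS = {"D": 5, "F": 5}


def eu(probs):
    """Expected utility under the common preferences PREFS."""
    return sum(probs[n] * PREFS[n] for n in ("D", "F"))


def family_choice(child_beliefs, parent_beliefs, w_child=1 / 3, w_parent=2 / 3):
    """Return the alternative maximizing w_c * EU_c + w_p * EU_p."""
    scores = {
        j: w_child * eu(child_beliefs[j]) + w_parent * eu(parent_beliefs[j])
        for j in ("M", "G")
    }
    return max(scores, key=scores.get)


# Aligned family: both members start from MISPERCEIVED beliefs.
# Each policy replaces the targeted member's beliefs with the objective ones.
policies = {
    "no information": (MISPERCEIVED, MISPERCEIVED),
    "inform child only": (OBJECTIVE, MISPERCEIVED),
    "inform parent only": (MISPERCEIVED, OBJECTIVE),
    "inform both": (OBJECTIVE, OBJECTIVE),
}
joint = {label: family_choice(c, p) for label, (c, p) in policies.items()}

# Same policy when the child decides unilaterally ({w_c, w_p} = {1, 0}).
unilateral_child_informed = family_choice(OBJECTIVE, MISPERCEIVED, 1.0, 0.0)

print(joint)
print("unilateral child, informed:", unilateral_child_informed)
```

In this toy setup, informing only the child leaves the joint choice at M, while informing only the parent, or both, moves it to G; under the unilateral child protocol, informing the child alone suffices.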

2.4 Decision Regimes within the Family

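I assume that each family $f \in F$ chooses a curriculum for its child according to one of $K$ possible decision protocols represented by $\{\Gamma^k_f\}_{k=1}^{K}$, i.e.,

$$\max_{j \in J} \; \Gamma^k_{fj}, \qquad \Gamma^k_{fj} = \Gamma^k\!\left(\left\{\{P_{ijn}, \Delta u^i_n\}_{n=1}^{N_k},\ \{x_{ijm}, \delta^i_m\}_{m=1}^{M_k}\right\}_{i \in \{c(f), p(f)\}};\ \varepsilon^k_{fj}\right), \qquad (13)$$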

where $\Delta u^i_n$ and $\delta^i_m$ are the preferences, and $P_{ijn}$ the expectations, of the child and the parent, and $\varepsilon^k_{fj}$ is a random component capturing the observational difficulty of the econometrician.[62] The specific form of $\{\Gamma^k_{fj}\}_{j \in J}$ is dictated by the particular rule used to aggregate the preferences and expectations of the choice participants. As anticipated, when interaction is present at all, I focus on cooperative-type protocols only. Another important aspect relates to the structure of $\varepsilon^k_{fj}$. Any observational difficulty regarding the family-level criterion function stems directly from the unobservability of some components of the individual expected utilities, and it may be augmented by the possibility that some aspects of the choice protocol itself are not observable. I shall describe three main categories of decision protocols in the following subsections, while I postpone discussion of the assumptions on the error structure to Section 2.7, where I lay out the protocol-specific econometric models.

[61] As previously noted, if the selection of the decision protocol is endogenous to the curriculum choice, further knowledge of the process underlying the former is required for meaningful inference and policy analysis. Should a policy affecting one decision maker's expectations induce any change in whether or how the group of decision makers interacts, then quantifying the effects of such a policy requires some knowledge of, or assumptions on, the members' behavior in selecting an interaction protocol for the group. The same is true for preferences, since my setup implicitly assumes that preferences are not endogenous with respect to expectations; that is, the former would not adjust following a change in the latter. If preferences did adjust, one would also need to incorporate that dependence into the choice model.

[62] Dependence of the preference parameters on individual characteristics $z_i$ is suppressed for notational convenience, since it is not essential to the discussion.

2.4.1 Unilateral Decision Making With No Child-Parent Interaction

When a child chooses alone without any major interaction with his parent ($k = 1$), the family criterion function simply coincides with the child's expected utility as in (12):

$$\Gamma^1_{fj} = \sum_{n=1}^{N_1} P_{cjn} \cdot \Delta u^{c,1}_n + \sum_{m=1}^{M_1} \delta^{c,1}_m \cdot x_{cjm} + \varepsilon^1_{fj}, \qquad (14)$$

where $U^{c,0}$ is dropped for notational convenience, $x'_{cj}\delta^{c,1}$ is written in summation form, and the superscript 1 denotes the protocol. Notice that this category includes the possibility that the child interacted with, or listened to, any person or source other than the parent. The case in which the parent chooses alone ($k = 5$) is fully symmetric to case $k = 1$, with the roles of child and parent reversed.
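As an illustration of eq. (14), the deterministic part of the unilateral criterion can be written as a plain function. This is a sketch under stated assumptions: the error term defaults to zero because its structure is only specified in Section 2.7, and the single covariate, its coefficient, and the function names are hypothetical. The parent-alone case ($k = 5$) would use the same function with the parent's beliefs and parameters.

```python
from typing import Dict, Sequence


def gamma_unilateral(P: Sequence[float], delta_u: Sequence[float],
                     x: Sequence[float], delta: Sequence[float],
                     eps: float = 0.0) -> float:
    """Criterion of a unilateral protocol in the form of eq. (14):
    sum_n P_cjn * du_n + sum_m delta_m * x_cjm + eps."""
    if len(P) != len(delta_u) or len(x) != len(delta):
        raise ValueError("outcome or covariate dimensions do not match")
    outcome_part = sum(p * du for p, du in zip(P, delta_u))
    covariate_part = sum(d * xm for d, xm in zip(delta, x))
    return outcome_part + covariate_part + eps


def choose(criteria: Dict[str, float]) -> str:
    """The decision maker picks the alternative with the largest criterion."""
    return max(criteria, key=criteria.get)


# Hypothetical inputs: two outcomes (N_1 = 2) and one covariate (M_1 = 1).
beliefs = {"M": [0.95, 0.30], "G": [0.70, 0.90]}
covariates = {"M": [0.0], "G": [1.0]}   # hypothetical alternative-specific covariate
du, delta = [10.0, 0.0], [2.0]          # hypothetical preference parameters

criteria = {j: gamma_unilateral(beliefs[j], du, covariates[j], delta) for j in beliefs}
print(criteria, "->", choose(criteria))
```

Here the covariate narrows, but does not reverse, the advantage M derives from being perceived as easier.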

2.4.2 One Party Chooses After Listening to the Other Party

Case $k = 2$ is one in which a child makes the curriculum choice himself, but does so after listening to the opinions of his parent. This is formalized by assuming that the child maximizes an expected utility function in which the preference parameters over outcomes, $\{\Delta u^{c,2}_n\}_{n=1}^{N_2}$ and $\{\delta^{c,2}_m\}_{m=1}^{M_2}$, are his own, while expectations are formed (by the child) by combining his initial expectations, $\{\{P_{cjn}\}_{n=1}^{N_2}\}_{j \in J}$, with his parent's, $\{\{P_{pjn}\}_{n=1}^{N_2}\}_{j \in J}$, through some weights $\{w^{c,2}_n\}_{n=1}^{N_2}$:

$$\Gamma^2_{fj} = \sum_{n=1}^{N_2} \left[w^{c,2}_n \cdot P_{pjn} + \left(1 - w^{c,2}_n\right) \cdot P_{cjn}\right] \cdot \Delta u^{c,2}_n + \sum_{m=1}^{M_2} \delta^{c,2}_m \cdot x_{cjm} + \varepsilon^2_{fj}. \qquad (15)$$

Hence, aggregated expectations can simply be thought of as the child's updated expectations after he has received some outcome-specific information or opinions from his parent and has incorporated them into his own.[63],[64] Clearly, protocol $k = 1$ is nested in this one, since it can be represented by a ($k = 2$)-type situation in which $w^{c,2}_n = 0$ for all $n = 1, \ldots, N_2$, so that the child retains his own expectations. This setup can also accommodate opinion polarization by not restricting the weights $w^{i,k}_n$ to lie in $[0, 1]$, as in Arora and Allenby (1999). Finally, there is a strong resemblance to the panel-of-experts situation described by Raiffa (1968), here modified so that (1) both child and parent possibly act as experts; (2) the child plays the role of decision maker; and (3) the traditional economic assumption that only expectations (not preferences) can be updated holds. A situation may then occur in which, although all experts agree on the optimal action based on maximization of their individual expected utilities, maximization of the expected utility derived by aggregating individual preferences and expectations separately, outcome by outcome, leads to the choice of a different action (see Raiffa (1968)). Indeed, in the Michelangelo-Galileo setup of Section 2.3, an example can easily be constructed in which child and parent preferences over the two outcomes are identical, their expectations differ, and both prefer Galileo over Michelangelo based on their expected utilities before interacting (say $EU^b_i$, $i \in \{c, p\}$), but maximization of the child's expected utility after interacting with his parent (say $EU^a_c$) leads to the choice of Michelangelo over Galileo. That is,

$$\begin{aligned}
EU^b_{cG} &= 0.9 \cdot 1 + 0.1 \cdot 2 = 1.1 > EU^b_{cM} = 0.15 \cdot 1 + 0.45 \cdot 2 = 1.05,\\
EU^b_{pG} &= 0.2 \cdot 1 + 0.8 \cdot 2 = 1.8 > EU^b_{pM} = 0.25 \cdot 1 + 0.75 \cdot 2 = 1.75,\\
EU^a_{cG} &= [0.2 \cdot 0.9 + 0.8 \cdot 0.2] \cdot 1 + [0.5 \cdot 0.1 + 0.5 \cdot 0.8] \cdot 2 = 1.24,\\
EU^a_{cM} &= [0.2 \cdot 0.15 + 0.8 \cdot 0.25] \cdot 1 + [0.5 \cdot 0.45 + 0.5 \cdot 0.75] \cdot 2 = 1.43 > EU^a_{cG}.
\end{aligned}$$

[63] The extent to which a child relies on his parent's opinions on a specific outcome may also vary across curricula. That is, the weights $\{w^{c,2}_n\}_{n=1}^{N_2}$ may be alternative-specific. This would occur, e.g., if a child thinks that the parent has more experience with, or better information about, certain curricula. While I do not model this aspect, in the empirical application the weights could easily be made to depend on covariates capturing these types of dynamics, if available. An example would be a dummy for whether the parent graduated from or teaches in a certain curriculum.

[64] The case in which the parent chooses after listening to her child ($k = 4$) is again symmetric to case $k = 2$, with the roles of child and parent reversed.
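The Raiffa-type example can be reproduced with the aggregation rule of eq. (15). The sketch below assumes outcome-specific weights on the parent's beliefs of 0.8 for the first outcome and 0.5 for the second, read off the bracketed terms of $EU^a_{cG}$, and applies the same weights to Michelangelo because the weights in (15) are not alternative-specific; the function names are illustrative.

```python
def updated_beliefs(p_child, p_parent, w):
    """Child's aggregated expectations as in eq. (15):
    w_n * P_pjn + (1 - w_n) * P_cjn, outcome by outcome."""
    return [wn * pp + (1 - wn) * pc for wn, pp, pc in zip(w, p_parent, p_child)]


def expected_utility(probs, prefs):
    return sum(p * u for p, u in zip(probs, prefs))


def best(scores):
    return max(scores, key=scores.get)


PREFS = [1, 2]                      # identical preferences over the two outcomes
CHILD = {"M": [0.15, 0.45], "G": [0.9, 0.1]}
PARENT = {"M": [0.25, 0.75], "G": [0.2, 0.8]}
W = [0.8, 0.5]                      # weight on the parent's beliefs, by outcome

before_child = {j: expected_utility(CHILD[j], PREFS) for j in CHILD}
before_parent = {j: expected_utility(PARENT[j], PREFS) for j in PARENT}
after_child = {
    j: expected_utility(updated_beliefs(CHILD[j], PARENT[j], W), PREFS) for j in CHILD
}

print("child before:", best(before_child), "parent before:", best(before_parent))
print("child after:", best(after_child), {j: round(v, 2) for j, v in after_child.items()})

# Nesting of protocol k = 1: with w_n = 0 the child keeps his own beliefs.
assert updated_beliefs(CHILD["M"], PARENT["M"], [0, 0]) == CHILD["M"]
```

Both decision makers prefer G before interacting, while the child's updated beliefs lead to M (about 1.43 versus 1.24 for G). The final assertion checks the nesting of protocol $k = 1$ under the convention that $w_n$ is the weight on the parent's beliefs.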
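For any $\lambda > 0$,

$$P[y(t) \le s^{QMTS}(\alpha,t) - \lambda] = \sum_{\tilde t \in T} P[y(t) \le s^{QMTS}(\alpha,t) - \lambda \mid z = \tilde t]\, P(z = \tilde t).$$

Taking $P[y(t) \le s^{QMTS}(\alpha,t) - \lambda \mid z = \tilde t] = 0$ for all $\tilde t > t$, and $P[y(t) \le s^{QMTS}(\alpha,t) - \lambda \mid z = \tilde t] = P[y(t) \le s^{QMTS}(\alpha,t) - \lambda \mid z = t]$ for all $\tilde t < t$ (which is possible under α-QMTS), by the definition of $s^{QMTS}(\alpha,t)$ one obtains

$$P[y(t) \le s^{QMTS}(\alpha,t) - \lambda] = P[y(t) \le s^{QMTS}(\alpha,t) - \lambda \mid z = t]\, P(z \le t) < \alpha.$$

Hence, $Q_\alpha[y(t)] > s^{QMTS}(\alpha,t) - \lambda$. Now let $P(z \le t) < \alpha$, so that $s^{QMTS}(\alpha,t) = \infty$. For any finite $\tilde y$,

$$P[y(t) \le \tilde y] = \sum_{\tilde t \in T} P[y(t) \le \tilde y \mid z = \tilde t]\, P(z = \tilde t). \qquad (39)$$

Setting $P[y(t) \le \tilde y \mid z = \tilde t] = 0$ for all $\tilde t > t$, and $P[y(t) \le \tilde y \mid z = \tilde t] = P[y(t) \le \tilde y \mid z = t]$ for all $\tilde t < t$, implies $P[y(t) \le \tilde y] = P[y(t) \le \tilde y \mid z = t]\, P(z \le t) < \alpha$. Hence, $Q_\alpha[y(t)] > \tilde y$.

(b) $r^{QMTS}(\alpha,t)$ is the greatest LB. Let $P(z \ge t) > 1 - \alpha$, so that $r^{QMTS}(\alpha,t) > -\infty$. For any $\lambda > 0$,

$$P[y(t) \le r^{QMTS}(\alpha,t) + \lambda] = \sum_{\tilde t \in T} P[y(t) \le r^{QMTS}(\alpha,t) + \lambda \mid z = \tilde t]\, P(z = \tilde t).$$

If one sets $P[y(t) \le r^{QMTS}(\alpha,t) + \lambda \mid z = \tilde t] = 1$ for all $\tilde t < t$, and $P[y(t) \le r^{QMTS}(\alpha,t) + \lambda \mid z = \tilde t] = P[y(t) \le r^{QMTS}(\alpha,t) + \lambda \mid z = t]$ for all $\tilde t > t$, by the definition of $r^{QMTS}(\alpha,t)$ it follows that

$$P[y(t) \le r^{QMTS}(\alpha,t) + \lambda] = P[y(t) \le r^{QMTS}(\alpha,t) + \lambda \mid z = t]\, P(z \ge t) + P(z < t) \ge \alpha.$$

Hence, $Q_\alpha[y(t)] \le r^{QMTS}(\alpha,t) + \lambda$. Now let $P(z \ge t) \le 1 - \alpha$, so that $r^{QMTS}(\alpha,t) = -\infty$. For any $\tilde y \in \mathbb{R}$, if $P[y(t) \le \tilde y \mid z = \tilde t] = 1$ for all $\tilde t < t$, and $P[y(t) \le \tilde y \mid z = \tilde t] = P[y(t) \le \tilde y \mid z = t]$ for all $\tilde t > t$, then by equation (39) one gets $P[y(t) \le \tilde y] = P[y(t) \le \tilde y \mid z = t]\, P(z \ge t) + P(z < t) \ge \alpha$. Hence, $Q_\alpha[y(t)] \le \tilde y$.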
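A plug-in sample analog of the α-QMTS bounds can make the two cases of the proof concrete. This is a sketch under explicit assumptions: the treatment is scalar with finite ordered support; population probabilities are replaced by sample frequencies (no inference); quantiles are the left-continuous ones, $Q_\tau = \min\{v : F(v) \ge \tau\}$; and the bounds are taken to be $s^{QMTS}(\alpha,t) = Q_{\alpha/P(z\le t)}(y \mid z=t)$ when $P(z \le t) \ge \alpha$ and $r^{QMTS}(\alpha,t) = Q_{1-(1-\alpha)/P(z\ge t)}(y \mid z=t)$ when $P(z \ge t) > 1-\alpha$, the forms consistent with the steps of the proof (the formal definitions are stated elsewhere in the chapter). Function names and data are hypothetical.

```python
import math


def lower_quantile(values, tau):
    """Empirical tau-quantile min{v : F_n(v) >= tau} for 0 < tau <= 1."""
    xs = sorted(values)
    # Small tolerance guards against floating-point error in tau * n.
    k = math.ceil(tau * len(xs) - 1e-9)
    return xs[min(max(k, 1), len(xs)) - 1]


def qmts_bounds(y, z, t, alpha):
    """Plug-in sketch of [r^QMTS(alpha, t), s^QMTS(alpha, t)] for a scalar,
    ordered treatment, using sample frequencies for P(z <= t), P(z >= t)."""
    n = len(z)
    y_t = [yi for yi, zi in zip(y, z) if zi == t]   # outcomes observed with z = t
    if not y_t:
        return -math.inf, math.inf
    p_le = sum(zi <= t for zi in z) / n
    p_ge = sum(zi >= t for zi in z) / n
    # Upper bound is informative only if P(z <= t) >= alpha (cf. part (a)).
    s = lower_quantile(y_t, alpha / p_le) if p_le >= alpha else math.inf
    # Lower bound is informative only if P(z >= t) > 1 - alpha (cf. part (b)).
    r = lower_quantile(y_t, 1 - (1 - alpha) / p_ge) if p_ge > 1 - alpha else -math.inf
    return r, s


# Hypothetical toy sample: outcomes y and realized treatments z in {1, 2, 3}.
Y_EXAMPLE = [1.0, 2.0, 3.0, 4.0, 2.5, 3.5, 5.0, 6.0]
Z_EXAMPLE = [1, 1, 2, 2, 2, 2, 3, 3]
print(qmts_bounds(Y_EXAMPLE, Z_EXAMPLE, t=2, alpha=0.5))
print(qmts_bounds(Y_EXAMPLE, Z_EXAMPLE, t=2, alpha=0.9))
```

With α = 0.5 the toy sample gives the interval [3.0, 3.5]; with α = 0.9 the condition $P(z \le t) \ge \alpha$ fails ($P(z \le 2) = 0.75$), so the upper bound is infinite, as in the second half of part (a).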

A.3 α-QMIV Bounds: Derivation

The starting point for determining the identifying power of the α-QMIV assumption is the no-assumptions bound on $P[y(t) \le \tilde y_\alpha \mid \nu = v]$ of Manski (1994). Let $v \in V$. By the law of total probability and the fact that $P[y(t) \le \tilde y_\alpha \mid \nu = v, z = t] = P[y \le \tilde y_\alpha \mid \nu = v, z = t]$, one can write

$$P[y(t) \le \tilde y_\alpha \mid \nu = v] = P[y \le \tilde y_\alpha \mid \nu = v, z = t]\, P(z = t \mid \nu = v) + P[y(t) \le \tilde y_\alpha \mid \nu = v, z \ne t]\, P(z \ne t \mid \nu = v).$$

The sampling process identifies each of the quantities on the right-hand side except for $P[y(t) \le \tilde y_\alpha \mid \nu = v, z \ne t]$, which may take any value in the interval $[0, 1]$. This yields an identification region of the form

$$P[y \le \tilde y_\alpha \mid \nu = v, z = t]\, P(z = t \mid \nu = v) \le P[y(t) \le \tilde y_\alpha \mid \nu = v] \le P[y \le \tilde y_\alpha \mid \nu = v, z = t]\, P(z = t \mid \nu = v) + P(z \ne t \mid \nu = v).$$

An α-QMIV assumption implies the inequality restriction

$$\hat v \le v \le \hat{\hat v} \implies Q_\alpha[y(t) \mid \nu = \hat v] \le Q_\alpha[y(t) \mid \nu = v] \le Q_\alpha[y(t) \mid \nu = \hat{\hat v}],$$

or equivalently

$$\hat v \le v \le \hat{\hat v} \implies P[y(t) \le \tilde y_\alpha \mid \nu = \hat v] \ge P[y(t) \le \tilde y_\alpha \mid \nu = v] \ge P[y(t) \le \tilde y_\alpha \mid \nu = \hat{\hat v}].$$

Hence, $P[y(t) \le \tilde y_\alpha \mid \nu = v]$ is no smaller than the no-assumption lower bound on $P[y(t) \le \tilde y_\alpha \mid \nu = \hat{\hat v}]$, and no larger than the no-assumption upper bound on $P[y(t) \le \tilde y_\alpha \mid \nu = \hat v]$. This holds for all $\hat v \le v$ and all $\hat{\hat v} \ge v$. There are no other restrictions on $P[y(t) \le \tilde y_\alpha \mid \nu = v]$. Thus, the implied (sharp) identification region for $P[y(t) \le \tilde y_\alpha \mid \nu = v]$ is

$$\max_{\hat{\hat v} \ge v} \left\{P[y \le \tilde y_\alpha \mid \nu = \hat{\hat v}, z = t]\, P(z = t \mid \nu = \hat{\hat v})\right\} \le P[y(t) \le \tilde y_\alpha \mid \nu = v] \le \min_{\hat v \le v} \left\{P[y \le \tilde y_\alpha \mid \nu = \hat v, z = t]\, P(z = t \mid \nu = \hat v) + P(z \ne t \mid \nu = \hat v)\right\}. \qquad (40)$$

Now, by the law of total probability,

$$P[y(t) \le \tilde y_\alpha] = \sum_{v \in V} P(\nu = v)\, P[y(t) \le \tilde y_\alpha \mid \nu = v].$$

Given (40), the sharp lower (upper) bound on $P[y(t) \le \tilde y_\alpha]$ is obtained by setting each $P[y(t) \le \tilde y_\alpha \mid \nu = v]$, $v \in V$, at its lower (upper) bound given in (40). This yields the identification region for $P[y(t) \le \tilde y_\alpha]$ presented in Proposition 1.6.
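The construction in (40) and the subsequent averaging can likewise be sketched with sample frequencies. The sketch assumes a discrete instrument ν with ordered support, a scalar treatment, and the direction of α-QMIV stated above (a higher ν weakly raises the α-quantile, hence weakly lowers the CDF at $\tilde y_\alpha$); the function name, the `monotone` switch, and the toy data are hypothetical.

```python
from collections import defaultdict


def qmiv_cdf_bounds(y, z, v, t, y_tilde, monotone=True):
    """Plug-in sketch of bounds on P[y(t) <= y_tilde].

    For each instrument value u, the no-assumption bounds are
      LB(u) = P[y <= y_tilde, z = t | nu = u]
      UB(u) = LB(u) + P(z != t | nu = u).
    With monotone=True, eq. (40) replaces them by max_{w >= u} LB(w) and
    min_{w <= u} UB(w); the results are then averaged with weights P(nu = u).
    """
    cells = defaultdict(list)
    for yi, zi, vi in zip(y, z, v):
        cells[vi].append((yi, zi))
    support = sorted(cells)
    lb, ub = {}, {}
    for u in support:
        obs = cells[u]
        joint = sum(zi == t and yi <= y_tilde for yi, zi in obs) / len(obs)
        lb[u] = joint
        ub[u] = joint + sum(zi != t for _, zi in obs) / len(obs)
    lower = upper = 0.0
    for u in support:
        weight = len(cells[u]) / len(y)          # P(nu = u)
        if monotone:
            lower += weight * max(lb[w] for w in support if w >= u)
            upper += weight * min(ub[w] for w in support if w <= u)
        else:
            lower += weight * lb[u]
            upper += weight * ub[u]
    return lower, upper


# Hypothetical toy data: binary instrument, treatment of interest t = 1.
Y = [1, 3, 5, 2, 4, 1, 2, 6, 1, 3]
Z = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
V = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print("alpha-QMIV:", qmiv_cdf_bounds(Y, Z, V, t=1, y_tilde=2))
print("no assumption:", qmiv_cdf_bounds(Y, Z, V, t=1, y_tilde=2, monotone=False))
```

In this toy sample, the no-assumption bounds on $P[y(1) \le 2]$ are approximately [0.3, 0.7], and imposing α-QMIV narrows them to approximately [0.4, 0.6].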

A.4 Semi-monotonicity

A.4.1 α-Quantile Semi-Monotone Treatment Selection

α-quantile selection semi-monotonicity (α-QSMTS) extends α-QMTS to the case in which treatment $t$ is represented by some $k \times 1$ vector with $k > 1$, so that $T^k$ is only semi-ordered. While the formal definition of the assumption remains unchanged, in terms of identification, for each pair of treatments $t_1, t_2 \in T^k$ that are not ordered, i.e., $t_1$ ø $t_2$, we also have $Q_\alpha[y(t) \mid z = t_2]$ ø $Q_\alpha[y(t) \mid z = t_1]$; that is, nothing can be said about the relative magnitude of $Q_\alpha[y(t) \mid z = t_2]$ and $Q_\alpha[y(t) \mid z = t_1]$ in this subset of cases.

Proposition A.1. Let $\alpha \in (0, 1)$. Define $r^{QSMTS}(\alpha, t)$ and $s^{QSMTS}(\alpha, t)$ as

$$r^{QSMTS}(\alpha, t) = \begin{cases} Q_{\left[1 - \frac{1-\alpha}{P(z \ge t)}\right]}(y \mid z = t) & \text{if } P(z < t \cup z \text{ ø } t) < \alpha < 1, \\ y_{INF} & \text{otherwise}, \end{cases}$$

$$s^{QSMTS}(\alpha, t) = \begin{cases} Q_{\left[\frac{\alpha}{P(z \le t)}\right]}(y \mid z = t) & \text{if } 0 < \alpha \le P(z \le t), \\ y_{SUP} & \text{otherwise}. \end{cases}$$

Then, for every $t \in T$, $r^{QSMTS}(\alpha, t) \le Q_\alpha[y(t)] \le s^{QSMTS}(\alpha, t)$. Clearly, these bounds are in general less tight than those yielded by α-QMTS, since $P(z \ge t)$ and $P(z \le t)$ are now (weakly) smaller because of the additional possibility that $z$ ø $t$, with $P(z \text{ ø } t) \ge 0$. Consider, e.g., the lower bound $r^{QSMTS}(\alpha, t)$, when informative. After expressing $Q_{\left[1 - \frac{1-\alpha}{P(z \ge t)}\right]}(y \mid z = t)$ as Q[ α−P (z