Three Essays on Bayesian Nonparametric Modeling in Microeconometrics
Dissertation submitted for the degree of Doctor of Economics (Dr. rer. pol.) at the Department of Economics, Universität Konstanz
Submitted by: Markus Jochmann, Mainaustraße 61, 78464 Konstanz
Date of the oral examination: 26 July 2006. First referee: Prof. Dr. Winfried Pohlmeier. Second referee: Prof. Gary Koop. Konstanzer Online-Publikations-System (KOPS) URL: http://www.ub.uni-konstanz.de/kops/volltexte/2008/5786/ URN: http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-57862
Preface
This thesis was written during my time at the Chair of Econometrics in the Department of Economics at the Universität Konstanz. I thank my doctoral supervisor, Winfried Pohlmeier, for his help, his support, and the academic freedom he granted me. I am no less grateful to all my colleagues during this time. We always had a special atmosphere at the chair, and I would not have wanted to miss it. Because this dissertation was written in the diaspora, it was essential for me to have some Bayesian discussion partners and friends. Here my thanks go above all to Gary Koop, who agreed to act as second referee. I also thank my coauthor Roberto León-González. Finally, I could always turn to Luc Bauwens with silly questions. I further thank the Deutsche Forschungsgemeinschaft and the Universitätsgesellschaft Konstanz for financial support of my research projects. I am also indebted to Robert Lee for his help with English spelling. Last but not least, I thank my family and my friends.
Contents

Abstract
Zusammenfassung (German Summary)
1 Introduction: Summary of the Literature
    Bibliography
2 Estimating the Demand for Health Care with Panel Data: A Semiparametric Bayesian Approach (Essay 1)
    2.1 Introduction
    2.2 A Parametric Benchmark Model
    2.3 A Semiparametric Extension
    2.4 Bayesian MCMC Sampling
    2.5 The Data
    2.6 Results
    2.7 Conclusion
    Bibliography
    Appendix
3 Nonparametric Bayesian Inference for Count Data Treatment Effects (Essay 2)
    3.1 Introduction
    3.2 The Model
    3.3 Bayesian MCMC Sampling
    3.4 Empirical Illustration: Number of Trips by Households
    3.5 Conclusion
    Bibliography
    Appendix
4 Nonparametric Bayesian Inference for Quantile Treatment Effects (Essay 3)
    4.1 Introduction
    4.2 The Model
    4.3 Bayesian Computation
    4.4 Empirical Application
    4.5 Conclusions
    Bibliography
Complete Bibliography
Erklärung (Declaration)
Abgrenzung (Delineation of Contributions)
Abstract
This dissertation comprises three essays on nonparametric Bayesian modeling in microeconometrics. The introduction discusses some basic concepts of Bayesian nonparametrics, including the Dirichlet process and the mixture of Dirichlet processes model. Further, it summarizes the literature on estimating the demand for health care using count data and the literature on treatment effect models. The first essay specifies a Bayesian nonparametric random effects model for count data. This model is based on a mixture distribution with a random number of components and is therefore a natural extension of prevailing latent class models. We propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference and apply the framework to data from Germany. The second essay is also concerned with count data but focuses on the estimation of causal treatment effects. A potential outcomes model is specified in a nonparametric Bayesian fashion using a mixture of Dirichlet processes. Posterior inference is carried out using MCMC simulation methods. We illustrate the proposed techniques with a real data set concerning the mobility of households. The third essay analyzes distributional treatment effects. The model is based on the potential outcomes framework and focuses on the estimation of quantile treatment effects. Flexibility is achieved by including random intercepts in the outcome equations. These random intercepts are assumed to be drawn from Dirichlet processes. We apply the model to real data using MCMC techniques for posterior simulation.
Zusammenfassung (German Summary)
This dissertation comprises three essays on nonparametric Bayesian methods in microeconometrics. The introduction discusses basic concepts of Bayesian nonparametrics, in particular the Dirichlet process and the Dirichlet process mixture model. It also summarizes the literature on estimating the demand for health care services with count data models and the literature on treatment effects. The first essay presents a Bayesian nonparametric random effects model for count data. The model is based on a mixture distribution with a random number of components and can therefore be regarded as an extension of the latent class models already discussed in the literature. An algorithm based on Markov chain Monte Carlo (MCMC) methods is developed for analyzing the posterior distribution. The framework is then applied to German data. The second essay also deals with count data but focuses on the estimation of causal treatment effects. A model founded on Rubin's causal model is presented and formulated nonparametrically using the Dirichlet process mixture model. The posterior distribution is analyzed with MCMC methods. The essay closes with an application concerning the mobility of households. Finally, the third essay analyzes the causal effect of a treatment on the distribution of the relevant outcome variable. The analysis centers on the estimation of quantile treatment effects. The proposed framework is again based on Rubin's causal model. Introducing random terms that follow Dirichlet processes allows a flexible formulation of the relationship under study. The model is estimated using MCMC methods and illustrated with real data.
Chapter 1
Introduction: Summary of the Literature

This dissertation comprises three essays on nonparametric Bayesian methods in microeconometrics. Nonparametric methods have become one of the central areas of Bayesian research and are used intensively in fields such as biometrics and machine learning. In contrast, like their parametric counterparts, they have not been as influential in the econometric literature. Thus, one purpose of this thesis is to demonstrate the usefulness of Bayesian nonparametric methods for applied econometric research.

In a literal sense, we should not use the term Bayesian ‘nonparametric’ methods. It is an ‘oxymoron and misnomer’, as Müller and Quintana (2004) put it. Bayesian inference is always based on a well-defined probability model and is thus inherently parametric. The commonly used definition of Bayesian nonparametrics refers, rather, to models with infinitely many parameters (Bernardo and Smith (1994)).¹

The Dirichlet process is the most frequently used prior in the Bayesian nonparametric literature (see MacEachern and Müller (2000) and Müller and Quintana (2004) for recent surveys). Accordingly, we also employ the Dirichlet process and the closely related mixture of Dirichlet processes model in the three essays of this dissertation. The first essay estimates the demand for health care from panel data. In this application, the Dirichlet process is used to relax the assumption on the distribution of the random effects. The second and the third essay deal with treatment effect models. Here, the distribution of the error terms is modeled in a flexible way using a mixture of Dirichlet processes.

The remainder of this introduction reviews the literature related to this dissertation. First, the Dirichlet process and issues of its application in Bayesian nonparametrics are discussed. Second, we review the literature on count data regression for estimating the demand for health care. Finally, the fundamental aspects of treatment effect models are summarized.

¹ For a nice discussion of parametric versus nonparametric modeling from a Bayesian point of view, see Koop (2003), who uses the term ‘flexible models’ for his chapter on Bayesian nonparametrics.
The Dirichlet Process

The Dirichlet process was introduced by Ferguson (1973) as a prior distribution on spaces of probability measures. It is defined by the following property: a random probability measure $G$ is generated by a Dirichlet process with precision parameter $\nu > 0$ and base distribution $G_0$ if, for any finite measurable partition $B_1, \dots, B_m$ of the support of $G_0$, the vector of probabilities $(G(B_1), \dots, G(B_m))$ follows a Dirichlet distribution with parameter $(\nu G_0(B_1), \dots, \nu G_0(B_m))$. The expectation and variance of $G$ are given by

\[ E(G) = G_0, \tag{1.1} \]

and, for any event $A$,

\[ \operatorname{Var}[G(A)] = \frac{G_0(A)\,[1 - G_0(A)]}{\nu + 1}. \tag{1.2} \]
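The defining property and the moments (1.1) and (1.2) can be checked by simulation. The following is a minimal sketch, assuming an illustrative three-cell partition and precision parameter; the function names and all numerical values are hypothetical and not taken from the essays. It draws the vector of cell probabilities from the Dirichlet distribution stated in the definition and compares the Monte Carlo moments with their theoretical counterparts.

```python
import random

def draw_partition_probs(nu, g0_probs, rng):
    """Draw (G(B_1), ..., G(B_m)) ~ Dirichlet(nu*G0(B_1), ..., nu*G0(B_m))."""
    # A Dirichlet vector can be generated by normalizing independent
    # Gamma(shape_i, 1) variables.
    gammas = [rng.gammavariate(nu * p, 1.0) for p in g0_probs]
    total = sum(gammas)
    return [g / total for g in gammas]

def simulate_moments(nu, g0_probs, n_draws=20000, seed=0):
    """Monte Carlo mean and variance of G(B_i) for each cell of the partition."""
    rng = random.Random(seed)
    draws = [draw_partition_probs(nu, g0_probs, rng) for _ in range(n_draws)]
    cells = range(len(g0_probs))
    means = [sum(d[i] for d in draws) / n_draws for i in cells]
    variances = [
        sum((d[i] - means[i]) ** 2 for d in draws) / (n_draws - 1) for i in cells
    ]
    return means, variances

nu = 4.0                    # precision parameter (illustrative value)
g0_probs = [0.2, 0.3, 0.5]  # G0(B_1), G0(B_2), G0(B_3) (illustrative values)

means, variances = simulate_moments(nu, g0_probs)
theory_var = [p * (1.0 - p) / (nu + 1.0) for p in g0_probs]  # equation (1.2)

for p, m, v, tv in zip(g0_probs, means, variances, theory_var):
    print(f"G0(B)={p:.1f}  mean={m:.4f}  var={v:.4f}  theory var={tv:.4f}")
```

Increasing `nu` in this sketch shrinks the simulated variances toward zero, which illustrates why $\nu$ is interpreted as a precision parameter: the larger it is, the more tightly $G$ concentrates around $G_0$.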
More aspects of the Dirichlet process are discussed in, among others, Ferguson (1973), Antoniak (1974), Cifarelli and Melilli (2000), and Ghosh and Ramamoorthi (2003).

To better understand the properties of the Dirichlet process, it is useful to look at two of its representations. Sethuraman (1994) shows that any $G \sim \mathrm{DP}(\nu, G_0)$ can be represented as

\[ G = \sum_{h=1}^{\infty} \omega_h \delta_{\theta_h}, \tag{1.3} \]

\[ \omega_h = U_h \prod_{j<h} (1 - U_j) \quad \text{with} \quad U_h \overset{iid}{\sim} \mathrm{Beta}(1, \nu), \tag{1.4} \]

[...]

\[ E(\nu) = \frac{n_0 a_0}{b_0 - 1}, \quad b_0 > 1, \tag{1.15} \]

\[ \operatorname{Var}(\nu) = \frac{n_0^2\, a_0 (a_0 + b_0 - 1)}{(b_0 - 1)^2 (b_0 - 2)}, \quad b_0 > 2, \tag{1.16} \]

and

\[ \operatorname{Mode}(\nu) = \frac{n_0 (a_0 - 1)}{b_0 + 1}, \quad a_0 > 1. \tag{1.17} \]
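The moment formulas (1.15) to (1.17) have the form of the moments of a scaled inverted beta distribution. As a rough numerical check, the sketch below assumes that $\nu / (\nu + n_0) \sim \mathrm{Beta}(a_0, b_0)$, which is a parameterization inferred here from the form of the formulas rather than quoted from the text; the function names and the values of $n_0$, $a_0$, and $b_0$ are likewise illustrative.

```python
import random

def prior_mean(n0, a0, b0):
    return n0 * a0 / (b0 - 1.0)                    # (1.15), requires b0 > 1

def prior_var(n0, a0, b0):
    # (1.16), requires b0 > 2
    return n0**2 * a0 * (a0 + b0 - 1.0) / ((b0 - 1.0) ** 2 * (b0 - 2.0))

def prior_mode(n0, a0, b0):
    return n0 * (a0 - 1.0) / (b0 + 1.0)            # (1.17), requires a0 > 1

def draw_nu(n0, a0, b0, rng):
    # ASSUMPTION: nu / (nu + n0) ~ Beta(a0, b0), hence nu = n0 * x / (1 - x).
    x = rng.betavariate(a0, b0)
    return n0 * x / (1.0 - x)

n0, a0, b0 = 5.0, 3.0, 10.0   # illustrative hyperparameters
rng = random.Random(0)
draws = [draw_nu(n0, a0, b0, rng) for _ in range(100000)]

mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"mean: formula={prior_mean(n0, a0, b0):.4f}  simulated={mc_mean:.4f}")
print(f"var:  formula={prior_var(n0, a0, b0):.4f}  simulated={mc_var:.4f}")
print(f"mode: formula={prior_mode(n0, a0, b0):.4f}")
```

The mode is reported from the formula only, since estimating a mode from draws would require an additional density estimate.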
Griffin and Steel suggest choosing $a_0 = b_0 = \eta_0$, which means that the prior median of $\nu$ is $n_0$. Thus, the prior is centered around $n_0$, with its variance decreasing in $\eta_0$. In this case, $\nu$ can be sampled using the Metropolis-Hastings algorithm.

One restriction of the Dirichlet process is that it does not allow the relationship between covariates and the unknown distribution to be modeled directly. However, this limitation has received increasing attention in the literature, and several extensions have been developed. Cifarelli and Regazzini (1978) consider the case of discrete covariates and propose the product of Dirichlet processes model. They use Dirichlet process priors at each level of the covariate but link them through a common regression component in the base distribution. A similar approach has been used in the econometric literature by Griffin and Steel (2004). An alternative approach is proposed by MacEachern (1999), who discusses the dependent Dirichlet process (DDP). He starts from the stick-breaking representation and assumes that the distribution of the locations $\theta_h$ is dependent across different levels of the covariate. A further strategy is to model dependencies of the weights in the stick-breaking representation. This approach is followed by Dunson and Pillai (2004) and Griffin and Steel (2006).
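Since the dependent extensions build on the stick-breaking representation (1.3) and (1.4), it may help to see how a single realization of $G$ is generated from it. The following is a minimal sketch, assuming a truncation at a finite number of atoms and a standard normal base distribution; both choices, as well as the function names, are illustrative assumptions made only for this example. The locations are drawn independently from $G_0$, as in Sethuraman's construction.

```python
import random

def stick_breaking_weights(nu, n_atoms, rng):
    """Return the first n_atoms weights omega_h and the mass left unassigned."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        u = rng.betavariate(1.0, nu)    # U_h ~ Beta(1, nu)
        weights.append(u * remaining)   # omega_h = U_h * prod_{j<h} (1 - U_j)
        remaining *= 1.0 - u            # length of the stick still unbroken
    return weights, remaining

def draw_truncated_dp(nu, base_sampler, n_atoms, rng):
    """Approximate draw of G: atoms theta_h ~ G0 with stick-breaking weights."""
    weights, leftover = stick_breaking_weights(nu, n_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return atoms, weights, leftover

rng = random.Random(0)
nu = 2.0         # precision parameter (illustrative value)
n_atoms = 50     # truncation level (an approximation, not part of (1.3))

atoms, weights, leftover = draw_truncated_dp(
    nu, lambda r: r.gauss(0.0, 1.0), n_atoms, rng  # G0 = N(0, 1), illustrative
)

n_large = sum(1 for w in weights if w > 0.01)
print(f"atoms with weight above 0.01: {n_large}")
print(f"mass beyond the truncation:   {leftover:.2e}")
```

The realization is a discrete distribution in which a few atoms carry most of the mass. Making the atoms or the variables $U_h$ depend on a covariate in such a scheme corresponds to the two strategies described above.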
Estimating the Demand for Health Care from Count Data

The first essay of this thesis, “Estimating the Demand for Health Care with Panel Data: A Semiparametric Bayesian Approach” (Chapter 2), employs a mixture of Dirichlet processes model in order to estimate the demand for health services from count data. Given that the utilization of health care is often measured as the number of visits to a physician or another institution, count data models are often encountered in the empirical literature.

A natural starting point for modelling count data is the Poisson distribution. However, simple models based on the Poisson distribution have a number of shortcomings. One is that they do not allow for unobserved heterogeneity. Alternative assumptions concerning the underlying probability distribution may fit the data better. Following this reasoning, some papers employ a negative binomial model or a Poisson-log-normal model. Still, these two alternatives cannot fully account for one feature of health care data, namely the high proportion of zero usage. To overcome this problem, two-part models (also called hurdle models) were proposed in the literature (Mullahy (1986), Pohlmeier and Ulrich (1995), and Gurmu (1997)). The first part of these models consists of a binary outcome equation that distinguishes between users and non-users. The second part then specifies the distribution of usage conditional on usage being positive. Two-part models are attractive since they can be interpreted in terms of a principal-agent model: in a first step the patient decides whether to go to the physician or not, and in a second step the physician determines the level of care. However, the fact that the data are usually recorded over a fixed time period and not over an illness spell makes this interpretation problematic. Still, the two-part setup can be seen as a reasonable modelling approach that enables a richer specification of the data generating process.

Another type of model that is able to capture the high proportion of zero usage is the latent class model (Deb and Trivedi (1997, 2002)).
This finite mixture model does not distinguish between users and non-users but between frequent users and non-frequent users (in the case of two groups). Thus, the latent class model is appealing if the mixture components can be interpreted in a meaningful way. However, this is not required: like the two-part model, it can be regarded as a more flexible framework for modelling count data.

A further shortcoming of the standard Poisson regression model is that it ignores a possible panel structure of the data. Riphahn, Wambach, and Million (2003) take up this problem and discuss a bivariate random effects framework for estimating the demand for health care with count data. López-Nicolás (1998) is another study that uses panel data to infer the demand for health care.

Finally, a growing part of the literature on the demand for health care allows for endogeneity of insurance status (see, for example, Miller and Luft (1994) and Meer and Rosen (2004)). Munkin and Trivedi (2003) consider the case of count data. They analyze a self-selection model with two outcome variables, one of which is a count and the other a continuous variable. They use Bayesian methodology to draw inference and motivate this choice by the computational problems they encountered in a simulated maximum likelihood framework.

The first essay of this thesis (Chapter 2) does not account for endogeneity but addresses the first two shortcomings of standard Poisson models for estimating the demand for health care. First, it develops a random effects panel data model and thus allows us to control for different attitudes and genetic diversity across individuals. The model is formulated in a Bayesian nonparametric fashion using a mixture of Dirichlet processes. In this way, an arbitrary specification of the random effects distribution is avoided. Second, employing the Dirichlet process prior generalizes latent class models by allowing the mixture distribution to have a random number of components. Thus, the problem of selecting the number of classes is avoided.
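To make the comparison of count data models in this section concrete, the following sketch contrasts a plain Poisson model with a two-part (hurdle) Poisson model and a two-class latent class Poisson model. It is only an illustration under assumed parameter values: the function names, the class probabilities, and the Poisson means are hypothetical and are not estimates from any of the cited studies, and covariates are omitted.

```python
import math

def poisson_pmf(k, lam):
    # Computed on the log scale to avoid overflow for large k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def hurdle_poisson_pmf(k, p_zero, lam):
    """Two-part model: P(Y=0) = p_zero; positive counts follow a
    zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

def latent_class_pmf(k, class_probs, lams):
    """Finite mixture of Poisson distributions, one component per latent class."""
    return sum(w * poisson_pmf(k, lam) for w, lam in zip(class_probs, lams))

def mean_and_var(pmf, k_max=200):
    """Mean and variance of a count distribution, truncating the sum at k_max."""
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(k_max + 1))
    return mean, var

# Illustrative parameter values (assumptions for this example only).
lc = lambda k: latent_class_pmf(k, [0.6, 0.4], [0.5, 6.0])  # infrequent / frequent users
hurdle = lambda k: hurdle_poisson_pmf(k, 0.35, 3.0)

lc_mean, lc_var = mean_and_var(lc)
plain = lambda k: poisson_pmf(k, lc_mean)  # Poisson with the same mean

print(f"P(Y=0): Poisson={plain(0):.3f}  hurdle={hurdle(0):.3f}  latent class={lc(0):.3f}")
print(f"latent class: mean={lc_mean:.3f}  variance={lc_var:.3f}")
```

In this sketch the latent class model places far more probability on zero than a Poisson model with the same mean and has a variance well above its mean, which are the two features of health care utilization data emphasized above. The hurdle model sets the zero probability directly through its first part.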
Bayesian Treatment Effect Models

The second essay, “Nonparametric Bayesian Inference for Count Data Treatment Effects” (Chapter 3), and the third essay, “Nonparametric Bayesian Inference for Quantile Treatment Effects” (Chapter 4), discuss nonparametric Bayesian methods for econometric program evaluation. At the heart of econometric program evaluation are ‘what if’ questions. These play a major role in many fields of economics. For example, a classical ‘what if’ question in labor economics concerns the wage effect of an additional year of schooling: What would an individual earn if he or she went to school for one more year? Estimation of causal effects involves a comparison of two states of the world. However, one of those two states cannot be observed, since an individual is either in the considered program or not. Thus, econometric program evaluation can be seen as a missing data problem.

The literature on econometric program evaluation is large and steadily growing. Heckman, LaLonde, and Smith (1999), Heckman and Vytlacil (2007), and Blundell and Costa Dias (2002) give excellent surveys. Basically, three different approaches to program evaluation can be distinguished: (i) social/natural experiments, (ii) matching methods, and (iii) instrumental variable methods.

In social experiments, a small subsample of the population is randomly assigned to treatment and control groups. The treatment group is then subjected to the program, and the difference in outcomes provides an estimate of the causal effect of the program. Hausman and Wise (1985) discuss the advantages of this approach. In the case of a natural experiment (also called randomized trial), the researcher is able to observe a group of individuals that behaves like a control group in a properly randomized experiment. Often the ‘difference-in-differences’ estimator is used to evaluate natural experiments (see, for example, the famous study of Card and Krueger (1994) about minimum wages).

The matching literature assumes that individuals select themselves into the treatment solely based on variables that can be observed by the researcher (selection on observables). Thus, the researcher can match each treated individual with a non-treated individual with the same values of the matching variables in order to estimate the effect of the treatment. The two most common approaches are propensity score matching (Rosenbaum and Rubin (1983)) and multivariate matching based on the Mahalanobis distance (Cochran and Rubin (1973)).

Finally, the instrumental variable approach builds on an exclusion restriction.
That is, there needs to be at least one variable that influences treatment choice but is excluded from the outcome equation. The seminal papers on instrumental variable estimation in treatment effect models are Imbens and Angrist (1994) and Heckman (1997).

Given that econometric program evaluation can be seen as a missing data problem, following the Bayesian approach seems natural, since here the distinction between missing data and model parameters becomes blurred. This becomes even more apparent when applying Markov chain Monte Carlo simulation methods for Bayesian inference. Modern sampling techniques augment the parameter space by the missing data and sample both in turn. Despite this appeal, there are only a few Bayesian papers that deal with econometric program evaluation and treatment effects.

Chib and Hamilton (2000) consider the potential outcomes framework (Neyman (1923), Fisher (1935), Roy (1951), Cox (1958), and Rubin (1974)) and extensions of it from a Bayesian viewpoint. In a subsequent paper (Chib and Hamilton (2002)), they extend their approach in a nonparametric way. Chib (2003) also analyzes treatment effects from a Bayesian perspective. Instead of modelling the two potential outcomes separately, he considers a model with an endogenous dummy variable indicating treatment choice. This model is more restrictive in that, for example, it equates the variances of the two potential outcomes.

The potential outcomes model is also used by Koop and Poirier (1997), who focus on the correlation between the two potential outcomes. Given that only one of the potential outcomes is observable, this quantity is inherently unidentified. Koop and Poirier show how one can learn about this parameter in a Bayesian setup with a proper prior and suggest MCMC techniques for drawing inference. Poirier and Tobias (2003) also follow this approach and focus on predictive distributions of the outcome gains. Finally, Li, Poirier, and Tobias (2004) extend the analysis and look at non-normal selection models. Specifically, they propose Student-t selection models and a finite mixture of Gaussian selection models.
Imbens and Rubin (1997) estimate treatment effects in the case of randomized experiments with noncompliance. Their approach is then extended by Hirano, Imbens, Rubin, and Zhou (2000) who allow for the presence of covariates. They also consider violations of the identifying exclusion restrictions.
The second essay of this dissertation (Chapter 3) considers treatment effect estimation in situations where the outcome variable is a count. Terza (1998) discusses classical inference for a count data regression model with an endogenous dummy variable. We extend his approach by formulating a potential outcomes model for count data. Again, we use a mixture of Dirichlet processes to obtain a robust model framework. MCMC simulation methods are also used for posterior inference in this model.

Most of the program evaluation literature focuses on mean treatment effects. The two most common are the average treatment effect (ATE), which gives the effect of the treatment on a randomly picked individual, and the effect of the treatment on the treated (TT), which gives the effect on a randomly chosen participant. However, in many situations researchers and policymakers are also interested in the effect of the treatment on the distribution of the outcome variable. In order to address this point, quantile treatment effects can be estimated. Abadie, Angrist, and Imbens (2002) and Chernozhukov and Hansen (2004, 2005) discuss appropriate model frameworks for doing so. The third essay (Chapter 4) of this thesis proposes a Bayesian alternative to their classical approaches. To let the data drive the shape of the distribution of the outcome variable, we introduce a nonparametric mixture of Dirichlet processes model. Posterior inference is carried out using MCMC techniques.
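The treatment effect parameters discussed above can be illustrated on simulated potential outcomes. The sketch below is a toy data generating process chosen only for illustration (a normal outcome without treatment, a treatment that both shifts and spreads the distribution, and selection on gains); it is not the model of Chapters 3 or 4, and all names and values are hypothetical. In a simulation both potential outcomes are known, so the ATE, the TT, and the quantile treatment effects can be computed directly, whereas in real data only one outcome per individual is observed.

```python
import random

def empirical_quantile(values, tau):
    """Simple empirical tau-quantile (order statistic, no interpolation)."""
    ordered = sorted(values)
    return ordered[int(tau * (len(ordered) - 1))]

rng = random.Random(0)
n = 20000

y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]   # potential outcome without treatment
y1 = [1.0 + 1.5 * y for y in y0]               # potential outcome with treatment
gain = [b - a for a, b in zip(y0, y1)]

# Selection on gains: individuals with larger gains are more likely to participate.
d = [1 if g + rng.gauss(0.0, 1.0) > 0 else 0 for g in gain]

# The observed outcome reveals only one of the two potential outcomes.
y_obs = [b if t else a for a, b, t in zip(y0, y1, d)]

ate = sum(gain) / n                                     # average treatment effect
treated_gains = [g for g, t in zip(gain, d) if t]
tt = sum(treated_gains) / len(treated_gains)            # effect on the treated

# Quantile treatment effect: difference of the marginal quantiles of y1 and y0.
qte = {tau: empirical_quantile(y1, tau) - empirical_quantile(y0, tau)
       for tau in (0.1, 0.5, 0.9)}

# Naive comparison of observed group means, which ignores selection.
obs_treated = [y for y, t in zip(y_obs, d) if t]
obs_control = [y for y, t in zip(y_obs, d) if not t]
naive = sum(obs_treated) / len(obs_treated) - sum(obs_control) / len(obs_control)

print(f"ATE={ate:.3f}  TT={tt:.3f}  naive difference={naive:.3f}")
for tau, effect in qte.items():
    print(f"QTE({tau})={effect:.3f}")
```

Because the treatment in this toy example widens the outcome distribution, the quantile treatment effects increase with the quantile even though the ATE is a single number, which is the kind of distributional information that mean effects cannot convey.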
Bibliography

Abadie, A., J. Angrist, and G. Imbens (2002): “Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings,” Econometrica, 70, 91–117.
Antoniak, C. E. (1974): “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems,” The Annals of Statistics, 2, 1152–1174.
Bernardo, J. M., and A. F. M. Smith (1994): Bayesian theory. Wiley, New York.
Blackwell, D., and J. MacQueen (1973): “Ferguson distributions via Polya urn schemes,” The Annals of Statistics, 1, 353–355.
Blei, D., and M. Jordan (2004): “Variational methods for the Dirichlet process,” in Proceedings of the 21st International Conference on Machine Learning.
Blundell, R., and M. Costa Dias (2002): “Alternative approaches to evaluation in empirical microeconomics,” Portuguese Economic Journal, 1, 91–115.
Card, D., and A. B. Krueger (1994): “Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania,” American Economic Review, 84, 772–793.
Carota, C., and G. Parmigiani (2002): “Semiparametric regression for count data,” Biometrika, 89, 265–281.
Chen, M. H., Q. M. Shao, and J. G. Ibrahim (2000): Monte Carlo methods in Bayesian computation. Springer, New York.
Chernozhukov, V., and C. Hansen (2004): “The impact of 401K participation on savings: an iv-qr analysis,” The Review of Economics and Statistics, 86, 735–751.
Chernozhukov, V., and C. Hansen (2005): “An iv model of quantile treatment effects,” Econometrica, 73, 245–261.
Chib, S. (2003): “On inferring effects of binary treatments with unobserved confounders (with discussion),” in Bayesian Statistics 7, ed. by J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, pp. 66–84. Oxford University Press, Oxford.
Chib, S., and B. H. Hamilton (2000): “Bayesian analysis of cross-section and clustered data treatment models,” Journal of Econometrics, 97, 25–50.
Chib, S., and B. H. Hamilton (2002): “Semiparametric Bayes analysis of longitudinal data treatment models,” Journal of Econometrics, 110, 67–89.
Cifarelli, D. M., and E. Melilli (2000): “Some new results for Dirichlet priors,” The Annals of Statistics, 28, 1390–1413.
Cifarelli, D. M., and E. Regazzini (1978): “Problemi statistici non parametrici in condizioni di scambiabilità parziale e impiego di medie associative,” Annali dell’Istituto di Matematica Finanziaria dell’Università di Torino, Serie III, 12, 1–36.
Cochran, W. G., and D. B. Rubin (1973): “Controlling bias in observational studies: A review,” Sankhya, Ser. A, 35, 417–446.
Cox, D. R. (1958): The planning of experiments. Wiley, New York.
Deb, P., and P. Trivedi (1997): “Demand for medical care by the elderly: A finite mixture approach,” Journal of Applied Econometrics, 12, 313–336.
Deb, P., and P. Trivedi (2002): “The structure of demand for health care: Latent class versus two-part model,” Journal of Health Economics, 21, 601–625.
Dunson, D. B., and N. Pillai (2004): “Bayesian density regression,” ISDS Discussion Paper 2004-33.
Escobar, M. D. (1994): “Estimating normal means with a Dirichlet process prior,” Journal of the American Statistical Association, 89, 268–277.
Escobar, M. D., and M. West (1995): “Bayesian density estimation and inference using mixtures,” Journal of the American Statistical Association, 90, 577–588.
Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230.
Fisher, R. A. (1935): Design of experiments. Oliver and Boyd.
Gelfand, A. E., and A. Kottas (2002): “A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 11, 289–305.
Gelfand, A. E., and A. F. M. Smith (1990): “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, 85, 398–409.
Geman, S., and D. Geman (1984): “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Ghosh, J. K., and R. V. Ramamoorthi (2003): Bayesian nonparametrics. Springer, New York.
Griffin, J. E., and M. F. J. Steel (2004): “Semiparametric Bayesian inference for stochastic frontier models,” Journal of Econometrics, 123, 121–152.
Griffin, J. E., and M. F. J. Steel (2006): “Order-based dependent Dirichlet processes,” Journal of the American Statistical Association, 101, 179–194.
Gurmu, S. (1997): “Semiparametric estimation of hurdle regression models with an application to Medicaid utilization,” Journal of Applied Econometrics, 12, 225–242.
Hausman, J. A., and D. A. Wise (1985): Social experimentation, NBER conference report. University of Chicago Press, Chicago.
Heckman, J. (1997): “Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations,” Journal of Human Resources, 32, 441–462.
Heckman, J., R. LaLonde, and J. Smith (1999): “The economics and econometrics of active labour market programs,” in Handbook of Labour Economics, Volume 3, ed. by O. Ashenfelter, and D. Card. Elsevier, Amsterdam.
Heckman, J., and E. Vytlacil (2007): “Econometric evaluation of social programs,” in Handbook of Econometrics, Volume 6, ed. by J. Heckman, and E. Leamer. North Holland, Amsterdam.
Hirano, K., G. W. Imbens, D. B. Rubin, and X.-H. Zhou (2000): “Assessing the effect of an influenza vaccine in an encouragement design,” Biostatistics, 1, 69–88.
Imbens, G. W., and J. D. Angrist (1994): “Identification and estimation of local average treatment effects,” Econometrica, 62, 467–475.
Imbens, G. W., and D. B. Rubin (1997): “Bayesian inference for causal effects in randomized experiments with noncompliance,” The Annals of Statistics, 25, 305–327.
Ishwaran, H., and L. F. James (2002): “Approximate Dirichlet process computing in finite normal mixtures: smoothing and prior information,” Journal of Computational and Graphical Statistics, 11, 508–532.
Koop, G. (2003): Bayesian econometrics. Wiley, Chichester.
Koop, G., and D. J. Poirier (1997): “Learning about the across-regime correlation in switching regression models,” Journal of Econometrics, 78, 217–227.
Li, M., D. Poirier, and J. Tobias (2004): “Do dropouts suffer from dropping out? Estimation and prediction of outcome gains in generalized selection models,” Journal of Applied Econometrics, 19, 203–225.
Liu, J. S. (2001): Monte Carlo strategies in scientific computing. Springer, New York.
López-Nicolás, A. (1998): “Unobserved heterogeneity and censoring in the demand for private health care,” Health Economics, 7, 429–437.
MacEachern, S. (1999): “Dependent nonparametric processes,” in ASA Proceedings of the Section on Bayesian Statistical Science. American Statistical Association, Alexandria.
MacEachern, S. N., M. Clyde, and J. S. Liu (1999): “Sequential importance sampling for nonparametric Bayes models: The next generation,” Canadian Journal of Statistics, 27, 251–267.
MacEachern, S. N., and P. Müller (1998): “Estimating mixtures of Dirichlet process models,” Journal of Computational and Graphical Statistics, 7, 223–238.
MacEachern, S. N., and P. Müller (2000): “Efficient MCMC schemes for robust model extensions using encompassing Dirichlet process mixture models,” in Robust Bayesian analysis, ed. by F. Ruggeri, and D. Ríos-Insúa. Springer.
Meer, J., and H. S. Rosen (2004): “Insurance and the utilization of medical services,” Social Science and Medicine, 58, 1623–1632.
Miller, R. H., and H. S. Luft (1994): “Managed care plan performance since 1980,” The Journal of the American Medical Association, 271, 1512–1519.
Mullahy, J. (1986): “Specification and testing in some modified count data models,” Journal of Econometrics, 33, 341–365.
Müller, P., and F. A. Quintana (2004): “Nonparametric Bayesian data analysis,” Statistical Science, 19, 95–110.
Munkin, M. K., and P. K. Trivedi (2003): “Bayesian analysis of a self-selection model with multiple outcomes using simulation-based estimation: an application to the demand for healthcare,” Journal of Econometrics, 114, 197–220.
Neal, R. M. (2000): “Markov chain sampling methods for Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 9, 249–265.
Neyman, J. (1923): “Statistical problems in agricultural experiments,” Journal of the Royal Statistical Society, 2, 107–180.
Pohlmeier, W., and V. Ulrich (1995): “An econometric model of the two-part decisionmaking process in the demand for health care,” The Journal of Human Resources, 30, 339–361.
Poirier, D. J., and J. L. Tobias (2003): “On the predictive distribution of outcome gains in the presence of an unidentified parameter,” Journal of Business and Economic Statistics, 21, 258–268.
Riphahn, R. T., A. Wambach, and A. Million (2003): “Incentive effects in the demand for health care: A bivariate panel count data estimation,” Journal of Applied Econometrics, 18, 387–405.
Robert, C. P., and G. Casella (1999): Monte Carlo statistical methods. Springer, New York.
Rosenbaum, P. R., and D. B. Rubin (1983): “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.
Roy, A. (1951): “Some thoughts on the distribution of earnings,” Oxford Economic Papers, 3, 135–146.
Rubin, D. B. (1974): “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, 66, 688–701.
Sethuraman, J. (1994): “A constructive definition of Dirichlet priors,” Statistica Sinica, 4, 639–650.
Tanner, M. A., and W. Wong (1987): “The calculation of posterior distributions by data augmentation (with discussion),” Journal of the American Statistical Association, 82, 528–550.
Terza, J. V. (1998): “Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects,” Journal of Econometrics, 84, 129–154.
Chapter 2
Estimating the Demand for Health Care with Panel Data: A Semiparametric Bayesian Approach (Essay 1)
Demand for Health Care
Abstract This paper is concerned with the problem of estimating the demand for health care with panel data. A random effects model is specified within a semiparametric Bayesian approach using a Dirichlet process prior. This results in a very flexible distribution for both the random effects and the count variable. In particular, the model can be seen as a mixture distribution with a random number of components, and is therefore a natural extension of prevailing latent class models. A full Bayesian analysis using Markov chain Monte Carlo (MCMC) simulation methods is proposed. The methodology is illustrated with an application using data from Germany.
JEL classifications C14, C23, I10
Keywords random effects model, Dirichlet process prior, MCMC
2.1 Introduction
This paper is concerned with the problem of estimating the demand for health care. It advances on previous cross-sectional studies by explicitly incorporating unobserved heterogeneity using a random effects panel data model (see López-Nicolás (1998) and Riphahn, Wambach, and Million (2003) for other studies using panel data to infer the demand for health care). This approach allows us to control for different behavioral attitudes or genetic diversity across individuals, both of which are very likely to influence the demand for health care. One aspect of our analysis is to develop a semiparametric framework that avoids the arbitrary specification of a particular distribution for the random effects.

Another purpose of this paper is to expand the range of the recently advocated latent class models (e.g., Deb and Trivedi (1997) or Jimenez-Martin, Labeaga, and Martinez-Granado (2002)) by allowing the population to be split into more than a small number of classes. An argument in favor of latent class models is that they allow for a heterogeneous population while avoiding the sharp distinction between “users” and “non-users” which is assumed in two-part hurdle models (see, for example, Pohlmeier and Ulrich (1995) or Gurmu (1997)). Deb and Trivedi (2002) use data from the RAND Health Insurance Experiment (RHIE) and find that latent class models outperform two-part models in terms of in-sample and cross-validation model selection tests. However, latent class models only allow for a small number of classes in practice. Moreover, selecting the number of classes is not straightforward, especially with small sample sizes. In the literature on health care demand, it is common to estimate models with just two classes representing the “ill” and the “healthy” (e.g., Deb and Trivedi (1997, 2002)). This assumption may be too restrictive in some circumstances.

Our model overcomes this limitation by specifying a Dirichlet process prior (Ferguson (1973) and Antoniak (1974)) for the distribution of the random effects. The resulting mixture distribution of the random effects has a random number of components, and hence it is very flexible while remaining tractable. By having a random number of components, we extend the “healthy” versus “ill” dichotomy into a richer classification.
We apply the proposed model to analyze equity in the delivery of health care using five waves from the German Socio-Economic Panel Study (GSOEP). In particular, we focus on the analysis of horizontal equity. The delivery of health care is equitable in a horizontal sense if individuals with equal need, in terms of health status, are given the same treatment irrespective of their income and other socioeconomic characteristics. For that purpose, we analyze the importance of income and socioeconomic characteristics in explaining health care utilization while controlling for health status.

The plan of the paper is as follows. Section 2 introduces a parametric random effects count data model which assumes a multivariate Normal distribution for the random effects. This model will serve as a benchmark throughout the paper. In Section 3 we present a semiparametric extension of the model which allows for a wide range of distributions for the random effects. Section 4 describes the numerical procedures (Markov chain Monte Carlo techniques) that we use to obtain the model estimates. Section 5 describes the data, and Section 6 presents the results of the empirical analysis. Section 7 draws some conclusions and provides an outlook on future research.
2.2 A Parametric Benchmark Model
In this section, we describe a parametric Bayesian model for panel count data, which sets the benchmark for the semiparametric extension discussed later (Chib and Winkelmann (2001) analyze a similar model using Bayesian inference, Zeger and Karim (1991) propose a Bayesian approach to generalized linear models, and Richardson, Viallefont, and Green (2002) analyze a Bayesian finite mixture Poisson model). We assume that the observed count outcomes $y_{it}$ for individual $i = 1, \dots, N$ over time periods $t = 1, \dots, T_i$ follow a Poisson distribution, that is,

$$y_{it} \mid \theta_{it} \sim \text{Poisson}(\exp(\theta_{it})). \tag{2.1}$$

The logarithm of the conditional mean, $\theta_{it}$, is defined as

$$\theta_{it} = x_{it}'\beta + w_{it}'b_i + \varepsilon_{it}, \tag{2.2}$$

where $x_{it}$ is a $k \times 1$ vector of covariates, $\beta$ is the corresponding parameter vector, $w_{it}$ is a $p \times 1$ vector of covariates for the corresponding vector of unobserved random effects $b_i$, and $\varepsilon_{it}$ is an error term. We assume that $b_i$ and $\varepsilon_{it}$ are independent and that each random effects vector $b_i$ follows a $p$-dimensional multivariate Normal distribution with mean zero and variance-covariance matrix $D$:

$$b_i \sim N_p(0, D). \tag{2.3}$$

The error term $\varepsilon_{it}$ is drawn from a Normal distribution with mean 0 and precision parameter $\tau$,

$$\varepsilon_{it} \sim N(0, \tau^{-1}). \tag{2.4}$$

The presence of $\varepsilon_{it}$ relaxes the assumptions of the Poisson distribution by allowing for over-dispersion. The model is completed by specifying the following priors for $\beta$, $D$ and $\tau$:

$$\beta \sim N_k(\mu_0, \Sigma_0), \tag{2.5}$$

$$D^{-1} \sim \text{Wishart}(\nu_0, S_0), \tag{2.6}$$

$$\tau \sim \text{Gamma}\left(\frac{\alpha_0}{2}, \frac{\alpha_0}{2}\right). \tag{2.7}$$
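To make the data-generating process of the benchmark model concrete, the following minimal sketch simulates counts from equations (2.1) to (2.4). It is an illustration only: the function name `simulate_benchmark`, the balanced panel (a common $T$ for all individuals), the standard-normal covariates and all parameter values are assumptions made for this example and are not taken from the application.

```python
import numpy as np

def simulate_benchmark(N=200, T=5, beta=(0.5, -0.3),
                       D=((0.2, 0.0), (0.0, 0.1)), tau=5.0, seed=0):
    """Simulate panel counts from (2.1)-(2.4) with made-up covariates."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    D = np.asarray(D, dtype=float)
    k, p = beta.size, D.shape[0]
    # Hypothetical covariates: x_it is standard normal, w_it has a constant first.
    x = rng.normal(size=(N, T, k))
    w = np.concatenate([np.ones((N, T, 1)),
                        rng.normal(size=(N, T, p - 1))], axis=2)
    b = rng.multivariate_normal(np.zeros(p), D, size=N)       # (2.3)
    eps = rng.normal(scale=tau ** -0.5, size=(N, T))          # (2.4)
    theta = x @ beta + np.einsum("itp,ip->it", w, b) + eps    # (2.2)
    y = rng.poisson(np.exp(theta))                            # (2.1)
    return y, theta, b

y, theta, b = simulate_benchmark()
# The random effects and the error term induce over-dispersion:
# the sample variance of the counts exceeds their sample mean.
print(y.mean(), y.var())
```

In such simulated data the variance of the counts exceeds their mean, which is the over-dispersion that $\varepsilon_{it}$ and $b_i$ are meant to capture.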
2.3 A Semiparametric Extension
It has been shown both from the classical and the Bayesian perspectives that in many situations the assumption of a particular functional form for the random effects is too restrictive and may lead to wrong parameter estimates (see, for example, Heckman and Singer (1984), who make this point for duration models, or Verbeke and Lesaffre (1996) in the context of linear mixed-effects models). For this reason, we now propose a Dirichlet process mixture (DPM) model in the spirit of Ibrahim and Kleinman (1998) that generalizes the parametric benchmark model of the previous section. The model uses a Dirichlet process (Ferguson (1973)), which is a prior that reflects beliefs about the probability function of the random effects. Instead of imposing a parametric probability function for the individual effects, the Dirichlet process represents the uncertainty about the true probability function of the random effects. In addition, the Dirichlet process is flexible enough to approximate any probability function. The DPM model removes the parametric normal prior assigned to the random effects $\{b_i\}$ and replaces it with a general distribution $G$:

$$b_i \sim G. \tag{2.8}$$

The prior distribution on $G$ is then defined to be a Dirichlet process with concentration parameter $M$ and base distribution $G_0$:

$$G \sim \text{DP}(M \cdot G_0). \tag{2.9}$$
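The roles of $M$ and $G_0$ in (2.9) can be read off the first two moments of the Dirichlet process, a standard property going back to Ferguson (1973) that we state here for orientation. For any measurable set $A$,

$$E[G(A)] = G_0(A), \qquad \text{Var}[G(A)] = \frac{G_0(A)\,(1 - G_0(A))}{M + 1}.$$

The base distribution is thus the prior mean of $G$, and the prior variance of $G(A)$ around $G_0(A)$ shrinks as $M$ grows.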
The base distribution $G_0$ is specified as a $p$-dimensional multivariate Normal distribution:

$$G_0 = N_p(0, D). \tag{2.10}$$

We therefore add a further stage to the model that allows us to take into account possible deviations of the true distribution of the random effects $G$ from the “baseline” multivariate normal distribution $G_0$. In other words, $G_0$ serves as a parametric first guess at the unknown, nonparametric shape of $G$. The concentration parameter $M$ reflects our prior belief about how similar $G$ is to $G_0$. Large values of $M$ lead to a $G$ that is very likely to be close to $G_0$. Small values of $M$ allow $G$ to deviate more from $G_0$ and put most of its probability mass on just a few atoms.

In order to illustrate some properties of the model, Figure 2.1 plots several probability functions for $b_i$ and $y_{it}$, which are obtained from two random draws from the Dirichlet process with $M = 10$. Note that each draw from the Dirichlet process represents a probability function for the random effects ($b_i$). This probability function in turn implies a probability function for the count variable. The two draws represented in Figure 2.1 are just two possible scenarios out of the infinitely many possibilities considered in the prior. In order to obtain these draws, we utilized a truncated version of the sum-representation of the Dirichlet process proposed by Sethuraman (1994). It is clear from the graph that the model can approximate unimodal and multimodal probability functions for the count variable. Figure 2.1 also illustrates that the distribution of the random effects is almost surely discrete (Sethuraman (1994)). When $M$ is small, the number of mass points with non-negligible probability is smaller than when $M$ is large. As $M$ increases, the probability mass is distributed more evenly over a larger set of mass points, and the distribution resembles the continuous density $G_0$ more closely.

Figure 2.1: Two draws of $b_i$ (left column) and $y_{it}$ (right column) from the prior

Looking at two key features of the Dirichlet process helps to clarify the implications of this setup. First, some of the $b_i$ are identical with positive probability. Thus, each $b_i$ takes one of $l < N$ distinct values, which we denote by $\kappa = (\kappa_1, \dots, \kappa_l)$. A so-called cluster then contains all random effects which take the same value. In order to discuss the second feature, some additional notation is necessary. Let $b^{-i}$ denote the random effects excluding the random effect for individual $i$. Finally, let the set $\kappa^{-i}$ consist of the distinct elements of $b^{-i}$, with each value $\kappa_j^{-i}$ appearing $m_j^{-i}$ times. Now we can show that, by integrating over $G$, the prior distribution of $b_i$ conditional on $b^{-i}$ and $G_0$ can be expressed as:

$$b_i \mid b^{-i}, G_0 \sim \frac{M}{M+N-1}\, G_0 + \frac{1}{M+N-1} \sum_{j=1}^{l} m_j^{-i}\, \delta(\kappa_j^{-i}), \tag{2.11}$$

where $\delta(\kappa)$ represents a degenerate distribution with point mass at $\kappa$. Therefore, a new value drawn from the base distribution is chosen for $b_i$ with probability $M/(M+N-1)$, whereas $b_i$ takes the value of an already existing cluster $\kappa_j^{-i}$ with probability $m_j^{-i}/(M+N-1)$. Combining this result with equations (2.2) and (2.4), we obtain the following expression for the conditional distribution of $\theta_{it}$ marginalized over $b_i$ and $G$:

$$\theta_{it} \mid \beta, D, G_0, b^{-i} \sim \int f_N(\theta_{it} \mid x_{it}'\beta + w_{it}'b_i, \tau^{-1})\, d[b_i \mid b^{-i}, G_0]. \tag{2.12}$$

Performing the integration we end up with:

$$\theta_{it} \mid \beta, D, G_0, b^{-i} \sim \frac{M}{M+N-1}\, f_N(\theta_{it} \mid x_{it}'\beta,\; w_{it}'Dw_{it} + \tau^{-1}) + \frac{1}{M+N-1} \sum_{j=1}^{l} m_j^{-i}\, f_N(\theta_{it} \mid x_{it}'\beta + w_{it}'\kappa_j^{-i},\; \tau^{-1}), \tag{2.13}$$

where $f_N$ represents the normal density. We see that $\theta_{it}$ follows a mixture distribution with a random number of components, where the components differ with respect to both their means and their variances. Equation (2.13) illustrates that the DPM model proposed here can be seen as a mixture model with an infinite number of classes (see Neal (2000) for a more formal presentation of this point). Thus, it contributes to the existing literature on latent class models for estimating the demand for health care. Also note that by using the Dirichlet process as a prior on the distribution of the random effects, we are able to relax the restrictive parametric assumption inherent in the benchmark model in a tractable manner.
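The truncated sum-representation of Sethuraman (1994) mentioned above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions and not the routine used to produce Figure 2.1: the truncation level `K`, the scalar standard-normal base distribution and the function names `stick_breaking_draw` and `normal_base` are all hypothetical choices made for this example.

```python
import numpy as np

def stick_breaking_draw(M, base_sampler, K=200, rng=None):
    """One truncated stick-breaking draw of G from DP(M * G0).

    Returns weights and atoms of the discrete distribution
    G = sum_k weights[k] * delta(atoms[k]).
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, M, size=K)      # stick-breaking fractions V_k ~ Beta(1, M)
    v[-1] = 1.0                       # truncation: last atom takes the leftover mass
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    weights = v * remaining           # w_k = V_k * prod_{j<k} (1 - V_j)
    atoms = base_sampler(K, rng)      # atom locations drawn i.i.d. from G0
    return weights, atoms

def normal_base(K, rng, sd=1.0):
    """Assumed scalar base distribution G0 = N(0, sd^2)."""
    return rng.normal(0.0, sd, size=K)

rng = np.random.default_rng(1)
w_small, atoms_small = stick_breaking_draw(0.5, normal_base, rng=rng)
w_large, atoms_large = stick_breaking_draw(50.0, normal_base, rng=rng)
# Compare the largest single weight under a small and a large M.
print(w_small.max(), w_large.max())
```

Averaged over many draws, a small $M$ concentrates most of the mass on a few atoms, whereas a large $M$ spreads it over many atoms, so that $G$ resembles $G_0$ more closely.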
2.4 Bayesian MCMC Sampling
Having specified the prior distribution and the likelihood function, we now turn to the analysis of the posterior distribution, which is proportional to the product of these two terms. In the Bayesian approach, the posterior distribution of a model contains all the relevant information and can be used to make probability statements about the parameters.
However, due to the complexity of the proposed models, we are not able to analyze their posterior distributions analytically. This problem can be overcome by applying Markov chain Monte Carlo (MCMC) techniques. This means that we draw large samples from the posterior distributions and then use these samples to summarize the posterior distributions. We do this by employing the Gibbs sampler, where each element of the parameter vector is updated conditional on the current values of the other components. After a number of initial draws (the burn-in) has been discarded, the remaining draws are treated as a sample from the posterior distribution. We refer to Chen, Shao, and Ibrahim (2000) or Robert and Casella (1999) for comprehensive surveys on MCMC methods.

In order to keep the Gibbs sampler computations simple, we apply the data augmentation technique put forward by Tanner and Wong (1987). This means that we include the random effects $\{b_i\}$ and the latent variables $\{\theta_{it}\}$ in the parameter space. Thus, we end up with full conditional distributions which take convenient functional forms. The resulting Gibbs sampler for the parametric benchmark model can be summarized as follows (further details on the algorithm are given in the appendix of this paper):

0. Choose starting values for $\tau$, $\{b_i\}$, $D^{-1}$, $\{\theta_{it}\}$.
1. Sample $\beta$ from $[\beta \mid \{b_i\}, \tau, \{\theta_{it}\}]$, which is a Normal distribution.
2. Sample $\tau$ from $[\tau \mid \{b_i\}, \beta, \{\theta_{it}\}]$, which is a Gamma distribution.
3. Sample $\{\theta_{it}\}$ from $[\theta_{it} \mid b_i, \beta, \tau]$, using the Metropolis-Hastings algorithm, independently for $i = 1, \dots, N$ and $t = 1, \dots, T_i$.
4. Sample $\{b_i\}$ from $[b_i \mid \beta, \tau, D, \{\theta_{it}\}]$, which is a Normal distribution, independently for $i = 1, \dots, N$.
5. Sample $D^{-1}$ from $[D^{-1} \mid \{b_i\}]$, which is a Wishart distribution.
6. Repeat Steps 1-5 using the updated values of the conditioning variables.

Since $G_0$ is chosen to be a conjugate prior distribution (a conjugate prior distribution yields a posterior distribution that falls in the same class of distributions), we can easily set up a Gibbs sampler for the semiparametric model as well. Examples of MCMC methods applied to the semiparametric setting are Escobar and West (1995) or MacEachern and Müller (1998). In particular, we have to modify steps 4 and 5 as follows (further details are also given in the appendix):

4'a. Sample $\{b_i\}$ from $[b_i \mid b^{-i}, G_0, D, \beta, \tau, \{\theta_{it}\}]$, independently for $i = 1, \dots, N$.
5'. Sample $D^{-1}$ from $[D^{-1} \mid \{\kappa_j\}]$, which is a Wishart distribution.

In order to improve the mixing behavior of the modified algorithm, we follow a strategy proposed by Bush and MacEachern (1996) and resample the cluster values $\{\kappa_j\}$ after determining how the $b_i$ are grouped. This is achieved by including the following step:

4'b. Sample $\{\kappa_j\}$ from $[\kappa_j \mid \beta, \tau, D, \{\theta_{it}\}]$, which is a Normal distribution, independently for $j = 1, \dots, l$.

We would like to point out that the Bayesian approach and its application via MCMC techniques offer several advantages. First, the Bayesian approach allows for full and exact small sample inference in both the parametric and the semiparametric version of the model and is not restricted to asymptotic approximations. Second, numerical integration methods are avoided in the evaluation of the model. Finally, by using data augmentation we easily obtain estimates for the random effects. This becomes important when analyzing extensions of the model in which the estimates of the random effects play a central role on their own (see Ibrahim and Kleinman (1998) and the literature cited therein). For example, one might think of a possible extension of the model in the direction of causal effect modelling. In this case, MCMC methods would allow us to calculate individual treatment effects (see Chib and Hamilton (2002)). In addition, the estimates of the random effects in our model can be used to obtain predictions in a simple way. Thus, our framework can easily be used for analyzing the impact of institutional changes on the individual demand for health care.
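Step 3 is the only step that does not have a standard full conditional. From (2.1), (2.2) and (2.4), the conditional density of $\theta_{it}$ is proportional to $\exp(y_{it}\theta_{it} - e^{\theta_{it}})\, f_N(\theta_{it} \mid x_{it}'\beta + w_{it}'b_i, \tau^{-1})$. The sketch below updates all $\theta_{it}$ at once with a random-walk Metropolis-Hastings step. The random-walk proposal and its step size are assumptions made for illustration; the proposal actually used is described in the appendix and may differ. The names `log_target` and `mh_update_theta` are hypothetical.

```python
import numpy as np

def log_target(theta, y, m, tau):
    """Log full conditional of theta_it up to a constant.

    m is the conditional prior mean x_it' beta + w_it' b_i.
    """
    return y * theta - np.exp(theta) - 0.5 * tau * (theta - m) ** 2

def mh_update_theta(theta, y, m, tau, step=0.5, rng=None):
    """One random-walk Metropolis-Hastings update, elementwise over (i, t)."""
    rng = np.random.default_rng() if rng is None else rng
    proposal = theta + step * rng.normal(size=theta.shape)
    log_ratio = log_target(proposal, y, m, tau) - log_target(theta, y, m, tau)
    accept = np.log(rng.uniform(size=theta.shape)) < log_ratio
    return np.where(accept, proposal, theta), accept

# Toy usage with made-up values: a single observation y = 3 with m = 1, tau = 5.
rng = np.random.default_rng(0)
y_obs, m_obs, tau_val = np.array([3.0]), np.array([1.0]), 5.0
theta_cur = m_obs.copy()
theta_draws = []
for it in range(20_000):
    theta_cur, _ = mh_update_theta(theta_cur, y_obs, m_obs, tau_val, rng=rng)
    if it >= 2_000:                       # discard burn-in draws
        theta_draws.append(theta_cur[0])
print(np.mean(theta_draws))
```

Because the $\theta_{it}$ are conditionally independent given $b_i$, $\beta$ and $\tau$, the update can be applied to the whole array in one vectorized call, with a separate accept/reject decision for each element.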
2.5 The Data
In the following, the proposed methodology is used to estimate the demand for health care by the elderly in Germany. There are many existing studies analyzing the demand for health care, but only a few of them focus on the elderly population (Deb and Trivedi (1997) is one exception). Nevertheless, this group is of particular interest, since elderly people typically have higher medical care needs and costs and their population share is steadily growing in many countries.

The data set used in this study stems from five waves (1997-2001) of the German Socio-Economic Panel Study (GSOEP). The GSOEP, conducted by the German Institute for Economic Research in Berlin, is a representative longitudinal survey of German households (for more information, see SOEP Group (2001)). It contains detailed information about the health care utilization of the respondents and the insurance schemes under which they are covered. We restrict our analysis to retired men who are older than 65 years. After eliminating all observations with missing values on any of the variables of interest, we obtain a final sample of 1854 individuals and 4761 person-year observations. Note that the observations are not equally distributed throughout the five years, since both in 1998 and 2000 the GSOEP was expanded with new sub-samples. The variable definitions and summary statistics are reported in Table 2.1.

| Variable | Definition | Mean | Std. Dev. |
|---|---|---|---|
| VISITS | Number of doctor visits in last 3 months | 4.120 | 5.534 |
| AGE | Age in years | 72.371 | 6.041 |
| AGE2 | Age squared in years / 1000 | 5.274 | 0.913 |
| EDUCATION | Years of education | 11.300 | 2.306 |
| SATISFAC | Self reported health satisfaction (0-low to 10-high) | 5.667 | 2.323 |
| LOWS | 1 if SATISFAC < 4 | 0.187 | |
| HIGHS | 1 if SATISFAC > 6 | 0.400 | |
| HANDICAP | 1 if individual is handicapped | 0.337 | |
| HDEGREE | Degree of handicap in percentage points | 21.800 | 33.102 |
| NOPARTNER | 1 if individual has no partner | 0.145 | |
| PENSION | Monthly pension payments in DM / 1000 | 2.639 | 1.295 |
| PUBLICIN | 1 if individual is in public health insurance | 0.920 | |
| ADDON | 1 if individual purchased add-on insurance | 0.055 | |
| FOREIGN | 1 if individual is foreigner | 0.056 | |
| YEAR97 | 1 if year = 1997 | 0.118 | |
| YEAR98 | 1 if year = 1998 | 0.138 | |
| YEAR99 | 1 if year = 1999 | 0.153 | |
| YEAR00 | 1 if year = 2000 | 0.298 | |
| YEAR01 | 1 if year = 2001 | 0.293 | |

$N = 1854$, $\sum_i T_i = 4761$

Table 2.1: Variable definitions and summary statistics

The dependent variable in our study is the number of visits to a doctor in the last three months prior to the survey (VISITS). Note that visits to a dentist are subsumed under this definition as well. The explanatory variables consist of socioeconomic characteristics and variables that describe the health condition of the individual. In particular, we include a self-perceived health satisfaction index (SATISFAC), as well as variables measuring disability (HANDICAP and HDEGREE). In order to capture nonlinear and threshold effects of SATISFAC we include the dummy variables LOWS and HIGHS.

In the German health care system, only individuals above a certain earnings level (3,825 Euros gross monthly earnings in 2003), civil servants, or self-employed individuals can opt out of the public insurance scheme (PUBLICIN) and choose a private insurance plan or remain uninsured. Individuals in the public insurance scheme can purchase add-on insurance (ADDON) that, for example, covers extra costs for dental prostheses or glasses. Given this institutional setup, the decisions to choose a private insurance plan and to purchase add-on insurance may be endogenous. However, since we control for the health condition of the individual, the strength of this argument is reduced (see Deb and Trivedi (1997), who argue along the same lines). The possibility of endogeneity should nevertheless not be overlooked when interpreting the results.
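The derived regressors of Table 2.1 follow mechanically from the raw variables. The short sketch below constructs LOWS, HIGHS and AGE2 as defined in the table; the function names and the example values are hypothetical. As a consistency check on the scaling of AGE2, the reported mean of AGE2 should be approximately (mean of AGE squared plus variance of AGE) divided by 1000.

```python
import numpy as np

def threshold_dummies(satisfac):
    """LOWS = 1 if SATISFAC < 4, HIGHS = 1 if SATISFAC > 6 (Table 2.1)."""
    satisfac = np.asarray(satisfac)
    lows = (satisfac < 4).astype(int)
    highs = (satisfac > 6).astype(int)
    return lows, highs

def age_squared_scaled(age):
    """AGE2 = AGE^2 / 1000 (Table 2.1)."""
    return np.asarray(age, dtype=float) ** 2 / 1000.0

# Hypothetical health satisfaction scores on the 0-10 scale.
lows, highs = threshold_dummies([0, 3, 4, 6, 7, 10])
age2_example = age_squared_scaled([66, 72, 85])

# Consistency check using the summary statistics reported in Table 2.1:
# E[AGE^2] = mean(AGE)^2 + var(AGE).
implied_age2_mean = (72.371 ** 2 + 6.041 ** 2) / 1000.0
print(lows, highs, implied_age2_mean)
```

The implied mean of AGE2 agrees with the value of 5.274 reported in the table up to rounding.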
2.6 Results
We analyze these data using both the parametric benchmark model and its semiparametric extension. Prior elicitation is done in the following way: we randomly choose 250 individuals from the data set and analyze this “training sample” using the parametric benchmark model with uninformative priors (i.e. priors with large variance). In this way we mimic the usual Bayesian approach where the results of a previous study with different data are used to select prior distributions (Chib and Hamilton (2002) and Ibrahim and Kleinman (1998) also follow the “training sample” strategy). For a discussion of the training sample strategy see, for example, Gelfand, Dey, and Chang (1992) and Ghosh and Samanta (2002). To analyze the remaining data, we select a prior distribution on $D^{-1}$ by setting $\nu_0 = 250$ and $S_0 = \hat{D}^{-1}\nu_0$, where $\hat{D}$ is the training sample posterior mean. Cowles, Carlin, and Connett (1996) argue that a flatter prior on the variance matrix of the random effects can lead to slow convergence of the algorithm (see also Ibrahim and Kleinman (1998)). In addition, the prior means and variances of the slope parameters in $\beta$ are the corresponding estimates obtained with the training sample. In order to facilitate the calculation of Bayes factors (Verdinelli and Wasserman (1995)), the off-diagonal elements of $\Sigma_0$ are set equal to zero. Finally, in order to represent prior ignorance, we set $\alpha_0 = 0.001$.
We then estimate the parametric benchmark model and the semiparametric model with $M$ equal to 10. Recall that a Dirichlet process prior implies that we expect the density of the individual effects to be discrete (we showed several draws from the prior on the distribution of the random constant in Figure 2.1). Given our choice of $M$, $S_0$ and $\nu_0$, the number of mass points with probability larger than 0.01 is between 2 and 9 with probability 0.95 (we calculate this “a priori” credible interval by Monte Carlo simulation).

We specify the models choosing VISITS as the dependent variable. All other variables (including the year dummies) plus a constant are included in the population mean vectors. The random effects include a constant and the effects of SATISFAC, HIGHS and LOWS. The models are then estimated using the MCMC sampling algorithms described in Section 4. We ran each sampler for 30,000 iterations and kept the last 25,000 draws. To give an indication of the performance of the algorithm for the semiparametric model, Figure 2.2 reports the posterior histograms and autocorrelation functions of $\beta_{AGE}$, $\tau$, and $D_C$, where $D_C$ is the variance of the intercept in the base measure ($D_{SATISFAC}$, $D_{HIGHS}$ and $D_{LOWS}$ denote the variances of the SATISFAC, HIGHS and LOWS effects, respectively). It can be seen that the mixing behavior of the sampler is satisfactory, since autocorrelations decline steadily as the number of lags increases. The algorithm for the parametric model displays an even better mixing behavior.

Table 2.2 shows the posterior estimates for the parametric and semiparametric models. The medians and 95% highest posterior density (HPD) regions for some marginal effects are quite similar. However, the estimates for the effects of SATISFAC, LOWS, HANDICAP, NOPARTNER, PUBLICIN and FOREIGN are substantially different between the parametric and semiparametric models.
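The mixing diagnostic used above, namely the sample autocorrelation function of the retained draws, can be sketched as follows. The example does not use our sampler output: it applies the same burn-in rule (30,000 iterations, last 25,000 kept) to an artificial AR(1) chain with coefficient 0.8, which is an assumption chosen only because its autocorrelations are known. The function name `autocorrelation` is hypothetical.

```python
import numpy as np

def autocorrelation(chain, max_lag=50):
    """Sample autocorrelations of a scalar chain at lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = np.empty(max_lag + 1)
    acf[0] = 1.0
    for k in range(1, max_lag + 1):
        acf[k] = np.dot(x[:-k], x[k:]) / denom
    return acf

# Artificial chain standing in for MCMC output (AR(1) with coefficient 0.8).
rng = np.random.default_rng(0)
n_iter, n_keep, rho = 30_000, 25_000, 0.8
chain = np.empty(n_iter)
chain[0] = 0.0
for s in range(1, n_iter):
    chain[s] = rho * chain[s - 1] + rng.normal()

kept = chain[-n_keep:]                 # discard the first 5,000 draws as burn-in
acf = autocorrelation(kept, max_lag=20)
print(acf[:5])
```

For a well-mixing chain the estimated autocorrelations decay quickly towards zero, which is the pattern that the autocorrelation plots are inspected for.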
The differences between the parametric and semiparametric estimates indicate that the posterior distributions of the coefficients on the binary covariates are the most affected by the relaxation of the parametric assumptions. With regard to the effect of NOPARTNER, zero is excluded from the 95% HPD region in the parametric case but included in the semiparametric case (see Table 2.2). However, the posterior distributions for the effects of continuous covariates change as well. This is illustrated in Figure 2.3, which compares the posterior density of the coefficient of SATISFAC in both models. Note that the semiparametric point estimate receives very small density weight in the parametric model and that there is more uncertainty in the estimate when the parametric assumptions are relaxed. Additionally, the posterior distributions of the elements of the covariance matrix $D$ are noticeably different in the two models. One has to keep in mind, however, that $D$ plays a different role in the semiparametric model and hence obtaining a meaningful comparison is difficult.

Figure 2.2: Autocorrelation functions and posterior histograms for $\tau$ (top row), the marginal effect of AGE2 (middle row) and $D_C$ (bottom row)

| Variable | $M = \infty$: 2.5% | $M = \infty$: 50% | $M = \infty$: 97.5% | $M = 10$: 2.5% | $M = 10$: 50% | $M = 10$: 97.5% |
|---|---|---|---|---|---|---|
| AGE | 0.151 | 0.576 | 0.990 | 0.163 | 0.571 | 0.991 |
| AGE2 | −6.094 | −3.343 | −0.530 | −6.140 | −3.347 | −0.648 |
| EDUCATION | −0.006 | 0.066 | 0.140 | −0.009 | 0.057 | 0.123 |
| SATISFAC | −0.584 | −0.460 | −0.338 | −0.708 | −0.563 | −0.423 |
| LOWS | −0.354 | 0.108 | 0.571 | −0.601 | −0.073 | 0.440 |
| HIGHS | −0.845 | −0.424 | 0.002 | −0.899 | −0.379 | 0.126 |
| HANDICAP | −0.690 | −0.002 | 0.676 | −0.709 | −0.045 | 0.617 |
| HDEGREE | 0.010 | 0.019 | 0.029 | 0.011 | 0.020 | 0.030 |
| NOPARTNER | −0.945 | −0.493 | −0.046 | −0.783 | −0.342 | 0.085 |
| PENSION | −0.198 | −0.057 | 0.086 | −0.163 | −0.034 | 0.092 |
| PUBLICIN | −0.526 | 0.054 | 0.651 | −0.383 | 0.209 | 0.767 |
| ADDON | −0.442 | 0.124 | 0.699 | −0.444 | 0.099 | 0.621 |
| FOREIGN | −0.314 | 0.408 | 1.106 | −0.434 | 0.240 | 0.890 |
| $\tau$ | 4.760 | 5.478 | 6.245 | 4.809 | 5.560 | 6.387 |
| $D_C$ | 0.177 | 0.212 | 0.256 | 0.261 | 0.560 | 1.195 |
| $D_{SATISFAC}$ | 0.016 | 0.018 | 0.021 | 0.018 | 0.027 | 0.045 |
| $D_{LOWS}$ | 0.105 | 0.123 | 0.146 | 0.035 | 0.069 | 0.168 |
| $D_{HIGHS}$ | 0.130 | 0.153 | 0.183 | 0.055 | 0.119 | 0.292 |

Note: We report marginal effects for the coefficient vector. Columns give the 2.5%, 50% and 97.5% posterior quantiles.

Table 2.2: Posterior estimates for the parametric benchmark model ($M = \infty$) and the MDP model with $M = 10$
The estimated coefficients on AGE and AGE2 imply that the number of doctor visits increases with age until the age of 85 and decreases thereafter. There is a large probability that the effect of NOPARTNER is negative, but positive values cannot be ruled out. Similarly, there is some uncertainty regarding the sign of the effect of education. The evidence on the effect of disability is mixed: the sign of the effect of the dummy variable (HANDICAP) is not clearly determined, whereas the degree of handicap (HDEGREE) has an unambiguously positive effect. An increase of 10 percentage points would lead to 0.2 more visits on average. The variable SATISFAC has, as expected, a negative effect, whereas the signs of the threshold effects (LOWS and HIGHS) are uncertain. Note that the variance of the time-variant error term $\varepsilon_{it}$ is small when compared with the variance of the individual effects. Thus, individual heterogeneity accounts for a large proportion of the variability in the data, which illustrates the importance of modelling the distribution of the individual effects correctly.

Figure 2.3: Posterior distributions of $\beta_{SATISFAC}$: Parametric benchmark model (dashed curve) and MDP model with $M = 10$ (solid curve)

There is substantial uncertainty regarding the signs of the coefficients of the variables FOREIGN, ADDON, PUBLICIN and PENSION. Riphahn, Wambach, and Million (2003) argue that the result for ADDON is not surprising and can be explained by the benefit packages of the German add-on insurance plans. In order to determine whether the delivery of health care for the elderly is equitable, we test the hypothesis that the variables EDUCATION, FOREIGN, ADDON, PUBLICIN and PENSION all have zero effect. We calculate a Bayes factor for this hypothesis following the method proposed by Verdinelli and Wasserman (1995). We find that the hypothesis of equitable delivery of health care is much more likely than the alternative (the probability of this hypothesis versus the alternative is 0.9993). Note, however, that the model does not account for the possibly endogenous nature of the variable PUBLICIN. An extension in the direction of causal modelling using the potential outcomes approach is one direction for future research.
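The numerical statements in this discussion can be retraced from the posterior medians of the $M = 10$ model in Table 2.2. The sketch below is a back-of-the-envelope check and not part of the estimation: it assumes that the reported marginal effects of AGE and AGE2 can be combined like the coefficients of a quadratic in age, and it assumes equal prior probabilities for the two hypotheses when converting the posterior probability into odds. The function names are hypothetical.

```python
def turning_point_age(effect_age, effect_age2, scale=1000.0):
    """Age at which a*AGE + c*AGE^2/scale peaks (requires c < 0)."""
    return -effect_age * scale / (2.0 * effect_age2)

def posterior_odds(prob):
    """Odds in favor of a hypothesis with posterior probability prob."""
    return prob / (1.0 - prob)

# Posterior medians for M = 10 from Table 2.2.
peak_age = turning_point_age(0.571, -3.347)   # roughly 85 years
hdegree_effect = 10 * 0.020                   # 10 percentage points of HDEGREE
equity_odds = posterior_odds(0.9993)          # odds for the equity hypothesis

print(peak_age, hdegree_effect, equity_odds)
```

Under the equal-prior assumption, the posterior odds coincide with the Bayes factor in favor of the equity hypothesis, which is then well above 1000.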
2.7 Conclusion
This paper developed a semiparametric Bayesian framework for estimating the demand for health care with panel data. This was done by specifying a Dirichlet process prior for the distribution of the random effects. Thus, the presented framework allowed explicitly for individual heterogeneity while it did not impose unreasonably strong constraints on distributional assumptions.
It was shown that the model can be seen as a natural extension of latent class models, which abound in the recent literature on health care demand. This results from the fact that the Dirichlet process prior leads to a mixing distribution with an infinite number of components. The model was used to test for the existence of horizontal equity using German data. The estimation was carried out with MCMC methods. The results were largely in accordance with the previous literature. The approach presented here can be extended in many directions, including the development of a potential outcomes model for inferring causal effects, or a model that allows for the endogenous nature of private insurance.
Bibliography
Antoniak, C. E. (1974): “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems,” The Annals of Statistics, 2, 1152–1174.
Bush, C. A., and S. N. MacEachern (1996): “A semi-parametric Bayesian model for randomized block designs,” Biometrika, 83, 275–285.
Chen, M. H., Q. M. Shao, and J. G. Ibrahim (2000): Monte Carlo methods in Bayesian computation. Springer, New York.
Chib, S., and B. H. Hamilton (2002): “Semiparametric Bayes analysis of longitudinal data treatment models,” Journal of Econometrics, 110, 67–89.
Chib, S., and R. Winkelmann (2001): “Markov chain Monte Carlo analysis of correlated count data,” Journal of Business and Economic Statistics, 19, 428–435.
Cowles, M. K., B. P. Carlin, and J. E. Connett (1996): “Bayesian tobit modeling of longitudinal ordinal clinical compliance data with nonignorable missingness,” Journal of the American Statistical Association, 91, 86–98.
Deb, P., and P. Trivedi (1997): “Demand for medical care by the elderly: A finite mixture approach,” Journal of Applied Econometrics, 12, 313–336.
Deb, P., and P. Trivedi (2002): “The structure of demand for health care: Latent class versus two-part model,” Journal of Health Economics, 21, 601–625.
Escobar, M. D., and M. West (1995): “Bayesian density estimation and inference using mixtures,” Journal of the American Statistical Association, 90, 577–588.
Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230.
Gelfand, A. E., D. Dey, and H. Chang (1992): “Model determination using predictive distributions with implementation via sampling based methods (with discussion),” in Bayesian Statistics 4, ed. by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, pp. 147–167. Oxford University Press.
Ghosh, J. K., and T. Samanta (2002): “Nonsubjective Bayes testing - an overview,” Journal of Statistical Planning and Inference, 103, 205–223.
Gurmu, S. (1997): “Semiparametric estimation of hurdle regression models with an application to Medicaid utilization,” Journal of Applied Econometrics, 12, 225–242.
Heckman, J., and B. Singer (1984): “A method for minimizing the impact of distributional assumptions in econometric models of duration,” Econometrica, 52, 271–320.
Ibrahim, J. G., and K. P. Kleinman (1998): “Semiparametric Bayesian inference for random effect models,” in Practical nonparametric and semiparametric Bayesian statistics, ed. by D. Dey, P. Müller, and D. Sinha. Springer, New York.
Jimenez-Martin, S., J. Labeaga, and M. Martinez-Granado (2002): “Latent class versus two-part models in the demand for physician services across the European Union,” Health Economics, 11, 301–321.
40
Demand for Health Care
López-Nicolás, A. (1998): “Unobserved heterogeneity and censoring in the demand for private health care,” Health Economics, 7, 429–437. MacEachern, S. N., and P. Müller (1998): “Estimating mixtures of Dirichlet process models,” Journal of Computational and Graphical Statistics, 7, 223–238. Neal, R. M. (2000): “Markov chain sampling methods for Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 9, 249–265. Pohlmeier, W., and V. Ulrich (1995): “An econometric model of the two-part decisionmaking process in the demand for health care,” The Journal of Human Resources, 30, 339–361. Richardson, S., V. Viallefont, and P. J. Green (2002): “Bayesian analysis of Poisson mixtures,” Journal of Nonparametric Statistics, 14, 181–202. Riphahn, R. T., A. Wambach, and A. Million (2003): “Incentive effects in the demand for health care: A bivariate panel count data estimation,” Journal of Applied Econometrics, 18, 387–405. Robert, C. P., and G. Casella (1999): Monte Carlo statistical methods. Springer, New York. Sethuraman, J. (1994): “A constructive definition of Dirichlet priors,” Statistica Sinica, 4, 639–650. SOEP Group (2001): “The German Socio-Economic Panel (GSOEP) after more than 15 years - Overview,” in Proceedings of the 2000 Fourth International Conference of German Socio-Economic Panel Study Users (GSOEP2000), ed. by E. Holst, D. R. Lillard, and T. A. DiPrete, vol. 70 of Vierteljahrshefte zur Wirtschaftsforschung, pp. 7–14. Tanner, M. A., and W. Wong (1987): “The calculation of posterior distributions by data augmentation (with discussion),” Journal of the American Statistical Association, 82, 528–550.
Verbeke, G., and E. Lesaffre (1996): “A linear mixed-effects model with heterogeneity in the random-effects population,” Journal of the American Statistical Association, 91, 217–221. Verdinelli, I., and L. Wasserman (1995): “Computing Bayes factors using a generalization of the Savage-Dickey density ratio,” Journal of the American Statistical Association, 90, 614–618. Zeger, S. L., and M. R. Karim (1991): “Generalized linear models with random effects: A Gibbs sampling approach,” Journal of the American Statistical Association, 86, 79–86.
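Appendix

The Algorithm for the Parametric Model

1. Sampling $\beta$ from $[\beta \mid \{b_i\}, \tau, \{\theta_{it}\}]$:

$$p(\beta \mid \{b_i\}, \tau, \{\theta_{it}\}) \propto |\Sigma_0|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\beta - \mu_0)'\Sigma_0^{-1}(\beta - \mu_0)\right) \times \exp\left(-\frac{\tau}{2}\sum_{i=1}^{n}\sum_{t=1}^{T_i}(\theta_{it} - x_{it}'\beta - w_{it}'b_i)^2\right), \tag{2.14}$$

so that

$$[\beta \mid \{b_i\}, \tau, \{\theta_{it}\}] \sim N_k(\mu_\beta, \Sigma_\beta) \tag{2.15}$$

with

$$\Sigma_\beta = \left(\Sigma_0^{-1} + \tau\sum_{i=1}^{n}\sum_{t=1}^{T_i} x_{it}x_{it}'\right)^{-1} \tag{2.16}$$

and

$$\mu_\beta = \Sigma_\beta\left(\Sigma_0^{-1}\mu_0 + \tau\sum_{i=1}^{n}\sum_{t=1}^{T_i} x_{it}(\theta_{it} - w_{it}'b_i)\right). \tag{2.17}$$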
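The update in (2.15) to (2.17) can be illustrated in the scalar special case $k = 1$, where the matrix inverse reduces to a reciprocal. This is a minimal sketch under that simplifying assumption, not the implementation used for the results; the function name `draw_beta_scalar` and its argument names are hypothetical.

```python
import math
import random


def draw_beta_scalar(theta, x, wb, tau, mu0, s0, rng):
    """One draw of a scalar beta from its full conditional, cf. (2.15)-(2.17).

    theta, x and wb are flat lists over all (i, t) pairs; wb holds w_it' b_i.
    mu0 and s0 are the prior mean and prior variance (scalar Sigma_0).
    """
    # Scalar version of (2.16): posterior precision, then its reciprocal.
    precision = 1.0 / s0 + tau * sum(xi * xi for xi in x)
    sigma_beta = 1.0 / precision
    # Scalar version of (2.17).
    weighted = sum(xi * (th - off) for xi, th, off in zip(x, theta, wb))
    mu_beta = sigma_beta * (mu0 / s0 + tau * weighted)
    draw = rng.gauss(mu_beta, math.sqrt(sigma_beta))
    return draw, mu_beta, sigma_beta


rng = random.Random(1)
draw, mu_beta, sigma_beta = draw_beta_scalar(
    theta=[1.0, 2.0], x=[1.0, 1.0], wb=[0.0, 0.0], tau=1.0, mu0=0.0, s0=1.0, rng=rng)
```

For $k > 1$ the same two lines become a linear solve and a multivariate Normal draw, typically via a Cholesky factor of $\Sigma_\beta$.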
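2. Sampling $\tau$ from $[\tau \mid \{b_i\}, \beta, \{\theta_{it}\}]$:

$$p(\tau \mid \{b_i\}, \beta, \{\theta_{it}\}) \propto \tau^{\frac{\alpha_0}{2} - 1}\exp\left(-\frac{\alpha_0\tau}{2}\right)\tau^{\frac{n}{2}}\exp\left(-\frac{\tau}{2}\sum_{i=1}^{n}\sum_{t=1}^{T_i}\varepsilon_{it}^2\right), \tag{2.18}$$

so that

$$[\tau \mid \{b_i\}, \beta, \{\theta_{it}\}] \sim \text{Gamma}\left(\frac{\alpha_0 + n}{2}, \frac{\alpha_0 + \sum_{i=1}^{n}\sum_{t=1}^{T_i}\varepsilon_{it}^2}{2}\right). \tag{2.19}$$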
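A draw from (2.19) takes one line once the parameterization is fixed. The kernel in (2.18) has the form $\tau^{a-1}\exp(-b\tau)$, so the second argument of the Gamma distribution acts as a rate. The sketch below assumes that the residuals $\varepsilon_{it}$ have already been computed; the function name `draw_tau` is hypothetical. Python's `random.gammavariate` expects a scale, hence the reciprocal.

```python
import random


def draw_tau(residuals, alpha0, n, rng):
    """One draw of tau from the Gamma full conditional (2.19).

    residuals is a flat list of eps_it over all (i, t) pairs.
    """
    shape = (alpha0 + n) / 2.0
    rate = (alpha0 + sum(e * e for e in residuals)) / 2.0
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


rng = random.Random(2)
tau = draw_tau([0.3, -0.5, 0.1], alpha0=2.0, n=3, rng=rng)
```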
3. Sampling $\{\theta_{it}\}$ from $[\theta_{it} \mid \{b_i\}, \beta, \tau]$:

$$p(\theta_{it} \mid \{b_i\}, \beta, \tau) \propto \exp\left(-\exp(\theta_{it}) + y_{it}\theta_{it} - \frac{\tau}{2}(\theta_{it} - x_{it}'\beta - w_{it}'b_i)^2\right). \tag{2.20}$$
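4. Sampling $\{b_i\}$ from $[b_i \mid \beta, \tau, D, \{\theta_{it}\}]$:

$$p(b_i \mid \beta, \tau, D, \{\theta_{it}\}) \propto \exp\left(-\frac{1}{2}b_i'D^{-1}b_i\right) \times \exp\left(-\frac{\tau}{2}\sum_{t=1}^{T_i}(\theta_{it} - x_{it}'\beta - w_{it}'b_i)^2\right), \tag{2.21}$$

so that

$$[b_i \mid \beta, \tau, D, \{\theta_{it}\}] \sim N_p(\mu_b, \Sigma_b) \tag{2.22}$$

with

$$\Sigma_b = \left(\tau\sum_{t=1}^{T_i} w_{it}w_{it}' + D^{-1}\right)^{-1} \tag{2.23}$$

and

$$\mu_b = \Sigma_b\,\tau\sum_{t=1}^{T_i} w_{it}(\theta_{it} - x_{it}'\beta). \tag{2.24}$$

5. Sampling $D^{-1}$ from $[D^{-1} \mid \{b_i\}]$:

$$p(D^{-1} \mid \{b_i\}) \propto |D^{-1}|^{\frac{\nu_0 - p - 1}{2}}\exp\left(-\frac{1}{2}\operatorname{tr}(S_0^{-1}D^{-1})\right) \times |D^{-1}|^{\frac{n}{2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{n} b_i'D^{-1}b_i\right), \tag{2.25}$$

so that

$$[D^{-1} \mid \{b_i\}] \sim \text{Wishart}\left(\nu_0 + n, \left(S_0^{-1} + \sum_{i=1}^{n} b_ib_i'\right)^{-1}\right). \tag{2.26}$$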
The Algorithm for the Semiparametric Model

4’a. Sampling $\{b_i\}$ from $[b_i \mid b_{-i}, G_0, D, \beta, \tau, \{\theta_{it}\}]$: Sample $b_i \mid b_{-i}, G_0, D, \beta, \tau, \{\theta_{it}\}$ from the distribution

$$q_{i0}\,\pi_0(b_i \mid \beta, \tau, D, \{\theta_{it}\}) + \sum_j q_{ij}\,\delta(\kappa_j^{-i}), \tag{2.27}$$

where $\pi_0$ denotes the density of the $p$-variate Normal distribution:

$$\pi_0(b_i \mid \beta, \tau, D, \{\theta_{it}\}) = f_N(\mu_b, \Sigma_b), \tag{2.28}$$

where $\mu_b$ and $\Sigma_b$ are defined above. The weights sum up to 1 and are given by

$$q_{i0} \propto M\,|\Sigma_b|^{\frac{1}{2}}\,|D|^{-\frac{1}{2}}\exp\left(\frac{\tau}{2}(\theta_i - X_i\beta)'U_i(\theta_i - X_i\beta)\right) \tag{2.29}$$

and

$$q_{ij} \propto m_j^{-i}\exp\left(-\frac{\tau}{2}(\theta_i - X_i\beta - W_i\kappa_j^{-i})'(\theta_i - X_i\beta - W_i\kappa_j^{-i})\right), \tag{2.30}$$
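The weights in (2.29) and (2.30) can be evaluated stably on the log scale. The following is a minimal sketch for the special case of a scalar random intercept ($p = 1$, $W_i$ a column of ones), which is an assumption made here to avoid matrix algebra; in that case $r'U_ir = \tau\Sigma_b(\sum_t r_t)^2 - \sum_t r_t^2$ with $r = \theta_i - X_i\beta$. The function name `urn_weights` and its arguments are hypothetical.

```python
import math


def urn_weights(r, tau, D, M, kappas, counts):
    """Normalised weights (q_i0, q_i1, ..., q_iJ) for one unit, scalar case.

    r      : residuals theta_it - x_it' beta for t = 1..T_i
    kappas : distinct values among the other units' random effects
    counts : number of other units sharing each value (m_j^{-i})
    """
    T = len(r)
    sigma_b = 1.0 / (tau * T + 1.0 / D)  # scalar version of (2.23)
    # r' U_i r with U_i = tau * W Sigma_b W' - I and W a column of ones.
    quad = tau * sigma_b * sum(r) ** 2 - sum(v * v for v in r)
    log_q = [math.log(M) + 0.5 * math.log(sigma_b) - 0.5 * math.log(D)
             + 0.5 * tau * quad]  # log of (2.29)
    for kappa, m in zip(kappas, counts):
        ss = sum((v - kappa) ** 2 for v in r)
        log_q.append(math.log(m) - 0.5 * tau * ss)  # log of (2.30)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_q)
    w = [math.exp(v - top) for v in log_q]
    total = sum(w)
    return [v / total for v in w]


q = urn_weights(r=[1.0, 1.1], tau=2.0, D=1.0, M=1.0, kappas=[1.0, 5.0], counts=[3, 3])
```

The first element of `q` is the probability of drawing a fresh value from $\pi_0$; the remaining elements are the probabilities of joining each existing cluster.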
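where $X_i \equiv (x_{i1}, x_{i2}, \ldots, x_{iT_i})'$, $W_i \equiv (w_{i1}, w_{i2}, \ldots, w_{iT_i})'$, $\theta_i \equiv (\theta_{i1}, \theta_{i2}, \ldots, \theta_{iT_i})'$ and $U_i \equiv (\tau W_i\Sigma_bW_i' - I)$.

4’b. Sampling $\{\kappa_j\}$ from $[\kappa_j \mid \beta, \tau, D, \{\theta_{it}\}]$:

$$p(\kappa_j \mid \beta, \tau, D, \{\theta_{it}\}) \propto \exp\left(-\frac{1}{2}\kappa_j'D^{-1}\kappa_j\right) \times \exp\left(-\frac{\tau}{2}\sum_{i \in j}\sum_{t=1}^{T_i}(\theta_{it} - x_{it}'\beta - w_{it}'\kappa_j)^2\right), \tag{2.31}$$

so that

$$[\kappa_j \mid \beta, \tau, D, \{\theta_{it}\}] \sim N_p(\mu_\kappa, \Sigma_\kappa) \tag{2.32}$$

with

$$\Sigma_\kappa = \left(\tau\sum_{i \in j}\sum_{t=1}^{T_i} w_{it}w_{it}' + D^{-1}\right)^{-1} \tag{2.33}$$

and

$$\mu_\kappa = \Sigma_\kappa\,\tau\sum_{i \in j}\sum_{t=1}^{T_i} w_{it}(\theta_{it} - x_{it}'\beta). \tag{2.34}$$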
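5’. Sampling $D^{-1}$ from $[D^{-1} \mid \{\kappa_j\}]$:

$$p(D^{-1} \mid \{\kappa_j\}) \propto |D^{-1}|^{\frac{\nu_0 - p - 1}{2}}\exp\left(-\frac{1}{2}\operatorname{tr}(S_0^{-1}D^{-1})\right) \times |D^{-1}|^{\frac{l}{2}}\exp\left(-\frac{1}{2}\sum_{j=1}^{l}\kappa_j'D^{-1}\kappa_j\right), \tag{2.35}$$

so that

$$[D^{-1} \mid \{\kappa_j\}] \sim \text{Wishart}\left(\nu_0 + l, \left(S_0^{-1} + \sum_{j=1}^{l}\kappa_j\kappa_j'\right)^{-1}\right). \tag{2.36}$$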
Chapter 3 Nonparametric Bayesian Inference for Count Data Treatment Effects (Essay 2)
Count Data Treatment Effects
Abstract This paper focuses on the estimation of causal treatment effects in situations where the variable of interest is a count process. A potential outcomes model is specified in a semiparametric Bayesian fashion using a mixture of Dirichlet processes. The model is then subject to a full Bayesian analysis using Markov chain Monte Carlo simulation methods. The proposed techniques are illustrated with a real data application.
JEL classifications C11, C15, C25
Keywords count data, treatment effect, MDP, MCMC
Acknowledgements The author is grateful to Joseph Terza who provided the data set used in this paper.
3.1 Introduction
This paper is concerned with the problem of causal inference in models where the variable of interest is a count process and responses to treatment vary among observationally identical individuals. It accounts for the fact that unobserved characteristics may determine both the probability of treatment and the response variable. The resulting selection bias is a crucial problem when estimating causal effects in situations with non-random assignment of treatments.

Count outcomes arise in many areas of economics, including labor, health and industrial organization. The effect of a firm’s corporate structure on its innovation behavior in terms of patent applications is just one example in this context. The effect of health insurance on the number of visits to the doctor is another.

Despite the variety of possible applications, techniques for dealing with count data treatment effect models are underdeveloped compared to the continuous data case. Terza (1998) analyzes a count data regression model with an endogenous dummy variable and discusses estimation by full-information maximum likelihood (FIML), two-stage methods (TSM) and weighted non-linear least squares (WNLS).¹ Romeu and Vera-Hernández (2005) extend the basic model by specifying the conditional probability function of the counts in a flexible way using a series expansion approach. Van Ophem (2000) and Zimmer and Trivedi (2006) use copula functions to model selectivity in the count data context. Furthermore, a selection correction for ordered treatment effects is proposed by Lee (2004). Bayesian approaches include Kozumi (2002), who considers Terza’s (1998) model from a Bayesian point of view. Munkin and Trivedi (2003) analyze a Bayesian model with one endogenous treatment variable and two outcome variables including one count.² This paper advances on the work of Terza (1998), Kozumi (2002) and Munkin and Trivedi (2003) by allowing the treatment effect to vary across individuals.

¹ Greene (2001) follows a similar approach.
² See also Chib, Greenberg, and Winkelmann (1998), who analyze correlated count data, but do not consider the case of endogeneity.

Another main contribution of the paper is to formulate the potential outcomes model of Rubin (1974) in a semiparametric Bayesian fashion. This is done using a mixture of Dirichlet processes (MDP) and has not been considered for count data before. Our approach brings about the obvious benefits of structural models. The resulting estimates are economically interpretable and can be used to conduct out-of-sample forecasts. Moreover, by applying Markov chain Monte Carlo (MCMC) simulation methods we are able to estimate treatment effects on the individual level. These are of interest on their own and may be easily summarized to calculate a variety of treatment parameters. Last but not least, the Bayesian approach allows for exact inference in small samples and is not restricted to asymptotic approximations.

The rest of the paper is organized as follows. In Section 2, we present a semiparametric version of a potential outcomes model for count data. Section 3 discusses the estimation of the model via MCMC techniques. In Section 4, we consider an application of the proposed methods. Section 5 concludes and provides an outlook on future research.
3.2 The Model
Following the treatment effect literature (see for example, Neyman (1923), Fisher (1935), Roy (1951), Cox (1958) and Rubin (1974)), two potential outcomes $(y_{0i}, y_{1i})$ are defined for each individual $i$. $y_{0i}$ denotes the potential outcome in the non-treatment state, whereas $y_{1i}$ describes the potential outcome in the treatment state. With $d_i = 1$ denoting treatment, the observed outcome $y_i$ is then given by

$$y_i = (1 - d_i)\,y_{0i} + d_i\,y_{1i}. \tag{3.1}$$

Furthermore, we assume that conditional on an unobserved random component (denoted by $\eta_{0i}$ and $\eta_{1i}$, respectively) the two potential outcomes follow a Poisson distribution. That is:

$$P(y_{0i} = k \mid \eta_{0i}) = \frac{\exp[-\exp(\eta_{0i})]\exp(\eta_{0i})^k}{k!}, \tag{3.2}$$

and

$$P(y_{1i} = k \mid \eta_{1i}) = \frac{\exp[-\exp(\eta_{1i})]\exp(\eta_{1i})^k}{k!}. \tag{3.3}$$

The individual decision rule is then specified by

$$d_i = \begin{cases} 1, & \text{if } d_i^* \geq 0, \\ 0, & \text{otherwise,} \end{cases} \tag{3.4}$$

where $d_i^*$ is a latent variable determining the treatment status. To allow for unobserved selection into treatment, we assume that, conditional on a positive random variable $\lambda_i$, the unobserved random variables $\eta_{0i}$, $\eta_{1i}$ and $d_i^*$ follow a trivariate normal distribution:

$$\begin{pmatrix} \eta_{0i} \\ \eta_{1i} \\ d_i^* \end{pmatrix} \sim N\left(\begin{pmatrix} \alpha_0 + x_i'\beta_0 \\ \alpha_1 + x_i'\beta_1 \\ z_i'\beta_d \end{pmatrix},\ \lambda_i^{-1}\begin{pmatrix} \sigma_0^2 & 0 & \sigma_{0d} \\ 0 & \sigma_1^2 & \sigma_{1d} \\ \sigma_{0d} & \sigma_{1d} & 1 \end{pmatrix}\right), \tag{3.5}$$
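To make the data-generating process in (3.1) to (3.5) concrete, the following sketch simulates one individual for a given scale $\lambda_i$ and given mean vector. It is an illustration only: the helper names (`cholesky`, `poisson_draw`, `simulate_individual`) and all numerical values are hypothetical, and the Poisson draws use Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def poisson_draw(mean, rng):
    """Poisson draw by Knuth's multiplication method (small means only)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_individual(mean, sigma, lam, rng):
    """Simulate (d, y0, y1, y) from (3.1)-(3.5) for one individual.

    mean  = (alpha0 + x'beta0, alpha1 + x'beta1, z'beta_d)
    sigma = (sigma0^2, sigma1^2, sigma0d, sigma1d)
    """
    s0sq, s1sq, s0d, s1d = sigma
    cov = [[s0sq / lam, 0.0, s0d / lam],
           [0.0, s1sq / lam, s1d / lam],
           [s0d / lam, s1d / lam, 1.0 / lam]]
    L = cholesky(cov)
    e = [rng.gauss(0.0, 1.0) for _ in range(3)]
    eta0, eta1, dstar = (mean[i] + sum(L[i][k] * e[k] for k in range(3))
                         for i in range(3))
    d = 1 if dstar >= 0 else 0                 # decision rule (3.4)
    y0 = poisson_draw(math.exp(eta0), rng)     # (3.2)
    y1 = poisson_draw(math.exp(eta1), rng)     # (3.3)
    y = (1 - d) * y0 + d * y1                  # observed outcome (3.1)
    return d, y0, y1, y


rng = random.Random(3)
d, y0, y1, y = simulate_individual((0.2, 0.5, 0.1), (0.4, 0.4, 0.1, -0.1), 1.5, rng)
```

Only $y$ and $d$ would be observed in practice; the counterfactual outcome is what the sampler later imputes as a latent variable.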
where $x_i$ is a $k \times 1$ vector of observed covariates not including a constant and $z_i$ denotes an $l \times 1$ vector of observed random variables including a constant. $\alpha_0$, $\alpha_1$, $\beta_0$, $\beta_1$ and $\beta_d$ are the corresponding parameters and $\sigma \equiv (\sigma_0^2, \sigma_1^2, \sigma_{0d}, \sigma_{1d})$ consists of the elements of the variance-covariance matrix.³ Additionally, it is assumed that $z_i$ includes at least one component which is not in $x_i$.

Furthermore, the model assumes that the scale parameters $\{\lambda_i\}$ are independent and identically drawn from a discrete random distribution $G$ for which a Dirichlet process prior (Ferguson (1973), Antoniak (1974)) is specified. The parameters of the Dirichlet process are the baseline distribution $G_0$, which centers the process, and a positive precision parameter $\nu$, which, loosely speaking, measures the ‘strength of belief’ in $G_0$. Choosing a Gamma distribution as the baseline measure $G_0$, we thus get:

$$\lambda_i \mid G \sim G, \tag{3.6}$$

$$G \sim D(\nu G_0), \tag{3.7}$$

$$G_0 = \text{Gamma}\left(\frac{\gamma}{2}, \frac{\gamma}{2}\right). \tag{3.8}$$

³ Setting $\sigma_{01}$ to zero is a standard assumption in the literature. Vijverberg (1993) and Koop and Poirier (1997) explain how to relax this assumption.
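The clustering implied by (3.6) to (3.8) is easiest to see by drawing the $\lambda_i$ sequentially with $G$ integrated out, that is, through the Polya urn representation of Blackwell and MacQueen (1973). The sketch below draws from the prior only. It assumes that the second argument of the Gamma distribution in (3.8) is a rate, so that $G_0$ has mean one; this parameterization is an assumption, as is the function name `polya_urn_lambdas`.

```python
import random


def polya_urn_lambdas(n, nu, gamma, rng):
    """Prior draw of (lambda_1, ..., lambda_n) from a Dirichlet process with
    precision nu and baseline Gamma(gamma/2, gamma/2), with G integrated out.
    """
    lambdas = []
    for i in range(n):
        # With probability nu / (nu + i) draw a new value from G0,
        # otherwise copy one of the i earlier draws chosen uniformly.
        if rng.random() < nu / (nu + i):
            # gammavariate takes (shape, scale); rate gamma/2 gives scale 2/gamma.
            lambdas.append(rng.gammavariate(gamma / 2.0, 2.0 / gamma))
        else:
            lambdas.append(rng.choice(lambdas))
    return lambdas


rng = random.Random(4)
draws = polya_urn_lambdas(n=577, nu=20.0, gamma=20.0, rng=rng)
n_clusters = len(set(draws))
```

Small values of $\nu$ produce few distinct values and large values of $\nu$ produce draws that are close to an i.i.d. sample from $G_0$, which is the sense in which $\nu$ measures the strength of belief in the baseline.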
This last stage in the hierarchy explains why this type of model is called a mixture of Dirichlet processes (MDP) model in the literature.⁴ MDP models retain the advantages of parametric models while relaxing distributional assumptions, and they remain quite easy to use. The idea of choosing a Gamma distribution as the baseline distribution for the scale parameter $\lambda_i$ is due to Chib and Hamilton (2002), who use this setup to analyze a continuous outcomes treatment effect model for longitudinal data. Their reasoning, that the chosen mixture distribution is appealing and leads to a general and flexible joint distribution of the treatment and the potential outcomes, holds for the model presented in this paper as well.

The model is completed with independent priors $\pi(\alpha_0)$, $\pi(\alpha_1)$, $\pi(\beta_0)$, $\pi(\beta_1)$, $\pi(\beta_d)$, $\pi(\sigma)$ and $\pi(\nu)$ for $\alpha_0$, $\alpha_1$, $\beta_0$, $\beta_1$, $\beta_d$, $\sigma$, and for the precision parameter $\nu$ of the Dirichlet process prior. In particular, we use Normal priors for $\alpha_0$ and $\alpha_1$, multivariate Normal priors for $\beta_0$, $\beta_1$ and $\beta_d$ and a truncated multivariate Normal prior for $\sigma$,

$$\pi(\alpha_0) = N(\alpha_0 \mid \bar{a}_0, \bar{A}_0), \tag{3.9}$$

$$\pi(\alpha_1) = N(\alpha_1 \mid \bar{a}_1, \bar{A}_1), \tag{3.10}$$

$$\pi(\beta_0) = N_k(\beta_0 \mid \bar{b}_0, \bar{B}_0), \tag{3.11}$$

$$\pi(\beta_1) = N_k(\beta_1 \mid \bar{b}_1, \bar{B}_1), \tag{3.12}$$

$$\pi(\beta_d) = N_l(\beta_d \mid \bar{b}_d, \bar{B}_d), \tag{3.13}$$

$$\pi(\sigma) = N_4(\sigma \mid \bar{m}, \bar{M})\,1_{PD}(\sigma), \tag{3.14}$$

where $1_{PD}(\sigma)$ is the indicator function taking the value one if $\sigma$ leads to a positive definite variance-covariance matrix and the value zero otherwise. Following Escobar and West (1995), we specify the prior on $\nu$ to be a Gamma distribution with shape parameter $\bar{c}$ and scale parameter $\bar{d}$,⁵

$$\pi(\nu) = \text{Ga}(\nu \mid \bar{c}, \bar{d}). \tag{3.15}$$

⁴ See MacEachern and Müller (1998) for example.

3.3 Bayesian MCMC Sampling
Due to the complexity of the proposed model, we are not able to analyze the posterior distribution by analytical means. Therefore we apply Markov chain Monte Carlo (MCMC) techniques. That is, we base our inference on a large sample of draws from the posterior distribution, where the sample is generated by designing a Markov chain with a transition kernel having an invariant measure equal to the posterior distribution.⁶ Following Tanner and Wong (1987), we augment the parameter space by the latent variables $(\eta_i^o, \eta_i^u, d_i^*, y_i^*)$, where $\eta_i^o$ denotes the element of $(\eta_{0i}, \eta_{1i})$ that corresponds to the observed potential outcome. That is, if $d_i = 0$, then $\eta_i^o \equiv \eta_{0i}$, and if $d_i = 1$, then $\eta_i^o \equiv \eta_{1i}$. Let $\eta_i^u$ be defined conversely. $y_i^*$ denotes the unobserved potential outcome, that is $y_i^* \equiv d_i y_{0i} + (1 - d_i)y_{1i}$.

Since $G_0$ is chosen to be conjugate, we can construct a Gibbs sampler by resorting to the Blackwell and MacQueen (1973) representation of the Dirichlet process and marginalizing over the unknown measure $G$ in the calculations. This idea was first proposed by Escobar (1994) and Escobar and West (1995). Due to the discrete nature of $G$, some of the $\lambda_i$ can share the same value, and we define $\kappa = (\kappa_1, \ldots, \kappa_l)$, $l \leq n$, as the set of distinct values of the $\lambda_i$. Further, let the vector $s = (s_1, \ldots, s_n)$ indicate in which cluster each of the $\lambda_i$ lies. The sampling scheme for our model can be summarized by the following steps:⁷

0. Choose starting values for $(\alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \nu, \lambda, \eta^o, d^*, \eta^u, y^*)$.

1. Sample $(\alpha_0, \alpha_1, \sigma, d^*)$ from $(\alpha_0, \alpha_1, \sigma, d^* \mid \beta_0, \beta_1, \beta_d, \lambda, \eta^o)$ by
   (a) sampling $(\alpha_0, \alpha_1, \sigma)$ from $(\alpha_0, \alpha_1, \sigma \mid \beta_0, \beta_1, \beta_d, \lambda, \eta^o)$ using the Metropolis-Hastings algorithm,
   (b) sampling $d_i^*$ from $(d_i^* \mid \alpha_0, \alpha_1, \sigma, \beta_0, \beta_1, \beta_d, \lambda_i, \eta_i^o)$, which is a truncated Normal distribution, independently for $i = 1, \ldots, n$.

2. Sample $\eta_i^o$ from $(\eta_i^o \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, d_i^*)$ independently for $i = 1, \ldots, n$, using the adaptive rejection sampling algorithm of Gilks and Wild (1992).

3. Sample $\beta_0$ from $(\beta_0 \mid \alpha_0, \beta_d, \sigma, \lambda, \eta^o, d^*)$, which is a Normal distribution.

4. Sample $\beta_1$ from $(\beta_1 \mid \alpha_1, \beta_d, \sigma, \lambda, \eta^o, d^*)$, which is a Normal distribution.

5. Sample $\beta_d$ from $(\beta_d \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \sigma, \lambda, \eta^o, d^*)$, which is a Normal distribution.

6. Sample $\lambda_i$ from $(\lambda_i \mid \lambda_{-i}, \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \nu, \eta^o, d^*)$, independently for $i = 1, \ldots, n$, using the above-mentioned Polya urn scheme representation of the Dirichlet process.

7. Sample $\kappa_j$ from $(\kappa_j \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \eta^o, d^*)$, which is a Gamma distribution, independently for $j = 1, \ldots, l$.

8. Sample $\nu$ from $(\nu \mid \lambda)$, using the data augmentation idea of Escobar and West (1995).

9. Sample $(\eta_i^u, y_i^*)$ from $(\eta_i^u, y_i^* \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_i^o, d_i^*)$ independently for $i = 1, \ldots, n$ by
   (a) sampling $\eta_i^u$ from $(\eta_i^u \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_i^o, d_i^*)$, which is a Normal distribution, independently for $i = 1, \ldots, n$,
   (b) sampling $y_i^*$ from $(y_i^* \mid \eta_i^u)$, which is a Poisson distribution, independently for $i = 1, \ldots, n$.

10. Repeat steps 1-9 using the updated values of the conditioning variables.

⁵ For other approaches to eliciting a prior on the precision parameter $\nu$ see, for example, Walker and Mallick (1997) or Carota and Parmigiani (2002).
⁶ See Chen, Shao, and Ibrahim (2000) or Robert and Casella (1999) for comprehensive surveys on MCMC methods.
⁷ Note that the conditioning sets of all distributions explicitly recognize only those parameters that are relevant.
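Step 8 is the only step whose mechanics are not spelled out by a standard distribution name, so a short sketch may help. It follows the auxiliary-variable update of Escobar and West (1995) for a Gamma prior on the precision parameter: draw $\eta \sim \text{Beta}(\nu + 1, n)$, then draw $\nu$ from a two-component Gamma mixture. The sketch treats the second prior parameter as a rate, as in Escobar and West; with $\bar{d} = 1$ rate and scale coincide. The function name `draw_nu` and the argument names are hypothetical.

```python
import math
import random


def draw_nu(nu, n_clusters, n, shape, rate, rng):
    """One update of the Dirichlet process precision nu given the current
    number of distinct values (n_clusters) among n observations.
    """
    eta = rng.betavariate(nu + 1.0, n)        # auxiliary variable
    b = rate - math.log(eta)                  # updated rate, always > rate
    odds = (shape + n_clusters - 1.0) / (n * b)
    pi = odds / (1.0 + odds)                  # mixture weight
    if rng.random() < pi:
        k = shape + n_clusters
    else:
        k = shape + n_clusters - 1.0
    return rng.gammavariate(k, 1.0 / b)       # gammavariate takes (shape, scale)


rng = random.Random(5)
nu = 20.0
for _ in range(10):
    nu = draw_nu(nu, n_clusters=40, n=577, shape=20.0, rate=1.0, rng=rng)
```

The update depends on the data only through the number of distinct $\lambda_i$ values and the sample size, which is why the conditioning set in step 8 is simply $\lambda$.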
We would like to point out that the Bayesian approach and its application via MCMC methods have several advantages. First, by simulating the latent variables we are able to compute posterior distributions for the individual treatment effects ∆i (see Chib and Hamilton (2000)). These can then be summarized to calculate mean treatment parameters. Second, we can avoid numerical integration methods in evaluating our model. Third, the Bayesian approach allows for full and exact inference without being limited to asymptotic approximations in the semiparametric framework presented here.
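How individual effects are summarized into mean treatment parameters can be sketched for a single MCMC iteration. The sketch assumes that the individual effect is the difference of the two potential outcomes, $\Delta_i = y_{1i} - y_{0i}$, with the unobserved one filled in by its simulated value; this definition and the function name `mean_effects` are assumptions made for illustration.

```python
def mean_effects(y1, y0, d):
    """Average treatment effect and effect of treatment on the treated
    from one set of completed potential outcomes.
    """
    delta = [a - b for a, b in zip(y1, y0)]          # individual effects
    ate = sum(delta) / len(delta)
    treated = [v for v, di in zip(delta, d) if di == 1]
    tt = sum(treated) / len(treated)                  # requires at least one treated unit
    return ate, tt


ate, tt = mean_effects(y1=[3, 5, 2], y0=[1, 1, 2], d=[1, 0, 1])
```

Repeating this at every iteration yields posterior draws of both parameters, from which posterior means and standard deviations follow directly.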
3.4 Empirical Illustration: Number of Trips by Households
We illustrate the proposed framework using the data from the article “Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects” by Terza (1998). These data have effectively become a benchmark for methods analyzing endogeneity in count data, given that Kozumi (2002) and Romeu and Vera-Hernández (2005) also use them to exemplify their approaches. The data set contains information about 577 households and is described in more detail by Terza and Wilson (1990). The dependent count variable TOTTRIPS is the number of trips taken by members of the respective household in the 24-hour period prior to the interview. The binary treatment variable OWNVEH indicates whether the household owns at least one motorized vehicle (OWNVEH=1) or not (OWNVEH=0). The covariates include demographic and economic characteristics of the household. Terza argues that unobserved factors which relate to household members’ tastes for public transportation may be correlated with vehicle ownership and thus may lead to endogeneity. We follow Terza’s reasoning and use ADULTS, which denotes the number of adults in the household 16 years of age or older, as an instrumental variable. Table 3.1 describes the variables used in the analysis. Note that we transformed some variables into dummies, which reduces the influence of outliers present in the data. Summary statistics for the entire sample and for the groups of owners and non-owners are given in Table 3.2. The means between
the groups are significantly different. For example, vehicle owners live in larger households, in smaller areas, farther from the nearest transit node and have a higher income. In addition, households with a full-time worker are more likely to own a motorized vehicle.

Variable    Definition
TOTTRIPS    total number of trips
OWNVEH      = 1 if household owns at least one motorized vehicle
WORKSCHL    % of total trips for work or school vs. personal business or pleasure
HHMEM       number of individuals in household
DISTOCBD    = 1 if distance to the central business district > 5 km
AREASIZE    = 1 if SMSA (standard metropolitan statistical area) ≥ 2.5 million population
FULLTIME    = 1 if there are full-time workers in household
ADULTS      number of adults in household
DISTONOD    = 1 if distance from home to nearest transit node > 10 blocks
REALINC     household income divided by median income of census tract in which household resides
WEEKEND     = 1 if 24 hr. survey period is either Saturday or Sunday

Table 3.1: Variable definitions

Variable        Entire Sample     OWNVEH=0         OWNVEH=1
TOTTRIPS        4.551 (4.935)     0.828 (1.269)    5.212 (5.050)
OWNVEH          0.849 (0.358)
WORKSCHL        0.262 (0.328)     0.109 (0.301)    0.289 (0.325)
HHMEM           2.929 (1.613)     1.931 (1.669)    3.106 (1.538)
DISTOCBD        0.594 (0.491)     0.425 (0.498)    0.624 (0.485)
AREASIZE        0.376 (0.485)     0.506 (0.503)    0.353 (0.478)
FULLTIME        0.676 (0.468)     0.172 (0.380)    0.765 (0.424)
ADULTS          0.080 (0.898)     1.437 (0.831)    2.194 (0.861)
DISTONOD        0.272 (0.445)     0.092 (0.290)    0.304 (0.460)
REALINC         2.413 (2.759)     1.005 (1.469)    2.662 (2.859)
WEEKEND         0.224 (0.417)     0.195 (0.399)    0.229 (0.420)
Observations    577               87               490

Table 3.2: Summary statistics

Figure 3.1 shows a histogram of the distribution of the number of trips for the entire sample. In contrast, Figure 3.2 presents histograms for the two groups separately. Again, one can recognize large differences. Nearly 60% of the non-owners, but only approximately 10% of the owners, did not make a trip during the 24 hours prior to the interview. The maximum number of trips for the group of non-owners is 5, whereas more than 15% of the owners report 10 or more trips.

In order to see whether these differences can be interpreted in a causal manner, we apply the developed framework to the data. We fit the model using relatively flat priors. In particular, we set all the elements of the prior means $\bar{a}_0$, $\bar{a}_1$, $\bar{b}_0$, $\bar{b}_1$ and $\bar{b}_d$ equal to zero and chose $\bar{A}_0 = \bar{A}_1 = 10$, $\bar{B}_0 = \bar{B}_1 = 10I_k$ and $\bar{B}_d = 10I_l$, with $I_m$ denoting the $m \times m$ identity matrix. As for the prior for the covariance matrix, we set $\bar{m} = (0.1, 0, 0.1, 0)'$ and $\bar{M} = \text{diag}(0.1, 0.1, 0.1, 0.1)$. Finally, $\bar{c} = 20$ and $\bar{d} = 1$ specify the prior for the precision parameter, and $\gamma$ is set equal to 20.

Table 3.3 reports the posterior estimates for the treatment equation. Again, one recognizes the positive impact of the instrument ADULTS on the probability of owning a vehicle. The posterior estimates for the potential outcomes equations are given in Table 3.4. Once more, the two groups differ systematically. How these differences translate into causal treatment effects is shown in Table 3.5, which gives posterior estimates of various treatment effect parameters. The posterior mean of the average treatment effect (ATE) is 1.731, the posterior mean of the effect of the treatment on the treated (TT) is 1.455. However, posterior standard deviations are very high in both cases.
[Figure 3.1: Histogram of TOTTRIPS for entire sample]

Variable    Posterior Mean    Posterior Std. Dev.
C           −1.229            0.312
WORKSCHL     0.936            0.327
HHMEM        0.034            0.457
DISTOCBD     0.037            0.015
AREASIZE    −0.220            0.178
FULLTIME    −0.059            3.121
DISTONOD     0.008            0.007
REALINC      0.380            0.087
WEEKEND      0.044            0.205
ADULTS       0.772            0.154

Table 3.3: Posterior Estimates Treatment Equation
In addition, quantile treatment effects (QTE) are calculated at different quantiles. The quantile treatment effect for quantile τ is defined by Qτ (y1 )−Qτ (y0 ), where Qτ (yj ) denotes the τ th quantile of yj , j = 0, 1. Whereas the quantile treatment effect at quantile 0.1 is nearly zero, it increases up to 3.856 at the 0.9 quantile.
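The definition $Q_\tau(y_1) - Q_\tau(y_0)$ translates directly into code once a quantile rule is fixed. The sketch below uses the empirical quantile defined as the smallest value whose empirical distribution function is at least $\tau$, which is a natural choice for count data; the text does not state which rule was used, so this choice and the function names are assumptions.

```python
import math


def quantile(values, tau):
    """Empirical tau-quantile: smallest value with empirical CDF >= tau."""
    ordered = sorted(values)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]


def qte(y1, y0, tau):
    """Quantile treatment effect Q_tau(y1) - Q_tau(y0)."""
    return quantile(y1, tau) - quantile(y0, tau)


y1 = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
y0 = [0, 0, 0, 1, 1, 1, 2, 2, 3, 5]
effect_at_median = qte(y1, y0, 0.5)
```

Because both quantiles are integers, a quantile treatment effect computed within one iteration is itself an integer; fractional posterior means arise only from averaging over iterations.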
Variable    Posterior Mean    Posterior Std. Dev.

Potential Outcome y0
C           −1.281            0.416
WORKSCHL     1.005            0.526
HHMEM        0.189            0.109
DISTOCBD    −0.294            0.351
AREASIZE     0.371            0.354
FULLTIME     1.033            0.431
DISTONOD     0.653            0.548
REALINC     −0.180            0.171
WEEKEND      0.476            0.416

Potential Outcome y1
C            0.271            0.149
WORKSCHL    −0.451            0.136
HHMEM        0.182            0.026
DISTOCBD    −0.004            0.079
AREASIZE    −0.015            0.086
FULLTIME     0.813            0.113
DISTONOD     0.188            0.089
REALINC      0.001            0.014
WEEKEND     −0.141            0.095

σ0²          0.444            0.177
σ0d          0.118            0.210
σ1²          0.372            0.058
σ1d         −0.071            0.177

ν           20.091            4.447

Table 3.4: Posterior Estimates Outcomes Equations
[Figure 3.2: Histogram of TOTTRIPS by OWNVEH (black=0, grey=1)]
3.5 Conclusion
This paper has proposed a semiparametric Bayesian framework for estimating causal treatment effects in situations where the variable of interest is a count process. A general and flexible joint distribution for the potential outcomes and the treatment is specified using Dirichlet process mixtures. An exact Bayesian analysis is then carried out by Markov chain Monte Carlo simulation methods.
Effect       Posterior Mean    Posterior Std. Dev.
ATE          1.731             4.569
TT           1.455             5.374
QTE (0.1)    0.004             0.060
QTE (0.3)    1.722             0.465
QTE (0.5)    2.284             1.023
QTE (0.7)    3.262             1.856
QTE (0.9)    3.856             5.712

Table 3.5: Posterior Estimates Treatment Effects

One major advantage of the Bayesian approach is the possibility to estimate treatment effects on the individual level. These are of interest in their own right and may be summarized to calculate the treatment effects discussed in the literature.
Although this paper focuses on the analysis of cross-sectional data, the presented approach can be extended in many directions, including the cases of panel data and multiple treatments. Another goal of future research is to apply recent techniques for Bayesian model comparison to the proposed framework.
Bibliography Antoniak, C. E. (1974): “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems,” The Annals of Statistics, 2, 1152– 1174. Blackwell, D., and J. MacQueen (1973): “Ferguson distributions via Polya urn schemes,” The Annals of Statistics, 1, 353–355. Carota, C., and G. Parmigiani (2002): “Semiparametric regression for count data,” Biometrika, 89, 265–281. Chen, M. H., Q. M. Shao, and J. G. Ibrahim (2000): Monte Carlo methods in Bayesian computation. Springer, New York.
Chib, S., E. Greenberg, and R. Winkelmann (1998): “Posterior simulation and Bayes factors in panel count data models,” Journal of Econometrics, 86, 33–54. Chib, S., and B. H. Hamilton (2000): “Bayesian analysis of cross-section and clustered data treatment models,” Journal of Econometrics, 97, 25 – 50. (2002): “Semiparametric Bayes analysis of longitudinal data treatment models,” Journal of Econometrics, 110, 67–89. Cox, D. R. (1958): The planning of experiments. Wiley, New York. Escobar, M. D. (1994): “Estimating normal means with a Dirichlet process prior,” Journal of the American Statistical Association, 89, 268–277. Escobar, M. D., and M. West (1995): “Bayesian density estimation and inference using mixtures,” Journal of the American Statistical Association, 90, 577–588. Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230. Fisher, R. A. (1935): Design of experiments. Oliver and Boyd. Gilks, W. R., and P. Wild (1992): “Adaptive rejection sampling for Gibbs sampling,” Applied Statistics, 41, 337–348. Greene, W. H. (2001): “FIML estimation of sample selection models for count data,” in Economic Theory, Dynamics and Markets, Essays in Honor of Ryuzo Sato, ed. by T. Negishi, R. V. Ramachandran, and K. Mino. Kluwer, Boston. Koop, G., and D. J. Poirier (1997): “Learning about the across-regime correlation in switching regression models,” Journal of Econometrics, 78, 217–227. Kozumi, H. (2002): “A Bayesian analysis of endogenous switching models for count data,” Journal of the Japan Statistical Society, 32, 141–154.
Lee, M.-J. (2004): “Selection correction and sensitivity analysis for ordered treatment effect on count response,” Journal of Applied Econometrics, 19, 323–337. MacEachern, S. N., and P. Müller (1998): “Estimating mixtures of Dirichlet process models,” Journal of Computational and Graphical Statistics, 7, 223–238. Munkin, M. K., and P. K. Trivedi (2003): “Bayesian analysis of a self-selection model with multiple outcomes using simulation-based estimation: an application to the demand for healthcare,” Journal of Econometrics, 114, 197–220. Neyman, J. (1923): “Statistical problems in agricultural experiments,” Journal of the Royal Statistical Society, 2, 107–180. Robert, C. P., and G. Casella (1999): Monte Carlo statistical methods. Springer, New York. Romeu, A., and M. Vera-Hernández (2005): “Counts with an endogenous binary regressor: A series expansion approach,” Econometrics Journal, 8, 1–22. Roy, A. (1951): “Some thoughts on the distribution of earnings,” Oxford Economic Papers, 3, 135–146. Rubin, D. B. (1974): “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, 66, 688–701. Tanner, M. A., and W. Wong (1987): “The calculation of posterior distributions by data augmentation (with discussion),” Journal of the American Statistical Association, 82, 528–550. Terza, J. V. (1998): “Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects,” Journal of Econometrics, 84, 129–154.
Terza, J. V., and P. W. Wilson (1990): “Analyzing frequencies of several types of events: A mixed multinomial-poisson approach,” The Review of Economics and Statistics, 72, 108–15. Van Ophem, H. (2000): “Modeling selectivity in count-data models,” Journal of Business and Economic Statistics, 18, 503–511. Vijverberg, W. P. M. (1993): “Measuring the unidentified parameter of the extended Roy model of selectivity,” Journal of Econometrics, 57, 69–89. Walker, S. G., and B. K. Mallick (1997): “A note on the scale parameter of the Dirichlet process,” Canadian Journal of Statistics, 25, 473–479. Zimmer, D. M., and P. K. Trivedi (2006): “Using trivariate copulas to model sample selection and treatment effects: Application to family health care demand,” Journal of Business and Economic Statistics, 24, 63–76.
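Appendix

This appendix describes the algorithm proposed in Section 3 in more detail.

1a. Sampling $(\alpha_0, \alpha_1, \sigma)$ from $(\alpha_0, \alpha_1, \sigma \mid \beta_0, \beta_1, \beta_d, \lambda, \eta^o)$:

$$\begin{aligned}
p(\alpha_0, \alpha_1, \sigma \mid \beta_0, \beta_1, \beta_d, \lambda, \eta^o)
&= \prod_{i=1}^{n_0}\left[p(d_i = 0 \mid \eta_{0i}, \alpha_0, \beta_0, \beta_d, \sigma, \lambda_i)\, f_N(\eta_{0i} \mid \alpha_0, \beta_0, \sigma, \lambda_i)\right] \\
&\quad \times \prod_{i=n_0+1}^{n}\left[p(d_i = 1 \mid \eta_{1i}, \alpha_1, \beta_1, \beta_d, \sigma, \lambda_i)\, f_N(\eta_{1i} \mid \alpha_1, \beta_1, \sigma, \lambda_i)\right] \\
&\quad \times \pi(\sigma) \times \pi(\alpha_0) \times \pi(\alpha_1) \\
&\propto \prod_{i=1}^{n_0}\left\{\Phi\left[\frac{-z_i'\beta_d - \frac{\sigma_{0d}}{\sigma_0^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0)}{\sqrt{\lambda_i^{-1}(1 - \sigma_{0d}^2/\sigma_0^2)}}\right] \times (\sigma_0^2)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\frac{\lambda_i(\eta_{0i} - \alpha_0 - x_i'\beta_0)^2}{\sigma_0^2}\right)\right\} \\
&\quad \times \prod_{i=n_0+1}^{n}\left\{\Phi\left[\frac{z_i'\beta_d + \frac{\sigma_{1d}}{\sigma_1^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1)}{\sqrt{\lambda_i^{-1}(1 - \sigma_{1d}^2/\sigma_1^2)}}\right] \times (\sigma_1^2)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\frac{\lambda_i(\eta_{1i} - \alpha_1 - x_i'\beta_1)^2}{\sigma_1^2}\right)\right\} \\
&\quad \times \exp\left\{-\frac{1}{2}\left[(\sigma - \bar{m})'\bar{M}^{-1}(\sigma - \bar{m}) + \frac{(\alpha_0 - \bar{a}_0)^2}{\bar{A}_0} + \frac{(\alpha_1 - \bar{a}_1)^2}{\bar{A}_1}\right]\right\} \times 1_{PD}(\sigma)
\end{aligned} \tag{3.16}$$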
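We sample $(\alpha_0, \alpha_1, \sigma)$ using the Metropolis-Hastings algorithm by first drawing a proposal value $\tilde{\xi} = (\tilde{\alpha}_0, \tilde{\alpha}_1, \tilde{\sigma})$ from a multivariate Student’s t-density that is tailored to the target density and then accepting it according to the Metropolis-Hastings acceptance rule.

1b. Sampling $d_i^*$ from $(d_i^* \mid \alpha_0, \alpha_1, \sigma, \beta_0, \beta_1, \beta_d, \lambda_i, \eta_i^o)$ independently for $i = 1, \ldots, n$:

For $d_i = 0$,

$$\begin{aligned}
p(d_i^* \mid \alpha_0, \alpha_1, \sigma, \beta_0, \beta_1, \beta_d, \lambda_i, \eta_i^o) &= p(d_i^* \mid \alpha_0, \beta_0, \beta_d, \sigma, \lambda_i, \eta_{0i}) \\
&= f_N\left(d_i^* \,\Big|\, z_i'\beta_d + \frac{\sigma_{0d}}{\sigma_0^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0),\ \frac{1}{\lambda_i}\left(1 - \frac{\sigma_{0d}^2}{\sigma_0^2}\right)\right) 1(d_i^* < 0),
\end{aligned} \tag{3.17}$$

so that

$$d_i^* \sim TN^{-}\left(z_i'\beta_d + \frac{\sigma_{0d}}{\sigma_0^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0),\ \frac{1}{\lambda_i}\left(1 - \frac{\sigma_{0d}^2}{\sigma_0^2}\right),\ 0\right), \tag{3.18}$$

where $TN^{-}(\mu, \sigma^2, a)$ denotes a Normal distribution with mean $\mu$ and variance $\sigma^2$ that is truncated at the right at $a$.

For $d_i = 1$,

$$\begin{aligned}
p(d_i^* \mid \alpha_0, \alpha_1, \sigma, \beta_0, \beta_1, \beta_d, \lambda_i, \eta_i^o) &= p(d_i^* \mid \alpha_1, \beta_1, \beta_d, \sigma, \lambda_i, \eta_{1i}) \\
&= f_N\left(d_i^* \,\Big|\, z_i'\beta_d + \frac{\sigma_{1d}}{\sigma_1^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1),\ \frac{1}{\lambda_i}\left(1 - \frac{\sigma_{1d}^2}{\sigma_1^2}\right)\right) 1(d_i^* > 0),
\end{aligned} \tag{3.19}$$

so that

$$d_i^* \sim TN^{+}\left(z_i'\beta_d + \frac{\sigma_{1d}}{\sigma_1^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1),\ \frac{1}{\lambda_i}\left(1 - \frac{\sigma_{1d}^2}{\sigma_1^2}\right),\ 0\right), \tag{3.20}$$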
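Draws from the one-sided truncated Normal distributions in (3.18) and (3.20) can be obtained by inverting the distribution function. The text does not say which method was used, so the inverse-CDF approach below is one possible choice rather than the author's; the function name `draw_dstar` is hypothetical, and `mean` and `var` stand for the conditional mean and variance given in (3.17) or (3.19).

```python
import math
import random
from statistics import NormalDist


def draw_dstar(d, mean, var, rng):
    """Draw d* from N(mean, var) truncated to (-inf, 0) if d == 0
    and to (0, inf) if d == 1, by the inverse-CDF method.
    """
    dist = NormalDist(mean, math.sqrt(var))
    p0 = dist.cdf(0.0)                     # probability mass below zero
    u = rng.random()
    p = u * p0 if d == 0 else p0 + u * (1.0 - p0)
    # Crude guard so that inv_cdf is never called at exactly 0 or 1.
    # This is not reliable when the truncated region has negligible mass.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(p)


rng = random.Random(6)
below = draw_dstar(0, mean=0.3, var=0.5, rng=rng)
above = draw_dstar(1, mean=0.3, var=0.5, rng=rng)
```

When the conditional mean lies far on the wrong side of zero, a dedicated tail sampler would be more accurate than this sketch.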
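where $TN^{+}(\mu, \sigma^2, a)$ denotes a Normal distribution with mean $\mu$ and variance $\sigma^2$ that is truncated at the left at $a$.

2. Sampling $\eta_i^o$ from $(\eta_i^o \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, d_i^*)$ independently for $i = 1, \ldots, n$:

For $d_i = 0$,

$$\begin{aligned}
p(\eta_i^o \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, d_i^*) &= p(\eta_{0i} \mid \alpha_0, \beta_0, \beta_d, \sigma, \lambda_i, d_i^*) \\
&\propto \exp[-\exp(\eta_{0i})]\exp(y_i\eta_{0i})\,\lambda_i^{\frac{1}{2}}(\sigma_0^2 - \sigma_{0d}^2)^{-\frac{1}{2}} \\
&\quad \times \exp\left\{-\frac{\lambda_i}{2(\sigma_0^2 - \sigma_{0d}^2)}\left[\eta_{0i} - \alpha_0 - x_i'\beta_0 - \sigma_{0d}(d_i^* - z_i'\beta_d)\right]^2\right\},
\end{aligned} \tag{3.21}$$

for $d_i = 1$,

$$\begin{aligned}
p(\eta_i^o \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, d_i^*) &= p(\eta_{1i} \mid \alpha_1, \beta_1, \beta_d, \sigma, \lambda_i, d_i^*) \\
&\propto \exp[-\exp(\eta_{1i})]\exp(y_i\eta_{1i})\,\lambda_i^{\frac{1}{2}}(\sigma_1^2 - \sigma_{1d}^2)^{-\frac{1}{2}} \\
&\quad \times \exp\left\{-\frac{\lambda_i}{2(\sigma_1^2 - \sigma_{1d}^2)}\left[\eta_{1i} - \alpha_1 - x_i'\beta_1 - \sigma_{1d}(d_i^* - z_i'\beta_d)\right]^2\right\}.
\end{aligned} \tag{3.22}$$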
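Both densities are log-concave in $\eta_i^o$ and we apply the adaptive rejection sampling algorithm of Gilks and Wild (1992) to draw $\eta_i^o$ for $i = 1, \dots, n$.

3. Sampling $\beta_0$ from $(\beta_0 \mid \alpha_0, \beta_d, \sigma, \lambda, \eta^o, d^*)$:
\[
\begin{aligned}
p(\beta_0 \mid \alpha_0, \beta_d, \sigma, \lambda, \eta^o, d^*) &\propto \prod_{i=1}^{n_0} p(\eta_{0i} \mid \alpha_0, \beta_0, \beta_d, \sigma, \lambda_i, d_i^*)\, \pi(\beta_0) \\
&\propto \prod_{i=1}^{n_0} f_N\!\left( \eta_{0i} \,\Big|\, \alpha_0 + x_i'\beta_0 + \sigma_{0d}(d_i^* - z_i'\beta_d),\; \frac{\sigma_0^2 - \sigma_{0d}^2}{\lambda_i} \right) f_N(\beta_0 \mid \bar b_0, \bar B_0),
\end{aligned}
\tag{3.23}
\]
so that
\[
(\beta_0 \mid \alpha_0, \beta_d, \sigma, \lambda, \eta^o, d^*) \sim N(\mu_\beta, \Sigma_\beta)
\tag{3.24}
\]
with
\[
\Sigma_\beta = \left[ \bar B_0^{-1} + \frac{\sum_{i=1}^{n_0} \lambda_i x_i x_i'}{\sigma_0^2 - \sigma_{0d}^2} \right]^{-1}
\tag{3.25}
\]
and
\[
\mu_\beta = \Sigma_\beta \left[ \bar B_0^{-1} \bar b_0 + \frac{\sum_{i=1}^{n_0} \lambda_i x_i \left[ \eta_{0i} - \alpha_0 - \sigma_{0d}(d_i^* - z_i'\beta_d) \right]}{\sigma_0^2 - \sigma_{0d}^2} \right].
\tag{3.26}
\]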
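4. Sampling $\beta_1$ from $(\beta_1 \mid \alpha_1, \beta_d, \sigma, \lambda, \eta^o, d^*)$:
\[
\begin{aligned}
p(\beta_1 \mid \alpha_1, \beta_d, \sigma, \lambda, \eta^o, d^*) &\propto \prod_{i=n_0+1}^{n} p(\eta_{1i} \mid \alpha_1, \beta_1, \beta_d, \sigma, \lambda_i, d_i^*)\, \pi(\beta_1) \\
&\propto \prod_{i=n_0+1}^{n} f_N\!\left( \eta_{1i} \,\Big|\, \alpha_1 + x_i'\beta_1 + \sigma_{1d}(d_i^* - z_i'\beta_d),\; \frac{\sigma_1^2 - \sigma_{1d}^2}{\lambda_i} \right) f_N(\beta_1 \mid \bar b_1, \bar B_1),
\end{aligned}
\tag{3.27}
\]
so that
\[
(\beta_1 \mid \alpha_1, \beta_d, \sigma, \lambda, \eta^o, d^*) \sim N(\mu_\beta, \Sigma_\beta)
\tag{3.28}
\]
with
\[
\Sigma_\beta = \left[ \bar B_1^{-1} + \frac{\sum_{i=n_0+1}^{n} \lambda_i x_i x_i'}{\sigma_1^2 - \sigma_{1d}^2} \right]^{-1}
\tag{3.29}
\]
and
\[
\mu_\beta = \Sigma_\beta \left[ \bar B_1^{-1} \bar b_1 + \frac{\sum_{i=n_0+1}^{n} \lambda_i x_i \left[ \eta_{1i} - \alpha_1 - \sigma_{1d}(d_i^* - z_i'\beta_d) \right]}{\sigma_1^2 - \sigma_{1d}^2} \right].
\tag{3.30}
\]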
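5. Sampling $\beta_d$ from $(\beta_d \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \sigma, \lambda, \eta^o, d^*)$:
\[
\begin{aligned}
p(\beta_d \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \sigma, \lambda, \eta^o, d^*) &\propto \prod_{i=1}^{n_0} p(d_i^* \mid \alpha_0, \beta_0, \beta_d, \sigma, \lambda_i, \eta_{0i}) \prod_{i=n_0+1}^{n} p(d_i^* \mid \alpha_1, \beta_1, \beta_d, \sigma, \lambda_i, \eta_{1i}) \times \pi(\beta_d) \\
&\propto \prod_{i=1}^{n_0} f_N\!\left( d_i^* \,\Big|\, z_i'\beta_d + \frac{\sigma_{0d}}{\sigma_0^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0),\; \lambda_i^{-1}\Big(1 - \frac{\sigma_{0d}^2}{\sigma_0^2}\Big) \right) \\
&\quad \times \prod_{i=n_0+1}^{n} f_N\!\left( d_i^* \,\Big|\, z_i'\beta_d + \frac{\sigma_{1d}}{\sigma_1^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1),\; \lambda_i^{-1}\Big(1 - \frac{\sigma_{1d}^2}{\sigma_1^2}\Big) \right) \\
&\quad \times f_N(\beta_d \mid \bar b_d, \bar B_d),
\end{aligned}
\tag{3.31}
\]
so that
\[
(\beta_d \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \sigma, \lambda, \eta^o, d^*) \sim N(\mu_\beta, \Sigma_\beta)
\tag{3.32}
\]
with
\[
\Sigma_\beta = \left[ \bar B_d^{-1} + \frac{\sum_{i=1}^{n_0} \lambda_i z_i z_i'}{1 - \sigma_{0d}^2/\sigma_0^2} + \frac{\sum_{i=n_0+1}^{n} \lambda_i z_i z_i'}{1 - \sigma_{1d}^2/\sigma_1^2} \right]^{-1}
\tag{3.33}
\]
and
\[
\mu_\beta = \Sigma_\beta \left[ \bar B_d^{-1} \bar b_d + \frac{\sum_{i=1}^{n_0} \lambda_i z_i \left[ d_i^* - \frac{\sigma_{0d}}{\sigma_0^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0) \right]}{1 - \sigma_{0d}^2/\sigma_0^2} + \frac{\sum_{i=n_0+1}^{n} \lambda_i z_i \left[ d_i^* - \frac{\sigma_{1d}}{\sigma_1^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1) \right]}{1 - \sigma_{1d}^2/\sigma_1^2} \right].
\tag{3.34}
\]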
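6. Define $n_j$ as the number of $s_i$ equal to $j$, that is, the size of the $j$th cluster. Let $\lambda_{-i}$ denote the vector $(\lambda_1, \dots, \lambda_{i-1}, \lambda_{i+1}, \dots, \lambda_n)$, $n_j^-$ the size of cluster $j$ without the element $\lambda_i$, and $j^-$ the number of clusters with $\lambda_i$ removed from consideration. Then we marginalize over $\lambda_i$ and sample $s_i$ from
\[
P(s_i = j \mid \lambda_{-i}, s_{-i}) \propto
\begin{cases}
n_j^- \, q_j, & j = 1, \dots, j^-, \\
\nu \, q_0, & j = j^- + 1,
\end{cases}
\tag{3.35}
\]
with
\[
q_j = f_N\!\left( w_i \mid \mu_i, \lambda_j^{-1} \Sigma_i \right)
\tag{3.36}
\]
and
\[
q_0 = \int f_N\!\left( w_i \mid \mu_i, \lambda^{-1} \Sigma_i \right) dG_0(\lambda) = f_T(w_i \mid \mu_i, \Sigma_i, \gamma),
\tag{3.37}
\]
where $w_i = (\eta_{0i}, d_i^*)'$, $\mu_i = (\alpha_0 + x_i'\beta_0, z_i'\beta_d)'$ and
\[
\Sigma_i = \begin{pmatrix} \sigma_0^2 & \sigma_{0d} \\ \sigma_{0d} & 1 \end{pmatrix}
\tag{3.38}
\]
for $d_i = 0$ and $w_i = (\eta_{1i}, d_i^*)'$, $\mu_i = (\alpha_1 + x_i'\beta_1, z_i'\beta_d)'$ and
\[
\Sigma_i = \begin{pmatrix} \sigma_1^2 & \sigma_{1d} \\ \sigma_{1d} & 1 \end{pmatrix}
\tag{3.39}
\]
for $d_i = 1$.

7. Sampling $\kappa_j$ from $(\kappa_j \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \eta^o, d^*)$ independently for $j = 1, \dots, l$:
\[
\begin{aligned}
p(\kappa_j \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \eta^o, d^*) &\propto \prod_{i \in j} f_N(\tilde\varepsilon_i \mid 0, \kappa_j^{-1} \Sigma_i)\, f_G\!\left( \kappa_j \,\Big|\, \frac{\gamma}{2}, \frac{\gamma}{2} \right) \\
&\propto \prod_{i \in j} \left| \kappa_j^{-1} \Sigma_i \right|^{-1/2} \exp\!\left[ -\frac{1}{2}\, \tilde\varepsilon_i' (\kappa_j^{-1} \Sigma_i)^{-1} \tilde\varepsilon_i \right] \times \kappa_j^{\gamma/2 - 1} \exp\!\left( -\frac{\gamma}{2} \kappa_j \right),
\end{aligned}
\tag{3.40}
\]
where
\[
\tilde\varepsilon_i = \begin{pmatrix} \eta_{0i} - \alpha_0 - x_i'\beta_0 \\ d_i^* - z_i'\beta_d \end{pmatrix}, \qquad
\Sigma_i = \begin{pmatrix} \sigma_0^2 & \sigma_{0d} \\ \sigma_{0d} & 1 \end{pmatrix}
\tag{3.41}
\]
for $d_i = 0$ and
\[
\tilde\varepsilon_i = \begin{pmatrix} \eta_{1i} - \alpha_1 - x_i'\beta_1 \\ d_i^* - z_i'\beta_d \end{pmatrix}, \qquad
\Sigma_i = \begin{pmatrix} \sigma_1^2 & \sigma_{1d} \\ \sigma_{1d} & 1 \end{pmatrix}
\tag{3.42}
\]
for $d_i = 1$. Thus,
\[
(\kappa_j \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \eta^o, d^*) \sim G\!\left( \frac{\gamma}{2} + n_j,\; \frac{\gamma + \sum_{i \in j} \tilde\varepsilon_i' \Sigma_i^{-1} \tilde\varepsilon_i}{2} \right).
\tag{3.43}
\]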
8. Sample $\nu$ from $(\nu \mid \lambda)$, using the data augmentation idea of Escobar and West (1995): First, sample a latent variable $\tau$,
\[
\tau \mid \nu, j \sim \text{Beta}(\nu + 1, n),
\tag{3.44}
\]
where $j$ is the number of distinct clusters. Then, sample the precision parameter $\nu$ from a mixture of two Gamma distributions,
\[
\nu \mid \tau, j \sim \pi_\tau\, \text{Ga}(\bar c + j,\; \bar d - \log(\tau)) + (1 - \pi_\tau)\, \text{Ga}(\bar c + j - 1,\; \bar d - \log(\tau)),
\tag{3.45}
\]
where the mixture weight $\pi_\tau$ is given in odds form by
\[
\frac{\pi_\tau}{1 - \pi_\tau} = \frac{\bar c + j - 1}{n(\bar d - \log(\tau))}.
\tag{3.46}
\]
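9a. Sampling $\eta_i^u$ from $(\eta_i^u \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_i^o, d_i^*)$, independently for $i = 1, \dots, n$: For $d_i = 0$,
\[
\begin{aligned}
p(\eta_i^u \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_i^o, d_i^*) &= p(\eta_{1i} \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_{0i}, d_i^*) \\
&= f_N\!\Bigg[ \eta_{1i} \,\Big|\, \alpha_1 + x_i'\beta_1 - \frac{\sigma_{0d}\sigma_{1d}}{\sigma_0^2 - \sigma_{0d}^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0) + \frac{\sigma_{1d}\sigma_0^2}{\sigma_0^2 - \sigma_{0d}^2}(d_i^* - z_i'\beta_d), \\
&\qquad\quad \frac{1}{\lambda_i}\left( \sigma_1^2 - \frac{\sigma_{1d}^2 \sigma_0^2}{\sigma_0^2 - \sigma_{0d}^2} \right) \Bigg],
\end{aligned}
\tag{3.47}
\]
so that
\[
(\eta_{1i} \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_{0i}, d_i^*) \sim N(\mu_\eta, \Sigma_\eta)
\tag{3.48}
\]
with
\[
\mu_\eta = \alpha_1 + x_i'\beta_1 - \frac{\sigma_{0d}\sigma_{1d}}{\sigma_0^2 - \sigma_{0d}^2}(\eta_{0i} - \alpha_0 - x_i'\beta_0) + \frac{\sigma_{1d}\sigma_0^2}{\sigma_0^2 - \sigma_{0d}^2}(d_i^* - z_i'\beta_d)
\tag{3.49}
\]
and
\[
\Sigma_\eta = \frac{1}{\lambda_i}\left( \sigma_1^2 - \frac{\sigma_{1d}^2 \sigma_0^2}{\sigma_0^2 - \sigma_{0d}^2} \right).
\tag{3.50}
\]
For $d_i = 1$,
\[
\begin{aligned}
p(\eta_i^u \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_i^o, d_i^*) &= p(\eta_{0i} \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_{1i}, d_i^*) \\
&= f_N\!\Bigg[ \eta_{0i} \,\Big|\, \alpha_0 + x_i'\beta_0 - \frac{\sigma_{0d}\sigma_{1d}}{\sigma_1^2 - \sigma_{1d}^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1) + \frac{\sigma_{0d}\sigma_1^2}{\sigma_1^2 - \sigma_{1d}^2}(d_i^* - z_i'\beta_d), \\
&\qquad\quad \frac{1}{\lambda_i}\left( \sigma_0^2 - \frac{\sigma_{0d}^2 \sigma_1^2}{\sigma_1^2 - \sigma_{1d}^2} \right) \Bigg],
\end{aligned}
\tag{3.51}
\]
so that
\[
(\eta_{0i} \mid \alpha_0, \alpha_1, \beta_0, \beta_1, \beta_d, \sigma, \lambda_i, \eta_{1i}, d_i^*) \sim N(\mu_\eta, \Sigma_\eta)
\tag{3.52}
\]
with
\[
\mu_\eta = \alpha_0 + x_i'\beta_0 - \frac{\sigma_{0d}\sigma_{1d}}{\sigma_1^2 - \sigma_{1d}^2}(\eta_{1i} - \alpha_1 - x_i'\beta_1) + \frac{\sigma_{0d}\sigma_1^2}{\sigma_1^2 - \sigma_{1d}^2}(d_i^* - z_i'\beta_d)
\tag{3.53}
\]
and
\[
\Sigma_\eta = \frac{1}{\lambda_i}\left( \sigma_0^2 - \frac{\sigma_{0d}^2 \sigma_1^2}{\sigma_1^2 - \sigma_{1d}^2} \right).
\tag{3.54}
\]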
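9b. Sampling $y_i^*$ from $(y_i^* \mid \eta_i^u)$, independently for $i = 1, \dots, n$: For $d_i = 0$,
\[
p(y_i^* \mid \eta_i^u) = p(y_{1i} \mid \eta_{1i}) = \frac{\exp[-\exp(\eta_{1i})] \exp(\eta_{1i})^{y_{1i}}}{y_{1i}!},
\tag{3.55}
\]
so that
\[
(y_{1i} \mid \eta_{1i}) \sim \text{Poisson}[\exp(\eta_{1i})],
\tag{3.56}
\]
for $d_i = 1$,
\[
p(y_i^* \mid \eta_i^u) = p(y_{0i} \mid \eta_{0i}) = \frac{\exp[-\exp(\eta_{0i})] \exp(\eta_{0i})^{y_{0i}}}{y_{0i}!},
\tag{3.57}
\]
so that
\[
(y_{0i} \mid \eta_{0i}) \sim \text{Poisson}[\exp(\eta_{0i})].
\tag{3.58}
\]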
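The truncated Normal draws of step 1b, equations (3.18) and (3.20), can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: the function name `draw_truncated_normal` and its arguments are hypothetical, `mean` and `sd` stand for the conditional mean and standard deviation of $d_i^*$, and inverting the Normal CDF is only one of several ways to sample from a truncated Normal.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, treated, rng):
    """Draw d* from N(mean, sd^2) restricted to d* > 0 (treated) or d* < 0 (untreated).

    Uses inverse-CDF sampling. The method loses accuracy when the truncation
    point lies far in the tail, so it is meant for moderate values of mean / sd.
    """
    std = NormalDist()                      # standard Normal
    p_neg = std.cdf((0.0 - mean) / sd)      # P(d* < 0) before truncation
    u = rng.random()                        # uniform on [0, 1)
    if treated:
        p = p_neg + u * (1.0 - p_neg)       # uniform on [p_neg, 1): region d* > 0
    else:
        p = u * p_neg                       # uniform on [0, p_neg): region d* < 0
    p = min(max(p, 1e-12), 1.0 - 1e-12)     # keep inv_cdf away from 0 and 1
    return mean + sd * std.inv_cdf(p)


if __name__ == "__main__":
    rng = random.Random(0)
    print(draw_truncated_normal(0.3, 1.0, True, rng))   # a positive draw
    print(draw_truncated_normal(0.3, 1.0, False, rng))  # a negative draw
```

In a full sampler, `mean` and `sd` would be recomputed for every observation from the current parameter values before each draw.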
Chapter 4 Nonparametric Bayesian Inference for Quantile Treatment Effects (Essay 3)
Quantile Treatment Effects
Abstract This paper is concerned with the analysis of the causal effect of a treatment on the distribution of outcomes. Distributional treatment effects are of interest in many situations. A Bayesian framework for estimating quantile treatment effects is developed. The proposed model is based on the potential outcomes framework and assumes the existence of random intercepts in the equations of the two potential outcomes. The distributions of the random intercepts are modelled in a nonparametric way using Dirichlet process priors. This results in a flexible framework that allows the data to drive the shape of the potential outcomes distributions. Further, the proposed methods are used to investigate the wage effect of school attendance.
JEL classifications C14, C31, D31
Keywords Causal inference, quantile treatment effects, Dirichlet process prior
4.1 Introduction
The literature on econometric program evaluation is concerned with the estimation of causal treatment effects in situations where individuals select themselves into the treatment. Thus, it has to account for unobserved factors that determine both the probability of treatment and the outcome variable and lead to heterogeneous responses to the treatment. These heterogeneous responses are often described by mean treatment effects. The two most famous are the average treatment effect (ATE) and the effect of the treatment on the treated (TT). The average treatment effect gives the effect of the treatment on a randomly chosen individual from the population. The effect of the treatment on the treated defines the effect on a randomly picked individual from the treatment group. However, often researchers and policy-makers are not only interested in mean treatment effects but also in the distributional effect of a treatment. For example, when analyzing returns to schooling we might be interested in the effect of an additional year of education on the dispersion of earnings, or its effect on the upper tail of the earnings equation. One parameter that captures these aspects is the quantile treatment effect (QTE). The quantile treatment effect was introduced in the statistics literature by Doksum (1974) and Lehmann (1974). It is defined, for any fixed percentile τ , as the horizontal distance between two cumulative distribution functions. To define the quantile treatment effect in the program evaluation context, it is helpful to introduce the potential outcomes framework of Rubin (1974). For each person we specify two potential outcomes y0 and y1 which correspond to the potential outcomes in the untreated and treated states, respectively. 
The quantile treatment effect for quantile $\tau$ is then defined by the horizontal distance between the cumulative distribution functions of the two potential outcomes:
\[
\Delta^{QTE}_{\tau} = Q_\tau(y_1) - Q_\tau(y_0),
\tag{4.1}
\]
where $Q_\tau(y_j)$ denotes the $\tau$th quantile of $y_j$, $j = 0, 1$. Figure 4.1 illustrates the quantile treatment effect for the 0.1 quantile in comparison with the average treatment effect for a chosen example. Note that the quantile treatment effect is deduced from the marginal distributions of the potential outcomes. Thus, $\Delta^{QTE}_{\tau}$ is not the quantile of the difference $y_1 - y_0$.

[Figure 4.1: Example of quantile treatment effect $\Delta^{QTE}_{0.1}$ versus average treatment effect $\Delta^{ATE}$. The plot shows the distributions of $y_0$ and $y_1$ with $q_{0.1}(y_0)$, $q_{0.1}(y_1)$, $E(y_0)$ and $E(y_1)$ marked.]

Standard quantile regression techniques cannot be applied for estimating quantile treatment effects due to the endogeneity problem discussed above, and advanced methods have been proposed in the literature. Firpo (2005) develops an estimator for quantile treatment effects under the conditional independence assumption (CIA). Abadie, Angrist, and Imbens (2002) and Chernozhukov and Hansen (2004, 2005) discuss estimation of quantile treatment effects in an instrumental variables setup. Bayesian approaches to inferring quantile treatment effects are discussed by Imbens and Rubin (1997) and Hirano, Imbens, Rubin, and Zhou (2000). They consider an eligibility design setup and estimate the effect of the treatment for the group of individuals that comply with the original treatment decision, the so-called 'compliers' (see Imbens and Angrist (1994)).

This paper also proposes a framework for estimating quantile treatment effects using Bayesian inference but differs from the previous two in that it does not consider an eligibility design. Instead, we base our inference on a version of Rubin's potential outcomes approach and specify a joint model of the potential outcomes and the treatment decision. We build on the work of Chib and Hamilton (2000), who assume that the outcome and treatment distributions follow either a multivariate Student-t distribution or a finite mixture of multivariate Normal distributions. In principle, their model can already be used to estimate quantile treatment effects in a switching-regression setup. However, inference for quantile treatment effects in their model might critically depend on distributional assumptions. For this reason, we propose a nonparametric Bayesian approach. We develop a hierarchical specification with random intercepts in the potential outcomes equations. These follow Dirichlet process (DP) priors (Ferguson (1973), Antoniak (1974)), which are centered around Normal distributions. The specification of two different distributions for the random intercepts allows for different shapes of the potential outcomes distributions. This distinguishes our work from Chib and Hamilton (2002), who also consider a nonparametric extension of the potential outcomes model.

The rest of the paper is organized as follows. In Section 4.2, we present the model. Section 4.3 discusses estimation of the model via Markov chain Monte Carlo (MCMC) simulation techniques. In Section 4.4, we consider an empirical illustration of the proposed methods. Section 4.5 concludes and provides an outlook on future research.
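To make the definition in (4.1) concrete, the following Python sketch computes a quantile treatment effect from two samples and contrasts it with the quantile of the unit-level differences. It is a toy illustration only: the data are made up, both potential outcomes are treated as known for every unit (which never holds in practice), and the helper names `empirical_quantile`, `quantile_treatment_effect` and `average_treatment_effect` are hypothetical.

```python
import math


def empirical_quantile(values, tau):
    """Smallest sample value whose empirical CDF is at least tau."""
    ordered = sorted(values)
    index = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]


def quantile_treatment_effect(y0, y1, tau):
    """Difference of the tau-quantiles of the two marginal distributions."""
    return empirical_quantile(y1, tau) - empirical_quantile(y0, tau)


def average_treatment_effect(y0, y1):
    """Difference of the two sample means."""
    return sum(y1) / len(y1) - sum(y0) / len(y0)


# Hypothetical potential outcomes for five units; unit i has (y0[i], y1[i]).
y0 = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [10.0, 8.0, 6.0, 4.0, 2.0]
gains = [b - a for a, b in zip(y0, y1)]

qte_10 = quantile_treatment_effect(y0, y1, 0.1)  # uses marginals only
gain_q10 = empirical_quantile(gains, 0.1)        # uses the joint distribution
ate = average_treatment_effect(y0, y1)
```

With these numbers the 0.1 quantile treatment effect is 1.0 while the 0.1 quantile of the individual gains is −3.0, which illustrates why the two objects should not be confused; the average treatment effect is 3.0 either way.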
4.2 The Model
Setting up the model, we use the potential outcomes framework of Neyman (1923), Fisher (1935), Roy (1951), Cox (1958) and Rubin (1974). For each person $i = 1, \dots, n$ we specify two potential outcomes $y_{0i}$ and $y_{1i}$, which correspond to the potential outcomes in the untreated and treated states, respectively. Let $d_i = 1$ denote participation in the treatment and $d_i = 0$ non-participation. Then the observed outcome variable is
\[
y_i = (1 - d_i)\, y_{0i} + d_i\, y_{1i}.
\tag{4.2}
\]
The potential outcome in the non-treatment state is modelled as
\[
y_{0i} = \alpha_{0i} + x_i'\beta_0 + \varepsilon_{0i},
\tag{4.3}
\]
and the potential outcome in the treatment state is
\[
y_{1i} = \alpha_{1i} + x_i'\beta_1 + \varepsilon_{1i}.
\tag{4.4}
\]
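The $k \times 1$ vector $x_i$ consists of observed covariates (not including a constant) and $\beta_0$ and $\beta_1$ are the corresponding parameter vectors. $\varepsilon_{0i}$ and $\varepsilon_{1i}$ denote unobserved error terms and $\alpha_{0i}$ and $\alpha_{1i}$ are random intercepts. The observed treatment decision $d_i$ is modelled as
\[
d_i =
\begin{cases}
1, & \text{if } d_i^* \ge 0, \\
0, & \text{otherwise,}
\end{cases}
\tag{4.5}
\]
where $d_i^*$ is a latent variable that determines the treatment status,
\[
d_i^* = z_i'\beta_d + \varepsilon_{di},
\tag{4.6}
\]
with $z_i$ being an $l \times 1$ vector of observed random variables, $\beta_d$ the corresponding parameter vector (including a constant) and $\varepsilon_{di}$ an unobserved error term. The joint distribution of the error terms $\varepsilon_{0i}$, $\varepsilon_{1i}$ and $\varepsilon_{di}$ is modelled as a trivariate Normal distribution,
\[
\begin{pmatrix} \varepsilon_{0i} \\ \varepsilon_{1i} \\ \varepsilon_{di} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_0^2 & 0 & \sigma_{0d} \\ 0 & \sigma_1^2 & \sigma_{1d} \\ \sigma_{0d} & \sigma_{1d} & 1 \end{pmatrix}
\right),
\tag{4.7}
\]
where $\sigma \equiv (\sigma_0^2, \sigma_1^2, \sigma_{0d}, \sigma_{1d})$ denotes the elements of the covariance matrix. As often assumed in the literature, we set $\sigma_{01}$ to zero; see Koop and Poirier (1997) and Poirier and Tobias (2003), who do not impose this restriction.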
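The covariance matrix in (4.7) is only valid if it is positive definite, which restricts the admissible values of $\sigma$. As a minimal sketch (the function name and argument names are hypothetical, and Sylvester's criterion is just one convenient way to perform the check), the condition can be tested through the leading principal minors of the matrix with $\sigma_{01} = 0$:

```python
def covariance_is_positive_definite(s0_sq, s1_sq, s0d, s1d):
    """Check positive definiteness of the 3x3 error covariance matrix

        [[s0_sq, 0,     s0d],
         [0,     s1_sq, s1d],
         [s0d,   s1d,   1  ]]

    via Sylvester's criterion (all leading principal minors positive).
    """
    minor_1 = s0_sq
    minor_2 = s0_sq * s1_sq
    # Determinant of the full matrix, expanded along the first row.
    minor_3 = s0_sq * s1_sq - s0_sq * s1d ** 2 - s1_sq * s0d ** 2
    return minor_1 > 0 and minor_2 > 0 and minor_3 > 0


if __name__ == "__main__":
    print(covariance_is_positive_definite(1.0, 1.0, 0.5, 0.5))  # admissible
    print(covariance_is_positive_definite(1.0, 1.0, 0.8, 0.8))  # not admissible
```

The third minor shows that the two covariances cannot both be large relative to the variances: with unit variances the requirement reduces to $\sigma_{0d}^2 + \sigma_{1d}^2 < 1$.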
The distributions of the random intercepts $\alpha_{0i}$ and $\alpha_{1i}$ are modelled in a Bayesian nonparametric fashion using a Dirichlet process mixture model. The Dirichlet process, which is well established in the nonparametric literature by now, was introduced by Ferguson (1973). A random probability distribution $F$ is generated by a Dirichlet process with precision parameter $\nu > 0$ and base distribution $F_0$ if for any partition $B_1, \dots, B_m$ on the space of support of $F_0$ the vector of probabilities $(F(B_1), \dots, F(B_m))$ follows a Dirichlet distribution with parameter $(\nu F_0(B_1), \dots, \nu F_0(B_m))$. The expectation and variance of $F$ are defined by
\[
E(F) = F_0,
\tag{4.8}
\]
and for any event $A$,
\[
\text{Var}[F(A)] = \frac{F_0(A)[1 - F_0(A)]}{\nu + 1}.
\tag{4.9}
\]
For a more detailed discussion on the role of the two parameters see Walker, Damien, Laud, and Smith (1999).
In our model we assume that the random intercept $\alpha_{0i}$ is drawn from an unknown distribution $G_0$,
\[
\alpha_{0i} \sim G_0.
\tag{4.10}
\]
The prior distribution on $G_0$ is then defined to be a Dirichlet process with precision parameter $\nu_0 > 0$ and base distribution $G_{00}$,
\[
G_0 \sim \text{DP}(\nu_0, G_{00}).
\tag{4.11}
\]
Finally, the base distribution $G_{00}$ is specified as a Normal distribution with mean $\mu_{\alpha_0}$ and variance $S_{\alpha_0}$,
\[
G_{00} = N(\mu_{\alpha_0}, S_{\alpha_0}).
\tag{4.12}
\]
The distribution of $\alpha_{1i}$ is modelled in an analogous way:
\[
\alpha_{1i} \sim G_1,
\tag{4.13}
\]
\[
G_1 \sim \text{DP}(\nu_1, G_{01}),
\tag{4.14}
\]
\[
G_{01} = N(\mu_{\alpha_1}, S_{\alpha_1}).
\tag{4.15}
\]
Thus, the introduction of the Dirichlet process priors allows us to model possible deviations of the true distributions G0 and G1 of the random intercepts α0i and α1i from their ‘baseline’ normal distribution G00 and G01 , respectively. In other words, we approximate the true nonparametric shapes of G0 and G1 by the normal base distributions G00 and G01 . The prior beliefs about the similarity of the true distributions and the base distributions are reflected by the precision parameters ν0 and ν1 . The larger ν0 is, the more likely is G0 to be close to G00 , with the same argument holding for ν1 .
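The role of the precision parameter can be read off equation (4.9): the prior variance of $F(A)$ around $F_0(A)$ shrinks at rate $1/(\nu + 1)$. The following sketch evaluates that formula for a few values of $\nu$; the function name `dp_prior_variance` and the chosen numbers are purely illustrative.

```python
def dp_prior_variance(f0_prob, nu):
    """Prior variance of F(A) under F ~ DP(nu, F0), with f0_prob = F0(A); see (4.9)."""
    return f0_prob * (1.0 - f0_prob) / (nu + 1.0)


if __name__ == "__main__":
    # An event A with F0(A) = 0.5 and increasingly tight priors around F0.
    for nu in (1.0, 9.0, 49.0):
        print(nu, dp_prior_variance(0.5, nu))
```

For $F_0(A) = 0.5$ the variance falls from 0.125 at $\nu = 1$ to 0.005 at $\nu = 49$, so a large precision parameter expresses strong prior belief that the random intercept distribution is close to its Normal base distribution.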
Finally, we select independent priors $\pi(\beta_0)$, $\pi(\beta_1)$, $\pi(\beta_d)$, $\pi(\sigma)$, $\pi(\nu_0)$, $\pi(\nu_1)$, $\pi(\mu_{\alpha_0}, S_{\alpha_0})$ and $\pi(\mu_{\alpha_1}, S_{\alpha_1})$ for the parameter vectors $\beta_0$, $\beta_1$ and $\beta_d$, the elements of the covariance matrix $\sigma$, the precision parameters $\nu_0$ and $\nu_1$ and the parameters associated with the base distributions $\mu_{\alpha_0}$, $S_{\alpha_0}$, $\mu_{\alpha_1}$ and $S_{\alpha_1}$. In particular, we specify multivariate Normal priors for $\beta_0$, $\beta_1$ and $\beta_d$ and a truncated multivariate Normal prior for $\sigma$,
\[
\pi(\beta_0) = N_k(\beta_0 \mid \bar b_0, \bar B_0),
\tag{4.16}
\]
\[
\pi(\beta_1) = N_k(\beta_1 \mid \bar b_1, \bar B_1),
\tag{4.17}
\]
\[
\pi(\beta_d) = N_l(\beta_d \mid \bar b_d, \bar B_d),
\tag{4.18}
\]
\[
\pi(\sigma) = N_4(\sigma \mid \bar m, \bar M)\, 1_{PD}(\sigma),
\tag{4.19}
\]
where $1_{PD}(\sigma)$ denotes the indicator function taking the value one if $\sigma$ leads to a positive definite variance-covariance matrix and the value zero otherwise. Following Escobar and West (1995), we specify the prior distributions on $\nu_0$ and $\nu_1$ to be Gamma distributions with shape parameters $\bar c_0$ and $\bar c_1$ and scale parameters $\bar d_0$ and $\bar d_1$, respectively,¹
\[
\pi(\nu_0) = \text{Ga}(\nu_0 \mid \bar c_0, \bar d_0),
\tag{4.20}
\]
\[
\pi(\nu_1) = \text{Ga}(\nu_1 \mid \bar c_1, \bar d_1).
\tag{4.21}
\]
The priors for the parameters of the base distribution are defined to be Normal-inverted gamma distributions:²
\[
\pi(\mu_{\alpha_0}, S_{\alpha_0}) = \text{NIG}(\mu_{\alpha_0}, S_{\alpha_0} \mid \bar v_0, \bar V_0, \bar e_0, \bar f_0),
\tag{4.22}
\]
\[
\pi(\mu_{\alpha_1}, S_{\alpha_1}) = \text{NIG}(\mu_{\alpha_1}, S_{\alpha_1} \mid \bar v_1, \bar V_1, \bar e_1, \bar f_1).
\tag{4.23}
\]

¹ For other approaches to eliciting priors on the precision parameters $\nu_0$ and $\nu_1$ see, for example, Walker and Mallick (1997) or Carota and Parmigiani (2002).

4.3 Bayesian Computation
We base our inference on the posterior distribution of the model, which according to Bayes' theorem is proportional to the prior distribution times the likelihood function. However, this distribution is too complex to be analyzed analytically. Hence, we resort to MCMC sampling techniques, in particular the Gibbs sampler (Geman and Geman (1984)), for summarizing features of the posterior model space. The Gibbs sampler draws a large posterior sample by successively sampling from the conditional distributions of the model parameters.

In order to facilitate computations, we follow the approach of Tanner and Wong (1987) and augment the parameter space of the model by the vectors of latent variables $\alpha_0 \equiv (\alpha_{01}, \dots, \alpha_{0n})$, $\alpha_1 \equiv (\alpha_{11}, \dots, \alpha_{1n})$, $d^* \equiv (d_1^*, \dots, d_n^*)$ and $y^* \equiv (y_1^*, \dots, y_n^*)$, where $y_i^*$ denotes the unobserved potential outcome, that is, $y_i^* \equiv d_i y_{0i} + (1 - d_i) y_{1i}$. Furthermore, we partition $\alpha_0$ into $\alpha_0^o$ and $\alpha_0^u$, where $\alpha_0^o \equiv (\alpha_{01}, \dots, \alpha_{0n_0})$ and $\alpha_0^u \equiv (\alpha_{0(n_0+1)}, \dots, \alpha_{0n})$, assuming that the first $n_0$ observations in the data set have $d_i = 0$ and the last $n_1 = n - n_0$ observations have $d_i = 1$. Thus, $\alpha_0^o$ collects the random intercepts $\alpha_{0i}$ for the individuals for whom the corresponding potential outcome $y_{0i}$ is observed, and $\alpha_0^u$ collects the random intercepts $\alpha_{0i}$ for those for whom $y_{0i}$ is unobserved. $\alpha_1^o$ and $\alpha_1^u$ are defined in the same way, that is, $\alpha_1^o \equiv (\alpha_{1(n_0+1)}, \dots, \alpha_{1n})$ and $\alpha_1^u \equiv (\alpha_{11}, \dots, \alpha_{1n_0})$.

² The Normal-inverted gamma distribution is defined here as $\text{NIG}(\mu, \sigma^2 \mid m, M, a, b) = N(\mu \mid m, M\sigma^2)\, \text{IG}(\sigma^2 \mid a, b)$.
This distinction between random intercepts corresponding to observed and unobserved potential outcomes becomes necessary, since the standard Gibbs sampler can exhibit slow convergence for this model. To overcome this problem, we set up a collapsed Gibbs sampler (Liu (1994)) by integrating α0u and α1u out in many steps of the sampler. The same strategy is applied with respect to the latent vectors y ∗ and d∗ .3
For constructing the Gibbs sampler for the nonparametric part of the model, we follow the literature (see, for example, Escobar (1994), Escobar and West (1995) and Neal (2000)) and marginalize over the unknown measures $G_0$ and $G_1$ by resorting to the Pólya urn scheme representation of the Dirichlet process (Blackwell and MacQueen (1973)). They show that if $F \sim \text{DP}(\nu, F_0)$ and $\theta_1, \dots, \theta_n$ is a sample from $F$, then the conditional distribution of $\theta_i$ is given by
\[
\theta_i \mid \theta_{-i} \sim \frac{1}{n - 1 + \nu} \sum_{j \neq i} \delta(\theta_j) + \frac{\nu}{n - 1 + \nu} F_0,
\tag{4.24}
\]
where $\delta(\theta_j)$ denotes a point mass at $\theta_j$. Formulating the prior distributions of $\alpha_0$ and $\alpha_1$ in this way and combining them with the likelihood function of the model, we can easily derive the resulting conditional posterior distributions. Since we chose the base distributions $G_{00}$ and $G_{01}$ to be conjugate, all necessary calculations are analytically tractable and straightforward.

The Pólya urn scheme representation also illustrates the fact that the Dirichlet process leads to a clustering structure of the random intercepts $\alpha_0$ and $\alpha_1$. We denote the distinct values of $\alpha_0^o$ and $\alpha_1^o$ by $\kappa_0 \equiv (\kappa_{01}, \dots, \kappa_{0m_0})$ and $\kappa_1 \equiv (\kappa_{11}, \dots, \kappa_{1m_1})$. The implemented Gibbs sampler resamples $\kappa_0$ and $\kappa_1$ after the grouping of $\alpha_0^o$ and $\alpha_1^o$ is determined. This idea goes back to Bush and MacEachern (1996) and improves the mixing behavior of the resulting sampler.
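The clustering implied by the Pólya urn scheme can be seen in a small prior simulation. The sketch below draws $\theta_1, \dots, \theta_n$ one after another using the sequential form of the urn: the $(i+1)$th value is a fresh draw from $F_0$ with probability $\nu/(i + \nu)$ and otherwise a copy of one of the $i$ earlier values. It is a minimal illustration of the prior only, not of the posterior sampler; the function name `polya_urn_sample`, the standard Normal base distribution and the chosen values of $\nu$ are assumptions made for the example.

```python
import random


def polya_urn_sample(n, nu, base_draw, rng):
    """Draw theta_1, ..., theta_n sequentially from the Polya urn of a DP(nu, F0).

    base_draw(rng) must return one draw from the base distribution F0.
    """
    draws = []
    for i in range(n):
        # With probability nu / (i + nu) open a new cluster with a fresh F0 draw
        # (always the case for i = 0); otherwise copy one of the i earlier values,
        # each chosen with equal probability.
        if rng.random() < nu / (i + nu):
            draws.append(base_draw(rng))
        else:
            draws.append(draws[rng.randrange(i)])
    return draws


def standard_normal(rng):
    """Base distribution F0 assumed for this illustration."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    for nu in (1.0, 50.0):
        sample = polya_urn_sample(200, nu, standard_normal, rng)
        print(nu, len(set(sample)))  # number of distinct values (clusters)
```

Small values of $\nu$ produce only a handful of distinct values, while large values produce many, which is the prior counterpart of the distinct values $\kappa_0$ and $\kappa_1$ resampled in the Gibbs sampler.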
Finally, the sampling scheme for our model can be summarized by the following steps:⁴

0. Choose starting values for $(\alpha_0^o, \alpha_0^u, \alpha_1^o, \alpha_1^u, \beta_0, \beta_1, \beta_d, \sigma, d^*, y^*, \mu_{\alpha_0}, S_{\alpha_0}, \mu_{\alpha_1}, S_{\alpha_1}, \nu_0, \nu_1)$.
1. Sample $(\sigma, d^*)$ from $(\sigma, d^* \mid \alpha_0^o, \alpha_1^o, \beta_0, \beta_1, \beta_d)$ by
   (a) sampling $\sigma$ from $(\sigma \mid \alpha_0^o, \alpha_1^o, \beta_0, \beta_1, \beta_d)$ using the Metropolis-Hastings algorithm,
   (b) sampling $d_i^*$ from $(d_i^* \mid \alpha_0^o, \alpha_1^o, \beta_0, \beta_1, \beta_d, \sigma)$, which is a truncated Normal distribution, independently for $i = 1, \dots, n$.
2. Sample $\beta_0$ from $(\beta_0 \mid \alpha_0^o, \beta_d, \sigma, d^*)$, which is a Normal distribution.
3. Sample $\beta_1$ from $(\beta_1 \mid \alpha_1^o, \beta_d, \sigma, d^*)$, which is a Normal distribution.
4. Sample $\beta_d$ from $(\beta_d \mid \alpha_0^o, \alpha_1^o, \beta_0, \beta_1, \sigma, d^*)$, which is a Normal distribution.
5. Sample $\alpha_{0i}^o$ from $(\alpha_{0i}^o \mid \alpha_{0,-i}^o, \beta_0, \beta_d, \sigma, d^*, \nu_0, \mu_{\alpha_0}, S_{\alpha_0})$ independently for $i = 1, \dots, n_0$ using the Pólya urn scheme representation of the Dirichlet process.
6. Sample $\alpha_{1i}^o$ from $(\alpha_{1i}^o \mid \alpha_{1,-i}^o, \beta_1, \beta_d, \sigma, d^*, \nu_1, \mu_{\alpha_1}, S_{\alpha_1})$ independently for $i = n_0 + 1, \dots, n$ using the Pólya urn scheme representation of the Dirichlet process.
7. Sample $\kappa_{0j}$ from $(\kappa_{0j} \mid \beta_0, \beta_d, \sigma, d^*, \mu_{\alpha_0}, S_{\alpha_0})$ independently for $j = 1, \dots, m_0$, which is a Normal distribution.
8. Sample $\kappa_{1j}$ from $(\kappa_{1j} \mid \beta_1, \beta_d, \sigma, d^*, \mu_{\alpha_1}, S_{\alpha_1})$ independently for $j = 1, \dots, m_1$, which is a Normal distribution.
9. Sample $\nu_0$ from $(\nu_0 \mid \alpha_0^o)$, using the data augmentation idea of Escobar and West (1995).
10. Sample $\nu_1$ from $(\nu_1 \mid \alpha_1^o)$, using the data augmentation idea of Escobar and West (1995).
11. Sample $(\mu_{\alpha_0}, S_{\alpha_0})$ from $(\mu_{\alpha_0}, S_{\alpha_0} \mid \kappa_0)$, which is a Normal-inverted gamma distribution.
12. Sample $(\mu_{\alpha_1}, S_{\alpha_1})$ from $(\mu_{\alpha_1}, S_{\alpha_1} \mid \kappa_1)$, which is a Normal-inverted gamma distribution.
13. Sample $(\alpha_{0i}^u, y_i^*)$ from $(\alpha_{0i}^u, y_i^* \mid \alpha_0^o, \alpha_1^o, \beta_0, \beta_1, \beta_d, \sigma, d^*, \mu_{\alpha_0}, S_{\alpha_0}, \nu_0)$ independently for $i = n_0 + 1, \dots, n$, making use of the Pólya urn scheme representation of the Dirichlet process.
14. Sample $(\alpha_{1i}^u, y_i^*)$ from $(\alpha_{1i}^u, y_i^* \mid \alpha_0^o, \alpha_1^o, \beta_0, \beta_1, \beta_d, \sigma, d^*, \mu_{\alpha_1}, S_{\alpha_1}, \nu_1)$ independently for $i = 1, \dots, n_0$, making use of the Pólya urn scheme representation of the Dirichlet process.
15. Repeat steps 1-14 using the updated values of the conditioning variables.

³ See Chib and Hamilton (2000), who follow a similar strategy concerning $d^*$.

⁴ Note that the conditioning sets of all distributions explicitly recognize only those parameters that are relevant.
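Steps 9 and 10 rely on the data augmentation scheme of Escobar and West (1995), which first draws a latent Beta variable and then draws the precision parameter from a two-component Gamma mixture. The following Python sketch shows one such update. It is an illustration under stated assumptions rather than the code used for the results: the function name `update_dp_precision` and its arguments are hypothetical, and the second Gamma hyperparameter `d_bar` is assumed to enter the density as a rate (inverse scale), which is the parameterization under which the mixture formula holds.

```python
import math
import random


def update_dp_precision(nu, n_clusters, n_obs, c_bar, d_bar, rng):
    """One Escobar-West style update of a Dirichlet process precision parameter.

    nu          current value of the precision parameter
    n_clusters  current number of distinct values (clusters)
    n_obs       number of observations assigned to those clusters
    c_bar       Gamma prior shape (assumed c_bar + n_clusters > 1)
    d_bar       Gamma prior rate (assumption, see lead-in)
    """
    # Latent variable: tau | nu ~ Beta(nu + 1, n_obs).
    tau = rng.betavariate(nu + 1.0, n_obs)
    # log(tau) < 0, so the updated rate is larger than d_bar.
    rate = d_bar - math.log(tau)
    # Mixture weight from its odds form.
    odds = (c_bar + n_clusters - 1.0) / (n_obs * rate)
    weight = odds / (1.0 + odds)
    if rng.random() < weight:
        shape = c_bar + n_clusters
    else:
        shape = c_bar + n_clusters - 1.0
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


if __name__ == "__main__":
    rng = random.Random(0)
    nu = 1.0
    for _ in range(5):
        nu = update_dp_precision(nu, n_clusters=8, n_obs=500, c_bar=2.0, d_bar=2.0, rng=rng)
        print(nu)
```

In the sampler above the same update would be applied twice per sweep, once with the cluster count and sample size of $\alpha_0^o$ and once with those of $\alpha_1^o$.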
4.4 Empirical Application
In this section we apply the proposed framework to the data from the article "Using geographic variation in college proximity to estimate the return to schooling" by David Card (1995).⁵ These were drawn from the National Longitudinal Survey of Young Men (NLSYM) and contain information about 3010 young men in 1976. We are interested in the causal effect of education on wages and choose LWAGE, the logarithm of hourly wages, to be the outcome variable. The treatment variable EDU is a dummy indicating whether the individual attained more than 12 years of schooling (EDU=1) or not (EDU=0).⁶ We follow Card's reasoning and choose the presence of a nearby college (NEAR) as instrumental variable. The explanatory variables consist of labor market experience, labor market experience squared and dummies for location and race. All variable definitions can be found in Table 4.1. Summary statistics for the entire sample and for the treatment (EDU=1) and control (EDU=0) groups are reported in Table 4.2.

⁵ The data set accompanies the textbook of Wooldridge (2000).

⁶ Note that in the original data set education attainment is measured on an ordinal scale. By converting it to a binary variable we follow the approach of Chib and Greenberg (2005).

Variable   Definition
LWAGE      log hourly wage in cents
EXP        potential experience (age − years of schooling − 6)
EXP2       EXP squared
BLACK      =1 if black, =0 otherwise
METRO      =1 if residence in a metropolitan area, =0 otherwise
SOUTH      =1 if residence in the South, =0 otherwise
EDU        =1 if years of schooling > 12, =0 otherwise
NEAR       =1 if residence near 4-year college, =0 otherwise

Table 4.1: Variable definitions
The last two columns of Table 4.2 show systematic differences in the means of the covariates between the two groups. Individuals who went to school for more than 12 years have (by definition) less experience, are less likely to be black and to live in the South, and are more likely to live in a metropolitan area. In addition, they are more likely to live near a 4-year college. Figure 4.2 plots kernel density estimates of LWAGE for the treatment and the control group. The distribution for those who attended school for more than 12 years is shifted to the right, but the shapes of the two distributions look rather similar.

Variable   Entire sample      EDU=0               EDU=1
LWAGE      6.262 (0.444)      6.164 (0.438)       6.358 (0.428)
EXP        8.856 (4.142)      11.093 (3.786)      6.667 (3.198)
EXP2       95.579 (84.618)    137.373 (92.696)    54.664 (48.750)
BLACK      0.234 (0.423)      0.323 (0.468)       0.146 (0.353)
METRO      0.713 (0.452)      0.642 (0.480)       0.782 (0.413)
SOUTH      0.404 (0.491)      0.457 (0.498)       0.352 (0.478)
EDU        0.505 (0.500)
NEAR       0.682 (0.466)      0.629 (0.483)       0.734 (0.442)

Table 4.2: Summary statistics

We then apply the methods introduced in this paper to estimate the effect of going to school for more than twelve years on the wage distribution. The model is fit to the data using the MCMC sampling algorithm described in Section 4.3. Prior elicitation is done in the following way: we randomly choose 500 individuals from the data set and analyze this 'training sample' using a parametric version of our model where the random intercepts are not assumed to be drawn from Dirichlet processes but from Normal distributions. That is, equations (4.10) to (4.12) and (4.13) to (4.15) are replaced by
\[
\alpha_{0i} \sim N(\mu_{\alpha_0}, S_{\alpha_0}),
\tag{4.25}
\]
and
\[
\alpha_{1i} \sim N(\mu_{\alpha_1}, S_{\alpha_1}),
\tag{4.26}
\]
respectively. The priors of this 'training model' are specified to be rather flat. In this way we mimic the usual Bayesian approach where the results of a previous study with different data are used to select prior distributions (see, for example, Ibrahim and Kleinman (1998), and Chib and Hamilton (2002), who also follow the training sample strategy).⁷ To analyze the remaining data, we set $\mu_{\alpha_0}$, $S_{\alpha_0}$, $\mu_{\alpha_1}$, $S_{\alpha_1}$ and $\bar m$, $\bar M$ to the corresponding posterior estimates obtained from the training sample. In addition, we choose $\bar c_0 = \bar d_0 = 50$ and $\bar c_1 = \bar d_1 = 1$. Relatively flat priors are selected for the other parameters.

The results of the model fitting are reported in Table 4.3. They are based on 50,000 draws from the MCMC sampler. One can see from the posterior estimates of the treatment equation that an individual living near a 4-year college has a higher probability of attending school for more than 12 years. The posterior means of the coefficients in the two potential outcomes equations do not differ much. The largest differences are found for the dummy variables BLACK and SOUTH. The estimates of the variances of the random intercepts and the error terms indicate that the dispersions of the two potential outcomes are nearly equal. The covariances $\sigma_{0d}$ and $\sigma_{1d}$ are not bounded away from zero. This suggests the absence of unobserved confounders (this was also found by Chib and Greenberg (2005)).

⁷ A discussion of the training sample strategy for prior elicitation is given in Gelfand, Dey, and Chang (1992) and Ghosh and Samanta (2002).

Variable                 Posterior Mean    Posterior Std. Dev.
Treatment
  Intercept              2.691             0.204
  EXP                    −0.445            0.040
  EXP2                   0.011             0.002
  BLACK                  −0.601            0.075
  METRO                  0.275             0.070
  SOUTH                  0.035             0.066
  NEAR                   0.195             0.068
Potential Outcome y0
  EXP                    0.098             0.017
  EXP2                   −0.003            0.001
  BLACK                  −0.243            0.027
  METRO                  0.186             0.024
  SOUTH                  −0.211            0.024
Potential Outcome y1
  EXP                    0.077             0.013
  EXP2                   −0.003            0.001
  BLACK                  −0.168            0.035
  METRO                  0.167             0.028
  SOUTH                  −0.073            0.026
µα0                      5.530             0.116
Sα0                      0.167             0.018
µα1                      5.895             0.058
Sα1                      0.189             0.020
σ0²                      0.021             0.007
σ0d                      −0.034            0.037
σ1²                      0.025             0.006
σ1d                      −0.024            0.048
ν0                       50.740            7.066
ν1                       51.343            7.077

Table 4.3: Posterior Estimates

[Figure 4.2: Kernel density estimates of LWAGE for the control group (EDU=0, solid curve) and the treatment group (EDU=1, dashed curve)]

Posterior estimates for the quantile treatment effects are given in Table 4.4. Figure 4.3 presents boxplots of the corresponding posterior distributions. The causal impact of attending school for more than 12 years is nearly constant along the distribution. The posterior mean of the 0.1 quantile treatment effect is 0.312; the posterior mean at the 0.9 quantile is 0.257. This result is in accordance with the similar shapes of the log wage distributions for the two groups depicted in Figure 4.2. However, the kernel density estimates in Figure 4.2 do not account for covariates and the possible presence of endogeneity.

Quantile   Posterior Mean   Posterior Std. Dev.
0.1        0.312            0.053
0.3        0.307            0.059
0.5        0.290            0.055
0.7        0.252            0.055
0.9        0.257            0.062

Table 4.4: Posterior Estimates of Quantile Treatment Effects
[Figure 4.3: Boxplots of Posterior Estimates of Quantile Treatment Effects (from left to right: 0.1, 0.3, 0.5, 0.7, 0.9 quantile)]

4.5 Conclusions

In many settings the researcher or policy-maker is interested in the causal effect of a treatment on the distribution of outcome gains rather than only the mean effect of the treatment. This paper offers a Bayesian alternative to existing classical approaches for estimating quantile treatment effects. In order to let the data drive the shape of the distribution of outcome gains we formulate a flexible model which is based on the potential outcomes framework. We assume that the two potential outcomes equations have random intercepts that follow Dirichlet processes. An MCMC algorithm for conducting posterior inference is proposed. We apply the proposed method to a subsample of the NLSYM which was originally analyzed by Card (1995). We estimate the causal effect of attending school for more than twelve years. Quantile treatment effects are found to vary between 0.312 at the 0.1 quantile and 0.257 at the 0.9 quantile. This suggests that going to school for more than 12 years merely shifts the wage distribution to the right.

The model can be extended in at least two interesting directions. First, uncertainty about the precision parameters in the Dirichlet processes was captured by formulating prior distributions for them. A next step would be to carry out formal model comparison via Bayes factors. Basu and Chib (2003) discuss how to calculate these in MDP models. Alternatively, a cross-validation model comparison criterion could be used. Second, the presented framework does not allow the shapes of the potential outcomes distributions to vary across the covariate space. Recently developed approaches that incorporate covariates into the Dirichlet process (see, for example, MacEachern (1999), Dunson and Pillai (2004), and Griffin and Steel (2006)) can help to relax this assumption.
Bibliography Abadie, A., J. Angrist, and G. Imbens (2002): “Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings,” Econometrica, 70, 91–117. Antoniak, C. E. (1974): “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems,” The Annals of Statistics, 2, 1152– 1174. Basu, S., and S. Chib (2003): “Marginal likelihood and Bayes factors for Dirichlet process mixture models,” Journal of the American Statistical Association, 98, 224–235.
Blackwell, D., and J. MacQueen (1973): “Ferguson distributions via Polya urn schemes,” The Annals of Statistics, 1, 353–355. Bush, C. A., and S. N. MacEachern (1996): “A semi-parametric Bayesian model for randomized block designs,” Biometrika, 83, 275–285. Card, D. (1995): “Using geographic variation in college proximity to estimate the return to schooling,” in Aspects of labor market behaviour: Essays in honour of John Vanderkamp, ed. by L. Christofides, E. Grant, and R. Swidinsky. University of Toronto Press, Toronto. Carota, C., and G. Parmigiani (2002): “Semiparametric regression for count data,” Biometrika, 89, 265–281. Chernozhukov, V., and C. Hansen (2004): “The impact of 401K participation on savings: an iv-qr analysis,” The Review of Economics and Statistics, 86, 735–751. (2005): “An iv model of quantile treatment effects,” Econometrica, 73, 245–261. Chib, S., and E. Greenberg (2005): “Analysis of additive instrumental variable models,” mimeo. Chib, S., and B. H. Hamilton (2000): “Bayesian analysis of cross-section and clustered data treatment models,” Journal of Econometrics, 97, 25 – 50. (2002): “Semiparametric Bayes analysis of longitudinal data treatment models,” Journal of Econometrics, 110, 67–89. Cox, D. R. (1958): The planning of experiments. Wiley, New York. Doksum, K. (1974): “Empirical probability plots and statistical inference for nonlinear models in the two sample case,” Annals of Statistics, 2, 267–277. Dunson, D. B., and N. Pillai (2004): “Bayesian density regression,” ISDS Discussion Paper 2004-33. Escobar, M. D. (1994): “Estimating normal means with a Dirichlet process prior,” Journal of the American Statistical Association, 89, 268–277.
Escobar, M. D., and M. West (1995): “Bayesian density estimation and inference using mixtures,” Journal of the American Statistical Association, 90, 577–588.
Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230.
Firpo, S. (2005): “Efficient semiparametric estimation of quantile treatment effects,” mimeo.
Fisher, R. A. (1935): Design of experiments. Oliver and Boyd.
Gelfand, A. E., D. Dey, and H. Chang (1992): “Model determination using predictive distributions with implementation via sampling based methods (with discussion),” in Bayesian Statistics 4, ed. by J. M. Bernardo, J. O. Berger, J. O. Dawid, and A. F. M. Smith, pp. 147–167. Oxford University Press.
Geman, S., and D. Geman (1984): “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Ghosh, J. K., and T. Samanta (2002): “Nonsubjective Bayes testing - an overview,” Journal of Statistical Planning and Inference, 103, 205–223.
Griffin, J. E., and M. F. J. Steel (2006): “Order-based dependent Dirichlet processes,” Journal of the American Statistical Association, 101, 179–194.
Hirano, K., G. W. Imbens, D. B. Rubin, and X.-H. Zhou (2000): “Assessing the effect of an influenza vaccine in an encouragement design,” Biostatistics, 1, 69–88.
Ibrahim, J. G., and K. P. Kleinman (1998): “Semiparametric Bayesian inference for random effect models,” in Practical nonparametric and semiparametric Bayesian statistics, ed. by D. Dey, P. Müller, and D. Sinha. Springer, New York.
Imbens, G. W., and J. D. Angrist (1994): “Identification and estimation of local average treatment effects,” Econometrica, 62, 467–475.
Imbens, G. W., and D. B. Rubin (1997): “Bayesian inference for causal effects in randomized experiments with noncompliance,” The Annals of Statistics, 25, 305–327.
Koop, G., and D. J. Poirier (1997): “Learning about the across-regime correlation in switching regression models,” Journal of Econometrics, 78, 217–227.
Lehmann, E. L. (1974): Nonparametrics: statistical methods based on ranks. Holden-Day, San Francisco.
Liu, J. S. (1994): “The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem,” Journal of the American Statistical Association, 89, 958–966.
MacEachern, S. (1999): “Dependent nonparametric processes,” in ASA Proceedings of the Section on Bayesian Statistical Science. American Statistical Association, Alexandria.
Neal, R. M. (2000): “Markov chain sampling methods for Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 9, 249–265.
Neyman, J. (1923): “Statistical problems in agricultural experiments,” Journal of the Royal Statistic Society, 2, 107–180.
Poirier, D. J., and J. L. Tobias (2003): “On the predictive distribution of outcome gains in the presence of an unidentified parameter,” Journal of Business and Economic Statistics, 21, 258–268.
Roy, A. (1951): “Some thoughts on the distribution of earnings,” Oxford Economic Papers, 3, 135–146.
Rubin, D. B. (1974): “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, 66, 688–701.
Tanner, M. A., and W. Wong (1987): “The calculation of posterior distributions by data augmentation (with discussion),” Journal of the American Statistical Association, 82, 528–550.
Walker, S. G., P. Damien, P. W. Laud, and A. F. M. Smith (1999): “Bayesian nonparametric inference for random distributions and related functions (with discussion),” Journal of the Royal Statistical Society, Series B, 61, 485–527.
Walker, S. G., and B. K. Mallick (1997): “A note on the scale parameter of the Dirichlet process,” Canadian Journal of Statistics, 25, 473–479.
Wooldridge, J. M. (2000): Introductory econometrics: A modern approach. South-Western College Publishing, Cincinnati.
Complete Bibliography
Abadie, A., J. Angrist, and G. Imbens (2002): “Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings,” Econometrica, 70, 91–117.
Antoniak, C. E. (1974): “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems,” The Annals of Statistics, 2, 1152–1174.
Basu, S., and S. Chib (2003): “Marginal likelihood and Bayes factors for Dirichlet process mixture models,” Journal of the American Statistical Association, 98, 224–235.
Bernardo, J. M., and A. F. M. Smith (1994): Bayesian theory. Wiley, New York.
Blackwell, D., and J. MacQueen (1973): “Ferguson distributions via Polya urn schemes,” The Annals of Statistics, 1, 353–355.
Blei, D., and M. Jordan (2004): “Variational methods for the Dirichlet process,” in Proceedings of the 21st International Conference on Machine Learning.
Blundell, R., and M. Costa Dias (2002): “Alternative approaches to evaluation in empirical microeconomics,” Portuguese Economic Journal, 1, 91–115.
Bush, C. A., and S. N. MacEachern (1996): “A semi-parametric Bayesian model for randomized block designs,” Biometrika, 83, 275–285.
Card, D. (1995): “Using geographic variation in college proximity to estimate the return to schooling,” in Aspects of labor market behaviour: Essays in honour of John Vanderkamp, ed. by L. Christofides, E. Grant, and R. Swidinsky. University of Toronto Press, Toronto.
Card, D., and A. B. Krueger (1994): “Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania,” American Economic Review, 84, 772–793.
Carota, C., and G. Parmigiani (2002): “Semiparametric regression for count data,” Biometrika, 89, 265–281.
Chen, M. H., Q. M. Shao, and J. G. Ibrahim (2000): Monte Carlo methods in Bayesian computation. Springer, New York.
Chernozhukov, V., and C. Hansen (2004): “The impact of 401K participation on savings: an iv-qr analysis,” The Review of Economics and Statistics, 86, 735–751.
Chernozhukov, V., and C. Hansen (2005): “An iv model of quantile treatment effects,” Econometrica, 73, 245–261.
Chib, S. (2003): “On inferring effects of binary treatments with unobserved confounders (with discussion),” in Bayesian Statistics 7, ed. by J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, pp. 66–84. Oxford University Press, Oxford.
Chib, S., and E. Greenberg (2005): “Analysis of additive instrumental variable models,” mimeo.
Chib, S., E. Greenberg, and R. Winkelmann (1998): “Posterior simulation and Bayes factors in panel count data models,” Journal of Econometrics, 86, 33–54.
Chib, S., and B. H. Hamilton (2000): “Bayesian analysis of cross-section and clustered data treatment models,” Journal of Econometrics, 97, 25–50.
Chib, S., and B. H. Hamilton (2002): “Semiparametric Bayes analysis of longitudinal data treatment models,” Journal of Econometrics, 110, 67–89.
Chib, S., and R. Winkelmann (2001): “Markov chain Monte Carlo analysis of correlated count data,” Journal of Business and Economic Statistics, 19, 428–435.
Cifarelli, D. M., and E. Melilli (2000): “Some new results for Dirichlet priors,” The Annals of Statistics, 28, 1390–1413.
Cifarelli, D. M., and E. Regazzini (1978): “Problemi statistici non parametrici in condizioni di scambialbilita parziale e impiego di medie associative,” Annali del Instituto di Matematica Finianziara dell Università di Torino, Serie III, 12, 1–36.
Cochran, W. G., and D. B. Rubin (1973): “Controlling bias in observational studies: A review,” Sankhya, Ser. A, 35, 417–446.
Cowles, M. K., B. P. Carlin, and J. E. Connett (1996): “Bayesian tobit modeling of longitudinal ordinal clinical compliance data with nonignorable missingness,” Journal of the American Statistical Association, 91, 86–98.
Cox, D. R. (1958): The planning of experiments. Wiley, New York.
Deb, P., and P. Trivedi (1997): “Demand for medical care by the elderly: A finite mixture approach,” Journal of Applied Econometrics, 12, 313–336.
Deb, P., and P. Trivedi (2002): “The structure of demand for health care: Latent class versus two-part model,” Journal of Health Economics, 21, 601–625.
Doksum, K. (1974): “Empirical probability plots and statistical inference for nonlinear models in the two sample case,” Annals of Statistics, 2, 267–277.
Dunson, D. B., and N. Pillai (2004): “Bayesian density regression,” ISDS Discussion Paper 2004-33.
Escobar, M. D. (1994): “Estimating normal means with a Dirichlet process prior,” Journal of the American Statistical Association, 89, 268–277.
Escobar, M. D., and M. West (1995): “Bayesian density estimation and inference using mixtures,” Journal of the American Statistical Association, 90, 577–588.
Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230.
Firpo, S. (2005): “Efficient semiparametric estimation of quantile treatment effects,” mimeo.
Fisher, R. A. (1935): Design of experiments. Oliver and Boyd.
Gelfand, A. E., D. Dey, and H. Chang (1992): “Model determination using predictive distributions with implementation via sampling based methods (with discussion),” in Bayesian Statistics 4, ed. by J. M. Bernardo, J. O. Berger, J. O. Dawid, and A. F. M. Smith, pp. 147–167. Oxford University Press.
Gelfand, A. E., and A. Kottas (2002): “A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 11, 289–305.
Gelfand, A. E., and A. F. M. Smith (1990): “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, 85, 398–409.
Geman, S., and D. Geman (1984): “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Ghosh, J. K., and R. V. Ramamoorthi (2003): Bayesian nonparametrics. Springer, New York.
Ghosh, J. K., and T. Samanta (2002): “Nonsubjective Bayes testing - an overview,” Journal of Statistical Planning and Inference, 103, 205–223.
Gilks, W. R., and P. Wild (1992): “Adaptive rejection sampling for Gibbs sampling,” Applied Statistics, 41, 337–348.
Greene, W. H. (2001): “FIML estimation of sample selection models for count data,” in Economic Theory, Dynamics and Markets, Essays in Honor of Ryuzo Sato, ed. by T. Negishi, R. V. Ramachandran, and K. Mino. Kluwer, Boston.
Griffin, J. E., and M. F. J. Steel (2004): “Semiparametric Bayesian inference for stochastic frontier models,” Journal of Econometrics, 123, 121–152.
Griffin, J. E., and M. F. J. Steel (2006): “Order-based dependent Dirichlet processes,” Journal of the American Statistical Association, 101, 179–194.
Gurmu, S. (1997): “Semiparametric estimation of hurdle regression models with an application to Medicaid utilization,” Journal of Applied Econometrics, 12, 225–242.
Hausman, J. A., and D. A. Wise (1985): Social experimentation, NBER conference report. University of Chicago Press, Chicago.
Heckman, J. (1997): “Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations,” Journal of Human Resources, 32, 441–462.
Heckman, J., R. LaLonde, and J. Smith (1999): “The economics and econometrics of active labour market programs,” in Handbook of Labour Economics, Volume 3, ed. by O. Ashenfelter, and D. Card. Elsevier, Amsterdam.
Heckman, J., and B. Singer (1984): “A method for minimizing the impact of distributional assumptions in econometric models of duration,” Econometrica, 52, 271–320.
Heckman, J., and E. Vytlacil (2007): “Econometric evaluation of social programs,” in Handbook of Econometrics, Volume 6, ed. by J. Heckman, and E. Leamer. North Holland, Amsterdam.
Hirano, K., G. W. Imbens, D. B. Rubin, and X.-H. Zhou (2000): “Assessing the effect of an influenza vaccine in an encouragement design,” Biostatistics, 1, 69–88.
Ibrahim, J. G., and K. P. Kleinman (1998): “Semiparametric Bayesian inference for random effect models,” in Practical nonparametric and semiparametric Bayesian statistics, ed. by D. Dey, P. Müller, and D. Sinha. Springer, New York.
Imbens, G. W., and J. D. Angrist (1994): “Identification and estimation of local average treatment effects,” Econometrica, 62, 467–475.
Imbens, G. W., and D. B. Rubin (1997): “Bayesian inference for causal effects in randomized experiments with noncompliance,” The Annals of Statistics, 25, 305–327.
Ishwaran, H., and L. F. James (2002): “Approximate Dirichlet process computing in finite normal mixtures: smoothing and prior information,” Journal of Computational and Graphical Statistics, 11, 508–532.
Jimenez-Martin, S., J. Labeaga, and M. Martinez-Granado (2002): “Latent class versus two-part models in the demand for physician services across the European Union,” Health Economics, 11, 301–321.
Koop, G. (2003): Bayesian econometrics. Wiley, Chichester.
Koop, G., and D. J. Poirier (1997): “Learning about the across-regime correlation in switching regression models,” Journal of Econometrics, 78, 217–227.
Kozumi, H. (2002): “A Bayesian analysis of endogenous switching models for count data,” Journal of the Japan Statistical Society, 32, 141–154.
Lee, M.-J. (2004): “Selection correction and sensitivity analysis for ordered treatment effect on count response,” Journal of Applied Econometrics, 19, 323–337.
Lehmann, E. L. (1974): Nonparametrics: statistical methods based on ranks. Holden-Day, San Francisco.
Li, M., D. Poirier, and J. Tobias (2004): “Do dropouts suffer from dropping out? Estimation and prediction of outcome gains in generalized selection models,” Journal of Applied Econometrics, 9, 203–225.
Liu, J. S. (1994): “The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem,” Journal of the American Statistical Association, 89, 958–966.
Liu, J. S. (2001): Monte Carlo strategies in scientific computing. Springer, New York.
López-Nicolás, A. (1998): “Unobserved heterogeneity and censoring in the demand for private health care,” Health Economics, 7, 429–437.
MacEachern, S. (1999): “Dependent nonparametric processes,” in ASA Proceedings of the Section on Bayesian Statistical Science. American Statistical Association, Alexandria.
MacEachern, S. N., M. Clyde, and J. S. Liu (1999): “Sequential importance sampling for nonparametric Bayes models: The next generation,” Canadian Journal of Statistics, 27, 251–267.
MacEachern, S. N., and P. Müller (1998): “Estimating mixtures of Dirichlet process models,” Journal of Computational and Graphical Statistics, 7, 223–238.
MacEachern, S. N., and P. Müller (2000): “Efficient MCMC schemes for robust model extensions using encompassing Dirichlet process mixture models,” in Robust Bayesian analysis, ed. by F. Ruggeri, and D. Ríos-Insúa. Springer.
Meer, J., and H. S. Rosen (2004): “Insurance and the utilization of medical services,” Social Science and Medicine, 58, 1623–1632.
Miller, R. H., and H. S. Luft (1994): “Managed care plan performance since 1980,” The Journal of the American Medical Association, 271, 1512–1519.
Mullahy, J. (1986): “Specification and testing in some modified count data models,” Journal of Econometrics, 33, 341–365.
Müller, P., and F. A. Quintana (2004): “Nonparametric Bayesian data analysis,” Statistical Science, 19, 95–110.
Munkin, M. K., and P. K. Trivedi (2003): “Bayesian analysis of a self-selection model with multiple outcomes using simulation-based estimation: an application to the demand for healthcare,” Journal of Econometrics, 114, 197–220.
Neal, R. M. (2000): “Markov chain sampling methods for Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 9, 249–265.
Neyman, J. (1923): “Statistical problems in agricultural experiments,” Journal of the Royal Statistic Society, 2, 107–180.
Pohlmeier, W., and V. Ulrich (1995): “An econometric model of the two-part decisionmaking process in the demand for health care,” The Journal of Human Resources, 30, 339–361.
Poirier, D. J., and J. L. Tobias (2003): “On the predictive distribution of outcome gains in the presence of an unidentified parameter,” Journal of Business and Economic Statistics, 21, 258–268.
Richardson, S., V. Viallefont, and P. J. Green (2002): “Bayesian analysis of poisson mixtures,” Journal of Nonparametric Statistics, 14, 181–202.
Riphahn, R. T., A. Wambach, and A. Million (2003): “Incentive effects in the demand for health care: A bivariate panel count data estimation,” Journal of Applied Econometrics, 18, 387–405.
Robert, C. P., and G. Casella (1999): Monte Carlo statistical methods. Springer, New York.
Romeu, A., and M. Vera-Hernández (2005): “Counts with an endogenous binary regressor: A series expansion approach,” Econometrics Journal, 8, 1–22.
Rosenbaum, P. R., and D. B. Rubin (1983): “The central role of the propensity score in observational studies for causal effects,” Biometrika, 1, 41–55.
Roy, A. (1951): “Some thoughts on the distribution of earnings,” Oxford Economic Papers, 3, 135–146.
Rubin, D. B. (1974): “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, 66, 688–701.
Sethuraman, J. (1994): “A constructive definition of Dirichlet priors,” Statistica Sinica, 4, 639–650.
SOEP Group (2001): “The German Socio-Economic Panel (GSOEP) after more than 15 years - Overview,” in Proceedings of the 2000 Fourth International Conference of German Socio-Economic Panel Study Users (GSOEP2000), ed. by E. Holst, D. R. Lillard, and T. A. DiPrete, vol. 70 of Vierteljahrshefte zur Wirtschaftsforschung, pp. 7–14.
Tanner, M. A., and W. Wong (1987): “The calculation of posterior distributions by data augmentation (with discussion),” Journal of the American Statistical Association, 82, 528–550.
Terza, J. V. (1998): “Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects,” Journal of Econometrics, 84, 129–154.
Terza, J. V., and P. W. Wilson (1990): “Analyzing frequencies of several types of events: A mixed multinomial-poisson approach,” The Review of Economics and Statistics, 72, 108–15.
Van Ophem, H. (2000): “Modeling selectivity in count-data models,” Journal of Business and Economic Statistics, 18, 503–511.
Verbeke, G., and E. Lesaffre (1996): “A linear mixed-effects model with heterogeneity in the random-effects population,” Journal of the American Statistical Association, 91, 217–221.
Verdinelli, I., and L. Wasserman (1995): “Computing Bayes factors using a generalization of the Savage-Dickey density ratio,” Journal of the American Statistical Association, 90, 614–618.
Vijverberg, W. P. M. (1993): “Measuring the unidentified parameter of the extended Roy model of selectivity,” Journal of Econometrics, 57, 69–89.
Walker, S. G., P. Damien, P. W. Laud, and A. F. M. Smith (1999): “Bayesian nonparametric inference for random distributions and related functions (with discussion),” Journal of the Royal Statistical Society, Series B, 61, 485–527.
Walker, S. G., and B. K. Mallick (1997): “A note on the scale parameter of the Dirichlet process,” Canadian Journal of Statistics, 25, 473–479.
Wooldridge, J. M. (2000): Introductory econometrics: A modern approach. South-Western College Publishing, Cincinnati.
Zeger, S. L., and M. R. Karim (1991): “Generalized linear models with random effects: A Gibbs sampling approach,” Journal of the American Statistical Association, 86, 79–86.
Zimmer, D. M., and P. K. Trivedi (2006): “Using trivariate copulas to model sample selection and treatment effects: Application to family health care demand,” Journal of Business and Economic Statistics, 24, 63–76.
Declaration
I hereby declare that I have prepared the present thesis, entitled Three Essays on Bayesian Nonparametric Modeling in Microeconometrics, without impermissible help from third parties and without using any aids other than those stated. Data and concepts taken directly or indirectly from other sources are marked with a reference to the source. No other persons, in particular doctoral consultants, were involved in the substantive preparation of this thesis (see the delimitation on the following page). The thesis has not previously been submitted, in the same or a similar form, to any other examination authority in Germany or abroad.
Konstanz, 26 July 2006
Markus Jochmann
Delimitation
I hereby declare that I have prepared the present thesis without help from third parties and without using any aids other than those stated. The exception is Chapter 2 (Estimating the Demand for Health Care with Panel Data: A Semiparametric Bayesian Approach). For this essay, I prepared the data and programmed the MCMC algorithm that was used. Otherwise, the essay is the result of joint work with Roberto León-González.
Konstanz, 26 July 2006
Markus Jochmann