CORELA LABORATOIRE DE RECHERCHE SUR LA CONSOMMATION INRA – 65, BOULEVARD DE BRANDEBOURG – 94205 IVRY-SUR-SEINE CEDEX TÉL : 33 (0) 1 49 59 69 25 – FAX : 33 (0) 1 49 59 69 90 www.ivry.inra.fr/corela
Estimating Demand Response with Panel Data. Sébastien Lecocq, Jean-Marc Robin. Working Paper (Document de travail) No. 05-03
INSTITUT NATIONAL DE LA RECHERCHE AGRONOMIQUE Département d’Economie et Sociologie Rurales
Estimating Demand Response with Panel Data*
Sébastien Lecocq†
Jean-Marc Robin‡
January 2004 Revised, June 2005
Abstract

In this paper, we extend to panel data the Iterated Linear Least Squares estimator of Blundell and Robin (1999). It is shown to be consistent when total expenditure and regression residuals are correlated, either because of simultaneity or because of unobserved heterogeneity. We propose separate tests for these two effects. Monte Carlo experiments are then conducted and the estimator is applied to data drawn from a French Consumer Panel.

Keywords: Conditional linearity, Demand analysis, Panel data, Unobserved heterogeneity.
J.E.L. Classification: C33, D12.
* We thank C. Boizot for helpful assistance in formatting the data. We are also grateful to P. Bertail, R. Blundell, A. Chesher, F. Gardes and two anonymous referees for their comments.
† INRA-CORELA, 65 boulevard de Brandebourg, 94205 Ivry-sur-Seine Cedex, France. E-mail: [email protected].
‡ EUREQUA, University of Paris I, 106-112 boulevard de l’Hôpital, 75647 Paris Cedex 13, France, and CREST-INSEE. E-mail: [email protected].
1 Introduction
Ordinary Least Squares (linear or nonlinear) generally does not provide a consistent estimator of demand systems because of the potential endogeneity of total expenditure on the right-hand side of the system of demand equations. This source of potential bias is usually controlled for by means of instrumental variable techniques, using income as an instrument for total expenditure. Yet this estimation procedure is not consistent if some unobserved heterogeneity component is responsible for the cross-sectional dispersion of both income and preferences. Empirical evidence of such biases has recently been provided by a series of studies on panel and pseudo-panel data (see for instance Gardes et al., 1996, 1998).

Panel data provide the best response to unobserved heterogeneity. They permit the identification of models when unobserved heterogeneity is correlated with conditioning variables under considerably less stringent assumptions than those required by instrumental variable estimators. In the case of linear models, simple linear transformations of the original model are usually sufficient to get rid of the unobserved heterogeneity component (differences between the variables and their individual means, first differences, etc.), and consistent estimates can then be obtained by standard least squares techniques.

However, for nonlinear models, such transformations generally do not exist. Unfortunately, nonlinearity (with respect to both parameters and variables) is a typical ingredient of theoretical demand systems (the Almost Ideal and Translog demand models, and the Linear Expenditure System are of this sort). Although nonlinear, parametric demand systems generally share a common property: they are conditionally linear, that is, they are linear in all the parameters conditional on a set of functions of explanatory variables and parameters themselves.
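To make the linear case mentioned above concrete, the following Python sketch simulates a linear panel model in which the regressor is correlated with an individual effect, and compares pooled OLS with the estimator based on deviations from individual means. The design (sample sizes, slope, variances) and all function names are illustrative assumptions, not taken from the paper.

```python
import random


def pooled_slope(y, x):
    """OLS slope of y on x (with intercept), ignoring the panel structure."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(y, x, ids):
    """OLS slope after subtracting each individual's time mean from y and x."""
    totals = {}
    for yi, xi, h in zip(y, x, ids):
        sy, sx, n = totals.get(h, (0.0, 0.0, 0))
        totals[h] = (sy + yi, sx + xi, n + 1)
    sxy = sxx = 0.0
    for yi, xi, h in zip(y, x, ids):
        sy, sx, n = totals[h]
        dy, dx = yi - sy / n, xi - sx / n
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


# Hypothetical design: y = beta * x + effect + noise, with x correlated with the effect.
rng = random.Random(0)
H, T, beta = 500, 5, 0.5
y, x, ids = [], [], []
for h in range(H):
    effect = rng.gauss(0.0, 1.0)            # unobserved individual effect
    for _ in range(T):
        xi = effect + rng.gauss(0.0, 1.0)   # regressor correlated with the effect
        y.append(beta * xi + effect + 0.1 * rng.gauss(0.0, 1.0))
        x.append(xi)
        ids.append(h)

b_pooled = pooled_slope(y, x)
b_within = within_slope(y, x, ids)
print(f"pooled OLS: {b_pooled:.3f}, within: {b_within:.3f}, true: {beta}")
```

In this design the pooled slope absorbs the correlation between the regressor and the individual effect, while the within transformation removes the effect before estimation. No such transformation is available once the effect enters the model nonlinearly.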
Browning and Meghir (1991) exploited this conditional linearity to construct a simple Iterated Linear Least Squares (ILLS) estimator for the Almost Ideal demand model, and Blundell and Robin (1999) generalized the estimator and derived the conditions for consistency and asymptotic normality. Blundell and Robin also show how to account for the endogeneity of total expenditure using instrumental variables and augmented regression techniques. This estimator is definitely a preferred alternative to Nonlinear Three-Stage Least Squares (N3SLS) for large demand systems. To understand the dimensionality problem, note that N3SLS would also be obtained by iterating a series of linear regressions, yet on a first-order linear expansion of the model. If there are N commodities and N equations, the linear expansion is a system of N equations with a number of parameters of the order of $N^2$ in each equation. In contrast, the ILLS estimator is obtained by iterating ordinary least squares applied to a system of N equations, each with approximately N parameters.

The purpose of this paper is to extend the Iterated Linear Least Squares estimator of Blundell and Robin (1999) to panel data. In section 2 we develop a consistent and asymptotically normal ILLS estimator for conditionally linear demand systems which applies to situations where total expenditure and regression residuals are correlated either because of simultaneity (common shocks determine both taste and total expenditure changes), or because of unobserved heterogeneity. The estimator's principle is based on ideas similar to those used in the seminal paper of Mundlak (1978). Monte Carlo experiments are presented in section 3. They show that our estimator is consistent when unobserved heterogeneity is responsible for a correlation between total expenditure and the error terms, while that used in Blundell and Robin potentially induces important biases in parameter estimates. Section 4 of the paper proceeds to an application on data drawn from the SECODIP French Consumer Panel. The main lesson of this application is good news for standard practice. It shows that although we definitely detect the presence of unobserved heterogeneity correlating total expenditure with the error terms, this correlation is not strong enough to bias the usual cross-sectional instrumental variable estimates in any significant way. Section 5 concludes.
2 Estimation procedure
We consider a demand model of N “conditionally linear” equations. By conditionally linear, we mean that the equations are linear in all the parameters conditional on the same functions of explanatory variables and parameters themselves. Specifically, we suppose that the N dependent variables $y_t^h = (y_{1t}^h, \dots, y_{Nt}^h)'$ are indexed by two indices, $h = 1, \dots, H$ and $t = 1, \dots, T$ (individual, time), and are related to a vector $x_t^h$ of $n_x$ conditioning variables by the following system of equations

$$y_{it}^h = g(x_t^h, \theta)'\theta_i + u_{it}^h, \qquad i = 1, \dots, N, \tag{1}$$

with $\theta_i \in \mathbb{R}^{n_g}$ a vector of parameters, $\theta = (\theta_1', \dots, \theta_N')'$, $g(\cdot, \cdot)$ an $n_g$-vector of functions of $x_t^h$ and $\theta$, and $u_{it}^h$ an error term. Examples of such a model are the Almost Ideal and Translog demand models, and the Linear Expenditure System.

Blundell and Robin (1999) analyze an attractive Iterated Linear Least Squares (ILLS) estimator which exploits the conditional linearity of (1). This estimator consists of the following series of iterations: update each iteration value $\theta^n$ by the coefficient $\theta^{n+1}$ of the regression of $y_t^h$ on $g(x_t^h, \theta^n)$. If numerical convergence occurs, then it can be shown that this procedure yields a consistent and asymptotically normal estimator of $\theta$. One important feature of the ILLS estimator is that endogenous regressors can be controlled for, using the limited information augmented regression technique of Hausman (1978) and Holly and Sargan (1982).

Now suppose that a subset of $n_{ex} \le n_x$ conditioning variables in $x_t^h$ can be correlated
with residuals $u_{it}^h$ not only because of common shocks but also because of unobserved heterogeneity. To put it precisely, consider the Quadratic Almost Ideal (QAI) demand model of Banks et al. (1997), where the budget share $y_{it}^h$ of good i for household h at time t, with log total expenditure $m_t^h$ and N-vector of log prices $p_t^h$, is given by

$$y_{it}^h = \alpha_i + \gamma_i' p_t^h + \beta_i \left(m_t^h - a(p_t^h, \theta)\right) + \lambda_i \frac{\left(m_t^h - a(p_t^h, \theta)\right)^2}{b(p_t^h, \theta)} + u_{it}^h, \tag{2}$$

with the following nonlinear price aggregators

$$a(p_t^h, \theta) = \alpha' p_t^h + \tfrac{1}{2}\, p_t^{h\prime}\, \Gamma\, p_t^h, \qquad b(p_t^h, \theta) = \exp(\beta' p_t^h),$$

where $\alpha = (\alpha_1, \dots, \alpha_N)$ is a vector of taste shifters, $\Gamma = (\gamma_1, \dots, \gamma_N)'$ and $\beta = (\beta_1, \dots, \beta_N)$ are respectively a matrix and a vector of parameters, $\theta$ is the set of all parameters, and $u_{it}^h$ is a measurement error (due for example to purchase infrequency) correlated
with $m_t^h$. Household heterogeneity is introduced in the system through the intercepts ($\alpha$), which incorporate observed heterogeneity, modelled as linear combinations of socio-demographic variables observed in the data (age and profession of the household's head, composition of the household, etc.), and unobserved heterogeneity:

$$\alpha_t^h = A s_t^h + \tilde{\alpha}^h, \qquad A = (\alpha_i'),$$

with $s_t^h$ a vector of socio-demographic variables and $\tilde{\alpha}^h$ an N-vector of specific effects, that is, fixed or random effects. Note that modelling heterogeneity in this way does not mean that heterogeneity enters the model linearly. As can easily be seen in (2), it appears not only linearly in the intercepts but also nonlinearly in all expenditure terms through the first price aggregator. There are only two other places where heterogeneity could be introduced in a nonlinear way, namely price parameters ($\gamma$) and/or expenditure parameters ($\beta$ and $\lambda$). However, there are usually not sufficient price variations in the data to enable the estimation of heterogeneous price effects, and heterogeneity in expenditure parameters would generate products of two heterogeneity components (one in the parameter and one in the price aggregator), which would make the estimation much more complicated.

Total expenditure can be decomposed as

$$m_t^h = m_0' r_t^h + \tilde{m}^h + v_t^h, \tag{3}$$
where $\tilde{m}^h$ is a specific effect, $v_t^h$ is an idiosyncratic shock correlated with $u_t^h$, $m_0$ is a vector of parameters, and $r_t^h = (s_t^{h\prime}, p_t^{h\prime}, z_t^{h\prime})'$, with $z_t^h$ a set of identifying instruments for $m_t^h$, household income for instance, which can also be decomposed into an individual-specific effect, $\tilde{z}^h$, and a shock, $w_t^h$:

$$z_t^h = \tilde{z}^h + w_t^h.$$
In this model, $m_t^h$ can be correlated with the residuals of the demand system, either because the common shocks $v_t^h$ and $u_t^h$ are correlated, or because the specific effects $\tilde{m}^h$ and $\tilde{\alpha}^h$ are correlated, or both. Note also that, because of unobserved heterogeneity, $z_t^h$ may itself be correlated with the error terms in (2) and/or (3), i.e. $\tilde{z}^h$ and $\tilde{\alpha}^h$ may be correlated and/or $\tilde{z}^h$ and $\tilde{m}^h$ may be correlated, respectively.

In general, panel data help to reduce the biases due to unobserved heterogeneity, and instrumental variables (IV) can be used to control for simultaneity. In the case of linear models, traditional estimation methods are based on linear transformations of the original model (first differences, for example). These methods are a very convenient way to estimate models that are linear with respect to the unobserved heterogeneity component, since simple linear transformations are sufficient to get rid of specific effects and consistent estimates can then be obtained by standard least squares techniques. However, they cannot be used when unobserved heterogeneity enters the model nonlinearly. For instance, in the nonlinear model (2), specific effects cannot all be eliminated by any transformation. The remainder of the section proposes a method for reducing the biases induced by the two potential sources of endogeneity of total outlay in the case of models of the form of (2), which are nonlinear with respect to total expenditure.

Following Mundlak (1978), we shall assume that the correlation between $\tilde{m}^h$ and $\tilde{z}^h$ can be fully accounted for by the linear regression of $m_t^h$ on the mean of $z_t^h$ over time, $z_\bullet^h = \frac{1}{T}\sum_{t=1}^{T} z_t^h$. Although this approach does not rule out the case where some other unobserved factors would be correlated with other variables, it is the most attractive since it captures the main correlation in a very simple way.¹ Specifically,

$$m_t^h = m_0' r_t^h + \tau' z_\bullet^h + v_t^h = C \mathbf{z}_t^h + v_t^h,$$

where $\mathbf{z}_t^h = (r_t^{h\prime}, z_\bullet^{h\prime})'$, $C = (m_0', \tau')$, and $v_t^h$ is such that

$$E[v_t^h \mid \mathbf{z}_1^h, \dots, \mathbf{z}_T^h] = 0.$$

Symmetrically, we assume that the correlation between $\tilde{\alpha}^h$, on the one hand, and $\tilde{m}^h$ and $\tilde{z}^h$, on the other, can be fully accounted for by the introduction of $m_\bullet^h = \frac{1}{T}\sum_{t=1}^{T} m_t^h$ and $z_\bullet^h$ as additional regressors in the demand system, that is

$$\tilde{\alpha}^h = \phi\, m_\bullet^h + \Delta z_\bullet^h, \qquad \phi = (\phi_i), \quad \Delta = (\delta_i').$$

¹ Chamberlain's (1984) approach would be more general, but it would also be parametrically very demanding without guaranteeing better results.
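The Mundlak device used above can be illustrated in the simplest linear setting. The sketch below, in Python with a small hand-written OLS routine, adds the individual time mean of the regressor to a pooled regression and checks that the resulting slope coincides with the within estimator on a balanced panel. The data-generating design and all names (`solve`, `ols`, `phi`, etc.) are illustrative assumptions; the paper applies the same idea to the nonlinear QAI system, which this sketch does not attempt.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical design: the individual effect is linear in the time mean of x.
rng = random.Random(1)
H, T, beta, phi = 300, 4, 0.5, 0.8
data = []  # tuples (x, individual mean of x, y)
for h in range(H):
    c = rng.gauss(0.0, 1.0)
    xs = [c + rng.gauss(0.0, 1.0) for _ in range(T)]
    xbar = sum(xs) / T
    effect = phi * xbar + 0.2 * rng.gauss(0.0, 1.0)
    for xi in xs:
        data.append((xi, xbar, beta * xi + effect + 0.1 * rng.gauss(0.0, 1.0)))

# Mundlak regression: y on a constant, x, and the individual mean of x.
X = [[1.0, xi, xbar] for xi, xbar, _ in data]
y = [yi for _, _, yi in data]
const, b_x, b_mean = ols(X, y)

# Within estimator for comparison.
b_within = (sum((xi - xbar) * yi for xi, xbar, yi in data)
            / sum((xi - xbar) ** 2 for xi, xbar, _ in data))
print(f"Mundlak slope: {b_x:.4f}, within slope: {b_within:.4f}, mean coefficient: {b_mean:.3f}")
```

On a balanced panel the two slopes agree up to rounding error. The coefficient on the individual mean plays the role of $\phi$ and $\Delta$ in the demand system, so testing whether it is zero is a test for correlated heterogeneity.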
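Finally, common shocks are controlled for by assuming the following orthogonal decomposition

$$u_t^h = \rho\, v_t^h + \varepsilon_t^h, \qquad \rho = (\rho_i),$$

where $\varepsilon_t^h$ is such that

$$E[\varepsilon_t^h \mid \mathbf{z}_1^h, \dots, \mathbf{z}_T^h, v_1^h, \dots, v_T^h] = 0, \qquad E[\varepsilon_t^h \varepsilon_t^{h\prime} \mid \mathbf{z}_1^h, \dots, \mathbf{z}_T^h, v_1^h, \dots, v_T^h] = \Sigma.$$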
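The orthogonal decomposition above is what makes the augmented regression work: once the reduced-form residual is added as a regressor, the remaining error is uncorrelated with total expenditure. The Python sketch below shows this in a single linear equation with one instrument. The coefficients, sample size and variable names are illustrative assumptions, and the setting is far simpler than the QAI system of the paper.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical design: m = 1 + 0.8 z + v and u = rho * v + eps, so m is endogenous.
rng = random.Random(2)
n, beta, rho = 5000, 0.5, 0.3
z, m, y = [], [], []
for _ in range(n):
    zi, vi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    mi = 1.0 + 0.8 * zi + vi
    ui = rho * vi + 0.1 * rng.gauss(0.0, 1.0)
    z.append(zi)
    m.append(mi)
    y.append(0.2 + beta * mi + ui)

# Step 1: reduced form of m on the instrument, keep the residual v_hat.
c0, c1 = ols([[1.0, zi] for zi in z], m)
v_hat = [mi - c0 - c1 * zi for mi, zi in zip(m, z)]

# Step 2: augmented regression of y on m and v_hat.
_, beta_cf, rho_hat = ols([[1.0, mi, vi] for mi, vi in zip(m, v_hat)], y)

# Benchmarks: naive OLS and the simple IV estimator.
_, beta_ols = ols([[1.0, mi] for mi in m], y)
z_mean = sum(z) / n
beta_iv = (sum((zi - z_mean) * yi for zi, yi in zip(z, y))
           / sum((zi - z_mean) * mi for zi, mi in zip(z, m)))
print(f"OLS: {beta_ols:.3f}, augmented: {beta_cf:.3f}, IV: {beta_iv:.3f}, rho_hat: {rho_hat:.3f}")
```

In this linear, just-identified case the augmented regression reproduces the IV estimate exactly, and the coefficient on the residual estimates $\rho$, so its significance can serve as an exogeneity test in the spirit of Hausman (1978).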
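Under these assumptions, model (2) can be written under the general formulation (1), with $n_{ex} = 1$, $x_t^h = (s_t^h, m_\bullet^h, z_\bullet^h, p_t^h, m_t^h)$,

$$\begin{aligned}
g(x_t^h, \theta)'\theta_i = {} & \alpha_i' s_t^h + \phi_i m_\bullet^h + \delta_i' z_\bullet^h + \gamma_i' p_t^h \\
& + \beta_i \left[ m_t^h - p_t^{h\prime}\left(A s_t^h + \phi\, m_\bullet^h + \Delta z_\bullet^h\right) - \tfrac{1}{2}\, p_t^{h\prime}\, \Gamma\, p_t^h \right] \\
& + \lambda_i\, \frac{\left[ m_t^h - p_t^{h\prime}\left(A s_t^h + \phi\, m_\bullet^h + \Delta z_\bullet^h\right) - \tfrac{1}{2}\, p_t^{h\prime}\, \Gamma\, p_t^h \right]^2}{\exp(\beta' p_t^h)},
\end{aligned}$$

and $u_{it}^h = \rho_i' v_t^h + \varepsilon_{it}^h$. The ILLS estimation of $\theta$ is then based on the following set of identifying restrictions

$$E\left\{ \begin{pmatrix} g(x_t^h, \theta) \\ v_t^h \end{pmatrix} \left[ y_{it}^h - g(x_t^h, \theta)'\theta_i - \rho_i' v_t^h \right] \right\} = 0, \qquad i = 1, \dots, N. \tag{4}$$

Practically, we estimate C by Ordinary Least Squares (OLS),

$$\hat{C} = \left( \sum_{h=1}^{H} \sum_{t=1}^{T} m_t^h \mathbf{z}_t^{h\prime} \right) \left( \sum_{h=1}^{H} \sum_{t=1}^{T} \mathbf{z}_t^h \mathbf{z}_t^{h\prime} \right)^{-1} = C + \left( \sum_{h=1}^{H} \sum_{t=1}^{T} v_t^h \mathbf{z}_t^{h\prime} \right) \left( \sum_{h=1}^{H} \sum_{t=1}^{T} \mathbf{z}_t^h \mathbf{z}_t^{h\prime} \right)^{-1},$$

and the ILLS estimator is obtained as a solution to the empirical counterpart of (4),

$$\sum_{h=1}^{H} \sum_{t=1}^{T} \begin{pmatrix} g(x_t^h, \hat{\theta}) \\ \hat{v}_t^h \end{pmatrix} \left[ y_{it}^h - g(x_t^h, \hat{\theta})'\hat{\theta}_i - \hat{\rho}_i' \hat{v}_t^h \right] = 0,$$

where, since total expenditure is the only endogenous conditioning variable,

$$\hat{v}_t^h = m_t^h - \hat{C} \mathbf{z}_t^h = v_t^h - (\hat{C} - C)\, \mathbf{z}_t^h.$$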
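Note that apart from the use of $z_\bullet^h$ in addition to $z_t^h$ as identifying instruments, and of $m_\bullet^h$ and $z_\bullet^h$ as independent variables in the demand equations, this estimator is not different from the one developed for cross-section data. The asymptotic result derived by Blundell and Robin therefore still applies. However, with panel data, the fact that the error terms $\varepsilon_t^h$ are correlated not only across equations but also across time implies that the variance-covariance matrix of the ILLS estimator takes a less compact form. Under a set of regularity assumptions, it can be shown that

$$\sqrt{H}\,(\hat{b} - b) \rightsquigarrow N\left(0,\; J^{-1}\, \Omega\, (J')^{-1}\right), \tag{5}$$

where $\rightsquigarrow$ denotes weak convergence,

$$b = \operatorname{vec} \begin{pmatrix} \theta_1 & \cdots & \theta_N \\ \rho_1 & \cdots & \rho_N \end{pmatrix},$$

J is the N by N block matrix whose (i, j) block is

$$J_{ij} = \delta_{ij} K + \left( M_{ij},\; 0_{(n_g + n_{ex}) \times n_{ex}} \right),$$

with $\delta_{ij}$ the Kronecker delta, equal to 1 if i = j and 0 otherwise, and $0_{(n_g + n_{ex}) \times n_{ex}}$ a $(n_g + n_{ex})$ by $n_{ex}$ matrix of zeros,

$$K = \operatorname{plim} \frac{1}{H} \sum_{h=1}^{H} \sum_{t=1}^{T} \begin{pmatrix} g(x_t^h, \theta) \\ v_t^h \end{pmatrix} \begin{pmatrix} g(x_t^h, \theta) \\ v_t^h \end{pmatrix}',$$

$$M_{ij} = \operatorname{plim} \frac{1}{H} \sum_{h=1}^{H} \sum_{t=1}^{T} \begin{pmatrix} g(x_t^h, \theta) \\ v_t^h \end{pmatrix} \theta_i'\, \frac{\partial g(x_t^h, \theta)}{\partial \theta_j'},$$

and

$$\begin{aligned}
\Omega = {} & \operatorname{plim} \frac{1}{H} \sum_{h=1}^{H} \left\{ \sum_{t=1}^{T} \varepsilon_t^h \otimes \begin{pmatrix} g(x_t^h, \theta) \\ v_t^h \end{pmatrix} \right\} \left\{ \sum_{t=1}^{T} \varepsilon_t^{h\prime} \otimes \left( g(x_t^h, \theta)',\; v_t^{h\prime} \right) \right\} \\
& + \operatorname{plim} \frac{1}{H} \sum_{h=1}^{H} \left\{ \sum_{t=1}^{T} \left( R' v_t^h \otimes D L^{-1} \mathbf{z}_t^h \right) \right\} \left\{ \sum_{t=1}^{T} \left( v_t^{h\prime} R \otimes \mathbf{z}_t^{h\prime} L^{-1} D' \right) \right\},
\end{aligned}$$

with $R = (\rho_1, \dots, \rho_N)$, and

$$D = \operatorname{plim} \frac{1}{H} \sum_{h=1}^{H} \sum_{t=1}^{T} \begin{pmatrix} g(x_t^h, \theta)\, \mathbf{z}_t^{h\prime} \\ 0_{n_{ex} \times n_z} \end{pmatrix}, \qquad L = \operatorname{plim} \frac{1}{H} \sum_{h=1}^{H} \sum_{t=1}^{T} \mathbf{z}_t^h \mathbf{z}_t^{h\prime},$$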
where $n_z$ represents the dimension of $\mathbf{z}_t^h$. Estimates of $J$ and $\Omega$ are easily obtained by replacing $\theta$ in the preceding formulae by its ILLS estimator and by removing the plim operator.

We end this section by summarizing the estimation procedure. First, unbiased estimates of parameters are obtained by a procedure which amounts to iterating a series of OLS regressions of $y_{it}^h$ on $g(x_t^h, \hat{\theta})$ and $\hat{v}_t^h$.² Within each iteration, the estimation is performed equation by equation, imposing the additivity and homogeneity constraints only. Additivity is automatically satisfied by construction in Almost Ideal-type demand models, and homogeneity can easily be imposed by considering in each equation N − 1 relative prices instead of N absolute prices (Deaton and Muellbauer, 1980). Convergence is declared when the criterion $\max |(\theta^{n+1} - \theta^n) \oslash \theta^n| \le tol$ is satisfied, where $\oslash$ is the element-by-element division operator and $tol$ is a predefined tolerance level. Second, once convergence has occurred, standard errors of all parameters in all equations are simultaneously calculated using the White-type asymptotic variance-covariance matrix given in (5), which takes into account the introduction of a predicted regressor, $\hat{v}_t^h$, in each equation and the correlation of the error terms, $\varepsilon_{it}^h$, across equations and across time. Third, symmetry-restricted parameters and their asymptotic variance-covariance matrix are obtained using a minimum distance estimator.³
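The first step of the procedure summarized above can be sketched on a toy model. The Python code below applies the ILLS iteration to a single conditionally linear equation with one price, $y = \alpha + \gamma p + \beta\,(m - a(p, \theta)) + u$ with $a(p, \theta) = \alpha p + \tfrac{1}{2}\gamma p^2$, using the relative-change stopping rule. This is only an illustration under strong simplifying assumptions: one equation, exogenous regressors, no reduced-form residual, and made-up parameter values. It is not the authors' Gauss routine, and all function names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def price_index(p, alpha, gamma):
    """Toy one-price analogue of the aggregator a(p, theta)."""
    return alpha * p + 0.5 * gamma * p * p


def ills(y, p, m, tol=1e-8, max_iter=200):
    """Iterate OLS of y on g(x, theta_n) = (1, p, m - a(p, theta_n)) until convergence."""
    theta = [0.0, 0.0, 0.0]  # starting values for (alpha, gamma, beta)
    for iteration in range(1, max_iter + 1):
        alpha, gamma, _ = theta
        X = [[1.0, pi, mi - price_index(pi, alpha, gamma)] for pi, mi in zip(p, m)]
        new = ols(X, y)
        # Relative change, guarded against division by zero at the starting values.
        change = max(abs(b - a) / max(abs(a), 1e-12) for a, b in zip(theta, new))
        theta = new
        if change <= tol:
            return theta, iteration
    raise RuntimeError("ILLS did not converge")


# Hypothetical data generated from the toy model with small noise.
rng = random.Random(4)
alpha0, gamma0, beta0 = 0.4, -0.3, -0.2
p = [rng.gauss(0.0, 1.0) for _ in range(2000)]
m = [1.0 + rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [alpha0 + gamma0 * pi + beta0 * (mi - price_index(pi, alpha0, gamma0))
     + 0.01 * rng.gauss(0.0, 1.0) for pi, mi in zip(p, m)]

theta_hat, n_iter = ills(y, p, m)
print("estimates (alpha, gamma, beta):", [round(t, 4) for t in theta_hat], "iterations:", n_iter)
```

Each pass is an ordinary linear regression, which is what keeps the method tractable for large systems. In the full procedure the regressor list would also contain the Mundlak means and the reduced-form residual, and the loop would run over all N equations.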
3 Monte Carlo experiments
We consider Monte Carlo simulations for a QAI demand model of N = 3 equations. The number of individuals and the number of time periods are respectively H = 500, 1000 and T = 5, 10. All individual-specific effects and idiosyncratic error terms are normally distributed with zero mean, and their variance is equal to the square of their respective coefficient if any, and to unity otherwise. The logarithm of income, which is used as an instrument for log total expenditure, is generated as

$$z_t^h = 7 + \tilde{z}^h + w_t^h,$$

where $\tilde{z}^h$ is an individual-specific component and $w_t^h$ is an idiosyncratic component. The logarithm of total expenditure is computed as

$$m_t^h = 1 + \pi' p_t^h + 0.3\, z_t^h + \tilde{m}^h + v_t^h,$$

where $p_t^h$ is an N-vector of log relative prices,⁴ $\pi = (0.4, 0.2, 0.1)'$, $v_t^h$ is an idiosyncratic shock, and $\tilde{m}^h = \tilde{y}^h + \tau \tilde{z}^h$, with $\tilde{y}^h$ an individual-specific effect independent of $z_t^h$. In this equation, the instrument is thus correlated with the error term through $\tilde{m}^h$ as long as $\tau$ differs from zero.

² Seemingly Unrelated Regressions (SUR) would give strictly identical parameter estimates since the same set of variables appears on the right-hand side of each equation.
³ A complete Gauss 6.0 language subroutine for the QAI demand model is available from the authors upon request.
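The income and expenditure part of this design can be simulated directly. The Python sketch below generates $z_t^h$ and $m_t^h$ as described and compares the income coefficient of the reduced form estimated with and without the time mean of log income as an additional regressor. Several choices are assumptions made for illustration only: all random components have unit variance, prices are independent standard normals, $\tau$ is set to 0.5 to make the effect easy to see, and H = 2000.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


rng = random.Random(3)
H, T, tau = 2000, 5, 0.5          # assumed values, chosen for a visible effect
pi_coef = (0.4, 0.2, 0.1)         # price coefficients of the design
rows = []                         # tuples (prices, z, time mean of z, m)
for h in range(H):
    z_tilde = rng.gauss(0.0, 1.0)
    y_tilde = rng.gauss(0.0, 1.0)
    m_tilde = y_tilde + tau * z_tilde
    obs = []
    for t in range(T):
        prices = [rng.gauss(0.0, 1.0) for _ in range(3)]
        z = 7.0 + z_tilde + rng.gauss(0.0, 1.0)
        m = (1.0 + sum(c * p for c, p in zip(pi_coef, prices))
             + 0.3 * z + m_tilde + rng.gauss(0.0, 1.0))
        obs.append((prices, z, m))
    z_bar = sum(o[1] for o in obs) / T
    rows.extend((prices, z, z_bar, m) for prices, z, m in obs)

m_vec = [r[3] for r in rows]
# Reduced form ignoring individual effects: constant, prices, z.
naive = ols([[1.0, *prices, z] for prices, z, _, _ in rows], m_vec)
# Reduced form with the time mean of z added as a regressor.
mundlak = ols([[1.0, *prices, z, z_bar] for prices, z, z_bar, _ in rows], m_vec)
b_naive, b_mundlak = naive[4], mundlak[4]
print(f"income coefficient: naive {b_naive:.3f}, with time mean {b_mundlak:.3f}, true 0.3")
```

Under these assumptions the regression that ignores the individual effect overstates the income coefficient, because $\tilde{z}^h$ enters both income and the expenditure error, while adding the time mean of income brings the estimate back towards 0.3.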
The QAI demand system is given by relation (2), where $\alpha_t^h = \alpha + \tilde{\alpha}^h$. Because of adding-up, we only need to specify the parameters entering the first N − 1 equations. Let these be $\alpha = (0.4, 0.4)'$, $\Gamma = (\gamma, -\gamma)'$ with $\gamma = (-0.3, 0.3, 0.0)'$, $\beta = (-0.2, 0.2)'$ and $\lambda = (0.1, -0.1)'$. The vector of individual-specific components is defined as

$$\tilde{\alpha}^h = \Sigma_a \tilde{a}^h + \phi\, \tilde{y}^h + \delta\, \tilde{z}^h,$$

where $\tilde{a}^h$ is an (N − 1)-vector of individual-specific effects, $\Sigma_a = ((0.1, -0.3)', (-0.3, 0.1)')$ is the square root of its variance-covariance matrix, $\phi = (\tau, -\tau)'$ and $\delta = (\tau, -\tau)'$, and the vector of idiosyncratic error terms as

$$u_t^h = \Sigma_e e_t^h + \rho\, v_t^h,$$

where $e_t^h$ is an (N − 1)-vector of idiosyncratic terms, $\Sigma_e = ((0.2, -0.1)', (-0.1, 0.2)')$ is the square root of its variance-covariance matrix, and $\rho = (0.3, -0.3)'$. The dependence of $\tilde{\alpha}^h$ on $\tilde{z}^h$ and $\tilde{y}^h$ ensures that the error terms can be correlated both with income and with total expenditure because of unobserved heterogeneity, and the dependence of $u_t^h$ on $v_t^h$ allows the error terms to be correlated with total expenditure because of common shocks.

We ran 1000 simulations for different values of $\tau$, the correlation between individual-specific components. All variables are newly generated at each replication. Note also that, apart from the constant terms, we set the parameters of the second equation to minus those of the first equation and those of the third equation to zero. This enables us to restrict the exposition of the results to the first equation only. The parameters satisfy adding-up, homogeneity and symmetry constraints, and the model is estimated without imposing the last two.⁵ Before turning to the results, let us only mention that accounting for individual-specific effects in the reduced form equation yields unbiased estimates whatever the value of $\tau$, whereas ignoring them produces a biased estimate for the instrument parameter as soon as $\tau$ is non-zero.

Table 1 displays the results obtained with the estimator described in section 2 when $\tau = 0.1$. For comparison purposes, we report in the same table the parameters estimated without correcting for unobserved heterogeneity, as explained in Blundell and Robin (1999).

[ Table 1 about here ]

These simulations clearly show that our procedure is consistent when total expenditure is correlated with the error terms because of unobserved heterogeneity, while that used in Blundell and Robin induces severe biases in all parameters but two, namely the constant and the parameter of total expenditure squared. Additional results (not reported here) show that the larger the absolute value of $\tau$, the larger the biases.⁶

⁴ Prices are generated from univariate normal distributions and are therefore orthogonal to one another. Introducing correlations does affect the estimates but does not fundamentally change the results.
4 Application
The data we use are drawn from the 1991, 1992 and 1993 issues of the SECODIP (Société d’Etude de la Consommation, Distribution et Publicité) French Consumer Panel. This survey records the expenditures and quantities for a list of food items. Every week, sampled households are requested to send to SECODIP a diary filled in with their purchases of the goods on the list. However, other economic and socio-demographic characteristics of households, like income or family size, are reported only once a year.⁷ Households remain in the panel for an average period of four years.

⁵ Whether or not the parameter restrictions are imposed in the Monte Carlo simulations does not matter, since we want to show the consistency of the estimator, and when the unrestricted estimator is shown to be consistent the restricted estimator is consistent too.
⁶ Of course, both procedures yield consistent estimates when $\tau = 0$.
⁷ The sample only considers households of the 21 regions of metropolitan France, excluding (i) those living in Corsica and France's overseas departments and territories, and (ii) those made up of single men, for the 1993 sample only.
We constructed seven groups of goods: vegetables, fruit, meat, chicken and game, fish, dairy produce, and alcohol. We aggregated weekly expenditures by quarter so as to avoid the problem of purchase infrequency, assuming that a household that is not observed spending any money on a particular group during a quarter can be considered over that period as a non-consumer of the goods in the group. Aggregating by quarter circumvents the issue of wrongly treating consumers as non-consumers, and therefore reduces the number of zeros in the dependent variables and the biases that they may induce. We kept the balanced panel of all households reporting a positive total food expenditure in all twelve quarters. Our final sample consists of 2148 households, i.e. 25776 observations. Note that some households do not buy certain goods, either in particular quarters or ever (vegetarians and meat, for example). Table A1 in the appendix reports for each group the number and proportion of observations where the budget share is equal to zero due to non-participation. To keep the sample to a reasonable size, we nevertheless decided to treat these non-participating households like all others.

Group-level prices were computed as the ratio of expenditures to quantities purchased. For those households who do not purchase any item of a given group during a given quarter, we imputed the mean price of all purchases made in the same quarter, the same region and the same city size. All these prices are actually unit values, that is, ratios of expenditures over quantities, which allow us both to have reasonable price variations and to avoid strong collinearity in prices, as can be seen in table A2 of the appendix, where the correlation matrix of prices is presented.
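The data construction steps just described (quarterly aggregation, unit values, and imputation for non-purchasing households) can be sketched as follows in Python. The records, household identifiers and the region and city-size cell are entirely hypothetical, and the imputed price is taken here to be the mean of household unit values within the cell, which is one possible reading of the imputation rule rather than the authors' exact computation.

```python
from collections import defaultdict

# Hypothetical weekly records: (household, quarter, group, expenditure, quantity).
weekly = [
    ("h1", "1991Q1", "fish", 12.0, 2.0),
    ("h1", "1991Q1", "fish", 8.0, 2.0),
    ("h2", "1991Q1", "fish", 9.0, 1.0),
    ("h1", "1991Q1", "meat", 30.0, 3.0),
    ("h3", "1991Q1", "meat", 20.0, 4.0),
]
# Hypothetical imputation cell (region, city size) of each household.
cell_of = {"h1": ("north", "large"), "h2": ("north", "large"), "h3": ("north", "large")}
groups = ["fish", "meat"]

# 1. Aggregate weekly purchases by household, quarter and group.
spend = defaultdict(float)
qty = defaultdict(float)
for hh, quarter, group, expenditure, quantity in weekly:
    spend[hh, quarter, group] += expenditure
    qty[hh, quarter, group] += quantity

# 2. Unit values: quarterly expenditure divided by quarterly quantity.
unit_value = {key: spend[key] / qty[key] for key in spend if qty[key] > 0}

# 3. Mean unit value by (cell, quarter, group), used for imputation.
cell_values = defaultdict(list)
for (hh, quarter, group), uv in unit_value.items():
    cell_values[cell_of[hh], quarter, group].append(uv)
cell_mean = {key: sum(v) / len(v) for key, v in cell_values.items()}

# 4. Complete price table and budget shares (zero share for non-purchasers).
quarters = sorted({quarter for _, quarter, _, _, _ in weekly})
price, share = {}, {}
for hh in cell_of:
    for quarter in quarters:
        total = sum(spend.get((hh, quarter, g), 0.0) for g in groups)
        for g in groups:
            key = (hh, quarter, g)
            price[key] = unit_value.get(key, cell_mean[cell_of[hh], quarter, g])
            share[key] = spend.get(key, 0.0) / total

print(price["h3", "1991Q1", "fish"], share["h3", "1991Q1", "fish"])
```

Household h3 buys no fish in the quarter, so it receives a zero budget share and the cell-mean price, exactly the treatment applied to non-participating households in the sample.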
This raises the issue of the endogeneity of unit values pointed out by Deaton (1988): two different households facing the same price system may well exhibit different unit values because their shares of the different varieties composing the group are different. In our particular case, this problem is largely attenuated by averaging unit values over quarters. The data are thus aggregated by quarter not only to avoid purchase infrequency, but also to get rid of the endogeneity of unit values. Table 2 gives some descriptive statistics for the main variables used in the estimation.

[ Table 2 about here ]
We then considered the estimation of a QAI demand model on these data. First, we estimated a reduced form equation for log total food expenditure, using as instruments a set of socio-demographic variables (the age of the household's head and its square, variables describing the demographic composition of the household, professional occupation, region and city size), quarterly and annual dummy variables, six log relative prices, one log absolute price (alcohol, the reference) and, as proper identifying instruments, the log of household income and its mean over time.⁸ The results are presented in table A3 of the appendix. Many variables are significant at the 5% level: 20 out of 28 socio-demographic variables, dummies for all quarters and one year, log prices of all groups but fish, and the mean of log income over time. However, the coefficient of log income is not significant, probably because household income varies too little over time. Two reasons for this are that income, as we mentioned above, is reported by households only once a year (i.e. the same value for all quarters of a given year), and that it is actually recorded as the mean value of one of 18 income intervals, which implies that there is not much variability in income from one year to another either.

Then, we proceeded to the estimation of the demand equations using the iterative algorithm previously described, and imposing price homogeneity and symmetry from the beginning. Independent variables are the same set of socio-demographic variables as in the reduced form, six log relative prices, log food expenditure, its square and its mean over time ($m_\bullet^h$), the mean of log income over time ($z_\bullet^h$) and residuals predicted from the instrumental regression ($\hat{v}_t^h$). Convergence occurred at a very demanding level of tolerance ($tol = 10^{-5}$) within 41 iterations. To save space, only the main results are reported in table 3.⁹

[ Table 3 about here ]

The quadratic expenditure term is significant at the 5% level in the vegetables and meat equations only, which confirms the findings of Banks et al. (1997) showing that the original version of the Almost Ideal model was unlikely to be rejected for most food items. The overidentification tests show that the variables used as instruments have the required properties. The exogeneity of food expenditure with respect to simultaneity is rejected, since the coefficients associated with the residuals of the instrumental regression $\hat{v}_t^h$ are significant in all cases. Testing for the absence of $m_\bullet^h$ and $z_\bullet^h$ in the regressions provides, on the other hand, direct tests for existing biases due to unobserved heterogeneity. The results show that $m_\bullet^h$ is significant for all groups but vegetables and alcohol, and that $z_\bullet^h$ is significant in the chicken and game, fish, dairy produce and alcohol equations, which is a clear indication that the usual instrumentation by income is not sufficient on its own to fully control for the endogeneity of total food expenditure.

Tables 4 to 6 respectively display the estimated budget shares, food expenditure elasticities and conditional uncompensated own-price elasticities evaluated at the sample mean point and at each quartile of the household income distribution.¹⁰

[ Tables 4 to 6 about here ]

The estimated budget shares are all very significant and fairly comparable at the mean point to the observed values (see table 2). The elasticities are also all very significant and reasonable. All total food expenditure elasticities are positive and all conditional uncompensated own-price elasticities are negative. In all cases, the differences between income quartiles are very small and, when compared to the estimated standard errors, not significant. Nevertheless, it is worth noting from table 5 that, except for meat, the higher the household's income, the larger the food expenditure elasticities. Although this result may seem counter-intuitive, it can be explained by the decrease, implied by the additivity restriction and observed in table 4, of most budget shares (namely fruit, meat, chicken and game, and dairy produce) as income increases.¹¹ Similarly, table 6 suggests that high income households are more sensitive to an own-price increase for vegetables, fruit and alcohol, but are less sensitive for meat, chicken and game, fish and dairy produce.

⁸ The use of income as an instrument raises a number of issues, highlighted by Hausman et al. (1995), but is usual practice (see for example Banks et al., 1997, Blundell and Robin, 1999, 2000). We also tried to model log total food expenditure as a function of polynomial terms in log income, but it turned out that the first-order term alone was doing better.
⁹ A full account of the estimation results is however available from the authors.
¹⁰ Standard errors are obtained using the delta method. Income elasticities cannot be computed since we need the income elasticity of food expenditure, given by the coefficient of log income in the reduced form, which is not significant.
¹¹ The additivity restriction indeed implies that $\sum_{i=1}^{N} y_i e_i = 1$, where $y_i$ and $e_i$ are the budget share and food expenditure elasticity of good i respectively, holds for all quartiles of income.
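The additivity identity in the last footnote can be checked numerically. The Python sketch below evaluates QAI budget shares and expenditure elasticities using the standard QAI expression $e_i = 1 + \mu_i / y_i$ with $\mu_i = \beta_i + 2\lambda_i (m - a(p))/b(p)$, which is assumed here since the paper does not spell out its elasticity formula. The parameter values are those of the Monte Carlo design of section 3, completed for the third good by adding-up; the price vector and expenditure level are arbitrary illustrative choices.

```python
import math

# Monte Carlo parameters of section 3; the third row follows from adding-up (assumption).
alpha = [0.4, 0.4, 0.2]
gamma = [[-0.3, 0.3, 0.0],
         [0.3, -0.3, 0.0],
         [0.0, 0.0, 0.0]]
beta = [-0.2, 0.2, 0.0]
lam = [0.1, -0.1, 0.0]


def qai_shares_and_elasticities(p, m):
    """Budget shares and expenditure elasticities at log prices p and log expenditure m."""
    n = len(p)
    a = (sum(al * pj for al, pj in zip(alpha, p))
         + 0.5 * sum(p[i] * gamma[i][j] * p[j] for i in range(n) for j in range(n)))
    b = math.exp(sum(bj * pj for bj, pj in zip(beta, p)))
    x = m - a  # log expenditure deflated by the first price aggregator
    shares, elasticities = [], []
    for i in range(n):
        w = (alpha[i] + sum(gamma[i][j] * p[j] for j in range(n))
             + beta[i] * x + lam[i] * x * x / b)
        mu = beta[i] + 2.0 * lam[i] * x / b  # derivative of the share w.r.t. m
        shares.append(w)
        elasticities.append(1.0 + mu / w)
    return shares, elasticities


shares, elasticities = qai_shares_and_elasticities(p=[0.1, -0.2, 0.05], m=0.5)
weighted_sum = sum(w * e for w, e in zip(shares, elasticities))
print("shares:", [round(w, 4) for w in shares])
print("elasticities:", [round(e, 4) for e in elasticities])
print("sum of share-weighted elasticities:", round(weighted_sum, 12))
```

Because the $\beta_i$ and $\lambda_i$ each sum to zero across goods, the share-weighted elasticities sum to one at any evaluation point, so a fall in some budget shares must be offset by movements in the elasticities of the other goods.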
Finally, we report in table 7 the sample mean point values of the previous elasticities together with those that are obtained without correcting for unobserved heterogeneity.

[ Table 7 about here ]

The estimated elasticities are similar in both cases, the only difference that is significant at the 5% level arising between the values of the food expenditure elasticity for fish.¹² Although unobserved heterogeneity is found to be responsible for a correlation between food expenditure and the error terms, this correlation does not seem to be large enough to severely bias the estimation of budget and price elasticities.¹³ This result may seem quite surprising, as it is not usual in practice to find that unobserved heterogeneity has almost no significant effect on the estimates. Possible explanations for this could be the relatively small time dimension of the panel (three years), or the rather small variability of household income over time. Yet similar results were obtained using data of better quality on these two particular points (namely pseudo-cohorts constructed from the UK Family Expenditure Survey). Beyond these data-related issues, the good performance of the Iterated Linear Least Squares estimator that neglects unobserved heterogeneity would be good news per se, since it would confirm (at least on our data) the validity of the usual practice which, because of the lack of consumer panels, just hopes that instrumenting total expenditure by income controls for both simultaneity and unobserved heterogeneity.
5 Conclusion
Many popular demand systems are nonlinear. Consequently, their estimation on panel data cannot be performed consistently by simple linear transformations of the variables, the speci…c e¤ects resulting from unobserved heterogeneity entering these models in a nonlinear way. In this paper, we show how the Iterated Linear Least Squares estimator developed by Blundell and Robin (1999) can be extended to panel data. The estimation 12 This di¤erence can be explained by arguing that …sh is probably one of the food products for which tastes are the most heterogeneous. 13 Note however that most standard errors are understated when the correlation of the error terms across time is not taken into account in the variance-covariance matrix of the estimator.
15
procedure allows for separate tests of the simultaneity of total expenditure and taste shocks determination, and of the existence of a bias due to the presence of unobserved heterogeneity. Monte Carlo experiments show that our procedure is consistent when unobserved heterogeneity correlates total expenditure to the regression residuals, whereas that used in Blundell and Robin potentially yields strong biases in parameter estimates. The method is then applied to the estimation of a Quadratic Almost Ideal demand system for seven food commodities using a French Consumer Panel. The results clearly show that unobserved heterogeneity is indeed a source of potential biases which cannot necessarily be controlled for by instrumenting total food expenditure by income, as is usually done in the literature. However, the full comparison of our corrected estimates with the standard IV estimates shows that neglecting unobserved heterogeneity does not seem to induce much biases on our data as far as the estimation of budget and price elasticities is concerned.
References

[1] Banks J, Blundell R, Lewbel A (1997) Quadratic Engel Curves and Consumer Demand. Review of Economics and Statistics 79: 527-539.
[2] Blundell R, Robin JM (1999) Estimation in Large and Disaggregated Demand Systems: An Estimator for Conditionally Linear Systems. Journal of Applied Econometrics 14: 209-232.
[3] Blundell R, Robin JM (2000) Latent Separability: Grouping Goods Without Weak Separability. Econometrica 68: 53-84.
[4] Browning M, Meghir C (1991) The Effect of Male and Female Labor Supply on Commodity Demands. Econometrica 59: 925-951.
[5] Chamberlain G (1984) Panel Data. In: Z Griliches, MD Intriligator (eds) Handbook of Econometrics. Elsevier Science Publishers, Amsterdam, North Holland, 2: 1247-1318.
[6] Deaton AS (1988) Quality, Quantity and Spatial Variation of Price. American Economic Review 78: 418-430.
[7] Deaton AS, Muellbauer J (1980) Economics and Consumer Behavior. Cambridge University Press, New York.
[8] Gardes F, Duncan GJ, Gaubert P, Starzec C (1998) A Comparison of Consumption Models Estimated on American and Polish Panel and Pseudo-Panel Data. Working Paper 1998-09, LAMIA, Université Paris I.
[9] Gardes F, Langlois S, Richaudeau D (1996) Cross-Section Versus Time-Series Income Elasticities of Canadian Consumption. Economics Letters 51: 169-175.
[10] Hausman JA (1978) Specification Tests in Econometrics. Econometrica 46: 1251-1270.
[11] Hausman JA, Newey WK, Powell JL (1995) Non-Linear Errors in Variables: Estimation of some Engel Curves. Journal of Econometrics 65: 205-233.
[12] Holly A, Sargan D (1982) Testing for Exogeneity in a Limited Information Framework. Cahiers de Recherches Economiques 8204, Université de Lausanne.
[13] Mundlak Y (1978) On the Pooling of Time Series and Cross-Section Data. Econometrica 46: 69-85.
Table 1
Monte Carlo estimates (1) with and (2) without correction for unobserved heterogeneity, τ = 0.1

                                                              T = 5                             T = 10
Variables                                   θ0        (1)              (2)              (1)              (2)

H = 500
Constant                                   0.4    0.586 (0.694)   -0.378 (0.291)    0.480 (0.460)   -0.386 (0.221)
Log price 1                               -0.3   -0.330 (0.148)   -0.367 (0.050)   -0.307 (0.085)   -0.368 (0.042)
Log price 2                                0.3    0.322 (0.075)    0.430 (0.070)    0.303 (0.045)    0.430 (0.055)
Log price 3                                0.0   -0.060 (0.225)    0.232 (0.100)   -0.026 (0.149)    0.235 (0.074)
Log total expenditure                     -0.2   -0.234 (0.112)    0.018 (0.036)   -0.216 (0.078)    0.018 (0.036)
Log total expenditure squared              0.1    0.101 (0.003)    0.101 (0.004)    0.101 (0.003)    0.101 (0.003)
Mean over time of log total expenditure    0.1    0.163 (0.116)                     0.132 (0.081)
Mean over time of log income               0.1    0.111 (0.048)                     0.105 (0.035)
Reduced form residuals                     0.3    0.339 (0.131)   -0.037 (0.040)    0.317 (0.089)   -0.038 (0.039)
Mean Square Error                                 0.148            0.165            0.157            0.175

H = 1000
Constant                                   0.4    0.472 (0.501)   -0.388 (0.203)    0.458 (0.333)   -0.382 (0.168)
Log price 1                               -0.3   -0.297 (0.093)   -0.364 (0.036)   -0.295 (0.053)   -0.363 (0.028)
Log price 2                                0.3    0.305 (0.049)    0.429 (0.048)    0.297 (0.030)    0.427 (0.041)
Log price 3                                0.0   -0.023 (0.162)    0.236 (0.070)   -0.018 (0.108)    0.234 (0.056)
Log total expenditure                     -0.2   -0.216 (0.086)    0.016 (0.025)   -0.212 (0.057)    0.015 (0.025)
Log total expenditure squared              0.1    0.101 (0.002)    0.101 (0.003)    0.101 (0.002)    0.101 (0.002)
Mean over time of log total expenditure    0.1    0.146 (0.090)                     0.129 (0.060)
Mean over time of log income               0.1    0.103 (0.036)                     0.104 (0.026)
Reduced form residuals                     0.3    0.317 (0.098)   -0.036 (0.027)    0.313 (0.065)   -0.034 (0.028)
Mean Square Error                                 0.158            0.172            0.154            0.171

Notes: estimates based on 1000 replications; standard deviations of the 1000 estimated parameters in parentheses.
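To make the Monte Carlo comparison easier to read, the sketch below computes the bias (mean estimate minus true value θ0) of the six parameters common to both specifications, using the figures reported in Table 1 for H = 500 and T = 5. The variable and function names are hypothetical, and the numbers are simply transcribed from the table.

```python
# True values and mean estimates transcribed from Table 1 (H = 500, T = 5).
PARAMETERS = [
    "Constant", "Log price 1", "Log price 2", "Log price 3",
    "Log total expenditure", "Log total expenditure squared",
]
THETA_0 = [0.4, -0.3, 0.3, 0.0, -0.2, 0.1]
MEAN_CORRECTED = [0.586, -0.330, 0.322, -0.060, -0.234, 0.101]    # column (1)
MEAN_UNCORRECTED = [-0.378, -0.367, 0.430, 0.232, 0.018, 0.101]   # column (2)


def bias(means, truth):
    """Bias of each mean estimate relative to the true parameter value."""
    return [round(m - t, 3) for m, t in zip(means, truth)]


bias_corrected = bias(MEAN_CORRECTED, THETA_0)
bias_uncorrected = bias(MEAN_UNCORRECTED, THETA_0)

for name, b1, b2 in zip(PARAMETERS, bias_corrected, bias_uncorrected):
    print(f"{name:30s} (1): {b1:+.3f}   (2): {b2:+.3f}")
```

For every parameter except the squared expenditure term, where both biases are negligible, the absolute bias is smaller in column (1) than in column (2).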
Table 2
Descriptive statistics

Variables                                          Means    (Std. dev.)

Budget shares
  Vegetables (VEGE)                               0.0926     (0.0936)
  Fruit (FRUIT)                                   0.0683     (0.0910)
  Meat (MEAT)                                     0.2996     (0.1400)
  Chicken and game (CHICK)                        0.1243     (0.0781)
  Fish (FISH)                                     0.0954     (0.0898)
  Dairy produce (DAIRY)                           0.1921     (0.1042)
  Alcohol (ALCO)                                  0.1277     (0.1297)
Total food expenditure (euros per quarter)      383.6714   (225.6084)
Prices (euros per kilogram)
  VEGE                                            1.3523     (0.6368)
  FRUIT                                           1.5661     (0.4768)
  MEAT                                            8.1021     (2.6080)
  CHICK                                           1.0083     (1.4884)
  FISH                                            7.5796     (4.0282)
  DAIRY                                           2.2108     (1.8144)
  ALCO                                            4.1455     (3.3640)
Age of the household's head                      53.2492    (15.0418)
Household's size                                  2.8172     (1.3846)
Household's monthly income (euros)             1695.4045  (1017.2015)

Number of observations                             25776
Table 3
Estimated budget effects

Groups                                    VEGE        FRUIT       MEAT        CHICK       FISH        DAIRY       ALCO

Log food expenditure                    -0.04079a    0.02665     0.01827     0.03720    -0.04944a    0.08971a   -0.08159
                                        (0.01156)   (0.01811)   (0.01764)   (0.02324)   (0.02292)   (0.04037)   (0.05346)
Log food expenditure squared             0.00620a   -0.00067    -0.01148a    0.00095    -0.00102    -0.00209     0.00811
                                        (0.00151)   (0.00169)   (0.00185)   (0.00239)   (0.00198)   (0.00352)   (0.00495)
Mean over time of log food expenditure  -0.00272    -0.02516a    0.07723a   -0.02735a    0.04508a   -0.08301a    0.01593
                                        (0.00682)   (0.00849)   (0.00837)   (0.00657)   (0.00727)   (0.00977)   (0.01285)
Mean over time of log income             0.00699    -0.00320    -0.00402    -0.00936a    0.02801a   -0.04020a    0.02178a
                                        (0.00418)   (0.00448)   (0.00557)   (0.00341)   (0.00426)   (0.00466)   (0.00547)
Simultaneity tests                      -0.02880a   -0.03445a    0.09841a   -0.05964a    0.05914a   -0.13081a    0.09615a
                                        (0.01222)   (0.01546)   (0.01543)   (0.01320)   (0.01544)   (0.02295)   (0.03039)
Overidentification tests (χ² with 2 df)  0.30941     0.07192     0.00028     0.19213     0.22961     0.89423     1.15387
                                        (0.85667)   (0.96468)   (0.99986)   (0.90841)   (0.89154)   (0.63947)   (0.56162)

Notes: also includes 28 socio-demographic variables, 6 log relative prices and a constant; standard errors (p-values for overidentification tests) in parentheses; a: significant at the 5% level.
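The p-values attached to the overidentification tests can be checked by hand, because the survival function of a chi-square variable with 2 degrees of freedom has the closed form exp(-x/2). The sketch below applies it to the statistics reported in Table 3; the function name `chi2_sf_2df` and the dictionary are hypothetical helpers, and the figures are transcribed from the table.

```python
import math


def chi2_sf_2df(stat):
    """P(X > stat) for X distributed chi-square with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# (test statistic, reported p-value) transcribed from Table 3.
OVERID_TESTS = {
    "VEGE":  (0.30941, 0.85667),
    "FRUIT": (0.07192, 0.96468),
    "MEAT":  (0.00028, 0.99986),
    "CHICK": (0.19213, 0.90841),
    "FISH":  (0.22961, 0.89154),
    "DAIRY": (0.89423, 0.63947),
    "ALCO":  (1.15387, 0.56162),
}

p_values = {g: chi2_sf_2df(stat) for g, (stat, _) in OVERID_TESTS.items()}

for group, (stat, reported) in OVERID_TESTS.items():
    print(f"{group:5s} stat = {stat:.5f}  p = {p_values[group]:.5f}  (reported {reported:.5f})")
```

None of the p-values falls below 0.05, so the overidentifying restrictions are not rejected for any of the seven groups.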
Table 4
Estimated budget shares

Groups   Mean point      First quartile  Median point    Third quartile

VEGE     0.086 (0.006)   0.086 (0.006)   0.086 (0.006)   0.086 (0.007)
FRUIT    0.063 (0.008)   0.069 (0.008)   0.060 (0.008)   0.059 (0.009)
MEAT     0.326 (0.011)   0.330 (0.010)   0.330 (0.011)   0.327 (0.013)
CHICK    0.113 (0.012)   0.115 (0.011)   0.114 (0.012)   0.111 (0.013)
FISH     0.110 (0.015)   0.104 (0.013)   0.105 (0.015)   0.112 (0.016)
DAIRY    0.177 (0.022)   0.183 (0.020)   0.179 (0.022)   0.175 (0.024)
ALCO     0.125 (0.017)   0.113 (0.016)   0.125 (0.017)   0.130 (0.018)

Note: standard errors in parentheses.
Table 5
Estimated food expenditure elasticities

Groups   Mean point      First quartile  Median point    Third quartile

VEGE     1.157 (0.128)   1.124 (0.124)   1.156 (0.128)   1.172 (0.132)
FRUIT    1.330 (0.271)   1.306 (0.238)   1.348 (0.288)   1.351 (0.297)
MEAT     0.747 (0.033)   0.767 (0.033)   0.751 (0.032)   0.741 (0.032)
CHICK    1.405 (0.149)   1.390 (0.139)   1.398 (0.147)   1.411 (0.155)
FISH     0.469 (0.073)   0.443 (0.082)   0.442 (0.074)   0.476 (0.071)
DAIRY    1.403 (0.160)   1.396 (0.152)   1.399 (0.159)   1.405 (0.165)
ALCO     0.916 (0.211)   0.873 (0.231)   0.915 (0.211)   0.932 (0.205)

Note: standard errors in parentheses.
Table 6
Estimated conditional uncompensated own-price elasticities

Groups   Mean point       First quartile   Median point     Third quartile

VEGE     -1.172 (0.038)   -1.171 (0.039)   -1.172 (0.038)   -1.174 (0.038)
FRUIT    -1.160 (0.072)   -1.148 (0.066)   -1.168 (0.076)   -1.170 (0.077)
MEAT     -0.628 (0.041)   -0.654 (0.038)   -0.632 (0.040)   -0.619 (0.042)
CHICK    -0.874 (0.031)   -0.881 (0.025)   -0.877 (0.030)   -0.870 (0.033)
FISH     -0.866 (0.051)   -0.870 (0.048)   -0.863 (0.052)   -0.864 (0.053)
DAIRY    -1.085 (0.017)   -1.087 (0.015)   -1.084 (0.017)   -1.084 (0.018)
ALCO     -1.137 (0.156)   -1.135 (0.159)   -1.137 (0.155)   -1.138 (0.155)

Note: standard errors in parentheses.
Table 7
Estimated elasticities at mean point (1) with and (2) without correction for unobserved heterogeneity

                  Food expenditure               Cond. uncomp. own-price
Groups   (1)             (2)              (1)              (2)

VEGE     1.157 (0.128)   1.236 (0.085)   -1.172 (0.038)   -1.154 (0.019)
FRUIT    1.330 (0.271)   1.092 (0.142)   -1.160 (0.072)   -1.140 (0.044)
MEAT     0.747 (0.033)   0.689 (0.014)   -0.628 (0.041)   -0.616 (0.021)
CHICK    1.405 (0.149)   1.613 (0.247)   -0.874 (0.031)   -0.837 (0.054)
FISH     0.469 (0.073)   0.656 (0.035)   -0.866 (0.051)   -0.945 (0.022)
DAIRY    1.403 (0.160)   1.378 (0.131)   -1.085 (0.017)   -1.063 (0.014)
ALCO     0.916 (0.211)   1.105 (0.114)   -1.137 (0.156)   -1.032 (0.048)

Note: standard errors in parentheses.
Appendix

Table A1
Number and proportion of zeros in the dependent variables

Budget shares     Zeros       %

VEGE               2096     8.13
FRUIT              9520    36.93
MEAT                467     1.81
CHICK              1077     4.18
FISH               2844    11.03
DAIRY               132     0.51
ALCO               4653    18.05

Number of obs.    25776
Table A2
Correlation matrix of prices

Groups    VEGE    FRUIT   MEAT    CHICK   FISH    DAIRY   ALCO

VEGE      1.000
FRUIT     0.143   1.000
MEAT      0.221   0.246   1.000
CHICK     0.055   0.018   0.075   1.000
FISH      0.153   0.161   0.309   0.020   1.000
DAIRY     0.117   0.085   0.207   0.102   0.153   1.000
ALCO      0.052   0.047   0.086  -0.013   0.096   0.055   1.000
Table A3
Reduced form estimates

Variables                             Coeff.     (Std. err.)

Quarterly and annual dummies
  Quarter 2                          -0.06823a    (0.00907)
  Quarter 3                          -0.22134a    (0.00892)
  Quarter 4                           0.03929a    (0.00893)
  1992                               -0.01122     (0.00775)
  1993                               -0.03129a    (0.00799)
Log relative prices, ref: ALCO
  VEGE                                0.04164a    (0.00901)
  FRUIT                               0.11990a    (0.01290)
  MEAT                                0.20039a    (0.01093)
  CHICK                               0.03895a    (0.00369)
  FISH                                0.00160     (0.00769)
  DAIRY                               0.01510a    (0.00521)
Log absolute price
  ALCO                                0.44804a    (0.01729)
Log income                           -0.03711     (0.03010)
Mean over time of log income          0.23816a    (0.03079)

Notes: also includes 28 socio-demographic variables and a constant; a: significant at the 5% level.