(1999a; 1996b) and Walker and Ben$Akiva (2002) develop a the$ ...... [18] Murdock, Jennifer (2006), VHandling Unobserved Site Characteristics in Ran$.
Integrating Revealed Preferences and Psychometric Measures to Identify Heterogeneity in a Mixed Logit Model: An Application to Congestion in Demand for Recreation Felipe Vásquez Lavín and Michael Hanemanny Department of Agricultural and Resource Economics Universtiy of California, Berkeley (Draft Version, please do not quote) April 27, 2007
Abstract We integrate taste indicators and revealed preferences to describe heterogeneity in preferences for good’s attributes in a Mixed Logit (ML) model. Results from a ML model can be interpreted as a distribution of taste that divides the sample into a group of people who like an attribute of the good from those individuals that dislike it. Taste indicators, represented in a like-dislike scale, constitute additional information which contributes to depict the distribution of taste, especially for attributes that might a¤ect individuals’ utility in a positive or negative direction. We apply this methodology to describe a preference distribution for congestion in recreation demand models. Using a traditional ML model we conclude that almost 60% of the sample likes crowded places. In contrast, the integrated model suggests that 100% of the sample dislike congestion. These results show the potential bene…t of using taste indicators to correctly account for individual preferences over attributes that describe the alternatives of a choice set. Ph. D. Candidate, Department of Agricultural and Resource Economics, University of California at Berkeley. y Professor, Department of Agricultural and Resource Economics, University of California at Berkeley.
1
1
Introduction
In this article we integrate revealed preferences and taste indicators to account for heterogeneity of preferences among individuals using a Mixed Logit (ML) model. Integrating taste indicators, a type of psychometric measure, contributes to identify a taste distribution for good’s attributes that might a¤ect individuals’ utility in di¤erent directions. We apply this integrated model to estimate a taste distribution for congestion in a recreation demand model. M L has been extensively used to capture individual’s heterogeneity in many areas of applied econometrics including industrial organization (Berry, 1994; Berry et al. 1995, 1998; Nevo 2000), marketing (Louviere et al. 2002; Wedel et al. 1999; Ben-Akiva et al., 1999a; 1999b) and nonmarket valuation (Train,1998; Von Haefen et al. 2004). In a random parameter model, where M L is a particular case, information on heterogeneity is summarized using a parametric density function for the coe¢ cients of the model, which re‡ects how preferences or tastes are distributed across individuals. With this information, it is possible to distinguish the proportion of people who like some attribute of the good from those individuals who dislike it (Train, 1998). Extending this idea, Revelt and Train (1998) and Train (2003) suggest a procedure to describe more precisely the taste distribution for a particular subgroup of individuals based on their observed sequence of choices. Their idea is a useful tool for policy makers as it might provide information for the design of new products or the allocation of resources to enhance the characteristics of recreational sites, among many other applications. However, this extension depends critically in the correct identi…cation of the initial distribution function for the parameters. If we only rely on observed behavior, our conclusions will be constrained by the experimental design, that is, the number of alternatives and combinations of attributes that people face when they make their market decisions. However, and especially in nonexperimental studies, we cannot always provide all possible com2
binations of attributes and alternatives at the same time, or in the whole period of time we collect data, some combinations of attributes or alternatives will never be available. Thus, individuals make decisions over subsets of the potential set of goods. Asking people directly about their attitudes towards certain set of attributes helps us overcome this informational limitation and to characterize the distribution of tastes in the population. However, relying exclusively in psychometric questions does not take into account people do make decisions under constraints and they are willing to accept an undesirable characteristic in compensation for some level of a desirable one. It seems reasonable then, to combine both sources of information in the identi…cation of heterogeneity. The combination of these two sources of information, revealed preferences and psychometric measures of tastes, provides a convenient way to overcome the lack of information needed to evaluate how appropriately a given distribution describes people’s preferences. Results in a ML model depend strongly on assumptions about the distribution of the unknown component of the parameters. For some parameters, there is theoretical support to …nd out whether the distribution obtained after the estimation process has a reasonable shape. However, in many occasions we do not know the correct sign, if any, and the appropriate shape of the distribution for the parameters of interest. In other words, we do not know whether the outcome of the random parameter estimation really re‡ects the true taste variation over people. Paradoxically, it is for those parameters that we use a random parameter model after all. In general, we want the entire distribution of the parameter because we want to interpret this distribution as a re‡ection of taste variation over the population. Let us discuss examples for these two situations in the context of recreation demand models. First, consider a travel cost coe¢ cient (price coe¢ cient), and assume that the random component of this coe¢ cient is normally distributed over
3
the population. Within this framework, it is commonly found in a ML model that a portion of the distribution of this coe¢ cient lies in the positive part of the real line. In other words, there is a nonzero probability that the price coe¢ cient for some individuals is positive, though this is not a plausible result from a theoretical perspective. Given this prior belief, researchers solve this inconsistency constraining the support of the distribution, such that the estimated model satis…es the theoretical restrictions, for example, using a lognormal distribution for the coe¢ cient (Walker and Ben-Akiva 2002; Train, 2003). This is a convenient solution to overcome possible unrealistic values for the parameters, but we would expect that if the model is correctly speci…ed and the sampling procedure is appropriate for the problem under analysis then the selection of the distribution should not change radically the main qualitative results. Second, consider a parameter associated with a variable measuring the crowding conditions in a recreational site. Again, using a normal distribution assumption for the unknown parameter, we will probably obtain a portion of the estimated distribution in both the positive and negative part of the real line. Evaluation of this result is more complicated in this case because we do not have an unambiguous prior belief for the sign of the parameter. We could always come out with some arguments in favor of any of these two results1 . On one hand, we could justify a positive coe¢ cient arguing that high crowding levels produce an agglomeration e¤ect, i.e., people like crowded places. On the other hand, a negative sign for the coe¢ cient would re‡ect people dislike crowds, which is typically called a congestion e¤ect. It could be argued that the same factors impeding a correct estimation of the price parameter distribution might a¤ect the distribution of our new parameter of 1 A similar situation can be found in a traditional …xed parameter estimation. Given the lack of theory or additional information about the relationship between an explanatory variable and the dependent variable, almost any sign of the coe¢ cient and sometimes any magnitude of it can be explained by some creative relationship between the variables.
4
interest. If this is true, we should proceed in the same way as we did with the price distribution, i.e., we will need to shift a little bit the distribution to its correct position, using some predetermined criterion, additional information, or just our own beliefs about it. Unfortunately, unlike the price example we cannot discriminate between a correct and an incorrect distribution because we do not have enough information, from a theoretical perspective, about how people should react to this attribute. We are interested in using psychometric measures as additional information to evaluate whether this distribution is correct. The common practice in random parameter models is to adjust the distribution of those parameters for which we have prior beliefs, but accept as correct the distribution of those parameters for which we do not have such priors. In this way we treat di¤erent parameters asymmetrically, and there is not reason to think that only some of the distributions are correct. Ben-Akiva et al. (1999a; 1996b) and Walker and Ben-Akiva (2002) develop a theoretical framework that underscores many aspects of the decision making process a¤ecting the observed outcome in a choice model. They suggest the use of psychometric data to measure attitudes and perceptions about attributes of alternatives and use this information in the estimation process to enrich the model. It is a matter of new research to evaluate how these indicators have to be measured and to what extend these taste indicators can be used as additional information to identify the distributions. In our application, a simple ML model makes us conclude that around half of the sample likes crowded places. However, the psychometric scales tell us that people strongly dislike this attribute and they look for places with low levels of crowding. Integrating the psychometric measures in the estimation of the model adjust the estimated distribution, providing di¤ering conclusions in comparison with the simple ML model. In this paper we focus in the way that psychometric information can help to
5
characterize taste distribution among individuals, but we also contribute to the literature on congestion in recreation demand model in several ways. Previous literature on congestion has found both positive and negative signs for parameters associated with crowding (Cicchetti and Smith, 1973; Deyak and Smith, 1978; McConnell, 1977; Berrens et al., 1993; Boxall et al., 2003; Schuhmann and Schwabe, 2004).2 As mentioned above, both results are justi…ed as the existence of agglomeration or congestion e¤ects, respectively. Additionally, some authors claim the e¤ect of low levels of crowding is positive but after a certain threshold, crowding becomes detrimental to people’s utility. Schuhmann and Schwabe (2004) estimate a model with linear and quadratic e¤ects of crowding and …nd results consistent with this hypothesis since there is a positive linear e¤ect and a negative quadratic e¤ect. From these studies, only a few of them have estimated congestion e¤ects using revealed preference methods (Deyak and Smith 1978; Bell and Leeworthy 1990) and most of the them use stated preferences (McConnell, 1977; Boxall et al. 2003). The di¤erences in sign for the parameter in these applications and the prevalence of stated preference studies can be explained by two issues. First, there is not a consistent way to measure congestion in recreation demand models (Shelby, 1980; Jakus and Shaw, 1997; Boxall et al., 2003). Some authors use objective measures of congestion such as intensity of use, de…ned as aggregated demand per unit of area or parking spots available, waiting times for using resources like boat ramps or climbing walls, number of encounters with other users, etc., while other researchers focus on subjective measures or perceived measures of congestion. Second, Boxall et al. (2003) argue that the problem in revealed preference estimation is that aspects such as expectations, experience and mitigation behavior are confounded with the measure of congestion, therefore it is di¢ cult to disentangle the e¤ect of congestion from other aspects of the decision process. This confounding e¤ect is the result of the endogeneity of crowding levels in many of the studies using revealed preferences. 2 Earlier literature discussing congestion from a theoretical perspective are Anderson and Bonsor (1974), Freeman and Havemann (1977), Cesario (1980), Smith (1981), and McConnell (1988).
6
None of revealed preference studies mentioned above use a discrete choice model as the one presented in this paper. Recently, Timmins and Murdock (2007) develop a method to estimate a revealed preference discrete choice model that takes into account the endogeneity of congestion. Endogeneity in their model is the result of people sorting across sites. They apply this model to estimate welfare measures for …shing activities while O’Hara (2006) estimates the e¤ect of congestion in the demand for rock climbing. These two studies do not have any measure of congestion for the recreational sites in their sample. The absence of this measure in their data, together with the high correlation of congestion with other explanatory variables, and the endogeneity of this variable impose several econometric challenges. They use a sophisticated iterative instrumental variable procedure, that allows them to jointly estimate a proxy for congestion and the parameters of the model. However, in their model, the proxy for congestion is also confounded with other alternative-speci…c site attributes, therefore they suggest a two stage estimator to identify the parameters associated with attributes varying only over alternatives. Unfortunately, this second stage estimation is a linear regression model whose properties depend on the size of the choice set. If there are only a few number of alternatives in the choice set, the second stage might not be a suitable solution. In contrast, in our estimation we do have a measure of congestion. This measure of congestion can be classi…ed as an actual measure of congestion because it has been measure independently by somebody outside of the sample, therefore, according to Jakus and Shaw (1997) it could be considered exogenous to the individual.3 Finally, from a theoretical perspective, Freeman and Havemann (1977) and McConnell (1988) have shown that heterogeneity over willingness to pay and over aversion to congestion a¤ect the optimal price de…nition needed to achieve a socially optimum level of congestion and that heterogeneity also determines the composi3 We
will test this hypothesis in our estimation process.
7
tion of users of the sites, since users with lower price sensitivity and higher aversion to crowding might increase their demand for places with higher entrance fees. Previous literature on congestion has addressed heterogeneity in preferences for congestion through the estimation of di¤erent models for di¤erent groups of individuals, di¤erent parameters for di¤erent activities or di¤erent sites, or not considers it at all in the estimation. In other words, these applications on congestion belong to the …xed parameter framework since random parameters have not been applied to estimate preference heterogeneity for congestion. Even though the sorting model of Timmins and Murdock can be extended to a random parameter model, they do not provide any estimation of heterogeneity for the congestion parameter. Our paper suggests a simple way to use actual measure of congestion together with indicators about attitudes toward congestion to describe the distribution of tastes for this attribute. The model is estimated in a much simpler way and produce similar qualitative results than those in the literature aforementioned. Nevertheless, after estimation of the model, any welfare calculation should take into account the sorting over sites produced by changes in the number of available sites or in the quality of them. We do not provide the calculation of welfare measures since our main objective in this paper is to stress the way that taste indicators can help to extend the estimation and interpretation of ML models. In the next section we describe the theoretical background of random parameters in the context of M L. This section presents the econometric tools needed to estimate the model and to characterize the distribution of taste among individuals, and describes several ways suggested in the literature to incorporate additional information to characterize the distribution of taste, including the use of psychometric data. Section 3 describes the data, the estimated model and the main econometric results. Finally, we present a discussion of these results and conclusions.
8
2
Background
The random parameter model presented by Train (1998; 2003) starts with the de…nition of a utility function for individual n in period t given that he chose alternative j; denoted by Unjt =
0 n xnjt
+ "njt ;
(1)
where xnjt is a vector of observed attributes of the alternatives,
n
is a vector of
unobserved individual’s coe¢ cients varying randomly over individuals and "njt is an identically and independently distributed extreme value error term, independent of n:
The vector of coe¢ cients can be de…ned as
e¤ect and n
n
n
=b+
n;
where b is the average
represents an individual’s deviation from the average, in other words,
is characterized by a probability distribution,
g( j ), where
n
is a set of
parameters de…ning the distribution g(:): Replacing this de…nition in (1) the utility function is Unjt = (b +
0 n ) xnjt
+
njt
= b0 xnjt +
0 n xnjt
+ "njt :
Given the assumption for the error term "njt , and conditioning on the random component
n;
the probability that individual n chooses alternative j in period t is
given by Lnjt
e =P
0 n xnjt
ie
0 n xnit
;
(2)
which is the traditional formulation of a conditional logit model. Typically, researcher have a panel of N individuals making decisions over T periods, then, grouping all decision occasions for the same individual, the joint probability that individual n chooses a sequence of alternatives ynj = hynj1 1 ; :::; ynjT T i is given by Lnj ( ) = Lnynj1 1 ::: LnynjT T =
T Q
t=1
Lnynjt t =
T Q J Q
t=1 j
e P
0 n xnjt
ie
0 n xnit
!ynjt
;
(3)
with ynjt equal to 1 if person n; chose alternative j in period t and 0 otherwise.
9
Since we do not observed
n;
the unconditional probability is obtained integrating
out this random component, that is, Pnj =
Z
Z
Lnj ( )g ( j ) d =
T Q J Q
e P
ie
t=1 j
and the likelihood function for the sample is L( ) =
N Z1 Y T Y J Y n
t
1
j
0 n xnjt 0 n xnit
!ynjt
g( j )d ;
(Lnjt )ynjt g ( j ) d ;
or in its log form
` = ln L( ) =
N X n
0
ln @
Z1 Y T Y J t
1
j
1
(Lnjt )ynjt g ( j ) d A :
The integral in the likelihood function is approximated by simulation, i.e., R random draws are taken from the distribution g ( j ) ; and the probability Pnj in the likelihood function is replace by R
Pnj
1 X = R r=1
where
r
T Q J Q
t=1 j
e P
r0
ie
xnjt r0
xnit
!ynjt !
denotes the r-th random draw. The estimated parameters ^ obtained by
maximizing this likelihood function are called the simulated maximum likelihood estimators. An important issue in the literature involves the de…nition of the random parameter
n;
in order to correctly capture the variation in individuals’ preferences.
Some authors (Berry, 1994; Berry et al., 1995) suggest incorporating individual information in the de…nition of
n;
so that the randomness of this parameter depends
on some observable information and not only on the assumption about the random component
n.
This inclusion will turn into the traditional interaction between
individual information and attributes of the good. Ben-Akiva et al. (1999a; 1999b) and Walker and Ben-Akiva (2002) suggest in10
cluding taste indicators in the model. If we treat them as another explanatory variable, the simplest way to incorporate these indicators in the model is to plug 4 n.
them directly into the de…nition of the random parameters ideas together
n
Putting these two
could be de…ned as
n
= b + Dn + Hn +
n;
(4)
where Dn are demographics variables and Hn are variables re‡ecting individual opinion about attributes of the goods (taste indicators). Then the utility function will be transformed to Unjt = (b + Dn + Hn + =
0 n ) xnjt
Dn xnjt + Hn xnjt + (b +
+ "njt
(5)
n ) xnjt
+ "njt :
(6)
This model includes new sources of heterogeneity through interactions between individual’s information and attributes, Dn xnjt and between indicators and attributes, Hn xnjt : Ben Akiva et al. (1999a; 1999b) and Walker and Ben-akiva (2002) explicitly reject the idea of putting the indicators inside of the de…nition of
n,
mainly because there
is not a clear causal e¤ect of these indicators, they only help us measure the causal relationships existing in the model. Their model deals with latent explanatory variables, where researchers do not observe all the relevant explanatory variables of the model and the taste indicators measure how people feel about this unobservable variable (latent variables). In our case we assume we observe the relevant explanatory variables, but we want to describe how each individual of the sample feels about this attribute, in other words, we want to describe the distribution for the parameters associated with these variables. We adjust their model in order to …t our main interest. 4 See Walker and Ben-Akiva (2002) for a full literature review on di¤erent ways to extend the traditional random utility model.
11
The structural model propose by these authors includes the utility equation given in (1) and the equation that characterizes the random parameters given in (4) but without taste indicators, i.e.,
n
= h(Dn ; ) +
= b + Dn +
n;
(7)
notice that Dn is the matrix of explanatory variables including any relevant information except taste indicators. Ben Akiva et al. de…ne some measurement equations, which allow us to identify the structural model. In the choice case we have already de…ned these equations, which are the indicators ynit explained above. This measurement provides us with the choice probability equation given by either (2) and (3) depending on whether we have a cross section or a panel data. Finally, we have to de…ne the taste indicator measurement equations IS = w (W;
n;
)+
D (0;
);
these indicator equations are the answer to S psychometric questions that capture with some level of certainty the true individuals’taste, which we want to identify. The de…nition includes any functional form for w(:) and also any set of explanatory variables, W; beside the de…nition of
n:
Putting all these equations together
we obtain a model similar to the ML described above, but with the inclusion of an additional element capturing the information in the indicator equations. The probability that an individual n chooses a sequence of alternatives, ynj ; is Pnj (ynj jxnj ; Dn ; ; ; ") =
Z
Lnj (y jxnj ;
; ; ; ") g (
jDn ; ; ) d
;
comparing to the previous de…nition we have only introduced new notation to make explicit that this probability depends on the parameters of the function de…ning , i.e., parameters in ; on the error components of this equation ; and the error de…nition of ".
12
Now introducing the indicator component into the model and assuming independence among error terms in the three equations, the joint distribution of ynj and I is Z
f (ynj ; I jxnj ; Dn ; W ; ; ; ; ; ") = Lnj (y jxnj ;
where f (I jW;
; ; ; ") f (I jW;
; ; )g(
jDn ; ; ) d
;
; ; ) is the density function associated with the indicator equa-
tions. Again this integral can be calculated by simulation. Assuming an extreme 0 s n
value distribution for "; and a linear form for the indicators, Is = s
+
s;
with
normally distributed, we would have
f (ynj ; I jxnj ; Dn ; W ; ; ; ; ; ") =
Z
T Q J Q
e P
r=1
0 n xnit
ie
t=1 j
r Y 1
0 n xnjt
0 s n
Is s
!ynjt g(
jDn ; ; ) d
s
;
which is calculated using simulations as 1 X R r
f (ynj ; I jxnj ; Dn ; W ; ; ; ; ; ") =
S Y 1
s=1
T Q J Q
t=1 j
Is
s
e P
r0
ie
0 s
xnjt r0
xnit
!ynjt !
r
:
s
By and large, there are many ways to capture heterogeneity. The simplest way is to include individual information interacting with attributes of the good that have a …xed parameter. The parameters are …xed but since the explanatory variable varies among individuals, its e¤ect is speci…c to each subject. This is analogous to incorporating individual attributes Dn and/or taste indicators Hn in the de…nition of the random parameters as shown in equation (5), since these variables will enter in the model as interactions with attributes of the alternatives with a …xed coe¢ cient. Second, together with the interactions we could include a random component in the model that captures unobserved individual e¤ects. Finally, we could use the 13
indicators, assuming they partially describe the true latent parameter
; and in-
corporate their distributions directly in the de…nition of the likelihood function. In this way the indicators are not a direct part of the random parameter de…nition but instead they contribute to calibrate the estimated distribution of the random parameter in order to make this distribution consistent with how people perceived these attributes.
3
Application
In the traditional revealed preferences approach with panel data, we observed N individuals making a recreational trip to one out of a group of di¤erent destination zones in a period of time t; during T periods. Our data consists of 431 Alaskan anglers that make …shing trips to capture King salmon during the summer of 1986 (22 weeks). For each individual n; we have information on his travel cost (T Cnj ) to each of the 21 sites where King salmon is available. We also have information on the environmental quality (Qjt ) of each site at each decision occasion, which is measured as an eight-level indicator of availability of …sh, starting from 1 for not available to 8 when the site has plenty of …sh. Additionally there is a three-level indicator of crowding conditions (CRjt ) at site j in period t, with 0 for not crowded, 1 for somewhat crowded and 2 for very crowded. There is also a dummy variable indicating whether people have a cabin in site j (CAnj ) and a measure of the amount of King salmon harvested in site j in the previous year (HARj ). Finally there are several individual information variables that will be described below5 . Then, the form for the utility function in equation (1) is Unjt =
1 n ln(T Cnjt )
+
2 n
Qjt +
3 n
CRjt +
4
CAnj +
5
ln(HARj ) + "njt :
For identi…cation purposes we assume that only the …rst three parameters are random. Even though the model can be estimated with all the parameters being 5A
more detailed description of the data can be found in Vásquez and Hanemann (2006).
14
random, there have been concerns in the random parameter literature about the identi…cation of models when all the parameter are random (Walker, 2002). Columns one and two in table 1 present the results of a …xed parameter model and a random parameter model assuming independent normal distributions for the parameters and using 100 Halton sequences as the sampling procedure for the R draws used in the calculation of the integral discussed above6 . All coe¢ cients are statistically signi…cant and with the expected sign except for the crowding coe¢ cient in the …xed parameter model. Two results are the most interesting in the ML model. First, all the distributions have part of its mass in both sides of the real line. For the price coe¢ cient, we observed a probability of 10% that an individual has a nonnegative value for the parameter, which is inconsistent with economic theory. Similarly from the distribution for the quality indicator we conclude that around 9% of the sample prefers places with lower quality. These outcomes are statistically feasible since we have not constrained our distribution to be only in the positive or negative part of the real line. We could solve this problem using a log normal distribution for those coe¢ cients or just conclude that the portion of the distribution that is outside of the expected range is not signi…cant.7 Second, the ML model provides additional information about the crowding parameter. As noticed before, this coe¢ cient is not signi…cant in the …xed parameter model; however, it is signi…cant in the ML estimation with a standard deviation that is also signi…cant. The distribution for this estimation is shown in …gure 1 under the name ML distribution. These results make us conclude that around 60% of the population likes crowded places. If this is the case, it could explain why this coe¢ cient is not signi…cant in the …xed parameter estimation; negative values for some individuals cancel out with positive values of other individuals. Unfortunately, unlike price and quality, we do not have a theoretical guideline to 6 Actually, a Halton sequence is a systematic sampling procedure instead of a random procedure, which increases the coverage and reduces the variance of the simulator. See Train (1999; 2003) for a discussion on Halton sequences and other topics on simulation. 7 Same results are obtained with normal draws, so they are not presented in the analysis.
15
decide whether this distribution is correct or not. It could also be shifted from its correct position due to the same reason why the distribution of the price parameter is shifted to the right. Without additional information we cannot really evaluate how appropriate this distribution is. Since we are trying to capture heterogeneity among individuals which turns into a like or dislike category, we use some taste indicators obtained in a Liker scale or like-dislike ranking. From the survey, there are several questions available related to the level of crowding of the sites. Let’s consider one of these questions in which individuals were asked about how desirable was to …sh in a site with few other …shermen around, the results are presented in table 2. Most of the people declared that crowded places are undesirable (Notice that -2 means people like uncrowded places). In …gure 1, we also show the distribution of this ranking using the mean and the variance on table 2 and assuming a normal distribution for it (under the title psychometric information). Clearly there is an inconsistency between what people mentioned in the survey about their preferences and what we concluded from the estimation of the ML model. There are many possible actions we can take to deal with this inconsistency. First, we could ignore the information from table 2 arguing people do not answer this type of questions carefully, and what really matters is what they do and not what they say about their preferences. In our opinion, this could be an acceptable answer only if we have a clear notion about how several aspects of the research design a¤ect the possibility of recovering the true taste parameters. In revealed preference studies individuals do not face all possible combinations of alternatives and attributes, therefore recovering the unconstrained taste distribution is not feasible. For example, if we observed a person visiting a very crowded place, it does not necessarily mean he likes crowded places, it might be the case he tolerates crowded places because there are some other characteristics that make the site attractive in spite of the crowd.
16
To have a better understanding of the problem consider …gure 2 which shows the average level of quality, the average level of quality, crowding level and travel cost for each site in the sample. Places are sorted according to quality in an increasing order, and to account for the di¤erences in magnitudes of the three variables we plot the relative di¤erence of each characteristic with respect to its mean (that is, [xi
x] =x): There are several patterns we can highlight from this graph. First, places with
lower travel cost but with also lower levels of quality have consistently lower level of crowding (for example sites 1, 2, 3, 4, 6, 8, 9, 10, 12 in the …gure). Second, high levels of quality and low cost always have high level of crowding (sites 16, 17, 19, 20, 21, 23, 25). In other words, crowding levels move in opposite direction of price and in the same direction of quality. Third, the positive e¤ect of quality is fully compensated at very high travel cost, this is more evident at the highest levels of quality. For instance, for site 22, 24 and 26 the travel cost is so high that it totally compensates the e¤ect of quality. Fourth, it seems the pattern is a little bit di¤erent for low and high quality, for example, for low level of quality almost all places have the lowest level of crowding, prices only exacerbate the e¤ect. Finally, there are a few exceptions to these patterns. Site 5 has a low level of quality, a high price, and a high level of crowding. A similar situation happens in site 14, but it is less strong and site 11 has medium quality, low cost and one of the highest levels of crowding. Notice that all of these inconsistencies are in the low level of quality. This shows us that the level of crowding is highly correlated with the other two explanatory variables and that there is substitution between these characteristics. In other words, it is clear that people accept higher levels of congestion because the quality of the …shing activities is good in that area. This is especially important when …shing for King salmon, because many people desire to capture a trophy size …sh, so if they want to capture such …sh they have to go to the places with high quality and accept the congestion of these places.
17
To take into account this fact we incorporate the taste indicators in the estimation of the ML model. Our de…nition of the parameter for crowding is as follows,
cr
= f (Dh ; ) +
in this equation we add several individual explanatory variables: size of the household, age, gender, education, number of years people have been …shing in Alaska, a dummy variable taking value 1 if people know many …shing places, number of years as a resident in Alaska and income. The utility function has the same variables de…ned before. Recall that in this case all the demographic variables will be interacting with the crowding rate, the indicator are incorporated directly into the likelihood function through their density function. We use six indicators, all of them are …ve-level scaled questions from very undesirable to very desirable or from de…nitively no to de…nitively yes. Two of them are directly related to crowding. For instance, the variable N OCROW D represents the answer about how desirable is "A site with few other …sherman around." The variable CHCROW D shows how well the statement "We usually go out of our way to avoid sites crowded with other …shermen" applies to an individual household. Furthermore, two indicators are indirectly related to crowding since they describe attributes that we expect to be present together with lower levels of crowding. These variables are BEAU T Y and W ILD; describing a site with exceptional beauty and a wilderness area, respectively. Finally, there are two indicators which are related to the size of the …sh caught, T rophy describes the answer to the statement "a good chance to catch trophy-sized …sh" and Limit describes the statement "a good chance to catch your limit." These indicators are denoted by the equations Is = =
s cr s(
+
s
Dh + ) +
18
s
Results for this model are presented in the last two columns of table 1. For identi…cation purposes the standard deviation of one of these equations has been …xed to one. In this estimation, all of the coe¢ cients of the choice model and the indicator equations are signi…cant, while only size of household, age, age squared and the number of years …shing in Alaska by anglers are signi…cant in the equation for
cr .
The mean value for the crowding coe¢ cient is E( cr ) = 0 X = 0:372 49 and its p standard deviation SD( cr ) = vars ( ) = 0:143. Figure 1, also includes the plot of this distribution under the name integrated model. The new estimation results correctly point out that people do not like crowded places. Nevertheless, there is not a big negative impact of the crowding conditions, compared to what they declared in the psychometric questions. Furthermore the variance is smaller than in the ML estimation suggesting there is not a lot of variation in the way people feel about crowding conditions. We also estimated a ML with demographic variables and/or with indicators included as part of the de…nition of the random parameter,
n
= b + Dn + H n +
n;
but none of these alternatives a¤ect the qualitative results of the simple ML model.
4
Comments
Even though, our main interest is to show how to use taste indicators in order to identify heterogeneity among individuals, there are several aspects related to congestion that have to be discussed in order to evaluate their e¤ect on the main conclusions of this paper. First, it has been recognized in many papers that congestion is a endogenous variable and not taking into account this endogeneity could explain it’s wrong sign in …xed parameter estimations. The endogeneity of crowding measures is related to the way that congestion is measured. Jakus and Shaw (1997) classi…ed measures of congestion in four categories; actual, expected, anticipated and perceived congestion. Our measure of 19
congestion should be considered as an actual measure since it was de…ned independently by an outside member of the sample. Jakus and Shaw conclude that an actual measure of congestion is exogenous to the consumer so researchers should not be concerned about the endogeneity problem. We test this hypothesis following a similar method as that suggest by Villas-Boas and Winer (1999) for models with micro data, which use instrumental variables to test the endogeneity of the variable. We use several instrumental variables to avoid any bias due to the selection of the instrument and none of the results provided any evidence of endogeneity.8 Additionally, as mentioned before, recent applications on revealed preferences dealing with the problem of endogeneity of congestion do not have any measure of congestion, and they have to create proxy variables for it. Since we have measures of congestion their methodology is not completely appropriate to our data, but calculation of welfare measure should take into account the sorting implications of any policy under evaluation (see Timmins and Murdock 2007). Finally the full model could include a time and site speci…c e¤ect, which can be captured by a set of dummy variables for each alternative as in Nevo (2000) and Murdock (2006). In our data all the variables except the previous harvest (HAR) vary at least according to two out of three elements; time, people and site. Therefore it is possible to identify dummy variables for the alternatives since they will not be confounded with the attributes of the sites. Estimations with dummy variables for each site were not di¤erent from those presented above so we report only the simple estimations.
5
Conclusions
We show a simple way to use taste indicators in the traditional ML model, which contributes to describe heterogeneity of preferences over attributes of the goods. 8 Since
Villas-Boas and Winer’s model is also very complex we do not present those results here.
20
This methodology is especially useful for those attributes that could a¤ect individuals’utility in a positive and negative direction. In the particular case of congestion, the traditional ML model report results that are inconsistent with the way people feel about this site attribute. The methodology presented in this paper contributes in bringing together two di¤erent pieces of information; on one hand the observed behavior and on the other hand the attitudes toward the attributes of interests. Our results suggest that researchers could have a better description of preferences if they combine these two sources of information.
References [1] Anderson , F. And N. Bonsor (1974), "Allocation, Congestion and the Valuation of Recreational Resources" Land Economics 50:51-57. [2] Bell, F. and V. Leeworthy (1990), "Recreational Demand by Tourist for Saltwater Beach Days" Journal of Environmental Economics and Management 18:189205. [3] Ben-Akiva, M., D. McFadden, T. Gärling, D. Gopinath, J. Walker, D. Bolduc, A. Börsch-Supan, P. Delquié, O. Larichev, T. Morikawa, A. Polydoropoulou and V. Rao (1999a), "Extended Framework for Modeling Choice Behavior" Marketing Letters 10(3):187-203. [4] Ben-Akiva, M., J. Walker, A. Bernardino, D. Gopinath, T. Morikawa and A. Polydoropoulou (1999b), "Integration of Choice and Latent Variable Models" Working Paper, MIT. [5] Berrens, R., O. Bergland and R. Adams (1993), "Valuation Issues in an Urban Recreational Fishery: Spring Chinook Salmon in Portland, Oregon" Journal of Leisure Research 25(1):70-83.
21
[6] Berry, Steven (1994), “Estimating Discrete-Choice Models of Product Di¤erentiation" Rand Journal of Economics 25(2):242-262. [7] Berry, S., J. Levinsohn and A. Pakes (1995), "Automobile Prices in Market Equilibrium" Econometrica 63(4):841-890. [8] Berry, S., J. Levinsohn and A. Pakes (1998), "Di¤erentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market" NBER Working Papers Series No. 6481 Cambridge, MA. [9] Boxall, P. , K. Rollins, K. and J. Englind (2003), "Heterogeneous Preferences for Congestion during a Wilderness experience" Resource and Energy Economics 25:177-195. [10] Cesario, F. (1980), "Congestion and the Valuation of Recreation Bene…ts" Land Economics 56(3):331-338. [11] Cicchetti, C. and V. Smith (1973), "Congestion, Quality Deteriorations, and Optimal Use: Wilderness Recreation in the Spanish Peaks Primitiva Area" Social Science Research 2:15-30. [12] Deyak, T. and V. Smith (1978), "Congestion and Participation in Outdoor Recreation: A Household Production Function Approach" Journal of Environmental Economics and Management 5:63-80. [13] Freeman, M. and R. Havemann (1977), "Congestion, Quality Deterioration, and Heterogeneous Preferences" Journal of Public Economics 8:225-232. [14] Jakus, P. and D. Shaw (1997), "Congestion at Recreation Areas: Empirical Evidence on Perceptions, Mitigating Behavior and Management Preferences" Journal of Environmental Management 50:286-401.
22
[15] Louviere, J., R. Carson, A. Ainslie, J. Deshazo, T. Cameron, R. Kohn and T. Marley (2002), "Dissecting the Random Component of Utility" Marketing Letters 13:3,177-193. [16] McConnell, K. (1977), "Congestion and Willingness to Pay: A Study of Beach Use" Land Economics 53(2):185-195. [17] McConnell, K.(1988), "Heterogeneous Preferences for Congestion" Journal of Environmental Economics and Management 15:251-258. [18] Murdock, Jennifer (2006), "Handling Unobserved Site Characteristics in Random Utility Models of Recreation Demand" Journal of Environmental Economics and Management 51:1-25 [19] Nevo, A. (2000), “A Practitioner’s Guide to Estimation of Random-Coe¢ cients Logit Models of Demand” Journal of Economics and Management Strategy 9:513-48. [20] O’Hara, Michael (2006), "Nobody goes there anymore....it’s too crowded" Working paper, Binghamton University, Department of Economics. [21] Revel, D. and K. Train (1998), "Mixed Logit with Repeated Choices: Households’ Choices of Appliance E¢ ciency Level" The Review of Economics and Statistics 80(4):647-657. [22] Schuhmann, P. and K. Schwabe (2004), "An Analysis of Congestion Measures and Heterogeneous Angler Preferences in a Random Utility Model of Recreational Fishing" Environmental and Resource Economics 27:429-450. [23] Shelby, B. (1980), "Crowding Models for Backcountry Recreation" Land Economics 56(1):43-55.
23
[24] Smith, K. (1981), "Congestion, Travel Cost Recreational Demand Models, and Bene…t Evaluation" Journal of Environmental Economics and Management 8:92-96. [25] Timmins, C. and J. Murdock (2007), "A Revealed Preference Approach to the Measurement of Congestion in Travel Cost Models" Journal of Environmental Economics and Management 53(2):230-249 [26] Train, K., D. McFadden and M. Ben-Akiva (1987), "The Demand for Local telephone service: A fully discrete model of residential calling patterns and service choices" Rand Journal of Economics 18(1):109-123. [27] Train, K. (1998), "Recreation Demand Models with Taste Di¤erences Over People" Land Economics 74(2):230-39. [28] Train (2003), Discrete Choice Model with Simulation Cambridge University Press. [29] Villas-Boas J. and R. Winer (1999), "Endogeneity in Brand Choice Models" Management Science 45(10):1324-1338. [30] Von Haefen, R., D. Phaneuf and G. Parson (2004), "Estimation and Welfare Analysis with Large Demand System" Journal of Business & Economic Statistics 22(2):194-205. [31] Walker, J. (2002), "The Mixed Logit (or Logit Kernel) Model: Dispelling Misconceptions of Identi…cation" Transportation Research Record 1805:86-98. [32] Walker, J. and M. Ben-Akiva (2002), "Generalized Random Utility Model" Mathematical Social Sciences 43:303-343. [33] Wedel, M., W. Kamakura, N. Arora, A.Bemmaor, J. Chiang, T. Elrod, R. Johnson,P. Lenk, S. Neslin and C. Poulsen (1999), "Discrete and Continuous
24
Representations of Unobserved Heterogeneity in Choice Modelling" Marketing Letters 10:3,219:232.
25
Table 1. Estimation Discrete Choice Models explanatory variable Log(TC) t-value Quality t-value Crowding t-value Cabin t-value Log(harvest) t-value Size of household t-value Age t-value Age*age t-value Gender t-value Education t-value Years fishing in Alaska t-value Know many fishing sites t-value Years as a resident t-value Income t-value
(1) Fixed coefficient
(2) Mixed Logit coefficient s.d.
-0.905 -1.132 0.874 -27.858 -16.285 12.060 0.242 0.257 0.192 12.759 10.271 4.508 0.364 1.309 -0.031 -0.702 3.636 11.583 2.131 2.611 15.577 14.250 0.382 0.434 12.612 11.756 Parameters in definition of beta Indicators -
Nocrowd t-value Chose No crowded t-value Wild t-value Beauty t-value Trophy t-value Limit t-value Bold numbers are significant at a 5 % level.
26
(3) Integrated coefficient s.d. -1.196 -22.464 0.300 10.846 -0.684 -5.723 2.395 22.669 0.393 22.548
0.695 15.685 0.265 -7.865 0.110 9.720 -
0.106 1.978 9.945 2.019 -0.093 -1.677 0.020 0.952 -0.004 -0.217 0.176 2.577 -0.007 -0.120 0.010 0.783 0.002 0.073
-
1.000 3.842 11.578 2.628 10.519 2.976 11.047 2.067 9.894 3.342 11.297
1.027 18.599 0.586 27.764 0.769 25.774 0.691 26.161 0.880 26.039 0.760 32.471
Table 2. A site with few other fishermen around very desirable desirable no opinion undesirable very undesirable mean
Value -2 -1 0 1 2
frequency 215 180 34 6 2
-1.373
27
variance
% 49.2 41.19 7.78 1.37 0.46 0.526871
28 0 -4
0.5
1
1.5
2
2.5
3
-3
-2
Psychometric information
-1
0
Figure 1
x
1
2
ML Distribution
Integrated Model Distribution
Distribution for crowding coefficient
3
4
29
-2.00
-1.00
0.00
1.00
2.00
3.00
4.00
5.00
1
2
3
4
5
6
7
8
10
11
QUALITY
9
12
14
15
Figure 2
CROWD
Sites
13
16
18
19
TRAVCOST
17
Crowding levels, Prices and Quality
20
21
22
23
24
25
26