Effects of distributional assumptions on conditional estimates from Mixed Logit models Stephane Hess The University of Sydney∗ and ETH Z¨ urich† John M. Rose The University of Sydney‡ Abstract With the gains in popularity of random coefficients models such as Mixed Logit, an increasing body of work is looking at the issue of the distributional assumptions made in such models. The results from these studies have shown that more flexibly distributions can have significant advantages over their more assumption-bound counterparts, such as the commonly used Normal distribution. However, their use also leads to significant increases in model complexity and estimation cost. Recently, there has also been an increase in the number of applications making use of individual-specific draws for random coefficients, obtained through conditioning on the observed choices. However, much as in the case of work based on unconditional distributions, the majority of such applications is based entirely on the use of the Normal distribution, and there is little knowledge as to the impact of the (unconditional) distributional assumptions on conditional model results. This gap is filled by the present paper, which, with the help of various applications making use of real and simulated data, shows that different distributions can lead to very different results, even when conditioning on observed choices. Furthermore, the advantages of certain distributions in estimation do not necessarily translate into advantages when working with conditional draws. Finally, the results show that model fit on its own is not an appropriate indicator of how good an approximation to the true distribution is provided by a model, especially when basing the analysis on conditional estimates. ∗
Institute of Transport and Logistics Studies, The University of Sydney,
[email protected] † Institute for Transport Planning and Systems, ETH Z¨ urich,
[email protected] ‡ Institute of Transport and Logistics Studies, The University of Sydney,
[email protected]
1
1
Introduction
The random coefficients formulation of the Mixed Multinomial Logit (MMNL) model (cf. Revelt and Train, 1998; Train, 1998; McFadden and Train, 2000; Hensher and Greene, 2003; Train, 2003) is fast becoming one of the most widely used econometric structures for the analysis of travel behaviour. The main advantage of the MMNL model over its more simplistic closed-form counterparts is that it allows for a relaxation of the assumption of constant marginal utility coefficients across individuals. Here, one of the main topics of interest has been the representation of variations in respondents’ valuation of travel time savings (VTTS), i.e., differences in the willingness to pay for reductions in travel time. Applications making use of the MMNL model in this context are for example discussed by Algers et al. (1998) and Cirillo and Axhausen (2006). In this context, one issue that has received repeated attention is the impact of distributional assumptions on the VTTS findings. Indeed, in the specification of a MMNL model, analysts need to make an a priori choice of a specific mixture distribution for each randomly distributed coefficient. In the absence of information on the true shape of the distribution, these assumptions can have very significant impacts on the quality of the results, as discussed for example by Hensher and Greene (2003), Fosgerau (2005) and Hess et al. (2005). While issues with the choice of distribution also arise in the case of individual randomly distributed coefficients, they are compounded when interested in ratios of two randomly distributed coefficients, such as in the calculation of VTTS measures from models estimated in preference space1 . Here, the expected value of the ratio is not equal to the ratio of the expected values for the two coefficients, such that the ratio needs to be simulated over a set of different draws from the two distributions. In this case, problems can be caused by the range of the two coefficients, and especially the travel cost coefficient used in the denominator of the trade-off. Indeed, very small values for the travel cost coefficient used in the simulation of the trade-off can lead to extreme values, and an overestimated variance. The vast majority of MMNL applications make use of the Normal distribution. Here, problems can arise due to the unbounded nature of the distribution, as well as due to its symmetry assumption. These can lead to issues with sign violations and biased mean values respectively. A possible solution is to use more flexible distributions, not making a strict symmetry assumption, while also allowing for the estimation of bounds to either side. Examples of such distributions include the Johnson SB , used for example by Train and Sonnier (2005) and Hess et al. (2005). The problem is that models making use of these more advanced distributions are 1
I.e., using separate travel time and travel cost coefficients.
2
far more difficult to use in practice, in terms of estimation times as well as data requirements. As such, more basic distributions, such as the Normal, continue to be used in most applications. The issue when using random coefficients models is one of trying to infer information on the true shape of a distribution while minimising the impact of the distributional assumptions on these findings. As such, researchers should aim to maximise the probability that their findings are in fact retrieved from the data, rather than being a result of the distributional assumptions. When using inflexible distributions such as the Normal, it is clearly impossible to eliminate the risk of bias caused by the distributional assumptions. On the other hand, the cost of using more flexible distributions often means that the use of basic distributions, such as the Normal, is the only viable option. One way of breaking free from the strict shape assumptions of target distributions such as the Normal is to conduct a posterior analysis that generates individual-specific draws for the sample of respondents used in the analysis, by conditioning on the observed choices (cf. Revelt and Train, 1999; Train, 2003; Sillano and Ort´ uzar, 2004; Greene et al., 2005). Let β give a vector of taste coefficients that are jointly distributed according to f (β | Ω), where Ω is a vector of distributional parameters that is to be estimated from the data. Let Yn give the sequence of observed choices for respondent n, and let L (Yn | β) give the probability of observing this sequence of choices with a specific value for the vector β 2 . Then it can be seen that the probability of observing the specific value of β given the choices of respondent n is given by: K (β | Yn ) = R
L (Yn | β) f (β | Ω) β L (Yn | β) f (β | Ω) dβ
(1)
Here, we replace the continuous formulation by a discrete approximation, using summation over a very high number of draws. A mean for the conditional for respondent n is then obtained as: PR r=1 [L (Yn | βr ) βr ] c βn = P , (2) R r=1 L (Yn | βr ) where βr with r = 1, . . . , R are independent multi-dimensional draws with equal cn gives the most weight from f (β | Ω) at the estimated values for Ω. Here, β likely value for the various marginal utility coefficients, conditional on the choices observed for respondent n. It is important to stress that the conditional estimates 2
A corresponding formulation for single-choice experiments such as typically encountered in Revealed Preference data is obtained by replacing the probability of the observed sequence of choices with the probability of the single observed choice.
3
for each respondent follow themselves a random distribution, and that the output from equation (2) simply gives the expected value of this distribution. As such, a distribution of the output from equation (2) across respondents should not be seen as a conditional distribution of a taste coefficient across respondents, but rather a distribution of the means of the conditional distributions (or conditional means) across respondents. The use of conditional distributions in discrete choice analyses has a number of advantages. Firstly, they provide a closer approximation to the true distribution of tastes across individuals, potentially reducing the impact of the shape assumptions for the unconditional distribution on model results. Secondly, tradeoffs between random coefficients can now be calculated over a finite set of pairs of conditional taste coefficients. This means that the randomness in this process is reduced significantly when compared to working with simulation, where there is a non-trivial probability of including extreme values in the calculation. This does not only lead to an overestimation of the range of individual coefficients, but, as mentioned above, puts modellers at risk of producing biased trade-offs. Here, it should be noted that, in fact, even when working with conditionals, there is a certain risk of bias when using a ratio of means approach for each individual, given the earlier comment about the draws for individuals themselves stemming from a random distribution. However, the use of simulation of the ratios over these conditional distributions can in itself lead to problems due to the possibility of including extreme values. Furthermore, the risk of bias is reduced by the narrower interval for the individual-specific conditional distributions. Finally, the simulation of individual-specific trade-off values on the basis of individual-specific conditional distributions leads to very high computation cost. A third advantage of the conditional approach is that it is possible to conduct posterior analyses linking the distribution of coefficients to a variation in socio-demographic attributes across respondents. Finally, with unconditional distributions, the treatment of correlation between individual coefficients is non-trivial, and the main method of using the Choleski transformation works only off the mean of the distribution and does not consider the other population moments of the distribution. This issue does not arise with conditional draws, which allow for more substantial tests of correlation. The major drawback with conditional distributions comes in their use in out of sample forecasting. Indeed, draws from a conditional distribution are associated with given individuals in the estimation sample. When moving from a sample to a population3 , it becomes necessary to associate a specific conditional parameter estimate from the sample with each individual in the population. This map3
Or even just an alternative sample used for forecasting.
4
ping of conditional parameters onto a population is a non-trivial task, requiring some form of posterior analysis linking the conditional distribution to respondentspecific information (e.g., income) that is also available at the population level (see for example Hensher and Jones, 2006). Another issue when working with conditional distributions is that, in estimation, modellers specify the taste heterogeneity on the basis of unconditional distributions, whose relationship with the later conditional distributions is not clear a priori. Despite a recent spate of applications (cf. Sillano and Ort´ uzar, 2004; Greene et al., 2005; Hensher and Rose, 2006; Hess, 2006), the use of the conditional approach is still relatively rare. Furthermore, as indicated above, there is little knowledge as to the impacts of the unconditional distributional assumptions on the conditional estimates. Aside from providing a further illustration of the potential benefits of the conditional approach, this paper aims precisely to address this question of the link between the unconditional and conditional distributions. As such, we aim to establish whether advantages of a certain distribution in unconditional estimation do in fact translate into advantages in the accuracy of the conditional estimates. The remainder of this paper is organised as follows. The following section presents the main application, making use of stated preference (SP) route choice data. This is followed in Section 3 by four separate case studies making use of simulated data. Finally, Section 4 presents the conclusions of the paper.
2
Application on stated preference data
In this section, we describe the findings of an application making use of SP route choice data. After a description of the data and the basic model specification (Section 2.1), we first present the results for commuters (Section 2.2), before presenting the results for non-commuters (Section 2.3).
2.1
Data and model specification
The data used in this application were collected in Sydney in 2004, in the context of car driving commuters and non-commuters making choices from a range of levels of service packages defined in terms of travel times and costs, including a toll where applicable. After using a telephone call to establish eligible participants from households stratified geographically, the survey data were collected with the help of a face-to-face computer aided personal interview (CAPI). The actual SP questionnaire presented respondents with sixteen choice situations, each giving a choice between their current route and two alternative routes with varying trip attributes. 5
Figure 1: Example of survey questionnaire A D-efficient design was used in the generation of the SP questionnaires (see for example Bliemer et al., 2005), in which the two SP alternatives are unlabelled routes. The trip attributes associated with each route are free flow time, slowed down time, trip travel time variability, vehicle running cost and toll cost. Variability in travel time for the current alternative was calculated as the difference between the longest and shortest trip time provided in non-SP questions. The SP alternative values for this attribute are variations around the total trip time. For all other attributes, the values for the SP alternatives are variations around the values for the current trip. An example of a survey screen is shown in Figure 1. For a more detailed description of the survey in the context of a recent application, see Hess et al. (2006c). After data cleaning4 , a sample of 7, 072 observations from 442 respondents was obtained, divided into 3, 792 observations from 237 commuters, and 3, 280 observations from 205 non-commuters. A purely linear formulation of the utility function was used, and in addition to the five marginal utility coefficients, dummy variables were associated with the revealed preference (RP) alternative and the first of the two SP alternatives. On the basis of this, the base utility function for alternative i is given by: 4
Respondents observed to always choose the reference alternative across their 16 choice situations were removed from the analysis.
6
Ui = δRP + δSP(1) + βFF FFi + βSDT SDTi + βTC TCi + βTOLL TOLLi + βVAR VARi + εi
(3)
where FFi and SDTi give the free flow and slowed down time of alternative i respectively, with TCi and TOLLi giving travel cost and toll charges, while VARi gives the travel time variability for alternative i. The dummy terms δRP and δSP(1) are only included in the utility functions for the RP and first SP alternative respectively. Finally, εi represents the usual iid type I extreme value term.
2.2
Commuters
Five different models were estimated on the 3, 792 observations for commuters; one Multinomial Logit (MNL) model, and four different MMNL models. All models were coded in Ox 4.2 (Doornik, 2001), where specific code was also written for the calculation of the conditional distributions. For the estimation of the MMNL models, 1, 000 MLHS draws (Hess et al., 2006d) per individual were used in the simulation of the log-likelihood function. The four MMNL models all make the assumption that the taste coefficients vary across respondents, but stay constant across the 16 choice situations for the same respondent, meaning that the integration (and hence simulation) takes place at the level of the individual respondents rather than the individual observations. In the four MMNL models, we allowed for random taste heterogeneity in all five marginal utility coefficients. The difference between the four models comes in the distribution used to represent this variation, where model MMNL(N) uses the Normal distribution, while model MMNL(U) uses the Uniform distribution, model MMNL(T) uses the symmetrical Triangular distribution, and model MMNL(SB ) uses the Johnson SB distribution. In each case, the same distribution is used across all five coefficients. The estimation results for the five commuter models are summarised in Table 5 1 . In the MNL model, all five marginal utility coefficients are of the expected sign and statistically significant, where there is also a significant positive estimate 5
In these models, up to four parameters were estimated for each marginal utility coefficient.
7
8
βVAR
βTOLL
βTC
βSDT
βFF
δSP(1) a b α1 α2 a b α1 α2 a b α1 α2 a b α1 α2 a b α1 α2
δRP
Respondents Observations LL(0) ˆ LL(β) par. adj. ρ2 est. 0.2646 0.0880 -0.0754 -0.0956 -0.3427 -0.3813 -0.0096 -
est. 0.4323 0.1105 -0.1187 0.1235 -0.1582 0.0950 -0.5754 0.3218 -0.6706 0.4326 -0.0265 0.0762
asy. t-rat. 2.86 1.67 -10.32 11.26 -16.82 12.08 -13.39 6.54 -16.55 11.38 -3.11 10.71
MMNL(N) 237 3,792 -4,165.94 -2,324.43 12 0.4392 est. 0.5444 0.1107 -0.3364 0.4111 -0.3255 0.3245 -1.1907 1.2244 -1.4568 1.5487 -0.1574 0.2373 -
asy. t-rat. 3.24 1.68 -20.83 12.53 -31.95 12.82 -11.80 7.77 -27.28 13.35 -10.02 13.73 -
MMNL(U) 237 3,792 -4,165.94 -2,321.38 12 0.4399 est. 0.4435 0.1178 -0.4242 0.5951 -0.4026 0.5092 -1.3995 1.6399 -1.7085 2.0824 -0.2061 0.3523 -
asy. t-rat. 2.96 1.78 -13.49 11.15 -14.69 10.39 -10.06 6.72 -15.71 11.73 -12.19 12.98 -
MMNL(T) 237 3,792 -4,165.94 -2,327.1 12 0.4385
Table 1: Main estimation results for commuter models
asy. t-rat. 3.26 1.72 -18.60 -28.32 -14.91 -29.12 -3.70 -
MNL 237 3,792 -4,165.94 -2,854.28 7 0.3132 est. 0.5921 0.0916 -1.4807 1.5434 -2.3781 1.0835 -0.9409 0.9460 -1.8217 1.0148 -0.8806 0.7148 0.2592 0.0154 -1.5469 1.6574 -0.1167 0.6591 -0.4181 0.4822 -1.3486 0.8348
asy. t-rat. 3.92 1.37 -1.37 1.40 -1.78 2.96 N/A N/A N/A N/A -13.18 8.03 4.43 0.58 -3.76 2.92 -0.52 2.17 N/A N/A N/A N/A
MMNL(SB ) 237 3,,792 -4,165.94 -2,293.94 22 0.4441
for δRP , along with a positive estimate for δSP(1) , significant at the 91% level. All four MMNL models offer very significant improvements in log-likelihood (LL) over the MNL model. After taking into account the differences in the number of parameters, the best performance is obtained by the model using the Johnson SB distribution, ahead of the models using the Uniform, Normal and Triangular distributions. The differences in model fit between these latter three models are very small. All four models retrieve significant levels of variation for all five marginal utility coefficients, where this is especially large for βVAR and βFF . Before proceeding to a more detailed analysis of the results, a further point deserves some attention. Indeed, in the model using the Johnson SB distribution, major issues arose during estimation. Several of the initial estimation runs resulted in an explosion of the offset (a) and range (b) parameters. Furthermore, several additional runs, using different starting values, simply failed to converge, with the estimation continuing after several days6 . The estimation results presented here for MMNL(SB ) were only obtained after lowering the convergence criteria. The problems were however not limited to estimation, but also extended to the computation of the variance-covariance matrix, where it was not possible to obtain standard errors for the parameters of the distribution of βSDT and βVAR . Finally, even for the three remaining coefficients, issues with parameter significance were observed for at least one of the parameters in each case. Similar problems with parameter significance when using the Johnson SB distribution were already observed by Hess et al. (2006a). These issues are a reflection of the difficulties associated with using advanced distributions in MMNL modelling, as alluded to earlier on in this paper. On the basis of the estimation results from Table 1, individual-specific means of the conditional distribution were produced for each marginal utility coefficient in the four MMNL models, conditioned on the observed choices for each respondent. From these conditional draws, summary statistics could be calculated for the distribution of the mean coefficient values across the 237 respondents. Similar Here, parameters a and b are used for the lower bound and the range respectively, such that the upper limit for bounded distributions is given by a + b. The two parameters α1 and α2 are shape parameters. In the MNL model and in MMNL(N), α1 gives a coefficient’s mean value, while, in MMNL(N), α2 is used for the standard deviation. In MMNL(SB ), the first shape parameter α1 has an effect on the skewness of the distribution, where negative values give a right-skewed distribution, zero gives a symmetrical distribution, and positive values give a leftskewed distribution. The second shape parameter α2 defines the actual shape of the distribution in terms of peak, with values greater than 1 leading to a single, progressively steeper peak, while values lower than 1 will eventually lead to two peaks/modes at the extremes of the domain. A b −(z−α , with z ∼ N (0, 1). draw from the SB distribution is obtained as x = a + 1) 1+exp
6
α2
The three remaining MMNL models all took under one hour to reach convergence on a 1.8GHz computer with 1GB of memory.
9
βFF
βSDT
βTC
βTOLL
βVAR
min max mean std.dev. P (> 0) min max mean std.dev. P (> 0) min max mean std.dev. P (> 0) min max mean std.dev. P (> 0) min max mean std.dev. P (> 0)
MMNL(N) est. cond. −∞ -0.8429 +∞ 0.2384 -0.2666 -0.2700 0.2773 0.2039 14.28% 9.70% −∞ -0.7143 +∞ 0.0016 -0.3551 -0.3476 0.2133 0.1569 8.36% 0.42% −∞ -2.2183 +∞ 0.1197 -1.2920 -1.2956 0.7225 0.3823 9.27% 0.42% −∞ -3.3433 +∞ 0.3390 -1.5057 -1.5103 0.9713 0.7529 6.66% 2.11% −∞ -0.4094 +∞ 0.3598 -0.0595 -0.0593 0.1710 0.1433 37.54% 36.71%
MMNL(U) est. cond. -0.6884 -0.6091 0.1527 0.0911 -0.2678 -0.2538 0.2428 0.1753 16.31% 10.55% -0.6661 -0.5808 -0.0022 -0.0486 -0.3342 -0.3201 0.1917 0.1445 4.28% 0.00% -2.4365 -2.0017 0.0690 -0.1721 -1.1838 -1.1856 0.7233 0.4124 9.24% 0.00% -2.9811 -2.6526 0.1880 0.0252 -1.3966 -1.3928 0.9148 0.7317 1.65% 0.42% -0.3221 -0.2723 0.1633 0.1360 -0.0794 -0.0661 0.1401 0.1185 39.01% 35.02%
MMNL(T) est. cond. -0.9407 -0.7419 0.3791 0.1859 -0.2808 -0.2728 0.2694 0.1958 16.15% 10.13% -0.8927 -0.6849 0.2366 0.0294 -0.3281 -0.3327 0.2305 0.1686 10.85% 0.84% -3.1037 -2.2263 0.5331 -0.0779 -1.2853 -1.2832 0.7424 0.4007 9.81% 0.00% -3.7889 -3.0301 0.8292 0.2627 -1.4799 -1.4845 0.9427 0.7404 7.40% 1.27% -0.4571 -0.3387 0.3242 0.2361 -0.0664 -0.0607 0.1595 0.1338 37.91% 35.44%
MMNL(SB ) est. cond. -2.5008 -1.0903 0.1059 0.0296 -0.2245 -0.2281 0.2608 0.2047 14.47% 2.95% -1.5890 -0.7844 0.0087 -0.0510 -0.2761 -0.2746 0.2140 0.1619 0.00% 0.00% -1.4873 -1.4817 0.0005 -0.2836 -1.0066 -0.9998 0.5829 0.3314 0.00% 0.00% -2.6126 -2.2351 0.1866 0.0067 -1.1252 -1.1689 0.7668 0.5871 1.33% 0.42% -0.7062 -0.4343 0.1082 0.0809 -0.0714 -0.0713 0.1453 0.1216 37.38% 37.97%
Table 2: Comparison of estimated distributions and distribution of conditional means for commuter models statistics were also calculated for the estimated distributions, using the parameter values from Table 1. A comparison of the estimated distributions and the distributions of the conditional means is shown in Table 2. Here, it can be observed that, without exceptions, the use of the conditional means leads to a much narrower range for the distribution of the coefficients, where this effect is at its most dramatic when working with the Normal distribution. Here, it should be noted that the results here do not imply that the conditional distribution across respondents is bounded between the limits indicated in Table 2. Rather, these bounds apply to the distribution of the conditional means across respondents. While this narrowing of the range has obvious impacts on the standard deviations for the distribution of the conditional means, the mean values are very similar to those of the unconditional distributions. In the context of recent discussions about evidence of counter-intuitively signed coefficients (cf. Cirillo and Axhausen, 2006; Hess et al., 2005), the most interesting observations from Table 10
ˆ LL(β) LL(βcond )
MMNL(N) -2,324.43 -1,443.77
MMNL(U) -2,321.38 -1,465.22
MMNL(T) -2,327.1 -1,443.33
MMNL(SB ) -2,293.94 -1,422.35
Table 3: Model fit using estimated distributions and conditional means for random taste coefficients in commuter models 2 can be made in relation to the rates of sign violations. Here, it can be seen that the share of positive coefficient estimates decreases significantly when moving from the estimated distributions to the distribution of conditional means. The main exception comes with βVAR , where high rates of positive estimates continue to be observed even in the distribution of the conditional means. This could partly signal risk-prone behaviour by some respondents. Finally, the fact that shares of around 10% continue to be observed for βFF (except in MMNL(SB )) can partly be seen as evidence of trading-off between free flow time and the less desirable slowed down time. The differences between the estimated distributions and the distribution of the conditional means are further illustrated in Figure 2, which clearly shows the much narrower range for the conditional distributions. After these initial insights into the differences between the estimated and conditional distributions, we now proceed with a comparison of the results across the four different distributions used in the analysis. As a first step, we look at the likelihood of observing the specific choices recorded for each individual when making use of the individual-specific conditional mean estimates estimates. To this extent, the four MMNL models were applied to the data, using the final estimates for δRP and δSP(1) from Table 1, together with the conditional draws calculated on the basis of these estimates. The resulting LL values are presented in Table 3, alongside the corresponding values obtained with the unconditional ˆ As exdistributions, equating to the maximum log-likelihood estimates (LL(β)). pected, the results show major improvements in log-likelihood when moving from the unconditional draws to the conditional means. The percentage improvements are very similar for MMNL(N), MMNL(T) and MMNL(SB ) at around 38%, and are only slightly lower for MMNL(U), at 37%. As such, the differences in goodness of fit across distributions remain largely unaffected when moving from the estimated to the conditional distributions. As a final account of the differences between the various models, we look at a set of trade-offs calculated on the basis of the conditional draws. The results of this calculation, looking at the willingness to accept increases in travel cost or tolls in return for reductions in free flow or slowed down time, are summarised in Table 4. Here, it should be noted that the calculation of the conditional trade-offs made use of the ratios of the conditional means across respondents, 11
12
Figure 2: Cumulative distribution functions for estimated distributions and distribution of conditional means in commuter models
βFF vs βTC (AUD/hr)
βSDT vs βTC (AUD/hr)
βFF vs βTOLL (AUD/hr)
βSDT vs βTOLL (AUD/hr)
mean std.dev. P (< 0) mean std.dev. P (< 0) mean std.dev. P (< 0) mean std.dev. P (< 0)
MMNL(N) MMNL(U) MMNL(T) MMNL(SB ) 14.30 15.18 15.06 15.85 14.96 14.59 15.17 17.44 10.13% 10.55% 10.13% 2.95% 18.01 18.94 18.01 19.04 14.73 13.09 14.12 14.48 0.84% 0.00% 0.84% 0.00% 15.69 16.98 20.17 9.00 52.09 67.01 82.75 147.12 11.81% 10.97% 11.39% 3.38% 20.75 25.33 44.28 14.14 67.52 98.81 316.32 151.12 2.53% 0.42% 2.11% 0.42%
Table 4: Trade-offs based on conditional means i.e. calculating individual-specific trade-offs without acknowledging the random nature of the conditional distribution per individual. More work is required to test the effects of this simplification of the calculation. The results from Table 4 show that, even when working with the means of the conditional distributions, there are some differences in the calculated tradeoffs across the four models. For the trade-offs between βFF and βTC , the rate of sign violations is much lower for MMNL(SB ) than in the remaining three models. The same applies when looking at the trade-off between βFF and βTOLL . For this trade-off, the estimates from MMNL(SB ) however also lead to a much wider range for the trade-off, along with a lower mean value than for MMNL(N) and MMNL(U). On the other hand, for MMNL(T), the mean increases when compared to MMNL(N) and MMNL(U), along with the variance. These two observations are due to differences in the conditional draws for βTOLL , and similar observations can be made for the trade-off between βSDT and βTOLL . On the basis of the results shown here, it can be seen that while the differences across distributions may decrease when moving from the estimated distributions to the distribution of conditional means, some differences do remain. As such, the question arises as to what results should be used. It may be tempting to rely on the notion that better log-likelihood equates to better results. However, while MMNL(SB ) does produce the highest LL values when using the conditional draws, it also produces a suspiciously broad range for trade-offs using βTOLL in the denominator. Furthermore, while MMNL(N) and MMNL(T) produce essentially the same LL value when using the conditional draws (cf. Table 3), there are notable differences in the trade-offs produced with the two models. As such, it seems that likelihood on its own is not a perfect indicator of the performance of a given distribution when working with conditional draws. This is consistent with
13
earlier comments by Hess et al. (2005) in the case of unconditional distributions.
2.3
Non-commuters
The results for non-commuters are presented in Table 5. Again, there are significant levels of random taste heterogeneity for all five marginal utility coefficients, and all four MMNL models achieve significant gains in model fit when compared to the MNL model. As in the commuter models, the best performance is obtained by MMNL(SB ), ahead of MMNL(T), MMNL(U), and MMNL(N), where the differences in performance between the latter three models are again very small. Problems were again encountered in the estimation of MMNL(SB ), leading to a requirement for restarting the estimation several times. While, unlike in the corresponding commuter model, it was possible to estimate standard errors for all parameters, a large number of parameters again obtain low levels of significance. Furthermore, a dubiously low standard error is obtained for α1 in the distribution of βSDT . When comparing the estimated distributions to the distribution of the conditional means 6, we again observe a reduction in the range of the distribution for all coefficients, with the exception of βSDT in MMNL(SB ), where the limits are the same for the estimated and conditional distributions. Additionally, we again witness a notable reduction in the incidence of sign violations, where the only exception is βVAR , just as in the commuter models. As was the case in the commuter models, the distribution of the conditional means for βFF does retain a non-trivial positive part, but the reduction when compared to the unconditional distribution is more important than in the commuter models. The findings from Table 6 are reflected in the plots in Figure 3. The next step in the comparison looks at the differences in model fit obtained when working with the conditional means as opposed to using simulation over the unconditional distributions. The results of this process are summarised in Table 7. As was the case for commuters, we again observe very significant improvements in log-likelihood. However, while MMNL(SB ) was the best-performing model in estimation, it is outperformed by MMNL(N) and MMNL(T) when working with the conditional means. Furthermore, while MMNL(U) outperforms MMNL(N) in estimation, it produces the poorest fit when working with the conditional means. On the basis of model fit, this would be an indication that the advantages of a specific distribution in estimation do not necessarily translate into advantages when working with the conditional draws. As a final step in the analysis, we again look at the distribution of the main trade-offs in the four models, using the means of the conditional distributions. The results of these calculations are summarised in Table 8. Here, we again ob14
15
βVAR
βTOLL
βTC
βSDT
βFF
δSP(1) a b α1 α2 a b α1 α2 a b α1 α2 a b α1 α2 a b α1 α2
δRP
Respondents Observations LL(0) ˆ LL(β) par. adj. ρ2 est. 0.2858 0.1480 -0.0813 -0.0926 -0.3645 -0.4429 -0.0087 est. 0.4454 0.1448 -0.1387 0.1299 -0.1491 0.1080 -0.5986 0.4521 -0.8753 0.5828 -0.0245 0.0771
asy. t-rat. 3.50 1.89 -10.00 8.73 -12.13 7.28 -9.09 7.69 -15.05 10.79 -3.10 10.70
MMNL(N) 205 3280 -3603.45 -1972.27 12 0.4493 est. 0.4887 0.1506 -0.3487 0.4167 -0.3383 0.3535 -1.4133 1.5572 -1.8481 1.8790 -0.1511 0.2477 -
asy. t-rat. 3.87 1.99 -16.54 10.82 -23.27 10.56 -11.69 8.99 -41.94 13.71 -9.52 11.02 -
MMNL(U) 205 3280 -3603.45 -1969.92 12 0.4500 est. 0.4509 0.1489 -0.4687 0.6547 -0.4259 0.5552 -1.8171 2.3340 -2.2722 2.8135 -0.2084 0.3691 -
asy. t-rat. 3.48 1.94 -13.86 11.49 -12.82 10.02 -11.18 9.19 -15.37 12.40 -10.05 9.94 -
MMNL(T) 205 3280 -3603.45 -1965.29 12 0.4513
Table 5: Main estimation results for non-commuter models
asy. t-rat. 3.59 2.56 -20.17 -19.92 -14.92 -31.01 -2.59 -
MNL 205 3280 -3603.45 -2395.88 7 0.3332 est. 0.5311 0.1292 -1.3163 1.5165 -2.3871 1.8332 -0.2427 0.2008 0.1572 0.0000 -1.2913 1.1414 -0.1271 0.1124 -2.1105 2.1393 -0.2175 0.5787 -1.1389 1.3018 -3.6878 2.0689
asy. t-rat. 3.58 1.67 -1.03 1.24 -0.94 2.28 -15.27 12.20 144.84 0.06 -9.02 7.03 -1.02 1.17 -4.70 4.20 -0.78 3.33 -1.43 1.52 -1.62 1.68
MMNL(SB ) 205 3280 -3603.45 -1945.26 22 0.4541
16
Figure 3: Cumulative distribution functions for estimated distributions and distribution of conditional means for random taste coefficients in non-commuter models
βFF
βSDT
βTC
βTOLL
βVAR
min max mean std.dev. P (> 0) min max mean std.dev. P (> 0) min max mean std.dev. P (> 0) min max mean std.dev. P (> 0) min max mean std.dev. P (> 0)
MMNL(N) est. cond. −∞ -0.9188 +∞ 0.3572 -0.3114 -0.3220 0.2917 0.2200 14.28% 5.37% −∞ -0.7505 +∞ 0.0452 -0.3348 -0.3415 0.2424 0.1504 8.36% 1.46% −∞ -3.0135 +∞ 0.4109 -1.3441 -1.3547 1.0151 0.6413 9.27% 1.95% −∞ -4.1883 +∞ 0.7053 -1.9655 -1.9543 1.3087 1.0696 6.66% 2.44% −∞ -0.3681 +∞ 0.2777 -0.0550 -0.0542 0.1732 0.1327 37.54% 33.66%
MMNL(U) est. cond. -0.7136 -0.6359 0.1391 0.0919 -0.2873 -0.2907 0.2461 0.1830 16.31% 4.88% -0.6923 -0.6017 0.0310 -0.0312 -0.3307 -0.3256 0.2088 0.1363 4.28% 0.00% -2.8920 -2.4784 0.2945 -0.0763 -1.2988 -1.2608 0.9199 0.5928 9.24% 0.00% -3.7816 -3.2886 0.0633 -0.1369 -1.8592 -1.7922 1.1099 0.9567 1.65% 0.00% -0.3091 -0.2548 0.1977 0.1605 -0.0557 -0.0532 0.1463 0.1087 39.01% 34.63%
MMNL(T) est. cond. -1.0393 -0.8271 0.4125 0.2603 -0.3134 -0.3222 0.2964 0.2198 16.15% 6.34% -0.9445 -0.7166 0.2868 0.0413 -0.3288 -0.3390 0.2513 0.1576 10.85% 1.95% -4.0298 -2.9788 1.1463 0.2098 -1.4417 -1.4056 1.0566 0.6734 9.81% 1.95% -5.0390 -3.8739 1.2004 0.3547 -1.9193 -1.9354 1.2736 1.0576 7.40% 2.44% -0.4622 -0.3259 0.3564 0.2383 -0.0529 -0.0518 0.1671 0.1266 37.91% 34.15%
MMNL(SB ) est. cond. -2.4783 -1.0032 0.3769 0.1487 -0.2731 -0.2822 0.2667 0.2060 14.47% 4.39% -0.4569 -0.4569 -0.0788 -0.0788 -0.2909 -0.2928 0.1876 0.1259 0.00% 0.00% -2.4313 -2.3695 -0.2822 -0.3082 -1.2420 -1.2071 0.9726 0.6506 0.00% 0.00% -3.9737 -3.3537 0.0542 -0.1315 -1.6955 -1.7256 1.1622 0.9795 1.33% 0.00% -2.1443 -0.4615 0.3067 0.1424 -0.0697 -0.0675 0.1503 0.1215 37.38% 31.71%
Table 6: Comparison of estimated distributions and distributions of conditional means for non-commuter models ˆ LL(β) LL(βcond )
MMNL(N) -1972.27 -1199.17
MMNL(U) -1969.92 -1237.67
MMNL(T) -1965.29 -1195.54
MMNL(SB ) -1945.26 -1205.6
Table 7: Model fit using estimated distributions and conditional means for random taste coefficients in non-commuter models serve significant differences in the results across the four models, in terms of the mean values, the standard deviation, and the rates of sign violations. Also, as in the commuter models, these differences are larger than could have been expected on the basis of the small differences in conditional LL across the four models (cf. Table 7). As such, there are significant differences between MMNL(N) and MMNL(T) for the first two trade-offs, along with significant differences between MMNL(N) and MMNL(SB). While MMNL(N) and MMNL(SB) do produce similar mean values for the final two trade-offs, there are significant differences in 17
the implied level of variation across respondents. In common with the analysis on the commuter data, the results obtained for non-commuters thus suggest that important differences do exist between the conditional estimates obtained from models making different (unconditional) distributional assumptions. Furthermore, the scale of these differences is not necessarily reflected in the model fit obtained with the various conditional draws, or indeed the model fit obtained in estimation.
βFF vs βTC (AUD/hr)
βSDT vs βTC (AUD/hr)
βFF vs βTOLL (AUD/hr)
βSDT vs βTOLL (AUD/hr)
mean std.dev. P (< 0) mean std.dev. P (< 0) mean std.dev. P (< 0) mean std.dev. P (< 0)
MMNL(N) MMNL(U) MMNL(T) MMNL(SB ) 11.31 22.88 14.40 20.32 71.16 42.40 127.02 21.43 7.32% 4.88% 8.29% 4.39% 16.95 22.19 22.45 20.14 54.84 27.95 156.45 15.46 2.44% 0.00% 2.93% 0.00% 19.22 17.09 19.27 17.65 48.10 23.92 43.77 25.09 6.83% 4.88% 7.80% 4.39% 19.97 20.02 18.32 18.90 64.70 27.65 54.40 25.20 3.90% 0.00% 4.39% 0.00%
Table 8: Trade-offs based on conditional means
3
Applications on simulated data
In this section, we present the findings from a systematic analysis making use of four separate simulated datasets. A very basic model specification was used in this framework, basing the analysis on a binomial model with a single randomly distributed coefficient7 . The utility functions in this model are given by: U1 = λ x + ε1
(4)
U2 = λ α + ε2 ,
(5)
where ε1 and ε2 are the usual iid type I extreme value terms, and λ is a scale parameter, which, in the generation of the data, was fixed to a value of 2. The attribute x was set to follow a standard Normal distribution, with values distributed across respondents and observations. Finally, the coefficient α takes on various random distributions, depending on the application, and is distributed such that it stays constant across replications for the same respondent. In each 7
The form of this model is derived from a basic binomial logit model estimated in log willingness-to-pay space by Fosgerau (2005) and Hess et al. (2006b).
18
of the case studies, we generated 50 separate datasets, using different draws for the extreme value terms, so as to allow us to gauge the stability of the results. In each dataset, 8, 000 observations were generated for 1, 000 respondents.
3.1
Example 1: Lognormal
For the first case study, we use a Lognormal distribution for α, where this is rescaled by a factor of 21 and shifted to the left by a value of 1, such that we have α = 12 exp(x) − 1 with x ∼ N (0, 1).
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
Estimated LL mean std.dev. -3774.34 35.89 -3802.94 34.73 -3718.18 36.60 -3784.42 35.51 Deviation of min. 56.09% (9.52) 17.75% (4.48) 12.7% (5.59) 44.94% (7.07)
LL with conditional means mean std.dev. -2883.35 44.36 -2964.41 43.25 -2923.72 45.92 -2906.36 44.38
cond. means from true distribution (std.dev.) max. mean std.dev. 68.89% (0.99) 28.52% (6.67) 29.7% (2.86) 82.37% (0.82) 29.62% (6.71) 34.05% (2.72) 32.29% (6.51) 6.47% (4.93) 12.7% (4.97) 76.5% (0.89) 30.92% (6.73) 31.13% (2.82)
Table 9: Results for first simulated data experiment Four different MMNL models were estimated on this data, making use of the same four distributions used in the applications in Section 2. The results of this case study are presented in two parts. Table 9 shows summary results across the fifty datasets, in the form of two measures. First, the table shows the average model fit statistics obtained at the maximum likelihood estimates and when using the conditional means. Furthermore, the table shows results from a comparison of four distributional statistics, namely the minimum, maximum, mean and standard deviation, between the true distribution8 and the distribution of the conditional means. The measure used in these comparisons is the absolute percentage deviation from the statistics for the true distribution, where we present the mean across the fifty runs along with the standard deviation. To give a further account of the variation across the fifty datasets, Figure 4 shows plots of the LL values across runs, with the estimated distributions and the conditional means. In addition, the figure shows cumulative distributions for the estimated distribution and the distribution of conditional means for the first dataset. 8
In the form of the actual 1, 000 draws used in data generation.
19
Figure 4: Results for first simulated data experiment The results from Table 9 show that, in estimation, MMNL(SB ) obtains the best model fit, ahead of MMNL(N), MMNL(T) and MMNL(U). However, when moving to the conditional means, MMNL(SB ) only outperforms MMNL(U), with MMNL(N) obtaining the best performance ahead of MMNL(T). Furthermore, the variation in the LL values between runs is more significant when working with the conditional means than when using the estimated distributions. These findings are reflected in Figure 4, which also shows the closer approximation to the true distribution obtained with the conditional means when compared to the estimated distributions. Importantly, the plots also show that while the variation in the LL values is indeed greater when working with the conditional draws, the ranking of models is the same across the fifty runs. 20
A different picture however emerges when moving from the LL results to the comparison between the performance of the models in retrieving the four main distributional statistics of the true distribution (on the basis of the distribution of the conditional means). Here, we can see that, across the four measures, MMNL(SB ) obtains the best performance, with a sizeable advantage over the remaining three distributions. The level of variation across runs differs across the four measures, and while MMNL(SB ) is less stable than the other three models for the maximum and the standard deviation in the distribution, even a broad confidence interval still allows it to outperform the other models. These results are a strong indication that model fit on its own is not a good indicator for how well a given distribution is able to approximate the true distribution. Indeed, on the basis of the conditional model fit, MMNL(N) and MMNL(T) would have been preferred to MMNL(SB ), which, given the additional results from Table 9, is clearly inappropriate.
3.2
Example 2: Normal
For the second case study, we use a standard Normal distribution for α, such that α ∼ N (0, 1). Here, we only present summary statistics across runs (Table 10), with detailed results available on request.
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
Estimated LL mean std.dev. -3720.25 44.20 -3726.57 44.84 -3719.58 44.28 -3719.78 44.38
LL with conditional means mean std.dev. -2555.25 47.26 -2635.22 47.12 -2555.95 47.05 -2572.60 47.23
Deviation of cond. means from true distribution (std.dev.) min. max. mean std.dev. 27.83% (3.09) 14.98% (4.74) 542.81% (365.36) 11.2% (2.32) 49.16% (1.69) 42.29% (1.93) 517.02% (345.31) 16.02% (2.05) 25.5% (5.39) 18.28% (6.98) 598.76% (413.39) 11.09% (2.39) 35.96% (2.4) 27.6% (2.94) 518.69% (335.32) 12.86% (2.22)
Table 10: Results for second simulated data experiment In terms of log-likelihood, the results show very similar performance in estimation across the four distributions, where only MMNL(U) obtains slightly poorer model fit. When moving to the conditional draws, the relative variation across runs is again larger than was the case in estimation. Here, the best performance is obtained by MMNL(N) and MMNL(SB ), ahead of MMNL(T) and MMNL(U). 21
The difference in performance from MMNL(N) and MMNL(SB ) to MMNL(T) is probably too small to infer any impacts of the more stricter shape assumption on performance. For the results in the second part of Table 10, the high absolute deviations for the mean value should be disregarded; they are a direct effect of the zero mean in the true distribution. For the remaining three measures, the best performance is obtained by MMNL(N) and MMNL(SB ), with the former performing slightly better in the recovery of the tails of the distribution.
3.3
Example 3: Mixture of two Normals
For the third case study, we use a mixture of two Normals, with equal mass (π), a standard deviation of 0.5 and means of −1 and 1. As such, we have that α ∼ N (−1, 0.5) with π = 0.5 and α ∼ N (1, 0.5) with π = 0.5. The results across the fifty runs are summarised in Table 11. The mixture of the two Normals gives a symmetrical bimodal distribution, which should give an advantage to the Johnson SB distribution. As such, it is no surprise that the best performance in estimation is obtained by MMNL(SB ). However, what is surprising is that MMNL(U) outperforms MMNL(T) and MMNL(N), where MMNL(N) in fact obtains the poorest performance overall. The most striking observation is however made when moving to the conditional draws. Here, MMNL(SB ) is now outperformed by the remaining three models, with MMNL(N) obtaining the best performance, ahead of MMNL(T) and MMNL(U). In fact, unlike in the first simulated example, these results are also supported by the findings reported in the second part of Table 11. Indeed, we observe that, while MMNL(SB ) performs best in the recovery of the true mean9 , it is outperformed by the remaining three models in the recovery of the bounds of the distribution, as well as in the recovery of the standard deviation. This is an indication that advantages in estimation do not necessarily translate into advantages when working with conditional means.
3.4
Example 4: Normal with added mass at zero
For the fourth and final case study, we use a Normal distribution with an added mass at zero10 . As such, we have α ∼ N (0, 1) with π = 0.8 and α = 0 with π = 0.2. The results of the estimation across the fifty datasets are summarised in Table 12. Here, the best performance in estimation is obtained by MMNL(SB ), closely 9
Where the differences between models are quite small. This distribution could be useful for attributes that some individuals value positively, while others value them negatively and a third group is indifferent to them. 10
22
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
Estimated LL mean std.dev. -3626.50 34.90 -3602.19 36.53 -3595.33 37.67 -3621.98 35.16
LL with conditional means mean std.dev. -2339.77 37.56 -2411.08 37.62 -2476.43 38.76 -2357.13 37.43
Deviation of cond. means from true distribution (std.dev.) min. max. mean std.dev. 4.51% (2.94) 9.7% (5.38) 29.82% (21.73) 2% (1.95) 30.74% (2.02) 22.03% (2.45) 25.79% (19.85) 7.09% (2.23) 37.89% (2.8) 34.11% (3.63) 24.77% (17.93) 8.67% (2.06) 12.24% (3.71) 4.48% (2.81) 26.07% (19.94) 3.48% (2.3)
Table 11: Results for third simulated data experiment followed by MMNL(N) and MMNL(T). The situation is very similar when working with the conditional draws, where the differences across the four distributions in model fit are slightly larger than in estimation. These results are generally reflected in the second part of Table 12, where the high errors in the recovery of the mean again need to be put in context by a theoretical mean of zero for the true distribution.
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
MMNL(N) MMNL(U) MMNL(SB ) MMNL(T)
Estimated LL mean std.dev. -3802.51 37.12 -3823.13 37.31 -3801.37 37.10 -3805.78 37.29
LL with conditional means mean std.dev. -2759.89 39.88 -2842.37 39.60 -2760.61 39.85 -2775.71 39.92
Deviation of cond. means from true distribution (std.dev.) min. max. mean std.dev. 35.1% (2.99) 23.96% (2.93) 222.52% (163.39) 17% (1.99) 55.72% (1.31) 49.29% (1.33) 210.19% (147.2) 21.75% (1.77) 29.95% (5.29) 29.34% (4.55) 193.28% (123.1) 16.7% (2.08) 43.03% (2.13) 35.66% (1.9) 205.66% (141.19) 18.29% (1.92)
Table 12: Results for fourth simulated data experiment
23
4
Conclusions
This paper has aimed to address the issue of the choice of distribution in Mixed Logit models when results are to be used in a posterior analysis producing individual-specific (mean) coefficient values conditioned on observed choices. With an increase in the number of applications making use of these conditional draws (see for example Greene et al., 2005; Hensher and Rose, 2006; Hess, 2006), this issue is set to become just as important as the choice of distribution in studies making use purely of the unconditional results (cf. Hess et al., 2005). The discussion in this paper has shown that, independently of the distributional assumptions, the conditional approach has clear advantages in the computation of trade-offs when compared to the unconditional approach. Furthermore, working with conditional distributions leads to a much better fit to the data, which, if the problems with mapping (cf. Hensher and Jones, 2006) can be overcome, would lead to significant gains in forecasting performance. While the analysis has shown that the choice of unconditional distributions does have an impact on the conditional results, the advantages of the conditional approach do remain, and there seems to be sufficient evidence to suggest that the impact of the shape assumptions on the conditional estimates is less severe than in the unconditional case. Indeed, it could be argued that, with a large enough sample, the distribution of the conditional means should approximate the true population distribution, while the unconditional distribution has its shape imposed by the analyst, which also leads to issues in forecasting. Overall, these findings would suggest that analysts should increasingly make use of conditional model estimates, especially in studies aimed at producing VTTS estimates (or similar trade-offs), where forecasting is less of an issue. Even though the use of the conditional approach thus appears to have significant advantages over the use of unconditional distributions, the findings from this study does give cause for concern, suggesting that, when working with conditional draws, modellers need to be very careful in their choice of mixture distributions in model estimation. While working with conditionals can decrease the impact of the shape of the estimated distribution, for example in terms of range and sign violations, it seems obvious from the results presented here that some impact does remain. Furthermore, the fact that good performance in estimation does not necessarily translate into good performance of the conditional draws is worrying. More work is required on other datasets, and using a broader range of distributions, to further clarify the impact of distributional assumptions on conditional model estimates. Finally, the simulated data experiments have indicated that model fit on its own is not an appropriate indicator of how good an approximation a target distribution gives to the true distribution. The problem is that, 24
in estimation, the shape of the true distribution is unknown, such that modellers are heavily reliant on model fit as an indicator of model accuracy. As such, it seems that best advice to modellers would be to use as flexible a distribution as possible to decrease the potential impact of the shape assumptions, whether working with unconditional or conditional estimates. In closing, it should again be noted that the analysis here was based entirely on the use of the means of the conditional distributions, disregarding the variation around these individual-specific draws for the various coefficients. This is the basic approach used in studies making use of conditional draws, and as such, the results presented here are of interest despite this simplification. Nevertheless, it remains to be seen to what degree the conclusions presented here apply in the case where the whole distribution per individual is taken into account, especially in the calculation of trade-offs.
Acknowledgements The authors would like to thank Ken Train for helpful comments on an earlier version of this paper.
References Algers, S., P. Bergstroem, M. Dahlberg and J. Lindqvist Dillen (1998) Mixed Logit Estimation of the Value of Travel Time, Working Paper, Department of Economics, Uppsala University, Uppsala, Sweden. Bliemer, M. C. J., J. M. Rose and D. A. Hensher (2005) Constructing efficient stated choice experiments allowing for differences in error variances across subsets of alternatives, paper submitted to Transportation Research B. Cirillo, C. and K. W. Axhausen (2006) Evidence on the distribution of values of travel time savings from a six-week diary, Transportation Research Part A: Policy and Practice, 40 (5) 444–457. Doornik, J. A. (2001) Ox: An Object-Oriented Matrix Language, Timberlake Consultants Press, London. Fosgerau, M. (2005) Investigating the distribution of the value of travel time savings, Transportation Research Part B: Methodological, forthcoming. Greene, W. H., D. A. Hensher and J. M. Rose (2005) Using classical simulationbased estimators to estimate individual wtp values, in R. Scarpa and A. Al25
berini (eds.), Applications of Simulation Methods in Environmental and Resource Economics, chap. 2, 17–33, Springer Publisher, Dordrecht. Hensher, D. A. and W. H. Greene (2003) The Mixed Logit Model: The State of Practice, Transportation, 30 (2) 133–176. Hensher, D. A. and S. Jones (2006) Forecasting Corporate Bankruptcy: Optimizing the Performance of the Mixed Logit Model, Abacus, forthcoming. Hensher, D. A. and J. M. Rose (2006) Respondent behavior in discrete choice modeling with a focus on the valuation of travel time savings, Journal of Transportation and Statistics, 8 (2) 17–30. Hess, S. (2006) Posterior analysis of random taste coefficients in air travel choice behaviour modelling, IVT Working Paper, ETH Z¨ urich. Hess, S., K. W. Axhausen and J. W. Polak (2006a) Distributional assumptions in Mixed Logit modelling, paper presented at the 85th Annual Meeting of the Transportation Research Board, Washington, DC. Hess, S., M. Bierlaire and J. W. Polak (2005) Estimation of value of travel-time savings using mixed logit models, Transportation Research Part A: Policy and Practice, 39 (2-3) 221–236. Hess, S., M. Bierlaire and J. W. Polak (2006b) A systematic comparison of continuous and discrete mixture models, paper presented at the 11th International Conference of Travel Behaviour Research, Kyoto, Japan. Hess, S., J. M. Rose and D. A. Hensher (2006c) Asymmetrical Preference Formation in Willingness to Pay Estimates in Discrete Choice Models, ITLS Working Paper ITLS-WP-06-12, Institute of Transport and Logistics Studies, The University of Sydney, Sydney. Hess, S., K. Train and J. W. Polak (2006d) On the use of a Modified Latin Hypercube Sampling (MLHS) method in the estimation of a Mixed Logit model for vehicle choice, Transportation Research Part B: Methodological, 40 (2) 147– 163. McFadden, D. and K. Train (2000) Mixed MNL Models for discrete response, Journal of Applied Econometrics, 15, 447–470. Revelt, D. and K. Train (1998) Mixed Logit with repeated choices: households’ choices of appliance efficiency level, Review of Economics and Statistics, 80 (4) 647–657. 26
Revelt, D. and K. Train (1999) Customer-specific taste parameters and Mixed Logit: Households’ choice of electricity supplier, Working Paper No. E00-274, Department of Economics, University of California, Berkeley, CA. Sillano, M. and J. de D. Ort´ uzar (2004) Willingness-to-pay estimation from mixed logit models: some new evidence, Environment & Planning A, 37 (3) 525–550. Train, K. (1998) Recreation demand models with taste differences over people, Land Economics, 74, 185–194. Train, K. (2003) Discrete Choice Methods with Simulation, Cambridge University Press, Cambridge, MA. Train, K. and G. Sonnier (2005) Mixed logit with bounded distributions of correlated partworths, in R. Scarpa and A. Alberini (eds.), Applications of Simulation Methods in Environmental and Resource Economics, chap. 7, 117–134, Springer Publisher, Dordrecht, The Netherlands.
27