Comparison of Parametric and Semiparametric Representations of Unobserved Preference Heterogeneity in Logit Models

Prateek Bansal
School of Civil and Environmental Engineering, Cornell University
301 Hollister Hall, Ithaca NY 14853
[email protected]

Ricardo A. Daziano
School of Civil and Environmental Engineering, Cornell University
305 Hollister Hall, Ithaca NY 14853
[email protected]

Martin Achtnicht
Leibniz Institute of Ecological Urban and Regional Development (IOER), Dresden
&
Centre for European Economic Research (ZEW), Mannheim, Germany
[email protected]

November 4, 2017
ABSTRACT
The logit-mixed logit (LML) model is a recent advance in semiparametric discrete choice modeling. LML represents the mixing distribution of a logit kernel as a sieve function (polynomials, step functions, and splines, among many other variants). In the first part of this paper, we conduct Monte-Carlo studies to analyze the number of parameters (e.g., polynomial order) that three LML variants require to recover the true population distributions, and we compare the performance (in terms of accuracy, precision, estimation time, and model fit) of LML and a mixed multinomial logit with normal heterogeneity (MMNL-N). Our results indicate that adding too many parameters in LML may not be the best strategy to retrieve underlying taste heterogeneity; in fact, overspecified models generally perform worst in terms of BIC. We recommend using neither the minimum-BIC nor the most flexible specification; rather, we suggest starting with the same number of parameters as a parametric model (such as MMNL-N) while checking changes in the derived histogram of the mixing distribution. As expected, LML was able to recover bimodal-normal, lognormal, and uniform distributions much better than the misspecified MMNL-N. Computational efficiency makes LML advantageous in the process of searching for the final specification. In the second part of the paper, we estimate the willingness to pay (WTP) of German consumers for different vehicle attributes when making alternative-fuel car purchase choices. LML was able to capture the bimodal nature of WTP for vehicle attributes, which standard parametric specifications could not retrieve.
Keywords: mixed logit, semiparametric, unobserved taste heterogeneity, flexible mixing
Prateek Bansal, Ricardo A. Daziano and Martin Achtnicht
1. INTRODUCTION: RANDOM PREFERENCE HETEROGENEITY IN CHOICE MODELING
In random utility maximization-based discrete choice modeling, the multinomial or conditional logit (MNL) model (McFadden, 1973) has been widely used, but it cannot handle unobserved differences in preferences across decision-makers. In the past two decades, researchers have recognized the importance of incorporating random taste heterogeneity in many practical situations, such as the valuation of travel time savings that vary across commuters. MNL has subsequently been extended to random parameter logit models, such as the mixed multinomial logit (MMNL) model (McFadden and Train, 2000), which assumes continuous parametric heterogeneity distributions. In addition to MMNL, the literature offers several parametric and semiparametric logit-type models to specify random taste heterogeneity of consumers. However, there is no agreement among researchers on which model (or mixing distribution) to choose. Keane and Wasi (2013) and Fosgerau and Hess (2007) are two seminal papers that compare different random parameter logit models. Keane and Wasi (2013) used data from 10 stated-preference (SP) discrete choice experiments to identify the best parametric random parameter logit model, both in terms of data fit (mainly using the Bayesian information criterion, BIC) and in terms of capturing specific behavioral patterns. Fosgerau and Hess (2007) compared semiparametric with parametric models using SP data and Monte-Carlo studies to explore the best strategy to retrieve the true random heterogeneity in the population. The findings of these and other studies are discussed below.
In terms of parametric heterogeneity distributions, MMNL with normally distributed random parameters (MMNL-N) is the most commonly used specification in research.1 However, the normal distribution may be too restrictive for some practical situations and may create misspecification issues.2 Additionally, when Keane and Wasi (2013) compared MMNL-N with other parametric models3 across 10 SP datasets, the authors never found MMNL-N to be preferable in terms of BIC. MMNL-N also performed worse than other parametric models in capturing extreme consumer behavior (such as lexicographic behavior, in which choices are mainly determined by a single attribute of the alternatives). Since MMNL-N does not appear to be a universally appropriate mixing distribution, and the mixing distribution cannot be known before estimation, a few studies (Bajari et al., 2007; Fosgerau and Bierlaire, 2007; Train, 2008; Fox et al., 2011; Bastin et al., 2010; Fosgerau and Mabit, 2013) have specified semiparametric logit models, which allow more flexible, nonparametric heterogeneity distributions. These models are generally computationally efficient and easier to implement than parametric models. A quick review of these models can be found in a companion paper (Bansal et al., 2017). Fosgerau and Hess (2007) compared a Legendre polynomial-based semiparametric logit model (Fosgerau and Bierlaire, 2007) with other parametric logit models and found it to be the best at retrieving the true distribution of the random parameters across case studies (with true distributions ranging from uniform to multimodal). This finding is expected given the flexibility of specifying a higher number of parameters in semiparametric approaches. Fosgerau and Hess (2007) conclude with a different perspective on the advantage of semiparametric approaches: by allowing a higher number of parameters, semiparametric models can be used as an initial diagnostic tool to identify the underlying heterogeneity distribution, and parametric logit models with fewer parameters can subsequently be estimated for inference and prediction.

Train (2016) recently proposed a semiparametric logit-mixed logit (LML) model (see Section 2.2), which generalizes many previous parametric and semiparametric logit models (see Sections 2.2.1 and 2.2.2 for details). As the name suggests, this model contains two logit formulations: one for the decision-maker's probability of choosing an alternative, and another for the probability of selecting a given parameter value from a finite parameter space. The shape of the logarithm of the mixing distribution can be defined by different types of functions, such as polynomials (see Section 2.2.1), step functions (see Section 2.2.2), and splines (see Section 2.2.3), among many others. Since LML provides a generalized framework for semiparametric logit models, the number of parameters (i.e., the 'order' of the polynomial, the 'levels' of the step function, and the 'knots' of the spline) required to retrieve specific heterogeneity distributions is worth exploring. Thus, in the first part of this paper, we conduct Monte-Carlo studies to analyze the number of LML parameters required to recover different shapes of random taste heterogeneity (see Section 3).

1 A few studies have also used other distributions such as lognormal, Johnson's SB, gamma, and triangular.
2 For example, the marginal utility of price components has to be negative by microeconomic principles. However, a normal distribution for price parameters may misleadingly yield positive values.
3 These other parametric models include: generalized MNL (see Fiebig et al., 2010), theory-constrained MMNL (e.g., a lognormal distribution of a price parameter), and mixed-mixed MNL (MM-MNL; Burda et al., 2008; Rossi et al., 2012).
In addition, the model fit (BIC), estimation time, and retrieval of random heterogeneity (in terms of finite sample bias and probability density functions of the random parameters) of the LML model are compared with those of MMNL-N. In the context of LML, we also investigate the observation of Fosgerau and Hess (2007) that a higher number of parameters yields a better approximation of the true distribution.4 In the second part of this paper, we analyze the purchase preferences of German consumers for alternative-fuel vehicles using MMNL-N and different LML specifications. The objective of this empirical application is to explore the implications of alternative LML specifications (with varying numbers of parameters) for the estimates of willingness to pay (WTP) for various vehicle attributes. Since the true WTP distribution is unknown, we compare the WTP estimates (and probability density functions) of LML with those of MMNL-N and explicitly state the benefits of using LML over MMNL-N. The remainder of the paper is organized as follows: Section 2 discusses the mathematical details of MMNL-N and LML; Section 3 lays out the Monte-Carlo study design and draws insights about the performance of different logit models under different types of random taste heterogeneity; Section 4 focuses on the implications of using LML specifications instead of MMNL-N in estimating stated purchase preferences for alternative-fuel vehicles; and Section 5 concludes with practical recommendations.

4 The loglikelihood of a model with more parameters may be higher, but it is worth exploring whether specifications with more flexible distributions are also preferable in terms of, for instance, BIC.
2. MIXED MULTINOMIAL LOGIT (MMNL) AND LOGIT-MIXED LOGIT (LML) MODELS
As stated in the introduction, the MMNL dominates research in random parameter logit models. In MMNL, the indirect utility derived by decision-maker i from choosing alternative j in choice situation t is:
$$U_{ijt} = x_{ijt}'\beta_i + \varepsilon_{ijt}, \quad i = 1, \dots, N;\; j = 1, \dots, J;\; t = 1, \dots, T, \qquad (1)$$
where $\beta_i$ is a vector of parameters for decision-maker i, modeled as having a parametric, continuous mixing distribution in the population; $x_{ijt}$ is a column vector of observed attributes of alternative j faced by decision-maker i in choice situation t; and $\varepsilon_{ijt}$ is a taste shock that is independently and identically distributed (i.i.d. across decision-makers, alternatives, and choice situations) Type I Extreme Value. Given a specific value of the random preference parameters, the conditional MMNL probability $Q_{ijt}(\beta_i)$ of decision-maker i choosing alternative j in choice situation t has the MNL form:
$$Q_{ijt}(\beta_i) = \frac{\exp(x_{ijt}'\beta_i)}{\sum_{k=1}^{J} \exp(x_{ikt}'\beta_i)}. \qquad (2)$$
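Equation (2) is a softmax of the systematic utilities over the J alternatives. The following is a minimal NumPy sketch for a single choice situation, assuming a hypothetical layout in which the J × K array `X` stacks the attribute vectors $x_{ijt}'$ row by row; subtracting the maximum utility before exponentiating avoids overflow and cancels in the ratio.

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Conditional logit probabilities Q_ijt(beta_i) of equation (2).

    X    : (J, K) array; row j holds the attributes x_ijt of alternative j
    beta : (K,) array of individual-specific taste parameters beta_i
    """
    v = X @ beta              # systematic utilities x_ijt' beta_i
    v = v - v.max()           # shift for numerical stability; cancels in the ratio
    expv = np.exp(v)
    return expv / expv.sum()

# Example: four alternatives and two attributes, as in the Monte-Carlo design
X = np.array([[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8], [0.0, 0.0]])
beta = np.array([1.0, 1.5])
p = mnl_probabilities(X, beta)
```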
2.1. Mixed Multinomial Logit Model with Normal Heterogeneity (MMNL-N)
Variations of MMNL models can be derived by assuming different mixing distributions for the random parameters. For example, MMNL-N imposes a multivariate normal mixing distribution, i.e., $\beta_i \sim N(\beta, \Sigma)$, where β and Σ are the population parameters of the heterogeneity distribution. Let $y_{ijt} = 1$ if decision-maker i chooses alternative j in choice situation t, and 0 otherwise; then the unconditional probability $P_i(\beta, \Sigma)$ of the sequence of alternatives chosen by decision-maker i is:

$$P_i(\beta, \Sigma) = \int \left\{ \prod_{t=1}^{T} \prod_{j=1}^{J} \left[ \frac{\exp(x_{ijt}'\beta_i)}{\sum_{k=1}^{J} \exp(x_{ikt}'\beta_i)} \right]^{y_{ijt}} \right\} f(\beta_i \mid \beta, \Sigma)\, d\beta_i, \qquad (3)$$
where $f(\beta_i \mid \beta, \Sigma)$ is the probability density function of the random parameter vector $\beta_i$. The parameters of MMNL models in general, including the common MMNL-N specification, can be estimated using the maximum simulated likelihood estimator (Train, 2009).
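Because the integral in equation (3) has no closed form, maximum simulated likelihood replaces it with an average over draws of $\beta_i$ from $N(\beta, \Sigma)$. The sketch below computes this simulated probability for one decision-maker. It is an illustration only, not the Train (2009) MATLAB code; the function name, the `X[t, j, k]` array layout, and the representation of $y_{ijt}$ as the index of the chosen alternative are assumptions.

```python
import numpy as np

def simulated_prob_mmnl_n(X, chosen, b, Sigma, n_draws=500, seed=0):
    """Simulated P_i(b, Sigma) of equation (3) for one decision-maker.

    X      : (T, J, K) attributes for T choice situations and J alternatives
    chosen : (T,) index of the chosen alternative (the j with y_ijt = 1)
    b      : (K,) population mean; Sigma : (K, K) population covariance
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    draws = b + rng.standard_normal((n_draws, len(b))) @ L.T   # (R, K) draws of beta_i
    v = np.einsum("tjk,rk->rtj", X, draws)                      # x_ijt' beta_i per draw
    v -= v.max(axis=2, keepdims=True)                            # numerical stability
    ev = np.exp(v)
    q = ev / ev.sum(axis=2, keepdims=True)                       # equation (2) per draw
    t_idx = np.arange(X.shape[0])
    seq_prob = q[:, t_idx, chosen].prod(axis=1)                  # product over t
    return float(seq_prob.mean())                                # average over draws
```

Maximizing the sum of the logarithms of these simulated probabilities over decision-makers, with the same draws held fixed across iterations, gives the maximum simulated likelihood estimator.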
2.2. Logit-Mixed Logit (LML) Model
As detailed in Train (2016), LML assumes that the joint mixing distribution of the random parameters $\beta_i$ is discrete over a finite support set S (cf. the latent class logit specification: Kamakura and Russell, 1989; Bhat, 1997). Discretization should not be seen as a constraint, because the support set (essentially a multidimensional grid) can be made larger and denser by considering a broader domain, a higher number of grid points, or both. The joint probability mass function of the random parameters in LML is specified with the following logit-type expression:
$$w_i(\beta_r \mid \alpha) = \Pr(\beta_i = \beta_r) = \frac{\exp\left(z(\beta_r)'\alpha\right)}{\sum_{s \in S} \exp\left(z(\beta_s)'\alpha\right)}, \qquad (4)$$
where α is a vector of parameters and $z(\beta_r)$ defines the shape of the mixing distribution. This study considers $z(\beta_r)$ to be a polynomial, a step function, or a spline. These functional forms are described briefly in the next subsections; the reader can refer to Train (2016) for further details. The unconditional probability of the sequence of choices of decision-maker i is:

$$P_i(\alpha) = \sum_{r \in S} \left\{ \prod_{t=1}^{T} \prod_{j=1}^{J} \left[ \frac{\exp(x_{ijt}'\beta_r)}{\sum_{k=1}^{J} \exp(x_{ikt}'\beta_r)} \right]^{y_{ijt}} \right\} w_i(\beta_r \mid \alpha). \qquad (5)$$

Note that the vector α is the only parameter of interest in LML, and it can be estimated using the maximum likelihood estimator. Including all points of the support set in the estimation of LML is not only unnecessary but also computationally very expensive; instead, a random subset of points can be drawn within S for estimation purposes. The logit form of the probability mass of the random parameters (equation 4) results in efficient computation of the score (the gradient of the loglikelihood).
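Since the conditional logit term in equation (5) depends only on the fixed support point $\beta_r$ and not on α, it can be computed once per person and support point and reused at every iteration; only the softmax weights of equation (4) change with α. A minimal sketch under stated assumptions: the caller supplies the rows $z(\beta_r)$ for the drawn support points, and the weights are normalized over those drawn points rather than over all of S (an approximation assumed here). The function names are hypothetical.

```python
import numpy as np

def lml_weights(Z, alpha):
    """Probability masses w(beta_r | alpha) of equation (4).

    Z     : (R, G) matrix; row r is z(beta_r) for a drawn support point beta_r
    alpha : (G,) LML parameter vector
    """
    s = Z @ alpha
    s = s - s.max()                      # stabilise the softmax
    w = np.exp(s)
    return w / w.sum()                   # normalised over the supplied points only

def lml_prob(cond_seq_prob, Z, alpha):
    """P_i(alpha) of equation (5) for one decision-maker.

    cond_seq_prob : (R,) product over t of the chosen-alternative logit
                    probabilities (equation 2) at each support point beta_r.
                    It does not depend on alpha, so it can be precomputed.
    """
    return float(cond_seq_prob @ lml_weights(Z, alpha))
```

The sample loglikelihood is then the sum of the logarithms of `lml_prob` over decision-makers, maximized with respect to α only.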
2.2.1 LML-Polynomial

Many standard mixing distributions can be generated by considering $z(\beta_r)$ to be a polynomial of a certain order. For example, Train (2016) shows that the widely used normal distribution can be recovered in the LML framework by considering $z(\beta_r)$ to be a second-order polynomial. It is worth noting that when a Legendre polynomial (LGP) is used to define the shape of the mixing distribution, the LML-Polynomial model becomes similar to the model proposed by Fosgerau and Hess (2007). Other polynomials, such as Chebyshev and Bernstein polynomials, can be used to define $z(\beta_r)$, but in this study we focus on Legendre polynomials. For illustration purposes, consider an indirect utility equation with only one random parameter β. Since the LGP support is [−1, 1], β is transformed to $\bar{\beta} = -1 + 2\,\frac{\beta - L}{U - L}$, where L and U are the lower and upper bounds of β in the support set S. The shape of the mixing distribution can then be defined by LGPs of $\bar{\beta}$ up to the chosen order K, with elements $z_k(\bar{\beta}) = \mathrm{LGP}(k, \bar{\beta})$ for k = 1, ..., K.

2.2.2 LML-Step

Suppose that the support set S is divided into M subsets, labeled $T_m$, where m ∈ {1, 2, ..., M}. Assume that the probability mass is the same for all points within a subset, but varies across subsets. The probability mass function can then be written as:

$$w_i(\beta_r \mid \alpha) = \Pr(\beta_i = \beta_r) = \frac{\exp\left(\sum_{m=1}^{M} \alpha_m I(\beta_r \in T_m)\right)}{\sum_{s \in S} \exp\left(\sum_{m=1}^{M} \alpha_m I(\beta_s \in T_m)\right)}. \qquad (6)$$

Comparing equations 4 and 6 shows that $z(\beta_r)$ is a vector of M indicators $I(\beta_r \in T_m)$, which identify the subset containing the realization $\beta_r$. It is worth noting that Bajari et al. (2007), Train (2008), and Fox et al. (2011) also proposed similar semiparametric methods, but unlike LML, these approaches compute the probability mass at every grid point of the support set S. In other words, these studies considered M = card(S), i.e., only one point in each $T_m$. Thus, the number of estimated parameters equals the number of grid points, which restricts these approaches to very coarse grids. The use of subsets in LML (see equation 6) overcomes this curse of cardinality.
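The polynomial and step-function versions of $z(\beta)$ described above can be sketched for a single random parameter as follows. The sketch assumes NumPy's `numpy.polynomial.legendre.legvander` for evaluating Legendre polynomials, drops the constant term (which cancels in equation 4), and, for the step function, uses equal-width subsets $T_m$ with the first subset as the normalized reference, which leaves Levels − 1 columns as in the parameter count given in Section 2.2.3. These are illustrative choices, not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def to_legendre_support(beta, lower, upper):
    """Map beta in [lower, upper] onto the Legendre support [-1, 1]."""
    return -1.0 + 2.0 * (beta - lower) / (upper - lower)

def z_polynomial(beta, lower, upper, order):
    """Rows (P_1(b), ..., P_order(b)), with b the transformed beta.

    P_0 is a constant that cancels in equation (4), so it is dropped.
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    b = to_legendre_support(beta, lower, upper)
    return legendre.legvander(b, order)[:, 1:]

def z_step(beta, lower, upper, levels):
    """Indicators I(beta in T_m) for equal-width subsets T_1, ..., T_levels.

    The column of the first subset is dropped (reference level),
    leaving levels - 1 identifiable parameters.
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    edges = np.linspace(lower, upper, levels + 1)
    m = np.searchsorted(edges, beta, side="right") - 1
    m = np.clip(m, 0, levels - 1)          # put beta == upper in the last subset
    return np.eye(levels)[m][:, 1:]
```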
2.2.3 LML-Spline

Spline functions join piecewise polynomial functions at points called knots, with a specified degree of smoothness at each knot. In a one-dimensional setting (only one random parameter β), the linear spline5 g(β) can be defined as α′z(β). Consider an example of a spline with start and end points at β̄1 and β̄4, and knots at β̄2 and β̄3. The components of α define the heights of the spline.6 The four elements of vector z(β) in this example are as follows:
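$$\begin{aligned}
z_1(\beta) &= \left(1 - \frac{\beta - \bar\beta_1}{\bar\beta_2 - \bar\beta_1}\right) I(\beta \le \bar\beta_2),\\
z_2(\beta) &= \frac{\beta - \bar\beta_1}{\bar\beta_2 - \bar\beta_1}\, I(\beta \le \bar\beta_2) + \left(1 - \frac{\beta - \bar\beta_2}{\bar\beta_3 - \bar\beta_2}\right) I(\bar\beta_2 < \beta \le \bar\beta_3),\\
z_3(\beta) &= \frac{\beta - \bar\beta_2}{\bar\beta_3 - \bar\beta_2}\, I(\bar\beta_2 < \beta \le \bar\beta_3) + \left(1 - \frac{\beta - \bar\beta_3}{\bar\beta_4 - \bar\beta_3}\right) I(\bar\beta_3 < \beta),\\
z_4(\beta) &= \frac{\beta - \bar\beta_3}{\bar\beta_4 - \bar\beta_3}\, I(\bar\beta_3 < \beta),
\end{aligned}$$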
where I(·) is an indicator function.

In sum, the number of identifiable parameters is related to the order of the polynomial in LML-Polynomial, to the number of levels of the step function in LML-Step, and to the number of knots in LML-Spline. If the utility equation has R random parameters and all are uncorrelated, the number of identifiable parameters (G) is: LML-Polynomial = Order × R; LML-Step = (Levels − 1) × R; and LML-Spline = (Knots + 1) × R.
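The four functions above are the "hat" (tent) basis of a linear spline: $z_k$ equals 1 at its own point, falls linearly to 0 at the neighbouring points, and is 0 elsewhere, so α′z(β) interpolates linearly between the heights in α. A minimal sketch for any sorted set of start, knot, and end points, assuming (as on the support set) that β lies between the first and last point; `z_linear_spline` is a hypothetical name. Linearly interpolating the kth unit vector with `numpy.interp` yields exactly the kth hat function.

```python
import numpy as np

def z_linear_spline(beta, points):
    """Hat-function basis of a linear spline.

    beta   : support points, assumed to lie within [points[0], points[-1]]
    points : sorted start point, interior knots, and end point,
             e.g. (b1, b2, b3, b4) for the two-knot example
    Returns an array of shape (len(beta), len(points)); column k is z_{k+1}(beta).
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    points = np.asarray(points, dtype=float)
    eye = np.eye(len(points))
    # Interpolating the k-th unit vector gives the k-th tent function.
    return np.column_stack([np.interp(beta, points, eye[k]) for k in range(len(points))])

z = z_linear_spline([0.5, 1.0, 2.25], [0.0, 1.0, 2.0, 3.0])
```

Each row of the basis sums to 1, so adding the same constant to every component of α leaves equation (4) unchanged; this is why one height has to be normalized (footnote 6).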
3. MONTE-CARLO STUDY

3.1 Monte-Carlo Simulation Plan
As the data generating process, we assume that a respondent chooses the alternative with maximum utility among four available alternatives, where the indirect utility of decision-maker i for alternative j in choice situation t is $U_{ijt} = \beta_{1i} x_{1ijt} + \beta_{2i} x_{2ijt} + \varepsilon_{ijt}$ (see equation 1). The attributes $x_{1ijt}$ and $x_{2ijt}$ were independently drawn from a standard normal distribution, and $\varepsilon_{ijt}$ was drawn from an EV1(0, 1) distribution. To analyze the effectiveness of LML in retrieving different underlying distributions of the parameters, data were generated for three sets of true distributions of β1i and β2i (i.e., six true distributions in total)7 and for two panel variants: a) a short panel with 500 respondents, and b) a long panel with 2000 respondents. In each panel, each respondent faced 5 choice situations. For each of the three sets of parameter distributions and each panel setting, 100 datasets were generated, i.e., a total of 600 datasets were simulated.

5 In LML, the probability mass function of the random parameters is proportional to exp(α′z(β)), not to α′z(β) itself. Thus, the exponentiation may change the shape of the spline, but flexibility is not compromised.
6 Since only relative heights matter, one component of α has to be normalized. Thus, in a one-dimensional setting, the number of identifiable (estimated) height parameters equals the number of knots + 1 (e.g., 3 parameters in the example with 2 knots).
7 The correlation between β1i and β2i was assumed to be zero.
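The data generating process above can be sketched as follows: standard normal attributes, EV1(0, 1) (Gumbel) taste shocks, and the utility-maximizing alternative recorded as the choice. The function name and argument layout are assumptions of this sketch, and drawing β1i and β2i from the true distributions listed below is left to the caller.

```python
import numpy as np

def simulate_choices(beta_i, n_situations=5, n_alts=4, seed=0):
    """Simulate panel choices under U_ijt = b1i*x1ijt + b2i*x2ijt + e_ijt.

    beta_i : (N, 2) array of individual taste parameters (beta_1i, beta_2i)
    Returns attributes X of shape (N, T, J, 2) and chosen indices of shape (N, T).
    """
    rng = np.random.default_rng(seed)
    n = beta_i.shape[0]
    X = rng.standard_normal((n, n_situations, n_alts, 2))    # x1ijt, x2ijt ~ N(0, 1)
    eps = rng.gumbel(0.0, 1.0, (n, n_situations, n_alts))    # EV1(0, 1) taste shocks
    U = np.einsum("ntjk,nk->ntj", X, beta_i) + eps
    return X, U.argmax(axis=2)                                # utility-maximising alternative
```

For the short panel, `beta_i` has 500 rows; for the long panel, 2000.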
The true distributions of β1i and β2i in the three data generating processes (DGPs), for both the short- and long-panel data, are as follows:
1. N-LN (normal & lognormal) data:
β1i ∼ N(1, 2.25), i.e., a normal distribution (mean = 1, standard deviation = 1.5). β2i = exp(Y), where Y ∼ N(0, 1), i.e., a lognormal distribution (mean = 1.649, standard deviation = 2.161).
2. BN-U (bimodal-normal & uniform) data:
β1i ∼ N(−1, 0.25) with probability 0.5 and β1i ∼ N(1, 0.25) with probability 0.5, i.e., a bimodal-normal distribution (mean = 0, standard deviation = 1.118). β2i ∼ U(−2, 2), i.e., a uniform distribution (mean = 0, standard deviation = 1.155).
3. D-DLN (discrete & discrete-lognormal) data:
β1i ∈ {−2, 2, 0} with probability 1/3 for each element of the set, i.e., a discrete distribution (mean = 0, standard deviation = 1.633). β2i ∈ {−2, 0} with probability 1/4 for each element of the set, and β2i = exp(Y) (where Y ∼ N(0, 1)) with probability 0.5, i.e., a discrete-lognormal distribution (mean = 0.324, standard deviation = 2.142).
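The means and standard deviations quoted for each true distribution follow from standard moment results: for Y ∼ N(0, 1), $E[e^Y] = e^{1/2}$ and $E[e^{2Y}] = e^{2}$; a U(a, b) variable has variance $(b - a)^2/12$; and the moments of a mixture are probability-weighted averages of the component first and second moments. A short check in plain Python (the helper name is hypothetical):

```python
import math

def mixture_moments(components):
    """Mean and std of a mixture given (probability, mean, second_moment) tuples."""
    mean = sum(p * m for p, m, _ in components)
    second = sum(p * s for p, _, s in components)
    return mean, math.sqrt(second - mean ** 2)

# Lognormal e^Y with Y ~ N(0, 1): E[e^Y] = e^0.5 and E[(e^Y)^2] = E[e^(2Y)] = e^2
LN = (math.exp(0.5), math.exp(2.0))

moments = {
    "normal": mixture_moments([(1.0, 1.0, 2.25 + 1.0)]),            # var + mean^2
    "lognormal": mixture_moments([(1.0, *LN)]),
    "bimodal-normal": mixture_moments([(0.5, -1.0, 0.25 + 1.0), (0.5, 1.0, 0.25 + 1.0)]),
    "uniform": mixture_moments([(1.0, 0.0, 4.0 ** 2 / 12)]),         # U(-2, 2)
    "discrete": mixture_moments([(1 / 3, v, v ** 2) for v in (-2.0, 2.0, 0.0)]),
    "discrete-lognormal": mixture_moments([(0.25, -2.0, 4.0), (0.25, 0.0, 0.0), (0.5, *LN)]),
}
for name, (m, s) in moments.items():
    print(f"{name:19s} mean = {m:.3f}  sd = {s:.3f}")
```

Running it reproduces the rounded values stated above, e.g., mean 1.649 and standard deviation 2.161 for the lognormal, and mean 0.324 and standard deviation 2.142 for the discrete-lognormal distribution.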
For each of the 600 simulated datasets, 16 models were estimated: MMNL-N, five LML-Polynomial (order varying from one to five), five LML-Step (levels varying from two to six), and five LML-Spline (knots varying from zero to four) models. While four parameters were estimated in MMNL-N, the number of estimated parameters was two, four, six, eight, and ten in the five specifications (in increasing order) of each LML variant. For LML estimation, each dimension of the support set S was divided into $10^3$ equally spaced points, meaning that the multidimensional grid contains $10^{3R}$ points, where R is the number of random parameters; i.e., $10^6$ points (R = 2) in this study. Instead of using the entire support set, a random subset of 2,000 points within S was drawn for each person, as suggested by Train (2016). Additionally, the boundaries of the support set were set at three standard deviations on either side of the corresponding MMNL-N mean. Standard performance metrics for Monte-Carlo studies were calculated for all estimated models in each of the six simulated settings (three DGPs × two panel lengths):
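The grid and the per-person random subset described above can be sketched as follows, assuming (as an illustration) that the MMNL-N means and standard deviations are already available and that the subset is drawn without replacement from the $10^6$ grid points. The function name and arguments are hypothetical, not taken from Train's (2016) code.

```python
import numpy as np

def draw_support_subset(mmnl_mean, mmnl_sd, points_per_dim=1000, n_subset=2000, rng=None):
    """Draw a random subset of a regular R-dimensional support grid.

    Each dimension spans mean +/- 3 sd (from a preliminary MMNL-N fit) with
    points_per_dim equally spaced points, so the full grid has
    points_per_dim ** R points; only the n_subset drawn points are returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    mmnl_mean = np.asarray(mmnl_mean, dtype=float)
    mmnl_sd = np.asarray(mmnl_sd, dtype=float)
    R = mmnl_mean.size
    axes = [np.linspace(m - 3 * s, m + 3 * s, points_per_dim) for m, s in zip(mmnl_mean, mmnl_sd)]
    flat = rng.choice(points_per_dim ** R, size=n_subset, replace=False)   # flat grid indices
    idx = np.unravel_index(flat, (points_per_dim,) * R)                     # per-dimension indices
    return np.column_stack([axes[d][idx[d]] for d in range(R)])             # (n_subset, R)
```

Calling it once per person gives each decision-maker an independent subset, without ever materializing the full grid.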
Estimation Time:
Estimation time of all models was recorded and averaged across the 100 datasets. MMNL-N and the LML variants were estimated in MATLAB using the code provided by Train (2009) and Train (2016),8 respectively. Both MATLAB codes implement the maximum simulated likelihood estimator with the analytical expression of the gradient. The asymptotic standard errors of the parameter estimates of interest were not computed for LML because bootstrapping is the only feasible way to derive standard errors of metrics associated with the random marginal utilities (standard errors of α can be derived as usual), which is computationally impractical in a simulation study.9 Since we did not compute standard errors for LML, standard errors of the MMNL-N models are not reported either, but they are available upon request. However, when comparing the estimation times of MMNL-N and LML, it is worth noting that the reported estimation time for MMNL-N does include the computation time of the standard errors. In addition, as detailed below, we did calculate the finite sample standard error.

8 The original code provided by Train (2016) estimates models in WTP space. In addition to the changes detailed in our companion paper (Bansal et al., 2017), we also implemented the estimator in the standard preference space.
Model Fit Statistics:
To compare model fit, the loglikelihood (LL) at convergence and the Bayesian Information Criterion, BIC = −2 × LL + ln(N) × G (where N is the number of choice observations, i.e., respondents × choice situations, and G is the number of identifiable parameters), were computed for all estimated models. For a discussion on the use of BIC to compare parametric versus semi/nonparametric specifications, see Joo et al. (2010).
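As a check on the formula, the sketch below recomputes a BIC value in Table 1 from its loglikelihood. It takes N as the number of choice observations (2,500 in the short panel and 10,000 in the long panel), which reproduces the reported BIC values up to rounding of LL.

```python
import math

def bic(loglik, n_obs, n_params):
    """Bayesian Information Criterion: -2 * LL + ln(N) * G."""
    return -2.0 * loglik + math.log(n_obs) * n_params

# Short panel: 500 respondents x 5 choice situations = 2,500 observations.
print(round(bic(-2297.1, 2500, 4), 1))   # MMNL-N on N-LN data; prints 4625.5 as in Table 1
```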
Absolute Percentage Bias (APB):

APB indicates the deviation of parameter estimates from their true values. The parameter estimates were averaged across the 100 datasets, and the resulting means were used to compute APB as follows:10

$$\mathrm{APB} = \left| \frac{\text{Mean parameter estimate} - \text{True parameter value}}{\text{True parameter value}} \right| \times 100$$
Finite Sample Standard Error (FSSE):

FSSE of any parameter was calculated as the standard deviation of the parameter estimates across the 100 simulated datasets.
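A minimal sketch of the two finite-sample metrics, taking the estimates of one parameter across the simulated datasets as input. It applies the zero-true-value rule of footnote 10; the use of the sample standard deviation (`ddof=1`) for FSSE is an assumption, since the text does not state the divisor. The function names are hypothetical.

```python
import numpy as np

def apb(estimates, true_value):
    """Absolute percentage bias of the mean estimate across datasets.

    If the true value is 0, the bias is multiplied by 100 instead of being
    divided by the true value (footnote 10).
    """
    bias = float(np.mean(estimates)) - true_value
    if true_value == 0:
        return abs(bias) * 100.0
    return abs(bias / true_value) * 100.0

def fsse(estimates):
    """Finite sample standard error: std of the estimates across datasets.

    ddof=1 (sample standard deviation) is an assumption of this sketch.
    """
    return float(np.std(estimates, ddof=1))
```

For example, an average estimate of 1.15 for a true value of 1 gives an APB of 15.0, matching the MMNL-N entry for the normal mean in Table 2.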
Empirical Probability Density Function (PDF) Plots:
To qualitatively analyze the ability of the different models to retrieve the underlying distributions, the empirical PDFs implied by the estimated parameters are superimposed on those of the true distributions. PDF plots also provide insights into how assuming a more flexible semiparametric mixing distribution (by increasing the number of parameters of the semiparametric representation) affects the ability to retrieve the true distribution.
3.2 Simulation Results: Comparisons and Insights
3.2.1 Estimation Time and Model Fit Statistics

Table 1 summarizes the mean estimation time and model fit statistics over the 100 datasets for all three DGPs in both the short- and long-panel settings. For brevity, results of LML variants with two, six, and eight parameters are not shown here but are available upon request. Three insights about estimation time can be drawn by comparing: 1) estimation times for the long- and short-panel data; 2) estimation times of MMNL-N and LML; and 3) the sensitivity of LML estimation time to the number of parameters.

First, the mean ratio of long-panel (column 6 in Table 1) to short-panel (column 3 in Table 1) estimation time is 3.8 for MMNL-N, but 6.6 for LML (with an approximate 95% confidence interval of [5.8, 7.4]).11 The ratio for MMNL-N is close to the ratio of the sample sizes of the long (N = 2000) and short (N = 500) panels, i.e., 4. Thus, increasing the sample size appears to affect the computation time of LML more than that of MMNL-N.

Second, LML is computationally efficient compared with MMNL-N with the same number of identifiable parameters (mean ratios of MMNL-N to LML computation time of 3.7 and 2.2 for the short and long panels, respectively).12 Even with ten identifiable parameters and a sample size of 2000 (long panel), LML estimation takes between roughly one and three and a half minutes (69.3 to 205.2 seconds; see column 6 of Table 1). The computational efficiency of LML allows researchers to quickly test numerous competing specifications of preference heterogeneity. Standard errors, which are computationally very expensive for LML, can then be computed for the final specification only. If the standard errors of LML models were computed using 100 bootstrap resamples, LML estimation time would increase roughly 100-fold, making MMNL-N much cheaper than LML in computation time.

Third, in both short- and long-panel data, the estimation time of LML variants increases with the number of identifiable parameters. The mean ratios of the estimation time of LML variants with ten parameters to that with four parameters, computed from six observations (two per DGP), are 7.1 (polynomial), 5.9 (step), and 10.4 (spline). In addition, among LML variants with the same number of parameters, LML-Polynomial consistently had the longest estimation time.

Model fit statistics suggest that increasing the number of parameters in all LML variants improves the loglikelihood value at convergence, but does not necessarily improve model fit in terms of BIC; in fact, increasing the number of parameters often yields considerably higher BIC values.

9 If 100 bootstrapped samples are considered, estimation time for each LML model will increase by approximately 100 times, and total simulation time would be on the order of months.
10 If the true parameter value is 0, then for APB computation, the calculated bias was multiplied by 100.
For example, in the BN-U short-panel DGP, LML-Polynomial with ten parameters has a higher loglikelihood (−2779.7) but also a higher BIC (5637.6) than the corresponding four-parameter specification (−2792.6 and 5616.5, respectively). In other words, more flexible LML mixing distributions, which require more parameters, do not necessarily yield a large enough gain in likelihood to be preferable in terms of BIC. For all LML variants in both panel settings, the minimum BIC occurred with six parameters in N-LN, with four or six parameters in BN-U, and with four parameters in D-DLN. In the short-panel data, the minimum BIC of the LML variants differs only marginally from that of MMNL-N for all DGPs except N-LN,13 but as the number of parameters increases, MMNL-N considerably outperforms all LML variants in terms of BIC. As expected, these differences become more apparent in the long-panel data.

11 The LML mean estimate and the confidence interval of the estimation time ratio were calculated using 18 observations (6 observations per DGP).
12 LML-Polynomial in the N-LN DGP of the long panel is an exception.
13 For example, in this N-LN case, LML-Spline with six parameters outperformed MMNL-N in terms of BIC by a difference of 7.8. The expected poor ability of MMNL-N to retrieve parameters with a true lognormal distribution appears to be a reason for this difference (see the discussion of APB in the next subsection).
TABLE 1 Estimation Time and Model Fit Statistics

| Model | Parameters | Short panel: Estimation time (sec) | Short panel: BIC | Short panel: Loglikelihood | Long panel: Estimation time (sec) | Long panel: BIC | Long panel: Loglikelihood |
|---|---|---|---|---|---|---|---|
| True distributions: Normal and Lognormal [N-LN] | | | | | | | |
| MMNL-N | 4 | 9.1 | 4625.5 | -2297.1 | 25.0 | 18435.3 | -9199.3 |
| LML-Polynomial | 4 | 5.3 | 4625.2 | -2296.9 | 31.2 | 18432.3 | -9197.7 |
| LML-Polynomial | 10 | 40.0 | 4644.5 | -2283.1 | 205.2 | 18399.5 | -9153.7 |
| LML-Step | 4 | 3.2 | 4639.5 | -2304.1 | 19.7 | 18486.4 | -9224.8 |
| LML-Step | 10 | 20.1 | 4648.5 | -2285.1 | 126.2 | 18407.7 | -9157.8 |
| LML-Spline | 4 | 3.1 | 4633.3 | -2301 | 20.3 | 18461.0 | -9212.1 |
| LML-Spline | 10 | 33.3 | 4644.6 | -2283.2 | 188.1 | 18399.9 | -9153.9 |
| True distributions: Bimodal-Normal and Uniform [BN-U] | | | | | | | |
| MMNL-N | 4 | 9.8 | 5617.1 | -2792.9 | 41.9 | 22382.8 | -11173.0 |
| LML-Polynomial | 4 | 2.8 | 5616.5 | -2792.6 | 24.4 | 22383.6 | -11173.4 |
| LML-Polynomial | 10 | 19.3 | 5637.6 | -2779.7 | 166.9 | 22347.9 | -11127.9 |
| LML-Step | 4 | 1.6 | 5636.3 | -2802.5 | 14.7 | 22462.3 | -11212.7 |
| LML-Step | 10 | 8.5 | 5653.3 | -2787.6 | 69.3 | 22402.5 | -11155.2 |
| LML-Spline | 4 | 1.5 | 5639.8 | -2804.2 | 12.6 | 22475.3 | -11219.2 |
| LML-Spline | 10 | 14.4 | 5647.7 | -2784.8 | 143.5 | 22383.2 | -11145.6 |
| True distributions: Discrete and Discrete-Lognormal [D-DLN] | | | | | | | |
| MMNL-N | 4 | 10.4 | 5263.6 | -2616.2 | 45.4 | 20968.7 | -10465.9 |
| LML-Polynomial | 4 | 5 | 5264.4 | -2616.5 | 25.9 | 20964.1 | -10463.6 |
| LML-Polynomial | 10 | 39.2 | 5297.9 | -2609.8 | 185.1 | 20977.7 | -10442.8 |
| LML-Step | 4 | 2.8 | 5270.4 | -2619.5 | 14.6 | 20990.5 | -10476.8 |
| LML-Step | 10 | 18.7 | 5302.4 | -2612.1 | 85.6 | 20994.8 | -10451.4 |
| LML-Spline | 4 | 2.6 | 5261.6 | -2615.1 | 13.7 | 20953.4 | -10458.3 |
| LML-Spline | 10 | 29.4 | 5295.8 | -2608.8 | 139.2 | 20973.7 | -10440.8 |
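The estimation-time ratios discussed in Section 3.2.1 can be reproduced from the rounded times in Table 1. The sketch below assumes plain averages of per-cell ratios (three cells for MMNL-N, 18 for LML, nine for the MMNL-N/LML comparison, and six per LML variant for ten versus four parameters), which recovers the reported values 3.8, 6.6, 3.7, 2.2, 7.1, 5.9, and 10.4.

```python
# Estimation times (sec) from Table 1; order within each list:
# MMNL-N, Poly-4, Poly-10, Step-4, Step-10, Spline-4, Spline-10.
time_short = {
    "N-LN": [9.1, 5.3, 40.0, 3.2, 20.1, 3.1, 33.3],
    "BN-U": [9.8, 2.8, 19.3, 1.6, 8.5, 1.5, 14.4],
    "D-DLN": [10.4, 5.0, 39.2, 2.8, 18.7, 2.6, 29.4],
}
time_long = {
    "N-LN": [25.0, 31.2, 205.2, 19.7, 126.2, 20.3, 188.1],
    "BN-U": [41.9, 24.4, 166.9, 14.7, 69.3, 12.6, 143.5],
    "D-DLN": [45.4, 25.9, 185.1, 14.6, 85.6, 13.7, 139.2],
}

def avg(xs):
    return sum(xs) / len(xs)

dgps = list(time_short)
# 1) Long-panel time divided by short-panel time.
mmnl_ratio = avg([time_long[d][0] / time_short[d][0] for d in dgps])
lml_ratio = avg([time_long[d][k] / time_short[d][k] for d in dgps for k in range(1, 7)])
# 2) MMNL-N time divided by four-parameter LML time (list positions 1, 3, 5).
speedup_short = avg([time_short[d][0] / time_short[d][k] for d in dgps for k in (1, 3, 5)])
speedup_long = avg([time_long[d][0] / time_long[d][k] for d in dgps for k in (1, 3, 5)])
# 3) Ten-parameter time divided by four-parameter time, per LML variant.
cols = {"polynomial": (1, 2), "step": (3, 4), "spline": (5, 6)}
ten_vs_four = {
    name: avg([t[d][k10] / t[d][k4] for t in (time_short, time_long) for d in dgps])
    for name, (k4, k10) in cols.items()
}
print(round(mmnl_ratio, 1), round(lml_ratio, 1), round(speedup_short, 1), round(speedup_long, 1))
print({name: round(r, 1) for name, r in ten_vs_four.items()})
```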
3.2.2 APB, PDF Plots, and FSSE

Unless otherwise specified, inferences are based on the average of the APB of the mean and the APB of the standard deviation; the separate APB values are reported in Tables 2, 3, and 4. Since short- and long-panel results are almost indistinguishable, except for a significantly lower FSSE in the long panel, only long-panel results are reported below. Additionally, results of LML variants with two, six, and eight parameters are not tabulated here; they are discussed where relevant and are available upon request.

Two observations hold across all six true distributions, in both short- and long-panel settings. First, increasing the number of parameters in LML does not necessarily improve its ability to retrieve the true parameters; in fact, allowing for more flexibility considerably reduces retrieval accuracy in many cases. All LML variants with ten parameters appear to have a lower APB of the standard deviation than their four-parameter counterparts.14 However, there is no clear relation between APB and the number of identifiable parameters. For instance, when the true distribution is discrete-lognormal in the long-panel setting, LML-Spline with four parameters has an APB of 16.21; this value increases to 29.85 with six parameters and decreases to 4.69 with ten parameters (Table 4). Second, across all six distributions, the APB of LML-Polynomial with four parameters (2nd order) almost matches that of MMNL-N. A similar pattern is observed for other metrics such as BIC, loglikelihood (see Table 1), and FSSE, which is expected given that a 2nd-order polynomial in LML yields a normal mixing distribution (Train, 2016). Figures 1, 2, and 3 (top two plots) also confirm that the PDFs estimated using MMNL-N and LML-Polynomial with four parameters are indistinguishable. A further key observation is that, except for the lognormal and discrete-lognormal distributions, MMNL-N outperformed LML-Step and LML-Spline in terms of APB when the same number of parameters (i.e., four) is considered. Other distribution-specific insights, if any, are described below.

14 LML-Polynomial and LML-Step for the Normal and Discrete-Normal cases, respectively, are mild exceptions (see bold values in the last column of Tables 2 and 4).
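The near-identical results of MMNL-N and second-order LML-Polynomial follow from completing the square: with $\alpha_2 < 0$, $\exp(\alpha_1\beta + \alpha_2\beta^2)$ is proportional to a normal density with mean $-\alpha_1/(2\alpha_2)$ and variance $-1/(2\alpha_2)$. The sketch below checks this numerically on a dense one-dimensional grid. For simplicity it uses the raw powers β and β² instead of Legendre polynomials; after the linear map to β̄, both bases span the same quadratic functions.

```python
import numpy as np

mu, sigma = 1.0, 1.5                        # target normal (beta_1i in the N-LN design)
alpha1 = mu / sigma**2                      # coefficient on beta
alpha2 = -1.0 / (2.0 * sigma**2)            # coefficient on beta^2 (must be negative)

grid = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 20001)   # dense 1-D support set
Z = np.column_stack([grid, grid**2])                        # z(beta) = (beta, beta^2)
s = Z @ np.array([alpha1, alpha2])
w = np.exp(s - s.max())
w /= w.sum()                                                # masses of equation (4)

w_mean = w @ grid
w_sd = np.sqrt(w @ (grid - w_mean) ** 2)
print(round(float(w_mean), 3), round(float(w_sd), 3))
```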
TABLE 2 Parameter Retrieval (Normal and Lognormal [N-LN])
Model | Parameters | Mean | Std Dev | FSSE Mean | FSSE Std | APB Mean | APB Std
True Distribution: Normal
True Parameters | - | 1 | 1.5 | - | - | - | -
MMNL-N | 4 | 1.15 | 1.80 | 0.05 | 0.05 | 15.0 | 19.7
LML-Polynomial | 4 | 1.17 | 1.78 | 0.05 | 0.05 | 16.9 | 19.0
LML-Polynomial | 10 | 1.19 | 1.82 | 0.06 | 0.11 | 19.3 | 21.5
LML-Step | 4 | 1.22 | 2.04 | 0.06 | 0.06 | 21.8 | 35.9
LML-Step | 10 | 1.20 | 1.84 | 0.06 | 0.11 | 20.1 | 22.6
LML-Spline | 4 | 1.19 | 1.92 | 0.05 | 0.06 | 18.6 | 27.9
LML-Spline | 10 | 1.20 | 1.83 | 0.06 | 0.11 | 19.7 | 22.3
True Distribution: Lognormal
True Parameters | - | 1.65 | 2.16 | - | - | - | -
MMNL-N | 4 | 1.52 | 0.95 | 0.05 | 0.04 | 8.0 | 56.0
LML-Polynomial | 4 | 1.54 | 0.97 | 0.04 | 0.04 | 6.8 | 55.2
LML-Polynomial | 10 | 1.73 | 1.36 | 0.06 | 0.08 | 5.0 | 36.9
LML-Step | 4 | 1.63 | 1.05 | 0.05 | 0.04 | 1.1 | 51.3
LML-Step | 10 | 1.73 | 1.36 | 0.06 | 0.08 | 4.8 | 37.3
LML-Spline | 4 | 1.58 | 1.02 | 0.04 | 0.04 | 4.4 | 52.7
LML-Spline | 10 | 1.73 | 1.37 | 0.06 | 0.08 | 5.1 | 36.4
TABLE 3 Parameter Retrieval (Bimodal-Normal and Uniform [BN-U])
Model | Parameters | Mean | Std Dev | FSSE Mean | FSSE Std | APB Mean | APB Std
True Distribution: Bimodal-Normal
True Parameters | - | 0 | 1.12 | - | - | - | -
MMNL-N | 4 | -0.006 | 1.62 | 0.04 | 0.04 | 0.6 | 44.7
LML-Polynomial | 4 | -0.006 | 1.63 | 0.04 | 0.04 | 0.6 | 46.1
LML-Polynomial | 10 | -0.001 | 1.39 | 0.04 | 0.04 | 0.1 | 24.2
LML-Step | 4 | -0.002 | 1.76 | 0.05 | 0.07 | 0.2 | 57.8
LML-Step | 10 | -0.003 | 1.58 | 0.04 | 0.04 | 0.3 | 41.6
LML-Spline | 4 | -0.005 | 1.82 | 0.05 | 0.04 | 0.5 | 62.9
LML-Spline | 10 | -0.004 | 1.54 | 0.04 | 0.03 | 0.4 | 37.6
True Distribution: Uniform
True Parameters | - | 0 | 1.16 | - | - | - | -
MMNL-N | 4 | 0.004 | 1.62 | 0.04 | 0.05 | 0.4 | 40.1
LML-Polynomial | 4 | 0.003 | 1.63 | 0.04 | 0.05 | 0.3 | 41.3
LML-Polynomial | 10 | 0.006 | 1.46 | 0.04 | 0.05 | 0.6 | 26.6
LML-Step | 4 | 0.006 | 1.77 | 0.06 | 0.07 | 0.6 | 53.3
LML-Step | 10 | 0.004 | 1.58 | 0.04 | 0.04 | 0.4 | 36.8
LML-Spline | 4 | 0.004 | 1.81 | 0.05 | 0.05 | 0.4 | 56.9
LML-Spline | 10 | 0.003 | 1.53 | 0.04 | 0.04 | 0.3 | 32.6
TABLE 4 Parameter Retrieval (Discrete and Discrete-Lognormal [D-DLN])
Model | Parameters | Mean | Std Dev | FSSE Mean | FSSE Std | APB Mean | APB Std
True Distribution: Discrete
True Parameters | - | 0 | 1.63 | - | - | - | -
MMNL-N | 4 | -0.007 | 2.23 | 0.07 | 0.06 | 0.75 | 36.72
LML-Polynomial | 4 | -0.005 | 2.25 | 0.06 | 0.06 | 0.48 | 37.76
LML-Polynomial | 10 | -0.005 | 2.15 | 0.07 | 0.10 | 0.46 | 31.93
LML-Step | 4 | -0.010 | 2.58 | 0.07 | 0.07 | 0.98 | 57.69
LML-Step | 10 | -0.006 | 2.19 | 0.06 | 0.06 | 0.63 | 34.13
LML-Spline | 4 | -0.006 | 2.49 | 0.07 | 0.06 | 0.63 | 52.63
LML-Spline | 10 | -0.009 | 2.14 | 0.06 | 0.07 | 0.88 | 31.31
True Distribution: Discrete-Lognormal
True Parameters | - | 0.32 | 2.14 | - | - | - | -
MMNL-N | 4 | 0.23 | 1.87 | 0.05 | 0.06 | 29.15 | 12.61
LML-Polynomial | 4 | 0.24 | 1.88 | 0.05 | 0.05 | 25.26 | 12.32
LML-Polynomial | 10 | 0.25 | 2.18 | 0.07 | 0.11 | 21.72 | 1.77
LML-Step | 4 | 0.20 | 2.14 | 0.06 | 0.06 | 39.36 | 0.02
LML-Step | 10 | 0.27 | 2.09 | 0.07 | 0.08 | 17.70 | 2.53
LML-Spline | 4 | 0.23 | 2.05 | 0.05 | 0.06 | 27.89 | 4.53
LML-Spline | 10 | 0.30 | 2.20 | 0.06 | 0.09 | 6.81 | 2.57
Normal:
MMNL-N outperformed all LML variants in terms of APB. This is expected because the underlying true DGP matches the MMNL-N specification. Figure 1 (left-side plots) also confirms that the PDF estimated using LML-Polynomial is close to that of MMNL-N, whereas LML-Step and LML-Spline retrieve the true distribution poorly.
Lognormal:
As expected, MMNL-N performed the worst among all specifications (see columns 7 and 8 in Table 2), and it even has a higher average APB than all LML variants with only two parameters. Figure 1 (right-side plots) indicates that increasing the number of parameters appears to improve LML retrieval of the true PDF.

Bimodal-normal:
As the order of the polynomial increases, the APB of LML-Polynomial decreases. For a 5th-order polynomial (i.e., ten parameters), the APB is 12.20, considerably lower than the APB of MMNL-N (22.66) (see columns 7 and 8 in Table 3). The top-left plot in Figure 2 shows that LML-Polynomial with ten parameters retrieves the bimodal shape of the true PDF very closely, whereas LML-Step and LML-Spline do not (as also evidenced by their higher APB values in Table 3).

Uniform:
The APB values of the more flexible LML variants (i.e., with ten parameters) are lower than that of MMNL-N (see the last two columns in Table 3). MMNL-N has an APB of 20.23, higher than that of the flexible LML-Polynomial (13.58), LML-Step (18.62), and LML-Spline (16.48). For a qualitative view of this result, the right-side PDF plots in Figure 2 show that the flexible LML variants retrieved the true distribution much better than MMNL-N. More specifically, the flexible LML-Polynomial outperformed all other specifications in terms of both APB and visual PDF retrieval.

Discrete:
None of the flexible LML specifications performed significantly better than MMNL-N in terms of retrieving the true mixing distribution or APB (see the left-side plots in Figure 3 and the last two columns of Table 4). In fact, none of the models gave even an indication of the discrete nature of the heterogeneity distribution. Thus, along with flexible LML specifications, latent-class logit models (Greene and Hensher (2003)) need to be considered to detect possible discrete distributions of preferences.

Discrete-Lognormal:
With very flexible mixing distributions (i.e., ten parameters), all LML variants outperformed MMNL-N by a considerable margin. MMNL-N has an APB of 20.88, whereas LML-Polynomial, LML-Step, and LML-Spline have APB values of 11.75, 10.12, and 4.69, respectively (see the last two columns in Table 4). This improvement in APB with flexible LML specifications is mostly due to better retrieval of the lognormal part of the mixing distribution (the discrete part is not really identified), as also evidenced by the right-side plots of Figure 3. Although LML-Spline has a much lower APB than the other LML specifications, it does not appear to retrieve the PDF of the mixing distribution any better. From this analysis we conclude that better retrieval of the mixing distribution generally implies a lower APB (lognormal, bimodal-normal, and uniform), but the converse is not necessarily true (discrete-lognormal).
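The averaged APB figures quoted above are the simple mean of the APB of the mean and the APB of the standard deviation in Table 4. A small Python check of that arithmetic follows; the `apb` helper assumes APB = 100·|estimate - true|/|true| for a single parameter, which is an assumption about the definition rather than a restatement of it.

```python
def apb(estimate, true_value):
    """Absolute percentage bias (assumed definition): 100 * |estimate - true| / |true|."""
    return 100.0 * abs(estimate - true_value) / abs(true_value)


def average_apb(apb_mean, apb_std):
    """Average of the APB of the mean and the APB of the standard deviation."""
    return (apb_mean + apb_std) / 2.0


# (APB mean, APB std) for the Discrete-Lognormal case, copied from Table 4.
discrete_lognormal = {
    "MMNL-N (4)": (29.15, 12.61),
    "LML-Polynomial (10)": (21.72, 1.77),
    "LML-Step (10)": (17.70, 2.53),
    "LML-Spline (10)": (6.81, 2.57),
}

for model, (apb_m, apb_s) in discrete_lognormal.items():
    print(f"{model:<20} average APB = {average_apb(apb_m, apb_s):.3f}")

# Single-parameter example with Table 2 numbers (Normal case, MMNL-N mean).
print(f"{apb(1.15, 1.0):.1f}")
```

These averages reproduce the 20.88, 11.75, 10.12, and 4.69 quoted in the text up to rounding.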
Finite Sample Standard Error (FSSE): FSSEs of the mean and standard deviation for long-panel data are reported in columns 5 and 6 of Tables 2, 3, and 4. The FSSE for long-panel data is lower than that for short-panel data (on average, by a factor of 0.55). For both short- and long-panel data, the LML variants attained their minimum FSSE with either four or six parameters, and MMNL-N appears to be as empirically precise as LML. Very low FSSE values also indicate rather small simulation noise in the estimated PDF. Figure 4 confirms this by superimposing the sample mean and point-wise 95% confidence intervals of the PDF estimated by LML-Polynomial with ten parameters.
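The two dispersion summaries used here can be sketched in Python under stated assumptions: FSSE is taken as the sample standard deviation of a parameter's estimates across the Monte Carlo datasets, and the point-wise 95% band of Figure 4 is taken as the 2.5th and 97.5th percentiles of the estimated PDFs at each grid point. Both definitions and the toy inputs are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def fsse(estimates):
    """FSSE of one parameter, assumed here to be the sample standard deviation
    of its estimates across the Monte Carlo datasets."""
    return float(np.std(np.asarray(estimates, dtype=float), ddof=1))


def pointwise_band(pdfs, level=0.95):
    """Sample mean and point-wise percentile band of PDFs estimated on a common grid.

    pdfs: array of shape (n_datasets, n_grid_points), one estimated PDF per dataset.
    """
    pdfs = np.asarray(pdfs, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0                      # 2.5 for a 95% band
    lower, upper = np.percentile(pdfs, [tail, 100.0 - tail], axis=0)
    return pdfs.mean(axis=0), lower, upper


# Toy inputs (hypothetical, not the paper's estimates): 100 datasets.
rng = np.random.default_rng(42)
estimates = 1.15 + 0.05 * rng.standard_normal(100)
print(round(fsse(estimates), 3))

grid = np.linspace(-4.0, 4.0, 81)
base = np.exp(-grid**2 / 2.0) / np.sqrt(2.0 * np.pi)
pdfs = base * (1.0 + 0.02 * rng.standard_normal((100, 1)))   # one PDF per dataset
mean_pdf, lower, upper = pointwise_band(pdfs)
print(float((upper - lower).max()))
```

A narrow band relative to the height of the PDF, as in Figure 4, is what "small simulation noise" means operationally.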
FIGURE 1 PDF of Parameters for Long-Panel Data (True Distributions: Normal and Lognormal [N-LN])
[Plot panels. Left-side subplots: true distribution Normal; right-side subplots: true distribution Lognormal. Curves compared: True Distribution; MMNL-N (parameter = 4); LML-Polynomial, LML-Step, and LML-Spline (parameter = 4 and parameter = 10).]
FIGURE 2 PDF of Parameters for Long-Panel Data (True Distributions: Bimodal Normal and Uniform [BN-U])
[Plot panels. Left-side subplots: true distribution Bimodal-Normal; right-side subplots: true distribution Uniform. Curves compared: True Distribution; MMNL-N (parameter = 4); LML-Polynomial, LML-Step, and LML-Spline (parameter = 4 and parameter = 10).]
FIGURE 3 PDF of Parameters for Long-Panel Data (True Distributions: Discrete and Discrete-Lognormal [D-DLN])
[Plot panels. Left-side subplots: true distribution Discrete; right-side subplots: true distribution Discrete-Lognormal. Curves compared: True Distribution; MMNL-N (parameter = 4); LML-Polynomial, LML-Step, and LML-Spline (parameter = 4 and parameter = 10).]
FIGURE 4 Confidence Interval of PDF of Parameters for Long-Panel Data
[Plot: true distribution Bimodal-Normal, model LML-Polynomial (10 parameters). Curves: True Distribution, Sample Mean, 95% Lower Bound, 95% Upper Bound.]
4. CASE STUDY: ALTERNATIVE-FUEL VEHICLE PREFERENCES IN GERMANY

4.1 Data Description

To understand the benefits of using LML specifications over MMNL-N in practical situations, we revisited microdata on German car buyers' preferences for alternative-fuel vehicles (Achtnicht, 2012). Analyzing the electro-mobility market has become one of the leading applications of discrete choice models, especially to understand the barriers that prevent broader adoption of battery electric vehicles. A total of 598 driving-license holders were interviewed at car dealership showrooms and at premises of the technical inspection authority across Germany between August 2007 and March 2008. Part of this survey instrument was a discrete choice experiment in which each respondent was asked to select the preferred option among seven vehicles with different fuel types in each of six choice situations. Besides fuel type or propulsion system, the vehicles were characterized by purchase price, engine power, fuel costs, CO2 emissions, and fuel availability. Table 5 shows the fuel types and attribute levels of the experiment. Notably, the purchase price and engine power levels were customized based on the lower and upper bounds for price and horsepower, respectively, that respondents stated for their next intended car purchase. In addition, to increase realism, the zero-emission level was only applied to non-fossil fuels, and the lowest level of fuel availability was not applied to gasoline and diesel. See Achtnicht (2012) for further details about the experimental design.
TABLE 5 Attributes and Levels for the Discrete Choice Experiment
Attribute | Levels
Fuel type | Gasoline, Diesel, Hybrid, LPG or CNG, Biofuel, Hydrogen, Electric
Purchase price | 75%, 100%, 125% of reference(a) (in €)
Engine power | 75%, 100%, 125% of reference(a) (in hp)
Fuel costs per 100 km | €5, €10, €20
CO2 emissions per km | no emissions(b), 90 g, 130 g, 170 g, 250 g
Fuel availability | 20%(c), 60%, 100% of service station network
a Average of the lower and upper bounds for the next car indicated by the respondent.
b Only applied to non-fossil fuel types (i.e., biofuel, hydrogen, and electric).
c Not applied to conventional fuel types (i.e., gasoline and diesel).
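Per note a of Table 5, the purchase price and engine power levels are percentages of a respondent-specific reference. A minimal Python sketch of that arithmetic; the function name `customized_levels` and the example bounds are hypothetical.

```python
def customized_levels(lower_bound, upper_bound, multipliers=(0.75, 1.00, 1.25)):
    """Respondent-specific levels for purchase price or engine power.

    The reference is the average of the lower and upper bounds the respondent stated
    for the next car (Table 5, note a); the levels are 75%, 100%, and 125% of it.
    """
    reference = (lower_bound + upper_bound) / 2.0
    return [m * reference for m in multipliers]


# Hypothetical respondent: intended price between 20,000 and 30,000 euros,
# engine power between 90 and 130 hp.
print(customized_levels(20_000, 30_000))   # price levels in euros
print(customized_levels(90, 130))          # engine power levels in hp
```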
4.2 Model Specification and Estimation
We estimated both the MMNL-N and LML models in WTP space (Train and Weeks, 2005). In this study, MMNL-N in WTP space implies that the WTP for each vehicle attribute is normally distributed and the coefficient of purchase price (identified up to scale) is lognormally distributed. Thus, with four vehicle attributes and the assumption of uncorrelated coefficients, ten parameters were estimated for the MMNL-N model (a mean and a standard deviation for each of the four WTPs and for the price/scale coefficient). For LML, twelve different specifications were considered: four LML-Polynomial (order varying from one to four), four LML-Step (levels varying from two to five), and four LML-Spline (knots varying from zero to three).15 Because the standard errors of LML are feasible to compute in this application, they were computed using 100 bootstrap samples. MMNL-N in WTP space was estimated in R using the gmnl package (Sarrias and Daziano, 2017), and the LML variants were estimated in MATLAB using the code provided by Train (2016).16

15 In other words, for each type of LML sub-specification (e.g., LML-Polynomial), four specifications were considered, with the number of identifiable parameters varying from five to twenty.
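The bootstrap standard errors mentioned above can be sketched as follows in Python. This is a generic nonparametric bootstrap, not the authors' MATLAB implementation; it assumes that whole respondents (each with their six choice situations) are resampled with replacement, and `estimator` is a hypothetical placeholder for re-estimating the LML model on each resample.

```python
import numpy as np


def bootstrap_se(data, estimator, n_boot=100, seed=0):
    """Bootstrap standard errors by resampling respondents with replacement.

    data: array whose first axis indexes respondents, so each respondent's panel of
          choice situations stays together in every resample.
    estimator: callable mapping a resampled array to a parameter vector (in the
               application this would be a full model re-estimation).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # respondent indices, with replacement
        replicates.append(np.atleast_1d(estimator(data[idx])))
    replicates = np.vstack(replicates)            # shape (n_boot, n_params)
    return replicates.std(axis=0, ddof=1)


# Toy usage: 598 respondents x 6 choice situations of synthetic data, estimator = mean.
rng = np.random.default_rng(1)
toy_panel = rng.standard_normal((598, 6))
se = bootstrap_se(toy_panel, lambda d: d.mean())
print(se)
```

Resampling respondents rather than individual choices keeps the within-person correlation of the panel, which is the usual choice for mixed logit panels; each bootstrap replicate costs one full estimation, which is why the number of replicates is kept at 100.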
4.3 Results and Insights
Table 6 summarizes model fit statistics; for readability, only the LML variants with ten and twenty parameters are shown. For each LML variant, the log-likelihood improves as the number of parameters increases from ten to twenty, but BIC also increases, i.e., the gain in fit does not compensate for the additional parameters. This observation is consistent with the results of the Monte Carlo study: a higher number of parameters in LML does not guarantee a better fit in terms of BIC.
TABLE 6 Case Study: Model Fit Statistics
Model | Parameters | Loglikelihood | BIC
MMNL-N | 10 | -5990.8 | 12063.4
LML-Polynomial | 10 | -5987.3 | 12056.4
LML-Polynomial | 20 | -5967.6 | 12099.0
LML-Step | 10 | -5971.9 | 12025.6
LML-Step | 20 | -5964.3 | 12092.3
LML-Spline | 10 | -5991.7 | 12065.2
LML-Spline | 20 | -5971.9 | 12107.4
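Table 6 can be checked with the standard BIC formula, BIC = -2 ln L + K ln N. In the Python sketch below, N = 598 × 6 = 3,588 choice observations is an assumption (the text does not state N), but with it the formula reproduces every BIC value in Table 6 to within rounding of the reported log-likelihoods. The last lines show why BIC rises for LML-Polynomial from ten to twenty parameters.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + K ln N (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


N_OBS = 598 * 6   # assumption: respondents x choice situations

# (model, parameters, log-likelihood, reported BIC) from Table 6
TABLE6 = [
    ("MMNL-N", 10, -5990.8, 12063.4),
    ("LML-Polynomial", 10, -5987.3, 12056.4),
    ("LML-Polynomial", 20, -5967.6, 12099.0),
    ("LML-Step", 10, -5971.9, 12025.6),
    ("LML-Step", 20, -5964.3, 12092.3),
    ("LML-Spline", 10, -5991.7, 12065.2),
    ("LML-Spline", 20, -5971.9, 12107.4),
]

for name, k, ll, reported in TABLE6:
    print(f"{name:<15} K={k:2d} BIC={bic(ll, k, N_OBS):9.1f} (reported {reported})")

# Ten extra parameters cost 10 * ln(N) BIC points; compare with the fit gain.
penalty = 10 * math.log(N_OBS)
fit_gain = 2.0 * (-5967.6 - (-5987.3))   # drop in -2 ln L, LML-Polynomial 10 -> 20
print(round(penalty, 1), round(fit_gain, 1))
```

The penalty for ten extra parameters (about 81.9) is roughly twice the improvement in -2 ln L (39.4), so BIC increases even though the log-likelihood improves.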
Table 7 shows that the mean and standard deviation of the WTP estimates for all vehicle attributes are statistically significant in MMNL-N and in all LML variants. For succinctness, LML variants with five and fifteen parameters are not shown in the table. All models yield very similar mean and standard deviation estimates, but the PDF plots of the estimated WTP vary considerably with the number of parameters (Figures 5 and 6). This observation highlights the importance of comparing the resulting PDF plots. Key insights about the WTP for different attributes are summarized below.

Fuel Cost: According to MMNL-N (with ten parameters), the WTP to save €1 (per 100 km) in fuel costs is normally distributed with a mean of €2040. However, LML-Polynomial and LML-Step with twenty parameters identify two groups in the population: one with a mean WTP of more than €4500 and another with a mean WTP of less than €1000 (see left-side plots in Figure 5). Additionally, WTP appears to be normally distributed within each group.

Fuel Availability: MMNL-N provides a mean WTP estimate for fuel availability of €350 for a one percent increase in the density of the refueling/recharging network. LML variants with twenty parameters indicate that heterogeneity in WTP is bimodal, with one group having a normally distributed WTP with a mean lower than €300, and another group (more than 15% of the population) exhibiting a mean WTP higher than €800 (right-side plots in Figure 5).

Engine Power: MMNL-N provides a mean WTP estimate for engine power of €150 for a one-unit increase in horsepower, but the most flexible LML variants indicate that heterogeneity in WTP is bimodal, with one group having a normally distributed WTP with a mean close to €0, and another group (more than 20% of the population) exhibiting a mean WTP higher than €500 (left-side plots of Figure 6).

CO2 Emissions: MMNL-N provides a mean WTP estimate of €6000 for reducing CO2 emissions by 100 g per kilometer, but the most flexible LML variants again indicate that heterogeneity in WTP is bimodal, with a large group that is not willing to pay anything to reduce CO2 emissions, and another group (around 10% of the population) exhibiting a mean WTP of €26000 to reduce CO2 emissions by 100 g per kilometer (see right-side plots of Figure 6).

Overall, the most flexible LML models, regardless of the mixing distribution (polynomial, spline, or step function), all identify bimodal WTP distributions for the vehicle attributes. The PDF plots suggest that there are effectively two groups of consumers: one with a rather high WTP for product improvements and one with a very low or zero WTP. The MMNL-N specification gives very similar mean and standard deviation estimates but, by construction, cannot recover such bimodal patterns, which are relevant for marketing efforts. The original analysis of these data (Achtnicht, 2012) mimics to some extent the bimodal shape of WTP that we obtain with LML. The original MMNL model in preference space included two fixed price coefficients (purchase price alone, and purchase price interacted with a dummy indicating a low intended price) and normal and lognormal random parameters for the other car attributes. This specification results in two overlapping (log)normal WTP densities that differ only in their means. Achtnicht (2012) finds that WTP estimates differ by a factor of 3 depending on the intended price range for the next car, and concludes that for consumers who plan to buy a cheaper car, price is more determinative than other attributes. Our analysis suggests that LML is a powerful tool to identify such varying preferences even before introducing other covariates.

16 Since LML and MMNL-N were estimated in two different programming environments, their estimation times are not compared in the empirical study.
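Group shares such as "more than 15% of the population" can be read off an estimated mixing distribution by splitting it at the density minimum between its two modes. The Python sketch below assumes the LML output is a set of grid points with probabilities; the splitting rule, the function name `split_at_valley`, and the toy two-component input are hypothetical illustrations, not the procedure used in the paper.

```python
import numpy as np


def split_at_valley(grid, weights):
    """Split a bimodal grid distribution at the lowest point between its two highest modes.

    grid:    increasing support points (e.g., WTP values).
    weights: probabilities (or unnormalized density values) at those points.
    Returns (share_low, mean_low, share_high, mean_high).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    i = np.arange(1, w.size - 1)
    peaks = i[(w[i] > w[i - 1]) & (w[i] >= w[i + 1])]        # interior local maxima
    if peaks.size < 2:
        raise ValueError("distribution has fewer than two modes on this grid")
    p1, p2 = np.sort(peaks[np.argsort(w[peaks])[-2:]])       # two highest modes, in order
    valley = p1 + int(np.argmin(w[p1:p2 + 1]))               # lowest point between them
    low, high = slice(0, valley + 1), slice(valley + 1, None)
    share_low, share_high = w[low].sum(), w[high].sum()
    mean_low = float(np.dot(grid[low], w[low]) / share_low)
    mean_high = float(np.dot(grid[high], w[high]) / share_high)
    return float(share_low), mean_low, float(share_high), mean_high


def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / s


# Hypothetical WTP distribution in thousand euros: 85% centered at 0.3, 15% at 0.9.
grid = np.linspace(-0.2, 1.6, 361)
weights = 0.85 * normal_pdf(grid, 0.3, 0.1) + 0.15 * normal_pdf(grid, 0.9, 0.1)
share_low, mean_low, share_high, mean_high = split_at_valley(grid, weights)
print(round(share_low, 3), round(mean_low, 3), round(share_high, 3), round(mean_high, 3))
```

With real LML output, which can be wiggly, one would typically smooth the weights or require a minimum peak prominence before splitting.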
TABLE 7 Case Study: Parameter Estimates
Model | Parameters | Purchase price/scale | Fuel cost(d) | Fuel availability(e) | Engine power(f) | CO2 emission(g)
Mean of WTP(c)
MMNL-N | 10 | -5.94b ( ) | -2.04 (0.21) | 0.35 (0.04) | 0.15 (0.02) | -0.06 (0.008)
LML-Polynomial | 10 | -5.35 (0.34)a | -2.14 (0.15) | 0.37 (0.02) | 0.16 (0.02) | -0.07 (0.008)
LML-Polynomial | 20 | -5.5 (0.48) | -2.15 (0.22) | 0.37 (0.03) | 0.16 (0.02) | -0.07 (0.009)
LML-Step | 10 | -5.39 (0.38) | -2.2 (0.17) | 0.38 (0.03) | 0.16 (0.02) | -0.07 (0.008)
LML-Step | 20 | -5.39 (0.47) | -2.25 (0.19) | 0.38 (0.03) | 0.16 (0.02) | -0.07 (0.008)
LML-Spline | 10 | -4.84 (0.23) | -2.44 (0.14) | 0.41 (0.02) | 0.17 (0.02) | -0.07 (0.009)
LML-Spline | 20 | -5.39 (0.38) | -2.2 (0.18) | 0.38 (0.03) | 0.16 (0.02) | -0.07 (0.008)
Std. Dev. of WTP
MMNL-N | 10 | 5.23 ( ) | 1.7 (0.20) | 0.28 (0.03) | 0.21 (0.03) | 0.1 (0.01)
LML-Polynomial | 10 | 2.89 (0.25) | 1.67 (0.13) | 0.27 (0.02) | 0.24 (0.02) | 0.1 (0.007)
LML-Polynomial | 20 | 2.92 (0.31) | 1.77 (0.18) | 0.3 (0.03) | 0.24 (0.02) | 0.1 (0.008)
LML-Step | 10 | 2.88 (0.28) | 1.8 (0.15) | 0.28 (0.04) | 0.25 (0.02) | 0.1 (0.007)
LML-Step | 20 | 2.96 (0.34) | 1.85 (0.15) | 0.32 (0.04) | 0.24 (0.02) | 0.1 (0.01)
LML-Spline | 10 | 2.89 (0.01) | 1.94 (0.02) | 0.31 (0.002) | 0.25 (0.0009) | 0.11 (0.0007)
LML-Spline | 20 | 2.88 (0.27) | 1.8 (0.15) | 0.28 (0.03) | 0.25 (0.02) | 0.1 (0.007)
a Standard errors are reported in parentheses.
b The price coefficient is scaled down by 100,000, and all explanatory variables are scaled down by a factor of 100.
c WTP is scaled down by a factor of 1000.
d WTP to save €1 per 100 km.
e WTP for a 1% increase in the density of the recharging/refueling network.
f WTP for a unit increase in horsepower.
g WTP to reduce CO2 emissions by 100 g per km.
FIGURE 5 PDF of Parameters of Willingness to Pay for Fuel Cost and Fuel Availability
[Plot panels. Left-side subplots: PDF of WTP for Fuel Cost; right-side subplots: PDF of WTP for Fuel Availability. Curves compared: MMNL-N (parameter = 10); LML-Polynomial, LML-Step, and LML-Spline (parameter = 10 and parameter = 20).]
FIGURE 6 PDF of Parameters of Willingness to Pay for Engine Power and CO2 Emission
[Plot panels. Left-side subplots: PDF of WTP for Engine Power; right-side subplots: PDF of WTP for CO2 Emission. Curves compared: MMNL-N (parameter = 10); LML-Polynomial, LML-Step, and LML-Spline (parameter = 10 and parameter = 20).]
5. CONCLUSIONS
In this paper we have conducted Monte Carlo studies to gauge the ability of different variants of the very recently proposed logit-mixed logit (LML) model, a generalized semiparametric random-parameter logit, to retrieve the underlying preference heterogeneity distributions. The flexibility of LML models is associated with the number of parameters in the semiparametric representation of the mixing distribution. A second goal of the experiments was to compare LML performance with the most widely used parametric random-parameter specification: the mixed multinomial logit model with normal heterogeneity (MMNL-N). In the simulation experiments, we estimated 16 models in each of the 6 cases of the simulation plan, in which the data generating process imposes different distributional assumptions on two uncorrelated random parameters. To ensure the stability of the parameter estimates, all models were estimated on 100 datasets. First, the findings of this study mostly agree with the results of Fosgerau and Hess (2007), in that the likelihood at convergence increases with the flexibility of the mixing distribution in semiparametric models (with a few exceptions). However, unlike Fosgerau and Hess (2007), we did not observe that a higher number of parameters necessarily improves the ability to actually recover the underlying heterogeneity distribution.17 In fact, the results of our study do not support the blind use of very flexible distributions, because semiparametric representations with a large number of parameters at times resulted in very high bias in retrieving the true mixing distributions. Whereas the number of parameters was varied from two to ten in all LML variants and MMNL-N remained at four parameters, in our Monte Carlo setting the minimum BIC in LML generally occurred at four or six parameters (with a few exceptions).
Since even the minimum-BIC model may not retrieve behavioral characteristics of the mixing distribution (e.g., multimodality) well, as a general guideline for choosing the number of parameters in LML we suggest starting with the same number of parameters as a corresponding parametric model (such as MMNL-N) and increasing the number of mixing parameters while checking changes in the derived histogram or kernel density plots of the mixing distribution. One can stop adding parameters when no major changes in the PDF plots are observed.

Second, flexible LML specifications were able to retrieve lognormal, bimodal-normal, and uniform preference heterogeneity. Since LML-Polynomial either matched or outperformed LML-Step and LML-Spline in retrieving the true underlying distributions, we recommend the use of polynomials in practice. A modeling caveat also emerged: discrete mixing distributions could not be retrieved by any LML specification, not even with a relatively large number of parameters. We thus suggest considering parametric discrete heterogeneity distributions in conjunction with LML models when working with real data.

Third, if asymptotic standard errors are not required, LML is computationally very efficient. This characteristic of LML is useful for quickly trying out various specifications with different numbers of parameters and different functional forms of the mixing distribution. The computation of standard errors is very expensive for LML models, but can be afforded for final specifications.

17 Since LML-Polynomial in this study is a direct generalization of the approach used by Fosgerau and Hess (2007), we can compare the results of the two studies. We note that Fosgerau and Hess (2007) only used likelihood values and qualitative CDF comparisons (instead of quantitative measures of bias) as performance metrics, which may explain the contrast between the conclusions of the two studies.

In addition to the Monte Carlo study, we have also analyzed LML and MMNL-N willingness-to-pay estimates of German consumers for marginal improvements in the attributes of alternative-fuel vehicles. Interestingly, flexible LML specifications in this practical setting identified bimodal variations in WTP across the population that are not evident in MMNL-N without added covariates. Whereas this study provides many insights about LML performance, additional simulation experiments are needed to evaluate the robustness of these conclusions. Besides testing a variety of data settings, such as variation in the number of alternatives and the number of choice situations in the panel data, other LML-specific variations may include setting the boundaries of the domain of the support set S, the number of points in each dimension of S (i.e., the granularity of the grid), and the number of samples drawn from S in the estimation process. Other functional forms of mixing distributions (e.g., a combination of polynomial and spline) could be explored in the hope of finding those with a better ability to retrieve the underlying heterogeneity distributions of the random parameters. Finally, even though BIC has been used as a criterion for model selection in semiparametric settings, future research should look into better tools to quantitatively compare semi- and nonparametrically estimated mixing distributions.
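The guideline for choosing the number of LML parameters (start from the parameter count of the parametric model and add parameters until the estimated PDF stops changing) can be made operational in several ways. The Python sketch below uses the total variation distance between successive estimated PDFs as the "no major change" criterion; the distance measure, the tolerance, and the `fit_pdf` callback are assumptions introduced for illustration, whereas the paper relies on visual inspection of histograms or kernel density plots.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance between two probability vectors on the same grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())


def choose_n_params(fit_pdf, sizes, tol=0.02):
    """Add flexibility until the estimated mixing distribution stops changing.

    fit_pdf(k): hypothetical callback that estimates the model with k parameters and
                returns the mixing-distribution probabilities on a fixed grid.
    sizes:      increasing parameter counts, starting at the count of the
                corresponding parametric model (e.g., MMNL-N).
    tol:        hypothetical threshold on the total variation distance between the
                PDFs of two successive specifications.
    Returns the smaller model of the first pair whose PDFs differ by less than tol,
    or the largest size if the PDF never settles.
    """
    previous = fit_pdf(sizes[0])
    for k_prev, k in zip(sizes, sizes[1:]):
        current = fit_pdf(k)
        if total_variation(previous, current) < tol:
            return k_prev
        previous = current
    return sizes[-1]


# Toy stand-in for LML estimation: PDFs on a 3-point grid that settle by k = 8.
toy_pdfs = {
    4: [1.00, 0.00, 0.00],
    6: [0.50, 0.50, 0.00],
    8: [0.40, 0.60, 0.00],
    10: [0.39, 0.61, 0.00],
}
print(choose_n_params(toy_pdfs.__getitem__, [4, 6, 8, 10]))
```

Returning the smaller model of the stable pair favors parsimony; returning the larger one would be an equally defensible design choice.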
ACKNOWLEDGMENT
This research is based upon work supported by the National Science Foundation Award No. CMMI-1462289. The authors are thankful to Professor Kenneth Train and reviewers for their suggestions to improve the manuscript.
References
Achtnicht, M. (2012). German car buyers' willingness to pay to reduce CO2 emissions. Climatic Change, 113(3-4):679-697.
Bajari, P., Fox, J. T., and Ryan, S. P. (2007). Linear regression estimation of discrete choice models with nonparametric distributions of random coefficients. The American Economic Review, 97(2):459-463.
Bansal, P., Daziano, R. A., and Achtnicht, M. (2017). Extending the logit-mixed logit model for a combination of random and fixed parameters. Technical Report 17-02124, Transportation Research Board 96th Annual Meeting.
Bastin, F., Cirillo, C., and Toint, P. L. (2010). Estimating nonparametric random utility models with an application to the value of time in heterogeneous populations. Transportation Science, 44(4):537-549.
Bhat, C. R. (1997). An endogenous segmentation mode choice model with an application to intercity travel. Transportation Science, 31(1):34-48.
Burda, M., Harding, M., and Hausman, J. (2008). A Bayesian mixed logit-probit model for multinomial choice. Journal of Econometrics, 147(2):232-246.
Fiebig, D. G., Keane, M. P., Louviere, J., and Wasi, N. (2010). The generalized multinomial logit model: accounting for scale and coefficient heterogeneity. Marketing Science, 29(3):393-421.
Fosgerau, M. and Bierlaire, M. (2007). A practical test for the choice of mixing distribution in discrete choice models. Transportation Research Part B: Methodological, 41(7):784-794.
Fosgerau, M. and Hess, S. (2007). Competing methods for representing random taste heterogeneity in discrete choice models. Working paper, Danish Transport Research Institute, Copenhagen.
Fosgerau, M. and Mabit, S. L. (2013). Easy and flexible mixture distributions. Economics Letters, 120(2):206-210.
Fox, J. T., Ryan, S. P., and Bajari, P. (2011). A simple estimator for the distribution of random coefficients. Quantitative Economics, 2(3):381-418.
Greene, W. H. and Hensher, D. A. (2003). A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological, 37(8):681-698.
Joo, Y., Wells, M. T., Casella, G., et al. (2010). Model selection error rates in nonparametric and parametric model comparisons. In Borrowing Strength: Theory Powering Applications, A Festschrift for Lawrence D. Brown, pages 166-183. Institute of Mathematical Statistics.
Kamakura, W. A. and Russell, G. (1989). A probabilistic choice model for market segmentation and elasticity structure.
Keane, M. and Wasi, N. (2013). Comparing alternative models of heterogeneity in consumer choice behavior. Journal of Applied Econometrics, 28(6):1018-1045.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics, pages 105-142.
McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5):447-470.
Rossi, P. E., Allenby, G. M., and McCulloch, R. (2012). Bayesian Statistics and Marketing. John Wiley & Sons.
Sarrias, M. and Daziano, R. (2017). Multinomial logit models with continuous and discrete individual heterogeneity in R: The gmnl package. Journal of Statistical Software, Articles, 79(2):1-46.
Train, K. (2016). Mixed logit with a flexible mixing distribution. Journal of Choice Modelling, 19:40-53.
Train, K. and Weeks, M. (2005). Discrete choice models in preference space and willingness-to-pay space. In Applications of Simulation Methods in Environmental and Resource Economics, pages 1-16. Springer.
Train, K. E. (2008). EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling, 1(1):40-69.
Train, K. E. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.