Computational Bayesian statistics in transportation modeling: from road safety analysis to discrete choice
Ricardo A. Daziano
School of Civil and Environmental Engineering, Cornell University, Ithaca, NY 14853; Email:
[email protected]
Luis Miranda-Moreno, Department of Civil Engineering, McGill University, Montréal QC, Canada H3A 0C3; Email: [email protected]
Shahram Heydari, Department of Civil Engineering, McGill University, Montréal QC, Canada H3A 0C3

May 2013

In this paper we review both the fundamentals and the expansion of computational Bayesian econometrics and statistics applied to transportation modeling problems in road safety analysis and travel behavior. Whereas for analyzing accident risk in transportation networks there has been a significant increase in the application of hierarchical Bayes methods, in transportation choice modeling the use of Bayes estimators is rather scarce. We thus provide a general discussion of the benefits of using Bayesian Markov chain Monte Carlo methods to simulate answers to the problems of point and interval estimation and forecasting, including the use of the simulated posterior for building predictive distributions and constructing credible intervals for measures such as the value of time. Although there is a general idea that going Bayesian is just another way of finding an equivalent to frequentist results, in practice Bayes estimators have the potential of outperforming frequentist estimators and, at the same time, may offer more information. Additionally, Bayesian inference is particularly interesting for small samples and weakly identified models.

Keywords: Bayesian statistics; MCMC; discrete choice; road safety
1 Introduction

The study of uncertainty has become a paramount topic for several fields that analyze transportation as a system, including economics, statistics, and operations research. On the one hand, the frequentist or classical approach handles uncertainty about the true parameters of a statistical model by considering these parameters as fixed but unknown constants. On the other hand, the Bayesian approach considers the true parameters to be random variables. There are several advantages to the Bayesian approach, including estimators that are exact,1 compatibility with the likelihood principle, the possibility of introducing prior knowledge,2 and the flexibility of working with predictive posteriors. In fact, based on the Bernstein-von Mises theorem it is possible to argue that frequentist estimators are a good approximation of the Bayesian approach as the sample size gets larger.3 Thus, in this paper we review the expansion and modeling benefits of computational Bayesian econometrics and statistics applied to transportation modeling problems in travel behavior and road safety analysis. The aim of this paper is to promote broader adoption of Bayesian statistics in transportation modeling through a general discussion of the benefits of Bayesian inference and forecasting. We illustrate the benefits of Bayes estimators versus frequentist estimators by analyzing small-sample properties in two Monte Carlo studies.

We first overview the fundamentals of Bayesian statistics (section 2), paying special attention to simulation-aided inference using MCMC methods. In effect, the expansion of Bayesian methods builds on the rapid development of Markov chain Monte Carlo (MCMC) techniques, which are a class of simulation-based estimators. Then we identify two relevant fields of transportation modeling for which different levels of application of the Bayesian approach can be found. On the one hand, in analyzing accident risk in transportation networks there has been a significant increase in the application of hierarchical Bayes methods (e.g. Heydecker and Wu, 2001, and Song, 2005; Miranda-Moreno et al., 2005; section 3 reviews these applications). One of the main advantages of hierarchical Poisson models is that they can deal with over- or under-dispersion, spatial random variations, time trends, and clustering in the data. Another benefit of the Bayesian approach is that the simulated posterior predictive distribution is a direct result of MCMC estimation. Different risk measures can then be derived from the posterior distribution and used for multiple purposes, such as before-after observational studies and identification of accident-risk
1 Bayes estimators have properties that are valid for small samples.
2 Prior knowledge includes building on previous research as well as theoretical constraints, such as parameters having a particular sign.
3 For small samples, frequentist estimators lose their good statistical properties.
contributing factors. To show the advantage of Bayesian statistics over MLE and to analyze the impact of different modeling decisions and parameters that can affect the final outcome of safety analyses, a data simulation framework is developed to measure the accuracy of parameter estimates. Furthermore, we examine the effect of the estimation approach (full Bayes vs. empirical Bayes), of the model error assumption (Gamma vs. Lognormal), of the type of prior (informative versus non-informative), and of the ranking method. Our posterior analysis highlights the fact that more complex methods do not necessarily provide better results. In fact, the impact of the model error structure seems to be marginal.

While there are hundreds of Bayesian applications in road safety analysis, in travel choice analysis the large majority of applications are frequentist (with the exception of a very few excellent Bayesian examples overviewed in section 4). We argue here, however, that the application of Bayesian microeconometrics to choice modeling is natural because the concept of subjective probabilities is akin to the behavioral assumptions of discrete choice models. Despite this affinity, transportation modelers have generally been reluctant to adopt Bayesian techniques, even with estimators available for the most common models, including mixed logit with parametric (Train, 2009) and nonparametric heterogeneity distributions, probit (Albert and Chib, 1993), and nested logit models (Lahiri and Gao, 2002). In empirical transportation choice modeling only very few applications use Bayes estimators (e.g. Bolduc et al., 1997; Kim et al., 2003; Fang, 2008; Washington et al., 2010). Although some researchers have adopted Bayesian tools for specific modeling components, including efficient design of stated-preference experiments, choice set modeling, and latent class models, estimation and forecasting problems are not usually treated from a Bayesian perspective. For instance, even though Bayes' theorem is the basis for deriving latent class models, researchers continue to use frequentist estimators. Taking the simulation of accident risk analysis as inspiration, we illustrate the benefits of Bayes inference in choice modeling by performing a Monte Carlo study for analyzing interval estimation of the value of time. The Monte Carlo focuses on small-sample properties of the estimator and of the estimates derived by postprocessing draws of the original parameters. In conclusion, section 5 overviews reasons for encouraging a broader adoption of Bayesian tools.
2 Overview of Bayesian econometrics

2.1 Bayesian statistical models

This first subsection overviews concepts that are treated in detail in Gourieroux (1995), Lancaster (2004), Geweke (2005), and Greenberg (2008). Consider the statistical model $(\mathcal{Y},\ \mathcal{P} = \{P_\theta = l(y;\theta)\,\mu,\ \theta \in \Theta \subseteq \mathbb{R}^p\},\ p \ge 1)$, where $y$ is a vector of observations in the sample space $\mathcal{Y}$, $\mathcal{P}$ is a parameterized family of probability density functions $P_\theta$ on $\mathcal{Y}$, $l(y;\theta)$ is the likelihood function, $\mu$ is the dominating measure, $\theta$ is a vector of $p$ unknown parameters, and $\Theta$ is the parameter space. In parametric statistics, the point estimation problem reduces to proposing a value
$\hat\theta$ for the true but unknown $\theta$. In a Bayesian setting of statistical decision problems, $\theta$ has a prior $p(\theta)$ that describes the probability distribution of the unknown parameter before having access to the evidence. The combination of the prior with the information coming in via the sample data determines the posterior distribution of the parameters, $p(\theta \mid y)$. The posterior and prior distributions are related following Bayes theorem according to $p(\theta \mid y) = p(y \mid \theta)\,p(\theta)/p(y)$, where $p(y \mid \theta)$ represents the distribution of the observations for every particular value of $\theta$, and $p(y)$ is the marginal distribution of the data. Note that $p(y \mid \theta) = l(y;\theta)$ by definition. Since $p(y)$ is a constant, for inference purposes Bayes theorem is rewritten as $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$, which emphasizes the notion of updating knowledge through evidence. Bayesian inference is built as a decision making process under uncertainty, where the researcher chooses an estimator that minimizes the probability of being wrong. If the decision is taken ex post (after the observation of
$y$), the Bayes estimator is built by minimizing the expected loss or posterior risk

$$R(\hat\theta \mid y) = \int_\Theta L(\hat\theta \mid \theta)\, p(\theta \mid y)\, d\mu(\theta) \qquad (1)$$
where the loss function $L(\hat\theta \mid \theta)$ measures the cost of taking a wrong decision. In general, a linear-linear loss function yields a Bayes decision equal to the posterior qth quantile. However, the most common loss function is the general quadratic loss $L(\hat\theta \mid \theta) = (\hat\theta - \theta)'Q(\hat\theta - \theta)$, where $Q$ is a positive definite matrix. With a quadratic loss function, the Bayes estimator $\hat\theta$ is unique and corresponds to the mean of the posterior distribution
$$\hat\theta = \int_\Theta \theta\, p(\theta \mid y)\, d\mu(\theta) = E(\theta \mid y) \qquad (2)$$
In addition, with a quadratic loss function, an unbiased estimator (in the Bayesian sense) of the precision (risk) of the Bayes estimator is the posterior variance. Even though Bayesian and classical inference are intrinsically different, a Bayesian estimate depends on the data, and a different sample will generate a different estimate. If we denote the true value of $\theta$ by $\theta_0$, the Bernstein-von Mises theorem states that $\hat\theta = E(\theta \mid y) \xrightarrow{a} N\big(\theta_0,\ I(\theta_0)^{-1}/N\big)$, where $I(\theta_0)$ is the asymptotic Fisher information matrix. Thus, under regularity conditions, the Bayes estimator $\hat\theta$ is asymptotically unbiased, consistent, normal, and efficient, which are the same asymptotic properties as those of the maximum likelihood estimator. In fact, when $N$ is large, the Bayes and maximum likelihood estimators are approximately equal. Note that for large samples the evidence provided by the observations is such that the effect of the prior can be neglected.
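To make the prior-to-posterior updating and the posterior-mean estimator concrete, the following worked example (ours, not from the original text) uses the standard conjugate Beta-Binomial pair, for which the posterior is available in closed form.

```latex
% Conjugate Beta-Binomial example (illustrative only).
% Prior: \theta \sim \mathrm{Beta}(a,b); data: s successes in n Bernoulli trials.
\begin{align*}
p(\theta \mid y) &\propto p(y \mid \theta)\, p(\theta)
  \propto \theta^{s}(1-\theta)^{n-s}\,\theta^{a-1}(1-\theta)^{b-1}
  = \theta^{a+s-1}(1-\theta)^{b+n-s-1},\\
\theta \mid y &\sim \mathrm{Beta}(a+s,\ b+n-s), \qquad
\hat\theta = E(\theta \mid y) = \frac{a+s}{a+b+n}
\quad \text{(Bayes estimator under quadratic loss).}
\end{align*}
```

As the sample size $n$ grows, the posterior mean approaches the sample frequency $s/n$, illustrating the vanishing influence of the prior noted above.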
2.2 Markov chain Monte Carlo methods

Bayesian inference examines the posterior distribution $p(\theta \mid y)$, and in particular the posterior first and second moments. Although the density of the posterior distribution can be obtained using Bayes theorem, the main difficulty concerns the analytical characterization of this distribution. When $p(\theta \mid y)$ is not known explicitly, Bayesian inference makes use of Markov chain Monte Carlo (MCMC) methods, which are a class of stochastic sampling algorithms based on constructing a Markov chain that has the desired posterior as its equilibrium distribution (Gelfand and Smith, 1990). MCMC estimation belongs to the class of Monte Carlo methods of repeated sampling for simulation, and hence is relatively easy to implement. Computational Bayesian statistics has allowed researchers to simulate posteriors that are analytically intractable, just as simulation-aided inference has permitted frequentist estimation of sophisticated models. The difference between simulation for Bayesian inference and for frequentist inference is that the former stops at the evaluation of a density, whereas the latter involves maximization of a simulated likelihood. As a result, computational Bayesian statistics is gradient and Hessian free. There are different methods to iteratively build a Markov chain converging to the posterior of interest, but Gibbs sampling and Metropolis-Hastings are the most typically applied.
2.3 Gibbs sampling

Gibbs sampling requires that all conditional distributions of a specific partition of the parameter space have a closed form to draw samples from. Let $\theta = (\theta_1', \ldots, \theta_P')'$ be a partition of $\theta \in \Theta$, and define $\theta_{<p} = (\theta_1', \ldots, \theta_{p-1}')'$, $\theta_{>p} = (\theta_{p+1}', \ldots, \theta_P')'$, $\theta_{<1} = \theta_{>P} = \emptyset$, and $\theta_{-p} = (\theta_{<p}, \theta_{>p})$. The partition is chosen such that it is possible to draw directly from each full conditional distribution $\pi(\theta_p \mid \theta_{-p})$. The Gibbs sampler is an algorithm based on an MCMC where, at the $g$th iteration, the transition process from $\theta^{(g-1)}$ to $\theta^{(g)}$ is defined through
$\theta_p^{(g)} \sim \pi(\theta_p \mid \theta_{-p}^{(g-1)})$, $p = 1, \ldots, P$. It can be shown that this reversible Markov chain generates an instance from the posterior distribution at each iteration.

2.4 Metropolis-Hastings

Sometimes direct sampling for one or more of the conditional distributions inside the Gibbs sampler is difficult, because a conditional distribution lacks a closed form. In the Metropolis-Hastings implementation, a candidate $\theta^{(cand)}$ is drawn from the transition probability
$q(\theta^{(cand)} \mid \theta^{(curr)})$ of generating the candidate given the current guess. The candidate realization is then compared to the current guess through the acceptance rate

$$\alpha = \min\left\{1,\ \frac{p(y \mid \theta^{(cand)})\,p(\theta^{(cand)})\,q(\theta^{(curr)} \mid \theta^{(cand)})}{p(y \mid \theta^{(curr)})\,p(\theta^{(curr)})\,q(\theta^{(cand)} \mid \theta^{(curr)})}\right\}.$$

Starting with an arbitrary value $\theta^{(0)}$, at the $g$th iteration of the Metropolis-Hastings algorithm the candidate is accepted as the new $\theta^{(g)} = \theta^{(cand)}$ with probability $\alpha$, whereas the old one is preserved,
$\theta^{(g)} = \theta^{(curr)}$, with probability $1-\alpha$. In the case $q(\theta^{(cand)} \mid \theta^{(curr)}) = q(\theta^{(cand)} - \theta^{(curr)})$, the generating process of the candidate is a random-walk Metropolis chain. If the proposal density is such that $q(\theta^{(cand)} \mid \theta^{(curr)}) = q(\theta^{(cand)})$, the process is a Metropolis independence chain. Realizations of the Metropolis-Hastings algorithm generate instances from the posterior distribution. Parameters of the transition probability are chosen such that the proposed candidates will explore areas of high probability in an efficient manner, i.e. avoiding poor mixing. Choosing the correct parameters for obtaining desired acceptance rates is known as tuning. Note that the Gibbs sampler is a special case of the Metropolis-Hastings algorithm, where the proposal density is given by the full conditional distributions, the acceptance rate always equals one, and no tuning is necessary.
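As an illustration of the algorithm just described, the following minimal sketch (ours, not from the original text; the function name and the toy target are illustrative) implements a random-walk Metropolis chain for a generic unnormalized log-posterior.

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, n_iter=10000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an (unnormalized) log-posterior.

    With a symmetric proposal, q(cand | curr) = q(curr | cand), so the
    acceptance probability reduces to the ratio of posterior densities."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    draws = np.empty((n_iter, theta.size))
    accepted = 0
    for g in range(n_iter):
        cand = theta + step * rng.standard_normal(theta.size)   # symmetric proposal
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:                 # accept with probability alpha
            theta, lp = cand, lp_cand
            accepted += 1
        draws[g] = theta
    return draws, accepted / n_iter                              # acceptance rate, used for tuning

# Toy target: a standard normal posterior; the reported acceptance rate guides step-size tuning.
draws, acc_rate = random_walk_metropolis(lambda t: -0.5 * np.sum(t ** 2), theta0=[0.0])
```

The `step` parameter plays the role of the tuning parameter discussed above: larger steps lower the acceptance rate but reduce autocorrelation, and vice versa.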
2.5 Summarizing and postprocessing simulated posteriors

Once draws of the posterior distribution have been simulated using Monte Carlo sampling, point and interval estimates can be easily constructed. Whereas the Bayes point estimator in equation (2) depends on the analytical posterior, when using MCMC methods it is possible to use an empirical mean. Thus, the point estimate is simply calculated as the mean of the posterior sample, $\hat\theta = \frac{1}{G}\sum_{g=1}^{G}\theta^{(g)}$, where $G$ is the total number of iterations used for simulating the posterior. The 2.5% and 97.5% posterior quantiles can be used for deriving a 95% credible interval (although highest probability density regions should be preferred). A very attractive feature of MCMC methods is the possibility of postprocessing the sample to make inference on a function of the original parameters (Edwards and Allenby, 2003). For instance, the point estimate of a function $h(\theta)$ is given by $\widehat{h(\theta)} = \frac{1}{G}\sum_{g=1}^{G}h(\theta^{(g)})$.
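A minimal sketch of these summaries (ours, not from the original text), assuming `draws` is a G x p array of posterior samples such as the output of the sampler sketched above:

```python
import numpy as np

def summarize(draws, h=None):
    """Posterior mean, posterior std, and a 95% equal-tailed credible interval.

    If h is supplied, each draw is first postprocessed through h (e.g. a ratio of
    parameters such as the value of time) and the same summaries are reported."""
    x = np.asarray(draws, dtype=float)
    if h is not None:
        x = np.apply_along_axis(h, 1, x)       # h(theta^(g)) for every draw g
    x = x.reshape(x.shape[0], -1)
    return {
        "mean": x.mean(axis=0),                        # Bayes point estimate
        "std": x.std(axis=0, ddof=1),                  # posterior standard deviation
        "ci95": np.percentile(x, [2.5, 97.5], axis=0), # 95% credible interval
    }

# Example: posterior of a ratio of the first two parameters (e.g. beta_time / beta_cost).
# ratio_summary = summarize(draws, h=lambda th: th[0] / th[1])
```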
3 Bayesian statistical models of road safety

In the road safety literature, Bayesian methods have become very popular, with hundreds of peer-reviewed articles published on this topic. These methods are commonly used for two tasks: i) ranking and selection of hotspots for engineering safety improvements, and ii) evaluation of countermeasures in before-after studies. In terms of statistical modeling, let $y_i$ be the observed number of accidents at site $i$, which is assumed to be generated according to a parametric probability density function $f(\cdot \mid \theta_i)$ (e.g., a Poisson distribution with unknown mean $\theta_i$). For Bayesian inference, a prior distribution $p(\theta_i \mid \eta)$ is first assumed, where $\eta$ is a vector of prior parameters. This prior information is then combined with the information brought by the sample into the posterior $p(\theta_i \mid y_i)$. Within the class of Bayesian methods, we can distinguish two main approaches: empirical Bayes (EB) and full hierarchical Bayes (FHB). One important difference between these two approaches is the way the prior parameters are determined. In the EB approach, the parameters $\eta$ are estimated using a maximum likelihood estimator (MLE) or another frequentist technique (e.g., the method of moments). Thus, in EB the estimates $\hat\eta$ depend only on the (accident) data. Alternatively, FHB assumes an additional layer of randomness on $\eta$, where hyperparameters are set by the modeler. In the following subsections both approaches are illustrated in the context of road safety analysis.
3.1 MLE and Empirical Bayes

The EB approach is commonly implemented through the use of Negative Binomial models (Abbess et al., 1981; Hauer and Persaud, 1987; and Higle and Witkowski, 1988 use these models for hotspot identification). Assuming that the number of accidents $Y_i$ at site $i$ over a given time period $T_i$ is Poisson distributed, the standard Negative Binomial (or Poisson-Gamma) model for $Y_i$ can be represented as (Winkelmann, 2003)
i. $Y_i \mid T_i, \theta_i \sim \text{Poisson}(T_i\,\theta_i) \sim \text{Poisson}(T_i\,\mu_i\,e^{\varepsilon_i})$   (3)

ii. $e^{\varepsilon_i} \mid \phi \sim \text{Gamma}(\phi,\ \phi)$ or $\theta_i \sim \text{Gamma}(\phi,\ \phi/\mu_i)$,   (4)

where $e^{\varepsilon_i}$ is a multiplicative random effect, Gamma distributed with $E(e^{\varepsilon_i}) = 1$ and $V(e^{\varepsilon_i}) = 1/\phi$; $\mu_i$ is the mean response for site $i$, and $\phi$ is the inverse dispersion parameter of the Poisson-Gamma distribution. The mean $\mu_i$ is commonly specified as a function of site-specific attributes or covariates, $\mu_i = f(F_{i1}, F_{i2}, x_i; \beta)$, where $F_{i1}$ and $F_{i2}$ are traffic flows representing a measure of traffic exposure.4 $x_i$ is a vector of covariates representing site-specific attributes and $\beta$ is a vector of regression parameters to be estimated from the data. To capture the nonlinear relationship between the mean number of accidents and traffic flows, alternative functional forms can be specified for $\mu_i$. For instance, a common functional form for intersections is
$\mu_i = \beta_0\,(F_{i1} + F_{i2})^{\beta_1}(F_{i1} F_{i2})^{\beta_2}$ (see Miaou and Lord, 2003). After the integration of the random error $e^{\varepsilon_i}$, the marginal distribution of $Y_i$ is the density of the Negative Binomial distribution. The log-likelihood of the model can then be maximized numerically.
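As a small numerical check of the Poisson-Gamma structure in equations (3)-(4) (our own sketch, with illustrative parameter values that are not from the paper), the following code simulates the multiplicative Gamma effect and verifies the Negative Binomial mean-variance relationship $V(Y) = T\mu + (T\mu)^2/\phi$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, mu, phi = 200_000, 1.0, 2.5, 1.8        # illustrative values only

# e^(eps_i) ~ Gamma(phi, phi): shape = phi, scale = 1/phi, so E = 1 and Var = 1/phi
e = rng.gamma(shape=phi, scale=1.0 / phi, size=n)
theta = mu * e                                 # site-specific Poisson mean
y = rng.poisson(T * theta)                     # Y_i | theta_i ~ Poisson(T * theta_i)

print(y.mean(), T * mu)                        # empirical vs. theoretical mean
print(y.var(), T * mu + (T * mu) ** 2 / phi)   # Negative Binomial variance
```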
4 Because of the lack of disaggregate data, traffic exposure is usually defined according to the average annual daily traffic (AADT) flows. Little work has focused on the relationships between crashes and other traffic flow characteristics such as vehicle density, level of service, vehicle occupancy, speed distribution, etc. (Lord et al., 2005b). A few studies have considered hourly exposure functions accounting for traffic composition and other temporal variations (we refer to the work of Qin et al., 2004).

3.2 Full Hierarchical Bayes

FHB is implemented through the use of hierarchical Poisson models. Considering again that $Y_i$ is Poisson distributed, a hierarchical Poisson-Gamma model can be defined as follows (Rao, 2003):
i. $Y_i \mid T_i, \theta_i \sim \text{Poisson}(T_i\,\theta_i) \sim \text{Poisson}(T_i\,\mu_i\,e^{\varepsilon_i})$   (5)

ii. $e^{\varepsilon_i} \mid \phi \sim \text{Gamma}(\phi,\ \phi)$   (6)

iii. $\phi \sim \text{Gamma}(a, b)$   (7)
Unlike EB, in the FHB approach the inverse dispersion parameter $\phi$ is assumed to be random, following a Gamma distribution with parameters $a$ and $b$. Furthermore, the regression coefficients $\beta$ in $\mu_i = f(F_{i1}, F_{i2}, x_i; \beta)$ are also assumed to be random. This third level of randomness assumed on the model parameters $(\phi, \beta)$ is the main difference between this hierarchical model and the Negative Binomial model discussed in the previous subsection. To obtain parameter estimates, posterior inference can be done by MCMC simulation methods such as Gibbs sampling and the Metropolis-Hastings algorithm. In practice, either an informative or a non-informative prior can be assumed on $\phi$. For instance, an informative prior can be assumed by fixing the hyperparameters $a$ and $b$ according to previous data or expert knowledge (e.g., see Miranda-Moreno et al., 2007). Non-informative prior distributions on the regression parameters $\beta$ are commonly assumed; however, semi-informative or informative priors can easily be integrated, as discussed later in our simulation study.

Note that one of the advantages of the FHB approach is the flexibility it offers to formulate and compute alternative models. Instead of assuming a Gamma prior for $e^{\varepsilon_i}$, the error $\varepsilon_i = \ln(e^{\varepsilon_i})$ can be assumed to be Normal distributed, $\varepsilon_i \mid \sigma^2 \sim N(0, \sigma^2)$, with a proper hyper-prior for the parameter $\sigma^2$. Note that this model with an additive Normal error is not directly comparable with the hierarchical Poisson-Gamma model in equations (5)-(7). As an alternative specification, the hierarchical Poisson-Lognormal (HPL) model can be employed with a log-normal prior on the multiplicative random error, $e^{\varepsilon_i} \sim \text{LogNormal}(-0.5\tau_\varepsilon^2,\ \tau_\varepsilon^2)$, where $\tau_\varepsilon^2$ is the shape parameter. By specifying $\tau_\varepsilon^2 = \log(1 + 1/\phi)$, one can show that $E(e^{\varepsilon_i}) = 1$ and $\text{Var}(e^{\varepsilon_i}) = 1/\phi$, which makes the model comparable to the hierarchical Poisson-Gamma model defined above.
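To make the hierarchy in equations (5)-(7) concrete, the following sketch (ours, not the authors' code) writes the hierarchical Poisson-Gamma model with a safety performance function in PyMC; the variable names, the toy data, the prior settings, and the choice of PyMC itself are assumptions, and any general-purpose MCMC software (e.g. WinBUGS, JAGS, Stan) could be used instead.

```python
import numpy as np
import pymc as pm

# Toy data: exposure T_i, entering traffic flows F1_i and F2_i, and observed crash counts y_i.
rng = np.random.default_rng(0)
T = np.ones(50)
F1 = rng.lognormal(8, 0.5, 50)
F2 = rng.lognormal(6, 0.5, 50)
y = rng.poisson(2, 50)

with pm.Model() as hpg:
    # iii. hyper-prior on the inverse dispersion parameter: phi ~ Gamma(a, b)
    phi = pm.Gamma("phi", alpha=2.5, beta=1.3)       # an informative choice, cf. Section 3.4
    # Regression parameters of the safety performance function (diffuse priors here)
    b0 = pm.HalfNormal("beta0", sigma=10.0)
    b1 = pm.Normal("beta1", mu=0.0, sigma=10.0)
    b2 = pm.Normal("beta2", mu=0.0, sigma=10.0)
    mu = b0 * (F1 + F2) ** b1 * (F1 * F2) ** b2
    # ii. multiplicative Gamma effect: theta_i ~ Gamma(phi, phi/mu_i), so E(theta_i) = mu_i
    theta = pm.Gamma("theta", alpha=phi, beta=phi / mu)
    # i. Y_i | T_i, theta_i ~ Poisson(T_i * theta_i)
    pm.Poisson("y", mu=T * theta, observed=y)
    trace = pm.sample(2000, tune=1000)               # MCMC simulation of the joint posterior
```

Swapping the Gamma random effect for a lognormal one (the HPL alternative discussed above) only requires replacing the `theta` line with a lognormal specification, which illustrates the modeling flexibility of the FHB approach.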
Hierarchical Poisson models can consider spatiotemporal patterns of accident counts. This extension is useful for working with complex datasets in which longitudinal and/or spatial information is available (Miaou and Song, 2005; Song et al., 2006). Space-time hierarchical Poisson models can be defined by incorporating into the mean number of accidents a spatial and a temporal effect, e.g. $\theta_{it} = \mu_i\, e^{\varepsilon_{it}}\, e^{\phi_t}\, e^{\rho_i}$, where $\theta_{it}$ is the mean number of accidents at site $i$ and period $t$ (for each site there are $T$ observation periods, $t = 1, \ldots, T$). In this model, $\phi_t$ denotes a time effect representing a possible time trend due to socioeconomic changes, traffic operation modifications, weather variations, etc.
$\rho_i$ is a spatial random effect accounting for spatial correlation among sites.5 As noted by Miaou and Song (2005), random variations across sites may be structured spatially in some way due to the complexity of the traffic interactions around locations or because driver behaviors are influenced by a multitude of factors (e.g., roadside development near locations and environmental conditions). Hierarchical Poisson models have also been extended to a multivariate setting, which has been used to analyze several accident types simultaneously (e.g., accident counts divided into different severity levels, such as fatal accidents, accidents with injuries, and others). Under a multivariate setting, one is able to take into account correlation structures between the response variables. For instance, one may suspect that a given site with a high number of fatal accidents would also present a high number of injury accidents, since they may share the same set of risk factors (e.g., excessive curvature or a poor road surface condition). Multivariate Poisson hierarchical models with covariates and time-space random effects have been studied extensively by Song et al. (2006) in the road safety context. Without accounting for spatial correlation, many studies have analyzed accident datasets with different severity outcomes using a multivariate Poisson setting; see, e.g., Tunaru (2002), Brijs et al. (2007), Aguero-Valverde and Jovanis (2009), and El-Basyouny and Sayed (2012). Latent class models, multilevel models, and finite mixture hierarchical models have also been applied to crash data (Miranda-Moreno, 2006; Huang and Abdel-Aty, 2010; Zou et al., 2012). Under a full Bayes setting, some recent studies have applied multilevel models and the Poisson-Weibull model (Cheng et al., 2012).

3.3 Bayesian Methods for Hotspot Identification and Before-After Studies

Various ranking criteria or safety measures can be used for hotspot identification (Higle and Witkowski, 1988; Schluter et al., 1997; Persaud et al., 1999; Heydecker and Wu, 2001; Tunaru,
5 Spatial dependency of $\theta_{it}$ with respect to other sites, which suggests that sites that are closer to each other are more likely to have common features affecting their accident occurrence.
2002; Miaou and Song, 2005; Miranda-Moreno, 2005; Song et al., 2006; Brijs et al., 2007; Lan and Persaud, 2011; El-Basyouny and Sayed, 2012). In this context, the use of Bayesian statistics is highly attractive due to the flexibility of working with predictive posteriors. The most popular safety measures include the posterior expected number of accidents, the posterior probability of excess, the posterior expectation of ranks, and the posterior probability that a site is the worst. These safety measures have been used in different previous works (e.g., Schluter et al., 1997; Heydecker and Wu, 2001; Tunaru, 2002).

The posterior expectation $\hat\theta_i = E(\theta_i \mid y_i)$ is perhaps the most popular parameter of interest in the safety literature. The $\hat\theta_i$ ranking criterion is a point estimate of the underlying mean number of accidents over the long term.6 In order to select a list of sites for safety inspections, we can simply sort the $n$ sites under analysis based on their posterior mean number of accidents and then select the top $r$ locations ($r < n$). Another popular measure is the posterior probability that a site is the worst, $p_i^s = P(\theta_i > s\,\theta_j,\ \forall j \neq i)$, $s \in [0, +\infty)$. Then, for example, if $s = 1$, $p_i^s$ is simply the posterior probability that $\theta_i$ is the largest, and $\sum_{i=1}^{n} p_i^s = 1$. This criterion, which was first suggested by Schluter et al. (1997) and applied later by Tunaru (2002), can be interpreted as the average probability that the mean accident frequency at a given site is greater than that at all the other sites. Unfortunately, this criterion is computationally challenging when working with a large set of sites ($n > 2000$); in that situation, the values of $p_i^s$ will also be very small.
Instead of making inference based on $p(\theta_i \mid y_i)$, we can use the posterior distribution of ranks, denoted by $p(R_i \mid y_i)$, where $R_i$ is the true rank of $\theta_i$ and is defined as $R_i = \sum_{j=1}^{n} I(\theta_i \geq \theta_j)$, where
6 For instance, if $\hat\theta_i$ is equal to two accidents per year, it means that in five years ten accidents are expected.
$I(\cdot)$ is an indicator function (Rao, 2003). The greatest ranks correspond to the most hazardous sites (note that the smallest $\theta_i$ has rank 1). The posterior expectation of ranks has been widely recommended as a ranking criterion and is defined as $E(R_i \mid y_i) = \sum_{j=1}^{n} p(\theta_i \geq \theta_j \mid y_i)$. Using
hierarchical Bayes models, this ranking criterion has been applied by Tunaru (2002) and Miaou and Song (2005). This criterion can be computed very easily in a full Bayes setting when using hierarchical Poisson models. Some studies have indicated that ranking by $E(R_i \mid y_i)$ can be more accurate than ranking by $E(\theta_i \mid y_i)$ (Shen and Louis, 2000). It has been shown that the use of the posterior distribution of ranks is a convenient criterion in similar problems in epidemiology (Rao, 2003). However, for large samples the evaluation of $E(R_i \mid y_i)$ can be computationally harder than other absolute ranking criteria such as $E(\theta_i \mid y_i)$.

Traditionally, EB has been used in most of the before-after studies reported in the literature. However, with the improvement in computational capacity, the full Bayes approach (which involves high-dimensional and complex integrations) has been gaining popularity in accident data analysis during the last decade. Recently, a few researchers have adopted a fully Bayesian framework as an alternative to EB in observational before-after studies (Pawlovich et al., 2006; Aul and Davis, 2006; Li et al., 2008; Lan et al., 2009; Persaud et al., 2010; El-Basyouny and Sayed, 2010; Park et al., 2010; Yanmaz-Tuzel and Ozbay, 2010; Schultz et al., 2011; El-Basyouny and Sayed, 2012). Finally, the modeler has the possibility of introducing expert criteria or past evidence into the analysis through the prior distribution of the model parameters. The latter is particularly relevant when dealing with data characterized by a small number of observations and/or low mean values (Lord and Miranda-Moreno, 2008). The main steps to apply an FHB before-after study are:
• Obtain the posterior expected accident frequencies for treated and comparison sites in the before and after periods, $\theta_{TBi}$, $\theta_{TAi}$, $\theta_{CBi}$, and $\theta_{CAi}$. Treatment and comparison sites are denoted by T and C, respectively; the before and after periods are denoted by B and A, respectively;

• Compute, for the comparison group, the ratio between the expected accident frequencies in the after period $\theta_{CAi}$ and those in the before period $\theta_{CBi}$, $R_i = \theta_{CAi}/\theta_{CBi}$;

• Calculate $\pi_i$, an estimate of the safety of a treated site had the treatment not been implemented. This is estimated as the posterior expected accident frequency of the treated site in the before period ($\theta_{TBi}$) multiplied by the ratio obtained in the previous step, i.e. $\pi_i = \theta_{TBi}\,[\theta_{CAi}/\theta_{CBi}]$;

• Calculate the treatment effectiveness, defined as the ratio between $\theta_{TAi}$ and $\pi_i$ (a sketch of this computation from posterior draws is given below).
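A minimal sketch of these steps applied to MCMC output (ours, not from the original text; the arrays of posterior draws are assumed to come from an FHB model such as the one in Section 3.2):

```python
import numpy as np

def before_after_effectiveness(theta_TB, theta_TA, theta_CB, theta_CA):
    """Posterior distribution of the treatment-effectiveness index for one treated site.

    Each argument is a 1-D array of G posterior draws of the corresponding expected
    accident frequency (treated/comparison sites, before/after periods)."""
    R = theta_CA / theta_CB            # comparison-group before-to-after ratio
    pi = theta_TB * R                  # expected after-period frequency without treatment
    index = theta_TA / pi              # values below 1 indicate a reduction in accidents
    lo, hi = np.percentile(index, [2.5, 97.5])
    return index.mean(), (lo, hi)      # point estimate and 95% credible interval

# Example with fake draws, for illustration only:
rng = np.random.default_rng(0)
g = 5000
draws = [rng.gamma(5, 0.4, g), rng.gamma(4, 0.4, g), rng.gamma(6, 0.4, g), rng.gamma(6, 0.35, g)]
print(before_after_effectiveness(*draws))
```

Because the whole index is built draw by draw, the credible interval fully propagates the parameter uncertainty, which is one of the advantages of the FHB approach noted above.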
3.4 A Monte Carlo study: FHB vs. MLE

As discussed above, many alternative Bayesian models and ranking methods have been proposed for road safety analysis. To show the flexibility and advantages of the FHB approach with respect to the traditional EB approach based on the standard NB model, an extensive simulation study was executed. The impact of different model assumptions and methods was evaluated, including: i) the model error assumption on $\upsilon_i$ (Gamma vs. Lognormal), ii) the type of prior (informative versus non-informative) on $\phi$ and/or the regression parameters, and iii) the ranking criterion for hotspot identification, $p(\theta_i \mid y)$ vs. $p(R_i \mid y)$. To compare alternative priors on $\upsilon_i$, the hierarchical Poisson/Gamma and Poisson/Lognormal models described in Section 3.2 are used. To illustrate the importance of prior information on model parameters, semi-informative priors for $\phi$ are built based on past studies. One often knows (as prior information) that a parameter of interest is expected to be positive or negative, or one has an idea of its range of variation. In some circumstances, model parameter estimates from many studies have been published and summarized. For instance, Elvik (2009) provides a summary of the model parameters for SPFs at intersections for pedestrian-vehicle crashes. Also, Miranda-Moreno et al. (2013) used the inverse dispersion parameter estimates from previous studies to build informative priors. Hence, in an FHB framework, we can easily take advantage of past evidence, in particular when the observed data are limited.

3.4.1 Simulation framework

The quasi-simulation was built upon 5,595 three-legged intersections in California. The true parameters were fixed using the estimates for the whole sample. As data generating process, the
safety performance function used was $\mu_i = \beta_0\,(F_{i1} + F_{i2})^{\beta_1}(F_{i1} F_{i2})^{\beta_2}$, where $F_{i1}$ and $F_{i2}$ are the average major and minor entering daily traffic volumes, respectively, and $\beta_0$, $\beta_1$, and $\beta_2$ are parameters to be estimated from the data. To increase the reliability of inferences, the results were obtained from 100 simulated crash datasets.
The following steps were considered for the data simulation framework:

1. Calculate $\mu_i$ using the safety performance function and the predefined true parameters;
2. Generate the multiplicative random effect $\upsilon_i$ (Gamma vs. Lognormal) based on the true dispersion parameter;
3. Compute the expected accident frequency $\theta_i$ by multiplying the values from the previous steps, and generate accident frequencies (counts) using the Poisson distribution with mean $\theta_i$;
4. Obtain posterior estimates of interest using an MCMC approach (here, we ran 20,000 iterations of which 5,000 were discarded as burn-in), obtain statistics based on the 100 simulated datasets, and compare outcomes among the different methods (FHB vs. EB);
5. Evaluate the effect of using alternative model settings, prior assumptions, and ranking criteria on parameter accuracy and hotspot identification.

It is important to mention that we specified two types of priors for $\phi$ and $\beta_k$: a) prior1, which includes non-informative priors for all model parameters; and b) prior2, which includes an informative prior for $\phi$ and semi-informative priors for the regression parameters. An informative Gamma was assumed for $\phi$, with $\phi \sim \text{Gamma}(2.5, 1.3)$, as proposed by Miranda-Moreno et al. (2013) for the same training dataset. A semi-informative prior (a positivity-constrained distribution) is used for each regression parameter, given that as prior information we know that the traffic flow parameters are expected to be positive. Hence, we constrain the regression parameter priors to take only positive values.

3.4.2 Results

Table 1 presents the simulation outcomes used to evaluate the impact of the estimation approach (FHB vs. MLE) on the accuracy of the inverse dispersion parameter estimates. The results indicate that when dealing with limited data, the FHB approach with an informative hyper-prior provides much more accurate estimates than EB based on MLE and the Negative Binomial (NB) model. In accordance with previous studies, when the sample size is small, the latter approach can lead to erroneous estimates (Lord and Miranda-Moreno, 2008; Miranda-Moreno et al., 2013; Heydari et al., 2013). Moreover, the MLE falls short in estimating the uncertainty around the mean values in limited datasets: the estimated confidence intervals are biased. It can also be inferred that the credible intervals obtained from the informative prior choice are smaller (hence, these are more precise estimates) than those estimated from the non-informative hyper-prior specification. Therefore, the
FHB approach, especially when informative priors on the model parameters are employed, offers promise for analyzing data characterized by a small number of observations, which is common in before-after observational studies. Also, non-informative priors on the model parameters might have serious consequences for the outcomes when working with limited data.

In the same Table 1, the effect of alternative priors on $\upsilon_i$ (Gamma vs. Lognormal) can be observed. The results indicate that the choice of prior has a marginal effect on parameter accuracy. The outcomes obtained with both priors are very similar, with slightly more accurate estimates based on the HPL model. Although in most cases these two priors are expected to generate very similar results, this conclusion cannot be generalized, and a sensitivity analysis is always recommended. It is also important to mention that the HPL model can easily be extended to the multivariate case, with and without spatial effects, or to a latent-class model setting.

Regarding the impact of FHB vs. EB on ranking accuracy, Table 2 presents the outcomes obtained from the FHB and EB approaches using the posterior probability of excess as ranking criterion. From these results one can see that the accuracy, measured with Spearman's correlation coefficient ($\rho$), is not very sensitive to the approach. The FHB approach generates only slightly better results than those obtained with the EB approach (note that the closer $\rho$ is to 1.0, the better). For instance, the $\rho$ values obtained with the EB approach and 100 datasets average 0.65 for the sample scenario with 30 sites; for the same sample size, the average $\rho$ value is 0.69 for the FHB approach with the HPG model. Moreover, from the results presented in the second part of Table 2, one can observe that the outcomes obtained with different ranking criteria (but with the same estimation method and model) are very similar. The results based on the posterior of ranks are basically the same as those obtained with the posterior of $\theta_i$. For instance, the average $\rho$ value is 0.70 for the posterior mean of ranks vs. 0.68 for the posterior mean of accident frequency using the EB approach. This means that using more complex ranking criteria does not necessarily guarantee more accurate results. It also highlights the importance of carrying out a sensitivity analysis as part of the study when using the FHB approach, which allows different crash risk estimates to be computed with little additional effort. This simulation study also illustrates the flexibility of the FHB approach: under FHB, different modeling settings, informative priors, and ranking methods can be applied to the same dataset. Also, a sensitivity analysis can easily be executed to investigate the effect of key
model parameters and ranking methods that can potentially affect the outcome of the analysis. In addition, FHB provides a robust and versatile methodology allowing for a consistent incorporation of past evidence and parameter uncertainty.

4 Adopting the Bayesian approach in travel behavior modeling

4.1 Subjective probabilities and random utility models

In essence, the concept of subjective probabilities motivates construing the parameters of a model as random variables in a Bayesian setting. Instead of measuring frequency, subjective probabilities measure beliefs about the occurrence of a particular event and are related to the notion of probability laws under uncertainty (see Savage, 1954). The application of Bayesian microeconometrics to choice modeling is natural precisely because of the concept of subjective probabilities, which is akin to the concept of random utility in discrete choice modeling. Effectively, discrete choice models are derived using stochastic behavioral assumptions, in which utility is considered random to represent the researcher's uncertainty about the individuals' decision-making process. Being unable to account for all the factors that influence a consumer's decisions, the researcher can only ascertain choice probabilities. These choice probabilities, in turn, do not directly measure frequencies of choice, but rather the researcher's judgment or beliefs about the likelihood of each alternative being chosen.

Despite both the conceptual affinity between subjective probabilities and random utility models and the potential benefits that such models offer, transportation choice modelers have in general been reluctant to adopt Bayesian techniques (cf. road safety analysis). This fact was noticed in Brownstone (2001), and little has changed since, especially when compared to applications of Bayesian choice modeling in other fields such as marketing. For instance, Damien and Kockelman (2010) describe the impact of the Bayesian approach as being at an "early stage" for many fields in transportation analysis. Whereas finding conjugate priors is limited to multinomial logit models (Koop and Poirier, 1993), MCMC estimation can be implemented for any choice model (see Chib et al., 1998). On the one hand, a Gibbs sampler was first developed for binary and ordered choice (Albert and Chib, 1993) and later expanded to multinomial probit models (McCulloch et al., 2000; Imai and van Dyk, 2005). On the other hand, multinomial logit models can be estimated using Metropolis-Hastings
(Frühwirth-Schnatter and Frühwirth, 2010). Metropolis-Hastings has also been used for the nested logit (Lahiri and Gao, 2002; cf. Poirier, 1996) and the continuous cross-nested logit models (Lemp et al., 2010). These estimators have been used in biostatistics and marketing, on some occasions with aggregate data. Bayes estimators based on Metropolis-Hastings within a Gibbs sampler have been used for the hierarchical representation of mixed logit (Train, 2009) and for the system of equations of a multinomial logit kernel with endogenous latent attributes (Daziano and Bolduc, 2011). Fang (2008) and Brownstone and Fang (2009) derived Gibbs samplers for the system of equations of discrete-continuous demand models.7 In economics, Bayes estimators have also been derived for dynamic choice models (Imai et al., 2009), another problem where transportation researchers lag behind relevant developments that are now common in applied econometrics (Aguirregabiria and Mira, 2010). Even with a straightforward Bayes estimator for probit (see empirical applications in Bolduc et al., 1997, and Kim et al., 2003), the mixed logit Bayes estimator has dominated the few applications that do use the Bayesian approach for transportation analysis (e.g. Hensher and Greene, 2003; Sillano and Ortúzar, 2005; Scarpa et al., 2008). In particular, because the traffic assignment problem involves a large number of potentially correlated alternatives, the Bayesian approach is especially promising for route choice (Washington et al., 2010). Nevertheless, these few studies do not fully take advantage of the Bayesian approach. For instance, these particular studies have not exploited the ideas of credible intervals, Bayesian model selection, or Bayesian averaging of competing models (see the discussion in Brownstone, 2001). Empirical work in transportation has also failed to exploit Bayes estimates of nonlinear functions of the model parameters (cf. Sonnier et al., 2007) and the use of predictive posteriors.8 On the other hand, transportation researchers are now adopting Bayesian tools for certain specific modeling components that are not related to the actual estimation of the parameters, such as efficient design of stated preference experiments (Sándor and Wedel, 2005), missing-data imputation (Washington et al., 2012), and choice-set formation modeling. Latent class models (Greene and Hensher, 2003), which have attracted significant research interest for addressing the
7 Brownstone and Fang (2009) address the problem of endogeneity, which was absent in the estimator derived by Fang (2008).
8 Two exceptions are the work of Brownstone and Fang (2009), where the authors perform an out-of-sample check of vehicle choice and utilization forecasts using predictive posteriors, and the credible sets of willingness-to-pay derived in Daziano and Achtnicht (2012).
problem of discrete random heterogeneity in tastes, are another example of a pseudo-Bayesian framework. Even though the formulation of latent class models is Bayesian, researchers have been using frequentist estimators (cf. Miranda-Moreno et al., 2005).

4.2 A Gibbs sampler for the multinomial probit model

Both the logic and the intuition of the Bayes estimator of the parameters of a discrete choice model are best described by the probit Gibbs sampler derived in McCulloch et al. (2000). Consider the following estimable form of the multinomial probit model:
$$\underbrace{C'\Delta_j U_i}_{(J-1\times 1)} = \underbrace{C'\Delta_j X_i}_{(J-1\times K)}\underbrace{\beta}_{(K\times 1)} + \underbrace{C'\Delta_j \varepsilon_i}_{(J-1\times 1)}, \qquad C'\Delta_j \varepsilon_i \sim N\big(0_{(J-1)},\ I_{(J-1)}\big) \qquad (8)$$

$$y_i = \begin{cases} j & \Leftrightarrow\ \Delta_j U_{ij'} < 0,\ \forall j' \neq j \\ j' & \Leftrightarrow\ \Delta_j U_{ij'} > \max\{0,\ \Delta_j U_{i,-j}\},\ \forall j' \neq j \end{cases} \qquad (9)$$

where $U_i$ is the vector that contains the random utility of each alternative available to individual $i$, $X_i$ is an attribute matrix, $\beta$ is the vector of marginal utilities, $\varepsilon_i$ is a multivariate normally distributed random shock, $\Delta_j$ is a matrix difference operator that normalizes the model with respect to alternative $j$, $U_{i,-j}$ represents the set of all elements of $U_i$ with the exception of $U_{ij}$, $C$ is the Cholesky root of $(\Delta_j \Sigma \Delta_j')^{-1}$, $I_{(J-1)}$ is the identity matrix of size $J-1$, and $y_i$ is a choice indicator.
The parameter space of this model is given by both the marginal utilities $\beta$ and the covariance matrix of the utility in differences, $\Delta_j \Sigma \Delta_j'$, via the elements of its Cholesky root. Because the dependent variable of equation (8) is latent, it is possible to treat $\Delta_j U_i$ as an additional parameter. This step is called data augmentation, and in the steps below we show how the estimator is simplified by the use of the augmented parameter space. As discussed in subsection 2.3, Gibbs sampling requires building a partition of the parameter space. For the multinomial probit model the partition is natural: $\Delta_j U_i$, $\beta$, and $\Delta_j \Sigma \Delta_j'$. Given this partition, the steps of the probit Gibbs sampler are the following (a simplified code sketch follows the list):
• Start at any given point $\Delta_j U_i^{(0)}$, $\beta^{(0)}$, and $(\Delta_j \Sigma \Delta_j')^{(0)}$ in the parameter space.

• For $g \in \{1, \ldots, G\}$:

1. If $y_i = j$, draw $\Delta_j U_i^{(g)}$ from the truncated normal distribution $N\big(\Delta_j X_i \beta^{(g-1)},\ (\Delta_j \Sigma \Delta_j')^{(g-1)}\big)\,\mathbf{1}\big(\Delta_j U_{ij'} < 0,\ \forall j' \neq j\big)$; otherwise draw $\Delta_j U_i^{(g)}$ from the truncated normal distribution $N\big(\Delta_j X_i \beta^{(g-1)},\ (\Delta_j \Sigma \Delta_j')^{(g-1)}\big)\,\mathbf{1}\big(\Delta_j U_{ij'} > \max\{0,\ \Delta_j U_{i,-j}\},\ \forall j' \neq j\big)$.

2. Draw $\beta^{(g)}$ from the normal distribution $N\Big(\big(V_\beta^{-1} + (C^{(g-1)'}X)'(C^{(g-1)'}X)\big)^{-1}\big(V_\beta^{-1}\bar{\beta} + (C^{(g-1)'}X)'\,C^{(g-1)'}\Delta_j U^{(g)}\big),\ \big(V_\beta^{-1} + (C^{(g-1)'}X)'(C^{(g-1)'}X)\big)^{-1}\Big)$, where $\bar{\beta}$ and $V_\beta$ are the parameters of the prior distribution $p(\beta) \sim N(\bar{\beta}, V_\beta)$.

3. Draw $(\Delta_j \Sigma \Delta_j')^{(g)}$ from the inverted-Wishart distribution $IW\Big(\nu + N,\ \bar{S} + \sum_{i=1}^{N}\big(\Delta_j U_i^{(g)} - \Delta_j X_i \beta^{(g)}\big)\big(\Delta_j U_i^{(g)} - \Delta_j X_i \beta^{(g)}\big)'\Big)$, where $\nu$ and $\bar{S}$ are the parameters of the inverted-Wishart prior $p(\Delta_j \Sigma \Delta_j') \sim IW(\nu, \bar{S})$.

4. Update the Cholesky root $C^{(g)}$ and normalize $c_{11} = 1$ for identification. Following McCulloch et al. (2000), the scale normalization can be set by postprocessing the chain in a procedure that involves dividing all the parameters of interest $\beta^{(g)}$ and the nuisance parameters in $C^{(g)}$ by $c_{11}^{(g)}$ (cf. Nobile, 2000, and Imai and van Dyk, 2005).

5. To make inference on parameter ratios, just calculate the desired ratio using the current samples to generate an instance from the posterior distribution of the measure of interest. For instance, suppose that in a travel mode choice model $\beta_{cost}$ represents the additive inverse of the marginal utility of income and $\beta_{time}$ represents the marginal disutility of travel time. In this case, a draw from the posterior distribution of the value of time is given by the simple ratio $VOT^{(g)} = \beta_{time}^{(g)}/\beta_{cost}^{(g)}$ (cf. the procedure discussed at the end of subsection 3.4, which also involves inference on parameter ratios).

• Because the long-run distribution of the Gibbs sampler is the true posterior of interest, each sample can be treated as a random draw from the joint posterior.

• Bayes point estimates are given by sample averages.
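The following sketch (ours, not the authors' code) implements the data-augmentation logic for the simpler binary probit case of Albert and Chib (1993): the first step draws the latent utilities from truncated normals and the second draws the marginal utilities from a Bayesian regression posterior; the full multinomial sampler adds the inverted-Wishart covariance and normalization steps described above. Function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

def binary_probit_gibbs(X, y, n_iter=5000, burn=1000, seed=0):
    """Albert-Chib Gibbs sampler for a binary probit model with prior beta ~ N(0, 100 I)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    V0inv = np.eye(k) / 100.0                    # prior precision
    Vpost = np.linalg.inv(V0inv + X.T @ X)       # posterior covariance (error variance fixed at 1)
    beta = np.zeros(k)
    draws = np.empty((n_iter, k))
    for g in range(n_iter):
        # Step 1 (data augmentation): latent utilities truncated at 0 according to y
        mu = X @ beta
        lower = np.where(y == 1, -mu, -np.inf)   # standardized truncation bounds
        upper = np.where(y == 1, np.inf, -mu)
        u = truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
        # Step 2: beta from its conditional normal posterior (an ordinary Bayesian regression)
        bpost = Vpost @ (X.T @ u)
        beta = rng.multivariate_normal(bpost, Vpost)
        draws[g] = beta
    return draws[burn:]
```

As in step 5 of the list above, postprocessing the returned draws (e.g. taking ratios of columns) directly yields posterior samples of derived quantities such as the value of time.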
Note that unlike the frequentist estimator of the multinomial probit (see Geweke et al., 1997), the probit Bayes estimator is analytically straightforward. Because samples of the utility function are drawn in the data augmentation step 1, the estimation of the marginal utilities is treated in step 2 as an ordinary regression problem. In addition, note that no maximization is required.

4.3 Case study: mode choice with a Bayesian probit

To illustrate the application of the probit Gibbs sampler, we generate a quasi-simulated experiment from revealed-preference data on interurban travel mode choice in Canada (KPMG Peat Marwick and Koppelman, 1990).9 The true underlying model is a multinomial probit model with parameters extracted from the MLE of the original data. 2,769 individuals choosing among car, train, and air options were considered. The utility function was assumed linear in parameters and in the following attributes: alternative-specific constants, household income, a large-city indicator, in-vehicle travel time, out-of-vehicle travel time, frequency, and cost. Once the simulated data were generated, a multinomial probit model was estimated using two subsamples: a large one of 2,500 individuals and a relatively small one of 250 individuals. Estimates were obtained for the Gibbs sampler with both a non-informative and an informative prior, as well as for the maximum simulated likelihood estimator (MSLE). In Table 3 we report the estimates of the marginal disutilities of cost and time, as well as the estimates of the derived value of time (VOT, in C$[1989]/hr). The upper and lower limits of the 95% credible and confidence intervals are calculated for the Bayes and frequentist estimators, respectively. Highest probability density (HPD) credible intervals are calculated as the shortest possible set containing 95% of the posterior mass, based on the MCMC posterior samples. This calculation is valid not only for the parameters of interest of the model, but also for functions of the parameters, such as the samples of the posterior of the value of time. As a result, the derivation of Bayes credible intervals is straightforward. In the case of the MSLE, Krinsky and Robb confidence intervals are constructed (Krinsky and Robb, 1986).
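A minimal sketch of the HPD computation used here (ours, not from the original text): sort the posterior draws and pick the shortest window containing 95% of them.

```python
import numpy as np

def hpd_interval(draws, level=0.95):
    """Shortest interval containing `level` of the posterior mass (for a unimodal posterior)."""
    x = np.sort(np.asarray(draws, dtype=float))
    m = int(np.ceil(level * len(x)))            # number of draws inside the interval
    widths = x[m - 1:] - x[:len(x) - m + 1]     # width of every candidate window of m draws
    j = int(np.argmin(widths))                  # index of the shortest window
    return x[j], x[j + m - 1]

# Example: HPD interval for the value of time built by postprocessing the probit draws, e.g.
# vot_draws = beta_time_draws / beta_cost_draws
# lo, hi = hpd_interval(vot_draws)
```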
9 This widely studied dataset was collected in 1989 by VIA Rail to analyze the demand for a high-speed train in the Toronto-Montréal corridor (see Forinash and Koppelman, 1993; Bhat, 1995; Koppelman and Wen, 2000).
For the large sample, where it is known that both the Bayes and frequentist estimators share the same asymptotic properties, all point estimates are very close to the true values (including the VOT point estimates). The credible interval with non-informative priors is somewhat wider than its frequentist counterpart. When an informative prior on the marginal utilities is considered, the resulting VOT credible interval is very tight. Whereas no problems are detected for the large sample, several issues arise when working with the smaller sample. Bias in the point estimates is detected in almost all cases, with the exception of $\beta_{time}$ and VOT for the informative-prior case. At the same time, all credible intervals contain the true parameters, and the hypothesis that the parameters are equal to their true values cannot be rejected. However, the hypothesis of a null VOT cannot be rejected in the frequentist case (at the 95% confidence level). In fact, the same conclusion applies to $\beta_{cost}$, a result which is an indication of weak identification of the ratio representing VOT. Weak identification also explains the high VOT standard error. Note that a high standard deviation is also obtained for the VOT posterior distribution when a non-informative prior is considered. Although some VOT outliers are generated in this case, the credible interval is relatively tight (compared to the confidence interval). The credible interval contains the true parameter and does not contain zero. In sum, this exercise shows that the Bayesian approach avoids inference problems for parameter ratios when the ratios are weakly identified. Furthermore, the use of an informative prior can in some instances entirely prevent weak identification problems, which are especially troublesome in constructing intervals for welfare measures.

5 Summary and conclusions

Bayes estimators represent a decision-making process under uncertainty in which the optimal decision is based on the minimum expected cost of being wrong (or maximum utility of being right) given the researcher's beliefs about the state of the world. The decision is compatible with the scientific method for updating knowledge given evidence provided by data. There are several reasons to encourage the use of Bayesian statistics in transportation modeling. To begin with, Bayes estimators are both gradient and Hessian free. Since no maximization is involved, Bayes estimators are particularly well suited for more sophisticated models that may have non-convex likelihood functions and weakly identified parameters. Bayes works for small samples, as well as for data that are not samples. At the same time, the asymptotic properties coincide with those of the
maximum likelihood estimator. In addition, the Bayesian framework is particularly well suited for models that include latent variables, such as partially observed variables, missing data, utility, or attitudes and multidimensional quality attributes in hybrid choice models. When the analytical posterior is not known, Bayesian inference exploits Monte Carlo sampling. Computational Bayesian estimators exploit simulation-aided inference, but they avoid the potential bias found in maximizing a simulated likelihood. Computational Bayesian estimators also provide a straightforward answer to the interval estimation problem, one that is much easier to interpret than its frequentist counterpart. Credible sets represent the region of the posterior that contains the true parameter with a given probability (credible level). Given a significance level of α, a highest posterior density (HPD) credible interval (the smallest interval that contains (1-α)100% of the posterior mass) can be computed using a very simple algorithm. For instance, in this paper we calculated HPD intervals for the inverse dispersion parameter and for the value of time. Since the probability of the true parameter being inside the credible interval is 1-α, hypothesis testing is also straightforward in a Bayesian setting. In contrast, frequentist hypothesis testing is much harder to interpret and has some conspicuous problems, such as not obeying the likelihood principle.

In this paper we have reviewed the empirical application of Bayesian statistics in transportation modeling. In road safety analysis, where small samples are common, Bayes estimators are already the dominant analytical tool. Within the class of Bayesian methods applied in road safety analysis, two main approaches exist: the empirical Bayes (EB) method and the full hierarchical Bayes (FHB) approach. In EB, crash data are used first to estimate the model parameters and then used again to make posterior inference. Despite its popularity, EB has been criticized for its inability to properly handle parameter uncertainty and for its lack of justification for using the data twice. FHB, on the other hand, overcomes these limitations with a more flexible modeling framework, although it does require analysts to model the calibration process directly using specialized statistical software. Several studies have looked at the potential benefits of the FHB method when dealing with deficient data (a problem referred to as the low-mean and low-sample-size problem) or under the presence of temporal and spatial correlation or latent-class data. In a full Bayesian setting, more complex
models have been proposed, such as multivariate Poisson models with and without spatial effects, mixture (latent-class) models, Poisson-Weibull models, and others. In spite of this rich literature, few studies in the traffic safety literature have investigated the practical implications of using these two alternative approaches for the model parameter estimates, in particular when informative priors are used. It is known that when using non-informative priors, the results should converge to those obtained in the EB approach using ML estimators. When working with large samples this is not an issue; however, when working with small-sample or low-mean datasets, the problem can become relevant.

In the case of discrete choice models of travel behavior, Bayes estimators are still not part of the empirical researcher's toolkit. Although the Bayesian approach may appear less attractive inasmuch as the effect of prior distributions becomes less relevant given the large sample sizes of choice data, the use of predictive posteriors and credible intervals (instead of confidence intervals), as well as the treatment of weakly identified models, both favor the adoption of Bayesian tools. As a specific example, we have offered in this paper a simple simulation exercise to illustrate the construction of credible intervals for the value of time. In particular, we have shown that when problems of weak identification emerge, the Bayes estimates perform better than MLE, even more so when informative priors are taken into account.

References

Abbess, C., Jarrett, D.F., and Wright, C.C. (1981). Accidents at blackspots: estimating the effectiveness of remedial treatment, with special reference to the regression-to-mean effect. Traffic Engineering and Control, 22(10), 535-542.

Aguero-Valverde, J., and Jovanis, P.P. (2009). Bayesian multivariate Poisson log-normal models for crash severity modeling and site ranking. Transportation Research Record, 2136, 82-91.

Aguirregabiria, V., and Mira, P. (2010). Dynamic discrete choice structural models: a survey. Journal of Econometrics, 156(1), 38-67.

Albert, J., and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669-679.

Bhat, C.R. (1995). A heteroscedastic extreme-value model of intercity mode choice. Transportation Research Part B, 29(6), 471-483.

Bolduc, D., Fortin, B., and Gordon, S. (1997). Multinomial probit estimation of spatially interdependent choices: an empirical comparison of two new techniques. International Regional Science Review, 20, 77-101.

Brijs, T., Karlis, D., Van den Bossche, F., and Wets, G. (2007). A Bayesian model for ranking hazardous road sites. Journal of the Royal Statistical Society Series A, 170, 1-17.

Brownstone, D. (2001). Discrete choice modelling for transportation. In Hensher, D.A. (ed.), Travel Behaviour Research: The Leading Edge, Pergamon Press, Oxford, 97-124.
Brownstone, D., and Fang, H. (2009). A vehicle ownership and utilization choice model with endogenous residential density. Working paper, Department of Economics, University of California, Irvine.
Cheng, L., Geedipally, S.R., and Lord, D. (2012). Examining the Poisson-Weibull generalized linear model for analyzing crash data. Safety Science, 54, 38-42.
Chib, S., Greenberg, E., and Chen, Y. (1998). MCMC methods for fitting and comparing multinomial response models. Econometrics 9802001, EconWPA.
Damien, P., and Kockelman, K.M. (2010). Preface to special issue on Bayesian methods. Transportation Research Part B, 44(5), 631-632.
Daziano, R.A., and Achtnicht, M. (2012). Accounting for uncertainty in willingness to pay for environmental benefits. Working paper, School of Civil and Environmental Engineering, Cornell University.
Daziano, R.A., and Bolduc, D. (2011). Incorporating pro-environmental preferences toward green automobile technologies through a Bayesian hybrid choice model. Transportmetrica.
Edwards, Y., and Allenby, G. (2003). Multivariate analysis of multiple response data. Journal of Marketing Research, 40, 321-334.
El-Basyouny, K., and Sayed, T. (2012). Depth-based hotspot identification and multivariate ranking using the full Bayes approach. Accident Analysis and Prevention, 50, 1082-1089.
Fang, A. (2008). A discrete-continuous model of households' vehicle choice and usage, with an application to the effects of residential density. Transportation Research Part B, 42(9), 736-758.
Forinash, C.V., and Koppelman, F.S. (1993). Application and interpretation of nested logit models of intercity mode choice. Transportation Research Record, 1413, 98-106.
Frühwirth-Schnatter, S., and Frühwirth, R. (2010). Data augmentation and MCMC for binary and multinomial logit models. In Kneib, T., and Tutz, G. (eds.), Statistical Modelling and Regression Structures, Physica-Verlag, Heidelberg, 111-132.
Gelfand, A.E., and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Geweke, J. (1991). Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In Keramidas, E.M. (ed.), Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, Interface Foundation of North America, Fairfax, 571-578.
Geweke, J.F., Keane, M.P., and Runkle, D.E. (1994). Alternative computational approaches to inference in the multinomial probit model. Review of Economics and Statistics, 76, 609-632.
Geweke, J.F., Keane, M.P., and Runkle, D.E. (1997). Statistical inference in the multinomial multiperiod probit model. Journal of Econometrics, 80, 125-165.
Greene, W.H., and Hensher, D. (2003). A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B, 37, 681-698.
Hajivassiliou, V., and McFadden, D. (1998). The method of simulated scores for the estimation of LDV models. Econometrica, 66, 863-896.
Hauer, E. (1997). Observational before-after studies in road safety: estimating the effect of highway and traffic engineering measures on road safety. Elsevier Science Ltd.
Hauer, E., and Persaud, B.N. (1987). How to estimate the safety of rail-highway grade crossings and the safety effects of warning devices. Transportation Research Record, 1114, 131-140.
Hensher, D.A., and Greene, W.H. (2003). The mixed logit model: the state of practice. Transportation, 30(2), 133-176.
Heydecker, B.G., and Wu, J. (2001). Identification of road sites for accident remedial work by Bayesian statistical methods: an example of uncertain inference. Advances in Engineering Software, 32, 859-869.
Higle, J.L., and Witkowski, J.M. (1988). Bayesian identification of hazardous sites. Transportation Research Record, 1185, 24-35.
Huang, H., and Abdel-Aty, M. (2010). Multilevel data and Bayesian analysis in traffic safety. Accident Analysis & Prevention, 42(6), 1556-1565.
Imai, K., and Van Dyk, D.A. (2005). A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics, 124, 311-334.
Imai, S., Jain, N., and Ching, A. (2009). Bayesian estimation of dynamic discrete choice models. Econometrica, 77(6), 1865-1899.
Kim, Y., Kim, T.Y., and Heo, E. (2003). Bayesian estimation of multinomial probit models of work trip choice. Transportation, 30, 351-365.
Koop, G., and Poirier, D.J. (1993). Bayesian analysis of logit models using natural conjugate priors. Journal of Econometrics, 56, 323-340.
Koppelman, F.S., and Wen, C.-H. (2000). The paired combinatorial logit model: properties, estimation and application. Transportation Research Part B, 34, 75-89.
Krinsky, I., and Robb, A.L. (1986). On approximating the statistical properties of elasticities. Review of Economics and Statistics, 68, 715-719.
Lahiri, K., and Gao, J. (2002). Bayesian analysis of nested logit model by Markov chain Monte Carlo. Journal of Econometrics, 111, 103-133.
Lan, B., and Persaud, B. (2011). Fully Bayesian approach to investigate and evaluate ranking criteria for black spot identification. Transportation Research Record, 2237, 117-125.
Lemp, D., Kockelman, K.M., and Damien, P. (2010). The continuous cross-nested logit model: formulation and application for departure time choice. Transportation Research Part B, 44(5), 646-661.
Lord, D., Washington, S.P., and Ivan, J.N. (2005b). Poisson, Poisson-gamma and zero inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention, 37(1), 35-46.
McCulloch, R., and Rossi, P. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics, 64, 207-240.
McCulloch, R., and Rossi, P. (2000). Bayesian analysis of the multinomial probit model. In Mariano, R., Schuermann, T., and Weeks, M. (eds.), Simulation-Based Inference in Econometrics, Cambridge University Press, New York.
McCulloch, R.R., Polson, N.G., and Rossi, P.E. (2000). Bayesian analysis of the multinomial probit model with fully identified parameters. Journal of Econometrics, 99, 173-193.
McFadden, D.L., and Train, K.E. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), 447-470.
Miaou, S.P., and Lord, D. (2003). Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes. Transportation Research Record, 1840, 31-40.
Miaou, S.P., and Song, J.J. (2005). Bayesian ranking of sites for engineering safety improvement: decision parameter, treatability concept, statistical criterion and spatial dependence. Accident Analysis and Prevention, 37, 699-720.
Miranda-Moreno, L.F., Fu, L., Saccomano, F., and Labbe, A. (2005). Alternative risk models for ranking locations for safety improvement. Transportation Research Record, 1908, 1-8.
Miranda-Moreno, L.F., Labbe, A., and Fu, L. (2007). Multiple Bayesian testing procedures for selecting hazardous sites. Accident Analysis and Prevention, 39, 1192-1201.
Nobile, A. (2000). Bayesian multinomial probit models with a normalization constraint. Journal of Econometrics, 99, 335-345.
Persaud, B., Lyon, C., and Nguyen, T. (1999). Empirical Bayes procedure for ranking sites for safety investigation by potential for safety improvement. Transportation Research Record, 1665, 7-12.
Poirier, D.J. (1996). A Bayesian analysis of nested logit models. Journal of Econometrics, 75, 163-181.
Qin, X., Ivan, J.N., and Ravishanker, N. (2004). Selecting exposure measures in crash rate prediction for two-lane highway segments. Accident Analysis and Prevention, 36(2), 183-191.
Rao, J.N. (2003). Small Area Estimation. John Wiley and Sons.
Sándor, Z., and Wedel, M. (2005). Heterogeneous conjoint choice designs. Journal of Marketing Research, 42, 210-218.
Savage, L.J. (1954). The Foundations of Statistics. John Wiley, New York.
Schluter, P.J., Deely, J.J., and Nicholson, A.J. (1997). Ranking and selecting motor vehicle accident sites by using a hierarchical Bayesian model. The Statistician, 46(3), 293-316.
Scott, S. (2003). Data augmentation for the Bayesian analysis of multinomial logit models. Proceedings of the American Statistical Association, Section on Bayesian Statistical Science.
Shen, W., and Louis, T.A. (2000). Triple-goal estimates for disease mapping. Statistics in Medicine, 19, 2295-2308.
Song, J.J., Ghosh, M., Miaou, S., and Mallick, B. (2006). Bayesian multivariate spatial models for roadway traffic crash mapping. Journal of Multivariate Analysis, 97, 246-273.
Sonnier, G., Ainslie, A., and Otter, T. (2007). Heterogeneity distributions of willingness-to-pay in choice models. Quantitative Marketing and Economics, 5, 313-331.
Train, K. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.
Tunaru, R. (2002). Hierarchical Bayesian models for multiple count data. Austrian Journal of Statistics, 31(3), 221-229.
Washington, S., Congdon, P., Karlaftis, M., and Mannering, F. (2010). The Bayesian multinomial logit model: theory and route choice example. Transportation Research Record, 2136, 28-36.
Washington, S., Ravulaparthy, S., Rose, J., Hensher, D., and Pendyala, R. (2012). Bayesian imputation of non-chosen attribute values in revealed preference surveys. Journal of Advanced Transportation, DOI: 10.1002/atr.201.
Zou, Y., Zhang, Y., and Lord, D. (2012). Application of finite mixture of negative binomial regression models with varying weight parameters for vehicle crash data analysis. Accident Analysis & Prevention, 50, 1042-1051.
TABLES
Table 1. Impact of prior assumptions on the inverse dispersion parameter and model error (υi)

                            HPG model                         HPL model                         NB model
Data       Sample size      prior1           prior2           prior1           prior2           MLE
PG data    30 sites         66.83 (180.3)    1.73 (1.00)      50.71 (146.7)    1.67 (1.1)       1789.98 (0.3)
           50 sites         43.99 (120.0)    1.52 (0.84)      36.64 (98.7)     1.46 (0.9)       569.13 (3.9)
           100 sites        7.38 (22.7)      1.22 (0.57)      8.81 (27.3)      1.10 (0.6)       1.12 (0.1)
PLN data   30 sites         62.36 (168.7)    1.57 (0.92)      40.76 (127.1)    1.53 (0.9)       1664.88 (4.5)
           50 sites         53.07 (129.6)    1.50 (0.81)      39.45 (106.6)    1.42 (0.9)       660.47 (3.6)
           100 sites        7.03 (16.4)      1.23 (0.57)      5.67 (16.2)      1.09 (0.6)       1.40 (2.4)

Each cell reports the mean over the 100 simulations, with the standard deviation (s.d.) from the 100 simulations in parentheses.
PG data / PLN data: data generated with the Poisson/Gamma and Poisson/Lognormal models, respectively.
HPG: hierarchical Poisson/Gamma model; HPL: hierarchical Poisson/Lognormal model; NB: negative binomial model (MLE).
Prior1: non-informative; Prior2: informative.
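To clarify what the PG rows of Table 1 summarize, the sketch below (Python, with purely illustrative parameter values rather than those of our experimental design) generates crash counts for a set of sites from a Poisson/Gamma data-generating process, in which phi plays the role of the inverse dispersion parameter examined in the table.

```python
import numpy as np

rng = np.random.default_rng(1)

n_sites = 50                                     # e.g. the 50-site scenario (illustrative)
phi = 1.5                                        # inverse dispersion parameter (illustrative)
aadt = rng.uniform(1_000, 20_000, size=n_sites)  # hypothetical traffic-exposure covariate
mu = np.exp(-6.0 + 0.8 * np.log(aadt))           # site-specific expected crash frequency

# Poisson/Gamma hierarchy: theta_i ~ Gamma(phi, scale=mu_i/phi) and y_i ~ Poisson(theta_i),
# so that E[y_i] = mu_i and Var[y_i] = mu_i + mu_i**2 / phi (over-dispersed counts).
theta = rng.gamma(shape=phi, scale=mu / phi)
crashes = rng.poisson(theta)

print(crashes)
```

Replacing the gamma mixing distribution by a lognormal distribution for the site effects yields Poisson/Lognormal (PLN) data of the kind used in the lower half of the table.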
Table 2: Effect of ranking methods based on Spearman's correlation coefficient (ρ)

Bayesian approach (FHB) vs EB approach
                            Full Bayes approach                                                   EB approach
                            HPG model                         HPL model                           NB model
Data       Sample size      prior1           prior2           prior1           prior2             MLE
PG data    30 sites         0.66 (0.14)      0.69 (0.12)      0.67 (0.13)      0.68 (0.12)        0.65 (0.15)
           50 sites         0.67 (0.09)      0.68 (0.09)      0.68 (0.09)      0.67 (0.09)        0.67 (0.09)
           100 sites        0.69 (0.06)      0.69 (0.06)      0.69 (0.06)      0.68 (0.06)        0.69 (0.06)
PLN data   30 sites         0.72 (0.13)      0.75 (0.10)      0.73 (0.12)      0.74 (0.10)        0.71 (0.14)
           50 sites         0.74 (0.09)      0.75 (0.09)      0.74 (0.09)      0.76 (0.08)        0.74 (0.09)
           100 sites        0.76 (0.06)      0.77 (0.05)      0.77 (0.05)      0.77 (0.05)        0.77 (0.05)

Ranking criteria: posterior distribution of θi and Ri
                            Posterior of θi                   Posterior of Ri                     EB estimator
Data       Sample size      prior1           prior2           prior1           prior2             Negative Binomial
PG data    50 sites         0.69 (0.08)      0.70 (0.07)      0.68 (0.09)      0.70 (0.07)        0.69 (0.08)
           100 sites        0.69 (0.07)      0.69 (0.07)      0.68 (0.07)      0.68 (0.07)        0.69 (0.07)

Each cell reports the mean Spearman's ρ over the 100 simulations, with the standard deviation (s.d.) from the 100 simulations in parentheses.
PG data / PLN data: data generated with the Poisson/Gamma and Poisson/Lognormal models, respectively.
Prior1: non-informative; Prior2: informative.
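The agreement measure reported in Table 2 is a Spearman rank correlation between two rankings of the same sites. As a reminder of how such a coefficient is obtained, the short sketch below (Python; the two score vectors are hypothetical stand-ins for, say, a posterior-mean-based ranking and a benchmark ranking, and ties are ignored) computes ρ as the Pearson correlation of the rank vectors.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation for untied scores:
    the Pearson correlation of the two rank vectors."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v)))
    return np.corrcoef(rank(x), rank(y))[0, 1]

# Hypothetical scores assigned to six sites by two ranking criteria.
score_criterion_a = [4.2, 1.1, 3.5, 0.7, 2.9, 5.0]
score_criterion_b = [3.9, 0.9, 3.0, 1.2, 2.5, 4.8]
print(spearman_rho(score_criterion_a, score_criterion_b))   # close to 1 for similar rankings
```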
Table 3: Analysis of interurban travel mode choice
Large sample: 2500 individuals
                                  ln(β0)       β1           β2
True values                       -0.0182      -0.0060      19.78
Non-informative prior
  mean                            -0.0189      -0.0059      19.81
  s.d.                             0.0033       0.0007       8.35
  95% CI, lower bound             -0.0251      -0.0073      12.99
  95% CI, upper bound             -0.0120      -0.0047      31.03
Informative prior
  mean                            -0.0185      -0.0058      19.31
  s.d.                             0.0024       0.0005       2.56
  95% CI, lower bound             -0.0230      -0.0067      15.05
  95% CI, upper bound             -0.0136      -0.0049      25.12
MLE
  mean                            -0.0189      -0.0058      19.15
  s.e.                             0.0032       0.0007       4.30
  95% CI, lower bound             -0.0252      -0.0072      12.53
  95% CI, upper bound             -0.0126      -0.0044      29.30

Small sample: 250 individuals
Non-informative prior
  mean                            -0.0256      -0.0046      15.00
  s.d.                             0.0085       0.0014      191.33
  95% CI, lower bound             -0.0425      -0.0077       5.41
  95% CI, upper bound             -0.0092      -0.0023      28.61
Informative prior
  mean                            -0.0201      -0.0060      20.06
  s.d.                             0.0067       0.0013      10.92
  95% CI, lower bound             -0.0341      -0.0088      10.08
  95% CI, upper bound             -0.0079      -0.0037      41.27
MLE
  mean                            -0.0129      -0.0030      15.29
  s.e.                             0.0066       0.0014      370.44
  95% CI, lower bound             -0.0258      -0.0057      -14.39
  95% CI, upper bound              0.0000      -0.0003      91.36

s.e.: standard error of the MLE; s.d.: posterior standard deviation; CI: credible interval (Bayes estimators) or confidence interval (MLE).