Computational Bayesian statistics in transportation modeling: from road safety analysis to discrete choice

Ricardo A. Daziano
School of Civil and Environmental Engineering, Cornell University, Ithaca, NY 14853; Email: [email protected]

Luis Miranda-Moreno
Department of Civil Engineering, McGill University, Montréal QC, Canada H3A 0C3; Email: luis.miranda-[email protected]

Shahram Heydari
Department of Civil Engineering, McGill University, Montréal QC, Canada H3A 0C3

May 2013

In this paper we review both the fundamentals and the expansion of computational Bayesian econometrics and statistics applied to transportation modeling problems in road safety analysis and travel behavior. Whereas for analyzing accident risk in transportation networks there has been a significant increase in the application of hierarchical Bayes methods, in transportation choice modeling the use of Bayes estimators is rather scarce. We thus provide a general discussion of the benefits of using Bayesian Markov chain Monte Carlo methods to simulate answers to the problems of point and interval estimation and forecasting, including the use of the simulated posterior for building predictive distributions and constructing credible intervals for measures such as the value of time. Although there is the general idea that going Bayesian is just another way of finding an equivalent to frequentist results, in practice Bayes estimators have the potential of outperforming frequentist estimators and, at the same time, may offer more information. Additionally, Bayesian inference is particularly interesting for small samples and weakly identified models.

Keywords: Bayesian statistics; MCMC; discrete choice; road safety

1 Introduction

The study of uncertainty has become a paramount topic for several fields that analyze transportation as a system, including economics, statistics, and operations research. On the one hand, the frequentist or classical approach handles uncertainty about the true parameters of a statistical model by considering these parameters as fixed but unknown constants. On the other hand, the Bayesian approach considers the true parameters to be random variables. There are several advantages to the Bayesian approach, including estimators that are exact,1 compatibility with the likelihood principle, the possibility of introducing prior knowledge,2 and the flexibility of working with predictive posteriors. In fact, based on the Bernstein-von Mises theorem it is possible to argue that frequentist estimators are a good approximation of the Bayesian approach as the sample size gets larger.3 Thus, in this paper we review the expansion and modeling benefits of computational Bayesian econometrics and statistics applied to transportation modeling problems in travel behavior and road safety analysis. The aim of this paper is to promote broader adoption of Bayesian statistics in transportation modeling through a general discussion of the benefits of Bayesian inference and forecasting. We illustrate the benefits of Bayes estimators versus frequentist estimators by analyzing small-sample properties in two Monte Carlo studies.

We first overview the fundamentals of Bayesian statistics (section 2), paying special attention to simulation-aided inference using MCMC methods. In effect, the expansion of Bayesian methods builds on the rapid development of Markov chain Monte Carlo (MCMC) techniques, which are a class of simulation-based estimators. Then we identify two relevant fields of transportation modeling with different levels of adoption of the Bayesian approach. On the one hand, in analyzing accident risk in transportation networks there has been a significant increase in the application of hierarchical Bayes methods (e.g. Heydecker and Wu, 2001; Song, 2005; Miranda-Moreno et al., 2005; section 3 reviews these applications). One of the main advantages of hierarchical Poisson models is that they can deal with over- or under-dispersion, spatial random variations, time trends, and clustering in the data. Another benefit of the Bayesian approach is that the simulated posterior predictive distribution is a direct result of MCMC estimation. Different risk measures can then be derived from the posterior distribution and used for multiple purposes, such as before-after observational studies and identification of accident-risk contributing factors. To show the advantage of Bayesian statistics over MLE and to analyze the impact of different modeling decisions and parameters that can affect the final outcome of safety analyses, a data simulation framework is developed to measure the accuracy of parameter estimates. Furthermore, we examine the effect of the approach (Full Bayes vs. Empirical Bayes), of differing model error assumptions (Gamma vs. Lognormal) and ranking methods, and of the type of prior (informative versus non-informative). Our posterior analysis highlights the fact that more complex methods do not necessarily provide better results. In fact, the impact of the model error structure seems to be marginal.

While there are hundreds of Bayesian applications in road safety analysis, in travel choice analysis the large majority of applications are frequentist (with the exception of a very few excellent Bayesian examples overviewed in section 4). We argue here, however, that the application of Bayesian microeconometrics to choice modeling is natural because the concept of subjective probabilities is akin to the behavioral assumptions of discrete choice models. Despite this affinity, transportation modelers have generally been reluctant to adopt Bayesian techniques, even with estimators available for the most common models, including mixed logit with parametric (Train, 2009) and nonparametric heterogeneity distributions, probit (Albert and Chib, 1993), and nested logit models (Lahiri and Gao, 2002). In empirical transportation choice modeling only very few applications use Bayes estimators (e.g. Bolduc et al., 1997; Kim et al., 2003; Fang, 2008; Washington et al., 2010). Although some researchers have adopted Bayesian tools for specific modeling components, including efficient design of stated-preference experiments, choice set modeling, and latent class models, estimation and forecasting problems are not usually treated from a Bayesian perspective. For instance, even though Bayes' Theorem is the basis for deriving latent class models, researchers continue to use frequentist estimators. Taking the simulation of accident risk analysis as inspiration, we illustrate the benefits of Bayes inference in choice modeling by performing a Monte Carlo study for analyzing interval estimation of the value of time. The Monte Carlo focuses on small-sample properties of the estimator and of the estimates derived by postprocessing draws of the original parameters.

Finally, section 5 overviews reasons for encouraging a broader adoption of Bayesian tools.

1 Bayes estimators have properties that are valid for small samples.
2 Prior knowledge includes building on previous research as well as theoretical constraints, such as parameters having a particular sign.
3 For small samples, frequentist estimators lose their good statistical properties.

2 Overview of Bayesian econometrics

2.1 Bayesian statistical models
This first subsection overviews concepts that are treated in detail in Gourieroux (1995), Lancaster (2004), Geweke (2005), and Greenberg (2008). Consider the statistical model (Y, P = {P_θ = l(y; θ) μ, θ ∈ Θ ⊆ R^p, p ≥ 1}), where y is a vector of observations in the sample space Y, P is a parameterized family of probability density functions P_θ on Y, l(y; θ) is the likelihood function, μ is the dominating measure, θ is a vector of p unknown parameters, and Θ is the parameter space. In parametric statistics, the point estimation problem reduces to proposing a value θ̂ for the true but unknown θ. In a Bayesian setting of statistical decision problems, θ has a prior p(θ) that describes the probability distribution of the unknown parameter before having access to the evidence. The combination of the prior with the information coming in via the sample data determines the posterior distribution of the parameters p(θ | y). The posterior and prior distributions are related through Bayes theorem according to p(θ | y) = p(y | θ) p(θ) / p(y), where p(y | θ) represents the distribution of the observations for every particular value of θ, and p(y) is the marginal distribution of the data. Note that p(y | θ) = l(y; θ) by definition. Since p(y) is a constant, for inference purposes Bayes theorem is rewritten as p(θ | y) ∝ p(y | θ) p(θ), which emphasizes the notion of updating knowledge through evidence. Bayesian inference is built as a decision making process under uncertainty, where the researcher chooses an estimator that minimizes the probability of being wrong. If the decision is taken ex post (after the observation of y), the Bayes estimator is built by minimizing the expected loss or posterior risk

R(θ̂ | y) = ∫_Θ L(θ̂ | θ) p(θ | y) dμ(θ)        (1)

where the loss function L(θ̂ | θ) measures the cost of taking a wrong decision. In general, a linear-linear loss function yields a Bayes decision equal to the posterior qth quantile. However, the most common loss function is the general quadratic loss L(θ̂ | θ) = (θ̂ − θ)'Q(θ̂ − θ), where Q is a positive definite matrix. With a quadratic loss function, the Bayes estimator θ̂ is unique and corresponds to the mean of the posterior distribution

θ̂ = ∫_Θ θ p(θ | y) dμ(θ) = E(θ | y)        (2)

In addition, with a quadratic loss function, an unbiased estimator (in the Bayesian sense) of the precision (risk) of the Bayes estimator is the posterior variance.

Even though Bayesian and classical inference are intrinsically different, a Bayesian estimate depends on the data, and a different sample will generate a different estimate. If we denote the true value of θ by θ₀, the Bernstein-von Mises theorem states that θ̂ = E(θ | y) is asymptotically distributed as N(θ₀, I(θ₀)⁻¹/N), where I(θ₀) is the asymptotic Fisher information matrix. Thus, under regularity conditions, the Bayes estimator θ̂ is asymptotically unbiased, consistent, normal, and efficient, which are the same asymptotic properties of the maximum likelihood estimator. In fact, when N is large, the Bayes and maximum likelihood estimators are approximately equal. Note that for large samples the evidence provided by the observations is such that the effect of the prior can be neglected.
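To illustrate this asymptotic equivalence and the role of the prior in small samples, the following sketch (our own illustrative example with arbitrary prior values, not part of the original study) contrasts the Bayes posterior mean with the MLE in a conjugate Normal model with known variance: the prior visibly pulls the estimate when N = 5, but its effect is negligible when N = 5,000.

```python
import numpy as np

# Conjugate Normal model: y_n ~ N(theta, sigma2) with sigma2 known,
# prior theta ~ N(m0, v0). The posterior mean is a precision-weighted
# average of the prior mean and the sample mean, so it shrinks toward
# m0 for small N and approaches the MLE as N grows.
rng = np.random.default_rng(42)
sigma2, theta_true = 4.0, 1.5
m0, v0 = 0.0, 1.0          # illustrative prior mean and variance

for N in (5, 5000):
    y = rng.normal(theta_true, np.sqrt(sigma2), size=N)
    mle = y.mean()                                       # frequentist estimate
    post_var = 1.0 / (1.0 / v0 + N / sigma2)             # posterior variance
    post_mean = post_var * (m0 / v0 + y.sum() / sigma2)  # Bayes estimate (quadratic loss)
    print(f"N={N:5d}  MLE={mle:6.3f}  Bayes={post_mean:6.3f}  posterior sd={np.sqrt(post_var):.3f}")
```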

2.2 Markov chain Monte Carlo methods
Bayesian inference examines the posterior distribution p(θ | y), and in particular the posterior first and second moments. Although the density of the posterior distribution can be obtained using Bayes theorem, the main difficulty concerns the analytical characterization of this distribution. When p(θ | y) is not known explicitly, Bayesian inference makes use of Markov chain Monte Carlo (MCMC) methods, which are a class of stochastic sampling algorithms based on constructing a Markov chain that has the desired posterior as its equilibrium distribution (Gelfand and Smith, 1990). MCMC estimation belongs to the class of Monte Carlo methods of repeated sampling for simulation, and hence is relatively easy to implement. Computational Bayesian statistics has allowed researchers to simulate posteriors that are analytically intractable, just as simulation-aided inference has permitted frequentist estimation of sophisticated models. The difference between simulation for Bayesian inference and for frequentist inference is that the former stops at the evaluation of a density, whereas the latter involves maximization of a simulated likelihood. As a result, computational Bayesian statistics is gradient and Hessian free. There are different methods to iteratively build a Markov chain converging to the posterior of interest, but Gibbs sampling and Metropolis-Hastings are the most typically applied.

2.3 Gibbs sampling
Gibbs sampling requires that all conditional distributions of a specific partition of the parameter space have a closed form to draw samples from. Let P be a partition of θ ∈ Θ into blocks θ = (θ₁', ..., θ_P')', and define θ_{<p} = (θ₁', ..., θ_{p−1}')', θ_{>p} = (θ_{p+1}', ..., θ_P')', with θ_{<1} = θ_{>P} = ∅ and θ_{−p} = (θ_{<p}, θ_{>p}). The partition is chosen such that it is possible to draw directly from each full conditional distribution π(θ_p | θ_{−p}). The Gibbs sampler is an MCMC algorithm where, at the gth iteration, the transition from θ^(g−1) to θ^(g) is defined through θ_p^(g) ~ π(θ_p | θ_{−p}^(g−1)), p = 1, ..., P. It can be shown that this reversible Markov chain generates an instance from the posterior distribution at each iteration.
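As a minimal illustration of the algorithm (a sketch of our own, not code from the paper), the snippet below implements a two-block Gibbs sampler for a Normal sample with unknown mean μ and variance σ², a case in which both full conditionals are available in closed form (Normal for μ and inverse-Gamma for σ²):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=200)           # synthetic data
n, ybar = y.size, y.mean()

# Illustrative priors: mu ~ N(m0, v0), sigma2 ~ Inv-Gamma(a0, b0)
m0, v0, a0, b0 = 0.0, 100.0, 2.0, 2.0

G = 5000
mu, sigma2 = 0.0, 1.0                        # arbitrary starting values
draws = np.empty((G, 2))
for g in range(G):
    # Full conditional of mu given sigma2: Normal
    v_n = 1.0 / (1.0 / v0 + n / sigma2)
    m_n = v_n * (m0 / v0 + n * ybar / sigma2)
    mu = rng.normal(m_n, np.sqrt(v_n))
    # Full conditional of sigma2 given mu: Inv-Gamma(a0 + n/2, b0 + SS/2),
    # sampled as the reciprocal of a Gamma draw.
    ss = np.sum((y - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + ss / 2.0))
    draws[g] = mu, sigma2

burn = 1000
print("posterior means (mu, sigma2):", draws[burn:].mean(axis=0))
```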

2.4 Metropolis-Hastings
Sometimes direct sampling for one or more of the conditional distributions inside the Gibbs sampler is difficult, because a conditional distribution lacks a closed form. In a Metropolis-Hastings implementation, a candidate θ^(cand) is drawn from the proposal (transition) density q(θ^(cand) | θ^(curr)) of generating the candidate given the current guess. The candidate realization is then compared to the current guess through the acceptance rate

α = min{ 1, [ p(y | θ^(cand)) p(θ^(cand)) q(θ^(curr) | θ^(cand)) ] / [ p(y | θ^(curr)) p(θ^(curr)) q(θ^(cand) | θ^(curr)) ] }.

Starting with an arbitrary value θ^(0), at the gth iteration of the Metropolis-Hastings algorithm the candidate is accepted as the new value θ^(g) = θ^(cand) with probability α, whereas the old value is preserved, θ^(g) = θ^(curr), with probability 1 − α. In the case q(θ^(cand) | θ^(curr)) = q(θ^(cand) − θ^(curr)), the generating process of the candidate is a random-walk Metropolis chain. If the proposal density is such that q(θ^(cand) | θ^(curr)) = q(θ^(cand)), the process is a Metropolis independence chain. Realizations of the Metropolis-Hastings algorithm generate instances from the posterior distribution. Parameters of the proposal density are chosen such that the proposed candidates explore areas of high probability in an efficient manner, i.e. avoiding poor mixing. Choosing these parameters to obtain desired acceptance rates is known as tuning. Note that the Gibbs sampler is a special case of the Metropolis-Hastings algorithm, where the proposal density is given by the full conditional distributions, the acceptance rate always equals one, and no tuning is necessary.
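The following sketch (again our own illustration, with a deliberately simple target) implements a random-walk Metropolis chain for a generic log-posterior; the proposal standard deviation is the tuning parameter that governs the acceptance rate and the mixing of the chain:

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, step_sd, n_iter, rng):
    """Random-walk Metropolis: propose theta_cand = theta_curr + Normal noise
    and accept with probability min(1, post(cand)/post(curr)); the symmetric
    proposal makes the q terms cancel in the acceptance ratio."""
    theta = np.asarray(theta0, dtype=float)
    draws = np.empty((n_iter, theta.size))
    accepted = 0
    for g in range(n_iter):
        cand = theta + rng.normal(scale=step_sd, size=theta.size)
        if np.log(rng.uniform()) < log_post(cand) - log_post(theta):
            theta, accepted = cand, accepted + 1
        draws[g] = theta
    return draws, accepted / n_iter

# Toy target: posterior of a Normal mean with a flat prior (illustrative).
rng = np.random.default_rng(1)
y = rng.normal(0.8, 1.0, size=50)
log_post = lambda th: -0.5 * np.sum((y - th[0]) ** 2)

draws, acc = random_walk_metropolis(log_post, [0.0], step_sd=0.3, n_iter=10000, rng=rng)
print("acceptance rate:", round(acc, 2), " posterior mean:", draws[2000:].mean())
```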

2.5 Summarizing and postprocessing simulated posteriors
Once draws of the posterior distribution have been simulated using Monte Carlo sampling, point and interval estimates can be easily constructed. Whereas the Bayes point estimator in equation (2) depends on the analytical posterior, when using MCMC methods it is possible to use an empirical mean. Thus, the point estimate is simply calculated as the mean of the posterior sample, θ̂ = (1/G) Σ_{g=1}^{G} θ^(g), where G is the total number of iterations used for simulating the posterior. The 2.5% and 97.5% posterior quantiles can be used for deriving a 95% credible interval (although highest probability density regions should be preferred). A very attractive feature of MCMC methods is the possibility of postprocessing the sample to make inference on a function of the original parameters (Edwards and Allenby, 2003). For instance, the point estimate of a function h(θ) is simply given by ĥ = (1/G) Σ_{g=1}^{G} h(θ^(g)).
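A short sketch of these computations (illustrative only; the draws below are a synthetic stand-in for actual MCMC output) shows the empirical posterior mean, an equal-tailed 95% credible interval, and the postprocessed posterior of a function of the parameters, here a simple ratio:

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in for MCMC output: G draws of a two-dimensional parameter theta.
G = 20000
theta_draws = rng.multivariate_normal([1.0, -2.0], [[0.04, 0.01], [0.01, 0.09]], size=G)

point_estimate = theta_draws.mean(axis=0)                 # empirical posterior mean
ci_95 = np.percentile(theta_draws, [2.5, 97.5], axis=0)   # equal-tailed credible interval

# Postprocessing: posterior of h(theta) = theta_1 / theta_2, computed draw by draw.
h_draws = theta_draws[:, 0] / theta_draws[:, 1]
print("E(theta|y) =", point_estimate)
print("95% credible interval per parameter:\n", ci_95)
print("E(h|y) =", h_draws.mean(), " 95% CI:", np.percentile(h_draws, [2.5, 97.5]))
```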

3 Bayesian statistical models of road safety
In the road safety literature, Bayesian methods have become very popular, with hundreds of peer-reviewed articles published on this topic. These methods are commonly used for two tasks: i) ranking and selection of hotspots for engineering safety improvements, and ii) evaluation of countermeasures in before-after studies. In terms of statistical modeling, let y_i be the observed number of accidents at site i, which is assumed to be generated according to a parametric probability density function f(· | θ_i) (e.g., a Poisson distribution with unknown mean θ_i). For Bayesian inference, a prior distribution p(θ_i | η) is first assumed, where η is a vector of prior parameters. This prior information is then combined with the information brought by the sample into the posterior p(θ_i | y_i). Within the class of Bayesian methods, we can distinguish two main approaches: empirical Bayes (EB) and full hierarchical Bayes (FHB). One important difference between these two approaches is in the way the prior parameters are determined. In the EB approach, the parameters η are estimated using a maximum likelihood estimator (MLE) or another frequentist technique (e.g., the method of moments). Thus, in EB the estimates η̂ depend only on the (accident) data. Alternatively, FHB assumes an additional layer of randomness on η, where hyperparameters are set by the modeler. In the following subsections both approaches are illustrated in the context of road safety analysis.

3.1 MLE and Empirical Bayes
The EB approach is commonly implemented through the use of Negative Binomial models (Abbess et al., 1981; Hauer and Persaud, 1987; and Higle and Witkowski, 1988 use these models for hotspot identification). Assuming that the number of accidents Y_i at site i over a given time period T_i is Poisson distributed, the standard Negative Binomial (or Poisson-Gamma) model for Y_i can be represented as (Winkelmann, 2003)

i.  Y_i | T_i, θ_i ~ Poisson(T_i · θ_i) ~ Poisson(T_i · μ_i · e^{ε_i})        (3)
ii. e^{ε_i} | φ ~ Gamma(φ, φ), or equivalently θ_i ~ Gamma(φ, φ/μ_i),        (4)

where e^{ε_i} is a multiplicative random effect, Gamma distributed with E(e^{ε_i}) = 1 and V(e^{ε_i}) = 1/φ; μ_i is the mean response for site i; and φ is the inverse dispersion parameter of the Poisson-Gamma distribution. Also, μ_i is commonly specified as a function of site-specific attributes or covariates, as in μ_i = f(F_i1, F_i2, x_i; β), where F_i1 and F_i2 are traffic flows representing a measure of traffic exposure.4 Here x_i is a vector of covariates representing site-specific attributes and β is a vector of regression parameters to be estimated from the data. To capture the nonlinear relationship between the mean number of accidents and traffic flows, alternative functional forms can be specified for μ_i. For instance, a common functional form for intersections is

μ_i = β₀ (F_i1 + F_i2)^{β₁} (F_i1 F_i2)^{β₂}   (see Miaou and Lord, 2003).

After integrating out the multiplicative random error e^{ε_i}, the marginal distribution of Y_i is the density of the Negative Binomial distribution, and the log-likelihood of the model can then be maximized numerically.
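As a concrete, hedged illustration of this estimation route (the functional form follows the intersection specification above, but all parameter values and traffic volumes are hypothetical), the sketch below simulates counts from the Poisson-Gamma process of equations (3)-(4) and then maximizes the Negative Binomial log-likelihood numerically:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)
n = 500
F1 = rng.uniform(2000, 20000, n)    # major-approach daily traffic (hypothetical)
F2 = rng.uniform(200, 5000, n)      # minor-approach daily traffic (hypothetical)
beta0, beta1, beta2, phi = 1e-4, 0.6, 0.3, 2.0   # "true" values, illustrative only

mu = beta0 * (F1 + F2) ** beta1 * (F1 * F2) ** beta2
e = rng.gamma(phi, 1.0 / phi, n)    # multiplicative Gamma effect, mean 1, variance 1/phi
y = rng.poisson(mu * e)             # observed counts: marginally Negative Binomial

def negloglik(p):
    # Negative Binomial log-likelihood with mean m and inverse dispersion ph.
    b0, b1, b2, ph = np.exp(p[0]), p[1], p[2], np.exp(p[3])
    m = b0 * (F1 + F2) ** b1 * (F1 * F2) ** b2
    ll = (gammaln(y + ph) - gammaln(ph) - gammaln(y + 1)
          + ph * np.log(ph / (ph + m)) + y * np.log(m / (ph + m)))
    return -ll.sum()

res = minimize(negloglik, x0=[np.log(1e-4), 0.5, 0.5, 0.0], method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
print("MLE of (beta0, beta1, beta2, phi):",
      np.exp(res.x[0]), res.x[1], res.x[2], np.exp(res.x[3]))
```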

4 Because of the lack of disaggregate data, traffic exposure is usually defined according to the average annual daily traffic (AADT) flows. Little work has focused on the relationships between crashes and other traffic flow characteristics such as vehicle density, level of service, vehicle occupancy, speed distribution, etc. (Lord et al., 2005b). Few works have considered hourly exposure functions accounting for traffic composition and other temporal variations (we refer to the work of Qin et al., 2004).

3.2 Full Hierarchical Bayes
FHB is implemented through the use of hierarchical Poisson models. Considering again that Y_i is Poisson distributed, a hierarchical Poisson-Gamma model can be defined as follows (Rao, 2003):

i.   Y_i | T_i, θ_i ~ Poisson(T_i · θ_i) ~ Poisson(T_i · μ_i · e^{ε_i})        (5)
ii.  e^{ε_i} | φ ~ Gamma(φ, φ)        (6)
iii. φ ~ Gamma(a, b)        (7)

Unlike EB, in the FHB approach the inverse dispersion parameter φ is assumed to be random, following a Gamma distribution with parameters a and b. Furthermore, the regression coefficients β in μ_i = f(F_i1, F_i2, x_i; β) are also assumed to be random. This third level of randomness assumed on the model parameters (φ, β) is the main difference between this hierarchical model and the Negative Binomial model discussed in the previous subsection. To obtain parameter estimates, posterior inference can be carried out by MCMC simulation methods such as Gibbs sampling and the Metropolis-Hastings algorithm. In practice, either an informative or a non-informative prior can be assumed on φ. For instance, an informative prior can be built by fixing the hyperparameters a and b according to previous data or expert knowledge (e.g., see Miranda-Moreno et al., 2007). Non-informative prior distributions on the regression parameters β are commonly assumed; however, semi-informative or informative priors can easily be integrated, as discussed later in our simulation study.

Note that one of the advantages of the FHB approach is the flexibility it offers to formulate and compute alternative models. Instead of assuming a Gamma distribution as a prior for e^{ε_i}, the error ε_i can be assumed to be Normally distributed, ε_i = ln(e^{ε_i}) | σ² ~ N(0, σ²), with a proper hyper-prior for the parameter σ². Note that this model with an additive Normal error is not directly comparable with the hierarchical Poisson/Gamma model in equations (5)-(7). As an alternative specification, the hierarchical Poisson/Lognormal (HPL) model can then be employed with a log-normal prior on the multiplicative random error, e^{ε_i} ~ LogNormal(−0.5 τ_ε², τ_ε²), where τ_ε² is the shape parameter; by specifying τ_ε² = log(1 + 1/φ), one can show that E(e^{ε_i}) = 1 and Var(e^{ε_i}) = 1/φ. This leads to a model that is comparable to the hierarchical Poisson/Gamma model defined above.
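One possible computational implementation of this FHB specification is sketched below using the PyMC library (assuming PyMC is available; the Gamma(2.5, 1.3) hyper-prior anticipates the informative choice discussed in section 3.4, and all other values and the simulated data are illustrative). For compactness the Gamma random effect is integrated out, so the likelihood is written directly as a Negative Binomial with inverse dispersion φ, which is equivalent to the three-level model in equations (5)-(7):

```python
import numpy as np
import pymc as pm

# Simulated site data (hypothetical); in practice F1, F2, y come from the crash dataset.
rng = np.random.default_rng(5)
n = 200
F1, F2 = rng.uniform(2000, 20000, n), rng.uniform(200, 5000, n)
mu_true = 1e-4 * (F1 + F2) ** 0.6 * (F1 * F2) ** 0.3
y = rng.poisson(mu_true * rng.gamma(2.0, 0.5, n))   # Poisson-Gamma counts with phi = 2

with pm.Model() as fhb:
    # Third layer of randomness: hyper-prior on the inverse dispersion parameter.
    phi = pm.Gamma("phi", alpha=2.5, beta=1.3)        # informative choice from past studies
    # Semi-informative (positivity-constrained) priors on the traffic flow exponents.
    lb0 = pm.Normal("log_beta0", mu=0.0, sigma=10.0)  # weak prior on the intercept
    b1 = pm.HalfNormal("beta1", sigma=1.0)
    b2 = pm.HalfNormal("beta2", sigma=1.0)
    mu = pm.math.exp(lb0 + b1 * np.log(F1 + F2) + b2 * np.log(F1 * F2))
    # Marginalizing the Gamma(phi, phi) effect gives a Negative Binomial likelihood.
    pm.NegativeBinomial("y", mu=mu, alpha=phi, observed=y)
    idata = pm.sample(2000, tune=2000, target_accept=0.9, random_seed=5)

print("posterior mean of phi:", idata.posterior["phi"].mean().item())
```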

Hierarchical Poisson models can also accommodate spatiotemporal patterns of accident counts. This extension is useful for working with complex datasets in which longitudinal and/or spatial information is available (Miaou and Song, 2005; Song et al., 2006). Space-time hierarchical Poisson models can be defined by incorporating spatial and temporal effects into the mean number of accidents, e.g., θ_it = μ_i e^{ε_it} e^{ϕ_t} e^{ρ_i}, where θ_it is the mean number of accidents at site i in period t (for each site there are T observation periods, t = 1, ..., T). In this model, ϕ_t denotes a time effect representing a possible time trend due to socioeconomic changes, traffic operation modifications, weather variations, etc., and ρ_i is a space random effect accounting for spatial correlation among sites.5 As noted by Miaou and Song (2005), random variations across sites may be structured spatially in some way due to the complexity of the traffic interactions around locations, or because driver behaviors are influenced by a multitude of factors (e.g., roadside development near locations and environmental conditions).

5 Spatial dependency of θ_it with respect to other sites suggests that sites that are closer to each other are more likely to have common features affecting their accident occurrence.

Hierarchical Poisson models have also been extended to a multivariate setting, which has been used to analyze several accident types simultaneously (e.g., accident counts divided into different severity levels, such as fatal accidents, accidents with injuries, and others). Under a multivariate setting, one is able to take into account correlation structures between the response variables. For instance, one may suspect that a given site with a high number of fatal accidents would also present a high number of injury accidents, since they may share the same set of risk factors (e.g., excessive curvature or a bad road surface condition). Multivariate Poisson hierarchical models with covariates and time-space random effects have been extensively studied by Song et al. (2006) in the road safety context. Without accounting for spatial correlation, many studies have analyzed accident datasets with different severity outcomes using a multivariate Poisson setting; one can refer to Tunaru (2002), Brijs et al. (2007), Aguero-Valverde and Jovanis (2009), and El-Basyouny and Sayed (2012). Latent class models, multilevel models, and finite mixture hierarchical models have also been applied to crash data (Miranda-Moreno, 2006; Huang and Abdel-Aty, 2010; Zou et al., 2012). Under a full Bayes setting, some recent studies have also applied multilevel models and the Poisson-Weibull model (Cheng et al., 2012).

3.3 Bayesian Methods for Hotspot Identification and Before-After Studies
Various ranking criteria or safety measures can be used for hotspot identification (Higle and Witkowski, 1988; Schluter et al., 1997; Persaud et al., 1999; Heydecker and Wu, 2001; Tunaru,

2002; Miaou and Song, 2005; Miranda-Moreno, 2005; Song et al., 2006; Brijs et al., 2007; Lan and Persaud, 2011; El-Basyouny and Sayed, 2012). In this context, the use of Bayesian statistics is highly attractive due to the flexibility of working with predictive posteriors. The most popular safety measures include the posterior expected number of accidents, the posterior probability of excess, the posterior expectation of ranks, and the posterior probability that a site is the worst. These safety measures have been used in different previous works (e.g., Schluter et al., 1997; Heydecker and Wu, 2001; Tunaru, 2002). The posterior expectation θ̂_i = E(θ_i | y_i) is perhaps the most popular parameter of interest in the safety literature.

The θ̂_i ranking criterion is a point estimate of the underlying mean number of accidents over the long term.6 In order to select a list of sites for safety inspections, we can simply sort the n sites under analysis based on their posterior mean number of accidents and then select the top r locations (r < n). An alternative criterion is the posterior probability that a site is the worst, p_i^s = P(θ_i > θ_j, ∀ j ≠ i | y), with s ∈ [0, +∞). Then, for example, if s = 1, p_i^s is simply the posterior probability that θ_i is the largest, and Σ_{i=1}^{n} p_i^s = 1. This criterion, which was first suggested by Schluter et al. (1997) and applied later by Tunaru (2002), can be interpreted as the posterior probability that the mean accident frequency at a given site is greater than at all the other sites. Unfortunately, this criterion is computationally challenging when working with a large set of sites (n > 2000); in that situation, the values of p_i^s will also be very small.

6 For instance, if θ̂_i is equal to two accidents per year, it means that in five years ten accidents are expected.

Instead of making inference based on p(θ_i | y_i), we can use the posterior distribution of ranks, denoted by p(R_i | y_i), where R_i is the true rank of θ_i and is defined as R_i = Σ_{j=1}^{n} I(θ_i ≥ θ_j), where I(·) is an indicator function (Rao, 2003). The greatest ranks correspond to the most hazardous sites (note that the smallest θ_i has rank 1). The posterior expectation of ranks has been widely recommended as a ranking criterion and is defined as E(R_i | y_i) = Σ_{j=1}^{n} p(θ_i ≥ θ_j | y_i).
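Given simulated draws of the site-specific means, these ranking criteria reduce to simple Monte Carlo averages. The sketch below (with synthetic draws standing in for real posterior output) computes the posterior mean, the posterior expected rank E(R_i | y), and the posterior probability that each site is the worst:

```python
import numpy as np

rng = np.random.default_rng(11)
# Stand-in for MCMC output: G posterior draws of theta for n sites (n kept small here).
G, n = 10000, 8
theta = rng.gamma(shape=rng.uniform(1, 6, n), scale=1.0, size=(G, n))

post_mean = theta.mean(axis=0)                       # E(theta_i | y)

# Rank of each site within every draw: R_i = sum_j 1(theta_i >= theta_j).
ranks = (theta[:, :, None] >= theta[:, None, :]).sum(axis=2)
expected_rank = ranks.mean(axis=0)                   # E(R_i | y)

# Posterior probability that site i is the worst (largest theta in a draw).
p_worst = np.bincount(theta.argmax(axis=1), minlength=n) / G

order = np.argsort(-expected_rank)                   # most hazardous first
for i in order:
    print(f"site {i}: E(theta|y)={post_mean[i]:5.2f}  "
          f"E(R|y)={expected_rank[i]:5.2f}  P(worst)={p_worst[i]:.3f}")
```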

Using hierarchical Bayes models, this ranking criterion has been applied by Tunaru (2002) and Miaou and Song (2005). This criterion can be computed very easily in a full Bayes setting when using hierarchical Poisson models. Some studies have indicated that ranking by E(R_i | y_i) can be more accurate than ranking by E(θ_i | y_i) (Shen and Louis, 2000). It has also been shown that the posterior distribution of ranks is a convenient criterion in similar problems in epidemiology (Rao, 2003). However, for large samples the evaluation of E(R_i | y_i) can be computationally harder than that of absolute ranking criteria such as E(θ_i | y_i).

Traditionally, EB has been used in most of the before-after studies reported in the literature. However, as computational capacity has improved, the full Bayes approach, which involves high-dimensional and complex integrations, has been gaining popularity in accident data analysis during the last decade. Recently, a few researchers have adopted a fully Bayesian framework as an alternative to EB in observational before-after studies (Pawlovich et al., 2006; Aul and Davis, 2006; Li et al., 2008; Lan et al., 2009; Persaud et al., 2010; El-Basyouny and Sayed, 2010; Park et al., 2010; Yanmaz-Tuzel and Ozbay, 2010; Schultz et al., 2011; El-Basyouny and Sayed, 2012).

Finally, the modeler has the possibility of introducing expert criteria or past evidence into the analysis through the prior distribution of the model parameters. The latter is particularly relevant when dealing with data characterized by a small number of observations and/or low mean values (Lord and Miranda-Moreno, 2008). The main steps to apply an FHB before-after study are the following (a small computational sketch follows the list):

• Obtain the posterior expected accident frequencies for treated and comparison sites in the before and after periods, denoted θ_TBi, θ_TAi, θ_CBi, and θ_CAi. Treatment and comparison sites are denoted by T and C, respectively; the before and after periods are denoted by B and A, respectively.
• Compute, for the comparison group, the ratio between the expected accident frequency in the after period θ_CAi and that in the before period θ_CBi, R_i = θ_CAi / θ_CBi.
• Calculate π, which is an estimate of the expected safety of a treated site had the treatment not been implemented. This is obtained by multiplying the posterior expected accident frequency of the treated site in the before period (θ_TB) by the ratio obtained in step 2, i.e. π = θ_TB [θ_CA / θ_CB].
• Calculate the treatment effectiveness, defined as the ratio between θ_TA and π.
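Carried out draw by draw on the simulated posteriors, these steps yield a full posterior, and hence a credible interval, for the treatment-effectiveness index. The sketch below is schematic: the Gamma draws are synthetic placeholders for the posterior samples of θ_TB, θ_TA, θ_CB, and θ_CA that an MCMC run would produce.

```python
import numpy as np

rng = np.random.default_rng(13)
G = 20000
# Stand-in posterior draws (Gamma distributions with illustrative means).
theta_TB = rng.gamma(40, 0.10, G)   # treated site, before period   (mean ~ 4.0)
theta_TA = rng.gamma(24, 0.10, G)   # treated site, after period    (mean ~ 2.4)
theta_CB = rng.gamma(50, 0.10, G)   # comparison group, before      (mean ~ 5.0)
theta_CA = rng.gamma(45, 0.10, G)   # comparison group, after       (mean ~ 4.5)

R = theta_CA / theta_CB             # step 2: comparison-group trend ratio
pi = theta_TB * R                   # step 3: expected frequency without treatment
effectiveness = theta_TA / pi       # step 4: treatment-effectiveness index

print("posterior mean effectiveness:", effectiveness.mean())
print("95% credible interval:", np.percentile(effectiveness, [2.5, 97.5]))
# Values below 1 indicate an estimated reduction in accident frequency.
```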

3.4 A Monte Carlo study: FHB vs. MLE
As discussed above, many alternative Bayesian models and ranking methods have been proposed for road safety analysis. To show the flexibility and advantages of the FHB approach with respect to the traditional EB approach based on the standard NB model, an extensive simulation study was executed. The impact of different model assumptions and methods was evaluated, including: i) the model error assumption on υ_i (Gamma vs. Lognormal), ii) the type of prior (informative versus non-informative) on φ and/or the regression parameters, and iii) the ranking criterion for hotspot identification, p(θ_i | y) vs. p(R_i | y).

To compare alternative priors on υ_i, the hierarchical Poisson/Gamma and Poisson/Lognormal models described in Section 3.2 are used. To illustrate the importance of prior information on model parameters, semi-informative priors for φ are built based on past studies. One often knows (as prior information) that a parameter of interest is expected to be positive or negative, or one has an idea of its range of variation. In some circumstances, model parameter estimates from many studies have been published and summarized. For instance, we can refer to Elvik (2009), which provides a summary of the model parameters of safety performance functions (SPFs) at intersections for pedestrian-vehicle crashes. Also, Miranda-Moreno et al. (2013) used the inverse dispersion parameter estimates from previous studies to build informative priors. Hence, in an FHB framework, we can easily take advantage of past evidence, in particular when the observed data are limited.

3.4.1 Simulation framework
The quasi-simulation was built upon 5,595 three-legged intersections in California. The true parameters were fixed using the estimates for the whole sample. As data generating process, the safety performance function used was μ_i = β₀ (F_i1 + F_i2)^{β₁} (F_i1 F_i2)^{β₂}, where F_i1 and F_i2 are the average major and minor entering daily traffic volumes, respectively, and β₀, β₁, and β₂ are parameters to be estimated from the data. To increase the reliability of inferences, the results were obtained from 100 simulated crash datasets.

The following steps were considered for the data simulation framework:
1. Calculate the μ_i using the safety performance function and the predefined true parameters;
2. Generate the multiplicative random effect υ_i (Gamma vs. Lognormal) based on the true dispersion parameter;
3. Compute the expected accident frequency θ_i by multiplying the values from the previous steps, and generate accident frequencies (counts) using the Poisson distribution with mean θ_i;
4. Obtain the posterior estimates of interest using an MCMC approach (here, we ran 20,000 iterations, of which 5,000 were discarded as burn-in), obtain statistics based on the 100 simulated datasets, and compare outcomes among the different methods (FHB vs. EB);
5. Evaluate the effect of using alternative model settings, prior assumptions, and ranking criteria on parameter accuracy and hotspot identification.

It is important to mention here that we specified two types of priors for φ and β_k: a) prior1, which includes non-informative priors for all model parameters; and b) prior2, which includes an informative prior for φ and semi-informative priors for the regression parameters. An informative Gamma prior was assumed for φ, with φ ~ Gamma(2.5, 1.3), as proposed by Miranda-Moreno et al. (2013) for the same training dataset. A semi-informative prior (a positivity-constrained distribution) is used for each regression parameter, given that as prior information we know that the traffic flow parameters are expected to be positive. Hence, we constrain our regression parameter priors to take only positive values.

3.4.2 Results
Table 1 presents the simulation outcomes used to evaluate the impact of the alternative approaches (FHB vs. MLE) on the accuracy of the inverse dispersion parameter estimates. The results indicate that when dealing with limited data, the FHB approach with an informative hyper-prior choice provides much more accurate estimates than EB based on MLE and the Negative Binomial (NB) model. In accordance with previous studies, when the sample size is small, the latter approach can lead to erroneous estimates (Lord and Miranda-Moreno, 2008; Miranda-Moreno et al., 2013; Heydari et al., 2013). Moreover, the MLE falls short in estimating the uncertainty around the mean values in limited datasets: the estimated confidence intervals are biased. It can also be inferred that credible intervals obtained with the informative prior choice are narrower (hence, more precise) than those estimated with the non-informative hyper-prior specification. Therefore, the

FHB approach, especially when informative priors on the model parameters are employed, offers promise for analyzing data characterized by a small number of observations, which is common in before-after observational studies. Also, non-informative priors on the model parameters might have serious consequences for the outcome when working with limited data.

In the same Table 1, the effect of alternative priors on υ_i (Gamma vs. Lognormal) can be observed. The results indicate that this choice of prior has a marginal effect on parameter accuracy. The outcomes obtained with both priors are very similar, with slightly more accurate estimates obtained with the HPL model. Although in most cases these two priors are expected to generate very similar results, this conclusion cannot be generalized, and a sensitivity analysis is always recommended. It is also important to mention that the HPL model can easily be extended to the multivariate case with or without spatial effects, or to a latent-class model setting.

Regarding the impact of FHB vs. EB on ranking accuracy, Table 2 presents the outcomes obtained from the FHB and EB approaches using the posterior probability of excess as the ranking criterion. From these results one can see that the accuracy, measured with Spearman's correlation coefficient (ρ), is not very sensitive to the approach. The FHB approach generates only slightly better results than those obtained with the EB approach (note that the closer ρ is to 1.0, the better). For instance, the ρ values obtained with the EB approach over the 100 datasets average 0.65 for the sample scenario with 30 sites; for the same sample size, the average ρ value is 0.69 for the FHB approach and the HPG model.

Moreover, from the results presented in the second part of Table 2, one can observe that the outcomes obtained with different ranking criteria (but with the same estimation method and model) are very similar. The rankings based on the posterior of ranks are essentially the same as those obtained with the posterior of θ_i. For instance, the average ρ value is 0.70 for the posterior mean of ranks vs. 0.68 for the posterior mean of the accident frequency using the EB approach. This means that using more complex ranking criteria does not necessarily guarantee more accurate results. It also highlights the importance of carrying out a sensitivity analysis as part of a study using the FHB approach, which allows different crash risk estimates to be computed with little additional effort.

This simulation study also intends to illustrate the flexibility of the FHB approach; under FHB, different model settings, informative priors, and ranking methods can be applied to the same dataset. Also, a sensitivity analysis can easily be executed to investigate the effect of key

model parameters and ranking methods that can potentially affect the outcome of the analysis. In addition, FHB provides a robust and versatile methodology allowing for a consistent incorporation of past evidence and parameter uncertainty.

4 Adopting the Bayesian approach in travel behavior modeling

4.1 Subjective probabilities and random utility models
In essence, the concept of subjective probabilities motivates construing the parameters of a model as random variables in a Bayesian setting. Instead of measuring frequency, subjective probabilities measure beliefs about the occurrence of a particular event and are related to the notion of probability laws under uncertainty (see Savage, 1954). The application of Bayesian microeconometrics to choice modeling is natural precisely because of the concept of subjective probabilities, which is akin to the concept of random utility in discrete choice modeling. Effectively, discrete choice models are derived using stochastic behavioral assumptions, in which utility is considered random to represent the researcher's uncertainty about the individuals' decision-making process. Being unable to account for all the factors that influence a consumer's decisions, the researcher can only ascertain choice probabilities. These choice probabilities, in turn, do not directly measure frequencies of choice, but rather the researcher's judgment or beliefs about the likelihood of each alternative being chosen.

Despite both the conceptual affinity between subjective probabilities and random utility models and the potential benefits that such models offer, transportation choice modelers have in general been reluctant to adopt Bayesian techniques (cf. road safety analysis). This fact was noticed in Brownstone (2001) and little has changed since, especially when compared to applications of Bayesian choice modeling in other fields such as marketing. For instance, Damien and Kockelman (2010) describe the impact of the Bayesian approach as being at an "early stage" for many fields in transportation analysis.

Whereas finding conjugate priors is limited to multinomial logit models (Koop and Poirier, 1993), MCMC estimation can be implemented for any choice model (see Chib et al., 1998). On the one hand, a Gibbs sampler was first developed for binary and ordered choice (Albert and Chib, 1993) and later expanded to multinomial probit models (McCulloch et al., 2000; Imai and van Dyk, 2005). On the other hand, multinomial logit models can be estimated using Metropolis-Hastings

(Frühwirth-Schnatter and Frühwirth, 2010). Metropolis-Hastings has also been used for the nested logit (Lahiri and Gao, 2002; cf. Poirier, 1996) and the continuous cross-nested logit models (Lemp et al., 2010). These estimators have been used in biostatistics and marketing, on some occasions with aggregate data. Bayes estimators based on Metropolis-Hastings within a Gibbs sampler have been used for the hierarchical representation of mixed logit (Train, 2009) and for the system of equations of a multinomial logit kernel with endogenous latent attributes (Daziano and Bolduc, 2011). Fang (2008) and Brownstone and Fang (2009) derived Gibbs samplers for the system of equations of discrete-continuous demand models.7 In economics, Bayes estimators have also been derived for dynamic choice models (Imai et al., 2009), another problem where transportation researchers lag behind relevant developments that are now common in applied econometrics (Aguirregabiria and Mira, 2010).

Even with a straightforward Bayes estimator available for probit (see empirical applications in Bolduc et al., 1997, and Kim et al., 2003), the mixed logit Bayes estimator has dominated the few applications that do use the Bayesian approach for transportation analysis (e.g. Hensher and Greene, 2003; Sillano and Ortúzar, 2005; Scarpa et al., 2008). In particular, because the traffic assignment problem involves a large number of potentially correlated alternatives, the Bayesian approach is especially promising for route choice (Washington et al., 2010). Nevertheless, these few studies do not fully take advantage of the Bayesian approach. For instance, these particular studies have not exploited the ideas of credible intervals, Bayesian model selection, or Bayesian averaging of competing models (see the discussion in Brownstone, 2001). Empirical work in transportation has also failed to exploit Bayes estimates of nonlinear functions of the model parameters (cf. Sonnier et al., 2007) and the use of predictive posteriors.8

7 Brownstone and Fang (2009) address the problem of endogeneity, which was absent in the estimator derived by Fang (2008).
8 Two exceptions are the work of Brownstone and Fang (2009), where the authors perform an out-of-sample check of vehicle choice and utilization forecasts using predictive posteriors, and the credible sets of willingness-to-pay derived in Daziano and Achtnicht (2012).

On the other hand, transportation researchers are now adopting Bayesian tools for certain specific modeling components that are not related to the actual estimation of the parameters, such as efficient design of stated preference experiments (Sándor and Wedel, 2005), missing-data imputation (Washington et al., 2012), and choice-set formation modeling. Latent class models (Greene and Hensher, 2003), which have attracted significant research interest for addressing the

 

problem of discrete random heterogeneity in tastes, are another example of a pseudo-Bayesian framework. Even though the formulation of latent class models is Bayesian, researchers have been using frequentist estimators (cf. Miranda-Moreno et al., 2005).

4.2 A Gibbs sampler for the multinomial probit model
Both the logic and the intuition of the Bayes estimator of the parameters of a discrete choice model are best described by the probit Gibbs sampler derived in McCulloch et al. (2000). Consider the following estimable form of the multinomial probit model

C' Δ_j U_i = C' Δ_j X_i β + C' Δ_j ε_i,    C' Δ_j ε_i ~ N(0_{J−1}, I_{J−1})        (8)

y_i = { j  ⇔ Δ_j U_ij' < 0, ∀ j' ≠ j ;   j'  ⇔ Δ_j U_ij' > max{0, Δ_j U_{i,−j}}, ∀ j' ≠ j }        (9)

where U_i is the vector that contains the random utility of each alternative available to individual i, X_i is an attribute matrix, β are marginal utilities, ε_i is a multivariate normally distributed random shock, Δ_j is a matrix difference operator that normalizes the model with respect to alternative j, U_{i,−j} represents the set of all elements of U_i with the exception of U_ij, C is the Cholesky root of (Δ_j Σ Δ_j')⁻¹, I_{J−1} is the identity matrix of size J−1, and y_i is a choice indicator.

The parameter space of this model is given by both the marginal utilities β and the covariance matrix of the utility in differences Δ_j Σ Δ_j', via the elements of its Cholesky root. Because the dependent variable of equation (8) is latent, it is possible to treat Δ_j U_i as an additional parameter. This step is called data augmentation, and in the steps below we show how the estimator is simplified by the use of the augmented parameter space. As discussed in subsection 2.3, Gibbs sampling requires building a partition of the parameter space. For the multinomial probit model the partition is natural: Δ_j U_i, β, and Δ_j Σ Δ_j'. Given this partition, the steps of the probit Gibbs sampler are the following.

• Start at any given point Δ_j U_i^(0), β^(0), and (Δ_j Σ Δ_j')^(0) in the parameter space.
• For g ∈ {1, ..., G}:
1. If y_i = j, draw Δ_j U_i^(g) from the truncated normal distribution N(Δ_j X_i β^(g−1), (Δ_j Σ Δ_j')^(g−1)) 1(Δ_j U_ij' < 0, ∀ j' ≠ j); otherwise, draw Δ_j U_i^(g) from the truncated normal distribution N(Δ_j X_i β^(g−1), (Δ_j Σ Δ_j')^(g−1)) 1(Δ_j U_ij' > max{0, Δ_j U_{i,−j}}, ∀ j' ≠ j).
2. Draw β^(g) from the normal distribution N((V_β⁻¹ + (C^(g−1)' X)'(C^(g−1)' X))⁻¹ (V_β⁻¹ β̄ + (C^(g−1)' X)' C^(g−1)' Δ_j U^(g)), (V_β⁻¹ + (C^(g−1)' X)'(C^(g−1)' X))⁻¹), where β̄ and V_β are the parameters of the prior distribution p(β) ~ N(β̄, V_β).
3. Draw (Δ_j Σ Δ_j')^(g) from the inverted-Wishart distribution IW(ν + N, Δ_j Σ̄ Δ_j' + Σ_{i=1}^{N} (Δ_j U_i^(g) − Δ_j X_i β^(g))(Δ_j U_i^(g) − Δ_j X_i β^(g))'), where ν and Δ_j Σ̄ Δ_j' are the parameters of the inverted-Wishart prior p(Δ_j Σ Δ_j') ~ IW(ν, Δ_j Σ̄ Δ_j').
4. Update the Cholesky root C^(g) and normalize c_11 = 1 for identification. Following McCulloch et al. (2000), the normalization of scale can be set by postprocessing the chain in a procedure that involves dividing all the parameters of interest β^(g) and the nuisance parameters in C^(g) by c_11^(g) (cf. Nobile, 2000, and Imai and van Dyk, 2005).
5. To make inference on parameter ratios, just calculate the desired ratio using the current samples to generate an instance from the posterior distribution of the measure of interest. For instance, suppose that in a travel mode choice model β_cost represents the additive inverse of the marginal utility of income and β_time represents the marginal disutility of travel time. In this case, a draw from the posterior distribution of the value of time is given by the simple ratio VOT^(g) = β_time^(g) / β_cost^(g) (cf. the procedure discussed at the end of subsection 3.4, which also involves inference on parameter ratios); a short sketch of this postprocessing follows this list.



Because the long-run distribution of the Gibbs sampler is the true posterior of interest, each sample can be treated as a random draw from the joint posterior. Bayes point estimates are then given by sample averages.
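To make steps 1 and 2 concrete, the sketch below implements the data-augmentation and regression draws in Python under the simplifying assumption that the covariance of the differenced errors is fixed at the identity matrix (so C = I and steps 3-4 are skipped). The function name, data layout, prior arguments, and number of iterations are illustrative choices, not a reference implementation of the estimator.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X_diff, y, base, beta_bar, V_beta, G=2000, seed=0):
    """Gibbs sampler for the multinomial probit with Cov(Delta_j eps_i) fixed at I.

    X_diff : (N, J-1, K) array of differenced attributes Delta_j X_i
    y      : (N,) observed choices coded 0..J-1; `base` is the normalizing alternative
    Returns all G draws of beta; a burn-in portion should be discarded afterwards.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    beta_bar = np.asarray(beta_bar, dtype=float)
    N, Jm1, K = X_diff.shape
    rows = [a for a in range(Jm1 + 1) if a != base]        # alternative behind each row of dU
    row_of_choice = np.array([rows.index(c) if c != base else -1 for c in y])
    Vb_inv = np.linalg.inv(V_beta)
    Xs = X_diff.reshape(N * Jm1, K)                        # stacked regressors
    beta = beta_bar.copy()                                 # starting point
    dU = np.zeros((N, Jm1))
    draws = np.empty((G, K))

    for g in range(G):
        # Step 1 (data augmentation): draw each latent difference from its univariate
        # truncated normal conditional, respecting the region implied by the choice y_i.
        mu = X_diff @ beta                                 # (N, J-1) conditional means
        for m in range(Jm1):
            other = np.delete(dU, m, axis=1)
            other_max = other.max(axis=1) if other.size else np.full(N, -np.inf)
            chosen_val = dU[np.arange(N), np.clip(row_of_choice, 0, None)]
            is_base, is_m = (y == base), (row_of_choice == m)
            lo = np.where(is_m, np.maximum(0.0, other_max), -np.inf)
            hi = np.where(is_base, 0.0, np.where(is_m, np.inf, chosen_val))
            dU[:, m] = truncnorm.rvs(lo - mu[:, m], hi - mu[:, m],
                                     loc=mu[:, m], scale=1.0, random_state=rng)
        # Step 2: draw beta from the conditional normal of a Bayesian linear regression
        # of the stacked latent differences on the stacked differenced attributes.
        post_cov = np.linalg.inv(Vb_inv + Xs.T @ Xs)
        post_mean = post_cov @ (Vb_inv @ beta_bar + Xs.T @ dU.reshape(-1))
        beta = rng.multivariate_normal(post_mean, post_cov)
        draws[g] = beta
    return draws
```

Step 5 then reduces to post-processing the stored draws: with hypothetical column indices i_time and i_cost, posterior draws of the value of time are simply draws[:, i_time] / draws[:, i_cost].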

Note that unlike the frequentist estimator of the multinomial probit (see Geweke et al., 1997), the probit Bayes estimator is analytically straightforward. Because samples of the utility function are drawn in the data augmentation step 1, the estimation of the marginal utilities is treated in step 2 as an ordinary regression problem. In addition, note that no maximization is required.

4.3 Case study: mode choice with a Bayesian probit

To illustrate the application of the probit Gibbs sampler we generate a quasi-simulated experiment from revealed-preference data on interurban travel mode choice in Canada (KPMG Peat Marwick and Koppelman, 1990).9 The true underlying model is a multinomial probit model with parameters extracted from the MLE of the original data. A total of 2769 individuals choosing among car, train, and air options were considered. The utility function was assumed linear in parameters and in the following attributes: alternative-specific constants, household income, a large-city indicator, in-vehicle travel time, out-of-vehicle travel time, frequency, and cost. Once the simulated data were generated, a multinomial probit model was estimated using two subsamples: a large one of 2500 individuals and a relatively small one of 250 individuals. Estimates were obtained for the Gibbs sampler with both a non-informative and an informative prior, as well as for the maximum simulated likelihood estimator (MSLE).

In Table 3 we report the estimates of the marginal disutilities of cost and time, as well as the estimates of the derived value of time (VOT, in C$[1989]/hr). The upper and lower limits of the 95% credible and confidence intervals are calculated for the Bayes and frequentist estimators, respectively. Highest posterior density (HPD) credible intervals are calculated as the shortest possible set containing 95% of the posterior mass, based on the MCMC posterior samples. This calculation is valid not only for the parameters of the model, but also for functions of the parameters, such as the samples of the posterior of the value of time. As a result, the derivation of Bayes credible intervals is straightforward. In the case of the MSLE, Krinsky and Robb confidence intervals are constructed (Krinsky and Robb, 1986).
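The two interval constructions used in Table 3 can be sketched in a few lines of Python; the function names and arguments are illustrative, and the Krinsky-Robb routine assumes that the MLE point estimates and their estimated covariance matrix are available.

```python
import numpy as np

def hpd_interval(samples, level=0.95):
    """Highest posterior density interval: the shortest interval containing
    `level` of the posterior mass, computed directly from the MCMC draws."""
    s = np.sort(np.asarray(samples))
    m = int(np.ceil(level * len(s)))           # number of draws inside the interval
    widths = s[m - 1:] - s[:len(s) - m + 1]
    i = int(np.argmin(widths))
    return s[i], s[i + m - 1]

def krinsky_robb_ci(theta_hat, vcov, fn, R=10_000, level=0.95, seed=0):
    """Krinsky-Robb interval for a function fn of the MLE: resample the estimates
    from their asymptotic normal distribution and take percentiles of fn."""
    rng = np.random.default_rng(seed)
    vals = np.array([fn(d) for d in rng.multivariate_normal(theta_hat, vcov, size=R)])
    return tuple(np.percentile(vals, [50 * (1 - level), 50 * (1 + level)]))
```

For example, hpd_interval(vot_draws) would give the Bayesian VOT interval from the posterior samples of the ratio, while krinsky_robb_ci(theta_hat, vcov, lambda b: b[i_time] / b[i_cost]) would give its frequentist counterpart; vot_draws, theta_hat, vcov, i_time, and i_cost are hypothetical names.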

                                                                                                               

9 This widely studied dataset was collected in 1989 by VIA Rail to analyze the demand for a high-speed train in the Toronto-Montréal corridor (see Forinash and Koppelman, 1993; Bhat, 1995; Koppelman and Wen, 2000).

For the large sample, where it is known that both the Bayes and frequentist estimators share the same asymptotic properties, all point estimates are very close to the true values (including the VOT point estimates). The credible interval with non-informative priors is somewhat wider than its frequentist counterpart. When an informative prior on the marginal utilities is considered, the resulting VOT credible interval is very tight. Whereas no problems are detected for the large sample, several issues arise when working with the smaller sample. Bias in the point estimates is detected in almost all cases, with the exception of βtime and VOT in the informative-prior case. At the same time, all credible intervals contain the true parameters, so the hypothesis that the parameters equal their true values cannot be rejected. However, the hypothesis of a null VOT cannot be rejected in the frequentist case (at the 95% confidence level). In fact, the same conclusion applies to βcost, a result which indicates weak identification of the ratio representing VOT. Weak identification also explains the high VOT standard error. Note that a high standard deviation is also obtained for the VOT posterior distribution when a non-informative prior is considered. Although some VOT outliers are generated in this case, the credible interval is relatively tight compared to the confidence interval. The credible interval contains the true parameter and does not contain zero.

In sum, this exercise shows that the Bayesian approach alleviates inference problems in parameter ratios when the ratios are weakly identified. Furthermore, the use of an informative prior can in some instances entirely prevent weak identification problems, which are especially troublesome in constructing intervals for welfare measures.

5 Summary and conclusions

Bayes estimators represent a decision-making process under uncertainty in which the optimal decision is based on the minimum expected cost of being wrong (or, equivalently, the maximum expected utility of being right) given the researcher's beliefs about the state of the world. The decision is compatible with the scientific method of updating knowledge given the evidence provided by data. There are several reasons to encourage the use of Bayesian statistics in transportation modeling. To begin with, Bayes estimators are both gradient and Hessian free. Since no maximization is involved, Bayes estimators are particularly well suited for sophisticated models that may have non-convex likelihood functions and weakly identified parameters. Bayes estimators work for small samples, as well as for data that do not constitute a random sample. At the same time, their asymptotic properties coincide with those of the maximum likelihood estimator. In addition, the Bayesian framework is particularly well suited for models that include latent variables, such as partially observed variables, missing data, utility, or attitudes and multidimensional quality attributes in hybrid choice models.

When the analytical posterior is not known, Bayesian inference exploits Monte Carlo sampling. Computational Bayesian estimators rely on simulation-aided inference, but they avoid the potential bias found in maximizing a simulated likelihood.

Computational Bayesian estimators also provide a straightforward, and easily interpreted, answer to the interval estimation problem. Credible sets represent the region of the posterior that contains the true parameter with a given probability (the credible level). Given a significance level α, a highest posterior density (HPD) credible interval, defined as the smallest interval that contains (1−α)100% of the posterior mass, can be computed using a very simple algorithm. For instance, in this paper we calculated HPD intervals for the inverse dispersion parameter and for the value of time. Since the probability of the true parameter being inside the credible interval is 1−α, hypothesis testing is also straightforward in a Bayesian setting. In contrast, frequentist hypothesis testing is much harder to interpret and has some conspicuous problems, such as not obeying the likelihood principle.

In this paper we have reviewed the empirical application of Bayesian statistics in transportation modeling. In road safety analysis, where small samples are common, Bayes estimators are already the dominant analytical tool. Within the class of Bayesian methods applied in road safety analysis, two main approaches exist: the empirical Bayes (EB) method and the full hierarchical Bayes (FHB) approach. In EB, crash data are used first to estimate model parameters and then used again to make posterior inference. Despite its popularity, EB has been criticized for its inability to properly handle parameter uncertainty and for its lack of justification for using the data twice. FHB, on the other hand, overcomes these limitations with a more flexible modeling framework, although it does require analysts to model the calibration process directly using specialized statistical software.

Several studies have looked at the potential benefits of the FHB method when dealing with deficient data (a problem referred to as the low-mean and low-sample problem) or under the presence of temporal and spatial correlation or latent-class data. In a full Bayesian setting, more complex

models have been proposed, such as multivariate Poisson models with and without spatial effects, mixture (latent-class) models, Poisson-Weibull models, and others. In spite of this rich literature, few studies in the traffic safety literature have investigated the practical implications of these two alternative approaches for the estimated model parameters, in particular when informative priors are used. It is known that when using non-informative priors the results should converge to those obtained in the EB approach with ML estimators. When working with large samples this is not an issue; however, with small-sample or low-mean datasets the differences between the two approaches can become relevant.

In the case of discrete choice models of travel behavior, Bayes estimators are still not part of the empirical researcher's toolkit. Although the Bayesian approach may appear less attractive inasmuch as the effect of prior distributions becomes less relevant given the large sample sizes of choice data, the use of predictive posteriors and credible intervals (instead of confidence intervals), as well as the treatment of weakly identified models, both favor the adoption of Bayesian tools.

As a specific example, we have offered in this paper a simple simulation exercise to illustrate the construction of credible intervals for the value of time. In particular, we have shown that when problems of weak identification emerge, the Bayes estimates perform better than MLE, and even more so when informative priors are taken into account.

References

Abbess, C., Jarrett, D.F., and Wright, C.C. (1981). Accidents at Blackspots: Estimating the Effectiveness of Remedial Treatment, with Special Reference to the Regression-to-mean Effect. Traffic Engineering and Control, 22(10), 535-542.
Aguero-Valverde, J., and Jovanis, P.P. (2009). Bayesian Multivariate Poisson Log-Normal Models for Crash Severity Modeling and Site Ranking. Transportation Research Record, 2136, 82-91.
Aguirregabiria, V., and Mira, P. (2010). Dynamic discrete choice structural models: a survey. Journal of Econometrics, 156(1), 38-67.
Albert, J., and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669-679.
Bhat, C.R. (1995). A heteroscedastic extreme-value model of intercity mode choice. Transportation Research B, 29(6), 471-483.
Bolduc, D., Fortin, B., and Gordon, S. (1997). Multinomial probit estimation of spatially interdependent choices: An empirical comparison of two new techniques. International Regional Science Review, 20, 77-101.
Brijs, T., Karlis, D., Van den Bossche, F., and Wets, G. (2007). A Bayesian model for ranking hazardous road sites. Journal of the Royal Statistical Society series A, 170, 1-17.
Brownstone, D. (2001).
Discrete   choice   modelling   for   transportation,   in   Hensher,   D.A.   (ed.).   Travel   Behaviour   Research:  The  Leading  Edge,  Pergamon  Press,  Oxford,  97-­‐124.  

Brownstone,   D.,   and   Fang,   H.   (2009).   A   vehicle   ownership   and   utilization   choice   model   with   endogenous   residential  density.  Working  paper,  Department  of  Economics,  University  of  California,  Irvine.   Cheng,   L.,   S.R.   Geedipally,   and   D.   Lord   (2012)   Examining   the   Poisson-­‐Weibull   Generalized   Linear   Model   for   Analyzing  Crash  Data.  Safety  Science,  Vol.  54,  pp.  38-­‐42.   Chib,  S.,  Greenberg,  E.,  and  Chen,  Y.  (1998).  MCMC  methods  for  fitting  and  comparing  multinomial  response   models.  Econometrics  9802001,  EconWPA.   Damien,   P.,   and   Kockelman,   K.M.   (2010).   Preface   to   special   issue   on   Bayesian   methods.   Transportation   Research  Part  B,  44(5),  631-­‐632.   Daziano,   R.A.,   and   Achtnicht,   M.   (2012).   Accounting   for   uncertainty   in   willingness   to   pay   for   environmental   benefits.  Working  Paper,  School  of  Civil  and  Environmental  Engineering,  Cornell  University.     Daziano,  R.A.,  and  Bolduc,  D.  (2011).  Incorporating  pro-­‐environmental  preferences  toward  green  automobile   technologies  through  a  Bayesian  Hybrid  Choice  Model.  Transportmetrica.     Edwards,   Y.,   Allenby,   G.   (2003).   Multivariate   analysis   of   multiple   response   data.   Journal   of   Marketing   Research,  40,  321-­‐334.   El-­‐Basyounya   K.   Sayed   T.   (2012)   Depth-­‐based   hotspot   identification   and   multivariate   ranking   using   the   full   Bayes  approach,  Accident  Analysis  and  Prevention,  Vol.  50,  pp.  1082–1089.   Fang,  A.  (2008).  A  discrete-­‐continuous  model  of  households'  vehicle  choice  and  usage,  with  an  application  to   the  effects  of  residential  density.  Transportation  Research  Part  B,  42(9),  736-­‐758.   Forinash,  C.V.,  and  Koppelman,  F.S.  (1993).  Application  and  interpretation  of  nested  logit  models  of  intercity   mode  choice.  Transportation  Research  Record,  1413,  98-­‐106.   Frühwirth-­‐Schnatter,  S.,  and  Frühwirth,  R.  (2010).  Data  augmentation  and  MCMC  for  binary  and  multinomial   logit  models.  In  Thomas  Kneib  and  Gerhard  Tutz,  editors,  Statistical  Modelling  and  Regression  Structures,   111-­‐132.  Physica-­‐Verlag,  Heidelberg.   Gelfand,   A.E.,   and   Smith,   A.F.M.   (1990).   Sampling-­‐Based  approaches  to  calculating  marginal  densities.  Journal   of  the  American  Statistical  Association,  85,  398-­‐409.   Geweke,   J.   (1991).   Efficient   simulation   from   the   multivariate   normal   and   Student-­‐t   distributions   subject   to   linear   constraints,   in   E.   M.   Keramidas,   ed.,   Computer   Science   and   Statistics:   Proceedings   of   the   Twenty-­‐ Third  Symposium  on  the  Interface,  Interface  Foundation  of  North  America,  Inc.,  Fairfax,  571-­‐578.   Geweke,   J.F.,   Keane,   M.P.,   Runkle,   D.E.   (1994).   Alternative   computational   approaches   to   inference   in   the   multinomial  probit  model.  Review  of  Economics  and  Statistics,  76,  609-­‐632.   Geweke,   J.F.,   Keane,   M.P.,   Runkle,   D.E.   (1997).   Statistical   inference   in   the   multinomial   multiperiod   probit   model.  Journal  of  Econometrics,  80,  125-­‐165.   Greene,  W.H.,  and  Hensher,  D.  (2003).  A  latent  class  model  for  discrete  choice  analysis:  contrasts  with  mixed   logit.  Transportation  Research  Part  B,  37,  681-­‐698.   Hajivassiliou,   V.,   McFadden,   D.   
(1998). The method of simulated scores for the estimation of LDV models. Econometrica, 66, 863-896.
Hauer, E. (1997). Observational before-after studies in road safety: estimating the effect of highway and traffic engineering measures on road safety. Elsevier Science Ltd.
Hauer, E., and Persaud, B.N. (1987). How to estimate the safety of rail-highway grade crossings and the safety effects of warning devices. Transportation Research Record, 1114, 131-140.
Hensher, D.A., and Greene, W.H. (2003). The mixed logit model: The state of practice. Transportation, 30(2), 133-176.
Heydecker, B.G., and Wu, J. (2001). Identification of road sites for accident remedial work by Bayesian statistical methods: an example of uncertain inference. Advances in Engineering Software, 32, 859-869.
Higle, J.L., and Witkowski, J.M. (1988). Bayesian identification of hazardous sites. Transportation Research Record, 1185, 24-35.
Huang, H., and Abdel-Aty, M. (2010). Multilevel data and Bayesian analysis in traffic safety. Accident Analysis & Prevention, 42(6), 1556-1565.
Imai, K., and Van Dyk, D.A. (2005). A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics, 124, 311-334.
Imai, S., Jain, N., and Ching, A. (2009). Bayesian estimation of dynamic discrete choice models. Econometrica, 77(6), 1865-1899.
Kim, Y., Kim, T.Y., and Heo, E. (2003). Bayesian estimation of multinomial probit models of work trip choice. Transportation, 30, 351-365.

Koop, G., and Poirier, D.J. (1993). Bayesian analysis of logit models using natural conjugate priors. Journal of Econometrics, 56, 323-340.
Koppelman, F.S., and Wen, C-H. (2000). The paired combinatorial logit model: properties, estimation and application. Transportation Research B, 34, 75-89.
Krinsky, I., and Robb, A.L. (1986). On approximating the statistical properties of elasticities. Review of Economics and Statistics, 68, 715-719.
Lahiri, K., and Gao, J. (2002). Bayesian analysis of nested logit model by Markov Chain Monte Carlo. Journal of Econometrics, 111, 103-133.
Lan, B., and Persaud, B. (2011). Fully Bayesian Approach to Investigate and Evaluate Ranking Criteria for Black Spot Identification. Transportation Research Record, 2237, 117-125.
Lemp, D., Kockelman, K.M., and Damien, P. (2010). The continuous cross-nested logit model: Formulation and application for departure time choice. Transportation Research Part B, 44(5), 646-661.
Lord, D., Washington, S.P., and Ivan, J.N. (2005b). Poisson, Poisson-gamma and zero inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention, 37(1), 35-46.
McCulloch, R., and Rossi, P. (1994). An exact likelihood analysis of the multinomial probit model. Journal of Econometrics, 64, 207-240.
McCulloch, R., and Rossi, P. (2000). Bayesian analysis of the multinomial probit model, in R. Mariano, T. Schuermann, and M. Weeks, eds., Simulation-Based Inference in Econometrics, Cambridge University Press, New York.
McCulloch, R.R., Polson, N.G., and Rossi, P.E. (2000). Bayesian analysis of the multinomial probit model with fully identified parameters. Journal of Econometrics, 99, 173-193.
McFadden, D.L., and Train, K.E. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), 447-470.
Miaou, S.P., and Lord, D. (2003). Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus Empirical Bayes. Transportation Research Record, 1840, 31-40.
Miaou, S.P., and Song, J.J. (2005). Bayesian ranking of sites for engineering safety improvement: decision parameter, treatability concept, statistical criterion and spatial dependence. Accident Analysis and Prevention, 37, 699-720.
Miranda-Moreno, L.F., Fu, L., Saccomano, F., and Labbe, A. (2005). Alternative risk models for ranking locations for safety improvement. Transportation Research Record, 1908, 1-8.
Miranda-Moreno, L.F., Labbe, A., and Fu, L. (2007). Multiple Bayesian testing procedures for selecting hazardous sites. Accident Analysis and Prevention, 39, 1192-1201.
Nobile, A. (2000). Bayesian multinomial probit models with a normalization constraint. Journal of Econometrics, 99, 335-345.
Persaud, B., Lyon, C., and Nguyen, T. (1999).
Empirical   Bayes   procedure   for   ranking   sites   for   safety   investigation  by  potential  for  safety  improvement.  Transportation  Research  Record,  1665,  7-­‐12.   Poirier,  D.J.  (1996).  A  Bayesian  analysis  of  nested  Logit  models.  Journal  of  Econometrics,  75,  163-­‐181.   Qin,  X.,  Ivan,  J.N.,  and  Ravishanker,  N.  (2004).  Selecting  exposure  measures  in  crash  rate  prediction  for  two-­‐ lane  highway  segments.  Accident  Analysis  and  Prevention,  36  (2),  183-­‐191.   Rao,  J.N.  (2003).  Small  area  estimation.  John  Wiley  and  Sons.   Sándor,  Z.,  and  Wedel,  M.  (2005).  Heterogeneous  conjoint  choice  designs.  Journal  of  Marketing  Research,    42,   210-­‐218.   Savage,  L.  J.  (1954).  The  Foundations  of  Statistics,  John  Wiley,  New  York.   Schluter,  P.J.,  Deely,  J.J.,  and  Nicholson,  A.J.  (1997).  Ranking  and  selecting  motor  vehicle  accident  sites  by  using   a  hierarchical  Bayesian  model.  The  Statistician,  46  (3),  293-­‐316.   Scott,  S.  (2003).  Data  augmentation  for  the  Bayesian  analysis  of  multinomial  logit  models.  Proceedings  of  the   American  statistical  association  section  on  Bayesian  statistical  science.   Shen,  W.,  and  Louis,  T.  A.  (2000),  Triple-­‐goal  estimates  for  disease  mapping.  Statistics  in  Medicine  19,  2295-­‐ 2308.   Song,  J.J.,  Ghosh,  M.,  Miaou,  S.,  and  Mallick,  B.  (2006).  Bayesian  multivariate  spatial  models  for  roadway  traffic   crash  mapping.  Journal  of  Multivariate  Analysis,  97,  246-­‐273.     Sonnier,   G.,   Ainslie,   A.,   Otter,   T.   (2007).   Heterogeneity   distributions   of   willingness-­‐to-­‐pay   in   choice   models.   Quantitative  Marketing  and  Economics,  5,  313-­‐331.   Train,  K.  (2009).  Discrete  choice  methods  with  simulations.  Cambridge  University  Press.  

Tunaru,   R.   (2002)   Hierarchical   Bayesian   models   for   multiple   count   data.   Austrian   Journal   of   Statistics,   31   (3),   221-­‐229.   Washington,   S.,   Congdon,   P.,   Karlaftis,   M.   and   Mannering,   F.   (2010).   The   Bayesian   multinomial   logit   model:   theory   and   route   choice   example.   Transportation   Research   Record:   Journal   of   the   Transportation   Research   Board,  No.  2136,  28-­‐36.   Washington,   S.,   Ravulaparthy,   S.,   Rose,   J.,   Hensher,   D.   and   Pendyala,   R.   (2012).   Bayesian   imputation   of   non-­‐ chosen   attribute   values   in   revealed   preference   surveys.   Journal   of   Advanced   Transportation.   Doi   10.1002/atr.201.   Zou,   Y.,   Y.   Zhang,   and   D.   Lord   (2012)   Application   of   finite   mixture   of   negative   binomial   regression   models   with  varying  weight  parameters  for  vehicle  crash  data  analysis.  Accident  Analysis  &  Prevention,  Vol.  50,  pp.   1042-­‐1051.  

TABLES    

Table 1. Impact of prior assumptions on the inverse dispersion parameter and model error (υi)
(entries are means over the 100 simulations, with standard deviations in parentheses)

Data       Sample size    HPG model                       HPL model                       NB model
                          prior1          prior2          prior1          prior2          MLE
PG data    30 sites       66.83 (180.3)   1.73 (1.00)     50.71 (146.7)   1.67 (1.1)      1789.98 (0.3)
           50 sites       43.99 (120.0)   1.52 (0.84)     36.64 (98.7)    1.46 (0.9)      569.13 (3.9)
           100 sites      7.38 (22.7)     1.22 (0.57)     8.81 (27.3)     1.10 (0.6)      1.12 (0.1)
PLN data   30 sites       62.36 (168.7)   1.57 (0.92)     40.76 (127.1)   1.53 (0.9)      1664.88 (4.5)
           50 sites       53.07 (129.6)   1.50 (0.81)     39.45 (106.6)   1.42 (0.9)      660.47 (3.6)
           100 sites      7.03 (16.4)     1.23 (0.57)     5.67 (16.2)     1.09 (0.6)      1.40 (2.4)

s.d. (in parentheses): standard deviation over the 100 simulations.
PG data: data generated with the Poisson/Gamma (PG) model; PLN data: data generated with the Poisson/Lognormal (PLN) model.
HPG: hierarchical Poisson/Gamma model; HPL: hierarchical Poisson/Lognormal model; NB: negative binomial model estimated by MLE.
Prior1: non-informative; Prior2: informative.

Table 2: Effect of ranking methods based on Spearman's correlation coefficient (ρ)

(a) Bayesian (full hierarchical Bayes) approach vs. EB approach
(entries are means over the 100 simulations, with standard deviations in parentheses)

Data       Sample size    HPG model                    HPL model                    NB model (EB)
                          prior1         prior2        prior1         prior2        MLE
PG data    30 sites       0.66 (0.14)    0.69 (0.12)   0.67 (0.13)    0.68 (0.12)   0.65 (0.15)
           50 sites       0.67 (0.09)    0.68 (0.09)   0.68 (0.09)    0.67 (0.09)   0.67 (0.09)
           100 sites      0.69 (0.06)    0.69 (0.06)   0.69 (0.06)    0.68 (0.06)   0.69 (0.06)
PLN data   30 sites       0.72 (0.13)    0.75 (0.10)   0.73 (0.12)    0.74 (0.10)   0.71 (0.14)
           50 sites       0.74 (0.09)    0.75 (0.09)   0.74 (0.09)    0.76 (0.08)   0.74 (0.09)
           100 sites      0.76 (0.06)    0.77 (0.05)   0.77 (0.05)    0.77 (0.05)   0.77 (0.05)

(b) Ranking criteria: posterior distribution of θi and of Ri

Data       Sample size    Posterior of θi              Posterior of Ri              EB estimator
                          prior1         prior2        prior1         prior2        (negative binomial)
PG data    50 sites       0.69 (0.08)    0.70 (0.07)   0.68 (0.09)    0.70 (0.07)   0.69 (0.08)
           100 sites      0.69 (0.07)    0.69 (0.07)   0.68 (0.07)    0.68 (0.07)   0.69 (0.07)

s.d. (in parentheses): standard deviation over the 100 simulations.
PG data: data generated with the Poisson/Gamma model; PLN data: data generated with the Poisson/Lognormal model.
Prior1: non-informative; Prior2: informative.

Table  3:  Analysis  of  interurban  travel  mode  choice        

Large sample: 2500 individuals
                                     βcost        βtime        VOT (C$[1989]/hr)
True values                          -0.0182      -0.0060      19.78
Bayes, non-informative prior
  mean                               -0.0189      -0.0059      19.81
  s.d.                                0.0033       0.0007       8.35
  Lower bound 95% CI                 -0.0251      -0.0073      12.99
  Upper bound 95% CI                 -0.0120      -0.0047      31.03
Bayes, informative prior
  mean                               -0.0185      -0.0058      19.31
  s.d.                                0.0024       0.0005       2.56
  Lower bound 95% CI                 -0.0230      -0.0067      15.05
  Upper bound 95% CI                 -0.0136      -0.0049      25.12
MLE
  mean                               -0.0189      -0.0058      19.15
  s.e.                                0.0032       0.0007       4.30
  Lower bound 95% CI                 -0.0252      -0.0072      12.53
  Upper bound 95% CI                 -0.0126      -0.0044      29.30

Small sample: 250 individuals
Bayes, non-informative prior
  mean                               -0.0256      -0.0046      15.00
  s.d.                                0.0085       0.0014      191.33
  Lower bound 95% CI                 -0.0425      -0.0077       5.41
  Upper bound 95% CI                 -0.0092      -0.0023      28.61
Bayes, informative prior
  mean                               -0.0201      -0.0060      20.06
  s.d.                                0.0067       0.0013      10.92
  Lower bound 95% CI                 -0.0341      -0.0088      10.08
  Upper bound 95% CI                 -0.0079      -0.0037      41.27
MLE
  mean                               -0.0129      -0.0030      15.29
  s.e.                                0.0066       0.0014      370.44
  Lower bound 95% CI                 -0.0258      -0.0057      -14.39
  Upper bound 95% CI                  0.0000      -0.0003      91.36

s.e.: standard error of the MLE; s.d.: posterior standard deviation.
CI: credible interval (Bayes) or confidence interval (MLE).