Zeng et al.
A Bayesian spatial random parameters Tobit model for analyzing crash rates on roadway segments

Qiang Zeng (a, *), Huiying Wen (a), Helai Huang (b), Mohamed Abdel-Aty (c)

(a) School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, Guangdong 510641, PR China
(b) Urban Transport Research Center, School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, PR China
(c) Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32816-2450, United States
ABSTRACT
This study develops a Bayesian spatial random parameters Tobit model to analyze crash rates on road segments, in which both spatial correlation between adjacent sites and unobserved heterogeneity across observations are accounted for. Crash-rate data for a three-year period on road segments within a road network in Florida are collected to compare the performance of the proposed model with that of a (fixed parameters) Tobit model and a spatial (fixed parameters) Tobit model in the Bayesian context. A significant spatial effect is found in both spatial models, and the Deviance Information Criterion (DIC) results show that including spatial correlation in the Tobit regression considerably improves model fit, which supports considering cross-segment spatial correlation. The spatial random parameters Tobit regression has a lower DIC value than the spatial Tobit regression, suggesting that accommodating unobserved heterogeneity can further improve model fit once spatial correlation has been considered. Moreover, the random parameters Tobit model provides a more comprehensive understanding of the effect of speed limit on crash rates than its fixed parameters counterpart, which suggests that it should be considered a good alternative for crash rate analysis.
Keywords: Crash rate; Tobit model; Spatial correlation; Random parameters; Bayesian inference.
* Corresponding author.
E-mail addresses: [email protected] (Q. Zeng), [email protected] (H. Wen), [email protected] (H. Huang), [email protected] (M. Abdel-Aty)
1. Introduction
Given the enormous importance of highway safety, gaining a better understanding of how the probability of crashes is affected by relevant risk factors has long been a focus of research, in the hope that it will provide useful guidance for laws, regulations and countermeasures aimed at reducing crash occurrence. In most cases, detailed driving data, such as acceleration, braking and steering information, are not available. As a consequence, researchers investigate the relationship between risk factors and crash frequency, i.e., the number of crashes occurring at certain road entities (e.g., road segments or intersections) over a specified period (e.g., weeks, months or years). Because crash frequencies are non-negative integers, statistical count models have been widely employed. Poisson regression is the basic model; it assumes that crash occurrence is a Poisson process and requires the mean and variance of crash frequency to be equal (Jovanis and Chang, 1986). To accommodate certain characteristics of crash data, such as over-dispersion, under-dispersion, excess zero observations, spatiotemporal correlation, multilevel structure and unobserved heterogeneity, many variations of the Poisson model have been proposed, including Poisson-gamma/negative binomial (Miaou, 1994), Poisson-lognormal (Miaou et al., 2005), gamma (Oh et al., 2006), Conway-Maxwell-Poisson (Lord et al., 2008), zero-inflated (Huang and Chin, 2010), generalized estimating equation (Lord and Persaud, 2000), generalized additive (Xie and Zhang, 2008), multilevel (Huang and Abdel-Aty, 2010; Lee et al., 2015), random effects (Shankar et al., 1998), random parameters (Anastasopoulos and Mannering, 2009), finite mixture (Park and Lord, 2009), Markov switching (Malyshkina et al., 2009), latent class (Peng and Lord, 2011), and generalized ordered-response models (Castro et al., 2012).
Besides, some artificial intelligence models, such as the neural network (Chang, 2005; Huang et al., 2016; Zeng et al., 2016a, b), Bayesian neural network (Xie et al., 2007), and support vector machine (Li et al., 2008), have also been developed to predict crash frequencies, as they can offer better approximation performance than traditional count models. More detailed descriptions and assessments of these models can be found in the review papers of Lord and Mannering (2010) and Mannering and Bhat (2014). From another perspective, in recent years, increasing efforts have been made to develop methods for crash rate analysis, which can be deemed good alternatives to the traditional crash-frequency approaches (Anastasopoulos et al., 2008). Compared with crash counts, crash rates are more appealing because they neutralize the effect of crash exposure, form a standardized measure of the risk of collision involvement, and may be a more effective criterion for identifying hotspots (Ma et al., 2015b; Xu et al., 2014b). Moreover, crash rates are commonly adopted in accident reporting systems. For example, fatality and injury rates per 100 million vehicle miles traveled are used in the annual crash reports of the National Highway Traffic Safety Administration (NHTSA, 2012).
Unlike crash frequencies (which are discrete integers), crash rates are continuous, non-negative numbers. Zero crash rates may be observed at some sites over a finite time duration¹, resulting in data left-censored at zero. To deal with the censoring problem, Anastasopoulos et al. (2008) first introduced the Tobit model to analyze crash rates. Later on, Anastasopoulos et al. (2012a, b) proposed a random parameters Tobit model to account for unobserved heterogeneity across observations and a multivariate Tobit model for modeling crash-injury-severity rates simultaneously. Furthermore, a two-stage bivariate logistic-Tobit model was proposed for jointly modeling crash severity and crash rate by severity (Xu et al., 2014b). A correlated random parameters Tobit model was developed to capture the interactions between independent variables (Yu et al., 2015), and a random parameters Tobit model with refined-scale panel data was developed to accommodate serial correlation across observations (Ma et al., 2015a). Caliendo et al. (2015) compared the random parameters Tobit regression with the random parameters negative binomial model and found that the significance of some explanatory variables was not consistent between the two models. In addition, Ma et al. (2015b) advocated a lognormal hurdle model with a flexible scale parameter to approximate the distribution of crash rates more accurately. Most of the proposed methods for analyzing crash rates are based on Tobit regression. However, none of them has accounted for spatial correlation between neighboring sites. In highway safety analysis, spatial correlation is an important issue to consider, because observation units in close proximity may share confounding factors.
Recently, significant spatial effects have been found in crash prediction models for road entities (Abdel-Aty and Wang, 2006; Aguero-Valverde and Jovanis, 2008, 2010; Barua et al., 2014, 2016; El-Basyouny and Sayed, 2009; Mitra, 2009; Guo et al., 2010), road networks (Zeng and Huang, 2014), regional units (e.g., wards, neighborhoods, counties, traffic analysis zones) (Aguero-Valverde and Jovanis, 2006; Aguero-Valverde, 2013; Dong et al., 2014, 2015; Noland and Quddus, 2004; Quddus, 2008; Xu et al., 2014a; Xu and Huang, 2015) and injury severity (Castro et al., 2013). Congdon (2006) pointed out that ignoring spatial dependence may lead to underestimation of variability. Moreover, Aguero-Valverde and Jovanis (2008) summarized the advantages of including spatial correlation: (1) with spatial correlation, site estimates borrow strength from adjacent sites, thereby improving model estimation; (2) spatial dependence can serve as a surrogate for unknown and related covariates, thus reducing model misspecification; and (3) spatial dependence can provide information for grouping sites into corridors for further analysis. Methodologically, a variety of approaches, ranging from simultaneous autoregressive (Quddus, 2008), conditional autoregressive (CAR) (Aguero-Valverde and Jovanis, 2006, 2008, 2010; Ahmed et al., 2011; Dong et al., 2014, 2015; Guo et al., 2010; Mitra, 2009; Quddus, 2008; Siddiqui et al., 2012; Xu et al., 2014a) and spatial error models (Quddus, 2008), to multiple membership (El-Basyouny and Sayed, 2009), extended multiple membership (El-Basyouny and Sayed, 2009), geographically weighted regression (Hadayeghi et al., 2003), geographically weighted Poisson regression (Hadayeghi et al., 2010; Xu and Huang, 2015) and generalized estimating equations (Abdel-Aty and Wang, 2006), have been proposed to assess spatial effects in crash-frequency data. Among these approaches, the CAR prior is the most prevalent for modeling spatial correlation. Moreover, as noted by Quddus (2008), a CAR model under the Bayesian framework can lead to more appropriate estimation results than classical spatial models. In this study, the main objective is to develop a spatial model to analyze crash rates on roadway segments, which can be formulated by incorporating the CAR prior into a Tobit model. To also accommodate unobserved heterogeneity across observations, the coefficients of the covariates can further be set as random parameters. To demonstrate the proposed models, a (fixed parameters) Tobit, a spatial (fixed parameters) Tobit and a spatial random parameters Tobit model are compared in the Bayesian context.

¹ This phenomenon may be caused by several reasons. One is simply that no crashes occurred at the sites during the observation period. Another is that non-injury crashes are not reported when the property damage does not exceed a specific value. Anastasopoulos et al. (2012a, b) illustrated this phenomenon in more detail.
2. Methodology
In this section, the formulations of the three candidate models for crash rate analysis, namely the Tobit, spatial Tobit and spatial random parameters Tobit regressions, are first specified explicitly under the Bayesian framework. Then, a criterion for model comparison in the context of Bayesian inference, the Deviance Information Criterion (DIC), is introduced.

2.1. Model specification

2.1.1. Tobit model
Introduced by James Tobin (1958), the Tobit model is a regression model for a continuous dependent variable that is censored at a lower threshold (left-censored), an upper threshold (right-censored), or both. Crash rates are generally left-censored at zero, because crashes may not be reported at some sites during the study period (Anastasopoulos et al., 2008). The Tobit regression for modeling crash rates is expressed as follows:

$$Y_{it}^{*} = \beta_0 + \sum_{m=1}^{M} \beta_m x_{itm} + \varepsilon_{it}, \tag{1}$$

$$Y_{it} = \begin{cases} Y_{it}^{*}, & \text{if } Y_{it}^{*} > 0 \\ 0, & \text{if } Y_{it}^{*} \le 0 \end{cases}, \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T. \tag{2}$$

In the above equations, $Y_{it}$ and $x_{itm}$ are the observed crash rate and the value of the $m$th covariate at site $i$ during period $t$, respectively. $M$, $N$ and $T$ are the numbers of covariates, observed sites and periods, respectively. $\beta_0$ is the constant, while $\beta_m$ is the estimable coefficient of the $m$th covariate. $Y_{it}^{*}$ is a latent variable observed only when positive, and $\varepsilon_{it}$ denotes the unstructured error, which is assumed to independently follow a normal distribution with zero mean and standard deviation $\sigma$ ($\sigma > 0$), that is,

$$\varepsilon_{it} \sim \text{normal}(0, \sigma^2). \tag{3}$$
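To make Eqs. (1) to (3) concrete, the sketch below evaluates the log-likelihood they imply for a Tobit model left-censored at zero: a zero crash rate contributes $\log\Phi(-\mu_{it}/\sigma)$, the probability that the latent rate is non-positive, and a positive rate contributes the normal log-density of $Y_{it}$. It is a minimal plain-Python illustration under stated assumptions, not the WinBUGS code used in this study; the function names and the list-of-lists data layout are hypothetical.

```python
import math


def log_norm_cdf(z):
    """log Phi(z), computed through erfc so the lower tail keeps its precision."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else float("-inf")


def log_norm_pdf(z):
    """Log-density of the standard normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def tobit_loglik(y, X, beta0, beta, sigma):
    """Log-likelihood of a Tobit model left-censored at zero (Eqs. 1-3).

    y     : observed crash rates Y_it (zeros are treated as censored)
    X     : covariate vectors x_it, one per observation (length M each)
    beta0 : constant; beta: coefficients beta_1..beta_M; sigma: sd of epsilon_it
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for y_it, x_it in zip(y, X):
        # Linear predictor of the latent crash rate Y*_it in Eq. (1)
        mu = beta0 + sum(b * x for b, x in zip(beta, x_it))
        if y_it <= 0.0:
            # Censored observation: P(Y*_it <= 0) = Phi(-mu / sigma)
            ll += log_norm_cdf(-mu / sigma)
        else:
            # Uncensored observation: normal density of Y*_it evaluated at y_it
            ll += log_norm_pdf((y_it - mu) / sigma) - math.log(sigma)
    return ll


if __name__ == "__main__":
    # Hypothetical toy data: one censored and one positive crash rate
    print(tobit_loglik([0.0, 1.0], [[0.0], [0.0]], beta0=0.0, beta=[0.5], sigma=1.0))
```

Maximizing this function over $(\beta_0, \beta, \sigma)$ would give the classical Tobit estimates; in the Bayesian setting the same likelihood is combined with the priors described in Section 4.1.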
2.1.2. Spatial Tobit model
The spatial Tobit model can be defined by adding to Eq. (1) a residual term with the Gaussian CAR prior first proposed by Besag et al. (1991), such that

$$Y_{it}^{*} = \beta_0 + \sum_{m=1}^{M} \beta_m x_{itm} + \varepsilon_{it} + \phi_i, \tag{4}$$

$$\phi_i \sim \text{normal}\left(\bar{\phi}_i, 1/\tau_i\right), \tag{5}$$

$$\bar{\phi}_i = \frac{\sum_{j \ne i} \omega_{ij}\,\phi_j}{\sum_{j \ne i} \omega_{ij}}, \tag{6}$$

$$\tau_i = \tau_c \sum_{j \ne i} \omega_{ij}, \tag{7}$$

where $\phi_i$ denotes the spatial correlation (a structured error) for site $i$, and $\tau_c$ is the precision parameter in the CAR prior. $\omega_{ij}$ is the entry of the proximity matrix $\boldsymbol{\omega}$ that holds the adjacency index and weight for sites $i$ and $j$. To measure the proportion of variability in the random effects that is due to spatial autocorrelation, an index $\alpha$ is calculated:

$$\alpha = \frac{sd(\phi)}{sd(\phi) + sd(\varepsilon)}. \tag{8}$$
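As an illustration of Eqs. (5) to (8), the sketch below computes the conditional CAR mean and precision of one site's spatial effect from a proximity matrix, and the index $\alpha$ from a set of spatial and unstructured effects. In the study, $\phi$ is sampled by WinBUGS's car.normal; here the values are simply given. Applying Eq. (8) to a single set of values (for example, one MCMC draw) with the sample standard deviation is an assumption, since the aggregation over draws is not stated; the function names are hypothetical.

```python
from statistics import stdev


def car_conditional(phi, W, i, tau_c):
    """Conditional mean and precision of phi_i under the Gaussian CAR prior (Eqs. 5-7).

    phi   : current spatial effects phi_1..phi_N
    W     : proximity matrix omega as a list of rows (W[i][j] = omega_ij)
    tau_c : CAR precision parameter
    """
    n = len(phi)
    w_sum = sum(W[i][j] for j in range(n) if j != i)
    if w_sum == 0:
        raise ValueError(f"site {i} has no neighbours; Eq. (6) is undefined")
    mean = sum(W[i][j] * phi[j] for j in range(n) if j != i) / w_sum  # Eq. (6)
    precision = tau_c * w_sum  # Eq. (7); the variance in Eq. (5) is 1 / precision
    return mean, precision


def spatial_share(phi, eps):
    """Index alpha of Eq. (8): share of random-effect variability due to phi."""
    sd_phi, sd_eps = stdev(phi), stdev(eps)
    return sd_phi / (sd_phi + sd_eps)


if __name__ == "__main__":
    # Hypothetical corridor of three segments, 1 - 2 - 3, first-order adjacency
    W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(car_conditional([1.0, 5.0, 3.0], W, i=1, tau_c=2.0))  # (2.0, 4.0)
```

The middle segment's conditional mean is the average of its two neighbours, and its precision grows with the number of neighbours, which is how a site "borrows strength" from adjacent sites.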
2.1.3. Spatial random parameters Tobit model
A number of previous studies have suggested that the effects of certain factors may be heterogeneous across observations of crash rates, and that the random parameters Tobit model is a feasible way to deal with this issue (Anastasopoulos et al., 2012a; Caliendo et al., 2015; Ma et al., 2015a; Yu et al., 2015). Therefore, to accommodate the underlying unobserved heterogeneity in the spatial Tobit model, the coefficients ($\beta_0$, $\beta_1$, $\ldots$, $\beta_M$) in Eq. (4) are set to be random parameters ($\beta_{it0}$, $\beta_{it1}$, $\ldots$, $\beta_{itM}$). The most prevalent form of random parameters is used in this study, which assumes that they are independently and normally distributed (Anastasopoulos et al., 2012a):

$$Y_{it}^{*} = \beta_{it0} + \sum_{m=1}^{M} \beta_{itm} x_{itm} + \varepsilon_{it} + \phi_i, \tag{9}$$

$$\beta_{itm} = \bar{\beta}_m + \xi_{itm}, \quad m = 0, 1, \ldots, M, \tag{10}$$

$$\xi_{itm} \sim \text{normal}(0, \sigma_m^2), \tag{11}$$

in which $\bar{\beta}_m$ is the mean of the random parameter $\beta_{itm}$, and $\xi_{itm}$ is a normally distributed term with zero mean and standard deviation $\sigma_m$ ($\sigma_m > 0$).
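The generative structure of Eqs. (9) to (11), together with the censoring rule of Eq. (2), can be sketched as a small simulator: each observation $it$ draws its own coefficients around the means $\bar{\beta}_m$. The sketch holds $\phi_i$ fixed, which is a simplification for illustration, and its function names are hypothetical. It also shows how the share of observations with a positive coefficient follows from the normal assumption, $\Phi(\bar{\beta}_m/\sigma_m)$, applied here to the Speed limit estimates reported in Section 4.2 (mean 0.143, standard deviation 0.151).

```python
import math
import random


def simulate_rp_tobit(X, phi, beta_bar, sigma_beta, sigma, seed=0):
    """Simulate censored crash rates from Eqs. (9)-(11) and the censoring rule of Eq. (2).

    X[i][t]    : covariate vector x_it (length M) for site i, period t
    phi[i]     : spatial effect of site i (held fixed in this sketch)
    beta_bar   : means of the random parameters, index 0 = constant (length M + 1)
    sigma_beta : standard deviations sigma_m of the random parameters (length M + 1)
    sigma      : standard deviation of the unstructured error epsilon_it
    """
    rng = random.Random(seed)
    Y = []
    for i, site in enumerate(X):
        rates = []
        for x_it in site:
            # Eqs. (10)-(11): observation-specific beta_itm = beta_bar_m + xi_itm
            b = [rng.gauss(m, s) for m, s in zip(beta_bar, sigma_beta)]
            # Eq. (9): latent crash rate with unstructured and spatial errors
            y_star = (b[0] + sum(bm * x for bm, x in zip(b[1:], x_it))
                      + rng.gauss(0.0, sigma) + phi[i])
            rates.append(max(y_star, 0.0))  # Eq. (2): left-censoring at zero
        Y.append(rates)
    return Y


def share_positive(mean, sd):
    """P(beta_itm > 0) when beta_itm ~ normal(mean, sd^2), i.e. Phi(mean / sd)."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))


if __name__ == "__main__":
    print(round(share_positive(0.143, 0.151), 3))  # prints 0.828
```

Setting every $\sigma_m$ to zero collapses the simulator to the spatial (fixed parameters) Tobit model of Eq. (4), which mirrors how insignificant random parameters are fixed in Section 4.1.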
2.2. Model comparison
As in many other studies modeling under the Bayesian framework (Barua et al., 2014, 2016; Zeng and Huang, 2014), the DIC is used to compare the above candidate models. The DIC is intended as a Bayesian generalization of Akaike's Information Criterion (AIC) that penalizes models with more parameters. Specifically, it provides a Bayesian measure of model fit and complexity, and is defined as (Spiegelhalter et al., 2002):

$$\text{DIC} = \bar{D} + p_D, \tag{12}$$

where $\bar{D}$ is the posterior mean deviance, which can be taken as a Bayesian measure of fit, and $p_D$ is a complexity measure for the effective number of parameters. Generally, models with lower DIC values are preferred. However, it is worth noting that determining a critical difference in DIC is very difficult. According to Spiegelhalter et al. (2005), very roughly, a difference of more than 10 may rule out the model with the higher DIC; differences between 5 and 10 are considered substantial; and if the DIC difference is less than 5 and the models lead to very different inferences, it could be misleading to report only the model with the lowest DIC.

3. Data preparation and preliminary analysis
To demonstrate the proposed model, an urban road network in Hillsborough County, Florida, is selected, as shown in Fig. 1. The disaggregated crash data for the network over a three-year period (2005–2007, $T = 3$) are obtained from the Crash Analysis Reporting (C.A.R.) system. Meanwhile, the shape files of some site characteristics are downloaded from the website of the Florida Department of Transportation. To keep traffic volume homogeneous within each segment, the roadways in the network are segmented at the intersections (the dots in the right part of Fig. 1), resulting in a total of $N = 346$ road segments. Geographical information system (GIS) techniques
are used to map crashes and site characteristics to these segments. The annual crash counts and attribute values of each site during 2005–2007 are thus obtained. For 2005–2007, average annual daily traffic (AADT) data are available only for roadways on the National Highway System (NHS), which are maintained by the state and account for 38% of the observed road segments. Fortunately, AADT data are recorded for all segments in 2012. To estimate the AADT of segments off the NHS in 2005–2007, a scale factor for each year is first calculated as:
$$s_t = \frac{\sum_{i \in \text{on system}} \mathit{AADT}_i^{t}}{\sum_{i \in \text{on system}} \mathit{AADT}_i^{2012}}, \quad t = 2005, 2006, 2007, \tag{13}$$

where $i \in$ on system means that segment $i$ is on the National Highway System, and $\mathit{AADT}_i^{t}$ is its AADT in year $t$. After the scale factors are determined, the AADT of a road segment off the system in 2005–2007 can be estimated by multiplying its AADT in 2012 by the corresponding scale factor. The yearly crash rate (number of crashes per million vehicle kilometers traveled), which is used as the dependent variable in this study, is calculated as:

$$CR_i^{t} = \frac{\text{No\_crash}_{it}}{\mathit{AADT}_i^{t} \times L_i \times 365 / 1{,}000{,}000}, \quad i = 1, 2, \ldots, N, \; t = 2005, 2006, 2007, \tag{14}$$

in which $\text{No\_crash}_{it}$ is the number of crashes that occurred on road segment $i$ in year $t$, and $L_i$ is the length of segment $i$ (in km), which varies from 0.065 km to 2.83 km with a mean of 0.856 km. Among the total $N \times T = 1038$ observations, the crash rates of 126 (≈12.1%) observations are 0. Table 1 presents the definitions and descriptive statistics of the variables used in the model development.

With respect to the proximity matrix $\boldsymbol{\omega}$ in the spatial models, various neighboring structures have been considered by Aguero-Valverde and Jovanis (2008, 2010). Based on their findings and those of other researchers (Barua et al., 2014, 2016; Nicholson, 1999), the first-order neighbor structure is chosen to define the proximity matrix. Specifically, if two road segments are directly connected, their adjacency weight is 1; otherwise, it is 0. Moran's I is used to examine whether the observed crash rates are spatially correlated among adjacent road segments (Banerjee et al., 2004):
$$I = \frac{n \sum_{i}\sum_{j} \omega_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i}\sum_{j} \omega_{ij}\right) \sum_{i} (Y_i - \bar{Y})^2}, \tag{15}$$

where $n$ is the total number of road segments; $Y_i$ and $Y_j$ are the three-year average crash rates of segments $i$ and $j$; and $\bar{Y}$ is the global average crash rate over all segments. The results show that Moran's I of the segments is 11 with z-scores over 2.58, indicating that the crash rates are spatially clustered at the 99% confidence level.

Correlation tests and multicollinearity diagnostics are conducted for the risk factors. Table 2 shows the results of the Pearson correlation tests, which indicate that AADT and Lane, AADT and Funclass, Lane and Funclass, and Funclass and NHS are significantly correlated, with absolute correlation coefficients greater than 0.6. To avoid the adverse impact of such strong correlation, Lane and Funclass are excluded from the models. The diagnostics indicate no significant collinearity among the remaining factors.
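The preprocessing arithmetic of Eqs. (13) to (15) can be sketched in a few lines: yearly scale factors from on-system AADT, crash rates per million vehicle-kilometres, and Moran's I for a proximity matrix. The dictionary layout, the toy numbers and the function names are hypothetical; the study itself performed the spatial mapping with GIS tools. The sketch assumes the two sums in Eq. (13) run over the same set of on-system segments.

```python
def scale_factors(aadt_on_system, years, base_year=2012):
    """Eq. (13): s_t = sum of on-system AADT in year t / sum in base_year.

    aadt_on_system: dict mapping year -> list of AADT values for the same on-system segments
    """
    base = sum(aadt_on_system[base_year])
    return {t: sum(aadt_on_system[t]) / base for t in years}


def crash_rate(n_crashes, aadt, length_km):
    """Eq. (14): crashes per million vehicle-kilometres travelled in one year."""
    exposure_mvkt = aadt * length_km * 365 / 1_000_000
    return n_crashes / exposure_mvkt


def morans_i(y, W):
    """Eq. (15): Moran's I for values y and proximity matrix W (list of rows)."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    w_total = sum(W[i][j] for i in range(n) for j in range(n))
    return n * num / (w_total * sum(d * d for d in dev))


if __name__ == "__main__":
    # Hypothetical toy data: two on-system segments observed in 2005 and 2012
    s = scale_factors({2005: [9000, 11000], 2012: [10000, 10000]}, [2005])
    aadt_off_2005 = 8000 * s[2005]  # off-system segment: 2012 AADT scaled back to 2005
    print(crash_rate(2, aadt_off_2005, 0.5))
    # Four segments in a row with steadily rising rates: positive spatial autocorrelation
    W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(morans_i([1.0, 2.0, 3.0, 4.0], W))
```

A positive Moran's I means neighbouring segments tend to have similar crash rates; its significance is then judged by a z-score, as reported above.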
4. Model estimation and result analysis

4.1. Model estimation
Compared with traditional maximum likelihood estimation, which requires closed-form likelihood functions, Bayesian inference is able to handle very complex models, such as the spatial models in this study (Lord and Mannering, 2010). Moreover, the freeware WinBUGS, a popular platform for Bayesian inference, provides a flexible programming environment. Consequently, all the candidate models are programmed, estimated and evaluated in WinBUGS, which is much easier to implement than alternatives such as maximum simulated likelihood estimation (Anastasopoulos et al., 2012a). In the absence of sufficient prior knowledge, non-informative priors are specified for the parameters and hyper-parameters. Specifically, a diffuse normal distribution $N(0, 10^4)$ is used as the prior of $\beta_m$ and $\bar{\beta}_m$ ($m = 0, 1, \ldots, 8$), and a diffuse gamma distribution $\text{gamma}(0.001, 0.001)$ is used as the prior of the precisions of the normal distributions, $1/\sigma^2$ and $1/\sigma_m^2$ ($m = 0, 1, \ldots, 8$). The CAR priors are specified with the car.normal function to reflect the spatial proximity relationships of the road segments analyzed (Zeng and Huang, 2014). For each model, a chain of 200,000 Markov chain Monte Carlo (MCMC) iterations is run, with the first 4000 iterations discarded as burn-in. The Gelman-Rubin statistic available in WinBUGS is used to evaluate MCMC convergence. In the spatial random parameters Tobit model, if the variance of a random parameter is not statistically significant at the 95% credible level, the random parameter is simplified to be fixed across the road segments (Anastasopoulos et al., 2012a).
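Once the MCMC draws are available, the DIC of Eq. (12) follows from the deviance $-2\log L$ evaluated at each retained draw. The sketch below assumes the standard definition $p_D = \bar{D} - D(\bar{\theta})$ from Spiegelhalter et al. (2002), where $D(\bar{\theta})$ is the deviance at the posterior means, and adds a rough reading of a DIC difference using the thresholds quoted in Section 2.2. The function names and toy deviance values are hypothetical; in the study these quantities are reported by WinBUGS.

```python
from statistics import fmean


def dic(deviance_draws, deviance_at_posterior_mean):
    """Eq. (12) with p_D = D_bar - D(theta_bar) (Spiegelhalter et al., 2002).

    deviance_draws             : deviance -2 log L at each retained (post burn-in) MCMC draw
    deviance_at_posterior_mean : deviance evaluated at the posterior means of the parameters
    """
    d_bar = fmean(deviance_draws)             # posterior mean deviance, D-bar
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return {"D_bar": d_bar, "pD": p_d, "DIC": d_bar + p_d}


def compare_dic(dic_a, dic_b):
    """Rough reading of a DIC difference, using the thresholds quoted in Section 2.2."""
    diff = abs(dic_a - dic_b)
    if diff > 10:
        return "rule out the model with the higher DIC"
    if diff >= 5:
        return "substantial difference"
    return "small difference; check whether inferences differ"


if __name__ == "__main__":
    print(dic([10.0, 12.0, 14.0], deviance_at_posterior_mean=9.0))
    # {'D_bar': 12.0, 'pD': 3.0, 'DIC': 15.0}
    print(compare_dic(100.0, 85.0))
```

Because $p_D$ is estimated from the posterior rather than counted, it depends on the data and can come out lower for a model with more nominal parameters, as discussed in Section 4.2.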
4.2. Result analysis
The results of the model estimation are summarized in Table 3. Comparing the spatial Tobit with the Tobit model, the standard deviation of the unstructured errors, $\sigma$, drops dramatically from 6.838 to 2.243 after the CAR prior is incorporated. This is reasonable, because the spatial effect $sd(\phi)$ is statistically significant at the 95% credible level and accounts for up to 74.6% of the variability in the random effects of crash rates. Moreover, the $\bar{D}$ value of the spatial Tobit model (= 4622) is much smaller than that of the Tobit model (= 6936), suggesting that the spatial Tobit model fits the crash-rate data much better than the Tobit model. Although the spatial Tobit model has more effective parameters (as reflected by $p_D$), which increases its complexity, its much lower DIC indicates that it substantially outperforms the Tobit model. These results are consistent with those in crash-frequency modeling, showing that accommodating spatial correlation can significantly improve model fit (Aguero-Valverde and Jovanis, 2008; Quddus, 2008; Zeng and Huang, 2014).

In the spatial random parameters Tobit model, the value of $\sigma$ is almost equal to its counterpart in the spatial (fixed parameters) Tobit model (≈2.2), but the spatial effect $sd(\phi)$ decreases to only 0.666. Nevertheless, the spatial correlation is still significant at the 95% credible level and accounts for 21.2% of the variability in the random effects. The $\bar{D}$ of the spatial random parameters Tobit (= 4611) is lower than that of the spatial Tobit (= 4622), which means that accounting for the heterogeneity caused by the varying effects of risk factors across observations can further improve model fit when the structured spatial correlation is considered. Besides, it is interesting to find that the $p_D$ value of the spatial random parameters Tobit (= 330) is lower than that of the spatial Tobit (= 336); that is, the former model has fewer effective parameters, although the difference is only 6. Although this finding is somewhat counterintuitive, Spiegelhalter et al. (2002) pointed out that the $p_D$ of a model may also depend on the data, which makes it possible. Specifically, in our crash dataset, it is probably because the random parameter of Speed limit, whose variance is significant in the spatial random parameters Tobit, accounts for a portion of the spatial effect. As shown in Fig. 2, road segments with the same speed limit tend to be spatially clustered. As a consequence, the random parameter may weaken the spatial correlation between adjacent road segments and render the spatial error terms at some sites ineffective. Overall, the spatial random parameters Tobit has a lower DIC value than the spatial Tobit, and the difference in DIC is 15 (> 10), which again demonstrates that the random parameters Tobit model is superior to its fixed parameters counterpart (Anastasopoulos et al., 2012a).

Since the spatial random parameters Tobit significantly outperforms the other two models, its parameter estimates for the risk factors are the main focus of this section. According to the estimation results in Table 3, significant heterogeneity is found only in the effect of Speed limit. To be specific, the mean and standard deviation of the random parameter of Speed limit are 0.143 and 0.151, respectively. Given these distributional parameters, whose 95% credible intervals exclude zero, for 17.2%
of the observations the effect of Speed limit on crash rates is negative, and for 82.8% of the observations it is positive. This suggests that crash rates increase with higher speed limits on most road segments, a finding that conforms with engineering intuition, many previous research results (Aguero-Valverde and Jovanis, 2008; Zeng and Huang, 2014), and the estimates in the fixed parameters models. However, for a small percentage of the roadway segments, the opposite is true.

With regard to the parameter estimates of the other factors, the coefficient of AADT is significantly negative in all three models at the 95% credible level, which indicates that crash rates decrease significantly with increasing daily traffic volume. This is consistent with many previous studies (Dickerson et al., 2000; Huang et al., 2016; Qi et al., 2007; Zhou and Sisiopiku, 1997), which have argued that the lower travel speeds caused by increasing traffic volume may decrease the likelihood of crash occurrence. Moreover, lower speeds generally lead to crashes of lower injury severity, which are more likely to be under-reported. This may be another reason for the significantly negative effect of AADT on crash rates.

It is interesting to find that the effect of Surface is significantly positive at the 90% credible level in the spatial Tobit model, although it is positive but insignificant (below the 90% credible level) in the other two models. That is, lower crash rates may be associated with poor pavement conditions, which could be explained by risk compensation theory: drivers can reasonably be expected to adapt to an adverse driving environment (poor pavement condition) by altering their driving behavior, such as being more careful or slowing down (Mannering and Bhat, 2014).
It is possible that some drivers overcompensate for the adverse condition, resulting in a lower crash risk than under normal driving conditions.

5. Conclusions and future research
This study advocates a Bayesian spatial random parameters Tobit model for analyzing crash rates on road segments, which simultaneously accommodates spatial correlation between adjacent sites and unobserved heterogeneity across observations. The proposed model is developed and compared with the Tobit and spatial Tobit models using three years of collision data on road segments within an urban roadway network in Hillsborough County, Florida. The models are estimated and evaluated in the Bayesian context by programming in the freeware WinBUGS. The spatial effect, represented in this study by a residual term with a CAR prior, is found to be statistically significant in both spatial models. Moreover, the DIC results show that the spatial Tobit models fit substantially better than the Tobit model, which indicates that considering spatial correlation between adjacent road segments is reasonable. The spatial random parameters Tobit regression is found to outperform the spatial Tobit regression in fitting the crash-rate data, which suggests that accounting for heterogeneous effects can further improve model fit when the spatial
correlation has been taken into account. Interestingly, the spatial random parameters model is found to have fewer effective parameters than the spatial Tobit model. This may be because part of the spatial effect is shifted to the random parameter of a risk factor that tends to be spatially clustered, thus rendering some spatial error terms ineffective. The parameter estimates show that only Speed limit has an effect on crash rates that varies across roadway segments. AADT has a significantly negative effect on crash rates in all models, while the coefficient of Surface is significant (at the 90% credible level) only in the spatial Tobit model. Most of the results are intuitive and in line with previous research findings, partially validating the proposed model.

In summary, the empirical analysis demonstrates the superiority of the Bayesian spatial random parameters Tobit model and the significance of spatial correlation and heterogeneous effects of certain risk factors in crash-rate data, which indicates the considerable potential of the proposed model for crash rate analysis. The proposed model could be applied to rank sites with promise for safety improvement, and extended to a multivariate form to simultaneously analyze crash rates by category (e.g., injury severity, number of vehicles involved, crash type). It is noteworthy that the segmentation logic of the roadway network may affect the estimation results. Therefore, further research could compare the performance of the candidate models across different segmentations of the same network, or across networks with different configurations.

Acknowledgements
This research was jointly supported by the Natural Science Foundation of China (No. 51378222, 51578247, 71371192) and a grant from the Joint Research Scheme of National Natural Science Foundation of China/Research Grants Council of Hong Kong (No. 71561167001 & N_HKU707/15).

References
Abdel-Aty, M., Wang, X., 2006.
Crash estimation at signalized intersections along corridors: analyzing spatial effect and identifying significant factors. Transportation Research Record 1953, 98-111.
Aguero-Valverde, J., 2013. Multivariate spatial models of excess crash frequency at area level: Case of Costa Rica. Accident Analysis and Prevention 59, 365-373.
Aguero-Valverde, J., Jovanis, P.P., 2006. Spatial analysis of fatal and injury crashes in Pennsylvania. Accident Analysis and Prevention 38 (3), 618-625.
Aguero-Valverde, J., Jovanis, P.P., 2008. Analysis of road crash frequency with spatial models. Transportation Research Record 2061, 55-63.
Aguero-Valverde, J., Jovanis, P.P., 2010. Spatial correlation in multilevel crash frequency models: Effects of different neighboring structures. Transportation Research Record 2165, 21-32.
Ahmed, M., Huang, H., Abdel-Aty, M., Guevara, B., 2011. Exploring a Bayesian hierarchical approach for developing safety performance functions for a mountainous freeway. Accident Analysis and Prevention 43 (4), 1581-1589.
Anastasopoulos, P.C., Mannering, F.L., 2009. A note on modeling vehicle accident frequencies with random-parameters count models. Accident Analysis and Prevention 41, 153-159.
Anastasopoulos, P.C., Mannering, F.L., Shankar, V.N., Haddock, J.E., 2012a. A study of factors affecting highway accident rates using the random-parameters Tobit model. Accident Analysis and Prevention 45, 628-633.
Anastasopoulos, P.C., Shankar, V.N., Haddock, J.E., Mannering, F.L., 2012b. A multivariate Tobit analysis of highway accident-injury-severity rates. Accident Analysis and Prevention 45, 110-119.
Anastasopoulos, P.C., Tarko, A.P., Mannering, F.L., 2008. Tobit analysis of vehicle accident rates on interstate highways. Accident Analysis and Prevention 40 (2), 768-775.
Banerjee, S., Carlin, B.P., Gelfand, A.E., 2004. Hierarchical Modeling and Analysis for Spatial Data. CRC Press.
Barua, S., El-Basyouny, K., Islam, M.T., 2014. A full Bayesian multivariate count data model of collision severity with spatial correlation. Analytic Methods in Accident Research 3, 28-43.
Barua, S., El-Basyouny, K., Islam, M.T., 2016. Multivariate random parameters collision count data models with spatial heterogeneity. Analytic Methods in Accident Research 9, 1-15.
Besag, J., York, J., Mollié, A., 1991. Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43, 1-20.
Caliendo, C., De Guglielmo, M.L., Guida, M., 2015. Comparison and analysis of road tunnel traffic accident frequencies and rates using random-parameter models. Journal of Transportation Safety and Security 8 (2), 177-195.
Castro, M., Paleti, R., Bhat, C.R., 2012.
A latent variable representation of count data models to accommodate spatial and temporal dependence: Application to predicting crash frequency at intersections. Transportation Research Part B 46, 253-272. Castro, M., Paleti, R., Bhat, C. R., 2013. A spatial generalized ordered response model to examine highway crash injury severity. Accident Analysis and Prevention 52, 188-203. Chang, L., 2005. Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Safety Science 43 (8), 541-557. Congdon, P., 2006. Bayesian statistical modelling, 2nd edition. John Wiley and Sons, New York. Dickerson, A., Peirson, J., Vickerman, R., 2000. Road accidents and traffic flows: An econometric investigation. Economica 67 (265), 101-121. Dong N., Huang H., Xu P., Ding Z., Wang. D., 2014. Evaluating spatial proximity structures in TAZ-level crash prediction models. Transportation Research Record 12
Zeng et al.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
2432, 46-52. Dong N., Huang H., Zheng L., 2015. Support vector machine in crash prediction at the level of traffic analysis zones: assessing the spatial proximity effects. Accident Analysis and Prevention 82, 192-198. El-Basyouny, K., Sayed, T., 2009. Urban arterial accident prediction models with spatial effects. Transportation Research Record 2102, 27-33. Guo, F., Wang, X., Abdel-Aty, M. A., 2010. Modeling signalized intersection safety with corridor-level spatial correlations. Accident Analysis and Prevention 42, 8492. Hadayeghi, A., Shalaby, A., Persaud, B., 2003. Macrolevel accident prediction models for evaluating safety of urban transportation systems. Transportation Research Record 1840, 87-95. Hadayeghi, A., Shalaby, A. S., Persaud, B. N., 2010. Development of planning level transportation safety tools using geographically weighted Poisson regression. Accident Analysis and Prevention 42 (2), 676-688. Huang, H., Abdel-Aty, M., 2010. Multilevel data and Bayesian analysis in traffic safety. Accident Analysis and Prevention 42 (6), 1556-1565. Huang, H., Chin, H.C., 2010. Modeling road traffic crashes with zero-inflation and sitespecific random effects. Statistical Methods and Applications 19 (3), 445-462. Huang, H., Zeng, Q., Pei, X., Wong, S.C., Xu, P., 2016. Predicting crash frequency using an optimized radial basis function neural network model. Transportmetrica A 12 (4): 330-345. Jovanis, P.P., Chang, H.L., 1986. Modeling the relationship of accidents to miles traveled. Transportation Research Record 1068, 42-51. Lee J., Abdel-Aty M., Choi K., Huang H., 2015. Multi-level hot zone identification for pedestrian safety. Accident Analysis and Prevention 76, 64–73. Li, X., Lord, D., Zhang, Y., Xie, Y., 2008. Predicting motor vehicle crashes using support vector machine models. Accident Analysis and Prevention 40 (4), 16111618. Lord, D., Guikema, S., Geedipally, S.R., 2008. 
Application of the Conway–Maxwell– Poisson generalized linear model for analyzing motor vehicle crashes. Accident Analysis and Prevention 40 (3), 1123-1134. Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A 44 (5), 291-305. Lord, D., Persaud, B., 2000. Accident prediction models with and without trend: application of the generalized estimating equations procedure. Transportation Research Record 1717, 102-108. Ma, X., Chen, F., Chen, S., 2015a. Modeling crash rates for a mountainous highway by using refined-scale panel data. Transportation Research Record 2515, 10-16. Ma, L., Yan, X., Weng, J., 2015b. Modeling traffic crash rates of road segments through a lognormal hurdle framework with flexible scale parameter. Journal of Advanced 13
Zeng et al.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
Transportation 49 (8), 928-940. Malyshkina, N., Mannering, F., Tarko, A., 2009. Markov switching negative binomial models: An application to vehicle accident frequencies. Accident Analysis and Prevention 41 (2), 217-226. Mannering, F.L., Bhat, C.R., 2014. Analytic methods in accident research: methodological frontier and future directions. Analytic Methods in Accident Research, 1, 1-22. Miaou, S.-P., 1994. The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions. Accident Analysis and Prevention 26 (4), 471-482. Miaou, S.-P., Bligh, R.P., Lord, D., 2005. Developing median barrier installation guidelines: a benefit/cost analysis using Texas data. Transportation Research Record 1904, 3-19. Mitra, S., 2009. Spatial autocorrelation and Bayesian spatial statistical method for analyzing intersections prone to injury crashes. Transportation Research Record 2136, 92-100. NHTSA, 2012. 2010 motor vehicle crashes: Overview. Nicholson, A., 1998. Analysis of spatial distributions of accidents. Safety science 31, 71-91. Noland, R. B., Quddus, M. A., 2004. A spatially disaggregate analysis of road casualties in England. Accident Analysis and Prevention 36 (6), 973-984. Oh, J., Washington, S.P., and Nam, D., 2006. Accident prediction model for railwayhighway interfaces. Accident Analysis and Prevention 38 (2), 346–356. Park, B.-J., Lord, D., 2009. Application of finite mixture models for vehicle crash data analysis. Accident Analysis and Prevention 41 (4), 683-691. Pei, X., Wong, S.C., Sze, N.N., 2012. The roles of exposure and speed in road safety analysis. Accident Analysis and Prevention 48, 464-471. Peng, Y., Lord, D., 2011. Application of latent class growth model to longitudinal analysis of traffic crashes. Transportation Research Record 2236, 102-109. Qi, Y., Smith, B. L., Guo, J., 2007. Freeway accident likelihood prediction using a panel data analysis approach. 
Journal of Transportation Engineering 133 (3), 149-156. Quddus, M. A., 2008. Modelling area-wide count outcomes with spatial correlation and heterogeneity: an analysis of London crash data. Accident Analysis and Prevention 40 (4), 1486-1497. Shankar, V., Albin, R., Milton, J., Mannering, F., 1998. Evaluating median crossover likelihoods with clustered accident counts: an empirical inquiry using the random effects negative binomial model. Transportation Research Record 1635, 44-48. Siddiqui, C., Abdel-Aty, M., Choi, K., 2012. Macroscopic spatial analysis of pedestrian and bicycle crashes. Accident Analysis and Prevention 45, 382-391. Spiegelhalter, D. J., Best, N. G., Carlin, B. P., Van Der Linde, A., 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society 64 (4), 583-639. 14
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D., 2005. WinBUGS user manual. MRC Biostatistics Unit, Cambridge, United Kingdom.
Tobin, J., 1958. Estimation of relationships for limited dependent variables. Econometrica 26, 24-36.
Xie, Y., Lord, D., Zhang, Y., 2007. Predicting motor vehicle collisions using Bayesian neural networks: an empirical analysis. Accident Analysis and Prevention 39 (5), 922-933.
Xie, Y., Zhang, Y., 2008. Crash frequency analysis with generalized additive models. Transportation Research Record 2061, 39-45.
Xu, P., Huang, H., 2015. Modeling crash spatial heterogeneity: Random parameter versus geographically weighting. Accident Analysis and Prevention 75, 16-25.
Xu, P., Huang, H., Dong, N., Abdel-Aty, M., 2014a. Sensitivity analysis in the context of regional safety modeling: Identifying and assessing the modifiable areal unit problem. Accident Analysis and Prevention 70, 110-120.
Xu, X., Wong, S.C., Choi, K., 2014b. A two-stage bivariate logistic-Tobit model for the safety analysis of signalized intersections. Analytic Methods in Accident Research 3-4, 1-10.
Yu, R., Xiong, Y., Abdel-Aty, M., 2015. A correlated random parameter approach to investigate the effects of weather conditions on crash risk for a mountainous freeway. Transportation Research Part C 50, 68-77.
Zeng, Q., Huang, H., 2014. Bayesian spatial joint modeling of traffic crashes on an urban road network. Accident Analysis and Prevention 67, 105-112.
Zeng, Q., Huang, H., Pei, X., Wong, S.C., 2016a. Modeling nonlinear relationship between crash frequency by severity and contributing factors by neural networks. Analytic Methods in Accident Research 10, 12-25.
Zeng, Q., Huang, H., Pei, X., Wong, S.C., Gao, M., 2016b. Rule extraction from an optimized neural network for traffic crash frequency modeling. Accident Analysis and Prevention 97, 87-95.
Zhou, M., Sisiopiku, V., 1997. Relationship between volume-to-capacity ratios and accident rates. Transportation Research Record 1581, 47-52.
Table 1 Descriptive statistics for segment-related variables

| Variable | Description | Mean | S.D. | Min. | Max. |
|---|---|---|---|---|---|
| Response variable | | | | | |
| Crash rate | Crash count per million vehicle kilometers traveled | 3.172 | 6.993 | 0 | 124.3 |
| Risk factors | | | | | |
| AADT | Average annual daily traffic (10^3 pcu^a) | 17.12 | 16.73 | 0.35 | 70 |
| Speed limit | Posted speed limit | 37.59 | 7.235 | 25 | 50 |
| Access | Number of access roads/segment length | 12.16 | 7.66 | 0 | 39.35 |
| Surface | Pavement condition: good/very good=1, poor/fair=0 | 0.46 | 0.50 | 0 | 1 |
| Lane | Number of lanes | 3.18 | 1.43 | 1 | 8 |
| NHS | State-maintained roads=1, otherwise=0 | 0.38 | 0.487 | 0 | 1 |
| Funclass | Functional class: principal arterial=1 (reference), minor arterial=2, others=3 | 2.29 | 0.8 | 1 | 3 |

a pcu: passenger car unit
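Table 1 defines the response as the crash count per million vehicle kilometers traveled. A minimal sketch of that calculation for one segment follows. It assumes the usual exposure term AADT × 365 × years × segment length, with AADT in vehicles (or pcu) per day rather than the thousands used in Table 1; the function name and the example segment are hypothetical. The three-year study period is taken from the abstract.

```python
def crash_rate_per_mvkt(crashes: int, aadt: float, length_km: float, years: float) -> float:
    """Crash rate in crashes per million vehicle-kilometres travelled (MVKT).

    Hypothetical helper. aadt is in vehicles (or pcu) per day, NOT thousands,
    and every year is assumed to have 365 days.
    """
    if aadt <= 0 or length_km <= 0 or years <= 0:
        raise ValueError("aadt, length_km and years must be positive")
    vkt = aadt * 365.0 * years * length_km  # total vehicle-km over the period
    return crashes * 1e6 / vkt


# Hypothetical segment: 0.8 km long, AADT equal to the Table 1 mean
# (17.12 thousand pcu), observed for three years; the crash count is made up.
rate = crash_rate_per_mvkt(crashes=5, aadt=17.12 * 1000, length_km=0.8, years=3)
print(f"{rate:.3f} crashes per MVKT")
```

A segment with no recorded crashes yields a rate of exactly 0, which is the minimum reported for the crash rate in Table 1.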
Table 2 Pearson correlation coefficients between explanatory variables

| | AADT | Lane | Access | Speed limit | Surface | Funclass | NHS |
|---|---|---|---|---|---|---|---|
| AADT | 1 | 0.802 | 0.031 | 0.550 | 0.265 | -0.760 | 0.558 |
| Lane | 0.802 | 1 | 0.078 | 0.548 | 0.378 | -0.700 | 0.547 |
| Access | 0.031 | 0.078 | 1 | 0.055 | 0.250 | -0.190 | 0.398 |
| Speed limit | 0.550 | 0.548 | 0.055 | 1 | 0.278 | -0.562 | 0.387 |
| Surface | 0.265 | 0.378 | 0.250 | 0.278 | 1 | -0.539 | 0.468 |
| Funclass | -0.760 | -0.700 | -0.190 | -0.562 | -0.539 | 1 | -0.752 |
| NHS | 0.558 | 0.547 | 0.398 | 0.387 | 0.468 | -0.752 | 1 |
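Table 2 can be screened for strongly correlated pairs of explanatory variables. The sketch below copies the matrix and lists every pair with |r| ≥ 0.7. That cut-off is a common rule of thumb and an assumption here, since the table itself states no threshold; the function name is hypothetical.

```python
import itertools

VARIABLES = ["AADT", "Lane", "Access", "Speed limit", "Surface", "Funclass", "NHS"]

# Full symmetric correlation matrix copied from Table 2 (row/column order = VARIABLES).
R = [
    [1, 0.802, 0.031, 0.550, 0.265, -0.760, 0.558],
    [0.802, 1, 0.078, 0.548, 0.378, -0.700, 0.547],
    [0.031, 0.078, 1, 0.055, 0.250, -0.190, 0.398],
    [0.550, 0.548, 0.055, 1, 0.278, -0.562, 0.387],
    [0.265, 0.378, 0.250, 0.278, 1, -0.539, 0.468],
    [-0.760, -0.700, -0.190, -0.562, -0.539, 1, -0.752],
    [0.558, 0.547, 0.398, 0.387, 0.468, -0.752, 1],
]


def strongly_correlated_pairs(names, matrix, threshold=0.7):
    """Return (name_i, name_j, r) for every i < j with |r| >= threshold (assumed cut-off)."""
    pairs = []
    for i, j in itertools.combinations(range(len(names)), 2):
        r = matrix[i][j]
        if abs(r) >= threshold:
            pairs.append((names[i], names[j], r))
    return pairs


for a, b, r in strongly_correlated_pairs(VARIABLES, R):
    print(f"{a:12s} {b:12s} r = {r:+.3f}")
```

With this threshold the pairs flagged are AADT–Lane, AADT–Funclass, Lane–Funclass and Funclass–NHS. Lane and Funclass do not appear among the covariates in Table 3; whether collinearity is the reason is not stated in these tables.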
Table 3 Model estimation results^a

| Variable | Tobit | Spatial Tobit | Spatial random parameters Tobit |
|---|---|---|---|
| Constant | -0.574 (1.267)^b | -4.185 (2.426)* | -0.482 (1.856) |
| AADT | -0.096 (0.017)** | -0.088 (0.033)** | -0.090 (0.027)** |
| Speed limit | 0.152 (0.036)** | 0.231 (0.065)** | 0.143 (0.056)** |
| S.D. of Speed limit | | | 0.151 (0.006)** |
| Access | -0.029 (0.031) | -0.018 (0.062) | -0.013 (0.046) |
| Surface | 0.789 (0.493) | 1.769 (0.978)* | 0.492 (0.695) |
| NHS | -0.606 (0.629) | -0.864 (1.239) | -0.437 (0.914) |
| sd^c | 6.838 (0.150)** | 2.243 (0.061)** | 2.232 (0.060)** |
6.572(0.104)**
0.666(0.446)**
15.34(0.619)**
0.893(0.605)**
0.746(0.006)**
0.212(0.115)**
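| | Tobit | Spatial Tobit | Spatial random parameters Tobit |
|---|---|---|---|
| D̄ | 6936 | 4622 | 4611 |
| pD | 7 | 336 | 330 |
| DIC | 6943 | 4958 | 4941 |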
a Access and NHS are excluded, because neither of their effects on crash rates is significant at the 90% credible level in these models.
b Estimated mean (standard deviation) of the parameter.
* Significant at the 90% credible level.
** Significant at the 95% credible level.
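The fit statistics in Table 3 follow the definition of Spiegelhalter et al. (2002): DIC is the posterior mean deviance plus the effective number of parameters pD. The short check below recomputes DIC for each model from the deviance and pD rows and ranks the models; it is only arithmetic on the reported values, and the names used are illustrative.

```python
# Posterior mean deviance and effective number of parameters (pD), from Table 3.
FIT = {
    "Tobit": (6936, 7),
    "Spatial Tobit": (4622, 336),
    "Spatial random parameters Tobit": (4611, 330),
}


def dic(d_bar: float, p_d: float) -> float:
    """Deviance Information Criterion: posterior mean deviance + pD."""
    return d_bar + p_d


dics = {name: dic(d_bar, p_d) for name, (d_bar, p_d) in FIT.items()}
best = min(dics, key=dics.get)  # lower DIC = better balance of fit and complexity

for name, value in dics.items():
    print(f"{name:33s} DIC = {value:5d}  (difference from best = {value - dics[best]:4d})")
```

The recomputed values match the DIC row (6943, 4958, 4941): the spatial Tobit is lower than the Tobit by 1985, and the spatial random parameters Tobit is lower again by 17. How large a DIC gap counts as meaningful rests on rules of thumb that the table does not state.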
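In the spatial random parameters Tobit, the speed-limit coefficient has an estimated mean of 0.143 and a standard deviation of 0.151 across segments. If that parameter is taken to be normally distributed (an assumption; Table 3 reports only its mean and S.D.), the share of segments on which a higher speed limit is associated with a higher latent crash rate is Φ(0.143/0.151). The sketch below plugs in posterior means only and so ignores their uncertainty; the function name is hypothetical.

```python
from statistics import NormalDist


def share_positive(mean: float, sd: float) -> float:
    """Share of segments with a positive coefficient, assuming the random
    parameter is Normal(mean, sd**2) across segments (an assumption)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(0.0)


# Speed-limit mean and S.D. from the spatial random parameters Tobit column of Table 3.
p_pos = share_positive(0.143, 0.151)
print(f"P(speed-limit coefficient > 0) = {p_pos:.3f}")
```

With these values the share is about 0.83, so under the normality assumption roughly one segment in six would show a negative speed-limit effect, a pattern that a single fixed coefficient cannot express.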
Fig. 1. The selected road segments in Hillsborough County
Fig. 2. Spatial distribution of road segments by speed limit