This paper is available through TRB and is referenced as: Guadamuz-Flores, R. and Aguero-Valverde, J. 2017. Bayesian Spatial Models of Crash Frequency at Highway–Railway Crossings. Transportation Research Record 2608. TRB, National Research Council, Washington, D.C., pp 27-35.
BAYESIAN SPATIAL MODELS OF CRASH FREQUENCY AT HIGHWAY-RAILWAY CROSSINGS
Renato Guadamuz-Flores, Corresponding Author
Researcher
Programa de Investigación en Desarrollo Urbano Sostenible
Universidad de Costa Rica
Barrio Los Profesores, Calle B, No 11, Mercedes. San Pedro, San José, Costa Rica. 11503
Tel: 506-2511-2786; Email: [email protected]
Jonathan Aguero-Valverde, Ph.D.
Professor
Programa de Investigación en Desarrollo Urbano Sostenible
Universidad de Costa Rica
Barrio Los Profesores, Calle B, No 11, Mercedes. San Pedro, San José, Costa Rica. 11503
Tel: 506-2511-2787; Email: [email protected]
Word count: 5,732 words text + 7 tables/figures x 250 words (each) = 7482 words
TRR Paper number: 17-01993
Submission Date: March 6th, 2017
Guadamuz-Flores, Aguero-Valverde
ABSTRACT
Although crashes at highway-railway crossings are rare, they usually result in severe injuries and fatalities. Intersections along railway corridors share many conditions that support the use of spatial analyses, but so far their spatial effects are essentially unknown, and little research on highway-railway crossings has accounted for spatial correlation. The objective of this study is to analyze spatial correlation structures in crash frequency at railway crossings, particularly conditional autoregressive (CAR) models and a joint specification using Gaussian Kriging models. Full Bayesian Poisson-lognormal approaches are used to compare heterogeneity-only, spatial-only and heterogeneity-spatial models. The models are estimated using crash data from a low-speed passenger train service in Costa Rica and are compared using the Deviance Information Criterion. Heterogeneity-CAR models show a better goodness of fit than the heterogeneity-only, joint Kriging and CAR-Only definitions. The second-order neighboring model (CAR-2) yields the best fit and corresponds to an average neighbor distance of around 700 meters (0.44 mi.). Robust semivariograms based on joint Kriging models and CAR methods show similar results. The proportion of variation in the data explained by spatial correlation methods is similar (29% of the total variation). These findings suggest that spatial correlation at highway-railway crossings should be considered when modeling crash frequencies.
Keywords: Bayesian analysis, crashes, railway crossing, conditional autoregressive, joint Kriging, effective range.
INTRODUCTION
Railway and road networks have important differences. Usually, these systems are physically independent and there are few interactions between them; however, where the two networks intersect, at highway-railway crossings, the vehicles and users of these two very different systems come into proximity and crashes do occur. When these crashes take place, highway users are much more vulnerable than train passengers and consequently are more likely to be severely injured. Although crashes at highway-railway crossings are less common than crashes at other locations, they usually lead to more severe injuries and fatalities. Therefore, better methods of network screening are essential to analyze and improve safety at railway crossings (1).
In Costa Rica, the number of crashes at railway crossings has increased significantly due to the recent expansion of the regional railway system. Another consequence of this increase is the continuous disruption of the railway service, which affects hundreds of users. Most crossing crashes not only disrupt the train service but also halt road traffic, which affects highway users as well.
Given the size of the railway system in Costa Rica, the data are characterized by a small number of observations. Furthermore, given the operation of the system, the number of crashes is relatively low. These two characteristics, known as the small sample size and low sample mean problems, can cause estimation problems in traditional crash frequency models (2). Another concern with this type of data is the spatial correlation or spatial dependence among observations. When sites are in proximity, unmeasured confounding variables might affect the estimation of the models, producing underestimated standard errors (2, 3).
Spatial models can help to alleviate these problems in two ways: spatial dependence can act as a surrogate for unknown covariates and adjust for them, and site estimates pool strength from neighboring sites, which improves model estimation (4). The reasoning behind spatial correlation models is that intersections close to each other share conditions such as topography, average daily train traffic and type of users. These conditions may not be explicitly specified as covariates in the model, but taken together in the form of spatial relationships they can help to identify black spots and even clustering. Two main spatial methods were modeled: conditional autoregressive (CAR) models and a joint specification using Bayesian Gaussian Kriging models. The former offers better computational efficiency, while the latter allows the direct estimation of the distance at which two sites are no longer considered to be correlated.
There are two main goals in this research: first, to improve crash frequency estimation by using a Full Bayesian approach, especially when there are low crash frequencies and few years of crash data available; and second, to evaluate how the use of spatial models can improve the predictions of crash frequencies at highway-railway crossings.
LITERATURE REVIEW
Most crash frequency studies have focused on roadways, which is expected since there are many more roadway segments and intersections than railway crossings. Spatial methods have been used in several studies, especially CAR methods (4-5); however, most CAR analyses are based on segments, not intersections. To the best of our knowledge, there is no literature focused on modeling the spatial correlation of crashes at highway-railway level crossings.
Regarding intersections, few spatial studies have been conducted. In 2006, Wang and Abdel-Aty (6) analyzed rear-end crashes using Negative Binomial (NB) and generalized
estimating equations with non-conditional autoregressive methods as the spatial analysis and found a correlation among the nearest intersections along a specific corridor. Mitra (7) used NB models as well, but in a Bayesian approach. For the spatial analysis, Mitra used a joint model rather than a conditional prior, as well as some built-in methods in geographical information systems. Guo et al (8) also applied Poisson and NB Bayesian approaches to intersections using CAR methods and concluded that the Poisson CAR model outperforms the other models. Li (9) used a completely different approach: a geographically-weighted regression was calibrated using latitude and longitude components. That study concluded that Poisson-lognormal is a better approach than NB. Another study, based on crashes aggregated by transportation analysis zones (TAZ), used CAR methods to estimate crash frequency and focused on pedestrians and cyclists (10). All these papers concluded that spatial analysis has a significant effect on roadway crash frequency. One key aspect is that all of the above studies are focused on road networks and not on railroad crossings.
Little research on crash prediction at highway-railway crossings has been performed. Before Poisson and Negative Binomial (NB) models became popular, different methods ranging from the Peabody Dimmick formula to different indices were widely used, but Austin and Carson (11) and Oh et al (12) showed that methods like Negative Binomial give better results when predicting the total number of crashes. Other studies, like Saccomanno et al (13), presented other approaches such as Poisson and Empirical Bayes; that study in particular concluded that Poisson provides better results for different warning devices while NB better explains the severity scores. Usually NB is employed because of the presence of over-dispersion in the data; however, Oh et al (12) and Lu and Tolliver (14) considered NB models for under-dispersed data.
Most of these railway crossing studies also use NB modeling (11-17). Some railway studies (12, 17) have used other relevant covariates that are spatially related, like land use, but none has explicitly used spatial analyses.
DATA DESCRIPTION
The case study is based on the Greater Metropolitan Area of San José, Costa Rica, where a low-speed passenger train service operates mainly during peak hours. There are a total of 134 highway-railway level crossings, ranging from national highways to entrances to suburban gated communities. Covariates such as the angle between railway and highway, the number of lanes, total highway width, the existence of a curve on the highway before the crossing, a curve on the railway before the crossing, and a highway parallel to the railway (without a physical separation) were determined with the use of Geographic Information Systems (GIS) and satellite imagery. Since the crossings include many different types of highways, the annual average daily traffic (AADT) was available for only a few crossings (around 20%), and the highway type covariate was used as a proxy variable for the AADT. After performing several statistical tests, the highway type was classified into three homogeneous categories: a) national roads (base level), b) municipal city block roads and c) other municipal roads. The average train speed at every crossing was measured using Global Positioning Systems (GPS) inside the trains.
The crash data were retrieved from the Costa Rican Institute of Railways for the period from 2010 to 2013. In this period, 154 crashes occurred at highway-railway crossings. Of the 134 crossings examined, 52% never had a crash in the period of study and 22% of the crossings concentrate 78% of the total crashes, following an almost perfect Pareto principle. The statistics for every variable are shown in Table 1.
In addition to the variables that were found to be statistically significant given the data used, other variables were tested in the models but were either not statistically significant or highly correlated with other covariates: horizontal curve on railway just before crossing, railway-highway crossing angle, highway parallel to railway, train speed, and roadway width.
METHODOLOGY
Crashes are usually modeled by either Poisson-lognormal or Poisson-gamma (i.e. negative binomial) processes (4, 8, 18) because of the nature of the phenomenon. For this research, Poisson-lognormal models showed better results than Poisson-gamma models, which is consistent with previous studies (4, 10, 19). Poisson-lognormal models have been considered better suited than Poisson-gamma models to handle low sample mean and small sample size (19). At the first stage, a Poisson process is described as in (4, 18):
$$Y_i \mid \theta_i \sim \text{Poisson}(\theta_i) \tag{1}$$
where $Y_i$ is the observed number of crashes for crossing $i$ and $\theta_i$ is the expected crash rate for crossing $i$. The logarithm of this rate is defined as a linear function of the covariates plus random effects, so that $\theta_i$ follows a lognormal distribution:
$$\ln(\theta_i) = \beta_0 + \sum_k \beta_k x_{ik} + v_i + u_i \tag{2}$$
where
$\beta_0$ = intercept
$\beta_k$ = coefficient for the $k$th covariate
$x_{ik}$ = value of the $k$th covariate for crossing $i$
$v_i$ = heterogeneity among crossings
$u_i$ = spatially correlated random effect for crossing $i$.
Traditionally, spatial methods have been applied at the zonal level (10) and to road segments (4-5); however, intersections in close spatial proximity along a corridor should be considered as correlated due to interacting traffic flows as well as similar road design and environmental characteristics (8). In this case the corridor is the railway. At a second stage, the random effects are modeled using a normal prior distribution:
$$v_i \sim N(0, \tau_v) \tag{3}$$
where $\tau_v$ is the precision that controls the over-dispersion due to heterogeneity among crossings ($\tau_v = 1/\sigma_v^2$). Then, the errors ($u_i$) can be assumed to be spatially dependent (20) and spatial random effects can be incorporated into the model.
Spatial Analysis
If few covariates are available, more of the variation in the data is left unexplained by a heterogeneity-only model; therefore, a spatial analysis can provide better expected values for the response variable, the total number of crashes.
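The two-stage structure in equations 1 to 3 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the OpenBUGS model used in the study: the function names, the intercept and the precision value are hypothetical, the spatial term $u_i$ is left at zero, and the only number taken from the paper is the lane coefficient 0.411 reported later for the CAR-2 model.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def expected_crashes(beta0, betas, covariates, v_i=0.0, u_i=0.0):
    """Equation 2: ln(theta_i) = beta0 + sum_k beta_k * x_ik + v_i + u_i."""
    linear_predictor = beta0 + sum(b * x for b, x in zip(betas, covariates))
    return math.exp(linear_predictor + v_i + u_i)


def simulate_crossing(beta0, betas, covariates, tau_v, rng):
    """Equations 1 to 3 for one crossing, with the spatial term u_i set to zero."""
    sigma_v = 1.0 / math.sqrt(tau_v)  # tau_v is a precision: 1 / sigma_v**2
    v_i = rng.gauss(0.0, sigma_v)     # equation 3
    theta_i = expected_crashes(beta0, betas, covariates, v_i)
    return theta_i, sample_poisson(theta_i, rng)  # equation 1


rng = random.Random(2017)
# Hypothetical inputs: intercept -1.5, a two-lane crossing, precision 4.
theta, crashes = simulate_crossing(-1.5, [0.411], [2.0], tau_v=4.0, rng=rng)
print(f"expected rate {theta:.3f}, simulated crashes {crashes}")
```

Because $v_i$ enters on the log scale, a small precision $\tau_v$ spreads the expected rates widely across otherwise identical crossings, which is how the model absorbs over-dispersion.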
Conditional Autoregressive (CAR)
The CAR model is based on a neighbor structure in which spatial correlation is assumed. This method uses a fixed number of neighbors for a given site, as shown in Figure 1, based on the spatial distribution of the railway network. The conditional distribution of the CAR error at crossing $i$, given the errors at all other crossings ($u_{-i}$), as proposed by Besag (21) is:
$$u_i \mid u_{-i} \sim N\left( \frac{\sum_{j \approx i} w_{ij} u_j}{w_{i+}}, \; \frac{1}{\tau_u w_{i+}} \right) \tag{4}$$
where $\tau_u$ is the precision parameter for the spatially correlated over-dispersion ($\tau_u = 1/\sigma_u^2$), so the second argument in equation 4 is the conditional variance; $j \approx i$ refers to a crossing $j$ that is a neighbor of crossing $i$; $w_{ij}$ is the weight of neighbor crossing $j$ relative to crossing $i$; and $w_{i+}$ is the sum of the weights of the neighbors of crossing $i$. The weights used in this research for first- to third-order neighbors are 1, 0.5 and 0.25; that is, $1/2^{(n-1)}$, where $n$ is the neighbor order. Many other weight definitions have been used in different studies (e.g. 4, 5 and 8). Definitions of weights are valid as long as the trend of losing influence with distance remains, based on the first law of geography: “Everything is related to everything else, but near things are more related than distant things” (22). By using CAR spatial methods, the total variation of the data can be explained by heterogeneity and spatial effects, and the proportion of variation explained by the spatial term can be determined as suggested in (4, 10):
$$\eta = \frac{SD(u)}{SD(u) + SD(v)} \tag{5}$$
where $SD(\cdot)$ is the standard deviation of the corresponding type of error. This method is typically used in road networks, where there is plenty of redundancy in the system (i.e. more neighbors for a given crossing of interest). This makes it interesting to apply to a railway network, which usually has much less redundancy than a road network.
Joint Specification Using Bayesian Gaussian Kriging Models
Another spatial approach uses a joint specification with Bayesian Kriging models. These models are more common in geostatistics, where the location of data varies continuously over a surface (23). In these models, the covariance between the random errors at two locations is a function of the distance between them; the function that defines the covariance between any two points is called the covariogram $C(h)$. It is common practice in geostatistics to model the semivariogram instead of the covariogram. A semivariogram is a mathematical function used to determine the distance at which there is no longer a spatial correlation among data from the phenomenon being studied. A semivariogram is mainly described by three parameters: the nugget ($\tau^2$), the partial sill ($\sigma^2$) and the decay ($\phi$) (see Figure 2). The partial sill is often referred to as the spatial effect variation, while the nugget is known as the non-spatial effect variation (23). The sum of the nugget and the partial sill is the full sill, or simply the sill, which represents the total variation of the data. These three parameters ($\tau^2$, $\sigma^2$, $\phi$) need to be determined in order to estimate the distance of interest, which is called the range, $h$ ($h = 1/\phi$). Although this range is important, there is no
practical definition of its statistical meaning; hence, a common term used for this purpose is the effective range, i.e. the distance at which the spatial correlation has dropped to only 0.05 (23). Different theoretical semivariogram functions can be used (23, 24). For this research, four semivariograms are selected because they fit the data used better. In each of them, $t$ is the distance between two sites:
Exponential
$$\gamma(t) = \tau^2 + \sigma^2 \left[ 1 - \exp(-\phi t) \right] \tag{6}$$
Gaussian
$$\gamma(t) = \tau^2 + \sigma^2 \left[ 1 - \exp(-\phi^2 t^2) \right] \tag{7}$$
Wave
$$\gamma(t) = \tau^2 + \sigma^2 \left[ 1 - \frac{\sin(\phi t)}{\phi t} \right] \tag{8}$$
Matérn at $\nu = 3/2$
$$\gamma(t) = \tau^2 + \sigma^2 \left[ 1 - (1 + \phi t) \exp(-\phi t) \right] \tag{9}$$
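The four semivariograms in equations 6 to 9, and the effective range defined above, can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the code used in the study: the function names are hypothetical, the decay value 1/700 per meter is only a placeholder, and the bisection search assumes a correlation that decays monotonically with distance, so it is not applied to the Wave model.

```python
import math


def exponential(t, nugget, partial_sill, phi):
    """Equation 6."""
    return nugget + partial_sill * (1.0 - math.exp(-phi * t))


def gaussian(t, nugget, partial_sill, phi):
    """Equation 7."""
    return nugget + partial_sill * (1.0 - math.exp(-(phi * t) ** 2))


def wave(t, nugget, partial_sill, phi):
    """Equation 8; sin(x)/x tends to 1 as x tends to 0."""
    if t == 0.0:
        return nugget
    return nugget + partial_sill * (1.0 - math.sin(phi * t) / (phi * t))


def matern_3_2(t, nugget, partial_sill, phi):
    """Equation 9 (Matern with nu = 3/2)."""
    return nugget + partial_sill * (1.0 - (1.0 + phi * t) * math.exp(-phi * t))


def spatial_correlation(model, t, phi):
    """Correlation implied by a semivariogram: 1 - (gamma - nugget) / partial sill."""
    return 1.0 - model(t, 0.0, 1.0, phi)


def effective_range(model, phi, target=0.05):
    """Distance at which the correlation falls to `target`, found by bisection.

    Assumes the correlation decreases monotonically with distance.
    """
    low, high = 0.0, 1.0 / phi
    while spatial_correlation(model, high, phi) > target:
        high *= 2.0
    for _ in range(100):
        mid = 0.5 * (low + high)
        if spatial_correlation(model, mid, phi) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


phi = 1.0 / 700.0  # placeholder decay, in 1/meters
for model in (exponential, gaussian, matern_3_2):
    print(model.__name__, round(effective_range(model, phi)), "m")
```

For the exponential model the search reproduces the closed form $\ln(20)/\phi$, roughly three times the range $1/\phi$, which is why the effective range is always at least as large as the range.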
The goodness of fit can be determined and compared using the mean squared error (MSE):
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{\gamma}_i - \gamma_i)^2 \tag{10}$$
where $\hat{\gamma}_i$ is the value estimated by the theoretical semivariogram and $\gamma_i$ is the observed semivariogram value. The objective of these theoretical semivariograms is to fit the observed semivariogram data. The observed semivariogram can be computed as a regular semivariogram using the residual values from the non-spatial model. Although this semivariogram is useful, it can be heavily contaminated by outliers (24). Thus, as an alternative, a robust semivariogram is used as suggested by Cressie and Hawkins (25), still using the residual values from the heterogeneity-only model as the observed data:
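$$\bar{\gamma}(h) = \frac{\left( \frac{1}{|N(h)|} \sum_{N(h)} \left| Z(s_i) - Z(s_j) \right|^{1/2} \right)^{4}}{2 \left( 0.457 + \frac{0.494}{|N(h)|} \right)} \tag{11}$$
where
$\bar{\gamma}(h)$ = robust semivariogram value at lag $h$
$|N(h)|$ = number of distinct sample pairs lagged by $h$
$Z(s_i)$ = residual value for crossing $i$
$Z(s_j)$ = residual value for crossing $j$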
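The robust estimator in equation 11 can be sketched in a few lines. The example below is illustrative only and rests on simplifying assumptions: crossings are placed on a single line so that the distance between two sites is the absolute difference of their positions (the study uses network distances), lags are grouped into equal-width bins, and all names and inputs are hypothetical.

```python
import math
from itertools import combinations


def robust_semivariogram(positions, residuals, lag_width, max_lag):
    """Cressie-Hawkins robust semivariogram (equation 11) per distance bin.

    Returns a list of (bin center, number of pairs, gamma) tuples for
    every bin that contains at least one pair of sites.
    """
    n_bins = int(math.ceil(max_lag / lag_width))
    root_sums = [0.0] * n_bins
    pair_counts = [0] * n_bins
    for i, j in combinations(range(len(residuals)), 2):
        distance = abs(positions[i] - positions[j])  # 1-D stand-in for network distance
        if distance >= max_lag:
            continue
        bin_index = int(distance // lag_width)
        root_sums[bin_index] += math.sqrt(abs(residuals[i] - residuals[j]))
        pair_counts[bin_index] += 1

    estimates = []
    for bin_index in range(n_bins):
        n_pairs = pair_counts[bin_index]
        if n_pairs == 0:
            continue
        mean_root = root_sums[bin_index] / n_pairs
        gamma = mean_root ** 4 / (2.0 * (0.457 + 0.494 / n_pairs))
        estimates.append(((bin_index + 0.5) * lag_width, n_pairs, gamma))
    return estimates


# Hypothetical residuals at three crossings spaced 100 m apart.
for center, n_pairs, gamma in robust_semivariogram(
    [0.0, 100.0, 200.0], [1.0, 0.0, 1.0], lag_width=150.0, max_lag=300.0
):
    print(center, n_pairs, round(gamma, 4))
```

Averaging square-root differences before raising to the fourth power is what dampens the influence of a single extreme residual compared with averaging squared differences directly.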
The decimal values in equation 11 are not empirical coefficients but asymptotic values for the theoretical normal distribution (25).
Model Comparison
When using Bayesian methods, models can be compared by the posterior deviance ($\bar{D}$) as a measure of adequacy and, even better, by the deviance information criterion (DIC), since the latter penalizes overfitting (i.e. increasing the number of parameters without resulting in a better model). The DIC can be considered the Bayesian equivalent of the Akaike Information Criterion (AIC) (26) and can be used to compare models directly. For this criterion, the model with the smallest value is preferred and is considered the best-fitting model. Once the best model is determined by the lowest DIC value, other models can be discarded if their DIC value is three to seven points higher than that of the best model. If the DIC value of a model is only one or two points higher than that of the best model, then this model deserves some consideration (26).
RESULTS
The different models were estimated using Bayesian methods through OpenBUGS 3.2.3 (27). All models had a burn-in of 1000 iterations that were discarded from the results. The number of iterations for each model varies from 50000 to 110000 in order to keep the Markov chain Monte Carlo (MCMC) error below 5% of the standard deviation of the parameter being estimated (4, 5). Bayesian methods require prior distributions and initial values. For this reason, heterogeneity-only (Poisson-lognormal and Negative Binomial) approaches were modeled first in the statistical software R 3.2.3 (28) and RStudio 0.99.902 (29) to determine the significant covariates and the initial values for the iterations. The values obtained for every parameter as well as the overall comparison criteria are shown in Table 2. For the heterogeneity-only models, Poisson-lognormal and NB (i.e.
Poisson-gamma) were modeled and Poisson-lognormal produced better overall results; therefore, all models shown are based on a Poisson-lognormal definition. Depending on the dataset used, Negative Binomial models can produce more accurate results. For all covariates, the 95% credible interval is shown as a measure of the significance of the parameter estimate. The dummy covariate “Curve on Road” indicates whether there is a curve on the highway before the crossing. It was kept in the model even though its 95% Bayesian credible interval contains zero for most models, because it is significant in the spatial-effects-only model (CAR-Only). The estimates of the coefficients for the different covariates are intuitive. For example, based on the CAR-2 model, the mean estimate of the coefficient for the number of lanes ($\beta_5$) is 0.411. Because of the log-linear definition of the model in equation 2, the change in crashes as a percentage of the current number of crashes ($\Delta\%_{crashes}$) for a unit change in the covariate can be determined as follows:
$$\Delta\%_{crashes} = \frac{crashes' - crashes}{crashes} = \frac{crashes'}{crashes} - 1 = e^{\beta_5} - 1 = 0.51 \tag{12}$$
where $crashes$ is the current expected number of crashes and $crashes'$ is the expected number of crashes after the unit increase of the covariate. This means that if the number of lanes is increased by one lane, crashes are expected to increase by 51% on average. Applying this procedure to the other covariates shows that a curve on a highway before a rail crossing will increase
crashes by 48% on average, compared to a road with no curves. Similarly, a Type B roadway (municipal city block road) is expected to have 56% fewer crashes than a national road, and a Type C roadway (other municipal road) is expected to have 59% fewer crashes than a national road. This is clearly related to vehicle exposure, since the road type is a proxy variable for the road AADT. For the train ADT, the rate of change depends on the number of daily trains (i.e. its effect is associated with the current covariate value). For this variable, the rate of change for one additional daily train is given by equation 13:
$$\Delta\%_{crashes} = \left( 1 + \frac{\Delta Train\,ADT}{Train\,ADT} \right)^{\beta_1} - 1 = \left( 1 + \frac{1}{Train\,ADT} \right)^{0.862} - 1 \tag{13}$$
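Equations 12 and 13 are simple enough to evaluate directly, and a short sketch makes the difference between the two kinds of covariate explicit. The function names are hypothetical, and reading equation 13 as the effect of a covariate that enters the model on a logarithmic scale is an interpretation of its functional form; the coefficients 0.411 and 0.862 are the ones quoted in the text.

```python
import math


def pct_change_unit_covariate(beta, delta_x=1.0):
    """Equation 12: relative change in expected crashes when a covariate
    that enters the model linearly increases by delta_x."""
    return math.exp(beta * delta_x) - 1.0


def pct_change_train_adt(beta, current_adt, delta_adt):
    """Equation 13: relative change in expected crashes when the train ADT
    goes from current_adt to current_adt + delta_adt."""
    return (1.0 + delta_adt / current_adt) ** beta - 1.0


beta_lanes = 0.411      # number of lanes, CAR-2 model
beta_train_adt = 0.862  # train ADT, CAR-2 model

print(f"one more lane: {pct_change_unit_covariate(beta_lanes):.1%}")
print(f"11 to 12 daily trains: {pct_change_train_adt(beta_train_adt, 11, 1):.1%}")
print(f"doubling daily trains: {pct_change_train_adt(beta_train_adt, 11, 11):.1%}")
```

Because the second function depends only on the ratio of the increment to the current value, doubling the train ADT gives the same relative change whatever the starting number of trains.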
For example, a change from 11 daily trains (the minimum current number of daily trains at a crossing) to 12 daily trains translates into an expected increase in crashes of about 7.8%, and greater current numbers of daily trains yield even smaller changes. This follows from the fact that the coefficient for daily trains is greater than zero (crashes increase with daily trains) but less than one (the rate of change decreases as the current number of daily trains increases). On the other hand, doubling the number of daily trains (from any current value) is expected to increase crashes by 82%.
The distances used for these spatial analyses were determined through network analysis using geographical information systems (GIS). Several other studies (e.g. 5, 8 and 9) used map coordinates to determine aerial or Euclidean distances among intersections. This is mostly justified by simplicity or similarity, but real network distances are the best option since they are a much better representation of reality.
Conditional Autoregressive
Focusing merely on the goodness of fit given by the DIC value, the heterogeneity-spatial correlation model (CAR-1) and the spatial correlation-only model (CAR-Only) show very similar DIC values (314.0 and 314.1, respectively). This might suggest that CAR-Only is the better model since it is simpler than CAR-1. However, pD (i.e. the effective number of parameters) for CAR-Only is negative. This happens because the posterior distribution of the standard deviation of the spatial errors is bimodal and very asymmetric (conflicting with the normality assumption) and therefore does not provide a good estimate of the parameter (26). For this reason, the CAR-Only model is discarded, as recommended by (26 and 30). Comparing the heterogeneity-spatial models (i.e. CAR-1, CAR-2 and CAR-3), it is clear that CAR-2 has a lower DIC value (311.5) and is therefore considered to be a better model than the CAR-1 and CAR-3 models (314.0 and 316.5, respectively).
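The neighbor structure behind the CAR-1, CAR-2 and CAR-3 models can be sketched for a simplified case. The example below assumes a single unbranched corridor in which the neighbor order between two crossings is the difference of their positions in the sequence; the real network in Figure 1 may branch, so this is an illustration of equations 4 and 5 and not the adjacency used in the study. All function names and input values are hypothetical.

```python
def neighbor_weights(n_sites, max_order):
    """Weight matrix for a chain of crossings: 1 / 2**(order - 1) for
    neighbors of order 1..max_order, and 0 otherwise (including self)."""
    weights = [[0.0] * n_sites for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(n_sites):
            order = abs(i - j)
            if 1 <= order <= max_order:
                weights[i][j] = 0.5 ** (order - 1)
    return weights


def car_conditional(i, u, weights, tau_u):
    """Conditional mean and variance of u_i given the other sites (equation 4)."""
    w_plus = sum(weights[i])
    mean = sum(w * u_j for w, u_j in zip(weights[i], u)) / w_plus
    variance = 1.0 / (tau_u * w_plus)
    return mean, variance


def spatial_share(sd_u, sd_v):
    """Proportion of variation explained by the spatial term (equation 5)."""
    return sd_u / (sd_u + sd_v)


# Five crossings in a row with second-order neighbors (the CAR-2 structure).
weights = neighbor_weights(5, max_order=2)
spatial_effects = [1.0, 2.0, 0.0, 4.0, 5.0]  # hypothetical values of u
print(car_conditional(2, spatial_effects, weights, tau_u=2.0))
```

Raising the maximum order adds more, lower-weighted neighbors to each conditional mean, which smooths the spatial effects over a longer stretch of the corridor.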
Using a different dataset might result in a different preferred spatial neighboring structure; however, spatial correlation is still expected because of the spatially related covariates that may not be included in a heterogeneity-only model.
The average neighbor distance (over the real railway network) is 350 meters (0.22 mi.) for the first-order level, 714 meters (0.44 mi.) for the second-order level and 1065 meters (0.66 mi.) for the third-order level. From the results for every model and these distances, it seems that the range distance estimated by conditional autoregressive methods is around 700 meters, and the optimal value lies between 350 and 1000 meters. These findings can later be compared to the ones obtained using joint Kriging models.
One interesting fact is that $\eta$ (variation explained spatially relative to the total variation) decreases as the neighbor order level increases. This is expected in the sense that the spatial effect
is intended for very close intersections. If the neighbor order level is too high, a larger distance is covered for every crossing of interest and the overall spatial effect loses influence, since many of the crossings will be related to many other intersections. This leads to a state where there is no real spatial significance among intersections, compared to having a lower neighbor order level for each crossing. Low-order neighbors are used here (i.e. third order or lower). Simultaneous autoregressive (SAR) models only offer an equivalent to third order and beyond, and CAR models offer more natural estimations than SAR models; hence, CAR methods should be adopted at the outset (24). The parameter values differ little among the three neighbor levels; $\tau_u$ shows the greatest change, which is closely related to the spatial variation explained by each model.
Kriging Models
Full Bayesian Kriging models with semivariograms are modeled to determine a more precise expected value of the range ($h$). Since these are Bayesian methods, the models need prior distributions for the parameters to be determined, including the range. It is common to use non-informative prior distributions (e.g. 4, 5 and 8) so that the posterior distribution is influenced much more by the data (i.e. the likelihood). If the prior distributions are informative, more data are needed to change these prior beliefs.
The semivariograms need to be calculated with the residuals instead of the observed values, since the spatial effects are considered separately from the regular non-spatial model. Many semivariograms of the residuals for the Poisson model were estimated using this methodology, including some not shown in this document for brevity (Spherical, Rational Quadratic, Powered Exponential and Cubic).
For the models, the mean squared error (MSE) was calculated and a visual inspection was performed in order to determine the best-fitting semivariogram for the observed data. The results are shown in Table 3. Figure 3 shows the graphical application and fitting of the different semivariograms listed in Table 3.
The range values for the Matérn and Wave fittings are similar, while those for the exponential and Gaussian semivariograms differ from the former two and from each other. This suggests that the range value might be around 700 meters, which is especially interesting since the best Full Bayes model for the data used is CAR-2 (heterogeneity and second-order neighbor level conditional autoregressive), and for this method the average distance between any crossing of interest and its second-order neighbor is 710 meters. These two distances, 708 meters from semivariograms and 710 meters from conditional autoregressive models, are very close, which reinforces the idea that the range value is slightly over 700 meters, equivalent to second-order neighbors. Although the two methods are different, similar results are expected (at least in order of magnitude) since the same data and conditions are supplied to both methods to estimate the range value.
Apart from that, there is the effective range (i.e. the distance at which the correlation drops to less than 5%). Although this effective range is meant to be a more clearly defined parameter, it is also more subjective, since the 5% correlation threshold is not defined through theoretical methods but empirically. Nevertheless, the exponential, Gaussian and Matérn fittings have effective range values around 1700 meters (1.1 miles). The effective range is always greater than or equal to the range, and in this case values between 1700 m and 1800 m are expected.
The rate of spatial variability explained by the joint Kriging models ($\eta'$) can be seen as the equivalent of the one calculated in the CAR models ($\eta$), but these are not the same since the
reasoning behind them is somewhat different. In any case, all but the exponential semivariogram show values between 0.32 and 0.37, which is slightly higher than the value found in the CAR models (0.29 at the second-order neighbor level) but definitely similar. Once again, the Matérn is one of the semivariograms found to be more consistent.
Even though the mean squared error (MSE) was determined for every semivariogram, it is not a very useful criterion in this case, since it is very similar for all the models. When the MSE value is similar among all fitted models, it is useful to inspect the data and the fittings visually to verify how each fitting adapts to the nugget and the sill and where the range is being defined, all compared to the observed data. Figure 3 shows how the exponential, Gaussian and Matérn semivariograms follow the same visual path while the Wave shows some oscillations.
It is fair to say that the Matérn semivariogram at $\nu = 3/2$ seems to outperform all the other semivariograms, and it is consistent with the findings from the conditional autoregressive models. This does not mean that the other semivariogram models should be overlooked, but the discussion is centered on the results from the Matérn.
Given the exploratory analysis of the residuals, a Bayesian Kriging model with an exponential semivariogram was estimated, as shown in Table 2. The Kernel Density Estimation (KDE) graphs for a few models (see Figure 4) show how the posterior distributions relate to the prior distributions defined. The model using a uniform prior distribution for the range $h$ is shown in Table 2 (Figure 4.a); the results for different priors for $h$ are similar in terms of goodness of fit and are not presented here for brevity. It is clear that the posterior distributions are almost identical to the prior distributions specified (i.e. the parameters are basically the same as the hyperparameters defined in the prior distribution).
Flat, Uniform, Gamma, Chi-square and Beta distributions yielded the same results (only non-negative distributions should be used as a prior for the range). This means that regardless of the family of the distribution and of whether an informative or non-informative prior distribution is used, the posterior distribution is driven by the prior information rather than by the data itself (i.e. the data do not have a strong effect on the posterior distribution of the range $h$). The exponential semivariogram included in Table 2 was the only Kriging model that could be estimated using Bayesian methods. The semivariogram fittings shown in Table 3 and Figure 3 were estimated using a frequentist approach, as a guide to the different semivariograms that can be used in further studies.
CONCLUSIONS AND RECOMMENDATIONS
The Poisson-lognormal model resulted in a better goodness of fit than the negative binomial approach for heterogeneity-only models. For more complex model definitions including spatial random effects, Poisson-lognormal models showed a more stable performance. This finding is also consistent with other studies.
The spatial effect models obtained by using CAR methods showed a very significant improvement in the prediction of crashes: the DIC value dropped from 319.0 for the heterogeneity-only model to 311.5 for the second-order CAR model. The second-order neighbor level was found to be the best model since it offered a better fit to the data than the first-order and third-order neighbor levels, which showed higher DIC values. As a result, the range determined by using the CAR method is estimated at around 700 meters (0.43 mi.) (the average distance to second-order neighbors). Of the total data variation, around 29% is explained by the spatial effects.
Based on the robust semivariograms from the heterogeneity-only residuals, the Matérn fitting outperforms the other theoretical models and leads to results consistent with those from the CAR methods. The range of spatial dependence for semivariograms was determined to be about 700
meters (0.44 mi.), and spatial effects explained about 36% of the total data variation. These results are based on the dataset used; applying this methodology to different datasets may yield different prior distributions, semivariogram selections and overall results, although similar conclusions are expected.

The Gaussian Kriging models using a joint specification showed issues related to the density of the range parameter. For all the prior distributions studied, the likelihood seemed to have no effect on the posterior distribution; therefore, the posterior distribution was almost identical to the prior distribution given. This implies that, for the data analyzed, the range of spatial correlation could not be estimated using a full Bayesian approach.

Several other types of conditional autoregressive models can be explored by changing the neighbor-order level using different fixed distance conditions. Other spatial methodologies can be applied to determine better approaches as well. This methodology should be applied to different datasets to verify the preliminary results found in this study and to evaluate how these results may vary under different modeling conditions.

REFERENCES
1. Bonneson, J. A. Highway Safety Manual. Publication HMS-1, American Association of State Highway and Transportation Officials, 2010.
2. Lord, D. and F. Mannering. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. In Transportation Research Part A: Policy and Practice, Vol. 44, No. 5, 2010, pp. 291-305.
3. Aguero-Valverde, J. and P. P. Jovanis. Spatial Analysis of Fatal and Injury Crashes in Pennsylvania. In Accident Analysis & Prevention, Vol. 38, No. 3, 2006, pp. 618-625.
4. Aguero-Valverde, J. and P. P. Jovanis. Analysis of Road Crash Frequency with Spatial Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 2061, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 55-63.
5. Aguero-Valverde, J. Direct Spatial Correlation in Crash Frequency Models: Estimation of the Effective Range. In Journal of Transportation Safety & Security, Vol. 6, No. 1, 2014, pp. 21-33.
6. Wang, X. and M. Abdel-Aty. Temporal and Spatial Analyses of Rear-End Crashes at Signalized Intersections. In Accident Analysis & Prevention, Vol. 38, No. 6, 2006, pp. 1137-1150.
7. Mitra, S. Spatial Autocorrelation and Bayesian Spatial Statistical Method for Analyzing Intersections Prone to Injury Crashes. In Transportation Research Record: Journal of the Transportation Research Board, No. 2136, 2009, pp. 92-100.
8. Guo, F., X. Wang and M. Abdel-Aty. Modeling Signalized Intersection Safety with Corridor-Level Spatial Correlations. In Accident Analysis & Prevention, Vol. 42, 2010, pp. 84-92.
9. Li, Z., Y. Lee, S. H. Lee and E. Valiou. Geographically-Weighted Regression Models for Improved Predictability of Urban Intersection Vehicle Crashes. In Transportation and Development Institute Congress, 2011, pp. 1315-1329.
10. Siddiqui, C., M. Abdel-Aty and K. Choi. Macroscopic Spatial Analysis of Pedestrian and Bicycle Crashes. In Accident Analysis & Prevention, Vol. 45, 2012, pp. 382-391.
11. Austin, R. D. and J. L. Carson. An Alternative Accident Prediction Model for Highway-Rail Interfaces. In Accident Analysis & Prevention, Vol. 34, No. 1, 2002, pp. 31-42.
12. Oh, J., S. P. Washington and D. Nam. Accident Prediction Model for Railway-Highway Interfaces. In Accident Analysis & Prevention, Vol. 38, No. 2, 2006, pp. 346-356.
13. Saccomanno, F., L. Fu, C. Ren and L. Miranda. Identifying Highway-Railway Grade Crossing Black Spots: Phase 1. Publication No. TP 14168E, Transportation Development Centre, Transport Canada, 2003.
14. Lu, P. and D. Tolliver. Accident Prediction Model for Public Highway-Rail Grade Crossings. In Accident Analysis & Prevention, Vol. 90, 2016, pp. 73-81.
15. Saccomanno, F., L. Fu and L. Miranda-Moreno. Risk-Based Model for Identifying Highway-Rail Grade Crossing Blackspots. In Transportation Research Record: Journal of the Transportation Research Board, No. 1862, Transportation Research Board of the National Academies, Washington, D.C., 2004, pp. 127-135.
16. Eluru, N., M. Bagheri, L. F. Miranda-Moreno and L. Fu. A Latent Class Modeling Approach for Identifying Vehicle Driver Injury Severity Factors at Highway-Railway Crossings. In Accident Analysis & Prevention, Vol. 47, 2012, pp. 119-127.
17. Russo, B. and P. Savolainen. An Examination of Factors Affecting Frequency and Severity of Crashes at Rail-Grade Crossings. Presented at 92nd Annual Meeting of the Transportation Research Board, Washington, D.C., 2013.
18. Miranda-Moreno, L. F. Statistical Models and Methods for Identifying Hazardous Locations for Safety Improvements. Ph.D. Dissertation, University of Waterloo, Canada, 2006.
19. Lord, D. and L. F. Miranda-Moreno. Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter of Poisson-Gamma Models for Modeling Motor Vehicle Crashes: A Bayesian Perspective. In Safety Science, Vol. 46, No. 5, 2008, pp. 751-770.
20. Congdon, P. Applied Bayesian Modelling. John Wiley & Sons, Inc., 2003.
21. Besag, J. Spatial Interaction and the Statistical Analysis of Lattice Systems. In Journal of the Royal Statistical Society. Series B (Methodological), Vol. 36, No. 2, 1974, pp. 192-236.
22. Tobler, W. A Computer Movie Simulating Urban Growth in the Detroit Region. In Economic Geography, Vol. 46, No. 2, 1970, pp. 234-240.
23. Banerjee, S., B. P. Carlin and A. E. Gelfand. Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall/CRC Press, 2004.
24. Cressie, N. Statistics for Spatial Data. John Wiley & Sons, Inc., New York, 1993.
25. Cressie, N. and D. M. Hawkins. Robust Estimation of the Variogram: I. In Journal of the International Association for Mathematical Geology, Vol. 12, No. 2, 1980, pp. 115-125.
26. Spiegelhalter, D., N. Best, B. Carlin and A. van der Linde. Bayesian Measures of Model Complexity and Fit. In Journal of the Royal Statistical Society, Vol. 64B, No. 4, 2002, pp. 583-639.
27. Lunn, D., D. Spiegelhalter, A. Thomas and N. Best. The BUGS Project: Evolution, Critique and Future Directions. In Statistics in Medicine, Vol. 28, No. 25, 2009, pp. 3049-3067.
28. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Austria, 2015.
29. RStudio Team. RStudio: Integrated Development for R. RStudio, Inc., Boston, Ma., 2015.
30. Spiegelhalter, D. Some DIC Slides. Presented at IceBUGS Conference, Finland, 2006.
LIST OF TABLES
TABLE 1. Summary Statistics for the Data.
TABLE 2. Posterior Summary of Bayesian Models
TABLE 3. Theoretical Semivariogram Models.

LIST OF FIGURES
FIGURE 1. Neighbor Structure Definition for CAR Methods
FIGURE 2. Theoretical Semivariogram Structure and Definition.
FIGURE 3. Robust Semivariogram Models of Residuals.
FIGURE 4. Posterior Distribution for the Range (h) for Several Prior Distributions.
TABLE 1. Summary Statistics for the Data.

Variable                              Mean     Std. Dev.   Minimum   Maximum
Crashes                               1.149    2.119       0         12
Train (Average Daily Traffic)         50.0     31.5        11        91
Number of road lanes                  2.149    0.993       1         7

Variable                              Proportion
Road type: National Road              0.149
Road type: Municipal City Block Road  0.388
Road type: Other Municipal Road       0.463
Curve on Road: No                     0.769
Curve on Road: Yes                    0.231
TABLE 2. Posterior Summary of Bayesian Models

Heterogeneity-Only (Poisson-lognormal)
Variable                       Mean      Std. Dev.   M.C. Error   2.5%      97.5%
Intercept                      -3.989    0.852       0.009        -5.730    -2.391
Train average daily traffic    0.837     0.191       0.002        0.469     1.221
Municipal city block road      -0.759    0.308       0.004        -1.375    -0.149
Other municipal road           -0.915    0.317       0.004        -1.554    -0.307
Curve on road                  0.366     0.298       0.004        -0.240    0.939
Number of road lanes           0.450     0.093       0.002        0.273     0.638
σ²ᵥ (Heterogeneity)            0.445     0.197       0.004        0.148     0.909
τᵥ (Heterogeneity)             2.788     1.565       0.036        1.100     6.746
Std. Dev. v (Heterogeneity)    0.647     0.138       0.003        0.389     0.931
Deviance                       283.9
DIC                            319.0
pD                             35.1

Spatial Correlation-Only (First order adjacency - CAR-Only)
Variable                       Mean      Std. Dev.   M.C. Error   2.5%      97.5%
Intercept                      -4.070    1.143       0.040        -6.451    -1.902
Train average daily traffic    0.913     0.299       0.012        0.321     1.526
Municipal city block road      -0.812    0.255       0.004        -1.323    -0.314
Other municipal road           -0.660    0.230       0.003        -1.110    -0.203
Curve on road                  0.487     0.221       0.003        0.043     0.915
Number of road lanes           0.352     0.068       0.002        0.212     0.478
σ²ᵤ (Spatial)                  0.046     0.069       0.003        0.001     0.246
τᵤ (Spatial)                   244.7     508.1       24.9         4.1       1817.0
Std. Dev. u (Spatial)          0.459     0.292       0.016        0.075     1.168
Deviance                       324.2
DIC                            314.1
pD                             -10.1

Heterogeneity and Spatial Correlation (First order adjacency - CAR-1)
Variable                       Mean      Std. Dev.   M.C. Error   2.5%      97.5%
Intercept                      -3.971    1.102       0.034        -6.203    -1.829
Train average daily traffic    0.848     0.276       0.010        0.309     1.407
Municipal city block road      -0.847    0.327       0.006        -1.505    -0.204
Other municipal road           -0.889    0.321       0.004        -1.538    -0.273
Curve on road                  0.389     0.306       0.003        -0.230    0.979
Number of road lanes           0.407     0.099       0.002        0.214     0.606
σ²ᵥ (Heterogeneity)            0.434     0.200       0.004        0.127     0.903
σ²ᵤ (Spatial)                  0.013     0.017       0.001        0.001     0.061
τᵥ (Heterogeneity)             3.001     2.467       0.061        1.107     7.873
τᵤ (Spatial)                   343.5     494.8       23.5         16.4      1752.0
Std. Dev. v (Heterogeneity)    0.638     0.145       0.003        0.359     0.928
Std. Dev. u (Spatial)          0.315     0.186       0.010        0.075     0.767
Eta (Spatial/Total)            0.315     0.133       0.007        0.099     0.591
Deviance                       281.5
DIC                            314.0
pD                             32.5

Heterogeneity and Spatial Correlation (Second order adjacency - CAR-2)
Variable                       Mean      Std. Dev.   M.C. Error   2.5%      97.5%
Intercept                      -4.022    1.057       0.027        -6.172    -1.982
Train average daily traffic    0.862     0.263       0.007        0.353     1.403
Municipal city block road      -0.831    0.323       0.005        -1.474    -0.209
Other municipal road           -0.888    0.319       0.003        -1.522    -0.271
Curve on road                  0.389     0.299       0.003        -0.214    0.965
Number of road lanes           0.411     0.099       0.002        0.217     0.607
σ²ᵥ (Heterogeneity)            0.422     0.197       0.004        0.121     0.885
σ²ᵤ (Spatial)                  0.032     0.049       0.002        0.001     0.171
τᵥ (Heterogeneity)             3.128     3.039       0.075        1.130     8.303
τᵤ (Spatial)                   267.7     460.9       19.7         5.8       1662.0
Std. Dev. v (Heterogeneity)    0.629     0.144       0.003        0.350     0.919
Std. Dev. u (Spatial)          0.278     0.201       0.010        0.048     0.772
Eta (Spatial/Total)            0.285     0.150       0.007        0.067     0.606
Deviance                       282.4
DIC                            311.5
pD                             29.2

Heterogeneity and Spatial Correlation (Third order adjacency - CAR-3)
Variable                       Mean      Std. Dev.   M.C. Error   2.5%      97.5%
Intercept                      -4.010    1.006       0.021        -6.053    -2.080
Train average daily traffic    0.856     0.244       0.006        0.387     1.356
Municipal city block road      -0.823    0.321       0.005        -1.469    -0.208
Other municipal road           -0.894    0.321       0.003        -1.550    -0.280
Curve on road                  0.388     0.302       0.003        -0.233    0.961
Number of road lanes           0.417     0.099       0.002        0.222     0.612
σ²ᵥ (Heterogeneity)            0.425     0.203       0.004        0.106     0.896
σ²ᵤ (Spatial)                  0.046     0.088       0.004        0.001     0.281
τᵥ (Heterogeneity)             3.604     11.870      0.359        1.117     9.445
τᵤ (Spatial)                   226.2     412.6       17.2         3.6       1419.0
Std. Dev. v (Heterogeneity)    0.629     0.150       0.003        0.330     0.925
Std. Dev. u (Spatial)          0.247     0.190       0.009        0.037     0.733
Eta (Spatial/Total)            0.262     0.154       0.007        0.051     0.607
Deviance                       282.5
DIC                            316.5
pD                             34.0

Heterogeneity and Spatial Correlation (Exponential Semivariogram)
Variable                       Mean      Std. Dev.   M.C. Error   2.5%      97.5%
Intercept                      -4.012    0.876       0.008        -5.806    -2.355
Train average daily traffic    0.842     0.200       0.002        0.463     1.250
Municipal city block road      -0.768    0.312       0.003        -1.391    -0.167
Other municipal road           -0.915    0.320       0.003        -1.558    -0.299
Curve on road                  0.364     0.304       0.003        -0.254    0.939
Number of road lanes           0.440     0.096       0.002        0.255     0.631
σ²ᵥ (Heterogeneity)            0.456     0.196       0.003        0.156     0.922
σ²ᵤ (Spatial)                  0.028     0.092       0.004        4.6E-15   0.2858
τᵥ (Heterogeneity)             2.681     1.470       0.027        1.085     6.416
Std. Dev. v (Heterogeneity)    0.656     0.136       0.002        0.397     0.938
Std. Dev. u (Spatial)          0.069     0.134       0.007        6.1E-08   0.489
Eta (Spatial/Total)            0.076     0.130       0.007        9.9E-08   0.454
Range (h)                      2637      1440        40.4         146.4     4887
Deviance                       282.5
DIC                            318.6
pD                             36.0
TABLE 3. Theoretical Semivariogram Models.

Parameter                            Exponential   Matérn at ν=3/2   Gaussian   Wave
Nugget, τ²                           0.250         0.308             0.292      0.307
Partial Sill, σ²                     0.208         0.149             0.165      0.146
Decay, φ                             0.0008        0.0006            0.0014     0.0014
Range (m), h                         1195          1696              708        703
Effective Range (m), h_eff           1703          1774              1720       727
Spatially-explained variation, η'    0.454         0.327             0.362      0.322
Mean Squared Error, MSE              0.3628        0.3655            0.3650     0.3625
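The spatially-explained variation in Table 3 can be checked by hand. The Python sketch below assumes that η' is the ratio of the partial sill to the total sill, η' = σ² / (σ² + τ²), and that the tabulated values follow the column order Exponential, Matérn, Gaussian, Wave; both are assumptions made for illustration, and the names used in the code are hypothetical.

```python
# (nugget tau^2, partial sill sigma^2, tabulated eta') per semivariogram model,
# read from Table 3 assuming the column order Exponential, Matern, Gaussian, Wave.
table3 = {
    "Exponential": (0.250, 0.208, 0.454),
    "Matern nu=3/2": (0.308, 0.149, 0.327),
    "Gaussian": (0.292, 0.165, 0.362),
    "Wave": (0.307, 0.146, 0.322),
}

def spatial_share(nugget, partial_sill):
    """Share of the total sill (nugget + partial sill) due to the spatial part."""
    return partial_sill / (nugget + partial_sill)

# Recompute eta' for every model and compare it with the tabulated value.
computed = {
    name: spatial_share(nugget, sill)
    for name, (nugget, sill, _) in table3.items()
}

for name, (nugget, sill, eta_tab) in table3.items():
    print(f"{name:14s} total sill = {nugget + sill:.3f}  "
          f"eta' = {computed[name]:.3f} (table: {eta_tab:.3f})")
```

Under this assumption the recomputed shares match the tabulated η' values to within rounding of the three-decimal inputs, which supports reading each column of the table as one fitted model.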
FIGURE 1. Neighbor Structure Definition for CAR Methods
FIGURE 2. Theoretical Semivariogram Structure and Definition.
FIGURE 3. Robust Semivariogram Models of Residuals.
FIGURE 4. Posterior Distribution for the Range (h) for Several Prior Distributions.