Calibration of Highway Safety Manual Safety Performance Function: Development of New Models for Rural Two-Lane Two-Way Highways
Bradford K. Brimley, Mitsuru Saito, and Grant G. Schultz
B. K. Brimley, Texas A&M University, College Station, TX 77843. M. Saito and G. G. Schultz, Department of Civil and Environmental Engineering, Brigham Young University, Provo, UT 84602. Corresponding author: B. K. Brimley, brad. [email protected].
This paper documents the calibration of the Highway Safety Manual (HSM) safety performance function (SPF) for rural two-lane two-way roadway segments in Utah and the development of new SPFs through negative binomial regression. Crash data from 2005 to 2007 on 157 selected study segments in Utah provided a 3-year frequency of observed crashes to calibrate the HSM SPF and develop new models. The calibration factor for the HSM SPF for rural two-lane two-way roads in Utah is 1.16, indicating that the original HSM model underpredicts crashes in Utah. The HSM suggests that jurisdiction-specific SPFs may predict crashes with greater reliability than calibrated SPFs. The following variables were significant in each of the four models developed by this research: annual average daily traffic (AADT), segment length, speed limit, and the percentage of AADT composed of multiple-unit trucks. AADT and segment length are used in the HSM SPF; speed limit and the percentage of AADT composed of multiple-unit trucks were found to correlate significantly with observed crash frequencies. The fourth negative binomial model developed in the study would be the best SPF to predict crashes on rural highways in Utah. As encouraged by the HSM and contemporary research, the empirical Bayes method can be applied with each jurisdiction-specific SPF because the analysis provided an overdispersion parameter for each model.
The Highway Safety Manual (HSM), published in 2010 by AASHTO, contains safety performance functions (SPFs) that predict the safety of a roadway in terms of the number of crashes. SPFs incorporate known information about a roadway entity into an equation that gives a predicted crash frequency (1). SPFs that accurately predict crashes are valuable to state and local transportation agencies because of the SPFs’ ability to detect areas with safety concerns. The SPFs in the HSM have been developed through extensive research across the United States. AASHTO recognizes that many factors that affect safety are unique to local areas and, thus, recommends that the HSM SPFs be calibrated to better represent local conditions (1). The Utah Department of Transportation (DOT) desired to calibrate the HSM SPF for rural two-lane two-way roads in the state. To discover the prediction ability of different model forms and additional variables, new SPFs were also developed. This paper presents the methodology and findings of developing the HSM calibration factor and new jurisdiction-specific models for two-lane two-way highways in Utah, based on a research project completed for the Utah DOT (2). First, a background on the HSM and crash modeling is given, followed by a discussion on the data and the variables used in the study. Then, the results of the HSM calibration and the new models are presented, followed by the conclusions obtained from the results. These conclusions are valuable to transportation agencies as they identify methods of evaluating and reducing crashes in the agencies’ jurisdictions.
Background
HSM Predictive Method
SPFs
SPFs, contained in Chapters 10 to 12 (Part C) of the HSM, utilize known information about a roadway, such as geometry and annual average daily traffic (AADT), to predict the number of crashes on a roadway entity for 1 year. SPFs may be used with the existing roadway conditions, but the SPFs may also be applied to future conditions with a projected AADT. The continual changes in the observed safety of roadways make it difficult to determine which variables should be used to predict the number of crashes at a given site. For most SPFs, the only variable that changes from year to year is the AADT. The SPFs in the HSM were developed from studies that involved a number of areas of the United States and may be calibrated to better predict the safety of a specific jurisdiction, such as a state or county. Equation 1 is the SPF for rural two-lane two-way road segments that meet the base conditions documented in the HSM (1). The base conditions describe roadway characteristics that must be met to use the SPF in its original form. A crash modification factor (CMF) is applied to the model when a base condition is not met.
$$N_{spf} = \mathrm{AADT} \times L \times 365 \times 10^{-6} \times e^{-0.312} \qquad (1)$$
where Nspf is the predicted number of annual crashes and L is the segment length in miles. The HSM base conditions are
Transportation Research Record: Journal of the Transportation Research Board, No. 2279, Transportation Research Board of the National Academies, Washington, D.C., 2012, pp. 82–89. DOI: 10.3141/2279-10
• Lane width, 12 ft;
• Shoulder width, 6 ft;
• Shoulder type, paved;
• Roadside hazard rating, three;
• Driveway density, five driveways per mile;
• Horizontal curvature, none;
• Vertical curvature, none;
• Centerline rumble strips, none;
• Passing lanes, none;
• Two-way left-turn lanes, none;
• Lighting, none;
• Automated speed enforcement, none; and
• Grade level, 0%.
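As a minimal sketch, Equation 1 can be evaluated directly for a segment that meets the base conditions. The function name `hsm_base_spf` and the example segment (3,000 vehicles per day, 1 mi) are illustrative assumptions and do not come from the study data.

```python
import math


def hsm_base_spf(aadt: float, length_mi: float) -> float:
    """Equation 1: predicted annual crashes on a base-condition segment."""
    if aadt < 0 or length_mi < 0:
        raise ValueError("AADT and segment length must be non-negative")
    return aadt * length_mi * 365 * 1e-6 * math.exp(-0.312)


# Hypothetical segment, chosen only for illustration.
example_crashes = hsm_base_spf(aadt=3000, length_mi=1.0)
print(f"{example_crashes:.3f} predicted crashes per year")
```

Because the model is linear in both AADT and segment length, doubling either input doubles the prediction.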
CMFs
CMFs are applied to an SPF when the characteristics of a site deviate from the base conditions given in the HSM. The base prediction value (Nspf) is multiplied by the CMFs, which adjusts the base predicted crash frequency to meet the actual conditions, as shown in Equation 2 (1). A CMF greater than one indicates an increase in predicted crashes attributable to the nonbase condition; a CMF less than one represents a reduction in crashes.
$$N = N_{spf} \times CMF_1 \times CMF_2 \times \cdots \times CMF_i \qquad (2)$$
where N is the predicted number of crashes, considering all conditions, and CMFi is the ith crash modification factor. The HSM calibration procedure allows the calibration of two groups of segments: those that conform strictly to the base conditions and those with a variety of characteristics whose CMFs are included in the total prediction. This study used the latter case, providing a calibration of segments that did not all conform to the HSM base conditions. This was done because it was impractical to find enough study segments that completely met the HSM base conditions to properly calibrate the model.
Empirical Bayes Method
The empirical Bayes method is used on the basis of the recognition that the safety of a site is best estimated by considering both the number of observed crashes at the site and the number of crashes at sites with similar characteristics, as predicted by the SPF (3). The use of this method produces the expected number of crashes at a site through a mathematical combination of the predicted and observed crash frequencies. This method has been applied to road safety for a number of years and received even more attention with the publication of Hauer’s Observational Before–After Studies in Road Safety: Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety (4). With additional support from the HSM, the empirical Bayes method may be considered to be the standard in the evaluation of road safety. Many studies have used this method for various applications, and there is a general concurrence regarding its usefulness (5–7). This paper does not discuss in detail the use of this method. Because the new models produced in this study were developed through negative binomial regression, which provides the overdispersion parameter required by the empirical Bayes method, this method can be utilized with each model.
Calibrating HSM SPFs and Developing Jurisdiction-Specific SPFs
Calibration of HSM Model
SPFs can better predict crashes when they are calibrated to match the characteristics of the local roads and populations. This has already been done in some states (8–10). The calibration is performed by applying a multiplicative factor to an SPF so that its aggregate crash prediction within a whole jurisdiction is equal to the aggregate number of observed crashes. The calibration preserves the original model form and the relationship between independent variables and crashes. Equation 3 illustrates how to use the calibration factor:
$$N_{local} = C \times N \qquad (3)$$
where Nlocal is the total predicted crashes in a local jurisdiction and C is the calibration factor. The calibration factor is calculated by rearrangement of Equation 3 and substitution of an observed crash frequency for Nlocal, shown in Equation 4 (1).
$$C = \frac{N_{observed}}{N} \qquad (4)$$
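The relationships in Equations 2 and 4 can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the study: the function names and the CMF values (1.05 and 0.94) are hypothetical, while the crash totals are the aggregate figures this paper reports for the 157 Utah segments (426 observed, 368 predicted).

```python
from math import prod


def apply_cmfs(n_spf: float, cmfs: list[float]) -> float:
    """Equation 2: adjust the base prediction by the product of all CMFs."""
    return n_spf * prod(cmfs)


def calibration_factor(observed: list[float], predicted: list[float]) -> float:
    """Equation 4: total observed crashes divided by total predicted crashes."""
    total_predicted = sum(predicted)
    if total_predicted <= 0:
        raise ValueError("total predicted crashes must be positive")
    return sum(observed) / total_predicted


# Hypothetical CMFs for one segment: one nonbase condition that raises the
# prediction (1.05) and one that lowers it (0.94).
adjusted = apply_cmfs(1.0, [1.05, 0.94])

# Aggregate totals reported in this paper for the 3-year study period.
c_utah = calibration_factor(observed=[426], predicted=[368])
print(f"adjusted prediction = {adjusted:.3f}, C = {c_utah:.2f}")
```

The factor is computed from jurisdiction-wide totals rather than as an average of per-segment ratios, which matches the aggregate definition in Equation 3.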
The HSM recommends that agencies use a sample set of at least 30 to 50 sites, selected without regard to their crash frequencies, to determine a calibration factor for an entire jurisdiction (1). Since the SPF calibration in this study is for rural two-lane two-way roads in Utah, the local jurisdiction is the entire state.
Development of Jurisdiction-Specific SPFs
When enough data are available, the HSM allows users to create jurisdiction-specific SPFs. It is recommended that SPFs be developed by using negative binomial regression techniques that account for the dispersion present in crash data and estimate an overdispersion parameter. When nonbase conditions are used, as was the case in this study, the new SPF should be able to be converted to base conditions by substituting the base condition values into the model (1).
Data Needs
The HSM specifies what data are required to calibrate the SPFs or develop new models. The data requirements for each model can be found in each model’s respective HSM chapter. The HSM recognizes that some data may not be available, specifically mentioning the extra effort involved in obtaining horizontal curvature data. The calibration can then be performed with the available data (1). In this study, the data came strictly from tangent segments, so the results are only applicable to such segments. Variables in addition to those required by the HSM were included in the new models because it was believed that the variables could make a positive contribution to the prediction of crash frequencies.
Literature Review
Variable Selection
SPFs have long been developed for various types of entities, such as rural roads, urban arterials, intersections, freeways, and even
freeway ramps (11–15). The development of an SPF requires the variables that may be correlated with crash frequencies to be identified and the associated data to be obtained. The preferred model considers independent variables that best predict crashes and has a reasonable functional form, showing a logical connection between the variables and the results. Because of the time and resources required to collect data, agencies are often limited in the amount of data that can be used in a model. The HSM model for two-lane two-way rural roads considers the exposure, the cross section geometry (lane and shoulder widths), the curvature, and the density of driveways, among other variables, to be measurable characteristics that affect safety (1). Garber and Ehrhart showed that the distribution of vehicle speeds, expressed by the standard deviation, was correlated with crashes (16). Harwood et al. noted that wider lanes provided a buffer against driver mistakes or inattentiveness and thus resulted in lower crash rates (17). The CMFs for lane width in the HSM reinforce this. Hauer, however, showed that rural roads with 12-ft lanes were less safe than rural roads with 11-ft lanes (18). Mayora and Rubio presented the following factors as having a high correlation with crash rates: access density, sight distance, speed limit, and proportions of no-passing zones (19). The SPFs developed by Zegeer et al. included variables for flat or mountainous terrain (15). Griffin et al. found that there was an increase in overall crashes as speed limits increased (20). Vogt and Bared, however, found no such relationship between crashes and speed (14). The contradictions among the research mentioned here indicate that models should be robust enough to identify the unique attributes of their representative populations. Thus, jurisdiction-specific SPFs may be preferred over nationwide ones. 
There is also the issue of modeling crashes (which are most often caused by human error) by using roadway characteristics, which suggests that crashes are caused by the inadequacies of the roadway. However, SPFs contain predictive rather than causal factors (12). This correlation is not the same as causality. Correlation indicates that there is a noticeable relationship between an observation and a hypothesized occurring factor. Hauer’s observation that 12-ft lanes may experience higher crash rates than 11-ft lanes (18) may be a result of more aggressive behavior on roads with wider lanes, or it may be a product of the higher design standards required of roads with higher traffic volumes. In either case, the lane width itself is not a causal factor but can be used to predict crashes because of its correlation to observed occurrences.
Time Periods for Analysis
Because crashes are rare and random events, crash frequencies are best expressed as an average for a time period of multiple years. A multiyear time period takes advantage of the regression to the mean phenomenon thoroughly discussed by Hauer (4). For the calibration of the HSM SPFs or the development of jurisdiction-specific SPFs, the HSM recommends using a period that reflects the length of time for which the models will be used (1).
Negative Binomial Models
Contemporary crash prediction models are most often developed using a negative binomial distribution. The negative binomial distribution is well suited to modeling crashes because of the naturally high variability of crash frequencies whose variance is greater than the mean. Negative binomial models are also referred to as mixed Poisson-gamma models because crashes within a site fit a Poisson distribution, but the variation across multiple sites is gamma distributed (21, 22). The gamma-distributed error in the negative binomial model is the source of the overdispersion parameter used in the empirical Bayes method (3). The overdispersion parameter is an indication of the precision of the model and the variability of the crash frequencies. A value of one is indicative of Poisson-distributed data, and greater overdispersion values are indicative of more variability.
Data Collected for Modeling
Appendix A of Part C of the HSM notes that segments that do not conform to the HSM base conditions can be used for SPF calibration and the development of jurisdiction-specific SPFs (1). Few rural roads in Utah conform to the strict base conditions of the HSM. In fact, only 14 of the 157 study segments in the data set utilized in this research met the HSM base conditions. The study segments were selected in a manner that was as random as possible. Horizontal curvature data were not available at the time of the study; therefore, only tangent segments were used, as suggested in Appendix A of the HSM (1). The Utah DOT’s Roadview Explorer provided photologs of the roadways; the photologs were used to select segments and obtain visual data (23). Geometric measurements were obtained from Google Earth (24). The Utah DOT annually collects AADT values for state highways and local federally sponsored roads, which are published and available to the public (25, 26). Because these data are easily accessible, all of the study segments were on either a state or federal highway. The Utah DOT additionally supplied crash histories for the study period. The following data were collected for each study segment, with the source of the data given in parentheses:
• The segment length, the number of driveways, the presence of a shoulder rumble strip, the passing ability based on centerline striping, and the speed limit (23);
• The lane width, the shoulder width, and the longitudinal grade (24);
• AADT (25); and
• The percentage of single-unit trucks and the percentage of multiple-unit trucks (26).
The speed limit, the presence or absence of a shoulder rumble strip, the passing ability (indicated by a 0, 1, or 2 for the number of lanes that permitted a passing movement), and the percentage of single-unit and multiple-unit trucks (measured as a percentage of the AADT) are not utilized in the HSM SPF for rural two-lane two-way roads. These are new variables that were hypothesized to have a measurable correlation with total crash frequencies. The HSM does not discuss using other variables in developing jurisdiction-specific models. As such, the CMFs from the HSM may not necessarily apply to these new SPFs. The new variables were included in the analysis to explore possible, previously unknown, relationships. Even though the SPF in the HSM can be used with AADT values as high as 17,000 vehicles per day and does not consider speed limit, the modeling in this study did not include study segments with AADT values greater than 10,000 vehicles per day or speed limits less than 55 mph. There are some rural two-lane two-way roads that
have an AADT as high as 17,000 vehicles per day or lower speed limits; however, these segments are not representative of the bulk of rural two-lane highways in Utah, and it was thought that their extreme characteristics could undermine the prediction capability of any new SPF. The developed models are valid only for segments whose characteristics fit within the range of the original data. To illustrate the characteristics of the study segments, Table 1 gives, for the 157 segments, the minimum, median, mean, and maximum of the data variables that have numerical values. The median values for the number of driveways and the driveway density are zero, emphasizing the absence of driveways along these rural highways.
TABLE 1 Descriptive Statistics of Data Variables
| Variable | Minimum | Median | Mean | Maximum |
| Segment length (mi) | 0.20 | 0.64 | 0.97 | 5.85 |
| Longitudinal grade (%) | 0.00 | 0.76 | 1.11 | 7.13 |
| Number of driveways | 0 | 0 | 1.38 | 14 |
| Driveway density (driveways per mile) | 0.0 | 0.0 | 1.8 | 21.2 |
| Speed limit (mph) | 55 | 65 | 64.0 | 65 |
| Lane width (ft) | 10.2 | 12.1 | 12.1 | 16.6 |
| Shoulder width (ft) | 0.0 | 4.1 | 4.7 | 11.4 |
| AADT (vpd) | 287 | 2,739 | 2,787 | 8,270 |
| Single-unit trucks (%) | 3 | 10 | 12.6 | 32 |
| Multiple-unit trucks (%) | 4 | 16 | 21.9 | 60 |
Note: vpd = vehicles per day.
Results
Calibration of HSM Model
From 2005 to 2007, 426 crashes were reported on the 157 studied segments. The HSM predicts 368 total crashes for these 3 years, using all of the applicable CMFs. Through the use of Equation 4, the calibration factor was found to be 1.16. Equation 5 gives the Utah-calibrated HSM SPF for two-lane two-way rural highway segments by using the calibration factor. Equation 5 is simplified to Equation 6. The calibrated SPF still requires the use of CMFs as directed by the HSM (1).
$$N_{local} = 1.16 \times \mathrm{AADT} \times L \times 365 \times 10^{-6} \times e^{-0.312} \qquad (5)$$
$$N_{local} = \mathrm{AADT} \times L \times 3.09 \times 10^{-4} \qquad (6)$$
Jurisdiction-Specific Negative Binomial Models
Four negative binomial models were developed with two model types at two levels of confidence. The first model type incorporated the original data. The second used a natural log transformation of the AADT, transformed to normalize the data. The SPFs were developed with the statistical software SAS through a backward stepwise technique (27). Each type of model was first developed at a 75% level of confidence to identify the general relationships between the independent variables and the crash frequencies. The backward stepwise process was continued until a model with a 95% confidence was reached. For the 75% confidence level, all of the variables with a p-value greater than .25 were removed. For the 95% confidence level, all of the variables with a p-value greater than .05 were removed. The subsections that follow discuss the form of the negative binomial models, the four models developed with data from Utah, some general observations of the models, a comparison of the models, the method of selecting a preferred model, and the limitations of the results of the study.
Negative Binomial Model Form
The form of the negative binomial model used to develop the jurisdiction-specific SPFs is shown in Equation 7.
$$\ln(N) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i \qquad (7)$$
where N = number of predicted crashes, β0 = intercept, βi = coefficient for variable xi, xi = independent variable, and p = number of independent variables. A rearrangement of Equation 7 directly predicts the number of crashes for a given year, illustrated by Equation 8:
$$N = e^{\beta_0 + \sum_{i=1}^{p} \beta_i x_i} = \exp\left(\beta_0 + \sum_{i=1}^{p} \beta_i x_i\right) \qquad (8)$$
The SPFs that follow are generally written with this form. Coefficients less than zero show a reducing effect on crash frequencies; coefficients greater than zero show an increasing effect on crash frequencies.
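The sign interpretation of the coefficients in Equation 8 follows from the exponential form: raising a variable xi by one unit multiplies the prediction by exp(βi). The sketch below illustrates this with a hypothetical intercept and two hypothetical coefficients (`x1` and `x2` are placeholder names, not variables from the study).

```python
import math


def predict_crashes(intercept: float, coefficients: dict, values: dict) -> float:
    """Equation 8: N = exp(beta_0 + sum of beta_i * x_i)."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return math.exp(linear_predictor)


# Hypothetical model: one positive and one negative coefficient.
intercept = -1.0
coefficients = {"x1": 0.4, "x2": -0.2}

base = predict_crashes(intercept, coefficients, {"x1": 1.0, "x2": 3.0})
plus_x1 = predict_crashes(intercept, coefficients, {"x1": 2.0, "x2": 3.0})
plus_x2 = predict_crashes(intercept, coefficients, {"x1": 1.0, "x2": 4.0})

# A one-unit increase scales the prediction by exp(beta) for that variable.
print(plus_x1 / base, math.exp(coefficients["x1"]))  # ratio above 1
print(plus_x2 / base, math.exp(coefficients["x2"]))  # ratio below 1
```

The multiplier does not depend on the values of the other variables, which is why each coefficient can be read on its own.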
Utah-Specific Negative Binomial Models
TABLE 3 Conventional Model at 95% Confidence Level
Four SPFs were developed through the negative binomial modeling, as presented in the full Utah DOT report (2). First, two conventional models (no data transformations, at the 75% and 95% confidence levels) are given, followed by two models that use the natural log of the AADT (also at the 75% and 95% confidence levels).
Conventional Negative Binomial SPFs
The conventional models were formed with no data transformations. Table 2 shows the estimates and p-values from the conventional model at a 75% confidence level (p-value < .25). The written model is shown in Equation 9. The overdispersion parameter is 1.20.
$$N = \exp\left[-7.49 + (0.0002)(\mathrm{AADT}) + (0.429)(L) + (0.0286)(\mathrm{DD}) - (1.60)(\text{No Passing}) - (0.128)(\text{One-Direction Passing}) - (0.268)(\text{No SRS}) - (0.0219)(\mathrm{CT}) + (0.104)(\mathrm{Speed})\right] \qquad (9)$$
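As a rough illustration, the 75% confidence conventional model can be evaluated for a single segment. Several readings here are assumptions: DD is taken as driveway density (driveways per mile), SRS as shoulder rumble strip, CT as the multiple-unit truck percentage, and the three indicator terms as 1 when the named condition holds and 0 otherwise. The example segment combines the mean values from Table 1 with assumed indicator settings; it is not an actual study segment.

```python
import math

# Coefficients as written in the 75% confidence conventional model.
EQ9_INTERCEPT = -7.49
EQ9_COEFFICIENTS = {
    "aadt": 0.0002,
    "length_mi": 0.429,
    "dd": 0.0286,                      # assumed: driveways per mile
    "no_passing": -1.60,               # assumed: 1 if no passing allowed
    "one_direction_passing": -0.128,   # assumed: 1 if one direction may pass
    "no_srs": -0.268,                  # assumed: 1 if no shoulder rumble strip
    "ct": -0.0219,                     # assumed: multiple-unit trucks, % of AADT
    "speed": 0.104,                    # speed limit, mph
}


def eq9_prediction(segment: dict) -> float:
    """Evaluate the exponential model for one segment."""
    linear_predictor = EQ9_INTERCEPT + sum(
        beta * segment[name] for name, beta in EQ9_COEFFICIENTS.items()
    )
    return math.exp(linear_predictor)


# Hypothetical segment built from the Table 1 means, with passing allowed in
# both directions and no shoulder rumble strip.
segment = {
    "aadt": 2787, "length_mi": 0.97, "dd": 1.8,
    "no_passing": 0, "one_direction_passing": 0, "no_srs": 1,
    "ct": 21.9, "speed": 64,
}
eq9_example = eq9_prediction(segment)
print(f"{eq9_example:.2f} predicted crashes")
```

As the paper notes, the model is valid only for segments whose characteristics fall within the range of the original data, so inputs outside the Table 1 ranges should not be used.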
Parameter
Estimate
p-Value
Intercept AADT Segment length No passing One-direction passing Multiple-unit truck percentage
-7.17 0.0003 0.423 -1.51 -0.0812 -0.0219
.0017