Modeling Route Choice Behavior
How Relevant Is the Composition of Choice Set?

Carlo Giacomo Prato and Shlomo Bekhor

C. G. Prato, Transportation Research Institute, and S. Bekhor, Faculty of Civil and Environmental Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel. Corresponding author: C. G. Prato, [email protected]. Transportation Research Record: Journal of the Transportation Research Board, No. 2003, Transportation Research Board of the National Academies, Washington, D.C., 2007, pp. 64–73. DOI: 10.3141/2003-09

Most route choice models are related to revealed choice behavior and are estimated by adding alternative paths to observed routes. This paper focuses on the effects of choice set composition in route choice modeling by designing an experimental analysis of actual route choice behavior of individuals driving habitually from home to work in an urban network. The numerical analysis concentrates on a qualitative perspective, by considering path sets built with different generation techniques, and a quantitative perspective, by accounting for path sets constructed with sample size reduction from each initial choice set. Comparison of prediction accuracy across different choice sets suggests that a recently developed branch and bound algorithm generates heterogeneous routes that allow for estimating models with better prediction abilities with respect to the outcomes of the drivers’ actual choices. Further, comparison of route choice models across different choice set compositions indicates that nonnested structures, such as C-logit and path size logit, yield more robust parameter estimates.

Most route choice models relate to revealed choice behavior and consider separately the generation of additional alternative paths and calculation of the probability of choosing the observed routes from the generated choice set. Selective path generation techniques are preferable to unrealistic exhaustive approaches to create alternative routes. Traditional methods, based on the shortest path search, include the following: a labeling approach, which minimizes generalized cost functions according to link attributes (1); a link penalty, which gradually increases the impedance of all links on the shortest path (2); a link elimination, which removes the shortest paths from the network in sequence to generate new routes (3); and a simulation method, which produces alternative paths by drawing link impedances from probability distributions (4). An alternative approach is the branch and bound algorithm, which constructs a connection tree between origin and destination of a trip by processing sequences of links according to a branching rule that accounts for behavioral constraints formulated to increase route likelihood and heterogeneity (5).

Model specifications that account for correlation among alternatives are preferable to the multinomial logit (MNL) model to represent route choice behavior. MNL modifications, such as C-logit and path size logit (PSL), include a correction term in the deterministic part of the utility function and maintain a simple logit structure (6–8). Generalized extreme value (GEV) specifications, such as cross nested logit (CNL) and generalized nested logit (GNL), relate the network topology to model parameters in the stochastic term of the utility function and present a more complex structure (9, 10). Probit and logit kernel (LK) assume that the covariance of path utilities is proportional to overlap lengths (11–13).

Modeling revealed route choice behavior translates into the selection of a path generation method and a model specification, and this two-stage process raises several modeling questions. How dependent are the parameter estimates on the path generation technique implemented to construct the choice set? How reliant are the parameter estimates on the number of alternative routes considered in the choice set? How does the sensitivity to the formation of the set of alternatives differ among route choice model specifications?

The issue of the choice set composition is the most difficult to resolve in the context of modeling revealed choice behavior (14). Biased model parameters, statistical inconsistency of parameter estimates, and violation of the independence from irrelevant alternatives assumption are related to misspecification of the choice set (15–19). Consistency of utility parameter estimates for MNL has been proved theoretically by sampling from the full set of alternatives (20). Accuracy and efficiency of model estimates for MNL and mixed logit have been tested empirically by reducing randomly the size of a synthetic data set (21). Several researchers sampled the full set of available alternatives and estimated a choice model while investigating industrial location choice (22–24), residential location choice (25–27), destination choice (28, 29), and consumer behavior (30, 31).

Route choice is similar to these situation contexts in that hundreds of alternatives are potentially available to decision makers, but this similarity does not imply that the sampling approach is appropriate. Consistency of parameter estimates with sampled data sets has been neither proved theoretically nor investigated empirically for any model specification other than MNL, and the MNL model is not suitable to model route choice.

This paper focuses on the effects of choice set composition in route choice modeling by designing an experimental analysis of actual route choice behavior of individuals habitually driving from home to work in Turin and participating in a web-based questionnaire at the Turin Polytechnic (32). The analysis covers extensively the route choice context: five path generation techniques are implemented (labeling approach, link penalty, link elimination, simulation method, branch and bound algorithm) and five model specifications are estimated (C-logit, PSL, CNL, GNL, LK).

This paper explores different dimensions related to the impact of choice set formation on model estimates. Model estimates from path sets built with different generation techniques make it possible to understand the influence of the qualitative composition of the choice set. Model estimates from path sets created with sample size reduction from the initial data sets make it possible to comprehend the importance of the quantitative composition of the choice set. Model estimates from different choice sets for the same model specification make it possible to compare the robustness of route choice models.

The remainder of the paper is structured as follows. The next section describes the experimental design for evaluation of the performances of different combinations of choice set composition and model specification. Then the numerical results are illustrated and the impact of the choice set specification on parameter estimates and log likelihood function values is discussed. Finally, the results are summarized and conclusions are presented.

EXPERIMENTAL DESIGN

This section describes implementation of the experimental design, which consists of (a) application of the path generation techniques to the urban network, (b) formation of initial choice sets that are behaviorally consistent with the observed behavior and extraction of subchoice sets that are randomly sampled from the initial sets of alternatives, (c) estimation of the route choice models using the generated choice sets, and (d) evaluation of the ability to reproduce utility parameter estimates and log likelihood function values for different combinations of choice set composition and model specification.

Path Generation

Path generation techniques are applied to the urban network of the city of Turin, Italy, which contains 419 nodes and 1,427 links on the basis of the urban traffic plan designed by the municipality in 2001 (33). Prato and Bekhor (5) provide details about path generation methods from a theoretical, behavioral, and practical perspective. Figure 1 illustrates the network structure and highlights the position of the common destination of the survey participants as well as the major arterials according to the road hierarchy.

The labeling approach is applied by minimizing the path cost function with respect to attributes such as link distance, free-flow time, travel time, and delay, which measures the level of congestion through the difference between travel times in congested and free-flow conditions.

The link elimination approach is modified by running the following routine 10 times: (a) calculation of the shortest travel time path, (b) elimination from the shortest path of a link that takes the driver farther from the destination and closer to the origin or compels the driver to turn from a high hierarchical road to a low hierarchical road, and (c) computation of the next shortest path.

The link penalty approach is adapted by repeating a similar procedure 15 times: (a) calculation of the shortest travel time path, (b) penalizing the shortest path links with a factor equal to 1.05, and (c) computation of the next best path.

The simulation method is implemented twice by calculating the shortest path for each draw of link impedances from a truncated normal distribution with the mean equal to the travel time, variances equal to 20% or 100% of the mean, left truncation limit equal to the free-flow time, and right truncation limit equal to the time calculated for a minimum speed assumed equal to 10 km/h.

The branch and bound algorithm is applied by defining the factors of the branching rule. The behavioral constraints exclude from the path set routes that take the driver 10% of the distance farther from the destination and closer to the origin, constitute unrealistic options because of being more than 150% of the maximum impedance in terms of travel time, contain detours larger than 120% with respect to other routes, involve difficulty in being distinguished due to their high similarity (80% or more) to other paths, or account for more than four maneuvers that cause delays and dangers to traffic circulation.

FIGURE 1 Urban network of Turin (map labels: NS1, NS2, NS3, WE1, WE2, WE3, P).
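As an illustration of the link penalty routine described above, the following minimal sketch repeats a shortest path search 15 times and multiplies the impedance of the links on each shortest path by 1.05. The dictionary-based graph, the function names, and the toy network are assumptions made for the example; they are not the implementation used in the study.

```python
import heapq


def shortest_path(graph, origin, destination):
    """Dijkstra search on {node: {neighbor: travel_time}}; returns a node list or None."""
    dist = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if destination not in dist:
        return None
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1]


def link_penalty_paths(graph, origin, destination, iterations=15, factor=1.05):
    """(a) find the shortest path, (b) penalize its links, (c) search again."""
    costs = {u: dict(nbrs) for u, nbrs in graph.items()}  # working copy of impedances
    unique = []
    for _ in range(iterations):
        path = shortest_path(costs, origin, destination)
        if path is None:
            break
        if path not in unique:
            unique.append(path)
        for u, v in zip(path, path[1:]):
            costs[u][v] *= factor
    return unique


# Hypothetical four-node network with two competing routes from A to D.
toy = {"A": {"B": 1.0, "C": 1.02}, "B": {"D": 1.0}, "C": {"D": 1.02}}
paths = link_penalty_paths(toy, "A", "D")
```

In this toy network the first search returns A-B-D; after the penalty its cost exceeds that of A-C-D, so the second route enters the path set on the next iteration.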

Composition of Choice Set

The objective of the choice set formation is maximization of coverage of the collected routes and the consequent composition of choice sets behaviorally consistent with the observed behavior. Choice sets may correspond to path sets generated by single methods with good performance indexes or to the combination of path sets produced by different methods with poor individual performances. Coverage measures the percentage of observations for which a path generation technique reproduces the actual behavior according to a certain overlap threshold, which expresses the degree of similarity between generated and collected routes.

$$\max_{r} \sum_{n=1}^{N} I\left(O_{nr} \ge \delta\right) \qquad (1)$$

where
$I(\cdot)$ = coverage function, equal to 1 when its argument is true and 0 otherwise,
$O_{nr}$ = overlap measure for technique r and observation n, and
$\delta$ = overlap threshold.

$$O_{nr} = \frac{L_{nr}}{L_n} \qquad (2)$$

where $L_{nr}$ is the overlapping length between generated and observed routes and $L_n$ is the length of the observed path for driver n. The index of behavioral consistency compares a path generation method with the ideal algorithm that would replicate link by link all the routes reported in the survey, with a resulting coverage of 100% for a 100% overlap threshold.

$$CI_r = \frac{\sum_{n=1}^{N} O_{nr,\max}}{N \, O_{\max}} \qquad (3)$$

where
$CI_r$ = consistency index of algorithm r,
$O_{nr,\max}$ = maximum overlap measure obtained with the paths generated by algorithm r for the observed choice of each driver n, and
$O_{\max}$ = 100% overlap over all N observations for the ideal algorithm.

After initial choice sets are composed by maximizing the objective functions, reduced choice sets are formed by extracting percentages of alternatives from these initial data sets. The number of alternatives varies across the observations, and for each observation and each choice set (S − 1) alternatives except the chosen one are randomly selected and then the chosen route is added to achieve the size S corresponding, respectively, to 75%, 50%, and 25% of the alternatives in the initial data sets. For each class of sample size reduction, the sampling procedure is repeated 10 times to account for variance due to the randomness of the selection process.
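The coverage of Equation 1, the consistency index of Equation 3, and the random extraction of reduced choice sets can be sketched as follows. This is a minimal illustration: the function names, the rounding of the target size S, the lower bound of two routes per reduced set, and the example overlap values are assumptions, not details reported in the study.

```python
import random


def coverage(max_overlaps, threshold):
    """Percentage of observations whose best generated route reaches the overlap threshold."""
    hits = sum(1 for o in max_overlaps if o >= threshold)
    return 100.0 * hits / len(max_overlaps)


def consistency_index(max_overlaps, o_max=1.0):
    """CI_r: sum of the best overlaps divided by N times the ideal overlap O_max."""
    return sum(max_overlaps) / (len(max_overlaps) * o_max)


def reduced_choice_set(alternatives, chosen, share, rng):
    """Draw S - 1 non-chosen routes at random, then add the chosen route."""
    # Assumption: S is the rounded share of the initial set, with at least
    # one alternative kept next to the chosen route.
    size = max(2, round(share * len(alternatives)))
    others = [a for a in alternatives if a != chosen]
    return [chosen] + rng.sample(others, size - 1)


# Hypothetical best overlaps O_nr,max for four observations.
best = [1.0, 0.85, 0.6, 1.0]
cov_80 = coverage(best, 0.8)
ci = consistency_index(best)

# Hypothetical initial choice set of eight routes, reduced to 50%.
rng = random.Random(0)
subset = reduced_choice_set(list("abcdefgh"), "c", 0.5, rng)
```

Repeating the last call 10 times with different random draws would mirror the repetitions used for each class of sample size reduction.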

Specifications of Route Choice Model

In this analysis, route choice models account for correlation among alternatives and introduce specific parameters to be estimated. C-logit and PSL present similar functional forms by maintaining the MNL structure, but each model interprets differently the correction term that measures the degree of similarity of each route with respect to all other routes in the choice set. In C-logit, the commonality factor indicates that the utility of a path must be reduced because of similarity with other routes and, given the logarithmic form of Equation 4, is always nonnegative. In PSL, the path size presents the fraction of the path that constitutes a “full” alternative and is always less than or equal to 1. In the present research, commonality factors and path sizes are calculated according to the following formulations (6, 8):

$$CF_k = \ln\left[1 + \sum_{\substack{l \in C_n \\ l \ne k}} \left(\frac{L_{kl}}{\sqrt{L_k L_l}}\right) \left(\frac{L_k - L_{kl}}{L_l - L_{kl}}\right)\right] \qquad (4)$$

where
$CF_k$ = commonality factor of route k,
$L_k$ = length of route k,
$L_l$ = length of each route l in choice set $C_n$, and
$L_{kl}$ = common length between routes k and l.

$$PS_k = \sum_{a \in \Gamma_k} \frac{L_a}{L_k} \, \frac{1}{\sum_{l \in C_n} \left(\frac{L_k}{L_l}\right)^{\gamma} \delta_{al}} \qquad (5)$$

where
$PS_k$ = path size of route k,
$\Gamma_k$ = set of links belonging to route k,
$L_a$ = length of link a,
$\delta_{al}$ = link-path incidence dummy (equal to 1 if route l uses link a, and 0 otherwise), and
$\gamma$ = positive parameter that accounts for different size contributions due to routes with different lengths.
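A small numerical sketch may help to read Equations 4 and 5. The code below computes the commonality factor and the path size for a hypothetical choice set of two routes that share one link; the data structures, function names, and link lengths are illustrative assumptions, and the commonality factor is undefined when a route l lies entirely on the overlap (L_l = L_kl).

```python
import math


def commonality_factor(k, lengths, common):
    """CF_k of Equation 4; common[(k, l)] is the length shared by routes k and l."""
    total = 0.0
    for l in lengths:
        if l == k:
            continue
        l_kl = common.get((k, l), common.get((l, k), 0.0))
        similarity = l_kl / math.sqrt(lengths[k] * lengths[l])
        total += similarity * (lengths[k] - l_kl) / (lengths[l] - l_kl)
    return math.log(1.0 + total)


def path_size(k, routes, link_length, gamma=0.0):
    """PS_k of Equation 5; routes maps each route to the list of links it uses."""
    lengths = {r: sum(link_length[a] for a in links) for r, links in routes.items()}
    size = 0.0
    for a in routes[k]:
        # Sum over the routes that use link a (incidence dummy equal to 1).
        denom = sum((lengths[k] / lengths[l]) ** gamma
                    for l, links in routes.items() if a in links)
        size += (link_length[a] / lengths[k]) / denom
    return size


# Hypothetical example: routes 1 and 2 share link "a" (length 2).
link_length = {"a": 2.0, "b": 3.0, "c": 3.0}
routes = {1: ["a", "b"], 2: ["a", "c"]}
lengths = {1: 5.0, 2: 5.0}
common = {(1, 2): 2.0}

cf_1 = commonality_factor(1, lengths, common)  # ln(1 + 0.4)
ps_1 = path_size(1, routes, link_length)       # 0.2 + 0.6
```

With gamma equal to 0 the length ratio in the denominator equals 1 for every route, so the sketch then counts only how many routes use each link.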

CNL and GNL introduce specific parameters to capture correlation among routes in the stochastic part of the utility function. In the current analysis, the inclusion coefficients of the alternatives in the nests are computed for both models, then for the CNL model the common nesting coefficient shared by all the nests is estimated, and for the GNL model the unique nesting coefficients for each nest are expressed according to the following parameterized formulation (9, 10):

$$\alpha_{mk} = \frac{L_m}{L_k} \, \delta_{mk} \qquad (6)$$

where
$\alpha_{mk}$ = inclusion coefficient of route k and link m ($0 \le \alpha_{mk} \le 1$),
$L_m$ = length of link m,
$L_k$ = length of route k, and
$\delta_{mk}$ = link-path incidence dummy (equal to 1 if route k uses link m and 0 otherwise).

$$\mu_m = \left(1 - \frac{\sum_{l \in C_n} \alpha_{ml}}{\sum_{l \in C_n} \delta_{ml}}\right)^{\gamma} \qquad (7)$$

where
$\mu_m$ = nesting coefficient of link m ($0 \le \mu_m \le 1$),
$\delta_{ml}$ = link-path incidence dummy (equal to 1 if route l uses link m and 0 otherwise), and
$\gamma$ = parameter to be estimated.

In the present study, LK presents a factor analytic specification that accounts for subnetwork components. These components correspond to the major arterials in the urban road hierarchy. Correlation among alternatives is considered for paths that share the same subnetwork component, even if they do not physically overlap (13). Correlation on the urban network is captured by path size and expressed according to the original formulation (7).

$$PS_k = \sum_{a \in \Gamma_k} \frac{L_a}{L_k} \, \frac{1}{\sum_{l \in C_n} \delta_{al}} \qquad (8)$$
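The inclusion coefficients of Equation 6 and the nesting coefficients of Equation 7 can be illustrated with a short sketch. The route and link data, the function names, and the value of gamma are hypothetical and serve only to show how the two formulas combine.

```python
def inclusion_coefficients(routes, link_length):
    """alpha[m][k] = L_m / L_k for every link m used by route k (Equation 6)."""
    route_length = {k: sum(link_length[m] for m in links) for k, links in routes.items()}
    alpha = {}
    for k, links in routes.items():
        for m in links:
            alpha.setdefault(m, {})[k] = link_length[m] / route_length[k]
    return alpha


def nesting_coefficients(alpha, gamma):
    """mu_m = (1 - average inclusion coefficient over routes using link m) ** gamma (Equation 7)."""
    # alpha[m] only stores routes with incidence dummy 1, so its length is the
    # denominator of Equation 7 and its sum is the numerator.
    return {m: (1.0 - sum(by_route.values()) / len(by_route)) ** gamma
            for m, by_route in alpha.items()}


# Hypothetical choice set: routes 1 and 2 share link "a".
link_length = {"a": 2.0, "b": 3.0, "c": 3.0}
routes = {1: ["a", "b"], 2: ["a", "c"]}

alpha = inclusion_coefficients(routes, link_length)
mu = nesting_coefficients(alpha, gamma=1.0)
```

For each route the inclusion coefficients sum to 1 over its links, which is a convenient check on the input data.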

The LK model exhibits a probability function that depends on the definition of vector ζ of standard normal variables:

$$P_k = \Lambda(k \mid \zeta) = \frac{\exp\left(\mu\left(X_k \beta + F_k T \zeta\right)\right)}{\sum_{l \in C_n} \exp\left(\mu\left(X_l \beta + F_l T \zeta\right)\right)} \qquad (9)$$

where
$\Lambda(k \mid \zeta)$ = probability that the choice is k given ζ,
$\beta$ (V × 1) = column vector of parameters,
$X_k$ = kth row of the matrix X (J × V) of explanatory variables,
$F_k$ = kth row of the factor loadings matrix F (K × M) (K paths and M subnetwork components),
$T$ (M × M) = diagonal matrix of $\sigma_m$ covariance parameters associated with subnetwork component m, and
$\zeta$ (M × 1) = vector of standard normal variables.

Each element $f_{km}$ of F equals the square root of the overlapping length between the route k and the subnetwork component m, and each element of T is a parameter to be estimated. Because vector ζ is unknown, the unconditional probability is computed by simulation:

$$P_k = \int_{\zeta} \Lambda(k \mid \zeta) \prod_{m=1}^{M} \phi(\zeta_m) \, d\zeta_m = \frac{1}{D} \sum_{d=1}^{D} \Lambda(k \mid \zeta_d) \qquad (10)$$

where
$\phi(\zeta)$ = standard normal density function,
$\zeta_d$ = a draw d from the distribution of ζ, and
D = number of draws.

In the current analysis, 1,000 Halton draws are used to calculate the choice probability of each alternative route.

Comparison Procedure

The objective of the comparison procedure is the evaluation of differences in parameter estimates and log likelihood function values across choice sets that are different in the qualitative and quantitative composition.

Model estimates from choice sets constructed with different path generation techniques cannot be compared with statistical methods such as likelihood ratios. A simulation approach evaluates the prediction accuracy of models estimated with different data sets and provides the comparison element to assess the most conducive technique to generating realistic and heterogeneous routes. The simulation approach applies parameter estimates to the appropriate choice set, calculates utility values for each alternative and each observation, and selects the chosen route as the alternative with the highest choice probability. Further, the approach counts predicted shares of individuals choosing the minimum distance path, the minimum travel time path, or routes containing relevant network landmarks and compares them with observed shares from the data collected. The resulting prediction accuracy provides the information necessary to compare models estimated with different choice sets.

$$PA_{p,d} = \frac{NR_{pq,d}}{NR_{q,\text{observed}}} \qquad (11)$$

where
$PA_{p,d}$ = prediction accuracy for model p estimated with data set d,
q = criterion for comparison (i.e., minimum distance path),
$NR_{pq,d}$ = number of predicted chosen routes for criterion q applying model p to data set d, and
$NR_{q,\text{observed}}$ = number of actual chosen routes for criterion q.

Model estimates from choice sets built with sample size reduction are comparable in terms of proximity between estimated values and reference values. The true values of the model estimates would be the logical selection for the reference values, but these values are unknown and this lack of information mirrors the absence of knowledge about the consideration set of each survey participant. Model estimates from the initial choice sets are assumed as reference values, because the initial data sets are at least behaviorally consistent with the observed behavior.

The proximity between estimated and reference values is evaluated according to three criteria: replication of the initial parameter estimates, reproduction of the choice probabilities for the actual chosen route, and accurate estimation of the log likelihood function values. The second and third criteria require application to the initial choice sets of the parameter estimates obtained using the reduced data sets to have a common support for meaningful comparison. The proximity evaluation is conducted according to two properties: the bias, equal to the difference between the mean of estimates for each sample size class of alternatives across the 10 runs and the initial values, and the repetition variance, equal to the variance in model parameters and likelihood values across the 10 runs of each class of sample size reduction. The proximity measurement is computed for each combination of criterion and property according to two error measures: root-mean-square error (RMSE) and mean-absolute-percentage error (MAPE).
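The quantities used in the comparison procedure can be sketched in a few lines. The text does not spell out how RMSE and MAPE are aggregated over parameters and runs, so the textbook definitions below, the use of the sample variance for the repetition variance, the percentage scaling of the prediction accuracy, and all numbers are assumptions for illustration only.

```python
import math
from statistics import mean, variance


def prediction_accuracy(predicted_count, observed_count):
    """PA of Equation 11, expressed here as a percentage."""
    return 100.0 * predicted_count / observed_count


def bias(run_estimates, reference):
    """Mean of one estimate across the repetitions minus its reference value."""
    return mean(run_estimates) - reference


def repetition_variance(run_estimates):
    """Variance of one estimate across the repetitions (sample variance assumed)."""
    return variance(run_estimates)


def rmse(estimates, references):
    """Root-mean-square error between estimated and reference values."""
    return math.sqrt(mean([(e - r) ** 2 for e, r in zip(estimates, references)]))


def mape(estimates, references):
    """Mean-absolute-percentage error between estimated and reference values."""
    return 100.0 * mean([abs((e - r) / r) for e, r in zip(estimates, references)])


# Hypothetical values: 30 correct predictions out of 40 observed choices,
# and three repetitions of one parameter whose reference value is 1.0.
pa = prediction_accuracy(30, 40)
b = bias([1.0, 1.2, 1.4], 1.0)
err_rmse = rmse([1.0, 2.0], [1.0, 4.0])
err_mape = mape([1.1, 1.8], [1.0, 2.0])
```

MAPE divides by the reference value, so it is defined only for nonzero reference estimates.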

NUMERICAL RESULTS

This section illustrates the results of applying the experimental design to the collected data to evaluate the impact of the choice set composition in route choice modeling.

TABLE 1 Coverage and Behavioral Consistency of Path Generation Techniques

                              Coverage (%) for Overlap Threshold Equal to
Path Generation Technique     100%      90%       80%       70%       Behavioral Consistency
Labeling approach             40.68     40.68     44.91     48.31     0.672
Link elimination              58.47     58.47     69.92     81.78     0.872
Link penalty                  53.81     53.81     62.29     68.22     0.813
Simulation (low variance)     49.15     49.15     54.24     59.32     0.755
Simulation (high variance)    61.44     61.86     71.19     81.36     0.881
Branch and bound              91.10     91.53     96.61     97.88     0.979

Choice Set Formation

Among the 276 responses collected with the web-based survey, this analysis accounts for observations without any incorrectly coded route. A total of 236 actual chosen routes over 182 origin–destination pairs constitute the comparison term for evaluating the coverage and the behavioral consistency of implemented path generation techniques with respect to the ideal algorithm, as indicated in Table 1. The branch and bound algorithm largely outperforms each single generation technique for both the high consistency with respect to the observed behavior and the exact replication of more than 90% of the actual chosen routes. Simulation with a high variance and a link elimination approach show less satisfying results for both the consistency index and the coverage. Simulation with a low variance, a link penalty, and particularly a labeling approach present poorer performances. With respect to the path set generated with the branch and bound algorithm, the path set resulting from the combination of all the other generation techniques shows similar behavioral consistency and comparable coverage for an 80% overlap threshold.

The choice set formation process excludes from consideration observations that appear inconsistent with the actual behavior (because they do not reproduce the chosen route for an 80% overlap threshold in the coverage function) and with the sampling method (because they do not present an alternative to the collected route for a sample size equal to 25% of the initial sample dimension). The choice set formation process examines path sets generated with the different techniques, considers the number of observations consistent with the actual behavior, and combines path sets created with single techniques that present limited coverage and consequently a limited number of observations for modeling purposes.

Accordingly, constructed choice sets consist of 216 observations that contain the same observed route and at least five alternatives; the first choice set corresponds to the path set generated with the branch and bound algorithm, and the second choice set merges the path sets created with the techniques based on the shortest path search. Path sets created with different generation techniques contain dissimilar routes for the same origin–destination pair; consequently, choice sets from different path generation methods are composed of dissimilar alternative routes. Table 2 presents the characteristics of the choice sets and shows that their composition differs significantly for almost 70% of the observations (fewer than one-third of alternative routes are common to both data sets), and it appears similar for more than 13% of the observations (more than two-thirds of alternative routes are common to both data sets). Most likely, the small distance between origin and destination for these observations produces this effect, because the number of possible available alternatives decreases and different path generation techniques produce similar routes.

TABLE 2 Composition of Constructed Choice Sets

Choice Set Characteristic                      Branch and Bound    Merged
Coverage (100% threshold)                      91.10%              86.44%
Coverage (80% threshold)                       96.61%              95.76%
Behavioral consistency                         97.91%              98.46%
Total number of routes                         4,625               6,881
Maximum number of routes per observation       44                  55
Median number of routes per observation        17                  32
Observations with more than 40 routes          6.36%               31.36%
Shared routes                                  1,437 (common to both choice sets)
Observations with dissimilar sets of routes    68.06% (common to both choice sets)
Observations with similar sets of routes       13.89% (common to both choice sets)

Comparison Between Choice Sets

A linear-in-parameters utility function with 10 explanatory variables was used as a basis for model estimation, following previous work (32, 33). The variables can be grouped into three main categories: level of service, landmark, and individual variables. There are three level-of-service variables: the first variable measures the distance traveled and two additional variables measure travel time, with a distinction between experienced and inexperienced drivers. There are seven landmark dummy variables, equal to 1 if an alternative passes the landmark, and equal to 0 otherwise. The model also includes three behavioral variables measured at the individual level through factor analysis: habit, time-saving skill, and navigation abilities.

Estimation results for the initial choice sets are presented in Tables 3 and 4. Significantly, the same description of choice behavior applies for both choice sets. Parameter estimates suggest that, on the one hand, drivers minimize distance and travel time; on the other hand, they behave according to their experience and habits. Travel time appears to be more relevant to experienced drivers, and the assumption that individuals navigate in the urban network through landmarks appears to be justified by the significance of landmark dummies throughout all the model structures and is emphasized by the significance of the subcomponent networks in the LK model.

Accounting for the correlation structure of the alternatives within the stochastic part of the utility function improves likelihood values, as CNL, GNL, and LK outperform MNL modifications, with the exception of PSL estimated with the merged data set. Representing the correlation through subnetwork components increases model performances with respect to MNL modifications, but this LK specification produces less satisfying results than the nested structures. A similar LK model was found to perform well in other data sets (13). Considering a common nesting coefficient shared by all the network links enhances goodness-of-fit measures, as CNL performs better than GNL. The exception of PSL performing better than every other model specification with the merged data set suggests that the choice set composition effectively influences model performances, as the better model with one choice set is not the better model with the other. Further investigation in the definition of path size, nesting coefficients, and subnetwork components is required but is beyond the scope of this paper. The smaller number of alternatives in the branch and bound choice set leads to higher probabilities of selecting the actual chosen route and, consequently, to higher likelihood values and better goodness of fit.

Simulation approach results are presented in Table 5. Models simulate with good accuracy the behavior of both the smaller number of individuals who have selected the minimum distance or minimum travel time path, and the larger number of drivers who have chosen a route passing a specific landmark. Prediction accuracy suggests that the branch and bound algorithm generates path sets better fitting the representation of the observed choice behavior with respect to the shortest-path-based techniques.

TABLE 3 Model Estimation with Branch and Bound Choice Set

Variable                      C-Logit         PSL             CNL             GNL             LK
Distance                      −1.110 (−4.3)   −0.557 (−2.1)   −1.167 (−4.4)   −1.146 (−5.4)   −1.634 (−5.4)
Travel time (exp. driver)     −0.524 (−6.0)   −0.471 (−5.4)   −0.409 (−5.0)   −0.386 (−5.0)   −0.575 (−5.3)
Travel time (inexp. driver)   −0.304 (−3.5)   −0.269 (−3.2)   −0.233 (−3.7)   −0.223 (−3.1)   −0.333 (−3.7)
Sabotino Square               1.504 (3.5)     1.177 (2.9)     1.442 (2.8)     1.268 (3.0)     1.250 (2.5)
Adriano Square                1.009 (2.4)     1.023 (2.6)     0.875 (2.1)     0.725 (1.7)     0.566 (1.4)
Rivoli Square                 −1.023 (−1.9)   −0.667 (−1.2)   −1.044 (−2.1)   −0.976 (−1.9)   −0.480 (−1.1)
Bernini Square                −0.769 (−2.0)   −0.643 (−1.7)   −0.733 (−2.2)   −0.636 (−1.9)   −1.329 (−2.2)
Sommeiller Bridge             2.948 (7.2)     2.600 (6.4)     2.725 (4.1)     2.482 (6.8)     2.781 (4.3)
Dante Bridge                  2.051 (3.5)     2.222 (3.8)     2.137 (3.1)     2.018 (4.3)     2.533 (3.3)
Orbassano Square              −0.901 (−3.0)   −0.970 (−3.2)   −0.706 (−2.5)   −0.560 (−2.5)   −0.860 (−2.7)
Habit                         −0.474 (−1.9)   −0.432 (−1.8)   −0.197 (−1.5)   −0.321 (−2.1)   −0.245 (−2.6)
Time-saving skill             0.576 (2.2)     0.472 (1.9)     0.380 (1.9)     0.302 (1.4)     1.083 (3.4)
Navigating ability            0.229 (2.4)     0.220 (2.4)     0.166 (2.5)     0.231 (3.2)     0.194 (2.2)
Commonality factor            −1.024 (−2.5)
ln path size                                  1.161 (4.8)                                     0.739 (2.1)
Exp path size                                 14.501 (3.6)
Nesting coefficient                                           0.335 (2.3)
Exp nesting coefficient                                                       5.581 (5.2)
Sigma WE1                                                                                     4.560 (3.4)
Sigma WE2                                                                                     2.154 (3.3)
Sigma WE3                                                                                     2.243 (2.0)
Sigma NS1                                                                                     4.659 (3.6)
Sigma NS2                                                                                     2.497 (2.1)
Sigma NS3                                                                                     2.309 (2.2)
Sigma NS4                                                                                     2.566 (1.5)
Log likelihood at estimates   −454.63         −443.94         −429.61         −438.14         −435.93
Adjusted rho-bar squared      0.234           0.250           0.275           0.261           0.253

NOTE: Cell entries are estimates with t-statistics in parentheses. Models estimated using Biogeme software and Gauss matrix programming language.

Comparison Within Choice Sets The evaluation of model robustness with respect to the choice set composition considers MNL estimates as a comparison term for the error measures. The procedure accounts for six model specifications estimated for 10 repetitions of three classes of sample size reduction from two initial choice sets. A total of 360 models are estimated and successively applied to the initial data sets to measure the impact over choice probabilities and likelihood values. Different levels of comparison are available from this numerical experiment: across choice sets, across sample sizes, and across model specifications. The discussion presented in this section attempts to interpret the large amount of information resulting from estimation of the route choice models for all the generated choice sets. Tables 6 through 8 present the computational results. Across the error measures for each combination of criterion and property, some common findings are discernible. First, error measures for MNL modifications are smaller than error measures for nested structures and LK. Models that maintain the simple logit structure while accounting for similarities among routes show more robustness with respect to the choice set reduction, whereas GEV structures exhibit more dependence on the choice set composition. Second, none of the initial choice sets yields more robustness than the other. Considering the same model specification, some error

70

Transportation Research Record 2003

TABLE 4 Model Estimation with Merged Choice Set

| Variable (est., t-stat.) | C-Logit | PSL | CNL | GNL | LK |
|---|---|---|---|---|---|
| Distance | −0.888 (−3.8) | −0.422 (−2.0) | −0.847 (−3.8) | −0.812 (−4.0) | −1.485 (−6.0) |
| Travel time (exp. driver) | −0.494 (−5.9) | −0.435 (−5.6) | −0.309 (−5.0) | −0.354 (−4.8) | −0.558 (−6.0) |
| Travel time (inexp. driver) | −0.347 (−4.2) | −0.311 (−4.0) | −0.235 (−3.2) | −0.221 (−3.2) | −0.324 (−4.0) |
| Sabotino Square | 2.436 (5.6) | 1.697 (4.2) | 2.079 (5.3) | 2.080 (5.3) | 1.772 (3.6) |
| Adriano Square | 1.572 (3.7) | 1.477 (3.7) | 1.375 (3.4) | 1.312 (3.6) | 0.659 (1.6) |
| Rivoli Square | −1.142 (−2.0) | −0.802 (−1.4) | −1.071 (−2.0) | −1.053 (−2.2) | −0.847 (−1.9) |
| Bernini Square | −1.149 (−2.9) | −1.058 (−2.7) | −0.911 (−3.1) | −1.009 (−3.1) | −1.326 (−2.1) |
| Sommeiller Bridge | 3.903 (8.1) | 3.148 (7.1) | 3.378 (7.8) | 3.419 (7.7) | 3.709 (6.4) |
| Dante Bridge | 3.592 (5.2) | 3.117 (4.8) | 3.656 (5.8) | 3.564 (6.1) | 3.599 (5.0) |
| Orbassano Square | −1.299 (−3.6) | −1.216 (−3.7) | −1.287 (−3.9) | −1.278 (−4.0) | −1.537 (−3.7) |
| Habit | −0.641 (−2.3) | −0.585 (−2.2) | −0.288 (−2.2) | −0.447 (−2.4) | −0.373 (−4.5) |
| Time-saving skill | 0.528 (2.1) | 0.422 (2.0) | 0.238 (1.9) | 0.408 (2.0) | 1.021 (4.2) |
| Navigating ability | 0.155 (1.4) | 0.131 (1.2) | 0.162 (1.5) | 0.212 (2.2) | 0.113 (1.0) |
| Commonality factor | −0.830 (−3.1) | | | | |
| ln path size | | 1.172 (7.0) | | | 0.209 (0.8) |
| Exp path size | | 33.747 (5.2) | | | |
| Nesting coefficient | | | 0.547 (2.1) | | |
| Exp nesting coefficient | | | | 4.604 (5.6) | |
| Sigma WE1 | | | | | 5.677 (4.4) |
| Sigma WE2 | | | | | 2.636 (4.2) |
| Sigma WE3 | | | | | 2.713 (2.5) |
| Sigma NS1 | | | | | 4.370 (4.6) |
| Sigma NS2 | | | | | 2.597 (2.3) |
| Sigma NS3 | | | | | 1.872 (2.2) |
| Sigma NS4 | | | | | 1.381 (1.1) |
| Log likelihood at estimates | −574.59 | −538.41 | −560.98 | −567.47 | −555.04 |
| Adjusted rho-bar squared | 0.176 | 0.225 | 0.195 | 0.186 | 0.193 |

NOTE: Models estimated using Biogeme software and Gauss matrix programming language.

measures are lower for the branch and bound data set and others are lower for the merged data set. The number of alternatives appears to be more influential than the nature of the routes once the sample size is reduced. Third, none of the model specifications appears to be more robust than the others with respect to all the criteria. Considering the same initial choice sets, some models show a greater ability to replicate the choice probabilities of the actual choices, and others show higher accuracy in estimating overall likelihood values or a superior capacity to replicate parameter estimates.

From a general perspective, the first finding is coherent with the theoretical consistency of the MNL model under sampling of alternatives. The second and third results suggest that the number of alternatives is extremely relevant in route choice modeling per se, regardless of the model specification and the path generation technique used.

From a closer perspective, Table 6 focuses on replication of the model parameters. When considering bias RMSE, only MNL and C-logit replicate parameter estimates with some accuracy. RMSE values for PSL, CNL, GNL, and LK are affected by the variation of the path size exponent, the common nesting coefficient, the nesting coefficient exponent, and the variance of the subnetwork components, respectively, across the repetitions for each class of sample size reduction. When considering bias MAPE, the difference between MNL modifications

TABLE 5 Measures of Prediction Accuracy

| Model | B&B: Min. Distance (%) | B&B: Min. Travel Time (%) | B&B: Landmarks (%) | Merged: Min. Distance (%) | Merged: Min. Travel Time (%) | Merged: Landmarks (%) |
|---|---|---|---|---|---|---|
| MNL | 74.58 | 65.22 | 82.28 | 63.16 | 69.70 | 65.43 |
| C-LOGIT | 74.58 | 69.57 | 83.54 | 63.16 | 69.70 | 65.43 |
| PSL | 74.58 | 65.22 | 84.81 | 75.44 | 69.70 | 70.37 |
| CNL | 64.41 | 60.87 | 87.34 | 56.14 | 54.55 | 69.14 |
| GNL | 61.02 | 58.70 | 86.08 | 54.39 | 51.52 | 68.52 |
| LK | 67.80 | 63.04 | 86.71 | 57.89 | 60.61 | 68.62 |

NOTE: B&B = accuracy for branch and bound choice set; Merged = accuracy for merged choice set.

Prato and Bekhor


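TABLE 6 Measures of Ability to Replicate Parameter Estimates

| Model | Property | Measure | B&B 75% | B&B 50% | B&B 25% | Merged 75% | Merged 50% | Merged 25% |
|---|---|---|---|---|---|---|---|---|
| MNL | Bias | RMSE | 0.0649 | 0.1428 | 0.2283 | 0.0733 | 0.1287 | 0.2499 |
| MNL | Bias | MAPE | 0.0555 | 0.1207 | 0.2084 | 0.0454 | 0.0866 | 0.1610 |
| MNL | Variance | RMSE | 0.0604 | 0.1273 | 0.2090 | 0.0649 | 0.0890 | 0.1672 |
| MNL | Variance | MAPE | 0.0479 | 0.1013 | 0.1702 | 0.0334 | 0.0592 | 0.0975 |
| C-LOGIT | Bias | RMSE | 0.0847 | 0.1729 | 0.2653 | 0.0740 | 0.1394 | 0.2430 |
| C-LOGIT | Bias | MAPE | 0.0675 | 0.1367 | 0.2277 | 0.0498 | 0.1007 | 0.1726 |
| C-LOGIT | Variance | RMSE | 0.0714 | 0.1356 | 0.2354 | 0.0648 | 0.1017 | 0.1569 |
| C-LOGIT | Variance | MAPE | 0.0572 | 0.1119 | 0.1860 | 0.0389 | 0.0709 | 0.0983 |
| PSL | Bias | RMSE | 0.4643 | 0.7992 | 0.9858 | 0.6281 | 1.5668 | 1.1500 |
| PSL | Bias | MAPE | 0.0890 | 0.1678 | 0.2678 | 0.0588 | 0.1233 | 0.1811 |
| PSL | Variance | RMSE | 0.4189 | 0.6521 | 0.9556 | 0.6178 | 1.5284 | 1.1177 |
| PSL | Variance | MAPE | 0.0592 | 0.1083 | 0.1945 | 0.0449 | 0.0897 | 0.1150 |
| CNL | Bias | RMSE | 0.1519 | 0.2020 | 0.2567 | 0.1066 | 0.1744 | 0.3017 |
| CNL | Bias | MAPE | 0.2145 | 0.2291 | 0.2988 | 0.1078 | 0.1616 | 0.2701 |
| CNL | Variance | RMSE | 0.0561 | 0.1257 | 0.2047 | 0.0853 | 0.1427 | 0.1637 |
| CNL | Variance | MAPE | 0.0558 | 0.1255 | 0.1881 | 0.0833 | 0.1343 | 0.1332 |
| GNL | Bias | RMSE | 0.1398 | 0.2270 | 0.4521 | 0.0785 | 0.1313 | 0.3512 |
| GNL | Bias | MAPE | 0.0999 | 0.1561 | 0.2813 | 0.0587 | 0.1064 | 0.2359 |
| GNL | Variance | RMSE | 0.1099 | 0.1229 | 0.2323 | 0.0684 | 0.0912 | 0.1884 |
| GNL | Variance | MAPE | 0.0834 | 0.1162 | 0.2158 | 0.0467 | 0.0692 | 0.1352 |
| LK | Bias | RMSE | 0.3611 | 0.6537 | 1.1221 | 0.2467 | 0.4393 | 0.8573 |
| LK | Bias | MAPE | 0.1180 | 0.2172 | 0.3853 | 0.0922 | 0.1639 | 0.3809 |
| LK | Variance | RMSE | 0.3130 | 0.4954 | 0.8034 | 0.1842 | 0.3682 | 0.6486 |
| LK | Variance | MAPE | 0.1019 | 0.1711 | 0.4115 | 0.0687 | 0.1255 | 0.2362 |

NOTE: B&B = branch and bound choice set; Merged = merged choice set; 75%, 50%, and 25% are the classes of sample size reduction.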

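TABLE 7 Measures of Reproducibility of Choice Probabilities for Chosen Routes

| Model | Property | Measure | B&B 75% | B&B 50% | B&B 25% | Merged 75% | Merged 50% | Merged 25% |
|---|---|---|---|---|---|---|---|---|
| MNL | Bias | RMSE | 0.0102 | 0.0239 | 0.0425 | 0.0102 | 0.0199 | 0.0331 |
| MNL | Bias | MAPE | 0.0447 | 0.1049 | 0.1916 | 0.0629 | 0.1210 | 0.2202 |
| MNL | Variance | RMSE | 0.0087 | 0.0190 | 0.0326 | 0.0044 | 0.0071 | 0.0106 |
| MNL | Variance | MAPE | 0.0375 | 0.0768 | 0.1322 | 0.0293 | 0.0486 | 0.0802 |
| C-LOGIT | Bias | RMSE | 0.0114 | 0.0252 | 0.0420 | 0.0107 | 0.0212 | 0.0342 |
| C-LOGIT | Bias | MAPE | 0.0510 | 0.1092 | 0.1952 | 0.0643 | 0.1275 | 0.2277 |
| C-LOGIT | Variance | RMSE | 0.0097 | 0.0199 | 0.0322 | 0.0052 | 0.0093 | 0.0113 |
| C-LOGIT | Variance | MAPE | 0.0428 | 0.0806 | 0.1429 | 0.0330 | 0.0592 | 0.0832 |
| PSL | Bias | RMSE | 0.0119 | 0.0269 | 0.0495 | 0.0128 | 0.0335 | 0.0378 |
| PSL | Bias | MAPE | 0.0533 | 0.1060 | 0.1937 | 0.0662 | 0.1617 | 0.2083 |
| PSL | Variance | RMSE | 0.0091 | 0.0200 | 0.0352 | 0.0072 | 0.0245 | 0.0211 |
| PSL | Variance | MAPE | 0.0396 | 0.0802 | 0.1509 | 0.0350 | 0.1022 | 0.1060 |
| CNL | Bias | RMSE | 0.0318 | 0.0427 | 0.0702 | 0.0259 | 0.0477 | 0.0537 |
| CNL | Bias | MAPE | 0.1544 | 0.1975 | 0.3320 | 0.1948 | 0.3559 | 0.3282 |
| CNL | Variance | RMSE | 0.0136 | 0.0284 | 0.0516 | 0.0157 | 0.0375 | 0.0262 |
| CNL | Variance | MAPE | 0.0576 | 0.1216 | 0.2072 | 0.0706 | 0.2242 | 0.1279 |
| GNL | Bias | RMSE | 0.0212 | 0.0359 | 0.0732 | 0.0207 | 0.0299 | 0.0506 |
| GNL | Bias | MAPE | 0.0836 | 0.1448 | 0.3414 | 0.1253 | 0.1464 | 0.2744 |
| GNL | Variance | RMSE | 0.0176 | 0.0262 | 0.0474 | 0.0077 | 0.0097 | 0.0186 |
| GNL | Variance | MAPE | 0.0688 | 0.1048 | 0.1989 | 0.0491 | 0.0587 | 0.1178 |
| LK | Bias | RMSE | 0.0172 | 0.0317 | 0.0569 | 0.0109 | 0.0171 | 0.0354 |
| LK | Bias | MAPE | 0.0762 | 0.1408 | 0.2488 | 0.0676 | 0.1174 | 0.2635 |
| LK | Variance | RMSE | 0.0155 | 0.0245 | 0.0464 | 0.0075 | 0.0115 | 0.0220 |
| LK | Variance | MAPE | 0.0674 | 0.1100 | 0.2000 | 0.0470 | 0.0765 | 0.1460 |

NOTE: B&B = branch and bound choice set; Merged = merged choice set; 75%, 50%, and 25% are the classes of sample size reduction.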


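TABLE 8 Measures of Accuracy in Estimation of Log Likelihood Function

| Model | Property | Measure | B&B 75% | B&B 50% | B&B 25% | Merged 75% | Merged 50% | Merged 25% |
|---|---|---|---|---|---|---|---|---|
| MNL | Bias | RMSE | 0.3276 | 1.4262 | 4.2371 | 0.2360 | 0.8376 | 3.6537 |
| MNL | Bias | MAPE | 0.0006 | 0.0029 | 0.0091 | 0.0004 | 0.0014 | 0.0057 |
| MNL | Variance | RMSE | 0.1456 | 0.4944 | 0.8865 | 0.0808 | 0.2901 | 1.5054 |
| MNL | Variance | MAPE | 0.0003 | 0.0009 | 0.0015 | 0.0001 | 0.0004 | 0.0023 |
| C-LOGIT | Bias | RMSE | 0.4869 | 1.8003 | 4.8227 | 0.2918 | 1.3678 | 4.9492 |
| C-LOGIT | Bias | MAPE | 0.0009 | 0.0037 | 0.0103 | 0.0005 | 0.0021 | 0.0080 |
| C-LOGIT | Variance | RMSE | 0.2350 | 0.6589 | 1.1945 | 0.0952 | 0.6082 | 1.8743 |
| C-LOGIT | Variance | MAPE | 0.0004 | 0.0011 | 0.0021 | 0.0001 | 0.0008 | 0.0030 |
| PSL | Bias | RMSE | 0.6228 | 2.5437 | 7.3208 | 0.5602 | 1.9907 | 6.0835 |
| PSL | Bias | MAPE | 0.0013 | 0.0050 | 0.0155 | 0.0009 | 0.0035 | 0.0103 |
| PSL | Variance | RMSE | 0.2446 | 1.2112 | 2.5085 | 0.2377 | 0.7000 | 2.5357 |
| PSL | Variance | MAPE | 0.0004 | 0.0019 | 0.0045 | 0.0003 | 0.0011 | 0.0036 |
| CNL | Bias | RMSE | 6.7002 | 9.8788 | 16.4802 | 10.0607 | 28.3954 | 41.0378 |
| CNL | Bias | MAPE | 0.0151 | 0.0213 | 0.0337 | 0.0157 | 0.0404 | 0.0691 |
| CNL | Variance | RMSE | 1.7140 | 3.7716 | 7.8954 | 4.8653 | 17.1096 | 13.4770 |
| CNL | Variance | MAPE | 0.0035 | 0.0071 | 0.0152 | 0.0072 | 0.0227 | 0.0177 |
| GNL | Bias | RMSE | 1.0892 | 4.4110 | 17.1794 | 0.8684 | 1.8398 | 7.4069 |
| GNL | Bias | MAPE | 0.0016 | 0.0064 | 0.0362 | 0.0013 | 0.0028 | 0.0123 |
| GNL | Variance | RMSE | 0.8357 | 3.3920 | 6.5502 | 0.4676 | 0.9522 | 2.5153 |
| GNL | Variance | MAPE | 0.0014 | 0.0063 | 0.0109 | 0.0007 | 0.0016 | 0.0037 |
| LK | Bias | RMSE | 1.4128 | 5.0438 | 16.0366 | 0.7054 | 2.3353 | 11.5179 |
| LK | Bias | MAPE | 0.0028 | 0.0104 | 0.0349 | 0.0012 | 0.0039 | 0.0195 |
| LK | Variance | RMSE | 0.7169 | 2.2271 | 5.0301 | 0.2289 | 0.8378 | 3.8742 |
| LK | Variance | MAPE | 0.0012 | 0.0041 | 0.0102 | 0.0004 | 0.0011 | 0.0050 |

NOTE: B&B = branch and bound choice set; Merged = merged choice set; 75%, 50%, and 25% are the classes of sample size reduction.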

and the other models in terms of parameter robustness is evident. For the PSL model in particular, the bias RMSE is inflated by the high estimated values of the path size exponent, whereas the bias MAPE is not influenced by outliers and is thus comparable to the measure computed for the C-logit model. Variance errors are similar between models, and this finding reinforces the idea that the number of alternatives influences parameter estimates more than the method used to generate the routes.

Table 7 focuses on reproducing the choice probabilities for the actual chosen route and consequently on the individual log likelihood function values. Bias error measures confirm differences between MNL modifications and the other model specifications, as C-logit and PSL reproduce probabilities with lower average error and lower variation. The explanation lies in the formulation of choice probabilities: variation of nesting coefficients and nesting coefficient exponents directly affects computation of the probability of choosing the actual route, whereas variation of the path size exponent modifies the path size term without directly entering the probability calculation. The LK model maintains a logit probability function, even though numerical simulation is required for the integration, and accordingly its error measures are more similar to those of MNL than to those of the nested structures.

Table 8 presents evidence that reproduction of the individual likelihood values is less accurate than estimation of the overall log likelihood function. Nonetheless, model specifications present relative differences in error measures similar to those for the previous criterion. For MNL modifications, biases are negligible with 75%, reasonable with 50%, and critical with 25% of the initial number of alternatives, whereas variances are minimal across the repetitions. For nested structures and LK, biases are considerable even with 50% and extremely critical with 25% of the initial sample sizes, and variances are more sensitive across the repetitions. In particular, for these models a reduction to only 25% of the initial choice sets appears to affect significantly the accuracy in likelihood estimation.

SUMMARY AND CONCLUSIONS

Modeling revealed route choice behavior involves forming a set of alternative routes and estimating a discrete choice model. This paper focuses on the impact of choice set composition on model estimates by designing an experimental analysis of actual route choices of individuals moving from home to work in an urban environment. The experimental analysis consists of applying several path generation techniques to the urban network of the case study, generating choice sets consistent with the observed behavior, extracting alternatives according to three classes of sample size reduction, repeating the estimation of six route choice models for each of these classes, and applying the estimated parameters to the initial data sets to calculate error measures.

Comparison of prediction accuracy across different choice sets suggests that generation techniques that produce heterogeneous routes allow for estimating models with better prediction abilities with respect to the outcomes of the drivers’ actual choices. Error measures provide evidence that random sampling produces good estimates for MNL modifications with samples containing 50% of the initial number of alternatives, whereas nested structures present high variation of model estimates even for a relatively small size reduction. From a practical perspective, however, CNL and GNL likelihood values for reduced choice sets still indicate that the nested structures are significantly better than the MNL model, even though parameter estimates are less robust. Further, the numerical analysis shows that the number of alternatives constitutes a significant issue in route choice modeling, because errors exhibit comparable values in terms of variance across the repetitions regardless of the initial composition of the data set and the model specification estimated.

The results presented in this paper are based on a relatively small data set, and further investigation will apply the illustrated experimental analysis to additional data sets to generalize these conclusions. Even though each numerical experiment requires caution in generalizing the results, these findings suggest guidelines for analysts intending to estimate and calculate predictions in route choice modeling from the observation of actual behavior: (a) apply a branch and bound algorithm to generate heterogeneous routes; (b) estimate MNL modifications in choice situations with a large number of alternatives, as reduction of the sample size would reduce the computational expenditure without a significant effect on the model estimates because of the parameter robustness; and (c) estimate nested or LK structures in choice situations with a small number of alternatives, because the likelihood values would improve without relevant effects on computational expenditure.

ACKNOWLEDGMENTS

The authors are grateful to the anonymous reviewers who provided insightful comments on the initial version of this paper.

REFERENCES

1. Ben-Akiva, M. E., M. J. Bergman, A. J. Daly, and R. Ramaswamy. Modeling Inter-Urban Route Choice Behaviour. In Proc., 9th International Symposium on Transportation and Traffic Theory, VNU Science Press, Utrecht, Netherlands, 1984, pp. 299–330.
2. De La Barra, T., B. Perez, and J. Anez. Multidimensional Path Search and Assignment. Presented at 21st PTRC Summer Annual Meeting, Manchester, United Kingdom, 1993.
3. Azevedo, J. A., M. E. O. Santos Costa, J. J. E. R. Silvestre Madera, and E. Q. Vieira Martins. An Algorithm for the Ranking of Shortest Paths. European Journal of Operational Research, Vol. 69, 1993, pp. 97–106.
4. Bekhor, S., M. Ben-Akiva, and S. Ramming. Route Choice: Choice Set Generation and Probabilistic Choice Models. In Proc., 4th Triennial Symposium on Transportation Analysis Conference, University of Azores, Sao Miguel, Portugal, 2001, pp. 459–464.
5. Prato, C. G., and S. Bekhor. Applying Branch-and-Bound Technique to Route Choice Set Generation. In Transportation Research Record: Journal of the Transportation Research Board, No. 1985, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 19–28.
6. Cascetta, E., A. Nuzzolo, F. Russo, and A. Vitetta. A Modified Logit Route Choice Model Overcoming Path Overlapping Problems: Specification and Some Calibration Results for Interurban Networks. In Proc., 13th International Symposium on Transportation and Traffic Theory, Pergamon, Lyon, France, 1996, pp. 697–711.
7. Ben-Akiva, M., and M. Bierlaire. Discrete Choice Methods and Their Applications to Short Term Travel Decisions. In Handbook of Transportation Science, Kluwer, Dordrecht, Netherlands, 1999, pp. 5–12.
8. Ramming, S. Network Knowledge and Route Choice. PhD thesis. Massachusetts Institute of Technology, Cambridge, 2001.
9. Prashker, J. N., and S. Bekhor. Investigation of Stochastic Network Loading Procedures. In Transportation Research Record 1645, TRB, National Research Council, Washington, D.C., 1998, pp. 94–102.
10. Bekhor, S., and J. N. Prashker. Stochastic User Equilibrium Formulation for Generalized Nested Logit Model. In Transportation Research Record: Journal of the Transportation Research Board, No. 1752, TRB, National Research Council, Washington, D.C., 2001, pp. 84–90.
11. Yai, T., S. Iwakura, and S. Morichi. Multinomial Probit with Structured Covariance for Route Choice Behavior. Transportation Research B, Vol. 31, 1997, pp. 195–207.
12. Bekhor, S., M. S. Ben-Akiva, and S. M. Ramming. Adaptation of Logit Kernel to Route Choice Situation. In Transportation Research Record: Journal of the Transportation Research Board, No. 1805, TRB, National Research Council, Washington, D.C., 2002, pp. 78–85.
13. Bierlaire, M., and E. Frejinger. Route Choice Models with Subpath Components. Presented at 5th Swiss Transport Research Conference, Ascona, Switzerland, 2005.
14. Ortuzar, J. D., and L. G. Willumsen. Modelling Transport, 3rd ed. John Wiley and Sons Ltd., Chichester, United Kingdom, 2001.
15. Stopher, P. Captivity and Choice in Travel Behavior Models. Transportation Journal of ASCE, Vol. 106, 1980, pp. 427–435.
16. Williams, H. C. W. L., and J. D. Ortuzar. Behavioural Theories of Dispersion and Mis-specification of Travel Demand Models. Transportation Research B, Vol. 16, 1982, pp. 167–219.
17. Swait, J., and M. Ben-Akiva. Incorporating Random Constraints in Discrete Models of Choice Set Generation. Transportation Research B, Vol. 21, 1987, pp. 91–102.
18. Swait, J., and M. Ben-Akiva. Empirical Test of a Constrained Choice Discrete Model: Mode Choice in Sao Paulo, Brazil. Transportation Research B, Vol. 21, 1987, pp. 103–115.
19. Basar, G., and C. R. Bhat. A Parameterized Probabilistic Consideration Set Model for Airport Choice: An Application to the San Francisco Bay Area. Transportation Research B, Vol. 38, 2004, pp. 889–904.
20. McFadden, D. Modeling the Choice of Residential Location. In Transportation Research Record 673, TRB, National Research Council, Washington, D.C., 1978, pp. 72–77.
21. Nerella, S., and C. R. Bhat. Numerical Analysis of Effect of Sampling of Alternatives in Discrete Choice Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 1894, Transportation Research Board of the National Academies, Washington, D.C., 2004, pp. 11–19.
22. Hansen, E. Industrial Location Choice in Sao Paulo, Brazil: A Nested Logit Model. Regional Science and Urban Economics, Vol. 17, 1987, pp. 89–108.
23. Friedman, J., D. Gerlowski, and J. Silberman. What Attracts Foreign Multinational Corporations? Evidence from Branch Plant Location in the United States. Journal of Regional Science, Vol. 32, 1992, pp. 403–418.
24. Woodward, D. Location Determinants of Japanese Manufacturing Start-Ups in the United States. Southern Economic Journal, Vol. 58, 1992, pp. 690–708.
25. Ben-Akiva, M., and J. L. Bowman. Integration of an Activity-Based Model System and a Residential Location Model. Urban Studies, Vol. 35, 1998, pp. 1131–1153.
26. Sermons, M. W., and F. S. Koppelman. Representing Differences Between Female and Male Commute Behavior in Residential Location Choice Models. Journal of Transport Geography, Vol. 9, 2001, pp. 101–110.
27. Bhat, C. R., and J. Y. Guo. A Mixed Spatially Correlated Model: Formulation and Application to Residential Choice Modeling. Transportation Research B, Vol. 38, 2004, pp. 147–168.
28. Pozsgay, M. A., and C. R. Bhat. Destination Choice Modeling for Home-Based Recreational Trips: Analysis and Implications for Land Use, Transportation, and Air Quality Planning. In Transportation Research Record: Journal of the Transportation Research Board, No. 1777, TRB, National Academy of Sciences, Washington, D.C., 2001, pp. 47–54.
29. Schlich, R., A. Simma, and K. W. Axhausen. Destination Choice Modeling for Different Leisure Activities. Presented at 2nd Swiss Transport Research Conference, Ascona, Switzerland, 2002.
30. Ben-Akiva, M., D. McFadden, and K. Train. The Demand for Local Telephone Service: A Fully Discrete Model of Residential Calling Patterns and Service Choices. Rand Journal of Economics, Vol. 18, 1987, pp. 109–123.
31. Gilbride, T. J., and G. M. Allenby. A Choice Model with Conjunctive, Disjunctive, and Compensatory Screening Rules. Marketing Science, Vol. 23, 2004, pp. 391–406.
32. Prato, C. G., S. Bekhor, and C. Pronello. Methodology for Exploratory Analysis of Latent Factors Influencing Drivers’ Behavior. In Transportation Research Record: Journal of the Transportation Research Board, No. 1926, Transportation Research Board of the National Academies, Washington, D.C., 2005, pp. 115–125.
33. Prato, C. G. Latent Factors and Route Choice Behaviour. PhD thesis. Turin Polytechnic, Italy, 2005.

The Transportation Demand Forecasting Committee sponsored publication of this paper.

Most route choice models are related to revealed choice behavior and are estimated by adding alternative paths to observed routes. This paper focuses on the effects of choice set composition in route choice modeling by designing an experimental analysis of actual route choice behavior of individuals driving habitually from home to work in an urban network. The numerical analysis concentrates on a qualitative perspective, by considering path sets built with different generation techniques, and a quantitative perspective, by accounting for path sets constructed with sample size reduction from each initial choice set. Comparison of prediction accuracy across different choice sets suggests that a recently developed branch and bound algorithm generates heterogeneous routes that allow for estimating models with better prediction abilities with respect to the outcomes of the drivers’ actual choices. Further, comparison of route choice models across different choice set compositions indicates that nonnested structures, such as C-logit and path size logit, yield more robust parameter estimates.

Most route choice models relate to revealed choice behavior and consider separately the generation of additional alternative paths and calculation of the probability of choosing the observed routes from the generated choice set.

Selective path generation techniques are preferable to unrealistic exhaustive approaches to create alternative routes. Traditional methods, based on the shortest path search, include the following: a labeling approach, which minimizes generalized cost functions according to link attributes (1); a link penalty, which gradually increases the impedance of all links on the shortest path (2); a link elimination, which removes the shortest paths from the network in sequence to generate new routes (3); and a simulation method, which produces alternative paths by drawing link impedances from probability distributions (4). An alternative approach is the branch and bound algorithm, which constructs a connection tree between origin and destination of a trip by processing sequences of links according to a branching rule that accounts for behavioral constraints formulated to increase route likelihood and heterogeneity (5).

Model specifications that account for correlation among alternatives are preferable to the multinomial logit (MNL) model to represent route choice behavior. MNL modifications, such as C-logit and path size logit (PSL), include a correction term in the deterministic part of the utility function and maintain a simple logit structure (6–8).

C. G. Prato, Transportation Research Institute, and S. Bekhor, Faculty of Civil and Environmental Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel. Corresponding author: C. G. Prato, [email protected].

Transportation Research Record: Journal of the Transportation Research Board, No. 2003, Transportation Research Board of the National Academies, Washington, D.C., 2007, pp. 64–73. DOI: 10.3141/2003-09


importance of the quantitative composition of the choice set. Model estimates from different choice sets for the same model specification make it possible to compare the robustness of route choice models. The remainder of the paper is structured as follows. The next section describes the experimental design for evaluation of the performances of different combinations of choice set composition and model specification. Then the numerical results are illustrated and the impact of the choice set specification on parameter estimates and log likelihood function values is discussed. Finally, the results are summarized and conclusions are presented.

EXPERIMENTAL DESIGN

This section describes implementation of the experimental design, which consists of (a) application of the path generation techniques to the urban network, (b) formation of initial choice sets that are consistent with the observed behavior and extraction of subchoice sets that are randomly sampled from the initial sets of alternatives, (c) estimation of the route choice models using the generated choice sets, and (d) evaluation of the ability to reproduce utility parameter estimates and log likelihood function values for different combinations of choice set composition and model specification.

Path Generation

Path generation techniques are applied to the urban network of the city of Turin, Italy, which contains 419 nodes and 1,427 links on the basis of the urban traffic plan designed by the municipality in 2001 (33). Prato and Bekhor (5) provide details about path generation methods from a theoretical, behavioral, and practical perspective. Figure 1 illustrates the network structure and highlights the position of the common destination of the survey participants as well as the major arterials according to the road hierarchy.

FIGURE 1 Urban network of Turin (map labels: NS1, NS2, NS3, WE1, WE2, WE3, P).

The labeling approach is applied by minimizing the path cost function with respect to attributes such as link distance, free-flow time, travel time, and delay, which measures the level of congestion through the difference between travel times in congested and free-flow conditions. The link elimination approach is modified by running the following routine 10 times: (a) calculation of the shortest travel time path, (b) elimination from the shortest path of a link that takes the driver farther from the destination and closer to the origin or compels the driver to turn from a road high in the hierarchy to a road low in the hierarchy, and (c) computation of the next shortest path. The link penalty approach is adapted by repeating a similar procedure 15 times: (a) calculation of the shortest travel time path, (b) penalization of the shortest path links with a factor equal to 1.05, and (c) computation of the next best path. The simulation method is implemented twice by calculating the shortest path for each draw of link impedances from a truncated normal distribution with the mean equal to the travel time, variance equal to 20% or 100% of the mean, left truncation limit equal to the free-flow time, and right truncation limit equal to the time calculated for a minimum speed assumed equal to 10 km/h. The branch and bound algorithm is applied by defining the factors of the branching rule. The behavioral constraints exclude from the path set routes that take the driver 10% of the distance farther from the destination and closer to the origin, exceed 150% of the maximum impedance in terms of travel time and are therefore unrealistic options, contain detours larger than 120% with respect to other routes, are difficult to distinguish because of their high similarity (80% or more) to other paths, or include more than four maneuvers that cause delays and dangers to traffic circulation.
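The link penalty routine described in this subsection can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary-based network representation, and the four-node toy network are assumptions made here, and a plain Dijkstra search stands in for whatever shortest path routine was actually used. Only the 15 iterations and the penalty factor of 1.05 come from the text.

```python
import heapq

def shortest_path(graph, cost, origin, destination):
    """Dijkstra search; returns the links (u, v) of the cheapest path, or None."""
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in graph.get(node, ()):
            new_dist = dist + cost[(node, nxt)]
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                previous[nxt] = node
                heapq.heappush(queue, (new_dist, nxt))
    if destination != origin and destination not in previous:
        return None  # destination not reachable
    links, node = [], destination
    while node != origin:
        links.append((previous[node], node))
        node = previous[node]
    return list(reversed(links))

def link_penalty_paths(graph, travel_time, origin, destination,
                       iterations=15, penalty=1.05):
    """Repeat: (a) find the shortest path, (b) penalize its links, (c) search again."""
    cost = dict(travel_time)  # work on a copy so the input impedances stay intact
    unique_paths = []
    for _ in range(iterations):
        path = shortest_path(graph, cost, origin, destination)
        if path is None:
            break
        if path not in unique_paths:
            unique_paths.append(path)
        for link in path:
            cost[link] *= penalty
    return unique_paths

# Hypothetical toy network: two parallel routes from A to D (times in minutes).
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
toy_times = {("A", "B"): 5.0, ("B", "D"): 5.0, ("A", "C"): 5.2, ("C", "D"): 5.2}
toy_paths = link_penalty_paths(toy_graph, toy_times, "A", "D")
```

On the toy network, the first search returns A-B-D (10.0 minutes); after one penalty that route costs 10.5 minutes, so the second search switches to A-C-D (10.4 minutes). Duplicate routes found in later iterations are discarded, which is one possible reading of how the path set is assembled.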

Composition of Choice Set

The objective of the choice set formation is to maximize coverage of the collected routes and thereby compose choice sets that are consistent with the observed behavior. Choice sets may correspond to path sets generated by single methods with good performance indexes or to the combination of path sets produced by different methods with poor individual performances. Coverage measures the percentage of observations for which a path generation technique reproduces the actual behavior according to a certain overlap threshold, which expresses the degree of similarity between generated and collected routes.

$$\max_{r} \sum_{n=1}^{N} I\left(O_{nr} \geq \delta\right) \tag{1}$$

where
I(·) = coverage function, equal to 1 when its argument is true and 0 otherwise,
$O_{nr}$ = overlap measure for technique r and observation n, and
δ = overlap threshold.

$$O_{nr} = \frac{L_{nr}}{L_{n}} \tag{2}$$

where $L_{nr}$ is the overlapping length between generated and observed routes and $L_{n}$ is the length of the observed path for driver n. The index of behavioral consistency compares a path generation method with the ideal algorithm that would replicate link by link all the routes reported in the survey, with a resulting coverage of 100% for a 100% overlap threshold.

$$CI_{r} = \frac{\sum_{n=1}^{N} O_{nr,\max}}{N \, O_{\max}} \tag{3}$$

where
$CI_{r}$ = consistency index of algorithm r,
$O_{nr,\max}$ = maximum overlap measure obtained with the paths generated by algorithm r for the observed choice of each driver n, and
$O_{\max}$ = 100% overlap over all N observations for the ideal algorithm.

After initial choice sets are composed by maximizing the objective functions, reduced choice sets are formed by extracting percentages of alternatives from these initial data sets. The number of alternatives varies across the observations. For each observation and each choice set, S − 1 alternatives other than the chosen one are randomly selected, and the chosen route is then added to reach the size S, which corresponds to 75%, 50%, or 25% of the alternatives in the initial data sets. For each class of sample size reduction, the sampling procedure is repeated 10 times to account for variance due to the randomness of the selection process.
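The overlap, coverage, and consistency measures of Equations 1 through 3 and the random extraction of reduced choice sets can be sketched as follows. This is a minimal sketch under stated assumptions: routes are represented as lists of link identifiers, all function and variable names are hypothetical, and the rounding of S to an integer uses Python's built-in `round`, which the paper does not specify.

```python
import random

def overlap(generated_links, observed_links, link_length):
    """Equation 2: share of the observed route length shared with a generated route."""
    shared = set(generated_links) & set(observed_links)
    observed_length = sum(link_length[a] for a in observed_links)
    return sum(link_length[a] for a in shared) / observed_length

def best_overlaps(generated_sets, observed_routes, link_length):
    """Maximum overlap per observation over the routes generated by one technique."""
    return [max(overlap(route, observed, link_length) for route in generated)
            for generated, observed in zip(generated_sets, observed_routes)]

def coverage(max_overlaps, threshold):
    """Sum in Equation 1: observations reproduced at the given overlap threshold."""
    return sum(1 for value in max_overlaps if value >= threshold)

def consistency_index(max_overlaps, ideal_overlap=1.0):
    """Equation 3: total best overlap relative to N times the ideal 100% overlap."""
    return sum(max_overlaps) / (len(max_overlaps) * ideal_overlap)

def reduce_choice_set(alternatives, chosen, share, rng):
    """Keep the chosen route and draw S - 1 other alternatives at random.

    The rounding rule for S is an assumption; the paper only gives the shares.
    """
    size = max(1, round(share * len(alternatives)))
    others = [route for route in alternatives if route != chosen]
    return [chosen] + rng.sample(others, size - 1)

# Hypothetical usage: 10 repetitions of the 50% class for one observation.
rng = random.Random(2007)
initial_set = ["route_%d" % i for i in range(8)]
repetitions = [reduce_choice_set(initial_set, "route_3", 0.5, rng) for _ in range(10)]
```

Passing a seeded `random.Random` instance keeps the 10 repetitions reproducible, which matters when the same reduced sets must be reused across the six model specifications.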

Specifications of Route Choice Model

In this analysis, route choice models account for correlation among alternatives and introduce specific parameters to be estimated. C-logit and PSL present similar functional forms by maintaining the MNL structure, but each model interprets differently the correction term that measures the degree of similarity of each route with respect to all other routes in the choice set. In C-logit, the commonality factor indicates that the utility of a path must be reduced because of similarity with other routes; under Equation 4 it is always greater than or equal to 0, because the argument of the logarithm is always greater than or equal to 1. In PSL, the path size represents the fraction of the path that constitutes a “full” alternative and is always less than or equal to 1. In the present research, commonality factors and path sizes are calculated according to the following formulations (6, 8):

$$CF_{k} = \ln\left[1 + \sum_{\substack{l \in C_{n} \\ l \neq k}} \left(\frac{L_{kl}}{\sqrt{L_{k} L_{l}}}\right) \left(\frac{L_{k} - L_{kl}}{L_{l} - L_{kl}}\right)\right] \tag{4}$$

where
$CF_{k}$ = commonality factor of route k,
$L_{k}$ = length of route k,
$L_{l}$ = length of each route l in choice set $C_{n}$, and
$L_{kl}$ = common length between routes k and l.

$$PS_{k} = \sum_{a \in \Gamma_{k}} \frac{L_{a}}{L_{k}} \, \frac{1}{\sum_{l \in C_{n}} \left(\frac{L_{k}}{L_{l}}\right)^{\gamma} \delta_{al}} \tag{5}$$

where
$PS_{k}$ = path size of route k,
$\Gamma_{k}$ = set of links belonging to route k,
$L_{a}$ = length of link a,
$\delta_{al}$ = link-path incidence dummy (equal to 1 if route l uses link a, and 0 otherwise), and
γ = positive parameter that accounts for different size contributions due to routes with different lengths.
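A small numerical sketch may help in reading Equations 4 and 5. The code below assumes the reconstruction of the two formulas given above, represents each route as a list of link identifiers, and uses a hypothetical three-route choice set; none of the names or values come from the paper.

```python
import math

def route_length(links, link_length):
    """Total length of a route given as a collection of link identifiers."""
    return sum(link_length[a] for a in links)

def commonality_factor(k, routes, link_length):
    """Equation 4 for route k; `routes` maps a route id to its list of links."""
    L_k = route_length(routes[k], link_length)
    total = 0.0
    for l, links in routes.items():
        if l == k:
            continue
        L_l = route_length(links, link_length)
        L_kl = route_length(set(routes[k]) & set(links), link_length)
        # The ratio is undefined when route l lies entirely on route k (L_l == L_kl).
        total += (L_kl / math.sqrt(L_k * L_l)) * ((L_k - L_kl) / (L_l - L_kl))
    return math.log(1.0 + total)

def path_size(k, routes, link_length, gamma=0.0):
    """Equation 5 for route k; gamma = 0 weights all routes using a link equally."""
    L_k = route_length(routes[k], link_length)
    size = 0.0
    for a in routes[k]:
        denominator = sum((L_k / route_length(links, link_length)) ** gamma
                          for links in routes.values() if a in links)
        size += (link_length[a] / L_k) / denominator
    return size

# Hypothetical choice set: r1 and r2 share link "a"; r3 is fully independent.
toy_lengths = {"a": 4.0, "b": 6.0, "c": 6.0, "d": 10.0}
toy_routes = {"r1": ["a", "b"], "r2": ["a", "c"], "r3": ["d"]}
cf_r1 = commonality_factor("r1", toy_routes, toy_lengths)
ps_r1 = path_size("r1", toy_routes, toy_lengths)
```

In this toy set the independent route r3 has a commonality factor of 0 and a path size of 1, whereas the overlapping route r1 has a commonality factor of ln 1.4 and a path size of 0.8, which illustrates the opposite directions in which the two correction terms move.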

CNL and GNL introduce specific parameters to capture correlation among routes in the stochastic part of the utility function. In the current analysis, the inclusion coefficients of the alternatives in the nests are computed for both models, then for the CNL model the common nesting coefficient shared by all the nests is estimated, and for the GNL model the unique nesting coefficients for each nest are expressed according to the following parameterized formulation (9, 10):

$$\alpha_{mk} = \frac{L_{m}}{L_{k}} \, \delta_{mk} \tag{6}$$

where
$\alpha_{mk}$ = inclusion coefficient of route k and link m ($0 \leq \alpha_{mk} \leq 1$),
$L_{m}$ = length of link m,
$L_{k}$ = length of route k, and
$\delta_{mk}$ = link-path incidence dummy (equal to 1 if route k uses link m and 0 otherwise).

$$\mu_{m} = \left(1 - \frac{\sum_{l \in C_{n}} \alpha_{ml}}{\sum_{l \in C_{n}} \delta_{ml}}\right)^{\gamma} \tag{7}$$


where µm = nesting coefficient of link m (0 ≤ µm ≤ 1), δml = link-path incidence dummy (equal to 1 if route l uses link m and 0 otherwise), and γ = parameter to be estimated. In the present study, LK presents a factor analytic specification that accounts for subnetwork components. These components correspond to the major arterials in the urban road hierarchy. Correlation among alternatives is considered for paths that share the same subnetwork component, even if they do not physically overlap (13). Correlation on the urban network is captured by path size and expressed according to the original formulation (7 ). PSk =

La

∑L

a∈Γ k

k

1 ∑ δ al

(8)

l ∈Cn

PA p ,d =

The LK model exhibits a probability function that depends on the definition of vector ζ of standard normal variables: Pk = Λ ( k ζ ) =

exp ( μ ( X kβ + Fk Tζ )) ∑ exp ( μ ( Xlβ + FTl ζ ))

(9)

l ∈Cn

where Λ(kζ) β(l×V) Xk Fk

= = = =

probability that the choice is k given ζ, column vector of parameters, kth row of the matrix X(J×V) of explanatory variables, kth row of the factor loadings matrix F(K×M) (K paths and M subnetwork components), T(M×M) = diagonal matrix of σm covariance parameters associated with subnetwork component m, and ζ(M×l) = vector of standard normal variables.

Each element $f_{km}$ of $F$ equals the square root of the overlapping length between route $k$ and subnetwork component $m$, and each element of $T$ is a parameter to be estimated. Because vector $\zeta$ is unknown, the unconditional probability is computed by simulation:

$$P_k = \int_{\zeta} \Lambda(k \mid \zeta) \prod_{m=1}^{M} \phi(\zeta_m) \, d\zeta_m \approx \frac{1}{D} \sum_{d=1}^{D} \Lambda(k \mid \zeta_d) \tag{10}$$

where $\phi(\zeta)$ = standard normal density function, $\zeta_d$ = a draw $d$ from the distribution of $\zeta$, and $D$ = number of draws. In the current analysis, 1,000 Halton draws are used to calculate the choice probability of each alternative route.

Comparison Procedure

The objective of the comparison procedure is the evaluation of differences in parameter estimates and log likelihood function values across choice sets that differ in qualitative and quantitative composition.

Model estimates from choice sets constructed with different path generation techniques cannot be compared through statistical tests such as the likelihood ratio test. A simulation approach evaluates the prediction accuracy of models estimated with different data sets and provides a basis for identifying the technique most conducive to generating realistic and heterogeneous routes. The simulation approach applies parameter estimates to the appropriate choice set, calculates utility values for each alternative and each observation, and selects the chosen route as the alternative with the highest choice probability. Further, the approach counts predicted shares of individuals choosing the minimum distance path, the minimum travel time path, or routes containing relevant network landmarks and compares them with observed shares from the data collected. The resulting prediction accuracy provides the information necessary to compare models estimated with different choice sets:

$$PA_{p,d} = \frac{NR_{pq,d}}{NR_{q,\mathrm{observed}}} \tag{11}$$

where $PA_{p,d}$ = prediction accuracy for model $p$ estimated with data set $d$, $q$ = criterion for comparison (e.g., minimum distance path), $NR_{pq,d}$ = number of predicted chosen routes for criterion $q$ applying model $p$ to data set $d$, and $NR_{q,\mathrm{observed}}$ = number of actual chosen routes for criterion $q$.

Model estimates from choice sets built with sample size reduction are comparable in terms of proximity between estimated values and reference values. The true values of the model estimates would be the logical choice for the reference values, but these values are unknown, and this lack of information mirrors the absence of knowledge about the consideration set of each survey participant. Model estimates from the initial choice sets are therefore assumed as reference values, because the initial data sets are at least behaviorally consistent with the observed behavior.

The proximity between estimated and reference values is evaluated according to three criteria: replication of the initial parameter estimates, reproduction of the choice probabilities for the actual chosen route, and accurate estimation of the log likelihood function values. The second and third criteria require that the parameter estimates obtained with the reduced data sets be applied to the initial choice sets, so that the comparison has a common support. The proximity evaluation is conducted according to two properties: the bias, equal to the difference between the mean of estimates for each sample size class of alternatives across the 10 runs and the initial values, and the repetition variance, equal to the variance in model parameters and likelihood values across the 10 runs of each class of sample size reduction. The proximity measurement is computed for each combination of criterion and property according to two error measures: root-mean-square error (RMSE) and mean-absolute-percentage error (MAPE).
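The bias and repetition variance properties, and the RMSE and MAPE error measures, can be illustrated with a short sketch. How the errors are aggregated across parameters is not spelled out in the text, so the sketch assumes that RMSE and MAPE are taken across the parameters of one model and that the repetition variance is the population variance across runs; the function names and toy numbers are hypothetical.

```python
import math
from statistics import mean, pvariance


def bias(runs, reference):
    """Mean estimate across runs minus the reference value, per parameter."""
    return [mean(r[j] for r in runs) - reference[j] for j in range(len(reference))]


def repetition_variance(runs):
    """Variance of each parameter across runs (population variance is an assumption)."""
    return [pvariance([r[j] for r in runs]) for j in range(len(runs[0]))]


def rmse(errors):
    """Root-mean-square error across parameters."""
    return math.sqrt(mean(e * e for e in errors))


def mape(errors, reference):
    """Mean absolute error relative to the reference values (reference must be nonzero)."""
    return mean(abs(e / ref) for e, ref in zip(errors, reference))


# Hypothetical example: two parameters, two repetitions of one sample size class
reference = [-1.0, 2.0]             # estimates from the initial choice set
runs = [[-1.1, 2.2], [-0.9, 2.2]]   # estimates from the reduced choice sets
b = bias(runs, reference)
bias_rmse = rmse(b)
bias_mape = mape(b, reference)
var = repetition_variance(runs)
```

In this toy case the first parameter is recovered on average but varies across runs, whereas the second is stable across runs but biased, which is the distinction the two properties are meant to separate.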

NUMERICAL RESULTS

This section illustrates the results of applying the experimental design to the collected data to evaluate the impact of the choice set composition in route choice modeling.


Transportation Research Record 2003

TABLE 1 Coverage and Behavioral Consistency of Path Generation Techniques

| Path Generation Technique | Coverage (%) at 100% Overlap Threshold | Coverage (%) at 90% Overlap Threshold | Coverage (%) at 80% Overlap Threshold | Coverage (%) at 70% Overlap Threshold | Behavioral Consistency |
|---|---|---|---|---|---|
| Labeling approach | 40.68 | 40.68 | 44.91 | 48.31 | 0.672 |
| Link elimination | 58.47 | 58.47 | 69.92 | 81.78 | 0.872 |
| Link penalty | 53.81 | 53.81 | 62.29 | 68.22 | 0.813 |
| Simulation (low variance) | 49.15 | 49.15 | 54.24 | 59.32 | 0.755 |
| Simulation (high variance) | 61.44 | 61.86 | 71.19 | 81.36 | 0.881 |
| Branch and bound | 91.10 | 91.53 | 96.61 | 97.88 | 0.979 |

Choice Set Formation

Among the 276 responses collected with the web-based survey, this analysis accounts for observations without any incorrectly coded route. A total of 236 actual chosen routes over 182 origin–destination pairs constitute the comparison term for evaluating the coverage and the behavioral consistency of the implemented path generation techniques with respect to the ideal algorithm, as indicated in Table 1. The branch and bound algorithm largely outperforms every other generation technique, with both high consistency with respect to the observed behavior and exact replication of more than 90% of the actual chosen routes. Simulation with a high variance and the link elimination approach show less satisfying results for both the consistency index and the coverage. Simulation with a low variance, the link penalty approach, and particularly the labeling approach perform more poorly. With respect to the path set generated with the branch and bound algorithm, the path set resulting from the combination of all the other generation techniques shows similar behavioral consistency and comparable coverage for an 80% overlap threshold.

The choice set formation process excludes from consideration observations that appear inconsistent with the actual behavior (because they do not reproduce the chosen route for an 80% overlap threshold in the coverage function) and with the sampling method (because they do not present an alternative to the collected route for a sample size equal to 25% of the initial sample dimension). The choice set formation process examines path sets generated with the different techniques, considers the number of observations consistent with the actual behavior, and combines path sets created with single techniques that present limited coverage and consequently a limited number of observations for modeling purposes.

Accordingly, the constructed choice sets consist of 216 observations that contain the same observed route and at least five alternatives; the first choice set corresponds to the path set generated with the branch and bound algorithm, and the second choice set merges the path sets created with the techniques based on the shortest path search. Path sets created with different generation techniques contain dissimilar routes for the same origin–destination pair; consequently, choice sets from different path generation methods are composed of dissimilar alternative routes. Table 2 presents the characteristics of the choice sets and shows that their composition differs significantly for almost 70% of the observations (fewer than one-third of alternative routes are common to both data sets) and appears similar for more than 13% of the observations (more than two-thirds of alternative routes are common to both data sets). Most likely, the small distance between origin and destination for these latter observations produces this effect, because the number of available alternatives decreases and different path generation techniques produce similar routes.

Comparison Between Choice Sets

TABLE 2 Composition of Constructed Choice Sets

| Choice Set Characteristic | Branch and Bound | Merged |
|---|---|---|
| Coverage (100% threshold) | 91.10% | 86.44% |
| Coverage (80% threshold) | 96.61% | 95.76% |
| Behavioral consistency | 97.91% | 98.46% |
| Total number of routes | 4,625 | 6,881 |
| Maximum number of routes per observation | 44 | 55 |
| Median number of routes per observation | 17 | 32 |
| Observations with more than 40 routes | 6.36% | 31.36% |

Across both choice sets: shared routes, 1,437; observations with dissimilar sets of routes, 68.06%; observations with similar sets of routes, 13.89%.

TABLE 3 Model Estimation with Branch and Bound Choice Set

| Variable | C-Logit | PSL | CNL | GNL | LK |
|---|---|---|---|---|---|
| Distance | −1.110 (−4.3) | −0.557 (−2.1) | −1.167 (−4.4) | −1.146 (−5.4) | −1.634 (−5.4) |
| Travel time (exp. driver) | −0.524 (−6.0) | −0.471 (−5.4) | −0.409 (−5.0) | −0.386 (−5.0) | −0.575 (−5.3) |
| Travel time (inexp. driver) | −0.304 (−3.5) | −0.269 (−3.2) | −0.233 (−3.7) | −0.223 (−3.1) | −0.333 (−3.7) |
| Sabotino Square | 1.504 (3.5) | 1.177 (2.9) | 1.442 (2.8) | 1.268 (3.0) | 1.250 (2.5) |
| Adriano Square | 1.009 (2.4) | 1.023 (2.6) | 0.875 (2.1) | 0.725 (1.7) | 0.566 (1.4) |
| Rivoli Square | −1.023 (−1.9) | −0.667 (−1.2) | −1.044 (−2.1) | −0.976 (−1.9) | −0.480 (−1.1) |
| Bernini Square | −0.769 (−2.0) | −0.643 (−1.7) | −0.733 (−2.2) | −0.636 (−1.9) | −1.329 (−2.2) |
| Sommeiller Bridge | 2.948 (7.2) | 2.600 (6.4) | 2.725 (4.1) | 2.482 (6.8) | 2.781 (4.3) |
| Dante Bridge | 2.051 (3.5) | 2.222 (3.8) | 2.137 (3.1) | 2.018 (4.3) | 2.533 (3.3) |
| Orbassano Square | −0.901 (−3.0) | −0.970 (−3.2) | −0.706 (−2.5) | −0.560 (−2.5) | −0.860 (−2.7) |
| Habit | −0.474 (−1.9) | −0.432 (−1.8) | −0.197 (−1.5) | −0.321 (−2.1) | −0.245 (−2.6) |
| Time-saving skill | 0.576 (2.2) | 0.472 (1.9) | 0.380 (1.9) | 0.302 (1.4) | 1.083 (3.4) |
| Navigating ability | 0.229 (2.4) | 0.220 (2.4) | 0.166 (2.5) | 0.231 (3.2) | 0.194 (2.2) |
| Commonality factor | −1.024 (−2.5) | | | | |
| ln path size | | 1.161 (4.8) | | | 0.739 (2.1) |
| Exp path size | | 14.501 (3.6) | | | |
| Nesting coefficient | | | 0.335 (2.3) | | |
| Exp nesting coefficient | | | | 5.581 (5.2) | |
| Sigma WE1 | | | | | 4.560 (3.4) |
| Sigma WE2 | | | | | 2.154 (3.3) |
| Sigma WE3 | | | | | 2.243 (2.0) |
| Sigma NS1 | | | | | 4.659 (3.6) |
| Sigma NS2 | | | | | 2.497 (2.1) |
| Sigma NS3 | | | | | 2.309 (2.2) |
| Sigma NS4 | | | | | 2.566 (1.5) |
| Log likelihood at estimates | −454.63 | −443.94 | −429.61 | −438.14 | −435.93 |
| Adjusted rho-bar squared | 0.234 | 0.250 | 0.275 | 0.261 | 0.253 |

NOTE: Entries are estimates with t-statistics in parentheses. Models estimated using Biogeme software and Gauss matrix programming language.

A linear-in-parameters utility function with 10 explanatory variables was used as a basis for model estimation, following previous work (32, 33). The variables can be grouped into three main categories: level of service, landmark, and individual variables. There are three level-of-service variables: the first measures the distance traveled, and the other two measure travel time, with a distinction between experienced and inexperienced drivers. There are seven landmark dummy variables, equal to 1 if an alternative passes the landmark and 0 otherwise. The model also includes three behavioral variables measured at the individual level through factor analysis: habit, time-saving skill, and navigation ability.

Estimation results for the initial choice sets are presented in Tables 3 and 4. Notably, the same description of choice behavior applies to both choice sets. Parameter estimates suggest that, on the one hand, drivers minimize distance and travel time; on the other hand, they behave according to their experience and habits. Travel time appears to be more relevant to experienced drivers, and the assumption that individuals navigate in the urban network through landmarks appears to be justified by the significance of landmark dummies throughout all the model structures and is emphasized by the significance of the subnetwork components in the LK model.

Accounting for the correlation structure of the alternatives within the stochastic part of the utility function improves likelihood values, as CNL, GNL, and LK outperform MNL modifications, with the exception of PSL estimated with the merged data set. Representing the correlation through subnetwork components increases model performance with respect to MNL modifications, but this LK specification produces less satisfying results than the nested structures. A similar LK model was found to perform well in other data sets (13). Considering a common nesting coefficient shared by all the network links enhances goodness-of-fit measures, as CNL performs better than GNL. The exception of PSL performing better than every other model specification with the merged data set suggests that the choice set composition does influence model performance, as the best model with one choice set is not the best model with the other. Further investigation in the definition of path size, nesting coefficients, and subnetwork components is required but is beyond the scope of this paper. The smaller number of alternatives in the branch and bound choice set leads to higher probabilities of selecting the actual chosen route and, consequently, to higher likelihood values and better goodness of fit.

Simulation approach results are presented in Table 5. Models simulate with good accuracy the behavior of both the smaller number of individuals who have selected the minimum distance or minimum travel time path and the larger number of drivers who have chosen a route passing a specific landmark. Prediction accuracy suggests that the branch and bound algorithm generates path sets that represent the observed choice behavior better than the shortest-path-based techniques do.
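The prediction accuracy measure behind the simulation approach is a ratio of two counts and can be sketched in a few lines. The sketch assumes that the predicted route is the alternative with the highest choice probability and that every route carries a boolean flag for the comparison criterion; the data layout, field names, and toy observations are hypothetical and only illustrate the counting logic.

```python
def prediction_accuracy(observations, criterion):
    """Ratio of predicted to observed chosen routes that satisfy a criterion.

    observations: list of dicts with "routes" (list of dicts holding a choice
    probability "prob" and boolean criterion flags) and "chosen" (index of the
    actually chosen route). This layout is a hypothetical one.
    """
    predicted = 0
    observed = 0
    for obs in observations:
        # Predicted choice: alternative with the highest choice probability
        best = max(obs["routes"], key=lambda r: r["prob"])
        predicted += bool(best[criterion])
        observed += bool(obs["routes"][obs["chosen"]][criterion])
    return predicted / observed  # undefined if no observed route meets the criterion


# Hypothetical toy data: three observations, two alternatives each
observations = [
    {"routes": [{"prob": 0.6, "min_dist": True}, {"prob": 0.4, "min_dist": False}], "chosen": 0},
    {"routes": [{"prob": 0.3, "min_dist": True}, {"prob": 0.7, "min_dist": False}], "chosen": 0},
    {"routes": [{"prob": 0.2, "min_dist": True}, {"prob": 0.8, "min_dist": False}], "chosen": 1},
]
pa = prediction_accuracy(observations, "min_dist")
```

In the toy data, two drivers actually chose the minimum distance path but the model predicts it for only one of them, so the ratio is 0.5.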

TABLE 4 Model Estimation with Merged Choice Set

| Variable | C-Logit | PSL | CNL | GNL | LK |
|---|---|---|---|---|---|
| Distance | −0.888 (−3.8) | −0.422 (−2.0) | −0.847 (−3.8) | −0.812 (−4.0) | −1.485 (−6.0) |
| Travel time (exp. driver) | −0.494 (−5.9) | −0.435 (−5.6) | −0.309 (−5.0) | −0.354 (−4.8) | −0.558 (−6.0) |
| Travel time (inexp. driver) | −0.347 (−4.2) | −0.311 (−4.0) | −0.235 (−3.2) | −0.221 (−3.2) | −0.324 (−4.0) |
| Sabotino Square | 2.436 (5.6) | 1.697 (4.2) | 2.079 (5.3) | 2.080 (5.3) | 1.772 (3.6) |
| Adriano Square | 1.572 (3.7) | 1.477 (3.7) | 1.375 (3.4) | 1.312 (3.6) | 0.659 (1.6) |
| Rivoli Square | −1.142 (−2.0) | −0.802 (−1.4) | −1.071 (−2.0) | −1.053 (−2.2) | −0.847 (−1.9) |
| Bernini Square | −1.149 (−2.9) | −1.058 (−2.7) | −0.911 (−3.1) | −1.009 (−3.1) | −1.326 (−2.1) |
| Sommeiller Bridge | 3.903 (8.1) | 3.148 (7.1) | 3.378 (7.8) | 3.419 (7.7) | 3.709 (6.4) |
| Dante Bridge | 3.592 (5.2) | 3.117 (4.8) | 3.656 (5.8) | 3.564 (6.1) | 3.599 (5.0) |
| Orbassano Square | −1.299 (−3.6) | −1.216 (−3.7) | −1.287 (−3.9) | −1.278 (−4.0) | −1.537 (−3.7) |
| Habit | −0.641 (−2.3) | −0.585 (−2.2) | −0.288 (−2.2) | −0.447 (−2.4) | −0.373 (−4.5) |
| Time-saving skill | 0.528 (2.1) | 0.422 (2.0) | 0.238 (1.9) | 0.408 (2.0) | 1.021 (4.2) |
| Navigating ability | 0.155 (1.4) | 0.131 (1.2) | 0.162 (1.5) | 0.212 (2.2) | 0.113 (1.0) |
| Commonality factor | −0.830 (−3.1) | | | | |
| ln path size | | 1.172 (7.0) | | | 0.209 (0.8) |
| Exp path size | | 33.747 (5.2) | | | |
| Nesting coefficient | | | 0.547 (2.1) | | |
| Exp nesting coefficient | | | | 4.604 (5.6) | |
| Sigma WE1 | | | | | 5.677 (4.4) |
| Sigma WE2 | | | | | 2.636 (4.2) |
| Sigma WE3 | | | | | 2.713 (2.5) |
| Sigma NS1 | | | | | 4.370 (4.6) |
| Sigma NS2 | | | | | 2.597 (2.3) |
| Sigma NS3 | | | | | 1.872 (2.2) |
| Sigma NS4 | | | | | 1.381 (1.1) |
| Log likelihood at estimates | −574.59 | −538.41 | −560.98 | −567.47 | −555.04 |
| Adjusted rho-bar squared | 0.176 | 0.225 | 0.195 | 0.186 | 0.193 |

NOTE: Entries are estimates with t-statistics in parentheses. Models estimated using Biogeme software and Gauss matrix programming language.

TABLE 5 Measures of Prediction Accuracy

| Model | Branch and Bound: Min. Distance (%) | Branch and Bound: Min. Travel Time (%) | Branch and Bound: Landmarks (%) | Merged: Min. Distance (%) | Merged: Min. Travel Time (%) | Merged: Landmarks (%) |
|---|---|---|---|---|---|---|
| MNL | 74.58 | 65.22 | 82.28 | 63.16 | 69.70 | 65.43 |
| C-LOGIT | 74.58 | 69.57 | 83.54 | 63.16 | 69.70 | 65.43 |
| PSL | 74.58 | 65.22 | 84.81 | 75.44 | 69.70 | 70.37 |
| CNL | 64.41 | 60.87 | 87.34 | 56.14 | 54.55 | 69.14 |
| GNL | 61.02 | 58.70 | 86.08 | 54.39 | 51.52 | 68.52 |
| LK | 67.80 | 63.04 | 86.71 | 57.89 | 60.61 | 68.62 |

Comparison Within Choice Sets

The evaluation of model robustness with respect to the choice set composition considers MNL estimates as a comparison term for the error measures. The procedure accounts for six model specifications estimated for 10 repetitions of three classes of sample size reduction from two initial choice sets. A total of 360 models are estimated and subsequently applied to the initial data sets to measure the impact on choice probabilities and likelihood values. Different levels of comparison are available from this numerical experiment: across choice sets, across sample sizes, and across model specifications. The discussion presented in this section attempts to interpret the large amount of information resulting from estimation of the route choice models for all the generated choice sets.

Tables 6 through 8 present the computational results. Across the error measures for each combination of criterion and property, some common findings are discernible. First, error measures for MNL modifications are smaller than error measures for nested structures and LK. Models that maintain the simple logit structure while accounting for similarities among routes show more robustness with respect to the choice set reduction, whereas GEV structures exhibit more dependence on the choice set composition. Second, neither of the initial choice sets yields more robustness than the other. Considering the same model specification, some error measures are lower for the branch and bound data set and others are lower for the merged data set. The number of alternatives appears to be more influential than the nature of the routes once the reduction of the sample size is applied. Third, none of the model specifications appears to be more robust than the others with respect to all the criteria. Considering the same initial choice sets, some models present greater ability to replicate correctly choice probabilities of the actual choices and others display higher accuracy in estimating overall likelihood values or superior capacity in replicating parameter estimates.

From a general perspective, the first finding is coherent with the theoretical consistency of the MNL model with the sampling of alternatives. The second and third results suggest that the number of alternatives is extremely relevant in route choice modeling per se, regardless of the model specification and the path generation technique used.

From a closer perspective, Table 6 focuses on replication of the model parameters. When considering bias RMSE, only MNL and C-logit present some accuracy in replicating parameter estimates. RMSE values for PSL, CNL, GNL, and LK are affected by the variation of path size exponent, common nesting coefficient, nesting coefficient exponent, and variance of the subnetwork components across the repetitions for each class of sample size reduction. When considering bias MAPE, the difference between MNL modifications

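TABLE 6 Measures of Ability to Replicate Parameter Estimates

| Model | Property | Measure | Branch and Bound 75% | Branch and Bound 50% | Branch and Bound 25% | Merged 75% | Merged 50% | Merged 25% |
|---|---|---|---|---|---|---|---|---|
| MNL | Bias | RMSE | 0.0649 | 0.1428 | 0.2283 | 0.0733 | 0.1287 | 0.2499 |
| MNL | Bias | MAPE | 0.0555 | 0.1207 | 0.2084 | 0.0454 | 0.0866 | 0.1610 |
| MNL | Variance | RMSE | 0.0604 | 0.1273 | 0.2090 | 0.0649 | 0.0890 | 0.1672 |
| MNL | Variance | MAPE | 0.0479 | 0.1013 | 0.1702 | 0.0334 | 0.0592 | 0.0975 |
| C-LOGIT | Bias | RMSE | 0.0847 | 0.1729 | 0.2653 | 0.0740 | 0.1394 | 0.2430 |
| C-LOGIT | Bias | MAPE | 0.0675 | 0.1367 | 0.2277 | 0.0498 | 0.1007 | 0.1726 |
| C-LOGIT | Variance | RMSE | 0.0714 | 0.1356 | 0.2354 | 0.0648 | 0.1017 | 0.1569 |
| C-LOGIT | Variance | MAPE | 0.0572 | 0.1119 | 0.1860 | 0.0389 | 0.0709 | 0.0983 |
| PSL | Bias | RMSE | 0.4643 | 0.7992 | 0.9858 | 0.6281 | 1.5668 | 1.1500 |
| PSL | Bias | MAPE | 0.0890 | 0.1678 | 0.2678 | 0.0588 | 0.1233 | 0.1811 |
| PSL | Variance | RMSE | 0.4189 | 0.6521 | 0.9556 | 0.6178 | 1.5284 | 1.1177 |
| PSL | Variance | MAPE | 0.0592 | 0.1083 | 0.1945 | 0.0449 | 0.0897 | 0.1150 |
| CNL | Bias | RMSE | 0.1519 | 0.2020 | 0.2567 | 0.1066 | 0.1744 | 0.3017 |
| CNL | Bias | MAPE | 0.2145 | 0.2291 | 0.2988 | 0.1078 | 0.1616 | 0.2701 |
| CNL | Variance | RMSE | 0.0561 | 0.1257 | 0.2047 | 0.0853 | 0.1427 | 0.1637 |
| CNL | Variance | MAPE | 0.0558 | 0.1255 | 0.1881 | 0.0833 | 0.1343 | 0.1332 |
| GNL | Bias | RMSE | 0.1398 | 0.2270 | 0.4521 | 0.0785 | 0.1313 | 0.3512 |
| GNL | Bias | MAPE | 0.0999 | 0.1561 | 0.2813 | 0.0587 | 0.1064 | 0.2359 |
| GNL | Variance | RMSE | 0.1099 | 0.1229 | 0.2323 | 0.0684 | 0.0912 | 0.1884 |
| GNL | Variance | MAPE | 0.0834 | 0.1162 | 0.2158 | 0.0467 | 0.0692 | 0.1352 |
| LK | Bias | RMSE | 0.3611 | 0.6537 | 1.1221 | 0.2467 | 0.4393 | 0.8573 |
| LK | Bias | MAPE | 0.1180 | 0.2172 | 0.3853 | 0.0922 | 0.1639 | 0.3809 |
| LK | Variance | RMSE | 0.3130 | 0.4954 | 0.8034 | 0.1842 | 0.3682 | 0.6486 |
| LK | Variance | MAPE | 0.1019 | 0.1711 | 0.4115 | 0.0687 | 0.1255 | 0.2362 |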

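TABLE 7 Measures of Reproducibility of Choice Probabilities for Chosen Routes

| Model | Property | Measure | Branch and Bound 75% | Branch and Bound 50% | Branch and Bound 25% | Merged 75% | Merged 50% | Merged 25% |
|---|---|---|---|---|---|---|---|---|
| MNL | Bias | RMSE | 0.0102 | 0.0239 | 0.0425 | 0.0102 | 0.0199 | 0.0331 |
| MNL | Bias | MAPE | 0.0447 | 0.1049 | 0.1916 | 0.0629 | 0.1210 | 0.2202 |
| MNL | Variance | RMSE | 0.0087 | 0.0190 | 0.0326 | 0.0044 | 0.0071 | 0.0106 |
| MNL | Variance | MAPE | 0.0375 | 0.0768 | 0.1322 | 0.0293 | 0.0486 | 0.0802 |
| C-LOGIT | Bias | RMSE | 0.0114 | 0.0252 | 0.0420 | 0.0107 | 0.0212 | 0.0342 |
| C-LOGIT | Bias | MAPE | 0.0510 | 0.1092 | 0.1952 | 0.0643 | 0.1275 | 0.2277 |
| C-LOGIT | Variance | RMSE | 0.0097 | 0.0199 | 0.0322 | 0.0052 | 0.0093 | 0.0113 |
| C-LOGIT | Variance | MAPE | 0.0428 | 0.0806 | 0.1429 | 0.0330 | 0.0592 | 0.0832 |
| PSL | Bias | RMSE | 0.0119 | 0.0269 | 0.0495 | 0.0128 | 0.0335 | 0.0378 |
| PSL | Bias | MAPE | 0.0533 | 0.1060 | 0.1937 | 0.0662 | 0.1617 | 0.2083 |
| PSL | Variance | RMSE | 0.0091 | 0.0200 | 0.0352 | 0.0072 | 0.0245 | 0.0211 |
| PSL | Variance | MAPE | 0.0396 | 0.0802 | 0.1509 | 0.0350 | 0.1022 | 0.1060 |
| CNL | Bias | RMSE | 0.0318 | 0.0427 | 0.0702 | 0.0259 | 0.0477 | 0.0537 |
| CNL | Bias | MAPE | 0.1544 | 0.1975 | 0.3320 | 0.1948 | 0.3559 | 0.3282 |
| CNL | Variance | RMSE | 0.0136 | 0.0284 | 0.0516 | 0.0157 | 0.0375 | 0.0262 |
| CNL | Variance | MAPE | 0.0576 | 0.1216 | 0.2072 | 0.0706 | 0.2242 | 0.1279 |
| GNL | Bias | RMSE | 0.0212 | 0.0359 | 0.0732 | 0.0207 | 0.0299 | 0.0506 |
| GNL | Bias | MAPE | 0.0836 | 0.1448 | 0.3414 | 0.1253 | 0.1464 | 0.2744 |
| GNL | Variance | RMSE | 0.0176 | 0.0262 | 0.0474 | 0.0077 | 0.0097 | 0.0186 |
| GNL | Variance | MAPE | 0.0688 | 0.1048 | 0.1989 | 0.0491 | 0.0587 | 0.1178 |
| LK | Bias | RMSE | 0.0172 | 0.0317 | 0.0569 | 0.0109 | 0.0171 | 0.0354 |
| LK | Bias | MAPE | 0.0762 | 0.1408 | 0.2488 | 0.0676 | 0.1174 | 0.2635 |
| LK | Variance | RMSE | 0.0155 | 0.0245 | 0.0464 | 0.0075 | 0.0115 | 0.0220 |
| LK | Variance | MAPE | 0.0674 | 0.1100 | 0.2000 | 0.0470 | 0.0765 | 0.1460 |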

TABLE 8 Measures of Accuracy in Estimation of Log Likelihood Function

| Model | Property | Measure | Branch and Bound 75% | Branch and Bound 50% | Branch and Bound 25% | Merged 75% | Merged 50% | Merged 25% |
|---|---|---|---|---|---|---|---|---|
| MNL | Bias | RMSE | 0.3276 | 1.4262 | 4.2371 | 0.2360 | 0.8376 | 3.6537 |
| MNL | Bias | MAPE | 0.0006 | 0.0029 | 0.0091 | 0.0004 | 0.0014 | 0.0057 |
| MNL | Variance | RMSE | 0.1456 | 0.4944 | 0.8865 | 0.0808 | 0.2901 | 1.5054 |
| MNL | Variance | MAPE | 0.0003 | 0.0009 | 0.0015 | 0.0001 | 0.0004 | 0.0023 |
| C-LOGIT | Bias | RMSE | 0.4869 | 1.8003 | 4.8227 | 0.2918 | 1.3678 | 4.9492 |
| C-LOGIT | Bias | MAPE | 0.0009 | 0.0037 | 0.0103 | 0.0005 | 0.0021 | 0.0080 |
| C-LOGIT | Variance | RMSE | 0.2350 | 0.6589 | 1.1945 | 0.0952 | 0.6082 | 1.8743 |
| C-LOGIT | Variance | MAPE | 0.0004 | 0.0011 | 0.0021 | 0.0001 | 0.0008 | 0.0030 |
| PSL | Bias | RMSE | 0.6228 | 2.5437 | 7.3208 | 0.5602 | 1.9907 | 6.0835 |
| PSL | Bias | MAPE | 0.0013 | 0.0050 | 0.0155 | 0.0009 | 0.0035 | 0.0103 |
| PSL | Variance | RMSE | 0.2446 | 1.2112 | 2.5085 | 0.2377 | 0.7000 | 2.5357 |
| PSL | Variance | MAPE | 0.0004 | 0.0019 | 0.0045 | 0.0003 | 0.0011 | 0.0036 |
| CNL | Bias | RMSE | 6.7002 | 9.8788 | 16.4802 | 10.0607 | 28.3954 | 41.0378 |
| CNL | Bias | MAPE | 0.0151 | 0.0213 | 0.0337 | 0.0157 | 0.0404 | 0.0691 |
| CNL | Variance | RMSE | 1.7140 | 3.7716 | 7.8954 | 4.8653 | 17.1096 | 13.4770 |
| CNL | Variance | MAPE | 0.0035 | 0.0071 | 0.0152 | 0.0072 | 0.0227 | 0.0177 |
| GNL | Bias | RMSE | 1.0892 | 4.4110 | 17.1794 | 0.8684 | 1.8398 | 7.4069 |
| GNL | Bias | MAPE | 0.0016 | 0.0064 | 0.0362 | 0.0013 | 0.0028 | 0.0123 |
| GNL | Variance | RMSE | 0.8357 | 3.3920 | 6.5502 | 0.4676 | 0.9522 | 2.5153 |
| GNL | Variance | MAPE | 0.0014 | 0.0063 | 0.0109 | 0.0007 | 0.0016 | 0.0037 |
| LK | Bias | RMSE | 1.4128 | 5.0438 | 16.0366 | 0.7054 | 2.3353 | 11.5179 |
| LK | Bias | MAPE | 0.0028 | 0.0104 | 0.0349 | 0.0012 | 0.0039 | 0.0195 |
| LK | Variance | RMSE | 0.7169 | 2.2271 | 5.0301 | 0.2289 | 0.8378 | 3.8742 |
| LK | Variance | MAPE | 0.0012 | 0.0041 | 0.0102 | 0.0004 | 0.0011 | 0.0050 |

and the other models in terms of parameter robustness is evident. For the PSL model in particular, the bias RMSE is affected by the high estimated values of the path size exponent, whereas the bias MAPE is not influenced by outliers and thus is comparable to the measure computed for the C-logit model. Variance errors are similar between models, and this finding reinforces the idea that the number of alternatives influences parameter estimates more than the generation method of the routes.

Table 7 focuses on reproducing the choice probabilities for the actual chosen route and consequently on the individual log likelihood function values. Bias error measures confirm differences between MNL modifications and the other model specifications, as C-logit and PSL reproduce probabilities with lower average error and lower variation. The explanation lies in the formulation of choice probabilities, because the variation of nesting coefficients and nesting coefficient exponents directly affects computation of the probability of choosing the actual route, whereas variation of the path size exponent modifies the path size term without directly entering the probability calculation. The LK model maintains a logit probability function, even though numerical simulation is required for the integration, and accordingly the error measures are more similar to MNL than to the nested structures.

Table 8 presents evidence that reproduction of the individual likelihood values is less accurate than estimation of the overall log likelihood function. Nonetheless, model specifications present similar relative differences in terms of error measures as in the previous criterion. For MNL modifications, biases are negligible with 75%, reasonable with 50%, and critical with 25% of the initial number of alternatives, whereas variances are minimal across the repetitions. For nested structures and LK, biases are already considerable with 50% and extremely critical with 25% of the initial sample sizes, whereas variances are more sensitive across the repetitions. In particular, for these models a reduction to only 25% of the initial choice sets appears to affect significantly the accuracy in likelihood estimation.
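The sample size reduction examined in this section can be illustrated with a short sketch. It assumes that alternatives are drawn uniformly at random without replacement, that the chosen route is always retained together with at least one alternative, and that the reduced size is the rounded fraction of the initial set; the function name and these sampling details are assumptions rather than the documented procedure of the study.

```python
import random


def reduce_choice_set(routes, chosen, fraction, rng):
    """Keep the chosen route and draw the remaining alternatives at random."""
    # Target size: rounded fraction of the initial set, never below two routes
    target = max(2, round(fraction * len(routes)))
    others = [r for r in routes if r != chosen]
    sampled = rng.sample(others, min(target - 1, len(others)))
    return [chosen] + sampled


# Hypothetical observation with 20 generated routes, of which "route_3" was chosen
rng = random.Random(42)
routes = [f"route_{i}" for i in range(20)]
reduced = {
    fraction: [reduce_choice_set(routes, "route_3", fraction, rng) for _ in range(10)]
    for fraction in (0.75, 0.50, 0.25)   # three classes, 10 repetitions each
}
```

Each of the ten reduced sets per class would then be used for a separate estimation, and the resulting estimates applied back to the initial choice set for comparison.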

SUMMARY AND CONCLUSIONS

Modeling revealed route choice behavior involves the formation of a set of alternative routes and the estimation of a discrete choice model. This paper focuses on the impact of choice set composition on model estimates by designing an experimental analysis of actual route choices of individuals moving from home to work in an urban environment. The experimental analysis focuses on the application of several path generation techniques to the urban network of the case study, the generation of choice sets consistent with the observed behavior, the extraction of alternatives according to three classes of sample size reduction, the repetition of the estimation of six route choice models for each of these classes, and the application of the estimated parameters to the initial data sets for error measure calculation.

Comparison of prediction accuracy across different choice sets suggests that generation techniques that produce heterogeneous routes allow for estimating models with better prediction abilities with respect to the outcomes of the drivers’ actual choices. Error measures provide evidence that random sampling produces good estimates for MNL modifications with samples containing 50% of the initial number of alternatives, whereas nested structures present high variation of model estimates even for a relatively small size reduction. From a practical perspective, however, CNL and GNL likelihood values for reduced choice sets still indicate that the nested structures are significantly better than the MNL model, even though parameter estimates are less robust. Further, the numerical analysis shows that the number of alternatives constitutes a significant issue in route choice modeling, because errors exhibit comparable values in terms of variance across the repetitions regardless of the initial composition of the data set and the model specification estimated.

The results presented in this paper are based on a relatively small data set, and further investigation will apply the illustrated experimental analysis to additional data sets to generalize these conclusions. Even though each numerical experiment requires caution in generalizing the results, these findings suggest guidelines for analysts intending to estimate and calculate predictions in route choice modeling from the observation of actual behavior: (a) apply a branch and bound algorithm to generate heterogeneous routes; (b) estimate MNL modifications in choice situations with a large number of alternatives, as reduction of the sample size would reduce the computational expenditure without a significant effect on the model estimates because of the parameter robustness; and (c) estimate nested or LK structures in choice situations with a small number of alternatives, because the likelihood values would improve without relevant effects on computational expenditure.

ACKNOWLEDGMENTS

The authors are grateful to the anonymous reviewers who provided insightful comments on the initial version of this paper.

REFERENCES

1. Ben-Akiva, M. E., M. J. Bergman, A. J. Daly, and R. Ramaswamy. Modeling Inter-Urban Route Choice Behaviour. In Proc., 9th International Symposium on Transportation and Traffic Theory, VNU Science Press, Utrecht, Netherlands, 1984, pp. 299–330.
2. De La Barra, T., B. Perez, and J. Anez. Multidimensional Path Search and Assignment. Presented at 21st PTRC Summer Annual Meeting, Manchester, United Kingdom, 1993.
3. Azevedo, J. A., M. E. O. Santos Costa, J. J. E. R. Silvestre Madera, and E. Q. Vieira Martins. An Algorithm for the Ranking of Shortest Paths. European Journal of Operational Research, Vol. 69, 1993, pp. 97–106.
4. Bekhor, S., M. Ben-Akiva, and S. Ramming. Route Choice: Choice Set Generation and Probabilistic Choice Models. In Proc., 4th Triennial Symposium on Transportation Analysis Conference, University of Azores, Sao Miguel, Portugal, 2001, pp. 459–464.
5. Prato, C. G., and S. Bekhor. Applying Branch-and-Bound Technique to Route Choice Set Generation. In Transportation Research Record: Journal of the Transportation Research Board, No. 1985, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 19–28.
6. Cascetta, E., A. Nuzzolo, F. Russo, and A. Vitetta. A Modified Logit Route Choice Model Overcoming Path Overlapping Problems: Specification and Some Calibration Results for Interurban Networks. In Proc., 13th International Symposium on Transportation and Traffic Theory, Pergamon, Lyon, France, 1996, pp. 697–711.
7. Ben-Akiva, M., and M. Bierlaire. Discrete Choice Methods and Their Applications to Short Term Travel Decisions. In Handbook of Transportation Science, Kluwer, Dordrecht, Netherlands, 1999, pp. 5–12.
8. Ramming, S. Network Knowledge and Route Choice. PhD thesis. Massachusetts Institute of Technology, Cambridge, 2001.
9. Prashker, J. N., and S. Bekhor. Investigation of Stochastic Network Loading Procedures. In Transportation Research Record 1645, TRB, National Research Council, Washington, D.C., 1998, pp. 94–102.
10. Bekhor, S., and J. N. Prashker. Stochastic User Equilibrium Formulation for Generalized Nested Logit Model. In Transportation Research Record: Journal of the Transportation Research Board, No. 1752, TRB, National Research Council, Washington, D.C., 2001, pp. 84–90.
11. Yai, T., S. Iwakura, and S. Morichi. Multinomial Probit with Structured Covariance for Route Choice Behavior. Transportation Research B, Vol. 31, 1997, pp. 195–207.
12. Bekhor, S., M. S. Ben-Akiva, and S. M. Ramming. Adaptation of Logit Kernel to Route Choice Situation. In Transportation Research Record: Journal of the Transportation Research Board, No. 1805, TRB, National Research Council, Washington, D.C., 2002, pp. 78–85.
13. Bierlaire, M., and E. Frejinger. Route Choice Models with Subpath Components. Presented at 5th Swiss Transport Research Conference, Ascona, Switzerland, 2005.
14. Ortuzar, J. D., and L. G. Willumsen. Modelling Transport, 3rd ed. John Wiley and Sons Ltd., Chichester, United Kingdom, 2001.
15. Stopher, P. Captivity and Choice in Travel Behavior Models. Transportation Journal of ASCE, Vol. 106, 1980, pp. 427–435.
16. Williams, H. C. W. L., and J. D. Ortuzar. Behavioural Theories of Dispersion and Mis-specification of Travel Demand Models. Transportation Research B, Vol. 16, 1982, pp. 167–219.
17. Swait, J., and M. Ben-Akiva. Incorporating Random Constraints in Discrete Models of Choice Set Generation. Transportation Research B, Vol. 21, 1987, pp. 91–102.
18. Swait, J., and M. Ben-Akiva. Empirical Test of a Constrained Choice Discrete Model: Mode Choice in Sao Paulo, Brazil. Transportation Research B, Vol. 21, 1987, pp. 103–115.
19. Basar, G., and C. R. Bhat. A Parameterized Probabilistic Consideration Set Model for Airport Choice: An Application to the San Francisco Bay Area. Transportation Research B, Vol. 38, 2004, pp. 889–904.
20. McFadden, D. Modeling the Choice of Residential Location. In Transportation Research Record 673, TRB, National Research Council, Washington, D.C., 1978, pp. 72–77.
21. Nerella, S., and C. R. Bhat. Numerical Analysis of Effect of Sampling of Alternatives in Discrete Choice Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 1894, Transportation Research Board of the National Academies, Washington, D.C., 2004, pp. 11–19.
22. Hansen, E. Industrial Location Choice in Sao Paulo, Brazil: A Nested Logit Model. Regional Science and Urban Economics, Vol. 17, 1987, pp. 89–108.
23. Friedman, J., D. Gerlowski, and J. Silberman. What Attracts Foreign Multinational Corporations? Evidence from Branch Plant Location in the United States. Journal of Regional Science, Vol. 32, 1992, pp. 403–418.
24. Woodward, D. Location Determinants of Japanese Manufacturing Start-Ups in the United States. Southern Economic Journal, Vol. 58, 1992, pp. 690–708.
25. Ben-Akiva, M., and J. L. Bowman. Integration of an Activity-Based Model System and a Residential Location Model. Urban Studies, Vol. 35, 1998, pp. 1131–1153.
26. Sermons, M. W., and F. S. Koppelman. Representing Differences Between Female and Male Commute Behavior in Residential Location Choice Models. Journal of Transport Geography, Vol. 9, 2001, pp. 101–110.
27. Bhat, C. R., and J. Y. Guo. A Mixed Spatially Correlated Model: Formulation and Application to Residential Choice Modeling. Transportation Research B, Vol. 38, 2004, pp. 147–168.
28. Pozsgay, M. A., and C. R. Bhat. Destination Choice Modeling for Home-Based Recreational Trips: Analysis and Implications for Land Use, Transportation, and Air Quality Planning. In Transportation Research Record: Journal of the Transportation Research Board, No. 1777, TRB, National Academy of Sciences, Washington, D.C., 2001, pp. 47–54.
29. Schlich, R., A. Simma, and K. W. Axhausen. Destination Choice Modeling for Different Leisure Activities. Presented at 2nd Swiss Transport Research Conference, Ascona, Switzerland, 2002.
30. Ben-Akiva, M., D. McFadden, and K. Train. The Demand for Local Telephone Service: A Fully Discrete Model of Residential Calling Patterns and Service Choices. Rand Journal of Economics, Vol. 18, 1987, pp. 109–123.
31. Gilbride, T. J., and G. M. Allenby. A Choice Model with Conjunctive, Disjunctive, and Compensatory Screening Rules. Marketing Science, Vol. 23, 2004, pp. 391–406.
32. Prato, C. G., S. Bekhor, and C. Pronello. Methodology for Exploratory Analysis of Latent Factors Influencing Drivers’ Behavior. In Transportation Research Record: Journal of the Transportation Research Board, No. 1926, Transportation Research Board of the National Academies, Washington, D.C., 2005, pp. 115–125.
33. Prato, C. G. Latent Factors and Route Choice Behaviour. PhD thesis. Turin Polytechnic, Italy, 2005.

The Transportation Demand Forecasting Committee sponsored publication of this paper.