
Statistical and machine learning approach for planning dial-a-ride systems

Nikola Marković (a), Myungseob (Edward) Kim (b), Paul Schonfeld (a)

(a) Department of Civil and Environmental Engineering, University of Maryland, College Park, MD, USA
(b) Parsons Transportation Group, Parsons Corporation, Washington, DC, USA

Abstract

Door-to-door transportation service for the elderly and persons with disabilities is often called dial-a-ride (DAR), and is usually provided by transit agencies through private contractors. Growth in DAR ridership is reported across the United States and this tendency will likely continue due to an aging population. Such trends encourage development of models that can provide decision support in planning new DAR systems or expanding existing ones. Several statistical models were previously developed to predict the required DAR system capacity, given various characteristics of the service region, level-of-service requirements and operator constraints. Our work contributes to this line of research by proposing statistical and machine learning approaches that provide more accurate predictions over a wider range of scenarios. This is accomplished through transformation of variables and application of a generalized linear model and support vector regression. The proposed models are built into an online tool that can help transit planners and policy makers: (a) estimate the capacity and operating cost of a DAR system needed to provide the desired level of service, (b) explore tradeoffs between system costs and levels of service, and (c) compare the cost of providing DAR service with other transportation alternatives (e.g., taxi, conventional transit).

Keywords: dial-a-ride, paratransit, planning, decision support system, statistics, machine learning

1. Introduction

Dial-a-ride operations have been growing rapidly in the United States since the passage of the Americans with Disabilities Act of 1990, which required a complementary door-to-door service for those passengers who are unable to use public transportation (Palmer et al., 2004). In response to this expanding trend in DAR services, considerable research has been devoted towards developing models for helping transit agencies determine the required system capacity for the desired operating conditions.
[Preprint submitted to Transportation Research Part A: Policy and Practice, May 10, 2016]

Attempts to develop such models include analytical (Daganzo, 1978; Chang and Schonfeld, 1991; Aldaihani et al., 2004; Diana
et al., 2006; Kim and Schonfeld, 2013, 2014), simulation (Fu, 2002; Horn, 2002; Shinoda et al., 2004; Quadrifoglio et al., 2008; Häll et al., 2012; Ronald et al., 2013; Shen and Quadrifoglio, 2013; Neven et al., 2015), and statistical models that are calibrated based on simulated data (Fu, 2003; Luo and Schonfeld, 2011; Marković et al., 2013). The idea behind the aforementioned statistical models is to establish a functional relation between the capacity of a DAR system (e.g., fleet size, vehicle-hours, vehicle-kilometers) on one hand, and characteristics of the service region, level-of-service requirements and operator constraints on the other hand. Such a functional relation could be used by practitioners to quickly estimate capacity of a DAR system needed to serve a region. This would allow transportation agencies to roughly predict the cost of introducing a DAR service and compare it with costs of other transportation alternatives (e.g., taxi, conventional transit). A particularly convenient characteristic of statistical models is that their use is computationally inexpensive (i.e., calibrated models yield results/predictions instantaneously). This enables quick sensitivity analyses, which can allow transit planners to study tradeoffs between costs and levels of service. In statistical modeling, it is crucial to (a) consider all the relevant influencing factors and (b) calibrate models on a sufficiently large data set. The largest number of factors influencing DAR system capacity is considered in a recent work of Marković et al. (2013). They consider 11 explanatory variables shown in Table 1, such as service area, demand density, temporal distribution of requests, pickup time windows and maximum ride time for a passenger. To obtain a sufficiently large data set, they simulate roughly 1,500 instances of the DAR problem (Cordeau and Laporte, 2003), given randomly generated values of the 11 explanatory variables. 
They subsequently apply a vehicle routing and scheduling heuristic (Marković et al., 2015) to obtain the DAR system capacity (i.e., fleet, vehicle-hr, vehicle-km) needed to meet the demand in those 1,500 problem instances. The authors focus their statistical analysis on approximately 850 instances in which fleet size ranges between 20 and 90 vehicles, and apply linear regression and artificial neural networks to model the relation between the system capacity and the explanatory variables. It is observed that linear regression models provide very good predictions when compared to real-world scenarios coming from a large DAR service provider in Maryland (Figure 1). The authors report poor performance of the artificial neural networks, which consistently overfit data despite several attempts to avoid this issue (i.e., neural networks would describe random error rather than underlying relations). Also, they make their data publicly available for further analysis.

This paper extends our earlier work (Marković et al., 2013) by proposing statistical and machine learning models capable of providing more accurate predictions over a wider range of scenarios. We consider the same type of variables, but calibrate the new models over all

Table 1: Variables considered in the statistical models for planning DAR systems.

Explanatory variables
  1) service area (km²)
  2) demand density (trips/km²-day)
  3) peak demand CV
  4) peak-hour demand (trips/hr)
  5) pickup time window (min)
  6) maximum route duration (min)
  7) passenger boarding time (min)
  8) average vehicle speed (km/hr)
  9) network circuity factor
  10) vehicle capacity (passengers)
  11) maximum ride time ratio

Response variables
  1) fleet size (vehicles)
  2) vehicle-hr
  3) vehicle-km

Peak demand CV is the coefficient of variation computed for the hourly demand during the peak and two hours before and after the peak; Network circuity factor is the average ratio of the shortest road distance between two points and the length of a straight line connecting the two points; Maximum ride time ratio limits the ride time for a passenger and is defined as a multiple of the direct ride time.


Figure 1: Terminal and pickup/dropoff locations for a daily operation of the Regency Taxi company providing DAR services in the Washington metropolitan area. This real-world instance of the DAR problem is solved with a vehicle routing and scheduling heuristic for different input parameters (i.e., time windows and max route duration) to obtain realistic scenarios, which are used to verify prediction models. (Software by BarYehuda (2014) is used for the visualization.)


1,500 instances in which fleet size ranges from 7 to over 450 vehicles. In this paper we make the following contributions:

1. We propose the Generalized Linear Model (GLM) and Support Vector Regression (SVR) for predicting capacities of DAR systems. We explore different model specifications, link/kernel functions and parameter settings to improve performance of the models while avoiding the problem of overfitting. Superior precision of the proposed models is demonstrated through comparison with earlier methods on both simulated and real-world scenarios. The average observed improvement is 9%.

2. The applicability of the proposed models is substantially increased by calibrating them over a much greater range of variables than in earlier studies. Higher prediction accuracy over a range of variables that differ by an order of magnitude (e.g., 7 vs. 450 vehicles) is partially achieved through a log transform. This transform reduces the positive skew in our data, improves the fit to theoretical distributions, and generally enhances performance of our statistical and machine learning models.

3. We update the online decision support system introduced in our previous paper to include newly developed models which provide greater accuracy in predicting the required DAR system capacity over a much wider range of variables. This free online tool can provide decision support in planning DAR systems, and allow practitioners to use calibrated statistical models with a click of a button.

The rest of the paper is organized as follows. First, we analyze correlation and distribution of response variables, which provides important information for appropriate specification of prediction models. Second, we introduce GLM and SVR models for predicting DAR system capacity. Third, we explore performance of the two models, show that they outperform their predecessors from the literature, and present the online tool. Fourth, we draw conclusions and suggest extensions of this work.

2. Data

We explore correlation between the 11 explanatory and 3 response variables outlined in Table 1. Results shown in Table 2 imply no linear correlation between the three response variables on one hand, and the vehicle capacity and max ride time ratio on the other (i.e., |r| < 0.02, p > 0.54). The former is unsurprising because vehicle capacity is almost never a binding constraint in DAR operations where the average load per loaded vehicle rarely exceeds 2 passengers. Moreover, the pickup time window is insignificant for the vehicle-hr and vehicle-km at the 95% confidence level (i.e., p ≈ 0.07), and so is the passenger boarding time for the vehicle-hr

Table 2: Correlation between explanatory and response variables

                           Fleet size           Vehicle-hr           Vehicle-km
Explanatory variable       r        p-value     r        p-value     r        p-value
service area                0.4933  0.0000       0.5040  0.0000       0.6858  0.0000
demand density             -0.2613  0.0000      -0.2650  0.0000      -0.3668  0.0000
peak-demand CV              0.1548  0.0000       0.1225  0.0000       0.1891  0.0000
peak-hour demand            0.3849  0.0000       0.3224  0.0000       0.4640  0.0000
pickup time window         -0.0645  0.0129      -0.0482  0.0629      -0.0454  0.0798
max route duration         -0.2174  0.0000      -0.0817  0.0016      -0.1200  0.0000
passenger boarding time     0.0529  0.0412       0.0359  0.1659       0.0871  0.0008
average vehicle speed      -0.4766  0.0000      -0.4922  0.0000      -0.1387  0.0000
network circuity factor     0.1344  0.0000       0.1269  0.0000       0.1827  0.0000
vehicle capacity            0.0110  0.6715       0.0095  0.7156       0.0157  0.5455
max ride time ratio        -0.0114  0.6606       0.0072  0.7807      -0.0054  0.8345

(i.e., p ≈ 0.17). Other variables have significant correlation with the response variables. The 14 × 14 correlation matrix is visualized (Komarov, 2013) in Figure 2. We observe no strong correlation between the explanatory variables, which would discourage us from including interaction terms in our regression models. On the other hand, there is a strong positive correlation between response variables, as is expected (e.g., a larger fleet needed to serve the demand would imply more vehicle-km and vehicle-hr). Knowing the distribution of a response variable is important for selecting the appropriate GLM model. Thus, we fit 17 continuous distributions (Sheppard, 2012) to the three response variables, and show in Figure 3 the distributions that fit best in terms of Bayesian information criterion (subfigures on the left). Moreover, we provide quantile-quantile (QQ) plots for the selected distributions from the exponential family that can be used within the GLM framework (subfigures on the right). The QQ plots do not show a satisfactory fit over the entire domain. Since the current study includes instances that differ by an order of magnitude (e.g., 7 vs. 450 vehicles), this outcome is unsurprising. To improve the fit to theoretical distributions, we explore square root and log transforms. We conclude that taking the natural log (ln) of the response variables improves the fit and generally enhances the performance of the models introduced in the following section. Figure 4 indicates that the ln transform reduces the skew and establishes symmetry, while the QQ plots show a much better fit for the selected distributions from the exponential family that can be used in the GLM (i.e., gamma for the ln of the fleet size and vehicle-hr, and normal for the ln of vehicle-km). Consequently, we will treat the ln of the fleet size, vehicle-hr and vehicle-km as our response variables.
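The effect of the ln transform on skew can be illustrated with a small, self-contained sketch. The data below are synthetic lognormal draws standing in for a right-skewed response such as fleet size; they are an assumption made for illustration and are not the simulated DAR instances, and the `skewness` helper is a hypothetical name.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed std. dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


random.seed(0)
# Synthetic, right-skewed stand-in for a response variable (assumption).
fleet = [random.lognormvariate(4.0, 0.6) for _ in range(1500)]
ln_fleet = [math.log(v) for v in fleet]

skew_raw = skewness(fleet)
skew_ln = skewness(ln_fleet)
print(f"skew before ln: {skew_raw:.2f}, after ln: {skew_ln:.2f}")
```

On data of this kind the skew of the transformed sample is close to zero, which is the symmetry the QQ plots are meant to reveal; whether the same holds for a given real data set has to be checked on that data.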


Figure 2: Correlation between variables: yellow and magenta denote positive and negative correlation, respectively. Brighter colors denote greater correlation, while less visible links imply weaker correlation.

[Figure 3 panels, one row per response variable (fleet size, vehicle-hr, vehicle-km). Left: probability density functions, empirical versus best-fitting distributions (fleet size: inverse Gaussian, Birnbaum-Saunders, lognormal; vehicle-hr: inverse Gaussian, GEV, lognormal; vehicle-km: Birnbaum-Saunders, lognormal, inverse Gaussian). Right: QQ plots of the quantiles of the input sample against the quantiles of the inverse Gaussian distribution (fleet size, vehicle-hr) and of the lognormal distribution (vehicle-km).]

Figure 3: From the exponential family of distributions, inverse Gaussian is the best choice for the fleet size and vehicle-hr, and lognormal for the vehicle-km. However, the QQ plots show unsatisfying fit, especially for the first two responses. Note: GEV stands for the generalized extreme value distribution.

[Figure 4 panels, one row per transformed response variable (ln(fleet size), ln(vehicle-hr), ln(vehicle-km)). Left: probability density functions, empirical versus best-fitting distributions (ln(fleet size): GEV, gamma, Nakagami; ln(vehicle-hr): gamma, lognormal, Birnbaum-Saunders; ln(vehicle-km): GEV, normal, Rician). Right: QQ plots of the quantiles of the input sample against the quantiles of the gamma distribution (ln(fleet size), ln(vehicle-hr)) and of the normal distribution (ln(vehicle-km)).]

Figure 4: From the exponential family of distributions, gamma is the best choice for the ln(fleet size) and ln(vehicle-hr), and normal for the ln(vehicle-km). This time the QQ plots show quite satisfying fit. Note: GEV stands for the generalized extreme value distribution.

3. Prediction models

We consider two distinct models for predicting the DAR system capacity: GLM and SVR. The GLM is a statistical model in which parameters are estimated via a maximum likelihood approach. It allows for simple interpretation of results through investigation of parameters and their significance. The SVR, on the other hand, is a machine learning model calibrated to minimize a weighted sum of the training error and a regularization term that controls for the complexity of the model. Both models are nowadays included in various software packages or machine learning libraries, which facilitates their use. However, good performance of these models requires exploration of different model specifications, parameter settings, and manipulation of data. Here we provide an overview of the two models and stress specifications that yield good performance in predicting DAR system capacity while avoiding the issue of overfitting.

3.1. Generalized linear model

Let $y_1, \ldots, y_n$ denote independent observations of a response variable (e.g., ln of vehicle-km). In a standard linear regression (LR) model, we typically assume that $y_i$ represents a realization of a random variable $Y_i \sim N(\mu_i, \sigma^2)$, which follows a normal distribution with mean $\mu_i$ and variance $\sigma^2$. Moreover, we assume that $\mu_i$ is a linear function of $m$ explanatory variables. For all $n$ observations, this relation is given as

\[ \mu = X\beta, \tag{1} \]

where (a) $X$ is an $n \times m$ matrix of $n$ observations and $m$ explanatory variables, and (b) $\beta$ is an $m$-dimensional vector of unknown parameters. The goal in this model is to estimate $\beta$, which can be done via the ordinary least squares method that seeks to minimize the sum of squared residuals. In such a case, $\hat{\beta}$ can be expressed in closed form as

\[ \hat{\beta} = (X^\top X)^{-1} X^\top y, \tag{2} \]

where $y$ is the vector of observations $y_1, \ldots, y_n$. The above LR model was applied to predict DAR system capacity and it performed quite well when calibrated on a data set with a fleet ranging between 20 and 90 vehicles (Marković et al., 2013).

In the current analysis, we consider the GLM framework, which generalizes the LR model in three ways: (a) by introducing a link function which transforms the mean of the distribution to the linear predictor; (b) by allowing for non-constant variance across observations; and (c) by permitting a non-normal distribution of the response variable. In the GLM, we let

\[ g(\mu) = X\beta, \tag{3} \]

while assuming that the random variable $Y_i$ follows any distribution from the exponential family (e.g., normal, gamma, inverse Gaussian, Poisson), and that the link function $g(\cdot)$ is smooth and invertible (e.g., identity, reciprocal, logit). In GLM, as in LR, different model specifications can be tested and their goodness of fit compared. Regressors $X$ can include different prespecified functions of the explanatory variables. In addition to constant and linear terms, the GLM can include interactions and higher order terms. Finally, after the distribution, link function, and $X$ are specified, we estimate the $\beta$ that maximizes the likelihood function

\[ \hat{\beta} = \arg\max_{\beta \in \mathbb{R}^m} L(\beta \mid X, y). \tag{4} \]
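Equations (2) and (4) can be illustrated numerically. The sketch below is not the authors' MATLAB implementation: it assumes NumPy, uses synthetic gamma-distributed data with made-up coefficients, and the function names are hypothetical. It computes the closed-form least squares estimate and then fits a gamma GLM with an identity link by iteratively reweighted least squares, where (for this particular distribution and link) the working response equals $y$ and the working weights are $1/\mu_i^2$.

```python
import numpy as np


def ols(X, y):
    """Eq. (2): closed-form least squares; solve() avoids an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def gamma_glm_identity(X, y, n_iter=50, tol=1e-10):
    """IRLS for a gamma GLM with identity link g(mu) = mu.

    Gamma variance function V(mu) = mu**2 and g'(mu) = 1 give working
    weights 1 / mu**2 and a working response equal to y.
    Assumes the fitted means stay strictly positive.
    """
    beta = ols(X, y)  # least squares estimate as a starting point
    for _ in range(n_iter):
        mu = X @ beta
        XtW = X.T / mu**2  # X^T W with W = diag(1 / mu_i^2)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Synthetic data (assumption): intercept plus two regressors on [0, 1].
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n), rng.uniform(0, 1, n)])
beta_true = np.array([4.0, 0.5, -0.3])
mu_true = X @ beta_true
shape = 2000.0  # large shape parameter means little noise around the mean
y = rng.gamma(shape, mu_true / shape)  # gamma draws with mean mu_true

beta_ols = ols(X, y)
beta_glm = gamma_glm_identity(X, y)
print("OLS :", np.round(beta_ols, 3))
print("GLM :", np.round(beta_glm, 3))
```

With an identity link and low noise the two estimates are close; they differ mainly in how observations are weighted, since the gamma model down-weights observations with larger fitted means.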

The maximum likelihood estimate of $\beta$ is generally unavailable in closed form, but it can be obtained with the iteratively reweighted least squares method (McCullagh and Nelder, 1989), which is available in software packages such as MATLAB.

We apply the GLM to the three response variables separately. The ln of the fleet size and vehicle-hr follow a gamma distribution, while the ln of vehicle-km follows a normal distribution (Figure 4). The model distributions are specified accordingly. We try various link functions and conclude that the identity link yields the best results for all three response variables. (Note that the GLM for the ln of vehicle-km reduces to linear regression because it includes a normal distribution and identity link function.) For the design matrix $X$, we try different model specifications and observe that including interaction and higher order terms causes models to overfit (i.e., provide a very good fit on training data, but a poor fit on a newly presented sample). This issue commonly arises with overly complex models, and it was also reported in Marković et al. (2013) for artificial neural networks. Therefore, we recommend application of the GLM with constant and linear terms only. Such a recommendation is also in accordance with the relatively weak correlation between the explanatory variables which we observed in Figure 2.

3.2. Support vector regression

Support vector machines are supervised learning models used for classification and regression (Drucker et al., 1997). In our problem of predicting DAR system capacity, we consider the $\epsilon$-SVR model, which is described as follows. Let $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ be training data, where $x_i$ and $y_i$ denote input and target data, respectively. The goal in $\epsilon$-SVR is to determine a function $f(x)$ that (a) has at most $\epsilon$ deviation from the target data and (b) is as flat as possible (e.g., its Euclidean norm is as small as possible) (Smola and Schölkopf, 2004).

The specific model applied to predict DAR system capacity is the $\epsilon$-SVR with the linear kernel function

\[ f(x, w) = \sum_{j=1}^{n} w_j \, x^\top x_j, \tag{5} \]

where vectors $x_j$ are inputs from the training data and $w$ is the vector of unknown parameters. These parameters are determined to minimize the functional

\[ \min_{w \in \mathbb{R}^n} \; \frac{1}{2}\|w\|^2 + C \cdot R_{emp}(w), \tag{6} \]

where (a) the training error $R_{emp}$ is defined according to

\[ R_{emp}(w) = \sum_{i=1}^{n} \max\left(|y_i - f(x_i, w)| - \epsilon, \, 0\right), \tag{7} \]

and (b) the penalty parameter $C > 0$ controls the tradeoff between the flatness of $f(\cdot)$ and the amount up to which deviations greater than $\epsilon$ are tolerated. The above optimization problem is tackled with convex programming techniques (Smola and Schölkopf, 2004).

Note that the above model (5)-(7) includes parameters $C$ and $\epsilon$. We wish to determine their values in a way that maximizes the predictive power of our $\epsilon$-SVR while avoiding the issue of overfitting. We achieve this via a combination of cross-validation and grid-search techniques. First, we apply a 5-fold cross-validation where the model is built based on 80% of the randomly selected data points and tested on the remaining 20%. Second, the space of the parameters is divided into an exponential grid and a simple enumeration is performed to select the combination of parameters that results in the model with the highest $R^2$ (i.e., coefficient of determination). After scaling all the data linearly to the [0, 1] interval, the following values of parameters are considered in our applications:

\[ C = 1.1^{a} \left(\max_i y_i - \min_i y_i\right), \quad a \in \{-15, \ldots, 5\}, \tag{8} \]

\[ \epsilon = 2^{b}, \quad b \in \{-10, \ldots, 10\}. \tag{9} \]
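The bookkeeping around this parameter search can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' procedure: it builds the grid of equations (8)-(9), evaluates the $\epsilon$-insensitive training error of equation (7) for given predictions, and produces 5-fold train/test index splits. The model fitting itself is deliberately left out (it would be delegated to an SVR library), and all function names are hypothetical.

```python
import itertools
import random


def epsilon_insensitive_risk(y_true, y_pred, eps):
    """Eq. (7): sum of absolute deviations in excess of eps."""
    return sum(max(abs(t - p) - eps, 0.0) for t, p in zip(y_true, y_pred))


def parameter_grid(y):
    """Eqs. (8)-(9): all (C, eps) pairs on the exponential grid."""
    spread = max(y) - min(y)  # equals 1 once y is scaled to [0, 1]
    c_values = [1.1 ** a * spread for a in range(-15, 6)]
    eps_values = [2.0 ** b for b in range(-10, 11)]
    return list(itertools.product(c_values, eps_values))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and yield (train, test) index lists for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


grid = parameter_grid([0.0, 0.25, 0.5, 1.0])  # toy targets, already in [0, 1]
splits = list(k_fold_indices(1500, k=5))
risk = epsilon_insensitive_risk([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1)
print(len(grid), len(splits), round(risk, 3))
```

A full search would loop over `grid`, fit an $\epsilon$-SVR on each training split, score it on the held-out split with $R^2$, and keep the pair with the best average score.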

Finally, it should be noted that $f(x, w)$ could be specified using kernels other than linear, such as polynomial or radial,

\[ f(x, w) = \sum_{j=1}^{n} w_j \left(\gamma x^\top x_j + r\right)^d, \tag{10} \]

\[ f(x, w) = \sum_{j=1}^{n} w_j \exp\left(-\gamma \|x - x_j\|^2\right), \tag{11} \]

which are very popular due to their robustness. These two functions generalize (5) (Keerthi and Lin, 2003) and should, in theory, provide results at least as good as the linear kernel. However, $\epsilon$-SVR with a polynomial or radial kernel overfits our DAR data (like the artificial neural networks in our previous study). Thus, we recommend the use of the linear kernel in this particular application.

4. Results

We calibrate the two proposed models on 1,500 instances from Marković et al. (2013) and explore their performance on both simulated and real-world data. Then we compare the GLM with the LR model applied in the aforementioned study. Finally, we present the updated online tool which employs the GLM to predict DAR system capacity for different inputs.

4.1. GLM application

The results for the three response variables are provided in Table 3. They include estimates for $\beta$, standard errors, and p-values. The p-values indicate that all the explanatory variables are significant at the 95% level, except vehicle capacity, which is insignificant for all three response variables. This is unsurprising because vehicle capacity is typically a nonbinding constraint in DAR operations. Moreover, we study the signs of $\hat{\beta}$ and observe a positive sign for service area, peak-hour demand, boarding time, and network circuity. This is reasonable, because we expect response variables to increase with these inputs (e.g., greater area would imply more vehicles, vehicle-hr, and vehicle-km to meet the demand). On the other hand, demand density, coefficient of variation, pickup time window, maximum route duration, average vehicle speed, and maximum ride time ratio have negative signs. This is intuitive, because relaxing constraints on route duration or pickup time window gives the operator (or vehicle routing heuristic) the ability to satisfy demand with fewer vehicles, vehicle-hr, or vehicle-km.
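The sign discussion can be made concrete with a worked prediction. The sketch below plugs the ln(fleet size) coefficients printed in Table 3 into the identity-link predictor and reverts the result with the exponential. The scenario values are hypothetical inputs chosen for illustration only (they are not taken from the paper's data, and no check is made that they lie inside the calibration range), and because the printed coefficients are rounded to four decimals the output is only approximate.

```python
import math

# Ln(fleet size) coefficients as printed in Table 3 (rounded to 4 decimals).
BETA_LN_FLEET = {
    "intercept": 4.4648,
    "service_area": 0.0002,            # km^2
    "demand_density": -0.2908,         # trips/km^2-day
    "peak_demand_cv": -0.3535,
    "peak_hour_demand": 0.0184,        # trips/hr
    "pickup_time_window": -0.0042,     # min
    "max_route_duration": -0.0022,     # min
    "passenger_boarding_time": 0.0439, # min
    "average_vehicle_speed": -0.0389,  # km/hr
    "network_circuity_factor": 0.7685,
    "vehicle_capacity": -0.0022,       # passengers
    "max_ride_time_ratio": -0.0980,
}

# Hypothetical scenario: every value below is an illustrative assumption.
scenario = {
    "service_area": 500.0,
    "demand_density": 2.0,
    "peak_demand_cv": 0.5,
    "peak_hour_demand": 100.0,
    "pickup_time_window": 30.0,
    "max_route_duration": 480.0,
    "passenger_boarding_time": 5.0,
    "average_vehicle_speed": 40.0,
    "network_circuity_factor": 1.3,
    "vehicle_capacity": 6.0,
    "max_ride_time_ratio": 2.0,
}


def predict_fleet(x, beta=BETA_LN_FLEET):
    """Identity-link prediction of ln(fleet size), reverted with exp."""
    ln_fleet = beta["intercept"] + sum(beta[name] * value for name, value in x.items())
    return math.exp(ln_fleet)


base = predict_fleet(scenario)
# Relaxing the pickup time window from 30 to 60 minutes (negative coefficient).
wider = predict_fleet({**scenario, "pickup_time_window": 60.0})
print(f"base: {base:.1f} vehicles, wider window: {wider:.1f} vehicles")
```

Because the response is modeled on the ln scale, each coefficient acts multiplicatively on the predicted fleet: widening the window by 30 minutes scales the prediction by exp(-0.0042 * 30), regardless of the other inputs.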
Finally, explanatory variables have the same signs across different response variables, which is in accordance with the strong positive correlation between the response variables shown earlier in Figure 2.

The GLM performs well in terms of $R^2$. We compute the coefficient of determination after reverting the ln of the three predicted responses to their original scales. The $R^2$ takes the values of 0.77 for the fleet size and vehicle-hr, and 0.73 for the vehicle-km. In Figure 5 we visualize performance of the GLM. For each response variable, we plot predicted versus

Table 3: The GLM results for all three (transformed) response variables: parameter estimates $\hat{\beta}$, standard errors (SE), and p-values.

                           Ln(Fleet Size)               Ln(Vehicle-Hr)               Ln(Vehicle-Km)
Term                       β̂        SE      p-value     β̂        SE      p-value     β̂        SE      p-value
(intercept)                 4.4648  0.0913  0.0000       5.4731  0.1016  0.0000       7.6547  0.0974  0.0000
service area                0.0002  0.0000  0.0000       0.0002  0.0000  0.0000       0.0002  0.0000  0.0000
demand density             -0.2908  0.0220  0.0000      -0.3642  0.0246  0.0000      -0.3953  0.0253  0.0000
peak demand CV             -0.3535  0.0372  0.0000      -0.4410  0.0417  0.0000      -0.4347  0.0406  0.0000
peak hour demand            0.0184  0.0004  0.0000       0.0193  0.0004  0.0000       0.0183  0.0004  0.0000
pickup time window         -0.0042  0.0005  0.0000      -0.0024  0.0006  0.0000      -0.0025  0.0005  0.0000
max route duration         -0.0022  0.0001  0.0000      -0.0008  0.0001  0.0000      -0.0008  0.0001  0.0000
passenger boarding time     0.0439  0.0036  0.0000       0.0455  0.0040  0.0000       0.0417  0.0038  0.0000
average vehicle speed      -0.0389  0.0007  0.0000      -0.0470  0.0008  0.0000      -0.0069  0.0008  0.0000
network circuity factor     0.7685  0.0450  0.0000       0.9128  0.0501  0.0000       0.9424  0.0477  0.0000
vehicle capacity           -0.0022  0.0024  0.3536      -0.0021  0.0027  0.4350      -0.0005  0.0026  0.8559
max ride time ratio        -0.0980  0.0148  0.0000      -0.0604  0.0165  0.0003      -0.0513  0.0158  0.0012

actual values (subfigures on the left). We do this for both simulated and real-world data, and observe satisfactory linearity in both cases. Moreover, we plot the relative errors of the predictions (subfigures on the right). These plots indicate a good performance of the GLM, especially for the real-world data, where the maximum errors are roughly 11, 14 and 27% for fleet size, vehicle-hr and vehicle-km, respectively. A more detailed comparison of the GLM-based predictions against the real-world data is shown later in Figure 7, which also includes comparison with the predictions from the LR models introduced in our earlier paper (Marković et al., 2013).

4.2. SVR application

We implement the $\epsilon$-SVR in MATLAB, using LIBSVM (Chang and Lin, 2011), which is a popular support vector machine library. We scale all the data linearly to the [0, 1] interval, which improves performance of the model, and fit the three response variables separately. For each response variable, we perform the described grid search to find the best parameter setting, and then visualize the SVR-based results in Figure 6. For the simulated data, the computed $R^2$ is 0.81 for the fleet size and vehicle-hr, and 0.74 for the vehicle-km. These coefficients of determination are somewhat greater than those of the GLM. On the other hand, the SVR-based predictions show slightly greater relative errors for the real-world scenarios. (Note that errors shown in subfigures on the right moderately exceed those in Figure 5.)
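Two small steps mentioned above, the linear scaling to [0, 1] and the relative error |actual - predicted| / actual plotted in Figures 5 and 6, can be sketched as follows. The helper names and the toy numbers are assumptions introduced for illustration; they are not results from the paper.

```python
def min_max_fit(values):
    """Return the (min, max) pair used to scale a variable."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_unscale(scaled, lo, hi):
    """Invert min_max_scale, e.g. to bring predictions back to original units."""
    return [s * (hi - lo) + lo for s in scaled]


def relative_errors(actual, predicted):
    """Relative residuals |actual - predicted| / actual (actual must be nonzero)."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


# Toy fleet sizes (assumption) spanning the range reported in the paper.
fleet = [7.0, 100.0, 450.0]
lo, hi = min_max_fit(fleet)
scaled = min_max_scale(fleet, lo, hi)
restored = min_max_unscale(scaled, lo, hi)

# Toy actual/predicted pairs (assumption), not values from the paper.
errors = relative_errors([20.0, 50.0, 100.0], [22.0, 45.0, 100.0])
print(scaled, max(errors), sum(errors) / len(errors))
```

In practice the scaling bounds would be taken from the training data and reused unchanged for any new scenario, so that training and prediction share the same mapping.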

[Figure 5 panels, one row per response variable (fleet size, vehicle-hr, vehicle-km), each showing simulated and real-world data. Left: predicted versus actual values. Right: relative residuals, |Actual − Predicted| / Actual, versus actual values. Annotation recovered near the fleet size panel: real-world data, max error 10.81%, mean error 4.03%.]

Figure 5: GLM provides good predictions for all three response variables. Subfigures on the left show satisfactory linearity between the predicted and actual values of the response variables, especially for the real-world data for fleet size and vehicle-hr. Subfigures on the right show acceptable prediction errors, which are less than 14% for the real-world data referring to fleet and vehicle-hr and up to 27% for vehicle-km.

[Figure 6 panels, one row per response variable (fleet size, vehicle-hr, vehicle-km), each showing simulated and real-world data. Left: predicted versus actual values. Right: relative residuals, |Actual − Predicted| / Actual, versus actual values.]

Figure 6: The SVR performs similarly to the GLM. Subfigures on the left again show satisfactory linearity between the predicted and actual values of the response variables, particularly for the real-world data regarding the fleet size and vehicle-hr. This time, subfigures on the right show somewhat greater prediction errors for the real-world data referring to fleet and vehicle-hr.

Table 4: Comparison of the models on simulated scenarios.

$R^2$ (coefficient of determination)
Calibration set    Model        fleet size   vehicle-hr   vehicle-km
1,500 scenarios    SVR (new)    0.81         0.81         0.74
1,500 scenarios    GLM (new)    0.77         0.77         0.73
1,500 scenarios    LR (old)     0.73         0.66         0.79
850 scenarios      SVR (new)    0.75         0.73         0.79
850 scenarios      GLM (new)    0.74         0.72         0.78
850 scenarios      LR (old)     0.75         0.73         0.79

Note: higher $R^2$ implies greater prediction accuracy.

Table 5: The RMSE for the predicted and real-world data.

RMSE (root mean squared error)
Calibration set    Model        fleet size   vehicle-hr   vehicle-km
1,500 scenarios    SVR (new)    3.40         48.55        2299
1,500 scenarios    GLM (new)    2.25         28.75        2294
1,500 scenarios    LR (old)     14.66        82.11        3631
850 scenarios      SVR (new)    2.07         17.94        836
850 scenarios      GLM (new)    1.79         13.24        1558
850 scenarios      LR (old)     2.03         14.76        754

Note: smaller RMSE implies greater prediction accuracy.
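The two accuracy measures in Tables 4 and 5, and the relative gains that follow from the 1,500-scenario rows of Table 4, can be reproduced with a short sketch. The metric functions below use the standard definitions of $R^2$ and RMSE; the function names are hypothetical, and only the six $R^2$ values copied from Table 4 come from the paper.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def revert_ln(ln_values):
    """Bring ln-scale predictions back to the original scale before scoring."""
    return [math.exp(v) for v in ln_values]


# R^2 values from Table 4, 1,500 scenarios: fleet size, vehicle-hr, vehicle-km.
R2_SVR_1500 = (0.81, 0.81, 0.74)
R2_LR_1500 = (0.73, 0.66, 0.79)

gains = [svr / lr - 1.0 for svr, lr in zip(R2_SVR_1500, R2_LR_1500)]
mean_gain = sum(gains) / len(gains)
print([round(100 * g) for g in gains], round(100 * mean_gain))  # prints [11, 23, -6] 9
```

When responses are modeled on the ln scale, predictions would first pass through `revert_ln` so that `r_squared` and `rmse` are evaluated in the original units.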

4.3. Model comparison

We compare the performance of the GLM and SVR with the LR model introduced in our previous study (Marković et al., 2013). For this comparison, we calibrate the models on two data sets: (a) all 1,500 scenarios, and (b) a subset of 850 instances considered in the previous study. To calibrate the GLM and SVR on the latter data set, we repeat the procedure described in Sections 2 and 3 (i.e., we explore the square root and ln transforms to improve distribution fit, try different link functions, and perform a grid search to optimize the parameter settings). Again, for the sake of a fair comparison of the models, the R2 is computed after reverting the predictions of transformed responses to their original scales.

Table 4 shows the R2 for the three models, which indicates that the SVR provides results as good as the old LR model when both are calibrated over 850 scenarios. On the other hand, when all 1,500 scenarios are considered, the R2 of the SVR is 11% and 23% higher than that of the LR for fleet size and vehicle-hr, and 6% lower for vehicle-km. This implies an average improvement of 9% over the old LR model. It is interesting to note that even though the ln transform improves the distribution fit for the vehicle-km (Figure 4), both SVR and GLM provide marginally higher R2 when no transform is applied to vehicle-km (i.e., R2 ≈ 0.77 rather than 0.74 and 0.73).

In Figure 7 we compare predictions of the SVR, GLM and old LR model with the real-world scenarios that were obtained by solving an instance of the dial-a-ride problem (see Figure 1 for spatial distribution of requests). Subfigures on the left show the comparison for the case when all models are calibrated on all 1,500 scenarios. In this case, the newly proposed SVR and GLM provide substantially better predictions than the old LR model. On the other hand, subfigures on the right show the comparison when all models are calibrated on a subset of 850 scenarios. In this case, the GLM outperforms the LR model in predicting fleet size and vehicle-hr, while the SVR provides competitive results in forecasting vehicle-km. In Table 5 we provide the root mean squared errors (RMSE), which reinforce the conclusions drawn from Figure 7. Here, it is also worth noting that all the models show lower prediction accuracy for the vehicle-km variable. Various efforts were made to improve their performance (e.g., exploring more complex model specifications and kernel functions), but the resulting models would overfit. Thus, we believe that our current models have captured most of the underlying relations and that the rest is noise (which seems to be greater for vehicle-km than for the other two variables).

Results reported in Table 4 and Figure 7 indicate that the approach presented in this paper substantially enhances the accuracy of predictions. Since the current approach improves upon Marković et al. (2013), it also outperforms the models by Fu (2003) and Luo and Schonfeld (2011), who consider a subset of the explanatory variables included in our models (six out of eleven). Moreover, the applicability of the current models is significantly increased by calibrating them over a wider range of variables. For example, the three aforementioned studies consider scenarios with up to 90, 140, and 120 vehicles, respectively, while our models are calibrated on scenarios with up to 470 vehicles.
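The relative R2 differences quoted in this section can be reproduced from the 1,500-scenario entries of Table 4. The short Python sketch below is only an arithmetic check; the dictionary layout and the helper name `relative_change_pct` are our own illustrative choices.

```python
# R2 values for the 1,500-scenario calibration, copied from Table 4.
r2_svr = {"fleet size": 0.81, "vehicle-hr": 0.81, "vehicle-km": 0.74}
r2_lr = {"fleet size": 0.73, "vehicle-hr": 0.66, "vehicle-km": 0.79}


def relative_change_pct(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Percentage difference in R2 between the SVR and the old LR model.
changes = {
    name: relative_change_pct(r2_svr[name], r2_lr[name]) for name in r2_svr
}
rounded = {name: round(value) for name, value in changes.items()}

# Average of the three rounded percentages.
average = sum(rounded.values()) / len(rounded)

for name, value in rounded.items():
    print(f"{name}: {value:+d}%")
print(f"average: {average:.1f}%")
```

Note that the average is driven by fleet size and vehicle-hr; for vehicle-km the old LR model retains the higher R2 on this data set.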
To enable a simple use of our enhanced models, the proposed GLM is built into an online tool, which is presented next, while the trained SVR is made available for download from the web page hosting the tool.

4.4. Online tool

We update the online decision support tool introduced in our earlier study to incorporate the GLM, which provides greater accuracy in predicting the required DAR system capacity over a much wider range of variables. Its simple interface is shown in Figure 8. This free online tool can provide decision support in planning DAR systems, and allows practitioners to use calibrated statistical models with the click of a button. We believe that our tool could be very helpful when trying to estimate the cost of introducing DAR service in a region. All practitioners need to do is fill in the values for the eleven explanatory variables in accordance with the region of their interest and desired operating conditions, click the “calculate” button, and obtain the fleet size, vehicle-hr, and vehicle-km needed to provide the desired level of service. Once these measures are estimated, the daily cost of operating the

[Figure 7 panels, two columns: "Real-World vs. Predictions: Fleet Size", "Real-World vs. Predictions: Vehicle-Hr", and "Real-World vs. Predictions: Vehicle-Km". Horizontal axis: Test Scenario. Vertical axes: Vehicle, Vehicle-Hr, and Vehicle-Km (x 10^4). Series: Real-World, LR (old), GLM (new), SVR (new).]

Figure 7: Predictions vs. real-world observations. In subfigures on the left, models are calibrated on roughly 1,500 scenarios in which fleet size ranges from 7 to over 450 vehicles. In subfigures on the right, models are calibrated on a subset of some 850 scenarios in which fleet size ranges from 20 to 90 vehicles. New models substantially outperform the LR when calibrated over a wider range of variables, while they provide at least as good results when calibrated over a limited subset of data.


Figure 8: Free online decision support system for predicting required DAR system capacity, hosted at http://www.planning-dial-a-ride-services.com. It should be accessed with Internet Explorer.

DAR system can be computed by simply multiplying the response variables by the appropriate unit costs and adding them up. The average cost per trip can be readily computed as well.

Application of a calibrated GLM is computationally inexpensive and the online tool returns the estimated response variables instantaneously. This allows for a quick what-if analysis that enables transit planners to explore the tradeoffs between costs and levels of service. For example, they could estimate the cost of providing different time windows or ride time ratios. At the same time, practitioners can compare the cost of providing dial-a-ride service with other transportation alternatives, such as taxi or conventional transit. The online tool can also serve for a simple comparison of our results with those from other models for DAR system design. To facilitate future research and extensions, we make all the models and data available for download from the web page hosting the tool.

As argued above, we believe that the proposed approach and online tool would be useful in making the first estimate of the system capacity when introducing a new DAR service to an area. However, for the analysis of existing systems, we think that the simulation-based approach (Fu, 2002; Quadrifoglio et al., 2008; Häll et al., 2012; Ronald et al., 2013; Shen and Quadrifoglio, 2013) would be preferable. In these cases, the operators would already have detailed information about the system (i.e., historical data for the origins/destinations and pickup/dropoff times), which would enable them to perform simulation analysis and answer various what-if questions, such as how many vehicles would be needed if the local transit agency imposed different levels of service (e.g., tighter time windows or smaller ride time ratios) or if the demand patterns changed (e.g., the distribution of requests became more uniform over time). For existing systems, such questions could be answered with greater accuracy via simulation than with our statistical approach.

Finally, the decision makers using the online tool should be aware that predictions from the GLM are not 100% accurate (as is the case with any other statistical model). Thus, they may consider correcting these predictions based on their intuition or knowledge about a specific area. This is particularly important for the vehicle-km, because the GLM overestimates this variable for the real-world scenarios by 13.45% on average (see Figures 5 and 7). It should be noted, though, that the smallest prediction errors were observed for the fleet size (maximum error of 10.81% and average error of 4.03%), which is the most important response variable as it determines the capital investment for opening a DAR system. Planners using the tool should account for these errors and choose a DAR system capacity that balances robustness and cost-effectiveness.

5. Conclusions

This paper proposes statistical and machine learning models capable of predicting DAR system capacity better than techniques described in the literature.
Greater precision of the newly proposed models is demonstrated through direct comparison with earlier approaches on both simulated and real-world data. An average improvement of 9% is observed. Moreover, the applicability of the new models is substantially increased by calibrating them over a range of variables that is far greater than in earlier studies. Higher precision despite larger ranges of variables is partially achieved through natural log transforms. To make our research contributions practice-ready, the GLM is implemented in an online tool that can provide free decision support in planning new DAR systems or adjusting the size of existing ones. For example, an agency interested in introducing a DAR service in a region can use the online tool to estimate the capacity of a system needed to provide the desired level of service. The tool would also enable the agency to (a) explore tradeoffs between system costs and levels of service, and (b) compare the cost of providing DAR service with other transportation alternatives (e.g., taxi or conventional transit).
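The cost calculation described in Section 4.4 (multiplying each predicted response variable by a unit cost and summing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `UnitCosts`, its field names, and every numeric value below are hypothetical and would have to be replaced by an agency's own figures and by the predictions returned by the tool.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    """Hypothetical unit costs; the values are chosen by the planner."""
    per_vehicle_day: float  # daily cost of keeping one vehicle in the fleet
    per_vehicle_hr: float   # cost per vehicle-hour of operation
    per_vehicle_km: float   # cost per vehicle-kilometer traveled


def daily_cost(fleet_size, vehicle_hr, vehicle_km, costs):
    """Sum of the three response variables weighted by their unit costs."""
    return (
        fleet_size * costs.per_vehicle_day
        + vehicle_hr * costs.per_vehicle_hr
        + vehicle_km * costs.per_vehicle_km
    )


def cost_per_trip(total_daily_cost, daily_trips):
    """Average cost per trip for a given number of daily requests."""
    return total_daily_cost / daily_trips


# Hypothetical predictions for one scenario and hypothetical unit costs.
costs = UnitCosts(per_vehicle_day=40.0, per_vehicle_hr=25.0, per_vehicle_km=0.5)
total = daily_cost(fleet_size=50, vehicle_hr=450.0, vehicle_km=17000.0, costs=costs)

print("daily cost:", total)
print("cost per trip:", cost_per_trip(total, daily_trips=1000))
```

Repeating the calculation with predictions obtained for different time windows or ride time ratios gives the kind of what-if comparison between cost and level of service discussed above.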

All the models and data are made publicly available to facilitate further analysis. Future work may include collection of more real-world data from DAR companies for the sake of additional verification. Current models are verified through comparison with the real-world instances in which fleet size ranges from roughly 30 to 70 vehicles. It would be interesting to compare predictions of these models against real-world systems including fleet sizes outside this range. The online tool presented here should facilitate further verification of the proposed models as well as their comparison with other methods.

References

Aldaihani, M. M., Quadrifoglio, L., Dessouky, M. M. and Hall, R. (2004), ‘Network design for a grid hybrid transit service’, Transportation Research Part A: Policy and Practice 38(7), 511–530.

Bar-Yehuda, Z. (2014), ‘Plot google map’. MATLAB Central File Exchange, Retrieved 01/02/2015. URL: http://www.mathworks.com/matlabcentral/fileexchange/27627-zoharby-plot-googlemap

Chang, C. C. and Lin, C. J. (2011), ‘LIBSVM: a library for support vector machines’, ACM Transactions on Intelligent Systems and Technology (TIST) 2(3), 27.

Chang, S. K. and Schonfeld, P. M. (1991), ‘Optimization models for comparing conventional and subscription bus feeder services’, Transportation Science 25(4), 281–298.

Cordeau, J.-F. and Laporte, G. (2003), ‘The dial-a-ride problem (DARP): Variants, modeling issues and algorithms’, Quarterly Journal of the Belgian, French and Italian Operations Research Societies 1(2), 89–101.

Daganzo, C. F. (1978), ‘An approximate analytic model of many-to-many demand responsive transportation systems’, Transportation Research 12(5), 325–333.

Diana, M., Dessouky, M. M. and Xia, N. (2006), ‘A model for the fleet sizing of demand responsive transportation services with time windows’, Transportation Research Part B: Methodological 40(8), 651–666.

Drucker, H., Burges, C. J., Kaufman, L., Smola, A. and Vapnik, V. (1997), ‘Support vector regression machines’, Advances in Neural Information Processing Systems 9, 155–161.

Fu, L. (2002), ‘A simulation model for evaluating advanced dial-a-ride paratransit systems’, Transportation Research Part A: Policy and Practice 36(4), 291–307.

Fu, L. (2003), ‘Analytical model for paratransit capacity and quality-of-service analysis’, Transportation Research Record: Journal of the Transportation Research Board 1841(1), 81–89.

Häll, C. H., Högberg, M. and Lundgren, J. T. (2012), ‘A modeling system for simulation of dial-a-ride services’, Public Transport 4(1), 17–37.

Horn, M. E. T. (2002), ‘Multi-modal and demand-responsive passenger transport systems: a modelling framework with embedded control systems’, Transportation Research Part A: Policy and Practice 36(2), 167–188.

Keerthi, S. S. and Lin, C.-J. (2003), ‘Asymptotic behaviors of support vector machines with Gaussian kernel’, Neural Computation 15(7), 1667–1689.

Kim, M. E. and Schonfeld, P. (2013), ‘Integrating bus services with mixed fleets’, Transportation Research Part B: Methodological 55, 227–244.

Kim, M. E. and Schonfeld, P. (2014), ‘Integration of conventional and flexible bus services with timed transfers’, Transportation Research Part B: Methodological 68, 76–97.

Komarov, O. (2013), ‘Schemaball’. MATLAB Central File Exchange, Retrieved 01/02/2015. URL: http://www.mathworks.com/matlabcentral/fileexchange/42279-schemaball

Luo, Y. and Schonfeld, P. (2011), Performance metamodels for dial-a-ride services with time constraints, in ‘Transportation Research Board 90th Annual Meeting’, number 11-3144.

Marković, N., Milinković, S., Schonfeld, P. and Drobnjak, Z. (2013), ‘Planning dial-a-ride services: Statistical and meta-modeling approach’, Transportation Research Record: Journal of the Transportation Research Board (2352), 120–127.

Marković, N., Nair, R., Schonfeld, P., Miller-Hooks, E. and Mohebbi, M. (2015), ‘Optimizing dial-a-ride services in Maryland: Benefits of computerized routing and scheduling’, Transportation Research Part C: Emerging Technologies 55, 156–165.

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models, Chapman and Hall, London.

Neven, A., Braekers, K., Declercq, K., Wets, G., Janssens, D. and Bellemans, T. (2015), ‘Assessing the impact of different policy decisions on the resource requirements of a demand responsive transport system for persons with disabilities’, Transport Policy 44, 48–57.

Palmer, K., Dessouky, M. and Abdelmaguid, T. (2004), ‘Impacts of management practices and advanced technologies on demand responsive transit systems’, Transportation Research Part A: Policy and Practice 38(7), 495–509.

Quadrifoglio, L., Dessouky, M. M. and Ordonez, F. (2008), ‘A simulation study of demand responsive transit system design’, Transportation Research Part A: Policy and Practice 42(4), 718–737.

Ronald, N., Thompson, R., Haasz, J. and Winter, S. (2013), Determining the viability of a demand-responsive transport system under varying demand scenarios, in ‘Proceedings of the Sixth ACM SIGSPATIAL International Workshop on Computational Transportation Science’, ACM, p. 7.

Shen, C. W. and Quadrifoglio, L. (2013), ‘Evaluating centralized versus decentralized zoning strategies for metropolitan ADA paratransit services’, Journal of Transportation Engineering 139(5), 524–532.

Sheppard, M. (2012), ‘Allfitdist’. MATLAB Central File Exchange, Retrieved 01/02/2015. URL: http://www.mathworks.com/matlabcentral/fileexchange/34943-fit-all-validparametric-probability-distributions-to-data

Shinoda, K., Noda, I., Ohta, M., Kumada, Y. and Nakashima, H. (2004), Is dial-a-ride bus reasonable in large scale towns? Evaluation of usability of dial-a-ride systems by simulation, in ‘Multi-agent for mass user support’, Springer, pp. 105–119.

Smola, A. J. and Schölkopf, B. (2004), ‘A tutorial on support vector regression’, Statistics and Computing 14(3), 199–222.