Reference policies for non-myopic sequential network design and timing problems

Joseph Y.J. Chow*
Department of Civil & Urban Engineering, New York University, New York, NY, USA
*Corresponding author: [email protected]

Hamid R. Sayarshad
Department of Civil Engineering, Ryerson University, Toronto, ON, Canada

Abstract

Despite a growing number of studies in stochastic dynamic network optimization, the field remains less well defined and unified than other areas of network optimization. Due to the need for approximation methods like approximate dynamic programming, one of the most significant problems yet to be solved is the lack of adequate benchmarks. The values of the perfect information policy and static policy are not sensitive to information propagation, while the myopic policy does not distinguish network effects in the value of flexibility. We propose a scalable reference policy value defined from theoretically consistent real option values based on sampled sequences, and estimate it using extreme value distributions. The reference policy is evaluated on an existing network instance with known sequences (Sioux Falls network from Chow and Regan, 2011a): the Weibull distribution demonstrates good fit and sampling consistency with more than 200 samples. The reference policy is further applied in computational experiments with two other types of adaptive network design: a facility location and timing problem on the Simchi-Levi and Berman (1988) network, and Hyytiä et al.’s (2012) dynamic dial-a-ride problem. The former experiment represents an application of a new problem class and use of the reference policy as an upper bound for evaluating sampled policies, which can reach a 3% gap with 350 samples. The latter experiment demonstrates that sensitivity to parameters may be greater than expected, particularly when benchmarked against the proposed reference policy.

Keywords: Approximate dynamic programming, sequential network design problems, dynamic dial-a-ride problem, facility location problem, adapted stochastic process, Markov decision process

To appear in Springer’s Networks and Spatial Economics journal

1. Problem background

1.1. Introduction to sequential network design and timing problems

While dynamic/real-time/online/sequential network models have existed for many decades, the availability of real-time “Big” data in recent years has driven increasing interest in these models in the “smart cities” context. A range of models and algorithms have been developed that use such data to improve network design and timing decisions under uncertainty with look-ahead policies and rolling horizons, including dynamic vehicle routing problems (e.g. Spivey and Powell, 2004; Mitrović-Minić et al., 2004; Thomas and White, 2004; Ichoua et al., 2006), dynamic pricing and routing (Figliozzi et al., 2007; Sayarshad and Chow, 2015), and adaptive network design (Chow and Regan, 2011a), among others. These advances come at a time when “smart cities” solutions are needed more than ever before: both urbanization and climate change threaten to increase uncertainty and its effects on society.

Contrary to static network design problems (which, for the purpose of this study, encompass the broad class of network models surveyed by Magnanti and Wong, 1984: road equilibrium network design problems, facility location problems, vehicle routing problems, capacitated multicommodity flow problems, etc.), sequential network design models under uncertainty feature four notable differences (among others). The problem is conceptually illustrated in Figure 1.

Figure 1. Illustration of sequential network design and timing problem (for a single link). (The figure spans three time steps, t = 0, 1, 2, and annotates that built links have an effect on the performance of other links, that a subset of variables is stochastic and may evolve over time, and that the repeated “design now or later?” decisions become state-dependent “policies”.)

First, decisions can be made over two or more stages, or even continuously over time (e.g. optimal control problems). Earlier decisions are made without yet knowing the decisions to be made in future stages. This distinction itself sets sequential network design apart from many other stochastic network design models in the literature, because those typically assume only a two-stage problem (e.g. stochastic plus recourse, see Zhang et al., 2013; see also examples included in a recent survey by Chen et al., 2011, or in Szeto et al., 2013).

Second, uncertainty is characterized by adapted stochastic processes that define the state of a system, and such information is revealed over time. Time-independent random variables are special cases of these processes. As a consequence, sequential network design models can make use of historical information, and some can be designed to “look ahead” and anticipate future states that may occur. The “look ahead” variety is referred to as a non-myopic model. Two-stage stochastic models are near-myopic, and do not adequately exploit the value of historical information to anticipate future uncertainty.

Third, the presence of states means that a model output is not a deterministic set of actions or decision variables, but is instead a rule or function that selects decision variables based on a state. We call this function a “policy”. For example, one sequential design policy may be “to invest in link A if stochastic volume on that link exceeds 1500 vph”, while another may be “to invest if link volume exceeds 1400 vph”. The optimal policy is the optimal rule under the dynamic setting; in the example, the solution is to find the set of states in which it is best to invest in link A (versus other links), and the other states in which it would be better to defer or reject (if at the final stage of a finite decision horizon) the investment.

Fourth, because uncertainty and decision-making are dynamic, the full set of actions available to a decision-maker is not simply a question of {yes, no} (e.g. build this link or not) but is instead a question of {now, later/never}. The timing component assumes that relevant information for the decision has yet to be revealed, and it may be better to defer the decision until a later time (up to a finite horizon, if such exists, at which point further deferral implies rejection).

For detailed discussion and unified treatment of sequential network design policies, see Powell (2011) and Powell et al. (2012), and for such policies with more explicit consideration of both design and timing decisions in a network, see Chow and Regan (2011a) and Chow et al. (2011). This research is applicable to a diverse set of data-driven network design problems in a wide variety of sectors including transportation, energy, telecommunications, spatial economics, and media.

1.2. Motivation

Despite a growing number of studies in stochastic dynamic network optimization, the field remains less well defined and unified than other areas of stochastic network optimization (Powell et al., 2012). Unlike clearly defined mathematical programming methods, sequential decision-making features additional dimensions of complexity, as discussed above, that result in an intractable exact problem (Powell, 2011; Chow and Regan, 2011a). As a result, algorithms for policies are all based on approximations of some form. Due to the need for approximation methods like approximate dynamic programming (ADP), one of the most significant problems yet to be solved is the lack of adequate benchmarks. Proposed policies are typically compared to another policy that serves as a benchmark, which we call a reference policy.
Three reference policies are most often used: a static or a priori policy that does not change decisions during the stages; a myopic policy that is dynamic but does not use the information to anticipate future states; and a “perfect information” or a posteriori policy where the stochastic dynamic problem is solved in hindsight as a deterministic problem. The last policy is used in both analytical and computational evaluations; Berbeglia et al. (2010) described the use of this bound as part of analytical competitive ratio analysis (Karp, 1992), but it is restricted to finding worst-case bounds for highly idealized cases.

Reliance on these benchmarks is evident from Table 1, which illustrates the types of reference policies used in studies of non-myopic dial-a-ride problems (DARP). Each new study requires some form of approximation, and furthermore, the benchmarks used to evaluate the proposed policies and algorithms rarely include another algorithm or state-of-the-art policy. The sole reliance on these benchmarks persists in other types of network design problems as well; Chow et al. (2011) use static and myopic policies to compare against their non-myopic adaptive network design and timing policy.

Table 1. Illustration of inadequacy in benchmark policies in non-myopic DARP literature

| Studies | Approximation feature(s) | Reference policies used |
|---|---|---|
| Secomandi (2001) | Cyclic rollout policy for value | Static policy |
| Thomas and White (2004) | One-stage look-ahead | Myopic policy |
| Mitrović-Minić et al. (2004) | Double horizon insertion heuristic | Single horizon heuristic; perfect information |
| Spivey and Powell (2004) | Gradient approximations of value | Myopic policy |
| Ichoua et al. (2006) | Dummy customers for state; tabu search heuristic | Another algorithm |
| Thomas (2007) | Five insertion heuristic policies | Each other |
| Novoa and Storer (2009) | Cyclic rollout policy for value; Monte Carlo simulation | Variations of rollout policy; perfect information |
| Cortés et al. (2009) | Two-stage look-ahead; particle swarm optimization | Explicit enumeration |
| Hyytiä et al. (2012) | Multi-server queue for value | Myopic policy |
| Sayarshad and Chow (2015) | Multi-server queue for pricing | Myopic pricing policy |

The three common reference policies are inadequate on their own. Let us illustrate this point. Consider the distribution generated by a random walk that starts at 𝑥0 and either moves left or right by ∆𝑥 with 50% probability. After 𝑡 steps, a distribution for 𝑥(𝑡) can be determined. However, it is also possible to achieve the exact same distribution with a different 𝑥0 and different left/right probabilities. A static policy only sees the single distribution at a point in time, so it would not be able to distinguish whether the distribution came from the former or the latter information process. At the other extreme, the perfect information policy sees a single realization, which for three time steps may look like Left-Left-Right. Either of the information processes could have led to this exact same outcome. As such, the values of the static policy or perfect information policy are not dependent on information propagation (see Definition 1), so two networks with different stochastic processes but the same realized outcome would look identical to these benchmarks.

Definition 1. A sequential network design reference policy is information sensitive if the value of the policy, Φ, is dependent on adapted stochastic process(es) in which information at time 𝑡 includes realizations of the process(es) from all times 𝑠, where 𝑠 ≤ 𝑡.

Myopic policies capture the randomness, i.e. they are sensitive to the historical and current state of the network, but they do not look ahead. Chow and Regan (2011b) showed that the value of flexibility in timing a network design when looking ahead can be separated into two parts: the value from timing, and the value from network effects. Since myopic policies by definition do not look ahead, the future value due to network effects cannot be distinguished. As a result, when comparing two myopic policies on two different networks, it is not possible to distinguish the portions of the value that are due to network effects from those due to timing and information propagation.

For example, consider comparing two different instances with different myopic policy values 𝑃 and 𝑄. Suppose 𝑃 > 𝑄 (where a higher value is desired). Is it because the network with 𝑃 is more conducive to the policy? Or is it because of the stochasticity of the parameters? It is important to distinguish this for network design purposes because we do not wish to falsely conclude that a certain design is good and transferable when its value is due primarily to a particular information state for that instance. Because we cannot break out the value due to network effects, only comparisons on the same instance make sense for myopic policies. Definition 2 clarifies this condition for network effect measurability in this context.

Definition 2. A sequential network design reference policy is network effect measurable if the value of the benchmark, Φ, exhibits value from flexibility in which network effects can be distinguished.

Benchmarks that are not sensitive to information propagation or network structure lead to findings that may be too localized to a specific instance, even if the comparison were made using the same test case. For example, two policies A and B that are compared on a network using the same simulated sample paths may show that policy A operates better than a static reference policy by 20% while policy B compared to the same static reference policy may improve by only 10%. However, if the uncertainty parameters were changed, policy A may only be better by 0.5% while policy B is better by 0.3%. One might conclude that A is better than B, but by how much? If policy A and policy B were instead compared to a third reference policy C that was also sensitive to network structure and information, then even as parameters were altered, there would be more consistency to the comparison for generalizing the findings. A well-designed reference policy is one that can be used to benchmark existing algorithms on network instances such that conclusions can be generalized to variations of those instances or to other networks. For example, we can show the effect that changing the demand rate or available fleet size has on algorithm performance with respect to a reference policy.

1.3. Proposed contributions

We propose a reference policy that is both information-sensitive and network effect-measurable in this study. The reference policy is derived from an algorithm by Chow and Regan (2011a) that approximates a policy that is consistently defined relative to network effect value, but requires explicit enumeration of sequences. The following contributions are made:
1) A method is proposed to obtain an extreme value distribution of the policy value from Chow and Regan (2011a) without explicit sequence enumeration;
2) The fitness of the extreme value distributions is evaluated, finding that the Weibull distribution is a much better fit than the Gumbel distribution for the Sioux Falls example from Chow and Regan (2011a) and the dynamic DARP example from Hyytiä et al. (2012);
3) Sample sizes are evaluated for the Sioux Falls example, and the Weibull-based estimator of the policy value is found to exhibit consistency;
4) A non-myopic adaptive facility location problem is proposed and the reference policy is demonstrated as an upper bound for a policy drawn from sampled sequences;
5) An experiment is conducted in a dynamic DARP environment to replicate the algorithm proposed by Hyytiä et al. (2012) and to evaluate the sensitivity of the proposed reference policy across different parameter instances of a base network. Their policy is shown to offer less significant (even negligible) improvement over a myopic algorithm and to be more sensitive to network parameters when compared against the reference policy.

2. Overview of an existing policy and research gaps

The policy obtained with Chow and Regan’s (2011a) algorithm for the adaptive network design and timing problem (ANDTP) is shown in this section to exhibit both information-sensitivity and network effect-measurability. For convenience, we refer to that policy as the CR policy throughout the rest of this study. We review the definition of that policy and the algorithm used to approximate it.

2.1. Why the CR policy has network effect measurability and information sensitivity

A non-myopic dynamic optimization problem is solved either as an optimal control differential equation (for continuous time decision problems) or a Markov decision process (for discrete time decision problems) of the form in Eq (1) (shown in discrete time form), called the Bellman equation.

$V_t(S_t) = \max_{x_t}\left(C_t(S_t, x_t) + \gamma E[V_{t+1}(S_{t+1}) \mid S_t, x_t]\right)$   (1)

where 𝑉𝑡(𝑆𝑡) is the value of the policy being maximized, 𝐶𝑡 is the immediate payoff at time step 𝑡 of the decision/action 𝑥𝑡 under state 𝑆𝑡 (which is also typically driven by information on exogenous stochastic variables), and 𝛾 is a discount factor. The challenge is in determining an appropriate value for the last term, 𝐸[𝑉𝑡+1(𝑆𝑡+1)|𝑆𝑡]. The term depends on the future state 𝑆𝑡+1, but 𝑆𝑡+1 is conditional on 𝑆𝑡 and 𝑥𝑡. As the dimensionality of either of these variables increases, and as the number of discrete time steps increases, the problem quickly becomes intractable to solve: this is the curse of dimensionality. In the networked variant with explicit timing decisions (e.g. “now versus later” up to a finite horizon, instead of “yes versus no”), 𝑉𝑡,𝑖 is the policy value at time step 𝑡 for project 𝑖 ≤ 𝐾, where 𝐾 is the number of projects that can be decided upon and 𝑖 is an index of the projects. As a generalization, a project may refer to a single link, a single node, or a set of some combination of the two. As a result, the value depends on both the past (through network effects) and the future (through the expected value function), so that no exact solution method is possible. The networked variant is shown by Chow and Regan (2011a) to be altogether unsolvable because the 𝑉𝑡,𝑖 for project 𝑖 is dependent on 𝑉𝑡−1,𝑗 for project 𝑗. The intractable problem is shown in Eq (2). In other words, the decision of whether to implement decision 𝑥𝑡,𝑖 at time step 𝑡 for project 𝑖 ≤ 𝐾 depends on the value of the policies pertaining to the other projects 𝑖′ ≠ 𝑖, 𝑥𝑡,𝑖′, as well as on the expected future policy value of project 𝑖, 𝐸[𝑉𝑡+1,𝑖(𝑆𝑡+1)], and its choice will affect the policy values of other projects as well.
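Before turning to the networked variant in Eq (2), the single-project timing flavor of Eq (1) can be made concrete with a small backward-induction sketch. The payoff states, probabilities, and discount factor below are illustrative assumptions, not values from the paper:

```python
# Toy "invest now or defer" problem solved by backward induction on Eq (1).
# States index discretized payoff levels that move up or down like a random walk;
# investing collects the current payoff, deferring earns the discounted
# expected continuation value.
gamma, T = 0.95, 5                      # discount factor and finite horizon (assumed)
payoffs = [80, 90, 100, 110, 120]       # discretized payoff states (assumed)
p_up = 0.5                              # symmetric transition probability

# Terminal values: at t = T further deferral means rejection, so V_T = max(payoff, 0)
V = [max(p, 0.0) for p in payoffs]
for t in reversed(range(T)):
    V_new = []
    for s, p in enumerate(payoffs):
        s_up = min(s + 1, len(payoffs) - 1)
        s_dn = max(s - 1, 0)
        defer = gamma * (p_up * V[s_up] + (1 - p_up) * V[s_dn])
        V_new.append(max(p, defer))     # Bellman recursion: invest now vs. defer
    V = V_new

print(V)  # state-dependent policy values at t = 0
```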


$V_{t,i}(S_t, x_{t,i'}) = \max_{x_{t,i}} \Big( C_t(x_{t,i}; S_t, x_{t,i'}) + \big(V_{t,1}(x_{t,i}) + \cdots + V_{t,i-1}(x_{t,i}) + V_{t,i+1}(x_{t,i}) + \cdots + V_{t,K}(x_{t,i})\big) + \rho E[V_{t+1,i}(S_{t+1}) \mid S_t, x_{t,i}, x_{t,i'}] \Big)$   (2)

While this problem is not solvable, approximations can be made. Chow and Regan (2011a) proposed a policy approximation that keeps the value of network effects and timing. Their argument is that for network design and timing problems where the demand is not dependent on decisions and decisions are irreversible, a vector of completed decisions can be represented equivalently as any permutation of sequences. For example, suppose projects 1, 2, and 3 were completed prior to time 𝑡. Then at time 𝑡, the value of the 3 projects together is the same regardless of whether they were completed as {1,2,3}, {1,3,2}, or any other permutation of the three.

Let Φt(𝑆𝑡) = ∑𝑖 𝑉𝑡,𝑖 be the policy value of all the project policies for state 𝑆𝑡, let ℎ∗ be an optimal sequence of projects, and let 𝑖(ℎ) be the order of sequence ℎ. For example, consider a set of two sequences {1,2,3}, {1,3,2}. Sequence ℎ = 1 can refer to {1,2,3} while ℎ = 2 can refer to {1,3,2}. Then project 𝑗 of index 𝑖 = 2 would be 𝑗 = 2(1) = 2 if ℎ = 1, or 𝑗 = 2(2) = 3 if ℎ = 2. If the vector of project policies is disaggregated into sequences of policies, then Eq (2) has an equivalent form as a sum of real option premiums, shown in Eq (3).

$\Phi_{t,h^*}(S_t) = \sum_i NPV_{t,i(h^*)}(S_t) + \sum_i \left(F^D_{t,i(h^*)}(S_t) + F^L_{t,i(h^*)}(S_t)\right) + F^{LS}_{t,h^*}(S_t)$   (3)

$NPV_{t,i(h^*)}$ is the expected net present value of the payoff of the irreversible decision $x_{t,i(h^*)}$ for project 𝑖(ℎ∗) at state 𝑆𝑡. $F^D_{t,i(h^*)}$ is the premium to the policy due to flexibility of timing project 𝑖(ℎ∗); $F^L_{t,i(h^*)}$ is the premium to the policy due to flexibility of permitting the option to invest in subsequent project {𝑖+1}(ℎ∗) at state 𝑆𝑡 (this is recursive such that first projects in the sequence will have more of this value because they hold the opportunity to invest in subsequent projects); and $F^{LS}_{t,h^*}$ is the premium from flexibility to re-order sequence ℎ∗ at a future time step 𝜏 > 𝑡. All these option premiums are non-negative by definition (Trigeorgis, 1996; Chow and Regan, 2011b), as non-profitable opportunities can always be turned down.

The CR policy is a lower bound of Eq (3), as shown in Eq (4) as $\tilde{\Phi}_{t,h^*}$. By forgoing the non-negative $F^{LS}_{t,h^*}$ term, the CR policy is a well-defined lower bound of the true network design and timing policy, one that still captures flexibility from timing ($F^D_{t,i(h^*)}$) and network effects ($F^L_{t,i(h^*)}$) and has greater value than a static policy without non-myopic adaptation ($\sum_i NPV_{t,i(h^*)}$). Regardless of what example this network design problem is applied to, the value can be clearly broken down into contributions from network effects, timing, or expected value. This is in essence what a reference policy requires for network effect-measurability and information-sensitivity, as it can control for the stochastic and network structures of the examples used by comparing with other algorithms.

$\tilde{\Phi}_{t,h^*} = \sum_i NPV_{t,i(h^*)} + \sum_i \left(F^D_{t,i(h^*)} + F^L_{t,i(h^*)}\right)$   (4)

2.2. An algorithm to approximate the CR policy

The CR policy can be approximated using an ADP method with asymptotic convergence, based on a combination of explicit sequence enumeration and the multi-option least squares Monte Carlo (LSM) simulation method (Carriere, 1996; Longstaff and Schwartz, 2001; Gamba, 2002) for each sequence. A summary of the algorithm from Chow and Regan (2011a) is provided in generalized form in Algorithm 1 for any adaptive network design problem with explicit timing. The method was proposed by Chow and Regan (2011a) and applied in Chow et al. (2011) to solve the adaptive discrete network design and timing problem, which extends the discrete network design problem of adding links to a road network (with a user equilibrium sub-problem) to consider the timing of those link additions as well. In that study, OD demand is assumed to follow stochastic processes that evolve as geometric Brownian motion, but as a simulation-based method, Algorithm 1 can substitute in other stochastic processes.

ALGORITHM 1: CR policy algorithm for ANDTP
1. Simulate a set 𝑃 of independent sample path realizations of the stochastic processes.
2. Enumerate the set 𝐻 of all feasible (implicitly accounting for budget constraints) sequences of candidate projects.
3. For each sequence ℎ ∈ 𝐻, solve Gamba’s (2002) multi-option LSM simulation:
   a. For 𝜏 from the finite horizon 𝑡 + 𝑇 back to 𝑡,
      i. For project 𝑖 from the last project 𝐿(ℎ) back to 1(ℎ),
         1. If 𝜏 < 𝑡 + 𝑇, use least squares to estimate the 𝛽𝑚 for the ℳ-polynomial Hermite least squares regression model shown in Eq (5), where each sample observation of 𝑥 is the net present value at sample path 𝜔 ∈ 𝑃, defined as 𝑥 = 𝜋𝜏(𝑥𝜏,𝑖(ℎ), 𝜔), and each observation of 𝑉𝜏+1,𝑖 is one step downstream in the sample path.

$\hat{E}_{\mathcal{M}}[V_{\tau+1,i} \mid x] = \sum_{m=0}^{\mathcal{M}} \beta_m (-1)^m \exp\left(\frac{x^2}{2}\right) \frac{d^m}{dx^m} \exp\left(-\frac{x^2}{2}\right)$   (5)

         2. Set 𝑥𝜏,𝑖(ℎ) to maximize the approximate dynamic programming Bellman equation in Eq (6), using 𝜋𝜏(𝜔) to approximate $\rho \hat{E}[V_{\tau+1,i}](\pi_\tau(x_{\tau,i(h)}, \omega), \beta)$. At 𝜏 = 𝑡 + 𝑇, $\hat{E}_{\mathcal{M}}[V_{\tau+1,i} \mid \pi_\tau(x_{\tau,i(h)}, \omega)] = 0$ and $V_{\tau,\{L+1\}(h)}(\omega) \equiv 0$.

$V_{\tau,i(h)}(\omega) = \max\big(\pi_\tau(x_{\tau,i(h)}, \omega;\, x_{\tau,\forall j}, \ldots$   (6)

Policy values are assumed to be greater than 0, but negative values can be converted to positive values by artificially applying a sufficiently large constant offset. The cumulative distribution function of the Gumbel distribution is shown as $F_G$ in Eq (9), to substitute for the function 𝐹(𝑦) in step 5 of Algorithm 2, where 𝜇 is a location parameter and 𝛿 is a scale parameter.

$F_G(y; \mu, \delta) = \exp\left(-\exp\left(-\frac{y-\mu}{\delta}\right)\right)$   (9)

Let 𝑀 be the maximum of a finite population, i.e. 𝑀 = max{𝑌1, …, 𝑌|𝐻|}. The probability that 𝑀 ≤ 𝑦, i.e. the cumulative distribution of the Gumbel maximum, $F_{M_G}$, is shown in Eq (10).

$F_{M_G}(y; \mu, \delta) = \big(F_G(y; \mu, \delta)\big)^{|H|} = \exp\left(-|H|\exp\left(-\frac{y-\mu}{\delta}\right)\right) = \exp\left(-\exp\left(-\frac{y-(\mu+\delta\ln|H|)}{\delta}\right)\right)$   (10)

As a result, the maximum is also Gumbel distributed, with location parameter 𝜇 + 𝛿 ln|𝐻| and scale parameter 𝛿. The mean and standard deviation of the distribution are:

$E[M_G] = \mu + \delta\ln|H| + \gamma\delta$

$\sigma_{M_G} = \frac{\delta\pi}{\sqrt{6}}$

The parameter 𝛾 is Euler’s constant, 𝛾 ≈ 0.5772. The parameters are estimated as shown in Eq (11) and Eq (12).

$\hat{\delta} = \frac{\sigma_{\bar{V}_{t,i(h)}}\sqrt{6}}{\pi}$   (11)

$\hat{\mu} = \bar{\bar{V}}_{t,i(h)} - \gamma\hat{\delta}$   (12)

where $\sigma_{\bar{V}_{t,i(h)}}$ is the sample standard deviation of the set 𝑆 of sampled sequence policy values, and $\bar{\bar{V}}_{t,i(h)}$ is the mean of the path averages over the set 𝑆.

For the 2-parameter Weibull distribution, the cumulative distribution function $F_W$ is shown in Eq (13), where 𝜇 is a scale parameter and 𝛿 is a shape parameter.

$F_W(y; \mu, \delta) = 1 - \exp\left(-\left(\frac{y}{\mu}\right)^{\delta}\right), \quad y > 0$   (13)

The function 𝐹(𝑦) in step 5 of Algorithm 2 can be substituted with the Weibull maximum distribution, shown as $F_{M_W}$ in Eq (14).

$F_{M_W}(y) = \big(F_W(y)\big)^{|H|} = \left(1 - e^{-(y/\mu)^{\delta}}\right)^{|H|}$   (14)

The parameters are estimated using maximum likelihood, as shown in Eq (15) and Eq (16) (per Balakrishnan and Kateri, 2008).

$\frac{n}{\hat{\delta}} - \frac{n}{\sum_{i=1}^{n} y_i^{\hat{\delta}}}\sum_{i=1}^{n} y_i^{\hat{\delta}}\ln(y_i) + \sum_{i=1}^{n}\ln(y_i) = 0$   (15)

$\hat{\mu} = \left\{\frac{1}{n}\sum_{i=1}^{n} y_i^{\hat{\delta}}\right\}^{1/\hat{\delta}}$   (16)
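To make the estimation step concrete, the sketch below fits both candidate distributions to a set of sampled per-sequence policy values and raises the fitted CDFs to the power |H| as in Eq (10) and Eq (14). The input array y stands in for the sampled sequence policy values (synthetic here) and H for the population size; the Gumbel estimators follow Eq (11)–(12), while the Weibull fit uses maximum likelihood via scipy:

```python
import numpy as np
from scipy import stats

def reference_policy_cdfs(y, H):
    """Fit Gumbel and Weibull to sampled sequence policy values y (> 0) and
    return the CDFs of the maximum over a population of H sequences."""
    # Gumbel parameters per Eq (11)-(12)
    delta_g = np.std(y, ddof=1) * np.sqrt(6) / np.pi
    mu_g = np.mean(y) - 0.5772156649 * delta_g
    # Weibull parameters (scale mu, shape delta) by maximum likelihood, location fixed at 0
    shape_w, _, scale_w = stats.weibull_min.fit(y, floc=0)

    grid = np.linspace(0.9 * np.min(y), 1.2 * np.max(y), 500)
    F_g = np.exp(-np.exp(-(grid - mu_g) / delta_g))            # Eq (9)
    F_w = stats.weibull_min.cdf(grid, shape_w, scale=scale_w)  # Eq (13)
    # Maximum distributions, Eq (10) and Eq (14); for a very large H it is safer
    # to compute exp(H * log(F)) in log-space to limit precision loss near F = 1.
    return grid, F_g ** H, F_w ** H

# Example with synthetic values standing in for sampled sequence policy values
rng = np.random.default_rng(0)
y = 21000.0 * rng.weibull(60.0, size=200)
grid, F_max_gumbel, F_max_weibull = reference_policy_cdfs(y, H=120)
```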

4. Evaluation of the reference policy

In this section, the objective is to study the extreme value distribution of the policy value determined by Algorithm 2. How well does the extreme value assumption fit actual sequences? How many samples are sufficient for a network instance? These research questions are empirically tested.

4.1. Fitness test of the distributions

The fitness of the distributions on a real set of sequenced policies is evaluated. In order to validate the fitness of the distributions, we need a network instance where the full population of sequences is available. The Sioux Falls network example from Chow and Regan (2011a) is chosen. In that study, five projects (representing pairings of the ten links in the classical example, shown in Figure 2) were enumerated into 120 sequences, with the sequence policy values shown in Figure 3. We fit the Gumbel and Weibull distributions to the population of sequence policy values and use a two-sample Kolmogorov-Smirnov (K-S) test to evaluate their fitness. Matlab was used to fit the parameters for the Weibull distribution with MLE (wblfit). For the Gumbel distribution, MLE of the parameters for the 120 values from Chow and Regan (2011a) resulted in 𝜇̂ = 20691.26 and 𝛿̂ = 246.64. For the Weibull distribution, the estimated parameters were 𝜇̂ = 20992.70 and 𝛿̂ = 65.69.


The two-sample K-S test statistic for two empirical cumulative distribution functions $F_{1,n}$ and $F_{2,n'}$ is shown in Eq (17); the null hypothesis that the two samples are drawn from the same underlying distribution is rejected at significance level 𝛼 if the relationship in Eq (18) holds.

$D_{n,n'} = \sup_x \left|F_{1,n}(x) - F_{2,n'}(x)\right|$   (17)

$D_{n,n'} > c(\alpha)\sqrt{\frac{n + n'}{n\, n'}}$   (18)
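One way to carry out this two-sample comparison in practice is sketched below, assuming the 120 enumerated sequence policy values are available in an array (the file name is a hypothetical placeholder); each fitted distribution is compared against the observed values with scipy’s two-sample K-S test:

```python
import numpy as np
from scipy import stats

values = np.loadtxt("sequence_policy_values.txt")  # hypothetical input with the 120 values

# Fit the two candidate distributions
shape_w, _, scale_w = stats.weibull_min.fit(values, floc=0)   # Weibull MLE, location fixed at 0
loc_g, scale_g = stats.gumbel_r.fit(values)                   # Gumbel MLE

# Compare each fitted distribution with the observed values via the
# two-sample K-S statistic of Eq (17)
rng = np.random.default_rng(0)
candidates = {
    "Weibull": stats.weibull_min.rvs(shape_w, scale=scale_w, size=len(values), random_state=rng),
    "Gumbel": stats.gumbel_r.rvs(loc=loc_g, scale=scale_g, size=len(values), random_state=rng),
}
for name, sample in candidates.items():
    result = stats.ks_2samp(values, sample)
    print(f"{name}: D = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```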

Figure 2. Sioux Falls network.


Figure 3. Enumerated sequence policy values from Chow and Regan (2011a).

The critical statistic for the Gumbel distribution was $c_{Gumbel}(\alpha) < 0.53$, which corresponds to a significance level higher than 10% and implies that it fits the data quite well. The Weibull distribution results in a critical value of $c_{Weibull}(\alpha) = 0.799$. Both distributions fit well. However, when we plot the observed population data (in which the maximum is known) along with the two fitted distributions and their maximum distributions, we reach a clearer conclusion. The plot is shown in Figure 4. It is clear that the Weibull distribution provides the more accurate fit for the maximum value here, since the observed CDF represents the full population. The Gumbel distribution overestimates the actual maximum value, and is also much less precise.

4.2. Reliability analysis and sampling consistency

The choice of sample size dictates the reliability of the estimated distribution. The reliability of the parameters can be determined using Wald-type confidence intervals (Lawless, 2003). A location-scale parameter pair (𝜇̂, 𝛿̂) has an approximate 95% confidence interval defined by Eq (19) – (20) (for both the Gumbel and Weibull distributions).

$\hat{\mu} - 1.96\,se(\hat{\mu}) \le \mu \le \hat{\mu} + 1.96\,se(\hat{\mu})$   (19)

$\ln\hat{\delta} - 1.96\,\frac{se(\hat{\delta})}{\hat{\delta}} \le \ln\delta \le \ln\hat{\delta} + 1.96\,\frac{se(\hat{\delta})}{\hat{\delta}}$   (20)

where 𝑠𝑒(𝜇̂) and 𝑠𝑒(𝛿̂) are obtained from the square roots of the diagonals of the inverse Fisher information matrix $I(\hat{\mu}, \hat{\delta})^{-1}$. The sample size and distribution are incorporated within the information matrix. Readers are referred to Lawless (2003) for details of the method. For convenience, the bounds are estimated using the Matlab functions mentioned earlier.
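A sketch of the same confidence-bound computation without Matlab, using the observed (numerically differentiated) Fisher information of the Weibull log-likelihood and the transformations in Eq (19)–(20); the parameterization follows Eq (13) with scale 𝜇 and shape 𝛿:

```python
import numpy as np
from scipy import stats

def weibull_nll(theta, y):
    """Negative log-likelihood of the 2-parameter Weibull of Eq (13)."""
    mu, delta = theta                       # scale, shape
    if mu <= 0 or delta <= 0:
        return np.inf
    z = y / mu
    return -(len(y) * np.log(delta / mu) + (delta - 1) * np.sum(np.log(z)) - np.sum(z ** delta))

def weibull_wald_bounds(y):
    shape_hat, _, scale_hat = stats.weibull_min.fit(y, floc=0)   # MLE, location fixed at 0
    theta = np.array([scale_hat, shape_hat])
    # Observed information: central-difference Hessian of the negative log-likelihood
    h = 1e-4 * theta
    H = np.empty((2, 2))
    for a in range(2):
        for b in range(2):
            ea, eb = np.eye(2)[a] * h[a], np.eye(2)[b] * h[b]
            H[a, b] = (weibull_nll(theta + ea + eb, y) - weibull_nll(theta + ea - eb, y)
                       - weibull_nll(theta - ea + eb, y) + weibull_nll(theta - ea - eb, y)) / (4 * h[a] * h[b])
    se = np.sqrt(np.diag(np.linalg.inv(H)))                      # standard errors
    mu_ci = (theta[0] - 1.96 * se[0], theta[0] + 1.96 * se[0])   # Eq (19)
    delta_ci = (theta[1] * np.exp(-1.96 * se[1] / theta[1]),     # Eq (20), back-transformed
                theta[1] * np.exp(+1.96 * se[1] / theta[1]))     # from the log scale
    return mu_ci, delta_ci

# usage: mu_ci, delta_ci = weibull_wald_bounds(sampled_policy_values)
```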


Figure 4. Gumbel and Weibull cumulative distribution functions for reference policy.

To test the sensitivity of the reference policy distribution to the sequence sampling size, we compute the 95% confidence bounds of a Weibull distribution for the 10-link design and timing example for Sioux Falls, where the true CR policy value is unknown and where full enumeration would require a population of 10! = 3,628,800 sequences. Sequence samples drawn from this population are increased incrementally from 50 up to 350 samples, and the bounds for the parameters are shown in Figure 5.
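A sketch of this sampling experiment is shown below; the per-sequence policy valuation is a placeholder stub (in the paper each draw requires running the LSM valuation of Algorithm 1 for one randomly sampled sequence of the 10 candidate links):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sample_sequence_value():
    # Placeholder: in the actual experiment this would run the LSM valuation
    # for one randomly drawn sequence of the 10 candidate link projects.
    return 21000.0 * rng.weibull(60.0)

for n_samples in range(50, 351, 50):
    y = np.array([sample_sequence_value() for _ in range(n_samples)])
    shape_hat, _, scale_hat = stats.weibull_min.fit(y, floc=0)
    print(f"|S| = {n_samples:3d}: mu_hat = {scale_hat:8.1f}, delta_hat = {shape_hat:6.1f}")
```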


Figure 5. Sample convergence for the (a) 𝜇 and the (b) 𝛿 of the Weibull distribution.

In some cases, the lower and upper bounds may not suffice in demonstrating consistency of the estimator (for example, 𝛿̂ still exhibits a large range), but in this case it appears that a stable confidence bound exists for 𝜇̂, in the sense that the mean over consecutive sample sizes does not change by much.

For this example, this occurs when the sequence sample size is increased to 200 or higher. This is not the case with 𝛿̂, which appears to require more samples to stabilize.

5. Numerical experiments

The reference policy is tested on two applications to illustrate its use and to evaluate the sensitivity of the reference policy to network parameters. A non-myopic sequential facility location and timing problem is introduced, followed by an application of the reference policy to evaluate a non-myopic DARP algorithm proposed by Hyytiä et al. (2012).

5.1. Non-myopic sequential facility location and timing policy evaluation

Stochastic sequential facility location and timing arises when decision-makers wish to stage their investments over time, in addition to deciding the locations, to adapt to uncertainties in demand. This problem is highly relevant in the alternative fuel infrastructure investment planning literature (e.g. Chung and Kwon, 2015; MirHassani and Ebrazi, 2013; He et al., 2013; Jung et al., 2014; Huang et al., 2015) due to the uncertainty in demand for new technologies like electric vehicles or wireless charging. It is also an important application for locating sensing technologies, where the data being sensed is highly uncertain (e.g. Li and Ouyang, 2011), or for non-myopic server relocation problems (e.g. Chow and Regan, 2011c).

We introduce a non-myopic sequential facility location and timing problem solved in the same manner as Algorithm 1. In this setting, we consider a p-median base problem on a complete graph 𝐺(𝑁, 𝐴). Demand at each node is assumed to follow a geometric Brownian motion (GBM), as shown in discrete time form in Eq (21).

$h_{j,\tau+1} = h_{j\tau}\exp(\sigma_j Z_{j\tau} + \mu_j), \quad \forall j \in N, \; t \le \tau \le t + T - 1$   (21)

where ℎ𝑗𝜏 is the demand at node 𝑗 at time step 𝜏; 𝜎𝑗 is the volatility parameter of the GBM for demand node 𝑗 ∈ 𝑁; 𝜇𝑗 is the trend parameter of the GBM for demand node 𝑗 ∈ 𝑁; 𝑇 is the discrete time horizon from initial time step 𝑡; and 𝑍𝑗𝜏 is a random perturbation in demand at node 𝑗 at time step 𝜏 that is distributed as a standard normal Φ(0,1).

A sequence in the context of this problem is a set order of facilities to be located; for three candidate nodes A, B, C, the sequence {A, B, C} means that node A has to be located before node B is allowed to be located, and both of those have to be located before node C can be located. The enumerative algorithm for the CR policy is essentially identical to Algorithm 1, except that the decision variable 𝑥𝜏,𝑖(ℎ) represents a node location decision at time 𝜏, and the optimization program for the profit function 𝜋𝜏(𝑥𝜏,𝑖(ℎ), 𝜔) in Eq (8) is replaced by Eq (22). Eq (22) computes the objective value of a given set of facility locations. When no facility investments have been made, 𝑍(𝑥𝜏,0(ℎ), 𝜔) is set to the sum of the distances from each demand node to the farthest node in the network.

$Z(x_{\tau,i(h)}, \omega) = \min \sum_{j}\sum_{k} h_{j\tau}(\omega)\, d_{jk}\, x_{\tau,i(h)}^{jk}$   (22a)

subject to

$\sum_{k} Y_{jk} = 1, \quad \forall j$   (22b)

$Y_{jk} - x_{\tau,i(h)}^{jk} \le 0, \quad \forall j, k$   (22c)

$x_{\tau,l(h)} = 1, \quad \forall l = 1 \ldots i$   (22d)

$Y_{jk} \in \{0,1\}, \quad \forall j, k$   (22e)

where 𝑌𝑗𝑘 = 1 if demand at node 𝑗 is served by the facility at node 𝑘, and 0 otherwise; and 𝑑𝑗𝑘 is the distance from node 𝑗 to node 𝑘.

The CR policy is estimated using a Weibull distribution fitted to a sample of sequences, which provides an upper bound reference policy to evaluate the sampled optimal policy. The test network from Simchi-Levi and Berman (1988) is used, as shown in Figure 6, given its popularity in the recent alternative fuel location literature.
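A minimal sketch of the inner evaluation used by the algorithm: simulate a GBM demand path per Eq (21) and compute the objective of Eq (22) for a fixed set of located facilities. The distance matrix below is a random placeholder (the Simchi-Levi and Berman network data would be used in practice), and because the facilities are fixed the assignment reduces to nearest-located-facility:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25                                      # number of nodes (placeholder size)
d = rng.uniform(1.0, 10.0, size=(n, n))     # placeholder distance matrix d_jk
np.fill_diagonal(d, 0.0)

def simulate_demand(h0, sigma, mu, T):
    """One GBM sample path per Eq (21): h_{j,tau+1} = h_{j,tau} * exp(sigma_j Z + mu_j)."""
    path = [np.asarray(h0, dtype=float)]
    for _ in range(T):
        Z = rng.standard_normal(len(h0))
        path.append(path[-1] * np.exp(sigma * Z + mu))
    return np.array(path)                   # shape (T + 1, |N|)

def z_value(h_tau, located):
    """Eq (22) with the location decisions fixed: every demand node is served by
    its nearest located facility."""
    if not located:
        # Stated convention when no facility has been located yet
        return float(d.max(axis=1).sum())
    return float(np.sum(h_tau * d[:, located].min(axis=1)))

h = simulate_demand(h0=np.ones(n), sigma=0.40, mu=0.0, T=10)
print(z_value(h[3], located=[9, 19]))       # e.g. nodes 10 and 20 (0-indexed) at tau = 3
```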

Figure 6. Test network for facility location application (source: Simchi-Levi and Berman, 1988).



For the example, each node 𝑗 is assumed to evolve as an independent GBM with 𝜇𝑗 = 0 and 𝜎𝑗 = 0.40. Discount rates are assumed 𝜌 = 0, so that the timing decision is independent of the time value of payoffs. A budget of 8 facilities is assumed over a time horizon of 𝑇 = 10, using 𝑃 = 300 sample paths. The population of sequences is 25!/17! = 43,609,104,000. Algorithm 2 with Eq (21) is run for |𝑆| = 350 independently drawn (with replacement) sequence samples, resulting in a sampled policy of investing immediately in node 10 and node 20, while deferring the remaining 6 location decisions to the remaining 10 periods. The sampled policy value is ℎ̂∗ = 397.673, which accounts for the value of timing and deferring the other 6 location decisions to a later period. Note that this decision is that of the maximum policy value from the 350 samples, and is not guaranteed to be the decision for the true maximum policy value. However, we now have an estimate of the distribution of the true maximum policy value, and we also know the decision of the maximum value policy among the 350 samples, and can evaluate its suboptimality that way. This is one of the key advantages of this method.

The parameters of a Weibull distribution are estimated from the sample, resulting in 𝜇̂ = 385.580 and 𝛿̂ = 56.202. The 95% confidence bounds for each parameter (per Eq (19) – (20)) are obtained as well, with 384.825 ≤ 𝜇 ≤ 386.336 and 51.675 ≤ 𝛿 ≤ 61.126. Using these bounds and the mean values, we construct the Weibull CDF of the reference policy along with the lower and upper bounds as shown in Figure 7.


Figure 7. Weibull CDF of the reference policy value with 95% confidence interval.
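The figures reported below can be cross-checked directly from the fitted parameters; the short sketch computes the median of the estimated maximum (reference policy) distribution for |H| = 25!/17! and the implied gap of the sampled policy value:

```python
import math

mu_hat, delta_hat = 385.580, 56.202              # Weibull parameters reported above
H = math.factorial(25) // math.factorial(17)     # 43,609,104,000 feasible sequences
sampled_value = 397.673                          # best sampled policy value reported above

# Median y solves F_W(y)^H = 0.5, i.e. (y / mu)^delta = -ln(1 - 0.5^(1/H))
p = -math.expm1(math.log(0.5) / H)               # 1 - 0.5**(1/H), computed stably
y_median = mu_hat * (-math.log(p)) ** (1.0 / delta_hat)
gap = 1.0 - sampled_value / y_median
print(f"median reference policy value ~ {y_median:.1f}, gap ~ {100 * gap:.1f}%")
# prints roughly 408.3 and 2.6%, consistent with the 2.0-3.2% range reported below
```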

Based on the mean reference policy value at the 50th percentile (408.3 ± 2.7), the sampled policy is within 2.0% to 3.2% of the CR policy value with 95% confidence. This suggests that the initial state-dependent decision to locate at node 10 and node 20 immediately and to defer the remaining 6 location decisions to one of the remaining 10 periods is already quite close in value to the CR policy value. The experiment demonstrates two points: 1) that Algorithm 2 can be applied with minor modifications to adaptive sequential facility location problems, and 2) that a sample of merely 350 sequences for some example networks can obtain policies whose values are within 3% of a policy that would otherwise require enumerating 43 billion sequences.


5.2. Evaluation of Hyytiä et al.’s non-myopic dynamic dial-a-ride algorithm

The last experiment is in dynamic fleet management, which is particularly important in flexible transit, where data for service demand is becoming more readily available through information and communications technologies (ICT) (Chow, 2014). The non-myopic dynamic dial-a-ride variation has not had as much focus in the literature (see Table 1). Because there are multiple vehicles, there are actually two sets of decisions that need to be made dynamically: customer-to-vehicle allocation, and routing of vehicles. We use the reference policy to evaluate an algorithm proposed recently by Hyytiä et al. (2012).

A brief overview of their algorithm is provided here. It views the dynamic DARP as a multi-server queue system. The time from when a passenger calls in a request until they are picked up by a vehicle is the queue wait time; the time that the passenger is in transit is the service time. This perspective allows the policy outcomes to be measured non-myopically within an infinite horizon context as steady state performance measures, which have already been studied extensively in the queueing literature. As a result, the steady state performance measures of such queueing systems can be used to obtain optimal non-myopic policies under infinite horizons. In practice, such a queue does not necessarily follow first-in-first-out principles, since a customer can be picked up but dropped off after another customer that is picked up later. Furthermore, the distributions of the wait times and service times in such a queue system can be quite complex. As a result, it may not be possible to obtain an analytical expression for the exact steady state delay of such a multi-server queueing system to obtain the optimal dynamic customer-to-vehicle dispatch policy. Hyytiä et al. (2012) proposed instead to approximate the multi-server queueing system as a series of M/M/1 queues and to use that approximation to derive a non-myopic allocation policy that outperformed a myopic policy. The policy is defined by Eq (23) – (24).

$\operatorname*{argmin}_{v,\xi}\; \big[c(v, \xi) - c(v, \xi')\big]$   (23)

where

$c(v, \xi) = \theta L(v, \xi) + (1 - \theta)\left(\beta\, T(v, \xi)^2 + \sum_i S_i(v, \xi)\right)$   (24)

where 𝑣 is a vehicle, 𝜉 is a tour obtained for a capacitated traveling salesman problem with pickup and delivery (TSPPD), 𝜉′ is the previous tour updated to the time of the current customer arrival, 𝑐 is the value function, 𝐿 is the tour length, 𝑆𝑖 is the total delay for customer 𝑖 (service plus wait time, i.e. the time from call-in to the time they are delivered), and 𝜃 and 𝛽 are parameters that adjust the degree of system cost versus user cost (𝜃) and the degree of look-ahead (𝛽). A value of 𝛽 = 0 means a purely myopic system, while 𝛽 > 0 gives a non-myopic policy. The method is shown to be very easy to implement even for large systems (up to fleets of 140 vehicles) and improves upon the myopic solution for a range of parameter values. The method appears to have been implemented in practice with Kutsuplus in Helsinki (Barry, 2013).
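A minimal sketch of how the dispatch rule of Eq (23)–(24) can be applied when a new request arrives: each vehicle’s tour is tentatively re-optimized with the request inserted, and the vehicle with the smallest marginal cost c(v, ξ) − c(v, ξ′) wins it. The vehicle/tour interface and the interpretation of T(v, ξ) as the remaining tour time are assumptions for illustration; the TSPPD insertion heuristic itself is not shown:

```python
def policy_cost(tour_length, remaining_time, customer_delays, theta, beta):
    """Eq (24): weighted combination of system cost and user cost.
    remaining_time stands in for T(v, xi), assumed here to be the remaining tour time."""
    return theta * tour_length + (1 - theta) * (beta * remaining_time ** 2 + sum(customer_delays))

def assign_request(vehicles, request, theta=0.5, beta=0.5):
    """Eq (23): assign the new request to the vehicle with the smallest cost increase.
    Each vehicle is assumed to expose current_tour() and tour_with(request), both
    returning objects with length, remaining_time and delays attributes (placeholders
    for the TSPPD insertion heuristic and state bookkeeping)."""
    best_vehicle, best_delta = None, float("inf")
    for v in vehicles:
        old = v.current_tour()
        new = v.tour_with(request)           # tentative re-optimized tour with the request inserted
        delta = (policy_cost(new.length, new.remaining_time, new.delays, theta, beta)
                 - policy_cost(old.length, old.remaining_time, old.delays, theta, beta))
        if delta < best_delta:
            best_vehicle, best_delta = v, delta
    return best_vehicle
```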


In this experiment, we implement Hyytiä et al.’s (2012) myopic (𝛽 = 0) and non-myopic (𝛽 = 0.5) policies, and compare them alongside each other as well as with the reference policy. The experiment answers the following questions. 1) Does the non-myopic algorithm perform as expected when replicated? 2) Does comparing the non-myopic algorithm to the myopic algorithm relative to the reference policy provide more insight into sensitivity to the network?

A sequence policy in this context is interpreted as a static a priori solution, since there is no timing component. For example, {A, B, C} means that the first customer arrival is assigned to vehicle A regardless of the location, the second to vehicle B, and the third to vehicle C. It is possible to have {A, A, A}, so for 5 vehicles and 30 customers the number of sequences is $|H| = 5^{30} \approx 9.3 \times 10^{20}$. Since the example is primarily cost minimization, the values are taken as maximization of the negative costs. We use the same square Euclidean plane from Hyytiä et al.’s (2012) study, bounded by 𝑥 = [−5,5] and 𝑦 = [−5,5], with uniformly distributed origins and destinations of requests for evaluating the policies. Dwell times are assumed to be zero, and idle vehicles are sent back to the depot located at the origin since demand is uniformly distributed. We measure the performance of the system based on 𝑄 = 30 arrivals, under 𝑃 = 300 different independent scenarios. The myopic policy is set using 𝜃 = 0.5, 𝛽 = 0, and the non-myopic policy is set using 𝜃 = 0.5, 𝛽 = 0.5. A capacitated traveling salesman problem with pickup and delivery (TSPPD) re-optimization routine is coded based on a basic insertion heuristic discussed in Mosheiov (1994) (see Sayarshad and Chow, 2015). An example implementation of Hyytiä et al.’s (2012) myopic policy (𝜃 = 0.5, 𝛽 = 0) and non-myopic policy (𝜃 = 0.5, 𝛽 = 0.5) for one vehicle in one scenario is shown in Figure 8.


Figure 8. Example simulation of one vehicle under two dynamic DARP policies from Hyytiä et al. (2012).


Three different parameter scenarios are considered:
- Scenario A: In the baseline scenario, arrivals are assumed to be Poisson distributed with 𝜆 = 0.2, speed is set to 1, and the system consists of a fleet of 5 vehicles, each with a capacity of 4 passengers.
- Scenario B: Same as scenario A, except 𝜆 = 0.4 with a newly simulated set of inter-arrival times. The OD locations of the 30 passengers remain the same as in scenario A, as are the sampled sequences.
- Scenario C: Same as scenario A, except the capacity of each vehicle is reduced to 2 passengers.

Over the 300 simulated paths under each of the three scenarios, we obtain the average values from the two policies and compare them in Table 2.

Table 2. Comparison of Hyytiä et al.’s policies without any reference policy

| Scenario | Policy | Average System Cost per User (𝐿/𝑄) | Average User Cost per User (𝑆𝑞) | Sum of Average Costs per User |
|---|---|---|---|---|
| A | Myopic | 10.576 | 9.609 | 20.185 |
| A | Non-myopic | 10.592 | 9.720 | 20.312 |
| B | Myopic | 9.436 | 10.431 | 19.867 |
| B | Non-myopic | 9.403 | 10.670 | 20.073 |
| C | Myopic | 10.580 | 9.610 | 20.191 |
| C | Non-myopic | 10.601 | 9.710 | 20.311 |

From Table 2, two conclusions are drawn. First, it appears that the improvements in performance observed using the non-myopic policy from Hyytiä et al. (2012) are not replicable in this example. In fact, their non-myopic policy in all three scenarios is marginally worse off on average than their myopic policy. This result illustrates how the non-myopic policy performance depends on the parameter settings, such as the number of arrivals, the arrival rate, the fleet size, etc. Second, despite doubling the arrival rate in scenario B or halving the vehicle capacity in scenario C, the average costs of the two algorithms appear relatively unchanged. If a researcher were developing the non-myopic algorithm for the first time and evaluating it based on these results, it might suggest there is not much sensitivity to the parameter changes. Unfortunately, as discussed in Section 1.2 and Table 1, this is often the extent of the comparison in many studies.

The picture is very different when comparing the algorithms in the three scenarios alongside the reference policy. We randomly draw |𝑆| = 300 sample sequences, and for each sample sequence, obtain the average cost per user over |𝑃| = 300 sample paths. The same randomly generated sequences are used for all three scenarios. Figure 9 shows the average policy values (as negative costs) for each of the three scenarios along with the distribution of the sampled sequences and the estimated CR policy values.



Figure 9. Comparison of Hyytiä et al.’s policies with respect to Weibull distribution of CR policy for (a) base scenario, (b) double demand scenario, and (c) half vehicle capacity scenario.


First, the average values obtained from the sampled sequences are -26.947, -34.877, and -27.122 for scenarios A, B, and C, respectively. Each of these average values from the randomly generated sequences is worse than the myopic and non-myopic policy values. This is to be expected, as designed policies should be better than randomly generated decisions.

Second, the sample sequences are used to construct the Weibull distribution of the CR policy for each of the three scenarios, with average values of -3.27, -21.27, and -19.00, respectively. An interesting result arises. The sensitivity of the policy to characteristics of the network is notable; as demand is increased in scenario B, the average value of the policy drops from -3.27 down to -21.27. Since the CR policy is designed to capture timing (which is not present in this example) and network effects, this means that there is an increase in network effects on vehicle dispatch and routing decisions due to the increased demand. This makes sense. Similarly, as the vehicle capacity is halved, the value decreases to -19.00. This degradation is less than the amount due to the demand doubling, which suggests that for this network instance, the non-myopic policies are more sensitive to demand doubling than to capacity halving.

Third, whereas Table 2 suggests there is negligible change in the policy values across the three scenarios, a different conclusion can be drawn when placed in the context of the proposed reference policy. In the base scenario A, the policies from Hyytiä et al. seem to significantly underperform the reference policy value (-20.312 for the non-myopic policy versus -3.27, i.e. the reference cost is 16% of the policy cost). However, in scenario B the non-myopic policy improves over the reference policy (-20.073 for the non-myopic policy versus -21.27, now 106% of the cost), and it is much closer to the reference policy in scenario C (-20.311 versus -19.00, or 94% of the cost). Since the reference policy, with respect to the full real option value indicated in Eq (3), is missing the premium from re-designing, this suggests that the increased demand in scenario B allows the policies from Hyytiä et al. to gain significant value from opportunities to readjust the routing plans. Furthermore, while the reference policy worsened significantly in scenarios B and C, the non-myopic and myopic policies stayed relatively the same, which means they became much more effective in those scenarios. These results lend evidence to the value of comparing against an additional reference policy that is well-defined and easy to adopt across different network design problems and parameter settings.

6. Conclusion

Several contributions are made in this study. First, we revisited Chow and Regan’s (2011a) policy and algorithm and defined it in language more consistent with that of Powell (2011). Second, we proposed a means to obtain a distribution of Chow and Regan’s (2011a) policy value without explicit enumeration. Third, we evaluated different extreme value distributions for this reference policy and empirically found that the Weibull distribution is a good-fitting and efficient estimator for one known example. Fourth, we proposed a new adaptive facility location and timing problem, and demonstrated the use of the proposed algorithm in obtaining an upper bound for evaluating sampled policies. Lastly, we replicated Hyytiä et al.’s (2012) dynamic dial-a-ride policy and found that its performance can vary significantly across parameter settings, even if a comparison between only myopic and non-myopic policies may suggest otherwise.


The computational experiments show that the network effect-measurable reference policy proposed in this study is both theoretically consistent across test cases (as an approximation of the lower bound of Eq (3)) and scalable for those different cases. This work will enable future researchers to better evaluate dynamic policies in a network context.

For future research, the reference policy should be included in new test cases to establish performance benchmarks. Further exploration of the relationship of specific network designs and stochastic properties to the reference policy may identify ways to create new reference policies with higher resolution contours. Algorithm 2 assumes that the demand evolution is independent of the policy, but in reality (e.g. the “chicken and egg” problem in alternative fuel infrastructure planning) this may not always be the case. One potential solution is to use decision-dependent stochastic processes (Kirschenmann et al., 2014) in the algorithm or to replace the multi-option LSM with other multidimensional option valuation methods (e.g. Cortazar et al., 2008), which may expand the applicability of the CR policy to a larger pool of supply-demand integrated problems.

Acknowledgements

This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. Constructive feedback provided by participants of the 3rd Annual INFORMS TSL Workshop held in Chicago, IL, and two anonymous referees is much appreciated. Helpful comments from Dr. Arnold Yuan from Ryerson University are gratefully acknowledged.

References

Balakrishnan, N., Kateri, M., 2008. On the maximum likelihood estimation of parameters of Weibull distribution based on complete and censored data. Statistics and Probability Letters 78(17), 2971-2975.
Barry, K., 2013. New Helsinki bus line lets you choose your own route. Wired, http://www.wired.com/2013/10/ondemand-public-transit/, accessed June 28, 2014.
Berbeglia, G., Cordeau, J.F., Laporte, G., 2010. Dynamic pickup and delivery problems. European Journal of Operational Research 202(1), 8-15.
Carriere, J.F., 1996. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics 19(1), 19-30.
Chen, A., Zhou, Z., Chootinan, P., Ryu, S., Yang, C., Wong, S.C., 2011. Transport network design problem under uncertainty: a review and new developments. Transport Reviews 31(6), 743-768.
Chow, J.Y.J., Regan, A.C., 2011a. Network-based real option models. Transportation Research Part B 45(4), 682-695.
Chow, J.Y.J., Regan, A.C., 2011b. Real option pricing of network design investments. Transportation Science 45(1), 50-63.
Chow, J.Y.J., Regan, A.C., 2011c. Resource location and relocation models with rolling horizon forecasting for wildland fire planning. INFOR 49(1), 31-43.
Chow, J.Y.J., Regan, A.C., Ranaiefar, F., Arkhipov, D.I., 2011. A network option portfolio management framework for adaptive transportation planning. Transportation Research Part A 45(8), 765-778.
Chow, J.Y.J., 2014. Policy analysis of third party electronic coupons for public transit fares. Transportation Research Part A 66, 238-250.
Chung, S.H., Kwon, C., 2015. Multi-period planning for electric car charging station locations: a case of Korean expressways. European Journal of Operational Research 242(2), 677-687.
Clément, E., Lamberton, D., Protter, P., 2002. An analysis of a least squares regression method for American option pricing. Finance and Stochastics 6(4), 449-471.
Cortazar, G., Gravet, M., Urzua, J., 2008. The valuation of multidimensional American real options using the LSM simulation method. Computers & Operations Research 35(1), 113-129.
Cortés, C.E., Sáez, D., Núñez, A., Muñoz-Carpintero, D., 2009. Hybrid adaptive predictive control for a dynamic pickup and delivery problem. Transportation Science 43(1), 27-42.
Figliozzi, M.A., Mahmassani, H.S., Jaillet, P., 2007. Pricing in dynamic vehicle routing problems. Transportation Science 41(3), 302-318.
Fisher, R.A., Tippett, L.H.C., 1928. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Cambridge Philosophical Society 24(2), 180-190.
Gamba, A., 2002. An extension of least squares Monte Carlo simulation for multi-options problems. In: Proceedings of the Sixth Annual International Real Options Conference, Paphos, Cyprus, July 2002.
García, D., 2003. Convergence and biases of Monte Carlo estimates of American option prices using a parametric exercise rule. Journal of Economic Dynamics and Control 27(10), 1855-1879.
Gumbel, E.J., 1958. Statistics of Extremes, Columbia University Press (republication by Dover), NY.
He, F., Yin, Y., Zhou, J., 2013. Integrated pricing of roads and electricity enabled by wireless power transfer. Transportation Research Part C 34, 1-15.
Huang, Y., Li, S., Qian, Z.S., 2015. Optimal deployment of alternative fueling stations on transportation networks considering deviation paths. Networks and Spatial Economics 15(1), 183-204.
Hyytiä, E., Penttinen, A., Sulonen, R., 2012. Non-myopic vehicle and route selection in dynamic DARP with travel time and workload objectives. Computers & Operations Research 39(12), 3021-3030.
Ichoua, S., Gendreau, M., Potvin, J.Y., 2006. Exploiting knowledge about future demands for real-time vehicle dispatching. Transportation Science 40(2), 211-225.
Jung, J., Chow, J.Y.J., Jayakrishnan, R., Park, J.Y., 2014. Stochastic dynamic itinerary interception refueling location problem with queue delay for electric taxi charging stations. Transportation Research Part C 40, 123-142.
Karp, R.M., 1992. On-line algorithms versus off-line algorithms: how much is it worth to know the future? Proc. IFIP 12th World Computer Congress on Algorithms, Software, Architecture – Information Processing ’92, Vol 1, 416-429.
Kirschenmann, T., Popova, E., Damien, P., Hanson, T., 2014. Decision dependent stochastic processes. European Journal of Operational Research 234(3), 731-742.
Lawless, J.F., 2003. Statistical Models and Methods for Lifetime Data, John Wiley & Sons, Inc., Hoboken, NJ.
Li, X., Ouyang, Y., 2011. Reliable sensor deployment for network traffic surveillance. Transportation Research Part B 45(1), 218-231.
Longstaff, F.A., Schwartz, E.S., 2001. Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies 14(1), 113-147.
Magnanti, T.L., Wong, R.T., 1984. Network design and transportation planning: models and algorithms. Transportation Science 18(1), 1-55.
MirHassani, S.A., Ebrazi, R., 2013. A flexible reformulation of the refueling station location problem. Transportation Science 47(4), 617-628.
Mitrović-Minić, S., Krishnamurti, R., Laporte, G., 2004. Double-horizon based heuristics for the dynamic pickup and delivery problem with time windows. Transportation Research Part B 38(8), 669-685.
Mosheiov, G., 1994. The travelling salesman problem with pick-up and delivery. European Journal of Operational Research 79(2), 299-310.
Novoa, C., Storer, R., 2009. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands. European Journal of Operational Research 196(2), 509-515.
Powell, W.B., 2011. Approximate Dynamic Programming: Solving the Curses of Dimensionality (2nd ed.), John Wiley and Sons, New York.
Powell, W.B., Simao, H.P., Bouzaiene-Ayari, B., 2012. Approximate dynamic programming in transportation and logistics: a unified framework. EURO Journal on Transportation and Logistics 1(3), 237-284.
Sayarshad, H.R., Chow, J.Y.J., 2015. A scalable non-myopic dynamic dial-a-ride and pricing problem. Transportation Research Part B, in press, doi:10.1016/j.trb.2015.06.008.
Secomandi, N., 2001. A rollout policy for the vehicle routing problem with stochastic demands. Operations Research 49(5), 796-802.
Simchi-Levi, D., Berman, O., 1988. A heuristic algorithm for the traveling salesman location problem on networks. Operations Research 36(3), 478-484.
Spivey, M.Z., Powell, W.B., 2004. The dynamic assignment problem. Transportation Science 38(4), 399-419.
Stentoft, L., 2004. Convergence of the least squares Monte Carlo approach to American option valuation. Management Science 50(9), 1193-1203.
Szeto, W.Y., Jiang, Y., Wang, D.Z.W., Sumalee, A., 2013. A sustainable road network design problem with land use transportation interaction over time. Networks and Spatial Economics, in press, doi:10.1007/s11067-013-9191-9.
Thomas, B.W., White III, C.C., 2004. Anticipatory route selection. Transportation Science 38(4), 473-487.
Thomas, B.W., 2007. Waiting strategies for anticipating service requests from known customer locations. Transportation Science 41(3), 319-331.
Trigeorgis, L., 1996. Real Options: Managerial Flexibility and Strategy in Resource Allocation. The MIT Press, Cambridge, MA.
Zhang, J., Lam, W.H.K., Chen, B.Y., 2013. A stochastic vehicle routing problem with travel time uncertainty: tradeoff between cost and customer service. Networks and Spatial Economics 13(4), 471-496.
