Preemptive Depot Returns for Dynamic Same-Day Delivery

Marlin W. Ulmer, Barrett W. Thomas, Dirk C. Mattfeld

Abstract: In this paper, we explore same-day delivery routing and, in particular, how same-day delivery vehicles can better integrate dynamic requests into delivery routes by taking advantage of preemptive depot returns. A preemptive depot return occurs when a delivery vehicle returns to the depot before delivering all of the packages currently on board the vehicle. In this paper, we assume that a vehicle serves requests in a particular delivery area. Beginning the day with some known deliveries, the vehicle seeks to serve the known requests as well as additional new requests that are received throughout the day. To serve the new requests, the vehicle must return to the depot to pick up the packages for delivery. In contrast to previous work on same-day delivery routing, in this paper, we allow the vehicle to return to the depot before serving all loaded packages. To solve the problem, we couple an approximation of the value of choosing any particular subset of requests for delivery with a routing heuristic. Our approximation procedure is based on approximate dynamic programming and allows us to capture both the current value of a subset-selection decision and its impact on future rewards. Using extensive computational tests, we demonstrate the value of preemptive depot returns and the value of the proposed approximation scheme in supporting preemptive returns. We also identify characteristics of instances for which preemptive depot returns are most likely to offer improvement.

Keywords: stochastic dynamic vehicle routing; same-day delivery; preemptive depot returns; approximate dynamic programming

1 Introduction

Growing at three times the rate of traditional retail sales, e-commerce is expected to be between a $427 billion and $443 billion industry in the United States in 2017 (BI Intelligence 2017). The industry is projected to grow from 12.7% of retail sales in the US in 2017 to 17% of all retail sales in 2022 (Keyes 2017). One of the biggest trends in this growing e-commerce market is same-day delivery. With companies like BestBuy and Macy's expanding same-day delivery to compete with Amazon, "Same-day delivery is on its way to becoming a basic option shoppers expect . . ." (Wahba 2017). CVS Health has even recently announced that it will soon begin same-day delivery of prescription drugs (Thomas 2017). Thus, the same-day delivery segment is expected to outpace general e-commerce growth with an annual growth rate of 40% (Yahoo! Finance 2016). As companies continue to seek to take advantage of this growth and enter the same-day delivery market, competition will force companies to deliver same-day packages as efficiently as possible.

In this paper, we explore same-day delivery routing and, particularly, how same-day delivery vehicles can better integrate dynamic requests into delivery routes by taking advantage of preemptive returns to the depot. A preemptive depot return occurs when a delivery vehicle returns to the depot before delivering all of the packages currently on board the vehicle. In this paper, we assume that a single vehicle serves requests in a delivery area. This vehicle can be viewed as part of a fleet, perhaps serving a particular service area, but operating independently of the other vehicles in the fleet. The vehicle begins the day with a set of known requests, and additional new requests are received throughout the day. The known requests represent orders placed before the start of a day's deliveries. To serve the new requests, the vehicle must return to the depot to pick up the packages for delivery. In contrast to previous work on same-day delivery routing, in this paper, we allow the vehicle to return to the depot before serving all loaded packages. This preemption of a route allows the vehicle to take advantage of the possibility of efficiently integrating new service requests that are located close to the existing route. The vehicle seeks to serve the known requests and as many as possible of the same-day delivery requests that occur over the problem's time horizon. We assume that the vehicle can decide which subset of dynamic requests to serve. From the point of view of the problem proposed in the paper, the fate of the rejected requests is immaterial. However, in addition to leaving the system, there are multiple ways in which rejected requests can be handled. Most notably, such requests might be served by a more expensive third party or on the next day. Generally, the problem can be defined as a stochastic dynamic one-to-many pickup and delivery problem (SDPD).

To facilitate same-day delivery routing with preemptive depot returns, we introduce an approach that we call anticipatory preemptive depot return (APDR). In APDR, we couple an approximation of the value of choosing any particular subset of requests for delivery with a routing heuristic. Our approximation procedure is based on approximate dynamic programming (ADP) and allows us to capture both the current value of a subset-selection decision and its impact on future rewards. Our approximation procedure relies on offline simulation and a reduction of the state space via aggregation. Our proposed aggregation scheme explicitly captures planned depot returns. While the aggregation scheme allows us to capture the problem's large state space, we must also take steps to overcome the problem's large decision space, which is the result of the need not only to select subsets of customers for service but also to route them. We address this challenge by introducing a routing heuristic that accounts for preemptive depot returns.

This paper makes a number of contributions to the literature on same-day delivery routing. First, this paper is the first to introduce a method for preemptive depot returns for same-day delivery routing. Further, the proposed APDR approach makes use of offline simulation to generate approximations and thus allows instant online decision making. Existing approaches rely on online sampling or rollout procedures. In extensive computational studies, we demonstrate the value of both preemptive depot returns and of the proposed aggregation scheme in supporting preemptive returns. We also show that the number of customers at the start of the day has a strong influence on the value of preemptive depot returns, as does the combination of depot location and customer distribution.

The remainder of this paper is structured as follows. In §2, we present the related literature. In §3, we present a formal problem description and a Markov decision process model. The solution approach APDR and the benchmark heuristics are defined in §4. We describe our experimental design in §5 and the results of our computational experiments in §6. The paper concludes with a summary of the results and directions for future research in §7.


2 Literature Review

The SDPD is dynamic and stochastic following the definition given in Kall and Wallace (1994). The problem is dynamic because the problem information changes over the course of the day and because the dispatcher can adapt plans and decisions in response to newly learned information. The problem is stochastic because the customer realizations that appear over the course of the day follow a known spatial and temporal probability distribution. In this literature review, we first survey work on same-day delivery and then related work in the areas of dynamic pickup and delivery, vehicle routing with stochastic demands, vehicle routing with stochastic requests, and grocery delivery problems.

The most closely related work to that in this paper is the small set of literature concerning same-day delivery. The literature on same-day delivery is primarily contained in four papers: Voccia et al. (2017), Azi et al. (2012), Klapp et al. (2016b), and Klapp et al. (2016a). There are two key differences between the four papers and this paper. First, none of the four papers allows preemptive vehicle returns. As we show in Section 6, allowing preemptive returns in a same-day delivery problem has the potential to improve the number of customers who can be served on a given day. Second, the bulk of the computation time for our solution approach comes from offline simulation that takes place in advance of the problem horizon. The result is that, at runtime, decision making is almost instantaneous. The methods presented in the four papers cited above all rely on online solution methods. Thus, at runtime, the cited papers require computation to make a decision, potentially impacting the ability to make decisions in real time.

Voccia et al. (2017) present a problem setting similar to that in this paper in which requests arrive over the course of the day and must be served on that day through multiple returns to the depot. Each request must be served within a time window or by a deadline. The authors introduce a sample-scenario planning approach to solve the problem (see Bent and Van Hentenryck (2004) for an overview of sample-scenario planning). Rather than operating on an estimation of the value function as we do in this paper, sample-scenario planning generates a sample of future requests, combines them with existing requests, and produces solutions for each set. Then, the method chooses the solution that is most similar to the others. Unlike the work in this paper, the solution method in Voccia et al. (2017) does not explicitly account for preemptive depot returns. Rather, vehicles must deliver all loaded packages before returning to the depot. Such a strategy is known as plan-at-home (PAH).

However, while they do not allow preemptive depot returns, route selection in the sample-scenario method is designed such that shorter routes and thus many depot returns are generated. We note that our H3 benchmark is an ADP analog to the route-selection method of Voccia et al. (2017) in that it too can choose shorter or longer routes depending on the approximated future value of a particular route length. Thus, while not implementing it directly, we do use a benchmark that is similar to the approach presented in Voccia et al. (2017), with the advantage that the calculation is conducted offline.

Also related to this paper is the work of Azi et al. (2012). Similar to this paper and Voccia et al. (2017), Azi et al. (2012) study a problem in which requests are realized throughout the day and are served through multiple trips to a depot. Similar to Voccia et al. (2017), Azi et al. (2012) develop a sample-scenario planning approach, and like Voccia et al. (2017), do not explicitly consider preemptive returns to the depot. Instead, Azi et al. (2012) constrain the length of the tours on the vehicles leaving the depot. In this way, Azi et al. (2012) implicitly recognize the value of depot returns, but in contrast to this paper and Voccia et al. (2017), the tour-length limitations proposed in Azi et al. (2012) are fixed throughout the horizon and do not adapt to changing state information.

Klapp et al. (2016b) also explore the same-day delivery problem, but assume that all customers are on a line. Thus, the routing and subset selection problems are integrated in a single decision, importantly eliminating the expensive step of evaluating the cost of a chosen subset of customers, a step that is necessary when considering a general network as we do in this problem. Further, the formulation does not allow for preemptive depot returns. Given their problem setting, Klapp et al. (2016b) demonstrate how to efficiently find optimal a priori routes and then introduce an approximate dynamic programming approach known as rollout (see Goodson et al. (2017) for an overview of rollout algorithms) to leverage the a priori routes in a dynamic setting. Klapp et al. (2016a) extend Klapp et al. (2016b) to general network structures. Due to the complexity of their online calculations, they apply their method only once per hour. Because the method introduced in this paper is offline, it enables real-time control.

The SDPD can also be seen as a special case of a dynamic pickup and delivery problem. In stochastic and dynamic many-to-many pickup and delivery problems (DPDPs), usually both origin and destination of a request (order) are stochastic. Thus, the main difference between the SDPD and DPDPs is that, in the SDPD, the pickup location is known beforehand.

An overview of dynamic pickup and delivery problems is given by Berbeglia et al. (2010). In contrast to the offline method proposed in this paper, many of the SDPD papers, including Sáez et al. (2008), Pureza and Laporte (2008), Ghiani et al. (2009), Mes et al. (2010), and Muñoz Carpintero et al. (2015), use online, lookahead methods to anticipate the impact of future requests. Hyytiä et al. (2012) use a queueing method that could be characterized as a value-function approximation such as is proposed in this paper. However, from a methodological perspective, the singular pickup location in the SDPD allows us to propose a unique state-space aggregation scheme that allows for improved estimation of future costs for our problem. Ghiani et al. (2017) also use an offline approach, but instead propose a policy-function approximation. The method of Ghiani et al. (2017) does not apply to this problem because it operates on the distinct classes of customers present in the problem being studied.

The concept of preemptive depot returns is mainly found in the literature on vehicle routing with stochastic demands (VRPSD). For these problems, the customers are known but the amount of demand at each customer is unknown prior to arrival. Thus, the vehicles may need to return to the depot to replenish capacity. The most recent work on preemptive depot returns for the VRPSD can be found in Goodson et al. (2016). They provide a comprehensive review of literature related to preemptive depot returns for the VRPSD and present a rollout approach that embeds a dynamic-programming approach to find optimal preemptive returns for fixed sequences of customers. Other work on preemptive depot returns in the VRPSD literature can be found in Bertsimas et al. (1995), Yang et al. (2000), and Secomandi (2003). The work in this paper differs from the VRPSD literature in a number of ways. First, in the SDPD, each customer requires a unique good, and as a result, the vehicle must return to the depot to serve any customer who is not loaded on the vehicle at the start of the day. Further, we must make a subset selection decision on every return to the depot, thus combining subset selection with routing decisions when at the depot. In some of the VRPSD literature, only a subset of customers is served, but the subset selection and routing do not need to be done simultaneously at the depot. Finally, we note that it is possible that a variant of the solution approach, and particularly the aggregation scheme used in this paper, could be adapted to the VRPSD.

The SDPD can also be seen as a generalization of the dynamic vehicle routing problem with stochastic customer requests (VRPSR).

For the VRPSR, only a few customers are known at the beginning of the horizon. Additional customers request service throughout the day, and the vehicle seeks to visit the requesting customer locations. However, time limits mean that not all of the requests can be served, and decisions are made about which subsets of requests to accept and the assignment and routing of the requests. For the VRPSR, Ulmer et al. (2018) present an approximate dynamic programming approach that uses aggregated states and approximate value iteration to develop a lookup table. They call their aggregation scheme the anticipatory time budgeting approach (ATB). Ulmer et al. (2018) demonstrate superior solution quality for the VRPSR compared to state-of-the-art approaches in Ghiani et al. (2012). We use a variant of the ATB approach as one of the benchmarks in this paper. We also extend the aggregation to include problem information specific to the SDPD and demonstrate the value that this additional information brings to the approximation of the value function. Ulmer et al. (to appear) combine ATB with a rollout approach to generate solutions that improve on those of Ulmer et al. (2018). The combination of the offline approach proposed in this paper and rollout is an opportunity for future work.

Older VRPSR literature relied on waiting strategies, strategies that have the vehicle wait in particular locations in anticipation of future requests. Mitrović-Minić and Laporte (2004) provide a nice overview of the strategies. Of the strategies, only the wait-at-start heuristic (WAS) presented by Mitrović-Minić and Laporte (2004) can be adapted to the problem in this paper. For WAS, the vehicle idles at the depot as long as possible. It is possible to adapt WAS because all assignments take place when the vehicle is located at the depot. For the SDPD, the application of WAS does not require any depot returns after the vehicle has left the depot. However, preliminary tests reveal WAS to perform poorly for the SDPD.

Also related to same-day delivery is the grocery-delivery problem. While the deliveries for grocery delivery usually take place the next day, at the time an order is placed, the decision maker must determine whether or not the order can be feasibly served on the next day's routes given the existing requests. In that case, the evaluation of each request is related to the need in this problem to evaluate the routing cost of a subset of customers. Recent work can be found in Ehmke et al. (2015) and Ehmke and Campbell (2014).

3 Problem Description and Model

In this section, we present the SDPD and model it as a Markov Decision Process (MDP).

3.1 Problem Description

We assume that a single vehicle delivers customer orders in a service area A. The vehicle starts its tour at a depot D, travels with constant speed ν, and returns before the time limit t_max. The travel time between two customers C_1, C_2 ∈ A is given by a travel time function d(C_1, C_2) ∈ N+. The service time at a customer is ζ^c. In the beginning, a set of initial orders (IOs) C_0 ⊂ A is known and must be served. We assume that these IOs are loaded on the vehicle at the beginning of the day. During the day, stochastic orders (SOs) C_+ occur in A. When the vehicle is located at a customer or at the depot, the dispatcher determines which customer to visit next or whether to return to the depot. When at the depot, the vehicle can be loaded with packages destined for realized and assigned SOs. The loading time at the depot ζ^d is independent of the number of loaded orders. We assume that the IOs are loaded before the start of the horizon. In determining which SOs should be served, the dispatcher ensures the existence of a feasible tour serving all assigned IOs, previously loaded SOs, and newly loaded SOs. We assume that, once loaded on the vehicle, the packages destined for SOs must be delivered. The objective is to maximize the expected number of SOs served over the horizon of the problem.

3.2 Markov Decision Process Model

In this section, we model the SDPD as a Markov Decision Process (MDP). The MDP for the SDPD can be formulated such that assignment decisions are made only when the vehicle is located at the depot, that is, when the orders are loaded. For algorithmic purposes, however, we consider preliminary assignments of customers at every decision epoch. These preliminary assignments are not binding and can be changed at future decisions, provided that the goods of the assigned customers have not yet been loaded on the vehicle. To facilitate these preliminary assignments, we present an alternative MDP formulation that integrates a planned tour θ_k into the state variable S_k, the decision variable x, and the reward function R(S_k, x). Because the planned tour is used only for the purposes of a heuristic solution method, it does not alter the validity of the model and could be omitted. A general discussion of the use of planned routes in models for dynamic vehicle routing can be found in Ulmer et al. (2017).

A decision epoch k occurs when the vehicle visits a customer or the depot. The decision state S_k ∈ S is defined as follows. At a minimum, the state must contain all of the data necessary for determining feasible decisions and the cost of those decisions, as well as for defining the transitions to future states. For the SDPD, we can think of the state in terms of resources and information. The resources associated with the SDPD are the current time t(k) and the location of the vehicle P_k. The information in the SDPD is the information about service requests and the statuses of those requests. For this purpose, we have three sets per state, summarized in Table 1. We let C_l(k) ⊂ C_0 ∪ C_+ be the set of customers whose packages have been loaded on the vehicle for delivery. We call these loaded customers. We denote the set of assigned, but not loaded, SOs as C_ι(k) ⊂ C_+ and the set of excluded SOs as C(k). The set C(k) denotes the currently excluded SOs and contains C_r(k) ⊂ C(k), the SOs realized between t(k−1) and t(k) for k > 0.

Table 1: MDP Customer Set Notation

  Notation          Description
  C_0               Set of initial orders
  C_+               Set of stochastic orders
  C_l(k)            Set of loaded orders in S_k
  C_ι(k)            Set of not loaded but preliminarily assigned orders in S_k
  C(k)              Set of not loaded and preliminarily excluded orders in S_k
  C_r(k) ⊂ C(k)     Set of new requests in S_k

As noted previously, we also include in the state the planned feasible tour θ_k. This planned tour θ_k = (C_1, ..., C_n, D, C_{n+1}, ..., C_{n+m}, D) defines the planned sequence of customer visits and depot returns and may contain preliminarily assigned SOs. These preliminary SOs are SOs that are part of the tour for planning purposes but are not yet loaded on the vehicle. These preliminary SOs may in fact not be served by the vehicle if later decisions change their assignment status. Thus, the state at decision epoch k is given by the tuple S_k = (t(k), θ_k, P_k, C_l(k), C_ι(k), C(k)).
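To make the state components concrete, the following minimal Python sketch mirrors the tuple S_k. The class and field names are illustrative only and do not appear in the paper.

    from dataclasses import dataclass, field
    from typing import List, Set, Tuple

    Location = Tuple[float, float]  # (x, y) coordinates in the service area

    @dataclass
    class State:
        """Pre-decision state S_k of the SDPD (illustrative field names)."""
        t: int                      # current time t(k) in minutes
        planned_tour: List[object]  # theta_k: sequence of customer and depot visits
        position: Location          # P_k: current vehicle location
        loaded: Set[int]            # C_l(k): loaded orders, by customer id
        assigned: Set[int]          # C_iota(k): assigned but not yet loaded SOs
        excluded: Set[int]          # C(k): currently excluded SOs
        new_requests: Set[int] = field(default_factory=set)  # C_r(k), subset of excluded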


Decisions x ∈ X(S_k) are made about the subset C_(ι,x)(k) ⊂ C(k) ∪ C_ι(k) to preliminarily assign, the resulting subset C_(,x)(k) to preliminarily exclude, and the corresponding update of θ_k to θ_k^x. The update of the planned tour determines the next location to visit. This location C_next ∈ {D} ∪ C_l(k) ∪ {P_k} can be chosen from the set of loaded customers, the depot, or the current location, a choice which implies that the vehicle idles at the current location for a length of time t̄. If the vehicle is located at the depot, P_k = D, the assigned SOs selected in C_(ι,x)(k) at epoch k are loaded onto the vehicle. Decision x is feasible if θ_k^x starts at P_k, serves all customers C_(l,x)(k) ∪ C_(ι,x)(k), and returns to the depot within the time limit. Formally, feasibility is given by

    d̄(θ_k^x) ≤ t_max − t(k).

Each decision results in an immediate reward. This reward is equal to the number of newly assigned SOs minus the number of excluded formerly included SOs. Formally, the reward R(S_k, x) is defined as

    R(S_k, x) = |C_(l,x)(k)| + |C_(ι,x)(k)| − |C_l(k)| − |C_ι(k)|.

The execution of x results in a post-decision state S_k^x ∈ S^x. The post-decision state represents the deterministic transition from S_k that results from the execution of x. Notably, the post-decision state captures the updated route plan and the updated sets of loaded, assigned, and excluded customers resulting from x: C_(l,x)(k), C_(ι,x)(k), C_(,x)(k). If x calls for the loading of orders, the corresponding sets of loaded and not loaded customers are modified as follows. The customers corresponding to the loaded goods are transferred from C_ι(k) to C_(l,x)(k), and C_(ι,x)(k) = ∅. The transition also updates the vehicle location to the next visit location resulting from x. Formally, given S_k and x, the post-decision state is S_k^x = (t(k), θ_k^x, C_next, C_(l,x)(k), C_(ι,x)(k), C_(,x)(k)). The next decision epoch occurs at the point of time at which the vehicle finishes service at that next location. The time of this decision epoch depends on the travel time function and is deterministic. The next decision epoch occurs at t(k+1) = t(k) + d(P_k, P_{k+1}) + ζ with ζ ∈ {ζ^c, ζ^d, 0} dependent on the next location P_{k+1}. Time ζ = 0 is the special case when k + 1 = K, that is, when the vehicle finally returns to the depot at the end of the horizon.
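As a small illustration of the feasibility condition and reward defined above, the following hedged Python sketch assumes a precomputed tour duration d̄(θ_k^x) in minutes; all names are illustrative.

    def is_feasible(tour_duration_minutes: int, t_k: int, t_max: int = 480) -> bool:
        """Feasibility of decision x: the updated planned tour must fit into the remaining horizon."""
        return tour_duration_minutes <= t_max - t_k

    def reward(loaded_after, assigned_after, loaded_before, assigned_before) -> int:
        """R(S_k, x): newly assigned SOs minus formerly included SOs that are now excluded."""
        return (len(loaded_after) + len(assigned_after)
                - len(loaded_before) - len(assigned_before))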

[Figure 1: Exemplary State, Decision, and Post-Decision State. The figure shows the state at t = 120 and the post-decision state resulting from decision x; the legend distinguishes the vehicle position, loaded customers, not loaded customers, excluded customers, and the depot.]

With each new decision epoch, there occurs a stochastic transition from the post-decision state to the next pre-decision state. This transition is defined by the realization ω_{k+1} ∈ Ω_{k+1}. This realization identifies the set of customer orders that were realized between t(k) and t(k+1). Specifically, the realization ω_{k+1} provides a set of customers C_r(k+1) = {C_1, ..., C_h}. The new pre-decision state S_{k+1} contains the time t(k+1), the remaining loaded and unloaded customers dependent on C_(l,x)(k), C_(ι,x)(k), the vehicle's location P_{k+1} = P_k^x, and the planned tour θ_{k+1}. If no customer is visited at t(k+1), C_l(k+1) = C_(l,x)(k) and θ_{k+1} = θ_k^x remain unchanged. Otherwise, the visited customer C_{k+1} is removed from the corresponding set as θ_{k+1} = θ_k^x \ {C_{k+1}}. The initial state S_0 is given by t(0) = 0, P_0 = D, the initial orders C_0, and the initial tour θ_0 = (D, D). The termination state S_K is given by t(K) = t_max, P_K = D, C_l(K) = ∅, and θ_K = (D).

The objective for the SDPD is to determine an optimal decision policy π* ∈ Π leading to the highest expected sum of rewards. Formally, the objective is

    π* = arg max_{π∈Π} E[ Σ_{k=0}^{K} R(S_k, X_k^π(S_k)) | S_0 ].     (1)

Decision rule X_k^π(S_k) determines the decision x selected by policy π in state S_k.

3.3 Example

In this section, we present an example of the components of the MDP for the SDPD. The example is given in Figure 1. The example depicts the state at time t(k) = 120, the vehicle having just served a customer. The planned tour θ_k is depicted by the dashed lines. Tour θ_k plans to visit the loaded customer on the left side of the area, return to the depot, and then visit the unloaded but assigned customer and the loaded customer in the bottom of the area. Currently, two SOs are excluded, located in the right of the service area. They may be new SOs or orders formerly excluded. The selection of x leads to a post-decision state S_k^x. In Figure 1, this post-decision state is represented on the right-hand side of the block arrow. Decision x excludes the customer on the bottom left that is not currently loaded and includes the SOs located on the right side of the service area. The next location to visit is the customer on the left side. The dashed lines represent the feasible tour θ_k^x. This planned tour may change as the vehicle returns to the depot before serving the currently assigned but not yet loaded customers. Since two orders are assigned and one is excluded, the resulting reward associated with the decision x is R(S_k, x) = 2 − 1 = 1.

4 Anticipatory Preemptive Depot Return Approach

It is well known that Equation (1) can be solved by using backward induction applied to the Bellman Equation

    V(S_k) = max_{x∈X(S_k)} { R(S_k, x) + V(S_k^x) }     (2)

with

    V(S_k^x) = E[ V(S_{k+1}) | S_k^x ].     (3)

However, the traditional backward induction approach for solving Equation (2) suffers from the "curse of dimensionality." That is, in many problems, including the problem presented in this paper, the number of states is so large that a backward induction approach is impossible both in terms of computation time and the memory needed to store the solution. This curse has led to the development of what is known as approximate dynamic programming (see Powell (2011) for an overview of approximate dynamic programming). In contrast to backward dynamic programming, approximate dynamic programming relies on a forward approach that traverses an MDP from initial to final state. However, in stepping forward, the second term of Equation (2) is unknown and must be approximated. As a result, these methods operate on the approximate Bellman Equation given by

    V̂(S_k) = max_{x∈X(S_k)} { R(S_k, x) + V̂(S_k^x) }.     (4)

There are many methods for creating this approximation. The simplest is the myopic approach that simply sets the second term of Equation (4) to zero. We will use such an approach as one of our benchmarks. However, to anticipate the impact of the current decision on future decisions and orders, we seek to learn values of the second term in Equation (4). To do so, we employ an offline simulation procedure known as Approximate Value Iteration (AVI). We highlight the specifics of our AVI procedure in §4.1. Because of the large number of states, it is not possible to implement AVI directly. We must instead operate on an aggregation of the states. We detail our aggregation scheme in §4.2. Unfortunately, in addition to the proliferation of states, the problem discussed in this paper also suffers from another curse of dimensionality, the size of the decision space. Notably, not only must we select a subset at each decision epoch, but we must also route that selected subset. In this paper, we heuristically reduce the possible decision space by using a simple routing heuristic that incorporates depot returns. The details of our procedure can be found in §4.3. Combining the three elements of the solution approach, we call our solution approach the anticipatory preemptive depot return approach (APDR).
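For intuition, a decision at runtime can be selected by evaluating Equation (4) over the candidate decisions. The sketch below is a minimal, hypothetical illustration in which candidate_decisions and value_estimate stand in for the routing heuristic of §4.3 and the lookup-table approximation of §4.2.

    def select_decision(state, candidate_decisions, reward, value_estimate):
        """Greedy solution of the approximate Bellman equation (Equation 4).

        candidate_decisions(state) yields (decision, post_decision_state) pairs,
        reward(state, decision) returns R(S_k, x), and value_estimate(post_state)
        returns the approximated value of the post-decision state.
        """
        best_decision, best_value = None, float("-inf")
        for decision, post_state in candidate_decisions(state):
            value = reward(state, decision) + value_estimate(post_state)
            if value > best_value:
                best_decision, best_value = decision, value
        return best_decision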

4.1 Approximate Value Iteration

In this section, we define our method for determining the approximate value of the second term in Equation (4). We present the method generally and write the method with respect to the post-decision state. However, in execution, we operate on an aggregated set of states. We define this aggregation in the following section.

Our AVI method is derived from Powell (2011, p. 391) and uses offline simulation to determine approximated values. The approximated values are stored in a lookup table and can then be used to solve Equation (4) in real time. Thus, the computational burden of the approach is mainly incurred before decision making is required, which greatly reduces the computational burden at runtime.

The procedure is described in Algorithm 1. AVI starts with initial values V̂_0(S_k^x) for every post-decision state S_k^x. Then, AVI iterates through a set of N sample path realizations Ω̄ = {ω^1, ..., ω^N}. At each iteration i = 1, ..., N and each step in a given sample path realization ω^i, the algorithm solves the approximate Bellman equation (line 11) using the current approximation of the post-decision state values V̂_{i−1}. Each iteration terminates at a final state S_K. The observed value of the selected decision, given in line 13, is used in line 18 to update the approximated post-decision state values. The algorithm returns values V̂_N that we use to approximate the second term of Equation (4) at runtime. For each instance described in §5, we run 5 million iterations of AVI to tune our approximations.
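A compact, hypothetical Python rendering of one AVI iteration might look as follows; simulate transitions, candidate_decisions, and the value table are placeholders for the components described in §4.2 and §4.3, not code from the paper.

    def avi_iteration(initial_state, sample_path, candidate_decisions, reward, table):
        """One iteration of approximate value iteration over a single sample path.

        table.value(post_state) returns the current approximation V-hat_{i-1};
        table.update(post_state, reward_to_go) applies the step-size update of line 18.
        """
        trajectory = []            # visited post-decision states with cumulative reward R_k
        cumulative_reward = 0
        state = initial_state
        while not state.is_terminal():
            # Solve the approximate Bellman equation for the current state (line 11).
            decision, post_state = max(
                candidate_decisions(state),
                key=lambda dp: reward(state, dp[0]) + table.value(dp[1]))
            cumulative_reward += reward(state, decision)
            trajectory.append((post_state, cumulative_reward))
            state = sample_path.transition(post_state)   # reveal new requests (stochastic step)

        total = cumulative_reward                          # R_K
        for post_state, r_k in trajectory:
            table.update(post_state, total - r_k)          # observed reward to go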

4.2 State Space Aggregation

Because of the large number of post-decision states required, we cannot actually find values for each post-decision state and instead develop an approximation that operates on aggregated post-decision states. In aggregating the post-decision states, we seek to meet two criteria first presented in Barto (1998, p. 193). First, the resulting space needs to be of a size that allows a sufficient number of observations and a reliable approximation. Second, the aggregation must maintain the main distinguishing features of the post-decision space. As a starting point, we draw on Ulmer et al. (2018), which proposes the parameters point of time t and free time budget b as the basis for aggregation for the VRPSR. The current point of time t(k) is given in the state. The free time budget b(k) follows from the current time and the current planned tour. Essentially, the free time budget is the amount of time left before the end of the horizon after serving the remaining planned tour θ_k^x starting at time t(k). For the VRPSR, the earlier it is in the horizon and the more free time budget is left, the higher the value of a post-decision state may be. Formally, we define the free time budget as

    b(k) = t_max − t(k) − d̄(θ_k^x).     (5)

Algorithm 1: Approximate Value Iteration
Input: Initial Values V̂_0, Realizations Ω̄ = {ω^1, ..., ω^N}, Step Size α
Output: Values V̂_N
 1  // Simulation
 2  i ← 1
 3  while (i ≤ N) do
 4      k ← −1
 5      R_{−1} ← 0
 6      S_0 ← S_0^{ω^i}
 7      while (S_k^x ≠ S_K) do
 8          k ← k + 1
 9          if k ≥ 1 then S_k ← (S_{k−1}^x, ω_{k−1}^i)
10          else S_k ← S_0
11          x_k ← arg max_{x∈X(S_k)} { R(S_k, x) + V̂_{i−1}(S_k^x) }
12          S_k^x ← (S_k, x)
13          R_k ← R_{k−1} + R(S_k, x)
14          S^x ← S^x ∪ {S_k^x}
15      end
16      // Update
17      for all S_k^x ∈ S^x do
18          V̂_i(S_k^x) ← (1 − α) V̂_{i−1}(S_k^x) + α (R_K − R_k)
19      end
20      i ← i + 1
21  end
22  return V̂_N

While point of time and free time budget provide a sufficient aggregation for the VRPSR, the SDPD is complicated by the return trips to the depot, and future depot returns may significantly impact the value of a state. For example, a depot return early in the horizon might find only a few orders available for loading, and, to serve customers requesting later in the horizon, the vehicle may thus need to return to the depot an additional time. Such an early return time will then likely have a relatively lower future value.

Alternatively, a depot return near the end of the horizon might find many requests that need to be served, but very little time to serve them. Then, again, the future value associated with such a depot return is low. Given the dynamic nature of the problem, we do not know exactly when the vehicle will return to the depot, if at all. However, with θ_k in the state, we do know at what time a depot return is currently scheduled. We integrate the time of this scheduled return into the aggregation. We denote the time associated with the first planned return to the depot given S_k as a(k). The proposed aggregation A : S^x → P ⊂ N^3 results in post-decision states A(S_k^x) = p_k represented by 3-dimensional vectors p_k = (t(k), b(k), a(k)) ∈ P. Representation P spans a 3-dimensional vector space as defined in Equation (6):

    P = {A(S_k^x) : S_k^x ∈ S^x}.     (6)

The value of a post-decision state S_k^x can now be represented by the value V̂ of the vector p_k: V(S_k^x) ≈ V̂(A(S_k^x)) = V̂(p_k). To operate on the aggregated state space, the values V̂_i(S_k^x) are replaced by V̂_i(p_k) in lines 11 and 18 of Algorithm 1. The application of A results in a significantly smaller state space. Since all three parameters are discrete for the SDPD, P can be associated with a 3-dimensional lookup table (LT) with dimensions t, b, a ∈ {0, ..., t_max}. Given a sufficient level of aggregation, we can store the aggregated post-decision state values in the LT.

One challenge with using an LT for storing approximated values is that the appropriate level of aggregation is not known in advance. For effective and efficient approximation, we partition the vector space with the dynamic lookup table approach (DLT) introduced in Ulmer et al. (2018). The DLT starts with a coarse-grained initial partitioning. During the approximation process, this partitioning adapts with respect to the observations and the value deviation. Entries with a high number of observations and high value deviation are considered in more detail, while other entries stay in their initial design. For the SDPD, all DLTs start with equidistant intervals of 16 minutes in each dimension. Based on preliminary tests, the disaggregation thresholds are set to τ = 3.0 for both APDR and benchmark H3 and to τ = 1.5 for both benchmark P2 and benchmark H2 introduced later in §4.4. A disaggregation divides each interval of the entry into two equidistant halves. The number of observations is distributed equally to the new entries. The standard deviation of the new entries is set to the standard deviation of the original entry. The disaggregation of an entry stops when the entry reaches an interval length of 1 minute. The update parameter α is set to the inverse of the number of observations. Ulmer et al. (2018) demonstrate the quality of this step-size rule for AVI coupled with DLT.
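To illustrate the aggregation vector (t, b, a) and the value update on the lookup table, here is a hedged Python sketch that uses a fixed interval width per dimension rather than the adaptive DLT partitioning; all names are illustrative.

    from collections import defaultdict

    def free_time_budget(t_k: int, remaining_tour_duration: int, t_max: int = 480) -> int:
        """b(k) = t_max - t(k) - d-bar(theta_k^x), as in Equation (5)."""
        return t_max - t_k - remaining_tour_duration

    class AggregatedValueTable:
        """Lookup table over aggregated post-decision states p_k = (t, b, a).

        This sketch keeps a fixed interval width; the paper's DLT instead refines
        frequently observed, high-deviation entries down to 1-minute intervals.
        """
        def __init__(self, interval: int = 16):
            self.interval = interval
            self.values = defaultdict(float)   # approximated value per aggregated vector
            self.counts = defaultdict(int)     # number of observations per aggregated vector

        def key(self, t: int, b: int, a: int):
            return (t // self.interval, b // self.interval, a // self.interval)

        def value(self, t: int, b: int, a: int) -> float:
            return self.values[self.key(t, b, a)]

        def update(self, t: int, b: int, a: int, observed_reward_to_go: float):
            """AVI update with step size alpha = 1 / number of observations."""
            k = self.key(t, b, a)
            self.counts[k] += 1
            alpha = 1.0 / self.counts[k]
            self.values[k] = (1 - alpha) * self.values[k] + alpha * observed_reward_to_go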

4.3 Subset Selection and Preemptive Depot Return Routing

The final component of our solution approach overcomes the challenge of the large decision space present in this problem. The decision space's dimensionality is vast for two reasons. First, as long as a request is not loaded onto the vehicle, the MDP model allows for reconsideration of SO-assignments. In combination with new SOs, this leads to a significant subset selection subproblem. Second, the set of potential routing plans is vast, especially when considering potential depot returns. Given this curse of dimensionality in the decision space, in the APDR, we use a heuristic to limit the number of subsets and routing schemes that must be considered when solving line 11 of Algorithm 1 and, at runtime, when solving Equation (4). These heuristics can be thought of as a means of restricting the decision space at each decision epoch.

To alleviate the subset selection complexity, at every decision epoch, APDR and the benchmark policies presented in §4.4 maintain the already assigned SOs and determine only the subset of new SOs to assign. An SO that is left unassigned at a particular decision epoch is permanently excluded. For each state on each sample path of each iteration of Algorithm 1, a decision is selected in line 11. This decision involves the selection and routing of a subset of customers. As discussed previously, the size of the decision space is such that it is impossible to solve Equation (4) optimally. Instead, for each subset of new customer requests, we heuristically generate the new planned tour θ_k^x for a state S_k, a current tour θ_k, and a set of SOs C_r. We call the approach the preemptive depot return routing approach (PDR).

To heuristically generate tours for a set of SOs, PDR draws on a modification of cheapest insertion (CI), which was first introduced by Rosenkrantz et al. (1974). We derive our implementation from that proposed in Azi et al. (2012). CI has the advantage of being efficient at every decision epoch. Further, the resulting routes are such that they are comprehensible to the driver. Because CI maintains the sequence of customers, the dispatcher might even be able to communicate approximate delivery times (Ulmer and Thomas, to appear). A downside of PDR is that it does not necessarily return optimal routes and thus may reduce the set of feasible orders.

The procedure of PDR is described in Algorithm 2 in the Appendix. The general idea is to maintain the sequence of customers, to remove the first depot visit, to integrate the new requests with CI, and to integrate the depot between the current position of the vehicle and the first not-loaded SO, again with CI.

[Figure 2: Routing and Insertion for PDR, illustrated in four steps.]

Figure 2 shows an example of PDR. The first step shows state S_k, θ_k, and the candidate set of new SOs C_r. The state contains three loaded customers, one assigned but not loaded customer, and one new customer. The current depot return is planned after serving the customer located at the top left of the service area. In the second step, PDR removes the depot visit within the sequence θ_k. The resulting tour is infeasible because a depot return is required to pick up customer orders at the depot. In the third step, PDR then inserts the candidate subset of new customers via CI. The resulting tour is again infeasible. Feasibility is restored by the addition of a depot return before the first not-loaded customer in the tour. The fourth step shows the depot return being inserted between P_k and the first not-loaded customer C_n.

Due to the stochasticity of the problem, the integration of the IOs at k = 0 may lead to an initial tour duration higher than the time limit. In these cases, the vehicle serves all IOs and none of the SOs. For k > 0, there always exists a feasible decision x assigning no new SOs to the tour. In these cases, θ_k^x is equal to θ_k.
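The cheapest-insertion step at the heart of PDR can be sketched as follows. This is a simplified illustration (Euclidean travel times, no service or loading times), not the paper's Algorithm 2; in PDR, the same routine would also be used to insert the depot return between the vehicle's current position and the first not-loaded customer.

    import math
    from typing import List, Tuple

    Point = Tuple[float, float]

    def travel(a: Point, b: Point) -> float:
        return math.dist(a, b)

    def cheapest_insertion(tour: List[Point], new_stop: Point) -> List[Point]:
        """Insert new_stop at the position that increases the tour length the least.

        The tour is a fixed sequence of stops (customers and depot visits); the
        existing visit order is preserved, as in CI.
        """
        best_pos, best_increase = 1, float("inf")
        for i in range(1, len(tour)):
            increase = (travel(tour[i - 1], new_stop) + travel(new_stop, tour[i])
                        - travel(tour[i - 1], tour[i]))
            if increase < best_increase:
                best_pos, best_increase = i, increase
        return tour[:best_pos] + [new_stop] + tour[best_pos:]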

4.4 Benchmark Heuristics

In this section, we present the benchmark heuristics that we use to test the quality of the proposed approach. As with our proposed approach, each benchmark includes a strategy for estimating the future value of a decision and a strategy for heuristically routing a subset of requests. For anticipation, we consider both the ATB aggregation proposed by Ulmer et al. (2018) for the VRPSR and a myopic assignment strategy. For routing, we consider both the preemptive method proposed in the previous section and the well-established plan-at-home heuristic (PAH). The combination of these anticipation and routing strategies results in four benchmarks. We also consider a fifth benchmark derived from combining the proposed three-parameter aggregation scheme described in §4.2 with the PAH routing scheme described subsequently.

Anticipation: Myopic and ATB. We compare our proposed anticipation approach to both ATB and myopic anticipation. As described previously, ATB is similar to the approach proposed in this paper, but the aggregation does not include information about depot returns. Thus, ATB aggregates over only the point of time and the free time budget. For this paper, the lookup table for ATB is created using an AVI procedure analogous to that described in Algorithm 1. For an additional point of comparison, we also consider using a myopic policy for anticipation. The myopic approach sets the second term of Equation (4) to zero. The myopic assignment strategy selects the decision x leading to the assignment of the largest feasible subset in every decision epoch k. If several decisions with the same subset cardinality exist, the strategy selects, among these, the decision leading to the highest free time budget.

Depot Returns: Plan at Home. This paper introduces a preemptive return strategy. As a benchmark, we consider PAH. In the benchmark, we replace the subset selection and routing discussed in §4.3 with PAH. In particular, PAH is an approach that does not account for the possibility of preemptive depot returns. Thus, at the time of the return to the depot, the vehicle is empty. Upon return, a set of new requests is selected for service, and the vehicle begins a new route. Specifically, the PAH approach is a modification of Algorithm 2 in the Appendix. The modification removes lines 2 through 6 and lines 18 through 25. In addition, line 10 is modified such that j = k, ..., |θ̄| − 1, where k is the position of the depot in θ̄. We note that the modification of line 10 means that PAH does not serve IOs and SOs on the same tours. Like the approach discussed in §4.3, PAH is a means of restricting the decision space. At each decision epoch, the PAH approach seeks to accept some newly occurring requests for inclusion on the tour that will take place once the vehicle returns to the depot. The routing of these accepted requests follows the just-described version of Algorithm 2. In the case of a myopic assignment strategy, the selection of requests amounts to choosing the maximal feasible subset of requests. In the case of the three-parameter aggregation strategy and ATB, the subset selection follows the scheme discussed in §4.3, and the routing is replaced by the modification of Algorithm 2 described in the previous paragraph. As noted in §2, Azi et al. (2012), Voccia et al. (2017), Klapp et al. (2016b), and Klapp et al. (2016a) implement PAH strategies. Our three-parameter and ATB anticipation schemes mimic the schemes in those papers that, while not making preemptive depot returns, control the length of the route to induce a depot return in the PAH scheme. Thus, the 2- and 3-dimensional aggregation schemes combined with PAH can be thought of as benchmarks inspired by the literature, particularly Azi et al. (2012) and Voccia et al. (2017).

Policy Notation. The combination of routing and assignment strategies results in six different policies Pg, Hg, g = 1, 2, 3. Parameter g indicates the assignment strategy, and P and H indicate the routing. The value g = 1 indicates myopic assignments, g = 2 the ATB assignment based on the 2-dimensional aggregation, and g = 3 the assignments based on the 3-dimensional aggregation presented in §4.2. Indicator P represents preemptive PDR routing and H PAH routing.

5 Experimental Design

In this section, we describe the test instances that we use to demonstrate the value of preemptive depot returns and of our proposed solution approach. For all instances, we assume a closed, rectangular service area A of 20km × 20km, a time horizon of 480 minutes discretized into 1-minute increments, and a vehicle speed ν of 20km/h. Assuming a minimum travel time of 1 minute, the travel time between any two points (a_1^x, a_1^y) and (a_2^x, a_2^y) in A is given by

    d(C_1, C_2) = max{ ((a_1^x − a_2^x)^2 + (a_1^y − a_2^y)^2)^(1/2) / (60^(−1) ν), 1 }.     (7)

For all instances, we also assume the service time at a customer is ζ^c = 2 minutes and the loading time at the depot is ζ^d = 5 minutes.
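A direct Python transcription of Equation (7), assuming coordinates in kilometers and speed in km/h, is given below for illustration.

    import math

    def travel_time(p1, p2, speed_kmh: float = 20.0) -> float:
        """Travel time in minutes between two points of the service area (Equation 7)."""
        distance_km = math.dist(p1, p2)
        return max(distance_km / (speed_kmh / 60.0), 1.0)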

Each instance is defined by a set of parameters: expected number of customers c, the degree of dynamism dod, depot location D, and customer distribution F. The expected number of customers is the sum of the IOs and SOs. We test instances with c ∈ {30, 40, 50, 60, 80, 100} expected customers. The degree of dynamism, first discussed in Larsen et al. (2002), is the percentage of the expected number of customers that are dynamic. That is, the degree of dynamism is the percent of customers who are SOs. We test instances with dod ∈ {0.25, 0.5, 0.75}. We denote the expected number of IOs as c0 = c · (1 − dod). To analyze the interdependency of depot location and customer distribution, we define three different depot locations and three different customer distributions for a total of 9 combinations. We set the depot locations at D1 = (10, 10), D2 = (0, 20), and D3 = (0, 0). The latter two depot locations represent the situation in which the vehicle is part of a fleet, but operates independently in a predefined service area. For customer locations, we consider uniform and clustered customer distributions. We refer to uniformly distributed customer locations as U . We define two clustered distributions of customer locations. The first is a two cluster distribution, called 2C, with the two clusters centered at µ1 = (5, 5) and µ2 = (15, 15). Customer requests are equally assigned to the clusters, and the locations follow Normal distributions with respect to the cluster centers and standard deviation of σ = 1. Finally, we define a three-cluster distribution of locations, called 3C. In 3C, the cluster centers are located at µ1 = (5, 5), µ2 = (5, 15), and µ3 = (15, 5). We assign 50% of the orders to the second cluster, 25% to each of the other clusters. The standard deviations are set to σ = 1. A summary of the instance parameters is given in Table 2. In combination, we generate a set of 162 instances. We note that, for the uniform customer distribution, depot positions D2 and D3 result in identical instance settings. For each instance setting, we generate 1,000 realizations. We apply the proposed APDR and benchmarks to every realization. The details of realization generation can be found in the Appendix.

6 Computational Evaluation

In this section, we present the results of our computational experiments. We compare the proposed APDR approach to the five previously described benchmarks. Our results demonstrate the quality of the proposed approach and also the value of preemptive depot returns. In our presentation, we characterize the instance parameters that favor preemptive depot returns and those that do not.

Table 2: Instance Parameters

  Parameter                          Values
  Service area A                     20km × 20km
  Vehicle speed ν                    20km/h
  Expected number of customers c     30, 40, 50, 60, 80, 100
  Degree of dynamism dod             0.25, 0.5, 0.75
  Depot location D ∈ A               D1 = (10, 10), D2 = (0, 20), D3 = (0, 0)
  Customer distribution F            U, 2C, 3C

6.1 Overall Solution Quality

In this section, we analyze the solution quality of the six different policies. Detailed results for every instance are available in Table A1 in the Appendix. To analyze the improvement, we measure the five benchmark policies against the APDR. We first compare the overall quality of the approaches using the average, over all instance settings, of the percentage differences in the average number of SOs served per instance setting. To do so, for each benchmark i and for the APDR, we compute the average number of SOs served over all realizations of each instance setting. For each benchmark i and instance setting j, we denote this value Q_ij, and we denote the corresponding value for the APDR approach by Q_APDR,j. Then, for every benchmark i and instance setting j, we compute the percentage difference between benchmark i and the APDR as

    (Q_APDR,j − Q_ij) / Q_APDR,j × 100%.     (8)
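Stated as code, the comparison measure of Equation (8) and its average over instance settings might look as follows (illustrative names only).

    def percentage_difference(q_apdr: float, q_benchmark: float) -> float:
        """Equation (8): relative gap of a benchmark to APDR for one instance setting."""
        return (q_apdr - q_benchmark) / q_apdr * 100.0

    def average_gap(apdr_by_setting: dict, benchmark_by_setting: dict) -> float:
        """Average percentage difference over all instance settings."""
        gaps = [percentage_difference(apdr_by_setting[j], benchmark_by_setting[j])
                for j in apdr_by_setting]
        return sum(gaps) / len(gaps)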

We then average over these percentage differences to get the average percentage difference between APDR and each benchmark i.

[Figure 3: Percentage Difference of the Average Stochastic Orders Served by APDR and the Benchmark Policies. The differences are 19.3% (H1), 18.3% (H2), 9.7% (H3), 4.6% (P1), and 7.4% (P2).]

Figure 3 presents the average percentage difference in solution quality of the approaches. On the x-axis, each benchmark policy i is depicted. On the y-axis, the percentage improvement relative to APDR is shown. Positive values indicate that APDR outperforms the benchmark. The values indicate that the proposed APDR approach is best overall. The improvement of APDR over the benchmarks is at least 4.6% and reaches 19.3% when compared to H1. The results also show that the quality of the APDR approach is due both to the preemptive returns and to the inclusion of planned depot return information in the aggregation.

With P1 being 4.6% worse than APDR and P2 being 7.4% worse than APDR, both less than the difference between APDR and the plan-at-home approaches, the results also show the relative advantage of preemptive depot returns. In §6.3, we analyze the reason that preemptive depot returns are beneficial and also characterize the instance settings in which preemptive returns provide the most value. We also observe a significant gap between APDR and benchmarks P2 and P1 as well as a significant gap between H3 and benchmarks H2 and H1. Notably, the improvement from APDR to P2 is 7.4%, even higher than that of APDR compared to P1. Recall that policy P2 is based on the 2-dimensional aggregation of the state space. This aggregation ignores the planned arrival time of a return to the depot. Likewise, H3 is 8.6 and 9.6 percentage points better than H2 and H1, respectively. These significant improvements of the 3-dimensional aggregation over the 2-dimensional case indicate the benefit of capturing the planned depot return time in the aggregation. We investigate the importance of the planned depot return time in the next section.

6.2 The Value of Including Planned Depot Arrival Time in the State Space Aggregation

In this section, we analyze the benefit of the 3-dimensional aggregation, notably the inclusion of the planned depot arrival time in the aggregation. To do this, we use an example to show how the value of the aggregated states changes with the planned depot arrival times for both the APDR and H3 approaches. Showing these changes demonstrates the sensitivity of the post-decision state value to the planned depot return time. Specifically, we focus on the instance setting in which c = 50, dod = 0.5, the customers are distributed in two clusters (2C), and the depot is in the center (D1). For this instance setting, the solution quality of APDR and H3 is nearly identical with 10.0 and 10.1 assignments on average, respectively. For the purposes of the example, we focus on time t = 180. At t = 180, for H3, the vehicle has usually not yet returned to the depot. We select a free time budget of b = 100 since preliminary tests have revealed frequent observations for the combination of t = 180 and b = 100. With the time and time budget fixed, only the planned arrival time at the depot varies in our example. As a result, only arrival times of 180 ≤ a ≤ 380 = 480 − b are possible.

For the just-described setting, Figure 4 presents the post-decision state values across planned arrival time values for both the APDR and H3 at time 180 and time budget 100. The x-axis shows the planned arrival time a and the y-axis the value for the corresponding vector. That is, the y-axis shows how many assignments are expected for the corresponding post-decision states. The solid line depicts the value of the APDR approach and the dashed line that of H3. The occasional plateaus in the values are the result of the varying interval sizes of the DLT. Figure 4 shows that the value of the post-decision state is sensitive to the planned arrival time. For example, the post-decision state value of the APDR at a = 200 is 3.86 while for a = 350, the value is 4.82, a difference of nearly 25%. Likewise, the post-decision state value of the H3 at a = 200 is 4.12 while for a = 350, the value is 4.74, a difference of nearly 15%. In contrast, P2 and H2 neglect parameter a and evaluate every post-decision state with t = 180 and b = 100 with the values 4.62 and 3.83, respectively. As a result, the performance of these benchmarks is inferior to that of APDR and H3, respectively.

[Figure 4: Value for APDR and H3, instance setting c = 50, dod = 0.5, 2C, D1; point of time t = 180, free time budget b = 100. The solid line shows the APDR values and the dashed line the H3 values over planned arrival times a from 180 to 420; the values range from roughly 3 to 5.]

The question remains as to why the value of the post-decision state is sensitive to the time of the planned depot return. To answer this question, we first examine H3. We observe an increase in the value of the post-decision state until about time a = 260. For 260 ≤ a ≤ 300, the value remains relatively constant, and it drops for a > 300. This behavior can be explained by two influencing factors. First, as more time passes, more new requests will have accumulated. Further, because it is a plan-at-home policy, the insertion costs for H3 decline as more customers are added to a tour. That is, insertion costs improve with density. Accumulating enough customers to achieve this density takes time. This factor explains the initial rise in the value of the post-decision states for H3. The second factor is the length of the initial tour. As a plan-at-home strategy, the H3 approach must finish its initial tour before returning to the depot. For the instance setting chosen for this example, the average initial tour is 288.2 minutes, and thus the H3 strategy often achieves only a single depot return. As a result, the majority of SOs requesting after the first depot return are not assigned to the vehicle. Thus, while a later arrival time allows the accumulation of more assignments and thus more efficient tours of those assignments, a depot return too late in the horizon begins to limit the number of orders that can be served.

Yet, it is important to note that, the closer the arrival time is to 380, the more the value of a planned arrival decreases. The information about the arrival time is valuable in determining the requests that should be loaded at the vehicle's first return to the depot. Essentially, the inclusion of the planned depot return time in the aggregation helps determine whether or not the second tour should be longer or shorter.

The APDR post-decision state values exhibit a different behavior than those for H3. The post-decision state values increase until a ≈ 300, though at a much slower rate of increase than is exhibited by H3. After a = 300, a behavior similar to that of H3 is observed. As with H3, the increasing post-decision state value up to a < 300 is the result of the need for the accumulation of requests and the more efficient tours that result from the accumulation. However, because preemptive returns are possible with APDR, the increase in value is slower than with H3. If the initial tour is too long, the APDR strategy can simply choose to return to the depot to pick up accumulated requests. Again though, depot returns too late in the horizon offer little value as there is simply too little time to service additional requests. We also note that, in this example, the value of APDR is generally lower than that of H3. This behavior does not imply that APDR performs worse than H3. Rather, at this point in time, APDR has already assigned more customers to the planned route. Therefore, the expected value of the future is lower than that of H3.

To further study the impact that the planned depot returns have on routing decisions, we turn to a second example. This example draws on a realization of the instance setting with 80 customers, a degree of dynamism of 0.5, two clusters of customers (2C), and the depot in the third position (D3). We choose this example because it is a good demonstration of the value of combining preemption with the planned depot return times. For this instance setting, the average initial free time budget is only b(0) = 46.1 minutes. That is, less than 10% of the horizon is available to serve new requests. The average required detour to return to the depot for APDR is δ̄ = 15.0 minutes plus five minutes of loading at the depot. Details can be found in Tables A1 and A2 in the Appendix. Figure 5 depicts the routes for the APDR and the benchmarks. The first tour is depicted by the circles, and the second tour, occurring after the depot return, by the triangles. The blank markers indicate IOs, the filled markers assigned SOs. The routing of policy APDR is shown in Figure 5a.

Figure 5: Routing for a realization of Instance c = 80, dod = 0.5, D3, and 2C

Figure 5b shows the routes for P1 and P2, which are the same for this realization. Figure 5c shows the routes for H1, H2, and H3, which are also the same for this realization. For all H-policies, the vehicle serves all IOs before returning to the depot and is only able to serve a single SO in the second tour. Policies P2 and P1, preemptive approaches that do not consider planned depot return times, exhibit returns to the depot immediately after serving the first IO. As a result, almost no SO requests have been accumulated, resulting in only two SOs being assigned to the second tour. Due to the consideration of the depot arrival time, the APDR approach avoids the early return and instead chooses to return as the vehicle is about to travel from one cluster to the next. As a result, there has been more time for SOs to accumulate, and five SOs can be integrated following the depot return.
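To make the role of the aggregated post-decision state concrete, the following Python sketch illustrates how values such as those plotted in Figure 4 could be stored and queried. It is a minimal, hypothetical illustration, not the authors' implementation: the aggregation features (point of time, free time budget, planned depot arrival time) follow the discussion above, while the class and method names are assumptions introduced here.

```python
from collections import defaultdict

class AggregatedValueTable:
    """Hypothetical lookup table for post-decision state values.

    States are aggregated to (point of time, free time budget,
    planned depot arrival time), each rounded to a fixed interval.
    """

    def __init__(self, interval=20):
        self.interval = interval
        self.values = defaultdict(float)   # aggregated key -> value estimate
        self.counts = defaultdict(int)     # aggregated key -> number of observations

    def _key(self, t, budget, planned_arrival):
        r = self.interval
        return (round(t / r) * r, round(budget / r) * r, round(planned_arrival / r) * r)

    def lookup(self, t, budget, planned_arrival):
        return self.values[self._key(t, budget, planned_arrival)]

    def update(self, t, budget, planned_arrival, observed_reward_to_go):
        # Incremental averaging of simulated rewards-to-go, as is common in
        # approximate dynamic programming with state-space aggregation.
        key = self._key(t, budget, planned_arrival)
        self.counts[key] += 1
        alpha = 1.0 / self.counts[key]
        self.values[key] += alpha * (observed_reward_to_go - self.values[key])

# Example: compare two candidate planned depot arrival times at t = 180 with
# free budget b = 100 (the values here are placeholders, not simulation output).
table = AggregatedValueTable()
table.update(180, 100, 260, 5.0)
table.update(180, 100, 340, 3.0)
best_arrival = max((260, 340), key=lambda a: table.lookup(180, 100, a))
```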

6.3 The Value of Preemptive Depot Returns

As seen in §6.1, APDR performs on average 9.7% better than H3. Yet, as the first example in the previous section indicates, there exist some instance settings for which preemption does not add value. In this section, we analyze the instance settings with respect to the improvement enabled by preemptive depot returns.

Figure 6 shows the percentage difference between the average solution value returned by APDR and that returned by H3 across all instance settings. Each column of the figure represents a degree of dynamism and each row a different number of customers. The y-axis of each row represents the depot locations, and the x-axis of each column the percentage difference of the average solution values.


Figure 6: Improvement of APDR compared to H3

Figure 6 shows a general pattern with respect to the number of customers and the degree of dynamism. As the number of customers increases and the degree of dynamism decreases, the performance of APDR compared to H3 improves. Of note, when either the number of customers is large or the degree of dynamism is low, the expected number of initial orders is relatively high. For example, with 80 customers and a degree of dynamism of 0.5, 40 IOs can be expected per realization. In cases such as this, it is more likely that a realized SO is close to an existing IO, and thus the marginal cost of serving this SO is relatively low. Preemptive returns allow APDR to take advantage of these lower marginal insertion costs. As the number of expected IOs decreases, the marginal cost of serving SOs in the existing tour increases and the value of preemption, and thus of APDR, declines. For example, in the case of 30 customers and a degree of dynamism of 0.75, depicted in the first row of the right column, only 7.5 IOs are expected per realization. Accordingly, APDR does not offer improvement.

We further examine the impact of the expected number of IOs in Figure 7. The x-axis is the expected number of IOs. The y-axis represents the improvement of APDR relative to H3 for the given expected number of IOs. A trendline runs from left to right.


Figure 7: Improvement with respect to the expected number of IOs c0

The general pattern described previously is evident. We observe an increasing positive difference between APDR and H3 with an increasing number of IOs. The trendline suggests that the shift from negative to positive happens just before 20 expected IOs. For the proposed service area size and travel speed, 20 IOs create a density such that value is gained by integrating SOs into the existing tour. This result suggests that, when partitioning an area into service zones for fleets, care should be taken to partition in a way that allows each zone to have a sufficient number of initial requests.

While there exists a general pattern with regard to the number of customers and the degree of dynamism, the patterns are less clear with regard to the depot locations and customer distributions. To better understand this phenomenon, we focus on the instance settings with 80 customers and a degree of dynamism of 0.5. These results are depicted in the second column and fifth row of Figure 6 and, for convenience, in Figure 8. The second example in §6.2 is given in the third bar from the bottom in Figure 8. In this setting, the first depot position (D1) generally results in a positive difference between APDR and H3. Essentially, the first depot position is generally placed among the customers such that the cost in terms of travel time of a depot return does not overwhelm the relatively low cost of inserting new requests into the existing route.


Figure 8: Improvement of APDR compared to H3 for c = 80, dod = 0.5

The same cannot be said for the second and third depot positions. Consider the case of the two cluster customer distribution (2C). In this case, with the depot in the third position (D3), the improvement of APDR over H3 is 72.7%, as we showed in the second example in the previous section. Yet, for the second depot position (D2), the difference is negative at −5.4%. This negative difference can be explained by the typical routing for this instance setting. The depot is located in the lower left corner of the service area, close to the first customer cluster. The second cluster is far away. For the first and third depot locations, a preemptive depot return is either conducted after serving customers in the first cluster or after serving customers in both clusters. For the second depot position, a depot return after serving the first cluster is costly. In the second case, the routing of APDR is similar to that of H3. In both cases, the potential of preemptive returns cannot be exploited. For the three cluster customer distribution (3C), the relationship is reversed: the third depot position experiences a negative difference and the second a positive difference. The difference results from the relative cost of a return to the depot and a sufficient passage of time to accumulate SOs. In essence, the depot location significantly impacts the potential of preemptive depot returns. The results related to the 2C and 3C depot locations suggest that the decision of whether or not to implement preemptive depot returns should be made for a vehicle based on the characteristics of

the service area being served by the vehicle.
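To illustrate the zone-sizing guideline above, the following sketch computes the expected number of IOs per zone for a candidate partition and checks it against roughly 20 expected IOs, the approximate crossover observed in Figure 7 for the studied service area size and travel speed. The function names and the assumption that customers split evenly across zones are illustrative, not part of the authors' method.

```python
def expected_ios_per_zone(total_customers, dod, n_zones):
    """Expected initial orders (IOs) per zone, assuming an even split across zones.

    The expected number of IOs per realization is (1 - dod) * c, e.g.,
    (1 - 0.5) * 80 = 40 or (1 - 0.75) * 30 = 7.5, as in Section 6.3.
    """
    return (1 - dod) * total_customers / n_zones

def zone_below_threshold(total_customers, dod, n_zones, threshold=20):
    """Flag partitions whose zones fall below the ~20-IO crossover from Figure 7."""
    return expected_ios_per_zone(total_customers, dod, n_zones) < threshold

# Example: 200 customers, dod = 0.5, split into 4 zones -> 25 expected IOs per zone.
partition_ok = not zone_below_threshold(200, 0.5, 4)
```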

7 Conclusion and Outlook

In this paper, we explore preemptive depot returns for the SDPD, a dynamic one-to-many pickup and delivery problem induced by a same-day delivery application. We present an anticipatory assignment and routing policy, APDR. APDR is based on approximate dynamic programming and enables explicit decisions about preemptive depot returns. In extensive computational studies, we show that preemptive depot returns, and our APDR approach in particular, increase the number of deliveries per workday. Our analysis of the computational tests shows that APDR is most beneficial when density is high enough to reduce the relative marginal cost of serving a new request. Our results also show that preemptive returns are most effective when the returns occur late enough in the horizon that enough time has passed for a sufficient number of stochastic customer requests to accumulate, but not so late that there is no longer time to serve the new requests. For a fleet of vehicles, these results provide guidelines for how the delivery area can be partitioned so that the delivery vehicles can benefit from preemptive depot returns.

There are a number of directions for future research. First, the presented state-space aggregation does not explicitly account for spatial information. Notably, the routing behavior presented in Figure 5 is not achieved for every realization. For some realizations, APDR results in the same routing as P2 and P1. Such cases might benefit from the inclusion of spatial information in the aggregation scheme. The authors are not aware of any fully offline approximate dynamic programming approach, whether it uses state-space aggregation or value-function approximation, that has successfully incorporated spatial information in the routing of vehicles. A second area of future research might consider a fleet of vehicles that are not constrained by delivery zones. As noted previously, our second and third depot positions can represent the position of a depot for a fleet divided into delivery zones, but we do not explicitly consider integrated decision making for a fleet. In the integrated fleet context, the approach presented in this paper, particularly the state-space aggregation, would require alteration to consider the impact of multiple, interacting vehicles. A third area of future research would be variants of the problem that incorporate third-party


and/or crowdsourced vehicles. In addition, APDR may be extended to communicate potential delivery times to the customers. These could also be used for pricing decisions for time windows. Finally, the general area of same-day delivery additionally offers challenges on the strategic and tactical decision levels. For instance, future research might consider suitable depot locations as well as the flow of inventory between depots.

References

Azi, N., M. Gendreau, and J.-Y. Potvin (2012). A dynamic vehicle routing problem with multiple delivery routes. Annals of Operations Research 199(1), 103–112.

Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.

Bent, R. W. and P. Van Hentenryck (2004). Scenario-based planning for partially dynamic vehicle routing with stochastic customers. Operations Research 52(6), 977–987.

Berbeglia, G., J.-F. Cordeau, and G. Laporte (2010). Dynamic pickup and delivery problems. European Journal of Operational Research 202(1), 8–15.

Bertsimas, D. J., P. Chervi, and M. Peterson (1995). Computational approaches to stochastic vehicle routing problems. Transportation Science 29(4), 342–352.

BI Intelligence (2017, February 10). National retail federation estimates 8-12% US e-commerce growth in 2017. [Online; accessed 10 November 2017].

Ehmke, J. F. and A. M. Campbell (2014). Customer acceptance mechanisms for home deliveries in metropolitan areas. European Journal of Operational Research 233(1), 193–207.

Ehmke, J. F., A. M. Campbell, and T. L. Urban (2015). Ensuring service levels in routing problems with time windows and stochastic travel times. European Journal of Operational Research 240(2), 539–550.

Ghiani, G., E. Manni, A. Quaranta, and C. Triki (2009). Anticipatory algorithms for same-day courier dispatching. Transportation Research Part E: Logistics and Transportation Review 45(1), 96–106.

Ghiani, G., E. Manni, and A. Romano (2017). Scalable anticipatory policies for the dynamic and stochastic pickup and delivery problem. Submitted for publication.

Ghiani, G., E. Manni, and B. W. Thomas (2012). A comparison of anticipatory algorithms for the dynamic and stochastic traveling salesman problem. Transportation Science 46(3), 374–387.

Goodson, J. C., B. W. Thomas, and J. W. Ohlmann (2016). Restocking-based rollout policies for the vehicle routing problem with stochastic demand and duration limits. Transportation Science 50(2), 591–607.


Goodson, J. C., B. W. Thomas, and J. W. Ohlmann (2017). A rollout algorithm framework for heuristic solutions to finite-horizon stochastic dynamic programs. European Journal of Operational Research 258(1), 216–229.

Hyytiä, E., A. Penttinen, and R. Sulonen (2012). Non-myopic vehicle and route selection in dynamic DARP with travel time and workload objectives. Computers & Operations Research 39(12), 3021–3030.

Kall, P. and S. Wallace (1994). Stochastic Programming. John Wiley & Sons.

Keyes, D. (2017, August 11). E-commerce will make up 17% of all retail sales by 2022, and one company is the main reason. [Online; accessed 10 November 2017].

Klapp, M. A., A. L. Erera, and A. Toriello (2016a). The dynamic dispatch waves problem for same-day delivery. Submitted for publication.

Klapp, M. A., A. L. Erera, and A. Toriello (2016b). The one-dimensional dynamic dispatch waves problem. Transportation Science.

Larsen, A., O. Madsen, and M. Solomon (2002). Partially dynamic vehicle routing-models and algorithms. Journal of the Operational Research Society, 637–646.

Mes, M., M. van der Heijden, and P. Schuur (2010). Look-ahead strategies for dynamic pickup and delivery problems. OR Spectrum 32(2), 395–421.

Mitrović-Minić, S. and G. Laporte (2004). Waiting strategies for the dynamic pickup and delivery problem with time windows. Transportation Research Part B: Methodological 38(7), 635–655.

Muñoz Carpintero, D., D. Sáez, C. E. Cortés, and A. Núñez (2015). A methodology based on evolutionary algorithms to solve a dynamic pickup and delivery problem under a hybrid predictive control approach. Transportation Science 49(2), 239–253.

Powell, W. (2011). Approximate Dynamic Programming: Solving the Curses of Dimensionality (Second ed.). Hoboken, NJ, USA: John Wiley and Sons.

Pureza, V. and G. Laporte (2008). Waiting and buffering strategies for the dynamic pickup and delivery problem with time windows. INFOR: Information Systems and Operational Research 46(3), 165–176.

Rosenkrantz, D. J., R. E. Stearns, and P. Lewis (1974). Approximate algorithms for the traveling salesperson problem. In Switching and Automata Theory, 1974, IEEE Conference Record of 15th Annual Symposium on, pp. 33–42. IEEE.

Sáez, D., C. E. Cortés, and A. Núñez (2008). Hybrid adaptive predictive control for the multi-vehicle dynamic pick-up and delivery problem based on genetic algorithms and fuzzy clustering. Computers & Operations Research 35(11), 3412–3438.


Secomandi, N. (2003). Analysis of a rollout approach to sequencing problems with stochastic routing applications. Journal of Heuristics 9(4), 321–352.

Thomas, K. (2017, November 6). CVS will offer next-day delivery of prescription drugs. [Online; accessed 10 November 2017].

Ulmer, M. W., J. C. Goodson, D. C. Mattfeld, and M. Hennig (to appear). Offline-online approximate dynamic programming for dynamic vehicle routing with stochastic requests. Transportation Science.

Ulmer, M. W., J. C. Goodson, D. C. Mattfeld, and B. W. Thomas (2017). Dynamic vehicle routing: Literature review and modeling framework.

Ulmer, M. W., D. C. Mattfeld, and F. Köster (2018). Budgeting time for dynamic vehicle routing with stochastic customer requests. Transportation Science 52(1), 20–37.

Ulmer, M. W., N. Soeffker, and D. C. Mattfeld (2018). Value function approximation for dynamic multi-period vehicle routing. European Journal of Operational Research.

Ulmer, M. W. and B. W. Thomas (to appear). Enough waiting for the cable guy - estimating arrival times for service vehicle routing. Transportation Science.

Voccia, S. A., A. M. Campbell, and B. W. Thomas (2017). The same-day delivery problem for online purchases. Transportation Science.

Wahba, P. (2017, August 31). Best Buy and Macy's ramp up same-day delivery in race with Amazon. [Online; accessed 10 November 2017].

Yahoo! Finance (2016, March 18). Increased value-added services expected to boost the same-day delivery market in the US, says Technavio. Yahoo! Finance. Available from http://finance.yahoo.com/news/increased--value--added--services-expected--204000803.html, accessed on July 14, 2016.

Yang, W., K. Mathur, and R. Ballou (2000). Stochastic vehicle routing problem with restocking. Transportation Science 34(1), 99–112.


Appendix

In the Appendix, we present instance generation details and the PDR algorithm, as well as the results and parameters for every instance setting.

A.1 Instance Generation Details

In the following, we describe how the realizations for the computational evaluation are generated. The number of customers and the order times for a realization are generated by a Poisson process P. With c0 the expected number of IOs, the number of IOs is generated by P(c0). The spatial and temporal probability distribution for order times and locations is divided into two independent probability distributions. The times of SO occurrences are (discretely) uniformly distributed, t ∼ U_Z[1, t_max − 1]. Customer locations f(C) ∈ A are realizations f ∼ F of the spatial probability distribution F : A → [0, 1]. A realization of the order times is again conducted by a Poisson process P for every minute 0 < t < t_max. Given two points of time 0 < t_j < t_h < t_max, this results in an expected number of customers c_{t_j}^{t_h} = E_{ω∈Ω} |{C_i^ω ∈ C_+^ω : t_j < t_i ≤ t_h}| ordering in times t_j < t_i ≤ t_h, as described in Equation (A1):

c_{t_j}^{t_h} = dod · c · (t_h − t_j) / (T − 2).    (A1)
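For concreteness, the following Python sketch shows one way a single realization could be generated under the description above. It is a minimal sketch under assumptions: the service area A is taken as the unit square with a uniform spatial distribution, and the per-minute request rate is obtained from Equation (A1) with the horizon parameter T taken equal to t_max. The function names are illustrative and not those of the authors' implementation.

```python
import math
import random

def draw_poisson(rng, lam):
    """Draw from a Poisson distribution via Knuth's method (fine for the rates used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def expected_sos(dod, c, t_j, t_h, T):
    """Expected number of SOs ordering in (t_j, t_h], per Equation (A1)."""
    return dod * c * (t_h - t_j) / (T - 2)

def generate_realization(c=80, dod=0.5, t_max=360, seed=None):
    """Generate one realization of initial orders (IOs) and stochastic orders (SOs).

    Illustrative assumptions: the service area is the unit square with a uniform
    spatial distribution; the number of IOs is Poisson with mean c0 = (1 - dod) * c;
    SO arrivals follow a per-minute Poisson process whose rate is consistent with
    Equation (A1) when T is taken equal to t_max.
    """
    rng = random.Random(seed)
    c0 = (1 - dod) * c                                   # expected number of IOs
    ios = [(rng.random(), rng.random()) for _ in range(draw_poisson(rng, c0))]

    rate = expected_sos(dod, c, 0, 1, t_max)             # expected SOs per minute
    sos = []
    for t in range(1, t_max):                            # one Poisson draw per minute
        for _ in range(draw_poisson(rng, rate)):
            sos.append((t, (rng.random(), rng.random())))
    return ios, sos

# Example: one realization of the c = 80, dod = 0.5 setting from Section 6.2.
ios, sos = generate_realization(c=80, dod=0.5, seed=1)
```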

A.2 Preemptive Depot Returns: Algorithm

This section presents a detailed algorithm for the PDR routing heuristic that was described in §4.3. Let D denote the depot, Pk the vehicle's position, and Cl the loaded IOs. Further, we let Cn represent the assigned unloaded SOs. The current planned tour can then be described as θk = (Pk, Cl, ..., Cl, D, Cn, Cl, Cn, ..., Cl, D). Let θk^j refer to the j-th component of θk, e.g., θk^1 = Pk. Further, let Cr = {Cr^1, ..., Cr^h} be the subset of new SOs to assign. PDR first removes the depot from θk, leading to an infeasible tour θ̄. In this infeasible tour, the customers Cr are subsequently inserted via CI at the cheapest position. Procedure Insert(θ̄, θ*, C*) inserts the new order C* after θ* in tour θ̄. When all new customers are inserted, the depot is inserted between the current position and the first not yet loaded customer (Cn or Cr) via CI, resulting in a tour θk^x. If θk^x does not violate the time limit, the tour is feasible. We assume an initial tour θ0 = (D, D) without customers, starting and ending at the depot.

Algorithm 2: Preemptive Depot Returns (PDR)
Input: Tour θk = (Pk, Cl, ..., Cl, D, Cn, Cl, Cn, ..., Cl, D), new orders Cr = {Cr^1, ..., Cr^h}
Output: New tour θ̄

// Remove depot
θ̄ ← ∅
for all θk^i, i = 1, ..., |θk| − 1 do
    if θk^i ≠ D then θ̄ ← θ̄ ∪ {θk^i}
end
θ̄ ← θ̄ ∪ {D}
// Integrate orders
while Cr ≠ ∅ do
    δ ← M
    for all C^i, θ̄^j, i = 1, ..., |Cr|, j = 1, ..., |θ̄| − 1 do
        if d(θ̄^j, C^i) + d(C^i, θ̄^(j+1)) − d(θ̄^j, θ̄^(j+1)) ≤ δ then
            C* ← C^i, θ* ← θ̄^j
            δ ← d(θ̄^j, C^i) + d(C^i, θ̄^(j+1)) − d(θ̄^j, θ̄^(j+1))
        end
    end
    θ̄ ← Insert(θ̄, θ*, C*), Cr ← Cr \ {C*}
end
// Integrate depot
δ ← M, j ← 0
while θ̄^j ∉ {Cn, Cr, D} do
    if d(θ̄^j, D) + d(D, θ̄^(j+1)) − d(θ̄^j, θ̄^(j+1)) ≤ δ then
        θ* ← θ̄^j, δ ← d(θ̄^j, D) + d(D, θ̄^(j+1)) − d(θ̄^j, θ̄^(j+1))
    end
    j ← j + 1
end
θ̄ ← Insert(θ̄, θ*, D)
return θ̄

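To complement the pseudocode, the following Python sketch implements the same cheapest-insertion logic under simplifying assumptions: customers are coordinate tuples, d(·,·) is Euclidean travel time, the distinction between loaded and unloaded stops is carried by a set of unloaded stops, and the time-limit feasibility check is omitted. The function and variable names are illustrative and not taken from the authors' implementation.

```python
import math

DEPOT = (0.0, 0.0)   # illustrative depot location

def dist(a, b):
    """Euclidean travel time between two stops (illustrative metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def pdr(tour, new_orders, unloaded):
    """Cheapest-insertion sketch of the PDR heuristic (cf. Algorithm 2).

    tour:       list of coordinates; tour[0] is the vehicle position, DEPOT marks depot stops.
    new_orders: list of newly assigned SO coordinates to integrate.
    unloaded:   set of coordinates whose packages are not yet on board.
    Returns the new planned tour with a single re-inserted depot return.
    """
    # Remove all depot occurrences, then re-append the closing depot stop.
    t = [s for s in tour if s != DEPOT] + [DEPOT]
    unloaded = set(unloaded) | set(new_orders)

    # Integrate new orders at their cheapest insertion positions.
    remaining = list(new_orders)
    while remaining:
        best_delta, best_j, best_c = math.inf, None, None
        for c in remaining:
            for j in range(len(t) - 1):
                delta = dist(t[j], c) + dist(c, t[j + 1]) - dist(t[j], t[j + 1])
                if delta < best_delta:
                    best_delta, best_j, best_c = delta, j, c
        t.insert(best_j + 1, best_c)
        remaining.remove(best_c)

    # Re-insert a depot visit at the cheapest position before the first
    # stop whose package is not yet loaded.
    best_delta, best_j = math.inf, 0
    j = 0
    while j < len(t) - 1 and t[j] not in unloaded:
        delta = dist(t[j], DEPOT) + dist(DEPOT, t[j + 1]) - dist(t[j], t[j + 1])
        if delta < best_delta:
            best_delta, best_j = delta, j
        j += 1
    t.insert(best_j + 1, DEPOT)
    return t

# Example: vehicle at (0.2, 0.2) with two loaded IOs and one newly assigned SO.
tour = [(0.2, 0.2), (0.5, 0.1), (0.8, 0.4), DEPOT]
new_tour = pdr(tour, new_orders=[(0.6, 0.3)], unloaded=set())
```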

A.3 Detailed Results

In this section, we present the detailed results for the computational experiments discussed in §6. The first table (Table A1) presents the average number of assignments for APDR and each benchmark over all realizations for each instance setting. The second table (Table A2) presents the average number of depot returns, the average time required for a depot return, and the initial free time budget over all


realizations for each instance setting for APDR.


Table A1: Average Number of Assignments

dod   Depot  c   |  U: APDR  P2  P1  H3  H2  H1  |  2C: APDR  P2  P1  H3  H2  H1  |  3C: APDR  P2  P1  H3  H2  H1

0.25 D1

30

2.7

2.7

2.7

2.4

2.2 2.3

4.6

4.6

4.6

4.5

4.5

4.5

4.5

4.5

4.5

4.6

4.5

4.5

0.5

D1

30

6.4

6.3

6.1

6

5.5 5.6

8.6

8.6

8.7

8.5

8.5

8.5

8.3

8.2

8.3

8.3

8.3

8.3

0.75 D1

30

0.25 D2

30

1.1

1

1.1

1

1

3.2

3.1

3.1

3.3

3.3

3.3

2.6

2.5

2.6

2.8

2.6

2.8

0.5

D2

30

3.8

3.5

3.6

3.8

3.3 3.5

6.5

6.3

6.3

6.9

6.8

6.7

5.6

5.3

5.4

6.4

6.3

6.1

0.75 D2

30

7.3

6.9

7

7.8

7.7 7.2

10

9.6

9.8 10.6 10.4 10.5

9

8.8

8.8

9.7

9.6

9.5

0.25 D3

30

1.1

1.1

1.1

1

1

2.2

2.2

2.2

2.2

2

2.1

3.2

3.1

3.2

3.5

3.4

3.4

0.5

D3

30

3.8

3.5

3.6

3.8

3.3 3.5

4.7

4.5

4.6

5.3

5.2

5.1

6.4

6

6.1

6.9

6.8

6.7

0.75 D3

30

7.3

7

7

7.8

7.7 7.2

8.1

7.9

8

8.8

8.8

8.6

10.1

9.6

9.6 10.6 10.2 10.3

0.25 D1

40

1.5

1.4

1.4

1.1

1

4.9

4.8

4.9

4.8

4.6

4.7

4.4

4.4

4.4

4.4

3.8

4.3

0.5

D1

40

5.9

5.6

5.4

5

4.4 4.3

9.6

9.3

9.6

9.8

9.5

9.6

9

8.8

8.9

9.3

8.8

9

0.75 D1

40

0.25 D2

40

0.4

0.4

0.4

0.4

0.4 0.4

3.2

3.1

3.1

3.3

3.2

3.2

2.2

2.1

2.2

2.2

1.9

2.1

0.5

D2

40

3

2.6

2.8

2.8

2.3 2.5

7.3

6.7

6.9

7.8

7.6

7

5.6

5

5.4

6.3

5.8

5.6

0.75 D2

40

7.7

7.1

7.2

8.1

7.7 6.6

11.8 10.9 11.3 12.6 12.4 12.3

10.2

9.7

9.9 11.5 11.4 10.7

0.25 D3

40

0.4

0.4

0.4

0.3

0.3 0.3

1.7

1.6

1.7

1.1

1

1.1

2.8

2.5

2.7

3.1

2.5

2.9

0.5

D3

40

2.9

2.6

2.8

2.8

2.3 2.5

5.2

4.9

4.9

4.9

4.6

4.5

6.8

6

6.3

7.6

7.3

6.7

0.75 D3

40

7.8

7.3

7.3

8.3

7.9 6.7

9.2

8.8

8.8 10.2

10

9.3

0.25 D1

50

0.4

0.4

0.4

0.3

0.3 0.3

4.6

4.4

4.5

4.2

3.4

4

3.7

3.6

3.6

3.2

2.6

3

0.5

D1

50

4.6

4.2

3.9

3.5

3 2.8

10

9.6

9.9 10.1

9.3

9.3

9.3

8.9

8.9

9.3

7.9

8.3

0.75 D1

50

0.25 D2

50

0.1

0.1

0.1

0.1

0.1 0.1

2.7

2.4

2.6

2.9

2.2

2.8

1.3

1.1

1.2

1.2

1

1.1

0.5

D2

50

1.8

1.7

1.7

1.7

1.4 1.5

7.4

6.4

6.7

7.8

7.6

6.5

5.4

4.6

5.1

5.5

4.7

4.5

0.75 D2

50

7.8

7.2

7.3

8.1

7.5

0.25 D3

50

0.1

0.1

0.1

0.1

0.1 0.1

0.8

0.7

0.8

0.3

0.2

0.3

2

1.6

1.8

2.1

1.5

2

0.5

D3

50

1.8

1.6

1.7

1.7

1.4 1.5

4.9

4.4

4.5

3.4

3.1

3.1

6.7

5.4

5.8

7.4

6.2

5.8

0.75 D3

50

8

7.4

7.3

8.1

7.4 5.9

0.25 D1

60

0.1

0.1

0.1

0

0.5

D1

60

2.6

2.3

2.3

0.75 D1

60

0.25 D2

60

0.5

10.6 10.5 10.1 10.3 10.2 9.7 0.9

0.9

1

11.5 11.2 10.6 10.9 10.6 9.5

12.2 11.7 10.8 11.4 10.5 8.9

6

12.4 12.3 12.5 12.3 12.1 12.1

14.6 14.4 14.8 14.7

14 14.2

16.4 15.9 16.3 16.3 15.8 15.8

13.2 11.7 12.4 13.7 13.4 12.7

12.2 12.2 12.2 12.4 12.2 12.2

14.1 13.8 13.9 14.3 13.9 13.9

11.7 10.7 10.9 12.5 12.1 11.6

15.3 14.8

15 15.7 15.4 14.8

11.2 10.4 10.7 12.6 12.5 10.7

10

9.6

9.4 10.6 10.3

8.6

0

3.9

3.8

3.8

3

2.2

2.8

2.5

2.3

2.3

1.6

1.3

1.6

1.8

1.6 1.5

10

9.6

9.6

9.8

8.4

8.3

9.1

8.5

8.5

8.4

6.3

6.9

12.2 11.4 10.1 10.7

9.3 7.6

17.7

0

0

0

0

0

0

0

17 17.3 17.6 17.1 16.9

1.7

1.4

1.6

1.8

1.3

1.8

7.3

5.9

6.4

7.9

6.6

6.4

12.7 10.9 11.5 13.5 13.3 11.5

16.4 15.7 15.7 16.7 16.2

15

0.6

0.5

0.6

0.5

0.4

0.5

4.6

3.8

4.2

4.3

3.2

3.5

11 12.7 12.2

9.5

D2

60

1

0.9

0.9

0.9

0.8 0.8

0.75 D2

60

7.6

6.9

6.8

7.5

6.6 5.1

0.25 D3

60

0

0

0

0

0.5

D3

60

0.8

0.7

0.8

0.75 D3

60

7.6

6.9

6.9

0.25 D1

80

0

0

0

0

0

1.4

1.3

1.3

0.5

D1

80

0.4

0.4

0.4

0.3

0.3 0.2

9.2

8.5

7.9

0.75 D1

80

11.2 10.1

8.5

8.7

7.2 5.6

19.3

0.25 D2

80

0

0

0

0

0

0.2

0.2

0.2

0.2

0.1

0.2

0

0

0

0

0

0

0.5

D2

80

0.1

0.1

0.1

0.1

0.1 0.1

4.8

3.6

4.1

5.1

3.1

4.4

2.1

1.9

2

1.8

1.2

1.6

0.75 D2

80

5.7

5.1

5.1

5.3

4.1 3.8

14.9 12.5 13.5 15.8 15.5

9.6

11.3 10.3 10.8 11.4 10.8

7.1

0.25 D3

80

0

0

0

0

0.5

D3

80

0.1

0.1

0.1

0.75 D3

80

5.5

4.9

0.25 D1

100

0

0

0.5

D1

100

0

0.75 D1

100

0.25 D2

100

0.5

D2

0

14.3 12.3 13.1 14.7 14.4 11.8

11.4 10.4

0

0.2

0.1

0.2

0

0

0

0.9

0.7

0.8

0.9

0.6

0.9

0.7

0.6 0.7

3.9

3.3

3.5

1.9

1.6

1.7

6

4.4

4.9

6.7

4.3

5.2

7.5

6.3 5.2

10.7 10.2 10.2 10.3

10

7.5

0.5

0.4

0.5

0.3

0.3

0.3

0.1

0.1

0.1

7.2

4.8

5.2

6.4

5.4

5.4

4.4

3

3.4

0

0

0

18 17.9 18.7 17.6 15.7

13.5 11.4 11.9 14.4 14.2 10.9

17.7 16.5 16.1 17.2 15.8 12.7

0

0

0

0

0

0

0

0

0

0

0

0

0

0.1

0.1 0.1

1

0.9

0.9

0.3

0.2

0.3

2.9

2.3

2.5

2.9

1.7

2.4

4.9

5.1

3.9 3.6

11.3 10.5 10.1

8.5

7.8

5.8

13.8 11.2 11.9 15.3 14.5

8.8

0

0

0

0

0.1

0.1

0.1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

6.2

5.5

4.9

3.1

2

2.4

2.8

2.3

2.4

1.4

1.1

1.2

7.9

7

5.7

5.3

20.2 18.3 17.2 18.4 17.4

12

17.8 15.6 14.9 15.7 14.1

9.8

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

100

0

0

0

0

0

0

1.8

1.5

1.7

1.6

1.1

1.4

0.4

0.4

0.4

0.4

0.3

0.3

0.75 D2

100

3.5

3

3

3.1

14.2 11.9 12.9 15.7 14.3

8.3

10.4

9.4

9.9

9.7

8.7

5.6

0.25 D3

100

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0.5

D3

100

0

0

0

0

0

0

0.1

0.1

0.1

0

0

0

0.6

0.5

0.6

0.5

0.4

0.5

0.75 D3

100

3.6

3.1

3.2

3.2

2.4 2.4

10.4

9.2

8.5

5.7

5.3

4.2

12.3

9.9 10.7 13.4 11.3

7

4.6 3.5

2.4 2.3


Table A2: Parameters for APDR

Visits = average number of depot returns; Detour = average detour δ̄ for a depot return (minutes); Budget = average initial free time budget b(0) (minutes).

dod   Depot  c    |  U: Visits  Detour  Budget  |  2C: Visits  Detour  Budget  |  3C: Visits  Detour  Budget
0.25  D1     30   |  2.4   18.7   95.3   |  3.7   23.4  207.2   |  3.6   21.8  189.2
0.5   D1     30   |  3.4   21.7  173.2   |  4.5   25.3  255.7   |  4.4   24    240.9
0.75  D1     30   |  4.2   23.1  268.3   |  5     26.6  311.9   |  5     24.7  308.8
0.25  D2     30   |  1.5   31     76.4   |  2.6   52.9  181.4   |  2.2   63.8  155.6
0.5   D2     30   |  2.2   56.3  148.2   |  3.2   50.7  228.3   |  2.7   66.4  210.3
0.75  D2     30   |  2.6   61.7  235.7   |  3.4   50.2  277.1   |  3     66.8  271.9
0.25  D3     30   |  1.5   31     77.3   |  1.9   78.2  150.4   |  2.5   51.5  162
0.5   D3     30   |  2.2   56.8  148     |  2.2   88.8  201     |  3.1   50.5  215.8
0.75  D3     30   |  2.6   61.6  236.6   |  2.6   88.1  257.7   |  3.4   51.9  281.8
0.25  D1     40   |  1.3    9.1   38.5   |  3.4   22.7  160.7   |  3.1   21.2  136.6
0.5   D1     40   |  2.8   18.8  121     |  4.2   23.9  223.6   |  4.1   22.3  205.5
0.75  D1     40   |  3.8   22    232.8   |  4.8   25.8  292.1   |  4.7   24.1  284.9
0.25  D2     40   |  0.8   10.4   28.1   |  2.5   43.8  136.9   |  2     48.7  107.7
0.5   D2     40   |  1.8   42.6   99.6   |  3.1   45.7  195.8   |  2.5   60.7  171
0.75  D2     40   |  2.4   56.5  202.3   |  3.4   46.8  257.8   |  2.9   62.8  248.8
0.25  D3     40   |  0.8    9.8   27.1   |  1.6   55.2  103.4   |  2.2   42.4  110.8
0.5   D3     40   |  1.8   43.1  100.9   |  2     85.5  165.8   |  2.8   46.9  177.7
0.75  D3     40   |  2.4   56    205.6   |  2.4   89.5  235.6   |  3.3   47.9  255.2
0.25  D1     50   |  0.4    2.5    9.3   |  2.9   18.9  118.5   |  2.5   17.4   90.5
0.5   D1     50   |  2.1   14.6   77.1   |  3.9   21.5  191.8   |  3.7   20.3  168.7
0.75  D1     50   |  3.5   20.5  203.5   |  4.5   24.1  273.1   |  4.5   22.8  260.2
0.25  D2     50   |  0.2    1.5    5.5   |  2.1   37.6   94.1   |  1.5   28.9   64.2
0.5   D2     50   |  1.4   24.8   56.9   |  2.8   43.9  167.5   |  2.2   54.6  138.6
0.75  D2     50   |  2.3   53.3  176.9   |  3.4   43.9  243.1   |  2.8   57.9  228.4
0.25  D3     50   |  0.2    1.6    5.4   |  1.2   24.3   61.6   |  1.6   29.1   66.9
0.5   D3     50   |  1.4   24.7   57.4   |  1.8   76.1  133.5   |  2.4   45.4  145.5
0.75  D3     50   |  2.3   52.2  177.5   |  2.1   91.9  217.3   |  3.2   44.8  236.7
0.25  D1     60   |  0.1    0.3    1.3   |  2.4   13.5   78.4   |  1.7   11.1   47.6
0.5   D1     60   |  1.2    8.8   38.1   |  3.4   19.1  161.5   |  3.2   18.8  136.4
0.75  D1     60   |  3.2   19    172.4   |  4.3   23    255.3   |  4.3   22.6  241.3
0.25  D2     60   |  0      0.2    0.6   |  1.4   26.2   57.6   |  0.9   11.5   29.4
0.5   D2     60   |  0.9   11.4   28.8   |  2.4   42.4  136.9   |  2     45.3  107.9
0.75  D2     60   |  2.1   49.7  147.8   |  3.2   42.3  228.1   |  2.6   56.9  208.4
0.25  D3     60   |  0      0.1    0.5   |  0.7    5.2   25.3   |  1     14.2   32.9
0.5   D3     60   |  0.9   11.7   28.4   |  1.5   56.5  101.8   |  2     42    112.5
0.75  D3     60   |  2.1   49    149.3   |  2     91    200.7   |  2.9   43.8  216.4
0.25  D1     80   |  0      0      0     |  0.8    3.2   17.8   |  0.3    1.4    5.1
0.5   D1     80   |  0.3    1.6    6.3   |  2.7   13.6  104.5   |  2.1   13.7   73
0.75  D1     80   |  2.5   16.2  121.3   |  3.8   21.5  222.6   |  3.7   20.3  203.9
0.25  D2     80   |  0      0      0     |  0.4    3.4    9     |  0.1    0.4    1.8
0.5   D2     80   |  0.2    1.1    3.1   |  1.7   34.3   81.6   |  1.3   24.6   51
0.75  D2     80   |  1.8   38.8  100.5   |  2.8   40.7  198     |  2.3   51.7  173.2
0.25  D3     80   |  0      0      0     |  0.1    0      1.3   |  0.1    0.7    2.3
0.5   D3     80   |  0.1    0.8    2.5   |  1     15     46.1   |  1.3   24.7   53.3
0.75  D3     80   |  1.8   38.3   99.5   |  1.8   81.5  164.7   |  2.4   42.9  177.2
0.25  D1    100   |  0      0      0     |  0.1    0.1    0.9   |  0      0.1    0.2
0.5   D1    100   |  0      0.1    0.3   |  1.7    7     54     |  1      6.6   28.7
0.75  D1    100   |  1.7   12.9   76.4   |  3.4   19.4  190.7   |  3.2   18.4  170.1
0.25  D2    100   |  0      0      0     |  0      0.1    0.4   |  0      0      0
0.5   D2    100   |  0      0      0     |  1.1   15.8   35.5   |  0.5    5.8   14.5
0.75  D2    100   |  1.4   24.4   57.6   |  2.4   39.6  168.2   |  2.1   46.2  139.6
0.25  D3    100   |  0      0      0     |  0      0      0     |  0      0      0
0.5   D3    100   |  0      0.1    0.1   |  0.4    1.4   12.3   |  0.5    6.2   14.7
0.75  D3    100   |  1.4   24.6   60.4   |  1.6   70.6  132.9   |  2     40.9  144.1
