Dynamic Programming Approximations for a Stochastic Inventory Routing Problem

Anton J. Kleywegt∗   Vijay S. Nori   Martin W. P. Savelsbergh

School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332-0205

August 28, 2002
Abstract

This work is motivated by the need to solve the inventory routing problem when implementing a business practice called vendor managed inventory replenishment (VMI). With VMI, vendors monitor their customers' inventories, and decide when and how much inventory should be replenished at each customer. The inventory routing problem attempts to coordinate inventory replenishment and transportation in such a way that the cost is minimized over the long run. We formulate a Markov decision process model of the stochastic inventory routing problem, and propose approximation methods to find good solutions with reasonable computational effort. We indicate how the proposed approach can be used for other Markov decision processes involving the control of multiple resources.
∗ Supported by the National Science Foundation under grant DMI-9875400.
Introduction

Recently the business practice called vendor managed inventory replenishment (VMI) has been adopted by many companies. VMI refers to the situation in which a vendor monitors the inventory levels at its customers and decides when and how much inventory to replenish at each customer. This contrasts with conventional inventory management, in which customers monitor their own inventory levels and place orders when they think that it is the appropriate time to reorder.

VMI has several advantages over conventional inventory management. Vendors can usually obtain a more uniform utilization of production resources, which leads to reduced production and inventory holding costs. Similarly, vendors can often obtain a more uniform utilization of transportation resources, which in turn leads to reduced transportation costs. Furthermore, additional savings in transportation costs may be obtained by increasing the use of low-cost full-truckload shipments and decreasing the use of high-cost less-than-truckload shipments, and by using more efficient routes by coordinating the replenishment at customers close to each other. VMI also has advantages for customers. Service levels may increase, measured in terms of reliability of product availability, due to the fact that vendors can use the information that they collect on the inventory levels at the customers to better anticipate future demand, and to proactively smooth peaks in the demand. Also, customers do not have to devote as many resources to monitoring their inventory levels and placing orders, as long as the vendor is successful in earning and maintaining the trust of the customers.

A first requirement for a successful implementation of VMI is that a vendor is able to obtain relevant and accurate information in a timely and efficient way. One of the reasons for the increased popularity of VMI is the increase in the availability of affordable and reliable equipment to collect and transmit the necessary data between the customers and the vendor. However, access to the relevant information is only one requirement. A vendor should also be able to use the increased amount of information to make good decisions. This is not an easy task. In fact, it is a very complicated task, as the decision problems involved are very hard. The objective of this work is to develop efficient methods to help the vendor to make good decisions when implementing VMI.

In many applications of VMI, the vendor manages a fleet of vehicles to transport the product to the customers. The objective of the vendor is to coordinate the inventory replenishment and transportation in such a way that the total cost is minimized over the long run. The problem of optimal coordination of inventory replenishment and transportation is called the inventory routing problem (IRP). In this paper, we study the problem of determining optimal policies for the variant of the IRP in which a single product is distributed from a single vendor to multiple customers. The demands at the customers are assumed to have probability distributions that are known to the vendor. The objective is to maximize the expected discounted value, incorporating sales revenues, production costs, transportation costs, inventory holding costs, and shortage penalties, over an infinite horizon.
Our work on this problem was motivated by our collaboration with a producer and distributor of air products. The company operates plants worldwide and produces a variety of air products, such as liquid nitrogen, oxygen and argon. The company's bulk customers have their own storage tanks at their sites, which are replenished by tanker trucks under the supplier's control. Approximately 80% of the bulk customers participate in the company's VMI program. For the most part each customer and each vehicle is allocated to a specific plant, so that the overall problem decomposes according to individual plants. Also, to improve safety and reduce contamination, each vehicle and each storage tank at a customer is dedicated to a particular type of product. Hence the problem also decomposes according to type of product. (This assumption does not hold if the number of drivers is a tight constraint, and drivers can be allocated to deliver one of several different products.) Therefore, in this paper we consider an inventory routing problem with a single vendor, multiple customers, multiple vehicles, and a single type of product.

The main contributions of the research reported in this paper are as follows:

1. In an earlier paper (Kleywegt et al., 2002), we formulated the inventory routing problem with direct deliveries, i.e., one delivery per trip, as a Markov decision process and proposed an approximate dynamic programming approach for its solution. In this paper, we extend both the formulation and the approach to handle multiple deliveries per trip.

2. We present a solution approach that uses decomposition and optimization to approximate the value function. Specifically, the overall problem is decomposed into smaller subproblems, each designed to have two properties: (1) it provides an accurate representation of a portion of the overall problem, and (2) it is relatively easy to solve. In addition, an optimization problem is defined to combine the solutions of the subproblems, in such a way that the value of a given state of the process is approximated by the optimal value of the optimization problem.

3. Computational experiments demonstrate that our approach allows the construction of near optimal policies for small instances and policies that are better than policies that have been proposed in the literature for realistically sized instances (with approximately 20 customers). The sizes of the state spaces for these instances are orders of magnitude larger than those that can be handled with more traditional methods, such as the modified policy iteration algorithm.

In Section 1 we define the stochastic inventory routing problem, point out the obstacles encountered when attempting to solve the problem, present an overview of the proposed solution method, and review related literature. In Section 2 we propose a method for approximating the dynamic programming value function. In Section 3 the day-to-day control of the IRP process using the dynamic programming value function approximation is discussed. In Section 4 we investigate a special case of the IRP. Computational
results are presented in Section 5, and Section 6 concludes with some remarks regarding the application of the approach to other stochastic control problems.
1 Problem Definition
A general description of the IRP is given in Section 1.1, after which a Markov decision process formulation is given in Section 1.2. Section 1.3 discusses the issues to be addressed when solving the IRP, and Section 1.4 presents an overview of the proposed solution method. Section 1.5 reviews some related literature.
1.1 Problem Description
A product is distributed from a vendor's facility to N customers, using a fleet of M homogeneous vehicles, each with known capacity C̃. The process is modeled in discrete time t = 0, 1, . . ., and the discrete time periods are called days. Let random variable U_it denote the demand of customer i at time t, and let U_t ≡ (U_1t, . . . , U_Nt) denote the vector of customer demands at time t. Customers' demands on different days are independent random vectors with a joint probability distribution F that does not change with time; that is, U_0, U_1, . . . is an independent and identically distributed sequence, and F is the probability distribution of each U_t. The probability distribution F is known to the decision maker. (In many applications customers' demands on different days may not be independent; in such cases customers' demands on previous days may provide valuable data for the forecasting of customers' future demands. A refined model with a suitably expanded state space can be formulated to exploit such additional information. Such refinement is not addressed in this paper.)

There is an upper bound C_i on the amount of product that can be in inventory at each customer i. This upper bound C_i can be due to limited storage capacity at customer i, as in the application that motivated this research. In other applications of VMI, there is often a contractual upper bound C_i, agreed upon by customer i and the vendor, on the amount of inventory that may be at customer i at any point in time. One motivation for this contractual bound is to prevent the vendor from dumping too much product at the customer. The vendor can measure the inventory level X_it of each customer i at any time t.

At each time t, the vendor makes a decision that controls the routing of vehicles and the replenishment of customers' inventories. Such decisions may have many aspects, some of which are important for the method developed in this paper, and others which are not. Aspects of daily decisions that are important for the method developed in this paper are the following:

1. which customers' inventories to replenish,
2. how much to deliver at each customer, and
3. how to combine customers into vehicle routes.

On the other hand, the ideas developed in the paper are independent of the routing constraints that are imposed, and thus routing constraints are not explicitly spelled out in the formulation. Unless otherwise stated, we assume that each vehicle can perform at most one route per day. We also assume that the duration of the task assigned to each driver and vehicle is less than the length of a day, so that all M drivers and vehicles are available at the beginning of each day, when the tasks for that day are assigned.

The expected value (revenues and costs) accumulated during a day depends on the inventory levels and decision of that day, and is known to the vendor. As in the case of the routing constraints, the ideas developed in the paper are independent of the exact composition of the costs of the daily decisions. Next we describe some typical types of costs for illustrative purposes. (These costs were also used in numerical work.) The cost of a daily decision may include the travel costs c_ij on the arcs (i, j) of the distribution network that are traversed according to the decision. Travel costs may also depend on the amount of product transported along each arc. The cost of a daily decision may include the costs incurred at customers' sites, for example due to product losses during delivery. The cost of a daily decision may include revenue: if quantity d_i is delivered at customer i, the vendor earns a reward of r_i(d_i). The cost of a daily decision may include shortage penalties: because demand is uncertain, there is often a positive probability that a customer runs out of stock, and thus shortages cannot always be prevented. Shortages are discouraged with a penalty p_i(s_i) if the unsatisfied demand on day t at customer i is s_i. Unsatisfied demand is treated as lost demand, and is not backlogged. The cost of a daily decision may include inventory holding cost: if the inventory at customer i is x_i at the beginning of the day, and quantity d_i is delivered at customer i, then an inventory holding cost of h_i(x_i + d_i) is incurred. The inventory holding cost can also be modeled as a function of some "average" amount of inventory at each customer during the time period.

The role played by inventory holding cost depends on the application. In some cases, the vendor and customers belong to different organizations, and the customers own the inventory. In these cases, the vendor typically does not incur any inventory holding costs based on the inventory at the customers. This was the case in the application that motivated this work. In other cases, such as when the vendor and customers belong to the same organization, or when the vendor owns the inventory at the customers, the vendor does incur inventory holding costs based on the inventory at the customers. The objective is to choose a distribution policy that maximizes the expected discounted value (rewards minus costs) over an infinite time horizon.
1.2 Problem Formulation
In this section we formulate the IRP as a discrete time Markov decision process (MDP) with the following components:
1. The state x = (x_1, x_2, . . . , x_N) represents the current amount of inventory at each customer. Thus the state space is X = [0, C_1] × [0, C_2] × · · · × [0, C_N] if the quantity of product can vary continuously, or X = {0, 1, . . . , C_1} × {0, 1, . . . , C_2} × · · · × {0, 1, . . . , C_N} if the quantity of product varies in discrete units. Let X_it ∈ [0, C_i] (or X_it ∈ {0, 1, . . . , C_i}) denote the random inventory level at customer i at time t. Let X_t = (X_1t, . . . , X_Nt) ∈ X denote the state at time t.

2. For any state x, let A(x) denote the set of all feasible decisions when the process is in state x. A decision a ∈ A(x) made at time t when the process is in state x contains information about (1) which customers' inventories to replenish, (2) how much to deliver at each customer, and (3) how to combine customers into vehicle routes. A decision may contain more information, such as travel times and arrival and departure times at customers (relative to time windows); the three attributes of a decision mentioned above are the important attributes for our purposes. For any decision a, let d_i(a) denote the quantity of product that is delivered to customer i while executing decision a. The set A(x) is determined by various constraints, such as work load constraints, routing constraints, vehicles' capacity constraints, and customers' inventory constraints. As discussed in Section 1.1, constraints such as work load constraints and routing constraints do not affect the method described in this paper. The constraints explicitly addressed in this paper are the limited number M of vehicles that can be used each day, the limited quantity C̃ (vehicle capacity) that can be delivered by each vehicle on a day, and the maximum inventory levels C_i that are allowed at any time at each customer i. The maximum inventory level constraints can be imposed in a variety of ways. For example, if it is assumed that no product is used between the time that the inventory level x_i is measured at customer i and the time that the delivery of d_i(a) takes place, then the maximum inventory level constraints can be expressed as x_i + d_i(a) ≤ C_i for all i, all x ∈ X, and all a ∈ A(x). If product is used during this time period, it may be possible to deliver more. The exact way in which the constraint is applied does not affect the rest of the development. For simplicity we applied the constraint as stated above. Let the random variable A_t ∈ A(X_t) denote the decision chosen at time t.

3. In this formulation, the source of randomness is the random customer demands U_it. To simplify the exposition, assume that the deliveries at time t take place in time to satisfy the demand at time t. Then the amount of product used by customer i at time t is given by min{X_it + d_i(A_t), U_it}. Thus the shortage at customer i at time t is given by S_it = max{U_it − (X_it + d_i(A_t)), 0}, and the next inventory level at customer i at time t + 1 is given by X_i,t+1 = max{X_it + d_i(A_t) − U_it, 0}. The known joint probability distribution F of customer demands U_t gives a known Markov transition function Q, according to which transitions occur. For any state x ∈ X, any decision a ∈ A(x), and any Borel subset B ⊆ X, let

    \mathcal{U}(x, a, B) \equiv \left\{ U \in \mathbb{R}_+^N : \left( \max\{x_1 + d_1(a) - U_1, 0\}, \ldots, \max\{x_N + d_N(a) - U_N, 0\} \right) \in B \right\}.
Then Q[B | x, a] ≡ F[\mathcal{U}(x, a, B)]. In other words, for any state x ∈ X and any decision a ∈ A(x),

    P\left[ X_{t+1} \in B \mid X_t = x, A_t = a \right] = Q[B \mid x, a] \equiv F[\mathcal{U}(x, a, B)].
4. Let g(x, a) denote the expected single stage net reward if the process is in state x at time t, and decision a ∈ A(x) is implemented. To give a specific example in terms of the costs mentioned in Section 1.1, for any decision a and arc (i, j), let k_ij(a) denote the number of times that arc (i, j) is traversed by a vehicle while executing decision a. Then,

    g(x, a) \equiv \sum_{i=1}^{N} r_i(d_i(a)) - \sum_{(i,j)} c_{ij} k_{ij}(a) - \sum_{i=1}^{N} h_i(x_i + d_i(a)) - \sum_{i=1}^{N} E_F\left[ p_i\left( \max\{U_{i0} - (x_i + d_i(a)), 0\} \right) \right]
where E_F denotes expected value with respect to the probability distribution F of U_0.

5. The objective is to maximize the expected total discounted value over an infinite horizon. The decisions A_t are restricted such that A_t ∈ A(X_t) for each t, and A_t may depend only on the history (X_0, A_0, X_1, A_1, . . . , X_t) of the process up to time t, i.e., when the decision maker decides on a decision at time t, the decision maker does not know what is going to happen in the future. Let Π denote the set of policies that depend only on the history of the process up to time t. Let α ∈ [0, 1) denote the discount factor. Let V^*(x) denote the optimal expected value given that the initial state is x, i.e.,

    V^*(x) \equiv \sup_{\pi \in \Pi} E^{\pi}\left[ \sum_{t=0}^{\infty} \alpha^t g(X_t, A_t) \,\middle|\, X_0 = x \right]    (1)
A stationary deterministic policy π prescribes a decision π(x) ∈ A(x) based on the information contained in the current state x of the process only. For any stationary deterministic policy π, and any state x ∈ X, the expected value V^π(x) is given by

    V^{\pi}(x) \equiv E^{\pi}\left[ \sum_{t=0}^{\infty} \alpha^t g(X_t, \pi(X_t)) \,\middle|\, X_0 = x \right] = g(x, \pi(x)) + \alpha \int_{\mathcal{X}} V^{\pi}(y) \, Q[dy \mid x, \pi(x)]
(The last equality is a standard result in dynamic programming; see for example Bertsekas and Shreve 1978.) It follows from results in dynamic programming that, under conditions that are not very restrictive (e.g., g bounded and α < 1), to determine the optimal expected value in (1), it is sufficient to restrict attention to
the class Π_SD of stationary deterministic policies. It follows that for any state x ∈ X,

    V^*(x) = \sup_{\pi \in \Pi_{SD}} V^{\pi}(x) = \sup_{a \in A(x)} \left\{ g(x, a) + \alpha \int_{\mathcal{X}} V^*(y) \, Q[dy \mid x, a] \right\}    (2)

A policy π^* is called optimal if V^{π^*} = V^*.
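To make the role of the optimality equation (2) concrete, the following sketch applies plain successive approximation (value iteration) to a tiny discretized instance. It is illustrative only: the instance, the function names, and the brute-force enumeration of decisions are our own assumptions, and, as discussed in Section 1.3, this direct approach is practical only for very small state spaces.

    import itertools

    # A minimal sketch, assuming a tiny discretized instance: N = 2 customers,
    # inventory capacities C = (2, 2), one vehicle of capacity 2, i.i.d. demands.
    # The names demand_pmf, reward, feasible_decisions are illustrative
    # assumptions, not part of the paper's formulation.

    C = (2, 2)                      # maximum inventory level C_i at each customer
    ALPHA = 0.9                     # discount factor alpha
    STATES = list(itertools.product(range(C[0] + 1), range(C[1] + 1)))
    demand_pmf = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25}

    def feasible_decisions(x):
        # Deliveries (d1, d2) with d1 + d2 <= vehicle capacity and x_i + d_i <= C_i.
        return [(d1, d2) for d1 in range(C[0] - x[0] + 1)
                         for d2 in range(C[1] - x[1] + 1) if d1 + d2 <= 2]

    def reward(x, d):
        # Stand-in for g(x, a): revenue per unit delivered minus expected
        # shortage penalty (assumed linear here for illustration).
        g = 1.0 * (d[0] + d[1])
        for u, p in demand_pmf.items():
            g -= p * sum(2.0 * max(u[i] - (x[i] + d[i]), 0) for i in range(2))
        return g

    def step_distribution(x, d):
        # Transition law Q: next inventory is max{x_i + d_i - U_i, 0}.
        dist = {}
        for u, p in demand_pmf.items():
            y = tuple(max(x[i] + d[i] - u[i], 0) for i in range(2))
            dist[y] = dist.get(y, 0.0) + p
        return dist

    V = {x: 0.0 for x in STATES}
    for _ in range(200):            # repeatedly apply the right hand side of (2)
        V = {x: max(reward(x, d) + ALPHA * sum(p * V[y]
                     for y, p in step_distribution(x, d).items())
                    for d in feasible_decisions(x))
             for x in STATES}

Even on this toy instance the loop enumerates all 9 states and all feasible decisions per state; the exponential growth of both sets in N is precisely the obstacle identified next.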
1.3 Solving the Markov Decision Process
Many algorithms have been proposed to solve Markov decision processes; for example, see the textbooks by Bertsekas (1995) and Puterman (1994). Solving a Markov decision process usually involves computing the optimal value function V^*, and an optimal policy π^*, by solving the optimality equation (2). This requires the following major computational tasks to be performed.

1. Computation of the optimal value function V^*. Because V^* appears on both the left hand side and the right hand side of (2), most algorithms for computing V^* involve the computation of successive approximations to V^*(x) for every x ∈ X. These algorithms are practical only if the state space X is small. For the IRP as formulated in Section 1.2, X may be uncountable. One may attempt to make the problem more tractable by discretizing the state space X and the transition probabilities Q. Even if one discretizes X and Q, the number of states grows exponentially in the number of customers. Thus even for discretized X and Q, the number of states is far too large to compute V^*(x) for every x ∈ X if there are more than about four customers.

2. Estimation of the expected value (integral) in (2). For the IRP, this is a high dimensional integral, with the number of dimensions equal to the number N of customers, which can be as much as several hundred. Conventional numerical integration methods are not practical for the computation of such high dimensional integrals.

3. The maximization problem on the right hand side of (2) has to be solved to determine the optimal decision for each state. In the case of the IRP, the optimization problem on the right hand side of (2) is very hard. For example, the vehicle routing problem (VRP), which is NP-hard, is a special case of that problem. (Consider any instance of the VRP, with a given number of capacitated vehicles, a graph with costs on the arcs, and demand quantities at the nodes. For the IRP, let the vehicles and graph be the same as for the VRP, let the demand be deterministic with demand quantities as given for the VRP, let the current inventory level at each customer be zero, let the discount factor be zero, and let the penalties be sufficiently large such that an optimal solution for the optimization problem
on the right hand side of (2) has to satisfy the demand quantities at all the nodes. Then the instance of the VRP can be solved by solving the optimization problem on the right hand side of (2).) In Kleywegt et al. (2002) we developed approximation methods to perform the computational tasks mentioned above efficiently and to obtain good solutions for the inventory routing problem with direct deliveries (IRPDD). To extend the approach to the IRP in which multiple customers can be visited on a route, we develop in this paper new methods for the first and third computational tasks, that is, to compute, at least approximately, V ∗ , and to solve the maximization problem on the right hand side of (2). The second task was addressed in the way described in Kleywegt et al. (2002).
1.4 Overview of the Proposed Method
An outline of our approach is as follows. The first major step in solving the IRP is to construct an approximation V̂ to the optimal value function V^*. The approximation V̂ is constructed as follows. First, a decomposition of the IRP is developed. Subproblems are defined for specific subsets of customers. Each subproblem is also a Markov decision process. The subsets of customers do not necessarily partition the set of customers, but must cover the set of customers. The idea is to define each subproblem so that it gives an accurate representation of the overall process as experienced by the subset of customers. To do that, the parameters of each subproblem are determined by simulating the overall IRP process, and by constructing simulation estimates of subproblem parameters. Second, each subproblem is solved optimally. Third, for any given state x of the IRP process, the approximate value V̂(x) is determined by choosing a collection of subsets of customers that partitions the set of customers. Then V̂(x) is set equal to the sum of the optimal value functions of the subproblems corresponding to the chosen collection of subsets at states corresponding to x. The collection of subsets of customers is chosen to maximize V̂(x). Details of the construction of V̂ are given in Section 2. An outline of the value function approximation algorithm is given in Algorithm 1.

Given V̂, the IRP process is controlled as follows. Whenever the state of the process is x, then a decision π̂(x) is chosen that solves

    \max_{a \in A(x)} \left\{ g(x, a) + \alpha \int_{\mathcal{X}} \hat{V}(y) \, Q[dy \mid x, a] \right\}    (3)
which is the right hand side of the optimality equation (2) with V̂ instead of V^*. A method for solving problem (3) is described in Section 3.

Algorithm 1 already indicates that the development of the approximating function V̂ requires a lot of computational effort. The effort is required to determine appropriate parameters for the subproblems and to solve all the subproblems. This effort is required only once at the beginning of the control of the IRP process (although, in practice, V̂ may have to be changed if the parameters of the MDP change), so that a substantial effort for this initial computational task seems to be acceptable. In contrast, once the approximating function V̂ has been constructed, only the daily problem (3) has to be solved at each stage of the IRP process, each time for a given value of the state x. Because the daily problem has to be solved many times, it is important that this computational task can be performed with relatively little effort.

Algorithm 1 Procedure for computing V̂ and π̂.
1. Start with an initial policy π̂_0. Set i ← 0.
2. Simulate the IRP under policy π̂_0 to estimate the subproblem parameters.
3. Solve the subproblems.
4. V̂ is determined by the optimal value functions of the subproblems.
5. Policy π̂_1 is defined by equation (4).
6. Repeat steps 7 through 11 for a chosen number of iterations, or until a convergence test is satisfied.
7. Increment i ← i + 1.
8. Simulate the IRP under policy π̂_i to update the estimates of the subproblem parameters.
9. With the updated estimates of the subproblem parameters, solve the updated subproblems.
10. V̂ is determined by the optimal value functions of the updated subproblems.
11. Policy π̂_{i+1} is given by equation (4).
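For concreteness, the structure of Algorithm 1 can be sketched as a higher-order loop. The step functions below (simulate_and_estimate, solve_subproblems, combine_subproblem_values, greedy_policy) are placeholder names we introduce for illustration; each stands for a step developed in Sections 2 and 3, and would be supplied by the caller.

    def approximate_policy_iteration(initial_policy, simulate_and_estimate,
                                     solve_subproblems, combine_subproblem_values,
                                     greedy_policy, num_iterations=10):
        """A minimal sketch of Algorithm 1; the step functions are assumptions."""
        policy, params = initial_policy, None
        for _ in range(num_iterations + 1):
            # Steps 2/8: simulate the IRP under the current policy to estimate
            # (or update) the subproblem parameters (Section 2.1).
            params = simulate_and_estimate(policy, params)
            # Steps 3/9: solve each subproblem MDP to optimality.
            subproblem_values = solve_subproblems(params)
            # Steps 4/10: V-hat combines the subproblem value functions via the
            # partitioning problem of Section 2.2.
            V_hat = combine_subproblem_values(subproblem_values)
            # Steps 5/11: the next policy is greedy with respect to V-hat, i.e.,
            # it solves the daily problem (3) in each state (equation (4)).
            policy = greedy_policy(V_hat)
        return V_hat, policy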
1.5 Review of Related Literature
In this section we give a brief review of related literature on the inventory routing problem (Section 1.5.1) and on dynamic programming approximations (Section 1.5.2). The review is not comprehensive.

1.5.1 Inventory Routing Literature
A large variety of deterministic and stochastic models of inventory routing problems have been formulated, and a variety of heuristics and bounds have been produced. A classification of the inventory routing literature is given in Kleywegt et al. (2002). Bell et al. (1983) propose an integer program for the inventory routing problem at Air Products, a producer of products such as liquid nitrogen. Dror, Ball, and Golden (1985), and Dror and Ball (1987) construct a solution for a short-term planning period based on identifying, for each customer, the optimal replenishment day t∗ and the expected increase in cost if the customer is visited on day t instead of t∗ . An integer program is then solved that assigns customers to a vehicle and a day, or just a day, that minimizes the sum of these costs plus the transportation costs. Dror and Levy (1986) use a similar method to construct a
weekly schedule, and then apply node and arc exchanges to reduce costs in the planning period. Trudeau and Dror (1992) apply similar ideas to the case in which inventories are observable only at delivery times. Bard et al. (1998) follow a rolling horizon approach to an inventory routing problem with satellite facilities where trucks can be refilled. To choose the customers to be visited during the next two weeks, they determine an optimal replenishment frequency for each customer, similar to the approach in Dror, Ball, and Golden (1985), and Dror and Ball (1987).

Federgruen and Zipkin (1984) formulate an inventory routing problem quite similar to the one in Section 1.2, except that they focus on solving the myopic single-stage problem max_{a∈A(x)} g(x, a), which is a nonlinear integer program. Golden, Assad, and Dahl (1984) also propose a heuristic to solve the myopic single-stage problem max_{a∈A(x)} g(x, a), while maintaining an "adequate" inventory at all customers. Chien, Balakrishnan, and Wong (1989) also propose an integer programming based heuristic to solve the single-stage problem, but they attempt to find a solution that is less myopic than that of Federgruen and Zipkin (1984) and Golden, Assad, and Dahl (1984), by passing information from one day to the next.

Anily and Federgruen (1990, 1991, 1993) analyze fixed partition policies for the inventory routing problem with constant deterministic demand rates and an unlimited number of vehicles. They also find lower and upper bounds on the minimum long-run average cost over all fixed partition policies, and propose a heuristic, called modified circular regional partitioning, to choose a fixed partition. Gallego and Simchi-Levi (1990) use an approach similar to that of Anily and Federgruen (1990) to evaluate the long-run effectiveness of direct deliveries (one customer on each route). Bramel and Simchi-Levi (1995) also study fixed partition policies for the deterministic inventory routing problem with an unlimited number of vehicles. They propose a location based heuristic, based on the capacitated concentrator location problem (CCLP), to choose a fixed partition. The tour through each subset of customers is constructed while solving the CCLP, using a nearest insertion heuristic. Chan, Federgruen, and Simchi-Levi (1998) analyze zero-inventory ordering policies, in which a customer's inventory is replenished only when the customer's inventory has been depleted, and fixed partition policies, also for the deterministic inventory routing problem with an unlimited number of vehicles. They derive asymptotic worst-case bounds on the performance of the policies. They also propose a heuristic based on the CCLP, similar to that of Bramel and Simchi-Levi (1995), for determining a fixed partition of the set of customers. Gaur and Fisher (2002) consider a deterministic inventory routing problem with time varying demand. They propose a randomized heuristic to find a fixed partition policy with periodic deliveries. Their method was implemented for a supermarket chain.

Burns et al. (1985) develop approximating equations for both a direct delivery policy as well as a policy in which vehicles visit multiple customers on a route. Minkoff (1993) also formulated the inventory routing problem as an MDP. He focused on the case with an unlimited number of vehicles. He proposed a decomposition heuristic to reduce the computational effort.
The heuristic solves a linear program to allocate joint transportation costs to individual customers, and then solves individual customer subproblems. The value functions of the subproblems are added to approximate the value function of the combined problem. Minkoff's work differs from ours in the following aspects: (1) we consider the case with a limited number of vehicles, (2) we define subproblems involving one or more customers, and the subproblems are defined differently, one reason being that the bound on the number of vehicles has to be addressed in our subproblems, and (3) we solve an optimization problem to combine the results of the subproblems.

Webb and Larson (1995) propose a solution for the problem of determining the minimum fleet size for an inventory routing system. Their work is related to Larson's earlier work on fleet sizing and inventory routing (Larson, 1988). Bassok and Ernst (1995) consider the problem of delivering multiple products to customers on a fixed tour. The optimal policy for each product is characterized by a sequence of critical numbers, similar to an optimal policy found by Topkis (1968). Barnes-Schuster and Bassok (1997) study the cost effectiveness of a particular direct delivery policy for the inventory routing problem. Kleywegt et al. (2002) also consider the special case with direct deliveries. An MDP model of the inventory routing problem is formulated, and a dynamic programming approximation method is developed to find a policy.

Herer and Roundy (1997) propose several heuristics to construct power-of-two policies for the inventory routing problem with constant deterministic demand rates and an unlimited number of vehicles, and they prove performance bounds for the heuristics. Viswanathan and Mathur (1997) propose an insertion heuristic to construct a power-of-two policy for the inventory routing problem with multiple products, constant deterministic demand rates, and an unlimited number of vehicles. Reiman et al. (1999) perform a heavy traffic analysis for three types of policies for the inventory routing problem with a single vehicle. Çetinkaya and Lee (2000) study a problem in which the vendor accumulates customer orders over time intervals of length T, and then delivers customer orders at the end of each time interval. Bertazzi et al. (2002) consider a deterministic inventory routing problem with a single capacitated vehicle. Each customer has a specified minimum and maximum inventory level. They propose a heuristic to determine the vehicle route at each discrete time point, while following an order-up-to policy, that is, each time a customer is visited the inventory at the customer is replenished to the specified maximum inventory level. They consider the impact of different objective functions.

The inventory pickup and delivery problem is quite similar to the inventory routing problem. In the inventory pickup and delivery problem, there are multiple sources of a single product, multiple demand points, and multiple vehicles. The vehicles are scheduled to travel alternately between sources and demand points to replenish the inventory at the demand points. Christiansen and Nygreen (1998a, 1998b) present
a path flow formulation and column generation method for the inventory pickup and delivery problem with time windows (IPDPTW). Christiansen (1999) presents an arc flow formulation for the IPDPTW.

1.5.2 Dynamic Programming Approximation Literature
Dynamic programming or Markov decision processes is a versatile and widely used framework for modeling dynamic and stochastic optimal control problems. However, a major shortcoming is that for many interesting applications an optimal policy cannot be computed because (1) the state space X is too big to compute and store the optimal value V^*(x) and an optimal decision π^*(x) for each state x; and/or (2) the expected value in (2), which often is a high dimensional integral, cannot be computed exactly; and/or (3) the single stage optimization problem on the right hand side of (2) cannot be solved exactly. In this section we briefly mention some of the work that has been done to address the first issue, that is, how to attack problems with large state spaces. The second issue makes up a large part of the field of statistics, and the third issue makes up a large part of the field of optimization; these fields are not reviewed here.

A natural approach for attacking MDPs with large state spaces, which is also the approach used in this paper, is to approximate the optimal value function V^* with an approximating function V̂. It is shown in Section 2 that a good approximation V̂ of the optimal value function V^* can be used to find a good policy π̂. Some of the early work on this approach is that of Bellman and Dreyfus (1959), who propose using Legendre polynomials inductively to approximate the optimal value function of a finite horizon MDP. Chang (1966), Bellman et al. (1963), and Schweitzer and Seidman (1985) also study the approximation of V^* with polynomials, especially orthogonal polynomials such as Legendre and Chebychev polynomials. Approximations using splines are suggested by Daniel (1976), and approximations using regression splines by Chen et al. (1999).

Recently a lot of work has been done on parameterized approximations. Some of this work was motivated by approaches proposed for reinforcement learning; Sutton and Barto (1998) give an overview. Tsitsiklis and Van Roy (1996), Van Roy and Tsitsiklis (1996), Bertsekas and Tsitsiklis (1996), and De Farias and Van Roy (2000) study the estimation of the parameters of these approximating functions for infinite horizon discounted MDPs, and Tsitsiklis and Van Roy (1999a) consider estimation for long-run average cost MDPs. Value function approximations are proposed for specific applications by Van Roy et al. (1997), Powell and Carvalho (1998), Tsitsiklis and Van Roy (1999b), Secomandi (2000), and Kleywegt et al. (2002).

In many models the state space is uncountable and the transition and cost functions are too complex for closed form solutions to be obtained. Discretization methods and convergence results for such problems are discussed in Wong (1970a), Fox (1973), Bertsekas (1975), Kushner (1990), Chow and Tsitsiklis (1991), and Kushner and Dupuis (1992).

Another natural approach for attacking a large-scale MDP is to decompose the MDP into smaller related
MDPs, which are easier to solve, and then to use the solutions of the smaller MDPs to obtain a good solution for the original MDP. Decomposition methods are discussed in Wong (1970b), Collins and Lew (1970), Collins (1970), Collins and Angel (1971), Courtois (1977), Courtois and Semal (1984), Stewart (1984), and Kleywegt et al. (2002). Some general state space reduction methods that include many of the methods mentioned above are analyzed in Whitt (1978, 1979a, 1979b), Hinderer (1976, 1978), Hinderer and Hübner (1977), and Haurie and L'Ecuyer (1986). Surveys are given in Morin (1978), and Rogers et al. (1991).
2 Value Function Approximation
The first major step in solving the IRP is the construction of an approximation V̂ to the optimal value function V^*. A good approximating function V̂ can then be used to find a good policy π̂, in the sense described next. Suppose that ‖V^* − V̂‖_∞ < ε, that is, V̂ is an ε-approximation of V^*. Also suppose that stationary deterministic policy π̂ satisfies

    g(x, \hat{\pi}(x)) + \alpha \sum_{y \in \mathcal{X}} \hat{V}(y) \, Q[y \mid x, \hat{\pi}(x)] \;\geq\; \sup_{a \in A(x)} \left\{ g(x, a) + \alpha \sum_{y \in \mathcal{X}} \hat{V}(y) \, Q[y \mid x, a] \right\} - \delta    (4)
for all x ∈ X, that is, decision π̂(x) is within δ of the optimal decision using approximating function V̂ on the right hand side of the optimality equation (2). Then

    V^{\hat{\pi}}(x) \;\geq\; V^*(x) - \frac{2\alpha\varepsilon + \delta}{1 - \alpha}

for all x ∈ X, that is, the value function V^{π̂} of policy π̂ is within (2αε + δ)/(1 − α) of the optimal value function V^*. This observation is the motivation for putting in the effort to construct a good approximating function V̂. This section describes the construction of V̂; the "decisions" referred to in this section are used only for the purpose of motivating the approximation V̂, and are not used to control the IRP process. The decisions used to control the IRP process are described subsequently in Section 3.
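For completeness, this bound follows from a standard contraction argument (see, e.g., Bertsekas and Tsitsiklis, 1996); the operator notation in the following sketch is ours, not taken from the paper.

    Let $T$ denote the operator on the right hand side of (2), and $T_{\hat{\pi}}$ the analogous operator with the decision fixed to $\hat{\pi}(x)$; both are $\alpha$-contractions in the sup norm, with fixed points $V^*$ and $V^{\hat{\pi}}$ respectively. Condition (4) states that $T_{\hat{\pi}} \hat{V} \geq T \hat{V} - \delta$, while $T_{\hat{\pi}} \hat{V} \leq T \hat{V}$ always holds. Hence
    \begin{align*}
    \|V^* - V^{\hat{\pi}}\|_\infty
    &\leq \|T V^* - T \hat{V}\|_\infty + \|T \hat{V} - T_{\hat{\pi}} \hat{V}\|_\infty + \|T_{\hat{\pi}} \hat{V} - T_{\hat{\pi}} V^{\hat{\pi}}\|_\infty \\
    &\leq \alpha \varepsilon + \delta + \alpha \|\hat{V} - V^{\hat{\pi}}\|_\infty
    \leq \alpha \varepsilon + \delta + \alpha \left( \varepsilon + \|V^* - V^{\hat{\pi}}\|_\infty \right),
    \end{align*}
    and rearranging yields $\|V^* - V^{\hat{\pi}}\|_\infty \leq (2\alpha\varepsilon + \delta)/(1 - \alpha)$.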
2.1 Subproblem Definition
To approximate the optimal value function V ∗ , we decompose the IRP into subproblems, and then combine the subproblem results using another optimization problem, described in Section 2.2, to produce the approximating function Vˆ . Each subproblem is a Markov decision process involving a subset of customers. The subsets of customers do not necessarily partition the set of customers, but must cover the set of customers,
and it must be possible to form a partition with a subcollection of the subsets. The approach we followed was to define subproblems for each subset of customers that can be visited on a single vehicle route. Thus each single customer forms a subset, and in addition there are a variety of subsets with multiple customers. Hence, the cover and partition conditions referred to above are automatically satisfied.

After the subsets of customers have been identified, a subproblem has to be defined (a model has to be constructed) for each subset. That involves determining appropriate parameters and parameter values for the MDP of each subset. An appealing idea is to choose the parameters and parameter values of each subproblem so that the subproblem represents the overall IRP process as experienced by the subset of customers. There are several obstacles in the way of implementing such an idea. First, the overall process depends on the policy controlling the process, and an optimal policy is not known. Second, even with a given policy for controlling the overall process, it is still hard to determine appropriate parameters and parameter values for each subproblem so that the combined subproblems give a good representation of the overall process. This section, including Subsections 2.1.1 and 2.1.2, is devoted to the modeling of the subproblems, that is, the determination of parameters and parameter values for each subproblem. It has the interesting feature that simulation is used in the process of constructing the subproblem models. Issues that have to be addressed are the following.

1. One question is how many vehicles are available for a given subproblem. This issue comes about because in the overall IRP process, several subsets compete for the M vehicles, and thus, at any given time, all M vehicles will not be available to any given subset. Also a vehicle may visit customers in the subset as well as customers not in the subset, and thus not all of a vehicle's capacity C̃ may be available to the given subset. Thus, the availability of vehicles and vehicle capacity to subsets of customers (and therefore in subproblems) has to be modeled.

2. Transition probabilities have to be determined for the subproblems. The transition probabilities of the inventory levels are determined by the demand distribution F as before. In addition, for the subproblems we also address the transition probabilities of vehicle availability to the subset of customers.

In the description of the subproblems, we sometimes refer to the overall process, and sometimes to the models of the individual subproblems; we attempt to keep the distinctions as well as the similarities clear. To simplify notation, the modeling of the subproblems is described for a two-customer subproblem; the models for the subproblems with one or more than two customers are similar. A two-customer subproblem for subset {i, j} is denoted by MDP_ij. The method presented in this section is for a discrete demand distribution F and a discrete state space X, which may come about naturally due to the nature of the product or because of discretization of the demand distribution and the state space. Let the support of F be denoted by U_1 × · · · × U_N, and let f_ij denote the (marginal) probability mass function
of the demand of customers i and j, that is, f_ij(u_i, u_j) ≡ F[U_1 × U_2 × · · · × {u_i} × · · · × {u_j} × · · · × U_N] denotes the probability that the demand at customer i is u_i and the demand at customer j is u_j.

Recall that the idea is to define each subproblem so that it gives an accurate representation of the overall process as experienced by the subset of customers. Clearly, the state of a subproblem has to include the inventory level at each of the customers in the subproblem. Furthermore, to capture information about the availability of vehicles for delivering to the customers in the subproblem, the state of a subproblem also includes a component with information about the vehicle availability to the subset of customers.

To determine possible values of the vehicle availability component v_ij of the state of subproblem MDP_ij, consider the different ways in which the customers i and j can be visited in the overall IRP process. For simplicity, we assume that each customer is visited at most once per day. Consequently, on any day, the subset of two customers can be visited by 0, 1, or 2 vehicles. Hence, in subproblem MDP_ij, at any point in time, either 0, or 1, or 2 vehicles are available to the subset of two customers. The simplest case is the case with no vehicles available for delivering to customers i and j (denoted by v_ij = 0 in subproblem MDP_ij). When 1 or 2 vehicles are available to the subset of two customers, we also have to specify how much of those vehicles' capacities are available to the subset of customers, because those same vehicles may also make deliveries to customers other than i or j on a route.

Consider the different ways in which one vehicle could deliver to i and/or j in the overall IRP process. There are the following six possibilities:

1. exclusive delivery to i,
2. exclusive delivery to j,
3. exclusive delivery to i and j (no deliveries to other customers),
4. fraction of vehicle capacity delivered to i and no delivery to j,
5. fraction of vehicle capacity delivered to j and no delivery to i,
6. fraction of vehicle capacity delivered to i and j plus delivery to other customers.

The first three possibilities are represented by the same vehicle availability component in subproblem MDP_ij (denoted by v_ij = a), because in all three cases one vehicle is available exclusively for customers in the subproblem. The other possibilities are denoted by v_ij = b, c, d respectively, in subproblem MDP_ij.

Next consider the different ways in which two vehicles could deliver to i and j in the overall IRP process. There are the following four possibilities:

1. exclusive delivery to i and j (no deliveries to other customers),
2. exclusive delivery to i, fraction of vehicle capacity delivered to j,
3. exclusive delivery to j, fraction of vehicle capacity delivered to i,
4. fraction of vehicle capacity delivered to i and fraction of vehicle capacity delivered to j (with different vehicles visiting i and j, each also delivering to other customers).

These possibilities are denoted by v_ij = e, f, g, h respectively, in subproblem MDP_ij.

Whenever a vehicle is available for delivering a fraction of its capacity to one or both of the customers in the subset, the model for subproblem MDP_ij also needs to specify what portion of the vehicle's capacity is available to the subset. For example, when the vehicle availability v_ij ∈ {b, c, d}, one vehicle with a fraction of the capacity C̃ is available to the two-customer subset; when v_ij = h, two vehicles, each with a fraction of the capacity C̃, are available to the subset; and when v_ij ∈ {f, g}, two vehicles, one with capacity C̃ and one with a fraction of the capacity C̃, are available to the subset. Each of the subproblem vehicle availabilities v_ij ∈ {b, g, h} corresponds to a situation in the overall IRP in which a vehicle visits i and a customer not in {i, j}, but the same vehicle does not visit j. The fractional capacity associated with the vehicle availabilities v_ij ∈ {b, g} is the same and is denoted by λ^i_ij ∈ [0, C̃]. Similarly, the fractional capacity associated with the vehicle availabilities v_ij ∈ {c, f} is the same and is denoted by λ^j_ij ∈ [0, C̃]. When the vehicle availability is v_ij = h, one vehicle with fractional capacity λ^i_ij and another vehicle with fractional capacity λ^j_ij are available to the subset. Finally, when the vehicle availability is v_ij = d, the fractional capacity available to the subset is denoted by λ^ij_ij ∈ [0, C̃]. Table 1 summarizes the vehicle availability values v_ij and associated available capacities for a two-customer subproblem MDP_ij. Note that for the subproblem, it is sufficient to know the (possibly fractional) capacities available to the subset. The subproblem decision determines how the capacities will be used to serve customers i and j. Section 2.1.2 explains how simulation is used to choose appropriate values for these λ-parameters.

Table 1: Vehicle availability values v_ij and associated capacities for a two-customer subproblem MDP_ij.

  v_ij-value   Vehicle capacities available to customer subset {i, j}
  0            None
  a            One vehicle with capacity C̃
  b            One vehicle with capacity λ^i_ij
  c            One vehicle with capacity λ^j_ij
  d            One vehicle with capacity λ^ij_ij
  e            Two vehicles, each with capacity C̃
  f            Two vehicles, one with capacity C̃, and one with capacity λ^j_ij
  g            Two vehicles, one with capacity λ^i_ij, and one with capacity C̃
  h            Two vehicles, one with capacity λ^i_ij, and one with capacity λ^j_ij
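As a compact illustration, Table 1 can be encoded directly as a lookup from the vehicle availability value to the list of capacities available to the subset. The encoding below is our own; C_full stands for C̃, and lam_i, lam_j, lam_ij for the λ-parameters estimated in Section 2.1.2.

    # Illustrative encoding of Table 1: for each vehicle availability value v_ij,
    # the list of vehicle capacities available to the two-customer subset {i, j}.
    # C_full is the full vehicle capacity; lam_i, lam_j, lam_ij are the estimated
    # fractional capacities of Section 2.1.2 (names are ours).

    def available_capacities(v_ij, C_full, lam_i, lam_j, lam_ij):
        table = {
            '0': [],                    # no vehicle available
            'a': [C_full],              # one vehicle, full capacity
            'b': [lam_i],               # one vehicle, fraction for i only
            'c': [lam_j],               # one vehicle, fraction for j only
            'd': [lam_ij],              # one vehicle, fraction shared by i and j
            'e': [C_full, C_full],      # two vehicles, both full
            'f': [C_full, lam_j],       # one full, one fraction for j
            'g': [lam_i, C_full],       # one fraction for i, one full
            'h': [lam_i, lam_j],        # two vehicles, one fraction each
        }
        return table[v_ij]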
Each two-customer subproblem MDP_ij is a discrete time Markov decision process, and is defined as follows.
1. The state space is X_ij = {0, 1, . . . , C_i} × {0, 1, . . . , C_j} × {0, a, b, c, d, e, f, g, h}. State (x_i, x_j, v_ij) denotes that the inventory levels at customers i and j are x_i and x_j, and the vehicle availability is v_ij. Let X_it ∈ {0, 1, . . . , C_i} denote the random inventory level at customer i at time t, and let V_ijt denote the random vehicle availability at time t.

2. For any subproblem state (x_i, x_j, v_ij), let A_ij(x_i, x_j, v_ij) denote the set of feasible subproblem decisions when the subproblem process is in state (x_i, x_j, v_ij). A decision a_ij ∈ A_ij(x_i, x_j, v_ij) contains information about (1) which of customers i and j to replenish, (2) how much to deliver at each of customers i and j, and (3) how to combine customers i and j into vehicle routes. (For a two-customer subproblem, the routing aspect of the decision is easy.) Let d_i(a_ij) denote the quantity of product that is delivered to customer i while executing decision a_ij. The feasible decisions a_ij ∈ A_ij(x_i, x_j, v_ij) satisfy the following constraints when the subproblem state is (x_i, x_j, v_ij). When the vehicle availability is v_ij = 0, then no vehicles can be sent to customers i and j, and d_i(a_ij) = d_j(a_ij) = 0. When v_ij = a, then one vehicle can be sent to customers i and j, and d_i(a_ij) + d_j(a_ij) ≤ C̃, x_i + d_i(a_ij) ≤ C_i, and x_j + d_j(a_ij) ≤ C_j. When v_ij = b, then one vehicle can be sent to customer i, no vehicle can be sent to customer j, and d_i(a_ij) ≤ min{λ^i_ij, C_i − x_i}, and d_j(a_ij) = 0. Feasible decisions are determined similarly if v_ij = c. When v_ij = d, then one vehicle can be sent to customers i and j, and d_i(a_ij) + d_j(a_ij) ≤ λ^ij_ij, x_i + d_i(a_ij) ≤ C_i, and x_j + d_j(a_ij) ≤ C_j. When v_ij = e, then one vehicle can be sent to each of customers i and j, and d_i(a_ij) ≤ min{C̃, C_i − x_i}, and d_j(a_ij) ≤ min{C̃, C_j − x_j}. When v_ij = f, then one vehicle can be sent to each of customers i and j, and d_i(a_ij) ≤ min{C̃, C_i − x_i}, and d_j(a_ij) ≤ min{λ^j_ij, C_j − x_j}. Feasible decisions are determined similarly if v_ij = g. Finally, when v_ij = h, then both i and j can be visited by a vehicle each, and d_i(a_ij) ≤ min{λ^i_ij, C_i − x_i}, and d_j(a_ij) ≤ min{λ^j_ij, C_j − x_j}. As for the overall IRP, let the random variable A_ijt ∈ A_ij(X_it, X_jt, V_ijt) denote the decision chosen at time t.

3. The transition probabilities of the subproblems have to incorporate the probability distribution of customer demands, as well as the probabilities of vehicle availabilities to the subset of customers. Because we assume that the probability distribution f_ij of customer demands is known, the transition probabilities of the inventory levels can be determined for the subproblems as for the overall IRP. In the overall IRP process, the probabilities of vehicle availabilities to a subset of customers depend on the policy used to control the process, and are not directly obtainable from the input data of the IRP. Thus, some additional effort is required to make the transition probabilities of vehicle availabilities in the subproblems representative of what happens in the overall IRP. The basic idea is described next, and more details are provided in Section 2.1.1. Consider any policy π ∈ Π for the IRP with unique stationary probability ν^π(x) for each x ∈ X. (Thus, as indicated in Algorithm 1, the formulation
of the subproblems depends on the policy used to control the overall process. In each iteration of Algorithm 1, a policy is chosen and a set of subproblems are defined and solved.) Similar to the nine types of vehicle availability v_ij ∈ {0, a, b, c, d, e, f, g, h} for customers i and j in subproblem MDP_ij identified above, the delivery actions for customers i and j of each decision a in the overall IRP process can be classified as belonging to one of the above nine types. Let ṽ_ij(a) ∈ {0, a, b, c, d, e, f, g, h} denote the type of delivery action for customers i and j of decision a in the overall IRP process. Then, for the overall IRP process under policy π, the conditional probability p_ij(w_ij | y_i, y_j) that the delivery action for customers i and j is ṽ_ij(a) = w_ij, given that the inventory levels at customers i and j are y_i and y_j, is given by

    p_{ij}(w_{ij} \mid y_i, y_j) = \frac{\sum_{\{x \in \mathcal{X} \,:\, x_i = y_i,\, x_j = y_j,\, \tilde{v}_{ij}(\pi(x)) = w_{ij}\}} \nu^{\pi}(x)}{\sum_{\{x \in \mathcal{X} \,:\, x_i = y_i,\, x_j = y_j\}} \nu^{\pi}(x)}    (5)
if the denominator is positive, and p_ij(w_ij | y_i, y_j) = 0 if the denominator is 0. Suppose we know or have estimates for the conditional probabilities p_ij(w_ij | y_i, y_j). (The estimation of p_ij(w_ij | y_i, y_j) is discussed in Section 2.1.1.) Then the transition probabilities for subproblem MDP_ij (which are input data for the subproblem) are given by

    P_{ij}\left[ (X_{i,t+1}, X_{j,t+1}, V_{i,j,t+1}) = (y_i, y_j, w_{ij}) \mid (X_{it}, X_{jt}, V_{ijt}) = (x_i, x_j, v_{ij}),\, A_{ijt} = a_{ij} \right]
    \equiv
    \begin{cases}
    f_{ij}(x_i + d_i - y_i,\, x_j + d_j - y_j) \, p_{ij}(w_{ij} \mid y_i, y_j) & \text{if } y_i > 0,\ y_j > 0 \\
    \sum_{u_i = x_i + d_i}^{\infty} f_{ij}(u_i,\, x_j + d_j - y_j) \, p_{ij}(w_{ij} \mid y_i, y_j) & \text{if } y_i = 0,\ y_j > 0 \\
    \sum_{u_j = x_j + d_j}^{\infty} f_{ij}(x_i + d_i - y_i,\, u_j) \, p_{ij}(w_{ij} \mid y_i, y_j) & \text{if } y_i > 0,\ y_j = 0 \\
    \sum_{u_i = x_i + d_i}^{\infty} \sum_{u_j = x_j + d_j}^{\infty} f_{ij}(u_i, u_j) \, p_{ij}(w_{ij} \mid y_i, y_j) & \text{if } y_i = 0,\ y_j = 0
    \end{cases}    (6)

where d_i and d_j abbreviate d_i(a_ij) and d_j(a_ij). (A computational sketch of (6) is given at the end of this subsection.)
4. The costs for subproblem MDP_ij are the same as the costs involving customers i and j in the overall problem. As for the overall IRP, for any subproblem decision a_ij and arc (m, n), let k_mn(a_ij) denote the number of times that arc (m, n) is traversed by a vehicle while executing decision a_ij. Also, node 0 denotes the vendor location. Then, continuing with the example costs introduced in Section 1.1, the expected net reward per stage for subproblem MDP_ij, given state (x_i, x_j, v_ij) and decision a_ij, is given by

    g_{ij}(x_i, x_j, a_{ij}) \equiv r_i(d_i(a_{ij})) + r_j(d_j(a_{ij})) - \left[ c_{0i} k_{0i}(a_{ij}) + c_{ij} k_{ij}(a_{ij}) + c_{j0} k_{j0}(a_{ij}) \right] - \left[ h_i(x_i + d_i(a_{ij})) + h_j(x_j + d_j(a_{ij})) \right] - E_F\left[ p_i(\max\{U_i - (x_i + d_i(a_{ij})), 0\}) + p_j(\max\{U_j - (x_j + d_j(a_{ij})), 0\}) \right]    (7)
5. The objective is to maximize the expected total discounted value over an infinite horizon. Let V^*_ij(x_i, x_j, v_ij) denote the optimal expected value of subproblem MDP_ij, given that the initial state is (x_i, x_j, v_ij), i.e.,

    V_{ij}^*(x_i, x_j, v_{ij}) \equiv \sup_{\{A_{ijt}\}_{t=0}^{\infty}} E\left[ \sum_{t=0}^{\infty} \alpha^t g_{ij}(X_{it}, X_{jt}, A_{ijt}) \,\middle|\, (X_{i0}, X_{j0}, V_{ij0}) = (x_i, x_j, v_{ij}) \right]
The decisions A_ijt are constrained to be feasible and nonanticipatory. The subproblem MDP_ij for each two-customer subset is relatively easy to solve using a dynamic programming algorithm such as modified policy iteration (Puterman, 1994). Also note that the subproblems do not have to be solved every day—these problems are solved initially when the value function approximation V̂ is developed.

Two issues related to the definition of the two-customer subproblems remain to be addressed. The first issue concerns the determination of the conditional probabilities p_ij(w_ij | y_i, y_j), and the second issue involves the determination of the parts λ^i_ij of the vehicle capacity that are available for delivery to customer i when the vehicle also visits another customer k ∉ {i, j}. These two issues are addressed in the next two sections.
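As an illustration of how equations (5) and (6) fit together computationally, the sketch below assembles a single subproblem transition probability from a demand pmf f_ij and conditional probabilities p_ij estimated as in Section 2.1.1. The function and variable names are our own, and the demand support is assumed finite, so the infinite sums in (6) become finite sums over the support.

    # A minimal sketch of equation (6), assuming a finite demand support so the
    # infinite tail sums reduce to finite sums. f_ij maps (u_i, u_j) to its
    # probability; p_ij maps (w_ij, y_i, y_j) to the conditional probability (5).
    # All names here are illustrative.

    def subproblem_transition_prob(x, d, y, w_ij, f_ij, p_ij):
        """P_ij[(y_i, y_j, w_ij) | (x_i, x_j, v_ij), a_ij] per equation (6)."""
        (xi, xj), (di, dj), (yi, yj) = x, d, y
        mass = 0.0
        for (ui, uj), prob in f_ij.items():
            # Next inventory is max{x + d - u, 0}; a boundary state y_i = 0
            # pools all demands u_i >= x_i + d_i, hence the inequality tests.
            hit_i = (ui == xi + di - yi) if yi > 0 else (ui >= xi + di)
            hit_j = (uj == xj + dj - yj) if yj > 0 else (uj >= xj + dj)
            if hit_i and hit_j:
                mass += prob
        return mass * p_ij[(w_ij, yi, yj)]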
2.1.1 Determining Subproblem Transition Probabilities
Recall that conditional probabilities pij (wij |yi , yj ) were used to specify the transition probabilities for subproblem MDPij . Computing the conditional probabilities pij (wij |yi , yj ) using (5) is hard, because stationary probabilities ν π (x) have to be computed for all x ∈ X . The conditional probabilities can be estimated by simulation of the overall process under policy π. Let Nijt (yi , yj ) denote the number of times that the inventory levels at customers i and j have been yi and yj respectively by transition t of the simulation, and let Nijt (yi , yj , wij ) denote the number of times that the inventory levels at customers i and j have been yi and yj respectively and the delivery action for customers i and j has been v˜ij (a) = wij by transition t of the simulation of the overall IRP process under policy π. That is, Nijt (yi , yj ) and Nijt (yi , yj , wij ) are updated as follows:
    N_{i,j,t+1}(y_i, y_j) = \begin{cases} N_{ijt}(y_i, y_j) + 1 & \text{if } X_{it} = y_i \text{ and } X_{jt} = y_j \\ N_{ijt}(y_i, y_j) & \text{otherwise} \end{cases}
and
    N_{i,j,t+1}(y_i, y_j, w_{ij}) = \begin{cases} N_{ijt}(y_i, y_j, w_{ij}) + 1 & \text{if } X_{it} = y_i,\ X_{jt} = y_j,\ \text{and } \tilde{v}_{ij}(\pi(X_t)) = w_{ij} \\ N_{ijt}(y_i, y_j, w_{ij}) & \text{otherwise} \end{cases}
Then

    \hat{p}_{ijt}(w_{ij} \mid y_i, y_j) \equiv \frac{N_{ijt}(y_i, y_j, w_{ij})}{N_{ijt}(y_i, y_j)}
gives an estimate of p_ij(w_ij | y_i, y_j) after t transitions of the simulation. Also, it is often easy to obtain good prior estimates of the probabilities p_ij(w_ij | y_i, y_j). One can choose initial values N_ij0(y_i, y_j) and N_ij0(y_i, y_j, w_ij) of the counters, such that \sum_{w_{ij}} N_{ij0}(y_i, y_j, w_{ij}) = N_{ij0}(y_i, y_j) for all (y_i, y_j) ∈ {0, 1, . . . , C_i} × {0, 1, . . . , C_j}, and p̂_ij0(w_ij | y_i, y_j) ≡ N_ij0(y_i, y_j, w_ij)/N_ij0(y_i, y_j) is an initial estimate of p_ij(w_ij | y_i, y_j). It follows from results for Markov chains (Meyn and Tweedie, 1993) that if the Markov chain under policy π has a unique stationary probability distribution ν^π, then, with probability 1, the estimates p̂_ijt(w_ij | y_i, y_j) converge to p_ij(w_ij | y_i, y_j) as t → ∞ for all (y_i, y_j) such that \sum_{\{x \in \mathcal{X} \,:\, x_i = y_i,\, x_j = y_j\}} \nu^{\pi}(x) > 0.
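The counter updates above translate directly into code. The sketch below processes one simulated transition with dictionary-based counters; the names (counts_xy, counts_xyw, observe_transition, p_hat) are our own illustrative choices.

    from collections import defaultdict

    # A minimal sketch of the counter updates for estimating p_ij(w_ij | y_i, y_j).
    # counts_xy and counts_xyw play the roles of N_ijt(y_i, y_j) and
    # N_ijt(y_i, y_j, w_ij); they may be seeded with prior pseudo-counts N_ij0.

    counts_xy = defaultdict(int)
    counts_xyw = defaultdict(int)

    def observe_transition(y_i, y_j, w_ij):
        # One simulated transition of the overall IRP under the current policy:
        # w_ij is the classified delivery action v-tilde for customers i and j.
        counts_xy[(y_i, y_j)] += 1
        counts_xyw[(y_i, y_j, w_ij)] += 1

    def p_hat(w_ij, y_i, y_j):
        # Current estimate of p_ij(w_ij | y_i, y_j); defined to be 0 when the
        # state pair (y_i, y_j) has not been observed (zero denominator).
        n = counts_xy[(y_i, y_j)]
        return counts_xyw[(y_i, y_j, w_ij)] / n if n > 0 else 0.0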
2.1.2 Determining Available Vehicle Capacities
As mentioned in Section 2.1, for a subproblem MDP$_{ij}$, we have to specify the part $\lambda^i_{ij}$ of the vehicle's capacity $\tilde{C}$ that is available for delivery at customer $i$ whenever a vehicle visits both customer $i$ and another customer $k \notin \{i, j\}$, that is, whenever the vehicle availability variable $v_{ij} \in \{b, g, h\}$. Several ways to model these partial vehicle capacities in the subproblems were investigated. As demonstrated in Section 5, good results were obtained by modeling the $\lambda$ parameters in the subproblems as follows. Again, we consider a policy $\pi \in \Pi$ for the overall IRP with unique stationary probability $\nu^\pi(x)$ for each $x \in X$. Let

\[
\lambda^i_{ij} \equiv \frac{\sum_{\{x \in X :\, \tilde{v}_{ij}(\pi(x)) \in \{b, g, h\}\}} \nu^\pi(x)\, d_i(\pi(x))}{\sum_{\{x \in X :\, \tilde{v}_{ij}(\pi(x)) \in \{b, g, h\}\}} \nu^\pi(x)} \tag{8}
\]

if the denominator is positive, and $\lambda^i_{ij} \equiv 0$ if the denominator is 0. The $\lambda$ parameters defined above can also be estimated by simulation of the overall IRP process under policy $\pi$. Let $\hat{\lambda}^i_{ijt}$ denote the estimate of $\lambda^i_{ij}$ after $t$ transitions of the simulation, where $\hat{\lambda}^i_{ij0}$ denotes an initial estimate, such as $\tilde{C}/2$. Let $N^i_{ijt}$ denote the number of times that the delivery action $\tilde{v}_{ij}(\pi(X_s))$ for customers $i$ and $j$ has been in $\{b, g, h\}$ by transition $t$ of the simulation. That is, $N^i_{ijt}$ is updated as follows:
\[
N^i_{i,j,t+1} =
\begin{cases}
N^i_{ijt} + 1 & \text{if } \tilde{v}_{ij}(\pi(X_t)) \in \{b, g, h\} \\
N^i_{ijt} & \text{otherwise}
\end{cases}
\]
The initial value $N^i_{ij0}$ is a weight, in units of number of observations, associated with the initial estimate $\hat{\lambda}^i_{ij0}$. Then the parameter estimates are updated as follows:
\[
\hat{\lambda}^i_{i,j,t+1} =
\begin{cases}
\dfrac{N^i_{ijt}\, \hat{\lambda}^i_{ijt} + d_i(\pi(X_t))}{N^i_{ijt} + 1} & \text{if } \tilde{v}_{ij}(\pi(X_t)) \in \{b, g, h\} \\[2ex]
\hat{\lambda}^i_{ijt} & \text{otherwise}
\end{cases}
\]
As before, it can be shown that if the Markov chain under policy $\pi$ is positive recurrent, then, with probability 1, the estimates $\hat{\lambda}^i_{ijt}$ converge to $\lambda^i_{ij}$ as $t \to \infty$ for all $i$ and $j$ such that $\sum_{\{x \in X :\, \tilde{v}_{ij}(\pi(x)) \in \{b, g, h\}\}} \nu^\pi(x) > 0$. Parameters $\lambda^j_{ij}$ and $\lambda^{ij}_{ij}$ are estimated in a similar way. An even simpler approach, using

\[
\lambda^i \equiv \frac{\sum_{\{x \in X :\, d_i(\pi(x)) > 0\}} \nu^\pi(x)\, d_i(\pi(x))}{\sum_{\{x \in X :\, d_i(\pi(x)) > 0\}} \nu^\pi(x)} \tag{9}
\]

also leads to good results. These quantities $\lambda^i$ can also be estimated by simulation estimates $\hat{\lambda}^i_t$. We have covered the definition of two-customer subproblems at length. We hope that the main ideas have been presented in sufficient detail to make it clear that the same ideas can be applied to subproblems with one customer or with more than two customers.
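The recursive update above is simply a running average over the transitions in which the vehicle is shared. The following is a minimal sketch of that estimator; the prior weight and initial estimate (for example, $\tilde{C}/2$) play the roles of $N^i_{ij0}$ and $\hat{\lambda}^i_{ij0}$ above, and all names are hypothetical.

```python
class PartialCapacityEstimator:
    """Running-average estimate of lambda^i_ij from simulated transitions."""
    def __init__(self, lam0, n0=1.0):
        self.lam = lam0  # initial estimate, e.g. vehicle capacity / 2
        self.n = n0      # prior weight, in units of observations

    def observe(self, d_i, shared):
        # shared: True iff v~_ij(pi(X_t)) is in {b, g, h} this transition;
        # d_i is the quantity delivered to customer i under the policy.
        if shared:
            self.lam = (self.n * self.lam + d_i) / (self.n + 1.0)
            self.n += 1.0
        return self.lam
```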
2.2 Combining Subproblems
The next topic to be addressed is the calculation of the approximate value function $\hat{V}(x)$ at a given state $x$, using the results from the subproblems. Recall that subproblems were formulated and solved for subsets of customers, and that solving the subproblems produces optimal value functions for the subproblems. Let $\mathcal{N} \equiv \{1, \ldots, N\}$ denote the set of customer indices, and let 0 be the index of the vendor's facility. Let $\mathcal{S} \subseteq 2^{\mathcal{N}}$ denote the collection of subsets of the set $\mathcal{N}$ of customers for which subproblems were formulated and solved. In particular, recall that for each customer $i \in \mathcal{N}$, $\{i\} \in \mathcal{S}$. Also, for each $i \in \mathcal{N}$, let $\mathcal{S}_i \equiv \{S \in \mathcal{S} : i \in S\}$ denote the collection of all subsets in $\mathcal{S}$ that contain $i$. For any subset $S \in \mathcal{S}$ and any state $x$ (vector of inventory levels) of the overall process, let $x_S$ denote the subvector of $x$ corresponding to $S$ (vector of inventory levels at the customers in $S$). Also, let $v_S$ denote the vehicle availability component of the state for the subproblem MDP$_S$ for subset $S$, where, for example, $v_S = 0$ denotes that no vehicle is currently available for subset $S$, and $v_S = 1$ denotes that one vehicle is currently available for subset $S$. Thus, solving subproblem MDP$_S$ for $S \in \mathcal{S}$ produces an optimal value function $V^*_S(x_S, v_S)$. Given a state $x$, the approximate value $\hat{V}(x)$ is given by the optimal objective value of the following cardinality constrained partitioning problem.

\[
\begin{aligned}
\hat{V}(x) = \max_y \quad & \sum_{i \in \mathcal{N}} V^*_i(x_i, 0)\, y_{i0} + \sum_{S \in \mathcal{S}} V^*_S(x_S, 1)\, y_{S1} && (10) \\
\text{subject to} \quad & y_{i0} + \sum_{S \in \mathcal{S}_i} y_{S1} = 1 \qquad \forall\, i \in \mathcal{N} && (11) \\
& \sum_{S \in \mathcal{S}} y_{S1} \le M && (12) \\
& y_{i0} \in \{0, 1\} \qquad \forall\, i \in \mathcal{N} && (13) \\
& y_{S1} \in \{0, 1\} \qquad \forall\, S \in \mathcal{S} && (14)
\end{aligned}
\]
The cardinality constrained partitioning problem partitions the set $\mathcal{N}$ of customers into subsets, with each subset $S \subseteq \mathcal{N}$ corresponding to a subproblem MDP$_S$. Each subset $S$ for which $y_{S1} = 1$ is allocated a vehicle, and contributes value $V^*_S(x_S, 1)$ to the objective. Each customer $i$ that is not in any subset that is allocated a vehicle ($y_{i0} = 1$) contributes value $V^*_i(x_i, 0)$ to the objective. The first constraint requires that each customer either is in exactly one subset that is allocated a vehicle, or is not in any subset that is allocated a vehicle. The second constraint requires that at most $M$ vehicles be allocated to subsets. The cardinality constrained partitioning problem in general is NP-hard: even if each subset has no more than three elements, the resulting cardinality constrained partitioning problem is NP-hard, because the 3-Partition problem reduces to such a restricted cardinality constrained partitioning problem. However, for the special case in which each subset has no more than two elements, the cardinality constrained partitioning problem can be solved in polynomial time, by solving a maximum weight perfect matching problem, as described in Section 4.
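For intuition, the following is a minimal brute-force sketch of problem (10)–(14) that enumerates which subsets receive a vehicle. It is practical only for very small instances, and all names are hypothetical (the paper solves this problem by matching or integer programming, not by enumeration).

```python
from itertools import combinations

def approx_value(subsets, V_sub1, V_single0, M):
    """Brute-force solve (10)-(14). subsets: iterable of customer tuples S
    with solved subproblems; V_sub1[S] = V*_S(x_S, 1) and
    V_single0[i] = V*_i(x_i, 0), both already evaluated at the current x."""
    customers = set(V_single0)
    subsets = list(subsets)
    best = float("-inf")
    for k in range(min(M, len(subsets)) + 1):          # constraint (12)
        for chosen in combinations(subsets, k):
            covered = [i for S in chosen for i in S]
            if len(covered) != len(set(covered)):
                continue                               # constraint (11): disjoint
            val = sum(V_sub1[S] for S in chosen)
            val += sum(V_single0[i] for i in customers - set(covered))
            best = max(best, val)
    return best
```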
3 Choosing a Decision in a State
So far, we have described the approximation of the dynamic programming value function, that is, the first major task in the list of major computational tasks for solving the IRP given in Section 1.3. As mentioned, the second major task was addressed as described in Kleywegt et al. (2002). The third major task is the solution of (3) for any given state $x$. In this section we address this step in the development of a solution method for the Markov decision process model of the IRP.

Recall that the formulation and solution of the subproblems used in the construction of the approximating function $\hat{V}$ has to be performed initially only, and not at every stage of the process. In contrast, problem (3) has to be solved at each stage of the process, but at each stage it is solved only for the given current state $x$ of the process. It is therefore acceptable to spend a lot of computational effort on the formulation and solution of the subproblems, but it is desirable to be able to solve the daily problem (3) with relatively little computational effort.

Given the current state, two types of decisions have to be made, namely which customers to visit on each vehicle route, and how much to deliver at those customers. These decisions are related, because the value of visiting a set of customers on the same route depends on the delivery quantities for the customers. For instances with more than approximately four customers and two vehicles, solving the maximization problem to optimality would require an unacceptable computational effort, and therefore the following three-step local search heuristic was developed:

1. Construct an initial solution consisting of only direct delivery routes.

Next, the local search heuristic continues by moving to the best neighboring decision in each iteration until no better neighboring decision can be found. A neighboring decision is formed by adding a customer to an existing route and modifying the delivery quantities. Each iteration consists of Steps 2 and 3:

2. For each existing route, rank all the customers not on the route by an initial estimate of the value of adding the customer to the route.

3. For each route, evaluate more accurately the value of adding to the route the customers not on the route, starting with the most promising customers identified in Step 2 and working down the lists, and stopping the list processing when the accurately evaluated values do not improve. Identify the one customer and one route that lead to the maximum improvement, and add that customer to the route.

Step 2 in the heuristic outlined above can be omitted, and is introduced only for efficiency, because accurately evaluating a decision $a$ involves computing
\[
V(x, a) \equiv g(x, a) + \alpha \int_X \hat{V}(y)\, Q[dy \mid x, a] \tag{15}
\]

where $x$ denotes the current state; this computation is very time consuming. A more detailed description of each of these steps is given below, followed by a statement of the algorithm.
3.1 Step 1: Choosing Direct-Delivery Routes
It is easy to see that a greedy procedure that chooses routes one at a time to optimize the objective function in (3) could lead to bad decisions. For example, suppose that two vehicles are available and there are two customers that urgently need deliveries, but the transportation cost between these two customers is quite large. A greedy procedure may combine both these customers in the route that is chosen first, because of their urgency (and its impact on the penalty cost in $g(x, a)$ as well as the value function $\hat{V}(y)$ at the next state $y$), and then combine other customers in the second route. A better decision may be to combine one urgent customer with some nearby customers in one route, and the other urgent customer with some other nearby customers in the other route. The proposed heuristic avoids the pitfall described above by using a direct delivery solution as a starting point for a local improvement procedure. Specifically, in Step 1 customers are assigned to vehicles using the algorithm proposed in Kleywegt et al. (2002) for the inventory routing problem with direct deliveries. After Step 1 has been completed, each vehicle visits at most one customer. As a route can visit more than one customer, better decisions may be obtained by modifying the direct delivery routes obtained in Step 1. In Steps 2 and 3, the routes are grown in the local search heuristic by including more customers in the routes, as described next.
3.2 Step 2: Ranking Customers to be Added to Routes
An improvement heuristic explores whether a local modification of the current decision leads to a neighboring decision that is better than the current decision, and, if so, adopts the better decision and repeats. In our case, the local modifications considered are moving a customer from one route to another, adding a customer which has not been included in a route yet to one of the routes, and changing the delivery quantities. As mentioned above, given state $x$, a decision $a$ can be evaluated by computing $V(x, a)$ as in (15). However, evaluating all neighboring decisions $a$ as in (15) and then moving to the best neighboring decision requires a prohibitively large amount of computational effort, for the following reasons. For each of the $M$ vehicle routes, one has to choose among $\Theta(N)$ customers to be added to the route, and for each of the resulting $\Theta(MN)$ new routes, a large number of delivery quantity combinations are possible, and thus the number of neighboring decisions can be large. Also, computing the value $V(x, a)$ of a neighboring decision $a$ as in (15) can be very time-consuming for instances with many customers, because of the high dimensional integral (and thus, in the case of a discrete distribution, the large number of terms in the sum), and the effort required to compute $\hat{V}(y)$ for each state $y$ that can be reached from the current state $x$ with decision $a$. These considerations motivate one to find a method to first identify promising neighboring decisions with little computational effort, and thereafter to evaluate only the most promising decisions in more detail. Such a method is described next.

For each of the routes in the current decision, we consider each customer that can be added to the route. The new set of customers in the modified route should be a set of customers that can be visited by a single vehicle, and thus should correspond to an MDP subproblem such as those defined in Section 2.1. To obtain an initial indication of the value of adding a customer to a current route, we use the optimal delivery quantities from the subproblem for the resulting set of customers, with the state of the subproblem given by the inventory levels and an availability of one vehicle to the set of customers. For each of the $M$ vehicle routes, we choose among the $\Theta(N)$ customers that can be added to the route, and thus the number of neighboring decisions has been reduced to $\Theta(MN)$. In the expression (15) for the objective value $V(x, a)$ of a neighboring decision $a$, the single-stage value $g(x, a)$ can be computed quickly, whereas the expected future value $\int_X \hat{V}(y)\, Q[dy \mid x, a]$ is much harder to compute. Also, we observed in empirical studies with the IRP that the decision with the highest single-stage value $g(x, a)$ often also has the highest value of $V(x, a)$ among all the feasible neighboring decisions. Hence, $g(x, a)$ seems to give a good indication of whether it is worth exploring a neighboring decision $a$ in more detail. Thus, given the current state $x$ and current decision, for each of the routes in the current decision, and each customer that can be added to the route, a corresponding neighboring decision $a$ and value $g(x, a)$ have been identified. Next, for each of the routes in the current decision, the customers that can be added to the route are ranked according to the corresponding values $g(x, a)$. Let $j(m, i)$ denote the customer with the $i$th largest value of $g(x, a)$ that can be added to route $m$ in the current decision. The output of Step 2 is this ranking, that is, the set of indices $j(m, i)$.
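In code, Step 2 is just a sort of the candidate customers for each route by the cheap single-stage score. A minimal sketch follows, where single_stage_value is a hypothetical callback that builds the neighboring decision (with subproblem-optimal delivery quantities) and returns $g(x, a)$.

```python
def rank_candidates(routes, candidates_for, single_stage_value):
    """For each route m, return the list j(m, 1), j(m, 2), ... of customers
    ordered by decreasing single-stage value g(x, a)."""
    ranking = {}
    for m, route in enumerate(routes):
        scored = [(single_stage_value(route, j), j)
                  for j in candidates_for(route)]
        scored.sort(key=lambda t: t[0], reverse=True)
        ranking[m] = [j for _, j in scored]
    return ranking
```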
3.3 Step 3: Forming Routes Based on Total Expected Value
In Step 3 we decide which neighboring decision to move to (if any) before returning to Step 2. An outline of Step 3 is as follows. We compute the total expected value $V(x, a)$ resulting from adding those customers to the current routes which obtained the highest values in Step 2. Then we move to a neighboring decision by adding the customer to the current route which leads to the best value of $V(x, a)$ and return to Step 2, if such a value is better than the value of the current decision; otherwise the procedure terminates with the current decision. Next we describe Step 3 in more detail.

Recall that the delivery quantities used in Step 2 were optimal for the subproblems, but may not be good for the overall problem. In Step 3, we choose the delivery quantities (which determine the decision $a$) at the customers more carefully, and compute the total expected value $V(x, a)$ resulting from adding a customer to a route and using the delivery quantities. To choose the delivery quantities in Step 3, we use the following local search method. The set of routes is given. Consider the delivery quantities for one route at a time. For any given route and any given vector of delivery quantities for the route, the neighboring set of delivery quantities for the route consists of all the delivery quantity vectors obtained by one of the following four steps:

(1) decrease one component of the given vector of delivery quantities by one unit (as long as the component is positive), and increase another component by the same amount, that is, swap a unit of delivery between two customers on the given route;

(2) increase the delivery quantity at one customer on the given route by one unit, if the vehicle capacity and the customer capacity allow such an increase;

(3) decrease the delivery quantity at one customer on the given route by one unit, if the delivery quantity at that customer in the given vector is positive;

(4) leave the given vector of delivery quantities unchanged (the null step).

We start the local search at two solutions: (1) the vector of delivery quantities used in Step 2 (the optimal delivery quantities of the subproblems), and (2) the vector of delivery quantities in the current decision. At each iteration of the local search, we consider each of the given routes, and for each route, the vector of delivery quantities is changed to the vector of delivery quantities in its neighborhood with the best value $V(x, a)$. The local search is terminated when a local optimum is found, that is, when no change in delivery quantities takes place during an iteration.

The local search method for the calculation of the delivery quantities for a given set of routes is used as a subroutine for choosing the customer not yet included in a route to be included. The combination of chosen routes with the additional customer and corresponding local optimal delivery quantities defines the neighboring decision moved to next. Let $a(m, i)$ denote the neighboring solution with local optimal delivery quantities when adding customer $j(m, i)$ to route $m$. Next we describe how the routes in the next solution are chosen. For each route $m$ in the current decision, we successively compute the total expected value $V(x, a)$ resulting from adding the next highest valued customer to route $m$. That is, we first compute $V(x, a(m, 1))$ resulting from adding customer $j(m, 1)$ to route $m$. Then we compute $V(x, a(m, 2))$ resulting from adding customer $j(m, 2)$ (but not customer $j(m, 1)$) to route $m$. If $V(x, a(m, 2)) \ge V(x, a(m, 1))$, then we do the same for customer $j(m, 3)$; otherwise we continue with another route. Thus the computation for route $m$ is stopped when we reach a customer $j(m, i)$ for which the total expected value $V(x, a(m, i))$ is worse than the value $V(x, a(m, i-1))$ for customer $j(m, i-1)$, i.e., $V(x, a(m, i)) < V(x, a(m, i-1))$. Due to the preliminary ranking in Step 2, this usually happened in computational tests when $i = 2$. After these computations have been completed for all routes $m$ in the current decision, the neighboring decision $a^*$ that provides the best total expected value $V(x, a^*)$ is determined. If the obtained value $V(x, a^*)$ is better than the total expected value $V(x, a')$ of the current decision $a'$, then $a^*$ becomes the new current decision, and the procedure returns to Step 2; otherwise the procedure stops with $a'$ as the chosen decision. The procedure also stops if no more customers can be added to any routes.

As mentioned before, in the expression (15) for $V(x, a)$, the expected future value $\int_X \hat{V}(y)\, Q[dy \mid x, a]$ is a high dimensional integral if there are a large number of customers (and thus is the sum of a large number of terms if the distribution is discrete and the demand of each customer can take on several values). As pointed out in Kleywegt et al. (2002), if the random vector is high dimensional, then it is usually more efficient to estimate the expected value with random sampling. Related issues to be addressed are (1) how large the sample size should be, and (2) what performance guarantees can be obtained if random sampling is used to choose the best decision. To address these issues, we used a ranking and selection method based on the work of Nelson and Matejcik (1995). We also used variance reduction techniques, such as common random numbers and experimental designs such as orthogonal arrays, to reduce the sample size needed for a specified level of accuracy. Additional details are given in Kleywegt et al. (2002). Algorithm 2 gives an overview of the steps in the procedure to choose a decision for a given state. Recall that Algorithm 1 is executed only once initially, whereas Algorithm 2 is executed at each stage of the process, each time for the given current state of the process.
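The delivery-quantity local search lends itself to a compact implementation. The following is a minimal sketch for a single route, assuming integer quantities and a hypothetical callback value(q) that returns $V(x, a)$ for the decision with quantities q; the capacity checks are simplified relative to the full feasibility conditions of the paper.

```python
def local_search_quantities(q, value, vehicle_cap, cust_cap):
    """Improve the delivery-quantity vector q for one route by repeatedly
    moving to the best neighbor (unit swap, unit increase, unit decrease,
    or null step) until no neighbor improves value(q)."""
    while True:
        best_q, best_v = q, value(q)
        n = len(q)
        neighbors = []
        for i in range(n):
            for j in range(n):
                if i != j and q[i] > 0 and q[j] < cust_cap[j]:
                    nq = list(q); nq[i] -= 1; nq[j] += 1
                    neighbors.append(nq)          # (1) swap one unit
            if sum(q) < vehicle_cap and q[i] < cust_cap[i]:
                nq = list(q); nq[i] += 1
                neighbors.append(nq)              # (2) deliver one more unit
            if q[i] > 0:
                nq = list(q); nq[i] -= 1
                neighbors.append(nq)              # (3) deliver one unit less
        for nq in neighbors:
            v = value(nq)
            if v > best_v:
                best_q, best_v = nq, v
        if best_q == q:                           # (4) null step is best
            return q
        q = best_q
```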
4 A Special Case: Each Subset at Most Two Customers
In Section 2.2 it was shown how the subproblem results are combined to calculate the approximate value $\hat{V}(x)$ for any given state $x$, by solving a cardinality constrained partitioning problem. In the special case in which each subset has no more than two elements, the cardinality constrained partitioning problem (subsequently called the partitioning problem) can be solved in polynomial time, by solving a maximum weight perfect matching problem; in this section we describe that construction. (In the application that motivated this research, most vehicle routes visit at most two customers, and thus each subset has no more than two elements.) Specifically, the maximum weight perfect matching problem can be solved in $O(n^2 m)$ time with Edmonds' (1965a, 1965b) algorithm, where $n$ is the number of nodes and $m$ is the number of arcs in the graph, or in $O(n(m + n \log n))$ time with Gabow's (1990) algorithm. In our computational work, we used the Blossom IV implementation described in Cook and Rohe (1998). In the construction explained next, $n = 4N + 2M$ and $m = |\mathcal{N}_2| + N + M + 2N(2N + 2M)$.

We describe the maximum weight perfect matching problem (subsequently called the matching problem) by describing the corresponding graph $G = (V, E)$. Let $\mathcal{N}_2 \equiv \{S \in \mathcal{S} : |S| = 2\}$ denote the collection of subsets in $\mathcal{S}$ of cardinality 2. There are four subsets of nodes, $V \equiv V_1 \cup V_2 \cup V_3 \cup V_4$, and four subsets of edges, $E \equiv E_1 \cup E_2 \cup E_3 \cup E_4$. Nodes in $V_1$ represent customers, $V_1 \equiv \{1_1, \ldots, i_1, \ldots, N_1\}$, and for each pair of customers $\{i, j\} \in \mathcal{N}_2$, there is an edge $(i_1, j_1) \in E_1$ with value $V^*_{ij}(x_i, x_j, 1)$. For each customer $i \in \mathcal{N}$, there is also a node $i_2 \in V_2$, and an edge $(i_1, i_2) \in E_2$ with value $V^*_i(x_i, 1)$. Choosing an edge $(i_1, j_1) \in E_1$ represents assigning a vehicle to subset $\{i, j\} \in \mathcal{N}_2$ (for the purpose of computing $\hat{V}(x)$), and choosing an edge $(i_1, i_2) \in E_2$ represents assigning a vehicle to customer $i$ by itself. Vehicles can also be left idle. To capture that, there are $2M$ nodes, $V_3 \equiv \{1_3, \ldots, (2M)_3\}$, and $M$ edges, $E_3 \equiv \{(1_3, 2_3), (3_3, 4_3), \ldots, ((2M-1)_3, (2M)_3)\}$, each with value 0.
Algorithm 2 Choosing a Decision in a Given State x

Step 1:
  Compute direct-delivery routes and delivery quantities using the algorithm in Kleywegt et al. (2002).
  Set current decision a′ equal to the resulting decision.
Step 2:
  if no more customers can be added to any routes then
    Stop with current decision a′ as the chosen decision.
  end if
  for each route m in the current decision a′ do
    for each customer j that can be added to route m do
      Add customer j to route m.
      Use the optimal delivery quantities from the subproblem corresponding to the resulting route to determine the neighboring decision a.
      Compute the single-stage value g(x, a).
      Remove customer j from route m.
    end for
    Sort the customers that can be added to route m, in decreasing order of the single-stage values g(x, a), to obtain a sorted list of customers j(m, 1), j(m, 2), . . . for route m.
  end for
Step 3:
  for each route m in the current decision a′ do
    Set i ← 1.
    if no customers can be added to route m then
      Continue with the next route m in the current decision a′.
    end if
    Add customer j(m, i) to route m.
    Choose the delivery quantities using local search to determine the decision a(m, i).
    Remove customer j(m, i) from route m.
    repeat
      if no more customers can be added to route m then
        Break out of the repeat loop.
      end if
      Increment i ← i + 1.
      Add customer j(m, i) to route m.
      Choose the delivery quantities using local search to determine the decision a(m, i).
      Remove customer j(m, i) from route m.
    until V(x, a(m, i)) < V(x, a(m, i − 1))
  end for
  Let m* be the route, j* be the added customer, and a* be the decision with the best value of V(x, a(m, i)).
  if V(x, a*) > V(x, a′) then
    Add customer j* to route m*, and set a′ ← a* as the new current decision.
    Go to Step 2.
  else
    Stop with current decision a′ as the chosen decision.
  end if
(It follows from the definitions of the subproblems that $V^*_i(x_i, 1) \ge V^*_i(x_i, 0)$ for all $i$ and $x_i$, and thus if $N \ge 2M - 1$, then there is always an optimal solution of the partitioning problem such that all the vehicles are assigned, that is, $\sum_{S \in \mathcal{S}} y^*_{S1} = M$. In such a case, there is no need for any nodes in $V_3$ or any edges in $E_3$. This was the case in the motivating application.) Figure 1 shows the vertices in $V_1 \cup V_2 \cup V_3$ and the edges in $E_1 \cup E_2 \cup E_3$ of the matching graph $G = (V, E) = (V_1 \cup V_2 \cup V_3 \cup V_4, E_1 \cup E_2 \cup E_3 \cup E_4)$ for an example with $N = 3$ customers and $M = 2$ vehicles. In the example, $V_1 = \{1_1, 2_1, 3_1\}$, $E_1 = \{(1_1, 2_1), (1_1, 3_1), (2_1, 3_1)\}$, $V_2 = \{1_2, 2_2, 3_2\}$, $E_2 = \{(1_1, 1_2), (2_1, 2_2), (3_1, 3_2)\}$, $V_3 = \{1_3, 2_3, 3_3, 4_3\}$, and $E_3 = \{(1_3, 2_3), (3_3, 4_3)\}$. The nonzero edge values are also shown in the figure.

Thus so far there are $|V_1| + |V_2| + |V_3| = 2N + 2M$ nodes. The assignment of $M$ vehicles is to be represented by the matching of $2M$ nodes. To match the remaining $2N$ nodes, there are $2N$ additional nodes, $V_4 \equiv \{1_4, \ldots, (2N)_4\}$, and $(2N)(2N + 2M)$ edges, $E_4 \equiv E_{41} \cup E_{42} \cup E_{43}$, where $E_{4k} \equiv V_k \times V_4$. Each edge $(i_1, j_4) \in E_{41}$ has value $V^*_i(x_i, 0)$, and each edge in $E_{42}$ and $E_{43}$ has value 0. (The number of edges can be reduced, for example by having only edges between odd numbered nodes in $V_3$ and odd numbered nodes in $V_4$, and between even numbered nodes in $V_3$ and even numbered nodes in $V_4$.) Figure 2 shows the vertices in $V_1 \cup V_2 \cup V_3 \cup V_4$ and the edges in $E_4$ of the matching graph $G = (V, E)$ for the example with $N = 3$ customers and $M = 2$ vehicles. In the example, $V_4 = \{1_4, \ldots, 6_4\}$. The nonzero edge values shown in the figure are as follows: $x = V^*_1(x_1, 0)$, $y = V^*_2(x_2, 0)$, and $z = V^*_3(x_3, 0)$. Edges without values shown have value 0.

Proposition 1. The partitioning and matching problems described above are equivalent. That is, for any feasible solution of the partitioning problem (10)–(14), there is a feasible solution of the matching problem on the graph $G$ described above with the same objective value, and for any feasible solution of the matching problem, there is a feasible solution of the partitioning problem with the same objective value.

Proof. Consider any feasible solution of the partitioning problem. We select edges one at a time, while maintaining feasibility of the matching, until a perfect matching has been constructed. Start with no edges in $E$ selected. First, list the subsets $S \in \mathcal{S}$ in any sequence. We claim that edges in $E$ can be selected according to the following cases for each $S$ in the list, while maintaining feasibility of the matching.

Case 1.1: If $S = \{i, j\}$, $i \ne j$, and $y_{S1} = 1$, then any two unmatched nodes $k_4$ and $l_4$ in $V_4$ are picked, and edges $(i_1, j_1)$, $(i_2, k_4)$ and $(j_2, l_4)$ are selected.

Case 1.2: If $S = \{i, j\}$, $i \ne j$, and $y_{S1} = 0$, then no edges are selected for this $S$.

Case 2.1: If $S = \{i\}$ and $y_{S1} = 1$, then edge $(i_1, i_2)$ is selected.

Case 2.2: If $S = \{i\}$ and $y_{i0} = 1$, then any two unmatched nodes $k_4$ and $l_4$ in $V_4$ are picked, and the corresponding edges $(i_1, k_4)$ and $(i_2, l_4)$ are selected.

Case 2.3: If $S = \{i\}$, $y_{S1} = 0$, and $y_{i0} = 0$, then no edges are selected for this $S$.
Figure 1: Part $(V_1 \cup V_2 \cup V_3, E_1 \cup E_2 \cup E_3)$ of the matching graph $G = (V, E) = (V_1 \cup V_2 \cup V_3 \cup V_4, E_1 \cup E_2 \cup E_3 \cup E_4)$ for an example with $N = 3$ customers and $M = 2$ vehicles.
Figure 2: Part $(V_1 \cup V_2 \cup V_3 \cup V_4, E_4)$ of the matching graph $G = (V, E) = (V_1 \cup V_2 \cup V_3 \cup V_4, E_1 \cup E_2 \cup E_3 \cup E_4)$ for an example with $N = 3$ customers and $M = 2$ vehicles. Nonzero edge values are as follows: $x = V^*_1(x_1, 0)$, $y = V^*_2(x_2, 0)$, and $z = V^*_3(x_3, 0)$.
Note that (11) excludes the case with $S = \{i\}$, $y_{S1} = 1$, and $y_{i0} = 1$. Thus exactly one of the cases above holds for each $S \in \mathcal{S}$. It follows from the construction of $G$ that all the edges selected in the cases above are in $E$. (Recall that $E_4 \equiv E_{41} \cup E_{42} \cup E_{43}$, where $E_{4k} \equiv V_k \times V_4$. Thus, as long as there are a sufficient number of nodes in $V_4$, the nodes in $V_4$ can be matched as described in the cases above.) To justify the claim, we need to show that for each $S$, there are sufficient unmatched nodes in $V_4$, and that feasibility of the matching is maintained.

Next we show that there are a sufficient number of nodes in $V_4$. After all subsets $S \in \mathcal{S}$ have been processed, the number of nodes in $V_4$ that have been picked is $2 \sum_{S \in \mathcal{N}_2} y_{S1} + 2 \sum_{i \in \mathcal{N}} y_{i0}$. Note that $\sum_{i \in \mathcal{N}} \sum_{S \in \mathcal{S}_i} y_{S1} = 2 \sum_{S \in \mathcal{N}_2} y_{S1} + \sum_{i \in \mathcal{N}} y_{i1}$. Hence, by adding constraint (11) over all $i \in \mathcal{N}$, it follows that

\[
\sum_{i \in \mathcal{N}} y_{i0} + 2 \sum_{S \in \mathcal{N}_2} y_{S1} + \sum_{i \in \mathcal{N}} y_{i1} = N \tag{16}
\]

and thus

\[
\sum_{i \in \mathcal{N}} y_{i0} + 2 \sum_{S \in \mathcal{N}_2} y_{S1} \le N \tag{17}
\]

and

\[
\sum_{i \in \mathcal{N}} y_{i0} \le N \tag{18}
\]

By adding (17) and (18) it follows that the number of nodes in $V_4$ that have been picked is less than or equal to $2N$, which is the number of nodes in $V_4$.

Next it is shown that the matching constructed so far is feasible. In fact, so far each node in $V_1$ and $V_2$ has been matched with exactly one other node, because constraint (11) implies that for each $i \in \mathcal{N}$, exactly one of the following holds: (1) Case 1.1 for one $S \in \mathcal{S}_i$, or (2) Case 2.1, or (3) Case 2.2. In each case, each of nodes $i_1$ and $i_2$ is matched with exactly one node. Also, so far none of the nodes in $V_3$ has been matched, and each node in $V_4$ has been matched with at most one other node.

Next we continue the construction of the perfect matching. The number of unassigned vehicles is $M - \sum_{S \in \mathcal{S}} y_{S1}$. Thus any $M - \sum_{S \in \mathcal{S}} y_{S1}$ edges in $E_3$ are selected. By the definition of $E_3$, these edges have no nodes in common. Hence the number of unmatched nodes in $V_3$ is $2 \sum_{S \in \mathcal{S}} y_{S1}$. The number of unmatched nodes in $V_4$ is

\[
2N - 2 \sum_{S \in \mathcal{N}_2} y_{S1} - 2 \sum_{i \in \mathcal{N}} y_{i0} = 2 \sum_{S \in \mathcal{N}_2} y_{S1} + 2 \sum_{i \in \mathcal{N}} y_{i1} = 2 \sum_{S \in \mathcal{S}} y_{S1}
\]

where the first equality follows from (16). That is, the number of unmatched nodes in $V_3$ is equal to the number of unmatched nodes in $V_4$. Now each unmatched node in $V_3$ is matched with an unmatched node in $V_4$ by selecting an edge in $E_{43} \equiv V_3 \times V_4$. The construction of the perfect matching is complete. It is easily checked that the objective value of the perfect matching is the same as that of the given partitioning solution, because only edges incident to nodes in $V_1$ have nonzero values.

Conversely, consider any feasible solution of the matching problem. We construct a feasible solution of the partitioning problem with the same objective value, as follows. For each node $i_1 \in V_1$, exactly one of the following three cases holds.

Case a: If an edge $(i_1, j_1) \in E_1$ ($i \ne j$) is selected, then set $y_{S1} = 1$ for $S = \{i, j\}$.

Case b: If an edge $(i_1, i_2) \in E_2$ is selected, then set $y_{S1} = 1$ for $S = \{i\}$.

Case c: If an edge $(i_1, k_4) \in E_{41}$ is selected, then set $y_{i0} = 1$.

All other decision variables of the partitioning problem are set to 0. It follows that constraint (11) is satisfied. Let $M'$ denote the number of edges in $E_3$ that are selected, matching $2M'$ nodes in $V_3$. The remaining $2M - 2M'$ nodes in $V_3$ have to be matched with nodes in $V_4$. Thus the remaining $2N - (2M - 2M')$ nodes in $V_4$ have to be matched with nodes in $V_1 \cup V_2$. Hence $2N - [2N - (2M - 2M')] = 2M - 2M'$ nodes in $V_1 \cup V_2$ are matched with each other, setting $M - M'$ variables $y_{S1}$ equal to 1. Thus $\sum_{S \in \mathcal{S}} y_{S1} = M - M' \le M$, and constraint (12) is satisfied. It is again easily checked that the objective value of the resulting partitioning solution is the same as that of the matching.
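To make the construction concrete, the following is a minimal sketch that builds the graph of this section and computes $\hat{V}(x)$ with networkx's general max_weight_matching routine (not the Blossom IV code used in the paper). The dictionary names are hypothetical, and the $V^*$-values are assumed to be already evaluated at the current state.

```python
import networkx as nx

def matching_value(N, M, V_pair1, V_single1, V_single0):
    """Build the Section 4 matching graph and return V_hat(x).
    V_pair1[(i, j)] = V*_ij(x_i, x_j, 1) for pairs {i, j} with a subproblem;
    V_single1[i] = V*_i(x_i, 1); V_single0[i] = V*_i(x_i, 0)."""
    G = nx.Graph()
    for (i, j), w in V_pair1.items():                 # E1: vehicle serves {i, j}
        G.add_edge(("c", i), ("c", j), weight=w)
    for i in range(1, N + 1):                         # E2: vehicle serves i alone
        G.add_edge(("c", i), ("d", i), weight=V_single1[i])
    for k in range(M):                                # E3: idle-vehicle edges
        G.add_edge(("v", 2 * k), ("v", 2 * k + 1), weight=0.0)
    for t in range(2 * N):                            # V4 filler nodes and E4
        u = ("u", t)
        for i in range(1, N + 1):
            G.add_edge(("c", i), u, weight=V_single0[i])  # E41: i unserved
            G.add_edge(("d", i), u, weight=0.0)           # E42
        for k in range(2 * M):
            G.add_edge(("v", k), u, weight=0.0)           # E43
    # Maximum weight among maximum-cardinality (here: perfect) matchings.
    mate = nx.max_weight_matching(G, maxcardinality=True)
    return sum(G[a][b]["weight"] for a, b in mate)
```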
5 Computational Results
In this section, we discuss a number of experiments conducted to assess the efficacy and study the computational behavior of the dynamic programming approximation method. More specifically, the purposes of the experiments are

1. to evaluate the quality of the policies produced by the dynamic programming approximation method,

2. to analyze the impact of various problem characteristics, such as the number of customers, the number of vehicles, and the coefficients of variation of customer usage, on the quality of the policies produced, and

3. to measure the computational requirements of the proposed method.

All instances used for the computational tests are given in the appendix. First we describe the performance measures used to evaluate and present the qualities of policies.
5.1 Evaluating Policies and Comparing Value Functions
We evaluate policies by comparing their value functions with the optimal value functions for small instances (for which the optimal value functions can be computed to within a small tolerance of optimality with reasonable computational effort) and with the value functions of competing policies for larger instances. However, it is difficult to present the quality of a policy $\pi$ in a concise way, because it involves comparing the value function $V^\pi(x)$ of policy $\pi$ either with the optimal value function $V^*(x)$ or with the value function $V^{\tilde{\pi}}(x)$ of a competing policy $\tilde{\pi}$ over all states $x$. Therefore, for small instances, we have chosen to compare the average value of policy $\pi$ over all states with the average optimal value over all states. That is, $V^\pi_{\mathrm{avg}} \equiv \sum_{x \in X} V^\pi(x)/|X|$ is compared with $V^*_{\mathrm{avg}} \equiv \sum_{x \in X} V^*(x)/|X|$. Also, since we realize that averaging over all states may smooth out irregularities, we augment this comparison with a comparison of the minimum and maximum values over all states. That is, $V^\pi_{\min} \equiv \min_{x \in X} V^\pi(x)$ and $V^\pi_{\max} \equiv \max_{x \in X} V^\pi(x)$ are compared with $V^*_{\min} \equiv \min_{x \in X} V^*(x)$ and $V^*_{\max} \equiv \max_{x \in X} V^*(x)$.

In addition to presenting statistics of the actual values of the policies as described above, we also present the values of these policies relative to the optimal values. To eliminate the effect of negative optimal values, or values in the denominator close to zero, we shift all the values to fix the minimum value of the shifted optimal value function at 1. Specifically, let $m \equiv \min_{x \in X} V^*(x)$ and, for any stationary policy $\pi$, let $\rho^\pi(x) \equiv [V^\pi(x) - m + 1]/[V^*(x) - m + 1]$. For each policy $\pi$ evaluated, we present $\rho^\pi_{\mathrm{avg}} \equiv \sum_{x \in X} \rho^\pi(x)/|X|$, $\rho^\pi_{\min} \equiv \min_{x \in X} \rho^\pi(x)$, and $\rho^\pi_{\max} \equiv \max_{x \in X} \rho^\pi(x)$.

For small instances, the value function $V^\pi(x)$ of each policy $\pi$ is computed using the Gauss-Seidel policy evaluation algorithm (see, for example, Puterman 1994). For larger instances, the Gauss-Seidel policy evaluation algorithm is not useful for computing the value functions of policies, because the number of states becomes too large, and hence the available computer memory is not sufficient to store the values of all the states, and the computation time also becomes excessive. For the same reasons, the optimal value functions cannot be computed for larger instances. In the absence of optimal values, we compare our dynamic programming approximation policy (KNS), presented in Algorithm 2, with the following two policies. The first competing policy is a slightly modified version (to account for additional terms in our objective) of the policy proposed by Chien et al. (1989) (CBW), as described in Kleywegt et al. (2002). The second competing policy is a myopic policy (Myopic) that takes only the single-stage costs into account, i.e., the policy obtained by using value function approximation $\hat{V} = 0$ or discount factor $\alpha = 0$.

The policies were evaluated by randomly choosing five initial states, and then simulating the processes under each of the different policies starting from the chosen initial states. Six sample paths were generated for each combination of policy and initial state, for each problem instance. Each replication produced a sample path over a relatively long but finite time horizon of 800 time periods, each resulting in a total discounted reward. The length of the time horizon was chosen to bound the discounted truncation error to less than 0.01 (approximately 0.1%). The sample means $\mu$ and standard deviations $\sigma$ of the sample means of the total discounted rewards over the six sample paths, as well as intervals $(\mu - 2\sigma, \mu + 2\sigma)$, are presented.
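A minimal sketch of this evaluation procedure follows; step is a hypothetical callback that simulates one transition under the policy being evaluated, and the discount factor and horizon are illustrative defaults.

```python
import numpy as np

def evaluate_policy(step, x0, alpha=0.99, horizon=800, reps=6, seed=0):
    """Estimate the expected total discounted reward from initial state x0.
    step(state, rng) -> (reward, next_state) simulates one transition under
    the policy. Returns mu, sigma (std. dev. of the sample mean), and the
    interval (mu - 2*sigma, mu + 2*sigma)."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(reps):
        x, discount, total = x0, 1.0, 0.0
        for _ in range(horizon):  # truncation error shrinks like alpha**horizon
            reward, x = step(x, rng)
            total += discount * reward
            discount *= alpha
        totals.append(total)
    mu = float(np.mean(totals))
    sigma = float(np.std(totals, ddof=1)) / np.sqrt(reps)
    return mu, sigma, (mu - 2 * sigma, mu + 2 * sigma)
```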
5.2 Policy Quality with Small Instances
In this section we describe the results of computational tests performed to evaluate the quality of our proposed policy KNS for small instances. At the same time, we address an issue encountered during the development of the value function approximation regarding how to capture the interactions between the customer(s) in a subproblem and the remaining customers. One aspect of this interaction is the fact that when a vehicle visits both a customer $i$ in a subproblem MDP$_{ij}$ and a customer $k$ not in the subproblem, then less than the full vehicle capacity is available for delivery at the customer in the subproblem. As described in Section 2.1, this interaction is captured by the partial vehicle capacities $\lambda^i_{ij}$ available to customer $i$ in subproblem MDP$_{ij}$. Appropriate values for $\lambda^i_{ij}$ can be estimated with (8) or (9). A relevant question is how sensitive the resulting policy is with respect to the estimates of $\lambda^i_{ij}$. In the first set of experiments, we compare the effect on the solution quality of using a simple estimate $\hat{\lambda}^i_{ij} = 0.5\tilde{C}$ (policy $\pi_1$) to that of using an estimate obtained using (9) and simulation (policy $\pi_2$). For both policies, the expected values in (4) are computed exactly, and the decision in each state is chosen by evaluating all feasible decisions in that state. We compare the value functions of policies $\pi_1$ and $\pi_2$ with the optimal value function for small but nontrivial instances of the IRP. These comparisons are given in Tables 2 and 3.

Table 2: Comparison of the values of policies that use different estimates of the partial vehicle availabilities $\lambda^i_{ij}$, with the optimal values.

Instance   V*_min   V*_avg   V*_max   V^π1_min   V^π1_avg   V^π1_max   V^π2_min   V^π2_avg   V^π2_max
topt1      66.77    68.13    69.21    66.42      67.84      68.83      66.58      68.09      69.05
topt2      66.62    69.19    70.63    65.96      68.65      70.00      66.43      69.12      70.51
topt3      22.93    27.17    29.78    22.17      26.53      28.98      22.74      27.10      29.63
topt4      148.02   153.19   156.42   145.68     151.34     154.53     147.27     152.90     155.71
Table 3: Comparison of the ratios of the values of policies that use different estimates of the partial vehicle availabilities $\lambda^i_{ij}$, relative to the optimal values.

Instance   ρ^π1_min   ρ^π1_avg   ρ^π1_max   ρ^π2_min   ρ^π2_avg   ρ^π2_max
topt1      0.957      0.973      0.980      0.969      0.988      0.995
topt2      0.952      0.963      0.984      0.965      0.985      0.994
topt3      0.961      0.971      0.976      0.967      0.988      0.994
topt4      0.950      0.962      0.973      0.968      0.984      0.990
When we look at the results in Tables 2 and 3, we observe that the values of the two policies are very close to the optimal values, which indicates that our overall approach provides good policies. Furthermore, the results also reveal that using (9) and simulation to estimate $\lambda^i_{ij}$ provides a better policy than using a crude estimate, at the cost of only a small increase in computation time. Hence, we used (9) and simulation to estimate $\lambda^i_{ij}$ in the experiments discussed in the remainder of this section.
5.3 Policy Quality with Larger Instances
In this section we describe the results of computational tests performed to evaluate the quality of our proposed policy KNS for larger instances. In Section 5.3.1, we first focus on the special case in which delivery routes visit at most two customers. In that situation, it suffices to consider subproblems of at most two customers to approximate the value function, and, as discussed in Section 4, calculating the approximate value function at a given state can be done in polynomial time by solving a maximum weight perfect matching problem. Thereafter, in Section 5.3.2, we compare the objective value of the KNS policy for the case in which delivery routes visit at most two customers with the case in which delivery routes visit up to three customers.

5.3.1 At Most Two Customers Per Route
We conducted three experiments with each of the three policies KNS, CBW, and Myopic. In each of these experiments, we varied a single instance characteristic and observed the impact on the performance of the policies. The three instance characteristics varied are (1) the number of customers, (2) the number of vehicles, and (3) the coefficient of variation of customer demand.

To study the impact of the number of customers on the performance of the policies, the instances were generated so that larger instances have more customers with the same characteristics as the smaller instances. Hence, customer characteristics as well as the ratio of delivery capacity to total expected demand were kept the same for all instances. Table 4 shows the performance of the policies on instances with varying numbers of customers. The results clearly demonstrate that the KNS policy consistently outperforms the other policies. Furthermore, the difference in quality does not appear to increase or decrease with the number of customers.

Second, we studied the impact of the number of vehicles, and thus the delivery capacity available, on the performance of the policies. The number of vehicles was chosen in such a way that we could study the effectiveness of the policies when the available delivery capacity is smaller than the total expected demand, as well as when there is surplus delivery capacity. The results are given in Table 5. Intuitively, it is clear that when the delivery capacity is very restrictive, i.e., the number of vehicles is small, then it becomes more important to use the available capacity wisely. The results show the superiority of the KNS policy in handling situations with tight delivery capacity: the differences in quality are much larger for tightly constrained instances than for loosely constrained instances.

Third, we studied the impact of the coefficient of variation of customer demand on the performance of the policies.
Table 4: Comparison of the values of policies for instances with different numbers of customers.

                      CBW                               Myopic                              KNS
N    Instance    µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ
10   tcst1     -12.45   0.37  -13.20  -11.71    -11.39   0.26  -11.91  -10.86     -8.60   0.27   -9.13   -8.07
               -12.21   0.27  -12.74  -11.67    -11.25   0.20  -11.64  -10.86     -8.73   0.29   -9.32   -8.14
               -11.97   0.28  -12.54  -11.40    -11.88   0.34  -12.56  -11.21     -8.53   0.11   -8.75   -8.31
               -12.19   0.40  -12.98  -11.39    -11.65   0.24  -12.13  -11.18     -8.63   0.20   -9.04   -8.22
               -13.08   0.24  -13.57  -12.60    -11.73   0.18  -12.09  -11.38     -8.92   0.27   -9.46   -8.37
15   tcst2     -17.62   0.42  -18.47  -16.78    -17.17   0.24  -17.64  -16.70    -13.10   0.13  -13.35  -12.85
               -17.76   0.28  -18.32  -17.20    -17.09   0.28  -17.66  -16.53    -13.57   0.10  -13.77  -13.38
               -18.25   0.42  -19.08  -17.41    -17.30   0.25  -17.80  -16.79    -13.34   0.21  -13.77  -12.92
               -17.37   0.39  -18.16  -16.58    -17.13   0.17  -17.48  -16.79    -13.63   0.31  -14.24  -13.02
               -18.17   0.33  -18.83  -17.52    -16.92   0.15  -17.21  -16.62    -13.45   0.16  -13.78  -13.13
20   tcst3     -20.58   0.36  -21.30  -19.86    -19.84   0.35  -20.54  -19.13    -16.68   0.28  -17.24  -16.12
               -20.81   0.29  -21.38  -20.24    -19.35   0.37  -20.10  -18.60    -16.85   0.27  -17.39  -16.30
               -20.49   0.34  -21.18  -19.81    -19.21   0.28  -19.77  -18.66    -16.43   0.18  -16.79  -16.07
               -21.25   0.33  -21.91  -20.58    -19.28   0.35  -19.97  -18.58    -16.59   0.30  -17.18  -15.99
               -20.36   0.26  -20.89  -19.84    -19.87   0.42  -20.72  -19.02    -16.21   0.27  -16.75  -15.66
Table 5: Comparison of the values of policies for instances with different numbers of vehicles.

                      CBW                               Myopic                              KNS
M    Instance    µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ
3    tveh1     -65.44   0.17  -65.78  -65.10    -64.11   0.18  -64.48  -63.75    -58.58   0.19  -58.96  -58.20
               -65.85   0.25  -66.34  -65.35    -63.73   0.25  -64.23  -63.23    -59.24   0.29  -59.82  -58.65
               -65.85   0.20  -66.24  -65.45    -63.82   0.25  -64.31  -63.33    -59.05   0.23  -59.52  -58.58
               -66.03   0.19  -66.41  -65.64    -63.84   0.22  -64.29  -63.40    -58.92   0.21  -59.35  -58.50
               -65.72   0.32  -66.36  -65.07    -63.93   0.27  -64.47  -63.40    -58.73   0.18  -59.09  -58.36
6    tveh2       1.41   0.13    1.16    1.66      2.00   0.26    1.48    2.51      4.83   0.22    4.39    5.27
                 1.17   0.24    0.70    1.65      2.17   0.18    1.81    2.52      5.30   0.17    4.96    5.64
                 1.43   0.18    1.08    1.78      1.58   0.27    1.04    2.12      5.43   0.24    4.95    5.91
                 1.30   0.16    0.99    1.62      1.96   0.36    1.24    2.68      5.14   0.26    4.61    5.67
                 0.82   0.20    0.42    1.22      2.18   0.29    1.60    2.75      5.28   0.24    4.79    5.76
9    tveh3      15.01   0.32   14.37   15.65     16.10   0.21   15.69   16.52     18.34   0.18   17.97   18.71
                15.28   0.19   14.90   15.66     15.93   0.18   15.56   16.29     18.06   0.24   17.58   18.53
                15.15   0.12   14.91   15.39     15.98   0.19   15.59   16.36     17.64   0.14   17.36   17.91
                15.30   0.24   14.83   15.78     16.09   0.22   15.65   16.53     18.17   0.33   17.52   18.83
                14.87   0.19   14.48   15.26     16.23   0.29   15.64   16.82     17.84   0.24   17.35   18.33
The customer demand distributions for the three instances were selected so that the demand distribution is the same for all customers in an instance, and the expected customer demand for each of the instances is 5. We varied the distributions so that the customer demands have different variances, namely 1, 4, and 16. All other characteristics are exactly the same for the instances. The results are given in Table 6. The results show that when the coefficients of variation of customer demand are large and it becomes less clear what the future is going to bring, then the difference in quality between the KNS policy and the other policies tends to be smaller, although the KNS policy still does better on every instance. As expected, this indicates that carefully taking into account the available information about the future, such as through dynamic programming approximation methods, provides more benefit if more information is available about the future.

Next, we compare the performance of the three policies on an instance derived from real-world data. The data for this was obtained from one of the smaller plants of a leading producer and distributor of air products. Before describing the results, we indicate some features of the data which are interesting and present in most data sets obtained from this company. We also indicate some of the changes that were made to the data to make them consistent with the input requirements of our algorithm.

1. Tank sizes at the customers range from 90,000 cubic feet to 700,000 cubic feet. The tank sizes at the customers were rounded to the nearest multiple of 25,000, and product quantities were discretized in multiples of 25,000.

2. The company did not have estimates of the probability distributions of demands at the customers. However, they did have estimates of the mean and standard deviation of the demand. Using the mean and standard deviation, we created a discrete demand distribution for each customer with the given mean and standard deviation.

3. The company did not provide exact values for the revenue earned per unit of product delivered. We used the same value for the revenue per unit of product at all the customers, assuming that the company charged the same price to all its customers.

The performance of the three policies is shown in Table 7. As before, the performance of policy KNS is much better than the Myopic policy, which in turn is better than the CBW policy. Overall, the computational experiments conducted demonstrate the viability of using dynamic programming approximation methods for the IRP.

5.3.2 Up to Three Customers Per Route
Tables 8, 9, 10, and 11 present the differences in the objective values of the KNS policy between a problem that allows at most two customers per route and a problem that allows up to three customers per route.
Table 6: Comparison of the values of policies for instances with different demand variances.

                      CBW                               Myopic                              KNS
CV   Instance    µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ
0.1  tvar1     -17.21   0.28  -17.76  -16.65    -16.69   0.28  -17.24  -16.14    -14.02   0.24  -14.50  -13.55
               -17.81   0.16  -18.14  -17.48    -16.71   0.27  -17.25  -16.16    -13.93   0.25  -14.42  -13.44
               -17.59   0.22  -18.02  -17.15    -16.79   0.18  -17.14  -16.43    -13.50   0.14  -13.77  -13.23
               -17.24   0.26  -17.76  -16.72    -16.20   0.17  -16.55  -15.86    -13.88   0.30  -14.48  -13.29
               -17.38   0.33  -18.04  -16.72    -16.41   0.15  -16.71  -16.11    -13.52   0.28  -14.09  -12.96
0.4  tvar2     -14.94   0.26  -15.46  -14.42    -14.14   0.22  -14.59  -13.69    -12.27   0.22  -12.71  -11.83
               -15.15   0.25  -15.66  -14.64    -14.21   0.25  -14.70  -13.71    -12.10   0.27  -12.64  -11.56
               -14.77   0.27  -15.31  -14.22    -13.60   0.15  -13.91  -13.29    -11.65   0.21  -12.08  -11.22
               -14.58   0.13  -14.84  -14.33    -14.04   0.29  -14.62  -13.46    -12.23   0.17  -12.58  -11.88
               -14.77   0.25  -15.28  -14.26    -14.09   0.23  -14.55  -13.62    -11.73   0.24  -12.21  -11.24
0.8  tvar3      -9.55   0.17   -9.89   -9.21     -8.17   0.18   -8.54   -7.80     -6.93   0.29   -7.52   -6.34
                -9.59   0.20  -10.00   -9.19     -8.03   0.18   -8.38   -7.67     -6.76   0.13   -7.03   -6.50
                -9.85   0.28  -10.42   -9.28     -8.18   0.24   -8.65   -7.70     -7.04   0.23   -7.50   -6.58
                -9.74   0.29  -10.32   -9.16     -8.04   0.21   -8.46   -7.62     -7.06   0.25   -7.56   -6.56
                -8.90   0.09   -9.08   -8.72     -8.15   0.17   -8.49   -7.81     -6.89   0.24   -7.37   -6.41

Table 7: Comparison of the values of policies for an instance from the motivating application.

                   CBW                              Myopic                             KNS
Instance     µ      σ     µ−2σ    µ+2σ       µ      σ     µ−2σ    µ+2σ       µ      σ     µ−2σ    µ+2σ
tprx1      32.62   1.27   30.07   35.17    36.54   0.48   35.57   37.51    45.45   2.70   40.04   50.86
           34.96   1.28   32.40   37.53    39.41   1.51   36.38   42.44    47.15   1.96   43.23   51.07
           34.16   1.84   30.48   37.84    37.55   1.69   34.17   40.93    47.06   1.62   43.82   50.30
           34.75   1.23   32.30   37.21    39.88   1.19   37.50   42.26    44.26   2.07   40.13   48.39
           33.93   1.31   31.30   36.56    37.14   1.22   34.71   39.57    42.97   1.26   40.44   45.50
One would expect the policy to obtain a better objective value on a problem that allows up to three customers per route than on a problem that allows at most two customers per route, for two reasons: the feasible sets of the problem that allows up to three customers per route contain the feasible sets of the problem that allows at most two customers per route; and the value function approximation used when up to three customers per route are allowed is based on subsets with up to three customers, as opposed to the value function approximation that is used when at most two customers per route are allowed, which is based on subsets with at most two customers only. However, even though the results do show improvements in objective value, the improvements are relatively minor. (In the petrochemical and air products industries the number of customers per route typically is less than or equal to three; routes with more than three customers do occur, but infrequently. For example, in the application that motivated this work, approximately 95% of routes visit three or fewer customers.)
5.4 Computation Times
The computational experiments discussed above have demonstrated the quality of the policies produced by the dynamic programming approximation method. Next, we focus on its computational requirements, i.e., the effort needed to construct a policy and the effort involved in executing a policy. All computational experiments were performed on an Intel Pentium III processor running at 1.4 GHz. All times are reported in seconds.

Recall that to approximate the optimal value function $V^*$, the IRP is decomposed into subproblems, and then, for any given state, the optimal value functions of the subproblems are combined by solving a cardinality constrained set partitioning problem. The most computationally intensive task during the construction of a policy is the solution of the individual subproblems, as each of these subproblems itself is a Markov decision problem. The most computationally intensive task during the execution of a policy is the calculation of the approximate value function at each of a number of possible future states, as each involves the solution of a cardinality constrained set partitioning problem.

Table 12 presents the average times required to solve subproblems consisting of subsets of one, two, and three customers, respectively. Based on the times reported in Table 12, one can estimate the total time required to construct a policy. For a 50 customer problem, it takes about 0.5 seconds to solve all one customer subproblems, 318.5 seconds to solve all two customer subproblems, and 668,360 seconds (or 7.73 days) to solve all three customer subproblems. Observe, though, that these estimates are based on simple-minded straightforward implementations. In practice, it typically is not necessary to solve all two or three customer subproblems. Simple rules may reduce the number of subproblems to be solved significantly, and thereby the computation times; for example, it makes sense to eliminate all subsets of two and three customers that cannot be visited by a single vehicle in one day. Regardless, the policy has to be constructed only once, and it is acceptable to spend a substantial amount of computational effort to do that.
Table 8: Comparison of the values of the KNS policy for problems allowing at most two customers per route versus problems allowing up to three customers per route, for instances with different numbers of customers.

                   KNS (2-stops)                     KNS (3-stops)
N    Instance    µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ
10   tcst1      -8.60   0.27   -9.13   -8.07     -7.18   0.40   -7.97   -6.38
                -8.73   0.29   -9.32   -8.14     -7.31   0.33   -7.97   -6.64
                -8.53   0.11   -8.75   -8.31     -7.42   0.44   -8.29   -6.55
                -8.63   0.20   -9.04   -8.22     -7.27   0.55   -8.37   -6.18
                -8.92   0.27   -9.46   -8.37     -7.18   0.56   -8.30   -6.07
15   tcst2     -13.10   0.13  -13.35  -12.85    -11.10   0.55  -12.21  -10.00
               -13.57   0.10  -13.77  -13.38    -11.81   0.56  -12.92  -10.69
               -13.34   0.21  -13.77  -12.92    -11.30   0.55  -12.39  -10.21
               -13.63   0.31  -14.24  -13.02    -11.63   0.77  -13.18  -10.08
               -13.45   0.16  -13.78  -13.13    -11.58   0.58  -12.74  -10.43
20   tcst3     -16.68   0.28  -17.24  -16.12    -14.23   0.58  -15.38  -13.08
               -16.85   0.27  -17.39  -16.30    -14.26   0.67  -15.60  -12.93
               -16.43   0.18  -16.79  -16.07    -14.87   0.63  -16.13  -13.60
               -16.59   0.30  -17.18  -15.99    -14.40   0.67  -15.74  -13.06
               -16.21   0.27  -16.75  -15.66    -13.60   0.51  -14.62  -12.58
Table 9: Comparison of the values of the KNS policy for problems allowing at most two customers per route versus problems allowing up to three customers per route, for instances with different numbers of vehicles.

                   KNS (2-stop)                      KNS (3-stop)
M    Instance    µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ
3    tveh1     -58.58   0.19  -58.96  -58.20    -56.30   0.62  -57.53  -55.07
               -59.24   0.29  -59.82  -58.65    -56.37   0.65  -57.68  -55.06
               -59.05   0.23  -59.52  -58.58    -56.61   0.74  -58.10  -55.12
               -58.92   0.21  -59.35  -58.50    -56.58   0.67  -57.93  -55.24
               -58.73   0.18  -59.09  -58.36    -56.26   0.77  -57.79  -54.72
6    tveh2       4.83   0.22    4.39    5.27      5.91   0.19    5.53    6.28
                 5.30   0.17    4.96    5.64      6.47   0.36    5.75    7.18
                 5.43   0.24    4.95    5.91      6.76   0.29    6.17    7.34
                 5.14   0.26    4.61    5.67      6.61   0.61    5.39    7.84
                 5.28   0.24    4.79    5.76      6.82   0.47    5.88    7.77
9    tveh3      18.34   0.18   17.97   18.71     19.48   0.53   18.42   20.53
                18.06   0.24   17.58   18.53     18.99   0.51   17.96   20.01
                17.64   0.14   17.36   17.91     18.90   0.47   17.97   19.83
                18.17   0.33   17.52   18.83     19.06   0.52   18.03   20.10
                17.84   0.24   17.35   18.33     19.61   0.63   18.35   20.87
Table 10: Comparison of the values of the KNS policy for problems allowing at most two customers per route versus problems allowing up to three customers per route, for instances with different demand variances.

                   KNS (2-stop)                      KNS (3-stop)
CV   Instance    µ       σ     µ−2σ    µ+2σ       µ       σ     µ−2σ    µ+2σ
0.1  tvar1     -14.02   0.24  -14.50  -13.55    -11.91   0.34  -12.60  -11.22
               -13.93   0.25  -14.42  -13.44    -11.63   0.41  -12.45  -10.81
               -13.50   0.14  -13.77  -13.23    -11.38   0.54  -12.47  -10.30
               -13.88   0.30  -14.48  -13.29    -11.37   0.46  -12.29  -10.45
               -13.52   0.28  -14.09  -12.96    -11.26   0.44  -12.15  -10.38
0.4  tvar2     -12.27   0.22  -12.71  -11.83    -11.23   0.20  -11.63  -10.83
               -12.10   0.27  -12.64  -11.56    -11.48   0.37  -12.23  -10.74
               -11.65   0.21  -12.08  -11.22    -11.07   0.38  -11.83  -10.31
               -12.23   0.17  -12.58  -11.88    -10.87   0.17  -11.22  -10.53
               -11.73   0.24  -12.21  -11.24    -10.95   0.48  -11.91   -9.99
0.8  tvar3      -6.93   0.29   -7.52   -6.34     -6.35   0.27   -6.90   -5.81
                -6.76   0.13   -7.03   -6.50     -6.32   0.16   -6.65   -6.00
                -7.04   0.23   -7.50   -6.58     -5.86   0.17   -6.21   -5.52
                -7.06   0.25   -7.56   -6.56     -6.22   0.26   -6.75   -5.69
                -6.89   0.24   -7.37   -6.41     -6.18   0.34   -6.87   -5.49

Table 11: Comparison of the values of the KNS policy for an instance from the motivating application, for the case in which at most two customers per route are allowed versus the case in which up to three customers per route are allowed.

                KNS (2-stop)                      KNS (3-stop)
Instance     µ      σ     µ−2σ    µ+2σ       µ      σ     µ−2σ    µ+2σ
tprx1      45.45   2.70   40.04   50.86    49.71   2.26   45.19   54.24
           47.15   1.96   43.23   51.07    50.04   2.16   45.73   54.35
           47.06   1.62   43.82   50.30    49.88   1.15   47.59   52.17
           44.26   2.07   40.13   48.39    48.62   1.87   44.88   52.36
           42.97   1.26   40.44   45.50    47.50   1.74   44.01   50.99
Table 12: Average solution times in seconds for a set of subproblems with different numbers of customers.

Subset size       Time (secs.)
One customer      0.01
Two customers     0.26
Three customers   34.10
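The construction-time estimates quoted above follow directly from the per-subproblem times in Table 12 multiplied by the number of subsets; a quick check:

```python
from math import comb

t1, t2, t3 = 0.01, 0.26, 34.10      # Table 12 times, in seconds
N = 50
print(N * t1)                        # one-customer subproblems: 0.5 s
print(comb(N, 2) * t2)               # 1225 * 0.26 = 318.5 s
print(comb(N, 3) * t3 / 86400.0)     # 19600 * 34.10 s, about 7.73 days
```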
Once the approximating function $\hat{V}$ has been constructed, only problem (3) has to be solved each day, each time for a given value of the state $x$. Because the daily problem has to be solved many times, it is important that this computational task can be performed with relatively little effort. Given the current state, two types of decisions have to be made in the daily problem, namely which customers to visit on each vehicle route, and how much to deliver at those customers. Table 13 presents the average time to determine an action for a given state, both for the case in which subsets of at most two customers are used in the value function approximation and the case in which subsets of up to three customers are used.

Table 13: Average solution times in seconds for the daily problems of the instances used in the computational experiments, for the case in which at most two customers per route are allowed and the case in which up to three customers per route are allowed.

Instance   Size at most 2   Size at most 3
tcst1      22.1             107.9
tcst2      52.4             355.5
tcst3      134.1            1080.2
tvar1      54.7             356.4
tvar2      53.4             360.1
tvar3      55               362.5
tveh1      39.8             271.4
tveh2      56.1             354.8
tveh3      64.2             368.9
tprx1      18.3             100.3
Recall that when the approximate value function combines subproblems based on subsets with at most two customers, a maximum weight perfect matching problem is solved to compute the approximate value of a given state, but when the approximate value function combines subproblems based on subsets of up to three or more customers, a cardinality constrained set partitioning problem is solved. In our computational experiments, the set partitioning problems were solved with CPLEX 7.0, a commercially available integer programming solver. All set partitioning problems were solved to proven optimality using the default settings. No attempts were made to speed up the solution process. In practice, it may not be necessary to solve the set partitioning problems to proven optimality. It is well-known that in the solution of difficult integer programs most of the time is spent on proving optimality, and not on finding the optimal solution. Also recall that
the purpose of solving the cardinality constrained set partitioning problem is only to obtain an approximate value for a particular state, and not to obtain an optimal or even feasible solution of the partitioning problem; the partitioning solution itself is not used at all. Thus, in an application, it is reasonable to stop solving the partitioning problem as soon as sufficiently tight bounds on its optimal value have been obtained. The results in Table 13 show that the times required to execute a policy, that is, to determine the action for a given state, are acceptable: about one minute for the case with subsets of at most two customers and about five minutes for the case with subsets of at most three customers. At the same time, the results demonstrate the value of being able to solve a maximum weight perfect matching problem, rather than a cardinality constrained set partitioning problem, in the two-customer case.
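To spell out what that evaluation computes, the following sketch solves the cardinality constrained set partitioning problem exactly by dynamic programming over subsets (ours, for illustration only; it is an alternative to the CPLEX formulation used in the experiments, and subset_value is a hypothetical stand-in mapping each customer subset of size at most max_size to its precomputed subproblem value):

from functools import lru_cache
from itertools import combinations


def partition_value(n, subset_value, max_size=3):
    """Maximum total subproblem value over all partitions of the customers
    {0, ..., n-1} into subsets of size at most max_size."""

    @lru_cache(maxsize=None)
    def best(mask):
        # mask encodes the customers that still have to be covered
        if mask == 0:
            return 0.0
        first = (mask & -mask).bit_length() - 1  # lowest uncovered customer
        rest = [i for i in range(first + 1, n) if mask >> i & 1]
        value = float("-inf")
        for k in range(max_size):  # choose up to max_size - 1 partners for first
            for partners in combinations(rest, k):
                subset = frozenset((first,) + partners)
                covered = sum(1 << i for i in subset)
                value = max(value, subset_value[subset] + best(mask ^ covered))
        return value

    return best((1 << n) - 1)

For the 15- and 20-customer instances used here such an enumeration is still feasible; an integer programming solver becomes the natural choice as instances grow, and its bound-based stopping criteria fit the observation above that only the optimal value, not the partition itself, is needed.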
6 Conclusion
In this paper we formulated a Markov decision process model of a stochastic inventory routing problem. The Markov decision process can be solved with conventional algorithms, but only very small instances can be solved with reasonable computational resources. This motivated us to develop an approximation method. An important part of the method was the construction of an approximation Vˆ of the optimal value function V∗. The approximation Vˆ was based on a decomposition of the overall problem into subproblems. This is, of course, a natural idea and not a new one; however, the way in which the decomposition was performed, the subproblems were formulated, and the results of the subproblems were combined to construct Vˆ seems to be novel, and the results were promising. Subproblems were defined for specific subsets of customers; the subsets overlapped to a large extent and together covered the set of customers. The values Vˆ(x) of the approximating function were calculated by solving an optimization problem that chooses a collection of customer subsets that partitions the set of customers. Effort was put into formulating each subproblem so that the combined subproblems give an accurate representation of the overall process. The process is then controlled by solving a single-stage problem for the current state at each stage.
The approach described in this paper shows promise of being applicable to many other stochastic control problems besides the stochastic inventory routing problem. Many stochastic control problems are hard because the Markov decision process formulation has a high dimensional state space with a huge number of states. This often comes about because the problem addresses the coordinated control of many interdependent resources, sometimes with fairly similar characteristics. A natural extension of the approach described in this paper is to decompose the overall control problem into subproblems involving subsets of resources, and to compute the approximate value function Vˆ by combining the results of the subproblems in an associated partitioning-type optimization problem.
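Schematically, and in our own notation rather than a restatement of the formulas given earlier in the paper, the combination step for a state x solves

Vˆ(x) = max_{S∈Π} Σ_{J∈S} VˆJ(xJ),

where Π denotes the collections of pairwise disjoint resource subsets J of bounded cardinality whose union is the whole set of resources, xJ is the restriction of the state x to the resources in J, and VˆJ is the value function of the subproblem for subset J.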
The combined subproblems should give an accurate representation of the overall process, and the subproblems themselves should be tractable. The overall process can then be controlled by solving, at each stage, a single-stage problem for the current state.
We give a brief example of another application of this approach. Recently we worked on a dynamic bin covering problem motivated by the following application. Pieces of fish move along a conveyor belt. At the end of the line the pieces are weighed and then packed into one of several open bins. The pieces have different weights, which are unknown until they are measured, but a fairly good probability distribution of the weights can be estimated from historical data. Each bin is closed as soon as the total weight of the fish in the bin exceeds the minimum weight specified for the bin. After each piece of fish has been weighed, a decision has to be made regarding which open bin to place the piece in. The objective is to fill (and close) as many bins as possible over the long run, or equivalently, to minimize the average overweight per bin over the long run. It is easy to formulate a Markov decision process model of the problem. If only a small number of bins (say up to three or four) can be open at a time, then the problem can be solved with reasonable computational resources. However, when many bins can be open at a time, approximation methods are needed; some industrial packers can have ten or more bins open at a time. An approximation method along the lines of the approach described above was developed, and computational results have been promising.
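To suggest what a policy for this problem looks like in code, here is a minimal simulation sketch (ours, for illustration only: the minimum bin weight, the number of open bins, and the piece-weight distribution are all assumed, and the greedy rule below is a simple baseline rather than the approximation method we developed):

import random

MIN_WEIGHT = 10.0   # assumed minimum total weight at which a bin closes
NUM_BINS = 4        # assumed number of simultaneously open bins


def choose_bin(fills, piece):
    """Pick the open bin that should receive the weighed piece."""
    closing = [i for i, f in enumerate(fills) if f + piece >= MIN_WEIGHT]
    if closing:
        # close a bin, wasting as little weight over the minimum as possible
        return min(closing, key=lambda i: fills[i] + piece - MIN_WEIGHT)
    # otherwise deepen the fullest bin so it can be closed accurately later
    return max(range(len(fills)), key=lambda i: fills[i])


def average_overweight(num_pieces=100_000, seed=0):
    """Simulate the greedy rule and report the average overweight per closed bin."""
    rng = random.Random(seed)
    fills = [0.0] * NUM_BINS
    closed, overweight = 0, 0.0
    for _ in range(num_pieces):
        piece = rng.uniform(0.5, 3.0)  # assumed piece-weight distribution
        i = choose_bin(fills, piece)
        fills[i] += piece
        if fills[i] >= MIN_WEIGHT:
            closed += 1
            overweight += fills[i] - MIN_WEIGHT
            fills[i] = 0.0  # replace the closed bin with an empty one
    return overweight / max(closed, 1)

In the decomposition approach sketched above, the greedy rule would be replaced by a one-step lookahead against an approximate value function assembled from subproblems over subsets of bins.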
Acknowledgement

We thank Warren Powell for many constructive discussions.
References

S. Anily and A. Federgruen, “One Warehouse Multiple Retailer Systems with Vehicle Routing Costs,” Management Science 36, 92–114 (1990).
S. Anily and A. Federgruen, “Rejoinder to ‘Comments on One Warehouse Multiple Retailer Systems with Vehicle Routing Costs’,” Management Science 37, 1497–1499 (1991).
S. Anily and A. Federgruen, “Two-Echelon Distribution Systems with Vehicle Routing Costs and Central Inventories,” Operations Research 41, 37–47 (1993).
J. F. Bard, L. Huang, P. Jaillet, and M. Dror, “A Decomposition Approach to the Inventory Routing Problem with Satellite Facilities,” Transportation Science 32, 189–203 (1998).
D. Barnes-Schuster and Y. Bassok, “Direct Shipping and the Dynamic Single-depot/Multi-retailer Inventory System,” European Journal of Operational Research 101, 509–518 (1997).
Y. Bassok and R. Ernst, “Dynamic Allocations for Multi-Product Distribution,” Transportation Science 29, 256–266 (1995).
W. Bell, L. Dalberto, M. Fisher, A. Greenfield, R. Jaikumar, P. Kedia, R. Mack, and P. Prutzman, “Improving the Distribution of Industrial Gases with an On-Line Computerized Routing and Scheduling Optimizer,” Interfaces 13, 4–23 (1983).
R. Bellman and S. Dreyfus, “Functional Approximations and Dynamic Programming,” Mathematical Tables and Other Aids to Computation 13, 247–251 (1959).
R. E. Bellman, R. Kalaba, and B. Kotkin, “Polynomial Approximation—A New Computational Technique in Dynamic Programming: Allocation Processes,” Mathematics of Computation 17, 155–161 (1963).
L. Bertazzi, G. Paletta, and M. G. Speranza, “Deterministic Order-Up-To Level Policies in an Inventory Routing Problem,” Transportation Science 36, 119–132 (2002).
D. P. Bertsekas, “Convergence of Discretization Procedures in Dynamic Programming,” IEEE Transactions on Automatic Control AC-20, 415–419 (1975).
D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, MA (1995).
D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, NY (1978).
D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA (1996).
J. Bramel and D. Simchi-Levi, “A Location Based Heuristic for General Routing Problems,” Operations Research 43, 649–660 (1995).
L. D. Burns, R. W. Hall, D. E. Blumenfeld, and C. F. Daganzo, “Distribution Strategies that Minimize Transportation and Inventory Costs,” Operations Research 33, 469–490 (1985).
S. Çetinkaya and C. Y. Lee, “Stock Replenishment and Shipment Scheduling for Vendor Managed Inventory Systems,” Management Science 46, 217–232 (2000).
L. M. A. Chan, A. Federgruen, and D. Simchi-Levi, “Probabilistic Analysis and Practical Algorithms for Inventory-Routing Models,” Operations Research 46, 96–106 (1998).
C. S. Chang, “Discrete-Sample Curve Fitting Using Chebyshev Polynomials and the Approximate Determination of Optimal Trajectories via Dynamic Programming,” IEEE Transactions on Automatic Control AC-11, 116–118 (1966).
V. C. P. Chen, D. Ruppert, and C. A. Shoemaker, “Applying Experimental Design and Regression Splines to High-Dimensional Continuous-State Stochastic Dynamic Programming,” Operations Research 47, 38–53 (1999).
T. W. Chien, A. Balakrishnan, and R. T. Wong, “An Integrated Inventory Allocation and Vehicle Routing Problem,” Transportation Science 23, 67–76 (1989).
C. S. Chow and J. N. Tsitsiklis, “An Optimal One-Way Multigrid Algorithm for Discrete-Time Stochastic Control,” IEEE Transactions on Automatic Control AC-36, 898–914 (1991).
M. Christiansen, “Decomposition of a Combined Inventory and Time Constrained Ship Routing Problem,” Transportation Science 33, 3–16 (1999).
M. Christiansen and B. Nygreen, “A Method for Solving Ship Routing Problems with Inventory Constraints,” Annals of Operations Research 81, 357–378 (1998a).
M. Christiansen and B. Nygreen, “Modelling Path Flows for a Combined Ship Routing and Inventory Management Problem,” Annals of Operations Research 82, 391–412 (1998b).
D. C. Collins, “Reduction of Dimensionality in Dynamic Programming via the Method of Diagonal Decomposition,” Journal of Mathematical Analysis and Applications 31, 223–234 (1970).
D. C. Collins and E. S. Angel, “The Diagonal Decomposition Technique Applied to the Dynamic Programming Solution of Elliptic Partial Differential Equations,” Journal of Mathematical Analysis and Applications 33, 467–481 (1971).
D. C. Collins and A. Lew, “A Dimensional Approximation in Dynamic Programming by Structural Decomposition,” Journal of Mathematical Analysis and Applications 30, 375–384 (1970).
W. Cook and A. Rohe, “Computing Minimum-Weight Perfect Matchings,” preprint (1998).
P. J. Courtois, Decomposability: Queueing and Computer System Applications, Academic Press, New York, NY (1977).
P. J. Courtois and P. Semal, “Error Bounds for the Analysis by Decomposition of Non-Negative Matrices,” in Mathematical Computer Performance and Reliability, G. Iazeolla, P. J. Courtois, and A. Hordijk (eds), chapter 2.2, 209–224, Elsevier Science Publishers B.V., Amsterdam, Netherlands (1984).
J. W. Daniel, “Splines and Efficiency in Dynamic Programming,” Journal of Mathematical Analysis and Applications 54, 402–407 (1976).
D. P. De Farias and B. Van Roy, “On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning,” Journal of Optimization Theory and Applications 105, 589–608 (2000).
M. Dror and M. Ball, “Inventory/Routing: Reduction from an Annual to a Short Period Problem,” Naval Research Logistics Quarterly 34, 891–905 (1987).
M. Dror, M. Ball, and B. Golden, “A Computational Comparison of Algorithms for the Inventory Routing Problem,” Annals of Operations Research 4, 3–23 (1985).
M. Dror and L. Levy, “Vehicle Routing Improvement Algorithms: Comparison of a ‘Greedy’ and a Matching Implementation for Inventory Routing,” Computers and Operations Research 13, 33–45 (1986).
J. Edmonds, “Maximum Matching and a Polyhedron with 0,1-Vertices,” Journal of Research of the National Bureau of Standards 69B, 125–130 (1965a).
J. Edmonds, “Paths, Trees and Flowers,” Canadian Journal of Mathematics 17, 449–467 (1965b).
A. Federgruen and P. Zipkin, “A Combined Vehicle Routing and Inventory Allocation Problem,” Operations Research 32, 1019–1037 (1984).
B. L. Fox, “Discretizing Dynamic Programs,” Journal of Optimization Theory and Applications 11, 228–234 (1973).
H. N. Gabow, “Data Structures for Weighted Matching and Nearest Common Ancestors with Linking,” in Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, 434–443, New York, NY (1990).
G. Gallego and D. Simchi-Levi, “On the Effectiveness of Direct Shipping Strategy for the One-Warehouse Multi-Retailer R-Systems,” Management Science 36, 240–243 (1990).
V. Gaur and M. L. Fisher, “An Optimization Algorithm for the Joint Vehicle Routing and Inventory Control Problem and Its Implementation at a Large Supermarket Chain,” preprint (2002).
B. Golden, A. Assad, and R. Dahl, “Analysis of a Large Scale Vehicle Routing Problem with an Inventory Component,” Large Scale Systems 7, 181–190 (1984).
A. Haurie and P. L’Ecuyer, “Approximation and Bounds in Discrete Event Dynamic Programming,” IEEE Transactions on Automatic Control AC-31, 227–235 (1986).
Y. Herer and R. Roundy, “Heuristics for a One-Warehouse Multiretailer Distribution Problem with Performance Bounds,” Operations Research 45, 102–115 (1997).
K. Hinderer, “Estimates for Finite-Stage Dynamic Programs,” Journal of Mathematical Analysis and Applications 55, 207–238 (1976).
K. Hinderer, “On Approximate Solutions of Finite-Stage Dynamic Programs,” in Dynamic Programming and its Applications, M. L. Puterman (ed), 289–317, Academic Press, New York, NY (1978).
K. Hinderer and G. Hübner, “On Exact and Approximate Solutions of Unstructured Finite-Stage Dynamic Programs,” in Markov Decision Theory: Proceedings of the Advanced Seminar on Markov Decision Theory held at Amsterdam, The Netherlands, September 13–17, 1976, H. C. Tijms and J. Wessels (eds), 57–76, Mathematisch Centrum, Amsterdam, The Netherlands (1977).
A. J. Kleywegt, V. S. Nori, and M. W. P. Savelsbergh, “The Stochastic Inventory Routing Problem with Direct Deliveries,” Transportation Science 36, 94–118 (2002).
H. J. Kushner, “Numerical Methods for Continuous Control Problems in Continuous Time,” SIAM Journal on Control and Optimization 28, 999–1048 (1990).
H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, New York, NY (1992).
R. Larson, “Transporting Sludge to the 106 Mile Site: An Inventory/Routing Model for Fleet Sizing and Logistics System Design,” Transportation Science 22, 186–198 (1988).
S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, Great Britain (1993).
A. S. Minkoff, “A Markov Decision Model and Decomposition Heuristic for Dynamic Vehicle Dispatching,” Operations Research 41, 77–90 (1993).
T. Morin, “Computational Advances in Dynamic Programming,” in Dynamic Programming and its Applications, M. L. Puterman (ed), 53–90, Academic Press, New York, NY (1978).
B. L. Nelson and F. J. Matejcik, “Using Common Random Numbers for Indifference-zone Selection and Multiple Comparisons in Simulation,” Management Science 41, 1935–1945 (1995).
W. B. Powell and T. A. Carvalho, “Dynamic Control of Logistics Queueing Networks for Large-Scale Fleet Management,” Transportation Science 32, 90–109 (1998).
M. L. Puterman, Markov Decision Processes, John Wiley & Sons, Inc., New York, NY (1994).
M. I. Reiman, R. Rubio, and L. M. Wein, “Heavy Traffic Analysis of the Dynamic Stochastic Inventory-Routing Problem,” Transportation Science 33, 361–380 (1999).
D. F. Rogers, R. D. Plante, R. T. Wong, and J. R. Evans, “Aggregation and Disaggregation Techniques and Methodology in Optimization,” Operations Research 39, 553–582 (1991).
P. J. Schweitzer and A. Seidman, “Generalized Polynomial Approximations in Markovian Decision Processes,” Journal of Mathematical Analysis and Applications 110, 568–582 (1985).
N. Secomandi, “Comparing Neuro-Dynamic Programming Algorithms for the Vehicle Routing Problem with Stochastic Demands,” Computers and Operations Research 27, 1201–1225 (2000).
G. W. Stewart, “On the Structure of Nearly Uncoupled Markov Chains,” in Mathematical Computer Performance and Reliability, G. Iazeolla, P. J. Courtois, and A. Hordijk (eds), chapter 2.7, 287–302, Elsevier Science Publishers B.V., Amsterdam, Netherlands (1984).
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA (1998).
D. M. Topkis, “Optimal Ordering and Rationing Policies in a Nonstationary Dynamic Inventory Model with n Demand Classes,” Management Science 15, 160–176 (1968).
P. Trudeau and M. Dror, “Stochastic Inventory Routing: Route Design with Stockouts and Route Failures,” Transportation Science 26, 171–184 (1992).
J. N. Tsitsiklis and B. Van Roy, “Feature-Based Methods for Large-Scale Dynamic Programming,” Machine Learning 22, 59–94 (1996).
J. N. Tsitsiklis and B. Van Roy, “Average Cost Temporal-Difference Learning,” Automatica 35, 1799–1808 (1999a).
J. N. Tsitsiklis and B. Van Roy, “Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Derivatives,” IEEE Transactions on Automatic Control 44, 1840–1851 (1999b).
B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis, “A Neuro-Dynamic Programming Approach to Retailer Inventory Management,” in Proceedings of the IEEE Conference on Decision and Control, IEEE (1997).
B. Van Roy and J. N. Tsitsiklis, “Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions,” in Advances in Neural Information Processing Systems 8, 1045–1051, MIT Press, Cambridge, MA (1996).
S. Viswanathan and K. Mathur, “Integrating Routing and Inventory Decisions in One-Warehouse Multiretailer Multiproduct Distribution Systems,” Management Science 43, 294–312 (1997).
R. Webb and R. Larson, “Period and Phase of Customer Replenishment: A New Approach to the Strategic Inventory/Routing Problem,” European Journal of Operational Research 85, 132–148 (1995).
W. Whitt, “Approximations of Dynamic Programs, I,” Mathematics of Operations Research 3, 231–243 (1978).
W. Whitt, “A-Priori Bounds for Approximations of Markov Programs,” Journal of Mathematical Analysis and Applications 71, 297–302 (1979a).
W. Whitt, “Approximations of Dynamic Programs, II,” Mathematics of Operations Research 4, 179–185 (1979b).
P. J. Wong, “An Approach to Reducing the Computing Time for Dynamic Programming,” Operations Research 18, 181–185 (1970a).
P. J. Wong, “A New Decomposition Procedure for Dynamic Programming,” Operations Research 18, 119–131 (1970b).
Appendix: Instances Used in Computational Results

Table 14: Instance topt1.

 i      xi      yi   Ci   fi(0)  fi(1)  fi(2)    ri   pi   hi
 1     0.0    10.0    2    0.0    0.5    0.5    100   40    1
 2   -10.0     0.0    2    0.0    0.7    0.3    100   40    1
 3     0.0   -10.0    2    0.0    0.3    0.7    100   40    1
 4    10.0     0.0    2    0.0    0.2    0.8    100   40    1
Vendor (0, 0), N = 4, M = 1, CV = 4

Table 15: Instance topt2.

 i      xi      yi   Ci   fi(0)  fi(1)  fi(2)  fi(3)  fi(4)    ri   pi   hi
 1     0.0    10.0    4    0.0    0.2    0.2    0.4    0.2    100   40    1
 2   -10.0     0.0    4    0.0    0.1    0.5    0.2    0.2    100   40    1
 3     0.0   -10.0    4    0.0    0.3    0.3    0.3    0.3    100   40    1
 4    10.0     0.0    4    0.0    0.2    0.3    0.5    0.0    100   40    1
Vendor (0, 0), N = 4, M = 1, CV = 5

Table 16: Instance topt3.

 i      xi      yi   Ci   fi(0)  fi(1)  fi(2)  fi(3)  fi(4)  fi(5)  fi(6)    ri   pi   hi
 1     0.0    10.0    6    0.0    0.2    0.2    0.1    0.2    0.2    0.1    100   40    1
 2   -10.0     0.0    6    0.0    0.1    0.2    0.2    0.2    0.2    0.1    100   40    1
 3     0.0   -10.0    6    0.0    0.0    0.0    0.5    0.5    0.0    0.0    100   40    1
 4    10.0     0.0    6    0.0    0.0    0.3    0.0    0.6    0.0    0.1    100   40    1
Vendor (0, 0), N = 4, M = 1, CV = 5
Table 17: Instance topt4.

 i      xi      yi   Ci   fi(0)  fi(1)  fi(2)  fi(3)  fi(4)  fi(5)  fi(6)  fi(7)  fi(8)    ri   pi   hi
 1     0.0    10.0    8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    100   40    1
 2   -10.0     0.0    8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    100   40    1
 3     0.0   -10.0    8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    100   40    1
 4    10.0     0.0    8    0.0    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    100   40    1
Vendor (0, 0), N = 4, M = 1, CV = 8
Table 18: Instances tcst1, tcst2 and tcst3. The values of (N, M) are (10, 4), (15, 6) and (20, 8).

 i      xi      yi   Ci   fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10)    ri   pi   hi
 1    16.2   -22.2   10    0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    598  310    1
 2   -23.2   -18.7   10    0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0    504  294    2
 3     9.1     9.8   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.2   0.8   0.0   0.0    571  307    1
 4    19.5    -9.5   10    0.0   0.0   0.0   0.3   0.4   0.3   0.0   0.0   0.0   0.0   0.0    569  304    2
 5   -20.0    23.5   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0    581  262    1
 6    -4.9   -22.1   10    0.0   0.4   0.5   0.1   0.0   0.0   0.0   0.0   0.0   0.0   0.0    551  347    2
 7    -0.8   -14.0   10    0.0   0.0   0.0   0.0   0.0   0.4   0.6   0.0   0.0   0.0   0.0    585  266    1
 8     4.3    14.8   10    0.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0   0.0   0.0   0.0    518  257    2
 9    -6.9    -4.2   10    0.0   0.0   0.0   0.0   0.0   0.0   0.2   0.3   0.4   0.1   0.0    571  305    1
10    21.9   -22.2   10    0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0    557  281    2
11   -17.8    29.7   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.3   0.5   0.2    550  315    1
12     7.4    11.2   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.4   0.2   0.4    551  259    2
13     9.1    -0.4   10    0.0   0.0   0.0   0.4   0.5   0.1   0.0   0.0   0.0   0.0   0.0    581  346    1
14    -0.4    23.7   10    0.0   0.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    518  340    2
15    14.7    22.0   10    0.0   0.0   0.0   0.0   0.1   0.4   0.3   0.2   0.0   0.0   0.0    575  264    1
16    29.8    12.2   10    0.0   0.0   0.0   0.0   0.0   0.6   0.0   0.4   0.0   0.0   0.0    511  327    2
17   -16.4   -26.9   10    0.0   0.1   0.0   0.9   0.0   0.0   0.0   0.0   0.0   0.0   0.0    521  282    1
18    -5.5   -25.0   10    0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0    523  287    2
19    -8.7   -27.1   10    0.0   0.0   0.0   0.0   0.4   0.1   0.1   0.4   0.0   0.0   0.0    562  271    1
20    25.3    17.5   10    0.0   0.0   0.0   0.3   0.3   0.4   0.0   0.0   0.0   0.0   0.0    598  335    2
Vendor (0, 0), CV = 9
Table 19: Instance tvar1.

 i      xi      yi   Ci   fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10)    ri   pi   hi
 1   -11.4   -11.8   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    541  315    2
 2     8.0     5.2   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    515  238    1
 3    18.7   -28.3   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    587  328    2
 4    14.6   -19.2   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    415  211    1
 5     3.5    11.0   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    507  237    2
 6    10.4    18.2   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    485  279    1
 7    -6.1     4.4   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    442  397    2
 8    12.1    21.6   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    515  287    1
 9    13.9     6.8   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    598  305    2
10   -14.0   -12.6   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    586  389    1
11    21.8    -4.6   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    492  295    2
12     9.6    -5.5   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    448  270    1
13   -12.3    -4.5   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    510  330    2
14    11.8    12.2   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    476  244    1
15     6.5     8.0   10    0.0   0.0   0.0   0.0   0.5   0.0   0.5   0.0   0.0   0.0   0.0    432  212    2
Vendor (0, 0), N = 15, M = 5, CV = 12

Table 20: Instance tvar2.

 i      xi      yi   Ci   fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10)    ri   pi   hi
 1   -11.4   -11.8   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    541  315    2
 2     8.0     5.2   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    515  238    1
 3    18.7   -28.3   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    587  328    2
 4    14.6   -19.2   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    415  211    1
 5     3.5    11.0   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    507  237    2
 6    10.4    18.2   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    485  279    1
 7    -6.1     4.4   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    442  397    2
 8    12.1    21.6   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    515  287    1
 9    13.9     6.8   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    598  305    2
10   -14.0   -12.6   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    586  389    1
11    21.8    -4.6   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    492  295    2
12     9.6    -5.5   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    448  270    1
13   -12.3    -4.5   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    510  330    2
14    11.8    12.2   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    476  244    1
15     6.5     8.0   10    0.0   0.0   0.0   0.5   0.0   0.0   0.0   0.5   0.0   0.0   0.0    432  212    2
Vendor (0, 0), N = 15, M = 5, CV = 12

Table 21: Instance tvar3.

 i      xi      yi   Ci   fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10)    ri   pi   hi
 1   -11.4   -11.8   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    541  315    2
 2     8.0     5.2   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    515  238    1
 3    18.7   -28.3   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    587  328    2
 4    14.6   -19.2   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    415  211    1
 5     3.5    11.0   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    507  237    2
 6    10.4    18.2   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    485  279    1
 7    -6.1     4.4   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    442  397    2
 8    12.1    21.6   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    515  287    1
 9    13.9     6.8   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    598  305    2
10   -14.0   -12.6   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    586  389    1
11    21.8    -4.6   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    492  295    2
12     9.6    -5.5   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    448  270    1
13   -12.3    -4.5   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    510  330    2
14    11.8    12.2   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    476  244    1
15     6.5     8.0   10    0.0   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.0    432  212    2
Vendor (0, 0), N = 15, M = 5, CV = 12
Table 22: Instances tveh1, tveh2 and tveh3.

 i      xi      yi   Ci   fi(0) fi(1) fi(2) fi(3) fi(4) fi(5) fi(6) fi(7) fi(8) fi(9) fi(10)    ri   pi   hi
 1    24.8    13.8   10    0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    599  256    0
 2    -3.3    18.8   10    0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    502  328    0
 3   -24.6   -14.6   10    0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0    644  268    0
 4    25.2     5.9   10    0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0    533  347    0
 5     4.3    26.7   10    0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0    467  255    0
 6    24.9    -1.4   10    0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0    479  324    0
 7   -29.3    20.6   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0    588  260    0
 8    24.3    -6.6   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0    629  340    0
 9     5.7   -11.8   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5    647  301    0
10     5.9    -2.4   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5    639  303    0
11     4.5    -1.1   10    0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0    480  324    0
12    22.0    -1.9   10    0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0   0.0    593  266    0
13    -3.8   -28.3   10    0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0    497  278    0
14   -22.6    -9.7   10    0.0   0.0   0.5   0.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0    647  327    0
15    28.5    26.0   10    0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.5   0.5   0.0   0.0    562  284    0
Vendor (0, 0), N = 15, M = 3, CV = 12
Table 23: Instance tprx1.

 i    Long    Lat   Ci    ri   pi   hi
 1   -86.8   33.6   10   550  250    0
 2   -85.3   35.0    4   550  250    0
 3   -81.0   35.2    8   550  210    0
 4   -96.4   32.5   24   550  260    0
 5   -95.4   29.8   28   550  260    0
 6   -85.8   38.2    4   550  210    0
 7   -90.0   35.2   11   550  210    0
 8   -90.1   30.0    4   550  190    0
 9   -98.1   29.3   18   550  260    0
Vendor (−84.2, 33.8), N = 9, M = 4, CV = 20