Revisiting Cluster First, Route Second for the Vehicle Routing Problem

Fahrettin Cakir, W. Nick Street, Barrett W. Thomas
Department of Management Sciences, Tippie College of Business
University of Iowa, Iowa City, Iowa, USA 52242
[email protected],
[email protected],
[email protected]
Abstract

While clustering was important to the development of early routing algorithms, modern solution methods rely on metaheuristics. In this paper, we take advantage of clustering research from other domains to revisit clustering approaches to routing. We propose a two-stage, shape-based clustering approach. Our solution technique is based on creating clusters of customers that form certain shapes with respect to the depot. We obtain a routing solution by ordering all customers in every cluster separately. Our results are competitive with a state-of-the-art vehicle routing solver in terms of quality. Moreover, the results show that the algorithm scales and is robust to problem parameters in terms of runtime.
1. Introduction
The vehicle routing problem (VRP) is one of the best-known NP-hard problems in the operations research literature. It is the problem of designing minimum-cost delivery routes that, from a depot, serve a geographically distributed set of customers subject to side constraints. The most effective solution approaches to large-scale vehicle routing problems are metaheuristics. However, these heuristics are not easily deployable, as they often require tailored implementations and frequently lack open-source code bases. In this paper, we explore the hypothesis that scalable clustering algorithms based on a variant of k-medians can offer a better trade-off between runtime and solution quality in this application domain. We propose a system that can be assembled from publicly available solvers.

A vehicle routing solution is a set of ordered customer sequences. Each ordered sequence is called a route. A depot is located in a geographic region where a fleet of vehicles serves delivery requests, each vehicle assigned a single route. We employ a cluster-first, route-second heuristic to minimize the total distance traveled by vehicles. In this approach, the original problem is decomposed into smaller subproblems by first clustering customers into groups whose total demand does not exceed the capacity of a vehicle, and then routing the customers in each group. Routing each cluster is an instance of the well-known traveling salesman problem (TSP).

In general, petal-shaped routes are a common geometric feature of VRP solutions (Ryan et al. 1993a). Optimal solutions for many problems exhibit such a geometric structure, although finding the optimal solution remains NP-hard. Clustering methods that encourage such features can therefore help find good solutions to practical-sized problems. In this paper, we propose a clustering technique that seeks cluster shapes associated with lower routing cost.
Our work is motivated by the work of Daganzo (1984), which provides guidelines to aid human dispatchers in the design of near-optimal routes. The main idea behind these guidelines is that the optimal slenderness (length/width) of the rectangle covering the customers in a single cluster should be a ratio that depends on how far the group of customers is from the depot, the capacity of the vehicles, and the customer density. Ouyang (2007) proposes automating the guidelines of Daganzo (1984) using an equilibrium-finding tessellation technique. We operationalize these guidelines by defining a distance function based on polar coordinates for k-medians-like clustering. This similarity measure leads to clustering solutions that have appropriate shapes and therefore lower route costs.

The contribution of our work is a new clustering method for the vehicle routing domain. A novel feature of our model is a clustering objective function with parameters that control the shape of the clusters. Our method is effective, uses general commercial solvers rather than specialized metaheuristics, and is scalable. It is especially suited to computing a feasible solution for large-scale problems, from which local-search heuristics can then improve the solution.

In Section 2, we review relevant literature from vehicle routing and clustering. In Section 3, we discuss a mathematical formulation of a standard vehicle routing problem (VRP). In Section 3.1, we formulate a clustering problem as an integer program that takes into account the shape of clusters. Then, in Section 3.2, we discuss the solution method we use for this problem, which we call Slender. We present computational results on its effectiveness compared to a baseline, state-of-the-art VRP solver in Section 4. We conclude in Section 5.
2. Related Literature
Over the years, many solution techniques have been proposed for the VRP. Exact methods that search for optimal solutions tend to be able to deal with problems with up to a couple of hundred customers. As the problem size increases, heuristic methods need to be used (Laporte 2009).
The cluster-first, route-second heuristic is an approach to vehicle routing problems that uses problem decomposition to cope with the fact that the VRP is NP-hard. This clustering approach is tailored more toward VRPs than general clustering techniques in that the clustering specifically aims to lower the total length of the routes that result from it. As in our approach, the problem is decomposed into subproblems, where each subproblem is a traveling salesman problem (TSP). Existing state-of-the-art TSP solvers can easily manage problems with up to thousands of customers (Applegate et al. 2006a). The challenge is to cluster such that the sum of route distances over the clusters is collectively minimized (Fisher and Jaikumar 1981). Traditional k-means and k-medians clustering approaches do not optimize such an objective: k-means clustering finds clusters that are globular, and k-medians clustering finds clusters that look like diamonds. Neither considers the location of the depot in forming its clusters.

Other cluster-first, route-second contributions exist, in which authors model the capacitated vehicle routing problem (CVRP) as a capacitated discrete facility location problem (Bramel et al. 1992, Bramel and Simchi-Levi 1995). They use a Lagrangian relaxation technique for moderately sized problems. Although asymptotically optimal, their heuristic cannot compete computationally with other heuristics (Laporte et al. 2000). Moreover, the Lagrangian relaxation-subgradient method can take a significant amount of time to converge on large instances. Hiquebran et al. (1993) combine simulated annealing with the cluster-first, route-second approach. A recent paper uses spatial neighborhoods to construct local-search techniques for the distance-constrained and capacitated VRP (Fang et al. 2013).

The sweep algorithm can be considered one particular method for the cluster-first, route-second approach.
This algorithm is attributed to Gillett and Miller (1974). The generic version of the sweep algorithm clusters a group of stops into a route based on the polar angle between the stops and the depot. Ryan et al. (1993a) propose a sweep-based route generation technique that enumerates a large class of solutions using a cyclic ordering of customers according to their radial coordinates. The authors show that an optimal
routing solution within this class of solutions can be obtained by applying a shortest path method. Goodson et al. (2012) extend this work to vehicle routing problems with stochastic demand. Renaud and Boctor (2002) propose a new sweep-based heuristic for the fleet size and mix vehicle routing problem. The proposed heuristic generates many candidate solutions and then chooses those that satisfy the problem's constraints using a polynomial set partitioning algorithm. Liu and Shen (1999) make use of the sweep heuristic within a two-stage metaheuristic procedure for the vehicle routing problem with time windows. Dondo and Cerdá (2013) formulate a mathematical program in order to choose the starting ray of the sweep heuristic for a VRP with cross-docking. The sweep heuristic is also used to generate initial solutions to closely related problems (Imran et al. 2009, Zhong and Cole 2005, Franceschi et al. 2006, Crevier et al. 2007, Cordeau et al. 1997). The method is too computationally expensive for large-scale instances such as those of interest in this paper.

State-of-the-art research on large-scale vehicle routing problems uses metaheuristics. This line of work focuses on heuristics such as genetic/memetic algorithms, tabu search, and other neighborhood search algorithms (Nagata and Bräysy 2009, Marinakis 2012, Cordeau and Maischberger 2012, Kytöjoki et al. 2007, Mester and Bräysy 2007). Nagata and Bräysy (2009) find solutions from a pool of vehicle routing solutions, which are generated using various genetic-inspired operations. Marinakis (2012) uses a two-phase heuristic that consists of building a random initial solution and improving it with a local-search phase. Cordeau and Maischberger (2012) take advantage of parallelization using tabu search heuristics. Kytöjoki et al. (2007) explore variable neighborhoods of the current incumbent solution to find improving solutions.
Mester and Bräysy (2007) use an iterative two-stage heuristic that combines guided local search with evolutionary strategies. The common strategy of these researchers is to bring together various traditional heuristic methods, such as tabu search, variable neighborhood search, and genetic algorithms, in an ad hoc manner, which delivers good performance. As a result, their implementations are complicated, and their code bases can be hard to maintain in practice.
Our clustering approach can be considered discrete k-medians clustering with a special similarity matrix that favors clusters with certain shapes. General k-medians clustering chooses k points as cluster centers and assigns data items to their nearest cluster centers in such a way that the total sum of distances to the cluster centers is minimized; the 1-norm distance function is used to compute the distances. In our approach, we group customers into clusters, each represented by a center, using a distance function that combines radial and angular distances with weights. In recent work, Seref et al. (2014) develop a bilinear model inspired by Bradley et al. (1997) for general similarity measures. Their work can be seen as a generalization of the clustering technique we employ. The authors provide efficient techniques for finding approximate solutions to the discrete k-medians clustering problem. Unlike their formulation, our method employs capacity constraints. However, we use a similar iterative solution method to find our clusters.

Closely related to k-medians clustering is k-means clustering. K-means clustering and its variants are efficient techniques that can find convex and compact clusters (MacQueen 1967). K-means can be cast as an optimization problem, and therefore special mathematical programming techniques can be applied. In the 1-norm, Bradley et al. (1997) take advantage of a bilinear formulation and propose a method that can find clusters quickly. Although quite efficient, k-means is not suitable for finding arbitrarily shaped clusters. In our case, the clusters need to be anchored around the depot with a petal shape so that a low routing objective is achieved. An exception is spherical k-means, which looks for clusters that lie along nearby rays from the origin (Dhillon and Modha 2001). This clustering method can produce petal-shaped clusters, but the clusters do not overlap, which can make them undesirable for vehicle routing purposes.
Clustering for arbitrarily shaped clusters, which k-medians clustering neglects, is not a new idea in the data mining literature. DBSCAN is one of the first methods to capture arbitrarily shaped clusters (Ester et al. 1996). It finds clusters using two parameters, eps and minpts, and distinguishes data points as core points and noise. A data point is a core
point if the number of points within its eps neighborhood is at least minpts. A cluster is a maximal set of such reachable points. DBSCAN is sensitive to the design parameters eps and minpts, and the number of clusters it finds and their respective sizes are not easily controlled. Another method that can find arbitrarily shaped clusters is DENCLUE, a density-based clustering algorithm in which clusters are determined by finding local maxima of the density function (Chaoji et al. 2008). The size of the clusters is not controlled, and some points end up belonging to none of the clusters; in our problem, however, every data point must belong to some cluster (every customer must be visited by a vehicle). Graph-based clustering techniques can be used to find arbitrary shapes as well. CHAMELEON is a popular technique that works with a similarity matrix for the dataset (Karypis et al. 1999). It uses two measures, relative interconnectivity and relative closeness, to merge points together into a predefined number of final clusters. It finds clusters in a bottom-up fashion and hence can be myopic (greedy) in the clusters it finds. Other hierarchical methods include BIRCH (Zhang et al. 1996) and CURE (Guha et al. 1998). These methods are not well suited to a routing objective, since they are geared toward finding shapes that exist in the dataset rather than steering clusters toward a specific shape well suited for routing.
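The eps/minpts mechanics of DBSCAN described above can be made concrete with a minimal core-point test. The sketch below is an illustrative simplification, not the implementation of Ester et al. (1996); function names are ours.

```python
import math

def region_query(points, p, eps):
    """Indices of points within eps of points[p] (including p itself)."""
    px, py = points[p]
    return [q for q, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def core_points(points, eps, minpts):
    """A point is a core point if its eps-neighborhood holds at least minpts points."""
    return [p for p in range(len(points))
            if len(region_query(points, p, eps)) >= minpts]

pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 10.0)]
print(core_points(pts, eps=1.0, minpts=3))  # → [0, 1, 2]; the isolated point is not core
```

As the example shows, the outcome hinges entirely on eps and minpts, which is exactly the sensitivity noted above.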
3. The Clustering Problem
The VRP is characterized by a set of customers indexed by I = {1, 2, ..., m} and a group of vehicles indexed by L = {1, 2, ..., d} that serve these requests. Each customer has an associated demand {qi}i∈I, and there is a vehicle capacity v. There are a number of different formulations of the VRP. In this paper, we focus on the set-covering formulation; the mathematical program for this formulation can be found in Laporte (2009). We heuristically solve the set-covering formulation of the VRP by decomposing it into clustering and routing problems, which we solve sequentially. In this section, we present the mathematical
formulation of the clustering problem. We use the minimum number of clusters needed to cover all customers.
3.1 Formulation of the Clustering Problem
We model our clustering problem as a discrete k-medians clustering problem. The problem is to choose k out of n points in R² as cluster centers such that the sum of distances from the n points to their nearest centers is minimized. What differentiates our approach from traditional k-medians clustering is the similarity matrix: its elements are weighted combinations of the pairwise radial and angular distances between customers. We formulate the clustering problem as an integer program, where the integer program models the shape of the clusters. This formulation can be seen as a discrete facility location problem based on a convex combination of the angular and radial distances between customers.

We operate in polar coordinates in order to control the shape of the clusters anchored around the depot. We control the shape of clusters by optimizing for a trait that we call slenderness. Slenderness is the ratio of a cluster's radial extent to its angular extent; it measures how thin or how wide a cluster is. The intuition is that the slenderness ratio of a cluster should not be uniform; it changes depending on the cluster's distance from the depot. The objective reflects the kind of clusters we want to see. We guide the solutions toward the desired cluster shapes using coefficients that are set a priori and calibrated through experimentation.

The data given to the optimization problem are the customer demands {qi}i∈I and the angular (θij) and radial (ρij) distances between customers. The depot is assumed to be at (0, 0), corresponding to zero angle (θ) and zero distance (ρ). We use objective-function coefficients whose parameters control the slenderness of the clusters. These coefficients weigh the cost of assigning customer i to a cluster l that is represented by customer j. The slenderness of the clusters depends on the angular difference between customers i and j, θij, and on the difference in their distances to the depot, ρij. They ensure that we get either thin, long clusters elongated toward the depot or broad, wide clusters. These are precomputed as follows:
    θij = π − |π − |θi − θj||    for all i, j ∈ I,    (1)

    ρij = |ρi − ρj|    for all i, j ∈ I.    (2)
Equation (1) shows how the angular distance between customers i and j, θij , is computed. We take the acute angle between two customers as the angular distance using the depot as the origin. Equation (2) shows that the radial distances ρij are the absolute difference of the radial distances of customers to the depot. Equation (3) shows how we combine the two types of distances. The weights sum up to one as given in (4). For later reference, we also compute {δilj }i∈I, l∈L, j∈I :
    δilj = αil1 θij + αil2 ρij,    (3)

where

    αil1 + αil2 = 1    for all i ∈ I, l ∈ L.    (4)
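To make equations (1)-(4) concrete, the following sketch computes the slender distance between two customers given in polar coordinates. It is an illustrative Python translation (our implementation is in MATLAB); function names are ours.

```python
import math

def angular_distance(theta_i, theta_j):
    """Acute angle between two customers as seen from the depot, eq. (1)."""
    return math.pi - abs(math.pi - abs(theta_i - theta_j))

def radial_distance(rho_i, rho_j):
    """Absolute difference of the customers' depot distances, eq. (2)."""
    return abs(rho_i - rho_j)

def slender_distance(theta_i, rho_i, theta_j, rho_j, alpha1):
    """Weighted combination delta_ij = alpha1 * theta_ij + alpha2 * rho_ij, eqs. (3)-(4)."""
    alpha2 = 1.0 - alpha1  # the two weights sum to one, eq. (4)
    return (alpha1 * angular_distance(theta_i, theta_j)
            + alpha2 * radial_distance(rho_i, rho_j))

# Two customers on the same ray (equal angles) differ only radially:
print(round(slender_distance(0.5, 2.0, 0.5, 5.0, alpha1=0.9), 6))  # → 0.3
```

Note that eq. (1) folds any angular difference into [0, π]: customers at angles 0 and 1.5π are only 0.5π apart when viewed from the depot.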
We control how much each distance, angular (θij) and radial (ρij), influences the objective via the respective weights (αil1, αil2). The δilj in (3) are the coefficients of the clustering objective. If αil1 is relatively high, i.e., close to one, then assigning customer i to center l is penalized mainly through the angular difference. If αil2 is relatively higher, then it is preferable for customer i and center l to be at similar distances from the depot. We illustrate the impact of the parameter settings on cluster shape in Figure 1 and
Figure 1: Contour of minimum slender distance with weights (αil1 , αil2 ) = (0.05, 0.95) to centers (3, 7) and (5, 2).
Figure 2: Contour of minimum slender distance with weights (αil1 , αil2 ) = (0.95, 0.05) to centers (3, 7) and (5, 2).
Figure 2. Each figure depicts contour lines that represent the collection of points of equal distance to their nearest center. Figure 1 shows the effect of the weights with two centers when (αil1, αil2) = (0.05, 0.95); the clusters are nearly circular. On the other hand, when high emphasis is given to angular distance, with (αil1, αil2) = (0.95, 0.05), we get thinner clusters, as in Figure 2.

The decision variables and other parameters of the problem are:

1. Xilj ∈ {0, 1} is equal to 1 when customer i ∈ I is assigned to center l ∈ L, which is chosen to be customer j ∈ I. Since we choose cluster centers from among the customers, a third index runs over all the customers, allowing each customer to be a potential cluster center.

2. Uil ∈ {0, 1} is equal to 1 if customer i is assigned to center l ∈ L.

3. Plj ∈ {0, 1} is equal to 1 if center l is chosen to be customer j ∈ I.

4. v is the capacity of each cluster.

5. d is the number of clusters.

The integer program formulation for the clustering problem, denoted by SL1, is:
(SL1)    min  Σ_{j∈I} Σ_{l∈L} Σ_{i∈I} δilj Xilj    (5)

subject to

    Σ_{i∈I} qi Uil ≤ v          ∀ l ∈ L              (6)
    Σ_{l∈L} Uil = 1             ∀ i ∈ I              (7)
    Uil + Plj − 1 ≤ Xilj        ∀ i, j ∈ I, ∀ l ∈ L  (8)
    Xilj ≤ Uil                  ∀ i, j ∈ I, ∀ l ∈ L  (9)
    Xilj ≤ Plj                  ∀ i, j ∈ I, ∀ l ∈ L  (10)
    Σ_{j∈I} Plj = 1             ∀ l ∈ L              (11)
    Σ_{j∈I} Σ_{l∈L} Plj = d                          (12)
    Uil, Xilj, Plj binary.                           (13)
In the above formulation, constraints (6) ensure that vehicle capacity is not violated, and constraints (7) ensure that every customer is assigned to a cluster. Constraints (8)-(10) model the condition that customer i belongs to a cluster l represented by customer j. Constraints (11) allow only one customer to be chosen as the center of each cluster. Finally, constraint (12) forces exactly d center-customers to be chosen.
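For intuition about what SL1 computes, a brute-force check on a toy instance can enumerate every choice of d center-customers and every capacity-feasible assignment. This illustrative sketch (not our CPLEX implementation) prices assignments directly with δ, making the linking constraints (8)-(10) implicit; it is viable only for very small n.

```python
from itertools import combinations, product

def solve_sl1_brute(delta, q, v, d):
    """delta[i][j]: cost of assigning customer i to a center placed at customer j.
    Returns (best_cost, centers, assignment) for a tiny instance."""
    n = len(q)
    best = (float("inf"), None, None)
    for centers in combinations(range(n), d):       # choose d center-customers, (11)-(12)
        for assign in product(range(d), repeat=n):  # each customer joins one cluster, (7)
            loads = [0] * d
            for i, l in enumerate(assign):
                loads[l] += q[i]
            if any(load > v for load in loads):     # cluster capacity, (6)
                continue
            cost = sum(delta[i][centers[l]] for i, l in enumerate(assign))
            if cost < best[0]:
                best = (cost, centers, assign)
    return best

# Four unit-demand customers on a line, two clusters of capacity 2:
pts = [0.0, 1.0, 10.0, 11.0]
delta = [[abs(a - b) for b in pts] for a in pts]
cost, centers, assign = solve_sl1_brute(delta, q=[1, 1, 1, 1], v=2, d=2)
print(cost)  # → 2.0, pairing the two left and the two right customers
```

The example recovers the natural pairing {0, 1} and {2, 3}; with the slender distance of (3) in place of the line distance, the same enumeration would favor the petal shapes discussed above.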
3.2 Solution Method for the Clustering Problem
To solve the above mathematical program, we implement an iterative improvement scheme, which only approximates the optimal solution. We begin by fixing the centers, reducing the problem to a simpler one. Having solved the clustering problem for fixed centers, we then update the centers based on the new clusters. The clustering problem for fixed centers turns into the
following mathematical program, denoted by SL2:

(SL2)    min  Σ_{l∈L} Σ_{i∈I} δil Uil    (14)

subject to

    Σ_{i∈I} qi Uil ≤ v    ∀ l ∈ L    (15)
    Σ_{l∈L} Uil = 1       ∀ i ∈ I    (16)
    Uil binary.                      (17)
The outline of the procedure is given in Algorithm 1. The algorithm starts by randomly initializing a set of customers as cluster centers and then iteratively assigns customers to their nearest centers and updates the cluster centers. While doing so, it respects each cluster's capacity. In the course of the algorithm, certain customers represent the centers of the clusters; we call these center-customers. This method is inspired by Bradley et al. (1997), who formulate a 1-norm clustering problem as a bilinear programming problem and use a similar iterative technique alternating between cluster assignments and center updates. Unlike our formulation, which is based on a special distance function, their formulation uses a 1-norm distance function, which guarantees convergence. While Algorithm 1 does not have a convergence guarantee, such iterative schemes are commonly used to obtain clusters and can optimize well if restarted from enough different initial solutions. We use ten different initial solutions as starting points. Once we obtain a clustering, we use the Concorde TSP solver to generate routes for each cluster separately, which suggests the possible advantage of parallelization (Applegate et al. 2006a).
Algorithm 1 Slender Shape Clustering
Initialization. Set the iteration counter t ← 0. Randomly pick k customers to be center-customers. Compute the δil induced by the initial choice of center-customers using equations (1) to (4).
do
  • Find new assignments of customers to centers by solving integer program SL2.
  • Update each center as c_l^(t+1) = m_l^(t+1), l = 1, ..., d, where m_l^(t+1) is the centroid of the customers i such that Uil = 1.
  • Find the customer closest to every center. Let these be the new center-customers.
until the center-customers do not change
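As a concrete sketch, Algorithm 1 can be rendered in a few dozen lines of Python. Our implementation solves the assignment step exactly as integer program SL2 via CPLEX; the greedy, capacity-respecting assignment below is a simplified stand-in, and the distance function is left generic (the usage example passes a Euclidean distance rather than the slender distance). All names are ours.

```python
import math
import random

def slender_clusters(coords, delta, k, capacity, max_iter=50, seed=0):
    """Greedy sketch of Algorithm 1; assumes len(coords) <= k * capacity.

    coords : list of (x, y) customer locations.
    delta  : delta(i, j) -> cost of assigning customer i to a center at customer j.
    """
    rng = random.Random(seed)
    n = len(coords)
    centers = rng.sample(range(n), k)          # random initial center-customers
    for _ in range(max_iter):
        # Assignment step (greedy stand-in for SL2): customers claim their
        # cheapest center that still has spare capacity.
        assign, load = [None] * n, [0] * k
        order = sorted(range(n), key=lambda i: min(delta(i, c) for c in centers))
        for i in order:
            for l in sorted(range(k), key=lambda l: delta(i, centers[l])):
                if load[l] < capacity:
                    assign[i], load[l] = l, load[l] + 1
                    break
        # Update step: snap each cluster's centroid to the nearest customer.
        new_centers = []
        for l in range(k):
            members = [i for i in range(n) if assign[i] == l]
            if not members:                    # keep an empty cluster's old center
                new_centers.append(centers[l])
                continue
            cx = sum(coords[i][0] for i in members) / len(members)
            cy = sum(coords[i][1] for i in members) / len(members)
            new_centers.append(min(range(n), key=lambda i:
                               math.hypot(coords[i][0] - cx, coords[i][1] - cy)))
        if new_centers == centers:             # center-customers unchanged: stop
            break
        centers = new_centers
    return assign, centers

# Four customers on a line, two clusters of capacity 2 (unit demand):
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
dist = lambda i, j: math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
assign, centers = slender_clusters(pts, dist, k=2, capacity=2)
print(assign)  # neighbors are paired: assign[0] == assign[1], assign[2] == assign[3]
```

As in the paper, the sketch would be restarted from several random initializations and the best clustering kept before routing each cluster with a TSP solver.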
4. Computational Experiments
We are mainly interested in whether our clustering approach can deliver better routing through its shape-controlling parameters. We discuss our experiments in this section and present our results.
4.1 Data Set and Implementation
We implement our algorithm, which we call Slender (SL), in MATLAB R2012a on a 64-bit Intel(R) Core(TM) i7-3770 CPU with 16GB RAM. At each iteration of the algorithm, the CPLEX (12.04) IP solver is called through the CPLEX-MATLAB interface. The number of drivers is a free parameter in our algorithm; we fix it at the minimum number of drivers needed to satisfy all delivery requests. We choose the clustering with the best clustering objective out of 10 clusterings, and then the customers in each cluster are routed using a TSP solver. An alternative would be to route every clustering separately and choose the solution with the best overall routing objective; however, this requires additional computation time. We follow such an approach to evaluate how sensitive routing solutions are to the clustering solutions in Section 4.2.1.
We consider only problems that limit the number of stops a vehicle can make. We test our algorithm on several benchmark instances of Gehring and Homberger (1999), which we denote by GH-u. These problems have varying numbers of customers and can be categorized as clustered (c), random-clustered (rc), and random (r) problems. They also include time window constraints, which we do not take into account. We further modify this benchmark to have unit demand. This choice reflects modeling vehicle capacity as the number of stops a vehicle can make; in practice, waste collection and mail and newspaper delivery are settings of this kind (Kim et al. 2006, Beltrami and Bodin 1974). When demand is nonhomogeneous (varies across customers), the capacity constraints break total unimodularity, and the customer assignment step in Algorithm 1 becomes computationally too costly.

Second, we use the benchmark of Augerat et al. (1995). These are smaller problems than the GH-u instances, but the location of the depot relative to the center of the data varies more across test instances. We also modify this benchmark by assuming unit demand for all customers, and we call the modified testset the A-u testset.

We use five different algorithms for comparison purposes. The first is a state-of-the-art vehicle routing solver, the record-to-record vehicle routing heuristic (Groër et al. 2010), whose code is open source; we refer to it as the RTR heuristic. The baseline RTR delivers the best routing solution obtained by restarting from 10 different initial solutions that it computes internally. The second benchmark is the classical k-medians (Km) algorithm applied in a cluster-first, route-second fashion. The third benchmark, Km-RTR, refers to the k-medians routing solution fed to a single run of the RTR heuristic. The fourth is another clustering algorithm, spherical k-means (SKm) (Dhillon and Modha 2001).
SKm gives clusters that have a wedge shape centered around the depot. The fifth is SKm-RTR, which denotes a solution generated by feeding the SKm routing solution to RTR for a single run. We make the comparisons in terms of RTR solution quality. We compare these benchmarks against Algorithm 1, the Slender (SL) algorithm. We
apply each clustering algorithm ten times and pick the best clustering to do the routing. In addition, we compute a routing solution with the Slender algorithm and then feed that as a single initial solution to the RTR and run it a single time (SL-RTR).
4.2 Parameter Setting
We conduct a group of experiments to see how different values of the parameter αil1 in the clustering objective affect the routing solution. This parameter weighs the emphasis given to the angular distances θil for all i, l. Based on preliminary testing, we do not expect a small emphasis on θil to deliver better results, because we observed that routing solutions in which vehicles cover large angles tend to be more costly. We choose four distinct values, αil1 ∈ {0.4, 0.5, 0.7, 0.9}, with the same setting across all customers.

Table 1 compares the solutions for these different parameter settings; refer to the caption of Table 1 for the structure of the information in the table. Table 1 shows how many times the Slender algorithm is better than the RTR (column label "(-)/0/(+)"). A negative sign indicates that an algorithm is better than the baseline (RTR), and a positive sign indicates that the baseline is better. In the row with the parameter value αil1 = 0.9 for all i, l, the table entry comparing the Slender algorithm and RTR on the GH-u instances in terms of solution quality is "9-, 0, 21+": the Slender algorithm is better than the RTR algorithm on nine GH-u instances and worse on 21. In all cases, the Slender algorithm and SL-RTR do better in fewer cases than the baseline (RTR restarted 10 times). The run-time of each algorithm is shown in the columns labeled "sec.". The entries under the column label "ave. gap" show the relative routing quality with respect to the RTR solution: the routing length obtained from an algorithm minus the routing length obtained from the baseline, divided by the routing length obtained from the baseline.

Across all instances from the different testsets, higher emphasis on θil means better routing. Moreover, the optimization problems in the Slender algorithm are easier to solve when more weight is placed on θil; in Table 1, we see this as decreasing run-time as αil1 increases. The best parameter setting for the SL algorithm is αil1 = 0.9, in terms of both solution quality and runtime.
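The "ave. gap" statistic can be written as a one-line helper: for each instance, the gap is the algorithm's routing length minus the baseline's, divided by the baseline's, and "ave. gap" averages this over a testset. The sketch below is illustrative Python (our experiments were run in MATLAB); names are ours.

```python
def relative_gap(alg_length, baseline_length):
    """Relative gap to the baseline; positive means the algorithm is worse than RTR."""
    return (alg_length - baseline_length) / baseline_length

def average_gap(alg_lengths, baseline_lengths):
    """Mean relative gap over a testset's instances."""
    gaps = [relative_gap(a, b) for a, b in zip(alg_lengths, baseline_lengths)]
    return sum(gaps) / len(gaps)

print(relative_gap(1040.0, 1000.0))  # → 0.04, i.e., a route 4% longer than the baseline
```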
                 A-u                                     GH-u
                 ave. gap (std.)  (-)/0/(+)    sec.      ave. gap (std.)  (-)/0/(+)    sec.
αil1 = 0.4
  SL             .130 (.049)      0-, 0, 26+   4.82      .112 (.048)      0-, 0, 30+   40.7
  SL-RTR         .033 (.026)      3-, 0, 23+   5.17      .066 (.043)      0-, 0, 30+   48.2
αil1 = 0.5
  SL             .106 (.043)      0-, 0, 26+   4.09      .087 (.035)      0-, 0, 30+   38.6
  SL-RTR         .027 (.027)      3-, 1, 22+   4.4       .058 (.037)      0-, 0, 30+   46.0
αil1 = 0.7
  SL             .056 (.032)      0-, 0, 26+   3.4       .047 (.024)      1-, 0, 29+   35.1
  SL-RTR         .013 (.022)      5-, 0, 21+   3.7       .027 (.023)      2-, 0, 28+   42.4
αil1 = 0.9
  SL             .044 (.029)      1-, 0, 25+   3.5       .019 (.034)      9-, 0, 21+   34.8
  SL-RTR         .017 (.029)      6-, 0, 20+   3.8       .001 (.024)      11-, 1, 18+  42.2

Table 1: Results on the unit-demand Augerat et al. (A-u) and modified Gehring and Homberger (GH-u) instances. For every method on the left, the average gap to the RTR results is shown in the columns labeled "ave. gap", with the standard deviation in parentheses. The number of times a method performs strictly better than RTR is shown with a negative sign under the columns labeled "(-)/0/(+)". The average runtime is shown under the columns labeled "sec."
We conduct another set of experiments to see whether we can improve the SL algorithm by allowing customer characteristics to be reflected in the shape-controlling parameters. By this, we mean that the parameter that controls the emphasis on the angular distance is no longer the same across all customers. Ouyang (2007) discusses how the qualitatively optimal shape of a cluster of customers depends on its distance to the depot: if a group of customers is close enough to the depot, the shape of the cluster can be arbitrary, including wide and fat clusters. The intuition is that customers farther away from the depot should have a higher emphasis on their θil in the objective function of the clustering problem, so that the vehicle visiting them does not cover a wide angle far from the depot, which would lead to a long route. The important quantity in this experiment is the set of coefficients (αil1 θil + (1 − αil1) ρil) for every i, l. The parameter space we search is

    αil1 = (ρi / max_i ρi) × b + a,

where a and b are chosen as in the first two columns of Table 2. These parameter configurations reflect a relatively low angular emphasis for customers near the depot and a higher angular emphasis for customers farther away.

   a     b      A-u                            GH-u
                -   0   +    mean    std.      -    0   +    mean    std.
  0.3   0.6     0   0   26   0.087   0.036     1    0   29   0.082   0.036
  0.3   0.65    0   0   26   0.097   0.045     1    0   29   0.080   0.032
  0.5   0.4     1   0   25   0.058   0.036     1    0   29   0.049   0.025
  0.5   0.45    2   0   24   0.068   0.045     2    0   28   0.054   0.027
  0.6   0.3     3   0   23   0.053   0.043     2    0   28   0.041   0.026
  0.6   0.35    1   0   25   0.058   0.042     2    0   28   0.044   0.028
  0.7   0.2     2   0   24   0.044   0.034     3    0   27   0.032   0.024
  0.7   0.25    1   0   25   0.056   0.048     6    0   24   0.030   0.030
  0.8   0.15    1   0   25   0.050   0.035     10   0   20   0.024   0.035
  0.8   0.19    1   0   25   0.066   0.049     11   0   19   0.031   0.049
  0.85  0.14    1   0   25   0.062   0.045     12   0   18   0.033   0.053
  0.9   0.09    1   0   25   0.063   0.046     13   0   17   0.034   0.058
  0.95  0.04    1   0   25   0.067   0.045     12   0   18   0.042   0.068
Table 2: SL algorithm better (-) or worse (+) than RTR with customer-dependent parameters. For every parameter setting on the left, the number of times SL performs strictly better than RTR is shown under the columns labeled "-"; zero denotes a tie. The mean and standard deviation of the performance gap are shown under "mean" and "std."

If we compare the best uniform parameter setting from Table 1 with the results presented in Table 2, the SL algorithm outperforms RTR on more instances when customer-dependent shape-controlling parameters are allowed. The best results on the GH-u instances are obtained for parameter values (a, b) ∈ {(0.85, 0.14), (0.9, 0.09), (0.95, 0.04)}, where the number of instances in which SL outperforms RTR ranges from 12 to 13 (Table 2). However, SL is worse than RTR by about 3.6% on average. The SL-RTR algorithm is worse than RTR on average by less than 0.1% and outperforms RTR on as many as 15 instances (Table 3). This indicates that although customer-specific parameters help SL outperform RTR on relatively more instances, the uniform parameter setting does relatively better in terms of the average performance gap, which is only 1.9% worse than RTR in Table 1. In other words, under the best customer-specific parameters, SL is worse by a larger average margin than under the uniform parameter setting. In Figure 3, we show the routing obtained from the parameter
setting {a, b} = {(0.85, 0.14), (0.5, 0.45)} for illustrative purposes. The runtime of the SL algorithm on the GH-u instances is on average 46.51 seconds over all the parameter settings. The SL-RTR runtime is on average 54.61 seconds. These are close to average RTR runtime of 54.25 seconds.
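The per-instance statistics reported in Tables 1-3 (win/tie/loss counts against the baseline and the mean and standard deviation of the relative gap) can be computed with a short routine. The sketch below is our own illustration, not the paper's code; the function and variable names are hypothetical.

```python
# Win/tie/loss counts and gap statistics against a baseline solver,
# mirroring the "(-)/0/(+)", mean, and std. columns of Tables 2-3.
from statistics import mean, pstdev

def compare(costs, baseline, tol=1e-9):
    """costs[i] and baseline[i] are objective values on instance i."""
    gaps = [(c - b) / b for c, b in zip(costs, baseline)]
    wins = sum(g < -tol for g in gaps)     # "(-)": method strictly better
    ties = sum(abs(g) <= tol for g in gaps)  # "0": tie
    losses = sum(g > tol for g in gaps)    # "(+)": baseline strictly better
    return wins, ties, losses, mean(gaps), pstdev(gaps)
```

For example, costs of 95, 100, and 110 against a constant baseline of 100 yield one win, one tie, one loss, and a positive mean gap (the method is worse on average).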
(a) SL solution for (a, b) = (0.85, 0.14).
(b) SL solution for (a, b) = (0.5, 0.45).
Figure 3: SL routing solutions for a GH-u instance with 400 customers. (a) High emphasis on customer-specific angular distance. (b) Relatively low emphasis on customer-specific angular distance.
In the case of the A-u instances, SL does best when (a, b) = (0.6, 0.3), whereas the best SL-RTR results occur when (a, b) = (0.7, 0.25). In this best parameter configuration, (a, b) = (0.7, 0.25), SL-RTR tends to do worse on cases with the depot closer to the
boundary. The SL algorithm is worse than the baseline (RTR) on average by 1.35% for the parameters that perform well, but only 1% worse for (a, b) = (0.6, 0.35). The best parameter settings for the smaller A-u instances put less weight on the angular distances compared to the GH-u instances. As we increase the parameter a, we see no significant improvement in the results on the GH-u instances and a decline in performance on the A-u instances.
                      A-u                                GH-u
a     b       (-)   0   (+)   mean    std.       (-)   0   (+)   mean    std.
0.3   0.6      3    1    22   0.021   0.026       1    0    29   0.043   0.031
0.3   0.65     7    0    19   0.019   0.030       1    0    29   0.040   0.032
0.5   0.4      8    0    18   0.014   0.028       1    0    29   0.025   0.021
0.5   0.45     7    0    19   0.012   0.024       3    0    27   0.022   0.020
0.6   0.3      7    1    18   0.014   0.029       2    0    28   0.019   0.018
0.6   0.35     8    1    17   0.010   0.024       3    0    27   0.013   0.013
0.7   0.2      6    1    19   0.013   0.023       3    0    27   0.013   0.016
0.7   0.25     9    1    16   0.014   0.028      12    0    18   0.005   0.016
0.8   0.15     7    0    19   0.016   0.028      13    0    17   0.001   0.020
0.8   0.19     4    1    21   0.022   0.029      14    0    16   0.001   0.025
0.85  0.14     2    1    23   0.021   0.026      15    0    15   0.001   0.030
0.9   0.09     3    0    23   0.022   0.026      14    0    16   0.003   0.034
0.95  0.04     3    1    22   0.027   0.030      14    0    16   0.009   0.039
Table 3: SL-RTR algorithm better (-) or worse (+) than RTR for customer-dependent parameters. For every parameter setting on the left side, the number of times SL-RTR is strictly better than RTR is shown under the column denoted "(-)", "0" denotes a tie, and "(+)" is the number of times RTR is strictly better. The average and standard deviation of the performance gap are shown under "mean" and "std."
4.2.1 Initial Solution Susceptibility
The clustering solutions obtained are susceptible to local optima. For this reason, the vehicle routing solutions obtained from the clusterings are susceptible to variation as well. We test the degree of this variation by routing every clustering produced by the clustering algorithm, instead of choosing the best clustering and computing the routing afterwards as in our original SL algorithm. Note that this results in a longer runtime; however, better solutions may be obtained. We give the boxplot of the variation for a choice of parameters that did the best for the slender-based algorithm in Figure 4a and the initial solution susceptibility of the RTR solver in Figure 4b. Negative deviation percentages mean that those routing solutions were better than the routing solution of the best clustering. We see from the boxplots that the initial solution susceptibility is smaller for the slender-based clustering, although the RTR solver performs slightly better in terms of routing cost. This suggests that parallelization can be beneficial. For instance, the clustering method could be applied to the problem 10 times, yielding 10 different clusterings. Each clustering can then be routed on a separate processor, applying TSP routing to its clusters independently.
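The parallelization idea above can be sketched as follows. This is a simplified stand-in, not the SL algorithm: the clustering step is a random-seed nearest-center assignment and the routing step is a nearest-neighbor TSP heuristic, so only the restart-and-route-in-parallel structure is illustrative.

```python
# Run the clustering several times with different seeds, route each
# clustering on its own worker, and keep the cheapest overall solution.
import math
import random
from concurrent.futures import ThreadPoolExecutor

def cluster(points, k, seed):
    """Assign each point to the nearest of k randomly chosen centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for p in points:
        i = min(range(k), key=lambda j: math.dist(p, centers[j]))
        groups[i].append(p)
    return groups

def route_length(depot, stops):
    """Length of a nearest-neighbor TSP tour from the depot through stops."""
    cur, rest, total = depot, list(stops), 0.0
    while rest:
        nxt = min(rest, key=lambda p: math.dist(cur, p))
        total += math.dist(cur, nxt)
        rest.remove(nxt)
        cur = nxt
    return total + math.dist(cur, depot)

def solve(args):
    """Route one clustering: sum of TSP tour lengths over its clusters."""
    depot, points, k, seed = args
    return sum(route_length(depot, c) for c in cluster(points, k, seed) if c)

def best_of(depot, points, k, restarts=10):
    jobs = [(depot, points, k, s) for s in range(restarts)]
    # One clustering per worker; swap in ProcessPoolExecutor for true
    # CPU parallelism in a real implementation.
    with ThreadPoolExecutor() as pool:
        return min(pool.map(solve, jobs))
```

Because all restarts are routed, the returned cost can only improve on routing the single best clustering, at the price of more total work.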
4.3 Results on Large Instances
Here, we compare the best parameter results for the SL algorithm with the benchmarks Km, Km-RTR, SKm, and SKm-RTR. Again, we make the comparisons in terms of the RTR solution quality. Table 4 includes the solutions obtained from running these benchmarks. The SKm algorithm does a little better than the SL algorithm on the GH-u instances. It outperforms RTR in 11 GH-u instances, whereas SL with αᵢₗ¹ = 0.9 outperforms RTR in 9 instances. This is because the depot locations are centrally located in the GH-u instances, and the SL algorithm with a heavy emphasis on the shape parameter behaves similarly to the SKm algorithm. However, on average, SL is within 1.9% of RTR, while SKm does worse than RTR by 5.4%. In other words, when SKm does worse than RTR, it does so by a larger margin than SL. Km does relatively worse than both SL and SKm. On average, it is 3.8% worse than RTR and outperforms RTR in only one instance in the GH-u test set and two instances in the A-u test set. SKm-RTR and SL-RTR perform about the same, with 11 instances each in which they perform better than RTR. By changing the emphasis on the customer characteristics, the SL algorithm does relatively better than Km and SKm compared to having uniform emphasis across all customers. While Km outperforms RTR in only one instance and SKm outperforms RTR in 11, SL outperforms RTR in 12 to 13 instances under the best parameter settings for the GH-u instances in Table 2. Looking at Table 3, we can see that the SL-RTR algorithm does marginally better and outperforms RTR in 14 to 15 instances. Table 3 also shows that giving the SL routing solution to RTR as an initial solution, thereby obtaining the SL-RTR routing solutions, does not improve the routing solutions in more than 50% of the GH-u test instances. Thus, giving random initial solutions to RTR still outperforms SL-RTR (in 16 instances out of 30, last row).
                         A-u                                      GH-u
                 ave. gap (std.)  (-)/0/(+)    sec.      ave. gap (std.)  (-)/0/(+)     sec.
RTR                    -              -         2.8            -              -          54.2
Km               .065 (.045)     2-, 0, 24+     2.8      .038 (.022)     1-, 0, 29+    114.7
Km-RTR           .017 (.029)     8-, 0, 18+     5.3      .019 (.017)     1-, 0, 29+    130.4
SKm              .061 (.043)     2-, 0, 24+     2.5      .054 (.084)    11-, 0, 19+     17.3
SKm-RTR          .029 (.035)     4-, 0, 21+     2.8      .028 (.059)    11-, 0, 19+     22.9
SL (αᵢₗ¹ = 0.9)  .044 (.029)     1-, 0, 25+     3.5      .019 (.034)     9-, 0, 21+     34.8
SL-RTR           .017 (.029)     6-, 0, 20+     3.8      .001 (.024)    11-, 1, 18+     42.2
Table 4: All uniform results for Augerat et al. (A-u) and modified Gehring and Homberger (GH-u). For every clustering method on the left side, the average gap to the RTR results is shown in the columns denoted "ave. gap", with the standard deviation in parentheses. The number of times a clustering method is strictly better than RTR is shown under the columns denoted "(-)/0/(+)". The average runtime is shown under the columns denoted "sec."
4.4 Results on Very Large Instances
SL and SL-RTR show their greatest value on very large scale instances. To show this, in this section, we focus on the runtime of the SL algorithm in which the best clustering out of 10 is chosen and then routed. We test on very large instances; waste collection and mail and newspaper delivery are some areas where such instances arise in practice (Kytöjoki et al. 2007). We compare the runtime of the SL algorithm on these instances to that of RTR to see whether a divergence in runtime can be observed. We generate 18 very large problems, where customers are partially clustered and partially
uniformly distributed. Then we test the SL and RTR heuristics on each instance. Each instance contains a number of customers from the set {2000, 3000, 5000} and a minimum number of vehicles from the set {20, 50, 100}. See Table 5 for the specific number of customers, number of vehicles, and instance size for every test instance. The minimum number of vehicles is computed by dividing total demand by vehicle capacity. In Figure 5, we show the runtime of the Slender algorithm versus the RTR heuristic on these instances as well as the GH-u instances. On the x-axis, we measure instance size by the number of customers multiplied by the number of vehicles. The y-axis shows the computation time in log(seconds). The squares represent the runtime of RTR on an instance. The stars represent the same for the Slender algorithm. The Slender algorithm runtime (in log seconds) not only grows close to linearly, but is also significantly lower on these very large problems, with only a minor loss in solution quality. In contrast, the runtime of the RTR heuristic is instance dependent and highly variable. RTR spends much time on instances that have a higher number of customers and fewer vehicles, because these instances require more customers per route. Table 5 contains the relative solution quality (a positive gap means RTR is better) and the runtime of the SL algorithm for every instance. Instance size is measured by the number of customers times the number of vehicles. The shape parameter chosen was αᵢₗ¹ = 0.9. The value of the clustering approach is greatest when we consider its solutions as initial feasible solutions for RTR. The SL-RTR algorithm does marginally better than RTR on these very large instances (Table 5), in contrast to the GH-u instances (Table 4). Figure 5 reflects a disadvantage of using a metaheuristic: it requires tuning across the parameters (number of customers, number of vehicles) representing the problem instances. Our approach mitigates this and is less susceptible to variations in the number of customers and the number of vehicles.
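The fleet-sizing rule above (minimum number of vehicles equals total demand divided by vehicle capacity, rounded up) is a one-line computation; the helper name below is our own, used only for illustration.

```python
import math

def min_vehicles(demands, capacity):
    """Lower bound on fleet size: total customer demand divided by
    vehicle capacity, rounded up to the next whole vehicle."""
    return math.ceil(sum(demands) / capacity)
```

For example, 100 customers with unit demand 10 and vehicle capacity 50 require at least 20 vehicles, matching the smallest fleet size used in the very large instances.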
Figure 5: Runtime for Slender and RTR. Squares represent RTR and stars represent the Slender algorithm.
5. Conclusion
The results in this paper suggest that better routing by cluster-first, route-second demands a high emphasis on the angularity of the clusters. Moreover, the angular distance is not as important for customers closer to the depot. The Slender algorithm we propose can be implemented using commercial solvers, whereas many metaheuristic codes are not available. Further, coupled with an open-source metaheuristic, our method (SL-RTR) demonstrates excellent results on new very large VRP instances, where it outperforms RTR on average. It outperforms the spherical k-means and k-medians clustering algorithms on large realistic problems, albeit with parameter tuning. One possible direction for future research is to devise a clustering method whose slenderness adapts to the clustering tendency of the customers in every test instance.
References
Applegate, D., R. Bixby, V. Chvátal, W. Cook. 2006a. Concorde TSP Solver. http://www.tsp.gatech.edu/concorde.
Applegate, D. L., R. E. Bixby, V. Chvátal, W. J. Cook. 2006b. Notes on continuous approximations for TSP. The Traveling Salesman Problem: A Computational Study. Princeton University Press.
Augerat, P., J.M. Belenguer, E. Benavent, A. Corberán, D. Naddef, G. Rinaldi. 1995. Computational results with a branch and cut code for the capacitated vehicle routing problem. Rapport de recherche - IMAG 1.
Beltrami, E. J., L. D. Bodin. 1974. Networks and vehicle routing for municipal waste collection. Networks 4 65–94.
Bradley, P.S., O.L. Mangasarian, W.N. Street. 1997. Clustering via concave minimization. Advances in Neural Information Processing Systems 9 368–374.
Bramel, J., E. G. Coffman Jr., P. W. Shor, D. Simchi-Levi. 1992. Probabilistic analysis of the capacitated vehicle routing problem with unsplit demands. Operations Research 40 1095–1106.
Bramel, J., D. Simchi-Levi. 1995. A location based heuristic for general routing problems. Operations Research 43 649–660.
Chaoji, V., M. Al Hasan, S. Salem, M. J. Zaki. 2008. SPARCL: Efficient and effective shape-based clustering. Eighth IEEE International Conference on Data Mining (ICDM '08). 93–102.
Cordeau, J., M. Gendreau, G. Laporte. 1997. A tabu search heuristic for periodic and multi-depot vehicle routing problems. Networks 30 105–119.
Cordeau, J., M. Maischberger. 2012. A parallel iterated tabu search heuristic for vehicle routing problems. Computers & Operations Research 39 2033–2050.
Crevier, B., J.-F. Cordeau, G. Laporte. 2007. The multi-depot vehicle routing problem with interdepot routes. European Journal of Operational Research 176 756–773.
Daganzo, C. F. 1984. The distance traveled to visit n points with a maximum of c stops per vehicle: An analytic model and an application. Transportation Science 18 331–350.
Dhillon, I. S., D. S. Modha. 2001. Concept decompositions for large sparse text data using clustering. Machine Learning 42 143–175.
Dondo, R., J. Cerdá. 2013. A sweep-heuristic based formulation for the vehicle routing problem with cross-docking. Computers & Chemical Engineering 48 293–311.
Ester, M., H.P. Kriegel, J. Sander, X. Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining 96.
Fang, Z., W. Tu, Q. Li, S.L. Shaw, S. Chen, B.Y. Chen. 2013. A Voronoi neighborhood-based search heuristic for distance/capacity constrained very large vehicle routing problems. International Journal of Geographical Information Science 27 741–764.
Fisher, M.L., R. Jaikumar. 1981. A generalized assignment heuristic for vehicle routing. Networks 11 109–124.
Franceschi, R., M. Fischetti, P. Toth. 2006. A new ILP-based refinement heuristic for vehicle routing problems. Mathematical Programming 105 471–499.
Gehring, H., J. Homberger. 1999. A parallel hybrid evolutionary metaheuristic for the vehicle routing problem with time windows. Proceedings of EUROGEN99, vol. 2. 57–64.
Gillett, B., L.R. Miller. 1974. A heuristic algorithm for the vehicle-dispatch problem. Operations Research 22 340–349.
Goodson, J.C., J.W. Ohlmann, B.W. Thomas. 2012. Cyclic-order neighborhoods with application to the vehicle routing problem with stochastic demand. European Journal of Operational Research 217 312–323.
Groër, C., B. Golden, E. Wasil. 2010. A library of local search heuristics for the vehicle routing problem. Mathematical Programming Computation 2 79–101.
Guha, S., R. Rastogi, K. Shim. 1998. CURE: An efficient clustering algorithm for large databases. SIGMOD '98, ACM, 73–84.
Hiquebran, D. T., A. S. Alfa, J. A. Shapiro, D. H. Gittoes. 1993. A revised simulated annealing and cluster-first route-second algorithm applied to the vehicle routing problem. Engineering Optimization 22 77–107.
Imran, A., S. Salhi, N.A. Wassan. 2009. A variable neighborhood-based heuristic for the heterogeneous fleet vehicle routing problem. European Journal of Operational Research 197 509–518.
Karypis, G., E.-H. Han, V. Kumar. 1999. Chameleon: Hierarchical clustering using dynamic modeling. Computer 32 68–75.
Kim, B.I., S. Kim, S. Sahoo. 2006. Waste collection vehicle routing problem with time windows. Computers & Operations Research 33 3624–3642.
Kytöjoki, J., T. Nuortio, O. Bräysy, M. Gendreau. 2007. An efficient variable neighborhood search heuristic for very large scale vehicle routing problems. Computers & Operations Research 34 2743–2757.
Laporte, G. 2009. Fifty years of vehicle routing. Transportation Science 43 408–416.
Laporte, G., M. Gendreau, J.-Y. Potvin, F. Semet. 2000. Classical and modern heuristics for the vehicle routing problem. International Transactions in Operational Research 7 285–300.
Liu, F.H.F., S.Y. Shen. 1999. A route-neighborhood-based metaheuristic for vehicle routing problem with time windows. European Journal of Operational Research 118 485–504.
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. California, USA, 14.
Marinakis, Y. 2012. Multiple phase neighborhood search-GRASP for the capacitated vehicle routing problem. Expert Systems with Applications 39 6807–6815.
Mester, D., O. Bräysy. 2007. Active-guided evolution strategies for large-scale capacitated vehicle routing problems. Computers & Operations Research 34 2964–2975.
Nagata, Y., O. Bräysy. 2009. Edge assembly-based memetic algorithm for the capacitated vehicle routing problem. Networks 54 205–215.
Ouyang, Y. 2007. Design of vehicle routing zones for large-scale distribution systems. Transportation Research Part B: Methodological 41 1079–1093.
Renaud, J., F.F. Boctor. 2002. A sweep-based algorithm for the fleet size and mix vehicle routing problem. European Journal of Operational Research 140 618–628.
Ryan, D. M., C. Hjorring, F. Glover. 1993a. Extensions of the petal method for vehicle routeing. The Journal of the Operational Research Society 44 289–296.
Seref, O., Y.J. Fan, W.A. Chaovalitwongse. 2014. Mathematical programming formulations and algorithms for discrete k-median clustering of time-series data. INFORMS Journal on Computing 26 160–172.
Zhang, T., R. Ramakrishnan, M. Livny. 1996. BIRCH: An efficient data clustering method for very large databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, SIGMOD '96, ACM, New York, NY, USA, 103–114.
Zhong, Y., M.H. Cole. 2005. A vehicle routing problem with backhauls and time windows: A guided local-search solution. Transportation Research Part E: Logistics and Transportation Review 41 131–144.
(a) Slender-based solution variation.
(b) RTR solver solution variation.
Figure 4: Susceptibility to initial solutions.
Problem    # of       # of       Instance    SL      SL       SL-RTR   SL-RTR    RTR
           Customers  Vehicles   Size        gap     sec.     gap      sec.      sec.
vl-1       2000        20         40000      0.035     57.88  -0.002    107.79    502.49
vl-2       2000        20         40000      0.062     60.11   0.013    109.82    498.32
vl-3       3000        20         60000      0.051     94.61   0.001    246.52   1585.40
vl-4       3000        20         60000      0.033    101.14  -0.026    249.75   1488.34
vl-5       2000        50        100000      0.022    131.84  -0.006    152.31    231.87
vl-6       5000        20        100000      0.051    179.01  -0.012    828.89   6646.81
vl-7       2000        50        100000      0.057    133.24   0.005    152.92    206.15
vl-8       5000        20        100000      0.046    178.05  -0.020    840.18   6378.33
vl-9       3000        50        150000      0.025    222.93  -0.016    268.45    511.87
vl-10      3000        50        150000      0.030    221.24  -0.007    265.69    478.63
vl-11      2000       100        200000      0.030    287.71   0.000    304.35    183.30
vl-12      2000       100        200000      0.038    292.82   0.004    309.83    170.92
vl-13      5000        50        250000      0.029    455.20  -0.018    600.52   1532.73
vl-14      5000        50        250000      0.039    408.82  -0.011    555.07   1479.50
vl-15      3000       100        300000      0.028    480.67  -0.007    512.49    377.23
vl-16      3000       100        300000      0.035    470.73  -0.006    503.36    324.63
vl-17      5000       100        500000      0.030   1038.77  -0.007   1117.65    840.13
vl-18      5000       100        500000      0.017    879.33  -0.018    957.65    823.17
average                                      0.036    316.34  -0.007    449.07   1347.77
deviation                                    0.012    273.34   0.009    310.47   1943.55
Table 5: Very Large Instances