A Survey of Algorithms for Convex Multicommodity Flow Problems

A. Ouorou
P. Mahey
Laboratoire Inter-universitaire d'Informatique, de Modélisation et d'Optimisation des Systèmes, Université Blaise Pascal, ISIMA, BP 125, 63173 Aubière cedex, France.
J.-Ph. Vial
HEC, Section of Management Studies, University of Geneva, 102 Boulevard Carl-Vogt, 1211 Genève 4, Switzerland.
Abstract
There are many problems related to the design of networks. Among them, the message routing problem plays a determinant role in the optimization of network performance. Much of the motivation for this work comes from this problem, which is shown to belong to the class of nonlinear convex multicommodity flow problems. This paper emphasizes the message routing problem in data networks, but it includes a broader literature overview of convex multicommodity flow problems. We present and discuss the main solution techniques proposed for solving this class of large-scale convex optimization problems, and we report numerical experiments on the message routing problem with several different techniques.
1 Introduction

The literature dealing with multicommodity flow problems has been rich since the publication of the works of Ford and Fulkerson [19] and T. C. Hu [30] at the beginning of the 1960s. These problems usually have a very large number of variables and constraints and arise in a great variety of applications. Linear multicommodity flow problems have naturally attracted much interest from researchers in Operations Research, first as basic optimization models for network design and operation problems, and then as a natural illustration of decomposable structures in Linear Optimization [27],[1]. Excellent surveys such as Kennington's [35] and Assad's [2] were published at the end of the 1970s. Numerical experiments were conducted with significant success for subgradient methods, which seem to be relatively insensitive to the number of dual variables associated with the relaxation of the coupling constraints [27], even if they converge slowly. More recently, the availability of multiprocessor and massively parallel architectures has increased the interest in decomposition techniques, leading to efficient versions of known decomposition algorithms for linear programming, such as the Dantzig-Wolfe algorithm (for example [33]) and partitioning (for example [17]), which are able to solve large-scale network flow problems.

The first models with nonlinear costs appeared later and were studied in connection with telecommunication and transportation networks. For instance, one of the most important problems in the design of packet-switched computer networks consists in determining a set of routes on which packets are to be transmitted and which is optimal according to some chosen cost criterion. The average message delay is the most frequently used performance measure in the literature for such networks. Under appropriate assumptions, this problem belongs to the class of nonlinear convex multicommodity flow problems [20],[6]. Much of the motivation for this work comes from these message routing problems.

The present study is concerned only with the nonlinear convex models, for which, to our knowledge, no overview has been published. In theory these models can be solved by general nonlinear programming techniques. However, their special structure makes decomposition methods much more attractive: much efficiency can be gained by identifying easier subproblems for which polynomial algorithms can be coded very efficiently, such as shortest path and minimum cost flow calculations. The case of concave differentiable cost functions and the case of fixed costs have been studied by many authors, and we refer to [46] for a survey. Recently, a more complex multicommodity flow problem with step-increasing discontinuous cost functions has been addressed in [22].
2 The Convex Multicommodity Flow Problem

Let us state the notation used in the following. We are given a directed graph G = (V, E) associated with a network with m nodes (which, according to the context, may represent switching centers, terminals, concentrators, and so on) and n arcs (links between some chosen pairs of nodes). Let K denote the number of commodities to be transported through the network. Each commodity k has a single source-sink pair (s_k, t_k), and we are given the flow requirement r_k (traffic quantity which must be sent between s_k and t_k). The general nonlinear convex multicommodity flow problems we are concerned with may be formulated as

  \min f(x) = \sum_{j=1}^{n} f_j(x_{0j}) + \sum_{k=1}^{K} \sum_{j=1}^{n} f_{kj}(x_{kj})

  s.t.  x_{0j} = \sum_{k} x_{kj},        j = 1, ..., n      (1)
        M x_k = r_k b_k,                 k = 1, ..., K      (2)
        0 \le x_k,                       k = 1, ..., K      (3)
        0 \le x_{0j} \le c_j,            j = 1, ..., n      (4)

where

  M is the m \times n node-arc incidence matrix of G,
  x_k denotes the n-vector representing the kth commodity flow,
  b_k is the m-vector with all components 0 except b_{k s_k} = -b_{k t_k} = 1,
  c_j is the capacity of arc j,
  x_{0j} represents the total flow on arc j.
Constraints (2) and (4) are respectively the classical network and capacity constraints. Constraints (1) are coupling constraints in the sense that, when they are relaxed, K independent individual flow problems can be solved separately. The nonnegativity constraints (3) express the fact that arcs are used in the forward direction. The functions f_{kj} are associated with the flow of each commodity on each arc of the network and are assumed to be nonlinear and convex, as are the functions f_j associated with the total flows x_{0j}. The function f_j most used in practice is the Kleinrock delay function

  f_j(x_{0j}) = \frac{x_{0j}}{c_j - x_{0j}}      (5)

which imposes x_{0j} < c_j in (4). The case where f_j is defined by (5) and f_{kj} \equiv 0 corresponds to the message routing problem [36] (see also [6]).

The above multicommodity flow problem can alternatively be formulated using flows through paths of the network. More precisely, let N_k denote the number of paths between the nodes s_k and t_k, and \delta_{kp} the arc-path incidence vector of the pth path, defined by

  \delta_{kp}(j) = 1 if arc j belongs to the path, and 0 otherwise.
Then the arc-path formulation of the multicommodity flow problem is

  \min f(x) = \sum_{j=1}^{n} f_j(x_{0j}) + \sum_{k=1}^{K} \sum_{j=1}^{n} f_{kj}\Big( \sum_{p=1}^{N_k} \delta_{kp}(j) x_{kp} \Big)

  s.t.  \sum_{k=1}^{K} \sum_{p=1}^{N_k} \delta_{kp}(j) x_{kp} = x_{0j},   \forall j \in E        (6)
        \sum_{p=1}^{N_k} x_{kp} = r_k,                                    \forall k              (7)
        0 \le x_{0j} \le c_j,                                             \forall j \in E        (8)
        0 \le x_{kp},                                                     \forall k, \forall p   (9)
where x_{kp} is the flow of commodity k through the pth path. The simple network constraints are now expressed by (7), and we have the same capacity constraints (8) and coupling constraints (6). This formulation presupposes that an enumeration of the paths for each pair (s_k, t_k) is given. This is unrealistic in practice, and the methods dealing with this formulation do not require such an explicit enumeration, but instead include an iterative path generation procedure. The reader interested in the effects of the formulation of multicommodity network flow problems in the framework of decomposition is referred to [33].
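To make the node-arc formulation concrete, here is a minimal Python sketch. It is our own illustration, not taken from the original paper: the three-node network, capacities and flow values are invented. It builds the node-arc incidence matrix M, checks the flow conservation constraint (2) for one commodity, and evaluates the Kleinrock objective (5).

```python
import numpy as np

# Small invented example: 3 nodes, 3 arcs (0->1, 1->2, 0->2).
arcs = [(0, 1), (1, 2), (0, 2)]
m, n = 3, len(arcs)

# Node-arc incidence matrix M: +1 at the tail of an arc, -1 at its head.
M = np.zeros((m, n))
for j, (tail, head) in enumerate(arcs):
    M[tail, j] = 1.0
    M[head, j] = -1.0

c = np.array([10.0, 10.0, 6.0])        # arc capacities c_j
# One commodity (s=0, t=2, r=4) split over the paths 0->1->2 and 0->2.
x_k = np.array([[1.5, 1.5, 2.5]])      # per-commodity arc flows x_{kj}, shape (K, n)
x0 = x_k.sum(axis=0)                   # total arc flows x_{0j}, i.e. constraint (1)

def kleinrock_objective(x0, c):
    """Kleinrock delay objective sum_j x_{0j} / (c_j - x_{0j}), valid for x0 < c."""
    assert np.all(x0 < c), "total flow must stay strictly below capacity"
    return float(np.sum(x0 / (c - x0)))

# Flow conservation (2) for the single commodity: M x_k = r_k b_k,
# with b_{k, s_k} = 1 and b_{k, t_k} = -1.
r, s, t = 4.0, 0, 2
b = np.zeros(m)
b[s], b[t] = 1.0, -1.0
print(np.allclose(M @ x_k[0], r * b))   # -> True
print(kleinrock_objective(x0, c))       # objective value f(x) when f_kj = 0
```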
3 Literature Overview

In this section, an attempt is made to give a synthesis of the solution techniques proposed in the literature for nonlinear convex multicommodity flow problems. To this end, we take into account some criteria which, in our view, characterize the approaches. These criteria are: the decomposition strategy, the multicommodity flow model, the technique used to solve the master and subproblems, and the potential applications. Most approaches use a decomposition strategy, as is shown below, and, in consequence, many of them are based on duality and relaxation of the coupling constraints. The efficiency of dual schemes depends highly on the smoothness of the dual function, which is obtained when the objective function of the problem is strictly convex. For instance, Nagamochi [47] has studied the model where all the functions f_j and f_{kj} are strictly convex. Strictly convexifying the objective function is also possible, as shown by Stern [61], who solved the message routing problem considering the objective function

  f(x) = \sum_{j=1}^{n} \frac{x_{0j}}{c_j - x_{0j}} + r \sum_{k=1}^{K} \sum_{j=1}^{n} x_{kj}^2      (10)
where r should be small enough to keep the solution as realistic as possible. Because of the importance of the characteristics of the objective function, we shall distinguish the following model types according to the objective function:

1. f convex.
2. f convex and differentiable.
3. f(x) = \sum_{j=1}^{n} \frac{x_{0j}}{c_j - x_{0j}} (delay function), or f(x) = \sum_{j=1}^{n} f_j(x_{0j}) with f_j strictly convex, and f_{kj} \equiv 0.
4. f(x) = \sum_{j=1}^{n} f_j(x_{0j}) + \sum_{k=1}^{K} \sum_{j=1}^{n} f_{kj}(x_{kj}) with f_j, f_{kj} strictly convex.
The second column of Table 1 refers to these labels. As already noted, the multicommodity flow problem consists in determining a minimal nonlinear convex cost multicommodity flow through a network that meets the demand for each commodity, subject to arc capacity restrictions and flow conservation at the transshipment nodes of the network. Other constraints are sometimes added to the models of Section 2. One common example is to restrict individual commodity flows on the arcs (see for example [47]).

The multicommodity flow problems of Section 2 may be solved by mathematical programming tools. However, in real life, their very large scale makes a straightforward application of these tools computationally cumbersome. All the proposed methods take advantage of the structure of the problem in one way or another to be effective, and many of them are based on decomposition. The main motivation for decomposition is to reduce the problem to smaller subproblems, but other important motivations are present in multicommodity flow problems, namely to identify easier submodels such as linear or convex minimum cost flow problems or shortest path calculations, and to parallelize or distribute computations among commodities, arcs or paths. The latter motivation has renewed interest in decomposition methods since the eighties with the increased development of parallel and distributed architectures (see [56]). Approaches that have been effectively coded on parallel architectures are indicated in Table 1 by the symbol "//". The decomposition strategy relies essentially on a problem manipulation: dualization (see for example [24],[21]), distributed coupling [10], monotropic network programming [44].

Concerning the techniques, early approaches were based on classical mathematical programming algorithms which were adapted to the convex multicommodity flow problem (steepest descent, Newton methods, conjugate gradient methods). In particular, the most popular among existing algorithms, such as Flow Deviation [20],[38] and Projected Newton [4], fall into the category of feasible direction methods.
Authors & References | Model Type | Solution techniques | Subproblems | Decomposition | Applications
Fratta et al. [20], LeBlanc [38] | 3 | gradient method | shortest paths problems | by commodity | Telecom.
Cantor et al. [8] | 3 | simplicial decomp. | shortest paths problems | no | Telecom.
Schwartz et al. [57] | 3 | gradient projection | descent direction | no | Telecom.
Stern [61] | 4 | dual relaxation | update formulae | by node | Telecom.
LeBlanc et al. [39] | 3 | gradient, PARTAN's tech. | shortest paths problems | by commodity | Network equilibrium
Bertsekas et al. [5],[4] | 3 | projection methods | shortest paths problems | by commodity | Telecom.
Fukushima [21] | 3 | nondiff. optimization | shortest paths problems | no | Transport
Authie [3] | 3 | relaxation | nonlinear min. cost flow | by commodity // | Telecom.
Nagamochi [47] | 4 | dual relaxation | descent direction | by node |
Schultz et al. [56] | 2 | interior points | min. cost flow | by commodity // | Distribution
Pinar et al. [51] | 2 | penalty, simplicial decomp. | min. cost flow | by commodities // | Distribution
Eckstein et al. [15] | 1 | proximal | one-dim. minimization & nonlin. min. cost flow | distributed on arcs & simple commodities // | Transport
Chifflet et al. [10] | 3 | proximal | one-dim. minimization & quadrat. min. cost flow | distributed on arcs & simple commodities | Telecom.
Goffin et al. [24] | 1 | cutting plane, interior point | one-dim. minimization & shortest paths prob. | distributed on arcs & simple commodities |
Mahey et al. [44] | 3 | proximal | one-dim. minimization & shortest paths prob. | distributed on arcs & simple commodities | Telecom.

Table 1: A summary of some approaches for convex multicommodity flow problems
Linear or piecewise-linear approximation of the nonlinear function (see [35]) and modification of the objective function [38],[6] were also proposed. On the other hand, some attempts have been made to solve the problem from a dual or primal-dual point of view. This is the case for methods that use proximal techniques [10],[15],[44], dual relaxation [3],[61],[47], cutting planes [24], and subgradient methods [21]. For the underlying applications, we have retained only those that were originally treated in the papers; we do not mention cases where the algorithms were validated on randomly generated problems. Other references on generalized transportation problems are not cited here (see [50]).
4 Solution techniques

As already mentioned, the solution techniques for nonlinear multicommodity flow problems were initially special cases of classical methods for general nonlinear programming. More recent techniques, such as proximal and interior point methods, appeared later. In this section, we describe a wide selection of existing algorithms, with emphasis on four algorithms which are tested in Section 5. We make no attempt to be exhaustive and give references where the interested reader can find further details and convergence results.
4.1 The Flow Deviation method (FD)

The Flow Deviation method is a primal method that was simultaneously proposed for the message routing problem by LeBlanc [38] and by Fratta, Gerla and Kleinrock [20]. It is a special case of the so-called Frank-Wolfe method [66] for solving nonlinear optimization problems with linear constraints. The direction-finding subproblem is solved by computing shortest paths between each pair (s_k, t_k) and loading the required amount of flow onto the corresponding path. The link costs used when finding cheapest routes are the partial derivatives of the objective function evaluated at the current solution. To deal with the capacity constraints, the delay function f_j is replaced by

  \bar f_j(x_{0j}) = x_{0j} / (c_j - x_{0j})                   if x_{0j} \in [0, \Lambda c_j]
  \bar f_j(x_{0j}) = \varphi(x_{0j})  (quadratic or linear)    if x_{0j} > \Lambda c_j

The quadratic (or linear) function \varphi is chosen so that f_j and \varphi have equal values and first two derivatives (first derivative if \varphi is linear) at the point x_{0j} = \Lambda c_j (in general, \Lambda = 0.99). More specifically, the entire algorithm consists of the four following steps:
Figure 1: Modified objective function in the Flow Deviation method.

1. (Initialization) Find a feasible solution x^0, choose a tolerance parameter \epsilon > 0 and set t = 0.
2. (Subproblem) Set y^t = 0. For each commodity k, find a shortest path between s_k and t_k (with link costs \bar f_j'(x^t_{0j})) and load the corresponding amount r_k onto this path (y^t_k is thus obtained); that is,
     y^t_{0j} := y^t_{0j} + r_k  if arc j belongs to that path,  and  y^t_{0j} is unchanged otherwise.
   (Step-length search) Determine \alpha_t = \arg\min \{ \bar f(x^t + \alpha (y^t - x^t)) : \alpha \in [0, 1] \}.
3. (Flow deviation) x^{t+1} = (1 - \alpha_t) x^t + \alpha_t y^t.
4. (Termination criterion) Stop the procedure if \bar f(x^{t+1}) is within \epsilon of the largest lower bound found at any iteration; otherwise set t := t + 1 and go to Step 2.

If we need to compute the complete routes taken by each commodity, a simple storing of the individual paths can be included in the procedure. The characteristic property of the method is that flow is shifted from the nonshortest paths in equal proportions. This property distinguishes the Flow Deviation method from the method discussed in the next section. When the optimal total link flows are the only quantities of interest, the Flow Deviation method requires a small amount of storage in its implementation and makes it possible to solve very large network problems. Some variants of the method have been proposed in the literature [40],[18].
We mention finally that an algorithm for finding a feasible starting multicommodity flow can be found in [36].

Schultz and Meyer [56] use a logarithmic barrier function to treat the coupling constraints. The barrier problem is then solved by the Frank-Wolfe method which, as we have already noted, takes advantage of the constraint structure. The search direction is a bit different here: the block structure of the problem allows a multidimensional search, that is, a K-dimensional optimization problem with simple bounds is solved to coordinate the subproblem solutions.
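For illustration, here is a minimal Python sketch of the Flow Deviation (Frank-Wolfe) iteration on the Kleinrock delay. It is our own toy example, not the authors' code: the network, capacities and demand are invented, shortest paths are computed with a plain Bellman-Ford routine, and the step-length search simply keeps the flow strictly below capacity instead of using the quadratic extension \bar f_j described above.

```python
import numpy as np

# Invented test network: arcs as (tail, head), capacities, one commodity (s, t, r).
arcs = [(0, 1), (1, 3), (0, 2), (2, 3), (1, 2)]
cap = np.array([10.0, 10.0, 10.0, 10.0, 5.0])
commodities = [(0, 3, 6.0)]        # (s_k, t_k, r_k)
num_nodes, n = 4, len(arcs)

def delay(x0):                      # Kleinrock objective (5)
    return float(np.sum(x0 / (cap - x0)))

def delay_grad(x0):                 # f_j'(x_{0j}) = c_j / (c_j - x_{0j})^2
    return cap / (cap - x0) ** 2

def shortest_path_arcs(costs, s, t):
    """Plain Bellman-Ford; returns the arc indices of a cheapest s-t path."""
    dist = [np.inf] * num_nodes
    pred = {}
    dist[s] = 0.0
    for _ in range(num_nodes - 1):
        for j, (u, v) in enumerate(arcs):
            if dist[u] + costs[j] < dist[v]:
                dist[v], pred[v] = dist[u] + costs[j], j
    path, v = [], t
    while v != s:
        j = pred[v]
        path.append(j)
        v = arcs[j][0]
    return path

# Step 1: a feasible starting flow, here everything on unit-cost shortest paths.
x = np.zeros(n)
for s, t, r in commodities:
    for j in shortest_path_arcs(np.ones(n), s, t):
        x[j] += r

for it in range(50):                # Flow Deviation (Frank-Wolfe) iterations
    # Step 2: extreme flow y loads each demand on a derivative-shortest path.
    g = delay_grad(x)
    y = np.zeros(n)
    for s, t, r in commodities:
        for j in shortest_path_arcs(g, s, t):
            y[j] += r
    # Step-length search on [0, 1], staying strictly below capacity.
    d = y - x
    alphas = np.linspace(0.0, 1.0, 201)
    vals = [delay(x + a * d) if np.all(x + a * d < cap) else np.inf for a in alphas]
    alpha = alphas[int(np.argmin(vals))]
    # Step 3: flow deviation.
    x = x + alpha * d

print("total arc flows:", x, "objective:", delay(x))
```

With several commodities the same loop applies; only the number of shortest-path computations per iteration grows.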
4.2 Projection Method (PM)

The Flow Deviation method tends to keep all generated path flows strictly positive. This behavior is improved in the so-called projection methods [6], in which the flow on a nonshortest path decreases proportionally to the difference between its first-derivative length and that of the shortest path. If such a decrease results in a negative flow, the path flow is simply set to zero. The capacity constraints are treated as in the Flow Deviation method. At each iteration, the routing problem for the purpose of the next iteration is converted to the minimization of a function over the positive orthant. For each commodity k, let path_{k\bar p} be a shortest path between s_k and t_k with link costs f_j'(x_{0j}). From

  \sum_{p=1}^{N_k} x_{kp} = r_k

we have

  x_{k\bar p} = r_k - \sum_{p \ne \bar p} x_{kp}      (11)

which is substituted in the objective function, yielding the problem

  \min \tilde f(\tilde x)
  s.t. \tilde x_{kp} \ge 0  for all k, p \ne \bar p      (12)

where \tilde x_{kp} = x_{kp} for all k, p \ne \bar p, and \tilde f is the objective after the substitution (11). This problem is solved by a projection method [6] which yields

  x_{kp}^{t+1} = \max\{ 0, \; x_{kp}^t - \gamma_t H_{kp}^{-1} (d_{kp} - d_{k\bar p}) \}      (13)

where

  d_{kp} = \sum_{j \in path_{kp}} f_j'(x_{0j}),    d_{k\bar p} = \sum_{j \in path_{k\bar p}} f_j'(x_{0j}),
  H_{kp} = \sum_{j \in L_{kp}} f_j''(x_{0j}),      L_{kp} = path_{kp} \cup path_{k\bar p} - path_{kp} \cap path_{k\bar p}.
To summarize, after an initialization procedure, the following steps are executed sequentially:

1. Compute a shortest path joining (s_k, t_k) for each commodity k, with length f_j'(x^t_{0j}) on link j, where x^t is the current solution. The shortest paths are added to the corresponding lists of active paths for each k, if they are not already in them.
2. The path flows are updated using (13). The shortest path flows are then adjusted according to (11).

For the choice of the step length \gamma_t, see [5]. Note that all the nonshortest path flows that are zero will stay at zero. Hence, the path flow computations only need to be made for paths that carry positive flow. Another projection method has been proposed by M. Schwartz and C. Cheung [57] for the message routing problem. The projection operator in this approach incorporates the constraint equations. As the authors pointed out, this algorithm is well suited to networks with a small number of commodities.
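To illustrate the update (13), the following Python sketch (our own toy example; the arcs, capacities, paths and the fixed step size gamma are all invented, whereas [5] chooses the step size more carefully) applies one projection step to a single commodity with two known paths under the Kleinrock delay.

```python
import numpy as np

cap = np.array([10.0, 10.0, 8.0])      # three arcs of an invented network
# One commodity with r = 6 routed on two known paths, given as sets of arc indices.
paths = [{0, 1}, {2}]                   # path 0 uses arcs 0 and 1, path 1 uses arc 2
x_path = np.array([2.0, 4.0])           # current path flows (they sum to r)
r = 6.0

def total_flows(x_path):
    x0 = np.zeros(len(cap))
    for p, arcs_p in enumerate(paths):
        for j in arcs_p:
            x0[j] += x_path[p]
    return x0

def f1(x0):                             # f_j'(x_{0j}) for the Kleinrock delay
    return cap / (cap - x0) ** 2

def f2(x0):                             # f_j''(x_{0j})
    return 2.0 * cap / (cap - x0) ** 3

x0 = total_flows(x_path)
lengths = np.array([f1(x0)[list(p)].sum() for p in paths])   # first-derivative lengths d_kp
p_bar = int(np.argmin(lengths))                              # index of the shortest path

gamma = 1.0                                                  # fixed step size (a simplification)
for p in range(len(paths)):
    if p == p_bar:
        continue
    L_kp = paths[p] ^ paths[p_bar]                           # symmetric difference of arc sets
    H = f2(x0)[list(L_kp)].sum()
    # update (13): shift flow off the nonshortest path, projected onto x >= 0
    x_path[p] = max(0.0, x_path[p] - gamma / H * (lengths[p] - lengths[p_bar]))

# re-adjust the shortest path flow so that the demand r is still met, cf. (11)
x_path[p_bar] = r - sum(x_path[p] for p in range(len(paths)) if p != p_bar)
print("path flows:", x_path, "total arc flows:", total_flows(x_path))
```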
4.3 Decomposition and cutting plane algorithms

When formulated as a mathematical program, the nonlinear multicommodity flow problem has a constraint matrix with a primal block-angular structure. Decomposition algorithms strive to exploit this structure. First, using Lagrangian duality, the problem is transformed into a nondifferentiable problem of much smaller size. Next, some specialized algorithm for nondifferentiable optimization is used to solve the transformed problem.
4.3.1 Decomposition principle

Consider the message routing problem formulation given in Section 2 with f_{kj}(\cdot) \equiv 0 and the constraints (1)-(4). Since the objective function is strictly increasing in x_{0j}, the coupling equality constraints (1) can be replaced by \sum_{k=1}^{K} x_{kj} \le x_{0j}. Associating with the coupling constraints (1) the dual variables u \ge 0, we construct the partial Lagrangian:

  L(x, u) = \sum_{j=1}^{n} f_j(x_{0j}) + \sum_{j=1}^{n} u_j \Big( \sum_{k=1}^{K} x_{kj} - x_{0j} \Big).      (14)
We associate with it the dual problem

  \max_{u \ge 0} L(u),      (15)

where

  L(u) = \min \{ L(x, u) : x satisfies (2)-(4) \}.      (16)

Problem (16) is separable into n nonlinear one-dimensional subproblems

  L_{0j}(u) = \min \{ f_j(x_{0j}) - u_j x_{0j} : 0 \le x_{0j} \le c_j \},      (17)

and K shortest path problems with arc costs u_j and optimal values L_{1k}(u). Both types of problems are easily solved. In particular, the solution of problem (17) is characterized by the equation f_j'(x_{0j}) = u_j, and a closed-form solution for the delay function (5) can be readily computed. From duality theory, the optimal value of problem (15) is equal to the optimal value of the primal problem.
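For instance, with the Kleinrock delay (5), the first-order condition for (17) can be solved explicitly; the short derivation below is ours, added only to illustrate the closed form alluded to above:

  f_j'(x_{0j}) = \frac{c_j}{(c_j - x_{0j})^2} = u_j
  \quad \Longrightarrow \quad
  x_{0j} = c_j - \sqrt{c_j / u_j},

so that, after projection onto [0, c_j),

  x_{0j}(u_j) = \max\{ 0, \; c_j - \sqrt{c_j / u_j} \}, \qquad u_j > 0,

and x_{0j}(u_j) = 0 whenever u_j \le f_j'(0) = 1/c_j, i.e. when the dual price on arc j does not exceed the marginal delay at zero flow.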
4.3.2 Decomposition methods

There exist many different strategies to solve the nondifferentiable problem (15). Let us mention a few of them that have been applied in the context of multicommodity flow problems: the bundle method [41, 65] used by [45], the standard cutting plane method [9, 34, 12] used by, e.g., [33], and the analytic center cutting plane method [25], hereafter named ACCPM. We shall now focus on ACCPM, which has been extensively tested on very large problems [24, 28]. (For a comparative study of the last two methods, see [13].)

Cutting plane methods are based on increasingly refined polyhedral approximations of the hypograph of the concave function L(u). Let us give a very general description of those methods. Let \{u^t\}_{t=1}^{T} be a sequence of query points. At each point, one computes L(u^t) and a subgradient \xi^t \in \partial L(u^t). As mentioned in the previous subsection, the subgradients are direct byproducts of the optimization in (17) and the shortest path computations. Since L(u) is concave, the inequality

  L(u) \le L(u^t) + (\xi^t)^T (u - u^t)

is valid for all u. Therefore, the following program

  \max \{ z : z \le L(u^t) + (\xi^t)^T (u - u^t), \; t = 1, \ldots, T \}      (18)

is a polyhedral relaxation of (15). The standard cutting plane method [9, 34, 12] defines the next query point u^{T+1} as the maximizer of (18). Let us point out that the constraints of the dual of (18) can be interpreted as a convex combination of the matrix columns,
i.e., of the shortest paths. One can therefore easily reconstruct a primal solution, which, however, may not satisfy the coupling constraints. The analytic center cutting plane method replaces (18) with the so-called localization set. Let \theta_T = \max_t \{ L(u^t) \} be the best lower bound for \max L(u). The localization set is

  H_T = \{ (u, z) : z \ge \theta_T, \; z \le L(u^t) + (\xi^t)^T (u - u^t), \; t = 1, \ldots, T, \; 0 \le u_j \le U, \; j = 1, \ldots, n \}.      (19)

The box constraints on u ensure that the set is compact. (The constant U is arbitrary; it must be such that the optimal solution makes the box constraints inactive, u_j < U.) The localization set always contains the optimal solution. ACCPM picks the central point that maximizes the product of the slacks of all the cutting planes and box constraints. This strategy regularizes the method and achieves great robustness. One issue is the ability of the method to compute new analytic centers after adding many cuts (which might be as many as the number of commodities plus the number of arcs). Although the method requires only a few Newton iterations to recompute an analytic center, each iteration remains computationally costly. For a detailed description of the method in the context of nonlinear multicommodity flow problems, we refer to [24]. Let us point out again that the computation of the analytic center of (19) produces dual variables that have the same interpretation as in the case of the standard cutting plane method. This information is very valuable, as it yields a natural upper bound on the optimal value, and thus an upper estimate of the distance of the best recorded solution to the optimum.
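As a concrete illustration of the master problem (18), here is a small Python sketch of the standard (Kelley) cutting plane loop. It is our own illustration: the oracle below is an invented smooth concave stand-in for the Lagrangian dual L(u) of Section 4.3.1, the bound U and the tolerance are arbitrary, and ACCPM itself would replace the LP step by an analytic-center computation over the localization set (19).

```python
import numpy as np
from scipy.optimize import linprog

n, U = 3, 10.0                       # number of dual variables u_j and box bound U

def oracle(u):
    """Black-box oracle returning L(u) and a subgradient xi.  Here it is an
    invented smooth concave stand-in for the Lagrangian dual of Section 4.3.1."""
    L = 5.0 - np.sum((u - 1.0) ** 2)
    xi = -2.0 * (u - 1.0)
    return L, xi

u = np.zeros(n)                      # initial query point
cuts = []                            # stored cuts (u_t, L_t, xi_t)
for it in range(20):
    L, xi = oracle(u)
    cuts.append((u.copy(), L, xi))
    # Master (18): max z  s.t.  z <= L_t + xi_t^T (u - u_t),  0 <= u <= U.
    # linprog minimizes, so we minimize -z over the variables (u_1, ..., u_n, z).
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub, b_ub = [], []
    for u_t, L_t, xi_t in cuts:
        row = np.zeros(n + 1)
        row[:n] = -xi_t              # rewrite the cut as  z - xi^T u <= L_t - xi^T u_t
        row[-1] = 1.0
        A_ub.append(row)
        b_ub.append(L_t - xi_t @ u_t)
    bounds = [(0.0, U)] * n + [(None, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    u, z_upper = res.x[:n], res.x[-1]
    best_lower = max(value for _, value, _ in cuts)
    if z_upper - best_lower < 1e-6:  # gap between the relaxation and the best value
        break

print("approximate maximizer:", u, "value:", oracle(u)[0])
```

Note how the box 0 <= u <= U keeps the first master problems bounded, exactly the role the text assigns to the box constraints in (19).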
4.4 The Proximal Decomposition Method (PDM)

This algorithm is a specialized version of the Partial Inverse method designed by Spingarn [59] for the decomposition of convex separable problems. It was initially designed to solve a generic convex constrained program: minimize a convex lower-semicontinuous function F on a closed subspace A. If A^\perp denotes the subspace orthogonal to A, an optimal primal-dual pair (x, y) must lie in the Cartesian product A \times A^\perp. The algorithm performs two distinct steps at each iteration: a proximal step, which regularizes the objective function by adding a quadratic term depending on the previous primal-dual pair of solutions, and a projection step onto the corresponding subspaces. In [43], theoretical results show how an optimal scaling parameter can be chosen to accelerate convergence, basically when F is strongly convex with a Lipschitz-continuous gradient.
Many distinct strategies are possible to put the multicommodity flow problem in the generic form. In [44], the arc-path formulation of the multicommodity flow problem was considered and a coupling subspace A was proposed which includes equations (6) and (7). As the set of paths between s_k and t_k is not known a priori, the authors substitute for it, at each iteration t = 0, 1, ..., a subset which contains the previously generated paths. The proximal step consists of one-dimensional convex subproblems for each arc to find the aggregate flows x_{0j}^{t+1}. Then, new paths are generated by shortest path calculations with link costs f_j(x_{0j}^{t+1}), followed by a distributed update of the path flows and potentials. The whole algorithm is presented below with the following notation: N_k denotes the number of paths corresponding to commodity k at iteration t, d(j) denotes the number of paths sharing arc j, and the residuals (violations of constraints (6) and (7)) for a vector x = (x_k, k = 0, ..., K) are denoted by

  r_j(x) = \sum_{k} \sum_{p} \delta_{kp}(j) x_{kp} - x_{0j}   and   r_k^0(x) = r_k - \sum_{p} x_{kp}.

1. Choose the convergence parameters \epsilon_1, \epsilon_2, \lambda > 0. Set the iteration index t = 0. The initial vectors x^0, y^0, Y^0 may be chosen arbitrarily.
2. For each arc j compute

     x_{0j}^{t+1} = \arg\min_{0 \le x_{0j}} \Big\{ f_j(x_{0j}) - y_j^t x_{0j} + \frac{\lambda}{2} \Big[ (x_{0j})^2 - 2 \Big( x_{0j}^t + \frac{r_j(x^t)}{d(j)} \Big) x_{0j} \Big] \Big\}
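As a hedged illustration of the one-dimensional subproblem in Step 2, the following Python sketch solves it numerically for a single arc with the Kleinrock delay. The values of c_j, \lambda, y_j^t, x_{0j}^t, r_j(x^t) and d(j) are invented, and a bounded scalar minimizer stands in for the closed-form solution one would use in practice.

```python
from scipy.optimize import minimize_scalar

# Invented data for a single arc j.
c_j   = 10.0     # capacity c_j
lam   = 1.0      # proximal scaling parameter lambda
y_j   = 0.3      # current dual (price) variable y_j^t
x_old = 4.0      # previous aggregate flow x_{0j}^t
res_j = 0.5      # residual r_j(x^t) on arc j
d_j   = 2        # number of paths currently sharing arc j

def prox_objective(x):
    """f_j(x) - y_j x + (lam/2) * (x^2 - 2 (x_old + res_j/d_j) x), with Kleinrock f_j."""
    return x / (c_j - x) - y_j * x + 0.5 * lam * (x ** 2 - 2.0 * (x_old + res_j / d_j) * x)

# Minimize over [0, c_j), keeping strictly away from the capacity.
sol = minimize_scalar(prox_objective, bounds=(0.0, 0.999 * c_j), method="bounded")
print("new aggregate flow on arc j:", sol.x)
```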