COMPUTATIONAL TESTS OF A NONLINEAR MULTICOMMODITY NETWORK FLOW CODE
WITH LINEAR SIDE CONSTRAINTS THROUGH PRIMAL PARTITIONING

Jordi Castro & Narcís Nabona
Statistics & Operations Research Dept., UPC

September 1994. DR 94/05
Abstract: This work is an extension of a previous one [8], where a specialized code for solving the multicommodity network flow problem with a linear objective function and linear side constraints was presented. The new code described here is able to minimize a nonlinear objective function, which generally requires longer executions than the linear case. The code implements a specialization for the multicommodity network problem of Murtagh & Saunders' strategy of dividing the set of variables into basic, nonbasic and superbasic. All the computations with the basis matrix are reduced to simpler operations with a reduced working matrix, following a primal partitioning technique. An ad hoc update of this working matrix has been developed, which improves the performance of the method considerably. The optimization process consists of three phases (instead of the usual two of the simplex method): the first merely solves a single network flow problem for each commodity, the second attempts to obtain a feasible point, and the last reaches the optimizer of the nonlinear function. This methodology has proved to be very powerful, especially in cases where the number of active linking multicommodity constraints is small. Several tests are reported, using artificial problems and real problems arising from the fields of long and short-term hydro-thermal scheduling of electricity generation, electrical network expansion planning and traffic assignment, with up to ≈17,000 variables and ≈5,000 constraints.

Keywords: Multicommodity Network Flows, Network Simplex Methods, Nonlinear Optimization, Primal Partitioning, Side Constraints, Hydro-Thermal Scheduling, Electricity Generation, Traffic Equilibrium, Network Expansion.

1. Introduction.

Although primal partitioning has been reported for quite some time to be an appropriate technique for solving the linear multicommodity network flow problem, and its algorithm has been described in detail [17], no work had been done adapting this methodology to the nonlinear objective function case. This work aims to fill this void by describing the computational results obtained with an efficient implementation considering different nonlinear objective functions. It is an extension of a previous work by the authors [8], where a code for the linear case and its computational results on a set of tests were presented.

The nonlinear multicommodity network flow problem (which will be referred to as the NMP problem) can be cast as:
    min_{X_1,...,X_K}  h(X_1, X_2, ..., X_K)                          (1)

    subj. to  A X_k = R_k ,          k = 1 ÷ K                        (2)

              0 ≤ X_k ≤ X̄_k ,        k = 1 ÷ K                        (3)

              Σ_{k=1}^{K} X_k ≤ T                                     (4)
where X_k ∈ IR^n (n: number of arcs) is the flow array for each commodity k (k = 1 ÷ K), K being the number of commodities of the problem, and h : IR^{K×n} → IR a real-valued function. A ∈ IR^{m×n} (m: number of nodes) is the arc–node incidence matrix. Constraints (3) are simple bounds on the flows, X̄_k ∈ IR^n, k = 1 ÷ K, being the upper bounds. Equation (4) represents the mutual capacity constraints, where T ∈ IR^n. In this work the original NMP problem has been extended to include linear side constraints defined by:

    L ≤ Σ_{k=1}^{K} L_k X_k ≤ U                                       (5)
where L_k ∈ IR^{p×n}, k = 1 ÷ K, and L, U ∈ IR^p (p: number of side constraints). These side constraints can link arcs of the same or of different commodities. Therefore, the final formulation of the problem considered can be stated as follows:

    min_{X_1,...,X_K}  (1) ;   subj. to (2–5)                         (6)
and will be referred to as the NMPC problem. The code presented here (referred to as the PPRN code [6] in the rest of the document) can be viewed as a general-purpose code for solving the NMPC problem. The NMPC problem was first approached using price-directive decomposition [23], but this procedure does not seem to be as computationally efficient as primal partitioning [7]. If the set of side constraints (5) is empty, the PPRN code simply solves a nonlinear multicommodity network flow problem (NMP). If the number of commodities is equal to one, it works as a specialized nonlinear network flow code with side constraints, as described in [14,17,18]. Even in the latter case the PPRN code can be more efficient than a plain network flow code with side constraints, because it considers a variable-dimension working matrix instead of a fixed one (as will be shown in later sections). This can improve the performance of the algorithm when the number of active side constraints at the optimum is small with respect to the whole set of side constraints. When the objective function (1) is linear, the PPRN code implements a specialized simplex algorithm based on a primal partitioning of the basis, as described in [8].

The rest of the document is subdivided into six main sections. Firstly, a brief review of the literature on this subject for the linear case is made. Secondly, the primal partitioning methodology is revised. The next section presents the specific implementation of the code developed. The updating process of the working matrix, which is instrumental in the performance of the algorithm, is described in the following section. Once the main features of the code have been reported, the test problems that have been employed are presented. These problems are either artificial problems or real problems arising mainly from long and short-term hydro-thermal scheduling of electricity generation, electrical network expansion planning and traffic assignment. The last main section shows the computational results obtained with each test problem previously detailed. Comparisons of computational performance with a general-purpose nonlinear optimization code are included.

2. Linear Multicommodity Network Flows.

If the objective function (1) is linear we will refer to the nonlinear multicommodity problem (NMP) merely as the MP problem. To solve the MP problem by exploiting the network structure, various techniques have been described in the literature. Some of them deal with the mutual capacity constraints (4) in an exact fashion, whereas others replace them by a Lagrangian relaxation in the objective function. The price-directive decomposition, resource-directive decomposition and primal partitioning methods belong to the first class. A complete description of these three methodologies can be found in [17]. The Lagrangian relaxation technique does not guarantee finding the optimal flows, but it can achieve good approximations simply by solving the decoupled single network flow problems obtained by relaxing constraints (4). This methodology is briefly described in [1]. A first comparison of the above multicommodity network techniques (except the Lagrangian relaxation) was made in [2]; in that work primal partitioning and price-directive decomposition appeared to be the best methods. Several codes have been developed to solve the MP problem, although up to now none has become standard.
Recently, interior point methods have provided another approach to solving the MP problem. These methods appear to be really efficient when the size of the network is very great (see [16] for a description of a code using this methodology). In [8] the authors already presented a code for solving the MP problem with additional side constraints. That code and the PPRN code described here mainly follow the ideas underlying the primal partitioning method. Some aspects, especially those related to the management of the working matrix, are the same in both codes, and have been upgraded with respect to the original formulation of the method described in [17], substantially improving the performance of the algorithm. The primal partitioning technique and the special management of the working matrix are revised in later sections.
3. The primal partitioning method.

In this section a brief description of the primal partitioning method is presented, paying special attention to the changes brought about by considering the additional side constraints (5). See [17] for a comprehensive description.

3.1. Structure of the problem.

Given that constraints (2), (4) and (5) in (6) are linear, it is possible to consider the problem constraint matrix A. Each variable j_k (that is, each flow j of the k-th commodity) has an associated column A^{j_k} in A, with the following nonzero components:

                           s         t               j
                           ↓         ↓               ↓
    (A^{j_k})^t = ( ... 1 ... −1 ...  |  ... 1 ...  |  a_{j1} ... a_{jp} )^t
                    \_______________/   \_________/   \________________/
                         Network       Mutual Capacity  Side Constraints
where s and t identify the source and target nodes of arc j_k. It can be noticed that each variable appears in three clearly different types of constraint: network, mutual capacity and side constraints. Network constraints are always active (they are equality constraints), whereas mutual capacity and side constraints may not be (as they are inequality constraints). Every basis in the primal partitioning method can be decomposed as follows:

        ( L_1   R_1   0  )
    B = ( L_2   R_2   0  )                                            (7)
        ( L_3   R_3   1l )
L_1, R_2 and 1l being square matrices, where:

• L_1 refers to the network constraints and the arcs of the K spanning trees. The topology of this matrix is:

          ( B_1                )
    L_1 = (      B_2           )
          (           ...      )
          (               B_K  )
each B_k being a nonsingular matrix associated with the k-th spanning tree. L_1 can be represented at every iteration by K spanning trees, following the methodology described in [4,12].
• R_1 refers to the network constraints and the complementary arcs of the K commodities. Complementary arcs do not belong to any spanning tree and are required to preserve the nonsingularity of the basis.
• L_2 refers to the active mutual capacity and side constraints, for the arcs of the spanning trees.
• R_2 refers to the active mutual capacity and side constraints, for the complementary arcs.
• L_3 refers to the inactive mutual capacity and side constraints, for the arcs of the spanning trees.
• R_3 refers to the inactive mutual capacity and side constraints, for the complementary arcs.
• 1l, an identity matrix, refers to the slacks of the inactive mutual capacity and side constraints. (It should be noticed that constraints whose slacks are in matrix 1l are treated as inactive constraints, even though the slack values may be zero.)

3.2. Motivation for using a working matrix.

During the optimization process the systems Bx = b and x^t B = b^t must be solved at each iteration, x and b being the variable and right-hand-side vectors respectively. A description of the solution technique can be found in [17] and is briefly outlined here. Considering for x and b a partition such as the one employed above for the basis B, the next subsections show how to obtain the solution of both systems efficiently.
3.2.1. Computing Bx = b.

This system can be written as:

    ( L_1   R_1   0  ) ( x_1 )   ( b_1 )
    ( L_2   R_2   0  ) ( x_2 ) = ( b_2 )
    ( L_3   R_3   1l ) ( x_3 )   ( b_3 )
By block multiplication we obtain:

    L_1 x_1 + R_1 x_2 = b_1                                           (8)
    L_2 x_1 + R_2 x_2 = b_2                                           (9)
    L_3 x_1 + R_3 x_2 + x_3 = b_3                                     (10)
Isolating x_1 from (8) we obtain x_1 = L_1^{-1} b_1 − L_1^{-1} R_1 x_2. Substituting this expression into (9) yields L_2 L_1^{-1} b_1 − L_2 L_1^{-1} R_1 x_2 + R_2 x_2 = b_2. Now x_2 is obtained directly from this equation:

    x_2 = (R_2 − L_2 L_1^{-1} R_1)^{-1} (b_2 − L_2 L_1^{-1} b_1)      (11)

Once x_2 is known, x_1 is computed directly:

    x_1 = L_1^{-1} b_1 − L_1^{-1} R_1 x_2                             (12)

And finally, with x_1 and x_2, x_3 is computed as:

    x_3 = b_3 − L_3 x_1 − R_3 x_2                                     (13)

Thus by solving (11), (12) and (13) consecutively we obtain the solution of the original system.
3.2.2. Computing x^t B = b^t.

This system can be written as:

                      ( L_1   R_1   0  )
    ( x_1  x_2  x_3 ) ( L_2   R_2   0  ) = ( b_1  b_2  b_3 )
                      ( L_3   R_3   1l )

By block multiplication we obtain:

    x_1 L_1 + x_2 L_2 + x_3 L_3 = b_1                                 (14)
    x_1 R_1 + x_2 R_2 + x_3 R_3 = b_2                                 (15)
    x_3 = b_3                                                         (16)
The value of x_3 is found directly from (16). Isolating x_1 from (14) we obtain:

    x_1 = (b_1 − x_3 L_3 − x_2 L_2) L_1^{-1}                          (17)
Using (17) and (16) in (15) we get:

    (b_1 − b_3 L_3 − x_2 L_2) L_1^{-1} R_1 + x_2 R_2 + b_3 R_3 = b_2
    (b_1 − b_3 L_3) L_1^{-1} R_1 + x_2 (R_2 − L_2 L_1^{-1} R_1) = b_2 − b_3 R_3
    x_2 = ( (b_2 − b_3 R_3) − (b_1 − b_3 L_3) L_1^{-1} R_1 ) (R_2 − L_2 L_1^{-1} R_1)^{-1}    (18)
Thus by solving (16), (18) and (17) we obtain the solution of the original system.

As shown in the two previous subsections, to solve the systems Bx = b and x^t B = b^t it suffices to invert the submatrix L_1 and a matrix whose expression is R_2 − L_2 L_1^{-1} R_1. This last matrix will be referred to as the working matrix, and denoted by Q. It must be noticed that inverting L_1 does not involve too much work, given that L_1 is a block diagonal matrix where each block represents a spanning tree. This kind of system can be solved by simply exploiting the tree structure of the matrix, and highly efficient procedures have been developed [12]. Therefore, the problem of solving both systems of equations is reduced to factorizing the working matrix Q instead of the basis B, and having a procedure to update this factorization at each iteration. Since the dimension of the working matrix is small compared with the whole dimension of the basis B, it can be expected that the computation time of an algorithm using this primal partitioning will likewise be small compared with that of a general-purpose linear optimization package. On the other hand, the dimension of the basis B is fixed during the optimization process, whereas the dimension of Q is variable, given that it depends on the number of active mutual capacity and side constraints. This implies that the updating process of the Q factorization must be able to deal with variable dimensions, increasing the difficulty of the algorithm (as will be shown in later sections).

3.3. Computing the working matrix Q.

Some new concepts must first be defined in order to use them in an efficient procedure for computing Q:
• A_sc: set of active side constraints at the current iteration.
• A_mc: set of active mutual capacity constraints at the current iteration.
• A: set of active constraints (mutual capacity and side constraints), that is, A = A_sc ∪ A_mc.
• |C|: number of elements of set C.
• dim(M): dimension of matrix M.

Given that R_2 and L_2 are associated with the active mutual capacity and side constraints, they can be subdivided into two submatrices as follows:

    R_2 = ( R_2mc )        L_2 = ( L_2mc )
          ( R_2sc )              ( L_2sc )

where R_2mc and L_2mc refer to constraints belonging to A_mc, and R_2sc and L_2sc refer to constraints of the set A_sc. Since Q = R_2 − L_2 L_1^{-1} R_1, it can also be considered as subdivided into two submatrices thus:

    Q = ( Q_mc ) = ( R_2mc ) − ( L_2mc ) L_1^{-1} R_1
        ( Q_sc )   ( R_2sc )   ( L_2sc )

whose dimensions are dim(Q_mc) = |A_mc| × |A| and dim(Q_sc) = |A_sc| × |A|.

The expression for computing Q involves the calculation of L_1^{-1} R_1. Since L_1 is a block diagonal matrix where the k-th block is a minimum spanning tree for the k-th commodity, and R_1 expresses for each complementary arc of the k-th commodity its connection to the k-th minimum spanning tree, solving L_1^{-1} R_1 is equivalent to having the paths (denoted by P_j, j = 1 ÷ |A|) of the complementary arcs in their associated spanning trees. Given an arc a ∈ P_j, we will say that a has normal orientation if it points to the source node of the complementary arc j; otherwise, it has reverse orientation. If we denote by:
• a_j the arc associated with the j-th column of Q, j = 1 ÷ |A|.
• mc_i the mutual capacity constraint of the i-th row of Q, i = 1 ÷ |A_mc| (this capacity constraint refers to the saturated arc mc_i).
• sc_i the side constraint of the i-th row of Q, i = |A_mc| + 1 ÷ |A|.
• B(a, n) a logical function which is true if the arc a appears in the side constraint n, and false otherwise.
• c_{a,n} the coefficient of the arc a in the side constraint n.

Then we can compute the matrix Q directly as follows.

Submatrix Q_mc, for i = 1 ÷ |A_mc| and j = 1 ÷ dim(Q):

           ( +1,  if a_j = mc_i
    Q_ij = ( +1,  if mc_i ∈ P_j with normal orientation
           ( −1,  if mc_i ∈ P_j with reverse orientation
           (  0,  otherwise
Submatrix Q_sc, for i = |A_mc| + 1 ÷ dim(Q) and j = 1 ÷ dim(Q), is computed following the next 4 steps:

    1) Set Q_ij = 0.
    2) If B(a_j, sc_i) then Q_ij = c_{a_j, sc_i}.
    Then, for each a ∈ P_j, perform the next 2 steps:
    3) If B(a, sc_i) and a has normal orientation, then Q_ij = Q_ij + c_{a, sc_i}.
    4) If B(a, sc_i) and a has reverse orientation, then Q_ij = Q_ij − c_{a, sc_i}.

A full and more detailed description of the computation of Q can be found in [5].
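As an illustration of the rules above, the following sketch fills a dense Q from the complementary-arc paths. It is a plain Python rendering, not the PPRN data structures; the containers paths, orient and coef are simplifications invented for the example:

    import numpy as np

    def build_Q(act_mc, act_sc, arcs, paths, orient, coef):
        """Fill the dense working matrix Q row by row.

        act_mc : saturated arcs mc_i of the active mutual capacity constraints
        act_sc : identifiers sc_i of the active side constraints
        arcs   : arc a_j associated with the j-th column of Q
        paths  : paths[j] = arcs on the spanning-tree path P_j of column j
        orient : orient[(a, j)] = +1 (normal) or -1 (reverse) orientation
        coef   : coef.get((a, n), 0.0) = coefficient c_{a,n} of arc a in side
                 constraint n (the logical function B(a, n) is "(a, n) in coef")
        """
        dim = len(act_mc) + len(act_sc)
        Q = np.zeros((dim, len(arcs)))
        for j, aj in enumerate(arcs):
            for i, mci in enumerate(act_mc):          # rows of Q_mc
                if aj == mci:
                    Q[i, j] = 1.0
                elif mci in paths[j]:
                    Q[i, j] = orient[(mci, j)]        # +1 normal, -1 reverse
            for i, sci in enumerate(act_sc):          # rows of Q_sc
                r = len(act_mc) + i
                Q[r, j] = coef.get((aj, sci), 0.0)    # steps 1) and 2)
                for a in paths[j]:                    # steps 3) and 4)
                    Q[r, j] += orient[(a, j)] * coef.get((a, sci), 0.0)
        return Q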
4. Implementation of primal partitioning.

The implementation of the primal partitioning method developed in the PPRN code follows three stages, called phases 0, 1 and 2, instead of the two classical phases of the simplex method. Phases 0 and 1 attempt to obtain a feasible starting point, whereas phase 2 achieves the optimizer. However, although phases 0 and 1 work sequentially to find a feasible point, it can be said that primal partitioning is only applied in phases 1 and 2. The following subsections will clarify these ideas by describing each phase.

For computational purposes the inequality constraints (4) and (5) of the original NMPC problem are replaced by equality constraints by adding slacks, obtaining:

    Σ_{k=1}^{K} X_k + s = T ;          0 ≤ s                          (19)

    Σ_{k=1}^{K} L_k X_k + t = U ;      0 ≤ t ≤ U − L                  (20)
where s ∈ IR^n and t ∈ IR^p. In this formulation, equations (19) and (20) replace the original equations (4) and (5). The formulation of the problem considered by the algorithm (which will be referred to as NMPC2) is then:

    min_{X_1,...,X_K}  (1) ;   subj. to (2–3), (19–20)                (21)
It can be noticed that the current version of PPRN cannot deal with lower bounds other than zero on the variables.

4.1. Phase 0.

In phase 0 the algorithm considers only the network constraints and the bounds on the variables of the problem, without any constraint linking the flows of different commodities. It attempts to obtain, for each commodity k, k = 1 ÷ K, a feasible starting point for the linear network problem:

    min_{X_k}  C_k^t X_k
    subj. to   A X_k = R_k                                            (22)
               0 ≤ X_k ≤ X̄_k

This problem is solved by applying a specialization of the simplex algorithm for networks. The implementation developed mainly follows the ideas described in [12] with regard to the pivotal operations when managing the spanning trees. It is important to note that phase 0 has nothing to do with primal partitioning, as it only solves single network problems. The code developed can either merely obtain a feasible point for (22) or reach its optimal solution (the default option is to obtain a feasible point). When the number of active mutual capacity and side constraints |A| at the optimal solution of problem (21) is small, this point is not far from the point obtained by joining the solutions of the K single network problems (22). It then seems a good choice to obtain the optimal value when solving (22) for each commodity, because at the end of phase 0 the current point will be near the desired optimal solution. Otherwise, simply obtaining a feasible point for (22) will suffice.

It has been observed in phase 0 that maintaining and checking a degenerate pivot list to avoid degenerate steps is less efficient than allowing degenerate pivots. Thus the default option in the code developed is to permit degenerate steps.
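For illustration only — PPRN implements its own network simplex following [12] — one phase-0 subproblem (22) can be reproduced with NetworkX's network simplex on a toy instance (the graph and its data are invented for the example):

    import networkx as nx

    # Toy single-commodity instance of (22): node demands R_k, arc
    # capacities X̄_k and linear costs C_k (all data invented).
    G = nx.DiGraph()
    G.add_node("s", demand=-4)                 # supply node
    G.add_node("a", demand=0)
    G.add_node("t", demand=4)                  # demand node
    G.add_edge("s", "a", capacity=3, weight=1)
    G.add_edge("s", "t", capacity=2, weight=4)
    G.add_edge("a", "t", capacity=3, weight=1)

    cost, flow = nx.network_simplex(G)         # simplex specialized to networks
    print(cost)                                # 10
    print(flow)                                # {'s': {'a': 3, 't': 1}, 'a': {'t': 3}, 't': {}}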
4.2. Phase 1.

The K points obtained in phase 0 will not in general satisfy the mutual capacity and side constraints, thus giving rise to a pseudofeasible point. This implies that some slack variables s for the mutual capacity constraints or t for the side constraints will be out of bounds. Let X̂_k, k = 1 ÷ K, be the pseudofeasible point obtained; then the following index sets are defined:

    s− = { i : (Σ_{k=1}^{K} X̂_k)_i > T_i  ⇔  s_i < 0 }

    t− = { i : (Σ_{k=1}^{K} L_k X̂_k)_i > U_i  ⇔  t_i < 0 }

    t+ = { i : (Σ_{k=1}^{K} L_k X̂_k)_i < L_i  ⇔  t_i > (U − L)_i }
Introducing new artificial variables e and f, and fixing initial values for s and t such that:

    (Σ_{k=1}^{K} X̂_k)_i + s_i − e_i = T_i ;        s_i = 0 ;            ∀ i ∈ s−

    (Σ_{k=1}^{K} L_k X̂_k)_i + t_i − f_i = U_i ;    t_i = 0 ;            ∀ i ∈ t−

    (Σ_{k=1}^{K} L_k X̂_k)_i + t_i + f_i = U_i ;    t_i = (U − L)_i ;    ∀ i ∈ t+
The problem solved in phase 1 is:

    min_{X_1,...,X_K, s,t,e,f}  Σ_{i∈s−} e_i + Σ_{i∈t−} f_i + Σ_{i∈t+} f_i       (23)

    subj. to  (2–3)

              Σ_{k=1}^{K} X_k + s + 1l_e e = T                        (24)

              Σ_{k=1}^{K} L_k X_k + t + 1l_f f = U                    (25)

              0 ≤ t ≤ U − L ;  0 ≤ s ;  0 ≤ e ;  0 ≤ f

where both matrices 1l_e ∈ IR^{n×n} and 1l_f ∈ IR^{p×p} in (24) and (25) are diagonal and defined as follows:

    (1l_e)_ii = −1 if i ∈ s−, 0 otherwise
    (1l_f)_ii = −1 if i ∈ t−, +1 if i ∈ t+, 0 otherwise

It can be noticed that the objective function (23) of phase 1 is nothing but the sum of infeasibilities of the mutual capacity and side constraints. Therefore, the NMPC2 problem defined in (21) is feasible if, at phase 1, the value of (23) at the optimizer is 0. Dividing the process of finding a feasible starting point for problem NMPC2 into two stages (phases 0 and 1) has proved to be very efficient in number of iterations with respect to methods that, starting from any given point, consider the sum of infeasibilities for all constraints. In the PPRN code, at the beginning of phase 1 it is known that no infeasibilities need be considered for the network constraints, because in phase 0 K spanning trees have been obtained. Besides, obtaining the K spanning trees does not involve much computation time, given that the available methods for solving linear network problems are very efficient.
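A minimal dense sketch of this phase-1 setup (invented data and names; in PPRN these quantities are handled on the network data structures) computes the index sets, the starting values of s, t, e, f and the diagonals of 1l_e and 1l_f, and checks (24)–(25):

    import numpy as np

    def phase1_setup(sumX, sumLX, T, L, U):
        """Index sets and starting values for the phase-1 problem (23)-(25).

        sumX  : sum over the commodities of the phase-0 flows (length n)
        sumLX : sum over the commodities of L_k X̂_k (length p)
        """
        s_minus = sumX > T                     # violated mutual capacities
        t_minus = sumLX > U                    # side constraints above U
        t_plus = sumLX < L                     # side constraints below L

        s = np.where(s_minus, 0.0, T - sumX)
        e = np.where(s_minus, sumX - T, 0.0)
        t = np.where(t_plus, U - L, np.where(t_minus, 0.0, U - sumLX))
        f = np.where(t_minus, sumLX - U, np.where(t_plus, L - sumLX, 0.0))

        diag_e = np.where(s_minus, -1.0, 0.0)              # diagonal of 1l_e
        diag_f = np.where(t_minus, -1.0,
                          np.where(t_plus, 1.0, 0.0))      # diagonal of 1l_f

        assert np.allclose(sumX + s + diag_e * e, T)       # (24) holds
        assert np.allclose(sumLX + t + diag_f * f, U)      # (25) holds
        return s, t, e, f, diag_e, diag_f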
4.3. Phase 2.

Once a feasible point has been obtained, phase 2 attempts to achieve the optimizer of the nonlinear objective function. The primal partitioning method, as presented in [17], was intended for linear objective functions. However, when optimizing nonlinear functions, primal partitioning can be applied together with Murtagh and Saunders' strategy —described in [19]— of dividing the set of variables into basic, superbasic and nonbasic variables:

    A = [ B | S | N ]                                                 (26)

A being the matrix of constraints (2), (4) and (5). The efficiency of managing the working matrix Q instead of the whole basis B is preserved in the nonlinear case. Furthermore, the structure of the network, mutual capacity and side constraints can be exploited, improving the computation time with respect to general optimization methods where these constraints are treated in a general way.

Consider that at iteration i we have (subindex i is omitted in almost all cases to simplify the notation):
• x_i, h(x_i): the current feasible point and the value of the objective function at this point.
• B, S, N: the sets of basic, superbasic and nonbasic variables. B is represented by just K spanning trees and an LU decomposition of the working matrix Q.
• g(x_i): the gradient g(x_i) = ∇h(x_i), divided into g(x_i) = [g_B | g_S | g_N] for basic, superbasic and nonbasic variables.
• Z: a representation matrix of the null space of the constraint matrix A defined in (26). The expression of Z is:

        ( −B^{-1} S )
    Z = (    1l     )                                                 (27)
        (     0     )

It can easily be observed that AZ = 0.
• g_z, ε_gz: the current reduced gradient g_z = Z^t g(x_i), and a tolerance to estimate when its norm is considered sufficiently small.
• π: a vector satisfying π^t B = g_B^t.

Then the algorithm of phase 2 can be expressed as the following succession of steps (steps where one can take advantage of the particular structure of the constraints are marked with (*)):

STEP (1): Optimality test in the current subspace.
   i) If ||g_z|| ≥ ε_gz go to step (3).
STEP (2): Price nonbasic variables.
   i) Compute Lagrange multipliers λ = g_N − N^t π. (*)
   ii) Choose a suitable λ_q and the associated column N_q. If no multiplier can be chosen, go to step (8).
   iii) Update data structures: remove N_q from N and add it to S; add λ_q as a new component of g_z.
STEP (3): Find a descent direction P^t = [P_B | P_S | 0]^t for the basic and superbasic variables.
   i) Solve Z^t H_i Z P_S = −g_z, where H_i = ∇²h(x_i). (*)
   ii) Solve B P_B = −S P_S. (*)
STEP (4): Ratio test.
   i) Find α_max ≥ 0 such that x_i + α_max P is feasible.
   ii) If α_max = 0 go to step (7).
STEP (5): Linesearch.
   i) Find α* such that h(x_i + α* P) = min_{0 ≤ α ≤ α_max} h(x_i + αP).
   ii) Update the new point x_{i+1} = x_i + α* P, and compute h(x_{i+1}) and g_{i+1}.
STEP (6): Update the reduced gradient g_z.
   i) Solve π^t B = g_B^t. (*)
   ii) Perform g_z = g_S − S^t π. (*)
   iii) If α < α_max go to step (1).
STEP (7): A basic or a superbasic variable becomes nonbasic (it reaches its lower or upper bound).
   i) If a superbasic variable S_p hits its bound then:
      - Remove the component of g_z associated with the column S_p from S.
      - Remove S_p from S and add it to N.
   ii) If a basic variable B_p hits its bound then:
      - Find a superbasic variable S_q to replace B_p in B, preserving the nonsingularity of the basis. (*)
      - Remove B_p from B and add it to N. Remove S_q from S and add it to B (pivot operation). This involves updating the working matrix Q, since a change in the basis B has been made.
      - Update π.
      - Perform g_z = g_S − S^t π. (*)
   iii) Go to step (1).
STEP (8): Optimal solution found.
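As a dense NumPy illustration of steps (6) i)–ii) (random data; the actual code performs the two solves by exploiting the partition of B into spanning trees and the working matrix Q):

    import numpy as np

    rng = np.random.default_rng(1)
    m, s = 5, 3                                   # constraints, superbasics
    B = rng.normal(size=(m, m)) + m * np.eye(m)   # nonsingular basis
    S = rng.normal(size=(m, s))                   # superbasic columns
    gB, gS = rng.normal(size=m), rng.normal(size=s)

    pi = np.linalg.solve(B.T, gB)                 # step (6) i):  pi^t B = gB^t
    gz = gS - S.T @ pi                            # step (6) ii)

    # Same result as gz = Z^t g with Z = [-B^{-1} S; 1l; 0] from (27):
    assert np.allclose(gz, (-np.linalg.solve(B, S)).T @ gB + gS)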
Some comments should be made about the finer points of this algorithm.

4.3.1. Computing the descent direction.

The current implementation of the program makes it possible to solve the system Z^t H_i Z P_S = −g_z in step (3) i) by two methods: through a truncated-Newton algorithm, or using a quasi-Newton approximation of the projected Hessian Z^t H_i Z. In neither case is an analytical expression for the Hessian of the objective function (1) required. Thus the PPRN code only needs the evaluation of the objective function and its gradient. The following subsections briefly show both methodologies.

4.3.1.1. Truncated-Newton algorithm.

The truncated-Newton algorithm employed for solving Z^t H_i Z P_S = −g_z mainly follows the description in [9]. It is based on the conjugate gradient method for solving linear systems of equations M x = b, M being symmetric and positive definite. In this case the system matrix M is directly the projected Hessian Z^t H_i Z. The steps for solving Z^t H_i Z P_S = −g_z are:

    Step 0) Set j = 0, p_0 = 0, r_0 = g_z, d_0 = −r_0, δ_0 = d_0^t d_0, η = θ² (g_z^t g_z).
    Step 1) q_j = Z^t H_i Z d_j
            if d_j^t q_j ≤ √ε_M δ_j then exit:  P_S = −g_z if j = 0, P_S = p_j otherwise
            else α_j = r_j^t r_j / d_j^t q_j
    Step 2) p_{j+1} = p_j + α_j d_j
            r_{j+1} = r_j + α_j q_j
            if r_{j+1}^t r_{j+1} ≤ η then exit:  P_S = p_{j+1}
    Step 3) β_j = r_{j+1}^t r_{j+1} / r_j^t r_j
            d_{j+1} = −r_{j+1} + β_j d_j
            δ_{j+1} = r_{j+1}^t r_{j+1} + β_j² δ_j
            j = j + 1
            go to step 1

The three critical points of this algorithm are the computation of q_j and the control of the two exit conditions. The next three points briefly describe how PPRN proceeds with these tasks.

i) Given that the current Hessian H_i is unknown, it is necessary to approximate it in some way. The steps for computing q_j = Z^t H_i Z d_j are:
   • U = Z d_j
   • V = (g(x_i + γU) − g(x_i)) / γ, g(x) being the gradient of the objective function.
   • q_j = Z^t V
The second step comes from expanding g(x_i + γU) in a Taylor series:

    g(x_i + γU) ≈ g(x_i) + γ ∇g(x_i) U
    ∇g(x_i) U ≈ (g(x_i + γU) − g(x_i)) / γ = V

Thus, since ∇g(x_i) U = H_i U = H_i Z d_j, V can be considered a good approximation to H_i Z d_j. The difference interval used by the code is γ = √ε_M / √δ_j, where ε_M is the machine precision and δ_j = d_j^t d_j is updated at each iteration in step 3. In fact, the correct value that should be used is γ = √ε_M / √(U^t U), where U = Z d_j. The effect of choosing such a γ value would be equivalent to performing finite differences using the unit vector U/||U||_2 (where ||.||_p denotes the p-norm of a vector) and a difference interval equal to √ε_M, which is a more desirable situation. However, that would involve computing the 2-norm of U at each iteration, whereas the value √δ_j = ||d_j||_2 has already been computed. This value is chosen, then, to avoid extra computations, although in some cases (where ||U||_2 >> ||d_j||_2) the difference approximation could become meaningless.

ii) The first exit condition, in step 1, attempts to control whether or not the Hessian H_i is positive definite. If the value d_j^t Z^t H_i Z d_j were negative we could conclude that Z^t H_i Z is not positive definite. However, given that we only have an approximation of Z^t H_i Z d_j, we have to compare d_j^t Z^t H_i Z d_j with a value greater than 0. The PPRN code uses the value √ε_M δ_j, where δ_j = d_j^t d_j. Using this value, the comparison test can be seen as:

    d_j^t Z^t H_i Z d_j ≤ √ε_M δ_j
    d_j^t Z^t H_i Z d_j ≤ √ε_M ||d_j||_2²
    (d_j/||d_j||_2)^t Z^t H_i Z (d_j/||d_j||_2) ≤ √ε_M

which means that unit vectors are being considered, decreasing the possible round-off error.

iii) The second exit condition, in step 2, becomes active when a sufficient reduction of the vector r_j has been achieved. Initially r_0 = g_z, and (assuming full precision) at each iteration ||r_j||_2 ≤ ||r_{j−1}||_2, j = 1 ÷ s, s being the number of superbasic variables (which is the dimension of the system). After s iterations the vector r_s will have a value of 0, which means that p_s is the solution of the system. However, in order to avoid too many computations, the process is stopped when a sufficient decrease in the 2-norm of r_j has been reached. In fact, the exit condition can be written as:

    r_{j+1}^t r_{j+1} ≤ η
    r_{j+1}^t r_{j+1} ≤ θ² (g_z^t g_z)
    ||r_{j+1}||_2 / ||g_z||_2 ≤ θ = max{ε_1, min{ε_2, ||g_z||_2}}

where ε_1 is the maximum required precision (in the PPRN code ε_1 = √ε_M), ε_2 is the required precision when the current iterate x_i is far from the optimum of the current subspace (ε_2 ∈ [0,1], and it can be chosen by the user), and ||g_z||_2 is used as the required precision when we are near the optimum of the current subspace.
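A compact sketch of this truncated-Newton loop (plain NumPy; the callable hess_vec stands for the finite-difference product Z^t H_i Z d_j of point i), and the matrix M below is invented example data):

    import numpy as np

    def truncated_newton_direction(hess_vec, gz, eps2=0.1):
        """CG solution of (Z^t H Z) P_S = -gz, truncated as in [9]."""
        eps_m = np.finfo(float).eps
        theta = max(np.sqrt(eps_m), min(eps2, np.linalg.norm(gz)))
        eta = theta**2 * (gz @ gz)
        p, r = np.zeros_like(gz), gz.copy()
        d, delta = -r, r @ r
        for j in range(gz.size):
            q = hess_vec(d)                        # approximates Z^t H Z d_j
            if d @ q <= np.sqrt(eps_m) * delta:    # nonpositive-curvature exit
                return -gz if j == 0 else p
            alpha = (r @ r) / (d @ q)
            p = p + alpha * d
            r_new = r + alpha * q
            if r_new @ r_new <= eta:               # sufficient-decrease exit
                return p
            beta = (r_new @ r_new) / (r @ r)
            d = -r_new + beta * d
            delta = (r_new @ r_new) + beta**2 * delta
            r = r_new
        return p

    M = np.array([[4.0, 1.0], [1.0, 3.0]])         # stands in for Z^t H Z
    gz = np.array([1.0, -2.0])
    Ps = truncated_newton_direction(lambda d: M @ d, gz)
    print(np.linalg.norm(M @ Ps + gz))             # small residual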
A clear disadvantage of this methodology is that at each iteration an extra evaluation of the objective function gradient has to be made to compute q_j through finite differences, which considerably increases the cost of the algorithm. On the other hand, no data structure need be stored to maintain Z^t H_i Z at each iteration (as quasi-Newton methods do), and only four vectors of dimension s are needed, for p_j, r_j, d_j and q_j. If the number of superbasics is high, a great memory saving can be made, which can speed up the performance of the program considerably.

4.3.1.2. Quasi-Newton method.

The second way of solving Z^t H_i Z P_S = −g_z consists of approximating the s × s matrix Z^t H_i Z by a factorization R^t R, following the description in [19]. This is the default option in the PPRN code. The superbasic direction P_S in step (3) i) is then found directly by solving R^t R P_S = −g_z by forward and backward substitution. This factorization must be updated whenever a superbasic variable enters the set S (step (2) iii)), a superbasic variable leaves the set S (step (7) i)), there is a basis change (step (7) ii)), or there is a movement from x_i to x_{i+1} and a quasi-Newton update of R must be performed (step (6)). Let us describe briefly how each operation is performed.

i) A superbasic variable enters the set S.
In this case the current implementation of the PPRN code merely adds 1l_{s+1} (the (s+1)-th unit vector) to R as a new column. The new dimension of the matrix R will be (s+1) × (s+1). This implies that when solving R^t R P_S = −g_z at the next iteration, P_S^{s+1} = −g_z^{s+1} (the gradient direction will be used for the (s+1)-th component of P_S). However, in successive iterations new second-order information will be added to this component through the quasi-Newton update.

ii) A superbasic variable leaves the set S.
In this case a column of R (say, the q-th column) must be removed from R. The resulting upper-Hessenberg matrix must be restored to its triangular form. This is performed by pre-multiplying R by suitable Givens matrices G_{l,l+1}, l = q ÷ s−1.

iii) A basis change is performed.
Consider that the p-th basic variable is replaced by the q-th superbasic variable. The new null-space matrix Z̄ can be written as Z̄ = Z(1l + 1l_q u^t), where 1l_q is the q-th unit vector and u = −1/y_q (y + 1l_q), y being y = B_p^{-1} S (B_p^{-1} is the p-th row of the inverse of the basis, S is the superbasic matrix and y_q the q-th component of y). Thus, since R^t R ≈ Z^t H_i Z, the next factorization R̄ can be obtained as follows:

    R̄^t R̄ ≈ Z̄^t H_i Z̄
          ≈ (1l + u 1l_q^t) Z^t H_i Z (1l + 1l_q u^t)
          ≈ (1l + u 1l_q^t) R^t R (1l + 1l_q u^t)
          ≈ (R^t + u R_q^t)(R + R_q u^t)
      ⇒  R̄ = R + R_q u^t

R_q being the q-th column of R. A more detailed description of this process can be found in [19].

iv) Quasi-Newton update.
When there is a movement from x_i to x_{i+1}, the approximation R^t R of Z^t H_i Z must be modified to take into account the new reduced Hessian Z^t H_{i+1} Z. The update used by the PPRN code is the well-known BFGS formula

    R̄^t R̄ = R^t R + (1 / (α* y^t P_S)) y y^t + (1 / (g_z^t P_S)) g_z g_z^t       (28)

where α* is the linesearch value obtained in step (5) i), g_z is the reduced gradient Z^t g(x_i) at the point x_i, P_S is the superbasic direction found in step (3) i), and y = g_z(x_{i+1}) − g_z(x_i) is the corresponding change in the reduced gradient (so that its dimension agrees with that of R). Appendix 1 shows how the factorization update R̄ is performed in the PPRN code.

The main advantage of the quasi-Newton method with respect to the truncated-Newton algorithm is that no extra objective function evaluations are required. For objective functions with a high evaluation cost this can mean a significant time saving. On the other hand, the matrix R must be stored in dense form, and should the number of superbasics be very great, its management would be prohibitive. The default option of the PPRN code is to work with the quasi-Newton method until a certain number of superbasics is reached (this threshold value can be modified by the user), whereupon a switch is made to the truncated-Newton method.
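A minimal sketch of the quasi-Newton solve of step (3) i) with the factor R (SciPy triangular solves; the matrix R and vector g_z are invented data):

    import numpy as np
    from scipy.linalg import solve_triangular

    R = np.array([[2.0, 0.5, 0.1],        # invented upper triangular factor,
                  [0.0, 1.5, 0.3],        # R^t R approximates Z^t H_i Z
                  [0.0, 0.0, 1.0]])
    gz = np.array([0.4, -1.0, 0.2])

    w = solve_triangular(R, -gz, trans='T')   # forward:  R^t w = -gz
    Ps = solve_triangular(R, w)               # backward: R Ps = w
    assert np.allclose(R.T @ R @ Ps, -gz)
    assert gz @ Ps < 0                        # P_S is a descent direction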
4.3.2. Optimality test in the current subspace.

At each iteration it must be tested whether the optimum point of the current subspace has already been reached (step (1) i) of the algorithm). However, the test performed is much more exhaustive than simply ascertaining whether ||g_z|| ≥ ε_gz. Actually, the code distinguishes between two situations: when we have still not reached the optimal active constraint set (thus being far from the optimizer), and when we are already on the optimal active constraint set and merely a final, more accurate subspace minimization is required. The variable cs tells us which is the current situation, and it can take the values "far" or "near" depending on whether we are in the first or the second case. The code uses six logical variables (T_i) to decide whether or not the optimum point has been achieved in the current subspace. Each T_i is defined as follows:

    T_1 := ( α* ||P_S||_1 ≤ (ε_x^cs + ε_M)(1 + ||x_s||_1) )
    T_2 := ( |Δh| ≤ (ε_f^cs + ε_M)(1 + |h|) )
    T_3 := ( ||g_z||_∞ ≤ T_gz )
    T_4 := ( ||g_z||_∞ ≤ max{T_gz/10, ε_g (1 + ||π||_1)} )
    T_5 := ( (T_5 is active) and (cs = "far") and (n_same ≤ MAXsame) )
    T_6 := ( n_current ≤ MAXcurrent )

The first test, T_1, controls whether the 1-norm of the current movement in the superbasic variables, α* ||P_S||_1, is significant with respect to the 1-norm of the superbasic components of the current iterate, ||x_s||_1, using for this comparison the machine precision ε_M and the value ε_x^cs, which depends on the variable cs (if cs = "near" this value will be much smaller than when cs = "far", requiring a smaller movement to satisfy the test). The second test, T_2, is true when the variation of the objective function, |Δh|, is not significant with respect to the absolute value of h at the current iterate, |h|. The value ε_f^cs used in the comparison also depends on the variable cs and, as in the previous case, ε_f^near is much smaller than ε_f^far.

On the other hand, the MINOS package always performs a quasi-Newton update. This fact is instrumental in the performance of the test problems P_3^(1), P_3^(2) and P_3^(3), where the number of superbasic variables at the optimum is very great. In these three cases the time spent by PPRN is much less than that required by MINOS, even though the PPRN code performs many more objective function evaluations (#h(x)), since it is using the truncated-Newton algorithm. Thus, it can be concluded that the different behavior of both codes in these examples is mainly due to the different algorithm used for computing the superbasic descent direction P_S when the number of superbasic variables is very high, and that, clearly, the truncated-Newton algorithm seems to be much more efficient than the quasi-Newton update in such cases.

From tables v–xi some interesting conclusions can be drawn. These are shown in table xii, where each column means:
table xi. Results for the traffic assignment problem.

    P_1^(7)   Ph.0  Ph.1  Ph.2  h(x*)     |A|  s  CPU.sec  ε*_opt      #h(x)  T.It 1  T.It 2  T.It 2−h
    PPRN      869   78    460   288.9697  1    9  11.6     1.3·10^−10  258    6.66    19.60   16.93
    MINOS     —     660   526   288.9697  —    9  69.6     2.7·10^−8   657    50.88   68.47   62.52
table xii. Time comparison between MINOS and PPRN.

    test      T.It 1 ratio  T.It 2 ratio  T.It 2−h ratio  time ratio
    P_1^(1)   3.10          1.59          1.60            1.63
    P_2^(1)   4.93          1.47          1.48            1.60
    P_3^(1)   1.13          5.92          6.31            7.63
    P_1^(2)   3.16          1.69          1.73            2.16
    P_2^(2)   4.49          1.29          1.30            1.42
    P_3^(2)   3.56          2.44          2.70            2.83
    P_1^(3)   6.27          1.12          1.24            1.11
    P_2^(3)   2.94          1.22          1.28            1.23
    P_3^(3)   4.09          1.74          2.60            2.83
    P_1^(4)   5.25          1.95          3.49            1.21
    P_2^(4)   3.54          0.92          1.34            1.16
    P_3^(4)   4.03          1.11          1.84            1.71
    P_1^(5)   1.47          1.29          1.21            2.05
    P_2^(5)   1.38          1.50          1.47            2.28
    P_3^(5)   1.85          1.39          1.32            2.23
    P_4^(5)   1.06          1.00          0.92            1.63
    P_1^(6)   11.80         7.84          7.91            6.90
    P_2^(6)   —             10.85         10.37           6.67
    P_3^(6)   —             8.30          7.83            8.21
    P_1^(7)   7.63          3.49          3.69            6.00
• "T.It 1 ratio": ratio of the CPU time per phase 1 iteration between MINOS and PPRN (T.It 1 (MINOS) / T.It 1 (PPRN)), that is, how many times faster PPRN is with respect to MINOS in performing one single phase 1 iteration.
• "T.It 2 ratio": ratio of the CPU time per phase 2 iteration between MINOS and PPRN (T.It 2 (MINOS) / T.It 2 (PPRN)), that is, how many times faster PPRN is with respect to MINOS in performing one single phase 2 iteration.
• "T.It 2−h ratio": ratio of the CPU time per phase 2 iteration, excluding the time spent on objective function evaluations, between MINOS and PPRN (T.It 2−h (MINOS) / T.It 2−h (PPRN)), that is, how many times faster PPRN is with respect to MINOS in performing one single phase 2 iteration without considering the objective function evaluations.
• "time ratio": ratio of the total CPU time between MINOS and PPRN (CPU.sec (MINOS) / CPU.sec (PPRN)), that is, how many times faster the PPRN code is with respect to MINOS 5.3.

Looking at the column "time ratio" of table xii it can be observed that PPRN is faster than MINOS in all the executions. Besides, in some tests (e.g., P_3^(1)) the "time ratio" value is higher than the other ratios; the reason is that in such cases PPRN performs many fewer iterations than MINOS. Another interesting fact is that, in most cases, PPRN is much faster than MINOS in phase 1 iterations than in phase 2 iterations, since in phase 1 it can take advantage of optimizing a linear function only.
8. Conclusions.

A new nonlinear multicommodity network flow code implementing the primal partitioning method has been presented. Special features included in it are: the management of the working matrix, the division of the optimization process into three phases, and the use of Murtagh & Saunders' strategy of considering basic, superbasic and nonbasic variables. Moreover, the code is capable of solving problems with side constraints, of optimizing a linear objective function through an ad hoc simplex algorithm, and of working as a specialized single-commodity network flow code with side constraints. The code has been shown to be very efficient in solving test problems obtained from artificial functions and real models (long and short-term hydro-thermal scheduling of electricity generation, electrical network expansion planning and inelastic traffic assignment), with sizes ranging from 800 to 17,000 variables and 300 to 5,000 constraints.
Appendix 1. Factorization update of R̄.

The BFGS formula used in the code was defined previously in equation (28). The next proposition shows how the new factorization matrix R̄ can be obtained from R by adding only one rank-one matrix.

Proposition. Given the BFGS update formula

    R̄^t R̄ = R^t R + (1 / (α* y^t P_S)) y y^t + (1 / (g_z^t P_S)) g_z g_z^t

and defining δ_1 = 1/√(−g_z^t P_S), δ_2 = 1/√(α* y^t P_S), v such that R^t v = g_z, v̄ = −δ_1 v, and p = δ_2 y + δ_1 g_z (δ_1, δ_2 ∈ IR; v, v̄, p ∈ IR^s, s being the dimension of R), then R̄ can be computed as

    R̄ = R + v̄ p^t
1 t α∗ y t PS yy
+
1 t gzt PS gz gz ,
¯tR ¯ = (Rt + p¯ R v t )(R + v¯pt ) = Rt R + Rt v¯pt + p¯ v t R + p(¯ v t v¯)pt (using that v¯ = −δ1 v) = Rt R + (−δ1 )Rt vpt + (−δ1 )pv t R + δ12 (v t v)ppt (using that Rt v = gz ) = Rt R + (−δ1 )gz pt + (−δ1 )pgzt + δ12 (v t v)ppt (using that p = δ2 y + δ1 gz ) = Rt R − δ1 δ2 gz y t − δ12 gz gzt − δ1 δ2 ygzt − δ12 gz gzt + δ12 (v t v)ppt = Rt R − δ1 δ2 (gz y t + ygzt ) − 2δ12 gz gzt + δ12 (v t v)ppt
¯ being thus R
(41)
Since p = δ2 y + δ1 gz then ppt can be directly expressed as: ppt = (δ2 y + δ1 gz )(δ2 y t + δ1 gzt ) = δ22 yy t + δ2 δ1 ygzt + δ1 δ2 gz y t + δ12 gz gzt
(42)
Now it will be shown that the coefficient δ12 (v t v) is equal to 1. The descent direction PS is computed as Rt RPS = −gz . Given that Rt v = gz it follows that v = −RPS . Thus v t v = PSt Rt RPS = −PSt gz . Since δ1 = √ 1 t we directly have −gz PS
δ12 (v t v) =
1 (−PSt gz ) = 1 −gzt PS
(43)
27
Substituting (42) and (43) in (41) we have: ¯tR ¯ = Rt R − δ1 δ2 (gz y t + ygzt ) − 2δ12 gz gzt + δ22 yy t + δ2 δ1 ygzt + δ1 δ2 gz y t + δ12 gz gzt R = Rt R − δ12 gz gzt + δ22 yy t 1 1 = Rt R + t gz gzt + ∗ t yy t gz PS α y PS which was the desired result.
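The proposition can be checked numerically. A small NumPy sketch with invented data follows; note that after the rank-one update R̄ is in general no longer triangular, and in the code it must be restored to triangular form (e.g. by Givens rotations), which is omitted here:

    import numpy as np

    rng = np.random.default_rng(2)
    s = 4
    R = np.triu(rng.normal(size=(s, s))) + 3.0 * np.eye(s)  # R^t R is s.p.d.
    gz = rng.normal(size=s)
    Ps = np.linalg.solve(R.T @ R, -gz)         # descent direction: gz^t Ps < 0
    y = rng.normal(size=s)
    if y @ Ps <= 0:
        y = -y                                 # ensure y^t Ps > 0 (BFGS condition)
    alpha = 0.7                                # linesearch step alpha* (invented)

    d1 = 1.0 / np.sqrt(-(gz @ Ps))
    d2 = 1.0 / np.sqrt(alpha * (y @ Ps))
    v = np.linalg.solve(R.T, gz)               # R^t v = gz
    p = d2 * y + d1 * gz
    R_bar = R + np.outer(-d1 * v, p)           # R̄ = R + v̄ p^t, with v̄ = −δ1 v

    lhs = R_bar.T @ R_bar
    rhs = (R.T @ R + np.outer(y, y) / (alpha * (y @ Ps))
           + np.outer(gz, gz) / (gz @ Ps))
    assert np.allclose(lhs, rhs)               # the BFGS update (28) holds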
REFERENCES

[1] Ahuja, R.K., T.L. Magnanti, and J.B. Orlin. 1993. Network Flows. Prentice Hall, Englewood Cliffs, New Jersey.
[2] Ali, A., R.V. Helgason, J.L. Kennington, and H. Lall. 1980. Computational comparison among three multicommodity network flow algorithms. Operations Research, v. 28, pp. 995–1000.
[3] Beckmann, M., C.B. McGuire, and C.B. Winsten. 1956. Studies in the Economics of Transportation. Yale University Press, New Haven, CT.
[4] Bradley, G.H., G.G. Brown, and G.W. Graves. 1977. Design and implementation of large scale primal transshipment algorithms. Management Science, v. 24, n. 1, pp. 1–34.
[5] Castro, J. 1993. Efficient computing and updating of the working matrix of the multicommodity network flow problem with side constraints through primal partitioning. DR 93/03. Statistics and Operations Research Dept., Universitat Politècnica de Catalunya, Barcelona. (Written in Catalan.)
[6] Castro, J. 1994. PPRN 1.0, User's Guide. DR 94/06. Statistics and Operations Research Dept., Universitat Politècnica de Catalunya, Barcelona.
[7] Castro, J. and N. Nabona. 1994. Nonlinear multicommodity network flows through primal partitioning and comparison with alternative methods. In J. Henry and J.-P. Yvon (Eds.), System Modelling and Optimization, Proceedings of the 16th IFIP Conference, pp. 875–884. Springer-Verlag.
[8] Castro, J. and N. Nabona. 1994. Computational tests of a linear multicommodity network flow code with linear side constraints through primal partitioning. DR 94/02. Statistics and Operations Research Dept., Universitat Politècnica de Catalunya, Barcelona.
[9] Dembo, R.S. and T. Steihaug. 1983. Truncated-Newton algorithms for large-scale unconstrained optimization. Mathematical Programming, v. 26, pp. 190–212.
[10] DIMACS. 1991. The first DIMACS international algorithm implementation challenge: The benchmark experiments. Technical report, DIMACS, New Brunswick, NJ.
[11] Duff, I.S., A.M. Erisman, and J.K. Reid. 1986. Direct Methods for Sparse Matrices. Oxford University Press, New York.
[12] Grigoriadis, M.D. 1986. An efficient implementation of the network simplex method. Mathematical Programming Study, v. 26, pp. 83–111.
[13] Hellerman, E. and D. Rarick. 1971. Reinversion with the preassigned pivot procedure. Mathematical Programming, v. 1, pp. 195–216.
[14] Heredia, F.J. and N. Nabona. 1990. The FXCB program for linear network flows with side constraints. RR 90/06. Computer Sciences Faculty of Barcelona, Universitat Politècnica de Catalunya, Barcelona. (Written in Catalan.)
[15] Heredia, F.J. and N. Nabona. 1994. Optimum short-term hydro-thermal scheduling with spinning reserve through network flows. Submitted for publication in IEEE Trans. on Power Systems.
[16] Kamath, A.P., N.K. Karmarkar, and K.G. Ramakrishnan. 1993. Computational and complexity results for an interior point algorithm on multicommodity flow problems. Presented at Netflow93, San Miniato, Italy, October 3–7, 1993.
[17] Kennington, J.L. and R.V. Helgason. 1980. Algorithms for Network Programming. John Wiley & Sons, New York.
[18] Kennington, J.L. and A. Whisman. 1988. NETSIDE User's Guide. TR86-OR-01 (revised April 1988), Southern Methodist University, Dallas, Texas 75275, USA.
[19] Murtagh, B.A. and M.A. Saunders. 1978. Large-scale linearly constrained optimization. Mathematical Programming, v. 14, pp. 41–72.
[20] Murtagh, B.A. and M.A. Saunders. 1983. MINOS 5.0 User's Guide. Dept. of Operations Research, Stanford University, CA 94305, USA.
[21] Nabona, N. 1993. Multicommodity network flow model for long-term hydrogeneration optimization. IEEE Trans. on Power Systems, v. 8, n. 2, pp. 395–404.
[22] Nabona, N., J. Castro, J.A. González, and F.J. Heredia. 1994. Optimum transmission and distribution network expansion at several voltage levels through multicommodity network flows. Submitted for publication in IEEE Trans. on Power Systems.
[23] Nabona, N. and J.M. Verdejo. 1992. Numerical implementation of nonlinear multicommodity network flows with linear side constraints through price-directive decomposition. In P. Kall (Ed.), System Modelling and Optimization, Proceedings of the 15th IFIP Conference, Zurich, pp. 311–320. Springer-Verlag.
[24] Sheffi, Y. 1985. Urban Transportation Networks. Equilibrium Analysis with Mathematical Programming Methods. Prentice Hall, Englewood Cliffs, New Jersey.
[25] Toint, Ph.L. and D. Tuyttens. 1990. On large scale nonlinear network optimization. Mathematical Programming, Series B, v. 48(1), pp. 125–159.