An interior-point method for minimizing the sum of piecewise-linear convex functions
Toshihiro Kosaki, Department of Industrial Engineering and Management, Tokyo Institute of Technology, 2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152-8552, Japan
[email protected]
(February 26, 2010)
Abstract: We consider the problem of minimizing a sum of piecewise-linear convex functions subject to linear equality constraints and nonnegativity constraints. We convert this problem into a standard-form linear programming problem (LP) and apply a primal-dual interior-point method to the converted LP. From the solution of the converted problem we can recover the solution of the original problem. We establish polynomial convergence of the interior-point method for the converted problem and devise an efficient computation of the Newton direction.

Keywords: optimization, piecewise-linear convex function, interior-point method
1. Introduction

In this paper we consider the problem of minimizing a sum of piecewise-linear convex functions subject to both linear constraints and nonnegativity constraints. This problem generalizes the piecewise-linear convex problem [1, 3]. The purpose of this paper is to estimate the complexity of solving it; more precisely, we ask whether there is a polynomial-time method for the problem.

This paper is organized as follows. In Section 2, we introduce a conversion of a linear programming problem (LP) with free variables into a standard-form LP; the solution of the original problem is obtained from the solution of the converted problem. In Section 3, we formulate the problem of minimizing a sum of piecewise-linear convex functions. We first convert the problem into an LP with free variables, and then convert that LP into a standard-form problem, to which we apply an interior-point method. We establish polynomial convergence of the algorithm and devise the computation of the Newton direction.

2. The conversion of an LP with free variables into a standard-form LP

We consider the following LP with free variables:

$$\min\ c^T x + f^T z \quad \text{s.t.}\quad Ax + Dz = b,\quad x \ge 0, \tag{P}$$
where $A \in \Re^{m \times n}$, $b \in \Re^m$, $c \in \Re^n$, $f \in \Re^l$, $D \in \Re^{m \times l}$ are constants, and $x \in \Re^n$ and $z \in \Re^l$ are variables. Without loss of generality we may assume that $D$ has full column rank, so $\operatorname{rank} D = l$. Introducing new variables $z^+$ and $z^-$ with $z = z^+ - z^-$, the standard form is as follows:

$$\min\ c^T x + f^T z^+ - f^T z^- \quad \text{s.t.}\quad \begin{bmatrix} A & D & -D \end{bmatrix} \begin{pmatrix} x \\ z^+ \\ z^- \end{pmatrix} = b,\quad x \ge 0,\ z^+ \ge 0,\ z^- \ge 0. \tag{P1}$$
The dual is as follows:

$$\max\ b^T y \quad \text{s.t.}\quad A^T y \le c,\quad D^T y = f. \tag{D1}$$
The transformation $z = z^+ - z^-$ has a drawback: a single $z$ does not determine $z^+$ and $z^-$ uniquely, and the computation is unstable. The constraints of the dual problem corresponding to the free variables of the primal problem are equality constraints. We solve these equations for a set of basic variables and substitute the result into both the objective function and the constraint functions, obtaining a dual problem in the nonbasic variables only; we then consider the primal problem corresponding to this reduced dual. For semidefinite programming, an analogous conversion was proposed in [5]. The advantage is that the converted problem is smaller, and the solution of the original problem can be recovered from the solution of the converted problem.

We write $D_B$ for the submatrix of $D$ corresponding to the basis and $D_N$ for the remaining rows, with subvectors $y_B$ and $y_N$ of $y$ corresponding to $D_B$ and $D_N$; likewise we use submatrices $A_B$, $A_N$ and subvectors $b_B$, $b_N$ corresponding to $y_B$ and $y_N$. Solving the equations $D^T y = f$ for the basic variables gives

$$y_B = D_B^{-T}\left(f - D_N^T y_N\right). \tag{1}$$

Substituting $y_B$ into the objective function, we have

$$b^T y = b_B^T D_B^{-T}\left(f - D_N^T y_N\right) + b_N^T y_N = \left(b_N^T - b_B^T D_B^{-T} D_N^T\right) y_N + b_B^T D_B^{-T} f. \tag{2}$$

Substituting $y_B$ into the constraint functions, we have

$$A_B^T D_B^{-T}\left(f - D_N^T y_N\right) + A_N^T y_N \le c. \tag{3}$$
The dual problem is as follows:

$$\max\ \left(b_N^T - b_B^T D_B^{-T} D_N^T\right) y_N + b_B^T D_B^{-T} f \quad \text{s.t.}\quad \left(A_N^T - A_B^T D_B^{-T} D_N^T\right) y_N \le c - A_B^T D_B^{-T} f. \tag{D2}$$
The primal problem is as follows:

$$\min\ \left(c - A_B^T D_B^{-T} f\right)^T x + b_B^T D_B^{-T} f \quad \text{s.t.}\quad \left(A_N - D_N D_B^{-1} A_B\right) x = b_N - D_N D_B^{-1} b_B,\quad x \ge 0. \tag{P2}$$
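As a numerical sanity check of this conversion (sizes and data are arbitrary illustrative choices, and the nonnegativity of $x$ is ignored since only the equality structure is being checked), the sketch below builds the reduced constraints of (P2), picks any $x$ satisfying them, recovers $z = D_B^{-1}(b_B - A_B x)$, and verifies that $Ax + Dz = b$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l = 4, 3, 2                      # illustrative sizes
A = rng.standard_normal((m, n))
D = rng.standard_normal((m, l))        # full column rank with probability one
b = rng.standard_normal(m)

# Basis: the first l rows (assumed to give a nonsingular D_B here).
AB, AN = A[:l], A[l:]
DB, DN = D[:l], D[l:]
bB, bN = b[:l], b[l:]

# Reduced equality constraints of (P2): (A_N - D_N D_B^{-1} A_B) x = b_N - D_N D_B^{-1} b_B.
M = AN - DN @ np.linalg.solve(DB, AB)
r = bN - DN @ np.linalg.solve(DB, bB)

# Any x satisfying the reduced constraints (least-squares particular solution).
x = np.linalg.lstsq(M, r, rcond=None)[0]

# Recover the free variables: z = D_B^{-1} (b_B - A_B x).
z = np.linalg.solve(DB, bB - AB @ x)

residual = float(np.linalg.norm(A @ x + D @ z - b))
```

The basis rows satisfy $A_B x + D_B z = b_B$ by construction of $z$, and the remaining rows reduce to the constraints of (P2), so the residual is zero up to rounding error.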
We write $x^*$ and $y^*$ for the solutions of the converted primal and dual problems. The solutions $(x, z)$ and $(y_B, y_N)$ of the original problems are then given by $\left(x^*,\ D_B^{-1}(b_B - A_B x^*)\right)$ and $\left(D_B^{-T}(f - D_N^T y^*),\ y^*\right)$, respectively.

3. Formulation of the problem of minimizing a sum of piecewise-linear convex functions

We consider the convex problem of minimizing a sum of piecewise-linear convex functions under linear and nonnegativity constraints. A sum of piecewise-linear convex functions is also convex [2]. The objective function is nonlinear and has points of nondifferentiability. The formulation is as follows:

$$\min_x\ \sum_{p=1}^{q} \max_{i_p = 1, \dots, l_p} \left( c_{i_p}^{p\,T} x + d_{i_p}^p \right) \quad \text{s.t.}\quad Ax = b,\quad x \ge 0, \tag{P3}$$

where $A \in \Re^{m \times n}$, $b \in \Re^m$, $c_{i_p}^p \in \Re^n$, $d_{i_p}^p \in \Re$ are constants and $x \in \Re^n$ is the variable. The objective function is the sum of $q$ piecewise-linear convex functions, where the $p$-th piecewise-linear convex function consists of $l_p$ affine functions. By introducing variables $t_p$ $(p = 1, \dots, q)$ and slack variables $s_{i_p}^p$ $(i_p = 1, \dots, l_p,\ p = 1, \dots, q)$, this problem can be written as follows:
$$\begin{aligned}
\min\ & \sum_{p=1}^{q} t_p \\
\text{s.t.}\ & c_{i_p}^{p\,T} x + d_{i_p}^p + s_{i_p}^p = t_p, \quad i_p = 1, \dots, l_p,\ p = 1, \dots, q \\
& Ax = b,\quad x \ge 0,\quad s_{i_p}^p \ge 0,\quad i_p = 1, \dots, l_p,\ p = 1, \dots, q.
\end{aligned} \tag{P4}$$

We split each $t_p$ into two nonnegative variables, $t_p = t_p^+ - t_p^-$, and rearrange terms, obtaining a standard-form LP:

$$\begin{aligned}
\min\ & \sum_{p=1}^{q} t_p^+ - \sum_{p=1}^{q} t_p^- \\
\text{s.t.}\ & c_{i_p}^{p\,T} x - t_p^+ + t_p^- + s_{i_p}^p = -d_{i_p}^p, \quad i_p = 1, \dots, l_p,\ p = 1, \dots, q \\
& Ax = b,\quad x \ge 0 \\
& t_p^+ \ge 0,\quad t_p^- \ge 0, \quad p = 1, \dots, q \\
& s_{i_p}^p \ge 0, \quad i_p = 1, \dots, l_p,\ p = 1, \dots, q.
\end{aligned} \tag{P5}$$
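To make the epigraph reformulation concrete (data invented for illustration), the sketch below evaluates one piecewise-linear convex term at a point $x$, sets $t = \max_i (c_i^T x + d_i)$, and checks that the slacks demanded by (P4) are nonnegative, with the active piece's slack equal to zero.

```python
import numpy as np

# One piecewise-linear convex function: max over 3 affine pieces (illustrative data).
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])      # rows are c_i^T
d = np.array([0.0, -1.0, 0.25])

x = np.array([0.3, 0.9])

vals = C @ x + d                # c_i^T x + d_i for each piece
t = vals.max()                  # epigraph variable t of (P4)
s = t - vals                    # slacks s_i = t - c_i^T x - d_i

feasible = bool(np.all(s >= 0))
active_piece_tight = bool(np.isclose(s.min(), 0.0))
```

Any feasible $(x, t, s)$ of (P4) dominates this choice of $t$, so minimizing $\sum_p t_p$ recovers the original objective of (P3).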
The dual problem is as follows:

$$\begin{aligned}
\max\ & b^T y - \sum_{p=1}^{q} \sum_{i_p=1}^{l_p} d_{i_p}^p u_{i_p}^p \\
\text{s.t.}\ & A^T y + \sum_{p=1}^{q} \sum_{i_p=1}^{l_p} c_{i_p}^p u_{i_p}^p \le 0 \\
& \sum_{i_p=1}^{l_p} u_{i_p}^p = -1, \quad p = 1, \dots, q \\
& u_{i_p}^p \le 0, \quad i_p = 1, \dots, l_p,\ p = 1, \dots, q.
\end{aligned} \tag{D3}$$
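As a concrete check of this primal-dual pair (a single term $q = 1$ with the function $\max(x_1, x_2)$ over $x_1 + x_2 = 1$, $x \ge 0$; data invented for illustration), the sketch below exhibits feasible points of (P4) and (D3) with equal objective values, certifying optimality of both.

```python
import numpy as np

# Primal (P4) data: minimize t with max(x1, x2) <= t, x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
C = np.eye(2)                   # c_1 = (1, 0), c_2 = (0, 1)
d = np.zeros(2)

# Feasible primal point: x = (1/2, 1/2), t = max(x1, x2) = 1/2.
x = np.array([0.5, 0.5])
t = (C @ x + d).max()

# Feasible dual point for (D3): y = 1/2, u = (-1/2, -1/2).
y = np.array([0.5])
u = np.array([-0.5, -0.5])
assert np.isclose(u.sum(), -1.0)              # sum over i of u_i = -1
assert np.all(A.T @ y + C.T @ u <= 1e-12)     # A^T y + sum_i c_i u_i <= 0
assert np.all(u <= 0)

primal_obj = float(t)
dual_obj = float(b @ y - d @ u)               # b^T y - sum_i d_i u_i
gap = primal_obj - dual_obj
```

The gap is zero, so by weak duality both points are optimal for their respective problems.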
The peculiarity here is that the primal problem (P4) has free variables $t_p$. One way of dealing with free variables is to represent each free variable as the difference of two nonnegative variables; this method has the drawback that the variables may diverge and numerical difficulties arise. To avoid this difficulty we apply the other method, introduced in Section 2.

We solve one of the equality constraints, obtaining $u_{l_p}^p = -1 - \sum_{i_p \ne l_p} u_{i_p}^p\ (\le 0)$ for $p = 1, \dots, q$, and substitute this relation into both the objective function and the constraint functions. The objective function becomes

$$\begin{aligned}
b^T y - \sum_{p=1}^{q} \sum_{i_p=1}^{l_p} d_{i_p}^p u_{i_p}^p
&= b^T y - \sum_{p=1}^{q} \sum_{i_p \ne l_p} d_{i_p}^p u_{i_p}^p - \sum_{p=1}^{q} d_{l_p}^p \left( -1 - \sum_{i_p \ne l_p} u_{i_p}^p \right) \\
&= b^T y + \sum_{p=1}^{q} \sum_{i_p \ne l_p} \left( d_{l_p}^p - d_{i_p}^p \right) u_{i_p}^p + \sum_{p=1}^{q} d_{l_p}^p.
\end{aligned} \tag{4}$$
The constraint function becomes

$$\begin{aligned}
A^T y + \sum_{p=1}^{q} \sum_{i_p=1}^{l_p} c_{i_p}^p u_{i_p}^p
&= A^T y + \sum_{p=1}^{q} \sum_{i_p \ne l_p} c_{i_p}^p u_{i_p}^p + \sum_{p=1}^{q} c_{l_p}^p \left( -1 - \sum_{i_p \ne l_p} u_{i_p}^p \right) \\
&= A^T y + \sum_{p=1}^{q} \sum_{i_p \ne l_p} \left( c_{i_p}^p - c_{l_p}^p \right) u_{i_p}^p - \sum_{p=1}^{q} c_{l_p}^p \ (\le 0).
\end{aligned} \tag{5}$$
The dual problem is as follows:

$$\begin{aligned}
\max\ & b^T y + \sum_{p=1}^{q} \sum_{i_p \ne l_p} \left( d_{l_p}^p - d_{i_p}^p \right) u_{i_p}^p + \sum_{p=1}^{q} d_{l_p}^p \\
\text{s.t.}\ & A^T y + \sum_{p=1}^{q} \sum_{i_p \ne l_p} \left( c_{i_p}^p - c_{l_p}^p \right) u_{i_p}^p \le \sum_{p=1}^{q} c_{l_p}^p \\
& -\sum_{i_p \ne l_p} u_{i_p}^p \le 1, \quad p = 1, \dots, q \\
& u_{i_p}^p \le 0, \quad i_p = 1, \dots, l_p - 1,\ p = 1, \dots, q.
\end{aligned} \tag{D4}$$

The primal problem is as follows:
$$\begin{aligned}
\min\ & \sum_{p=1}^{q} \left( c_{l_p}^{p\,T} x + s_{l_p}^p \right) + \sum_{p=1}^{q} d_{l_p}^p \\
\text{s.t.}\ & \left( c_{i_p}^p - c_{l_p}^p \right)^T x + s_{i_p}^p - s_{l_p}^p = d_{l_p}^p - d_{i_p}^p, \quad i_p = 1, \dots, l_p - 1,\ p = 1, \dots, q \\
& Ax = b,\quad x \ge 0,\quad s_{i_p}^p \ge 0,\quad i_p = 1, \dots, l_p,\ p = 1, \dots, q.
\end{aligned} \tag{P6}$$

We introduce the dual slack variables

$$v_x := \sum_{p=1}^{q} c_{l_p}^p - A^T y - \sum_{p=1}^{q} \sum_{i_p \ne l_p} \left( c_{i_p}^p - c_{l_p}^p \right) u_{i_p}^p, \tag{6a}$$

$$v_{s_{l_p}^p} := 1 + \sum_{i_p \ne l_p} u_{i_p}^p, \quad p = 1, \dots, q, \tag{6b}$$

$$v_{s_{i_p}^p} := -u_{i_p}^p, \quad i_p = 1, \dots, l_p - 1,\ p = 1, \dots, q. \tag{6c}$$
The optimality conditions are as follows:

$$\begin{aligned}
& \left( c_{i_p}^p - c_{l_p}^p \right)^T x + s_{i_p}^p - s_{l_p}^p = d_{l_p}^p - d_{i_p}^p, \quad i_p = 1, \dots, l_p - 1,\ p = 1, \dots, q \\
& Ax = b,\quad x \ge 0,\quad s_{i_p}^p \ge 0,\quad i_p = 1, \dots, l_p,\ p = 1, \dots, q \\
& A^T y + \sum_{p=1}^{q} \sum_{i_p \ne l_p} \left( c_{i_p}^p - c_{l_p}^p \right) u_{i_p}^p + v_x = \sum_{p=1}^{q} c_{l_p}^p \\
& -\sum_{i_p \ne l_p} u_{i_p}^p + v_{s_{l_p}^p} = 1, \quad p = 1, \dots, q \\
& u_{i_p}^p + v_{s_{i_p}^p} = 0, \quad i_p = 1, \dots, l_p - 1,\ p = 1, \dots, q \\
& v_x \ge 0,\quad v_{s_{l_p}^p} \ge 0\ (p = 1, \dots, q),\quad v_{s_{i_p}^p} \ge 0\ (i_p = 1, \dots, l_p - 1,\ p = 1, \dots, q) \\
& x^T v_x = 0,\quad s_{i_p}^p v_{s_{i_p}^p} = 0, \quad i_p = 1, \dots, l_p,\ p = 1, \dots, q.
\end{aligned} \tag{7}$$
We write $(x^*, s_1^{p*}, \dots, s_{l_p}^{p*})$ for the solution of the converted primal problem (P6). The solution of the original primal problem (P4) is given by

$$\left( x,\ t_p,\ s_1^p, \dots, s_{l_p}^p \right) = \left( x^*,\ c_{l_p}^{p\,T} x^* + s_{l_p}^{p*} + d_{l_p}^p,\ s_1^{p*}, \dots, s_{l_p}^{p*} \right). \tag{8}$$

We write $(y^*, u_1^{p*}, \dots, u_{l_p - 1}^{p*})$ for the solution of the converted dual problem (D4). The solution of the original dual problem (D3) is given by

$$\left( y,\ u_1^p, \dots, u_{l_p - 1}^p,\ u_{l_p}^p \right) = \left( y^*,\ u_1^{p*}, \dots, u_{l_p - 1}^{p*},\ -1 - \sum_{i_p \ne l_p} u_{i_p}^{p*} \right). \tag{9}$$
By the number of variables in the primal problem (P6), the short-step interior-point method (Algorithm SPF [7]) can solve (P6) in $O\left( \sqrt{n + \sum_{p=1}^{q} l_p}\; L \right)$ iterations. We verify this claim below.
We use the following notation:

$$\tilde{x} := \left( x,\ s_1^1, \dots, s_{l_1}^1,\ \dots,\ s_1^q, \dots, s_{l_q}^q \right)^T, \quad
\tilde{y} := \left( y,\ u_1^1, \dots, u_{l_1 - 1}^1,\ \dots,\ u_1^q, \dots, u_{l_q - 1}^q \right)^T, \quad
\tilde{s} := \left( v_x,\ v_{s_1^1}, \dots, v_{s_{l_1}^1},\ \dots,\ v_{s_1^q}, \dots, v_{s_{l_q}^q} \right)^T, \tag{10}$$

$$\tilde{A} := \begin{pmatrix} A & O & \cdots & O \\ C^1 & E^1 & & \\ \vdots & & \ddots & \\ C^q & & & E^q \end{pmatrix}, \quad
\tilde{b} := \left( b,\ d_{l_1}^1 - d_1^1, \dots, d_{l_1}^1 - d_{l_1 - 1}^1,\ \dots,\ d_{l_q}^q - d_1^q, \dots, d_{l_q}^q - d_{l_q - 1}^q \right)^T, \quad
\tilde{c} := \left( \textstyle\sum_{p=1}^{q} c_{l_p}^p,\ 0, \dots, 0, 1,\ \dots,\ 0, \dots, 0, 1 \right)^T, \tag{11}$$

where, for each $p$, $C^p$ is the $(l_p - 1) \times n$ matrix with rows $\left( c_{i_p}^p - c_{l_p}^p \right)^T$, $i_p = 1, \dots, l_p - 1$; $E^p := \begin{pmatrix} I_{l_p - 1} & -e \end{pmatrix}$ is $(l_p - 1) \times l_p$; and the block of $\tilde{c}$ corresponding to $(s_1^p, \dots, s_{l_p}^p)$ consists of $l_p - 1$ zeros followed by a one. With this notation, (P6) and its dual (D4) become the standard-form pair (up to the constant $\sum_{p=1}^{q} d_{l_p}^p$ in the objective)

$$\min\ \tilde{c}^T \tilde{x}\ \ \text{s.t.}\ \tilde{A} \tilde{x} = \tilde{b},\ \tilde{x} \ge 0, \qquad
\max\ \tilde{b}^T \tilde{y}\ \ \text{s.t.}\ \tilde{A}^T \tilde{y} + \tilde{s} = \tilde{c},\ \tilde{s} \ge 0. \tag{12}$$

The feasible interior is written as

$$\mathcal{F}^0 := \left\{ (\tilde{x}, \tilde{y}, \tilde{s}) : \tilde{A} \tilde{x} = \tilde{b},\ \tilde{A}^T \tilde{y} + \tilde{s} = \tilde{c},\ (\tilde{x}, \tilde{s}) > 0 \right\}. \tag{13}$$
The neighborhood is written as

$$\mathcal{N} := \left\{ (\tilde{x}, \tilde{y}, \tilde{s}) \in \mathcal{F}^0 : \left\| \tilde{X} \tilde{s} - \mu e \right\|_2 \le 0.4 \mu \right\}, \tag{14}$$

where $e$ is the vector whose components are all 1. We use the following algorithm.

Step 0: An initial point $(\tilde{x}^0, \tilde{y}^0, \tilde{s}^0) \in \mathcal{N}$ and a termination criterion $\mu^*$ are given. Set

$$\sigma := 1 - \frac{0.4}{\sqrt{n + \sum_{p=1}^{q} l_p}}, \qquad \mu^0 := \frac{\tilde{x}^{0\,T} \tilde{s}^0}{n + \sum_{p=1}^{q} l_p},$$

and let $k := 0$.

Step 1: If the termination criterion $\mu^k \le \mu^*$ is satisfied, stop.

Step 2: Set $\mu^k := \dfrac{\tilde{x}^{k\,T} \tilde{s}^k}{n + \sum_{p=1}^{q} l_p}$. Solve

$$\tilde{A} \Delta \tilde{x}^k = 0, \tag{15a}$$

$$\tilde{A}^T \Delta \tilde{y}^k + \Delta \tilde{s}^k = 0, \tag{15b}$$

$$\tilde{S}^k \Delta \tilde{x}^k + \tilde{X}^k \Delta \tilde{s}^k = \sigma \mu^k e - \tilde{X}^k \tilde{s}^k, \tag{15c}$$

to obtain the Newton direction $(\Delta \tilde{x}^k, \Delta \tilde{y}^k, \Delta \tilde{s}^k)$, and let $(\tilde{x}^{k+1}, \tilde{y}^{k+1}, \tilde{s}^{k+1}) := (\tilde{x}^k, \tilde{y}^k, \tilde{s}^k) + (\Delta \tilde{x}^k, \Delta \tilde{y}^k, \Delta \tilde{s}^k)$.

Step 3: Set $k := k + 1$ and go to Step 1.

3.1. Iteration number

Consider the $k$-th iteration. The equality constraints remain satisfied, by the relations
$$\tilde{A} \tilde{x}^{k+1} = \tilde{A} \tilde{x}^k = \tilde{b}, \tag{16}$$

$$\tilde{A}^T \tilde{y}^{k+1} + \tilde{s}^{k+1} = \tilde{A}^T \tilde{y}^k + \tilde{s}^k = \tilde{c}. \tag{17}$$
Note that $\Delta \tilde{x}^{k\,T} \Delta \tilde{s}^k = 0$. For the duality gap, which is used in the optimality criterion, we have the estimate

$$\tilde{x}^{k+1\,T} \tilde{s}^{k+1} = \left( \tilde{x}^k + \Delta \tilde{x}^k \right)^T \left( \tilde{s}^k + \Delta \tilde{s}^k \right)
= \tilde{x}^{k\,T} \tilde{s}^k + \tilde{x}^{k\,T} \Delta \tilde{s}^k + \tilde{s}^{k\,T} \Delta \tilde{x}^k + \Delta \tilde{x}^{k\,T} \Delta \tilde{s}^k
= \left( 1 - \frac{0.4}{\sqrt{n + \sum_{p=1}^{q} l_p}} \right) \left( n + \sum_{p=1}^{q} l_p \right) \mu^k. \tag{18}$$

By the following chain of inequalities, the next point generated by the algorithm is also kept in the neighborhood:

$$\begin{aligned}
\left\| \tilde{X}^{k+1} \tilde{s}^{k+1} - \mu^{k+1} e \right\|_2
&= \left\| \left( \tilde{X}^k + \Delta \tilde{X}^k \right) \left( \tilde{s}^k + \Delta \tilde{s}^k \right) - \mu^{k+1} e \right\|_2 && (19) \\
&= \left\| \Delta \tilde{X}^k \Delta \tilde{s}^k \right\|_2 && (20) \\
&= \left\| \left( D^{-1} \Delta \tilde{X}^k \right) \left( D \Delta \tilde{s}^k \right) \right\|_2, \quad \text{where } D := \tilde{X}^{k\,1/2} \tilde{S}^{k\,-1/2} && (21) \\
&\le \frac{\sqrt{2}}{4} \left\| D^{-1} \Delta \tilde{x}^k + D \Delta \tilde{s}^k \right\|_2^2 && (22) \\
&= \frac{\sqrt{2}}{4} \left\| \left( \tilde{X}^k \tilde{S}^k \right)^{-1/2} \left( \sigma \mu^k e - \tilde{X}^k \tilde{s}^k \right) \right\|_2^2 && (23) \\
&\le \frac{\sqrt{2}}{4} \, \frac{\left\| \tilde{X}^k \tilde{s}^k - \sigma \mu^k e \right\|_2^2}{\min_i \tilde{x}_i^k \tilde{s}_i^k} && (24) \\
&\le \frac{\sqrt{2}}{4} \, \frac{\left\| \tilde{X}^k \tilde{s}^k - \mu^k e \right\|_2^2 + (1 - \sigma)^2 \mu^{k\,2} \left\| e \right\|_2^2}{(1 - 0.4) \mu^k} && (25) \\
&\le \frac{\sqrt{2}}{4} \, \frac{0.4^2 + (1 - \sigma)^2 \left( n + \sum_{p=1}^{q} l_p \right)}{1 - 0.4} \, \mu^k && (26) \\
&= \frac{32 \sqrt{2}}{240} \, \mu^k && (27) \\
&\le 0.4 \mu^{k+1}, && (28)
\end{aligned}$$
where $\tilde{X}$ and $\tilde{S}$ are the diagonal matrices whose diagonal components are $\tilde{x}$ and $\tilde{s}$, respectively. The positivity condition is also satisfied. Because each iteration multiplies the duality gap by $1 - 0.4 \big/ \sqrt{n + \sum_{p=1}^{q} l_p}$, the algorithm obtains an optimal solution in $O\left( \sqrt{n + \sum_{p=1}^{q} l_p}\; L \right)$ iterations, where $L$ is the number of bits needed to express the data of the problem. From the solution of the converted problem, we can obtain the solution of the original problem.

3.2. Newton direction

Computing the Newton direction takes most of the total computing time, so devising an efficient computation of the Newton direction is important. For simplicity, we consider the problem for the sum of two piecewise-linear convex functions. The Newton direction is given by the solution of the following system of equations:

$$\tilde{A} \tilde{X}^k \tilde{S}^{k\,-1} \tilde{A}^T \Delta \tilde{y}^k = -\tilde{A} \tilde{S}^{k\,-1} \left( \sigma \mu^k e - \tilde{X}^k \tilde{s}^k \right), \tag{29a}$$

$$\Delta \tilde{s}^k = -\tilde{A}^T \Delta \tilde{y}^k, \tag{29b}$$

$$\Delta \tilde{x}^k = \tilde{S}^{k\,-1} \left( \sigma \mu^k e - \tilde{X}^k \tilde{s}^k \right) - \tilde{X}^k \tilde{S}^{k\,-1} \Delta \tilde{s}^k. \tag{29c}$$
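To illustrate the whole scheme, here is a minimal numerical sketch of the short-step method on a tiny standard-form LP ($\min x_1 + 2x_2$ s.t. $x_1 + x_2 = 1$, $x \ge 0$; data invented for illustration). For simplicity it solves the full Newton system (15a)-(15c) by dense linear algebra instead of the reduced normal-equations form; this is a sketch, not the paper's implementation.

```python
import numpy as np

# Tiny standard-form LP (illustrative data): min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
m, n = A.shape

# Perfectly centered strictly feasible start: y = 0, s = c - A^T y = c, x_i * s_i constant.
x = np.array([2.0 / 3.0, 1.0 / 3.0])
y = np.zeros(m)
s = c.copy()
sigma = 1.0 - 0.4 / np.sqrt(n)   # short-step centering parameter

for _ in range(200):
    mu = x @ s / n
    if mu <= 1e-9:               # termination criterion mu*
        break
    # Full Newton (KKT) system enforcing the centering condition X s = sigma*mu*e.
    K = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([np.zeros(n), np.zeros(m), sigma * mu - x * s])
    step = np.linalg.solve(K, rhs)
    x += step[:n]
    y += step[n:n + m]
    s += step[n + m:]

# The iterates converge to the optimum x* = (1, 0) with optimal value 1.
```

With the full step, the duality gap contracts by exactly the factor $\sigma$ per iteration, matching the estimate (18).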
We use the following notation. Let $\bar{C}$ be the matrix whose rows are the rows of $A$, followed by $\left( c_{i_1}^1 - c_{l_1}^1 \right)^T$ for $i_1 = 1, \dots, l_1 - 1$, followed by $\left( c_{i_2}^2 - c_{l_2}^2 \right)^T$ for $i_2 = 1, \dots, l_2 - 1$, and define

$$\bar{A} := \bar{C} \operatorname{diag}(x) \operatorname{diag}(v_x)^{-1} \bar{C}^T +
\begin{pmatrix} O & & \\ & \operatorname{diag}\left( s_{i_1}^1 v_{s_{i_1}^1}^{-1} \right)_{i_1 \ne l_1} & \\ & & \operatorname{diag}\left( s_{i_2}^2 v_{s_{i_2}^2}^{-1} \right)_{i_2 \ne l_2} \end{pmatrix}, \tag{30}$$

where $\operatorname{diag}(\cdot)$ denotes the diagonal matrix whose diagonal components are the elements of the input vector. The coefficient matrix of (29a) is then written as

$$\tilde{A} \tilde{X}^k \tilde{S}^{k\,-1} \tilde{A}^T = \bar{A}
+ s_{l_1}^1 v_{s_{l_1}^1}^{-1} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix}^T
+ s_{l_2}^2 v_{s_{l_2}^2}^{-1} \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix}^T. \tag{31}$$
Let $A$ be a nonsingular matrix and let $B$, $D$, $C$ be matrices of appropriate sizes, with $D$ nonsingular. The Sherman-Morrison-Woodbury formula [4, 6, 7] states that

$$\left( A + BDC \right)^{-1} = A^{-1} - A^{-1} B \left( D^{-1} + C A^{-1} B \right)^{-1} C A^{-1}. \tag{32}$$
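The rank-one case of this formula (the Sherman-Morrison update) is the one applied below. The following sketch, with random illustrative data, checks that updating a known inverse after a rank-one correction $\alpha w w^T$ matches direct inversion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
Abar = rng.standard_normal((n, n)) + n * np.eye(n)  # invertible matrix (illustrative)
Abar_inv = np.linalg.inv(Abar)

alpha = 0.7                    # plays the role of the scalar s_l * v_{s_l}^{-1}
w = rng.standard_normal(n)     # plays the role of the 0/e/0 pattern vector

# Sherman-Morrison (rank-one Woodbury):
# (Abar + alpha w w^T)^{-1}
#   = Abar^{-1} - alpha / (1 + alpha w^T Abar^{-1} w) * (Abar^{-1} w)(w^T Abar^{-1})
u = Abar_inv @ w
Ahat_inv = Abar_inv - (alpha / (1.0 + alpha * (w @ u))) * np.outer(u, w @ Abar_inv)

err = float(np.linalg.norm(Ahat_inv - np.linalg.inv(Abar + alpha * np.outer(w, w))))
```

The update needs only matrix-vector products with the known inverse, which is the source of the savings exploited in the next equations.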
We use the notation

$$\hat{A} := \bar{A} + s_{l_1}^1 v_{s_{l_1}^1}^{-1} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix}^T. \tag{33}$$

By using the relation

$$\hat{A}^{-1} = \bar{A}^{-1} - \frac{s_{l_1}^1 v_{s_{l_1}^1}^{-1}}{1 + s_{l_1}^1 v_{s_{l_1}^1}^{-1} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix}^T \bar{A}^{-1} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix}} \; \bar{A}^{-1} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix} \begin{pmatrix} 0 \\ e \\ 0 \end{pmatrix}^T \bar{A}^{-1}, \tag{34}$$

we can obtain $\hat{A}^{-1}$ from $\bar{A}^{-1}$. And by using the relation

$$\left( \tilde{A} \tilde{X}^k \tilde{S}^{k\,-1} \tilde{A}^T \right)^{-1} = \hat{A}^{-1} - \frac{s_{l_2}^2 v_{s_{l_2}^2}^{-1}}{1 + s_{l_2}^2 v_{s_{l_2}^2}^{-1} \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix}^T \hat{A}^{-1} \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix}} \; \hat{A}^{-1} \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix}^T \hat{A}^{-1}, \tag{35}$$

we can obtain the Newton direction. If $\bar{A}$ has a special structure that makes its inverse easy to compute, this device reduces the computation time. The method extends to the problem of minimizing the sum of three or more piecewise-linear convex functions.

References
[1] D. Bertsimas and J. N. Tsitsiklis: Introduction to Linear Optimization, Athena Scientific (1997).
[2] S. Boyd and L. Vandenberghe: Convex Optimization, Cambridge University Press (2004).
[3] G. B. Dantzig and M. N. Thapa: Linear Programming 2: Theory and Extensions, Springer (2003).
[4] M. Iri: Ippansenkeidaisu (in Japanese), Iwanami Shoten (2003).
[5] K. Kobayashi, K. Nakata and M. Kojima: A Conversion of an SDP Having Free Variables into the Standard Form SDP, Computational Optimization and Applications, 36 (2007), 289-307.
[6] J. Nocedal and S. J. Wright: Numerical Optimization, Springer (2006).
[7] S. J. Wright: Primal-Dual Interior-Point Methods, SIAM Publications (1997).