Partial Linearization Methods in Nonlinear Programming1 M. Patriksson2

Communicated by D.G. Luenberger


1. The author wishes to thank Drs. K. Holmberg, T. Larsson, and A. Migdalas for their helpful comments.
2. Lic. Phil., Department of Mathematics, Linköping Institute of Technology, Linköping, Sweden.

Abstract. In this paper, we characterize a class of feasible direction methods in nonlinear programming through the concept of partial linearization of the objective function. Based on a feasible point, the objective is replaced by an arbitrary convex and continuously differentiable function, and the error is taken into account by a first order approximation of it. A new feasible point is defined through a line search with respect to the original objective, towards the solution of the approximate problem. Global convergence results are given for exact and approximate line searches, and possible interpretations are made. We present some instances of the general algorithm, and discuss extensions to nondifferentiable programming.

Key Words. Feasible direction methods, partial linearization, regularization, nondifferentiable programming.

1. Introduction

The purpose of this paper is to unify a number of feasible direction methods in nonlinear programming. Examples of these are the method of Frank and Wolfe and constrained Newton methods. The problem studied is

(P) \[ \min_{x \in X} T(x), \]

where $T : \Re^{n} \rightarrow \Re$ is continuously differentiable and $X$ is a compact subset of $\Re^{n}$.

Below, we establish the global convergence of the modified algorithm.

Theorem 2.3. Under the additional condition (9), the partial linearization algorithm is globally convergent with exact line searches replaced by the Armijo steplength rule (10).

Proof. We will show that any convergent subsequence, indexed by $\hat{L}$, of the sequences $\{x^{(l)}\}_{l=1}^{\infty}$ and $\{\bar{x}^{(l)}\}_{l=1}^{\infty}$ satisfies
\[ \lim_{l \in \hat{L}} \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) = \nabla T(x)^{T}(\bar{x} - x) = 0, \]
i.e., the equivalent of (8). The proof then follows using exactly the same arguments as in the proof of Theorem 2.2.

By Theorem 2.1, the sequence $\{d^{(l)}\}$ consists of feasible directions of descent. Hence,
\[ \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) < 0, \]
and by (10),
\[ T(x^{(l+1)}) \le T(x^{(l)}), \quad \forall\, l. \]
Indeed, by the construction of the steplength rule,
\[ T(x^{(l+1)}) \le T(x^{(l)}) + \varepsilon \alpha_{l}\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}). \tag{11} \]

This inequality is valid, since (10) can always be satisfied within a finite number of trial steps. The proof of this fact, given below, is a modification of a part of the convergence proof for the Frank-Wolfe algorithm of Pshenichny and Danilin (Ref. 7, p. 172). Using Taylor's formula and the condition (9),
\begin{align*}
T(x^{(l)} + \alpha(\bar{x}^{(l)} - x^{(l)})) - T(x^{(l)}) &= \int_{0}^{\alpha} \nabla T(x^{(l)} + s(\bar{x}^{(l)} - x^{(l)}))^{T}(\bar{x}^{(l)} - x^{(l)})\, ds \\
&\le \alpha\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) + \|\bar{x}^{(l)} - x^{(l)}\| \int_{0}^{\alpha} sK \|\bar{x}^{(l)} - x^{(l)}\|\, ds \\
&= \alpha\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) + \tfrac{1}{2} \alpha^{2} K \|\bar{x}^{(l)} - x^{(l)}\|^{2}. \tag{12}
\end{align*}

From (12) we obtain, by choosing
\[ \alpha \le \frac{2(\varepsilon - 1)\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)})}{K \|\bar{x}^{(l)} - x^{(l)}\|^{2}}, \]
that
\[ T(x^{(l)} + \alpha(\bar{x}^{(l)} - x^{(l)})) \le T(x^{(l)}) + \varepsilon \alpha\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}). \]
Replacing $\alpha$ with $\beta^{-i}$, we see that $\bar{\imath}$ is the smallest integer to satisfy
\[ \beta^{-\bar{\imath}} \le \frac{2(\varepsilon - 1)\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)})}{K \|\bar{x}^{(l)} - x^{(l)}\|^{2}}. \]
Hence, $\alpha_{l} = \beta^{-(\bar{\imath}-1)}$, i.e.,
\[ \alpha_{l} > \frac{2(\varepsilon - 1)\, \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)})}{K \|\bar{x}^{(l)} - x^{(l)}\|^{2}}. \tag{13} \]
Hence, the inequality (10) will be satisfied after a finite number of steps, with a steplength satisfying (13).



Noting that $\|\bar{x}^{(l)} - x^{(l)}\|$ is bounded from above by a positive constant, say $C$, because of the compactness of $X$, by (11) and (13) we obtain
\[ T(x^{(l+1)}) - T(x^{(l)}) \le \frac{2\varepsilon(\varepsilon - 1)}{KC^{2}} \left[ \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) \right]^{2}. \tag{14} \]

Summing (14) for all $l \le m - 1$, we obtain
\[ T(x^{(m)}) - T(x^{(0)}) \le \frac{2\varepsilon(\varepsilon - 1)}{KC^{2}} \sum_{l=1}^{m-1} \left[ \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) \right]^{2}. \]
Using the fact that $T(x)$ is bounded from below on $X$ by, say, $\underline{T}$,
\[ \sum_{l=1}^{m-1} \left[ \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) \right]^{2} \le \frac{KC^{2}}{2\varepsilon(1 - \varepsilon)} \left[ T(x^{(0)}) - T(x^{(m)}) \right] \le \frac{KC^{2}}{2\varepsilon(1 - \varepsilon)} \left[ T(x^{(0)}) - \underline{T} \right]. \]
It follows that the series $\sum_{l=1}^{\infty} [\nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)})]^{2}$ is convergent, and hence,
\[ \lim_{l \to \infty} \nabla T(x^{(l)})^{T}(\bar{x}^{(l)} - x^{(l)}) = 0. \tag{15} \]
By using the same arguments as in Theorem 2.2, we may choose subsequences of $\{x^{(l)}\}$ and $\{\bar{x}^{(l)}\}$, indexed by $\hat{L}$, so that
\[ \lim_{l \in \hat{L}} x^{(l)} = x \quad \text{and} \quad \lim_{l \in \hat{L}} \bar{x}^{(l)} = \bar{x}. \]
By (15),
\[ \nabla T(x)^{T}(\bar{x} - x) = 0, \]
which completes the proof. $\Box$
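As a concrete illustration of the algorithm analyzed above, here is a minimal numerical sketch. It is our own construction, not code from the paper: it applies partial linearization with the regularizing choice $f(x, y) = (c/2)\|x - y\|^{2}$ to a box-constrained convex quadratic, using the Armijo steplength rule (10). The test objective, the box $X = [0,1]^2$, the constant $c$, and all function names are assumptions made for this example.

```python
import numpy as np

# Hypothetical sketch (our example, not the paper's code): partial
# linearization with f(x, y) = (c/2)||x - y||^2 on X = [0, 1]^n.
# The subproblem min_{x' in X} grad T(x)^T (x' - x) + (c/2)||x' - x||^2
# then has a closed-form projection solution.

def T(x):
    # a smooth convex test objective, T(x) = 0.5 * ||A x - b||^2
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad_T(x):
    return A.T @ (A @ x - b)

def subproblem(x, c):
    # argmin over [0,1]^n of grad T(x)^T (x' - x) + (c/2)||x' - x||^2
    return np.clip(x - grad_T(x) / c, 0.0, 1.0)

def armijo(x, d, eps=0.1, beta=2.0, max_trials=50):
    # try steplengths beta^0, beta^-1, ... until the sufficient-decrease
    # condition (10) holds
    slope = grad_T(x) @ d
    alpha = 1.0
    for _ in range(max_trials):
        if T(x + alpha * d) <= T(x) + eps * alpha * slope:
            return alpha
        alpha /= beta
    return alpha

def partial_linearization(x0, c=10.0, iters=200):
    x = x0.copy()
    for _ in range(iters):
        x_bar = subproblem(x, c)
        d = x_bar - x            # feasible direction of descent
        if np.linalg.norm(d) < 1e-10:
            break
        x = x + armijo(x, d) * d # iterates stay in X (convex combination)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = partial_linearization(np.zeros(2))
```

For this data the unconstrained minimizer $(0.2, 0.4)$ lies inside the box, so the iterates converge to it; note that with this particular choice of $f$ the method reduces to a projected gradient scheme with Armijo line search.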

3. Interpretations and Some Instances

The convergence proof of Theorem 2.2 implies several properties of the partial linearization algorithm. A nice property of the method of Frank and Wolfe (Ref. 8), which may be recognized by the choice of the function $f(x, y) \equiv 0$, is that, for convex problems, a termination criterion is available through the lower bound on the optimal objective value, obtained from the linear subproblem. In the partial linearization algorithm, this termination criterion is, in general, not valid, since the error function, $T(x) - f(x, y)$, is not convex. However, by Theorem 2.2,
\[ T(x^{(l)}) - T^{(l)}(\bar{x}^{(l)}) > 0 \]
whenever $x^{(l)}$ is not a solution to (P), and this difference tends towards zero as the iterations proceed. Hence, the difference between the objective value and the corresponding optimal subproblem value may still be utilized in a termination criterion.

If the function $f(x, y)$ is chosen strictly convex with respect to $x$, then the subproblem (PL-SUB$^{(l)}$) is strictly convex, and its solution, $\bar{x}^{(l)}$, is unique. This implies the possibility of utilizing dual techniques for the solution of (PL-SUB$^{(l)}$); furthermore, the termination criterion $T^{(l)}(x^{(l)}) - T^{(l)}(\bar{x}^{(l)}) = 0$ is replaced by $x^{(l)} = \bar{x}^{(l)}$, and it is easily shown that any accumulation point of the sequence of subproblem solutions is a solution to (P). Hence, the choice of strictly convex functions $f(x, y)$ corresponds to making better and better approximations of the original objective, in the sense that in the limit the original problem is solved by the subproblem solution.

If the function $f(x, y)$ is chosen so that
\[ \nabla_{x} f(x, y) = 0 \quad \text{if } x = y, \tag{31} \]

then the subproblem of the partial linearization algorithm is equivalent to
\[ \min_{x \in X}\; \nabla T(x^{(l)})^{T}(x - x^{(l)}) + f(x, x^{(l)}). \]
Algorithms of this type are studied by Migdalas (Ref. 9) under the name of regularized Frank-Wolfe algorithms. The above expression provides a nice interpretation of the algorithmic class as partial linearization methods, as opposed to complete linearization methods, such as the Frank-Wolfe algorithm. In the Frank-Wolfe algorithm, the first order Taylor expansion, which is valid only locally around $x^{(l)}$, is used globally in the subproblem phase. The subproblem solved in a partial linearization method introduces a regularization term in the objective function of the Frank-Wolfe subproblem, restricting the difference between the current point $x^{(l)}$ and the subproblem solution $\bar{x}^{(l)}$ in some measured distance. By retaining the nonlinearity of the original objective function, partial linearization methods will therefore avoid the tailing-off phenomenon inherent in the Frank-Wolfe method, caused by the generation of extreme point solutions. This was the original idea behind the first formulation of a partial linearization method (Ref. 3), where the objective is described as the sum of a strictly convex and differentiable function and a differentiable but not necessarily

convex function. The subproblem then is defined by a linearization of the nonconvex part. The possibility of choosing the strictly convex function separable, in order to utilize separability in the constraints, is also pointed out. This is demonstrated in the case of traffic equilibrium problems in Refs. 10 and 11.

The partial linearization method may also be used to regularize problems that are not strictly convex, by iteratively adding a strictly convex function to the original objective, as shown below. Assume that the objective function $T(x)$ is convex but not strictly convex. Assume also that the function $f(x, y)$ is strictly convex with respect to $x$. Let
\[ \tilde{f}(x, y) = T(x) + f(x, y), \]
which then is strictly convex with respect to $x$. Rewrite the objective as in (1), i.e., let
\[ T(x) = \tilde{f}(x, y) + \left( T(x) - \tilde{f}(x, y) \right) = T(x) + f(x, y) + (-f(x, y)). \]
The subproblem solved in the partial linearization method then is to minimize
\[ T^{(l)}(x) = T(x) + f(x, x^{(l)}) - \nabla_{x} f(x^{(l)}, x^{(l)})^{T}(x - x^{(l)}), \]
and if $f$ is chosen to satisfy (31), the subproblem objective equals
\[ T^{(l)}(x) = T(x) + f(x, x^{(l)}), \]
i.e., the subproblem introduces a regularization of the original objective by adding to it a strictly convex function. [For a general discussion on regularization methods in nonlinear programming, see, e.g., Ref. 12.]

As an example, consider the function
\[ f(x, y) = \frac{c}{2}(x - y)^{T} I (x - y) = \frac{c}{2}\|x - y\|_{2}^{2}, \]
where $c > 0$. The subproblem, given a feasible point $x^{(l)}$, then is equal to the minimization of the function
\[ T(x) + \frac{c}{2}\left\| x - x^{(l)} \right\|_{2}^{2}, \]
which is equivalent to the subproblem of the proximal point algorithm (Ref. 13). This also suggests the possibility of using partial linearization methods for nondifferentiable problems; this is further discussed in Section 4.
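The proximal point connection can be made concrete with a tiny one-dimensional sketch. This is our own illustration, not from the paper; the quadratic $T(x) = (a/2)x^{2}$ and the constants $a$, $c$ are assumptions chosen so the subproblem has a closed form.

```python
# Tiny 1-D illustration (our example): with f(x, y) = (c/2)(x - y)^2 the
# subproblem min_x T(x) + (c/2)(x - x_l)^2 is exactly the proximal point
# step.  For T(x) = (a/2) x^2 the first order condition a x + c (x - x_l) = 0
# gives the closed form below.

def prox_step(x_l, a, c):
    # argmin_x (a/2) x^2 + (c/2)(x - x_l)^2
    return c * x_l / (a + c)

x = 1.0
for _ in range(10):            # iterating drives x toward argmin T, i.e. 0
    x = prox_step(x, a=1.0, c=4.0)
print(round(x, 6))             # (4/5)^10 ≈ 0.107374
```

Each iteration contracts the distance to the minimizer by the factor $c/(a+c) < 1$, which is the familiar geometric convergence of the proximal point method on strongly convex quadratics.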

3.1. Some Instances of the Partial Linearization Algorithm

By choosing the function $f(x, y)$ in various ways, well known feasible direction methods are recognized. Below we list a few such methods. For further discussions, we refer to Ref. 11.

$f(x, y)$ | Algorithm
$0$ | Frank-Wolfe
$\frac{1}{2} x^{T} \nabla^{2} T(y)\, x$ | constrained Newton
$\frac{1}{2} x^{T} B(y)\, x$ | constrained quasi-Newton
$T(x) + \frac{c}{2}\|x - y\|^{2}$ | proximal point
$\sum_{i=1}^{n} \int_{0}^{x_{i}} \partial T(\sum_{j \neq i} y_{j} e_{j} + s e_{i})/\partial x_{i}\, ds$ | nonlinear Jacobi
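The first row of the table can be checked directly: with $f(x, y) \equiv 0$ the subproblem is the Frank-Wolfe linear program. The short sketch below is our own illustration (the box feasible set and function names are assumptions); over $X = [0,1]^{n}$ the LP is solved coordinate-wise by the sign of the gradient.

```python
import numpy as np

# Minimal sketch (our construction): with f(x, y) = 0 the partial
# linearization subproblem reduces to the Frank-Wolfe LP
#   min_{x in X} grad T(x_l)^T x.
# Over the box X = [0, 1]^n a minimizing vertex is found coordinate-wise:
# x_i = 0 where the gradient component is positive, x_i = 1 otherwise.

def frank_wolfe_subproblem(grad):
    # a vertex of [0,1]^n minimizing the linearized objective
    return np.where(grad > 0, 0.0, 1.0)

g = np.array([0.5, -2.0, 0.0])
print(frank_wolfe_subproblem(g))   # [0. 1. 1.]
```

The returned point is always an extreme point of $X$, which is precisely the source of the tailing-off behavior that the regularized choices of $f$ in the remaining rows are designed to avoid.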

Note that when $T(x)$ is only pseudoconvex, the Hessian $\nabla^{2} T(x)$ is not necessarily positive semidefinite; however, constrained Newton methods can always be modified by adding a positive definite matrix to the Hessian, e.g., letting
\[ f(x, y) = \frac{1}{2} x^{T} \left[ \nabla^{2} T(y) + \varepsilon I \right] x, \]
where $\varepsilon > 0$ is sufficiently large so that the matrix $\nabla^{2} T(y) + \varepsilon I$ is (at least) positive semidefinite.
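A quick numerical check of this modification (our own example; the matrix is a hypothetical stand-in for an indefinite Hessian):

```python
import numpy as np

# Hypothetical check (our example, not from the paper): an indefinite
# "Hessian" H becomes positive semidefinite after the shift H + eps*I
# with eps at least as large as the magnitude of its most negative
# eigenvalue, making the constrained Newton choice
# f(x, y) = 0.5 * x^T (H + eps*I) x convex in x.

H = np.array([[2.0, 3.0],
              [3.0, 1.0]])          # indefinite: det(H) = -7 < 0

min_eig = np.linalg.eigvalsh(H).min()
eps = max(0.0, -min_eig)            # smallest shift that works
H_shifted = H + eps * np.eye(2)

print(min_eig < 0)                                    # True
print(np.linalg.eigvalsh(H_shifted).min() >= -1e-12)  # True: now PSD
```

In practice one would pick $\varepsilon$ strictly larger than $-\lambda_{\min}$ to obtain a positive definite, hence strictly convex, subproblem objective.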

4. Nondifferentiable Programming

In this section we extend the partial linearization algorithm to nondifferentiable programming. Let the objective $T(x)$ be given as
\[ T(x) = g(x) + h(x), \tag{48} \]
where $g(x)$ :
