Mathematical Programming 58 (1993) 353-367 North-Holland


A nonsmooth version of Newton's method

Liqun Qi*
School of Mathematics, The University of New South Wales, Kensington, NSW, Australia

Jie Sun**
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA

Received 27 December 1989
Revised manuscript received 2 July 1991

Newton's method for solving a nonlinear equation of several variables is extended to a nonsmooth case by using the generalized Jacobian instead of the derivative. This extension includes the B-derivative version of Newton's method as a special case. Convergence theorems are proved under the condition of semismoothness. It is shown that the gradient function of the augmented Lagrangian for C²-nonlinear programming is semismooth. Thus, the extended Newton's method can be used in the augmented Lagrangian method for solving nonlinear programs.

Key words: Newton's methods, generalized Jacobian, semismoothness.

1. Introduction

Newton's method

    x^{k+1} = x^k − [F'(x^k)]^{-1} F(x^k)    (1.1)

is a classic method for solving the nonlinear equation

    F(x) = 0,    (1.2)

where F: R^n → R^n is a continuously differentiable function, i.e., a smooth function. Many other methods for solving (1.2) are related to this method [12].

Suppose now that F is not a smooth function, but a locally Lipschitzian function. Then formula (1.1) cannot be used. Let ∂F(x^k) be the generalized Jacobian of F at x^k, defined by Clarke [4]. In this case, instead of (1.1), one may use

    x^{k+1} = x^k − V_k^{-1} F(x^k),    (1.3)

where V_k ∈ ∂F(x^k), to solve (1.2). In this paper, we show that local and global convergence results hold for (1.3) when F is semismooth.

Semismoothness was originally introduced by Mifflin [10] for functionals. In Section 2, we extend this definition to functions F: R^n → R^m and show that semismoothness is equivalent to the uniform convergence of directional derivatives in all directions. We also show that a locally Lipschitzian function is semismooth at a point if it has a strong Fréchet derivative at that point.

In Section 3, the local and global convergence results for (1.3) are proved under the semismoothness condition and compared with another nonsmooth version of Newton's method, developed by Robinson [17, 18, 19], Pang [13, 14], and Harker and Xiao [7], which uses the B-derivative instead of the Fréchet derivative in (1.1). The B-derivative was first introduced by Robinson [17]. Pang [13] proved convergence of this version of Newton's method when F has a strong and nonsingular Fréchet derivative at the solution point. We show that this B-derivative version of Newton's method can be regarded as a special version of (1.3). Hence our convergence theorem is stronger than Pang's convergence theorem in a certain sense. We also compare our results with Kummer's results in [9].

In Section 4, we show that the augmented Lagrangian of a C²-nonlinear programming problem has semismooth gradients. Thus, the extended Newton's method can be used in the augmented Lagrangian method for solving nonlinear programs. We also briefly mention other possible applications of this method.

Correspondence to: Dr. Liqun Qi, School of Mathematics, The University of New South Wales, Kensington, NSW 2033, Australia.
* This author's work is supported in part by the Australian Research Council.
** This author's work is supported in part by the National Science Foundation under grant DDM8721709.

2. Semismooth functions of several variables

Suppose F: R^n → R^m is a locally Lipschitzian function. According to Rademacher's theorem, F is differentiable almost everywhere. Denote the set of points at which F is differentiable by D_F. We write JF(x) for the usual m × n Jacobian matrix of partial derivatives whenever x is a point at which the necessary partial derivatives exist. Let ∂F(x) be the generalized Jacobian defined by Clarke in 2.6 of [4]. Then

    ∂F(x) = co{ lim_{x_i → x, x_i ∈ D_F} JF(x_i) }.    (2.1)

By 2.6.5 of [4], for any x, y ∈ R^n,

    F(y) − F(x) ∈ co ∂F([x, y])(y − x),    (2.2)

where the right-hand side denotes the convex hull of all points of the form V(y − x) with V ∈ ∂F(u) for some point u in [x, y]. We will use (2.2) several times later.

Let x be a point in R^n. Assume that for any h ∈ R^n,

    lim_{V ∈ ∂F(x+th), t↓0} {Vh}    (2.3)

exists.


Proposition 2.1. Under this assumption, the classical directional derivative

    F'(x; h) = lim_{t↓0} [F(x + th) − F(x)] / t    (2.4)

exists and is equal to the limit in (2.3); i.e.,

    F'(x; h) = lim_{V ∈ ∂F(x+th), t↓0} {Vh}.    (2.5)

Proof. Since F is locally Lipschitzian,

    [F(x + th) − F(x)] / t    (2.6)

is bounded. Suppose that l is a limiting point of (2.6) as t↓0. Then there are t_j↓0 such that

    l = lim_{j→∞} [F(x + t_j h) − F(x)] / t_j.

It suffices to show that l is equal to the limit in (2.3). By (2.2),

    [F(x + t_j h) − F(x)] / t_j ∈ co ∂F([x, x + t_j h]) h.

By the Carathéodory theorem, there exist t_j^{(k)} ∈ [0, t_j], α_j^{(k)} ∈ [0, 1] and V_j^{(k)} ∈ ∂F(x + t_j^{(k)} h), for k = 0, ..., m, with Σ_{k=0}^m α_j^{(k)} = 1, such that

    [F(x + t_j h) − F(x)] / t_j = Σ_{k=0}^m α_j^{(k)} V_j^{(k)} h.

By passing to a subsequence, we can assume that α_j^{(k)} → α^{(k)} as j → ∞. We have α^{(k)} ∈ [0, 1] for k = 0, ..., m and Σ_{k=0}^m α^{(k)} = 1. Then,

    l = lim_{j→∞} [F(x + t_j h) − F(x)] / t_j
      = lim_{j→∞} Σ_{k=0}^m α_j^{(k)} V_j^{(k)} h
      = Σ_{k=0}^m [lim_{j→∞} α_j^{(k)}] [lim_{j→∞} V_j^{(k)} h]
      = lim_{V ∈ ∂F(x+th), t↓0} {Vh}.

This completes the proof. □

We say that F is semismooth at x if F is locally Lipschitzian at x and

    lim_{V ∈ ∂F(x+th'), h'→h, t↓0} {Vh'}

exists for any h ∈ R^n. Clearly, this assumption is stronger than (2.3), and it implies that the Hadamard directional derivative

    lim_{h'→h, t↓0} [F(x + th') − F(x)] / t = lim_{V ∈ ∂F(x+th'), h'→h, t↓0} {Vh'}.    (2.7)
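As an illustration of how this limit can fail for a general locally Lipschitzian function (the example is ours, not part of the original text), take F(x) = x² sin(1/x) for x ≠ 0 and F(0) = 0 on R. Then F is locally Lipschitzian and differentiable everywhere, with ∂F(u) = {2u sin(1/u) − cos(1/u)} for u ≠ 0 and F'(0; h) = 0 for all h. For x = 0 and h = 1, any V ∈ ∂F(th') equals 2th' sin(1/(th')) − cos(1/(th')), whose cosine term keeps oscillating as t↓0; so neither the limit above nor even the limit in (2.3) exists, and F is not semismooth at 0.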


Semismoothness was originally introduced by Mifflin [10] for functionals. Convex functions, smooth functions and subsmooth functions are examples of semismooth functions. Scalar products and sums of semismooth functions are still semismooth functions; see [10]. For applications of semismoothness and its relationship with other nonsmooth properties, see [1, 2, 3, 5, 6, 10, 11, 15, 21, 26].

We need a lemma for our discussion.

Lemma 2.2. Suppose that F: R^n → R^m is a locally Lipschitzian function and F'(x; h) exists for any h at x. Then
(i) F'(x; ·) is Lipschitzian;
(ii) for any h, there exists a V ∈ ∂F(x) such that F'(x; h) = Vh.

Proof. Suppose L is the Lipschitzian constant near x. Let ‖·‖ stand for the Euclidean norm. For any h, h' ∈ R^n,

    ‖F'(x; h) − F'(x; h')‖ = ‖ lim_{t↓0} [F(x + th) − F(x + th')] / t ‖
                           ≤ lim_{t↓0} ‖F(x + th) − F(x + th')‖ / t ≤ L‖h − h'‖.

This proves (i). For (ii), by (2.2) and the Carathéodory theorem, arguing as in the proof of Proposition 2.1, F'(x; h) = Vh for some limit V of convex combinations of elements of ∂F(u) with u → x. However, due to the definition of the generalized Jacobian, ∂F is closed. Thus, V ∈ ∂F(x). This proves (ii). □

Theorem 2.3. Suppose that F: R^n → R^m is a locally Lipschitzian function. The following statements are equivalent:
(i) F is semismooth at x;
(ii) the right-hand side limit in (2.7) is uniformly convergent for all h with unit norm;
(iii) the right-hand side limit in (2.5) is uniformly convergent for all h with unit norm;
(iv) for any V ∈ ∂F(x + h), h → 0,

    Vh − F'(x; h) = o(‖h‖);    (2.8)

(v)

    lim_{x+h ∈ D_F, h→0} [F'(x + h; h) − F'(x; h)] / ‖h‖ = 0.    (2.9)

Proof. (i) ⇒ (ii). Suppose (ii) does not hold. Then there exist ε > 0, {h^k ∈ R^n: ‖h^k‖ = 1, k = 1, 2, ...}, ĥ^k with ‖h^k − ĥ^k‖ → 0, t_k↓0 and V_k ∈ ∂F(x + t_k ĥ^k) such that

    ‖V_k ĥ^k − F'(x; h^k)‖ ≥ 2ε,    (2.10)


for k = 1, 2, .... By passing to a subsequence, we may assume that h^k → h. Thus, ĥ^k → h too. By Lemma 2.2(i), (2.10) implies that

    ‖V_k ĥ^k − F'(x; h)‖ ≥ ε

for all sufficiently large k. This contradicts the semismoothness assumption.

(ii) ⇒ (iii) ⇒ (iv): obviously.

(iv) ⇒ (i). Suppose that F is not semismooth at x. Then there exist h ∈ R^n, h^k → h, ε > 0, t_k↓0 and V_k ∈ ∂F(x + t_k h^k) such that

    ‖V_k h^k − F'(x; h)‖ ≥ 2ε,    (2.11)

for k = 1, 2, .... By Lemma 2.2(i), (2.11) implies that

    ‖V_k h^k − F'(x; h^k)‖ ≥ ε

for all sufficiently large k. This contradicts (2.8).

(iv) ⇒ (v), according to Lemma 2.2(ii).

(v) ⇒ (iv). Given ε > 0, by (2.9) there exists δ > 0 such that for any h ∈ R^n satisfying ‖h‖ ≤ δ and x + h ∈ D_F,

    ‖F'(x + h; h) − F'(x; h)‖ ≤ ε‖h‖.

Assume now that ‖h‖ ≤ [...]

Corollary 2.4. Suppose that F: R^n → R^m is a locally Lipschitzian function. If each component of F is semismooth at x, then F is semismooth at x.

Proof. For x + h ∈ D_F, h → 0,

    ‖F'(x + h; h) − F'(x; h)‖ ≤ Σ_{i=1}^m |F_i'(x + h; h) − F_i'(x; h)|.

Applying (2.9) to each F_i, we have

    F_i'(x + h; h) − F_i'(x; h) = o(‖h‖)

for each i. We thus get (2.9) for F. □

Recall the definition of the strong Fréchet derivative [12]: the Fréchet derivative F'(x) is said to be strong if

    lim_{y→x, z→x} [F(z) − F(y) − F'(x)(z − y)] / ‖z − y‖ = 0.    (2.16)

Corollary 2.5. If F has a strong Fréchet derivative at x, then F is semismooth at x.

Proof. In (2.16), let y = x + h ∈ D_F, z = y + th, and let t↓0 first. Then we get (2.9). □

If for any V ∈ ∂F(x + h), h → 0,

    Vh − F'(x; h) = O(‖h‖^{1+p}),

where 0 < p ≤ 1, then we say that F is p-order semismooth at x. Note that p-order semismoothness (0 < p ≤ 1) implies semismoothness.

Remark. By (2.2), if F is semismooth at x, then for any h → 0,

    F(x + h) − F(x) − F'(x; h) = o(‖h‖);    (2.17)

if F is p-order semismooth at x, then for any h → 0,

    F(x + h) − F(x) − F'(x; h) = O(‖h‖^{1+p}).
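As a simple illustration of p-order semismoothness (this example is ours, not part of the original text), take F(x) = |x| on R. Here ∂F(x) = {sign(x)} for x ≠ 0 and ∂F(0) = [−1, 1], while F'(0; h) = |h|. For any h ≠ 0 and V ∈ ∂F(0 + h) we have V = sign(h), so Vh − F'(0; h) = |h| − |h| = 0 = O(‖h‖²); hence |·| is 1-order semismooth at 0, and smooth elsewhere. By Corollary 2.4, the map x ↦ (|x_1|, ..., |x_n|) is then semismooth on R^n as well.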

3. A nonsmooth version of Newton's method

Suppose F: R^n → R^n is locally Lipschitzian. We wish to find the solution of the equation

    F(x) = 0.    (3.1)

Let us consider the nonsmooth version of Newton's method

    x^{k+1} = x^k − V_k^{-1} F(x^k),  where V_k ∈ ∂F(x^k).    (3.2)
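To make the iteration concrete, the following minimal Python sketch implements (3.2); it is our own illustration rather than part of the paper, and the oracle gen_jac (returning one element of the generalized Jacobian of F), the tolerance and the test function are all assumptions made for the example.

    # Sketch of iteration (3.2): x^{k+1} = x^k - V_k^{-1} F(x^k) with V_k taken
    # from the generalized Jacobian of F at x^k.  F and gen_jac are user-supplied;
    # gen_jac(x) is a hypothetical oracle returning one such element.
    import numpy as np

    def nonsmooth_newton(F, gen_jac, x0, tol=1e-12, max_iter=50):
        x = np.asarray(x0, dtype=float)
        history = [x.copy()]
        for _ in range(max_iter):
            Fx = F(x)
            if np.linalg.norm(Fx) <= tol:
                break
            V = gen_jac(x)                     # any V_k in the generalized Jacobian
            x = x - np.linalg.solve(V, Fx)     # the step (3.2)
            history.append(x.copy())
        return x, history

    # Example: F(x) = x + x|x| is 1-order (strongly) semismooth with root x* = 0,
    # and its generalized Jacobian is {1 + 2|x|}; by Theorem 3.2 below the errors
    # should roughly square at each step (order 1 + p with p = 1).
    F = lambda x: x + x * np.abs(x)
    gen_jac = lambda x: np.array([[1.0 + 2.0 * abs(x[0])]])
    root, hist = nonsmooth_newton(F, gen_jac, np.array([0.5]))
    print([float(abs(z[0])) for z in hist])    # ~0.5, 0.125, 0.0125, 1.5e-4, 2.3e-8, ...

In the smooth case this reduces to the classical iteration (1.1), since the generalized Jacobian is the singleton {F'(x)} whenever F is continuously differentiable near x.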


Proposition 3.1. If all V ∈ ∂F(x) are nonsingular, then there is a neighborhood N(x) of x and a constant C such that for any y ∈ N(x) and any V ∈ ∂F(y), V is nonsingular and

    ‖V^{-1}‖ ≤ C.

Proof. If the conclusion is not true, then there is a sequence y^k → x, V_k ∈ ∂F(y^k) such that either all V_k are singular or ‖V_k^{-1}‖ → ∞. Since F is locally Lipschitzian, ∂F is bounded in a neighborhood of x. By passing to a subsequence, we may assume that V_k → V; since ∂F is closed, V ∈ ∂F(x). Then V must be singular, a contradiction to the assumption of this proposition. This completes the proof. □

Theorem 3.2 (local convergence). Suppose that x* is a solution of (3.1), F is locally Lipschitzian and semismooth at x*, and all V ∈ ∂F(x*) are nonsingular. Then the iteration method (3.2) is well-defined and convergent to x* in a neighborhood of x*. If, in addition, F is p-order semismooth at x*, then the convergence of (3.2) is of order 1 + p.

Proof. By Proposition 3.1, (3.2) is well-defined in a neighborhood of x* for the first step k = 0. Now

    ‖x^{k+1} − x*‖ = ‖x^k − x* − V_k^{-1} F(x^k)‖
                   ≤ ‖V_k^{-1}[F(x^k) − F(x*) − F'(x*; x^k − x*)]‖
                     + ‖V_k^{-1}[V_k(x^k − x*) − F'(x*; x^k − x*)]‖    (3.3)
                   = o(‖x^k − x*‖).

The last equality is due to Proposition 3.1, (2.8), and (2.17). The case that F is p-order semismooth at x* is similar. □

Theorem 3.3 (global convergence). Suppose that F is locally Lipschitzian and semismooth on S = {x ∈ R^n: ‖x − x^0‖ ≤ [...]

[...]

    [...] ‖Vh − F'(x; h)‖.    (3.9)

By (3.8) and (3.9), ‖Vh − F'(x; h)‖ ≤ [...]

[...]

    ∇ψ(x, s) = [...],  if s + r f(x) > 0;    ∇ψ(x, s) = [...],  if s + r f(x) < 0.    (4.2)

This means that ∇ψ is smooth except on the surface s + r f(x) = 0. Let (x̄, s̄) be on this surface, i.e., assume that

    s̄ + r f(x̄) = 0.    (4.3)

We now show that ∇ψ(·, ·) is semismooth at (x̄, s̄). Let (h, α) ∈ R^{n+1}. It suffices to show that

    lim_{V ∈ ∂∇ψ(x̄+th', s̄+tα'), h'→h, α'→α, t↓0} {V (h', α')^T}    (4.4)

exists. If

    s̄ + tα' + r f(x̄ + th')    (4.5)

keeps the same sign as h' → h, α' → α and t↓0, then by (4.2) it is easy to see that (4.4) exists. Assume that (4.5) has different signs as h' → h, α' → α and t↓0. Then there are h_j → h, α_j → α and t_j↓0 such that

    s̄ + t_j α_j + r f(x̄ + t_j h_j) = 0.    (4.6)

By (4.3) and (4.6),

    t_j α_j + r [f(x̄ + t_j h_j) − f(x̄)] = 0,

i.e.,

    α_j + r [f(x̄ + t_j h_j) − f(x̄)] / t_j = 0.

Since f ∈ C², we have α + r f'(x̄; h) = 0, i.e.,

    α + r ∇f(x̄)^T h = 0.    (4.7)

Now consider (h', α', t). Let V ∈ ∂∇ψ(x̄ + th', s̄ + tα'). If (4.5) is negative, by (4.2),

    V = ( 0     0   )
        ( 0   −1/r  ).

Thus,

    V (h', α')^T = (0, −α'/r)^T → (0, −α/r)^T,    (4.8)


as h' → h, α' → α and t↓0. If (4.5) is positive, by (4.2),

    V = ( r ∇f(x̄+th') ∇f(x̄+th')^T + (s̄ + tα' + r f(x̄+th')) ∇²f(x̄+th')    ∇f(x̄+th') )
        ( (∇f(x̄+th'))^T                                                        0       ).

Thus,

    V (h', α')^T = ( ∇f(x̄+th')[r (∇f(x̄+th'))^T h' + α'] + (s̄ + tα' + r f(x̄+th')) ∇²f(x̄+th') h',  (∇f(x̄+th'))^T h' )^T
                 → ( ∇f(x̄)[r (∇f(x̄))^T h + α] + (s̄ + r f(x̄)) ∇²f(x̄) h,  (∇f(x̄))^T h )^T = (0, −α/r)^T,    (4.9)

as h' → h, α' → α and t↓0, where the last equality in (4.9) is due to (4.3) and (4.7). By (4.8) and (4.9), V(h', α')^T tends to the same limit in these two cases. When (4.5) is zero, by (2.1) and (4.2), V is a convex combination of the corresponding expressions in the above two cases. Hence V(h', α')^T also tends to the same limit in this case. Thus, (4.4) exists. This shows that ∇ψ(·, ·) is semismooth at (x̄, s̄). The proof is completed. □

The basic step of the augmented Lagrangian method for solving (4.1) is to minimize L_r(x, y). Thus, the extended Newton's method (3.2) and Theorem 3.2 can be applied here.

There are other examples of optimization problems whose objective functions are smooth with semismooth derivatives but are not twice differentiable. For example, the extended linear-quadratic program, which arises in optimal control and stochastic programming, is such a problem. The primal and dual objective functions of extended linear-quadratic programming problems are piecewise linear-quadratic and smooth; thus their derivatives are piecewise linear, hence semismooth. See [22-24]. For applications of nonsmooth versions of Newton's method to nonlinear complementarity problems, variational inequalities and the Karush-Kuhn-Tucker system of nonlinear programming, see [13] and [14]. The extended Newton's method can also be used as a core step in interior point algorithms for convex programming. With this step, the second-order continuity requirements commonly imposed by interior point methods on the objective function and the constraints are reduced to a semismooth condition on their gradients, while the total number of iterations remains of the same order as in the smooth case. See [16, 27] for such extensions.
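As a concrete illustration of how Section 4 feeds into the iteration (3.2), the following Python sketch (our own, not from the paper) evaluates the gradient of a single-constraint penalty term of the assumed form ψ(x, s) = (1/(2r))[max{0, s + r f(x)}² − s²], together with one element of the generalized Jacobian of that gradient; the function names, the tie-break on the kink surface and the penalty form itself are assumptions consistent with the piecewise structure discussed above.

    # Sketch (not from the paper): gradient of an assumed single-constraint
    # penalty term
    #     psi(x, s) = (1/(2r)) * [ max(0, s + r*f(x))**2 - s**2 ],
    # and one element of the generalized Jacobian of grad psi.  The data f,
    # grad_f, hess_f are user-supplied C^2 quantities; the tie-break at
    # s + r*f(x) == 0 picks one admissible element of the convex hull (2.1).
    import numpy as np

    def grad_psi(x, s, r, f, grad_f):
        t = s + r * f(x)
        if t > 0:
            return np.append(t * grad_f(x), f(x))       # ( t * grad f(x), f(x) )
        return np.append(np.zeros_like(x), -s / r)      # ( 0, -s/r )

    def gen_jac_grad_psi(x, s, r, f, grad_f, hess_f):
        n = x.size
        t = s + r * f(x)
        V = np.zeros((n + 1, n + 1))
        if t >= 0:                                       # "positive" branch of (4.2)
            g = grad_f(x)
            V[:n, :n] = r * np.outer(g, g) + t * hess_f(x)
            V[:n, n] = g
            V[n, :n] = g
        else:                                            # "negative" branch of (4.2)
            V[n, n] = -1.0 / r
        return V

A routine of this kind could serve as the gen_jac oracle of the sketch after (3.2) when Newton's method is applied to the gradient of the augmented Lagrangian, with the smooth terms contributing their ordinary Hessian blocks.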

Acknowledgments

We are grateful to Terry Rockafellar and Steve Robinson for their long-term help. In particular, Terry Rockafellar provided us with the augmented Lagrangian example in Section 4 during his visit to Sydney, and Steve Robinson provided us with several references.


We also thank Peter Kall and Danny Ralph for references. In addition, the comments of Claude Lemaréchal, the referees, and the associate editor were very helpful in revising this paper.

References

[1] J.V. Burke and L. Qi, "Weak directional closedness and generalized subdifferentials," Journal of Mathematical Analysis and Applications 159 (1991) 485-499.
[2] R.W. Chaney, "Second-order necessary conditions in constrained semismooth optimization," SIAM Journal on Control and Optimization 25 (1987) 1072-1081.
[3] R.W. Chaney, "Second-order necessary conditions in semismooth optimization," Mathematical Programming 40 (1988) 95-109.
[4] F.H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983).
[5] R. Correa and A. Jofre, "Some properties of semismooth and regular functions in nonsmooth analysis," in: L. Contesse, R. Correa and A. Weintraub, eds., Proceedings of the IFIP Working Conference (Springer, Berlin, 1987).
[6] R. Correa and A. Jofre, "Tangent continuous directional derivatives in nonsmooth analysis," Journal of Optimization Theory and Applications 6 (1989) 1-21.
[7] P.T. Harker and B. Xiao, "Newton's method for the nonlinear complementarity problem: A B-differentiable equation approach," Mathematical Programming 48 (1990) 339-357.
[8] M. Kojima and S. Shindo, "Extensions of Newton and quasi-Newton methods to systems of PC¹ equations," Journal of the Operations Research Society of Japan 29 (1986) 352-374.
[9] B. Kummer, "Newton's method for non-differentiable functions," in: J. Guddat, B. Bank, H. Hollatz, P. Kall, D. Klatte, B. Kummer, K. Lommatzsch, L. Tammer, M. Vlach and K. Zimmermann, eds., Advances in Mathematical Optimization (Akademie-Verlag, Berlin, 1988) pp. 114-125.
[10] R. Mifflin, "Semismooth and semiconvex functions in constrained optimization," SIAM Journal on Control and Optimization 15 (1977) 957-972.
[11] R. Mifflin, "An algorithm for constrained optimization with semismooth functions," Mathematics of Operations Research 2 (1977) 191-207.
[12] J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York, 1970).
[13] J.S. Pang, "Newton's method for B-differentiable equations," Mathematics of Operations Research 15 (1990) 311-341.
[14] J.S. Pang, "A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems," Mathematical Programming 51 (1991) 101-131.
[15] L. Qi, "Semismoothness and decomposition of maximal normal operators," Journal of Mathematical Analysis and Applications 146 (1990) 271-279.
[16] L. Qi and J. Sun, "A nonsmooth version of Newton's method and an interior point algorithm for convex programming," Applied Mathematics Preprint 89/33, School of Mathematics, The University of New South Wales (Kensington, NSW, 1989).
[17] S.M. Robinson, "Local structure of feasible sets in nonlinear programming, part III: stability and sensitivity," Mathematical Programming Study 30 (1987) 45-66.
[18] S.M. Robinson, "An implicit function theorem for B-differentiable functions," Industrial Engineering Working Paper, University of Wisconsin (Madison, WI, 1988).
[19] S.M. Robinson, "Newton's method for a class of nonsmooth functions," Industrial Engineering Working Paper, University of Wisconsin (Madison, WI, 1988).
[20] R.T. Rockafellar, "Proximal subgradients, marginal values, and augmented Lagrangians in nonconvex optimization," Mathematics of Operations Research 6 (1981) 427-437.
[21] R.T. Rockafellar, "Favorable classes of Lipschitz-continuous functions in subgradient optimization," in: E. Nurminski, ed., Nondifferentiable Optimization (Pergamon Press, New York, 1982) pp. 125-143.


[22] R.T. Rockafellar, "Linear-quadratic programming and optimal control," SIAM Journal on Control and Optimization 25 (1987) 781-814.
[23] R.T. Rockafellar, "Computational schemes for solving large-scale problems in extended linear-quadratic programming," Mathematical Programming 48 (1990) 447-474.
[24] R.T. Rockafellar and R.J-B. Wets, "Generalized linear-quadratic problems of deterministic and stochastic optimal control in discrete time," SIAM Journal on Control and Optimization 28 (1990) 810-822.
[25] A. Shapiro, "On concepts of directional differentiability," Journal of Optimization Theory and Applications 66 (1990) 477-487.
[26] J.E. Spingarn, "Submonotone subdifferentials of Lipschitz functions," Transactions of the American Mathematical Society 264 (1981) 77-89.
[27] J. Sun, "An affine-scaling method for linearly constrained convex programs," Preprint, Department of IEMS, Northwestern University (Evanston, IL, 1990).
