JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS Vol. 98, No. 3, pp. 663-680, September 1998

© 1998 Plenum Publishing Corporation

ERROR STABILITY PROPERTIES OF GENERALIZED GRADIENT-TYPE ALGORITHMS

M. V. Solodov† and S. K. Zavriev‡

ABSTRACT. We present a unified framework for convergence analysis of generalized subgradient-type algorithms in the presence of perturbations. One of the principal novel features of our analysis is that perturbations need not tend to zero in the limit. It is established that the iterates of the algorithms are attracted, in a certain sense, to an ε-stationary set of the problem, where ε depends on the magnitude of perturbations. A characterization of those attraction sets is given in the general (nonsmooth and nonconvex) case. The results are further strengthened for convex, weakly sharp, and strongly convex problems. Our analysis extends and unifies previously known results on convergence and stability properties of gradient and subgradient methods, including their incremental, parallel, and "heavy ball" modifications.

Key words. Error stability, perturbation analysis, gradient-type methods, incremental algorithms, approximate solutions.

AMS subject classifications. 90C31, 49M07, 65K10.

* The first author is supported in part by CNPq Grant 300734/95-6. Research of the second author was supported in part by International Science Foundation Grant NBY000, International Science Foundation and Russian Government Grant NBY300, and Russian Foundation for Fundamental Research Grant N 95-01-01448.

† Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro, RJ, CEP 22460-320, Brazil. Email: [email protected].

‡ Operations Research Department, Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow, Russia, 119899.

1 Introduction

We consider the general optimization problem

    min_{x∈X} f(x),    (1.1)

where X is a convex compact set in ℝⁿ. For a set-valued map F, the limit operation lt is defined by

    lt_{x'→x, m→∞} F(m, x') := { y : there exist sequences {x'_i}, {m_i} and {y_i} such that y_i ∈ F(m_i, x'_i), i = 1, 2, …, {x'_i} → x, {m_i} → ∞ as i → ∞, and y = lim_{i→∞} y_i }.
In particular, for a bounded sequence {x^i} ⊂ X, lt_{i→∞}{x^i} is the set of all accumulation points of {x^i}. We say that a sequence {x^i} converges into a set C if lt_{i→∞}{x^i} ⊆ C.
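The "lt" notation can be made concrete with a small numeric illustration (Python, not from the paper): the sequence x^i = (−1)^i (1 + 1/i) does not converge, but its set of accumulation points is lt_{i→∞}{x^i} = {−1, +1}, so the sequence converges into any set C containing these two points.

```python
# Accumulation points of x^i = (-1)^i * (1 + 1/i): the even-index
# subsequence tends to +1 and the odd-index subsequence to -1, so
# lt_{i->oo}{x^i} = {-1, +1} even though {x^i} itself diverges.
def x_seq(i):
    return (-1) ** i * (1 + 1 / i)

even_tail = [x_seq(i) for i in range(1000, 1010, 2)]   # near +1
odd_tail = [x_seq(i) for i in range(1001, 1011, 2)]    # near -1

assert all(abs(v - 1) < 1e-2 for v in even_tail)
assert all(abs(v + 1) < 1e-2 for v in odd_tail)

# "Converges into" C = [-1.5, 1.5]: every accumulation point lies in C,
# although the sequence has no single limit.
```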

Note that under our assumptions, for all x ∈ X we have

    lt_{x' (∈X) → x} N_X(x') = N_X(x)

and

    lt_{x' (∈X) → x, ε' (∈A) → 0} ∂f_j(x', ε') ⊆ ∂f_j(x, 0).    (1.5)

The rest of the paper is organized as follows. In Section 2 we outline the Generalized Lyapunov Direct Method for stability analysis. In Section 3 we establish convergence properties of the generalized subgradient projection method and its modifications in the presence of data perturbations. Section 4 contains some new results for a number of special cases. These include the case of asymptotically (relatively) small perturbations and convex, weakly sharp and strongly convex problems.

2 Generalized Lyapunov Direct Method

In this section we outline the novel convergence analysis technique that was first proposed in [29] (albeit in a slightly different form). This very useful technique can be viewed as a generalization of the Lyapunov Direct Method for convergence analysis of nonlinear iterative processes. The classical Lyapunov Direct Method is a powerful tool for stability analysis of both continuous and discrete time processes [19, 25, 18]. Roughly speaking, this approach reduces the analysis of stability properties of a process to the analysis of local improvement of this process with respect to some scalar criterion V(·) (usually called the Lyapunov function). In the classical approach, V(·) monotonically decreases from one iterate of the process to the next. Some typical choices for V(·) are the objective function being minimized, or the norm of some optimality measure (see [18]).

It should be noted that in certain situations, for example in the presence of perturbations or in incremental methods, one cannot exhibit a function which is guaranteed to decrease from one iteration of the algorithm to the next. This makes the classical analysis not applicable. The key difference of the technique presented in this section is that the monotonicity requirement for V(·) is relaxed. We thus refer to V(·) as a pseudo-Lyapunov function. This generalization makes our approach applicable to a wider class of algorithms, including incremental methods and methods with perturbations.

We now state the Generalized Lyapunov Direct Method. Convergence (attraction) properties of the process are expressed in terms of a pseudo-Lyapunov function V(·). For each specific algorithm, those properties allow further interpretation depending on the choice of V(·) for this algorithm. We consider the following general iterative process:

    x^{i+1} ∈ x^i − γ_i G(ε_i, x^i) − ξ^i,    i = 0, 1, …,    x^0 ∈ X^0,    ξ^i ∈ Ξ.

These conditions imply that r(x) = 0. Since X_s(0) ⊆ X_s(ε), we have that x ∈ X_s(ε). If r(x) ≤ ε(x), then r(x) ≤ ε(x) ≤ ε̄, and hence x ∈ X_s(ε).
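To illustrate the role of a pseudo-Lyapunov function, here is a minimal Python sketch of an iteration of the general perturbed form above. The problem, stepsize, and perturbation model are all hypothetical choices, not the paper's conditions: we minimize f(x) = x² over X = [−2, 2] with a perturbation that never vanishes. The objective is not monotone along the iterates, yet they are attracted to a neighborhood of the minimizer whose size scales with the perturbation bound.

```python
import random

# Illustrative sketch (assumed test problem, not the paper's process):
# minimize f(x) = x^2 on X = [-2, 2] with a bounded perturbation xi^i
# that does NOT tend to zero.  f decreases only "on average": it acts
# as a pseudo-Lyapunov function, not a monotone Lyapunov function.
random.seed(0)

def proj(x, lo=-2.0, hi=2.0):          # projection onto the compact set X
    return max(lo, min(hi, x))

f = lambda x: x * x
x, gamma, eps = 1.5, 0.1, 0.05         # stepsize gamma, perturbation bound eps
values = []
for i in range(200):
    xi = random.uniform(-eps, eps)     # perturbation, |xi^i| <= eps forever
    x = proj(x - gamma * (2 * x + xi))
    values.append(f(x))

# f was not monotone along the iterates ...
assert any(values[i + 1] > values[i] for i in range(len(values) - 1))
# ... yet the iterate is attracted to a neighborhood of the minimizer
# whose radius is of the order of the perturbation magnitude.
assert abs(x) <= eps
```

This is exactly the situation the relaxed monotonicity requirement is designed to handle: no single function decreases at every step, but attraction to an ε-dependent set can still be established.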

Let d(·, C) be the distance function to the set C ⊂ ℝⁿ. We say that X_opt is a set of weak sharp minima with parameter μ > 0 if

    f(x) − min_{y∈X} f(y) ≥ μ d(x, X_opt)    ∀ x ∈ X.

The following lemma shows that for problems with weak sharp minima, ε-stationary sets coincide with the set of minima, provided ε is small relative to the parameter μ.
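As a concrete check of the definition (illustrative, not from the paper): f(x) = |x| on X = [−1, 1] has X_opt = {0} and satisfies the weak sharpness inequality with μ = 1, whereas f(x) = x² admits no μ > 0 on X because its growth near the minimizer is only quadratic.

```python
# Weak sharp minima check for f(x) = |x| on X = [-1, 1]:
# X_opt = {0}, and f(x) - min f = |x| = 1 * d(x, X_opt), so the weak
# sharpness inequality holds with parameter mu = 1 (with equality).
f = abs
mu = 1.0
xs = [i / 100 for i in range(-100, 101)]        # grid over X = [-1, 1]
min_f = min(f(x) for x in xs)                   # = 0, attained at x = 0
assert all(f(x) - min_f >= mu * abs(x - 0.0) for x in xs)

# By contrast, f(x) = x^2 is NOT weakly sharp on [-1, 1]: near 0 the
# inequality x^2 >= mu * |x| fails for every fixed mu > 0.
assert any(x * x < 0.5 * abs(x) for x in xs if x != 0)
```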

Lemma 4.3 Let f(·) be convex on X. Assume that X_opt is a set of weak sharp minima with parameter μ > 0. If ε(x) ≤ max{δ, η r(x)} for all x ∈ X, where η ∈ [0, 1) and δ ∈ [0, μ), then X_s(ε) = X_opt.


Proof. Obviously, X_opt ⊆ X_s(ε). Take any x ∈ X_s(ε). By Lemmas 4.1 and 4.2 and our assumption, we have x ∈ X_s(ε(·)) ⊆ X_s(δ) ⊆ X_opt(δ d(x, X_opt)). Hence

    μ d(x, X_opt) ≤ f(x) − min_{y∈X} f(y) ≤ δ d(x, X_opt).

Now δ < μ implies that d(x, X_opt) = 0, that is, x ∈ X_opt.

In conclusion, we summarize convergence and stability properties of the serial generalized gradient projection method with a "heavy ball" term:

Algorithm 4.1 (GGPM with a heavy ball term). Start with any x^0 ∈ X. Having x^i, compute x^{i+1} as follows:

    x^{i+1} = P_X[ x^i − γ_i (g^i + ξ^i) + β_i (x^i − x^{i−1}) ],    g^i ∈ ∂f(x^i, ε_i),    i = 0, 1, …,

where ξ^i = ξ(x^i, ·, ·) denotes the perturbation term and the parameters γ_i, β_i, ε_i are the same as specified in Section 3.
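A minimal Python sketch of an iteration of this form follows. The stepsize rule γ_i = 1/i, momentum weight β_i, perturbation model, and test problem f(x) = |x| on X = [−1, 1] are all illustrative assumptions, not the parameter conditions of Section 3.

```python
import random

# Illustrative sketch of a projected subgradient step with a bounded
# perturbation and a heavy-ball momentum term, applied to the nonsmooth
# problem f(x) = |x| on X = [-1, 1] (all parameter choices assumed).
random.seed(1)

def proj(x, lo=-1.0, hi=1.0):                 # P_X: projection onto X
    return max(lo, min(hi, x))

def subgrad(x):                               # an element of the subdifferential of |x|
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

x_prev, x = 0.9, 0.9                          # take x^{-1} = x^0
eps = 0.05                                    # assumed perturbation bound
for i in range(1, 2000):
    gamma = 1.0 / i                           # diminishing stepsize gamma_i
    beta = 0.5 * gamma                        # heavy-ball weight beta_i (assumed choice)
    xi = random.uniform(-eps, eps)            # perturbation that never vanishes
    x_next = proj(x - gamma * (subgrad(x) + xi) + beta * (x - x_prev))
    x_prev, x = x, x_next

# With nonvanishing perturbations, the iterate is only attracted to a
# neighborhood of X_opt = {0}, not to X_opt itself.
assert abs(x) < 0.1
```

With nonvanishing perturbations one cannot expect convergence to X_opt itself; the final check only illustrates attraction to an ε-dependent neighborhood, which is the kind of statement Theorem 4.1 makes precise.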

Combining Theorems 3.1, 3.2, and Lemmas 4.1-4.3, we immediately obtain the following convergence results for Algorithm 4.1.

Theorem 4.1 Every sequence {x^i} generated by Algorithm 4.1 possesses the following properties:

1. there exists an f(·)-connected component X̄_s(ε) of X_s(ε) such that

    lt_{i→∞} {f(x^i)} = f( lt_{i→∞} {x^i} ∩ X̄_s(ε) );
2. every subsequence {x^{i_m}} of {x^i} satisfying (3.10) converges into X̄_s(ε);

3. if perturbations are relatively small, that is, ε(x) ≤ η r(x) for all x ∈ X and some η ∈ [0, 1), and if the set f(X_s) is nowhere dense in ℝ, then {x^i} converges into X_s;

5. if X_opt is a set of weak sharp minima with parameter μ > 0, and ε(x) < μ for all x ∈ X, then {x^i} converges into X_opt;


6. if f(·) is strongly convex with modulus μ > 0, and X_s(ε̄) ⊆ int X, where ε̄ := sup_{x∈X} ε(x), then {x^i} converges into X_opt(ε̄²/2μ).
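The quantity ε̄²/2μ in item 6 is the standard strong-convexity estimate: if f is strongly convex with modulus μ and the (sub)gradient at x has norm at most ε̄, then f(x) − min f ≤ ε̄²/(2μ). A minimal numeric check on the hypothetical example f(x) = (μ/2)x², where the gradient equals ε̄ exactly at x = ε̄/μ and the estimate holds with equality:

```python
# Strong-convexity bound: for f(x) = (mu/2) x^2 (modulus mu, min f = 0),
# the point with gradient norm eps_bar is x = eps_bar / mu, and there
# f(x) = eps_bar^2 / (2 mu), matching the attraction set X_opt(eps^2/2mu).
mu, eps_bar = 2.0, 0.1
f = lambda x: 0.5 * mu * x * x
x = eps_bar / mu                                  # f'(x) = mu * x = eps_bar
assert abs(mu * x - eps_bar) < 1e-12              # gradient norm is exactly eps_bar
assert f(x) <= eps_bar ** 2 / (2 * mu) + 1e-12    # f(x) - min f <= eps^2 / (2 mu)
```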

Theorem 4.1 extends, strengthens, and unifies results on convergence and stability properties of the generalized subgradient projection method given in [16, 8, 28].

References

[1] Ya.I. Alber, A.N. Iusem, and M.V. Solodov. On the projected subgradient methods for nonsmooth convex optimization in a Hilbert space. Mathematical Programming, to appear.

[2] B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1979.

[3] D.P. Bertsekas. A new class of incremental gradient methods for least squares problems. SIAM Journal on Optimization, to appear.

[4] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.

[5] D.P. Bertsekas. Incremental least squares methods and the extended Kalman filter. SIAM Journal on Optimization, 6:807-822, 1996.

[6] J.V. Burke and M.C. Ferris. Weak sharp minima in mathematical programming. SIAM Journal on Control and Optimization, 31(5):1340-1359, 1993.

[7] F.H. Clarke. Optimization and Nonsmooth Analysis. John Wiley & Sons, New York, 1983.

[8] P.A. Dorofeev. On some properties of the quasi-gradient method. USSR Computational Mathematics and Mathematical Physics, 25:181-189, 1985.

[9] T. Khanna. Foundations of Neural Networks. Addison-Wesley, New Jersey, 1989.

[10] Z.-Q. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3:226-245, 1991.

[11] Z.-Q. Luo and P. Tseng. Analysis of an approximate gradient projection method with applications to the backpropagation algorithm. Optimization Methods and Software, 4:85-101, 1994.

[12] O.L. Mangasarian. Mathematical programming in neural networks. ORSA Journal on Computing, 5(4):349-360, 1993.

[13] O.L. Mangasarian and M.V. Solodov. Backpropagation convergence via deterministic nonmonotone perturbed minimization. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, pages 383-390, San Francisco, CA, 1994. Morgan Kaufmann Publishers.

[14] O.L. Mangasarian and M.V. Solodov. Serial and parallel backpropagation convergence via nonmonotone perturbed minimization. Optimization Methods and Software, 4:103-116, 1994.

[15] D.Q. Mayne and E. Polak. Nondifferentiable optimization via adaptive smoothing. Journal of Optimization Theory and Applications, 43(4):601-614, 1984.

[16] V.S. Mikhalevitch, A.M. Gupal, and V.I. Norkin. Methods of Nonconvex Optimization. Nauka, Moscow, 1987. In Russian.

[17] H. Paugam-Moisy. On parallel algorithm for backpropagation by partitioning the training set. In Neural Networks and Their Applications: Proceedings of the Fifth International Conference, Nîmes, France, November 2-6, 1992.

[18] B.T. Polyak. Introduction to Optimization. Optimization Software, Inc., Publications Division, New York, 1987.

[19] N. Rouche, P. Habets, and M. Laloy. Stability Theory by Liapunov's Direct Method. Springer-Verlag, New York, 1977.

[20] M.V. Solodov. Incremental gradient algorithms with stepsizes bounded away from zero. Computational Optimization and Applications, to appear.

[21] M.V. Solodov. Convergence analysis of perturbed feasible descent methods. Journal of Optimization Theory and Applications, 93:337-353, 1997.

[22] M.V. Solodov. New inexact parallel variable distribution algorithms. Computational Optimization and Applications, 7:165-182, 1997.

[23] M.V. Solodov and B.F. Svaiter. Descent methods with linesearch in the presence of perturbations. Journal of Computational and Applied Mathematics, 80:265-275, 1997.

[24] P. Tseng. Incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization, to appear.

[25] W.I. Zangwill. Nonlinear Programming: A Unified Approach. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1969.

[26] S.K. Zavriev. Stochastic Subgradient Methods for Minimax Problems. Izdatelstvo MGU, Moscow, 1984. In Russian.

[27] S.K. Zavriev. Convergence properties of the gradient method under variable level interference. USSR Computational Mathematics and Mathematical Physics, 30:997-1007, 1990.

[28] S.K. Zavriev and A.G. Perevozchikov. Attraction of trajectories of finite-difference inclusions and stability of numerical methods of stochastic nonsmooth optimization. Soviet Physics Doklady, 313:1373-1376, 1990.

[29] S.K. Zavriev and A.G. Perevozchikov. Direct Lyapunov's method in attraction analysis of finite-difference inclusions. USSR Computational Mathematics and Mathematical Physics, 30(1):22-32, 1990.
