Interior-Point Gradient Methods with Diagonal-Scalings for Simple-Bound Constrained Optimization

Yin Zhang

CAAM Technical Report TR04-06
Department of Computational and Applied Mathematics
Rice University, Houston, Texas

April 2004
Abstract

In this paper, we study diagonally scaled gradient methods for simple-bound constrained optimization in a framework almost identical to that for unconstrained optimization, except that iterates are kept within the interior of the feasible region. We establish a satisfactory global convergence theory for such interior-point gradient methods applied to Lipschitz continuously differentiable functions without any further assumption. Moreover, a strong convergence result is obtained for a class of so-called L-nonlinear functions, introduced in this paper, which includes virtually all nonlinear functions that do not contain linear pieces.
Key Words. Interior point gradient method, consistent scaling, L-nonlinear function, global convergence.
1 Introduction
A fundamental class of methods in optimization is gradient methods. Without requiring second-order derivative information, these methods are matrix-free, simple to understand, and easy to implement. Despite the disadvantage of often having a slow convergence rate, gradient methods are still widely used in practice, especially in large-scale applications where high accuracy is not essential and the alternatives, such as Newton-type methods, are not affordable.
In this paper we consider a general framework to extend gradient methods from unconstrained optimization problems to simple-bound constrained problems,
$$\min \{ f(x) : a \le x \le b \}, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ ...

... $> 1$, which can be seen from the direct calculation
$$\bigl[ h\bigl( x^k - d^k \circ \nabla f(x^k) \bigr) \bigr]_i = \bigl[ d^k \circ r^k \bigr]_i > 0, \qquad \forall\, i \in I_1 \cup I_2,$$
indicating that the step $\alpha = 1$ does not reach the boundary of $\mathcal{F}$. We observe that a consistent scaling sequence $\{d^k\}$ is bounded because $\{h(x^k)\}$ is bounded. Therefore, condition (13) always holds for any consistent scaling sequence.
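To make the iteration concrete, the following is a minimal sketch of one IPG step. Since the formal algorithm statement falls outside this excerpt, it assumes a scaled direction $p = -d \circ \nabla f(x)$, a unit trial step, and a damping factor `tau` (our illustrative parameter, not the paper's notation) that keeps the new iterate strictly inside $[a, b]$:

```python
import numpy as np

def ipg_step(x, grad, d, a, b, tau=0.995, alpha_try=1.0):
    """One interior-point gradient step (illustrative sketch only).

    The paper's algorithm statement is not in this excerpt; the rule
    below -- move along p = -d * grad and truncate the step so the new
    point stays strictly inside the box [a, b] -- is an assumption
    consistent with the surrounding discussion.
    """
    p = -d * grad  # diagonally scaled steepest-descent direction (d > 0)
    # Largest alpha with a <= x + alpha * p <= b, componentwise.
    with np.errstate(divide="ignore", invalid="ignore"):
        to_upper = np.where(p > 0, (b - x) / p, np.inf)
        to_lower = np.where(p < 0, (a - x) / p, np.inf)
    alpha_max = min(to_upper.min(), to_lower.min())
    # Damp by tau < 1 so the iterate remains in the interior.
    alpha = min(alpha_try, tau * alpha_max)
    return x + alpha * p
```

With $a = 0$ and $b = +\infty$, this reduces to the nonnegativity-constrained setting used in the numerical experiments below.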
3.2 L-nonlinear Functions
For a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ ...

... $> 0$ and $\nabla f(x) + r \equiv A^T A x > 0$ for $x > 0$. For this convex quadratic function, instead of doing a line search at each iteration, we always take the minimum along the search direction whenever it lies inside the permissible interval of the IPG algorithm. We mention that the scaling $d = x./(A^T A x)$ was studied in [13] as a special case of the IPG algorithm for solving nonnegative least squares problems with data $A \ge 0$ and $b > 0$. Starting from the vector of all ones, we ran the IPG algorithm with the above four scalings for a maximum of 1000 iterations.
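For the convex quadratic objective, which the context indicates is $f(x) = \tfrac{1}{2}\|Ax - b\|^2$, the minimum along a search direction is available in closed form; the following standard computation (our restatement, not quoted from the paper) gives the exact step along a direction $p$:
$$\alpha^* \;=\; \arg\min_{\alpha > 0}\, f(x + \alpha p) \;=\; -\,\frac{\nabla f(x)^T p}{\|A p\|^2}.$$
Since $p = -d \circ \nabla f(x)$ with $d > 0$, the numerator satisfies $-\nabla f(x)^T p = \sum_i d_i [\nabla f(x)]_i^2 > 0$, so $\alpha^* > 0$, and the step taken is $\alpha^*$ whenever it lies inside the permissible interval.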
[Figure 1: Relative errors $[f(x^k) - f(x^*)]/f(x^*)$ for the four scaling sequences $d = x$, $d = x.^2$, $d = x.^3$, and $d = x./(A^T A x)$. Vertical axis: relative error in $f(x)$ on a log scale from $10^{-8}$ to $10^{1}$; horizontal axis: iteration number from 0 to 1200.]
In Figure 1 we plot the relative errors, $[f(x^k) - f(x^*)]/f(x^*)$, against the iteration number, where $x^*$ is the optimal solution computed by an active-set method. As can be seen from Figure 1, the four scalings have led to very different convergence behaviors on this test problem. The scaling $d = x.^3$ has the slowest convergence, followed by $d = x.^2$ and then $d = x$, while the scaling $d = x./(A^T A x)$ has the fastest convergence. For this problem, the differences in the speed of convergence among these four scalings are quite significant. In fact, our numerical experiments indicate that the advantage of the last scaling tends to be more pronounced for matrices with larger condition numbers. However, the study of different scalings, as interesting as it may be, is outside the scope of the current paper.
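A sketch of this experiment in Python may help fix ideas. It assumes the objective $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ with $x \ge 0$, and approximates the permissible-interval safeguard by damping the step to the boundary with a factor `tau`; the names `ipg_nnls`, `d_rule`, and `tau` are ours, not the paper's:

```python
import numpy as np

def ipg_nnls(A, b, d_rule, iters=1000, tau=0.995):
    """Run IPG iterations for min 0.5*||Ax - b||^2 subject to x >= 0.

    Illustrative sketch: d_rule(x, AtAx) returns a positive diagonal
    scaling; the paper's exact permissible interval is approximated by
    damping the step to the boundary with tau < 1.
    """
    AtA, Atb = A.T @ A, A.T @ b
    x = np.ones(A.shape[1])               # start from the vector of all ones
    errs = []
    for _ in range(iters):
        AtAx = AtA @ x
        g = AtAx - Atb                    # gradient of the quadratic
        p = -d_rule(x, AtAx) * g          # scaled descent direction
        if not p.any():                   # stationary point reached
            break
        alpha_exact = -(g @ p) / (p @ (AtA @ p))   # exact minimizer along p
        neg = p < 0                       # components moving toward x = 0
        alpha_max = (-x[neg] / p[neg]).min() if neg.any() else np.inf
        x = x + min(alpha_exact, tau * alpha_max) * p
        errs.append(0.5 * np.linalg.norm(A @ x - b) ** 2)
    return x, errs

# The four scalings compared in Figure 1 (componentwise, MATLAB-style):
scalings = {
    "d = x":       lambda x, AtAx: x,
    "d = x.^2":    lambda x, AtAx: x**2,
    "d = x.^3":    lambda x, AtAx: x**3,
    "d = x./AtAx": lambda x, AtAx: x / AtAx,
}
```

Given data with $A \ge 0$ and $b > 0$ and the optimum $x^*$ from a reliable solver, plotting $[f(x^k) - f(x^*)]/f(x^*)$ for each scaling would reproduce the qualitative comparison in Figure 1.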
6 Discussions
The IPG algorithm studied in this paper can be considered an extension of primal affine-scaling gradient algorithms. A closely related work in this direction is Gonzaga and Carlos [8], which studies convergence for convex programming with linear constraints. Our results also have connections to the works of Coleman and Li [3] and Dennis and Vicente [4], which study trust-region methods for simple-bound constrained optimization using scalings related to $h(x^k)$. Since some consistently scaled gradient directions are permissible approximate solutions to the trust-region subproblems in [3] or [4] when the matrix (Hessian approximation) in the quadratic model is set to zero, it is reasonable to argue that the convergence results in [3] and [4] have a certain overlap with ours. However, we point out that, unlike our results, the convergence results in both [3] and [4] assume the boundedness of the sublevel set at the initial point (and hence the boundedness of the entire iterate sequence).

In our view, Theorem 1 gives a fairly complete characterization of the behavior of the IPG algorithm for Lipschitz continuously differentiable functions. The most interesting result, though, is perhaps the strong convergence behavior proven for L-nonlinear functions. We are not aware of any similar results of this kind in the literature. The class of consistent scalings introduced in this paper offers flexibility in the choice of diagonal scalings beyond the usual affine scaling, which may prove useful in exploiting problem structure and improving convergence behavior. This topic deserves further research.
Acknowledgments

The author would like to thank Matthias Heinkenschloss, Florian Jarre, and Richard Tapia for many useful comments on early drafts that have helped improve the paper.
References

[1] L. Armijo. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16:1-3, 1966.
[2] T. F. Coleman and Y. Li. On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming, Series A, 67:189-224, 1994.
[3] T. F. Coleman and Y. Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6:418-445, 1996.
[4] J. E. Dennis and L. N. Vicente. Trust-region interior-point algorithms for minimization problems with simple bounds. In H. Fischer (Ed.), Applied Mathematics and Parallel Computing, Festschrift for Klaus Ritter, Springer, New York, 1996, pp. 97-107.
[5] J. E. Dennis, M. Heinkenschloss, and L. N. Vicente. Trust-region interior-point SQP algorithms for a class of nonlinear programming problems. SIAM Journal on Control and Optimization, 36(5):1750-1794, 1998.
[6] I. I. Dikin. Iterative solution of problems of linear and quadratic programming. Soviet Mathematics Doklady, 8:674-675, 1967.
[7] A. A. Goldstein. Convex programming in Hilbert space. Bulletin of the American Mathematical Society, 70:709-710, 1964.
[8] C. Gonzaga and L. Carlos. A primal affine-scaling algorithm for constrained convex programs. Report ES-238/90, COPPE, Federal University of Rio de Janeiro, 1990. Revised September 2002.
[9] M. Heinkenschloss, M. Ulbrich, and S. Ulbrich. Superlinear and quadratic convergence of affine-scaling interior-point Newton methods for problems with simple bounds without strict complementarity assumption. Mathematical Programming, 86(3):615-635, 1999.
[10] E. S. Levitin and B. T. Polyak. Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics, 6:1-50, 1966.
[11] G. P. McCormick. Anti-zig-zagging by bending. Management Science, 15:315-319, 1969.
[12] G. P. McCormick and R. A. Tapia. The gradient projection method under mild differentiability conditions. SIAM Journal on Control, 10:93-98, 1972.
[13] M. Merritt and Y. Zhang. An interior-point gradient method for large-scale totally nonnegative least squares problems. Technical Report TR04-08, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, 2004.
[14] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 1999.
[15] A. M. Ostrowski. Solution of Equations and Systems of Equations. Academic Press, New York, 1966.
[16] J. B. Rosen. The gradient projection method for nonlinear programming, Part I: Linear constraints. Journal of the Society for Industrial and Applied Mathematics, 8:181-217, 1960.
[17] R. Tapia, Y. Zhang, and Y. Ye. On the convergence of the iteration sequence in primal-dual interior-point methods. Mathematical Programming, 68:141-154, 1995.
[18] P. Tseng. Convergence properties of Dikin's affine scaling algorithm for nonconvex quadratic minimization. Manuscript, Department of Mathematics, University of Washington, Seattle, 2001 (revised June 2002); to appear in Journal of Global Optimization.
[19] P. Wolfe. Convergence conditions for ascent methods. SIAM Review, 11:226-235, 1969.
[20] P. Wolfe. Convergence conditions for ascent methods II: Some corrections. SIAM Review, 13:185-188, 1971.