LOCAL CONVERGENCE OF FILTER METHODS FOR EQUALITY CONSTRAINED NONLINEAR PROGRAMMING

ELIZABETH W. KARAS§, CLÓVIS C. GONZAGA¶, AND ADEMIR A. RIBEIRO§

§ Department of Mathematics, Federal University of Paraná, Cx. Postal 19081, 81531-980 Curitiba, PR, Brazil; e-mail: [email protected], [email protected]. Supported by PRONEX - Optimization.
¶ Department of Mathematics, Federal University of Santa Catarina, Cx. Postal 5210, 88040-970 Florianópolis, SC, Brazil; e-mail: [email protected]. Supported by CNPq - Brazil and PRONEX - Optimization.

Abstract. In [10] we discuss general conditions to ensure global convergence of inexact restoration filter algorithms for nonlinear programming. In this paper we show how to avoid the Maratos effect by means of a second order correction. The algorithms are based on feasibility and optimality phases, which can be either independent or not. The optimality phase differs from the original one only when a full Newton step for the tangential minimization of the Lagrangian is efficient but not acceptable by the filter method. In this case a second order corrector step tries to produce an acceptable point keeping the efficiency of the rejected step. The resulting point is tested by trust region criteria. Under usual hypotheses, the algorithm inherits the quadratic convergence properties of the feasibility and optimality phases. The paper includes a comparison between classical SQP and Inexact Restoration iterations, showing that both methods share the same asymptotic convergence properties.
Key words. Filter methods, nonlinear programming, local convergence
1. Introduction. Filter methods were introduced by Fletcher and Leyffer in their important paper [6]. Since then, the filter technique has been applied mostly to SLP (Sequential Linear Programming), SQP (Sequential Quadratic Programming) and IR (Inexact Restoration) type methods. Global convergence for SLP was obtained by Fletcher, Leyffer and Toint [7] and by Chin and Fletcher [3]; for SQP the proof was given by Fletcher, Gould, Leyffer, Toint and Wächter [4] and by Fletcher, Leyffer and Toint [8]; and for IR the proof was obtained by Gonzaga, Karas and Vanti [10] and by Karas, Oening and Ribeiro [13]. Ribeiro, Karas and Gonzaga develop in [18] a general global convergence analysis of filter methods that does not depend on the particular way in which the step is computed, as long as a certain efficiency condition is satisfied. The SQP-filter approach was also applied to interior point algorithms by Ulbrich, Ulbrich and Vicente [19]. Audet and Dennis [1] present a pattern search filter method for derivative-free nonlinear programming. Gould, Leyffer and Toint [11] introduce a multidimensional filter algorithm for solving nonlinear feasibility problems. Gould, Sainvitu and Toint [12] extend the multidimensional filter techniques to general unconstrained optimization problems. Filter methods were also used in the context of nonsmooth optimization by Fletcher and Leyffer [5] and by Karas, Ribeiro, Sagastizábal and Solodov [14]. A review of filter methods is presented by Fletcher, Leyffer and Toint in [9].
Fletcher and Leyffer [6] comment that filter algorithms may suffer from the Maratos effect [15], and propose a second order correction to remedy this shortcoming. Ulbrich [20] proposes a modified version of the filter-SQP algorithm introduced by Fletcher, Leyffer and Toint [8]. To avoid the Maratos effect, Ulbrich uses the value of the Lagrangian function in the filter instead of the objective function and modifies the infeasibility measure in the filter slightly. This modification ensures transition to superlinear local convergence without using second order correction steps. Wächter
and Biegler propose in [22] a line search filter method and prove global convergence. In [21] they discuss the use of a second order correction to avoid the Maratos effect, so that fast local convergence to the solution is achieved.
In this paper we modify the algorithm proposed by Gonzaga, Karas and Vanti [10] in order to avoid the Maratos effect by means of a second order correction. The algorithm iterations are composed of a feasibility and an optimality phase, which we shall treat as independent. Each optimality phase begins by trying a full tangential Newton step for minimizing a Lagrangian function. The interesting situation occurs when this Newton step satisfies a trust region sufficient decrease test for the Lagrangian but is not accepted by the algorithm (either because it violates the filter or because the objective does not decrease enough). In this case we introduce a second order correction, which consists of a slightly modified restoration step, whose goal is to find an acceptable point at which the objective function decreases sufficiently. The algorithm behaves exactly like the method in [10], except in the case cited above, and then the only modification is the addition of an extra restoration step. This has no effect on the global convergence of the method, and we only have to worry about local convergence. We show that under usual hypotheses (sufficient second order optimality conditions), the algorithm with second order corrections is locally quadratically convergent.
In this section we introduce the problem, the hypotheses and some basic results. Section 2 describes the algorithm. Section 3, which may be of independent interest, compares the SQP Newton step and the IR Newton step, showing that they have similar asymptotic quadratic convergence properties with respect to the Lagrangian. Section 4 shows that the quadratic convergence is preserved when corrector steps are added to the filter method. Section 5 is added for aesthetic reasons: it shows that the corrector step is indeed efficient near an optimal solution.
In this paper we restrict the treatment to the equality constrained problem

(P)    minimize f(x)
       subject to c(x) = 0,
where the functions f: ℝⁿ → ℝ and c: ℝⁿ → ℝᵐ are twice continuously differentiable. The Jacobian matrix of c is denoted by A(·). The Lagrangian function associated with (P) is given by

(1.1)    x ∈ ℝⁿ, λ ∈ ℝᵐ ↦ ℓ(x, λ) = f(x) + λᵀc(x),

where λ ∈ ℝᵐ is the vector of Lagrange multipliers. Given (x, λ), the Hessian of the Lagrangian, ∇²ₓₓℓ(x, λ), is denoted by H(x, λ). We consider a measure of constraint infeasibility x ∈ ℝⁿ ↦ h(x), which is an exact penalty applied to the constraints. Usually this measure is given by

(1.2)    h(x) = ‖c(x)‖,
where ‖·‖ denotes an arbitrary norm.

Notation. Given two functions g₁: X ⊆ ℝⁿ → ℝᵐ and g₂: X ⊆ ℝⁿ → ℝ₊ we say that:
• g₁(x) = O(g₂(x)) in Γ ⊆ X if there exists M > 0 such that for all x ∈ Γ, ‖g₁(x)‖ ≤ M g₂(x).
• g₁(x) = Ω(g₂(x)) in Γ ⊆ X if there exists N > 0 such that for all x ∈ Γ, ‖g₁(x)‖ ≥ N g₂(x).
Given a matrix A ∈ ℝᵐˣⁿ we denote the null space of A and its orthogonal complement, the range space of Aᵀ, respectively by N(A) = {x ∈ ℝⁿ | Ax = 0} and R(Aᵀ) = {Aᵀw | w ∈ ℝᵐ}.

Hypotheses. In [10] we develop a globally convergent filter algorithm which generates sequences (xᵏ) and (zᵏ) in ℝⁿ satisfying the following assumptions:
(H1) The iterates (xᵏ) and (zᵏ) remain in a convex compact domain X ⊂ ℝⁿ.
(H2) All the functions f(·) and cᵢ(·), for i = 1, ..., m, are uniformly Lipschitz continuously differentiable in an open set containing X.
(H3) At all feasible accumulation points x̄ ∈ X of (xᵏ), the gradients ∇cᵢ(x̄), for i = 1, ..., m, are linearly independent.
From (H2) we conclude that for x, y ∈ X and i = 1, ..., m,

(1.3)    cᵢ(y) = cᵢ(x) + ∇cᵢ(x)ᵀ(y − x) + O(‖y − x‖²).
Our aim is to modify the algorithm proposed in [10] to ensure quadratic local convergence. For that, we assume that the sequence (xᵏ) converges to a local solution x* and that the following holds:
(H4) All the functions f(·) and cᵢ(·), for i = 1, ..., m, are twice continuously differentiable and ∇²f(·) and ∇²cᵢ(·) are Lipschitzian in a neighborhood V̄₀ of x*.
(H5) At the solution point x* with optimal Lagrange multiplier λ*, the Hessian of the Lagrangian H(x*, λ*) is positive definite on the tangent space of the constraints, i.e., dᵀH(x*, λ*)d > 0 for all d ≠ 0, d ∈ N(A(x*)).
From (H4) we conclude that for x, y ∈ V̄₀, λ ∈ ℝᵐ and i = 1, ..., m,

(1.4)    ∇f(y) = ∇f(x) + ∇²f(x)(y − x) + O(‖y − x‖²),
         ∇cᵢ(y) = ∇cᵢ(x) + ∇²cᵢ(x)(y − x) + O(‖y − x‖²)

and

(1.5)    ℓ(y, λ) = ℓ(x, λ) + ∇ₓℓ(x, λ)ᵀd + ½ dᵀH(x, λ)d + O(‖d‖³),

where d = y − x. We assume from now on that Hypotheses (H1-H5) are satisfied. The following facts follow directly from the hypotheses.

Fact 1.1. Using the notation in (H1-H5), there hold:
(i) h(x) = O(‖x − x*‖) for x ∈ X.
(ii) A(x) − A(z) = O(‖x − z‖) for x, z ∈ X.
(iii) Given λ ∈ ℝᵐ (fixed), H(x, λ) − H(z, λ) = O(‖x − z‖) for x, z ∈ V̄₀.

Fact 1.2. Consider V₀ = {(x, λ) | x ∈ V̄₀, λ ∈ ℝᵐ}. There exist a neighborhood V₁ ⊂ V₀ of (x*, λ*) and δ > 0 such that if (x, λ) ∈ V₁, then A(x) has full row rank and for all d ∈ N(A(x)), dᵀH(x, λ)d ≥ δ‖d‖².
2. The filter algorithm. Gonzaga, Karas and Vanti [10] proposed an inexact restoration (IR) filter method for nonlinear programming and proved its global convergence. Each iteration is composed of a feasibility phase, which reduces a measure of infeasibility, and an optimality phase, which reduces the objective function in a tangential approximation of the feasible set. The method is independent of the internal algorithms used in each iteration, as long as these algorithms satisfy reasonable assumptions on their efficiency. We now present their main algorithm with a slight modification in the feasibility phase.

Algorithm 2.1. Filter algorithm.
Data: x⁰ ∈ ℝⁿ, F₀ = ∅, G₀ = ∅, α ∈ (0, 1), β > 1.
k = 0
repeat
    (f̃, h̃) = (f(xᵏ) − αh(xᵏ), (1 − α)h(xᵏ)).
    Construct the set F̄ₖ = Fₖ ∪ {(f̃, h̃)}.
    Define the set Ḡₖ = Gₖ ∪ {(f, h) ∈ ℝ² | f ≥ f̃, h ≥ h̃}.
    Feasibility phase:
        if h(xᵏ) = 0, then set zᵏ = xᵏ
        else compute zᵏ such that h(zᵏ) < (1 − α)h(xᵏ) and (f(zᵏ), βh(zᵏ)) ∉ Ḡₖ.
        if impossible, then stop with failure.
    Optimality phase:
        if zᵏ is stationary, then stop with success
        else compute xᵏ⁺¹ such that f(xᵏ⁺¹) ≤ f(zᵏ) and (f(xᵏ⁺¹), h(xᵏ⁺¹)) ∉ Ḡₖ.
    Filter update:
        if f(xᵏ⁺¹) < f(xᵏ), then Fₖ₊₁ = Fₖ, Gₖ₊₁ = Gₖ (f-iteration)
        else Fₖ₊₁ = F̄ₖ, Gₖ₊₁ = Ḡₖ (h-iteration)
    k = k + 1.

The filter. The algorithm memorizes a set Fₖ of pairs (fⁱ, hⁱ) as in Figure 2.1, and blocks the points x such that f(x) > fⁱ and h(x) > hⁱ in future iterations. Whenever an iteration from xᵏ produces a point xᵏ⁺¹ such that f(xᵏ⁺¹) ≥ f(xᵏ), the pair (f(xᵏ) − αh(xᵏ), (1 − α)h(xᵏ)) is added to the filter; a small computational sketch of this bookkeeping is given below. This filter algorithm is studied in [10], where it is shown to be globally convergent under reasonable hypotheses. We assume that the Hypotheses (H1-H5) are satisfied, and now we state the assumptions on the performance of the internal algorithms used at each iteration.

Feasibility phase. Given xᵏ ∈ ℝⁿ, the purpose of the feasibility phase is to find a point zᵏ such that h(zᵏ) < h(xᵏ) and the pair (f(zᵏ), h(zᵏ)) is well below the set Ḡₖ blocked by the filter. In the algorithm, we require that h(zᵏ) ≤ (1 − α)h(xᵏ) and (f(zᵏ), βh(zᵏ)) ∉ Ḡₖ. Here α is the same value used in the construction of the filter, and β > 1 ensures that (f(zᵏ), h(zᵏ)) is indeed well below the set Ḡₖ, as shown in Figure 2.1. The introduction of this constant β > 1 is the only difference between this algorithm and the original one presented in [10]. It is done to ensure some "elbow space" around the points zᵏ. This of course can only improve the global convergence properties. Since we intend to prove quadratic convergence for the overall algorithm, we must assume that the algorithm for solving the feasibility phase is quadratically convergent.
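To fix ideas, the following Python sketch illustrates the filter bookkeeping of Algorithm 2.1: the temporary pair (f̃, h̃), the blocked-region test, and the filter height Hₖ(f) defined later in (2.4). It is our own illustration (the class and method names are ours, and the pairs are stored naively); the paper does not prescribe an implementation.

    # Minimal sketch of the filter bookkeeping in Algorithm 2.1.
    import math

    class Filter:
        def __init__(self, alpha):
            self.alpha = alpha           # margin used to build (f~, h~)
            self.entries = []            # permanent pairs (f_i, h_i)

        def temporary_pair(self, f_x, h_x):
            # the pair (f~, h~) built at the beginning of iteration k;
            # it becomes permanent only on an h-iteration
            return (f_x - self.alpha * h_x, (1.0 - self.alpha) * h_x)

        def blocks(self, f, h, temp):
            # (f, h) belongs to G_k-bar iff it is dominated by some pair
            return any(f >= fi and h >= hi
                       for (fi, hi) in self.entries + [temp])

        def height(self, f, temp):
            # filter height H_k(f) = inf{h_i : f_i <= f}, cf. (2.4) below
            hs = [hi for (fi, hi) in self.entries + [temp] if fi <= f]
            return min(hs) if hs else math.inf

With these names, the feasibility phase requires h(zᵏ) ≤ (1 − α)h(xᵏ) and not blocks(f(zᵏ), β·h(zᵏ), temp), and an h-iteration ends with entries.append(temp).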
Fig. 2.1. The filter. The plot shows, in the (f, h) plane (a point x is plotted as (f(x), h(x))), the filter pairs (fⁱ, hⁱ), the blocked region Ḡₖ, the points xᵏ and zᵏ, and the filter height Hₖ(f).
The quadratic convergence of the feasibility algorithm is formally stated in the following hypothesis.

Feasibility phase condition.
(H6) There exists a neighborhood V̄₂ of x* such that for any x ∈ V̄₂, the feasibility algorithm produces z ∈ V̄₂ satisfying

    c(z) = O(‖c(x)‖²),
    z − x = O(|h(x) − h(z)|).
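Hypothesis (H6) holds, for instance, for Newton-based restoration: near x*, one minimum-norm Newton (Gauss-Newton) iteration for c(z) = 0 already gives c(z) = O(‖c(x)‖²). The sketch below is our own illustration under that assumption (it presumes A(x) has full row rank); it is not the only way to implement the feasibility phase.

    # One minimum-norm Newton restoration step for c(z) = 0; by (1.3),
    # c(z) = O(||c(x)||^2) near a point where A(x) has full row rank.
    import numpy as np

    def restoration_step(c, A, x):
        Ax = A(x)                                     # m x n Jacobian of c at x
        d = -Ax.T @ np.linalg.solve(Ax @ Ax.T, c(x))  # solves A(x) d = -c(x)
        return x + d                                  # minimum-norm correction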
Note. We shall use decreasing neighborhoods of (x*, λ*) in this paper. The neighborhood V̄₂ is defined in ℝⁿ. For coherence, we define a neighborhood V₂ of (x*, λ*) given by

(2.1)    V₂ = {(x, λ) ∈ V₁ | x ∈ V̄₂},

where V₁ is given in Fact 1.2.

2.1. Optimality phase algorithm. The optimality phase algorithm must find xᵏ⁺¹ not forbidden by the filter such that f(xᵏ⁺¹) ≤ f(zᵏ). We shall describe a very general trust region method for this. The generic filter algorithm does not specify how to model the objective function in the tangent space at zᵏ. Any quadratic model with a symmetric bounded Hessian leads to global convergence. In this paper, quadratic convergence will be achieved by using the second order Taylor model of the Lagrangian, for a given multiplier λᵏ. In the first iteration, λ⁰ is given (or estimated). For k > 0, each iteration will estimate the multiplier for the next. One can use any method to choose the Lagrange multipliers, as long as the sequence (λᵏ) remains bounded. For Lagrange multiplier estimators, see [2, 16].
Given z = zᵏ ∈ X generated by iteration k of Algorithm 2.1 in the feasibility phase and given λ = λᵏ ∈ ℝᵐ, the trust region algorithm computes a step d₂, a solution of the following problem:

(TP)    minimize m_z(d)
        subject to A(z)d = 0
                   ‖d‖ ≤ ∆,
where

(2.2)    d ∈ ℝⁿ ↦ m_z(d) = ℓ(z, λ) + ∇ₓℓ(z, λ)ᵀd + ½ dᵀH(z, λ)d
is a quadratic model of ℓ. The point z + d₂ will be denoted by y.

The IR Newton step. Whenever it exists, the IR Newton step d̂₂ is a solution of (TP) for ∆ = +∞. It is given by the solution of
(2.3)    H(z, λ)d₂ + A(z)ᵀλ⁺ = −∇f(z)
         A(z)d₂ = 0.
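Solving (2.3) amounts to one symmetric indefinite linear system. A minimal NumPy sketch of such a solve follows; it is our own illustration (the function name and the dense factorization are assumptions made for simplicity, not part of the algorithm specification).

    # Solve the KKT system (2.3): the tangential IR Newton step d2 and
    # the new multiplier lambda+ at the restored point z.
    import numpy as np

    def ir_newton_step(H_z, A_z, grad_f_z):
        n, m = H_z.shape[0], A_z.shape[0]
        K = np.block([[H_z, A_z.T],
                      [A_z, np.zeros((m, m))]])
        rhs = np.concatenate([-grad_f_z, np.zeros(m)])
        sol = np.linalg.solve(K, rhs)        # K is nonsingular near (x*, lambda*)
        return sol[:n], sol[n:]              # d2, lambda+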
System (2.3) provides not only a step d̂₂, but also a new multiplier λ⁺, which will be used by the algorithm in iteration k + 1. The IR Newton step and its properties will be studied in detail in Section 3.
For the second order correction we shall need to define the filter height at a point: given a filter F̄ₖ, the filter height at a function value f ∈ ℝ is defined as

(2.4)    Hₖ(f) = inf{hⁱ | (fⁱ, hⁱ) ∈ F̄ₖ and fⁱ ≤ f}.
This function decreases from +∞ to a positive value (see Fig. 2.1). Note that by construction of z, h(z) < Hₖ(f(z))/β. We now present the algorithm for the optimality phase, and then discuss its motivation and behavior.

Algorithm 2.2. Optimality phase.
Data: 0 < η₁ < η₂ < 1, z = zᵏ, λ = λᵏ, ∆min > 0, ∆ ≥ ∆min.
repeat
    Compute d₂ ∈ ℝⁿ, a solution of the trust region problem (TP).
    Set y = z + d₂, pred = m_z(0) − m_z(d₂), ared = f(z) − f(y).
    if ared ≥ η₁ pred and (f(y), h(y)) ∉ Ḡₖ,
        set xᵏ⁺¹ = y, x⁺ = y, choose a new multiplier λᵏ⁺¹ and return.
    if ‖d₂‖ < ∆ (IR Newton step)
        Choose λᵏ⁺¹ = λ⁺ given by (2.3).
        aredℓ = ℓ(z, λᵏ⁺¹) − ℓ(y, λᵏ⁺¹).
        if aredℓ ≥ η₂ pred
            Second order correction:
                Compute a feasibility step to find x⁺ ∈ ℝⁿ such that h(x⁺) < Hₖ(f(z)).
                if f(z) − f(x⁺) ≥ η₁ pred, set xᵏ⁺¹ = x⁺ and return.
    ∆ = ‖d₂‖/2.

This algorithm is essentially the same as the one presented in [10]. The only difference is that second order corrections may be performed. The second order correction is tried only once, when the full IR Newton step fails the test ared ≥ η₁ pred and passes the test aredℓ ≥ η₂ pred. Note that the second "if" can only be true in the first pass of the loop, because afterwards the IR Newton step will be outside the trust region. To see that the algorithm is well defined, note that:
(i) The correction step can be computed in finite time: this follows from the fact that h(z) ≤ Hₖ(f(z))/β with β > 1 fixed, and the algorithm imposes h(x⁺) < Hₖ(f(z)). This inequality will be satisfied in a large neighborhood of z. Fig. 2.1 shows the regions satisfying these inequalities in the filter plot. In Section 5 we show that a single Newton restoration step will usually succeed at points near x*.
(ii) When the corrector step is successful, x⁺ will be accepted by the filter: if x⁺ is output by the corrector step, then by construction f(x⁺) < f(z) and h(x⁺) < Hₖ(f(z)). Since Hₖ(·) is decreasing, it follows that h(x⁺) < Hₖ(f(x⁺)) and hence x⁺ is accepted by the filter.

Global convergence. The proof of convergence in [10] is based on a generic quadratic model for f(·) in each tangent step. The quadratic model for the Lagrangian qualifies as long as the Hessian matrices are Lipschitzian and the multipliers λ are kept bounded. The convergence study is based on the fact that near feasible non-stationary points the optimality step results in a large decrease of the objective function. The second order correction only intervenes when the full IR Newton step produces a large decrease in the Lagrangian but not in the objective, and then the correction improves the value of the objective, resulting in an acceptable step. If x⁺ is accepted, f(z) − f(x⁺) ≥ η₁ pred. Otherwise, ∆ is reduced by at least 1/2, resulting in a predicted reduction pred′ ≤ pred. The step finally accepted needs only to satisfy f(z) − f(xᵏ⁺¹) ≥ η₁ pred′, which certainly holds for x⁺. In general, the corrected step (when accepted) should be better than the traditional one.

Remarks. (i) It is usually said that filter methods do not use merit functions. This is only partly true, because we measure the quality of the optimality step by comparing the variation ared of f(·) (which is then a merit function) with the variation pred of the quadratic model for the Lagrangian. The values of f(x) − f(z) and ℓ(x, λ) − ℓ(z, λ) coincide if c(x) = c(z). But even for x − z in the tangent space at z, these two values may differ very much due to the curvature of c(·). In this case it may well happen that aredℓ is near pred, but ared is not near pred. The Maratos effect happens when (for a Newton step) ared < 0.
(ii) The second order correction is performed only when the IR Newton step is accepted, because only this case affects the local convergence study. It can certainly be used in any other iterations, since it is harmless. It may lead to larger steps, at the cost of an extra restoration.

Second order correction. A second order correction step is performed when the situation described above happens: since the discrepancy between the values of ared and aredℓ is due to the discrepancy between c(y) and c(z), we do a restoration step from y to obtain x⁺ with c(x⁺) ≈ c(z). The second order correction step is computed as a feasibility step from y when
(2.5)    ℓ(z, λ) − ℓ(y, λ) > η₂ (m_z(0) − m_z(y − z))    and    ‖y − z‖ < ∆.
These conditions mean that y is a global minimizer of m_z in the tangent space (a full Newton step for the Lagrangian minimization) and that the sufficient decrease condition for the Lagrangian is satisfied. The correction is a feasibility step from y to a point x⁺ such that c(x⁺) ≈ c(z). If ℓ(x⁺, λ) − ℓ(y, λ) is small, then

    f(x⁺) − f(z) ≈ ℓ(x⁺, λ) − ℓ(z, λ) ≈ ℓ(y, λ) − ℓ(z, λ) < 0.

Since the second order correction is performed by a feasibility step, we will assume
a condition like (H6); that is, if y, z ∈ V̄₂ with h(z) < h(y), then x⁺ ∈ V̄₂ and

(2.6)    c(x⁺) − c(z) = O(‖c(y) − c(z)‖²)
         x⁺ − y = O(|h(x⁺) − h(y)|),

where V̄₂ is the neighborhood of x* given by (H6). From (2.6), there exists a constant M > 0 such that

    ‖x⁺ − y‖ ≤ M |h(x⁺) − h(y)| ≤ M ‖c(x⁺) − c(y)‖ ≤ M (‖c(x⁺) − c(z)‖ + ‖c(z) − c(y)‖).

Since c(x⁺) − c(z) = O(‖c(y) − c(z)‖²), we have

(2.7)    x⁺ − y = O(‖c(y) − c(z)‖).
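In code, the corrector is the same Newton map as the restoration, with target c(z) instead of 0. A minimal sketch (our own illustration, assuming A(y) has full row rank):

    # Second order correction (sketch): one minimum-norm Newton iteration
    # for c(w) = c(z) starting at y; by (2.6)-(2.7), the resulting point x+
    # stays within O(||c(y) - c(z)||) of y.
    import numpy as np

    def corrector_step(c, A, y, c_z):
        Ay = A(y)                        # Jacobian of c at y
        d = -Ay.T @ np.linalg.solve(Ay @ Ay.T, c(y) - c_z)
        return y + d                     # the trial point x+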
Fig. 2.2. A complete iteration of the algorithm. The left plot shows, in x-space, the points x, z, y and x⁺ together with the curve c = 0; the right plot shows the same trajectory in the (f, h) filter plane.
Figure 2.2 shows a complete iteration for the well-known example by Powell [17]. Starting at x, a restoration produces z with c(z) ≈ 0. A Lagrange multiplier estimator is computed at z, and y is the result of a tangential IR Newton step for the Lagrangian. We see that both c(·) and f(·) increase from z to y. The second order correction produces x⁺ by computing a Newton iteration for solving c(w) = c(z). The right subplot shows the trajectory from x to x⁺ in the filter representation.

3. Quadratic convergence properties. The classical SQP method based on pure Lagrangian minimization iterates (without using filters or merit functions) is known to be quadratically convergent under our hypotheses. Proofs of quadratic convergence are described, for instance, by Bonnans, Gilbert, Lemaréchal and Sagastizábal [2] and by Nocedal and Wright [16]. In this section we show that, ignoring the filter and using inexact restoration iterates (with or without second order corrections) instead of the classical SQP iterations, the quadratic convergence is preserved. Note that in this treatment each tangential step uses a Newton step for minimizing the Lagrangian and accepts it. In the next section we shall prove that for large k these iterates will be accepted by the filter algorithm, thus proving quadratic convergence.
Our strategy in this section is the following: we quote the classical result and then show that near x* the inexact restoration step is very similar to the classical SQP step, thus having the same convergence properties. Note that these results are general, and can be applied to the convergence analysis of filter or merit function based methods.
Optimality step. Consider z = zᵏ and λ = λᵏ obtained by Algorithm 2.1 with the optimality phase computed by Algorithm 2.2. If we ignore the filter in Algorithm 2.2, then the trust region constraint will be inactive for all sufficiently large k. In this case, we may disregard the trust region constraint in (TP) without affecting the solution. Then problem (TP) can be rewritten as

(QP)    minimize m_z(d)
        subject to A(z)d = 0.

If d̂₂ ∈ ℝⁿ is a solution of (QP), then there exists µ ∈ ℝᵐ such that

(3.1)    H(z, λ)d̂₂ + A(z)ᵀµ = −∇ₓℓ(z, λ)
         A(z)d̂₂ = 0.

We denote

(3.2)    ŷ = z + d̂₂    and    λ⁺ = λ + µ.
The vector λ⁺ is a good multiplier estimator at ŷ, as we shall see below. By expanding the right-hand side of (3.1), we see that it is equivalent to the system (2.3) used in the tangential step. The next lemma will be useful ahead.

Lemma 3.1. Consider the neighborhood V₂ given by (2.1). Given (x, λ) ∈ V₂, let d̂₂ be a solution of (QP). Then

    m_z(0) − m_z(d̂₂) = Ω(‖d̂₂‖²).

Proof. From (3.1),

    −∇ₓℓ(z, λ)ᵀd̂₂ = d̂₂ᵀH(z, λ)d̂₂ + µᵀA(z)d̂₂ = d̂₂ᵀH(z, λ)d̂₂.

Then, by (2.2),

    m_z(0) − m_z(d̂₂) = −∇ₓℓ(z, λ)ᵀd̂₂ − ½ d̂₂ᵀH(z, λ)d̂₂ = ½ d̂₂ᵀH(z, λ)d̂₂.

Using Fact 1.2, we complete the proof.

The SQP Newton step. Now we describe the classical local SQP Newton step from a given (x, λ), which will later be compared to our feasibility-optimality combination to conclude that they have the same local convergence properties. Given (x, λ), we consider the quadratic model (2.2) of ℓ and the problem

(Q̃P)    minimize m_x(d)
         subject to A(x)d = −c(x).

If d̃ ∈ ℝⁿ is a solution of (Q̃P), then there exists µ̃ ∈ ℝᵐ such that

(3.3)    H(x, λ)d̃ + A(x)ᵀµ̃ = −∇ₓℓ(x, λ)
         A(x)d̃ = −c(x).

The quadratic convergence of Newton's method is stated in the following theorem:
Theorem 3.2. Consider the neighborhood V₂ given by (2.1). There exist a neighborhood V₃ ⊂ V₂ of (x*, λ*) and a constant M₁ > 0 such that if (x, λ) ∈ V₃, then (d̃, µ̃) is uniquely determined by (3.3) and satisfies

    ‖(x + d̃, λ + µ̃) − (x*, λ*)‖ ≤ M₁ ‖(x, λ) − (x*, λ*)‖².

Proof. This theorem coincides with Theorem 12.4 in [2]. Our hypotheses and Fact 1.2 guarantee that the hypotheses used in that theorem hold.

The step d̃, called the SQP Newton step, can be uniquely decomposed as d̃ = d̃₁ + d̃₂, where d̃₁ ∈ R(A(x)ᵀ) and d̃₂ ∈ N(A(x)). We denote

(3.4)    z̃ = x + d̃₁    and    ỹ = x + d̃.

Note that, from (3.3), A(x)d̃₁ + c(x) = 0. Using (1.3),

    c(z̃) = c(x) + A(x)d̃₁ + O(‖z̃ − x‖²).

Thus,

(3.5)    c(z̃) = O(‖z̃ − x‖²).

Given (x, λ) we denote

(3.6)    ζ = ‖(x, λ) − (x*, λ*)‖.
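Before the formal comparison, the two steps can be sampled numerically. The example below is our own (hypothetical) illustration: f(x) = −x₁ + 2(x₁² + x₂² − 1) with the constraint x₁² + x₂² = 1, for which x* = (1, 0) and λ* = −3/2. It performs one SQP Newton step (3.3) and one restored IR step (3.1) from the same point and prints the distances to (x*, λ*); both errors shrink quadratically as the starting point approaches (x*, λ*), as the lemmas below establish.

    # Compare one classical SQP Newton step (3.3) with one IR step:
    # restoration (Newton for c = 0) followed by the tangential step (3.1).
    import numpy as np

    def grad_f(x): return np.array([4*x[0] - 1.0, 4*x[1]])
    def c(x):      return np.array([x[0]**2 + x[1]**2 - 1.0])
    def A(x):      return np.array([[2*x[0], 2*x[1]]])
    def H(x, lam): return (4.0 + 2.0*lam[0]) * np.eye(2)   # Lagrangian Hessian

    def kkt_solve(Hm, Am, r1, r2):
        n, m = Hm.shape[0], Am.shape[0]
        K = np.block([[Hm, Am.T], [Am, np.zeros((m, m))]])
        s = np.linalg.solve(K, np.concatenate([r1, r2]))
        return s[:n], s[n:]

    x, lam = np.array([0.8, 0.7]), np.array([-1.0])
    star   = np.array([1.0, 0.0, -1.5])                    # (x*, lambda*)

    # classical SQP Newton step (3.3): A(x) d = -c(x)
    d, mu_t = kkt_solve(H(x, lam), A(x), -grad_f(x) - A(x).T @ lam, -c(x))
    err_sqp = np.linalg.norm(np.concatenate([x + d, lam + mu_t]) - star)

    # IR step: minimum-norm restoration for c = 0, then (3.1): A(z) d2 = 0
    z = x - A(x).T @ np.linalg.solve(A(x) @ A(x).T, c(x))
    d2, mu = kkt_solve(H(z, lam), A(z), -grad_f(z) - A(z).T @ lam, np.zeros(1))
    err_ir = np.linalg.norm(np.concatenate([z + d2, lam + mu]) - star)

    print(err_sqp, err_ir)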
The next four lemmas show the similarities between the classical SQP step and an inexact restoration step, concluding that they are very near each other. Given x ∈ X, consider z obtained by the feasibility phase of the algorithm, ŷ defined by (3.2), and z̃ and ỹ defined by (3.4).

Lemma 3.3. Consider V₃ the neighborhood given by Theorem 3.2. If (x, λ) ∈ V₃, then x, z, z̃, ỹ ∈ B(x*, r) with r = O(ζ). Furthermore,

    ỹ − x* = O(ζ²)    and    µ̃ = O(ζ).

Proof. Initially, note that, by definition of ζ, x − x* = O(ζ). By the triangle inequality we have

    ‖z − x*‖ ≤ ‖z − x‖ + ‖x − x*‖.

By Hypothesis (H6), if (x, λ) ∈ V₃, then ‖z − x‖ = O(|h(x) − h(z)|). But, since 0 ≤ h(z) ≤ h(x),

    |h(x) − h(z)| ≤ h(x) = O(‖x − x*‖),
where the last equality follows from Fact 1.1. Hence z − x* = O(ζ). Using the triangle inequality again and (3.4), we have

    ‖z̃ − x*‖ ≤ ‖z̃ − x‖ + ‖x − x*‖ ≤ ‖ỹ − x‖ + ‖x − x*‖ ≤ ‖ỹ − x*‖ + 2‖x − x*‖.

But, from Theorem 3.2, ‖ỹ − x*‖ = O(ζ²). Thus z̃ − x* = O(ζ). Finally, using Theorem 3.2 again,

    ‖µ̃‖ ≤ ‖λ + µ̃ − λ*‖ + ‖λ* − λ‖ = O(ζ),

completing the proof.

Lemma 3.4. Let V₃ be the neighborhood given in Theorem 3.2. If (x, λ) ∈ V₃, then

    H(z, λ)(ŷ − ỹ) + A(z)ᵀ(µ − µ̃) = O(ζ²).

Proof. From (3.3) and (3.1) we have

(3.7)    H(x, λ)(ỹ − x) + A(x)ᵀµ̃ = −∇ₓℓ(x, λ),

(3.8)    H(z, λ)(ŷ − z) + A(z)ᵀµ = −∇ₓℓ(z, λ).

Subtracting (3.8) from (3.7) we have

(3.9)    ∇ₓℓ(z, λ) − ∇ₓℓ(x, λ) = H(x, λ)(ỹ − x) − H(z, λ)(ŷ − z) + A(x)ᵀµ̃ − A(z)ᵀµ.

On the other hand, from (1.4),

(3.10)    ∇ₓℓ(z, λ) − ∇ₓℓ(x, λ) = H(x, λ)(z − x) + O(‖z − x‖²).

Subtracting (3.9) from (3.10),

    H(x, λ)(z − ỹ) + H(z, λ)(ŷ − z) + A(z)ᵀµ − A(x)ᵀµ̃ + O(‖z − x‖²) = 0.

We can also write

    H(z, λ)(ŷ − ỹ) + A(z)ᵀ(µ − µ̃) = (H(x, λ) − H(z, λ))(ỹ − z) + (A(x) − A(z))ᵀµ̃ + O(‖z − x‖²).

Using Fact 1.1 and Lemma 3.3 we have the result.

Lemma 3.5. Let V₃ be the neighborhood given in Theorem 3.2. If (x, λ) ∈ V₃, then

    A(z)(ŷ − ỹ) = O(ζ²).

Proof. By (1.3),

(3.11)    c(z̃) = c(z) + A(z)(z̃ − z) + O(‖z̃ − z‖²).
Using (H6) and (1.2), we obtain c(z) = O(‖c(x)‖²) = O(h(x)²). From Fact 1.1 and Lemma 3.3, it follows that c(z) = O(ζ²). Using (3.5) and Lemma 3.3, c(z̃) = O(ζ²). From (3.11) and Lemma 3.3 we conclude that A(z)(z − z̃) = O(ζ²). By (3.1) we have A(z)z = A(z)ŷ, hence

(3.12)    A(z)(ŷ − z̃) = O(ζ²).

On the other hand,

    A(z)(ŷ − ỹ) = A(z)(ŷ − z̃) + A(z)(z̃ − ỹ) = A(z)(ŷ − z̃) + (A(x) − A(z))(ỹ − z̃),

where we used the fact that (ỹ − z̃) ∈ N(A(x)). Using (3.12), Fact 1.1 and Lemma 3.3 we complete the proof.

Lemma 3.6. Let V₃ be the neighborhood given in Theorem 3.2. There exists a neighborhood V₄ ⊂ V₃ of (x*, λ*) such that if (x, λ) ∈ V₄, then

    (ŷ, µ) − (ỹ, µ̃) = O(ζ²).

Proof. From Lemmas 3.4 and 3.5, if (x, λ) ∈ V₃, then

(3.13)    [ H(z, λ)   A(z)ᵀ ] [ ŷ − ỹ ]
          [ A(z)      0     ] [ µ − µ̃ ]  =  O(ζ²).

Using Fact 1.2 and [2, Proposition 12.1], the matrix in (3.13) has a uniformly bounded inverse in some neighborhood V of (x*, λ*). Consider V₄ = V₃ ∩ V. Thus, if (x, λ) ∈ V₄, then (ŷ, µ) − (ỹ, µ̃) = O(ζ²), completing the proof.

The next theorem concludes the comparison between the methods, showing that the step decomposed in two phases inherits the quadratic convergence properties of the SQP Newton step.

Theorem 3.7. Consider the neighborhood V₄ given in Lemma 3.6. If (x, λ) ∈ V₄, then

    ‖(ŷ, λ + µ) − (x*, λ*)‖ = O(‖(x, λ) − (x*, λ*)‖²).

Proof. Using the triangle inequality, we have

    ‖(ŷ, λ + µ) − (x*, λ*)‖ ≤ ‖(ŷ, λ + µ) − (ỹ, λ + µ̃)‖ + ‖(ỹ, λ + µ̃) − (x*, λ*)‖.

Using Lemma 3.6, Theorem 3.2 and (3.6), we complete the proof.
Remark. By writing ŷ − z = (ŷ − x*) + (x* − z), it follows from Theorem 3.7 and Lemma 3.3 that

(3.14)    ŷ − z = O(ζ).
We now extend the quadratic convergence properties to the result of a second order correction from y. The correction step is small in relation to the complete tangential step, and has little effect on the reduction of the Lagrangian. Its effect is on the balance between the changes in f(·) and c(·).

Lemma 3.8. Consider the neighborhood V₄ given in Lemma 3.6. If (x, λ) ∈ V₄, then
(i) c(ŷ) − c(z) = O(‖ŷ − z‖²),
(ii) ‖x⁺ − ŷ‖ = O(‖ŷ − z‖²).

Proof. From (1.3),

    c(ŷ) = c(z) + A(z)(ŷ − z) + O(‖ŷ − z‖²).

Using (3.1) we prove (i). From (2.7),

    ‖x⁺ − ŷ‖ = O(‖c(ŷ) − c(z)‖).

Using (i), we have the result.

We conclude with the main result of this section.

Theorem 3.9. Consider the neighborhood V₄ given in Lemma 3.6. If (x, λ) ∈ V₄, then

    ‖(x⁺, λ + µ) − (x*, λ*)‖ = O(‖(x, λ) − (x*, λ*)‖²).

Proof. Using the triangle inequality, we have

    ‖(x⁺, λ + µ) − (x*, λ*)‖ ≤ ‖(x⁺, λ + µ) − (ŷ, λ + µ)‖ + ‖(ŷ, λ + µ) − (x*, λ*)‖.

From Lemma 3.8 and (3.14),

    ‖(x⁺, λ + µ) − (ŷ, λ + µ)‖ = ‖x⁺ − ŷ‖ = O(‖ŷ − z‖²) = O(ζ²).

By Theorem 3.7, ‖(ŷ, λ + µ) − (x*, λ*)‖ = O(ζ²). Therefore

    ‖(x⁺, λ + µ) − (x*, λ*)‖ = O(ζ²),

completing the proof.

4. Acceptance by the algorithm. In Section 3 we studied the quadratic convergence of the algorithm disregarding the filter and the sufficient descent condition of the trust region algorithm used in the tangent step. When we take a tangent step from z to y, it may well happen that y − z is a Newton step (i.e., y is not on the boundary of the trust region) and y satisfies a sufficient decrease condition, but it is not accepted by the filter method: only in this situation
do we use a second order correction, which either fails or produces x⁺ accepted by the filter. In this section we show that for large k this scheme will work: the sufficient decrease condition for the Lagrangian will hold for y and will lead to a sufficient decrease for f(·) between z and x⁺.
We assume that the Hypotheses (H1-H6) are satisfied. Given x ∈ X, consider z obtained by the feasibility phase of the algorithm, y − z a solution of (TP), ŷ defined by (3.2), x⁺ satisfying (2.6), and the symbol ζ defined in (3.6).

Lemma 4.1. Consider the neighborhood V₄ given in Lemma 3.6. If (x, λ) ∈ V₄, then

    ℓ(x⁺, λ) − ℓ(ŷ, λ) = O(ζ ‖ŷ − z‖²).
Proof. From (1.3) and (1.4), we have

(4.1)    ℓ(x⁺, λ) − ℓ(ŷ, λ) = ∇ₓℓ(ŷ, λ)ᵀ(x⁺ − ŷ) + O(‖x⁺ − ŷ‖²)

and

(4.2)    ∇ₓℓ(ŷ, λ) = ∇ₓℓ(z, λ) + H(z, λ)(ŷ − z) + O(‖ŷ − z‖²).

Using (3.1), (4.2) can be written as

    ∇ₓℓ(ŷ, λ) = −A(z)ᵀµ + O(‖ŷ − z‖²).

By continuity of A(·), ‖A(z)‖ is bounded. From Lemmas 3.3 and 3.6, µ̃ = O(ζ) and µ − µ̃ = O(ζ²). Thus µ = O(ζ). Since, by (3.14), ŷ − z = O(ζ), we have ∇ₓℓ(ŷ, λ) = O(ζ). Using this and Lemma 3.8 in (4.1), we obtain

    ℓ(x⁺, λ) − ℓ(ŷ, λ) = O(ζ ‖ŷ − z‖²) + O(‖ŷ − z‖⁴).

By (3.14) we have ‖ŷ − z‖⁴ = O(ζ ‖ŷ − z‖²), completing the proof.

The next lemma shows that near (x*, λ*), ignoring only the filter, the full tangential Newton step will always lead to a sufficient decrease for the Lagrangian, and also to a sufficient decrease for f(·) after a correction step.

Lemma 4.2. Consider the constants 0 < η₁ < η₂ < 1 and ∆ ≥ ∆min given in Algorithm 2.2, and d₂ = y − z. There exists a neighborhood V₅ ⊂ V₄ of (x*, λ*) such that if (x, λ) ∈ V₅, then
(i) ‖d₂‖ < ∆,
(ii) ℓ(z, λ) − ℓ(y, λ) ≥ η₂ (m_z(0) − m_z(d₂)),
(iii) f(z) − f(x⁺) ≥ η₁ (m_z(0) − m_z(d₂)).

Proof. (i) Consider d̂₂ a solution of (QP). From (3.14), there exists a constant M > 0 such that

    ‖d̂₂‖ ≤ M ‖(x, λ) − (x*, λ*)‖.

For (x, λ) ∈ B₁ = B((x*, λ*), ∆min/M) ∩ V₄, we have that ‖d̂₂‖ < ∆min ≤ ∆.
So d̂₂ = d₂ is the solution of (TP) and consequently (i) holds.
(ii) We denote pred = m_z(0) − m_z(d₂) > 0. From (1.5),

    ℓ(z, λ) − ℓ(y, λ) = −∇ₓℓ(z, λ)ᵀd₂ − ½ d₂ᵀH(z, λ)d₂ + O(‖d₂‖³) = pred + ‖d₂‖² O(‖d₂‖).

From Lemma 3.1, ‖d₂‖² = O(pred), and from (3.14), ‖d₂‖ = O(ζ). It follows that there exists a constant N > 0 such that

    ℓ(z, λ) − ℓ(y, λ) ≥ pred − N ζ pred.

For ζ sufficiently small, say (x, λ) in a neighborhood B₂ ⊂ B₁ of (x*, λ*), (ii) holds.
(iii) By the definition of the Lagrangian function we have

(4.3)    f(z) − f(x⁺) = ℓ(z, λ) − ℓ(x⁺, λ) − λᵀ(c(z) − c(x⁺))
                      = ℓ(z, λ) − ℓ(y, λ) + ℓ(y, λ) − ℓ(x⁺, λ) − λᵀ(c(z) − c(x⁺)).

From Lemmas 3.1 and 4.1,

(4.4)    ℓ(y, λ) − ℓ(x⁺, λ) = pred O(ζ).

By (2.6) and Lemmas 3.1 and 3.8, we have

(4.5)    c(z) − c(x⁺) = O(‖c(y) − c(z)‖²) = pred O(‖d₂‖²).

Then, using (ii), (4.4) and (4.5) in (4.3), we obtain

    f(z) − f(x⁺) ≥ pred (η₂ − O(ζ)).

For ζ sufficiently small, say (x, λ) in a neighborhood V₅ ⊂ B₂ of (x*, λ*), (iii) holds, completing the proof.

We can now establish the main result.

Theorem 4.3. Consider the sequence (xᵏ, λᵏ) generated by Algorithm 2.1 with the optimality phase computed by Algorithm 2.2. Let V₅ be the neighborhood given by Lemma 4.2. If (xᵏ, λᵏ) ∈ V₅, then

    ‖(xᵏ⁺¹, λᵏ⁺¹) − (x*, λ*)‖ = O(‖(xᵏ, λᵏ) − (x*, λ*)‖²).

Proof. Given (xᵏ, λᵏ) ∈ V₅, consider the trial point (x⁺, λ⁺) obtained by the algorithm. From Lemma 4.2 this point satisfies the sufficient decrease condition, and it is accepted by the filter by construction. Thus, this point will be accepted as the new iterate. Therefore, from Theorem 3.9, we have

    ‖(xᵏ⁺¹, λᵏ⁺¹) − (x*, λ*)‖ = O(‖(xᵏ, λᵏ) − (x*, λ*)‖²),

completing the proof. This establishes the quadratic convergence of the algorithm.
5. Acceptance of the corrector step by the current filter. In this short section we show the following result: if z is sufficiently near x*, then a single iteration of a quadratically convergent feasibility algorithm will suffice to eliminate the Maratos effect in the corrector step. Let x̄ be the point obtained by an iteration of Algorithm 2.2 using, in the corrector step, a single iteration of an algorithm satisfying (2.6).

Lemma 5.1. Consider V₅ the neighborhood given by Lemma 4.2. If (x, λ) ∈ V₅, then

    h(x̄) − h(z) = O([f(z) − f(x̄)]²).

Proof. From (2.6) and Lemma 3.8 we have

    h(x̄) − h(z) = O(‖c(y) − c(z)‖²) = O(‖y − z‖⁴).

Using Lemmas 3.1 and 4.2 we obtain f(z) − f(x̄) = Ω(‖y − z‖²). Therefore

    h(x̄) − h(z) = O([f(z) − f(x̄)]²),

completing the proof.

Corollary 5.2. Given β > 0, there exists a neighborhood V₆ ⊂ V₅ of (x*, λ*) such that if (x, λ) ∈ V₆, then

    |h(x̄) − h(z)| < β(f(z) − f(x̄)).

Theorem 5.3. There exists a neighborhood V₇ ⊂ V₆ of (x*, λ*) such that if (x, λ) ∈ V₇, then the point x̄ is not dominated by x.

Proof. Suppose that f(x̄) ≥ f(x) − αh(x). Thus,

(5.1)    f(z) − f(x̄) ≤ f(z) − f(x) + αh(x).

Assuming without loss of generality that V₆ ⊂ B((x*, λ*), 1) and using Lemma 3.3, we can conclude that x and z remain in a compact set. By the mean value theorem and (H6), there exist w ∈ [x, z] and constants L, M > 0 such that

    |f(z) − f(x)| = |∇f(w)ᵀ(z − x)| ≤ L‖z − x‖ ≤ M(h(x) − h(z)) ≤ M h(x).

From this and (5.1), we have f(z) − f(x̄) ≤ (M + α)h(x). Let β, γ > 0 be sufficiently small such that β(M + α) + γ < 1 − α. By Corollary 5.2, if (x, λ) ∈ V₆, then

    h(x̄) − h(z) < β(f(z) − f(x̄)).
Hence h(x̄) ≤ β(M + α)h(x) + h(z). From (H6), h(z) = O(h(x)²), thus there exists a neighborhood V such that if (x, λ) ∈ V, then h(z) < γh(x). Taking V₇ = V₆ ∩ V, for (x, λ) ∈ V₇ we have h(x̄) < (1 − α)h(x), completing the proof.

This shows that the Maratos effect is eliminated in a single iteration of the feasibility step. Further steps might be needed to avoid the regions blocked by old entries of the filter, but we shall not address this question.

6. Conclusion. We have presented an inexact restoration filter algorithm for equality constrained nonlinear programming. The algorithm is based on feasibility and optimality phases, which can be either independent or not. We compared the classical SQP Newton step and the IR Newton step, showing that both methods share the same asymptotic quadratic convergence properties with respect to the Lagrangian. When a full Newton step for the tangential minimization of the Lagrangian is efficient but not acceptable by the filter method, the algorithm computes a second order corrector step that tries to produce an acceptable point keeping the efficiency of the rejected step. The resulting point is tested by trust region criteria. We proved that the quadratic convergence of the inexact restoration method is preserved when corrector steps are added to the filter method.

REFERENCES

[1] C. Audet and J. E. Dennis. A pattern search filter method for nonlinear programming without derivatives. SIAM Journal on Optimization, 14(4):980-1010, 2004.
[2] J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal. Numerical Optimization: Theoretical and Practical Aspects. Springer-Verlag, Berlin, 2002.
[3] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Mathematical Programming, 96(1):161-177, 2003.
[4] R. Fletcher, N. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM Journal on Optimization, 13(3):635-659, 2002.
[5] R. Fletcher and S. Leyffer. A bundle filter method for nonsmooth nonlinear optimization. Technical Report NA/195, Dundee University, Dept. of Mathematics, 1999.
[6] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, Ser. A, 91(2):239-269, 2002.
[7] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SLP-filter algorithm. Technical Report NA/183, Dundee University, Dept. of Mathematics, 1998.
[8] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM Journal on Optimization, 13(1):44-59, 2002.
[9] R. Fletcher, S. Leyffer, and Ph. L. Toint. A brief history of filter methods. SIAG/Optimization Views-and-News, 18(1):2-12, 2007.
[10] C. C. Gonzaga, E. W. Karas, and M. Vanti. A globally convergent filter method for nonlinear programming. SIAM Journal on Optimization, 14(3):646-669, 2003.
[11] N. I. M. Gould, S. Leyffer, and Ph. L. Toint. A multidimensional filter algorithm for nonlinear equations and nonlinear least-squares. SIAM Journal on Optimization, 15(1):17-38, 2005.
[12] N. I. M. Gould, C. Sainvitu, and Ph. L. Toint. A filter-trust-region method for unconstrained optimization. SIAM Journal on Optimization, 16(2):341-357, 2006.
[13] E. W. Karas, A. P. Oening, and A. A. Ribeiro. Global convergence of slanting filter methods for nonlinear programming. Applied Mathematics and Computation, 2007. To appear.
[14] E. W. Karas, A. A. Ribeiro, C. Sagastizábal, and M. Solodov. A bundle-filter method for nonsmooth convex constrained optimization. Mathematical Programming, 2007. To appear.
[15] N. Maratos. Exact Penalty Function Algorithms for Finite Dimensional and Control Optimization Problems. PhD thesis, Imperial College of Science and Technology, University of London, 1978.
[16] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, 1999.
[17] M. J. D. Powell. Convergence properties of algorithms for nonlinear optimization. SIAM Review, 28:487-500, 1986.
[18] A. A. Ribeiro, E. W. Karas, and C. C. Gonzaga. Global convergence of filter methods for nonlinear programming. Technical report, Department of Mathematics, Federal University of Paraná, Brazil, 2006.
[19] M. Ulbrich, S. Ulbrich, and L. N. Vicente. A globally convergent primal-dual interior-point filter method for nonlinear programming. Mathematical Programming, Ser. A, 100(2):379-410, 2004.
[20] S. Ulbrich. On the superlinear local convergence of a filter-SQP method. Mathematical Programming, Ser. B, 100(1):217-245, 2004.
[21] A. Wächter and L. T. Biegler. Line search filter methods for nonlinear programming: Local convergence. SIAM Journal on Optimization, 16(1):32-48, 2005.
[22] A. Wächter and L. T. Biegler. Line search filter methods for nonlinear programming: Motivation and global convergence. SIAM Journal on Optimization, 16(1):1-31, 2005.