Journal of Applied Analysis Vol. 15, No. 2 (2009), pp. 149–172

A NEW NONLINEAR LAGRANGIAN METHOD FOR NONCONVEX SEMIDEFINITE PROGRAMMING

Y. LI and L. ZHANG

Received January 30, 2008 and, in revised form, August 17, 2008

Abstract. We study convergence properties of a new nonlinear Lagrangian method for nonconvex semidefinite programming. The convergence analysis shows that this method converges locally when the penalty parameter is below a threshold, and that the error bound of the solution is proportional to the penalty parameter, under the constraint nondegeneracy condition, the strict complementarity condition and the strong second order sufficient condition. The major tools used in the analysis are the second implicit function theorem and differentials of Löwner operators.

1. Introduction

Consider the following optimization problem
$$
\min\; f(x) \quad \text{s.t.}\quad h(x) = 0,\quad G(x) \in S^m_-, \tag{1.1}
$$

(2000 Mathematics Subject Classification. Primary: 90C22, 90C26. Key words and phrases: semidefinite programming, nonlinear Lagrangian method, convergence analysis. The research is supported by the National Natural Science Foundation of China under project No. 10771026 and by the Initial Funds for Imported Talents' Research Projects, Dalian Nationalities University, under project No. 20096208. ISSN 1425-6908. © Heldermann Verlag.)


where $f: \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function, $h: \mathbb{R}^n \to \mathbb{R}^q$ is a vector-valued mapping and $G: \mathbb{R}^n \to S^m$, where $S^m$ is the space of $m \times m$ real symmetric matrices and $S^m_-$ ($S^m_+$) denotes the cone of negative (positive) semidefinite real symmetric matrices.

As nonlinear Lagrangians can be used to develop primal-dual algorithms requiring no restrictions on primal feasibility, nonlinear Lagrangian methods have been widely studied in nonlinear programming; see for instance [2], [3], [16], [17], [18] and the references therein. Among these nonlinear Lagrangians, the following class of Lagrangians for nonlinear programming problems with only inequality constraints (e.g. $g_i(x) \le 0$, $i = 1, \ldots, p$),
$$
N(x, \mu, k) = f(x) + \frac{1}{k}\sum_{i=1}^{p} \mu_i\, \psi(k\, g_i(x)), \tag{1.2}
$$
is well investigated, where $k > 0$ is the penalty parameter and $\psi$ is a twice continuously differentiable function. In recent years, many contributions have been made to nonlinear Lagrangian methods for solving semidefinite programming problems [1], [11], [12], [13] and [22]. Mosheyev and Zibulevsky [11] presented penalty/barrier multiplier methods for semidefinite problems, but their convergence analysis is based on results for convex optimization problems. For nonconvex SDP problems, a satisfactory convergence analysis was provided by [12] and [20]. In particular, Fares et al. ([6], [7]) employed augmented Lagrangian methods to solve SDP problems in control theory, and Kocvara and Stingl ([9], [20]) studied a class of nonlinear Lagrangians and algorithms for solving control and structural optimization problems.

In this paper, we choose a new function $\psi$, defined by
$$
\psi(\mu) = 2\bigl(\log[(1-\mu)^{-1} + 1] - \log 2\bigr). \tag{1.3}
$$
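For orientation, a minimal numerical sketch of this penalty function may help (Python with NumPy is our choice here and in the later sketches; all function names are ours, not the paper's). A direct computation from (1.3) gives $\psi'(\mu) = 2/((1-\mu)(2-\mu))$, so $\psi(0) = 0$ and $\psi'(0) = 1$, the usual normalization for penalty functions of this type.

```python
import numpy as np

def psi(u):
    # Penalty function (1.3); well defined for u < 1 (the eigenvalues of
    # t^{-1} G(x) are nonpositive near a feasible point, so this suffices).
    return 2.0 * (np.log(1.0 / (1.0 - u) + 1.0) - np.log(2.0))

def psi_prime(u):
    # psi'(u) = 2 / ((1 - u)(2 - u)); hence psi(0) = 0 and psi'(0) = 1.
    return 2.0 / ((1.0 - u) * (2.0 - u))

assert abs(psi(0.0)) < 1e-14 and abs(psi_prime(0.0) - 1.0) < 1e-14
```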

The following Lagrangian for (1.1) is built from the Löwner operator $\Psi$ associated with $\psi$:
$$
F(x, \theta, U, t) = f(x) + \langle \theta, h(x)\rangle + (2t)^{-1}\|h(x)\|^2 + t\,\langle U, \Psi(t^{-1}G(x))\rangle, \tag{1.4}
$$
which, to our knowledge, has not been investigated before; here $\theta \in \mathbb{R}^q$, $U \in S^m_+$ and $t > 0$.

We now comment on how our work differs from that of [20]. First, the problem addressed here is a nonconvex problem with both equality constraints and an SDP constraint, whereas the problem in [20] has no equality constraints. Second, the convergence theorem [20, Theorem 6.14] requires that $(U, p)$ lie in $\mathcal{V}(U^*, p_0, \delta, \varepsilon, \Theta)$, which is defined as the intersection of three sets, one of them given by the constraints $s_i^T U s_i \ge \varepsilon$, $i \in I_{\mathrm{act}}$; the result is therefore somewhat more complicated than the corresponding result in nonlinear programming, see [15]. In Section 4 we obtain a convergence theorem (namely Theorem 4.1) analogous to the nonlinear programming case.

We organize the paper as follows. In the next section, the definition of the Löwner operator and some basic assumptions are introduced. In Section 3, we discuss properties of the nonlinear Lagrangian (1.4). In Section 4, we analyze the convergence of the proposed nonlinear Lagrangian method, including the local convergence and the rate of convergence.

2. Preliminaries and assumptions

2.1. Differential properties of the Löwner operator. For any two matrices $A$ and $B$ in $\mathbb{R}^{m\times n}$, $\langle A, B\rangle = \mathrm{tr}(A^T B)$ is the Frobenius inner product of $A$ and $B$, where "tr" denotes the trace of a matrix.

Definition 2.1 ([10]). A Löwner operator associated with $\psi: \mathbb{R} \to \mathbb{R}$ is a matrix function $\Psi: S^m \to S^m$ defined by
$$
\Psi(A) = P\, \mathrm{diag}\bigl(\psi(\lambda_1), \psi(\lambda_2), \ldots, \psi(\lambda_m)\bigr)\, P^T,
$$
where $A \in S^m$ has the spectral decomposition
$$
A = P\, \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)\, P^T,
$$
in which $P = (p_1, p_2, \ldots, p_m)$, $\lambda_i \in \mathbb{R}$, $i = 1, \ldots, m$, are the eigenvalues of $A$ and $p_i$, $i = 1, \ldots, m$, are the corresponding eigenvectors.

Throughout the paper, we use the Löwner operator $\Psi$ with $\psi$ defined by (1.3). Let $\sigma(A) = \{\lambda_1, \lambda_2, \ldots, \lambda_{\rho(A)}\}$ be the set consisting of all distinct eigenvalues of $A$. For any $\lambda_i \in \sigma(A)$, denote $I_i(A) := \{j \in \{1, \ldots, m\} \mid \lambda_j = \lambda_i\}$ and $P_i = \sum_{j \in I_i(A)} p_j p_j^T$.
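Definition 2.1 transcribes directly into code, reusing psi from the sketch in Section 1 (a hedged illustration of ours, not part of the paper):

```python
def lowner(A):
    # Löwner operator Psi(A) = P diag(psi(lambda_1), ..., psi(lambda_m)) P^T.
    lam, P = np.linalg.eigh(A)     # spectral decomposition of symmetric A
    return P @ np.diag(psi(lam)) @ P.T
```

Note that the result does not depend on which orthonormal eigenbasis eigh returns, since $\psi$ acts only on the spectrum.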

Definition 2.2. Let $\{\lambda_1, \ldots, \lambda_m\}$ be arranged in decreasing order. Define the first divided difference $\psi^{[1]}(\lambda)$ of $\psi$ at $\lambda$ as the $m \times m$ symmetric matrix with $ij$th entry $\psi^{[1]}(\lambda)_{ij} := \psi^{[1]}(\lambda_i, \lambda_j)$, where
$$
\psi^{[1]}(\lambda_i, \lambda_j) =
\begin{cases}
\dfrac{\psi(\lambda_i) - \psi(\lambda_j)}{\lambda_i - \lambda_j}, & \lambda_i \ne \lambda_j,\\[1.5ex]
\psi'(\lambda_i), & \lambda_i = \lambda_j.
\end{cases} \tag{2.1}
$$
The second divided difference $\psi^{[2]}(\lambda)$ of $\psi$ at $\lambda$ is defined as the $m \times m \times m$ tensor with $ijk$th entry $(\psi^{[2]}(\lambda))_{ijk} := \psi^{[2]}(\lambda_i, \lambda_j, \lambda_k)$, where
$$
\psi^{[2]}(\lambda_i, \lambda_j, \lambda_k) =
\begin{cases}
\dfrac{\psi^{[1]}(\lambda_i, \lambda_k) - \psi^{[1]}(\lambda_j, \lambda_k)}{\lambda_i - \lambda_j}, & \lambda_i \ne \lambda_j,\\[1.5ex]
\dfrac{\psi^{[1]}(\lambda_i, \lambda_j) - \psi^{[1]}(\lambda_j, \lambda_k)}{\lambda_i - \lambda_k}, & \lambda_i = \lambda_j \ne \lambda_k,\\[1.5ex]
\psi''(\lambda_i), & \lambda_i = \lambda_j = \lambda_k.
\end{cases} \tag{2.2}
$$
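These case splits also transcribe directly (a hedged sketch reusing psi and psi_prime from above; psi_second is our own name, computed from (1.3) as $\psi''(u) = 2/(1-u)^2 - 2/(2-u)^2$):

```python
def psi_second(u):
    # psi''(u) for the function (1.3).
    return 2.0 / (1.0 - u) ** 2 - 2.0 / (2.0 - u) ** 2

def psi1(a, b):
    # First divided difference (2.1).
    return psi_prime(a) if np.isclose(a, b) else (psi(a) - psi(b)) / (a - b)

def psi2(a, b, c):
    # Second divided difference (2.2), following the paper's case split.
    if np.isclose(a, b) and np.isclose(b, c):
        return psi_second(a)
    if np.isclose(a, b):                      # a = b != c
        return (psi1(a, b) - psi1(b, c)) / (a - c)
    return (psi1(a, c) - psi1(b, c)) / (a - b)
```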

It follows from [8, Theorem 6.6.30 (1)] that if $\psi$ is continuously differentiable in an interval containing all eigenvalues of $A$, then $\Psi$ is continuously differentiable at $A$ and the directional derivative of $\Psi$ at $A$ in the direction $H_1$ is
$$
D\Psi(A)(H_1) = P\bigl(\psi^{[1]}(\lambda) \circ (P^T H_1 P)\bigr)P^T = \sum_{k,l=1}^{\rho(A)} \psi^{[1]}(\lambda_k, \lambda_l)\, P_k H_1 P_l, \tag{2.3}
$$
where "$\circ$" denotes the Hadamard product, i.e., $A \circ B = (a_{ij}b_{ij})_{i,j=1}^m$. It is easy to check that the operator $D\Psi(A)$ is self-adjoint, namely
$$
\langle (D\Psi(A))^* H_1, H_2\rangle = \langle (D\Psi(A))H_1, H_2\rangle
$$
for any $H_1, H_2 \in S^m$. We have from [8, Theorem 6.6.30 (2)] that if $\psi$ is twice continuously differentiable in an interval containing all eigenvalues of $A$, then the second order directional derivative of $\Psi$ at $A$ in the directions $H_1$ and $H_2$ is
$$
D^2\Psi(A)(H_1, H_2) = D\bigl((D\Psi(A))(H_1)\bigr)(H_2) = \sum_{k,l,s=1}^{\rho(A)} \psi^{[2]}(\lambda_k, \lambda_l, \lambda_s)\bigl(P_k H_1 P_l H_2 P_s + P_s H_2 P_l H_1 P_k\bigr) \tag{2.4}
$$
for any $H_1, H_2 \in S^m$.
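Formula (2.3) admits a compact Hadamard-product implementation (a sketch under the same assumptions as before; the finite-difference comparison at the end is only a sanity check of ours, not a statement from the paper):

```python
def dlowner(A, H):
    # Directional derivative D Psi(A)(H) via the Hadamard form of (2.3),
    # using the full m x m first divided difference matrix from (2.1).
    lam, P = np.linalg.eigh(A)
    D1 = np.array([[psi1(a, b) for b in lam] for a in lam])
    return P @ (D1 * (P.T @ H @ P)) @ P.T

# Sanity check against a central finite difference at a random point.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5)); A = -(X @ X.T)        # A negative semidefinite
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2.0   # symmetric direction
eps = 1e-6
fd = (lowner(A + eps * H) - lowner(A - eps * H)) / (2.0 * eps)
assert np.allclose(dlowner(A, H), fd, atol=1e-5)
```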

The following lemma is a modified version of the Debreu theorem, which originally appeared in [5].

Lemma 2.1. Let $M \in S^n$, let $N \in \mathbb{R}^{m \times n}$ have full row rank and let $D \in S^m$ be a positive definite diagonal matrix. Suppose that $Ny = 0$, $y \ne 0$, implies $y^T M y > 0$. Then there exists a positive number $k_0$ such that the matrix $M + kN^T D N$ is positive definite whenever $k \ge k_0$.

Now, we introduce a definition which is useful in the next section.


Definition 2.3 ([14]). For any matrix $A \in \mathbb{C}^{m \times n}$, $G \in \mathbb{C}^{n \times m}$ is the Moore-Penrose inverse of $A$ if it satisfies the conditions
$$
AGA = A, \quad GAG = G, \quad (AG)^H = AG, \quad (GA)^H = GA,
$$
where $H$ stands for conjugate transpose.

2.2. Problem assumptions. The Lagrangian function for problem (1.1) is
$$
L(x, \theta, U) = f(x) + \langle \theta, h(x)\rangle + \langle U, G(x)\rangle, \quad (x, \theta, U) \in \mathbb{R}^n \times \mathbb{R}^q \times S^m.
$$
Let $(x^*, \theta^*, U^*)$ be a Karush-Kuhn-Tucker point, i.e.,
$$
\nabla_x L(x^*, \theta^*, U^*) = 0, \quad h(x^*) = 0, \quad \langle U^*, G(x^*)\rangle = 0, \quad U^* \in S^m_+ \quad \text{and} \quad G(x^*) \in S^m_-.
$$
We assume that $(x^*, \theta^*, U^*)$ satisfies the following conditions.

Assumption (a). The constraint nondegeneracy condition is satisfied [4]. Assume $\operatorname{rank} G(x^*) = r$ and that $G(x^*)$ has the spectral decomposition $G(x^*) = P\Lambda P^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is the diagonal matrix whose diagonal elements are the eigenvalues of $G(x^*)$ in decreasing order, i.e., $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$. Then $\lambda_1 = \cdots = \lambda_{m-r} = 0$. Let $\gamma_1 = \{1, \ldots, m-r\}$ and $\gamma_2 = \{m-r+1, \ldots, m\}$ with $1 < r < m$. Notice that the vectors $p_1, p_2, \ldots, p_{m-r} \in \mathbb{R}^m$ form an orthonormal basis for the null space of the matrix $G(x^*)$, and Assumption (a) implies the linear independence of the following set of vectors:
$$
v_{ij} = \Bigl(p_i^T \frac{\partial G(x^*)}{\partial x_1} p_j,\; p_i^T \frac{\partial G(x^*)}{\partial x_2} p_j,\; \ldots,\; p_i^T \frac{\partial G(x^*)}{\partial x_n} p_j\Bigr)^T, \quad 1 \le i \le j \le m-r.
$$
The constraint nondegeneracy condition is then equivalent to the fact that $(\nabla h(x^*)\; A_{\gamma_1\gamma_1})$ has full column rank, where $A_{\gamma_1\gamma_1}$ is the matrix whose columns are $\{v_{ij} : 1 \le i \le j \le m-r\}$. This is a generalization of the linear independence constraint qualification (LICQ) from classical nonlinear programming, and it implies that $(\theta^*, U^*)$ is unique.

Assumption (b). The strict complementarity condition holds at $(x^*, \theta^*, U^*)$, i.e.,
$$
\operatorname{rank}(G(x^*)) = r, \qquad \operatorname{rank}(U^*) = m - r.
$$
Without loss of generality, we can assume that $G(x^*)$ and $U^*$ have the expressions
$$
G(x^*) = P\begin{pmatrix} 0 & 0\\ 0 & \Lambda_{\gamma_2\gamma_2}\end{pmatrix}P^T, \qquad U^* = P\begin{pmatrix} \Lambda_{\gamma_1\gamma_1} & 0\\ 0 & 0\end{pmatrix}P^T, \tag{2.5}
$$


where $\Lambda_{\gamma_1\gamma_1} = \mathrm{diag}_{1\le j\le m-r}(\lambda'_j) \in S^{m-r}$ is positive definite and $\Lambda_{\gamma_2\gamma_2} = \mathrm{diag}_{m-r+1\le i\le m}(\lambda_i) \in S^r$ is negative definite. In addition, let $P = (p_1, p_2, \ldots, p_m) = (P_{\gamma_1}, P_{\gamma_2})$, where the columns of $P_{\gamma_1} \in \mathbb{R}^{m\times(m-r)}$ form an orthonormal basis for the null space of the matrix $G(x^*)$. Let $P^0 = P_{\gamma_1}P_{\gamma_1}^T$ and $P^{0\perp} = P_{\gamma_2}P_{\gamma_2}^T$.

Assumption (c). The strong second order sufficient condition holds at $x^*$ [21]:
$$
\bigl\langle d, \nabla^2_{xx} L(x^*, \theta^*, U^*)d\bigr\rangle + \Upsilon_{G(x^*)}\bigl(U^*, DG(x^*)d\bigr) > 0 \quad \forall\, d \in \mathrm{app}(\theta^*, U^*)\setminus\{0\},
$$
where
$$
\mathrm{app}(\theta^*, U^*) := \bigl\{d \mid Dh(x^*)d = 0,\; DG(x^*)d \in \operatorname{aff}\,\mathcal{C}\bigl(G(x^*) + U^*;\, S^m_-\bigr)\bigr\} \tag{2.6}
$$
and, for any given $B \in S^p$, the linear-quadratic function $\Upsilon_B(\cdot,\cdot)$ is defined by
$$
\Upsilon_B(\Gamma, C) := -2\bigl\langle \Gamma,\; C B^\dagger C\bigr\rangle, \quad (\Gamma, C) \in S^p \times S^p,
$$
with $B^\dagger$ being the Moore-Penrose pseudo-inverse of $B$.

Note that when the strict complementarity condition holds, namely Assumption (b) is satisfied, the strong second order sufficient condition made in Assumption (c) reduces to the so-called "no gap" second order sufficient optimality condition [4, Section 5.3.5]: for any $d \in C(x^*)\setminus\{0\}$,
$$
\sup_{(\theta, U) \in \Lambda(x^*)} \bigl\{ d^T \nabla^2_{xx} L(x^*, \theta, U) d + d^T \Gamma(x^*, U) d \bigr\} > 0,
$$
where $\Lambda(x^*)$ is the set of Lagrange multipliers corresponding to $x^*$,
$$
C(x^*) = \mathrm{app}(\theta^*, U^*) = \bigl\{d \in \mathbb{R}^n \mid \nabla h(x^*)^T d = 0,\; P_{\gamma_1}^T (DG(x^*)d) P_{\gamma_1} = 0\bigr\}
$$
is the critical cone at $x^*$, and
$$
\Gamma(x^*, U)_{ij} = -2\Bigl\langle U,\; \frac{\partial G(x^*)}{\partial x_i}\, G(x^*)^\dagger\, \frac{\partial G(x^*)}{\partial x_j}\Bigr\rangle, \quad i, j = 1, \ldots, n;
$$
this expression can be found in formula (4.13) of [19].
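The sigma-term matrix $\Gamma(x^*, U^*)$ reappears throughout the convergence analysis, so a small numerical sketch may be useful (our own illustration; dG is an assumed input layout holding the $n$ partial derivative matrices $\partial G(x^*)/\partial x_i$):

```python
def sigma_term(dG, U, Gstar):
    # Gamma(x*, U)_{ij} = -2 <U, dG_i G(x*)^dagger dG_j>, i, j = 1, ..., n.
    Gdag = np.linalg.pinv(Gstar)    # Moore-Penrose inverse (Definition 2.3)
    n = len(dG)
    Gamma = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # <A, B> = tr(A^T B); U is symmetric, so the transpose is free.
            Gamma[i, j] = -2.0 * np.trace(U @ dG[i] @ Gdag @ dG[j])
    return Gamma
```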

3. Properties of nonlinear Lagrangians

Define
$$
\upsilon(x, \theta, t) = \theta + t^{-1}h(x) \quad \text{and} \quad W(x, U, t) = D\Psi(B)\big|_{B=t^{-1}G(x)}(U).
$$

Lemma 3.1. Let $(x^*, \theta^*, U^*)$ be a KKT point of problem (1.1). If Assumption (b) holds, then
$$
\upsilon(x^*, \theta^*, t) = \theta^*, \quad \forall\, t > 0, \tag{3.1}
$$
$$
W(x^*, U^*, t) = U^*, \quad \forall\, t > 0. \tag{3.2}
$$

Proof. Equation (3.1) obviously holds, since $h(x^*) = 0$. We now prove (3.2). Let $\tilde e_{m-r} = (1, \ldots, 1)^T \in \mathbb{R}^{m-r}$. From the decompositions in (2.5), we have $\lambda_1 = \lambda_2 = \cdots = \lambda_{m-r} = 0$. The matrix $\psi^{[1]}(t^{-1}\lambda)$ can be written in block form as
$$
\psi^{[1]}(t^{-1}\lambda) = \begin{pmatrix} \psi^{[1]}(t^{-1}\lambda)_{\gamma_1\gamma_1} & \psi^{[1]}(t^{-1}\lambda)_{\gamma_1\gamma_2}\\ \psi^{[1]}(t^{-1}\lambda)_{\gamma_2\gamma_1} & \psi^{[1]}(t^{-1}\lambda)_{\gamma_2\gamma_2}\end{pmatrix},
$$
and, noting that $\psi^{[1]}(t^{-1}\lambda)_{\gamma_1\gamma_1} = \tilde e_{m-r}\tilde e_{m-r}^T$ (because $\psi^{[1]}(0, 0) = \psi'(0) = 1$), we have
$$
W(x^*, U^*, t) = P\bigl(\psi^{[1]}(t^{-1}\lambda) \circ (P^T U^* P)\bigr)P^T = P\begin{pmatrix} \psi^{[1]}(t^{-1}\lambda)_{\gamma_1\gamma_1} \circ \Lambda_{\gamma_1\gamma_1} & 0\\ 0 & 0\end{pmatrix}P^T = P\begin{pmatrix} \Lambda_{\gamma_1\gamma_1} & 0\\ 0 & 0\end{pmatrix}P^T = U^*,
$$
which completes the proof.
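Lemma 3.1 says that the multiplier map $U \mapsto W(x, U, t)$ has $U^*$ as a fixed point at $x^*$. Reusing dlowner from Section 2, this is easy to confirm numerically on a toy pair $(G(x^*), U^*)$ built to satisfy strict complementarity (again a hedged sketch with names of our own choosing):

```python
# A 4x4 example with rank G(x*) = 2 and rank U* = 2, sharing one eigenbasis.
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((4, 4)))
Gstar = Q @ np.diag([0.0, 0.0, -1.5, -3.0]) @ Q.T   # G(x*) in S^4_-
Ustar = Q @ np.diag([2.0, 0.7, 0.0, 0.0]) @ Q.T     # U* in S^4_+, <U*, G(x*)> = 0

for t in (1.0, 0.1, 0.01):
    assert np.allclose(dlowner(Gstar / t, Ustar), Ustar, atol=1e-10)   # (3.2)
```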

Theorem 3.1. Under Assumptions (a)–(c), the nonlinear Lagrangian (1.4) has the following important properties:

(a) $\nabla_x F(x^*, \theta^*, U^*, t) = 0$;

(b) there exists a scalar $t_0 > 0$ such that for any $0 < t \le t_0$, $\nabla^2_{xx}F(x^*, \theta^*, U^*, t) \succ 0$, where "$\succ 0$" means that a matrix is positive definite.

Proof. By direct calculation we have
$$
\begin{aligned}
\nabla_x F(x, \theta, U, t) &= \nabla f(x) + \nabla h(x)\theta + t^{-1}\nabla h(x)h(x) + t\bigl(D_x(\Psi(t^{-1}G(x)))\bigr)^* U\\
&= \nabla f(x) + \nabla h(x)\upsilon(x, \theta, t) + t\bigl(D\Psi(B)\big|_{B=t^{-1}G(x)}\, t^{-1}D_xG(x)\bigr)^* U\\
&= \nabla f(x) + \nabla h(x)\upsilon(x, \theta, t) + (D_xG(x))^*\bigl(D\Psi(B)\big|_{B=t^{-1}G(x)}\bigr)^* U\\
&= \nabla f(x) + \nabla h(x)\upsilon(x, \theta, t) + (D_xG(x))^*\bigl(D\Psi(B)\big|_{B=t^{-1}G(x)}(U)\bigr)\\
&= \nabla_x L\bigl(x, \upsilon(x, \theta, t), W(x, U, t)\bigr).
\end{aligned}
$$
Combining this with Lemma 3.1, we obtain
$$
\nabla_x F(x^*, \theta^*, U^*, t) = \nabla_x L\bigl(x^*, \upsilon(x^*, \theta^*, t), W(x^*, U^*, t)\bigr) = \nabla_x L(x^*, \theta^*, U^*) = 0,
$$
which completes the proof of (a).

According to equation (1.4), we have
$$
\nabla^2_{xx} F(x, \theta, U, t) = \nabla^2_{xx} L\bigl(x, \upsilon(x, \theta, t), W(x, U, t)\bigr) + t^{-1}\nabla h(x)\nabla h(x)^T + J_x\bigl(DG^* W(x, U, t)\bigr)\big|_{DG=DG(x)} = \nabla^2_{xx} L\bigl(x, \upsilon(x, \theta, t), W(x, U, t)\bigr) + t^{-1}\nabla h(x)\nabla h(x)^T + Q(x, U, t),
$$
where
$$
Q(x, U, t) = t^{-1}\Bigl(DG(x)^* D^2\Psi(B)\big|_{B=t^{-1}G(x)}\Bigl(U, \frac{\partial G(x)}{\partial x_1}\Bigr), \ldots, DG(x)^* D^2\Psi(B)\big|_{B=t^{-1}G(x)}\Bigl(U, \frac{\partial G(x)}{\partial x_n}\Bigr)\Bigr).
$$
For $i, j \in \{1, \ldots, n\}$,
$$
\begin{aligned}
&\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, D^2\Psi(B)\big|_{B=t^{-1}G(x^*)}\Bigl(U^*, \frac{\partial G(x^*)}{\partial x_j}\Bigr)\Bigr\rangle\\
&= \Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, \sum_{k,l,s=1}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l, t^{-1}\lambda_s\bigr)\Bigl(P_k(x^*)U^*P_l(x^*)\frac{\partial G(x^*)}{\partial x_j}P_s(x^*) + P_s(x^*)\frac{\partial G(x^*)}{\partial x_j}P_l(x^*)U^*P_k(x^*)\Bigr)\Bigr\rangle\\
&= 2\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}\sum_{k=1}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr)P_k(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle,
\end{aligned}
$$
and, since
$$
t^{-1}\psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr) = -\frac{1}{\lambda_k} + \frac{\psi^{[1]}(0, t^{-1}\lambda_k)}{\lambda_k}, \quad k = 2, \ldots, \rho(G(x^*)),
$$
we obtain
$$
\begin{aligned}
Q(x^*, U^*, t)_{ij} - \Gamma(x^*, U^*)_{ij} &= 2t^{-1}\sum_{k=1}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr)\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}P_k(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle + 2\sum_{k=2}^{\rho(G(x^*))} \frac{1}{\lambda_k}\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}P_k(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle\\
&= 2t^{-1}\psi^{[2]}(0, 0, 0)\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}P_1(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle + 2\sum_{k=2}^{\rho(G(x^*))} \Bigl(\frac{1}{\lambda_k} + t^{-1}\psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr)\Bigr)\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}P_k(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle\\
&= 2t^{-1}\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}P_1(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle + 2\sum_{k=2}^{\rho(G(x^*))} \frac{\psi^{[1]}(0, t^{-1}\lambda_k)}{\lambda_k}\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}P_k(x^*)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle\\
&= 2t^{-1}\sum_{k,l=1}^{m-r} \lambda'_k\, p_k^T\frac{\partial G(x^*)}{\partial x_i}p_l\, p_l^T\frac{\partial G(x^*)}{\partial x_j}p_k + 2\sum_{k=m-r+1}^{m}\sum_{l=1}^{m-r} \frac{\psi^{[1]}(0, t^{-1}\lambda_k)\lambda'_l}{\lambda_k}\, p_l^T\frac{\partial G(x^*)}{\partial x_i}p_k\, p_k^T\frac{\partial G(x^*)}{\partial x_j}p_l.
\end{aligned}
$$
Thus we have
$$
\nabla^2_{xx} F(x^*, \theta^*, U^*, t) = \nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*) + t^{-1}\nabla h(x^*)\nabla h(x^*)^T + Q(x^*, U^*, t) - \Gamma(x^*, U^*) = \nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*) + 2t^{-1}\bigl(\nabla h(x^*)\; A_{\gamma_1\gamma_1}\bigr)D_1\bigl(\nabla h(x^*)\; A_{\gamma_1\gamma_1}\bigr)^T + 2A_{\gamma_1\gamma_2}D_2A_{\gamma_1\gamma_2}^T,
$$
where
$$
D_1 = \mathrm{diag}\Bigl(\underbrace{1/2, \ldots, 1/2}_{q}, \lambda'_1, 2\lambda'_1, \ldots, 2\lambda'_1, \lambda'_2, 2\lambda'_2, \ldots, 2\lambda'_2, \ldots, \lambda'_{m-r}\Bigr),
$$
$$
D_2 = \mathrm{diag}\Bigl(\frac{\psi^{[1]}(0, t^{-1}\lambda_{m-r+1})\lambda'_1}{\lambda_{m-r+1}}, \ldots, \frac{\psi^{[1]}(0, t^{-1}\lambda_m)\lambda'_1}{\lambda_m}, \ldots, \frac{\psi^{[1]}(0, t^{-1}\lambda_{m-r+1})\lambda'_{m-r}}{\lambda_{m-r+1}}, \ldots, \frac{\psi^{[1]}(0, t^{-1}\lambda_m)\lambda'_{m-r}}{\lambda_m}\Bigr),
$$
and $A_{\gamma_1\gamma_2}$ is the matrix with columns
$$
\Bigl(p_i^T\frac{\partial G(x^*)}{\partial x_1}p_j, \ldots, p_i^T\frac{\partial G(x^*)}{\partial x_n}p_j\Bigr)^T, \quad 1 \le i \le m-r, \quad m-r+1 \le j \le m.
$$
Since Assumption (a) holds at $x^*$, the corresponding Lagrange multiplier is unique and the second order sufficient condition in Assumption (c) can be equivalently expressed as
$$
d^T\bigl(\nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*)\bigr)d \ge 2\varrho_0\|d\|^2 \quad \forall\, d \in C(x^*)
$$
for some $\varrho_0 > 0$. According to equation (2.1), there exist $\tilde t_0 > 0$ and $\tilde\varrho_0 > 0$ such that for $t \in (0, \tilde t_0]$,
$$
\bigl|\psi^{[1]}(t^{-1}\lambda_k, 0)\bigr| = \Bigl|\frac{2t}{\lambda_k}\log\frac{(1 - t^{-1}\lambda_k)^{-1} + 1}{2}\Bigr| \le \tilde\varrho_0\, t, \quad \text{where } \tilde\varrho_0 \ge \Bigl|\frac{2}{\lambda_k}\log\frac{1}{2}\Bigr|, \quad k = 2, \ldots, m.
$$
Hence, there exists $\tilde t_1 \in (0, \tilde t_0]$ such that for $t \in (0, \tilde t_1]$ we have $A_{\gamma_1\gamma_2}D_2A_{\gamma_1\gamma_2}^T \succeq -\tfrac{1}{2}\varrho_0 I_n$. Accordingly, for any $d \in C(x^*)$, which satisfies $(\nabla h(x^*)\; A_{\gamma_1\gamma_1})^T d = 0$, we obtain from Assumption (c) that
$$
d^T\bigl(\nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*)\bigr)d + 2d^T A_{\gamma_1\gamma_2}D_2A_{\gamma_1\gamma_2}^T d \ge \varrho_0\|d\|^2.
$$
Then, by Lemma 2.1, there exists $t_0 \in (0, \tilde t_1]$ such that for any $t \in (0, t_0]$,
$$
\nabla^2_{xx} F(x^*, \theta^*, U^*, t) \succ 0. \qquad\square
$$
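Before turning to the algorithm, here is a small sketch of evaluating the Lagrangian (1.4) itself, reusing lowner from Section 2 (the callables f, h, G and their interface are our assumptions, not the paper's):

```python
def lagrangian_F(x, theta, U, t, f, h, G):
    # F(x, theta, U, t) = f + <theta, h> + ||h||^2 / (2t) + t <U, Psi(G/t)>,
    # exactly the four terms of (1.4).
    hx, Gx = h(x), G(x)
    return (f(x) + theta @ hx + (hx @ hx) / (2.0 * t)
            + t * np.trace(U @ lowner(Gx / t)))
```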

4. Convergence analysis

This section presents the algorithm and focuses on analyzing the rate of local convergence of the nonlinear Lagrangian method.

Algorithm.

Step 1: Given $\vartheta \in (0, 1)$, $t_0 > 0$, $\theta^0 \in \mathbb{R}^q$ and $U^0 \in S^m_+$, set $k = 0$.

Step 2: Minimize $F(x, \theta^k, U^k, t_k)$ and obtain the solution $x^k$.

Step 3: If $U^k$ satisfies the stopping criterion for (1.1), stop the iteration; $x^k$ is then a KKT point of problem (1.1).

Step 4: Otherwise, update $\theta^{k+1}$ and $U^{k+1}$ by
$$
\theta^{k+1} = \theta^k + t_k^{-1}h(x^k), \qquad U^{k+1} = D\Psi(B)\big|_{B = t_k^{-1}G(x^k)}(U^k),
$$
and set
$$
t_{k+1} = \begin{cases} t_k & \text{if } \|U^{k+1} - U^k\| \le \|U^k - U^{k-1}\|,\\ \vartheta\, t_k & \text{otherwise.} \end{cases}
$$

Step 5: Set $k = k + 1$ and go back to Step 2.

Note that the update scheme for $t_k$ is the one adopted in [12], and that the stopping criterion should be chosen before the algorithm is implemented.
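The outer loop can be sketched as follows, reusing lagrangian_F and dlowner from the earlier sketches. The inner solver (scipy's BFGS here) and the concrete stopping test are our assumptions, since the paper leaves both to the implementer:

```python
from scipy.optimize import minimize

def nonlinear_lagrangian_method(f, h, G, x0, theta0, U0, t0,
                                vartheta=0.5, tol=1e-8, max_iter=50):
    x, theta, U, t = x0, theta0, U0, t0
    prev_step = np.inf
    for _ in range(max_iter):
        # Step 2: unconstrained minimization of F(., theta^k, U^k, t_k).
        x = minimize(lambda z: lagrangian_F(z, theta, U, t, f, h, G),
                     x, method="BFGS").x
        # Step 4: multiplier updates of the paper.
        theta = theta + h(x) / t
        U_new = dlowner(G(x) / t, U)    # U^{k+1} = DPsi(t_k^{-1} G(x^k))(U^k)
        step = np.linalg.norm(U_new - U)
        # Step 3: one plausible stopping test (multipliers and feasibility settle).
        if step <= tol and np.linalg.norm(h(x)) <= tol:
            return x, theta, U_new
        # Shrink t only when the multiplier sequence stops contracting.
        if step > prev_step:
            t *= vartheta
        prev_step, U = step, U_new
    return x, theta, U
```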

Let $\theta$ and $U$ satisfy $\theta = t^{-1}\eta + \theta^*$ and $U = t^{-1}\Theta + U^*$, respectively. For any matrix $M = (m_{ij}) \in S^n$, we define
$$
\operatorname{svec} M := \bigl(m_{11}, \sqrt{2}m_{12}, m_{22}, \sqrt{2}m_{13}, \sqrt{2}m_{23}, m_{33}, \ldots, m_{nn}\bigr)^T \in \mathbb{R}^{n(n+1)/2}.
$$
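This is the standard symmetric vectorization; a direct transcription (ours) that preserves inner products, i.e. $\operatorname{svec}(A)^T\operatorname{svec}(B) = \langle A, B\rangle$:

```python
def svec(M):
    # Stack the upper triangle column by column, scaling off-diagonal
    # entries by sqrt(2): (m11, √2 m12, m22, √2 m13, √2 m23, m33, ...).
    n = M.shape[0]
    return np.array([M[i, j] if i == j else np.sqrt(2.0) * M[i, j]
                     for j in range(n) for i in range(j + 1)])
```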

Lemma 4.1. If Assumption (a) is satisfied, then
$$
J_x \operatorname{svec}\bigl(P_{\gamma_1}^T W(x, U, t)P_{\gamma_1}\bigr)^T\Big|_{(x,U,t)=(x^*,U^*,t)} = t^{-1}\Bigl[\operatorname{svec}\Bigl(P_{\gamma_1}^T\Bigl(U^*\frac{\partial G(x^*)}{\partial x_j} + \frac{\partial G(x^*)}{\partial x_j}U^*\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n} \tag{4.1}
$$
and
$$
\Bigl[\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, \frac{\partial \mathcal{W}(x^*, U^*, t)}{\partial x_j}\Bigr\rangle\Bigr]_{i,j=1}^{n} \longrightarrow \Gamma(x^*, U^*) \quad \text{as } t \to 0^+, \tag{4.2}
$$
where $\mathcal{W}(x, U, t) = P^0 W(x, U, t)P^{0\perp} + P^{0\perp}W(x, U, t)P^{0\perp} + P^{0\perp}W(x, U, t)P^{0}$.

Proof. Since
$$
J_x \operatorname{svec}\bigl(P_{\gamma_1}^T W(x, U, t)P_{\gamma_1}\bigr)^T = t^{-1}\Bigl[\operatorname{svec}\Bigl(P_{\gamma_1}^T D^2\Psi(B)\big|_{B=t^{-1}G(x)}\Bigl(U, \frac{\partial G(x)}{\partial x_j}\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n} \tag{4.3}
$$
and
$$
D^2\Psi(B)\big|_{B=t^{-1}G(x^*)}\Bigl(U^*, \frac{\partial G(x^*)}{\partial x_j}\Bigr) = \sum_{k,l,s=1}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l, t^{-1}\lambda_s\bigr)\Bigl(P_k(x^*)U^*P_l(x^*)\frac{\partial G(x^*)}{\partial x_j}P_s(x^*) + P_s(x^*)\frac{\partial G(x^*)}{\partial x_j}P_l(x^*)U^*P_k(x^*)\Bigr) = \sum_{s=1}^{\rho(G(x^*))} \psi^{[2]}\bigl(0, 0, t^{-1}\lambda_s\bigr)\Bigl(U^*\frac{\partial G(x^*)}{\partial x_j}P_s(x^*) + P_s(x^*)\frac{\partial G(x^*)}{\partial x_j}U^*\Bigr),
$$
we obtain that
$$
J_x \operatorname{svec}\bigl(P_{\gamma_1}^T W(x, U, t)P_{\gamma_1}\bigr)^T\Big|_{(x,U,t)=(x^*,U^*,t)} = t^{-1}\Bigl[\operatorname{svec}\Bigl(\psi^{[2]}(0,0,0)\,P_{\gamma_1}^T\Bigl(U^*\frac{\partial G(x^*)}{\partial x_j} + \frac{\partial G(x^*)}{\partial x_j}U^*\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n} = t^{-1}\Bigl[\operatorname{svec}\Bigl(P_{\gamma_1}^T\Bigl(U^*\frac{\partial G(x^*)}{\partial x_j} + \frac{\partial G(x^*)}{\partial x_j}U^*\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n},
$$
which proves equation (4.1).

For $i, j \in \{1, \ldots, n\}$, a direct calculation yields
$$
\begin{aligned}
\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, \frac{\partial \mathcal{W}(x^*, U^*, t)}{\partial x_j}\Bigr\rangle &= \Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, P^0 D^2\Psi(B)\big|_{B=t^{-1}G(x^*)}\Bigl(U^*, t^{-1}\frac{\partial G(x^*)}{\partial x_j}\Bigr)P^{0\perp}\Bigr\rangle + \Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, P^{0\perp} D^2\Psi(B)\big|_{B=t^{-1}G(x^*)}\Bigl(U^*, t^{-1}\frac{\partial G(x^*)}{\partial x_j}\Bigr)P^{0\perp}\Bigr\rangle\\
&\quad+ \Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, P^{0\perp} D^2\Psi(B)\big|_{B=t^{-1}G(x^*)}\Bigl(U^*, t^{-1}\frac{\partial G(x^*)}{\partial x_j}\Bigr)P^{0}\Bigr\rangle.
\end{aligned}
$$
Noting that $P^{0\perp}U^* = U^*P^{0\perp} = 0$ and $P^0U^* = U^*P^0 = U^*$, and using the expansion above, we get
$$
\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, \frac{\partial \mathcal{W}(x^*, U^*, t)}{\partial x_j}\Bigr\rangle = t^{-1}\Bigl\langle \sum_{k=2}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr)U^*\frac{\partial G(x^*)}{\partial x_i}P_k(x^*), \frac{\partial G(x^*)}{\partial x_j}\Bigr\rangle + t^{-1}\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}\sum_{k=2}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr)P_k(x^*)U^*, \frac{\partial G(x^*)}{\partial x_j}\Bigr\rangle = 2t^{-1}\Bigl\langle U^*, \sum_{k=2}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, 0, 0\bigr)\frac{\partial G(x^*)}{\partial x_i}P_k(x^*)\frac{\partial G(x^*)}{\partial x_j}\Bigr\rangle.
$$
When $t \to 0$, one has $t^{-1}\psi^{[2]}(t^{-1}\lambda_k, 0, 0) \to -1/\lambda_k$, $k = 2, \ldots, \rho(G(x^*))$. Hence we obtain
$$
\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, \frac{\partial \mathcal{W}(x^*, U^*, t)}{\partial x_j}\Bigr\rangle \longrightarrow -2\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}\Bigl(\sum_{k=m-r+1}^{m} \frac{1}{\lambda_k}p_kp_k^T\Bigr)\frac{\partial G(x^*)}{\partial x_j}, U^*\Bigr\rangle = \Gamma(x^*, U^*)_{ij},
$$
where $i, j = 1, \ldots, n$. This completes the proof of equation (4.2). $\square$

Lemma 4.2. Let

the mappings $M_1, M_2, M_3, M_4$ be defined by
$$
\begin{aligned}
M_1(x, \Theta, t) &= J_{\operatorname{svec}\Theta}\Bigl[\Bigl\langle \mathcal{W}\bigl(x, t^{-1}\Theta + U^*, t\bigr), \frac{\partial G(x)}{\partial x_i}\Bigr\rangle\Bigr]_{i=1}^{n},\\
M_2(x, \Theta, t) &= t\,J_{\operatorname{svec}\Theta}\operatorname{svec}\bigl(P_{\gamma_1}^T W\bigl(x, t^{-1}\Theta + U^*, t\bigr)P_{\gamma_1}\bigr),\\
M_3(x, \Theta, t) &= t^{-1}\Bigl[D^2\Psi(B)\big|_{B=t^{-1}G(x)}\Bigl(t^{-1}\Theta + U^*, \frac{\partial G(x)}{\partial x_i}\Bigr)\Bigr]_{i=1}^{n},\\
M_4(x, \Theta, t) &= J_{\operatorname{svec}\Theta}\operatorname{svec}\bigl(D\Psi(B)\big|_{B=t^{-1}G(x)}\bigl(t^{-1}\Theta + U^*\bigr)\bigr).
\end{aligned}
$$
Suppose that Assumptions (a)–(c) are satisfied. Then there exist positive constants $\varrho_1, \varrho_2, \varrho_3, \varrho_4, \varrho_5, \varrho_6$, $t_1$ and $\delta_1$ such that
$$
\|M_1(x, \Theta, t)\|_F \le \varrho_1, \quad \|M_2(x, \Theta, t)\|_F \le \varrho_2, \quad \|P^0M_3(x, \Theta, t)P^{0\perp}\|_F \le \varrho_3,
$$
$$
\|P^0M_4(x, \Theta, t)P^{0\perp}\|_F \le \varrho_4, \quad \|P^{0\perp}M_3(x, \Theta, t)P^{0\perp}\|_F \le \varrho_5, \quad \|P^{0\perp}M_4(x, \Theta, t)P^{0\perp}\|_F \le \varrho_6,
$$
for any $x \in B_{\delta_1}(x^*)$, $\Theta \in B_{\delta_1}(0)$ and $t \in (0, t_1]$, where the projections act componentwise on the matrix-valued components $M_3^i$ and $M_4^i$.

Proof. First, we can find a constant $t_0$ small enough such that for $t \in (0, t_0]$, $k = 2, \ldots, \rho(G(x^*))$ and $l = 1, \ldots, \rho(G(x^*))$,
$$
\bigl|t^{-1}\psi^{[1]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l\bigr)\bigr| \le \bar\varrho_0, \tag{4.4}
$$
$$
\bigl|t^{-1}\psi^{[2]}\bigl(0, 0, t^{-1}\lambda_k\bigr)\bigr| \le \bar\varrho_0, \tag{4.5}
$$
where $\bar\varrho_0$ is a positive constant. In fact, for $\lambda_k < 0$, $k = 2, \ldots, \rho(G(x^*))$, and by (1.3), we have
$$
t^{-1}\psi^{[1]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l\bigr) = t^{-1}\,\frac{\psi(t^{-1}\lambda_k) - \psi(t^{-1}\lambda_l)}{t^{-1}(\lambda_k - \lambda_l)} = \frac{2}{\lambda_k - \lambda_l}\Bigl(\log\Bigl[\frac{1}{1 - t^{-1}\lambda_k} + 1\Bigr] - \log\Bigl[\frac{1}{1 - t^{-1}\lambda_l} + 1\Bigr]\Bigr) = \frac{2}{\lambda_k - \lambda_l}\log\frac{(2t - \lambda_k)(t - \lambda_l)}{(t - \lambda_k)(2t - \lambda_l)} \xrightarrow{t \to 0} 0
$$
for $\lambda_k \ne \lambda_l$, and
$$
t^{-1}\psi^{[2]}\bigl(0, 0, t^{-1}\lambda_k\bigr) = t^{-1}\,\frac{\psi^{[1]}(0, t^{-1}\lambda_k) - \psi^{[1]}(0, 0)}{t^{-1}\lambda_k - 0} = \frac{1}{\lambda_k}\Bigl(\frac{\psi(0) - \psi(t^{-1}\lambda_k)}{0 - t^{-1}\lambda_k} - 1\Bigr) = \frac{2t\log\dfrac{2t - \lambda_k}{2t - 2\lambda_k} - \lambda_k}{\lambda_k^2} \xrightarrow{t \to 0} -\frac{1}{\lambda_k}.
$$
Thus there exist a constant $t_0$ and $\bar\varrho_0 > 0$ such that (4.4) and (4.5) hold when $t \in (0, t_0]$.

Second, we prove the lemma. Since
$$
\begin{aligned}
M_1(x^*, 0, t) &= J_{\operatorname{svec}\Theta}\Bigl[\Bigl\langle P^0 W\bigl(x, t^{-1}\Theta + U^*, t\bigr)P^{0\perp}, \frac{\partial G(x)}{\partial x_i}\Bigr\rangle\Bigr]_{i=1}^{n}\Big|_{x=x^*,\,\Theta=0} + J_{\operatorname{svec}\Theta}\Bigl[\Bigl\langle P^{0\perp} W\bigl(x, t^{-1}\Theta + U^*, t\bigr)P^{0}, \frac{\partial G(x)}{\partial x_i}\Bigr\rangle\Bigr]_{i=1}^{n}\Big|_{x=x^*,\,\Theta=0}\\
&\quad+ J_{\operatorname{svec}\Theta}\Bigl[\Bigl\langle P^{0\perp} W\bigl(x, t^{-1}\Theta + U^*, t\bigr)P^{0\perp}, \frac{\partial G(x)}{\partial x_i}\Bigr\rangle\Bigr]_{i=1}^{n}\Big|_{x=x^*,\,\Theta=0}\\
&= \Bigl[\operatorname{svec}\Bigl(\sum_{l=2}^{\rho(G(x^*))} t^{-1}\psi^{[1]}\bigl(0, t^{-1}\lambda_l\bigr)P_1(x^*)\frac{\partial G(x^*)}{\partial x_i}P_l(x^*)\Bigr)^T\Bigr]_{i=1}^{n} + \Bigl[\operatorname{svec}\Bigl(\sum_{k=2}^{\rho(G(x^*))} t^{-1}\psi^{[1]}\bigl(t^{-1}\lambda_k, 0\bigr)P_k(x^*)\frac{\partial G(x^*)}{\partial x_i}P_1(x^*)\Bigr)^T\Bigr]_{i=1}^{n}\\
&\quad+ \Bigl[\operatorname{svec}\Bigl(\sum_{k,l=2}^{\rho(G(x^*))} t^{-1}\psi^{[1]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l\bigr)P_k(x^*)\frac{\partial G(x^*)}{\partial x_i}P_l(x^*)\Bigr)^T\Bigr]_{i=1}^{n},
\end{aligned}
$$
combining this with equation (4.4) there exists a constant $\varrho_{11} > 0$ such that $\|M_1(x^*, 0, t)\|_F \le \varrho_{11}$. Since $M_1(x, \Theta, t)$ is continuous with respect to $(x, \operatorname{svec}\Theta)$ at $(x^*, 0, t)$, there are constants $\varrho_1 = 2\varrho_{11}$, $\delta_{11} > 0$ and $0 < t_{11} < t_0$ such that $\|M_1(x, \Theta, t)\|_F \le \varrho_1$ when $x \in B_{\delta_{11}}(x^*)$, $\Theta \in B_{\delta_{11}}(0)$ and $t \in (0, t_{11}]$.

Similarly, we have
$$
M_2(x^*, 0, t) = t\,J_{\operatorname{svec}\Theta}\operatorname{svec}\bigl(P_{\gamma_1}^T W\bigl(x, t^{-1}\Theta + U^*, t\bigr)P_{\gamma_1}\bigr)\Big|_{x=x^*,\,\Theta=0} = \Bigl(\operatorname{svec}\bigl(P_{\gamma_1}^T D\Psi(B)\big|_{B=t^{-1}G(x^*)}(\Delta_i)P_{\gamma_1}\bigr)^T\Bigr)_{i=1}^{m(m+1)/2} = \Bigl(\operatorname{svec}\bigl(P_{\gamma_1}^T \Delta_i P_{\gamma_1}\bigr)^T\Bigr)_{i=1}^{m(m+1)/2},
$$
where $\Delta_i$ is the matrix whose only nonzero entry is a $1$ at the $i$th position of $\operatorname{svec}\Delta_i$. Hence there exists a constant $\varrho_{21} > 0$ such that $\|M_2(x^*, 0, t)\|_F \le \varrho_{21}$. As $M_2(x, \Theta, t)$ is continuous with respect to $(x, \operatorname{svec}\Theta)$ at $(x^*, 0, t)$, there are constants $\varrho_2 = 2\varrho_{21}$, $\delta_{12} > 0$ and $0 < t_{12} < t_0$ such that $\|M_2(x, \Theta, t)\|_F \le \varrho_2$ when $x \in B_{\delta_{12}}(x^*)$, $\Theta \in B_{\delta_{12}}(0)$ and $t \in (0, t_{12}]$.

By a direct calculation, we obtain

$$
P^0 M_3^i(x^*, 0, t)P^{0\perp} = t^{-1}\sum_{k,l,s=1}^{\rho(G(x^*))} \psi^{[2]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l, t^{-1}\lambda_s\bigr)P^0\Bigl(P_k(x^*)U^*P_l(x^*)\frac{\partial G(x^*)}{\partial x_i}P_s(x^*) + P_s(x^*)\frac{\partial G(x^*)}{\partial x_i}P_l(x^*)U^*P_k(x^*)\Bigr)P^{0\perp} = t^{-1}\sum_{s=2}^{\rho(G(x^*))} \psi^{[2]}\bigl(0, 0, t^{-1}\lambda_s\bigr)P_1(x^*)U^*P_1(x^*)\frac{\partial G(x^*)}{\partial x_i}P_s(x^*) = t^{-1}\sum_{s=2}^{\rho(G(x^*))} \psi^{[2]}\bigl(0, 0, t^{-1}\lambda_s\bigr)U^*\frac{\partial G(x^*)}{\partial x_i}P_s(x^*), \quad i = 1, \ldots, n,
$$
and, based on equation (4.5), there exists a constant $\varrho_{31} > 0$ such that $\|P^0M_3(x^*, 0, t)P^{0\perp}\|_F \le \varrho_{31}$. Moreover, $P^0M_3(x, \Theta, t)P^{0\perp}$ is continuous with respect to $(x, \operatorname{svec}\Theta)$ at $(x^*, 0, t)$, so we can find constants $\varrho_3 = 2\varrho_{31}$, $\delta_{13} > 0$ and $0 < t_{13} < t_0$ such that $\|P^0M_3(x, \Theta, t)P^{0\perp}\|_F \le \varrho_3$ for $x \in B_{\delta_{13}}(x^*)$, $\Theta \in B_{\delta_{13}}(0)$ and $t \in (0, t_{13}]$.

By direct calculation,
$$
P^0 M_4^i(x^*, 0, t)P^{0\perp} = t^{-1}\operatorname{svec}\bigl(P^0 D\Psi(B)\big|_{B=t^{-1}G(x^*)}(\Delta_i)P^{0\perp}\bigr) = t^{-1}\operatorname{svec}\Bigl(P^0\sum_{k,l=1}^{\rho(G(x^*))} \psi^{[1]}\bigl(t^{-1}\lambda_k, t^{-1}\lambda_l\bigr)P_k(x^*)\Delta_iP_l(x^*)\,P^{0\perp}\Bigr) = t^{-1}\operatorname{svec}\Bigl(\sum_{l=2}^{\rho(G(x^*))} \psi^{[1]}\bigl(0, t^{-1}\lambda_l\bigr)P_1(x^*)\Delta_iP_l(x^*)\Bigr), \quad i = 1, \ldots, m(m+1)/2,
$$
and, also with equation (4.4), there exists a constant $\varrho_{41} > 0$ such that $\|P^0M_4(x^*, 0, t)P^{0\perp}\|_F \le \varrho_{41}$. Moreover, since $P^0M_4(x, \Theta, t)P^{0\perp}$ is continuous with respect to $(x, \operatorname{svec}\Theta)$ at $(x^*, 0, t)$, we can find constants $\varrho_4 = 2\varrho_{41}$, $\delta_{14} > 0$ and $0 < t_{14} < t_0$ such that $\|P^0M_4(x, \Theta, t)P^{0\perp}\|_F \le \varrho_4$ for $x \in B_{\delta_{14}}(x^*)$, $\Theta \in B_{\delta_{14}}(0)$ and $t \in (0, t_{14}]$.

Furthermore, we have

ρ(G(x∗ ))

=P

0⊥ −1

t

X

svec

ψ [2] t−1 λk , t−1 λl , t−1 λs

k,l,s=1



  ∂G(x∗ ) ∂G(x∗ ) × Pk (x∗ )U ∗ Pl (x∗ ) Ps (x∗ )Ps (x∗ ) Pl (x∗ )U ∗ Pk (x∗ ) P 0⊥ ∂xi ∂xi = 0, i = 1, . . . , n and P 0⊥ M4i (x∗ , 0, t)P 0⊥   = t−1 svec P 0⊥ DΨ(B)|B=t−1 G(x∗ ) ∆i P 0⊥ ρ(G(x∗ ))

=t

−1

svec P

0⊥

X

ψ [1] (t−1 λk , t−1 λl ) (Pk (x∗ )∆i Pl (x∗ )) P 0⊥

k,l=1 ρ(G(x∗ ))

= t−1 svec

X

k,l=2

ψ [1] (t−1 λk , t−1 λl )Pk (x∗ )∆i Pl (x∗ ), i = 1, . . . , m(m + 1)/2.

A NEW NONLINEAR LAGRANGIAN METHOD

165

Thus, P 0⊥ M3 (x∗ , 0, t)P 0⊥ and P 0⊥ M4 (x∗ , 0, t)P 0⊥ are bounded when t ∈ (0, t0 ]. For the continuity of P 0⊥ M3 (x, Θ, t)P 0⊥ and P 0⊥ M4 (x, Θ, t)P 0⊥ with respect to (x, svec Θ) at (x∗ , 0, t), we can find constants %5 > 0, %6 > 0, δ15 > 0 and 0 < t15 < t0 such that kP 0⊥ M3 (x, Θ, t)P 0⊥ kF ≤ %5 and kP 0⊥ M4 (x, Θ, t)P 0⊥ kF ≤ %6 , where x ∈ Bδ15 (x∗ ), Θ ∈ Bδ15 (0), t ∈ (0, t15 ]. Above all, there exist δ1 = min{δ11 , δ12 , δ13 , δ14 , δ15 } and t1 = min{t11 , t12 , t13 , t14 , t15 } such that

1

2

M (x, Θ, t) ≤ %1 ,

M (x, Θ, t) ≤ %2 , F F



0 3

0 4

0⊥

P M (x, Θ, t)P ≤ %3 ,

P M (x, Θ, t)P 0⊥ ≤ %4 , F F



0⊥ 3

0⊥ 4 0⊥ 0⊥

P M (x, Θ, t)P ≤ %5 , P M (x, Θ, t)P ≤ %6 F

for any x ∈ Bδ1

(x∗ ),

F

Θ ∈ Bδ1 (0), t ∈ (0, t1 ].

Define
$$
\mathcal{G}(0) = \begin{pmatrix}
\nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*) & \nabla h(x^*) & A_{\gamma_1\gamma_1}\\[0.5ex]
-\nabla h(x^*)^T & 0 & 0\\[0.5ex]
-\Bigl[\operatorname{svec}\Bigl(P_{\gamma_1}^T\Bigl(U^*\dfrac{\partial G(x^*)}{\partial x_j} + \dfrac{\partial G(x^*)}{\partial x_j}U^*\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n} & 0 & 0
\end{pmatrix}.
$$

The following lemma is a key step in proving the convergence theorem.

Lemma 4.3. Suppose that Assumptions (a)–(c) are satisfied. Then $\mathcal{G}(0)$ is nonsingular.

Proof. Let $y^T = (\alpha_1^T, \alpha_2^T, \alpha_3^T) \in \mathbb{R}^{n+q+N}$ be any vector satisfying $\mathcal{G}(0)y = 0$, where $N = (m-r)(m-r+1)/2$. Then the following equations hold:
$$
\bigl(\nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*)\bigr)\alpha_1 + \nabla h(x^*)\alpha_2 + A_{\gamma_1\gamma_1}\alpha_3 = 0, \tag{4.6}
$$
$$
-\nabla h(x^*)^T\alpha_1 = 0, \tag{4.7}
$$
$$
-\Bigl[\operatorname{svec}\Bigl(P_{\gamma_1}^T\Bigl(U^*\frac{\partial G(x^*)}{\partial x_j} + \frac{\partial G(x^*)}{\partial x_j}U^*\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n}\alpha_1 = 0. \tag{4.8}
$$
From
$$
\begin{aligned}
\Bigl(\Bigl[\operatorname{svec}\Bigl(P_{\gamma_1}^T\Bigl(U^*\frac{\partial G(x^*)}{\partial x_j} + \frac{\partial G(x^*)}{\partial x_j}U^*\Bigr)P_{\gamma_1}\Bigr)^T\Bigr]_{j=1}^{n}\Bigr)^T\alpha_1 &= \sum_{j=1}^{n}\alpha_{1j}\operatorname{svec}\Bigl(\Lambda_{\gamma_1\gamma_1}P_{\gamma_1}^T\frac{\partial G(x^*)}{\partial x_j}P_{\gamma_1} + P_{\gamma_1}^T\frac{\partial G(x^*)}{\partial x_j}P_{\gamma_1}\Lambda_{\gamma_1\gamma_1}\Bigr)\\
&= \sum_{j=1}^{n}\alpha_{1j}\operatorname{svec}\Bigl(\Bigl(\lambda'_k\, p_k^T\frac{\partial G(x^*)}{\partial x_j}p_l + \lambda'_l\, p_k^T\frac{\partial G(x^*)}{\partial x_j}p_l\Bigr)_{k,l=1}^{m-r}\Bigr)\\
&= \sum_{j=1}^{n}\alpha_{1j}\operatorname{svec}\Bigl(\bigl(\lambda'_k + \lambda'_l\bigr)_{k,l=1}^{m-r} \circ \Bigl(p_k^T\frac{\partial G(x^*)}{\partial x_j}p_l\Bigr)_{k,l=1}^{m-r}\Bigr)
\end{aligned}
$$
and equation (4.8), and since $\lambda'_k + \lambda'_l > 0$ for all $k, l$, we know that $A_{\gamma_1\gamma_1}^T\alpha_1 = 0$. Multiplying (4.6) by $\alpha_1^T$ from the left, we get
$$
\alpha_1^T\bigl(\nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*)\bigr)\alpha_1 + \alpha_1^T\nabla h(x^*)\alpha_2 + \alpha_1^T A_{\gamma_1\gamma_1}\alpha_3 = 0,
$$
which implies $\alpha_1^T(\nabla^2_{xx} L(x^*, \theta^*, U^*) + \Gamma(x^*, U^*))\alpha_1 = 0$. Therefore $\alpha_1 = 0$ by Assumption (c). As $(\nabla h(x^*)\; A_{\gamma_1\gamma_1})$ has full column rank, we then have $\alpha_2 = 0$ and $\alpha_3 = 0$ from equation (4.6). Thus $\mathcal{G}(0)$ is a nonsingular matrix. $\square$

Define $\|(\alpha, A)\| := \bigl(\|\alpha\|_2^2 + \|A\|_F^2\bigr)^{1/2}$ for all $(\alpha, A) \in \mathbb{R}^q \times S^m$. We now prove the convergence theorem and discuss the rate of convergence of our method.

Theorem 4.1. Suppose Assumptions (a)–(c) hold. Then there exist scalars $\delta_0 > 0$ and $\bar t_0 \in (0, t_0]$ such that for all $(\theta, U, t)$ in the set $D$ defined by
$$
D = \bigl\{(\theta, U, t) \mid t\,\|(\theta, U) - (\theta^*, U^*)\| \le \delta_0,\; t \in (0, \bar t_0]\bigr\},
$$
the following statements hold.

(I) There exists a vector $\tilde x$ such that
$$
\tilde x = \tilde x(\theta, U, t) = \arg\min_{x \in \mathbb{R}^n}\bigl\{F(x, \theta, U, t) \mid x \in B(x^*, \delta_0)\bigr\},
$$
which is continuously differentiable in the interior of $D$.

(II) For the vector $\tilde x$, $\upsilon(\tilde x, \theta, t)$ and the matrix $W(\tilde x, U, t)$, when $(\theta, U, t) \in D$, the following estimates hold:
$$
\|\tilde x - x^*\| \le Ct\,\|(\theta, U) - (\theta^*, U^*)\|,
$$
$$
\|\upsilon(\tilde x, \theta, t) - \theta^*\| \le Ct\,\|(\theta, U) - (\theta^*, U^*)\|,
$$
$$
\|W(\tilde x, U, t) - U^*\|_F \le Ct\,\|(\theta, U) - (\theta^*, U^*)\|,
$$
where the constant $C > 0$ is independent of $t$.

Proof. To prove (I), we construct a function that establishes the existence of $\tilde x(\theta, U, t)$. Here $\tilde x(\theta, U, t) = \hat x(\eta, \Theta, t)$, which also satisfies $\nabla_x F(\tilde x, \theta, U, t) = 0$, i.e., $\nabla_x F(\hat x, \eta, \Theta, t) = 0$. Consider the mapping $\Phi: \mathbb{R}^n \times \mathbb{R}^q \times S^{m-r} \times \mathbb{R}^q \times S^m \times \mathbb{R}_+ \to \mathbb{R}^{n+q+N}$ defined by

$$
\Phi(x, u, V, \eta, \Theta, t) = \begin{pmatrix}
\nabla f(x) + \nabla h(x)u + DG(x)^*\bigl(P_{\gamma_1}VP_{\gamma_1}^T\bigr) + DG(x)^*W\bigl(x, t^{-1}\Theta + U^*, t\bigr) - DG(x)^*\bigl(P^0W\bigl(x, t^{-1}\Theta + U^*, t\bigr)P^0\bigr)\\[0.5ex]
-h(x) + tu - \eta - t\theta^*\\[0.5ex]
t\operatorname{svec} V - t\operatorname{svec}\bigl(P_{\gamma_1}^TW\bigl(x, t^{-1}\Theta + U^*, t\bigr)P_{\gamma_1}\bigr)
\end{pmatrix}. \tag{4.9}
$$
It is easy to see that
$$
\Phi(x^*, \theta^*, \Lambda_{\gamma_1\gamma_1}, 0, 0, t) = 0, \quad t > 0.
$$
Calculating the derivative of $\Phi(x, u, V, \eta, \Theta, t)$ with respect to $(x, u, \operatorname{svec} V)$, we have
$$
J_{x,u,\operatorname{svec} V}\,\Phi\big|_{(x,u,V,\eta,\Theta,t)=(x^*,\theta^*,\Lambda_{\gamma_1\gamma_1},0,0,t)} = \begin{pmatrix}
\nabla^2_{xx} L(x^*, \theta^*, U^*) + E(x^*, U^*, t) & \nabla h(x^*) & \Bigl[\dfrac{\partial \operatorname{svec}(P_{\gamma_1}^TG(x^*)P_{\gamma_1})^T}{\partial x_i}\Bigr]_{i=1}^{n}\\[0.5ex]
-\nabla h(x^*)^T & tI_q & 0\\[0.5ex]
-t\,J_x\operatorname{svec}\bigl(P_{\gamma_1}^TW(x^*, U^*, t)P_{\gamma_1}\bigr)^T & 0 & tI_N
\end{pmatrix} = \begin{pmatrix}
\nabla^2_{xx} L(x^*, \theta^*, U^*) + E(x^*, U^*, t) & \nabla h(x^*) & A_{\gamma_1\gamma_1}\\
-\nabla h(x^*)^T & tI_q & 0\\
-t\,J_x\operatorname{svec}\bigl(P_{\gamma_1}^TW(x^*, U^*, t)P_{\gamma_1}\bigr)^T & 0 & tI_N
\end{pmatrix},
$$
where
$$
E(x^*, U^*, t) = \Bigl[\Bigl\langle \frac{\partial G(x^*)}{\partial x_i}, \frac{\partial \mathcal{W}(x^*, U^*, t)}{\partial x_j}\Bigr\rangle\Bigr]_{i,j=1}^{n}.
$$
Define $\mathcal{G}: \mathbb{R}_+ \to \mathbb{R}^{(n+q+N)\times(n+q+N)}$ by
$$
\mathcal{G}(t) = J_{x,u,\operatorname{svec} V}\,\Phi(x^*, \theta^*, \Lambda_{\gamma_1\gamma_1}, 0, 0, t)
$$
for $t > 0$. According to Lemma 4.1, we have
$$
\lim_{t \searrow 0} \mathcal{G}(t) = \mathcal{G}(0),
$$
and from Lemma 4.3 we know that $\mathcal{G}(0)$ is nonsingular, which implies that $\mathcal{G}(t)$ is nonsingular when $t \in (0, t_0]$. Therefore, by the Banach perturbation theorem, there are positive constants $C_1$, $C_1'$, $t_2 \le t_0$ and $\delta_2$ such that when $t \in (0, t_2)$,
$$
C_1 \le \bigl\|J_{x,u,\operatorname{svec} V}\,\Phi(x, u, V, \eta, \Theta, t)^{-1}\bigr\|_F \le C_1' \tag{4.10}
$$
for any $(x, u, V, \eta, \Theta) \in B_{\delta_2}(x^*, \theta^*, \Lambda_{\gamma_1\gamma_1}, 0, 0)$.

Define the compact set $\bar X = \{(0, 0, t) \mid t \in [0, \bar t_0]\}$, where $\bar t_0 = \min\{t_0, t_1, t_2\}$. By the second implicit function theorem of Bertsekas [3], there exist scalars $\delta > 0$, $\delta_0 = \min\{\delta_1, \delta_2\}$ and $\varepsilon > 0$, small enough, and continuously differentiable mappings $\hat x(\eta, \Theta, t): D^0 \to B(x^*, \varepsilon)$, $\hat u(\eta, \Theta, t): D^0 \to B(\theta^*, \varepsilon)$ and $\hat V(\eta, \Theta, t): D^0 \to B(\Lambda_{\gamma_1\gamma_1}, \varepsilon)$, where
$$
D^0 := \bigl\{(\eta, \Theta, t) \mid (\eta, \Theta, t) \in B(0, \delta_0) \times B(0, \delta_0) \times (0, \bar t_0]\bigr\},
$$
such that
$$
\hat x(\eta, \Theta, t) = x^*, \quad \hat u(\eta, \Theta, t) = \theta^*, \quad \hat V(\eta, \Theta, t) = \Lambda_{\gamma_1\gamma_1}, \quad \forall\, (\eta, \Theta, t) \in \bar X, \tag{4.11}
$$
$$
\bigl(\|\hat x(\eta, \Theta, t) - x^*\|^2 + \|\hat u(\eta, \Theta, t) - \theta^*\|^2 + \|\hat V(\eta, \Theta, t) - \Lambda_{\gamma_1\gamma_1}\|_F^2\bigr)^{1/2} \le \delta, \quad \forall\, (\eta, \Theta, t) \in D^0, \tag{4.12}
$$
and
$$
\Phi\bigl(\hat x(\eta, \Theta, t), \hat u(\eta, \Theta, t), \hat V(\eta, \Theta, t), \eta, \Theta, t\bigr) = 0, \quad \forall\, (\eta, \Theta, t) \in D^0. \tag{4.13}
$$

For any $(\eta, \Theta, t) \in D^0$, differentiating both sides of (4.13) with respect to $(\eta, \operatorname{svec}\Theta)$, we obtain
$$
J_{\hat x, \hat u, \operatorname{svec}\hat V}\,\Phi\bigl(\hat x(\eta, \Theta, t), \hat u(\eta, \Theta, t), \hat V(\eta, \Theta, t), \eta, \Theta, t\bigr)\begin{pmatrix}
J_{\eta,\operatorname{svec}\Theta}\,\hat x(\eta, \Theta, t)\\
J_{\eta,\operatorname{svec}\Theta}\,\hat u(\eta, \Theta, t)\\
J_{\eta,\operatorname{svec}\Theta}\operatorname{svec}\hat V(\eta, \Theta, t)
\end{pmatrix} = \begin{pmatrix}
0 & -M_1(\hat x(\eta, \Theta, t), \Theta, t)\\
I_q & 0\\
0 & M_2(\hat x(\eta, \Theta, t), \Theta, t)
\end{pmatrix}.
$$
Based on Lemma 4.2 and equation (4.10), we have
$$
\Bigl\|\begin{pmatrix}
J_{\eta,\operatorname{svec}\Theta}\,\hat x(\eta, \Theta, t)\\
J_{\eta,\operatorname{svec}\Theta}\,\hat u(\eta, \Theta, t)\\
J_{\eta,\operatorname{svec}\Theta}\operatorname{svec}\hat V(\eta, \Theta, t)
\end{pmatrix}\Bigr\|_F \le C_1'\sqrt{\varrho_1^2 + \varrho_2^2 + q^2}, \quad \forall\, (\eta, \Theta, t) \in D^0.
$$
Therefore, letting $C_2 = C_1'\sqrt{\varrho_1^2 + \varrho_2^2 + q^2}$ and $\xi^T = (\eta^T, \operatorname{svec}\Theta^T)$, for all $\beta \in (0, 1)$ we obtain
$$
\Bigl\|\begin{pmatrix}
\hat x(\eta, \Theta, t) - x^*\\
\hat u(\eta, \Theta, t) - \theta^*\\
\operatorname{svec}(\hat V(\eta, \Theta, t) - \Lambda_{\gamma_1\gamma_1})
\end{pmatrix}\Bigr\| = \Bigl\|\begin{pmatrix}
\hat x(\eta, \Theta, t) - \hat x(0, 0, t)\\
\hat u(\eta, \Theta, t) - \hat u(0, 0, t)\\
\operatorname{svec}(\hat V(\eta, \Theta, t) - \hat V(0, 0, t))
\end{pmatrix}\Bigr\| = \Bigl\|\int_0^1 \begin{pmatrix}
J_{\eta,\operatorname{svec}\Theta}\,\hat x(\beta\eta, \beta\Theta, t)\\
J_{\eta,\operatorname{svec}\Theta}\,\hat u(\beta\eta, \beta\Theta, t)\\
J_{\eta,\operatorname{svec}\Theta}\operatorname{svec}\hat V(\beta\eta, \beta\Theta, t)
\end{pmatrix}\xi\, d\beta\Bigr\| \le C_2\|\xi\|, \quad \forall\, (\eta, \Theta, t) \in D^0.
$$
In particular,
$$
\max\bigl\{\|\hat x(\eta, \Theta, t) - x^*\|,\; \|\hat u(\eta, \Theta, t) - \theta^*\|,\; \|\hat V(\eta, \Theta, t) - \Lambda_{\gamma_1\gamma_1}\|_F\bigr\} \le C_2\|\xi\|,
$$
which means
$$
\max\bigl\{\|\hat x(\eta, \Theta, t) - x^*\|,\; \|\hat u(\eta, \Theta, t) - \theta^*\|,\; \bigl\|P_{\gamma_1}^T\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)P_{\gamma_1}\bigr\|_F\bigr\} \le C_2\|\xi\| \tag{4.14}
$$
for any $(\eta, \Theta, t) \in D^0$.

Furthermore,
$$
\bigl\|W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr\|_F = \bigl\|(P^0 + P^{0\perp})\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)(P^0 + P^{0\perp})\bigr\|_F \le \bigl\|P^0\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)P^0\bigr\|_F + \bigl\|\mathcal{W}\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr\|_F,
$$
and we know that
$$
\bigl\|P^0\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)P^0\bigr\|_F = \bigl\|P_{\gamma_1}^T\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)P_{\gamma_1}\bigr\|_F \le C_2\|\xi\|.
$$
In fact, we have
$$
\begin{aligned}
\bigl\|\mathcal{W}\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr\|_F &\le 2\bigl\|P^0\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)P^{0\perp}\bigr\|_F + \bigl\|P^{0\perp}\bigl(W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr)P^{0\perp}\bigr\|_F\\
&= 2\bigl\|P^0W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr)P^{0\perp} - P^0W\bigl(\hat x(0, 0, t), U^*, t\bigr)P^{0\perp}\bigr\|_F\\
&\quad+ \bigl\|P^{0\perp}W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr)P^{0\perp} - P^{0\perp}W\bigl(\hat x(0, 0, t), U^*, t\bigr)P^{0\perp}\bigr\|_F\\
&= 2\Bigl\|\int_0^1\Bigl(P^0M_3\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)^TP^{0\perp}\,J_{\eta,\operatorname{svec}\Theta}\,\hat x(\beta\eta, \beta\Theta, t) + P^0M_4\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)P^{0\perp}\Bigr)\xi\, d\beta\Bigr\|_F\\
&\quad+ \Bigl\|\int_0^1\Bigl(P^{0\perp}M_3\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)^TP^{0\perp}\,J_{\eta,\operatorname{svec}\Theta}\,\hat x(\beta\eta, \beta\Theta, t) + P^{0\perp}M_4\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)P^{0\perp}\Bigr)\xi\, d\beta\Bigr\|_F\\
&\le 2\bigl\|P^0M_3\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)^TP^{0\perp}\bigr\|_F\Bigl\|\int_0^1 J_{\eta,\operatorname{svec}\Theta}\,\hat x(\beta\eta, \beta\Theta, t)\xi\, d\beta\Bigr\| + 2\int_0^1\bigl\|P^0M_4\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)P^{0\perp}\bigr\|_F\|\xi\|\, d\beta\\
&\quad+ \bigl\|P^{0\perp}M_3\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)^TP^{0\perp}\bigr\|_F\Bigl\|\int_0^1 J_{\eta,\operatorname{svec}\Theta}\,\hat x(\beta\eta, \beta\Theta, t)\xi\, d\beta\Bigr\| + \int_0^1\bigl\|P^{0\perp}M_4\bigl(\hat x(\beta\eta, \beta\Theta, t), \beta\Theta, t\bigr)P^{0\perp}\bigr\|_F\|\xi\|\, d\beta\\
&\le \bigl[(\varrho_3 + \varrho_5)C_2 + \varrho_4 + \varrho_6\bigr]\|\xi\|,
\end{aligned}
$$
where the last inequality holds because
$$
\Bigl\|\int_0^1 J_{\eta,\operatorname{svec}\Theta}\,\hat x(\beta\eta, \beta\Theta, t)\xi\, d\beta\Bigr\| = \|\hat x(\eta, \Theta, t) - x^*\| \le C_2\|\xi\|.
$$
Letting $C_3 = (\varrho_3 + \varrho_5)C_2 + \varrho_4 + \varrho_6$, we obtain
$$
\bigl\|\mathcal{W}\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr\|_F \le C_3\|\xi\|
$$
for $(\eta, \Theta, t) \in D^0$. In conclusion,
$$
\bigl\|W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr\|_F \le C_2\|\xi\| + C_3\|\xi\| = C\|\xi\|,
$$
where $C := C_2 + C_3$ is independent of $t$. Since
$$
\bigl\|W\bigl(\hat x(\eta, \Theta, t), t^{-1}\Theta + U^*, t\bigr) - U^*\bigr\|_F = \|W(\tilde x(\theta, U, t), U, t) - U^*\|_F \quad \text{and} \quad \|\xi\| = t\,\|(\theta, U) - (\theta^*, U^*)\|,
$$
we have
$$
\|\tilde x - x^*\| \le Ct\,\|(\theta, U) - (\theta^*, U^*)\|,
$$

and
$$
\|\upsilon(\tilde x, \theta, t) - \theta^*\| \le Ct\,\|(\theta, U) - (\theta^*, U^*)\|,
$$
$$
\|W(\tilde x, U, t) - U^*\|_F \le Ct\,\|(\theta, U) - (\theta^*, U^*)\|,
$$
where $(\theta, U, t) \in D$. This completes the proof of (II). $\square$

Remark 4.1. By Theorem 4.1, the particular penalty function chosen in this paper enjoys a good theoretical convergence result. From a practical point of view, however, the spectral decompositions involved in its evaluation may be quite expensive. This seems to be a clear disadvantage compared with the reciprocal barrier function
$$
\varphi_{\mathrm{rec}}(t) = \frac{1}{1 - t} - 1
$$
adopted in [9] and [20].

5. Conclusions

This paper analyzes the rate of convergence of a new nonlinear Lagrangian method for nonlinear SDP problems with equality constraints, under the constraint nondegeneracy condition, the strict complementarity condition and the second order sufficient condition with sigma-term. An interesting open problem is whether the convergence result of this nonlinear Lagrangian method remains valid when the strict complementarity condition fails to hold.

Acknowledgment. The authors are grateful to the referees for their helpful comments and suggestions on improving the quality of this paper.

References

[1] Apkarian, P., Noll, D., Tuan, H. D., Fixed-order H∞ control design via a partially augmented Lagrangian method, Internat. J. Robust Nonlinear Control 13 (2003), 1137–1148.
[2] Ben-Tal, A., Zibulevsky, M., Penalty/barrier multiplier methods for convex programming problems, SIAM J. Optim. 7 (1997), 347–366.
[3] Bertsekas, D. P., Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982.
[4] Bonnans, J. F., Shapiro, A., Perturbation Analysis of Optimization Problems, Springer-Verlag, New York, 2000.
[5] Debreu, G., Definite and semidefinite quadratic forms, Econometrica 20 (1952), 295–300.
[6] Fares, B., Apkarian, P., Noll, D., An augmented Lagrangian method for a class of LMI-constrained problems in robust control theory, Internat. J. Control 74 (2001), 348–360.
[7] Fares, B., Noll, D., Apkarian, P., Robust control via sequential semidefinite programming, SIAM J. Control Optim. 40 (2002), 1791–1820.
[8] Horn, R. A., Johnson, C. R., Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[9] Kocvara, M., Stingl, M., PENNON: a generalized augmented Lagrangian method for semidefinite programming, Optim. Methods Softw. 18 (2003), 317–333.
[10] Löwner, K., Über monotone Matrixfunktionen, Math. Z. 38 (1934), 177–216.
[11] Mosheyev, L., Zibulevsky, M., Penalty/barrier multiplier algorithm for semidefinite programming, Optim. Methods Softw. 13 (2000), 235–261.
[12] Noll, D., Local convergence of an augmented Lagrangian method for matrix inequality constrained programming, Optim. Methods Softw. 22 (2007), 777–802.
[13] Noll, D., Torki, M., Apkarian, P., Partially augmented Lagrangian method for matrix inequality constraints, SIAM J. Optim. 15 (2004), 161–184.
[14] Penrose, R., A generalized inverse for matrices, Proc. Cambridge Philos. Soc. 51 (1955), 406–413.
[15] Polyak, R. A., Log-Sigmoid multipliers method in constrained optimization, Ann. Oper. Res. 101 (2001), 427–460.
[16] Polyak, R. A., Teboulle, M., Nonlinear rescaling and proximal-like methods in convex optimization, Math. Program. 76 (1997), 265–284.
[17] Powell, M. J. D., A method for nonlinear constraints in minimization problems, in "Optimization", R. Fletcher, ed., Academic Press, New York, 1969, 283–298.
[18] Rockafellar, R. T., A dual approach to solving nonlinear programming problems by unconstrained optimization, Math. Program. 5 (1973), 354–373.
[19] Shapiro, A., First and second order analysis of nonlinear semidefinite programs, Math. Program. 77 (1997), 301–320.
[20] Stingl, M., On the Solution of Nonlinear Semidefinite Programs by Augmented Lagrangian Methods, Dissertation, Shaker Verlag, Aachen, 2006.
[21] Sun, D. F., The strong second order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications, Math. Oper. Res. 31 (2006), 761–776.
[22] Sun, D. F., Sun, J., Zhang, L., The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming, Math. Program. 114 (2008), 349–391.

Yang Li, School of Science, Dalian Nationalities University, Dalian 116600, China; e-mail: [email protected]
Liwei Zhang, Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China; e-mail: [email protected]
