J Glob Optim (2015) 63:61–76 DOI 10.1007/s10898-015-0276-5

A homotopy method based on penalty function for nonlinear semidefinite programming

Li Yang · Bo Yu · YanXi Li

Received: 18 November 2013 / Accepted: 29 January 2015 / Published online: 8 February 2015 © Springer Science+Business Media New York 2015

Abstract This paper proposes a homotopy method based on a penalty function for solving nonlinear semidefinite programming problems. The penalty function is the composite function of an exponential penalty function, the eigenvalue function and a nonlinear operator mapping. Representations of its first and second order derivatives are given. Using the penalty function, a new homotopy is constructed, and global convergence of a smooth curve determined by the homotopy is proven under mild conditions. In the process of numerically tracing the curve, the method requires just the solution of a linear system of dimension n + 2, whereas the homotopy method proposed by Yang and Yu (Comput Optim Appl 56(1):81–96, 2013) requires a system of dimension n + m(m + 1)/2 + 1 to be solved, where n is the number of variables and m is the order of the constraint matrix. So, it is expected that the proposed method can improve on the efficiency of the method proposed by Yang and Yu. Preliminary numerical experiments are presented and show that the considered algorithm is efficient for some nonlinear semidefinite programming problems.

Keywords Homotopy method · Global convergence · Penalty function · Nonlinear semidefinite programming

The work was supported by the National Natural Science Foundation of China (11301050, 11171051, 91230103, 71172136).

L. Yang (B) School of Science, Dalian University of Technology, Dalian 116024, China. e-mail: [email protected]

B. Yu School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China. e-mail: [email protected]

Y. Li Faculty of Management and Economics, Dalian University of Technology, Dalian 116024, China


1 Introduction

Consider the nonlinear semidefinite programming (NSDP) problem of the following form:

min f(x)  s.t. G(x) ⪯ 0,    (1)
where x ∈ Rn, f : Rn → R and G : Rn → Sm are sufficiently smooth, Sm denotes the space of m × m symmetric matrices, and G(x) ⪯ 0 indicates that G(x) is negative semidefinite. The NSDP problem (1) arises in numerous areas of application, such as economics and engineering design. Moreover, it contains a wide range of optimization problems; the well-known linear SDP problem is an important special case. Numerical algorithms for solving linear SDP have been studied extensively; see, for example, [2,3,12,31,33,34,43]. The development of theory and algorithms for solving nonlinear SDP, such as sequential semidefinite programming algorithms and augmented Lagrangian methods, has also received much attention in recent years. Optimality conditions for NSDP were discussed in [14,29,35]. A filter algorithm, successive linearization methods and spectral bundle methods were developed for solving NSDP in [16,19,26], respectively. Sequential semidefinite programming algorithms for the solution of NSDP and their convergence properties were discussed in [11,13,15]. Interior point methods based on different merit functions were proposed for NSDP in [18,21,40,41]. Augmented Lagrangian methods were proposed in [20,24,25,30,32,36]. In [30], to solve (1), a class of matrix penalty functions was given; each is the composite function of the primary matrix function corresponding to a real-valued penalty function and a nonlinear operator mapping. Based on an exponential penalty function, an approximation to the maximum eigenvalue function was considered in [9]. Using a smooth convex merit function, which is the composite function of this approximation and a linear operator mapping, a globally convergent regularization method was proposed for solving eigenvalue optimization problems; a computable Hessian formula of the smooth convex function and a matrix representation of the Hessian were given.

The penalty function used in [9], sometimes called the aggregation function, has been used in a number of settings; see, e.g., [6,7,22,23,37–39]. In this paper, using a penalty function that is the composite function of the uniform approximation to the maximum eigenvalue function in [9] and a nonlinear operator mapping, an (n + 1)-dimensional homotopy mapping is constructed, so the dimension of the linear system to be solved per iteration is n + 2, which is less than that in the method proposed in [42]. An (n + m(m + 1)/2)-dimensional homotopy mapping was used to solve the problem (1) in [42], so that method requires a linear system of dimension n + m(m + 1)/2 + 1 to be solved at each step. In this paper, using the penalty function, a globally convergent homotopy method for solving the problem (1) is given. Existence of a homotopy path, which starts from an interior point and converges to a KKT point, is proven.

The organization of this paper is as follows. In Sect. 2, a computable representation of the first order derivative of the penalty function is given; based on KKT conditions and the penalty function, a new homotopy is constructed, and a global convergence result for the smooth homotopy path is given. In Sect. 3, preliminary numerical experiments with a predictor-corrector algorithm, which approximately follows the homotopy path, are presented. Numerical results show that the proposed algorithm is successful on the considered examples.

Throughout this paper, we use the following notation (except as explained below). Let φ(x, y) be a mapping; φ′ and φx denote the derivatives of φ with respect to (x, y) and x, respectively, and ∇φ(x, y) denotes the gradient of φ at (x, y). The notation 'int V' means the topological interior of the set V. Diag(x) denotes a diagonal matrix whose i-th diagonal element is xi. Denote by Sm+ and Sm++ the sets of symmetric positive semidefinite and symmetric
positive definite matrices, respectively, of dimension m × m. I denotes the identity matrix of appropriate dimension. Furthermore, for A, B ∈ Sm, A ⪰ B and A ≻ B mean that A − B belongs to Sm+ and Sm++, respectively. We define the scalar product A • B = Tr(A^T B), where 'Tr' denotes the trace (sum of diagonal elements) of a matrix. For a given matrix-valued function G(x), let G′(x) be the differential operator of G(x) evaluated at x, with

G′(x)d = Σ_{i=1}^{n} d_i G_i(x),  ∀ d ∈ Rn,

and let G′(x)* be the adjoint operator of G′(x), with

G′(x)* Z = (G_1(x) • Z, G_2(x) • Z, ..., G_n(x) • Z)^T,  ∀ Z ∈ Sm,

where G_i(x) = ∂G(x)/∂x_i.
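As an illustration of these two operators (this sketch is ours, not part of the paper), the defining adjoint identity ⟨G′(x)d, Z⟩ = ⟨d, G′(x)*Z⟩ can be checked numerically for an affine constraint map with random symmetric data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

def sym(M):
    return (M + M.T) / 2

# Illustrative affine constraint map G(x) = A0 + sum_i x_i A_i with random
# symmetric coefficient matrices; then G'(x)d = sum_i d_i A_i and
# G'(x)* Z = (A_1 • Z, ..., A_n • Z)^T, where A • B = Tr(A^T B).
A = [sym(rng.standard_normal((m, m))) for _ in range(n + 1)]  # A[0] = A0

def G_prime(d):
    return sum(d[i] * A[i + 1] for i in range(n))

def G_prime_adjoint(Z):
    return np.array([np.trace(A[i + 1].T @ Z) for i in range(n)])

# Check the defining adjoint identity  <G'(x)d, Z> = <d, G'(x)* Z>.
d = rng.standard_normal(n)
Z = sym(rng.standard_normal((m, m)))
lhs = np.trace(G_prime(d).T @ Z)
rhs = d @ G_prime_adjoint(Z)
print(abs(lhs - rhs))
```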

Let Ω = {x : G(x) ⪯ 0}, Ω0 = {x : G(x) ≺ 0}, ∂Ω = Ω \ Ω0.

2 The penalty function and the homotopy

In this section, a new homotopy method is proposed for solving the problem (1). A penalty function and its first order derivative are given. Using the penalty function and KKT conditions, a new homotopy is constructed. Existence and global convergence of the smooth homotopy path are proven.

We first introduce first order optimality conditions for (1). Let x* be a local solution of the problem (1). If the Robinson constraint qualification

0 ∈ int{G(x*) + G′(x*)Rn − Sm−}

holds at x*, then there exists Z* ∈ Sm such that

∇f(x*) + G′(x*)* Z* = 0,  G(x*)Z* = 0,  G(x*) ⪯ 0,  Z* ⪰ 0.    (2)

For detailed discussions of first order optimality conditions for the problem (1), see [8,14]. For the convenience of the reader, we cite some definitions and theorems that will be used in this section. Let ψ : Rn → Rp be smooth.

Definition 1 ([10]) We say y ∈ Rp is a regular value for ψ if Range ψ′(x) = Rp for all x ∈ ψ−1(y), where Range ψ′(x) denotes the range space of ψ′(x).

Theorem 1 ([10], Theorem 2.1) Let N ⊂ Rq, M ⊂ Rn be open and let φ : N × M → Rm be C^r, r > max{0, n − m}. If 0 ∈ Rm is a regular value of φ, then for almost every u ∈ N, 0 is a regular value of φu(·) = φ(u, ·).

We will be interested in the case n = m + 1. More details on Theorem 1 can be found in [1,10].

Now, we introduce a class of matrix functions defined on the space Sm. The definition of these functions is based on a spectral decomposition of a symmetric matrix and a real-valued function.

Definition 2 ([30], Definition 2.1) Let ϕ : R → R be a given function and X ∈ Sm be a given symmetric matrix. Let further X = QΛQ^T be an eigenvalue decomposition of X,

where Λ = Diag((λ1, ..., λm)^T). Then the primary matrix function Φ corresponding to ϕ is defined by

Φ : Sm → Sm,  X ↦ Q Diag((ϕ(λ1), ϕ(λ2), ..., ϕ(λm))^T) Q^T.
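A minimal numerical sketch of Definition 2 (illustrative, not from the paper), taking ϕ = exp: the primary matrix function is computed from an eigendecomposition and, for this choice of ϕ, agrees with the matrix exponential evaluated by a truncated Taylor series:

```python
import numpy as np

# Primary matrix function Phi corresponding to a scalar function phi:
# Phi(X) = Q diag(phi(lambda_1), ..., phi(lambda_m)) Q^T for X = Q Lambda Q^T.
def primary_matrix_function(X, phi):
    lam, Q = np.linalg.eigh(X)          # eigendecomposition of symmetric X
    return Q @ np.diag(phi(lam)) @ Q.T

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
X = (X + X.T) / 2

# For phi = exp this reproduces exp(X); compare with a truncated Taylor series.
E = primary_matrix_function(X, np.exp)
T = np.zeros_like(X)
term = np.eye(4)
for k in range(1, 60):
    T += term                            # accumulate X^{k-1}/(k-1)!
    term = term @ X / k
print(np.max(np.abs(E - T)))
```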

Theorem 2 ([30], Theorem 2.3) Let (a, b) ⊆ R and A : D ⊂ Rn → Sm be a twice continuously differentiable mapping. Let A(x) = Q(x)Λ(x)Q(x)^T be an eigenvalue decomposition of A(x). Denote by λ1(x), λ2(x), ..., λm(x) the m increasingly ordered eigenvalues of A(x), and let x ∈ D, λ1(x) ≥ a, λm(x) ≤ b. Let further ϕ : (a, b) → R be a twice continuously differentiable function and Φ be the corresponding primary matrix function. Then Φ(A(x)) is twice differentiable for all x ∈ D and the following formula holds:

∂Φ(A(x))/∂x_i = Q(x) [ [Δϕ(λk(x), λl(x))]_{k,l=1}^{m} ∘ (Q(x)^T A_i(x) Q(x)) ] Q(x)^T,

where ∘ denotes the Hadamard product, defined by A ∘ B = (A_ij B_ij) for any pair of matrices A, B ∈ Sm, and

Δϕ(λk, λl) = (ϕ(λk) − ϕ(λl))/(λk − λl), if λk ≠ λl;  ϕ′(λk), otherwise.

Let G(x) = Q(x)Diag(λ(x))Q(x)^T be an eigenvalue decomposition of G(x), where λ(x) = (λ1(x), ..., λm(x))^T, λi(x) ≤ λi+1(x). Let Q1(x) be an m × r(x) matrix whose columns form an orthogonal basis of the null space of G(x), where r(x) = m − rank(G(x)). To simplify the notation, we omit the dependence of r(x) and Q1(x) on x and write them as r and Q1. From Definition 2, we know that the following primary matrix function is well defined:

exp(G(x)) = Q(x) Diag((exp λ1(x), ..., exp λm(x))^T) Q(x)^T.

Based on the primary matrix function, the following penalty function is introduced:

gθ(x, μ) = θμ ln Tr(exp(G(x)/(θμ))) = θμ ln Σ_{i=1}^{m} exp(λi(x)/(θμ)),

where θ > 0 is a given constant and μ > 0. In fact, it is also a composite function of the penalty function

u(x, μ) = μ ln Σ_i exp(xi/μ)

and the eigenvalue function λ(G(x)) of the nonlinear operator mapping G(x). From Theorem 2, we can derive a compact representation of the gradient ∇x gθ(x, μ) of gθ(x, μ) with respect to x:

∇x gθ(x, μ) = G′(x)* exp(G(x)/(θμ)) / Tr(exp(G(x)/(θμ))).

For details, see the Appendix.

We make the following hypotheses in this paper:

C1. Ω is nonempty and bounded.


C2. For any x ∈ ∂Ω,

G′(x)*(Q1 U Q1^T) = 0 and U ∈ Sr+ imply that U = 0.    (3)

C3. There exists a closed subset E ⊂ Ω0 with nonempty interior E0 such that for any x ∈ ∂Ω,

{x + G′(x)*(Q1 U Q1^T) : U ∈ Sr+} ∩ E = ∅.

Indeed, the condition C2 holds if and only if the Robinson constraint qualification holds at every point x ∈ ∂Ω. This claim is verified in the following proposition.

Proposition 1 The Robinson constraint qualification holds at x if and only if (3) holds at x.

Proof It is well known that the generalized gradient of λm(x) is the nonempty compact convex set

∂λm(x) = {G′(x)*(Q1 U Q1^T) : Tr(U) = 1, U ∈ Sr+}.

For details, see [27,28]. It is obvious that (3) holds at x ∈ ∂Ω iff 0 ∉ ∂λm(x). Hence, we only need to prove that the Robinson constraint qualification is equivalent to 0 ∉ ∂λm(x). The Robinson constraint qualification, that is,

∃ h ∈ Rn s.t. G(x) + G′(x)h ≺ 0,

implies that Q1^T G′(x)h Q1 ≺ 0. Hence, for any nonzero matrix U ∈ Sr+,

Q1^T G′(x)h Q1 • U = Tr(G′(x)h Q1 U Q1^T) = h^T G′(x)*(Q1 U Q1^T) < 0,

which implies that 0 ∉ ∂λm(x). Conversely, if 0 ∉ ∂λm(x), from the separation theorem for convex sets and the fact that ∂λm(x) is a nonempty closed convex set that does not contain 0, we know that there exists a hyperplane in Rn that strictly separates the vector 0 from the set ∂λm(x). That is, there is a vector h ∈ Rn such that

h^T G′(x)*(Q1 U Q1^T) < 0, for any U ∈ Sr+ with Tr(U) = 1.    (4)

Let u ∈ Rr with ‖u‖ = 1 be given, and set U = uu^T. By (4), we have that

h^T G′(x)*(Q1 U Q1^T) = h^T G′(x)*(Q1 uu^T Q1^T) = u^T (Σ_{i=1}^{n} h_i Q1^T G_i(x) Q1) u < 0,

for any u with ‖u‖ = 1. Hence, Σ_{i=1}^{n} h_i Q1^T G_i(x) Q1 ≺ 0, which is equivalent to the Robinson constraint qualification holding at x (see Lemma 5 in [14]). □

Moreover, the conditions C1 and C2 imply that Ω0 is nonempty. This can also be derived from the characterizations discussed in Section 2.3.4 of [8]. Let

Ωθ(μ) = {x : gθ(x, μ) ≤ 0},  Ωθ0(μ) = {x : gθ(x, μ) < 0},  ∂Ωθ(μ) = Ωθ(μ) \ Ωθ0(μ).

To prove a global convergence result for our method, we give the following proposition and lemmas.

Proposition 2 For any θ > 0 and μ ∈ (0, 1], we have that

λm(x) ≤ gθ(x, μ) ≤ λm(x) + θμ ln m.

Hence, Ωθ(μ) ⊆ Ω for any θ > 0 and μ ∈ (0, 1].


Proof From exp(λm(x)/(θμ)) ≤ Tr(exp(G(x)/(θμ))) = Σ_{i=1}^{m} exp(λi(x)/(θμ)) ≤ m exp(λm(x)/(θμ)), we know that

λm(x) = θμ ln exp(λm(x)/(θμ)) ≤ gθ(x, μ) = θμ ln Σ_{i=1}^{m} exp(λi(x)/(θμ)) ≤ θμ ln(m exp(λm(x)/(θμ))) = λm(x) + θμ ln m. □

Lemma 1 Suppose that the conditions C1 and C2 hold. Then for any closed subset N ⊂ Ω0, there exists a constant θ ∈ (0, 1] such that N ⊂ Ωθ(1)0.

Proof From the conditions C1 and C2, we know that Ω0 is nonempty, so N is nonempty. By continuity of λm(x) and compactness of the set N, λm(x) attains its maximum at some point of N; let x̂ be a maximizing point. From N ⊂ Ω0, we have λm(x̂) < 0. Taking θ = min{1, −λm(x̂)/(2 ln m)}, by Proposition 2 we have that for any x ∈ N,

gθ(x, 1) ≤ λm(x) + θ ln m ≤ λm(x̂) + θ ln m ≤ λm(x̂)/2 < 0,

which implies that x ∈ Ωθ(1)0. Hence, N ⊂ Ωθ(1)0. □
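The sandwich bound of Proposition 2 and the choice of θ in the proof of Lemma 1 can be illustrated on a sample spectrum (the eigenvalues below are made up for illustration; this is not from the paper):

```python
import numpy as np

# Illustrative spectrum of G(x) at a strictly feasible point (all eigenvalues
# negative); checks Proposition 2's bound and Lemma 1's choice of theta.
lam = np.array([-3.0, -2.5, -2.0, -1.5, -0.4])
m = lam.size
lam_max = lam.max()

def g(theta, mu):
    t = theta * mu
    # numerically stable evaluation of theta*mu * ln Tr exp(G/(theta*mu))
    return lam_max + t * np.log(np.sum(np.exp((lam - lam_max) / t)))

# Proposition 2: lambda_max <= g_theta(x, mu) <= lambda_max + theta*mu*ln m
gaps = {mu: g(0.3, mu) - lam_max for mu in (1.0, 0.5, 0.1, 0.01)}

# Lemma 1: theta = min{1, -lambda_max/(2 ln m)} gives g_theta(x, 1) <= lambda_max/2 < 0
theta = min(1.0, -lam_max / (2 * np.log(m)))
print(gaps, theta, g(theta, 1.0))
```

Evaluating the log-sum-exp with the maximum eigenvalue factored out avoids overflow for small θμ, which matters when tracing the path toward μ = 0.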

Lemma 2 Suppose that G(x) is continuously differentiable and that the conditions C1 and C2 hold. Then there exists a positive constant θ ∈ (0, 1] such that for any μ ∈ (0, 1], the boundary of Ωθ(μ) is regular, i.e., for any x ∈ ∂Ωθ(μ), ∇x gθ(x, μ) ≠ 0.

Proof Suppose for contradiction that for any θ ∈ (0, 1], there exist μ ∈ (0, 1] and x ∈ ∂Ωθ(μ) such that ∇x gθ(x, μ) = 0. Hence, there exists a sequence of points {(x(k), θk, μk)} with θk → 0 as k → ∞ such that

∇x gθk(x(k), μk) = G′(x(k))* exp(G(x(k))/(θk μk)) / Tr(exp(G(x(k))/(θk μk))) = 0,

where x(k) ∈ ∂Ωθk(μk). From the condition C1 and Proposition 2, we have that {x(k)} is bounded. Without loss of generality, let x* be a limit point of the sequence {x(k)}. From Proposition 2, gθk(x(k), μk) = 0 and θk μk → 0 as k → ∞, we know that x* ∈ ∂Ω. Because the sequence {exp(G(x(k))/(θk μk)) / Tr(exp(G(x(k))/(θk μk)))} is bounded, without loss of generality let it have an accumulation point Ḡ; then Ḡ can be written as Ḡ = Q̄1 U Q̄1^T, where the columns of Q̄1 form an orthogonal basis of the null space of G(x*), U ∈ Sr̄+, r̄ = m − rank G(x*). Hence,

0 = lim_{k→∞} ∇x gθk(x(k), μk) = G′(x*)* Ḡ = G′(x*)*(Q̄1 U Q̄1^T),

which contradicts the condition C2. Hence, for any x ∈ ∂Ωθ(μ), ∇x gθ(x, μ) ≠ 0. □

Lemma 3 Suppose that G(x) is continuously differentiable and that the conditions C1, C2 and C3 hold. Then for any closed subset N ⊂ E, there exists a positive constant θ ∈ (0, 1] such that for any μ ∈ (0, 1], Ωθ(μ) satisfies the weak normal cone condition w.r.t. N, i.e., for any μ ∈ (0, 1] and x ∈ ∂Ωθ(μ),

{x + α G′(x)* exp(G(x)/(θμ)) / Tr(exp(G(x)/(θμ))) : α ≥ 0} ∩ N = ∅.

Proof Suppose for contradiction that there exists a closed subset N ⊂ E such that for all θ ∈ (0, 1], there exists μ ∈ (0, 1] such that Ωθ(μ) does not satisfy the weak normal cone condition w.r.t. N. Hence, without loss of generality, we may assume that there exists a sequence of points {(x̃(k), x̄(k), αk, θk, μk)} such that θk → 0, N ∋ x̃(k) → x̃ ∈ N ⊂ E, ∂Ωθk(μk) ∋ x̄(k) → x̄ ∈ ∂Ω as k → ∞, μk ∈ (0, 1], αk ≥ 0, and

x̃(k) = x̄(k) + αk G′(x̄(k))* exp(G(x̄(k))/(θk μk)) / Tr(exp(G(x̄(k))/(θk μk))).    (5)

Similarly to the proof of Lemma 2, any accumulation point of {exp(G(x̄(k))/(θk μk)) / Tr(exp(G(x̄(k))/(θk μk)))} can be written as Q̄1 U Q̄1^T, where the columns of Q̄1 form an orthogonal basis of the null space of G(x̄), U ∈ Sr̄+, r̄ = m − rank G(x̄). From the condition C2 and x̄ ∈ ∂Ω, we know that G′(x̄)*(Q̄1 U Q̄1^T) ≠ 0, which, together with the condition C1 and the equality (5), implies that {αk} is bounded. Without loss of generality, let it have an accumulation point ᾱ ∈ R+. By taking the limit as k → ∞ in (5), we have that

x̃ = x̄ + ᾱ G′(x̄)*(Q̄1 U Q̄1^T),

which contradicts the condition C3. □

We construct the following homotopy:

H(x, y, μ) = ( (1 − μ)(∇f(x) + y ∇x gθ(x, μ)) + μ(x − x(0)),  y gθ(x, μ) − μ y(0) gθ(x(0), 1) ),    (6)

where (x, y, μ) ∈ Ω × R+ × (0, 1], w(0) = (x(0), y(0)) ∈ E0 × R++, and θ > 0 is a constant. We denote the zero point set of H(x, y, μ) by H−1(0) = {(x, y, μ) ∈ Ω × R+ × (0, 1] : H(x, y, μ) = 0}. Under the conditions C1, C2 and C3, we are ready to give the following global convergence result of our method for solving the problem (1).

Theorem 3 Suppose that f(x) and G(x) are three times continuously differentiable and the conditions C1, C2 and C3 hold. Let H be defined as in (6). Then for any x̄ ∈ E0, there exists a neighbourhood S(x̄) of x̄ such that S(x̄) ⊂ E0, and there exists a constant θ ∈ (0, 1] such that S(x̄) ⊂ Ωθ(1)0, ∂Ωθ(μ) is regular and Ωθ(μ) satisfies the weak normal cone condition w.r.t. S(x̄) for any μ ∈ (0, 1]. Furthermore, for almost all w(0) ∈ S(x̄) × R++, H−1(0) contains a smooth curve Γw(0) which starts from (w(0), 1) and terminates in or approaches the hyperplane μ = 0. Moreover, if (x*, y*, 0) is any limit point of Γw(0) on the hyperplane μ = 0, then x* is a KKT point of (1).

Proof From Lemmas 1–3, we know that for any x̄ ∈ E0, there exists a neighbourhood S(x̄) of x̄ such that S(x̄) ⊂ E0, and there exists a constant θ ∈ (0, 1] such that S(x̄) ⊂ Ωθ(1)0, ∂Ωθ(μ) is regular and Ωθ(μ) satisfies the weak normal cone condition w.r.t. S(x̄) for any μ ∈ (0, 1].


Let H̃(x, y, w(0), μ) be the same mapping as H(x, y, μ) but taking w(0) as a variable, where x(0) ∈ S(x̄). It is obvious that

∂H̃(x, y, w(0), μ)/∂(x(0), y(0)) = ( −μI  0 ;  *  −μ gθ(x(0), 1) ).

For any x(0) ∈ S(x̄), gθ(x(0), 1) < 0. Hence, ∂H̃(x, y, w(0), μ)/∂(x(0), y(0)) is nonsingular and H̃′(x, y, w(0), μ) has full row rank when μ > 0. That is, 0 is a regular value of H̃(x, y, w(0), μ). Using Theorem 1, we have that for almost all w(0) ∈ S(x̄) × R++, 0 is a regular value of H(x, y, μ) : Ω × R+ × (0, 1) → Rn+1. By noting that H(w(0), 1) = 0 and

∂H(w(0), 1)/∂(x, y) = ( I  0 ;  *  gθ(x(0), 1) )

is nonsingular, we know that H−1(0) must contain a smooth curve, say Γw(0), which starts from (w(0), 1) and goes into Ωθ(1)0 × R++ × (0, 1). At μ = 1, the homotopy equation H(x, y, 1) = 0 becomes the system

x − x(0) = 0,  y gθ(x, 1) − y(0) gθ(x(0), 1) = 0.

It follows from x(0) ∈ S(x̄) ⊂ Ωθ(1)0 that this system has the unique solution w(0) = (x(0), y(0)).

The curve Γw(0) is bounded; we outline a proof by contradiction. Suppose that Γw(0) is unbounded. Because Ω and (0, 1] are bounded, there exists a sequence of points {(x(k), yk, μk)} such that x(k) → x*, μk → μ*, yk → ∞ as k → ∞. From {(x(k), yk, μk)} ⊆ H−1(0), we have that

lim_{k→∞} gθ(x(k), μk) = lim_{k→∞} μk y(0) gθ(x(0), 1)/yk = 0.    (7)

From the first equality of the homotopy equation, we have

(1 − μk)[∇f(x(k)) + yk G′(x(k))* exp(G(x(k))/(θμk)) / Tr(exp(G(x(k))/(θμk)))] + μk(x(k) − x(0)) = 0.    (8)

For the limit point μ*, only the following three cases are possible: (a) μ* = 1; (b) μ* ∈ (0, 1); (c) μ* = 0.

(a) μ* = 1. If {(1 − μk)yk} is bounded, without loss of generality let it have an accumulation point ȳ. From (7), we know that x* ∈ ∂Ωθ(μ*). By taking the limit as k → ∞ in (8), we have that

x(0) = x* + lim_{k→∞} (1 − μk)yk ∇x gθ(x(k), μk) = x* + ȳ ∇x gθ(x*, μ*),

which contradicts the fact that Ωθ(μ) satisfies the weak normal cone condition w.r.t. S(x̄) for any μ ∈ (0, 1]. If {(1 − μk)yk} is unbounded, the discussion is the same as in case (b) below.

(b) μ* ∈ (0, 1).


From the equality (7), we know that x* ∈ ∂Ωθ(μ*). By dividing (8) by (1 − μk)yk and taking limits as k → ∞, we obtain that

lim_{k→∞} G′(x(k))* exp(G(x(k))/(θμk)) / Tr(exp(G(x(k))/(θμk))) = lim_{k→∞} ∇x gθ(x(k), μk) = ∇x gθ(x*, μ*) = 0,

which contradicts the fact that ∂Ωθ(μ) is regular for any μ ∈ (0, 1].

(c) μ* = 0. From (7) and Proposition 2, we know that x* ∈ ∂Ω when μ* = 0. By dividing (8) by (1 − μk)yk and taking limits as k → ∞, we obtain that

lim_{k→∞} G′(x(k))* exp(G(x(k))/(θμk)) / Tr(exp(G(x(k))/(θμk))) = G′(x*)*(Q1* U Q1*^T) = 0,

where Q1* U Q1*^T is an accumulation point of {exp(G(x(k))/(θμk)) / Tr(exp(G(x(k))/(θμk)))}, the columns of Q1* form an orthogonal basis of the null space of G(x*), U ∈ Sr*+, r* = m − rank G(x*). This contradicts the condition C2.

From (a), (b) and (c), we conclude that Γw(0) is bounded.

Let (x*, y*, μ*) be any limit point of Γw(0) in ∂(Ωθ(μ*) × R+ × [0, 1]). Because Γw(0) is bounded, only the following four cases are possible:

(i) (x*, y*, μ*) ∈ Ωθ(μ*) × R+ × {1};
(ii) (x*, y*, μ*) ∈ Ωθ(μ*) × ∂R+ × (0, 1);
(iii) (x*, y*, μ*) ∈ ∂Ωθ(μ*) × R++ × (0, 1);
(iv) (x*, y*, μ*) ∈ Ω × R+ × {0}.

Because H(x, y, 1) = 0 has only one solution and ∂H(w(0), 1)/∂(x, y) is nonsingular, case (i) is impossible. By the second equality of the homotopy equation, the cases (ii) and (iii) are impossible. Summing up, case (iv) is the only possible case. As (x(k), y(k), μk) → (x*, y*, 0), the homotopy equation turns into the system

∇f(x*) + y* G′(x*)* Z* = 0,  y* λm(x*) = 0,    (9)

where Z* = Q1* U Q1*^T is an accumulation point of {exp(G(x(k))/(θμk)) / Tr(exp(G(x(k))/(θμk)))} as k → ∞, the columns of Q1* form an orthogonal basis of the null space of G(x*), U ∈ Sr*+, r* = m − rank G(x*). It is easy to see that the first order optimality condition (2) is equivalent to the system (9). As a result, x* is a KKT point of (1). □
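The homotopy (6) is easy to assemble for a small instance. The sketch below uses our own toy data (f(x) = ‖x‖²/2, so ∇f(x) = x, and an affine G with A0 = −2I so that x = 0 is strictly feasible) and checks the starting condition H(w(0), 1) = 0 used in the proof:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, theta = 2, 3, 0.5

def sym(M):
    return (M + M.T) / 2

# Toy data (illustrative, not from the paper).
A0 = -2.0 * np.eye(m)
A = [sym(0.3 * rng.standard_normal((m, m))) for _ in range(n)]

def G(x):
    return A0 + sum(x[i] * A[i] for i in range(n))

def g_theta(x, mu):
    t = theta * mu
    lam = np.linalg.eigvalsh(G(x))
    return lam[-1] + t * np.log(np.sum(np.exp((lam - lam[-1]) / t)))

def grad_g_theta(x, mu):
    lam, Q = np.linalg.eigh(G(x) / (theta * mu))
    w = np.exp(lam - lam[-1])
    E = Q @ np.diag(w) @ Q.T       # exp(G/(theta*mu)), rescaled by e^{-lam_max}
    return np.array([np.trace(A[i] @ E) for i in range(n)]) / np.sum(w)

def H(x, y, mu, x0, y0):
    # the homotopy (6); grad f(x) = x for f(x) = ||x||^2/2
    top = (1 - mu) * (x + y * grad_g_theta(x, mu)) + mu * (x - x0)
    bot = y * g_theta(x, mu) - mu * y0 * g_theta(x0, 1.0)
    return np.append(top, bot)

x0, y0 = np.zeros(n), 1.0
residual = np.max(np.abs(H(x0, y0, 1.0, x0, y0)))
print(residual)
```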

3 Algorithm and numerical results

In this section, a predictor-corrector algorithm which numerically traces the homotopy path is given; for details, see [4,5]. We implemented the predictor-corrector algorithm, the method of [42] and Algorithm 4 (the modified augmented Lagrangian (MAL) method) of [36] for solving the problem (1), and compared our results with the ones reported for the method [42] and the MAL method [36].

The predictor-corrector algorithm for the solution of the problem (1) is given below. For notational convenience, we use Hk,j and H′k,j to denote H(w(k,j), μk,j) and H′(w(k,j), μk,j), respectively, and denote w(0) = (x(0), y(0)).


Predictor-Corrector Algorithm.

Step 0. (Initialize) Give ε1 ≥ ε2 > 0, α0 > 0, 0 < θ1 < θ2 < θ3 < 1 < θ4 < θ5, αmin, αmax, kmax, 0 < θangle < 1. Set (w(1), μ1) = (w(0), 1), α = α0, ε = ε1, k = 1.

Step 1. (Predictor step) Compute a predictor point.
(1.1) If k = 1, solve the linear equation

( H′(w(0), 1) ; (0, ..., 0, −1) ) d̃(1) = ( 0 ; 1 )

to obtain the unit tangent vector d(1) = d̃(1)/‖d̃(1)‖. Set d(0) = d(1).
(1.2) If k > 1, set d(k) = ((w(k), μk) − (w(k−1), μk−1)) / ‖(w(k), μk) − (w(k−1), μk−1)‖.
(1.3) Determine the smallest nonnegative integer i such that (w(k+1,0), μk+1,0) = (w(k), μk) + θ1^i α d(k) ∈ Ω0 × R++ × (0, 1); set α = θ1^i α. Go to Step 2.

Step 2. (Corrector step) Compute a corrector point. Set j = 0. Repeat the following process until ‖Hk+1,j‖ ≤ ε or j = kmax: compute the Newton step d̄ by solving

H′k+1,j d̄ = −Hk+1,j,  d̄^T d(k) = 0,    (10)

and determine the smallest nonnegative integer i such that (w(k+1,j+1), μk+1,j+1) = (w(k+1,j), μk+1,j) + θ3^i d̄ ∈ Ω0 × R++ × (0, 1); set j = j + 1. Go to Step 3.

Step 3. (Steplength adaptation)
(3.1) If j = kmax and ‖Hk+1,j‖ > ε, set α = max{αmin, θ2 α} and (w(k+1,0), μk+1,0) = (w(k), μk) + α d(k), and go to Step 2; else, set (w(k+1), μk+1) = (w(k+1,j), μk+1,j).
(3.2) Adjust the steplength α as follows:
(a) If d(k)T d(k−1) < θangle, set α = max{αmin, θ1 α};
(b) If j > 4, set α = max{αmin, θ2 α};
(c) If j = 2, set α = min{αmax, θ4 α};
(d) If j < 2, set α = min{αmax, θ5 α}.

Step 4. (Termination) If μk+1 ≤ ε2 and ‖H(w(k+1), μk+1)‖ ≤ ε2, then stop; else, set ε = max(min(μk+1, ε1), ε2), k = k + 1, and go to Step 1.

Examples and numerical results of our preliminary experiments are given below. In our experiments, all implementations were done in MATLAB R2012b running on a notebook PC with an Intel(R) Core(TM) i5-3210M CPU at 2.5 GHz and 4 GB RAM. The optimization subproblems arising in the MAL method were solved by the subroutine fmincon in the optimization toolbox of MATLAB. In the MAL method, the parameters were the same as in [36]. In our method, the parameters were set as kmax = 5, α0 = 0.2, αmin = 10^−10, αmax = +∞, θ1 = 0.4, θ2 = 0.7, θ3 = 0.9, θ4 = 1.5, θ5 = 2, θangle = 0.7, ε1 = 10^−4, ε2 = 10^−6.
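The predictor-corrector idea can be illustrated on a one-dimensional Newton homotopy H(x, μ) = F(x) − μF(x(0)), traced from μ = 1 down to μ = 0 with an Euler predictor and a Newton corrector. This is a toy sketch of the path-following principle, not the full algorithm above:

```python
import numpy as np

def F(x):   return x**3 + x - 10.0      # monotone on R; real root x = 2
def dF(x):  return 3 * x**2 + 1

x0 = 0.0
Fx0 = F(x0)                              # H(x, mu) = F(x) - mu*F(x0)
x, mu = x0, 1.0                          # start at the known zero (x0, 1)
h = 0.05                                 # steplength in mu
while mu > 0.0:
    mu_new = max(mu - h, 0.0)
    # Euler predictor along the path x(mu): differentiating H = 0 gives
    # dF(x) * dx/dmu - F(x0) = 0, so dx/dmu = F(x0)/dF(x).
    x_pred = x + (mu_new - mu) * Fx0 / dF(x)
    # Newton corrector on H(., mu_new)
    for _ in range(20):
        r = F(x_pred) - mu_new * Fx0
        if abs(r) < 1e-12:
            break
        x_pred -= r / dF(x_pred)
    x, mu = x_pred, mu_new
print(x)
```

At μ = 0 the corrector has solved F(x) = 0, so the trace ends near the root x = 2; in the algorithm above, the same roles are played by the tangent vector d(k) and the orthogonally constrained Newton step (10).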


Example 1 ([17]) Find a matrix K such that A(K) is Hurwitz, i.e., the eigenvalues of the matrix A(K) all belong to the left half plane D = {t ∈ C : t + t* < 0} of the complex plane, where t* is the conjugate of t,

A = ( −0.0366 0.0271 0.0188 −0.4555 ; 0.0482 −1.01 0.0024 −4.0208 ; 0.1002 0.3681 −0.7070 1.42 ; 0 0 1 0 ),

B = ( 0.4422 0.1761 ; 3.5446 −7.5922 ; −5.52 4.49 ; 0 0 ),  C = ( 0 1 0 0 ).

Example 1 amounts to solving the quadratic matrix inequality problem S(K) ≺ 0. We solve it by solving the non-strict optimization problem {min_{K,μ} μ : μI + S(K) ⪯ 0}. For details, see [42]. For the initial point K = (−1, 0)^T, our method solved this problem in 0.0612 seconds and ended with the solution (−0.8856, 0.3625).

Below, we solve the following more general formulation of Example 1, which can be used to represent more types of optimization and control problems.

Example 2 Let A_k^i, K_kl^i ∈ Sm,

min_{x∈Rn, λ∈R} λ,  s.t. A_0^i + Σ_{k=1}^{n} x_k A_k^i + Σ_{k,l=1}^{d} x_k x_l K_kl^i ⪯ λI, i = 1, ..., J,  b ≤ x ≤ b̄,

where J = 1, b̄ = −b = 50(1, ..., 1)^T, 0 < d ≤ n, and the A_k^i and K_kl^i were generated by the Matlab function randn. We symmetrized the matrices by copying the upper triangular part to the lower one after creation.

Example 3 ([36]) Let A_k ∈ Sm,

min_{x∈Rn} x^T Q x / 2 + c^T x,  s.t. A_0 + Σ_{k=1}^{n} x_k A_k ⪯ 0,  −1 ≤ x ≤ 1,

where Q = P + P^T, P = (p_ij)_{n×n}, A_l = B_l^T B_l, B_l = (b_sk^l)_{m×m}; the p_ij and b_sk^l are random numbers uniformly distributed in the interval [0, 1], and the entries of c are random numbers uniformly distributed in the interval [−5, 5].

For Example 2, the initial point x(0) is generated by the Matlab function randn. For Example 3, the entries of x(0) are random numbers uniformly distributed in the interval [−1, 1]. For each (n, m), 100 random instances are tested. In Tables 1 and 2, the average CPU time in seconds (time[s]) is given; '–' indicates Out of Memory, and 'Vmal' indicates the number of instances, among the 100 random instances, for which our method required more time than the method [36].

Table 1 Numerical results for Example 2 ((a) n > m, d = 5; (b) n < m, d = 10): average CPU time in seconds for the method [42], our method, and the MAL method [36]; part (b) also reports Vmal. [The table entries are not recoverable from the extracted text.]

From Tables 1 and 2, we find that when (n, m) in Example 2 is identical to the one in Example 3, the solution of Example 2 is more time-consuming. The reason may be that G(x) in Example 2 is nonlinear and nonconvex in general. Numerical results in Tables 1a and 2a show that our method tends to require more time than the methods [42] and [36] for n ≫ m. Although our method requires just the solution of a linear system of dimension n + 2 at each iteration, the gradient and Hessian calculations of the penalty function gθ(x, μ) may be more time-consuming for n ≫ m. For problems with large n, some efficient strategy for reducing the computational cost of the derivatives may be beneficial for improving the predictor-corrector algorithm.

From the results in Tables 1b and 2b, we find that our method requires less time than the method [42] when n ≪ m. At each iteration, our method requires just the solution of a linear system of dimension n + 2, whereas the method [42] requires a linear system of dimension n + m(m + 1)/2 + 1 to be solved. So, our method is faster than the method [42] for solving problems with n ≪ m. Moreover, we also observe that although our method required more time than the method [36] for some instances of Example 3 with n ≪ m, the average CPU time of our method is less. Preliminary computational experience shows that our method is competitive with the methods [42] and [36].
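For reference, a random instance of the kind described in Example 3 can be generated as follows (the sizes are illustrative; this is a Python sketch of the instance construction, not the MATLAB code used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 6, 4

# Q = P + P^T with uniform [0,1] entries; A_l = B_l^T B_l, hence each A_l is
# positive semidefinite; the entries of c are uniform in [-5, 5].
P = rng.uniform(0.0, 1.0, (n, n))
Q = P + P.T
Bs = [rng.uniform(0.0, 1.0, (m, m)) for _ in range(n + 1)]
Amats = [B.T @ B for B in Bs]            # A_0, A_1, ..., A_n
c = rng.uniform(-5.0, 5.0, n)

def constraint(x):
    # A_0 + sum_k x_k A_k; must be negative semidefinite at a feasible x
    return Amats[0] + sum(x[k] * Amats[k + 1] for k in range(n))

x = rng.uniform(-1.0, 1.0, n)            # a random initial point in [-1, 1]^n
print(np.linalg.eigvalsh(constraint(x))[-1])
```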

Table 2 Numerical results for Example 3 ((a) n > m; (b) n < m): average CPU time in seconds for the method [42], our method, and the MAL method [36]; part (b) also reports Vmal. [The table entries are not recoverable from the extracted text.]

Our future work will include consideration of strategies for reducing the computational cost of the derivatives of the penalty function, and of using other penalty functions as well.

Acknowledgments The authors thank the anonymous referees, whose comments and suggestions led to an improved version of this article.

Appendix: The gradient and Hessian of gθ (x, μ) For the convenience of the reader, representations of the gradient and Hessian of gθ (x, μ) will be derived below. From Theorem 2, we have that 

 ∂expG(x) T  = Q(x) [Δϕ (λk (x), λl (x))]m Q(x)T , k,l=1  Q(x) G i (x)Q(x) ∂ xi

123

74

J Glob Optim (2015) 63:61–76

∂exp G(x) θμ ∂ xi

=

 1 Q(x) [Δϕ(λk (x)/θ μ, λl (x)/θ μ)]m k,l=1 θμ  [Q(x)T G i (x)Q(x)] Q(x)T ,

   

∂Tr (expG(x)) T  = Tr Q(x) (Δϕ (λk (x), λl (x)))m Q(x)T k,l=1  Q(x) G i (x)Q(x) ∂ xi

  T  = Tr (Δϕ (λk (x), λl (x)))m k,l=1  Q(x) G i (x)Q(x)   = Tr Diag(expλ(x))Q(x)T G i (x)Q(x)   = Tr Q(x)Diag(expλ(x))Q(x)T G i (x)   = Tr G i (x)expG(x) ,   ∂Tr exp G(x) θμ ∂ xi

  = Tr Q(x) [Δϕ(λk (x)/θ μ, λl (x)/θ μ)]m k,l=1   θμ [Q(x)T G i (x)Q(x)] Q(x)T   T   [Q(x) G (x)Q(x)] θμ = Tr [Δϕ(λk (x)/θ μ, λl (x)/θ μ)]m i k,l=1



λ(x) θμ = Tr Diag exp Q(x)T G i (x)Q(x) θμ

 λ(x) = Tr Q(x)Diag exp Q(x)T G i (x) θμ θμ

 G(x) θ μ, = Tr G i (x)exp θμ

where $\varphi(\alpha)=\exp\alpha$, $\alpha\in\mathbb{R}$. Consequently, the gradient and Hessian of $g_\theta(x,\mu)$ with respect to $x$ and $\mu$ are as follows:
$$
\nabla_x g_\theta(x,\mu)=\frac{G'(x)^{*}\exp\frac{G(x)}{\theta\mu}}{\operatorname{Tr}\big(\exp\frac{G(x)}{\theta\mu}\big)},
$$
$$
\nabla_\mu g_\theta(x,\mu)=\theta\ln\operatorname{Tr}\Big(\exp\frac{G(x)}{\theta\mu}\Big)-\frac{\operatorname{Tr}\big(G(x)\exp\frac{G(x)}{\theta\mu}\big)}{\mu\operatorname{Tr}\big(\exp\frac{G(x)}{\theta\mu}\big)},
$$
$$
\nabla^2_{x\mu} g_\theta(x,\mu)=\frac{G'(x)^{*}\exp\frac{G(x)}{\theta\mu}\,\operatorname{Tr}\big(G(x)\exp\frac{G(x)}{\theta\mu}\big)}{\theta\mu^{2}\big(\operatorname{Tr}\exp\frac{G(x)}{\theta\mu}\big)^{2}}-\frac{G'(x)^{*}\big(G(x)\exp\frac{G(x)}{\theta\mu}\big)}{\theta\mu^{2}\operatorname{Tr}\exp\frac{G(x)}{\theta\mu}},
$$
$$
\frac{\partial^{2} g_\theta(x,\mu)}{\partial x_i\partial x_j}
=\frac{\operatorname{Tr}\Big(G_{ij}(x)\exp\frac{G(x)}{\theta\mu}\Big)+\operatorname{Tr}\Big(G_i(x)\frac{\partial\exp\frac{G(x)}{\theta\mu}}{\partial x_j}\Big)}{\operatorname{Tr}\exp\frac{G(x)}{\theta\mu}}
-\frac{\operatorname{Tr}\big(G_i(x)\exp\frac{G(x)}{\theta\mu}\big)\operatorname{Tr}\big(G_j(x)\exp\frac{G(x)}{\theta\mu}\big)}{\theta\mu\big(\operatorname{Tr}\exp\frac{G(x)}{\theta\mu}\big)^{2}},
$$
where $G_{ij}(x)=\frac{\partial^{2}G(x)}{\partial x_i\partial x_j}$.
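The gradient formula can be verified on a toy problem. The sketch below assumes (for illustration only) that the penalty takes the log-sum-exp form $g_\theta(x,\mu)=\theta\mu\ln\operatorname{Tr}\exp\frac{G(x)}{\theta\mu}$, which is consistent with the gradient formulas above, together with a hypothetical affine map $G(x)=A_0+\sum_i x_i A_i$; all matrix data are made up. The $i$-th gradient component $\operatorname{Tr}\big(G_i(x)\exp\frac{G(x)}{\theta\mu}\big)\big/\operatorname{Tr}\exp\frac{G(x)}{\theta\mu}$ is evaluated through an eigendecomposition and checked against finite differences:

```python
import numpy as np

def g(x, theta, mu, A):
    # assumed penalty: g_theta(x, mu) = theta*mu*ln Tr exp(G(x)/(theta*mu)),
    # with a toy affine map G(x) = A[0] + sum_i x_i A[i+1]
    G = A[0] + sum(xi * Ai for xi, Ai in zip(x, A[1:]))
    lam = np.linalg.eigvalsh(G / (theta * mu))
    # log-sum-exp of eigenvalues, shifted for numerical stability
    return theta * mu * (lam.max() + np.log(np.sum(np.exp(lam - lam.max()))))

def grad_x(x, theta, mu, A):
    # components Tr(G_i exp(G/(theta*mu))) / Tr exp(G/(theta*mu))
    G = A[0] + sum(xi * Ai for xi, Ai in zip(x, A[1:]))
    lam, Q = np.linalg.eigh(G / (theta * mu))
    w = np.exp(lam - lam.max()); w /= w.sum()   # softmax of the eigenvalues
    E = (Q * w) @ Q.T                           # exp(G/(theta*mu)) / Tr exp(...)
    return np.array([np.trace(Ai @ E) for Ai in A[1:]])

rng = np.random.default_rng(1)
A = [(M + M.T) / 2 for M in rng.standard_normal((4, 6, 6))]  # symmetric A_0..A_3
x = rng.standard_normal(3); theta, mu = 1.0, 0.5
h = 1e-6
fd = np.array([(g(x + h * e, theta, mu, A) - g(x - h * e, theta, mu, A)) / (2 * h)
               for e in np.eye(3)])
print(np.max(np.abs(grad_x(x, theta, mu, A) - fd)))  # small discrepancy
```

Note that the factor $\frac{1}{\theta\mu}$ from the trace derivative cancels against the outer factor $\theta\mu$ of the penalty, which is why no such factor appears in $\nabla_x g_\theta$.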


References

1. Abraham, R., Robbin, J.: Transversal Mappings and Flows. W. A. Benjamin Inc., New York-Amsterdam (1967)
2. Alizadeh, F., Haeberly, J.P.A., Nayakkankuppam, M.V., Overton, M.L., Schmieta, S.: SDPPACK user's guide. Tech. rep., Courant Institute of Mathematical Sciences, New York University, New York, NY (1997)
3. Alizadeh, F., Haeberly, J.P.A., Overton, M.L.: Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM J. Optim. 8(3), 746–768 (1998)
4. Allgower, E.L., Georg, K.: Numerical Path Following. North-Holland, Amsterdam (1997)
5. Allgower, E.L., Georg, K.: Introduction to Numerical Continuation Methods. Classics in Applied Mathematics, vol. 45. SIAM, Philadelphia, PA (2003)
6. Auslender, A.: Penalty and barrier methods: a unified framework. SIAM J. Optim. 10(1), 211–230 (1999)
7. Ben-Tal, A., Teboulle, M.: A smoothing technique for nondifferentiable optimization problems. In: Optimization (Varetz, 1988), Lecture Notes in Math., vol. 1405, pp. 1–11. Springer, Berlin (1989)
8. Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer, New York (2000)
9. Chen, X., Qi, H., Qi, L., Teo, K.L.: Smooth convex approximation to the maximum eigenvalue function. J. Glob. Optim. 30(2), 253–270 (2004)
10. Chow, S.N., Mallet-Paret, J., Yorke, J.A.: Finding zeroes of maps: homotopy methods that are constructive with probability one. Math. Comput. 32(143), 887–899 (1978)
11. Correa, R., Ramirez, C.H.: A global algorithm for nonlinear semidefinite programming. SIAM J. Optim. 15(1), 303–318 (2004)
12. de Klerk, E.: Aspects of Semidefinite Programming. Kluwer Academic Publishers, Dordrecht (2002)
13. Fares, B., Noll, D., Apkarian, P.: Robust control via sequential semidefinite programming. SIAM J. Control Optim. 40(6), 1791–1820 (2002)
14. Forsgren, A.: Optimality conditions for nonconvex semidefinite programming. Math. Program. 88(1), 105–128 (2000)
15. Freund, R.W., Jarre, F., Vogelbusch, C.H.: Nonlinear semidefinite programming: sensitivity, convergence, and an application in passive reduced-order modeling. Math. Program. 109(2–3), 581–611 (2007)
16. Gómez, W., Ramírez, H.: A filter algorithm for nonlinear semidefinite programming. Comput. Appl. Math. 29(2), 297–328 (2010)
17. Henrion, D., Löfberg, J., Kočvara, M., Stingl, M.: Solving polynomial static output feedback problems with PENBMI. In: Proceedings of the 44th IEEE Conference on Decision and Control, Sevilla, Spain, vol. 1, pp. 7581–7586 (2005)
18. Jarre, F.: An interior method for nonconvex semidefinite programs. Optim. Eng. 1(4), 347–372 (2000)
19. Kanzow, C., Nagel, C., Kato, H., Fukushima, M.: Successive linearization methods for nonlinear semidefinite programs. Comput. Optim. Appl. 31(3), 251–273 (2005)
20. Kočvara, M., Stingl, M.: PENNON: a code for convex nonlinear and semidefinite programming. Optim. Methods Softw. 18(3), 317–333 (2003)
21. Leibfritz, F., Mostafa, E.M.E.: An interior point constrained trust region method for a special class of nonlinear semidefinite programming problems. SIAM J. Optim. 12(4), 1048–1074 (2002)
22. Qi, L., Tseng, P.: On almost smooth functions and piecewise smooth functions. Nonlinear Anal. Theory Methods Appl. 67(3), 773–794 (2007)
23. Liuzzi, G., Lucidi, S., Sciandrone, M.: A derivative-free algorithm for linearly constrained finite minimax problems. SIAM J. Optim. 16(4), 1054–1075 (2006)
24. Luo, H., Wu, H., Chen, G.: On the convergence of augmented Lagrangian methods for nonlinear semidefinite programming. J. Glob. Optim. 54(3), 599–618 (2012)
25. Noll, D.: Local convergence of an augmented Lagrangian method for matrix inequality constrained programming. Optim. Methods Softw. 22(5), 777–802 (2007)
26. Noll, D., Apkarian, P.: Spectral bundle methods for non-convex maximum eigenvalue functions: second-order methods. Math. Program. 104(2–3), 729–747 (2005)
27. Overton, M.L.: Large-scale optimization of eigenvalues. SIAM J. Optim. 2(1), 88–120 (1992)
28. Overton, M.L., Womersley, R.S.: Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Math. Program. 62(2, Ser. B), 321–357 (1993)
29. Shapiro, A.: First and second order analysis of nonlinear semidefinite programs. Math. Program. 77(2), 301–320 (1997)
30. Stingl, M.: On the Solution of Nonlinear Semidefinite Programs by Augmented Lagrangian Methods. Ph.D. thesis, Institute of Applied Mathematics II, Friedrich-Alexander University of Erlangen-Nuremberg (2006)


31. Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11/12(1–4), 625–653 (1999)
32. Sun, D., Sun, J., Zhang, L.: The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math. Program. 114(2), 349–391 (2008)
33. Todd, M.J.: Semidefinite optimization. Acta Numer. 10, 515–560 (2001)
34. Tütüncü, R.H., Toh, K.C., Todd, M.J.: Solving semidefinite-quadratic-linear programs using SDPT3. Math. Program. 95(2), 189–217 (2003)
35. Wolkowicz, H., Saigal, R., Vandenberghe, L. (eds.): Handbook of Semidefinite Programming. International Series in Operations Research & Management Science, vol. 27. Kluwer Academic Publishers, Boston, MA (2000)
36. Wu, H., Luo, H., Ding, X., Chen, G.: Global convergence of modified augmented Lagrangian methods for nonlinear semidefinite programming. Comput. Optim. Appl. 56(3), 531–558 (2013)
37. Xiao, Y., Yu, B.: A truncated aggregate smoothing Newton method for minimax problems. Appl. Math. Comput. 216(6), 1868–1879 (2010)
38. Xiong, H.J., Yu, B.: An aggregate deformation homotopy method for min–max–min problems with max–min constraints. Comput. Optim. Appl. 47(3), 501–527 (2010)
39. Xu, S.: Smoothing method for minimax problems. Comput. Optim. Appl. 20(3), 267–279 (2001)
40. Yamashita, H., Yabe, H.: Local and superlinear convergence of a primal-dual interior point method for nonlinear semidefinite programming. Math. Program. 132, 1–30 (2012)
41. Yamashita, H., Yabe, H., Harada, K.: A primal-dual interior point method for nonlinear semidefinite programming. Math. Program. 135, 89–121 (2012)
42. Yang, L., Yu, B.: A homotopy method for nonlinear semidefinite programming. Comput. Optim. Appl. 56(1), 81–96 (2013)
43. Zhao, X.Y., Sun, D., Toh, K.C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010)
