SEMISMOOTH MATRIX VALUED FUNCTIONS¹

Defeng Sun²
School of Mathematics, The University of New South Wales, Sydney, Australia

Jie Sun³
Faculty of Business Administration and the Singapore-MIT Alliance, National University of Singapore, Republic of Singapore

November 17, 1999

Abstract. Matrix valued functions play an important role in the development of algorithms for semidefinite programming problems. This paper studies generalized differential properties of such functions related to nonsmooth and smoothing Newton methods. The first part of this paper discusses basic properties such as the generalized derivative, Rademacher's theorem, the B-derivative, the directional derivative, and semismoothness. The second part shows that the matrix absolute-value function, the matrix semidefinite-projection function, and the matrix projective residual function are strongly semismooth.

Keywords: Matrix functions, Newton's method, nonsmooth optimization, semidefinite programming.

AMS subject classification: 65K05, 90C25, 90C33.

¹ The research was partially supported by the Australian Research Council and grant RP3972073 of National University of Singapore.
² E-mail: [email protected].
³ Fax: (65)779-2621. E-mail: [email protected].

1 Introduction

Let M_{mn} and M_{pq} be the spaces of m × n and p × q matrices, respectively. Let M′ be a subset of M_{mn}. A matrix valued function (matrix function for short) is a function that maps a matrix in M′ to a matrix in M_{pq}. We are particularly concerned with the case in which both M_{mn} and M_{pq} are real, symmetric, block-diagonal, and of the same block sizes. Although many of the results could be made more general, we will assume that F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m), where S(n₁, ⋯, n_m) is the space of real symmetric n × n block-diagonal matrices with m blocks of sizes n₁, ⋯, n_m. When m = 1, we simply write S(n₁) as S^n.

The differential properties of F are important in view of the recent research on semidefinite programs (SDP) and their generalization, the semidefinite complementarity problems (SDCP). For instance, it is shown (Tseng [14]) that the solution of SDP and SDCP can be reduced to solving a matrix equation F(X) = 0, where F is a certain matrix merit function. However, in order to develop Newton-type methods for such equations, some kind of semismoothness property has to be established for F. Historically, vector valued semismooth functions played a crucial role in constructing high-order nonsmooth and smoothing Newton methods (see, e.g., [10, 11]) for nonlinear equations, complementarity problems, and variational inequality problems. There is computational evidence showing that those methods are competitive with interior point methods in solving complementarity and variational inequality problems [1, 4]. The recent paper of Chen and Tseng [2] provided promising results in non-interior point methods for SDCP. In order to extend semismooth and smoothing Newton methods to SDP, SDCP, and other problems involving the cone of positive semidefinite matrices, the first goal in this direction is naturally to study the semismoothness of matrix functions. In our research we find that this task is not trivial and is often difficult because of the hardship of algebraically representing the constraint of positive semidefiniteness.

This paper is organized as follows. We study the Fréchet differentiability of matrix functions in Section 2 by using an isometry between vector spaces and matrix spaces and by invoking some results in matrix analysis. Semismoothness and related concepts are considered in Section 3. Then we show that the absolute-value, the semidefinite-projection, and the projective residual matrix functions are strongly semismooth in Section 4.

Some notation to be used is as follows.

• S^n_+ is the subset of S^n consisting of positive semidefinite matrices.
• O is the space of all n × n orthogonal matrices.
• For matrices A, B ∈ S^m, we define ⟨A, B⟩ = A • B := Trace(AB) = Σ_{i,j=1}^m A_{ij} B_{ij}.
• "T" represents the transpose of matrices and vectors.
• ‖A‖_F, or simply ‖A‖, is the Frobenius norm of a matrix A: ‖A‖_F := ⟨A, A⟩^{1/2}.
• ‖a‖₂, or ‖a‖, represents the 2-norm of a vector a.
• We write Y ≻ 0 and Y ⪰ 0 if Y is positive definite and positive semidefinite, respectively. For Y ⪰ 0, we denote its symmetric square root by √Y or Y^{1/2}.
• Usually, calligraphic letters denote matrix sets. Capital letters represent matrices or matrix valued functions. Lower case letters are for vectors or vector valued functions. Greek letters stand for scalars.

2 The Fréchet Derivative of Matrix Functions

2.1 Differentiability of functions

Definition 2.1 Let F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) and let H ∈ S(n₁, ⋯, n_m). If F′_X : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) is a linear operator that satisfies

lim_{H → 0} ‖F(X + H) − F(X) − F′_X(H)‖ / ‖H‖ = 0,

then F is said to be Fréchet differentiable (F-differentiable) at X and F′_X is the F-derivative of F at X.

Example 2.2 Let F(X) = AXA^T for any X ∈ S^n, where A ∈ IR^{n×n}. It is easy to verify that F(X + H) − F(X) − AHA^T = 0, so ‖F(X + H) − F(X) − AHA^T‖_F = 0. That is, F′_X(H) = AHA^T.

Note that F′_X belongs to the space of all linear operators from S(n₁, ⋯, n_m) to S(n₁, ⋯, n_m). Let us denote this space by S(n₁, ⋯, n_m)^*, which, according to functional analysis, has the so-called weak*-topology and can be identified with a finite dimensional linear space. We will focus on the space S^n temporarily and discuss the block-diagonal case in the next subsection.

Definition 2.3 If A ∈ S^n, then vec(A) is a column vector whose entries come from A by stacking up the columns of A, from the first to the nth column, on top of each other. The operator mat is the inverse of vec; i.e., mat(vec(A)) = A. The operator svec(A) only takes the lower half of A, including the diagonal entries, and stacks it into a vector in a similar way to vec(A). In other words, svec(A) is obtained from vec(A) by eliminating all supradiagonal elements of A. We use smat to represent the inverse of svec.

Example 2.4

A = [ 1 2
      2 3 ],  vec(A) = (1, 2, 2, 3)^T,  svec(A) = (1, 2, 3)^T.

Remark. Let ν := n(n+1)/2. The operator svec is an isometry identifying S^n with IR^ν equipped with the inner product ⟨svec(K), svec(L)⟩ := K • L.
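The operators of Definition 2.3 are easy to realize concretely. The following is a minimal NumPy sketch (the function names and this particular realization are ours, not part of the paper):

```python
import numpy as np

def vec(A):
    # Stack the columns of A on top of each other (column-major order).
    return A.flatten(order="F")

def svec(A):
    # Stack the lower-triangular part of the symmetric matrix A, column by column.
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def smat(a):
    # Inverse of svec: rebuild the symmetric matrix from its stacked lower half.
    nu = a.size
    n = int((np.sqrt(8 * nu + 1) - 1) / 2)   # solve nu = n(n+1)/2 for n
    A = np.zeros((n, n))
    pos = 0
    for j in range(n):
        A[j:, j] = a[pos:pos + n - j]
        pos += n - j
    return A + A.T - np.diag(np.diag(A))

A = np.array([[1., 2.], [2., 3.]])
print(vec(A))                          # [1. 2. 2. 3.], as in Example 2.4
print(svec(A))                         # [1. 2. 3.]
print(np.allclose(smat(svec(A)), A))   # True
```

Note that the isometry of the Remark refers to the inner product ⟨svec(K), svec(L)⟩ := K • L on IR^ν, not to the standard dot product of the svec vectors, since off-diagonal entries appear once in svec but twice in K • L.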

For any matrix function F : S^n → S^n, define its corresponding vector function f : IR^ν → IR^ν by

f(x) := svec[F(X)],  (1)

where x := svec(X), X ∈ S^n.

Lemma 2.5 F is locally Lipschitz continuous on S^n if and only if f is locally Lipschitz continuous on IR^ν.

Proof. This is an implication of the isometry of S^n with IR^ν. □

We will next derive an operator representation of F′_X in terms of the Jacobian of f.

Definition 2.6 The duplication matrix D_n is the n² × ν matrix with the property that D_n svec(A) = vec(A) for every A ∈ S^n.

The following properties of D_n can be easily obtained from the definition of D_n.

Theorem 2.7
1. D_n has full column rank ν.
2. D_n^+ = (D_n^T D_n)^{-1} D_n^T, where D_n^+ is the generalized inverse of D_n.
3. D_n^+ D_n = I_ν, where I_ν designates the unit ν × ν matrix.
4. D_n^+ vec(A) = svec(A) for every A ∈ S^n.

Example 2.8 For n = 2,

D₂ = [ 1 0 0
       0 1 0
       0 1 0
       0 0 1 ],   D₂^+ = [ 1 0   0   0
                           0 1/2 1/2 0
                           0 0   0   1 ].

Suppose that F : S^n → S^n, that f : IR^ν → IR^ν is defined by (1), and that f is differentiable at x ∈ IR^ν with Jacobian

J(x) := ∂f(x)/∂x.

The following theorem establishes a one-to-one correspondence between J(x) and F′_X.

Theorem 2.9 (Differential Correspondence) Let f : IR^ν → IR^ν be defined and differentiable on an open subset U of IR^ν. Let 𝒰 = smat(U) and x = svec(X). Then

∂f(x)/∂x = J(x) for x ∈ U

if and only if

F′_X(H) = smat[J(x) svec(H)] for X ∈ 𝒰, H ∈ S^n.  (2)

Proof. Let J^+(x^+) be the Jacobian of vec F(X) at vec(X) =: x^+. Then by definition

∂ svec[F(X)] / ∂ svec(X) = J(x)  ⟺  ∂ vec[F(X)] / ∂ vec(X) = J^+(x^+),

where ⟺ stands for "if and only if". From a result of Magnus and Neudecker (see [7, Ch. 5, Theorem 11]), one has

vec[F′_X(H)] = J^+(x^+) vec H  ⟺  ∂ vec[F(X)] / ∂ vec(X) = J^+(x^+).

Now from the definitions of vec, svec, and D_n it follows that J^+(x^+) = D_n J(x) D_n^+ and D_n^+ vec = svec. Thus,

∂ svec[F(X)] / ∂ svec(X) = J(x)
⟺ ∂ vec[F(X)] / ∂ vec(X) = J^+(x^+)
⟺ vec[F′_X(H)] = J^+(x^+) vec H
⟺ vec[F′_X(H)] = D_n J(x) D_n^+ vec H
⟺ svec[F′_X(H)] = J(x) svec H
⟺ (2). □

The following results are immediate. Among them, the last is a direct extension of Rademacher's Theorem to matrix functions on S^n.

Theorem 2.10
1. F(X) is F-differentiable at X if and only if f(x) is differentiable at x in the usual Jacobian sense.
2. The F-derivative of F at X is unique.
3. If F is locally Lipschitzian on S^n, then F is almost everywhere F-differentiable on S^n.

Proof. The first claim is a part of Theorem 2.9. The second claim is due to the uniqueness of the Jacobian J(x) and Theorem 2.9. The third is obtained by combining Lemma 2.5, Rademacher's Theorem on IR^ν, and Theorem 2.9. □
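For the linear function of Example 2.2 the correspondence can be made fully explicit: vec(AXA^T) = (A ⊗ A) vec(X) (this is the Kronecker identity recalled as (21) in Section 4), so J^+(x^+) = A ⊗ A and J(x) = D_n^+ (A ⊗ A) D_n. A hedged numerical check, reusing svec, smat, and duplication from the sketches above:

```python
import numpy as np
rng = np.random.default_rng(0)

n = 3
A = rng.standard_normal((n, n))
H = rng.standard_normal((n, n)); H = (H + H.T) / 2   # symmetric direction

D = duplication(n)
Dp = np.linalg.inv(D.T @ D) @ D.T
J = Dp @ np.kron(A, A) @ D        # Jacobian of f = svec . F . smat

lhs = smat(J @ svec(H))           # smat[J(x) svec(H)], the right side of (2)
rhs = A @ H @ A.T                 # F'_X(H) from Example 2.2
print(np.allclose(lhs, rhs))      # True
```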

2.2 Extension to block-matrix functions

The results in the last subsection can be extended directly to block-matrix functions. Their proofs are straightforward.

Definition 2.11 For any X ∈ S(n₁, ⋯, n_m), let Vec(X) be the column vector

Vec(X) := (vec(X₁)^T, ⋯, vec(X_m)^T)^T,

where X_i, i = 1, ⋯, m, is the ith block of X. The operator Mat is the inverse of Vec; i.e., Mat[Vec(X)] = X. The operator Svec is defined by

Svec(X) := (svec(X₁)^T, ⋯, svec(X_m)^T)^T

and Smat is the inverse of Svec.

The matrix function F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) under Svec becomes a vector valued function f : IR^ν̄ → IR^ν̄ defined by

f(x) := Svec[F(X)],  (3)

where x = Svec(X) and ν̄ = Σ_{i=1}^m n_i(n_i + 1)/2. Conversely, the vector valued function f under Smat turns into a matrix function F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) with

F(X) := Smat[f(x)],

where X = Smat(x). Parallel to Theorems 2.9 and 2.10, we have the following results.

Theorem 2.12
1. F is locally Lipschitz continuous on S(n₁, ⋯, n_m) if and only if f is locally Lipschitz continuous on IR^ν̄.
2. F(X) is F-differentiable at X if and only if f(x) is differentiable at x in the usual Jacobian sense.
3. The F-derivative of F at X is unique.
4. If F is locally Lipschitzian on S(n₁, ⋯, n_m), then F is almost everywhere F-differentiable on S(n₁, ⋯, n_m).
5. (Differential Correspondence) Let f : IR^ν̄ → IR^ν̄ be defined and differentiable on an open subset U of IR^ν̄. Let 𝒰 = Smat(U) and x = Svec(X). Then ∂f(x)/∂x = J(x) for x ∈ U if and only if F′_X(H) = Smat[J(x) Svec(H)] for X ∈ 𝒰, H ∈ S(n₁, ⋯, n_m).
6. F′_X can be represented by a composite operator:

F′_X = Smat ∘ J(x) ∘ Svec.  (4)
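Definition 2.11 amounts to applying svec blockwise; a small sketch reusing svec and smat from the Section 2.1 sketch (names ours):

```python
import numpy as np

def Svec(blocks):
    # Svec of a block-diagonal symmetric matrix, given its diagonal blocks
    # X_1, ..., X_m: concatenate the svec of each block (Definition 2.11).
    return np.concatenate([svec(Xi) for Xi in blocks])

def Smat(x, sizes):
    # Inverse of Svec for blocks of sizes n_1, ..., n_m.
    out, pos = [], 0
    for ni in sizes:
        nui = ni * (ni + 1) // 2
        out.append(smat(x[pos:pos + nui]))
        pos += nui
    return out

X1 = np.array([[1., 2.], [2., 3.]]); X2 = np.array([[5.]])
x = Svec([X1, X2])                 # [1. 2. 3. 5.]; nu-bar = 3 + 1 = 4
print(all(np.allclose(a, b) for a, b in zip(Smat(x, [2, 1]), [X1, X2])))  # True
```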

3 Generalized Derivative, Directional Derivative, B-Derivative, and Semismoothness of Matrix Functions

Suppose that F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) is a locally Lipschitzian matrix function and that f : IR^ν̄ → IR^ν̄ is defined as in Theorem 2.12. According to Theorem 2.12, F is F-differentiable almost everywhere in S(n₁, ⋯, n_m). Denote the set of points at which F is F-differentiable by D_F.

Definition 3.1 The generalized derivative of F at X, denoted by ∂F_X, is the set defined as follows:

∂F_X = co[∂_B F_X], where ∂_B F_X := { lim_{Z → X, Z ∈ D_F} F′_Z },  (5)

where "co" stands for the convex hull in the usual sense of convex analysis [12].

This definition is an extension of the generalized (Clarke) Jacobian ∂f(x) [3]:

∂f(x) = co[∂_B f(x)], where ∂_B f(x) := { lim_{z → x, z ∈ D_f} J(z) },  (6)

where D_f is the set of differentiable points of f in IR^ν̄ and J(z) is the Jacobian of f at z ∈ D_f. It is easy to see by the differential correspondence that there is an isometry between ∂f(x) and ∂F_X. Similarly to (4) one can write

∂F_X = Smat ∘ ∂f(x) ∘ Svec,  (7)

which is the operator form of the isometry. Topological properties of ∂F_X can then be identified with the corresponding properties of ∂f(x).

Theorem 3.2 Let F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) be locally Lipschitzian on an open neighborhood U of X.
(a) ∂F_X is a nonempty, convex, and compact subset of S(n₁, ⋯, n_m)^*.
(b) The mapping ∂F_X is closed at X; that is, if Z → X, W ∈ ∂F_Z, W → V, then V ∈ ∂F_X.
(c) ∂F is upper-semicontinuous: for any ε > 0 there is δ > 0 such that, for all Y ∈ X + δB (B is the unit ball in S(n₁, ⋯, n_m)), ∂F_Y ⊂ ∂F_X + εB^*, where B^* is the unit ball in S(n₁, ⋯, n_m)^*.
(d) If each entry F_ij of F(X) is α_ij-Lipschitzian at x (= Svec(X)) in the sense that |F_ij(x) − F_ij(y)| ≤ α_ij ‖x − y‖₂, then F(X) is α := max(α_ij)-Lipschitzian at X with ∂F_X ⊂ αB^*.
(e) The mean value theorem holds:

F(X) − F(Y) ∈ co{∂F_Z(X − Y) | Z ∈ [X, Y]} =: ∂F_{[X,Y]},  (8)

where [X, Y] is the line segment connecting X and Y.

The directional derivative of a matrix function can be defined in a similar way to that of scalar or vector valued functions.

Definition 3.3 Let F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) and X, H ∈ S(n₁, ⋯, n_m). The directional derivative of F at X along H is the following limit (if it exists at all):

F′(X; H) = lim_{τ ↓ 0} [F(X + τH) − F(X)] / τ.

Corollary 3.4 If F is F-differentiable at X, then for any H ∈ S(n₁, ⋯, n_m), F′(X; H) = F′_X(H).

A function f : IR^ν̄ → IR^ν̄ is said to be B-differentiable at a point x ∈ IR^ν̄ if there exists a function Bf(x) : IR^ν̄ → IR^ν̄, called the B-derivative of f at x, which is positively homogeneous of degree 1 (i.e., Bf(x)(τh) = τ Bf(x)(h) for all h ∈ IR^ν̄ and all τ ≥ 0), such that

f(x + h) − f(x) − Bf(x)(h) = o(‖h‖) as h → 0.  (9)

An analogous definition of B-differentiability of matrix valued functions can be established if we change the space to S(n₁, ⋯, n_m). There is also an isometry identifying the B-derivative of f at x with the B-derivative of F at X.

Shapiro [13] showed that a locally Lipschitzian vector valued function f : IR^ν̄ → IR^ν̄ is B-differentiable at x if and only if it is directionally differentiable at x. Therefore, the B-differentiability of Lipschitzian matrix valued functions is also equivalent to their directional differentiability.

In the case that a function f : IR^ν̄ → IR^ν̄ is not differentiable but locally Lipschitzian, Mifflin [8] and Qi and Sun [11] introduced the concepts of semismoothness and p-order semismoothness for vector valued functions f. The definition of semismoothness implies that the function is directionally differentiable (hence is B-differentiable according to [13]) and that (Proposition 1 of Pang and Qi [9]) there exists ε > 0 such that

sup_{v ∈ ∂f(x+h)} ‖f(x + h) − f(x) − vh‖ = o(‖h‖)  ∀ ‖h‖ < ε.  (10)

In the case of p-order semismoothness the term o(‖h‖) is replaced by O(‖h‖^{1+p}). Equation (10) plays an important role in developing semismooth Newton methods and smoothing methods for nonlinear equation systems and nonlinear programming problems. We now extend the definition of semismoothness to matrix functions in terms of (10).

Definition 3.5 Suppose that F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) is a locally Lipschitzian matrix valued function. F is said to be semismooth at X ∈ S(n₁, ⋯, n_m) if F is directionally differentiable at X and for any V ∈ ∂F_{X+H} and H → 0,

F(X + H) − F(X) − V(H) = o(‖H‖).

F is said to be p-order (0 < p < ∞) semismooth at X if F is semismooth at X and

F(X + H) − F(X) − V(H) = O(‖H‖^{1+p}).  (11)

In particular, F is called strongly semismooth at X if F is first-order semismooth at X.

We find it convenient to establish an equivalent definition for the analysis of matrix functions.

Theorem 3.6 Suppose that F : S^n → S^q is locally Lipschitzian and directionally differentiable in a neighborhood of X. Then for any p ∈ (0, ∞) the following two statements are equivalent:

(a) for any V ∈ ∂F(X + H), H → 0,
F(X + H) − F(X) − V(H) = O(‖H‖^{1+p});

(b) for any X + H ∈ D_F, H → 0,
F(X + H) − F(X) − F′(X + H; H) = O(‖H‖^{1+p}).

Proof. From the differential correspondence between F and f it is obvious that one only has to show that the following two statements are equivalent:

(a′) for any v ∈ ∂f(x + h), h → 0,
f(x + h) − f(x) − vh = O(‖h‖^{1+p});

(b′) for any x + h ∈ D_f, h → 0,
f(x + h) − f(x) − f′(x + h; h) = O(‖h‖^{1+p}).

(a′) ⟹ (b′) is obvious. Next we prove (b′) ⟹ (a′). Assume by contradiction that (b′) holds while (a′) does not hold. Then there exist a sequence {h^i}_{i=1}^∞ (h^i ≠ 0) converging to 0 and a corresponding generalized Jacobian sequence v_i ∈ ∂f(x + h^i) such that

‖f(x + h^i) − f(x) − v_i h^i‖ / ‖h^i‖^{1+p} → ∞,  i → ∞.  (12)

By Carathéodory's theorem, any v_i ∈ ∂f(x + h^i) can be expressed as

v_i = Σ_{j=1}^{ν̄²+1} λ_{ij} v_{ij},  (13)

where

λ_{ij} ∈ [0, 1],  Σ_{j=1}^{ν̄²+1} λ_{ij} = 1,  (14)

and v_{ij} ∈ ∂_B f(x + h^i). For each v_{ij} ∈ ∂_B f(x + h^i), by the definition of ∂_B f(x + h^i), there exists y^{ij} ∈ D_f such that

‖y^{ij} − (x + h^i)‖ ≤ ‖h^i‖^{1+p}  (15)

and

‖v_{ij} − f′(y^{ij})‖ ≤ ‖h^i‖^p.  (16)

By (13)–(16), we obtain

‖f(x + h^i) − f(x) − v_i h^i‖
≤ Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f(x + h^i) − f(x) − f′(y^{ij}) h^i‖ + Σ_{j=1}^{ν̄²+1} λ_{ij} ‖[v_{ij} − f′(y^{ij})] h^i‖
≤ Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f(x + h^i) − f(x) − f′(y^{ij}) h^i‖ + O(‖h^i‖^{1+p})
= Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f(x + h^i) − f(x) − f′(y^{ij})[(y^{ij} − x) + (x + h^i − y^{ij})]‖ + O(‖h^i‖^{1+p})
≤ Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f(x + h^i) − f(x) − f′(y^{ij})(y^{ij} − x)‖ + Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f′(y^{ij})(x + h^i − y^{ij})‖ + O(‖h^i‖^{1+p})
≤ Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f(x + h^i) − f(x) − f′(y^{ij})(y^{ij} − x)‖ + O(‖h^i‖^{1+p}),  (17)

where in the last inequality we used the fact that the {f′(y^{ij})} are uniformly bounded because of the local Lipschitzian property of f. Relations (17), (14) and (15), together with the Lipschitzian continuity of f, imply that

‖f(x + h^i) − f(x) − v_i h^i‖ ≤ Σ_{j=1}^{ν̄²+1} λ_{ij} ‖f(y^{ij}) − f(x) − f′(y^{ij})(y^{ij} − x)‖ + O(‖h^i‖^{1+p}).  (18)

Thus, by (b′), (14) and (15), from (18) we obtain

‖f(x + h^i) − f(x) − v_i h^i‖ ≤ O(‖y^{ij} − x‖^{1+p}) + O(‖h^i‖^{1+p}) = O(‖h^i‖^{1+p}),
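As a quick illustration of characterization (b), consider the componentwise max function f(x) = max(0, x), the vector counterpart of the projection studied in Section 4. Since f is piecewise linear, the remainder in (b) vanishes exactly once h is small, consistent with strong semismoothness; a sketch (all names ours):

```python
import numpy as np
rng = np.random.default_rng(1)

f = lambda x: np.maximum(0.0, x)

x = np.array([0.0, 1.0, -1.0])            # f is nondifferentiable at x (zero entry)
for t in [1e-1, 1e-2, 1e-3]:
    h = t * rng.standard_normal(3)
    if np.all(x + h != 0.0):              # then x + h is a differentiable point of f
        Jh = np.where(x + h > 0, h, 0.0)  # f'(x + h; h) = J(x + h) h
        r = f(x + h) - f(x) - Jh          # the remainder in statement (b)
        print(t, np.linalg.norm(r))       # 0.0 once |h_i| < |x_i| on nonzero entries
```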

which contradicts (12). This contradiction shows that (b′) ⟹ (a′). □

We list below some properties of semismooth matrix functions involving components and composition.

Theorem 3.7 Suppose that F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) is a locally Lipschitzian function. Let p ∈ (0, ∞). If each entry of F is semismooth (p-order semismooth) at X, then F is semismooth (p-order semismooth) at X.

Proof. For X + H ∈ D_F, H → 0,

‖F(X + H) − F(X) − F′(X + H; H)‖ ≤ Σ_{i=1}^m Σ_{j,k=1}^{n_i} |[F_i(X + H)]_{kj} − [F_i(X)]_{kj} − [(F_i)_{kj}]′(X + H; H)|,

which completes the proof. □

The following result on semismoothness is due to Mifflin [8], and the result on p-order semismoothness is essentially due to Fischer [5].

Lemma 3.8 Let p ∈ (0, ∞). Suppose that f : IR^ν̄ → IR^{ν₁} is semismooth (p-order semismooth) at x ∈ IR^ν̄ and g : IR^{ν₁} → IR^{ν₂} is semismooth (p-order semismooth) at f(x). Then the composite function h := g ∘ f is semismooth (p-order semismooth) at x.

The above lemma implies

Theorem 3.9 Let p ∈ (0, ∞). Suppose F : S(n₁, ⋯, n_m) → Y is semismooth (p-order semismooth) at X and G : Y → Z is semismooth (p-order semismooth) at F(X), where Y := S(d₁, ⋯, d_l) and Z := S(t₁, ⋯, t_s). Then the composite function G ∘ F is semismooth (p-order semismooth) at X.

Proof. Define two vector valued functions g and h by

g(y) := Svec[G(Y)], where y := Svec(Y), Y ∈ Y,

and

h(x) := Svec[G(F(X))], where x := Svec(X), X ∈ S(n₁, ⋯, n_m).

Then h = g ∘ f, where f is given by (3). By Lemma 3.8 and the relationship Smat ∘ (g ∘ f) ∘ Svec = G ∘ F, the above result follows directly. □

4 Some Strongly Semismooth Matrix Functions

The projective residual function r : IR^ν → IR^ν,

r(x) := g(x) − max[0, g(x) − f(x)],  (19)

is often used in continuation methods, QP-free methods, smoothing methods, and merit function methods; see Qi and Sun [10] for its applications. It is also closely related to other merit functions. We will show that the corresponding matrix projective residual function is locally Lipschitzian and strongly semismooth.

Let S(n₁, ⋯, n_m)_+ be the cone of symmetric block-diagonal positive semidefinite matrices. Define the semidefinite projection matrix function [·]_+ : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m)_+ as the matrix valued function that satisfies

(1) [X]_+ ∈ S(n₁, ⋯, n_m)_+;
(2) ‖X − [X]_+‖ ≤ ‖X − Z‖  ∀ Z ∈ S(n₁, ⋯, n_m)_+.

The following proposition means that we can focus our discussion on the matrix projection from S^n to S^n_+, the cone of symmetric positive semidefinite matrices.

Proposition 4.1 For any X ∈ S(n₁, ⋯, n_m), we have

[X]_+ = Diag([X₁]_+, ⋯, [X_m]_+),

where X_i is the i-th block matrix of X.

Proof. Straightforward. □

Since (S^n, ⟨·, ·⟩) is a Hilbert space, we have the following proposition (Zarantonello [15]).

Proposition 4.2 For the positive semidefinite projection function [·]_+, we have
(a) for any X ∈ S^n and Z ∈ S^n_+, Z = [X]_+ if and only if ⟨X − Z, Y − Z⟩ ≤ 0 ∀ Y ∈ S^n_+;
(b) ‖[Y]_+ − [X]_+‖ ≤ ‖Y − X‖  ∀ Y, X ∈ S^n.

Proposition 4.2(b) says that [·]_+ is globally Lipschitz continuous with Lipschitz constant 1.

Definition 4.3 Let F, G : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m). The matrix projective residual function is defined as

R(X) := G(X) − [G(X) − F(X)]_+,  X ∈ S(n₁, ⋯, n_m).

It is well known [14, 15] that a root of the above function satisfies G(X) ⪰ 0, F(X) ⪰ 0, and ⟨G(X), F(X)⟩ = 0. Thus, one can obtain a solution to SDCP by solving R(X) = 0. Similarly to Proposition 4.1, we can focus on residual functions from S^n to S^n only. The strong semismoothness of such functions depends on the strong semismoothness of [·]_+ according to Theorem 3.9, provided G and F are strongly semismooth. Note that the function [·]_+ can be expressed as the optimal solution of the parametric SDP

[X]_+ = argmin{‖X − Y‖ | Y ⪰ 0}.

Hence [·]_+ is quite different from the vector function max(0, ·). The vector case is always strongly semismooth due to the piecewise linear structure of the function max. However, the matrix case does not have a piecewise linear structure. This poses great difficulty in proving the (strong) semismoothness of [·]_+. We further reduce the problem to the strong semismoothness of the so-called absolute-value function.

Definition 4.4 Let A ∈ S(n₁, ⋯, n_m) be positive semidefinite. Then there exists a unique symmetric positive semidefinite matrix B ∈ S(n₁, ⋯, n_m) such that B² = A. We call B the square root of A and denote it by B = √A. The matrix absolute value of A ∈ S(n₁, ⋯, n_m) is defined as √(A²).

The following proposition says that the three matrix functions (the absolute-value, the semidefinite-projection, and the projective residual functions) will all be strongly semismooth if one of them is.

Proposition 4.5 There hold

√(X²) = [X]_+ + [−X]_+  and  [X]_+ = [X + √(X²)]/2.

Proof. Tseng [14] showed that for any X ∈ S^n,

[X]_+ = P diag[max(0, λ₁), ⋯, max(0, λₙ)] P^T,

where P ∈ O and λ₁, ⋯, λₙ ∈ IR satisfy X = P diag[λ₁, ⋯, λₙ] P^T. By definition we have

√(X²) = P diag[√(λ₁²), ⋯, √(λₙ²)] P^T.

The proposition follows. □
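Both identities of Proposition 4.5 are easy to confirm numerically from the spectral formulas in the proof; a sketch assuming NumPy (names ours):

```python
import numpy as np
rng = np.random.default_rng(2)

def proj_psd(X):
    # [X]_+ = P diag(max(0, lambda_i)) P^T, the formula quoted from Tseng [14].
    lam, P = np.linalg.eigh(X)
    return (P * np.maximum(lam, 0.0)) @ P.T

def matrix_abs(X):
    # sqrt(X^2) = P diag(|lambda_i|) P^T.
    lam, P = np.linalg.eigh(X)
    return (P * np.abs(lam)) @ P.T

X = rng.standard_normal((4, 4)); X = (X + X.T) / 2
print(np.allclose(matrix_abs(X), proj_psd(X) + proj_psd(-X)))   # first identity
print(np.allclose(proj_psd(X), (X + matrix_abs(X)) / 2))        # second identity
```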

Definition 4.6 (Kronecker Product) Given two matrices A ∈ IR^{m×n} and B ∈ IR^{p×q}, with A = (a_{ij}), the Kronecker product is defined as the mp × nq block matrix

A ⊗ B = [ a₁₁B  ⋯  a₁ₙB
          ⋮          ⋮
          a_{m1}B ⋯ a_{mn}B ].

Note that, for nonsingular A and B, we always have

(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}  (20)

and

vec(ABC) = (C^T ⊗ A) vec B.  (21)

Theorem 4.7 Denote √(Y²) by F(Y), Y ∈ S^n. If X ∈ S^n is nonsingular, then F′_X exists and satisfies

vec F′_X(H) = (I ⊗ √(X²) + √(X²) ⊗ I)^{-1} (I ⊗ X + X ⊗ I) vec H.  (22)

Moreover, for H → 0, H ∈ S^n, we have

F(X + H) − F(X) − F′_X(H) = O(‖H‖²).  (23)

Proof. Let

T := F(X + H) − F(X) − F′_X(H),  ∆F := F(X + H) − F(X),

where F′_X(H) denotes the candidate derivative defined through (22). We shall verify that T = O(‖H‖²), which proves both (22) and (23). Notice that from (22) and (21) we have

F(X) F′_X(H) + F′_X(H) F(X) = XH + HX,  H ∈ S^n.  (24)

From the definitions of F′_X(H) and ∆F it is clear that ‖F′_X(H)‖ = O(‖H‖) and ‖∆F‖ = O(‖H‖). Write C := √(X²) and S := √((X + H)²) = C + ∆F. Since T = ∆F − F′_X(H) = O(‖H‖),

S T + T S = C T + T C + ∆F T + T ∆F = C T + T C + O(‖H‖²).

On the other hand, expanding S T + T S with T = S − C − F′_X(H) gives

S T + T S = 2(X + H)² − [∆F C + C ∆F + 2X²] − [XH + HX] + O(‖H‖²),

where we used SC + CS = 2C² + ∆F C + C ∆F, C² = X², and, by (24), S F′_X(H) + F′_X(H) S = C F′_X(H) + F′_X(H) C + O(‖H‖²) = XH + HX + O(‖H‖²). Moreover, squaring S = C + ∆F yields

∆F C + C ∆F = (X + H)² − X² − ∆F² = XH + HX + O(‖H‖²).

Combining the last three displays, we obtain

√(X²) T + T √(X²) = O(‖H‖²),  (25)

that is,

(I ⊗ √(X²) + √(X²) ⊗ I) vec T = O(‖H‖²).

Since X is nonsingular, I ⊗ √(X²) + √(X²) ⊗ I is also nonsingular. Thus T = O(‖H‖²). The proof is complete. □
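The derivative formula (22) and the quadratic remainder (23) can be observed numerically at a randomly drawn (hence almost surely nonsingular) X; the printed ratio stays bounded as H → 0. A sketch under these assumptions (names ours):

```python
import numpy as np
rng = np.random.default_rng(4)

def matrix_abs(X):
    lam, P = np.linalg.eigh(X)
    return (P * np.abs(lam)) @ P.T    # sqrt(X^2)

n = 4
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
H0 = rng.standard_normal((n, n)); H0 = (H0 + H0.T) / 2

I = np.eye(n)
AX = matrix_abs(X)
vec = lambda M: M.flatten(order="F")
# Formula (22); the Kronecker system realizes (24): |X| L + L |X| = XH + HX.
Kinv = np.linalg.inv(np.kron(I, AX) + np.kron(AX, I))
for t in [1e-1, 1e-2, 1e-3]:
    H = t * H0
    dF = (Kinv @ (np.kron(I, X) + np.kron(X, I)) @ vec(H)).reshape(n, n, order="F")
    r = matrix_abs(X + H) - matrix_abs(X) - dF
    print(t, np.linalg.norm(r) / np.linalg.norm(H) ** 2)   # bounded ratio: (23)
```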

For any matrix X ∈ S^n, there exists P ∈ O such that

X = P [ D 0
        0 0 ] P^T,

where D is a nonsingular diagonal matrix. Let m denote the rank of X and let

K := {1, ⋯, m},  J := {1, ⋯, n}\K.

Let C := √(D²) and define the linear operator L_C by

L_C(Z) := CZ + ZC,  Z ∈ S^m,

and let L_C^{-1} be its inverse operator.

Theorem 4.8 Let F : S^n → S^n be defined by F(Y) := √(Y²), Y ∈ S^n. Then F is directionally differentiable at any X ∈ S^n and for any H ∈ S^n,

F′(X; H) = P [ L_C^{-1}[D H̃_KK + H̃_KK D]   C^{-1} D H̃_KJ
               H̃_KJ^T D C^{-1}              √(H̃_JJ²)      ] P^T,

where H̃ = P^T H P.

Proof. For any H ∈ S^n and τ ∈ [0, ∞), let

∆(τ) := F(X + τH) − F(X)

and

∆̃(τ) := P^T ∆(τ) P.

Then,

∆̃(τ) = P^T F(X + τH) P − P^T F(X) P
      = √([P^T(X + τH)P]²) − √((P^T X P)²)
      = √([X̃ + τH̃]²) − √(X̃²),

where X̃ := P^T X P. Define

C̃ := [ C 0
        0 0 ] = √(X̃²).

Then,

∆̃(τ) = √(C̃² + W̃) − C̃,

where

W̃ = τ X̃ H̃ + τ H̃ X̃ + τ² H̃².

After simple computations we have

W̃ = τ [ D H̃_KK + H̃_KK D   D H̃_KJ
         H̃_KJ^T D           0       ] + [ O(τ²)   O(τ²)
                                          O(τ²)   τ²(H̃_KJ^T H̃_KJ + H̃_JJ²) ].  (26)

By Lemma 6.2 in Tseng [14], we have

∆̃(τ)_KK = L_C^{-1}(W̃_KK) + o(‖W̃‖),  (27)

∆̃(τ)_KJ = C^{-1} W̃_KJ + o(‖W̃‖),  (28)

and

W̃_JJ = ∆̃(τ)_KJ^T ∆̃(τ)_KJ + ∆̃(τ)_JJ².  (29)

Thus,

∆̃(τ)_KJ = τ C^{-1} D H̃_KJ + o(τ).  (30)

Let E := C^{-1} D. Then

E_{ij} = 1 or −1 if j = i,  E_{ij} = 0 otherwise.

Thus, from (30),

∆̃(τ)_KJ^T ∆̃(τ)_KJ = τ² H̃_KJ^T E² H̃_KJ + o(τ²) = τ² H̃_KJ^T H̃_KJ + o(τ²).  (31)

According to (27) and (26),

∆̃(τ)_KK = τ L_C^{-1}(D H̃_KK + H̃_KK D) + o(τ).  (32)

Since

W̃_JJ = τ²[H̃_KJ^T H̃_KJ + H̃_JJ²],

from (29) and (31) we obtain

∆̃(τ)_JJ² = τ² H̃_JJ² + o(τ²).  (33)

Furthermore, since ∆̃(τ)_JJ is positive semidefinite (see the definition of ∆̃(τ)), from (33), √(∆̃(τ)_JJ²) is well-defined and

∆̃(τ)_JJ = τ √(H̃_JJ²) + o(τ).  (34)

Then, from (32), (30) and (34),

lim_{τ ↓ 0} ∆̃(τ)/τ = [ L_C^{-1}[D H̃_KK + H̃_KK D]   C^{-1} D H̃_KJ
                       H̃_KJ^T D C^{-1}              √(H̃_JJ²)      ],

which proves the theorem. □
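The block formula of Theorem 4.8 can be compared with a finite-difference quotient at a singular X; the agreement is up to the o(τ) error appearing in the proof. A sketch in which the construction of X and all names are our own:

```python
import numpy as np
rng = np.random.default_rng(5)

def matrix_abs(X):
    lam, P = np.linalg.eigh(X)
    return (P * np.abs(lam)) @ P.T

n, m = 5, 3                                       # rank(X) = m < n, so X is singular
P, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal P
d = np.array([2.0, -1.0, 0.5])                    # nonzero diagonal of D
X = (P * np.concatenate([d, np.zeros(n - m)])) @ P.T

H = rng.standard_normal((n, n)); H = (H + H.T) / 2
Ht = P.T @ H @ P                                  # H-tilde = P^T H P
c = np.abs(d)                                     # diagonal of C = sqrt(D^2)

G = np.zeros((n, n))                              # assemble the block formula
G[:m, :m] = (d[:, None] + d[None, :]) * Ht[:m, :m] / (c[:, None] + c[None, :])
G[:m, m:] = np.sign(d)[:, None] * Ht[:m, m:]      # C^{-1} D = diag(sign d_i)
G[m:, :m] = G[:m, m:].T
G[m:, m:] = matrix_abs(Ht[m:, m:])                # sqrt of the squared JJ block
dirderiv = P @ G @ P.T

tau = 1e-6
fd = (matrix_abs(X + tau * H) - matrix_abs(X)) / tau
print(np.linalg.norm(fd - dirderiv))              # small, of order tau
```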

Theorem 4.9 Let F : S^n → S^n be defined by F(Y) := √(Y²), Y ∈ S^n. Then for any singular matrix X ∈ S^n, F is not F-differentiable at X.

Proof. By Theorem 4.8, F is directionally differentiable at X. By contradiction, suppose that F is F-differentiable at X. Then F′(X; ·) = F′_X(·) is a linear operator. Let H ∈ S^n be such that

H̃ := P^T H P = [ 0 0
                 0 I ]

and let G = −H, where I is the identity matrix in IR^{r×r} and r = n − rank(X) ≥ 1. Then, by Theorem 4.8, we should have

0 = F′(X; G + H) = F′_X(G + H) = F′_X(G) + F′_X(H) = F′(X; G) + F′(X; H) = 2P [ 0 0
                                                                                0 I ] P^T,

which is a contradiction. This contradiction shows that F is not F-differentiable at X. The proof is complete. □

An interesting implication of Theorems 4.7 and 4.9 is that the Lebesgue measure of the set of singular symmetric matrices is zero. This is almost intuitively obvious and is a useful fact in our later analysis.

Corollary 4.10 Matrices in S^n are almost everywhere nonsingular.

Proof. By Theorems 4.7 and 4.9, F(Y) := √(Y²), Y ∈ S^n, is F-differentiable at X ∈ S^n if and only if X is nonsingular. Since F is Lipschitzian, F is F-differentiable almost everywhere in S^n by Theorem 2.10. Thus, matrices in S^n are almost everywhere nonsingular. □

Lemma 4.11 Let F : S^n → S^n be defined by F(Y) = √(Y²), Y ∈ S^n. Then for any nonsingular matrix H ∈ S^n, F′_H(·) exists and satisfies

F(H) − F(0) − F′_H(H) = 0.  (35)

Proof. By Theorem 4.7, for any nonsingular H ∈ S^n, F′_H(·) exists, and (24) with X := H gives

√(H²) F′_H(H) + F′_H(H) √(H²) = H·H + H·H = 2H².

Since √(H²) √(H²) + √(H²) √(H²) = 2H² as well, and I ⊗ √(H²) + √(H²) ⊗ I is nonsingular,

vec(F′_H(H)) = (I ⊗ √(H²) + √(H²) ⊗ I)^{-1} vec(2H²) = vec(√(H²)) = vec(F(H)).

That is, F′_H(H) = F(H), which proves (35) because F(0) = 0. □

The following is a main result that leads to the strong semismoothness of the absolute-value function.

Lemma 4.12 Let F : S^n → S^n be defined by F(Y) := √(Y²), Y ∈ S^n. Assume that X ∈ S^n, X ≠ 0. Then, for any H ∈ S^n such that X + H is nonsingular, F(·) is F-differentiable at X + H and

F(X + H) − F(X) − F′_{X+H}(H) = O(‖H‖²).  (36)

Proof. For any H ∈ S^n such that X + H is nonsingular, denote

∆T := F(X + H) − F(X) − F′_{X+H}(H).  (37)

Then,

√((X + H)²) ∆T = (X + H)² − √((X + H)²) √(X²) − √((X + H)²) F′_{X+H}(H)  (38)

and

∆T √((X + H)²) = (X + H)² − √(X²) √((X + H)²) − F′_{X+H}(H) √((X + H)²).  (39)

By (38), (39), and Theorem 4.7, we have

√((X + H)²) ∆T + ∆T √((X + H)²)
= 2(X + H)² − [√((X + H)²) √(X²) + √(X²) √((X + H)²)] − [√((X + H)²) F′_{X+H}(H) + F′_{X+H}(H) √((X + H)²)]
= 2(X + H)² − [√((X + H)²) √(X²) + √(X²) √((X + H)²)] − [(X + H)H + H(X + H)]
= (X + H)X + X(X + H) − [√((X + H)²) √(X²) + √(X²) √((X + H)²)].  (40)

Since X ≠ 0, there exist an orthogonal matrix Q ∈ O and a diagonal matrix Σ such that

Q^T X Q = Σ = diag(σ₁, ⋯, σ_m, 0, ⋯, 0),

where σ₁ ≥ ⋯ ≥ σ_m, σ_i ≠ 0, i = 1, ⋯, m, and m ∈ {1, ⋯, n − 1}. Let

H̃ = Q^T H Q.

Then, from (40), we have

√((Σ + H̃)²) (Q^T ∆T Q) + (Q^T ∆T Q) √((Σ + H̃)²)
= (Σ + H̃)Σ + Σ(Σ + H̃) − [√((Σ + H̃)²) √(Σ²) + √(Σ²) √((Σ + H̃)²)].  (41)

It is noted that if X + H ∈ S^n is nonsingular, then Σ + H̃ = Q^T(X + H)Q is also nonsingular, and H → 0 if and only if H̃ → 0. Thus, for any H ∈ S^n such that X + H is nonsingular, there exist an orthogonal matrix P ∈ O (depending on H) and a nonsingular diagonal matrix Λ = diag(λ₁, ⋯, λₙ) (depending on H) such that

P^T (Σ + H̃) P = Λ,  (42)

where λ_i ≠ 0, i = 1, ⋯, n, and the eigenvalues are arranged so that, since they are continuous functions of the matrix,

λ_i → σ_i, i = 1, ⋯, m,  (43)

and

λ_i → 0, i = m + 1, ⋯, n,  (44)

as H → 0. Writing |Σ| := √(Σ²), equations (41) and (42) imply

√(Λ²) ∆T̃ + ∆T̃ √(Λ²) = Λ(P^T Σ P) + (P^T Σ P)Λ − [√(Λ²)(P^T |Σ| P) + (P^T |Σ| P)√(Λ²)],  (45)

where

∆T̃ := P^T Q^T ∆T Q P.  (46)

Therefore, from (45), we have in fact obtained

(I ⊗ √(Λ²) + √(Λ²) ⊗ I) vec(∆T̃)
= (I ⊗ Λ + Λ ⊗ I) vec(P^T Σ P) − (I ⊗ √(Λ²) + √(Λ²) ⊗ I) vec(P^T |Σ| P)
= (I ⊗ Λ + Λ ⊗ I)(P^T ⊗ P^T) vec(Σ) − (I ⊗ √(Λ²) + √(Λ²) ⊗ I)(P^T ⊗ P^T) vec(|Σ|).  (47)

Since Λ is nonsingular, we obtain

vec(∆T̃) = (I ⊗ √(Λ²) + √(Λ²) ⊗ I)^{-1} (I ⊗ Λ + Λ ⊗ I)(P^T ⊗ P^T) vec(Σ) − (P^T ⊗ P^T) vec(|Σ|).  (48)

Define a vector s ∈ IR^m by

s_i = 1 if σ_i > 0 and s_i = −1 otherwise, i = 1, ⋯, m.

Let

u := (P^T ⊗ P^T) vec(Σ)  (49)

and

v := (P^T ⊗ P^T) vec(|Σ|).  (50)

In terms of the definitions of P^T ⊗ P^T and Σ, we obtain, for k = (j − 1)n, j = 1, ⋯, n, and i = 1, ⋯, n,

u_{k+i} = P_{1j}P_{1i}σ₁ + P_{2j}P_{2i}σ₂ + ⋯ + P_{mj}P_{mi}σ_m  (51)

and

v_{k+i} = P_{1j}P_{1i}σ₁s₁ + P_{2j}P_{2i}σ₂s₂ + ⋯ + P_{mj}P_{mi}σ_m s_m,  (52)

where P_{ij} is the (i, j)-th entry of P. Let

w := vec(∆T̃).  (53)

Then, from (48)-(50) and (53), for k = (j − 1)n, j = 1, ⋯, n, we have

w_{k+i} = [(λ_j + λ_i)/(|λ_j| + |λ_i|)] u_{k+i} − v_{k+i},  i = 1, ⋯, n,

which, together with (51) and (52), implies that for k = (j − 1)n, j = 1, ⋯, n, and i = 1, ⋯, n we have

w_{k+i} = [(λ_j + λ_i)/(|λ_j| + |λ_i|)][P_{1j}P_{1i}σ₁ + P_{2j}P_{2i}σ₂ + ⋯ + P_{mj}P_{mi}σ_m]
          − [P_{1j}P_{1i}σ₁s₁ + P_{2j}P_{2i}σ₂s₂ + ⋯ + P_{mj}P_{mi}σ_m s_m].  (54)

Now

∆T̃ = mat(w),  (55)

the n × n matrix whose columns are the successive length-n segments of w, and we can see that ∆T̃^T = ∆T̃. The symmetry of ∆T̃ will be used in the following analysis.

Next, we establish a relationship between P and H̃. Suppose q is a nonnegative integer such that

σ₁ ≥ ⋯ ≥ σ_q > 0 > σ_{q+1} ≥ ⋯ ≥ σ_m.

Then, from (42), we have P Λ = (Σ + H̃) P; that is, for i = 1, ⋯, n,

λ_i P_{·i} = (Σ + H̃) P_{·i}, or componentwise, λ_i P_{li} = σ_l P_{li} + (H̃ P_{·i})_l for l = 1, ⋯, m and λ_i P_{li} = (H̃ P_{·i})_l for l = m + 1, ⋯, n.  (56)

Since P is an orthogonal matrix, for i, j = 1, ⋯, n,

‖P_{·i}‖ = 1,  P_{·i}^T P_{·j} = 0, j ≠ i,  (57)

where, for any k = 1, ⋯, n, P_{·k} denotes the k-th column of P. Thus, in terms of (43), (44), (56) and (57), we have

(P_{(q+1)i}, ⋯, P_{ni})^T = O(‖H̃‖), i = 1, ⋯, q,  (58)

(P_{1i}, ⋯, P_{qi}, P_{(m+1)i}, ⋯, P_{ni})^T = O(‖H̃‖), i = q + 1, ⋯, m,  (59)

and

(P_{1i}, ⋯, P_{mi})^T = O(‖H̃‖), i = m + 1, ⋯, n.  (60)

By (60) and (56), we have

λ_i (P_{1i}, ⋯, P_{ni})^T = O(‖H̃‖), i = m + 1, ⋯, n,

which, together with (57), implies λ_i² = O(‖H̃‖²), i.e.,

λ_i = O(‖H̃‖), i = m + 1, ⋯, n.  (61)

Then, for i = 1, ⋯, m, (56) and (58)-(60) give

λ_i P_{li} = σ_l P_{li} + H̃_{l1}P_{1i} + ⋯ + H̃_{lm}P_{mi} + O(‖H̃‖²), l = 1, ⋯, m,
λ_i P_{li} = O(‖H̃‖), l = m + 1, ⋯, n.  (62)

In the following we will prove that for k = (j − 1)n, j = 1, ⋯, n, and i = 1, ⋯, n the equation

w_{k+i} = O(‖H̃‖²)  (63)

holds. We analyze this in eight different cases.

Case 1: k = (j − 1)n, j = 1, ⋯, q, and i = 1, ⋯, q. Then λ_j, λ_i > 0 for H small, so (λ_j + λ_i)/(|λ_j| + |λ_i|) = 1, and by (54) and (58)-(60),

w_{k+i} = 2P_{(q+1)j}P_{(q+1)i}σ_{q+1} + ⋯ + 2P_{mj}P_{mi}σ_m = O(‖H̃‖²).

So (63) holds for this case.

Case 2: k = (j − 1)n, j = 1, ⋯, q, and i = q + 1, ⋯, m. Then, again by (54),

w_{k+i} = [(λ_j + λ_i)/(|λ_j| + |λ_i|)][P_{1j}P_{1i}σ₁ + ⋯ + P_{qj}P_{qi}σ_q + P_{(q+1)j}P_{(q+1)i}σ_{q+1} + ⋯ + P_{mj}P_{mi}σ_m]
          − [P_{1j}P_{1i}σ₁ + ⋯ + P_{qj}P_{qi}σ_q − P_{(q+1)j}P_{(q+1)i}σ_{q+1} − ⋯ − P_{mj}P_{mi}σ_m].  (64)

Now we assume

σ₁ ≥ ⋯ ≥ σ_{j−s−1} > σ_{j−s} = ⋯ = σ_j = ⋯ = σ_{j+t} > σ_{j+t+1} ≥ ⋯ ≥ σ_q > 0

and

0 > σ_{q+1} ≥ ⋯ ≥ σ_{i−a−1} > σ_{i−a} = ⋯ = σ_i = ⋯ = σ_{i+b} > σ_{i+b+1} ≥ ⋯ ≥ σ_m

for some nonnegative integers s, t, a and b. Then, by (43) and (62),

(P_{1j}, ⋯, P_{(j−s−1)j}, P_{(j+t+1)j}, ⋯, P_{mj})^T = O(‖H̃‖)  (65)

and

(P_{1i}, ⋯, P_{(i−a−1)i}, P_{(i+b+1)i}, ⋯, P_{mi})^T = O(‖H̃‖).  (66)

Then, by (58)-(60) and (64)-(66), we have

w_{k+i} = [(λ_j + λ_i)/(|λ_j| + |λ_i|)][P_{(j−s)j}P_{(j−s)i}σ_j + ⋯ + P_{(j+t)j}P_{(j+t)i}σ_j + P_{(i−a)j}P_{(i−a)i}σ_i + ⋯ + P_{(i+b)j}P_{(i+b)i}σ_i + O(‖H̃‖²)]
          − [P_{(j−s)j}P_{(j−s)i}σ_j + ⋯ + P_{(j+t)j}P_{(j+t)i}σ_j − P_{(i−a)j}P_{(i−a)i}σ_i − ⋯ − P_{(i+b)j}P_{(i+b)i}σ_i + O(‖H̃‖²)].  (67)

Let

η₁ := P_{(j−s)j}P_{(j−s)i} + ⋯ + P_{jj}P_{ji} + ⋯ + P_{(j+t)j}P_{(j+t)i}  (68)

and

η₂ := P_{(i−a)j}P_{(i−a)i} + ⋯ + P_{ij}P_{ii} + ⋯ + P_{(i+b)j}P_{(i+b)i}.  (69)

Then, by (43), (67) becomes

w_{k+i} = [(λ_j + λ_i)/(|λ_j| + |λ_i|)][σ_jη₁ + σ_iη₂ + O(‖H̃‖²)] − [σ_jη₁ − σ_iη₂ + O(‖H̃‖²)]
        = [2/(λ_j − λ_i)][λ_iσ_jη₁ + λ_jσ_iη₂] + O(‖H̃‖²),  (70)

where we used the fact that when H → 0, λ_j → σ_j > 0 and λ_i → σ_i < 0, so that |λ_j| + |λ_i| = λ_j − λ_i. According to (62) and the fact that σ_{j−s} = ⋯ = σ_{j+t} = σ_j and σ_{i−a} = ⋯ = σ_{i+b} = σ_i, we have

λ_j (P_{(j−s)j}, ⋯, P_{(j+t)j})^T = σ_j (P_{(j−s)j}, ⋯, P_{(j+t)j})^T + O(‖H̃‖)  (71)

and

λ_i (P_{(i−a)i}, ⋯, P_{(i+b)i})^T = σ_i (P_{(i−a)i}, ⋯, P_{(i+b)i})^T + O(‖H̃‖).  (72)

Equations (68), (69), (58)-(60), (65), (66) and (57) imply

(P_{(j−s)j})² + ⋯ + (P_{(j+t)j})² = 1 + O(‖H̃‖²),  (P_{(i−a)i})² + ⋯ + (P_{(i+b)i})² = 1 + O(‖H̃‖²)  (73)

and

η₁ = O(‖H̃‖),  η₂ = O(‖H̃‖),  η₁ + η₂ = O(‖H̃‖²).  (74)

Thus, from (71)-(73), we obtain

λ_i − σ_i = O(‖H̃‖),  λ_j − σ_j = O(‖H̃‖).  (75)

Therefore, by (70), (74) and (75), we obtain

w_{k+i} = [2/(σ_j − σ_i + O(‖H̃‖))][σ_iσ_j(η₁ + η₂) + O(‖H̃‖)η₁ + O(‖H̃‖)η₂] + O(‖H̃‖²)
        = [2/(σ_j − σ_i + O(‖H̃‖))][σ_iσ_j O(‖H̃‖²) + O(‖H̃‖²)] + O(‖H̃‖²)
        = O(‖H̃‖²),

which proves (63).

Case 3: k = (j − 1)n, j = 1, ⋯, q, and i = m + 1, ⋯, n. Then, by (54),

w_{k+i} = [(λ_i − |λ_i|)/(λ_j + |λ_i|)][P_{1j}P_{1i}σ₁ + ⋯ + P_{qj}P_{qi}σ_q]
          + [(2λ_j + λ_i + |λ_i|)/(λ_j + |λ_i|)][P_{(q+1)j}P_{(q+1)i}σ_{q+1} + ⋯ + P_{mj}P_{mi}σ_m],

which, together with (58)-(60), implies

w_{k+i} = [(λ_i − |λ_i|)/(λ_j + |λ_i|)][P_{1j}P_{1i}σ₁ + ⋯ + P_{qj}P_{qi}σ_q] + O(‖H̃‖²).  (76)

By equations (60) and (61), we have

λ_i [P_{1j}P_{1i}σ₁ + ⋯ + P_{qj}P_{qi}σ_q] = O(‖H̃‖²),

and since |λ_i − |λ_i|| ≤ 2|λ_i| and λ_j + |λ_i| → σ_j > 0, this together with (76) proves (63).

Case 4: k = (j − 1)n, j = q + 1, ⋯, m, and i = 1, ⋯, q. Similar to the discussion in Case 2, we can prove (63). We omit the details here.

Case 5: k = (j − 1)n, j = q + 1, ⋯, m, and i = q + 1, ⋯, m. Similar to the discussion in Case 1, we can prove (63). We omit the details here.

Case 6: k = (j − 1)n, j = q + 1, ⋯, m, and i = m + 1, ⋯, n. Similar to the discussion in Case 3, we can prove (63). Again, we omit the details here.

Case 7: k = (j − 1)n, j = m + 1, ⋯, n, and i = 1, ⋯, m. Due to the symmetry of ∆T̃ we can prove (63) according to Cases 1-6.

Case 8: k = (j − 1)n, j = m + 1, ⋯, n, and i = m + 1, ⋯, n. Then, by (58)-(60) and (54), we have

w_{k+i} = [(λ_j + λ_i)/(|λ_j| + |λ_i|)] O(‖H̃‖²) − O(‖H̃‖²) = O(‖H̃‖²),

which proves (63).

Overall, we have proved that (63) holds for k = (j − 1)n, j = 1, ⋯, n, and i = 1, ⋯, n. Since H̃ = Q^T H Q and the Frobenius norm is orthogonally invariant, ‖H̃‖ = ‖H‖, and we have in fact proved

‖∆T̃‖ = O(‖H‖²).  (77)

By (46), (53), and (21),

w = vec(∆T̃) = vec(P^T Q^T ∆T Q P) = (P^T Q^T ⊗ P^T Q^T) vec(∆T) = (P^T ⊗ P^T)(Q^T ⊗ Q^T) vec(∆T),

which in turn implies

vec(∆T) = (Q ⊗ Q)(P ⊗ P) vec(∆T̃).

This and (77) prove vec(∆T) = O(‖H‖²). So (36) is proved. □

Theorem 4.13 Let F : S^n → S^n be defined by F(Y) = √(Y²), Y ∈ S^n. Then F is strongly semismooth at any X ∈ S^n.

Proof. By Theorem 4.8, F is directionally differentiable at X. On the other hand, in terms of Theorem 4.7 and Lemmas 4.11 and 4.12,

F(X + H) − F(X) − F′_{X+H}(H) = O(‖H‖²)  ∀ X + H ∈ D_F, H → 0.

Thus, according to Theorem 3.6, F is semismooth at X ∈ S^n. By Theorem 3.6, Lemmas 4.11 and 4.12, and the semismoothness of F, F is strongly semismooth at any X ∈ S^n. □

Corollary 4.14 Let F : S(n₁, ⋯, n_m) → S(n₁, ⋯, n_m) be defined by F(Y) = √(Y²), Y ∈ S(n₁, ⋯, n_m). Then F is strongly semismooth at any X ∈ S(n₁, ⋯, n_m).

Corollary 4.15 The semidefinite-projection function is strongly semismooth on S(n₁, ⋯, n_m).

Corollary 4.16 The projective residual function defined on S(n₁, ⋯, n_m) is strongly semismooth at X ∈ S(n₁, ⋯, n_m) if both F and G are strongly semismooth at X ∈ S(n₁, ⋯, n_m).

Proof. This is a direct implication of Theorem 3.9 and Corollaries 4.14 and 4.15. □
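Corollary 4.15 can be probed numerically through characterization (b) of Theorem 3.6: writing [Y]_+ = (Y + √(Y²))/2 (Proposition 4.5) and computing the F-derivative at the nonsingular point X + H via (24), the ratio ‖[X + H]_+ − [X]_+ − derivative‖/‖H‖² remains bounded even at a singular X. A sketch (the names and the test construction are ours):

```python
import numpy as np
rng = np.random.default_rng(6)

def proj_psd(X):
    lam, P = np.linalg.eigh(X)
    return (P * np.maximum(lam, 0.0)) @ P.T

def dproj(Y, H):
    # F-derivative of [.]_+ at nonsingular Y: [Y]_+ = (Y + sqrt(Y^2))/2, and by
    # (24) the derivative L of sqrt(.^2) solves |Y| L + L |Y| = YH + HY,
    # solved here entrywise in the eigenbasis of Y.
    lam, U = np.linalg.eigh(Y)
    a = np.abs(lam)
    B = U.T @ (Y @ H + H @ Y) @ U
    L = U @ (B / (a[:, None] + a[None, :])) @ U.T
    return (H + L) / 2

n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = (Q * np.array([1.5, -0.7, 0.0, 0.0])) @ Q.T       # singular X
H0 = rng.standard_normal((n, n)); H0 = (H0 + H0.T) / 2

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    H = t * H0                                        # X + H nonsingular w.p. 1
    r = proj_psd(X + H) - proj_psd(X) - dproj(X + H, H)
    print(t, np.linalg.norm(r) / np.linalg.norm(H) ** 2)   # stays bounded
```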

5 Final Remarks

We have discussed some important differential properties of matrix valued functions. In particular, we have proved that the semidefinite-projection function, the absolute-value function, and the projective residual function are strongly semismooth on S(n₁, ⋯, n_m). Using these properties to design semismooth Newton methods and smoothing Newton methods for matrix valued equations and the SDCP is a logical step in future research.

References

[1] J. Burke and S. Xu, "The global linear convergence of a non-interior path-following algorithm for linear complementarity problem", Mathematics of Operations Research, 23 (1998) 719–734.
[2] X. Chen and P. Tseng, "Non-interior continuation methods for solving semidefinite complementarity problems", Technical Report, Department of Mathematics, University of Washington, 1999.
[3] F. Clarke, Optimization and Nonsmooth Analysis, John Wiley and Sons, New York, 1983.
[4] F. Facchinei and C. Kanzow, "A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems", Mathematical Programming, 76 (1997) 493–512.
[5] A. Fischer, "Solution of monotone complementarity problems with locally Lipschitzian functions", Mathematical Programming, 76 (1997) 513–532.
[6] J.R. Magnus, Linear Structures, Oxford University Press, New York, 1988.
[7] J.R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, New York, 1986.
[8] R. Mifflin, "Semismooth and semiconvex functions in constrained optimization", SIAM Journal on Control and Optimization, 15 (1977) 957–972.
[9] J.S. Pang and L. Qi, "Nonsmooth equations: motivation and algorithms", SIAM Journal on Optimization, 3 (1993) 443–465.
[10] L. Qi and D. Sun, "Nonsmooth and smoothing methods for NCP and VI", to appear in the Encyclopedia of Optimization, C. Floudas and P. Pardalos (editors), Kluwer Academic Publishers, 1999.
[11] L. Qi and J. Sun, "A nonsmooth version of Newton's method", Mathematical Programming, 58 (1993) 353–367.
[12] R.T. Rockafellar, Convex Analysis, Princeton University Press, New Jersey, 1970.
[13] A. Shapiro, "On concepts of directional differentiability", Journal of Optimization Theory and Applications, 66 (1990) 477–487.
[14] P. Tseng, "Merit functions for semidefinite complementarity problems", Mathematical Programming, 83 (1998) 159–185.
[15] E.H. Zarantonello, "Projections on convex sets in Hilbert space and spectral theory", in Contributions to Nonlinear Functional Analysis, E.H. Zarantonello (ed.), Academic Press, New York (1971) 237–424.