A sequential semismooth Newton method for the nearest low-rank ...

1 downloads 0 Views 374KB Size Report
Key words. low-rank correlation matrix, quadratic semidefinite programming, semismooth. Newton method, quadratic convergence, constraint nondegeneracy.
A Sequential Semismooth Newton Method for the Nearest Low-Rank Correlation Matrix Problem Qingna Li∗ and Houduo Qi† September 16, 2009

Abstract Based on the well known result that the sum of largest eigenvalues of a symmetric matrix can be represented as a semidefinite programming problem (SDP), we formulate the nearest low-rank correlation matrix problem as a nonconvex SDP and propose a numerical method that solves a sequence of least-square problems. Each of the least-square problems can be solved by a specifically designed semismooth Newton method, which is shown to be quadratically convergent. The sequential method is guaranteed to produce a stationary point of the nonconvex SDP. Our numerical results demonstrate the high efficiency of the proposed method on large scale problems.

Keywords: Low-rank correlation matrix, quadratic semidefinite programming, semismooth Newton method, quadratic convergence, constraint nondegeneracy.

1

Introduction

In this paper, we introduce a fast algorithm for the problem of computing the nearest correlation matrix under an arbitrary rank constraint: minX∈S n s.t.

1 2 kX

− Ck2 diag(X) = e, X  0 rank(X) ≤ r,

(1)

where C ∈ S n is given, S n is the space of n × n symmetric matrices endowed with the standard n denotes the cone trace inner product; k · k is the induced norm (i.e., the Frobenius norm); S+ n n of all positive semidefinite matrices in S ; X  0 means X ∈ S+ ; diag(X) is the vector formed by the diagonal elements of X; e is the vector of all ones in IRn ; and rank(X) is the rank of X and r ≤ n is a given integer. n , and the second The first constraint in (1) defines the set of all correlation matrices in S+ constraint stipulates that only those correlation matrices that have rank less than r are interested. On top of those conditions, we seek the nearest candidate to a given matrix C. Unless r = n (so that the last constraint becomes superfluous), problem (1) is a nonconvex problem. ∗ College of Mathematics and Econometrics, Hunan University, Changsha, 410082, China. The work of this author was done while visiting the University of Southampton and was financially supported by China Scholarship Council (CSC). E-mail: [email protected]. † School of Mathematics, University of Southampton, Highfield, Southampton SO17 1BJ, UK. This author’s research was partially supported by EPSRC Grant EP/D502535/1. E-mail: [email protected].

1

Problem (1) has important applications in finance, where the input matrix C is often a known correlation matrix, but with rank larger than r. Interested reader may refer to Brigo and Mercurio [7], Rebonato [31, 32], Grubisic and Pietersz [11], Pieterz and Groenen [24], Wu [38], Zhang and Wu [39] (to just name a few) for concrete examples and a few proposed numerical algorithms, some of which will be briefly discussed shortly. In fact, the literature review section [11, Sec. 2] refers to 17 papers/books addressing the problem. A widely adopted approach is to tackle the rank constraint rank(X) ≤ r and the positive semidefinite constraint X  0 together by using the Gramian representation: X = RRT ,

(2)

where R ∈ IRn×r . This results in (1) being a standard nonconvex programming problem with quadratic constraints (i.e., diag(RRT ) = e). Geometric optimization methods (including Newton and gradient types) in [11] as well as the majorization method [24] are designed to solve this nonconvex programming problem. They each led to a computational package: LRCMmin package from the geometric approach and Major from the majorization approach. Both packages are in Matlab and are publicly available. It is these two packages that we are going to compare with both in terms of cpu time and the objective function values obtained. An interesting feature of the two packages is that they are able to verify whether a local minimum (of the nonconvex programming problem) is a global minimum by making use of a result obtained in [39, Thm. 4.4, Thm. 4.5] and [38] for the Lagrangian method. It is emphasized that in order to conduct such a nice verification one has to meet the following two conditions: (i) C  0, and (ii) the rth eigenvalue of C + Diag(y ∗ ) is simple, where y ∗ is the vector of Lagrangian multipliers (corresponding to the equality constraint in (1)) and Diag(y ∗ ) is the diagonal matrix formed by vector y ∗ . For detailed comparison between the Lagrangian method of Zhang and Wu [39] and Major, see [11]. The method of parameterization takes a further step to parameterize R in (2) satisfying diag(X) = e by trigonometric functions through spherical coordinates. See Rebnato [30], Brigo [6], and Rapisarda et al. [29] for more on this method. We once again refer to [11, Sec. 2, Sec. 7] for a comprehensive survey on methods available for solving the low rank problem (1). It is noted that all numerical results in existing literature are only for problems of size up to n = 80. Our method is capable of dealing with problems of n being very large (i.e., n ≥ 1000). The method proposed here is strongly motivated by the semismooth Newton method of Qi and Sun [26] for the nearest correlation matrix problem [13]: minX∈S n s.t.

1 2 kX

− Ck2 diag(X) = e, X  0.

(3)

The resulting Matlab package CorNewton.m (http://www.math.nus.edu.sg/∼matsundf/ with eig.m replaced by mexeig.m) has been widely used by many banks and insurance companies. Further development on the method can be found in Borsdorf and Higham [3], which leads to the NAG (Numerical Analysis Group) Fortran implementation. We also refer to Malick [18], Boyd and Xiao [4], and Toh [36] for other efficient methods for (3). It is obvious that numerical methods for (3) cannot be used for (1) because of the rank constraint: rank(X) ≤ r. However, the idea of the semismooth Newton method plays a very important role in our approach because each subproblem is going to be solved by it. Our approach is based on the following three fundamental facts: 2

(i) For any correlation matrix X ∈ S n , trace(X) = n.

(4)

(ii) For any X ∈ S n admitting the following spectral decomposition X = P Diag(λ1 , . . . , λn )P T ,

(5)

where P P T = I and λ1 ≥ . . . ≥ λn in nonincreasing order are eigenvalues of X, we have X  0, rank(X) ≤ r ⇐⇒

r X

λi = trace(X).

(6)

i=1

Pr We note that Pr i=1 λi is the sum of the first r largest eigenvalues of X. This brings out the third fact that i=1 λi can be represented as a semidefinite programming problem (see Overton and Womersley [21, 22] and Hiriart-Urruty and Ye [14]). (iii) The sum of the first r largest eigenvalues of X can be calculated as follows: Pr hX, U i i=1 λi = maxU ∈S n s.t. hI, U i = r 0  U  I,

(7)

and a solution is U = P1 P1T ,

(8)

where P1 is the submatrix of P consisting of the first r columns of P . Combination of the three facts (4), (6), and (7) yields the following result, whose proof is postponed to Section 2. b ∈ S n solves the nearest low-rank correlation matrix problem (1) if and only Theorem 1.1 X b ∈ S n such that (X, b U b ) solves the following nonlinear semidefinite if there exists a matrix U programming problem, denoted by NSDPr : min(X,U )∈S n ×S n s.t. (NSDPr )

1 2 kX

− Ck2 diag(X) = e, X  0 hX, U i = n hI, U i = r 0  U  I.

(9)

Moreover, the optimal objective values of (1) and (9) are equal. NSDPr (9) is a continuous optimization reformulation of the low-rank problem (1). Due to the bilinear term hX, U i = n, it is a nonconvex problem. The main purpose of this paper is to design an efficient algorithm that finds a stationary point of NSDPr (9). We first note the complexity of (9). It has two variables X and U , each is of the dimension n(n + 1)/2. Moreover, it involves three positive semidefinite cones of the same size: X  0, U  0, and Z  0 with Z := I − U . On top of those, it has the nonconvex constraint hX, U i = n. It seems unwise to solve this 3

problem directly. In fact, our initial attempt to use an augmented Lagrangian method to solve (9) proved unsuccessful. In order to avoid dealing with those three positive semidefinite cones simultaneously, we recall a key fact that has been established in the proof of Theorem 1.1. Let X ∈ S n have the spectral decomposition (5) and the matrix U is defined by the equation (8). The key fact is that X is a feasible point of the low-rank problem (1) if and only if (X, U ) is a feasible point of NSDPr (9). Our algorithm makes use of this fact iteratively. Suppose X k is the current iterate and U k is obtained by the equation (8), where the matrix P is from the spectral decomposition of (5) with X = X k . Then, U k automatically satisfies the following constraints hI, U k i = r and I  U k  0. (10)

The idea is to fix U k and attempt to solve the following problem, which is obtained from (9) by removing the constraints (10): minX∈S n s.t.

1 2 kX

− Ck2 diag(X) = e, X  0 hX, U k i = n.

(11)

We then update X k by the optimal solution of (11) and update U k accordingly by the equation (8). The algorithm continues until it converges. Comparing to (9), problem (11) is a standard quadratic semidefinite programming extensively studied by Toh [36] and it only involves one positive semidefinite cone X  0. However, a potential problem for (11) is that it may have no feasible solutions. That is, the first constraint may contradict the second constraint in (11) due to the way how U k is defined. To avoid the infeasibility issue, we propose to solve its augmented Lagrangian problem, which selectively handles the constraint hX, U k i = n, while keeping other constraints unchanged. minX∈S n s.t.

1 2 kX

   2 − Ck2 − µ hX, U k i − n + 2c hX, U k i − n

diag(X) = e, X  0,

(12)

where µ ∈ IR is the Lagrangian multiplier corresponding to the constraint hX, U k i = n and c > 0 is the penalty parameter. The augmented Lagrangian problem is always feasible and always has a unique optimal solution because it is strongly convex in X. We update X k by the optimal solution of (12) and update U k accordingly. The Lagrangian multiplier µ and the penalty parameter c are updated according to some rules. The main task of our algorithm is to solve the subproblem (12) efficiently per iteration. We will develop a semismooth Newton method, which is motivated by the method in [26], to solve (12). Consequently, our algorithm makes use of the semismooth Newton method iteratively. We hence name it as Sequential Semismooth Newton Method. The high efficiency of the resulting method, as demonstrated in the numerical part (Section 5), heavily depends on the good performance of the semismooth Newton method, which is not directly developed for (12). It is actually developed as follows. First, (12) is reformulated as a least-square problem. We then consider the Lagrangian dual of this least-square problem. The dual problem is unconstrained and convex. However, the objective function is only once continuously differentiable. Therefore, it is not possible to develop a classical Newton method for the dual problem. Nevertheless, we are able to develop a semismooth version of Newton’s method, which is also quadratically convergent. Moreover, we can recover the solution of (12) 4

from the solution of the dual problem. The dual approach just outlined is classical [33] and is proved very successful with the nearest correlation matrix problem (3) in [18, 4, 26, 3]. The paper is organized as follows. In Section 2, we characterize stationary points of the nonconvex problem (9) and include a formal proof of Thm. 1.1. We describe our algorithm in Section 3 and show that every accumulation point is stationary (see Thm. 3.2). Section 4 details our semismooth Newton method, including calculation of the generalized Hessian of the dual function and the convergence analysis. A key property that ensures the quadratic convergence (Thm. 4.6) is constraint nondegeneracy of the least-square problem (see Prop. 4.4 and Prop. 4.5). We conduct extensive numerical tests and comparison with Major.m and LCRMmin.m on a large set of problems in Section 5, where we also deal with some practical issues in our implementation. We conclude the paper in Section 6. Notation: We use ◦ to denote the Hadamard product of matrices; i.e., for any B, C ∈ S n , B ◦ C = [Bij Cij ]ni,j=1 . We let E denote the matrix of all ones in S n . For subsets α, β of {1, 2, . . . , n}, we denote Bαβ as the submatrix of B indexed by α and β, and Bα as the submatrix of B consisting the columns in B indexed by α. We also denote by |α| the cardinality of the set α. Let e denote the vector of all ones. We sometimes use 0n to denote the zero vector in IRn . For a matrix X ∈ S n , diag(X) is the vector in IRn formed by the diagonal elements of X; while Diag(y) is the diagonal matrix formed by the vector y ∈ IRn . “:=” means “define”.

2

Preliminaries

This section includes three independent parts. The first part is to study characterization of stationary points of NSDPr (9). The second part is to introduce the concept of constraint nondegeneracy, which will play an important role in the quadratic convergence analysis of the semismooth Newton method in Section 4. The last part contains a formal proof of Thm. 1.1.

2.1

Stationary Points of (9)

We copy (9) below with the corresponding Lagrangian multipliers to the constraints: min(X,U )∈S n ×S n s.t. (NSDPr )

1 2 kX

− Ck2 diag(X) = e hX, U i = n hI, U i = r I −U 0 X  0, U  0.

(y) (µ) (ν) (R)

(13)

where y ∈ IRn , µ, ν ∈ IR, and R ∈ S n are each respectively the Lagrangian multiplier of the corresponding constraint. After certain elementary linear algebra, the KKT condition of NSDPr can be stated as follows  1 1 2 2 2 T  2 kX − Ck + 2 (kXk − kCk ) − e y − µn = 0   diag(X) = e       hX, U i = n hI, U i = r (KKT) (14)   U  0, −µX − νI + R  0, hU, Ri = µn + νr      I − U  0, R  0, hI − U,Ri = 0   X  0, X − C + Diag(y) − µU  0. 5

Definition 2.1 A point (X, U ) ∈ S n × S n is said to be a stationary point of NSDPr (9) if there exist Lagrangian multipliers (¯ y, µ ¯, ν¯, R) such that (X, U ; y¯, µ ¯, ν¯, R) satisfies the KKT condition (14). We have the following result. Proposition 2.2 Suppose (X, U ; y¯, µ ¯, ν¯, R) satisfies the KKT condition (14) and r < n. We must have µ ¯≥0 and ν¯ ≤ 0. Proof. We note from (14) that −¯ µX − ν¯I + R  0

and

I − U  0.

The product of the above two matrices gives 0 ≤ hI − U , −¯ µX − ν¯I + Ri  = −¯ µ hI, Xi − hU , Xi − ν¯hI − U , Ii + hI − U , Ri ¯ , Ii. = −¯ ν hI − U

(15)

Since r < n, I 6= U because of the condition hI, U i = r in (14). This implies hI − U , Ii > 0 because I  U in (14). Therefore, we must have from (15) ν¯ ≤ 0. It also follows from the KKT condition (14) that 0 ≤ hU , Ri = µ ¯n + ν¯r, which implies µ ¯ ≥ 0 because we just proved ν¯ ≤ 0.

2.2



Constraint Nondegeneracy

Consider the linearly constrained system: A(x) = b, x ∈ K,

(16)

where A : IRn 7→ IRm is a linear operator, b ∈ IRm , and K ⊂ IRn is a closed convex set. For x ∈ K, let TK (x) denote the tangent cone of K at x and lin(TK (x)) be the largest linear space contained in TK (x). We say that constraint nondegeneracy holds at a feasible point x of (16) if   A lin(TK (x)) = IRm . (17) The definition (17) is equivalent to [2, Eq. 4.172], which is generally defined for nonlinearly constrained systems. When applied to the constraint set that appears in the standard SDP: 

n X ∈ S+ | hAi , Xi = bi , i = 1, . . . , m , 6

where Ai ∈ S n and bi ∈ IR for i = 1, . . . , m, the definition of constraint nondegeneracy is equivalent to the primal nondegenracy in [1, Definition 5]. In the case of the correlation matrix constraints:  n | diag(X) = e , (18) X ∈ S+

(17) is equivalent to:

hDiag(y), Zi = 0, ∀ Z ∈ lin(TS+n (X))

=⇒

y = 0.

(19)

The following result appears in [37, Prop. 4.2] proved for some special points and is extended to any correlation matrix in [27, Prop. 2.1]. n } is Lemma 2.3 Any point X satisfying the correlation constraints {diag(X) = e, X ∈ S+ constraint nondegenerate. That is, the implication (19) always holds at any correlation matrix X.

The lemma will be used in Section 4, where we will deal with a slightly more complicated set than the correlation matrix set (18). We now finish this section by a formal proof of Thm. 1.1.

2.3

Proof of Theorem 1.1

Proof. Let F1 denote the feasible region of (1) and F2 the X-component projection of the feasible region of (9):   diag(X) = e, X  0   . F2 := X ∈ S n | there exists U ∈ S n s.t. hX, U i = n, hI, U i = r   0U I

We prove F1 = F2 . Suppose X ∈ F1 and X := X has the spectral decomposition (5). Because rank(X) ≤ r, we have λi = 0 for i ≥ (r + 1). Then U = P1 P1T solves (7) by (8), and by (4)-(7) we have hX, U i =

r X

λi = trace(X) = n.

i=1

Therefore, (X, U ) is a feasible point of (9), implying X ∈ F2 . Conversely, let us assume X ∈ F2 . Then, there exists a matrix U ∈ S n such that (X, U ) is a feasible point of (9). Once again, let X := X have the spectral decomposition (5). Then the optimal value of (7) is not bigger than the trace of X, i.e., r X i=1

λi ≤ trace(X) = n,

(20)

where the inequality holds because X  0. We also note that (X, U ) achieves its maximal value n: r X (21) λi ≥ hX, U i = n. i=1

The fact X  0, together with (20) and (21), implies λi = 0 for i ≥ (r+1). That is, rank(X) ≤ r,  implying X ∈ F1 . The claim of the theorem follows easily from F1 = F2 . 7

3

The Algorithm and Its Convergence

As mentioned in Introduction, the main computational burden of our algorithm is on the quadratic semidefinite programming subproblem (12), which we regard as the inner problem of our overall algorithm. In this section, we describe our algorithm mainly focusing on the outer iterations and analyze its convergence to a stationary point. The solution condition required for the inner problem will be met by our semismooth Newton method in Section 4.

3.1

Description of the Algorithm

For given U ∈ S n and µ, c ∈ IR, define

2   c 1 hX, U i − n . L(U,µ,c) (X) := kX − Ck2 − µ hX, U i − n + 2 2

Then, the main subproblem that our algorithm aims to solve in each iteration takes the following form: minX∈S n L(U,µ,c) (X) (22) s.t. diag(X) = e (y) X  0, (S)

where y ∈ IRn and S ∈ S n in brackets are respectively the Lagrangian multipliers of the corresponding constraints. The KKT condition of (22) is    Rp := diag(X)  −e=0    Rd := X − C + Diag(y) + µU + c hX, U i − n U − S = 0 (23)   X  0, S  0, hX, Si = 0.

It is well known (see [9]) that the complementarity condition (i.e., the last condition) in (23) is equivalent to Rc := X − [X − S]+ = 0, n . An approximate solution to where [X]+ denotes the orthogonal projection of X ∈ S n onto S+ the subproblem (22) is often measured by

kRp k + kRd k + kRc k ≤ ǫ for some small tolerance ǫ ≥ 0. Algorithm 3.1 (S.0) Let {ǫk } ↓ 0. Choose initial point X 0 ∈ S n ; Lagrangian multiplier µ0 ; penalty parameter c0 > 0, ρ > 1, and τ > 0. Conduct the spectral decomposition (5) on X := X 0 . Let P 0 := P and let P10 denote the submatrix consisting of the first r columns of matrix P 0 . Let k := 0. (S.1) Let U k := P1k (P1k )T . 8

(S.2) Find a triple (X k+1 , y k+1 , S k+1 ) of the following subproblem minX∈S n s.t.

L(U k ,µk ,ck ) (X) diag(X) = e X  0,

(24)

such that kRp k + kRd k + kRc k ≤ ǫk . (S.3) Conduct the spectral decomposition (5) with X := X k+1 . Let P k+1 := P and let P1k+1 denote the submatrix consisting of the first r columns of matrix P k+1 . (S.4) Update Lagrangian multiplier by n   o µk+1 := max 0, µk − hX k+1 , U k i − n ck . Update the penalty parameter ck by  ck+1 := max ρck , |µk+1 |1+τ . (S.5) Set k := k + 1 and go to (S.1). We have a few comments on Algorithm 3.1. (R1) We did not include any stopping criterion in Alg. 3.1 because of the following two reasons. First, according to the KKT condition (14) of (9), we need availability of the Lagrangian multipliers (¯ y, µ ¯, ν¯, R) in order to verify whether (X, U ) is a stationary point. Some (not all) of the multipliers may come from solving the subproblem (24) depending on which method is being used. Therefore, it would be difficult at this stage to give a meaningful stopping criterion. Second, the situation will become clear and simple when it comes to our implementation. A simple criterion is proposed in (68) and studied in Prop. 5.1. (R2) In the standard augmented Lagrangian method (see, e.g., [20]), the Lagrangian multiplier µ should be updated by the formula µk+1 := µk − (hX k+1 , U k i − n)ck , rather than taking its nonnegative part as adopted in (S.4). It is because we have proved in Prop. 2.2 that the correct µ must be nonnegative. As a matter of fact, if subproblem (24) is solved to high accuracy, we always have (hX k+1 , U k i − n) ≤ 0 because X k+1 is always a correlation matrix. Therefore, {µk } is nondecreasing. If µ0 = 0, the formula for (S.4) reduces to µk+1 := µk − (hX k+1 , U k i − n)ck . That the penalty parameter in the updating formula in (S.4) always grows faster than the Lagrangian multiplier is recently introduced by Lu and Zhang [16] to deal with the feasibility issue at the limit in their augmented Lagrangian method. (R3) We allow freedom to use any suitable algorithm to solve the subproblem (24) as long as the algorithm is able to produce a triple (X k+1 , y k+1 , S k+1 ) that satisfies the required accuracy. (y k+1 , S k+1 ) may not be required to be in an explicit form. In fact, Alg. 3.1 only needs availability of X k+1 . However, the information about (y k+1 , S k+1 ) is important to our convergence analysis below. 9

3.2

Convergence to a Stationary Point

We first note the fact that kU k k = r for all k = 0, . . .. In other words, {U k } is uniformly bounded. We now state our stationary property on a converging subsequence generated by the algorithm. Theorem 3.2 Let {X k } be the sequence generated by Alg. 3.1. Suppose that a subsequence {X k+1 , y k+1 , S k+1 }k∈K converges to {X, y¯, S} and {U k }k∈K converges to U . Then, the following statements hold. (i) hX, U i = n. (ii) {µk+1 }k∈K is bounded. (iii) (X, U ) is a stationary point of (9). Proof. (i) By (S.3) in Alg. 3.1, {X k+1 , y k+1 , S k+1 } satisfies the following conditions     kX k+1 − C + Diag(y k+1 ) + µk U k + ck hX k+1 , U k i − n U k − S k+1 k ≤ ǫk kdiag(X k+1 ) − ek ≤ ǫk

kX

k+1

− [X

k+1

−S

k+1

]+ k ≤ ǫk .

(25) (26) (27)

Since {ǫk } ↓ 0, we follow from (26) and (27) that diag(X) = e

and

X − [X − S]+ = 0.

(28)

By the formula for ck in (S.4) of Alg. 3.1, we have ck → ∞. Dividing (25) by ck and taking limits on both sides yield |hX, U i − n| = 0, (29) where we used the fact that ck outgrows µk by at least a factor |µk |τ .

(ii) To prove the boundedness of {µk+1 }k∈K , we may assume, without loss of any generality, that µk+1 = µk − ck (hX k+1 , U k i − n) ≥ 0, ∀ k ∈ K. Otherwise, µk+1 ≡ 0 for any k ∈ K and hence {µk+1 }k∈K is bounded. It again follows from (25) that kX k+1 − (C + Diag(y k+1 )) − µk+1 U k − S k+1 k ≤ ǫk . (30) Assume that {µk+1 }k∈K is unbounded. Dividing the both sides of (30) by µk+1 and taking limits yield kU k = 0, contradicting kU k = r. This establishes (i). (iii) Since {µk+1 }k∈K is bounded, we may assume without loss of generality that {µk+1 }k∈K → µ ¯. We note that µ ¯ ≥ 0. Taking limits on both sides of (30) gives X − (C + Diag(¯ y) + µ ¯U ) − S = 0. 10

(31)

Recall P k+1 is the matrix from the spectral decomposition (5) with X := X k+1 and P1k+1 is the submatrix consisting of the first r columns of P k+1 . Subsequencing if necessary, we may assume {P k+1 }k∈K → P . Let P 1 be the submatrix consisting of the first r columns of P . We must have ¯1, . . . , λ ¯ n )P T lim X k+1 = X = P Diag(λ

k∈K

and

T

U = P 1P 1 ,

¯ i , i = 1, . . . , n are eigenvalues of X. Due to the continuity of eigenvalues, we must have where λ ¯1 ≥ λ ¯2 ≥ . . . ≥ λ ¯ n . If follows from (28) and (29) that X is a correlation matrix, rank(X) = r, λ and S0 and hX, Si = 0. (32) Hence, ¯ i = 0, i = r + 1, . . . , n. λ Define the following scalars ¯ i , i = 1, . . . , r, ν¯ := 0, τi := µ ¯λ and form the matrix

T

R := P Diag(τ1 , . . . , τr , 0, . . . , 0)P . It is a matter of elementary linear algebra to check that R  0 (because µ ¯ ≥ 0) and hI − U , Ri = 0, −¯ µX − ν¯I + R  0, hU , Ri = µ ¯n + ν¯r.

(33)

Moreover, it follows from (31) and (32) that hX, Si = 0 implies 1 1 ¯n = 0. kX − Ck2 + (kXk − kCk2 ) − eT y¯ − µ 2 2

(34)

Putting the relationships (28), (29), and (31)-(34) together is sufficient to claim that (X, U ), with (¯ y, µ ¯, ν¯, R) satisfies the KKT condition (14). Hence, (X, U ) is a stationary point. 

4

Solving the Subproblem by a Semismooth Newton Method

The efficiency of Alg. 3.1 depends on whether the subproblem (24) can be efficiently solved. Being a standard quadratic semidefinite programming problem (QSDP), (24) could be solved by any QSDP solver such as the one proposed by Toh [36]. However, our numerical results showed that using general QSDP solver may result in a quite slow Alg 3.1. In this section, we develop a more efficient method called semismooth Newton method to solve (24). It all starts from reformulating it as a least-square problem.

4.1

Least-Square Reformulation and Its Dual Problem

We introduce a new variable z of one dimension by z := hX, U i − n. 11

Then, the subproblem (22) can be equivalently stated as the least-square problem min(X,z)∈S n ×IR 21 kX − Ck2 − µz + 2c z 2 s.t. diag(X) = e (y) hX, U i − z = n (s) X  0,

(35)

where y ∈ IRn and s ∈ IR in the brackets denote the Lagrangian multipliers of the corresponding constraints. The well-known Slater condition is obviously satisfied for (35). There is no duality gap (see Rockafellar [33]) between (35) and its dual problem: max

(y,s)∈IRn ×IR

min L(X, z; y, s),

(36)

X0

z∈IR

where the Lagrangian function L(X, z; y, s) is defined by L(X, z; y, s) := =

1 c kX − Ck2 − µz + z 2 − y T (diag(X) − e) − s(hX, U i − z − n) 2 2 1 1 kX − (C + Diag(y) + sU )k2 − kC + Diag(y) + sU k2 2 2 1 1 µ − s 2 c − (µ − s)2 + y T e + sn + kCk2 . + z− 2 c 2c 2

By inspecting the quadratic terms involving X and z in L(X, z; y, s), we see that the inner n × IR. It has a unique problem of (36) is an orthogonal projection problem onto the cone S+ solution given by µ−s X := [C + Diag(y) + sU ]+ and z := . (37) c It is well known that for Z ∈ S n , Z = [Z]+ − [−Z]+ , ∀ Z ∈ S n .

(38)

Substituting X and z in (37) into (36) and using the decomposition property (38), we obtain max

(y,s)∈IRn ×IR

= =−

max

(y,s)∈IRn ×IR

min

(y,s)∈IRn ×IR

1 1 k[C + Diag(y) + sU ]+ − (C + Diag(y) + sU )k2 − kC + Diag(y) + sU k2 2 2 1 1 − (µ − s)2 + y T e + sn + kCk2 2c 2 1 1 1 2 − k[C + Diag(y) + sU ]+ k − (µ − s)2 + y T e + sn + kCk2 2 2c 2 1 θ(y, s) − kCk2 , 2

where

1 1 θ(y, s) := k[C + Diag(y) + sU ]+ k2 + (µ − s)2 − y T e − sn. 2 2c To simplify the notation below, we define a linear operator and its adjoint. For a given matrix U ∈ S n , we define A : S n 7→ IRn+1 by   diag(X) A(X) := , ∀ X ∈ S n. hU, Xi 12

Then, its adjoint operator A∗ : IRn+1 7→ S n is given by A∗ (y, s) = Diag(y) + sU,

∀ (y, s) ∈ IRn × IR.

We often use Ak to indicate the dependence of A on U k , which is produced in Alg. 3.1. We also let b ∈ IRn+1 be defined by (note that e is the vector of all ones in IRn ) b :=



e n



.

Then, the function θ takes the following form: 1 1 θ(y, s) = k[C + A∗ (y, s)]+ k2 + (µ − s)2 − bT (y T , s)T . 2 2c It follows from the general duality theory of Rockafellar [33] that we have the following nice relationship between (35) and its dual problem (36). Proposition 4.1 Consider the optimization problem min

(y,s)∈IRn ×IR

θ(y, s).

(39)

The following statements hold. (i) θ(·, ·) is convex and continuously differentiable everywhere. Moreover, 



∇θ(y, s) = A [C + A (y, s)]+



1 − c



0n µ−s



− b.

(ii) θ(·, ·) is coercive, i.e., θ(y, s) → +∞ as k(y, s)k → +∞. (iii) (¯ y , s¯) ∈ IRn × IR is an optimal solution of (39) if and only if it satisfies the following equation: ∇θ(y, s) = 0. (40) Furthermore, (X, z¯) given by X := [C + A∗ (¯ y , s¯)]+

and

1 z¯ := (µ − s¯) c

(41)

is the optimal solution of (35). We have the following two remarks. (R1) The coerciveness of θ comes from two facts. One: the Slater condition holds for (35); and Two: the linear equations in (35) are linearly independent (since U 6= I). 13

(R2) The dual approach outlined above is classical and has been successfully applied to the nearest correlation matrix problem (3) (i.e., without the rank constraint) in [18, 4, 26, 3]. The minimization problem (39) is often regarded as the dual problem of (35). Instead of solving the subproblem (24) at iteration X k of Alg. 3.1 (S.2), we try to solve its dual problem: min θk (y, s), (42) (y,s)∈IRn ×IR

where

1 1 θk (y, s) := k[C + A∗k (y, s)]+ k2 + (µk − s)2 − bT (y T , s)T , 2 2ck

(43)

and A∗k (y, s) := Diag(y) + sU k . Suppose (¯ y , s¯) is an approximate solution of (42), the next k+1 iterate X is taken to be X in (41). We can also construct (y k+1 , S k+1 ) from X k+1 to meet the required accuracy in (S.2) of Alg. 3.1 (see Section 5.1 for detailed argument). Below we are going to develop a fast algorithm to solve (42).

4.2

Semismooth Newton Method

We note that the solution X given by (41) is also the optimal solution of (22). It follows from Prop. 4.1 that it is sufficient to find a solution (¯ y , s¯) of the optimality equation (40). The equation is obviously nondifferentiable because of the projection operator [·]+ . Therefore, the classical Newton’s method is not applicable. However, it is a strongly semismooth equation [17, 28]. To see the strong semismoothness of n ∇θ(·, ·), let us indicate the dependence of the operator [X]+ on the positive semidefinite cone S+ n by denoting ΠS+n (X) := [X]+ for any X ∈ S . It is known that ΠS+n (·) is strongly semismooth due to [35]. ∇θ(·, ·) is just a composite function of linear operators and ΠS+n (·), it is therefore strongly semismooth. To simplify our notation below, we denote x := (y, s) ∈ IRn × IR. Therefore, iterate xk has a natural representation xk = (y k , sk ) ∈ IRn × IR. Since ΠS+n (·) is Lipschitz continuous with modulus 1, the function ∇θ(·, ·) is Lipschitz continuous on IRn × IR. According to Rademacher’s Theorem, ∇θ(·, ·) is almost everywhere Fr´echet-differentiable in IRn ×IR. The generalized Hessian of θ(·, ·) at x = (y, s) ∈ IRn × IR is defined as ∂ 2 θ(x) := ∂(∇θ)(y, s), where ∂(∇θ)(y, s) is the Clarke generalized Jacobian of ∇θ(·, ·) at (y, s) [8]. It is usually difficult to determine ∂ 2 θ(x) exactly. We define the following alternative 1 ∂ˆ2 θ(x) := A∂ΠS+n (C + A∗ (y, s))A∗ + Diag(0, . . . , 0, 1), c

(44)

where ∂ΠS+n (C + A∗ (y, s)) is the Clarke generalized Jacobian of ΠS+n (·) at C + A∗ (y, s). Note that the second (constant) term in (44) is from the linear part of ∇θ(x). From [8, p.75], it holds for any d ∈ IRn+1 , ∂ 2 θ(x)d ⊆ ∂ˆ2 θ(x)d, (45) which means that if every element in ∂ˆ2 θ(x) is positive (semi)definite, so is every element in ∂ 2 θ(x). 14

Having defined the second-order object ∂ˆ2 θ(x), we are now able to develop a semismooth Newton method for the equation (40). Suppose xk = (y k , sk ) ∈ IRn × IR is the current iterate, the standard semismooth Newton method [17, 28] takes the following form: Vk ∈ ∂ˆ2 θ(xk ).

xk+1 = xk − Vk−1 ∇θ(xk ),

(46)

In order for the method (46) to be well-defined, we need to address the following questions: (Q1) How to calculate an element V ∈ ∂ˆ2 θ(x) for a given point x ∈ IRn+1 ? (Q2) Is the term Vk−1 ∇θ(xk ) well-defined and how to solve it? (Q3) What are the convergence properties of the method? We will address these questions one by one. Similar questions have also been addressed in [26, 40] for the methods developed therein. We first address (Q1). For a given x = (y, s) ∈ IRn × IR, we conduct the spectral decomposition of the symmetric matrix X := C + A∗ (x) such that X = C + A∗ (x) = P Λx P T ,

(47)

where P P T = I and Λx is the diagonal matrix with diagonal entries consisting of the eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn of X being arranged in nonincreasing order. Define three index sets α := {i | λi > 0} ,

β := {i | λi = 0} ,

γ := {i | λi < 0} .

Define the operator Wx0 : S n 7→ S n by

Wx0 (H) := P (Ω ◦ (P T HP ))P T ,

where Ω :=



Eαα ωαα¯ ωαTα¯ 0



,

with ωij :=

∀ H ∈ S n,

(48)

λi , i ∈ α, j ∈ α ¯, λi − λj

(49)

α ¯ := {1, . . . , n} \ α, and Eαα ∈ S |α| is the matrix of ones. It follows [23, Lemma 11] that Wx0 ∈ ∂ΠS+n (C + A∗ (x)).

Define Vx0 : IRn+1 7→ S n by Vx0 h := AWx0 (A∗ (h)) +

1 c



0n hn+1



,

h ∈ IRn+1 .

(50)

We know from (44) that Vx0 ∈ ∂ˆ2 θ(x). This answered (Q1). Wx0 is just one element of ∂ΠS+n (C + A∗ (x)). Actually, we are able to characterize the whole set to some extent. We put this characterization into a lemma for later use. Lemma 4.2 [34, Prop. 2.2] Suppose C + A∗ (x) has the spectral decomposition (47) and Wx ∈ ∂ΠS+n (C + A∗ (x)). Then there must exist an M ∈ ∂ΠS |β| (0) such that +



e αα H  eT Wx (H) = P  H αβ e T ◦ ΩT H αγ αγ

e := P T HP . where H

 e αβ e αγ H Ωαγ ◦ H  T e ββ ) M (H 0 P , 0 0

15

∀ H ∈ S n,

(51)

We note that Wx0 is just a special element of (51) with M chosen to be 0. Now we answer (Q2), which is equivalent to solving the type of the following linear equation: Vx0 (∆x) = −∇θ(x).

(52)

Noticing ωij ≥ 0 for i ∈ α, j ∈ α ¯ in (49), it is easy to see that Wx0 is positive semidefinite, and 0 so is Vx . In the view of solving the optimality equation ∇θ(x) = 0, we propose to solve the following regularized linear equation:   Vx0 + tI ∆x = −∇θ(x), (53) where t := min{κ1 , κ2 k∇θ(x)k} and κ1 , κ2 > 0. Hence, (53) is a positive definite linear equation as long as ∇θ(x) 6= 0. In order to tackle large scale problems, we follow the ideas of [26, 40] to use a conjugate gradient (CG) method to solve (53). The scheme of using a CG method to solve (53) answers (Q2). The choice of t in (53) does not impair the fast convergence property of Newton’s method. We will deal with the convergence (Q3) in the next subsection. Below is a globalized version of Newton’s method for (39). Algorithm 4.3 (Semismooth Newton Method) (S.0) Given x0 ∈ IRn+1 , η ∈ (0, 1), σ ∈ (0, 1), κ1 ∈ (0, 1), κ2 ∈ (1, ∞), κ3 ∈ (1, ∞), and δ ∈ (0, 1). Let j := 0. (S.1) Select an element Vj ∈ ∂ˆ2 θ(xj ), compute tj := min{κ1 , κ2 k∇θ(xj )k}, and apply the CG method [12] starting with the zero vector as the initial search direction to (Vj + tj I)∆x = −∇θ(X j )

(54)

to find a search direction ∆xj such that k∇θ(xj ) + (Vj + tj I)∆xj k ≤ ηj k∇θ(xj )k ,

(55)

where ηj := min{η, κ3 k∇θ(xj )k}. (S.2) Let lj be the smallest nonnegative integer l such that

θ(xj + δ l (∆xj )) − θ(xj ) ≤ σ δ l ∇θ(xj ), ∆xj . Set τj := δ lj and xj+1 := xj + τj ∆xj .

(S.3) Replace j by j + 1 and go to (S.1).

4.3

Quadratic Convergence

Note that since for each j ≥ 0, Vj + tj I is positive definite, one can always use the CG method to find ∆xj such that (55) is satisfied. Furthermore, since the CG method is applied with the zero vector as the initial search direction, it is not difficult to see that ∆xj is always a descent direction for θ(·) at xj . In fact, it holds that

1 1 k∇θ(xj )k2 ≤ −∇θ(xj ), ∆xj ≤ k∇θ(xj )k2 , λmax (Vj + tj I) λmin (Vj + tj I) 16

(56)

where for any matrix A ∈ S n , λmin (A) and λmax (A) represent the smallest and largest eigenvalues of A, respectively. For a proof on (56), see [40]. Therefore, Algorithm 4.3 is well defined as long as ∇θ(xj ) 6= 0 and its convergence analysis can be conducted in a similar way as in [26, Thm. 5.3]. In order for our proof to follow [26, Thm. 5.3], we need the following key property of (35) at its solution. Proposition 4.4 Let (X, z¯) denote the optimal solution of (35) and let X := X have the spectral decomposition (47). Then, the following statements hold. (i) Constraint nondegeneracy holds at (X, z¯), i.e.,   A lin(TS+n (X)) × IR = IRn+1 , where A(X, z) := A(X) −



0n z



(57)

∀ (X, z) ∈ S n × IR.

,

(ii) (57) holds if and only if the following implication holds. e αα = 0, H

e αβ = 0, H

e αγ = 0, H

hn+1 = 0

=⇒

h = 0,

e := P T (A∗ (h))P for any h ∈ IRn+1 and A∗ is the adjoint of A. where H

Proof. (i) First we note that the definition of constraint nondegeneracy (17), when applied to the constraints in (35), is equivalent to (57). We prove (57) must hold. It is easy to see that (57) holds if and only if o⊥ n  = {0n+1 } , A lin(TS+n (X)) × IR

(58)

  where the left-hand side of (58) denotes the subspace that is orthogonal to A lin(TS+n (X))×IR . Then, (58) holds if and only if the following implication holds h(y, s), A(Z, t)i = 0,

∀ (Z, t) ∈ lin(TS+n (X)) × IR

=⇒

y = 0, s = 0.

(59)

The left-hand side of (59) has the following expression ∗

0 = hA (y, s), (Z, t)i

= hDiag(y) + sU, Zi − st,

where we used the relation ∗

A (y, s) =



A∗ (y, s) −s



(60)

.

Let Z = 0 ∈ lin(TS+n (X) in (59), we see from (60) that st = 0 for all t ∈ IR. This implies s = 0. Then, the left-hand side of (59) reduces to hDiag(y), Zi = 0,

∀ Z ∈ linTS+n (X). 17

(61)

Since X is a correlation matrix, it follows from Lemma 2.3 that X is constraint nondegenerate. Putting (61) and (19) together, we get y = 0. This proves (y, s) = 0. That is, the implication (59) is satisfied and hence (X, z¯) is constraint nondegenerate. (ii) We simply recall the formula for lin(TS+n (X)) [1]:  lin(TS+n (X)) = H ∈ S n | [Pβ , Pγ ]T [Pβ , Pγ ] = 0 .

By making use of the above formula, it is easy to see that (59) is equivalent to the implications in (ii).  Proposition 4.5 Let (X, z¯) be the optimal solution of (35) and x ¯ := (¯ y , s¯) be an optimal solu2 ˆ tion of (39). Then, every element V ∈ ∂ θ(¯ x) is positive definite. Proof. It follows from (41) that X = ΠS+n (C + A∗ (¯ x)). Let V ∈ ∂ˆ2 θ(¯ x) be arbitrarily n+1 chosen. We want to show that for any 0 6= h ∈ IR , hT V h > 0. Let X := X have the spectral decomposition (47). From (50) and (51), there exists an M ∈ ∂ΠS |β| (0) such that Wx¯ has representation (51) and +

1 V h = AWx¯ (A h) + c ∗



0n hn+1



.

e := P T (A∗ h)P . Then, we have We let H

hh, V hi = hA∗ h, Wx¯ (A∗ h)i + h2n+1 /c e P T (Wx¯ (A∗ h))P i + h2n+1 /c = hH, X X X e ij2 + 2 e ij2 + 2 e ij2 + hH e ββ , M (H e ββ )i + h2n+1 /c. = H H ωij H i,j∈α

i∈α

j∈β

(62)

i∈α

j∈γ

We note that ωij ∈ (0, 1) for all i ∈ α and j ∈ γ. We let κ := min ωij > 0. i∈α

j∈γ

We also note that M ∈ ∂ΠS |β| (0) is always positive semidefinite (see [19, Prop. 1]). Hence, + 2 e ββ , M (H e ββ )i ≥ 0. It follows from (62) that hh, V hi ≥ κ P Pn H e2 hH i∈α j=1 ij + hn+1 /c. Therefore, e αα = 0, H e αβ = 0, H e αγ = 0, and hn+1 = 0. From Lemma 4.4, this can only hh, V hi = 0 only if H happen when h = 0. Hence, V is positive definite.  One can actually strengthen the above result by stating that the positive definiteness of every element V in ∂ˆ2 θ(¯ x) is equivalent to the constraint nondegeneracy (57). We refer to [40, Prop. 3.2] for detailed argument on a similar problem. One important consequence of Prop. 4.5 is that the optimality equation (40) has a unique solution. Another important consequence is that 18

the whole sequence generated by Alg. 4.3 converges to this unique solution. The convergence analysis can be conducted in a similar way as in [26, Thm. 5.3] or [40, Thm. 3.5]. In order to follow the proofs in the two theorems, one needs to use the coerciveness of θ (Prop. 4.1) and the positive definiteness of V ∈ ∂ˆ2 θ(¯ x) (Prop. 4.5). We state the convergence result in the next theorem without giving a detailed proof. Theorem 4.6 Suppose that in Alg. 4.3, ∇θ(xj ) 6= 0 for all j ≥ 0. Then Alg. 4.3 is well defined and the generated iteration sequence {xj } converges to the unique solution x ¯ of problem (39) quadratically.

5

Numerical Results

Our overall algorithm runs as follows. We apply Alg. 3.1 to the problem (9) with each subproblem (24) being solved by applying the semismooth Newton method Alg. 4.3 to its dual problem (42) for an output x ¯. The next iterate X k+1 is taken to be X in (41). We may regard Alg. 3.1 as our outer algorithm and Alg. 4.3 as the inner algorithm. In this overall setting, we have to address whether the inner algorithm is able to provide a good enough approximate (X k+1 , y k+1 , S k+1 ) as requested in (S.2) of Alg. 3.1. This is related to the stopping criteria used in both the outer and inner algorithms.

5.1

Stopping Criteria

Since θ is convex and the optimization problem (39) is unconstrained, an obvious stopping criterion in Alg. 4.3 is k∇θ(¯ x)k ≤ tolin , (63) where tolin > 0 is a given tolerance and x ¯ is the final iterate produced by Alg. 4.3. The output (X, z¯) is given in (41). For brevity, we omit the superscript k of the iteration used by the outer algorithm. For the convergence property (Thm. 3.2) of Alg. 3.1 to be valid, we have to verify that we can find an approximate (X, y, S) from (X, z¯) as requested in (S.2) of Alg. 3.1. Denote ǫ := tolin . Then, (63) is equivalent to (recall x ¯ := (¯ y , s¯)) kdiag(X) − ek ≤ ǫ

and

|hU, Xi − n − (µ − s¯)/c| ≤ ǫ.

(64)

Let S := X − [C + Diag(¯ y ) + s¯U ].

(65)

It follows from the definition X in (41) and the decomposition property (38), we must have S0

and

hX, Si = 0.

(66)

It also follows from (65) that y ) + µU ] + c(hU, Xi − n)U − Sk kX − [C + Diag(¯

= kX − [C + Diag(¯ y ) + s¯U ] + c(hU, Xi − n + ((¯ s − µ)/c)U ) − Sk

s − µ)/c|kU k ≤ crǫ. = c|hU, Xi − n + (¯ 19

(67)

The inequality used (64) and kU k = r. Putting (64), (66), and (67) together, we see that (X, y¯, S) is an approximate solution of (22) satisfying kRp k + kRd k + kRc k = O(ǫ). Therefore, at each iteration, Alg. 4.3 provides an approximate solution to (24) as requested in (S.2) of Alg. 3.1. Now we come to address the outer stopping criterion, which is to test whether the current iterate (X k , U k ) satisfies |

r X i=1

λki − n| ≤ tolrank

and

|hX k , U k−1 i − n| ≤ toleig

(68)

where λki , i = 1, . . . , r are the first r largest eigenvalues of X k , tolrank > 0 is a given tolerance that control the first r leading eigenvalues, toleig > 0 is a given tolerance that control the first r leading eigenvectors. If the following measure on the closedness between U k and U k−1 is satisfied, |hX k , U k − U k−1 i| ≤ toleig − tolrank , the second inequality in (68) is satisfied once the first one is satisfied. We choose the initial iterate X 0 to be the nearest correlation matrixPfor (3) and set U −1 := U 0 (recall k in U k is the iteration index). According to Thm. 3.2 (i), | ri=1 λki − n| → 0 at least on a subsequence of {X k }k∈K . Moreover, we have the following property. Proposition 5.1 Suppose tolin = tolrank = toleig = 0. If (X k , U k ) satisfies (68), then (X k , U k ) is a stationary point of (9). Proof. If k = 0, the stopping criterion (68) means rank(X 0 ) = r as X 0 is the nearest correlation matrix. Therefore, X 0 is the global solution of the low rank problem (1) and (X 0 , U 0 ) is a global solution of (9). It must be a stationary point. Now suppose k ≥ 1 and suppose x ¯ := (¯ y , s¯) is the final iterate produced by Alg. 4.3 applied to the dual problem (39) of the subproblem (35) at iteration (k − 1) (i.e., U := U k−1 , µ := µk−1 , and c := ck−1 ). Then, X k = X = [C + Diag(¯ y ) + s¯U k−1 ]+ , and U k is formed as in (S.1) of Alg. 3.1. Consequently, hX k , U k i =

r X

λki .

i=1

Since tolin = 0, we have ∇θ(¯ x) = 0. Let S be defined as in (65). We then must have

(69) together with the fact 0, we have

  diag(X) − e = 0 hU k−1 , Xi − n − (µk−1 − s¯)/c = 0  X  0, S  0, hX, Si = 0.

(69)

hX, U k i = hX, U k−1 i = n.

(70)

Pr

k i=1 λi

= n means that X is a correlation matrix. Because toleig =

20

It follows from the second equation in (69) that µk−1 = s¯. Denote µ ¯ := µk−1 and U := U k . It follows from the definition of S in (65) that X − (C + Diag(¯ y) + µ ¯U ) + c(hX, U i − n)U − S

= X − (C + Diag(¯ y) + µ ¯U ) − S

= 0.

(71)

Putting together of (69), (70), and (71) and repeating the proof (iii) of Thm. 3.2, it is easy to  see that (X, U ) = (X k , U k ) is a stationary point.

5.2

Diagonal Preconditioner for CG

Having settled the stopping criteria, we come to address the issue of improving the efficiency of the CG method by calculating the diagonal preconditioner of the matrix used in (S.1) of Alg. 4.3. Recall that, at iteration X k , Alg. 4.3 is applied to the dual problem (42). At xj , Vj in (54) is taken to be Vx0 defined by (50). That is,   1 0n 0 0 ∗ Vx (h) = Ak (Wx (Ak (h))) + , h ∈ IRn+1 , ck hn+1 where x := xj . We note that A∗k (h) = Diag(h1 , . . . , hn )

whenever hn+1 = 0.

Therefore, the first n diagonal elements (d1 , . . . , dn ) of Vx0 can be calculated in the same way as in the Newton method for the nearest correlation matrix problem (3). The total computational cost in this part is no more than (2n3 + 3n2 ) flops. We now calculate the last diagonal element dn+1 . Let en+1 be the standard (n + 1)th unit vector in IRn+1 , then dn+1 = hen+1 , Ak (Wx0 (A∗k (en+1 )))i + 1/ck

= hU k , P (Ω ◦ (P T U k P ))P T i + 1/ck

= hP T U k P, Ω ◦ (P T U k P )i + 1/ck .

We also note that U k = P1k (P1k )T , where P1k ∈ IRn×r . The total cost of calculating dn+1 is no more than (4n2 r + 3n2 ) flops. In our implementation, we used thus calculated diagonal preconditioner in the CG method.

5.3

Modified PCA as Final Output

We address the output solution Cr by our method. Suppose X f is the final iterate upon satisfying the stopping criterion (68), which means the sum of the r leading eigenvalues of X f is very close to n and the remaining (n − r) eigenvalues of X f are very small. However, they are not zero. Hence, X f is not a true low-rank matrix in the mathematical sense. We apply the modified PCA (principal component analysis) to X f to output Cr . Let X f have the spectral decomposition (47) with X := X f . Let Λr := Diag(λ1 , . . . , λr ). Define {Xpca }i· :=

zi kzi k

with

zi := {P1 Λ1/2 r }i· , i = 1, . . . , n, 21

(72)

where {Y }i· denotes the ith row of a matrix Y . Then, the output Cr is given by Cr := (Xpca )(Xpca )T .

(73)

The modified PCA (72) and (73) is due to Flury [10] and a description of it in finance related articles can be found in [15, 24]. Cr is guaranteed to be a rank r correlation matrix. If λr = 0, we only consider eigenvalues up to λr−1 . Then, the output would be Cr−1 . The PCA modification may increase the distance from C. The next result, in the spirit of [3, Lemma 3.2], provides a bound on the increase. Proposition 5.2 Let X f be the final iterate satisfying the outer stopping criterion (68). Then,  tol  kCr − Ck ≤ kX f − Ck + n tol + , 1 − tol

where tol := tolin + tolrank .

Proof. Let X f have the spectral decomposition (47) with X := X f . We note that X f = ΠS+n (C + A∗ (¯ x))

n X

and

ℓ=r+1

λℓ ≤ tolrank ,

(74)

for some x ¯ satisfying k∇θ(¯ x)k ≤ tolin . By using the formulation of ∇θ(·) in Prop. 4.1 (i), we have diag(X f ) = {∇θ(¯ x)}1:n + e, where {∇θ(¯ x)}1:n is the vector of the first n components of ∇θ(¯ x). Therefore, the diagonal elements of X f have the following bound: 1 − tolin ≤ (X f )ii ≤ 1 + tolin ,

i = 1, . . . , n.

(75)

Through the spectral decomposition (47), we have Xiif =

n X

Piℓ2 λℓ ,

i = 1, . . . , n.

ℓ=1

Let ¯i := {P Λ1/2 }i· , i = 1, . . . , n. z

Then, Xiif

2

2

= k¯ zi k = kzi k +

n X

Piℓ2 λℓ

ℓ=r+1

2

≤ kzi k +

n X

λℓ ,

ℓ=r+1

which implies by (75) and (74) kzi k2 ≥ 1 − (tolin + tolrank ) = 1 − tol.

(76)

Furthermore, Xijf

− (Cr )ij

= =

n X

r

X 1 Piℓ Pjℓ λℓ − Piℓ Pjℓ λℓ kzi kkzj k

ℓ=1 n X

ℓ=1

Piℓ Pjℓ λℓ +

r X ℓ=1

ℓ=r+1

22

Piℓ Pjℓ λℓ



1−

 1 . kzi kkzj k

(77)

We note that by Cauchy-Schwarz inequality |

r X ℓ=1

Piℓ Pjℓ λℓ | ≤

r X

Piℓ2 λℓ

r 1/2  X

2 Pjℓ λℓ

ℓ=1

ℓ=1

and |

n X

ℓ=r+1

1/2

Piℓ Pjℓ λℓ | ≤

≤ (1 + tolin )1/2 (1 + tolin )1/2 = 1 + tolin ,

n X

ℓ=r+1

λℓ ≤ tolrank .

We therefore have from (77) and (76) |Xijf − (Cr )ij | ≤ tolrank + (1 + tolin )(1/(kzi kkzj k) − 1)   1 ≤ tol 1 + . 1 − tol We then have 

tol  kCr − Ck ≤ kX − Ck + kX − Cr k ≤ kX − Ck + n tol + . 1 − tol f

f

f

 We note that in our implementation, tol = 2 × 10−8 . Therefore, the increase by using Cr is very small.

5.4

Test Problems

As mentioned before, low-rank problems of the type (1) tested in existing literature are only up to n = 80. Therefore, we collect 6 problems that have been tested before in order to conduct comparison with existing methods, mainly against Major and LCRMmin. Our algorithm is capable of solving very large scale problems (n ≥ 1000). We hence collect 6 other randomly generated problems with dimensions up to n = 2000. (E1) C is from [5] with n = 11. It is a forward correlation matrix for GBP and is given by                  

1 0.8415 0.6246 0.6231 0.533 0.4287 0.3274 0.4463 0.2439 0.3326 0.2625

0.8415 1 0.7903 0.7844 0.732 0.6346 0.4521 0.5812 0.3439 0.4533 0.3661

0.6246 0.7903 1 0.9967 0.8108 0.7239 0.5429 0.6121 0.4426 0.5189 0.4251

0.6231 0.7844 0.9967 1 0.8149 0.7286 0.5384 0.6169 0.4464 0.5233 0.4299

0.533 0.732 0.8108 0.8149 1 0.9756 0.5676 0.686 0.4969 0.5734 0.4771

0.4287 0.6346 0.7239 0.7286 0.9756 1 0.5457 0.6583 0.4921 0.551 0.4581

0.3274 0.4521 0.5429 0.5384 0.5676 0.5457 1 0.5942 0.6078 0.6751 0.6017

0.4463 0.5812 0.6121 0.6169 0.686 0.6583 0.5942 1 0.4845 0.6452 0.5673

0.2439 0.3439 0.4426 0.4464 0.4969 0.4921 0.6078 0.4845 1 0.6015 0.5200

0.3326 0.4533 0.5189 0.5233 0.5734 0.551 0.6751 0.6452 0.6015 1 0.9889

0.2625 0.3661 0.4251 0.4299 0.4771 0.4581 0.6017 0.5673 0.5200 0.9889 1



        .        

(E2) [6] Cij is the correlation between forward rates i and j and is given by Cij = 0.5 + (1 − 0.5) exp(−0.05|i − j|). (E3) [6] It is the extreme case of E2 and is generated by Cij = exp(−|i − j|). 23

estimate standard error

γ1 0.000 -

γ2 0.480 0.099

γ3 1.511 0.289

γ4 0.186 0.127

Table 1: Statistics of γi , i = 1, . . . , 4 (E4) [39] Cij is also the correlation between forward rates i and j which has the following form Cij = LongCorr + (1 − LongCorr) exp(κ|ti − tj |). κ = d1 − d2 max(ti , tj ), LongCorr = 0.3, d1 = −0.12, d2 = 0.005, and ti = 2i . (E5) [31, Sec. 9.3] C has the same form as in E4 with the following parameters: LongCorr = 0.6, κ = −0.1, ti = i. (E6) [24] The random ’interest rate’ correlation matrix C is given by Cij = exp(−γ1 |ti − tj | −

p √ γ2 |ti − tj | − γ4 | ti − tj |); γ 3 max(ti , tj )

with γi > 0, i = 1, . . . , 4, ti denoting the expiry time of rate i and ti = i. C is generated by randomizing the γ parameters, with means and standard errors given by Table 1 with γ1 , γ2 , γ4 capped at zero. (E7) [13] C = E +αR, where E is generated by Matlab’s gallery(’randcorr’,n), R is a random n × n symmetric matrix with Rij ∈ [−1, 1], i, j = 1, 2, . . . , n. α = 0.1. (E8) [18] C is a random n × n symmetric matrix with Cij ∈ [−1, 1], Cii = 1, i, j = 1, 2, . . . , n. That is, G = rand(n); G=(G+G’)/2; G=1-2*G; G = G - diag(diag(G)) +eye(n). (E9) [40] C = E + 0.05R, where E and R are generated by: d= 10.ˆ(4*[-1:1/(n-1):0]), E = gallery(’randcorr’,n*d/sum(d)), R = 2*rand(n) - 1, R = triu(R) + triu(R,1)’. (E10) C is generated by C = 2.0*rand(n,n)-ones(n,n); C = triu(C) + triu(C,1)’; for i=1:n; C(i,i) =1; end. (E11) [27] C = (1 − α)E + αR, where E and R are generated by: d= 10.ˆ(4*[-1:1/(n-1):0]), E = gallery(’randcorr’,n*d/sum(d)), R = 2*rand(n) - 1, R = triu(R) + triu(R,1)’, R = (R+R’)/2. α ∈ (0, 1). We choose α = 0.1. (E12) The matrix C is the 387 × 387 1-day correlation matrix (as of Oct 10, 2008) from the lagged datasets of RiskMetrics (www.riskmetrics.com/stddownload edu.html).

5.5

Numerical Tests

Alg. 3.1 and Alg. 4.3 were coded in Matlab and were run on a desktop (Pentium(R) D, CPU 3.00Ghz, and RAM 2.00GB). The parameters in Alg. 3.1 were set to c0 = 10, µ0 = 1, ρ = 1.4, tolrank = 10−8 , toleig = 5.0 × 10−5 24

and X 0 was taken to be the nearest correlation matrix of (3) solved by the Newton method (CorNewton.m) [26]. The parameters used in Alg. 4.3 were set to tolin = 10−8 , κ1 = 10−10 , κ2 = 1, κ3 = 104 , η = 10−2 , δ = 0.5, σ = 10−4 . In our reported results, we denote the first residual in (68) by Res1, while the second by Res2, the objective function value by f = kCr − Ck where Cr is the output solution by our method. Iter denotes the number of iterations in Alg. 3.1. Except E1 and E12, which have fixed dimensions, we name our test problems in the following way. For example, E2n100r20 mean it is from E2 with n = 100 and r = 20. We name our method as SemiNewton.m. We also tested Major.m and LCRMmin.m. In Major, we choose tol = 1.0 × 10−5 . In LRCMmin, we choose maxiter = 500 instead of the default value 200 to achieve better performance, ftol = 1.0 × 10−3 , gradtol = 1.0 × 10−5 . Other parameters are set to the default values. (a) We first try to assess the quality of low-rank solutions by SemiNewton.m using E1. C is a true correlation matrix and we try to see how well it can be approximated by various low-rank solutions. Figure 1 shows the surface of C and that of the low-rank solutions Cr with r = 2, 3, 6. It can be clearly seen that the surface of C6 is very similar to that of C; while C2 and C3 are significantly different from C. Another way of assessing the quality of the low-rank solution by our method is to compare principle components of C and its low-rank approximation. As in [39], we use E4n12r3 to demonstrate the comparison. Figure 2 shows the first three principle components of the market matrix (i.e., C) and its low-rank approximation C3 (i.e., r = 3). Figure 3 compares respectively the first row, the third row, the sixth row, and the tenth row of C and C3 , which is interpreted in [39] as “the market and model correlations between the first, the third, the sixth, and the tenth forward rates and the rest of the forward rates obtained using three-factor iterative model”. Judging from those figures, the low-rank solution is as good as reported in [39]. (b) Our second attempt is to show that both SemiNewton.m and Major.m are able to provide better solutions than LCRMmin.m. We just list one example (E2n50) to demonstrate this point. We run the three algorithms for E2n50 with r ranging from 1 to 50 and plot the cpu times and the final objective function values in Figure 4 and Figure 5. We see from Figure 4 that the cpu time used by SemiNewton.m is always less than half a second for all instances, while the other two methods run into hundreds of seconds. Moreover, the final objective function value by SemiNewton.m is always within 10−3 distance from that of Major.m. In other words, Major.m can find a slightly better objective function value than SemiNewton.m, but using much longer time. This observation for the three algorithms is generally true for all our test. Therefore, from now on we only compare with Major.m on a large set of problems. (c) Comparison with Major on small Problems (n ≤ 100). Tests are done for E2-E6 with n = 50, 100 and r = 1, 2, 3, 4, 5, 10, 25. When the maximal iteration number 20000 was reached by Major we stop it (indicated by ’-’). From Table 2, we can see that SemiNewton was able to solve all of those problems under 1 second, while Major used hundreds of seconds in some cases. There are also several failures for Major. We also observe that SemiNewton uses less than 10 iterations to reach the required accuracy. 
We may set a higher accuracy to tolo , but the improvement in terms of the final objective value is insignificant. Comparing to the results obtained by Major, we think the current accuracy is about the right level. 25

(a) E1n11: C

(b) E1n11,r = 2: by SemiNewton

1

1 0.8 0.6

0.5

0.4 0 15

0.2 15 15

10 0

10

5

5 0

Forward Index

15

10

10

5

0

Forward Index

Forward Index

5 0

Forward Index

(d) E1n11,r = 6: by SemiNewton

(c) E1n11,r = 3: by SemiNewton

1.5

1

1 0.5

0.5

0 −0.5 15

0 15 15

10 Forward Index

0

10

5

5 0

15

10

10

5

Forward Index

Forward Index

5 0

0

Forward Index

Figure 1: Comparison of surfaces of C, C2 , C3 , and C6 .

(d) Numerical Results on large problems (n ≥ 500). The test problems of this group (E7E12) are too big for Major. Therefore, we only test SemiNewton to see how it works for large scale problems. The last problem E12 (n = 387) comes from RiskMetrics. We include it to test our method against real data. From Table 3, we see it again took less than 10 iterations for all problems except E7n1000r20. The cputime used is not high considering the size of the problems we are solving and the computer we used. Overall, our method can solve any of those problems comfortably. In order to get a feeling on the performance of SemiNewton, we plot the results (both cputime and the final objective values) for E8n500 against r = 1, . . . , 118 (see Figure 6). We only plot instances for r up to 118. The reason is the nearest correlation matrix X 0 (i.e., the starting point) is already a low-rank matrix when r ≥ 118. This means that X 0 is the global solution of (1) when r ≥ 118 and there is no need to run SemiNewton.

(e) Our final comment concerns an interesting observation. For a large portion of the test problems, SemiNewton took only 2 or 3 iterations to reach the final iterate, denoted here by X^f. A close examination reveals that the eigenspace spanned by the leading r eigenvectors of X^f is very close to that of the nearest correlation matrix X^0, measured by ||P_1^f − P_1^0||, where P_1^f is formed by the leading r eigenvectors of X^f (and P_1^0 by those of X^0). The only problem with X^0 is that its rank exceeds r. In other words, in most cases X^0 and X^f share almost the same leading r eigenvectors, and it took only 2 or 3 adjustments from X^0 to find the correct leading r eigenvectors of X^f.
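To make the measure ||P_1^f − P_1^0|| concrete, here is a sketch of how it can be computed (Python/NumPy for illustration). Since eigenvectors are only determined up to sign, we align signs column by column before taking the difference; this sign convention is our own choice:

```python
import numpy as np

def leading_eigvecs(X, r):
    """Leading r eigenvectors of a symmetric X, largest eigenvalues first."""
    _, P = np.linalg.eigh(X)       # eigh orders eigenvalues ascending
    return P[:, ::-1][:, :r]

def eigenspace_distance(Xf, X0, r):
    """||P_1^f - P_1^0|| after fixing the sign ambiguity of eigenvectors."""
    Pf, P0 = leading_eigvecs(Xf, r), leading_eigvecs(X0, r)
    signs = np.sign(np.sum(Pf * P0, axis=0))   # align each column of Pf with P0
    signs[signs == 0] = 1.0
    return np.linalg.norm(Pf * signs - P0, "fro")
```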

[Figure 2: Comparison between the leading 3 principal components of C and C_3 (E4n12, by SemiNewton); x-axis: forward index.]

[Figure 3: Comparison between C and C_3: the first, the third, the sixth, and the tenth rows; x-axis: forward index; y-axis: correlation.]

[Figure 4: cpu time by SemiNewton, Major, and LRCMmin on E2n50, r = 1, . . . , 50.]

6 Conclusion

A novel approach is introduced to solve the nearest low-rank correlation matrix problem (1). The resulting sequential semismooth Newton method is guaranteed to produce a stationary point of the nonlinear semidefinite programming problem (9), which is an equivalent reformulation of (1). Our numerical results show that the method is highly efficient and outperforms the best available methods for (1). The key to the success of our method is the fast semismooth Newton method used to solve the subproblem at each iteration, and that method heavily depends on the subproblem being reformulated as a least-square problem. This raises our first question: how to extend our method to the nearest low-rank correlation matrix problem with H-weighting,

    min_{X ∈ S^n}  (1/2) ||H ∘ (X − C)||^2
    s.t.           diag(X) = e,  X ⪰ 0,  rank(X) ≤ r,

where ∘ denotes the Hadamard (elementwise) product.
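While extending the semismooth Newton machinery to this problem is open, the H-weighted objective and its gradient are themselves straightforward to evaluate; a minimal sketch (Python/NumPy for illustration; we assume the objective carries the square as in (1)):

```python
import numpy as np

def h_weighted_objective(X, C, H):
    """Objective 0.5 * ||H o (X - C)||_F^2 of the H-weighted problem and
    its gradient (H o H) o (X - C), where 'o' is the Hadamard product."""
    R = H * (X - C)                                  # elementwise weighting
    return 0.5 * np.linalg.norm(R, "fro") ** 2, H * R
```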

For the nearest correlation matrix problem (3) (i.e., without the rank constraint), this H-weighted extension has been carried out in [27]. We leave this topic to our future research. The second question is to characterize the local and global minima of the new reformulation (9). Such characterizations would lead to significant improvement in moving the obtained stationary point toward a local/global minimum of (9). We feel this is a difficult question, but it is worth serious investigation.

Acknowledgments. We would like to thank Prof. Defeng Sun of National University of Singapore for bringing up the idea of treating the subproblem (22) as a least-square problem so that the semismooth Newton method could be developed. We would also like to thank Dr Zhaosong Lu of Simon Fraser University for bringing [16] to our attention and for the helpful discussion on the feasibility issue in augmented Lagrangian methods.

[Figure 5: Difference of objective values obtained by SemiNewton and Major, LRCMmin on E2n50.]

[Figure 6: cpu time and final objective values on E8n500 by SemiNewton, r = 1, . . . , 118.]


Table 2: Results of SemiNewton and Major on small problems (E2–E6; n = 50, 100; r = 1, 2, 3, 4, 5, 10, 25). A '-' means Major reached the maximal iteration number 20000 and was stopped.

           |          SemiNewton                       |          Major
Problem    | Iter  f         Res1      Res2      t(s)  | Iter   f         Res       t(s)
-----------+-------------------------------------------+---------------------------------
E2n50r1    |   4   1.39E+01  2.11E-09  3.29E-08   0.3  |     0  1.39E+01  0.00E+00    0.0
E2n50r2    |   4   5.97E+00  3.65E-12  2.02E-06   0.3  |   114  5.97E+00  9.92E-06    0.4
E2n50r3    |   3   3.17E+00  5.66E-12  1.98E-06   0.2  |   174  3.17E+00  9.63E-06    0.7
E2n50r4    |   3   2.05E+00  1.75E-09  3.05E-06   0.2  |   345  2.05E+00  9.77E-06    1.3
E2n50r5    |   2   1.44E+00  6.69E-10  7.58E-06   0.2  |   562  1.44E+00  9.89E-06    2.3
E2n50r10   |   2   4.90E-01  4.38E-10  1.18E-05   0.2  |  3593  4.89E-01  1.00E-05   19.4
E2n50r25   |   3   1.17E-01  2.66E-10  1.94E-05   0.2  |     -  -         -           -
E2n100r1   |   3   3.67E+01  2.93E-12  1.01E-07   0.5  |     0  3.67E+01  0.00E+00    0.0
E2n100r2   |   3   1.91E+01  1.63E-10  1.57E-06   0.5  |   145  1.91E+01  9.64E-06    1.0
E2n100r3   |   3   1.12E+01  3.82E-09  7.26E-07   0.5  |   138  1.12E+01  9.34E-06    1.1
E2n100r4   |   3   7.60E+00  1.72E-09  8.57E-07   0.5  |   239  7.60E+00  9.84E-06    2.0
E2n100r5   |   3   5.47E+00  6.47E-11  8.91E-07   0.5  |   396  5.47E+00  9.92E-06    3.5
E2n100r10  |   2   1.93E+00  4.50E-11  2.91E-06   0.3  |  2266  1.93E+00  9.99E-06   26.0
E2n100r25  |   2   4.78E-01  1.43E-09  8.74E-06   0.4  |     -  -         -           -
E3n50r1    |   2   4.85E+01  7.94E-10  8.10E-05   0.2  |     0  4.85E+01  0.00E+00    0.0
E3n50r2    |   2   3.33E+01  9.49E-12  6.92E-05   0.2  |  1765  3.32E+01  9.96E-06    5.8
E3n50r3    |   2   2.63E+01  7.36E-12  5.85E-05   0.2  |   628  2.62E+01  9.73E-06    2.4
E3n50r4    |   3   2.19E+01  5.72E-10  8.43E-05   0.2  |   350  2.19E+01  9.90E-06    1.4
E3n50r5    |   4   1.89E+01  1.37E-11  5.25E-05   0.3  |   254  1.89E+01  9.69E-06    1.1
E3n50r10   |   6   1.10E+01  9.17E-12  7.23E-05   0.5  |   117  1.10E+01  9.56E-06    0.7
E3n50r25   |  18   3.95E+00  7.01E-09  2.08E-05   1.4  |   196  3.95E+00  9.99E-06    2.4
E3n100r1   |   2   9.85E+01  2.41E-09  2.29E-05   0.5  |     0  9.85E+01  0.00E+00    0.0
E3n100r2   |   2   6.88E+01  2.92E-09  2.13E-05   0.5  |     -  -         -           -
E3n100r3   |   2   5.53E+01  1.83E-09  1.97E-05   0.5  |  7687  5.51E+01  9.99E-06   60.3
E3n100r4   |   2   4.71E+01  1.12E-10  1.78E-05   0.4  |  3767  4.69E+01  9.99E-06   31.1
E3n100r5   |   2   4.14E+01  1.45E-10  1.77E-05   0.5  |  2262  4.13E+01  9.97E-06   19.8
E3n100r10  |   3   2.67E+01  6.74E-11  4.59E-05   0.6  |   719  2.67E+01  9.97E-06    8.4
E3n100r25  |   5   1.26E+01  5.97E-13  1.03E-05   1.0  |   245  1.26E+01  9.59E-06    6.5
E4n50r1    |   3   2.58E+01  6.28E-09  1.36E-06   0.2  |     0  2.58E+01  0.00E+00    0.0
E4n50r2    |   3   1.31E+01  9.48E-11  6.22E-06   0.2  |    52  1.31E+01  9.52E-06    0.2
E4n50r3    |   4   7.53E+00  3.19E-12  2.50E-05   0.3  |    69  7.53E+00  8.63E-06    0.2
E4n50r4    |   3   5.02E+00  4.06E-09  3.93E-06   0.2  |   128  5.02E+00  9.62E-06    0.5
E4n50r5    |   3   3.58E+00  5.91E-10  3.73E-06   0.2  |   205  3.58E+00  9.88E-06    0.8
E4n50r10   |   2   1.24E+00  2.07E-10  1.24E-05   0.1  |  1250  1.24E+00  9.96E-06    6.8
E4n50r25   |   2   2.96E-01  1.21E-11  6.50E-05   0.2  | 16424  2.95E-01  1.00E-05  192.4
E4n100r1   |   3   6.21E+01  6.82E-11  2.32E-07   0.6  |     0  6.21E+01  0.00E+00    0.0
E4n100r2   |   2   3.97E+01  7.34E-11  7.95E-06   0.4  |   184  3.86E+01  9.39E-06    1.3
E4n100r3   |   6   2.69E+01  7.95E-09  5.88E-05   1.1  |   140  2.69E+01  9.82E-06    1.1
E4n100r4   |   3   2.02E+01  1.50E-09  1.42E-05   0.6  |   165  2.01E+01  9.37E-06    1.4
E4n100r5   |   4   1.56E+01  4.98E-11  9.95E-06   0.7  |   200  1.55E+01  9.80E-06    1.8
E4n100r10  |   4   6.20E+00  3.95E-09  2.00E-06   0.6  |   643  6.20E+00  9.86E-06    7.6
E4n100r25  |   2   1.60E+00  3.95E-11  7.69E-06   0.4  |  5980  1.60E+00  9.99E-06  157.3
E5n50r1    |   4   1.47E+01  4.93E-11  3.94E-08   0.3  |     0  1.47E+01  0.00E+00    0.0
E5n50r2    |   3   7.67E+00  2.34E-11  7.17E-06   0.2  |   206  7.66E+00  9.61E-06    0.7
E5n50r3    |   3   4.48E+00  3.32E-09  2.79E-06   0.2  |   167  4.48E+00  9.89E-06    0.6
E5n50r4    |   3   3.04E+00  2.17E-11  3.43E-06   0.2  |   286  3.04E+00  9.71E-06    1.1
E5n50r5    |   3   2.19E+00  5.81E-10  3.57E-06   0.2  |   471  2.19E+00  9.89E-06    1.9
E5n50r10   |   2   7.73E-01  1.37E-10  1.31E-05   0.1  |  2542  7.72E-01  9.98E-06   13.8
E5n50r25   |   3   1.86E-01  2.94E-10  1.94E-05   0.2  | 15872  1.86E-01  1.00E-05  187.0
E5n100r1   |   4   3.43E+01  2.11E-10  9.85E-10   0.6  |     0  3.43E+01  0.00E+00    0.0
E5n100r2   |   3   2.08E+01  2.33E-11  3.00E-06   0.5  |   315  2.07E+01  9.97E-06    2.2
E5n100r3   |   4   1.38E+01  3.71E-11  7.51E-07   0.6  |   279  1.38E+01  9.59E-06    2.1
E5n100r4   |   4   1.01E+01  3.56E-11  5.98E-07   0.6  |   375  1.01E+01  9.81E-06    3.1
E5n100r5   |   6   7.67E+00  8.97E-09  3.09E-06   0.9  |   487  7.67E+00  9.96E-06    4.2
E5n100r10  |   4   2.97E+00  4.79E-10  8.24E-07   0.6  |  1853  2.97E+00  1.00E-05   22.1
E5n100r25  |   3   7.59E-01  5.65E-10  2.51E-06   0.6  | 15528  7.59E-01  1.00E-05  406.0
E6n50r1    |   3   2.23E+01  1.66E-09  5.42E-06   0.3  |     0  2.23E+01  0.00E+00    0.0
E6n50r2    |   3   9.06E+00  3.09E-10  5.73E-06   0.2  |    45  9.06E+00  9.82E-06    0.2
E6n50r3    |   3   5.15E+00  8.46E-12  9.14E-06   0.2  |   110  5.15E+00  9.12E-06    0.4
E6n50r4    |   2   3.43E+00  3.58E-10  1.69E-05   0.2  |   216  3.42E+00  9.51E-06    0.8
E6n50r5    |   2   2.49E+00  5.43E-11  1.44E-05   0.2  |   381  2.48E+00  9.89E-06    1.6
E6n50r10   |   3   8.95E-01  4.26E-10  2.78E-05   0.2  |  2422  8.91E-01  9.97E-06   13.5
E6n50r25   |   2   1.90E-01  2.70E-10  5.21E-05   0.1  |     -  -         -           -
E6n100r1   |   3   4.10E+01  8.01E-10  7.66E-07   0.6  |     0  4.10E+01  0.00E+00    0.0
E6n100r2   |   3   1.55E+01  1.15E-11  1.00E-06   0.5  |    53  1.55E+01  9.64E-06    0.4
E6n100r3   |   3   8.44E+00  4.50E-09  1.37E-06   0.5  |   136  8.44E+00  9.70E-06    1.0
E6n100r4   |   3   5.47E+00  6.80E-09  1.88E-06   0.5  |   275  5.47E+00  9.64E-06    2.2
E6n100r5   |   3   3.92E+00  4.26E-10  2.37E-06   0.5  |   512  3.91E+00  9.89E-06    4.2
E6n100r10  |   2   1.40E+00  3.34E-10  2.86E-06   0.4  |  3274  1.40E+00  9.99E-06   37.8
E6n100r25  |   2   3.43E-01  1.19E-11  1.33E-05   0.4  | 20001  3.40E-01  2.54E-04  514.5

Table 3: Results of SemiNewton on large problems (E7–E12).

Problem      | Iter  f         Res1      Res2      t(s)
-------------+-------------------------------------------
E7n500r10    |   1   1.32E+02  7.97E-09  8.13E-05    11.7
E7n500r20    |   5   8.90E+01  5.51E-10  8.30E-05    42.7
E7n500r25    |   5   7.82E+01  4.72E-11  6.84E-05    41.3
E7n500r50    |   6   5.05E+01  8.76E-11  4.25E-05    45.1
E7n1000r20   |  15   1.83E+02  1.76E-11  3.26E-05   781.3
E7n1000r40   |   8   1.25E+02  9.21E-12  5.06E-05   376.5
E7n1000r50   |   6   1.10E+02  9.68E-10  8.10E-05   286.8
E7n1000r100  |   5   7.02E+01  2.27E-13  6.38E-05   240.9
E7n1500r30   |   9   2.24E+02  1.20E-10  7.76E-05  1357.7
E7n1500r60   |   6   1.52E+02  2.00E-11  5.05E-05   837.8
E7n1500r75   |   4   1.33E+02  3.64E-12  9.66E-05   581.7
E7n1500r150  |   1   8.50E+01  3.27E-09  7.76E-05   171.4
E7n2000r40   |   6   2.58E+02  9.87E-11  7.54E-05  1992.1
E7n2000r80   |   1   1.75E+02  2.62E-10  7.95E-05   421.7
E7n2000r100  |   1   1.53E+02  2.27E-13  7.04E-05   424.0
E7n2000r200  |   1   9.72E+01  9.97E-10  4.99E-05   378.7
E8n500r10    |   1   2.23E+02  1.68E-09  9.55E-05    12.6
E8n500r20    |   4   1.93E+02  8.56E-09  1.42E-05    33.8
E8n500r25    |   4   1.87E+02  7.12E-10  4.07E-05    33.5
E8n500r50    |   3   1.77E+02  5.33E-11  5.73E-05    25.2
E8n1000r20   |   5   4.10E+02  2.53E-10  1.32E-05   269.3
E8n1000r40   |   5   3.80E+02  5.34E-12  5.03E-05   260.9
E8n1000r50   |   4   3.74E+02  3.22E-10  6.95E-05   213.3
E8n1000r100  |   3   3.65E+02  5.03E-09  5.66E-05   172.9
E8n1500r30   |   4   6.00E+02  7.47E-10  6.60E-05   637.2
E8n1500r60   |   3   5.71E+02  2.46E-09  6.14E-05   479.5
E8n1500r75   |   3   5.65E+02  1.02E-10  5.08E-05   485.2
E8n1500r150  |   2   5.57E+02  5.53E-11  8.19E-05   355.8
E8n2000r40   |   3   7.93E+02  2.26E-10  8.33E-05  1142.9
E8n2000r80   |   2   8.62E+02  1.54E-09  7.88E-05   888.1
E8n2000r100  |   1   7.58E+02  1.03E-09  6.74E-05   432.6
E8n2000r200  |   2   7.51E+02  3.14E-11  4.98E-05   789.5
E9n500r10    |   1   1.39E+02  7.08E-11  9.32E-05    11.4
E9n500r20    |   4   8.54E+01  3.07E-12  8.52E-05    30.1
E9n500r25    |   3   7.13E+01  2.84E-12  7.16E-05    24.0
E9n500r50    |   3   3.60E+01  6.82E-13  4.01E-05    23.7
E9n1000r20   |   7   1.96E+02  2.06E-11  1.72E-06   325.5
E9n1000r40   |   6   1.22E+02  2.50E-12  1.27E-06   267.6
E9n1000r50   |   4   1.02E+02  2.45E-10  3.69E-06   184.9
E9n1000r100  |   2   5.34E+01  1.35E-11  9.34E-05   107.0
E9n1500r30   |   5   2.41E+02  3.36E-10  1.14E-05   709.3
E9n1500r60   |   3   1.50E+02  2.50E-12  7.77E-05   429.8
E9n1500r75   |   5   1.27E+02  7.24E-09  9.52E-07   690.5
E9n1500r150  |   1   6.83E+01  9.38E-09  7.12E-05   182.5
E9n2000r40   |   4   2.79E+02  8.41E-12  4.08E-05  1276.0
E9n2000r80   |   1   1.75E+02  1.45E-10  8.04E-05   368.2
E9n2000r100  |   1   1.48E+02  1.14E-11  7.09E-05   372.1
E9n2000r200  |   1   8.21E+01  8.19E-09  5.13E-05   390.0
E10n500r10   |   1   2.90E+02  2.05E-10  8.27E-05    13.1
E10n500r20   |   4   2.69E+02  2.95E-09  5.71E-05    32.7
E10n500r25   |   3   2.64E+02  2.53E-11  4.18E-05    25.5
E10n500r50   |   5   2.58E+02  6.84E-09  1.49E-05    41.6
E10n1000r20  |   9   5.61E+02  6.97E-10  3.49E-06   465.6
E10n1000r40  |   4   5.40E+02  6.82E-13  5.25E-05   216.8
E10n1000r50  |   3   5.36E+02  2.73E-12  8.99E-05   164.4
E10n1000r100 |   3   5.31E+02  4.12E-10  1.57E-06   175.3
E10n1500r30  |   4   8.36E+02  4.14E-11  6.69E-06   633.3
E10n1500r60  |   3   8.16E+02  2.72E-09  7.08E-06   480.1
E10n1500r75  |   2   8.12E+02  5.17E-09  7.96E-06   339.8
E10n1500r150 |   3   8.07E+02  7.50E-12  2.71E-08   492.7
E10n2000r40  |   3   1.11E+03  6.93E-11  7.08E-06  1052.4
E10n2000r80  |   1   1.09E+03  6.32E-09  8.24E-06   426.6
E10n2000r100 |   1   1.09E+03  4.52E-09  7.46E-06   450.4
E10n2000r200 |   3   1.09E+03  6.40E-09  4.33E-06  1119.8
E11n500r10   |   1   1.42E+02  6.37E-11  8.71E-06    12.2
E11n500r20   |   5   8.95E+01  3.18E-12  4.00E-06    36.9
E11n500r25   |   4   7.60E+01  1.11E-10  5.67E-06    31.6
E11n500r50   |   3   4.29E+01  3.75E-11  6.97E-06    24.1
E11n1000r20  |   5   2.03E+02  3.52E-10  1.91E-06   242.5
E11n1000r40  |   4   1.31E+02  6.82E-13  7.96E-07   191.1
E11n1000r50  |   3   1.12E+02  7.45E-11  9.25E-06   150.6
E11n1000r100 |   3   6.85E+01  1.33E-10  6.34E-06   151.7
E11n1500r30  |   7   2.51E+02  5.91E-12  1.42E-07  1023.7
E11n1500r60  |   5   1.65E+02  7.50E-11  2.22E-06   698.8
E11n1500r75  |   3   1.43E+02  0.00E+00  6.54E-06   452.2
E11n1500r150 |   1   9.23E+01  1.18E-09  7.72E-06   184.0
E11n2000r40  |   4   2.94E+02  6.82E-13  6.77E-06  1342.5
E11n2000r80  |   1   1.96E+02  6.82E-09  7.98E-06   380.4
E11n2000r100 |   1   1.72E+02  1.03E-09  6.93E-06   385.8
E11n2000r200 |   1   1.16E+02  2.08E-09  5.03E-06   403.1
E12n387r1    |   3   3.12E+02  4.66E-09  1.29E-07    16.9
E12n387r2    |   3   1.65E+02  3.86E-10  1.74E-07    17.7
E12n387r5    |   3   6.25E+01  2.06E-09  3.96E-07    15.6
E12n387r10   |   3   2.42E+01  6.00E-11  7.35E-06    11.9
E12n387r25   |   3   3.41E+00  6.91E-09  7.30E-06    11.6

References

[1] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton, Complementarity and nondegeneracy in semidefinite programming, Math. Programming 77 (1997), pp. 111–128.
[2] J. F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems, Springer, New York, 2000.
[3] R. Borsdorf and N. J. Higham, A preconditioned Newton algorithm for the nearest correlation matrix, to appear in IMA J. Numer. Anal.
[4] S. Boyd and L. Xiao, Least-squares covariance matrix adjustment, SIAM Journal on Matrix Analysis and Applications 27 (2005), pp. 532–546.
[5] A. Brace, D. Gatarek, and M. Musiela, The market model of interest rate dynamics, Mathematical Finance 7 (1997), pp. 127–155.
[6] D. Brigo, A note on correlation and rank reduction, 2002 (www.damianobrigo.it).
[7] D. Brigo and F. Mercurio, Interest Rate Models: Theory and Practice, Springer-Verlag, Berlin, 2001.
[8] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, 1983.
[9] B. C. Eaves, On the basic theorem of complementarity, Math. Programming 1 (1971), pp. 68–75.
[10] B. Flury, Common Principal Components and Related Multivariate Models, J. Wiley & Sons, New York.
[11] I. Grubišić and R. Pietersz, Efficient rank reduction of correlation matrices, Linear Algebra and Its Applications 422 (2007), pp. 629–653.
[12] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand. 49 (1952), pp. 409–436.
[13] N. J. Higham, Computing the nearest correlation matrix – a problem from finance, IMA J. Numer. Analysis 22 (2002), pp. 329–343.
[14] J.-B. Hiriart-Urruty and D. Ye, Sensitivity analysis of all eigenvalues of a symmetric matrix, Numer. Math. 70 (1995), pp. 45–72.
[15] J. C. Hull and A. White, Forward rate volatilities, swap rate volatilities, and implementation of the LIBOR market model, Journal of Fixed Income 10 (2000), pp. 46–62.
[16] Z. Lu and Y. Zhang, An augmented Lagrangian approach for sparse principal component analysis, Technical Report, Department of Mathematics, Simon Fraser University, 2009.
[17] B. Kummer, Newton's method for nondifferentiable functions, in Advances in Mathematical Optimization, Math. Res. 45, Akademie-Verlag, Berlin, 1988, pp. 114–125.
[18] J. Malick, A dual approach to semidefinite least-squares problems, SIAM Journal on Matrix Analysis and Applications 26 (2004), pp. 272–284.
[19] F. Meng, D. F. Sun, and G. Y. Zhao, Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization, Math. Programming 104 (2005), pp. 561–581.
[20] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 1999.
[21] M. Overton and R. S. Womersley, On the sum of the largest eigenvalues of a symmetric matrix, SIAM J. Matrix Anal. Appl. 12 (1992), pp. 41–45.
[22] M. Overton and R. S. Womersley, Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices, Math. Programming 62 (1993), pp. 321–357.
[23] J. S. Pang, D. F. Sun, and J. Sun, Semismooth homeomorphisms and strong stability of semidefinite and Lorentz complementarity problems, Mathematics of Operations Research 28 (2003), pp. 39–63.
[24] R. Pietersz and I. Grubišić, Rank reduction of correlation matrices by majorization, Quantitative Finance 4 (2004), pp. 649–662.
[25] H. D. Qi, Local duality of nonlinear semidefinite programming, Mathematics of Operations Research 34 (2009), pp. 124–141.
[26] H. D. Qi and D. F. Sun, A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM Journal on Matrix Analysis and Applications 28 (2006), pp. 360–385.
[27] H. D. Qi and D. F. Sun, An augmented Lagrangian dual approach for the H-weighted nearest correlation matrix problem, to appear in IMA J. Numer. Analysis.
[28] L. Qi and J. Sun, A nonsmooth version of Newton's method, Math. Programming 58 (1993), pp. 353–367.
[29] F. Rapisarda, D. Brigo, and F. Mercurio, Parametrizing correlations: a geometric interpretation, Banca IMI Working Paper, 2002 (www.fabiomercurio.it).
[30] R. Rebonato, On the simultaneous calibration of multifactor lognormal interest rate models to Black volatilities and to the correlation matrix, J. Comput. Finance 2 (1999), pp. 5–27.
[31] R. Rebonato, Modern Pricing of Interest-Rate Derivatives, Princeton University Press, New Jersey, 2002.
[32] R. Rebonato, Interest-rate term-structure pricing models: a review, Proc. Royal Soc. London Ser. A 460 (2004), pp. 667–728.
[33] R. T. Rockafellar, Conjugate Duality and Optimization, SIAM, Philadelphia, 1974.
[34] D. F. Sun, The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications, Mathematics of Operations Research 31 (2006), pp. 761–776.
[35] D. F. Sun and J. Sun, Semismooth matrix-valued functions, Math. Oper. Res. 27 (2002), pp. 150–169.
[36] K. C. Toh, An inexact path-following algorithm for convex quadratic SDP, Math. Programming 112 (2008), pp. 221–254.
[37] K. C. Toh, R. H. Tütüncü, and M. J. Todd, Inexact primal-dual path-following algorithms for a special class of convex quadratic SDP and related problems, Pacific J. Optimization 3 (2007), pp. 135–164.
[38] L. Wu, Fast at-the-money calibration of the LIBOR market model using Lagrangian multipliers, J. Comput. Finance 6 (2003), pp. 39–77.
[39] Z. Zhang and L. Wu, Optimal low-rank approximation to a correlation matrix, Linear Algebra and Its Applications 364 (2003), pp. 161–187.
[40] X. Y. Zhao, D. F. Sun, and K. C. Toh, A Newton-CG augmented Lagrangian method for semidefinite programming, Technical Report, National University of Singapore, 2008; to appear in SIAM J. Optimization.
