An investigation of feasible descent algorithms for estimating the condition number of a matrix

Carmo P. Brás ∗ (E-mail: [email protected])
William W. Hager † (E-mail: [email protected])
Joaquim J. Júdice ‡ (E-mail: [email protected])

20th October 2009
Abstract
Techniques for estimating the condition number of a nonsingular matrix are developed. It is shown that Hager’s 1-norm condition number estimator is equivalent to the conditional gradient algorithm applied to the problem of maximizing the 1-norm of a matrix-vector product over the unit sphere in the 1-norm. By changing the constraint in this optimization problem from the unit sphere to the unit simplex, a new formulation is obtained which is the basis for both conditional gradient and projected gradient algorithms. In the test problems, the spectral projected gradient algorithm yields condition number estimates at least as good as those obtained by the previous approach. Moreover, in some cases, the spectral projected gradient algorithm, with a careful choice of the parameters, yields improved condition number estimates.

Key Words: Condition number, Numerical linear algebra, Nonlinear programming, Gradient algorithms.

Classification MSC: 15A12, 49M37, 90C30.

∗ CMA, Departamento de Matemática, Faculdade de Ciências e Tecnologia, FCT, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
† Department of Mathematics, University of Florida, Gainesville, FL 32611-8105. Research partly supported by National Science Foundation Grant 0620286.
‡ Departamento de Matemática da Universidade de Coimbra and Instituto de Telecomunicações, 3001-454 Coimbra, Portugal.
1 Introduction
The role of the condition number is well established in a variety of numerical analysis problems (see, for example, [7, 9, 11]). In fact, the condition number is useful for assessing the conditioning of a linear system and the error in its solution, since it provides information about the sensitivity of the solution to perturbations in the data. For large matrices, the calculation of its exact value is impractical, as it involves a large number of operations. Therefore, techniques have been developed to approximate ||A^{-1}||_1 without computing the inverse of the matrix A. Hager’s 1-norm condition number estimator [10] is implemented in the MATLAB built-in function condest. This estimator is a gradient method for computing a stationary point of the problem

    Maximize  ||Ax||_1   subject to  ||x||_1 = 1.                        (1)
The algorithm starts from x = (1/n, 1/n, ..., 1/n), evaluates a subgradient, and then moves to a vertex ±e_j, j = 1, 2, ..., n, which yields the largest increase in the objective function predicted by the subgradient. The iteration is repeated until a stationary point is reached, often within two or three iterations. We show that this algorithm is essentially the conditional gradient (Cg) algorithm. Since the solution to (1) is achieved at a vertex of the feasible set, an equivalent formulation is

    Maximize  ||Ax||_1   subject to  e^T x = 1,  x ≥ 0,                  (2)

where e is the vector with every element equal to 1. We propose conditional gradient and projected gradient algorithms for computing a stationary point of (2). The paper is organized as follows. In Section 2, Hager’s algorithm is described and its equivalence with a Cg algorithm is shown. The simplex formulation of the condition number problem and a Cg algorithm for solving this nonlinear program are introduced in Section 3. The Spg algorithm is discussed in Section 4. Computational experience is reported in Section 5, and some conclusions are drawn in the last section.
2 Hager’s Algorithm for Condition Number Estimation
Definition 1 The 1-norm of a square matrix A ∈ R^{n×n} is defined by

    ||A||_1 = max_{x ∈ R^n \ {0}} ||Ax||_1 / ||x||_1,                    (3)

where ||x||_1 = Σ_{j=1}^{n} |x_j|.

In practice the computation of the 1-norm of a matrix A is not difficult, since

    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_{ij}|.
In this paper we are interested in the computation of the 1-norm of the inverse of a nonsingular matrix A. This norm is important for computing the condition number of A, a concept that is quite relevant in numerical linear algebra [8, 13]. We recall the following definition.

Definition 2 The condition number for the 1-norm of a nonsingular matrix A is given by

    cond_1(A) = ||A||_1 ||A^{-1}||_1,

where A^{-1} is the inverse of the matrix A.

In practice, computing the inverse of a matrix involves too much effort, particularly when the order is large. Instead, the condition number of a nonsingular matrix is estimated by

    cond_1(A) = ||A||_1 β,

where β is an estimate of ||A^{-1}||_1. To find a β that is close to the true value of ||A^{-1}||_1, the following optimization problem is considered in [10]:

    Maximize   F(x) = ||Ax||_1 = Σ_{i=1}^{n} | Σ_{j=1}^{n} a_{ij} x_j |
    subject to x ∈ K,                                                    (4)
where K = {x ∈ R^n : ||x||_1 ≤ 1}. The following properties hold [12].

(i) F is differentiable at all x ∈ K such that (Ax)_i ≠ 0 for all i.

(ii) If

    ||∇F(x)||_∞ ≤ ∇F(x)^T x,                                             (5)

then x is a stationary point of F on K.

(iii) F is convex on K and the global maximum of F is attained at one of the vertices e_j of the convex set K.

The optimal solution of the program (4) is not in general unique, and there may exist an optimal solution which is not one of the unit vectors e_i. In fact, the following result holds.

Theorem 1 Let J be a subset of {1, ..., n} such that ||A||_1 = ||A_{.j}||_1 and A_{.j} ≥ 0 for all j ∈ J, where A_{.j} denotes the j-th column of A. If J is nonempty, then any convex combination x̄ of the canonical vectors e_j, j ∈ J, also satisfies ||A||_1 = ||Ax̄||_1.

Proof: Let

    x̄ = Σ_{j∈J} x̄_j e_j,

where x̄_j ≥ 0 for all j ∈ J and Σ_{j∈J} x̄_j = 1. Then

    (Ax̄)_i = Σ_{j∈J} a_{ij} x̄_j,   i = 1, 2, ..., n.

Since A_{.j} ≥ 0 for all j ∈ J, then

    ||Ax̄||_1 = Σ_{i=1}^{n} | Σ_{j∈J} a_{ij} x̄_j |
             = Σ_{i=1}^{n} Σ_{j∈J} a_{ij} x̄_j
             = Σ_{j∈J} x̄_j ( Σ_{i=1}^{n} a_{ij} )
             = Σ_{j∈J} x̄_j ||A_{.j}||_1
             = Σ_{j∈J} x̄_j ||A||_1 = ||A||_1.
For instance, let

    A = [ 1   4   0
          2   0  −1
          3   2   4 ].

Hence J = {1, 2}. If x̄ = (2/3) e_1 + (1/3) e_2 = (2/3, 1/3, 0)^T, then ||Ax̄||_1 = ||A||_1 = 6.
Hager’s algorithm [10] has been designed to compute a stationary point of the optimization problem (4). With the exception of the initial point, the algorithm visits only vectors of the canonical basis, and moves through these vectors using search directions based on the gradient of F. If F is not differentiable at one of these vectors, then a subgradient of F can easily be computed [10]. The inequality (5) provides a stopping criterion for the algorithm. The steps of the algorithm are presented below, where sign(y) denotes the vector with components

    (sign(y))_i = 1 if y_i ≥ 0,   (sign(y))_i = −1 if y_i < 0.
HAGER’S ESTIMATOR FOR ||A||_1

Step 0: Initialization. Let x = (1/n, 1/n, ..., 1/n).

Step 1: Determination of the gradient or subgradient z of F. Compute y = Ax, ξ = sign(y), z = A^T ξ.

Step 2: Stopping criterion. If ||z||_∞ ≤ z^T x, stop with γ = ||y||_1 as the estimate for ||A||_1.

Step 3: Search direction. Let r be such that ||z||_∞ = |z_r| = max_{1≤i≤n} |z_i|. Set x = e_r and return to Step 1.
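For illustration, the following is a minimal NumPy sketch of these steps (the function name and the iteration cap are our own choices; ties in sign are broken as +1, as in the definition above). On the 3 × 3 example given after Theorem 1, it terminates after visiting the vertex e_1 with γ = 6, the exact 1-norm.

    import numpy as np

    def hager_norm1_estimate(A, max_iter=100):
        """Sketch of Hager's estimator for ||A||_1 (problem (4))."""
        n = A.shape[0]
        x = np.full(n, 1.0 / n)                 # Step 0: start at the barycenter
        for _ in range(max_iter):
            y = A @ x                           # Step 1: y = A x
            xi = np.where(y >= 0, 1.0, -1.0)    #         xi = sign(y), with sign(0) = +1
            z = A.T @ xi                        #         z = A^T xi, a (sub)gradient of F
            if np.max(np.abs(z)) <= z @ x:      # Step 2: stopping criterion (5)
                break
            r = int(np.argmax(np.abs(z)))       # Step 3: move to the vertex e_r
            x = np.zeros(n)
            x[r] = 1.0
        return float(np.sum(np.abs(y)))         # gamma = ||y||_1, the estimate of ||A||_1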
This algorithm can now be used to find an estimate of ||B^{-1}||_1 by setting A = B^{-1}. In this case, the vectors y and z are computed by

    y = B^{-1} x  ⇔  By = x,
    z = (B^{-1})^T ξ  ⇔  B^T z = ξ.

So, in each iteration of Hager’s method for estimating ||B^{-1}||_1, two linear systems with the matrices B and B^T have to be solved. If the LU decomposition of B is known, then each iteration of the algorithm corresponds to the solution of four triangular systems. Since the algorithm only guarantees a stationary point of the function F(x) = ||Ax||_1 on the convex set K, the procedure is applied from different starting points in order to get a better estimate for cond_1(B). The whole procedure is discussed in [13] and has been implemented in MATLAB [16]. Furthermore, the algorithm provides an estimate for cond_1(B) that is smaller than or equal to the true value of the condition number of the matrix B in the 1-norm.

The purpose of this paper is to investigate Hager’s algorithm and to show how it might be improved.
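As a sketch of this usage (not the MATLAB routine referred to above), the estimator can be wrapped around a single LU factorization of B, so that the two systems per iteration reduce to four triangular solves; the function name is ours and SciPy's lu_factor/lu_solve are assumed for the solves.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def cond1_estimate(B, max_iter=100):
        """Lower estimate of cond_1(B) = ||B||_1 ||B^{-1}||_1, applying Hager's method with A = B^{-1}."""
        n = B.shape[0]
        lu_piv = lu_factor(B)                   # one LU factorization, reused in every iteration
        x = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            y = lu_solve(lu_piv, x)             # solve B y = x, i.e. y = B^{-1} x
            xi = np.where(y >= 0, 1.0, -1.0)
            z = lu_solve(lu_piv, xi, trans=1)   # solve B^T z = xi, i.e. z = B^{-T} xi
            if np.max(np.abs(z)) <= z @ x:      # stationary point: accept the current estimate
                break
            r = int(np.argmax(np.abs(z)))
            x = np.zeros(n)
            x[r] = 1.0
        beta = float(np.sum(np.abs(y)))         # estimate of ||B^{-1}||_1
        return float(np.abs(B).sum(axis=0).max()) * beta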
In [2], the conditional gradient algorithm is discussed for dealing with nonlinear programs with simple constraints. The algorithm is a descent procedure that possesses global convergence towards a stationary point under reasonable hypotheses [2]. To describe an iteration of the procedure, consider again the nonlinear program (4) and suppose that F is differentiable. If x̄ ∈ K, then a search direction is computed as d = ȳ − x̄, where ȳ ∈ K is an optimal solution of the convex program

    OPT:  Maximize  ∇F(x̄)^T (y − x̄)   subject to  y ∈ K.

Two cases may occur:

(i) the optimal value is nonpositive and x̄ is a stationary point of F on K;

(ii) the optimal value is positive and d is an ascent direction of F at x̄.

In order to compute the optimal solution ȳ, we introduce the following change of variables:

    y_i = u_i − v_i,   u_i ≥ 0,  v_i ≥ 0,   u_i v_i = 0,   for all i = 1, 2, ..., n.      (6)

Then OPT is equivalent to the following mathematical program with linear complementarity constraints:

    Maximize   ∇F(x̄)^T (u − v − x̄)
    subject to e^T u + e^T v ≤ 1,
               u ≥ 0,  v ≥ 0,
               u^T v = 0,

where e is a vector of ones of order n. As discussed in [17], the complementarity constraint is redundant and OPT is equivalent to the following linear program:

    LP:  Maximize   ∇F(x̄)^T (u − v − x̄)
         subject to e^T u + e^T v ≤ 1,
                    u ≥ 0,  v ≥ 0.
Suppose that

    |∇_r F(x̄)| = max_{1≤i≤n} |∇_i F(x̄)|.

An optimal solution (ū, v̄) of LP is given by

(i) ∇_r F(x̄) > 0  ⇒  ū_j = 1 if j = r, ū_j = 0 otherwise, and v̄ = 0;

(ii) ∇_r F(x̄) < 0  ⇒  v̄_j = 1 if j = r, v̄_j = 0 otherwise, and ū = 0.

It then follows from (6) that

    ȳ = sign(∇_r F(x̄)) e_r.

After the computation of the search direction, a stepsize can be computed by an exact line search:

    Maximize  g(α) = F(x̄ + αd)   subject to  α ∈ [0, 1].

Since F is convex on R^n, g is convex on [0, 1] and the maximum is achieved at α = 1 [1]. Hence the next iterate computed by the algorithm is given by

    x̃ = x̄ + (ȳ − x̄) = ȳ = sign(∇_r F(x̄)) e_r.

Hence the conditional gradient algorithm is essentially the same as Hager’s gradient method for (4): except for signs, the same iterates are generated and the two algorithms terminate at the same iteration, within n steps.
3 A Simplex Formulation
In this section we introduce a different nonlinear program for the estimation of ||A||_1:

    Maximize   F(x) = ||Ax||_1 = Σ_{i=1}^{n} | Σ_{j=1}^{n} a_{ij} x_j |
    subject to e^T x = 1,  x ≥ 0.                                        (7)

Note that the constraint set of this nonlinear program is the ordinary simplex ∆. Since ∆ has the extreme points e_1, e_2, ..., e_n, the maximum of the function F is attained at one of them, which gives the 1-norm of the matrix A. As in Theorem 1, this maximum can also be attained at a point that is not a vertex of the simplex. As before, the difficulty in computing ||A||_1 comes from the fact that (7) requires the maximization of a convex function. If the matrix has nonnegative elements, the problem can be easily solved. In fact, if A ≥ 0 and x ≥ 0, then

    F(x) = Σ_{i=1}^{n} | Σ_{j=1}^{n} a_{ij} x_j | = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} x_j = Σ_{j=1}^{n} ( Σ_{i=1}^{n} a_{ij} ) x_j.
Hence

    F(x) = d^T x   with   d = A^T e.

Therefore the program (7) reduces to the linear program

    Maximize  d^T x   subject to  e^T x = 1,  x ≥ 0,

which has the optimal solution x = e_r, with d_r = max{d_i : i = 1, ..., n}. Hence ||A||_1 = d_r. This observation also appears in [10].

Like the previous algorithm, the importance of this procedure lies in the possibility of computing a good estimate of the condition number of a matrix. Let A be a Minkowski matrix, i.e., A ∈ P and the off-diagonal elements are all nonpositive. Then A^{-1} ≥ 0 [6] and

    ||A^{-1}||_1 = d_r = max{d_i : i = 1, ..., n},

with d the vector given by

    A^T d = e.                                                           (8)

It should be noted that this type of matrices appears very often in the solution of second-order ordinary and elliptic partial differential equations by finite elements and finite differences [14]. So the condition number of a Minkowski matrix can be computed by solving a single system of linear equations (8) and setting

    cond_1(A) = ||A||_1 d_r,

where d is the unique solution of (8).
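As a small illustration of this observation (the function name is ours), the whole estimate for a Minkowski matrix reduces to one solve with A^T:

    import numpy as np

    def cond1_minkowski(A):
        """cond_1(A) for a Minkowski matrix: A^{-1} >= 0, so ||A^{-1}||_1 = max_i d_i with A^T d = e."""
        n = A.shape[0]
        d = np.linalg.solve(A.T, np.ones(n))    # the single linear system (8)
        return float(np.abs(A).sum(axis=0).max() * d.max())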
In the general case let us assume that F is differentiable. Then the following result holds.
Theorem 2 If F is differentiable on an open set containing the simplex ∆ and if

    max_{1≤i≤n} ∇_i F(x) ≤ ∇F(x)^T x,

then x is a stationary point of F on ∆.

Proof: For each y ∈ ∆,

    ∇F(x)^T y = Σ_{i=1}^{n} ∇_i F(x) y_i ≤ ( max_{1≤i≤n} ∇_i F(x) ) Σ_{i=1}^{n} y_i = max_{1≤i≤n} ∇_i F(x).

Hence, by hypothesis,

    ∇F(x)^T (y − x) ≤ 0   for all y ∈ ∆,

and x is a stationary point of F on ∆.
Next we describe the conditional gradient algorithm for the solution of the nonlinear program (7). As before, if x̄ ∈ ∆, then the search direction d is given by d = ȳ − x̄, where ȳ ∈ ∆ is an optimal solution of the linear program

    LP:  Maximize   ∇F(x̄)^T (y − x̄)
         subject to e^T y = 1,  y ≥ 0.

Two cases may occur:

(i) the maximum value is nonpositive and x̄ is a stationary point of F on ∆;

(ii) the maximum value is positive and d is an ascent direction of F at x̄.

Furthermore, ȳ is given by

    ȳ_i = 1 if i = r,   ȳ_i = 0 otherwise,

where r is an index satisfying

    ∇_r F(x̄) = max_{1≤i≤n} ∇_i F(x̄).

An exact line search along this direction leads to α = 1, and the next iterate is

    x̃ = x̄ + (ȳ − x̄) = ȳ = e_r.

Hence the conditional gradient algorithm for estimating ||A||_1 can be described as follows.
CONDITIONAL GRADIENT ESTIMATOR FOR ||A||_1

Step 0: Initialization. Let x = (1/n, 1/n, ..., 1/n).

Step 1: Determination of the gradient or subgradient z of F. Compute y = Ax, ξ = sign(y), z = A^T ξ.

Step 2: Stopping criterion. If max_{1≤i≤n} z_i ≤ z^T x, stop the algorithm with γ = ||y||_1 as the estimate for ||A||_1.

Step 3: Search direction. Let r be such that z_r = max_{1≤i≤n} z_i. Set x = e_r and return to Step 1.
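A sketch of this estimator differs from the earlier one only in the stopping test and in the choice of the vertex (largest gradient component instead of largest absolute value); again the function name and the iteration cap are ours.

    import numpy as np

    def cg_simplex_norm1_estimate(A, max_iter=100):
        """Conditional gradient estimator for ||A||_1 over the simplex (problem (7))."""
        n = A.shape[0]
        x = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            y = A @ x
            xi = np.where(y >= 0, 1.0, -1.0)
            z = A.T @ xi
            if np.max(z) <= z @ x:              # stationarity test of Theorem 2
                break
            r = int(np.argmax(z))               # vertex with the largest gradient component
            x = np.zeros(n)
            x[r] = 1.0
        return float(np.sum(np.abs(y)))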
As discussed in [2], this algorithm possesses global convergence towards a stationary point of the function F on the simplex. Furthermore, the following result holds.

Theorem 3 If x̄ is not a stationary point, then F(e_r) > F(x̄) for r such that ∇_r F(x̄) = max{∇_i F(x̄) : i = 1, ..., n}.

Proof:

    F(e_r) ≥ F(x̄) + ∇F(x̄)^T (e_r − x̄)
           = F(x̄) + ∇_r F(x̄) − ∇F(x̄)^T x̄
           = F(x̄) + [ max_{1≤i≤n} ∇_i F(x̄) − ∇F(x̄)^T x̄ ] > F(x̄),

where the first inequality follows from the convexity of F and the final strict inequality follows from Theorem 2, since x̄ is not a stationary point.
Since the simplex has n vertices, the algorithm converges to a stationary point of F in a finite number of iterations. Furthermore, the algorithm is strongly polynomial, as the computational effort per iteration is polynomial in the order of the matrix A.

As mentioned before, the estimation of ||B^{-1}||_1 can be done by setting A = B^{-1} in this algorithm. If the LU decomposition of B is available, then, as in Hager’s method, the algorithm requires the solution of four triangular systems in each iteration. Unfortunately, the asymptotic rate of convergence of the conditional gradient method is not very fast when the feasible set ∆ is a polyhedron [2], and gradient projection methods often converge faster. In the next section we discuss the use of the so-called spectral projected gradient algorithm [3, 15] to find an estimate of ||A||_1.
4 A Spectral Projected Gradient Algorithm
As discussed in [3, 4], the spectral projected gradient (Spg) algorithm uses in each iteration k a vector x_k ∈ ∆ and moves along the projected gradient direction defined by

    d_k = P_∆(x_k + η_k ∇F(x_k)) − x_k,                                  (9)

where P_∆(w) is the projection of w ∈ R^n onto ∆. In each iteration k, x_k is updated by x_{k+1} = x_k + δ_k d_k, where 0 < δ_k ≤ 1 is computed by a line search technique. The algorithm converges to a stationary point of F [3]. Next we discuss the important issues for the application of the Spg algorithm to the simplex formulation, as discussed in [15].
(i) Computation of the stepsize: As before, exact line search gives α_k = 1 in each iteration k.

(ii) Computation of the relaxation parameter η_k: Let z_k = ∇F(x_k), let ηmin and ηmax be a small and a large positive real number, respectively, and, as before, let P_X(x) denote the projection of x onto a set X. Then [15]:

(I) For k = 0,

    η_0 = P_[ηmin, ηmax] ( 1 / ||P_∆(x_0 + z_0) − x_0||_∞ ).

(II) For any k > 0, let x_k be the corresponding iterate, z_k the gradient (subgradient) of F at x_k, and

    s_{k−1} = x_k − x_{k−1},    y_{k−1} = z_{k−1} − z_k.

Then

    η_k = P_[ηmin, ηmax] ( ⟨s_{k−1}, s_{k−1}⟩ / ⟨s_{k−1}, y_{k−1}⟩ )   if ⟨s_{k−1}, y_{k−1}⟩ > ε,
    η_k = ηmax                                                         otherwise,            (10)
where ε is a small positive number.

(iii) Computation of the projection onto ∆: The projection required in each iteration of the algorithm is computed as follows:

(I) Find u = x_k + η_k z_k.

(II) The vector P_∆(u) is the unique optimal solution of the strictly convex quadratic problem

    Minimize_{w ∈ R^n}  (1/2) ||u − w||_2^2
    subject to          e^T w = 1,  w ≥ 0.
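One standard way to compute this projection is the sorting-based rule sketched below; this is an alternative to the routine actually used in the experiments (described next), and the function name is ours.

    import numpy as np

    def project_onto_simplex(v):
        """Euclidean projection of v onto the simplex {w : w >= 0, e^T w = 1}."""
        n = v.size
        u = np.sort(v)[::-1]                    # components in decreasing order
        css = np.cumsum(u)
        # last (1-based) index rho with u_rho + (1 - sum of the rho largest components)/rho > 0
        rho = np.nonzero(u + (1.0 - css) / np.arange(1, n + 1) > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1)    # shift so that the positive part sums to one
        return np.maximum(v + theta, 0.0)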
As suggested in [15], a very simple and strongly polynomial Block Pivotal Principal Pivoting Algorithm is employed to compute this unique global minimum solution. The steps of the Spg algorithm can be presented as follows.
SPECTRAL PROJECTED GRADIENT ESTIMATOR FOR ||A^{-1}||_1

Step 0: Initialization. Let x_0 = (1/n, 1/n, ..., 1/n) and k = 0.

Step 1: Determination of the gradient or subgradient z_k of F. Compute y_k and z_k from A y_k = x_k, ξ = sign(y_k), A^T z_k = ξ.

Step 2: Stopping criterion. If

    max_{1≤i≤n} (z_k)_i ≤ z_k^T x_k,

stop the algorithm with γ = ||y_k||_1 as the estimate for ||A^{-1}||_1.

Step 3: Search direction. Update x_{k+1} = P_∆(x_k + η_k z_k) and go to Step 1 with k ← k + 1.
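A minimal sketch of this estimator is given below; it reuses project_onto_simplex from above and LU-based solves as in Section 2, and the default interval [ηmin, ηmax] = [1e-3, 1e4] mirrors the choices discussed in Section 5. The function name and the tolerance eps are our own choices.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def spg_inv_norm1_estimate(B, eta_min=1e-3, eta_max=1e4, eps=1e-12, max_iter=100):
        """Spectral projected gradient sketch for estimating ||B^{-1}||_1 over the simplex."""
        n = B.shape[0]
        lu_piv = lu_factor(B)
        x = np.full(n, 1.0 / n)
        x_old = z_old = None
        eta = eta_max
        for k in range(max_iter):
            y = lu_solve(lu_piv, x)                      # y_k = B^{-1} x_k
            xi = np.where(y >= 0, 1.0, -1.0)
            z = lu_solve(lu_piv, xi, trans=1)            # z_k = B^{-T} xi
            if np.max(z) <= z @ x:                       # stopping criterion (Step 2)
                break
            if k == 0:                                   # initial relaxation parameter, rule (I)
                step = np.max(np.abs(project_onto_simplex(x + z) - x))
                eta = np.clip(1.0 / max(step, eps), eta_min, eta_max)
            else:                                        # spectral (Barzilai-Borwein) rule (10)
                s, g = x - x_old, z_old - z
                sg = s @ g
                eta = np.clip((s @ s) / sg, eta_min, eta_max) if sg > eps else eta_max
            x_old, z_old = x, z
            x = project_onto_simplex(x + eta * z)        # Step 3, with full step (alpha = 1)
        return float(np.sum(np.abs(y)))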
5 Computational Experience
In this section we report some computational experience with the methods previously described to estimate ||A^{-1}||_1. All the estimators have been implemented in MATLAB [16], version 7.1. The test problems have been taken from well-known sources.

Type I. Minkowski matrices [14], block tridiagonal matrices of the form

    A = tridiag(−I_4, T, −I_4),   with   T = tridiag(−1, 4, −1) of order 4,

that is, matrices with 4 on the diagonal, −1 on the first sub- and superdiagonal inside each 4×4 diagonal block, and −I_4 blocks on the block sub- and superdiagonals.
Type II. Matrices [12] of the form A = αI + e e^T, where e = [1 1 ... 1]^T and the parameter α takes values close to zero.

Type III. Triangular bidiagonal unit matrices [12], whose diagonal and first lower diagonal entries are equal to one and whose remaining elements are zero, that is, lower bidiagonal matrices of ones.
Type IV. Pentadiagonal matrices [18] with

    m_{ii} = 6,   m_{i,i−1} = m_{i−1,i} = −4,   m_{i,i−2} = m_{i−2,i} = 1,

and all other elements equal to zero.

Type V. Murty matrices [18], lower triangular matrices with ones on the diagonal and all elements below the diagonal equal to two.

Type VI. Fathy matrices [18] of the form F = M^T M, where M is a Murty matrix.
Type VII. Matrices [12] whose inverses can be written as A^{-1} = I + θC, where C is an arbitrary matrix such that Ce = C^T e = 0 and the parameter θ takes values close to zero.

Type VIII. Matrices from the MatrixMarket collection [5], which covers all of the Harwell-Boeing matrices. Table 1 displays some specifications of the selected matrices.

Table 1: Properties of the Type VIII matrices.

    Name       Original name   Origin                     Order   Nonzero elements   Type
    Matrix1    s1rmq4m1        Finite elements            5489    143300             SPD
    Matrix2    s2rmq4m1        Finite elements            5489    143300             SPD
    Matrix3    s3rmq4m1        Finite elements            5489    143300             SPD
    Matrix4    s1rmt3m1        Finite elements            5489    112505             SPD
    Matrix5    s2rmt3m1        Finite elements            5489    112505             SPD
    Matrix6    s3rmt3m1        Finite elements            5489    112505             SPD
    Matrix7    bcsstk14        Structural engineering     1806    32630              SPD
    Matrix8    bcsstk15        Structural engineering     3948    60882              SPD
    Matrix9    bcsstk27        Structural engineering     1224    28675              SPD
    Matrix10   bcsstk28        Structural engineering     4410    111717             Indefinite
    Matrix11   orani678        Economic modeling          2529    90158              Unsymmetric
    Matrix12   sherman5        Petroleum modeling         3312    20793              Unsymmetric
    Matrix13   nos7            Finite differences         729     2673               SPD
    Matrix14   nos2            Finite differences         957     2547               SPD
    Matrix15   e20r5000        Dynamics of fluids         4241    131556             Unsymmetric
    Matrix16   fidapm37        Finite elements modeling   9152    765944             Unsymmetric
    Matrix17   fidap037        Finite elements modeling   3565    67591              Unsymmetric
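For reference, some of the structured test families above are straightforward to generate; the sketch below (helper names ours) builds the Type IV, V and VI matrices.

    import numpy as np

    def pentadiagonal_matrix(n):
        """Type IV: 6 on the diagonal, -4 on the first and 1 on the second off-diagonals."""
        return (6.0 * np.eye(n)
                - 4.0 * (np.eye(n, k=1) + np.eye(n, k=-1))
                + np.eye(n, k=2) + np.eye(n, k=-2))

    def murty_matrix(n):
        """Type V: lower triangular, ones on the diagonal, twos below the diagonal."""
        return np.tril(2.0 * np.ones((n, n)), -1) + np.eye(n)

    def fathy_matrix(n):
        """Type VI: F = M^T M with M a Murty matrix."""
        M = murty_matrix(n)
        return M.T @ M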
Table 2 reports the performance of the three algorithms discussed in this paper for the estimation of the 1-norm condition number of the matrices mentioned before. The notations N, Value and Nit denote the order of the matrices, the value of the condition number estimate and the number of iterations performed by each algorithm, that is, the number of iterates visited by the method. Furthermore, Cn represents the true condition number of these matrices. In general the Spg algorithm converges to a vertex of the simplex; in Table 2 the symbol ∗ indicates the problems where the algorithm converges to a point that is not a vertex of ∆. Finally, the column ηmax indicates the value of ηmax that leads to the best performance of the Spg algorithm.
Table 2: Condition number estimators.

                          Hager's Estimator      Cg Estimator           Spg Estimator
    Matrix                Value       Nit        Value       Nit        Value       Nit   ηmax      Cn

    Type I      N
                50        2.31E+001   2          2.31E+001   2          2.31E+001   2     1.0E+08   2.31E+001
                250       2.40E+001   2          2.40E+001   2          2.40E+001   3∗    1.0E+08   2.40E+001
                500       2.40E+001   2          2.40E+001   2          2.40E+001   3∗    1.0E+08   2.40E+001
                1000      2.40E+001   2          2.40E+001   2          2.40E+001   2∗    1.0E+08   2.40E+001
                2000      2.40E+001   2          2.40E+001   2          2.40E+001   3∗    1.0E+04   2.40E+001
                4000      2.40E+001   2          2.40E+001   2          2.40E+001   2∗    1.0E+04   2.40E+001

    Type II     α (N=4000)
                0.50      1.60E+004   2          1.60E+004   2          1.60E+004   2     1.0E+16   1.60E+004
                0.25      3.20E+004   2          3.20E+004   2          3.20E+004   2     1.0E+16   3.20E+004
                0.125     6.40E+004   2          6.40E+004   2          6.40E+004   2     1.0E+16   6.40E+004
                0.01      8.00E+005   2          8.00E+005   2          8.00E+005   3     1.0E+12   8.00E+005
                1.0E-03   8.00E+006   2          8.00E+006   2          8.00E+006   3     1.0E+12   8.00E+006
                1.0E-04   8.00E+007   2          8.00E+007   2          8.00E+007   3∗    1.0E+10   8.00E+007
                1.0E-05   8.00E+008   2          8.00E+008   2          8.00E+008   3     1.0E+10   8.00E+008

    Type III    N
                50        9.80E+001   2          9.80E+001   2          9.80E+001   3     1.0E+04   1.00E+002
                250       4.98E+002   2          4.98E+002   2          4.98E+002   3     1.0E+04   5.00E+002
                500       9.98E+002   2          9.98E+002   2          9.98E+002   3     1.0E+04   1.00E+003
                1000      2.00E+003   2          2.00E+003   2          2.00E+003   3     1.0E+04   2.00E+003
                2000      4.00E+003   2          4.00E+003   2          4.00E+003   3     1.0E+04   4.00E+003
                4000      8.00E+003   2          8.00E+003   2          8.00E+003   3     1.0E+04   8.00E+003

    Type IV     N
                50        3.04E+005   2          3.04E+005   2          3.04E+005   3     1.0E+11   3.04E+005
                250       1.68E+008   2          1.68E+008   2          1.68E+008   3     1.0E+06   1.68E+008
                500       2.65E+009   2          2.65E+009   2          2.65E+009   3     1.0E+06   2.65E+009
                1000      4.20E+010   2          4.20E+010   2          4.20E+010   2     1.0E+06   4.20E+010
                2000      6.69E+011   2          6.69E+011   2          6.69E+011   2     1.0E+06   6.69E+011
                4000      1.07E+013   2          1.07E+013   2          1.07E+013   2     1.0E+06   1.07E+013

    Type V      N
                50        9.80E+003   2          9.80E+003   2          9.80E+003   2     1.0E+04   9.80E+003
                250       2.49E+005   2          2.49E+005   2          2.49E+005   2     1.0E+04   2.49E+005
                500       9.98E+005   2          9.98E+005   2          9.98E+005   2     1.0E+04   9.98E+005
                1000      4.00E+006   2          4.00E+006   2          4.00E+006   2     1.0E+04   4.00E+006
                2000      1.60E+007   2          1.60E+007   2          1.60E+007   2     1.0E+04   1.60E+007
                4000      6.40E+007   2          6.40E+007   2          6.40E+007   2     1.0E+04   6.40E+007

    Type VI     N
                50        2.50E+007   2          2.50E+007   2          2.50E+007   2     1.0E+04   2.50E+007
                250       1.56E+010   2          1.56E+010   2          1.56E+010   2     1.0E+04   1.56E+010
                500       2.50E+011   2          2.50E+011   2          2.50E+011   2     1.0E+04   2.50E+011
                1000      4.00E+012   2          4.00E+012   2          4.00E+012   2     1.0E+04   4.00E+012
                2000      6.40E+013   2          6.40E+013   2          6.40E+013   2     1.0E+04   6.40E+013
                4000      1.02E+015   2          1.02E+015   2          1.02E+015   2     1.0E+04   1.02E+015

    Type VII    θ (N=4000)
                0.50      5.44E+013   3          5.43E+013   3          8.14E+013   3     1.0E+02   8.14E+013
                0.25      2.37E+013   3          2.37E+013   3          2.73E+013   3     1.0E+02   2.75E+013
                0.125     1.01E+013   4          1.01E+013   4          1.09E+013   3     1.0E+02   1.10E+013
                0.01      1.67E+011   3          1.67E+011   3          2.12E+011   3     1.0E+05   2.14E+011
                1.0E-03   5.68E+009   3          5.68E+009   3          6.89E+009   3     1.0E+06   6.99E+009
                1.0E-04   1.27E+008   3          1.27E+008   3          2.74E+008   4     1.0E+08   2.90E+008
                1.0E-05   7.14E+006   4          7.14E+006   4          8.10E+006   4     1.0E+08   8.15E+006
Table 2: (continuation) Condition number estimators.

                          Hager's Estimator      Cg Estimator           Spg Estimator
    Matrix      Name      Value       Nit        Value       Nit        Value       Nit   ηmax      Cn

    Type VIII   Matrix1   7.47E+004   2          7.47E+004   2          7.47E+004   2     1.0E+05   7.47E+004
                Matrix2   7.66E+005   2          7.66E+005   2          7.66E+005   2     1.0E+05   7.66E+005
                Matrix3   4.54E+007   3          4.45E+007   3          4.45E+007   3     1.0E+05   4.54E+007
                Matrix4   4.49E+004   2          4.49E+004   2          4.49E+004   2     1.0E+05   4.93E+004
                Matrix5   9.22E+005   2          9.22E+005   2          9.22E+005   2     1.0E+05   9.22E+005
                Matrix6   4.39E+007   2          4.39E+007   2          4.39E+007   2     1.0E+05   4.62E+007
                Matrix7   1.11E+010   2          1.11E+010   2          1.11E+010   2∗    1.0E+05   1.11E+010
                Matrix8   7.66E+009   2          7.66E+009   2          7.66E+009   2∗    1.0E+05   7.66E+009
                Matrix9   1.23E+004   2          1.23E+004   2          1.23E+004   3     1.0E+05   1.23E+004
                Matrix10  4.49E+004   2          4.49E+004   2          4.49E+004   3     1.0E+05   4.49E+004
                Matrix11  1.00E+007   2          1.00E+007   2          1.00E+007   2     1.0E+05   1.00E+007
                Matrix12  3.90E+005   2          3.90E+005   2          3.90E+005   2     1.0E+05   3.90E+005
                Matrix13  3.00E+008   2          3.00E+008   2          3.00E+008   3∗    1.0E+08   3.00E+008
                Matrix14  3.71E+005   2          3.71E+005   2          3.71E+005   3∗    1.0E+05   3.71E+005
                Matrix15  1.84E+010   2          1.84E+010   2          1.84E+010   2     1.0E+05   1.84E+010
                Matrix16  3.05E+010   2          3.05E+010   2          3.05E+010   2     1.0E+05   3.05E+010
                Matrix17  2.26E+002   2          2.26E+002   2          2.26E+002   2∗    1.0E+05   2.26E+002
The results show that the three algorithms obtain similar estimates for the condition numbers of the test matrices. The number of iterations for the three algorithms is always quite small. Furthermore, for the best values of ηmax, the Spg algorithm in general finds the best estimate for the condition number. The main drawback of the Spg algorithm is its dependence on the choice of the interval [ηmin, ηmax]. Our experience has shown that ηmin = 10^{-3} is in general a good choice. However, the choice of ηmax has an important effect on the computational effort. The results show that a value of ηmax of 10^4 or 10^5 works very well in many cases, but for other problems ηmax should be chosen larger. It is important to note that the choice of ηmax also affects the stationary point obtained by the Spg algorithm, and hence the corresponding condition number estimate. As suggested in [10], we decided to apply the Spg algorithm more than once, starting each new application from the barycenter of the canonical vertices not visited in the previous runs. The results are presented in Table 3, where Nap represents the number of applications of the Spg algorithm and Nit the total number of iterations over these applications. These results show that the procedure can improve the estimate in some cases where the true condition number had not been obtained by a single run of the Spg algorithm.
Table 3: Performance of the Spg algorithm with a different initial point.

                                       Previous       Spg Estimator
    Matrix                             estimate       Value        Nit   Nap    Cn

    Type III    N = 50                 9.80E+001      1.00E+002    5     2      1.00E+002
                N = 250                4.98E+002      5.00E+002    5     2      5.00E+002
                N = 500                9.98E+002      1.00E+003    5     2      1.00E+003

    Type VII    θ = 0.25 (N=4000)      2.73E+013      2.73E+013    6     2      2.75E+013
                θ = 0.125              1.09E+013      1.09E+013    23    7      1.10E+013
                θ = 0.01               2.12E+011      2.12E+011    6     2      2.14E+011
                θ = 1.0E-03            6.89E+009      6.89E+009    9     3      6.99E+009
                θ = 1.0E-04            2.74E+008      2.74E+008    16    4      2.90E+008
                θ = 1.0E-05            8.10E+006      8.10E+006    12    3      8.15E+006

    Type VIII   Matrix3                4.45E+07       4.54E+07     5     2      4.54E+07
                Matrix4                4.49E+04       4.93E+04     8     4      4.93E+04
                Matrix6                4.39E+07       4.60E+07     8     3      4.62E+07
6 Conclusions
The formulation of the condition number estimation problem was changed from a maximization over the unit sphere in the 1-norm to a maximization over the unit simplex. For this new constraint set, it is relatively easy to project a vector onto the feasible set. Hence, a projected gradient algorithm could be used to estimate the 1-norm condition number. Numerical experiments indicate that the so-called spectral projected gradient (Spg) algorithm [3] can yield a better condition number estimate than the previous algorithm, while the number of iterations increased by at most one.
References

[1] M. S. Bazaraa, H. D. Sherali, and C. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd ed., John Wiley & Sons, New York, 1993.

[2] D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, Massachusetts, 2003.

[3] E. G. Birgin, J. M. Martínez, and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets, SIAM Journal on Optimization, 10 (2000), pp. 1196-1211.

[4] E. G. Birgin, J. M. Martínez, and M. Raydan, Algorithm 813: SPG - software for convex-constrained optimization, ACM Transactions on Mathematical Software, 27 (2001), pp. 340-349.

[5] R. Boisvert, R. Pozo, K. Remington, B. Miller, and R. Lipman, Matrix Market, Mathematical and Computational Sciences Division, Information Technology Laboratory, National Institute of Standards and Technology. Available at http://math.nist.gov/MatrixMarket/formats.html, 1998.

[6] R. Cottle, J. Pang, and R. Stone, The Linear Complementarity Problem, Academic Press, New York, 1992.

[7] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.

[8] P. Gill, W. Murray, and M. Wright, Numerical Linear Algebra and Optimization, Addison-Wesley, New York, 1991.

[9] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, 1996.

[10] W. W. Hager, Condition estimates, SIAM Journal on Scientific and Statistical Computing, 5(2) (1984), pp. 311-316.

[11] W. W. Hager, Applied Numerical Linear Algebra, Prentice Hall, New Jersey, 1998.

[12] N. J. Higham, FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation, ACM Transactions on Mathematical Software, 14(4) (1988), pp. 381-396.

[13] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.

[14] C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, Cambridge University Press, London, 1990.

[15] J. J. Júdice, M. Raydan, S. S. Rosa, and S. A. Santos, On the solution of the symmetric eigenvalue complementarity problem by the spectral projected gradient algorithm, Numerical Algorithms, 44 (2008), pp. 391-407.

[16] C. Moler, J. N. Little, and S. Bangert, MATLAB User's Guide - The Language of Technical Computing, The MathWorks, Sherborn, Mass., 2001.

[17] K. Murty, Linear and Combinatorial Programming, Wiley, New York, 1976.

[18] K. Murty, Linear Complementarity, Linear and Nonlinear Programming, Heldermann Verlag, Berlin, 1988.