SIAM J. MATRIX ANAL. APPL. Vol. 32, No. 2, pp. 349–363
© 2011 Society for Industrial and Applied Mathematics
A LOW-RANK APPROXIMATION FOR COMPUTING THE MATRIX EXPONENTIAL NORM*

YURI M. NECHEPURENKO† AND MILOUD SADKANE‡
Abstract. This work is devoted to computing the function γ(t) = ‖exp(tA)‖₂ in a given time interval 0 ≤ t1 ≤ t ≤ t2, where A is a square matrix whose eigenvalues have negative real parts. The main emphasis is put on the computation of the maximal value of γ(t) for t ≥ 0. To speed up the computations, we propose and justify a new algorithm based on low-rank approximations of the matrix exponential and prove that it computes γ(t) with a given accuracy. We discuss its implementation and demonstrate its efficiency with some numerical experiments.

Key words. matrix exponential norm, maximal amplification, low-rank approximation, Schur decomposition

AMS subject classification. 65F30

DOI. 10.1137/100789774
1. Introduction. This paper is concerned with the development of an algorithm for the fast computation of the 2-norm of the matrix exponential,

(1.1)    γ(t) = ‖exp(tA)‖₂,

where A is a given n × n matrix whose eigenvalues have negative real parts and t is a varying nonnegative parameter which will be referred to as time. Note that the assumption on the eigenvalues of A implies that exp(tA) → 0 as t → ∞. This problem arises, for example, in the linear stability analysis of steady states of ordinary differential equations [3], [4], [10], where γ(t) complements more traditional characteristics of linear stability such as the largest real part of the eigenvalues and the maximal value of the resolvent norm on the imaginary axis [13]. Of particular interest are the maximal value of (1.1) and the smallest t, denoted hereafter by t_opt, for which the maximum

(1.2)    max_{t≥0} γ(t)
is attained. Their knowledge may be useful in transient growth and robust control investigations. In this paper we consider the case when n is not very large (around 1000 or less). Such problems arise, for instance, when studying the stability of a mechanical system whose number of degrees of freedom is not very large, or the stability of three-dimensional partial differential equations with steady states depending on only one or two spatial coordinates [11], [12].

*Received by the editors March 22, 2010; accepted for publication (in revised form) by N. J. Higham February 11, 2011; published electronically June 24, 2011. http://www.siam.org/journals/simax/32-2/78977.html
†Institute of Numerical Mathematics, Russian Academy of Sciences, ul. Gubkina 8, 119991 Moscow, Russia ([email protected]). This author's work was supported by the Russian Foundation for Basic Research (project 10-01-00513) and the Russian Academy of Sciences (project "Optimization of numerical algorithms for solving the problems of mathematical physics").
‡Laboratoire de Mathématiques, Université de Brest, CNRS UMR 6205, 6 Av. Le Gorgeu, 29238 Brest Cedex 3, France ([email protected]).
There are several methods for computing the matrix exponential and hence the norm in (1.1) [9], [10], [6], [7, Chap. 13], [1]. The conclusion in [10] is that the method of scaling and squaring [6] appears to be one of the best methods for computing the matrix exponential of a not very large general square matrix. It has recently been improved in [1] (see also [8]), but [6] is still the method used by the MATLAB function expm. However, if the norm in (1.1) is to be computed for many values of t, for instance, t = s1, ..., sN with sj − s_{j−1} ≡ τ and large N, then computations based only on the method of scaling and squaring become very expensive. One direct way to accelerate the computations is to use this method only for computing the matrices E1 = exp(s1 A) and F = exp(τA) and then use the recurrence formula E_j = F E_{j−1}. We discuss this approach in section 2. Despite the large acceleration in comparison with the use of the method of scaling and squaring at each grid point, the above computations can still be too expensive. The main goal of this paper is to speed up the computations further. To this end, we propose and justify in sections 3 and 4 a new algorithm based on low-rank approximations of the matrix exponential of the upper triangular matrix which results from the Schur decomposition of A. We discuss the main properties of this algorithm and prove, in particular, that it computes (1.1) with a given accuracy (Theorem 4.1).

We use the following matrices throughout the paper to illustrate the behavior of our algorithm.

Matrix 1. The upper bidiagonal matrix A of order n = 1000 whose nonzero entries are given by A_kk = −0.01k², k = 1, ..., n, and A_{k,k+1} = 1, k = 1, ..., n − 1.

Matrix 2. The Tolosa matrix of order n = 340 taken from the NEP collection.¹ It is obtained from a stability analysis of an aircraft structure.
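For concreteness, Matrix 1 and a pointwise evaluation of (1.1) can be set up in a few lines of MATLAB (a minimal sketch; only the matrix data come from the text, the variable names are ours):

    % Matrix 1: upper bidiagonal, A(k,k) = -0.01*k^2, A(k,k+1) = 1, n = 1000.
    n = 1000;
    A = diag(-0.01*(1:n).^2) + diag(ones(n-1,1), 1);

    % gamma(t) = ||exp(tA)||_2 at a single time t, with the matrix exponential
    % computed by scaling and squaring (MATLAB's expm).
    t = 100;
    gamma_t = norm(expm(t*A), 2);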
The known bound

(1.3)    γ(t) ≤ exp(tμ),    t ≥ 0,
where μ = μ(A) denotes the largest eigenvalue of the Hermitian matrix (A + A*)/2, and the equality

(1.4)    dγ/dt(0) = μ

(see, e.g., [2], [14], [13]) show that t_opt = 0 if μ ≤ 0 and t_opt > 0 otherwise. In the latter case, the graph of γ(t) has at least one hump, and the most interesting part of this function lies between 0 and the value
(1.5)    t_min = inf{t : γ(t) < 1}.
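Both μ(A) and the maximal real part of the eigenvalues, α(A), used in Table 1.1 below, are inexpensive to obtain for moderate n, so the test for t_opt > 0 is cheap (a sketch; we rely only on the definitions above):

    % mu(A): largest eigenvalue of the Hermitian part (A + A')/2;
    % alpha(A): maximal real part of the eigenvalues of A.
    mu    = max(real(eig((A + A')/2)));   % real() guards against rounding
    alpha = max(real(eig(A)));
    % gamma has a hump (t_opt > 0) exactly when mu > 0; t_opt = 0 when mu <= 0.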
Table 1.1 shows the above quantities and the maximal real part of the eigenvalues, α = α(A), for the considered matrices. The left part of Figure 1.1 shows the behavior of γ(t) for Matrix 1. This is a typical behavior of γ(t) for a matrix A whose eigenvalues have negative real parts and μ > 0: the function decreases to zero after an initial growth. A more intricate behavior, shown in Figure 1.1 on the right, is demonstrated by γ(t) for Matrix 2, which has eigenvalues with large imaginary parts. In this case, γ(t) oscillates and has many humps.
¹See http://math.nist.gov/MatrixMarket/.
TABLE 1.1
The quantities μ, t_opt, γ(t_opt), t_min, and the maximal real part of the eigenvalues α = α(A) for the considered matrices.

    Matrix      1            2
    α           −0.01        −0.156
    μ           0.791        1.01 × 10⁵
    t_opt       80.4         3.0 × 10⁻³
    γ(t_opt)    9.30 × 10⁴   304
    t_min       1.25 × 10³   7.11
FIG. 1.1. Function γ(t) for Matrices 1 and 2 (left: Matrix 1, γ(t) up to about 10⁵ for t ∈ [0, 800]; right: Matrix 2, γ(t) up to about 350 for t ∈ [0, 0.25]).
Throughout the paper we use the MATLAB notation for submatrices. The notation 0 denotes the zero matrix of appropriate size, and X* denotes the conjugate transpose of a matrix X.

2. Direct algorithm. For a given n × n matrix A whose eigenvalues have negative real parts, let A = QSQ* be the Schur decomposition [5], where Q is unitary and S is upper triangular with diagonal elements ordered in nonincreasing order of their real parts. Then μ(A) = μ(S) and

(2.1)    γ(t) = ‖exp(tS)‖₂.
In this section, we discuss the direct method mentioned in the introduction and apply it to S instead of A. This replacement makes the computations faster when N ≫ n, since matrix-by-matrix multiplications and the computation of the 2-norm are less expensive with triangular matrices than with general square ones. In the next section, we show that this computation can be made significantly faster still by using the upper triangular structure of S and applying a low-rank approximation. Suppose we want to compute the maximal value of the function γ(t) in (2.1) for t ≥ 0; i.e., we want to find γ(t_opt), where

(2.2)    t_opt = min arg max_{t≥0} γ(t).
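In MATLAB, the ordered Schur form used above can be obtained, for example, as follows (a sketch; ordschur places eigenvalues with larger cluster index earlier on the diagonal):

    [Q, S] = schur(A, 'complex');          % A = Q*S*Q', S upper triangular
    d = real(diag(S));
    [~, pos] = sort(d, 'ascend');
    clusters = zeros(size(d));
    clusters(pos) = 1:numel(d);            % largest real part gets largest index
    [Q, S] = ordschur(Q, S, clusters);     % nonincreasing real parts on diagonal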
Assume that μ = μ(S) > 0 and that we know t1 and t2 such that t1 ≤ t_opt ≤ t2. We define the grid

(2.3)    t1 = s1 < ⋯ < sN

with a fixed step s_j − s_{j−1} = τ, 0 < τ ≤ t2 − t1, and choose N such that s_{N−1} < t2 ≤ sN. Due to (1.3), we have

(2.4)    γ(s_j) ≤ max_{s_j ≤ t ≤ s_{j+1}} γ(t) ≤ γ(s_j) exp(ϵ),

where ϵ = τμ. This leads to the following algorithm.

ALGORITHM 1.
Choose t2 > t1 ≥ 0 and ϵ > 0. Set τ = ϵ/μ.
Compute F = exp(τS), E1 = exp(s1 S) using [6], and γ(s1) = ‖E1‖₂.
for j = 2, 3, ...
    Terminate if s_{j−1} > t2.
    Compute E_j = F E_{j−1} and γ(s_j) = ‖E_j‖₂.
end
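For illustration, Algorithm 1 admits a direct MATLAB transcription (a minimal sketch under the assumptions above; the function name, the growing arrays, and the use of eig for μ(S) are our choices, and the loop already uses the combined termination condition (2.5) introduced below):

    function [g, s] = alg1(S, t1, t2, eps_)
    % Algorithm 1: gamma(s_j) on the uniform grid (2.3) via E_j = F*E_{j-1}.
    mu  = max(real(eig((S + S')/2)));   % mu(S), assumed positive
    tau = eps_/mu;
    F = expm(tau*S);
    E = expm(t1*S);
    s = t1; g = norm(E, 2);
    j = 2;
    while s(j-1) <= t2 && g(j-1) >= 1   % combined condition (2.5)
        E = F*E;                        % E_j = exp(s_j S)
        s(j) = s(j-1) + tau;
        g(j) = norm(E, 2);
        j = j + 1;
    end
    end

For instance, [g, s] = alg1(S, 0, Inf, 1e-3) runs until γ drops below 1, as in Example 2.1 below.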
To apply Algorithm 1 we need to choose t1 and t2. The most obvious way is to take t1 = 0 and t2 ≥ t_min, with t_min defined in (1.5). In this case we have γ(t) ≤ γ(t − t2)γ(t2) < γ(t − t2) ≤ γ(t_opt) for all t ≥ t2, and therefore t2 > t_opt. We can obtain the desired t2 from Algorithm 1, where the termination condition s_{j−1} > t2 is replaced by γ(s_{j−1}) < 1. Combining these two termination conditions, we obtain the condition

(2.5)    s_{j−1} > t2  or  γ(s_{j−1}) < 1,

which allows the use of an upper bound t2 for t_opt if it is available; otherwise we set t2 = ∞ and terminate with the single condition γ(s_{j−1}) < 1.

The computational cost of Algorithm 1 is about c n³ N arithmetic operations with some multiplicative constant c larger than one. (Each iteration necessitates the computation of the matrix norm ‖E_j‖₂ and the matrix-by-matrix product F E_{j−1}; each of these computations requires O(n³) arithmetic operations; see [5].) In all our experiments presenting the computational cost, we will omit the constant c. To estimate the cost in a typical situation, consider the following example.

Example 2.1. Let A be Matrix 1. This matrix is already in Schur form with the desired ordering of its eigenvalues, so S = A. We have μ ≈ 0.791 and t_min = 1.25 × 10³. If we set ϵ = 10⁻³, then τ ≈ 1.26 × 10⁻³, and by Algorithm 1 with the termination condition (2.5), t1 = 0 and t2 = ∞, we obtain N ≈ 10⁶. This leads to a computational cost of about 10¹⁵ arithmetic operations. We may conclude that Algorithm 1 is prohibitively expensive for such computations.
Example 2.2. If we take t1 = 75 and t2 = 85 (a good localization of t_opt = 80.4) and use the same ϵ as in Example 2.1, then N ≈ 8 × 10³ and the computational cost of Algorithm 1 decreases to 8 × 10¹², which is still expensive.

3. Low-rank approximation. Let the upper triangular matrix S be partitioned as

    S = [S11, S12; 0, S22]

with square blocks S11 and S22. Then

    exp(tS) = [exp(tS11), X(t); 0, exp(tS22)]

with some matrix X(t). Due to the assumption on the real parts of the eigenvalues of S, if t2 ≫ 0, then we can find some grid point s* in (2.3) such that δ* = ‖exp(s* S22)‖₂ is small enough, and therefore the following low-rank approximation is possible:

(3.1)    exp(s* S) ≈ [exp(s* S11), X(s*); 0, 0] = [B*; 0] U*,

where [exp(s* S11), X(s*)] = B* U* with a square upper triangular B* of the same order k as S11, k < n, and a k × n unitary rectangular U*; i.e., the rows of U* are orthonormal. Since

    exp((t − s*)S) [B*; 0] = [exp((t − s*)S11) B*; 0],

we obtain

    ‖exp((t − s*)S) [B*; 0] U*‖₂ = ‖exp((t − s*)S11) B*‖₂,

and the computation of γ(t) at t > s* can be reduced to that of

(3.2)    γ̃(t) = ‖exp((t − s*)S11) B*‖₂.

The error can be estimated as follows:

(3.3)    |γ(t) − γ̃(t)| ≤ ‖exp((t − s*)S) [0, 0; 0, exp(s* S22)]‖₂ ≤ γ(t_opt) δ*.
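In MATLAB terms, the reduction behind (3.1)–(3.2) amounts to keeping only the first k rows of exp(s* S) (a sketch; S, s_star, and k are assumed given, and we use the block row E(1:k,:) directly instead of the compressed factorization B* U* employed later for efficiency):

    E = expm(s_star*S);                 % exp(s_star S), upper triangular
    S11 = S(1:k, 1:k);
    % gamma_tilde(t) for t > s_star, cf. (3.2):
    gamma_tilde = @(t) norm(expm((t - s_star)*S11) * E(1:k, :), 2);
    % The discarded part has norm delta_star = norm(E(k+1:end, k+1:end), 2),
    % which bounds the error as in (3.3).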
Note that the estimate (3.3) may not be valid for t < s*, since the norm of exp(sS) is unbounded as s → −∞. Consider the function

(3.4)    kω(t) = min({k : ‖exp(tS(k+1:n, k+1:n))‖₂ ≤ ω, 0 ≤ k ≤ n − 1} ∪ {n}).

At t = s*, kω(s*) is the smallest order k for which the error of the approximation (3.1) satisfies δ* ≤ ω. For a given matrix S, the smaller the function kω(s*) is, the more successful the approximation (3.1) is.
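A straightforward (and deliberately naive) way to evaluate (3.4) pointwise is the following sketch; it calls expm once per candidate k, so it is for illustration only:

    function k = k_omega(S, t, omega)
    % Smallest k with ||exp(t*S(k+1:n, k+1:n))||_2 <= omega, else k = n.
    n = size(S, 1);
    for k = 0:n-1
        if norm(expm(t*S(k+1:n, k+1:n)), 2) <= omega
            return
        end
    end
    k = n;
    end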
Figure 3.1 shows kω(t) with ω = 10⁻⁴ for Matrices 1 and 2.

FIG. 3.1. Function kω(t) for Matrices 1 and 2 with ω = 10⁻⁴ (left: Matrix 1, k on a logarithmic scale for t ∈ [0, 100]; right: Matrix 2, t ∈ [0, 1]).

As we can see from this figure, in both cases the use of
low-rank approximations can give a large acceleration of the direct algorithm in the considered time intervals.

4. Algorithm based on the low-rank approximation. The idea is to use the approximation (3.1) only a few times during the direct algorithm with the termination condition (2.5) and to estimate γ(t) via (3.2). A formal description of the resulting algorithm is given in Algorithm 2. We assume that μ(S) > 0 (since otherwise t_opt = 0) and t1 ≤ t_opt ≤ t2.

ALGORITHM 2.
Choose t2 > t1 ≥ 0, ϵ > 0, 0 ≤ tol ≤ 1/2, p > 1, and an integer r ≥ 1.
Set k0 = n, s1 = t1, τ = ϵ/μ(S), i = 1, tol1 = tol(p − 1)/p, δ = 0.
Compute F = exp(τS) and E1 = exp(s1 S) using [6], and γ̃(s1) = ‖E1‖₂.
for j = 2, 3, ...
    Terminate if
    (4.1)    s_{j−1} > t2  or  γ̃(s_{j−1}) < 1 − δ.
    if tol > 0 and mod(j − 2, r) = 0
        Find the minimal k_i < k_{i−1} such that the decomposition
        (4.2)    E_{j−1} = [B_i; 0] U_i + Δ_i
        is possible with a k_i × k_i upper triangular matrix B_i, a k_i × k_{i−1} unitary rectangular matrix U_i, and δ_i = ‖Δ_i‖₂ ≤ tol_i.
        if k_i exists
            Compute μ = μ(S̃) for the leading k_i × k_i submatrix S̃ of S and terminate if
            (4.3)    μ ≤ 0.
            Otherwise, compute the matrix B_i in (4.2), replace E_{j−1} by B_i, and compute τ = ϵ/μ and F = exp(τS̃). Set δ = δ + δ_i, i = i + 1, tol_i = tol_{i−1}/p.
        endif
    endif
    Compute E_j = F E_{j−1}, s_j = s_{j−1} + τ, and γ̃(s_j) = ‖E_j‖₂.
end
If tol = 0, then Algorithm 2 works as Algorithm 1 with the termination condition (2.5) and computes γ̃(t) = γ(t) at the uniform grid (2.3). Otherwise it computes a function γ̃(t) which approximates γ(t). In this case the grid is not uniform, and its step τ changes during the computations. The function γ̃(t) is computed at the points of this grid, and we define it for all t ≥ t1 as follows:

(4.4)    γ̃(t) = ‖exp((t − s_j)S̃)E_j‖₂,    s_j ≤ t < s_{j+1},    j = 1, ..., N,

with the leading submatrix S̃ of S of the same size as E_j, where s1 = t1, sN is the largest grid point, and by convention s_{N+1} = ∞.

Before each iteration E_j = F E_{j−1} with mod(j − 2, r) = 0, if the termination condition (4.1) is not satisfied, Algorithm 2 checks whether the already computed matrix E_{j−1} admits the decomposition (4.2) with k_i < k_{i−1}. If the decomposition does not exist, the iterations continue as before until the next step j with mod(j − 2, r) = 0. If the decomposition is obtained, the algorithm computes the value μ = μ(S̃) of the leading k_i × k_i submatrix S̃ of S. If the termination condition (4.3) is not satisfied, the algorithm continues the iterations E_j = F E_{j−1} with the new τ = ϵ/μ and new matrices F and E_{j−1} of smaller order. Otherwise, the algorithm terminates, since in this situation the function γ̃(t) cannot increase for t ≥ s_{j−1}. Note that if the second inequality in (4.1) is not satisfied, then, due to the condition 0 ≤ tol ≤ 1/2, we have ‖E_{j−1}‖₂ > 1 − δ > 1/2 and tol_i < 1/2, and therefore k_i > 0. Otherwise, if this inequality is satisfied, then (as is shown below) s_{j−1} > t_opt.

The matrix B_i in (4.2) may be computed as follows. We start by looking for the largest trailing block row of the upper triangular matrix E_{j−1} whose norm does not exceed tol_i; since the first k columns of rows k + 1, ..., k_{i−1} are zero, this norm equals ‖E_{j−1}(k+1:k_{i−1}, k+1:k_{i−1})‖₂. We therefore decrease k from k_{i−1} for as long as ‖E_{j−1}(k+1:k_{i−1}, k+1:k_{i−1})‖₂ ≤ tol_i holds, which yields the smallest such k. To minimize the computational cost, we use the cheaper Frobenius norm instead of the 2-norm for as long as it stays below tol_i. A corresponding algorithm can be written as follows:

    Set k = k_{i−1} and compute e′ = |E_{j−1}(k_{i−1}, k_{i−1})|².
    while e′ ≤ (tol_i)²        (here e′ = ‖E_{j−1}(k:k_{i−1}, k:k_{i−1})‖_F²)
        e = e′, k = k − 1
        e′ = e′ + E_{j−1}(k, k:k_{i−1}) E_{j−1}(k, k:k_{i−1})*
    end
    e′ = ‖E_{j−1}(k:k_{i−1}, k:k_{i−1})‖₂        (switch to the 2-norm)
    while e′ ≤ tol_i
        e = e′, k = k − 1
        e′ = ‖E_{j−1}(k:k_{i−1}, k:k_{i−1})‖₂
    end
    Set k_i = k.

If k_i = k_{i−1}, we conclude that the desired decomposition (4.2) does not exist. Otherwise, we compute a square upper triangular k_i × k_i matrix B_i such that E_{j−1}(1:k_i, :) = B_i U_i, where U_i is unitary rectangular, and set δ_i = e. The matrix B_i can be computed by applying the standard QR decomposition to the conjugate transpose of E_{j−1}(1:k_i, :) with reversed rows: E_{j−1}(k_i:−1:1, :)* = QR, where Q is k_{i−1} × k_i unitary rectangular and R is a k_i × k_i upper triangular matrix. The desired matrix is B_i = R(k_i:−1:1, k_i:−1:1)*. Note that the matrix U_i in (4.2) is not needed in Algorithm 2, and therefore we can use the less expensive version of the QR decomposition algorithm [5], where only the matrix R is computed.
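Putting the pieces together, Algorithm 2, the truncation search, and the row-reversal QR construction just described admit the following MATLAB sketch (an illustration under the stated assumptions, not the authors' code; the names alg2_sketch and trunc_order are ours, and the Frobenius-norm prescreen is omitted for brevity):

    function [g, s, delta] = alg2_sketch(S, t1, t2, eps_, tol, p, r)
    % Sketch of Algorithm 2. E holds E_j; its order shrinks after each
    % successful decomposition (4.2).
    mu  = max(real(eig((S + S')/2)));          % mu(S), assumed positive
    tau = eps_/mu;
    F = expm(tau*S); E = expm(t1*S);
    s = t1; g = norm(E, 2);
    tol_i = tol*(p - 1)/p; delta = 0;
    j = 2;
    while ~(s(j-1) > t2 || g(j-1) < 1 - delta)         % termination (4.1)
        k_prev = size(E, 1);
        if tol > 0 && mod(j-2, r) == 0
            [k, d_i] = trunc_order(E, tol_i);          % search described above
            if k < k_prev                              % decomposition (4.2) exists
                St = S(1:k, 1:k);                      % leading submatrix of S
                mu = max(real(eig((St + St')/2)));
                if mu <= 0, break; end                 % termination (4.3)
                [~, R] = qr(E(k:-1:1, :)', 0);         % row-reversal QR
                E = R(k:-1:1, k:-1:1)';                % B_i, k-by-k upper triangular
                tau = eps_/mu; F = expm(tau*St);
                delta = delta + d_i; tol_i = tol_i/p;
            end
        end
        E = F*E;                                       % E_j = F*E_{j-1}
        s(j) = s(j-1) + tau; g(j) = norm(E, 2);
        j = j + 1;
    end
    end

    function [k, d] = trunc_order(E, tol_i)
    % Smallest k with ||E(k+1:m, k+1:m)||_2 <= tol_i; k = m means no reduction.
    m = size(E, 1); k = m; d = 0;
    while k > 1
        e = norm(E(k:m, k:m), 2);
        if e > tol_i, break; end
        d = e; k = k - 1;
    end
    end

If the factor U_i of (4.2) is wanted (Algorithm 2 itself does not need it), it can be recovered from the same QR factors: with [Q, R] = qr(E(k:-1:1, :)', 0), one has B_i = R(k:-1:1, k:-1:1)' and U_i = Q(:, k:-1:1)', so that E(1:k, :) = B_i*U_i up to rounding.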
The corresponding matrix

    Δ_i = [0; E_{j−1}(k_i+1:k_{i−1}, :)]

in (4.2) is not needed either, but its norm δ_i is added to δ, which is an output variable of Algorithm 2 and is used in the next section to estimate the approximation error.

4.1. Approximation error. Denote by i_max the total number of successful low-rank approximations (those with k_i < k_{i−1}). Assume that i_max ≥ 1 and that Algorithm 2 computes γ̃(s_j), j = 1, ..., N, and terminates. Denote by N_1, ..., N_{i_max} the steps of the algorithm (values of j) at which the approximations are carried out, and set N_{i_max+1} = N + 1.

THEOREM 4.1. Under the above assumptions, the following estimates hold:

(4.5)    |γ̃(t) − γ(t)| ≤ γ(t_opt) δ,    δ = Σ_{i=1}^{i_max} δ_i ≤ tol,    t ≥ t1.
Moreover, if Algorithm 2 terminates due to the condition (4.1), then sN > t_opt. Otherwise, if it terminates due to the condition (4.3), then the function γ̃(t) does not increase for t ≥ sN.

Proof. From Algorithm 2 we have, by construction, exp(s_j S) = E_j for 1 ≤ j ≤ N_1 − 1 and

    exp(s_j S) = exp((s_j − s_{N_1−1})S) ([B_1; 0] U_1 + Δ_1) = [E_j; 0] U_1 + exp((s_j − s_{N_1−1})S) Δ_1

for N_1 ≤ j ≤ N_2 − 1. Proceeding this way, it is easy to check that for N_i ≤ j ≤ N_{i+1} − 1 and 1 ≤ i ≤ i_max, we have

(4.6)    exp(s_j S) = [E_j; 0] U_i ⋯ U_1 + exp((s_j − s_{N_1−1})S) Δ_1 + Σ_{i′=2}^{i} exp((s_j − s_{N_{i′}−1})S) [Δ_{i′}; 0] U_{i′−1} ⋯ U_1,

with the convention that the sum Σ_{i′=2}^{i} equals 0 when i = 1. Multiplying (4.6) on the left by exp((t − s_j)S) and taking into account that

    ‖exp((t − s_j)S) [E_j; 0] U_i ⋯ U_1‖₂ = γ̃(t),    s_j ≤ t < s_{j+1},

and δ ≤ Σ_{i=1}^{i_max} tol_i ≤ tol, we obtain (4.5). In the same way, setting j = N, we show that if Algorithm 2 terminates with the condition γ̃(s_{j−1}) < 1 − δ, then

(4.7)    γ(t) < γ(t_opt),    t ≥ sN,

and therefore sN > t_opt. The last proposition of the theorem follows directly from (4.4) and (1.3). ▯
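Once Algorithm 2 has produced its outputs, the bound (4.5) can be spot-checked numerically (a sketch; g, s, and delta denote the outputs of the alg2_sketch given earlier, eps_ the step parameter, and the unknown γ(t_opt) is majorized via (4.10) below):

    % Compare gamma_tilde(s_q) with gamma(s_q) = ||exp(s_q S)||_2 at a few points.
    bound = max(g)*exp(eps_)/(1 - delta)*delta;    % >= gamma(t_opt)*delta by (4.10)
    for q = round(linspace(1, numel(s), 5))
        err = abs(g(q) - norm(expm(s(q)*S), 2));
        fprintf('t = %.3g: error %.2e, bound %.2e\n', s(q), err, bound);
    end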
Let us show now that the values γ̃(s1), ..., γ̃(sN) computed with Algorithm 2 allow us to find γ(t_opt) with a given accuracy. Note that, similarly to (2.4), the function γ̃(t) in (4.4) satisfies

(4.8)    γ̃(s_j) ≤ max_{s_j ≤ t ≤ s_{j+1}} γ̃(t) ≤ γ̃(s_j) exp(ϵ).

This covers all t1 ≤ t ≤ sN, with sN > t_opt, if Algorithm 2 terminates because of the condition (4.1), and all t ≥ t1 if it terminates because of the condition (4.3).

THEOREM 4.2. Let

(4.9)    s_o = min arg max_{t = s1, ..., sN} γ̃(t).

Then

(4.10)    γ̃(s_o)/(1 + δ) ≤ γ(t_opt) ≤ γ̃(s_o) exp(ϵ)/(1 − δ).

Proof. Note first that, as follows from the second part of Theorem 4.1 and the second inequality in (4.8), for each termination case we have

(4.11)    γ̃(t_opt) ≤ max_{s1 ≤ t ≤ sN} γ̃(t) ≤ γ̃(s_o) exp(ϵ).

Inequality (4.5) implies that

(4.12)    (a) γ̃(t) − γ(t) ≤ γ(t_opt) δ,    (b) γ(t) − γ̃(t) ≤ γ(t_opt) δ.

From (4.12a) we obtain

    γ̃(s_o) ≤ γ(s_o) + γ(t_opt) δ ≤ γ(t_opt)(1 + δ),

which yields the first inequality in (4.10). From (4.12b) we have γ(t_opt)(1 − δ) ≤ γ̃(t_opt), and using (4.11), the second inequality in (4.10) follows. ▯

4.2. Computational cost. The total number of successful low-rank approximations i_max is smaller than n, and if i_max is much smaller than N, then the computational cost of Algorithm 2 can be estimated as

(4.13)    n³(N_1 − 1) + Σ_{i=1}^{i_max} k_i³ (N_{i+1} − N_i)

multiplied by the same multiplicative constant c as for Algorithm 1. We will omit this constant as before.

The computational cost of all low-rank approximations is Σ_{i=1}^{i_max} k_{i−1} k_i² with a multiplicative constant smaller than c. In all computations with our algorithm we have i_max ≪ N, and therefore the computational cost of the low-rank approximations is negligible compared to that of the main computations. For this reason, we do not take it into account.
Example 4.1. If we apply Algorithm 2 to Matrix 1 with ϵ = 10⁻³, tol = 10⁻³, r = 10, p = 1.1, t1 = 0, and t2 = ∞, then the cost in (4.13) is about 10¹⁰. We obtain an acceleration of 10⁵ in comparison with Example 2.1, while on output δ = 7.55 × 10⁻⁴ (the decrease of the orders k_i, i = 0, 1, ..., is shown in Figure 4.1 on the left).

Example 4.2. Taking t1 = 75 and t2 = 85 and keeping the other parameters, we obtain the cost 10⁹ and an acceleration of 8 × 10³ in comparison with Example 2.2, and δ = 1.15 × 10⁻¹¹ (the low-rank approximation is carried out only at j = 2 and reduces the order to 4).

Algorithm 2 terminates at about t = 683 due to the condition (4.3) in Example 4.1, and due to the first condition in (4.1) in Example 4.2. As follows from Theorem 4.2, in each case the obtained value of δ is small enough for computing γ(t_opt) with almost the same accuracy as with Algorithm 1. If we need to compute γ(t) with a high accuracy at all grid points, we need, according to Theorem 4.1, to choose tol ≪ γ(t_opt). If we set tol = 10⁻⁸, then the same computations as in Example 4.1 give us the accuracy δ = 6.89 × 10⁻⁹ with almost the same cost, equal to 1.2 × 10¹⁰.

The above examples suggest two main ways of using Algorithm 2: computations of γ(t) with a high accuracy (small tol) and large steps (large ϵ) for a rough investigation of the behavior of this function, and computations of γ(t) with small steps (small ϵ) and an appropriate tol for finding t_opt and γ(t_opt) with a high accuracy once a small interval which contains t_opt has been found. Moreover, the first computations can provide a time interval for the second ones.

Example 4.3. If we apply Algorithm 2 to the Schur form of Matrix 2 with ϵ = 0.5, tol = 10⁻⁴, r = 10, p = 1.1, t1 = 0, and t2 = ∞, then the cost in (4.13) is about 2.4 × 10¹², while on output δ = 2.44 × 10⁻⁵. The decrease of the orders is shown in Figure 4.1 on the right. We obtain an acceleration of at least 23.

FIG. 4.1. Order reduction in Examples 4.1 and 4.3 (left: Matrix 1, k on a logarithmic scale for t ∈ [0, 800]; right: Matrix 2, t ∈ [0, 10]).

In Example 4.3 we obtained values of γ̃(s) at the points 0 = s1 < ⋯ < sN. If we reject from the time interval [s1, sN] the subintervals [s_j, s_{j+1}) where γ̃(s_o) > γ̃(s_j) exp(ϵ)/(1 − δ), with s_o defined in (4.9), then we obtain for Matrix 2 an interval of about (1.1 × 10⁻³, 8.7 × 10⁻³). According to (4.8) and Theorem 4.2, this interval contains t_opt. Figure 4.2 shows the values of η(t) = γ̃(t) exp(ϵ)/(1 − δ) in a neighborhood of this interval. So, Example 4.3 shows that 1.1 × 10⁻³ < t_opt < 8.7 × 10⁻³. For a further localization of t_opt, we apply Algorithm 2 once more with ϵ = 0.05, t1 = 1.1 × 10⁻³, and t2 = 8.7 × 10⁻³, keeping the other parameters the same. We obtain a new localization interval of about (2.3 × 10⁻³, 3.7 × 10⁻³). In this computation, the low-rank approximation could not be carried out (kω(t) ≡ n for the considered time interval and ω = tol1 ≈ 10⁻⁵), and on output we have δ = 0. The computational cost is about 6.0 × 10¹¹.

FIG. 4.2. Localization of t_opt for Matrix 2 (η(t) and the level γ̃(s_o) for t ∈ [2 × 10⁻³, 8 × 10⁻³]).

4.3. Parameters r, p, and tol. In comparison with Algorithm 1, Algorithm 2 has three additional parameters: r, p, and tol. These parameters can significantly influence the computational cost and accuracy. To illustrate this, we apply Algorithm 2 to Matrices 1 and 2 with t1 = 0, t2 = ∞, and different values of r, p, and tol. Tables 4.1–4.7 show the cost measured by (4.13), the accuracy δ, the number N of time steps required by the algorithm, and the final value sN of t at which the algorithm stopped. In Tables 4.1, 4.2, 4.3, 4.6, and 4.7 the parameter tol is fixed and ϵ varies, whereas in Tables 4.4 and 4.5, ϵ is fixed and tol varies. The results are discussed after the tables.

TABLE 4.1
Matrix 1, ϵ = 10⁻¹, tol = 10⁻⁴, t1 = 0, t2 = ∞.

    r     p     Cost          δ              N      sN
    1     1.1   1.00 × 10⁹    3.40 × 10⁻⁵    3664   6.36 × 10²
    1     1.5   1.00 × 10⁹    5.58 × 10⁻⁵    5351   9.26 × 10²
    1     2     1.00 × 10⁹    8.67 × 10⁻⁵    7211   1.24 × 10³
    10    1.1   1.00 × 10¹⁰   1.84 × 10⁻¹⁰   3360   5.83 × 10²
    10    1.5   1.00 × 10¹⁰   5.00 × 10⁻¹⁰   3930   6.83 × 10²
    10    2     1.00 × 10¹⁰   1.61 × 10⁻⁹    4680   8.13 × 10²
    100   1.1   1.00 × 10¹¹   7.52 × 10⁻¹¹   3300   5.72 × 10²
    100   1.5   1.00 × 10¹¹   7.52 × 10⁻¹¹   3400   5.93 × 10²
    100   2     1.00 × 10¹¹   6.83 × 10⁻¹¹   3600   6.29 × 10²
TABLE 4.2
Matrix 1, ϵ = 10⁻², tol = 10⁻⁴, t1 = 0, t2 = ∞.

    r     p     Cost          δ              N       sN
    1     1.1   1.08 × 10⁹    7.12 × 10⁻⁵    43608   7.55 × 10²
    1     1.5   1.11 × 10⁹    9.44 × 10⁻⁵    75686   1.25 × 10³
    1     2     1.15 × 10⁹    9.77 × 10⁻⁵    85306   1.25 × 10³
    10    1.1   1.00 × 10¹⁰   3.40 × 10⁻⁵    36640   6.36 × 10²
    10    1.5   1.00 × 10¹⁰   5.58 × 10⁻⁵    53510   9.26 × 10²
    10    2     1.00 × 10¹⁰   8.67 × 10⁻⁵    72110   1.24 × 10³
    100   1.1   1.00 × 10¹¹   1.84 × 10⁻¹⁰   33600   5.83 × 10²
    100   1.5   1.00 × 10¹¹   5.00 × 10⁻¹⁰   39300   6.83 × 10²
    100   2     1.00 × 10¹¹   1.61 × 10⁻⁹    46800   8.13 × 10²
TABLE 4.3
Matrix 1, ϵ = 10⁻³, tol = 10⁻⁴, t1 = 0, t2 = ∞.

    r     p     Cost          δ             N        sN
    1     1.1   3.84 × 10⁹    9.43 × 10⁻⁵   600674   1.03 × 10³
    1     1.5   6.27 × 10⁹    9.79 × 10⁻⁵   912120   1.25 × 10³
    1     2     9.80 × 10⁹    9.81 × 10⁻⁵   967524   1.25 × 10³
    10    1.1   1.08 × 10¹⁰   7.12 × 10⁻⁵   436080   7.55 × 10²
    10    1.5   1.11 × 10¹⁰   9.44 × 10⁻⁵   756852   1.25 × 10³
    10    2     1.15 × 10¹⁰   9.77 × 10⁻⁵   853055   1.24 × 10³
    100   1.1   1.00 × 10¹¹   3.40 × 10⁻⁵   366400   6.36 × 10²
    100   1.5   1.00 × 10¹¹   5.58 × 10⁻⁵   535100   9.26 × 10²
    100   2     1.00 × 10¹¹   8.67 × 10⁻⁵   721100   1.24 × 10³
TABLE 4.4
Matrix 1, ϵ = 10⁻², tol = 10⁻³, t1 = 0, t2 = ∞.

    r     p     Cost          δ             N       sN
    1     1.1   1.06 × 10⁹    7.55 × 10⁻⁴   39400   6.83 × 10²
    1     1.5   1.08 × 10⁹    9.38 × 10⁻⁴   74273   1.25 × 10³
    1     2     1.13 × 10⁹    9.44 × 10⁻⁴   83721   1.25 × 10³
    10    1.1   1.00 × 10¹⁰   2.61 × 10⁻⁴   32850   5.71 × 10²
    10    1.5   1.00 × 10¹⁰   7.44 × 10⁻⁴   48370   8.38 × 10²
    10    2     1.00 × 10¹⁰   8.27 × 10⁻⁴   65700   1.13 × 10³
    100   1.1   1.00 × 10¹¹   2.10 × 10⁻⁸   30100   5.24 × 10²
    100   1.5   1.00 × 10¹¹   1.22 × 10⁻⁷   35400   6.17 × 10²
    100   2     1.00 × 10¹¹   1.21 × 10⁻⁷   41400   7.21 × 10²
TABLE 4.5
Matrix 1, ϵ = 10⁻², tol = 10⁻⁶, t1 = 0, t2 = ∞.

    r     p     Cost          δ              N       sN
    1     1.1   1.13 × 10⁹    7.42 × 10⁻⁷    51603   8.91 × 10²
    1     1.5   1.16 × 10⁹    9.22 × 10⁻⁷    78071   1.25 × 10³
    1     2     1.22 × 10⁹    9.46 × 10⁻⁷    88297   1.25 × 10³
    10    1.1   1.00 × 10¹⁰   2.50 × 10⁻⁷    43940   7.61 × 10²
    10    1.5   1.00 × 10¹⁰   5.21 × 10⁻⁷    63220   1.10 × 10³
    10    2     1.00 × 10¹⁰   7.49 × 10⁻⁷    74543   1.25 × 10³
    100   1.1   1.00 × 10¹¹   1.96 × 10⁻¹⁴   40600   7.03 × 10²
    100   1.5   1.00 × 10¹¹   9.09 × 10⁻¹⁴   47300   8.20 × 10²
    100   2     1.00 × 10¹¹   8.65 × 10⁻¹⁴   55600   9.63 × 10²
TABLE 4.6
Matrix 2, ϵ = 0.5, tol = 10⁻⁴, t1 = 0, t2 = ∞.

    r     p     Cost          δ             N        sN
    1     1.1   2.41 × 10¹²   4.75 × 10⁻⁵   192260   7.1088
    1     1.5   2.97 × 10¹²   2.37 × 10⁻⁵   225133   7.1088
    1     2     3.68 × 10¹²   9.58 × 10⁻⁶   260612   7.1088
    10    1.1   2.42 × 10¹²   2.41 × 10⁻⁵   191948   7.1088
    10    1.5   2.99 × 10¹²   2.37 × 10⁻⁵   238052   7.1088
    10    2     3.60 × 10¹²   7.04 × 10⁻⁶   257377   7.1088
    100   1.1   2.38 × 10¹²   2.11 × 10⁻⁵   191771   7.1088
    100   1.5   3.05 × 10¹²   2.86 × 10⁻⁵   228258   7.1088
    100   2     3.58 × 10¹²   2.50 × 10⁻⁵   255432   7.1088
TABLE 4.7
Matrix 2, ϵ = 0.05, tol = 10⁻⁴, t1 = 0, t2 = ∞.

    r     p     Cost          δ             N         sN
    1     1.1   2.40 × 10¹³   6.92 × 10⁻⁵   1923185   7.1088
    1     1.5   2.97 × 10¹³   3.45 × 10⁻⁵   2245163   7.1088
    1     2     3.82 × 10¹³   4.25 × 10⁻⁶   2790963   7.1088
    10    1.1   2.41 × 10¹³   4.75 × 10⁻⁵   1922592   7.1088
    10    1.5   2.97 × 10¹³   2.37 × 10⁻⁵   2251324   7.1088
    10    2     3.68 × 10¹³   9.58 × 10⁻⁶   2606111   7.1088
    100   1.1   2.42 × 10¹³   2.41 × 10⁻⁵   1919478   7.1088
    100   1.5   2.99 × 10¹³   2.37 × 10⁻⁵   2380516   7.1088
    100   2     3.60 × 10¹³   7.04 × 10⁻⁶   2573763   7.1088
From these tables we see that the computational cost is minimized when r and p are small. When r is small, the decomposition (4.2) is attempted more frequently, which leads to matrices of smaller size. Likewise, the smaller p is, the larger the value of tol_i for large i, which in turn allows the decomposition (4.2) to be carried out more often at the later stages of the computations. Of course, a small ϵ increases the number N and hence the computational cost, but it leads to a better localization of t_opt. A small tol guarantees a good accuracy, since we always have δ ≤ tol.

In all these examples, the order reductions resemble those in Figure 4.1. For example, in the case ϵ = 10⁻², tol = 10⁻⁴, r = 10, and p = 1.1 (see Table 4.2), the order (i.e., the size of the matrix E_j in Algorithm 2) is equal to 1000 during the first ten iterations, to 95 during the next ten iterations, to 68 during the next ten iterations, etc. In the last 27750 iterations, the order is equal to 2. The computational cost can be decomposed as follows: 10 × 1000³ + 10 × 95³ + 10 × 68³ + ⋯ + 27750 × 2³ ≈ 1.0025 × 10¹⁰. It is dominated by the computational cost before the first successful reduction, which is almost independent of tol and ϵ in the considered range of these parameters. For Matrix 2, the cost seems less dependent on r, and the reduction is spread over the iterations. For example, in the case ϵ = 0.5, tol = 10⁻⁴, r = 10, and p = 1.1 (see Table 4.6), the order is equal to 340 during the first 17750 iterations, to 339 during the next 4880 iterations, to 337 during the next 4710 iterations, etc. In the last 2288 iterations, the order is equal to 29. The computational cost can be decomposed as follows: 17750 × 340³ + 4880 × 339³ + ⋯ + 11120 × 31³ + 5240 × 30³ + 2288 × 29³ ≈ 2.4175 × 10¹². Unlike for Matrix 1, this cost is not dominated by the one before the first successful reduction.
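As a plausibility check, such partial sums of the cost model (4.13) are easy to reproduce (a sketch; the stage history below contains only the first terms quoted above for Table 4.2, so the result is a lower bound on the quoted total of about 1.0025 × 10¹⁰):

    % Cost (4.13): iterations spent at each order, times order^3.
    orders = [1000 95 68];          % first three stages (middle stages omitted)
    iters  = [10 10 10];
    cost = sum(iters .* orders.^3) + 27750*2^3;   % plus the final stage at order 2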
The parameter tol guarantees some approximation accuracy, and its choice is dictated by the goal of the computations, while, as the above experiments show, the parameters r and p mostly influence the computational cost and can be chosen freely. We have no general formula for estimating the computational cost as a function of these parameters. However, the following observation may help a user estimate the execution time before computations with small steps, which can be too expensive, and choose the optimal value of p. Since the low-rank approximations are carried out at t = t1, t = t1 + rτ, t = t1 + 2rτ, ..., two runs, one with r = r0 and ϵ = ϵ0 and the other with r = q r0 and ϵ = ϵ0/q, where q is a positive integer, lead to the same low-rank approximations at the same values of t, and therefore to the same error δ, if the other parameters are kept the same. The computational cost in (4.13) will be q times larger in the second case; see, for example, the case r = 1, p = 1.1 in Table 4.1 and the case r = 10, p = 1.1 in Table 4.2. This suggests setting r = 1 first, with an ϵ that is q times larger than needed, to measure the execution time and minimize it by varying p. The execution time of the computations with the needed ϵ and r = q will then be approximately q times longer.

5. Conclusions. In this paper we have developed a new algorithm, based on low-rank approximations of the matrix exponential, for computing the function γ(t) in (1.1). We have shown that the algorithm computes this function with a given accuracy and significantly faster than the direct approach which does not use the low-rank approximations.

Further work is needed to characterize the classes of matrices for which the algorithm works very fast and those for which the acceleration is not large. The function kω(t) in (3.4) is a useful tool for describing such classes, but it remains to connect it with traditional and widely used characteristics of matrices. The parameters r and p significantly influence the computational cost. In this paper we have proposed a practical approach to their choice; the optimal choice of these parameters for a given matrix A is another topic for further work.

Acknowledgment. We would like to thank the referees for their helpful comments.

REFERENCES

[1] A. H. AL-MOHY AND N. J. HIGHAM, A new scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 970–989.
[2] G. DAHLQUIST, Stability and Error Bounds in the Numerical Integration of Differential Equations, Trans. Royal Institute of Technology, No. 130, Stockholm, Sweden, 1959.
[3] J. L. M. VAN DORSSELAER, J. F. B. M. KRAAIJEVANGER, AND M. N. SPIJKER, Linear stability analysis in the numerical solution of initial value problems, Acta Numer., 2 (1993), pp. 199–237.
[4] S. K. GODUNOV, Ordinary Differential Equations with Constant Coefficients, Transl. Math. Monogr. 169, AMS, Providence, RI, 1997.
[5] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
[6] N. J. HIGHAM, The scaling and squaring method for the matrix exponential revisited, SIAM J. Matrix Anal. Appl., 26 (2005), pp. 1179–1193.
[7] N. J. HIGHAM, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, 2008.
[8] N. J. HIGHAM, The scaling and squaring method for the matrix exponential revisited, SIAM Rev., 51 (2009), pp. 747–764.
[9] C. B. MOLER AND C. F. VAN LOAN, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev., 20 (1978), pp. 801–836.
[10] C. B. MOLER AND C. F. VAN LOAN, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003), pp. 3–49.
[11] P. J. SCHMID AND D. S. HENNINGSON, Stability and Transition in Shear Flows, Springer-Verlag, Berlin, 2000.
[12] L. N. TREFETHEN, A. E. TREFETHEN, S. C. REDDY, AND T. A. DRISCOLL, Hydrodynamic stability without eigenvalues, Science, 261 (1993), pp. 578–584.
[13] L. N. TREFETHEN AND M. EMBREE, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press, Princeton, NJ, 2005.
[14] C. F. VAN LOAN, The sensitivity of the matrix exponential, SIAM J. Numer. Anal., 14 (1977), pp. 971–981.