Applied Mathematics and Computation 219 (2012) 729–740
Low-rank approximation to the solution of a nonsymmetric algebraic Riccati equation from transport theory

Peter Chang-Yi Weng^a, Hung-Yuan Fan^b,*, Eric King-wah Chu^a

^a School of Mathematical Sciences, Building 28, Monash University 3800, Australia
^b Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan

* Corresponding author. E-mail addresses: [email protected] (P. Chang-Yi Weng), [email protected] (H.-Y. Fan), [email protected] (E. King-wah Chu).
Abstract

We consider the solution of the large-scale nonsymmetric algebraic Riccati equation $XCX - XD - AX + B = 0$ from transport theory (Juang 1995), with $M \equiv [D, -C; -B, A] \in \mathbb{R}^{2n \times 2n}$ being a nonsingular M-matrix. In addition, $A$ and $D$ are rank-1 updates of diagonal matrices, with the products $A^{-1}u$, $A^{-\top}u$, $D^{-1}v$ and $D^{-\top}v$ computable in $O(n)$ complexity, for some vectors $u$ and $v$, and $B$, $C$ are rank 1. The structure-preserving doubling algorithm by Guo et al. (2006) is adapted, with the appropriate applications of the Sherman–Morrison–Woodbury formula and the sparse-plus-low-rank representations of various iterates. The resulting large-scale doubling algorithm has an $O(n)$ computational complexity and memory requirement per iteration and converges essentially quadratically, as illustrated by the numerical examples.

© 2012 Elsevier Inc. All rights reserved.

Keywords: Doubling algorithm; Krylov subspace; M-matrix; Nonsymmetric algebraic Riccati equation
1. Introduction

Consider the nonsymmetric algebraic Riccati equation (NARE)

$$\mathcal{R}(X) \equiv XCX - XD - AX + B = 0, \tag{1}$$

with $A = \Delta - eq^\top$, $B = ee^\top$, $C = qq^\top$ and $D = \Gamma - qe^\top$ being $n \times n$, where $\Delta = \mathrm{diag}\{\delta_1, \ldots, \delta_n\}$, $\Gamma = \mathrm{diag}\{\gamma_1, \ldots, \gamma_n\}$, $e = [1, \ldots, 1]^\top$, $q = [q_1, \ldots, q_n]^\top$ and $\delta_i, \gamma_i, q_i > 0$ ($i = 1, \ldots, n$). From [14], for the solvability of (1), we assume that the matrix

$$M = \begin{bmatrix} D & -C \\ -B & A \end{bmatrix} \in \mathbb{R}^{2n \times 2n} \tag{2}$$
is a nonsingular M-matrix. We are interested in the minimal nonnegative solution $X$ [14]. Motivated by the low-ranked cases from the applications in transport theory [14,21,22], we further assume that $n$ is large; thus $A$ and $D$ are sparse-like (with the products $A^{-1}u$, $A^{-\top}u$, $D^{-1}v$ and $D^{-\top}v$ computable in $O(n)$ complexity, for arbitrary vectors $u$ and $v$). We adapt the structure-preserving doubling algorithm (SDA) [12] to the NARE in (1), resulting in a large-scale doubling algorithm (SDA_ls) with an $O(n)$ computational complexity and memory requirement per iteration. Note that the orthodox SDA in [12] has a computational complexity of $O(n^3)$ per iteration. Also, Newton's algorithm and the iterative methods in [3,11] are of $O(n^2)$ computational complexity per iteration, and appear to be incapable of benefitting from the low-rank structures in (1) and the numerically low-ranked $X$ (implied constructively by the SDA; see the discussion at the end of Section 2.1).
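To make the setting concrete, the following MATLAB sketch (our own illustration) builds a small random instance of (1) with the structure above and checks that $M$ in (2) is a nonsingular M-matrix; the sampling of $\delta_i$, $\gamma_i$, $q_i$ is hypothetical, chosen only so that $M$ is strictly diagonally dominant, and differs from the quadrature-based construction in [14].

```matlab
% A small random instance of the NARE (1); the sampling below is purely
% illustrative (not the quadrature-based construction of [14]), chosen so
% that M in (2) is strictly diagonally dominant, hence a nonsingular M-matrix.
n     = 8;
q     = rand(n,1);                      % q_i > 0
delta = 2*n + 1 + rand(n,1);            % delta_i > 0, large enough for dominance
gamma = 2*n + 1 + rand(n,1);            % gamma_i > 0
e     = ones(n,1);
A = diag(delta) - e*q';                 % A = Delta - e*q'
D = diag(gamma) - q*e';                 % D = Gamma - q*e'
B = e*e';  C = q*q';                    % rank-1 B and C
M = [D, -C; -B, A];                     % the 2n-by-2n matrix in (2)
% Z-matrix with eigenvalues in the right half-plane => nonsingular M-matrix.
assert(all(all(M - diag(diag(M)) <= 0)) && all(real(eig(M)) > 0));
```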
The fact that $X = T \circ (uv^\top)$, the Hadamard product of the Cauchy matrix $T = \bigl[(\delta_i + \gamma_j)^{-1}\bigr]$ and the rank-1 matrix $uv^\top$, is clever and appropriate in previous work, but is inhibiting for problems of large size $n$ here.

More generally, algebraic Riccati equations arise in many important applications, including the total least squares problems with or without symmetric constraints [4], the spectral factorizations of rational matrix functions [5], linear and nonlinear optimal control [2], contractive rational matrix functions [19], the structured complex stability radius [13], transport theory [14,21,22], the Wiener–Hopf factorization of Markov chains [33], and the optimal solutions of linear differential systems [20]. Symmetric algebraic Riccati equations have been the topic of extensive research, and the theory, applications and numerical solution of these equations are the subject of the monographs [20,31]. The minimal positive solution to the NARE in (1), for medium-size problems without the sparseness and low-rank assumptions, has been studied recently by many authors, employing functional iterations, Newton's method and the structure-preserving algorithm; for example, see [1,3,7–12,14,21,22,28,29,32] and the references therein. There have also been generalizations of (1) in [21,22], for one-dimensional multi-state and two-dimensional problems. Finally, for the condition and error analysis of NAREs, see the recently published [26].

2. Large-scale doubling algorithm

2.1. Main ideas

Sharing the same philosophy as [23–25], we apply the Sherman–Morrison–Woodbury formula to avoid the inversion of large or unstructured matrices, and use sparse-plus-low-rank matrices to represent iterates when appropriate. Also, some matrix operators are computed recursively, to preserve the corresponding sparsity or low-rank structures, instead of being formed explicitly. If necessary, we compress and truncate fast-growing components in the iterates, trading a negligible amount of accuracy for better efficiency. Together with the careful organization of convergence control in the algorithm, we obtain an $O(n)$ computational complexity and memory requirement per iteration for our algorithm. Recall the low-rank decompositions
$$B = ee^\top, \qquad C = qq^\top.$$
The SDA [12] has the form

$$E_{k+1} = E_k(I - G_kH_k)^{-1}E_k, \qquad F_{k+1} = F_k(I - H_kG_k)^{-1}F_k,$$
$$G_{k+1} = G_k + E_k(I - G_kH_k)^{-1}G_kF_k, \qquad H_{k+1} = H_k + F_k(I - H_kG_k)^{-1}H_kE_k, \tag{3}$$

with $E_0 = E_s$, $F_0 = F_s$, $G_0 = G_s$ and $H_0 = H_s$,
where

$$E_s = I - 2sV_s^{-1}, \quad F_s = I - 2sW_s^{-1}, \quad G_s = 2sD_s^{-1}CW_s^{-1}, \quad H_s = 2sW_s^{-1}BD_s^{-1}, \tag{4}$$

the parameter $s \geq \max_{i,j}\{a_{ii}, d_{jj}\}$, and

$$A_s = A + sI, \quad D_s = D + sI, \quad W_s = A_s - BD_s^{-1}C, \quad V_s = D_s - CA_s^{-1}B. \tag{5}$$
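For small $n$, the doubling recursion (3)–(5) can be coded directly on dense matrices. The following MATLAB function is a minimal dense sketch of our own (no low-rank structure exploited, $O(n^3)$ flops per iteration), useful as a reference when checking the structured algorithm developed below; the shift `s` is assumed to satisfy the condition after (4).

```matlab
% Dense SDA (3)-(5) for R(X) = X*C*X - X*D - A*X + B = 0; a reference
% implementation for small n only, not the structured SDA_ls developed below.
function X = sda_dense(A, B, C, D, s, tol)
  n = size(A,1);  I = eye(n);
  As = A + s*I;  Ds = D + s*I;
  Ws = As - B*(Ds\C);  Vs = Ds - C*(As\B);                 % (5)
  E = I - 2*s*inv(Vs);  F = I - 2*s*inv(Ws);               % (4)
  G = 2*s*(Ds\C)/Ws;    H = 2*s*(Ws\B)/Ds;                 % (4)
  for k = 1:50                                             % hard iteration cap
    IGH = I - G*H;  IHG = I - H*G;
    Enew = E*(IGH\E);          Fnew = F*(IHG\F);           % (3)
    Gnew = G + E*(IGH\G)*F;    Hnew = H + F*(IHG\H)*E;
    done = norm(Hnew - H, 'fro') <= tol*max(1, norm(Hnew, 'fro'));
    E = Enew;  F = Fnew;  G = Gnew;  H = Hnew;
    if done, break, end
  end
  X = H;    % H_k -> X, the minimal nonnegative solution
end
```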
From (3), we can summarize the basis of our algorithm in one sentence: $(I - G_kH_k)^{-1}$ and $(I - H_kG_k)^{-1}$ are low-rank updates of the identity, and $E_k$, $F_k$, $G_k$ and $H_k$ are low-rank updates of $E_{k-1}^2$, $F_{k-1}^2$, $G_{k-1}$ and $H_{k-1}$, respectively. This, together with the compression and truncation of the Krylov subspaces described in Section 2.3, forms the basis of the SDA for the large-scale NARE in (1). Similar to the approach in [23–25], we apply the Sherman–Morrison–Woodbury formula (SMWF)

$$(\widetilde{A} - U\widetilde{C}V)^{-1} = \widetilde{A}^{-1} + \widetilde{A}^{-1}U\bigl(\widetilde{C}^{-1} - V\widetilde{A}^{-1}U\bigr)^{-1}V\widetilde{A}^{-1}$$

to the various inverses of matrices in sparse-plus-low-rank (splr) form required in the SDA. With $A$ and $D$ being sparse-like, $A_s$ and $D_s$ can be "inverted" (or the corresponding linear equations solved) efficiently in $O(n)$ complexity. Consequently, $W_s^{-1}$ and $V_s^{-1}$ are rank-1 updates of $A_s^{-1}$ and $D_s^{-1}$, respectively:

$$W_s^{-1} = A_s^{-1} + c_W A_s^{-1}eq^\top A_s^{-1}, \qquad V_s^{-1} = D_s^{-1} + c_V D_s^{-1}qe^\top D_s^{-1}, \tag{6}$$

with the constants

$$c_W \equiv \frac{d_s}{1 - d_s a_s}, \quad d_s \equiv e^\top D_s^{-1}q; \qquad c_V \equiv \frac{a_s}{1 - a_s d_s}, \quad a_s \equiv q^\top A_s^{-1}e.$$
Denoting $\Delta_s \equiv \Delta + sI$ and $\Gamma_s \equiv \Gamma + sI$, we have

$$A_s^{-1} = \Delta_s^{-1} + c_A\,\Delta_s^{-1}eq^\top\Delta_s^{-1}, \qquad c_A \equiv (1 - \tilde{\delta}_s)^{-1}, \qquad \tilde{\delta}_s \equiv q^\top\Delta_s^{-1}e,$$
$$D_s^{-1} = \Gamma_s^{-1} + c_D\,\Gamma_s^{-1}qe^\top\Gamma_s^{-1}, \qquad c_D \equiv (1 - \tilde{\gamma}_s)^{-1}, \qquad \tilde{\gamma}_s \equiv q^\top\Gamma_s^{-1}e. \tag{7}$$
Consequently, for arbitrary vectors $u$ and $v$, the products $A_s^{-1}u$, $A_s^{-\top}u$, $D_s^{-1}v$ and $D_s^{-\top}v$, or $V_s^{-1}u$, $V_s^{-\top}u$, $W_s^{-1}v$ and $W_s^{-\top}v$, can all be computed in $O(n)$ complexity. Indeed, from (7), we see that

$$A_s^{-1}e = c_A(\Delta_s^{-1}e), \quad q^\top A_s^{-1} = c_A(q^\top\Delta_s^{-1}), \quad D_s^{-1}q = c_D(\Gamma_s^{-1}q), \quad e^\top D_s^{-1} = c_D(e^\top\Gamma_s^{-1}). \tag{8}$$
Furthermore, it follows from (6)–(8) that

$$a_s = q^\top A_s^{-1}e = c_A q^\top\Delta_s^{-1}e = c_A\tilde{\delta}_s = c_A\Bigl(1 - \frac{1}{c_A}\Bigr) = c_A - 1,$$
$$d_s = e^\top D_s^{-1}q = c_D e^\top\Gamma_s^{-1}q = c_D\tilde{\gamma}_s = c_D\Bigl(1 - \frac{1}{c_D}\Bigr) = c_D - 1.$$

Hence, the scalar coefficients $c_V$ and $c_W$ in (6) become

$$c_V = \frac{a_s}{1 - a_s d_s} = \frac{c_A - 1}{1 - (c_A - 1)(c_D - 1)} = \frac{c_A - 1}{c_A + c_D - c_Ac_D},$$
$$c_W = \frac{d_s}{1 - d_s a_s} = \frac{c_D - 1}{1 - (c_A - 1)(c_D - 1)} = \frac{c_D - 1}{c_A + c_D - c_Ac_D}. \tag{9}$$
Consequently, it follows from (8) and (9) that

$$W_s^{-1} = \Delta_s^{-1} + c_s(\Delta_s^{-1}e)(q^\top\Delta_s^{-1}), \qquad V_s^{-1} = \Gamma_s^{-1} + c_s(\Gamma_s^{-1}q)(e^\top\Gamma_s^{-1}), \tag{10}$$

with

$$c_s = \frac{c_Ac_D}{c_A + c_D - c_Ac_D}.$$

Substituting (10) into (4), we obtain

$$E_s = I - 2s\Gamma_s^{-1} - 2sc_s(\Gamma_s^{-1}q)(e^\top\Gamma_s^{-1}), \qquad F_s = I - 2s\Delta_s^{-1} - 2sc_s(\Delta_s^{-1}e)(q^\top\Delta_s^{-1}),$$
$$G_s = 2sc_s(\Gamma_s^{-1}q)(q^\top\Delta_s^{-1}), \qquad H_s = 2sc_s(\Delta_s^{-1}e)(e^\top\Gamma_s^{-1}). \tag{11}$$

Note that the matrices $E_s$ and $F_s$ have the form of a diagonal matrix plus a rank-1 update, and $G_s$, $H_s$ are rank 1. All four matrices can be computed in $O(n)$ complexity, depending only on the four vectors $\Delta_s^{-1}e$, $\Delta_s^{-1}q$, $\Gamma_s^{-1}e$ and $\Gamma_s^{-1}q$, as well as the two inner products $\tilde{\delta}_s$ and $\tilde{\gamma}_s$ in (7).
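Since $\Delta_s$ and $\Gamma_s$ are diagonal, (7)–(11) can be set up with a handful of vector operations. The following MATLAB sketch is our own illustration, assuming `delta`, `gamma`, `q` and `e` are given $n$-vectors; all variable names are ours.

```matlab
% O(n) set-up of (7)-(11); Delta_s and Gamma_s are kept as vectors dsv, gsv.
dsv = delta + s;   gsv = gamma + s;       % diag(Delta_s), diag(Gamma_s)
vDe = e./dsv;  vDq = q./dsv;              % Delta_s^{-1}e, Delta_s^{-1}q
vGe = e./gsv;  vGq = q./gsv;              % Gamma_s^{-1}e, Gamma_s^{-1}q
cA = 1/(1 - q'*vDe);                      % c_A in (7)
cD = 1/(1 - q'*vGe);                      % c_D in (7)
cs = cA*cD/(cA + cD - cA*cD);             % scalar in (10)
% Initial low-rank factors (24): H_s = B10*R0*B20', G_s = C10*T0*C20'
B10 = vDe;  B20 = vGe;  R0 = 2*s*cs;
C10 = vGq;  C20 = vDq;  T0 = 2*s*cs;
% O(n) application of E_s and F_s from (11), never forming them:
Es_mult = @(u) u - 2*s*(u./gsv) - (2*s*cs)*vGq*(e'*(u./gsv));
Fs_mult = @(v) v - 2*s*(v./dsv) - (2*s*cs)*vDe*(q'*(v./dsv));
```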
From [12], as $k \to \infty$, we have the convergence result

$$E_k, F_k \to 0, \qquad H_k \to X, \qquad G_k \to Y,$$

with $Y$ being the solution to the dual NARE. Together with the formulae derived above, it is clear that $H_k$ and $G_k$ are numerically low-ranked, with diminishing components being added as $k$ increases. The exponential growth of these small components in $H_k$ and $G_k$ can be controlled as in [23,25], by compressing and truncating the low-ranked components. Importantly, the computational complexity and memory requirement of the resulting algorithm will be $O(n)$ per iteration.

2.2. Algorithm

For $k = 0, 1, \ldots$, we organize the SDA so that the iterates have the recursive forms
$$G_k = C_{1k}T_kC_{2k}^\top, \qquad H_k = B_{1k}R_kB_{2k}^\top, \tag{12}$$

$$E_k = E_{k-1}^2 + E_{1k}E_{2k}^\top, \qquad F_k = F_{k-1}^2 + F_{1k}F_{2k}^\top, \tag{13}$$

with $E_{ik} \in \mathbb{R}^{n \times r_{E_k}}$ and $F_{ik} \in \mathbb{R}^{n \times r_{F_k}}$ ($i = 1, 2$), and the kernels $R_k \in \mathbb{R}^{m_k \times m_k}$ and $T_k \in \mathbb{R}^{l_k \times l_k}$. We shall see later that $r_{E_k} = m_k$ and $r_{F_k} = l_k$ from (20a)–(23), after the truncation and compression of $B_{ik}$ and $C_{ik}$ ($i = 1, 2$) in Section 2.3. (Note that $H_k = B_{1k}R_kB_{2k}^\top$, contrasting with $H_k = C_kT_k^{-1}C_k^\top$ in [23–25], due to the inconsistent use of symbols for different algebraic Riccati equations and the use of inverses to represent kernels.)

We compute products like $E_ku$, $E_k^\top u$, $F_kv$ and $F_k^\top v$, for vectors $u$ and $v$, by applying (13) recursively (see the sketch below). Without actually forming $E_k$ and $F_k$, we avoid any possible deterioration of their sparsity or other structures as $k$ grows, and preserve the $O(n)$ computational complexity of the algorithm. As a trade-off, we need to store $B_{ik}$, $R_k$, $C_{ik}$ and $T_k$ for all previous $k$, as we shall see below.
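The recursion (13) translates directly into a recursive matrix-vector product. The following MATLAB sketch (our own illustration, with the factors $E_{1j}$, $E_{2j}$ stored in cell arrays `E1`, `E2`) applies $E_k$ to a vector without ever forming $E_k$:

```matlab
% Apply E_k to u via (13): E_k u = E_{k-1}(E_{k-1}u) + E1{k}(E2{k}'*u).
% E1, E2 are cell arrays of the stored n-by-m_j factors; Es_mult applies
% E_0 = E_s in O(n), as in the sketch after (11).  Illustrative only.
function y = apply_E(k, u, E1, E2, Es_mult)
  if k == 0
    y = Es_mult(u);
  else
    y = apply_E(k-1, apply_E(k-1, u, E1, E2, Es_mult), E1, E2, Es_mult) ...
        + E1{k}*(E2{k}'*u);
  end
end
```

One application of $E_k$ thus costs $O(2^k)$ applications of $E_s$ plus the low-rank corrections, which is the source of the $2^{k+2}$ factor in the operation counts of Table 1.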
Applying the SMWF again, we obtain the low-rank updates of the identity:

$$(I - G_kH_k)^{-1} = I + G_kB_{1k}R_k\bigl(I_{m_k} - B_{2k}^\top G_kB_{1k}R_k\bigr)^{-1}B_{2k}^\top \tag{14a}$$
$$= I + C_{1k}\bigl(I_{l_k} - T_kC_{2k}^\top H_kC_{1k}\bigr)^{-1}T_kC_{2k}^\top H_k \tag{14b}$$

and

$$(I - H_kG_k)^{-1} = I + B_{1k}\bigl(I_{m_k} - R_kB_{2k}^\top G_kB_{1k}\bigr)^{-1}R_kB_{2k}^\top G_k \tag{15a}$$
$$= I + H_kC_{1k}T_k\bigl(I_{l_k} - C_{2k}^\top H_kC_{1k}T_k\bigr)^{-1}C_{2k}^\top. \tag{15b}$$
From (3) and (14a)–(15b), we can choose the matrices in (13) recursively as

$$B_{1,k+1} = [B_{1k}, F_kB_{1k}], \qquad B_{2,k+1} = [B_{2k}, E_k^\top B_{2k}], \tag{16}$$

$$R_{k+1} = R_k \oplus \Bigl[R_k + \bigl(I_{m_k} - R_kB_{2k}^\top G_kB_{1k}\bigr)^{-1}R_kB_{2k}^\top G_kB_{1k}R_k\Bigr] \tag{17a}$$
$$= R_k \oplus \Bigl[R_k + R_kB_{2k}^\top C_{1k}T_k\bigl(I_{l_k} - C_{2k}^\top H_kC_{1k}T_k\bigr)^{-1}C_{2k}^\top B_{1k}R_k\Bigr], \tag{17b}$$

$$C_{1,k+1} = [C_{1k}, E_kC_{1k}], \qquad C_{2,k+1} = [C_{2k}, F_k^\top C_{2k}], \tag{18}$$

$$T_{k+1} = T_k \oplus \Bigl[T_k + T_kC_{2k}^\top B_{1k}R_k\bigl(I_{m_k} - B_{2k}^\top G_kB_{1k}R_k\bigr)^{-1}B_{2k}^\top C_{1k}T_k\Bigr] \tag{19a}$$
$$= T_k \oplus \Bigl[T_k + \bigl(I_{l_k} - T_kC_{2k}^\top H_kC_{1k}\bigr)^{-1}T_kC_{2k}^\top H_kC_{1k}T_k\Bigr], \tag{19b}$$

$$E_{1,k+1} = E_kG_kB_{1k}R_k\bigl(I_{m_k} - B_{2k}^\top G_kB_{1k}R_k\bigr)^{-1} \tag{20a}$$
$$= E_kC_{1k}\bigl(I_{l_k} - T_kC_{2k}^\top H_kC_{1k}\bigr)^{-1}T_kC_{2k}^\top B_{1k}R_k, \tag{20b}$$

$$E_{2,k+1} = E_k^\top B_{2k} \tag{21}$$

and

$$F_{1,k+1} = F_kB_{1k}\bigl(I_{m_k} - R_kB_{2k}^\top G_kB_{1k}\bigr)^{-1}R_kB_{2k}^\top C_{1k}T_k \tag{22a}$$
$$= F_kH_kC_{1k}T_k\bigl(I_{l_k} - C_{2k}^\top H_kC_{1k}T_k\bigr)^{-1}, \tag{22b}$$

$$F_{2,k+1} = F_k^\top C_{2k}. \tag{23}$$
For the initial values as in (3), we have $E_0 = E_s$, $F_0 = F_s$, $G_0 = G_s = C_{10}T_0C_{20}^\top$ and $H_0 = H_s = B_{10}R_0B_{20}^\top$, with

$$B_{10} = \Delta_s^{-1}e, \quad B_{20} = \Gamma_s^{-1}e, \quad R_0 = 2sc_s; \qquad C_{10} = \Gamma_s^{-1}q, \quad C_{20} = \Delta_s^{-1}q, \quad T_0 = 2sc_s, \tag{24}$$
from (11).

Ultimately, from (4)–(6), we see that the SDA is dominated by the computation of products like $E_ku$, $E_k^\top u$, $F_kv$ and $F_k^\top v$, for arbitrary vectors $u$ and $v$. By applying (13) recursively, products like $E_su$, $E_s^\top u$, $F_sv$ and $F_s^\top v$ have to be computed. This can be realized using (4)–(6) in $O(n)$ complexity and memory requirement, because of our assumptions on $A$, $B$, $C$ and $D$.

Similar to the SDA_ls in [23–25] for large-scale CAREs, DAREs, and Stein and Lyapunov equations, in our SDA_ls for NAREs the convergence of the doubling iteration competes with the exponential growth in $B_{ik}$ and $C_{ik}$. The success of the SDA_ls depends on the trade-off between the accuracy of the approximate solution and its CPU-time and memory requirements, controlled by the compression and truncation of $B_{ik}$ and $C_{ik}$ in Section 2.3.

The SDA_ls for NAREs realizes the iteration in (3) with the help of (4), (6), (11), (12) and (13), and the formulae in (16)–(23) (involving small inverses of size $m_k$, ignoring the size-$l_k$ alternatives), with the initial values in (24), the compression and truncation in (25) and (26) in Section 2.3, and the convergence control in Section 3.1. We summarize the SDA_ls in Algorithm 1 below.
Algorithm 1 (SDA_ls)

Input: $A, B, C, D \in \mathbb{R}^{n \times n}$; shift $s$; positive tolerances $\tau_B$, $\tau_C$ and $\epsilon$; upper bounds $m_{\max}$, $l_{\max}$.
Output: $B_i \in \mathbb{R}^{n \times m}$, $R \in \mathbb{R}^{m \times m}$, $C_i \in \mathbb{R}^{n \times l}$ and $T \in \mathbb{R}^{l \times l}$ ($i = 1, 2$), with $B_1RB_2^\top$ and $C_1TC_2^\top$ approximating, respectively, the solutions $X$ and $Y$ of the large-scale NARE in (1) and its dual.

Compute $\Delta_s^{-1}e$, $\Gamma_s^{-1}e$, $\Delta_s^{-1}q$, $\Gamma_s^{-1}q$, $e^\top(\Gamma_s^{-1}q)$ and $q^\top(\Delta_s^{-1}e)$, for $B_{i0}$, $C_{i0}$ ($i = 1, 2$), $R_0$, $T_0$, and $E_0 = E_s$, $F_0 = F_s$, $G_0 = G_s$, $H_0 = H_s$ (as in (3), (11), (24));
Set $k = 0$, $\tilde{r}_0 = 2\epsilon$;
Do until convergence:
    If the relative residual $\tilde{r}_k = r_k/(h_k + \tilde{m}_k + h) < \epsilon$:
        Set $B_i = B_{ik}$, $R = R_k$, $C_i = C_{ik}$ and $T = T_k$ ($i = 1, 2$); Exit;
    End If
    Compute $B_{1,k+1} = [B_{1k}, F_kB_{1k}]$, $B_{2,k+1} = [B_{2k}, E_k^\top B_{2k}]$,
        $R_{k+1} = R_k \oplus [R_k + (I_{m_k} - R_kB_{2k}^\top G_kB_{1k})^{-1}R_kB_{2k}^\top G_kB_{1k}R_k]$,
        $C_{1,k+1} = [C_{1k}, E_kC_{1k}]$, $C_{2,k+1} = [C_{2k}, F_k^\top C_{2k}]$,
        $T_{k+1} = T_k \oplus [T_k + T_kC_{2k}^\top B_{1k}R_k(I_{m_k} - B_{2k}^\top G_kB_{1k}R_k)^{-1}B_{2k}^\top C_{1k}T_k]$,
        with $E_k = E_{k-1}^2 + E_{1k}E_{2k}^\top$, $F_k = F_{k-1}^2 + F_{1k}F_{2k}^\top$ (as in (13), (20a)–(23));
    Compress $B_{i,k+1}$ and $C_{i,k+1}$ ($i = 1, 2$) using the tolerances $\tau_B$ and $\tau_C$, and modify $R_{k+1}$ and $T_{k+1}$, as in (25) and (26);
    Compute $k \leftarrow k + 1$; $r_k$, $h_k$ and $\tilde{m}_k$, as in (34) in Section 3.1;
End Do
2.3. Truncation and compression of $B_{ik}$ and $C_{ik}$

Let us start by saying that the truncation and compression process described in this section is unnecessary in most circumstances. It is described for the situation when the convergence of the SDA is slow in comparison with the exponential growth in the dimensions of the iterates $G_k$ and $H_k$. In this situation, the numerical rank of $X$ will be high and we obviously cannot achieve high accuracy in the approximation $H_k$ of $X$ by any method. We then have to compromise its accuracy for the sake of less memory and CPU-time consumption. We should then either choose larger tolerances for the truncation and compression process ($\tau_B$, $\tau_C$ below), to control the growth in the iterates, adjusting them until the accuracy of the approximate solution is acceptable, or simply abandon the truncation and compression process and accept whatever approximate solution is obtained within reasonable time.

In $G_k = C_{1k}T_kC_{2k}^\top$ and $H_k = B_{1k}R_kB_{2k}^\top$, let $n_{ik}^B$ and $n_{ik}^C$ be the widths of $B_{ik}$ and $C_{ik}$ ($i = 1, 2$), respectively. Apply the QR decomposition with column pivoting to obtain (for $i = 1, 2$)

$$B_{ik} = Q_{ik}^B M_{ik}^B + \widetilde{Q}_{ik}^B \widetilde{M}_{ik}^B, \qquad C_{ik} = Q_{ik}^C M_{ik}^C + \widetilde{Q}_{ik}^C \widetilde{M}_{ik}^C,$$

with $Q_{ik}^B \in \mathbb{R}^{n \times r_{ik}^B}$ and $Q_{ik}^C \in \mathbb{R}^{n \times r_{ik}^C}$ ($i = 1, 2$) being orthogonal, $M_{ik}^B \in \mathbb{R}^{r_{ik}^B \times n_{ik}^B}$ and $M_{ik}^C \in \mathbb{R}^{r_{ik}^C \times n_{ik}^C}$ being upper triangular,

$$\|\widetilde{M}_{ik}^B\| \leq \tau_B, \qquad \|\widetilde{M}_{ik}^C\| \leq \tau_C \qquad (i = 1, 2)$$

and

$$r_{ik}^B = \mathrm{rank}\, B_{ik} \leq n_{ik}^B \leq m_{\max} \ll n, \qquad r_{ik}^C = \mathrm{rank}\, C_{ik} \leq n_{ik}^C \leq l_{\max} \ll n.$$

Here $\tau_B$ and $\tau_C$ are small positive tolerances controlling the compression and truncation process, and $m_{\max}$ and $l_{\max}$ are preset upper bounds. From here on, let $\|\cdot\|$ be the 2-norm unless otherwise stated. Truncating $C_{ik}$ and $B_{ik}$ by discarding the small components under $\widetilde{\ }$, we have

$$H_k = Q_{1k}^B\bigl[M_{1k}^B R_k (M_{2k}^B)^\top\bigr](Q_{2k}^B)^\top + O(\tau_B) \equiv \widehat{H}_k + O(\tau_B),$$
$$G_k = Q_{1k}^C\bigl[M_{1k}^C T_k (M_{2k}^C)^\top\bigr](Q_{2k}^C)^\top + O(\tau_C) \equiv \widehat{G}_k + O(\tau_C),$$

with

$$\delta\widehat{H}_k \equiv H_k - \widehat{H}_k, \qquad \delta\widehat{G}_k \equiv G_k - \widehat{G}_k. \tag{25}$$
By the replacements, or reorganization of symbols (for $i = 1, 2$),

$$B_{ik} \leftarrow Q_{ik}^B, \qquad R_k \leftarrow M_{1k}^B R_k (M_{2k}^B)^\top, \qquad C_{ik} \leftarrow Q_{ik}^C, \qquad T_k \leftarrow M_{1k}^C T_k (M_{2k}^C)^\top, \tag{26}$$
we sacrifice a hopefully small amount of accuracy in return for control of the growth in $B_{ik}$ and $C_{ik}$, as well as the corresponding demands on CPU-time and computer memory. We can also restrict $R_k$ and $T_k$ to be square and nonsingular, applying further compression and truncation if necessary. At the end, we relabel the widths of $B_{ik}$ and $C_{ik}$ as $m_k$ and $l_k$ ($i = 1, 2$), respectively. In the SDA_ls, we further prevent any excessive growth of $m_k$ and $l_k$ by setting the respective upper bounds $m_{\max}$ and $l_{\max}$. (A sketch of this compression step is given below.)

Lastly, from [12], the convergence of the SDA depends on the fact that the iterates $H_k$ and $G_k$ are nonnegative componentwise, preserving the fact that $I - G_kH_k$ and $I - H_kG_k$ in (3) are invertible M-matrices with componentwise nonnegative inverses. The above truncation and compression strategy will not disturb the nonnegativity of $H_k$ and $G_k$ when the truncation tolerances $\tau_B$ and $\tau_C$ are relatively small. The strategy has produced acceptably accurate approximate solutions to $X$ efficiently, and we present the corresponding results in Section 4. We have yet to find any failures of the SDA_ls in our extensive numerical experiments.
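The compression in (25) and (26) amounts to a pivoted QR decomposition of each tall factor, followed by a transformation of the small kernel. A minimal MATLAB sketch for the $B$-factors (our own illustration, using the built-in column-pivoted `qr`; the tolerance test on the $R$-diagonal is a simplification of the truncation rule above):

```matlab
% Compress H_k = B1*Rk*B2' as in (25)-(26), via QR with column pivoting.
% tauB is the truncation tolerance; a sketch, not the authors' exact code.
function [B1, B2, Rk] = compress_B(B1, B2, Rk, tauB)
  [Q1, M1, p1] = qr(B1, 0);          % B1(:,p1) = Q1*M1, column-pivoted
  [Q2, M2, p2] = qr(B2, 0);
  r1 = sum(abs(diag(M1)) > tauB);    % numerical ranks from the R-diagonals
  r2 = sum(abs(diag(M2)) > tauB);
  M1(:, p1) = M1;  M2(:, p2) = M2;   % undo the column permutations
  B1 = Q1(:, 1:r1);  B2 = Q2(:, 1:r2);
  Rk = M1(1:r1, :) * Rk * M2(1:r2, :)';   % kernel transform as in (26)
end
```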
From (3), we have

$$H_k = H_{k-1} + \delta H_{k-1}, \qquad \delta H_{k-1} \equiv F_{k-1}(I - H_{k-1}G_{k-1})^{-1}H_{k-1}E_{k-1},$$
$$G_k = G_{k-1} + \delta G_{k-1}, \qquad \delta G_{k-1} \equiv E_{k-1}(I - G_{k-1}H_{k-1})^{-1}G_{k-1}F_{k-1}. \tag{27}$$

It is obvious that $\delta H_{k-1}$ and $\delta G_{k-1}$ are nonnegative componentwise. To be really safe, we can accept or reject the whole of $\delta H_k$ in (27), thus preserving the nonnegativity of $H_k$ and $G_k$.

To quantify the amount of perturbation allowed in $H_k$ and $G_k$ while preserving their monotonicity, we first quote a perturbation result from [34, Theorem 2.4]. For any real $p \times q$ matrices $A = (a_{ij})$ and $B = (b_{ij})$, we write $A \leq B$ if $a_{ij} \leq b_{ij}$ for all $i, j$. In particular, $A > 0$ means that $a_{ij} > 0$ for all $i, j$. The matrix $|A|$ is defined by $|A| = (|a_{ij}|) \in \mathbb{R}^{p \times q}$, and $\rho(M)$ denotes the spectral radius of a square matrix $M$.

Theorem 2.1. Let $A = D - N$ be a nonsingular M-matrix, where $D$ is diagonal and $N$ has zero diagonal entries. Let $c = \rho(D^{-1}N)$. If $\delta A$ is a perturbation matrix to $A$ satisfying $|\delta A| \leq \epsilon|A|$ with $0 < \epsilon < (1 - c)/(1 + c)$, then $A + \delta A$ is a nonsingular M-matrix.

Let $H_k$ and $G_k$ be perturbed respectively to $\widehat{H}_k$ and $\widehat{G}_k$, from the truncation process in (25) or any other sources. From (25), we have
$$I - G_kH_k = I - \widehat{G}_k\widehat{H}_k - \Delta_{1k}, \qquad I - H_kG_k = I - \widehat{H}_k\widehat{G}_k - \Delta_{2k}, \tag{28}$$

where $\Delta_{1k} = \widehat{G}_k\,\delta\widehat{H}_k + \delta\widehat{G}_k\,\widehat{H}_k + \delta\widehat{G}_k\,\delta\widehat{H}_k$ and $\Delta_{2k} = \widehat{H}_k\,\delta\widehat{G}_k + \delta\widehat{H}_k\,\widehat{G}_k + \delta\widehat{H}_k\,\delta\widehat{G}_k$, with

$$|\Delta_{1k}| = O(\tilde{\tau}), \qquad |\Delta_{2k}| = O(\tilde{\tau}), \qquad \tilde{\tau} \equiv \max\{\tau_B, \tau_C\}.$$

Since the matrix $M$ defined in (2) is a nonsingular M-matrix, it is known from [12, Theorem 4.1] that $I - G_kH_k$ and $I - H_kG_k$ are nonsingular M-matrices for all $k \geq 1$. Assume that $\tilde{\tau}$ is sufficiently small, in the sense of Theorem 2.1, so that $I - \widehat{G}_k\widehat{H}_k$ and $I - \widehat{H}_k\widehat{G}_k$ remain nonsingular M-matrices. The monotonicity of the truncated matrix sequences $\{\widehat{H}_k\}_{k=1}^\infty$ and $\{\widehat{G}_k\}_{k=1}^\infty$ will then be maintained, as summarized below.

Theorem 2.2. Let $M$ defined by (2) be a nonsingular M-matrix, and let $X$ and $Y$ be the minimal nonnegative solutions of the NARE (1) and its dual equation
$$YBY - YA - DY + C = 0,$$

respectively. Let $R_s = (R + sI)^{-1}(R - sI)$ and $S_s = (S + sI)^{-1}(S - sI)$, where $R = D - CX$ and $S = A - BY$. If the parameter $s$ satisfies

$$s > \max\Bigl\{\max_{1 \leq i \leq n} a_{ii},\ \max_{1 \leq i \leq n} d_{ii}\Bigr\},$$

such that $E_s, F_s, G_s, H_s > 0$, and if $\tilde{\tau}$ is sufficiently small, then for all $k \geq 1$ we have:

(a) $E_k > 0$ with $\|E_k\|_1 \leq \|R_s^{2^k}\|_1 + O(\tilde{\tau})$;
(b) $F_k > 0$ with $\|F_k\|_1 \leq \|S_s^{2^k}\|_1 + O(\tilde{\tau})$;
(c) $I - \widehat{G}_k\widehat{H}_k$ and $I - \widehat{H}_k\widehat{G}_k$ are nonsingular M-matrices;
(d) $0 \leq \widehat{H}_k \leq \widehat{H}_{k+1} \leq X$ and $\|X - \widehat{H}_k\|_1 \leq \|X\|_1\,\|S_s^{2^k}\|_1\,\|R_s^{2^k}\|_1 + O(\tilde{\tau})$;
(e) $0 \leq \widehat{G}_k \leq \widehat{G}_{k+1} \leq Y$ and $\|Y - \widehat{G}_k\|_1 \leq \|Y\|_1\,\|S_s^{2^k}\|_1\,\|R_s^{2^k}\|_1 + O(\tilde{\tau})$,

where $\|\cdot\|_1$ denotes the matrix 1-norm and $\tilde{\tau} \equiv \max\{\tau_B, \tau_C\}$.

2.4. SDA and Krylov subspaces

There is an interesting relationship between the SDA_ls and Krylov subspaces. Define the Krylov subspaces
$$\mathbb{K}_k(A, V) \equiv \begin{cases} \mathrm{span}\{V\}, & k = 0, \\ \mathrm{span}\{V, AV, A^2V, \ldots, A^{2^k - 1}V\}, & k > 0. \end{cases}$$
From (4), (6), (16) and (20a)–(23), we see that

$$B_{10} = W_s^{-1}e \in \mathbb{K}_0(A_s^{-1}, A_s^{-1}e), \qquad B_{20} = D_s^{-\top}e \in \mathbb{K}_0(D_s^{-\top}, D_s^{-\top}e),$$
$$B_{11} = [B_{10}, F_0B_{10}] \in \mathbb{K}_1(A_s^{-1}, A_s^{-1}e), \qquad B_{21} = [B_{20}, E_0^\top B_{20}] \in \mathbb{K}_1(D_s^{-\top}, D_s^{-\top}e).$$

(We have abused notation, with $V \in \mathbb{K}_k(A, B)$ meaning $\mathrm{span}\{V\} \subseteq \mathbb{K}_k(A, B)$.) Similarly, it is easy to show that

$$B_{1k} \in \mathbb{K}_k(A_s^{-1}, A_s^{-1}e), \qquad B_{2k} \in \mathbb{K}_k(D_s^{-\top}, D_s^{-\top}e),$$
$$C_{1k} \in \mathbb{K}_k(D_s^{-1}, D_s^{-1}q), \qquad C_{2k} \in \mathbb{K}_k(A_s^{-\top}, A_s^{-\top}q). \tag{29}$$

Note that, for $k = 0, 1, \ldots$,

$$\mathbb{K}_k(A_s^{-1}, A_s^{-1}e) = \mathbb{K}_k(\Delta_s^{-1}, \Delta_s^{-1}e), \qquad \mathbb{K}_k(D_s^{-\top}, D_s^{-\top}e) = \mathbb{K}_k(\Gamma_s^{-\top}, \Gamma_s^{-\top}e),$$
$$\mathbb{K}_k(D_s^{-1}, D_s^{-1}q) = \mathbb{K}_k(\Gamma_s^{-1}, \Gamma_s^{-1}q), \qquad \mathbb{K}_k(A_s^{-\top}, A_s^{-\top}q) = \mathbb{K}_k(\Delta_s^{-\top}, \Delta_s^{-\top}q).$$
In summary, the general SDA is closely related to approximating the solutions $X$ and $Y$ using Krylov subspaces, with the additional components diminishing quadratically. However, for problems of moderate size $n$, $B_k$ and $C_k$ become full-ranked after a few iterations. The link between the SDA and the Krylov subspaces defined above is important in explaining the fast convergence of the SDA. We used to believe the convergence of the SDA came from the following inequalities from (27):
$$\|\delta H_{k-1}\| \leq \|F_{k-1}\|\,\|(I - H_{k-1}G_{k-1})^{-1}H_{k-1}\|\,\|E_{k-1}\|,$$
$$\|\delta G_{k-1}\| \leq \|E_{k-1}\|\,\|(I - G_{k-1}H_{k-1})^{-1}G_{k-1}\|\,\|F_{k-1}\|$$

and the fact that $\|E_{k-1}\|, \|F_{k-1}\| \to 0$ quadratically. However, we now understand that the approximation power of the Krylov subspaces also contributes in (27) itself, creating cancellations and diminishing components in $\delta H_{k-1}$ and $\delta G_{k-1}$ themselves. This is consistent with numerical results from examples associated with an $M$ in (2) which is barely an M-matrix, where the corresponding $E_k, F_k \to 0$ slowly but the overall convergence of $H_k$ and $G_k$ is much faster.

2.5. Errors of SDA_ls

Readers should consult Theorem 2.1, or the original detailed error analysis for the SDA in [12, Section 6]. We shall write down a simple error analysis for the truncated SDA_ls, applying the theory for Galerkin methods as in [16–18]. However, there is a hidden difference flowing from the fact that the SDA_ls automatically generates the Krylov subspaces in (29), from which $B_{ik}$ and $C_{ik}$ ($i = 1, 2$), or the approximate solutions $G_k$ and $H_k$, are generated respectively. In [18], the Krylov subspaces have to be generated explicitly by some Arnoldi process. Both approaches yield small residuals for the algebraic Riccati equations in different forms. (However, the Arnoldi process may not benefit from the diminishing $E_k$ and $F_k$; more on that later.)

First, let us consider the following Galerkin method for NAREs. Let the column spaces of the orthogonal $Z_1, Z_2 \in \mathbb{R}^{n \times N}$ be some Krylov subspaces associated with the solution of our NARE, and assume that the approximate solution has the low-rank form $\widetilde{X} = Z_1KZ_2^\top$, with $K \in \mathbb{R}^{N \times N}$. Assume that $\widetilde{X}$ approximates the solution $X$ of the ARE in the sense that the residual $\mathcal{R}(\widetilde{X})$ is small, of order $O(\epsilon)$. As mentioned before, the Krylov subspaces spanned by the columns of $Z_1$ and $Z_2$ may be the results of an Arnoldi process [15–18] or the SDA_ls. Substituting $\widetilde{X}$ in place of $X$ in (1), we have
$$Z_1KZ_2^\top CZ_1KZ_2^\top - Z_1KZ_2^\top D - AZ_1KZ_2^\top + B = O(\epsilon),$$

which leads to the projected NARE in $K$ of size $N$:

$$\widetilde{\mathcal{R}}(K) \equiv K\widetilde{C}K - K\widetilde{D} - \widetilde{A}K + \widetilde{B} = O(\epsilon), \tag{30}$$
with

$$\widetilde{A} \equiv Z_1^\top AZ_1, \qquad \widetilde{B} \equiv Z_1^\top BZ_2, \qquad \widetilde{C} \equiv Z_2^\top CZ_1, \qquad \widetilde{D} \equiv Z_2^\top DZ_2.$$

We next extend $Z_1$ and $Z_2$ respectively to the orthogonal matrices $[Z_1, \widetilde{Z}_1], [Z_2, \widetilde{Z}_2] \in \mathbb{R}^{n \times n}$. We then have

$$[Z_1, \widetilde{Z}_1]^\top \mathcal{R}(\widetilde{X})[Z_2, \widetilde{Z}_2] = \begin{bmatrix} \widetilde{\mathcal{R}}(K) + O(\epsilon) & O(\epsilon) \\ O(\epsilon) & O(\epsilon) \end{bmatrix} = O(\epsilon).$$

In the SDA_ls, a particular $K$ is produced such that $\widetilde{\mathcal{R}}(K) = O(\epsilon)$. If the associated Krylov subspaces are known or have been generated from an Arnoldi process, a solution $\widehat{K}$ of the small projected NARE

$$\widetilde{\mathcal{R}}(\widehat{K}) = 0 \tag{31}$$

can be computed, generating an approximate Galerkin solution $\widehat{X} = Z_1\widehat{K}Z_2^\top$ to the NARE in (1). Consequently, $\widetilde{X}$ and $\widehat{X}$ are approximately equal, as they solve two equations differing by at most $O(\epsilon)$. Note that proving the solvability of (31) is nontrivial from the solvability of (1). Also, the Galerkin method summarized in (31) will be inferior to the SDA_ls if the Krylov subspaces are selected inappropriately. Note that other authors have selected Krylov subspaces generated by $C$, $D$, their inverses and transposes, via the Arnoldi process. These approaches result in subspaces very different from those in (29) generated by the SDA_ls, which also benefits from the diminishing magnitudes of $E_k$ and $F_k$. This explains the relatively slow convergence of some Galerkin or Krylov-subspace methods.
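To illustrate (30) and (31), the following MATLAB sketch (our own illustration, not part of the SDA_ls) projects the NARE onto given orthonormal bases $Z_1$, $Z_2$ and solves the small projected equation by Newton's method, each step being a Sylvester equation; we assume the projected equation is solvable and that Newton's iteration converges from $K = 0$.

```matlab
% Galerkin projection (30) and Newton solution of the small NARE (31).
% Z1, Z2 (n-by-N) have orthonormal columns; an illustrative sketch only.
function K = galerkin_nare(A, B, C, D, Z1, Z2, tol)
  At = Z1'*A*Z1;  Bt = Z1'*B*Z2;  Ct = Z2'*C*Z1;  Dt = Z2'*D*Z2;
  N = size(Z1, 2);  K = zeros(N);
  for it = 1:30
    RK = K*Ct*K - K*Dt - At*K + Bt;          % residual of (30)
    if norm(RK, 'fro') <= tol, break; end
    % Newton step: (At - K*Ct)*dK + dK*(Dt - Ct*K) = RK
    dK = sylvester(At - K*Ct, Dt - Ct*K, RK);
    K = K + dK;
  end
end
```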
3. Computational issues

3.1. Residual and convergence control

For the convergence control in Algorithm 1, we should compute residuals and differences of iterates carefully, in $O(n)$ complexity. From (27), we have

$$\delta G_k \equiv G_k - G_{k-1} = \widetilde{C}_{1k}\widetilde{T}_k\widetilde{C}_{2k}^\top, \tag{32}$$

with the already computed quantities

$$\widetilde{C}_{1k} \equiv E_{k-1}(I_n - G_{k-1}H_{k-1})^{-1}C_{1,k-1}, \qquad \widetilde{T}_k \equiv T_{k-1}, \qquad \widetilde{C}_{2k} \equiv F_{k-1}^\top C_{2,k-1}.$$
All the calculations in this subsection involve the norms of low-ranked matrices similar to $\delta G_k$. To compute $\|\delta G_k\|$ efficiently, we orthogonalize $\widetilde{C}_{ik}$ ($i = 1, 2$) and modify $\widetilde{T}_k$ accordingly, compressing as in (25) and (26) without truncation; we then have

$$\|\delta G_k\| = \|\widetilde{C}_{1k}\widetilde{T}_k\widetilde{C}_{2k}^\top\| = \|\widetilde{T}_k\|$$

in the 2- or F-norm. Note that $\widetilde{T}_k$ is very small in dimension relative to $\delta G_k$. Similarly, from (27), we have
$$\delta H_k \equiv H_k - H_{k-1} = \widetilde{B}_{1k}\widetilde{R}_k\widetilde{B}_{2k}^\top, \tag{33}$$

with the already computed quantities

$$\widetilde{B}_{1k} \equiv F_{k-1}(I_n - H_{k-1}G_{k-1})^{-1}B_{1,k-1}, \qquad \widetilde{R}_k \equiv R_{k-1}, \qquad \widetilde{B}_{2k} \equiv E_{k-1}^\top B_{2,k-1}.$$
For the residual $r_k \equiv \|\mathcal{R}(H_k)\|$ of the NARE, the corresponding relative residual equals

$$\tilde{r}_k \equiv \frac{r_k}{h_k + \tilde{m}_k + h},$$

with

$$h_k \equiv \|AH_k + H_kD\|, \qquad \tilde{m}_k \equiv \|H_kCH_k\|, \qquad h \equiv \|B\|.$$

We then have

$$r_k = \|H_kCH_k - AH_k - H_kD + B\| = \|\widehat{B}_{1k}\widehat{R}_k\widehat{B}_{2k}^\top\|,$$

with

$$\bar{B}_{1k} \equiv [AB_{1k}, B_{1k}], \qquad \bar{B}_{2k} \equiv [D^\top B_{2k}, B_{2k}], \qquad \bar{R}_k \equiv \begin{bmatrix} 0 & R_k \\ R_k & 0 \end{bmatrix}, \qquad \breve{R}_k \equiv R_kB_{2k}^\top CB_{1k}R_k,$$
$$B_1 = B_2 = e, \qquad \bar{R} \equiv 1, \qquad \widehat{B}_{1k} \equiv [\bar{B}_{1k}, B_{1k}, B_1], \qquad \widehat{B}_{2k} \equiv [\bar{B}_{2k}, B_{2k}, B_2], \qquad \widehat{R}_k \equiv (-\bar{R}_k) \oplus \breve{R}_k \oplus \bar{R},$$

so that $B = B_1\bar{R}B_2^\top$, $AH_k + H_kD = \bar{B}_{1k}\bar{R}_k\bar{B}_{2k}^\top$ and $H_kCH_k = B_{1k}\breve{R}_kB_{2k}^\top$.
After orthogonalizing $\widetilde{B}_{ik}$, $\widehat{B}_{ik}$, $B_{ik}$, $B_i$ and $\bar{B}_{ik}$ ($i = 1, 2$), and transforming $\widetilde{R}_k$, $\widehat{R}_k$, $R_k$, $\bar{R}$, $\bar{R}_k$ and $\breve{R}_k$ respectively, we have the efficient formulae, in the 2- or F-norm:

$$\|\delta H_k\| = \|\widetilde{R}_k\|, \quad r_k = \|\widehat{R}_k\|, \quad \|H_k\| = \|R_k\|, \quad h = \|\bar{R}\|, \quad h_k = \|\bar{R}_k\|, \quad \tilde{m}_k = \|\breve{R}_k\|. \tag{34}$$
3.2. Operation and memory counts

We assume that $c_s \cdot mn$ flops are required for the products $MR$ or $M^\top R$ (with $M = A, D$), or for the solution of $MZ = R$ or $M^\top Z = R$ (with $M = A_s, D_s$), where $R \in \mathbb{R}^{n \times m}$. A start-up cost of $10n$ flops is made up of the following: (1) setting up $A_s = A + sI$ and $D_s = D + sI$, requiring $2n$ flops; and (2) setting up $B_{i0}$ and $C_{i0}$ ($i = 1, 2$) as in (24), with the help of (6) and (11), requiring $8n$ flops.

The operation and memory counts of Algorithm 1 (SDA_ls) for the $k$th iteration are summarized in Table 1 below; in the third column, the number of variables is recorded, and only $O(n)$ operations or memory requirements are included. Note that most of the work is done in the computation of $B_{i,k+1}$ and $C_{i,k+1}$ ($i = 1, 2$), for which $F_kB_{1k}$, $E_k^\top B_{2k}$, $E_kC_{1k}$ and $F_k^\top C_{2k}$ in (16) and (18) have to be calculated recursively, as $E_k$ and $F_k$ in (13) are not available explicitly. In Table 1, we use the notation $N_k \equiv \sum_{j=1}^k (l_j + m_j)$. The operation count for the QR decomposition of an $n \times r$ matrix is $4nr^2$ flops [6, p. 250]. With $l_k$ and $m_k$ controlled by the compression and truncation in Section 2.3, the operation count is dominated by the calculation of $B_{i,k+1}$ and $C_{i,k+1}$ ($i = 1, 2$). In our numerical examples in Section 4, the flop count near the end of Algorithm 1 dominates, with the work involved in one iteration approximately double that of the previous one. This corresponds to the $2^{k+2}$ factor in the total flop count. However, the last iteration is virtually free, as there is no need to prepare for the next iteration.

Before the numerical examples, we consider the main point of the SDA_ls once again, as $l_k$, $m_k$ can grow like $2^k$ when uncontrolled. When the convergence of the SDA is fast relative to the $2^k$ factor in the flop count, that is, $2^k \ll n$, then $m_k$, $l_k$ are just constants relative to $n$ (which we assume to be large). If the convergence of the SDA is not fast, we control the growth of $l_k$, $m_k$ by setting the tolerances $\tau_B$ and $\tau_C$ or the upper limits $m_k \leq m_{\max}$, $l_k \leq l_{\max}$. Again, $m_k$ and $l_k$ will be constants relative to $n$. Of course, in such a case we cannot expect great accuracy from the SDA_ls; the solution $X$ is then not numerically low-ranked and cannot be approximated to high accuracy by any method. As we have stated before, the SDA_ls is a competition between the convergence of the iterates and the growth of $m_k$, $l_k$ and $2^k$. If the convergence is slow, a compromise has to be made between the accuracy of the approximate solution and the computing resources we are willing to consume.

4. Numerical examples

The numerical examples were constructed as in [14], with $(\delta_i, \gamma_i, q_i)$ randomly chosen while satisfying the solvability conditions. Only non-critical cases were tested, to illustrate the feasibility of the SDA_ls. The critical cases will behave similarly, although with much slower convergence. All the examples are the same except for the values of $n = 10000, 50000, 100000, 500000, 1000000$. The numerical results were computed using MATLAB [30] Version R2010b, on an HP SL390s workstation with a 3.60 GHz Intel Xeon X5687 4-core processor and 48 GB RAM, with machine accuracy $\mathrm{eps} = 2.22 \times 10^{-16}$.

We have included tables containing $\|\delta H_k\|/\|H_k\|$, $\tilde{r}_k$, $m_k$, $l_k$, $dt_k$ (the CPU-time for the $k$th iteration) and $t_k$ ($\equiv \sum_{i=1}^k dt_i$, the total CPU-time up to the $k$th iteration), for various $k$. As discussed before, the operation counts are dominated by that of the last-but-one iteration, as each iteration doubles the previous one in work.
The last iteration is virtually free because there is no need to prepare the updated Krylov bases for the next iteration.
Table 1
Operation and memory counts for the $k$th iteration in Algorithm 1 (SDA_ls), with $N_k \equiv \sum_{j=1}^k (l_j + m_j)$.

| Computation | Flops | Memory |
|---|---|---|
| $B_{i,k+1}$, $C_{i,k+1}$ | $2^{k+2}\bigl[c_s(l_k + m_k) + 2(6l_k + 1)N_k\bigr]n$ | $2N_{k+1}n$ |
| $R_{k+1}$, $T_{k+1}$ | $8l_km_kn$ | $O(l_k^2 + m_k^2)$ |
| $E_{i,k+1}$, $F_{i,k+1}$ | $4l_km_kn$ | – |
| Compress $B_{i,k+1}$, $C_{i,k+1}$ | $8(l_k^2 + m_k^2)n$ | – |
| Modify $R_{k+1}$, $T_{k+1}$ | $O(l_k^3 + m_k^3)$ | – |
| $\tilde{r}_{k+1}$ | $\bigl[8(2m_k + 1)^2 + 40m_k^2\bigr]n$ | – |
| Total | $\bigl\{2^{k+2}\bigl[c_s(l_k + m_k) + 2(6l_k + 1)N_k\bigr] + 4(l_k^2 + m_k^2 + 3l_km_k) + 8(2m_k + 1)^2 + 40m_k^2\bigr\}n$ | $2N_{k+1}n$ |
Table 2
Example 1 ($n = 10^4$, $s = 1497.48$, $\tau_B = \tau_C = 10^{-15}$, $m_k, l_k \leq 500$).

| $k$ | $\|\delta H_k\|/\|H_k\|$ | $\tilde{r}_k$ | $m_k$ | $l_k$ | $dt_k$ | $t_k$ |
|---|---|---|---|---|---|---|
| 1 | 1.7e–03 | 1.6e–01 | 2 | 2 | 1.50e–01 | 1.50e–01 |
| 2 | 3.5e–01 | 6.0e–02 | 4 | 4 | 1.90e–01 | 3.40e–01 |
| 3 | 1.6e–01 | 1.1e–02 | 8 | 8 | 2.90e–01 | 6.30e–01 |
| 4 | 3.4e–02 | 5.6e–04 | 16 | 16 | 4.30e–01 | 1.06e+00 |
| 5 | 1.9e–03 | 2.2e–06 | 32 | 32 | 1.34e+00 | 2.40e+00 |
| 6 | 7.7e–06 | 5.3e–11 | 59 | 59 | 4.32e+00 | 6.72e+00 |
| 7 | 1.9e–10 | 2.5e–15 | 104 | 104 | – | – |
Table 3
Example 2 ($n = 5 \times 10^4$, $s = 196.25$, $\tau_B = \tau_C = 10^{-15}$, $m_k, l_k \leq 500$).

| $k$ | $\|\delta H_k\|/\|H_k\|$ | $\tilde{r}_k$ | $m_k$ | $l_k$ | $dt_k$ | $t_k$ |
|---|---|---|---|---|---|---|
| 1 | 5.8e–01 | 2.1e–01 | 2 | 2 | 3.50e–01 | 3.50e–01 |
| 2 | 4.2e–01 | 9.0e–02 | 4 | 4 | 2.70e–01 | 6.20e–01 |
| 3 | 2.2e–01 | 2.2e–02 | 8 | 8 | 6.50e–01 | 1.27e+00 |
| 4 | 6.3e–02 | 1.8e–03 | 16 | 16 | 2.10e+00 | 3.37e+00 |
| 5 | 5.8e–03 | 1.8e–05 | 29 | 29 | 6.43e+00 | 9.80e+00 |
| 6 | 5.9e–05 | 2.5e–09 | 53 | 53 | 2.41e+01 | 3.39e+01 |
| 7 | 8.5e–09 | 5.9e–15 | 88 | 88 | – | – |
Table 4
Example 3 ($n = 10^5$, $s = 136.05$, $\tau_B = \tau_C = 10^{-15}$, $m_k, l_k \leq 500$).

| $k$ | $\|\delta H_k\|/\|H_k\|$ | $\tilde{r}_k$ | $m_k$ | $l_k$ | $dt_k$ | $t_k$ |
|---|---|---|---|---|---|---|
| 1 | 2.4e+00 | 2.0e–01 | 2 | 2 | 5.30e–01 | 5.30e–01 |
| 2 | 4.0e–01 | 7.8e–02 | 4 | 4 | 4.70e–01 | 1.00e+00 |
| 3 | 1.9e–01 | 1.7e–02 | 8 | 8 | 1.37e+00 | 2.37e+00 |
| 4 | 5.1e–02 | 1.2e–03 | 16 | 16 | 4.51e+00 | 6.88e+00 |
| 5 | 3.9e–03 | 8.2e–06 | 32 | 32 | 1.32e+01 | 2.01e+01 |
| 6 | 2.8e–05 | 5.6e–10 | 59 | 59 | 5.49e+01 | 7.50e+01 |
| 7 | 2.0e–09 | 1.2e–14 | 104 | 104 | 1.73e+02 | 2.48e+02 |
| 8 | 2.7e–14 | 9.2e–15 | 165 | 165 | – | – |
Compared with CAREs of similar sizes, at a particular iteration the CPU-times required for NAREs roughly doubled. This is due to the lack of symmetry in NAREs, reflected by the $2^{k+2}$ factor in Table 1, instead of the $2^{k+1}$ factor in [24]. For all examples, $s$ is chosen as in Theorem 2.2 or [12].

Example 1. We tried various parameters $\tau_B$, $\tau_C$, $m_{\max}$ and $l_{\max}$, looking for a good balance between accuracy and efficiency. The parameters $m_{\max}$, $l_{\max}$ play secondary roles and are only created to prevent disasters from badly chosen $\tau_B$, $\tau_C$. We may choose large $\tau_B$, $\tau_C$, and restrict $m_{\max}$, $l_{\max}$, to find out the relative numerical ranks of $X$. Equally, we may choose small $m_{\max}$, $l_{\max}$, and relax $\tau_B$, $\tau_C$ with small values, to find out the corresponding achievable accuracy. For the numerical results we present here, we try to achieve accuracies as high as possible within a few thousand seconds. We choose $\tau_B$ and $\tau_C$ to be $10^{-15}$, similar to the accuracy we would like to achieve for the relative residual $\tilde{r}_k$. We relax $m_{\max}$ and $l_{\max}$ to the large values of 300 or 500, essentially abandoning these particular controls of $m_k$, $l_k$. We found the numerical results acceptable in terms of CPU-time and have not tried to optimize the parameters for the SDA_ls. Similarly for the other examples, it usually takes two or three tests to find the best combination of parameters for a given class of problems. For higher accuracies, smaller $\tau_B$, $\tau_C$ and larger $m_{\max}$, $l_{\max}$ are required, demanding more computing resources. In the following examples, the choice $\tau_B = \tau_C = 10^{-12}$ or $10^{-15}$ makes very little difference in the numerical results.

For $n = 10{,}000$, a near machine accuracy of $10^{-15}$ was achieved within 6.72 s and 7 iterations, as summarized in Table 2. The CPU-time $t_k$ roughly doubles that of the previous iteration, as predicted in Table 1.

Example 2. For $n = 50{,}000$, a near machine accuracy of $10^{-15}$ was achieved within 33.9 s and 7 iterations, as summarized in Table 3. The CPU-time $t_k$ roughly doubles that of the previous iteration for the first few iterations, as predicted in Table 1. For $k \geq 4$, $t_k \approx 3t_{k-1}$, possibly because of the swapping of virtual memory between the RAM and the hard disk, due to the large amount of data involved. This observation also holds for the later, larger examples.
Table 5
Example 4 ($n = 5 \times 10^5$, $s = 59.58$, $\tau_B = \tau_C = 10^{-15}$, $m_k, l_k \leq 500$).

| $k$ | $\|\delta H_k\|/\|H_k\|$ | $\tilde{r}_k$ | $m_k$ | $l_k$ | $dt_k$ | $t_k$ |
|---|---|---|---|---|---|---|
| 1 | 7.9e+01 | 3.0e–01 | 2 | 2 | 1.33e+00 | 1.33e+00 |
| 2 | 5.5e–01 | 1.6e–01 | 4 | 4 | 1.63e+00 | 2.96e+00 |
| 3 | 3.4e–01 | 5.5e–02 | 8 | 8 | 7.92e+00 | 1.09e+01 |
| 4 | 1.4e–01 | 8.7e–03 | 16 | 16 | 2.22e+01 | 3.30e+01 |
| 5 | 2.5e–02 | 2.8e–04 | 32 | 32 | 6.98e+01 | 1.03e+02 |
| 6 | 8.4e–04 | 3.3e–07 | 59 | 59 | 2.42e+02 | 3.45e+02 |
| 7 | 1.0e–06 | 5.8e–13 | 104 | 104 | 7.32e+02 | 1.08e+03 |
| 8 | 1.8e–12 | 3.0e–14 | 165 | 165 | – | – |
Table 6
Example 5 ($n = 10^6$, $s = 59.59$, $\tau_B = \tau_C = 10^{-15}$, $m_k, l_k \leq 300$).

| $k$ | $\|\delta H_k\|/\|H_k\|$ | $\tilde{r}_k$ | $m_k$ | $l_k$ | $dt_k$ | $t_k$ |
|---|---|---|---|---|---|---|
| 1 | 1.6e+02 | 3.0e–01 | 2 | 2 | 2.16e+00 | 2.16e+00 |
| 2 | 5.5e–01 | 1.6e–01 | 4 | 4 | 2.67e+00 | 4.83e+00 |
| 3 | 3.4e–01 | 5.5e–02 | 8 | 8 | 1.40e+01 | 1.88e+01 |
| 4 | 1.4e–01 | 8.7e–03 | 16 | 16 | 3.76e+01 | 5.64e+01 |
| 5 | 2.5e–02 | 2.8e–04 | 32 | 32 | 1.17e+02 | 1.73e+02 |
| 6 | 8.4e–04 | 3.3e–07 | 59 | 59 | 4.15e+02 | 5.88e+02 |
| 7 | 1.0e–06 | 6.0e–13 | 104 | 104 | 1.32e+03 | 1.91e+03 |
| 8 | 1.8e–12 | 2.3e–13 | 165 | 164 | – | – |
Example 3. For $n = 100{,}000$, a near machine accuracy of $10^{-15}$ was achieved within 248 s and 8 iterations, as summarized in Table 4. The CPU-time $t_k$ roughly doubles that of the previous iteration for the first few iterations. For $k \geq 4$, $t_k \approx 3t_{k-1}$ because of the large amount of data involved.

Example 4. For $n = 500{,}000$, a near machine accuracy of $10^{-14}$ was achieved within 1,080 s and 8 iterations, as summarized in Table 5. The CPU-time $t_k$ roughly doubles that of the previous iteration for the first few iterations. For $k \geq 3$, $t_k \approx 3t_{k-1}$ because of the large amount of data involved.

Example 5. For $n = 1{,}000{,}000$, an accuracy of $10^{-13}$ was achieved within 1,910 s and 8 iterations, as summarized in Table 6. The CPU-time $t_k$ roughly doubles that of the previous iteration for the first few iterations. For $k \geq 3$, $t_k \approx 3t_{k-1}$ because of the large amount of data involved.

We accept the slightly lower accuracy in Examples 4 and 5 in order to avoid an excessive amount of computation. One or two more iterations would produce better answers, but approximately two to four times more computing time would be required.
5. Conclusions

We have proposed a structure-preserving doubling algorithm, the SDA_ls, for the large-scale nonsymmetric algebraic Riccati equation in (1), with $A$ and $D$ being large and sparse(-like), and $B$ and $C$ being low-ranked. The trick is to apply the Sherman–Morrison–Woodbury formula when appropriate and not to form $E_k$ and $F_k$ (the iterates for $E$ and $F$) explicitly. For well-behaved non-critical NAREs, low-ranked approximations to the solutions $X$ and $Y$ can be obtained efficiently. The convergence of the SDA_ls is quadratic, ignoring the compression and truncation of $B_k$ and $C_k$, as shown in [12,27]. The computational complexity and memory requirement are both $O(n)$ per iteration. Similar to the methods in [23–25] for other large-scale algebraic Riccati and linear matrix equations, our technique can be applied when $A$ and $D$ are large and sparse(-like), or are products (or inverses) of such matrices. The feasibility of the SDA_ls depends on whether $A^{-1}u$, $A^{-\top}u$, $D^{-1}v$ and $D^{-\top}v$ can be formed efficiently, for arbitrary vectors $u$ and $v$.

The proposed SDA_ls solves NAREs efficiently. For instance, Example 5 in Section 4, of dimension $n = 1{,}000{,}000$ with one trillion variables in the solution $X$, was solved using MATLAB on an HP SL390s workstation within 1,910 s to a near-machine accuracy of $O(10^{-13})$.

Acknowledgment

This research work is partially supported by the National Science Council and the National Centre for Theoretical Sciences, Taiwan. The first author has been supported by a Monash Graduate Scholarship and a Monash International Postgraduate
Research Scholarship. The second author gratefully acknowledges the partial support of this research by the National Science Council in Taiwan under Grant NSC 100-2115-M-003-004. Part of the work was completed when the third author visited the National Centre for Theoretical Sciences at Tainan and the National Chiao Tung University, and we would like to thank these institutions.

References

[1] Z.-Z. Bai, Y.-H. Gao, L.-Z. Lu, Fast iterative schemes for nonsymmetric algebraic Riccati equations arising from transport theory, SIAM J. Sci. Comput. 30 (2008) 804–818.
[2] D.S. Bernstein, W.M. Haddad, LQG control with an H∞ performance bound: a Riccati equation approach, IEEE Trans. Automat. Control 34 (1989) 293–305.
[3] D.A. Bini, B. Iannazzo, F. Poloni, A fast Newton's method for a nonsymmetric algebraic Riccati equation, SIAM J. Matrix Anal. Appl. 30 (2008) 276–290.
[4] B. De Moor, J. David, Total linear least squares and the algebraic Riccati equations, Syst. Control Lett. 18 (1992) 329–337.
[5] I. Gohberg, M.A. Kaashoek, An inverse spectral problem for rational matrix functions and minimal divisibility, Integral Equat. Oper. Theory 10 (1987) 437–465.
[6] G.H. Golub, C.F. Van Loan, Matrix Computations, second ed., Johns Hopkins University Press, Baltimore, MD, 1989.
[7] C.-H. Guo, Nonsymmetric algebraic Riccati equations and Wiener–Hopf factorization for M-matrices, SIAM J. Matrix Anal. Appl. 23 (2001) 225–242.
[8] C.-H. Guo, A new class of nonsymmetric algebraic Riccati equations, Linear Alg. Appl. 426 (2007) 636–649.
[9] C.-H. Guo, N.J. Higham, Iterative solution of a nonsymmetric algebraic Riccati equation, SIAM J. Matrix Anal. Appl. 29 (2007) 396–412.
[10] C.-H. Guo, P. Lancaster, Analysis and modification of Newton's method for algebraic Riccati equations, Math. Comput. 67 (1998) 1089–1105.
[11] C.-H. Guo, W.-W. Lin, Convergence rates of some iterative methods for nonsymmetric algebraic Riccati equations arising in transport theory, Linear Alg. Appl. 432 (2010) 283–291.
[12] X.-X. Guo, W.-W. Lin, S.-F. Xu, A structure-preserving doubling algorithm for nonsymmetric algebraic Riccati equations, Numer. Math. 103 (2006) 393–412.
[13] D. Hinrichsen, B. Kelb, A. Linnemann, An algorithm for the computation of the structured complex stability radius, Automatica 25 (1989) 771–775.
[14] J. Juang, Existence of algebraic matrix Riccati equations arising in transport theory, Linear Alg. Appl. 230 (1995) 89–100.
[15] I.M. Jaimoukha, E.M. Kasenally, Krylov subspace methods for solving large Lyapunov equations, SIAM J. Numer. Anal. 31 (1994) 227–251.
[16] K. Jbilou, Block Krylov subspace methods for large continuous-time algebraic Riccati equations, Numer. Alg. 34 (2003) 339–353.
[17] K. Jbilou, An Arnoldi based algorithm for large algebraic Riccati equations, Appl. Math. Lett. 19 (2006) 437–444.
[18] K. Jbilou, Low rank approximate solutions to large Sylvester matrix equations, Appl. Math. Comput. 177 (2006) 365–376.
[19] P. Lancaster, L. Rodman, Solutions of the continuous and discrete-time algebraic Riccati equations: a review, in: S. Bittanti, A.J. Laub, J.C. Willems (Eds.), The Riccati Equation, Springer-Verlag, Berlin, 1991.
[20] P. Lancaster, L. Rodman, Algebraic Riccati Equations, Clarendon Press, Oxford, 1995.
[21] T. Li, E.K.-W. Chu, J. Juang, W.-W. Lin, Solution of a nonsymmetric algebraic Riccati equation from a two-dimensional transport model, Linear Alg. Appl. 434 (2011) 201–214.
[22] T. Li, E.K.-W. Chu, J. Juang, W.-W. Lin, Solution of a nonsymmetric algebraic Riccati equation from a one-dimensional multi-state transport model, IMA J. Numer. Anal. 31 (2011) 1453–1467.
[23] T. Li, E.K.-W. Chu, W.-W. Lin, Solving large-scale discrete-time algebraic Riccati equations by doubling, Technical Report, NCTS Preprints in Mathematics, National Tsing Hua University, Hsinchu, Taiwan, 2012.
[24] T. Li, E.K.-W. Chu, W.-W. Lin, C.-Y. Weng, Solving large-scale continuous-time algebraic Riccati equations by doubling, Technical Report, NCTS Preprints in Mathematics, National Tsing Hua University, Hsinchu, Taiwan, 2012.
[25] T. Li, C.-Y. Weng, E.K.-W. Chu, W.-W. Lin, Solving large-scale Stein and Lyapunov equations by doubling, Technical Report, NCTS Preprints in Mathematics, National Tsing Hua University, Hsinchu, Taiwan, 2012.
[26] L. Liu, Mixed and componentwise condition numbers of nonsymmetric algebraic Riccati equation, Appl. Math. Comput. 218 (2012) 7595–7601.
[27] W.-W. Lin, S.-F. Xu, Convergence analysis of structure-preserving doubling algorithms for Riccati-type matrix equations, SIAM J. Matrix Anal. Appl. 28 (2006) 26–39.
[28] Y. Lin, L. Bao, Y. Wei, A modified Newton method for solving non-symmetric algebraic Riccati equations arising in transport theory, IMA J. Numer. Anal. 29 (2008) 215–224.
[29] L.-Z. Lu, Newton iterations for a non-symmetric algebraic Riccati equation, Numer. Linear Alg. Appl. 12 (2005) 191–200.
[30] MathWorks, MATLAB User's Guide, 2010.
[31] V. Mehrmann, The Autonomous Linear Quadratic Control Problem, Lecture Notes in Control and Information Sciences, vol. 163, Springer-Verlag, 1991.
[32] V. Mehrmann, H. Xu, Explicit solutions for a Riccati equation from transport theory, SIAM J. Matrix Anal. Appl. 30 (2008) 1339–1357.
[33] D. Williams, A potential-theoretical note on the quadratic Wiener–Hopf equation for Q-matrices, in: Seminar on Probability XVI, Lecture Notes in Mathematics, vol. 920, Springer-Verlag, Berlin, 1982, pp. 91–94.
[34] J. Xue, E. Jiang, Entrywise relative perturbation theory for nonsingular M-matrices and applications, BIT 35 (1995) 417–427.