INT. J. CONTROL,
20 MAY 2004, VOL. 77, NO. 8, 767–788
Structure-preserving algorithms for periodic discrete-time algebraic Riccati equations E. K.-W. CHUy, H.-Y. FANz, W.-W. LINz* and C.-S. WANG} In this paper we investigate structure-preserving algorithms for computing the symmetric positive semi-definite solutions to the periodic discrete-time algebraic Riccati equations (P-DAREs). Using a structure-preserving swap and collapse procedure, a single symplectic matrix pair in standard symplectic form is obtained. The P-DAREs can then be solved via a single DARE, using a structure-preserving doubling algorithm. We develop the structure-preserving doubling algorithm from a new point of view and show its quadratic convergence under assumptions which are weaker than stabilizability and detectability. With several numerical results, the algorithm is shown to be efficient, out-performing other algorithms on a large set of benchmark problems.
1. Introduction In this paper we investigate structure-preserving algorithms for computing the symmetric positive semi-definite (s.p.s.d.) solutions fXj gpj¼1 to the periodic discrete-time algebraic Riccati equations (P-DAREs) of period p 1
ð1Þ
Here, for all j, Aj ¼ Ajþp 2 Rnj nj1 with nj ¼ njþp , Xj ¼ Xjþp 2 Rnj nj , Rj ¼ Rjþp 2 Rmj mj and Qj ¼ Qjþp 2 Rrj rj are symmetric positive definite (or s.p.d.; i.e. Rj , Qj > 0), Bj ¼ Bjþp 2 Rnj mj , Sj ¼ Sjþp 2 Rrj mj , and Cj ¼ Cjþp 2 Rrj nj1 , with Bj, CjT being of full column T rank. Furthermore, the matrix Qj Sj R1 j Sj is supposed to be symmetric positive definite. Throughout this paper, the indices j for all periodic matrices are chosen in f1, . . . , pg modulo p. Equations in (1) arise frequently in the periodic discrete-time linear optimal control problem for the periodic systems xjþ1 ¼ Aj xj þ Bj uj , yj ¼ Cj xj
xj 2 Rnj1
with the controls {uj} chosen through optimizing the cost function min J ¼ uj
1 2
uj ¼ ðRj þ BTj Xj Bj Þ1 ðBTj Xj Aj þ SjT Cj Þxj ð j ¼ 1, . . . , pÞ ð2Þ where fXj gpj¼1 are s.p.s.d. solutions to (1).
Xj1 ¼ ATj Xj Aj ðBTj Xj Aj þ SjT Cj ÞT ðRj þ BTj Xj Bj Þ1 ðBTj Xj Aj þ SjT Cj Þ þ CjT Qj Cj :
The periodic optimal feedback controls uj are given by (Bittanti et al. 1991)
1 X
yTj Qj yj þ uTj Rj uj þ yTj Sj uj þ uTj SjT yj :
j¼1
Received 1 April 2003. Revised and accepted 17 April 2004. * Author for correspondence. e-mail:
[email protected]. edu.tw y School of Mathematical Science, Building 28, Monash University, VIC 3800, Australia. z Department of Mathematics, National Tsing Hua University, Hsinchu 30043, Taiwan. } Department of Mathematics, National Cheng Kung University, Tainan 701, Taiwan.
Definition 1 (Bittanti et al. 1991): The periodic matrix pairs fðAj , Bj Þgpj¼1 are said to be p-stabilizable (P-S) if the pairs ðAj , Bj Þ are stabilizable (S), for j ¼ 1, . . . , p, where Aj Aj ðpÞ Aj ð1Þ and h Bj Aj ðpÞ Aj ð2Þ Bj ð1Þ jAj ðpÞ Aj ð3Þ Bj ð2Þ j i jAj ðpÞ Bj ðp1Þ jBj ðpÞ with the permutation j defined by j ðkÞ ¼
k j þ 1 þ p, k j þ 1,
for k ¼ 1, . . . , j 1, for k ¼ j, . . . , p:
Definition 2 (Bittanti et al. 1991): The periodic matrix pairs fðAj , Cj Þgpj¼1 are said to be p-detectable (P-D) if the pairs ðAj , Cj Þ are detectable (D), for j ¼ 1, . . . , p, where Aj and j are defined as in Definition 1, and h Cj CTj ð1Þ jATj ð1Þ CTj ð2Þ jATj ð1Þ ATj ð2Þ CTj ð3Þ j iT jATj ð1Þ ATj ðp1Þ CTj ðpÞ : Note that the pair ðA, BÞ is stabilizable (S) if wT B ¼ 0 and wT A ¼ wT for some constant implies jj < 1 or w ¼ 0, and the pair ðA, CÞ is detectable (D) if ðAT , C T Þ is stabilizable. Under assumptions of (P-S) and (P-D), P-DAREs have been proved to possess unique s.p.s.d. solutions (Bittanti et al. 1988, 1991).
International Journal of Control ISSN 0020–7179 print/ISSN 1366–5820 online # 2004 Taylor & Francis Ltd http://www.tandf.co.uk/journals DOI: 10.1080/00207170410001714988
768
E. K.-W. Chu et al.
Via elementary matrix calculation, one can show that the P-DAREs (1) are equivalent to the form T T 1 T Xj1 ¼ ðAj Bj R1 j Sj Cj Þ Xj ðAj Bj Rj Sj Cj Þ T T T 1 ðAj Bj R1 j Sj Cj Þ Xj Bj ðRj þ Bj Xj Bj Þ T BTj Xj ðAj Bj R1 j Sj Cj Þ T þ CjT ðQj Sj R1 j Sj ÞCj
ð3Þ
for j ¼ 1, . . . , p. It is easily seen that the periodic matrix pairs fðAj , Bj Þgpj¼1 are p-stabilizable if and only if p T fðAj Bj R1 j Sj Cj , Bj Þgj¼1 are p-stabilizable. Similarly, the periodic matrix pairs fðAj , Cj Þgpj¼1 are p-detectable p T if and only if fðAj Bj R1 j Sj Cj , C j Þgj¼1 are p-detectable, T where C j Cj is a full rank decomposition (FRD) of T rj nj1 . Consequently, CjT ðQj Sj R1 j Sj ÞCj with C j 2 R with the following FRDs T Gj :¼ Bj R1 j Bj 0,
T
Hj :¼ C j C j 0
ð4Þ
there is no loss of generality to consider, instead of (3), the following P-DAREs Xj1 ¼ ATj Xj Aj ATj Xj Bj ðRj þ BTj Xj Bj Þ1 BTj Xj Aj þ Hj or Xj1 ¼ ATj Xj ðInj þ Gj Xj Þ1 Aj þ Hj :
ð5Þ
Note that (5) is obtained using the Sherman–Morrison– Woodbury formula (SMWF; see, e.g. Golub and Van Loan 1996, p. 50) when ðInj þ Gj Xj Þ1 exists. We now consider the periodic matrix pairs fðMj , Lj Þgpj¼1 associated with the P-DAREs in (5) with " # 0 Aj Mj ¼ 2 Rðnj1 þnj Þ2nj1 Hj Inj1 " # ð6Þ In j G j ðnj1 þnj Þ2nj Lj ¼ 2R 0 ATj where Inj denotes the identity matrix of compatible order. It is easily seen that the matrix pair ðMj , Lj Þ is symplectic, that is " # 0 Inj1 T T Mj Jj Mj ¼ Lj Jjþ1 Lj , Jj : ð7Þ Inj1 0 The matrix pair in the form of (6), with Hj , Gj being s.p.s.d., is said to be a standard symplectic form (SSF). Being in SSF is a central concept in our paper and is stronger than being a symplectic form as defined in (7). Being in SSF is the structure we try to preserve in the numerical algorithm. From (6), the P-DAREs in (5) can be written, for all j, as Inj1 Inj Mj ð8Þ ¼ Lj Fj Xj1 Xj
for some appropriate Fj 2 Rnj nj . In the case when all Aj have equal size and are non-singular, the s.p.s.d. solutions Xj to (5) can be easily obtained through the invariant subspaces, associated with the eigenvalues inside the unit disc, of the periodic matrices (Hench and Laub 1994) 1 1 Pj ¼ L1 jþp1 Mjþp1 Ljþp2 Mjþp2 Lj Mj :
ð9Þ
Under the (P-S) and (P-D) assumptions, each Pj has n such stable eigenvalues. If the columns of exactly Z1j span the stable invariant subspace of Pj , then Z1j Z2j 1 is non-singular and Xj ¼ Z2j Z1j for j ¼ 1, . . . , p. Note that these relations still hold in a generalized sense if some of the Aj are singular or not squared (Bittanti et al. 1991, Hench and Laub 1994, Van Dooren 1999, Benner and Byers 2001, Benner et al. 2002). The theory and algorithms for this general case will be considered in } 3. Periodic linear systems arise naturally from continuous linear systems, when multi-rate sampling is performed (Francis and Georgiou 1998). These systems have many interesting and practical applications, with notable examples such as the helicopter ground resonance damping problem and the satellite altitude control problems (Varga and Pieters 1998, Benner et al. 1999, Bittanti and Colaneri 1999). Large state-space dimensions or large periods appear in different circumstances. The analysis and design of such systems have received much attention in recent years (Bittanti et al. 1991, Sreedhar and Van Dooren 1994, 1998, Varga 1997, Varga and Pieters 1998, Bittanti and Colaneri 1999). A numerically backward-stable periodic QZ algorithm for the P-DAREs, which relies on an extension of the generalized Schur method, has been proposed in Bojanczyk et al. (1992) and Hench and Laub (1994). Reliable parallel algorithms for solving the P-DAREs based on the swap and collapse technique have been developed (Lainiotis et al. 1994, Benner 1997, Benner et al. 2000 a,b, 2002, Benner and Byers 2001). For the case of p ¼ 1, the P-DAREs (1) become a single DARE. A well-known backward stable approach, utilizing the QZ algorithm for computing the unique s.p.s.d. solution to a DARE, has been proposed in Pappas et al. (1980), Van Dooren (1981) and MathWorks (1992). Algorithms using symplectic orthogonal transformations for solving DAREs have been proposed in (Mehrmann 1988, Ammar and Mehrmann 1991). The doubling algorithms with second-order convergence have been developed in Anderson (1978) and Kimura (1988). Matrix sign function-type methods, which solve DAREs implicitly by transforming the symplectic pair into a Hamiltonian matrix, have been developed in Lu and Lin (1993) and Lu et al. (1999). More recently, a matrix disk function method has been devel-
769
Structure-preserving algorithms oped in Benner (1997) and Benner et al. (2002) based on an inverse-free iteration (Malyshev 1993, Bai et al. 1997) for computing the unique s.p.s.d. solution of DAREs while preserving the symplectic structure (7) in each iterative step. The QZ-type algorithms (Pappas et al. 1980, Van Dooren 1981, Bojanczyk et al. 1992, MathWorks 1992, Hench and Laub 1994) (Periodic QZ or QZ) are numerically backward stable, but do not take into account the symplectic structure of ðMj , Lj Þ. Non-structurepreserving iterative processes loosen the symplectic structure, thus may cause the algorithms to fail or to lose accuracy in adverse circumstances. This will be more serious for ill-conditioned problems, when errors corrupt the stabilizing invariant subspaces and the solution process based on it. The inversion of some potentially ill-conditioned matrices cannot be avoided in the matrix sign function-type methods in Lu and Lin (1993) and Benner et al. (2002), leading to possible loss of accuracy. The symplectic structure in the algorithms (Mehrmann 1988, Ammar and Mehrmann 1991) is preserved only for systems with single input or output. For the general case, the symplectic structure is only retained in exact arithmetic. Similarly, in the matrix disc function/inverse-free methods (Bai et al. 1997, Benner 1997, Benner et al. 2000 a,b, 2002, Benner and Byers 2001), the symplectic structure can only be preserved in exact arithmetic. The aforementioned problems in non-structure-preserving algorithms will still occur, probably to a lesser extent. In this paper, we first revisit the doubling algorithm (Anderson 1978, Kimura 1988) for solving DAREs while keeping the associated symplectic matrix pairs in SSF in each iterative step. This algorithm attracted much attention but somehow went out of favour in the last decade. We develop the doubling algorithm from a new point of view, which is referred to as the structure-preserving doubling algorithm (SDA), and show the quadratic convergence of the SDA under assumptions which are weaker than (S) and (D). More details can be found in } 2. Second, we develop a structure-preserving swap and collapse algorithm (SSCA) to reduce the P-DAREs to a single DARE while keeping the associated symplectic matrix pairs in SSF. The P-DAREs can then be solved via the single DARE by SDA. The paper is organized as follows. In } 2, we revisit the doubling algorithm for solving a single DARE, based on the disk function approach (Benner 1997). Convergence and error analysis are also presented. The relationship between the disc function method and the doubling algorithm will be discussed. Section 3 contains a structure-preserving algorithm which swaps and collapses the associated symplectic matrix pairs as in (6) to a single matrix pair in SSF. In } 4, we report
some numerical results for selected DAREs (Bialkowski 1978, Patnaik et al. 1980, Petkov et al. 1989, Gudmundsson et al. 1992, Benner et al. 1995, Lin et al. 2000), comparing the SDA algorithm with the disc function/inverse-free methods (Bai et al. 1997, Benner 1997, Benner et al. 2002) and the method associated with dare in the MATLAB control toolbox (MathWorks 1992). Section 5 reports the numerical performance of the SSCAþSDA for P-DAREs sampled from Pittelkau (1993), Hench and Laub (1994) and Varga and Pieters (1998). Concluding remarks are given in } 6. 2.
Structure-preserving doubling algorithm for DAREs Let
M¼
0 , I
A H
L¼
I 0
G AT
ð10Þ
where A 2 Rnn , R 2 Rmm is s.p.d., B 2 Rnm and C T 2 Rnr are of full column rank, G ¼ BR1 BT 0 and H ¼ CT C 0. The pairs ðA, BÞ and ðA, CÞ are assumed to be stabilizable (S) and detectable (D), respectively. Then the DARE X ¼ AT XA AT XBðR þ BT XBÞ1 BT XA þ H or X ¼ AT XðI þ GXÞ1 A þ H
ð11Þ
has a unique s.p.s.d. solution (Pappas et al. 1980). In this section, we apply a swap and collapse procedure to derive the structure-preserving doubling algorithm (SDA) for solving the DARE (11), and prove the quadratic convergence of the SDA. Note that the quadratic convergence of the doubling algorithm is proven in Kimura (1988) for ðA, G, HÞ which is stabilizable and detectable. Below, in Theorem 1 we shall prove the quadratic convergence of the SDA under weaker conditions. Given M and L as in (10), we construct 2 3 I 0 0 0 6 0 I 0 07 6 7 T ð1Þ ¼ 6 7 4 A 0 I 05 H 2
0 0
0
0
60 I 6 T ð2Þ ¼ 6 40 0 0 0 2 I 0 60 0 6 T ð3Þ ¼ 6 40 0 0 I
0
I
I 0
I 0
AT ðI þ HGÞ1 7 7 7 AGðI þ HGÞ1 5 I 3
0
0
0
I7 7 7 05
I 0
3
0
ð12Þ
770
E. K.-W. Chu et al. with b denoting the result of one iterative step. Then b, L bÞ is again in SSF and satisfies by (17), ðM
and " T T ð3Þ T ð2Þ T ð1Þ ¼ 2 6 6 6 ¼6 6 4
T11
T12
T21
T22
#
b ¼ ðM 1 LÞ2 b 1 L M
I
0
0
H
0
0
0
I
I
0
AðI þ GHÞ1 T
1
A ðI þ HGÞ H
0
3
7 7 7 7: 1 7 AGðI þ HGÞ 5 AT ðI þ HGÞ1 I
ð13Þ
b are invertible. Otherwise, please provided that M and M refer to the detailed proof in Lemma 1 below. Equations (18)–(20) have exactly the same form as the doubling algorithm (Anderson 1978, (4)–(5)) (see also the references therein, as well as Lainiotis et al. (1994) and Kimura (1988)). However, the original doubling algorithm was derived as an acceleration scheme for the fixed-point iteration from (11):
We then have 2
L T M
I 60 6 ¼6 40 0
Xkþ1 ¼ AT Xk ðI þ GXk Þ1 A þ H:
3
G ðI þ HGÞ 7 7 7: 5 0
ð21Þ
ð14Þ
0
The transformation T represents row operations on L M and is obtained as follows: (1) Use the identity matrix in the (1,1)-block in L to annihilate submatrices beneath it. (2) Then use the resulting (4, 2)-block ½ðI þ HGÞ to eliminate the (2, 2)- and (3,2)-blocks. (3) Permute the row-blocks to the block-upper triangular form on the right-hand side in (14). Ignoring how T is constructed, the above factorization (14) can easily be checked by direct multiplication with the help of the SMWF. From (3), we define " # 1 0 AðI þ GHÞ e T21 ¼ M AT ðI þ HGÞ1 H I " # ð15Þ 1 I AGðI þ HGÞ e T22 ¼ L 0 AT ðI þ HGÞ1
Instead of producing the sequence {Xk}, the doubling algorithm produces fX2k g. Furthermore, the convergence of the doubling algorithm was proven when A is non-singular (Anderson 1978), and for ðA, G, HÞ which is stabilizable detectable (Kimura 1988). Our convergence results in Theorem 1 are stronger under weaker conditions (which are implied by (S) and (D)). The preservation of stabilizability and detectability is shown in Lemma 2. The interesting relationship between the SDA and the swap and collapse procedure in } 3 is also new. Problems arising from R being ill-conditioned are tackled in Chu et al. (2003 a). We now describe the SDA for solving the DARE. SDA algorithm: Input: A, G, H; (a small tolerance); Output: s.p.s.d. solution X for DARE. Initialize j 0, A0 A, G0 G, H0 H; Repeat W I þ G j Hj , Solve for V1 , V2 from WV1 ¼ Aj , V2 W T ¼ Gj ; Ajþ1 Aj V1 , Gjþ1 Gj þ Aj V2 ATj , T Hjþ1 Hj þ V1 Hj Aj ; Stop when kHjþ1 Hj kF kHjþ1 kF ; Set X Hjþ1 . End of SDA algorithm
and consequently deduce that eL ¼ L eM: M
ð16Þ
eL and M e M and apply the SMWF We then compute L to produce " # " # b b 0 I G A b eL and M b eM L ¼L ¼M bT b I 0 A H ð17Þ where b ¼ AðI þ GHÞ1 A A
ð18Þ
b ¼ G þ AGðI þ HGÞ A G 1
T
b ¼ H þ AT ðI þ HGÞ1 HA H
ð19Þ ð20Þ
2.1. Convergence of SDA Let
A M¼ H
0 , I
I L¼ 0
G AT
where G ¼ GT , H ¼ H T . Suppose M L has no eigenvalues on the unit circle and there exist non-singular Q, Z such that J 0 I 0 QMZ ¼ s , QLZ ¼ ð22Þ 0 I 0 Js where the spectrum ðJs Þ 2 Os f: jj < 1g. For the convergence analysis, we first prove the following lemma.
771
Structure-preserving algorithms Lemma 1: Let T be any 4n 4n non-singular matrix such that L 0 0 M T M L 0 0 L11 L12 0 M12 ¼ : ð23Þ 0 L22 0 M22
Routine manipulations show that I G T11 L T12 M ¼ 0 ðI þ HGÞ A 0 : T11 M ¼ HA 0 Recall that
Then (i) the pencil M22 L22 is uniquely determined up to a left transformation. b L b is equivalent to the pencil (ii) The pencil M G
Js2 0
I 0 0 Z 1 2 0 Js I
b L eL, M bM e M are given by (17), for where L some non-singular matrix G.
From (13), (14) and (22), we can choose J 0 Y¼ s 0 0 so that " T
ð3Þ
Proof:
T¼
T11 T21
T12 : T22 "
Since M L is regular, so is L 0 0 M M L 0 0 implying that
L M
J T YJ
0
I2n
Therefore, the pencil M22 L22 ¼ T21 M T22 L is uniquely determined up to a left transformation. (ii) From (15)–(17), we have # " #" L 0 T11 L T12 M T11 T12 ¼ e e M L M L 0 # " " #" # 0 M 0 T11 M T11 T12 ¼ e e b 0 0 M L 0 M
L M
T12 L b L
#
e, L e are given in (15). From the definiwhere M tion of T in (13), we have I 0 0 0 T11 ¼ , T12 ¼ : H 0 0 I
0 L
#
0
Q
I 6 0 60 ¼6 40 Z
0 I
0 0
0
I
0
0
0 Js2
Y
I2n
2
# Q 0 0 I2n 0 J T YJ 0 Q 0 Y I2n I2n 2 3 0 0 Js 0 60 0 0 07 6 7 ¼6 7: 4 0 0 Js2 0 5 0
0
0
Z 0
Q
I2n
M
I2n 0
0 0
is of full column rank. An inspection of (23) indicates that the rows of ½T21 , T22 form a basis of the null space of L : M
"
I2n
(i) Partition
I : 0
0 J I
0
3 0 Js 7 7 7 0 5
Z
0
0
Z
I
By (i) we have b L b¼ G M
Js2 0
I 0 0 I
0 Js2
Z 1 œ
for some non-singular matrix G.
We now prove the following convergence theorems. Theorem 1: Let A M¼ H
0 , I
L¼
I 0
G AT
where G ¼ GT , H ¼ H T . Suppose M L has no eigenvalues on the unit circle and there exist non-singular Q, Z such that (22) holds. Denote Z1 Z3 Z¼ Z2 Z4 Zi 2 Rnn for i ¼ 1, 2, 3, 4. If Z1 and Z4 are invertible, then the sequences fAk , Hk , Gk g computed by the SDA algorithm satisfy: k
(i) kAk k ¼ OðkJs2 kÞ ! 0 as k ! 1,
772
E. K.-W. Chu et al.
(ii) Hk ! X, where X solves the DARE (11): X ¼ AT XðI þ GXÞ1 A þ H (iii) Gk ! Y, where Y solves the dual DARE Y ¼ AYðI þ HYÞ1 AT þ G:
ð24Þ
Moreover, the convergence rate in (i)–(iii) above is k Oðjn j2 Þ, where j1 j jn j < 1 < jn j1 j1 j1 with i , 1 being the eigenvalues of M L i (including 0 and 1).
A0 H0
M0 ¼ M ¼
0 , I
L0 ¼ L ¼
I 0
ð29Þ Substituting Z 1 in (29) into (27), we obtain "
Let
Proof:
Now, by row-block elimination using Z1 as a pivot, we can compute " 1 # Z1 ½I YðI þ XYÞ1 X Z11 YðI þ XY Þ1 1 Z ¼ Z41 ðI þ XYÞ1 X Z41 ðI þ XYÞ1 " # Z11 ðI þ YXÞ1 Z11 YðI þ XYÞ1 ¼ : Z41 ðI þ XY Þ1 X Z41 ðI þ XYÞ1
G0 : AT0
Ak
0
Hk
I
#
" ¼
Then " Mk " ¼
Ak
0
Hk
I
Hk1 þ ATk1 ðI þ Hk1 Gk1 Þ1 Hk1 Ak1
0
#
I
and " # I Gk 0 ATk
" ¼
and Lk " ¼
G1k G3k
k
Js2 Z11 YðI þ XYÞ1
#
Z41 ðI þ XYÞ1
Gk
0
ATk
I
Gk1 þ Ak1 Gk1 ðI þ Hk1 Gk1 Þ1 ATk1
#
G2k G4k " Z11 ðI þ YXÞ1
#
I
0
G2k G4k " k Js2 Z11 ðI þ YXÞ1
ð30Þ
ð25Þ "
#
Z41 ðI þ XYÞ1 X
#
Ak1 ðI þ Gk1 Hk1 Þ1 Ak1
G3k
G1k
Z11 YðI þ XYÞ1
k
#
k
Js2 Z41 ðI þ XYÞ1 X Js2 Z41 ðI þ XYÞ1 ð31Þ
ATk1 ðI
þ Hk1 Gk1 Þ
1
ATk1
From the (2, 1)-block of (29), we obtain
# : ð26Þ
k
G2k Z11 ðI þ YXÞ1 ¼ G4k Js2 Z41 XðI þ YXÞ1 implying that k
From Lemma 1(ii) and the SDA, we have k 2 0k 0 I Z1 Mk Lk ¼ Gk Js 0 Js2 0 I
G2k ¼ G4k Js2 Z41 XZ1 :
Gk ¼
G1k G2k
G3k G4k
h i k k G4k Z41 ðI þ XYÞ1 þ Js2 Z41 XZ1 Js2 Z11 YðI þ XYÞ1 ¼ I:
ð34Þ
k ¼ 1, 2, . . . , are suitable non-singular matrices. Let X ¼ Z2 Z11 ,
Y ¼ Z3 Z41 :
ð33Þ
Consequently, (33) and the (2, 2)-block of (30) lead to ð27Þ
where
ð32Þ
It then follows from (33) and (34) that kþ1
k
G4k ¼ ðI þ XYÞZ4 þ O kJs2 k , G2k ¼ O kJs2 k ð35Þ
ð28Þ
From (22), it follows that the spans Z1 Z3 < and < Z2 Z4 respectively form the stable invariant subspaces of M L and ðJ T LJÞ ðJ T MJÞ. By the result of Pappas et al. (1980), it is clear that the symmetric X and Y solve the DAREs (11) and (24), respectively.
for sufficiently large k. Similarly, from the (1, 2)-block of (30), we have k
G3k ¼ G1k Js2 Z11 YZ4 :
ð36Þ
From the (1, 1)-block of (31), we obtain h i k k G1k Z11 ðI þ YXÞ1 þ Js2 Z11 YZ4 Js2 Z41 ðI þ XYÞ1 X ¼ I:
ð37Þ
:
773
Structure-preserving algorithms It follows from (36) and (37) that kþ1
G1k ¼ ðI þ YXÞZ1 þ O kJs2 k ,
G3k
k
¼ O kJs2 k ð38Þ
for sufficiently large k. From (35) and the (2, 1)-block of (30), we obtain k
Hk ¼ G4k Z41 ðI þ XYÞ1 X G2k Js2 Z11 ðI þ YXÞ1 kþ1
¼ X þ O kJs2 k ð39Þ for sufficiently large k. Equation (38) and the (1, 2)block of (31) then lead to k
Gk ¼ G1k Z11 ðI þ YXÞ1 Y þ G3k Js2 Z41 ðI þ XYÞ1 kþ1
¼ Y þ O kJs2 k ð40Þ for k sufficiently large. Finally, (38) and the (1, 1)-block of (30) imply k
Ak ¼ G1k Js2 Z11 ðI þ YXÞ1 G3k Z41 ðI þ XYÞ1 X k
k ð41Þ ¼ ðI þ YXÞZ1 Js2 Z11 ¼ O kJs2 k : Since the spectral radius of Js is equal to jn j < 1, (39)–(41) imply the results in (i)–(iii), as well as the k Oðjn j2 Þ rate of convergence. œ The following lemma proves that the stabilizability and detectability properties are preserved by the SDA throughout its iterative process. Lemma 2: The stabilizability of ðA, BÞ implies that b, B b¼ B bÞ is stabilizable, where G bB bT 0 is an FRD of ðA b b, C bÞ is detectG. The detectability of ðA, CÞ implies that ðA Tb b b b able, where H ¼ C C 0 is an FRD of H . Proof:
œ
See Appendix.
Theorem 2: M¼
Let " A H 1
T
0 I
#
" and
L¼
I
G
0
AT
#
T
with G ¼ BR B 0 and H ¼ C C 0. Assume that ðA, BÞ is stabilizable and ðA, CÞ is detectable. Then the sequences fAk , Hk , Gk g computed by the SDA satisfy (i), (ii), (iii) as in Theorem 1. Proof: It is well known that these reasonable assumptions imply that M L has no eigenvalues on the unit circle, and that Z1 and Z4 are invertible (see, e.g. Paige and Van Loan (1981) and Lu and Lin (1993), for details). Thus the conditions in Theorem 1 are satisfied. Remark: Theoretically, the convergence behaviour for the SDA and the algorithms (Bai et al. 1997, Benner
1997, Benner et al. 2002) are similar. Nevertheless, Theorem 1 directly proves, under the assumptions that M L have no unit modulo eigenvalues and Z1 , Z4 are invertible, that the sequences fAk , Hk , Gk g generated by the SDA converge to zero and the unique s.p.s.d. solutions of the DAREs in (11) and (24), respectively. Lemma 2 shows the preservation of stabilizability and detectability of the iterates ðAk , Gk , Hk Þ generated by the SDA. Furthermore, in Theorem 2, we see that the assumptions in Lemma 1 are weaker than the conditions (S) and (D). This distinction of preserving the symplectic structure in SSF, as well as the difference in operation counts is responsible for the superior performance of the SDA. b, G b and H b 2.2. Computation of A We now propose a structured and efficient procedure b, G b and H b in (18)–(20), respecfor the computation of A T tively, where G ¼ BB 0, H ¼ C T C 0 are FRDs. Let W ¼ ðI þ GHÞ1 . It is easily seen that HW ¼ W T H and GW T ¼ WG are s.p.s.d. By the SMWF we can derive the formulas W ¼ ðI þ GHÞ1 ¼ I BðI þ BT HBÞ1 BT H T
T
ð42Þ
T 1
GW ¼ G GC ðI þ CGC Þ CG ¼ BðI þ BT HBÞ1 BT T
ð43Þ
T
1
T
W H ¼ H HBðI þ B HBÞ B H ¼ C T ðI þ CGCT Þ1 C:
ð44Þ
When B and C start with low ranks, we can improve the efficiency of our computation further by the following compression process. Compute the Cholesky decomposition of WG ðI þ BT HBÞ ¼ KBT KB and WH ðI þ CGC T Þ ¼ KC KCT . Applying (42)–(44) to (18)–(20), we compute b ¼ A2 ABðI þ BT HBÞ1 BT HA A b ¼ G þ ABðI þ B HBÞ B A G " # T B bB bT 0 ¼ B, ABKB1 B KBT BT AT T
1
T
ð45Þ
T
(FRD) ð46Þ
and b ¼ H þ AT CT ðI þ CGC T Þ1 CA H C b 0 bT C ¼ C T , AT C T KCT C KC1 CA
(FRD) ð47Þ
bT
b and C are the full column rank compreswhere B sions of matrices B, ABKB1 and C T , AT CT KCT , bÞ > rank(B) and respectively. In general, rank(B bÞ > rank(C), and the compression process rank(C
774
E. K.-W. Chu et al.
b b and C becomes unprofitable when the ranks of B reach n. Remark: From (43) and (44), it is necessary to compute the Choleskey decompositions of the symmetric positive definite matrices WG and WH when updating G and H. This requires m3 =3 þ r3 =3 flops. If, instead of Cholesky factors, we compute the square roots of WG and WH in (43) and (44), then an additional 12ðm3 þ r3 Þ flops are required. Here the square roots of WG and WH are obtained from WG ¼ KBT KB ¼ VB B VBT ¼ ðVB B VBT ÞðVB B VBT Þ ¼ XB2 WH ¼ KC KCT ¼ UC C UCT ¼ ðUC C UCT ÞðUC C UCT Þ ¼ XC2 and the SVDs KB ¼ UB B VBT and KC ¼ UC C VCT . Cheaper methods for calculating the square roots are available but the corresponding implications on numerical stability and cost benefits are questionable. As a result, we do not choose the square root alternative in our algorithm. 2.3. Error analysis of SDA b, G b and We now consider the errors in calculating A b as in (18)–(20), respectively. From (18)–(20) we see H that the matrix W ðI þ GHÞ1 occurs frequently in the SDA algorithm, for some generic s.p.s.d. matrices G and H. From (42)–(44), instead of inverting the nonsymmetric ðI þ GHÞ, we can invert the s.p.d. matrices b, G b ðI þ BT HBÞ and ðI þ CGC T Þ, when updating of A b. The conditioning of ðI þ BT HBÞ in (42) and H and (43) (or ðI þ CGC T Þ in (44)) is well known, with the condition number being 2 1 þ max ðCBÞ ¼ 2 ðCBÞ 1 þ min
ð48Þ
and denoting the singular values. The error analysis in b, G b, H bÞ in (18)–(20) is thus the updating ðA, G, HÞ to ðA reduced to the routine discussion about the accumulation of errors in forming sums and products. With indicating errors, jFj denoting the matrix with all signs of elements in F ignored and denoting the maximum error in the starting data A, G and H, we typically have the asymptotic inequalities
convergence factor dominates. It is unlikely to have any ill effect, as the accumulated error in the matrix additions and multiplications should be of magnitude around a small multiple of the machine accuracy. As the SSF properties are preserved in the SDA, any error will be a structured one, only pushing the iteration towards a solution of a neighbouring SSF system. Thus the algorithm is stable in this sense, when the errors are not too large and when stabilizability and detectability are maintained. For large ks, as Ak ! 0, Gk and Hk converge to the unique s.p.s.d. solutions of (24) and (11), respectively. Danger again will only come at the initial stage of the iteration. Corresponding checks may be prudent in the algorithm.
2.4. Operation counts The matrix disk function method in Benner (1997) and Benner et al. (2002) is developed to solve the DARE (11) by using a swapping technique built on the QR decomposition. We refer to the algorithm presented in Benner (1997) and Benner et al. (2002) as QR-SWAP. We shall perform a flop-count for the SDA as well as the QR-SWAP algorithm. For the counts for components like LU- and QR-decompositions, consult Golub and Van Loan (1996) for details. For the SDA, we have the following count for one iteration: Calculation in SDA GH LU decomposition of I þ GH ðI þ HGÞ1 AT b ¼ AðI þ GHÞ1 A A
Flops n3 2 3 3n 3
n
n3
ðI þ GHÞ1 A
n3
AG b ¼ G þ AGðI þ HGÞ1 AT G
n3
HA b ¼ H þ AT ðI þ HGÞ1 HA H The total count ¼
1 3 2n 3
n
1 3 2n 23 3 3 n
bjk, kjG bjk, kjH bjk ðc1 þ c2 Þ kjA with c1 and c2 being polynomials in n of low degrees. Note that the coefficients c1 and c2 are dependent on the sizes of A, G and H. When the condition number in (48) is bounded by an acceptable number, the accumulation of error will be dampened by the fast rate of convergence at the final stage of the iterative process. Danger, ifk any, lies in the early stage of the process before the 2n
There is a small saving by (42)–(44) at the early stage of the iteration, when G and H have low ranks. We ignore this saving in the above count. Note that the b and H b saves n3 flops. We have also symmetry in G 2 ignored any Oðn Þ operation counts and the memory counts. For the QR-SWAP algorithm (Benner 1997, Benner et al. 2002), we have the following count for
775
Structure-preserving algorithms one iteration: Calculation in QR-SWAP " # " # L R Q ¼ M 0 " # Q11 Q12 Forming Q ¼ Q21 Q22
Lemma 3: Flops 80 3 3 n
224 3 3 n 3
Q21 L
8n
Q22 M
8n3
The total count ¼
352 3 3 n
There is some saving for the QR-SWAP algorithm in the first iteration, making use of the structure in M and L. This structure is lost in the later stages. There is also some saving in the accumulation of Householder factors when forming Q, as only part of Q is later required. This accounts for part of the over-estimation in the table above, as compared to the operation count 3 of 320 3 n flops from Benner (1997) and Benner et al. (2002). The operation count for the SDA is about 7% of that for QR-SWAP. This is mainly due to the fact that the main steps in QR-SWAP involve the QR decomposition of L 2 R4n2n M and the formation of Q 2 R4n4n , all in higher dimensions. The operations in the SDA are all within Rnn . It is difficult to conduct an operation count for dare, mainly because of the iterative nature of the Schur decomposition before invariant subspaces and solutions can be obtained. Operation counts per iteration should be of the same order as QR-SWAP. Peripheral operations in MATLAB itself also add heavily to the count and make a detailed comparison difficult. 3. Swap and collapse Recall that the s.p.s.d. solutions fXj gpj¼1 to the PDAREs (5) can be obtained from the invariant subspace associated with the stable eigenvalues for Pj in (9) when all Aj are non-singular. In general, the representation of Pj in (9) can also be applied when some Aj are singular or even not squared as in (4). The swap and collapse process (Benner and Byers 2001, Benner et al. 2002), which does not form the product Pj in (9) explicitly, can be used to compute Xj1 by swapping the order of the products and collapsing them into a single bj , L b bj Þ such that Pj ¼ L b1 symplectic matrix pair ðM j Mj , 2n 2n bj , M bj 2 R j1 j1 . The process relies on the where L following lemma (Benner et al. 2002, Lemma 1).
Consider E 2 Rsq , F 2 Rtq and let Q11 Q12 E R ¼ Q21 Q22 F 0
be a QR factorization of ½E T , F T T where R 2 Rqq , Q11 2 Rqs , Q12 2 Rqt , Q21 2 RðsþtqÞs and Q22 2 RðsþtqÞt . Then 1 Q1 22 Q21 ¼ FE :
ð49Þ
Here, the inverse of E or Q22 is purely notational. In fact, the relation in (49) denotes the relation 1 Q21 E ¼ Q22 F. We use the relation Q1 in 22 Q21 ¼ FE the swap and collapse process even when E or Q22 are singular or not squared. Using Lemma 3, the order of the products in (9) can be swapped, with all Ls b1 collapsed together to form L j . As an illustration, let us consider the following P1 for a period p ¼ 4 1 1 1 P1 ¼ L1 4 M4 L3 M3 L2 M2 L1 M1 :
ð50Þ
Note that the sizes of matrices Mj and Lj are given in (6) with nj ¼ njþ4 . Applying Lemma 3, we can swap the ð1Þ 1 ð1Þ order in the product M2 L1 1 ¼ ðL1 Þ M2 to obtain ð1Þ 1 ð1Þ 1 1 P1 ¼ L1 4 M4 L3 M3 L2 ðL1 Þ M2 M1 : ð1Þ Collapsing Lð1Þ 1 L2 and M2 M1 into, respectively, L1:2 and M1:2 , we obtain 1 1 P1 ¼ L1 4 M4 L3 M3 L1:2 M1:2 : ð1Þ 1 ð1Þ Repeat the process, swapping M3 L1 1:2 ¼ ðL1:2 Þ M3 and then collapsing the resulting terms, we obtain, ð1Þ 1 ð1Þ 1 with L1 1:3 M1:3 ¼ L3 ðL1:2 Þ M3 M1:2 ð1Þ 1 ð1Þ 1 1 1 P1 ¼ L1 4 M4 L3 ðL1:2 Þ M3 M1:2 ¼ L4 M4 L1:3 M1:3 : ð1Þ 1 ð1Þ A final swap for M4 L1 1:3 ¼ ðL1:3 Þ M4 and the associated collapse step will produce ð1Þ 1 ð1Þ 1 P1 ¼ L1 4 ðL1:3 Þ M4 M1:3 ¼ L1:4 M1:4
where L1:4 , M1:4 2 R2n4 2n4 . The solution X4 can then be calculated via the stable invariant subspace of ðM1:4 , L1:4 Þ, with other Xk ðk 6¼ 1Þ obtained from (1). As indicated in Benner and Byers (2001) and Benner et al. (2002), note that the swap and collapse step can be performed for different products in Pj in parallel. For example in (50), we can swap and collapse M4 L1 3 and M2 L1 1 simultaneously to obtain 1 1 1 P1 ¼ L1 4 M4 L3 M3 L2 M2 L1 M1 ð1Þ 1 ð1Þ ð1Þ 1 ð1Þ 1 ¼ L1 4 ðL3 Þ M4 M3 L2 ðL1 Þ M2 M1 1 ¼ L1 3:4 M3:4 L1:2 M1:2 :
A final swap and collapse associated with M3:4 L1 1:2 then produces ð1Þ 1 ð1Þ 1 P1 ¼ L1 3:4 ðL1:2 Þ M3:4 M1:2 ¼ L1:4 M1:4 :
776
E. K.-W. Chu et al.
More importantly, notice that the QR factorization in Lemma 3 can be replaced by other factorizations. We now develop a structure-preserving procedure, which is closely related to the QR-SWAP algorithms (Benner and Byers 2001, Benner et al. 2002), based on the LUlike factorization as in (12)–(14) to reduce the periodic symplectic matrix pairs fðMj , Lj Þgpj¼1 in (6) to a single symplectic matrix pair ðM1:p , L1:p Þ 2 R2np 2np R2np 2np in SSF. Given M1, L1, M2 and L2 as in (6), we have the factorization
L1 T M2
R ¼ 0
or
Consequently, we obtain " # In2 G2 þ A2 G1 ðIn1 þ H2 G1 Þ1 AT2 ð1Þ L1:2 :¼ L1 L2 ¼ 0 AT1 ðIn1 þ H2 G1 Þ1 AT2 " # b2 In 2 G ð53Þ bT2 0 A and M1:2 :¼ M2ð1Þ M1 " # A2 ðIn1 þ G1 H2 Þ1 A1 0 ¼ ½H1 þ AT1 ðIn1 þ H2 G1 Þ1 H2 A1 In4 " # b2 A 0 : b2 In H
ð54Þ
4
2 6 6 6 6 6 6 4
0
In 1
0
3
0
7 7 H2 0 0 In1 7 7 1 1 7 A2 ðIn1 þ G1 H2 Þ 0 In2 A2 G1 ðIn1 þ H2 G1 Þ 7 5 AT1 ðIn1 þ H2 G1 Þ1 H2 In4 0 AT1 ðIn1 þ H2 G1 Þ1 2 3 2 3 In1 G1 In 1 G1 6 7 6 7 6 0 0 ðIn1 þ H2 G1 Þ 7 AT1 7 6 7 6 6 7 7¼6 6 ð51Þ 7: 6 7 6 7 6 A2 7 4 0 0 0 5 4 5 H2
0
In1
0
Again, similar to (14), the transformation T represents row operations on
L1
M2 and the factorization (51) can easily be checked by direct multiplication with the help of the SMWF. Similar to Lemma 3, it is then obvious, after the swap, that we have ð1Þ 1 ð1Þ 1 M2 L1 1 ¼ ðL1 Þ M2 ¼ Q22 Q21
with Q22 ¼ Lð1Þ being the bottom-right ðn2 þ n4 Þ 1 ðn1 þ n2 Þ block in T of (51), Q21 ¼ M2ð1Þ being the bottom-left ðn2 þ n4 Þ ðn1 þ n4 Þ block in T, and 2 4 Lð1Þ 1 ¼ 2 M2ð1Þ ¼ 4
In 2
A2 G1 ðIn1 þ H2 G1 Þ1
0
AT1 ðIn1 þ H2 G1 Þ1
5 0
AT1 ðIn1 þ H2 G1 Þ1 H2
In4
1 1 L1 1:2 M1:2 ¼ L2 M2 L1 M1 :
ð55Þ
In (52)–(54), we have performed the transformation 3 2 L1 M1 7 6 M2 L2 7 6 7 6 7 6 M L 3 3 7 6 7 6 . . 7 6 .. .. 5 4 Mp Lp 3 2 7 6 M 0 L1:2 7 6 1:2 7 6 ! 6 7 M L 3 3 T 6 ð56Þ 7 7 6 .. .. 7 6 . . 5 4 Mp Lp using row operations in T, with L1:2 and M1:2 given in (53) and (54). The calculation in the structure-preserving swap and collapse algorithm (SSCA), continuing the swaps and collapses in (52)–(54), can then be summarized as: For j ¼ 2, 3, . . . , p, bj ¼ Aj ðIn þ G bj1 Hj Þ1 A bj1 2 Rnj np , A j1
ð57Þ
bj1 ðIn þ Hj G bj1 Þ1 ATj 2 Rnj nj , bj ¼ Gj þ Aj G G j1
ð58Þ
bTj1 ðIn þ Hj G bj1 Þ1 Hj A bj1 2 Rnp np bj1 þ A bj ¼ H H j1
3
A2 ðIn1 þ G1 H2 Þ1
Note that ðM1:2 , L1:2 Þ is again in SSF and satisfies
ð59Þ 3 5:
ð52Þ
b1 ¼ A1 , G b1 ¼ G1 and H b1 ¼ H1 , and the b denotwith A ing the result of a swap and collapse step. Finally, the following SSCA reduces the periodic symplectic matrix pairs fðMj , Lj Þgpj¼1 as in (6) into a
Structure-preserving algorithms single symplectic matrix pair in SSF 3 02 bp 0 A b, L bÞ ðM1:p , L1:p Þ ¼ @4 5, ðM bp In H p
2 4
In p
bp G
0
bTp A
the swapping and collapsing procedure using GSVD is not considered here. In the SSCA, non-orthogonal transformations are used and matrix products are computed in each step. However, the nice structures in the standard symplectic form (SSF) are preserved in each step. Furthermore, the SSCA only involves inverses of s.p.d. matrices, with little error accumulation. For solving P-DAREs, applying the SDA to the collapsed system produces Xp ¼ X0 , from which the other Xj , ð j ¼ p 1, . . . , 1Þ can be found through (1). Accumulation of error for moderate values of p should be acceptable. A good check of ep again by subaccuracy will be to calculate X stituting X1 into (1) and compare that with the Xp from the SDA.
31 5A:
SSCA algorithm: Input: Aj; Gj, Hj 0, j ¼ 1, . . . , p; bp , H bp ; G bp 0; Output: A b1 b b1 A1 , G G1 , H H1 ; Initialize A1 For j ¼ 2, 3, . . . , p bj1 ; Compute W Inj1 þ Hj G T Solve WV1 ¼ Aj , WV2 ¼ Hj for V1, V2 ; bj bj1 , G bj1 V1 , bj V1T A Gj þ Aj G A T bj1 V2 A bj1 ; bj bj1 þ A H H End End of SSCA algorithm
(iv) The SSCA is utilized as a Gaussian-like decomposition to perform the swaps in (51), in contrast with the QR decompositions used in Benner (1997), Benner and Byers (2001) and Benner et al. (2002). Although important differences in structure-preserving set the algorithms apart in performance, they share the same parallelism. Swaps and collapses can be carried out at different points in (56) simultaneously (see Benner et al. (2002) for details of parallelism, and the related remarks in } 6). This parallelism is obvious in the SSCA.
Remarks: (i) It is vital to preserve the SSF property of the symplectic matrix pairs in the SSCA, by mainbj and H bj for j ¼ taining the symmetry of G 2, 3, . . . , p. For solving P-DAREs, applying the SDA to the collapsed system produces Xp ¼ X0 , from which the other Xj ð j ¼ p 1, . . . , 1Þ can be found through (1). We call this combination SSCAþSDA. (ii) It can be observed that the operations in the SSCA are closely related to those in the SDA in } 2. (The iterative process in the SDA is like the swapping and collapsing in the SSCA, with periodicity p ¼ 1.) The error analysis of the SDA in the previous section can be shared with that for the SSCA. (iii) In Lemma 3, an orthogonal reduction technique, based on the QR-factorization, is used to swap and collapse two matrix pairs. However, the process is not backward stable because only the lower half of the orthogonal transformation is involved. Consequently, the swap and collapse procedure of the QR-SWAP algorithm (Benner and Byers 2001, Benner et al. 2002) is not backward stable and the final collapsed matrix pair ðM1:p , L1:p Þ is generally not in SSF. On the other hand, the swapping and collapsing of two matrix pairs using GSVD has been proven to be numerically backward stable (Benner and Byers 2001). However, the computational cost of the GSVD is much higher than that of the QR-factorization. Also, the question of stability for the collapsing of products of more than two matrix pairs is still open. Thus,
777
3.1. Preservation of positivity, stabilizability and detectability The following lemma proves that the important stabilizability and detectability are preserved by the SSCA. Lemma 4: The (P-S) property of the fðAj , Bj Þgpj¼1 bp , B bp ¼ bp Þ is stabilizable, where G implies that ðA T b b b Bp Bp 0 is the FRD of Gp . The (P-D) property of the bp , C bp Þ is detectable, where fðAj , Cj Þgpj¼1 implies that ðA Tb b bp . b Hp ¼ Cp Cp 0 is the FRD of H Proof: 4.
See Appendix.
Numerical experiments for DAREs The aim of this section is to illustrate the superior performance of the SDA algorithm, as compared to the QR-SWAP algorithm (Benner et al. 2002) and the MATLAB control toolbox command dare (MathWorks 1992). The solution of a small number of difficult DAREs, some from the benchmark set of problems in Benner et al. (1995), are considered. Some problems have parameters which control their degree of difficulty and conditioning. Tables of residuals, relative
778
E. K.-W. Chu et al.
errors and iteration numbers are presented for selected values of the parameter. For examples with varying dimensions, graphs of accuracies, CPU-times and efficiency ratios against the problem size are also presented. Numerical results confirm the accuracy and efficiency of the SDA as predicted in } 2. In particular, the SDA is up to ten times more efficient than the QR-SWAP, as predicted by the flop-counts in } 2. The SDA also seems to be more efficient than dare but the comparison with part of a general purpose package cannot be done on an equal footing. All in all, the SDA is efficient without equal compared to other methods, for the difficult problems we have tested. We see no reason why it should be performing differently for other problems in general, especially after other refinements, such as the possibilities in parallel computing discussed in } 6, are implemented. Now some details in the numerical experiments are listed. When the exact solution, denoted by X, is known, the relative error of an approximate solution e is calculated by X Rel: err:
e XkF kX : kXkF
The associated residual is calculated by eðI þ GX eÞ1 A þ H X ekF : Residual kAT X We try our best to compare the CPU time used by different methods for approximate solutions of similar accuracies. Often, it is impossible to match these accuracies and we have been forced to compare more accurate results from the SDA with less accurate ones from other methods. The opposite situation of comparing less accurate results from the SDA seldom occurs. This issue of relative accuracies in the comparison for DAREs is less severe than for P-DAREs in } 5. For the tables in the following examples, data for various methods are lists in columns with obvious headings. The heading ‘dare’ is for the dare command in MATLAB (MathWorks 1992), ‘QR’ is for the QR-SWAP method in Benner et al. (2002), and ‘SDA’ stands for the SDA algorithm. There are no iteration numbers to report for dare and an ‘’ in the tables indicates a failure of convergence. Failures occur frequently for dare for difficult problems. In the graphs, ‘ratio_dare’ is the ratio between the CPU-times for dare and the SDA, and ‘ratio_QR’ is defined similarly. Note the logarithmic scales used in these graphs. Here we only reported some representative numerical examples and retained the numbering of examples in Chu et al. (2003 b), where more numerical examples are presented. In order to demonstrate the behaviour of quadratic convergence for the SDA and QR-SWAP algorithms, we display the relative errors (djSDA and djQR in Frobenius norm) in two examples. Note that all
examples are ‘square’, with nj and mj being invariant of j. All computations were performed using MATLAB/ Version 6.0 on a Compaq/DS20 workstation. The machine precision is "m 2:22 1016 . Example 1: Let 0 " , A¼ 0 0
0 B¼ , 1
R ¼ 1,
H ¼ I2 :
The stabilizing solution is given by 1 0 X¼ 2 0 1þ" and the closed-loop spectrum is f0, 0g. For " ¼ 100, this is Example 2 from Gudmundsson et al. (1992). As " ! 1, this becomes an example of a DARE which is badly scaled in the sense of Petkov et al. (1989), due to the fact that kAkF kGkF , kHkF . The numerical results with " ¼ 100, 104 , 106 are given in table 1. For " ¼ 100, the behaviour of quadratic convergence for the QRSWAP algorithm and SDA is shown in table 2. Example 2: The following example is identical to Example 13 of Benner et al. (1995) which was presented originally in Petkov et al. (1989). Let A0 ¼ diagð0, 1, 3Þ, V ¼ I 23 vvT , vT ¼ 1 1 1 : Then A ¼ VA0 V,
1 G ¼ I3 , "
H ¼ "I3 :
dare
QR
SDA
" ¼ 100
Residual Rel. err. Iter. no.
6:72 109 6.72 1013 –
3:25 1017 3.25 1021 4
0:00 100 0.00 100 2
" ¼ 104
Residual Rel. err. Iter. no.
4.40 101 4.40 109 –
2.98 108 2.98 1016 4
0.00 100 0.00 100 2
" ¼ 106
Residual Rel. err. Iter. no.
6.11 107 6.11 105 –
1.22 104 1.22 1016 4
0.00 100 0.00 100 2
Table 1.
Results for Example 1.
j
djQR
djSDA
1 2 3 4
1.00 102 1.41 102 1.42 1018 "m
1.00 100 0.00 100 0.00 100 0.00 100
Table 2.
Results for Example 1 with " ¼ 100.
779
Structure-preserving algorithms dare
QR
SDA
Residual Rel. err. Iter. no.
2.57 1015 2.01 1016 –
3.09 1015 4.04 1016 7
2.23 1015 1.86 1016 6
" ¼ 104
Residual Rel. err. Iter. no.
2.18 1011 2.39 1016 –
1.79 103 2.11 108 7
1.93 1011 1.72 1016 6
" ¼ 106
Residual Rel. err. Iter. no.
2.66 109 2.80 1016 –
1.22 103 1.42 104 6
1.47 109 1.64 1016 6
" ¼ 1:0
Table 3.
Results for Example 2.
j
djQR
djSDA
1 2 3 4 5 6 7
6.07 101 5.41 102 7.02 103 1.41 104 6.52 108 1.35 1014 "m
4.87 101 3.83 101 4.93 103 3.24 107 2.65 1014 0.00 100 0.00 100
Table 4.
Results for Example 2 with " ¼ 1:0.
pffiffiffi The factorization H ¼ C C with C ¼ "V, and simi1 T larly, G ¼ BR B with B ¼ I3 and R ¼ "I3 . The exact solution is given by X ¼ V diagðx1 , x2 , x3 Þ V where pffiffiffiffiffi pffiffiffi ð1 þ 5Þ ð9 þ 85Þ , x3 ¼ " : x1 ¼ ", x2 ¼ " 2 2 4 6 The numerical results with " ¼ 1:0, 10 , 10 are given in table 3. For " ¼ 1:0, the behaviour of quadratic convergence for the QR-SWAP algorithm and SDA is shown in table 4.
The stabilizing solution has a very simple form, namely X ¼ diagð1, 2, . . . , nÞ: Note that the choice of r does not influence the stabilizing solution X but for r < 1, the condition number of DARE behaves like 1/r. In figure 1, we report the comparison of CPU times and its ratio with respect to the SDA for n ¼ 50, 100, 150, 200, 250, 300. We also list the residuals (res) and relative errors (RE) in table 5. Note that the residuals and relative errors for the SDA are in machine accuracy. For a smaller parameter r, say r ¼ 1012 , the reports of CPU times, residuals and relative errors are given in the figure 2 and table 6, respectively. Again, the residuals and relative errors for the SDA are in machine accuracy. Example 4: In this example we consider a linear system ðA, B, CÞ such that the corresponding symplectic matrix pair ðM, LÞ has a pair of eigenvalues close nearly to the unit circle in the complex plane. In the following the system matrices are constructed step by step via some symplectic structure-preserving equivalence transformations. Let A0, G0 and H0 be 10 10 matrices defined by A0 ¼ diagð1, A01 , A02 , A03 , A04 , 1Þ G0 ¼ diagð103 , 0, . . . , 0, 102 Þ
T
Example 3: The following example is identical to Example 15 of Benner et al. (1995) which was presented originally in Pappas et al. (1980, Example 3). Consider the DARE defined by 3 2 0 1 0 0 6 .. . . . . . . .7 6. . .. 7 . . 7 6 7 6 .. .. 2 Rnn , A ¼ 6 ... . 07 . 7 6 7 6 40 0 15 0 0 0 2 3 0 6 .. 7 6.7 7 B¼6 6 7, R ¼ r, H ¼ In : 405 1
H0 ¼ diagð102 , 0, . . . , 0, 103 Þ where A01 ¼
r1 cosð=3Þ
r1 sinð=3Þ r2 cosð7=5Þ ¼ r2 sinð7=5Þ r3 cosð=4Þ ¼ r3 sinð=4Þ r4 cosð=8Þ ¼ r4 sinð=8Þ
A02 A03 A04
r1 sinð=3Þ
r1 cosð=3Þ r2 sinð7=5Þ r2 cosð7=5Þ r3 sinð=4Þ r3 cosð=4Þ r4 sinð=8Þ r4 cosð=8Þ
and r1 ¼ 1 þ 3 1015 ,
r2 ¼ r3 ¼ 1 þ 104 ,
r4 ¼ 1 þ 105 :
Let
A0
0
,
I
G0
H0 I 0 V1 ¼ diagð0, 0:5, . . . , 0:5, 0Þ
AT0
M0 ¼
L0 ¼
V2 ¼ diagð0, 1:5, . . . , 1:5, 0Þ and define non-singular matrices Y1 and Z1 by " # I V1 I A0 V1 ðI þ H0 V1 Þ1 Y1 ¼ ¼ , Z : 1 0 I 0 ðI þ H0 V1 Þ1
780
E. K.-W. Chu et al. 3
2
10
10 dare QR SDA
ratio_dare ratio_QR
2
ratio = t / tSDA
CPU time (in sec.)
10
1
10
1
10
0
10
1
10
0
50
100
150
200
250
300
10
50
100
150
n
Figure 1.
n 50 100 150 200 250 300
200
250
300
n
The comparison of CPU times with r ¼ 1.
res_dare
res_QR
res_SDA
RE_dare
RE_QR
RE_SDA
3.50 1012 3.55 1011 1.04 1010 2.58 1010 5.31 1010 1.03 109
5.11 1012 7.17 1011 2.32 1010 5.67 1010 1.39 109 2.78 109
0.00 100 0.00 100 0.00 100 0.00 100 0.00 100 0.00 100
4.99 1014 2.46 1013 1.04 1012 2.66 1012 4.91 1012 1.03 1011
4.33 1014 1.80 1013 5.06 1013 8.41 1013 9.08 1013 1.67 1012
0.00 100 0.00 100 0.00 100 0.00 100 0.00 100 0.00 100
Table 5.
A simple calculation gives " Y1 M0 Z1 ¼ " Y1 L0 Z1 ¼
Results for Example 3 with r ¼ 1.
then it follows that A1
0
H1
I #
I
G1
0
AT1
# M1
L1
where A1 ¼ A0 ðI þ V1 H0 Þ1 , H1 ¼ ðI þ H0 V1 Þ1 H0 , and G1 ¼ ðG0 V1 Þ þ A0 V1 ðI þ H0 V1 Þ1 AT0 . Furthermore, if we define non-singular matrices Y2 and Z2 by " # " # I 0 ðI þ G1 V2 Þ1 0 Y2 ¼ , Z2 ¼ V2 I AT1 V2 ðI þ G1 V2 Þ1 I
Y2 M1 Z2 ¼
A2
0
M2 H2 I I G2 Y2 L1 Z2 ¼ L2 0 AT2
where A2 ¼ ðI þ G1 V2 Þ1 A1 , H2 ¼ H1 þ V2 AT1 V2 ðIþ G1 V2 Þ1 A1 and G2 ¼ ðI þ G1 V2 Þ1 G1 . Let G2 ¼ B2 BT2 0 and H2 ¼ C2T C2 0 be the FRD, respectively. Then the system matrices are given by A :¼ U T A2 U,
B :¼ U T B2 ,
R :¼ I pffiffiffiffiffi where U :¼ I 2uu with u ¼ ½1, 1, . . . , 1 = 10 2 R10 . It is easy to check that minfjjj 1j: is an eigenvalue T
C :¼ C2 U, T
781
Structure-preserving algorithms 3
2
10
10 dare QR SDA
ratio_dare ratio_QR
2
ratio = t / tSDA
CPU time (in sec.)
10
1
10
1
10
0
10
1
10
0
50
100
150
200
250
300
10
50
100
150
n
Figure 2.
300
The comparison of CPU times with r ¼ 1012 .
res_QR
res_SDA
RE_dare
RE_QR
RE_SDA
1.20 1011 8.13 1011 2.17 1010 5.68 1010 1.32 109 2.23 109
5.12 1012 6.34 1011 2.50 1010 6.34 1010 1.75 109 2.82 109
0.00 100 0.00 100 0.00 100 0.00 100 0.00 100 0.00 100
2.25 1013 4.93 1013 2.67 1012 4.88 1012 1.33 1011 2.00 1011
4.36 1014 1.82 1013 5.43 1013 8.53 1013 8.99 1013 1.44 1012
0.00 100 0.00 100 0.00 100 0.00 100 0.00 100 0.00 100
Table 6.
of ðM, LÞg 3 1015 , where A 0 and M¼ CT C I
Results for Example 3 with r ¼ 1012 .
where L¼
I 0
BB : AT T
2
The numerical results are shown in table 7. Example 5: For r > 0, consider a parameterized symplectic pair ðMðrÞ, LðrÞÞ with 2
250
res_dare
n 50 100 150 200 250 300
200 n
0:4323
6 6 0:5969 6 6 0:8750 6 AðrÞ :¼ 6 6 1:0347 6 6 4 0:2771 0:8080
0:2582 1:2863
1:8430
0:2553
0:2746
3
7 1:8618 0:0046 0:7127 0:3544 1:7583 7 7 1:5715 1:3551 0:4912 0:9922 2:1640 7 7 7 1:1935 0:3797 0:8341 0:7323 1:8743 7 7 7 0:8410 1:1405 1:3839 0:2333 0:3544 5 0:9526 1:2224 1:2405 1:5662 1:5694
GðrÞ :¼ B2 BT2
1 B1 BT1 , r2
HðrÞ :¼ C1T C1
0:3447
0:6321
0:4592
1:0773
0:2610
1:3565
3
6 7 6 1:7938 0:9404 1:1726 0:3441 0:1703 0:1008 7 6 7 6 7 0:4660 1:0479 0:1899 1:0075 0:4529 7 6 0:6840 7 B1 ¼ 6 6 0:7424 0:6171 1:7952 0:0011 1:7101 0:5320 7 6 7 6 7 6 0:6319 0:8059 0:6623 0:4091 7 0:7990 1:4504 4 5 1:7719 2
0:3107
0:0055
0:6855
0:4471 0:1384
0:0057
0:7207
0:2926 0:1119
1:3962 0:7315
3
6 7 6 0:5037 0:9720 0:7164 0:3462 0:3193 1:6300 7 6 7 6 7 6 1:5449 3:0129 1:2720 1:8523 0:4305 0:0600 7 7 B2 ¼ 6 6 0:6068 0:6410 0:1884 0:4436 1:5227 0:1858 7 6 7 6 7 6 0:2213 1:0175 0:5326 0:2597 7 0:0057 0:4042 4 5 0:9153
0:1943
0:6435 1:1077 0:1157
0:6489
782
E. K.-W. Chu et al.
Residual Iter. no.
dare
QR
SDA
* –
5:68 105 54
6:01 1013 54
Table 7.
dare Residual Iter. no.
– Table 8.
dare
Results for Example 4.
QR
Residual 1:34 1014 Rel. err. 9:22 1016 Iter. no. –
8:57 1015 8:54 1016 7
1:66 1014 1:46 1016 6
¼ 106
Residual 8:86 1010 Rel. err. 3:98 1011 Iter. no. –
3:62 105 1:39 106 17
5:58 1010 2:75 1012 16
Table 9.
Results for Example 6.
13
1:29 10 22
5:07 10 37
Results for Example 5.
2
3 2:2752 2:1534 0:9038 1:8451 1:4674 1:0841 6 7 1:7412 1:5000 1:6086 7 6 0:4996 1:0463 0:6970 6 7 6 1:7526 0:5329 1:0929 0:6429 0:0580 1:2661 7 6 7 C1 ¼ 6 7 6 0:9504 0:4575 0:3857 1:1104 0:1943 0:1205 7 6 7 6 7 4 1:5133 0:6674 0:5427 0:8445 1:2548 1:3334 5 0:7063 1:1925 0:0400 0:4600 1:5304 0:4101
This example involves H1 norm computation (details in Lin et al. (2000)). By applying the bisection method, we can obtain the smallest r 1:08324 such that Xðr Þ :¼ RicðMðr Þ, Lðr ÞÞ exists, I þ Gðr ÞXðr Þ is invertible, and Xðr Þ 0. Moreover, a pair of real eigenvalues ð,1=Þ ¼ ð0:9999999999998726, 1:000000000000128Þ of the symplectic matrix pair ðMðr Þ, Lðr ÞÞ approach to the unit circle with the distance 1:2745 1013 . The numerical results are given in table 8. As the s.p.s.d. assumption for Gðr Þ is violated for this example, dare cannot be used. This leads to the ‘’ in table 8. Example 6: The following example is identical to Example 2.1 of Abels and Benner (1999), which has been presented originally in Laub (1979, Example 2) and Sun (1998). This is an example of stabilizabledetectable, but uncontrollable-unobservable data. We have the following system matrices 4 3 1 9 6 A¼ , B ¼ , R ¼ , H ¼ : 92 72 1 6 4 The stabilizing solution is pffiffiffiffiffiffiffiffiffiffiffiffiffi 1 þ 1 þ 4 9 X¼ 6 2
SDA
¼1
SDA 8
QR
6 : 4
The parameter of R was introduced in Sun (1998) to construct ill-conditioned DAREs. Small values for will not affect the condition number of DARE much while it grows with increasing values . The numerical results with ¼ 1 and ¼ 106 are shown in table 9. The following three examples come from proportional-plus-integral (PI) control problems. The design
includes the original system with coefficient matrices A1, B1, C1, Q1 and R1. Additionally, there are r error integrators that are concatenated with the original system. The coefficient matrices of the DARE to be solved are " # " # A1 0 B1 A¼ , B¼ C1 Ir 0 " T # C1 Q1 C1 0 H¼ , R ¼ R1 : 0 Q2 Example 7: This example is identical to Example 1.11 of Abels and Benner (1999), which has been presented originally in Sima (1996, Section 1.2.2) (Foulard et al. 1977). The actual data are defined by 2 3 0 0 6 7 A 1 ¼ 4 I4 0 5
A22
0 A22 2 0:222 0:778 6 0:4 0 6 ¼6 4 0 0
B1 ¼
1 0
0
0
0 0
0 0 0 0
0 C1 ¼ 0
0 0 0 0 0:5 Q1 ¼ Q2 ¼ 0
0 0
0 0:6
0 0
0
1:372
0
1
0 0
3
0 0
7 7 7 0:47 5
0 0 0 0:098
0 0 0
T
0 0
0 15 0 0 0 0 0 7 5:357 3:943 0 400 0 , R1 ¼ : 5 0 700
The numerical results are given in table 10. 4.1. Comments The tables show that the approximate solutions from the SDA are either as accurate as or more accurate than those from the QR-SWAP method and dare. For the examples considered, the SDA converges to a comparable accuracy, in the same number of iterations or in one less iteration, when compared to the QR-SWAP
783
Structure-preserving algorithms
Residual Iter. no.
dare
QR
SDA
2:68 1010 –
2:38 104 9
1:64 1011 8
Table 10. Results for Example 7.
method. The graphs show the relative efficiencies more clearly, for examples with a parameter reflecting varying degrees of difficulty or conditioning. The efficiency ratios ‘ratio_dare’ and ‘ratio_QR’ stay between three and ten. The real ratios should be bigger, as solutions from the SDA are generally more accurate. For example, the residuals and relative errors for SDA for Example 3 are virtually zero. Note that several problems investigated are extremely ill-conditioned. Others have eigenvalues extremely close to the unit circle, numerically violating the assumptions of our theory. The SDA solves them efficiently and accurately without failure. Example 5 comes from an application of the SDA in H1 norm computation. The family of examples is dependent on the parameter r, which we would like to minimize before some stabilizability, detectability and s.p.s.d. constraints are violated. The minimization in this example was carried out by bisection. For r near its minimum, the DARE involved becomes ill-conditioned. This challenging problem was solved in 22 iterations to near machine accuracy. For QR-SWAP, the residual converged to around 108 in 27 iterations and did not improve further even after 100 more iterations. This behaviour, and the similar behaviour in Example 4, illustrate the importance of the SSF property and the consequent superior convergence, in addition to the better operation count. There seemed to be a lack of numerical results involving the doubling algorithms, which might have led to the neglect of this class of methods. The preservation of (S) and (D) properties in Lemma 1, the convergence results in Lemma 2 and Theorem 2, the superior operation count and the above numerical examples suggest that the neglect has been unjustified. 5. Numerical experiments for P-DAREs Similar convention as in } 4 is utilized in this section, except the PQZ method (Bojanczyk et al. 1992, Hench and Laub 1994) replaces dare for P-DAREs. In PQZ, QZ decompositions are applied to a 2pn 2pn matrix pair containing the p matrix pairs defining the PDAREs. This makes PQZ extremely unattractive in terms of operation counts and efficiency, comparing with the other methods. Also, the method failed when the periodicity p is greater than 10, because of difficulties in the deflation processes involved (Bojanczyk et al.
1992, Hench and Laub 1994). Thus, the superiority of the SSCAþSDA for P-DAREs is even more marked than that shown in } 4 for DAREs. In terms of QR-SWAP for P-DAREs (Benner and Byers 2001, Benner et al. 2002), this superiority may be explained by the accumulation of errors in the QR-SWAP method, due to the relative lack of structure-preserving properties and the consequent slower convergence. ej gp , For the residual of approximate solutions fX j¼1 the PQZ method produces p residuals rj for each ej as defined in X ej ðI þ Gj X ej Þ1 Aj þ Hj X ej1 kF : rj ¼ kATj X The total residual is thus defined as !1=2 p X 2 Residual ¼ rj : j¼1
The situation is different in QR-SWAP and SSCAþ ep via a collapsed SDA, as these methods solve for X matrix pair, generating a residual rp as in the DARE es are obtained through substitutions case. The other X using the Riccati equation ej ðI þ Gj X ej Þ1 Aj þ Hj : ej1 ¼ ATj X X This substitution process generates very little error and virtually no residuals. Consequently, it is difficult to compare the residuals of the PQZ solution with others. From the definition of the residuals and numerical experience, we consider a PQZ solution as equivalent pffiffiffi in accuracy if its residual is approximately p times the residuals from QR-SWAP or the SSCAþSDA. For the tables in the following examples, data for various methods are lists in columns with obvious headings. The heading ‘PQZ’ is for the periodic QZ algorithm (Bojanczyk et al. 1992, Hench and Laub 1994), ‘QR’ is for the QR-SWAP method in Benner et al. (2002), and for simplicity, the SSCAþSDA algorithm is abbreviated to ‘SDA’ . Example 8: As in Example 2 of Hench and Laub (1994), we consider periodic discrete-time algebraic Riccati equations with n ¼ p ¼ 3. The system matrices are 2 3 2 3 3 2 9 6 3 0 6 7 6 7 A1 ¼ 4 0 0 4 5, A2 ¼ 4 4 2 2 5 3 2 3 2 1 4 2 3 2 3 2 3 3 1 6 7 6 7 A3 ¼ 4 4 15 3 5, B1 ¼ 4 1 5 2 2 3 0 6 7 B2 ¼ 4 1 5, 0 Hj ¼ ej eTj ,
9
1 2 3 0 6 7 B3 ¼ 4 1 5, 1 j ¼ 1, 2, 3
0 R1 ¼ R3 ¼ 1,
R2 ¼ 2
784
E. K.-W. Chu et al.
Residual Iter. no.
PQZ
QR
SDA
1:48 106 –
1:66 107 5
2:18 108 4
Residual CPU time Iter. no.
PQZ
QR
SDA
* * –
2:43 1013 0:133 3
2:00 1014 0:083 2
Table 11. Results for Example 8. Table 12. Results for Example 9 with n ¼ 4 and p ¼ 120.
Example 9: In Varga and Pieters (1998), the authors considered an optimal periodic output feedback control problem with n = 4 and p = 120. This periodic discrete-time model was generated from a continuous-time linearized state space model of a spacecraft system (Pittelkau 1993). For $j = 1, \ldots, p$, the system matrices are
$$ A_j = \begin{bmatrix} 0.9506860 & 0.0429866 & 0.4827320 & 2.5564383 \\ 0.0409684 & 0.9721628 & 1.3617382 & 0.5081454 \\ 0.0122736 & 0.0363280 & 0.8671394 & 0.6014295 \\ 0.0346225 & 0.0072209 & 0.3203622 & 0.8456626 \end{bmatrix}, $$
$$ B_j = 10^{-5} \begin{bmatrix} 0.2220925 \\ 0.1300536 \\ 0.1877217 \\ 0.0271167 \end{bmatrix} \cos(\omega_0 jT) + 10^{-5} \begin{bmatrix} 0.5035620 \\ 0.4241087 \\ 0.1218290 \\ 0.3583826 \end{bmatrix} \sin(\omega_0 jT), $$
$$ C_j = \begin{bmatrix} \sqrt{2} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \qquad R_j = 10^{-11}, $$
where $\omega_0 = 0.00103448$ rad/s is the orbital frequency and $T = 2\pi/(\omega_0 p)$ is the sampling period. The numerical results are reported in table 12.
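A minimal sketch of how the periodic input matrices of Example 9 could be assembled is given below; the vectors bc and bs stand for the two columns displayed above, and the $10^{-5}$ scaling is the one read from that display (all names are illustrative, not from the paper).

```python
import numpy as np

p = 120
w0 = 0.00103448             # orbital frequency (rad/s)
T = 2 * np.pi / (w0 * p)    # sampling period

def B(j, bc, bs):
    # B_j = 1e-5 * (bc * cos(w0 j T) + bs * sin(w0 j T)),  j = 1, ..., p
    return 1e-5 * (bc * np.cos(w0 * j * T) + bs * np.sin(w0 * j * T))

# usage (sketch): Bs = [B(j, bc, bs) for j in range(1, p + 1)]
# with bc, bs the 4-vectors from the display above
```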
Example 10: In this example, we tested the three methods on randomly generated periodic matrix pairs $\{(M_j, L_j)\}_{j=1}^p$. Entries of $A_j$ are randomly distributed in the interval $[-2, 2]$, and entries of the matrices $B_j$, $C_j$ are randomly distributed in the interval $[-1, 1]$ ($j = 1, \ldots, p$). We set $\mathrm{rank}(B_j) = \mathrm{rank}(C_j) = 0.7n$ for all $j$. Figure 3 reports the comparison of CPU times for $n = 50, 100, 150, 200, 250, 300$, all with $p = 8$. Figure 4 reports the comparison of CPU times for $p = 4, 8, 16, 32, 64, 128$, with $n = 30$. For these two cases, the residuals are shown in table 13.

5.1. Comments

For Example 8, the SDA produced the most accurate solution in three iterations, as compared to four for QR-SWAP. The solution from PQZ is two orders of magnitude worse than those from the SDA.

The PQZ method failed for Example 9, probably because of the small elements in $R_j$ and $C_j$. Near-machine accuracy was achieved by the SDA in two iterations. The QR-SWAP method produced a solution one order of magnitude worse in one more iteration.

Example 10 contains several randomly generated examples, with varying values of n and p. The graphs show that the SDA is around three to five times more efficient than QR-SWAP, and around 1000 times more efficient than PQZ, which failed for large values of p. This can be explained by the well-known fact that the shift-and-deflate approach in PQZ fails when p is large. Note that the efficiency advantage should be even higher because of the smaller residuals of the solutions from the SDA. These examples illustrate that the SDA is more efficient than the PQZ and QR-SWAP algorithms.
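As a rough illustration of how random periodic test data of the kind used in Example 10 might be generated (a sketch under stated assumptions, not the authors' generator; the uniform distribution and the helper name are assumptions):

```python
import numpy as np

def random_periodic_data(n, p, rank_frac=0.7, seed=0):
    """Random A_j with entries in [-2, 2] and B_j, C_j of rank 0.7*n with
    entries in [-1, 1], j = 1, ..., p, in the spirit of Example 10."""
    rng = np.random.default_rng(seed)
    k = int(round(rank_frac * n))     # common rank of B_j and C_j
    A = [rng.uniform(-2.0, 2.0, (n, n)) for _ in range(p)]
    B = [rng.uniform(-1.0, 1.0, (n, k)) for _ in range(p)]   # full column rank k (generically)
    C = [rng.uniform(-1.0, 1.0, (k, n)) for _ in range(p)]   # full row rank k (generically)
    return A, B, C
```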
6. Conclusions
We conclude the paper with a summary of results and a few comments.

SDA and QR-SWAP: In this paper we investigate structure-preserving algorithms (the SDA and the SSCA) for solving DAREs and P-DAREs, and prove the quadratic convergence of the SDA under assumptions which are weaker than stabilizability and detectability. The P-DAREs are first reduced to a single DARE by the SSCA; the resulting DARE is then solved by the iterative SDA. On the surface, the algorithm looks very similar to the QR-SWAP algorithm in Benner et al. (2002). The two algorithms are obviously closely related, sharing a similar theoretical background and convergence analysis. However, there are some important differences in the details. The main difference is the stronger SSF property of the SDA, which preserves symplecticity in standard symplectic form, as well as the stabilizability and detectability properties, throughout the iterative process. In addition, the SDA allows the iteration to be carried out with far fewer flops.
Figure 3. The comparison of CPU times for n = 50, 100, 150, 200, 250, 300 and p = 8. (The two panels show the CPU times in seconds for PQZ, QR and SDA, and the efficiency ratios ratio_PQZ and ratio_QR, where ratio = t/t_SDA.)
It is interesting that the swap and collapse steps in QR-SWAP are numerically forward stable, whereas the structure-preserving (SSF) steps in the SDA are numerically efficient and reliable (recall the inversion of the well-behaved matrix $I + GH$ with $G, H \ge 0$). It may be the case that smaller errors (in QR-SWAP) can do more harm than larger but structured errors (in the SDA). Also, the least squares step at the end of QR-SWAP is not required in the SDA. Together with the difference in operation counts, the SDA appears to be the superior algorithm. Notice that the PQZ algorithm (or its equivalent, dare, for DAREs) is never a competitor to QR-SWAP or the SDA, because of its inferior operation count. Notice also that the SDA performs much better for ill-conditioned P-DAREs. In summary, the numerical evidence we have gathered so far indicates that the SDA is an accurate, robust and efficient algorithm for P-DAREs. The algorithm appears to be a sound basis on which a general-purpose algorithm for P-DAREs can be built.

Deficiency in the SDA: There is one advantage of QR-SWAP over the SDA that we are aware of. Let the state equation $x_{j+1} = A_j x_j + B_j u_j$ be replaced by a descriptor system $E_j x_{j+1} = A_j x_j + B_j u_j$ with a non-singular but ill-conditioned $E_j$. The QR-SWAP algorithm should
still work, while the SDA will founder, because the inversion of $E_j$ is required in the SSF structure.

Parallelism: It is easy to see that all the possibilities for parallelism in QR-SWAP (Benner et al. 2002) also exist in the SSCA and the SDA. Recall that the swap and collapse procedure in the SDA and QR-SWAP can be carried out simultaneously at different points; a sketch of this tree-structured reduction is given below. For example, for P-DAREs with period p = 4, we can swap and collapse the first two matrix pairs in parallel with the same operations on the last two, and then swap and collapse the two resulting matrix pairs into the final pair.
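The sketch below illustrates only the reduction order just described; collapse_pair is a hypothetical stand-in for one swap-and-collapse step applied to two neighbouring matrix pairs (it is not defined here), and the thread pool is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def collapse_all(pairs, collapse_pair):
    """Reduce a list of p matrix pairs to a single pair by repeatedly
    collapsing neighbouring pairs; collapses on the same level are
    independent of each other and are submitted to a thread pool."""
    while len(pairs) > 1:
        carry = [pairs[-1]] if len(pairs) % 2 else []     # odd leftover pair
        even = pairs[:-1] if carry else pairs
        with ThreadPoolExecutor() as pool:
            pairs = list(pool.map(collapse_pair, even[0::2], even[1::2])) + carry
    return pairs[0]
```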
Acknowledgements

We would like to thank Professor Dr Volker Mehrmann for his thoughtful comments on a draft of the paper.

Appendix

Proof of Lemma 2:
Let
$$ v^T \hat{A} = \lambda v^T, \qquad |\lambda| \ge 1, \qquad\qquad (60) $$
and
$$ v^T \hat{G} = v^T \bigl( G + A G (I + HG)^{-1} A^T \bigr) = 0. \qquad\qquad (61) $$
We need to show that $v = 0$ for the stabilizability of $(\hat{A}, \hat{B})$, where $\hat{G} = \hat{B} \hat{B}^T \ge 0$ is an FRD. From (61), the FRD of $G = B R^{-1} B^T \ge 0$ and the fact that $G (I + HG)^{-1}$ is s.p.s.d. (two applications of the SMWF), it follows that
$$ v^T B = 0, \qquad v^T A B = 0. \qquad\qquad (62) $$
Substituting (18) into (60), and using the SMWF and (62), we obtain
$$ v^T \hat{A} = v^T A (I + GH)^{-1} A = v^T A \bigl[ I - G (I + HG)^{-1} H \bigr] A = v^T A^2 = \lambda v^T. \qquad\qquad (63) $$
From (63) it follows that $v$ is a linear combination of two left eigenvectors $\{u_1, u_2\}$ of $A$ corresponding to the eigenvalues $\{\omega_1, \omega_2\}$ ($\omega_1 \ne \omega_2$), i.e. $u_j^T A = \omega_j u_j^T$, with $\omega_j^2 = \lambda$, $j = 1, 2$. Write
$$ v = \alpha_1 u_1 + \alpha_2 u_2. \qquad\qquad (64) $$
Substituting (64) into (62) and eliminating $\alpha_2 \omega_2 u_2^T B$, we obtain $u_1^T B = 0$. From the stabilizability of $(A, B)$ and the relation $u_1^T A = \omega_1 u_1^T$, $|\omega_1| \ge 1$, it follows that $u_1 = 0$. Similarly, we can show that $u_2 = 0$. These imply $v = 0$.

The detectability result for $(\hat{A}, \hat{C})$ can be proved similarly. □

Figure 4. The comparison of CPU times for p = 4, 8, 16, 32, 64, 128 and n = 30. (The two panels show the CPU times in seconds for PQZ, QR and SDA, and the efficiency ratios ratio_PQZ and ratio_QR, where ratio = t/t_SDA; the horizontal axis is m, with period p = 2^m.)

Table 13. Results of Example 10 for various p and n (* : PQZ failed).

p = 8:
  n      res_PQZ         res_QR          res_SDA
  50     1.29 × 10^{-5}  3.94 × 10^{-6}  3.64 × 10^{-6}
  100    5.72 × 10^{-4}  4.13 × 10^{-4}  2.08 × 10^{-4}
  150    6.03 × 10^{-3}  4.30 × 10^{-3}  2.56 × 10^{-3}
  200    3.47 × 10^{-2}  2.48 × 10^{-2}  1.41 × 10^{-2}
  250    1.21 × 10^{-1}  9.91 × 10^{-2}  4.39 × 10^{-2}
  300    3.42 × 10^{-1}  2.17 × 10^{-1}  1.30 × 10^{-1}

n = 30:
  p      res_PQZ         res_QR          res_SDA
  4      4.99 × 10^{-7}  7.42 × 10^{-7}  2.20 × 10^{-7}
  8      8.56 × 10^{-7}  9.34 × 10^{-7}  4.04 × 10^{-7}
  16     *               4.13 × 10^{-7}  2.42 × 10^{-7}
  32     *               3.19 × 10^{-7}  2.03 × 10^{-7}
  64     *               1.12 × 10^{-6}  3.17 × 10^{-7}
  128    *               6.51 × 10^{-7}  2.79 × 10^{-7}

Proof of Lemma 4:
Let
$$ v^T \hat{A}_p = \lambda v^T, \qquad |\lambda| \ge 1, \qquad\qquad (65) $$
and
$$ v^T \hat{G}_p = v^T \bigl[ G_p + A_p \hat{G}_{p-1} (I_{n_{p-1}} + H_p \hat{G}_{p-1})^{-1} A_p^T \bigr] = 0. \qquad\qquad (66) $$
We need to show that $v = 0$ for the stabilizability of $(\hat{A}_p, \hat{B}_p)$, where $\hat{G}_p = \hat{B}_p \hat{B}_p^T \ge 0$ is an FRD. It can be shown from (57)–(59) that the FRD of $\hat{G}_{p-1} = \hat{B}_{p-1} \hat{B}_{p-1}^T \ge 0$ and that $\hat{G}_{p-1} (I_{n_{p-1}} + H_p \hat{G}_{p-1})^{-1}$ is s.p.s.d. (two applications of the SMWF). From (66) and the FRD of $G_p = B_p R_p^{-1} B_p^T$, it then follows that
$$ v^T B_p = 0, \qquad v^T A_p \hat{B}_{p-1} = 0. \qquad\qquad (67) $$
Substituting (57)–(59) into (65) and using the SMWF, we obtain
$$ v^T \hat{A}_p = v^T A_p (I_{n_{p-1}} + \hat{G}_{p-1} H_p)^{-1} \hat{A}_{p-1} = v^T A_p \bigl[ I_{n_{p-1}} - \hat{G}_{p-1} (I_{n_{p-1}} + H_p \hat{G}_{p-1})^{-1} H_p \bigr] \hat{A}_{p-1} = v^T A_p \hat{A}_{p-1} = \lambda v^T. \qquad\qquad (68) $$
Repeating the argument in (67) and (68) using (57)–(59), we arrive at
$$ v^T A_p \cdots A_1 = \lambda v^T, \qquad |\lambda| \ge 1, $$
and
$$ v^T B_p = v^T A_p B_{p-1} = \cdots = v^T A_p A_{p-1} \cdots A_2 B_1 = 0. $$
These imply $v = 0$ because of the (P-S) property of the original periodic system. The (P-D) result can be proved similarly. □
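As a quick numerical sanity check of the SMWF identity $(I + GH)^{-1} = I - G(I + HG)^{-1} H$ used repeatedly in the two proofs above, one may run the following sketch with random s.p.s.d. matrices (illustrative only, not part of the proofs):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
F1, F2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
G, H = F1 @ F1.T, F2 @ F2.T                  # s.p.s.d. by construction
I = np.eye(n)
lhs = np.linalg.inv(I + G @ H)
rhs = I - G @ np.linalg.solve(I + H @ G, H)
print(np.linalg.norm(lhs - rhs))             # small (near machine precision)
```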
References

Abels, J., and Benner, P., 1999, DAREX – a collection of benchmark examples for discrete-time algebraic Riccati equations (version 2.0), Tech. Rep. SLICOT Working Note 1999-16, The Working Group on Software.
Ammar, G., and Mehrmann, V., 1991, On Hamiltonian and symplectic Hessenberg forms. Linear Algebra and its Applications, 149, 55–72.
Anderson, B., 1978, Second-order convergent algorithms for the steady-state Riccati equation. International Journal of Control, 28, 295–306.
Bai, Z., Demmel, J., and Gu, M., 1997, An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems. Numerische Mathematik, 76, 279–308.
Benner, P., 1997, Contributions to the numerical solutions of algebraic Riccati equations and related eigenvalue problems. PhD Dissertation, Fakultät für Mathematik, TU Chemnitz-Zwickau, Chemnitz, Germany.
Benner, P., and Byers, R., 2001, Evaluating products of matrix pencils and collapsing matrix products. Numerical Linear Algebra with Applications, 8, 357–380.
Benner, P., Byers, R., Mayo, R., Quintana-Ortí, E. S., and Hernández, V., 2002, Parallel algorithms for LQ optimal control of discrete-time periodic linear systems. Journal of Parallel and Distributed Computing, 62, 306–325.
Benner, P., Laub, A. J., and Mehrmann, V., 1995, A collection of benchmark examples for the numerical solution of algebraic Riccati equations II: Discrete-time case. Tech. Rep. SPC 95_23, Fakultät für Mathematik, TU Chemnitz–Zwickau, 09107 Chemnitz, FRG. Available from http://www.tu-chemnitz.de/sfb393/spc95pr.html.
Benner, P., Mayo, R., Quintana-Ortí, E. S., and Hernández, V., 2000 a, A coarse grain parallel solver for periodic Riccati equations. Tech. Rep. 2000-01, Depto. de Informática, 12080-Castellón, Spain.
Benner, P., Mayo, R., Quintana-Ortí, E. S., and Hernández, V., 2000 b, Solving discrete-time periodic Riccati equations on a cluster. In A. Bode, T. Ludwig, W. Karl, and R. Wismüller (Eds), Euro-Par 2000 Parallel Processing, No. 1900 in Lecture Notes in Computer Science, Springer-Verlag, pp. 824–828.
Benner, P., Mehrmann, V., Sima, V., Huffel, S. V., and Varga, A., 1999, SLICOT — a subroutine library in systems and control theory. Applied and Computational Control, Signals, and Circuits, 1, 499–539.
Bialkowski, W., 1978, Application of steady-state Kalman filters — theory with field results. Proceedings of the Joint Automatic Control Conference, Philadelphia, PA, USA.
Bittanti, S., and Colaneri, P., 1999, Periodic control. In J. G. Webster (Ed.), Wiley Encyclopedia of Electrical and Electronic Engineering, vol. 16 (New York: Wiley), pp. 59–74.
Bittanti, S., Colaneri, P., and Nicolao, G. D., 1988, The difference periodic Riccati equation for the periodic prediction problem. IEEE Transactions on Automatic Control, 33, 706–712.
Bittanti, S., Colaneri, P., and Nicolao, G. D., 1991, The periodic Riccati equation. In S. Bittanti, A. Laub and J. Willems (Eds), The Riccati Equation, Springer-Verlag, pp. 127–162.
Bojanczyk, A., Golub, G. H., and Van Dooren, P., 1992, The periodic Schur decomposition: algorithms and applications. In Proceedings of the SPIE Conference, vol. 1770, San Diego, USA, pp. 31–42.
Chu, E. K.-W., Fan, H.-Y., and Lin, W.-W., 2003 a, A generalized structure-preserving doubling algorithm for generalized discrete-time algebraic Riccati equations. Preprint 2002-29, NCTS, National Tsing Hua University, Hsinchu 300, Taiwan.
Chu, E. K.-W., Fan, H.-Y., Lin, W.-W., and Wang, C.-S., 2003 b, A structure-preserving doubling algorithm for periodic discrete-time algebraic Riccati equations. Preprint 2002-18, NCTS, National Tsing Hua University, Hsinchu 300, Taiwan.
Foulard, C., Gentil, S., and Sandraz, J. P., 1977, Commande et régulation par calculateur numérique: de la théorie aux applications, Eyrolles, Paris.
Francis, B., and Georgiou, T. T., 1998, Stability theory for linear time-invariant plants with periodic digital controllers. IEEE Transactions on Automatic Control, 33, 820–832.
Golub, G. H., and Van Loan, C. F., 1996, Matrix Computations, 3rd edn (The Johns Hopkins University Press).
Gudmundsson, T., Kenney, C., and Laub, A. J., 1992, Scaling of the discrete-time algebraic Riccati equation to enhance stability of the Schur solution method. IEEE Transactions on Automatic Control, 37, 513–518.
Hench, J. J., and Laub, A. J., 1994, Numerical solution of the discrete-time periodic Riccati equation. IEEE Transactions on Automatic Control, 39, 1197–1210.
Kimura, M., 1988, Convergence of the doubling algorithm for the discrete-time algebraic Riccati equation. International Journal of Systems Science, 19, 701–711.
Lainiotis, D., Assimakis, N., and Katsikas, S., 1994, New doubling algorithm for the discrete periodic Riccati equation. Applied Mathematics and Computation, 60, 265–283.
Laub, A. J., 1979, A Schur method for solving algebraic Riccati equations. IEEE Transactions on Automatic Control, 24, 913–921.
Lin, W.-W., Wang, C.-S., and Xu, Q.-F., 2000, Numerical computation of the minimum $H_\infty$ norm of the discrete-time output feedback control problem. SIAM Journal on Numerical Analysis, 38, 515–547.
Lu, L.-Z., and Lin, W.-W., 1993, An iterative algorithm for the solution of the discrete-time algebraic Riccati equations. Linear Algebra and its Applications, 189, 465–488.
Lu, L.-Z., Lin, W.-W., and Pearce, C. E. M., 1999, An efficient algorithm for the discrete-time algebraic Riccati equation. IEEE Transactions on Automatic Control, 44, 1216–1220.
Malyshev, A. N., 1993, Parallel algorithm for solving some spectral problems of linear algebra. Linear Algebra and its Applications, 188/189, 489–520.
MathWorks, 1992, MATLAB User's Guide (for UNIX Workstations) (The MathWorks, Inc.).
Mehrmann, V., 1988, A symplectic orthogonal method for single input or single output discrete time optimal linear quadratic control problems. SIAM Journal on Matrix Analysis and Applications, pp. 221–248.
Paige, C., and Van Loan, C., 1981, A Schur decomposition for Hamiltonian matrices. Linear Algebra and its Applications, 41, 11–32.
Pappas, T., Laub, A. J., and Sandell, N. R., 1980, On the numerical solution of the discrete-time algebraic Riccati equation. IEEE Transactions on Automatic Control, 25, 631–641.
Patnaik, L., Viswanadham, N., and Sarma, I., 1980, Computer control algorithms for a tubular ammonia reactor. IEEE Transactions on Automatic Control, 25, 642–651.
Petkov, P., Christov, N., and Konstantinov, M., 1989, A posteriori error analysis of the generalized Schur approach for solving the discrete matrix Riccati equation. Preprint, Department of Automatics, Higher Institute of Mechanical and Electrical Engineering, 1756 Sofia, Bulgaria.
Pittelkau, M. E., 1993, Optimal periodic control for spacecraft pointing and attitude determination. Journal of Guidance, Control, and Dynamics, 16, 1078–1084.
Sima, V., 1996, Algorithms for Linear-Quadratic Optimization, volume 200 of Pure and Applied Mathematics (New York: Marcel Dekker).
Sreedhar, J., and Van Dooren, P., 1994, Periodic Schur form and some matrix equations. Systems and Networks: Mathematical Theory and Applications, 77, 339–362.
Sreedhar, J., and Van Dooren, P., 1999, Periodic descriptor systems: solvability and conditionability. IEEE Transactions on Automatic Control, 44, 310–313.
Sun, J.-G., 1998, Sensitivity analysis of the discrete-time algebraic Riccati equation. Linear Algebra and its Applications, 275/276, 595–615.
Van Dooren, P., 1981, A generalized eigenvalue approach for solving Riccati equations. SIAM Journal on Scientific and Statistical Computing, 2, 121–135.
Van Dooren, P., 1999, Two point boundary value and periodic eigenvalue problems. In K. Kirchgässner et al. (Eds), Proceedings of the 1999 IEEE International Symposium on Computer Aided Control-System Design, Kohala Coast-Island, Hawaii, USA, pp. 22–27.
Varga, A., 1997, Periodic Lyapunov equations: some applications and new algorithms. International Journal of Control, 67, 69–87.
Varga, A., and Pieters, S., 1998, Gradient-based approach to solve optimal periodic output feedback control problems. Automatica, 34, 477–481.