Università degli Studi di Pisa
Dipartimento di Matematica
Ph.D. in Mathematics (Dottorato di Ricerca in Matematica), IX Cycle
Thesis submitted for the degree
Beatrice Meini
Fast algorithms for the numerical solution of structured Markov chains
The candidate
Dott.ssa Beatrice Meini
The advisor
Prof. Dario A. Bini
The coordinator
Prof. Mariano Giaquinta
Consorzio delle Università di Pisa (administrative seat), Bari, Ferrara, Lecce, Parma
Contents

Introduction
1 Markov chains
  1.1 Nonnegative matrices
  1.2 Discrete time Markov chains
  1.3 Markov chains of M/G/1 type
2 Exploitation of structures and computational tools
  2.1 Fast Fourier Transforms
  2.2 Block Toeplitz matrices computation
    2.2.1 Block Toeplitz matrices and block vector products
    2.2.2 Inversion of block triangular block Toeplitz matrices
  2.3 Pointwise power series arithmetic
  2.4 Displacement structure
3 A fast version of Ramaswami's formula
  3.1 Ramaswami's formula and block UL factorization
  3.2 Fast computation of Ramaswami's formula
  3.3 Numerical results
    3.3.1 8-state BMAP/G/1 system
    3.3.2 Metaring MAC Protocol
4 Functional iteration methods
  4.1 Convergence properties of standard functional iteration methods
  4.2 A general class of functional iteration methods
  4.3 Functional iteration methods generating a sequence of stochastic matrices
  4.4 A strategy for the choice of the starting approximation
  4.5 A relaxation technique
5 A "divide and conquer" algorithm
  5.1 The Stewart algorithm
  5.2 Exploitation of the displacement structure
  5.3 Numerical results
6 The cyclic reduction method
  6.1 The case of finite systems
    6.1.1 Preliminaries
    6.1.2 LU factorization by means of cyclic reduction
  6.2 The case of infinite systems
    6.2.1 Computation of the probability invariant vector
    6.2.2 Computation of the matrix G
    6.2.3 Numerical results
  6.3 The case of non-skip-free matrices
    6.3.1 Structural properties of cyclic reduction
    6.3.2 Convergence properties
    6.3.3 Structural properties of the matrix G
    6.3.4 Numerical results
Bibliography
Introduction

Markov chains are a powerful tool for solving a wide variety of problems of the real world. In particular, many problems in the important area of telecommunication networks are described by queueing models that involve Markov chains characterized by particular structural properties. For this reason, in the world of engineers and of industry there is a great demand for efficient mathematical techniques and fast algorithms for the "real time" solution of complex and strongly structured Markov chains.

In the last decade, the research in the area of theoretical and numerical solution of structured Markov chains has had a strong impetus, also stimulated by the growing importance of the applications. A milestone of the work done in this field is constituted by the books of Marcel Neuts [80, 79], where new tools for solving the class of M/G/1 type Markov chains (and the dual class of G/M/1 type) are introduced and deeply investigated. M/G/1 type Markov chains are characterized by a structure that "is shared by a large number of rather complicated models in theory of queues, communications systems, dams and inventories" (see [80], page v); they are defined by infinite transition probability matrices $P$ that have an upper block Hessenberg structure and, except for their first block row, a block Toeplitz structure, i.e.,
$$P^T = \begin{bmatrix} B_1 & A_0 & & \\ B_2 & A_1 & A_0 & \\ B_3 & A_2 & A_1 & A_0 \\ \vdots & \vdots & \ddots & \ddots & \ddots \end{bmatrix}$$
for given $m \times m$ nonnegative matrices $A_i$, $B_i$ such that $\sum_{i=0}^{+\infty} A_i^T$ and $\sum_{i=1}^{+\infty} B_i^T$ are stochastic. In the common applications the block entries of the matrix $P$ may have a considerable size $m$ and may themselves have an inner Toeplitz structure, depending on the model that they represent. We recall that a (block) matrix is (block) Toeplitz if its (block) entry in the position $(i,j)$ depends on $j - i$. The main computational burden in solving M/G/1 type Markov chains consists in evaluating the solution of a homogeneous infinite linear system of equations of the form
$$P^T \pi = \pi,$$
where $\pi$ is the probability invariant vector of the Markov chain, and the symbol $T$ denotes matrix transposition. A related problem arises in the computation of an approximate nonnegative solution $G$ to the matrix equation
X=
X
+1
i=0
X iAi = A0 + XA1 + X 2A2 + X 3 A3 + : : : :
This problem can be reduced to the computation of the top left-most block entry of the infinite block matrix $(I - Q)^{-1}$, where $I$ denotes the identity operator and
$$Q = \begin{bmatrix} A_1 & A_0 & & \\ A_2 & A_1 & A_0 & \\ A_3 & A_2 & A_1 & A_0 \\ \vdots & \ddots & \ddots & \ddots & \ddots \end{bmatrix}$$
is an infinite block Toeplitz matrix in block Hessenberg form. The knowledge of $G$ allows one to recover an arbitrary number of components of the vector $\pi$ by means of a well-known tool in Markov chains known as the Ramaswami formula [80, 84].

It is rather surprising that in the existing literature, concerning both theoretical properties related to Markov chains of M/G/1 type and numerical methods for their solution, no explicit use is made of the Toeplitz structure, whereas a wide literature exists concerning theoretical and computational properties of Toeplitz matrices. This fact has in part motivated our interest in the design and analysis of numerical methods for the solution of Markov chains.

The literature on Toeplitz matrices is wide and covers many areas, like the analysis of spectral properties, the analysis of computational and structural issues, the correlations with polynomial computations, and more. In particular, powerful tools such as the spectral theory of Szegő [55, 91, 93, 96], the displacement rank and the properties of the Schur complement [59, 48, 60, 61], the representation formulas [49, 47, 7, 8, 50, 51, 72], the correlations with polynomial and power series computations [15, 24, 26], the correlations with matrix algebras and trigonometric transforms [11, 12, 14, 27, 28, 35], fast and super-fast algorithms for linear systems [2, 33, 59, 61, 92], preconditioned conjugate gradient techniques [13, 30, 34, 86], and more, have been introduced, analyzed and developed over the years, and constitute an advanced and consolidated knowledge that represents the Toeplitz matrix technology. Such tools are currently used and adapted for devising efficient algorithms for the solution of specific problems in many applicative fields, like signal processing, numerical solution of integral and differential equations, power series analysis, system theory, and symbolic computations. However, these tools do not seem to be well known to the researchers working on the numerical solution of M/G/1 type Markov chains.
In this thesis we overview the main results obtained by our research in this field. In particular, we present new algorithms, based on the advanced Toeplitz matrix technology and on the theory of nonnegative matrices, that outperform the existing ones; we carry out their analysis and give proof of their convergence. We also describe the implementation of our algorithms, performed with advanced techniques, provide numerical experiments, and apply the algorithms to solve certain problems arising in the analysis of telecommunication networks. For more details we refer to our papers [17, 18, 21, 19, 23, 16, 20, 22, 76, 75, 77, 39, 40, 38, 4, 71], published (or to appear) in journals or conference proceedings.

More specifically, we point out the correlations between block Toeplitz matrices and matrix polynomials and, in the case of infinite matrices, matrix power series. We show how certain computations involving block Toeplitz matrices can be rephrased in terms of matrix polynomials, for instance matrix polynomial multiplication and matrix polynomial division, and we adapt to matrix polynomials and matrix power series the evaluation/interpolation technique at the roots of 1 used for devising the FFT-based fast polynomial arithmetic [20, 26]. We recall the fundamental concept of displacement rank [61, 62], and borrow from [59, 10, 25, 26] a suitable displacement operator that can be efficiently used for devising a representation formula for the inverse of block Toeplitz matrices in block Hessenberg form.

The correlations between infinite block Toeplitz matrices and matrix power series are used for interpreting the Ramaswami formula in terms of the UL decomposition of a suitable block Toeplitz matrix in block Hessenberg form, and a fast algorithm for its computation, based on FFT, is derived [75].

Concerning the problem of the computation of the matrix $G$, we analyze the classical linearly convergent methods based on functional iterations. By performing an accurate analysis of the convergence, based on the Perron-Frobenius theory of nonnegative matrices, we give an optimality result by characterizing the fastest method in a class of functional iterations [76] and by proposing some strategies in order to improve the rate of convergence [40, 39, 38].

The concept of displacement rank is successively used [23], together with the Sherman-Morrison-Woodbury formula, for substantially improving a doubling algorithm, introduced by Latouche and Stewart in [70, 88], for the inversion of a block Toeplitz matrix in block Hessenberg form. In the solution of queueing problems, this algorithm is applied for computing the vector $\pi$ or the matrix $G$. For $n \times n$ block matrices in block Hessenberg form with blocks of size $m$, the representation of the inverse can be computed in $O(m^3 n + m^2 n \log n)$ arithmetic operations (ops).

We devise fast and reliable algorithms [17, 18, 21, 20, 16, 19], both for the computation of the probability invariant vector $\pi$ and for the computation of the solution $G$ of the matrix equation $X = \sum_{i=0}^{+\infty} X^i A_i$, based on the cyclic reduction technique, introduced in [29] for solving certain block tridiagonal systems and
adapted to block Toeplitz matrices in [17, 18, 21]. We observe a very nice property, namely, that one step of this algorithm leaves unchanged the block Toeplitz and the block Hessenberg structures of the matrix $P$. This fact is rephrased in functional form as a property of the Schur complement of the matrix obtained under an odd/even permutation of the block rows and block columns of $P$. In fact, we associate with $P$ a pair of matrix power series and explicitly relate this pair of matrix functions with the ones associated with the Schur complement obtained after one step of cyclic reduction. In this way we generate a sequence of (infinite) block Toeplitz-like block Hessenberg matrices which quadratically converges to a block bidiagonal block Toeplitz-like matrix. This simple fact is the basis for devising very fast and numerically stable FFT-based algorithms for computing an arbitrary number of components of the vector $\pi$ or for computing the matrix $G$. The algorithms devised in this way, applied to the solution of a finite linear system, or to the inversion of a finite block Toeplitz block Hessenberg matrix, have the same asymptotic cost as the doubling algorithm based on displacement rank, but with a smaller overhead constant. For infinite matrices the former algorithm is much superior, due to its quadratic convergence and to the low memory space required.

Besides the case of general M/G/1 type Markov chains, we have focused our interest also on specific models of broad interest, like Quasi-Birth-Death (QBD) problems [79, 80] and the non-skip-free case [45]. In the case of QBD problems, where the stochastic matrix $P$ is block tridiagonal, the algorithm is much simplified: it does not require FFTs but just a constant number of matrix products per step. In the non-skip-free case, where the matrix $P$ has an inner (block) Toeplitz structure, the blocks obtained just after a single step of cyclic reduction do not keep the inner (block) Toeplitz structure. However, we prove that their displacement rank remains bounded from above by a small constant at each step of the cyclic reduction process. This allows us to provide a special version of the algorithm that takes advantage of this additional structure. In fact, the cyclic reduction step is reformulated in terms of explicit equations that relate the displacement of the blocks obtained at two consecutive steps of the algorithm. Moreover, convergence results, depending on the location of the zeros of a suitable polynomial, are provided.

In our research we have investigated problems and we have devised and analyzed new algorithms for their solution. This has opened several interesting issues that are worth being studied more carefully in the near future. More specifically, we list some of them which in our opinion have priority in our research.

- We believe that some techniques that we have introduced can be extended to more general problems (say, the "funnel" problems) that, in a weak form, keep the Toeplitz-like structure.
- The mathematical models provided by the applications offer a rich variety of structures that are worth being investigated and that may lead to new ideas and to the design of new computational tools.

- Another issue that we have not considered so far is the use of preconditioned conjugate gradient techniques for solving Markov chains. In the very recent literature, new efficient preconditioners, based on properties of matrix algebras and trigonometric transforms, have been introduced and analyzed for the solution of Toeplitz problems arising in signal processing. We believe that similar preconditioners can be designed and specially tailored for solving certain Markov chains.

- Some results that we have obtained can be generalized to the solution of large banded (Hessenberg) Toeplitz systems; moreover, their features seem to be closely related to certain properties (say, König's theorem) and algorithms (say, Sebastião e Silva's, Graeffe's, QD), valid for polynomial computations. We believe that a deeper analysis of such correlations may have a very interesting fallout in the field of polynomial computations, where known algorithms for factoring polynomials can be rephrased in terms of Toeplitz computations, and where the Toeplitz matrix technology may lead to new and hopefully more efficient methods for polynomial computations.

The thesis is organized as follows. In Chap. 1 we recall some definitions and results concerning nonnegative matrices, Markov chains and Markov chains of M/G/1 type, that will be useful later on. In Chap. 2 we introduce the main computational tools used in the rest of the thesis. More precisely, we first recall the definition and the main properties of discrete Fourier transforms. We deal with the properties relating block Toeplitz matrices and matrix power series, and introduce the basic algorithms for manipulating matrix polynomials and block Toeplitz matrices. Then we deal more specifically with computations concerning matrix power series and infinite block Toeplitz matrices. Finally, we select a suitable displacement operator and recall its properties in the specific relations with the block Toeplitz structures arising in queueing problems.

In the subsequent chapters we present and analyze our advanced algorithms based on the Toeplitz matrix technology, and we perform an accurate study of the customary methods based on functional iterations, providing new convergence results and new strategies to improve the rate of convergence. In particular, in Chap. 3 we give a novel interpretation of Ramaswami's formula in terms of Toeplitz matrices and present a fast algorithm for its computation. In Chap. 4 we analyze customary techniques based on functional iteration methods. After providing an accurate analysis of the rate of convergence, we
give an optimality result and we propose two strategies to improve the rate of convergence (the first one is based on the choice of the initial approximation; the second one relies on a relaxation technique). In Chap. 5 we review the doubling algorithm of Latouche and Stewart, for the inversion of a block Toeplitz matrix in block Hessenberg form, by means of the concept of displacement rank. In Chap. 6 we describe and analyze the cyclic reduction technique and its correlations with block Toeplitz block Hessenberg matrices of finite size. In the case of infinite matrices we give an interpretation of cyclic reduction in terms of matrix power series. An algorithm for the implementation of cyclic reduction, based on the point-wise evaluation of matrix power series, is also described. Finally, we consider the case where the block Toeplitz matrix in block Hessenberg form has an additional inner (block) Toeplitz structure (non-skip-free matrices). Numerical results and comparisons are presented in each chapter.

For each chapter we list the papers where our results have been presented: Chap. 3: [75]; Chap. 4: [76, 39, 40, 38]; Chap. 5: [23]; Chap. 6: [17, 18, 21, 19, 22]. Reviews of parts of our results have been reported in [16, 20, 77]. Applications of our algorithms to the solution of concrete queueing problems are presented in [4, 71]. Programs implementing our algorithms can be obtained by anonymous FTP at morse.dm.unipi.it in the directory /pub/MARKOV.
Acknowledgments. I wish to thank Luciano Lenzini, who introduced us to the interesting world of Markov chains and queueing problems, and who provided us with real models to test the effectiveness of our algorithms. I am indebted to Marcel F. Neuts for the stimulating discussions on queueing models and for the interest that he has devoted to our results. Thanks to Guy Latouche, who has pointed out to me some problems related to functional iterations. Thanks to Attahiru S. Alfa, Srinivas R. Chakravarthy, Vaidyanathan Ramaswami, Udo Krieger, Sid Hantler, and to all the people in the field of Markov chains who have shown a special interest in our research. Of course, my warm thanks to Dario Bini, who, through his collaboration, has made this research exciting and fruitful.
Chapter 1

Markov chains

Here we recall definitions and the main results concerning nonnegative matrices, discrete time Markov chains and Markov chains of M/G/1 type. The results and the definitions reported in this chapter, together with further results and references, can be found in [94, 9, 78] for nonnegative matrices (see Sect. 1.1), in [31, 89] for discrete time Markov chains (see Sect. 1.2), and in [79, 80, 84] for M/G/1 type Markov chains (see Sect. 1.3).
1.1 Nonnegative matrices

In this section we recall some definitions and some important results concerning nonnegative matrices that will be used later on.

Definition 1.1 (Nonnegative matrix) A real matrix $A = (a_{ij})_{ij}$ whose entries satisfy $a_{ij} \ge 0$, for any $i$ and $j$, is called a nonnegative matrix, and we write $A \ge 0$.

Given the real matrices $A = (a_{ij})_{ij}$ and $B = (b_{ij})_{ij}$, we write $A \ge B$ if $a_{ij} \ge b_{ij}$ for any $i$ and $j$. Given the (possibly complex) matrix $A = (a_{ij})_{ij}$, we denote by $|A|$ the matrix whose entries are $|a_{ij}|$.

Definition 1.2 (Spectral radius) Given the matrix $A$, the real number $\max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}$ is called the spectral radius of $A$ and is denoted by $\rho(A)$.

Definition 1.3 (Reducible and irreducible matrix) A matrix $A$ is called reducible if there exists a permutation of rows and columns which transforms $A$ by similarity in the following way:
$$\Pi^T A \Pi = \begin{bmatrix} A_{1,1} & A_{1,2} \\ 0 & A_{2,2} \end{bmatrix},$$
where the blocks $A_{1,1}$, $A_{2,2}$ are square matrices and $\Pi$ is a permutation matrix. An irreducible matrix is a matrix that is not reducible.
One of the main results on the spectral properties of nonnegative matrices is expressed by the Perron-Frobenius Theorem [94]:
Theorem 1.1 (Perron-Frobenius) Let $A$ be an $n \times n$ nonnegative matrix. Then:

1. $\rho(A) \ge 0$ and there exists a nonnegative eigenvalue $\lambda$ of $A$ such that $\lambda = \rho(A)$;
2. there exists a nonnegative eigenvector $x$ corresponding to $\lambda$;
3. if $B$ is an $n \times n$ real matrix such that $|B| \le A$, then $\rho(B) \le \rho(A)$.

If, in addition, the matrix $A$ is irreducible, then:

1. $\rho(A) > 0$ and the positive eigenvalue $\lambda$ such that $\lambda = \rho(A)$ is unique;
2. there exists a positive eigenvector $x$ corresponding to $\lambda$;
3. if $B$ is an $n \times n$ real matrix such that $|B| \le A$ and $B \ne A$, then $\rho(B) < \rho(A)$.
Another important result that will be used later on is the following:
Theorem 1.2 (Inequalities) Let $A$ be an irreducible $n \times n$ nonnegative matrix and let $E$ be the set of $n$-dimensional positive vectors. Then, for any $x \in E$, either
$$\min_{1 \le i \le n} \frac{\sum_{j=1}^n a_{ij} x_j}{x_i} < \rho(A) < \max_{1 \le i \le n} \frac{\sum_{j=1}^n a_{ij} x_j}{x_i},$$
or
$$\frac{\sum_{j=1}^n a_{ij} x_j}{x_i} = \rho(A) \quad \text{for any } i.$$
1.2 Discrete time Markov chains
A stochastic process $X$ is a family of random variables $X = \{X(t),\ t \in T\}$, defined on a probability space, having the discrete set $E$ as the space of the states. The set $T$ usually represents the time parameter set. In the case where the set $T$ is discrete, the stochastic process is called discrete; otherwise, it is called continuous. Markov processes are particular stochastic processes, whose conditional probability is "memory-less":
Definition 1.4 (Markov process) A continuous-time stochastic process $X(t)$ is called a Markov process if, for any integer $n$ and for any sequence $t_0, t_1, \dots, t_n$ such that $t_0 < t_1 < \dots < t_n < t$, it holds that
$$\mathrm{Prob}\{X(t) \le i \mid X(t_0) = i_0, X(t_1) = i_1, \dots, X(t_n) = i_n\} = \mathrm{Prob}\{X(t) \le i \mid X(t_n) = i_n\}$$
for any $i_0, \dots, i_n, i$ in the set of states $E$.
In other words, the state of the system at time $t$ depends only on the state of the system at time $t_n$; the states of the system at times $t_0, \dots, t_{n-1}$ are irrelevant. In the case where the set $T$ is discrete (suppose, without loss of generality, that $T$ is the set of natural numbers $\mathbb{N}$), we have discrete time Markov chains.
Definition 1.5 (Discrete time Markov chain) A discrete time Markov chain $X$ is a discrete-time stochastic process $X = \{X_n,\ n \ge 0\}$ such that, for any integer $n$ and for any set $\{i_0, \dots, i_{n+1}\}$ of states, it holds that
$$\mathrm{Prob}\{X_{n+1} = i_{n+1} \mid X_0 = i_0, X_1 = i_1, \dots, X_n = i_n\} = \mathrm{Prob}\{X_{n+1} = i_{n+1} \mid X_n = i_n\}.$$
That is, for any $n$, the state of the system at time $n+1$ depends only on the state of the system at time $n$. The conditional probabilities $\mathrm{Prob}\{X_{n+1} = j \mid X_n = i\} = p_{i,j}(n)$ are called transition probabilities, since they represent the probability of making a transition from state $i$ to state $j$ when the time parameter increases from $n$ to $n+1$. A Markov chain is called homogeneous if the transition probabilities do not depend on the time $n$, i.e., if
$$\mathrm{Prob}\{X_{n+1} = j \mid X_n = i\} = p_{i,j}, \quad \forall n \ge 0,\ i, j \in E.$$
The matrix $P = (p_{i,j})$ is called the probability transition matrix associated with the Markov chain. The matrix $P$ is stochastic.
Definition 1.6 (Stochastic matrix) A nonnegative matrix $A = (a_{ij})$ is called stochastic if $\sum_j a_{i,j} = 1$ for any $i$.
Hereafter we suppose that the Markov chains are homogeneous and have discrete time. A Markov chain is uniquely determined by its transition probability matrix and by the initial distribution:
Theorem 1.3 For any $n, m \in \mathbb{N}$, where $m \ge 1$, and for any $i_0, \dots, i_m \in E$, it holds that
$$\mathrm{Prob}\{X_{n+1} = i_1, \dots, X_{n+m} = i_m \mid X_n = i_0\} = p_{i_0,i_1} p_{i_1,i_2} \cdots p_{i_{m-1},i_m}.$$
Corollary 1.4 Let $\pi$ be a probability distribution on $E$, and suppose that
$$\mathrm{Prob}\{X_0 = i\} = \pi(i) \quad \forall i \in E.$$
Then, for all $m \in \mathbb{N}$ and $i_0, \dots, i_m \in E$,
$$\mathrm{Prob}\{X_0 = i_0, X_1 = i_1, \dots, X_m = i_m\} = \pi(i_0)\, p_{i_0,i_1} \cdots p_{i_{m-1},i_m}.$$
Let us denote by $p_{i,j}^{(n)}$ the entries of the matrix $P^n$, $n \ge 1$.
Proposition 1.5 For any $n \in \mathbb{N}$ it holds that
$$\mathrm{Prob}\{X_{n+m} = j \mid X_n = i\} = p_{i,j}^{(m)} \quad \forall i, j \in E,\ \forall m \in \mathbb{N}.$$
Whence, the probability of reaching state $j$, starting from state $i$, after $n$ steps is the $(i,j)$-th entry of the matrix $P^n$.

Now we give a classification of the states. Let $j \in E$ be a fixed state and define
$$T_j = \inf\{n \ge 1 : X_n = j\}.$$
$T_j$ is a discrete random variable, having as space of the states the set of natural numbers $\mathbb{N} \cup \{\infty\}$.
Definition 1.7 (Classification of the states) The state $j$ is called recurrent if
$$\mathrm{Prob}\{T_j < \infty \mid X_0 = j\} = 1.$$
Otherwise, if
$$\mathrm{Prob}\{T_j = \infty \mid X_0 = j\} > 0,$$
the state $j$ is called transient. A recurrent state $j$ is null if
$$E[T_j \mid X_0 = j] = +\infty,$$
where $E[X]$ denotes the mean of the random variable $X$; otherwise, it is called positive. A recurrent state $j$ is called periodic with period $\delta$ if $\delta \ge 2$ is the largest integer such that
$$\mathrm{Prob}\{T_j = n\delta \text{ for some } n \ge 1 \mid X_0 = j\} = 1;$$
otherwise, if such $\delta \ge 2$ does not exist, it is called aperiodic.
In other words, the state $j$ is recurrent if the probability of returning to state $j$, starting from $j$, is 1. The state $j$ is null recurrent if the expected time between two returns to $j$ is infinite. The state $j$ is transient if the probability of never returning to state $j$, starting from $j$, is positive. The classification of the states allows us to state the following result [31]:
Theorem 1.6 (Limit theorem) Let $j$ be a fixed state. The following properties hold:

1. if $j$ is transient or null recurrent, then
$$\lim_{n \to +\infty} p_{i,j}^{(n)} = 0 \quad \forall i \in E;$$

2. if $j$ is positive recurrent and aperiodic, then
$$\lim_{n \to +\infty} p_{j,j}^{(n)} = \pi_j > 0, \qquad \lim_{n \to +\infty} p_{i,j}^{(n)} = F_{i,j}\, \pi_j \quad \forall i \in E,$$
where $F_{i,j} = \mathrm{Prob}\{T_j < \infty \mid X_0 = i\}$, $i, j \in E$.
The number $\lim_{n \to +\infty} p_{i,j}^{(n)}$ is the probability that, starting from $i$, we reach state $j$ after "infinitely many steps". We will show that, under suitable hypotheses, when $j$ is positive recurrent, the limit does not depend on the initial state $i$. Let us introduce some definitions:
Definition 1.8 (Reachable state) We say that the state $j \in E$ can be reached from the state $i$, and we write $i \to j$, if there exists an integer $n \ge 0$ such that $p_{i,j}^{(n)} > 0$.

Thus, $j$ can be reached from the state $i$ if there exist states $i_1, i_2, \dots, i_n$ such that $p_{i,i_1} > 0$, $p_{i_1,i_2} > 0$, ..., $p_{i_n,j} > 0$. It is possible to prove that:
Theorem 1.7 (Transient and recurrent states) Suppose that the number of states is finite. A state $i \in E$ is transient if and only if there exists a state $j \in E$ such that $i \to j$ and $j \not\to i$. A state $i \in E$ is recurrent if and only if $i \to j$ implies $j \to i$.

Definition 1.9 (Closed set) A set of states $C \subseteq E$ is said to be closed if no state of the set $E \setminus C$ can be reached from a state of $C$.

Definition 1.10 (Irreducible Markov chain) A Markov chain is called irreducible if its only closed set of states is the set of all the states.

It is straightforward to observe that:
Theorem 1.8 (Irreducible Markov chains) The following properties are equivalent:

1. the Markov chain $X$ is irreducible;
2. all states can be reached from each other;
3. the probability transition matrix is irreducible.

Irreducible Markov chains have important properties:

Theorem 1.9 (Classification of the states) Let $X$ be an irreducible Markov chain. Then either all states are transient, or all are null recurrent, or all are positive recurrent. Either all states are aperiodic, or else, if one is periodic with period $\delta$, then all states are periodic with period $\delta$.

In the case of irreducible aperiodic Markov chains we provide a necessary and sufficient condition for positive recurrence:

Theorem 1.10 (Positive recurrence) Let $X$ be an irreducible aperiodic Markov chain. Then all states are positive recurrent if and only if the linear system
$$\begin{cases} \pi_j = \sum_{i \in E} \pi_i p_{i,j}, & j \in E \\ \sum_{j \in E} \pi_j = 1 \end{cases} \tag{1.1}$$
admits a solution $\pi = (\pi_j)_j$. If a solution exists, then it is unique and has positive components; moreover, for any $j \in E$, it holds that
$$\pi_j = \lim_{n \to \infty} p_{i,j}^{(n)} \quad \forall i \in E.$$
Denote by $e$ the vector (of suitable dimension) having all components equal to 1.

Definition 1.11 (Probability invariant vector) A vector $\pi$ that verifies (1.1), i.e., that solves the system of equations
$$\begin{cases} \pi^T = \pi^T P \\ \pi^T e = 1 \end{cases} \tag{1.2}$$
is called a probability invariant vector for the Markov chain $X$.

Thm. 1.10 states that, if an irreducible aperiodic Markov chain has a probability invariant vector, then all states are positive recurrent. The word "invariant" comes from the fact that
$$\pi^T = \pi^T P = \pi^T P^2 = \dots = \pi^T P^n \quad \forall n \in \mathbb{N}.$$
Thus, if $\pi$ is the initial probability distribution of the Markov chain $X$, that is, if
$$\mathrm{Prob}\{X_0 = j\} = \pi_j \quad \forall j \in E,$$
then
$$\mathrm{Prob}\{X_n = j\} = \sum_{i \in E} \pi_i\, p_{i,j}^{(n)} = \pi_j \quad \forall j \in E,\ \forall n \in \mathbb{N}.$$
In the case where the set of states $E$ is finite, the irreducibility of the Markov chain guarantees its positive recurrence, due to the Perron-Frobenius Theorem 1.1.
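For a finite chain, system (1.2) can also be solved directly. The fragment below is a minimal numpy sketch (ours, not from the thesis; the 3-state matrix is an arbitrary illustration): one redundant equation of $\pi^T(I - P) = 0$ is replaced by the normalization $\pi^T e = 1$.

```python
import numpy as np

# An arbitrary 3-state stochastic matrix (rows sum to 1), for illustration.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

n = P.shape[0]
# pi^T (I - P) = 0 is rank-deficient; overwrite its last column with e
# so that the last equation becomes the normalization pi^T e = 1.
M = np.eye(n) - P
M[:, -1] = 1.0
pi = np.linalg.solve(M.T, np.eye(n)[-1])

print(pi)            # probability invariant vector
print(pi @ P - pi)   # ~0: pi^T P = pi^T holds up to roundoff
```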
1.3 Markov chains of M/G/1 type

Markov chains of M/G/1 type, introduced by M. F. Neuts in [80], are defined by transition probability matrices of the form
$$P^T = \begin{bmatrix} B_1 & A_0 & & \\ B_2 & A_1 & A_0 & \\ B_3 & A_2 & A_1 & A_0 \\ \vdots & \vdots & \ddots & \ddots & \ddots \end{bmatrix} \tag{1.3}$$
where $A_i$, $B_{i+1}$, $i \ge 0$, are $m \times m$ nonnegative matrices such that $\sum_{i=0}^{+\infty} A_i^T$ and $\sum_{i=1}^{+\infty} B_i^T$ are stochastic. A specific case is the one arising from Quasi-Birth-Death (QBD) problems, where, in addition to the M/G/1 structure, the matrix $P^T$ is also block tridiagonal. That is, $P$ is uniquely defined by the blocks $A_0$, $A_1$, $A_2$, $B_1$, $B_2$, since $A_i = B_i = 0$ for $i > 2$.

More general M/G/1 type Markov chains, called non-skip-free in [45], are defined by transition matrices of the form
$$P^T = \begin{bmatrix} B_{1,1} & \cdots & B_{1,k} & A_0 & & \\ B_{2,1} & \cdots & B_{2,k} & A_1 & A_0 & \\ B_{3,1} & \cdots & B_{3,k} & A_2 & A_1 & A_0 \\ \vdots & & \vdots & \vdots & \ddots & \ddots & \ddots \end{bmatrix} \tag{1.4}$$
where $A_i$, $B_{i+1,j}$, $i \ge 0$, $j = 1, \dots, k$, are $m \times m$ nonnegative matrices such that $\sum_{i=0}^{+\infty} A_i^T$ and $\sum_{i=1}^{+\infty} B_{i,j}^T$, $j = 1, \dots, k$, are stochastic. This class of transition matrices can be regarded as a special case of the block Hessenberg form (1.3), by partitioning the matrix $P^T$ in (1.4) into blocks $A_i$, $B_i$ of dimension $M = mk$, where
$$A_0 = \begin{bmatrix} A_0 & 0 & \cdots & 0 \\ A_1 & A_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ A_{k-1} & \cdots & A_1 & A_0 \end{bmatrix}, \qquad A_i = \begin{bmatrix} A_{ik} & A_{ik-1} & \cdots & A_{ik-k+1} \\ A_{ik+1} & A_{ik} & \ddots & \vdots \\ \vdots & \ddots & \ddots & A_{ik-1} \\ A_{ik+k-1} & \cdots & A_{ik+1} & A_{ik} \end{bmatrix}, \quad i \ge 1, \tag{1.5}$$
and
$$B_i = \begin{bmatrix} B_{(i-1)k+1,1} & B_{(i-1)k+1,2} & \cdots & B_{(i-1)k+1,k} \\ B_{(i-1)k+2,1} & B_{(i-1)k+2,2} & \cdots & B_{(i-1)k+2,k} \\ \vdots & \vdots & & \vdots \\ B_{ik,1} & B_{ik,2} & \cdots & B_{ik,k} \end{bmatrix}, \quad i \ge 1.$$
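The reblocking (1.5) is purely index bookkeeping. The following hypothetical helper (a numpy sketch under our own naming, not part of the thesis) assembles the $mk \times mk$ blocks of the reblocked chain from a finite list $A_0, \dots, A_d$, treating $A_j = 0$ for $j$ outside this range:

```python
import numpy as np

def reblock_A(A_list, k, i):
    """Block (1.5): the mk x mk i-th block of the reblocked chain.
    A_list = [A_0, ..., A_d] (m x m numpy arrays); A_j = 0 for j > d."""
    m = A_list[0].shape[0]
    d = len(A_list) - 1
    def A(j):  # A_j, zero outside the stored range
        return A_list[j] if 0 <= j <= d else np.zeros((m, m))
    out = np.zeros((m * k, m * k))
    for r in range(k):
        for c in range(k):
            # entry (r, c) is A_{ik + r - c}: lower triangular Toeplitz
            # pattern for i = 0, full Toeplitz pattern for i >= 1
            out[r*m:(r+1)*m, c*m:(c+1)*m] = A(i * k + r - c)
    return out
```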
Similar transition matrices arise in the case of finite Markov chains, or where the infinite matrix $P^T$ is replaced by a finite stochastic matrix obtained by cutting $P^T$ to a finite (large) size and by adjusting the last row in order to preserve the stochastic nature of the matrix. In this case the resulting structure becomes:
$$P^T = \begin{bmatrix} B_1 & A_0 & & & \\ B_2 & A_1 & A_0 & & \\ \vdots & \vdots & \ddots & \ddots & \\ B_{n-1} & A_{n-2} & \cdots & A_1 & A_0 \\ C_n & C_{n-1} & \cdots & C_2 & C_1 \end{bmatrix}. \tag{1.6}$$
For block tridiagonal matrices, like the ones arising from QBD problems, $P$ is then uniquely defined by the blocks $A_0$, $A_1$, $A_2$, $B_1$, $B_2$, $C_1$, $C_2$. In the case of infinite matrices with the structure (1.3), it is possible to provide conditions for positive recurrence, i.e., for the existence of a vector $\pi$ solving (1.2). Indeed, the following fundamental result holds [80]:
Theorem 1.11 (Positive recurrence) Suppose that the matrices $P$ and $\sum_{i=0}^{+\infty} A_i$ are irreducible. Let $a$, $b$ be two $m$-dimensional vectors such that $a = \sum_{i=1}^{+\infty} i A_i e$, $\sum_{i=0}^{+\infty} A_i^T b = b$, $\|b\| = 1$. Then the Markov chain is positive recurrent if and only if $\rho = b^T a < 1$.

For positive recurrent infinite Markov chains (1.3), the vector $\pi$ solving (1.2) can be computed by means of a recursive, numerically stable formula (Ramaswami's formula [84]), once the solution of the following nonlinear matrix equation is known:
$$X = \sum_{i=0}^{+\infty} X^i A_i, \tag{1.7}$$
where $X$ is an $m \times m$ matrix. It is possible to prove [80] that, for positive recurrent Markov chains, the equation (1.7) has a unique nonnegative solution $G$, such that $G^T$ is stochastic. Define the matrices
$$\hat A_i = \sum_{j=i}^{+\infty} G^{j-i} A_j, \qquad \hat B_i = \sum_{j=i}^{+\infty} G^{j-i} B_j, \qquad \text{for } i \ge 1, \tag{1.8}$$
and partition the vector $\pi$ into $m$-dimensional vectors $\pi_i$, $i \ge 0$, according to the block structure of (1.3). Then the vector $\pi_0$ solves the system
$$\begin{cases} \hat B_1 \pi_0 = \pi_0 \\ \pi_0^T e + e^T \left(I - \sum_{i=1}^{+\infty} \hat A_i\right)^{-1} \left(\sum_{i=2}^{+\infty} \hat B_i\right) \pi_0 = 1, \end{cases}$$
and the remaining components of $\pi$ can be computed by means of the Ramaswami formula [84]:
$$\pi_i = (I - \hat A_1)^{-1} \left( \hat B_{i+1}\, \pi_0 + \sum_{j=1}^{i-1} \hat A_{i+1-j}\, \pi_j \right), \quad i \ge 1. \tag{1.9}$$
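Assuming $G$, and hence the matrices $\hat A_i$, $\hat B_i$ of (1.8), have been precomputed and truncated to finitely many nonzero terms, (1.9) unrolls into a simple recurrence. A minimal numpy sketch (the function name and calling convention are ours, not the thesis'):

```python
import numpy as np

def ramaswami(pi0, A_hat, B_hat, K):
    """Unroll (1.9): return [pi_1, ..., pi_K].
    A_hat[i], B_hat[i] hold the m x m matrices of (1.8) for i >= 1
    (position 0 is a placeholder); indices beyond the lists are zero."""
    m = pi0.shape[0]
    Z = np.zeros((m, m))
    Ah = lambda i: A_hat[i] if i < len(A_hat) else Z
    Bh = lambda i: B_hat[i] if i < len(B_hat) else Z
    R = np.linalg.inv(np.eye(m) - Ah(1))   # (I - \hat A_1)^{-1}, reused
    pi = [pi0]
    for i in range(1, K + 1):
        s = Bh(i + 1) @ pi0
        for j in range(1, i):
            s = s + Ah(i + 1 - j) @ pi[j]
        pi.append(R @ s)
    return pi[1:]
```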
Formula (1.9) was first proved by Ramaswami in [84], by using results from Markov renewal theory (Ramaswami's proof may also be found in [80]). In [85], formula (1.9) was proved by using a different technique, based on the computation of the Schur complement of a leading principal submatrix of $I - P^T$. In both approaches, the proof consists in building, by suitably truncating the matrix $P$ of (1.3), a sequence $\{P^{(n)}\}_n$ of stochastic $(n+1) \times (n+1)$ block matrices, whose probability invariant vector is given, up to a multiplicative constant, by the corresponding components of the infinite vector $\pi$.

Solving the system (1.2) and/or the matrix equation (1.7) is fundamental in the analysis of Markov chains and in queueing problems. It is possible to rephrase the matrix equation (1.7) in terms of a block Toeplitz block Hessenberg matrix, and to reduce the design and analysis of algorithms for solving (1.7) to the design and analysis of algorithms for treating the infinite matrix
$$H = \begin{bmatrix} I - A_1 & -A_0 & & \\ -A_2 & I - A_1 & -A_0 & \\ -A_3 & -A_2 & I - A_1 & -A_0 \\ \vdots & \ddots & \ddots & \ddots & \ddots \end{bmatrix}. \tag{1.10}$$
Indeed, if $G$ is a solution of (1.7), then a solution of the infinite system
$$[X_1, X_2, X_3, \dots]\, H = [A_0, 0, 0, \dots], \tag{1.11}$$
where the $X_i$ are $m \times m$ matrices, is given by $X_i = G^i$, $i \ge 1$. Moreover, for positive recurrent Markov chains the equation (1.7) has only one nonnegative solution and the system (1.11) has only one nonnegative solution [44]. Therefore solving (1.11) provides the only nonnegative solution of (1.7). Observe that in order to compute $G$ it is sufficient to compute the block entry $(H^{-1})_{1,1}$ of $H^{-1}$ in the north-western corner; in fact it holds that $G = A_0 (H^{-1})_{1,1}$. Observe also that, if we consider the finite system obtained by truncating (1.11) at dimension $n$,
$$[X_1^{(n)}, X_2^{(n)}, \dots, X_n^{(n)}]\, H_n = [A_0, 0, \dots, 0], \tag{1.12}$$
where
$$H_n = \begin{bmatrix} I - A_1 & -A_0 & & & \\ -A_2 & I - A_1 & -A_0 & & \\ \vdots & \ddots & \ddots & \ddots & \\ -A_{n-1} & \cdots & \ddots & I - A_1 & -A_0 \\ -A_n & -A_{n-1} & \cdots & -A_2 & I - A_1 \end{bmatrix}, \tag{1.13}$$
then $\{X_1^{(n)}\}_n$ is a sequence of nonnegative matrices that monotonically converges to $G$ as $n$ tends to infinity.

Throughout the thesis we suppose that the matrices $P$ and $\sum_{i=0}^{+\infty} A_i$ are irreducible and that the corresponding Markov chain is positive recurrent.
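As a concrete illustration (ours, not from the thesis; functional iterations of this kind are analyzed in Chap. 4), the simplest way to approximate $G$ is the fixed-point iteration $X \leftarrow \sum_{i=0}^{d} X^i A_i$ started from $X = 0$, applied to a truncated block sequence:

```python
import numpy as np

def solve_G(A, tol=1e-13, maxit=100_000):
    """Fixed-point iteration X <- sum_{i=0}^{d} X^i A[i] for equation (1.7),
    started from X = 0; A = [A_0, ..., A_d] truncates the block sequence."""
    m = A[0].shape[0]
    X = np.zeros((m, m))
    for _ in range(maxit):
        P = np.eye(m)                 # holds X^i while accumulating the sum
        Y = np.zeros((m, m))
        for Ai in A:
            Y = Y + P @ Ai
            P = P @ X
        if np.max(np.abs(Y - X)) < tol:
            return Y
        X = Y
    return X
```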
Chapter 2

Exploitation of structures and computational tools

The research in the field of Toeplitz computations offers a consolidated variety of advanced tools for dealing with the matrix structures presented in the previous chapter. The presently available Toeplitz matrix technology constitutes a powerful means in the design and analysis of efficient algorithms for the numerical solution of Markov chains and queueing problems. In this chapter we recall the main tools, presented in [20, 21, 16], that we will use later on in order to devise efficient solution algorithms, and we adapt them to the specificity of the class of problems that we are going to solve.

In Sect. 2.1 we recall the definition of Discrete Fourier Transforms (DFT) and some known algorithms for their fast computation (FFT). In Sect. 2.2 we describe and analyze the relations connecting block Toeplitz matrices, matrix polynomials and matrix power series. We show that matrix computations involving (infinite) block Toeplitz matrices can be rephrased in terms of matrix polynomials or matrix power series, thus generalizing the correlations between Toeplitz computations and polynomial computations widely investigated in [26]. We also introduce some basic algorithms for polynomial (power series) computations and for manipulating block Toeplitz matrices. In particular, in Sect. 2.2.1 we consider the problem of computing block Toeplitz matrix and block vector products; in Sect. 2.2.2 we analyze the problem of the inversion of block triangular block Toeplitz matrices. In Sect. 2.3 we deal more specifically with computations concerning matrix power series and infinite block Toeplitz matrices. We introduce an algorithm, based on FFT, for the point-wise evaluation of expressions involving matrix power series. Finally, in Sect. 2.4 we select a suitable displacement operator and recall its main properties in the specific relation with the block structures arising in Markov chain modeling.
2.1 Fast Fourier Transforms

Let us introduce some notation and definitions concerning discrete Fourier transforms. Let $\mathbf{i}$ be the imaginary unit, such that $\mathbf{i}^2 = -1$, and denote
$$\omega_N = \cos\frac{2\pi}{N} + \mathbf{i}\,\sin\frac{2\pi}{N},$$
a primitive $N$-th root of unity, such that the powers $\omega_N^j$, $j = 0, \dots, N-1$, are all the $N$-th roots of unity. Moreover, let $\bar z$ denote the complex conjugate of the complex number $z$.

Definition 2.1 (DFT and IDFT) For $N$-dimensional complex vectors $a^T = [a_0, a_1, \dots, a_{N-1}]$ and $b^T = [b_0, b_1, \dots, b_{N-1}]$ such that
$$b_j = \sum_{k=0}^{N-1} a_k\, \omega_N^{kj},$$
or, equivalently,
$$a_j = \frac{1}{N} \sum_{k=0}^{N-1} \bar\omega_N^{kj}\, b_k,$$
we denote by $b = \mathrm{DFT}_N(a)$ the discrete Fourier transform of $a$ of order $N$, and by $a = \mathrm{IDFT}_N(b)$ the inverse discrete Fourier transform of $b$ of order $N$.

If the order of the DFT or IDFT is clear from the context, we will use the notation DFT and IDFT, instead of $\mathrm{DFT}_N$ and $\mathrm{IDFT}_N$, respectively. Observe that evaluating $b = \mathrm{DFT}_N(a)$ corresponds to computing the values that the polynomial $a(z) = \sum_{k=0}^{N-1} a_k z^k$ takes on at the $N$-th roots of unity. Similarly, computing $a = \mathrm{IDFT}_N(b)$ corresponds to solving an interpolation problem, viz., that of computing the coefficients of a polynomial $a(z)$ given its values at the $N$-th roots of unity.

It is well known [36] that, if $N = 2^M$, where $M$ is a positive integer, the computation of $\mathrm{DFT}_N$ and $\mathrm{IDFT}_N$ can be performed with $O(N \log N)$ real arithmetic operations (hereafter denoted by ops) by means of the base-2 FFT (Fast Fourier Transform) algorithms. More precisely, in the case of real input, i.e., when the vectors $a$ and $b$ have real components, the cost of $\mathrm{DFT}_N$ and $\mathrm{IDFT}_N$ is $\frac{5}{2} N \log N + O(1)$ ops and $\frac{5}{2} N \log N + N + O(1)$ ops, respectively, if we do not count the cost of computing the $N$-th roots of 1. Moreover, the computation of the FFT is weakly numerically stable, i.e., it is possible to give very good relative error bounds in terms of the Euclidean norm $\|\cdot\|_2$ ([26, 73]). More precisely, if $\tilde a$ and $\tilde b$ denote the values of $a$ and $b$ computed by means of the base-2 FFT by using floating point arithmetic with machine precision $\mu$, then
$$\|\tilde a - a\|_2 \le 5\mu \frac{1}{\sqrt N} \log N\, \|b\|_2 = 5\mu \log N\, \|a\|_2, \qquad \|\tilde b - b\|_2 \le 5\mu \sqrt N \log N\, \|a\|_2 = 5\mu \log N\, \|b\|_2.$$
Now we recall two known relations between the vectors $w, z \in \mathbb{C}^N$, where $\mathbb{C}$ denotes the set of complex numbers, such that $z = \mathrm{DFT}_N(w)$, on which the Cooley-Tukey and Sande-Tukey algorithms for the FFT computation are based. Let
$$D = \mathrm{Diag}(1, \omega_N, \omega_N^2, \dots, \omega_N^{N/2-1}), \qquad w^{(1)} = (w_j),\ w^{(2)} = (w_{N/2+j}), \qquad z^{(even)} = (z_{2j}),\ z^{(odd)} = (z_{2j+1}) \in \mathbb{C}^{N/2}. \tag{2.1}$$
Then it holds that
$$\begin{cases} z^{(even)} = \mathrm{DFT}_{N/2}(w^{(1)} + w^{(2)}) \\ z^{(odd)} = \mathrm{DFT}_{N/2}(D(w^{(1)} - w^{(2)})) \end{cases} \tag{2.2}$$
and
$$\begin{cases} w^{(1)} = (\mathrm{IDFT}_{N/2}(z^{(even)}) + \bar D\, \mathrm{IDFT}_{N/2}(z^{(odd)}))/2 \\ w^{(2)} = (\mathrm{IDFT}_{N/2}(z^{(even)}) - \bar D\, \mathrm{IDFT}_{N/2}(z^{(odd)}))/2 \end{cases} \tag{2.3}$$
where $\bar D$ is the complex conjugate of $D$.

The following problem, which consists in computing the DFT of a real vector of $N$ components, is solved by means of the computation of a complex DFT of order $N/2$.

Problem 1. Given the integer $N = 2^M$, $M > 0$ integer, and the real coefficients $a_0, a_1, \dots, a_{N-1}$ of the polynomial $a(z) = \sum_{j=0}^{N-1} a_j z^j$, compute the values $b_j = a(\omega_N^j)$, $j = 0, \dots, N-1$, that the polynomial $a(z)$ takes on at the $N$-th roots of 1.

Solution. The following well-known formulae reduce Problem 1 to computing a complex DFT of order $N/2$. Let
$$w = (w_j), \quad w_j = a_{2j} + \mathbf{i}\, a_{2j+1}, \quad j = 0, \dots, N/2-1, \qquad z = \mathrm{DFT}_{N/2}(w); \tag{2.4}$$
then
$$\begin{aligned} &b_0 = \mathrm{re}(z_0) + \mathrm{im}(z_0), \qquad b_{N/2} = \mathrm{re}(z_0) - \mathrm{im}(z_0), \\ &b_j = ((z_j + \bar z_{N/2-j}) - \mathbf{i}\,\omega_N^j (z_j - \bar z_{N/2-j}))/2, \qquad b_{N/2+j} = \bar b_{N/2-j}, \quad j = 1, \dots, N/2-1. \end{aligned} \tag{2.5}$$
□
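The packing formulas (2.4)-(2.5) are easy to check numerically. In the numpy sketch below (ours), the DFT of Def. 2.1 (with $\omega_N = e^{2\pi\mathbf{i}/N}$) is realized as `np.fft.ifft(a) * N`, since numpy's `fft` uses the opposite sign convention:

```python
import numpy as np

def dft(a):
    # DFT in the convention of Def. 2.1: b_j = sum_k a_k * omega_N^(k*j)
    # with omega_N = exp(+2*pi*i/N); this is N times numpy's ifft.
    return np.fft.ifft(a) * len(a)

rng = np.random.default_rng(0)
N = 16
a = rng.standard_normal(N)           # real coefficients of a(z)

# Problem 1: pack a into a complex vector of length N/2 ...     (2.4)
w = a[0::2] + 1j * a[1::2]
z = dft(w)

# ... and unpack the half-length transform into b = DFT_N(a)    (2.5)
omega = np.exp(2j * np.pi / N)
b = np.empty(N, dtype=complex)
b[0] = z[0].real + z[0].imag
b[N // 2] = z[0].real - z[0].imag
j = np.arange(1, N // 2)
zr = np.conj(z[N // 2 - j])
b[j] = ((z[j] + zr) - 1j * omega**j * (z[j] - zr)) / 2
b[N // 2 + j] = np.conj(b[N // 2 - j])

assert np.allclose(b, dft(a))        # matches the full-length DFT
```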
The converse of Problem 1 is described below.

Problem 2. Given the integer $N = 2^M$ as in Problem 1, and the values $b_j = a(\omega_N^j)$, $j = 0, \dots, N-1$, that the real polynomial $a(z) = \sum_{j=0}^{N-1} a_j z^j$ takes on at the $N$-th roots of 1, compute the coefficients $a_0, a_1, \dots, a_{N-1}$ of $a(z)$.

Solution. Problem 2 can be easily reduced to the computation of a complex IDFT of order $N/2$; in fact, we have the following formulae, obtained by reverting (2.4) and (2.5). Let
$$z = (z_j), \quad z_j = ((b_j + b_{N/2+j}) + \mathbf{i}\,\bar\omega_N^j (b_j - b_{N/2+j}))/2, \quad j = 0, \dots, N/2-1, \qquad w = \mathrm{IDFT}_{N/2}(z); \tag{2.6}$$
then
$$a_{2j} = \mathrm{re}(w_j), \quad a_{2j+1} = \mathrm{im}(w_j), \quad j = 0, \dots, N/2-1. \tag{2.7}$$
□
The next problem consists in computing the values that a polynomial of degree less than $N$ takes on at the odd powers of $\omega_{2N}$, once the values taken on at the even powers of $\omega_{2N}$ have been computed.

Problem 3. Given the integer $N = 2^M$ as in Problem 1, the coefficients $a_0, a_1, \dots, a_{N-1}$ of the polynomial $a(z)$, and the values $b_{2j} = a(\omega_{2N}^{2j}) = a(\omega_N^j)$, $j = 0, \dots, N-1$, that $a(z)$ takes on at the even powers of $\omega_{2N}$, compute the values $b_{2j+1} = a(\omega_{2N}^{2j+1})$, $j = 0, \dots, N-1$, that $a(z)$ takes on at the odd powers of $\omega_{2N}$.

Solution. Problem 3 could be solved by computing the DFT of order $2N$ of the real vector $u = (u^{(1)T}, u^{(2)T})^T$, where $u^{(1)} = (a_0, \dots, a_{N-1})^T$, $u^{(2)} = 0$, i.e., $v = \mathrm{DFT}_{2N}(u)$. But in this way the values $b_{2j}$ would be computed once again. This problem can be solved more efficiently by reducing it to the computation of a complex DFT of order $N/2$. In fact, by using (2.4) and (2.5) with $N$ replaced by $2N$, we find that Problem 3 is reduced to computing $z = \mathrm{DFT}_N(w)$, where the vector $w = (w_j)$ is such that $w_j = a_{2j} + \mathbf{i}\, a_{2j+1}$, $w_{N/2+j} = 0$, $j = 0, \dots, N/2-1$. Partitioning $w$ as $w = (w^{(1)T}, w^{(2)T})^T$ and applying (2.1) and (2.2) yields
$$z^{(even)} = \mathrm{DFT}_{N/2}(w^{(1)}), \qquad z^{(odd)} = \mathrm{DFT}_{N/2}(D w^{(1)}),$$
since $w^{(2)} = 0$. Now, again from (2.5), we find that the components $b_{2j+1}$ are fully defined by the components $z_{2j+1}$. In this way the computation is reduced to a single DFT of order $N/2$, namely $\mathrm{DFT}_{N/2}(D w^{(1)})$. □
z(even) = DFTN=2 (w(1) ) z(odd) = DFTN=2 (Dw(1) ) since w(2) = 0. Now, again from (2.5) we nd that the components b2j+1 , are fully de ned by the components z2j+1. In this way the computation is reduced to a single DFT of order N=2, namely DFTN=2 (Dw(1) ). 2 In the implementation of fast algorithm for solving M/G/1 type Markov chains we also need to solve the following problem. Problem 4. Given the integer N = 2M as in ProblemP1, the values b2j+1 = 2j +1 N ?1 a z j takes on at a(!2N ), j = 0; : : : ; N ? 1 that the real polynomial a(z) = j2=0 j the odd powers of !2N , and a(1) = IDFTN (b(even) ), b(even) = (b2j ), b2j = a(!22Nj ) = a(!Nj ), j = 0; : : : ; N ? 1, compute the coecients a0 ; a1; : : : ; a2N ?1 of a(z). Solution. Indeed, this problem can be solved by computing a = IDFT2N b. A cheaper solution can be obtained as follows. By applying (2.6) and (2.7) where N is replaced by 2N , we nd that the vector a is recovered from the components of w, where w = IDFTN (z). Now, in order to compute w we apply (2.1), (2.3) and reduce the problem to computing IDFTN=2 (z(even) ) and IDFTN=2(z (odd) ). Now, the vector IDFTN=2 (z (even) ) is already available and needs the computation of no IDFT. In fact, observe that applying (2.6) and (2.7) for the computation of a(1) = IDFTN=2 (b(even) ), i.e., with aj and bj replaced with a(1) j and b2j , respectively, yields
$\mathrm{IDFT}_{N/2}(z^{(even)})$ as an explicit function of the $a_j^{(1)}$; more precisely, $\mathrm{IDFT}_{N/2}(z^{(even)}) = (a_{2j}^{(1)} + \mathbf{i}\, a_{2j+1}^{(1)})_j$. □
2.2 Block Toeplitz matrices computation

The relations between Toeplitz matrices and polynomials have been implicitly used in many fields, but only recently pointed out in a detailed way in [15, 24, 25, 26, 46]. In particular, computations like polynomial multiplication, polynomial division (evaluation of quotient and remainder), polynomial GCD and LCM, Padé approximation, modular computations, Chinese remaindering, and Taylor expansion have their own counterparts formulated in terms of Toeplitz or Toeplitz-like computations, and vice versa. Quite surprisingly, almost all the known algorithms for polynomial division match with corresponding algorithms independently devised for the inversion of triangular Toeplitz matrices [24]. Also several computations involving matrix algebras strictly related with Toeplitz matrices, such as the circulant class, the $\tau$ class, and the algebra generated by a Frobenius matrix, can be rephrased in terms of computations among polynomials modulo a given specific polynomial associated with the algebra.

These results can be extended in a natural way to the case of block Toeplitz matrices, a fact that does not seem to be well known in the literature. For instance, a remark in [88] states that "there are fast algorithms for multiplying Toeplitz matrices by a vector. Unfortunately, they do not generalize to block Toeplitz matrices".

The goal of this section is to derive FFT-based algorithms for performing operations between block Toeplitz matrices, such as computing the product of a block Toeplitz matrix and a block vector, and computing the inverse of a block Toeplitz block triangular matrix. In order to achieve these results, we exploit the relations between block Toeplitz matrices and matrix polynomials. First, we recall the definition of a block Toeplitz matrix:

Definition 2.2 (Block Toeplitz matrix) A (possibly infinite) block matrix $T = (T_{i,j})$ is called block Toeplitz if its block components are constant along each block diagonal, i.e., if, for any fixed $h$, it holds that $T_{i,j} = T_{i+h,j+h}$, for any $i$, $j$.

Let $z$ be a scalar indeterminate and consider the matrix polynomial $A(z) = \sum_{i=0}^p A_i z^i$, of degree $p$ [65], where $A_i$, $i = 0, \dots, p$, are $m \times m$ matrices having real entries and $A_p \ne 0$. A matrix polynomial can be viewed as a polynomial in $z$ having matrix coefficients or, equivalently, as a matrix having entries that are polynomials. Similarly, a matrix power series is a series having matrix coefficients or, equivalently, a matrix having entries that are power series.

A matrix polynomial is fully defined by its matrix coefficients $A_0, \dots, A_p$, or, equivalently, by the $(p+1) \times 1$ block matrix $A$ having blocks $A_0, \dots, A_p$. We call
a block column vector any $n \times 1$ block matrix. Similarly, we call a block row vector any $1 \times n$ block matrix. The blocks defining a block (row/column) vector are called block components.
2.2.1 Block Toeplitz matrices and block vector products
Given two matrix polynomials $A(z)$ and $B(z)$ of degrees $p$ and $q$, respectively, we may consider the matrix polynomial $C(z) = A(z)B(z)$, having degree at most $p + q$, obtained by means of the row-by-column product of $A(z)$ and $B(z)$. If we denote $C(z) = \sum_{i=0}^{p+q} C_i z^i$, then we have $C_0 = A_0 B_0$, $C_1 = A_0 B_1 + A_1 B_0$, $C_2 = A_0 B_2 + A_1 B_1 + A_2 B_0$, ..., $C_{p+q} = A_p B_q$. That is, in matrix form,
$$\begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_{p+q} \end{bmatrix} = \begin{bmatrix} A_0 & 0 & \cdots & 0 \\ A_1 & A_0 & \ddots & \vdots \\ \vdots & A_1 & \ddots & 0 \\ A_p & \vdots & \ddots & A_0 \\ 0 & A_p & & A_1 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & A_p \end{bmatrix} \begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_q \end{bmatrix}. \tag{2.8}$$
Observe that, by choosing $p = 2n-2$, $q = n-1$, the middle $n$ block rows of the matrix equation (2.8) yield the following product between a general block Toeplitz matrix and a block column vector:
$$\begin{bmatrix} C_{n-1} \\ C_n \\ \vdots \\ C_{2n-2} \end{bmatrix} = \begin{bmatrix} A_{n-1} & A_{n-2} & \cdots & A_0 \\ A_n & A_{n-1} & \ddots & \vdots \\ \vdots & \ddots & \ddots & A_{n-2} \\ A_{2n-2} & \cdots & A_n & A_{n-1} \end{bmatrix} \begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_{n-1} \end{bmatrix}. \tag{2.9}$$
This shows that the product of a block Toeplitz matrix and a block vector can be viewed in terms of the product of two matrix polynomials. More specifically, equation (2.8) can be used together with an evaluation/interpolation technique at the roots of unity in order to efficiently compute the product (2.9), as we now clarify. In summary, the evaluation/interpolation technique for computing a product of the form (2.9) proceeds according to the following scheme:

1. Evaluate $A(z)$ at the $N$-th roots of unity, $\omega_N^j$, $j = 0, \dots, N-1$, for a choice of $N$ that satisfies $N > 3n - 3$. This requires that we evaluate all $m^2$ entries of $A(z)$ at the roots of unity, and this computation can be performed by applying $m^2$ DFT's of order $N$ each.
2. Evaluate $B(z)$ at the $N$-th roots of unity, again by performing $m^2$ DFT's of order $N$ each.

3. Compute the $N$ matrix products $C(\omega_N^j) = A(\omega_N^j)B(\omega_N^j)$, $j = 0, \dots, N-1$. The total cost of this step is $O(m^3 N)$ operations.

4. Interpolate the values of the entries of $C(\omega_N^j)$ by means of $m^2$ IDFT's of order $N$ each, and recover the coefficients of the $m^2$ polynomials, i.e., the blocks $C_i$.

The total cost of the above procedure is $O(m^2 N \log N + m^3 N)$ operations, where $O(m^2 N \log N)$ is due to the FFT's, while $O(m^3 N)$ is the cost of stage 3. Therefore we have proved that:
Theorem 2.1 (Block Toeplitz matrix and vector product) The product of an $n \times n$ block Toeplitz matrix having blocks of size $m$ and a block column vector can be performed by means of $O(m^2)$ FFT's of length $N = O(n)$, in $O(m^2 N \log N + m^3 N)$ arithmetic operations.
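A direct transcription of the four-step scheme in numpy (a sketch under our own naming, not from the thesis; `np.fft.fft` along the leading axis performs all $m^2$ DFT's at once):

```python
import numpy as np

def matpoly_mul(A, B):
    """Product C(z) = A(z)B(z) of two matrix polynomials (lists of m x m
    blocks) by evaluation/interpolation at the N-th roots of unity."""
    deg = (len(A) - 1) + (len(B) - 1)
    N = 1 << (deg + 1).bit_length()       # power of two with N > deg
    # steps 1-2: m^2 FFTs of length N on the coefficient sequences
    Ah = np.fft.fft(np.array(A), n=N, axis=0)
    Bh = np.fft.fft(np.array(B), n=N, axis=0)
    # step 3: N pointwise m x m matrix products ((N,m,m) @ (N,m,m))
    Ch = Ah @ Bh
    # step 4: interpolate and keep the deg+1 coefficient blocks
    return list(np.fft.ifft(Ch, axis=0).real[: deg + 1])
```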
This cost is below the customary figure of $O(m^3 N^2)$ operations for carrying out the matrix multiplication (2.9). We may in fact devise a more efficient algorithm, involving FFT's of lower order, for computing the product between a block Toeplitz matrix and a block vector, if we consider computations modulo the polynomial $z^N - 1$. Indeed, given matrix polynomials $P(z)$, $Q(z)$, $R(z)$ of degree at most $N-1$ such that
$$R(z) = P(z)Q(z) \bmod z^N - 1, \tag{2.10}$$
we may similarly rephrase the above equation in matrix form in the following way:
$$\begin{bmatrix} R_0 \\ R_1 \\ \vdots \\ R_{N-1} \end{bmatrix} = \begin{bmatrix} P_0 & P_{N-1} & \cdots & P_1 \\ P_1 & P_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & P_{N-1} \\ P_{N-1} & \cdots & P_1 & P_0 \end{bmatrix} \begin{bmatrix} Q_0 \\ Q_1 \\ \vdots \\ Q_{N-1} \end{bmatrix}. \tag{2.11}$$
The matrix on the right-hand side of (2.11) is a block circulant matrix:
Definition 2.3 (Block circulant matrix) An $N \times N$ block matrix is called block circulant if the block entry in the $i$-th block row and the $j$-th block column depends on $j - i \bmod N$.
The blocks $R_i$ can be efficiently computed, given the blocks $P_i$ and $Q_i$, by exploiting the polynomial relation (2.10). In fact, from (2.10) we deduce that $R(\omega_N^j) = P(\omega_N^j)Q(\omega_N^j)$, $j = 0, \dots, N-1$. Therefore the blocks $R_i$ can be computed by means of the evaluation/interpolation technique at the $N$-th roots of unity.
Now, given an $n \times n$ block Toeplitz matrix $A = (A_{i-j+n-1})_{i,j=1,\dots,n}$, it is possible to embed $A$ into an $N \times N$ block circulant matrix $H$, $N = 2n$, defined by its first block column having blocks $A_{n-1}, A_n, \dots, A_{2n-2}, 0, A_0, \dots, A_{n-2}$, i.e.,
$$H = \begin{bmatrix}
A_{n-1} & A_{n-2} & \cdots & A_0 & 0 & A_{2n-2} & \cdots & A_n \\
A_n & A_{n-1} & \ddots & & A_0 & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & \ddots & A_{2n-2} \\
A_{2n-2} & \cdots & A_n & A_{n-1} & A_{n-2} & \cdots & A_0 & 0 \\
0 & A_{2n-2} & \cdots & A_n & A_{n-1} & A_{n-2} & \cdots & A_0 \\
A_0 & 0 & \ddots & & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & A_{2n-2} & \cdots & \ddots & \ddots & A_{n-2} \\
A_{n-2} & \cdots & A_0 & 0 & A_{2n-2} & \cdots & A_n & A_{n-1}
\end{bmatrix}.$$
The product of $H$ and the block column vector defined by the blocks $B_0, B_1, \dots, B_{N-1}$, where $B_j = 0$ for $j \ge n$, delivers a block column vector of $N$ block components $C_0, C_1, \dots, C_{N-1}$, such that
$$H \begin{bmatrix} B_0 \\ \vdots \\ B_{n-1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} C_0 \\ \vdots \\ C_{n-1} \\ C_n \\ \vdots \\ C_{N-1} \end{bmatrix}.$$
Since the leading principal block submatrix of $H$ of block size $n$ coincides with $A$, the blocks $C_0, C_1, \dots, C_{n-1}$ define the block column vector $C$ such that $C = AB$, where $B$ is the block column vector defined by $B_0, B_1, \dots, B_{n-1}$. In this way we arrive at the following algorithm for the multiplication of a block Toeplitz matrix and a block vector:
Algorithm 2.1 (Block Toeplitz and vector multiplication)

Input. The $m \times m$ matrices $A_0, A_1, \dots, A_{2n-2}$ defining the $n \times n$ block Toeplitz matrix $A = (A_{i-j+n-1})_{i,j=1,\dots,n}$; the $m \times m$ matrices $B_0, B_1, \dots, B_{n-1}$ defining the block vector $B$.

Output. The $m \times m$ matrices $C_0, C_1, \dots, C_{n-1}$ defining the block vector $C$ such that $C = AB$.

Computation.

1. Evaluate the matrix polynomial
$$\alpha(z) = A_{n-1} + A_n z + \dots + A_{2n-2} z^{n-1} + A_0 z^{n+1} + \dots + A_{n-2} z^{2n-1}$$
at the $2n$-th roots of 1, by means of $m^2$ DFT's of order $2n$ each, and obtain the matrices $\alpha(\omega_{2n}^j)$, $j = 0, \dots, 2n-1$.

2. Evaluate the matrix polynomial $\beta(z) = B_0 + B_1 z + \dots + B_{n-1} z^{n-1}$ at the $2n$-th roots of 1, by means of $m^2$ DFT's of order $2n$ each, and obtain the matrices $\beta(\omega_{2n}^j)$, $j = 0, \dots, 2n-1$.

3. Compute the products $\gamma(\omega_{2n}^j) = \alpha(\omega_{2n}^j)\,\beta(\omega_{2n}^j)$, $j = 0, \dots, 2n-1$.

4. Interpolate $\gamma(\omega_{2n}^j)$ by means of $m^2$ IDFT's of order $2n$ each, obtain the coefficients $\gamma_0, \gamma_1, \dots, \gamma_{2n-1}$ such that
$$\gamma(z) = \sum_{i=0}^{2n-1} \gamma_i z^i = \alpha(z)\beta(z) \bmod z^{2n} - 1,$$
and output $C_i = \gamma_i$, $i = 0, \dots, n-1$.
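A compact numpy sketch of Alg. 2.1 (our own transcription; the FFT sign convention is immaterial here, since any consistent DFT/IDFT pair realizes the circular convolution):

```python
import numpy as np

def toeplitz_matvec(A, B):
    """Alg. 2.1: product of the n x n block Toeplitz matrix
    (A_{i-j+n-1})_{i,j} (A = [A_0, ..., A_{2n-2}]) with the block vector
    B = [B_0, ..., B_{n-1}], via the circulant embedding of order 2n."""
    n = len(B)
    m = A[0].shape[0]
    # coefficients of alpha(z): A_{n-1},...,A_{2n-2}, 0, A_0,...,A_{n-2}
    alpha = np.concatenate([np.array(A[n - 1:]),
                            np.zeros((1, m, m)),
                            np.array(A[: n - 1])])
    beta = np.zeros((2 * n,) + B[0].shape)
    beta[:n] = np.array(B)
    # steps 1-3: evaluate both polynomials at the 2n-th roots of unity and
    # multiply pointwise; step 4: interpolate gamma(z) and keep C_0..C_{n-1}
    gamma = np.fft.ifft(np.fft.fft(alpha, axis=0) @ np.fft.fft(beta, axis=0),
                        axis=0).real
    return list(gamma[:n])
```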
2.2.2 Inversion of block triangular block Toeplitz matrices
We can also devise efficient algorithms for the inversion of block triangular block Toeplitz matrices. For this purpose, observe that, given three matrix power series $A(z) = \sum_{i=0}^{+\infty} A_i z^i$, $B(z) = \sum_{i=0}^{+\infty} B_i z^i$, and $C(z) = \sum_{i=0}^{+\infty} C_i z^i$, the relation $C(z) = A(z)B(z)$ can be equivalently rewritten in matrix form, in a way similar to (2.8), as
$$\begin{bmatrix} C_0 \\ C_1 \\ \vdots \end{bmatrix} = \begin{bmatrix} A_0 & & 0 \\ A_1 & A_0 & \\ \vdots & \ddots & \ddots \end{bmatrix} \begin{bmatrix} B_0 \\ B_1 \\ \vdots \end{bmatrix}.$$
Analogously, the same functional relation modulo $z^n$, i.e., $C(z) = A(z)B(z) \bmod z^n$, can be rewritten as the finite lower block triangular block Toeplitz system
$$\begin{bmatrix} C_0 \\ \vdots \\ C_{n-1} \end{bmatrix} = \begin{bmatrix} A_0 & & 0 \\ \vdots & \ddots & \\ A_{n-1} & \cdots & A_0 \end{bmatrix} \begin{bmatrix} B_0 \\ \vdots \\ B_{n-1} \end{bmatrix}. \tag{2.12}$$
In this way, the product of two matrix power series modulo $z^n$ can be viewed as the product of an $n \times n$ lower block triangular block Toeplitz matrix and a block vector. Now, since the inversion of a block triangular block Toeplitz matrix can be viewed as the inversion of a matrix power series $A(z)$ modulo $z^n$, that is, as the computation of the coefficients of the matrix polynomial $B(z)$ such that
$$A(z)B(z) = I \bmod z^n, \tag{2.13}$$
we see that the inverse of a block triangular block Toeplitz matrix is still a block triangular block Toeplitz matrix itself.
We now describe an algorithm for the inversion of a block triangular block Toeplitz matrix that was derived for scalar entries in [64]. In the matrix polynomial framework, this algorithm extends to matrix polynomials the so-called Sieveking-Kung algorithm, which is based on Newton's iteration for inverting a polynomial [26, 63]. Let us denote by $T_n$ the coefficient matrix in (2.12), and assume for simplicity that $n = 2^q$, with $q$ a positive integer. We further partition $T_n$ as follows:
$$T_n = \begin{bmatrix} T_{n/2} & 0 \\ H_{n/2} & T_{n/2} \end{bmatrix},$$
where all four blocks are $(n/2) \times (n/2)$ block Toeplitz matrices. Moreover, let $B_0, \dots, B_{n-1}$ denote the block components of the first block column of $T_n^{-1}$, i.e., the solution of the linear system of equations
$$T_n \begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_{n-1} \end{bmatrix} = \begin{bmatrix} I \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{2.14}$$
Now note that, in view of the block lower triangular structure of $T_n$, it holds that
$$T_n^{-1} = \begin{bmatrix} T_{n/2}^{-1} & 0 \\ -T_{n/2}^{-1} H_{n/2} T_{n/2}^{-1} & T_{n/2}^{-1} \end{bmatrix},$$
which shows that the first block column that solves (2.14) can be computed from the first block column of $T_{n/2}^{-1}$ by means of two multiplications between an $(n/2) \times (n/2)$ block Toeplitz matrix and a block vector. More specifically, we have
$$\begin{bmatrix} B_{n/2} \\ B_{n/2+1} \\ \vdots \\ B_{n-1} \end{bmatrix} = -T_{n/2}^{-1} H_{n/2} \begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_{n/2-1} \end{bmatrix}. \tag{2.15}$$
This observation leads to the following algorithm, whose cost is $O(m^3 n + m^2 n \log n)$ operations.
Algorithm 2.2 (Inversion of a block triangular block Toeplitz matrix)
(This algorithm also solves the congruence relation $A(z)B(z) = I \bmod z^n$.)

Input. The $m \times m$ matrices $A_0, A_1, \dots, A_{n-1}$, $n = 2^q$, $\det A_0 \ne 0$, defining the first block column of the block triangular block Toeplitz matrix $T_n$ (equivalently, defining the matrix polynomial $A(z) = \sum_{i=0}^{n-1} A_i z^i$).

Output. The matrices $B_0, B_1, \dots, B_{n-1}$ satisfying (2.14) or, equivalently, such that the polynomial $B(z) = \sum_{i=0}^{n-1} B_i z^i$ solves the congruence $A(z)B(z) = I \bmod z^n$.
Computation.
1. Compute $B_0 = A_0^{-1}$.

2. For $i = 0, \dots, q-1$, given the first column $U^T = [B_0^T, \dots, B_{2^i-1}^T]$ of $T_{2^i}^{-1}$, compute the block vector $V^T = [B_{2^i}^T, \dots, B_{2^{i+1}-1}^T]$, which defines the remaining blocks of the first column of $T_{2^{i+1}}^{-1}$, by applying equation (2.15) with $n = 2^{i+1}$, where the products $W = H_{2^i} U$ and $V = -T_{2^i}^{-1} W$ are computed by means of Alg. 2.1.
Theorem 2.2 (Block triangular block Toeplitz inverse) The inverse of an n n block triangular block Toeplitz matrix, having blocks of size m, can be computed by means of O(m2n log n + m3 n) arithmetic operations.
We may remark that the solution of (2.13) can also be computed by formally applying the Newton-Raphson algorithm to the equation
A(z)?1 ? B (z) = 0 ; where the unknown is A(z), thus obtaining the functional iteration
(i+1) (z) = 2(i)(z) ? (i)(z)2 B (z) ; i = 0; 1; : : : ; (0) (z) = A?0 1 : It is easy to show that
(i)(z) = B (z) mod z2i : Hence, the algorithm obtained by rewriting the above formula as (i+1) (z) = 2(i) (z) ? (i) (z)2 B (z) mod z2i ; +1
and by implementing the latter equation by means of the evaluation/interpolation procedure at the roots of unity, is equivalent to Alg. 2.2 (compare with [24] for the scalar case).
2.3 Pointwise power series arithmetic In certain computations related to Markov chains we need to perform several operations (multiplications and matrix inversion) among in nite lower block triangular block Toeplitz matrices, or equivalently among matrix power series. More speci cally, we have to compute rational functions Y = F (W (1) ; : : : ; W (h)), in the matrix power series W (i) = W (i)(z), i = 1; : : : ; h.
CHAPTER 2. STRUCTURES AND COMPUTATIONAL TOOLS
28
Now since all these series are assumed convergent for jzj = 1, their block coecients will tend to zero. Hence, we may replace the power series by matrix polynomials and then apply the algorithms of Sect. 2.2. However, in order to truncate a power series W (i) (z) at a degree that results in a negligible remainder, we need to know its numerical degree.
De nition P 2.4 (-degree and numerical degree) For a matrix power series A(z) =
+1 i=0
Aizi convergent in the closed unit disk, we de ne its -degree as the minimum integer d such that eT P+i=1d+1 jAij < eT . When is the unit roundo of the used oating point arithmetic, then d is said to be the numerical degree of the matrix power series.
Throughout this chapter we assume that all the series are convergent in the closed unit disk. The evaluation of a function F by means of a coecient-wise arithmetic requires that we apply several FFT's for all the products and inversions involved. Even when the numerical degree of the output is rather small, the degrees of the intermediate power series might be large depending on the number of operands and on the computations performed. For this reason, the fast coecient-wise arithmetic and the fast Toeplitz matrix computations described in the previous section may be inadequate to achieve highest performance. An alternative way for computing the coecients of the matrix power series F is the use of point-wise arithmetic. In this approach, assuming knowledge of the numerical degree d of F , it is sucient to rst evaluate all the individual matrix power series W (i) (z) at the (d +1) roots of unity and then to evaluate d +1 times the function F at these values, thus obtaining Y (!dj+1 ), j = 0; 1; : : : ; d. In this way, by means of a single interpolation stage at the d + 1 roots of unity, it is possible to recover all the coecients of the matrix power series Y . This strategy allows us to reduce the number of FFT's to m2(h + 1), and it is particularly convenient in the case where the numerical degree of the output is smaller than the numerical degree of the input or of the intermediate matrix power series. In order to arrive at an ecient implementation of this technique, a criterion for the dynamical evaluation of the numerical degree is needed. The design and analysis of such a criterion is performed in Sect. 6.2, where the speci c properties of the function F and of the power series are used. For now, we simply assume that the following test function is available: TEST(Y; A) is true if the numerical degree of Y is less than or equal to the degree d of the matrix polynomial A(z) that interpolates Y at the (d + 1) roots of unity. A dynamic evaluation of the matrix power series Y may proceed according to the following scheme: Set d = 0.
2.3. POINTWISE POWER SERIES ARITHMETIC
29
Repeat Set d = 2d + 1, compute W (i) (z) at the (d + 1) roots of unity, apply the point-wise evaluation of F , and compute the coecients of the matrix polynomial A(z) of degree d that interpolates Y . Until TEST(Y; A) is true.
Set Y = A. It is worth observing that in order to compute the values of W (i) (z) at the (d+1) roots of unity, once the values of the same power series have been computed at the ( d?2 1 +1) roots of unity, it is not needed to apply afresh the FFT's of order d + 1. The same observation applies to the computation of the interpolation stage. Algorithms for performing these computations, where part of the output has been precomputed, are provided by the solutions of Problems 2 and 4 in Sect. 2.1. An interesting property of the pointwise power series arithmetic is that it is possible to estimate the error of the coecients of the polynomial that interpolates the series [21]:
Proposition 2.3 PLet s(z) = P+i=01 sizi be a series converging in the closed unit disk. Let r(z) = iN=0?1 riz i be the polynomial of degree N ? 1, N power of 2, interpolating the series s(z) at the N -th roots of 1. Then it holds that
ri ? si =
X
+1
j =1
si+jN ; i = 0; : : : ; N ? 1:
Proof. Let ! be a principal N -th root of 1. Then the coecient ri , for i = 0; : : : ; N ? 1, is given by + MX ?1 NX ?1 +X 1 X1 1 1 ( ? i + j ) k ? ik k ! sj = si+jN : ! s(! ) = N ri = N j =0 k=0 k=0 j =0
2
The above proposition can be easily extended to the case of matrix power series, thus obtaining the following [22]:
Theorem 2.4 (Matrix power series product) The rst N coecients of the product C (z) of an m m matrix power series A(z) and an m p matrix power
series B (z), can be approximated within the error bound by means of 2mp + m2 FFT's of length N in O(m2 pN +(mp + m2 )N log N ) arithmetic operations, where N is the -degree of C (z). Proof. The product is computed by means of the evaluation/interpolation technique at the roots of 1. In order to recover the rst N coecients of C (z)
30
CHAPTER 2. STRUCTURES AND COMPUTATIONAL TOOLS
within the error , it is sucient to interpolate C (z) at the N -th roots of 1 (compare with Prop. 2.3). The evaluation of C (z) is performed by rst evaluating A(z) and B (z) at the roots of 1 by means of m2 + mp FFT's of length N for the arithmetic cost O((m2 + mp)N log N ), and then computing N matrix products of size m m and m p for the cost O(Nm2 p) arithmetic operations. The interpolation of C (z) at the roots of 1 is performed by means of mp FFT's of length N for the cost O(mpN log N ). 2 A similar result can be proved for the inversion of matrix power series.
Theorem 2.5 (Matrix power series inversion) Let A(z) be an m m matrix power series such that det A(0) = 6 0, A(z)?1 exists and has -degree N . Then
the rst N coecients of the matrix power series A(z)?1 can be approximated within the error in O(m3 N + m2 N log N ) operations by means of the evaluation/interpolation technique at the roots of 1.
Observe that, by combining Thms. 2.4 and 2.1 we may easily arrive at the following:
Theorem 2.6 (Block Toeplitz and vector power series product) Let A(z) be an mn mn matrix power series having the n n block Toeplitz structure with m m blocks. Let B (z) be an mn m matrix power series. Then the rst N matrix coecients of the product C (z) = A(z)B (z) can be approximated within the error in O(m2nN log(Nn) + m3 nN ) arithmetic operations, where N is the -degree of C (z).
2.4 Displacement structure The concept of displacement rank, introduced in [59, 60, 62], is fundamental in devising and analyzing algorithms related to Toeplitz matrices [59, 26]. For our applications in Markov chains we shall adopt a de nition of displacement structure that seems to be particularly suitable for dealing with block Toeplitz matrices in block Hessenberg form. We de ne block down-shift matrix the n n block matrix
20 3 66 I 0 77 6 7; Zn;m = 6 . . . . . . 75 4 I 0 where the blocks have dimension m. Consider the block displacement operator
n;m(H ) = Zn;mH ? HZn;m
(2.16)
2.4. DISPLACEMENT STRUCTURE
31
de ned for any n n block matrix H . We also denote by Ln (W ) the n n lower block triangular block Toeplitz matrix de ned by its rst block column W . For the sake of notational simplicity, if the dimensions are clear from the context, we use the notation (H ) instead of n;m(H ), Z instead of Zn;m, and L(W ) instead of Ln(W ). We also introduce the following notations: the block vectors En(1) = [I; 0; : : : ; 0]T ; En(2) = [0; : : : ; 0; I ]T denote mn m matrices made up by the rst and the last m columns of the identity matrix of order mn, respectively. Observe that for a general block Toeplitz matrix A the displacement (A) is zero except for the entries in the rst block row and in the last block column, i.e., it has the structure 2 ::: 0 3 66 0 : : : 0 77 (A) = 66 .. ... ... 775 : 4. 0 ::: 0 Hence, the rank of (A) is at most 2m. De nition 2.5 (Block displacement rank) We shall say that the block matrix H has block displacement rank r if r is the minimum integer such that there exist block column vectors U (i) , and block row vectors V (i) , i = 1; : : : ; r, satisfying the equation r X (H ) = U (i) V (i) : i=1
For an n n block Toeplitz matrix A we obtain (A) = ?En(1) En(1)T AZ + ZAEn(2) En(2)T ; (2.17) hence a block Toeplitz matrix has block displacement rank at most 2. De nition 2.6 (Block Toeplitz-like matrix) We call block Toeplitz-like matrix an n n block matrix having block displacement rank r which is independent of the block dimension n. An interesting property of (H ) is that the block displacement of the inverse matrix can be explicitly related to the block displacement of the matrix itself, viz., if H is nonsingular, then (H ?1) = ?H ?1(H )H ?1: (2.18) From this property it follows in particular that the inverse of a block Toeplitz, or Toeplitz-like, matrix is a block Toeplitz-like matrix. The following result can be easily proved by extending to block matrices the same proof given in [10], [26] for the scalar case.
CHAPTER 2. STRUCTURES AND COMPUTATIONAL TOOLS
32
Theorem 2.7 (Displacement representation) Let Kn be an n n block maP
trix such that (Kn ) = ri=1 U (i) V (i) , where U (i) and V (i) are n-dimensional block column and block row vectors, respectively. Then we have
Kn = L(Kn En(1) ) ?
r X i=1
L(U (i) )LT (ZV (i)T ):
The above result allows us to represent any matrix Kn as a sum of products of lower and upper block triangular block Toeplitz matrices de ned by the rst block column of Kn and by the block vectors U (i) , V (i) associated with the block displacement of Kn. If the matrix Kn is nonsingular then the above representation theorem can be applied to Kn?1 in the light of (2.18), thus leading to
Kn?1 = L(Kn?1 En(1) ) +
r X i=1
L(Kn?1U (i) )LT (ZKn?T V (i)T ):
(2.19)
For the n n block Toeplitz matrix Hn in lower block Hessenberg form (1.13) we can easily verify that
2 A 0 ::: 0 66 0 . . . ... (Hn) = 666 0.. .0. 4 . . 0 0 0 : : : 0 ?A0
3 77 77 ; 75
and, hence,
(Hn) = En(1) A0En(1)T ? En(2) A0En(2)T : (2.20) This formula is the basis of the algorithm developed in Chap. 5. Equation (2.19), together with (2.17) and (2.20), allow us to write useful inversion formulae for a generic block Toeplitz matrix A and for the block Hessenberg block Toeplitz matrix Hn of (1.13). We have in fact,
A?1 = L(A?1 En(1) )(I ? LT (ZA?T Z T AT En(1) )) + L(A?1ZAEn(2) )LT (ZA?T En(2) ) (2.21) and Hn?1 = L(Cn(1) )LT (En(1) + ZRn(1)T AT0 ) ? L(Cn(2) )LT (ZRn(2)T AT0 ) ;
(2.22)
where Cn(1) , Cn(2) denote the rst and the last block columns, respectively, of Hn?1, while Rn(1) , Rn(2) denote the rst and the last block rows, respectively, of the matrix Hn?1. From the above results it follows that a block Toeplitz-like matrix, and its inverse, can be expressed by means of a sum of a few products between block
2.4. DISPLACEMENT STRUCTURE
33
lower triangular and block upper triangular block Toeplitz matrices. This fact allows us to perform block Toeplitz-like matrix computations in terms of a few multiplications between matrix polynomials, in the light of the results of Sect. 2.2, with a substantial reduction of the computational cost. There exists a wide literature concerning the analysis of representation formulas like (2.21) for (block) Toeplitz and (block) Toeplitz-like matrices, and for their inverses. We refer to [49, 47, 7, 8, 50, 51, 72] for more details and for the analysis of related problems. The known algorithms for the inversion of block Toeplitz matrices can be extended to the inversion of block Toeplitz-like matrices. In particular the \fast" and the \superfast" algorithms for Toeplitz inversion [26] turn into algorithms for block Toeplitz-like matrix inversion having a cost O(m3n2 ) and O(m3n log2 n) operations, respectively. The conjugate gradient method [30] can also be applied to solve the few systems that provide the block vectors de ning the inverse matrix. This algorithm has a cost of O(m3n log n) operations per step, and requires at most k iterations. In many cases, with a suitable preconditioning technique, the number of iterations can be reduced to O(1) [86]. The above remarks and the results of Sect. 2.3 allow us to state the following theorem that constitutes a useful tool for implementing the pointwise power series interpolation:
Theorem 2.8 (Block Toeplitz-like power series computation) Let F (X1,
: : :, Xh) be a rational function in the variables X1; : : : ; Xr . Assume that the n n block matrix power series W (i) (z), i = 1; : : : ; h, with blocks of size m, are such that Y = F (W (1) ; : : : ; W (h)) exists and has -degree n, W (i) (z) as well as Y = F (W (1); : : : ; W (h) ) are block Toeplitz-like. Then the rst N coecients of F (W (1) ; : : : ; W (h) ) can be approximated in O(m2 nN log(nN ) + Nm3 n log2 n) operations, within the error , where we assume h constant with respect to m and n.
34
CHAPTER 2. STRUCTURES AND COMPUTATIONAL TOOLS
Chapter 3 A fast version of Ramaswami's formula By using the natural isomorphism between matrix polynomials and block Toeplitz matrices, pointed out in the previous chapter, we provide a novel interpretation of the Ramaswami formula (1.9) in terms of Toeplitz computations. This fact allows us to derive a new fast algorithm, based on the use of FFT's, for its computation [75]. This new algorithm is particularly ecient when many block components of the vector need to be computed and in the case where the blocks Ai are negligible only for large values of i. In Sect. 3.1 we derive the Ramaswami formula by means of a block UL factorization of the matrix H of (1.10). In Sect. 3.2 we describe the new fast algorithm for the computation of the vector . Finally, in Sect. 3.3 we present numerical results and comparisons between customary and fast Ramaswami's formula.
3.1 Ramaswami's formula and block UL factorization Here we show that the Ramaswami formula can be viewed as the forward substitution stage in the solution of a block lower triangular system. Indeed, by means of the natural isomorphism between formal power series and triangular Toeplitz matrices, we nd a block UL factorization of the matrix H of (1.10). Then we derive the formula of Ramaswami by solving the system (I ? P T ) = 0 by means of that block UL factorization. Consider the block Toeplitz block lower triangular matrix
2 ?A0 03 77 66 I ? A1 ?A0 75 ; 64 ?A2 I ? A1 ?A0 ...
...
35
...
...
36
CHAPTER 3. FAST RAMASWAMI'S FORMULA
obtained by discarding the rst block column of the matrix I ? P T , where P is given by (1.3). We associate with it the formal matrix power series
H (z) = zI ?
X
+1
i=0
Aizi :
The following result holds [75]: Proposition 3.1 Let G be a solution of the matrix equation (1.7) and let Ai , for i 1, be de ned in (1.8). Then
+X1 H (z) = zI ? G I ? Ai+1zi : i=0
(3.1)
Proof. Immediate, by comparing the coecients of z i in both sides of (3.1), since Ai = Ai ? GAi+1, for i 1. 2 Due to the correlation between block triangular block Toeplitz matrices and matrix power series discussed in Sect. 2.2, the functional decomposition given in Prop. 3.1 can be rephrased in terms of the factorization of block lower triangular block Toeplitz matrices [75]:
Theorem 3.2 (Factorization of the matrix H ) Under the hypothesis of Prop. 3.1, the matrix H of (1.10) can be factorized as the product H = UL, where 2 I ? A1 2 I ?G 0 3 0 3 77 77 66 ?A2 I ? A1 6 I ?G 75 : 7 L = 6 U = 664 I ?G 5 4 ?A3 ?A2 I ? A1 .. ... ... ... ... ... 0 .
Proof. The proof readily follows by taking the matrix equation corresponding to decomposition (3.1) and by discarding the rst block row. 2 Ramaswami's formula can now be derived by solving the system (I ? P T ) = 0. Indeed, this can be equivalently rewritten as
2 1 3 2 B1 3 6 7 6B 7 H 664 23 775 = 664 B23 775 0 ; ...
...
that is, as
2 1 3 2 B1 3 6 7 6B 7 UL 664 23 775 = 664 B23 775 0: ...
...
(3.2)
3.2. FAST COMPUTATION OF RAMASWAMI'S FORMULA
37
Let G be the minimal nonnegative solution of the matrix equation (1.7). Then there exists limj Gj (compare with [80]). Hence the matrix U is invertible and has the structure 2 I G G2 : : : 3 66 I G . . . 77 ? 1 7 U = 66 I . . . 75 : 4 ... 0 By multiplying on the left equation (3.2) by the inverse of U , we obtain that 2 1 3 2 B1 3 6 7 6 B 7 (3.3) L 664 23 775 = 664 B23 775 0 ; ... ... where that matrices Bi are de ned in (1.8). Hence, by solving the block lower triangular system (3.3) by forward substitution, we arrive at (1.9).
3.2 Fast computation of Ramaswami's formula We have observed that Ramaswami's formula consists in solving the block triangular, block Toeplitz system (3.3) by forward substitution. This method does not exploit the Toeplitz structure of the system, that can be used by solving the system by means of FFT-based techniques. Let N be an upper bound to the numerical degree of the matrix power seP P P P + + 1 + 1 + 1 T i ? 1 i ries i=0 Aiz and i=1 Biz . Then the matrices i=0 Ai and i=11 BiT are numerically stochastic, and system (3.3) reduces to a block banded system with bandwidth N .
De nition 3.1 (-stochastic and numerically stochastic matrix) A nonnegative matrix A is said -stochastic if (I ? A)e < e. If is the unit roundo of the used oating point arithmetic, A is said numerically stochastic.
Solving the banded system (3.3) by forward substitution leads to the Ramaswami algorithm [84]:
Algorithm 3.1 (CRF { Customary Ramaswami's Formula) Input. Integers p and N , the matrix G nonnegative P solution P of (1.7) and the
matrices A0 , Ai , Bi , i = 1; : : : ; N , such that ically stochastic.
N AT , i=0 i
N BT i=1 i
are numer-
Output. An approximation of the components 0 ; 1 ; : : : ; p of the P probability
invariant vector associated with (1.3) and the value (p) =
p eT . i i=0
CHAPTER 3. FAST RAMASWAMI'S FORMULA
38 Computation.
1. Computation of matrices Ai , i = 0; : : : ; N , Bi, i = 1; : : : ; N : (a) Set AN = AN , BN = BN . (b) For i = N ? 1; : : : ; 1 compute Ai = Ai + GAi?1 and Bi = Bi + GBi?1. 2. Computation of 0 : (a) Solve the system B1 0 = 0 , kT 0 = 1, where
kT
= eT
+ eT
N ! N !?1 X X Bi : I ? Ai i=2
i=1
(b) Set (p) = eT 0 . 3. Computation of 1 ; : : : ; p: (a) Compute 1 = (I ? A1 )?1 B2 0 ; (b) For i = 2; : : : ; p, compute i = (I ? A1 )?1 Bi+1 0 + Pji?=1j Ai+1?j j (p) = (p) + eT i where j0 = min(1; i + 1 ? N ) and Bi = 0 for i > N . The computational cost of step 1 and 2 is about 4Nm3 ops; the computational cost of step 3, if p > N , is about m3 + m2 N 2 + 2m2N (p ? N ) ops. Therefore, the asymptotic computational cost to calculate p components of , where p > N , by means of Alg. 3.1 is 2pm2 N . That computational cost can be substantially reduced by solving system (3.3) by the FFT-based algorithms for Toeplitz computation described in the previous chapter. Suppose, without loss of generality, that N = 2M (if N is not a power of 2, we truncate the sequences Ai , Bi at the least power of 2 greater than N ). Let us partition the matrix L of (3.3) into blocks of dimension mN , where each block is itself partitioned into N sub-blocks of dimension m. So, the matrix L can be rewritten as 3 2 0
I ?R 0 77 66 ?S I ? R L = 64 ?S I ? R 75 ;
where
...
0
2 A 66 A12 A1 6 R = 66 A.3 A2 64 .. . . . AN AN ?1
0
A1 ... :::
... A2 A1
...
2 0 A : : : A 3 66 0N AN : :3: 77 6 77 ... ... 77 ; S = 666 4 5 0 0
(3.4)
A2 3 A3 777 ... 7 77 AN 5 0
3.2. FAST COMPUTATION OF RAMASWAMI'S FORMULA
39
are block triangular block Toeplitz matrices. Let us partition the vector according to the structure (3.4) of L, i.e.,
h i T = ~ T0 ; ~ T1 ; ~ T2 ; : : : ;
where
~ 0 = h 0 ; i ~ Ti = TN (i?1)+1 ; TN (i?1)+2 ; : : : ; TN (i?1)+N ; i 1:
With these notations the system (3.3) can be solved by means of the equations
~ 1 = (I ? R)?1b ~ i = (I ? R)?1S ~ i?1; i 2;
(3.5)
2 B1 3 6 7 b = 664 B...2 775 0: BN
(3.6)
where
The matrices involved in (3.5) are block triangular block Toeplitz matrices. The computations can be performed by the FFT-based methods of Sect. 2.2, leading to the following fast algorithm for computing [75]:
Algorithm 3.2 (FRF { Fast Ramaswami's Formula) Input. Integers n and N , the matrix G nonnegative Psolution of (1.7) and P N N T the matrices A0 , Ai , Bi, i = 1; : : : ; N , such that i=0 Ai , i=1 BiT are
numerically stochastic.
Output. An approximation of the components 0 ; 1 ; : : : ; p of the probability
invariant vector , associated with (1.3), where p = nN , and the value P p (p) = i=0 eT i .
Computation.
1. Computation of matrices Ai , i = 0; : : : ; N , Bi, i = 1; : : : ; N : (a) Set AN = AN , BN = BN . (b) For i = N ? 1; : : : ; 1 compute Ai = Ai + GAi?1 , Bi = Bi + GBi?1 . 2. Computation of 0 : (a) Solve the system B1 0 = 0 , kT 0 = 1, where
kT
= eT
+ eT
N !?1 X N ! X I ? Ai Bi : i=1
i=2
40
CHAPTER 3. FAST RAMASWAMI'S FORMULA
(b) Set (p) = eT 0 . 3. Computation of 1 ; : : : ; p: (a) Compute (I ? R)?1 by Alg. 2.2. (b) Compute, by Alg. 2.1, ~ 1 = (I ? R)?1 b, where b is given in (3.6), and compute (p) = (p) + eT ~ 1 . (c) For i = 2; : : : ; n, compute, by Alg. 2.1, s = S ~ i?1; ~ i = (I ? R)?1 s; and (p) = (p) + eT ~ i. Steps 1 and 2 in Alg. 3.2 are the same as steps 1 and 2 in Alg. 3.1, and have the computational cost of 4Nm3 ops. At step 3, the computational cost of calculating stage (a) and (b) is about 6Nm3 + 35=2m2N log N ; at stage (c), once the DFT of the matrix S is computed (with less than 10m2N log N ops.), the computational cost of the generic block ~ i is at most 7m2 N + 20mN log N + 20mN 7m2 N + 40mN log N ops. Whence the overall cost to compute p = nN components is at most 10Nm3 + 39N log N + n(7m2 N + 40mN log N ), i.e., the computational cost grows linearly in p at most as p(7m2 + 40m log N ). Therefore Alg. 3.2 is asymptotically (as p tends to in nity) 2m2N=(7m2 + 40 log N ) times faster than Alg. 3.1. Moreover Alg. 3.2 is particularly faster in cases where N is large, that is, in cases where the blocks Ai and Bi are negligible only for large values of i. Concerning numerical stability, all computations in Alg. 3.1 consist in performing additions of nonnegative numbers, multiplications and divisions, so that Alg. 3.1 is strongly numerically stable, i.e., the relative errors generated by the
oating point arithmetic can be bounded entry-wise. Alg. 3.2, due to the use of FFT's, is weakly numerically stable, i.e., it is only possible to give very good relative error bounds in terms of the norm (see Sect. 2.1). However, in spite of the weak stability of FFT, in numerous experiments we have performed, we detected no dierence between the results delivered by the two algorithms.
3.3 Numerical results We implemented Algs. 3.1 and 3.2 in Fortran 77; the programs were run on an Alpha Workstation using standard double precision IEEE arithmetic. We tested the algorithms on a BMAP/G/1 queue [74] and on a problem arising from modeling a Metaring MAC Protocol [4].
3.3.1 8-state BMAP/G/1 system
We consider an 8-state BMAP/G/1 queue with constant service times. The blocks Ai and Bi have dimension 8 and have the structure described in [74], that
3.3. NUMERICAL RESULTS is,
ATi =
X
+1
j =0
41
j Xi e? j ! Ki(j) ; BiT = ?D0?1 Dj+1ATi?j ; j =0
where = maxhf?(D0 )hhg and the fKi g are recursively de ned by K0(0) = I , Ki(0) = O, i 1, and K0(j+1) = K0(j) (I + ?1D0); (j )
Ki(j+1) = ?1 Pih?=01 Kh(j) Di?h + Ki(j)(I + ?1 D0); i 1: In our particular example, ijh ? jh (Di)jh = e i! ; i 1; ? jh (D0 )jh = e P ; j 6= h; P P (D0 )hh = ? j6=h(D0)jh ? +i=11 8j=1(Di )jh; for suitable nonnegative parameters jh. P The integer N for which the matrices Ni=0 ATi , PNi=1 BiT are numerically stochastic is 64. Table 3.1 reports, for dierent numbers p of computed block components, the corresponding value (p) = Ppi=0 eT i, the CPU times (in seconds) needed by Alg. 3.1 and Alg. 3.2, respectively, and their ratios. Fig. 3.1 p 26 27 28 29 210 211 212 213 214
(p) CRF (sec:) FRF (sec:) CRF=FRF 0:077379 0:01 0:07 0:1 0:148238 0:03 0:08 0:4 0:274049 0:05 0:09 0:6 0:472667 0:11 0:12 0:9 0:721746 0:22 0:17 1:3 0:922526 0:41 0:27 1:5 0:993994 0:84 0:59 1:4 0:999963 1:77 1:03 1:7 0:999999 3:73 1:74 2:1 Table 3.1: BMAP/G/1 Queue
shows the CPU time needed by Alg. 3.2 and Alg. 3.1 to compute the same number of block components of the vector .
3.3.2 Metaring MAC Protocol
We consider a problem arising from modeling a metropolitan queueing problem (Metaring MAC Protocol [4]). The dimension m of the blocks Ai and Bi is 80, and
CHAPTER 3. FAST RAMASWAMI'S FORMULA
42 4
CRF FRF 3.5
CPU Time (sec.)
3
2.5
2
1.5
1
0.5
0 2^6
2^7
2^8
2^9
2^10 2^11 Block Components
2^12
2^13
2^14
Figure 3.1: BMAP/G/1 Queue the integer N such that the matrices PNi=0 ATi , PNi=1 BiT are numerically stochastic is 256. For a description of the model and the structure of the blocks we refer to [4]. This particular example is characterized by large block bandwidth of the matrix L in system (3.3), by a large dimension of the blocks Ai and Bi, and by a long queue, i.e., the blocks i tend to zero slowly. Table 3.2 reports, for dierent P p numbers p of block components, the corresponding value (p) = i=0 eT i , the CPU times (in seconds) needed by Alg. 3.1 and Alg. 3.2, respectively, and their ratios. Fig. 3.2 shows the CPU time needed by Alg. 3.2 and Alg. 3.1 to compute
p 29 210 211 212 213 214 215
(p) CRF (sec:) FRF (sec:) CRF=FRF 0:285677 181 374 0:5 0:457895 385 399 1:0 0:687780 797 424 1:9 0:896434 1681 454 3:7 0:988604 3236 487 6:6 0:999862 6632 652 10:2 0:999999 13322 829 16:1 Table 3.2: Metaring MAC Protocol
the same number of block components of the vector . Observe that, in both the examples, the customary Ramaswami formula is faster than Fast Ramaswami formula in the case where only few block components
3.3. NUMERICAL RESULTS
43
14000 CRF FRF 12000
CPU Time (sec.)
10000
8000
6000
4000
2000
0 2^9
2^10
2^11
2^12 Block Components
2^13
2^14
2^15
Figure 3.2: Metaring MAC Protocol of the vector need to be computed. Fast Ramaswami formula really shows its eectiveness when many block components must be computed. When the block dimension m is large and the number N of nonnegligible blocks Ai and Bi is large, the advantages of the FRF method are more transparent.
44
CHAPTER 3. FAST RAMASWAMI'S FORMULA
Chapter 4 Functional iteration methods The rst class of methods introduced in the literature for the computation of the nonnegative matrix G solving the matrix equation (1.7) is the one based on functional iterations, de ned by the recursion
Xn+1 = F (Xn); n 0; (4.1) where F () is a matrix function like F (X ) = P+i=01 X iAi, and X0 = 0, or X0 = I . There exist a wide literature and a wide numerical experimentation regarding this class of methods (see for example [83, 80, 56, 67, 66]), which were, till few years ago, the most commonly used algorithms for the computation of the matrix G. The functional iteration methods taken into consideration were de ned by one of the following three matrix functions: + X1 i (4.2) F (X ) = X Ai ; i=0 + X1 i ! (4.3) F (X ) = A0 + X Ai (I ? A1)?1 ; i=2 + X1 i?1 !?1 (4.4) F (X ) = A0 I ? X Ai : i=1
The sequence fXng of (4.1), with F () de ned by (4.2), (4.3) or (4.4), converges linearly to the matrix G. However a detailed analysis of the speed of convergence, in relation with the particular function F () or with the starting matrix X0 , has not been provided. In [66] Latouche shows that, in the case where X0 = 0, the sequence (4.1), where F () is (4.2), or (4.3), or (4.4), converges monotonically to the matrix G; moreover, he proves that the method based on (4.4) is faster than that based on (4.3), and the method based on (4.3) is faster than that based on (4.2), but the rate of convergence of the three sequences is not estimated. In [66] the author also shows that, by choosing as starting matrix X0 a general column stochastic matrix, the sequence (4.1), where F () is (4.2), or (4.3), or 45
46
CHAPTER 4. FUNCTIONAL ITERATION METHODS
(4.4), is a sequence of stochastic matrices that still converges to the matrix G. Moreover, Latouche observes [68] that, in many numerical experimentations, the sequence obtained in this way converges much faster than the sequence obtained by starting with X0 = 0. In this chapter we introduce a class of functional iterations (4.1) containing the ones described above; then we associate, with each function F de ning a functional iteration of this class, a suitable nonnegative m m matrix RF . In the case where the starting matrix of (4.1) is X0 = 0, we show that the sequence (4.1) converges monotonically to the matrix G and that the spectral radius F of the matrix RF coincides with the mean asymptotic rate of convergence of the sequence fXngn. Moreover we provide an optimality result by showing that the method based on (4.4) is the fastest method in the class of functional iterations that we introduced [76]. In the case where the starting matrix X0 of (4.1) is any column stochastic matrix, we associate with any function F () of the above class a nonnegative m2 m2 matrix Rb F . The increasing of the speed of convergence of the sequence fXng, with respect to the case X0 = 0 observed in [68], can be formally explained by means of the spectral properties of the matrices Rb F . In fact, we rst observe that the spectral radius of the matrix Rb F coincides with the spectral radius of the matrix RF . Then we show that, if Rb F is irreducible, the mean asymptotic rate of convergence is bounded from above by the second largest modulus eigenvalue of Rb F [76]. Finally, due to the correlation between the rate of convergence and the spectral properties of the matrices Rb F , we propose two strategies for improving the rate of convergence of such iterative methods. The rst strategy consists in choosing an initial approximation X0 which shares with G some eigenvalues and the corresponding left eigenvectors; the second one relies on a relaxation technique which modi es the spectral properties of the matrix Rb F [39, 40, 38]. Numerical results show the eectiveness of these strategies. In Sect. 4.1 we analyze functional iteration methods (4.1) de ned by (4.2), (4.3) and (4.4) with X0 = 0. In Sect. 4.2 we introduce the general class of functional iteration methods and we estimate the rate of convergence of the sequences generated in this class, in the case where X0 = 0. In Sect. 4.3 we analyze functional iteration methods having, as starting point X0 , a column stochastic matrix. In Sect. 4.4 we propose a strategy for the choice of the starting approximation X0 that allows to improve the rate of convergence. In Sect. 4.5 we consider a relaxation technique.
4.1. STANDARD METHODS
47
4.1 Convergence properties of standard functional iteration methods In this section we analyze functional iteration methods de ned by (4.1), with X0 = 0 and F (X ) given by (4.2), (4.3), (4.4). A convergence analysis of such iterative methods have already been performed by G. Latouche in [66]. More precisely, by following the notations introduced in [66], let us denote by fXn(N ) g the sequence de ned by (4.2) (Natural algorithm), by fXn(T ) g the sequence de ned by (4.3) (Traditional algorithm) andPby fXn(U ) g the sequence de ned by (4.4) (algorithm based on the matrix U = +i=11 Gi?1Ai ), with X0(N ) = X0(T ) = X0(U ) = 0. Then the following result holds [66]:
Theorem 4.1 (Monotonic convergence) The sequences fXn(N )gn, fXn(T )gn, fXn(U )gn converge monotonically to the matrix G. Moreover, for any n 0, it holds
Xn(N ) Xn(T ) Xn(U ):
Thm. 4.1 allows us to deduce [66] that the number of iterations IN , IT , IU sucient to obtain the same approximation of the matrix G by means of the sequences fXn(N )gn, fXn(T )gn, fXn(U )gn, respectively, are such that
IU I T I N : Therefore the method based on (4.4) is faster than that based on (4.3), and the method based on (4.3) is faster than that based on (4.2). We give a more precise result, by estimating also the rate of convergence of the three sequences. Let us rst prove the following:
Lemma 4.2 Let X , Y be m m matrices. Then, for any integer i 1, it holds
that
Xi ? Y i
=
Xi j =1
X j?1(X ? Y )Y i?j :
Proof. The equality can be simply proved by induction on i. 2 De ne, for each integer n, the matrices En(N ) = G ? Xn(N ) , En(T ) = G ? Xn(T ) and En(U ) = G ? Xn(U ), which represent the error at step n for the sequences (4.1) de ned by (4.2), (4.3), (4.4), respectively, with X0 = 0. Observe that, for the monotonicity stated by Thm. 4.1, the errors at any step n are such that En(N ) 0, En(T ) 0, En(U ) 0. Moreover it is possible to express them recursively, as shown by the following [76]:
CHAPTER 4. FUNCTIONAL ITERATION METHODS
48
Proposition 4.3 For every integer n 0 it holds that eT En(N+1) = eT En(N ) Rn(N ) ; eT En(T+1) = eT En(T ) Rn(T ) ; eT En(U+1) = eT En(U ) Rn(U ) where, for n 0,
Rn(N ) = P+i=11 Pij=1 Xn(N ) i?j Ai Rn(T ) = P+i=11 Pij=1 Xn(T )i?j Ai ? A1 (I ? A1 )?1 ?1 Rn(U ) = P+i=21 Pij=2 Xn(U ) i?j Ai I ? P+i=11 Xn(U ) i?1Ai :
(4.5)
Proof. Let us rst analyze the functional iteration method based on (4.2). From Lem. 4.2 and from (1.7) and (4.1) it follows that
En(N+1) = G ? Xn(N+1) =
+ X1 Xi j?1 (N ) (N ) i?j X i (N ) i G En Xn Ai : G ? Xn Ai =
+1
i=1 j =1
i=1
Whence, since G is column stochastic, we have
eT En(N+1) = eT
XX
+1 i
i=1 j =1
En(N ) Xn(N ) i?j Ai = eT En(N ) Rn(N ) :
Analogously, for the functional iteration method based on (4.3), we obtain
En(T+1) = G ? Xn(T+1) = P+i=21 Gi ? Xn(T ) i Ai(I ? A1)?1 = P+1 Pi Gj?1E (T ) X (T ) i?j A (I ? A )?1: i=2
j =1
n
n
i
1
Hence, from the stochasticity of G, we obtain T (T ) n+1
eE
1 0+1 i X X i ? j Xn(T ) AiA (I ? A1 )?1 = eT En(T ) Rn(T ) : = eT En(T ) @ i=2 j =1
Let us now analyze the functional iteration method based on (4.4). From (4.1) and (1.7) we have
En(U+1) = P ?1 P+1 (U ) i?1 ?1 (U ) + 1 i ? 1 G ? Xn+1 = A0 I ? i=1 G Ai ? I ? i=1 Xn Ai = P ? 1 G I ? I ? +i=11 Gi?1Ai I ? P+i=11 Xn(U )i?1 Ai :
4.1. STANDARD METHODS
49
Whence, from Lem. 4.2, it follows that
En(U+1) = h G I ? I ? P+i=11 Xn(U )i?1 Ai ? P+i=21 Pij?=11 Gj?1En(U ) Xn(U ) i?1?j Ai I ? P+i=11 Xn(U ) i?1Ai ?1 =
?1 G P+i=21 Pij?=11 Gj?1En(U ) Xn(U ) i?1?j Ai I ? P+i=11 Xn(U ) i?1Ai : Therefore, from the stochasticity of G, we have
eT En(U+1) = eT En(U ) P+i=21 Pij?=11 Xn(U ) i?1?j Ai I ? P+i=11 Xn(U )i?1 Ai ?1 = eT En(U ) Rn(U )
2
Prop. 4.3 allows us to express the norm of the error at step n, by means of the matrices Rn(N ) , Rn(T )P, Rn(U ) of (4.5). Let us rst introduce the vector norm jjjj1 such that jjvjj1 = i jvij for any (possibly in nite) vector v = (vi)i (we will denote with the same symbol jjjj1 also the corresponding induced matrix norm).
Theorem 4.4 (Error at each step) For every integer n 1 it holds that ?1 ?1 ?1
nY
(N )
nY
nY
En 1 = Ri(N )
1 ;
En(T )
1 =
Ri(T )
1;
En(U )
1 =
Ri(U )
1 ; i=0
i=0
where the matrices Ri(N ) , Ri(T ) , Ri(U ) are de ned in (4.5) and : : : Hn?1, for m m matrices Hi.
i=0
Qn?1 H = H H 0 1 i=0 i
Proof. For every integer n, since En(N ) 0, we have kEn(N ) k1 = keT En(N ) k1 . ?1 R(N ) k1 . On the other hand, since Whence, for Prop. 4.3, kEn(N ) k1 = keT E0(N ) Qin=0 i ?1 R(N ) k1 X0(N )Q= 0, we have eT E0(N ) = eT G = eT . Therefore, kEn(N ) k1 = keT Qin=0 i ?1 R(N ) k : The analogous relations for E (T ) and E (U ) readily follow by = k in=0 i 1 n n using the same arguments. 2 It is also possible to estimate the asymptotic rate of convergence of the sequences fXn(N ) g, fXn(T ) g, fXn(U ) g. Let us rst de ne the matrices R(N ) = P+i=11 Ai ;
R(T ) = P+i=11 Ai ? A1 (I ? A1 )?1; ?1 R(U ) = P+i=21 Ai I ? A1 ;
(4.6)
CHAPTER 4. FUNCTIONAL ITERATION METHODS
50 such that
(N ) (T ) (T ) (U ) (U ) R(N ) = lim n Rn ; R = lim n Rn ; R = lim n Rn : Then the following theorem holds [76]:
Theorem 4.5 (Asymptotic rate of convergence) Let qn
qn
(4.7)
qn
rN = lim kEn k; rT = lim kEn k; rU = lim kEn(U ) k; n n n (N )
(T )
be the mean asymptotic rates of convergence for the sequences fXn(N ) g, fXn(T ) g, fXn(U )g, respectively, where k k is any matrix norm. Then it holds that
rN = (R(N ) ); rT = (R(T ) ); rU = (R(U ) ); where the matrices R(N ) , R(T ) , R(U ) are de ned in (4.6). Proof. We prove the theorem only for the sequence fXn(N ) g and we leave the reader to complete the proof for the sequences fXn(T ) g and fXn(U )g. For Thm. 4.4 it holds that v
u n?1 u tn
Y Ri(N )
: k E rN = lim k = lim n 1 n n 1 qn
(N )
(4.8)
i=0
For the monotonicity of the sequence fXn(N ) g stated by Thm. 4.1 and for the properties of the spectral radius (compare with [94]), we nd
v u ?1 qn
nY
u (N ) n t lim
Ri 1 lim kR(N ) nk1 = (R(N ) ); n n i=0
whence
rN (R(N ) ): (4.9) Let us now prove the opposite inequality. Suppose rst that the nonnegative vector eT R0(N ) has no null components. For any integer k < n, for the monotonicity of the sequence fXn(N ) g and for the monotonicity of k k1, it holds that ?1 ?1
nY
kY
k n?k n?k (N )
Ri 1 Ri(N ) Rk(N )
1
R0(N ) Rk(N )
1: (4.10) i=0
i=0
k
Hence, for any integer k, since the nonnegative vector eT R0(N ) also has no null components, it holds that
(N ) k (N ) n?k
n?k
R0 Rk
1 ck
Rk(N )
1
(4.11)
4.1. STANDARD METHODS
51
where ck > 0 is a suitable constant. From (4.7) it follows that, for any xed > 0, there exists an integer k0 such that (Rk(N ) ) (R(N ) ) ? . Whence, from (4.8), (4.10) and (4.11), we obtain 0
r n?k r k
(N ) n?k n (N ) rN limn R0 Rk r 1 limn n ck
Rk(N )
1 r n r n k ?1 p (N ) n (N )
n n limn ck Rk 1 Rk 1 = limn n
Rk(N )
1 = (Rk(N ) ) (R(N ) ) ? : Therefore, for the arbitrarity of , the above inequality, together with (4.9), leads to the thesis. Suppose now that the rst i components of the nonnegative vector eT R0(N ) = eT E1(N ) are zero and that the remaining components are non-null. Whence, for the monotonic convergence of the sequence fXn(N ) g, it follows that also the rst i components of the nonnegative vector eT En(N ) are zero for any n 1. Moreover, it can easily be proved (we leave the proof to the reader) that, for any n 0, the rst i components of the vector eT Rn(N ) are zero. Therefore, the matrices Rn(N ) have the structure (N ) Rn(N ) = 00 UTn(N ) n where Un(N ) is an (m ? i) (m ? i) matrix and Tn(N ) is an i (m ? i) matrix. Moreover, the matrix R(N ) has the structure 0 T (N ) ( N ) R = 0 U (N ) where U (N ) = limn Un(N ) , T (N ) = limn Tn(N ) . If, except for the rst i components, no other entry of the vector eT En(N ) vanishes in a nite number of steps, then the matrices Un(N ) have no null columns for any n 1. Let vT1 be the m ? i dimensional vector de ned by the non-null components of the vector eT E1(N ) . From the monotonicity of the sequence fXn(N ) g and from the monotonicity of k k1, it follows that, for any integer k < n
Qn?1 (N )
T Qn?1 (N )
Qn?1 (N )
i=0 Ri 1 = v1 i=1 Ui 1 c i=1 Ui 1
k?1 n?k
n?k
n?k c
Qk?1 Ui(N ) U (N )
c
U1(N ) U (N )
ck
U (N )
; 0
0
0
0
0
0
0
0
0
0
0
0
i=1
k
k
1
1
k
1
where c; ck > 0 are suitable constants. Whence, for every > 0, there exists k0 such that
r n?k n rN lim ck
Uk(N )
1 = (Uk(N ) ) (U (N ) ) ? = (R(N ) ) ? ; n 0
0
0
0
which leads to the thesis, for the arbitrarity of . If the components i +1; : : : ; i + i0 of the vector eT En(N ) vanish in a nite number of steps and no other component
CHAPTER 4. FUNCTIONAL ITERATION METHODS
52
vanishes, it follows that, for n 1, the matrices Un(N ) have the structure (N )
Un
Vb (N ) Tb (N ) = n0 Ubn(N ) n
where Ubn(N ) is an (m ? i ? i0 ) (m ? i ? i0 ) matrix having no null columns and Vbn(N ) is an upper triangular matrix with null diagonal entries. Therefore we apply the same arguments to the matrix Ubn(N ) , concluding the proof. 2 The following proposition, together with Thm. 4.5, allows us to compare the rate of convergence of the three sequences. In particular, it shows that the sequence fXn(U )g based on (4.4) is asymptotically faster than the sequences fXn(N ) g and fXn(T )g based on (4.2) and (4.3), respectively [76]:
Theorem 4.6 (Comparisons of rates of convergence) The spectral radii of the matrices R(N ) , R(T ) and R(U ) are related by the following inequality
(R(U ) ) (R(T ) ) (R(N ) ): Proof. Observe that
R(N ) = C (N ) D(N ) ?1; R(T ) = C (T )D(T ) ?1; R(U ) = C (U ) D(U )?1; where
are such that
P+1 A ; C (N ) = P D(N ) = I i=1 i + 1 ( T ) C = Pi=1 Ai ? A1 ; D(T ) = I ? A1 D(U ) = I ? A1 C (U ) = +i=21 Ai;
D(N ) ? C (N ) = D(T ) ? C (T ) = D(U ) ? C (U ) = I ?
X Ai ;
+1
i=1
that is, the matrices R(N ) , R(T ) and R(U ) are obtained by means of regular splittings (compare with [94]) of the M-matrix I ? P+i=11 Ai . Whence, for the PerronFrobenius Theorem 1.1, since
C (U ) C (T ) C (N ) ; it follows that
(R(U ) ) (R(T ) ) (R(N ) ):
2
4.2. A GENERAL CLASS OF FUNCTIONAL ITERATION METHODS
53
4.2 A general class of functional iteration methods Functional iteration methods (4.1) de ned by (4.2), (4.3), (4.4) belong to a more general class where the function F () is such that the matrix G satis es the equation = F (X ). More precisely, consider the formal power series A(z) = P+1 zi A X(observe that A(z) converges inPthe closed unit circle)Pand suppose i i=0 that A(z) = zH (z) + K (z), where H (z) = +i=01 zi Hi and K (z) = +i=01 zi Ki are such that Hi 0, Ki 0, i 0 (observe that A0 = K0 and Ai = Hi?1 + Ki, for i 1). Then the following property holds [76]:
Proposition 4.7 The matrix G solves equation (1.7) if and only if it solves the
matrix equation
X = K (X )(I ? H (X ))?1 (4.12) P P where X is an m m matrix and K (X ) = +i=01 X iKi, H (X ) = +i=01 X iHi. Proof. The matrix G solves equation (1.7) if and only if it solves equation
X (I ? H (X )) = K (X ): Since Hi Ai+1 for i 0 and since the matrix I ? P+i=01 GiAi+1 is nonsingular (compare with [80]), it follows that I ? H (G) is nonsingular. Hence G solves (4.12). 2 Equation (4.12) allows us to de ne the sequence of matrices Xn = F (Xn?1); n 1 where F (X ) = K (X )(I ? H (X ))?1: (4.13) P By choosing H (z) = 0, H (z) = A1, H (z) = +i=01 zi Ai+1, we obtain the functional iteration methods de ned by (4.2), (4.3), (4.4), respectively. The results stated in Sect. 4.1 can easily be extended to iterations (4.1) de ned by (4.13), with X0 = 0. Let us rst prove the following result, which extends Thm. 4.1:
Theorem 4.8 (Monotonic convergence) The sequence Xn = F (Xn?1), X0 = 0, where F () is given by (4.13), converges monotonically to the matrix G. Proof. We prove the monotonicity of the sequence fXng by induction on n. For n = 1 the thesis is true, since X1 = F (X0) 0 = X0. Suppose that Xn Xn?1 and let us prove that Xn+1 Xn: from (4.13) and from the inductive hypothesis, we obtain
Xn+1 = F (Xn) = K (Xn)(I ? H (Xn))?1 K (Xn?1)(I ? H (Xn?1))?1 = Xn:
CHAPTER 4. FUNCTIONAL ITERATION METHODS
54
2
De ne, for each integer n 0, the matrices En = G ? Xn, which represent the error at step n of the sequence fXng. Then the following general recursive relations for the matrices fEng hold: Proposition 4.9 Let Xn = F (Xn?1), X0 0, where F () is de ned by (4.13). Then for every integer n 0 it holds that
0+1 i 1 X X En+1 = @ (Gj?1EnXni?j Ki + Gj EnXni?j Hi)A (I ? H (Xn))?1 i=1 j =1
(4.14)
or, in equivalent form,
0 +1 1 + 1 + 1 X X X En+1 = @En Xni?1 Ki + Gj?1En Xni?j AiA (I ? H (Xn))?1: i=1
j =2
i=j
(4.15)
Proof. From (4.1) and (4.13) it follows that En+1 = G ? Xn+1 = K (G)(I ? H (G))?1 ? K (Xn)(I ? H (Xn))?1 =
P+1 GiK (I ? H (G))?1 ? P+1 X i K (I ? H (X ))?1 = n i i=0 n i i=0 P+1(Gi ? X i )K (I ? H (X ))?1 + K (G) ((I ? H (G))?1 ? (I ? H (X ))?1) = n n i=0 n i P+1(Gi ? X i )K (I ? H (X ))?1 + G (I ? (I ? H (G))(I ? H (X ))?1) : i=0
n
i
n
Whence, from Lem. 4.2 P+1(Gi ? X i )K (I ? H (X ))?1 = n i=0 n i
n
P+1 Pi Gj?1(G ? X )X i?j K (I ? H (X ))?1 n n i n i=1 j =1
and I ? (I ? H (G))(I ? H (Xn))?1 =
I ? I ? P+i=01 Xni Hi ? P+i=11 Pij=1 Gj?1(G ? Xn)Xni?j Hi (I ? H (Xn))?1 = P+1 Pi j?1 G (G ? X )X i?j H (I ? H (X ))?1: i=1
Therefore
n
j =1
n
i
n
0+1 i 1 X X En+1 = @ (Gj?1EnXni?j Ki + Gj EnXni?j Hi)A (I ? H (Xn))?1 : i=1 j =1
From the above equation and from the relation Ai = Hi?1 + Ki, i 1, we obtain formula (4.15). 2 From the above proposition the generalization of Prop. 4.3 follows [76]:
4.2. A GENERAL CLASS OF FUNCTIONAL ITERATION METHODS
55
Corollary 4.10 Let Xn = F (Xn?1), X0 = 0, where F () is de ned by (4.13). Then for every integer n 0 it holds that eT En+1 = eT EnRn (4.16) where, for n 0, 0+1 +1 1 X X i ? j Rn = @ Xn Ai ? H (Xn)A (I ? H (Xn))?1: j =1 i=j
(4.17)
Proof. For the stochasticity of the matrix G, from (4.14) and from the relation Ai = Hi?1 + Ki we obtain eT En+1 =
eT En P+i=11 Pij=1 Xni?j (Ki + Hi) (I ? H (Xn))?1 = eT E P+1 P+1 X i?j A ? H (X ) (I ? H (X ))?1 : n
j =1
i=j
n
i
n
n
2
The following result is a straightforward consequence of Cor. 4.10 and a generalization of Thm. 4.4 (we leave the proof to the reader): Theorem 4.11 (Error at each step) For every integer n 1 it holds that
?1
nY (4.18)
En 1 = Ri
1 i=0 Q ?1 R = R R : : : R . where the matrices Rn are de ned in (4.17) and in=0 i 0 1 n?1
De ne the matrix
! X Ai ? H (G) (I ? H (G))?1 RF = +1
i=1
(4.19)
such that limn Rn = RF . The matrix RF allows us to express the asymptotic rate of convergence of the sequence fXng, by means of the following [76]: Theorem 4.12 (Asymptotic rate of convergence) Let
qn kEnk r = lim n
be the mean asymptotic rate of convergence of the sequence Xn = F (Xn?1 ), X0 = 0, where F () is de ned in (4.13) and k k is any matrix norm. Then r = (RF ) where RF is given in (4.19).
56
CHAPTER 4. FUNCTIONAL ITERATION METHODS
Proof. The theorem can easily be proved by using the same arguments of Thm. 4.5. 2 From Thm. 4.12 it follows that the speed of convergence of the sequence Xn = F (Xn?1), X0 = 0, de ned by (4.13) is related to the spectral radius of the matrix RF of (4.19). Observe that the matrix RF can be written in the form RF = CD?1; where + X1 C = Ai ? H (G); D = I ? H (G)
are such that
i=1
C 0; D?1 0; D ? C = I ?
X Ai ;
+1
i=1
that is, RF is obtained by means of a regular splitting (compare with [94]) of the M-matrix I ? P+i=11 Ai. Suppose that R1 = C1D1?1 and R2 = C2D2?1 are the matrices of (4.19) associated with the functions F1 (), F2() of (4.13), respectively, and suppose that C1 C2 . From the Perron-Frobenius Theorem 1.1, it follows that (R1 ) (R2 ): On the other hand observe that, for any choice of F () de ned by (4.13), since Hi 0 for any i, it holds that
H (G) Whence
C= (U ) ?1
X
+1
i=1
Gi?1 Ai = A1 :
+ X X1 Ai ? H (G) Ai = C (U ) ;
+1
i=1
i=2
where R(U ) = C (U ) D is the matrix associated with the sequence fXn(U )g de ned by (4.4). Therefore, the functional iteration method de ned by (4.4) is the fastest one in the class of method de ned by (4.13), with X0 = 0.
4.3 Functional iteration methods generating a sequence of stochastic matrices Consider functional iteration methods Xn = F (Xn?1); X0 = Q (4.20) where F () is given by (4.13) and Q is a column stochastic matrix. The sequence fXng is a sequence of stochastic matrices that still converges to the matrix G
4.3. A SEQUENCE OF STOCHASTIC MATRICES
57
(compare with [66]). We prove that such sequence of matrices converges faster than the sequence of matrices obtained by starting with X0 = 0. Prop. 4.9 expresses recursively the error En of the sequence Xn = F (Xn?1), where F () is de ned in (4.13) and X0 is any nonnegative matrix. In the case where X0 = 0, since at each step n it holds Xn G, we easily obtain formula (4.18) by left multiplying (4.15) by eT . In the case where X0 is not necessarily equal to zero (hence En is not necessarily a nonnegative matrix), in order to express the error at step n in a simpler form, we introduce the matrix tensor product such that A B is the block matrix de ned by the blocks ai;j B , where A = fai;j gi;j . In fact, if we represent a matrix A by the vector a obtained by column-wise arranging the entries of A (we shall write a = vec(A) and A = vec?1(a)), we nd that the expression vec(AXB ) can be rewritten as (B T A)x, where A; X; B are matrices of compatible sizes and x = vec(X ). Hence, the recursive formula (4.15) can be rewritten as e^n+1 = Rb ne^n (4.21) where e^n = vec(En),
Rb n = P+i=11 (XPni?1KiP(I ? H (Xn))?1)T I + +1 +1 (X i?j A (I ? H (X ))?1 )T Gj ?1 ; n j =2 i=j n i
where I denotes the m m identity matrix. Observe that the matrix Rb n, for n 0, can be expressed in the form
X Rb n = Yj;n Gj +1
j =0
where
P+1 (X i?1K (I ? H (X ))?1 )T Y0;n = P i n i=1 n Yj;n = +i=1j (Xni?j Ai+1(I ? H (Xn))?1)T ; j 1: Let Yj = limn Yj;n, j 0, i.e., P+1 (Gi?1 K (I ? H (G))?1)T Y0 = P i i=1 + 1 i ? j Yj = i=j (G Ai+1(I ? H (G))?1)T ; j 1; and let Y = P+j=01 Yj . Observe that 0 +1 +1 1T X X i?j Y =@ G Ai ? H (G) (I ? H (G))?1A = RFT ;
(4.22) (4.23) (4.24)
j =1 i=j
where RF = limn Rn of (4.17) and is related to the speed of convergence of the sequence Xn = F (Xn?1), X0 = 0. From the above equation it follows, in particular, that (Y ) = (RF ): We prove also that [76]:
58
CHAPTER 4. FUNCTIONAL ITERATION METHODS
Proposition 4.13 The spectral radius of the matrix Rb F = limn Rb n is given by (Rb F ) = (RF ):
Moreover, a nonnegative m2 -dimensional vector v such that
v T Rb F = vT ; where = (RF ), is given by v T = (y T eT ), where y is a nonnegative mdimensional vector such that RF y = y. Proof.P Let S be the Schur canonical form [94] of the matrix G. Then, the matrix T = +j=01(Yj S j ) is similar to the matrix Rb F . Moreover, due to the properties of the matrix S , the eigenvalues of the matrix T are given by the set
[ 2
fj is eigenvalue of
X
+1
j =0
Yj j g;
where is the set of the eigenvalues of G; in particular, we deduce that the eigenvalues of the matrix RF are eigenvalues of the matrix Rb F . On the other hand, for the properties of nonnegative matrices (see Sect. 1.1), since for any 2 it holds that
+1 +1 X j X j +X1 Yj Yj j j Yj = RFT ; j =0
j =0
j =0
we nd that (Rb F ) (RF ), hence (Rb F ) = (RF ). From the properties of tensor product and from the relation eT G = eT , we obtain (yT eT )Rb F = (yT eT ) P+j=01(Yj Gj ) =
P+1(yT Y ) (eT Gj ) = P+1(yT Y ) eT = j =0
j
j =0
T P+1 T y i=0 Yj e = (yT eT ):
j
Therefore, the vector vT = yT eT is a left nonnegative eigenvector of the matrix Rb F . 2 By following the same argument of the proof of Prop. 4.13, it is straightforward to prove the following:
Proposition 4.14 For every n 0 the spectral radius of the matrix Rb n is given
by
0+1 1 X (Rb n) = @ Yj;nA : j =0
4.3. A SEQUENCE OF STOCHASTIC MATRICES
59
Moreover, a nonnegative m2 -dimensional vector v n such that
v Tn Rb n = nvTn ; where n = (Rn ), is given by vPTn = (yTn eT ), where y n is a nonnegative m1 Y = yT . dimensional vector such that yTn +j=0 j;n n n From Prop. 4.14 it follows that, for any integer n 0, the left nonnegative eigenvector vn of the matrix Rb n associated with its spectral radius n belongs to the linear space S1 generated by the orthogonal vectors u1 = p1m (e1 e); u2 = p1m (e2 e); : : : ; um = p1m (em e); (4.25) where ei is the m-dimensional vector having the i-th entry equal to 1 and the remaining entries equal to zero. Let T1 be the linear space T1 = S1?. It can easily be observed that, since the matrices Xn and G are column stochastic, the error vector e^n belongs to the linear space T1, for any n. In particular, if S1 is the m2 m matrix whose columns are an orthogonal basis of the linear space S1 , and T1 is the m2 (m2 ? m) matrix whose columns are an orthogonal basis of the linear space T1, then the m2 m2 matrix = (S1 jT1) is such that T = I;
T e^n
0 = f ; n
where f n is an m(m ? 1)-dimensional vector. Whence the recursive equations (4.21) can be rewritten in the form
0 0 T T T b b (4.26) f n+1 = ( Rn)( e^n) = ( Rn ) f n : On the other hand, observe that, if v 2 S1 then, from (4.22) and from the properties of tensor product, it follows that (vT Rb n)T 2 S1 , whence S1T Rb n T1 = 0. Therefore the matrix T Rb n has the structure 2 T3 V 0 S1 7 6 T b b (4.27) Rn = 4 ? 5 Rn (S1 jT1) = Un W ; n n T T 1
where Vn = S1T Rb nS1, Wn = T1T Rb nT1 and Un = T1T Rb nS1 . Hence the recurrences (4.26) can be rewritten in the form
f n+1 = Wnf n: The above properties allow us to prove the following convergence results:
(4.28)
CHAPTER 4. FUNCTIONAL ITERATION METHODS
60
Theorem 4.15 (Asymptotic rate of convergence) Let qn ke^nk r = lim n
be the mean asymptotic rate of convergence of the sequence (4.20) where F () is de ned by (4.13) and k k is any vector norm. Then r (WF ) where WF = T1T Rb F T1 and Rb F = limn Rb n . Proof. From (4.26) and (4.28) it follows that
qn qn ^ k kf nk: e k = lim r = lim n n n Therefore, let us analyze the convergence of the sequence ff ng. Let > 0 be xed. Then (see [94]) there exists a matrix norm k k such that kWF k (WF ) + : On the other hand, since the sequence fWng converges to WF , there exists an integer i0 such that kWik kWF k + for any i i0 . Therefore kf nk = kWn?1Wn?2 : : : W0 f 0 k ci kWn?1kkWn?2k : : : kWi k 0
0
ci (kWF k + )n?i ci ((WF ) + 2)n?i where ci is a positive constant. Hence, by taking the n-th root, we have qn kf nk (WF ) + 2: r = lim n Therefore, from the arbitrarity of and from the above inequality, the thesis follows. 2 Observe that, from the structure (4.27) of Rb F , it holds (WF ) (Rb F ) = (RF ). In the case where the matrix Rb F is irreducible the inequality is strict, as shown by the following: Theorem 4.16 (Improvement of the rate of convergence) If the matrix Rb F 0
0
0
0
0
is irreducible, then the mean asymptotic rate of convergence
qn ke^nk r = lim n of the sequence (4.20), where F () is de ned by (4.13) and k k is any vector norm, is r 2(Rb F ) where 2 (Rb F ) denotes the second largest modulus eigenvalue of the matrix Rb F .
4.3. A SEQUENCE OF STOCHASTIC MATRICES
61
Proof. From Thm. 4.15 it is sucient to prove that
(WF ) 2 (Rb F ): First observe that, from (4.27), the eigenvalues of the matrix Rb n coincide with the eigenvalues of the matrices Vn and Wn. On the other hand, if the columns of the matrix S1 are given by the vectors u1 ; : : : ; um of (4.25), then the matrix Vn is nonnegative and such that, for the properties of tensor product and from the structure (4.22) of Rb n, + X1 Vn = S1T Rb nS1 = m1 (I eT )Rb n (I e) = Yj;n:
j =0
Therefore, for Prop. 4.14, the spectral radius of the matrix Vn is equal to the spectral radius of the matrix Rb n, for any n. On the other hand, since the matrix Rb F is irreducible, it follows that there exists an integer n0 such that the matrices Rb n are irreducible for n n0 . Whence, for the Perron-Frobenius Theorem 1.1, for any n n0, there exists a unique positive eigenvalue of the matrix Rb n having modulus equal to the spectral radius and its algebraic multiplicity is equal to 1. Therefore, for any n n0 , the spectral radius of the matrix Wn is such that (Wn) 2 (Rb n); where 2 (Rb n) is the second largest modulus eigenvalue of the matrix Rb n. In particular, by taking limits for n ! +1, we obtain (WF ) 2 (Rb F ); (4.29) where WF = limn Wn = T1T Rb F T1. 2 2 2 Therefore, the spectral properties of the m m matrix Rb F = limn Rb n, i.e.,
Rb F = P+i=11 (GPi?1 KiP(I ? H (G))?1)T I + +1 +1 (Gi?j A (I ? H (G))?1 )T Gj ?1 = i j =2 i=j (I ? H (G))?1T I P+1 AT Gj?1 ? H (G)T I ; j =1 j
provide informations on the rate of convergence. In particular, from the above theorems and from the results of the previous section, it follows that, in the cases where the matrix Rb F is irreducible, the mean asymptotic rate of convergence of the sequence Xn = F (Xn?1) is: { equal to the spectral radius of Rb F , if X0 = 0; { less than or equal to the modulus of the second largest eigenvalue of Rb F , if X0 is a column stochastic matrix. In many cases the modulus of the second largest eigenvalue of a matrix can be much smaller than its spectral radius.
CHAPTER 4. FUNCTIONAL ITERATION METHODS
62 2
Identity Zero 0
Log (residual)
-2
-4
-6
-8
-10
-12 0
50
100
150
200
250
Iterations
Figure 4.1: Natural algorithm In Figs. 4.1, 4.2, 4.3 we report, for the functional iterations Pde ned by (4.2), (4.3), (4.4) the logarithm (to the base 10) of the residual kXn ? +i=01 Xni Aik1, for the sequences obtained by starting with X0 = 0 and with X0 = I , for a problem arising from the modeling of a metropolitan network [3]. It is worth pointing out the increasing of the speed of convergence obtained by starting with X0 = I instead of X0 = 0. Moreover it can be observed that the method de ned by (4.4) (method based on the matrix U ) converges more quickly than the methods based on (4.2) and (4.3) (natural and traditional algorithm, respectively). More precisely the spectral radius 1 of the matrix Rb F , which gives the mean asymptotic rate of convergence of the sequence obtained by starting with X0 = 0, is given by 1 = 0:998448 for the natural algorithm de ned by (4.2), 1 = 0:998379 for the traditional algorithm de ned by (4.3) and 1 = 0:998176 for the algorithm based on the matrix U de ned by (4.4). On the other hand the second largest modulus eigenvalue of the matrix Rb F is 2 = 0:883677 for the natural algorithm, 2 = 0:875044 for the traditional algorithm and 2 = 0:858737 for the algorithm based on the matrix U . Moreover, as it can easily be observed from the gures, for this particular example the rate of convergence of the sequence obtained by starting with X0 = I is equal to the second largest modulus eigenvalue of the matrix Rb F .
4.3. A SEQUENCE OF STOCHASTIC MATRICES
63
2 Identity Zero 0
Log (residual)
-2
-4
-6
-8
-10
-12 0
50
100
150
200
250
Iterations
Figure 4.2: Traditional algorithm
2 Identity Zero 0
Log (residual)
-2
-4
-6
-8
-10
-12 0
50
100
150
200
Iterations
Figure 4.3: Algorithm based on the matrix U
250
64
CHAPTER 4. FUNCTIONAL ITERATION METHODS
4.4 A strategy for the choice of the starting approximation In the previous section we have observed that the rate of convergence of functional iterations (4.1) is strongly related to the starting approximation X0. Moreover, we have proved that the better rate of convergence of the sequence obtained with X0 = Q, Q column stochastic matrix, is due to the fact that X0 shares with G the eigenvalue 1 and the corresponding left eigenvector eT = (1; : : : ; 1). Here we generalize this result, by considering starting approximations X0 which share with G more eigenvalues, and corresponding left eigenvectors. Denote with = f1; : : : ; mg the set of the eigenvalues of the matrix G, and with wT1 ; : : : ; wTm the corresponding left eigenvectors, where 1 = 1 and wT1 = eT . Let h be a subset of h eigenvalues of , 1 h < m, such that: P1) 1 2 h;
P2) if 2 h, then 2 h, where denotes the complex conjugate of ;
P3) the left eigenvectors corresponding to the eigenvalues in h are linearly independent. Throughout, we suppose, without loss of generality, that h = f1 ; : : : ; hg. Denote with Sh the subspace generated by the m2{dimensional vectors ei wj , i = 1; : : : ; m, j = 1; : : : ; h, let Th = Sh?. Let Sh = I Eh an m2 hm matrix whose columns are an orthonormal basis of Sh, and let Th = I Fh an m2 (m2 ? hm) matrix whose columns are an orthonormal basis of Th. The following result generalizes the convergence properties stated by Thm. 4.15, that hold in the case where X0 is column stochastic [39]:
Theorem 4.17 (Asymptotic rate of convergence) Let fXngn0 be the sequence generated by Xn+1 = F (Xn), where F () is de ned by (4.13). Let 1 h < m and h satisfying properties P1, P2, P3. Suppose that X0 veri es wT X0 = wT ; (4.30) for any 2 h , where wT is the corresponding left eigenvector of G. If the sequence Xn is convergent then it holds
qn ke^nk (ThT Rb F Th); r = lim n
for any vector norm jj jj. Proof. First observe that e^n 2 Th , for n 0. In fact, for n = 0, this property follows from the relations wT (G ? X0 ) = wT ? wT = 0, for any 2 h; for
4.4. CHOICE FOR THE STARTING APPROXIMATION
65
n 1, the property follows from (4.21) and from the fact that, if u 2 Sh, then uT Rb n 2 Sh, for any n 0. Let = (ShjTh). From (4.21) it follows that h ih i h i T e^n+1 = f 0 = T Rb n T e^n = T Rb n f0 ; (4.31) n+1
n
where f n = ThT en. Since ShT Rb n Th = 0, it follows that
2 T3 V Sh 0 7 6 n;h T b b Rn = 4 ? 5 Rn [ShjTh] = U W ; (4.32) n;h n;h T Th = ThT Rb nTh. From (4.31) and (4.32), the recurrence (4.21) can be
where Wn;h rewritten in the form
f n+1 = Wn;hf n:
Let > be xed and let jj jj be a matrix norm such that
kThT Rb F Thk (ThT Rb F Th) + : Since the sequence fWn;hg converges to ThT Rb F Th, it can be easily veri ed that there exists a positive constant c such that
kf nk c((ThT Rb F Th) + 2)n: Whence, we have
qn qn ^ k kf nk (ThT Rb F Th): e k = lim r = lim n n n
2 From the above theorem it follows that, by starting with a matrix X0 sharing with G more than one eigenvalue, the rate of convergence can be reduced. In fact, the following result holds:
Theorem 4.18 Under the hypotheses of Thm. 4.17, the set of the eigenvalues of the matrix ThT Rb F Th is given by
=
m [
f : is eigenvalue of Qig
i=h+1
P j Y , i = 1; : : : ; m, and the matrices Y are de ned in (4.24). where Qi = 1 j j =0 i j
66
CHAPTER 4. FUNCTIONAL ITERATION METHODS
Proof. Observe that the matrix WF;h = ThT Rb F Th can be written as
X WF;h = (I FhT )Rb F (I Fh) = Yj FhT Gj Fh: 1
j =0
T j Whence, the matrix WF;h is similar to the matrix P1 j =0 Fh G Fh Yj , which is P j T similar to 1 j =0 U Yj , where U is the Schur canonical form of Fh GFh . Since the eigenvalues of FhT GFh are h+1; : : : ; m , U is an upper triangular matrix whose diagonal entries are h+1; : : : ; m . In this way we nd that WF;h is similar to a block upper triangular matrix whose diagonal blocks are the matrices Qi , i = h + 1; : : : ; m. 2 In particular, from the above results, it follows that
(ThT Rb F Th) (T1T Rb F T1 ) = (WF )
for any 1 < h < m, that is, the rate of convergence can be improved, if the starting approximation X0 has h > 1 eigenvalues, and corresponding left eigenvectors, in common with G. In the above theorems we have given conditions on the starting matrix X0 which allow to increase the rate of convergence of the sequence Xn+1 = F (Xn), under the hypothesis that the sequence Xn is convergent. In the case where the matrix X0 is column stochastic, the sequence is obviously convergent. In the more general case where X0 is not nonnegative, the local convergence is guaranteed by the classical results on functional iteration methods [94], as stated by the next theorem.
Theorem 4.19 Let gF : Rm ! Rm the function de ned by gF (x) = vec(F (vec?1(x))); where F is given in (4.13), and denote with JF (x) the m2 m2 Jacobian matrix associated with the function gF . Then, gF (g) = g and Rb F = JF (g), where g = vec(G). 2
2
Proof. The proof directly follows from the properties of Kronecker product, see [53]. 2 Since (Rb F ) < 1, the local convergence is guaranteed. Once the eigenvalues 1 = 1; 2; : : : ; h, and the corresponding eigenvectors T w1 ; : : : ; wTh are known, a matrix X0 satisfying (4.30) can simply be obtained by
X0 = V Diag(1; : : : ; h)W T ; (4.33) where W is the m h matrix whose columns are the vectors w1; : : : ; wh, and V is the generalized inverse of W T .
4.5. A RELAXATION TECHNIQUE
67
2 h=1 h=2 h = 16 0
Log (Residual)
-2
-4
-6
-8
-10
-12 0
5
10
15
20
25 Iterations
30
35
40
45
50
Figure 4.4: Metaring MAC Protocol In the light of Thms. 4.17, 4.18, 4.19, choosing an initial approximation satisfying (4.33) can lead to a strong reduction of the number of iterations needed to reach the required accuracy. We have veri ed the eectiveness of the proposed strategy on a problem arising from the modeling of a Metaring MAC protocol [4], for which even functional iteration methods starting with a column stochastic matrix converge very slowly. For this problem the dimension m of the blocks is 16. We have sorted the eigenvalues in such a way that j1j j2j : : : jmj, and we have chosen dierent subsets h = f1; : : : ; hg, where h is such that properties P1, P2, and P3 of this section are satis ed. Fig. 4.4 reports the logarithm (to the base 10) of the P + 1 residual error kXn ? i=0 Xni Ai k1 for h = 1; 2; 16. It is worth pointing out the substantial reduction of the number of iterations (and of the total time) when h passes from 1 to 2. In the case where h = 16, i.e., all the eigenvalues and corresponding eigenvectors are used, the matrix X0 is a poor approximation of G, which can be re ned by applying few steps of functional iterations. However, from Fig. 4.4, it appears that the asymptotic rate of convergence is almost the same as in the case h = 2.
4.5 A relaxation technique In this section we propose to improve the rate of convergence by applying a relaxation technique to customary functional iterations. More precisely, we consider
68
CHAPTER 4. FUNCTIONAL ITERATION METHODS
functional iterations de ned by the recursion
Xn+1 = (1 ? !)Xn + !F (Xn); n 0;
(4.34)
where F () is given by (4.13) and ! is a relaxation parameter. It can be readily observed that, in this case, the error e^n+1 is related to the error e^n by means of the recursion
e^n+1 = ((1 ? !)I + !Rb n)^en; n 0:
(4.35)
From the above equation, it follows that the rate of convergence of the relaxed sequence is related to the parameter !, as stated by the following theorem:
Theorem 4.20 (Asymptotic rate of convergence) Let ! > 0 a relaxation parameter. Let 1 h < m and h satisfying properties P1, P2, P3 of Sect. 4.4. Suppose that X0 veri es
wT X0 = wT ; for any 2 h , where wT is the corresponding left eigenvector of G. If the sequence Xn, n 0 of (4.34) is convergent, then it holds
qn ke^nk ((1 ? !)I + !ThT Rb F Th): r = lim n
Proof. The proof can be carried out by following the same lines of the proof of Thm. 4.17. 2 The above result suggests us to determine a value ! of ! for which the upper bound ((1 ? !)I + !ThT Rb F Th) to the rate of convergence is minimum. Observe that, if ! = 1 the local convergence of the sequence Xn is guaranteed, as pointed out in the previous section, since (ThT Rb F Th) (Rb F ) < 1. By choosing ! = ! the local convergence is still guaranteed since ((1 ? !)I + !ThT Rb F Th) (ThT Rb F Th). Hereafter we use the notations
W! = (1 ? !)I + !ThT Rb F Th and
x ? Re ) f (x) = 2(Re x2(Re ? Re ) + jj2 ? jxj2 :
In the following theorem we characterize the value ! for which the upper bound to the asymptotic rate of convergence is minimum [40]:
4.5. A RELAXATION TECHNIQUE
69
Theorem 4.21 (Optimal relaxation parameter) Let Xn = (1 ? !)Xn?1 + !F (Xn?1), for n 1. Let 1 h < m and h satisfying properties P1, P2,
P3 of Sect. 4.4 and suppose that X0 veri es the hypothesis of Thm. 4.20. Let be the set of the eigenvalues of the matrix ThT Rb F Th and ? = f 2 : Re = max2 Re g. Then the parameter ! such that (W! ) = min!>0 (W! ) is given by ( ) 1 ? Re ! = min f ( ); 1 ? 2Re + jj2 where:
- 2 ? is such that Im = max2? Im - 2 ? ? is such that f( ) = min2?? f (). Proof. Let us estimate the spectral radius of (W! ). It can be easily veri ed that the eigenvalues of the matrix W! are given by the set
f1 ? ! + !; 2 g: Whence, we have where
(W! )2 = max p (!); 2
p (!) = j1 ? ! + !j2 = !2(1 + jj2 ? 2Re ) ? 2!(1 ? Re ) + 1: The function p (!) belongs to a sheaf of m2 ? hm parabolas with center in (0; 1) and vertices in ! 2 Im 1 ? Re (x ; y) = 1 + jj2 ? 2Re ; 1 + jj2 ? 2Re ; 2 : Since (ThT Rb F Th) < 1, for any 2 we have Re < 1, whence x > 0 and 0 < y < 1. Moreover we have p(0) = 1 and p0(0) = 2(Re ? 1). Hence, for ! in a right neighborhood of zero, the value max2 p(!) is reached for 2 ? such that Im = max2? Im . It can be observed that min (max p (!)) = p (!) !>0 2 where with
! = min (x ; !b ); !b = min f! > 0 : p(!) = p(!); 6= g ;
70 that is
CHAPTER 4. FUNCTIONAL ITERATION METHODS !b = 2min f(): ??
2
From Thm. 4.18 it follows that the entries of the set , and consequently, the asymptotic optimal relaxation parameter !, cannot be evaluated without knowing the matrix G. We overcome this problem by dynamically computing an optimal relaxation parameter by minimizing, at each step, the spectral radius of a suitable matrix which relates the error of two successive approximations. Observe that, if X0 veri es the hypothesis of Thm. 4.20, then the error vector e^n belongs to the linear space Th for any n. Whence, by premultiplying both sides by the matrix T as in the proof of Thm. 4.17, the recurrences (4.35) can be rewritten in the form ThT e^n+1 = ((1 ? !)I + !ThT Rb nTh))(ThT e^n): Based on these considerations, we generate the sequence of matrices Xn+1 = (1 ? !n)Xn + !nF (Xn); n 0; where the value of !n is such that (Wn;! ); (Wn;!n ) = min !>0 and Wn;! = (1 ? !)I + !ThT Rb nTh. Indeed, the following results hold:
Theorem 4.22 (Dynamic relaxation parameter) Let Xn = (1 ? !)Xn?1 + !F (Xn?1). Let 1 h < m and h satisfying properties P1, P2, P3 of
Sect. 4.4 and suppose that X0 veri es the hypothesis of Thm. 4.20. Let n be the set of the eigenvalues of the matrix ThT Rb n Th and ?n = f 2 n : Re = max2n Re g. If (ThT Rb nTh) < 1 then the parameter !n such that (Wn;!n ) = min!>0 (Wn;! ) is given by
) ( 1 ? Re !n = min f ( ); 1 ? 2Re + jj2
where: - 2 ?n is such that Im = max2?n Im , - 2 n ? ?n is such that f( ) = min2n??n f (). Proof. The proof can be carried out analogously to the proof of Thm. 4.21. 2 In the case where (ThT Rb nTh) 1 we set !n = 1, i.e. we apply the standard functional iteration formulas; however, since the nonrelaxed sequence converges to the matrix G, after a nite number of steps the condition (Wn) < 1 is satis ed.
4.5. A RELAXATION TECHNIQUE
71
-3 Standard FIF Relaxed FIF
-4 -5 -6
log (Error)
-7 -8 -9 -10 -11 -12 -13 0
10000
20000
30000
40000
50000
60000
70000
Steps
Figure 4.5: Teletrac system Observe that the optimal parameter !n depends on the eigenvalues of the matrix ThT Rb n Th, that can be computed by only knowing the approximation Xn and the eigenvalues of the matrix G, as stated by the following result, which is analogous to Thm. 4.18:
Theorem 4.23 Let Qi;n =
X
+1
j =0
ji Yj;n; i = 1; : : : ; m;
where the matrices Yj;n are de ned in (4.23) and i, i = 1; : : : ; m, are the eigenvalues of the matrix G. Under the hypothesis of Thm. 4.7 the set n of the eigenvalues of the matrix ThT Rb n Th is given by
n =
[
i=h+1;:::;m
f : is eigenvalue of Qi;ng:
From the many numerical experiments that we performed, we observed that the sequence !n rapidly converges to the asymptotic optimal relaxation parameter ! of Thm. 4.21. This suggested us to stop the computation of the dynamical relaxation parameter !n when the dierence between the parameters obtained at two subsequent steps is \small enough". We have tested the relaxation technique on the family of teletrac systems, modeled as a continuous-time QBD Markov chain and de ned by blocks of dimension 24, described in [32].
72
CHAPTER 4. FUNCTIONAL ITERATION METHODS
Fig. 4.5 reports, for one of these problems, the logarithm (to the base 10) of P + the residual error kXn ? i=01 Xni Aik1 of the approximations obtained at step n, with the relaxed and standard functional iteration methods, starting with X0 = I . For this problem, the asymptotic optimal relaxation parameter is ! = 2:086. The time needed for the computation of the relaxation parameter does not signi cantly aect the overall computational time of the relaxed method. The ratio between the number of iterations (and the computational time) of the standard and relaxed method, needed to reach a residual error less than 10?12, is about 2.
Chapter 5 A \divide and conquer" algorithm In [88, 87] G. W. Stewart has introduced, analyzed and implemented a recursive algorithm, that relies on the Sherman-Morrison-Woodbury formula, for solving systems in block Hessenberg form. This method, based on a doubling technique, has been adapted in [70], by G. W. Stewart and G. Latouche, to block Toeplitz matrices and used for solving M/G/1 type Markov chains by computing the solution of a sequence of systems (1.12). Here we show how this algorithm can be improved by using the concept of displacement rank, both in terms of computational cost and of memory storage. More speci cally, we explicitly relate the rst and the last block rows and block columns of Hn?1 with the corresponding ones of H ?n 1. These block vectors fully de ne all the entries of Hn?1 by means of a Gohberg-Semencul-like formula. In this way we obtain a doubling algorithm for the computation of H2?i 1, i = 0; 1; : : : ; q, n = 2q , where at each stage of the doubling procedure only few convolutions of block vectors must be computed. The overall cost of this computation is O(m2n log n + m3 n) arithmetic operations with a moderate overhead constant, versus the cost O(m3n log2 n) of the Stewart algorithm. In Sect. 5.1 we recall the equations, based on the Sherman-Morrison-Woodbury formula, relating the matrices Hn?1 and H ?n 1, on which the algorithm of [88, 70] relies. In Sect. 5.2, we describe and analyze our algorithm. In Sect. 5.3 we present some numerical results. 2
2
5.1 The Stewart algorithm
Consider the n n block Toeplitz matrix Hn in block Hessenberg form of (1.13) obtained by truncating the in nite matrix (1.10). Let us assume for simplicity that n = 2q , for a positive integer q. Suppose that det H2j 6= 0, j = 0; 1; : : : ; q (this condition is satis ed for a positive recurrent Markov chain), and partition 73
CHAPTER 5. A \DIVIDE AND CONQUER" ALGORITHM
74
the matrix Hn in the following way
"
H n U n V nT Hn = T n H n 2
2
2
where
2
2
2 ?A n 3 +1 ?A n 66 77 77 ; T n = 66 ?A.n +2 ?A. n +1 64 .. .. 5 ?An ?An?1
203 2 ?AT 66 .. 77 66 0 0 U n = 66 . 77 ; V n = 66 .. 405 4 . I 0 2
#
2
2
2
2
2
2
: : : ?A2 3 . . . ?A 77 3 7 ... 775 : ... : : : ?A n +1 2
By applying the Sherman-Morrison-Woodbury formula [52] to the decomposition
"
n Hn = HT n " n M = U
+ 00 " 0
#H n ; Nn = V n 0 2
n
# "
0
2
2
2
# U n V nT = S + M N T ; n n n # 0 2
2
(5.1)
2
we immediately nd the following expression for the matrix inverse of Hn:
Hn?1 = Sn?1 ? Sn?1Mn(I + NnT Sn?1Mn)?1NnT Sn?1; where
(5.2)
"
# H ?n 1 0 (5.3) n ?H ?n 1 T n H ?n 1 H ?n 1 : The above formulae have been used by Stewart [88] in order to devise an ecient doubling method for solving the block Hessenberg system Hnx = b, where x and b are mn-dimensional vectors, and applied by Latouche and Stewart [70] to devise a doubling algorithm for the solution of the linear system (1.11), truncated at a nite system of block dimension n, in O(m3n log2 n) operations. The algorithm can be simpli ed by exploiting displacement structure, as shown in [23]. Consider the nite system Hnx = b, where x and b are mn-dimensional vectors. By partitioning x and b in two nm=2 dimensional vectors such that xT = [xT1 ; xT2 ], bT = [bT1 ; bT2 ], and denoting y1 = H ?n 1b1 , y2 = H ?n 1(b2 ? T n y1), Y3 = H ?n 1U n , Y4 = ?H ?n 1T n Y3, from (5.2), (5.3) we easily nd that " # " # x = yy1 ? YY34 (I + V nT Y4)?1 V nT y2: (5.4) 2 S ?1 =
2
2
2
2
2
2
2
2
2
2
2
2
2
2
In this way the solution of an n n block Hessenberg block Toeplitz system is reduced to solving the two n2 n2 block Hessenberg block Toeplitz systems yielding
5.2. EXPLOITATION OF THE DISPLACEMENT STRUCTURE
75
y1 and y2 , to solving the block Hessenberg block Toeplitz systems H n Y3 = U n
and H n Y4 = ?T n Y3. Observe that all the above systems are associated with the same matrix H n and that the latter two block systems do not depend on the vector b. Now, denote with pn the cost of computing Y3 and Y4, with sn the arithmetic cost of computing x by means of (5.4) once the block vectors Y3 and Y4 have been computed, nally with cn the overall cost of computing x. In this way from the equations (5.2), (5.4) and (5.3) we have 2
2
2
2
2
cn = sn + pn + O(m2n + m3 ) sn = 2s n + O(m2n log n) pn = 2ms n + p n + O(m2n log n + m3n); 2
2
2
where the products between block Toeplitz matrices and (block) vectors are computed by means of FFT's, as shown in Sect. 2.2, for the cost O(m2n log n + m3 n) ops. From the above relations we obtain sn = O(m2n log2 n), whence pn = p n + O(m3n log2 n). Therefore we deduce that pn = O(m3n log2 n) and cn = O(m3n log2 n). 2
5.2 Exploitation of the displacement structure We show how, by using the displacement operator presented in Sect. 2.4, the above doubling technique can be modi ed in order to obtain a simpler algorithm having a computational cost of O(m3n + m2 n log n) operations. Indeed, observe that for the matrix Hn?1 the representation (2.22) holds. In this way the inverse of Hn is explicitly determined as a sum of products of block triangular block Toeplitz matrices de ned by the block A0 , by the block rows Rn(1) , Rn(2) , and the block columns Cn(1) , Cn(2) . The Sherman-Morrison-Woodbury formula allows us to relate the block vectors Cn(1) , Cn(2) , Rn(1) , Rn(2) de ning the matrix Hn?1 and the block vectors C (1) n , (2) (1) (2) C n , R n , R n de ning the matrix H ?n 1. Moreover, since such relations only involve operations between block Toeplitz matrices and block vectors, we may devise an ecient scheme for their implementation based on Alg.2.1, of Sect. 2.2. In this way we arrive at a procedure for the computation of the inverses of H2i , i = 0; 1; : : : ; q, requiring O(m3n + m2 n log n) arithmetic operations, for n = 2q , and a low storage cost; in fact, only four auxiliary block vectors need to be allocated in order to carry out our algorithm. The following result [23] provides the desired correlation among block rows and columns of Hn?1 and H ?n 1. 2
2
2
2
2
2
Theorem 5.1 (Rows and columns of Hn?1 and H ?n 1) The rst and last block columns Cn(1) , Cn(2) , and the rst and last block rows Rn(1) , Rn(2) of Hn?1 satisfy the 2
CHAPTER 5. A \DIVIDE AND CONQUER" ALGORITHM
76 relations
2 3 2 3 (1) (2) C n C n (1) Cn(1) = 4 ?1 n (1) 5 ? 4 ?1 n (2) 5 QR(1) n Tn Cn ?H n T C n ?H n3 T C n " O # 2 (2) (2) 4 ?C1 n (2) 5 QR(1) n En Cn(2) = C (2) + n ?H n T n C n 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
(5.5)
2
2
(1) (2) (1) (1) Rn(1) = A0 [R(1) n ; O ] ? A0 R n E n Q[R n T n H ? n 1 ; ?R n ] 2
2
2
2
2
2
2
(2) (2) (1) (1) (2) Rn(2) = A0 [?R(2) n Tn H? n 1 ; R n ] + A0 R n T n C n Q[R n T n H ? n 1 ; ?R n ]; 2
2
2
2
2
2
2
2
2
2
2
(2) where Q = (I + A0R(1) n T n C n )?1 A0 . 2
2
2
2
Proof. The formulae follow directly from (5.1{5.3). We can now describe the algorithm for the inversion of Hn:
Algorithm 5.1 (Inversion of the matrix Hn) Input. A positive integer q and the m m blocks Ai , i = 0; 1; : : : ; n, n = 2q , such that det H2j = 6 0, j = 0; 1; : : : ; q. Output. The block vectors C2(1)j , C2(2)j , R2(1)j , R2(2)j , de ning the rst and the
last block columns and block rows, respectively, of the matrix H2?j1 , for j = 0; 1; : : : ; q.
Computation.
1. Compute H1?1 = (I ? A1 )?1 and set R1(1) = R1(2) = C1(1) = C1(2) = (I ? A1 )?1. 2. For j = 0; 1; : : : ; q ? 1 compute C2(1)j , C2(2)j , R2(1)j , R2(2)j by means of Theorem 5.1, by calculating: (a) X (1) = T2j C2(1)j , X (2) = T2j C2(2)j , Y (1) = R2(1)j T2j , Y (2) = R2(2)j T2j ; (b) H2?j1 X (1) , H2?j1X (2) , Y (1) H2?j1 , Y (2) H2?j1 ; (c) Q = (I + A0 R2(1)j X (2) )?1 A0 ; (d) C2(1)j , C2(2)j , R2(1)j , R2(2)j by means of (5.5). +1
+1
+1
+1
+1
+1
+1
+1
For the computation of the products T2j C2(1)j , T2j C2(2)j , R2(1)j T2j , R2(2)j T2j , at stage (2a), involving the block Toeplitz matrix T2j , we use Alg. 2.1 where the FFT of the matrix T2j , that appears in each product, is computed once. In this way we have to compute 9m2 FFT's of order 2j+1 and about 2j+2 products of m m complex matrices.
5.2. EXPLOITATION OF THE DISPLACEMENT STRUCTURE
77
For the computation of the four products involving the matrix H2?j1 at stage (2b), we use again Alg. 2.1 and (2.22), where the four FFT's associated with the matrix H2?j1 are computed only once. In this way the computation is reduced to performing 12m2 FFT's of order 2j+1 and about 10 2j complex m m matrix products. Since the cost of an FFT of a real vector of order n is about 52 n log2 n real arithmetic operations and the product of two complex matrices can be performed by means of three products of real matrices, we nd that the overall cost of Alg. 5.1 is roughly given by 42m3n + 100m2n log2 n arithmetic operations. Once the block vectors Rn(1) , Rn(2) , Cn(1) , Cn(2) have been computed, the solution x of the system Hnx = b, can be obtained by means of (2.22) and Alg. 2.1, by performing O(m2) FFT's of order n, thus, leaving unchanged the asymptotic cost O(m3n + m2 n log n). For block tridiagonal matrices Hn, like the ones arising from QBD problems, the formulae of Thm. 5.1 get simpli ed and the cost becomes O(m3n). For non-skip-free matrices, Hn is obtained by reblocking the matrix (1.4) leading to the structure (1.5). Here we show that, for the non-skip-free case, our algorithm can also be further simpli ed. In fact, since Hn is an nk nk block Toeplitz matrix, in light of (2.21), we may rewrite Hn?1 as T (1)T (2) T (2)T Hn?1 = L(u(1) n )L (v n ) ? L(un )L (v n ) ;
for suitable kn dimensional block vectors u(ni) , v(ni) , i = 1; 2. In this way it is sucient to relate the block vectors u(ni) , v(ni) , with u(ni) , v(ni) , for i = 1; 2. This relation is implicitly provided by Thm. 5.1. We observe that the computation of u(ni) , v(ni) is reduced to performing a nite number of products of block Toeplitz matrices and block vectors of block dimension nk except for the computation of the inverse of Q. For this purpose we have the following result [23]. 2
2
Theorem 5.2 (Displacement of Q) The block displacement rank of Q, with respect to the operator k;m is at most 6.
Proof. We prove the theorem in the case where m = 1 and we leave the reader to complete the proof in the general case. It is sucient to compute the rank of (I + NnT Sn?1Mn), i.e., rank((NnT Sn?1Mn)). Due to the structure of Nn and Mn, we have NnT Sn?1Mn = A0 (Sn?1) n +1; n , where (Sn?1 ) n +1; n is the ( n2 + 1; n2 ) block entry of Sn?1. Now, from the de nition of displacement rank it follows that rank((Sn?1 ) n +1; n ) rank(Sn?1) + 2, moreover, since Sn is a 2 2 block triangular block Toeplitz matrix with Toeplitz blocks, we have rank(Sn) 4, and therefore rank(Sn?1) 4. Whence we deduce that rank((Sn?1) n +1; n ) 6. Since A0 is a lower triangular Toeplitz matrix then it has null displacement rank 2
2
2
2
2
2
2
2
78
CHAPTER 5. A \DIVIDE AND CONQUER" ALGORITHM
and from the relation (AB ) = A(B ) + (A)B , that can be easily veri ed, it 2 follows rank(A0(Sn?1 ) n +1; n ) rank((Sn?1) n +1; n ) 6. From Thm. 5.1 and 5.2, it follows that the inverse of Q can be represented as the sum of at most 7 products of nk nk lower and upper block triangular block Toeplitz matrices. Thus, its inverse can be computed fast in O(k2m3 ) arithmetic operations. Hence, each doubling step of size 2i requires O(m2) FFT's of order k2i and the inversion of a matrix having block displacement rank at most 4. Therefore, the total cost is O(m2kn log(kn)) + O(knm3). 2
2
2
2
5.3 Numerical results Alg. 5.1 has been implemented in Fortran 90 and compared with the Stewart implementation of the algorithm given in [87], on a problem arising from the modeling of a metropolitan network [4], where the goal of the computation is solving the matrix equation (1.7). The size of the blocks in this problem is m = 16 and the maximum value of n that we have tested for our algorithm is n = 4096. The experiments have been performed on an Alpha server. The following table reports, for the values of n ranging from 256 to 4096, the number of cpu seconds required by our algorithm (algorithm DR: Displacement Rank) and the residual error; it reports also the time required by the Latouche{Stewart algorithm (denoted by LS) to reach the same residual error of our algorithm, and the ratio between the times. The symbol \*" denotes that we were not able to reach the required accuracy for lack of memory.
n 256 512 1024 2048 4096
DR LS time residual time residual ratio 8.7 2:5 10?3 38.3 2:5 10?3 4.4 18.1 4:0 10?4 103.5 4:0 10?4 5.7 39.2 1:7 10?5 264.9 1:7 10?5 6.8 89.3 3:4 10?8 * * * ? 13 193.4 1:3 10 * * * Table 5.1: Comparison of the algorithms
Our algorithm, besides being faster than Stewart's algorithm, requires less storage and therefore allows us to deal with larger values of n, and provides more accurate results.
Chapter 6 The cyclic reduction method The cyclic reduction algorithm, originally introduced for solving certain block tridiagonal block Toeplitz systems arising from the numerical treatment of elliptic equations [29], has been more recently rediscovered in [97, 69] for solving QBD Markov chains. Here we apply the cyclic reduction method for solving general M/G/1 type Markov chains. In particular, in the case of nite matrices, we devise a fast algorithm, based on the use of FFT, for the LU factorization of the matrix I ? P T , where P T is given in (1.6), and for the computation of the vector solving (1.2). In the case of in nite matrices we present a quadratically convergent algorithm both for the computation of the vector and for the computation of the matrix G solving (1.7); the algorithm, which consists in rephrasing the cyclic reduction in functional form and in evaluating the involved matrix power series by means of the point-wise power series arithmetic of Sect. 2.3, outperforms the existing numerical methods. In the case of non-skip-free matrices, a combination of the cyclic reduction algorithm and of the displacement properties of Sect. 2.4, allows us to devise a speci c fast numerical method for the computation of the matrix G. In Sect. 6.1 we consider the case of nite Markov chains. In Sect. 6.2 we deal with the general case of in nite M/G/1 type Markov chains. Finally, in Sect. 6.3 we analyze the speci c case of non-skip-free matrices.
6.1 The case of nite systems The problem of computing the steady state distribution of a nite Markov chain has been investigated by several authors. First we recall some of the main results on direct approaches based on LU and UL matrix factorizations of an M-matrix (we refer to [5, 6, 41, 42, 43, 54, 57, 58] for more details and further references). Then we introduce the algorithm based on cyclic reduction.
79
80
CHAPTER 6. THE CYCLIC REDUCTION METHOD
6.1.1 Preliminaries
We recall that an M-matrix is any matrix of the form I ? A where A has nonnegative entries and the spectral radius (A) of A is such that (A) . Thus, for any stochastic matrix P , both I ? P and I ? P T are singular M-matrices since (P ) = (P T ) = 1. The next two theorems, which are a special case of more general results given in [95, 41], are at the basis of the methods given in [54, 41, 42, 57, 58, 6] for the computation of .
Theorem 6.1 (LU factorization) For any nn stochastic matrix P there exist matrices L and U , such that I ? P T = LU , where L is lower triangular with unit diagonal entries, and U is upper triangular. Moreover, L and U are M-matrices, therefore the o-diagonal entries of L and U are nonpositive, the diagonal entries of U are nonnegative. If in addition P is irreducible then ui;i 6= 0, i = 1; : : : ; n ? 1, un;n = 0.
Observe that theorem 6.1 still holds for block matrices if the LU factorization is replaced with the block LU factorization and the condition un;n = 0 is replaced with det Un;n = 0. A similar result can be obtained in terms of (block) UL factorization, by applying the following:
Theorem 6.2 (UL factorization) For any n n stochastic irreducible matrix P there exist matrices L and U such that I ? P = LU , where L is lower triangular
with unit diagonal entries, and U is upper triangular. Moreover, L and U are M-matrices, therefore the o-diagonal entries of L and U are nonpositive, the diagonal entries of U are ui;i > 0, i = 1; : : : ; n ? 1, un;n = 0.
By applying theorem 6.2 to the stochastic matrix JPJ , where J is the permutation (reversion) matrix having unit entries along the secondary diagonal, i.e., for i = j ? n +1, we have the factorization I ? JPJ = L^ U^ . In this way we obtain ^ )T , L~ = (J L^ J )T . It is a simple the factorization I ? P T = U~ L~ , where U~ = (J UJ matter to check that the rst entry (block) on the main diagonal of U~ is zero (singular). If the factors L, U , or L~ , U~ are computed by means of Gaussian elimination with the diagonal adjustment technique of [54], the computation is strongly numerically stable [81]. That is, the relative rounding errors in the output can be bounded componentwise. Once the (block) triangular factors L and U , L~ and U~ , respectively, have been computed, the relations I ? P T = LU , I ? P T = U~ L~ allow us to compute the components of the probability invariant vector in a numerically stable way. In fact, the following three conditions are equivalent (I ? P T ) = 0;
(6.1)
6.1. THE CASE OF FINITE SYSTEMS ( ~ U = 0; L = 0 U~1;1 0 = 0:
81 (6.2) (6.3)
The method proposed in [41, 42] consists of solving equation (6.2): we may arbitrarily x the last component of and use back substitution in order to recover all the remaining components. Normalization allows us to compute the desired values. This procedure is numerically stable since it involves only additions of positive numbers. If I ? P T = LU is a block factorization, we have just to solve the problem Un;nn?1 = 0 (say, by means of the computation of the LU or U~ L~ factorization of Un;n) and then recover the remaining block components by means of back substitution. In this last stage we have to solve n ? 1 linear systems having matrix Ui;i, i = 1; : : : ; n ? 1. Again, this computation can be performed by means of the LU , or U~ L~ factorizations of the matrices Ui;i, i = 1; : : : ; n ? 1. Also in the block case, the back substitution stage is numerically stable since it is reduced to performing additions of positive numbers. A similar computation holds if (6.3) is used instead of (6.2). In this case we have the method proposed in [54]: the back substitution stage is replaced by forward substitution, once the rst component (block) of has been xed (computed). In the approach of [57, 58, 5] the matrix I ? P T is partitioned as follows B y T I ? P = zT ; (6.4) where B is an (n?1)(n?1) matrix, y, z are vectors having n?1 components and is a real number. Thus, the problem is reduced to solving the system B x = ?y and to setting T = (xT ; 1)=(1 + P xi ). In [6] Barlow performs an accurate error analysis and proves that this approach is numerically stable provided that the system B x = ?y is solved by means of the LU , or QR factorization, with the diagonal adjustment technique of Grassman, Taksar and Heyman [54]. Now observe that if P is stochastic then T P is stochastic for any permutation matrix and even irreducibility is preserved under this similarity transformation. Therefore theorems 6.1 and 6.2 still hold if I ? P is replaced with I ? T P , for any permutation matrix . This fact is used in the next section where the LU factorization of the matrix I ? T P is computed, for a suitable matrix , by means of cyclic reduction.
6.1.2 LU factorization by means of cyclic reduction.
We now describe an algorithm, introduced in [17], for the numerical solution of (1.2) in the case where P is the nite matrix of (1.6). This method, that consists in computing a LU factorization of the n n block Hessenberg block Toeplitzlike matrix S = I ?P T (here is a permutation matrix), is numerically stable and
82
CHAPTER 6. THE CYCLIC REDUCTION METHOD
has a low computational cost. The algorithm relies on the technique of successive state reduction (or cyclic reduction [29]), when applied to the computation of the vector . Let q 2 be an integer such that n = 2q , and consider the block Toeplitz-like matrix in block Hessenberg form, 2 K 3 c1 H0 0 66 K 77 c2 H1 H0 6 77 . . .. . . . . . . (6.5) S = S (0) = I ? P T = 66 .. 7; 64 K cn?1 Hn?2 : : : H1 H0 75 fn K fn?1 : : : K f2 K f1 K c1 = I ? B1 , which is uniquely de ned by the blocks H0 = ?A0 , H1 = I ? A1 , K f1 = I ? C1, Hi = ?Ai , K ci = ?Bi , K fi = ?Ci, i = 2; : : : ; n ? 2, K cn?1 = ?Bn?1 , K f f Kn = ?Cn, Kn?1 = ?Cn?1 . By interchanging block rows and columns of S (0) = S according to the oddeven permutation (1; 3; 5; : : : ; n ? 1; 0; 2; 4; 6; : : :; n ? 2) we obtain the matrix " (0) (0) # (0) (0) (0) T S = T1(0) W(0) ; (6.6) Z T2 where (0) is the block odd-even permutation matrix, T1(0) , T2(0) , Z (0) and W (0) are (n=2) (n=2) block Toeplitz-like matrices, de ned by 3 2K 2 H0 c2 H0 0 03 7 66 K . 77 . c 77 (0) 66 H2 H0 75 ; 6 ; Z = W (0) = 66 .. 4 H.. 2 . . . . 4 .. . . . . . . 4 . . H0 75 . fn K fn?2 : : : K f2 2 Hn?2 : : : H2 H0 3 K H1 0 66 H3 H1 77 66 .. 77 . (0) .. . . . T1 = 6 . 77 ; 64 H (6.7) 5 n?3 Hn?5 : : : H1 fn?1 K fn?3 : : : K f3 K f1 K
2 K 3 c1 0 66 K 77 c 66 .. 3 H. . 1 . . 77 (0) T2 = 6 . . . 77 : 64 K cn?3 : : : H3 H1 5 cn?1 Hn?3 : : : H3 H1 K
By applying one step of block Gaussian elimination to the 2 2 block matrix (6.6) we nd that I 0 " T (0) W (0) # (0) (0) (0) T 1 S = V (0) I 0 S (1) ; (6.8) (0) ?1 (0) (0) (0) (1) (0) (0) V = Z T1 ; S = T2 ? V W :
6.1. THE CASE OF FINITE SYSTEMS
83
Now we show that the Schur complement S (1) has the same structure as S (0) in (6.5). For this purpose, we de ne the block column vectors Hodd = (H2i+1)i=0;n=2?2 , Heven = (H2i )i=0;n=2?2 and consider the (n=2 ? 1) (n=2 ? 1) block matrices
F0(0) = L(Zn=2?1;m Hodd ); F1(0) = L(Heven); F2(0) = L(Hodd ); where Zn=2?1;m is the block down-shift matrix de ned in (2.16). Let us partition S (1) as " (1) (1) # (1) S = S1(1);1 S1(1);2 S2;1 S2;2 where S1(1);2 is an (n=2 ? 1) (n=2 ? 1) block submatrix. Then from (6.8) and (6.7) it follows that ?1
S1(1);2 = F2 0(0) ? F31(0) F2(0) F1(0) ; 2 c1 c2 3 K K c3 77 (0) (0)? 66 K c4 77 66 K (1) S1;1 = 64 .. 75 ? F1 F2 64 .. 75 ; . . c c h cKn?3 i Kn?2 h (1) (1) i (0)? W (0) : ; H ; : : : ; H S2;1 ; S2;2 = K n?1 n?3 1 ? [Hn?2 ; Hn?4 ; : : : ; H0 ] T1 (6.9) Since the linear space of lower block triangular block Toeplitz matrices is closed under multiplication and inversion, i.e., it constitutes a matrix algebra, we deduce that S1(1);2 is a block triangular block Toeplitz matrix. Hence, the Schur complement S (1) has the same structure of (6.5), i.e., P (1)T = I ? S (1) is still a block Toeplitz-like matrix in block Hessenberg form, where S (1) is uniquely (1) c(1) c(1) f(1) f(1) de ned by the matrices H0(1) ; : : : ; Hn= 2?2 , K1 ; : : : ; Kn=2?1 , and K1 ; : : : ; Kn=2 such that 1
1
S1(1);2 = L(Q(1) ); i h c(1)T c(1)T (1)T T c S1(1);1 = K ; K ; : : : ; K n=2?1 ; h f1(1) f(1)2 h (1) (1) i f1(1) i ; S2;1 ; S2;2 = Kn=2; Kn=2?1 ; : : : ; K
(6.10)
where Q(1) = (Hi(1) )i=0;:::;n=2?2. Moreover, it is easy to check that P (1)T is a stochastic matrix. Therefore the cyclic reduction process can be recursively applied q ? 1 times until the 2m 2m matrix S (q?1) is computed. The resulting algorithm, which computes the block LU factorization of a matrix obtained by suitably permuting rows and columns of S , is described by the formulae (6.5){ (6.10) adjusted to the generic j -th step. This algorithm outputs the blocks Hi(j+1), fi(j+1) ; i = 1; : : : ; 2q?j?1, and K ci(j+1) ; i = 1; : : : ; 2q?j?1 ? 1, i = 0; : : : ; 2q?j?1 ? 2, K de ning the block matrices T1(j), W (j), V (j), S (j+1), j = 0; 1; : : : ; q ? 2.
CHAPTER 6. THE CYCLIC REDUCTION METHOD
84
Consider now the problem of solving the system of equations (1.2) and partition the vector h into blocks of length m according to the block structure of S , i i.e., set T = T0 ; T1 ; : : : ; Tn?1 . Moreover, denote by (?j) the vector having blocks i2j , i = 0; : : : ; 2q?j ? 1, j = 0; 1; : : : ; q ? 1, where n = 2q , q 2. Similarly, denote by (+j) the vector having blocks (2i+1)2j? , i = 0; : : : ; 2q?j ? 1, for j = 1; : : : ; q ? 1. In this way, at the j -th recursive step of(jthe cyclic reduction algorithm applied to S (0) = 0 we obtain the system S (j)? ) = 0, j = 0; : : : ; q ? 1. In fact, after the odd-even permutation on the block rows and the block columns and after performing one step of block Gaussian elimination we obtain the system of equations " (j) # #" T1 W (j) (+j+1) = 0; 0 S (j +1) (?j +1) (6.11) (j ) (j )? ( j ) ( j +1) ( j ) S = T2 ? Z T1 W ; (compare with (6.8)). At the end of the cyclic reduction algorithm we have to solve the 2m 2m homogeneous system 1
1
S
(q?1) (q?1)
?
(q?1)
= 0 ; ?
= q0? ; 2 (
1)
and we may recover the remaining components of the vector by means of back substitution by using the following relation derived by (6.11):
(+j+1) = ?T1(j)? W (j)(?j+1); j = q ? 2; : : : ; 0: 1
The algorithm thus obtained for the computation of the probability invariant vector can be carried out in the following way.
Algorithm 6.1 (Computation of the probability invariant vector ) Input. Positive integers m, q , n, n = 2q , and the nonnegative m m matrices Ai, i = 0; 1; : : : ; n ? 2, Bi , i = 1; : : : ; n ? 1, Ci, i = 1; : : : ; n, de ning the stochastic matrix P of (1.6).
Output. An approximation of the solution of system (1.2). Computation.
1. Cyclic reduction stage: Let S (0) = I ? P T and, for j = 1; : : : ; q ? 1, recursively apply the cyclic reduction algorithm and compute the j ) , i = 0; 1; : : : ; n ? 2, and K j) , i = ci(+1 fi(+1 m m matrices Hi(j), K j 0; 1; : : : ; nj ? 1, nj = n=2j , de ning the nj nj block matrices S (j) = ?1 T2(j?1) ? Z (j?1) T1(j?1) W (j?1) ; by means of the formulae (6.9){(6.11)
6.1. THE CASE OF FINITE SYSTEMS
85
represented for j = 0 and here adjusted for the recursive j -th step of the cyclic reduction algorithm as follows: ?1
S1(j;2+1) = F2 0(j) ? F31(j)F2(j) F1(j) ; 2 c2(j) 3 c1(j) K K 66 K c4(j) 77 c3(j) 77 (j) (j)? 66 K (j +1) 6 6 7 S1;1 = 6 .. 7 ? F1 F2 6 .. 77 ; 4 . 5 4 . 5 (j ) cn(jj)?2 c h cK(nj)j ?3 (j) iK h (j+1) (j+1)i (j ) = Knj ?h1; Hnj ?3; : : : ; H1 ? i S2;1 ; S2;2 ? Hn(jj)?2; Hn(jj)?4; : : : ; H0(j) T1(j) W (j) ; (6.12) (j ) (j ) (j ) (j ) (j ) ( j ) where F0 = L(ZHodd ), F1 = L(Heven ); F2 = L(Hodd ) and Z is the 1
1
block down-shift matrix.
2. Back substitution stage: Solve, by means of LU or UL factorization, the 2m 2m system of equations S (q?1) (?q?1) = 0: ?1 Compute (+j+1) = ?T1(j) W (j) (?j+1) for j = q ? 2; : : : ; 0. 3. Normalization stage: Normalize the vector .
Let us analyze the computational cost of the algorithm. By formulae (6.12), at each step of the cyclic reduction scheme, stage 1 involves one inversion of a triangular Toeplitz matrix and ve products between a block triangular block Toeplitz matrix and a block vector. Stage 2 involves two products of block triangular block Toeplitz matrices and a vector. In fact observe that F1(0) , F2(0) , as well ? as F2(0) , are block triangular block Toeplitz matrices. Thus we may use the algorithms described in Sect. 2.2 for performing fast computations with block Toeplitz matrices, thus reducing the cost of the j -th step to nj = O(m2nj log nj + m3nj ) arithmetic operations. Hence, the overall cost of the algorithm to compute by means of cyclic reduction is n + n + : : : + 2 = O(2n) = O(m2n log n + m3 n) arithmetic operations. On the other hand, it can be easily veri ed that the computational cost obtained by applying the customary block Gaussian elimination is O(m3n2 ). Moreover, Gaussian elimination with the diagonal adjustment of [54] is strongly numerically stable ([81]), and cyclic reduction with FFT-based computations is weakly numerically stable. It is worth pointing out that for moderate values of n below a certain threshold value n~, the customary algorithms for matrix-vector product are faster. The value n~ depends on the block size m, the larger is m the smaller is n~. In the case where the matrix P is block tridiagonal, i.e., when the Markov chain comes from a QBD problem, the whole computation can be carried out in a simpler form and we arrive at the nite-dimensional analogue of the algorithm 1
2
CHAPTER 6. THE CYCLIC REDUCTION METHOD
86
of [69] (see also [97]). In fact, let
2K c1 H0 c2 H1 H0 66 K 6 S = I ? P T = 66 H2 . . . . . . 64 ... H 1 f2 0 K
0
H0 f1 K
3 77 77 77 ; 5
then the block tridiagonal matrix S (j) obtained at the j -th recursive step of the ci(j), K fi(j) , cyclic reduction procedure, de ned by the blocks Hi(j), i = 0; 1; 2, K i = 1; 2, j = 0; : : : ; log2 n ? 1, is such that
H1(j+1) H0(j+1) c1(j+1) K f1(j+1) K f2(j+1) K
= = = = =
?1
?1
H1(j) ? (H2(?j)H1(j) H0(j) + H0(j)H1(j) H2(j)?); ?H0(j)H1(j) H0(j?); H2(j+1) = ?H2(j)H1(j) H2(j); c2(j) ; c1(j) ? H0(j)H1(j) K K f2(j) + H2(j)H1(j)? H0(j)); f1(j)? K H1(j) ? (H0(j)K c2(j) ; c2(j) = ?H2(j)H1(j)? K H2(j+1); K 1
1
1
1
1
(6.13)
1
c1(0) = I ? B1, K f1(0) = I ? C1, where H1(0) = I ? A1 , H0(0) = ?A0 , H2(0) = ?A2 , K c2(0) = ?B2 and K f2(0) = ?C2. K In this case, by formulae (6.13) it is easily seen that each step of cyclic reduction involves two inversions of m m matrices and ten products between m m matrices. Hence, the overall cost for the computation of is O(m3 log n + m2 n), versus the O(m3n) cost of Gaussian elimination.
6.2 The case of in nite systems The technique of successive state reduction, discussed in the previous section, can be successfully applied even to in nite systems [17, 18, 21]. In this case, at each stage of the cyclic reduction the size of the problem does not decrease since we obtain a sequence of in nite block Hessenberg block Toeplitz-like matrices. However, the sequence of problems that we obtain in this way converges to a block Toeplitz system that can be easily solved. A sort of a back substitution step allows us to recover the solution of the starting problem.
6.2.1 Computation the probability invariant vector The cyclic reduction technique for successive state reduction can also be applied to in nite stochastic matrices P of M=G=1 type (1.3). Analogously to what we have done in the previous section, we may consider the in nite block submatrices
6.2. THE CASE OF INFINITE SYSTEMS
87
T1(0) , T2(0) , Z (0) , W (0) of
2K c1 H0 03 c2 H1 H0 6K 77 S (0) = S = I ? P T = 664 K 75 ; c3 H2 H1 H0
... ... . . . . . . . . . obtained by considering separately block rows and block columns having odd, even position, respectively, i.e., 2K 2 H1 c1 03 03 c3 H1 77 (0) 66 K 77 6H H 75 ; T2 = 64 K 75 ; T1(0) = 664 H53 H31 H1 c5 H3 H1 .. . . . . . . ... . . . . . . . . . 2K 2 H. 0 . . 0. 3 c2 H0 03 c4 H2 H0 77 (0) 66 K 6H H 77 75 ; W = 64 K Z (0) = 664 H42 H20 H0 75 : c6 H4 H2 H0 ... . . . . . . . . . ... ... . . . . . . . . . Similarly, de ne the vectors T = hT ; T ; T ; : : :i ; (1)T = h T ; T ; T ; : : :i ; (1) + ? 1 3 5 0 2 4
i
h
where T = T0 ; T1 ; T2 ; : : : is the probability invariant vector associated with P , i.e., S = 0. One step of cyclic reduction yields the matrix ?1
S (1) = T2(0) ? Z (0) T1(0) W (0) such that
(6.14)
S (1) (1) (6.15) ? = 0: (1) T (1) Moreover, it is easy to observe that the matrix P = I ? S has the structure (1.3), i.e., it is a nonnegative in nite block Hessenberg matrix, that, except for the rst block column, has the block Toeplitz structure. We may recursively apply the same reduction (cyclic reduction) to the matrix equation (6.15) thus obtaining a sequence of matrices fS (j)g such that P (j)T = I ? S (j) has the same lower block Hessenberg and Toeplitz-like structure of (1.3) and such that S (j)(?j) = 0; where we de ne h i i T h T T T +(j)T = T2j? ; T32j? ; T52j? ; : : : ; (1) ? = 0 ; 2j ; 22j ; : : : : ci(j) , i 1, Hi(j), i 0, Each matrix S (j) is uniquely determined by the blocks K de ning its rst two block columns. The matrices P (j), j = 0; 1; : : :, have further properties as shown by the following [18]: 1
1
1
CHAPTER 6. THE CYCLIC REDUCTION METHOD
88
Lemma 6.3 If the matrix P (0)T = P T of (1.3) is stochastic, irreducible and positive recurrent, then all the matrices P (j)T = I ? Q(j) recursively generated by applying cyclic reduction are stochastic, irreducible and positive recurrent.
Proof. It is sucient to prove the lemma for P (1) , then the proof can be completed by induction. Since P (0)e = e, i.e., eT T1(0) + eT Z (0) = 0 and?eT W (0) + eT T2(0) = 0, from (6.14) we have eT (I ? P (1)T ) = eT (T2(0) ? Z (0) T1(0) W (0) ) = ? eT T2(0) + eT T1(0) T1(0) W (0) = eT T2(0) + eT W (0) = 0, that is, P (1) is stochastic. Now, in order to prove the irreducibility of P (1) , we rst show that a reducible stochastic matrix M having a positive probability invariant vector p such that M T p = p, jjpjj1 = 1, has at least a probability invariant vector having some null components. Indeed, without loss of generality, we may assume that 1
1
MT
M 0 1;1 = M M ; 2;1 2;2
where M1;1 and M2;2 are square (possibly i matrices and M1;1 is irreducible. h in nite) T T T Let us partition the vector p as p = p1 ; p2 , according to the block structure of M . From the condition M T p = p we deduce that M1;1 p1 = p1, whence eT M1;1 p1 = eT p1 : (6.16) Since p1 > 0 and eT M1;1 eT (due to the stochasticity of M ), we nd that if a component of eT M1;1 were less than 1, we would have eT M1;1 p1 < eT p1 , which would contradict (6.16). Therefore eT M1;1 = eT ,h i.e., Mi1T;1 is stochastic and M2;1 = 0. Whence it follows that the vector yT = pT1 ; 0T =jjp1jj1 is such that M T y = y, jjyjj1 = 1. Now, since the vector made up by the even block components of is a positive probability invariant vector of P (1)T , we deduce that, if P (1) were reducible, then there would exist a vector x 0 having some null components such that ?P (1)x = x, jjxjj1 = 1. In this way the vector yT = (yT1 ; yT2 ) de ned by y1 = ?T1(0) W (0) x, y2 = x, is such that P (0)T y = T y, where is the odd-even permutation matrix. This follows from the relation " (0) (0) # " T (0) W (0) # I 0 T W (0) T 1 1 S = (0) (0) = Z (0) T (0)? I 0 S (1) ; Z T2 1 1
1
?1
S (1) = T2(0) ? Z (0) T1(0) W (0) (compare (6.14)). This contradicts the assumptions, since the positive recurrent matrix P (0) cannot have a probability invariant vector with some null components. In order to prove that P (1) is positive recurrent it is sucient to show that there exists a positive invariant vector of P (1) with nite norm (compare [31], page 152). This vector is given by the even blocks of . 2 Explicit recursive formulae for the block entries Bi(j) i1, A(ij) i1 de ning the rst two block columns of the matrices P (j)T = I ? S (j) are rather complicate.
6.2. THE CASE OF INFINITE SYSTEMS
89
However, we can give a simple matrix functional representation of such blocks, by using the natural isomorphism between formal power series and triangular Toeplitz matrices. We associate with the stochastic matrix P (j)T = I ? S (j) , obtained at the P j) zi; + 1 ( j ) j -th step of cyclic reduction, the formal power series B (z) = i=0 Bi(+1 A(j) (z) = P+i=01 A(ij) zi ; which represent the rst and the second block column of P (j)T . Then, the rst and the second block column of the matrix P (j+1)T are determined by the formal power series
8 > (j ) (z ) + A(j ) (z ) I ? A(j ) (z ) ?1 B (j ) (z ) < B (j+1)(z) = Beven even odd odd (6.17) ? 1 ( j ) ( j ) > j ) (z ) I ? A (z ) (j ) (z ) : A(j+1)(z) = zAodd (z) + A(even A even odd P where for any matrix power series R(z) = +i=01 Rizi we de ne [R(z)]even = Reven (z) =PP+i=01 R2izi ; [R(z)]odd = Rodd (z) = +i=01 R2i+1 zi : Hence, we have a recursive formula expressed in terms h Tof formalipower series. (j )T We observe that ? converges to the vector 0 ; 0; 0; : : : , as j tends to in nity; in fact, i h ?(j)T = T0 ; T2j ; T22j ; : : : and limj!1 i2j = 0, for i = 1; 2; : : : because of jjjj1 = 1. Since 0 > 0, due to the irreducibility assumption (compare [31]), and Bi(j) , A(ij) 0, from the condition S (j)(?j) = 0 we deduce that limj!1 Bi(j) = 0, i 2. On the other (j ) (0) = B (j ) , from (6.17) we have that the sequence hand, since B (j)(0) = Beven 1 fB1(j)gj is monotonically increasing and, since jjB1(j)jj1 1 we have that there exists limj!1 B1(j) = B1 and B1T is a stochastic matrix. We can also estimate the speed of convergence to zero of the blocks Bi(j), i 2, as j tends to the in nity. P In fact, since +i=01 jjijj1 = 1, we have jjijj1 c=i, for i 1, and for a suitable positive constant c. Therefore, from the condition S (j)(?j) = 0, it follows that i?1 0 (j ) X jjBi(j)0 jj1 2cj jjHi ?r rjj1 2cj ; r=0 for i 2 and for a constant c0. Hence jjBi(j)jj1 converges to zero, as j tends to
the in nity, at least exponentially. Moreover, if jjijj1 i, for i 0, 0 < < 1, we have a double exponential convergence (conditions on the speed of convergence to zero of the entries of the vector are provided in [37]). In fact, by following the same argument used above, we arrive at the inequality
jjBi 0jj1 (j )
2j
i?1 X
r=0
jjHr(j)jj1i?r c00 2j ; i 2;
90
CHAPTER 6. THE CYCLIC REDUCTION METHOD
for a suitable constant c00 . In this way we may apply the technique of cyclic reduction in order to approximate the matrix B1 and the components of the vector 0 . For instance, the iterations can be performed until the matrix B1(j) satis es the condition jjeT B1(j) ? eT jj1 for a xed positive . Then, a vector e 0 such that B1(j)e 0 = e 0 can be computed, where is the eigenvalue of B1(j) having maximum modulus. Finally, after setting e i2j = 0, i 1, we may recover, by means of back substitution, approximations e i of the the components i through the formula
e (+r+1) = ?T1(r)? W (r) e (?r+1) ; r = j ? 1; : : : ; 0: 1
6.2.2 Computation of the matrix G
We now consider the problem of solving the matrix equation (1.7), by computing the solution of the in nite system (1.11). A simple trick allows us to compute an approximation of the matrix G by performing few steps of cyclic reduction, without the back substitution stage necessary in the algorithm described in the previous section. As a consequence we arrive at a quadratically convergent and numerically stable method for the computation of the matrix G, requiring a low storage cost, since the block entries T1(j), W (j) of the intermediate block LU decompositions do not need to be stored. The resulting method outperforms the doubling algorithm of Chap. 5, both in terms of computational cost and in terms of storage cost. First, we prove the following result (see [18]) that relates the matrices A(ij) i0 generated by the cyclic reduction described in the previous section to the matrix G j = G 2j :
Theorem 6.4 (Properties of the matrix G) The following properties hold: 1. For every positive integer j , the matrix Gj = G2j is the minimal nonnegative solution of the matrix equation
X=
X
+1
i=0
X iA(ij)
(6.18)
where A(0j) = ?H0(j) , A(1j) = I ? H1(j) , A(ij) = ?Hi(j) , i 2, and A(0) i = Ai , i 0. 2. There exists limj Gj = G0 . Moreover G0 = geT , where g = [g1; : : : ; gm]T is the nonnegative vector such that eT g = 1, G0 g = g. 3. The sequence of matrices fEj gj , where Ej is such that Gj = G0 + Ej , for j 0, converges quadratically to zero.
6.2. THE CASE OF INFINITE SYSTEMS
91
Proof. It can be easily observed that the row block vector [I; G; G2 ; : : :] veri es the equation h h i i I; G; G2; : : : I ? P T = B1; 0; 0; : : : : By applying the cyclic reduction technique to the above system we obtain the sequence of equations h h i i I; Gj ; G2j ; : : : I ? P (j)T = B1; 0; 0; : : : :
Hence, the matrix Gj = G2j is a nonnegative solution of the matrix equation (6.18). From the irreducibility and positive recurrence of the stochastic matrix P (j)T = I ? S (j) , obtained at each step j of cyclic reduction, proved in Lem. 6.3, the nonnegative solution Gj of (6.18) is unique. The convergence properties stated by the theorem follow since G is a stochastic matrix and the only eigenvalue having the largest modulus is 1 [80]. Indeed, let G = geT + E0 so that eT E0 = 0, E0 g = 0, and the spectral radius of E0 is strictly less than 1. Then Gj = geT +Ej , where Ej = E02j ; whence Ej tends quadratically to zero and limj Gj = geT . 2 From the above theorem the cyclic reduction algorithm generates a sequence of irreducible, positive recurrent M/G/1 type Markov chains P (j), whose associated nonlinear matrix equation has the nonnegative solution Gj = G2j . We might recursively apply cyclic reduction, as described in the previous section, and then recover the matrix G by means of a back substitution stage, as for the computation of the vector . For this purpose we should store the matrices T1(j), W (j) generated at each step j . A smarter algorithm, which avoids the back substitution, consists in applying the cyclic reduction technique to the in nite block Toeplitz system (1.11). In this way we generate the sequence of equations h 2j +1 32j +1 i " T1(j) W (j) # j +1 42j +1 2 2 G ;G ; : : : j G; G ;G ; : : : (j) (j) = Z T2 [0; 0; : : : j A0; 0; : : :] such that ? (j ) H T2(j?1) ? Z (j?1)i T1(j?1) W (j?1); h =2j +1 (6.19) G; G ; G22j +1; : : : H (j) = [A0 ; 0; 0; : : :] ; where 2 H1 2 H1 03 03 77 77 (0) 66 H3 H1 6H H 75 ; 75 ; T2 = 64 H5 H3 H1 T1(0) = 664 H53 H31 H1 ... . . . . . . . . . ... . . . . . . . . . 2 H2 H0 2 H0 03 03 77 77 (0) 66 H4 H2 H0 6H H 75 : 75 ; W = 64 H6 H4 H2 H0 Z (0) = 664 H42 H20 H0 ... ... . . . . . . . . . ... . . . . . . . . . 1
92
CHAPTER 6. THE CYCLIC REDUCTION METHOD
The matrix Q(j) = I ? H (j) is a nonnegative matrix having, except for the rst block column, the same block Hessenberg and Toeplitz structure of the matrix in (1.10). Each matrix Q(j) is uniquely determined by the blocks Ab(ij) , i 1, A(ij) , i 0, de ning its rst two block columns. Moreover, the matrices Q(j) , except for the rst block column, coincide with the matrices P (j)T generated in the previous section. For any j 0, we associate with the matrix sequences fAb(ij)gi1, fA(ij) gi0, the formal matrix power series
X X j) i (j) z ; A (z) = A(ij) zi ; Ab(j)(z) = Ab(i+1 +1
+1
i=0
i=0
(0) (0) where Ab(0) i = Ai = Ai , for i 1 A0 = A0 , i 1. Then, analogously to the previous section, we can write
8 (j+1) j ) (z ) + A(j ) (z )(I ? A(j ) (z ))?1 Ab(j ) (z ) < Ab (z) = Ab(even even odd odd j ) (z ) + A(j ) (z )I ? A(j ) (z )?1 A(j ) (z ): : A(j+1) (z) = zA(odd even odd even
(6.20)
The rst equation of (6.20) can be rewritten in a more convenient form that will be used in the next section to prove some useful structural properties of the cyclic reduction method applied to non-skip-free M/G/1 type Markov chains. For this purpose we introduce the matrix power series
H (j)(z) = zI ? A(j)(z) =
X
+1
i=0
j ) z i ; j 0: ci(+1 c(j)(z) = I ? Ab(j) (z) = X H Hi(j)zi ; H +1
i=0
With the latter notations the equation (6.20) can be rewritten in the following way (j ) H (j+1)(z2 ) = 2z(H (j)(z)?1 ? H (j)(?z)?1 )?1 = ?2zH (j) (z)Hodd (z)?1 H (j)(?z); (j ) (j ) (z ) ? H (j ) (z )H (j ) (z )?1 H codd c(j+1)(z) = Hceven (z): H even odd (6.21) The relations (6.20), in addition to expressing in compact form the recursions of the cyclic reduction method, they also provide a tool for the ecient computation of the matrices fAb(ij)gi1 , fA(ij) gi0, by using the algorithms of Chap. 2. The following results (see [17, 18, 21]) are fundamental for devising an ecient algorithm that is based on cyclic reduction. (j ) (j ) Theorem 6.5 (Relation among AP(ij) and G) The blocks f A g i0 , fAbi gi1 i P are nonnegative matrices such that +i=01 Ai(j)T , AT0 + +i=11 Abi(j)T are stochastic. j ) is nonsingular, then Moreover, if the matrix I ? P+i=01 Gi2j Ab(i+1
G = A0 I ?
X
+1
i=0
j) Gi2j Ab(i+1
!?1
:
(6.22)
6.2. THE CASE OF INFINITE SYSTEMS
93
Proof. The nonnegativity of the blocks fA(ij) gi0 , fAb(ij) gi1 and the stochasticity can be easily proved by induction on j . Finally, from equation (6.19), it readily follows that + X1 i2j b(j) ! G I ? G Ai+1 = A0: i=0
2
that, if the matrix A0 has no null columns, then the matrix I ? P+1Observe i Ab(j ) is nonsingular for any j . G i=0 j i+1 The following convergence results allows to recover the matrix G, after applying few steps of cyclic reduction [18]:
Theorem 6.6 (Quadratic convergence) Let G0 = limj!1 Gj . Then we have (j ) 0
A
I?
X
+1
i=1
(j )
Ai
!?1
= G0 ?
E1j ?
X
+1
i=1
(j )
Eij Ai
!
I?
X
+1
i=1
(j )
Ai
!?1
(6.23)
where Eij = Gij ? G0 = Gi2j ? G0. Moreover, if the entries of the matrix (I ? P+i=11 A(ij))?1 are bounded matrices above by a constant, then the sequence of P (j ) ?1 (j ) + 1 0 ( j ) converges quadratically to the matrix G . R = A0 I ? i=1 Ai Proof. Since the stochastic matrices P (j)T are irreducible and positive recurrent, the matrix + X1 +X1 i?l (j) C (j) = Gj Ai (6.24) l=1 i=l
has radius less than 1 (compare (1.10) and [80]). Therefore, since P+1spectral (j ) (j ) i=1 Ai C , for the Perron-Frobenius theorem 1.1 it holds that
X
+1
i=1
(j )
Ai
!
C (j) < 1;
that is, the matrix I ? P+i=11 A(ij) is nonsingular. By replacing Gij with G0 + Eij in (6.18) we arrive at (6.23). The quadratic convergence holds for theorem 6.4, part 3. 2 Under mild conditions, usually satis ed in applications, further convergence properties useful in the applications, can be proved [17]:
Lemma 6.7 For every integer j 0, the matrix I ? A(1j) is nonsingular and the nonnegative matrix sequence fA(1j) gj is nondecreasing. Moreover, there exists limj A(1j) = A01 .
94
CHAPTER 6. THE CYCLIC REDUCTION METHOD
Proof. (j) By following the same argument used in the proof of Thm. 6.6, since C < 1 (compare (6.24)), it follows that for any j , the matrix I ? A(1j) is nonsingular. Let us prove that the sequence of matrices fA(1j) gj is monotonically convergent. Comparing the coecients of z in both the sides of the second relation in (6.20), it follows that, for every j 0, A(1j+1) = A(1j) + D(j) , where D(j) is the j ) (z )(I ? A(j ) (z ))?1 A(j ) (z ). Since the coecients coecient of z in the series A(even odd even ?1 (j ) ( j ) of Aeven(z) and (I ? Aodd (z)) are nonnegative, we nd that D(j) 0, whence A(1j+1) A(1j). Moreover, since A(1j) has nonnegative entries not greater than 1, it follows that there exists limj A(1j) = A01. 2
Lemma 6.8 For every convergent subsequence fA(0j )gj of the sequence of matrices fA(0j) gj , it holds limj A(0j ) 6= 0. Proof. Let fA(0j ) gj be a convergent subsequence of fA(0j) gj . We prove that, if limj A(0j ) = 0, then limj Gj = 0, which would contradict part 2 of Thm. 6.4. In order to arrive at the condition limj Gj = 0, we prove that for any > 0 there exists a sequence of nonnegative integers m(j ) such that for any j it holds
eT Gj ? eT eT ? eT (M (j) )m(j);
(6.25)
eT Xi;j eT ? eT M (j) i;
(6.27)
1 A(j ) : First we introduce the matrix sequence de ned for where M (j) = P+h=1 h i; j 0, ( X0;j = 0 P (6.26) 1 X n A(j ) ; i 0 Xi+1;j = +n=0 i;j n such that Gj = limi Xi;j . Then we observe that
as it can be easily proved by induction on i (we leave it to the reader). Since 0 eT M (j) i eT , there exists an increasingi sequence of nonnegative numbers in = in(j ) such that the subsequence fM (j) n gn is convergent. Therefore, from (6.27), we nd that the matrix Gj = limn Xin;j satis es the inequality (j ) in : eT Gj eT ? eT lim n M
Hence, xed > 0, there exists a positive integer = (j ) such that (6.25) holds with m(j ) = i(j) (j ). If the fA(0j )gj converges to the null matrix, o n sequence then the sequence of matrices M (j ) j converges to a stochastic matrix, since nP+1 (j ) o h=0 Ah j is stochastic. Whence, by replacing j with j in (6.25) and by taking limits for j ! +1, we conclude that limj Gj = 0. 2
6.2. THE CASE OF INFINITE SYSTEMS
95
P Remark 6.9 We observe that from Lem. 6.8 it follows that, if +i=11 A(ij) is irP
reducible and if no accumulation point of +i=11 A(ij) is reducible, then the matrix (I ? P+i=11 A(ij) )?1 is bounded abovefor any j. In order to prove this we P+1 A(j) < 1 for any j . Inshow that, under these assumptions, i=1 i P (j ) + 1 sequence of nondeed, let = lim supj i=1 Ai and let j be anincreasing P ( ) negative numbers such that = limj +i=11 Ai j = limj P+i=11 A(ij ) and limj A(0j ) = A00 (observe that j exists since P+i=11 A(ij) and A(0j) belong to a compact set for any j ). In this way, since P+i=01 A(ij) is stochastic for any j , we have that A0 + A00 is stochastic, where A0 = limj P+i=11 A(ij ) . Since A0 is irreducible and, for Lem. 6.8, A00 0, A00 6= 0, from the Perron-Frobenius theorem 1.1 we deduce that 1 = (A0 + A00 ) > (A0 ) = . In the light of Lem. 6.7, if A(1j) isPirreducible for j = j0 , then the conditions of Rem. 6.9 on the irreducibility of +i=11 A(ij) and of the accumulation points of the latter sequence are satis ed. Lemma 6.10 If the matrix G is irreducible, then the sequence ?1
j = eT A(0j)(I ? A(1j)) g
is such that limj j = 1.
(6.28)
Proof. Since 0 j 1 for every integer j , it follows that the sequence j has convergent subsequences, whose limits belong to the set [0; 1]. We rst prove that 0 cannot be a limit. Let fj gj be a convergent subsequence. Assume without loss of generality that the sequence fA(0j ) gj is also convergent (the matrices A(0j) belong to a compact set) and set A00 = limj A(0j ) . If limj j = 0, then from (6.28) it holds that limj A(0j ) = 0 since the vector g is strictly positive, due to the irreducibility of G, and the nonnegative matrix (I ? A(1j))?1 has diagonal entries greater than 1. This contradicts Lem. 6.8. Now we show that any number such that 0 < < 1 cannot be the limit of any subsequence fj gj . From (6.18) it follows that + X1 A(0j ) = Gj ? Gj Gi?j 1A(ij ) ;
by substituting
Gj = geT
i=1
+ Ej (compare with Thm. 6.4) we obtain
A(0j ) = geT A0(j ) + Ej
X
+1
i=1
Gi?j 1A(ij ):
Hence, by taking limits for j ! 1, we nd that A00 = geT A00 = gvT , where vT = eT A00. From the recursive equation
?1 A(0j+1) = A(0j) I ? A(0j) A(0j) ;
96
CHAPTER 6. THE CYCLIC REDUCTION METHOD
obtained from (6.20), it follows that the sequence eT A(0j +1) is convergent and T A(j +1) = v T : lim e 0 j
It can be easily shown by induction on h that for every integer h 0 there exists T A(j +h) = 2h ?1 v T : lim e 0 j
Hence, for every nonnegative integer h there exists a nonnegative integer jh such that 8j jh jjeT A(0j +h) ? 2h ?1vT jj1 < 2h ?1: (6.29) Consider the strictly increasing sequence of nonnegative integers 0 = j h = h + minfj : j jh; j > h?1g; h 1: 0
It is hreadily seen, from (6.29), that the sequence eT A0(h ) is such that eT A0(h ) uT 2 ?1 for every nonnegative integer(h ,)where u is a constant nonnegative vector. Whence the sequence of matrices A0 h converges to the null matrix, giving a contradiction, again for Lem. 6.8. Therefore, the unique accumulation point of the sequence fj gj is 1; hence the sequence fj gj converges to 1. 2
Theorem 6.11 (Convergence of P (j)) If the matrix G is0 irreducible, then the
sequence of matrices P (j)T converges to a stochastic matrix P T having the M/G/1 structure (1.3). Moreover,
lim A(j) (I ? A(1j) )?1 = G0 j 0 and the matrix P 0T , de ned by the blocks Bi0 , i 1, of its rst block column and by the block entries A0i , i 0, of its second block column, is such that
Bi0 = A0i = 0; i = 2; 3; : : : : Proof. The convergence of the block entries Bi(j) , i 1, has been already proved in Sect. 6.2.1. Let us analyze the convergence of the block entries A(ij), i 0. Let fA(0j ) gj be a convergent subsequence of fA(0j)gj . Then for Lem. 6.10 we have
T A(j ) I ? A(j ) lim e 1 0 j
?1
g = 1:
Hence, for the stochasticity of the matrix P+i=01 Ai(j ) , it holds T X A(j ) lim e i j i=2 +1
?1 I ? A(1j ) g = 0:
6.2. THE CASE OF INFINITE SYSTEMS
97
Since the vector g is positive and the diagonal entries of the matrices I ? A(1j ) are greater than or equal to 1, then it holds lim eT j
X
+1
i=2
?1
Ai(j ) = 0:
Therefore, the unique accumulation points of the sequence fP (j)T gj are stochastic 0T matrices P such that A0i = 0, i = 2; 3; : : :. Moreover, for the convergence of the sequence Gj and from (6.18), we obtain that A00 = G0 ? G0A01 and that G0 = limj A(0j) (I ? A(1j) )?1. 2 By following the same arguments of Sec. 6.2.1 to show the convergence to zero of the blocks Bi(j) , i 2, for j ! +1, we now prove the following [18]:
Theorem 6.12 (Convergence of Ab(ij)) For the matrices Ab(ij) = ?Hci(j), i 2, generate by cyclic reduction it holds limj Ab(ij) = 0, for any i 2, whence b(j)?1 : G = lim A I ? Ai 0 j
(6.30)
Proof. The probability invariant vector associated with the M/G/1 type matrix (1.3) satis es the following equation
2 I ? A1 ?A0 0 3 2 1 3 2 B2 3 77 66 2 77 66 B3 77 66 ?A2 I ? A1 ?A0 75 64 3 75 = 64 B4 75 0 ; 64 ?A3 ?A2 I ? A1 ?A0 ...
...
...
...
...
...
...
where the block matrix in the left-hand size is the same block matrix of (1.11). Thus, applying j steps of cyclic reduction to the above system we nd that
2 Hc(j) H (j) 3 2 3 2 Bb (j) 3 0 1 0 66 Hc(j) H (j) H (j) 77 66 2j1+1 77 66 Bb1(j) 77 66 c2(j) 1(j) 0(j) (j) 77 66 j 77 = 66 b2(j) 77 0 ; 4 H.3 H.2 H1 H0 5 4 22. +1 5 4 B.3 5 ..
...
..
...
...
..
..
where Bbi(j) is such that Bbi(j) 0, for every i 1. Since limj i2j +1 = 0, for i 1, we have
2 Ab(j) ? I 3 1 02 Bb (j) 3 2H 02 Bb (j) 3 c1(j) 3 1 1 1 66 1Ab(j) 77 CC BB66 Bb (j) 77 66 H BB66 Bb (j) 77 (j ) 7 C c 7 C B66 b2(j) 77 0 ? 66 c2(j) 77 1 CC = lim BB66 b2(j) 77 0 + 66 b2(j) 77 1CC = 0: lim j B 4 A.3 5 A 4 H.3 5 A j @4 B.3 5 @4 B.3 5 ..
..
..
..
CHAPTER 6. THE CYCLIC REDUCTION METHOD
98
Therefore, for the conditions 0 > 0, 1 > 0, Bbi(j) 0, Ab(ij) 0, we deduce that limj Bi(j) = limj Ab(ij) = 0, for i 2. Equation (6.30) readily follows. 2 The results of Thms. 6.5, 6.6 and 6.12, allow us to devise the following algorithm for the numerical computation of the matrix G [18]:
Algorithm 6.2 (Computation of the Matrix G) Input. Positive integers q0 ; n0 ; m, n0 = 2q , an error bound > 0, and the 0
nonnegative m m matrices Ai , i = 0P ; 1; : : : ; n0, where n0 is an upper bound to the numerical degree of A(z) = +i=01 Ai zi .
Output. An approximation of the solution G of (1.7). Computation.
1. Apply the cyclic reduction procedure to (1.11), using (6.20), and thus obtaining the sequence (6.19) of in nite systems de ned by the blocks A(ij) ,Ab(ij), j = 1; 2; : : : ; r, until one of the following conditions is satis ed: (C1) jR(r) ? R(r?1) j < E , (C2) eT (I ? A(0r) (I ? A(1r) )?1) < eT , (C3) eT (I ? A0 (I ? Ab(1r) )?1 ) < eT , where, at each step j , the matrix R(j) is de ned by Thm. 6.6 and E is the m m matrix having all the entries equal to 1. 2. Compute an approximation of the matrix G: (a) If condition (C1) or condition (C2) is veri ed, compute an approximation of G by replacing, in the right hand side of (6.22), for j = r, the positive powers of G with R(r) , and by stopping the summation to the numerical degree of the series Ab(r) (z). (b) If condition (C3) is veri ed, an approximation of G is given by A0(I ? Ab(1r) )?1.
The eectiveness of Alg. 6.2 relies on the possibility of computing the blocks Ai , Ab(ij) , at each step j , in an ecient way. We may truncate the power series A(j) (z) and Ab(j) (z) to polynomials having -degree nj and nb j , respectively, acP nj (j )T cording to the de nition given in Sect. 2.3. In this way, the matrices i=0 Ai j A bi(j)T are -stochastic. and AT0 + Pbni=1 With this truncation, we may reduce the computation of the cyclic reduction steps to the multiplication of matrix polynomials and the inversion of matrix polynomials modulo zn , for values of n that can be dynamically adjusted step by (j )
6.2. THE CASE OF INFINITE SYSTEMS
99
step according to the numerical degrees of the power series involved. In this way we obtain an algorithm similar to the one described in Sect. 6.1 in the nite case but which does not require the truncation of the in nite matrices to a nite size. The cost of this computation, based on the coecient-wise polynomial arithmetic described in Sect. 2.2, amounts to O(m3n + m2 n log n) ops. A more ecient technique consists in using the point-wise evaluation/interpolation procedure at the roots of unity (see Sect. 2.3), of the series involved in (6.20). This approach consists in performing the following steps at the stage j +1 of the cyclic reduction procedure [21]: j ) (z ), A(j ) (z ), Ab(j ) (z ), Ab(j ) (z ) at the n -th roots 1. Evaluate the series A(odd j even odd even of unity, where nj ? 1 = 2q ? 1 is an upper bound to the -degree of the above series. 2. Perform a point-wise evaluation of (6.20) at the nj -th roots of unity. 3. Compute the coecients of the matrix polynomials P (z) and Pb (z) of degree nj ? 1, which interpolate the values of the matrix series A(j+1) (z) and Ab(j+1) (z) of (6.20). 4. Check whether or not the matrix polynomials P (z) and Pb (z) are good approximations of the series A(j+1) (z), Ab(j+1) (z), respectively. 5. If the polynomials P (z) and Pb (z) are poor approximations of A(j+1)(z), Ab(j+1) (z), set nj = 2nj and repeat steps 1{5 until the accuracy of the approximation is reached. Due to the properties of the FFT, in each doubling step, part of the results is already available at no cost (algorithms for performing these computations, where part of the output has been precomputed, are provided by the solutions of Problems 2 and 4 in Sect. 2.3). Moreover, unlike in the version based on the coecient-wise polynomial arithmetic, the computation of the reciprocal of a polynomial is avoided and the order of the involved DFT and IDFT is kept to its minimum value, thus substantially reducing the computational cost per iteration. The following properties [21] of cyclic reduction provide a good test for checking the accuracy of the approximations P (z), Pb (z) of the series A(j+1)(z), Ab(j+1) (z) needed at step 4. Theorem 6.13 (Mean values) Let, for any j 0,
(j)T = eT
X
+1
i=1
iA(ij) ; b (j)T = eT
X b(j) iAi+1 :
+1
i=1
Then the following recursive relations hold: j ) (1))?1 A(j ) (1) =2 ; (j+1)T = eT + (j)T ? (eT ? (j)T )(I ? A(odd even j ) (1))?1 Ab(j ) (1) =2: b (j+1)T = b (j)T ? (eT ? (j)T )(I ? A(odd odd
(6.31)
100
CHAPTER 6. THE CYCLIC REDUCTION METHOD
Proof. Observe that, at each step j , the following relation holds:
2 I ? Ab(j) ?A(j) 3 0 1 0 77 h T T T i 666 ?Ab(2j) I ? A(1j) ?A(0j) 0; e ; 2e ; 3e ; : : : 64 ?Ab(j) ?A(j) I ? A(j) ?A(j) 775 = 1 0 .3 .2 h
..
..
...
...
...
i
?b (j)T ; eT ? (j)T ; eT ? (j)T ; eT ? (j)T ; : : : :
By applying to the above system one step of cyclic reduction we easily obtain formulae (6.31). 2
Theorem 6.14 (Test condition) Let, at step j , P (n)(z) = Pin=0?1 Pizi and Pb (n) P ?1 Pb z i be the matrix polynomials of degree n ? 1 interpolating the series = in=0 i A(j) (z), Ab(j)(z) at the n-th roots of unity. Then the following inequalities hold: eT P+i=1n A(ij) (j)T ? eT Pin=1?1 iPi ; eT P+1 Ab(j) b (j)T ? eT Pn?1 iPb : i=n+1 i
i=1
i
Proof. The proof readily follows from Prop. 2.3 of Chap. 2 and from the de nition of (j) and b (j) given in Thm. 6.13. 2 Thms. 6.13 and 6.14 provide a good test to check the accuracy of the approximations of the series A(j) (z), Ab(j)(z) at each step j . Indeed, suppose we know the coecients of the series A(j) (z), Ab(j) (z), and compute the coecients Pi, Pbi, P P ?1 Pi z i , Pb (n) (z ) = n?1 Pb z i i = 0; : : : ; n ? 1, of the approximations P (n)(z) = in=0 i=0 i ( j +1) ( j +1) b of the series A (z), A (z) by interpolating the functional relations (6.20) at the n-th roots of unity. From Thm. 6.14 it follows that, if
(j+1)T ? eT and
nX ?1 i=1
iPi eT ;
(6.32)
nX ?1 (6.33) b (j+1)T ? eT iPbi eT ; i=1 ?1 A(j +1)T and AT + Pn Ab(j +1)T are -stochastic. Hence, then the matrices Pin=0 i i=1 i 0 the series A(j+1) (z), Ab(j+1) (z) have -degree n ? 1 and the matrix coecients of P (n)(z) and Pb (n) (z) are approximations of the corresponding coecients of A(j+1) (z), Ab(j+1) (z) within the error . It is important to point out that (6.32)
and (6.33) can be easily applied without knowing the coecients of the series A(j+1) (z), Ab(j+1) (z). In fact, the mean values (j+1) and b (j+1) can be explicitly obtained by means of (6.31) at the cost of O(m3) arithmetic operations. This provides an ecient tool to check the accuracy of the approximations P (n)(z), Pb (n)(z) to A(j+1)(z), Ab(j+1)(z).
6.2. THE CASE OF INFINITE SYSTEMS
101
Algorithm 6.3 (Computation of the coecients of A(j+1) (z), Ab(j+1) (z)) Input. An error bound > 0, nonnegative integers nj , qj , and the matrix power
series A(j) (z), Ab(j) (z), having numerical degree bounded above by nj = 2qj .
Output. An approximation of the matrix power series A(j +1) (z ), Ab(j +1) (z ),
and an upper bound nj+1 = 2qj to their numerical degrees. +1
Computation.
1. Set qj+1 = qj ? 1; nj+1 = 2qj . 2. Compute (j+1) and b (j+1) by means of (6.31). j ) (z ), A(j ) (z ) and Ab(j ) (z ), Ab(j ) (z ) at the 3. Evaluate the functions A(odd even even odd nj+1-th roots of unity, by means of DFT's. 4. Apply equations (6.20), and obtain the coecients of the matrix polynomials P (z) and Pb (z) interpolating A(j+1) (z), Ab(j+1) (z) at the nj+1 -th roots of unity. 5. Apply the tests (6.32) and (6.33) in order to check if P (z) and Pb (z) are good approximations of the series. If the inequalities (6.32) and (6.33) are satis ed then skip to the next stage. Otherwise, set nj+1 = 2nj+1 , qj+1 = qj+1 + 1 and repeat from stage 3. 6. Set A(j+1) (z) = P (z), Ab(j+1) (z) = Pb (z). +1
In the case of QBD processes, where P is a block tridiagonal matrix, the functions A(j)(z) and Ae(j) (z) are matrix polynomials of degree 2 and 1, respectively. By comparing the terms of same degree, we obtain the simple relations (compare with [69]):
A(0j+1) = A(0j) (I ? A(1j) )?1A(0j) A(1j+1) = A(1j) + A(0j)(I ? A(1j))?1 A(2j) + A(2j) (I ? A(1j) )?1A(0j) A(2j+1) = A(2j) (I ? A(1j) )?1A(2j) Ab(1j+1) = Ab(1j) + A(0j)(I ? A(1j))?1 Ab(2j) Ab(2j) = A(2j): Moreover, it is surprising to observe that the cyclic reduction step is equivalent to the squaring step in the Graee algorithm, which is used for factoring polynomials [26, 82]. In fact, the rst functional relation of (6.20) can be equivalently rewritten as z2 I ? A(j+1) (z2 ) = ?(zI ? A(j)(z))(I ? A1(j))?1(?z ? A(j) (?z)) :
102
CHAPTER 6. THE CYCLIC REDUCTION METHOD Time (s.) Iterations Residual 0.1 0.9 9 1:8 10?13 0.8 1.5 13 5:4 10?14 0.9 2.3 14 1:5 10?14 0.95 2.4 16 1:8 10?14 0.96 2.4 17 2:3 10?14 0.97 2.5 20 4:2 10?14 Table 6.1: Point-Wise Cyclic Reduction
By evaluating the determinants of both sides of the above equation we obtain
(j+1) (z2 ) = ? (j) (z) (j) (?z)= det(I ? A(1j) ); (6.34) where (j) (z) = det(zI ? A(j)(z)). The latter relation extends to matrix polynomials the Graee iteration, which is formally obtained from (6.34) for m = 1.
6.2.3 Numerical results
We have tested the cyclic reduction algorithm Alg. 6.2, with the point-wise computation described by Alg. 6.3, on a problem arising from the mathematical modeling of a Metaring MAC Protocol [4]. For this particular example the blocks Ai have dimension m = 16. We have tested the algorithm for dierent values of the parameter , 0 < < 1, which represents the stability condition of the associated Markov chain (see Sect. 1.3): as tends to one, the problem of computing the solution G of (1.7) by using customary techniques becomes more dicult; in fact, for = 1 the Markov chain is not positive recurrent. The numerical degree of the series A(0) (z) is equal to 168 for = 0:1, is equal to 240 for = 0:8, and is equal to 264 for the remaining tested values of 0:9. We have compared our algorithm with the \divide and conquer" algorithm of Chap. 5 and with the functional iteration method based on the recursion (4.4), with X0 = I , which is the fastest among the classical linearly convergent functional iterations (compare with Chap. 4). The numerical experiments have been performed on an alpha workstation with a base 2 arithmetic endowed with 53 bits. Table 6.1 reports the CPU time (in seconds) and the number of iterations needed by the Point-Wise Cyclic Reduction (PWCR) algorithm to compute an approximation of the matrix G, by choosing = 10?12; the residual of the computed approximation is also reported. Table 6.2 reports the CPU time (in seconds) and the block dimension n of the matrix Hn?1 of the \divide and conquer" algorithm (DC) the residual of the computed approximation, and the ratio between the times needed by DC and by PWCR. A \(*)" means that the residual
6.3. THE CASE OF NON-SKIP-FREE MATRICES Time (s.) Block dimension 0.1 18.5 512 0.8 198.5 4096 0.9 197.9 4096 0.95 199.0 4096 0.96 199.0 4096 0.97 199.3 4096
103
Residual DC/PWCR 4:0 10?16 20.1 1:3 10?13 132.3 9:6 10?8 86.0 (*) ? 5 1:2 10 82.9 (*) 1:7 10?5 82.9 (*) ? 5 1:7 10 79.7 (*)
Table 6.2: Divide and Conquer Algorithm
Time (s.) Iterations Residual FIF/PWCR 0.1 0.3 22 2:2 10?14 0.3 0.8 3.9 148 1:1 10?13 2.6 ? 13 0.9 10.8 373 1:2 10 4.7 0.95 44.4 1534 1:3 10?13 18.5 0.96 91.4 3158 1:3 10?13 38.1 ? 13 0.97 96.8 3336 1:3 10 38.7 Table 6.3: Functional Iteration Method errors of the two algorithms are not comparable and we were not able to increase the block dimension n, due to lack of memory. Observe that the algorithm PWCR, besides being faster than DC, leads to a higher accuracy of the result. We also compared our method with classical techniques based on functional iterations: Table 6.3 reports the CPU time (in seconds) and the number of iterations needed by the Functional Iteration Formula (FIF) de ned in (4.1), the residual of the computed approximation, and the ratio between the times needed by this algorithm and by PWCR. Table 6.3 shows the eectiveness of our method, specially in cases where customary techniques converge very slowly. Indeed, observe that the cyclic reduction algorithm is almost insensitive to the values of the parameter , whereas the performances of the functional iteration method and of the doubling algorithm strongly deteriorate as approaches to 1.
6.3 The case of non-skip-free matrices Since a non-skip-free matrix (1.4) can be reblocked into an M/G/1 matrix (1.3), we may apply the cyclic reduction technique for solving problems with the structure (1.4). However, in this way, we would not exploit the additional structure
CHAPTER 6. THE CYCLIC REDUCTION METHOD
104
of the problem, more speci cally the fact that the blocks of the matrix P , being block Toeplitz matrices, have block displacement rank at most 2, and the fact that the matrix G that solves the equation (1.7) has block displacement rank 1. In this section we present new results that allow us to fully exploit the problem structure. In this way we devise an algorithm, based on FFT, that performs a single cyclic reduction step in O(m2kn log(kn) + m3 kn log2 k) arithmetic operations, where n is an upper bound to the numerical degree of the involved series [22]. So consider the problem of solving the in nite system (1.11) (the same technique can be applied to solve any generalized block Hessenberg block Toeplitz system, for instance banded block Toeplitz systems [19]).
6.3.1 Structural properties of cyclic reduction
Let Ai of (1.5) denote the matrices that are obtained by reblocking the matrix (1.4) and consider the sequence of matrix power series A(j)(z), Ab(j)(z) that are obtained by means of the functional relations (6.20), where each A is replaced with P (0) i + 1 (0) (0) (z ) A. We note that the matrix power series H ( z ) = H z = zI ? A i i =0 c(0) (z) = P+i=01 Hci(0) zi = I ? Ab(0) (z) are block Toeplitz matrices. and H A direct inspection further shows that the Toeplitz structure is generally lost (j ) (z ) = P+1 H(j ) z i = zI ? A(j ) (z ) and H c(j)(z) = by the matrix power series H =0 i P+1 Hc(j)zi = I ? Ab(j)(z), for j 1. iHowever, the displacement structure is i=0 i ( j ) c preserved by the matrix power series H (z) and H(j) (z). This fact allows us to devise an FFT-based implementation of the cyclic reduction algorithm [19, 22]. Denote with e1 = [I; 0; : : : ; 0]T , e2 = [0; : : : ; 0; I ]T the km m matrices made up by the rst and the last m columns of the identity matrix of order km. Let r(j)(z) = eT2 H(j) (z), c(j)(z) = H(j) (z)e1, be the last block row and the c(j)(z), rst block column of H(j)(z), respectively. Similarly, denote rb (j) (z) = eT2 H cb(j) (z) = Hc(j) (z)e1. We prove the following result [22].
Theorem 6.15 (Displacement of H(j)(z), Hc(j) (z)) For the matrix power sec(j) (z) = I ? Ab(j)(z) generated at j -th step of cyclic ries H(j) (z) = zI ? A(j) (z), H reduction we have
k;m(H(j)(z)) = c(j) (z)u(j) (z) ? v(j) (z)r(j)(z) c(j) (z)) = e1r(0) (0) ? c(j)(z)ub (j)(z) ? v(j)(z)rb (j)(z) k;m(H
P
P
where u(j) (z) = +i=01 u(ij) z i , ub (j) (z) = +i=01 ub (ij) z i are m mk matrix (block row P vector) power series, v (j) (z) = +i=01 v(ij) zi is an mk m matrix (block column
6.3. THE CASE OF NON-SKIP-FREE MATRICES
105
vector) power series, de ned by
(j) 2 ?1 (j) ( j ) u z) = ? u (z) Hodd (z ) H (?z) (j) 2 ?1 (j) even ( j +1) ( j ) v (z) = ? H (?z) Hodd (z ) v (z) even j ) (z ) + u(j ) (z ) H(j ) (z )?1 H (j ) (z ) c bu(j+1) (z) = ub (odd even odd odd (j +1) (
(6.35)
for j 0, and u(0) (z) = zeT2 , ub (0) (z) = eT2 , v (0) (z) = ze1 . Proof. We prove relation (6.35) by induction on j . For j = 0 a simple calculation shows that (H(0) (z)) = z(c(0) (z)eT2 ? e1 r(0) (z)) = c(0) (z)u(0) (z) ? v(0) (z)r(0) (z): Suppose that relation (6.35) holds for a xed j 0; we show that it holds for j +1. For simplicity let us denote B = H(j) (z)?1 ?H(j) (?z)?1 . For the properties of the displacement operator (2.18) and for (6.21) we have:
(H(j+1) (z2 )) = 2z(B ?1 ) = ?2zB ?1 (B )B ?1 = ?2zB ?1 (?H(j) (z)?1 (H(j)(z))H(j) (z)?1+ H(j) (?z)?1 (H(j) (?z))H(j) (?z)?1 )B ?1; by substituting relation (6.35) in the above equation we obtain (H(j+1)(z2 )) = 2zB ?1 e1u(j)(z)H(j) (z)?1 B ?1 ? 2zB ?1 H(j)(z)?1 v(j) (z)eT2 B ?1? 2zB ?1e1u(j)(?z)H(j) (?z)?1 B ?1 + 2zB ?1 H(j)(?z)?1 v(j)(?z)eT2 B ?1 = c(j+1)(z2 )u(j+1) (z2 ) ? v(j+1)(z2 )r(j+1) (z2 ); where u(j+1) (z) and v(j+1) (z) satisfy (6.35). Similarly, we proceed for proving the second equation in (6.35) and the third in (6.35). 2 From the above theorem, the following representations hold for the matrix c(j) (z) [22]: power series H(j)(z) and H
Theorem 6.16 (Representation theorem) At each step j of cyclic reduction it holds that
H(j)(z) = L(c(j) (z))(I ? LT (Z u(j)(z)T )) + L(v(j) (z))LT (Z r(j) (z)T ) Hc(j)(z) = L(cb(j) (z)) + LT (Z r(0) (0)T ) + L(c(j) (z))LT (Z ub (j) (z)T ) + L(v (j)(z))LT (Z rb (j)(z)T ): Proof. The proof simply follows from Thms. 2.7 and 6.15. 2 From the above results it follows that, at each step j , only the seven block vector power series u(j) (z), ub (j) (z), v(j) (z), c(j) (z), r(j) (z), cb(j) (z), rb (j)(z), need c(j)(z). to be computed in order to represent the matrix power series H(j)(z) and H
CHAPTER 6. THE CYCLIC REDUCTION METHOD
106
(j ) (z ) = P+1 c(j ) z i , ExplicitPrelations between thePblock vector power series c i=0 i r(j)(z) = +i=01 r(ij)zi , cb(j) (z) = +i=01 cb(ij)zi , rb (j) (z) = P+i=01 rb (ij)zi at two subsequent steps of cyclic reduction can be obtained by multiplying the functional relations (6.20) on the left by eT2 , and on the right by e1. c(j) (z) are related to the Moreover, the block vectors de ning H(j)(z) and H c(j+1) (z), by block vectors obtained at step j + 1, which de ne H(j+1) (z) and H means of functional relations that only involve sums of products between block triangular block Toeplitz matrices. Hence, in the light of the results of Sect. 2.2 and of Thm. 6.16, the computational cost needed to perform the j -th step of cyclic reduction by using FFT reduces to O(nj m3 k log2 m + m2knj log knj ), where nj is an upper bound to the numerical degree of the matrix power series H(j)(z) and Hc(j)(z). In the case where Hi = 0 for i > 2 the above results are further simpli ed. c1(j), moreover We can give explicit expressions of the blocks Hi(j) , i = 0; 1; 2, H since H(j) (z) is a matrix polynomial of degree 2 having block displacement rank 2 then it follows that its matrix coecients have block displacement rank at most 3. In fact we have the following theorems: Theorem 6.17 (The tridiagonal case) If Ai = 0 for i > 2, then for the mac1(j), j 0, generated by the cyclic reduction (6.20) we trices Hi(j) , i = 0; 1; 2, H have (H0(j) ) = c(0j) u(0j) ? v(0j)r(0j) (H1(j) ) = c(0j) u(1j) + c(1j)u(0j) ? v(0j)r(1j) ? v(1j) r(0j) (H2(j) ) = s(j)u(1j) ? v(1j) rb (1j) (j ) (j ) (j ) b (j ) c1(j) ) = ?e1r(0) (H 0 ? c0 u1 ? v 0 r 1 ;
?1
(j +1) = s(j ) ? H(j ) H(j ) c(j ) , for j 0. where s(0) = c(0) 1 ,s 2 1 0
Proof. The proof follows by comparing the terms of the same degree in formula (6.35). 2
Theorem 6.18 (Representation? theorem) At each step j of cyclic reduction, c1(j) can be rewritten as the matrices H0(j) , H1(j) , H2(j) , H1(j) , H H0(j) = L(c(0j))(I ? LT (Z u0(j)T )) + L(v (0j))LT (Z r0(j)T ) H1(j) = L(c(1j))(I ? LT (Z u0(j)T )) ? L(c(0j) )LT (Z u1(j)T ) + L(v (0j))LT (Z r1(j)T ) + L(v(1j))LT (Z r0(j)T ) H2(j) = L(c(2j)) ? L(s(j) )LT (Z u1(j)T ) + L(v (1j))LT (Z rb 1(j)T ) H1(j)? = L(H1(j)? e1 ) + LT (Z H1(j)?T u0(j)T ) + L(H1(j)? c(0j))LT (Z H1(j)?T u1(j)T ) ? L(H1(j)? v(1j) )LT (Z H1(j)?T r0(j)T ) T (j ) T (j )T (j ) T b (j )T Hc1(j) = L(cb(1j)) + LT (Z r(0) 0 ) ? L(c0 )L (Z u1 ) + L(v 0 )L (Z r 1 ): 1
1
1
1
1
6.3. THE CASE OF NON-SKIP-FREE MATRICES
107
Proof. The proof follows from Thm. 6.17 and by the results of Sect. 2.4. 2 Explicit relations for the vectors u(ij) , v(ij) c(ij) , r(ij), cb(ij), rb (ij) , can be provided:
u(0j) = u(0j?1) ? u(1j?1)?H1(j?1)? H0(j?1) u(1j) = ?u(1j?1)H1(j?1) H2(j?1)? v(0j) = v(0j?1) ? H0(j?1)?H1(j?1) v(1j?1) v(1j) = ?H2(j?1) H1(j?1)? v(1j?1) c(0j) = ?H0(j?1) H1(j?1) c(0j?1)? c(1j) = c(1j?1) ? H0(j?1)?H1(j?1) c(2j?1) ? H2(j?1)H1(j?1)? c(0j?1) c(2j) = ?H2(j?1) H1(j?1)? c(2j?1) r(0j) = ?r(0j?1)H1(j?1) H0(j??1) r(1j) = r(1j?1) ? r(0j?1)?H1(j?1) H2(j?1) ? r(2j?1)H1(j?1)? H0(j?1) r(2j) = ?r(2j?1)H1(j?1) H2(j?1)? rb 1(j) = rb (1j?1) ? H0(j?1)H1(j?1)? r(2j?1) cb(1j) = cb(1j?1) ? c(0j?1) H1(j?1) H2(j?1) (0) (0) T (0) for j 1, where u(0) 0 = v 0 = 0, u1 = e2 , v 0 = e1 . For this particular case, the computational cost per step is reduced to O(m3k log2 k) operations. For the blocks A(ij), Ab(ij) the quadratically convergence properties of the pre1
1
1
1
1
1
1
1
1
1
1
1
1
1
vious section still hold. Hence, due to the displacement structure of the blocks Ai of (1.5), we may devise a quadratically convergent algorithm, with low computational cost, for non-skip-free M/G/1 type Markov chains, just by applying ci(j) the cyclic reduction technique where the computations of the blocks Hi(j) , H is performed by exploiting their Toeplitz-like structure according to the formulae displayed in this section. This new algorithm particularly shows its eectiveness, with respect to the linearly convergent method of [45], when the block dimension k is large; in fact, the computational cost of one step of the latter method is O(m3kn) and many iterations may need to be computed in order to reach a good approximation of G. Numerical experiments which show the eciency of our approach are presented in the Sect. 6.3.4.
6.3.2 Convergence properties
In this section we prove some convergence properties of the cyclic reduction for non-skip-free Markov chains of QBD processes that improve the known properties valid in the general case, presented in Sect. 6.2, and express in a dierent form the convergence results of [69]. More speci cally we prove that the lower diagonal blocks A(2j) , generated by cyclic reduction, tend to zero as O(?2j ), where is the minimum modulus of the zeros of the function (z) = det(zk I ? A(z)) lying out of the unit disk.
CHAPTER 6. THE CYCLIC REDUCTION METHOD
108
Consider the power series (z) and ?(z) de nedP as (z) = det(zk I ? A(z)), P A(z) = +i=01 zi Ai, ?(z) = det(zI ? A(z)), A(z) = +i=01 zi Ai. Let us rst recall the following result of [45] that relates the zeros of (z) and ?(z).
Theorem 6.19 (Relation among ?(z) and (z)) For the power series ?(z) it holds
(z!i); ! = cos 2k + i sin 2k ; i2 = ?1: i=0 A consequence of the above theorem is that, if is zero of the power series
(z) then k is zero of ?(z). Conversely, if is zero of ?(z) then there exists a k-th root of which is zero of (z). Throughout this section we assume that A(z) is a matrix polynomial such that
(z) is a polynomial of degree 2M , M = km. This condition is not restrictive if we consider QBD problems, where Ai = 0 for i > 2. In fact in this case
(z) is a polynomial of degree at most 2M , moreover, if (z) has degree less than 2M we may replace A(z) with A(z) = A(z) + z2k I ? (z), where (z) is chosen in such a way that A(z) has nonnegative coecients and A(1) is column stochastic. The power series (z) exists since A(1) has not null columns. In this way, (z) = det(zk I ? A(z)) has degree 2M , and the corresponding Markov chain is positive recurrent, for any 6= 0 in a suitable neighborhood of 0, so that we may apply a homotopy argument based on the continuity of the eigenvalues/eigenvectors of a matrix polynomial with respect to its coecients. We also assume that m = 1, the general case can be treated similarly and is left to the reader. Let us denote ?(j)(z) = det(zI ? A(j)(z)). By taking determinants in both sides of (6.21) we have ?(zk ) =
kY ?1
?(j+1)(z2 ) = ??(j) (z)?(j) (?z)= det H0(j): An immediate consequence of the above relation is that the 2M zeros of the polynomial ?(j)(z) are explicitly given by
iM 2j ; i = 1; : : : ; 2M; where
(i) = 0; i = 1; : : : ; 2M: This property is very useful to prove some convergence results of the cyclic reduction algorithm. First observe that, for ranging in the set of zeros of (z), multiplying on the left the reblocked matrix H (0) of (1.11) (where each A is replaced with A) by the vector T = [1; ; 2; : : :] yields T H (0) = [wT ; 0; 0; : : :], where w is a suitable vector of M components.
6.3. THE CASE OF NON-SKIP-FREE MATRICES
109
As it is well known [80], for the positive recurrence of the Markov chain, the polynomial ?(z) has M zeros in the closed unit disk, and consequently, M zeros of modulus greater than 1. Let us arrange the zeros i in such a way that j1j j2j : : : > jMj+1 j = 1 : : : j2M j. We consider the M M Vandermonde ? 1 matrix VM = (i )i;j=1;M made up with the rst M zeros 1; : : : ; M . It holds h i VM ; DVM ; D2VM ; : : : H (0) = [W; 0; 0; : : :] ; D = Diag(1M ; : : : ; MM ); (6.36) where W is a suitable M M matrix. Observe that, if the zeros i, i = 1; : : : ; M are pair-wise distinct, then the matrix VM is nonsingular. For a polynomial
(z) having multiple zeros out of the unit circle, we may replace VM with the generalized Vandermonde matrix that is invertible and satis es relation (6.36). The homogeneous part of the above relation can be rewritten as VM H0 + DVM H1 + D2VM H2 = 0: By applying the cyclic reduction to the system (6.36) we deduce that VM H0(j) + D2j VM H1(j) + D2j VM H2(j) = 0; j = 0; 1; : : : Multiplying the above relation on the left by VM?1 we nd H0(j) + F M 2j H1(j) + F M 2j H2(j) = 0; j = 0; 1; : : : where F = VM?1 DVM is the Frobenius matrix associated with the polynomial having zeros i, i = 1; : : : ; M (compare [90]). Whence we nd that H2(j) + F ?M 2j H1(j) + F ?M 2j H0(j) = 0; j = 0; 1; : : : (6.37) The following result can be obtained from the above relations [22, 19]. Theorem 6.20 (Convergence speed) Let i, i = 1; : : : ; 2M , be the zeros of the polynomial (z) of degree 2M , ordered such that j1 j j2j : : : > jM +1j = 1 jM +1j : : : j2M j. Then we have limj A(2j) = 0. Moreover, if M is simple, then for any operator norm jj jj, jjA(2j)jj tends to zero as O(jM j?M 2j ). (j ) Otherwise, for any > 0, there exists an operator norm jj jj such that jjA 2 jj j ? 1 M 2 tends to zero as O((jM j + ) ). Proof. For any > 0 let jjjj be an operator norm such that jjF ?1jj < (F ?1 )+ (compare [52]), where (F ?1) denotes the spectral radius of F ?1. Then from (6.37) we have jjA(2j)jj jjF ?1jjM 2j jjI ? A(1j)jj + jjF ?1jjM 2j jjA(0j)jj; whence, since jjA(0j)jj; jjI ? A(1j)jj are bounded from above by a constant, it holds jjA(2j)jj = O(((F ?1) + )M 2j ) = O((jM j?1 + )M 2j ): If M is a simple root of (z) then there exists a norm jj jj such that jjF ?1jj = jM j?1. Therefore, by applyingj the same argument as before, the above relation turns into jjA(2j)jj = O(jM j?M 2 ). 2 +1
+1
+1
+1
110
CHAPTER 6. THE CYCLIC REDUCTION METHOD
6.3.3 Structural properties of the matrix G.
The argument used for proving Thm. 6.20 can be used for obtaining some structural properties of G, already proved in [45], that hold for general non-skip-free Markov chains. In fact, let us assume for simplicity that the M zeros of modulus less than or equal to 1 of (z), denoted with 1; : : : ; M , are pair-wise distinct and let u1; : : : ; uM be the corresponding m-dimensional vectors such that uTi (ik I ? A(i)) = 0, i = 1; : : : ; M . Let us denote with U the M m matrix whose i-th row is uTi , with D the diagonal matrix having eigenvalues 1 ; : : : ; M , and with Vb the M M matrix Vb = [U; DU; D2U; : : : ; Dk?1U ]. Multiplying on the left the matrix H (0) by the block vector [Vb ; Dk Vb ; D2k Vb ; : : :] yields the following equation
Vb H0 + Dk Vb H1 + D2k Vb H2 + : : : = 0:
For Vb nonsingular (this holds if m = 1), multiplying the above equation on the left by Vb ?1 yields H0 + Fb k H1 + Fb 2k H2 + : : : = 0 where Fb = Vb ?1 DVb . Due to the structure of Vb it is easy to check that the matrix Fb is a k k block Frobenius matrix, i.e.,
: : : : : : 0 R0 3 ... R 77 ... 1 7 . . . . . . ... ... 777 ; . . . . . . 0 ... 775 0 : : : 0 I Rk?1 where R0; : : : ; Rk?1 are m m matrices. Since Fb k has M eigenvalues ik , i = 1; : : : ; M , of modulus at most 1, and since Fb k solves the equation (1.7), then G = Fb k (compare [80]). This result has been already obtained by Gail et al. [45]. We recall that the k-th power of a k k Frobenius matrix can be factored as the product of a lower triangular Toeplitz matrix and an upper triangular Toeplitz matrix [26], and deduce that for m = 1 the matrix G = Fb k can be factored in the same way. This property can be proved even for m > 1 in terms of block triangular block Toeplitz matrices. This can also be obtained by using the concept of displacement rank extended to in nite matrices and applied to the reblocked system (1.11). In fact, from (1.11) we deduce that G = A0E , ? (0) where E is the M M leading principal submatrix of H . Moreover, A0 is a block lower triangular block Toeplitz ?matrix and, due to the properties of the displacement operator, the matrix H (0) can be factored into the product of a
20 66 66 I Fb = 66 0 66 . 4 ..
1
1
6.3. THE CASE OF NON-SKIP-FREE MATRICES NCR
CR
Steps Time Residual Steps Time Residual 16 10 14 6 10?16 10 1 1 10?15 32 9 19 6 10?15 9 10 1 10?15 ? 13 64 9 38 5 10 9 136 2 10?14 128 10 89 1 10?11 * * * 256 9 146 2 10?15 * * * 512 12 448 3 10?11 * * * k
111 GHT
Steps Time Residual 1534 6 1 10?10 848 13 1 10?10 980 60 1 10?10 2229 555 1 10?10 800 816 1 10?10 1380 5830 2 10?4
Table 6.4: Non-skip-free Markov chains block lower triangular block Toeplitz matrix and a block upper triangular block Toeplitz matrix. Thus, E and consequently G, admit the block LU factorization with L and U block Toeplitz matrices. The existence of this factorization can be also proved by means of the relation (6.30); in fact, from the general converge c1(j) = properties presented in Sect. ?6.2 and from the displacement structure of H c1(j) converges to a matrix that can be factored into I ? Ab(1j), it follows that H the product of a block lower triangular block Toeplitz matrix and a block upper triangular block Toeplitz matrix. The relation G = Fb k can be used for computing approximations of G in the case where m = 1, and k = M . In fact, we may approximate the zeros of (z) in the unit disk, 1; : : : ; M , say, by means of Aberth's method [1], and then compute the coecients of the polynomial QMi=1(z ? iM ), thus obtaining the entries of the Frobenius matrix Fb . Finally we compute Fb k by means of repeated squaring, by exploiting the displacement structure of F i, for integer i. It is possible to prove that the most expensive stage of this computation is performing Aberth's method, where each iteration costs O(k2n), where n is the numerical degree of (z). This cost is asymptotically higher than the cost of performing one step of cyclic reduction, i.e., O(kn log nk + kn log2 k). 1
6.3.4 Numerical results
We implemented our algorithm for reblocked QBD problems in Fortran 90 (Nonskip-free Cyclic Reduction, NCR Algorithm) and compared it with the cyclic reduction (CR Algorithm) for general QBD of [18], and with the algorithm of Gail, Hantler, Taylor [45] (GHT Algorithm). We generated random stochastic matrices with m = 4 for several values of k. Table 6.4 reports the time (in seconds) needed by the programs to compute the matrix G, together with the residual error and the number of iterations sucient to reach the requested accuracy. A \*" denotes failure of the program due to lack of memory.
112
CHAPTER 6. THE CYCLIC REDUCTION METHOD
The computations have been performed on an alpha workstation. From the numerical experiments it results that our algorithm is much faster than CR and GHT if the number k of boundary levels is suciently large. Moreover, for the small memory storage, it is much robust than CR. Unlike GHT, the performance of our algorithm does not seem to be dependent on the values of the input data. This is likely due to the quadratic convergence of the cyclic reduction.
Bibliography [1] O. Aberth. Iteration methods for nding all zeros of a polynomial simultaneously. Math. Comp., 27, 1973. [2] G. Ammar and W. Gragg. Superfast solution of real positive de nite Toeplitz systems. SIAM J. Matrix Anal. Appl., 9:61{76, 1981. [3] G. Anastasi and L. Lenzini. Personal communication, 1996. [4] G. Anastasi, L. Lenzini, and B. Meini. Performance evaluation of a worst case model of the MetaRing MAC protocol with global fairness. Performance Evaluation, 29:127{151, 1997. [5] J. L. Barlow. On the smallest positive singular value of a singular M-matrix with applications to ergodic Markov chains. SIAM J. Alg. Discr. Meth., 7:414{424, 1986. [6] J. L. Barlow. Error bounds for the computation of null vectors with application to Markov chains. SIAM J. Matrix Anal. Appl., 14:798{812, 1993. [7] A. Ben-Artzi and T. Shalom. On inversion of block Toeplitz matrices. Integral Equations Oper. Theory, 8:751{779, 1985. [8] A. Ben-Artzi and T. Shalom. On inversion of Toeplitz and close to Toeplitz matrices. Linear Algebra Appl., 75:173{192, 1986. [9] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York, 1979. [10] D. Bini. On a class of matrices related to Toeplitz matrices. Technical Report 83-5, SUNY University, Albany, New York, 1983. [11] D. Bini and M. Capovani. Spectral and computational properties of band symmetric Toeplitz matrices. Linear Algebra Appl., 52:99{126, 1983. [12] D. Bini and M. Capovani. Tensor rank and border rank of band Toeplitz matrices. SIAM J. Comput., 16:252{258, 1987. 113
114
BIBLIOGRAPHY
[13] D. Bini and F. Di Benedetto. A new preconditioner for the parallel solution of positive de nite Toeplitz systems. In Proceedings 2nd Annual SPAA, Crete, Greece, pages 220{223. ACM Press, July 1990. [14] D. Bini and P. Favati. On a matrix algebra related to the discrete Hartley transform. SIAM J. Matrix Anal. Appl., 14:500{507, 1993. [15] D. Bini and L. Gemignani. Fast parallel computation of the polynomial remainder sequence via Bezout and Hankel matrices. SIAM J. Comput., 24:63{77, 1995. [16] D. Bini and B. Meini. Exploiting the Toeplitz structure in certain queueing problems. Calcolo, 33:289{305, 1996. [17] D. Bini and B. Meini. On cyclic reduction applied to a class of Toeplitz-like matrices arising in queueing problems. In W. J. Stewart, editor, Computations with Markov Chains, pages 21{38. Kluwer Academic Publisher, 1996. [18] D. Bini and B. Meini. On the solution of a nonlinear matrix equation arising in queueing problems. SIAM J. Matrix Anal. Appl., 17:906{926, 1996. [19] D. Bini and B. Meini. Eective methods for solving banded Toeplitz systems. Annual SIAM Meeting, Stanford, CA, 1997. Submitted for publication, 1997. [20] D. Bini and B. Meini. Fast algorithms for structured problems with applications to Markov chains and queueing models. In Fast reliable methods for matrices with structure. SIAM, 1997. In printing. [21] D. Bini and B. Meini. Improved cyclic reduction for solving queueing problems. Numerical Algorithms, 15:57{74, 1997. [22] D. Bini and B. Meini. Using displacement structure for solving non-skip-free M/G/1 type Markov chains. Submitted for publication, 1997. [23] D. Bini and B. Meini. Inverting block Toeplitz matrices in block Hessenberg form by means of displacement operators: application to queueing problems. Linear Algebra Appl., 272:1{16, 1998. [24] D. Bini and V. Pan. Polynomial division and its computational complexity. J. Complexity, 2:179{203, 1986. [25] D. Bini and V. Pan. Improved parallel computations with Toeplitz-like and Hankel matrices. Linear Algebra Appl., 188{189:3{29, 1993. [26] D. Bini and V. Pan. Matrix and Polynomial Computations, Vol. 1: Fundamental Algorithms. Birkhauser, Boston, 1994.
BIBLIOGRAPHY
115
[27] E. Bozzo. Algebras of higher dimension for displacement decompositions and computations with Toeplitz plus Hankel matrices. Linear Algebra Appl., 230:127{150, 1995. [28] E. Bozzo and C. Di Fiore. On the use of certain matrix algebras associated with discrete trigonometric transforms in matrix displacement decomposition. SIAM J. Matrix Anal. Appl., 16:312{326, 1995. [29] B. L. Buzbee, G. H. Golub, and C. W. Nielson. On direct methods for solving Poisson's equation. SIAM J. Num. Anal., 7:627{656, 1970. [30] R. H. Chan and M. K. Ng. Conjugate gradient methods for Toeplitz systems. SIAM Review, 38:427{482, 1996. [31] E. Cinlar. Introduction to stochastic processes. Prentice-Hall, Englewood Clis, N. J., 1975. [32] J. N. Daigle and D. M. Lucantoni. Queueing systems having phasedependent arrival and service rates. In W. J. Stewart, editor, Numerical solution of Markov Chains, pages 161{202. Marcel Dekker, New York, 1991. [33] F. De Hoog. On the solution of Toeplitz systems. Linear Algebra Appl., 88{89:123{138, 1987. [34] F. Di Benedetto, G. Fiorentino, and S. Serra. C.G. preconditioning for Toeplitz matrices. Computers Math., 25:35{45, 1993. [35] C. Di Fiore and P. Zellini. Matrix decompositions using displacement rank and classes of commutative matrix algebras. Linear Algebra Appl., 229:49{ 99, 1995. [36] D. F. Elliott and K. R. Rao. Fast Transform Algorithms, Analyses, Applications. Academic Press, New York, 1982. [37] E. Falkenberg. On the asymptotic behaviour of the stationary distribution of Markov chains of M=G=1 type. Commun. Statist. Stochastic Models, 10:75{ 97, 1994. [38] P. Favati and B. Meini. On functional iteration methods for solving M/G/1 type Markov chains. Submitted for publication, 1997. [39] P. Favati and B. Meini. On functional iteration methods for solving nonlinear matrix equations arising in queueing problems. Submitted for publication, 1997. [40] P. Favati and B. Meini. Relaxed functional iteration techniques for the numerical solution of M/G/1 type Markov chains. BIT, 38, 1998. In printing.
116
BIBLIOGRAPHY
[41] R. E. Funderlic and J. B. Mankin. Solution of homegeneous systems of linear equations arising in compartmental models. SIAM J. Sci. Stat. Comput., 2:375{383, 1981. [42] R. E. Funderlic and R. J. Plemmons. LU decomposition of M-matrices by elimination without pivoting. Linear Algebra Appl., 41:99{110, 1981. [43] R. E. Funderlic and R. J. Plemmons. A combined direct-iterative method for certain M-matrix linear systems. SIAM J. Alg. Discr. Meth., 5:33{42, 1984. [44] H. R. Gail, S. L. Hantler, and B. A. Taylor. Solutions of the basic matrix equation for M/G/1 and G/M/1 type Markov chains. Commun. Statist. Stochastic Models, 10:1{43, 1994. [45] H. R. Gail, S. L. Hantler, and B. A. Taylor. Non-skip-free M/G/1 and G/M/1 type Markov chains. Adv. Appl. Prob., 1997. To appear. [46] L. Gemignani. Schur complement of Bezoutians with applications to the inversion of block Hankel and block Toeplitz matrices. Linear Algebra Appl., 1997. To appear. [47] I. C. Gohberg and G. Heinig. Die Resultantenmatrix und ihre Verallgemeinerungen. II: Ein kontinuierliches Analogon des Resultantenoperators. Acta Math. Acad. Sci. Hung., 28:189{209, 1976. [48] I. C. Gohberg and V. Olshevsky. Displacement structure approach to Chebyshev-Vandermonde and related matrices. Integral Equations and Operator Theory, 20:65{92, 1992. [49] I. C. Gohberg and A. A. Semencul. Ueber die Inversion endlicher Toeplitzscher Matrizen und ihre kontinuierlichen Analoga. Mat. Issled. 7, 24:201{223, 1972. [50] I. C. Gohberg and T. Shalom. On inversion of square matrices partitioned into non-square blocks. Integral Equations Oper. Theory, 12:539{566, 1989. [51] I. C. Gohberg and T. Shalom. On Bezoutians of nonsquare matrix polynomials and inversion of matrices with nonsquare blocks. Linear Algebra Appl., 137/138:249{323, 1990. [52] G.H. Golub and C.F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 1989. [53] A. Graham. Kronecker Products and Matrix Calculus with Applications. Ellis Horwood Limited, Chichester, 1981.
BIBLIOGRAPHY
117
[54] W. K. Grassman, M. I. Taksar, and D. P. Heyman. Regenerative analysis and steady state distribution for Markov chains. Oper. Res., 33:1107{1116, 1985. [55] U. Grenander and G. Szego. Toeplitz Forms and Their Applications. Chelsea Publishing, New York, 2nd edition, 1984. [56] L. Gun. Experimental results on matrix-analytical solution techniques. Extensions and comparisons. Commun. Statist. Stochastic Models, 5:669{682, 1989. [57] W. J. Harrod. Rank modi cation methods for certain singular systems of linear equations. PhD thesis, Univ. of Tennessee, Knoxville. TN, Department of Mathematics, 1984. [58] W. J. Harrod and R. J. Plemmons. Comparison of some direct methods for computing stationary distributions of Markov chains. SIAM J. Sci. Stat. Comput., 5:453{469, 1984. [59] G. Heinig and K. Rost. Algebraic Methods for Toeplitz-like Matrices and Operators. Akademie-Verlag, Berlin, and Birkhauser, Boston, 1984. [60] T. Kailath, S. Kung, and M. Morf. Displacement ranks of matrices and linear equations. J. Math. Anal. Appl., 68:395{407, 1979. [61] T. Kailath and A.H. Sayed. Displacement structure: theory and applications. SIAM Review, 37:297{386, 1995. [62] T. Kailath, A. Viera, and M. Morf. Inverses of Toeplitz operators, innovations and orthogonal polynomials. SIAM Review, 20:106{119, 1978. [63] D. E. Knuth. The Art of Computer Programming: Seminumerical Algorithms, volume 2. Addison Wesley, Reading, Mass., 1981. [64] J. C. Lafon. Base tensorielle des matrices des Hankel (ou de Toeplitz), applications. Numer. Math., 23:249{361, 1975. [65] P. Lancaster and M. Tismenetski. The Theory of Matrices. Academic Press, 1985. [66] G. Latouche. Algorithms for evaluating the matrix G in Markov chains of PH/G/1 type. Technical report, Bellcore, 1992. [67] G. Latouche. Algorithms for in nite Markov chains with repeating columns. In C.D. Meyer and R.J. Plemmons, editors, Linear Algebra, Queueing Models and Markov Chains, pages 231{265. Springer-Verlag, New York, 1993.
118
BIBLIOGRAPHY
[68] G. Latouche. Personal communication, 1995. [69] G. Latouche and V. Ramaswami. A logarithmic reduction algorithm for Quasi-Birth-Death processes. J. Appl. Probability, 30:650{674, 1993. [70] G. Latouche and G.W. Stewart. Numerical methods for M/G/1 type queues. In W. J. Stewart, editor, Computations with Markov Chains, pages 571{581. Kluwer Academic Publishers, 1996. [71] L. Lenzini, B. Meini, and E. Mingozzi. An ecient numerical method for performance analysis of contention MAC protocols: a case study (PRMA++). Submitted for publication, 1997. [72] L. Lerer and M. Tismenetsky. Generalized Bezoutian and the inversion problem for block matrices, I. General scheme. Integral Equations Oper. Theory, 9:790{819, 1986. [73] E. Linzer. On the stability of transform-based circular deconvolution. SIAM J. Num. Anal., 29:1482{1492, 1992. [74] D. M. Lucantoni. New results on the single server queue with a batch Markovian arrival process. Commun. Statist. Stochastic Models, 7:1{46, 1971. [75] B. Meini. An improved FFT-based version of Ramaswami's formula. Commun. Statist. Stochastic Models, 13:223{238, 1997. [76] B. Meini. New convergence results on functional iteration techniques for the numerical solution of M/G/1 type Markov chains. Numer. Math., 78:39{58, 1997. [77] B. Meini. Solving M/G/1 type Markov chains: recent advances and applications. Commun. Statist. Stochastic Models, 1998. In printing. [78] H. Minc. Nonnegative Matrices. John Wiley and Sons, Inc., 1988. [79] M. F. Neuts. Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins University Press, Baltimore, 1981. [80] M. F. Neuts. Structured Stochastic Matrices of M/G/1 Type and Their Applications. Marcel Dekk., New York, 1989. [81] C. A. O'Cinneide. Entrywise perturbation theory and error analysis for Markov chains. Numer. Math., 65:109{120, 1993. [82] M. A. Ostrowski. Recherches sur la methode de Graee et les zeros des polynomes et des series de Laurent. Acta Math., 72:99{257, 1940.
BIBLIOGRAPHY
119
[83] V. Ramaswami. Nonlinear matrix equations in applied probability - Solution techniques and open problems. SIAM Review, 30:256{263, 1988. [84] V. Ramaswami. A stable recursion for the steady state vector in Markov chains of M/G/1 type. Commun. Statist. Stochastic Models, 4:183{188, 1988. [85] H. Schellhaas. On Ramaswami's algorithm for the computation of the steady stete vector in Markov chains of M/G/1 type. Commun. Statist. Stochastic Models, 6:541{550, 1990. [86] S. Serra. Preconditioning strategies for asymptotically ill-conditioned block Toeplitz systems. BIT, 34:579{594, 1994. [87] G. W. Stewart. Implementing an algorithm for solving block Hessenberg systems. Technical Report CS-TR-3295, Department of Computer Science, University of Maryland, College Park, 1993. [88] G. W. Stewart. On the solution of block Hessenberg systems. Numer. Lin. Algebra with Appl., 2:287{296, 1995. [89] W. J. Stewart. Introduction to the numerical solution of Markov chains. Princeton University Press, Princeton, New Jersey, 1994. [90] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer Verlag, New York, 1980. [91] P. Tilli. On the asymptotic spectrum of Hermitian block Toeplitz matrices with Toeplitz blocks. Math. Comp., 1997. To appear. [92] W. Trench. An algorithm for the inversion of nite Toeplitz matrices. SIAM J. Appl. Math., 12:515{522, 1964. [93] E. E. Tyrtyshnikov. Singular values of Cauchy-Toeplitz matrices. Linear Algebra Appl., 161:99{116, 1992. [94] R. S. Varga. Matrix Iterative Analysis. Prentice Hall, New Jersey, 1962. [95] R. S. Varga and D.-Y. Cai. On the LU factorization of M-matrices. Numer. Math., 38:179{192, 1981. [96] H. Widom. Asymptotic behaviour of block Toeplitz matrices and determinants. Advances in Math., 13:284{322, 1974. [97] J. Ye and San-Qi Li. Analysis of multi-media trac queues with nite buer and overload control{Part I: algorithm. In Proc. IEEE Infocom 91, Bal Harbour, pages 1464{1474. 1991.