Optimal and superoptimal matrix algebra operators

Fabio Di Benedetto*   Stefano Serra Capizzano†

November 28, 1997

* Dipartimento di Matematica, Via Dodecaneso 35, 16146 Genova (ITALY) ([email protected])
† Dipartimento di Energetica, Via Lombroso 6/17, 50100 Firenze; Dipartimento di Informatica, Corso Italia 40, 56100 Pisa (ITALY) ([email protected])
Abstract
We study the optimal and superoptimal Frobenius operators in a general matrix vector space and in particular in the multilevel trigonometric matrix vector spaces, by emphasizing both their algebraic and geometric properties. These general results are used to extend the Korovkin matrix theory for the approximation of block Toeplitz matrices via trigonometric vector spaces. The abstract theory is then applied to the analysis of the approximation properties of several sine and cosine based vector spaces. A few numerical experiments are performed to give evidence of the theoretical results.
Keywords: masking operators, Toeplitz matrices, matrix vector spaces and matrix algebras, Korovkin Theorem. AMS SC: 65F10, 15A30, 15A60, 47B48.
1 Introduction

Multilevel Toeplitz matrices arise in, and have applications to, a wide variety of fields of pure and applied mathematics. We may mention multidimensional Fourier analysis, the numerical treatment of partial differential equations, industrial control theory, applications in Markov chains, multidimensional signal processing and image restoration [40, 25, 23, 24, 17, 16, 14, 10, 8, 18]. Here the Toeplitz matrices $\{A_n(f)\}_n$, generated [17] by a Lebesgue-integrable function $f$, are intended in their most general form, in the sense that the (block) entries of $A_n(f)$ along the $k$-th (block) diagonal are given by the $k$-th Fourier coefficient $A_k$ of a function $f$ acting on $I^p$, $I = [-\pi, \pi]$, and having values in the space $\mathbb{C}^{s \times t}$ of the complex $s \times t$ matrices. In particular, we find in the Toeplitz matrix $p$ structured levels, whose dimensions are represented by the multi-index $n = (n_1, n_2, \ldots, n_p) \in \mathbb{N}^p_+$, so that the inner nonstructured blocks are indexed by subscripts like $k = (k_1, k_2, \ldots, k_p)$ with $k_j \in \{-n_j, \ldots, n_j\}$. More precisely, setting $x = (x_1, x_2, \ldots, x_p)$ and $k \cdot x = k_1 x_1 + \cdots + k_p x_p$, we have

$$[A_n(f)]_{i,j} = A_{i-j}, \qquad A_k = \frac{1}{[2\pi]^p} \int_{I^p} f(x)\, e^{-\mathrm{i}(k \cdot x)}\, dx, \qquad \mathrm{i}^2 = -1. \tag{1}$$

It is understood that the basic blocks of this multilevel structure are given by $\{A_k\}$ with $A_k \in \mathbb{C}^{s \times t}$. To have an idea of the multilevel structure we have to choose an ordering among the indices $\{k_j\}$. The following scheme is the classical one, which is also clearly described in [38]. The matrix $A_n(f)$ has dimension $N(n)s \times N(n)t$, where $N(n) = n_1 n_2 \cdots n_p$, and the symbol $[A_n(f)]_{i,j}$ denotes that
we are selecting the block $(i_1, j_1)$ (which is a $(p-1)$-level Toeplitz matrix); in this block we are selecting the block $(i_2, j_2)$, which is a $(p-2)$-level Toeplitz matrix, and so on. At the end of this process we find that the basic blocks are given by the $\{A_{i-j}\}$ formally determined in equation (1). If the unitary $N(n) \times N(n)$ matrices $U, V$ are chosen as left and right transformations, we may consider the family of $N(n)s \times N(n)t$ matrix spaces which are brought into a block diagonal form with diagonal blocks $\Delta_{j,j} \in \mathbb{C}^{s \times t}$:

$$\{\mathcal{M}(U, V)_n\}_n = \{X = U \Delta V^* \in \mathbb{C}^{N(n)s \times N(n)t}\}_n, \tag{2}$$
where $V^*$ denotes the adjoint of $V$ and $\Delta$ is block diagonal. We then introduce the family of operators $\{P[U, V]_n\}_n$. For any $n \in \mathbb{N}^p_+$, the "optimal" operator $P[U, V]_n : \mathbb{C}^{N(n)s \times N(n)t} \to \mathcal{M}(U, V)_n$ associates with any $N(n)s \times N(n)t$ matrix $A$ the matrix $\hat X$ that minimizes the functional $F_A(X) = \|A - X\|_F$ in the Frobenius norm over the whole space $\mathcal{M}(U, V)_n$: in this way $P[U, V]_n(A_n(f))$ is the matrix where the above defined functional $F_A(X)$, for $A = A_n(f)$, attains its minimum value. Notice that, when $s = t$ and $U = V$, the vector space $\mathcal{M}(U, V)_n$ becomes a Banach matrix algebra. Algebraic, geometric and spectral properties of the operator $P[U, V]_n$ have been established in the literature [7] for the case where $p = s = t = 1$ and $U = V$ is the Fourier matrix associated with the circulant class $\mathcal{M}(U, V)_n$. In order to extend such results to the general case, it is worth assuming that $\Delta$ in (2) is not necessarily block diagonal, but has a generic prescribed sparsity pattern. This is made possible by introducing the notion of a "masking operator" $\phi$, leading to the construction of new spaces $\mathcal{M}(U, V, \phi)_n$ and of new optimal operators $P[U, V, \phi]_n$: in this way we are able to prove general versions of all the properties already known for the one-level circulant scalar case. The same generalization is performed for the "superoptimal" operator $\hat P[U, V, \phi]_n$, which is an alternative approach for approximating a general matrix in the space $\mathcal{M}(U, V, \phi)_n$, introduced in [36] for the scalar circulant class. The theoretical study of the optimal operator allows us to link the Korovkin theory of matrix convergence, recently devised in [26, 31, 28], to the solution of Toeplitz equations by a Preconditioned Conjugate Gradient (PCG) method based on fast trigonometric transforms [9]. In particular, we consider the subset $\mathcal{A}_T$ of all the spaces $\mathcal{M}(U, V, \phi)_n$ whose unitary matrices $U, V$ are related to trigonometric functions. More precisely, we focus our attention on the spaces $\mathcal{M}(U, V, \phi)_n$ for which the adjoints $U^*$ and $V^*$ are tensor products of $p$ Vandermonde-like matrices [13] whose related functions are linearly independent and are evaluated on a quasi-equispaced set of points (the grid points) in a suitable interval $I \subseteq [-\pi, \pi]$. As a case study, we test the effectiveness of several trigonometric spaces related to 8 classical discrete cosine and sine transforms, by proving a convergence theorem and by performing a few numerical tests. We emphasize that our tools can be used independently of the particular trigonometric space in which the multilevel preconditioner is constructed.

The paper is organized as follows. In Section 2 we introduce the basic concepts of masking operators and trigonometric spaces. The main part of the paper is concerned with the optimal operator: we present its theoretical properties in Section 3, we expose the Korovkin theory in Section 4 by showing that some assumptions can be removed from the earlier results in [26, 31, 28], and we perform the case study in Section 5. Section 6 is devoted to the superoptimal operator, where we explain the theoretical and practical reasons for which its numerical performance cannot be expected to be as good as that of the optimal operator. A final section of conclusions ends the paper.
2 Matrix spaces and optimal approximation
Let $U$ and $V$ be unitary complex $N \times N$ matrices; then, for any fixed $N, s, t$, by $\mathcal{M}(U, V)$ we denote the vector space of all the matrices simultaneously block diagonalized by the unitary transforms $U^{(s)} = U \otimes I_s$ and $V^{(t)} = V \otimes I_t$. Here the symbol $*$ means transpose and conjugate. More precisely,

$$\mathcal{M}(U, V) = \{X = U^{(s)} \Delta V^{(t)*} : \Delta = \mathrm{diag}_{j=1,\ldots,N}(\Delta_j)\},$$

$\Delta_j$ being $s \times t$ complex matrices. Some generalizations have been considered in the scalar setting, by modifying the canonical form $\Delta$, which is assumed to be diagonal in the definition of $\mathcal{M}(U, V)$. For example, some authors [4, 19] studied the case where $\Delta$ has the following pattern, with nonzero entries allowed only on the main diagonal and on the main antidiagonal:

$$\Delta = \begin{pmatrix} \delta_1 & & & \eta_1 \\ & \ddots & \iddots & \\ & \iddots & \ddots & \\ \eta_N & & & \delta_N \end{pmatrix}. \tag{3}$$

This choice leads to a space $\mathcal{M}(U, V)$ whose dimension is doubled, and which therefore should give better performance than the ordinary spaces widely used in the literature, like the circulant class. In order to give more generality to our context, we further extend this idea by considering a matrix $\Delta$ with a prescribed sparsity pattern, which can be formalized by introducing the concept of masking operator.
Definition 2.1 Let $\mathcal{A}$ be the linear space of $Ns \times Nt$ complex matrices; a masking operator (MO) is a map $\phi : \mathcal{A} \to \mathcal{A}$ that puts some entries of a generic matrix $X$ to zero, according to the mask $\hat E \in \{0,1\}^{Ns \times Nt}$, in the following way:

$$\phi(X) = X \circ \hat E,$$

where $\circ$ is the Hadamard (componentwise) product.
In order to match the block pattern underlying the matrix spaces we are studying, throughout the paper we will assume that $\hat E$ has the following special structure:

$$\hat E = E \otimes \mathrm{ones}(s, t), \qquad E = (e_{ij}) \in \{0,1\}^{N \times N},$$

where $\mathrm{ones}(s, t)$ denotes the $s \times t$ matrix of all 1's. In this case, if $X$ is block partitioned as $(X_{ij})_{i,j=1}^N$ with $X_{ij} \in \mathbb{C}^{s \times t}$, then the $(i, j)$ block of $\phi(X)$ is $X_{ij}$ itself if $e_{ij} = 1$, and the null matrix if $e_{ij} = 0$. Once the MO $\phi$ has been characterized through the mask $E$, it is natural to define the associated MO's $\phi_s$ and $\phi_t$, acting on square matrices of dimension $Ns$ and $Nt$ respectively, as follows:

$$X \in \mathbb{C}^{Ns \times Ns} \;\Rightarrow\; \phi_s(X) := X \circ (E \otimes \mathrm{ones}(s, s)), \qquad X \in \mathbb{C}^{Nt \times Nt} \;\Rightarrow\; \phi_t(X) := X \circ (E \otimes \mathrm{ones}(t, t)). \tag{4}$$

In the same way, the dual operator $\phi^*$ is defined on a matrix $X \in \mathbb{C}^{Nt \times Ns}$ by setting

$$\phi^*(X) := X \circ (E \otimes \mathrm{ones}(t, s)). \tag{5}$$
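As an illustration of these definitions, the following sketch (ours, in Python with NumPy; all names are hypothetical, not from the paper) implements a masking operator $\phi$ for a given $0$-$1$ mask $E$ and block sizes $s, t$.

```python
import numpy as np

def masking_operator(X, E, s, t):
    """phi(X) = X o E_hat with E_hat = kron(E, ones(s, t)):
    the (i, j) block of X is kept if E[i, j] == 1 and zeroed otherwise."""
    E_hat = np.kron(E, np.ones((s, t)))
    return X * E_hat

# Example: N = 4, s = t = 1, block diagonal mask with two 2x2 blocks of ones
E = np.kron(np.eye(2), np.ones((2, 2)))        # diag(ones(2,2), ones(2,2))
X = np.arange(16.0).reshape(4, 4)
print(masking_operator(X, E, 1, 1))            # off-diagonal 2x2 blocks zeroed
```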
Now we restrict ourselves to a special class of masking operators.

Definition 2.2 A MO $\phi$ is non-degenerate (ND) if its mask $E$ satisfies the following two conditions:

ND1) $I_N \le E$ (that is, $e_{ii} = 1$ for all $i$);
ND2) whenever $e_{lj} = 1$, the $l$-th and $j$-th columns of $E$ coincide.
Definition 2.2 is motivated by the algebraic properties characterized in the next proposition.

Proposition 2.1 Let $\mathcal{A} = \mathbb{C}^{Ns \times Nt}$, $\mathcal{A}^* = \mathbb{C}^{Nt \times Ns}$, and let $\phi$ be a MO on $\mathcal{A}$ with range $R(\phi)$.

1. Define for $X = (X_{ij})_{i,j=1}^N \in \mathcal{A}$ the "block trace" $\mathrm{Trace}(X) = \sum_{i=1}^N X_{ii}$. Then the relation $\mathrm{Trace}(X) = \mathrm{Trace}(\phi(X))$ holds if and only if condition ND1) is true.

2. Let $\phi^*$ be the dual MO related to $\phi$, and let $\phi_s$ and $\phi_t$ be defined as in (4); then, for all $X \in \mathcal{A}^*$ and $\Delta \in R(\phi)$, the relations

$$\phi_t(X\Delta) = \phi^*(X)\,\Delta \qquad \text{and} \qquad \phi_s(\Delta X) = \Delta\,\phi^*(X) \tag{6}$$

hold if and only if condition ND2) is true.
Proof.

1. If ND1) holds, then every diagonal block of $X$ is kept unchanged by the action of $\phi$; therefore the block trace of $\phi(X)$ equals that of $X$. Now assume that $\phi$ preserves the block trace: consider an arbitrary index $k$ and take $X = (X_{ij})$ with $X_{kk} := \mathrm{ones}(s, t)$ and $X_{ij} := 0$ elsewhere; clearly $\mathrm{Trace}(X) = \mathrm{ones}(s, t)$ and $\phi(X)$ has zero diagonal blocks, except for at most the $k$-th one, equal to $e_{kk} X_{kk}$. Since $\mathrm{Trace}(\phi(X)) = e_{kk}\,\mathrm{ones}(s, t) = \mathrm{Trace}(X)$, we get $e_{kk} = 1$, whence condition ND1) follows from the arbitrariness of $k$.

2. The $(i, j)$ block of $\phi_t(X\Delta) - \phi^*(X)\Delta$, for $X = (X_{ij}) \in \mathcal{A}^*$ and $\Delta = (\Delta_{ij}) \in R(\phi)$, is equal to

$$\sum_{l=1}^N X_{il} \Delta_{lj} (e_{ij} - e_{il}). \tag{7}$$

Whenever $e_{lj} = 0$, $\Delta_{lj}$ vanishes by construction and the $l$-th term of the summation (7) is zero. If $e_{lj} = 1$ and condition ND2) holds, then the elements $e_{ij}$ and $e_{il}$ must be equal, and then the $l$-th term in (7) vanishes as well, so that the first equality of (6) holds. For the second equality the proof is quite similar.

Conversely, in order to prove ND2), assume by contradiction that an element $e_{lj}$ is 1 and that columns $l$ and $j$ differ, say, in the $i$-th position. Define $X = E_{il} \otimes \mathrm{ones}(t, s)$, $\Delta = E_{lj} \otimes \mathrm{ones}(s, t)$, where $E_{il}$ (resp. $E_{lj}$) is a matrix of all zeros except for the $(i, l)$ (resp. $(l, j)$) entry, which equals 1. Clearly, $\Delta$ belongs to the range of $\phi$ and a direct calculation yields $X\Delta = s\, E_{ij} \otimes \mathrm{ones}(t, t)$. Now, if $e_{il} = 0$ and $e_{ij} = 1$ then $\phi_t(X\Delta) = X\Delta \ne 0$ while $\phi^*(X)\Delta = 0$ because $\phi^*(X) = 0$; if $e_{il} = 1$ and $e_{ij} = 0$ then $\phi_t(X\Delta) = 0$ while $\phi^*(X)\Delta = X\Delta \ne 0$: hence the columns $l$ and $j$ of $E$ must be equal.
A further motivation for Definition 2.2 is given by the following

Proposition 2.2 If ND2) holds true, then

$$\forall \Delta_1 \in R(\phi),\ \forall \Delta_2 \in R(\phi^*):\quad \Delta_1 \Delta_2 \in R(\phi_s) \ \text{ and } \ \Delta_2 \Delta_1 \in R(\phi_t).$$

In particular, if $s = t$ then $R(\phi)$ is an algebra and, if also ND1) holds true, then the vector space $R(\phi)$ equipped with a natural matrix norm is a Banach algebra.

Proof. $\Delta_1 \Delta_2 \in R(\phi_s)$ if and only if its $(i, j)$ block entry vanishes whenever $e_{ij} = 0$. It is easy to check that such an entry is equal to $\sum_{l=1}^N [\Delta_1]_{il} [\Delta_2]_{lj}$. By definition, the $l$-th term of that summation vanishes unless $e_{il} = e_{lj} = 1$: in this case, condition ND2) yields $e_{ij} = 1$ as well. Hence, for $e_{ij} = 0$ every term must be zero, and this proves that $\Delta_1 \Delta_2 \in R(\phi_s)$; an analogous argument shows that $\Delta_2 \Delta_1 \in R(\phi_t)$. For $s = t$, it is immediate to see that $R(\phi)$ is always a vector space, and we have just proved that it is also closed under multiplication. Moreover, ND1) ensures that $R(\phi)$ contains the identity matrix, whence the space $(R(\phi), \|\cdot\|)$, where $\|\cdot\|$ is any natural matrix norm, is a Banach matrix algebra.

Several practical examples of masking operators fall inside Definition 2.2. The diagonal operator, with $s = t = 1$ and $E = I_N$, is a trivial instance, leading to the space $\mathcal{M}(U, V)$. It can be shown that the definition also applies to block diagonal operators (BDMO), for which the mask $E$ is a block diagonal matrix whose blocks potentially have different sizes. In the particular case where
$$N = qr \qquad \text{and} \qquad E = \mathrm{diag}(\underbrace{\mathrm{ones}(r, r), \ldots, \mathrm{ones}(r, r)}_{q \text{ times}}),$$

we refer to $\phi$ as a balanced block diagonal operator (BBDMO), which will be the main subject of subsequent sections of this paper.

In the next step we want to completely characterize non-degeneracy.

Remark 2.1 If $\phi$ is ND, then its mask $E$ is symmetric. In fact, whenever $e_{ij} = 1$ the columns $i$ and $j$ of $E$ must agree by virtue of ND2); since $e_{jj} = 1$ by condition ND1), the corresponding entry of column $i$ equals 1, that is, $e_{ji} = 1$. The symmetry of $E$ is not a sufficient condition: for example, the symmetric $3 \times 3$ mask

$$E = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$

defines a MO which is not ND.

We are now ready to prove that permuted BDMO's are the only ND masking operators.

Theorem 2.1 A MO is ND if and only if condition ND1) holds and its associated mask $E$ is block diagonal up to a permutation of its rows and columns.
Proof. Consider the directed graph $\mathcal{G}$ associated with a mask $E$. Condition ND2) is equivalent to saying that whenever $\mathcal{G}$ contains an arc from $i$ to $j$, then all the nodes connected to $i$ are connected to $j$ as well (we use here the symmetry of $\mathcal{G}$, induced by that of $E$: this holds both for ND and permuted BDMO's). This in turn is clearly equivalent to assuming that $\mathcal{G}$ is a disjoint collection of subgraphs that are complete (that is, each subgraph contains all the possible arcs joining its nodes). It is now straightforward to verify that the only mask whose graph is complete is the matrix of all 1's, and hence permuted block diagonal masks are characterized by a graph $\mathcal{G}$ having the mentioned property (the permutation is naturally induced by the partitioning of $\mathcal{G}$). We give in the example below an evidence of our argument:
$$E = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \end{pmatrix} = \Pi \begin{pmatrix} \mathrm{ones}(3,3) & 0 \\ 0 & \mathrm{ones}(2,2) \end{pmatrix} \Pi^T, \qquad \Pi = \begin{pmatrix} 1 & & & & \\ & 1 & & & \\ & & 0 & 1 & \\ & & 1 & 0 & \\ & & & & 1 \end{pmatrix},$$

with associated graph $\mathcal{G} = \mathcal{K}_{\{1,2,4\}} \cup \{3 \leftrightarrow 5\}$, where $\mathcal{K}_{\{1,2,4\}}$ denotes the complete graph on the nodes $1, 2, 4$; we have not drawn the loops in $\mathcal{G}$.
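A small checker (ours; the function name is hypothetical) makes Definition 2.2 concrete by testing ND1) and ND2) directly on a $0$-$1$ mask:

```python
import numpy as np

def is_non_degenerate(E):
    """Test conditions ND1) and ND2) of Definition 2.2 on a 0-1 mask E."""
    N = E.shape[0]
    if not np.all(np.diag(E) == 1):              # ND1): e_ii = 1 for all i
        return False
    for l in range(N):                           # ND2): e_lj = 1 implies that
        for j in range(N):                       # columns l and j coincide
            if E[l, j] == 1 and not np.array_equal(E[:, l], E[:, j]):
                return False
    return True

E_bad = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1]])   # mask of Remark 2.1
E_ok = np.zeros((5, 5), dtype=int)
for group in ([0, 1, 3], [2, 4]):                # the example above: K_{1,2,4}
    E_ok[np.ix_(group, group)] = 1               # plus the edge 3 <-> 5
print(is_non_degenerate(E_bad), is_non_degenerate(E_ok))   # False True
```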
From this theorem it is clear that a BBDMO is non-degenerate, and that canonical forms like (3) immediately fall into our generalization. With the new definitions, the matrix space $\mathcal{M}(U, V)$ naturally becomes the following:
$$\mathcal{M}(U, V, \phi) = \{X = U^{(s)} \Delta V^{(t)*} : \Delta \in R(\phi)\} \subseteq \mathcal{A}.$$

This is a linear space, which becomes a Banach matrix algebra for $s = t$, $U = V$ and non-degenerate $\phi$, thanks to Proposition 2.2. In the general non-degenerate case, it is worth pointing out that, if $X \in \mathcal{M}(U, V, \phi)$, then

$$X^* \in \mathcal{M}(V, U, \phi^*), \qquad X^* X \in \mathcal{M}(V, V, \phi_t), \qquad X X^* \in \mathcal{M}(U, U, \phi_s).$$

The optimal preconditioning operator $P_N = P[U, V, \phi]_N$ is defined on $\mathcal{A}$ and takes values in $\mathcal{M}(U, V, \phi)$, where both vector spaces are equipped with the Frobenius norm $\|X\|_F^2 = \sum_{i,j} |x_{i,j}|^2$. Then

$$P_N(A) = \arg\min_{X \in \mathcal{M}(U, V, \phi)} \|A - X\|_F, \tag{8}$$

where the minimum exists and is unique, since $\mathcal{M}(U, V, \phi)$ is a linear finite dimensional space and $(\mathcal{A}, \langle \cdot, \cdot \rangle_F)$ is a Hilbert space with $\langle A, B \rangle_F = \mathrm{tr}(A^* B)$, $\|A\|_F^2 = \langle A, A \rangle_F$, and where $\mathrm{tr}(\cdot)$ is now the ordinary trace.
2.1 Trigonometric matrix spaces and algebras
Here we define a special subset of matrix vector spaces that we call trigonometric matrix spaces and denote by $\mathcal{A}_T$. Let $v = \{v_j^{(n)} : n \in \mathbb{N},\ j = 0, \ldots, n-1\}$ be a sequence of $n$-tuples of trigonometric functions on an interval $I$ and let $S = \{S_m\}$ be a sequence of grids of $m$ points on $I$, namely, $S_m = \{x_i^{(m)},\ i = 0, \ldots, m-1\}$. Let us suppose that the generalized Vandermonde matrix $GV(v, S, m) = \left(v_j^{(m)}(x_i^{(m)})\right)_{i,j=0}^{m-1}$ is a unitary matrix. Then, a vector space of the form $\mathcal{M}(U, V, \phi)_n$ is a trigonometric ($p$-level) matrix space if there exist $2p$ pairs $(v_{i,j}, S_{i,j})$, $i = 1, \ldots, p$, $j = 1, 2$, such that $U = \prod_{i=1}^p GV(v_{i,1}, S_{i,1}, n_i)$ and $V = \prod_{i=1}^p GV(v_{i,2}, S_{i,2}, n_i)$ (here the symbol $\prod$ is understood in the sense of the tensor product). In this way $U$ and $V$ are generalized $p$-dimensional Vandermonde matrices. In addition, given a matrix space $\mathcal{M}(U, V, \phi)_m$ belonging to $\mathcal{A}_T$, we will call it regular if each one of the $2p$ sets of grid points $S_{i,j}$ forms a quasi-uniformly distributed mesh in $I$. For a formal definition of quasi-uniform distribution see the following

Definition 2.3 A sequence of meshes $S = \{S_m\}$, $S_m = \{x_i^{(m)} : i = 0, \ldots, m-1\}$, belonging to an interval $I$ is called quasi-uniform if

$$\sum_{i=1}^{m-1} \left| \frac{|I|}{m} - \left(x_i^{(m)} - x_{i-1}^{(m)}\right) \right| = o(1),$$

with $|I|$ being the width of $I$. If the previous relation holds with $o(1)$ replaced by $O(m^{-1})$, then the sequence $\{S_m\}$ is called uniform.

For $s = t = 1$ and $U = V$, examples of such matrix spaces are the circulant, the Tau and the Hartley classes, for which the matrix $U$ [3] is, respectively,

$$U = F = \frac{1}{\sqrt m}\left(e^{\mathrm{i}\, j x_i^{(m)}}\right),\ i, j = 0, \ldots, m-1, \qquad S_m = \left\{x_i^{(m)} = \tfrac{2\pi i}{m} : i = 0, \ldots, m-1\right\},\ I = [-\pi, \pi];$$

$$U = S = \sqrt{\tfrac{2}{m+1}}\left(\sin((j+1) x_i^{(m)})\right),\ i, j = 0, \ldots, m-1, \qquad S_m = \left\{x_i^{(m)} = \tfrac{(i+1)\pi}{m+1} : i = 0, \ldots, m-1\right\},\ I = [0, \pi];$$

$$U = H = \frac{1}{\sqrt m}\left(\sin(j x_i^{(m)}) + \cos(j x_i^{(m)})\right),\ i, j = 0, \ldots, m-1, \qquad S_m = \left\{x_i^{(m)} = \tfrac{2\pi i}{m} : i = 0, \ldots, m-1\right\},\ I = [-\pi, \pi].$$

For the class of the $\omega$-circulants [11, 1] we may consider the case of $\omega$ on the unit complex circle, because this assures that the matrix $U$ remains unitary. More precisely, if $\omega = e^{\mathrm{i} 2\pi\beta}$ then the matrix $U$ has the following representation:

$$U = F_\omega = \frac{1}{\sqrt m}\left(e^{\mathrm{i}(j+\beta) x_i^{(m)}}\right),\ i, j = 0, \ldots, m-1, \qquad S_m = \left\{x_i^{(m)} = \tfrac{2\pi i}{m} : i = 0, \ldots, m-1\right\},\ I = [-\pi, \pi].$$

Observe that all these matrix spaces are also Banach matrix algebras, trigonometric and regular with uniform meshes. Notice that several examples of general trigonometric matrix spaces can be constructed by using these unitary matrices as basic ingredients. Actually, all possible combinations are allowed. For $s = 2$, $t = 3$, we may define the $p$-level matrix space ($p = 2$) of dimension $2 n_1 n_2 \times 3 n_1 n_2$ by choosing $U^{(s)} = F_{-1}(n_1) \otimes S(n_2) \otimes I_2$ and $V^{(t)} = F_{\sqrt{\mathrm{i}}}(n_1) \otimes H(n_2) \otimes I_3$. In addition, we remark that all the algebras and spaces considered in the literature have a special property concerning the functional set $v = \{v_j^{(n)}\}$. In fact, by taking into account the circulants [11], the $\omega$-circulants with $|\omega| = 1$ [26], the Tau algebra [2], the Hartley class [3] and most of the sine/cosine algebras considered in [20], we observe that, for any $m$, $n$ and any $j \le \min\{n, m\} - 1$, we find a pure constant $c(n, m)$, depending only on $n$ and $m$, such that $v_j^{(n)} = c(n, m)\, v_j^{(m)}$. This implies that, up to suitable scaling factors $\{c(n, m)\}$, the functions of the set $v = \{v_j^{(n)}\}$ are uniquely determined by a unique sequence of functions $\{w_j\}_{j \in \mathbb{N}}$. Finally, notice the similarity between the orthogonality of the matrices $U$ and $V$ and the Gram polynomials [32], defined by using discrete inner products.
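As a sanity check, the sketch below (ours, NumPy) builds the Fourier, Tau (DST-I) and Hartley matrices from the formulas above and verifies numerically that each is unitary.

```python
import numpy as np

def fourier(m):
    # F: entries exp(i j x_k)/sqrt(m) with x_k = 2*pi*k/m (circulant class)
    k, j = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    return np.exp(1j * j * (2 * np.pi * k / m)) / np.sqrt(m)

def tau(m):
    # S: entries sqrt(2/(m+1)) sin((j+1) x_k) with x_k = (k+1)*pi/(m+1) (Tau)
    k, j = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    return np.sqrt(2 / (m + 1)) * np.sin((j + 1) * (k + 1) * np.pi / (m + 1))

def hartley(m):
    # H: entries (sin(j x_k) + cos(j x_k))/sqrt(m) with x_k = 2*pi*k/m
    k, j = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    x = 2 * np.pi * k / m
    return (np.sin(j * x) + np.cos(j * x)) / np.sqrt(m)

for U in (fourier(32), tau(32), hartley(32)):
    print(np.allclose(U.conj().T @ U, np.eye(32)))   # True: U is unitary
```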
3 The optimal operator

In this section we investigate algebraic and geometric properties of the optimal approximation operator, generalizing those proved in [7] for the basic circulant case and in [15] for arbitrary algebras corresponding to the choice $s = t = 1$, $U = V$, $\phi(\cdot) = \mathrm{diag}(\cdot)$.
Theorem 3.1 Let $A, B \in \mathcal{A} = \mathbb{C}^{Ns \times Nt}$, $Y \in \mathcal{A}^* = \mathbb{C}^{Nt \times Ns}$, let $\phi$ be a non-degenerate MO, and let $P_N = P[U, V, \phi]$ be defined as in (8). The following algebraic properties hold:

1. $P_N(A)$ satisfies the explicit formula

$$P_N(A) = U^{(s)} \phi(U^{(s)*} A V^{(t)}) V^{(t)*}; \tag{9}$$

2. $P_N(\alpha A + \beta B) = \alpha P_N(A) + \beta P_N(B)$ with $\alpha, \beta \in \mathbb{C}$;

3. $P_N(A)^* = \tilde P_N(A^*)$, where $\tilde P_N = P[V, U, \phi^*]$ and $\phi^*$ is the dual MO of $\phi$ as in (5);

4. if $s = t$ and $U = V$, $P_N(A)$ is Hermitian if $A$ is;

5. if $U = V$, $\mathrm{Trace}(P_N(A)) = \mathrm{Trace}(A)$ (in the block sense);

6. for every $B \in \mathcal{M}(U, V, \phi)$, the following equalities hold:

$$P[U, U, \phi_s](BY) = B\, \tilde P_N(Y), \qquad P[V, V, \phi_t](YB) = \tilde P_N(Y)\, B,$$

$\phi_s$ and $\phi_t$ being defined as in (4).

Proof. For every matrix of the form $U^{(s)} \Delta V^{(t)*}$ with $\Delta \in R(\phi)$, the Frobenius distance from $A$ is equal to

$$\|A - U^{(s)} \Delta V^{(t)*}\|_F = \|U^{(s)*} A V^{(t)} - \Delta\|_F \tag{10}$$

since $U^{(s)}$ and $V^{(t)}$ are unitary. For the sake of simplicity, define $\hat A := U^{(s)*} A V^{(t)}$; the square of (10) can be written as

$$\sum_{i,j=1}^N \|[\hat A]_{ij} - [\Delta]_{ij}\|_F^2,$$

where we suppose that matrices are partitioned into $N^2$ blocks of size $s \times t$. The summation above can be split as

$$\sum_{e_{ij}=1} \|[\phi(\hat A)]_{ij} - [\Delta]_{ij}\|_F^2 + \sum_{e_{ij}=0} \|[\hat A]_{ij}\|_F^2; \tag{11}$$

in fact, whenever $e_{ij} = 1$ the masking operator preserves the corresponding entry of $\hat A$, while $[\Delta]_{ij}$ is put to zero when $e_{ij} = 0$. Now observe that the second sum in (11) is independent of the choice of $\Delta$, and it agrees with the approximation error; the first sum attains its (zero) minimum value by setting $\Delta = \phi(\hat A)$, which completely characterizes the minimizer $P_N$ according to the formula (9) given in part 1 of the statement.

From this fact we directly have part 2, since the MO $\phi$ is linear with respect to $A$.

Concerning part 3, apply (9) to

$$\tilde P_N(A^*) = V^{(t)} \phi^*(\hat A) U^{(s)*},$$

where now $\hat A := V^{(t)*} A^* U^{(s)} = (U^{(s)*} A V^{(t)})^*$ and

$$\phi^*(\hat A) = \hat A \circ (E \otimes \mathrm{ones}(t, s)) = \left(\hat A^* \circ (E \otimes \mathrm{ones}(s, t))\right)^* = \phi(\hat A^*)^*,$$

since $E$ is symmetric by Remark 2.1. It follows that

$$\tilde P_N(A^*) = \left(U^{(s)} \phi(\hat A^*) V^{(t)*}\right)^* = \left(U^{(s)} \phi(U^{(s)*} A V^{(t)}) V^{(t)*}\right)^* = P_N(A)^*,$$

again in view of formula (9). In the special case $s = t$ and $U = V$, it is immediate to check that $\phi^* = \phi$ and $\tilde P_N = P_N$, and this proves part 4.

In order to prove part 5, define the block trace as $\mathrm{Trace}(A) = \sum_{k=1}^N A_{kk}$, where $A = (A_{ij})_{i,j=1}^N$ and $A_{ij} \in \mathbb{C}^{s \times t}$. For any unitary $N \times N$ matrix $Q$, the $k$-th diagonal block of the matrix $\hat A := (Q \otimes I_s) A (Q \otimes I_t)^*$ is equal to

$$[\hat A]_{kk} = \sum_{i,j=1}^N q_{ki}\, \bar q_{kj}\, A_{ij}, \qquad Q = (q_{ij})_{i,j=1}^N,$$

so that

$$\mathrm{Trace}(\hat A) = \sum_{k,i,j=1}^N q_{ki}\, \bar q_{kj}\, A_{ij} = \sum_{i,j=1}^N A_{ij} \sum_{k=1}^N \bar q_{kj}\, q_{ki} = \sum_{i,j=1}^N A_{ij}\, [Q^* Q]_{ji} = \sum_{i=j} A_{ij} = \mathrm{Trace}(A). \tag{12}$$

Applying this relation to the case $Q = U$, we have

$$\mathrm{Trace}(P_N(A)) = \mathrm{Trace}(\phi(U^{(s)*} A U^{(t)})) = \mathrm{Trace}(U^{(s)*} A U^{(t)}),$$

in the light of (9) and part 1 of Proposition 2.1. A further application of relation (12) to the case $Q = U^*$ yields $\mathrm{Trace}(U^{(s)*} A U^{(t)}) = \mathrm{Trace}(A)$, whence the thesis claimed in part 5.

Finally, if $B \in \mathcal{M}(U, V, \phi)$ then there exists $\Delta \in R(\phi)$ such that $B = U^{(s)} \Delta V^{(t)*}$; thus, by (9), we have

$$P[U, U, \phi_s](BY) = U^{(s)} \phi_s(U^{(s)*} B Y U^{(s)}) U^{(s)*} = U^{(s)} \phi_s(\Delta X) U^{(s)*}$$

and

$$P[V, V, \phi_t](YB) = V^{(t)} \phi_t(V^{(t)*} Y B V^{(t)}) V^{(t)*} = V^{(t)} \phi_t(X \Delta) V^{(t)*},$$

with $X = V^{(t)*} Y U^{(s)}$. By recalling part 2 of Proposition 2.1, we obtain the first relation of part 6:

$$P[U, U, \phi_s](BY) = U^{(s)} \Delta\, \phi^*(X)\, U^{(s)*} = B\, \tilde P_N(Y),$$

having applied (9) to $\tilde P_N$. The same argument proves the second relation.
Lemma 3.1 Let $X \in \mathcal{A}$ and let $\phi$ be a non-degenerate MO. Then the inequality $\sigma_{\max}(\phi(X)) \le \sigma_{\max}(X)$ holds, where $\sigma_{\max}(\cdot)$ denotes the largest singular value.

Proof. Recall that, for every $X \in \mathcal{A}$, $\sigma_{\max}(X)^2 = \lambda_{\max}(X^* X)$. Since $\phi$ is non-degenerate, by Theorem 2.1 there exist a BDMO $\phi_0$ and an $N \times N$ permutation matrix $\Pi$ satisfying the following relation for all $X \in \mathcal{A}$:

$$\phi(X) = \Pi_s\, \phi_0(B)\, \Pi_t^T, \qquad \text{with } B = \Pi_s^T X \Pi_t,\quad \Pi_s = \Pi \otimes I_s,\quad \Pi_t = \Pi \otimes I_t.$$

Thus $\phi(X)^* \phi(X) = \Pi_t\, \phi_0(B)^* \phi_0(B)\, \Pi_t^T$ and $B^* B = \Pi_t^T X^* X \Pi_t$, so that

$$\sigma_{\max}(\phi(X)) = \sigma_{\max}(\phi_0(B)) \qquad \text{and} \qquad \sigma_{\max}(X) = \sigma_{\max}(B). \tag{13}$$

Hence, it suffices to prove the thesis just for block diagonal masking operators. Under this assumption, we have $\phi_0(B) = \mathrm{diag}(B_{11}, \ldots, B_{mm})$, where each $B_{ll}$ is the $l$-th diagonal block of $B$, of size $n_l s \times n_l t$ with $\sum_{l=1}^m n_l = N$ (we are considering a partitioning $B = (B_{ij})_{i,j=1}^m$ with $B_{ij} \in \mathbb{C}^{n_i s \times n_j t}$). It follows that $\phi_0(B)^* \phi_0(B) = \mathrm{diag}(B_{11}^* B_{11}, \ldots, B_{mm}^* B_{mm})$ and

$$\lambda_{\max}(\phi_0(B)^* \phi_0(B)) = \max_{l=1,\ldots,m} \lambda_{\max}(B_{ll}^* B_{ll}) = \lambda_{\max}(B_{kk}^* B_{kk}), \tag{14}$$

for a suitable $k$. By the Cauchy interlace theorem, the largest eigenvalue of the $k$-th diagonal block of $B^* B$ does not exceed $\lambda_{\max}(B^* B)$; this block is equal to

$$[B^* B]_{kk} = \sum_{i=1}^m B_{ik}^* B_{ik} = B_{kk}^* B_{kk} + \sum_{i \ne k} B_{ik}^* B_{ik};$$

since the second term is positive semidefinite, by monotonicity we conclude

$$\lambda_{\max}(B^* B) \ge \lambda_{\max}([B^* B]_{kk}) \ge \lambda_{\max}(B_{kk}^* B_{kk}). \tag{15}$$

Comparing (14) and (15) proves the thesis for a BDMO $\phi_0$ applied to $B$, which is also true for general $X$ and $\phi$ in view of the relations (13).
Theorem 3.2 Let $A \in \mathcal{A}$, let $\phi$ be a non-degenerate MO, and let $P_N = P[U, V, \phi]$. The following geometric properties hold:

1. $\|P_N\| = 1$, with $\|\cdot\|$ the dual 2-norm;
2. $\|P_N\| = 1$, with $\|\cdot\|$ the dual $F$-norm;
3. $\|A - P_N(A)\|_F^2 = \|A\|_F^2 - \|P_N(A)\|_F^2$.

Proof. Since the spectral norm is invariant with respect to unitary transformations, by formula (9) we obtain

$$\|P_N(A)\|_2 = \|U^{(s)} \phi(U^{(s)*} A V^{(t)}) V^{(t)*}\|_2 = \|\phi(U^{(s)*} A V^{(t)})\|_2;$$

by applying Lemma 3.1, it holds that

$$\|P_N(A)\|_2 \le \|U^{(s)*} A V^{(t)}\|_2 = \|A\|_2,$$

and part 1 is proved, since for every $B \in \mathcal{M}(U, V, \phi)$ we have $P_N(B) = B$.

In order to prove parts 2 and 3, it is enough to recall that they hold for any orthogonal projector mapping a Hilbert space onto a finite dimensional linear subspace (part 3 is just an instance of Pythagoras' theorem). It is then evident that $\mathcal{M}(U, V, \phi)$ is a linear subspace (by part 2 of Theorem 3.1) of the finite dimensional Hilbert space $\mathcal{A}$, equipped with the norm $\|\cdot\|_F$ induced by the scalar product $\langle A, B \rangle_F = \mathrm{tr}(A^* B)$; hence $P_N$ can be viewed as the orthogonal projector of $\mathcal{A}$ onto $\mathcal{M}(U, V, \phi)$.
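Part 3, the Pythagoras identity, is immediate to verify numerically; a small check (ours, same conventions as the previous sketches):

```python
import numpy as np

n = 16
kk, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(2j * np.pi * kk * jj / n) / np.sqrt(n)
A = np.random.randn(n, n)
P = F @ np.diag(np.diag(F.conj().T @ A @ F)) @ F.conj().T   # optimal circulant
lhs = np.linalg.norm(A - P) ** 2                 # ||A - P_N(A)||_F^2
rhs = np.linalg.norm(A) ** 2 - np.linalg.norm(P) ** 2
print(abs(lhs - rhs))                            # ~ 1e-12 (Pythagoras identity)
```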
The following result is also interesting.

Theorem 3.3 If $s = t$, $U = V$ and $A$ is Hermitian, then the eigenvalues of $P_N(A)$ are contained in the closed real interval $[\lambda_{\min}(A), \lambda_{\max}(A)]$ containing all the eigenvalues of $A$. Moreover, when $A$ is positive definite, $P_N(A)$ is positive definite as well.

Proof. Since $P_N(A) = U^{(s)} \phi(U^{(s)*} A U^{(s)}) U^{(s)*}$ by virtue of (9), the spectrum of $P_N(A)$ is the same as that of $\phi(\hat A)$, where $\hat A := U^{(s)*} A U^{(s)}$ is in turn similar to $A$. Furthermore, non-degeneracy of $\phi$ implies that $\phi$ is a BDMO up to a permutation; hence we can assume, without loss of generality, that $\phi$ is block diagonal. It is now straightforward to observe that

$$\lambda_{\min}(\phi(\hat A)) = \lambda_{\min}(\hat A_{ll}), \qquad \lambda_{\max}(\phi(\hat A)) = \lambda_{\max}(\hat A_{kk}),$$

where $\hat A_{ll}$, $\hat A_{kk}$ are suitable diagonal blocks of $\hat A$. By the Courant-Fischer theorem, the spectrum of any principal submatrix of $\hat A$ lies in the interval $[\lambda_{\min}(\hat A), \lambda_{\max}(\hat A)] = [\lambda_{\min}(A), \lambda_{\max}(A)]$; the application of this result to $\hat A_{ll}$ and $\hat A_{kk}$ completes the proof.

Notice that the operator $P_N$ is always linear. The property of preserving positivity (as stated in the latter theorem) and other good properties hold only when $U = V$.
4 Some Korovkin-type theorems

This section can be divided into two parts. In the first one we recall some results of approximation theory based on the concept of linear positive operators, in the spirit of the Korovkin theorems [21]. The second part is devoted to the analysis of the clustering properties of Toeplitz matrices preconditioned by optimal operators. These spectral results strongly use the algebraic and geometric properties of the optimal operator and the Korovkin theory, and can also be interpreted and explained in the Korovkin language. Throughout this section the MO is supposed to be diagonal and therefore we omit it, i.e., we set $P[U, U, \phi] = P[U, U]$: the general case is sketched at the end of the section.
4.1 Approximation theory premises
Since the goodness of the optimal preconditioners is, roughly speaking, decided by the behaviour of an operator [31] over the grid points, we are motivated to introduce notions of convergence on discrete sets.
Definition 4.1 Let $f : I \to X$, where $X = (X, \|\cdot\|)$ is a normed vector space. Let $\{f_n\}_n$ be a sequence of functions belonging to $C(I, X)$ and $\{S_n\}_n \subseteq I$ a sequence of grids.

We say that $f_n$ uniformly converges to $f$ on $\{S_n\}$ if

$$\lim_{n \to \infty} \sup_{x \in S_n} \|f_n(x) - f(x)\| = 0.$$

The convergence is called pointwise if for any fixed sequence $\{x_n\}$ such that $x_n \in S_n$ the relation

$$\lim_{n \to \infty} \|f_n(x_n) - f(x_n)\| = 0$$

holds true.

The convergence is $O(1)$-uniform (over $S_n$) if

$$\lim_{n \to \infty} \sup_{x \in S_n \setminus S(J_n)} \|f_n(x) - f(x)\| = 0,$$

where $S(J_n)$ is a set of points of $S_n$, associated with a set of indices $J_n$ whose cardinality is bounded by an absolute constant.

The convergence is $o(n)$-uniform (over $S_n$) if

$$\lim_{n \to \infty} \sup_{x \in S_n \setminus S(J_n)} \|f_n(x) - f(x)\| = 0,$$

where $S(J_n)$ is a set of points of $S_n$ whose cardinality is $o(n)$.

In the case where $f$ acts on $I^p$, the definitions above have a natural extension by considering $n = (n_1, \ldots, n_p)$: the grid $S_n$ then consists of $N(n) = n_1 \cdots n_p$ points and $J_n$ is a set of multi-indices. The expression $n \to \infty$ denotes that $n_j \to \infty$ for every $j \in \{1, \ldots, p\}$, and in the last convergence type $o(n)$ is replaced by $o(N(n))$.
Now let us introduce the following definition.
Definition 4.2 [31] Let $\mathcal{G}$ be the linear space $\left(C(I^p, \mathbb{C}^{s \times t}), \|\cdot\|_\infty\right)$ of the continuous (periodic) functions defined on $I^p$, and let $\{\Phi_n\}$ be a sequence of linear operators on $\mathcal{G}$. If $\{q_i\}_{i=1}^3$ is the set of the three test functions $1$, $\sin(x)$, $\cos(x)$, and $E_{j,k}$ is the matrix of the canonical basis of $\mathbb{C}^{s \times t}$ having 1 in the position $(j, k)$ and zero otherwise, let $\hat q_{i,j,k,l}(x) = E_{j,k}\, q_i(x_l)$ for $i = 1, 2, 3$, $(j, k) \in \{1, \ldots, s\} \times \{1, \ldots, t\}$, $l \in \{1, \ldots, p\}$. We say that "$\{\Phi_n\}_n$ satisfies the Korovkin test" if $\Phi_n(\hat q_{i,j,k,l})$ uniformly converges to $\hat q_{i,j,k,l}$, according to one of the notions given in Definition 4.1.
It is worth pointing out that in some applications (like the discretization of elliptic boundary value problems (BVPs) [5, 29]) we encounter inherently symmetric Toeplitz matrices which are generated by even trigonometric functions ($f(x) = f(-x)$). In this case, in all the Korovkin-style theorems, the set $\{1, \sin(x), \cos(x)\}$ ($2\pi$-periodic case) cannot be used, since $\sin(x)$ is not even, and must be replaced by another Chebyshev set [21]. A canonical proposal is given by $\{1, \cos(x), \cos(2x)\}$ ($2\pi$-periodic, even case). Moreover, we notice that for $p = s = t = 1$ the latter test reduces to the classical Korovkin test [21] in the trigonometric scalar case. In the following we will consider the linear operator
$$L_n[U](f) : x \in I^p \mapsto ([u^{(n)}](x) \otimes I_s)\, A_n(f)\, ([u^{(n)}](x) \otimes I_t)^* \in \mathbb{C}^{s \times t}. \tag{16}$$

Here $[u^{(n)}](x)$ is the generic row of $U$, where the grid points have been replaced by the continuous variable $x = (x_1, \ldots, x_p)$.
Remark 4.1 It can be seen that $L_n[U](f)$ is nothing other than the continuous expression of the $s \times t$ diagonal blocks of $U^{(s)*} A_n(f) U^{(t)}$; hence, the singular values of $P[U, U]_n(A_n(f))$ are given by those of $L_n[U](f)$ evaluated at the grid points.
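For instance, in the scalar circulant case ($p = s = t = 1$) the eigenvalues of the optimal preconditioner are exactly the values of $L_n[F](f)$ at the grid points; a small numerical check (ours, under the same NumPy conventions as above) reads:

```python
import numpy as np

n = 64
# one-level Toeplitz matrix generated by f(x) = 2 - 2cos(x)
# (Fourier coefficients: A_0 = 2, A_{+1} = A_{-1} = -1, all others zero)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

kk, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(2j * np.pi * kk * jj / n) / np.sqrt(n)

# the diagonal of F* A F collects the values of L_n[F](f) at the grid points
# x_i = 2*pi*i/n, i.e. the eigenvalues of the optimal circulant P[F, F]_n(A)
L_vals = np.real(np.diag(F.conj().T @ A @ F))
x = 2 * np.pi * np.arange(n) / n
print(np.max(np.abs(L_vals - (2 - 2 * np.cos(x)))))   # about 2/n: O(1/n)
```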
Now we analyze the problem of convergence with regard to the concepts of $O(1)$-uniform convergence and $o(N(n))$-uniform convergence, by stating a specialized version of the Korovkin theorem.
Theorem 4.1 Let $\mathcal{G}$ be the functional space of Definition 4.2 and suppose that the linear operators $L_n[U] : \mathcal{G} \to \mathcal{G}$, given by (16), satisfy the Korovkin test on a given sequence of grids $\{S_n\} \subseteq I^p$, with the exception of the points belonging to $S(J_n)$. Then, for any continuous function $f \in \mathcal{G}$, $L_n[U](f)$ uniformly converges to $f$ on the same sequence of grids $\{S_n\}$, with the exception of the same points. The same statement holds if "uniform convergence" is replaced by "pointwise convergence".
Proof. It is enough to observe that $\{S_n \setminus S(J_n)\}$ is a new sequence of grids of $I^p$. Therefore the claimed thesis is just an application of the generalized Korovkin theorem proved in [31, Theorem 4.3] (see this reference for details). We remark that the related convergence is $O(1)$-uniform or $o(N(n))$-uniform according to the cardinality of $S(J_n)$: as usual in the Korovkin results, the convergence must be checked only on the test functions. Finally, we stress that the Korovkin test also gives quantitative information about the convergence speed of the process $L_n[U](\cdot)$ over the polynomials of fixed degree.
Theorem 4.2 [26, 31] Let $\{\hat q_{i,j,k,l}\}$ be the test functions introduced in Definition 4.2 and let $\epsilon_n = o(1)$. If

$$\sup_{x \in S_n \setminus S(J_n)} \|L_n[U](\hat q_{i,j,k,l})(x) - \hat q_{i,j,k,l}(x)\| = O(\epsilon_n),$$

then, for any trigonometric polynomial $q$ of fixed degree (independent of $n$), we find

$$\sup_{x \in S_n \setminus S(J_n)} \|L_n[U](q)(x) - q(x)\| = O(\epsilon_n).$$
4.2 Other technical premises
Now, since we are interested in the study of the approximation of $\{A_n(f)\}$ by $\{P[U, U]_n(A_n(f))\}$, we have to introduce some notions of convergence for sequences of matrices.
Definition 4.3 Given two sequences of matrices $\{A_n\}$ and $\{B_n\}$ of dimension $m \times n$ with $m \ge n$, we say that "$\{A_n\}$ and $\{B_n\}$ (strongly) converge to each other" if, for any $\epsilon > 0$, there exists $\bar n$ such that, for $n \ge \bar n$, $A_n - B_n$ has singular values in $[0, \epsilon)$ except for a constant number $N_\epsilon$ of outliers; we say that "$\{A_n\}$ and $\{B_n\}$ weakly converge to each other" if there are $N_\epsilon = o(n)$ outliers. When the number $N_\epsilon$ is bounded by a constant which does not depend on $\epsilon$, we say that the convergence is also uniform.

The following result, due to Tyrtyshnikov, gives a criterion to establish whether convergence occurs.
Lemma 4.1 [38] Let $\{A_n\}$, $\{B_n\}$ be two sequences of $m \times n$ complex matrices with $m \ge n$. If $\|A_n - B_n\|_F^2 = O(1)$, then we have convergence in the strong sense. Otherwise, if $\|A_n - B_n\|_F^2 = o(n)$, then the convergence is weak.
The following result, of Weierstrass type, is useful to prove the main Theorems 4.5 and 4.6.
Theorem 4.3 [31] Let $f$ be a continuous periodic function belonging to $C(I^p, \mathbb{C}^{s \times t})$. Then $\{P[U, V]_n(A_n(f))\}$ and $\{A_n(f)\}$ strongly (or weakly) converge to each other if the same convergence condition holds for all the trigonometric polynomials $q$.
Another technical premise concerns some ergodic results regarding the asymptotic distribution of the eigenvalues/singular values when we associate with a family of Toeplitz matrices a functional symbol $f$.
Theorem 4.4 [33] Let $f \in L^2(I^p, \mathbb{C}^{s \times t})$ and let $\{\sigma_i^{(n)}\}$ be the singular values of $A_n(f)$. Then, for any continuous function $F$ with bounded support, we find the following asymptotic formula (the Szego-Tilli relation):

$$\lim_{n \to \infty} \frac{1}{N(n)} \sum_{i=1}^{N(n)\min\{s,t\}} F(\sigma_i^{(n)}) = \frac{1}{[2\pi]^p} \int_{I^p} \sum_{j=1}^{\min\{s,t\}} F(\sigma_j(f(x)))\, dx. \tag{17}$$

The boundedness assumption on the support of $F$ can be removed if $f$ is bounded or if $F$ is nonnegative.
Now we are ready to state a convergence theory for Toeplitz matrices.
4.3 Korovkin-type results for Toeplitz matrices
First we want to study when the singular values (eigenvalues in the positive definite case) of $P[U, U]_n(A_n(f))$ are in some sense close to the grid values of $f$ for $n$ going to infinity. With the help of the previous results, we can show three Korovkin-style theorems. In [31] they are proved in the case of uniform convergence without exceptions.
Theorem 4.5 Let $f \in C(I^p, \mathbb{C}^{s \times t})$ be a continuous periodic function. If $L_n[U](q) = q + \epsilon_n(q)$ for each of the test functions $q$, with $\epsilon_n$ going uniformly to zero over the grid points $\{x_i^{(n)}\}_i$ of the algebra except for a set of multi-indices $J_n$ such that $\#(J_n) = o(N(n))$, then $\{P[U, U]_n(A_n(f))\}$ converges to $\{A_n(f)\}$ in the weak sense.
Proof. Setting for brevity $P_n = P[U, U]_n$, from identity 3 in Theorem 3.2, for any trigonometric polynomial $q$ we have

$$0 \le \|A_n(q) - P_n(A_n(q))\|_F^2 = \|A_n(q)\|_F^2 - \|P_n(A_n(q))\|_F^2. \tag{18}$$

From the uniform convergence of $L_n[U](q)$ to $q$ on the test functions and over the meshes $\{S_n \setminus S(J_n)\}$, we obtain the same convergence property (over the same sets) for any trigonometric polynomial of fixed degree: this is clearly a consequence of Theorem 4.1. Therefore

$$\|A_n(q) - P_n(A_n(q))\|_F^2 = \|A_n(q)\|_F^2 - \sum_i \|\Delta_i(U^{(s)*} A_n(q) U^{(t)})\|_F^2,$$

where $\Delta_i(\cdot)$ stands for the $i$-th $s \times t$ diagonal block. The last expression, in view of Remark 4.1, coincides with

$$\|A_n(q)\|_F^2 - \sum_i \|q(x_i^{(n)}) + \epsilon_n(q)(x_i^{(n)})\|_F^2.$$

Now, from the definition of the Frobenius norm, we find that

$$\|A_n(q)\|_F^2 = \sum_{i=1}^{N(n)\min\{s,t\}} \sigma_i(A_n(q))^2.$$

The preceding relation is very interesting because, after division by $N(n)$, it coincides with the sum appearing in the left-hand side of the Szego-Tilli relation (see Theorem 4.4) with $F(t) = t^2$. Then, by applying the quoted result, we find

$$\|A_n(q)\|_F^2 = N(n)\, \frac{1}{[2\pi]^p} \int_{I^p} \sum_{j=1}^{\min\{s,t\}} \sigma_j(q(x))^2\, dx + o(N(n)). \tag{19}$$

In addition, by exploiting the $o(N(n))$-convergence of $L_n[U](q)$ to $q$, we may conclude that

$$\|P_n(A_n(q))\|_F^2 = \sum_i \|q(x_i^{(n)}) + \epsilon_n(q)(x_i^{(n)})\|_F^2 = \sum_i \|q(x_i^{(n)})\|_F^2 + o(N(n)). \tag{20}$$

Indeed, we notice that the quantity $\|\epsilon_n(q)(x_i^{(n)})\|_F^2$ is infinitesimal for any $i$ except $i \in J_n$. But from identity 2 in Theorem 3.2 we deduce that the 2-norm of $P_n(A_n(q))$ is bounded by the norm of $A_n(q)$, and from the Szego theory we find that $\|A_n(q)\|_2 \le \|q\|_\infty$, which is an absolute constant. It follows that over $J_n$ the quantity $\|q(x_i^{(n)}) + \epsilon_n(q)(x_i^{(n)})\|_F^2$ is in general not infinitesimal, but is bounded by the constant $\min\{s,t\}\, \|q\|_\infty^2$. As a consequence of (20), by virtue of the quasi-uniform distribution of the grid points $\{x_i^{(n)}\}$ (recall [31, Lemma 2.6]), we arrive at

$$\|P_n(A_n(q))\|_F^2 = N(n)\, \frac{1}{[2\pi]^p} \int_{I^p} \|q(x)\|_F^2\, dx + o(N(n)), \tag{21}$$

where

$$\|q(x)\|_F^2 = \sum_{j=1}^{\min\{s,t\}} \sigma_j^2(q(x)).$$

The combination of equations (18), (19) and (21), in the light of the powerful Lemma 4.1, allows one to state the weak convergence of $\{P_n(A_n(q))\}$ to $\{A_n(q)\}$ for trigonometric polynomials. But, noticing that this is the assumption of the Weierstrass-type Theorem 4.3, the weak convergence is proved for every continuous function.
Theorem 4.6 Under the same assumptions of the previous Theorem 4.5 with $p = 1$, if

$$\epsilon_n(q) = \sup_{x \in S_n \setminus S(J_n)} \|L_n[U](q)(x) - q(x)\| = O(n^{-1}), \qquad \#(J_n) = O(1)$$

for the functions of the Korovkin test, and if the grid points of the algebra are uniformly distributed, then $\{P[U, U]_n(A_n(f))\}$ converges to $\{A_n(f)\}$ in the strong sense.
Proof. We follow the same proof as given for Theorem 4.5. In particular, in all the equations (19), (20) and (21) the terms $o(N(n))$ are replaced by terms of constant order. In equation (19) we exploit [31, Proposition 8.1]. For the relation (20), Theorem 4.2 is used, while for equation (21) we need the uniform distribution (see [31, Lemma 2.6]) instead of the quasi-uniform one. Finally, Lemma 4.1 and the strong case of the Weierstrass-type Theorem are invoked.

We observe that the latter two results have not only a theoretical appeal but also a practical application, since in Section 5 we study some algebras for which there exists a nontrivial set of exceptional indices $J_n$ of cardinality 1 or 2. Moreover, we remark that if $f \in C(I^p, \mathbb{C}^{s \times t})$ with $p > 1$, then the presence of more than one variable leads to a deterioration of the approximation of the Toeplitz matrices. This is due to the diminished precision of the Riemann sum considered in the proof of [31, Lemma 2.6]. At this point, if $p > 1$ then we lose the strong convergence. All this is stated in the next theorem.
Theorem 4.7 Under the same assumptions of the previous Theorem 4.6 with $p > 1$, we assume that $\epsilon_n(\hat q_{i,j,k,l}) = O(n_l^{-1})$ for the test polynomials $\hat q_{i,j,k,l}$, on the mesh points of $U$ with the exception of $O(N(n) \sum_{l=1}^p n_l^{-1})$ grid points. If the grid points of the algebra are uniformly distributed, then the convergence is weak and we cannot guarantee that the number of outlying singular values is less than $O(N(n) \sum_{l=1}^p n_l^{-1})$.
From the point of view of the applications, the latter result gives a complete explanation (see also [26]) of the fact that we do not find in the relevant literature "optimal" PCG methods (in the sense that they require only a constant number of iterations) based on the optimal Frobenius approximation in the case where $f$ is multivariate, that is, in the $p$-level, $p > 1$, block Toeplitz case. This partial negative result, holding for the optimal operator, has been fully generalized in [34], where it is shown that no superlinear/linear preconditioners can exist in a given matrix algebra/matrix vector space for the preconditioning of multilevel Toeplitz structures.
4.4 Further extensions
Here we generalize the Korovkin matrix theory to the case where the MO is not in diagonal form. If $U$, $s$ and $t$ are fixed, then we can define a partial ordering among the algebras $\{\mathcal{M}(U, U, \phi)\}$ with regard to the usual set inclusion. It is easy to verify that $\mathcal{M}(U, U, \phi) \subseteq \mathcal{M}(U, U, \hat\phi)$ if the graph associated with $\phi$ is a subgraph of the one related to $\hat\phi$. In this case the first algebra is a subalgebra of the second one. With regard to this partial ordering, the algebra $\mathcal{M}(U, U)$, where the MO is the diagonal one, is a minimal element. The following results then have a trivial proof.
Theorem 4.8 Let $C(I^p, \mathbb{C}^{s \times t})$ be the space of the continuous periodic functions and let $\mathcal{M}(U, U, \phi) \subseteq \mathcal{M}(U, U, \hat\phi)$. If $\{P[U, U, \phi]_n(A_n(f))\}$ converges to $\{A_n(f)\}$ in the weak/strong sense, then the sequences $\{P[U, U, \hat\phi]_n(A_n(f))\}$ and $\{A_n(f)\}$ converge to each other in the weak/strong sense.

Proof. It is enough to notice that

$$\|A_n(f) - P[U, U, \hat\phi]_n(A_n(f))\|_F \le \|A_n(f) - P[U, U, \phi]_n(A_n(f))\|_F,$$

since $\mathcal{M}(U, U, \phi) \subseteq \mathcal{M}(U, U, \hat\phi)$.

Therefore, with regard to Theorems 4.5, 4.6 and 4.7, we have that if the assumptions of one of them are fulfilled, then the corresponding thesis follows with $\{P[U, U]_n(A_n(f))\}$ replaced by $\{P[U, U, \phi]_n(A_n(f))\}$.
5 A case study: some trigonometric matrix spaces

In this section we analyze some trigonometric matrix algebras introduced in [20]. We first consider the simplest case, in which $p = s = t = 1$: this means that we consider one-level scalar Toeplitz matrices. In a second part we briefly deal with the general case. We remark that the more classical algebras, i.e., circulants, Tau and Hartley, have been analyzed in [26, 31].
5.1 The one-level scalar case
We introduce the one-level scalar trigonometric algebras
$$\mathcal{M}(U, U) = \{X = U \Delta U^* : \Delta = \mathrm{diag}_{j=1,\ldots,n}(\delta_j),\ \delta_j \in \mathbb{C}\},$$

where the transform $U$ is one of those listed in Table 1.

Table 1. Discrete trigonometric transform matrices $U$.

DCT-I: $C_n^I = \sqrt{\tfrac{2}{n-1}}\left[g(j,k)\cos\tfrac{kj\pi}{n-1}\right]_{k,j=0}^{n-1}$, inverse $[C_n^I]^T = C_n^I$;
DCT-II: $C_n^{II} = \sqrt{\tfrac{2}{n}}\left[\eta_k\cos\tfrac{k(2j+1)\pi}{2n}\right]_{k,j=0}^{n-1}$, inverse $[C_n^{II}]^T = C_n^{III}$;
DCT-III: $C_n^{III} = \sqrt{\tfrac{2}{n}}\left[\eta_j\cos\tfrac{(2k+1)j\pi}{2n}\right]_{k,j=0}^{n-1}$, inverse $[C_n^{III}]^T = C_n^{II}$;
DCT-IV: $C_n^{IV} = \sqrt{\tfrac{2}{n}}\left[\cos\tfrac{(2k+1)(2j+1)\pi}{4n}\right]_{k,j=0}^{n-1}$, inverse $[C_n^{IV}]^T = C_n^{IV}$;
DST-I: $S_n^I = \sqrt{\tfrac{2}{n+1}}\left[\sin\tfrac{kj\pi}{n+1}\right]_{k,j=1}^{n}$, inverse $[S_n^I]^T = S_n^I$;
DST-II: $S_n^{II} = \sqrt{\tfrac{2}{n}}\left[\eta_k\sin\tfrac{k(2j-1)\pi}{2n}\right]_{k,j=1}^{n}$, inverse $[S_n^{II}]^T = S_n^{III}$;
DST-III: $S_n^{III} = \sqrt{\tfrac{2}{n}}\left[\eta_j\sin\tfrac{(2k-1)j\pi}{2n}\right]_{k,j=1}^{n}$, inverse $[S_n^{III}]^T = S_n^{II}$;
DST-IV: $S_n^{IV} = \sqrt{\tfrac{2}{n}}\left[\sin\tfrac{(2k-1)(2j-1)\pi}{4n}\right]_{k,j=1}^{n}$, inverse $[S_n^{IV}]^T = S_n^{IV}$.

Here $\eta_0 = \eta_n = \tfrac{1}{\sqrt 2}$, $\eta_k = 1$ for $k = 1, 2, \ldots, n-1$, and $g(j, k) = \eta_k\, \eta_{n-1-k}\, \eta_j\, \eta_{n-1-j}$. We notice that the meshes related to these transforms, whose expressions are reported in the next table, are uniform, and therefore the associated algebras are regular.

Table 2. Discrete transform meshes.

DCT-I: $S_n = \left\{x_i^{(n)} = \tfrac{i\pi}{n-1} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DCT-II: $S_n = \left\{x_i^{(n)} = \tfrac{(i+1/2)\pi}{n} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DCT-III: $S_n = \left\{x_i^{(n)} = \tfrac{i\pi}{n} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DCT-IV: $S_n = \left\{x_i^{(n)} = \tfrac{(i+1/2)\pi}{n} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DST-I: $S_n = \left\{x_i^{(n)} = \tfrac{(i+1)\pi}{n+1} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DST-II: $S_n = \left\{x_i^{(n)} = \tfrac{(i+1/2)\pi}{n} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DST-III: $S_n = \left\{x_i^{(n)} = \tfrac{(i+1)\pi}{n} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$;
DST-IV: $S_n = \left\{x_i^{(n)} = \tfrac{(i+1/2)\pi}{n} : i = 0, \ldots, n-1\right\}$, $I = [0, \pi]$.
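The inverse relations in Table 1 are easy to verify numerically; a sketch (ours, NumPy, assuming the endpoint scalings $\eta_0 = \eta_n = 1/\sqrt 2$ as reconstructed above) for the DCT-II/DCT-III pair:

```python
import numpy as np

def dct2(n):
    # DCT-II as in Table 1: sqrt(2/n) * eta_k * cos(k (2j+1) pi / (2n))
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    eta = np.where(k == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / n) * eta * np.cos(k * (2*j + 1) * np.pi / (2*n))

def dct3(n):
    # DCT-III as in Table 1: sqrt(2/n) * eta_j * cos((2k+1) j pi / (2n))
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    eta = np.where(j == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / n) * eta * np.cos((2*k + 1) * j * np.pi / (2*n))

n = 32
C2, C3 = dct2(n), dct3(n)
print(np.allclose(C2.T, C3))                 # [C_n^II]^T = C_n^III
print(np.allclose(C2 @ C2.T, np.eye(n)))     # each transform is orthogonal
```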
By virtue of the Korovkin-style theorems, in order to check the convergence of $\{P[U, U]_n(A_n(f))\}$ to $\{A_n(f)\}$, where $f$ ranges over the continuous periodic functions, we have to perform the Korovkin test. We remark that all these algebras are inherently real and symmetric, so the test functions are given by the set $\{1, \cos(x), \cos(2x)\}$.

First of all, we define the functions $\{v_j^{(n)}\}_j$ which determine the generic row $v^{(n)}$ of the matrix $U$: we stress that, except at most the first and last functions and at most the first and the last point of the grid $S_n$, all the functions related to the "cosine" or "sine" algebras are of the form

$$v_j^{(n)}(x) = \sqrt{\frac{2}{n+\beta_1}}\, \cos((j+\beta_2)x) \qquad \text{or} \qquad v_j^{(n)}(x) = \sqrt{\frac{2}{n+\beta_1}}\, \sin((j+\beta_2)x), \tag{22}$$

respectively, where $\beta_1 \in \{-1, 0, 1\}$ and $\beta_2 \in \{0, 1/2, 1\}$ according to the following table.

Table 3. Values of the parameters in (22).

DCT-I: $\beta_1 = -1$, $\beta_2 = 0$;    DST-I: $\beta_1 = 1$, $\beta_2 = 1$;
DCT-II: $\beta_1 = 0$, $\beta_2 = 0$;    DST-II: $\beta_1 = 0$, $\beta_2 = 1$;
DCT-III: $\beta_1 = 0$, $\beta_2 = 1/2$;    DST-III: $\beta_1 = 0$, $\beta_2 = 1/2$;
DCT-IV: $\beta_1 = 0$, $\beta_2 = 1/2$;    DST-IV: $\beta_1 = 0$, $\beta_2 = 1/2$.
5.2 The Korovkin test
Now we are ready to perform the Korovkin test for these algebras. Let us study the behaviour of $\epsilon_n(q) = v^{(n)} A_n(q) [v^{(n)}]^T - q$ over the grid points of the algebra, with $q = 1$, $q = \cos(x)$ and $q = \cos(2x)$. Evidently, $A_n(1)$ is the identity, and so, due to the fact that $U$ is unitary, we have $v^{(n)} A_n(q) [v^{(n)}]^T - q \equiv 0$ over the grid points. So the first real check concerns the function $\cos(x)$ and the $\{v_j^{(n)}(x)\}$ of the form (22):

$$\epsilon_n(2\cos(x)) = \frac{2}{n+\beta_1} \sum_{j=0}^{n-1} \cos^2((j+\beta_2)x)\, 2\cos(x) - 2\cos(x) + O(n^{-1}) = \left[\frac{2}{n} \sum_{j=0}^{n-1} \cos^2((j+\beta_2)x)\, 2\cos(x)\right](1 + O(n^{-1})) - 2\cos(x) + O(n^{-1})$$

for the "cosine" algebras, and

$$\epsilon_n(2\cos(x)) = \frac{2}{n+\beta_1} \sum_{j=0}^{n-1} \sin^2((j+\beta_2)x)\, 2\cos(x) - 2\cos(x) + O(n^{-1}) = \left[\frac{2}{n} \sum_{j=0}^{n-1} \sin^2((j+\beta_2)x)\, 2\cos(x)\right](1 + O(n^{-1})) - 2\cos(x) + O(n^{-1})$$

for the "sine" algebras. At this point we just have to prove that $C_n(x, \beta_2) := \frac{2}{n} \sum_{j=0}^{n-1} \cos^2((j+\beta_2)x)$ and $S_n(x, \beta_2) := \frac{2}{n} \sum_{j=0}^{n-1} \sin^2((j+\beta_2)x)$ satisfy the relations

$$C_n(x, \beta_2) = 1 + O(n^{-1}), \qquad S_n(x, \beta_2) = 1 + O(n^{-1}). \tag{23}$$

Since $S_n(x, \beta_2) + C_n(x, \beta_2) = 2$ for every common value of $\beta_2$, it suffices to prove only the first relation of (23) for $\beta_2 = 0, 1/2, 1$, even though this expression is not always meaningful in our context. We have

$$C_n(x, \beta_2) = \frac{2}{n} \sum_{j=0}^{n-1} \left(\frac{e^{\mathrm{i}(j+\beta_2)x} + e^{-\mathrm{i}(j+\beta_2)x}}{2}\right)^2 = \frac{1}{2n} \sum_{j=0}^{n-1} \left(e^{2\mathrm{i}(j+\beta_2)x} + e^{-2\mathrm{i}(j+\beta_2)x} + 2\right)$$
$$= 1 + \frac{1}{2n} \sum_{j=0}^{n-1} \left(e^{2\mathrm{i}\beta_2 x}\, e^{2\mathrm{i}jx} + e^{-2\mathrm{i}\beta_2 x}\, e^{-2\mathrm{i}jx}\right) = 1 + \frac{1}{2n} \left(e^{2\mathrm{i}\beta_2 x}\, \frac{e^{2\mathrm{i}nx} - 1}{e^{2\mathrm{i}x} - 1} + e^{-2\mathrm{i}\beta_2 x}\, \frac{e^{-2\mathrm{i}nx} - 1}{e^{-2\mathrm{i}x} - 1}\right),$$

provided that $x \ne 0, \pi$. To prove that $C_n(x, \beta_2) = 1 + O(n^{-1})$, we now exploit the knowledge of the expression of the grid points. In the case where $x = x_i^{(n)}$ is of the form $\frac{i\pi}{n}$ (DCT-III) or $\frac{(i+1)\pi}{n}$ (DST-III), we find that $e^{2\mathrm{i}nx} = 1$ and therefore $C_n(x, \beta_2) = 1$. When $x = x_i^{(n)} = \frac{(i+1/2)\pi}{n}$ (DCT-II, DCT-IV, DST-II and DST-IV), we have $e^{2\mathrm{i}nx} = -1$ and then

$$C_n(x, \beta_2) = 1 - \frac{1}{n}\left(\frac{e^{2\mathrm{i}\beta_2 x}}{e^{2\mathrm{i}x} - 1} + \frac{e^{-2\mathrm{i}\beta_2 x}}{e^{-2\mathrm{i}x} - 1}\right) = \begin{cases} 1 + \frac{1}{n} & \text{if } \beta_2 = 0, \\ 1 & \text{if } \beta_2 = 1/2, \\ 1 - \frac{1}{n} & \text{if } \beta_2 = 1. \end{cases}$$

Finally, for the transform DCT-I, we find that $\beta_2 = 0$, $x = x_i^{(n)} = \frac{i\pi}{n-1}$ and therefore $e^{2\mathrm{i}nx} = e^{2\mathrm{i}x}$, so that

$$C_n(x, 0) = 1 + \frac{1}{2n}\left(\frac{e^{2\mathrm{i}x} - 1}{e^{2\mathrm{i}x} - 1} + \frac{e^{-2\mathrm{i}x} - 1}{e^{-2\mathrm{i}x} - 1}\right) = 1 + \frac{1}{n}.$$

For DST-I, we remark that $\beta_2 = 1$, $x = x_i^{(n)} = \frac{(i+1)\pi}{n+1}$ and therefore $e^{2\mathrm{i}nx} = e^{-2\mathrm{i}x}$, so that

$$C_n(x, 1) = 1 + \frac{1}{2n}\left(\frac{e^{2\mathrm{i}x}(e^{-2\mathrm{i}x} - 1)}{e^{2\mathrm{i}x} - 1} + \frac{e^{-2\mathrm{i}x}(e^{2\mathrm{i}x} - 1)}{e^{-2\mathrm{i}x} - 1}\right) = 1 - \frac{1}{n}.$$

The last test concerns $\cos(2x)$. The interesting fact is that the computations previously done can be exploited again. For $x$ belonging to the grid points, with at most the exception of the first one and of the last one, we find that

$$\epsilon_n(2\cos(2x)) = \frac{2}{n+\beta_1} \sum_{j=0}^{n-1} \cos^2((j+\beta_2)x)\, 2\cos(2x) - 2\cos(2x) + O(n^{-1}) = \left[\frac{2}{n} \sum_{j=0}^{n-1} \cos^2((j+\beta_2)x)\, 2\cos(2x)\right](1 + O(n^{-1})) - 2\cos(2x) + O(n^{-1}) = O(n^{-1})$$

for the "cosine" algebras, and

$$\epsilon_n(2\cos(2x)) = \frac{2}{n+\beta_1} \sum_{j=0}^{n-1} \sin^2((j+\beta_2)x)\, 2\cos(2x) - 2\cos(2x) + O(n^{-1}) = \left[\frac{2}{n} \sum_{j=0}^{n-1} \sin^2((j+\beta_2)x)\, 2\cos(2x)\right](1 + O(n^{-1})) - 2\cos(2x) + O(n^{-1}) = O(n^{-1})$$

for the "sine" algebras.

To be precise, the Korovkin test fails only for $x_0^{(n)}$ and $x_{n-1}^{(n)}$ in the case of the transform DCT-I, and for $x_0^{(n)}$ in the case of the transforms DCT-III and DST-III. Therefore, in these three cases, the set $J_n$ has cardinality 1 or 2, that is $O(1)$, and so the strong convergence property holds for all the 8 "cosine" or "sine" algebras and for any continuous function $f$ (Theorem 4.6).
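A quick numerical confirmation of (23) (ours): on the DCT-II grid with $\beta_2 = 0$ the deviation of $C_n$ from 1 is exactly $1/n$, while on the DCT-III grid (excluding $x_0 = 0$) it vanishes.

```python
import numpy as np

def C_n(x, beta2, n):
    # C_n(x, beta2) = (2/n) * sum_{j=0}^{n-1} cos^2((j + beta2) x)
    j = np.arange(n)
    return 2.0 / n * np.sum(np.cos((j + beta2) * x) ** 2)

n = 100
x_dct2 = (np.arange(n) + 0.5) * np.pi / n        # DCT-II grid, beta2 = 0
x_dct3 = np.arange(1, n) * np.pi / n             # DCT-III grid, beta2 = 1/2
print(max(abs(C_n(x, 0.0, n) - 1) for x in x_dct2))   # = 1/n
print(max(abs(C_n(x, 0.5, n) - 1) for x in x_dct3))   # ~ 1e-16
```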
5.3 The multivariate trigonometric block case

Here we consider the case where $f \in C(I^p, \mathbb{C}^{s \times t})$. We want to approximate the Toeplitz matrices $\{A_n(f)\}$ by using multilevel block matrix spaces

$$\mathcal{M}(U, U)_n = \{X = U^{(s)} \Delta U^{(t)*} \in \mathbb{C}^{N(n)s \times N(n)t}\}, \qquad s \le t,$$

where $\Delta$ is a block diagonal matrix and where $U^{(m)} = U_{n_1} \otimes U_{n_2} \otimes \cdots \otimes U_{n_p} \otimes I_m$. Here the dimension is $N(n) = n_1 \cdots n_p$ and each unitary matrix $U_{n_j} \in \mathbb{C}^{n_j \times n_j}$ is one of the 8 sine/cosine transforms displayed at the beginning of this section. If we want to apply the Korovkin test, following the indications of Theorems 4.5-4.7, it seems that all the $(2p+1) \cdot s \cdot t$ test functions $\hat q_{i,j,k,l}$ of Definition 4.2 should be considered. However, in [31, Section 6.1] it is proved that it suffices to reduce the test to the functions used in the scalar case. Therefore, in the light of the results of Subsection 5.2, we can state the following general proposition, which holds for every space $\mathcal{M}(U, U, \phi)_n$ for which the Korovkin test is satisfied.
Proposition 5.1 Let us assume that $f \in C(I^p, \mathbb{C}^{s \times t})$ with $s \le t$, that the minimal singular value $\sigma_1(f)$ of $f$ is strictly positive for every $x$, and that $\phi$ is a non-degenerate MO.

If $p = 1$, then for any $\epsilon > 0$ and for $n$ large enough, $(P[U, U, \phi]_n(A_n(f)))^+ A_n(f)$ has singular values in $(1-\epsilon, 1+\epsilon)$ except, at most, $N_\epsilon = O(1)$ outliers.

If $p > 1$, then for any $\epsilon > 0$ and for $n$ large enough, $(P[U, U, \phi]_n(A_n(f)))^+ A_n(f)$ has singular values in $(1-\epsilon, 1+\epsilon)$ except, at most,

$$N_\epsilon = O\left(N(n) \sum_{i=1}^p n_i^{-1}\right)$$

outliers.
Proof. The statement is a direct consequence of the success of the Korovkin tests and of Theorems 4.6 and 4.7 in the case of the usual diagonal BBDMO. In fact, in the light of Theorems 4.6, 4.7 and 4.8, for any positive $\epsilon$ we have that

$$P[U, U, \phi]_n(A_n(f)) - A_n(f) = L_{\epsilon,n} + R_{\epsilon,n},$$

where $\|L_{\epsilon,n}\|_2 < \epsilon$ and $\mathrm{rank}(R_{\epsilon,n}) = O(t_n)$, with $t_n = 1$ if $p = 1$ and $t_n = N(n) \sum_{i=1}^p n_i^{-1}$ in the general case. Now, since $L_n[U](f)$ converges to $f$ with at most $O(t_n)$ exceptional points, by the use of the SVD, for every positive $\epsilon$ we have that

$$(P[U, U, \phi]_n(A_n(f)))^+ = Q_{\epsilon,n} + K_{\epsilon,n},$$

where, for $n$ large enough, the maximal singular value of $Q_{\epsilon,n}$ is less than the supremum over $x \in I^p$ of $[\sigma_1(f)]^{-1} + \epsilon$, and $\mathrm{rank}(K_{\epsilon,n}) = O(t_n)$. It follows that

$$(P[U, U, \phi]_n(A_n(f)))^+ A_n(f) = (P[U, U, \phi]_n(A_n(f)))^+ P[U, U, \phi]_n(A_n(f)) - (Q_{\epsilon,n} + K_{\epsilon,n})(L_{\epsilon,n} + R_{\epsilon,n}).$$

But

$$(P[U, U, \phi]_n(A_n(f)))^+ P[U, U, \phi]_n(A_n(f)) = I + X_n,$$

where $\mathrm{rank}(X_n) = O(t_n)$, and so the claimed thesis now follows from the uniform boundedness of $Q_{\epsilon,n}$.

In the case where $f$ is just square integrable, $\sigma_1(f)$ is sparsely vanishing [37], and the relation

$$\|P[U, U]_n(A_n(f))\|_F^2 = (1 + o(1))\, \frac{N(n)}{[2\pi]^p} \int_{I^p} \|f\|_F^2 \tag{24}$$

holds true, then for any $\epsilon > 0$ and for $n$ large enough, $(P[U, U, \phi]_n(A_n(f)))^+ A_n(f)$ has singular values in $(1-\epsilon, 1+\epsilon)$ except, at most, $N_\epsilon = o(N(n))$ outliers. This claim can be proved by using the tools introduced in [31, 28] when the MO is balanced block diagonal. If equation (24) holds, then by part 3 of Theorem 3.2 and by equation (17) in Theorem 4.4 with $F(t) = t^2$, we infer that

$$\|A_n(f) - P[U, U]_n(A_n(f))\|_F^2 = o(N(n)). \tag{25}$$

This relation, joined with Lemma 4.1, allows one to say that the singular values of $P[U, U]_n(A_n(f))$ distribute like $f$ and to write that

$$A_n(f) - P[U, U]_n(A_n(f)) = L_{\epsilon,n} + N_{\epsilon,n}$$

holds for any $\epsilon > 0$ with $\|L_{\epsilon,n}\|_2 \le \epsilon$ and $\mathrm{rank}(N_{\epsilon,n}) = o(N(n))$. Moreover, for any positive $\delta$, equation (25) and the fact that $\sigma_1(f)$ is sparsely vanishing imply that

$$(P[U, U]_n(A_n(f)))^+ = Q_{\delta,n} + K_{\delta,n},$$

where, for $n$ large enough, the maximal singular value of $Q_{\delta,n}$ is bounded by $\delta^{-1}$ and $\mathrm{rank}(K_{\delta,n}) \le \delta N(n)$. Finally, by direct computation, for any $\epsilon > 0$ and $\delta > 0$, we find that the matrix $(P[U, U]_n(A_n(f)))^+ A_n(f)$ can be written as the identity, plus a term having rank bounded by $o(N(n)) + \delta N(n)$, plus a term of norm bounded by $\epsilon \delta^{-1}$. By choosing $\epsilon$ as a suitable function of $\delta$ (e.g., $\epsilon = \delta\sqrt\delta$, so that the norm term is $\sqrt\delta$), and owing to the arbitrariness of $\delta$, we arrive to prove that the matrices $\{(P[U, U]_n(A_n(f)))^+ A_n(f)\}_n$ have weakly clustered spectra (compare also [39, Theorem 2]). Finally, since $\mathcal{M}(U, U)$ is a subalgebra of $\mathcal{M}(U, U, \phi)$, the proof is easily extended to the preconditioned matrices $\{(P[U, U, \phi]_n(A_n(f)))^+ A_n(f)\}_n$.

So the crucial point is to fulfill relation (24). In the case where the matrix $U$ is related to the $\omega$-Fourier transforms with $|\omega| = 1$, to the Hartley transform or to the transform DST-I (the Tau algebra), this fact has been proved in [12]. The same can be easily proved for the other sine/cosine based transforms by using the explicit expression of the coefficients of the optimal approximation to $A_n(f)$.
5.4 Numerical Experiments
Here we just want to give numerical evidence of the clustering behaviour of the singular values, stated in Proposition 5.1, and consequently of the superlinear convergence rate attained by a PCG-like method based on the trigonometric spaces analyzed so far. Therefore we consider two examples of real matrix-valued generating functions $f(x)$ of one variable, leading to one-level block Toeplitz matrices $A_n(f)$ with nonstructured rectangular blocks. For several increasing values of $n$, we then perform the following calculations.
1. Construction of the preconditioner $P[U, U]_n(A_n(f))$, where $U$ is the DCT-III transform as in Table 1. We made this choice because this particular transform has received some attention in the context of image deblurring [6].

2. Computation of the singular values of the preconditioned matrix

$$(P[U, U]_n(A_n(f)))^+ A_n(f), \tag{26}$$

which are expected to have a proper cluster at 1 by Proposition 5.1. In fact, a neighborhood of center 1 and decreasing radius $r_n$ is detected, containing all the singular values corresponding to the size $n$ with the exception of a constant number of outliers.

3. Solution of the normal equations related to the least-squares problem

$$\min_x \|A_n(f)x - b\|_2, \tag{27}$$

where $b$ is the vector of all ones, by means of a PCG method with starting vector of all zeros and $P[U, U]_n(A_n(f))$ as a preconditioner. The iterations stop when the normal equation residual norm is reduced by a factor of $10^{-12}$.

All the experiments are done in MATLAB on a PC 486.
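For illustration, here is a minimal sketch (ours, in Python/NumPy rather than the original MATLAB; the Fourier-coefficient truncation is a hypothetical choice, just to generate test data) of the construction behind the examples below: it assembles $A_n(f)$, forms the DCT-III based optimal preconditioner, and inspects the singular values of (26).

```python
import numpy as np

def block_toeplitz(coeffs, n):
    """A_n(f) from Fourier coefficients: coeffs maps k to the (s x t) block A_k."""
    s, t = next(iter(coeffs.values())).shape
    A = np.zeros((n * s, n * t))
    for i in range(n):
        for j in range(n):
            if i - j in coeffs:
                A[i*s:(i+1)*s, j*t:(j+1)*t] = coeffs[i - j]
    return A

def dct3(n):
    # DCT-III as in Table 1
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    eta = np.where(j == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / n) * eta * np.cos((2*k + 1) * j * np.pi / (2*n))

# Example 1: f(x) = [1 + x^2; |x|(2 - 2cos x)], s = 2, t = 1
n, s, t = 64, 2, 1
xs = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f = np.stack([1 + xs**2, np.abs(xs) * (2 - 2*np.cos(xs))])
coeffs = {k: (f * np.exp(-1j*k*xs)).mean(axis=1).real.reshape(s, t)
          for k in range(-16, 17)}                    # truncated Fourier series
A = block_toeplitz(coeffs, n)

U = dct3(n)
Us, Ut = np.kron(U, np.eye(s)), np.kron(U, np.eye(t))
Delta = (Us.T @ A @ Ut) * np.kron(np.eye(n), np.ones((s, t)))   # diagonal MO
P = Us @ Delta @ Ut.T                                           # P[U, U]_n(A_n(f))
sv = np.linalg.svd(np.linalg.pinv(P) @ A, compute_uv=False)
print(np.sort(sv)[-4:])          # a couple of outliers; the bulk clusters at 1
```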
Example 1. $s = 2$, $t = 1$ and $f : I \to \mathbb{R}^{2 \times 1}$ is such that

$$f(x) = \begin{pmatrix} 1 + x^2 \\ |x|(2 - 2\cos x) \end{pmatrix}.$$

Since the first component of $f$ is strictly positive, the hypothesis of Proposition 5.1 ($\sigma_1(f) > 0$) is fulfilled. For every $n$, the preconditioned matrix (26) has 2 singular values $\sigma^{(1)}$ and $\sigma^{(2)}$ far from 1, and the remaining ones lie in an $r_n$-neighborhood of 1, as reported below.

n       | 16     | 32     | 64     | 128    | 256    | 512
σ^(1)   | 2.02   | 2.16   | 2.24   | 2.27   | 2.29   | 2.29
σ^(2)   | 1.99   | 2.16   | 2.24   | 2.27   | 2.29   | 2.29
r_n     | 0.2964 | 0.1682 | 0.0884 | 0.0575 | 0.0456 | 0.0402

The PCG method applied to the least-squares problem (27) achieves the desired precision $10^{-12}$ after $k_n$ iterations, where $k_n$ tends to decrease with respect to $n$.

n   | 16 | 32 | 64 | 128 | 256 | 512
k_n | 10 | 15 | 13 | 12  | 10  | 9
Example 2. $s = 3$, $t = 2$ and $f : I \to \mathbb{R}^{3 \times 2}$ is such that

$$f(x) = \begin{pmatrix} 1 + x^2 & |x| \\ 2 - 2\cos x & |x|(2 - 2\cos x) \\ x^2 & 1 + x^4 \end{pmatrix}.$$

The minimal singular value of $f$ is strictly positive, since the minor of $f$ consisting of the first two rows vanishes only for $x = 0$, and for this value rows 1 and 3 form an invertible submatrix. Hence Proposition 5.1 holds true, and we find 6 "outliers" $\sigma^{(j)}$, $j = 1, \ldots, 6$, and all the remaining singular values close to 1 within the distance $r_n$.

n              | 16         | 32         | 64         | 128        | 256
σ^(1), σ^(2)   | 5.10, 4.90 | 6.82, 6.72 | 8.41, 8.38 | 9.60, 9.59 | 10.35, 10.35
σ^(3), σ^(4)   | 1.78, 1.72 | 1.92, 1.90 | 2.02, 2.01 | 2.08, 2.08 | 2.11, 2.11
σ^(5), σ^(6)   | 0.35, 0.34 | 0.43, 0.42 | 0.46, 0.46 | 0.48, 0.48 | 0.49, 0.49
r_n            | 0.6232     | 0.4673     | 0.3050     | 0.1794     | 0.0983

Also in this case we observe a superlinear convergence rate of the PCG method in reaching the desired precision.

n   | 16 | 32 | 64 | 128 | 256
k_n | 22 | 29 | 27 | 22  | 20
6 The superoptimal operator

The optimal choice is the simplest way of defining a projection operator mapping an arbitrary matrix $A$ into the trigonometric linear space $\mathcal{M}(U, V, \phi)$; it is based on the notion of "matrix distance" induced by the Frobenius norm. Since the performance of a square preconditioner $P$ essentially depends on the spectral behaviour of the matrices $P^{-1}A$ or $AP^{-1}$, some authors investigated other projection operators for which $P^{-1}A$ is forced to approximate the identity matrix [19]. Here we want to extend and briefly discuss the most popular definition, introduced for the (scalar) circulant class by Tyrtyshnikov under the name of superoptimal operator [36].

Definition 6.1 The superoptimal operator $\hat P[U, V, \phi]$ maps a matrix $A \in \mathbb{C}^{Ns \times Nt}$ to a matrix $P \in \mathcal{M}(U, V, \phi)$ such that its Moore-Penrose inverse $P^+$ solves the minimization problem

$$\min_{X \in \mathcal{M}(V, U, \phi^*)} \|AX - I\|_F, \tag{28}$$

$I$ being the $Ns \times Ns$ identity matrix.

It is implicitly assumed that for $X \in \mathcal{M}(V, U, \phi^*)$ the Moore-Penrose inverse $X^+$ belongs to $\mathcal{M}(U, V, \phi)$. This is true for non-degenerate masking operators, as the reader can easily check by using the block diagonality of $\phi$ (up to permutation). As for the optimal operator, it is possible to give an explicit expression for the construction of $\hat P[U, V, \phi]$.
Theorem 6.1 Assume that is ND and A has full rank with s t. Then the solution of the problem (28) has the expression P + = (P [U; U; t](AA))?1 (P [U; V; ](A)) ; where P means the optimal approximation and t is de ned in (4).
Proof.
First, observe that the statement makes sense because the matrix A A is Hermitian positive de nite under our hypotheses, so that the invertibility of P [U; U; t](AA) is ensured by Theorem 3.3. For all X 2 M(V; U; ) there exists 2 R( ) such that X = V (t)U (s). Since V and U are unitary, the function to be minimized in (28) can be written as follows: kAX ? I kF = kAV (t)U (s) ? I kF = kA^ ? I kF 23
where $\hat{A} = (U^{(s)})^* A V^{(t)}$. Hence
$$\|AX - I\|_F^2 = \mathrm{tr}\bigl[(\hat{A}\Delta - I)^*(\hat{A}\Delta - I)\bigr] = \mathrm{tr}\bigl[I - \Delta^*\hat{A}^* - \hat{A}\Delta + \Delta^*\hat{A}^*\hat{A}\Delta\bigr],$$
where tr(·) is the ordinary trace. By property ND1), for every matrix Y ∈ C^{Ns×Ns} we have tr(Y) = tr(σ_s(Y)); moreover, the relation
$$\sigma_s\bigl(I - \Delta^*\hat{A}^* - \hat{A}\Delta + \Delta^*\hat{A}^*\hat{A}\Delta\bigr) = I - \Delta^*\sigma(\hat{A})^* - \sigma(\hat{A})\Delta + \Delta^*\sigma_t(\hat{A}^*\hat{A})\Delta$$
can be proved by a repeated application of Proposition 2.1. Problem (28) then reduces to minimizing
$$\mathrm{tr}\bigl(I - \Delta^*\sigma(\hat{A})^* - \sigma(\hat{A})\Delta + \Delta^*\sigma_t(\hat{A}^*\hat{A})\Delta\bigr) \qquad (29)$$
over Δ ∈ R(σ^T). By making use of Theorem 2.1, we may find an N × N permutation matrix Π such that
$$\Delta = (\Pi\otimes I_t)\,\mathrm{diag}(\delta_1,\dots,\delta_N)\,(\Pi^T\otimes I_s),$$
$$\sigma(\hat{A}) = (\Pi\otimes I_s)\,\mathrm{diag}(\gamma_1,\dots,\gamma_N)\,(\Pi^T\otimes I_t), \qquad (30)$$
$$\sigma_t(\hat{A}^*\hat{A}) = (\Pi\otimes I_t)\,\mathrm{diag}(\beta_1,\dots,\beta_N)\,(\Pi^T\otimes I_t),$$
where {δ_k}, {γ_k} and {β_k} are suitable t × s, s × t and t × t matrices, respectively. Moreover, the positive definiteness of $\hat{A}^*\hat{A}$ (induced by that of A^*A) implies that each β_k is positive definite as well. The trace function (29) can thus be expressed in the following form:
$$\sum_{k=1}^N \mathrm{tr}\bigl(I_s - \gamma_k\delta_k - \delta_k^*\gamma_k^* + \delta_k^*\beta_k\delta_k\bigr), \qquad (31)$$
and we should take the minimum with respect to the δ_k's. Notice that the traces are mutually independent, so we may consider the generic k-th term separately in order to compute the minimum, omitting the subscript k for simplicity. A simple computation shows that
$$\gamma\beta^{-1}\gamma^* - \gamma\delta - \delta^*\gamma^* + \delta^*\beta\delta = (\beta\delta - \gamma^*)^*\,\beta^{-1}\,(\beta\delta - \gamma^*), \qquad (32)$$
where β is Hermitian positive definite. Hence the right-hand side of (32) is congruent to β^{-1} and therefore its trace is nonnegative. Since the trace is linear, we obtain for the k-th term of (31)
$$\mathrm{tr}(I - \gamma\delta - \delta^*\gamma^* + \delta^*\beta\delta) = \mathrm{tr}(I - \gamma\beta^{-1}\gamma^*) + \mathrm{tr}\bigl[(\beta\delta - \gamma^*)^*\beta^{-1}(\beta\delta - \gamma^*)\bigr] \ \ge\ \mathrm{tr}(I - \gamma\beta^{-1}\gamma^*),$$
where equality is attained by choosing δ = β^{-1}γ^*. It follows that (31) is minimized by taking δ_k = β_k^{-1}γ_k^* for all k, which yields $\Delta = \sigma_t(\hat{A}^*\hat{A})^{-1}\sigma(\hat{A})^*$ after the use of relations (30). This equality is equivalent to
$$X = \bigl(P[U,U,\sigma_t](A^*A)\bigr)^{-1}\bigl(P[U,V,\sigma](A)\bigr)^*,$$
thanks to the explicit expression of the optimal preconditioner given in Theorem 3.1.
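In the scalar circulant case (p = s = t = 1, U = V the Fourier matrix, σ the diagonal mask) the construction above reduces to Tyrtyshnikov's original recipe [36]: per Fourier mode, δ_k = conj(γ_k)/β_k with γ_k = (F^*AF)_{kk} and β_k = (F^*A^*AF)_{kk}. The following sketch (an illustration under these assumptions; the test matrix and size are arbitrary) computes these quantities explicitly and cross-checks each δ_k against the scalar least-squares problem it solves.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # unitary Fourier matrix
A = rng.standard_normal((n, n))               # generic full-rank test matrix

gamma = np.diag(F.conj().T @ A @ F)                       # gamma_k = (F* A F)_kk
beta = np.diag(F.conj().T @ (A.conj().T @ A) @ F).real    # beta_k = (F* A*A F)_kk > 0
delta = gamma.conj() / beta                   # minimizer delta_k = beta_k^{-1} gamma_k*

# The superoptimal preconditioner itself: P = X^{-1} = F diag(beta/conj(gamma)) F*.
P_super = F @ np.diag(beta / gamma.conj()) @ F.conj().T

# Cross-check: since ||A X - I||_F = ||A F diag(delta) - F||_F, each delta_k
# solves the scalar least-squares problem min_d || d (A f_k) - f_k ||_2.
for k in range(n):
    g = A @ F[:, k]
    d = (g.conj() @ F[:, k]) / (g.conj() @ g)
    assert np.allclose(d, delta[k])
```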
Corollary 6.1 Under the same assumptions as in Theorem 6.1, if U = V, s = t and A is Hermitian positive definite, then P̂[U, V, σ](A) is Hermitian symmetrizable and positive definite.
Proof.
In our setting σ = σ_t and, in view of Theorem 6.1, the Moore-Penrose inverse of P̂ = P̂[U, V, σ](A) is P^+ = P(A^2)^{-1} P(A), where P(·) = P[U, V, σ](·). By Theorem 3.1, part 4, and Theorem 3.3, P(A) and P(A^2) are Hermitian positive definite. Therefore both P̂ and P^+ = P̂^{-1} are Hermitian symmetrizable and positive definite.
6.1 Why "superoptimal" is not so "good"
Although it is designed to improve the spectral behaviour of the preconditioned matrix, the superoptimal operator turns out to perform poorly when compared with the optimal one: see, for example, the experiments reported in [35, 15, 27], especially those involving ill-conditioned Toeplitz matrices. From Tyrtyshnikov we heard a "philosophical" explanation of the weakness of the superoptimal preconditioner: "the Best is the enemy of the Good". Here we try to substantiate this claim by discussing which algebraic/geometric/spectral properties are shared by the two operators considered in this paper, and which ones fail to be satisfied by P̂_N = P̂[U, V, σ].

More precisely, we have already proved algebraic properties that are the analogues of parts 1 and 4 of Theorem 3.1; without difficulty, it is also possible to show algebraic properties of P̂_N very similar to parts 3 and 6 of the same theorem. On the other hand, the most relevant algebraic properties hold for P_N but not for P̂_N: the latter operator is not linear (part 2 of Theorem 3.1) and does not preserve the block trace (part 5). Likewise, we cannot prove the analogue of Theorem 3.2 on the geometrical behaviour of the superoptimal operator. This means that a Korovkin-type theory is hardly extendable to P̂_N, since it relies heavily on linearity and on Pythagoras' theorem (part 3 of Theorem 3.2). Concerning the spectral properties, the counterpart of Theorem 3.3 holds for P̂_N too, but a very different interval contains the eigenvalues of the superoptimal preconditioner.

Theorem 6.2 If s = t, U = V and A is Hermitian positive definite, then the eigenvalues of P̂_N(A) are contained in the closed real interval
$$\bigl[\lambda_{\min}(A)^2/\lambda_{\max}(A),\ \lambda_{\max}(A)^2/\lambda_{\min}(A)\bigr].$$
In particular, the condition number of P̂_N(A) is bounded by the cube of the condition number of A.
Proof. As we have seen in Corollary 6.1, the superoptimal preconditioner of A has the expression P̂_N(A) = P(A)^{-1} P(A^2), where P(·) is the optimal operator. The claim is then a straightforward consequence of the relations
$$\lambda\bigl(P(A^2)\bigr) \subseteq [\lambda_{\min}(A)^2,\ \lambda_{\max}(A)^2], \qquad \lambda\bigl(P(A)\bigr) \subseteq [\lambda_{\min}(A),\ \lambda_{\max}(A)]$$
(ensured by Theorem 3.3) and of the Courant-Fischer minimax characterization.

In view of the previous theorem, the conditioning of P̂_N(A) may be much worse than that of A, and this may explain the poor performance of the superoptimal preconditioner when applied to ill-conditioned Toeplitz matrices. On the other hand, we recall that a clustering result, proved in [36, 15] for the best-known matrix algebras, holds in the case where A is a Toeplitz matrix whose condition number stays bounded as the dimension increases. In this case the two preconditioners considered in this paper are quite comparable, in full agreement with the theory developed in [30].
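The bound of Theorem 6.2 is easy to check numerically. The sketch below works in the scalar circulant setting with an arbitrary Hermitian positive definite test matrix; the explicit Fourier matrix makes it O(n^3), so it is for illustration only. It builds P̂_N(A) = P(A)^{-1} P(A^2) and verifies that its (real, by Corollary 6.1) eigenvalues fall in the stated interval.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
F = np.fft.fft(np.eye(n)) / np.sqrt(n)       # unitary Fourier matrix

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                  # arbitrary Hermitian positive definite matrix

def P_opt(M):
    # Optimal circulant approximation of M: keep the diagonal of F* M F.
    return F @ np.diag(np.diag(F.conj().T @ M @ F)) @ F.conj().T

P_hat = np.linalg.solve(P_opt(A), P_opt(A @ A))   # superoptimal P(A)^{-1} P(A^2)
ev = np.sort(np.linalg.eigvals(P_hat).real)       # symmetrizable, so real eigenvalues

w = np.linalg.eigvalsh(A)
lmin, lmax = w[0], w[-1]
assert ev[0] >= lmin**2 / lmax - 1e-8             # lower end of the interval
assert ev[-1] <= lmax**2 / lmin + 1e-8            # upper end of the interval
```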
7 Conclusions

A general theory for the approximation of Toeplitz structures has been introduced and discussed. It has also been shown how to use the Korovkin-based matrix theory in practice for devising effective superlinear preconditioners. We therefore think that this contribution can serve as a guiding tool for checking the goodness of the "optimal" approach in specific matrix vector spaces, which can be defined "ad hoc" for a wide variety of specific applications.
References
[1] D. Bini, "Parallel solution of certain Toeplitz linear systems", SIAM J. Comput., 13 (1984), pp. 268–276.
[2] D. Bini, M. Capovani, "Spectral and computational properties of band symmetric Toeplitz matrices", Linear Algebra Appl., 52/53 (1983), pp. 99–126.
[3] D. Bini, P. Favati, "On a matrix algebra related to the discrete Hartley transform", SIAM J. Matrix Anal. Appl., 14 (1993), pp. 500–507.
[4] N. Bonanni, Proprietà spettrali e computazionali di algebre di matrici. Graduate Thesis in Computer Science, University of Pisa, 1993.
[5] R.H. Chan, T.F. Chan, "Circulant preconditioners for elliptic problems", J. Numer. Linear Algebra Appl., 1 (1992), pp. 77–101.
[6] R.H. Chan, T.F. Chan, C. Wong, "Cosine transform based preconditioners for total variation minimization problems in image processing", Iterative Methods in Linear Algebra, II, V3, IMACS Series in Computational and Applied Mathematics, Proceedings of the Second IMACS International Symposium on Iterative Methods in Linear Algebra, Bulgaria, June 1995, pp. 311–329.
[7] R.H. Chan, X. Jin, M.C. Yeung, "The circulant operator in the Banach algebra of matrices", Linear Algebra Appl., 149 (1991), pp. 41–53.
[8] R.H. Chan, J. Nagy, R. Plemmons, "Circulant preconditioned Toeplitz least squares iterations", SIAM J. Matrix Anal. Appl., 15 (1994), pp. 80–97.
[9] R.H. Chan, M. Ng, "Conjugate gradient methods for Toeplitz systems", SIAM Rev., 38 (1996), pp. 427–482.
[10] R.H. Chan, P. Tang, "Fast band-Toeplitz preconditioners for Hermitian Toeplitz systems", SIAM J. Sci. Comp., 15 (1994), pp. 164–171.
[11] P. Davis, Circulant Matrices. John Wiley and Sons, New York, 1979.
[12] F. Di Benedetto, S. Serra Capizzano, "A unifying approach to abstract matrix algebra preconditioning", TR nr. 338, Dept. of Mathematics - Univ. of Genova (1997).
[13] W. Gautschi, "The condition of Vandermonde-like matrices involving orthogonal polynomials", Linear Algebra Appl., 52/53 (1983), pp. 293–300.
[14] I. Gohberg, I. Fel'dman, Convolution Equations and Projection Methods for Their Solution, Transl. Math. Monographs, 41, Amer. Math. Soc., Providence, RI, 1974.
[15] B. Grasso, Tecniche di precondizionamento per la risoluzione numerica di sistemi lineari tramite proiezione su particolari algebre matriciali. Graduate Thesis in Mathematics, University of Genova, 1995.
[16] U. Grenander, M. Rosenblatt, Statistical Analysis of Stationary Time Series. Second edition, Chelsea, New York, 1984.
[17] U. Grenander, G. Szegő, Toeplitz Forms and Their Applications. Second edition, Chelsea, New York, 1984.
[18] M. Hanke, J. Nagy, "Restoration of atmospherically blurred images by symmetric indefinite conjugate gradient techniques", Inverse Problems, 12 (1996), pp. 157–173.
[19] T. Huckle, "Some aspects of circulant preconditioners", SIAM J. Sci. Comp., 14 (1993), pp. 531–541.
[20] T. Kailath, V. Olshevsky, "Displacement structure approach to discrete-trigonometric-transform based preconditioners of G. Strang type and T. Chan type", Proc. "Workshop on Toeplitz matrices", Cortona (Italy), September 1996. Appeared in Calcolo, 33 (1996), pp. 191–208.
[21] P.P. Korovkin, Linear Operators and Approximation Theory (English translation). Hindustan Publishing Co., Delhi, 1960.
[22] I.P. Natanson, Constructive Function Theory, I. Frederick Ungar Publishing Co., New York, 1964.
[23] M. Neuts, Structured Stochastic Matrices of M/G/1 Type and Their Applications. Dekker Inc., New York, 1989.
[24] A. Oppenheim, Applications of Digital Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1978.
[25] R. Preuss, "Toeplitz matrices and control theory", (R. Preuss, 118 Chandler Str., Suite 3, Boston MA 02116 USA), private communication at "Workshop on Toeplitz matrices in Filtering and Control", Santa Barbara (CA), August 1996.
[26] S. Serra, "A Korovkin-type theory for finite Toeplitz operators via matrix algebras", Numer. Math., to appear.
[27] S. Serra, "Superlinear PCG methods for symmetric Toeplitz systems", Math. Comp., under revision.
[28] S. Serra Capizzano, "A Korovkin based approximation of multilevel Toeplitz matrices via multilevel trigonometric matrix vector spaces and applications", Linear Algebra Appl., under revision.
[29] S. Serra, "The rate of convergence of Toeplitz based PCG methods for second order nonlinear boundary value problems", Numer. Math., under revision. Also TR nr. 15, LAN - Univ. of Calabria (1995).
[30] S. Serra Capizzano, "Toeplitz preconditioners constructed from linear approximation processes", SIAM J. Matrix Anal. Appl., under revision.
[31] S. Serra Capizzano, "A Korovkin based approximation of multilevel Toeplitz matrices (with rectangular unstructured blocks) via multilevel trigonometric matrix vector spaces: part I", submitted. Also TR nr. 340, Dept. of Mathematics - Univ. of Genova (1997).
[32] S. Serra Capizzano, "Korovkin theorems and linear positive Gram matrix algebras approximation of Toeplitz matrices", submitted (1997).
[33] S. Serra Capizzano, P. Tilli, "Extreme singular values and eigenvalues of non-Hermitian block Toeplitz matrices", manuscript (1997).
[34] S. Serra Capizzano, E. Tyrtyshnikov, "No multilevel matrix algebra preconditioner is optimal for Toeplitz matrices", manuscript (1997).
[35] V. Strela, "Exploration of circulant preconditioning properties", Matrix Methods and Algorithms, IVM RAN, Moscow, 1993, pp. 9–46.
[36] E. Tyrtyshnikov, "Optimal and superoptimal circulant preconditioners", SIAM J. Matrix Anal. Appl., 13 (1992), pp. 459–473.
[37] E. Tyrtyshnikov, "Circulant preconditioners with unbounded inverses", Linear Algebra Appl., 216 (1995), pp. 1–23.
[38] E. Tyrtyshnikov, "A unifying approach to some old and new theorems on distribution and clustering", Linear Algebra Appl., 232 (1996), pp. 1–43.
[39] E. Tyrtyshnikov, N. Zamarashkin, "Spectra of multilevel Toeplitz matrices: advanced theory via simple matrix relationships", Linear Algebra Appl., 270 (1997), pp. 15–27.
[40] H. Widom, Toeplitz Matrices. In Studies in Real and Complex Analysis, I. Hirshman Jr. Ed., Math. Assoc. Amer., 1965.