A MULTI-VARIABLE EXTENSION OF TWO-VARIABLE MATRIX MEANS∗

MIKLÓS PÁLFIA

Abstract. In this article general extension methods to means of several matrices are provided as a limit point based on the two-variable forms of matrix means. The convergence of these extension methods is proved for general, not necessarily orderable, tuples of matrices and for any matrix mean which is smaller than or equal to the arithmetic mean. Some useful properties of the extension method are also proved here, which are considered by several authors as important properties of a multi-variable matrix mean.

Key words. means, operator means, geometric mean

AMS subject classifications. 15A24, 15A45, 26E60, 47A64

1. Introduction. Means of two positive definite matrices were developed and defined by Kubo and Ando [15], who showed that such means are in close connection with certain operator monotone functions. At that point they were not able to give a straightforward generalization to means of several matrices; however, Ando, teaming up later with Li and Mathias [2], gave a well-defined generalization of the geometric mean as a limit, using a so-called symmetrization procedure which defines the geometric mean recursively. Based on this recursive extension idea, Petz and Temesi [21] gave a generalized recursive extension method for every mean function; however, they were not able to prove the convergence of the matrix sequences when the matrices are not orderable. Later, in [16], Lawson and Lim proved the convergence of this recursive extension in metric spaces where the mean fulfills some geometrical properties closely related to nonpositive curvature in the sense of Busemann. Very recently Bini, Meini and Poloni proposed a geometric mean in [8] which is also recursive (in particular, it needs the 2- and (n − 2)-variable forms of the mean itself to construct the n-variable form) but has cubic convergence. Meanwhile Moakher [18] gave a different definition of the geometric mean of several matrices based on a differential geometric approach to the subject. This definition, the Riemann barycenter, turned out to be slightly different from the recursive definition, a fact first pointed out by Bhatia and Holbrook [7]. The other difference between the two methods is that the recursive definition is monotone, while the same is not known for the Riemann barycenter; according to the definition of matrix means by Kubo and Ando, we expect the several-variable forms to be monotone as well. Generally the recursive method is computationally problematic for 4 or more matrices because it needs explicit, closed formulas.
Considering that even the 3-variable form of the geometric mean is not known explicitly, and that the same situation applies to other, less well known means, we must conclude that the 4-or-more-variable forms of means are still problematic. In this article we consider an iterative extension which relies only on the 2-variable forms of means. A very preliminary formulation of this extension method was already given in [20], although its convergence was not proved there. Since this extension method is based on the 2-variable forms of means, it does not suffer from the problems

∗ Department of Automation and Applied Informatics, Budapest University of Technology and Economics, H-1521 Budapest, Hungary ([email protected]).


of recursion like the Ando–Li–Mathias mean, while at the same time it fulfills some other useful properties of their recursive definition. We will show here that this iterative extension is convergent for all positive definite matrices, without the orderability condition needed in [21]. The proof is presented in Section 3 for any matrix mean which is smaller than or equal to the arithmetic mean. In Section 4 we investigate some useful properties of the extension.

2. Preliminaries. We begin by recalling the general definition of mean functions for two and for several matrices. We also provide here the definition of the iterative extension method. Let P(r) denote the set of self-adjoint, positive definite r × r matrices and S(r) the set of symmetric r × r matrices.

Definition 2.1. A two-variable function M : P(r) × P(r) → P(r) is called a mean function if
(i) M(X, X) = X for every X ∈ P(r),
(ii) if X < Y, then X < M(X, Y) < Y,
(iii) if X ≤ X' and Y ≤ Y', then M(X, Y) ≤ M(X', Y'),
(iv) M(X, Y) is continuous,
(v) M(CXC^*, CYC^*) = CM(X, Y)C^* for all X, Y ∈ P(r) and all invertible C.

The above properties were suggested by Kubo and Ando [15]. With the help of property (v), they arrived at the following formula connecting normalized operator monotone functions f with matrix means:

    M(A, B) = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}.    (2.1)
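To make formula (2.1) concrete, here is a small numerical sketch (our own illustration, not part of the paper); the helper names `mat_func` and `kubo_ando_mean` are ours, and we apply a scalar function to a positive definite matrix via its eigendecomposition:

```python
import numpy as np

def mat_func(f, A):
    """Apply a scalar function f to a self-adjoint positive definite
    matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def kubo_ando_mean(f, A, B):
    """Two-variable mean M(A, B) = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}
    induced by a normalized operator monotone function f (f(1) = 1)."""
    Ah = mat_func(np.sqrt, A)                    # A^{1/2}
    Aih = mat_func(lambda x: 1 / np.sqrt(x), A)  # A^{-1/2}
    return Ah @ mat_func(f, Aih @ B @ Aih) @ Ah

# f(x) = sqrt(x) gives the geometric mean A # B
A = np.diag([1.0, 4.0])
B = np.diag([9.0, 1.0])
G = kubo_ando_mean(np.sqrt, A, B)
# for commuting A, B this equals (AB)^{1/2} = diag(3, 2)
```

With f(x) = √x this reproduces the geometric mean, with f(x) = (1 + x)/2 the arithmetic mean, and with f(x) = 2x/(1 + x) the harmonic mean.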

The next definitions provide the iterative extension method mentioned earlier. It is based on the basic two-variable forms of mean functions and will be used to extend means to several variables.

Definition 2.2. Let G be a directed graph with n vertices and n edges such that the undirected version of the graph contains a cycle which is Hamiltonian and Eulerian at the same time. This means that every vertex has exactly two edges connected to it. Suppose we have a labeling of the edges from 1 to n and of the vertices from 1 to n. This labeling may be chosen arbitrarily, as may the directions of the edges. It is not required that G have a closed Hamiltonian or Eulerian cycle as a directed graph.

We will use the above definition to provide the following iteration.

Definition 2.3. Let X = (X_1^0, ..., X_n^0), where X_i^0 ∈ P(r), and let G_k, k = 0, 1, 2, ..., be an infinite sequence of arbitrarily directed graphs as in Definition 2.2. An example of such a graph can be seen in Figure 2.1. Let 1, ..., n label the vertices of G_k, and for each edge i = e(j, l) in G_k, define

    X_i^{k+1} = M(X_j^k, X_l^k),    (2.2)

where j is the tail vertex and l the head vertex of the edge labelled i, and M(X, Y) is a mean satisfying the properties in Definition 2.1. Taking one iteration step by computing these means, we obtain another n-tuple of matrices. We apply this procedure recursively; both G_k and the correspondence between the n-tuple of matrices and the vertices of G_k may differ for every k. This procedure yields a sequence of n-tuples of matrices.

The purpose of the above definition is to obtain an n-variable function which fulfills the following definition of n-variable mean functions.


Fig. 2.1. This graph represents a mapping between two iteration steps in Definition 2.3.
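For illustration, the iteration of Definition 2.3 can be sketched in a few lines of code. This is a sketch under the simplifying assumption that every G_k is the same directed cycle 1 → 2 → ... → n → 1, which is one admissible choice satisfying Definition 2.2 (the definition allows G_k to vary); the name `iterate_means` is ours:

```python
import numpy as np

def iterate_means(mean, Xs, steps=50):
    """One possible instance of the iteration in Definition 2.3:
    at every step k the graph G_k is the directed cycle
    1 -> 2 -> ... -> n -> 1, so edge i joins vertices i and i+1 (mod n).
    `mean` is any two-variable matrix mean M(X, Y)."""
    Xs = list(Xs)
    n = len(Xs)
    for _ in range(steps):
        Xs = [mean(Xs[i], Xs[(i + 1) % n]) for i in range(n)]
    return Xs

# with the arithmetic mean, the common limit is the ordinary average
arith = lambda X, Y: (X + Y) / 2
Xs0 = [np.diag([1.0, 2.0]), np.diag([3.0, 4.0]), np.diag([5.0, 6.0])]
limit = iterate_means(arith, Xs0)[0]
# the average of the three matrices is diag(3, 4)
```

With the arithmetic mean the sum of the tuple is preserved at every step, so the common limit is simply the average of the initial matrices.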

Definition 2.4. An n-variable function M_n : P(r)^n → P(r) is a mean function if
(i') M_n(X, ..., X) = X for every X ∈ P(r),
(ii') min(X_1, ..., X_n) ≤ M_n(X_1, ..., X_n) ≤ max(X_1, ..., X_n) whenever the minimum and maximum exist,
(iii') if X_i ≤ X_i' for all i, then M_n(X_1, ..., X_n) ≤ M_n(X_1', ..., X_n'),
(iv') M_n is continuous,
(v') M_n(CX_1C^*, ..., CX_nC^*) = C M_n(X_1, ..., X_n) C^* for all invertible C.

The next section is devoted to a general proof of the assertion that the sequences given in Definition 2.3 are convergent for all n and have the same limit point.

3. Extension theorem for means of several matrices. In this section we prove the convergence of the sequences of matrices given in Definition 2.3 to a common limit point. We begin by showing the boundedness of the sequences in Definition 2.3; then we prove an inequality which is similar to the semi-parallelogram law at a certain point in the set of positive definite matrices. At the end of the section, some nice properties of the limit point of the sequences in Definition 2.3 are presented as well.

Lemma 3.1. The sequences given in Definition 2.3 are bounded for all n.

Proof. Let D_i^0 be matrices such that D_i^0 ≥ X_i^0 and D_1^0 ≤ D_2^0 ≤ ... ≤ D_n^0. Set up the iteration given in Definition 2.3 on the D_i^0 matrices with the same mappings as given in every iteration step for the X_i^0 matrices. By properties (ii) and (iii) in Definition 2.1, it is easy to see that X_i^k ≤ D_i^k and D_i^k ≤ D_n^0 for all i and k. Therefore every X_i^k is bounded above by D_n^0 for all i and k. In the same way we may construct a lower bound as well.

We advance further by showing that a "semi-parallelogram law"

    d(X, M(A, B))^2 ≤ (d(X, A)^2 + d(X, B)^2)/2 − d(A, B)^2/4    (3.1)

holds for X = 0 for the distance function

    d(A, B)^2 = Tr{(A − B)^*(A − B)},    (3.2)

which is induced by the Hilbert–Schmidt inner product. First of all, note that we are working in the set of positive definite matrices; each element of this set of finite norm is at finite distance from the 0 matrix measured with the above distance function.

Lemma 3.2. For the distance function (3.2) and for any matrix mean function satisfying M(A, B) ≤ (A + B)/2 the following holds:

    d(0, M(A, B))^2 ≤ (d(0, A)^2 + d(0, B)^2)/2 − d(A, B)^2/4.    (3.3)
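Before turning to the proof, inequality (3.3) is easy to check numerically. A sketch of our own, taking the geometric mean A # B as a stand-in for M (it qualifies, since A # B ≤ (A + B)/2), with d the Hilbert–Schmidt distance of (3.2):

```python
import numpy as np

def sqrtm_spd(A):
    # matrix square root of a symmetric positive definite matrix
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def geo_mean(A, B):
    # geometric mean A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}
    Ah = sqrtm_spd(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ sqrtm_spd(Aih @ B @ Aih) @ Ah

d2 = lambda A, B: np.sum((A - B) ** 2)  # squared Hilbert-Schmidt distance

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[5.0, -1.0], [-1.0, 2.0]])
M = geo_mean(A, B)
lhs = d2(np.zeros((2, 2)), M)
rhs = (d2(0 * A, A) + d2(0 * B, B)) / 2 - d2(A, B) / 4
# Lemma 3.2 guarantees lhs <= rhs, since A # B <= (A + B)/2
```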

Proof. We will show that the above inequality can be reduced to an easier one, which is readily seen to be true. Using the distance function (3.2), inequality (3.3) is equivalent to

    Tr{M(A, B)^* M(A, B)} ≤ (Tr{A^*A} + Tr{B^*B})/2 − (Tr{A^*A} + Tr{B^*B} − Tr{A^*B} − Tr{B^*A})/4.

Since all of the matrices are self-adjoint, this becomes

    Tr{M(A, B)^2} ≤ (Tr{A^2} + Tr{B^2} + Tr{AB + BA})/4 = Tr{((A + B)/2)^2},

which is equivalent to

    0 ≤ Tr{((A + B)/2)^2 − M(A, B)^2},
    0 ≤ Tr{((A + B)/2 − M(A, B)) ((A + B)/2 + M(A, B))},    (3.4)

where the last equivalence uses the cyclic property of the trace.

Thus we see that (3.4) is equivalent to the inequality of the assertion. From the condition (A + B)/2 ≥ M(A, B) it is easy to see that (A + B)/2 − M(A, B) is positive semidefinite and (A + B)/2 + M(A, B) is positive definite, so

    0 ≤ ((A + B)/2 + M(A, B))^{1/2} ((A + B)/2 − M(A, B)) ((A + B)/2 + M(A, B))^{1/2}.

Taking the trace and using its cyclic property yields (3.4), which proves the assertion.

We will also need the following preparatory lemma, which involves the arithmetic mean.

Lemma 3.3. Let X_i^k be the sequences given in Definition 2.3 and suppose A(X, Y) ≥ M(X, Y), where A(X, Y) = (X + Y)/2 is the arithmetic mean. Then

    (1/n) Σ_{i=1}^n X_i^{k+1} ≤ (1/n) Σ_{i=1}^n X_i^k.    (3.5)

Proof. In one iteration step we have, for every i,

    X_i^{k+1} = M(X_{j_i}^k, X_{l_i}^k) ≤ (X_{j_i}^k + X_{l_i}^k)/2.    (3.6)

It must be noted that every X_i^k appears exactly twice on the right-hand sides when the X_i^{k+1} are computed, since every vertex is an endpoint of exactly two distinct edges in the graph, which has one Hamiltonian and Eulerian cycle as an undirected graph in Definition 2.3. So summing these inequalities over all i we arrive at (3.5).

Now we are ready to prove the main theorem of this section.

Theorem 3.4. For any matrix mean satisfying M(A, B) ≤ (A + B)/2, the sequences given in Definition 2.3 are convergent for all n and have the same limit point.

Proof. We begin by showing that the distances d(X_i^k, X_j^k) converge to zero, where d(·, ·) is defined by (3.2); later on we show that the sequences X_i^k are themselves convergent. Consider one iteration step in Definition 2.3, which maps pairs of the X_i^k to the X_i^{k+1} through a graph by taking the mean of the two matrices corresponding to the vertices of each edge. From Lemma 3.2 we get

    d(0, X_i^1)^2 ≤ (d(0, X_{j_i})^2 + d(0, X_{l_i})^2)/2 − d(X_{j_i}, X_{l_i})^2/4,    (3.7)

where X_i^1 = M(X_{j_i}, X_{l_i}). The undirected version of the graph in Definition 2.3 has one cycle which is Hamiltonian and Eulerian at the same time, therefore every vertex has two edges connected to it. So if we sum the inequalities above over every edge we arrive at

    Σ_{i=1}^n d(0, X_i^1)^2 ≤ Σ_{i=1}^n d(0, X_i)^2 − (1/4) Σ_{i=1}^n d(X_{j_i}, X_{l_i})^2.    (3.8)

Applying this to every iteration step we get

    Σ_{i=1}^n d(0, X_i^{k+1})^2 ≤ Σ_{i=1}^n d(0, X_i^k)^2 − (1/4) Σ_{i=1}^n d(X_{j_i}^k, X_{l_i}^k)^2.    (3.9)

Denote the left-hand sum by a_{k+1}, the first sum on the right by a_k, and the last sum by e_k, so that (3.9) reads

    a_{k+1} ≤ a_k − (1/4) e_k.    (3.10)

Note that the above is valid for every possible mapping from one n-tuple to another, so we are not restricted to a particular correspondence between the vertices of the graph and the n-tuple of matrices. The sequence a_k ≥ 0 measures the sum of the squared distances between 0 and the matrices of the n-tuple in every iteration step. By (3.10) it is monotonically decreasing, and it is bounded below by 0, therefore it is convergent. From the convergence of a_k and (3.10) it follows that e_k = Σ_{i=1}^n d(X_{j_i}^k, X_{l_i}^k)^2 → 0. Hence the matrices X_i^k approach one another. Now from Lemma 3.1 the matrix sequences in Definition 2.3 are bounded, therefore they have a convergent subsequence of n-tuples X_i^{k_j} which

must have the same limit point A (all n components share this limit since e_k → 0). Let X_i^{l_j} be another convergent subsequence of tuples with another limit point B. Assuming without loss of generality that k_j > l_j, we have from Lemma 3.3

    (1/n) Σ_{i=1}^n X_i^{k_j} ≤ (1/n) Σ_{i=1}^n X_i^{l_j}.    (3.11)

But we may also choose subsequences of these subsequences with k_r < l_r; then

    (1/n) Σ_{i=1}^n X_i^{k_r} ≥ (1/n) Σ_{i=1}^n X_i^{l_r}.    (3.12)

Taking the limits we have A ≤ B and A ≥ B, so A = B. Thus every subsequence has the same limit point, so the main sequences are convergent and share this limit point as well.

Note that the above proof does not say anything about the possibly different limit points of the iterative procedures corresponding to different sequences of graphs in Definition 2.3. These limit points may differ, and they might depend on the graphs G_k chosen in every iteration step k. Therefore we introduce the following notation to express this dependence on the sequence of graphs. Let us denote the infinite sequence of graphs as

    G = {G_0, G_1, ...}.    (3.13)
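The behavior established in Theorem 3.4 is easy to observe numerically. The following sketch is our own illustration: the geometric mean stands in for M, and every G_k is a fixed directed cycle (one admissible choice of G); after a number of steps the error term e_k has essentially vanished and the tuple has coalesced to a single matrix:

```python
import numpy as np

def sqrtm_spd(A):
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def geo_mean(A, B):
    # geometric mean, a matrix mean with A # B <= (A + B)/2
    Ah = sqrtm_spd(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ sqrtm_spd(Aih @ B @ Aih) @ Ah

Xs = [np.array([[2.0, 0.3], [0.3, 1.0]]),
      np.array([[1.0, -0.2], [-0.2, 4.0]]),
      np.array([[3.0, 0.5], [0.5, 2.0]]),
      np.array([[1.5, 0.0], [0.0, 1.5]])]
n = len(Xs)
for k in range(60):
    # G_k: the directed cycle 1 -> 2 -> ... -> n -> 1
    Xs = [geo_mean(Xs[i], Xs[(i + 1) % n]) for i in range(n)]

# e_k = sum of squared edge distances, which tends to 0 (Theorem 3.4);
# all four matrices are now essentially equal to the common limit
e = sum(np.sum((Xs[i] - Xs[(i + 1) % n]) ** 2) for i in range(n))
```

Consistent with the discussion of convergence rates, the decay of e_k observed here is linear (geometric) in k.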

With this notation, from now on we denote the common limit point of the sequences in Theorem 3.4 by M_G(X_1, ..., X_n), to express the dependence of the limit point on the infinite sequence of graphs G_k.

Another question one can ask is the rate of convergence of the sequences X_i^k to the common limit M_G(X_1, ..., X_n), or more specifically, how the infinite sequence of graphs G = {G_0, G_1, ...} affects this rate. Numerical experiments suggest that the rate of convergence is linear [20] for all possible infinite sequences of graphs G; however, the chosen graphs can greatly affect the quotient of this linear convergence. In [20] a heuristic function called Idealmapping was suggested to speed up the convergence of the iterative procedure. Roughly speaking, Idealmapping maximizes the length of the closed path in G_k, where by length we mean the sum of the squared distances of the matrices X_i^k from one another, measured over the edges of the closed path in G_k with the distance function (3.2). This is just the error term e_k introduced in (3.9). As the X_i^k approach the common limit point, a_k approaches its own limit, which is a nonnegative number; therefore, maximizing e_k in every step speeds up the convergence. This argument shows how Idealmapping works.

4. Properties of the extension M_G(X_1, ..., X_n). Now that we have proved the convergence of these extension methods, we advance further by showing some useful properties of the limit point M_G(X_1, ..., X_n).

Proposition 4.1. The limit point M_G(X_1, ..., X_n) of the matrix sequences given in Definition 2.3 satisfies (i'), (ii') and (iii') in Definition 2.4 with respect to an infinite sequence of graphs G.

Proof. Property (i') is trivial. We prove property (iii'). Let X_1, ..., X_n ∈ P(r) and X_i ≤ X_i' ∈ P(r).
Let us consider one iteration step with respect to the mapping between the n-tuple of matrices and some graph g. Compute the means with respect to

the graph g on the two n-tuples given as X_1, ..., X_n and X_1', ..., X_n'. Considering the two iteration steps with respect to the same graph g, we get the inequalities

    M(X_i, X_j) ≤ M(X_i', X_j')    (4.1)

for all i, j pairs by property (iii) in Definition 2.1. Using property (iii) again, together with property (ii), it is easy to see that inequalities of this kind persist throughout the iterative process, so monotonicity is preserved by each iteration step; taking limits, the two limit points obey the same ordering. Finally, property (ii') is an easy consequence of properties (i') and (iii') when the minimum and maximum exist: setting up the same iteration on the constant n-tuple formed by the minimal element, we get the inequality on the left in property (ii'), and similarly we obtain the inequality on the right.

Proposition 4.2. If M(A, B) ≤ N(A, B) ≤ (A + B)/2 are matrix means, then the same ordering holds for the induced limit points M_G(X_1, ..., X_n) and N_G(X_1, ..., X_n) of the matrix sequences given in Definition 2.3 with respect to an infinite sequence of graphs G.

Proof. The proof of this assertion is very similar to the one above; the only difference is that after one iteration we get X_i^1 ≤ (X_i')^1, where X_i^1 = M(X_j, X_l) and (X_i')^1 = N(X_j, X_l) for all i. Since monotonicity is preserved by each iteration step, the same ordering carries over to the limits.

Proposition 4.3. The limit point M_G(X_1, ..., X_n) of the matrix sequences given in Definition 2.3 satisfies property (v') in Definition 2.4 with respect to an infinite sequence of graphs G.

Proof. Let (X_i')^0 = CX_i^0 C^* and set up the same iteration on (X_1')^0, ..., (X_n')^0 as on X_1^0, ..., X_n^0. Property (v) in Definition 2.1 implies

    CX_i^{k+1}C^* = CM(X_{j_i}^k, X_{l_i}^k)C^* = M(CX_{j_i}^k C^*, CX_{l_i}^k C^*).    (4.2)

Applying this recursively in every iteration step we get

    CX_i^k C^* = (X_i')^k.    (4.3)

Taking the limit k → ∞, the assertion follows.

Proposition 4.4. The limit point M_G(X_1, ..., X_n) of the matrix sequences given in Definition 2.3 is a continuous function of its variables X_1, ..., X_n with respect to an infinite sequence of graphs G.

Proof. For a function f : Y_1 → Y_2 between two metric spaces (Y_1, d_1) and (Y_2, d_2), sequential continuity and the usual topological continuity are equivalent; a proof can be found for example in [17]. We will show that sequential continuity holds, thereby arriving at the desired result. We make use of the following multiplicative metric on P(r):

    R(A, B) = max{ρ(A^{-1}B), ρ(B^{-1}A)}    (4.4)

for all A, B ∈ P(r), where ρ(A) denotes the spectral radius of A. The above metric has the following properties:
(i) R(A, B) ≥ 1,
(ii) R(A, B) = 1 iff A = B,
(iii) R(A, C) ≤ R(A, B)R(B, C),

(iv) R(A, B)^{-1} A ≤ B ≤ R(A, B) A,
(v) ||A − B|| ≤ (R(A, B) − 1) ||A||.

An extension of this metric to P(r)^n can be given as follows. Let X = (X_1, ..., X_n) ∈ P(r)^n and Y = (Y_1, ..., Y_n) ∈ P(r)^n; then we define

    R_n(X, Y) = max_{1≤i≤n} R(X_i, Y_i).    (4.5)
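The metric (4.4) and its sandwich property (iv) can be illustrated numerically; a brief sketch of our own (the function names are ours):

```python
import numpy as np

def spec_radius(A):
    # spectral radius rho(A)
    return np.max(np.abs(np.linalg.eigvals(A)))

def R(A, B):
    """Multiplicative metric (4.4): R(A, B) = max(rho(A^{-1}B), rho(B^{-1}A)).
    It satisfies R(A, B) >= 1, with equality iff A = B."""
    return max(spec_radius(np.linalg.solve(A, B)),
               spec_radius(np.linalg.solve(B, A)))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 3.0]])
r = R(A, B)
# property (iv): R^{-1} A <= B <= R A, i.e. both B - A/r and r*A - B
# are positive semidefinite
lo = np.min(np.linalg.eigvalsh(B - A / r))
hi = np.min(np.linalg.eigvalsh(r * A - B))
```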

Now suppose we have a convergent sequence of tuples X^k = (X_1^k, ..., X_n^k) ∈ P(r)^n for which (X_1^k, ..., X_n^k) → (X_1, ..., X_n) = X ∈ P(r)^n. Using property (iv) of R(A, B) we have the inequalities

    R_n(X^k, X)^{-1} X_i^k ≤ X_i ≤ R_n(X^k, X) X_i^k.    (4.6)

Now applying the monotonicity property of M_G proved in Proposition 4.1, and writing c_k = R_n(X^k, X), we obtain

    M_G(c_k^{-1} X_1^k, ..., c_k^{-1} X_n^k) ≤ M_G(X_1, ..., X_n) ≤ M_G(c_k X_1^k, ..., c_k X_n^k).    (4.7)

Using Proposition 4.3 we conclude that

    c_k^{-1} M_G(X_1^k, ..., X_n^k) ≤ M_G(X_1, ..., X_n) ≤ c_k M_G(X_1^k, ..., X_n^k).    (4.8)

Taking the limit k → ∞ we have c_k → 1. This shows that

    lim_{k→∞} M_G(X_1^k, ..., X_n^k) = M_G(lim_{k→∞} X_1^k, ..., lim_{k→∞} X_n^k),    (4.9)

which is sequential continuity for M_G.

5. Conclusions. In this article extension methods for means of several matrices were presented as the limit point of an iterative procedure. These methods are based directly on the 2-variable forms of means, which is an improvement over the existing recursive extension methods of Ando–Li–Mathias and Petz–Temesi. The convergence of these extension methods was proved here for all matrix means which are smaller than or equal to the arithmetic mean, thereby obtaining a generally applicable multi-variable matrix mean for all possible symmetric means. Some useful properties of this extension were also investigated and proved to hold in full generality.

Acknowledgments. The author would like to thank István Antal, Dénes Petz, István Vajk and the anonymous referees for their useful comments on the paper and the simplification of the proof of Lemma 3.3.

REFERENCES

[1] E. Ahn, S. Kim and Y. Lim, An extended Lie–Trotter formula and its applications, Linear Algebra Appl., 427 (2007), pp. 190–196.
[2] T. Ando, C.-K. Li and R. Mathias, Geometric means, Linear Algebra Appl., 385 (2004), pp. 305–334.
[3] W. Ballmann, Lectures on Spaces of Nonpositive Curvature, Birkhäuser, 1995, ISBN 978-3-7643-5242-4.
[4] R. Bhatia, Matrix Analysis, Springer-Verlag, New York, 1996.
[5] R. Bhatia, On the exponential metric increasing property, Linear Algebra Appl., 375 (2003), pp. 211–220.
[6] R. Bhatia, Positive Definite Matrices, Princeton University Press, 2007.
[7] R. Bhatia and J. Holbrook, Riemannian geometry and matrix geometric means, Linear Algebra Appl., 413 (2006), pp. 594–618.
[8] D. A. Bini, B. Meini and F. Poloni, An effective matrix geometric mean satisfying the Ando–Li–Mathias properties, Math. Comp., 79 (2010), pp. 437–452.
[9] D. Burago, Yu. D. Burago and S. Ivanov, A Course in Metric Geometry, American Mathematical Society, 2001, ISBN 0-8218-2129-6.
[10] É. Cartan, La topologie des espaces représentatifs des groupes de Lie, Enseignement Math., 35 (1936), pp. 177–200, Exposés de Géométrie VIII.
[11] C. Conde, Differential geometry for nuclear positive operators, Integral Equations and Operator Theory, 57 (2006), no. 4, pp. 451–471.
[12] F. Hiai and H. Kosaki, Means of Hilbert Space Operators, Lecture Notes in Mathematics 1820, Springer, 2003, ISBN 978-3-540-40680-8.
[13] J. Jost, Nonpositive Curvature: Geometric and Analytic Aspects, Birkhäuser-Verlag, Basel, 1997.
[14] H. Karcher, Riemannian center of mass and mollifier smoothing, Comm. Pure Appl. Math., 30 (1977), pp. 509–541.
[15] F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., 246 (1980), pp. 205–224.
[16] J. Lawson and Y. Lim, A general framework for extending means to higher orders, Colloq. Math., 113 (2008), pp. 191–221.
[17] I. J. Maddox, Elements of Functional Analysis, Cambridge University Press, Cambridge, 1988.
[18] M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl., 26 (2005), pp. 735–747.
[19] M. Moakher, Means and averaging in the group of rotations, SIAM J. Matrix Anal. Appl., 24 (2002), no. 1, pp. 1–16.
[20] M. Pálfia, The Riemann barycenter computation and means of several matrices, Int. J. Comput. Math. Sci., 3(3) (2009), pp. 128–133.
[21] D. Petz and R. Temesi, Means of positive numbers and matrices, SIAM J. Matrix Anal. Appl., 27 (2006), pp. 712–720.
