The Geometric Mean, Matrices, Metrics, and More

Jimmie D. Lawson and Yongdo Lim

1. GEOMETRIC MEANS OF POSITIVE REAL NUMBERS. The geometric mean √(ab) of positive real numbers a and b "squares the rectangle," i.e., gives the length of the side of the square with the same area as the rectangle with sides of length a and b. Its geometric construction can be found in Euclid's Elements (Book II, Proposition 14). The geometric mean of positive real numbers a and b has various alternative characterizations:

1. It is the unique positive solution of the quadratic equation x^2 = ab, or alternatively of the equation x a^{-1} x = b.

2. The geometric mean √(ab) is the common limit of the successive iterations of harmonic and arithmetic means: √(ab) = lim_{n→∞} a_n = lim_{n→∞} b_n, where a_0 = a, b_0 = b, a_{n+1} = 2 a_n b_n/(a_n + b_n), b_{n+1} = (a_n + b_n)/2. The harmonic-geometric-arithmetic mean inequalities

   min{a, b} ≤ 2ab/(a + b) ≤ √(ab) ≤ (a + b)/2 ≤ max{a, b}

imply that [a_n, b_n]_{n≥1} is a nested sequence of closed intervals, each having length no more than half of the preceding one. Since √(a_{n+1} b_{n+1}) = √(a_n b_n) = · · · = √(ab), the intersection must consist of the singleton {√(ab)}. Thus the sequence {a_n}_{n≥1} converges monotonically to the geometric mean from below, while the sequence {b_n}_{n≥1} converges monotonically from above.

3. It is the maximum among all real x for which

   \begin{pmatrix} a & x \\ x & b \end{pmatrix}

is positive semidefinite. Since a, b > 0, this matrix is positive semidefinite if and only if its determinant is non-negative, if and only if x^2 ≤ ab.

4. The geometric mean is the unique midpoint for the metric δ on R⁺ given by δ(x, y) = |log(x/y)|:

   δ(a, b) = 2δ(a, √(ab)) = 2δ(b, √(ab)).
The metric δ coincides with the usual distance between log a and log b. It arises as the minimal arc length distance function for the Riemannian metric ds^2 = ((1/t) dt)^2.

5. The geometric mean is the "exponential" of the arithmetic mean in the sense that

   exp((x + y)/2) = √(ab),  where a = e^x, b = e^y.

November 2001]  THE GEOMETRIC MEAN, MATRICES, METRICS, AND MORE  797
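Properties 1, 2, and 5 are easy to confirm numerically; a minimal sketch (the helper name is ours, not the paper's):

```python
# A quick numerical check (not part of the original text) of properties 1, 2,
# and 5 for a = 4, b = 9, whose geometric mean is 6.
import math

def harmonic_arithmetic_limit(a, b, n=60):
    """Iterate a_{n+1} = 2 a_n b_n / (a_n + b_n), b_{n+1} = (a_n + b_n)/2."""
    for _ in range(n):
        a, b = 2 * a * b / (a + b), (a + b) / 2
    return a, b

a, b = 4.0, 9.0
g = math.sqrt(a * b)
lo, hi = harmonic_arithmetic_limit(a, b)
assert abs(lo - g) < 1e-12 and abs(hi - g) < 1e-12    # property 2
assert abs(g * (1 / a) * g - b) < 1e-12               # property 1: x a^{-1} x = b
assert math.isclose(math.exp((math.log(a) + math.log(b)) / 2), g)  # property 5
```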
6. It is the unique (positive) fixed point of the following naturally associated Möbius transformation on R: Let

   X = \begin{pmatrix} 0 & b \\ a^{-1} & 0 \end{pmatrix}.

Then e^X, the exponential of the matrix X, has positive entries, and in fact

   e^X = \begin{pmatrix} \cosh\sqrt{b/a} & \sqrt{ab}\,\sinh\sqrt{b/a} \\ \frac{1}{\sqrt{ab}}\sinh\sqrt{b/a} & \cosh\sqrt{b/a} \end{pmatrix}.

As a Möbius transformation x → (a_{11} x + a_{12})(a_{21} x + a_{22})^{-1} on R, e^X carries the set of positive reals R⁺ into itself, and one calculates directly that √(ab) is a fixed point. G. Birkhoff [5] has shown that any Möbius transformation with positive entries is a strict contraction with respect to the Riemannian distance metric δ of (4) by calculating the optimal coefficient of contraction, and hence e^X has a unique (positive) fixed point.

7. If x(t) is the solution of ẋ = b − (1/a)x^2 with initial condition x(0) = x_0 > 0, then lim_{t→∞} x(t) = √(ab).

8. Let F(x) = −log x on the positive reals. Then x = √(ab) is the unique positive real number such that F″(x)(a) = 1/b.

The purpose of the remaining sections is to indicate that these formulations of the geometric mean generalize to a rather remarkable variety of contexts and applications. Systematic consideration of the eight properties provides an outline for our development.

2. GEOMETRIC MEANS FOR POSITIVE DEFINITE MATRICES. Throughout this section all matrices are assumed to be square matrices with real entries. A matrix A is symmetric if A = A^T, where A^T denotes the transpose. Let Sym(n, R), or simply Sym when n is understood, be the vector space of all n × n symmetric matrices. An A ∈ Sym is positive semidefinite, denoted 0 ≤ A, if x^T Ax = ⟨x, Ax⟩ ≥ 0 for all x ∈ R^n, where ⟨·, ·⟩ denotes the usual inner product on R^n. Similarly, an A ∈ Sym is positive definite, denoted 0 < A, if it is positive semidefinite and invertible, or equivalently if x^T Ax = ⟨x, Ax⟩ > 0 for all non-zero x. We denote the set of positive definite (respectively, semidefinite) matrices by Sym^{++} (respectively, Sym^+).
The following "internal" characterization of a positive definite matrix involves orthogonal matrices, matrices U such that U^{-1} = U^T, and is a standard linear algebra result.

Proposition 2.1. A symmetric matrix A is positive definite (respectively, semidefinite) if and only if A = U^T DU for some orthogonal matrix U and some diagonal matrix D with positive (respectively, non-negative) diagonal entries, if and only if A has all eigenvalues positive (respectively, non-negative).

By standard matrix spectral theory, we can write A ∈ Sym(n, R) uniquely as A = Σ_{i=1}^r λ_i E_i, where λ_1, ..., λ_r ∈ R are distinct, each E_i is a non-zero orthogonal projection (E_i ∈ Sym and E_i^2 = E_i), and the collection {E_i}_{1≤i≤r} satisfies Σ_{i=1}^r E_i = I and E_i E_j = 0 for i ≠ j. Indeed existence follows by choosing the λ_i to be the distinct eigenvalues and each E_i to be the orthogonal projection onto the eigenspace of A associated with λ_i. The uniqueness, on the other hand, follows by (i) observing that for any such decomposition the λ_i, i = 1, ..., r, must be eigenvalues and the range of E_i must consist of eigenvectors of A associated with λ_i, (ii) using the equality Σ_{i=1}^r E_i = I and the orthogonality E_i E_j = 0 for i ≠ j to argue that R^n is the direct sum of the ranges of the E_i, and then (iii) deducing that the ranges of the E_i must exhaust the eigenspaces and thus that the λ_i must exhaust the eigenvalues. We call {λ_1, ..., λ_r} the spectrum of A and Σ_{i=1}^r λ_i E_i the spectral decomposition.

© THE MATHEMATICAL ASSOCIATION OF AMERICA  [Monthly 108

For M_1, M_2 ⊆ R and a bijection f : M_1 → M_2, we define a function on all A ∈ Sym with spectrum contained in M_1 by f(A) = Σ_{i=1}^r f(λ_i) E_i, where A = Σ_{i=1}^r λ_i E_i is the spectral decomposition (functions constructed in this way are called primary matrix functions and provide a simple example of the functional calculus). Uniqueness of the spectral decomposition ensures that f is well-defined and defines a bijection from all symmetric matrices with spectrum contained in M_1 to all symmetric matrices with spectrum contained in M_2 (with inverse defined from f^{-1} : M_2 → M_1). Extending the function exp : R → (0, ∞) to a primary function on Sym, we obtain (in light of Proposition 2.1)

Proposition 2.2. The exponential function exp : Sym → Sym^{++} given by

   exp A = exp(Σ_{i=1}^r λ_i E_i) = Σ_{i=1}^r e^{λ_i} E_i

is a bijection. In particular, a symmetric matrix is positive definite if and only if it is the exponential of a symmetric matrix.

We recall that the matrix exponential function is more commonly defined by the power series e^A = Σ_{n=0}^∞ A^n/n!, which converges for all A. Since for x in the eigenspace of an eigenvalue λ_i of A, we have (e^A)x = e^{λ_i}x = (exp A)x, the matrix operators e^A and exp A agree on the eigenspaces of A, and hence e^A = exp A.

Using the same methods as those employed for Proposition 2.2, we obtain by extending the bijection f(x) = x^2 on (0, ∞) (respectively, [0, ∞)) to the matrices with spectrum contained in (0, ∞) (respectively, [0, ∞)), that is, the positive definite (respectively, semidefinite) matrices, the following

Proposition 2.3. If 0 < A (respectively, 0 ≤ A), then A has a unique positive definite (respectively, semidefinite) square root, denoted A^{1/2}.

For any 0 < A, by Proposition 2.2 there exists a unique symmetric matrix log A such that exp(log A) = A. We can thus define A^r for any 0 < A by A^r = exp(r log A). Since exp(A + B) = exp(A) exp(B) if AB = BA, the one-parameter group {A^r : r ∈ R} satisfies the standard laws of exponents. For 0 < A and r = 1/2, this definition agrees with that of Proposition 2.3, since exp((1/2) log A) is a positive definite square root of A and this square root is unique.

We introduce the important group of congruence transformations on the vector space Sym. For a given invertible n × n matrix C, define Γ_C : Sym → Sym by Γ_C(A) = C^T AC. Then Γ_C is a linear isomorphism with inverse Γ_{C^{-1}}. Since 0 < C^T AC (respectively, 0 ≤ C^T AC) whenever 0 < A (respectively, 0 ≤ A), Γ_C(Sym^{++}) = Sym^{++} (respectively, Γ_C(Sym^+) = Sym^+). In the special case of an orthogonal matrix U, Γ_U(A) = U^T AU = U^{-1}AU is also a similarity and preserves multiplication. Note also that Γ_C(A) = CAC if C is symmetric.

Lemma 2.4. For 0 < A, B, the Riccati equation X A^{-1} X = B has a unique positive definite solution.
Proof. By direct substitution

   A#B := A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} = A^{1/2}(Γ_{A^{-1/2}}(B))^{1/2}    (1)

is a solution of X A^{-1} X = B for positive definite matrices A and B. The right-hand equality in (1) ensures that it is positive definite. Suppose that 0 < X, Y satisfy X A^{-1} X = B = Y A^{-1} Y. Then

   (A^{-1/2} X A^{-1/2})^2 = A^{-1/2} X A^{-1} X A^{-1/2} = A^{-1/2} Y A^{-1} Y A^{-1/2} = (A^{-1/2} Y A^{-1/2})^2,

and hence A^{-1/2} X A^{-1/2} = A^{-1/2} Y A^{-1/2} by uniqueness of positive definite square roots. Since A^{-1/2} is invertible, we conclude that X = Y.

See [21] for a discussion of the operator equation X AX = B for positive operators A and B on a Hilbert space.

We call A#B := A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} the geometric mean of A and B. Lemma 2.4 gives an alternative characterization as the unique positive definite solution of the Riccati equation X A^{-1} X = B, an analog of Property 1 of the first section. From Lemma 2.4 we deduce immediately

The Idempotent Property: A#A = A

and (by inverting both sides of the Riccati equation)

The Inversion Property: (A#B)^{-1} = A^{-1}#B^{-1}.

If we pre- and post-multiply both sides of the Riccati equation by X^{-1}, we conclude additionally that (A#B)^{-1} = B^{-1}#A^{-1}, and thus that A^{-1}#B^{-1} = B^{-1}#A^{-1}. Since every positive definite matrix can be represented as the inverse of its (positive definite) inverse, we have

The Commutative Property: A#B = B#A.

This identity is by no means obvious from the definition of A#B.

If two positive definite matrices A and B commute, then A#B = A^{1/2}B^{1/2}. Indeed A and B are simultaneously diagonalizable, i.e., there exists an orthogonal matrix U and diagonal matrices D_1 and D_2 such that A = U^{-1}D_1U and B = U^{-1}D_2U; see [14, Section 4.5]. Let (D_i)^{1/2} be the unique positive definite square root of D_i for i = 1, 2, which is obtained by taking the positive square roots of the diagonal entries. Then the diagonal matrices (D_i)^{1/2} for i = 1, 2 commute, and hence A^{1/2} = U^{-1}(D_1)^{1/2}U and B^{1/2} = U^{-1}(D_2)^{1/2}U commute. Since each is symmetric, their product A^{1/2}B^{1/2} is symmetric. Furthermore, the eigenvalues of A^{1/2}B^{1/2} = U^{-1}(D_1)^{1/2}(D_2)^{1/2}U are the same as those of (D_1)^{1/2}(D_2)^{1/2}, and thus A^{1/2}B^{1/2} is positive definite. Since A^{1/2}B^{1/2} is a solution of the Riccati equation X B^{-1} X = A, it follows that A^{1/2}B^{1/2} = B#A = A#B.
We also deduce the following property directly from the Riccati equation of Lemma 2.4 for positive definite matrices A and B:

The Transformation Property: Γ_C(A) # Γ_C(B) = Γ_C(A#B) for all invertible C.

The Transformation Property asserts that Γ_C is an isomorphism on Sym^{++} with respect to the operation of taking the geometric mean.

The geometric mean of two positive definite matrices has been considered by Fiedler and Pták in [13], where one finds results both overlapping and extending our preceding discussion. They also introduce a second geometric mean called the spectral geometric mean; the one we are considering they call the metric geometric mean.
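The definition and the properties above can be checked numerically. A minimal sketch (the helper names sym_sqrt and gmean are ours, not the paper's), computing the square root as a primary matrix function via the spectral decomposition:

```python
# Sketch: A#B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}, with the Riccati,
# Commutative, and Transformation properties verified on random examples.
import numpy as np

def sym_sqrt(A):
    """Positive definite square root via the spectral decomposition (Prop. 2.3)."""
    w, V = np.linalg.eigh(A)          # columns of V are orthonormal eigenvectors
    return (V * np.sqrt(w)) @ V.T

def gmean(A, B):
    """The geometric mean A#B of positive definite A and B."""
    Ah = sym_sqrt(A)
    Ahi = np.linalg.inv(Ah)
    return Ah @ sym_sqrt(Ahi @ B @ Ahi) @ Ah

rng = np.random.default_rng(1)
M, N, C = (rng.normal(size=(4, 4)) for _ in range(3))
A = M @ M.T + np.eye(4)               # generic positive definite matrices
B = N @ N.T + np.eye(4)
G = gmean(A, B)

assert np.allclose(G @ np.linalg.inv(A) @ G, B)            # Riccati equation
assert np.allclose(G, gmean(B, A))                         # Commutative Property
gamma = lambda X: C.T @ X @ C                              # Gamma_C, C invertible
assert np.allclose(gmean(gamma(A), gamma(B)), gamma(G))    # Transformation Property
```

For commuting arguments the computation reduces to A^{1/2}B^{1/2}, as the text shows.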
3. THE GEOMETRIC MEAN AND THE MATRIX ORDER. Let ≤ be the partial order (sometimes called the Loewner order) on Sym induced by the closed convex cone of positive semidefinite matrices, i.e., A ≤ B if and only if B − A is positive semidefinite. Also we define A < B if 0 < B − A. Since Γ_C preserves Sym^+ and Sym^{++} for each invertible C, we conclude that Γ_C is an order isomorphism on Sym with respect to the order relations ≤ and <. We write A > B (respectively, A ≥ B) to denote B < A (respectively, B ≤ A).

The geometric mean can be used to establish various matrix inequalities. Suppose, for example, that A ≤ B for A, B ∈ Sym^{++}. Then from Lemma 2.4 and the Commutative Property,

   (A#B)(A^{-1} − B^{-1})(A#B) = B − A.

Since Γ_{A#B} is a bijection carrying Sym^+ onto itself, it follows that A^{-1} − B^{-1} ∈ Sym^+, i.e., B^{-1} ≤ A^{-1}, and thus inversion is order-reversing.

As we move from the positive real numbers to the positive definite matrices in the study of geometric means, complications arise both from the fact that Sym is not closed under multiplication and from the fact that matrix multiplication is not commutative. However, there is a device that sometimes allows us to transfer properties of the geometric mean on R⁺ to the positive definite matrices; see [14, Corollary 7.6.5].

Lemma 3.1. For A, B ∈ Sym^{++}, there exists an invertible matrix C such that Γ_C(A) = I, the identity matrix, and Γ_C(B) is diagonal.

Proof. Let A, B ∈ Sym^{++}. Then there exists an orthogonal matrix C_1 such that C_1^T AC_1 is diagonal, a diagonal matrix D ∈ Sym^{++} such that D(C_1^T AC_1) = I, and an orthogonal matrix C_2 such that C_2^T(D^{1/2} C_1^T BC_1 D^{1/2})C_2 is diagonal. Set C := C_1 D^{1/2} C_2. Then Γ_C(A) = I and Γ_C(B) is diagonal.

Let D be a diagonal matrix in Sym^{++}. It follows immediately from the Riccati equation of Lemma 2.4 that I#D = D^{1/2}. From I − 2D^{1/2} + D = (I − D^{1/2})^2 ≥ 0, we deduce that I#D = D^{1/2} ≤ (I + D)/2, the geometric-arithmetic mean inequality for I and D.

Using the linearity of Γ_C (and the consequence that it preserves arithmetic means) and the Transformation Property, we derive from Lemma 3.1 the general geometric-arithmetic mean inequality for positive definite matrices A and B: A#B ≤ (A + B)/2. Applying the geometric-arithmetic mean inequality to A^{-1} and B^{-1}, we obtain A^{-1}#B^{-1} ≤ (A^{-1} + B^{-1})/2. Taking inverses reverses the inequality and yields the harmonic-geometric mean inequality: 2(A^{-1} + B^{-1})^{-1} = ((A^{-1} + B^{-1})/2)^{-1} ≤ (A^{-1}#B^{-1})^{-1} = A#B; the last equality follows from the Inversion Property. In summary, we obtain for arbitrary A, B ∈ Sym^{++} the harmonic-geometric-arithmetic mean inequality:

   2(A^{-1} + B^{-1})^{-1} ≤ A#B ≤ (A + B)/2.

For A, B ∈ Sym^{++}, set A_0 = A, B_0 = B, A_{n+1} = 2(A_n^{-1} + B_n^{-1})^{-1}, and B_{n+1} = (A_n + B_n)/2. If A = I and B = D is diagonal, then at all stages we obtain diagonal matrices, and stage n + 1 arises from stage n by computing the harmonic mean and the arithmetic mean of the individual diagonal elements. Thus by the computation in property 2 of Section 1 we see that the entries of B_n, n ≥ 1, converge from above to
I#D = D^{1/2}, and the entries of A_n, n ≥ 1, converge from below to I#D. Using the fact that Γ_C preserves the harmonic, geometric, and arithmetic means, it follows from Lemma 3.1 for arbitrary A, B ∈ Sym^{++} and all n ≥ 1 that A_n ≤ A_{n+1} ≤ A#B ≤ B_{n+1} ≤ B_n, and the sequences {A_n}_{n≥1} and {B_n}_{n≥1} converge monotonically to A#B. Thus the algorithm in property 2 of Section 1 remains valid for positive definite matrices. More generally, this computational algorithm remains valid for positive operators on a Hilbert space [15].

Suppose that A, Â, B ∈ Sym^{++} and A ≤ Â. Applying the algorithm of the preceding paragraph to the pairs A, B and Â, B, one verifies directly that A_n ≤ Â_n and B_n ≤ B̂_n for all n. Since Sym^+ is closed, A_n → A#B, and Â_n → Â#B, we conclude that A#B ≤ Â#B. Applying this property one argument at a time and using the Commutative Property, we obtain

The Monotone Property: For A, B, A′, B′ ∈ Sym^{++}, A′#B′ ≤ A#B whenever A′ ≤ A and B′ ≤ B.

Applying the Monotone Property to A ≤ B and I ≤ I yields the positive definite case of

Theorem 3.2. (The Loewner-Heinz inequality) Let C ∈ Sym, A, B ∈ Sym^+. If C^2 ≤ A ≤ B, then C ≤ A^{1/2} ≤ B^{1/2}.

Proof. Let A be a positive semidefinite matrix. Pick an orthogonal matrix U and a diagonal matrix D such that A = U^T DU. Then A + (1/n)I = U^T(D + (1/n)I)U for each n, and (A + (1/n)I)^{1/2} = U^T(D + (1/n)I)^{1/2}U, since the latter is a square root and there is a unique positive definite square root (Proposition 2.3). The right-hand side converges to U^T D^{1/2}U (since the square root is calculated entrywise on the diagonal), which squares to A. Since the positive semidefinite square root is also unique (Proposition 2.3), we conclude that A^{1/2} = U^T D^{1/2}U = lim(A + (1/n)I)^{1/2}.

Now let A ≤ B for A, B ∈ Sym^+. Then A + (1/n)I ≤ B + (1/n)I for each n, and thus (A + (1/n)I)^{1/2} ≤ (B + (1/n)I)^{1/2}, since we have already established the positive definite case of the theorem. Since the left-hand side converges to A^{1/2} and the right-hand side to B^{1/2} by the previous paragraph, and since Sym^+ is closed, we conclude that A^{1/2} ≤ B^{1/2}.

If C = U^T DU for an orthogonal U and a diagonal D, then (D^2)^{1/2} is obtained by taking absolute values along the diagonal of D, and hence D ≤ (D^2)^{1/2} ∈ Sym^+. Thus C = U^T DU ≤ U^T(D^2)^{1/2}U ∈ Sym^+. Since (U^T(D^2)^{1/2}U)^2 = U^T D^2 U = C^2, we conclude that (C^2)^{1/2} = U^T(D^2)^{1/2}U ≥ C. By the preceding paragraph, (C^2)^{1/2} ≤ A^{1/2}, and thus C ≤ A^{1/2}.

Assuming the Loewner-Heinz inequality, one can immediately deduce the Monotone Property from the definition of the geometric mean and the Commutative Property. Primary matrix functions on Sym^{++} that preserve the Loewner order are called monotone matrix functions and have a rich theory all their own; see [8, Chapter VII]. We have employed the geometric mean in establishing the monotonicity of the primary matrix functions A → −A^{-1} and A → A^{1/2} induced by t → −1/t and t → t^{1/2}, respectively.
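The matrix harmonic-arithmetic iteration discussed above is easy to run numerically; a minimal sketch, checking that the common limit solves the Riccati equation X A^{-1} X = B and hence (by Lemma 2.4) equals A#B:

```python
# Sketch: the harmonic-arithmetic iteration on positive definite matrices.
# By Lemma 2.4, a common limit solving X A^{-1} X = B must be A#B.
import numpy as np

rng = np.random.default_rng(2)
M, N = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A = M @ M.T + np.eye(3)          # generic positive definite matrices
B = N @ N.T + np.eye(3)

An, Bn = A, B
for _ in range(60):
    # A_{n+1} = harmonic mean, B_{n+1} = arithmetic mean
    An, Bn = 2 * np.linalg.inv(np.linalg.inv(An) + np.linalg.inv(Bn)), (An + Bn) / 2

assert np.allclose(An, Bn)                          # the two sequences meet ...
assert np.allclose(An @ np.linalg.inv(A) @ An, B)   # ... at the solution of X A^{-1} X = B
```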
For A ∈ Sym^{++}(n, R), B ∈ Sym^+(n, R), an n × n matrix C, and

   g = \begin{pmatrix} I & −A^{-1}C \\ 0 & I \end{pmatrix},

we deduce from the block matrix equation

   Γ_g\begin{pmatrix} A & C \\ C^T & B \end{pmatrix} = \begin{pmatrix} I & 0 \\ −C^T A^{-1} & I \end{pmatrix}\begin{pmatrix} A & C \\ C^T & B \end{pmatrix}\begin{pmatrix} I & −A^{-1}C \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & B − C^T A^{-1}C \end{pmatrix}

and from the preservation of Sym^+ by Γ_g the following

Lemma 3.3. For 0 < A, 0 ≤ B, the matrix

   \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}

is positive semidefinite if and only if B − C^T A^{-1}C is positive semidefinite, i.e., if and only if C^T A^{-1}C ≤ B.

Theorem 3.4. The geometric mean A#B of A, B ∈ Sym^{++} is the largest matrix (with respect to the partial order ≤) of all X ∈ Sym for which the block matrix

   \begin{pmatrix} A & X \\ X & B \end{pmatrix}    (2)

is positive semidefinite.

Proof. Since (A#B)A^{-1}(A#B) = B, the matrix

   \begin{pmatrix} A & A#B \\ A#B & B \end{pmatrix}

is positive semidefinite by Lemma 3.3. Suppose X ∈ Sym makes the matrix in (2) positive semidefinite. Again by Lemma 3.3, X A^{-1}X ≤ B, and hence

   (A^{-1/2} X A^{-1/2})^2 = A^{-1/2}(X A^{-1}X)A^{-1/2} ≤ A^{-1/2} B A^{-1/2}.

Theorem 3.2 implies that A^{-1/2} X A^{-1/2} ≤ (A^{-1/2} B A^{-1/2})^{1/2}, and applying Γ_{A^{1/2}} yields X ≤ A#B.

Theorem 3.4 establishes the matrix version of Property 3 of Section 1. The proof given here is that of Ando [3]. The proof shows that not only is A#B the unique solution of X A^{-1}X = B, but, more generally, A#B is the largest (in the Loewner order) solution of the Riccati inequality X A^{-1}X ≤ B.

Corollary 3.5. For A, B ∈ Sym^{++} and X ∈ Sym, we have X A^{-1}X ≤ B if and only if X ≤ A#B.
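Theorem 3.4 can be probed numerically; the sketch below (helper names ours) checks that X = A#B makes the block matrix positive semidefinite, while any strictly larger X cannot:

```python
# Sketch: the block-matrix characterization of A#B (Theorem 3.4).
import numpy as np

def sym_sqrt(A):
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def gmean(A, B):
    Ah = sym_sqrt(A)
    Ahi = np.linalg.inv(Ah)
    return Ah @ sym_sqrt(Ahi @ B @ Ahi) @ Ah

def is_psd(S, tol=1e-10):
    """Positive semidefiniteness, up to a small numerical tolerance."""
    return np.linalg.eigvalsh((S + S.T) / 2).min() >= -tol

rng = np.random.default_rng(3)
M, N = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A, B = M @ M.T + np.eye(3), N @ N.T + np.eye(3)
G = gmean(A, B)

block = lambda X: np.block([[A, X], [X, B]])
assert is_psd(block(G))                        # A#B makes the block PSD ...
assert not is_psd(block(G + 0.1 * np.eye(3)))  # ... and nothing strictly larger does
```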
Suppose that we have decreasing sequences A_n ↓ A and B_n ↓ B, where A, B ∈ Sym^+ and A_n, B_n ∈ Sym^{++} for each n (as in the proof of Theorem 3.2). By the Monotone Property the sequence (A_n#B_n) is a decreasing sequence bounded below by the zero matrix, and hence must have a limit. It is not difficult to show that this limit is independent of the chosen sequences A_n and B_n, and hence a well-defined limit results. We call this limit the geometric mean, and denote it by A#B. Many properties of the geometric mean such as the idempotency, commutativity, and monotonicity carry over to this extended mean on Sym^+, but we do not pursue these topics here.

These and other non-trivial properties of the geometric mean A#B generalize readily to positive operators on a Hilbert space, both real and complex, and have played important roles in operator algebra theory: operator inequalities, monotone operator functions, the AGM (arithmetic-geometric mean) for operators, and related geometries in operator spaces; see [2], [4], and [20].

4. BRUHAT-TITS SPACES. We say that a metric space (X, d) satisfies the semiparallelogram law if for any two points x_1, x_2 ∈ X, there is a point z that satisfies

   d(x_1, x_2)^2 + 4d(x, z)^2 ≤ 2d(x, x_1)^2 + 2d(x, x_2)^2  for all x ∈ X.

For x_1, x_2, x in euclidean space R^n, consider the vectors u stretching from x to x_1, v stretching from x to x_2, and u − v stretching from x_2 to x_1. Extend this triangle to a parallelogram by adjoining the vector v originating from x_1 and the vector u originating from x_2. Let z be the midpoint of u − v, which is also the midpoint of the other diagonal u + v of the parallelogram.

[Figure: the parallelogram spanned by u and v at x, with vertices x, x_1, x_2, diagonals u − v and u + v, and common midpoint z.]

Then the usual vector parallelogram law

   ‖u − v‖^2 + ‖u + v‖^2 = 2‖u‖^2 + 2‖v‖^2

is equivalent to the semiparallelogram law with the inequality replaced by an equality (indeed, corresponding terms in the two expressions are equal). This motivates the terminology "semiparallelogram law" and establishes that Hilbert space metrics satisfy it, the point z being the midpoint of x_1 and x_2.

At first glance the semiparallelogram law appears rather cumbersome. We give a simple illustration of its application.

Lemma 4.1. For points x_1, x_2 in a metric space (X, d), the z arising in the semiparallelogram law is the unique point in X satisfying d(x_1, z) = d(x_2, z) = (1/2)d(x_1, x_2).

Proof. By consecutively setting x = x_1 and x = x_2 in the semiparallelogram law, we obtain 2d(x_i, z) ≤ d(x_1, x_2) for i = 1, 2. Then

   d(x_1, x_2) ≤ d(x_1, z) + d(x_2, z) ≤ 2 max{d(x_1, z), d(x_2, z)} ≤ d(x_1, x_2)
implies d(x_1, z) + d(x_2, z) = 2 max{d(x_1, z), d(x_2, z)} = d(x_1, x_2), from which d(x_1, z) = d(x_2, z) = (1/2)d(x_1, x_2) follows readily.

Let y be another point satisfying d(x_1, y) = d(x_2, y) = (1/2)d(x_1, x_2). Setting x = y in the semiparallelogram inequality yields

   d(x_1, x_2)^2 + 4d(y, z)^2 ≤ 2d(y, x_1)^2 + 2d(y, x_2)^2 = (1/2)d(x_1, x_2)^2 + (1/2)d(x_1, x_2)^2,

and thus 4d(y, z)^2 ≤ 0, i.e., y = z.

The point z of the semiparallelogram law is called the midpoint between x_1 and x_2. Lemma 4.1 ensures that it is unique. A Bruhat-Tits space is a complete metric space satisfying the semiparallelogram law; see [7] for the pioneering work on such spaces (the definition appears as item (3.2.1), page 63) and [16, Chapter 11] for their elementary theory. We give a general setting in which Bruhat-Tits spaces arise.

Proposition 4.2. Let (X, δ) be a metric space equipped with a binary operation (x, y) → x#y satisfying:

1. The group G of isometries of X acts transitively.
2. There exists a Hilbert space H and a continuous bijection exp : H → X such that for all a, b ∈ H, ‖a − b‖ ≤ δ(exp a, exp b), and exp restricted to any line Ra through the origin is an isometry.
3. Given x, y ∈ X and g ∈ G such that g.(x#y) = ε := exp(0), then log(g.y) = −log(g.x), where g.x denotes the image of x under the isometry g and the function log : X → H is the inverse of exp : H → X.

Then X is a Bruhat-Tits space and x#y is the midpoint of x and y for all x, y ∈ X. Furthermore, each g ∈ G preserves #, and # is characterized as the only G-invariant binary operation satisfying exp((a + b)/2) = exp(a)#exp(b) whenever a, b ∈ Rc, some line through the origin.

Proof. Let x_1, x_2 ∈ X. Pick g ∈ G such that g.(x_1#x_2) = ε; thus log(g.(x_1#x_2)) = 0. By Property 3, we have log(g.x_2) = −log(g.x_1). Now let x ∈ X, and pick v ∈ H such that exp(v) = g.x. Since H is a Hilbert space and 0 is the midpoint of u_1 := log(g.x_1) and u_2 := log(g.x_2), we have the parallelogram law

   d(u_1, u_2)^2 + 4d(v, 0)^2 = 2d(v, u_1)^2 + 2d(v, u_2)^2,

where d(a, b) = ‖a − b‖. Applying exp to u_1, u_2, v, 0, property 2 ensures that exp is an isometry for the left-hand distances and is nondecreasing on the right-hand distances, i.e.,

   δ(g.x_1, g.x_2)^2 + 4δ(g.x, ε)^2 ≤ 2δ(g.x, g.x_1)^2 + 2δ(g.x, g.x_2)^2.

Now applying the isometry g^{-1} ∈ G, we obtain the general semiparallelogram law with midpoint z = x_1#x_2. That exp is continuous and nondecreasing for distances ensures that (X, δ) is complete, and hence is a Bruhat-Tits space.

Since each g ∈ G is an isometry, it also preserves midpoints and hence the operation #. The function exp restricted to any line Rc through the origin is an isometry and hence carries the midpoint (a + b)/2 in H to the midpoint exp(a)#exp(b) in X for any a, b ∈ Rc. If #_1 is another G-invariant binary operation satisfying this property, then # and #_1 agree on images of lines Rc through the origin. Since any two points x_1, x_2 can be translated to such an image by some g ∈ G (see the preceding paragraph), and since both # and #_1 are G-invariant, it follows that # = #_1.
It is too much to ask that the operation # be the exponential image of the arithmetic mean for all pairs from H. The setting of the proposition gives a reasonable interpretation for declaring that # is induced by the arithmetic mean on H, and thus is an appropriate generalization of Property 5 of Section 1.

The Riemannian distance δ on Sym^{++} is defined as follows: for A, B ∈ Sym^{++},

   δ(A, B) = (Σ_{i=1}^n (log λ_i)^2)^{1/2},

where λ_1, ..., λ_n are the eigenvalues of the matrix AB^{-1}. The metric δ is invariant under similarities, since they preserve eigenvalues. Since AB^{-1} is similar to A^{-1/2}(AB^{-1})A^{1/2} = A^{1/2}B^{-1}A^{1/2} > 0, the eigenvalues of AB^{-1} are all positive, and hence log λ_i is defined for each i.

The Riemannian distance δ arises as the metric associated with arc length for a Riemannian metric called the trace metric: ds^2 = tr{(P^{-1} dP)^2}. Specifically the inner product ⟨A, B⟩_P on the tangent space T_P(Sym^{++}) = {P} × Sym ⊆ Sym^{++} × Sym = T(Sym^{++}) at P > 0 is given by the trace of the matrix P^{-1}AP^{-1}B for A, B ∈ Sym; see [18, §6, page 63], [16, Chapter 12]. For a curve t → P(t) in Sym^{++}, the differential for computing length is given by (ds/dt)^2 = tr[(P(t)^{-1}P′(t))^2].

By direct computation, δ is invariant under the group of transformations G^+ := {Γ_C : C invertible} and under inversion: δ(C^T PC, C^T QC) = δ(P, Q) = δ(P^{-1}, Q^{-1}), i.e., the inversion map and the transformations Γ_C are isometries for the Riemannian distance δ. The group G^+ acts transitively on Sym^{++}, since by Lemma 3.1 any two elements can be carried to I and hence (by composition) to each other.

By Proposition 2.2, exp : Sym → Sym^{++} is bijective. We endow Sym with the structure of a Hilbert space with inner product ⟨A, B⟩ = tr(AB). Note that Γ_U leaves this inner product invariant whenever U is orthogonal (since U^T = U^{-1}). On the diagonal matrices in Sym^{++}, the Riemannian distance δ coincides with the Euclidean distance between their logarithms. For any symmetric matrix A, there exists an orthogonal matrix U and a diagonal matrix D such that A = U^T DU. Then exp tA = exp(tU^T DU) = U^T(exp tD)U = Γ_U(exp tD) for all t ∈ R. Hence

   δ(exp tA, exp sA) = δ(Γ_U(exp tD), Γ_U(exp sD)) = δ(exp tD, exp sD) = |t − s| ‖D‖.

Since Γ_U(D) = A and Γ_U is an isometry on Sym, ‖tA − sA‖ = |t − s| ‖D‖ = δ(exp tA, exp sA). Thus exp is an isometry on every line RA through the origin in Sym.

Suppose that A, B ∈ Sym^{++} and Γ_C(A#B) = I, where A#B is the geometric mean. Then the Transformation Property and Lemma 2.4 imply that I(Γ_C(A))^{-1}I = Γ_C(B), and thus log(Γ_C(B)) = −log(Γ_C(A)).

We have now verified all the conditions of Proposition 4.2 except for the fact that exp : Sym → Sym^{++} is distance nondecreasing. Since δ is the arc length metric, it suffices to show that the derivative d exp_A is norm nondecreasing for each A ∈ Sym. We sketch the steps and leave the details to the reader:

1. For orthogonal U, Γ_U preserves both the trace metric ds^2 and the Hilbert metric ⟨·, ·⟩ on Sym and satisfies Γ_U ∘ exp = exp ∘ Γ_U. Thus application of Proposition 2.1 allows one to reduce to the case that A is diagonal with d_{ii} = λ_i, say.
2. If E_{ij}, i ≠ j, has ij-th entry 1 and all other entries 0, and if λ_i ≠ λ_j, then

   exp(A + sE_{ij}) = exp A + [(e^{λ_j} − e^{λ_i})/(λ_j − λ_i)] sE_{ij}.

3. For exp : M_n(R) → GL(n, R),

   d exp_A(E_{ij}) = lim_{s→0} (exp(A + sE_{ij}) − exp A)/s = [(e^{λ_j} − e^{λ_i})/(λ_j − λ_i)] E_{ij}.

4. d exp_A(E_{ij} + E_{ji}) = [(e^{λ_j} − e^{λ_i})/(λ_j − λ_i)](E_{ij} + E_{ji}).

5. tr[(exp A)^{-1} d exp_A(E_{ij} + E_{ji})(exp A)^{-1} d exp_A(E_{ij} + E_{ji})] = 2(e^{λ_j} − e^{λ_i})^2/(e^{λ_j} e^{λ_i}(λ_j − λ_i)^2).

6. The trace in the preceding step is greater than or equal to 2 = ‖E_{ij} + E_{ji}‖^2 ⇔ (e^{λ_j} − e^{λ_i})/(e^{λ_j/2} e^{λ_i/2}(λ_j − λ_i)) ≥ 1 ⇔ e^{λ_j} − e^{λ_i} − e^{λ_j/2} e^{λ_i/2}(λ_j − λ_i) ≥ 0 ⇔ e^{λ_i}(e^t − 1 − te^{t/2}) ≥ 0, where t = λ_j − λ_i.

7. By manipulation of power series, e^t − 1 − te^{t/2} ≥ 0 if t = λ_j − λ_i ≥ 0, which we may assume by interchanging λ_i and λ_j, if necessary.

8. Carry out similar, but simpler, calculations for the case λ_i = λ_j, j ≠ i, and for the case i = j.

9. Since the collection of all E_{ij} + E_{ji}, i ≠ j, and E_{ii} forms an orthogonal basis for Sym, each of which is an eigenvector for d exp_A, and since d exp_A is norm nondecreasing on each member of this basis, d exp_A is norm nondecreasing on all of Sym.

Theorem 4.3. With respect to the distance metric δ on Sym^{++} arising from the Riemannian trace metric, the space (Sym^{++}, δ) is a Bruhat-Tits space, and the (unique) midpoint of any two points is given by their geometric mean.

We have assumed heretofore that the metric δ is the one arising as the metric associated with arc length for the trace metric. This can also be derived from the preceding calculations. The facts that d exp_A is norm nondecreasing, that exp is an isometry on lines through the origin for the metric δ, and that both δ and the trace metric ds^2 are invariant with respect to members of G^+ yield that δ is bounded by the arc length metric. Conversely, for a diagonal matrix D a direct computation shows that the arc length of t → exp(tD), 0 ≤ t ≤ 1, is δ(exp D, I); the invariance of both metrics then implies that the reverse inequality holds everywhere.

Theorem 4.3 establishes the connection with Property 4 of Section 1 for positive definite matrices, and together with Proposition 4.2, with Property 5 there.
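Theorem 4.3's midpoint statement is easy to check numerically from the eigenvalue formula for δ; a sketch with helper names of our own choosing:

```python
# Sketch: delta(A, B) = (sum_i (log lambda_i)^2)^{1/2}, lambda_i the
# eigenvalues of A B^{-1}, and the midpoint property
# delta(A, A#B) = delta(B, A#B) = delta(A, B)/2.
import numpy as np

def sym_sqrt(A):
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def gmean(A, B):
    Ah = sym_sqrt(A)
    Ahi = np.linalg.inv(Ah)
    return Ah @ sym_sqrt(Ahi @ B @ Ahi) @ Ah

def delta(A, B):
    # The eigenvalues of A B^{-1} equal those of the symmetric matrix
    # B^{-1/2} A B^{-1/2}, which are real and positive.
    Bi = np.linalg.inv(sym_sqrt(B))
    lam = np.linalg.eigvalsh(Bi @ A @ Bi)
    return np.sqrt((np.log(lam) ** 2).sum())

rng = np.random.default_rng(4)
M, N = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A, B = M @ M.T + np.eye(3), N @ N.T + np.eye(3)
G = gmean(A, B)

assert np.isclose(delta(A, G), delta(A, B) / 2)   # A#B is the delta-midpoint
assert np.isclose(delta(B, G), delta(A, B) / 2)
```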
The midpoint property of the geometric mean can be established independently of the theory of Bruhat-Tits spaces, although the uniqueness of the midpoint appears to be another story. Since AB^{-1} is similar to A^{1/2}B^{-1}A^{1/2} and since A(A#B)^{-1} is similar to A^{1/2}(A#B)^{-1}A^{1/2} = (A^{1/2}B^{-1}A^{1/2})^{1/2}, we have δ(A, B) = 2δ(A, A#B). Analogously δ(A, B) = 2δ(B, A#B). Thus the geometric mean A#B is a midpoint of A and B with respect to the distance δ.

Bruhat-Tits spaces arise from a much larger class of Riemannian manifolds, the Cartan-Hadamard manifolds, which are complete simply connected Riemannian manifolds with seminegative curvature; see [16, Chapter XI]. In this case one has the standard exponential mappings from the tangent spaces to the manifold, whose role is similar to that of the exponential functions introduced in this section. The seminegative curvature gives the nondecreasing property of the exponential map. The Riemannian manifold Sym^{++} is one example of a Cartan-Hadamard manifold.

5. DIFFERENTIAL RICCATI EQUATIONS. The standard symplectic (skew-symmetric and non-degenerate) form on R^n × R^n is given by β((x_1, y_1), (x_2, y_2)) = ⟨x_1, y_2⟩ − ⟨y_1, x_2⟩, where ⟨·, ·⟩ is the usual inner product on R^n. The symplectic Lie group Sp(2n, R) consists of all invertible linear transformations on R^n × R^n that preserve the form β. A subspace V of R^n × R^n is called isotropic if β(x, y) = 0 for all x, y ∈ V. The symplectic transformations carry isotropic subspaces to isotropic subspaces of the same dimension.

If X ∈ M_n(R), then the graph X^G := {(Xx, x) | x ∈ R^n} of X is an n-dimensional subspace of R^n × R^n. Furthermore, if X is symmetric, then X^G is isotropic, since β((Xx, x), (Xy, y)) = ⟨Xx, y⟩ − ⟨x, Xy⟩ = 0. Denote by X_n the collection of all n-dimensional isotropic subspaces of R^n × R^n. Then a symplectic transformation

   g = \begin{pmatrix} A & B \\ C & D \end{pmatrix} ∈ Sp(2n, R)

gives rise to a bijection g : X_n → X_n defined by sending an isotropic n-dimensional subspace V of R^n × R^n to its image g(V); the mapping (g, V) → g(V) : Sp(2n, R) × X_n → X_n is a group action. If X ∈ Sym, then the graph X^G := {(Xx, x) | x ∈ R^n} of X is an n-dimensional isotropic subspace of R^n × R^n, as is its image g(X^G). The image g(X^G) is the graph of a linear map Y ∈ M_n(R) if (and only if) CX + D is invertible and (AX + B)(CX + D)^{-1} = Y. Indeed, if CX + D is invertible and y = (CX + D)x, then

   g\begin{pmatrix} Xx \\ x \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} Xx \\ x \end{pmatrix} = \begin{pmatrix} AXx + Bx \\ CXx + Dx \end{pmatrix} = \begin{pmatrix} (AX + B)(CX + D)^{-1}y \\ y \end{pmatrix},

and thus g(X^G) is Y^G for Y = (AX + B)(CX + D)^{-1}. In this case, since Y^G is isotropic, it follows that Y is symmetric. Therefore the action of Sp(2n, R) on X_n induces (via the identification of a symmetric matrix with its graph) an action on the symmetric matrices, which we call the fractional (or Möbius) action, defined by

   \begin{pmatrix} A & B \\ C & D \end{pmatrix} · X = (AX + B)(CX + D)^{-1},  if CX + D is invertible.

Now suppose that the graph X^G is an invariant n-dimensional subspace of a 2n × 2n real matrix T ∈ M_{2n}(R). Then X^G is also an invariant subspace of exp T. Thus (exp T) · X = X, that is, X is a fixed point of exp T under the fractional action.
Let A, B ∈ Sym^{++} and, for the remainder of this section, let

T = \begin{pmatrix} 0 & B \\ A^{-1} & 0 \end{pmatrix}.

Then since A#B is the solution of the quadratic equation XA^{-1}X = B, we have

\begin{pmatrix} 0 & B \\ A^{-1} & 0 \end{pmatrix}\begin{pmatrix} (A#B)x \\ x \end{pmatrix} = \begin{pmatrix} Bx \\ A^{-1}(A#B)x \end{pmatrix} = \begin{pmatrix} (A#B)y \\ y \end{pmatrix} \quad for y = A^{-1}(A#B)x;

hence the graph of A#B is T-invariant. Therefore A#B, the geometric mean of A and B, is a fixed point of exp(tT) for all t ∈ R, and these are symplectic transformations since T is in the symplectic Lie algebra, which consists of block matrices \begin{pmatrix} A & B \\ C & D \end{pmatrix} satisfying B, C ∈ Sym and D = −A^T.
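These three facts about A#B — that it solves the Riccati equation, that it is a δ-midpoint, and that it is fixed by exp(tT) under the fractional action — can all be verified numerically. The sketch below is ours (NumPy/SciPy assumed; `geo_mean`, `delta`, `mobius`, and `random_spd` are our own helper names):

```python
import numpy as np
from scipy.linalg import expm, sqrtm, eigvalsh

rng = np.random.default_rng(1)

def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

def geo_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    Ah = np.real(sqrtm(A))
    Ahi = np.linalg.inv(Ah)
    return Ah @ np.real(sqrtm(Ahi @ B @ Ahi)) @ Ah

def delta(A, B):
    """Riemannian distance on Sym++: the 2-norm of the logarithms of
    the eigenvalues of A^{-1} B (a generalized eigenvalue problem)."""
    return np.linalg.norm(np.log(eigvalsh(B, A)))

def mobius(g, X):
    """Fractional action g . X = (AX + B)(CX + D)^{-1}."""
    n = X.shape[0]
    return (g[:n, :n] @ X + g[:n, n:]) @ np.linalg.inv(g[n:, :n] @ X + g[n:, n:])

n = 3
A, B = random_spd(n), random_spd(n)
G = geo_mean(A, B)

# G solves the Riccati equation X A^{-1} X = B ...
print(np.allclose(G @ np.linalg.inv(A) @ G, B))
# ... is a delta-midpoint of A and B ...
print(np.isclose(delta(A, B), 2 * delta(A, G)),
      np.isclose(delta(A, B), 2 * delta(B, G)))
# ... and is a fixed point of exp(tT) under the fractional action
T = np.block([[np.zeros((n, n)), B],
              [np.linalg.inv(A), np.zeros((n, n))]])
print(np.allclose(mobius(expm(0.7 * T), G), G))
```

The value t = 0.7 is an arbitrary choice; by the T-invariance of the graph of A#B, any t works.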
© THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 108
The matrix

exp t\begin{pmatrix} 0 & 0 \\ A^{-1} & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ tA^{-1} & I \end{pmatrix}
acts on a symmetric matrix X under the fractional action by (t, X) → X(tA^{-1}X + I)^{-1}, where the latter inverse is defined at least for small t (for fixed X). Differentiating with respect to t and evaluating at t = 0 gives the differential equation corresponding to this continuous dynamical system: \dot{X} = −XA^{-1}X. Similarly

exp t\begin{pmatrix} 0 & B \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I & tB \\ 0 & I \end{pmatrix}

acts by (t, X) → X + tB, which has corresponding differential equation \dot{X} = B. Thus the fractional action of exp(tT) gives the solution of the differential Riccati equation \dot{X} = B − XA^{-1}X. This last step uses a standard fact about differentiable Lie group actions G × M → M: if the infinitesimal generator of the dynamical system (t, x) → exp(tC_i)x is the vector field U_i on M for i = 1, 2, then (t, x) → exp(t(C_1 + C_2))x has infinitesimal generator U_1 + U_2.

In [6] Bougerol considered the subsemigroup

H := \left\{ \begin{pmatrix} A & B \\ C & D \end{pmatrix} \in Sp(2n, R) : A is invertible, BA^T, A^TC ∈ Sym^+ \right\}

of the symplectic group Sp(2n, R); the members of H are called Hamiltonian matrices. We denote by H_0 the subset of H for which BA^T, A^TC ∈ Sym^{++}. If we calculate exp(tT), t > 0, directly from the exponential series, we see that its (1, 2)-entry is tB + X(t) and its (2, 1)-entry is tA^{-1} + Y(t) with X(t), Y(t) ∈ Sym^+, and thus exp(tT) ∈ H_0 for t > 0. Bougerol has shown that each member of H_0 carries Sym^{++} into itself (under the fractional action) and is a strict contraction with respect to the Riemannian metric δ. Since A#B is a fixed point (from the previous paragraph, or directly from the differential Riccati equation), we conclude that A#B is the unique fixed point of each exp(tT), t > 0, and achieve the analog of Property 6 of Section 1. However, the Banach fixed point theorem for contractions gives us even more, an analog of Property 7:

Theorem 5.1. For A, B ∈ Sym^{++}, A#B is the limiting value as t → ∞ of the solution of the differential Riccati equation \dot{X} = B − XA^{-1}X on Sym^{++} with initial value any member of Sym^{++}.
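Theorem 5.1 can be observed numerically by integrating the Riccati flow and comparing the long-time state with the closed-form geometric mean. This is our own sketch (NumPy/SciPy assumed; the final time 100 and the tolerances are arbitrary choices of ours, not prescribed by the theory):

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
n = 3
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = M1 @ M1.T + n * np.eye(n)   # random positive definite A
B = M2 @ M2.T + n * np.eye(n)   # random positive definite B
Ainv = np.linalg.inv(A)

# closed form: A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}
Ah = np.real(sqrtm(A))
Ahi = np.linalg.inv(Ah)
G = Ah @ np.real(sqrtm(Ahi @ B @ Ahi)) @ Ah

def riccati(t, x):
    """Right-hand side of the differential Riccati equation
    dX/dt = B - X A^{-1} X, with X flattened into a vector."""
    X = x.reshape(n, n)
    return (B - X @ Ainv @ X).ravel()

X0 = np.eye(n)  # Theorem 5.1: any positive definite initial value works
sol = solve_ivp(riccati, (0.0, 100.0), X0.ravel(), rtol=1e-10, atol=1e-12)
X_inf = sol.y[:, -1].reshape(n, n)

print(np.linalg.norm(X_inf - G))  # small: the flow has converged to A # B
```

Bougerol's strict contraction result is what guarantees the exponential convergence seen here, independently of the starting point X0.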
The contraction properties of the Hamiltonian semigroup of matrices as mappings of Sym^{++} into itself under the fractional action (which includes exp(tT), t > 0) can be used to establish fundamental results in Kalman filtering theory [6], a basic tool in stochastic control.

6. CONVEX PROGRAMMING. Since Karmarkar introduced his polynomial-time projective-scaling algorithms for linear programming in 1984, the field of interior point methods has developed at a rapid rate. A unified class of convex optimization problems (including nonlinear problems) has emerged that is susceptible to interior point methods, provided that the cone of admissible solutions is a symmetric cone, i.e., an open convex cone Ω in a finite-dimensional real Hilbert space V that is self-dual with
respect to the given inner product, and is homogeneous in the sense that the group G(Ω) = {g ∈ GL(V) | g(Ω) = Ω} acts transitively on Ω. In classical linear programming the cone Ω was chosen to be (0, ∞)^n ⊆ R^n, but an appropriate example in our context is Sym^{++} ⊆ Sym. A typical problem considered is: given a symmetric cone Ω in a Hilbert space V, c ∈ V, a linear surjective A : V → W, and b ∈ W, maximize ⟨c, x⟩ subject to x ∈ Ω and Ax = b. The key ingredients in the interior point algorithms are a modified Newton's method and the presence of a self-concordant barrier function F : Ω → R, a smooth convex function tending to ∞ as the boundary of Ω is approached, that together with its derivatives satisfies appropriate Lipschitz continuity properties. The barrier enters directly into the functions used for path-following and potential-reduction. For any two points in Ω, there is a point called a scaling point at which the Hessian of the barrier carries the first point to the second. One repeated step in typical interior point algorithms is to locate these scaling points, a process that sometimes has ties with the geometric mean. This provides motivation to study the geometric mean in symmetric cones. We illustrate with an example that also provides an analog of Property 8 of Section 1. The canonical barrier function on the symmetric cone of positive definite matrices is given by F(X) = −log det(X). Then F′(X) = −X^{-1} and the Hessian of F is F′′(X)(A) = X^{-1}AX^{-1}. Thus the geometric mean A#B is characterized as the unique positive definite element with the property that the Hessian F′′ at A#B maps A to B^{-1} (take inverses of both sides of the Riccati equation in Lemma 2.4). Therefore the scaling point from A to B^{-1} is given by the geometric mean. The book [19] provides details of the theory of convex programming.

7. GEOMETRIC MEANS IN SYMMETRIC CONES.
There is an attractive theory of geometric means for symmetric cones that both extends and closely parallels our development of the theory for Sym^{++}. Indeed, familiarity with the positive definite case and a basic facility with Jordan algebras provide reliable guides to this extension. We offer the briefest of glimpses in this closing section.

The theory of symmetric cones is closely tied to that of Euclidean Jordan algebras. A Jordan algebra V over R is a finite-dimensional commutative (but in general not associative) algebra satisfying x^2(xy) = x(x^2 y) for all x, y ∈ V. We also assume the existence of a multiplicative identity e. A Euclidean Jordan algebra is a Jordan algebra V equipped with an associative inner product ⟨·, ·⟩ satisfying ⟨xy, z⟩ = ⟨y, xz⟩ for all x, y, z ∈ V. The space Sym is a Euclidean Jordan algebra with Jordan product X ◦ Y = (1/2)(XY + YX) and inner product ⟨X, Y⟩ = tr(XY). An alternative statement of the Jordan algebra law is (xy)x^2 = x(yx^2), a weak associativity condition that is strong enough to ensure that the subalgebra generated by {e, x} in V is associative [12, Proposition II.1.2]. Hence there is a well-defined exponential map exp : V → V given by exp(x) = \sum_{n=0}^{\infty} x^n/n!, and as in the special case of Sym, exp is a bijection onto its image Ω := exp V. A (or perhaps the) fundamental theorem of Euclidean Jordan algebras asserts that (i) Ω is a symmetric cone, and (ii) every symmetric cone in a finite-dimensional real Hilbert space arises in this way. This result and the basic theory of Euclidean Jordan algebras may be found in [12]. Members of Ω are called positive elements. In the case of the Jordan algebra Sym, the Jordan algebra exponential is just the usual one, and hence the corresponding symmetric cone is Sym^{++}. The other irreducible symmetric cones have been classified and consist of (i) the cones of positive definite members of the Hermitian and Hermitian quaternion matrices, (ii) the Lorentz cones, and (iii) a 27-dimensional exceptional cone. General symmetric cones are products of these.

There are two basic operator representations associated with a Jordan algebra: left translation, given by L(a)x = ax, and the quadratic representation, given by P(x) = 2L(x)^2 − L(x^2); the latter is the important one for our purposes. The Jordan algebra law asserts alternatively that L(x) and L(x^2) commute, and the associativity of the inner product is equivalent to the symmetry of L(x). The general formula P(x)y = 2x(xy) − x^2 y reduces in the case V = Sym to P(X)(Y) = XYX, where the multiplication on the right-hand side is usual matrix multiplication, not the Jordan multiplication. Thus a key observation in generalizing from Sym to Euclidean Jordan algebras is to replace consistently expressions of the form XYX by P(x)(y). For a, b ∈ Ω, the symmetric cone Riccati equation P(x)a^{-1} = b has a unique positive solution given by x = a#b := P(a^{1/2})(P(a^{-1/2})b)^{1/2}, called the geometric mean of a and b. One uses this variant of the Riccati equation to establish various algebraic properties of the operation # such as the idempotent, commutative, inversion, and transformation properties. A symmetric cone Ω admits a G(Ω)-invariant Riemannian metric defined by γ_x(u, v) = ⟨P(x)^{-1}u, v⟩, x ∈ Ω, u, v ∈ T_x(Ω) = V, for which the inversion j(x) = x^{-1} is the unique involutive isometry fixing the identity e. The unique geodesic curve joining a and b is

γ(t) = P(a^{1/2})(P(a^{-1/2})b)^t, \quad 0 ≤ t ≤ 1,
and the Riemannian distance δ(a, b) is given by

δ(a, b) = \left( \sum_{i=1}^{r} \log^2 λ_i \right)^{1/2},
where the λ_i's are the eigenvalues of P(a^{-1/2})b. (The eigenvalues of an element in a Euclidean Jordan algebra arise from a canonical spectral decomposition; see [12, Theorem III.1.2].) Then the geometric mean a#b is the unique midpoint of a and b for the Riemannian distance δ. In this case the symmetric cone Ω turns out to be a Riemannian symmetric space of noncompact type. These are known to be Cartan-Hadamard manifolds, and hence the metric δ is a Bruhat-Tits metric with midpoint a#b. For more details and additional information on geometric means on symmetric cones, see [17]. For an approach to convex programming via Jordan algebra methods, see [9] and [10], and for connections of convex programming with the geometric mean see [11].

REFERENCES
1. W. N. Anderson, Jr., Shorted operators, SIAM J. Appl. Math. 20 (1971) 520–525.
2. T. Ando, Topics on Operator Inequalities, Lecture Notes Hokkaido Univ., Sapporo, 1978.
3. T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. 26 (1979) 203–241.
4. T. Ando, On some operator inequalities, Math. Ann. 279 (1987) 157–159.
5. G. Birkhoff, Extensions of Jentzsch's theorem, Trans. Amer. Math. Soc. 85 (1957) 219–227.
6. P. Bougerol, Kalman filtering with random coefficients and contractions, SIAM J. Control Optim. 31 (1993) 942–959.
7. F. Bruhat and J. Tits, Groupes Réductifs sur un Corps Local I, Pub. IHES 41 (1972) 5–251.
8. W. F. Donoghue, Jr., Monotone Matrix Functions and Analytic Continuation, Springer-Verlag, Berlin, 1974.
9. L. Faybusovich, Linear systems in Jordan algebras and primal-dual interior-point algorithms, J. Comput. Appl. Math. 86 (1997) 149–175.
10. L. Faybusovich, Euclidean Jordan algebras and interior-point algorithms, Positivity 1 (1997) 331–357.
11. L. Faybusovich, A Jordan-algebraic approach to potential-reduction algorithms, Technical Report, Department of Mathematics, University of Notre Dame, Notre Dame, 1998.
12. J. Faraut and A. Korányi, Analysis on Symmetric Cones, Clarendon Press, Oxford, 1994.
13. M. Fiedler and V. Pták, A new positive definite geometric mean of two positive definite matrices, Linear Algebra Appl. 251 (1997) 1–20.
14. R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press, New York, 1985.
15. F. Kubo and T. Ando, Means of positive linear operators, Math. Ann. 246 (1980) 205–224.
16. S. Lang, Fundamentals of Differential Geometry, Graduate Texts in Math., Springer, Heidelberg, 1999.
17. Y. Lim, Geometric means on symmetric cones, Arch. Math. 75 (2000) 39–45.
18. H. Maass, Siegel's Modular Forms and Dirichlet Series, Lecture Notes in Math. 216, Springer-Verlag, Heidelberg, 1971.
19. Y. E. Nesterov and A. S. Nemirovskii, Interior Point Polynomial Methods in Convex Programming, SIAM Publications, Philadelphia, 1993.
20. R. D. Nussbaum and J. E. Cohen, The arithmetic-geometric mean and its generalizations for noncommuting linear operators, Ann. Scuola Norm. Sup. Pisa Cl. Sci. 15 (1988) 239–308.
21. G. Pedersen and M. Takesaki, The operator equation THT = K, Proc. Amer. Math. Soc. 36 (1972) 311–312.

JIMMIE LAWSON received his B.Sc. from Harding College (1964) and his Ph.D.
from the University of Tennessee (1967), both in mathematics, a subject that attracted his attention from primary school days. He has spent a long and enjoyable career at Louisiana State University, where he was recently made a Boyd Professor, but he has also been a frequent visitor at the Technical University, Darmstadt, Germany. His varied research interests include continuous lattices and domains, Lie theory, geometric control, topological dynamics, and topological and Lie semigroups. He is now completing a term as Notes Editor for the MONTHLY.
Louisiana State University, Baton Rouge, LA 70803
[email protected]

YONGDO LIM obtained his B.Sc. and M.Sc. in mathematics at Kyungpook National University, and his Ph.D. in mathematics in 1996 at Louisiana State University under the supervision of Jimmie Lawson. He is an Assistant Professor at Kyungpook National University. His research interests include the Lie theory of semigroups and Jordan algebras.
Kyungpook University, Taegu, Korea
[email protected]