arXiv:1702.03237v1 [math.MG] 10 Feb 2017
Foundations for a scaling-rotation geometry in the space of symmetric positive-definite matrices
David Groisser a,∗, Sungkyu Jung b, Armin Schwartzman c
a Department of Mathematics, University of Florida, Gainesville, FL 32611, USA
b Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA
c Division of Biostatistics, University of California, San Diego, CA 92903, USA
Abstract
We investigate a geometric structure on Sym+(p), the set of p × p symmetric positive-definite matrices, based on eigen-decomposition. Eigenstructure determines both a stratification of Sym+(p), defined by eigenvalue multiplicities, and fibers of the “eigen-composition” map F : M(p) := SO(p) × Diag+(p) → Sym+(p). The fiber structure leads to the notions of scaling-rotation distance between X, Y ∈ Sym+(p), the distance in M(p) between fibers F^{-1}(X) and F^{-1}(Y), and minimal smooth scaling-rotation (MSSR) curves [Jung et al. (2015)], images in Sym+(p) of minimal-length geodesics connecting two fibers. In this paper we study the geometry of the triple (M(p), F, Sym+(p)), where M(p) is given a suitable product Riemannian metric, and use this basic geometry to begin a systematic characterization and analysis of MSSR curves. This task raises basic questions: For which X, Y is there a unique MSSR curve from X to Y? More generally, what is the set M(X, Y) of MSSR curves from X to Y? This set is influenced by two ways that non-uniqueness can potentially occur, “Type I” and “Type II”. We translate the question of whether Type II non-uniqueness can occur into a question about the geometry of Grassmannians G_m(R^p), with m even, that we answer for p ≤ 4 and p ≥ 11. Our method of proof also yields an interesting half-angle formula concerning principal angles between subspaces of R^p whose dimensions may or may not be equal. The general-p results concerning MSSR curves and scaling-rotation distance that we establish here underpin the p = 3 results in [8], where they enable us to find new explicit formulas for MSSR curves and scaling-rotation distance, and to identify M(X, Y), in all nontrivial cases.
Keywords: eigen-decomposition, stratified spaces, scaling-rotation distance, signed-permutation group, geometric structures on quotient spaces, principal angles, geometry of Grassmannians
2010 MSC: 53C99, 53C15, 57R15, 53C22, 51F25, 15A18, 58A35
✩ This work was supported by NIH grant R21EB012177 and NSF grant DMS-1307178.
∗ Corresponding author
Email addresses: [email protected] (David Groisser), [email protected] (Sungkyu Jung), [email protected] (Armin Schwartzman)
Preprint submitted to Advances in Applied Mathematics
February 13, 2017
1. Introduction

In this work, we investigate a geometric structure on Sym+(p), the set of p × p symmetric positive-definite (SPD) matrices, p > 1, and special curves that this structure gives rise to. Both the geometric structure and these special curves are built from eigen-decomposition of SPD matrices. Let Diag+(p) denote the set of p × p diagonal matrices with positive diagonal entries. By an (orthonormal) eigen-decomposition of X ∈ Sym+(p) we will mean a pair (U, D) ∈ SO(p) × Diag+(p) such that X = U D U^{-1} = U D U^T. The space of such decompositions,
\[ M(p) := SO(p) \times \mathrm{Diag}^+(p), \tag{1.1} \]
thus comes naturally equipped with a smooth surjective map F : M(p) → Sym+(p) defined by
\[ F(U, D) = U D U^T. \tag{1.2} \]
For each X ∈ Sym+(p) we define the fiber over X to be the set E_X := F^{-1}(X). Note, however, that M(p) is not a fiber bundle over Sym+(p) with projection F; the map F is not even a submersion. (Rather, the relation of M(p) to Sym+(p) is reminiscent of the notion of blow-up in algebraic geometry: M(p) can be viewed as a sort of blow-up of Sym+(p) along several subvarieties.) The natural action SO(p) × Sym+(p) → Sym+(p), (U, X) ↦ U X U^T, endows Sym+(p) with a stratification by orbit-type, and the derivative of F is nonsingular only on the pre-image of the “top” stratum. This stratification is identical to the stratification by “eigenvalue-multiplicity type”, in which the strata are labeled by partitions of the integer p. Eigenvalue multiplicities also determine a more refined stratification of the space M(p), in which the strata are labeled by partitions of the set {1, . . . , p}.

The fiber structure of M(p) formalizes the notion of minimal smooth scaling-rotation curves [11], whose characterization and analysis are among the main goals of this paper. Motivated by applications to diffusion-tensor imaging, Schwartzman [13] introduced smooth scaling-rotation curves as a way of interpolating between SPD matrices in such a way that eigenvectors and eigenvalues both change at uniform speed. Minimal smooth scaling-rotation curves were defined in [11] as smooth curves whose length (as determined by an appropriate Riemannian metric on M(p)) minimizes the amount of scaling and rotation needed to transform one SPD matrix into another. More precisely, each factor of M(p) is a Lie group, and for our Riemannian metric g_M on M(p) we take a product metric determined by choosing bi-invariant metrics g_SO, g_D+ on the factors SO(p), Diag+(p). We define smooth scaling-rotation (SSR) curves in Sym+(p) to be the projections to Sym+(p) of geodesics in (M(p), g_M).
In this scaling-rotation framework, the “distance” d_SR(X, Y) between any two matrices X, Y ∈ Sym+(p) is defined to be the distance between the fibers E_X and E_Y (nonzero if X ≠ Y since each fiber is compact). The minimal-length geodesics connecting two fibers E_X and E_Y give rise to minimal smooth scaling-rotation (MSSR) curves, “efficient” smooth scaling-rotation curves that join X and Y.

The geometry of M(p) with the Riemannian metric g_M is relatively simple; for example the squared-distance function d_M^2 is simply a sum of squares. Despite this simplicity “upstairs”, the problem of determining MSSR curves between arbitrary X, Y in the quotient space Sym+(p) is far from trivial, and the dependence on X and Y of the set M(X, Y) of such curves is quite intricate. Each fiber E_X is a submanifold of M(p), typically with many connected components, whose diffeomorphism-type (which we give very explicitly in Corollary 2.9) depends on the stratum of Sym+(p) in which X lies, and has positive dimension when X does not lie in the top stratum. While it is easy to write down an explicit formula for the distance between two arbitrary points of M(p), even in Euclidean space there is no general formula for the distance between submanifolds of positive dimension, even when the submanifolds come equipped with explicit parametrizations or are level-sets of explicitly given functions. Understanding the fibers of F is thus one of the key ingredients in analyzing non-trivial features of the scaling-rotation framework. Among the nontrivial features is the fact, shown in [11], that d_SR is a metric on the top stratum of Sym+(p), but not on all of Sym+(p). In [9], we use d_SR to generate a true metric on Sym+(p).
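The non-injectivity of F can be seen already for p = 2. The following is a minimal sketch, not taken from the paper: the helper names (`matmul`, `transpose`, `diag`, `close`) and the numerical values are illustrative assumptions. It builds X = F(U, D) and then exhibits a second point of the same fiber E_X, obtained by swapping the eigenvalues and multiplying U on the right by the inverse of a signed permutation matrix of determinant +1.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def diag(d):
    n = len(d)
    return [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def F(U, d):
    """Eigen-composition map F(U, D) = U D U^T with D = diag(d)."""
    return matmul(matmul(U, diag(d)), transpose(U))

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# A rotation U in SO(2) and distinct positive eigenvalues (illustrative values).
t = 0.3
U = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
d = [2.0, 5.0]
X = F(U, d)

# A second eigen-decomposition of the same X: P is a signed permutation
# matrix with det P = +1, U2 = U P^{-1} = U P^T, and the eigenvalues swap.
P = [[0.0, -1.0], [1.0, 0.0]]
U2 = matmul(U, transpose(P))
d2 = [5.0, 2.0]
same_fiber = close(F(U2, d2), X)
```

Here `same_fiber` is true although (U, d) and (U2, d2) are different points of M(2), so the fiber over X contains more than one point.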
This paper has several interrelated goals: to provide a systematic description of the geometry and topology of the triple (M(p), F, Sym+(p)) in terms of fibers and strata; to lay a firm foundation for further study of the scaling-rotation framework, such as in [8] and [9]; to characterize d_SR and MSSR curves in a way that reduces computational complexity and that, for small enough p, enables the derivation of closed-form formulas; to address nonuniqueness questions for MSSR curves between two given points X, Y ∈ Sym+(p); and to present some unanticipated geometric results (described in more detail below), potentially of independent interest, that were produced in the course of studying the nonuniqueness questions. We elaborate below how this paper is structured relative to these goals.

Section 2 of this paper is devoted to a thorough understanding of the fibers of F. This includes a discussion of an important finite group S̃_p^+, the group of “even signed-permutations”, that acts in an isometric and fiber-preserving way on M(p), inducing a transitive action on the set of connected components of each fiber (thereby allowing us to count the connected components). To place the fibers into the context of a “big picture” view of the triple (M(p), F, Sym+(p)), in Section 3 we discuss the strata of M(p) and Sym+(p), and the labeling of strata by partitions. The strata and their labelings are always at work in the background of the scaling-rotation framework, even when not mentioned explicitly, since the fiber-type varies from stratum to stratum of Sym+(p), and because the more-refined stratification of M(p) is needed in order to characterize the set of all MSSR curves from one point of Sym+(p) to another. The labeling by partitions plays a critical role in [8, Sections 5 and 6] and [9], for example.
With the “home space” (M(p), F, Sym+(p)) of the scaling-rotation framework now described, in Section 4 we turn our attention to scaling-rotation distance and MSSR curves. In Section 4.1 we review the basics of SSR curves, before restricting attention to MSSR curves in Section 4.2 and beyond. As mentioned earlier, MSSR curves from X to Y are images of shortest-length geodesics in M(p) between the submanifolds E_X and E_Y. The scaling-rotation distance d_SR(X, Y) is the length of any such geodesic, and we call the pair of endpoints of such a geodesic a minimal pair. Effectively, computing d_SR(X, Y) amounts to finding minimal pairs. However, because of the action of S̃_p^+, minimal pairs are never unique for any (X, Y). For some (X, Y), there are also minimal pairs that are not related to each other by this action. A priori, to compute d_SR(X, Y) and to find all minimal pairs in E_X × E_Y entails computing all the minimal-length geodesics from each connected component of E_X to each connected component of E_Y. Proposition 4.10, which concludes Section 4.2, takes advantage of the S̃_p^+-action to reduce the complexity of computing d_SR(X, Y), of finding all MSSR curves from X to Y, and of characterizing all minimal pairs. Among the outcomes is that for any connected component of E_X, each MSSR curve is represented by a geodesic whose initial point lies in that component, and the number of connected components of E_Y that must be considered is only the number of double-cosets of S̃_p^+ relative to two subgroups determined by X and Y.

In Section 4.3, we address uniqueness questions for MSSR curves, the most basic of which is: under what conditions on X, Y ∈ Sym+(p) is there a unique MSSR curve from X to Y? For there to be more than one MSSR curve from X to Y, there must exist distinct shortest-length geodesics γ1, γ2 : [0, 1] → M(p) from E_X to E_Y such that F ∘ γ1 ≠ F ∘ γ2.
There are essentially two ways, not mutually exclusive, that this can happen: (i) there can exist such γi (i = 1, 2) whose endpoint-pairs are distinct minimal pairs, and (ii) there can exist such γi whose endpoint-pairs are the same minimal pair. We call these possibilities “Type I” and “Type II” non-uniqueness, respectively. A minimal pair ((U, D), (V, Λ)) ∈ M(p) × M(p) has more than one minimal geodesic connecting its first and second points if and only if the pair (U, V) ∈ SO(p) × SO(p) is geodesically antipodal, meaning that each of U, V is in the cut-locus of the other; equivalently, that V^{-1}U is an involution. Thus, Type II nonuniqueness occurs for some X, Y ∈ Sym+(p) if and only if there exists a minimal pair ((U, D), (V, Λ)) for which (U, V) is geodesically antipodal. Our chief tool for determining whether such minimal pairs exist is a property we call sign-change reducibility: we say that the pair (U, V) is sign-change reducible if d_SO(U, V) can be reduced by multiplying U or V by a (positive-determinant) “sign-change matrix”, a diagonal matrix each of whose diagonal entries is ±1. We show in Proposition 4.16 that if (U, V) ∈ SO(p) × SO(p) is not sign-change reducible, then there exist D, Λ in the top stratum of Diag+(p) such that ((U, D), (V, Λ)) is a minimal pair. We show in Proposition 4.14 that for p ≤ 4, every geodesically antipodal pair (U, V) is sign-change reducible, and that for p ≥ 11, there exist geodesically antipodal pairs that are not sign-change reducible. From these propositions we deduce that Type II nonuniqueness never occurs for p ≤ 4 (Corollary 4.15), and that
it always occurs for some (X, Y ) ∈ Sym+ (p) × Sym+ (p) if p ≥ 11 (Corollary 4.17). We do not believe that either of the numbers 4 and 11 above is sharp; our methods are simply not conclusive for 5 ≤ p ≤ 10. Together, Proposition 4.16 and Corollary 4.17 show that sign-change reducibility is the only obstruction to having points X, Y in the top stratum of Sym+ (p) for which the set M(X, Y ) of MSSR curves from X to Y exhibits Type II nonuniqueness. We mention that even without Proposition 4.14, for p ≤ 3 it is rather trivial that all geodesically antipodal pairs are sign-change reducible, and for p = 4 an independent proof relying on quaternions is also possible. However, our proof of the p ≤ 4 part of Proposition 4.14 makes no use of quaternions, and unifies these low-p results. A more refined version of the nonuniqueness question is: for each pair (X, Y ) ∈ Sym+ (p) × Sym+ (p), what is the set M(X, Y )? By characterizing all minimal pairs, Proposition 4.10 provides a starting point for answering this question. But to completely understand M(X, Y )—or even just determine its cardinality—we still need a way to tell whether MSSR curves corresponding to two (not necessarily distinct) minimal pairs with first point in a given connected component of EX are the same. In Section 4.4, Proposition 4.19 provides such a tool. While Propositions 4.10 and 4.19 are general-p results, they were originally proven in order to enable concrete computations in the case p = 3, the home of diffusion-tensor imaging. These are propositions from which we derive no additional conclusions in this paper, but that are crucial to [8, Sections 5 and 6]. There, aided by the quaternionic parametrization of SO(3), these two results are the key tools that, for p = 3, allow us to find closed-form formulas for dSR (X, Y ), and to explicitly describe the set M(X, Y ) (in particular, knowing when there is a unique MSSR curve from X to Y ) when X and Y do not both lie in the top stratum. 
Our proof of Proposition 4.14, completed in Section 8 after laying groundwork in Sections 5–7, takes us in some unexpected directions, with unanticipated consequences. We initially introduced the notion of sign-change reducibility into our scaling-rotation-curve study as an ad hoc tool to help us determine whether Type II nonuniqueness of MSSR curves is possible. However, as we show in Proposition 5.11, a refined version of the question of whether sign-change reducibility occurs is equivalent to a question purely about the geometry of Grassmannians equipped with a standard Riemannian metric: for m even and positive, is every m-dimensional subspace of R^p within a certain distance c(m) of a coordinate m-plane? (This question can, of course, be asked without restricting the parity of m, but the above equivalence leads us to consider only even m in this paper.) By constructing examples, we show that for m = 2, the answer to the latter question is no for p ≥ 11. This, combined with the equivalence result in Proposition 5.11, yields the “p ≥ 11” part of Proposition 4.14 mentioned above. The “p ≤ 4” part of Proposition 4.14 is proven by other means (via the more technical Proposition 5.6). Although it was the possibility of Type II non-uniqueness for MSSR curves that led us to the question above concerning the geometry of Grassmannians,
this question and our study of it may be of independent interest, outside of any connection to our scaling-rotation framework. This study led us to investigate several related questions concerning distances between (even-dimensional) subspaces of Rp and (even-dimensional) coordinate planes not necessarily of the same dimension. Perhaps the most unexpected of these is a half-angle relation stated in Proposition 5.10 and proven in Section 6: for any two involutions R1 , R2 ∈ SO(p), each of the principal angles between the (−1)-eigenspaces of R1 and R2 is exactly half a correspondingly indexed normal-form angle of R1 R2 . This relationship holds whether or not the dimensions of the (−1)-eigenspaces are equal. When the dimensions are equal, we use this relationship to show that a natural correspondence between Grm (Rp ) and a connected component Invm (p) ⊂ Inv(p), discussed in Remark 5.2, is a metric-space isometry up to a constant factor of 2 (Proposition 5.9). This isometric relation is also deducible (and may already be known) from a purely Riemannian approach, but our proof uses essentially no Riemannian geometry (see Remark 6.5 for a more precise statement, and an additional interpretation of what our proof of Proposition 5.9 shows). The most important results coming from our study of sign-change reducibility are stated in Section 5, with the proofs deferred to Sections 6, 7, and 8. These results include those mentioned above and one more whose statement involves terminology not included in this Introduction: Proposition 5.8, a special case of a more general conjecture we make about sign-change reducibility (Conjecture 5.7). Key to almost all of these results are several facts, established in the long and technical Lemma 6.2, concerning the product of a general involution in SO(p) and a positive-determinant sign-change matrix. In this paper, there are numerous instances of a group G acting from the left on a space X. 
We use the notation (g, x) ↦ g·x for all such actions (where g ∈ G, x ∈ X), and X/G will always denote the corresponding quotient space (the space of orbits, with the quotient topology). When X is a finite set, we always give X (and therefore X/G) the discrete topology.

2. Partitions and Fibers

2.1. Partitions and eigenstructure

Let Part({1, . . . , p}) denote the set of partitions of {1, 2, . . . , p}, and Part(p) the set of partitions of the integer p. We will write elements J of Part({1, . . . , p}) in the form J = {J_1, J_2, . . . }, where the J_i are the blocks of J. We will consider several stratified spaces in this paper. For each space, the strata are labeled naturally either by Part({1, . . . , p}) or by Part(p). The natural left-action of the symmetric group S_p on {1, 2, . . . , p} induces left-actions of S_p on Part({1, . . . , p}) and R^p. There is a canonical bijection between the quotient Part({1, . . . , p})/S_p and the set Part(p), so we will implicitly regard these as the same set. For J ∈ Part({1, . . . , p}), we write [J] for the image of J in Part(p) under the quotient map. The sets Part({1, . . . , p}) and Part(p) are partially ordered by the refinement relation. For J, K ∈ Part({1, . . . , p}), we write J ≤ K if K refines J. Similarly, for
[J], [K] ∈ Part(p) we write [J] ≤ [K] if [K] refines [J]. (We use this ordering, rather than its opposite, to make the bijection between partitions and strata discussed in Section 3 order-preserving.) In each of these partially ordered sets there is a well-defined “highest” (most refined) and “lowest” (least refined) element; we denote these with the subscripts “top” and “bot” respectively. Note that the quotient map Part({1, . . . , p}) → Part(p) is order-preserving.

Notation 2.1.
1. Let Diag(p) denote the set of p × p diagonal matrices, and let Diag+(p) := {diag(d_1, . . . , d_p) : d_i > 0, 1 ≤ i ≤ p} ⊂ Diag(p). For D = diag(d_1, . . . , d_p) ∈ Diag(p), let J_D denote the partition of {1, 2, . . . , p} determined by the equivalence relation i ∼_D j ⇐⇒ d_i = d_j.
2. For ∅ ≠ J ⊂ {1, 2, . . . , p}, let R^J ⊂ R^p denote the subspace {(x_1, . . . , x_p) ∈ R^p | x_j = 0 ∀ j ∉ J}. For a partition J = {J_1, . . . , J_r} of {1, 2, . . . , p}, let {W_1, . . . , W_r} = {W_1^J, . . . , W_r^J} = {R^{J_1}, . . . , R^{J_r}} denote the corresponding subspaces of R^p; note that we have an orthogonal decomposition R^p = R^{J_1} ⊕ . . . ⊕ R^{J_r}. Define the subgroup G_J ⊂ SO(p) by
\[ G_J = \{R \in SO(p) \mid R W_i = W_i,\ 1 \le i \le r\}, \tag{2.1} \]
a Lie group with (generally) more than one connected component. We write G_J^0 for the identity component of G_J.

For any subgroup H ⊂ O(p), we write S(H) for H ∩ SO(p). Note that
\begin{align}
G_J &\cong S\bigl(O(W_1) \times O(W_2) \times \cdots \times O(W_r)\bigr) \tag{2.2} \\
    &\cong S\bigl(O(|J_1|) \times O(|J_2|) \times \cdots \times O(|J_r|)\bigr), \tag{2.3}
\end{align}
where O(W_i) denotes the orthogonal group of the subspace W_i, which we identify with a subgroup of O(p). Hence, writing k_i = |J_i|, we have
\[ G_J^0 \cong SO(k_1) \times SO(k_2) \times \cdots \times SO(k_r). \tag{2.4} \]
Obviously (2.2) holds whether or not k_1 ≥ k_2 ≥ · · · ≥ k_r. However, if the k_i are non-increasing then [J] = (k_1, . . . , k_r), so for the sake of concreteness we define G_[J]^0 = SO(k_1) × SO(k_2) × · · · × SO(k_r) if [J] = (k_1, k_2, . . . , k_r).

Remark 2.2. An easily-verified fact, reflected in the stratifications of Sym+(p) and M(p) discussed in Section 3, is the following:
\[ J \le K \iff G_J \supset G_K. \tag{2.5} \]
The groups G_J are related to eigenstructure as follows. For each D ∈ Diag(p) let G_D denote the stabilizer of D under the action of SO(p) on Sym(p) by conjugation, and observe that if (U, D) ∈ M(p) is an eigen-decomposition of X ∈ Sym+(p), and R lies in G_D, then (UR, D) is also an eigen-decomposition of X. But G_D is precisely the group G_{J_D} defined using Notation 2.1. (In other words, the group G_D does not depend on the absolute or relative sizes of the diagonal entries of D, but only on which eigenvalues are equal to which other eigenvalues.)

2.2. Signed permutations and signed-permutation matrices

Let I_p = (Z_2)^p (the direct product of p copies of Z_2). The role of Z_2 will be as the group of signs, so we write it multiplicatively, with elements ±1. We will write typical elements of I_p as σ = (σ_1, . . . , σ_p); we write the identity element as 1. Both I_p and the symmetric group S_p have natural representations on R^p: the group I_p acts via sign-changes of the coordinates, while S_p permutes the coordinates. These representations embed I_p and S_p as subgroups of O(p), together generating a group of “signed-permutation matrices”. Abstractly, this group is a semidirect product S̃_p = I_p ⋊ S_p, a split extension of S_p by I_p. Let mat : S̃_p → O(p) denote the induced embedding of the abstract group S̃_p into O(p). Writing elements of S̃_p (uniquely) as pairs (σ, π) with σ ∈ I_p and π ∈ S_p, define $\widetilde{\mathrm{sgn}}(\sigma, \pi) = \mathrm{sgn}(\sigma)\,\mathrm{sgn}(\pi)$, where $\mathrm{sgn}(\sigma_1, \dots, \sigma_p) = \prod_{i=1}^p \sigma_i$ and sgn(π) is the sign of the permutation π. The homomorphism $\widetilde{\mathrm{sgn}} : \tilde{S}_p \to Z_2$ satisfies $\widetilde{\mathrm{sgn}}(\sigma, \pi) = \det(\mathrm{mat}(\sigma, \pi))$. Hence
\[ \tilde{S}_p^+ := \ker(\widetilde{\mathrm{sgn}}), \tag{2.6} \]
which we call the group of even signed permutations, is exactly the set of signed permutations mapped by mat into SO(p); we call mat(S̃_p^+) the group of even signed-permutation matrices. We call elements of mat(I_p) sign-change matrices; these are simply diagonal matrices with ±1’s along the diagonal. Writing I_p^+ := {σ ∈ I_p : sgn(σ) = 1}, we have a short exact sequence
\[ 1 \to I_p^+ \xrightarrow{\ \mathrm{incl}\ } \tilde{S}_p^+ \xrightarrow{\ \mathrm{proj}_2\ } S_p \to 1. \tag{2.7} \]
Since I_p^+ ≅ (Z_2)^{p−1} (non-canonically), S̃_p^+ is an extension of S_p by (Z_2)^{p−1}. This extension is non-split, in contrast to the extension S̃_p of S_p by (Z_2)^p. However, from (2.7) it is immediate that
\[ |\tilde{S}_p^+| = 2^{p-1}\, p!\,. \tag{2.8} \]
Henceforth we will usually not distinguish explicitly between the abstract group S̃_p (respectively S̃_p^+) and the corresponding subgroup of O(p) (resp. SO(p)); we will generally use the terms “signed permutation” and “signed-permutation matrix” synonymously in any context in which only one interpretation makes sense. However, to avoid some odd-looking formulas, for g ∈ S̃_p (respectively π ∈ S_p) we will generally write P_g (resp. P_π) for the matrix mat(g) (resp. mat(1, π)), and write I_σ for mat(σ, id.). The image of g under the projection proj_2 : S̃_p → S_p will be denoted π_g.
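The count (2.8) can be checked by brute force for small p. The sketch below is illustrative only: the function names are hypothetical, and the matrix convention used in `mat` (the product I_σ P_π, with P_π sending the j-th basis vector to the π(j)-th) is an assumption, since the text above does not fix one. It enumerates pairs (σ, π) with sgn(σ) sgn(π) = +1 and confirms that each corresponding matrix has determinant +1.

```python
from itertools import permutations, product
from math import factorial

def perm_sign(pi):
    """Sign of a permutation given as a tuple of images of 0..p-1."""
    sign, seen = 1, set()
    for start in range(len(pi)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = pi[j]
            length += 1
        if length % 2 == 0:  # a cycle of even length is an odd permutation
            sign = -sign
    return sign

def even_signed_permutations(p):
    """All pairs (sigma, pi) with sgn(sigma) * sgn(pi) = +1."""
    out = []
    for pi in permutations(range(p)):
        for sigma in product((1, -1), repeat=p):
            sgn_sigma = 1
            for s in sigma:
                sgn_sigma *= s
            if sgn_sigma * perm_sign(pi) == 1:
                out.append((sigma, pi))
    return out

def mat(sigma, pi):
    """Signed-permutation matrix I_sigma P_pi (assumed convention)."""
    p = len(pi)
    M = [[0] * p for _ in range(p)]
    for j in range(p):
        M[pi[j]][j] = sigma[pi[j]]
    return M

def det(M):
    """Determinant by the Leibniz formula; adequate for small p."""
    p = len(M)
    total = 0
    for q in permutations(range(p)):
        term = perm_sign(q)
        for i in range(p):
            term *= M[i][q[i]]
        total += term
    return total

group3 = even_signed_permutations(3)
all_in_SO3 = all(det(mat(s, q)) == 1 for s, q in group3)
```

For p = 3 the list `group3` has 2^2 · 3! = 24 elements, in agreement with (2.8).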
Remark 2.3. For a subspace W ⊂ R^p and ε ∈ Z_2 = {±1}, let O^ε(W) ⊂ O(W) denote the set of orthogonal transformations with determinant ε. In the setting of (2.2), the connected components of G_J are O^{ε_1}(W_1) × O^{ε_2}(W_2) × · · · × O^{ε_r}(W_r), subject to the restriction ∏_i ε_i = 1. Thus a labeling of the blocks of an r-block partition J yields a 1-1 correspondence between I_r^+ and the set of connected components of G_J. In particular, the number of connected components is 2^{r−1}.

Identifying Diag(p) with R^p, the natural left-action of S_p on R^p yields a left-action of S_p on Diag(p). For D ∈ Diag(p), we will write [D] for its image in the quotient space Diag(p)/S_p. Note that the action of S_p on Diag+(p) ⊂ Diag(p) lifts to an action of S̃_p on Diag+(p),
\[ g \cdot D := \pi_g \cdot D. \tag{2.9} \]
Of course, the action of S̃_p on R^p induces an action on Sym(p), (g, X) ↦ P_g X P_g^T = P_g X P_g^{-1}. This conjugation action preserves the space Diag+(p), on which it reduces to (2.9): P_g D P_g^{-1} = π_g·D.

We end this subsection by stating an easy lemma (proof left to the reader) that will be used in the proof of Proposition 2.5 in the next subsection:

Lemma 2.4. I_p^+ ⊂ G_J for all J ∈ Part({1, . . . , p}).

2.3. Structure of the fibers
In this subsection we will provide a systematic description of the fibers of F. The starting point is the following proposition.

Proposition 2.5. Let X ∈ Sym+(p), (U, D) ∈ E_X = F^{-1}(X), and write G_D = G_{J_D}. Then
\[ E_X = \{(U R (P_g)^{-1},\ \pi_g \cdot D) : R \in G_D,\ g \in \tilde{S}_p^+\}. \tag{2.10} \]
Proof: This is a simple corollary of [11, Theorem 3.3]. Details are left to the reader.

Corollary 2.6. Let X ∈ Sym+(p) and (U, D) ∈ E_X, and write G_D^0 = G_{J_D}^0. Then
\[ E_X = \{(U R (P_g)^{-1},\ \pi_g \cdot D) : R \in G_D^0,\ g \in \tilde{S}_p^+\}. \tag{2.11} \]
Proof: Clearly the right-hand side of (2.11) is contained in the right-hand side of (2.10), so it suffices to prove the opposite inclusion. Let R ∈ G_D, g ∈ S̃_p^+. Enumerate the blocks of J := J_D as J_1, . . . , J_r, and let W_i be as in Notation 2.1. As noted in Remark 2.3, the enumeration of the blocks of J yields a 1-1 correspondence between I_r^+ and the connected components of G_J. Let R lie in the component of G_J labeled by (ε_1, . . . , ε_r) ∈ I_r^+. The cardinality of {j : ε_j = −1} is some even number k. Let σ = (σ_1, . . . , σ_p) ∈ I_p, where for 1 ≤ i ≤ p we set
\[ \sigma_i = \begin{cases} -1 & \text{if } i \in J_j,\ \epsilon_j = -1, \text{ and } i \text{ is the smallest element of } J_j; \\ \phantom{-}1 & \text{otherwise.} \end{cases} \]
Then R_1 := I_σ R ∈ G_J^0. But also |{i : σ_i = −1}| = k, so σ ∈ I_p^+ ⊂ S̃_p^+, and P_g I_σ = P_{g_1} for some g_1 ∈ S̃_p^+ with π_{g_1} = π_g. Hence P_g R = (P_g I_σ)(I_σ R) = P_{g_1} R_1, so (U (P_g R)^{-1}, π_g·D) = (U (P_{g_1} R_1)^{-1}, π_{g_1}·D), which lies in the right-hand side of (2.11). The desired inclusion follows.

We will need a bit more notation to complete our characterization of the fibers of F:

Notation 2.7.
1. For J = {J_1, . . . , J_r} ∈ Part({1, . . . , p}), let K_J = {π ∈ S_p : π(J_i) = J_i, 1 ≤ i ≤ r}, Γ_J = S̃_p^+ ∩ G_J, and Γ_J^0 = Γ_J ∩ G_J^0 = S̃_p^+ ∩ G_J^0. Also define the subgroup
\[ I_J^+ = \Bigl\{(\sigma_1, \dots, \sigma_p) \in I_p : \prod_{j \in J_i} \sigma_j = 1,\ 1 \le i \le r\Bigr\} \subset I_p^+. \tag{2.12} \]
2. For any X ∈ Sym+(p) and (U, D) ∈ E_X, define
\[ [(U, D)] = \{(UR, D) : R \in G_D^0\}, \tag{2.13} \]
the connected component of E_X containing (U, D). We write Comp(E_X) for the set of connected components of E_X.
3. For any Lie group G and closed subgroup K, we write G/K and K\G for the spaces of left- and right-cosets, respectively, of K in G. (In particular, we use this notation when G is a finite group.)

Note that the groups I_J^+ are a generalization of I_p^+; for J = J_bot = {{1, 2, . . . , p}} the single product condition in (2.12) is sgn(σ) = 1, so I_J^+ = I_p^+. We observe that equivalent definitions of Γ_J and Γ_J^0 are:
\begin{align}
\Gamma_J &= \{(\sigma, \pi) \in \tilde{S}_p^+ : \pi \in K_J\}, \tag{2.14} \\
\Gamma_J^0 &= \{(\sigma, \pi) \in \tilde{S}_p^+ : \sigma \in I_J^+,\ \pi \in K_J\}. \tag{2.15}
\end{align}
Thus, analogously to (2.7), we have short exact sequences
\begin{align}
1 \to I_p^+ \to \Gamma_J \to K_J \to 1, \tag{2.16} \\
1 \to I_J^+ \to \Gamma_J^0 \to K_J \to 1, \tag{2.17}
\end{align}
and the cardinality of the middle group in each of these sequences is the product of the cardinalities of the groups to its left and right. Note also that equivalent definitions of K_J are
\begin{align}
K_J &= \{\pi \in S_p \mid \pi \cdot D = D \text{ for some } D \text{ with } J_D = J\} \notag \\
    &= \{\pi \in S_p \mid \pi \cdot D = D \text{ for all } D \text{ with } J_D = J\}. \tag{2.18}
\end{align}
Next, observe that the group S̃_p^+ acts on M(p) via setting
\[ g \cdot (U, D) = (U P_g^{-1},\ g \cdot D). \tag{2.19} \]
This action preserves every fiber of F. Thus for each X ∈ Sym+(p) there is an induced action of S̃_p^+ on Comp(E_X), given by
\[ g \cdot [(U, D)] = [g \cdot (U, D)]. \tag{2.20} \]
This leads us to:
Proposition 2.8. Let X ∈ Sym+(p). Then every (U, D) ∈ E_X determines a bijection between Comp(E_X) and the set S̃_p^+/Γ_{J_D}^0.

Proof: Two elements (U, D), (U′, D′) lie in the same component of E_X if and only if D′ = D and U′ = UR for some R ∈ G_D^0. Thus it is clear from (2.11) that the action (2.20) of S̃_p^+ on Comp(E_X) is transitive. Therefore for any (U, D) ∈ E_X, the map S̃_p^+ → Comp(E_X), g ↦ g·[(U, D)], induces a bijection S̃_p^+/Stab([(U, D)]) → Comp(E_X), where Stab([(U, D)]) is the stabilizer of [(U, D)] under the action (2.20). But, as is easily checked, Stab([(U, D)]) is exactly the group Γ_{J_D}^0.
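The fiber-preserving property of the action g·(U, D) = (U P_g^{-1}, P_g D P_g^{-1}) can be checked numerically for p = 3. The following is a sketch under stated assumptions: all helper names, the choice of U, and the rounding-based fingerprint `key` are illustrative and not part of the paper. It applies every even signed-permutation matrix to one eigen-decomposition with distinct eigenvalues and records how many distinct points of the fiber result.

```python
import math
from itertools import permutations, product

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def even_signed_perm_matrices():
    """All 3x3 signed-permutation matrices of determinant +1."""
    mats = []
    for pi in permutations(range(3)):
        for sigma in product((1.0, -1.0), repeat=3):
            P = [[0.0] * 3 for _ in range(3)]
            for j in range(3):
                P[pi[j]][j] = sigma[pi[j]]
            if det3(P) > 0:
                mats.append(P)
    return mats

def act(P, U, D):
    """g.(U, D) = (U P^{-1}, P D P^{-1}); P is orthogonal, so P^{-1} = P^T."""
    Pt = transpose(P)
    return matmul(U, Pt), matmul(matmul(P, D), Pt)

def F(U, D):
    return matmul(matmul(U, D), transpose(U))

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def key(U, D, digits=9):
    """Hashable rounded fingerprint of a pair (U, D)."""
    return tuple(round(x, digits) for M in (U, D) for row in M for x in row)

# An illustrative rotation U in SO(3) and a D with distinct eigenvalues.
c, s = math.cos(0.4), math.sin(0.4)
Rz = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
c, s = math.cos(0.7), math.sin(0.7)
Rx = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
U = matmul(Rz, Rx)
D = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
X = F(U, D)

orbit = [act(P, U, D) for P in even_signed_perm_matrices()]
all_in_fiber = all(close(F(U2, D2), X) for U2, D2 in orbit)
num_points = len({key(U2, D2) for U2, D2 in orbit})
```

With distinct eigenvalues the 24 group elements give 24 distinct points, all mapped by F to the same X, which is what a free and transitive action on the fiber would produce.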
An important special case of Proposition 2.8 is the case in which all eigenvalues of X are distinct. In this case, J_D = J_top = {{1}, {2}, . . . , {p}} and Γ_{J_D}^0 = {id.}. Thus the action of S̃_p^+ on Comp(E_X) is free as well as transitive. Furthermore G_D^0 = {I}, so each connected component of E_X is a single point; Comp(E_X) = E_X. Thus E_X itself is an orbit of S̃_p^+, and any choice of (U, D) ∈ E_X yields a bijection S̃_p^+ → E_X, g ↦ g·(U, D).

Corollary 2.9. Let X ∈ Sym+(p), (U, D) ∈ E_X, and let k_1, . . . , k_r be the parts of the partition [J_D] of p. Then E_X is diffeomorphic to a disjoint union of
\[ 2^{r-1}\, \frac{p!}{k_1!\, k_2! \cdots k_r!} \]
copies of SO(k_1) × SO(k_2) × · · · × SO(k_r).
Proof: Let J = J_D. It is clear from (2.13) that each connected component of E_X is a submanifold of M(p) diffeomorphic to G_D^0 = G_J^0, which from (2.4) is isomorphic (hence diffeomorphic) to SO(k_1) × SO(k_2) × · · · × SO(k_r). From Proposition 2.8, the number of connected components is |S̃_p^+/Γ_{J_D}^0| = |S̃_p^+|/|Γ_{J_D}^0|. From (2.8) we have |S̃_p^+| = 2^{p−1} p!, while from (2.17) we have |Γ_{J_D}^0| = |I_J^+| |K_J|. It is easily seen that I_J^+ is isomorphic to (Z_2)^{p−r}, and that K_J is isomorphic to S_{k_1} × S_{k_2} × · · · × S_{k_r}, and hence that |K_J| = k_1! k_2! . . . k_r!. The result follows.
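The count in Corollary 2.9 is easy to evaluate from a list of eigenvalues. The sketch below uses hypothetical function names and assumes that repeated eigenvalues are given as exactly equal numbers (no tolerance is applied). It first forms the partition J_D and then evaluates 2^{r−1} p!/(k_1! · · · k_r!).

```python
from math import factorial

def partition_of_indices(d):
    """Blocks of J_D: indices i, j share a block iff d[i] == d[j]."""
    blocks = {}
    for i, value in enumerate(d, start=1):
        blocks.setdefault(value, []).append(i)
    return sorted(blocks.values())

def num_fiber_components(d):
    """2^(r-1) * p! / (k_1! ... k_r!), the count in Corollary 2.9."""
    sizes = [len(b) for b in partition_of_indices(d)]
    p, r = sum(sizes), len(sizes)
    denom = 1
    for k in sizes:
        denom *= factorial(k)
    return 2 ** (r - 1) * factorial(p) // denom

distinct = num_fiber_components((1.0, 2.0, 3.0))   # all eigenvalues distinct
one_pair = num_fiber_components((2.0, 2.0, 5.0))   # one repeated eigenvalue
scalar = num_fiber_components((1.0, 1.0, 1.0))     # multiple of the identity
```

For p = 3 this gives 24 components (each a point) in the top stratum, 6 components (each a copy of SO(2) × SO(1)) when exactly two eigenvalues coincide, and a single copy of SO(3) for a scalar matrix.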
Remark 2.10. An alternate, instructive route to Corollary 2.9 is the following. (We merely sketch the ideas; the reader may fill in the details.) For J ∈ Part({1, . . . , p}), define Q_J = {P_g R : g ∈ S̃_p^+, R ∈ G_J} ⊂ SO(p). Thus the set Q_J is a finite union of left-cosets of G_J, each of which is diffeomorphic to the compact submanifold G_J ⊂ SO(p). If X ∈ Sym+(p), (U, D) ∈ E_X, and J = J_D, the map Q_J → M(p), Q ↦ (U Q^{-1}, Q D Q^{-1}), is an embedding with image E_X. Hence E_X is a submanifold of M(p) diffeomorphic to Q_J. But for any closed subgroups H_1, H_2 of a compact Lie group G, the set H_1 H_2 := {h_1 h_2 : h_1 ∈ H_1, h_2 ∈ H_2} ⊂ G is a submanifold of G and a principal H_2-bundle over H_1/(H_1 ∩ H_2), with projection map given by h_1 h_2 ↦ h_1 (H_1 ∩ H_2). Applying this to the case H_1 = S̃_p^+, H_2 = G_J, G = SO(p), we have H_1 ∩ H_2 = Γ_J, so Q_J is a principal G_J-bundle over the finite set S̃_p^+/Γ_J. But the natural map S̃_p^+/Γ_J → S_p/K_J, gΓ_J ↦ proj_2(g)K_J (where proj_2 is as in (2.7)), is a bijection, so Q_J may be viewed as a principal G_J-bundle over S_p/K_J. The cardinality of this base-space is |S_p|/|K_J|, which is simply the multinomial coefficient p!/(k_1! k_2! · · · k_r!) if [J] = (k_1, . . . , k_r) ∈ Part(p). Thus E_X is diffeomorphic to p!/(k_1! k_2! · · · k_r!) copies of G_J, and each copy of G_J is diffeomorphic to 2^{r−1} copies of SO(k_1) × · · · × SO(k_r).

3. Stratification of Sym+(p), M(p), and related spaces

The group G = SO(p) acts from the left on Sym+(p) via
\[ (U, X) \mapsto U \cdot X = U X U^T. \tag{3.1} \]
+
As for any group-action, elements X, Y ∈ Sym (p) are said to have the same orbit type if their stabilizers are conjugate; this implies that the fibers EX , EY are diffeomorphic. The orbit-type stratification of any manifold under the action of compact Lie group is known to be a Whitney stratification ([6, p. 21]). The orbit-type stratification of Sym+ (p) is easily seen to be identical to the stratification by “eigenvalue-multiplicity type” defined below. We use Part({1, . . . , p}) to define stratifications of the spaces Diag+ (p) and M (p), and use Part(p) to define stratifications of Diag+ (p)/Sp and Sym+ (p). The commutative diagram in Figure 1 indicates the relationships among these spaces and label-sets. In this diagram, the maps not yet defined are as follows: • proj2 : M (p) = SO(p) × Diag+ (p) → Diag+ (p) is projection onto the second factor. • For X ∈ Sym+ (p), if (U, D) ∈ EX we define proj2 (X) = [D] ∈ Sym+ (p)/Sp ; this is well-defined since in any two eigen-decompositions X, the diagonal parts differ at most by a permutation. 12
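[Figure 1 (commutative diagram). Top row: M(p) → Diag+(p) → Part({1, . . . , p}), with maps proj2 and lbl. Bottom row: Sym+(p) → Diag+(p)/Sp → Part(p), with maps $\overline{\mathrm{proj}}_2$ and $\overline{\mathrm{lbl}}$. Vertical maps, left to right: F, quo1, quo2.]
Figure 1: Commutative diagram defining the stratifications of Sym+(p) and related spaces.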
• The stratum-labeling maps lbl, $\overline{\mathrm{lbl}}$ are defined by $\mathrm{lbl}(D) = J_D$, $\overline{\mathrm{lbl}}([D]) = [J_D]$.
• quo1 and quo2 are the quotient maps Diag+(p) → Diag+(p)/Sp and Part({1, . . . , p}) → Part({1, . . . , p})/Sp = Part(p) respectively.

We define strata as the diagram suggests: for J ∈ Part({1, . . . , p}) and [K] ∈ Part(p),
(i) $D_J := \mathrm{lbl}^{-1}(J)$ ⊂ Diag+(p),
(ii) $D_{[K]} := \overline{\mathrm{lbl}}^{-1}([K])$ ⊂ Diag+(p)/Sp,
(iii) $S_J := \mathrm{proj}_2^{-1}(D_J) = SO(p) \times D_J$ ⊂ M(p), and
(iv) $S_{[K]} := \overline{\mathrm{proj}}_2^{-1}(D_{[K]})$.
For X ∈ $S_{[K]}$, we call the partition [K] ∈ Part(p) the eigenvalue-multiplicity type of X. It is clear that each of the four spaces Diag+(p), M(p), Diag+(p)/Sp, and Sym+(p) is the disjoint union of the strata we have defined in that space. Note also that for any J ∈ Part({1, . . . , p}), $F(S_J) = S_{[J]}$ and $\mathrm{quo}_1(D_J) = D_{[J]}$. One can easily check that
\[ \overline{D_K} = \bigcup_{J \le K} D_J, \qquad \overline{S_K} = \bigcup_{J \le K} S_J, \tag{3.2} \]
\[ \overline{D_{[K]}} = \bigcup_{[J] \le [K]} D_{[J]}, \qquad \overline{S_{[K]}} = \bigcup_{[J] \le [K]} S_{[J]}, \tag{3.3} \]
where an overline on a stratum denotes its closure.
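The eigenvalue-multiplicity type defined above can be read off numerically from the spectrum. The following is a minimal sketch, not taken from the paper: the function name `multiplicity_type`, the tolerance `tol`, and the convention of listing multiplicities in non-increasing order are all assumptions made for illustration.

```python
import numpy as np

def multiplicity_type(X, tol=1e-9):
    """Return the eigenvalue-multiplicity type of a symmetric matrix X
    as a partition of p (multiplicities in non-increasing order).

    Hypothetical helper: eigenvalues closer than `tol` (relative) are
    treated as equal, which is an illustrative choice, not the paper's.
    """
    eigvals = np.sort(np.linalg.eigvalsh(X))
    counts = [1]
    for prev, cur in zip(eigvals[:-1], eigvals[1:]):
        if abs(cur - prev) <= tol * max(1.0, abs(cur)):
            counts[-1] += 1      # same eigenvalue cluster
        else:
            counts.append(1)     # a new distinct eigenvalue
    return tuple(sorted(counts, reverse=True))

# Top stratum (all eigenvalues distinct), a middle stratum, bottom stratum.
print(multiplicity_type(np.diag([3.0, 1.0, 2.0])))
print(multiplicity_type(np.diag([1.0, 2.0, 1.0])))
print(multiplicity_type(2.0 * np.eye(3)))
```

Because the type depends only on the spectrum, conjugating X by any rotation leaves the result unchanged, which matches the fact that the strata of Sym+(p) are unions of SO(p)-orbits.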
In any stratified space, there is a natural partial ordering “≤” on the set of strata $T_i$ defined by declaring $T_1 \le T_2$ if $T_1 \subset \overline{T_2}$. Then (3.2)–(3.3) imply that for all J, K ∈ Part({1, . . . , p}), in the appropriate spaces we have
\[ S_J \le S_K \iff J \le K \iff D_J \le D_K, \tag{3.4} \]
\[ S_{[J]} \le S_{[K]} \iff [J] \le [K] \iff D_{[J]} \le D_{[K]}. \tag{3.5} \]
In each of the stratified spaces above, there is a highest stratum and a lowest stratum, namely the strata labeled by $J_{top}$, $[J_{top}]$, $J_{bot}$, and $[J_{bot}]$. Note that for J, K ∈ Part({1, . . . , p}), J ≤ K implies [J] ≤ [K]. In view of (3.4)–(3.5), a similar comment applies to our stratified spaces: $S_J \le S_K$ implies $S_{[J]} \le S_{[K]}$, and $D_J \le D_K$ implies $D_{[J]} \le D_{[K]}$. The converse of each of these implications is false for p > 2.
4. Scaling-rotation distance and minimal smooth scaling-rotation curves

The Lie groups SO(p) and Diag+(p) carry natural bi-invariant Riemannian metrics. If we endow M(p) = SO(p) × Diag+(p) with a product Riemannian metric $g_M$, the geodesics γ in $(M(p), g_M)$ are easily computed. We define smooth scaling-rotation (SSR) curves in Sym+(p) to be the projections to Sym+(p) of the geodesics in $(M(p), g_M)$, i.e. curves of the form F ◦ γ. (In [13] and [11] these were called simply “scaling-rotation curves”. Remark 4.8 explains why we have added “smooth” to this name.)

4.1. Smooth scaling-rotation curves

The Lie algebra so(p) = $T_I(SO(p))$ is the space of p × p antisymmetric matrices. For U ∈ SO(p), the tangent space $T_U(SO(p))$ can be identified with either the left-translate or right-translate of so(p) by U. For the purposes of this paper the latter identification is somewhat more convenient:
\[ T_U(SO(p)) = \{AU : A \in so(p)\}. \tag{4.1} \]
With this identification, the standard bi-invariant Riemannian metric $g_{SO}$ on SO(p) is defined by
\[ g_{SO}|_U(A_1, A_2) = -\tfrac{1}{2}\,\mathrm{tr}(A_1 U^{-1} A_2 U^{-1}), \tag{4.2} \]
where U ∈ SO(p) and $A_1, A_2 \in T_U(SO(p))$. The requirement of bi-invariance determines the metric $g_{SO}$ up to a constant factor unless p = 4. (This is true for trivial reasons if p = 2; for larger p ≠ 4 it follows from the simplicity of so(p). For p = 4 the isomorphism so(4) ≅ so(3) ⊕ so(3) implies that there is a two-parameter family of bi-invariant metrics; for this p the choice (4.2) amounts to fixing two scaling-parameters rather than one.)

The manifold Diag+(p) is also a Lie group, but since it is an open subset of the vector space Diag(p) we make the identification $T_D(\mathrm{Diag}^+(p)) = \mathrm{Diag}(p)$ for all D ∈ Diag+(p). This abelian Lie group carries a (bi-)invariant Riemannian metric $g_{D^+}$ defined by
\[ g_{D^+}|_D(L_1, L_2) = \mathrm{tr}(D^{-1} L_1 D^{-1} L_2), \tag{4.3} \]
where D ∈ Diag+(p) and $L_1, L_2 \in T_D(\mathrm{Diag}^+(p))$. There is a p-parameter family of bi-invariant metrics on Diag+(p), but, up to a constant factor, $g_{D^+}$ is the unique one that is also invariant under the action of the symmetric group Sp.

On M(p), we will use a product Riemannian metric determined by the metrics on the two factors SO(p), Diag+(p). Specifically, letting k > 0 be an arbitrary parameter that can be chosen as desired for applications, we set
\[ g_M|_{(U,D)}((A_1, L_1), (A_2, L_2)) = k\, g_{SO}|_U(A_1, A_2) + g_{D^+}|_D(L_1, L_2), \tag{4.4} \]
where U ∈ SO(p), D ∈ Diag+(p), and $(A_i, L_i) \in T_U(SO(p)) \oplus T_D(\mathrm{Diag}^+(p)) = T_{(U,D)}M(p)$. Since the metrics $g_{SO}$ and $g_{D^+}$ are bi-invariant, the geodesics in
M(p) can be obtained as either left-translates or right-translates of geodesics through the identity (I, I). In this paper, the latter will be more convenient.

Definition 4.1. A smooth scaling-rotation (SSR) curve is a curve χ in Sym+(p) of the form F ◦ γ, where γ : I → M(p) is a geodesic defined on some interval I.

Note: in this paper, we use curve sometimes to mean a parametrized curve (a map I → Z, where I is an interval and Z is some space) and sometimes to mean an equivalence class of such maps, where two maps are regarded as equivalent if one is a monotone reparametrization of the other. (In particular, two such equivalent maps have the same image.) It should always be clear from context which meaning is intended.

Notation 4.2. For (U, D) ∈ M(p), A ∈ so(p), and L ∈ Diag(p), we define $\gamma_{U,D,A,L} : \mathbf{R} \to M(p)$ and $\chi_{U,D,A,L} : \mathbf{R} \to \mathrm{Sym}^+(p)$ by
\[ \gamma_{U,D,A,L}(t) = (\exp(tA)U,\ \exp(tL)D) \tag{4.5} \]
and
\[ \chi_{U,D,A,L} = F \circ \gamma_{U,D,A,L}. \tag{4.6} \]
We use the same notation $\gamma_{U,D,A,L}$, $\chi_{U,D,A,L}$ for the restrictions of the curves above to any interval I ⊂ R. The curve $\gamma_{U,D,A,L} : \mathbf{R} \to M(p)$ is the geodesic in M(p) with initial conditions γ(0) = (U, D), γ′(0) = (AU, DL) ∈ $T_{(U,D)}M(p)$. The projection of $\gamma_{U,D,A,L}$ to Sym+(p), the curve $\chi_{U,D,A,L}$, is the corresponding smooth scaling-rotation curve.

It is well known that in the Riemannian manifold $(SO(p), g_{SO})$, the cut-locus of the identity is the set of all involutions, $\{R \in SO(p) \mid R^2 = I \ne R\}$. For every non-involution R ∈ SO(p), there is a unique A ∈ so(p) of smallest norm such that exp(A) = R (see Section 5.1); we define log(R) = A. If R is an involution, there is more than one smallest-norm A ∈ so(p) such that exp(A) = R, and we allow log(R) to denote the set of all such A’s. However, all elements A in this set have the same norm, which we write as ‖log(R)‖. Thus ‖log(R)‖ is a well-defined real number for all R ∈ SO(p), even though log(R) is not a unique element of so(p) when R is an involution. With this understood, the geodesic-distance function $d_M$ on M(p) is given by
\[ d_M((U,D),(V,\Lambda))^2 = k\, d_{SO}(U,V)^2 + d_{D^+}(D,\Lambda)^2 \tag{4.7} \]
\[ \phantom{d_M((U,D),(V,\Lambda))^2} = \frac{k}{2}\,\|\log(U^{-1}V)\|^2 + \|\log(D^{-1}\Lambda)\|^2, \tag{4.8} \]
where in (4.8) and for the rest of this paper, ‖ ‖ denotes the Frobenius norm on matrices: $\|A\|^2 = \|A\|_F^2 = \mathrm{tr}(A^TA)$ for any matrix A. The invariances of the metrics $d_{SO}$ and $d_{D^+}$ lead to the following:
Proposition 4.3 ([11, Proposition 3.7]). The geodesic distance (4.7) on M (p) is invariant under simultaneous left or right multiplication by orthogonal matrices, permutations and scaling: For any R1 , R2 ∈ O(p), π ∈ Sp and S ∈ Diag+ (p), and for any (U, D), (V, Λ) ∈ (SO × Diag+ )(p), dM ((U, D), (V, Λ)) = dM ((R1 U R2 , Sπ·D), (R1 V R2 , Sπ·Λ)) .
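The distance formula (4.7)–(4.8) and the invariance just stated can be checked numerically. The sketch below is illustrative only: the function names `d_so`, `d_diag`, `d_m`, and `random_rotation` are hypothetical, and the rotation distance is computed from eigenvalue angles (using the fact that $\|\log R\|_F^2$ is the sum of the squared arguments of the eigenvalues of R) rather than from a matrix logarithm. The check uses rotations $R_1, R_2$ so that $R_1UR_2$ stays in SO(p).

```python
import numpy as np

def d_so(U, V):
    """Geodesic distance in (SO(p), g_SO), i.e. ||log(U^{-1} V)||_F / sqrt(2).

    log(R) is antisymmetric with eigenvalues +-i*theta_j, so its squared
    Frobenius norm is the sum of squared eigenvalue arguments of R.
    """
    angles = np.angle(np.linalg.eigvals(U.T @ V))
    return np.sqrt(0.5 * np.sum(angles ** 2))

def d_diag(D, Lam):
    """Distance in (Diag+(p), g_D+): ||log(D^{-1} Lam)||_F for diagonal D, Lam."""
    return np.linalg.norm(np.log(np.diag(Lam) / np.diag(D)))

def d_m(U, D, V, Lam, k=1.0):
    """Product distance on M(p), following (4.7)."""
    return np.sqrt(k * d_so(U, V) ** 2 + d_diag(D, Lam) ** 2)

def random_rotation(p, rng):
    """Hypothetical helper: a random element of SO(p) via QR factorization."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]       # flip one column to get determinant +1
    return Q

rng = np.random.default_rng(0)
U, V, R1, R2 = (random_rotation(3, rng) for _ in range(4))
D, Lam = np.diag([1.0, 2.0, 5.0]), np.diag([0.5, 3.0, 4.0])
P = np.eye(3)[[2, 0, 1]]             # permutation matrix; pi.D = P D P^T
S = np.diag([2.0, 0.1, 7.0])         # positive diagonal scaling

before = d_m(U, D, V, Lam, k=2.0)
after = d_m(R1 @ U @ R2, S @ P @ D @ P.T, R1 @ V @ R2, S @ P @ Lam @ P.T, k=2.0)
print(before, after)                 # the two values should agree
```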
4.2. Scaling-rotation distance and MSSR curves

Definition 4.4 ([11, Definition 3.10]). For X, Y ∈ Sym+(p), the scaling-rotation distance $d_{SR}(X, Y)$ between X and Y is defined by
\[ d_{SR}(X, Y) := \inf_{(U,D) \in E_X,\ (V,\Lambda) \in E_Y} d_M((U,D),(V,\Lambda)). \tag{4.9} \]
Definition 4.5. Let γ be a piecewise-smooth curve in M(p) and let ℓ(γ) denote the length of γ. For X, Y ∈ Sym+(p), we call γ : [0, 1] → M(p) an F-minimal geodesic (from EX to EY) if γ(0) ∈ EX, γ(1) ∈ EY, and ℓ(γ) = $d_{SR}(X, Y)$. We call a pair of points ((U, D), (V, Λ)) ∈ EX × EY a minimal pair if (U, D) = γ(0) and (V, Λ) = γ(1) for some F-minimal geodesic γ. A minimal smooth scaling-rotation (MSSR) curve from X to Y is a curve χ in Sym+(p) of the form F ◦ γ where γ is an F-minimal geodesic. We say that the MSSR curve χ = F ◦ γ corresponds to the minimal pair formed by the endpoints of γ. We let M(X, Y) denote the set of MSSR curves from X to Y.

Obviously an F-minimal geodesic is a minimal geodesic in the usual sense of Riemannian geometry: it is a curve of shortest length among all piecewise-smooth curves with the same endpoints. (From the general theory of geodesics, the image of any such curve γ is actually smooth.) Thus a definition equivalent to (4.9) is
\[ d_{SR}(X, Y) = \inf\{\ell(\gamma) \mid \gamma : [0,1] \to M(p) \text{ is a geodesic with } \gamma(0) \in E_X,\ \gamma(1) \in E_Y\}. \tag{4.10} \]
Consequently, an F-minimal geodesic can alternatively be defined as a geodesic of minimal length among all geodesics starting in one given fiber and ending in another.

From Corollary 2.9, every fiber of F is compact, so the infimum in (4.9) is always achieved. Hence for all X, Y ∈ Sym+(p), there always exists an F-minimal geodesic, a minimal pair in EX × EY, and an MSSR curve from X to Y. We discuss uniqueness issues in the next subsection.

Remark 4.6. Observe that we have not defined a Riemannian metric on Sym+(p), so there is no “automatic” meaning attached to the phrase length of a smooth curve in Sym+(p). However, for an SSR curve χ in Sym+(p) we define the length of χ to be ℓ(χ) := inf{ℓ(γ) : γ is a geodesic in M(p) and F ◦ γ = χ}. With this definition, (4.10) becomes
\[ d_{SR}(X, Y) = \inf\{\ell(\chi) \mid \chi : [0,1] \to \mathrm{Sym}^+(p) \text{ is an SSR curve with } \chi(0) = X,\ \chi(1) = Y\}. \tag{4.11} \]
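When X and Y both have distinct eigenvalues, the fibers EX and EY are finite, so the infimum in (4.9) can be found by enumeration. The following brute-force sketch rests on several assumptions that are not part of the text above: it treats only the top stratum, it takes the fiber of Y to be the set of pairs $(VP^{-1}, PΛP^T)$ with P a signed-permutation matrix of determinant +1, and it fixes one point of EX (which loses nothing by the invariance in Proposition 4.3). The names `signed_permutations`, `eigen_decomposition`, and `d_sr_top_stratum` are hypothetical.

```python
import itertools
import numpy as np

def d_so(U, V):
    """Rotation distance from eigenvalue angles of U^T V (see (4.8))."""
    angles = np.angle(np.linalg.eigvals(U.T @ V))
    return np.sqrt(0.5 * np.sum(angles ** 2))

def signed_permutations(p):
    """All p x p signed-permutation matrices with determinant +1."""
    mats = []
    for perm in itertools.permutations(range(p)):
        P = np.eye(p)[list(perm)]
        for signs in itertools.product([1.0, -1.0], repeat=p):
            M = np.diag(signs) @ P
            if np.linalg.det(M) > 0:
                mats.append(M)
    return mats

def eigen_decomposition(X):
    """One point (U, D) of the fiber E_X, with U forced into SO(p)."""
    w, U = np.linalg.eigh(X)
    if np.linalg.det(U) < 0:
        U[:, 0] = -U[:, 0]
    return U, np.diag(w)

def d_sr_top_stratum(X, Y, k=1.0):
    """Scaling-rotation distance, assuming X and Y have distinct eigenvalues."""
    U, D = eigen_decomposition(X)
    V, Lam = eigen_decomposition(Y)
    best = np.inf
    for P in signed_permutations(X.shape[0]):
        V2, Lam2 = V @ P.T, P @ Lam @ P.T      # another point of E_Y
        scaling = np.sum(np.log(np.diag(Lam2) / np.diag(D)) ** 2)
        best = min(best, k * d_so(U, V2) ** 2 + scaling)
    return np.sqrt(best)

X, Y = np.diag([1.0, 2.0]), np.diag([2.0, 1.0])
print(d_sr_top_stratum(X, Y))   # pure scaling beats a quarter-turn when k = 1
```

The enumeration has $2^{p-1}p!$ terms, so this is only practical for small p; it is meant as a check on the definition, not as an efficient algorithm.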
In the term “smooth scaling-rotation curve”, the word “smooth” needs some comment. By its definition, every SSR curve χ : I → Sym+(p) is a smooth map, but it is not clear whether the image of χ is “geometrically smooth”, i.e. locally (in I) a smooth submanifold or submanifold-with-boundary of Sym+(p). For the image of χ to be “geometrically smooth”, χ must admit a regular parametrization, one that is an immersion. It turns out that all SSR curves do, except for those whose images are single points:

Proposition 4.7. If γ is a non-constant geodesic, then F ◦ γ is either an immersion or a constant map.

Proof: Let γ : [0, 1] → M(p) be a non-constant F-minimal geodesic and let χ = F ◦ γ. Let (U, D) = γ(0) and let X = χ(0) = F(U, D). Since γ is a geodesic there exist unique A ∈ so(p), L ∈ Diag(p) such that $\gamma(t) = (e^{tA}U, e^{tL}D)$. Non-constancy implies (A, L) ≠ (0, 0). Direct computation yields
\[ \chi'(t) = e^{tA}\left([A,\ U\Lambda(t)U^T] + UL\Lambda(t)U^T\right)e^{-tA}, \]
where $\Lambda(t) = e^{tL}D$ and [ , ] denotes matrix commutator. Suppose that $t_0 \in [0, 1]$ is such that $\chi'(t_0) = 0$. Then
\[ [A,\ U\Lambda(t_0)U^T] + UL\Lambda(t_0)U^T = 0. \tag{4.12} \]
Multiplying on the left by $U^T$ and on the right by U yields $[\tilde A, \Lambda(t_0)] + L\Lambda(t_0) = 0$, where $\tilde A = U^TAU$. But because $\Lambda(t_0)$ is diagonal, the diagonal entries of any commutator $[B, \Lambda(t_0)]$ are zero. Since $L\Lambda(t_0)$ is a diagonal matrix, this implies that $[\tilde A, \Lambda(t_0)] = 0 = L\Lambda(t_0)$. But $\Lambda(t_0)$ is invertible, so the second equality implies L = 0. Thus Λ(t) = D for all t, and plugging this into (4.12) with $t = t_0$ we find [A, X] = 0. It follows that X commutes with $e^{tA}$ for every t. Hence $\chi(t) = e^{tA}UDU^Te^{-tA} = e^{tA}Xe^{-tA} = X$ for all t. Thus either χ′(t) is nonzero for every t ∈ [0, 1] or χ is constant.

It seems likely that a non-constant MSSR curve χ is actually an embedding (for this, it suffices that χ be injective, since [0, 1] is compact), but we have not proven this. There do exist non-minimal non-constant SSR curves that are not one-to-one. One example is any nonconstant periodic SSR curve: in (4.6), take L = 0 and take A to be any nonzero element of so(p) for which $\exp(t_1A) = I$ for some $t_1 \ne 0$. (For p ≤ 3, the latter condition is redundant.) In this case, the curve $\chi_{U,D,A,L}|_{[0,|t_1|]}$ is an SSR curve of positive length from F(U, D) to itself. A nonperiodic example with p = 2 is the following. Let
\[ J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad U(t) = \exp\!\left(t\tfrac{\pi}{2}J\right), \qquad D(t) = \begin{pmatrix} e^{1-t} & 0 \\ 0 & e^{t} \end{pmatrix}. \]
Then the curve t ↦ γ(t) := (U(t), D(t)) is the geodesic $\gamma_{U,D,A,L}$, where U = U(0) = I, D = D(0) = diag(e, 1), $A = \tfrac{\pi}{2}J$, and L = diag(−1, 1). Let χ be the SSR curve F ◦ γ. Then, as the reader may check, if $t_1 < t_2$ we have $\chi(t_1) = \chi(t_2)$ if (and only if) for some integer n ≥ 0
we have $t_1 = -n$ and $t_2 = n + 1$. Now let $n_1, n_2$ be non-negative integers, let $t_1 \in (-n_1 - 1, -n_1)$, $t_2 \in (n_2 + 1, n_2 + 2)$, and let $n = \min\{n_1, n_2\}$, $X = \chi(t_1)$, and $Y = \chi(t_2)$. Then $\chi|_{[t_1,t_2]}$ is an SSR curve from X to Y with n + 1 self-crossings. Note that the presence of self-crossings does not directly imply that $\chi|_{[t_1,t_2]}$ is not an MSSR curve: if we remove the closed curve $\chi|_{[-n,n+1]}$ from $\chi|_{[t_1,t_2]}$, the piecewise-smooth curve $\chi_1$ from X to Y that remains is not an SSR curve. (As the reader may check, the set {χ′(−n), χ′(n + 1)} is linearly independent, so $\chi_1$ cannot be reparametrized as an immersion. Hence, by Proposition 4.7, there is no geodesic $\gamma_1$ in M(2) such that $\chi_1$ can be reparametrized as $F \circ \gamma_1$.) Hence $\chi_1$ is not a candidate for an SSR curve from X to Y that is shorter than χ. However, with a little effort one can check by direct computation that there is an F-minimal geodesic from X to Y that is shorter than $\gamma|_{[t_1,t_2]}$. (One can compute the length of the minimal geodesic from any of the four points in EX to any of the four points in EY, and see that each of these lengths is less than $\ell(\gamma|_{[t_1,t_2]})$.)

Remark 4.8. As noted in [11], the “scaling-rotation distance” $d_{SR}$ is not a metric on Sym+(p); it does not satisfy the triangle inequality. In [9], we show that the pseudometric $\rho_{SR}$ generated by the semimetric $d_{SR}$ is a true metric on Sym+(p). (It is not trivial to show that $\rho_{SR}(X, Y) \ne 0$ for X ≠ Y.) Effectively, the construction enlarges the class of scaling-rotation (SR) curves χ considered in (4.11) from smooth maps to piecewise-smooth maps (with ℓ(χ) redefined correspondingly). It is for this reason we have made “smooth” part of the terminology used in Definition 4.1. This definition of the scaling-rotation metric $\rho_{SR}$ is analogous to the definition of “distance between two points in a Riemannian manifold”: the infimum of the lengths of piecewise-smooth curves joining the points.
Some minimal-length scaling-rotation curves are geometrically non-smooth (having corners); an MSSR curve from X to Y has minimal length only among smooth SR curves from X to Y. This phenomenon does not occur in Riemannian geometry; in a Riemannian manifold, minimal piecewise-smooth curves between two points are actually smooth, geometrically.

Definition 4.9. Recall that given any group G and subgroups $H_1, H_2$, an $(H_1, H_2)$ double-coset is an equivalence class under the equivalence relation ∼ on G defined by declaring $g_1 \sim g_2$ if there exist $h_1 \in H_1$, $h_2 \in H_2$ such that $g_2 = h_1g_1h_2$. The set of equivalence classes under this relation is denoted $H_1\backslash G/H_2$. By a set of representatives of $H_1\backslash G/H_2$ we mean a subset of G consisting of exactly one element from each $(H_1, H_2)$ double-coset. Note that every ordinary left coset or right coset is also a double coset (with the second subgroup taken to be trivial), so “set of representatives” is defined for ordinary cosets as well.

In [8] we apply the next proposition to compute all scaling-rotation distances, and to help compute and classify all smooth minimal scaling-rotation curves, in the case p = 3.
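The self-crossing claim for the nonperiodic p = 2 example given after Proposition 4.7 (the curve with U(t) = exp(tπJ/2) and D(t) = diag(e^{1−t}, e^t)) can be checked numerically. The sketch below uses the closed form of the 2 × 2 rotation exp(θJ); the names `rot` and `chi` are introduced here for illustration only.

```python
import numpy as np

def rot(theta):
    """exp(theta * J) for J = [[0, -1], [1, 0]]: rotation by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def chi(t):
    """SSR curve F(U(t), D(t)) = U(t) D(t) U(t)^T of the p = 2 example."""
    U = rot(t * np.pi / 2.0)
    D = np.diag([np.exp(1.0 - t), np.exp(t)])
    return U @ D @ U.T

# The curve should revisit the same point at t = -n and t = n + 1.
for n in range(4):
    print(n, np.allclose(chi(-n), chi(n + 1)))

# Two generic parameter values give different points.
print(np.allclose(chi(0.3), chi(0.7)))
```

The coincidence has a simple explanation: between t = −n and t = n + 1 the rotation angle advances by an odd multiple of π/2, which swaps the two eigen-axes, while the two eigenvalues $e^{1-t}$ and $e^{t}$ exchange their values.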
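Proposition 4.10. Let X, Y ∈ Sym+(p) and let (U, D) ∈ EX, (V, Λ) ∈ EY. Let Z be any set of representatives of $\Gamma^0_{J_D}\backslash \tilde S_p^+/\Gamma^0_{J_\Lambda}$. Then the scaling-rotation distance from X to Y is given by
\[ d_{SR}(X, Y)^2 = \min\left\{ k\, d_{SO}(UR_U,\ VR_VP_g^{-1})^2 + d_{D^+}(D,\ \pi_g\cdot\Lambda)^2 : R_U \in G^0_D,\ R_V \in G^0_\Lambda,\ g \in Z \right\} \tag{4.13} \]
\[ \phantom{d_{SR}(X, Y)^2} = \min_{g \in Z}\left\{ k \min\left\{ d_{SO}(UR_U,\ VR_VP_g^{-1})^2 : R_U \in G^0_D,\ R_V \in G^0_\Lambda \right\} + \left\|\log\left(D^{-1}(\pi_g\cdot\Lambda)\right)\right\|^2 \right\}. \tag{4.14} \]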
Every minimal smooth scaling-rotation curve from X to Y corresponds to some minimal pair whose first element lies in the connected component [(U, D)] of EX. A pair $((UR_U, D), (VR_VP_g^{-1}, \pi_g\cdot\Lambda))$ with $R_U \in G^0_D$, $R_V \in G^0_\Lambda$, and $g \in \tilde S_p^+$, is minimal if and only if the triple $(g, R_U, R_V)$ is a minimizer of the expression in braces in (4.13).

Proof: From Proposition 2.5, we have
\[ E_X \times E_Y = \left\{ \left((UR_UP_{g_1}^{-1},\ \pi_{g_1}\cdot D),\ (VR_VP_{g_2}^{-1},\ \pi_{g_2}\cdot\Lambda)\right) : R_U \in G^0_D,\ R_V \in G^0_\Lambda;\ g_1, g_2 \in \tilde S_p^+ \right\}. \tag{4.15} \]
Using Proposition 4.3 and the fact that the maps $g \mapsto \pi_g$, $g \mapsto P_g$ are homomorphisms, for all $g_1, g_2 \in \tilde S_p^+$ we have
\[ d\left((UR_UP_{g_1}^{-1},\ \pi_{g_1}\cdot D),\ (VR_VP_{g_2}^{-1},\ \pi_{g_2}\cdot\Lambda)\right) \tag{4.16} \]
\[ = d\left((UR_U,\ D),\ (VR_V(P_{g_1^{-1}g_2})^{-1},\ \pi_{g_1^{-1}g_2}\cdot\Lambda)\right). \tag{4.17} \]
The map $(\tilde U, \tilde D) \mapsto (\tilde UP_{g_1},\ \pi_{g_1}^{-1}\cdot\tilde D)$ underlying the equality above is an isometry of (SO × Diag+)(p) that preserves every fiber, hence carries a geodesic $\gamma_1$ with the endpoints in (4.16) into a geodesic $\gamma_2$ with the endpoints in (4.17) and that satisfies $F \circ \gamma_1 = F \circ \gamma_2$. Hence, every smooth scaling-rotation (SSR) curve from X to Y is of the form F ◦ γ where γ : [0, 1] → (SO × Diag+)(p) is a geodesic with $\gamma(0) = (UR_U, D)$ ∈ [(U, D)] and γ(1) ∈ EY.

Suppose $\gamma_1, \gamma_2$ are two such geodesics, with $\gamma_i(1) = (VR_VP_{g_i}^{-1},\ \pi_{g_i}\cdot\Lambda)$, i = 1, 2. If $g_2 = h_Dg_1h_\Lambda$, with $h_D \in \Gamma^0_{J_D}$ and $h_\Lambda \in \Gamma^0_{J_\Lambda}$, then
\[ d\left((UR_U, D),\ (VR_VP_{g_2}^{-1},\ \pi_{g_2}\cdot\Lambda)\right) = d\left((UR_U, D),\ (VR_Vh_\Lambda^{-1}P_{g_1}^{-1}h_D^{-1},\ \pi_{h_D}\cdot\pi_{g_1}\cdot\pi_{h_\Lambda}\cdot\Lambda)\right) \]
\[ = d\left((UR_Uh_D,\ \pi_{h_D}\cdot D),\ (VR_Vh_\Lambda^{-1}P_{g_1}^{-1},\ \pi_{g_1}\cdot\pi_{h_\Lambda}\cdot\Lambda)\right) \]
\[ = d\left((UR_{U,1}, D),\ (VR_{V,1}P_{g_1}^{-1},\ \pi_{g_1}\cdot\Lambda)\right), \]
where $R_{U,1} = R_Uh_D \in G^0_D$ and $R_{V,1} = R_Vh_\Lambda^{-1} \in G^0_\Lambda$. The same argument as in the preceding paragraph shows that the SSR curve determined by the pair $((UR_U, D), (VR_VP_{g_2}^{-1}, \pi_{g_2}\cdot\Lambda))$ is the same as the SSR curve determined by the pair $((UR_{U,1}, D), (VR_{V,1}P_{g_1}^{-1}, \pi_{g_1}\cdot\Lambda))$. Hence any representative $g \in \tilde S_p^+$ of a given $(\Gamma^0_{J_D}, \Gamma^0_{J_\Lambda})$ double-coset determines the same set of SSR curves (the set of curves F ◦ γ, where γ is a geodesic from $(UR_U, D)$ to $(VR_VP_g^{-1}, \pi_g\cdot\Lambda)$ for some $R_U \in G^0_D$, $R_V \in G^0_\Lambda$) as does any other representative of that double-coset. The Proposition now follows.
Proposition 4.10 remains true if Z is replaced by S˜p+ in (4.13), or by a set of representatives of the left-coset space Sp /Γ0JΛ or the right-coset space Γ0JD \Sp . However, taking Z as in the Proposition reduces the combinatorial complexity of the minimization problem when the eigenvalues of X or Y are not all distinct. In Sections 5 and 6 of [8], this proposition is applied to simplify the concrete computations of dSR (X, Y ) and M(X, Y ), respectively, in the case p = 3.

4.3. Uniqueness questions for MSSR curves

As noted in Section 4.1, for all X, Y ∈ Sym+ (p) there always exists an MSSR curve from X to Y , the projection of some F -minimal geodesic. A priori, different F -minimal geodesics could project to the same MSSR curve or to different MSSR curves. It is natural to ask: Under what conditions on (X, Y ) is there a unique MSSR curve from X to Y ? When uniqueness fails, how does it fail, and what can we say about the set M(X, Y )?

For uniqueness to fail for given X, Y , there must be distinct F -minimal geodesics γi : [0, 1] → M (p), whose endpoints are minimal pairs ((Ui , Di ), (Vi , Λi )) ∈ EX × EY , i = 1, 2, such that F ◦ γ1 ≠ F ◦ γ2 . The “how” question above concerns the following two possibilities (not mutually exclusive):
1. “Type I non-uniqueness”: There exist such γi whose endpoints are distinct minimal pairs ((Ui , Di ), (Vi , Λi )).
2. “Type II non-uniqueness”: There exist such γi whose endpoints are the same minimal pair ((U, D), (V, Λ)).
Since for any D, Λ ∈ Diag+ (p) the minimal geodesic from D to Λ is unique, Type II non-uniqueness with minimal pair ((U, D), (V, Λ)) is equivalent to the existence of two or more minimal geodesics from U to V , which is equivalent to each of U, V being in the cut-locus (in SO(p)) of the other. It will be convenient for us to have some other terminology for this situation:

Definition 4.11.
Call a pair of points (U, V ) in SO(p) × SO(p) geodesically antipodal if one point is in the cut-locus of the other (equivalently, if each point is in the cut-locus of the other) and geodesically non-antipodal otherwise. Call a pair of points ((U, D), (V, Λ)) in M (p) × M (p) geodesically antipodal if (U, V ) is a geodesically antipodal pair in SO(p) × SO(p), and geodesically non-antipodal otherwise.
As mentioned earlier, the cut-locus of the identity I ∈ SO(p) is precisely the set of all involutions in SO(p). Furthermore, because of the invariance of the Riemannian metric gSO , an element V ∈ SO(p) is in the cut-locus of element U if and only if V −1 U is in the cut-locus of I. Note that, for elements a, b of any group, ab is an involution ⇐⇒ ba is an involution ⇐⇒ b−1 a−1 is an involution ⇐⇒ a−1 b−1 is an involution. Thus, if any of the elements V −1 U, U V −1 , U −1 V, V U −1 is an involution, so are all the others. Note that a pair (U, V ) in SO(p) can be geodesically antipodal without either point being maximally remote from the other. (For example, with p = 4, the matrix diag(−1, −1, 1, 1) is an involution, but is closer to the identity I than is the involution −I.) However, if (U, V ) is geodesically antipodal, then there exists a (not necessarily unique) closed geodesic in SO(p) containing U and V , isometric to a circle of some radius, such that U and V are antipodal points of this circle in the usual sense, and such that each of the two semicircles connecting U and V is a minimal geodesic between these two points. One reason for our interest in Type II non-uniqueness is its effect on a true scaling-rotation metric ρSR on Sym+ (p), mentioned earlier, that we construct from dSR in [9]. Various constructions and assertions concerning this metric are simplified when we know that Type II non-uniqueness does not occur. For small enough values of p, Type II non-uniqueness never occurs; for large enough p, it always occurs. Our proof of this requires a long digression from the topic of scaling-rotation distance and MSSR curves, so we defer the proof to Section 5. However, we will introduce here our main tool for ruling out Type II non-uniqueness, based on a property we call sign-change reducibility (for want of a better term) defined shortly. To motivate the definition, let X, Y ∈ Sym+ (p) and let ((U, D), (V, Λ)) ∈ EX × EY be a minimal pair. 
Then one minimizer (g, RU , RV ) of the expression in brackets on the right-hand side of (4.13) is the triple (e, I, I), where e is the identity element of S˜p+ . Hence for all g ∈ S˜p with πg · Λ = Λ— i.e. for all g ∈ KJΛ (see Notation 2.7 and equation (2.18))—we must have dSO (U Pg , V ) = dSO (U, V Pg−1 ) ≥ dSO (U, V ). But Ip+ ⊂ KJ for all J (since sign-change matrices act trivially on Diag+ (p)), so, in particular, we must have dSO (U Iσ, V ) ≥ dSO (U, V ) for all σ ∈ Ip+ . Definition 4.12. Call a pair of points (U, V ) ∈ SO(p) × SO(p) sign-change reducible if dSO (U Iσ, V ) < dSO (U, V ) for some σ ∈ Ip+ . From the discussion preceding Definition 4.12, we have the following: Corollary 4.13. Let ((U, D), (V, Λ)) ∈ M (p) × M (p). If (U, V ) ∈ SO(p) × SO(p) is sign-change reducible, then ((U, D), (V, Λ)) is not a minimal pair.
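For small p, Definition 4.12 can be tested by enumerating the sign-change matrices. The sketch below assumes, as the surrounding text suggests but this excerpt does not define, that Ip+ consists of the diagonal ±1 matrices with an even number of −1 entries (determinant +1). The names `d_so`, `even_sign_changes`, and `is_sign_change_reducible`, and the tolerance `tol`, are hypothetical.

```python
import itertools
import numpy as np

def d_so(U, V):
    """Rotation distance ||log(U^{-1} V)||_F / sqrt(2) via eigenvalue angles."""
    angles = np.angle(np.linalg.eigvals(U.T @ V))
    return np.sqrt(0.5 * np.sum(angles ** 2))

def even_sign_changes(p):
    """Diagonal +-1 matrices with an even number of -1 entries (assumed I_p^+)."""
    for signs in itertools.product([1.0, -1.0], repeat=p):
        if signs.count(-1.0) % 2 == 0:
            yield np.diag(signs)

def is_sign_change_reducible(U, V, tol=1e-9):
    """True if d_SO(U I_sigma, V) < d_SO(U, V) for some sign-change matrix."""
    base = d_so(U, V)
    return any(d_so(U @ S, V) < base - tol
               for S in even_sign_changes(U.shape[0]))

I4 = np.eye(4)
R = np.diag([-1.0, -1.0, 1.0, 1.0])          # an involution in SO(4)
print(d_so(R, I4), d_so(-I4, I4))            # R is closer to I than -I is
print(is_sign_change_reducible(R, I4))       # R itself is a sign change

c, s = np.cos(0.1), np.sin(0.1)
small = np.array([[c, -s], [s, c]])          # rotation by 0.1 in SO(2)
print(is_sign_change_reducible(small, np.eye(2)))
```

The first printed pair illustrates the remark above that a geodesically antipodal pair need not be maximally remote: diag(−1, −1, 1, 1) lies at distance π from I, while −I lies at distance √2 π.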
Sign-change reducibility is studied in more detail in Sections 5–8. Below, we summarize some results proven there, and their consequences. Two of the main results are given in the following Proposition (proven in Section 8):
Proposition 4.14. (a) For p ≤ 4, every geodesically antipodal pair (U, V ) in SO(p) × SO(p) is sign-change reducible. (b) For p ≥ 11, there exist geodesically antipodal pairs (U, V ) in SO(p) × SO(p) that are not sign-change reducible.

Thus the largest dimension p1 for which every geodesically antipodal pair (U, V ) in SO(p1 ) × SO(p1 ) is sign-change reducible satisfies 4 ≤ p1 ≤ 10. A combination of theory and numerical evidence leads the authors to believe that p1 is closer to 10 than to 4. An immediate consequence of Proposition 4.14 (a) is the following. (Again, we do not believe the number “4” here is sharp.)

Corollary 4.15. For p ≤ 4, every minimal pair in M (p) × M (p) is geodesically non-antipodal. Hence for p ≤ 4, for all X, Y ∈ Sym+ (p) for which |M(X, Y )| > 1, the non-uniqueness is purely of Type I.

Part of the importance of sign-change reducibility comes from the following:

Proposition 4.16. Suppose that (U, V ) is a pair in SO(p) × SO(p) that is not sign-change reducible. Then there exist D, Λ ∈ DJtop such that the pair ((U, D), (V, Λ)) is minimal.

We will prove this below. But first note that an immediate corollary of Propositions 4.14(b) and 4.16 is:

Corollary 4.17. For p ≥ 11, there exist geodesically antipodal, minimal pairs ((U, D), (V, Λ)) ∈ SJtop × SJtop ⊂ M (p) × M (p). Hence, for p ≥ 11, there exist X, Y ∈ S[Jtop ] ⊂ Sym+ (p) for which the set M(X, Y ) exhibits Type II nonuniqueness.

Thus sign-change reducibility is more than an ad hoc criterion for ruling out Type II non-uniqueness for small enough p. Proposition 4.16 and Corollary 4.17 show that, in some sense, sign-change reducibility is the only obstruction to having points X, Y in the top stratum of Sym+ (p) for which M(X, Y ) exhibits Type II nonuniqueness. For X or Y not in the top stratum of Sym+ (p), the relationship between Type II nonuniqueness and sign-change reducibility of minimal pairs in EX × EY is more complicated to analyze.
We do not investigate this relationship further in this paper. To prove Proposition 4.16 we start with a lemma:

Lemma 4.18. Let c > 0. There exist D, Λ ∈ $D_{top} := D_{J_{top}}$ such that $\|\log(D^{-1}(\pi\cdot\Lambda))\|^2 > \|\log(D^{-1}\Lambda)\|^2 + c$ for all non-identity π ∈ Sp.

Proof: Let $c_1 = \sqrt{c/(3p)}$ and let $\{a_i\}_{i=1}^p$ be a sequence of numbers satisfying $a_{i+1} - a_i > (2\sqrt{p} + 1)c_1$ for 1 ≤ i ≤ p − 1. Then $|c_1 + a_j - a_i| > 2\sqrt{p}\,c_1$ for all i ≠ j. Let $D = \mathrm{diag}(e^{a_1}, \dots, e^{a_p})$ and let $\Lambda = e^{c_1}D$. Then D, Λ ∈ $D_{top}$ and
\[ \|\log(D^{-1}\Lambda)\|^2 = \|c_1I\|^2 = pc_1^2. \]
Let π ∈ Sp, π ≠ id, and let i be such that $\pi^{-1}(i) \ne i$. Then
\[ \|\log(D^{-1}(\pi\cdot\Lambda))\|^2 \ge |c_1 + a_{\pi^{-1}(i)} - a_i|^2 > (2\sqrt{p}\,c_1)^2 = \|\log(D^{-1}\Lambda)\|^2 + c. \]
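The construction in the proof of Lemma 4.18 is explicit enough to test by enumerating permutations for small p. The sketch below is illustrative only: the names `lemma_4_18_exponents` and `check_lemma_4_18` and the parameter `slack` (which makes the gaps strictly larger than the required bound) are assumptions, and D and Λ are represented by the exponents $a_i$ and the shift $c_1$ rather than as matrices.

```python
import itertools
import numpy as np

def lemma_4_18_exponents(p, c, slack=0.1):
    """Exponents a_i and shift c1 with D = diag(exp(a_i)), Lambda = exp(c1) * D."""
    c1 = np.sqrt(c / (3.0 * p))
    gap = (2.0 * np.sqrt(p) + 1.0) * c1 + slack   # strictly above the bound
    return gap * np.arange(p), c1

def check_lemma_4_18(p, c):
    """Verify the strict inequality of the lemma for every non-identity permutation."""
    a, c1 = lemma_4_18_exponents(p, c)
    base = p * c1 ** 2                            # ||log(D^{-1} Lambda)||^2
    identity = tuple(range(p))
    for perm in itertools.permutations(range(p)):
        if perm == identity:
            continue
        # Diagonal of log(D^{-1} (pi . Lambda)): c1 + a_{perm(i)} - a_i.
        lhs = np.sum((c1 + a[list(perm)] - a) ** 2)
        if not lhs > base + c:
            return False
    return True

print(check_lemma_4_18(4, 5.0))
print(check_lemma_4_18(3, 0.01))
```

Since the check runs over all of Sp, it does not matter whether a permutation or its inverse is used to index the diagonal entries.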
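Proof of Proposition 4.16. Let D, Λ ∈ $D_{top}$ be such that
\[ \|\log(D^{-1}(\pi\cdot\Lambda))\|^2 > \|\log(D^{-1}\Lambda)\|^2 + k\,\mathrm{diam}(SO(p))^2 \tag{4.18} \]
for all non-identity π ∈ Sp; such D, Λ exist by Lemma 4.18. Let X = F(U, D), Y = F(V, Λ). The subgroups $G^0_D$, $G^0_\Lambda$ of SO(p) are trivial, as are the subgroups $\Gamma^0_{J_D}$ and $\Gamma^0_{J_\Lambda}$ of $\tilde S_p^+$. Hence in Proposition 4.10 we have $Z = \tilde S_p^+$ and
\[ d_{SR}(X, Y)^2 = \min_{g \in \tilde S_p^+}\left\{ k\, d_{SO}(U,\ VP_g^{-1})^2 + \left\|\log\left(D^{-1}(\pi_g\cdot\Lambda)\right)\right\|^2 \right\} \]
\[ \phantom{d_{SR}(X, Y)^2} = \min_{\pi \in S_p}\left\{ k \min\left\{ d_{SO}(U,\ VP_g^{-1})^2 : g \in \tilde S_p^+,\ \pi_g = \pi \right\} + \left\|\log\left(D^{-1}(\pi\cdot\Lambda)\right)\right\|^2 \right\}. \]
For all non-identity π ∈ Sp and all $g_1, g_2 \in \tilde S_p^+$ with $\pi_{g_1} = \mathrm{id}$ and $\pi_{g_2} = \pi$, using (4.18) we then have
\[ d_M\left((U, D),\ (VP_{g_1}^{-1},\ \pi_{g_1}\cdot\Lambda)\right)^2 = k\, d_{SO}(U,\ VP_{g_1}^{-1})^2 + \|\log(D^{-1}\Lambda)\|^2 \]
\[ \le k\,\mathrm{diam}(SO(p))^2 + \|\log(D^{-1}\Lambda)\|^2 \]
\[ < \|\log(D^{-1}(\pi\cdot\Lambda))\|^2 \]
\[ \le d_M\left((U, D),\ (VP_{g_2}^{-1},\ \pi_{g_2}\cdot\Lambda)\right)^2. \]
Hence the identity permutation is the only element of Sp for which the expression inside the outer braces in the last line of the formula for $d_{SR}(X, Y)^2$ above achieves the minimum over all π ∈ Sp. But $\{g \in \tilde S_p^+ : \pi_g = \mathrm{id}\}$ is precisely the sign-change subgroup $I_p^+$, and by hypothesis (U, V) is not sign-change reducible. Hence
\[ d_{SR}(X, Y)^2 = \min_{\sigma \in I_p^+}\left\{ k\, d_{SO}(U,\ VI_\sigma)^2 + \|\log(D^{-1}\Lambda)\|^2 \right\} \]
\[ \phantom{d_{SR}(X, Y)^2} = k\, d_{SO}(U, V)^2 + \|\log(D^{-1}\Lambda)\|^2 \]
\[ \phantom{d_{SR}(X, Y)^2} = d_M((U, D), (V, \Lambda))^2. \]
Thus ((U, D), (V, Λ)) is a minimal pair.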
4.4. When do two minimal pairs determine the same MSSR curve?

Proposition 4.10 is a starting-point for understanding the set M(X, Y) for all p and all X, Y ∈ Sym+(p): it assures us that, for any (U, D) ∈ EX, every MSSR curve from X to Y corresponds to some minimal pair whose first element lies in the connected component [(U, D)] of EX. But even once we know all the minimal pairs, to completely understand M(X, Y), or even just determine its cardinality, we need a way to tell whether MSSR curves corresponding to two (not necessarily distinct) minimal pairs with first point in [(U, D)] are the same. (This is true whether the non-uniqueness, if any, in M(X, Y) is of Type I, Type II, or a mixture of both.) Proposition 4.19 below provides such a tool. For p = 3, this proposition suffices for the computation of M(X, Y) in the “nontrivial” cases (those in which neither X nor Y lies in the bottom stratum of Sym+(3), and at least one lies in the middle stratum, the unique stratum of Sym+(3) that is neither the top nor bottom stratum); see [8, Section 6].

Proposition 4.19. Let X, Y ∈ Sym+(p), X ≠ Y. For i = 1, 2 assume that $\chi_i = F \circ \gamma_i$ is a minimal smooth scaling-rotation curve from X to Y corresponding to the minimal pair $((UR_{U,i}, D), (VR_{V,i}P_{g_i}^{-1}, \Lambda_i))$, where $R_{U,i} \in G^0_D$, $R_{V,i} \in G^0_\Lambda$, $g_i \in \tilde S_p^+$, $\Lambda_i = \pi_{g_i}\cdot\Lambda$, and $\gamma_i$ : [0, 1] → M(p) is a geodesic. (We do not assume that the two minimal pairs are distinct.) Then $\chi_1 = \chi_2$ if and only if the following two conditions hold.

(i) Both pairs $(UR_{U,i},\ VR_{V,i}P_{g_i}^{-1})$ are geodesically non-antipodal and
\[ R_{V,2}P_{g_2}^{-1}R_{U,2}^{-1} = R_{V,1}P_{g_1}^{-1}R_{U,1}^{-1}, \tag{4.19} \]
or both pairs are geodesically antipodal and
\[ (\mathrm{proj}_{SO(p)}\,\gamma_1'(0))\,R_{U,1}^{-1} = (\mathrm{proj}_{SO(p)}\,\gamma_2'(0))\,R_{U,2}^{-1}, \tag{4.20} \]
where for any (U′, D′) ∈ M(p), $\mathrm{proj}_{SO(p)}$ denotes the natural projection $T_{(U',D')}M(p) \to T_{U'}SO(p)$.

(ii) There exist $g \in \tilde S_p^+$, $R \in G^0_{D,\Lambda_1}$ such that
\[ D = \pi_g\cdot D, \tag{4.21} \]
\[ \Lambda_2 = \pi_g\cdot\Lambda_1, \tag{4.22} \]
and
\[ R_{U,1}^{-1}R_{U,2} = RP_g^{-1}. \tag{4.23} \]
Equation (4.20) implies equation (4.19), so (4.19) is always a necessary condition for the equality $\chi_1 = \chi_2$.
In Proposition 4.19, in the geodesically non-antipodal case we use endpoint data to tell whether the projections to Sym+(p) of two minimal geodesics from EX to EY are equal. We will deduce this proposition from the following theorem, proven in [11], that gives a criterion based on initial-value data to tell whether the projections of two geodesics emanating from EX are equal. In this theorem, $G_{D,L} := G_D \cap G_L$, $\mathfrak{g}_{D,L} := \mathfrak{g}_D \cap \mathfrak{g}_L$ (the Lie algebra of $G_{D,L}$), and for A ∈ so(p), $\mathrm{ad}_A$ : so(p) → so(p) is the linear map defined by $\mathrm{ad}_A(B) = [A, B]$.

Theorem 4.20 ([11, Theorem 3.8]). For i = 1, 2 let $(U_i, D_i)$ ∈ M(p), $A_i$ ∈ so(p), $L_i$ ∈ Diag(p), and let $\check A_i = U_1^{-1}A_iU_1$. Let I be a positive-length interval containing 0. Then the smooth scaling-rotation curves $\chi_i := \chi_{U_i,D_i,A_i,L_i}$ : I → Sym+(p) are identical if and only if (i) $\check A_2 - \check A_1 \in \mathfrak{g}_{D_1,L_1}$, (ii) $(\mathrm{ad}_{\check A_2})^j(\check A_1) \in \mathfrak{g}_{D_1,L_1}$ for all j ≥ 1, and (iii) there exist $R \in G_{D_1,L_1}$ and $g \in \tilde S_p^+$ such that $U_2 = U_1RP_g^{-1}$, $D_2 = \pi_g\cdot D_1$, and $L_2 = \pi_g\cdot L_1$.1
To deduce Proposition 4.19 from Theorem 4.20, we first prove two lemmas. Beyond helping us to prove the Proposition, these lemmas may be useful in further analysis of MSSR curves. In these lemmas, for any X ∈ Sym+(p) we write $\mathfrak{g}_X$ for the Lie algebra of the stabilizer $G_X := \{U \in G : UXU^T = X\}$; thus $\mathfrak{g}_X = \{A \in so(p) : AX = XA\}$. (Observe that this is consistent with the notation $G_D$, $\mathfrak{g}_D$ introduced earlier for diagonal matrices.)

Lemma 4.21. Let X, Y ∈ Sym+(p) and suppose that χ : [0, 1] → Sym+(p) is a minimal smooth scaling-rotation curve with X := χ(0) ≠ Y := χ(1). Let $\gamma = \gamma_{U,D,A,L}$ : [0, 1] → SO(p) × Diag+(p) be a geodesic for which χ = F ◦ γ. Then $A \in (\mathfrak{g}_X)^\perp \cap (\mathfrak{g}_Y)^\perp$, where the orthogonal complements are taken in so(p).

Proof: Since γ is a smooth curve of minimal length connecting the submanifolds EX and EY of (SO × Diag+)(p), the velocity vectors γ′(0), γ′(1) must be perpendicular to the tangent spaces $T_{\gamma(0)}E_X$, $T_{\gamma(1)}E_Y$, respectively ([3, Proposition 1.5]). Making natural tangent-space identifications, we have $T_{\gamma(0)}E_X = T_{(U,D)}E_X = U\mathfrak{g}_D \oplus \{0\} \subset U\mathfrak{g}_D \oplus \mathrm{Diag}(p)$, where $U\mathfrak{g}_D := \{UC : C \in \mathfrak{g}_D\}$. Let $\check A = U^{-1}AU$. Since $\gamma'(0) = (U\check A, DL)$, and the Riemannian metric we are using on SO(p) is left-invariant, the condition $\gamma'(0) \perp T_{\gamma(0)}E_X$ is equivalent to $\check A \in (\mathfrak{g}_D)^\perp$, hence to $A \in U(\mathfrak{g}_D)^\perp U^{-1}$. Using additionally the right-invariance of the metric on SO(p), we have $U(\mathfrak{g}_D)^\perp U^{-1} = (U\mathfrak{g}_DU^{-1})^\perp$. From general group-action properties, it is easily seen that $U\mathfrak{g}_DU^{-1} = \mathfrak{g}_{UDU^{-1}}$. Since $UDU^{-1} = X$, it follows that $A \in (\mathfrak{g}_X)^\perp$. A similar argument at the point (V, Λ) := γ(1) shows that $A \in (\mathfrak{g}_{V\Lambda V^{-1}})^\perp = (\mathfrak{g}_Y)^\perp$.
1 In [11, Theorem 3.8], g was actually required to be of the form (s(π), π), where s is as in (??). But the same argument as in the proof of Proposition 2.5 shows that this restriction can be removed.
Lemma 4.22. In the setting of Theorem 4.20, assume that the smooth scaling-rotation curve $\chi_1$ is minimal. Then conditions (i) and (ii) in the theorem can be replaced by the single condition $A_2 = A_1$.

Proof: With notation as in Theorem 4.20, assume that $\chi_2 = \chi_1$. Then the Theorem implies that $U^{-1}(A_2 - A_1)U \in \mathfrak{g}_{D,L} \subset \mathfrak{g}_D$, implying that $A_2 - A_1 \in U\mathfrak{g}_DU^{-1} = \mathfrak{g}_X$ (as in the proof of Lemma 4.21). But since $\chi_1$ is minimal, Lemma 4.21 implies that both $A_2$ and $A_1$ lie in $(\mathfrak{g}_X)^\perp$, hence that $A_2 - A_1 \in (\mathfrak{g}_X)^\perp$. Hence $A_2 - A_1 = 0$, i.e. $A_2 = A_1$. Conversely, assume that $A_2 = A_1$. Then conditions (i) and (ii) are satisfied trivially.

Proof of Proposition 4.19: For i ∈ {1, 2} let $U_i = UR_{U,i}$, $V_i = VR_{V,i}P_{g_i}^{-1}$, and $\Lambda_i = \pi_{g_i}\cdot\Lambda$. By hypothesis $\chi_i = F \circ \gamma_i$, where $\gamma_i = \gamma_{U_i,D,A_i,L_i}$ : [0, 1] → M(p) (for some $A_i$ ∈ so(p), $L_i$ ∈ Diag(p)) is a minimal geodesic from $(U_i, D)$ to $(V_i, \Lambda_i)$. Hence $L_i = \log(\Lambda_iD^{-1})$ and $A_i \in \log(V_iU_i^{-1})$ (we write “∈” rather than “=” since if R is an involution, “log R”, as we have defined it, is a set with more than one element; see Section 4.1). It is straightforward to show that $G_{D,L_i} = G_{D,\Lambda_i}$.

From Lemma 4.22, the conditions (i) and (ii) in Theorem 4.20 in the equality-conditions for $\chi_1$ and $\chi_2$ can be replaced by the single condition $A_2 = A_1$. If $A_2 = A_1$ then $V_2U_2^{-1} = V_1U_1^{-1}$, implying that either both pairs $\{(U_i, V_i)\}$ are geodesically antipodal or both are geodesically non-antipodal. In the converse direction, suppose that the pairs $\{(U_i, V_i)\}$ are geodesically non-antipodal and that $V_2U_2^{-1} = V_1U_1^{-1}$. Then $A_2 = \log(V_2U_2^{-1}) = \log(V_1U_1^{-1}) = A_1$. Whether or not the pairs $\{(U_i, V_i)\}$ are geodesically antipodal, by definition $(\mathrm{proj}_{SO(p)}\,\gamma_i'(0))\,U_i^{-1} = A_i$, so if (4.20) holds then $A_2 = A_1$. Hence the condition $A_2 = A_1$ is equivalent to condition (i) in Proposition 4.19.
Next, letting $D$ play the role of $D_1$ in Theorem 4.20, condition (iii) in the Theorem is equivalent to the existence of $g \in \tilde{S}_p^+$, $R \in G_{D,\Lambda_1}$ such that $D = \pi_g\cdot D$, $L_2 = \pi_g\cdot L_1$, and $U_2 = U_1 R P_g^{-1}$. But for all such $R$ and $g$, we have $R P_g^{-1} = R_0 P_{g_0}^{-1}$ for some $R_0 \in G^0_{D,\Lambda_1}$ and $g_0 \in \tilde{S}_p^+$ with $\pi_{g_0} = \pi_g$. Furthermore, for any $\pi \in S_p$, if $\pi\cdot D = D$ then $L_2 = \pi\cdot L_1 \iff \Lambda_2 = \pi\cdot\Lambda_1$. Hence, under the hypotheses of Proposition 4.19, condition (iii) in Theorem 4.20 is equivalent to condition (ii) stated in the Proposition. This establishes the "if and only if" statement in the Proposition.

The final statement of the proposition follows from the fact that, in the notation of this proof, (4.20) is the equality $A_2 = A_1$ (after multiplying both sides of (4.20) on the right by $U^{-1}$), an equality that implies $V_2 U_2^{-1} = \exp(A_2) = \exp(A_1) = V_1 U_1^{-1}$.
5. Involutions, sign-change reducibility, and distance between subspaces of $\mathbb{R}^p$

In this section we begin our study of sign-change reducibility. This culminates in Section 8 with the proof of Proposition 4.14 (which, as we have seen, implies Corollary 4.17, our main result concerning Type II nonuniqueness), but we discover some other interesting facts along the way. As we shall see, questions concerning the seemingly ad hoc notion of sign-change reducibility can be translated into questions about distances between subspaces of $\mathbb{R}^p$; for example, Proposition 5.11 states the equivalence between a sign-change-reducibility question and a question purely about the geometry of the Grassmannian $\mathrm{Gr}_m(\mathbb{R}^p)$ (endowed with a standard metric). Thus, some unexpected benefits of our investigation of Type II nonuniqueness are results, possibly of independent interest, concerning the geometry of Grassmannians and, more generally, principal angles between subspaces of $\mathbb{R}^p$.

Recall from Section 4.3 that two points $U, V \in SO(p)$ are geodesically antipodal if and only if $V^{-1}U$ is an involution. Since $d_{SO}(U, V) = d_{SO}(V^{-1}U, I)$, the set of distances between geodesically antipodal points in $SO(p)$ is the same as the set of distances between the identity and involutions. Thus to understand which (if any) geodesically antipodal pairs $(U, V)$ in $SO(p)$ are sign-change reducible, it suffices to study the case $(U, V) = (R, I)$, where $R$ is an involution.

Definition 5.1.
1. Call $R \in SO(p)$ sign-change reducible if $d_{SO}(R I_\sigma, I) < d_{SO}(R, I)$ for some $\sigma \in \mathcal{I}_p^+$ (equivalently, if the pair $(R, I)$ is sign-change reducible). Note that sign-change reducibility of the pair $(U, V)$, as previously defined in Definition 4.12, is equivalent to sign-change reducibility of $V^{-1}U$.
2. For $\sigma = (\sigma_1, \dots, \sigma_p) \in \mathcal{I}_p$, define the level of $\sigma$, written $\mathrm{level}(\sigma)$, to be $\#\{i : \sigma_i = -1\}$.
3. For any involution $R \in SO(p)$, define the level of $R$, written $\mathrm{level}(R)$, to be $\dim(E_{-1}(R))$.
We write $\mathrm{Inv}(p)$ for the set of involutions in $SO(p)$, and for $0 < m \le p$ we write $\mathrm{Inv}_m(p)$ for the set of involutions in $SO(p)$ of level $m$. Note that $\dim(E_{-1}(R))$ is even for any $R \in SO(p)$, so $\mathrm{Inv}_m(p)$ is empty unless $m$ is even and at least 2. Thus $\mathrm{Inv}(p) = \bigcup_{\text{even } m \ge 2} \mathrm{Inv}_m(p)$ (a disjoint union).
4. Let $R \in SO(p)$ be an involution. We say that $R$ is reducible by a sign-change of level $m$ if there exists $\sigma \in \mathcal{I}_p^+$ of level $m$ such that $d_{SO}(R I_\sigma, I) < d_{SO}(R, I)$.

Observe that for non-identity $\sigma \in \mathcal{I}_p^+$, the matrix $I_\sigma$ is an involution in $SO(p)$, and $\mathrm{level}(\sigma) = \mathrm{level}(I_\sigma)$.

Remark 5.2 (Involutions and Grassmannians). The space $\mathrm{Inv}(p)$ can be naturally identified with a disjoint union of Grassmannians, because an involution $R \in SO(p)$ is completely determined by its $(-1)$-eigenspace $E_{-1}(R)$. Let $\mathrm{Gr}_m(\mathbb{R}^p)$ denote the Grassmannian of $m$-planes in $\mathbb{R}^p$, and for even $m \in (0, p]$ define $\Phi_{m,p} : \mathrm{Gr}_m(\mathbb{R}^p) \to \mathrm{Inv}_m(p)$ to be the map carrying $W \in \mathrm{Gr}_m(\mathbb{R}^p)$ to the involution in $SO(p)$ whose $(-1)$-eigenspace is $W$. (Thus $E_{-1}(R) = \Phi_{m,p}^{-1}(R)$ for all $R \in \mathrm{Inv}_m(p)$.) Concretely, letting $\pi_V : \mathbb{R}^p \to V$ denote orthogonal projection onto any subspace $V$, and letting $P_V$ denote the matrix of $\pi_V$ with respect to the standard basis of $\mathbb{R}^p$, the map $\Phi_{m,p}$ is given by
\[ \Phi_{m,p}(W) = P_{W^\perp} - P_W = I - 2P_W, \tag{5.1} \]
reflection about the $(p - m)$-plane $W^\perp$. It is not hard to show that $\mathrm{Inv}_m(p)$ is a submanifold of $SO(p)$ and that $\Phi_{m,p}$ is a diffeomorphism from $\mathrm{Gr}_m(\mathbb{R}^p)$ to this submanifold.

Our study of sign-change reducibility of involutions will make frequent use of the normal form of an element of $SO(p)$, so we review this before proceeding.

5.1. Normal form and distance to the identity in $SO(p)$

Let $k = \lfloor p/2 \rfloor$. Recall that every $R \in SO(p)$ has a normal form: a block-diagonal matrix that, for $p$ even, is of the form
\[ R(\theta_1, \dots, \theta_k) = \begin{pmatrix} C(\theta_1) & & & \\ & C(\theta_2) & & \\ & & \ddots & \\ & & & C(\theta_k) \end{pmatrix}, \tag{5.2} \]
where
\[ C(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \tag{5.3} \]
and where $\theta_i \in [0, \pi]$, $1 \le i \le k$. (This can be derived quickly from the normal form of an antisymmetric matrix, since the compactness of $SO(p)$ guarantees that the exponential map $\mathfrak{so}(p) \to SO(p)$ is onto.) For the odd-$p$ case, the normal-form matrix is the matrix (5.2) with one more row and column appended, and with a 1 in the lower right-hand corner (and zeroes everywhere else in the last row and column). In this case we define $\theta_{k+1} = 0$, so that for both even and odd $p$ we can use the notation $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ for the normal form. Note that
\[ C(\theta) = \exp(\theta J) \quad \text{where} \quad J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \tag{5.4} \]
For each $R \in SO(p)$ there exists an orthonormal basis of $\mathbb{R}^p$ with respect to which the linear transformation $\mathbb{R}^p \to \mathbb{R}^p$, $v \mapsto Rv$, has matrix $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$. Thus there exists $Q \in O(p)$ such that
\[ R = Q\,R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})\,Q^{-1}. \tag{5.5} \]
The normal form of a given $R$ is unique up to ordering of the blocks; the multi-set $\{\theta_1, \dots, \theta_{\lceil p/2 \rceil}\}$ is uniquely determined by $R$. From (5.4) and (5.5) we have
\[ R = Q \exp(A(\theta_1, \dots, \theta_{\lceil p/2 \rceil}))\,Q^{-1} = \exp(Q\,A(\theta_1, \dots, \theta_{\lceil p/2 \rceil})\,Q^{-1}) \tag{5.6} \]
where $A(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ is the block-diagonal matrix obtained by replacing $C(\theta_i)$ by $\theta_i J$ in (5.2), $1 \le i \le \lfloor p/2 \rfloor$, and, in the odd-$p$ case, replacing the 1 in the lower right-hand corner by 0. Since the normal form is unique up to block-ordering, it follows that
\[ d_{SO}(R, I)^2 = \sum_{i=1}^{\lfloor p/2 \rfloor} \theta_i^2 = \sum_{i=1}^{\lceil p/2 \rceil} \theta_i^2. \tag{5.7} \]
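As a small numerical illustration of (5.7), the following sketch computes the distance to the identity for a rotation in SO(3). It assumes the standard fact that a 3 × 3 rotation matrix with normal-form angle θ has trace 1 + 2 cos θ; the function names and the test rotation are chosen here for illustration only and are not part of the paper.

```python
import math

def dist_to_identity_sq(thetas):
    """Squared distance d_SO(R, I)^2 as the sum of squared normal-form angles, cf. (5.7)."""
    for t in thetas:
        if not 0.0 <= t <= math.pi:
            raise ValueError("normal-form angles must lie in [0, pi]")
    return sum(t * t for t in thetas)

def rotation_angle_so3(R):
    """Normal-form angle of a 3x3 rotation matrix, using trace(R) = 1 + 2 cos(theta)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding error
    return math.acos(c)

# Illustrative rotation: angle 2*pi/3 about the third coordinate axis.
t = 2.0 * math.pi / 3.0
R = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0,          0.0,         1.0]]

theta = rotation_angle_so3(R)
# For odd p the convention theta_{k+1} = 0 applies, so the angle list is (theta, 0).
d_sq = dist_to_identity_sq([theta, 0.0])
```

Here `d_sq` agrees with the square of the rotation angle, as (5.7) predicts for p = 3.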
Furthermore, from (5.5) and (5.3) it follows that
\[ R_{sym} := \frac{R + R^T}{2} = Q \begin{pmatrix} \cos\theta_1\, I_{2\times 2} & & & \\ & \cos\theta_2\, I_{2\times 2} & & \\ & & \ddots & \\ & & & \cos\theta_k\, I_{2\times 2} \end{pmatrix} Q^{-1} \tag{5.8} \]
if $p$ is even; for odd $p$ we again just append one more row and column of the middle matrix, with a 1 in the lower right-hand corner. Hence the values $\cos\theta_i$ (and therefore the values $\theta_i \in [0, \pi]$) can be recovered from $R$ as the eigenvalues of $R_{sym}$, with the multiplicity of an eigenvalue $\lambda$ of $R_{sym}$ equal to twice the multiplicity $m_\lambda$ of $\lambda$ in the list $\cos\theta_1, \dots, \cos\theta_k$ in the even-$p$ case; for odd $p$ the only difference is that the multiplicity of the eigenvalue 1 of $R_{sym}$ is $2m_1 + 1$.

Remark 5.3 (Normal form, involutions, and distances to identity). Writing $R \in SO(p)$ in the form (5.5), it is easily seen that $R$ is an involution if and only if (i) for each $i$, $\theta_i$ is either 0 or $\pi$, and (ii) $\theta_i = \pi$ for at least one $i$. For such $R$, if $\theta_i = \pi$ for exactly $m$ values of $i$, then $\|A(\theta_1, \dots, \theta_{\lceil p/2 \rceil})\|^2 = m\pi^2$. Hence if $R \in SO(p)$ is an involution of level $m$, then
\[ d_{SO}(R, I)^2 = \frac{m}{2}\pi^2. \tag{5.9} \]
Thus
\[ \{d_{SO}(R, I) : R \in SO(p),\ R \text{ an involution}\} = \left\{ \sqrt{m}\,\pi : 1 \le m \le \left\lfloor \tfrac{p}{2} \right\rfloor \right\}. \tag{5.10} \]
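The distance values in (5.9) and (5.10) are easy to tabulate. The sketch below does so for a given p; the helper names are hypothetical and serve only to illustrate how the level of an involution (always even) determines its distance to the identity.

```python
import math

def involution_distance(level):
    """d_SO(R, I) for an involution R of the given (even) level, following (5.9)."""
    if level < 2 or level % 2 != 0:
        raise ValueError("the level of an involution in SO(p) is even and at least 2")
    return math.sqrt(level / 2.0) * math.pi

def involution_distances(p):
    """All values d_SO(R, I) attained by involutions R in SO(p), following (5.10)."""
    return [math.sqrt(m) * math.pi for m in range(1, p // 2 + 1)]

# Illustrative case p = 5: the possible levels are 2 and 4.
dists_p5 = involution_distances(5)
```

For p = 2 and p = 3 the list has the single entry π, which is the observation used later in Remark 5.12.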
Using (5.6) it can also be shown that for every non-involution $R \in SO(p)$, there is a unique $A \in \mathfrak{so}(p)$ of smallest norm such that $\exp(A) = R$.
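Notation 5.4.
1. Given $R \in SO(p)$ and angles $\theta_1, \dots, \theta_{\lceil p/2 \rceil} \in [0, \pi]$ for which $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ is a normal form of $R$, we define "redundant normal-form angles" $\tilde\theta_i \in [0, \pi]$, $1 \le i \le p$, by
\[ \tilde\theta_{2i-1} = \tilde\theta_{2i} = \theta_i, \quad 1 \le i \le k = \left\lfloor \tfrac{p}{2} \right\rfloor; \qquad \tilde\theta_p = 0 \ \text{ if } p = 2k + 1. \tag{5.11} \]
2. For any square matrix $A$ we write $E_\lambda(A)$ for the $\lambda$-eigenspace of $A$.

Note that (5.7) can now be written as
\[ d_{SO}(R, I)^2 = \frac{1}{2} \sum_{i=1}^{p} \tilde\theta_i^2. \tag{5.12} \]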
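The passage from (5.7) to (5.12) is simple bookkeeping, which the following sketch makes explicit. The function name and the sample angles are illustrative assumptions, not taken from the paper.

```python
def redundant_angles(thetas, p):
    """Redundant normal-form angles of (5.11): each theta_i is listed twice,
    and a final 0 is appended when p is odd."""
    k = p // 2
    if len(thetas) < k:
        raise ValueError("need at least floor(p/2) normal-form angles")
    tilde = []
    for t in thetas[:k]:
        tilde.extend([t, t])
    if p % 2 == 1:
        tilde.append(0.0)
    return tilde

# Hypothetical normal-form angles for p = 5 (theta_3 = 0 by the odd-p convention).
thetas = [0.3, 1.2, 0.0]
tilde = redundant_angles(thetas, 5)

d_sq_from_57 = sum(t * t for t in thetas)         # right-hand side of (5.7)
d_sq_from_512 = 0.5 * sum(t * t for t in tilde)   # right-hand side of (5.12)
```

The two quantities coincide because each angle contributes twice to the sum in (5.12) and the factor 1/2 compensates.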
5.2. Sign-change reducibility, distances in Grassmannians, and a half-angle relation

In this section we state and discuss several results, but defer their proofs to later sections. For $p \le 4$ one can show, without appealing to Proposition 5.6 below, that every involution in $SO(p)$ is sign-change reducible. (This sign-change reducibility holds for trivial reasons when $p = 2$; holds for slightly less trivial reasons, mentioned later in Remark 5.12, for $p = 3$; and can be shown to hold for $p = 4$ using a quaternionic approach.) It is reasonable to wonder whether this holds for all $p$:

Question 5.5. Let $p \ge 2$. Is every involution in $SO(p)$ sign-change reducible?

Our motivation for this question is not just generalization for its own sake, however. Potential Type II non-uniqueness complicates several aspects of the analysis of scaling-rotation distance and the associated metric $\rho_{SR}$ studied in [9]. To understand whether the "Type II non-uniqueness" defined in Section 4.3 can occur, we need to know whether a geodesically antipodal pair in $M(p)$ can be minimal. (As discussed in Section 4.3, a geodesically non-antipodal minimal pair in $M(p)$ uniquely determines an MSSR curve in $\mathrm{Sym}^+(p)$.) A sufficient condition for any pair $((U, D), (V, \Lambda))$ in $M(p) \times M(p)$ to be non-minimal is that the pair $(U, V) \in SO(p)$ be sign-change reducible. Since sign-change reducibility of involutions rules out the possibility of Type II non-uniqueness, and all involutions are sign-change reducible for $p \le 4$, it is natural to ask Question 5.5 and wish for the answer to be yes.

The answer, however, is more complicated. We shall see that the answer to Question 5.5 is yes for $p \le 4$ and no for $p \ge 11$ (we do not know the answer for $5 \le p \le 10$), but that for all $p$, involutions of high enough level are sign-change reducible; moreover, they are reducible by a sign-change of the same level:
Proposition 5.6. Let $R \in SO(p)$ be an involution for which $\mathrm{level}(R) \ge \tfrac{1}{2}p$. Then there exists $\sigma \in \mathcal{I}_p^+$, with $\mathrm{level}(\sigma) = \mathrm{level}(R)$, such that $d_{SO}(R I_\sigma, I) < d_{SO}(R, I)$.

We defer the proof to Section 7. Since $\mathrm{level}(R) = \dim(E_{-1}(R)) \ge 2$ for every involution $R$, Proposition 5.6 (once proved) immediately establishes Proposition 4.14(a) and Corollary 4.15: for $p \le 4$, all involutions are sign-change reducible, and hence all minimal pairs in $M(p) \times M(p)$ are geodesically non-antipodal.

We shall see below (Proposition 5.11) that sign-change reducibility by a sign-change of the same level is equivalent to a statement purely about the geometry of Grassmannians.

For reasons given shortly, it seems likely to the authors that the "same level" condition appearing in Proposition 5.6 is optimal (even without the "$\mathrm{level}(R) \ge \tfrac{1}{2}p$" restriction) in the sense that $\min_{\sigma \in \mathcal{I}_p^+}\{d_{SO}(R I_\sigma, I)\}$ is achieved by a sign-change matrix $\sigma \in \mathcal{I}_p^+$ for which $\mathrm{level}(\sigma) = \mathrm{level}(R)$. If this is true, then the analysis of whether an involution $R$ is sign-change reducible simplifies; we need only consider $\sigma \in \mathcal{I}_p^+$ of the same level as $R$. This (potential) simplification is actually of greater value to us than knowing, for a given $R \in \mathrm{Inv}(p)$, whether all minimizers of $d_{SO}(R I_\sigma, I)$ have the same level as $R$, so we state only the following weaker conjecture:

Conjecture 5.7. Let $m \ge 2$ be even, and let $R \in SO(p)$ be an involution of level $m$. If $R$ is sign-change reducible, then it is reducible by a sign-change of level $m$.

In Section 7 we will prove the following special case of this conjecture:

Proposition 5.8. Conjecture 5.7 is true for $m = 2$.

The reason we expect more generally that $\min_{\sigma \in \mathcal{I}_p^+}\{d_{SO}(R I_\sigma, I)\}$ is achieved by a $\sigma$ for which $\mathrm{level}(\sigma) = \mathrm{level}(R)$ is as follows. Every sign-change matrix $I_{\sigma_1} \in \mathcal{I}_p^+$ is itself an involution, and satisfies
\[ d_{SO}(I_{\sigma_1} I_{\sigma_1}, I) = 0 < \min\{d_{SO}(I_{\sigma_1} I_\sigma, I) : \sigma \in \mathcal{I}_p^+,\ \sigma \ne \sigma_1\}. \]
Thus for $R \in SO(p)$ sufficiently close to $I_{\sigma_1}$, we have
\[ d_{SO}(R I_{\sigma_1}, I) < \min\{d_{SO}(R I_\sigma, I) : \sigma \in \mathcal{I}_p^+,\ \sigma \ne \sigma_1\}. \]
The function carrying an involution $R \in SO(p)$ to $\mathrm{level}(R)$ is continuous, so for $R \in \mathrm{Inv}(p)$ sufficiently close to $I_{\sigma_1}$ we also have $\mathrm{level}(R) = \mathrm{level}(\sigma_1)$. Hence for every $R \in \mathrm{Inv}(p)$ sufficiently close to a sign-change matrix, $\min_{\sigma \in \mathcal{I}_p^+}\{d_{SO}(R I_\sigma, I)\}$ is achieved by a sign-change matrix having the same level as $R$. It seems plausible that this remains true even without the "sufficiently close to a sign-change matrix" restriction.

As noted in Remark 5.2, for even $m \ge 2$ the space $\mathrm{Inv}_m(p)$ is diffeomorphic to the Grassmannian $\mathrm{Gr}_m(\mathbb{R}^p)$. This Grassmannian carries a Riemannian metric
induced by Riemannian submersion from $(SO(p), g_{SO})$. It is known that the associated squared geodesic-distance between two points $W, Z \in \mathrm{Gr}_m(\mathbb{R}^p)$ is, up to a constant factor, simply the sum of squares of the principal angles between the two $m$-planes $W, Z$.² Choosing the normalization in which the squared geodesic distance $d_{Gr}(W, Z)^2$ equals the sum of squares of the principal angles (equation (6.1) below), we will prove the following in Section 6:

2 This fact follows from Wong's results on geodesics in [15], and has been cited elsewhere in the literature (e.g. [5, p. 337]), though the explicit statement does not appear in [15].

Proposition 5.9. The map $\Phi = \Phi_{m,p} : (\mathrm{Gr}_m(\mathbb{R}^p), d_{Gr}) \to (\mathrm{Inv}_m(p), d_{SO})$ (see (5.1)) is an isometry, up to a constant factor of 2:
\[ d_{SO}(\Phi(W), \Phi(V)) = 2\,d_{Gr}(W, V) \tag{5.13} \]
for all $W, V \in \mathrm{Gr}_m(\mathbb{R}^p)$.

We derive Proposition 5.9 from a general half-angle relation proven in Section 6:

Proposition 5.10. Let $R_1, R_2$ be involutions in $SO(p)$. For $i = 1, 2$ let $m_i = \dim(E_{-1}(R_i))$, and let $m = \min\{m_1, m_2\}$. Let $\{\theta_i \in [0, \pi]\}_{i=1}^{\lceil p/2 \rceil}$ be angles for which $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ is a normal form of the product $R_1 R_2$, and let $\{\tilde\theta_i\}_{i=1}^{p}$ be as defined in (5.11). Then for some injective map $\iota : \{1, 2, \dots, m\} \to \{1, 2, \dots, p\}$, the principal angles between $E_{-1}(R_1)$ and $E_{-1}(R_2)$ satisfy
\[ \phi_j(E_{-1}(R_1), E_{-1}(R_2)) = \frac{\tilde\theta_{\iota(j)}}{2}, \quad 1 \le j \le m. \tag{5.14} \]
For every $i \notin \mathrm{range}(\iota)$, the angle $\tilde\theta_i$ is either 0 or $\pi$.

In other words, as stated in the introduction: for any two involutions $R_1, R_2 \in SO(p)$, each of the principal angles between $E_{-1}(R_1)$ and $E_{-1}(R_2)$ is exactly half a correspondingly indexed normal-form angle of $R_1 R_2$.

Proposition 5.9 can also be proven by purely Riemannian methods, but the proof we give, via Proposition 5.10, is independent in the sense that it does not make any use of a Riemannian metric on $\mathrm{Gr}_m(\mathbb{R}^p)$; see Remark 6.5. In Section 6, after proving Proposition 5.9 we will use it to deduce the following:

Proposition 5.11. Let $m, p$ be integers with $m$ even and $0 < m \le p$. Then the following two statements are equivalent:
1. Every involution $R \in SO(p)$ of level $m$ is sign-change reducible by a sign-change of level $m$.
2. For every $W \in \mathrm{Gr}_m(\mathbb{R}^p)$, there exists a coordinate $m$-plane $\mathbb{R}^J$ (see Notation 6.1) such that $d_{Gr}(W, \mathbb{R}^J)^2 < \frac{m\pi^2}{8}$.

[…] $m > \frac{p}{2}$ than for $m < \frac{p}{2}$.

Remark 5.12. It is relatively easy to show that for any involution $R$, there exists $\sigma$ for which $R I_\sigma$ is not an involution. For $p = 2, 3$, we have $d_{SO}(I, R) \le \pi$ for every $R \in SO(p)$, and $d_{SO}(I, R) = \pi$ for every involution $R$, so any non-involution is closer to the identity than is any involution. Hence for these values of $p$, Proposition 5.11 is easy to prove. However, for $p \ge 4$, given an involution $R$ and a $\sigma \in \mathcal{I}_p^+$ for which $R I_\sigma$ is not an involution, (5.10) shows that we cannot immediately deduce that $d_{SO}(I, R I_\sigma) < d_{SO}(I, R)$.

6. Proofs of the half-angle relation and results related to Grassmannians: Propositions 5.9, 5.10, and 5.11

The half-angle relation in Proposition 5.10 underlies our proofs of most of the other results stated in Section 5.2 (all but Proposition 5.8). When the dimensions of the eigenspaces in Proposition 5.10 are equal, the half-angle relation leads to the elegant distance-relation (5.13). This equidimensional case is actually the only one we need for the application to Type II nonuniqueness of MSSR curves. However, the half-angle relation (5.14) holds whether or not $\dim(E_{-1}(R_1)) = \dim(E_{-1}(R_2))$. Since this fact may be of interest outside
the scope of this paper, and is not much harder to prove without the equal-dimensions restriction, we have stated (and will prove) the more general relation. Section 6.1 is devoted to establishing Proposition 5.10. In Section 6.2, we apply this proposition to establish Propositions 5.9 and 5.11.

6.1. The half-angle relation

We start with some notation.

Notation 6.1.
1. For $1 \le i \le p$ let $e_i$ denote the $i$th standard basis vector of $\mathbb{R}^p$.
2. For $0 \le m \le p$, let $\mathcal{J}_{m,p}$ denote the collection of $m$-element subsets of $\{1, \dots, p\}$.
(a) For $0 \le m \le p$ and $J \in \mathcal{J}_{m,p}$, define $\sigma_J = (\sigma_1, \dots, \sigma_p) \in \mathcal{I}_p$ by $\sigma_i = -1$ for $i \in J$ and $\sigma_i = 1$ for $i \notin J$. Similarly, for $\sigma = (\sigma_1, \dots, \sigma_p) \in \mathcal{I}_p$, define $J^\sigma = \{i \in \{1, \dots, p\} : \sigma_i = -1\}$. (The maps $J \mapsto \sigma_J$ and $\sigma \mapsto J^\sigma$ are inverse to each other.)
(b) If $1 \le m \le p$ and $J = \{i_1, \dots, i_m\} \in \mathcal{J}_{m,p}$, with $i_1 < i_2 < \dots < i_m$, let $E_J$ denote the $p \times m$ matrix whose $k$th column is $e_{i_k}$, $1 \le k \le m$.
(c) For $0 \le m \le p$ and $J \in \mathcal{J}_{m,p}$, define $\mathbb{R}^J = \{(x_1, x_2, \dots, x_p) \in \mathbb{R}^p : x_i = 0 \text{ if } i \notin J\}$. The collection $\{\mathbb{R}^J : J \in \mathcal{J}_{m,p}\}$ is the set of "coordinate $m$-planes" in $\mathbb{R}^p$.
3. For any $J \subset \{1, \dots, p\}$, let $J'$ denote the complement of $J$ in $\{1, \dots, p\}$.
4. For $m_1, m_2 \in \{1, 2, \dots, p\}$, $W \in \mathrm{Gr}_{m_1}(\mathbb{R}^p)$, $Z \in \mathrm{Gr}_{m_2}(\mathbb{R}^p)$, and $J \in \mathcal{J}_{m_1,p}$, writing $m = \min\{m_1, m_2\}$,
(a) let $\phi_1(W, Z), \dots, \phi_m(W, Z)$ denote the principal angles between the $m_1$-plane $W$ and the $m_2$-plane $Z$ (see [7, Section 12.4.3]), and
(b) let $\phi_{J,i}(W) = \phi_i(W, \mathbb{R}^J)$, $1 \le i \le m_1$.
5. For $1 \le m \le p$ define $d_{Gr} : \mathrm{Gr}_m(\mathbb{R}^p) \times \mathrm{Gr}_m(\mathbb{R}^p) \to \mathbb{R}$ by
\[ d_{Gr}(W, Z) = \left( \sum_{i=1}^{m} \phi_i(W, Z)^2 \right)^{1/2}. \tag{6.1} \]
As noted earlier, $d_{Gr}$ is the distance-function defined by the standard $SO(p)$-invariant Riemannian metric on $\mathrm{Gr}_m(\mathbb{R}^p)$ (up to a constant factor).
The following long but far-reaching technical lemma, giving several detailed relations between a general involution in $SO(p)$ and its product with a sign-change matrix, is our key tool for establishing the results stated in Section 5.2. It is best thought of as a series of lemmas, all with the same hypotheses, that have been rolled into one long lemma in order to avoid restating hypotheses and notational definitions. After proving the lemma, we build on it with two corollaries, completing the groundwork for the proofs (in later sections) of the Section 5.2 propositions.

Lemma 6.2. Let $R \in SO(p)$ be an involution, let $\sigma \in \mathcal{I}_p^+$, assume $0 < m_\sigma := \mathrm{level}(\sigma) < p$, and let $J = J^\sigma$ (see Notation 6.1). Viewing $\mathbb{R}^p$ as $\mathbb{R}^{J'} \oplus \mathbb{R}^J$, below we write every $p \times p$ matrix in the block form $\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}$, where $A_1$ is $(p - m_\sigma) \times (p - m_\sigma)$, $A_2$ is $(p - m_\sigma) \times m_\sigma$, $A_3$ is $m_\sigma \times (p - m_\sigma)$, and $A_4$ is $m_\sigma \times m_\sigma$. Then:

(i) In this block form,
\[ R = \begin{pmatrix} R_1 & R_2 \\ R_2^T & R_4 \end{pmatrix}, \tag{6.2} \]
where $R_1$ is a symmetric $(p - m_\sigma) \times (p - m_\sigma)$ matrix, $R_4$ is a symmetric $m_\sigma \times m_\sigma$ matrix, and $R_2$ is $(p - m_\sigma) \times m_\sigma$.

(ii) In the same block form,
\[ (R I_\sigma)_{sym} = \frac{1}{2}(R I_\sigma + I_\sigma R) = \begin{pmatrix} R_1 & 0 \\ 0 & -R_4 \end{pmatrix}. \tag{6.3} \]

(iii) All eigenvalues of $R_1$ and $R_4$ lie in the interval $[-1, 1]$.

(iv) For every $\lambda \in (-1, 1)$, if $\lambda$ is an eigenvalue of $R_1$ (respectively, $R_4$), then $-\lambda$ is an eigenvalue of $R_4$ (resp. $R_1$) with the same multiplicity.

(v) Let $l$ denote the number of eigenvalues of $R_1$, counted with multiplicity, lying in the interval $(-1, 1)$. Then $l$ is also the number of eigenvalues of $R_4$, counted with multiplicity, lying in $(-1, 1)$, and $l \le \min\{m_\sigma, p - m_\sigma\}$.

(vi) The inclusion map $\mathbb{R}^{J'} \to \mathbb{R}^p$ defined by $v \mapsto \begin{pmatrix} v \\ 0 \end{pmatrix}$ restricts to isomorphisms $E_{\pm 1}(R_1) \to E_{\pm 1}(R) \cap \mathbb{R}^{J'}$. Similarly the inclusion map $\mathbb{R}^J \to \mathbb{R}^p$ defined by $w \mapsto \begin{pmatrix} 0 \\ w \end{pmatrix}$ restricts to isomorphisms $E_{\pm 1}(R_4) \to E_{\pm 1}(R) \cap \mathbb{R}^J$.

(vii) Let $l_- = \dim(E_1(R) \cap \mathbb{R}^J)$, $l_+ = \dim(E_{-1}(R) \cap \mathbb{R}^{J'})$.³ Then $\dim(E_1(R_4)) = l_-$ and $\dim(E_{-1}(R_1)) = l_+$. (Thus $l_- + l_+$ is the multiplicity of $-1$ as an eigenvalue of $(R I_\sigma)_{sym}$ in (6.3), hence of $R I_\sigma$ itself, and therefore yields a lower bound on $d_{SO}(R I_\sigma, I)$.) Furthermore,
\[ l_- \ge \mathrm{level}(\sigma) - \mathrm{level}(R) \quad \text{and} \quad l_+ \ge \mathrm{level}(R) - \mathrm{level}(\sigma), \tag{6.4} \]
and
\[ l_- - l_+ = \mathrm{level}(\sigma) - \mathrm{level}(R). \tag{6.5} \]

3 The $\pm$ subscripts are chosen according to the eigenspaces of $I_\sigma$ rather than $R$: $\mathbb{R}^J = E_{-1}(I_\sigma)$, $\mathbb{R}^{J'} = E_1(I_\sigma)$.
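(viii) There exist an orthonormal $R_1$-eigenbasis $\{v_i\}_{i=1}^{p - m_\sigma}$ of $\mathbb{R}^{p - m_\sigma}$ (i.e. an orthonormal basis of $\mathbb{R}^{p - m_\sigma}$ consisting of eigenvectors of $R_1$) and an $R_4$-eigenbasis $\{w_i\}_{i=1}^{m_\sigma}$ of $\mathbb{R}^{m_\sigma}$. For any such bases $\{v_i\}$ of $\mathbb{R}^{p - m_\sigma}$, $\{w_i\}$ of $\mathbb{R}^{m_\sigma}$, let $\{\lambda_i'\}$, $\{\lambda_i\}$ be the corresponding eigenvalues (i.e. $R_1 v_i = \lambda_i' v_i$ and $R_4 w_i = \lambda_i w_i$), and define
\[ \bar{v}_i = \begin{cases} \begin{pmatrix} v_i \\ \frac{1}{1 + \lambda_i'} R_2^T v_i \end{pmatrix}, & 1 \le i \le p - m_\sigma,\ \lambda_i' \ne -1, \\[2ex] \begin{pmatrix} v_i \\ 0 \end{pmatrix}, & 1 \le i \le p - m_\sigma,\ \lambda_i' = -1, \end{cases} \tag{6.6} \]
\[ \bar{w}_i = \begin{cases} \begin{pmatrix} \frac{-1}{1 - \lambda_i} R_2 w_i \\ w_i \end{pmatrix}, & 1 \le i \le m_\sigma,\ \lambda_i \ne 1, \\[2ex] \begin{pmatrix} 0 \\ w_i \end{pmatrix}, & 1 \le i \le m_\sigma,\ \lambda_i = 1. \end{cases} \tag{6.7} \]
Then
\[ \left\{ \sqrt{\tfrac{1 + \lambda_i'}{2}}\, \bar{v}_i : 1 \le i \le p - m_\sigma,\ \lambda_i' \ne -1 \right\} \cup \left\{ \bar{w}_i : 1 \le i \le m_\sigma,\ \lambda_i = 1 \right\} \tag{6.8} \]
(ordered arbitrarily) is an orthonormal basis of $E_1(R)$, and the set
\[ \left\{ \sqrt{\tfrac{1 - \lambda_i}{2}}\, \bar{w}_i : 1 \le i \le m_\sigma,\ \lambda_i \ne 1 \right\} \cup \left\{ \bar{v}_i : 1 \le i \le p - m_\sigma,\ \lambda_i' = -1 \right\} \tag{6.9} \]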
(ordered arbitrarily) is an orthonormal basis of $E_{-1}(R)$. Note that the cardinality of the second set in (6.8) (respectively (6.9)) is $l_-$ (resp. $l_+$).

Proof: To simplify notation in this proof, we let $m = m_\sigma$. Since $R \in SO(p)$ is an involution, $R = R^{-1} = R^T$. Hence $R$ is symmetric, implying assertion (i), and $\mathbb{R}^p$ is the orthogonal direct sum of $E_1(R)$ and $E_{-1}(R)$ (since the only possible eigenvalues of an involution are $\pm 1$).

For (ii), observe that in the block-form decomposition we are using,
\[ I_\sigma = \begin{pmatrix} I_{(p-m)\times(p-m)} & 0 \\ 0 & -I_{m\times m} \end{pmatrix}. \]
A simple calculation then yields (6.3).

Next, because $R^2 = I$, we have the following relations:
\begin{align}
R_1^2 + R_2 R_2^T &= I_{(p-m)\times(p-m)}, \tag{6.10} \\
R_1 R_2 + R_2 R_4 &= 0_{(p-m)\times m}, \tag{6.11} \\
R_2^T R_1 + R_4 R_2^T &= 0_{m\times(p-m)}, \tag{6.12} \\
R_4^2 + R_2^T R_2 &= I_{m\times m}. \tag{6.13}
\end{align}
From (6.10) and (6.13), for any $v \in \mathbb{R}^{p-m}$, $w \in \mathbb{R}^m$, we have
\begin{align}
\|R_1 v\|^2 + \|R_2^T v\|^2 &= \|v\|^2, \tag{6.14} \\
\|R_4 w\|^2 + \|R_2 w\|^2 &= \|w\|^2. \tag{6.15}
\end{align}
It follows from (6.14)–(6.15) that if $\lambda$ is an eigenvalue of $R_1$ or $R_4$, then $|\lambda| \le 1$, yielding (iii).

To obtain (iv), consider the operators $L : \mathbb{R}^m \to \mathbb{R}^{p-m}$ and $L^* : \mathbb{R}^{p-m} \to \mathbb{R}^m$ defined by $L(w) = R_2 w$ and $L^*(v) = R_2^T v$. Suppose that $R_1$ has an eigenvalue $\lambda$ with $|\lambda| < 1$, and let $0 \ne v \in E_\lambda(R_1)$. Let $w = R_2^T v$; note that (6.14) implies $w \ne 0$. Using (6.12),
\[ R_4 w = R_4 R_2^T v = -R_2^T R_1 v = -R_2^T \lambda v = -\lambda w. \]
Hence $L^*$ maps $E_\lambda(R_1)$ injectively to $E_{-\lambda}(R_4)$. Similarly, if $R_4$ has an eigenvalue $-\lambda$ with $|\lambda| < 1$, then $L$ maps $E_{-\lambda}(R_4)$ injectively to $E_\lambda(R_1)$. It follows that, for any $\lambda \in \mathbb{R}$ with $|\lambda| < 1$, $\lambda$ is an eigenvalue of $R_1$ if and only if $-\lambda$ is an eigenvalue of $R_4$, and that the maps
\begin{align}
L^*|_{E_\lambda(R_1)} &: E_\lambda(R_1) \to E_{-\lambda}(R_4) = E_\lambda(-R_4), \tag{6.16} \\
L|_{E_\lambda(-R_4)} &: E_\lambda(-R_4) = E_{-\lambda}(R_4) \to E_\lambda(R_1) \tag{6.17}
\end{align}
are isomorphisms. This establishes (iv). Statement (v) is an immediate corollary of (iv).

For (vi), let $\iota : \mathbb{R}^{J'} \to \mathbb{R}^p$ be the first inclusion map in the lemma. Note that $R \begin{pmatrix} v \\ 0 \end{pmatrix} = \begin{pmatrix} R_1 v \\ R_2^T v \end{pmatrix}$. If $v \in E_\lambda(R_1)$ with $\lambda = \pm 1$, equation (6.14) implies that $R_2^T v = 0$, hence that $R\iota(v) = \lambda\iota(v)$. Conversely, if $R\iota(v) = \lambda\iota(v)$, then $R_1 v = \lambda v$ (and $R_2^T v = 0$). Hence $\iota$ carries $E_\lambda(R_1)$ isomorphically to $E_\lambda(R) \cap \mathbb{R}^{J'}$. The argument for the inclusion map $\mathbb{R}^J \to \mathbb{R}^p$ is essentially identical. This establishes (vi).

Part (vi) implies that $\dim(E_1(R_4)) = \dim(E_1(R) \cap \mathbb{R}^J) = l_-$ and that $\dim(E_{-1}(R_1)) = \dim(E_{-1}(R) \cap \mathbb{R}^{J'}) = l_+$, the first assertion in (vii). To obtain (6.4)–(6.5), note that for any subspaces $V, W$ of $\mathbb{R}^p$, we have
\[ \dim(V^\perp \cap W) - \dim(V \cap W^\perp) = \dim(W) - \dim(V). \tag{6.18} \]
(The proof of (6.18) is straightforward linear algebra.) Applying this to the case $V = E_{-1}(R)$, $V^\perp = E_1(R)$, $W = E_{-1}(I_\sigma) = \mathbb{R}^J$, $W^\perp = E_1(I_\sigma) = \mathbb{R}^{J'}$, we have $l_- = \dim(V^\perp \cap W)$ and $l_+ = \dim(V \cap W^\perp)$, so (6.5) follows from (6.18). The inequalities in (6.4) follow directly from (6.5).

(viii) Since $R_1$ (respectively $R_4$) is symmetric, an orthonormal $R_1$-eigenbasis $\{v_i\}$ of $\mathbb{R}^{p-m}$ (resp., orthonormal $R_4$-eigenbasis $\{w_i\}$ of $\mathbb{R}^m$) exists. Select such eigenbases, and let $\{\lambda_i\}$, $\{\lambda_i'\}$ be eigenvalues as defined in the Lemma. Note that the second set in (6.8) is a basis of $E_1(R_4)$, which by (vi) is isomorphic to $E_1(R) \cap \mathbb{R}^J$. Hence the cardinality of this set is $\dim(E_1(R) \cap \mathbb{R}^J)$, i.e. $l_-$. Similarly, the second set in (6.9) is a basis of $E_{-1}(R_1)$ and has cardinality $l_+$.

Without loss of generality, we may assume that the eigenvectors $v_i$ with eigenvalue $-1$, if any, are the last $l_+$, and that the eigenvectors $w_i$ with eigenvalue 1, if any, are the last $l_-$. Using (6.13), for $1 \le i \le m$ we have $R_2^T R_2 w_i = (1 - \lambda_i^2) w_i$, while using (6.11) we find $R_1 R_2 w_i = -\lambda_i R_2 w_i$. Then, using (6.2), a simple calculation shows that $R\bar{w}_i = -\bar{w}_i$. Hence $\bar{w}_i \in E_{-1}(R)$ for $1 \le i \le m - l_-$, while from part (vi), $\bar{v}_i \in E_{-1}(R)$ for $p - m - l_+ < i \le p - m$.

Let $\langle\cdot\,,\cdot\rangle$ denote the standard inner product on $\mathbb{R}^n$ for any $n$. As seen in the proof of part (vi), $v \in E_{-1}(R_1)$ implies $R_2^T v = 0$. Hence for $p - m - l_+ < i \le p - m$ and $1 \le j \le m - l_-$,
\[ \langle \bar{v}_i, \bar{w}_j \rangle \propto \langle v_i, R_2 w_j \rangle = \langle R_2^T v_i, w_j \rangle = 0, \]
while for $p - m - l_+ < i, j \le p - m$ we have $\langle \bar{v}_i, \bar{v}_j \rangle = \langle v_i, v_j \rangle = \delta_{ij}$. Finally, for $i, j \le m - l_-$, using the fact that $\langle R_2 w_i, R_2 w_j \rangle = \langle w_i, R_2^T R_2 w_j \rangle = \langle w_i, (1 - \lambda_j^2) w_j \rangle$, a simple computation yields $\langle \bar{w}_i, \bar{w}_j \rangle = \frac{2}{1 - \lambda_i}\delta_{ij}$. Thus
\[ \left\{ \sqrt{\tfrac{1 - \lambda_i}{2}}\, \bar{w}_i : 1 \le i \le m - l_- \right\} \cup \left\{ \bar{v}_i : p - m - l_+ < i \le p - m \right\} \]
is an orthonormal subset of $E_{-1}(R)$. Using (6.5), the cardinality of this subset is $m - l_- + l_+ = \mathrm{level}(\sigma) - (\mathrm{level}(\sigma) - \mathrm{level}(R)) = \mathrm{level}(R) = \dim(E_{-1}(R))$. Hence (6.9) is an orthonormal basis of $E_{-1}(R)$.
The proof that (6.8) is an orthonormal basis of E1 (R) is similar.
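To make the block relations in Lemma 6.2 concrete, the following sketch checks (6.10), (6.11) and (6.13), together with the eigenvalue pairing of part (iv), on a single example in SO(3). The example is an assumption made here for illustration: R is the rotation by π about the unit vector n = (1/3, 2/3, 2/3), an involution of level 2, and σ = (1, −1, −1), so that J = {2, 3} and the blocks have sizes 1 × 1, 1 × 2 and 2 × 2.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Illustrative involution of level 2: R = 2 n n^T - I, rotation by pi about n.
n = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
R = [[2.0 * n[i] * n[j] - (1.0 if i == j else 0.0) for j in range(3)]
     for i in range(3)]

# sigma = (1, -1, -1): J = {2, 3}, m_sigma = 2, blocks as in (6.2).
R1 = [[R[0][0]]]                 # (p - m) x (p - m) block
R2 = [R[0][1:]]                  # (p - m) x m block
R4 = [row[1:] for row in R[1:]]  # m x m block

rel_610 = add(matmul(R1, R1), matmul(R2, transpose(R2)))  # should be the 1x1 identity
rel_611 = add(matmul(R1, R2), matmul(R2, R4))             # should be the 1x2 zero matrix
rel_613 = add(matmul(R4, R4), matmul(transpose(R2), R2))  # should be the 2x2 identity

# Eigenvalues of the symmetric 2x2 block R4 from its trace and determinant.
tr = R4[0][0] + R4[1][1]
det = R4[0][0] * R4[1][1] - R4[0][1] * R4[1][0]
disc = (tr * tr - 4.0 * det) ** 0.5
eig_R4 = sorted([(tr - disc) / 2.0, (tr + disc) / 2.0])
eig_R1 = R1[0][0]
```

In this example R1 has the single eigenvalue −7/9, and R4 has eigenvalues −1 and 7/9, so the eigenvalue of R1 in (−1, 1) is paired with its negative in R4, as part (iv) asserts. Neither R1 has eigenvalue −1 nor R4 eigenvalue 1, consistent with (6.5) when level(σ) = level(R).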
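Corollary 6.3. Hypotheses and notation as in Lemma 6.2. Let $l_+ = \dim(E_{-1}(R_1))$ and $l_- = \dim(E_1(R_4))$ (as in Lemma 6.2(vii)). In addition let $\{\theta_i \in [0, \pi]\}_{i=1}^{\lceil p/2 \rceil}$ be angles for which $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ is a normal form of $R I_\sigma$, and let $\{\tilde\theta_i\}_{i=1}^{p}$ be as defined in (5.11). Let $J_* = \{j \in J : 0 < \tilde\theta_j < \pi\}$. Then $|J_*| \le \min\{m_\sigma, p - m_\sigma\}$, and
\[ d_{SO}(R I_\sigma, I)^2 = \frac{1}{2}(l_+ + l_-)\pi^2 + \sum_{j \in J_*} \tilde\theta_j^2. \tag{6.19} \]
If $\mathrm{level}(\sigma) = \mathrm{level}(R)$, then
\[ d_{SO}(R I_\sigma, I)^2 = l_-\pi^2 + \sum_{j \in J_*} \tilde\theta_j^2 = \sum_{j \in J} \tilde\theta_j^2. \tag{6.20} \]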
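Proof: Let $\beta' : J' \to \{1, \dots, p - m_\sigma\}$, $\beta : J \to \{1, \dots, m_\sigma\}$, be order-preserving bijections. By (5.8), the eigenvalues of $(R I_\sigma)_{sym}$, counted with multiplicity, are $\{\cos\tilde\theta_i\}_{i=1}^{p}$. But from Lemma 6.2(ii), we can read off the eigenvalues of $(R I_\sigma)_{sym}$ from (6.3); they are $\lambda_1', \dots, \lambda_{p - m_\sigma}', -\lambda_1, \dots, -\lambda_{m_\sigma}$ (ordered arbitrarily). Thus, reordering the $\lambda_j'$ and the $\lambda_j$ appropriately, for $1 \le j \le p$ we have
\[ \cos\tilde\theta_j = \begin{cases} \lambda'_{\beta'(j)} & \text{if } j \in J', \\ -\lambda_{\beta(j)} & \text{if } j \in J. \end{cases} \tag{6.21} \]
Recall that $J_* = \{j \in J : 0 < \tilde\theta_j < \pi\}$. Observe that $J_*$ can also be characterized as $\{j \in J : \lambda_{\beta(j)} \ne \pm 1\}$. Similarly, define $J_*' = \{j \in J' : \lambda'_{\beta'(j)} \ne \pm 1\} = \{j \in J' : 0 < \tilde\theta_j < \pi\}$. By part (v) of Lemma 6.2, $|J_*'| = |J_*| = l \le \min\{m_\sigma, p - m_\sigma\}$, and by part (iv) of the Lemma there is a bijection $b : J_* \to J_*'$ such that $-\lambda_j = \lambda'_{b(j)}$ for all $j \in J_*$. Hence
\[ \tilde\theta_j = \begin{cases} \cos^{-1}\lambda'_{\beta'(j)} & \text{if } j \in J_*', \\ \cos^{-1}\lambda'_{b(\beta(j))} & \text{if } j \in J_*, \\ 0 \text{ or } \pi & \text{otherwise.} \end{cases} \tag{6.22} \]
In particular,
\[ \sum_{j \in J_*'} \tilde\theta_j^2 = \sum_{j \in J_*} \tilde\theta_j^2. \tag{6.23} \]
Next, note that
\[ \sum_{j \in J' \setminus J_*'} \tilde\theta_j^2 = \sum_{\{j \in J' : \tilde\theta_j = \pi\}} \tilde\theta_j^2 = \#\{j \in J' : \lambda'_{\beta'(j)} = -1\}\,\pi^2 = \dim(E_{-1}(R_1))\,\pi^2 = l_+\pi^2, \tag{6.24} \]
and similarly $\sum_{j \in J \setminus J_*} \tilde\theta_j^2 = l_-\pi^2$. From (5.12) we therefore have
\begin{align*}
d_{SO}(R I_\sigma, I)^2 &= \frac{1}{2}\left( \sum_{j \in J_*'} \tilde\theta_j^2 + \sum_{j \in J' \setminus J_*'} \tilde\theta_j^2 + \sum_{j \in J_*} \tilde\theta_j^2 + \sum_{j \in J \setminus J_*} \tilde\theta_j^2 \right) \\
&= \frac{1}{2}\left( l_+\pi^2 + l_-\pi^2 + 2\sum_{j \in J_*} \tilde\theta_j^2 \right),
\end{align*}
establishing (6.19).

If $\mathrm{level}(\sigma) = \mathrm{level}(R)$, then equation (6.5) implies that $l_+ = l_-$, so (6.19) implies the first equality in (6.20). For the second equality, observe that $j \in J \setminus J_*$ if and only if $\tilde\theta_j$ is 0 or $\pi$. The number of $j$'s in $J$ for which $\tilde\theta_j = \pi$ is exactly $l_-$, while the $j$'s in $J$ for which $\tilde\theta_j = 0$ have no effect on $\sum_{j \in J} \tilde\theta_j^2$. Hence the second equality in (6.20) holds.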
Corollary 6.4. Hypotheses and notation as in Lemma 6.2, except that we additionally write $m_R := \mathrm{level}(R)$ and $m = \min\{m_\sigma, m_R\}$. Let $\{\theta_i \in [0, \pi]\}_{i=1}^{\lceil p/2 \rceil}$ be angles for which $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ is a normal form of $R I_\sigma$, let $\{\tilde\theta_i\}_{i=1}^{p}$ be as defined in (5.11), let the elements of $J$ be $i_1 < i_2 < \dots < i_{m_\sigma}$, and let $\phi_{J,j} = \phi_{J,j}(E_{-1}(R))$, $1 \le j \le m$. Then:

(i) Up to ordering,
\[ \phi_{J,j} = \frac{\tilde\theta_{i_j}}{2}, \quad 1 \le j \le m. \tag{6.25} \]

(ii) If $m_\sigma = m_R$ then
\[ d_{SO}(R I_\sigma, I) = 2\,d_{Gr}(E_{-1}(R), \mathbb{R}^J). \tag{6.26} \]
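Proof: (i). Let $\widetilde{W}$ be the $p \times m_R$ matrix formed by the columns of the basis (6.9) of $E_{-1}(R)$, with the elements of the first set in (6.9) comprising the first $m_\sigma - l_-$ columns, and the elements of the second set comprising the last $l_+$ columns. (Here $l_\pm$ are defined as in Lemma 6.2(vii).) Without loss of generality we order the $R_4$-eigenvectors $w_i$ such that the first $m_\sigma - l_-$ are the ones for which $\lambda_i \ne 1$.

Since the columns of $\widetilde{W}$ form an orthonormal basis of $E_{-1}(R)$, the numbers $\{\cos\phi_{J,i}\}_{i=1}^{m}$ are the singular values of the $m_R \times m_\sigma$ matrix $\widetilde{W}^T E_J$. (This is true whether $m_R \le m_\sigma$ or $m_R > m_\sigma$.) But, relative to the block-decomposition of matrices used in Lemma 6.2, the upper $(p - m_\sigma) \times m_\sigma$ block of $E_J$ is 0, and the lower $m_\sigma \times m_\sigma$ block is $I_{m_\sigma \times m_\sigma}$. Hence, writing $\widetilde{W}_*$ for the $m_\sigma \times (m_\sigma - l_-)$ matrix formed by the last $m_\sigma$ rows of the first $m_\sigma - l_-$ columns of $\widetilde{W}$, and noting that $m_\sigma - l_- = m_R - l_+$ (by (6.5)), we have
\[ \widetilde{W}^T E_J = \begin{pmatrix} \widetilde{W}_*^T \\ 0_{l_+ \times m_\sigma} \end{pmatrix}, \]
where the $i$th row of the $(m_\sigma - l_-) \times m_\sigma$ matrix $\widetilde{W}_*^T$ is a multiple of $w_i^T$. Hence for $i, j \le m_\sigma - l_- = m_R - l_+$,
\[ \left( (\widetilde{W}^T E_J)(\widetilde{W}^T E_J)^T \right)_{ij} = \sqrt{\frac{1 - \lambda_i}{2}} \sqrt{\frac{1 - \lambda_j}{2}}\, \langle w_i, w_j \rangle = \frac{1 - \lambda_j}{2}\,\delta_{ij}, \tag{6.27} \]
and all other entries of the $m_R \times m_R$ matrix $(\widetilde{W}^T E_J)(\widetilde{W}^T E_J)^T$ are 0. But for $m_\sigma - l_- < i \le m_\sigma$, we have $\lambda_i = 1$, so $\left( (\widetilde{W}^T E_J)(\widetilde{W}^T E_J)^T \right)_{ij} = \frac{1 - \lambda_j}{2}\,\delta_{ij}$ for all $i, j \le m = \min\{m_R, m_\sigma\}$. Thus the upper left-hand $m \times m$ block of $(\widetilde{W}^T E_J)(\widetilde{W}^T E_J)^T$ (the entire $m_R \times m_R$ matrix if $m_R \le m_\sigma$) is $\mathrm{diag}\left(\frac{1 - \lambda_1}{2}, \dots, \frac{1 - \lambda_m}{2}\right)$, so the numbers $\sqrt{\frac{1 - \lambda_j}{2}}$, $1 \le j \le m$, are the singular values of $\widetilde{W}^T E_J$. Thus, up to ordering, the principal angles $\{\phi_{J,i}\}$ are given by
\[ \cos\phi_{J,j} = \sqrt{\frac{1 - \lambda_j}{2}}, \quad 1 \le j \le m. \tag{6.28} \]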
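The bijection $\beta : J \to \{1, \dots, m_\sigma\}$ used in the proof of Corollary 6.3 is simply the inverse of the map $j \mapsto i_j$. Thus from (6.21), we have
\[ -\lambda_j = \cos\tilde\theta_{i_j}, \quad 1 \le j \le m_\sigma. \tag{6.29} \]
Combining (6.28) with (6.29),
\[ \cos\phi_{J,j} = \sqrt{\frac{1 + \cos\tilde\theta_{i_j}}{2}} = \cos\frac{\tilde\theta_{i_j}}{2}. \tag{6.30} \]
But $\tilde\theta_{i_j} \in [0, \pi]$, so both $\phi_{J,j}$ and $\frac{\tilde\theta_{i_j}}{2}$ lie in $[0, \frac{\pi}{2}]$. Hence (6.30) implies that $\phi_{J,j} = \tilde\theta_{i_j}/2$, $1 \le j \le m$.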
(ii) Assume $m_\sigma = m_R$; then both equal $m$. Corollary 6.3 then implies that
\[ d_{SO}(R I_\sigma, I)^2 = \sum_{j=1}^{m} \tilde\theta_{i_j}^2. \tag{6.31} \]
But from part (i) we have $\tilde\theta_{i_j} = 2\phi_{J,j}$ for $1 \le j \le m$, so, using (6.20), $d_{SO}(R I_\sigma, I)^2 = 4\,d_{Gr}(E_{-1}(R), \mathbb{R}^J)^2$, implying (6.26).

We are now ready to establish the general half-angle relation:

Proof of Proposition 5.10. Let $U \in O(p)$ and let $T_U : \mathbb{R}^p \to \mathbb{R}^p$ be the corresponding orthogonal transformation. For any even $m' > 0$ and any $R \in \mathrm{Inv}_{m'}(p)$, we have
\[ E_{-1}(U R U^{-1}) = T_U(E_{-1}(R)). \tag{6.32} \]
Now let $T : \mathbb{R}^p \to \mathbb{R}^p$ be an orthogonal transformation carrying $E_{-1}(R_2)$ to a coordinate plane $\mathbb{R}^J$, and let $U \in O(p)$ be the matrix for which $T = T_U$. Then $U R_2 U^{-1} = I_\sigma$, where $\sigma = \sigma_J$. For $i = 1, 2$ let $R_i' = U R_i U^{-1}$. Since $T$ is an orthogonal transformation, the (multi-)set of principal angles between $E_{-1}(R_1') = T(E_{-1}(R_1))$ and $E_{-1}(R_2') = T(E_{-1}(R_2))$ is identical to the (multi-)set of principal angles between $E_{-1}(R_1)$ and $E_{-1}(R_2)$. But $R_1' I_\sigma = R_1' R_2' = U R_1 R_2 U^{-1}$, so $R(\theta_1, \dots, \theta_{\lceil p/2 \rceil})$ is a normal form of $R_1' I_\sigma$ as well as of $R_1 R_2$. The result now follows from Corollary 6.4(i) and equation (6.22) (the latter being needed only for the final statement of the result).
6.2. The proofs of Propositions 5.9 and 5.11

Proof of Proposition 5.9. Since $d_{SO}(R I_\sigma, I) = d_{SO}(R, I_\sigma^{-1}) = d_{SO}(R, I_\sigma)$, conclusion (ii) of Corollary 6.4 can be written equivalently as:
\[ d_{SO}(\Phi(W), \Phi(\mathbb{R}^J)) = 2\,d_{Gr}(W, \mathbb{R}^J). \tag{6.33} \]
Fix any $J \in \mathcal{J}_{m,p}$. Letting "$\cdot$" denote the natural left-action of $SO(p)$ on $\mathrm{Gr}_m(\mathbb{R}^p)$ ($U\cdot W = T_U(W)$, in the notation of the proof of Proposition 5.10), observe that for all $U \in SO(p)$ and $W \in \mathrm{Gr}_m(\mathbb{R}^p)$ we have $\Phi(U\cdot W) = U\Phi(W)U^{-1}$ (simply another way of writing (6.32)). Clearly $d_{Gr}$ is invariant under this action, and $d_{SO}$ is both left- and right-invariant, so (6.33) implies that
\[ d_{SO}(\Phi(U\cdot W), \Phi(U\cdot \mathbb{R}^J)) = 2\,d_{Gr}(U\cdot W, U\cdot \mathbb{R}^J). \]
Now let $W, V \in \mathrm{Gr}_m(\mathbb{R}^p)$. Since the action of $SO(p)$ on $\mathrm{Gr}_m(\mathbb{R}^p)$ is transitive, there exists $U \in SO(p)$ such that $U\cdot \mathbb{R}^J = V$. Using any such $U$, we then have
\[ d_{SO}(\Phi(W), \Phi(V)) = d_{SO}(\Phi(U\cdot U^{-1}\cdot W), \Phi(U\cdot \mathbb{R}^J)) = 2\,d_{Gr}(U\cdot U^{-1}\cdot W, U\cdot \mathbb{R}^J) = 2\,d_{Gr}(W, V). \]
Remark 6.5. Of course, Proposition 5.9 can be deduced from computations with the principal fibration $\pi : SO(p) \to SO(p)/S(O(m) \times O(p - m)) \cong \mathrm{Gr}_m(\mathbb{R}^p)$; the standard Riemannian metric on $\mathrm{Gr}_m(\mathbb{R}^p)$ (for which $d_{Gr}$ is the geodesic-distance function) is defined so as to make $\pi$ a Riemannian submersion up to a normalization constant. Our proof of Proposition 5.9 is independent of this Riemannian proof in the sense that it establishes equality between the left-hand side of (6.33) and the right-hand side as defined by equation (6.1). Without the a priori knowledge that $d_{Gr}$ is a geodesic-distance function, it is not obvious that $d_{Gr}$ satisfies the triangle inequality, hence whether $d_{Gr}$ is a metric. Thus Proposition 5.9 actually provides an independent proof that $d_{Gr}$ is a metric on $\mathrm{Gr}_m(\mathbb{R}^p)$. The only use of Riemannian geometry in this proof is through the knowledge that $d_{SO}$ is, in fact, a metric (because it is a geodesic-distance function).

Proof of Proposition 5.11. Let "Statement 1" and "Statement 2" be the statements listed as 1 and 2 in the Proposition. As noted in the proof of Proposition 5.9, $d_{SO}(R I_\sigma, I) = d_{SO}(R, I_\sigma)$, so the inequality $d_{SO}(R I_\sigma, I) < d_{SO}(R, I)$ can be rewritten as
\[ d_{SO}(R, I_\sigma)^2 < \frac{m\pi^2}{2}. \]
Assume first that Statement 1 is true. Let $W \in \mathrm{Gr}_m(\mathbb{R}^p)$. Then $\Phi_{m,p}(W)$ is an involution of level $m$, so there exists $\sigma \in \mathcal{I}_p^+$ of level $m$ such that $d_{SO}(R, I_\sigma)^2$