Horizontal Dimensionality Reduction and Iterated Frame Bundle Development

Stefan Sommer
Department of Computer Science, University of Copenhagen
[email protected]
Abstract. In Euclidean vector spaces, dimensionality reduction can be centered at the data mean. In contrast, on curved manifolds distances do not split into orthogonal components, and centered analysis distorts inter-point distances in the presence of curvature. In this paper, we define a dimensionality reduction procedure for data in Riemannian manifolds that moves the analysis from a center point to local distance measurements. Horizontal component analysis measures distances relative to lower-order horizontal components, providing a natural view of data generated by multimodal distributions and stochastic processes. We parametrize the non-local, low-dimensional subspaces by iterated horizontal development, a constructive procedure that generalizes both geodesic subspaces and polynomial subspaces to Riemannian manifolds. The paper gives examples of how low-dimensional horizontal components successfully approximate multimodal distributions.
1 Introduction

In Euclidean space, the Pythagorean theorem splits squared distances into orthogonal components. Dimensionality reduction of sampled data can therefore be performed relative to the data mean. In non-Euclidean spaces, curvature makes orthogonality a local notion. Therefore, analysis centered at one point, e.g. an intrinsic mean, can only provide an approximation of the metric at points distant from the mean. In this paper, we define a dimensionality reduction procedure that moves the analysis close to each data point by performing the orthogonal split into components locally. We argue that the derived horizontal component analysis (HCA), by measuring distances relative to lower-order components, provides a natural view of data generated by multimodal distributions and stochastic processes such as anisotropic diffusions. To provide a parametric representation of the low-dimensional data approximation, we generalize geodesic subspaces by iterated horizontal development. This geometric procedure in the frame bundle of the manifold allows a non-centered definition of subspaces that generalize Euclidean linear subspaces, in addition to providing a natural generalization of polynomials and other parametric subspaces to Riemannian manifolds.

1.1 Orthogonality and Locality

In Riemannian manifolds, orthogonality as measured by the metric tensor at each point is not preserved when data is linearized to the tangent space at an intrinsic mean, see Figure 1.
[Figure 1: (a) Bimodal distribution on S². (b) HCA visualization, estimated variance: 0.49². (c) PGA visualization, estimated variance: 1.07².]
Fig. 1. A bimodal distribution on S² constructed from Gaussian distributions in the tangent space at the two modes. With HCA, the first horizontal component (black dotted in (a,b)) is the geodesic between the modes. The second horizontal component (blue vectors) is parallel transported along the first component, and the analysis is performed relative to the first component. With PGA (c), the curvature skews the centralized linearization, giving a curved view of the distributions. Since geodesics curve towards the modes (green dotted in (a)), variance along the second PGA component (red vector) is over-estimated. In higher dimensions, components may flip, see Figure 4.
This situation arises in multiple cases: (1) a centered view of samples from a multimodal distribution will be skewed by the curvature because an intrinsic mean will be placed between the modes; (2) the analysis of a distribution with center of mass in a subspace, e.g. generated by a diffusion process starting at a geodesic, will be affected by which point in the subspace is chosen as the center point; (3) the analysis of stochastic processes will be affected by the geometry far from the starting point.

The dimensionality reduction framework designed in this paper is inspired by the situation when data is generated by a stochastic process that is as close to stationary as allowed by the geometry of the manifold. This view emphasizes transport of the infinitesimal generator of the process, and the holonomy of the connection therefore becomes important. While other approaches assume data varies mainly along geodesics [1], the fundamental assumption is here that generators are mainly transported horizontally along principal geodesics. Based on this assumption, the low-dimensional components will here be horizontal subspaces of the frame bundle that project to subspaces of the manifold. HCA builds low-dimensional representations by centering the analysis at projections to the lower horizontal components. With HCA, the analysis of multimodal distributions (1) is performed near the local modes, see Figure 1; distances in non-centered distributions (2) are measured relative to the center of mass; and the analysis of stochastic processes (3) is likewise less influenced by global effects of curvature.
2 Non-Euclidean Dimensionality Reduction

Dimensionality reduction of data generated by random variables taking values in non-Euclidean spaces is difficult: it is hard to describe probability distributions parametrically; there is no canonical generalization of Euclidean affine subspaces; global effects of curvature make centered analysis less natural; and distributions with non-centered mass can occur naturally, in particular in compact manifolds.
Principal Geodesic Analysis (PGA, [2]) extends Euclidean PCA by finding low-dimensional geodesic subspaces approximating a set of observed data. The analysis is centered at an intrinsic data mean µ, and the observed data is projected to the tangent space TµM using the logarithm map. The linearization to the tangent space can be avoided using exact PGA optimization [3]. The geodesic subspaces spanned by lower-order principal components are in general not totally geodesic, a fact that makes the choice of center point µ central to the method.

Geodesic PCA (GPCA, [1]) finds principal geodesics instead of geodesic subspaces. The choice of center point is moved from intrinsic means to points along the first principal geodesic that best describe the data. Second and higher principal geodesics are required to pass through these principal means (PMs) and to be orthogonal at the PMs. Distances are measured from the data points to the principal geodesics using intrinsic distances.

Using geodesics as vehicles for low-dimensional representations is meaningful when the data is congruent with the underlying geometry [1]. The above techniques take on this point of view by assuming that data varies along geodesic subspaces (PGA) and geodesics (GPCA). When the data is assumed to be generated by stochastic processes, the viewpoint shifts to transport of the generator. For example, PCA can capture the components of the transition density of data generated by a stationary diffusion in Euclidean space. Diffusion processes on manifolds are more complicated because the diffusion matrix cannot be uniquely transported to all points of the manifold by parallel transport; the difference between the parallel transports along different curves connecting two points is given by the holonomy of the connection, and, in the stochastic setting, random holonomy [4]. The holonomy can be represented by lifting the transport to the frame bundle, as we will take advantage of in this paper.

Multimodal distributions and distributions with center of mass in a subspace of the manifold emphasize the effects of centering analysis at mean points, either intrinsic means (PGA) or PMs (GPCA). With a bimodal distribution, the center point may lie between the modes, emphasizing the effects of curvature as logarithm maps and distances are measured over long distances. The approach taken in this paper moves the analysis from a center point towards the modes, thus providing a view of the local covariance structure that is less influenced by global curvature effects.
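To make the centered linearization concrete, the following sketch implements PGA's tangent-space approximation on the unit sphere S², where the exponential and logarithm maps have closed form. This is a minimal illustration of the linearized variant only, not the exact PGA optimization of [3]; all function names are our own.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p with
    initial velocity v in T_p S^2."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def log_map(p, x):
    """Logarithm map on S^2: the tangent vector at p pointing towards x with
    length equal to the intrinsic distance d(p, x)."""
    w = x - np.dot(p, x) * p                      # component of x tangent at p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, x), -1.0, 1.0))
    return np.zeros(3) if nw < 1e-12 else theta * w / nw

def intrinsic_mean(xs, steps=100):
    """Fréchet mean by gradient descent: step along the mean of the log maps."""
    mu = xs[0]
    for _ in range(steps):
        mu = exp_map(mu, np.mean([log_map(mu, x) for x in xs], axis=0))
    return mu

def pga_linearized(xs):
    """Centered PGA: log-map the data to T_mu M and run Euclidean PCA there."""
    mu = intrinsic_mean(xs)
    V = np.array([log_map(mu, x) for x in xs])    # linearized data
    _, s, Vt = np.linalg.svd(V, full_matrices=False)
    return mu, Vt, s ** 2 / len(xs)  # mean, principal directions, variances
```

As the experiments below illustrate, the log-map linearization is exactly where curvature distorts distances when data is spread far from µ.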
3 Frame Bundle, Horizontal Lift, and Development

Stochastic processes and stochastic differential equations on a manifold M with connection can be defined in the frame bundle F(M) of the manifold, where curves and parallel transport are represented so that the holonomy of the connection is explicit. We will use this construction to define a parametric representation of subspaces of M, and use the subspaces for HCA. Here, we describe the frame bundle, horizontal lifts, and developments. The exposition follows [4], which also contains more information on horizontal lifts and the Eells-Elworthy-Malliavin construction of Brownian motion.

Let (M, g) be a Riemannian manifold of dimension η. For each point x ∈ M, let F_x(M) be the set of frames u_x, i.e. ordered bases of T_xM. The entire set {F_x(M)}_{x∈M} is the frame bundle F(M), and the map π_{F(M)} : F(M) → M is the canonical projection. Let x_t be a curve in M with x_0 = x. Using the covariant derivative compatible with g, a frame u = u_0 for T_{x_0}M can be parallel transported along x_t, giving a path of frames u_t in F(M) with π_{F(M)}(u_t) = x_t.
[Figure 2, left: commutative diagram with TF(M) = HF(M) ⊕ VF(M), the projection h + v ↦ h onto HF(M), and the maps π_{TF(M)} : TF(M) → F(M), π_{F(M)} : F(M) → M, π_∗ : HF(M) → TM, and π_{TM} : TM → M.]
Fig. 2. (left) Commutative diagram for the manifold, frame bundle, and the horizontal subspace of TF(M). (right) Subspaces h_1, h_2, and h_3 constructed by iterated geodesic development.
Such curves in the frame bundle are called horizontal, and their derivatives form an η-dimensional subspace of the (η + η²)-dimensional tangent space T_uF(M). This horizontal subspace H_uF(M) and the vertical subspace V_uF(M) of vectors tangent to the fiber through u together split the tangent space, i.e. T_uF(M) = H_uF(M) ⊕ V_uF(M), and the split induces an isomorphism π_∗ : H_uF(M) → T_{π(u)}M, see Figure 2. Using π_∗, the horizontal vector fields H_e on F(M) are defined for vectors e ∈ R^η by H_e(u) = (ue)^∗, where v^∗ denotes the inverse of π_∗ applied to v ∈ T_{π(u)}M. Here e is a representation in R^η of the tangent vector ue ∈ T_xM, and, intuitively, H_e(u) provides the parallel transport of the frame u along curves tangent to ue. A horizontal lift of x_t is a curve u_t in F(M) tangent to HF(M) such that π_{F(M)}(u_t) = x_t. Horizontal lifts are unique up to a choice of initial frame u_0.

Let e_1, . . . , e_η be the standard basis of R^η. Brownian motion on M can be defined as the projection π_{F(M)}(U_t) of the solution to the stochastic differential equation dU_t = ∑_{i=1}^η H_{e_i}(U_t) ◦ dW_t^i in F(M), where W_t is a standard Brownian motion in R^η driving the process. Similarly, a curve w_t in R^η defines a curve on F(M) through the ordinary differential equation u̇_t = ∑_{i=1}^η H_{e_i}(u_t) ẇ_t^i that projects to a curve in M using π_{F(M)}. Conversely, a horizontal lift u_t of a curve x_t defines a curve in R^η by w_t = ∫_0^t u_s^{-1} ẋ_s ds. If u_t is a horizontal lift of a curve x_t, the curve w_t is called the anti-development of x_t, and x_t is the development of w_t. As for horizontal lifts, the anti-development is uniquely defined up to a choice of initial frame u_0.
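The development of a curve can be computed numerically by integrating the horizontal ODE. The sketch below does this for S² embedded in R³, where parallel transport of a tangent vector e along a curve with velocity ẋ reduces to ė = −(e·ẋ)x; a simple Euler scheme with reprojection suffices for illustration. The function names, step size, and example curve are our choices, not part of any package.

```python
import numpy as np

def develop(p, frame, w, dt=1e-3):
    """Develop a curve w : [0, 1] -> R^2 onto S^2 by Euler-integrating the
    horizontal ODE: xdot = sum_i wdot^i e_i, with the frame vectors e_i
    parallel transported along the developed curve (edot_i = -(e_i . xdot) x)."""
    x, u = p.astype(float).copy(), frame.astype(float).copy()  # u: 2 x 3 frame
    path = [x.copy()]
    for t in np.arange(0.0, 1.0, dt):
        wdot = (w(t + dt) - w(t)) / dt      # finite-difference derivative of w
        xdot = wdot @ u                     # ue: R^2 coordinates to T_x S^2
        x = x + dt * xdot
        x = x / np.linalg.norm(x)           # reproject onto the sphere
        u = u - dt * np.outer(u @ xdot, x)  # parallel transport of the frame
        u = u - np.outer(u @ x, x)          # keep the frame tangent at the new x
        path.append(x.copy())
    return np.array(path)

# A straight line t -> (t, 0) in R^2 develops to a geodesic (great circle arc).
p = np.array([0.0, 0.0, 1.0])
frame = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
arc = develop(p, frame, lambda t: np.array([t, 0.0]))
```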
4 Horizontal Dimensionality Reduction

In the next section, we will extend the development of curves to development of subspaces, and the development will be iterated, generating subspaces of M of increasing dimension. In particular, we will define subspaces through geodesic development. Here, we jump ahead and use geodesic developments to define horizontal component analysis (HCA) with developments that seek optimal approximations of the observed data.

Let x_1, . . . , x_N be a finite sampling of an M-valued random variable X, and let f : R^{d−1} → M be a smooth map. Let r_f : M → R^{d−1} be a map that sends points in M to the representation of nearby points in the image of f, i.e. such that p ∈ M is close to f(r_f(p)).
We call r_f an f-representation. A geodesic development h of f is a smooth map R^d → F(M) such that the curves t ↦ π_{F(M)}(h(v, t)), v ∈ R^{d−1}, are geodesics. The geodesic development is constructed such that these geodesic curves are orthogonal to subspaces parallel transported along f, in particular at the points f(r_f(x_i)). We denote these curves x_t^h(x_i) and define a projection on the curve x_t^h(x_i) by π_h(x_i) = argmin_t d(x_t^h(x_i), x_i)². A d-dimensional horizontal component relative to f is a geodesic development h_d relative to f minimizing the residual res_f(h_d) = ∑_{i=1}^N d(x_i, π_{h_d}(x_i))², i.e. minimizing the squared residuals from the data to the developed curves. If f itself is a horizontal component defined through the iterated development procedure described in the next section, we say that h_d is a horizontal component of the observed data x_i. A sequence h_1, . . . , h_d of horizontal components is called a horizontal component analysis. Each horizontal component h_i is determined by a linear subspace of R^η that defines the initial directions of the geodesics starting at the image of h_{i−1} in M. Given a base point and frame (p, u), the analysis therefore determines a sequence of subspaces V_1, . . . , V_d of R^η. The h_d-representation gives coordinates for approximations of the data points x_i in each of the components.

The definition mimics the formulation of Euclidean PCA as minimization of residual errors: in Euclidean space, h_1, . . . , h_d are linear subspaces, and HCA finds principal linear subspaces. Zeroth horizontal components are intrinsic means. First horizontal components are geodesics equal to the first principal geodesics in GPCA. Second horizontal components are in general not equal to second principal subspaces in PGA because the residual measurement is performed against geodesics starting at the projections to the first horizontal component, i.e. at h_1(r_{h_1}(x_i)) instead of at a mean point. Under certain conditions, the horizontal component h_d is a d-dimensional immersed submanifold of M. Note the difference with GPCA, which finds principal geodesics, i.e. 1-dimensional subspaces.

The definition of the residual res_f implies a nesting of the horizontal components. If the analysis is started with h_0, the analysis will be nested around the intrinsic mean. Conversely, if it is started at h_1, intrinsic means are discarded from the analysis. The nesting is however not required, and the minimization can be extended to the non-nested case. In a similar fashion, the minimization can be extended such that the coordinates in the lower components are not fixed. The increase in dimension from h_{i−1} to h_i can be allowed to be greater than one so that e.g. a 2-dimensional geodesic subspace is developed from the image of h_{i−1}. If the data has isotropic covariance in some dimensions, it may be useful to develop these dimensions together.

The h_d-representation r_{h_d} : M → R^d applied to x_i provides a representation of the observed data in Euclidean space. The geodesics that generate the development are mapped to straight lines, and distances are preserved along the geodesics. In the experiments, we will see 2D and 3D visualizations of multimodal distributions using the horizontal representation. The variances reported are measured in the h_d-representation. A computational representation can consist of a base point p and reference frame u so that each point x ∈ h_d is represented by the d-dimensional vector r_{h_d}(x).
The computation of h_d can be partly carried out using the optimization framework [3] developed for exact PGA computations. The code used for the experiments in Section 6 is available in the software package smanifold (http://github.com/nefan/smanifold).
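Independently of the smanifold package, the projection π_h and the residual res_f can be sketched in a few lines when geodesics and distances have closed form, as on S². The helper names below are hypothetical, and the bounded one-dimensional minimization stands in for the Riemannian optimization used in practice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sphere_dist(x, y):
    """Intrinsic (great circle) distance on the unit sphere."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def project_to_geodesic(x, q, v):
    """pi_h(x) = argmin_t d(x_t, x)^2 for the geodesic x_t = cos(t) q + sin(t) v
    with foot point q and unit tangent v; returns the time and the foot of x."""
    geod = lambda t: np.cos(t) * q + np.sin(t) * v
    res = minimize_scalar(lambda t: sphere_dist(geod(t), x) ** 2,
                          bounds=(-np.pi, np.pi), method='bounded')
    return res.x, geod(res.x)

def residual(xs, feet, dirs):
    """res_f(h) = sum_i d(x_i, pi_h(x_i))^2, where the developed geodesic of
    each data point is given by its foot point f(r_f(x_i)) and the direction
    parallel transported there."""
    return sum(sphere_dist(project_to_geodesic(x, q, v)[1], x) ** 2
               for x, q, v in zip(xs, feet, dirs))
```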
5 Iterated Development

We here construct parametrized subspaces of F(M) and M by repeated application of the development process. In particular, this will include the geodesic developments that we use for HCA. Because the development maps curves with constant derivative ẇ_t to geodesics, geodesic developments provide a generalization of geodesic submanifolds. With the development, frames are parallel transported, and the iterated development thereby provides a unique way of transporting orientations in the generated subspaces. Development, also known as “rolling without slipping”, is a widely used construction in differential geometry [4]. Rolling of submanifolds is considered in [5]. To the best of our knowledge, the use of development for defining parametric subspaces and the iterated development process have not previously been described in the literature.

Let V be a subspace of R^η and let f : V → F(M) be a smooth map. Letting f(0) = (p, u), we can extend the development to curves starting at vectors w_0 ∈ V away from the origin by starting the development at f(w_0). The development of a curve w_t with w_0 ∈ V is then the development of w_t − w_0 started at f(w_0). For such curves, we use the notation x_t = D_f(w)_t and u_t = D_f(w)_t for both the manifold and frame bundle curves, and we say that the development is relative to f. If v ∈ R^η is a sum v = w_0 + w of vectors with w_0 ∈ V and w ∈ V^⊥, we write D_f(v) for the development D_f(w_0 + tw) along the curve w_0 + tw. This defines geodesics passing through π_{F(M)}(f(w_0)) that are orthogonal to f(w_0)V. In this case, we say that D_f(v) is a geodesic development of f.

Assume that V splits into orthogonal subspaces V_1 and V_2 so that, combined with a frame u at p, the subspaces V_i represent subspaces of T_pM. Let X^1, . . . , X^{d_2} : V → V be d_2 vector fields whose V_2 components have rank d_2 everywhere. Let X be a matrix that represents the d_2 vector fields in its columns, and let f : V_1 → F(M) be a smooth map. We then define the development D_{f,X} : V → F(M) along X relative to f as the map given by the development v_1 + v_2 ↦ D_f(w)_1 centered at (p, u) and evaluated at time t = 1, where w_t is the curve ẇ_t = X(w_t)v_2 starting at v_1 and where v_i ∈ V_i. Like f, the development D_f(w)_1 is a smooth map, and the process can therefore be iterated. For a d-dimensional linear subspace V that splits into orthogonal subspaces V_1, . . . , V_k with corresponding sets of vector fields X_1, . . . , X_{k−1}, the development can be iterated for each i = 1, . . . , k − 1. We call this process iterated development. The result is a map from V into F(M) that projects to M.

Close to V_1, the development D_{f,X} of an immersion f is an immersion provided that the columns of the push-forward d_v f and f(v)X(f(v)) form a linearly independent set for each v ∈ V_1. In this case, the image is a d-dimensional immersed submanifold of F(M) that projects to an immersed submanifold of M. Iterated development thus produces immersed submanifolds when the vector fields f(v)X_i(v) are disjoint from the tangent space of the development up to i − 1. This is the case locally, i.e. for small v. Therefore, we can define a d-dimensional polynomial submanifold in M as a polynomial p on R^η together with a base point and frame (p, u) and a partition V_1, . . . , V_k of a d-dimensional subspace V of R^η. The surface is generated through iterated development along the vector fields D_{e_i} p. See also the construction of polynomial curves in [6].
Iterated geodesic development produces polynomial submanifolds of first-order polynomials. If the partition of V is the trivial partition {0}, V, the development h is a geodesic submanifold.
[Figure 3: (a) S² with samples. (b) HCA visualization. (c) PGA visualization.]
Fig. 3. Illustrating a diffusion starting at the equator, 29 points are sampled uniformly around the equator with normally distributed vertical position (variance 0.35²). (a) HCA captures the center of mass in the first component (black dotted in (a)), and it estimates the correct vertical variance (0.34²). (c) The curvature and the centralized analysis distort the PGA visualization, and the variance is over-estimated (1.75²).
Geodesic developments thus extend geodesic submanifolds in that the flow curves generating the development are geodesics in M. A frame of the tangent space T_p h is parallel transported along the curves used for generating h, and the frame does not stay tangent to Th away from p. On a complete manifold, the development of h is defined for all v ∈ V. The map h is an immersion for small v. If h_d is a geodesic development of h_{d−1}, the parallel transport of V_d and the tangent space of h_{d−1} may coincide for large v, reducing the rank of d_v h_d and collapsing the dimension of the image of h_d around v. Analyzing this case, where h_d still provides a low-dimensional representation of the data, remains a subject of future research.
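As a concrete instance of a geodesic development, the following sketch builds a second horizontal component on S²: the first component h_1 is a great circle, the orthogonal direction e_2 is parallel transported along it (constant in the embedding here, since e_2 stays orthogonal to the velocity of h_1), and h_2 shoots orthogonal geodesics from each point of h_1. On a general manifold the transport would require the ODE from the Section 3 sketch; names and coordinate ranges are our own.

```python
import numpy as np

def h2(p, e1, e2, s, t):
    """Second horizontal component on S^2: h1(s) = cos(s) p + sin(s) e1 is the
    first component (a great circle); e2, orthogonal to the plane of h1, is its
    own parallel transport along h1, and the orthogonal geodesic from h1(s) in
    direction e2 is cos(t) h1(s) + sin(t) e2."""
    h1 = np.cos(s) * p + np.sin(s) * e1
    return np.cos(t) * h1 + np.sin(t) * e2

p  = np.array([1.0, 0.0, 0.0])
e1 = np.array([0.0, 1.0, 0.0])   # initial direction of h1
e2 = np.array([0.0, 0.0, 1.0])   # direction developed orthogonally to h1
# (s, t) is the h2-representation: s along the first component, t the signed
# distance along the orthogonal geodesic through h1(s).
surface = np.array([[h2(p, e1, e2, s, t) for t in np.linspace(-0.5, 0.5, 5)]
                    for s in np.linspace(-np.pi, np.pi, 9)])
```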
6 Experiments

To illustrate differences between HCA and PGA, we analyze data sampled from three multimodal distributions. The experiments will show how the localized analysis with HCA provides improved variance estimation and improved visualization.

In Figure 1, data is sampled from a bimodal distribution on the sphere S². Around each mode, the samples are normally distributed with vertical variance 0.5² and horizontal variance 0.1². Both HCA and PGA find the first component close to the equator. Because HCA measures the vertical variation relative to the first component, the normal distributions are captured in the HCA representation and the correct variance is estimated. The visualization is distorted with PGA, and the variance is over-estimated.

In Figure 3, data is sampled from a distribution that is uniform around the equator and normal in the vertical direction (variance 0.35²). This situation illustrates a diffusion process with mass at t = 0 concentrated at the equator. As in the bimodal case, HCA measures the vertical variation relative to the first component, which aligns with the equator, and the vertical variance is correctly estimated. The curvature again distorts the PGA picture, and the uniform distribution along the first component is lost.

To see how the difference in variance estimation can make the two methods compute completely different components, we embed the manifold 2x_1² − 2x_2² + x_3² + x_4² = 1 in R⁴ and sample a bimodal distribution with two modes at distance 1 from a center point p = (0, 0, 0, 1) in the x_1 = x_2 = x_4 = 0 plane.
[Figure 4: (a) x_1 = 0 slice. (b) x_2 = 0 slice. (c) HCA visualization.]
Fig. 4. The manifold M³ with samples from two Gaussian distributions with largest variance in the x_2 direction (0.6² vs. 0.4² in grid units). (a,b) Slices x_1 = 0 and x_2 = 0 of M³ with data. (c) HCA visualization of the data. The second HCA horizontal component has largest x_2 component (blue vector) whereas the second PGA component has largest x_1 component (red vector).
The variances are 0.4², 0.6², and 0.1² in the x_1, x_2, and x_3 directions, respectively. The first principal component is expected to align with the x_3 direction because of the distance between the modes, and the second principal component should capture the large variance in the x_2 direction. HCA finds the correct components, whereas PGA over-estimates the variance in the x_1 direction, making it choose the second principal component to lie in the x_1 direction.
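For reference, bimodal data of the kind used in Figure 1 can be generated by sampling Gaussian coefficients in the tangent space at each mode and mapping them to S² with the exponential map. The mode locations and frames below are illustrative assumptions, not the exact configuration of the experiment; the standard deviations match the text.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on S^2 (as in the Section 2 sketch)."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def sample_mode(mode, frame, sigmas, n, rng):
    """n samples: Gaussian coefficients in T_mode S^2 pushed to S^2 by exp."""
    coeffs = rng.normal(scale=sigmas, size=(n, 2))
    return np.array([exp_map(mode, c @ frame) for c in coeffs])

rng = np.random.default_rng(0)
m1, m2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])  # assumed modes
f1 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # (horizontal, vertical) at m1
f2 = np.array([[-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # (horizontal, vertical) at m2
# horizontal std 0.1 and vertical std 0.5, as in the Figure 1 experiment
xs = np.vstack([sample_mode(m1, f1, (0.1, 0.5), 20, rng),
                sample_mode(m2, f2, (0.1, 0.5), 20, rng)])
```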
7 Conclusion

Horizontal component analysis moves the analysis closer to the data points by performing the orthogonal split into components locally, thus providing improved analysis of distributions with large spread and multiple modes. Based on the assumption that generators are transported horizontally in the frame bundle, HCA chooses subspaces constructed by iterated geodesic development to approximate the sampled data. The ability of the method to provide a low-dimensional representation and visualization with correctly estimated variance is illustrated on low-dimensional manifolds.
References

1. Huckemann, S., Hotz, T., Munk, A.: Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica 20(1) (January 2010) 1–100
2. Fletcher, P., Lu, C., Pizer, S., Joshi, S.: Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging (2004)
3. Sommer, S., Lauze, F., Nielsen, M.: Optimization over geodesics for exact principal geodesic analysis. Advances in Computational Mathematics, in press (2013)
4. Hsu, E.P.: Stochastic Analysis on Manifolds. American Mathematical Society (2002)
5. Leite, F.S., Krakowski, K.A.: Covariant differentiation under rolling maps. (2008)
6. Hinkle, J., Muralidharan, P., Fletcher, P.T., Joshi, S.: Polynomial regression on Riemannian manifolds. In: ECCV 2012, Springer (2012) 1–14