Manifold Learning in Data Mining Tasks

Alexander Kuleshov 1,2 and Alexander Bernstein 1,2

1 Kharkevich Institute for Information Transmission Problems RAS, Moscow 127994, Russia
[email protected]
2 National Research University Higher School of Economics, Moscow 109028, Russia
[email protected]
Abstract. Many Data Mining tasks deal with data which are presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to the use of many methods for solving these tasks. To avoid this phenomenon, various Representation learning algorithms are used as a first key step in the solutions of these tasks: they transform the original high-dimensional data into lower-dimensional representations so that as much information about the original data required for the considered Data Mining task as possible is preserved. Such Representation learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold Embedding, Manifold Learning, and the newly proposed Tangent Bundle Manifold Learning), which are motivated by various Data Mining tasks. A new geometrically motivated algorithm that solves the Tangent Bundle Manifold Learning problem and gives new solutions for all the considered Dimensionality Reduction problems is presented.

Keywords: Data Mining, Statistical Learning, Representation Learning, Dimensionality Reduction, Manifold Learning, Tangent Learning, Tangent Bundle Manifold Learning.
1 Introduction
The goal of Data Mining, which is a part of Artificial Intelligence, is to extract previously unknown information from a dataset. Thus, it is supposed that this information is reflected in the structure of the dataset, which must be discovered from the data. Many Data Mining tasks, such as Pattern Recognition, Classification, Clustering, and others, which are challenging for machine learning algorithms, deal with real-world data presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to the use of many methods for solving these tasks. To avoid this phenomenon in Data Mining tasks, various Representation learning algorithms are used as a first key step in the solutions of these tasks. Representation learning (Feature extraction) algorithms transform the original high-dimensional data into lower-dimensional representations (or features) so that as much information about the original data required for the considered Data Mining task as possible is preserved.
After that, the initial Data Mining task may be reduced to the corresponding task for the constructed lower-dimensional representation of the original dataset. Of course, the construction of the low-dimensional data representation for subsequent use in a specific Data Mining task must depend on the considered task, and the success of machine learning algorithms generally depends on the data representation [1]. Representation (Feature) learning problems that consist in extracting a low-dimensional structure from high-dimensional data can be formulated as various Dimensionality Reduction (DR) problems, whose different formalizations depend on the considered Data Mining tasks. Solutions of such DR problems make it easier to apply the corresponding Data Mining methods to the high-dimensional input dataset.

This paper is about DR problems in Data Mining tasks. We describe a few key Data Mining tasks that lead to different formulations of the DR problem (Sample Embedding, Data Space (Manifold) Embedding, Manifold Learning as Data Manifold Reconstruction/Estimation). We propose a strengthened version of the Manifold Learning (ML) problem, called Tangent Bundle ML (TBML), which ensures good generalization properties of ML algorithms. There is no generally accepted terminology in the DR literature; thus, some terms introduced below can differ from those used in other works. We present a new geometrically motivated algorithm that solves the TBML problem and also gives new solutions for all the considered DR problems.

The rest of the paper is organized as follows. Sections 2–5 contain definitions of various DR problems motivated by their subsequent use in specific Data Mining tasks. The proposed TBML solution is described in Section 6; some properties of the solution are given in Section 7.
2 Sample Embedding Problem
One of the key Data Mining tasks related to unsupervised learning is Clustering, which consists in discovering groups and structures in data that contain ‘similar’ (in one sense or another) sample points. Constructing a low-dimensional representation of the original high-dimensional data for the subsequent solution of the Clustering problem may be formulated as a specific DR problem, which will be referred to as the Sample Embedding problem and is as follows. Given an input dataset

Xn = {X1, X2, … , Xn} ⊂ X   (1)

randomly sampled from an unknown nonlinear Data Space (DS) X embedded in a p-dimensional Euclidean space Rp, find an ‘n-point’ Embedding mapping h(n): Xn ⊂ Rp → Rq,

hn = h(n)(Xn) = {h1, h2, … , hn} ⊂ Rq,   (2)

of the sample Xn to a q-dimensional dataset hn, q < p, which ‘faithfully represents’ the sample Xn while inheriting certain subject-driven data properties like preserving the local data geometry, proximity relations, geodesic distances, angles, etc.
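As an illustration of the intended use, the following minimal sketch (assuming scikit-learn is available; the function name and the choice of PCA as the simplest Embedding mapping h(n) are purely illustrative, not part of the paper) embeds a sample Xn into Rq and clusters the features hn; the cluster labels then transfer back to the original points Xi through the index correspondence hi ↔ Xi.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_via_sample_embedding(X, q, n_clusters):
    """Solve a Clustering task through a Sample Embedding (1)-(2).

    X : (n, p) array holding the sample X_n; q : reduced dimension (q < p).
    PCA is used here only as the simplest illustrative embedding h^(n).
    """
    h = PCA(n_components=q).fit_transform(X)                        # n-point embedding h_n in R^q
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(h)
    # labels[i] is the cluster of h_i and hence of the original point X_i
    return h, labels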
If the term ‘faithfully represents’ in the Sample Embedding problem corresponds to the ‘similar’ notion in the initial Clustering problem, we can solve the reduced Clustering problem for the constructed low-dimensional feature dataset hn. After that, we can obtain some solution of the initial Clustering problem: the clusters in the initial problem are the images of the clusters discovered in the reduced problem under the natural inverse correspondence between hn and the original dataset Xn.

The term ‘faithfully represents’ is not formalized in general, and it differs among Sample Embedding methods according to the chosen optimized cost function L(n)(hn|Xn), which defines an ‘evaluation measure’ for the DR and reflects the desired properties of the n-point Embedding mapping h(n) (2). As is pointed out in [2], a general view on the DR can be based on the ‘concept of cost functions’. For example, the cost function

L(n)(hn|Xn) = ∑_{i,j=1}^n (ρ(Xi, Xj) − ‖hi − hj‖)²

is considered in the classical metric Multidimensional Scaling [3]; here ρ is a chosen metric on the DS X. Note that the Multidimensional Scaling and Principal Component Analysis (PCA) [4] methods are equivalent when ρ is the Euclidean metric in Rp. Another cost function

LE(hn|Xn) = ∑_{i,j=1}^n KE(Xi, Xj) ‖hi − hj‖²   (3)

is considered in the Laplacian Eigenmaps method [5] for the Sample Embedding problem; it is minimized under the normalizing condition

∑_{i=1}^n KE(Xi) hi hiᵀ = Iq,   (4)

required to avoid a degenerate solution. Here Iq is the q×q unit matrix, KE(Xi) = ∑_{j=1}^n KE(Xi, Xj), and

KE(X, X′) = I{‖X − X′‖ < ε} exp{−‖X − X′‖² / t}   (5)

is the Euclidean ‘heat’ kernel [5] with parameters ε and t.
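For concreteness, a minimal numpy/scipy sketch of this Laplacian Eigenmaps construction is given below; the parameters eps and t are hypothetical stand-ins for the heat-kernel parameters in (5), and the reduction of minimizing (3) under (4) to a generalized eigenproblem follows [5].

import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, q, eps=1.0, t=1.0):
    """Sample Embedding h_n minimizing (3) under the constraint (4).

    X : (n, p) array holding the sample X_n; q : target dimension (q < p).
    eps, t : hypothetical parameters of the heat kernel (5).
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared pairwise distances
    W = (d2 < eps ** 2) * np.exp(-d2 / t)                       # kernel matrix K_E(X_i, X_j)
    np.fill_diagonal(W, 0.0)                                    # no self-weights
    D = np.diag(W.sum(axis=1) + 1e-12)                          # degrees K_E(X_i) (regularized)
    L = D - W                                                   # graph Laplacian
    # Minimizing (3) subject to (4) amounts to the generalized eigenproblem L v = lambda D v.
    _, V = eigh(L, D)                                           # eigenvalues in ascending order
    return V[:, 1:q + 1]                                        # rows are the features h_1, ..., h_n

The rows of the returned matrix play the role of the feature vectors hi; the first (constant) generalized eigenvector is skipped because it corresponds to the degenerate solution that constraint (4) rules out.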
By ε = (ε1, ε2, ε3, ε4) > 0 we will denote the algorithm parameters. For X ∈ Xn, let UE(X) = {X′ ∈ Xn: ‖X′ − X‖ < ε1} be the ε1-ball centered at X; for an OoS point X ∉ Xn, the point X is included in UE(X) also. Let KE(X, X′) (5) be the Euclidean ‘heat’ kernel introduced in [5]. By applying PCA to the set UE(X), the ordered eigenvalues λ1(X) ≥ λ2(X) ≥ … ≥ λp(X) and the corresponding p-dimensional orthonormal principal vectors are computed. Let

Xh = {X ∈ Rp: λq(X) > ε3}   (29)

be the set which will be the domain of definition of the Embedding mapping h (6) to be constructed later. In what follows, we assume that the DM X is well sampled, so that the inclusion X ⊂ Xh holds. Denote by QPCA(X), X ∈ Xh, the orthogonal p×q matrix whose columns are the first q principal vectors, and let

LPCA(X) = Span(QPCA(X)) ∈ Grass(p, q)   (30)

be the linear space spanned by the columns of QPCA(X), called for short the PCA-space. If X ∈ Xh and the neighborhood UE(X) is small enough, then [57, 58, 59, 60] LPCA(X) is an accurate approximation (called the PCA-approximation) of the tangent space L(X):

LPCA(X) ≈ L(X),  X ∈ Xh.   (31)
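To make the local-PCA step concrete, here is a minimal numpy sketch; the function names and the parameters eps1, eps3 (standing in for ε1 and ε3) are illustrative, and a reasonably populated neighborhood UE(X) is assumed.

import numpy as np

def local_pca(X_sample, X, eps1):
    """PCA over the eps1-ball U_E(X); returns descending eigenvalues and principal vectors.

    X_sample : (n, p) array, the sample X_n;  X : (p,) query point (sample or OoS).
    """
    U = X_sample[np.linalg.norm(X_sample - X, axis=1) < eps1]  # neighborhood U_E(X)
    U = np.vstack([U, X[None, :]])                             # include X itself (OoS case)
    lam, V = np.linalg.eigh(np.cov(U, rowvar=False))           # local covariance spectrum
    order = np.argsort(lam)[::-1]                              # lambda_1(X) >= ... >= lambda_p(X)
    return lam[order], V[:, order]

def pca_space(X_sample, X, q, eps1, eps3):
    """Q_PCA(X) spanning the PCA-space (30) and the X_h membership test (29)."""
    lam, V = local_pca(X_sample, X, eps1)
    return V[:, :q], bool(lam[q - 1] > eps3)                   # (Q_PCA(X), lambda_q(X) > eps3)

The columns of the returned Q_PCA(X) give the PCA-approximation LPCA(X) ≈ L(X) of (31) when the ball is small enough and λq(X) exceeds the threshold.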
Let KG(X, X′) = I{dBC(LPCA(X), LPCA(X′)) < ε3} KBC(LPCA(X), LPCA(X′)) be the ‘Grassmann’ kernel on the DM, and let K(X, X′) = KE(X, X′) KG(X, X′) be the resulting aggregate kernel.

Let

Yg = {y ∈ Rq: λq(y) > ε3},   (46)

where λq(y) is the qth largest eigenvalue in the corresponding PCA, and define the p×q orthogonal matrix Θq(y), y ∈ Yg, whose columns are the first q principal vectors corresponding to the q largest eigenvalues. For y ∈ Yg and y′ ∈ Yn, introduce the ‘Grassmann’ kernel kG(y, y′) = I{dBC(L*, L) < ε4} KBC(L*, L) in the FS; here we denote for short L* = L*(y) = Span(Θq(y)) and L = LPCA(h⁻¹(y′)), and

dBC(L*, L) = {1 − Det²[Θqᵀ(y) QPCA(h⁻¹(y′))]}^{1/2},   KBC(L*, L) = Det²[Θqᵀ(y) QPCA(h⁻¹(y′))]

are the Binet-Cauchy metric and the Binet-Cauchy kernel, respectively, on the Grassmann manifold Grass(p, q). Note that the approximate equalities L*(h(X)) ≈ L(X) hold. For y ∈ Yg and y′ ∈ Yn, introduce the aggregate kernel k(y, y′) = kE(y, y′) kG(y, y′), providing the approximate equalities k(h(X), h(X′)) ≈ K(X, X′) for close points X ∈ Xh and X′ ∈ Xn. Denote also k(y) = ∑_{j=1}^n k(y, hj).
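The Binet-Cauchy quantities above depend only on the two subspaces and reduce to a single q×q determinant; a small numpy sketch follows (Q1 and Q2 stand for orthonormal basis matrices such as Θq(y) and QPCA(h⁻¹(y′)); the names are illustrative).

import numpy as np

def binet_cauchy(Q1, Q2):
    """Binet-Cauchy kernel and metric between Span(Q1) and Span(Q2) in Grass(p, q).

    Q1, Q2 : (p, q) matrices with orthonormal columns.
    """
    det = np.linalg.det(Q1.T @ Q2)        # determinant of the q x q matrix of inner products
    k_bc = det ** 2                       # Binet-Cauchy kernel K_BC
    d_bc = np.sqrt(max(1.0 - k_bc, 0.0))  # Binet-Cauchy metric d_BC
    return k_bc, d_bc

Both quantities depend only on the spanned subspaces: replacing Q1 by Q1 R with an orthogonal q×q matrix R changes the determinant only by a sign and leaves KBC and dBC unchanged.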
Step 2. We want the matrix G(y) to provide condition (24); hence, the matrix must meet the condition G(h(X)) ≈ H(X). Taking into account the cost function ∆H(H, X) (39) used for constructing the matrix H(X), we construct a p×q matrix G(y) for an arbitrary point y ∈ Yg which satisfies the constraint Span(G(y)) = L*(y) and minimizes the form

∆G(G, y) = ∑_{j=1}^n k(y, hj) ‖G − H(Xj)‖².
A solution of this problem in an explicit form is obtained in Theorem 6.

Theorem 6. The matrix

G(y) = π*(y) (1/k(y)) ∑_{j=1}^n k(y, hj) H(Xj)

meets the above constraint and minimizes ∆G(G, y), where π*(y) is the projector onto the linear space L*(y).

Step 3. We will construct the Reconstruction mapping g to meet conditions (10) and (23). It follows from (42) that, under the desired conditions X ≈ g(h(X)) and H(X) ≈ G(h(X)), the approximate equalities X′ − g(y) ≈ G(y) (y′ − y) are satisfied for near points y = h(X) ∈ Yg and y′ = h(X′) ∈ Yn. Construct the mapping g(y) for an arbitrary point y ∈ Yg by minimizing over g ∈ Rp the weighted residual

∆g(g, y) = ∑_{j=1}^n k(y, hj) ‖Xj − g − G(y) (hj − y)‖².
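A direct transcription of the Theorem 6 formula is sketched below; the inputs are assumed to be already available from the preceding steps (ks are the weights k(y, hj), Hs the matrices H(Xj) built in the tangent-learning step not shown in this excerpt, and Theta_q an orthonormal basis of L*(y)), and all names are illustrative.

import numpy as np

def G_matrix(Theta_q, ks, Hs):
    """Theorem 6: G(y) = pi*(y) (1/k(y)) sum_j k(y, h_j) H(X_j).

    Theta_q : (p, q) orthonormal basis of L*(y);
    ks : (n,) weights k(y, h_j);  Hs : (n, p, q) matrices H(X_j), assumed given
    by the preceding tangent-learning step (not shown in this excerpt).
    """
    pi_star = Theta_q @ Theta_q.T                      # projector pi*(y) onto L*(y)
    H_bar = np.einsum('j,jab->ab', ks, Hs) / ks.sum()  # weighted average of the H(X_j)
    return pi_star @ H_bar                             # p x q matrix G(y), Span(G(y)) in L*(y)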
A solution of this problem in an explicit form is obtained in Theorem 7.

Theorem 7. The p-dimensional vector

g(y) = (1/k(y)) ∑_{j=1}^n k(y, hj) Xj + G(y) (y − (1/k(y)) ∑_{j=1}^n k(y, hj) hj)

meets constraint (23) and minimizes the quadratic form ∆g(g, y).
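The Theorem 7 reconstruction is thus a kernel-weighted average of the neighbors corrected along the estimated tangent space; a minimal sketch with the same illustrative conventions as above (Xs are the sample points Xj, hs their features hj):

import numpy as np

def g_reconstruction(y, Xs, hs, ks, G):
    """Theorem 7: g(y) = (1/k(y)) sum_j k_j X_j + G(y) (y - (1/k(y)) sum_j k_j h_j).

    y : (q,) feature point;  Xs : (n, p) sample points X_j;  hs : (n, q) features h_j;
    ks : (n,) weights k(y, h_j);  G : (p, q) matrix G(y) from Theorem 6.
    """
    w = ks / ks.sum()               # normalized weights k(y, h_j) / k(y)
    X_bar = w @ Xs                  # kernel-weighted average of the X_j
    h_bar = w @ hs                  # kernel-weighted average of the h_j
    return X_bar + G @ (y - h_bar)  # reconstructed point g(y) in R^p

The first term is a Nadaraya-Watson-type average of the neighboring sample points, and the second term is a first-order correction along the estimated tangent space L*(y).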
7 Properties of the GSE-algorithm
1) The inclusions Xh ⊃ X and Yg ⊃ Yh = h(Xh) ⊃ Y = h(X) for the constructed domains of definition Xh (29) and Yg (46) of the mappings h (6) and g (7), respectively, which are required to provide the proximity TB(X) ≈ TB*(X*) (26) between the Tangent Bundle TB(X) of the Data Manifold X and the Reconstructed Tangent Bundle TB*(X*) (25) of the Reconstructed Manifold X*, may, in general, fail for finite samples. But if the DM X is well sampled, then all the points from the sets X and Y fall into the sets Xh and Yg, respectively.
Formally, this means that, under the considered Sampling model, if the sample size n tends to infinity, then the sample-based sets Xh and Yg consistently estimate the DM X and the FS Y = h(X), respectively.

2) Consider the asymptotics n → ∞, under which the ball radius ε1 = ε1,n (a threshold in the neighbourhoods UE(X) and uE(y)) tends to 0 with the appropriate rate O(n^{-1/(q+2)}). Then [69], uniformly in the points X ∈ X, the relations

‖X − r(X)‖ = O(n^{-2/(q+2)}),   (47)

dP,2(L(X), LG(h(X))) = O(n^{-1/(q+2)})   (48)

hold true with high probability, where dP,2 is the above-defined projection 2-norm metric on the Grassmann manifold Grass(p, q). The term ‘an event occurs with high probability’ means that its probability exceeds the value (1 − Cα/n^α) for any n and α > 0, where the constant Cα depends only on α.

3) The rate in (47) coincides with the asymptotically minimax lower bound for the Hausdorff distance dH(X, X*) between the DM X and the RM X*, which was established in [70]. It follows from (47) and the obvious inequality dH(X, X*) ≤ sup_{X∈X} ‖X − r(X)‖ that the RM X* estimates the DM X with the optimal rate of convergence. The rate (48) for the deviation of the local PCA-estimator LPCA(X) from the tangent space L(X) at a reference point X is known [58, 59], but relation (48) holds uniformly in the manifold points.

4) Consider the Jacobians Jg•h and Jh•g of the mappings r = g•h and h•g: Y → Y, respectively; the latter mapping h(g(y)) is the result of successively applying the reconstruction mapping g to a point y ∈ Y and then applying the embedding mapping h to the reconstruction result g(y) ∈ X* ⊂ Xh. Then the relations Jg•h(X) = π(X) and Jh•g(y) = Iq hold true, where π(X) is the projector onto the tangent space L(X). As a consequence, the residual vector (X − r(X)) has a zero Jacobian, and the relations r(X′) − r(X) = X′ − X + o(‖X′ − X‖) and h(r(X′)) − h(r(X)) = h(X′) − h(X) + o(‖X′ − X‖) hold true for near points X, X′ ∈ X.

5) Results of the performed comparative numerical experiments [56] show that the proposed algorithm outperforms the compared known algorithms for the Dimensionality Reduction problem.
Conclusion

Many Data Mining tasks, such as Pattern Recognition, Classification, Clustering, and others, which are challenging for machine learning algorithms, deal with real-world data presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to the use of many methods for solving these tasks. To avoid this phenomenon in Data Mining tasks, various Representation learning algorithms are used as a first key step in the solutions of these tasks to transform the original high-dimensional data into lower-dimensional representations such that as much information about the original data required for the considered Data Mining task as possible is preserved. Such Representation learning problems can be formulated as various Dimensionality Reduction problems, whose different formalizations depend on the considered Data Mining tasks.

Different formulations of the Dimensionality Reduction problem (Sample Embedding, Data Space (Manifold) Embedding, Manifold Learning as Data Manifold Reconstruction/Estimation, and the newly proposed Tangent Bundle Manifold Learning), which are motivated by various Data Mining tasks, are described. A new geometrically motivated algorithm that solves the Tangent Bundle Manifold Learning problem and gives new solutions for all the considered DR problems is also presented. As the sample size tends to infinity, the proposed algorithm has the optimal rate of convergence.

Acknowledgments. This work is partially supported by the RFBR, research projects 13-01-12447 and 13-07-12111.
References

1. Bengio, Y., Courville, A., Vincent, P.: Representation Learning: A Review and New Perspectives. In arXiv preprint: arXiv:1206.5538v2, 1 – 64 (2012). 2. Bunte, K., Biehl, M., Hammer, B.: Dimensionality reduction mappings. In: IEEE Symposium Series in Computational Intelligence (SSCI) 2011 - Computational Intelligence and Data Mining (CIDM), pp. 349 – 356. Paris, France. Piscataway, N.J.: IEEE (2011). 3. Cox, T.F., Cox, M.A.A.: Multidimensional Scaling. Chapman and Hall/CRC, London, UK (2001). 4. Jolliffe, I.T.: Principal Component Analysis. New York: Springer (2002). 5. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15, 1373 – 1396 (2003). 6. Hecht-Nielsen, R.: Replicator neural networks for universal optimal source coding. Science 269, 1860–1863 (1995). 7. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504 – 507 (2006). 8. Kramer, M.: Nonlinear Principal Component Analysis using autoassociative neural networks. AIChE Journal 37(2), 233 – 243 (1991). 9. DeMers, D., Cottrell, G.W.: Nonlinear dimensionality reduction. In: Hanson, D., Cowan, J., Giles, L. (eds.) Advances in Neural Information Processing Systems 5, pp. 580–587. San Mateo, CA: Morgan Kaufmann (1993). 10. Kohonen, T.: Self-organizing Maps (3rd Edition). Springer-Verlag (2000). 11. Martinetz, T., Schulten, K.: Topology representing networks. Neural Networks 7, 507–523 (1994).
12. Lafon, S., Lee, Ann B.: Diffusion Maps and Coarse-Graining: A Unified Framework for Dimensionality Reduction, Graph Partitioning and Data Set Parameterization. IEEE Transaction on Pattern Analysis and Machine Intelligence 28(9), 1393 – 1403 (2006). 13. Schölkopf, B., Smola, A., Müller, K.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10(5), 1299 – 1319 (1998). 14. Saul, L.K., Roweis, S.T.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323 – 2326 (2000). 15. Donoho, D.L., Grimes, C.: Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences 100, 5591 – 5596 (2003). 16. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319 – 2323 (2000). 17. Weinberger, K. Q., Saul, L. K.: Maximum Variance Unfolding: Unsupervised Learning of Image Manifolds by Semidefinite Programming. International Journal of Computer Vision 70(1), 77 – 90 (2006). 18. Brand, M.: Charting a manifold. In: Becker S., Thrun S., and Obermayer K. (eds.) Advances in Neural Information Processing Systems 15, pp. 961 – 968. Cambridge, MA: MIT Press (2003). 19. Zhang, Z., Zha, H.: Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment. SIAM Journal on Scientific Computing 26(1), 313 – 338 (2005). 20. Bengio, Y., Delalleau, O., Le Roux, N., Paiement, J.-F., Vincent, P., Ouimet, M.: Learning Eigenfunctions Link Spectral Embedding and Kernel PCA. Neural Computation 16(10), 2197 – 2219 (2004). 21. Bengio, Y., Delalleau, O., Le Roux, N., Paiement, J.-F., Vincent, P., Ouimet, M.: Out-of-sample extension for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. In: Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf (eds.) Advances in Neural Information Processing Systems 16, pp. 177 - 184. Cambridge, MA: MIT Press (2004). 22. Saul, L.K., Roweis, S.T.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4, 119 – 155 (2003). 23. Saul, L. K., Weinberger, K. Q., Ham, J. H., Sha, F., Lee, D. D.: Spectral methods for dimensionality reduction. In: O. Chapelle, B. Schölkopf and A. Zien (eds.) Semisupervised Learning, pp. 293-308. Cambridge, MA: MIT Press (2006). 24. Burges, Christopher J.C.: Dimension Reduction: A Guided Tour. Foundations and Trends in Machine Learning 2(4), 275 – 365 (2010). 25. Gisbrecht, A., Lueks, W., Mokbel, B., Hammer, B.: Out-of-Sample Kernel Extensions for Nonparametric Dimensionality Reduction. In: Proceedings of European Symposium on Artificial Neural Networks, ESANN 2012. Computational Intelligence and Machine Learning, pp. 531 – 536. Bruges, Belgium (2012). 26. Strange, H., Zwiggelaar, R.: A Generalised Solution to the Out-of-Sample Extension Problem in Manifold Learning. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pp. 471 – 478. San Francisco, California, USA: AAAI Press, Menlo Park, California (2011). 27. Cayton, L.: Algorithms for manifold learning. Univ of California at San Diego (UCSD), Technical Report CS2008-0923, pp. 541 – 555. Publisher: Citeseer (2005). 28. Huo, X., Ni, X., Smith, A.K.: Survey of Manifold-based Learning Methods. In: T. W. Liao and E. Triantaphyllou (eds.) Recent Advances in Data Mining of Enterprise Data, pp. 691 – 745. Singapore: World Scientific (2007). 29. Izenman, A.J.: Introduction to manifold learning.
Computational Statistics 4(5), 439 – 446 (2012).
30. Y. Ma and Y. Fu (eds.): Manifold Learning Theory and Applications. London, CRC Press (2011). 31. Narayanan, H., Mitter, S.: Sample complexity of testing the manifold hypothesis. In: J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta (eds.) Advances in Neural Information Processing Systems 23, pp. 1786 – 1794. Cambridge, MA: MIT Press (2010). 32. Rifai, S., Dauphin, Y.N., Vincent, P., Bengio, Y., Muller, X.: The manifold Tangent Classifier. In: J. Shawe-Taylor and R.S. Zemel and P. Bartlett and F. Pereira and K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 24, pp. 2294 - 2302. Cambridge, MA: MIT Press (2011). 33. Chen, J., Deng, S.-J., Huo X.: Electricity price curve modeling and forecasting by manifold learning. IEEE Transaction on power systems 23(3), 877 – 888 (2008). 34. Song, W., Keane, A.J.: A Study of Shape Parameterisation Methods for Airfoil Optimisation. In: Proceedings of the 10th AIAA / ISSMO Multidisciplinary Analysis and Optimization Conference, AIAA 2004-4482, Albany, New York: American Institute of Aeronautics and Astronautics (2004). 35. Bernstein, A., Kuleshov, A., Sviridenko, Y., Vyshinsky, V.: Fast Aerodynamic Model for Design Technology. In: Proceedings of West-East High Speed Flow Field Conference, WEHSFF2007. Moscow, Russia: IMM RAS, http://wehsff.imamod.ru/pages/s7.htm (2007). 36. Bernstein, A., Kuleshov, A.: Cognitive technologies in the problem of dimension reduction of geometrical object descriptions. Information technologies and Computer systems 2, 6 – 19 (2008). 37. Bernstein, A.V., Burnaev, E.V., Chernova, S.S., Zhu, F., Qin, N.: Comparison of Three Geometric Parameterization methods and Their Effect on Aerodynamic Optimization. In: Proceedings of International Conference on Evolutionary and Deterministic Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems (Eurogen 2011, Capua, Italy, September 14 – 16) (2011). 38. Lee, John A., Verleysen, Michel: Quality assessment of dimensionality reduction based on k-ary neighborhoods. In: Yvan Saeys, Huan Liu, Iñaki Inza, Louis Wehenkel and Yves Van de Peer (eds.) JMLR Workshop and Conference Proceedings. Volume 4: New challenges for feature selection in data mining and knowledge discovery, pp. 21–35. Antwerpen, Belgium (2008). 39. Lee, John A., Verleysen, Michel: Quality assessment of dimensionality reduction: Rankbased criteria. Neurocomputing 72(7-9), 1431–1443 (2009). 40. Freedman, D.: Efficient simplicial reconstructions of manifold from their samples. IEEE Transaction on Pattern Analysis and Machine Intelligence 24(10), 1349-1357 (2002). 41. Karygianni, S., Frossard P.: Tangent-based manifold approximation with locally linear models. In arXiv preprint: arXiv:1211.1893v1 [cs.LG], 6 Nov. (2012). 42. Golub, G.H., Van Loan, C.F.: Matrix Computation. 3rd ed. Baltimore, MD: Johns Hopkins University Press (1996). 43. Hotelling, H.: Relations between two sets of variables. Biometrika 28, 321 – 377 (1936). 44. James, A.T.: Normal multivariate analysis and the orthogonal group. Ann. Math. Statistics 25, 40 – 75 (1954). 45. Wang, L., Wang, X., Feng, J.: Subspace Distance Analysis with Application to Adaptive Bayesian Algorithm for Face Recognition. Pattern Recognition 39(3), 456 – 464 (2006). 46. Edelman, A., Arias, T. A., Smith, T.: The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications 20(2), 303-353 (1999).
47. Hamm, J., Lee, Daniel D.: Grassmann Discriminant Analysis: a Unifying View on Subspace-Based Learning. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 376 – 383 (2008). 48. Bernstein, A.V., Kuleshov, A.P.: Manifold Learning: generalizing ability and tangent proximity. International Journal of Software and Informatics 7(3), 359 - 390 (2013). 49. Kuleshov, A.P., Bernstein, A.V.: Cognitive Technologies in Adaptive Models of Complex Plants. Information Control Problems in Manufacturing 13(1), pp. 1441 – 1452 (2009). 50. Lee, Jeffrey M.: Manifolds and Differential Geometry. Graduate Studies in Mathematics 107. Providence: American Mathematical Society (2009). 51. Lee, John M.: Introduction to Smooth Manifolds. New York: Springer-Verlag (2003). 52. Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive Auto-Encoders: Explicit Invariance during Feature Extraction. In: Getoor, Lise, Scheffer, Tobias (eds.) Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pp. 833 – 840. Bellevue, Washington, USA: Omnipress (2011). 53. Silva, J.G., Marques, J.S., Lemos, J.M.: A Geometric approach to motion tracking in manifolds. In: Paul. M.J., Van Den Hof, B.W., Weiland, S. (eds.) A Proceedings Volume from the 13th IFAC Symposium on System Identification, Rotterdam (2003). 54. Silva, J.G., Marques, J.S., Lemos, J.M.: Non-linear dimension reduction with tangent bundle approximation. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05). Conference Publications 4, pp. 85 – 88 (2005). 55. Silva, J.G., Marques, J.S., Lemos, J.M.: Selecting Landmark Points for Sparse Manifold Learning. In: Weiss, Y., Schölkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press (2006). 56. Bernstein, A.V., Kuleshov, A.P.: Tangent Bundle Manifold Learning via Grassmann & Stiefel Eigenmaps. In arXiv preprint: arXiv:1212.6031v1 [cs.LG], December 2012, pp. 1 – 25 (2012). 57. Achlioptas, D.: Random matrices in data analysis. In: Jean-Francois Boulicaut, Floriana Esposito, Fosca Giannotti, Dino Pedreschi (eds.) Proceedings of the 15th European Conference on Machine Learning. Lecture Notes in Computer Science 3201, pp. 1 - 8. Pisa: Springer Verlag (2004). 58. Tyagi, H., Vural, E., Frossard, P.: Tangent space estimation for smooth embeddings of riemannian manifold. In arXiv preprint: arXiv:1208.1065v2 [stat.CO] 17 May 2013, pp. 1 – 35 (2013). 59. Singer, A., Wu, H.: Vector Diffusion Maps and the Connection Laplacian. Comm. on Pure and App. Math. (2012). 60. Coifman, R.R., Lafon, S., Lee, A.B., Maggioni, M., Warner, F., Zucker, S.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences, 7426 – 7431 (2005). 61. Wolf, L., Shashua, A.: Learning over sets using kernel principal angles. J. Mach. Learn. Res. 4, 913 – 931 (2003). 62. Gower, J., Dijksterhuis, G.B.: Procrustes problems. Oxford University Press (2004). 63. Bengio, Y., Monperrus, M.: Non-local manifold tangent learning. In: Saul, L., Weiss, Y., Bottou L. (eds.) Advances in Neural Information Processing Systems 17, pp. 129 – 136. Cambridge, MA: MIT Press (2005). 64. Dollár, P., Rabaud, V., Belongie, S.: Non-Isometric Manifold Learning: Analysis and an Algorithm. In: Zoubin Ghahramani (ed.) Proceedings of the 24th International Conference on Machine Learning, pp. 241 – 248. 
Corvallis, OR, USA: Omni Press (2007).
65. Dollár, P., Rabaud, V., Belongie, S.: Learning to Traverse Image Manifolds. In: Schölkopf Bernhard, Platt John C., Hoffman Thomas (eds.) Advances in Neural Information Processing Systems 19, pp. 361 - 368. Cambridge, MA: MIT Press (2007). 66. He, X., Lin, B.: Tangent space learning and generalization. Frontiers of Electrical and Electronic Engineering in China 6(1), 27-42 (2011). 67. Goldberg, Y., Ritov, Y.: Local Procrustes for Manifold Embedding: A Measure of Embedding Quality and Embedding Algorithms. Machine Learning 77(1), 1 – 25 (2009). 68. Wasserman, L.: All of Nonparametric Statistics. Berlin: Springer Texts in Statistics (2007). 69. Kuleshov, A., Bernstein, A., Yanovich, Yu.: Asymptotically optimal method in Manifold estimation. In: Márkus, L., Prokaj, V. (eds.) Abstracts of the XXIX-th European Meeting of Statisticians, 20-25 July 2013, Budapest, Hungary, p. 325, http://ems2013.eu/conf/upload/BEK086_006.pdf (2013). 70. Genovese, C.R., Perone-Pacifico, M., Verdinelli I., Wasserman L.: Minimax Manifold Estimation. Journal of Machine Learning Research 13, 1263 - 1291 (2012).