Prime Discriminant Simplicial Complex

Junping Zhang, Member, IEEE, Ziyu Xie, and Stan Z. Li, Fellow, IEEE

Abstract— The structure representation of data distribution plays an important role in understanding the underlying mechanism of generating data. In this paper, we propose the prime discriminant simplicial complex (PDSC), which utilizes persistent homology to capture such structures. Assuming that each class is represented by a prime simplicial complex, we classify unlabeled samples based on the nearest projection distances from the samples to the simplicial complexes. We also extend the extrapolation ability of these complexes with a projection constraint term. Experiments on simulated and practical datasets indicate that, compared with several published algorithms, the proposed PDSC approaches achieve promising performance without losing the structure representation.

Index Terms— Object recognition, persistent homology, supervised learning, topology.

I. INTRODUCTION
STRUCTURE representation plays an important role in understanding the underlying mechanism of generating data. To capture such structures, manifold learning algorithms such as isometric mapping, locally linear embedding, Laplacian eigenmaps, and locality preserving projections [1]–[5] assume that data are generated from an underlying low-dimensional manifold. In the last decade, many manifold learning algorithms have been proposed for various applications, e.g., classification [6], [7], dimension reduction [8], [9], feature selection [10], and data visualization [11], [12]. The performance of these approaches relies on the graph construction process. For example, isometric mapping constructs a neighbor graph for a set of points in a local neighborhood and employs graph distances to generate a complete graph over all samples [1]. Locally linear embedding assumes that each high-dimensional point can be represented as a weighted sum of its neighboring points, and that this relationship should be preserved in the corresponding low-dimensional subspace [2]. Laplacian eigenmaps construct an adjacency graph from the neighborhood information of the dataset and then use spectral techniques to perform dimensionality reduction [4]. Locality preserving projection is a linear approximation of the nonlinear Laplacian eigenmaps based on
Manuscript received October 21, 2011; revised September 19, 2012; accepted September 26, 2012. Date of publication December 3, 2012; date of current version December 18, 2012. This work was supported in part by the NSFC under Grants 61273299 and 60975044, and by the 973 Program under Grant 2010CB327900.
J. Zhang and Z. Xie are with the Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China (e-mail: [email protected]; ziyu.ryan@gmail.com).
S. Z. Li is with the Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNNLS.2012.2223825
the same graph construction methods [5]. However, these graph representations are dense, sensitive to parameter settings, and less robust to data noise. To address these issues, Yan and Wang [13] proposed the ℓ1-graph, which utilizes the ℓ1 norm to induce sparseness of the weighted edges between each point and the remaining points. Chen et al. [14] proposed several algorithms utilizing the ℓ1-graph for data clustering, subspace learning, and semisupervised learning. However, it is not easy for these graph approaches to discover and preserve the topological structure hidden in the manifold. For example, Lewandowski et al. [15] pointed out that the view- and style-independent action manifolds used to describe human activities can be assumed to lie on a torus. Besides, for lack of consideration of topological structures, manifold learning is incapable of distinguishing between two similar shapes with different topologies, e.g., a cylinder-shaped data distribution and a similar one that is actually generated by a spiral with noise.

Persistent homology can effectively discover topological invariants, such as holes and tunnels, which is difficult for manifold learning algorithms [16]. The method first incrementally constructs nested families of simplicial complexes from point cloud data (PCD), and then computes the life cycle of each possible topological invariant by placing the complexes within an evolutionary growth process. Finally, it extracts the true topological invariants, i.e., the features with longer life cycles, and removes topological noise [17]. Although some efforts have been made in natural image statistics [16], [18], shape classification [19], effective coverage in sensor networks [20], and clustering analysis in stratified spaces [21], how to generalize persistent homology to supervised learning remains unsolved.

In this paper, we propose a novel method, called the prime discriminant simplicial complex (PDSC) approach, to obtain a structure-preserving representation and achieve competitive performance in supervised learning. Specifically, we generate a nested family of simplicial complexes for each class in the training phase, and estimate a prime simplicial complex per class by weighting the life cycles of the live topological structures. We classify objects in the testing phase based on the nearest projection distances from each object to these simplicial complexes. Furthermore, we utilize a projection constraint term to enhance the extrapolation ability of PDSC and prevent incorrect projections. Our main contribution is that we extend the geometrical framework of the simplicial complex to supervised learning. Moreover, we refine the construction of simplicial complexes to remove the redundant simplices for classification, and propose a recognition-oriented barcode technique to determine the PDSCs. We also propose a projection technique to calculate the nearest projection distance from the data to the prime simplicial
complexes. Experiments on several simulated and practical datasets show that, without losing the structure representation, PDSC attains competitive performance compared with several well-known algorithms.

The remainder of this paper is organized as follows. In Section II, we give a brief survey of persistent homology and introduce some preliminaries of the simplicial complex. In Section III, we detail the proposed PDSC algorithm. We evaluate the performance of PDSC in Section IV and conclude the paper in Section V.

II. RELATED WORK AND PRELIMINARIES

In this section, we survey the development of persistent homology and introduce some preliminaries related to the simplicial complex.

A. Related Work

Persistent homology has been developed to discover stable topological invariants from PCD. To achieve this goal, there are three crucial steps [17]: 1) selecting a subset represented by finite simplicial complexes; 2) measuring the importance of these subsets; and 3) eliminating the topological attributes (such as voids and holes) with the minimum number of side effects. de Silva and Carlsson [16] and Carlsson et al. [18] investigated the influence of the sampling technique on the estimation of topological invariants. With persistent homology and a sampling strategy, they discovered that image patches with edges lie in a Klein-bottle-shaped space [16], [18].¹ Oudot et al. [22] proposed using geodesic Delaunay triangulation to reduce the number of samples needed to effectively capture the topology of PCD. Zomorodian and Carlsson [23] employed the barcode technique to measure the importance of topological attributes. Assuming that the dimension of the points is N, a barcode is a collection of N + 1 k-graphs; each bar in the k-graphs indicates the existence of a k-simplex. Barcodes are somewhat analogous to the Betti numbers.² Furthermore, Collins et al. [19] applied persistent homology to extract topological features from character-shape PCD. de Silva and Ghrist [20] studied the smallest-coverage issue in sensor networks based on persistent homology. Assuming that stratified spaces consist of multiple manifolds or nonmanifolds, each of which may have a different dimension, Bendich [25] generalized the computation of persistent homology to that of intersection homology for better analysis of stratified spaces. Moreover, Bendich et al. [21] clustered data points into different stratified spaces using methods derived from kernel and cokernel persistent homology. Adler et al. [26] investigated the persistent homology of random fields and manifold learning. A major difficulty is that it is not easy to fill the gap between persistent homology and practical applications.

¹A real Klein bottle, which can only exist in 4-D, is an open object with no inside and no outside.
²Betti numbers are topological invariants of a finite complex. For example, the first Betti number b1 = 1 means the complex has only one hole. See [24] for more details.
Fig. 1. Distinction between (a) a non-face-to-face complex and (b) a simplicial complex.
B. Preliminaries

If σ is the convex hull of k + 1 affinely independent points, we call σ a k-simplex of dimension k, dim(σ) = k. For 0 ≤ k ≤ 3, a k-simplex is called a vertex, an edge, a triangle, or a tetrahedron, respectively. A simplicial complex K is a set of finite sets such that, if σ ∈ K and τ ⊆ σ, then τ ∈ K. Here σ is a simplex, τ is a face of σ, and σ is a coface of τ [24]. A simplex is maximal if it has no proper coface in K. A simplicial complex K may be embedded in Euclidean space as the union of its geometrically realized simplices such that they intersect only along shared faces. An illustration of the distinction between a simplicial complex and a non-face-to-face complex is shown in Fig. 1.

Furthermore, to discover the topological structure of point cloud data S, it is necessary to construct simplicial complexes K that approximate the space X from which S is sampled. Several different complexes appear in the literature, including the Čech [27], Vietoris–Rips [28], and Witness [16] complexes. Specifically, let Y be a metric space with metric d : Y × Y → R; then we have the following definitions.

Čech Complex: Let B_ε(x) = {y ∈ Y | d(x, y) < ε} be the open ball of radius ε centered at x ∈ Y, with ε ∈ R. Given S ⊆ Y and ε ∈ R, we center an ε-ball at each point to obtain a cover U_ε = {B_ε(x) | x ∈ S}. The Čech complex C_ε(S) is the nerve of this cover.

In practice, the Čech complex is hard to compute; the Vietoris–Rips complex is a practical approximation of the Čech complex and is much easier to construct.

Vietoris–Rips Complex: Unlike the previous complex, it is based on a graph instead of a cover. Given S ⊆ Y and ε ∈ R, let G_ε = (S, E_ε) be the ε-neighborhood graph on S, where E_ε = {{u, v} | d(u, v) ≤ ε, u ≠ v ∈ S}. The Vietoris–Rips complex V_ε(S) is the clique complex of the ε-neighborhood graph. Here, the clique complex, also called the flag complex, has the maximal cliques of a graph as its maximal simplices. We refer to the Vietoris–Rips complex as the Rips complex for brevity.

To save storage space, the Rips complex V_ε(S) stores only the edges and vertices, and forms the largest simplicial complex that has the same 1-skeleton as the Čech complex C_ε(S). However, both methods produce a very large number of simplices, especially for large-scale PCD. To improve efficiency, de Silva and Carlsson [16] proposed the witness complex, which selects a group of landmark points and utilizes the remaining points as witnesses of the existence of simplices.

Witness Complex: Given S ⊆ Y, a weak witness w ∈ Y is closer to the points in σ ⊆ S than to S − σ, witnessing the
creation of a Delaunay simplex σ. Let L ⊆ S be the set of landmarks, and let the remaining points W = S − L be the set of potential witnesses. For ε ∈ R, the ε-witness graph is the graph G_ε = (L, E_ε), where {l1, l2} ∈ E_ε if there exists a weak witness w ∈ W that is closer to l_i than to any other landmark and satisfies d(w, l_i) ≤ ε for i = 1, 2. The witness complex W_ε(S) is the clique complex of this ε-witness graph. Note that there is also a "lazy" version of the witness complex, namely, the Lazywitness complex, which has the same 1-skeleton as W_ε(S). We will introduce it in Section III.

III. PDSC

In this section, we propose the PDSC for supervised learning. For better understanding, we detail the PDSC algorithm in three parts: 1) the construction of simplicial complexes; 2) the selection of prime simplicial complexes; and 3) classification based on PDSC.

A. PDSC

As mentioned above, there are several methods to construct simplicial complexes. For efficiency, in this paper we utilize both Rips complexes and Lazywitness complexes to construct the simplicial complexes on PCD. When the number of training samples is small, which is very common in supervised learning, we choose the Rips complex method to obtain the nested families of simplicial complexes.

Given the PCD S = {v0, v1, . . . , v_{n−1}} and a scalar ε, we first compute the n × n distance matrix D and then construct the Rips complex V(S, ε) based on the following rule [16]: a k-simplex σ = [v0 v1 ⋯ v_k] belongs to V(S, ε) iff ‖v_i − v_j‖ ≤ ε for all edges [v_i v_j], 0 ≤ i < j ≤ k. This implies that, when [v0 v1 ⋯ v_{k−1}] and [v1 v2 ⋯ v_k] occur, all the edges [v_i v_j] (0 ≤ i < j ≤ k), except [v0 v_k], have already been checked. Hence we can generate higher dimensional cells inductively by just checking the lower dimensional triple set {[v1 ⋯ v_k], [v0 ⋯ v_{k−1}], [v0 v_k]}.

If a k-simplex is not a face of any (k + 1)-simplex in the same complex, we call it a relatively highest dimensional simplex (RHDS). It is worth noting that, since the faces of an RHDS have been implicitly checked in our PDSC approach, we focus only on the RHDSs themselves. To avoid repeated computation, we propose to remove their faces from the simplex set when constructing the simplicial complex. A pseudocode based on Rips complexes is shown in Algorithm 1.

Algorithm 1: Construct the Simplicial Complex V(S, ε) Using Rips Complexes
Input: data points S and radius ε
Output: the edges and vertices of V(S, ε)
1: Compute the n × n distance matrix D.
2: Consider every pair (i, j), where 0 ≤ i < j < n.
3: if D(i, j) ≤ ε then
4:   Add [v_i v_j] to V(S, ε).
5:   Remove [v_i], [v_j] from V(S, ε).
6: end if
7: Generate higher dimensional cells inductively: the k-simplex [v_{a0} v_{a1} ⋯ v_{ak}] occurs iff the three lower dimensional simplices [v_{a1} ⋯ v_{ak}], [v_{a0} ⋯ v_{a(k−1)}], and [v_{a0} v_{ak}] all occur.
8: Remove those lower dimensional triple sets from V(S, ε).
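For concreteness, the following is a minimal Python sketch of Algorithm 1. It is an illustrative reimplementation, not the authors' released MATLAB code, and the function and variable names (rips_complex, max_dim, etc.) are our own. It builds the 1-skeleton from the distance matrix, grows higher dimensional simplices with the inductive triple-set check, and finally keeps only the RHDSs.

```python
import numpy as np
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """Sketch of Algorithm 1: Rips complex V(S, eps) up to max_dim.

    points is an (n, d) array; simplices are returned as sorted index tuples.
    """
    n = len(points)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # 1-skeleton: an edge [v_i v_j] occurs iff D(i, j) <= eps.
    edges = {(i, j) for i, j in combinations(range(n), 2) if D[i, j] <= eps}
    simplices = {(i,) for i in range(n)} | edges

    # Inductive step: (a0,...,ak) occurs iff (a0,...,a_{k-1}),
    # (a1,...,ak), and the edge (a0, ak) all occur.
    prev = edges
    for _ in range(2, max_dim + 1):
        nxt = set()
        for s in prev:                           # s = (a0, ..., a_{k-1})
            for ak in range(s[-1] + 1, n):
                if s[1:] + (ak,) in prev and (s[0], ak) in edges:
                    nxt.add(s + (ak,))
        simplices |= nxt
        prev = nxt

    # Keep only relatively highest dimensional simplices (RHDS):
    # drop every simplex that is a proper face of a higher one.
    return {s for s in simplices
            if not any(set(s) < set(t) and len(t) == len(s) + 1
                       for t in simplices)}
```

Applied, for example, to points sampled from a circle, the sketch recovers a ring of edges (and small triangles between near neighbors) for a suitable eps; the RHDS filter mirrors step 8's removal of redundant faces.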
As the number of samples increases and the dimension of the data space becomes large, this construction becomes extremely inefficient. To solve this problem, we introduce the witness complexes, which behave like Delaunay triangulations computed in the intrinsic geometry of the dataset S. Specifically, we first select a subset L = {l1, . . . , l_p} ⊂ S as the vertex set by using a sampling technique such as maxmin sampling or random sampling. The maxmin sampling method randomly extracts one point from the point cloud as the first vertex, and iteratively selects the next p − 1 points so as to maximize the minimal distances between them and the previously chosen vertices. In this way, the method generates a set of vertices uniformly distributed around the data structure [16]. They can be regarded as landmark points. We then utilize the remaining points as the witness points {w1, w2, . . . , w_q} to determine which simplices will occur in the complex [16].

More formally, let D be a p × q distance matrix, where q denotes the number of witness points. Each element D(i, j) measures the distance between the landmark point l_i and the witness point w_j. To discover persistent topological invariants from PCD, we construct a nested family of simplicial complexes W(L; ε, f), where ε is the radius of the metric ball and f is a nonnegative integer. We then employ two rules to determine which simplices should be added to the complex [16].
1) A 1-simplex σ = [l_a l_b] is added to W(L; ε, f) iff there exists a witness w_j (1 ≤ j ≤ q) satisfying max(D(a, j), D(b, j)) ≤ ε + m_j.
2) A k-simplex [l_{a0} l_{a1} ⋯ l_{ak}] is added to W(L; ε, f) iff there exists a witness w_j (1 ≤ j ≤ q) satisfying max(D(a0, j), D(a1, j), . . . , D(ak, j)) ≤ ε + m_j.
Note that, if f = 0, we define m_j = 0 for all j = 1, 2, . . . , q; otherwise, m_j is the f-th smallest entry of the j-th column of matrix D. The lazy version differs only in the second rule: a k-simplex [l_{a0} l_{a1} ⋯ l_{ak}] is added to W(L; ε, f) iff all of its edges belong to W(L; ε, f). To simplify the algorithm, we introduce a matrix E that records the time of appearance of each edge:

E(i, j) = \min_k \big[ \max(D(i, k), D(j, k)) - m_k \big].    (1)
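Returning to the landmark selection step, maxmin sampling is essentially farthest-point sampling; the following is a small illustrative sketch (names are ours, and the random first vertex follows the description above).

```python
import numpy as np

def maxmin_landmarks(X, p, seed=0):
    """Sketch of maxmin sampling: pick p landmarks spread over the PCD X."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]            # random first vertex
    d = np.linalg.norm(X - X[idx[0]], axis=1)    # distance to the chosen set
    for _ in range(p - 1):
        nxt = int(np.argmax(d))                  # farthest from the chosen set
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)
```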
A pseudocode based on Lazywitness complexes is shown in Algorithm 2.

Algorithm 2: Construct the Simplicial Complex W(L; ε, f) Using Lazywitness Complexes
Input: data points S, radius ε, the landmark ratio r, and the family f
Output: the edges and vertices of W(L; ε, f)
1: Choose p landmark points and q witness points, where p = n · r and q = n − p.
2: Compute the p × q matrix D of distances.
3: Compute the p × p matrix E, which records the time when edge [l_i l_j] appears.
4: Consider every pair (i, j), where i < j ≤ p.
5: if E(i, j) ≤ ε then
6:   Add [l_i l_j] to W(L; ε, f).
7:   Remove [l_i], [l_j] from W(L; ε, f).
8: end if
9: Generate higher dimensional cells inductively: the k-simplex [l_{a0} l_{a1} ⋯ l_{ak}] occurs iff the three lower dimensional simplices [l_{a1} ⋯ l_{ak}], [l_{a0} ⋯ l_{a(k−1)}], and [l_{a0} l_{ak}] all occur.
10: Remove those lower dimensional triple sets from W(L; ε, f) after the merging procedure.
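As a companion sketch (again illustrative, with assumed names), the matrix E of (1) can be computed directly from the landmark-to-witness distances; an edge then enters W(L; ε, f) as soon as E(i, j) ≤ ε, and the lazy rule adds a higher simplex once all of its edges are present.

```python
import numpy as np

def lazywitness_edge_times(D, f=1):
    """Sketch of (1): edge-appearance matrix E for the Lazywitness complex.

    D is the p x q matrix of landmark-to-witness distances; m_k is 0 when
    f = 0, otherwise the f-th smallest entry of witness column k.
    """
    p, q = D.shape
    m = np.zeros(q) if f == 0 else np.sort(D, axis=0)[f - 1]
    E = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            # E(i, j) = min_k [ max(D(i, k), D(j, k)) - m_k ]
            E[i, j] = E[j, i] = np.min(np.maximum(D[i], D[j]) - m)
    return E
```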
B. Selecting PDSCs

To utilize persistent homology for supervised learning, we propose the concept of PDSCs: we construct a supervised-learning-oriented barcode and select the PDSC from it. Specifically, we construct a filtered simplicial complex from the PCD by increasing ε from 0 to ∞ (from here on we use the parameter R in place of ε). The filtered complex is an increasing sequence of simplicial complexes that determines an inductive system of homology groups [23]. For our purpose, we discover that this sequence contains a proper complex, named the PDSC, which is useful for supervised learning. The PDSC is a relatively stable complex from which we can capture the homology of the data's topological structure. For better understanding, an example is shown in Fig. 2.

Once the filtered simplicial complex is constructed, we use the barcode technique to record the lifetime of each simplex in the complex as the parameter R increases from 0 to Rmax. An example is shown in Fig. 3, where we only consider the simplices still alive when R = Rmax.³

Obviously, it is hard to find the best prime simplicial complex in the sequence because it is not easy to determine the parameter R. In supervised learning, cross-validation is a reasonable way to select such a parameter. In our experiments, however, we find that cross-validation is not helpful in selecting R. The reason is that our algorithm constructs a simplicial complex for each class; as a result, the prime simplicial complexes of different classes will have different values of R. Moreover, even if we confirm R by cross-validation on each class, the selected values are still not fit for all the classes because we cannot construct the simplicial complex on the whole training set during the cross-validation procedure. Therefore, we propose an alternative way to select the radius R* based on the weighted life cycles

R^* = \frac{\sum_{i=1}^{m} \ell_i M_i}{\sum_{i=1}^{m} \ell_i}    (2)

where m is the number of simplices, ℓ_i is the length of the i-th barcode, and M_i is the radius corresponding to the median of ℓ_i. Intuitively, the shorter the life cycle, the more unstable the corresponding simplex, and the less influence it has on the determination of a stable and prime simplicial complex. Formally, let the lengths of the shorter life cycles be ℓ_{A,i} (i = 1, . . . , s) and those of the others be ℓ_{B,j} (j = 1, . . . , s′), with s + s′ = m; then we can rewrite (2) as

R^* = \frac{\sum_{i=1}^{s} \ell_{A,i} M_i + \sum_{j=1}^{s'} \ell_{B,j} M_j}{\sum_{i=1}^{s} \ell_{A,i} + \sum_{j=1}^{s'} \ell_{B,j}}
    = \frac{\sum_{i=1}^{s} \ell_{A,i} M_i}{\sum_{i=1}^{s} \ell_{A,i} + \sum_{j=1}^{s'} \ell_{B,j}} + \frac{\sum_{j=1}^{s'} \ell_{B,j} M_j}{\sum_{i=1}^{s} \ell_{A,i} + \sum_{j=1}^{s'} \ell_{B,j}}.    (3)

When ℓ_{A,i} ≪ ℓ_{B,j} for all the life cycles, (3) can be approximated by

R^* \approx \frac{\sum_{j=1}^{s'} \ell_{B,j} M_j}{\sum_{j=1}^{s'} \ell_{B,j}}    (4)

which indicates that the prime simplicial complex is far less sensitive to the simplices with shorter life cycles.

³Note that our barcode is different from that in [26]. Although the barcode technique in [26] is a good way to describe persistent homology by recording the birth and death times of topological invariants, only the live simplices are useful for our proposed PDSC algorithm.

C. Classifying Objects Based on PDSC

Assuming that the data distribution of each class is represented by a PDSC, we attempt to classify unlabeled samples by projecting them onto the faces of the PDSC. In this way, we can avoid projecting them into holes and voids that may exist in the topological structures; such holes and voids would lead to incorrect projections and impair the prediction accuracy of supervised learning.

Specifically, let σ_i (i = 1, 2, . . . , m) be a k-simplex with vertices {v0, v1, . . . , v_k}. Then the projection position x_p of a sample x is defined as a linear combination of the vertices of the simplex

x_p = \sum_{i=0}^{k} \lambda_i v_i, \quad \text{where } \sum_{i=0}^{k} \lambda_i = 1    (5)

and λ_i is the weight value. Taking a 2-simplex as an example, the weights equal

\begin{bmatrix} \lambda_0 \\ \lambda_1 \end{bmatrix} = (B^T B)^{-1} B^T (x - v_2), \quad \lambda_2 = 1 - \lambda_0 - \lambda_1    (6)

where B = [v0 − v2, v1 − v2]. An example is shown in the left plot of Fig. 4, where the sample point is projected into the interior of the 2-simplex. As for a 1-simplex, the weights equal

\lambda_i = \begin{cases} \dfrac{(x - v_0)^T (v_1 - v_0)}{(v_1 - v_0)^T (v_1 - v_0)}, & i = 0 \\ 1 - \lambda_0, & i = 1. \end{cases}    (7)

If the projection index satisfies 0 ≤ λ_i ≤ 1, the projection position lies inside the face; otherwise, it lies outside. For λ_i > 1 or λ_i < 0, on one hand, this can lead to an incorrect projection for distant points; on the other hand, it provides a forward and backward extrapolation along a face when the number of training samples is small.
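A minimal sketch of the projections (5)–(7) is given below. This is our own illustrative code, not the released implementation; verts and the returned values are assumed names, and we adopt the convention that the returned barycentric weights make x_p in (5) the orthogonal projection onto the simplex's affine hull (for a 1-simplex, the coordinate t along the edge is the projection index used later in (8)).

```python
import numpy as np

def project_to_simplex(x, verts):
    """Sketch of (5)-(7): project x onto the affine hull of a 1- or
    2-simplex with vertex array verts (shape (k+1, d), k in {1, 2}).
    Returns the barycentric weights lambda (summing to 1) and x_p."""
    verts = np.asarray(verts, dtype=float)
    if len(verts) == 2:                          # 1-simplex, cf. (7)
        v0, v1 = verts
        t = (x - v0) @ (v1 - v0) / ((v1 - v0) @ (v1 - v0))
        lam = np.array([1.0 - t, t])             # t is the projection index
    else:                                        # 2-simplex, cf. (6)
        v0, v1, v2 = verts
        B = np.column_stack([v0 - v2, v1 - v2])  # B = [v0 - v2, v1 - v2]
        lam01 = np.linalg.solve(B.T @ B, B.T @ (x - v2))
        lam = np.append(lam01, 1.0 - lam01.sum())
    return lam, lam @ verts                      # x_p as in (5)
```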
Fig. 2. Construction of a simplicial complex through metric balls with a radius R. (a) A good choice of R induces a PDSC that captures the homology of an annulus from the union of balls. (b) and (c) The union of balls with an incorrect radius induces an incorrect structure representation.
Fig. 3. (a) Construction of a simplicial complex for circle-shaped data; the red dotted lines and the blue regions denote 1-simplices and 2-simplices, respectively. (b) Each barcode of the simplices starts at a specific R value and ends at Rmax, which determines when to stop the computation of the barcode; here Rmax is set to 1. Simplices that have disappeared are not shown.
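Referring back to Section III-B, the weighted-lifecycle rule (2), applied to barcodes like those in Fig. 3, reduces to a weighted average. Below is a small illustrative sketch under our reading of the definitions: M_i is taken as the radius at the midpoint (median) of bar i, and only simplices alive at Rmax are kept, with deaths truncated there.

```python
import numpy as np

def select_prime_radius(bars, r_max):
    """Sketch of (2): R* from the (birth, death) bars of live simplices.

    Assumes M_i is the radius at the midpoint of bar i and that deaths
    are truncated at r_max (only bars alive at r_max are passed in).
    """
    bars = np.asarray(bars, dtype=float)
    births = bars[:, 0]
    deaths = np.minimum(bars[:, 1], r_max)
    lengths = deaths - births            # life cycles l_i
    mids = 0.5 * (births + deaths)       # M_i
    return float((lengths * mids).sum() / lengths.sum())

# Long-lived bars dominate the average: a short noisy bar barely moves
# R*, which is the behavior the approximation (4) describes.
```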
To make a compromise between preventing the data from being incorrectly projected outside the face and preserving the extrapolation ability of the topological structure, we introduce a parameter γ and compute the projection position and the corresponding projection distance as follows:

x_p = \begin{cases} v_i + (1 + \gamma)(v_j - v_i), & \text{if } \lambda_i \geq 1 + \gamma \\ v_i - \gamma (v_j - v_i), & \text{if } \lambda_i \leq -\gamma \end{cases}    (8)

where v_i and v_j denote two different vertices of a simplex. An example of the projection distances between a sample and a 1- or 2-simplex is shown in Fig. 4. Then the distance between a sample x and the simplicial complex of the c-th class is

d_{\mathrm{PDSC}}(x \mid SC_c) = \min_i \, (x - x_p^{i,c})^T A \,(x - x_p^{i,c}), \quad i = 1, \ldots, m; \; c = 1, \ldots, C    (9)

where m denotes the number of simplices in the complex, C is the number of classes, and A is a nonnegative matrix. Finally, we classify the sample to the class that has the nearest simplicial complex distance to it:

C(x) = \arg\min_c \, d_{\mathrm{PDSC}}(x \mid SC_c), \quad c = 1, \ldots, C.    (10)

For better understanding, let us consider an illustrative example. Given 20 points belonging to two classes, we construct a Rips complex that has only 1-simplices for each class. A test point is then classified to class 2 based on the nearest projection distance; the result is shown in Fig. 5.

Note that the matrix A can be obtained by metric learning, which is beyond the scope of this paper. Here we set it to be either an identity matrix or an inverse covariance matrix: the former is equivalent to the Euclidean distance, while the latter leads to the classical Mahalanobis distance, and we name the corresponding variant PDSC-M. Moreover, a MATLAB implementation of the proposed PDSC is available at http://www.iipl.fudan.edu.cn/~zhangjp/sourcecode/PDSC.zip.
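Putting the pieces together, a hedged sketch of the decision rule (8)–(10) follows. It reuses project_to_simplex from the earlier sketch, clips the projection index to [−γ, 1 + γ] for 1-simplices as in (8), and takes A as the identity (PDSC) or an inverse covariance matrix (PDSC-M). All names are ours, not the released implementation's.

```python
import numpy as np

def pdsc_classify(x, class_complexes, gamma=0.0, A=None):
    """Sketch of (8)-(10): assign x to the class whose prime simplicial
    complex is nearest by projection distance. class_complexes[c] is a
    list of simplices, each a vertex array accepted by project_to_simplex."""
    M = np.eye(len(x)) if A is None else A       # identity or inverse covariance
    dists = []
    for simplices in class_complexes:            # one entry per class c
        best = np.inf
        for verts in simplices:
            lam, x_p = project_to_simplex(x, verts)
            if len(verts) == 2:
                # constraint (8): clip the projection index to [-gamma, 1+gamma]
                t = np.clip(lam[1], -gamma, 1.0 + gamma)
                x_p = verts[0] + t * (verts[1] - verts[0])
            diff = x - x_p
            best = min(best, float(diff @ M @ diff))   # eq. (9)
        dists.append(best)
    return int(np.argmin(dists))                       # eq. (10)
```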
D. Importance of Preserving the Topology

Why do we emphasize the importance of preserving the topological structure? What role does the topology play in PDSC? The answers stem from the intrinsic geometric structure of many real-world data. For instance, faces, handwritten digits, speech data, and web pages are assumed to be such that each class lies on a low-dimensional manifold embedded in a higher dimensional space [6]. Since it is impossible to exactly depict the intrinsic manifold from discrete data points, topological properties (such as homeomorphism), invariances, and topological features (such as the number of tunnels or voids) are alternative ways to describe it. In summary, preserving the topology has the following advantages.
1) Persistent homology bridges the gap between discrete and continuous structures, as it calculates the persistent life cycles of the simplices [23], [25], [26]. This means that keeping the topology can generate a multiscale model with more usable information for supervised learning or other tasks.
2) Persistent homology omits short-life-cycle features, which makes the generated topological model robust to noisy data. Keeping the topology thus makes the model more stable on noisy datasets than many graph-based manifold learning algorithms.
3) The topological structure can lead to a simplified geometric representation of the datasets being analyzed. For example, Ge et al. [29] proposed a simple framework to extract a 1-D skeleton from data. Although such a 1-D skeleton is not extracted in this paper, PDSC still obtains a nice representation, which is beyond the capacity of some learning algorithms [such as the support vector machine (SVM)]. In another sense, keeping the topology provides PDSC with good interpretability.
We support these views in Section IV.
Fig. 4. (a) Sample point x projected into the inside of a 2-simplex. (b) Sample point x orthogonally projected outside a 2-simplex but within a 1-simplex formed by the extended vertices.
Fig. 5. PCD consisting of two classes. We construct a simplicial complex that has only 1-simplices per class. The test point is then projected onto the edges, avoiding the hole in each complex. We can then label it as class 2 because the dotted-line simplicial complex is nearer.
Some recent works also explore the use of topology in practical applications, such as face identification across ages [30]. In [30], Bouchaffra applied an α-shape⁴ geometric constructor to build hierarchical structures assigned to dynamic Bayesian networks (DBNs). The generation of several different shapes from the same DBN representing an individual's face allows the exploration of several deformations of the same human face. The topological structure supplies additional information that increases the identification performance.

⁴α-shape is the underlying space corresponding to the α-complex. Refer to [31] for more details about α-shapes.

IV. EXPERIMENTS

In this section, we first show the differences between our approach and one of the manifold-learning graph constructions. Then we perform experiments on: 1) five simulated datasets; 2) four multiview datasets; 3) two face datasets; and 4) eight UCI benchmark datasets [32], to evaluate the performance of the PDSC approach. The latter 10 datasets are listed in Table I.
TABLE I
Description of several benchmark datasets. # and Dim denote the number of samples and the dimension, respectively. C denotes the number of classes, and RA denotes the ratio of the number of training samples in each dataset or the number of training samples versus that of test samples. The latter means that the training set and test set have been separated by their provider.

Datasets                | #      | Dim    | C  | RA
ORL                     | 400    | 10 304 | 40 | 0.5
UMIST                   | 575    | 10 304 | 20 | 0.5
Iris                    | 150    | 4      | 3  | 0.5
Landsat satellite       | 6335   | 36     | 2  | 0.1
Image segmentation      | 2310   | 16     | 7  | 210/2100
Gaussian Elena          | 5000   | 8      | 2  | 0.5
Breast cancer Wisconsin | 569    | 31     | 2  | 0.5
Phoneme                 | 5404   | 5      | 2  | 0.1
Pendigits               | 10 992 | 17     | 10 | 7494/3498
Optdigits               | 5620   | 65     | 10 | 3823/1797
Finally, we analyze the ability of a simplicial complex to generate virtual samples and preserve the topological structure.

The five simulated datasets are generated from different topological structures plus random noise with variance ρ as follows:
D1) two concentric circles in 2-D (ρ = 1.0);
D2) two spirals in 2-D (ρ = 3.5);
D3) circle-cross-circle in 3-D (ρ = 2.0);
D4) four circle-cross-circle in 3-D (ρ = 2.0);
D5) sphere-cross-sphere in 4-D (ρ = 1.5), as shown in Fig. 6.
Each dataset includes two classes, each of which has 500 training samples and 500 test samples without overlap. Their functional forms are summarized in Table II. We use the maxmin sampling strategy to select 50% of the training samples as the landmark points [16] and the remaining samples as the witness points to construct the prime simplicial complexes. Some examples of these complexes are illustrated in Fig. 6.
Fig. 6. (a)–(e) The D1 to D5 datasets. In each panel, the red dotted lines and the blue regions denote 1-simplices and 2-simplices, respectively. The test sets are generated from the same distributions. (f) Since the fifth dataset cannot be shown correctly in 3-D space, we plot a 2-D figure for each axis pair.
TABLE II
Functional forms of the five simulated datasets.

Two Concentric Circles (ρ = 1):
  Class 1: x1 = 2cos(t), y1 = 2sin(t), t ∈ [0, 2π)
  Class 2: x2 = 3cos(s), y2 = 3sin(s), s ∈ [0, 2π)
Two Spirals (ρ = 3.5):
  Class 1: x1 = t·cos(t), y1 = t·sin(t), t ∈ [π/2, 3π)
  Class 2: x2 = s·cos(s + π), y2 = s·sin(s + π), s ∈ [π/2, 3π)
Two Circles (ρ = 2):
  Class 1: x1 = 0, y1 = 2cos(t), z1 = 2sin(t), t ∈ [0, 2π)
  Class 2: x2 = 2cos(s), y2 = 2 + 2sin(s), z2 = 0, s ∈ [0, 2π)
Four Circles (ρ = 1):
  Class 1: x1 = 0, y1 = 2cos(t), z11 = 2sin(t), z12 = 4 + 2sin(t), t ∈ [0, 2π)
  Class 2: x2 = 2cos(s), y2 = 0, z21 = 6 + 2sin(s), z22 = −2 + 2sin(s), s ∈ [0, 2π)
Two Balls (ρ = 1.5):
  Class 1: x1 = 2cos(s)cos(t), y1 = 2sin(s), z1 = 0, w1 = 2cos(s)sin(t), s, t ∈ [0, 2π)
  Class 2: x2 = 2cos(s)sin(t), y2 = 2cos(s)cos(t), z2 = 2sin(s), w2 = 0, s, t ∈ [0, 2π)
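As a concrete illustration of Table II, a generator for D1 might look like the following sketch. The paper specifies only "random noise with variance ρ", so the uniform sampling of the angle parameters and the additive Gaussian noise here are our assumptions, as are the function and variable names.

```python
import numpy as np

def two_concentric_circles(n=500, rho=1.0, seed=0):
    """Sketch of the D1 generator in Table II: two noisy concentric circles."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0, 2 * np.pi, n)
    s = rng.uniform(0, 2 * np.pi, n)
    c1 = np.c_[2 * np.cos(t), 2 * np.sin(t)]      # Class 1, radius 2
    c2 = np.c_[3 * np.cos(s), 3 * np.sin(s)]      # Class 2, radius 3
    noise = lambda: rng.normal(scale=np.sqrt(rho), size=(n, 2))
    X = np.vstack([c1 + noise(), c2 + noise()])
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y
```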
From the figures, we can see that the PDSC approaches preserve the structure representation well. It is noticeable that the data points not covered by the complexes are actually the witness points (see Section II-B for more details). These witness points never appear in the witness complex, but they are necessary for constructing it.

The four practical multiview datasets used for object recognition are COIL-20 [33], COIL-100 [34], SOIL-47A, and SOIL-47B [35]. The COIL-20 dataset consists of 20 objects, each of which is imaged at size 128 × 128 from 72 different views sampled every 5° around an axis passing through the object. We subsample the images to 32 × 32. The COIL-100 dataset has 100 objects and is collected in the same way as COIL-20. We subsample each object image to a colored 16 × 16 version (COIL-100A for short) and a gray 32 × 32 version (COIL-100B for short). The SOIL-47A and SOIL-47B datasets are sampled at different illumination levels [35]. Each dataset consists of 47 objects with 21 different views sampled every 9° around an axis passing through the object. Each object image is subsampled to a colored image of size 24 × 30. All images directly serve as the feature vectors. Some objects in the three practical datasets are shown in Fig. 7.
Fig. 7. Examples of the (a) COIL-20 [33], (b) COIL-100 [34], and (c) SOIL-47 [35] benchmark datasets.
Fig. 8. (a) 3-D S-curve (1000 samples). (b) Swiss roll with a hole (800 samples). From left to right: original data, witness complex, and 3-NN ISOMAP graph construction.
The two face datasets we used are the UMIST face dataset [36] and the ORL face dataset [37]. The UMIST face dataset is a multiview dataset for testing the robustness of our approach, and the ORL dataset is another popular benchmark dataset for face recognition. It is worth mentioning that our object recognition is at an instance level, i.e., all the data points in a dataset belong to the same category; it is not in the sense of the visual object classes challenges.

We compare the performance of our approaches with the 1-nearest neighbor (1-NN), 3-NN, SVM with linear kernels (SVM-L), and SVM with Gaussian radial basis function kernels (SVM-G) [38]. The parameters in SVM are tuned by cross-validation. The whole training set is used by these approaches. Unlike the other mentioned approaches, which classify in the original space, SVM-G actually searches for an optimal classification hyperplane in an implicit nonlinear space and thus loses its interpretability with respect to structure representation.

A. PDSC Versus ISOMAP Graph Construction

By constructing a simplicial complex, our approach obtains a good representation of the data structure. To illustrate this point, we construct two witness complexes on an S-curve dataset and a Swiss-roll dataset with a hole in the structure. We also construct a 3-NN graph on the two datasets by employing the ISOMAP graph construction technique [1]. From Fig. 8, we can see that, unlike the ISOMAP graph, our representation preserves the intrinsic topologies in the datasets, such as the hole.

B. Simulated Datasets and Parameters' Influences

We investigate the influence of f on the five simulated datasets. Given f = 0, 1, 2, Rmax = 0.5, and γ = 0, the average results of 20 repetitions are shown in Table III. From Table III, we can see that the performance of the proposed approaches with f = 2 is better in most cases. A possible reason is that, as pointed out in [16], f = 2 provides a clean persistent interval graph with little noise and therefore leads to a more stable structure representation. Note that, in practical noisy environments, such graphs cannot be easily obtained. Meanwhile, f = 1 can be interpreted as arising from a family of coverings of the space X with Voronoi-like regions surrounding each landmark point. We thus set the parameter f to be 1 or 2 in the subsequent experiments. Note that, in the small-training-sample case shown in the next subsection, we use the Rips complex, which does not need the parameter f, for object recognition. Furthermore, we discover
TABLE III
Experiment I: the influence of f on the classification performance on the five simulated datasets. Experiment II: the influence of Rmax and comparison with other algorithms. The results are the average of 20 repetitions; A ± B means average error rate and standard deviation (%).

Experiment I (Rmax = 0.5, γ = 0)
                   | D1           | D2           | D3           | D4           | D5
PDSC (f = 0)       | 4.24 ± 0.68  | 13.38 ± 1.18 | 8.09 ± 1.03  | 9.08 ± 1.05  | 3.85 ± 0.56
PDSC-M (f = 0)     | 4.40 ± 0.85  | 13.27 ± 1.10 | 9.52 ± 0.92  | 13.29 ± 1.55 | 7.62 ± 1.04
PDSC (f = 1)       | 3.81 ± 0.59  | 11.70 ± 1.09 | 6.01 ± 0.74  | 6.94 ± 0.82  | 2.97 ± 0.57
PDSC-M (f = 1)     | 3.83 ± 0.59  | 11.66 ± 1.13 | 6.54 ± 0.79  | 8.53 ± 1.00  | 5.15 ± 0.92
PDSC (f = 2)       | 3.87 ± 0.81  | 10.68 ± 1.05 | 6.19 ± 0.70  | 6.55 ± 0.72  | 2.89 ± 0.73
PDSC-M (f = 2)     | 3.79 ± 0.76  | 10.63 ± 1.06 | 6.69 ± 0.63  | 7.79 ± 1.05  | 4.29 ± 0.84

Experiment II (f = 2, γ = 0)
                   | D1           | D2           | D3           | D4           | D5
PDSC: Rmax = 0.5   | 3.41 ± 0.51  | 11.70 ± 0.62 | 5.77 ± 0.42  | 6.58 ± 0.57  | 2.68 ± 0.70
PDSC-M: Rmax = 0.5 | 3.43 ± 0.52  | 11.67 ± 0.70 | 6.23 ± 0.49  | 8.06 ± 0.76  | 4.36 ± 1.01
PDSC: Rmax = 1.0   | 3.83 ± 0.74  | 10.60 ± 0.76 | 6.18 ± 0.46  | 6.77 ± 0.70  | 2.42 ± 0.67
PDSC-M: Rmax = 1.0 | 3.78 ± 0.74  | 10.59 ± 0.76 | 6.46 ± 0.52  | 7.60 ± 0.89  | 3.42 ± 0.88
1-NN               | 4.58 ± 0.53  | 13.24 ± 0.80 | 7.30 ± 0.88  | 7.88 ± 0.71  | 3.13 ± 0.86
3-NN               | 4.20 ± 0.63  | 11.74 ± 0.87 | 6.65 ± 0.76  | 7.09 ± 0.73  | 2.83 ± 0.77
SVM-L              | 48.46 ± 0.45 | 40.45 ± 0.66 | 32.05 ± 1.23 | 40.26 ± 4.53 | 18.88 ± 1.36
SVM-G              | 3.24 ± 0.62  | 10.23 ± 0.87 | 5.46 ± 0.73  | 6.30 ± 0.63  | 2.35 ± 0.52
that the Mahalanobis distance is helpful in improving the performance of the proposed algorithms in some cases.

We also study the influence of the parameter Rmax in determining the optimal value R*, which is closely related to the selection of the prime simplicial complex per class. We perform experiments on the five simulated datasets by selecting a group of values for Rmax and computing the corresponding R*. The results are shown in Table III and Fig. 9. From the results, we can see that the classification performance is better when Rmax lies in the interval [0.3, 1]. One reason is that the radius of these simulated datasets is close to 0.5; as a result, the topological structure is preserved well when Rmax is selected around 0.5.

Furthermore, we compare the PDSC approaches with the 1-NN, 3-NN, and SVM methods on the five 2-class datasets. The results are reported in Table III. It can be seen from the table that, on these five datasets, the PDSC approaches are always better than 1-NN and 3-NN, and achieve competitive performance compared with SVM. It is worth noting that SVM attempts to maximize the margins between each pair of classes in a linear or nonlinear reproducing kernel Hilbert space, and preserves only the support vectors for classification. Therefore, SVM loses the topological structure hidden in the datasets. In contrast, our PDSC approaches preserve reasonable structure representations of these data distributions, as illustrated in Fig. 6, so they benefit from the interpretation of the underlying structure representation of the generating data.

Fig. 9. Parameter influence on the D1 simulated dataset (two concentric circles, f = 2).

C. Small Training Samples and High-Dimensional Datasets

We test the proposed approaches on four multiview object recognition datasets, each of which can be regarded as generated from a circle-like structure. We use four different views
per class (i.e., 0°, 90°, 180°, and 270°) and eight views per class as the training sets, respectively. The remaining images are used as the test set. Since the number of training samples is small, we employ the Rips method [16] instead to construct the prime simplicial complexes on the whole training set. Note that R in the Rips method is different from that in the Lazywitness method; here R is set to 70. The results are shown in Table IV. From the results, we can see that the PDSC approach outperforms the NN algorithms and is only slightly worse than the SVM algorithms. This indicates that the proposed PDSC approaches can work well on high-dimensional multiview structures. Note that the results of the PDSC-M approach are not reported here because the computation of the covariance matrix is ill-posed when the number of samples is less than the dimension of a dataset.
TABLE IV
Error rates and standard deviations (%) of several approaches on the 12 practical datasets. 4V and 8V denote four and eight views, respectively.

Datasets                | PDSC         | PDSC-M       | 1-NN         | 3-NN         | SVM-L        | SVM-G
COIL-100A (4V)          | 14.87        | N/A          | 18.35        | 28.96        | 14.62        | 13.00
COIL-100A (8V)          | 4.40         | N/A          | 6.72         | 15.23        | 3.36         | 4.11
COIL-100B (4V)          | 23.61        | N/A          | 29.38        | 41.82        | 25.14        | 23.20
COIL-100B (8V)          | 8.70         | N/A          | 14.45        | 27.02        | 9.25         | 8.21
SOIL-47A (4V)           | 23.03        | N/A          | 24.88        | 59.92        | 18.63        | 17.90
SOIL-47A (8V)           | 15.39        | N/A          | 21.15        | 36.22        | 8.97         | 9.62
SOIL-47B (4V)           | 27.94        | N/A          | 31.50        | 59.07        | 24.02        | 24.02
SOIL-47B (8V)           | 20.51        | N/A          | 26.44        | 35.58        | 14.10        | 14.75
COIL-20 (4V)            | 12.79        | N/A          | 16.76        | 25.44        | 13.38        | 13.68
COIL-20 (8V)            | 3.20         | N/A          | 5.16         | 6.95         | 2.03         | 1.25
ORL                     | 3.75 ± 2.10  | N/A          | 5.13 ± 2.25  | 11.33 ± 2.30 | 3.87 ± 1.86  | 3.96 ± 2.09
UMIST                   | 3.66 ± 1.52  | N/A          | 5.34 ± 1.52  | 11.05 ± 2.36 | 3.03 ± 1.38  | 6.43 ± 1.31
Iris                    | 5.27 ± 2.10  | 4.00 ± 2.10  | 7.40 ± 1.81  | 5.93 ± 2.10  | 2.87 ± 1.63  | 4.53 ± 1.80
Landsat Satellite       | 13.49 ± 0.44 | 20.40 ± 0.61 | 13.54 ± 0.38 | 13.58 ± 0.55 | 13.85 ± 0.47 | 11.89 ± 0.54
Image Segmentation      | 6.38         | 45.90        | 7.10         | 10.62        | 5.85         | 6.05
Gaussian Elena          | 15.24 ± 0.58 | 34.61 ± 1.23 | 20.15 ± 0.70 | 18.52 ± 0.55 | 42.23 ± 0.66 | 9.98 ± 0.51
Breast Cancer Wisconsin | 3.77 ± 0.95  | 10.63 ± 2.03 | 5.12 ± 1.32  | 4.18 ± 0.83  | 5.12 ± 1.00  | 3.18 ± 1.12
Phoneme                 | 19.91 ± 0.61 | 21.85 ± 0.66 | 16.13 ± 0.45 | 16.57 ± 0.54 | 23.63 ± 1.01 | 15.40 ± 0.78
Pendigits               | 2.20         | 4.20         | 2.57         | 2.43         | 4.66         | 1.83
Optdigits               | 3.06         | 3.95         | 3.45         | 3.28         | 3.50         | 1.56
Moreover, we observe that, with eight views as the training samples, our approach obtains performance competitive with that of state-of-the-art algorithms using four views on the COIL and SOIL datasets [39]. However, the latter utilize very effective feature extraction and image registration techniques. In contrast, our approaches achieve a good tradeoff between recognition accuracy and topology preservation by introducing only four additional views.

D. Face Recognition and UCI Datasets

We compare our approaches with the others on the ORL [37] and UMIST [36] face datasets. In the ORL dataset, the images of each subject are taken at different times with various lighting conditions, facial expressions, and facial details [37]. In the UMIST dataset, the images of each subject are taken at angles varying from left profile to right profile. We employ PCA to reduce the original dimensions to 40-dimensional subspaces since, empirically, these subspaces preserve most of the principal structures. On ORL, we select 70% of the data per class as the training set, and set γ = 0.75 and R = 11.5 to construct the PDSCs with the Rips complex method. On UMIST, we select 50% of the data per class as the training set, and set γ = 1 and R = 5.5, also with the Rips complex method. The results are shown in Table IV. It can be seen that the PDSC approach obtains the best performance on the ORL dataset and ranks second on the UMIST dataset.

Finally, we evaluate the performance of the PDSC approaches on eight UCI datasets. Different from the aforementioned datasets, these datasets are taken from remarkably different domains. The results are shown in Table IV. We can see from the table that, compared with SVM-L, the proposed PDSC approaches obtain better results on six out of eight datasets. Meanwhile, their performance is comparable with that of SVM-G. This means that, although devoted to preserving structures, the proposed PDSC approaches can also generalize to other general fields.

E. Discussion

Now we give real examples to show that PDSC can preserve the topological structure. First, we extract the 72 images of the first class of COIL-20. Then we adopt the LPP algorithm [5] to embed these points into a 2-D space, since it is difficult to visualize a high-dimensional dataset. Finally, we construct the witness complex on it. Note that the topology-preserving property of our approach is dimension-free. The result is shown in Fig. 10. For LPP, we set the neighbor mode to KNN and the weight mode to heat kernel, where K = 5 and t = 5. For the witness complex, we let r = 0.2, f = 2, and R = 0.5. The sample points reveal the intrinsic circle-like structure, and our approach preserves the hole in the middle. This representation is beyond the capacity of SVM. We also choose some points covered by the witness complex, calculate the weights of their 5-NNs by using LPP [5], and reconstruct the virtual sample points shown in Fig. 10. From this example, we can see that: 1) our representation preserves the topological structure and 2) our representation is helpful for generating virtual samples along the preserved topological structure, which has potential application to the small-sample-size problem.

In addition, we perform a significance analysis of the proposed PDSC approaches based on the results shown in Table IV.
Fig. 10. Witness complex constructed on the data points of one object in the COIL-20 dataset preserving the topological structure. The reconstructed virtual sample points also reveal the intrinsic circle-like structure.
With a significance level of 5%, the p-values of the paired t-tests for the PDSC approaches on the 20 datasets are as follows: PDSC versus 1-NN, PDSC versus 3-NN, PDSC versus SVM-L, and PDSC versus SVM-G give 0.3764, 0.0237, 0.8545, and 0.8545, respectively. If we allow the rejection of the null hypothesis at the 5% significance level, which indicates a significant difference between two approaches, then the results show that the classification abilities of PDSC, 1-NN, and SVM are statistically competitive on these datasets. What is more, PDSC can be seen as a good tradeoff between topological preservation and prediction accuracy.

We also want to discuss some limitations of the proposed approaches. First, although our goal is to preserve the topological structure of datasets, current persistent homology techniques can only provide approximations of the true topological invariants, as other approaches do. It is also unclear whether topological structures indeed exist in high-dimensional datasets. Second, due to the nature of the witness complex, we can select at most 50% of the training set as the landmark points to build our classification model for large-scale training samples, whereas the other approaches we compared use the whole training set to train their models. Third, the computational complexity is higher: given dimension d and dataset size n, the computational complexity of the Rips complex is O(d · n²), and that of the witness complex is O(r · d · n²), where r is the ratio of the number of landmark points to n. Furthermore, the computational complexity of computing the nearest distance from a data point to the prime simplicial complexes is O(n²). Finally, when data follow a Gaussian distribution, the proposed approaches lose their merit in recognizing objects.

V. CONCLUSION

In this paper, we proposed a new structure-preserving PDSC approach by utilizing the persistent homology technique.
We refined the construction of the simplicial complex by removing simplices that were redundant for PDSC. We presented a new barcode method to determine a PDSC per class for classification. We also proposed a nearest projection technique that computes the distance from unlabeled samples to the PDSCs. Furthermore, we generalized the extrapolation ability of simplicial complexes with a projection constraint term. Experiments indicate that, compared with several well-known algorithms, our proposed PDSC approach achieves promising performance without losing the preservation of structure representation.

In this paper, we did not concern ourselves with recognizing faces in the wild. Our goal is to design a topology-preserving classifier for object recognition and supervised learning; if we attempt to apply our approach to such a scenario, the face-in-the-wild problem can be mitigated by employing a near-infrared sensor to alleviate the influence of the background. In the future, we will investigate how to apply the PDSC approach to other practical applications with more complex topological structures. Besides, how to construct a more suitable PDSC deserves study; for example, we can consider utilizing the generalized low-rank approximation of matrices [40] to select landmark points when the witness complex is employed. Moreover, we hope to study the performance of PDSC for object recognition at a category level rather than at an instance level. Finally, we will consider further improving the performance of PDSC by utilizing metric learning methods.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the three reviewers for their various comments that helped to greatly improve this paper.

REFERENCES

[1] J. Tenenbaum, V. Silva, and J. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[2] S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[3] J. Zhang, H. Huang, and J. Wang, "Manifold learning for visualizing and analyzing high-dimensional data," IEEE Intell. Syst., vol. 25, no. 4, pp. 54–61, Jul.–Aug. 2010.
[4] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2001, pp. 585–591.
[5] X. He and P. Niyogi, "Locality preserving projections," in Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2003.
[6] L. Chen, I. Tsang, and D. Xu, "Laplacian embedded regression for scalable manifold regularization," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 6, pp. 902–915, Jun. 2012.
[7] G. Pölzlbauer, T. Lidy, and A. Rauber, "Decision manifolds—A supervised learning algorithm based on self-organization," IEEE Trans. Neural Netw., vol. 19, no. 9, pp. 1518–1530, Sep. 2008.
[8] Y. Hou, P. Zhang, X. Xu, X. Zhang, and W. Li, "Nonlinear dimensionality reduction by locally linear inlaying," IEEE Trans. Neural Netw., vol. 20, no. 2, pp. 300–315, Feb. 2009.
[9] K. Zhang and J. T. Kwok, "Clustered Nyström method for large scale manifold learning and dimension reduction," IEEE Trans. Neural Netw., vol. 21, no. 10, pp. 1576–1587, Oct. 2010.
[10] Z. Xu, I. King, M. Lyu, and R. Jin, "Discriminative semi-supervised feature selection via manifold regularization," IEEE Trans. Neural Netw., vol. 21, no. 7, pp. 1033–1047, Jul. 2010.
[11] G. G. Yen and Z. Wu, "Ranked centroid projection: A data visualization approach with self-organizing maps," IEEE Trans. Neural Netw., vol. 19, no. 2, pp. 245–259, Feb. 2008.
[12] J. A. K. Suykens, "Data visualization and dimensionality reduction using kernel maps with a reference point," IEEE Trans. Neural Netw., vol. 19, no. 9, pp. 1501–1517, Sep. 2008.
[13] S. Yan and H. Wang, "Semi-supervised learning by sparse representation," in Proc. SIAM Int. Conf. Data Mining, 2009, pp. 792–801.
[14] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, "Learning with ℓ1-graph for image analysis," IEEE Trans. Image Process., vol. 19, no. 4, pp. 858–866, Apr. 2010.
[15] M. Lewandowski, D. Makris, and J.-C. Nebel, "View and style-independent action manifolds for human activity recognition," in Proc. Eur. Conf. Comput. Vis., 2010, pp. 547–560.
[16] V. de Silva and G. Carlsson, "Topological estimation using witness complexes," in Proc. Eurograph. Symp. Point-Based Graph., 2004, pp. 1–10.
[17] H. Edelsbrunner, D. Letscher, and A. Zomorodian, "Topological persistence and simplification," in Proc. IEEE Symp. Found. Comput. Sci., Nov. 2000, pp. 454–463.
[18] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, "On the local behavior of spaces of natural images," Int. J. Comput. Vis., vol. 76, no. 1, pp. 1–12, 2007.
[19] A. Collins, A. Zomorodian, G. Carlsson, and L. Guibas, "A barcode shape descriptor for curve point cloud data," in Proc. Eurograph. Symp. Point-Based Graph., 2004, pp. 1–11.
[20] V. de Silva and R. Ghrist, "Coverage in sensor networks via persistent homology," Algebraic Geometric Topol., vol. 7, pp. 339–358, Apr. 2007.
[21] P. Bendich, B. Wang, and S. Mukherjee. (2010). Toward Stratification Learning Through Homology Inference [Online]. Available: http://arxiv.org/abs/1008.3572
[22] S. Y. Oudot, L. J. Guibas, J. Gao, and Y. Wang, "Geodesic Delaunay triangulations in bounded planar domains," ACM Trans. Algorithms, vol. 6, no. 4, pp. 1–67, 2010.
[23] A. Zomorodian and G. Carlsson, "Computing persistent homology," in Proc. IEEE Symp. Comput. Geometry, Dec. 2004, pp. 1–15.
[24] J. R. Munkres, Elements of Algebraic Topology. Boulder, CO: Westview, 1984.
[25] P. Bendich, "Analyzing stratified spaces using persistent versions of intersection and local homology," Ph.D. dissertation, Dept. Math., Duke Univ., Durham, NC, 2009.
[26] R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag, and S. Weinberger. (2010). Persistent Homology for Random Fields and Complexes [Online]. Available: http://arxiv.org/abs/1003.1001
[27] A. Hatcher, Algebraic Topology. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[28] A. Zomorodian, "Fast construction of the Vietoris–Rips complex," Comput. Graph., vol. 34, no. 3, pp. 263–271, 2010.
[29] X. Ge, I. I. Safa, M. Belkin, and Y. Wang, "Data skeletonization via Reeb graphs," in Advances in Neural Information Processing Systems 24. Cambridge, MA: MIT Press, 2011, pp. 837–845.
[30] D. Bouchaffra, "Mapping dynamic Bayesian networks to α-shapes: Application to human faces identification across ages," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 1229–1241, Aug. 2012.
[31] H. Edelsbrunner and E. P. Mücke, "Three-dimensional alpha shapes," ACM Trans. Graph., vol. 13, no. 1, pp. 43–72, 1994.
[32] A. Asuncion and D. J. Newman. (2007). UCI Machine Learning Repository [Online]. Available: http://archive.ics.uci.edu/ml/
[33] S. Nene, S. Nayar, and H. Murase, "Columbia object image library (COIL-20)," Dept. Comput. Sci., Columbia Univ., New York, NY, Tech. Rep. CUCS-005-96, 1996.
[34] S. Nene, S. Nayar, and H. Murase, "Columbia object image library (COIL-100)," Dept. Comput. Sci., Columbia Univ., New York, NY, Tech. Rep. CUCS-006-96, 1996.
[35] D. Koubaroulis, J. Matas, and J. Kittler, "Evaluating colour-based object recognition algorithms using the SOIL-47 database," in Proc. Asian Conf. Comput. Vis., 2002, pp. 840–845.
[36] D. B. Graham and N. M. Allinson, "Face recognition: From theory to applications," in NATO ASI Series F, Computer and Systems Sciences, vol. 163. New York: Springer-Verlag, 1998, pp. 446–456.
[37] F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," in Proc. 2nd IEEE Workshop Appl. Comput. Vis., Dec. 1994, pp. 138–142.
[38] S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy, “SVM and kernel methods MATLAB toolbox,” Percept. Syst. Inf., vol. 2, pp. 1–2, 2005. [39] G. Mori, S. Belongie, and J. Malik, “Shape contexts enable efficient retrieval of similar shapes,” in Proc. Conf. Comput. Vis. Pattern Recognit., 2001, pp. 1–8. [40] J. Liu, S. Chen, Z.-H. Zhou, and X. Tan, “Generalized low-rank approximations of matrices revisited,” IEEE Trans. Neural Netw., vol. 21, no. 4, pp. 621–632, Apr. 2010.
Junping Zhang (M'05) received the B.S. degree in automation from Xiangtan University, Xiangtan, China, the M.S. degree in control theory and control engineering from Hunan University, Changsha, China, and the Ph.D. degree in intelligent systems and pattern recognition from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1992, 2000, and 2003, respectively. He has been an Associate Professor with the School of Computer Science, Fudan University, Shanghai, China, since 2006. He has authored or co-authored papers in highly ranked international journals such as the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and the IEEE TRANSACTIONS ON NEURAL NETWORKS. His current research interests include machine learning, image processing, biometric authentication, and intelligent transportation systems. Dr. Zhang has been an Associate Editor of the IEEE INTELLIGENT SYSTEMS since 2009 and the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS since 2010.
Ziyu Xie received the B.S. degree in mathematics from Fudan University, Shanghai, China, in 2010, where he is currently pursuing the M.S. degree in computer science. His current research interests include machine learning and topology.
Stan Z. Li (S'00–M'01–SM'06–F'09) received the B.Eng. degree from Hunan University, Changsha, China, the M.Eng. degree from the National University of Defense Technology, Changsha, and the Ph.D. degree from Surrey University, Guildford, U.K., in 1982, 1985, and 1991, respectively. He is currently a Professor with the National Laboratory of Pattern Recognition, the Director of the Center for Biometrics and Security Research, Institute of Automation, and the Director of the Center for Visual Internet of Things Research, Chinese Academy of Sciences, Beijing, China. He was with Microsoft Research Asia as a Researcher from 2000 to 2004. He was an Associate Professor with Nanyang Technological University, Singapore. He has contributed to research on recognition, pattern recognition, and computer vision. His current research interests include pattern recognition and machine learning, image and vision processing, face recognition, biometrics, and intelligent video surveillance. He has authored or co-authored over 200 papers in international journals and conferences, and authored and edited 8 books, among which Markov Random Field Models in Image Analysis (New York, NY: Springer, 1st edition 1995, 2nd edition 2001, 3rd edition 2009) has been cited more than 2000 times (by Google Scholar). His other books include Handbook of Face Recognition (Springer, 1st edition 2005, 2nd edition 2011) and Encyclopedia of Biometrics (Springer Reference Work, 2010).