Multidim Syst Sign Process DOI 10.1007/s11045-015-0334-7

3D face recognition in the Fourier domain using deformed circular curves

Deokwoo Lee¹ · Hamid Krim²

Received: 14 June 2014 / Revised: 11 May 2015 / Accepted: 30 May 2015 © Springer Science+Business Media New York 2015

Abstract One of the most significant problems in image and vision applications is the efficient representation of a target image containing a large amount of highly complex data. The ability to analyze high-dimensional signals in a lower dimension without losing their information has been crucial in the field of image processing. This paper proposes an approach to 3D face recognition using dimensionality reduction based on deformed circular curves, the shortest geodesic distances, and the properties of the Fourier Transform. The measured geodesic distances form a matrix whose entries are the geodesic distances between the reference point and arbitrary points on a 3D object, and a one-dimensional vector is generated by reshaping the matrix without losing the original properties of the target object. Owing to the symmetry of the magnitude response of the Fourier Transform, the original signal can be analyzed in a lower-dimensional space without loss of its inherent characteristics. This paper mainly deals with an efficient representation and recognition algorithm using deformed circular curves, and the simulation shows promising results for the recognition of geometric face information.

Keywords Face recognition · Classification · Deformed circular curves · Geodesic distance · Fourier Transform · Dimensionality reduction

Deokwoo Lee [email protected] · Hamid Krim [email protected]

1 Division of Mobile Communications, Samsung Electronics Co. Ltd, Suwon, Gyeonggi, South Korea

2 Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA

1 Introduction

3D measurement based on structured light patterns has been an alternative to passive stereo vision and has been extensively researched. Performance evaluation of 3D reconstruction is sometimes difficult to assess if the ground-truth model is not provided. To that end, we use an approximate reconstruction result that is sufficient to identify, recognize or classify a target in practice. Comparing circular-pattern-based representations of two 3D objects is one way of carrying out these practical applications, and the resulting performance is subsequently used as a reconstruction performance measure. In light of the large data sets of 3D objects, it is particularly important to utilize efficient representations of objects. Dimensionality reduction has been employed to manage large amounts of data, and it is one of the most fundamental and crucial issues in many areas (De Lathauwer et al. 2004; Wang and Paliwal 2003; Belkin and Niyogi 2003), particularly object recognition and classification, as well as general signal processing. Currently, objects are represented using 3D geometric structure in addition to texture information to achieve realistic, high-quality visualization. In practice, however, processing 3D information (e.g. 3D Euclidean coordinates) demands high computational complexity because of the large quantity of data. Hence, it is sometimes impossible to process and analyze raw data directly. Therefore, there has been much research into analyzing high-dimensional data in a lower-dimensional space in (image) signal processing as well as in mathematics and statistics. Sampling theory enables us to recover a band-limited signal perfectly from a minimum number of data points, i.e. a subset of the original signal, and the Shannon-Nyquist Sampling Theorem is one of the most popular results. Other approaches beyond the Shannon-Nyquist Sampling Theorem are presented in Unser (2000), Donoho (2006), Aldroubi and Grochenig (2001) and Lee and Krim (2011).
Perfect reconstruction may not be necessary if we intend to achieve only detection, recognition or classification in practice. Therefore, we intend to represent the target signal in a lower-dimensional space such that the reduced signals do not lose their intrinsic properties and are therefore sufficient for recognition or classification. Principal component analysis (PCA) is one of the classical and widely used methods for dimensionality reduction, and it underlies other nonlinear embedding techniques. In the PCA method, eigenfaces, obtained by calculating the eigenvectors of the covariance matrix (Turk and Pentland 1991), are used to represent the properties of object information (usually an image of M by N pixels) in a lower-dimensional space so that the variance of the projected data is maximized. Other classical linear approaches include independent component analysis (ICA) (Hyvarinen and Oja 1997), linear discriminant analysis (LDA) (Mika et al. 1999), canonical correlation analysis (CCA) (David 2004), non-negative matrix factorization (NMF) (Guillamet et al. 2002), etc. To alleviate the limitations of these linear projection approaches, nonlinear approaches have been proposed, for instance Kernel PCA (KPCA) (Scholkopf et al. 1997), Locally Linear Embedding (LLE) (Saul and Roweis 2000; Roweis and Saul 2000), Self Organizing Map (SOM) (Kohonen 1990), etc. KPCA is an extended version of PCA which applies a kernel function $\Phi(\cdot)$ to the input data; the kernel function extends PCA to the nonlinear case by projecting the input data X onto another feature space $Y = \Phi(X)$ (Scholkopf et al. 1998). LLE, a nonlinear approach, uses neighboring data points and estimates the weight matrix W which most accurately represents (or reconstructs) each data point from its neighboring points:

$$E(W) = \sum_i \Big| X_i - \sum_j W_{ij} X_j \Big|^2, \tag{1}$$

where $W_{ij}$ is selected to minimize the cost function E(W), and |A − B| is defined as the absolute distance between A and B. Then, the data X is mapped to Y which is in the lower


dimensional space (X and Y are vectors of arbitrary dimension and W is a matrix composed of the elements $[W]_{ij}$). Each component of Y is obtained from the weight factors estimated above through an optimization process (i.e. Y is composed of the $Y_i$ which minimize the embedding function $\Phi(Y)$):

$$\Phi(Y) = \sum_i \Big| Y_i - \sum_j W_{ij} Y_j \Big|^2. \tag{2}$$
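As a rough illustration of the two LLE cost functions, the following Python sketch evaluates the same sum of squared reconstruction residuals for the data X and for an embedding Y. The toy points, the weight matrix and the function name `lle_cost` are assumptions made for this example only; the sketch does not solve for W or Y.

```python
import numpy as np

def lle_cost(X, W):
    """Sum over i of |X_i - sum_j W_ij X_j|^2, with one data point per row of X."""
    residual = X - W @ X              # row i holds X_i - sum_j W_ij X_j
    return float(np.sum(residual ** 2))

# Hypothetical toy data: three collinear points in the plane.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

# Hypothetical weights: the middle point is the average of its two neighbours,
# while each end point is approximated by its single neighbour.
W = np.zeros((3, 3))
W[1, 0] = W[1, 2] = 0.5
W[0, 1] = 1.0
W[2, 1] = 1.0

# A one-dimensional embedding of the same three points (also hypothetical).
Y = np.array([[0.0], [1.0], [2.0]])

cost_X = lle_cost(X, W)               # cost of reconstructing X from its neighbours
cost_Y = lle_cost(Y, W)               # same weights applied to the embedding Y
print(cost_X, cost_Y)
```

The same function serves both equations because they differ only in the set of points to which the fixed weights are applied.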

Similar to PCA, Locality Preserving Projection (LPP) (He and Niyogi 2004) and Orthogonal LPP (OLPP) (Vasuhi and Vaidehi 2009) also project the original data onto a lower-dimensional space. LPP follows the same approach as PCA, but it assigns weight factors to the objective function according to the connectivity between all pairs of data points. OLPP is simply an extended version of LPP which adds the constraint of orthogonality to the projection matrix P (i.e. $PP^T = I$). The approaches mentioned above, PCA, KPCA, LLE, LPP and OLPP, basically rely on eigendecomposition for dimensionality reduction. An alternative is to use the intrinsic geometric structure and to preserve the local information of neighboring points of a data set lying on a higher-dimensional manifold. The geodesic or Euclidean distance between data points on a surface can represent intrinsic geometric properties of the object, and these approaches are categorized as multidimensional scaling (MDS) (Borg and Gronen 2005). Some examples of MDS are ISOMAP (Lee et al. 2004) and curvilinear component analysis (CCA) (Demartines and Herault 1997) (MDS and ISOMAP are based on eigendecomposition as well). They use the distances between all pairs of data points and achieve a lower-dimensional object representation which best preserves the intrinsic geometric structure of a target object. In addition, many algorithms for dimensionality reduction have been proposed, and reviews and drawbacks of existing nonlinear dimensionality reduction methods have been addressed in Belkin and Niyogi (2003), Huang and Yin (2009), Tenenbaum et al. (2000), Zhang et al. (2004), Liu et al. (2014) and Hou et al. (2014). In particular, Liu et al. (2014) recently proposed a dimensionality reduction method using a global and local structure preservation framework for efficient feature selection. In another recent work, Hou et al. (2014), joint embedding learning and sparse regression (JELSR) is used for feature selection. These recent approaches, which are based on the assumption that neighboring data points have similar information (geometry, color or intensity), also require an optimization process, leading to high computational complexity. In Hou et al. (2014), the selection of the optimization parameters (α and β) is an unsolved problem, and it could result in unexpected degradation of performance in a real system. In addition, the assumption of similarity of neighboring data points sometimes fails in practice when the characteristics of the data vary widely.

In this paper, the proposed algorithm needs only the geodesic distance between the reference point (fixed on the tip of the nose) and a point on the curve, rather than the distances between all pairs of points or the geodesic path between two objects. In addition, the curves, which are deformed circular ones, preserve the geometrically intrinsic structures of 3D objects. These two properties lead to low computational complexity and simple feature selection compared to the approaches proposed previously. This paper proposes a dimensionality reduction approach for efficient object classification in a recognition system by reducing the dimension of original data sets that are located on a set of curves (deformed circular curves). Optimization problems may sometimes be difficult or ill-posed in practice; the proposed algorithm, however, does not generate many parameters compared to the approaches introduced above. In addition, this paper uses only the deformed circular curves containing geometric information of 3D objects.


The 3D object in this paper is represented as a set of concentric projected circular patterns. Deformed circular patterns represent 3D geometric information of an object surface. Since facial curves represent geometric information of an object, they have also been used for classification and recognition, for instance geodesic curves (Feng et al. 2007; Miao and Krim 2010; Jahanbin et al. 2008; Berretti et al. 2010). For example, previous algorithms used iso-geodesic curves, which represent 3D geometric information of objects, and investigated integral invariant signatures (Feng et al. 2007) and the evolution of geodesic curves (Miao and Krim 2010). The level curves of the depth function offer another approach to 3D object recognition (Samir et al. 2006). Recently, circular and radial curves (Ballihi et al. 2012) and spherical harmonic features (Liu et al. 2013) have been used for 3D face recognition or classification. In Ballihi et al. (2012), recognition and classification are carried out using two kinds of curves, circular and radial ones. Using two kinds of curves could increase the recognition and classification performance, but the reported performance is not much higher than that of existing methods, even though using two kinds of curves simultaneously increases the computational complexity and the complexity of practical systems. In addition, the circular curves used in their work are generated artificially by slicing the facial surface by $S_r$, defined by the radius r. Our research, in contrast, uses deformed circular curves that are generated naturally by projecting circular light patterns onto the surface of interest. Thus, the deformed circular curves are more reasonable in practice when a real active/hybrid (passive + active) camera system is developed. Moreover, their method requires a registration procedure prior to extracting the geodesic path. Our approach, however, uses the property of the Fourier Transform, so registration between two objects is not required once the intrinsic information is transformed to the Fourier domain. In Liu et al. (2013), SHF based on SHD is used for 3D face recognition and a subset of the SHF is used, but the paper does not provide any concrete criterion for selecting that subset, which can result in unexpected degradation of the recognition rate. In addition, the process of selecting relevant SHF features could increase computational complexity (time). In general, the computational complexity of spherical harmonics is O(N log N), which is higher than that of our approach, which uses only 0.5 × M × N. Despite the higher computational cost, the recognition rate does not show significantly higher performance.

This paper proposes deformed circular curves that are more relevant in practice when considering a structured light 3D imaging system. The circular curves used in Ballihi et al. (2012) are artificially generated, whereas deformed circular curves are naturally generated when circular light patterns are projected onto a surface. A registration process is not required in this work because all of the intrinsic information of a 3D face is transformed to the Fourier domain. In Liu et al. (2013), the selection of the relevant SHF features is required, which can increase computational complexity, and using only a subset of all SHF can lead to loss of information about a 3D face in practice. Since Euclidean distances cannot preserve the intrinsic properties of an object unless they are defined in a hyperplane, it is more useful to use geodesic distances to represent 3D object data such as faces (Ballihi et al. 2012). The shortest geodesic distance between the reference point (e.g. the nose tip of a face) and any point on a deformed circular curve is measured and stored in a database. A matrix is then generated from the values of the shortest geodesic distances between the reference point and these points. Since geodesic distances can preserve the intrinsic geometric structure of the target object, the matrix is regarded as the geometric information of the object. In this paper, we use deformed circular curves (which we will call “curves” for simplicity), which have simpler constraints than geodesic curves or level curves of the depth function. This work proposes an approach to face classification and recognition using dimensionality reduction based on deformed circular curves, the shortest geodesic distances and the properties of the Fourier Transform. The original signal (of 3 × M × N size) is represented


Fig. 1 The geodesic distance is more appropriate than the Euclidean distance for representing the intrinsic properties of a generic surface

as a 0.5 × M × N size signal, preserving the characteristics of the original signal. To achieve dimensionality reduction, the geodesic distance between the reference point and a point on the surface is measured. Note that the geodesic distance preserves the intrinsic properties of a 3D object. By measuring the geodesic distance between the reference point and each point on the curves, the data size is reduced to M × N. Once the M × N matrix is reshaped as a vector of length M × N, the vector (a 1-D signal) is transformed using the Fourier Transform. Following the properties of the Fourier Transform, the dimensionality is reduced further, and the proposed approach achieves an efficient dimensionality reduction, which is crucial in the field of object classification/recognition.

The rest of this paper is organized as follows: Sect. 2 briefly explains the overall algorithm proposed in this paper, and Sect. 3 describes 3D object representation using deformed circular curves and their constraints. Sects. 4 and 5, the most important contribution of this paper, propose the algorithm for classification using the shortest geodesic distances. Section 6 presents the process of measuring the similarity between two faces using the property of the Fourier Transform. Finally, Sect. 7 presents simulation results for the proposed algorithm prior to concluding the paper.

2 Algorithm description

The Euclidean structure sometimes fails to capture the underlying structure of 3D objects (such as faces). The classification algorithm using dimensionality reduction consists of measuring the shortest geodesic distances between a reference point and other points, which are more robust to bending of generic surfaces than the Euclidean distance (Aouada and Krim; Elad et al. 2003) (Fig. 1). The proposed approach uses the shapes of facial curves rather than all of the information (all 3D coordinates) of a target object. In addition, geodesic distances are not calculated between all pairs of points on the surface, but only between a reference point and points on the curves, which reduces the computational complexity (Fig. 4). Recall that the facial curves are circular curves deformed by the shape of the object surface (Lee and Krim 2010). The measured shortest geodesic distances yield a matrix which preserves the intrinsic properties of an object. Exploiting the properties of the Fourier Transform, we can reduce the dimensionality of the original 3D object information. To that end, we note:


Fig. 2 Overview of the proposed algorithm for face recognition using deformed circular curves

1. Facial curves are deformed circular curves which have simple constraints (Sect. 3).
2. The shortest geodesic distances are measured between a reference point and a point on the curve (geodesic distances are not measured between all pairs of points) (Sect. 4).
3. The shortest geodesic distance information is reduced using the properties of the Fourier Transform (Sect. 5).

In this paper, we focus on preserving intrinsic geometric properties and on performing recognition/classification by comparing the deformed circular curves, each set of which belongs to a 3D face model (Fig. 2).

3 3D object representation

In face recognition, a 2D or 3D shape is used. 2D face recognition, due to its longer history, is more widely deployed and enjoys wider acceptance and better performance, albeit in a fairly controlled environment. However, 3D recognition methods hold more promise, as 3D shapes have advantages in enhanced visibility and independence of illumination (Bowyer et al. 2006). Geometric information of an object S is defined in $\mathbb{R}^3$. Extracting 3D information employs an active method, namely a structured light system which generates a set of concentric circular patterns. The deformed circular patterns (called curves in this paper), which are shaped by the object surface, provide sufficient information to extract the 3D coordinates of an object. Under the assumption of parallel projection of the light patterns, the deformed patterns preserve the constraint on the x and y coordinates which belong to the projected patterns (Lee and Krim 2010).


Fig. 3 Original circles are deformed when projected onto an object surface. Under the assumption of parallel projection, the constraint of x and y is preserved
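The constraint on x and y can be illustrated with a short Python sketch. It samples M points on each of N concentric circles, lifts them onto a surface through a depth function, and checks that the planar radius of every sample is unchanged. The depth function `dome`, the helper `sample_curves` and the choice of radius j for the jth circle are assumptions of this example, not part of the acquisition system described in the paper.

```python
import numpy as np

def sample_curves(height, M, N):
    """Return an (M, N, 3) array of points on N deformed circular curves.

    height(x, y) is a hypothetical depth function standing in for the surface.
    The j-th circle is given radius j, and M samples are taken on each circle.
    """
    theta = 2.0 * np.pi * np.arange(M) / M           # M angles per curve
    radii = np.arange(1, N + 1, dtype=float)         # radii 1, ..., N
    x = np.cos(theta)[:, None] * radii[None, :]      # shape (M, N)
    y = np.sin(theta)[:, None] * radii[None, :]
    z = height(x, y)                                 # depth deforms the circle in 3D
    return np.stack([x, y, z], axis=-1)

def dome(x, y):
    """Hypothetical dome-like surface with its peak above the origin."""
    return 10.0 - 0.05 * (x ** 2 + y ** 2)

P = sample_curves(dome, M=10, N=8)
planar_radius = np.hypot(P[..., 0], P[..., 1])       # sqrt(x^2 + y^2) per sample
print(planar_radius[0])                              # radii of the first sample on each curve
```

Under parallel projection the depth only changes z, so the planar radius of every sample on the jth curve stays equal to the radius of the jth projected circle.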

4 The shortest geodesic distance

Let $C \subset \mathbb{R}^3$ be a set of N closed, deformed circular curves on an object, let $P_R \in \mathbb{R}^3$ be the reference point (the center point in Fig. 3), and let $P_{ij} \in C$ be the ith point of the jth curve (Fig. 3). Under the assumption of parallel projection of circular curves onto the object, we can write:

$$C = [C_j]_{j=1}^{N}, \tag{3}$$

$$C_j = \{P_{ij} \mid P_{ij} = (x_{ij}, y_{ij}, z_{ij}),\ g(x_{ij}, y_{ij}) = R_j\}, \tag{4}$$

$$g(x_{ij}, y_{ij}) = \sqrt{x_{ij}^2 + y_{ij}^2}, \tag{5}$$

$$i = 1, 2, \ldots, M, \quad j = 1, 2, \ldots, N, \quad P_R = (x_R, y_R, z_R), \tag{6}$$

where M and N respectively represent the number of sampled points on each curve and the number of curves. Previous work using facial curves relied on full 3D coordinate information. Using such 3D information requires extracting depth values, which may in turn result in significant computational complexity. Deformed circular curves are constrained using only the x and y coordinates, which is an advantage for algorithmic efficiency (Lee and Krim 2010). The shortest geodesic distance between two points on the surface is taken into account (Fig. 4). Theoretically, if a curve is represented parametrically by x = x(t), y = y(t), z = z(t), the shortest geodesic distance $d_{G\min}$ is written as the summation or integral of the shortest arclength (in the discrete domain, this is a piecewise Euclidean distance) between two points, $t_0 = a$ and $t_N = b$,

$$d_{G\min} = \min \sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} \sqrt{\left(\frac{dx(t)}{dt}\right)^2 + \left(\frac{dy(t)}{dt}\right)^2 + \left(\frac{dz(t)}{dt}\right)^2}\, dt, \tag{7}$$

or $d_{G\min}$ can be represented using arclength parameterization (the arclength s is defined as $s(t) = \int_a^t |\alpha'(u)|\,du$, where $\alpha(t) = [x(t), y(t), z(t)]$) (Oprea 2007). In practice, however, the face model is represented using vertices and triangular faces (i.e. it is discretely distributed), so


Fig. 4 A set of deformed circular curves represents a surface S, and the shortest geodesic distance is measured between the reference point $P_R$ and any point $P_{ij}$ on the jth curve. This figure shows an example of the shortest geodesic path from $P_R$ to $P_{iN}$

the $d_{G\min}$ between $P_R$ and any point $P_{ij}$ is calculated as the minimal summation of the distances between neighboring curves:

$$d_{G_{i1}} = \min[d(P_R, P_{ij})], \quad j = 1, \tag{8}$$

$$d_{G_{ij}} = \big[d(P_{i,j-1}, P_{ij})\big]_{j=2}^{j=N}, \tag{9}$$

$$d_{G\min_{ij}} = \min \sum_{j=1}^{N} d_{G_{ij}}, \tag{10}$$

where $d_{G_{i1}}$ is the shortest geodesic distance between the reference point and the ith point on the 1st curve, and $d_{G_{ij}}$ is the geodesic distance between the ith point on the (j − 1)th curve and the ith point on the jth curve. In practice, no prior information is provided between neighboring curves (i.e. there is no information between the jth and the (j + 1)th curve, as information is only given along the curves), so $d_{G_{i1}}$ and $d_{G_{ij}}$ are approximated by the Euclidean distance. $d_{G\min_{ij}}$ is defined as the shortest geodesic distance between the reference point and the ith point on the jth curve. Since only the deformed curves are given, the shortest geodesic distance (SGD) is therefore the summation of piecewise Euclidean distances (Fig. 3). Once the SGDs are computed for all curves, we get the shortest path from $P_R$ to $P_{iN}$ (Fig. 4). In a similar fashion to the ith point, we calculate the $d_{G\min_{ij}}$ for the other points, leading to the NM shortest geodesic paths for a face model. As a result, the face model, whose size is 3 × M × N, is represented as an M × N matrix composed of SGDs.
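The piecewise Euclidean approximation can be sketched in Python as a cumulative sum along each index i. This is a simplified reading that pairs the ith point on one curve with the ith point on the next curve and performs no search over alternative paths; the function name `sgd_matrix` and the planar test data are assumptions of this example.

```python
import numpy as np

def sgd_matrix(P, P_R):
    """Approximate the M x N matrix of shortest geodesic distances.

    P   : (M, N, 3) array, P[i, j] is the i-th point on the (j+1)-th curve.
    P_R : (3,) reference point.
    Each entry is a running sum of Euclidean segment lengths from P_R outward.
    """
    first = np.linalg.norm(P[:, 0, :] - P_R, axis=-1)             # P_R to first curve, shape (M,)
    steps = np.linalg.norm(P[:, 1:, :] - P[:, :-1, :], axis=-1)   # curve j-1 to curve j, shape (M, N-1)
    segments = np.concatenate([first[:, None], steps], axis=1)    # shape (M, N)
    return np.cumsum(segments, axis=1)

# Hypothetical planar test case: circles of radius 1..N in the plane z = 0.
M, N = 10, 8
theta = 2.0 * np.pi * np.arange(M) / M
radii = np.arange(1, N + 1, dtype=float)
P = np.stack([np.cos(theta)[:, None] * radii,
              np.sin(theta)[:, None] * radii,
              np.zeros((M, N))], axis=-1)

D_planar = sgd_matrix(P, np.zeros(3))
print(D_planar[0])                     # distances from the center to each circle
```

For the planar case every row of the result should contain the radii 1, ..., N, which gives a quick sanity check before applying the function to a real face model.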

5 Representation in lower dimension

Once the $d_{G\min_{ij}}$ are measured at all points on the curves, an M × N matrix D is generated (we will write $d_{ij}$ instead of $d_{G\min_{ij}}$ from now on):

$$D = \begin{bmatrix} d_{11} & \cdots & \cdots & \cdots & d_{1N} \\ d_{21} & \ddots & & & \vdots \\ \vdots & & d_{ij} & & \vdots \\ \vdots & & & \ddots & \vdots \\ d_{M1} & \cdots & \cdots & \cdots & d_{MN} \end{bmatrix},$$

Fig. 5 In case of a planar surface, projected circular patterns preserve their shapes, and $f_{D_P}(x)$ is a summation of unit step functions. This figure shows an example of N = 8 and M = 10 (plot of $f_{D_P}$ against x)

where the jth column represents the shortest geodesic distances between the reference point and the points which belong to the jth curve. We can reduce the dimension of an object by representing a set of 3D facial curves of size M × N × 3 using an M × N matrix, and the matrix preserves the intrinsic properties of the object. Let us define the radius of the jth original circular pattern as $j k_R$, where $k_R$ represents the scaling factor of the radius of the light pattern and can include a distortion factor. In this paper we assume that the distortion of the light pattern is appropriately compensated, and the scaling factor is omitted for simplicity. If the target object is a planar surface, then the elements of the jth column are all equal to j:

$$D_P = \begin{bmatrix} 1 & \cdots & j & \cdots & N \\ 1 & \cdots & j & \cdots & N \\ \vdots & & \vdots & & \vdots \\ 1 & \cdots & j & \cdots & N \end{bmatrix}.$$

If we represent the matrices D and $D_P$ as one-dimensional signals, the signals can be written as

$$f_D = [d_{11}, d_{21}, \ldots, d_{M1}, \ldots, d_{ij}, \ldots, d_{1N}, \ldots, d_{MN}], \tag{11}$$

$$f_{D_P} = [1, 1, \ldots, 1, \ldots, j, \ldots, N, \ldots, N], \tag{12}$$

or, in the continuous domain, $f_{D_P}$ can be represented as follows:

$$f_{D_P}(x) = \sum_{k=0}^{N} u(x - kM), \tag{13}$$

where $f_D(x)$ is called the face distance function in this paper, u(x) is the unit step function, N is the number of circular patterns projected onto a planar surface, and M is the number of sampled points on each curve (in a continuous domain, N and M are called sampling densities) (Fig. 5). If the circular patterns are not projected onto a planar surface, then $f_D(x)$ differs from $f_{D_P}(x)$, and the shape of $f_D(x)$ possesses the characteristics of the object (e.g. Fig. 6). The Fourier Transform of $f_D$ is written as

$$F_{D_C}(\omega) = \int_{-\infty}^{+\infty} f_D(x)\, e^{-j\omega x}\, dx \quad \text{(Continuous Fourier Transform)}, \tag{14}$$

$$F_{D_D}(\omega) = \sum_{x=0}^{NM-1} f_D(x)\, e^{-j\omega x} \quad \text{(Discrete Fourier Transform)}, \tag{15}$$

or we can consider the NM-point DFT,

$$F_D(k) = \sum_{x=0}^{NM-1} f_D(x)\, e^{-j\frac{2\pi}{NM}kx}. \tag{16}$$

Fig. 6 In case of a nonplanar surface, projected circular patterns do not preserve their shapes, and $f_D(x)$ is not a summation of unit step functions. The face model is represented using vertices and triangular faces. Bottom: $f_D(x)$ shows greater variance compared to $f_{D_P}(x)$ in Fig. 5 due to the surface shape. a Original face, b geometric face (front view), c geometric face, d facial curves
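The reshaping of the distance matrix into a one-dimensional signal and its NM-point DFT can be sketched in Python for the planar case. The sketch assumes column-by-column flattening, as in the ordering of the one-dimensional signals above, and uses `numpy.fft.fft`, whose kernel matches the NM-point DFT; the variable names are chosen for this example only.

```python
import numpy as np

M, N = 10, 8                                   # example sizes from the planar figure

# Planar case: every entry of column j equals j.
D_P = np.tile(np.arange(1, N + 1, dtype=float), (M, 1))

# Flatten column by column: [d_11, d_21, ..., d_M1, ..., d_1N, ..., d_MN].
f_DP = D_P.flatten(order="F")

# The same signal written as a sum of unit steps u(x - kM), x = 0, ..., NM - 1.
x = np.arange(M * N)
steps = sum((x >= k * M).astype(float) for k in range(N + 1))

# NM-point DFT with kernel exp(-j * 2*pi * k * x / (N*M)).
F_DP = np.fft.fft(f_DP)
print(F_DP[0].real)                            # zero-frequency bin equals the sum of f_DP
```

For a nonplanar surface the only change would be to replace `D_P` by the measured distance matrix; the flattening and the transform stay the same.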

Following the properties of the Fourier Transform (FT), $F_D(\omega)$ is conjugate symmetric because the elements of $f_D(x)$ are real values, so the magnitude of $F_D(\omega)$ is a symmetric function (Fig. 7). In addition, even if the starting point is not the same between the curves, the magnitude response $|F_D(\omega)|$ is not affected, and the starting point alignment (Miao and Krim 2010) is not needed:

$$F_D(\omega) = \int_{-\infty}^{+\infty} f_D(x)\, e^{-j\omega x}\, dx, \tag{17}$$

$$f_D(x) \overset{FT}{\Rightarrow} F_D(\omega), \tag{18}$$

$$f_D(x - m) \overset{FT}{\Rightarrow} F_D(\omega)\, e^{-j\omega m}, \tag{19}$$

$$|F_D(\omega)| = |F_D(\omega)\, e^{-j\omega m}|. \tag{20}$$
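The shift property can be checked numerically with a short Python sketch. It assumes a circular shift of the whole discrete signal, which is the case in which the DFT magnitude is exactly unchanged; the random test signal and the offset m are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
f_D = rng.uniform(1.0, 50.0, size=80)          # hypothetical face distance signal, NM = 80
m = 7                                          # arbitrary starting-point offset

f_shifted = np.roll(f_D, m)                    # f_D(x - m), taken circularly

F = np.fft.fft(f_D)
F_shifted = np.fft.fft(f_shifted)

# The shift only multiplies each bin by a unit-modulus phase factor.
k = np.arange(f_D.size)
phase = np.exp(-2j * np.pi * k * m / f_D.size)

mag = np.abs(F)
mag_shifted = np.abs(F_shifted)
print(np.max(np.abs(mag - mag_shifted)))       # difference at floating-point level
```

Comparing magnitudes therefore removes the dependence on where sampling starts, at the price of discarding the phase.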

Fig. 7 The magnitude of the Fourier transform of $f_D(x)$ is symmetric because $f_D(x)$ is a real-valued function (top: Fourier transform $F_D(\omega)$ of $f_D(x)$; bottom: magnitude of the Fourier transform; horizontal axes: radian frequency (rad/s))

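This invariance to the starting point contributes to the simplicity and efficiency of the classification and recognition algorithm. The conjugate symmetry gives

$$F_D(\omega) = F_D^*(2\pi - \omega), \tag{21}$$

$$|F_D(\omega)| = |F_D^*(2\pi - \omega)| = |F_D(2\pi - \omega)|. \tag{22}$$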

In other words, due to the symmetry of $|F_D(\omega)|$, $[|F_D(\omega)|]_{\omega=0}^{\pi}$ has sufficient information about an object. In addition, even if the starting point on the curve is not aligned, the magnitude response of the Fourier Transform is invariant, which is an advantage of using the Fourier Transform. Let us define $[F_D(\omega)]_{\omega=0}^{\pi}$ as $\tilde{F}(\omega)$ and the Inverse Fourier Transform of $\tilde{F}(\omega)$ as $\tilde{f}(x)$; then the Inverse Fourier Transform of $F_D^*(\omega)$ over the same interval is $\tilde{f}^*(-x)$. Therefore, the 3D object S may be analyzed using only a one-dimensional signal, $\tilde{f}(x)$, or the magnitude of $\tilde{f}(x)$:

$$\tilde{f}(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{F}(\omega)\, e^{j\omega x}\, d\omega = \frac{1}{2\pi} \int_{0}^{\pi} F_D(\omega)\, e^{j\omega x}\, d\omega, \tag{23}$$

$$\tilde{f}^*(-x) = \frac{1}{2\pi} \int_{0}^{\pi} F_D^*(\omega)\, e^{j\omega x}\, d\omega. \tag{24}$$

Due to the symmetry, either $\tilde{f}(x)$ or $\tilde{f}^*(-x)$ is sufficient to represent the characteristics of an object, and to classify or recognize an object. In addition, since the Fourier Transform is one-to-one and onto, the inverse Fourier Transform of $F_D(\omega)$, $\omega \in [0, \pi)$, also preserves the information of the original object. The symmetry property of the Fourier Transform enables a target object to be represented as a signal of size 0.5 × M × N (Fig. 8). Assuming that the number of points on each curve is sufficiently large (i.e. M is sufficiently large), the characteristics of $f_D(x)$, $F_D(\omega)$, etc., are preserved even if some points are missing from the curves.
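The role of the symmetry can be illustrated in Python with NumPy's real-input transforms. The sketch checks the conjugate symmetry of the DFT of a real signal and shows that the bins from 0 to π, kept together with their phase, are enough to rebuild the signal. The test signal is hypothetical, and `numpy.fft.rfft` is used here as a convenient stand-in for restricting the spectrum to half of the frequency axis.

```python
import numpy as np

rng = np.random.default_rng(1)
f_D = rng.uniform(1.0, 50.0, size=80)          # hypothetical real-valued signal, NM = 80
n = f_D.size

F_D = np.fft.fft(f_D)

# Conjugate symmetry of a real signal: F_D[k] equals conj(F_D[n - k]) for k = 1..n-1.
k = np.arange(1, n)
is_conjugate_symmetric = np.allclose(F_D[k], np.conj(F_D[n - k]))

# Keep only the non-redundant half of the spectrum (bins 0 .. n/2).
half = np.fft.rfft(f_D)

# The half spectrum (with phase) is enough to rebuild the full-length signal.
recovered = np.fft.irfft(half, n=n)
print(half.size, is_conjugate_symmetric)
```

With n = 80 samples the half spectrum holds 41 bins, which is close to the 0.5 × M × N size quoted in the text.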

Fig. 8 Due to symmetry, the first half of the magnitude of $F_D(\omega)$ or the Inverse Fourier Transform of $F_D(\omega)$ ($\omega \in [0, \pi)$) provides sufficient information about an object. Top: $[|F_D(\omega)|]_{\omega=0}^{\pi}$ against radian frequency (rad/s), Bottom: $|\tilde{f}(x)|$ against index

6 Classification/recognition

The recognition proposed in this work is performed by a pairwise comparison of the intrinsic properties of objects, and computational complexity is crucial in practice. The comparison is carried out by calculating the correlation coefficient between two objects. Many other recognition studies employ geodesic paths, and the selection of the shortest path has been investigated by calculating the distance between facial curves over all pairs of points. In our research, prior to calculating the correlation coefficient between two faces, the dimension of the geometric face information is reduced by calculating the geodesic distance between the reference point (e.g., the tip of the nose) and any point on the curve, followed by using the properties of the Fourier Transform. Then, by reducing the number of curves used to represent a 3D face, the computational complexity could be decreased without degrading the system performance (recognition rate).

Fig. 9 a Original face, b geometric representation of the face in (a), c extracted facial curves, which are deformed circular patterns. When we calculate the geodesic distances, the reference point is chosen as the point with the highest z-value, i.e., the nose tip is selected as the reference point in this paper

Recognition is performed by comparing different objects. Let us define S as a set of objects ($S = [S_1, S_2, S_3, \ldots]$), or simply consider a pairwise comparison of two objects (i.e. $S = [S_1, S_2]$). $S_1$ and $S_2$ are represented as sets of curves, $C_1$ and $C_2$, and each face model is represented as a matrix, $D_1$ or $D_2$, generated by calculating the shortest geodesic distances. As explained in Sect. 5, we can generate one-dimensional functions, $\tilde{f}_1(x)$ and $\tilde{f}_2(x)$, corresponding to $D_1$ and $D_2$, respectively:

$$C_1 = [C_1^{1}, C_1^{2}, \ldots, C_1^{N}], \quad C_2 = [C_2^{1}, C_2^{2}, \ldots, C_2^{N}],$$

$$D_1 = [d_1^{ij}]_{i=1,\, j=1}^{i=M,\, j=N}, \quad D_2 = [d_2^{ij}]_{i=1,\, j=1}^{i=M,\, j=N},$$

$$f_{D_1} = [d_1^{11}, d_1^{21}, \ldots, d_1^{M1}, \ldots, d_1^{ij}, \ldots, d_1^{1N}, \ldots, d_1^{MN}],$$

$$f_{D_2} = [d_2^{11}, d_2^{21}, \ldots, d_2^{M1}, \ldots, d_2^{ij}, \ldots, d_2^{1N}, \ldots, d_2^{MN}],$$

$$F_{D_1}(\omega) = \int_{-\infty}^{+\infty} f_{D_1}(x)\, e^{-j\omega x}\, dx, \quad F_{D_2}(\omega) = \int_{-\infty}^{+\infty} f_{D_2}(x)\, e^{-j\omega x}\, dx, \quad x = 1, 2, 3, \ldots, NM - 1, NM,$$

$$\tilde{F}_{D_1}(\omega) = [F_{D_1}(\omega)]_{\omega=0}^{\pi}, \quad \tilde{F}_{D_2}(\omega) = [F_{D_2}(\omega)]_{\omega=0}^{\pi},$$

$$\tilde{f}_1(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{F}_{D_1}(\omega)\, e^{j\omega x}\, d\omega, \quad \tilde{f}_2(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{F}_{D_2}(\omega)\, e^{j\omega x}\, d\omega.$$

Given the equations above, a comparison between $S_1$ and $S_2$ is performed using $\tilde{f}_1(x)$ (or $\tilde{F}_{D_1}(\omega)$) and $\tilde{f}_2(x)$ (or $\tilde{F}_{D_2}(\omega)$), whose sizes are reduced from $M \times N \times 3$ to $0.5 \times M \times N$. Although there are several methods to classify $\tilde{f}_1(x)$ and $\tilde{f}_2(x)$, we here calculate the correlation coefficient matrix between the two functions, because a classification using the correlation coefficient matrix $K_{12}$ is invariant to scaling factors. In general, the correlation coefficient matrix between the $k$th class (or $\tilde{f}_k(x)$) and the $l$th class (or $\tilde{f}_l(x)$) is written as

$$K_{kl} = \begin{bmatrix} \sigma_{kk} & \sigma_{kl} \\ \sigma_{lk} & \sigma_{ll} \end{bmatrix},$$

and the classification result is represented as $\sigma_{kl}$ or $\sigma_{lk}$ due to the symmetry of $K_{kl}$. In practice, the distance between a viewpoint and an object may affect the relative coordinates of the object and the associated scaling factors. To achieve reliable classification results, the classification is based on the shapes of the facial curves, and the shapes are compared by calculating the correlation coefficient matrix (Fig. 9).
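The scale invariance of the correlation coefficient can be checked numerically. The sketch below uses `numpy.corrcoef` on synthetic signals; the helper name `similarity` and the random data are assumptions made for illustration, not the paper's implementation or data.

```python
import numpy as np

def similarity(f_k, f_l):
    """Off-diagonal entry of the 2 x 2 correlation coefficient matrix."""
    K = np.corrcoef(f_k, f_l)
    return K[0, 1]

# Synthetic stand-ins for two one-dimensional face functions.
rng = np.random.default_rng(0)
f1 = rng.random(50)
f2 = f1 + 0.05 * rng.standard_normal(50)

s = similarity(f1, f2)
s_scaled = similarity(3.0 * f1, f2)   # rescaling one signal leaves s unchanged
```

Multiplying one signal by a positive constant changes its covariance and its standard deviation by the same factor, so the ratio that defines the coefficient is unaffected.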

7 Simulation results

An experiment is performed using 3D face models represented by vertices and triangular faces. 195 face models (60 persons with different expressions, FRGCv2 database) are used to


Fig. 10 Extracted facial curves from the face models. a, b Same face with different expressions, c, d different face models

verify the proposed approach to classification, which achieves the recognition process. Other databases have also been used in our simulations for a more convincing evaluation, but in this paper we aim to show an efficient recognition framework rather than the recognition performance itself. The face models include approximately 60 people with different expressions such as neutral, smile, surprise, frown and inflated. Each face model is represented by approximately 95 to 100 curves (i.e., a face is represented as $C = [C^{1}, C^{2}, \ldots, C^{94}, C^{95}]$). Since the faces are composed of different numbers of curves, we use 95 curves (range of radius: $j = 1, \ldots, 95$, $j \in \mathbb{Z}$) as the common range for all face models. The $j$th curve is the deformed circular pattern whose radius is $j$, and for the $j$th curve,

$$x_{ij}^2 + y_{ij}^2 = j^2. \qquad (25)$$
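The constraint in Eq. (25) acts only on the x and y coordinates, which the following sketch makes concrete. It assumes that the reference point sits at the origin of the x-y plane and that each curve is sampled at M equally spaced angles; the depth function `hypothetical_depth` is a made-up smooth bump standing in for a face surface, not a real face model.

```python
import numpy as np

def circular_pattern_xy(j, M):
    """Return M (x, y) samples on the circle of radius j centred at the origin."""
    theta = 2.0 * np.pi * np.arange(M) / M
    return j * np.cos(theta), j * np.sin(theta)

def hypothetical_depth(x, y):
    """Stand-in surface (a smooth bump); NOT a real face model."""
    return 50.0 * np.exp(-(x**2 + y**2) / (2.0 * 40.0**2))

M = 8                                  # points per curve (illustrative value)
curves = []
for j in range(1, 96):                 # radii j = 1, ..., 95 as in the text
    x, y = circular_pattern_xy(j, M)
    z = hypothetical_depth(x, y)       # depth is read off, never constrained
    curves.append(np.column_stack([x, y, z]))
```

Each entry of `curves` is an M x 3 array whose first two columns satisfy Eq. (25); the third column is what deforms the circle into a facial curve.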

Using circular patterns, we do not need to consider the depth value of each data point for curve extraction, and ultimately for recognition, since each curve is constrained using only the x and y coordinates, which simplifies the algorithm. The shape of a facial curve is nevertheless determined by depth; for example, a 3D face and its facial curves are shown in Fig. 6. We consider the geometric perspectives of face models, and Fig. 10 shows examples of the facial curves for the same person with different expressions ((a) and (b)) and curves from different people ((c) and (d)). The proposed algorithms are applied to the face models in Fig. 10, and the results are presented below. Each face is composed of $N$ curves, each of which contains $M$ points. By measuring $d_{G_{\min}}^{ij}$, the matrix $D$ is generated, and a one-dimensional function $f_D(x)$ shows the intrinsic characteristics of the face models (Fig. 11). In Fig. 11, $f_D(x)$ of each face is induced from the shortest geodesic distances


Fig. 11 1-(a)–1-(d): $f_D(x)$ of the same person with four different expressions. 2–5: $f_D(x)$ of four different people. The $f_D(x)$ are one-dimensional functions corresponding to the faces in Fig. 10. The functions from the same person have similar shapes, whereas the functions from different persons do not, because each function contains the shortest geodesic distance information, which is preserved

between the reference point and the points on the facial curves. Since the geodesic distance contains the intrinsic geometric information of a 3D face, the face distance functions (FDFs) of different expressions of the same face are very similar to each other, while the FDFs of different faces are less similar. In other words, the intraclass similarity is higher than the interclass similarity. Similarity is measured using a correlation coefficient matrix $K$, or the off-diagonal element of $K$,

$$K = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}.$$

Due to the symmetry of $K$, $\sigma_{12}$ is equal to $\sigma_{21}$ and represents the similarity between the FDFs of two objects. The more similar the FDFs are, the higher $\sigma_{12}$ is (the similarity between 195 faces is represented as a 195 × 195 matrix). As proposed in Sect. 5, the symmetric property of the Fourier Transform increases the efficiency of a classification system. The magnitude responses of the faces in Fig. 10 are presented in Figs. 12, 13, 14 and 15 (the indices 1-(a), 1-(b), 1-(c), 1-(d), 2, 3, 4 and 5 are consistent with Figs. 10 and 11). The similarity between these 8 face models (4 are in the same class and the others are in different classes) is presented in Fig. 16. The values in the table of Fig. 16 represent the quantity of similarity ($\sigma_{12}$). The intraclass similarity ($\sigma_{12}$ in the shaded area) is higher than the similarity in the other areas. Classification results using 30, 60, and more than 60 curves from 195 face models (60 persons with different expressions) are presented in Fig. 17. The recognition result is represented as a 195 × 195 matrix, and each element represents the similarity ($\sigma_{12}$). The results show that $\sigma_{12}$ is higher in the areas close to the diagonal and lower elsewhere, because the areas close to the diagonal contain $\sigma_{12}$ from intraclass comparisons, and the other areas contain $\sigma_{12}$ from interclass comparisons. The proposed approach shows robustness in classification using only a low sampling


Fig. 12 Left top: $F_D(\omega)$, the Fourier transform of $f_D(x)$; left bottom: magnitude response of $F_D(\omega)$; right top: $\tilde{f}(x)$, the inverse Fourier transform of $\tilde{F}_D(\omega)$, the first half of the magnitude response of $F_D(\omega)$; right bottom: $\tilde{F}_D(\omega)$

Fig. 13 Left top: $F_D(\omega)$, the Fourier transform of $f_D(x)$; left bottom: magnitude response of $F_D(\omega)$; right top: $\tilde{f}(x)$, the inverse Fourier transform of $\tilde{F}_D(\omega)$, the first half of the magnitude response of $F_D(\omega)$; right bottom: $\tilde{F}_D(\omega)$


Fig. 14 Left top: $F_D(\omega)$, the Fourier transform of $f_D(x)$; left bottom: magnitude response of $F_D(\omega)$; right top: $\tilde{f}(x)$, the inverse Fourier transform of $\tilde{F}_D(\omega)$, the first half of the magnitude response of $F_D(\omega)$; right bottom: $\tilde{F}_D(\omega)$

Fig. 15 Left top: $F_D(\omega)$, the Fourier transform of $f_D(x)$; left bottom: magnitude response of $F_D(\omega)$; right top: $\tilde{f}(x)$, the inverse Fourier transform of $\tilde{F}_D(\omega)$, the first half of the magnitude response of $F_D(\omega)$; right bottom: $\tilde{F}_D(\omega)$


Fig. 16 Eight faces from five persons (classes) are used to measure the similarity. The similarity values ($\sigma_{12}$) within the same class (shaded area) are higher than those between different classes

density (approximately 31 %). In Fig. 17, the recognition results improve with higher sampling density, although the 31.6 % sampling density (30 curves, Table 1) already gives reliable classification results. This sampling density is lower than the density required for the surface reconstruction shown in Lee and Krim (2011). On the FRGCv2 dataset (Table 2), the proposed method shows higher performance than the method of Ballihi et al. (2012) when all the curves are used. When selected SHF or selected circular/radial curves are used, however, the proposed method shows a lower acceptance rate. Even though the existing methods require additional procedures, such as selection of the relevant curves, registration, and extraction of geodesic paths between two objects (Ballihi et al. 2012), or selection of the relevant SHF (Liu et al. 2013), the best performance reported in this recent research is not significantly higher than that of the proposed method. Each face has its own $\tilde{f}(x)$, where $x = 1, 2, \ldots, 0.5 \times NM$, and we select 60 face models as a training set (i.e., one face from each class). The training set is represented as $[\tilde{f}_j(x)]_{j=1}^{60}$. Each class has 3 or 4 expressions (smile, frown, inflated and surprise), and we store the training set in a database to carry out face recognition. The recognition rate also varies with the number of deformed circular curves $N$ (Tables 1, 2). The performance is evaluated using Receiver Operating Characteristic (ROC) curves, which are illustrated in Fig. 18. For 60 people with different expressions, the ROC curves show the higher performance of the proposed method. The Euclidean distance between the reference point and the points on the curve does not explicitly preserve the intrinsic properties of generic surfaces (e.g., human faces). In other words, this method is not bending invariant. Correlation between 3D points cannot be used for classification unless the image is exactly registered or the coordinate


Fig. 17 Classification matrix is composed of $\sigma_{12}$ values, obtained by calculating the correlation coefficient matrices between all pairs of the faces. Higher sampling density shows better recognition results

Table 1 Recognition rate

N (number of curves)    30      60      80      90
Sampling density (%)    31.6    63.2    84.2    94.7
Recognition rate (%)    91.28   93.33   94.87   95.38
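The sampling densities in Table 1 are numerically consistent with the ratio of the number of curves used, N, to the 95 curves that form the common range of all face models. This reading is an inference from the tabulated values rather than a definition given in the text, and the short check below is only a sketch under that assumption.

```python
TOTAL_CURVES = 95  # common number of curves per face stated in the text

def sampling_density_percent(n_curves, total=TOTAL_CURVES):
    """Share of the available curves that is actually used, in percent."""
    return 100.0 * n_curves / total

# Densities for the four values of N listed in Table 1, rounded to one decimal.
densities = {n: round(sampling_density_percent(n), 1) for n in (30, 60, 80, 90)}
```

Under this assumption the dictionary reproduces the second row of Table 1 (31.6, 63.2, 84.2 and 94.7 %).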

Table 2 Comparison with state-of-the-art approaches on FRGCv2

Methods (number of curves)                   Recognition rate (%)
Proposed                                     95.38
SHF, Liu et al. (2013)                       96.94
Ballihi (all), Ballihi et al. (2012)         91.81
Ballihi (selected), Ballihi et al. (2012)    98.00

system is aligned. The proposed method, which uses geodesic distances, outperforms the Euclidean-distance and point-correlation alternatives, as illustrated in Fig. 18. In this work, the total processing time consists of extracting the deformed circular curves, calculating the geodesic distances, computing the Fourier Transform, and performing the comparison. For curve extraction and geodesic distance calculation (when 90 curves were used), the processing times were on the order of 4 seconds and 6 seconds, respectively. When 30 curves

[Figure: Receiver Operating Characteristic (ROC) curves. x-axis: False Positive; y-axis: True Positive; curves: Geodesic distance, Euclidean distance, Points Correlation]

Fig. 18 Classification matrix is composed of $\sigma_{12}$ values, obtained by calculating the correlation coefficient matrices between all pairs of the faces. Higher sampling density shows the better classification results

were used for recognition, the minimum processing times were 3.1230 seconds and 4.8473 seconds, respectively. The processing time for the Fourier Transform and the comparison using the correlation coefficient was less than 2 seconds (1.9125 seconds on average). Overall, the average total processing time for recognition in this paper is on the order of 10 seconds. However, all of the simulations were carried out using Matlab on a PC, and the processing speed of the proposed method needs to be improved in future work.
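The pairwise similarity matrix and the ROC evaluation described in this section can be sketched as follows. The signals are synthetic stand-ins (three hypothetical "persons" with four noisy "expressions" each), and the function names, noise level and threshold grid are all assumptions chosen for illustration; this is not the authors' Matlab code or their data.

```python
import numpy as np

def similarity_matrix(signals):
    """Pairwise correlation coefficients between the rows of `signals`."""
    return np.corrcoef(np.asarray(signals, dtype=float))

def roc_points(S, labels, thresholds):
    """True/false positive rates when pairs with similarity >= t are accepted."""
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)       # each unordered pair once
    scores = S[iu]
    genuine = labels[iu[0]] == labels[iu[1]]     # same-person pairs
    tpr, fpr = [], []
    for t in thresholds:
        accepted = scores >= t
        tpr.append(np.mean(accepted[genuine]))
        fpr.append(np.mean(accepted[~genuine]))
    return np.array(fpr), np.array(tpr)

# Synthetic data: 3 templates, 4 noisy copies of each (made-up values).
rng = np.random.default_rng(1)
templates = rng.random((3, 200))
signals = np.vstack([t + 0.02 * rng.standard_normal(200)
                     for t in templates for _ in range(4)])
labels = np.repeat(np.arange(3), 4)

S = similarity_matrix(signals)                   # 12 x 12, symmetric
fpr, tpr = roc_points(S, labels, thresholds=np.linspace(-1.0, 1.0, 21))
```

With the rows grouped by class, the large entries of `S` cluster in blocks along the diagonal, which mirrors the structure described for the 195 × 195 matrix.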

8 Conclusion

In this paper, we have proposed an approach to face recognition using deformed circular curves. We have measured the shortest geodesic distances between the reference point (e.g., the nose tip) and the points on each curve. The shortest geodesic distances are in turn arranged into a matrix or a one-dimensional function, and the functions are compared to each other. Experimentally, there is little difference in the geodesic distances between the same face with different expressions (intraclass difference), whereas face models of different people show a low similarity due to the shape of the facial curves (interclass difference). Even though the faces are composed of more than 90 curves, recognition results are sufficiently reliable using only 30 curves (the sampling density is 31.6 % in Table 1 and the recognition rate is 91.28 %, which is higher than the previous work in Samir et al. (2006)). This result also shows that the sampling rate can be adjusted according to the purpose of interest, such as reconstruction, classification or recognition. Using a 94.7 % sampling density (Table 1), the recognition rate is 95.38 %, which is competitive with other approaches that use only geometric information. Although a higher sampling density may increase the computational complexity, the facial curves here are extracted using only the x and y coordinates, without the depth values. In addition, this result is very promising because the proposed algorithm uses only geometric information and does not require additional processes such as the selection of the relevant curves, optimization by an iterative procedure, etc. In particular, the robustness of the approach results from using the symmetric property of the Fourier Transform, whose magnitude response leads to dimensionality reduction as well as invariance to shift or rotation of the curve. Most importantly, the similarity measurement is carried out without alignment of the curves of interest. In addition, since the intrinsic properties of a face are represented using a one-dimensional function, $f_D(x)$, the recognition performance is not sensitive to missing sampled points, assuming that the number of sampled points in each curve is sufficiently large (i.e., $M$ is sufficiently large). Pose-invariant face recognition is left for future study, which will estimate a criterion for the viewpoint angle that affects the acceptance rate.
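The shift invariance of the magnitude response can be verified on a single sampled closed curve. In the sketch below a circular shift of the samples models a change of starting point along the curve; that modelling choice, the random signal and the shift of 17 samples are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.random(64)                 # stand-in for samples along one closed curve
f_shifted = np.roll(f, 17)         # same samples, different starting point

mag = np.abs(np.fft.fft(f))
mag_shifted = np.abs(np.fft.fft(f_shifted))

# A circular shift only multiplies each DFT bin by a unit-modulus phase factor,
# so the magnitudes agree even though the signals themselves differ.
same_magnitude = np.allclose(mag, mag_shifted)

half = mag[: len(f) // 2 + 1]      # bins for 0 <= omega <= pi
```

The array `half` also illustrates the dimensionality reduction: for a real signal the remaining bins mirror these magnitudes.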

References

Aldroubi, A., & Grochenig, K. (2001). Nonuniform sampling and reconstruction in shift-invariant spaces. SIAM Review, 43(4), 585–620.
Aouada, D., & Krim, H. (2010). Squigraphs for fine and compact modeling of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), 306–321.
Ballihi, L., Ben Amor, B., Daoudi, M., Srivastava, A., & Aboutajdine, D. (2012). Boosting 3-D geometric features for efficient face recognition and gender classification. IEEE Transactions on Information Forensics and Security, 7(6), 1766–1799.
Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15, 1373–1396.
Berretti, S., Del Bimbo, A., & Pala, P. (2010). 3D face recognition using iso-geodesic stripes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2162–2177.
Borg, I., & Groenen, P. J. F. (2005). Modern multidimensional scaling: Theory and applications, Springer series in statistics (2nd ed.). New York: Springer-Verlag.
Bowyer, K. W., Chang, K., & Flynn, P. (2006). A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition. Computer Vision and Image Understanding, 101(1), 1–15.
De Lathauwer, L., & Vandewalle, J. (2004). Dimensionality reduction in higher-order signal processing and rank-(R1, R2, ..., RN) reduction in multilinear algebra. Special Issue on Linear Algebra in Signal and Image Processing, 391, 31–55.
Demartines, P., & Herault, J. (1997). Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8(1), 148–154.
Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306.
Elad, A., & Kimmel, R. (2003). On bending invariant signatures for surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10), 1285–1295.
Feng, S., Krim, H., & Kogan, I. A. (2007). 3D face recognition using Euclidean integral invariants signature. In IEEE/SP 14th Workshop on Statistical Signal Processing, 2007. SSP '07, pp. 156–160.
Guillamet, D., Schiele, B., & Vitria, J. (2002). Analyzing non-negative matrix factorization for image classification. In Proceedings of the 16th International Conference on Pattern Recognition.
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16, 2639–2664.
He, X., & Niyogi, P. (2004). Locality preserving projections. In Proceedings of the NIPS, Advances in Neural Information Processing Systems (Vol. 16). Vancouver: MIT Press.
Hou, C., Nie, F., Li, X., Yi, D., & Wu, Y. (2014). Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Transactions on Cybernetics, 44(6), 793–804.
Huang, W., & Yin, H. (2009). Nonlinear dimensionality reduction for face recognition. IDEAL'09 Proceedings of the 10th International Conference on Intelligent Data Engineering and Automated Learning, pp. 424–432.
Hyvarinen, A., & Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9, 1483–1492.
Jahanbin, S., Choi, H., Liu, Y., & Bovik, A. C. (2008). Three dimensional face recognition using iso-geodesic and iso-depth curves. In 2nd IEEE International Conference on Biometrics: Theory, Applications and Systems, 2008. BTAS 2008.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480.
Lee, D., & Krim, H. (2010). 3D surface reconstruction using structured circular light patterns. In ACIVS 2010. Part I, LNCS (Vol. 6474, pp. 279–289).
Lee, D., & Krim, H. (2011). A sampling rate for a 2D surface. SSVM, LNCS, 6667.
Lee, J. A., Lendasse, A., & Verleysen, M. (2004). Nonlinear projection with curvilinear distances: Isomap versus curvilinear distance analysis. Neurocomputing, 57, 49–76.
Liu, P., Wang, Y., Huang, D., Zhang, Z., & Chen, L. (2013). Learning the spherical harmonic features for 3-D face recognition. IEEE Transactions on Image Processing, 22(3), 914–925.
Liu, X., Wang, L., Zhang, J., & Liu, H. (2014). Global and local structure preservation for feature selection. IEEE Transactions on Neural Networks and Learning Systems, 25(6), 1083–1095.
Miao, S., & Krim, H. (2010). 3D face recognition based on evolution of iso-geodesic distance curves. In 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 1134–1137.
Mika, S., Ratsch, G., Weston, J., Scholkopf, B., & Muller, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop.
Oprea, J. (2007). Differential geometry and its applications (2nd ed.). The Mathematical Association of America (Incorporated): Pearson Education.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
Samir, C., Srivastava, A., & Daoudi, M. (2006). Three-dimensional face recognition using shape of facial curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1858–1863.
Saul, L. K., & Roweis, S. T. (2000). An introduction to locally linear embedding. Florham Park, NJ: AT and T.
Scholkopf, B., Smola, A., & Muller, K.-R. (1997). Kernel principal component analysis. In Artificial Neural Networks - ICANN'97. Lecture Notes in Computer Science, 1327, 583–588.
Scholkopf, B., Smola, A., & Muller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
Turk, M. A., & Pentland, A. P. (1991). Face recognition using eigenfaces. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–591.
Unser, M. (2000). Sampling-50 years after Shannon. Proceedings of the IEEE, 88(4), 569–587.
Vasuhi, S., & Vaidehi, V. (2009). Identification of human faces using orthogonal locality preserving projections. In International Conference on Signal Processing Systems.
Wang, X., & Paliwal, K. K. (2003). Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition. Pattern Recognition, 36(10), 2429–2439.
Zhang, C., Wang, J., Zhao, N., & Zhang, D. (2004). Reconstruction and analysis of multi-pose face images based on nonlinear dimensionality reduction. Pattern Recognition, 37(2), 325–336.

Deokwoo Lee received the B.S. degree in Electrical Engineering from Kyungpook National University, Daegu, South Korea, in 2007, and the M.S. and Ph.D. degrees in Electrical Engineering from North Carolina State University, Raleigh, NC, USA, in 2008 and 2012, respectively. Prior to joining Samsung Electronics in 2013, he worked at Washington University in St. Louis, MO, USA, as a Postdoctoral Research Associate. He is currently working as a Senior Researcher in the Division of Mobile Communications at Samsung Electronics Co., Ltd., South Korea.

Hamid Krim received his degrees in electrical engineering. As a member of technical staff at AT&T Bell Labs, he worked in the area of telephony and digital communication systems/subsystems. In 1991, he became an NSF postdoctoral scholar at the Foreign Centers of Excellence (LSS Supelec University of Orsay, Paris, France). He subsequently joined the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, as a Research Scientist performing and supervising research in his area of interest. In 1998, he joined the faculty of the ECE Department, North Carolina State University, Raleigh, NC, where he is currently a Professor directing the Vision, Information, Statistical Signal Theories and Applications (VISSTA) Laboratory. He is an original contributor and now an affiliate of the Center for Imaging Science sponsored by the Army. His research interests are in statistical signal processing and mathematical modeling with a keen emphasis on applications. Dr. Krim is a recipient of the NSF Career Young Investigator award. He was on the editorial board of the IEEE TRANSACTIONS ON SIGNAL PROCESSING and regularly contributes to the society in a variety of ways.
