Selection and Extraction of Patch Descriptors for 3D Face Recognition

Berk Gökberk and Lale Akarun
Boğaziçi University, Computer Engineering Department, TURKEY
{gokberk,akarun}@boun.edu.tr

Abstract. In 3D face recognition systems, 3D facial shape information plays an important role. 3D face recognizers usually depend on a point cloud representation of faces, where a face is represented as a set of 3D point coordinates. In many previous studies, faces are represented holistically and the discriminative contributions of local regions are assumed to be equivalent. In this work, we aim to design a local region-based 3D face representation scheme in which the discriminative contribution of each local facial region is taken into account by a subset selection mechanism. In addition to the subset selection methodology, we extract patch descriptors and code them using Linear Discriminant Analysis (LDA). Our experiments on the 3D_RMA database show that both the proposed floating backward subset selection scheme and the LDA-based coding of region descriptors improve the classification accuracy and significantly reduce the representation complexity.

1 Introduction

Despite two decades of intensive study, the challenges of face recognition remain: changes in illumination and in-depth pose variations still make it a difficult problem. Recently, 3D approaches to face recognition have shown promise in overcoming these problems [1]. 3D face data essentially contains multi-modal information: shape and texture. Initial attempts in 3D research mainly focused on shape information, and combined systems have since emerged that fuse shape and texture information. Surface normal-based approaches use facial surface normals to align and match faces; a popular method is the EGI representation [2]. Curvature-based approaches generally segment the facial surface into patches and use curvatures or shape-index values to represent faces [3]. Iterative Closest Point-based (ICP) approaches register faces using the popular ICP algorithm [4] and then define a similarity according to the quality of the fit computed by the ICP algorithm [5]. Principal Component Analysis-based (PCA) methods first project the 3D face data onto a 2D intensity image whose intensities are determined by the depth function; the projected 2D depth images can then be processed as standard intensity images [6]. Profile-based or contour-based approaches try to extract salient 2D/3D curves from face data and match these curves to find the identity of a person [7]. Point signature-based methods encode facial points using their relative depths with respect to neighboring points [8].

In this paper, we present a framework to represent faces locally by surface patches. The motivations for employing local patches are twofold: 1) we can analyze the contribution of local facial regions to the recognition accuracy of a 3D face recognizer, and 2) we can obtain a more compact face representation using sparse feature sets. For the first goal, we formulate the recognition problem as a floating feature selection problem; for the second, we extract patch descriptors and code them using statistical feature extraction methods. The face representation part is based on a novel variation of an ICP-based registration scheme. The designed registration algorithm is very fast, and it makes use of a generic average face model. As features, we use the 3D coordinates and surface normals of the registered faces.

The organization of the paper is as follows: Section 2.1 explains the registration algorithm. In Section 2.2, we present our face description method and the similarity measures. A detailed explanation of the patch-based representation schemes is given in Section 2.3. The application of feature selection and extraction methods is explained in Section 2.4. Experimental results are presented in Section 3.

2 The Proposed System

2.1 3D Face Registration and Dense Correspondence Establishment

Registration of facial data involves two steps: a preprocessing step and a transformation step. In the preprocessing step, a surface is fitted to the raw 3D facial point data. Surface fitting is carried out to sample the facial data regularly. After surface fitting, the central facial region is cropped and only the points inside the cropping ellipsoid are retained. The nose tip coordinates are used to determine the central cropping region, and the cropped faces are translated so that their nose tips lie at the same coordinates. In the rest of the paper, we refer to the cropped region as the facial data.

After preprocessing, a transformation step is used to align the faces. In the alignment step, our aim is to rotate and translate faces such that we can later define acceptable similarity measures between different faces. For this purpose, we define an average face model at a specific position in the 3D coordinate system. The average face model is defined as the average of the training faces. Each face is rigidly rotated and translated to fit this template, where the Iterative Closest Point (ICP) algorithm is used to find the rotation and translation parameters. The correspondences found by the ICP algorithm between the template face and two arbitrary faces $F_i$ and $F_j$ are then used to establish point-to-point dense correspondences. The ICP algorithm basically determines the nearest point on $F_i$ to an arbitrary point on the average face model. Therefore, for each point on the average face model, the corresponding point on $F_i$ is selected. If there are $m$ points on the average face model $F_A = \{p_1^A, p_2^A, \ldots, p_m^A\}$, we represent face $F_i$ by the nearest $m$ points on $F_i$, i.e., $\Phi_i = \{p_1^i, p_2^i, \ldots, p_m^i\}$. Here, $\Phi_i$ denotes the registered face. This methodology provides an ordering of the 3D points on faces, which is necessary to define a similarity measure between two faces.
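As an illustration, the dense correspondence step can be viewed as a nearest-neighbor lookup against the average model. The sketch below assumes the face scan has already been rigidly aligned to the template by ICP; the use of SciPy's cKDTree and the array shapes are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def establish_correspondence(avg_model, face_points):
    """For each point p_k^A of the average face model, pick the
    nearest point on the ICP-aligned face F_i.  The result Phi_i
    has exactly m points, ordered like the average model."""
    tree = cKDTree(face_points)      # face_points: (n_i, 3) aligned scan
    _, idx = tree.query(avg_model)   # avg_model: (m, 3) template points
    return face_points[idx]          # Phi_i: (m, 3) registered face
```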

2.2 Facial Features

Let $\Phi_i$ be the registered 3D face of the $i$th individual. In the point cloud-based representation, the 3D coordinates of the facial point cloud are used as features. We can write $\Phi_i$ in point cloud representation as $\Phi_i^P = \{p_1^i, p_2^i, \ldots, p_m^i\}$, where the $p^i$'s are the $(x, y, z)$ coordinates of each 3D point on the face and $m$ is the number of points in the face. In the surface normal-based representation, the surface normals of all $m$ points are used as features: $\Phi_i^N = \{n_1^i, n_2^i, \ldots, n_m^i\}$, where the $n^i$'s are unit surface normal vectors. Since the dense point correspondence algorithm produces an ordering of facial points, we define the distance between two faces $\Phi_i$ and $\Phi_j$ in point cloud representation as $D(\Phi_i^P, \Phi_j^P) = \sum_{k=1}^{m} \|p_k^i - p_k^j\|$, where $\|\cdot\|$ denotes the Euclidean norm. The same distance function is used for the surface normal-based representation. As a pattern classifier, the 1-nearest neighbor algorithm is used.
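The distance and the classification rule can be written compactly as below. This is a minimal sketch, assuming each registered face is stored as an $(m, 3)$ NumPy array of corresponding coordinates (or unit normals); the gallery and label containers are illustrative.

```python
import numpy as np

def face_distance(phi_i, phi_j):
    """D(Phi_i, Phi_j): sum over the m corresponding points of the
    Euclidean norms ||p_k^i - p_k^j||; works for normals as well."""
    return np.linalg.norm(phi_i - phi_j, axis=1).sum()

def classify_1nn(probe, gallery, labels):
    """1-nearest neighbor: return the label of the closest gallery face."""
    distances = [face_distance(probe, g) for g in gallery]
    return labels[int(np.argmin(distances))]
```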

2.3 Local Patch-based Representation of Faces

In this paper, we propose to use local patches for 3D face representation. Instead of using all facial features extracted from every point on the facial surface, we divide the face region into rectangular patches. Figure 1 depicts the patches on a sample face. We use two different patch representation techniques. The first technique uses all the features extracted from each point inside a patch. Suppose that there are $k$ patches over the facial surface. In the first technique, patch $\Gamma_i$ is represented as $\Gamma_i = \{p_1^i, p_2^i, \ldots, p_q^i\}$, where the $p$'s are point cloud features and there are $q$ points on $\Gamma_i$. If all $k$ patches are used to define a face $\Phi$, then $\Phi$ can be written as $\Phi = \cup_{i=1}^{k} \Gamma_i$. The second patch representation technique is based on patch descriptors. Instead of using all features, we compute a single patch descriptor and use it to represent the patch. Formally, if $d_i$ is the descriptor calculated from patch $\Gamma_i$, then $\Gamma_i = d_i$. In this work, we use two patch descriptors: in the point cloud representation, the average 3D coordinate of the patch points is used as $d_i$, and in the surface normal representation, the average surface normal over all points on the patch is used as $d_i$. The difference between the two patch representation techniques is that the first stores all surface features of patch $\Gamma_i$, whereas the second stores only one surface feature. In the rest of the paper, we refer to the first technique as full patch representation and to the second as patch descriptor representation. The full patch representation is used for floating feature selection, and the patch descriptor representation is used for statistical feature extraction.
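A short sketch of the patch descriptor computation is given below. It assumes that, thanks to the regular resampling during surface fitting, the registered face can be arranged on an $H \times W$ grid of 3D points (or unit normals); the grid layout and the patch size parameters are our own assumptions for illustration.

```python
import numpy as np

def patch_descriptors(face_grid, patch_h, patch_w, normals=False):
    """face_grid: (H, W, 3) regularly sampled 3D coordinates or unit
    surface normals.  Splits the grid into rectangular patches and
    returns one descriptor d_i per patch: the mean 3D coordinate, or
    the renormalized mean surface normal."""
    H, W, _ = face_grid.shape
    descriptors = []
    for r in range(0, H, patch_h):
        for c in range(0, W, patch_w):
            patch = face_grid[r:r + patch_h, c:c + patch_w].reshape(-1, 3)
            d = patch.mean(axis=0)            # average feature over patch
            if normals:
                d /= np.linalg.norm(d)        # keep the normal unit length
            descriptors.append(d)
    return np.stack(descriptors)              # (k, 3), one row per patch
```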

Fig. 1. Illustration of full patch representation and patch descriptor representation for point cloud and surface normal features.

2.4 Feature Selection and Extraction Methodology

Subset Selection. We use near-optimal feature selection techniques to find the most discriminating patch subsets for identification. Our aim is to find the patch subset $\Omega = \cup_{i=1}^{c} \Gamma_i$, where $c \leq k$, that maximizes the classification accuracy.
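The floating backward search named in the abstract can be sketched as follows. This is a schematic of sequential floating backward selection under our own assumptions: `accuracy` stands for a hypothetical wrapper criterion (e.g., the 1-nearest neighbor recognition rate on a validation set using only the selected patches), and patches are referred to by their indices.

```python
def floating_backward_selection(all_patches, accuracy, target_size):
    """Sequential floating backward selection: start from all k patches,
    repeatedly drop the patch whose removal hurts the criterion least,
    and conditionally re-add dropped patches whenever that beats the
    best subset recorded at the resulting size."""
    current = set(all_patches)
    best = {}                                   # size -> (score, subset)

    def record(subset):
        score = accuracy(subset)
        if score > best.get(len(subset), (-1.0, None))[0]:
            best[len(subset)] = (score, set(subset))

    record(current)
    while len(current) > target_size:
        # Exclusion step: remove the least significant patch.
        worst = max(current, key=lambda p: accuracy(current - {p}))
        current.remove(worst)
        record(current)
        # Conditional inclusion step (the "floating" part).
        while len(current) < len(all_patches):
            cand = max(set(all_patches) - current,
                       key=lambda p: accuracy(current | {p}))
            if accuracy(current | {cand}) <= best.get(
                    len(current) + 1, (-1.0, None))[0]:
                break
            current.add(cand)
            record(current)
    return best[target_size][1]
```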
