Pose Tolerant Surface Alignment for 3D Face Verification with Symmetry Test Reject Option

Dirk Colbry and George Stockman
Department of Computer Science and Engineering, Michigan State University
East Lansing, Michigan 48824-1226
{colbrydi, stockman}@cse.msu.edu

Abstract

We present a viable 3D face verification algorithm that uses automatically identified anchor points and the iterative closest point (ICP) algorithm to align two face surfaces. The algorithm is fast (< 2 seconds on a 3.2GHz P4) and robust to noise in the data (< 10% spike noise). We show that it tolerates up to 15 degrees of variation in pose due to roll and pitch and 30 degrees of variation in yaw, and that it handles stand off variations up to the limits of the sensor's range (0.6 to 2.5 meters). By using the average root mean squared error as an initial matching score and automatically rejecting poor quality scans, we achieve a 1.2% equal error rate on the FRGC [10] database of 948 scans from 275 subjects.

1 Keywords

Biometrics, 3D Scanners, Sensors, Error Analysis, Pattern Recognition

2 Introduction

We have developed a fast, viable 3D face verification system that calculates a surface alignment measurement (SAM) between two surfaces. Large SAM values could mean improper alignment between two near-frontal scans, or they could mean that the surfaces are shaped differently. Two scans may have different shapes for several reasons:

• The scans are from different people.
• The person's appearance has changed (e.g. expression, hair, weight, etc.).
• Matching system factors, such as pose, illumination, and camera distance, have changed.

The experiments reported here focus on understanding the effect on the SAM of surface variations and of improper alignment resulting from changes in pose, illumination, and distance from the camera (stand off).

Results indicate that our current system is sufficiently robust to these variations to be of practical use.

Studies from FRVT 2000 have shown that when variations in lighting and pose are introduced into a data set, the performance of 2D face recognition systems degrades significantly [11]. Much recent research has therefore focused on the development of 3D face recognition technology. Some investigators have used the 3D processing component to normalize the pose and lighting of the input image to match the pose and lighting variations in the 2D gallery images. These mappings include simple 3D rotations, as well as more advanced morphing models that include variations in expression [3].

There is a long history of face recognition research based on the distances between sets of known anchor points on the face [9]. Early work in this area used manually selected anchor points on 2D images, with varying levels of recognition success. Some recent 3D work is also based on manually selected points [12]. Current research on automatically detected anchor points is limited to small data sets, and the results are susceptible to changes in pose and to noise in the data [7].

Other work in face recognition takes advantage of features that can be found in profile images. In early work, these images were taken with 2D cameras and compared using various techniques. By adding 3D scans, however, the profile can be extracted from a wide range of scan poses. Results are mixed, and much of the existing work still requires relatively frontal images in order to accurately extract the pose [2].

The following section presents our surface-matching algorithm, which uses anchor points for coarse alignment and ICP for fine alignment. We then describe our experimental setup, followed by five experiments for evaluating performance. The first four experiments explore the limits of pose and lighting variation that our surface alignment algorithm can handle; they are based on a database of scans of a rigid mannequin face, as well as artificially generated poses derived from a database of over 300 scans. In the final experiment, we use the average ICP matching error, the Surface Alignment Measure (SAM), as a matching score and evaluate the performance of our algorithm on a large, publicly available data set.

3 Matching System

Our approach for comparing two face scans from a 3D scanner begins by aligning the surfaces using the two-step, rigid alignment process shown in Figure 1.

Figure 1. Matching system diagram.

The first step in our process is a coarse alignment using anchor points [6]. These anchor points include the nose tip, the inside corners of the eyes, the mouth, the chin, etc., and are detected using a model of the structure of the face and the curvature (second derivative) at various points. Once the anchor points are detected in both the probe and gallery scans, corresponding anchor points are used to estimate a coarse, rigid transformation between the two scans.

The second step in aligning two scans is to use the Iterative Closest Point (ICP) algorithm to finely align the scans [1]. ICP samples a set of control points on one scan, calculates the nearest points on the second scan, and then calculates a transformation that reduces the error between the point sets. The algorithm terminates when the change in error is below a threshold or when the iteration limit is reached. A grid of 100 control points located around the eyes and the nose was chosen for the ICP fine alignment (see Figure 2).

Figure 2. Anchor points (red) and control point (white) locations on the mannequin.

The ICP algorithm is not guaranteed to converge to the correct surface match and may fall into local minima. To minimize this risk, we trim the top 10% of the noisy control points before calculating the transformation [5]. Another problem with ICP is that searching for the closest point on the surface of the gallery scan for each control point can be slow. To speed up the closest point calculation, we use row and column lookup tables, which store the maximum and minimum x and y values.

After the scans are finely aligned, the next step in our matching system is to calculate the distance between the two scans in order to determine the likelihood that they are scans of the same person. To measure the distance between the scans, we developed a surface alignment measurement (SAM), which is the final average matching error produced by the ICP algorithm. All SAM values are reported in millimeters and reflect the average over the 90 control points that remain after trimming. For more information on our matching system, see [8].
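To make the fine alignment step concrete, the sketch below shows one way a trimmed ICP iteration over the control point grid could be implemented, assuming the probe control points and the gallery scan are given as N×3 point arrays and a coarse anchor-point transformation has already been applied. The function names are illustrative, and the k-d tree used for the closest point search is a stand-in for the row and column lookup tables described above; the 10% trimming and the average over the remaining 90 points follow the description in this section.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def trimmed_icp(control_pts, gallery_pts, trim=0.10, max_iter=50, tol=1e-4):
    """Align probe control points to the gallery surface; return aligned points and SAM."""
    tree = cKDTree(gallery_pts)       # closest-point queries on the gallery scan
    pts = control_pts.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(pts)   # nearest gallery point for each control point
        keep = np.argsort(dist)[: int(len(pts) * (1.0 - trim))]  # trim noisiest 10%
        R, t = best_rigid_transform(pts[keep], gallery_pts[idx[keep]])
        pts = pts @ R.T + t
        err = dist[keep].mean()       # average error over the kept (90) control points
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return pts, err                   # err plays the role of the SAM, in millimeters
```

With a 100-point control grid, the returned error corresponds to the SAM described above, and the iteration limit corresponds to the 10 (or 50) ICP iterations used in the experiments below.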

4 Experimental Setup

Our face matching system uses input from a structured light scanner, the Minolta Vivid 910 [14], which is also the primary scanner used to gather 3D face data for the Face Recognition Grand Challenge (FRGC) [10]. It produces scan depth values accurate to less than one millimeter (±0.10mm in high resolution mode). For the experiments described here, the Vivid 910 recorded 320x240 pixel images. For every pixel, the scanner outputs the Cartesian coordinates (x, y, z), the color (R, G, B), and a flag value indicating whether or not the depth value could be computed.

Two data sets were used for our experiments: the first contains scans of a mannequin head (Marie); the second contains artificially rotated, virtual scans generated from a set of over 300 real face scans of 111 human subjects. The virtual rotation process rotates the points of a face scan in 3D space and then re-projects them onto an orthogonal plane that is parallel to the xy plane. The orthogonal projection produces a virtual scan that includes a depth map, flag map and color image similar to the data produced by the Minolta scanner. Artificially generated scans are not as accurate as true scan data; however, exact pose angles can be produced and it is possible to control the sampling rate. The faces shown in Figure 3 are examples of virtually rotated scans (generated from the original scan shown in Figure 3c).

Figure 3. Example of virtual yaw rotation.
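As a rough illustration of the virtual rotation process, the sketch below rotates the valid 3D points of a scan about the y axis and re-projects them orthographically onto a grid parallel to the xy plane, producing new depth and flag maps. The grid size, the 1mm pixel spacing, the nearest-cell resampling, and the assumption that larger z means closer to the camera are all illustrative choices for this example; the color image would be resampled analogously and is omitted here.

```python
import numpy as np

def rotation_y(deg):
    """Rotation about the y axis (yaw) by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[ np.cos(a), 0.0, np.sin(a)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(a), 0.0, np.cos(a)]])

def virtual_scan(points, yaw_deg, grid_shape=(240, 320), pixel_mm=1.0):
    """Rotate scan points and re-project them orthographically onto an xy grid."""
    rotated = points @ rotation_y(yaw_deg).T
    rows, cols = grid_shape
    depth = np.full(grid_shape, -np.inf)      # assumed convention: larger z = closer
    flags = np.zeros(grid_shape, dtype=bool)  # flag map: True where depth is valid
    # Map x, y coordinates (in millimeters) to grid indices centered on the image.
    c = np.round(rotated[:, 0] / pixel_mm + cols / 2).astype(int)
    r = np.round(-rotated[:, 1] / pixel_mm + rows / 2).astype(int)
    ok = (r >= 0) & (r < rows) & (c >= 0) & (c < cols)
    for ri, ci, z in zip(r[ok], c[ok], rotated[ok, 2]):
        if z > depth[ri, ci]:                 # keep the surface nearest the camera per cell
            depth[ri, ci] = z
            flags[ri, ci] = True
    return depth, flags
```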

5 Experiments

Five experiments were conducted to evaluate the performance of the surface matching system given changes in external conditions. The first experiment examined the effects of facial color and lighting. The second experiment was a control experiment to determine the baseline SAM values for standard operation of the system, with neutral expression and cooperative subjects approximately 1 meter from the scanner. The third experiment evaluated the performance of the system under changes in head pose (roll, pitch, yaw). The fourth experiment explored the changes in SAM when changing the distance between the camera and the subject. In the final experiment, we used our face matching system to calculate SAM scores on the FRGC version 1.0 data set (948 scans of 275 people with non-rigid features).

5.1 Color and Lighting

Although skin color can be a source of identification or disguise, it rarely affects the ability to match two surfaces using SAM criteria. Changes in color, however, can affect the Minolta scanner's ability to detect the laser, which causes the triangulation scheme to fail and leaves the depth value uncomputed. We found that washout from severe lighting inhibits the camera's ability to detect the laser beam, as do some dark colors. These types of changes, however, usually do not prevent the matching system from generating an acceptable value of SAM. Figure 4 shows some extreme examples with SAM values well within the normal range for valid surface alignment. These conditions would cause any traditional 2D face recognition algorithm to fail.

Figure 4. Extreme lighting and makeup conditions that produced successful matches. (a) Low light, 0.23mm SAM. (b) Camouflage, 0.59mm SAM.

5.2 Optimal Conditions Baseline SAM

20 scans (190 matching scores) were made of a mannequin head (Marie) in a frontal pose with constant fluorescent lighting. The resulting SAM values had a mean of 0.37mm and a standard deviation of 0.15mm. Figure 5 shows a histogram of these SAM values for Marie; the dotted line represents the Gaussian approximation to the histogram, with a mean of 0.37mm and a standard deviation of 0.15mm. As a comparison, two more Gaussian distributions are shown next to "Marie": one for intra-class (Genuine) SAM values and the other for inter-class (Imposters) SAM values. The "Genuine" curve represents scans taken from 111 people; in a third of these scans the people were smiling. Notice that the mean of the genuine curve is slightly higher than that for Marie due to changes in expression (0.7mm ± 0.15mm). The SAM values for "Imposters" match scores are much higher still (1.5mm ± 0.3mm). The overlap between the "Genuine" and "Imposters" curves represents the error region for our system. We are investigating other matching scores that may better separate these curves by taking expression into account.

Figure 5. Gaussian approximation of SAM distributions.

This baseline experiment is similar to that reported by Boehnen and Flynn [4]. They compared the characteristics of different scanner technologies by scanning rigid masks and comparing the scans to the original shape of the masks. That study showed that the Minolta scanner made the most accurate surface measurements. However, the surface matching errors reported in [4] are much lower than many of the averages seen in our experiments. We identified three experimental differences that account for this discrepancy. First, the study in [4] used the fine resolution mode of the Minolta scanner (640x480) instead of the fast resolution mode that we use (320x240); this change in resolution affects the baseline distance errors for the closest point calculation. Second, Boehnen and Flynn preprocessed the scans to remove small holes and spike noise on the surface of the scan. Finally, the masks used in [4] were manufactured with an optimal white surface to reflect the light from the Minolta scanner. The combination of these three experimental differences accounts for the differences in error scores between our study and [4].
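For illustration only, the sketch below uses the Gaussian approximations quoted above (genuine 0.7mm ± 0.15mm, impostor 1.5mm ± 0.3mm) to sweep a SAM acceptance threshold and locate the point where the two error rates cross. This is an idealized calculation under the Gaussian assumption and will not match the empirical error rates reported in Section 5.5.

```python
import numpy as np
from scipy.stats import norm

# Gaussian approximations of the SAM distributions reported above (millimeters).
genuine = norm(loc=0.7, scale=0.15)     # intra-class (same person)
imposter = norm(loc=1.5, scale=0.30)    # inter-class (different people)

# Sweep candidate SAM thresholds and report the error rates at each one.
thresholds = np.linspace(0.5, 1.7, 1000)
far = imposter.cdf(thresholds)          # impostor pairs accepted (SAM below threshold)
frr = genuine.sf(thresholds)            # genuine pairs rejected (SAM above threshold)
idx = np.argmin(np.abs(far - frr))
print(f"threshold ~ {thresholds[idx]:.2f} mm, "
      f"FAR ~ {far[idx]:.3%}, FRR ~ {frr[idx]:.3%}")
```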

5.3 Pose Variation

The third set of experiments tested the effects of changing the relative pose angle between the face and the camera. There are three types of pose changes that a face can undergo in 3D space: yaw, pitch, and roll (see Figure 6).

Figure 6. Roll, Pitch, and Yaw.

It is difficult to have human subjects change pose angles in small increments and to measure these angles accurately enough to establish ground truth. Instead, we use a mannequin head (Marie) attached to a tripod to accurately measure angle differences. In this experiment, five different scans of Marie were taken at pose angles that varied in the yaw, pitch, and roll directions. The average of these five scans is shown as the "Marie" curve in Figures 7, 8 and 9. However, Marie is an idealized subject, and variations in face shape may affect these calculations. As a comparison, over 300 frontal scans of real human faces were virtually rotated within the same range of poses and plotted next to the Marie data. For reference, some baseline scans are also shown. These baseline scans are from the same subject at different poses (not virtually rotated); the pose rotation is calculated after ICP alignment. This calculation is not the same as ground truth, but these scans are included for comparison and demonstrate that the values found with both the virtual scans and the Marie scans are similar.

Figure 7. SAM vs. Variations in Pitch.

Figure 8. SAM vs. Variations in Roll.

Figure 9. SAM vs. Variations in Yaw.

In the yaw and pitch experiments, the increase in SAM values at high rotations (> 10 degrees) is due to inaccurate anchor point detection. This is an expected result, because our anchor point detection system was designed to find anchor points on strictly frontal pose scans. In experiments with roll, the nose is always detected because the nose tip always remains closest to the camera. ICP does not need all of the anchor points to correctly align two surfaces; therefore, the gradual increase in SAM shown in Figure 8 (10 ICP iterations) is not due simply to the missing anchor points. Instead, the increase is due to the number of iterations used by ICP. We ran the roll experiment again using a limit of fifty ICP iterations instead of ten; the results are also shown in Figure 8. The graph confirms our hypothesis: a flattened, low-SAM region is now present between -30 and 30 degrees for both the Marie and the virtual rotation data. With 50 ICP iterations, the range of acceptable SAM expanded and almost all of our tests converged (the average number of iterations was 19, with a standard deviation of 13 iterations).

In order to determine whether the results for roll, pitch and yaw are reasonable, a control experiment was run on the Face Recognition Grand Challenge (FRGC) database of frontal scans. For these experiments, all pairs of scans from the same subject were aligned and the roll, pitch and yaw were measured; the results are shown in Table 1.

            Roll           Pitch          Yaw            Distance
MSU
  Average   0.36° ± 0.49°  1.62° ± 2.25°  1.52° ± 1.99°  63.69mm ± 66.08mm
  Max       3.11°          16.03°         17.30°         431.97mm
FRGC 1.0
  Average   0.73° ± 0.81°  2.95° ± 2.50°  2.54° ± 2.13°  164.44mm ± 78.17mm
  Max       6.12°          13.03°         17.58°         646.75mm

Table 1. Differences between valid scans in the database.

The maximum rotations shown in Table 1 for the roll and yaw directions are within the operating ranges shown in Figures 8 and 9. The pitch direction is not within the described range. However, the results in Table 1 are the maximum angles between scans, while the results in Figure 7 are angles from zero. Since the largest variations are between subjects looking up and down, their actual angle variation from zero is closer to half the maximum value, which is well within the range shown in Figure 7.
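The roll, pitch and yaw differences in Table 1 are measured from the rigid rotation recovered when two scans are aligned. A minimal sketch of that angle extraction is shown below; the z-y-x (roll-yaw-pitch) axis convention and the function name are assumptions made for illustration, since the exact convention is not specified here.

```python
import numpy as np

def roll_yaw_pitch(R):
    """Recover angles (degrees) from a rotation matrix, assuming R = Rz(roll) @ Ry(yaw) @ Rx(pitch).

    Roll is taken about the viewing (z) axis, yaw about the vertical (y) axis, and
    pitch about the horizontal (x) axis; this axis convention is an assumption.
    """
    yaw = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    pitch = np.arctan2(R[2, 1], R[2, 2])
    roll = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees(roll), np.degrees(pitch), np.degrees(yaw)

# Example: R_icp is the 3x3 rotation returned by the fine alignment of two scans.
# roll, pitch, yaw = roll_yaw_pitch(R_icp)
```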

5.4 Change in Stand Off Distance

The Minolta Vivid 910's operating stand off for the medium distance lens is reported to be between 0.6 and 2.5 meters. In our experiment, the system was tested with the distance between the camera and the subject varied from 0.35 to 3.05 meters. The system found no valid points on the face when the distance was below 0.5 meters or above 2.9 meters, and we omitted those scans. The results for the valid scans can be seen in Figure 10. The closest viable distance from Marie to the scanner was found to be 0.65 meters; at this distance the matching error was measured to be 0.41mm. When Marie was closer to the camera than this, the computed SAM value was not reliable. As the distance increased, the SAM increased at an almost linear rate.

Figure 10. Change in Stand Off vs. SAM.

5.5 Human Frontal Data

Our algorithm was evaluated on the Face Recognition Grand Challenge (FRGC 1.0) database, which contains 948 scans of 275 subjects. All of the scans are frontal pose with neutral expression, although there are some deviations due to natural variation in human pose. The data were gathered at 640×480 resolution instead of the 320×240 resolution we use elsewhere. A baseline 3D matching performance using PCA (similar to [13]) is also available for this data set from the University of Notre Dame. In this baseline algorithm, both the color and depth components of the face scans are normalized to a standard width and height using manually selected points at the centers of the eyes. Once the images are normalized, the PCA algorithm is applied to both data channels independently and the matching results are reported. Figure 11 shows the ROC curves produced by the Notre Dame baseline algorithm and by our algorithm.
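As a point of reference, a generic eigenface-style PCA matcher for a single data channel (similar in spirit to [13]) could be sketched as below. This is not the Notre Dame baseline implementation; the number of components and the Euclidean distance in the subspace are illustrative choices, and the color channel would be handled the same way on its own.

```python
import numpy as np

def fit_pca_matcher(train_depths, n_components=40):
    """Eigenface-style matcher for normalized depth images (one data channel).

    `train_depths` is an (N, H, W) array of depth maps already cropped and scaled
    using the manually selected eye centers; `n_components` is an arbitrary choice.
    """
    X = train_depths.reshape(len(train_depths), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]                      # principal directions (eigenfaces)

    def project(depth_map):
        return basis @ (depth_map.ravel() - mean)  # coordinates in the PCA subspace

    def score(probe, gallery):
        return np.linalg.norm(project(probe) - project(gallery))  # smaller = better match

    return project, score
```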

Figure 11. ROC curves for the FRGC 1.0 Data.

The Notre Dame baseline algorithm has an EER of 4.6%, and our algorithm has a baseline EER of 3.77% on the same database. Our matching algorithm performs much better at low false positive rates. Many of the errors at the top end of the ROC curve are due to incorrect detection of the anchor points. In fact, our 3.77% EER includes only 23 false reject errors. One of these scans is missing its depth data, and the other 22 errors are due to bad anchor points in the same 15 scans. These error rates demonstrate some of the difficulties with this testing method: a single error in anchor point detection on a single scan can propagate and cause misleading results. In a real application, it would be better to automatically identify and reject such bad scans and have the subject rescanned.
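For reference, the EER quoted above can be computed from lists of genuine and impostor SAM scores roughly as in the sketch below; the simple threshold sweep and the variable names are illustrative, not the evaluation code used to produce Figure 11.

```python
import numpy as np

def equal_error_rate(genuine_sam, imposter_sam):
    """Empirical EER for a distance-style score (lower SAM means a better match)."""
    genuine = np.asarray(genuine_sam, dtype=float)
    imposter = np.asarray(imposter_sam, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate([genuine, imposter])):
        far = np.mean(imposter <= t)   # impostor pairs accepted at this threshold
        frr = np.mean(genuine > t)     # genuine pairs rejected at this threshold
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```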

5.6 Automatic Rejection Option

We developed a rejection algorithm based on the symmetry of the face, using ICP to align the current scan with a mirror image of itself. If the anchor points are not found correctly, the SAM score between a scan and its mirror image will likely be high; if they are found correctly, the SAM score will be quite low. Using this assumption, we designed a rejection criterion that rejected 1.5% of the 948 scans. Spike noise in the hair, spike noise around the eyes, and scans with no valid depth points are typical problems that the anchor point algorithm cannot handle. Only 1 good scan was incorrectly rejected and 2 bad scans were not rejected. After implementing the reject option, the ROC was recalculated and a new EER of 1.2% was achieved, as shown in Figure 11.
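A minimal sketch of this symmetry test is given below: the scan's points are mirrored about the x = 0 plane and aligned back to the original with the same ICP routine, and the resulting SAM is compared to a rejection threshold. The alignment function, its initialization from the detected anchor points (which is what makes the test sensitive to anchor errors), and the threshold value are all assumptions for illustration; the threshold actually used is not stated here.

```python
import numpy as np

def symmetry_reject(scan_pts, icp_align, sam_threshold=1.0):
    """Reject a scan if it does not align well with its own mirror image.

    `icp_align(probe_pts, gallery_pts)` is assumed to return (aligned_pts, sam) and to
    start from the anchor-point alignment, e.g. the trimmed ICP sketch in Section 3.
    `sam_threshold` (in mm) is a placeholder, not the threshold used in the paper.
    """
    mirrored = scan_pts * np.array([-1.0, 1.0, 1.0])  # reflect the scan about x = 0
    _, sam = icp_align(mirrored, scan_pts)
    return sam > sam_threshold                        # True: reject the scan and rescan
```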

6 Conclusions

We conducted several experiments using the Minolta Vivid 910 to determine how changes in sensor conditions (pose, distance from camera, lighting) impact our 3D face matching system. We found that:

1. Most color and lighting changes did not adversely affect the 3D face matching system, although severe color and lighting conditions should be avoided for optimum results.
2. Under optimally controlled baseline operating conditions, the SAM values are 0.37mm ± 0.15mm.
3. Our face matching system can easily tolerate rotations (yaw) of up to ±30 degrees from frontal. Changes in pitch and roll were not as well tolerated (±10 degrees); however, when the ICP iteration limit was increased from 10 to 50, the acceptable range for roll increased to ±30 degrees from center.
4. The effective operating distance between the subject and the camera (stand off) is between 0.65 meters and 2.4 meters.
5. Using the SAM as a matching metric can produce reasonable results (3.77% EER) on a large dataset.
6. Using a rejection option to remove poor quality scans improves the EER to 1.2% (at a rejection rate of 1.5%).

These experiments demonstrate that the SAM can be used as a matching score to achieve an equal error rate of 1.2% for frontal, neutral expression scans. This error rate and the speed of our system are acceptable for many biometric applications, although 3D scanners are still too expensive for widespread use. We are currently developing better anchor point detection algorithms and exploring more advanced matching techniques in order to achieve better performance.

References

[1] P. Besl and N. McKay. A method for registration of 3-D shapes. IEEE Trans. on PAMI, 14(2):239–256, 1992.
[2] C. Beumier and M. Acheroy. Automatic 3D face authentication. Image and Vision Computing, 18(4):315–321, 2000.
[3] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Trans. on PAMI, 25(9):1063–1074, 2003.
[4] C. Boehnen and P. Flynn. Accuracy of 3D scanning technologies in a face scanning scenario. In Proc. of the 5th International Conf. on 3DIM, Ottawa, Ontario, Canada, 2005.
[5] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek. The trimmed iterative closest point algorithm. In Proc. 16th International Conf. on Pattern Recognition, volume 3, Quebec City, 2002.
[6] D. Colbry, G. Stockman, and A. Jain. Detection of anchor points for 3D face verification. In Proc. of A3DISS 2005, San Diego, California, 2005.
[7] Y. Lee, H. Song, U. Yang, H. Shin, and K. Sohn. Local feature based 3D face recognition. In AVBPA, pages 909–918, 2005.
[8] X. Lu, A. K. Jain, and D. Colbry. Matching 2.5D face scans to 3D models. IEEE Trans. on PAMI, 28:31–34, 2005.
[9] B. Manjunath, R. Chellappa, and C. von der Malsburg. A feature based approach to face recognition. In Proc. of IEEE Computer Society Conf. CVPR, pages 373–378, 1992.
[10] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. Overview of the face recognition grand challenge. In Proc. of IEEE Computer Society Conf. CVPR, 2005.
[11] P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi, and M. Bone. Face recognition vendor test 2002: overview and summary. 2003.
[12] D. Riccio and J.-L. Dugelay. Asymmetric 3D/2D processing: A novel approach for face recognition. In Proc. ICIAP, pages 986–993, 2005.
[13] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proc. of IEEE Computer Society Conf. CVPR, pages 586–591, 1991.
[14] Konica Minolta. Minolta Vivid 910 non-contact 3D laser scanner. http://kmpi.konicaminolta.us/eprise/main/content/ISD/ISD Product Pages/Vivid 910, 2005.