Nonlinear Mapping from Multi-View Face Patterns to a Gaussian Distribution in a Low Dimensional Space

Stan Z. Li, Rong Xiao, ZeYu Li, HongJiang Zhang
Microsoft Research China, Beijing Sigma Center, Beijing 100080, China
Contact: [email protected], http://research.microsoft.com/szli

Abstract

When face patterns are subject to changes in view, illumination, and facial shape, their distribution is highly nonlinear and complex in any space linear to the original image space. In this paper, we investigate a nonlinear mapping by which multi-view face patterns in the input space are mapped to invariant points in a low (10-) dimensional feature space. Invariance to both illumination and view is achieved in two stages. First, a nonlinear mapping from the input space to a low (10-) dimensional feature space is learned from multi-view face examples to achieve illumination invariance. The illumination invariant feature points of face patterns across views lie on a curve parameterized by the view, and the view of a face pattern can be estimated from the location of its feature point on the curve by a least squares fit. Second, a nonlinear mapping from the illumination invariant feature space to another feature space of the same dimension is performed to achieve invariance to both illumination and view; this amounts to a normalization based on the view estimate. By the two-stage nonlinear mapping, multi-view face patterns are mapped to a zero mean Gaussian distribution in the latter feature space. Properties of the nonlinear mappings and the Gaussian face distribution are explored and supported by experiments.

1 Introduction

Dealing with multi-view faces is important for many face-related applications. Statistics show that approximately 75% of the faces in home photos are non-frontal [18]. Over the past years, progress has been made in non-frontal face detection and recognition. Feraud et al. [11] and Schneiderman and Kanade [29] adopt the view-based representation [26] in face detection. Wiskott et al. [38] build elastic bunch graph templates for multi-view face detection and recognition. Gong and colleagues study the trajectories of faces in linear PCA feature spaces as they rotate [13], and use kernel support vector machines (SVMs) for multi-pose face detection and pose estimation [24, 20]. Huang et al. [16] use an SVM to classify between three face poses viewed at -33.75, 0, and +33.75 degrees.

Multi-view face detection and pose estimation require modeling faces seen from varying viewpoints, subject to variations in illumination and facial shape. An illumination and view invariant subspace model, by which undesirable variations are reduced or even removed, would provide a powerful basis for object detection and recognition [7, 14, 9, 1, 3, 10, 30, 2, 4, 12, 8, 39, 15, 31]. Appearance based methods [22, 23] avoid difficulties in 3D modeling by using images, or appearances, of the object viewed from possible viewpoints. Linear principal component analysis (PCA) is a powerful technique for data reduction and feature extraction from high dimensional data, and it has been used to represent faces [17] for face detection and recognition [36, 21, 33]. In [7], a learning method is proposed which learns an input-output mapping from examples for the compensation of illumination and the estimation of pose; computer graphics techniques are needed to synthesize new, valid examples for the learning, and the location of one of the eyes must be known for estimating the pose.

The distribution of face images under a perceivable variation in viewpoint, illumination, or expression is highly nonconvex and complex [5]. The pixel values of an image can change drastically as illumination changes, even when the view and the individual remain fixed; indeed, variations between images induced by illumination changes are larger than differences between individuals [1]. A further evaluation using 107 different representations, such as edge maps, image derivatives, and images filtered by 2D Gabor-like functions, shows that none of them is sufficient by itself to overcome such image variations for face recognition [1]. Similar conclusions also hold for image variations induced by changes in viewpoint and expression. Face distributions under changes in illumination and view can hardly be well represented by a single linear model, such as a PCA-based one, to describe the variations

encountered in practice. When a view-labeled data set of appearances is available, the modeling can be done in a supervised way. In the view-based approach [26], the range of views is partitioned into a number of intervals. A view-subspace defines the manifold of possible appearances of the object in each interval, subject to illumination. Such view-subspaces can be constructed in supervised ways by using view-labeled examples. Supervised learning can also be used to build parametric subspaces: with training data labeled and sorted according to the view (and perhaps also illumination values), one may be able to construct a manifold describing the distribution across views [22, 2]. Gong and colleagues use kernel support vector machines for multi-pose face detection and pose estimation [24, 20]. In these view-based methods, the different view channels, and thus the resulting view-subspaces, are not correlated in any way that allows a sensible joint use of them.

Several techniques have been proposed for nonlinear manifold (subspace) modeling. Methods proposed so far fall into two broad categories: modeling using piecewise linear models and modeling using a single nonlinear model. Two recent papers in Science [34, 27] have explored methods in these two approaches. Both compute measures of the local geometry of the manifold. Starting from this, the ISOMAP method [34] performs multidimensional scaling on a matrix derived from the non-Euclidean geometry of the data points. The locally linear embedding algorithm [27] finds a set of low-dimensional points, each of which can be linearly approximated by its neighbors with the same coefficients that were determined from the high-dimensional data points. A nonlinear manifold can also be modeled by a mixture of probabilistic PCA subspaces, with a given number of components [35] or an unknown number of them [6], based on variational inference and latent variable models. Being a generative model, such a mixture has the advantage that probabilities are readily available for later use, such as for pattern classification.

In this paper, we present a method for learning a nonlinear mapping for the extraction of illumination and view invariant representations and for dimension reduction (Section 2). The invariance to both illumination and view is achieved in two stages. First, a constrained nonlinear mapping from the input space to a low (10-) dimensional invariant feature space is learned through supervised learning. This mapping transforms multi-view face images, whose distribution is highly complex in the high dimensional input space, to a distribution around a simple curve in the feature space. It eliminates the undesirable effect of illumination, obtains illumination invariance, and simplifies the manifold of face patterns across views into a simple curve. The mapping is constrained in such a way that there is a one-to-one correspondence between the view of a face pattern and its location in the feature space.

This makes it easy to estimate the view (pose) by a simple least squares (LS) fit. Then, a second nonlinear mapping is performed to achieve view invariance. This is done by a normalization, using the LS view estimate, which cancels the shift of feature points caused by the view change. A representation that is invariant to both illumination and view is thereby achieved. After the two-stage mapping, the face patterns form a zero mean Gaussian distribution in the feature space. Properties of the nonlinear mappings and the obtained invariance are discussed and demonstrated by experiments (Sections 2 & 3).

2 Learning Invariant Manifold of Multi-View Faces

First, a nonlinear mapping is learned by an SVR array; it achieves an illumination-invariant representation and reduces the data dimensionality. This representation is view-specific and provides a good basis for pose estimation. Then a second nonlinear mapping is performed to achieve invariance to both illumination and view. Properties of the nonlinear mappings and the resulting invariance are explored.

Let $x \in \mathbb{R}^N$ be a windowed grey-level image of a face, possibly preprocessed. The images x are subject to changes not only in the view $\theta$, but also in illumination and facial shape. Assume that a training set of face examples is available, and that all left rotated faces (those with view angles between 91° and 180°) are mirrored to right rotated, so that every view angle is between 0° and 90°; this does not cause any loss of generality. Each example is manually labeled with a view value as close to the true view as possible, and then assigned to one of L (a given number of) groups according to the nearest view value. This produces L view-labeled face image subsets for learning view-subspaces of faces; a pose is thus quantized into one of L discrete values. See the experiments section for a more detailed description of the face data.

The first mapping is from the input space $\mathbb{R}^N$ to an L-dimensional feature space $\mathbb{R}^L$. An array of L = 10 "channels" of regression estimators implements the nonlinear mapping $y = y(x) = (y_0(x), \ldots, y_{L-1}(x))$, as illustrated in Fig. 1, where each channel j has a designated view $\theta_j$ ($\theta_j < \theta_{j+1}$). The array of regression estimators is trained in the following way: (1) $y_j$ should reach its maximum when the view $\theta(x)$ of the input face pattern x matches the designated $\theta_j$. (2) The L output values $y = (y_0, \ldots, y_{L-1})$ are mutually constrained through a constraining function g as follows:

$$y_j = y_j(x) = g(\theta(x) - \theta_j) \qquad (1)$$


Figure 1. The two-stage nonlinear mappings. The first is from the input space $\mathbb{R}^N$ to the feature space $\mathbb{R}^L$, and the second from $\mathbb{R}^L$ to $\mathbb{R}^L$. For face patterns x in the input, the final output $\bar{y} = \bar{y}(x) = (\bar{y}_0, \ldots, \bar{y}_{L-1})$ has a zero mean Gaussian distribution in $\mathbb{R}^L$.

In this way, the output values of the L channels become correlated with each other. The constraining function $g(\cdot)$ should be even and reach its maximum at 0. Possible choices include $g(\lambda) = \cos(\lambda\pi/2)$, $1 - |\lambda|$, and $1 - \lambda^2$, where $\lambda$ denotes the view difference $\theta(x) - \theta_j$ (suitably normalized). In the present work, the cosine type is chosen, and a support vector regression (SVR) estimator is used to learn the mapping $x \to y_j = g(\theta(x) - \theta_j)$ (see Section 4). We choose L = 10 equally spaced designated angles $\theta_0 = 0°, \theta_1 = 10°, \ldots, \theta_9 = 90°$, with 0° corresponding to the right side view and 90° to the frontal view. The number L = 10 is not meant to be optimal. It is chosen because it is low enough as the dimension of a feature space for face patterns and seems high enough to characterize such patterns. A further investigation is needed to study the effect of this number and of the partition of the view range into L intervals.

Denote the output of the first nonlinear mapping by $\hat{y} = (\hat{y}_0, \ldots, \hat{y}_{L-1})$, as opposed to the constraining values $y_j$. $\hat{y}$ constitutes an illumination invariant signature with the following properties: (1) It is invariant to illumination for any face pattern x, because the regression array is trained so that $\hat{y}(x)$ is related to the view of x only, regardless of illumination. (2) The sequence of the L points $(j, \hat{y}_j)$ forms a curve in the $j$-$\hat{y}_j$ plot. The curve is determined by the constraining function g according to Eq. (1), and is parameterized by the true view $\theta(x)$ of the face pattern. The two properties are illustrated in the second rows of Figs. 2-4.
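To make the channel targets concrete, the following sketch (an illustration, not code from the paper) evaluates a cosine-type constraining function and the L = 10 target values $y_j = g(\theta(x) - \theta_j)$ of Eq. (1) for a face of a given view; the normalization of the view difference by 90° is an assumption.

```python
import numpy as np

THETAS = np.arange(0.0, 91.0, 10.0)   # designated channel views: 0, 10, ..., 90 degrees

def g(delta_deg):
    """Cosine-type constraining function: even, maximal (= 1) at zero view difference.
    The difference is normalized by 90 degrees, an assumption not spelled out in the text."""
    return np.cos(delta_deg / 90.0 * np.pi / 2.0)

def channel_targets(theta_deg):
    """Target signature y_j = g(theta(x) - theta_j), j = 0, ..., L-1 (Eq. 1)."""
    return g(theta_deg - THETAS)

if __name__ == "__main__":
    y = channel_targets(50.0)          # a 50-degree face peaks at channel j = 5
    print(np.round(y, 3), int(np.argmax(y)))
```

The resulting 10-vector is the curve shape visible in the second rows of Figs. 2-4: it peaks at the channel whose designated view matches the face and falls off symmetrically on both sides.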


Figure 2. Faces of the 0° view (row 1) from the training (left) and test (right) sets, their signatures $\hat{y}$ after the first nonlinear mapping (row 2), which are invariant to illumination, and the zero mean signatures $\bar{y}$ after the second nonlinear mapping (row 3), which are invariant to both illumination and view.

In contrast, the coefficients obtained by the view-based PCA representation [26], which are uncorrelated, do not present such invariance nor a similar regularity in shape [19]. The first rows of Figs. 2-4 show faces of 3 views (0°, 50°, and 90°), randomly sampled from the training (left) and test (right) face databases. The faces are subject to different illuminations. The 10-dimensional signatures $\hat{y}$ have a similar shape for faces of a similar view, regardless of the lighting direction.

To achieve view invariance, the second nonlinear mapping, on the same feature space, is applied to $\hat{y}$ to cancel the effect of the view change. Consider $\bar{y}_j(x) = y_j(x) - g(\theta(x) - \theta_j)$ for $j = 0, \ldots, L-1$. Ideally, these quantities should be zero according to Eq. (1), and by $\bar{y}(x)$ we could map x of any illumination and any view to the zero vector (the origin) in $\mathbb{R}^L$, achieving invariance to both illumination and view; this would give the simplest and most compact possible face representation in that space. However, this cannot be done exactly, because in the testing stage only the regression estimates $\hat{y}_j(x)$ can be computed according to Eq. (8), and $\theta(x)$ has to be estimated by some means before the second mapping can be defined.

Finding $\theta(x)$ from a face image x is the pose estimation problem. The pose can be estimated from $\hat{y}(x)$ using a least squares (LS) fit. The LS estimate $\hat{\theta}_{LS}$ minimizes

$$e(\theta) = \sum_{j} [\hat{y}_j - g(\theta - \theta_j)]^2 \qquad (2)$$

The error distribution of the LS estimate is shown in Fig. 6. It is calculated from the test face samples, whose views are evenly distributed across the 10 designated views. With the LS estimate, we obtain a view normalized signature

$$\bar{y}_j(x) = \hat{y}_j(x) - g(\hat{\theta}_{LS}(x) - \theta_j) \qquad (3)$$

which is invariant to both illumination and view. The normalized $\bar{y}_j(x)$ is a zero mean random variable. The 3rd rows of Figs. 2-4 show the illumination and view invariant signatures $\bar{y}$ corresponding to the illumination invariant signatures $\hat{y}$ in the 2nd rows.

We notice from Figs. 2-4 that the variance of the signatures for the half-side view is larger. There are two reasons for this: the first is that the data was manually labeled by human beings, and the errors in the manual view-labeling tend to be larger for half-side views than for the two extreme views. Fig. 5 shows a few such examples found in the 50° view data set.
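The LS pose estimate of Eq. (2) and the view normalization of Eq. (3) can be sketched as follows, reusing the constraining function g and the designated views from the previous sketch; the dense grid search over candidate views is an illustrative choice rather than the paper's procedure.

```python
import numpy as np

THETAS = np.arange(0.0, 91.0, 10.0)                     # designated views (degrees)

def g(delta_deg):
    return np.cos(delta_deg / 90.0 * np.pi / 2.0)       # constraining function (assumed form)

def estimate_pose_ls(y_hat, grid=np.linspace(0.0, 90.0, 901)):
    """theta_LS minimizing e(theta) = sum_j [y_hat_j - g(theta - theta_j)]^2  (Eq. 2)."""
    errors = [np.sum((y_hat - g(t - THETAS)) ** 2) for t in grid]
    return float(grid[int(np.argmin(errors))])

def normalize_signature(y_hat):
    """View-normalized signature y_bar_j = y_hat_j - g(theta_LS - theta_j)  (Eq. 3)."""
    theta_ls = estimate_pose_ls(y_hat)
    return y_hat - g(theta_ls - THETAS), theta_ls

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_hat = g(37.0 - THETAS) + 0.02 * rng.standard_normal(len(THETAS))  # noisy 37-degree face
    y_bar, theta_ls = normalize_signature(y_hat)
    print(theta_ls, float(np.abs(y_bar).max()))          # view estimate near 37, y_bar near zero
```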


Figure 3. The same as Figure 2, but for 50° faces.


Fig. 6 shows the error of the pose estimate for a test data set, and Fig. 7 shows the distributions of single components $\bar{y}_j(x)$ of the illumination and view invariant signature as well as the joint distributions of pairs of components. From these figures, we see that the distributions have near zero mean and are very close to Gaussian.
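As a sanity check of the kind summarized by Figs. 6 and 7, one can compare the empirical mean and covariance of the normalized signatures with a zero mean Gaussian; the sketch below uses synthetic stand-in data rather than the paper's test set.

```python
import numpy as np

def summarize(Y_bar):
    """Empirical mean and covariance of normalized signatures (rows = test samples)."""
    return Y_bar.mean(axis=0), np.cov(Y_bar, rowvar=False)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y_bar = 0.02 * rng.standard_normal((5000, 10))   # stand-in for test-set signatures
    mean, cov = summarize(Y_bar)
    print(float(np.abs(mean).max()), float(np.trace(cov)))
```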

Figure 4. The same as Figure 2, but for 90° faces.


Figure 5. The larger variance of the signatures in the 40° plot is due to larger errors in the manual view-labeling.




Figure 6. The distribution of pose estimation errors $\hat{\theta}_{LS}(x) - \theta(x)$ for all x from a test data set.

3 Properties of Learned Multi-View Face Manifolds


Figure 7. The marginal distributions $p(\bar{y}_0)$, $p(\bar{y}_3)$, $p(\bar{y}_6)$, and $p(\bar{y}_9)$, and the joint distributions $p(\bar{y}_0, \bar{y}_3)$ and $p(\bar{y}_6, \bar{y}_9)$, for the test set.

In the original input space $\mathbb{R}^N$, the trajectory of face patterns x, as the pose changes from one extreme view $\theta_{\min}$ to the other $\theta_{\max}$, is highly nonlinear. The nonlinear mapping into the invariant feature space $\mathbb{R}^L$ not only reduces the dimensionality of face patterns but also makes the distribution much simpler.

Assuming there is no noise and no estimation error in the SVR estimate (meaning $\hat{y}(x)$ equals the training constraint $y(\theta(x))$) or in the LS fitting (meaning $\hat{\theta}_{LS}(x) = \theta(x)$), then in this ideal case the SVR output has the following properties in $\mathbb{R}^L$:
- For any face sample x, the corresponding SVR output is a point $y(x) = (y_0(x), \ldots, y_{L-1}(x))$ in $\mathbb{R}^L$. The coordinates of this point, $y_j(x) = g(\theta(x) - \theta_j)$, depend on the true view $\theta(x)$ only, regardless of illumination and other artifacts.
- As the view $\theta$ changes from $\theta_{\min}$ to $\theta_{\max}$, the face trajectory is a simple curve segment in $\mathbb{R}^L$: $\{y(\theta) \mid \theta_{\min} \le \theta \le \theta_{\max}\}$.
- After the nonlinear normalization of Eq. (3), the corresponding trajectory $\bar{y} = 0$ simply collapses to the origin of $\mathbb{R}^L$: $\{\bar{y}(\theta) = 0 \mid \theta_{\min} \le \theta \le \theta_{\max}\}$.

While the above discussion concerns the face manifold in the L dimensional feature space $\mathbb{R}^L$, we can also visualize the face manifold in a three dimensional space as follows: Treat $\theta_j$, which takes a set of discrete values, as a continuous value $\phi$ varying in the range $[0, \pi/2]$. Consider three dimensions of variation: (1) $\theta(x) \in [0, \pi/2]$, (2) $\phi \in [0, \pi/2]$, and (3) $y = \cos(\theta - \phi)$ or $\bar{y} = y - \cos(\theta - \phi)$. In this 3D space, y is visualized as a curved surface $\{(\theta, \phi, y(\theta, \phi)) \mid 0 \le \theta, \phi \le \pi/2\}$, and $\bar{y}$ as a plane $\{(\theta, \phi, 0) \mid 0 \le \theta, \phi \le \pi/2\}$.

In practice, there are always noise and errors, and therefore face patterns form a distribution. The face manifold in the input space $\mathbb{R}^N$ is a "cloud" distributed around a highly nonlinear curved volume, due to illumination changes and other artifacts. In the feature space $\mathbb{R}^L$ resulting from the first nonlinear mapping, in which the face pattern is illumination invariant only, the simple curve segment $\hat{y}(\theta)$ drawn by faces across views becomes a cloud around the curve segment. In the distribution produced by the second nonlinear mapping of Eq. (3), whereby the face pattern is invariant to both illumination and view, the zero point of $\bar{y}(\theta)$ becomes a distribution around the origin of $\mathbb{R}^L$.
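The three dimensional visualization described above can be generated directly; a minimal sketch (matplotlib assumed for display) plots the surface y = cos(θ − φ) and the zero plane left after normalization over θ, φ ∈ [0, π/2].

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid over the true view theta and the continuous channel parameter phi, both in [0, pi/2].
theta, phi = np.meshgrid(np.linspace(0.0, np.pi / 2, 60),
                         np.linspace(0.0, np.pi / 2, 60))
y = np.cos(theta - phi)            # surface traced by the first-mapping outputs
y_bar = y - np.cos(theta - phi)    # after normalization: identically zero (the plane)

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(theta, phi, y, alpha=0.7)
ax.plot_surface(theta, phi, y_bar, alpha=0.3)
ax.set_xlabel("theta"); ax.set_ylabel("phi"); ax.set_zlabel("y")
plt.show()
```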

4 Implementation of Nonlinear Mappings

Support vector regression (SVR) [32] is chosen as the algorithm for the estimators because it has good generalization ability and is a general purpose algorithm. In SVR, one is given a set of training data $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in \mathbb{R}^N$ is a training example and $y_i \in \mathbb{R}$ is the associated target output. A linear SV regression function takes the form $y(x) = w \cdot x + b$. Assume that we do not care about fitting errors as long as they are less than a small number $\epsilon$, that is, $|y(x_i) - (w \cdot x_i + b)| \le \epsilon$. One way to define the best w is to make $\|w\|^2$ as small as possible. Then the optimal regression function can be found by solving the following constrained minimization:

$$\min_{w}\; E(w) = \frac{1}{2}\|w\|^2 \qquad (4)$$
$$\text{subject to}\quad |y(x_i) - (w \cdot x_i + b)| \le \epsilon \quad \forall i \qquad (5)$$

Solving it results in the optimal linear regression function

$$y(x) = \sum_{i=1}^{l} \alpha_i (x_i \cdot x) + b \qquad (6)$$

where $\alpha_i \ne 0$ and $b$ are parameters found by the SVR learning algorithm and $l \le m$ is the number of support vectors.

A nonlinear regression can be performed using the kernel trick [37] as follows: (1) Use a mapping $\Phi(x)$ to map the data to a high dimensional feature space, so that the nonlinear regression in the input space can be done by a linear regression in the feature space. (2) Use the kernel trick to do the linear regression in the feature space without explicitly performing the mapping to that high, possibly infinite, dimensional space:

$$y(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + b \qquad (7)$$

where $K$ is a kernel satisfying the Mercer condition, $K(x, y) = \Phi(x) \cdot \Phi(y)$.

For our nonlinear mapping, the training data for channel j is computed from the set X of training faces and given as $\{(x, y_j(x)) \mid \forall x \in X\}$, where $y_j(x) = g(\theta(x) - \theta_j)$ as in Eq. (1). Once trained, the SVR regression estimate is calculated as

$$\hat{y}_j(x) = \sum_{i=1}^{l_j} \alpha_i^j K(x_i^j, x) + b^j \qquad (8)$$

where $l_j$ is the number of support vectors for channel j, and $\alpha_i^j$ and $b^j$ are the coefficients learned by SVR training.
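Any kernel SVR implementation can stand in for the estimator of Eq. (8). The sketch below uses scikit-learn's SVR with an RBF kernel in place of the SVMTorch package used in the paper, and maps the settings reported in Section 5.1 (Gaussian kernel width 1, ε = 0.1, C = 500) onto scikit-learn's gamma/epsilon/C parameters; that mapping, and the synthetic data, are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

THETAS = np.arange(0.0, 91.0, 10.0)                     # designated channel views (degrees)

def g(delta_deg):
    return np.cos(delta_deg / 90.0 * np.pi / 2.0)       # constraining function (see Eq. 1)

def train_svr_array(X, views_deg, C=500.0, epsilon=0.1, gamma=0.5):
    """One RBF-kernel SVR per channel j, trained on targets y_j = g(theta(x) - theta_j).
    gamma = 1 / (2 * sigma^2) with sigma = 1 is an assumed translation of the paper's kernel width."""
    return [SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma).fit(X, g(views_deg - t))
            for t in THETAS]

def signature(channels, x):
    """Illumination-invariant signature y_hat(x): one regression estimate per channel (Eq. 8)."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    return np.array([ch.predict(x)[0] for ch in channels])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 88))                   # stand-in for the 88-dim face features
    views = rng.uniform(0.0, 90.0, size=200)
    channels = train_svr_array(X, views)
    print(np.round(signature(channels, X[0]), 3))
```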

5 Experiments

More than 3,000 face samples were collected by cropping from various sources (mostly from video). A total of about 55,000 multi-view face images is generated from the 3,000+ samples in the following way: each original (including mirrored) sample is rotated left and right by 5 degrees, giving 2 additional rotated versions; each of these 3 versions is then shifted left, right, up, and down by 2 pixels, producing 4 additional shifted versions each. In this way, each sample is expanded into 15 varied versions, as sketched below. The numbers of training and test images for each view are as follows:

View    TrainSet   TestSet
90°     1752       2104
80°     2376       2460
70°     2664       1944
60°     2316       2676
50°     2868       2208
40°     2840       2820
30°     3342       2916
20°     3036       2628
10°     3012       4008
0°      3360       3360
Total   27566      27124
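A sketch of the augmentation step described above, assuming SciPy's ndimage routines; the interpolation and border handling, which the paper does not specify, are arbitrary choices here.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment(sample):
    """One face window -> 15 versions: the original, two +/-5 degree rotations,
    and each of those three shifted by 2 pixels in the four directions."""
    rotated = [sample,
               rotate(sample, 5, reshape=False, mode="nearest"),
               rotate(sample, -5, reshape=False, mode="nearest")]
    offsets = [(0, 2), (0, -2), (2, 0), (-2, 0)]          # right, left, down, up
    shifted = [shift(img, off, mode="nearest") for img in rotated for off in offsets]
    return rotated + shifted                              # 3 + 12 = 15

if __name__ == "__main__":
    face = np.random.default_rng(0).random((20, 20))
    print(len(augment(face)))                             # -> 15
```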

Each windowed subimage is normalized to a fixed size of 20 x 20 pixels and preprocessed by illumination correction, mean value normalization, and histogram equalization, as is done in most existing systems, e.g. [33, 28, 25]. Then a one-dimensional Haar wavelet transform, which encodes differences in average intensities between different regions, is applied to the preprocessed data. The low frequency region of the transform is a subimage of 10 x 10 pixels. The four corners of this subimage are removed to reduce artifacts, resulting in an 88-dimensional feature vector, which is the actual form of the input x to the SVR array.
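A sketch of the feature extraction just described. The 2x2 block averaging is one way to obtain the 10x10 low-frequency band of a one-level Haar transform, and the particular choice of the 12 corner pixels to drop (3 per corner, reaching 88 dimensions) is an assumption; the paper only states that the four corners are removed.

```python
import numpy as np

def haar_lowfreq_features(window):
    """Map a preprocessed 20x20 face window to the 88-dim input x of the SVR array."""
    window = np.asarray(window, dtype=float)
    assert window.shape == (20, 20)
    # Low-frequency band of a one-level 2D Haar transform: 2x2 block averages -> 10x10.
    low = window.reshape(10, 2, 10, 2).mean(axis=(1, 3))
    # Drop 3 pixels per corner (the corner and its two neighbours): 100 - 12 = 88 values.
    keep = np.ones((10, 10), dtype=bool)
    for r, c in [(0, 0), (0, 9), (9, 0), (9, 9)]:
        keep[r, c] = False
        keep[r, 1 if c == 0 else 8] = False
        keep[1 if r == 0 else 8, c] = False
    return low[keep]

if __name__ == "__main__":
    x = haar_lowfreq_features(np.random.default_rng(0).random((20, 20)))
    print(x.shape)                                        # -> (88,)
```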

5.1 Training of SVR Array

Ten view-specific SVRs, corresponding to $\theta_j$ of $0, 10, \ldots, 90$ degrees ($j = 0, \ldots, 9$), are trained using the training set to produce the desired correlated output in $\mathbb{R}^L$. The "SVMTorch" package from www.idiap.ch/learning/SVMTorch.html is used for SVR learning and estimation. Each SVR is trained using face samples not only of its designated angle but also of the other angles. The Gaussian kernel with width 1 is used for the SVRs, the width of the error interval for regression is set to $\epsilon = 0.1$, and the cost of misclassification during training is set to $C = 500$. The selection of a kernel function and the involved parameters is an engineering practice [37].

5.2 Invariants and Distributions

^ after the Invariance of face patterns in R L , that is, y  after the second first nonlinear mapping by SVR’s and y nonlinear mapping by Eq.3, represented in signatures has been demonstrated in Figs.2-4. The distribution of view estimation error is shown in Figs.6 and 7. The zero mean  are Gaussian distributions of the normalized quantities y also shown in the figures.

6 Conclusion

We have presented a nonlinear mapping method for extracting low (10-) dimensional features of multi-view faces which are invariant to both illumination and view. It is implemented by two-stage nonlinear transforms. The result is an invariant signature representation with the following properties for face patterns: (1) The output $\hat{y}$ of the first stage, the nonlinear mapping by the SVR array, is invariant to illumination only, and constitutes a signature whose shape is parameterized by the view. This not only eliminates the illumination effect but also allows us to estimate the pose by regression. (2) The signature $\bar{y}$ resulting from the second stage mapping is invariant to both illumination and view. This enables us to represent face patterns effectively and compactly.

There are several issues for further investigation. The objective values for the output of the SVR array are chosen to be a simple function of the view, such as the cosine. This embeds the view change in the L dimensional space. This could be improved in the following ways: (1) The view alone could be embedded in a lower dimensional nonlinear space, which could be as low as one-dimensional. (2) Some dimensions could be used to embed other intrinsic variations such as illumination.

References

[1] Y. Adini, Y. Moses, and S. Ullman. Face recognition: The problem of compensating for changes in illumination direction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):721-732, July 1997.
[2] S. Baker, S. Nayar, and H. Murase. Parametric feature detection. International Journal of Computer Vision, 27(1):27-50, March 1998.
[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, July 1997.
[4] P. N. Belhumeur and D. J. Kriegman. What is the set of images of an object under all possible illumination conditions. IJCV, 28(3):245-260, July 1998.
[5] M. Bichsel and A. P. Pentland. Human face recognition and the face image set's topology. CVGIP: Image Understanding, 59:254-261, 1994.
[6] C. Bishop and J. M. Winn. Nonlinear Bayesian image modeling. In Proceedings of the European Conference on Computer Vision, pages 1-15, 2000.
[7] R. Brunelli. Estimation of pose and illuminant direction for face processing. A. I. Memo 1499, MIT, 1994.
[8] H. F. Chen, P. N. Belhumeur, and D. W. Jacobs. In search of illumination invariants. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages I:254-261, 2000.
[9] R. Epstein, P. Hallinan, and A. Yuille. 5±2 eigenimages suffice: An empirical investigation of low-dimensional lighting models. In IEEE Workshop on Physics-Based Vision, pages 108-116, 1995.
[10] K. Etemad and R. Chellapa. Face recognition using discriminant eigenvectors. 1996.
[11] J. Feraud, O. Bernier, and M. Collobert. A fast and accurate face detector for indexation of face images. In Proc. Fourth IEEE Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, 2000.
[12] A. S. Georghiades, D. J. Kriegman, and P. N. Belhumeur. Illumination cones for recognition under variable lighting: Faces. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 52-59, 1998.
[13] S. Gong, S. McKenna, and J. Collins. An investigation into face pose distribution. In Proc. IEEE International Conference on Face and Gesture Recognition, Vermont, 1996.
[14] P. W. Hallinan. A low-dimensional representation of human faces for arbitrary lighting conditions. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 995-999, 1994.
[15] J. Hornegger, H. Niemann, and R. Risack. Appearance-based object recognition using optimal feature transforms. Pattern Recognition, 33(2):209-224, February 2000.
[16] J. Huang, X. Shao, and H. Wechsler. Face pose discrimination using support vector machines (SVM). In Proceedings of International Conference on Pattern Recognition, Brisbane, Queensland, Australia, 1998.
[17] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103-108, January 1990.
[18] A. Kuchinsky, C. Pering, M. L. Creech, D. Freeze, B. Serra, and J. Gwizdka. FotoFile: A consumer multimedia organization and retrieval system. In Proc. ACM HCI'99 Conference, 1999.
[19] S. Z. Li, J. Yan, and H. J. Zhang. Learning illumination-invariant signature of 3-d object from 2-d multi-view appearances. In Proceedings of IEEE International Conference on Computer Vision, page ???, Vancouver, Canada, July 9-12, 2001.
[20] Y. M. Li, S. G. Gong, and H. Liddell. Support vector regression and classification based multi-view face detection and recognition. In IEEE Int. Conf. on Face & Gesture Recognition, pages 300-305, France, 2000.
[21] B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7:696-710, July 1997.
[22] H. Murase and S. K. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, 14:5-24, 1995.
[23] S. Nayar, S. Nene, and H. Murase. Subspace methods for robot vision. RA, 12(5):750-758, October 1996.
[24] J. Ng and S. Gong. Performing multi-view face detection and pose estimation using a composite support vector machine across the view sphere. In Proc. IEEE International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 14-21, Corfu, Greece, September 1999.
[25] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection. In CVPR, pages 130-136, 1997.
[26] A. P. Pentland, B. Moghaddam, and T. Starner. View-based and modular eigenspaces for face recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 84-91, 1994.
[27] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, (5500):2323-2326, December 22, 2000.
[28] H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23-28, 1998.
[29] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2000.
[30] A. Shashua. On photometric issues in 3D visual recognition from a single 2D image. International Journal of Computer Vision, 21:99-122, 1997.
[31] A. Shashua and T. R. Raviv. The quotient image: Class based re-rendering and recognition with varying illuminations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):129-139, 2001.
[32] A. J. Smola and B. Schölkopf. A tutorial on support vector regression. NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998.
[33] K.-K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39-51, 1998.
[34] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, (5500):2319-2323, December 22, 2000.
[35] M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443-482, 1999.
[36] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 586-591, Hawaii, June 1991.
[37] V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.
[38] L. Wiskott, J. Fellous, N. Kruger, and C. von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775-779, 1997.
[39] A. Yilmaz and M. Gokmen. Eigenhill vs. eigenface and eigenedge. In Proceedings of International Conference on Pattern Recognition, pages 827-830, Barcelona, Spain, 2000.
