Learning Low Dimensional Invariant Signature of 3-D Object under Varying View and Illumination from 2-D Appearances

Stan Z. Li, Jie Yan, XinWen Hou, ZeYu Li, Hongjiang Zhang
Microsoft Research China, Beijing Sigma Center, Beijing 100080, China
Contact: [email protected], http://research.microsoft.com/szli

Abstract

In this paper, we propose an invariant signature representation for appearances of a 3-D object under varying view and illumination, and a method for learning the signature from multi-view appearance examples. The signature, a nonlinear feature, provides a good basis for 3-D object detection and pose estimation due to the following properties: (1) Its location in the signature feature space is a simple function of the view and is insensitive or invariant to illumination. (2) It changes continuously as the view changes, so that the object appearances at all possible views constitute a known simple curve segment (manifold) in the feature space. (3) The coordinates of the object appearances in the feature space are correlated in a known way according to a predefined function of the view. The first two properties provide a basis for object detection and the third for view (pose) estimation. To compute the signature representation from the input, we present a nonlinear regression method for learning a nonlinear mapping from the input (e.g. image) space to the feature space. The ideas of the signature representation and the learning method are illustrated with experimental results on the human face object. It is shown that the face object can be modeled effectively and compactly in a 10-D nonlinear feature space. The 10-D signature exhibits excellent insensitivity to changes in illumination for any view. The correlation of the signature coordinates is well determined by the predefined parametric function. Applications of the proposed method to face detection and pose estimation are demonstrated.

1 Introduction

The appearance-based vision approach (see e.g. [21, 35, 5, 27, 24]) attempts to avoid difficulties encountered in traditional 3-D vision by modeling a 3-D object using its 2-D appearances. To facilitate tasks such as object detection and recognition, it is desirable to derive a representation which is invariant to changes in viewpoint and illumination. Much research has been done in this area [9, 18, 12, 1, 3, 13, 30, 2, 4, 15, 11, 37, 19, 31]. Finding illumination and viewpoint invariants is a hard problem. It has been shown that there are no illumination invariants for Lambertian surfaces [11]. Distributions of appearances in linear subspaces, such as those based on principal component analysis (PCA), under perceivable variations in viewpoint and illumination are highly nonlinear, nonconvex, complex and perhaps twisted [6, 24, 16, 17, 8]. Indeed, a single linear model can hardly provide a solution for the problem. A mixture of probabilistic PCA subspaces [34, 7], a generative model, may be used for this.

In this paper, we propose a nonlinear method for learning, from example multi-view appearances, a signature representation in a low dimensional nonlinear feature space for describing a 3-D object under changing illumination and view conditions. The learned signature has the following three properties: (1) Its location in the feature space is a simple function of the view and is insensitive or nearly invariant to illumination. (2) It deforms continuously as the view changes, so that the object appearances at all possible views constitute a known simple curve segment in the feature space. (3) The coordinates of the projection of the object onto the feature space are correlated in a known way according to a predefined function of the view. The signatures computed from the example appearances of a certain view constitute a very tight nonlinear subspace for the object seen at that view. The view-specific signature can be used for two types of applications: model identification (object detection) and model parameter estimation (pose estimation). The former is an application of properties (1) and (2), whereas the latter is one of property (3). The invariant signature can be computed by performing a nonlinear mapping from the input (e.g. image) space to the feature space. We propose a method for learning such a mapping from view-labeled example appearances, using an array of correlated support vector regression (SVR) [36, 32] filters. Each filter is tuned to produce the desired output according to the predefined function. The SVR filters are trained to correlate with one another. The output of the SVR

array has the predefined shape for the object of interest because it is so trained. The proposed method is demonstrated on the face object and applied to multi-view (out-of-plane rotation) face detection and pose estimation. Results show that the face object can be effectively modeled by an invariant signature of 10 coefficients; the trained nonlinear mapping from the input image to the 10-D signature exhibits excellent insensitivity or invariance to illumination. Face detection is performed by differentiating signatures of faces from the SVR outputs of other patterns in the 10-D space. Pose estimation recovers the parameter with which the predefined shape is generated, using a least squares fit.

The rest of the paper is organized as follows: Section 2 describes the invariant signature representation and the learning of the nonlinear mapping. Section 3 describes its applications in face detection and pose estimation. Section 4 presents experimental results.

2 Learning Invariant Signature Representation

2.1 Invariant Signature Representation

Let $x \in \mathbb{R}^N$ be a windowed image, or appearance, of the object of interest, possibly preprocessed. The appearance x is subject not only to the view parameter $\theta$, but also to illumination parameters u. Denoting the dependence of the appearance on the view and illumination by $x = x(\theta, u)$, we want to construct a nonlinear mapping $y: x \mapsto y(x) \in \mathbb{R}^L$ from the input (e.g. image) space $\mathbb{R}^N$ to an L-dimensional feature space. The intrinsic dimension of the pattern of interest should be much lower than that of the input, $L \ll N$, so that dimension reduction is achieved. In fact, the mapping should map samples of the pattern of interest to a very small manifold in $\mathbb{R}^L$. The mapped point $y = (y_0, \ldots, y_{L-1})$ in the L-dimensional feature space is called a signature because we require it to have the following characteristics for appearances of the object of interest:

- y(x) is invariant to illumination, that is, $y(x(\theta, u')) = y(x(\theta, u''))$ for $u' \neq u''$. As such, $y = y(x(\theta))$ is ultimately a (vector) function of the view parameter $\theta$.
- y(x) is a known point in $\mathbb{R}^L$, whose location is determined by the view $\theta(x)$. This way, detecting the object at a certain view is to verify whether y(x) is near that predictable point.
- y(x) deforms continuously as the view changes, so that the object appearances at all possible views constitute a simple curve segment in the feature space. This way, detecting the object at any view is to verify whether y(x) is near that curve segment.

[Figure 1 shows four panels of 10-D signature plots (coordinates y_0 through y_9) under frontal, left, and right lighting.]

Figure 1. For the object of interest (human faces), the 10-D signatures y of a fixed view are invariant (have the same shape) regardless of illumination. For a non-object pattern (lower-right), the mapped points y are unconstrained or inconsistent with the signatures of the object.

- The L values $(y_0, \ldots, y_{L-1})$ are correlated to each other according to some predefined function of the view $\theta$. In other words, the L points $(j, y_j)$ form a shape in the $j$-$y_j$ plane. This way, the view, i.e. the object pose, can be estimated by using a regression method.

These properties are illustrated by the examples in Fig.1. Note that when the input appearance is not of the object of interest, e.g. a nonface pattern, y is not constrained by the properties; see the lower-right of the figure.

2.2 Learning the Nonlinear Mapping

Figure 2. Examples of the face object from frontal to side view under variable illumination.

Let us describe the data before presenting the ideas about the learning. An assumption is made on the view $\theta$: all right-rotated faces (those with view angles between 91° and 180°) are mirrored to left-rotated ones; this does not cause any loss of generality of the method. Let a set of training example appearances be given, such as shown in

Fig. 2 (see the Experiments section for more details). Quantize the pose of the training examples into a set of L discrete values of the view angle. We choose L = 10 equally spaced angles between 0 and 90 degrees, $\theta_0 = 0°, \theta_1 = 10°, \ldots, \theta_9 = 90°$, with 0° corresponding to the left side view and 90° to the frontal view. The choice of L = 10 may not be optimal, but it is low compared to the original image size yet empirically sufficient for representing a pattern such as faces. Label each example with the nearest of the L view labels. Now L view-labeled training subsets are available for learning the mapping.

We propose to use an array of L correlated regression estimators to implement the mapping $y = y(x)$ from $\mathbb{R}^N$ to $\mathbb{R}^L$, each estimator forming a "channel". Each channel j has its designated view $\theta_j$, with $\theta_j < \theta_{j+1}$. We choose a support vector regression (SVR) estimator [32] to implement the mapping $x \to y_j$ for each channel j because it has good generalization ability and is a general-purpose algorithm. With the SVR estimators, the structure of the SVR array (and the subsequent face detection and pose estimation) is illustrated in Fig.3.
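As a concrete illustration of this labeling step, here is a minimal Python sketch; the input convention (raw angles in degrees, right rotations given as values above 90°) is our assumption.

```python
def view_label(angle_deg):
    """Map a raw view angle to one of the L = 10 channel labels.

    Right-rotated views (91..180 degrees) are mirrored into [0, 90],
    then the angle is quantized to the nearest designated view 10*j.
    """
    a = 180.0 - angle_deg if angle_deg > 90.0 else angle_deg
    return int(round(a / 10.0))  # channel index j; designated view is 10*j degrees
```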

Figure 3. The structure of the SVR array and the composite face detector and pose estimator.

The SVRs are correlated in such a way that their L output values $y = (y_0, \ldots, y_{L-1})$ are constrained by a predefined function g:

$$y_j = y_j(x) = g(\theta(x) - \theta_j) \quad (1)$$

where $\theta(x)$ denotes the view of the object in the training image x (when x is an unlabeled image, finding $\theta(x)$ from x is the pose estimation problem). The L correlated values $y = (y_0, \ldots, y_{L-1})$ constitute a signature of the object. The function g is such that SVR j should produce the maximum output when the object in the appearance x has the same view as the designated view, i.e. when $\theta(x) = \theta_j$. So, letting $\tau = |(\theta(x) - \theta_j)/90°| \in [0, 1]$ (for $\theta(x) \in [0°, 90°]$), $g(\tau)$ should be an even and monotonically decreasing function. Possible choices include $g(\tau) = \cos(\frac{\pi}{2}\tau)$, $1 - |\tau|$, and $1 - \tau^2$. With the cosine choice, the output of the j-th channel is specified as $y_j = \cos(\theta(x) - \theta_j)$. The inverse of the g function can be recovered as

$$g^{-1}(y_j) = \theta(x) - \theta_j \quad (2)$$

which is the residual difference between the true view of x and the designated view for SVR j. SVR j is trained to produce, for each sample x, the objective value $y_j$, which ultimately is a function of $(\theta(x) - \theta_j)$ according to Eq.(1). Note that it is trained using not only the examples belonging to the designated view $\theta_j$ but all the others as well, unlike in [27] where a view subspace is learned from examples of that view only. Once trained, the output for a sample x is calculated as

$$y_j(x) = \cos(\theta(x) - \theta_j) = \sum_{i=1}^{l_j} \alpha_i^j K(x_i^j, x) + b_j \quad (3)$$

where K is the kernel function and $\alpha_i^j$ and $b_j$ are coefficients. From this we are able to calculate the view angle estimate, using the channel-j output, as

$$\hat{\theta}_j(x) = \theta_j + \cos^{-1}\left( \sum_{i=1}^{l_j} \alpha_i^j K(x_i^j, x) + b_j \right) \quad (4)$$
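For concreteness, a minimal sketch of this correlated SVR array follows, using scikit-learn's SVR as a stand-in for the SVMTorch package used later in the paper; the variable names and the gamma value are our assumptions, and gamma only loosely corresponds to the Gaussian kernel width σ.

```python
import numpy as np
from sklearn.svm import SVR

L = 10
THETAS = 10.0 * np.arange(L)  # designated views theta_j = 0, 10, ..., 90 degrees

def train_svr_array(X, theta, C=500.0, epsilon=0.1, gamma=0.5):
    """Train one SVR per channel j on the cosine targets of Eq.(1).

    X: (m, N) preprocessed appearance vectors; theta: (m,) view labels
    in degrees. Every channel is trained on ALL examples (not just those
    of its designated view); only the regression target differs.
    """
    channels = []
    for theta_j in THETAS:
        targets = np.cos(np.radians(theta - theta_j))  # y_j = cos(theta(x) - theta_j)
        channels.append(SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma).fit(X, targets))
    return channels

def signature(channels, x):
    """Map one appearance x (shape (N,)) to its L-dimensional signature."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    return np.array([svr.predict(x)[0] for svr in channels])
```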

3 Face Detection and Pose Estimation


Multi-view face detection is important because approximately 75% of the faces in home photos are non-frontal [22]. Gong et al. extend SVMs to model the appearance of human faces undergoing nonlinear changes across multiple views and use a single SVR to estimate the pose [25, 23]. The view-based approach has also been used for multi-view faces [20, 14, 29]. Given a windowed image, the tasks here are (1) to determine whether the windowed pattern belongs to the face object, and if so, (2) to estimate the pose. We solve these problems using the signature representation. We assume that a set of labeled data $(x_1, \theta_1), (x_2, \theta_2), \ldots, (x_m, \theta_m)$, i.e. (face-image, angle) pairs, is available for training the SVR array. In addition, a set of negative examples, e.g. nonfaces, is also available for training a face detector, which in our system is a support vector machine classifier (SVC) [36, 10]. The training of the system shown in Fig.3 consists of three parts:

1. Train L correlated SVRs to learn the invariants of the view subspaces, using the labeled (face image, angle) pairs.

2. Train an SVC to classify between face and nonface (the view label is not needed at this stage).

3. Build a regression estimator, which in our system is a least squares (LS) fitter, for the view parameter estimation.

Face detection is based on the constraint that the SVR array outputs a unique signature constrained by $y = y(\theta(x))$ for a view of the object of interest (a face in this case), whereas the outputs tend to be unconstrained for a nonface pattern. This is a model (object) identification problem. The face detector takes the L-dimensional SVR array output $y = (y_0, \ldots, y_{L-1})$ as its input and classifies it as face or nonface. The problem is to differentiate the face manifold from its complement in $\mathbb{R}^L$. A nonlinear SV classifier (SVC) [36, 10] is trained using the labeled face/nonface data and then used to perform the classification. When x is classified as a face, pose estimation can be performed as follows. The signature y should be consistent with the values defined by Eq.(1) for any face pattern x within the considered view range. For example, with $y_j = \cos(\theta(x) - \theta_j)$, the signature $y = (y_0, \ldots, y_{L-1})$ should form approximately a cosine shape. This enables us to use a simple least squares (LS) regression method to estimate $\theta$ from y. According to Eq.(2), we have $\theta = g^{-1}(y_j) + \theta_j$; see Eq.(4) for the case with the cosine function for g. Therefore, to minimize the squared error in the pose angle, we just need to find

$$\hat{\theta}_{LS}(x) = \arg\min_{\theta} \sum_j \left[\theta - \left(g^{-1}(y_j) + \theta_j\right)\right]^2 \quad (5)$$

The solution to this is simply

$$\hat{\theta}_{LS} = \frac{1}{L} \sum_{j=0}^{L-1} \left[g^{-1}(y_j) + \theta_j\right] \quad (6)$$

which can be computed easily from the SVR outputs $y_j$.
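A minimal sketch of this least-squares estimator with the cosine choice of g is given below. Note that arccos discards the sign of $(\theta(x) - \theta_j)$, a point Eq.(6) glosses over; resolving the sign from the best-responding channel is our own assumption.

```python
import numpy as np

THETAS = 10.0 * np.arange(10)  # designated views in degrees

def estimate_pose_ls(y):
    """Eq.(6): average the per-channel estimates theta_j + g^{-1}(y_j)."""
    y = np.clip(np.asarray(y, dtype=float), -1.0, 1.0)   # guard against SVR overshoot
    residuals = np.degrees(np.arccos(y))                 # |theta(x) - theta_j|
    k = int(np.argmax(y))                                # best-responding channel
    signs = np.where(np.arange(len(y)) <= k, 1.0, -1.0)  # channels left of k lie below theta(x)
    return float(np.mean(THETAS + signs * residuals))
```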

4 Experiments

The following experiments evaluate the performance of the proposed regression array method as an illumination-invariant, view-specific representation, and its applications in face detection and pose estimation.

4.1 Data Preparation

More than 6,000 face examples (some are shown in Fig. 2) were collected by cropping from various sources (mostly from video). A total of about 100,000 multi-view face images are generated from the 6,000 samples in the following way: each original (including mirrored) sample is left- and right-rotated by 5 degrees, which gives two additional rotated versions of the sample; each of these three versions is then shifted left, right, up and down by half a pixel, which produces four additional shifted versions. In this way, each sample is duplicated into 15 varied versions. About 1/3 of the 100K examples are used to train the array of correlated view-specific SVRs, another 1/3 to train the SVC, and the last 1/3 to test the system. The composition of the three subsets is shown in Table 1. Note that the classification of the samples into the 10 view classes is the result of manual view-labeling and is error-prone. After mirroring part of the examples, the views are in the range [0°, 90°], with 0° representing the side view and 90° the frontal view. For the training data, the range is quantized into a set of L = 10 values {0°, 10°, ..., 90°}.

Table 1. Composition of the three data sets

View        Set 1   Set 2   Set 3
90°          4215    4200    3985
80°          3765    5025    3725
70°          3795    3285    3365
60°          4192    3645    3145
50°          3546    3525    3025
40°          3585    2760    2960
30°          2895    3345    3245
20°          3329    2640    2940
10°          2970    3285    3235
0°           2190    2630    2955
Tot. faces  34482   34340   32580
Nonfaces        0   11620   11000

Each windowed subimage is normalized to a fixed size of 20 × 20 pixels and preprocessed by illumination correction (see [28] for the algorithm), mean value normalization, and histogram equalization, as is done in most existing systems, e.g. [33, 28, 26]. Then a one-dimensional Haar wavelet transform, which encodes differences in average intensities between different regions, is applied to the preprocessed data. The low-frequency region of the transform is a subimage of 10 × 10 pixels. The four corners of the subimage are removed to reduce artifacts. This gives an 88-dimensional feature vector, which is the actual form of the input x to the SVR array.
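A minimal sketch of this encoding follows, assuming the window has already been illumination-corrected, mean-normalized and histogram-equalized. The exact corner-removal pattern is not specified in the paper; blanking a 3-cell triangle at each corner (12 cells in total, leaving 100 − 12 = 88 dimensions) is our assumption.

```python
import numpy as np

def haar_lowpass(img):
    """One level of the 1-D Haar transform along rows, then columns,
    keeping only the low-frequency (average) band: 20x20 -> 10x10."""
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    return (lo[0::2, :] + lo[1::2, :]) / 2.0

def encode(window):
    """Turn a preprocessed 20x20 window into the 88-dim input vector x."""
    assert window.shape == (20, 20)
    low = haar_lowpass(window.astype(float))
    keep = np.ones((10, 10), dtype=bool)
    for r, c in [(0, 0), (0, 9), (9, 0), (9, 9)]:
        dr = 1 if r == 0 else -1
        dc = 1 if c == 0 else -1
        keep[r, c] = keep[r + dr, c] = keep[r, c + dc] = False  # 3 cells per corner
    return low[keep]  # 88-dimensional feature vector
```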

Figure 4. Signatures for faces labeled with the 50° view.

Figure 5. The signatures for the three difficult images reported by Schneiderman.

4.2 Training

Ten view-specific SVRs, corresponding to $\theta_j$ (j = 0, ..., 9) of 0, 10, ..., 90 degrees, are trained

using data set 1 to produce the correlated outputs according to Eq.(1), with the cosine output function $y_j = \cos(\theta - \theta_j)$. The SVMTorch package from www.idiap.ch/learning/SVMTorch.html is used for SVR learning and estimation, with a Gaussian kernel of size σ = 1, an error-interval width for regression of ε = 0.1, and a misclassification cost of C = 500. The selection of a kernel function and the involved parameters is an engineering practice [36].

Data set 2 (see Table 1) is used to train the SVC for face detection. The SVC basically classifies between the signatures of faces and the unconstrained SVR outputs of nonfaces. The SVMLight package (www-ai.cs.uni-dortmund.de/SOFTWARE/SVM_LIGHT/svm_light.eng.html) is used for the SVC training, with a Gaussian kernel of σ = 1 and C = 100.
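As a rough equivalent of this training stage, here is a minimal sketch using scikit-learn's SVC in place of SVMLight; as before, gamma only loosely corresponds to the kernel width σ, and the variable names are our own.

```python
from sklearn.svm import SVC

def train_detector(signatures, labels, C=100.0, gamma=0.5):
    """Classify 10-D SVR-array outputs into face (1) vs. nonface (0).

    signatures: (m, 10) array of SVR-array outputs for data set 2
    (true signatures for faces, unconstrained outputs for nonfaces).
    """
    return SVC(kernel="rbf", C=C, gamma=gamma).fit(signatures, labels)

# Usage: clf = train_detector(sigs, labels)
#        is_face = clf.predict([signature(channels, x)])[0] == 1
```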

4.3 Performance Evaluations

The following results and statistics are obtained using data set 3. The illumination invariance and view-specific properties of the signature representation have been demonstrated in Fig.1. This is further confirmed with a larger data set (1000 faces of the 50° view) in Fig.4. Fig.5 shows results for three of the challenging face images reported by Schneiderman at www.cs.cmu.edu/afs/cs.cmu.edu/user/hws/www/face_detection.html. Of the three, two are near the frontal view at 90° and the other is between 40° and 50°. The angles are well estimated, and the signatures are not affected by lighting and are well represented by the objective values $y_j = \cos(\theta - \theta_j)$.

The following demonstrates the face detection and pose estimation results. Fig. 6 shows the distribution of SVC outputs for face and nonface patterns on the test set, with that of the training set shown as a reference.

Figure 6. Distributions of SVC outputs for faces and nonfaces, for the training set (left) and test set.

For the test set (set 3), the missed-detection rate is 1.88%, while the false detection rate is 9.09%. For the training set (set 2), the rates are 0.07% and 0.86%, respectively. Noting that the number of nonface training examples is not as large as that of faces (normally it should be much larger, according to experience [28, 33]), we believe that the false alarms can be reduced by re-training the SVC after more nonface patterns near the boundary are added by bootstrapping.

Our pose estimator, composed of the SVR array and an LS regression, is compared with a single-SVR architecture for pose angle estimation on face patterns (bear in mind that the latter architecture does not cater to face detection). One way to evaluate the accuracy is to consider the estimate accurate when the estimated value is within 20° of the sample's label, and an error otherwise. Our estimator achieves 10% higher accuracy than the single SVR. The accuracy obtained with the cosine constraining function is compared against the linear and quadratic functions mentioned earlier; the cosine type function produces the best results. The cosine function and its slope reflect appearance changes in a way consistent with out-of-plane rotations.

Finally, face detection and pose estimation are jointly performed on real images. Fig.7 shows two examples. A test set of 50 images collected from VCD movies is used for the evaluation, with 146 multi-view faces varying from frontal to side views. The images are scanned at different scales and locations, and the pattern in each sub-window is classified into one of the L + 1 classes, giving the decision of face/nonface and the pose estimate if a face. Of the 146 faces, 113 are detected with correct pose estimation (the difference between the estimated and labeled angles is less than 20 degrees), with 89 false alarms.

Figure 7. Results of multi-view face detection and pose estimation.
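A minimal sketch of the scanning loop described above follows; the scale factor, stride and number of scales are our assumptions (the paper does not report them), `resize_to_20x20_and_preprocess` is a hypothetical helper standing in for the normalization pipeline of Section 4.1, and `encode`, `signature`, `clf` and `estimate_pose_ls` refer to the earlier sketches.

```python
def scan_image(gray, channels, clf, scale=1.2, stride=4, n_scales=8):
    """Slide a window over the image at several scales; each sub-window is
    classified face/nonface, and a pose is estimated for detected faces."""
    detections = []  # (row, col, window_size, pose_in_degrees)
    size = 20
    for _ in range(n_scales):
        if size > min(gray.shape):
            break
        for r in range(0, gray.shape[0] - size + 1, stride):
            for c in range(0, gray.shape[1] - size + 1, stride):
                # hypothetical helper: rescale to 20x20 and preprocess as in Sec. 4.1
                window = resize_to_20x20_and_preprocess(gray[r:r + size, c:c + size])
                y = signature(channels, encode(window))
                if clf.predict([y])[0] == 1:
                    detections.append((r, c, size, estimate_pose_ls(y)))
        size = int(round(size * scale))
    return detections
```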

5 Conclusion

We have proposed a novel nonlinear method to learn a low dimensional representation for multi-view image appearances. The method can discover a nonlinear manifold of the object of interest embedded in the input space, and achieves a low dimensional signature representation with the following properties: (1) The signature is illumination invariant, and for a given view it has a fixed shape; this provides a good basis for distinguishing the object of interest from others. (2) The signature is a continuous function of the view parameter; this facilitates the estimation of the view parameter. (3) The span of the signatures over the views constitutes the nonlinear manifold of the object; this can be used for object detection.

The invariant signature representation and its applications may be further developed. For example, linear or nonlinear PCA may be performed on the signature data to achieve more efficient and compact representations. We may also develop better methods for distinguishing the collection of modeled signatures from other shapes.

Acknowledgement

The authors would like to thank Anil Jain for his comments on drafts of the paper, and QingDong Fu for his assistance in collecting the face samples.

References

[1] Y. Adini, Y. Moses, and S. Ullman. "Face recognition: The problem of compensating for changes in illumination direction". IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):721–732, July 1997.
[2] S. Baker, S. Nayar, and H. Murase. "Parametric feature detection". International Journal of Computer Vision, 27(1):27–50, March 1998.
[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection". IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, July 1997.
[4] P. N. Belhumeur and D. J. Kriegman. "What is the set of images of an object under all possible illumination conditions". International Journal of Computer Vision, 28(3):245–260, July 1998.
[5] D. Beymer, A. Shashua, and T. Poggio. "Example based image analysis and synthesis". A.I. Memo 1431, MIT, 1993.
[6] M. Bichsel and A. P. Pentland. "Human face recognition and the face image set's topology". CVGIP: Image Understanding, 59:254–261, 1994.
[7] C. Bishop and J. M. Winn. "Nonlinear Bayesian image modeling". In Proceedings of the European Conference on Computer Vision, pages 1–15, 2000.
[8] H. Borotschnig, L. Paletta, M. Prantl, and A. Pinz. "Active object recognition in parametric eigenspace". In Proc. 9th British Machine Vision Conference, pages 63–72, Southampton, UK, 1998.
[9] R. Brunelli. "Estimation of pose and illuminant direction for face processing". A.I. Memo 1499, MIT, 1994.
[10] C. J. C. Burges. "A tutorial on support vector machines for pattern recognition". Knowledge Discovery and Data Mining, 2(2), 1998.
[11] H. F. Chen, P. N. Belhumeur, and D. W. Jacobs. "In search of illumination invariants". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages I:254–261, 2000.
[12] R. Epstein, P. Hallinan, and A. Yuille. "5 ± 2 eigenimages suffice: An empirical investigation of low-dimensional lighting models". In IEEE Workshop on Physics-Based Vision, pages 108–116, 1995.
[13] K. Etemad and R. Chellappa. "Face recognition using discriminant eigenvectors". 1996.
[14] J. Feraud, O. Bernier, and M. Collobert. "A fast and accurate face detector for indexation of face images". In Proc. Fourth IEEE Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, 2000.
[15] A. S. Georghiades, D. J. Kriegman, and P. N. Belhumeur. "Illumination cones for recognition under variable lighting: Faces". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 52–59, 1998.
[16] S. Gong, S. McKenna, and J. Collins. "An investigation into face pose distribution". In Proc. IEEE International Conference on Face and Gesture Recognition, Vermont, 1996.
[17] D. Graham and N. Allinson. "Face recognition from unfamiliar views: Subspace methods and pose dependency". In Proc. 3rd International Conference on Automatic Face and Gesture Recognition, pages 348–353, Nara, Japan, April 1998.
[18] P. W. Hallinan. "A low-dimensional representation of human faces for arbitrary lighting conditions". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 995–999, 1994.
[19] J. Hornegger, H. Niemann, and R. Risack. "Appearance-based object recognition using optimal feature transforms". Pattern Recognition, 33(2):209–224, February 2000.
[20] J. Huang, X. Shao, and H. Wechsler. "Face pose discrimination using support vector machines (SVM)". In Proceedings of International Conference on Pattern Recognition, Brisbane, Queensland, Australia, 1998.
[21] M. Kirby and L. Sirovich. "Application of the Karhunen-Loeve procedure for the characterization of human faces". IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):103–108, January 1990.
[22] A. Kuchinsky, C. Pering, M. L. Creech, D. Freeze, B. Serra, and J. Gwizdka. "FotoFile: A consumer multimedia organization and retrieval system". In Proc. ACM HCI'99 Conference, 1999.
[23] Y. M. Li, S. G. Gong, and H. Liddell. "Support vector regression and classification based multi-view face detection and recognition". In IEEE Int. Conf. on Face & Gesture Recognition, pages 300–305, France, 2000.
[24] H. Murase and S. K. Nayar. "Visual learning and recognition of 3-D objects from appearance". International Journal of Computer Vision, 14:5–24, 1995.
[25] J. Ng and S. Gong. "Performing multi-view face detection and pose estimation using a composite support vector machine across the view sphere". In Proc. IEEE International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 14–21, Corfu, Greece, September 1999.
[26] E. Osuna, R. Freund, and F. Girosi. "Training support vector machines: An application to face detection". In CVPR, pages 130–136, 1997.
[27] A. P. Pentland, B. Moghaddam, and T. Starner. "View-based and modular eigenspaces for face recognition". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 84–91, 1994.
[28] H. A. Rowley, S. Baluja, and T. Kanade. "Neural network-based face detection". IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–28, 1998.
[29] H. Schneiderman and T. Kanade. "A statistical method for 3D object detection applied to faces and cars". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2000.
[30] A. Shashua. "On photometric issues in 3D visual recognition from a single 2D image". International Journal of Computer Vision, 21:99–122, 1997.
[31] A. Shashua and T. R. Raviv. "The quotient image: Class based re-rendering and recognition with varying illuminations". IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):129–139, 2001.
[32] A. J. Smola and B. Schölkopf. "A tutorial on support vector regression". NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998.
[33] K.-K. Sung and T. Poggio. "Example-based learning for view-based human face detection". IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39–51, 1998.
[34] M. E. Tipping and C. M. Bishop. "Mixtures of probabilistic principal component analyzers". Neural Computation, 11(2):443–482, 1999.
[35] M. A. Turk and A. P. Pentland. "Face recognition using eigenfaces". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 586–591, Hawaii, June 1991.
[36] V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.
[37] A. Yilmaz and M. Gokmen. "Eigenhill vs. eigenface and eigenedge". In Proceedings of International Conference on Pattern Recognition, pages 827–830, Barcelona, Spain, 2000.


