Face Processing with Whitney Reduction Networks

D.S. Broomhead

Michael Kirby

Department of Mathematics UMIST, P.O. Box 88 Manchester M60 1QD, U.K.

Department of Mathematics Colorado State University Fort Collins, CO 80523

Abstract

This paper investigates the application of the Whitney Reduction Network (WRN) to the low-dimensional characterization of digital images of human faces. Motivated by Whitney's Embedding Theorem from differential topology, the WRN provides a nonlinear parameterization of m-dimensional manifolds. Based on this, the reduction of the high-dimensional raw image data consists of two stages. First, a low-dimensional representation is constructed using an optimal projection. Second, a nonlinear inverse from the image of the projection to the null space of the projection is approximated to permit the (almost) perfect reconstruction of the data. This architecture is applied to the problem of representing a family of digital images, i.e., faces. We compare this method with the well-known eigenpicture approach, which may be viewed as a special limiting case of the WRN.

1 Introduction

The application of the Karhunen-Loeve (KL) procedure for the representation of digital images of faces was introduced over a decade ago [10]. Since this time, this work has been extended and applied, see, e.g., [7, 12, 1]. The basis of the approach is to find an optimal set of spanning eigenvectors, or eigenfaces, to represent the data. This is equivalent to finding an optimal rotation of the coordinate system, i.e., one which captures the maximum variance in each direction, subject to orthogonality. Given that the vectors associated with a family of images may not lie fully in a subspace, but rather on a submanifold, it follows that the spanning vector approach could be suboptimal. The representation of manifolds is achieved through a nonlinear parameterization, by computing a graph of the surface on which the data lies. Although fully nonlinear approaches have been attempted for this, they are computationally expensive and the representations are non-unique [5]. Here we propose the Whitney Reduction Network (WRN) for the processing of classes of high-dimensional images.

The motivation for the approach adopted in this paper is the Whitney Embedding Theorem, or, more precisely, its proof as given in, for example, Hirsch [6]. The Whitney Embedding Theorem shows that it is always possible to find a representation of a compact, finite-dimensional, differentiable manifold as a submanifold of a vector space. Roughly speaking, given an m-dimensional differentiable manifold, we can find a mapping to the Euclidean space R^{2m+1} which is diffeomorphic onto its image. A diffeomorphism is a differentiable map with a differentiable inverse.

2 The Whitney Network Architecture

In this section we describe the main features of the architecture of the Whitney reduction network shown in Figure 1. It will be notationally convenient to assume that the data set A ⊂ M is alternatively represented as an n × P data matrix X, i.e., each column is a sample point in R^n. The architecture of the network is driven by the decomposition of a data point x ∈ A under the action of a projector P,

x = Px + (I − P)x.   (1)

If we let p = Px and q = (I − P)x, then we view any element x as the sum of the portion of x in the range of P, i.e., p ∈ R(P), and the portion in the null space of P, i.e., q ∈ N(P). Whitney's theorem ensures the existence of a map from the range of the projector to its null space, i.e., q = f(p). This provides a parameterization of the data set A in terms of p as x = p + f(p).

The inverse of P takes a projected data point Px and maps it back to x. In practice, P̃^{-1} approximates the true inverse P^{-1} and is obtained by solving the interpolation problem on the data set, P̃^{-1}: PA → A. The actual inverse is viewed as a map between manifolds, P^{-1}: PM → M. The extent to which P̃^{-1} acts as P^{-1} on the extension from A to M will be considered the generalizability of P̃^{-1}.
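To make the decomposition in equation (1) concrete, the following is a minimal Python/NumPy sketch, not the authors' code; the name U1, an orthonormal basis for the retained subspace, is introduced here purely for illustration.

import numpy as np

def decompose(x, U1):
    """Split x into its component in R(P) and its component in N(P)."""
    P = U1 @ U1.T                 # orthogonal projector onto span(U1)
    p = P @ x                     # portion of x retained by the projection
    q = (np.eye(len(x)) - P) @ x  # portion in the null space, q = (I - P)x
    return p, q

# Whitney's theorem guarantees (for a good projector) a map f with q = f(p),
# so that x is recovered as x = p + f(p); f is later approximated from the
# training data by a radial basis function network.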

2.1 The Reduction Mapping

In general, the first stage of the data reduction is the determination of a projector P which maps the data to a d-dimensional subspace, where d ≤ 2m + 1 and m is the topological dimension of the data [6]. Although Whitney's theorem states that almost every projector will be a diffeomorphism if d is large enough (it need not exceed 2m + 1), in practice good projectors must be designed. Although it is not optimal in the sense of the WRN, in this paper we employ the PCA basis to permit a comparison of the WRN with the eigenpicture technique. Let U = [u_1, u_2, ..., u_n] denote the matrix of eigenvectors produced by KL. If we deem a set of vectors {u_{d+1}, ..., u_n} to span a subspace along which we may project, it is sufficient to form the reduced n × d matrix Û_1 whose columns are the retained basis vectors, i.e., Û_1 = [u_1 | ... | u_d]. The associated n × n orthogonal projector is then defined explicitly as

P = Û_1 Û_1^T.

We note that the quantity Px ∈ R(P) is an n-tuple in the ambient basis, i.e.,

Px = (u_1^T x) u_1 + ... + (u_d^T x) u_d.

It is the expansion coefficients which provide the d-dimensional representation, and these are given as

p̂ = Û_1^T x.

It is convenient to introduce the notation P̂ = Û_1^T to define the transformation which takes the data point x in the ambient space and produces the point p̂, i.e., the representation of x in the reduced space:

p̂ = P̂x = (u_1^T x, ..., u_d^T x)^T ∈ R^d.
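A hedged sketch of this reduction stage follows; it assumes the columns of the data matrix X are the vectorized images and uses an SVD of the mean-subtracted data as a stand-in for the KL eigenvector computation (whether the ensemble mean is subtracted is our assumption, not stated above).

import numpy as np

def kl_basis(X, d):
    """Return the n x d matrix U1_hat of leading KL (PCA) basis vectors."""
    Xc = X - X.mean(axis=1, keepdims=True)       # subtract the ensemble mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d]                              # columns u_1, ..., u_d

def reduce_coords(X, U1_hat):
    """p_hat = U1_hat^T x for each column x of X (the d-dim representation)."""
    return U1_hat.T @ X                          # d x P coefficient matrix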

2.2 The Reconstruction Mapping

Given that the data has been projected such that a perfect reconstruction exists, the problem now consists of determining a nonlinear mapping which acts to reconstruct the data. Again, the existence of the inverse is guaranteed by Whitney's theorem as long as the projected dimension d is large enough; we know it need be no larger than 2m + 1. For purposes of exposition it will be useful to assume that the data matrix has full rank r = n, with the implication that n ≥ m. Given an orthonormal basis [u_1, ..., u_n], the ambient space R^n may then be decomposed into two orthogonal subspaces by defining the additional projector

Q = Û_2 Û_2^T,

where Û_2 = [u_{d+1} | ... | u_n]. The superposition of these projectors rebuilds a data point, i.e., the identity mapping is I = P + Q, and Q = I − P. The range of Q is the null space of P, i.e., R(Q) = N(P). The component of each point x in N(P), i.e., q = (I − P)x = Qx, is required for the training phase of the network, as described below. As before, we distinguish the projector Q from the linear mapping Q̂: R^n → R^{n−d}, where Q̂ = Û_2^T. So the (n − d)-tuple representing the component of x residing in the null space of P is denoted q̂ = Q̂x. Note that if the data matrix has rank r < n, i.e., is rank deficient, then the dimension of the range of Q, or null space of P, is r − d. The purpose of the radial basis function mapping is to take the components of the points in X which reside in R(P) and to map them to the associated points in N(P), i.e.,

f: P̂X ⊂ R^d → Q̂X ⊂ R^{n−d}.

We denote the approximation of this map constructed by the network by f̃; it is the underlying function f which is guaranteed to exist by Whitney's theorem. The decomposition described in the previous sections and the associated reconstruction may be interpreted as an autoassociative reduction network with the architecture shown in Figure 1. The network is feedforward and consists of an input and output layer, as well as two internal layers.
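The training data for the network can be assembled as in the following illustrative sketch, where U is the full n × n KL basis from the previous section; the function name is an assumption introduced here.

import numpy as np

def training_pairs(X, U, d):
    """Split the KL coordinates of X into RBF inputs p_hat and targets q_hat."""
    U1_hat, U2_hat = U[:, :d], U[:, d:]
    P_hat = U1_hat.T @ X    # d x P matrix: coordinates in R(P)
    Q_hat = U2_hat.T @ X    # (n - d) x P matrix: coordinates in N(P)
    return P_hat, Q_hat     # the network is trained so that f(p_hat) ~ q_hat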

3 Implementation

The ability to recover a data point after it has been projected relies on the fact that the projector does not act along a secant of the data. It is also very plausible that data points which are almost projected onto each other will create practical difficulties when attempting to recover the projected data in the ambient space. Points which are close together in the domain of this function must be pulled apart in the image. The natural consequence of this requirement is that the resulting mapping will be ill-conditioned and difficult to approximate. To control this problem we propose a criterion for good projections in the next section.

3.1 Dimensionality Estimation

Given a projector P, as defined, e.g., by the KL basis, we propose to select the number of dimensions d to be retained by requiring that the inequality

‖Px − Py‖^2 ≥ ε_k ‖x − y‖^2   (2)

be satisfied for all x, y ∈ A and some fixed tolerance ε_k > 0. The tolerance ε_k is a measure of the maximum permissible shortening of the distance between any two projected data points. (Note that by construction 0 < ε_k ≤ 1, with ε_k = 1 when P represents a unitary transformation.) Equivalently, ε_k is a lower bound on the norm-squared of the projected unit secants. We note that this good-projection criterion in equation (2) may be used both to design good projections and to determine the reduction dimension d required for a given basis to produce a good projection. It can be shown that, given an n-dimensional basis, equation (2) will be satisfied if

‖P k̂‖ ≥ √ε_k   (3)

for all unit secants k̂ of the data [2]. We examine this quantity for the face database in Section 4, taking P as the KL basis.
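A brute-force sketch of this check might look as follows; it simply scans all pairwise unit secants, which is adequate for a few hundred images but is not meant as an efficient implementation. Note that ‖P k̂‖ = ‖Û_1^T k̂‖ because Û_1 has orthonormal columns.

import numpy as np

def min_projected_secant_norm(X, U1_hat):
    """Smallest value of ||P k_hat|| over all unit secants k_hat of the data."""
    n, P = X.shape
    worst = np.inf
    for i in range(P):
        for j in range(i + 1, P):
            s = X[:, i] - X[:, j]
            k_hat = s / np.linalg.norm(s)            # unit secant of the data
            worst = min(worst, np.linalg.norm(U1_hat.T @ k_hat))
    return worst   # accept d if this exceeds sqrt(eps_k) for the chosen tolerance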

3.2 Radial Basis Function Inverse

In this section we briefly discuss our approach for approximating the (well-conditioned) inverse q̂ = f(p̂) using the radial basis function approach [4]. An RBF approximation is written

y = w_0 + Σ_{j=1}^{n_c} w_j φ(‖x − c_j‖)   (4)

where φ(·) is a fixed function centered at the point c_j and n_c is the number of basis functions employed in the expansion. Thus we must solve the linear system

Y^T = Φ W^T   (5)

for the unknown n × (n_c + 1) weight matrix W = [w_0 | w_1 | w_2 | ... | w_{n_c}], given Y = [y^{(1)} | ... | y^{(P)}] and Φ, the interpolation matrix [4]. Formally, a solution may be obtained by computing the pseudo-inverse of Φ, giving W^T = Φ^† Y^T.

There exist excellent algorithms based on the singular-value decomposition for computing Φ^† [11]. We remark that implementing either the pseudo-inverse method or a gradient-based descent technique for radial basis functions is extremely efficient. In this application we restricted our attention to the global approximation scheme using the cubic RBF φ(r) = r^3. This choice was made for convenience, given that there are no additional parameters in the RBF to tune. However, we have also experimented with other forms of radial basis functions (see [9] for a list of possibilities), such as the Gaussian with adjustable width, which produced similar results in this application.
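A minimal sketch of the cubic-RBF fit via the pseudo-inverse, following equations (4)-(5), is given below. Placing one center on every training input is our assumption for illustration; the paper does not specify how the centers were chosen.

import numpy as np

def fit_rbf(P_hat, Q_hat):
    """P_hat: d x P inputs, Q_hat: (n-d) x P targets. Returns (W, centers)."""
    centers = P_hat                                          # one center per sample (assumed)
    D = np.linalg.norm(P_hat[:, :, None] - centers[:, None, :], axis=0)
    Phi = np.hstack([np.ones((P_hat.shape[1], 1)), D**3])    # P x (n_c + 1), cubic RBF
    W_T = np.linalg.pinv(Phi) @ Q_hat.T                      # W^T = pinv(Phi) Y^T, eq. (5)
    return W_T.T, centers

def eval_rbf(p, W, centers):
    """Evaluate q_hat = f_tilde(p_hat) at a single reduced point p."""
    r = np.linalg.norm(centers - p[:, None], axis=0)         # distances to the centers
    return W @ np.concatenate(([1.0], r**3))                 # w_0 + sum_j w_j r_j^3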

4 Results

We now consider the application of the Whitney reduction network to the problem of the dimensionality reduction of an ensemble of 200 faces. The images were normalized for lighting as in [10, 8] and the background was eliminated partially by constructing 310 by 380 pixel cameos. The eigenvalues of the ensemble-averaged covariance matrix, shown in Figure 2 as the monotonically decreasing line, indicate that a low-dimensional representation will capture a significant fraction of the variance of the data. It is, however, hard to translate this observation into a criterion for determining the number of dimensions to retain. We have proposed that the conditioning of the projection be used to determine the dimension, given that this translates into the nonlinear inverse being well-conditioned. The monotonically increasing line in Figure 2 is the minimum projected secant norm of the data set. We selected d = 13 for projecting the data, given the reasonably large minimum projected secant norm of roughly 0.30. It is true that this curve also does not give a precise number for the dimension, rather only an estimate. It does indicate, however, to what degree the reconstruction will have to work to pull the data apart [2].

In Figure 3 we show the results, in terms of the original image coordinate system, of the linear, nonlinear, and full reconstruction, and compare them to the original image. It is clear that the 13-dimensional KL projection, when linearly reconstructed, is a poor approximation to the original data, although the relative mean-square error is just 17%. Despite this poor pointwise approximation provided by the 13-dimensional KL subspace, it does work very nicely to parameterize the data set. As such, only 13 coefficients need to be retained for each picture, as well as the KL transformation and the RBF model. The nonlinear mapping provides the detail as the image of this projection. Note, however, that given the decoding model on the receiving end, in a transmission problem only the 13 numbers need to be sent.

In Figure 4 we see that the linear reconstruction error, for all the images, decreases steadily, as eventually the KL basis will indeed encapsulate the data. However, even at d = 25 the reconstruction error exceeds 10%, while the WRN has achieved approximately 5% error with d = 13 terms and, more importantly, produces a visually superior reconstruction, given that the detail of the image (which has small-amplitude coefficients) is well approximated.

Now let us consider the reconstruction procedure more closely. As shown in Figure 5, the linear reconstruction accounts for the perfect reconstruction of the first 13 KL coefficients of the image; the nonlinear reconstruction does not contribute to these terms. Similarly, the linear reconstruction from the 13-dimensional KL subspace does not contribute to any term after the 13th; these terms are reconstructed entirely by the nonlinear RBF mapping. In Figure 6, the solid dots represent the true KL coefficient values for dimensions 1-100. We see the WRN reconstruction is essentially perfect, with the linear term reconstructing the first 13 dimensions and the nonlinear term reconstructing the complement.
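As a hedged illustration of this transmission scenario (reusing eval_rbf and the KL basis U from the earlier sketches; the function names are ours, not the authors'), the sender would transmit the d = 13 coefficients and the receiver would rebuild the image:

def encode(x, U1_hat):
    """The d numbers that would be transmitted for a single image x."""
    return U1_hat.T @ x

def decode(p_hat, U, d, W, centers):
    """Reconstruct x ~ p + f(p) from p_hat, the KL basis U, and the RBF model."""
    q_hat = eval_rbf(p_hat, W, centers)           # nonlinear detail, eq. (4)
    return U[:, :d] @ p_hat + U[:, d:] @ q_hat    # linear part plus nonlinear part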

5 Discussion

The Whitney Reduction Network provides a method for approximating a nonlinear manifold as the graph of a function. In addition, by considering the norms of the projected secants, it is possible to base the choice of reduction dimension on the conditioning of the inverse mapping. A 13-dimensional KL projection gives a blurry reconstruction but serves to parameterize the null space of the projection. It is interesting to note that a basis consisting of the principal component vectors of the data matrix X is, in general, not a good basis for maximizing ε_k. This is due to the fact that principal components are based on mean-square statistics, whereas our criterion is applied pointwise. Thus we propose an adaptive secant basis approach that is appropriately designed for parameterization of the data, rather than encapsulation [2, 3].

Acknowledgments

This research was initially supported in part by the Engineering and Physical Sciences Research Council, U.K., in the form of a Visiting Research Fellowship to M.K. This research has also been partially supported by the National Science Foundation under grant INT9513880 and the Air Force Office of Scientific Research under contract DOD-USAF F49620-99-1-0034. Also, thanks to the MIT Media Lab for providing the data which was examined in this paper and to Rick Miranda for commenting on the draft.

References

[1] Joseph J. Atick, Paul A. Griffin, and A. Norman Redlich. The vocabulary of shape: principal shapes for probing perception and neural response. Network: Computation in Neural Systems, 7:1-5, 1996.

[2] David Broomhead and M. Kirby. New approach for dimensionality reduction: theory and algorithms. Submitted to SIAM J. of Applied Mathematics, 1998.

[3] David Broomhead and M. Kirby. The Whitney reduction network: a method for computing autoassociative graphs. Submitted to Neural Computation, 1998.

[4] D.S. Broomhead and David Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, 2:321-355, 1988.

[5] Garrison W. Cottrell and Janet Metcalfe. EMPATH: face, emotion and gender recognition using holons. In R.P. Lippman, J. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 564-571, San Mateo, CA, 1993. Morgan Kaufmann Publishers.

[6] Morris W. Hirsch. Differential Topology. Graduate Texts in Mathematics 33. Springer-Verlag, 1976.

[7] M. Kirby, J.P. Boris, and L. Sirovich. An eigenfunction analysis of axisymmetric jet flow. J. of Comp. Phys., 90(1):98, 1990.

[8] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. PAMI, 12(1):103, 1990.

[9] M.J.D. Powell. The theory of radial basis function approximation in 1990. Pages 105-209, 1990.

[10] L. Sirovich and M. Kirby. A low-dimensional procedure for the characterization of human faces. J. of the Optical Society of America A, 4:529, 1987.

[11] Lloyd N. Trefethen and David Bau, III. Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.

[12] Matthew Turk and Alex Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
