Terry Huntsberger, John Rose, Shashidhar Ramaka. Intelligent Systems Laboratory ...... 16] G. Ge en, J. L. Bradshaw, and G. Wallace. Interhemispheric e ects on ...
Fuzzy{Face: A Hybrid Wavelet/Fuzzy Self-Organizing Feature Map System for Face Processing Terry Huntsberger, John Rose, Shashidhar Ramaka Intelligent Systems Laboratory, Department of Computer Science University of South Carolina, Columbia, SC 29208 USA Abstract The human face is one of the most important patterns our visual system receives. It establishes a person's identity and also plays a signi cant role in everyday communication. Humans can recognize familiar faces under varying lighting conditions, dierent scales, and even after the face has changed due to aging, hair style, glasses, or facial hair. Our ease at recognizing faces is a strong motivation for the investigation of computational models of face processing. This paper presents a newly developed face processing system called Fuzzy{Face that combines wavelet pre-processing of input with a fuzzy self-organizing feature map algorithm. The wavelet-derived face space is partitioned into fuzzy sets which are characterized by face exemplars and membership values to those exemplars. This system learns faces using relatively few training epochs, has total recall for faces it has been shown, generalizes to face images that are acquired under dierent lighting conditions, and has rudimentary gender discrimination capabilities. We also include the results of some experimental studies. Keywords:
Face processing, wavelets, neural network
Research supported in part under ONR Grant No. N00014-94-1-1163.
1
1 Introduction Computer face processing systems can be broadly broken into two categories: featurebased and image-based systems. Feature-based approaches concentrate on individual facial features such as the eyes, nose, and mouth, and de ne a model by measuring the position, size and the relationships among these features. An excellent review of such systems can be found in Samal and Iyengar [49]. The image-based systems use the raw pixel information or a processed form of this as a representation for storage and recognition. Processing of the facial image is usually done in an attempt to decrease the amount of data that must be stored, while at the same time preserving the salient information [41, 57]. Oftentimes, the processed facial image data is used in conjunction with neural networks for recognition and categorization in the connectionist face processing systems [29, 38, 42, 58]. A comprehensive review of these types of systems can be found in Valentin, et al [58]. A recent study by Micheli{Tzanakou and associates compared three dierent feature sets (F-CORE [37], moments [20], wavelets [32]) for eciency of learning and recall accuracy on face recognition tests using neural networks [38]. They found that the wavelet feature set had the fastest convergence properties. No attempt was made in that study to exploit the information contained in the dierent wavelet channels. We have developed a two-stage face processing system called Fuzzy-Face that is a hybrid blend of signal processing (wavelet transform) and connectionist (fuzzy selforganizing feature map) algorithms. This system is shown in Figure 1. The rst stage performs the wavelet transform on the facial images. The wavelet transform builds a multiresolution representation of the original image that captures coarse facial structure in the average channel, as well as ne facial features in the detail channels. This choice was motivated by the similarities to wavelet processing found in experimental studies of retinal and simple cortical cells [2, 9, 10, 31, 34, 63]. Both the lowest resolution average channel and the lowest resolution detail channel serve as inputs to the second stage, which is two fuzzy self-organizing feature maps (hereafter referred to as FSOFM) [24] running in parallel. The FSOFM is an unsupervised neural network that partitions the space of wavelet-derived input vectors into fuzzy sets. These fuzzy sets are characterized by cluster centers that can be viewed as face exemplars [59, 60], and by membership values that capture the degree of closeness of any input vector to the exemplar classes. Since, Fuzzy-Face is a connectionist approach to face processing, we will limit our review to such systems. The next section gives a discussion of the evidence for hemispheric specialization found in a study by Hillger and Koenig [18], which serves as a motivation 2
FS OFM Wavelet Trans form FS OFM
Stage 1
Stage 2
Figure 1: Fuzzy-Face system. for our approach to the face processing problem. This is followed by a brief review of connectionist face processing systems. The next two sections discuss the wavelet transform and the FSOFM. Finally, some experimental studies are presented, followed by a discussion of results and current research directions.
2 Psychophysiological background The human visual system uses highly redundant, temporally ltered information for perceptual purposes [54]. Evidence for pure spatial frequency processing of input [53] is somewhat mitigated by the combined size/position/orientation/frequency sensitivities of retinal and simple cortical cells [2, 9, 10, 31, 34, 63]. More relevant to the computational models of face processing are the results of Sergent [51], where there seemed to be a distinct dierence between the types of processing that were being done in the two hemispheres. Dierences of the same feature as to the degree of dissimilarity were perceived dierently in the two hemispheres. This result received some con rmation in the work of Hillger and Koenig, where their fourth experiment indicated that the right hemisphere advantage lies in tasks requiring global attention, whereas the left hemisphere specializes in tasks requiring local attention [18]. The experiment used face stimuli consisting of three features: eyes, nose, and mouth. This experiment was designed using a Same/Dierent judgement test with face images that have a single feature in common (Same) versus all features are dierent (Different). An overall advantage was found in the LH response times, but no appreciable dierences were found between the two hemispheres over the dierent facial features [18]. We report some preliminary results along these lines in the experimental section of this paper. Hillger and Koenig describe a set of four experiments (fourth experiment referred 3
to above) that support the hypothesis that hemispheric specialization is involved in face processing [18]. There are several related questions addressed by their set of experiments. The rst question is whether face processing is primarily accomplished by the right hemisphere, or do both hemispheres make important contributions. This question arises out of studies involving brain damaged patients which provided evidence indicating that the right hemisphere plays a critical role in face processing [3, 40, 47, 61]. In addition, there are studies of normal subjects that appear to con rm these results [16, 19, 48, 62]. However, it is pointed out that the results with normal subjects are not as unambiguous as those with brain-damaged patients. Investigations that indicate a left-hemisphere advantage in face processing are cited [35, 46, 50]. Unfortunately, it is dicult to compare the con icting results of these studies since the methodologies in these studies vary considerably. Thus the question arises as to whether the results re ect dierences or methodological artifacts. For this reason, one purpose of their rst experiment was to establish a common methodology for all subsequent experiments in the set. The entire series of experiments demonstrated that there is a: 1. right hemisphere advantage for discrimination between faces where all of the features are dierent (global task), 2. left hemisphere advantage for discrimination between faces where one feature diers (local task), 3. relatively specialized face processing mechanisms in the right hemisphere versus more general visual processing mechanisms in the left hemisphere (inverted face phenomena), 4. task specialization of local processing capabilities in the left hemisphere and global processing capabilities in the right hemisphere. All of these conclusions were reached based on response times and error analysis of Same/Dierent trials using collages of the three facial features of eyes, nose, and mouth [18]. One thing of note in these studies is the absence of other possible face processing cues such as hair length and style, skin pigmentation, and face outline. These other global/local face characteristics may also be used in face processing. Hillger and Koenig also argue against the view that the hemispheric dierences can be accounted for exclusively by diering abilities to process high and low spatial frequency information [52, 53]. Although the left hemisphere's advantage in parsing individual features may be a consequence of its eciency at processing high spatial frequency data, 4
no evidence for a right hemisphere advantage in processing of low spatial frequency data was observed. Experiments two and three involve the same set of stimuli with the exception that the stimuli in experiment three were inverted. Inverting the stimuli does not change the spatial frequency content. And yet, the right hemisphere advantage for Same trials involving upright faces was not repeated when the faces were inverted [18]. This indicates that the dierences observed in these two experiments depend on factors other than the spatial frequency of the stimuli.
3 Connectionist Systems Previous studies that are closest to the method of face space partitioning found in the FSOFM can be found in Kohonen [28] and Millward and O'Toole [39]. Kohonen developed an auto-associative network using a simple Hebbian learning rule that could recognize (classify) face images and recall a face image from an incomplete or noisy input to the network [28]. This network learns a set of stimulus-response pairs, and reproduces the response given only the stimulus. It exhibited a degree of generalization, in that after learning a set of faces, the network could recognize and reconstruct a known face given a degraded or incomplete input image. A review of Kohonen's system can be found in the section on self-organizing feature maps which follows the discussion of the wavelet transform. This work was further extended by Millward and O'Toole using a Widrow-Ho learning rule and shown to be a true face recognition system [39]. Both of these approaches are more or less equivalent to using a weighted sum of eigenvectors that result from a principal component analysis (PCA) of the weight matrix [58]. The PCA method has been used successfully for face recognition [57], and general face processing [42, 43, 44, 58]. There have been a number of studies using backpropagation networks for face processing. Among these are the works of Cottrell and Fleming [15], the Sexnet system of Golomb et al., the EMPATH system of Cottrell and Metcalfe [7], and the NLDR system of DeMers and Cottrell [13]. A recent comparison of these systems [58] characterized them as three types: binary [15], linear [7], and face manifold [13]. The next section discusses the wavelet transform, which is used to build a multiresolution representation of average/detail face characteristics in the rst stage of the FuzzyFace system.
5
4 Wavelet Transform Wavelets have been used previously for face processing [12, 17, 33, 38]. These approaches to feature extraction are based on recent studies of the mammalian visual system, where Gabor wavelets can be used to t response properties of retinal and simple cortical cells [2, 9, 10, 31, 34, 63]. Two-dimensional Gabor wavelets have the basic form [12]: mpq (x; y ) = 2
?2m
(x0; y0)
(1)
where dilation by 2?m , translation by p and q, and rotation by of the wavelet is captured in x0 and y0 as:
x0 = 2?m [x cos + y sin ] ? p y0 = 2?m [?x sin + y cos ] ? q The mother wavelet (x; y) is given by: (x; y) = e?[ x?x = y?y (
2 0)
2
+(
0)
2
(2) (3) (4)
= 2 ] e?2i[u0 (x?x0 )+v0 (y?y0 )]
(5)
where and are extent factors, x and y are centroids, and u and v are frequency modulation components. The cells in the visual cortex of mammals respond to small lines and edges [14]. The behavior of these cells is modeled using Gabor wavelets by determining the maximal response of lters that are tuned to speci c frequency and orientation. This method of analysis detects the salient features in the image. These feature locations correspond to points with signi cant curvature change. Manjunath et al. describe a feature-based approach where the features are derived directly from the intensity data using Gabor wavelets [33]. Topographical graphs are used to represent relations between features, and a deterministic graph matching scheme is used to recognize familiar faces from a database. Daugman uses the Gabor wavelet transform to encode a multiresolutional representation of iris textures [12]. Hamming distance measurements between like and unlike iris byte codes derived from the results of the wavelet transform gave a bimodal distribution, with a clear discrimination of authentic and imposter patterns. There are a number of computational and theoretical concerns that need to be addressed for the use of Gabor wavelets in a discrete context. Ideally, the basis that is chosen to represent an image should be complete in that the image can be reconstructed in a 0
0
6
0
0
numerically stable way [30]. This is a well de ned problem for orthonormal wavelets [36]. However, the discrete Gabor wavelets are examples of nonorthogonal wavelets, which means that complete characterization of an image can only be achieved using the dual frame ~mpql [8]:
f (x) =
X D
m;p;q;l
f; ~m;p;q;l
E
m;p;q;l =
X
m;p;q;l
hf;
m;p;q;li ~m;p;q;l;
(6)
E D where f; ~m;p;q;l is de ned as the inner product of f and ~m;p;q;l, and = l is the discrete sampling for the angle of orientation of the lter. Generation of the dual frame is a nontrivial problem, somewhat mitigated by an iterative technique of Daugman, which uses error minimization to obtain the coecients [11]. A recent study by Lee has established guidelines for the proper choice of Gabor wavelet sampling based on the tightness of the frame [8, 30]. Lee was able to generalize the 1{D frame criterion of Daubechies to 2{D. He determined that a phase space sampling of 20 orientations per cycle and 3 scale steps per octave with 1.5 octave bandwidth Gabor wavelets exhibited tight frame characteristics if the spatial sampling interval was 0.8 [30]. The forward and inverse transform can then be performed in the same way as that for orthonormal wavelets. The number of Gabor lters for tight frame sampling is relatively large unless a more ecient method can be found for their application. Gross and Koch have developed an algorithm for the Gabor wavelet decomposition based on convolution products in the Fourier domain [17]. They used this method to encode texture and range information from facial images using 34 (17 real and 17 imaginary) Gabor wavelet convolutions. Facial features were then analyzed using a 3-D Kohonen self{organizing feature map alone or in combination with a principal components analysis to perform dimensionality reduction. Their experimental studies indicated that a set of features suitable for facial image database matching could be extracted using the complex Gabor wavelets in combination with the Kohonen network [17]. A single image process from Gabor wavelet feature generation to classi cation took about 390 seconds on a SGI Indigo2 workstation. For large databases operating in real-time environments, this amount of time might be prohibitive. These concerns led us to investigate the use of orthonormal or biorthogonal discrete wavelets for the Fuzzy{Face system. There are extremely ecient integer algorithms for these types of wavelets that can perform the transform on images at frame rates (30 frames/sec) [56]. The orthonormal discrete wavelet decomposition of a general signal (or image) f is a representation in terms of linear combinations of translated dilates of a 0
7
single function :
f (x) =
XX
k
hf;
;k (x)i ;k (x);
(7)
with ;k (x) = 2= (2 x ? k) for integers and k. Since the transform is linear, all of the information content of the image or signal is contained in the inner product coecients. The detail channels of the coecient space are de ned as the orientation-speci c dierence between average channels at resolution and ? 1. A hierarchical representation of f is obtained by varying the scale from coarse (small ) to ne (large ). The extension of this set of functions to multiple dimensions is straight-forward using separate wavelets obtained from the products of one-dimensional wavelets and scaling functions [32]. An example of the transform applied to a face image is shown in Figure 2, where the hierarchical display format of Mallat is used [32]. The block-like appearance is due to the low spatial resolution of the original image (128128). The set of coecients that correspond to the average channel at the coarsest resolution is in the small rectangle in the upper left hand corner of the Figure. The coecients corresponding to the vertical, diagonal, and horizontal detail channels are shown in clockwise order from the upper right hand corner of the Figure, and at dierent levels of resolution while traveling from the lower right to the upper left corner. Another advantage of the orthonormal transforms compared to the Gabor wavelets is the savings in storage (the output of the transform has the same size as the original image). The average channel and a single detail channel (formed by taking the maximum value of the three detail channels) at a coarse resolution are fed into the second stage of the Fuzzy-Face system for further processing. There are a number of limitations to orthogonal wavelets, including border artifacts and a dyadic (power of 2) restriction on image sizes. A method that can be used to avoid these problems involves the use of biorthogonal wavelets [1, 6]. A biorthogonal wavelet i and ~i to decompose the decomposition uses two families of basic building blocks ;k ;k image as E X XXD i i (8) f (x) = f; ~;k ;k ; 2
D
E
ifinite k
i . This type of wavelet can i is de ned as the inner product of f and ~;k where f; ~;k be de ned to exist within an arbitrarily shaped region [1, 6], and local areas within the image can be focused on for such operations as image enhancement [27]. Border artifacts are eliminated using dierent basis functions for the edges of the image [1]. Wavelet encoding of localized properties of faces such as eyes, nose, and mouth is possible using biorthogonal wavelets. Micheli{Tzanakou et al. use the orthogonal wavelet transform of facial images to build
8
Figure 2: Wavelet coecients of transformed face image. a feature vector as input to a backpropagation network [38]. The wavelet coecients with the highest values are selected, without regard to the particular channel that they are from. Their study concluded that the wavelet coecients were easier to learn, but the invariant moments of Hu [20] were more accurate for recall purposes [38]. Whereas the approaches described above extract features from the Gabor wavelet coecient space or use the largest wavelet coecients as a vector, our system uses the average and detail channels of the general biorthogonal wavelet transform separately for face processing. The motivation for this lies in the dierent information content of the channels: coarse facial structure in the average channel and ne facial features in the detail channels. As discussed previously, some sense of hemispheric-speci c processing can be captured with this approach. The next section describes the FSOFM, which is used to build an exemplar-based face space [59, 60] in the second stage of the Fuzzy-Face system.
5 Fuzzy Self-Organizing Feature Map The neural network model [24] used in the second stage of the system shown in Figure 1 is a modi cation of the self{organizing feature maps of Kohonen [28]. Kohonen's system consisted of a two{layer feedforward network that used an in uence neighborhood in the output layer. An output node was a candidate for update of its weights if the weights were found to be closest to the input vector using the Euclidean distance. Only nodes 9
within the local neighborhood around this output node have their weights updated. For a network with i input nodes and j output nodes, the update rule is [28]:
Wk = Wkold + (t) dwk ;
(9)
where dwk = (i ? Wk ), (t) is the learning rate, and Wkold is the previous value of Wk ; 0 k j ? 1. The learning rate has to decrease with time in order to guarantee convergence since the governing equation is of the Riccati type, which has no closed-form solution. In addition the neighborhood size is decreased with time. Kohonen used the network with great success for matching of faces given partial information [28], and Gross and Koch demonstrated its use for dimensionality reduction for data visualization [17]. The FSOFM, shown in Figure 3, is a member of the generalized class of clustering networks developed by Pal, Bezdek, and Tsao [45] (hereafter referred to as GLVQ). There are three layers in the FSOFM. The input layer feeds forward into a distance layer which determines the distance between the input vector and the current weights using a prede ned metric (usually Euclidean). The distance layer then feeds forward into a membership layer which calculates the membership values of the input vector to the set of all output vectors. This is done using the following form derived from the fuzzy c-means algorithm [4] and used in a previously developed computer vision system [25]: 1 # (10) uki = "jP ? d ki m ? ( dli ) 1
l=0
(
2
1)
where j is the number of output nodes, dki is the distance (Euclidean) between the input vector i and weight vector Wk , and m is a weighting exponent which is set within the range 2.0 to 1. These membership values uki are then fed back into the network and participate in the weight update rule as:
Wk = Wkold + uki dwk ;
(11)
where dwk = (i ? Wk ), and Wkold is the previous value of Wk . The addition of a feedback loop allows the network to respond to the localized patterns of activity in the distance layer without the need for a neighborhood. The set of weights Wk can be viewed as cluster centers that are representative of global partitioning of the input vectors into classes. The membership values uki range from 0.0 (no membership in the fuzzy set) to 1.0 (total membership in the fuzzy set). The sum of membership values for any given input vector to all of the j sets is normalized to 1.0. This means that an input vector that is 10
not close to any of the previously de ned sets (an outlier) will have an equal membership value to all sets approximately equal to one over the number of output nodes (1/j). Such a vector would be considered to be unfamiliar. F e e d b a c k P a t h
u 0i
u 1i
u (j-1) i
... d 0i W0
...
d 1i
W1
Membership Layer d (j-1) i W(j-1)
... ...
Distance Layer
Input Layer
ξi Figure 3: Fuzzy self{organizing feature map neural network. Input Vector
The complete algorithm is given as [24]:
P1. Randomly initialize weights Wk ; 0 k j ? 1 to values between 0 and 1. Set the limit value to be suciently small (e.g, 0.001), and set the total squared change in weights, Difftotal, to 0. P2. For each vector input i, i = 1; 2; :::; n where n is the number of inputs and for all weight vectors 0 k j ? 1: 1. Calculate dki and determine the feedback membership value uki for each input vector using Equation 10. 2. Update the weight change factor dwk such that dwk = (i ? Wk ). 3. Save the current weight Wk as Wkold before updating weight. 4. Update weight Wk using Equation 11. 5. Return the value Di, where Di = P(Wk ? Wkold) . Update the value of Difftotal using Difftotal = Difftotal + Diff . 2
P3. If Difftotal > limit then go to P4, else go to P5. P4. Reset Difftotal to 0 and go to P2. 11
P5. Write out weight le and determine the fuzzy membership value uki; 0 k j ? 1 for each input vector i by using equation 10. The FSOFM algorithm has certain advantages over GLVQ [45] one of which is that there is no need to determine a winning output node. This property of FSOFM leads to a uniform weight update rule for all nodes. Details and performance of sequential and parallel versions of the algorithm can be found in the original paper [24]. Experimental studies indicated that the network reproduced the segmentation results from an earlier standard implementation of the fuzzy c-means algorithm [25, 26] with close to two orders of magnitude increase in performance. The network was recently used for sensor fusion/motion analysis [22], and unsupervised identi cation of target signatures in multiband imagery [21, 23].
6 Experimental Studies In order to test the learning, recall, generalization and categorization characteristics of the Fuzzy-Face system, we ran a number of experimental studies. We used the all-male database of 16 faces from the original investigation of Turk and Pentland [57] (hereafter called the MIT database), and a mixed gender/race database of 42 faces obtained at the University of South Carolina (hereafter called the USC database). All images have an 8-bit grayscale resolution, and were resized to have a spatial resolution of 128128 pixels. We have made no attempt to normalize the intensity distributions, to lter out the background non-face information, or to correct for head tilt in any of the images. The convergence limit was set to 0.001 in all of the following studies. The rst study was designed to test the learning and recall capabilities of the Fuzzy{ Face system. The wavelet transform was performed on the face databases and the average and detail channel images from 3 levels down in the hierarchy (1616 pixels) were used to train the two FSOFM in the second stage of the system. This represents a 98.5% reduction in the amount of information that is being processed for each image. The network is con gured with 256 input nodes, 16 output nodes for the MIT database, and 42 output nodes for the USC database. The Fuzzy{Face system was able to learn the average channel coecients for the MIT database in 57 epochs, and for the USC database in 40 epochs. The detail channel coecients were learned in 906 epochs for the MIT database, and in 1526 epochs for the USC database. This is a considerable savings over the backpropagation networks, which typically take tens of thousands of epochs to converge. Recall accuracy in both channels on both of these databases was 100%. 12
The second study tested the Fuzzy-Face system for its recognition capabilities. The USC database was broken into two unique sets with 21 members in each set. Using the average channel of the rst set as the training set, the network with 21 output nodes converged in 460 epochs with a recall accuracy of 100%. Fixing the network weights from this training session for recall purposes, some sense of familiarity/non-familiarity can be derived from the membership values if the network is shown the second set of facial images. If images from the second dataset are judged by the network to be unfamiliar, then the membership values will be approximately equal to one over the number of output nodes, or 0.05 (1/21). Signi cant deviations from this value indicate that the input face has some similarity to the previously learned images. None of the membership values from the second dataset exceeded 0.13, with the majority of them having a value of 0.05 or 0.06. On the basis of these values, the network correctly responded that it had not seen any of the second portion of the USC database before. Although the network indicated that it was not familiar with any members of the second dataset, the output node that responded with the largest membership value when presented with an image gives some idea of similarity between the new image and those that were used to train the network. This pairing is shown in Figure 4, where the top row is the average wavelet channel of the 21 members of the second dataset and the bottom row is the set of weights that correspond to the output node with the largest membership value. The system correctly identi ed gender, with the exception of the fth image from the right. An interesting aspect of the match for the fth image from the right is the correct identi cation of race (Asian).
Figure 4: Familiarity study using the Fuzzy-Face system. Top row: Unfamiliar dataset; Bottom row: Closest match The third study was designed to test the generalization capabilities of the Fuzzy{Face system. The MIT database also has images of the same 16 faces taken under dierent lighting conditions (full-face, side-lit, 45o ). These additional images can be used to test 13
generalization of the system by locking the weights found during training in the rst study, and using the membership values produced by the third layer in the FSOFM as an indication of recognition. We made no attempt to spatially register the new faces to the original 16 faces that the system was trained on. The recognition rate for the two additional MIT databases (side-lit, 45o ) was 91%. If the faces are centered in the eld of view, the recognition rate climbs to 100%. The fourth study uses the USC database to test the gender categorization capabilities of the Fuzzy{Face system. The full USC database was used and the number of output nodes was set to two. The system converged in 1588 epochs and the resulting set of weights is shown in Figure 5. Gender categorization was done on the basis of hair length, face outline, and cheekbone structure. The gender identi cation accuracy of the network was 76% (1 misclassi ed male and 9 misclassi ed females). The misclassi ed male was the same one seen in the second study.
Figure 5: Gender identi cation in the Fuzzy-Face system; left image is generic male generated by output Node 1, right image is generic female generated by output Node 2. Finally, we report the results of a preliminary study into the hemispheric specialization seen in the study of Hillger and Koenig [18]. Due to localized coherence properties of the coecients in the average channel, the FSOFM trained with this information will be less sensitive to changes in individual features and would exhibit more global processing capabilities (right hemisphere). Although there is high spatial frequency information in the detail channels of the wavelet coecients, there is also spatial localization of this information. The FSOFM trained with the detail channel information will be sensitive to changes in single features and would exhibit more local processing capabilities (left 14
hemisphere). If this hypothesis is true, a comparison of membership values for changes in features such as those seen in the Hillger and Koenig experimental studies [18] should elicit the same sorts of trends. They determined that the response times for a single feature change were most sensitive to a change in eyes, followed by mouth, and nally nose. We ran an experiment where we duplicated changes in a single feature (Hillger and Koenig, Experiment 2), and changes in any two of three features (Hillger and Koenig, Experiment 4). Although it is dicult to duplicate the trial process in the original experiments, the membership values should be inversely correlated with the degree of change, with more discrimination capability in a given channel corresponding to a low membership value (Dierent response). The results of this experiment are shown in Table 1, where the values reported for the channel response are the inverse of the membership values. A value of 1.0 for the channel response indicates a match (Same response). Change Average channel Detail channel Eyes 2.56 10.00 Mouth 1.49 1.49 Nose 1.32 2.44 Eyes/Mouth 3.03 10.00 Eyes/Nose 2.63 11.11 Nose/Mouth 1.82 2.56 TABLE 1. Experimental results from Same/Dierent feature trials. The trend for the single feature change was replicated in the average channel, where the system is most sensitive to changes in the eyes, followed by the mouth, and then the nose. These results agree with those found in Experiment 2 of Hillger and Koenig [18]. The detail channel response agreed with the sensitivity to eye change, but reversed the nose/mouth relationship. The detail channel had a greater response to the changes, indicating a local variation discrimination. A wider range of experimental studies would be necessary to determine if this trend would continue. The study which changed two of three features indicated that both channels in the system were most sensitive to changes in the eyes/mouth (same nose), followed by the eyes/nose (same mouth), and then the nose/mouth (same eyes). Once again, the detail channel had a much greater response to the changes, indicating a large degree of local variation. These results agree with those found in Experiment 4 of Hillger and Koenig [18]. 15
7 Summary We have presented a face processing system called Fuzzy-Face, shown in Figure 1, that is a hybrid blend of wavelet processing and a unique connectionist architecture called the FSOFM [24]. The system design was motivated by the results found in psychophysiological experiments in face processing done by Hillger and Koenig [18]. The wavelet transform was used to derive average and detail channel images at a reduced resolution over the input face images (savings of 98.5% on processing and storage). A FSOFM was trained for each of these channels and were then run in parallel for face processing. Once again, we make note of the fact that no attempt was made to normalize the intensity distributions, to lter out the background non-face information, or to correct for head tilt in any of the images. We ran a number of experiments to test the learning, recall, generalization and categorization characteristics of the Fuzzy-Face system. The two databases used in these studies were the all-male MIT database of Turk and Pentland [57], and a mixed gender/race database from the University of South Carolina. Our rst experiment demonstrated that the system was able to learn both databases extremely fast (57 and 40 training epochs respectively) with 100% recall capability. The second experiment demonstrated that the recognition capabilities for the system on the USC database were 100%, with a built-in capability of providing a degree of closeness to the previously learned faces through the membership values. The third experiment tested the generalization capabilities of the system, with a recognition study of the members of the MIT database under dierent lighting conditions. The system accuracy for matching was 91% if no attempt was made to center the faces and 100% if the faces were centered. This result demonstrates the robustness of the Fuzzy-Face system, since the unfamiliar images used in the test had a wide variety of background and head orientation changes compared to the original training data. The fourth experiment examined the exemplar building capabilities of the system for gender identi cation. Generic male and female classes, shown in Figure 5, were found using totally unsupervised learning in the average channel FSOFM. The system was then able to categorize the USC database with respect to gender with an accuracy of 76%. Our nal study examined the relationship between the average and detail channel FSOFMs for overall response to changes in features, as was done in an experimental study by Hillger and Koenig [18]. The average channel FSOFM can be broadly categorized as sensitive to global changes, whereas the detail channel FSOFM is sensitive to local changes. Using the inverse of the membership values as an indication of relative response, 16
we were able to demonstrate an increased sensitivity in the detail channel over the average channel for these types of tasks. We are currently investigating the Fuzzy-Face system sensitivity to changes in head orientation. Although the face can be rotated into an upright position as was done in the Turk and Pentland study [57], it would be better if the system could generalize without this step. A recent study by Bichsel and Pentland has shown that the transformation in face space for rotated views would be highly non-linear [5]. This can be addressed through the use of the class of shiftable wavelet transforms introduced by Simoncelli et al. [55]. In addition, we are testing scaling properties of the system as to the number of learning epochs and recognition capability for much larger facial databases. This research will be combined with further investigations, along the lines of our fourth experiment, into the exemplar-building capabilities of Fuzzy-Face.
8 Acknowledgements The authors would like to thank the reviewers, whose comments have greatly enhanced the clarity of this manuscript. In addition, we would like to thank Gary Zeigler of the School of Business Administration at the University of South Carolina for supplying the second database used in our experimental studies.
References [1] L. Andersson, N. Hall, B. Jawerth, and G. Peters. Wavelets on closed subsets of the real line. In L. L. Schumacher and G. Webb, editors, Topics in the Theory and Applications of Wavelets. Academic Press, New York, NY, 1994. [2] B. W. Andrews and D. A. Pollen. Relationship between spatial frequency selectivity and receptive eld pro le of simple cells. J. Physiology, 287:163{176, 1979. [3] A. L. Benton and M. W. Van Allen. Impairment in facial recognition in patients with cerebral disease. Cortex, 4:344{358, 1968. [4] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.
17
[5] M. Bichsel and A. P. Pentland. Human face recognition and the face image set's topology. Computer Vision, Graphics, Image Processing: Image Understanding, 59(2):254{261, March 1994. [6] A. Cohen, I. Daubechies, B. Jawerth, and P. Vial. Multiresolution analysis, wavelets and fast algorithms on an interval. Comptes Rendes, I(316):417{421, 1993. [7] G. W. Cottrell and J. Metcalfe. EMPATH: Face, gender and emotion recognition using holons. In D. S. Touretzky and R. Lippman, editors, Advances in Neural Information Processing Systems, volume 3, pages 564{571. Kaufman, San Mateo, CA, 1991. [8] I. Daubechies. The wvaelet transform, time{frequency localization and signal analysis. IEEE trans. Information Theory, 36:961{1004, 1990. [9] J. G. Daugman. Two{dimensional spectral analysis of cortical receptive eld pro les. Vision Research, 20:857{856, 1980. [10] J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two{dimensional visual cortical lters. J. Optical Society of America A, 2:1160{1169, July 1985. [11] J. G. Daugman. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans. on Acoustics, Speech, and Signal Processing, 36(7):1169{1179, 1988. [12] J. G. Daugman. High con dence visual recognition of persons by a test of statistical independence. IEEE Trans. PAMI, 15(11):1148{1161, November 1993. [13] D. DeMers and G. W. Cottrell. Non-linear dimensionality reduction. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 580{587. Kaufman, San Mateo, CA, 1993. [14] A. Dobbins, S. S. Zucker, and M. S. Cynader. Endstrapped neurons in the visual cortex as a substrate for calculating curvature. Nature, 329(1):438{441, 1987. [15] M. Fleming and G. Cottrell. Categorization of faces using unsupervised feature extraction. In Proc. Intern. Joint Conf. on Neural Networks, Vol II, pages 65{70, San Diego, CA, 1990. 18
[16] G. Geen, J. L. Bradshaw, and G. Wallace. Interhemispheric eects on reaction time to verbal and nonverbal visual stimuli. J. Experimental Psychology, 87:415{ 422, 1971. [17] M. H. Gross and R. Koch. Visualization of multidimensional shape and texture features in laser range data using complex valued Gabor wavelets. IEEE Transactions on Visualization and Computer Graphics, 1(1):44{59, 1995. [18] L. A. Hillger and O. Koenig. Separable mechanisms in face processing: Evidence from hemispheric specialization. J. Cognitive Neuroscience, 3:42{58, 1991. [19] R. D. Hilliard. Hemispheric laterality eects on a facial recognition task in normal subjects. Cortex, 9:246{258, 1973. [20] M. K. Hu. Visual pattern recognition by moment invariants. IEEE Trans. Information Theory, 8:179{187, 1962. [21] T. L. Huntsberger. Application of parallel self-organizing neural networks to automatic target cueing of multiple band imagery. Final Report, ARO Contract DAAL03{91{C{0034, January 1992. [22] T. L. Huntsberger. Data fusion: A neural networks implementation. In M. A. Abidi and R. C. Gonzalez, editors, Data Fusion in Robotics and Machine Intelligence. Academic Press, Orlando, FL, 1992. [23] T. L. Huntsberger. Biologically motivated cross-modality sensory fusion system for automatic target recognition. Neural Networks, 8(7/8):1215{1226, 1995. [24] T. L. Huntsberger and P. Ajjimarangsee. Parallel self-organizing feature maps for unsupervised pattern recognition. International Journal of General Systems, 16:357{ 372, 1990. also reprinted in Fuzzy Models for Pattern Recognition, (Eds. J. C. Bezdek, S. K. Pal), IEEE Press, Piscataway, NJ, 1992, 483{495. [25] T. L. Huntsberger, C. L. Jacobs, and R. L. Cannon. Iterative fuzzy image segmentation. Pattern Recognition, 18:131{138, 1985. [26] T. L. Huntsberger, C. Rangarajan, and S. N. Jayaramamurthy. Representation of uncertainty in computer vision using fuzzy s ets. IEEE Transactions on Computers, 35:145{156, 1986. also reprinted in Fuzzy Models for Pattern Recognition, (Eds. J. C. Bezdek, S. K. Pal), IEEE Press, Piscataway, NJ, 1992, 397{408. 19
[27] B. Jawerth, M. Hilton, and T. L. Huntsberger. Local enhancement of compressed images. Journ. Math. Imaging and Vision, 3:39{49, 1993. [28] T. Kohonen. Self-Organization and Associative Memory. Springer{Verlag, Berlin, third edition, 1989. [29] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. Von der Malsburg, R. P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Computers, 42(3):300{311, 1993. [30] T. S. Lee. Image representation using 2D Gabor wavelets. IEEE Trans. PAMI, 18(10):959{971, 1996. [31] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. J. Optical Society of America A, 7:923{932, May 1990. [32] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. PAMI, 11(7):674{693, July 1989. [33] B. S. Manjunath, R. Chellappa, and C. Von der Malsburg. A feature based approach to face recognition. In Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pages 373{378, Champaign, IL, June 1992. [34] S. Marcelja. Mathematical description of the responses of simple cortical cells. J. Optical Society of America, 70:1297{1300, 1980. [35] C. Marzi and G. Berlucchi. Right visual eld superiority for accuracy of recognition of famous faces in normals. Neuropsychologia, 15:751{756, 1977. [36] Y. Meyer. Wavelets and Operators. Cambridge Univ. Press, New York, 1992. [37] E. Micheli-Tzanakou and G. M Binge. F-CORE: A Fourier based image compression and reconstruction technique. In SPIE Proc. on Visual Communication and Image Processing IV, volume 1199, pages 1563{1574, 1989. [38] E. Micheli-Tzanakou, E. Uyeda, A. Sharma, R. Ramanujan, and J. Dong. Comparison of neural network algorithms for face recognition. Simulation, 64(1):15{27, July 1995. [39] R. B. Millward and A. J. O'Toole. Recognition memory transfer between spatially transformed faces. In H. D. Ellis, Jeeves M. A., F. Young, and A. Newcombe, editors, Aspects of Face Processing. Martinus Nijho, Dordrecht, 1986. 20
[40] B. Milner. Visual recognition and recall after right temporal lobe excision in man. Neuropsychologia, 6:191{209, 1968. [41] A. J. O'Toole, H. Abdi, K. A. Deenbacher, and D. Valentin. Low{dimensional representation of faces in higher dimensions of the face space. J. Optical Society of America, 10(3):405{411, March 1993. [42] A. J. O'Toole, H. Abdi, K. A. Deenbacher, and D. Valentin. A perceptual learning theory of the information in faces. In T. Valentine, editor, Cognitive and Computational Aspects of Face Recognition. Routledge, London, 1995. [43] A. J. O'Toole, K. Deenbacher, H. Abdi, and J. C. Bartlett. Simulating the 'otherrace eect' as a problem in perceptual learning. Connection Science, 3:163{178, 1991. [44] A. J. O'Toole, R. B. Millward, and J. A. Anderson. A physical system approach to recognition memory for spatially transformed faces. Neural Networks, 1:179{199, 1988. [45] N. R. Pal, J. C. Bezdek, and E. C.-K. Tsao. Generalized clustering networks and Kohonen's self{organizing scheme. IEEE Trans. Neural Networks, 4(4):549{557, July 1993. [46] K. Patterson and J. L. Bradshaw. Dierent hemispheric mediation of nonverbal visual stimuli. J. Experimental Psychology: Human Perception and Performance, 1:246{252, 1975. [47] E. De Renzi and H. Spinnler. Facial recognition in brain damaged patients: An experimental approach. Cortex, 16:634{642, 1966. [48] G. Rizzolatti, C. Ulmita, and G. Berlucchi. Opposite superiorities of the left and right cerebral hemispheres in discriminative reaction time to physiognomical and alphabetical material. Brain, 94:431{442, 1971. [49] A. Samal and P. A. Iyengar. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recognition, 25(1):65{77, 1992. [50] J. Sergent. About face: Left hemisphere involvment in processing physiognomies. J. Experimental Psychology: Human Perception and Performance, 8:1{14, 1982. 21
[51] J. Sergent. Con gural processing of faces in the left and the right cerebral hemispheres. J. of Exper. Psychology: Human Perception and Performance, 10:554{572, 1984. [52] J. Sergent. Methodological constraints on neuropsychological studies of face perception in normals. In R. Bruyer, editor, The neuropsychology of face perception and facial expression, pages 91{124. Erlbaum, Hillsdale, NJ, 1986. [53] J. Sergent. Information processing and laterality eects for object and face perception. In G. W. Humphreys and M. J. Riddoch, editors, Visual object processing: A cognitive neuropsychological approach, pages 145{173. Erlbaum, Hillsdale, NJ, 1987. [54] J. Sergent. Structural processing of faces. In A. W. Young and H. D. Ellis, editors, Handbook of Research on Face Processing, pages 57{92. Elsevier Science Publishers, New York, NY, 1989. [55] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger. Shiftable multiscale transforms. IEEE Trans on Information Theory, 8(2):587{607, March 1992. [56] Summus, Ltd. Irmo, SC, 1994. [57] M. Turk and A. Pentland. Eigenfaces for recognition. J. Cognitive Neuroscience, 3(1):71{86, 1991. [58] D. Valentin, H. Abdi, A. J. O'Toole, and G. W. Cottrell. Connectionist models of face processing: A survey. Pattern Recognition, 27(9):1209{1230, 1994. [59] T. Valentine, P. Chiroro, and R. Dixon. An account of the own{race bias and the contact hypothesis based on a 'face space' model of face recognition. In T. Valentine, editor, Cognitive and Computational Aspects of Face Recognition. Routledge, London, 1995. [60] T. Valentine and M. Endo. Towards an exemplar model of face processing: The eects of race and distinctiveness. Quarterly J. of Exper. Psychology, 44A:671{703, 1992. [61] E. Warrington and M. James. An experimental investigation of facial recognition in patients with unilateral cerebral lesions. Cortex, 3:317{326, 1967. [62] A. W. Young, D. C. Hay, and K. H. McWeeny. Right cerebral hemisphere superiority for constructing facial representations. Neuropsychologia, 23:195{20, 1985. 22
[63] R. A. Young. The Gaussian derivative model for spatial vision: I. Retinal mechanism. Spatial Vision, 4:273{293, 1987.
23