Caricature Recognition in a Neural Network

This article was downloaded by: [University of Victoria] On: 05 March 2015, At: 19:39 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Visual Cognition Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ pvis20

Caricature Recognition in a Neural Network James W. Tanaka Published online: 01 Oct 2010.

To cite this article: James W. Tanaka (1996) Caricature Recognition in a Neural Network, Visual Cognition, 3:4, 305-324, DOI: 10.1080/135062896395616 To link to this article: http://dx.doi.org/10.1080/135062896395616


VISUAL COGNITION, 1996, 3 (4), 305–324

Caricature Recognition in a Neural Network

James W. Tanaka and Valerie B. Simon

Oberlin College, Oberlin, Ohio, USA

In a caricature drawing, the artist graphically exaggerates those features of an individual in proportion to their deviations from the normative or average face. The paradox of caricature recognition is that subjects recognize these distorted drawings of faces better than, or as well as, veridical drawings. The present research explores the caricature paradox by modelling caricature recognition in a neural network. Simulations were used to investigate the perceptual versus memorial bases of caricature recognition (Simulation 1) and to examine the effects that familiarity (Simulation 2) and typicality (Simulation 3) have on caricature recognition. The simulation results are interpreted as support for a distinctive access hypothesis of the caricature recognition paradox, which emphasizes the interaction between the stimulus properties of the caricature drawing and the underlying veridical face representation.

Requests for reprints should be sent to James Tanaka, Department of Psychology, Severance Lab, Oberlin College, Oberlin, OH 44074, USA. Email: tanaka@occs.cs.oberlin.edu

We are grateful to Glyn Humphreys, Gill Rhodes, Mike Burton, and an anonymous reviewer for their valuable comments. We would also like to thank Martha Farah and Jay McClelland for their help and support. This research was supported by NIH Grant R15 HD30433 and a Keck Foundation Faculty Research Award. An earlier version of this research was reported in the Proceedings of the 12th Annual Conference of the Cognitive Science Society.

© 1996 Psychology Press, an imprint of Erlbaum (UK) Taylor & Francis Ltd

In a caricature drawing, the artist graphically exaggerates those features of an individual in proportion to their deviations from the normative or average face. For example, in political cartoons, Bill Clinton is often portrayed as having especially large cheeks and Richard Nixon as having an exaggerated, upturned nose. As Gibson (1973) noted, “the caricature may be a poor projection of a face, but (preserves) good information about it. The form of the face is distorted, but not the essential features of the face” (p. 3). The “good” information provided by a caricature has been demonstrated in studies where people recognize caricature drawings of familiar persons better than, or as well as, veridical drawings (Benson & Perrett, 1991; Mauro & Kubovy, 1992; Rhodes, Brennan, & Carey, 1987; Rhodes & Tremewan, 1994).

The caricature effect¹ presents an interesting paradox: why are distorted drawings recognized better than, or as well as, veridical drawings? As an account of the caricature effect, norm-based coding theory claims that the average or normative face plays a vital role in face recognition processes. As objects of recognition, all faces share the same part features (i.e. two eyes, a nose, and a mouth) arranged in a similar configuration (i.e. the eyes are above the nose, which is above the mouth). To cope with this homogeneous structure, human face recognition processes must be sensitive to metric properties, such as the slope of the nose or the distance between the eyes, that distinguish one face from another. According to norm-based coding theory, such metric information is computed relative to a face’s second-order relational properties (Diamond & Carey, 1986). Second-order relational properties are derived by encoding metric differences between a given face and the average face in the population. Because the second-order relational properties of a face are exaggerated in a caricature drawing, a caricature advantage could be interpreted as evidence for the norm-based model of face recognition.

As shown in Fig. 1, in a norm-based coding model faces can be represented as points in a Euclidean, multidimensional “face space”. The dimensions of the face space, although not explicitly specified by the model, represent the structural and configural properties of a face (e.g. length of nose, inter-eye distance). The average face represents the mean value of these dimensions and lies at the origin of the face space. Other faces in the population are located around the normative face according to their dimensional values.
Given that faces are normally distributed in face space about the origin, face typicality is shown as the relative distance from the normative face representation: typical faces lie close to the origin and atypical faces lie farther away (see Valentine, 1991, for a full discussion of norm-based theory). According to norm-based coding theory, individual faces are encoded with respect to their deviations from the normative face. This relationship is shown in Fig. 1, where the Face 1 Vector connects the normative face and the veridical representation of Face 1 (Face 1 veridical). In a caricature drawing, the caricature artist increases the distinctiveness of a face by exaggerating those features that deviate most from the normative face. This is shown in Fig. 1, where the Face 1 caricature lies along the Face 1 Vector but at a greater distance from the normative face than the veridical face does. Although a caricature is not an objective representation of a person, it preserves, relative to the normative face, those critical aspects of a face that distinguish it from other faces in the population.

¹ We use the term “caricature effect” to refer to instances where the caricature representation is recognized as well as or better than the veridical representation; the term “caricature advantage” refers to the more specific case where the caricature representation is recognized better than the veridical representation.


FIG. 1. Diagram of theoretical face space showing the locations of the normative face, the veridical face, and the caricature and anti-caricature faces. The Face 1 Vector indicates the line of caricaturization connecting the Face 1 veridical representation and the normative face representation.
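The geometry of Fig. 1 can be made concrete with a toy two-dimensional face space. In this sketch the normative face sits at the origin, while the coordinates of Face 1 and the scaling factor `k` (a hypothetical parameter) are invented for illustration; the model itself leaves the number and meaning of the dimensions unspecified. Points on the Face 1 Vector with `k > 1` are caricatures, and points with `0 < k < 1` are anti-caricatures.

```python
import math

# Hypothetical 2-D face space; the dimensions are placeholders, not from the model.
NORM = (0.0, 0.0)    # normative face at the origin
FACE1 = (3.0, 4.0)   # hypothetical veridical Face 1


def along_face_vector(face, norm, k):
    """Point on the line from the norm through the face.

    k = 1 gives the veridical face, k > 1 a caricature, 0 < k < 1 an anti-caricature.
    """
    return tuple(n + k * (f - n) for f, n in zip(face, norm))


def distinctiveness(face, norm):
    """Euclidean distance from the normative face; larger means more atypical."""
    return math.dist(face, norm)


caricature = along_face_vector(FACE1, NORM, 1.5)
anti_caricature = along_face_vector(FACE1, NORM, 0.5)

print(distinctiveness(FACE1, NORM))            # 5.0
print(distinctiveness(caricature, NORM))       # 7.5
print(distinctiveness(anti_caricature, NORM))  # 2.5
# Both distorted versions lie 2.5 units from the veridical face (equal distortion).
print(math.dist(caricature, FACE1), math.dist(anti_caricature, FACE1))  # 2.5 2.5
```

The caricature and anti-caricature are equally distorted relative to Face 1, but only the caricature moves away from the norm, increasing distinctiveness.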

The virtue of norm-based coding theory is that it provides a computational, and therefore readily testable, account of face recognition processes. Previous simulations have demonstrated that neural networks are well suited for exploring the computational “microstructure” of face recognition. For example, it has been shown that neural networks perform similarly to humans in that they learn atypical faces faster than typical faces (Burton, Bruce, & Johnson, 1990; Valentine & Ferrara, 1991) and are more accurate in categorizing a typical face than an atypical face as a member of the general category “face” (Valentine & Ferrara, 1991). The model described in this paper is similar to these networks in that it accepts as its input an abstract “feature” vector composed of individual feature units. Feature vectors are intended to represent the structural aspects of the face, such as the face outline, internal features, and spatial configuration, that serve as input to higher recognition units (Bruce & Young, 1986). In these models, faces that are structurally similar to each other are represented by feature vectors with similar values.

Consistent with a norm-based coding approach, previous neural network models have demonstrated the capability to abstract a prototypical representation from a set of specific exemplars. For example, McClelland and Rumelhart (1985, 1986a) trained a neural network model on a set of inputs representing different dogs. Although the network responded to individual dog exemplars, it responded most strongly to the average or prototypical dog. If neural networks abstract a prototypical or normative representation, can this representation be used to produce a caricature as predicted by the norm-based coding theory of face recognition? In the present paper, we explore the computational aspects of norm-based coding theory through a series of neural network simulations. In the first simulation, we test two competing hypotheses of the caricature effect: the caricatured trace and distinctive access hypotheses. In the second simulation, we examine the effects of familiarity on the recognition of caricatures and anti-caricatures. Finally, in the third simulation, we contrast the effects of caricaturization on the recognition of typical and atypical faces.

SIMULATIONS 1a and 1b: A Test of the Caricatured Trace and Distinctive Encoding Hypotheses

Although the caricature effect has been interpreted as evidence for norm-based coding theory, there has been debate as to whether it reflects an underlying distortion in face memory or the beneficial encoding properties of the caricature stimulus.

The caricatured trace hypothesis suggests that face recognition involves some type of caricature process (Rhodes et al., 1987). Like the caricature artist, face recognition processes exaggerate the distinctive features of an individual relative to a normative face before storing the representation in long-term memory. According to the caricatured trace hypothesis, faces are represented in memory not as veridical representations, but as caricatures. Hence, caricature drawings serve as better retrieval cues than veridical drawings, because they are more similar to the underlying stored representation. Rhodes et al. (1987) provided some support for the caricatured trace position. They found that in a naming task, subjects identified caricature drawings of familiar individuals faster than veridical drawings. Moreover, subjects judged caricatures to be better likenesses of individuals than the veridical drawings (Benson & Perrett, 1991; Rhodes et al., 1987). These results are consistent with the view that caricature drawings provide an equal or better fit to the stored face representation than veridical drawings.

Although some studies have found that subjects were faster and more accurate at identifying caricature drawings than veridical drawings, other studies have shown that subjects were only as fast (Rhodes & McLean, 1990) or as accurate (Rhodes & Tremewan, 1994) at identifying the caricature drawings as the veridical drawings.
Rather than a caricature advantage, these studies reported a caricature equivalence effect in that subjects recognized caricature and veridical drawings better than equally distorted anti-caricature drawings. Although likeness measures revealed that the best resemblance of an individual was interpolated to be a slightly caricaturized version, no differences in likeness ratings were found between the actual veridical and caricature drawings used in these studies (Benson & Perrett, 1991; Rhodes et al., 1987).


The distinctive access hypothesis also claims that individual faces are encoded with respect to deviations from the normative face (Rhodes et al., 1987). However, in contrast to the caricatured trace hypothesis, the distinctive access position maintains that the deviations are stored not as exaggerated representations, but as veridical ones. According to the distinctive access approach, a face is stored in memory according to its distinctive second-order relational properties. By exaggerating second-order relational properties in the caricature drawing, the artist directs the viewer’s attention to the distinctive properties of the face, thereby facilitating quicker access to the stored face representation. Thus, the caricature advantage can be explained as an interaction between the stimulus properties of the caricature drawing and the underlying face representation. In the following simulations, the competing caricatured trace and distinctive access hypotheses were tested computationally: (1) A neural network was trained to recognize three (Simulation 1a) or eight (Simulation 1b) feature vectors. (2) An average feature vector was presented to the network to establish whether it had abstracted a prototypical representation from the input. (3) The predictions of the two hypotheses were tested by presenting veridical and caricaturized versions of the learned feature vectors to the model. According to the caricatured trace hypothesis, no caricature advantage should occur, because the exaggerated caricature representations were not explicitly encoded in memory. In contrast, the distinctive access hypothesis predicts a caricature advantage based on the network’s ability to abstract distinctive feature information relative to the normative face.

Method

Description of Neural Network Model

The neural network was created with the “back-propagation” template in the McClelland and Rumelhart (1986b) software package. The model was designed as a three-layer network consisting of an eight-unit input (feature) layer, a three-unit hidden layer, and a three-unit output layer. The network was fully connected, such that each input unit was linked to all of the hidden units and each hidden unit was linked to all of the output units. The eight input units formed a feature vector whose activation values were either +2 or +1, and these input units were associated with output units whose activation values were either 0 or +1. Learning was accomplished by computing the difference between the obtained and desired output and adjusting the connection weights between units using the standard back-propagation algorithm (McClelland & Rumelhart, 1986a) with a learning rate of 0.03. Connection weights between units could be either excitatory or inhibitory, and weights were updated at the end of each epoch of training. During learning, the activation values of the hidden and output units were computed with a sigmoid activation function (McClelland & Rumelhart, 1986a).
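The architecture and training rule described above can be sketched in plain Python. This is a minimal illustration, not the PDP software the authors used: the bias terms, the uniform [-0.5, 0.5] weight initialization, the class name `BackpropNet`, and the sum-squared-error objective are our assumptions. Because weights are updated only at the end of each epoch, the order in which patterns are presented within an epoch does not affect the update.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BackpropNet:
    """Hypothetical 8-3-3 fully connected network with sigmoid hidden and output units.

    Assumptions not stated in the text: bias terms, uniform [-0.5, 0.5]
    initial weights, and gradient descent on sum-squared error.
    """

    def __init__(self, n_in=8, n_hid=3, n_out=3, lr=0.03, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        self.b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
        self.b_o = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w_ih, self.b_h)]
        o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w_ho, self.b_o)]
        return h, o

    def train_epoch(self, patterns):
        """Accumulate weight changes over all patterns, then apply them once."""
        g_ih = [[0.0] * len(row) for row in self.w_ih]
        g_bh = [0.0] * len(self.b_h)
        g_ho = [[0.0] * len(row) for row in self.w_ho]
        g_bo = [0.0] * len(self.b_o)
        sse = 0.0
        for x, t in patterns:
            h, o = self.forward(x)
            # Output deltas: (target - output) times the sigmoid derivative.
            d_o = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
            # Hidden deltas: error back-propagated through the (not yet updated) weights.
            d_h = [hj * (1 - hj) * sum(d_o[k] * self.w_ho[k][j] for k in range(len(d_o)))
                   for j, hj in enumerate(h)]
            for k, dk in enumerate(d_o):
                g_bo[k] += dk
                for j, hj in enumerate(h):
                    g_ho[k][j] += dk * hj
            for j, dj in enumerate(d_h):
                g_bh[j] += dj
                for i, xi in enumerate(x):
                    g_ih[j][i] += dj * xi
            sse += sum((tk - ok) ** 2 for tk, ok in zip(t, o))
        # End-of-epoch update; the deltas already point downhill on the error.
        for k in range(len(self.w_ho)):
            self.b_o[k] += self.lr * g_bo[k]
            for j in range(len(self.w_ho[k])):
                self.w_ho[k][j] += self.lr * g_ho[k][j]
        for j in range(len(self.w_ih)):
            self.b_h[j] += self.lr * g_bh[j]
            for i in range(len(self.w_ih[j])):
                self.w_ih[j][i] += self.lr * g_ih[j][i]
        return sse  # error measured before this epoch's update


# Demonstration on the three veridical vectors only (the paper trained on perturbed copies).
patterns = [([2, 2, 2, 2, 1, 1, 1, 1], [1, 0, 0]),
            ([2, 2, 1, 1, 2, 2, 1, 1], [0, 1, 0]),
            ([2, 1, 2, 1, 2, 1, 2, 1], [0, 0, 1])]
net = BackpropNet(lr=0.03, seed=0)
errors = [net.train_epoch(patterns) for _ in range(50)]
_, out = net.forward([2, 2, 2, 2, 1, 1, 1, 1])
```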


Simulation 1a: Learning Three Feature Vectors

Training in Simulation 1a. In Simulation 1a, the network was trained to recognize three veridical feature vectors, {2, 2, 2, 2, 1, 1, 1, 1}, {2, 2, 1, 1, 2, 2, 1, 1}, and {2, 1, 2, 1, 2, 1, 2, 1}, associated with the output face vectors {1, 0, 0}, {0, 1, 0}, and {0, 0, 1}, respectively. Instead of presenting the veridical feature vectors for learning, eight permutations of each veridical vector were created by flipping the value of one of its eight input units (from +2 to +1, or from +1 to +2). For example, a permutation of the first feature vector would be the vector {1, 2, 2, 2, 1, 1, 1, 1}. The permuted feature vectors represented the slight perturbations of an individual’s face that might arise from changes in facial expression or viewing conditions. A total of 24 feature vectors (eight permutations of each of the three veridical feature vectors) were presented to the network in random order for learning. Training continued for 50 learning epochs. A total of ten simulations was carried out, and before the start of each simulation the connection weights were initialized to small random values.

Results and Discussion of Simulation 1a. An initial test was performed to determine whether the model had abstracted the prototypical patterns from the set of permuted feature vectors. As a test of prototype abstraction, the network’s output activations to the 24 learned exemplar vectors were compared to its output activations to the three prototypical vectors. The three prototypical vectors produced higher output activations than 22 of the 24 learned exemplar vectors. Thus, consistent with similar distributed memory models (McClelland & Rumelhart, 1986a), the network demonstrated prototype abstraction by computing the central tendency of the given inputs. An assumption common to the caricatured trace and distinctive encoding hypotheses is that all faces are encoded with respect to their deviations from a normative representation.
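A minimal sketch of how the 24 training patterns of Simulation 1a could be generated. Each “permutation” is a copy of a veridical vector with exactly one feature flipped between +2 and +1, paired with a one-hot target for its face. The helper names are hypothetical, and shuffling the presentation order is left to the training loop.

```python
VERIDICAL = [
    [2, 2, 2, 2, 1, 1, 1, 1],  # face 1 -> target {1, 0, 0}
    [2, 2, 1, 1, 2, 2, 1, 1],  # face 2 -> target {0, 1, 0}
    [2, 1, 2, 1, 2, 1, 2, 1],  # face 3 -> target {0, 0, 1}
]


def flip(value):
    """Swap the two legal feature values, +2 <-> +1."""
    return 1 if value == 2 else 2


def permutations_of(vector):
    """All single-feature perturbations: flip exactly one of the feature units."""
    out = []
    for i in range(len(vector)):
        p = list(vector)
        p[i] = flip(p[i])
        out.append(p)
    return out


def training_set(veridicals):
    """Pair each perturbed vector with a one-hot target for its source face."""
    n = len(veridicals)
    pairs = []
    for face, v in enumerate(veridicals):
        target = [1 if k == face else 0 for k in range(n)]
        pairs.extend((p, target) for p in permutations_of(v))
    return pairs


patterns = training_set(VERIDICAL)
print(len(patterns))   # 24
print(patterns[0][0])  # [1, 2, 2, 2, 1, 1, 1, 1]
```

Note that the veridical vectors themselves never appear in the training set, which is what makes the later prototype test meaningful.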
To test for the presence of a normative face representation, a normative vector was generated by averaging the values of each of the eight feature units across the three veridical feature vectors, yielding the normative feature vector {2, 1.67, 1.67, 1.33, 1.67, 1.33, 1.33, 1}. When presented to the network, the normative feature vector produced roughly equivalent levels of activation in the three output face units. Specifically, the first output unit captured an average of 31%, the second 28%, and the third 39% of the available activation. Thus, all three face units were partially activated by the normative feature vector, and none of them collected the majority of the available activation.

The critical test between the caricatured trace hypothesis and the distinctive access hypothesis was the model’s response to the caricature feature vectors. The caricatured trace hypothesis predicts no advantage for the caricature feature vectors because the distorted vectors were not encoded in memory. In contrast, the distinctive access hypothesis predicts a possible caricature advantage, contingent upon the model’s ability to identify the distinctive features of the learned veridical vectors. As a test of these competing hypotheses, caricature versions of the veridical feature vectors were produced by the following equation:

$$\text{caricature}_j = I_j + b\,(I_j - \text{norm}_j)$$

where $\text{caricature}_j$ is the new caricature value, $I_j$ is the activation value of the original feature unit $j$, $b$ is a caricature constant that controls the amount of exaggeration, and $\text{norm}_j$ is the value of the average (normative) feature $j$. The caricature equation is similar to the equation used in Brennan’s Caricature Generator program (Brennan, 1985). It emphasizes veridical features in proportion to their deviations from the normative feature: features that deviate strongly from the average face are exaggerated more than features with small deviations. Applying a $b$ value of 1.00 in the caricature formula produced the caricature feature vectors {2, 2.33, 2.33, 2.67, 0.33, 0.67, 0.67, 1}, {2, 2.33, 0.33, 0.67, 2.33, 2.67, 0.67, 1}, and {2, 0.33, 2.33, 0.67, 2.33, 0.67, 2.67, 1}.

When presented to the network, the veridical vectors produced an average output activation of 0.753 (out of a possible 1.000), whereas the caricature vectors produced an average output activation of 0.789. The difference between veridical and caricature activation was reliable, t(9) = 16.30, p < 0.001. Thus, consistent with the predictions of the distinctive access hypothesis, the model produced a caricature advantage without storing the caricature distortions in memory.

What was the source of the caricature advantage? Inspection of the connection weights between the feature units and the face units revealed that the model assigned larger weights to those connections that were most distinctive for a given feature vector. Feature units with high discrimination value for a particular feature vector had larger weights than less discriminating feature units. The distinctive feature information computed in the connection weights of the model, combined with similar distinctive information computed by the caricature equation, yielded the overall caricature advantage.
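The normative vector and the caricature equation can be checked numerically. The sketch below (the helper names `normative` and `caricature` are ours) averages the three veridical vectors feature by feature and then applies caricature_j = I_j + b(I_j - norm_j) with b = 1, which reproduces the caricature vectors listed above to two decimal places.

```python
VERIDICAL = [
    [2, 2, 2, 2, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 2, 1],
]


def normative(vectors):
    """Feature-by-feature mean of the veridical vectors (the 'average face')."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def caricature(vector, norm, b):
    """caricature_j = I_j + b * (I_j - norm_j).

    b > 0 exaggerates deviations from the norm, b = 0 returns the veridical
    vector, and b < 0 produces an anti-caricature.
    """
    return [i + b * (i - m) for i, m in zip(vector, norm)]


norm = normative(VERIDICAL)
print([round(x, 2) for x in norm])
# [2.0, 1.67, 1.67, 1.33, 1.67, 1.33, 1.33, 1.0]
print([round(x, 2) for x in caricature(VERIDICAL[0], norm, b=1.0)])
# [2.0, 2.33, 2.33, 2.67, 0.33, 0.67, 0.67, 1.0]
```

Features that already equal the norm (the first and last units here) are left unchanged, while the most deviant features move the farthest.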

Simulation 1b: Learning Eight Feature Vectors

The goal of Simulation 1b was to replicate the caricature effects found in Simulation 1a using a larger number of feature vectors in the training set. It has been shown that results derived from smaller simulations do not necessarily “scale up” to simulations that incorporate more exemplars in the training set (Hinton & Shallice, 1991). In Simulation 1b, we tested for the presence of a caricature effect using eight, rather than three, feature vectors in the training set.

Training in Simulation 1b. In Simulation 1b, the network was trained to learn eight veridical input vectors. Once again, rather than presenting the veridical vectors to the network for learning, permutations were created by flipping the value of one of the eight feature units (from +2 to +1, or from +1 to +2). A total of 64 feature vectors (eight permutations of each of the eight veridical feature vectors) were presented to the network in random order for learning. The input feature vectors were associated with patterns of 1s and 0s distributed across the three output units. Learning was again accomplished using the back-propagation algorithm (McClelland & Rumelhart, 1986a) with a learning rate of 0.03. Training continued for 50 learning epochs. A total of 10 simulations was carried out, and before the start of each simulation the connection weights were initialized to small random values.

Results and Discussion of Simulation 1b. After 50 learning epochs, activation was computed for the veridical and caricature vectors. The average activation produced by the veridical vectors was 0.679, compared with 0.702 for the caricature vectors. The difference between veridical and caricature activation was reliable, t(9) = 8.347, p < 0.001. These results were consistent with the findings of Simulation 1a: the network demonstrated a caricature advantage despite being trained with only veridical vectors.

The simulation findings are interpreted as support for the distinctive access hypothesis, but it is not clear whether the caricature advantage was produced by the recognition of multiple features or of a single distinctive feature in the caricature stimulus. Because all features in a caricature drawing are distorted in relation to each other, it has been assumed that the caricature advantage is produced by the encoding of multiple features. However, it is equally plausible that the caricature advantage emerges from the recognition of a single distinctive feature that has been exaggerated in the caricature stimulus. For example, a person with a highly distinctive nose might be more easily recognized when only the nose is caricaturized. If the caricature advantage is based on the recognition of a single distinctive feature, exaggerating that feature alone should produce the same advantage as exaggerating all the features in the representation. To test this prediction, activations of highly distinctive feature caricatures were compared against activations of the original caricature vectors (i.e. inputs created by the caricature generator equation). For this test, only feature vectors from Simulation 1b that contained a highly distinctive feature were selected.
A feature vector contained a highly distinctive feature if one of its features deviated more from the prototype than any of the other features in the vector. Of the eight feature vectors in the training set, two contained a highly distinctive feature. Highly distinctive feature caricatures were obtained by computing the combined caricature distortion in the feature vector, assigning this value to the highly distinctive feature, and returning the other features to their veridical values. For example, the caricature version of feature vector {2, 2, 1, 1, 2, 2, 1, 1} was {2, 2.25, 0.5, 0.5, 2.5, 2.75, 0.5, 1} and therefore had a combined caricature distortion value of 3 (i.e. 0 + 0.25 + 0.50 + 0.50 + 0.50 + 0.75 + 0.50 + 0). The highly distinctive feature caricature for this vector was {2, 2, 1, 1, 2, 5, 1, 1}. Using this procedure, the effects of highly distinctive feature caricatures could be compared to those of the original feature caricatures while controlling for the total amount of input into the network.

Using the weight matrices from Simulation 1b after 50 epochs of learning, output activation was computed for the highly distinctive feature caricatures and the original caricatures. The difference in activation between the highly distinctive feature caricatures (M = 0.804) and the original caricatures (M = 0.830) was reliable, t(9) = 3.057, p < 0.02. This result suggests that the caricature advantage is produced by the graded activation of an ensemble of distinctive features rather than the absolute activation of a single distinctive feature. Because graded distortions across many dimensions can be computed relative to a prototypical representation, these results provide further support for a norm-based coding theory of face recognition.
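The construction of a highly distinctive feature caricature can be sketched as follows. With b = 1, each feature’s caricature distortion equals its deviation from the prototype, so the most distorted feature is also the most deviant one. The function name is hypothetical, and the rule for a feature whose caricature value falls below its veridical value (subtracting the total instead of adding it) is our assumption; the text only illustrates the upward case.

```python
def distinctive_feature_caricature(veridical, caricature):
    """Move the total caricature distortion onto the single most-distorted feature.

    All other features are reset to their veridical values, so the summed
    absolute distortion (the extra input 'budget') matches the full caricature.
    Returns the new vector and the combined distortion.
    """
    deviations = [abs(c - v) for v, c in zip(veridical, caricature)]
    total = sum(deviations)
    j = max(range(len(deviations)), key=deviations.__getitem__)
    out = list(veridical)
    # Assumption: keep the direction in which the caricature moved this feature.
    direction = 1 if caricature[j] >= veridical[j] else -1
    out[j] = veridical[j] + direction * total
    return out, total


veridical = [2, 2, 1, 1, 2, 2, 1, 1]
full_caricature = [2, 2.25, 0.5, 0.5, 2.5, 2.75, 0.5, 1]
single, total = distinctive_feature_caricature(veridical, full_caricature)
print(total)   # 3.0
print(single)  # [2, 2, 1, 1, 2, 5.0, 1, 1]
```

The worked example from the text is recovered exactly: the sixth feature (deviation 0.75) receives the entire distortion of 3, giving a value of 5.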

General Discussion of Simulations 1a and 1b

According to these results, the caricature paradox is resolved in the following manner: Faces are encoded in memory with respect to their distinctive properties. By emphasizing the same features in the picture stimulus that are distinctive in memory, caricature inputs activate face representations more strongly than do veridical inputs. Higher levels of activation lead to enhanced access to veridical face representations. Thus, this interpretation permits better recognition of a face stimulus whose features are distorted despite the storage of a veridical face representation. The present simulation does not rule out the possibility that the recognition system encodes a distorted representation rather than a veridical representation. It only demonstrates that this extra computational step is not necessary to produce a caricature advantage. Because the network responded more strongly to caricature inputs than to veridical inputs, it could be argued that the network represented faces as caricatures. Indeed, we claimed that the network represented faces as prototypes rather than permutations, based on the network’s enhanced response to the prototype input. However, prototype abstraction is the direct result of the network’s capability to extract the central tendencies found in a given set of inputs (McClelland & Rumelhart, 1985, 1986a). In contrast, caricatures are not representations of the central tendencies of the inputs, but distortions away from those central tendencies. Therefore, the mechanisms responsible for caricature effects and prototype abstraction cannot be the same. It is therefore possible for the network to have prototypical representations without also having caricaturized representations.

SIMULATION 2: Familiarity Effects on Caricature and Anti-caricature Recognition

In Simulation 2, we examined the role of experience in caricature effects. Previous studies have shown that subjects demonstrate a caricature effect for familiar faces, but not for unfamiliar faces (Benson & Perrett, 1991; Rhodes et al., 1987; Rhodes & Moody, 1990; but see Rhodes & Tremewan, 1994). Similarly, in a study involving expert subjects, it has been found that bird experts show a caricature advantage for recognizing birds for which they have considerable field experience but fail to show a caricature advantage for birds with which they have less experience (Rhodes & McLean, 1990). As evidenced by their poor overall recognition and lack of a caricature advantage, it seems that early in learning, subjects do not possess good individuating information about a face. The distinctive access hypothesis predicts that with repeated learning experiences, the distinctive aspects of a face are better encoded in memory, leading to better recognition and a stronger caricature advantage. In this simulation, to test for the effects of familiarity, the strength of the caricature advantage (i.e. the difference in activation between veridical and caricature inputs) was measured over 200 learning epochs.

In addition to output activation, output suppression was also measured in Simulation 2. Suppression was measured by averaging the activation of the non-target outputs produced by the veridical and caricature vectors. It was expected that caricature and veridical inputs would not only activate the appropriate target outputs but also suppress the inappropriate non-target outputs.

Simulation 2 also tested the effects of anti-caricature inputs on activation and suppression levels. Anti-caricature representations, in contrast to caricature representations, de-emphasize the distinctive qualities of a face with respect to the normative face. As shown in Fig. 1, caricature and anti-caricature faces can be created such that they are equidistant from the veridical representation in face space and, hence, equal in distortion. However, although anti-caricature and caricature faces can be equated with respect to their distortion from the veridical face, subjects are slower and less accurate at recognizing anti-caricatures than caricatures (Benson & Perrett, 1991; Rhodes & McLean, 1990; Rhodes & Moody, 1990; Rhodes et al., 1987; Rhodes & Tremewan, 1994). In Simulation 2, anti-caricatures were created by applying the caricature generator equation with a negative b value.
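The activation and suppression measures described above can be expressed as two small scoring functions. This is a sketch under our own assumptions: activation is read from the target face unit, suppression is the mean activation of the non-target face units (so a lower value means stronger suppression), and the output values in the usage example are made up for illustration.

```python
def activation(outputs, target_index):
    """Activation of the correct (target) face unit."""
    return outputs[target_index]


def suppression(outputs, target_index):
    """Mean activation of the non-target face units; lower means stronger suppression."""
    others = [o for k, o in enumerate(outputs) if k != target_index]
    return sum(others) / len(others)


def mean_scores(results):
    """Average both measures over (outputs, target_index) pairs, e.g. one face type."""
    acts = [activation(o, t) for o, t in results]
    sups = [suppression(o, t) for o, t in results]
    return sum(acts) / len(acts), sum(sups) / len(sups)


# Hypothetical network outputs for two faces of one face type.
results = [([0.75, 0.25, 0.125], 0), ([0.25, 0.75, 0.125], 1)]
print(mean_scores(results))  # (0.75, 0.1875)
```

Computing the two measures per face type (veridical, caricature, anti-caricature) and per epoch yields the curves that are compared in the results.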

Method

Simulation 2 used the same 24 training patterns and learning parameters as Simulation 1a. The 24 patterns were presented to the network in random order for learning. Veridical, caricature, and anti-caricature vectors were created via the caricature generator equation using b values of 0, +1.0, and −0.50, respectively. The value of −0.50 was selected for the anti-caricatures because it yields a vector midway between the veridical and normative feature vectors. Before the start of each run, connection weights were initialized to small random values. In total, 10 simulation runs were executed.
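The caricature generator equation is introduced with Simulation 1 and is not repeated in this section. The sketch below assumes it takes the usual form of Brennan's (1985) generator, C = V + b(V − N), where V is the veridical feature vector, N the normative (average) vector, and b the exaggeration value. Under that assumption b = 0 returns the veridical vector, b = −0.50 returns the point midway between the veridical and normative vectors (matching the rationale given above), and any caricature lies |b|·‖V − N‖ from the veridical face. The four-feature training vectors are made up for illustration.

```python
import math


def caricature(veridical, norm, b):
    """Assumed generator: move each feature away from (b > 0) or toward
    (-1 < b < 0) the corresponding feature of the norm."""
    return [v + b * (v - n) for v, n in zip(veridical, norm)]


def average(vectors):
    """Normative face: the feature-wise mean of the stored vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical training vectors; the norm is their average.
faces = [[2, 1, 1, 2], [1, 2, 1, 1], [2, 2, 2, 1]]
norm = average(faces)
face = faces[0]

veridical = caricature(face, norm, 0.0)     # b = 0
cari = caricature(face, norm, 1.0)          # b = +1.0
anti = caricature(face, norm, -0.50)        # b = -0.50

# Distortion relative to the veridical face is |b| times its distance from the norm.
print(math.dist(face, norm), math.dist(cari, face), math.dist(anti, face))
```

With any vectors, the last two distances come out as 1.0 and 0.5 times the first, so under this assumed form the anti-caricatures of Simulation 2 were the smaller physical distortion.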


Results and Discussion

The average activation and suppression effects of the veridical, caricature, and anti-caricature vectors are plotted as a function of learning epochs in Fig. 2. In the early stages of learning (Epochs 0 to 20), the veridical and caricature vectors produced essentially identical activation, p > 0.10; hence, there was no evidence of a caricature advantage at this point in learning. From Epoch 20 through Epoch 100, a reliable caricature advantage was found: the caricature vector produced more activation than its veridical counterpart, p < 0.01. With further learning, however, the magnitude of the caricature advantage gradually diminished, until at Epoch 200 no caricature advantage was found, p > 0.10. In contrast, a caricature equivalence effect (i.e. a reliable difference in activation between the anti-caricature vectors and both the veridical and caricature vectors) was observed at Epoch 20 and remained reliable through Epoch 200, p < 0.01.
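The measurement protocol behind these curves (train on veridical vectors only, then periodically read out target activation for veridical and caricature versions) can be sketched with a toy network. This is not the authors' network: the single layer of sigmoid units trained with the delta rule, the learning rate, the four input patterns, localist output coding, and the caricature form V + b(V − N) are all assumptions chosen for illustration, and no claim is made that this toy reproduces the rise and fall of the caricature advantage reported here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, biases, x):
    """Activation of every output unit for input vector x."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(weights, biases)]


def caricature(v, norm, b):
    """Assumed generator C = V + b(V - N)."""
    return [vi + b * (vi - ni) for vi, ni in zip(v, norm)]


def train_epoch(weights, biases, patterns, targets, lr, rng):
    """One epoch of delta-rule learning, patterns presented in random order."""
    order = list(range(len(patterns)))
    rng.shuffle(order)
    for i in order:
        x, t = patterns[i], targets[i]
        out = forward(weights, biases, x)
        for j, (o, tj) in enumerate(zip(out, t)):
            delta = (tj - o) * o * (1.0 - o)  # error times sigmoid slope
            for k, xk in enumerate(x):
                weights[j][k] += lr * delta * xk
            biases[j] += lr * delta


def run(patterns, targets, epochs=200, every=10, b=1.0, lr=0.1, seed=0):
    """Train on veridical patterns; every `every` epochs record the mean
    target-unit activation for veridical and caricature inputs."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0]), len(targets[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    norm = [sum(col) / len(patterns) for col in zip(*patterns)]
    history = []
    for epoch in range(1, epochs + 1):
        train_epoch(weights, biases, patterns, targets, lr, rng)
        if epoch % every == 0:
            ver, car = [], []
            for x, t in zip(patterns, targets):
                j = t.index(1)  # localist coding: one target unit per face
                ver.append(forward(weights, biases, x)[j])
                car.append(forward(weights, biases, caricature(x, norm, b))[j])
            history.append((epoch, sum(ver) / len(ver), sum(car) / len(car)))
    return history


# Hypothetical training set (not the 24 patterns used here): four 8-feature faces.
patterns = [
    [2, 2, 1, 2, 2, 1, 1, 1],
    [2, 1, 1, 2, 2, 2, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1],
    [2, 1, 2, 1, 1, 1, 2, 1],
]
targets = [[1 if j == i else 0 for j in range(4)] for i in range(4)]

for epoch, ver, car in run(patterns, targets):
    print(epoch, round(ver, 3), round(car, 3), round(car - ver, 3))
```

Sampling every 10 epochs over 200 epochs gives 20 measurement points per run, which is the number of epoch levels implied by the ANOVA degrees of freedom reported below.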

FIG. 2. Results from Simulation 2 show levels of activation and suppression for veridical, caricature, and anti-caricature vectors over 200 learning epochs.


TANAKA AND SIMON

These observations were confirmed in a two-way ANOVA with learning epoch and face type (veridical, caricature, and anti-caricature) as within-simulation factors. The main effect of epoch was significant, F(19, 171) = 329.622, MSe = 1431.652, p < 0.001, indicating that recognition improved with experience. The main effect of face type was also significant, F(2, 18) = 646.607, MSe = 321.390, p < 0.001, and this factor interacted significantly with epoch, F(38, 342) = 44.578, MSe = 3.964, p < 0.001, indicating that the differences in activation among the caricature, veridical, and anti-caricature vectors varied as a function of learning.

With respect to suppression, at Epoch 20 caricature inputs produced more inhibition than did veridical inputs, p < 0.001, which, in turn, produced more inhibition than did anti-caricature inputs, p < 0.001. At Epoch 100, veridical inputs still produced more suppression than anti-caricature inputs, p < 0.01, but caricature and veridical inputs no longer differed significantly in the amount of suppression they produced, p > 0.10. This pattern of suppression remained relatively stable for the remainder of learning. Suppression effects were tested in a two-way ANOVA with learning epoch and face type (veridical, caricature, and anti-caricature) as within-simulation factors. The main effect of epoch was significant, F(19, 171) = 186.288, MSe = 309.581, p < 0.001, indicating that suppression of competing distractors increased with experience. The main effect of face type was also significant, F(2, 18) = 195.423, MSe = 22.827, p < 0.001, showing that caricature and veridical vectors produced more suppression than anti-caricature vectors. The Epoch × Face Type interaction was also significant, F(38, 342) = 17.689, MSe = 0.334, p < 0.001.
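The reported degrees of freedom can be checked by treating each of the 10 runs as a subject in a fully within-subjects design, where an effect with df equal to the product of (levels − 1) over its factors is tested against an error term with that df times (runs − 1). The sketch assumes the 20 epoch levels correspond to 200 epochs sampled every 10 epochs, as the General Discussion describes; the helper name is our own.

```python
def within_subjects_df(levels, n_runs):
    """(effect df, error df) for a within-subjects effect.

    levels: number of levels of each factor in the effect, e.g. [20] for a
            main effect of epoch or [20, 3] for the Epoch x Face Type interaction.
    n_runs: number of independent simulation runs, treated as subjects.
    """
    df_effect = 1
    for k in levels:
        df_effect *= k - 1
    return df_effect, df_effect * (n_runs - 1)


print(within_subjects_df([20], 10))      # epoch         -> (19, 171)
print(within_subjects_df([3], 10))       # face type     -> (2, 18)
print(within_subjects_df([20, 3], 10))   # interaction   -> (38, 342)
print(within_subjects_df([2, 2], 10))    # Simulation 3  -> (1, 9)
```

The same arithmetic accounts for the F(1, 9) tests in Simulation 3, where two 2-level factors were crossed over 10 runs.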
In summary, consistent with the empirical results, the neural network produced a familiarity effect: less familiar faces (faces that the network had seen only a few times) were poorly recognized and showed no caricature advantage, whereas familiar faces were accurately recognized and exhibited a reliable caricature advantage in activation. This finding is compatible with the norm-based coding model, which proposes that veridical and caricature recognition depend on learning the distinctive deviations of a face. Unexpectedly, at high levels of familiarity the caricature advantage disappeared. This result is puzzling at first, but it would be expected if, with extensive experience, the recognition system becomes finely tuned to the subtle features that individuate a familiar face. The enhanced perceptual information provided by the caricature drawing may then reach a ceiling, so that a caricature no longer facilitates the recognition of a highly familiar face (e.g. a spouse or best friend). Clearly, the waxing and waning of the caricature advantage demonstrated by the present simulation is preliminary and awaits empirical test.

Another central finding of Simulation 2 concerns the differences between the caricature and anti-caricature effects. As shown in Fig. 2, the relatively small output differences between the caricature and veridical vectors contrast with the relatively large output differences between the veridical and anti-caricature vectors. This was true for both target activation and non-target suppression. The difference is noteworthy, given that the absolute value of b was smaller for the anti-caricature vectors (b = −0.50) than for the caricature vectors (b = +1.00). Thus, in Simulation 2, the magnitude of the caricature advantage (i.e. the difference between the caricature and veridical vectors) was relatively modest, whereas the magnitude of the caricature equivalence effect (i.e. the difference between the anti-caricature vectors and both the veridical and caricature vectors) was relatively robust.

SIMULATION 3: Caricature Recognition of Typical and Atypical Faces

In the following simulation, we explored the effects of caricaturization on typical and atypical faces. The effects of typicality on face recognition performance are well documented in the literature. The general finding is that when subjects are asked to remember randomly presented typical and atypical faces, their recognition is reliably better for atypical faces than for typical faces (Barlett, Hurry, & Thorley, 1984; Going & Read, 1974; Light, Kayra-Stuart, & Hollander, 1979). One explanation of this typicality effect is that typical faces are more similar to each other and, therefore, more confusable than atypical faces. This explanation is consistent with the finding that in old/new recognition tasks subjects are more likely to falsely identify a new typical face as “old” than a new atypical face (Barlett et al., 1984; Light et al., 1979). What are the effects of caricaturization on typical and atypical faces? This question has not been directly addressed in the empirical literature, but the distinctive access hypothesis predicts that caricaturization should affect face recognition differently depending on typicality. In a norm-based coding model, the typicality of a face can be represented as its distance from the normative face. As shown in Fig. 3, typical faces are located in a region of high face density, whereas atypical faces are located in a region of low face density. According to the distinctive access hypothesis, caricaturizing a typical face should enhance the perceptual saliency of its identifying features, thereby facilitating its recognition. Because the distinctive features of an atypical face are already perceptually salient, caricaturization should have less of a facilitative effect on its recognition.
Thus, the distinctive access hypothesis predicts that equal amounts of positive caricaturization should produce greater facilitation for the recognition of typical faces than for atypical faces. In this simulation, the predictions of the norm-based coding theory were tested by training the neural network to recognize typical and atypical feature vectors and then testing the model for caricature effects.
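The geometric reading of typicality used here can be made concrete. The sketch below uses a made-up two-dimensional face space rather than the 8-unit vectors of the simulation, and describes each face by two assumed quantities: its distance from the norm (the average of all faces) and its distance to the nearest other face. The clustered faces score low on both, which is what makes them typical and mutually confusable, while the outlier scores high on both.

```python
import math


def average(faces):
    """Normative face: the feature-wise mean of all faces."""
    return [sum(column) / len(faces) for column in zip(*faces)]


def typicality_profile(faces):
    """For each face, return (distance from the norm, distance to nearest other face)."""
    norm = average(faces)
    profile = []
    for i, face in enumerate(faces):
        nearest = min(math.dist(face, other) for j, other in enumerate(faces) if j != i)
        profile.append((math.dist(face, norm), nearest))
    return profile


# Hypothetical face space: three clustered (typical) faces and one outlier.
faces = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.2], [3.0, 3.0]]
for label, (from_norm, to_nearest) in zip("ABCD", typicality_profile(faces)):
    print(label, round(from_norm, 2), round(to_nearest, 2))
```

Nearest-neighbour distance is only one simple proxy for local face density; other measures would serve the same illustrative purpose.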


FIG. 3. Diagram depicts veridical and caricature representations of three typical faces and one atypical face in face space.

Method

To test for the effects of typicality, four input feature vectors were created. As in the other simulations, each input vector consisted of 8 units. The 3 typical feature vectors differed from each other by 2 units: {2, 2, 1, 2, 2, 1, 1, 1}, {1, 2, 1, 1, 2, 2, 2, 1, 1}, {2, 2, 1, 1, 2, 2, 1, 1}. The atypical feature vector differed from the other three feature vectors by four units: {2, 1, 2, 1, 1, 1, 2, 1}. Each of the four feature vectors was associated with one of the distributed output patterns {1, 0, 0}, {0, 0, 1}, {0, 1, 1}, and {1, 1, 0}. As in the previous simulations, feature vectors were presented to the network in random order for learning, and training continued for 50 epochs. Before the start of each run, connection weights were initialized to small random values. Ten simulation runs were executed.

Results and Discussion

The mean output activations for the typical veridical and typical caricature vectors were 0.781 and 0.826, respectively. The mean output activations for the atypical veridical and atypical caricature vectors were 0.907 and 0.927, respectively. The output activations were submitted to a two-way ANOVA with typicality (typical and atypical) and face type (veridical and caricature) as within-simulation factors. Consistent with the empirical findings (Barlett et al., 1984; Going & Read, 1974; Light et al., 1979), atypical faces were better recognized by the network than were typical faces, F(1, 9) = 39.427, MSe = 122.102, p < 0.001. The main effect of face type was also significant, F(1, 9) = 36.052, MSe = 8.702, p < 0.001, indicating that caricature versions of the faces were better recognized than were veridical versions. The critical Typicality × Face Type interaction was also significant, F(1, 9) = 27.990, MSe = 22.402, p < 0.001. As shown in Fig. 4, caricaturization produced greater facilitation for the recognition of typical faces than for the recognition of atypical faces.

The effect of typicality on caricature recognition is most consistent with the distinctive access hypothesis, which claims that by emphasizing the distinctive properties of a face, the caricature drawing serves as a better decoding stimulus for recognition than does the veridical drawing. However, the caricature–typicality interaction indicates that more than just the stimulus properties of the caricature drawing contributed to the magnitude of the caricature advantage. It is important to note that the amount of absolute exaggeration produced by caricaturization varies with a face’s location in face space: a given caricaturization value exaggerates faces located far from the normative face more than faces located near it. Although the distinctive features of the atypical face were exaggerated more than those of the typical faces, the atypical face showed less of a caricature advantage.
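The size of the interaction can be read directly off the cell means reported for this simulation: the caricature advantage is about 0.045 for typical faces and about 0.020 for atypical faces. The second half of the sketch illustrates the geometric point, again assuming the generator has the form V + b(V − N), so that a face's displacement is |b| times its distance from the norm; the two face positions are hypothetical.

```python
import math

# Cell means reported for Simulation 3.
means = {
    ("typical", "veridical"): 0.781,
    ("typical", "caricature"): 0.826,
    ("atypical", "veridical"): 0.907,
    ("atypical", "caricature"): 0.927,
}


def caricature_advantage(typicality):
    """Caricature minus veridical activation for one typicality level."""
    return means[(typicality, "caricature")] - means[(typicality, "veridical")]


adv_typical = caricature_advantage("typical")     # about 0.045
adv_atypical = caricature_advantage("atypical")   # about 0.020
interaction = adv_typical - adv_atypical          # about 0.025


def displacement(veridical, norm, b):
    """Distance the assumed generator V + b(V - N) moves a face: |b| * ||V - N||."""
    return abs(b) * math.dist(veridical, norm)


# Hypothetical positions relative to a norm at the origin.
norm = [0.0, 0.0]
print(displacement([0.5, 0.5], norm, 1.0), displacement([2.0, 2.0], norm, 1.0))
```

The far (atypical) face is moved four times as far by the same b, yet in the simulation it gained less than half as much activation from caricaturization.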


FIG. 4. Results from Simulation 3 show levels of activation for veridical and caricature representations of atypical and typical faces.


Thus, the caricature advantage was jointly determined by the amount of caricature exaggeration and the face’s location in face space. A connection between caricature effects and typicality has also been suggested in the empirical literature. In a bird recognition study, Rhodes and McLean (1990) found that the magnitude of the caricature advantage reliably correlated with bird typicality (i.e. caricatures of typical birds showed a larger recognition advantage than did caricatures of atypical birds). In a recent study, Dodd and Perrett (submitted) taught subjects to identify caricature, anti-caricature, or veridical versions of common automobiles. When recognition of caricatured versions of these stimuli was tested, the caricature advantage was largest for automobiles learned as anti-caricatures, intermediate for those learned as veridical versions, and smallest for those learned as caricatures. According to Dodd and Perrett, because the anti-caricatured automobiles were the least distinctive, they benefited most from the additional exaggeration. Thus, caricaturization improves recognition of typical exemplars because it makes differentiating information perceptually more salient, but it is relatively less effective for atypical exemplars, whose differentiating information is already perceptually distinctive.

GENERAL DISCUSSION

The norm-based coding theory of face recognition claims that all faces are encoded in memory according to their deviations from the average face. Caricature drawings are ideal test stimuli for this theory because in a caricature drawing facial features are graphically exaggerated in proportion to their deviations from the average face. Empirical studies have shown that subjects recognize caricatured faces as well as or better than veridical faces, and this effect of caricaturization has been interpreted as support for the norm-based theory of face recognition (Rhodes et al., 1987). In the current study, caricature effects were investigated in a series of computer simulations. One account of the caricature effect, the caricature trace hypothesis, claims that face representations are caricatured in memory, so that a caricature drawing fits the stored face memory better than the veridical drawing does. In contrast, the distinctive access hypothesis maintains that the face representation is veridical, but that the caricature drawing is a better decoding stimulus because it graphically exaggerates the second-order relational properties of a face. Simulation 1 tested these two competing hypotheses by training a neural network with veridical feature vectors. As predicted by the distinctive access hypothesis, the network demonstrated a reliable caricature advantage based on the storage of veridical face representations. Thus, caricature effects were obtained without appealing to the storage of caricature distortions.


Although caricaturization facilitates the recognition of most faces, enhancing faces whose features are already distinctive (that is, atypical faces) seems less crucial. Reduced caricature effects for atypical faces were tested in Simulation 3. Consistent with the distinctive access prediction, veridical atypical faces were better recognized than veridical typical faces, and caricaturization had less impact on the recognition of atypical faces than on the recognition of typical faces. These results are compatible with the distinctive access hypothesis, which claims that the function of caricaturization is to facilitate face recognition by making a face more distinctive. Although these findings emphasize the stimulus properties of caricatures, the current simulations also investigated the role that experience plays in the caricature effect. Empirical findings indicate that a caricature effect is not obtained when subjects identify unfamiliar faces (Rhodes et al., 1987) or unfamiliar objects (Rhodes & McLean, 1990; but see Rhodes & Tremewan, 1994). These results suggest that norm-based deviations are encoded gradually over the course of multiple stimulus exposures. The time course of caricature effects was examined in Simulation 2, where the network’s response to veridical and caricature inputs was measured after every 10 training epochs. Consistent with the empirical data, this simulation revealed a reliable interaction between caricature effect and learning epoch: the magnitude of the caricature effect increased with learning, up to a point. Although overall recognition levels increased with learning, the caricature effect was smaller after extensive training than after moderate training.
This result was unexpected, but it might be explained by the distinctive access hypothesis if, with extensive training, perceptual processes become exquisitely tuned to the second-order distinctions of very familiar faces. The initial stimulus advantage provided by the caricature drawing may then be overridden by extensive experience and practice. Of the caricature trace and distinctive access hypotheses, the simulation results seem most compatible with the distinctive access approach. Simulation 1 showed that a reliable caricature advantage could be obtained without creating new caricature representations. Simulation 2 demonstrated the absence of a caricature suppression effect, suggesting that once activated, the veridical face representation mediates the suppression of non-target faces. Finally, Simulation 3 showed a diminished caricature effect for the recognition of atypical faces, indicating that caricaturization is less effective at improving the accessibility of already distinctive faces.

As an alternative to a norm-based model, the exemplar-based approach claims that distinctive information could be obtained by comparing the features of a target face against the features of all faces stored in memory; the distinctive features are those that differ most from other faces. Under this approach, the number of computations needed to calculate face distinctiveness would increase as the number of stored faces increased. Although the exemplar-based approach is computationally feasible, it seems more efficient to calculate distinctiveness on the basis of a single comparison (i.e. the difference between a given face and the normative face), as proposed by the norm-based hypothesis, than on the basis of multiple comparisons, as proposed by the exemplar-based hypothesis.

Interestingly, something analogous to a caricature effect has been reported in the animal literature. Females in species as diverse as birds of paradise, guppies, frogs, and insects prefer males with exaggerated featural patterns (Kirkpatrick & Ryan, 1991). Simulating this preference for exaggerated stimuli, Enquist and Arak (1993) trained a neural network to recognize bird-like stimuli with elongated tail patterns and found that the network responded most positively to novel supra-normal test patterns (i.e. patterns in which the length of the tail was exaggerated). Although it is not clear whether this preference derives from a norm-based representation, these studies suggest that the caricature effect might reflect a general strategy of the recognition system.

These simulations shed some light on the conditions under which a caricature effect would be expected to occur:

(1) Objects must form a homogeneous shape category, such that corresponding locations can be identified on all exemplars and averaging exemplars generates a new exemplar of the category (Diamond & Carey, 1986).² Although faces are the paradigmatic example of such a category, other object categories, such as birds and automobiles, have members that share similar parts arranged in a prototypical configuration. It is therefore not surprising that subjects show caricature effects when identifying objects from these categories (Dodd & Perrett, submitted; Rhodes & McLean, 1990).
(2) Subjects must have sufficient experience in making within-category distinctions. In face recognition, within-category distinctions are mandatory, given that everyday face recognition requires that a face be recognized at the level of “unique identity” (Tanaka, in preparation). In other types of recognition, however, people commonly identify objects only as members of a general or basic-level category (e.g. “dog”, “car”, “chair”) (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). This more general form of recognition does not rely on second-order relational information and would therefore not be expected to promote caricature effects. This is not to say that face recognition is necessarily “special” in its use of norm-based coding. Indeed, people who specialize in within-category recognition, such as bird and dog experts, become keenly aware of distinctive second-order relational information (Diamond & Carey, 1986; Tanaka & Taylor, 1991), and expert subjects who are sensitive to these metric distinctions show facilitative effects of caricaturization (Rhodes & McLean, 1990).

² In other simulations, we found a diminished caricature advantage for face inputs that were more heterogeneous than the ones described in these simulations.
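The efficiency argument for norm-based over exemplar-based distinctiveness made earlier in this discussion can be sketched as follows. The choice of mean distance to all other stored faces as the exemplar-based measure, the running-average update of the norm, and the random 8-feature faces are illustrative assumptions, not models proposed in this paper. The point is only that the norm-based measure needs one comparison per probe, with the norm kept current by an update whose cost does not grow with the number of stored faces, while the exemplar-based measure needs one comparison per stored face.

```python
import math
import random


def update_norm(norm, count, face):
    """Fold a newly learned face into the running average face."""
    count += 1
    return [n + (f - n) / count for n, f in zip(norm, face)], count


def norm_based(face, norm):
    """Distinctiveness as distance from the norm: one comparison."""
    return math.dist(face, norm), 1


def exemplar_based(face, others):
    """Distinctiveness as mean distance to every other stored face:
    one comparison per stored face."""
    distances = [math.dist(face, other) for other in others]
    return sum(distances) / len(distances), len(distances)


rng = random.Random(1)
memory, norm, count = [], [0.0] * 8, 0
for _ in range(500):
    face = [rng.gauss(0.0, 1.0) for _ in range(8)]
    memory.append(face)
    norm, count = update_norm(norm, count, face)

probe = memory[0]
print(norm_based(probe, norm)[1], exemplar_based(probe, memory[1:])[1])
```

With 500 stored faces the probe costs 1 comparison under the norm-based rule and 499 under the exemplar-based rule, and the gap widens linearly as memory grows.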


In conclusion, results from these simulations provide computational support for the norm-based coding theory of face recognition and for the distinctive access account of the caricature effect. Although many of our findings confirm previously reported experimental results, some of our simulation findings, specifically those regarding the effects of repeated exposure and typicality, have not been empirically tested and suggest avenues for future research.


REFERENCES

Barlett, J.C., Hurry, S., & Thorley, W. (1984). Typicality and familiarity of faces. Memory and Cognition, 12, 219–228.
Benson, P.J., & Perrett, D.I. (1991). Perception and recognition of photographic quality facial caricatures: Implications for the recognition of natural images. European Journal of Cognitive Psychology, 3, 105–135.
Brennan, S.E. (1985). The caricature generator. Leonardo, 18, 170–178.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327.
Burton, M., Bruce, V., & Johnston, R.A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380.
Diamond, R., & Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of Experimental Psychology: General, 115, 107–117.
Dodd, J.V., & Perrett, D.I. (1996). The effect of caricaturing on learning and recognition of car shapes. Manuscript submitted for publication.
Enquist, M., & Arak, A. (1993). Selection of exaggerated male traits by female aesthetic senses. Nature, 361, 446–448.
Gibson, J.J. (1973). On the concept of formless invariants in visual perception. Leonardo, 6, 3.
Going, M., & Read, D.J. (1974). Effects of uniqueness, sex of subject, and sex of photograph on facial recognition. Perceptual & Motor Skills, 39, 109–110.
Hinton, G.E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74–95.
Kirkpatrick, M., & Ryan, M.J. (1991). The evolution of mating preferences and the paradox of the lek. Nature, 350, 33–38.
Light, L.L., Kayra-Stuart, F., & Hollander, S. (1979). Recognition memory for typical and unusual faces. Journal of Experimental Psychology: Human Learning and Memory, 5, 212–228.
Mauro, R., & Kubovy, M. (1992). Caricature and face recognition. Memory and Cognition, 20, 433–440.
McClelland, J.L., & Rumelhart, D.E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159–188.
McClelland, J.L., & Rumelhart, D.E. (1986a). A distributed model of human learning and memory. In D.E. Rumelhart, J.L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. II. Cambridge, MA: Bradford Books.
McClelland, J.L., & Rumelhart, D.E. (1986b). Explorations in parallel distributed processing. Cambridge, MA: Bradford Books.
Rhodes, G., Brennan, S., & Carey, S. (1987). Identification and ratings of caricatures: Implications for mental representations of faces. Cognitive Psychology, 19, 473–479.
Rhodes, G., & McLean, I.G. (1990). Distinctiveness and expertise effects with homogenous stimuli: Towards a model of configural coding. Perception, 19, 773–794.
Rhodes, G., & Moody, J. (1990). Memory representations of unfamiliar faces: Coding of distinctive information. New Zealand Journal of Psychology, 19, 70–78.
Rhodes, G., & Tremewan, T. (1994). Understanding face recognition: Caricature effects, inversion and the homogeneity problem. Visual Cognition, 1, 275–311.
Rosch, E., Mervis, C.B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Tanaka, J.W. (in preparation). The basic level of face recognition.
Tanaka, J.W., & Taylor, M. (1991). Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23, 457–482.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition. The Quarterly Journal of Experimental Psychology, 43A, 161–204.
Valentine, T., & Ferrara, A. (1991). Typicality in categorization, recognition and identification: Evidence from face recognition. British Journal of Psychology, 82, 87–102.

Revised manuscript received 7 July 1995