Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007
Incorporating Forgetting in a Category Learning Model

Yasuaki Sakamoto and Toshihiko Matsuka

Yasu Sakamoto and Toshi Matsuka are with the Howe School of Technology Management, Stevens Institute of Technology, Hoboken, NJ 07030, USA (email: [email protected]).

Abstract— We present a computational model of human category learning that learns the essential structures of the categories by forgetting information that is not useful for the given task. The model shifts attention to salient information and learns associations between items and categories. Attention and association strengths are adjusted according to the degree of prediction errors the model makes. The attention and association weights are interpreted as memory strengths in the model and decay over time, allowing the model to focus on the salient structures. Using memory decay mechanisms, our model simultaneously explained human recognition and classification performances that previous models could not.
I. INTRODUCTION

We present a computational model of human category learning that learns the essential structures of the categories by forgetting information. The model develops knowledge structures based on prior experience and responds to information that deviates from the acquired knowledge structures. For example, the concept “bird” is a knowledge structure that provides expectations, such as flying, when it is activated. Bats may be the exceptions that deviate from the “bird” structure. The model detects category structures, such as regularities, by shifting attention to salient information and learning associations between items and categories. Attention plays a major role in guiding the encoding and retrieval of information [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Thus, attention strength is interpreted as memory strength in the model. The model responds to novelty by adjusting attention and association strengths according to the degree of prediction errors it makes (i.e., error-driven learning). A unique contribution of the current work is that attention and association strengths decay over time, and that this forgetting allows the model to focus on the essential structures. Like the model, humans tend to store only the gist of a story after a delay [11].

In the remainder of the paper, we review previous work that examines memory for exceptions, introduce the model, present model fits to previous findings, and discuss our modeling results.

II. MEMORY ADVANTAGE FOR EXCEPTIONS

The ability to selectively attend to salient information is critical to all intelligent agents operating in complex environments. It has long been known that information that violates a prevailing context draws people’s attention and is better remembered than information that is consistent with the context [12]. For example, people show better memory for a deviant event, such as a single upper-case word in a list of lower-case words.
Deviation or novelty detection is the opposite of stimulus generalization and likely plays a central role in our mental development. Indeed, infants tend to show a preference for a novel stimulus once they habituate to a familiar one [13], and this ability to respond to novelty is predictive of later intelligence [14]. Of course, information is not deviant without an expectation or a knowledge structure that can be violated. In the above example, the expectation is the tendency of words in the list to be in lower-case. The upper-case word violates this expectation.

A robust finding from the category learning research is that exceptions to category rules tend to be remembered better than items that are consistent with the rules [15], [16], [17]. We review Experiment 1 of Sakamoto and Love [16] because we fit our model to their data. Many existing models of category learning have difficulty accounting for their results.

A. Human Experiment

1) Design: In Sakamoto and Love [16], participants completed a learning phase consisting of trial-by-trial classification learning of the items under the heading Learning item in Table I. Items A1–A9 are members of category A and items B1–B5 are members of category B. Each item (e.g., A1) has five binary-valued dimensions (e.g., 21112). For example, the first column under the heading Dimension value could be the size dimension, with value 1 small and value 2 large, in which case item A1 was large. The following five dimensions were mapped (randomly assigned for each participant) onto the logical structure shown in Table I: size (small or large), color (blue or purple), border (yellow or white), texture (smooth or dotted), and diagonal cross (present or absent). As can be seen in Table I, the categories followed a rule-plus-exception category structure. The majority of items (i.e., A2–A9 and B2–B5) followed simple rules on the first dimension: e.g., “if small, category A” and “if large, category B” (as opposed to “if category A, small”, because the learner predicts the category label given stimulus features in classification). An item from each category (i.e., A1 and B1) was inconsistent with the rules (e.g., a small item belonging to category B). A1 violates the feature-category rule of category B, whereas B1 violates the feature-category rule of category A. Thus, B1 violates the more frequent regularity of category A. Following learning, participants’ ability to recognize exceptions and rule-following items was measured. Finally, participants’ generalization behavior was measured in a transfer phase in which they classified studied and novel items without corrective feedback.
TABLE I
THE ABSTRACT CATEGORY STRUCTURES USED IN SAKAMOTO AND LOVE [16]. THERE IS AN IMPERFECT RULE ON THE FIRST DIMENSION, WHICH ITEMS A1 AND B1 VIOLATE (INDICATED BY THE ARROWS).

Learning item    Dimension value      Novel item    Dimension value
Category A                            N1            11221
  → A1           21112                N2            12112
    A2           12122                N3            12221
    A3           11211                N4            12212
    A4           12211                N5            12222
    A5           11122                N6            21221
    A6           12111                N7            22112
    A7           11222                N8            22221
    A8           11212                N9            22212
    A9           12121                N10           22222
Category B
  → B1           11121
    B2           22122
    B3           21211
    B4           22211
    B5           21122
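For concreteness, the abstract structure in Table I can be written out directly as data. The short Python sketch below is our own illustration (the variable names and helper function are not from the original study); it encodes each learning item's five binary dimension values and confirms that only A1 and B1 violate the first-dimension rule.

    # Learning items from Table I: five binary-valued dimensions (1 or 2) per item.
    learning_items = {
        "A1": [2, 1, 1, 1, 2], "A2": [1, 2, 1, 2, 2], "A3": [1, 1, 2, 1, 1],
        "A4": [1, 2, 2, 1, 1], "A5": [1, 1, 1, 2, 2], "A6": [1, 2, 1, 1, 1],
        "A7": [1, 1, 2, 2, 2], "A8": [1, 1, 2, 1, 2], "A9": [1, 2, 1, 2, 1],
        "B1": [1, 1, 1, 2, 1], "B2": [2, 2, 1, 2, 2], "B3": [2, 1, 2, 1, 1],
        "B4": [2, 2, 2, 1, 1], "B5": [2, 1, 1, 2, 2],
    }
    category = {name: name[0] for name in learning_items}  # "A" or "B"

    def rule_prediction(values):
        # The imperfect rule: value 1 on the first dimension -> A, value 2 -> B.
        return "A" if values[0] == 1 else "B"

    exceptions = [name for name, values in learning_items.items()
                  if rule_prediction(values) != category[name]]
    print(exceptions)  # ['A1', 'B1']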
2) Procedure: On each learning trial, participants saw one item on the screen, pressed the A or B key, and received corrective feedback. Participants completed 10 blocks of learning trials. A block was a successive presentation of all 14 learning items in a random order. After the learning phase, participants completed a recognition phase consisting of two-alternative forced choice (2AFC) recognition judgments on 50 pairs of stimuli presented in a random order. Each pair consisted of an item from the learning phase and a novel item (under the heading Novel item in Table I). Five studied items, A2–A5 and B1, with value 1 on the first dimension were paired with each of the five novel items with value 1 on the first dimension (i.e., N1–N5), which resulted in 25 pairs. Another set of 25 pairs was created in the same manner using the items with value 2 on the first dimension. Following the 50 recognition trials, participants completed a transfer phase in which they classified all items but A6–A9 in Table I without corrective feedback. Participants completed two blocks of transfer trials, where a block consisted of a successive presentation of the 20 items in a random order.

3) Results: Table II shows the results from Sakamoto and Love [16]. RA represents the rule-following A2–A9 in the learning phase, the 20 pairs involving A2–A5 in the recognition phase, and A2–A5 and N1–N5 in the transfer phase. RB represents the rule-following B2–B5 in the learning phase, the 20 pairs involving B2–B5 in the recognition phase, and B2–B5 and N6–N10 in the transfer phase. Exceptions A1 and B1 resulted in more classification errors in the learning and transfer phases and better recognition accuracies than the rule-following items. This robust memory phenomenon has been established in various forms [18], [19], [20], [21], [22], [23], [24] and is not specific to exceptions in category learning experiments. For example, deviant faces [25] and behaviors [26] result in enhanced memory.
TABLE II
HUMAN (OBS) AND SIMULATED PARTICIPANTS' (MW = WITH; MO = WITHOUT MEMORY DECAY) MEAN ACCURACIES IN THE LEARNING, RECOGNITION, AND TRANSFER PHASES ARE SHOWN.

Item      Learning              Recognition           Transfer
          Obs    MW     MO      Obs    MW     MO      Obs    MW     MO
A1        .46    .52    .52     .79    .79    .80     .56    .52    .77
B1        .44    .42    .42     .87    .84    .84     .64    .66    .76
RA        .92    .88    .88     .70    .72    .75     .86    .90    .94
RB        .86    .75    .75     .69    .72    .76     .85    .82    .83
An aspect of Sakamoto and Love’s (2004) results that is problematic for many existing theories of category learning is the effect of the frequency manipulation on memory for exceptions. As shown in Table II, human learners recognized the exception B1, which violated the more frequently occurring rule, with significantly higher accuracy than the exception A1 (.87 vs. .79). This result parallels that from the schema and memory research, in which schema-inconsistent information (akin to exceptions) that violates a stronger schema (i.e., a more frequent knowledge structure) tends to be remembered better than inconsistent information deviating from a weaker schema [24], [26]. Thus, humans are sensitive to the strengths of regularities and better remember an item that violates a stronger regularity.

Furthermore, previous models cannot explain human performance in the learning, recognition, and transfer phases simultaneously. It is not trivial to find a model of category learning that remembers exceptions better than rule-following items but at the same time makes more classification errors on the exceptions than on the rule-following items, like humans do. We briefly describe a subset of previous models and how they process rule-plus-exception category structures. Then, we present our model, which can simultaneously explain the human recognition and classification results from Sakamoto and Love [16] fairly well.

B. Previous Models of Category Learning

1) Rule-Based Representations: The RULEX (rule-plus-exception) model of category learning [27] is a hypothesis-testing model that constructs rules and stores exceptions to the rules. Rule-following items are captured by the rule. Information about exceptions is explicitly stored. The classification decision is based on the exception match, and on rule application when no stored exceptions match the current stimulus. The likelihood of recognizing a test item is determined by the response from the items in the exception store. The storage of exception information allows RULEX to predict a memory advantage for rule-violating information. However, RULEX cannot account for the better recognition of exception B1 than exception A1 by representing rule-governed behavior with actual rules [16]. Because a central property of rules is insensitivity to frequency information [28], [29], such as how often rule-consistent information is encountered, exceptions A1 and B1 are treated in the same manner when the regularities are represented by actual rules.
Furthermore, rules encode little information about rule-following items. Thus, RULEX cannot explain people’s ability to recognize rule-following studied items better than novel items, as neither class of items is very similar to items in the exception store [15]. The inability of RULEX alone to account for the difference in recognition between rule-following and novel items suggests that humans store more than a rule to represent rule-following items. In fact, even when a rule is explicitly applied to a novel item, humans are still somewhat sensitive to the similarity between the novel item and previously encountered examples [30]. Analogously, humans tend to rely on familiar instantiations of abstract features in medical diagnosis rather than on the abstract rules themselves [31].

2) Exemplar Representations: Unlike RULEX, exemplar models store every instance in memory as a separate trace. An item is classified into a category depending on the item’s relative similarity to all stored exemplars and their associations to the category. Regularities are represented by the shifting of attention to the rule-relevant dimension. The likelihood of recognizing a stimulus as a studied item is proportional to the absolute sum of its similarity to all stored exemplars. Exemplar models have a long history of explaining key psychological phenomena in the category learning research [32], [33], [34]. However, when all stored exemplars share the same attention along a dimension, exemplar models treat rule-violating and rule-following items in the same fashion and cannot account for the memory advantage for exceptions.

Exemplar models can be extended to treat exceptions and rule-following items separately by allowing each stored exemplar to have its own attention [35]. Some laboratory work does suggest that humans attend to different dimensions of an item depending on the context the item is in, and thus attention is specific to the region along a dimension in the representational space [36], [37], [38]. For example, humans may attend to the color dimension when shopping for clothing but not as much when shopping for a computer. With exemplar-specific attention, exemplar models can predict the memory advantage for rule-violating items; attention is distributed uniformly for exemplars encoding exceptions but is allocated to the rule dimension for exemplars encoding rule-following items. This differential attention makes exceptions distinctive in memory. Exemplar models with exemplar-specific attention can show a recognition advantage for exception B1 over exception A1 by responding to larger prediction errors and ignoring smaller errors, leading to more attention learning for B1 than for A1 [35].

III. MODEL

Although the extended exemplar models can account for Sakamoto and Love’s (2004) recognition data, these models cannot successfully fit the recognition and classification data simultaneously. One problem is that once the models remember the exceptions to the same extent that humans do, the models do not make classification errors on these exceptions. In the laboratory experiments, humans make classification errors on the exceptions in the transfer phase even when
they have mastered these items in the initial learning, because memory strength decays in humans. Although there are models of memory and forgetting [39], [40], [41], [42], previous models of category learning do not incorporate memory decay mechanisms. Thus, once the knowledge structures are developed during training, these structures remain unchanged in the recognition and transfer classification phases.

A. Incorporating Forgetting in the Model

In the current work, we incorporate a simple forgetting mechanism into an exemplar model. We advance that each experience leads to some changes in human memory. Encountering an item leaves a new memory trace, although it may not always be available for retrieval. In our model, each item is encoded in memory when it is encountered for the first time. Each stored item or exemplar has its own attention and association weights. An attention weight determines which dimension is important for the given situation (i.e., exemplar). An association weight determines how strongly an exemplar is associated with each category. When the same item is encountered later on, the model changes its memory by adjusting the attention and association weights. The attention and association weights decay over time. Given the close relationship between attention and memory [3], [4], we interpret the attention weights of an exemplar as the memory strengths for the exemplar. Thus, whether an exemplar is available for retrieval is determined by its attention weights, allowing the model to have no memory for an exemplar at one time but perfect memory at another time. Similarly, humans can retrieve previously unavailable information in response to some cues [7], suggesting that information is not completely lost. Using memory decay mechanisms, our model can simultaneously explain human recognition and classification performances that previous models cannot. In the next section, we formalize our model.

B. Formalism

The model stores every training item as a separate trace. The probability that a stimulus x_i is classified into category K is determined by:

P(K \mid x_i) = \frac{e^{\phi \cdot O_K(x_i)}}{\sum_k e^{\phi \cdot O_k(x_i)}}    (1)
where k indexes the categories and the parameter φ controls the decisiveness of the classification response. O_K(x_i) is the category K output activation given a stimulus x_i, defined as:

O_K(x_i) = \sum_j \omega_{jK} \cdot S(x_i, y_j)    (2)
where j indexes the stored exemplars and ω_{jK} is the strength of association between exemplar y_j and category K. S(x_i, y_j) is the similarity between stimulus x_i and stored exemplar y_j, given by:

S(x_i, y_j) = \prod_m e^{-c \cdot \alpha_{jm} \cdot |x_{im} - y_{jm}|} \cdot c^{\alpha_{jm} - 1}    (3)

where m indexes the dimensions, the free parameter c scales the overall strength of similarity, and α_{jm} is the attentional weight for the mth dimension of exemplar y_j.
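As a concrete illustration of Equations 1–3, the sketch below computes the similarity of a stimulus to each stored exemplar and the resulting category probabilities. It is a minimal Python reconstruction under our own assumptions (numpy arrays, one row of exemplar-specific attention weights per stored trace), not the authors' implementation.

    import numpy as np

    def similarity(x, exemplar, attention, c):
        """Equation 3: similarity between stimulus x and one stored exemplar."""
        mismatch = np.abs(x - exemplar)                     # 0 or 1 per dimension
        per_dim = np.exp(-c * attention * mismatch) * c ** (attention - 1.0)
        return np.prod(per_dim)

    def classify(x, exemplars, attention, assoc, c, phi):
        """Equations 1-2: category choice probabilities for stimulus x.

        exemplars: (n, d) stored items; attention: (n, d) exemplar-specific
        attention (memory) weights; assoc: (n, k) exemplar-to-category weights.
        """
        sims = np.array([similarity(x, e, a, c)
                         for e, a in zip(exemplars, attention)])
        outputs = sims @ assoc                  # Eq. 2: category output activations
        expo = np.exp(phi * outputs)
        return expo / expo.sum()                # Eq. 1: choice probabilities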
Fig. 1. Similarity is shown as a function of dimension match. As displayed in the left side, when c = 1, dimension match leads to similarity of 1 regardless of the attention strength, as in the similarity calculation mechanisms in most previous models of category learning. Because attention strengths represent memory strengths in the present work, we modified the similarity function such that dimension match becomes more similar when memory strength is stronger as shown in the right side when c = 1.5.
Figure 1 shows the similarity S(x, y) when c = 1 on the left side and when c = 1.5 on the right side. In the current simulation, binary-valued dimensions are used, and thus |x_{im} - y_{jm}| is either 0 or 1. The value next to each line in Figure 1 is the attention weight. As can be seen on the left side of Figure 1, when c = 1, a dimension match (i.e., |x_{im} - y_{jm}| = 0) leads to a similarity of 1 regardless of the attention strength, as in the similarity calculation mechanisms of most previous models of category learning. Because attention strengths represent memory strengths in the present work, we modified the similarity function such that a dimension match contributes more similarity when memory strength is stronger, as shown on the right side of Figure 1.

The probability of choosing the studied (s) item over the novel (n) item in the recognition phase is determined by the exponential decision function:

P(s \mid s, n) = \frac{e^{\rho \cdot F(s)}}{e^{\rho \cdot F(s)} + e^{\rho \cdot F(n)}}    (4)
where the parameter ρ controls the decisiveness of the recognition response and F(s) is the model's familiarity for the studied item. The familiarity F(x_i) of a stimulus x_i is determined by the summed similarity of the stimulus to the stored exemplars of both categories A and B:

F(x_i) = \sum_j S(x_i, y_j)    (5)

where S(x_i, y_j) is defined in Equation 3.
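Continuing the same hypothetical sketch, the recognition decision of Equations 4 and 5 can be written as follows; familiarity is the summed similarity to every stored exemplar, and the 2AFC choice compares the two familiarities through the exponential decision rule (function names are our own).

    def familiarity(x, exemplars, attention, c):
        """Equation 5: summed similarity of x to all stored exemplars."""
        return sum(similarity(x, e, a, c) for e, a in zip(exemplars, attention))

    def p_choose_studied(studied, novel, exemplars, attention, c, rho):
        """Equation 4: probability of picking the studied item in a 2AFC pair."""
        f_s = familiarity(studied, exemplars, attention, c)
        f_n = familiarity(novel, exemplars, attention, c)
        return np.exp(rho * f_s) / (np.exp(rho * f_s) + np.exp(rho * f_n))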
During the learning phase, the model updates the association (ω) and attention (α) weights after the presentation of each training item x_i in response to the prediction error E:

E_k(x_i) = t_k - O_k(x_i)    (6)
where

t_k = \begin{cases} \max[O_k(x_i), 1] & \text{if } x_i \text{ is in category } k \\ \min[O_k(x_i), 0] & \text{if } x_i \text{ is not in category } k \end{cases}    (7)

This kind of teaching signal is referred to as a “humble teacher” [32], in which the model is not penalized for predicting the correct response more strongly than is necessary. Association weights are adjusted according to:

\Delta\omega_{jk} = \lambda \cdot S(x_i, y_j) \cdot E_k(x_i)    (8)
where the learning rate parameter λ exaggerates errors such that larger prediction errors, or surprises, lead to more learning [43]. The attention weight update is defined as:

\Delta\alpha_{jm} = \lambda \cdot S(x_i, y_j) \cdot D_m \cdot \sum_k E_k(x_i)^2    (9)
where D_m is the model's tendency to focus on information that is diagnostic in discriminating members of different categories [44] and is given by:

D_m = D_m + 1 - |p_{Km} - x_{im}| + \sum_{k \neq K} |p_{km} - x_{im}|    (10)
where p_{Km} is the last item encountered from the same category as the current stimulus x_i and p_{km} is the last item from each category that is different from the category to which the current stimulus belongs. Thus, the model learns the dimension values that are shared within categories but not shared between categories and uses this information to adjust its attention weights.

In the recognition and transfer phases, association strengths decay in the following fashion:

\Delta\omega_{jk} = \begin{cases} +\eta_\omega \cdot S(x_i, y_j) & \text{if } \omega_{jk} < 0 \\ -\eta_\omega \cdot S(x_i, y_j) & \text{if } \omega_{jk} \geq 0 \end{cases}    (11)

where the free parameter η_ω determines the rate of decay. Attention strengths decay as follows:

\Delta\alpha_{jm} = -\eta_\alpha \cdot S(x_i, y_j)    (12)

where the free parameter η_α determines the rate of decay. Attention strengths are constrained to be nonnegative [32]. Association and attention strengths decay as a function of the similarity between an exemplar and the current item.
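The learning and forgetting rules of Equations 6–12 can be sketched as below, again extending the hypothetical functions above. The handling of D_m (using the most recent item seen from each category) and the sign conventions of the decay steps follow our reading of the equations and should be taken as an illustrative approximation, not the authors' code.

    def learn_trial(x, true_cat, exemplars, attention, assoc, D, last_seen, c, lam):
        """One feedback trial: error (Eq. 6), humble teacher (Eq. 7),
        association update (Eq. 8), attention update (Eqs. 9-10)."""
        sims = np.array([similarity(x, e, a, c)
                         for e, a in zip(exemplars, attention)])
        outputs = sims @ assoc
        teacher = np.where(np.arange(assoc.shape[1]) == true_cat,
                           np.maximum(outputs, 1.0), np.minimum(outputs, 0.0))
        error = teacher - outputs
        # Eq. 10: reward dimension values shared with the last item of the same
        # category and not shared with the last items of the other categories.
        for m in range(len(x)):
            same = abs(last_seen[true_cat][m] - x[m])
            diff = sum(abs(last_seen[k][m] - x[m])
                       for k in last_seen if k != true_cat)
            D[m] += 1.0 - same + diff
        assoc += lam * np.outer(sims, error)                       # Eq. 8
        attention += lam * np.outer(sims, D) * np.sum(error ** 2)  # Eq. 9
        last_seen[true_cat] = x

    def decay(x, exemplars, attention, assoc, c, eta_w, eta_a):
        """Memory decay on each recognition or transfer trial (Eqs. 11-12)."""
        sims = np.array([similarity(x, e, a, c)
                         for e, a in zip(exemplars, attention)])
        assoc += sims[:, None] * np.where(assoc < 0, eta_w, -eta_w)   # Eq. 11
        attention -= eta_a * sims[:, None]                            # Eq. 12
        np.clip(attention, 0.0, None, out=attention)   # attention stays nonnegative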
Fig. 2. Attention strength for each dimension of the exceptions and rule-following items is shown after the learning (left), recognition (center), and the transfer phases (right). Memory trace was strongest for the most predictive, first dimension. The error-driven learning mechanism led exceptions to have stronger memory than rule-following items throughout the experiment. When memory becomes weaker in the transfer phase, different items begin to look more similar as suggested by the similarity gradient in Figure 1, leading to many classification errors on the exceptions in the transfer phase.
C. Simulation

The model was fit to the mean learning, recognition, and transfer accuracies provided by human participants in Sakamoto and Love [16]. The general philosophy of the model fits was to match the procedures applied to human participants and to the model as closely as possible. For instance, the same trial randomization procedures and training criteria were used in the original study and in the model simulations. There were six free parameters: η_α, η_ω, λ, c, φ, and ρ. A genetic algorithm was used to search for a good set of parameters that minimized the root mean squared deviation (RMSD). One hundred simulated participants completed the learning, recognition, and transfer phases.

As shown in Table II, the model was able to capture the observed pattern (RMSD = 0.052), with η_α = .235, η_ω = .035, λ = 6.480, c = 1.250, φ = 4.360, and ρ = 2.510. For comparison, Table II also shows that the model cannot predict the observed results without the forgetting mechanism. The model without forgetting over-predicts recognition memory for rule-following items and transfer classification performance for the exceptions.

Figure 2 shows a typical simulated participant's memory strength (i.e., attention weight) for each dimension of the exceptions and rule-following items after the learning, recognition, and transfer phases. The memory trace was strongest for the first dimension. Exceptions led to stronger memory than rule-following items throughout the experiment. When memory becomes weaker in the transfer phase, different items begin to look more similar, as suggested by the similarity gradient in Figure 1, leading to many classification errors on the exceptions in the transfer phase. In the last block of the learning phase, the model, like humans, had high classification accuracies on the exceptions.
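The parameter search itself can be illustrated with a generic fitting loop. The original study used a genetic algorithm; the random-search sketch below, with a hypothetical run_experiment function that simulates the learning, recognition, and transfer phases for a set of parameters and returns the twelve predicted cell means of Table II, conveys the same RMSD-minimization idea under our own assumptions.

    OBSERVED = np.array([.46, .44, .92, .86,   # learning:    A1, B1, RA, RB
                         .79, .87, .70, .69,   # recognition: A1, B1, RA, RB
                         .56, .64, .86, .85])  # transfer:    A1, B1, RA, RB

    def rmsd(predicted, observed=OBSERVED):
        return np.sqrt(np.mean((np.asarray(predicted) - observed) ** 2))

    def fit(run_experiment, n_candidates=500, seed=0):
        """Crude random search over the six free parameters (a GA was used originally)."""
        rng = np.random.default_rng(seed)
        best_score, best_params = np.inf, None
        for _ in range(n_candidates):
            params = dict(eta_a=rng.uniform(0, 1), eta_w=rng.uniform(0, 1),
                          lam=rng.uniform(0, 10), c=rng.uniform(1, 2),
                          phi=rng.uniform(0, 10), rho=rng.uniform(0, 10))
            score = rmsd(run_experiment(**params))  # mean over simulated participants
            if score < best_score:
                best_score, best_params = score, params
        return best_params, best_score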
IV. CONCLUSIONS

We presented a computational model of human category learning that developed knowledge structures through trial-by-trial classification learning with corrective feedback. The model detected category regularities by shifting attention to diagnostic information and learning associations between items and categories. Like humans, the model showed sensitivity to information that deviated from the acquired knowledge structures by responding to prediction errors. The model remembered exception B1 better than exception A1 because B1 had many competing rule-following items from category A, leading to more prediction errors than A1. This explanation is consistent with the contextual interference effects in which interference during learning, such as the simultaneous presentation of competing stimuli, can facilitate retention [45]. Items with more contextual interference require deeper processing and, once mastered, are better remembered. Similarly, deviant items tend to be processed more fully and deeply because they violate the context and are harder to process [46].

Attention weights determined the likelihood of encoding and retrieval and were interpreted as memory strengths in the model. Attention and association strengths decayed over time, allowing the model to focus on the essential structures and forget other information that is not useful for the given task. Using memory decay mechanisms, our model was able to simultaneously explain human recognition and classification performances that previous models could not.

Future work should test our model in different category learning experiments and in other studies that directly examine human memory. A simple linear forgetting mechanism operating on attention and association weights may lead to the observed curvilinear memory decay (e.g., exponential or power) in
recall [42]. Furthermore, future work should examine how the length of delay affects the degree of forgetting.

ACKNOWLEDGMENT

This research was supported by the Office of Naval Research, grant #N00014-05-1-00632.

REFERENCES

[1] J. W. Alba and L. Hasher, “Is memory schematic?” Psychological Bulletin, vol. 93, pp. 203–231, 1983.
[2] W. F. Brewer and G. V. Nakamura, “The nature and functions of schemas,” in Handbook of social cognition, R. S. Wyer and R. K. Srull, Eds. Hillsdale, NJ: Erlbaum, 1984, vol. 1, pp. 119–160.
[3] D. Broadbent, Perception & Communication. New York: Pergamon Press, 1958.
[4] N. Cowan, “Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system,” Psychological Bulletin, vol. 104, pp. 163–191, 1988.
[5] R. L. Goldstone, “Similarity, interactive activation, and mapping,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 20, pp. 3–28, 1994.
[6] G. R. Loftus and N. H. Mackworth, “Cognitive determinants of fixation location during picture viewing,” Journal of Experimental Psychology: Human Perception and Performance, vol. 4, pp. 565–572, 1978.
[7] J. W. Pichert and R. C. Anderson, “Taking different perspectives on a story,” Journal of Educational Psychology, vol. 69, pp. 309–315, 1977.
[8] P. G. Schyns and G. L. Murphy, “The ontogeny of part representation in object concepts,” in The Psychology of Learning and Motivation, D. L. Medin, Ed. San Diego, CA: Academic Press, 1994, vol. 31, pp. 305–354.
[9] T. K. Srull, “Person memory: Some tests of associative storage and retrieval models,” Journal of Experimental Psychology: Human Learning and Memory, vol. 7, pp. 440–463, 1981.
[10] E. J. Wisniewski and D. L. Medin, “On the interaction of theory and data in concept learning,” Cognitive Science, vol. 18, pp. 221–281, 1994.
[11] F. C. Bartlett, Remembering. London: Cambridge University Press, 1932.
[12] H. von Restorff, “Analyse von Vorgängen im Spurenfeld. I. Über die Wirkung von Bereichsbildungen im Spurenfeld [Analysis of processes in the memory trace. I. On the effect of group formations on the memory trace],” Psychologische Forschung, vol. 18, pp. 299–342, 1933.
[13] R. L. Fantz, “Visual experience in infants: Decreased attention to familiar patterns relative to novel ones,” Science, vol. 146, pp. 668–670, 1964.
[14] R. B. McCall and M. S. Carriger, “A meta-analysis of infant habituation and recognition memory performance as predictors of later IQ,” Child Development, vol. 64, pp. 57–59, 1993.
[15] T. J. Palmeri and R. M. Nosofsky, “Recognition memory for exceptions to the category rule,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 21, pp. 548–568, 1995.
[16] Y. Sakamoto and B. C. Love, “Schematic influences on category learning and recognition memory,” Journal of Experimental Psychology: General, vol. 133, pp. 534–553, 2004.
[17] ——, “Vancouver, Toronto, Montréal, Austin: Enhanced oddball memory through differentiation, not isolation,” Psychonomic Bulletin & Review, vol. 13, pp. 474–479, 2006.
[18] G. H. Bower, J. B. Black, and T. J. Turner, “Scripts in memory for text,” Cognitive Psychology, vol. 11, pp. 177–220, 1979.
[19] G. S. Goodman, “Picture memory: How the action schema affects retention,” Cognitive Psychology, vol. 12, pp. 473–495, 1980.
[20] R. Hastie and P. A. Kumar, “Person memory: Personality traits as organizing principles in memory for behaviors,” Journal of Personality and Social Psychology, vol. 37, pp. 25–38, 1979.
[21] R. R. Hunt and C. A. Lamb, “What causes the isolation effect?” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 27, pp. 1359–1366, 2001.
[22] K. Koffka, Principles of Gestalt Psychology. New York: Harcourt, Brace, 1935.
[23] K. Pezdek, T. Whetstone, K. Reynolds, N. Askari, and T. Dougherty, “Memory for real-world scenes: The role of consistency with schema expectation,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, pp. 587–595, 1989.
[24] K. Rojahn and T. F. Pettigrew, “Memory for schema-relevant information: A meta-analytic resolution,” British Journal of Social Psychology, vol. 31, pp. 81–109, 1992.
[25] T. Valentine, “A unified account of the effects of distinctiveness, inversion, and race in face recognition,” The Quarterly Journal of Experimental Psychology, vol. 43A, pp. 161–204, 1991.
[26] C. Stangor and D. McMillan, “Memory for expectancy-congruent and expectancy-incongruent information: A review of the social and social developmental literatures,” Psychological Bulletin, vol. 111, pp. 42–61, 1992.
[27] R. M. Nosofsky, T. J. Palmeri, and S. C. McKinley, “Rule-plus-exception model of classification learning,” Psychological Review, vol. 101, pp. 53–79, 1994.
[28] S. Pinker, “Rules of language,” Science, vol. 253, pp. 530–535, 1991.
[29] E. E. Smith, C. Langston, and R. E. Nisbett, “The case for rules in reasoning,” Cognition, vol. 16, pp. 1–40, 1992.
[30] S. W. Allen and L. R. Brooks, “Specializing the operation of an explicit rule,” Journal of Experimental Psychology: General, vol. 120, pp. 3–19, 1991.
[31] L. R. Brooks, G. R. Norman, and S. W. Allen, “Role of specific similarity in a medical diagnostic task,” Journal of Experimental Psychology: General, vol. 120, pp. 278–287, 1991.
[32] J. K. Kruschke, “ALCOVE: An exemplar-based connectionist model of category learning,” Psychological Review, vol. 99, pp. 22–44, 1992.
[33] D. L. Medin and M. M. Schaffer, “Context theory of classification learning,” Psychological Review, vol. 85, pp. 207–238, 1978.
[34] R. M. Nosofsky, “Attention, similarity, and the identification-categorization relationship,” Journal of Experimental Psychology: General, vol. 115, pp. 39–57, 1986.
[35] Y. Sakamoto, T. Matsuka, and B. C. Love, “Dimension-wide vs. exemplar-specific attention in category learning and recognition,” in Proceedings of the 6th International Conference of Cognitive Modeling, M. Lovett, C. Schunn, C. Lebiere, and P. Munro, Eds. Mahwah, NJ: Lawrence Erlbaum Associates, 2004, pp. 261–266.
[36] D. W. Aha and R. L. Goldstone, “Concept learning and flexible weighting,” in Proceedings of the 14th Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum Associates, 1992, pp. 534–539.
[37] L. W. Barsalou and D. L. Medin, “Concepts: Static dimensions or context-dependent representations?” Cahiers de Psychologie Cognitive, vol. 6, pp. 187–202, 1986.
[38] S. Lewandowsky, M. Kalish, and S. K. Ngang, “Simplified learning in complex situations: Knowledge partitioning in function learning,” Journal of Experimental Psychology: General, vol. 131, pp. 163–193, 2002.
[39] R. C. Atkinson and R. Shiffrin, “Human memory: A proposed system and its control processes,” in The psychology of learning and motivation, K. W. Spence and J. T. Spence, Eds. New York: Academic Press, 1968, vol. 2.
[40] G. Gillund and R. M. Shiffrin, “A retrieval model for both recognition and recall,” Psychological Review, vol. 91, pp. 1–67, 1984.
[41] J. S. Nairne, “Modeling distinctiveness: Implications for general memory theory,” in Distinctiveness and memory, R. R. Hunt and J. B. Worthen, Eds. New York, NY: Oxford University Press, 2006, pp. 27–46.
[42] J. T. Wixted, “On common ground: Jost’s (1897) law of forgetting and Ribot’s (1881) law of retrograde amnesia,” Psychological Review, vol. 111, pp. 864–879, 2004.
[43] R. A. Rescorla and A. R. Wagner, “A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement,” in Classical Conditioning: II. Current Research and Theory, A. H. Black and W. F. Prokasy, Eds. New York: Appleton-Century-Crofts, 1972, pp. 64–99.
[44] A. B. Markman and B. H. Ross, “Category use and category learning,” Psychological Bulletin, vol. 129, pp. 592–613, 2003.
[45] W. F. Battig, “The flexibility of human memory,” in Levels of processing and human memory, L. S. Cermak and F. I. M. Craik, Eds. Mahwah, NJ: Erlbaum, 1979, pp. 23–44.
[46] A. Friedman, “Framing pictures: The role of knowledge in automatized encoding and memory for gist,” Journal of Experimental Psychology: General, vol. 108, pp. 316–355, 1979.