
Psychological Bulletin 2007, Vol. 133, No. 2, 227–244

Copyright 2007 by the American Psychological Association 0033-2909/07/$12.00 DOI: 10.1037/0033-2909.133.2.227

Theories of Artificial Grammar Learning

Emmanuel M. Pothos
Swansea University

Artificial grammar learning (AGL) is one of the most commonly used paradigms for the study of implicit learning and the contrast between rules, similarity, and associative learning. Despite five decades of extensive research, however, a satisfactory theoretical consensus has not been forthcoming. Theoretical accounts of AGL are reviewed, together with relevant human experimental and neuroscience data. The author concludes that satisfactory understanding of AGL requires (a) an understanding of implicit knowledge as knowledge that is not consciously activated at the time of a cognitive operation; this could be because the corresponding representations are impoverished or they cannot be concurrently supported in working memory with other representations or operations, and (b) adopting a frequency-independent view of rule knowledge and contrasting rule knowledge with specific similarity and associative learning (co-occurrence) knowledge.

Keywords: artificial grammar learning, implicit learning, rules, similarity, associative learning

This paper reviews and evaluates the main theoretical accounts in artificial grammar learning (AGL). Since its conception (A. S. Reber, 1967), the AGL paradigm has been extensively used to address two core issues in cognitive science: (a) What is the form of the knowledge acquired as a result of learning? and (b) Is the knowledge acquired implicit or explicit? AGL lends itself exceptionally well to the study of these issues: First, the regularity of AGL stimuli can be described in many ways. Thus, many influential hypotheses about learning have been applied in AGL. Second, AGL appears to take place both in a passive mode and in an explicit hypothesis-testing mode. Finally, AGL performance appears to be guided sometimes by conscious knowledge and other times by intuition without conscious awareness. A PsycINFO search with the term artificial grammar learning produced 125 hits, a figure that underestimates the number of AGL studies, as the AGL paradigm is often used without being directly referred to in the title or the abstract.

This article contains a review of nine theoretical accounts: rules (A. S. Reber, 1967), microrules (Dulany, Carlson, & Dewey, 1984), fragments (Perruchet & Pacteau, 1990), the chunking hypothesis (Knowlton & Squire, 1996), specific similarity (Brooks & Vokey, 1991), generalized context model (GCM) similarity (Pothos & Bailey, 2000), simple recurrent networks (SRNs; Boucher & Dienes, 2003), parser (Perruchet, Vinter, Pacteau, & Gallego, 2002), and fluency (Kinder, Shanks, Cock, & Tunney, 2003). In this article, I show that AGL theories do an excellent job in accounting for aspects of AGL performance but that not all of them may be needed. Moreover, some contrasts among AGL theories are ill-defined. Finally, there are inconsistencies in claims of implicit learning. These problems reduce the capacity of AGL research to develop and have an impact in cognitive science generally. It has been suggested (Brooks & Vokey, 1991; Knowlton & Squire, 1996; Manza & Reber, 1997; Pothos & Bailey, 2000; A. S. Reber, 1993) that a satisfactory AGL account should be forthcoming from a combination of existing theories, but an adequate synthesis has been elusive. The first aim of this article is to develop such a synthesis. The second aim is to examine how theoretical claims in AGL theory generalize. AGL research is useful only to the extent that AGL findings can be related to cognitive science research generally. For example, how does the notion of rules in AGL relate to rules in categorization?

To achieve these objectives, I characterize each AGL theory along two dimensions: the form of the knowledge acquired and whether this knowledge is implicit or explicit. With respect to the first dimension, the roots of specific theories of learning can be traced to rules, similarity, or associative learning (Pothos, 2005a), and this characterization is pursued in AGL as well. The contrasts among rules, similarity, and associative learning and between implicit and explicit cognition have been extremely controversial. Accordingly, I discuss some common conceptions of rules, similarity, associative learning, and implicit and explicit cognition before I review the AGL literature. Following this review, the utility of AGL is assessed as to how it can inform these debates.

I am very grateful to Todd Bailey, Nick Chater, Axel Cleeremans, Zoltan Dienes, Arthur Reber, David Shanks, and Katy Tapper for their help with the manuscript. I am solely responsible for the ideas presently expressed. This research has been partly supported by European Community Framework 6 Grant, Contract 516542. Correspondence concerning this article should be addressed to Emmanuel M. Pothos, Department of Psychology, Swansea University, Swansea SA2 8PP, United Kingdom. E-mail: [email protected]

Figure 1. An example of a finite state language (from Knowlton & Squire, 1996), where the symbols corresponding to the different transitions are letters and therefore the resulting sequences are letter strings. The circles are the states of the language. Every time a legal transition is made between states, the letter corresponding to this transition is added, until a transition is made to one of the OUT states. For example, whereas string XXVT is G (grammatical), string XT is not.

Artificial Grammar Learning

In AGL, a finite state language (Figure 1) is used to specify a set of continuation relations among symbols, including beginning and end states, so that sequences of symbols can be constructed (Chomsky & Miller, 1958). Any finite state language divides the sequences made from these symbols into ones consistent with the finite state language (grammatical; G) and ones that violate it (ungrammatical; NG). Test G sequences are usually different from training ones. AGL is the experimental paradigm used to examine the knowledge that results from the study of symbol sequences

generated by a finite state language. The majority of AGL experiments involve letter strings, such as MSSV. G sequences are typically fewer than NG ones, as not all transitions between states of a finite state grammar are possible. Miller (1958) computed the redundancy of G sequences relative to NG ones and showed that the former are typically more predictable than the latter (Miller, 1958). A. S. Reber (1967) presented what became the standard AGL paradigm. In a training phase, participants observe a subset of the G sequences without any information about how the sequences were created or about the test phase. Subsequently, in a test phase, they are told that the training items were generated by a complex set of rules and that they would have to discriminate between novel G and NG items with no feedback. A robust finding is that participants can discriminate between G and NG sequences with above-chance accuracy. Participants consistently favor the G items in test; thus, they are sensitive to the regularity in the G items relative to the NG ones. However, this finding is not important in its own right. AGL research has flourished because of the multitude of psychological theories that have been proposed to explain AGL performance, relating notably to rules, similarity, or associative learning and implicit versus explicit cognition. Other paradigms have been used to address these issues. The most relevant is research with biconditional grammars, in which each stimulus has two parts such that the symbols in the first part determine the symbols in the second (Mathews et al., 1989; Shanks, Johnstone, & Staggs, 1997). Biconditional grammars are well suited for examining the ability of participants to learn rules embedded in complex stimulus domains (see also Dienes & Kuhn, 2005; Dienes & Longuet-Higgins, 2004). Also related is the serial reaction time (SRT) task, in which participants must rapidly identify the screen region where a probe appears. 
The screen is divided into as many regions as the symbols corresponding to the transitions in a finite state language. The sequence of regions in which the probe appears corresponds to sequences of the finite state language, although participants are not told that. Results with the SRT task have been very prominent in the implicit versus explicit debate (Cleeremans & McClelland, 1991; Rah, Reber, & Hsiao,

2000), much less so in characterizing learning in terms of rules, similarity, and so forth. In AGL, learning in terms of rules, similarity, or associations is a priori equally plausible, likewise for implicit and explicit learning. Thus, AGL reflects a different type of learning from that in biconditional grammars and the SRT. Accordingly, findings from these paradigms are used sparingly here.
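The G versus NG distinction described in this section can be made concrete with a minimal sketch. The grammar below is a hypothetical toy finite state language, not the one in Figure 1; the state numbers, the `TRANSITIONS` table, and the function name `is_grammatical` are all assumptions introduced for illustration only.

```python
# Hypothetical toy finite state language (NOT the grammar in Figure 1).
# Each entry maps (current state, letter) to the next state.
TRANSITIONS = {
    (0, "M"): 1,
    (0, "V"): 2,
    (1, "S"): 1,  # loop: S may repeat
    (1, "V"): 3,
    (2, "X"): 2,  # loop: X may repeat
    (2, "T"): 3,
}
START_STATE = 0
OUT_STATES = {3}  # a string is G only if it ends in an OUT state


def is_grammatical(string, transitions=TRANSITIONS,
                   start=START_STATE, out_states=OUT_STATES):
    """Return True if every letter follows a legal transition and the
    string ends in an OUT state of the (toy) finite state language."""
    state = start
    for letter in string:
        key = (state, letter)
        if key not in transitions:
            return False  # illegal transition: the string is NG
        state = transitions[key]
    return state in out_states


if __name__ == "__main__":
    for item in ["MSSV", "VXT", "MT", "MS"]:
        print(item, "G" if is_grammatical(item) else "NG")
```

Under this toy grammar, MSSV and VXT are G, whereas MT (illegal transition) and MS (does not reach an OUT state) are NG. A test phase in the standard paradigm would present novel strings of both kinds.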

Rules, Similarity, and Associative Learning

It is impossible to present the rules, similarity, and associative learning debate in its full complexity within such a limited space. My purpose is to sample enough of the relevant theory so as to characterize AGL research. The rule intuition is that “we recognize why 24683 is an odd number, and why Priscilla Presley is a grandmother. . ., know that an offspring of raccoons that looks and acts like a skunk is nonetheless not a skunk. . ., joke that one cannot be a little pregnant” (Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995, p. 245). A rule is a mental operation that allows characterization of a stimulus by examining only a part of it (Pothos, 2005a; E. Smith, Patalano, & Jonides, 1998). Rules typically allow certain conclusions (Sloman & Rips, 1998). In categorization, rules often have the form of critical features, that is, features whose presence deterministically affects classification of the corresponding object. For example, suppose that, as a result of a toxic accident, a bird is transformed so that it looks like an insect, but it can still mate with other birds. Participants would consider the transformed creature to be a bird, rather than an insect; thus, mating is a critical feature of the category “birds” (Rips, 1989). In language, we often assume that grammar and syntax are represented as a network of abstract rules, as it appears that we can readily recognize novel sentences as grammatical or not, regardless of their frequency (e.g., “colorless green ideas sleep furiously” is meaningless and thus has zero prior probability; Chomsky, 1965; Marcus et al., 1995; Pinker, 1994). Similarity is the mental operation that allows us to compare two instances and establish their commonalities.
When applying a rule we only consider a particular aspect of a target representation (e.g., can a creature mate with other birds?), whereas judging the similarity between two instances involves consideration of all the characteristics of the instances. Similarity operations are flexible; given two instances, it is always possible to establish some sort of similarity judgment. An influential tradition in categorization assumes that category exemplars are stored, and the potential membership of new instances to a category is determined through a process of (whole) exemplar similarity (Nosofsky, 1988; Nosofsky & Zaki, 1998). Recently, Chater and Hahn (1997; Chater, 1999) defined exemplar similarity between two objects as the ease with which one object can be transformed into the other. Other formalisms explain similarity as a process of feature overlap. According to Tversky (1977), similarity between two objects increases and decreases as a function of the objects’ common and mismatching features, respectively (for objects represented as lists of features). Comparing rules with similarity has been an extremely controversial issue. Goodman (1972) criticized similarity for being so flexible as to be vacuous. However, the same criticism can be made for any explanatory framework as general as similarity and applies equally to a general rule framework or an associative learning one (Boucher & Dienes, 2003; Churchland, 1990).
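The feature-overlap view attributed to Tversky (1977) can be sketched in a few lines. The sketch assumes the contrast-model form, in which similarity is a weighted count of common features minus weighted counts of the features distinctive to each object. Measuring feature salience by simple set size, the default weights, and the example feature lists are all illustrative assumptions, not values from the text.

```python
def contrast_similarity(features_a, features_b,
                        theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity for objects given as feature lists.

    Similarity increases with common features and decreases with
    mismatching ones. Salience is measured here by simple counts,
    and the weights theta, alpha, beta are hypothetical defaults.
    """
    a, b = set(features_a), set(features_b)
    common = len(a & b)   # features shared by both objects
    only_a = len(a - b)   # features of A missing from B
    only_b = len(b - a)   # features of B missing from A
    return theta * common - alpha * only_a - beta * only_b


if __name__ == "__main__":
    bird = ["wings", "feathers", "beak", "flies"]
    bat = ["wings", "fur", "flies"]
    print(contrast_similarity(bird, bat))   # two common, three mismatching
    print(contrast_similarity(bird, bird))  # identical feature lists
```

Unlike a rule, which inspects one critical feature, this computation takes every listed feature of both objects into account, which is the contrast drawn in the text.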


In associative learning, learning is guided by sensitivity to contingency relations between the elements in our environment (Wasserman & Miller, 1997). The simplest form of contingency information is co-occurrence, in which the cognitive system tracks which elements co-occur. Forward transitional probability was made prominent by the use of SRNs (Elman, 1990) in cognitive modeling. The relation of associative learning theory with rules and similarity is controversial. Some argue that the relation is that of algorithmic and computational-level explanations, as associative learning theory is concerned with the process of learning and rules, similarity with the outcome of learning (Cheng, 1997). Others have suggested that associative learning knowledge supports generalization that can be understood in terms of similarity but not rules (Shanks & Darby, 1998; Wills & Mackintosh, 1998). In fact, associative learning processes are sometimes thought to be fundamentally incompatible with rules, but this view is not universally accepted (Gentner & Medina, 1998; Goldstone & Barsalou, 1998).
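As a minimal sketch of the contingency information mentioned above, forward transitional probability between adjacent symbols can be estimated from a set of training strings as the number of times a symbol pair occurs divided by the number of times its first symbol is followed by anything. The function name and the example strings are hypothetical; this is plain counting, not an SRN.

```python
from collections import Counter


def forward_transitional_probabilities(strings):
    """Estimate P(next symbol | current symbol) from adjacent pairs."""
    pair_counts = Counter()   # how often symbol x is followed by y
    first_counts = Counter()  # how often symbol x is followed by anything
    for s in strings:
        for x, y in zip(s, s[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    return {pair: n / first_counts[pair[0]]
            for pair, n in pair_counts.items()}


if __name__ == "__main__":
    training = ["MSSV", "MSV", "VXT"]  # hypothetical training items
    probs = forward_transitional_probabilities(training)
    for (x, y), p in sorted(probs.items()):
        print(f"P({y} | {x}) = {p:.2f}")
```

The raw `pair_counts` correspond to simple co-occurrence information, whereas dividing by `first_counts` gives the directional (forward) measure.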

Implicit Versus Explicit Cognition

There have been two influential but somewhat separate traditions in studies of implicit cognition. Schacter and colleagues (Schacter, 1992; Schacter, Chiu, & Ochsner, 1993) defined implicit memory as “facilitation or change in test performance that is attributable to information or skills acquired during a prior study episode, even though subjects are not required to, and may even be unable to, recollect the study episode” (p. 159). For example, in a stem-completion task, participants might spontaneously prefer words studied previously, even if they have no explicit recollection of these words. Hayes and Broadbent (1988; see also Berry & Dienes, 1993) characterized implicit learning as “unselective and passive aggregation of information about the co-occurrence of environmental events and features” (p. 251) and postulated an independent explicit learning system guided by hypothesis testing. Moreover, Litman and Reber (2005; A. S. Reber, 1993) considered implicit learning as learning taking place independently of an intention or effort to learn or the actual process of learning, and as occurring independently of the products of learning. The relation between implicit learning and implicit memory is unclear. First, implicit memory experiments typically rely on the incidental activation of knowledge we already have, whereas implicit learning results in the creation of new knowledge (the knowledge that allows a distinction between G and NG sequences). Second, Schacter et al. (1993) noted that priming (as the main indicator of implicit memory) is largely modality specific and plausibly mediated by a visual processing module (Williams & Tarr, 1997). However, in AGL there is extensive evidence of transfer across modalities (Altmann et al., 1995; Conway & Christiansen, in press).
A likely source of this difference is that AGL stimuli embody many kinds of structural regularities, whereas Schacter et al.’s visual priming stimuli have simple (figural goodness) structure. Despite these differences, there is consensus in that implicit memory and implicit learning both involve the influence of knowledge on a cognitive process without conscious activation of this knowledge (Dienes & Perner, 1999). Intuitively, it seems clear that there are kinds of knowledge about which we have little conscious introspection, such as motor skills, much of language, the rules of social interaction, and so


forth (A. S. Reber, 1993). However, empirically grounding the implicit–explicit distinction has been controversial. Shanks and St. John (1994) argued that in all claims that knowledge X in a task is implicit, it is either the case that knowledge X is actually explicit (as shown with more accurate measures of conscious awareness) or that participants do not employ knowledge X, but rather X′, where X′ is typically less complex than X and is explicit. Recent implicit versus explicit arguments follow this pattern (Lovibond & Shanks, 2002; Shanks, 2005). The SRT has been the other major paradigm utilized in the implicit versus explicit debate. Recently, Destrebecqz and Cleeremans (2001) manipulated the interval between response and (next) stimulus: A 0-ms interval was meant to foster implicit learning, as it allows no time for conscious anticipation. Indeed, with a 0-ms interval, participants instructed not to generate the SRT sequence were unable to do so, demonstrating a lack of strategic control of their knowledge (and thus implicit knowledge). However, these results were challenged by Wilkinson and Shanks (2004). Generally, there have been several demonstrations of strategic application of knowledge in AGL (Conway & Christiansen, 2005; Dienes et al., 1995; Higham et al., 2000). Note that measures of explicit knowledge are typically more reliable and accurate than measures of implicit knowledge (Buchner & Wippich, 2000; Tunney, 2005), but this is arguably an indicator that implicit and explicit processes are distinct (Dienes, 2004). Seeking a demonstration of implicit influences on performance may be methodologically impossible, and some researchers have argued that the effort is misguided (Dulany, 2003; A. S. Reber, 1993). Is it meaningful to retain an implicit–explicit distinction even if there are no clear empirical markers of implicit influences (cf. Shanks, 2005)?
This would be the case if the behaviors labeled “implicit” differ in interesting ways from the ones labeled “explicit,” and if we can unambiguously identify which behaviors should be characterized as implicit and which as explicit (cf. Pothos, 2005a). In this vein, consider three arguments. First, implicit cognition typically involves processes that are automatic, nondirected, nonintentional, and sensitive to multiple regularities, whereas explicit cognition involves processes that are flexible, controlled, and directed (A. S. Reber, 1993). Second, the distinction between explicit hypothesis testing and implicit processing appears uniquely in human cognition, suggesting that explicit processing is a later evolutionary facility (Gray, 2004; A. S. Reber, 1993). Indeed, there is considerable evidence that implicit cognition is more robust against neurological damage, as compared with explicit cognition (the first demonstration of robust AGL with a patient population was described by Abrams & Reber, 1989; see also Don, Schellenberg, Reber, DiGirolamo, & Wang, 2003; Knowlton & Squire, 1996; cf. A. S. Reber’s, 1993, “primacy of the implicit” hypothesis). Third, having distinct implicit and explicit modes is functionally advantageous (Cleeremans, 2005). For example, Gray (2004) and Koch (2004) argued that the explicit system serves a late error-detection function for decisions already formed unconsciously, an intuition consistent with explicit learning as directed hypothesis search. This suggests that the difference between explicit and implicit cognition can be partly understood in terms of controlled versus automatic cognition (Higham, 1997a; Higham et al., 2000): Implicit processing is automatic and unaffected by explicit considerations, but with effort, the explicit system can sometimes


modulate implicit procedures. Explicit knowledge starts as slow and consciously modifiable by the conscious system, but with repetition, it gradually becomes automatized (A. S. Reber, personal communication, February 2006; cf. Logan, 1988). Moreover, implicit cognition may be well suited for processing complex stimuli. The higher the complexity of a stimulus domain, the greater the number of potential hypotheses for stimulus structure. Working memory restrictions impose a limitation on how many hypotheses can be concurrently considered explicitly (Baddeley, 1983) and thus would disadvantage the processing of complex stimuli. By contrast, there is evidence that the implicit system can process multiple regularities concurrently (Rah et al., 2000; J. Smith & McDowall, 2005). Note that short-term memory range limitations appear to apply equally to implicit and explicit processing (Cleeremans & McClelland, 1991), but this issue is different from the one currently of interest. Complexity can encourage implicit processing in AGL (A. S. Reber, 1976, 1989) and more generally (Halford, Wilson, & Philips, 1998; Sun, Merrill, & Peterson, 2001). Consistent with these ideas, Sun, Slusarz, and Terry (2005) modeled a wide range of learning findings with Connectionist Learning With Adaptive Rule Induction Online (CLARION). In CLARION, a selective hypothesis-testing explicit system competes with an “implicit” one that extracts complex statistical structure from information-rich stimulus domains. Criticism for the idea that complexity encourages implicit processing is pronounced in Competition Between Verbal and Implicit Systems (COVIS; Ashby, Alfonso-Reese, Turken, & Waldron, 1998), in which categorization is described as competition between a verbal system (under conscious control) and a nonverbal implicit/procedural system. 
The verbal system dominates initially; however, the implicit system gradually takes over (given that applying procedural knowledge is more efficient; Logan, 1988). Ashby et al. (1998) claimed that this competition is not a matter of complexity. The COVIS formulation, however, is a matter of current debate; hence, any corresponding implications should be considered with caution (Nosofsky, Stanton, & Zaki, 2005). In summary, defining the implicit– explicit distinction simply in terms of inaccessibility to consciousness appears both theoretically inappropriate and empirically indefensible (Dulany, 2003; Shanks, 2005; Shanks & St. John, 1994). Dienes and colleagues (Dienes, 2004; Dienes & Perner, 1999; Dienes & Scott, 2005) formulated a definition of implicit cognition that seems to be in line with most of the relevant theory (e.g., Dulany, 1997; A. S. Reber, 1993; Schacter, 1992), and his perspective is adopted in this work: Implicit knowledge is knowledge not consciously activated at the time of a cognitive process. Empirically, A is implicit if we believe we are guessing, and A is explicit if we are aware of any degree of knowing A. Moreover, I assume that a representation is consciously activated if it is activated within working memory (cf. A. S. Reber, 1989; Sun et al., 2005) and if it is of sufficient quality (as opposed to impoverished). The following inferences can now be made. First, knowledge A may be activated implicitly in one cognitive process and explicitly in another. For example, the representation of A may be impoverished such that additional effort is required to consciously activate A (Shanks, 2005; Shanks & St. John, 1994). Whether A is ultimately accessible to consciousness is irrelevant (nonprovable; Dulany, 1997). Dulany (2003, 2004) noted that some knowledge A may be consciously activated in different ways. For example,

conscious activation of a sub-propositional content (e.g., a fragment) may consciously activate another sub-propositional content (e.g., a sense of correctness). However, this association may also be represented in terms of an explicit proposition (cf. Dienes & Perner, 1999). Second, suppose that a cognitive process requires the activation of several pieces of information, X1, X2, ..., XN, where N ≫ 4 (cf. Cowan, 2001). The representations of X1, X2, ..., XN could not be concurrently activated in working memory, because their number exceeds working memory capacity. Accordingly, in the present account, activation of X1, X2, ..., XN would be implicit. This is the way in which complexity encourages implicit processing (indeed, gut-feeling intuition often appears highly sophisticated). Explicit consideration of a problem provides an advantage over implicit processing if directed hypothesis search identifies a satisfactory solution early.

Theoretical Accounts of Artificial Grammar Learning

Rules

Form of Knowledge

A. S. Reber and Allen (1978) proposed that “what subjects tacitly know is a valid, if partial, representation of the actual underlying rules of the language” (p. 191). In other words, participants are assumed to develop a rule-based representation that corresponds to the finite state language used; such knowledge is referred to as grammaticality. Because a finite state language defines the G–NG distinction, rule knowledge has been traditionally inferred in terms of the accuracy with which participants can discriminate G items from NG ones. To what does rule knowledge correspond? Several investigators, starting with A. S. Reber himself (A. S. Reber, 1993; see also Pothos & Bailey, 2000), have noted that the training items in an AGL experiment only partially determine the finite state language used in the experiment. Thus, participants might extract knowledge in the form of a network of rules, but this network could be different from the one the experimenter used to create the G sequences. Subsequently, in the test phase, participants might fail to discriminate between G and NG sequences not because they have not learned rules but because they have learned rules that are inconsistent with the ones used by the experimenter. The approach of inferring rule knowledge on the basis of successful G–NG item discrimination is inaccurate in a second way, since the G–NG distinction often covaries not only with (partial) knowledge of the finite state grammar but also with other kinds of knowledge (such as similarity). Investigators have tried to narrow down the interpretation of grammaticality by equating G items with NG ones on the basis of several stimulus properties (e.g., Brooks & Vokey, 1991; Knowlton & Squire, 1994b).
Johnstone and Shanks (1999) and Pothos and Bailey (2000) contrasted grammaticality with other possible knowledge influences using multiple regression analyses to show that, even when other influences were controlled, an effect of grammaticality was identifiable. However, even a residual grammaticality effect need not reflect rule knowledge. For example, Vokey and Brooks (1992, 1994; Brooks & Vokey, 1991; see also Higham, 1997a, 1997b) suggested that grammaticality corresponds to the additive similarity influence of


all training exemplars in the classification of each test item (“a chorus of instances”; cf. the GCM approach). A fundamental problem here is that grammaticality has not been linked to a conception of psychological rules, defined in terms of stimulus properties. Johnstone and Shanks (1999) proposed “transition rule strength” as a measure of rule knowledge. Rules are defined as transitions between states of a finite state language and transition rule strength depends on the repetition of rules in the training items. A test item would be likely to be selected as G on the basis of rule knowledge if the rules characterizing the item had a high overall transition rule strength. Finally, a noted confound in AGL studies is that successful G–NG discrimination may not indicate knowledge acquired in training but rather selection biases favoring G items as opposed to NG ones. Accordingly, some investigators have proposed that performance of experimental participants needs to be compared with that of control participants who receive the test without having seen the training stimuli (Altmann, Dienes, & Goode, 1995; Redington & Chater, 1996). There has been some controversy as to whether even a control group comparison is adequate (R. Reber & Perruchet, 2003), but Dienes and Altmann (2003) showed that the effect of training in an AGL task is to replace the prior biases for grammaticality selections at test (i.e., biases relating to the well-formedness of stimuli, for example) with biases corresponding to the structure of the training items. Overall, A. S. Reber and Allen’s (1978) AGL rules resemble language rules (i.e., an integrated network of rules), but modeling rule knowledge in terms of G–NG discrimination (even with appropriate stimulus controls) is clearly inappropriate.

Implicit Versus Explicit Cognition

A. S. Reber (1967) suggested that participants respond on the basis of gut-feeling intuition about which items are G. However, A. S. Reber and Allen (1978) asked participants to provide introspective reports on all their grammaticality judgments. In a sample of 10 participants, 821 justified responses showed that whatever the nature of the knowledge acquired in AGL, it is partly accessible to consciousness. Similar procedures by Dulany et al. (1984); Dienes, Broadbent, and Berry (1991); Mathews et al. (1989); and Perruchet and Pacteau (1990) replicated this result. Evidence for implicit learning in AGL would be apparent if AGL knowledge was unaffected by encoding strategies, instructional manipulations, and so forth (Dienes & Perner, 1999). From this perspective, it appears that AGL involves both an implicit and an explicit component. In favor of an implicit component, Whittlesea and Dorken (1993) noted that “the circumstances under which subjects are most clearly sensitive to general principles of the grammar are those that require subjects to do the least with stimuli” (p. 229; see also, Dienes et al., 1991; Mathews et al., 1989; A. S. Reber, 1976). Moreover, Higham, Vokey, and Pritchard (2000; Higham, 1997a; Vokey & Higham, 2004; but see Tunney & Shanks, 2003) provided evidence that an aspect of AGL performance is unaffected by certain methodological manipulations and is therefore automatic. Turner and Fischler’s (1993) results corroborate the notion of an automatic component in AGL in that under a response deadline, participants who were asked to search for rules in training, but not participants asked to memorize the training items, were impaired.


In favor of an explicit component, A. S. Reber and Allen (1978) first suggested that how participants approach the training stimuli could influence the form of knowledge acquired. Whittlesea and colleagues (Whittlesea & Dorken, 1993; Whittlesea & Wright, 1997) advocated an episodic processing account of AGL, according to which the particulars of AGL training affect how the stimuli are generalized (Morris, Bransford, & Franks, 1977). For example, Whittlesea and colleagues argued that a memorization training procedure would lead predominantly to knowledge of fragments, whereas knowledge of rules is associated primarily with variants of incidental observation procedures (cf. A. S. Reber & Allen, 1978). They also showed that if participants are asked to pronounce some of the training items and spell the rest, they could strategically apply the knowledge from the pronounced or the spelled items (see also Dienes, Altmann, & Gao, 1999). Whittlesea and Wright (1997) found that participants are sensitive to changes in incidental stimulus properties, such as expanding some items by repetition of some elements. In Pothos (2005b), the standard AGL task was modified such that the stimuli corresponded to sequences of cities a traveling salesperson would visit. Participants’ performance was impaired when the structure of the stimuli was incompatible with a prior expectation that short trips are preferable. Note that the obvious manipulation of telling participants that rules exist in training has a detrimental effect on performance (A. S. Reber & Allen, 1978), but this could be because such instructions elicit anxiety (Rathus, Reber, Manzus, & Kushner, 1988). Instructions to look for rules are beneficial only with biconditional grammars (Mathews et al., 1989; Shanks et al., 1997), but biconditional grammars correspond to a different type of learning than that in AGL. Overall, on request, participants in AGL experiments can provide considerable insight about their selections.
Moreover, it appears that AGL performance involves a component that is implicit (in that it is unaffected by learning procedure manipulations) as well as one that is explicit.

Microrules

Form of Knowledge

Participants do not represent the experimenter’s finite state language, but they might develop idiosyncratic finite state languages that describe the training items with different degrees of accuracy. Dulany et al. (1984; Dulany, Carlson, & Dewey, 1985) developed this assumption into the microrules hypothesis of AGL (a similar view was discussed by A. S. Reber & Lewis, 1977). Dulany and colleagues suggested that, in training, participants develop insights about which elements (bigrams, trigrams, etc.) are characteristic of the training items. For example, they might notice that training items start only with an M or a V. In test, when they are told that there are legal and illegal items, they may use these insights as “personal sets of conscious rules, each of limited scope and many of imperfect validity” (Dulany et al., 1984, p. 541) to determine which items are G. The notion of rules as independent heuristics that guide classification is akin to critical feature theories in categorization, according to which classification of an instance is (sometimes) causally determined by the presence of certain features.

Implicit Versus Explicit Cognition

Dulany et al. (1984) used a passive observation learning procedure. At test, their participants were asked to justify their grammaticality decisions by underlining the part of the string they thought made it G. For example, a participant might classify MSSV as G, underlining MS, to mean that the feature MS led to the G response. In this way, Dulany et al. (1984) collected information about which features participants used in their grammaticality judgments. Dulany et al. (1984, 1985) examined whether the reported features could account for performance by computing a measure of the validity of each feature; grammaticality accuracy correlated with mean validities to an extent that there appeared no room for residual implicit influence on performance. A. S. Reber (1993) criticized this procedure by noting that for G items, the entire string makes them G. The analogy with natural language processing is to decide what it is about this sentence that makes it grammatical. Nonetheless, other researchers have reported that participants are able to verbalize at least a portion of the knowledge that influenced their grammaticality decisions (Mathews et al., 1989), in part regardless of the learning procedure.

Fragments

Form of Knowledge

A fragment (or chunk) can be any part of a string, typically a bigram or a trigram. The label fragments is typically associated with Perruchet and Pacteau’s (1990) work and chunks with Knowlton and Squire’s (1996) research. Both terms are used here to differentiate between the two approaches. Perruchet and Pacteau’s (1990) suggestion was that in processing the training items, participants gradually recognize that certain symbols co-occur with others. Eventually, participants develop knowledge of (primarily) the common bigrams or trigrams that make up the training set. In seeing a test item, participants can explicitly recognize whether it is made up of familiar fragments and, accordingly, classify it as G or NG. Perruchet and Pacteau (1990) found that participants who went through a standard training phase performed at a comparable level with participants who were only shown the bigrams that made up the training sequences. Additionally, grammaticality decisions about NG items composed of legal bigrams in illegal positions were very inaccurate, showing that participants were more sensitive to items made NG by having illegal bigrams than to items made NG by placing legal bigrams in illegal positions. A problem with Perruchet and Pacteau’s (1990) demonstration is that what is effectively shown is that if participants learn only bigrams, they will perform equivalently to participants exposed to whole letter strings. Thus, when bigram knowledge is the only source of information, it can be learned and account for performance. However, the same level of overall grammaticality accuracy can be obtained through different learning routes. The critical question is, when several sources of knowledge are available, does learning still involve bigrams (cf. Vokey & Brooks, 1994)? Results show that participants’ knowledge often involves bigrams and trigrams (e.g., Dienes et al., 1991).
Moreover, participants are sensitive to fragment positional information and larger groups of letters (Gomez & Schvaneveldt, 1994; Johnstone & Shanks, 1999; Shanks, Johnstone, & Staggs, 1997). There is also evidence that AGL
knowledge goes beyond fragments, to include sensitivity to item similarity between test and training items (Vokey & Brooks, 1994) or even physical similarity between test and training items (Higham, 1997b). The fragments theory is an associative learning view of AGL, in which the contingency measure is co-occurrence. There is an alternative interpretation, attributable to Mathews (1991), whereby participants generalize to test sequences because they have partially encoded the training ones, and partial encoding arises because participants forget parts of the training items. Indeed, in AGL, it does appear that grammaticality performance is highest when the training procedure requires participants to do the least with the items (e.g., A. S. Reber, 1976; Whittlesea & Dorken, 1993). Forgetting may have adaptive value in that it enables mnemonic heuristics, such as recognition and fluency (Schooler & Hertwig, 2005). However, there are problems with the forgetting hypothesis, as perfect recall of the training instances can support generalization as well (in AGL, Brooks & Vokey, 1991; McAndrews & Moscovitch, 1985; more generally, Nosofsky, 1989). Finally, Perruchet and Pacteau’s (1990) fragments theory is remarkably similar to that of Dulany et al. (1984). Both postulate that the knowledge acquired in AGL is explicit and in the form of parts of training stimuli that determine grammaticality decisions. For example, a participant might decide that test string MSSXV is G because she recognizes bigrams MS and XV. However, to know that a legal start bigram is MS looks equivalent to the rule “a legal start bigram is MS” (cf. critical feature categorization; Rips, 1989). Indeed, it is not clear what other form microrules could have in AGL.
Consistent with this intuition, Brooks and Vokey (1991) noted, “These investigators [Perruchet & Pacteau, 1990] hypothesized that grammaticality judgments can be accounted for by fragmentary, conscious knowledge of pairs of letters, a position similar to that taken by Dulany et al. (1984)” (p. 321). Why can’t microrules be in the form of fragments?

Implicit Versus Explicit Cognition

Perruchet and Pacteau (1990) used a recognition task for bigrams in the training phase. They found that bigram knowledge was sufficient to account for the observed grammaticality accuracy (on the basis of calling a test string NG if it contained at least one bigram that had not been recognized). However, what Perruchet and Pacteau demonstrated was that it is possible to consciously activate fragment knowledge, not that, at the time of grammaticality decisions, such knowledge is consciously activated.
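
The decision rule mentioned above (call a test string NG if it contains at least one unrecognized bigram) can be sketched in a few lines. This is only an illustration: the function names and the example strings are hypothetical and are not taken from Perruchet and Pacteau's (1990) materials, and recognition is idealized as perfect memory for every training bigram.

```python
def bigrams(string):
    """Return the list of adjacent letter pairs in a string."""
    return [string[i:i + 2] for i in range(len(string) - 1)]


def classify_by_bigrams(test_string, known_bigrams):
    """Label a test string 'NG' if any of its bigrams is unfamiliar, else 'G'."""
    if all(b in known_bigrams for b in bigrams(test_string)):
        return "G"
    return "NG"


# Hypothetical training strings (not from any published grammar).
training = ["MSSV", "MSV", "VXSV"]
known = {b for s in training for b in bigrams(s)}

print(classify_by_bigrams("MSSSV", known))  # every bigram occurred in training
print(classify_by_bigrams("MXV", known))    # 'MX' never occurred in training
```

Such a rule is sensitive only to which bigrams occur, not to where they occur, which is consistent with the poor rejection of NG items built from legal bigrams in illegal positions.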

The Chunking Hypothesis

Form of Knowledge

In the competitive chunking hypothesis (Servan-Schreiber, 1991), co-occurring chunks or elementary units are organized into larger chunks. Frequency of occurrence determines the salience of each chunk. The familiarity of a novel stimulus increases if it can be perceived with fewer chunks. Thus, knowledge of bigrams would develop initially, leading gradually to knowledge of trigrams, four-grams, and so forth. This account is consistent with the observation of Perruchet and Pacteau (1990) that bigrams are salient entities in AGL and with later investigations showing that
participants acquire knowledge that is more complex than bigrams (Gomez & Schvaneveldt, 1994). Servan-Schreiber and Anderson (1990) cued participants to perceive AGL stimuli in terms of specific chunks that were made salient with the use of spaces. Their participants were less likely to correctly reject NG items that preserved the chunks made salient from training than those that did not preserve chunks. Such cues, of course, guide participants to process the stimuli in a certain way, so it is unclear that they would do so spontaneously. Nonetheless, in support of Servan-Schreiber and Anderson’s (1990) procedure, Saffran, Aslin, and Newport (1996) reported that for sequential stimuli (such as AGL letter strings), decreases in predictability lead to segmentation: Almost by definition, chunks are units within which later symbols are highly predictable from previous ones. Boucher and Dienes (2003) implemented competitive chunking with a multilayer feedforward neural network, whereby each possible stimulus chunk corresponds to a different unit, with successive layers involving larger units. Every time a chunk unit is activated, its “strength” parameter increases, which in turn, determines the likelihood that the chunk will be used in the encoding of a novel stimulus. The strength of a chunk decays with time as well. Note that the model does not greatly differentiate between transitional probabilities. For example, whether fragment XA is twice as frequent as fragment XB makes little difference to the model; both fragments will be equally likely to encode a novel stimulus. Boucher and Dienes (2003) showed that their chunking model provides a more accurate description of AGL performance than does an SRN one. Knowlton and Squire (1994b, 1996; cf. Cheng, 1997) computed the outcome of the chunking process. They defined the associative chunk strength of each chunk (bigram or trigram) in a test item as the number of times it appeared in the training set. 
The global associative chunk strength of a test item was computed by averaging the associative chunk strength of all its chunks. The higher the global associative chunk strength of an item, the more likely it would be selected as G. Additional measures have been computed in a similar way, to take into account the salience of the beginning and end parts of each training string (anchor chunk strengths) and the sensitivity of participants to chunks seen for the first time at test (chunk novelty; Johnstone & Shanks, 1999; Meulemans & van der Linden, 1997). Overall, such measures can be understood as measures of fragment overlap, that is, the extent to which test items “overlap” with training ones. All chunk measures have been empirically supported, most notably global associative chunk strength (Johnstone & Shanks, 1999; Knowlton & Squire, 1994b, 1996; Meulemans & van der Linden, 1997; Pothos & Bailey, 2000). However, there are no studies that have concurrently examined the relative predictive adequacy of different chunk measures. The chunking hypothesis is an associative learning account that assumes coarse sensitivity to both backward and forward associations (i.e., sensitivity to co-occurrence; Boucher & Dienes, 2003). Global associative chunk strength is effectively a measure of co-occurrence for pairs and triplets of symbols and, as such, is in the spirit of Servan-Schreiber’s (1991) hypothesis. With a psycholinguistics task, Perruchet and Peereman (2004) showed that people are sensitive to measures of contingency that are more complex than co-occurrence, but these results are of arguable relevance to AGL. Additional chunk measures, such as anchor chunk strength or chunk novelty, are difficult to interpret either within Servan-Schreiber’s (1991) framework or within a general associative learning framework. Knowlton and Squire (1994b) noted that “the competitive chunking hypothesis also resembles exemplar-based models. . . in holding that classification is based on the frequency of occurrence of particular test item features across a set of test items” (p. 88). We can consider the possible equivalence between chunking measures and similarity within Tversky’s (1977) contrast model. Fragments are the features of AGL stimuli. The contrast model interpretation of global associative chunk strength is that the greater the number of shared fragments between two stimuli, the higher the similarity of the stimuli. Chunk novelty (Meulemans & van der Linden, 1997) can be equated with mismatching features. Finally, the contrast model allows for increased weighting of important features, consistent with the assumption that some fragments are more salient (e.g., anchor fragments). In sum, the chunking hypothesis is an associative learning explanation of AGL, and it can be seen as a computational elaboration of the fragments hypothesis. Brooks and Vokey (1991), among others, considered fragment/chunk knowledge as equivalent to microrules. However, chunking measures have also been interpreted as similarity. This confusion in AGL theory simply echoes the more general confusion of how associative learning theory relates to rules and similarity.
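
The chunk measures described above can be made concrete with a short sketch. It follows the verbal definitions in the text (chunk strength as training frequency of a bigram or trigram, global strength as the average over a test item's chunks, chunk novelty as the number of chunks unseen in training). The function names and example strings are hypothetical, and counting every occurrence of a chunk within a training string is an assumption about a detail the text leaves open.

```python
from collections import Counter


def chunks(string, sizes=(2, 3)):
    """Return all bigrams and trigrams (by default) contained in a string."""
    return [string[i:i + n] for n in sizes for i in range(len(string) - n + 1)]


def chunk_counts(training_items):
    """Count how often each chunk occurs across the training set."""
    counts = Counter()
    for item in training_items:
        counts.update(chunks(item))
    return counts


def global_chunk_strength(test_item, counts):
    """Average training frequency of the chunks in a test item."""
    parts = chunks(test_item)
    if not parts:                      # strings shorter than two symbols
        return 0.0
    return sum(counts[c] for c in parts) / len(parts)


def chunk_novelty(test_item, counts):
    """Number of chunks in a test item that never occurred in training."""
    return sum(1 for c in chunks(test_item) if counts[c] == 0)


# Hypothetical training set.
counts = chunk_counts(["MSV", "MSSV"])
print(global_chunk_strength("MSV", counts), chunk_novelty("MSV", counts))
print(global_chunk_strength("MXV", counts), chunk_novelty("MXV", counts))
```

Anchor chunk strength could be obtained in the same way by restricting the counts to chunks at the beginning and end of each string.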

Implicit Versus Explicit Cognition

Knowlton and Squire (1993, 1994a, 1994b, 1996; Knowlton, 1999; Knowlton, Ramus, & Squire, 1992; Meulemans & Van der Linden, 2003) argued extensively that AGL knowledge is implicit. They found that individuals with amnesia made grammaticality selections on the basis of the same chunk strength and grammaticality influences as controls. As amnesic individuals’ recognition memory for the training items is extremely poor (but not nonexistent), such results have often been used to argue that the knowledge in AGL is implicit. There have been enough demonstrations of AGL with amnesic individuals that we can consider the effect robust, despite the low power of comparisons (about 10 amnesic participants per study) and retesting of the same amnesic participants (controls are sensitive to AGL knowledge for as long as 2 years; Allen & Reber, 1980). One interpretation of AGL amnesia results is that there are separate implicit and explicit stores and that the latter is impaired in amnesia. Nosofsky and Zaki (1998, 1999; cf. Huppert & Piercy, 1978) suggested that both amnesic and nonamnesic individuals have a single and intact exemplar-memory store. Amnesia is an impairment in the recognition mechanism but not in the classification one. Specifically, (exemplar) classification and recognition differ in that the former is a function of the similarity of a target instance to a particular category, whereas the latter involves similarity to all possible categories. Hence, reducing exemplar discriminability in amnesic individuals damages recognition but not classification. Shanks et al. reached a similar conclusion via a different route (Kinder & Shanks, 2001; Shanks & Perruchet, 2002; cf. Juola & Plunkett, 1998). Suppose that both classification and recognition are monotonic noisy functions of an item’s familiarity f, such that the probability of classification is C(f + n) and of recognition R(f + n) (noise could be variance attributable to
items). If C and R correlate, then they are (functionally) the same. If C and R are the same and n is comparable to f, then C can have high values when R has low values, and vice versa, so we would get a false impression of two separate systems. Shanks and colleagues argued that classification and recognition are, in fact, the same and that the observed dissociations can be accounted for by a unitary process with noise. The discussion above shows that the notion of separate implicit–explicit stores is probably wrong, and amnesic patients do demonstrate some awareness of AGL knowledge in recognition tasks. However, empirical evidence suggests that amnesic patients are not consciously aware of the basis of their grammaticality decisions. Moreover, both Nosofsky and Zaki’s (1998) and Kinder and Shanks’s (2001) analyses indicate that the representations of AGL knowledge available to amnesics are noisy, or impoverished, relative to controls. Hence, according to the definition of implicit cognition adopted here, amnesic participants’ grammaticality decisions are implicit. However, in recognition tasks, with enough effort (apparently) some amnesic participants can consciously activate knowledge initially resistant to such activation (Dienes & Perner, 1999; Dulany, 1997; A. S. Reber, 1993). Accordingly, recognition judgments for some amnesic individuals can be characterized as explicit, without loss of consistency.
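
The unitary-process argument above can be illustrated with a small simulation. It is a sketch under stated assumptions rather than the model of Kinder and Shanks (2001): C and R are both taken to be the same threshold function of f + n, familiarity f takes only two hypothetical values, and each decision receives its own Gaussian noise draw. All names and parameter values are illustrative.

```python
import random


def agreement(noise_sd, n_items=5000, threshold=0.5, seed=0):
    """Proportion of items on which classification and recognition agree.

    Both decisions read the same familiarity f; each adds its own noise draw.
    """
    rng = random.Random(seed)
    same = 0
    for _ in range(n_items):
        f = rng.choice([0.0, 1.0])              # unfamiliar vs. familiar item
        classified = f + rng.gauss(0, noise_sd) > threshold
        recognized = f + rng.gauss(0, noise_sd) > threshold
        same += classified == recognized
    return same / n_items


low_noise = agreement(noise_sd=0.1)    # noise small relative to f
high_noise = agreement(noise_sd=1.0)   # noise comparable to f
print(low_noise, high_noise)
```

When the noise is comparable to f, the two decisions frequently disagree even though they are produced by one and the same function, which is the sense in which a single noisy process can mimic a dissociation between two systems.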

Specific Similarity

Form of Knowledge

Learning involves encoding the training stimuli so that the similarity between new and old stimuli can be computed. Brooks and Vokey (1991; Vokey & Brooks, 1992, 1994) defined the specific similarity of two strings as the number of letter changes, including insertions or deletions, in one string that produces the other. A test item was considered similar to the training items if at least one training item existed such that the two strings differed by one letter, dissimilar otherwise. Specific similarity (or edit distance) is a form of exemplar similarity widely used, especially in psycholinguistics (e.g., Luce & Pisoni, 1998) and readily motivated within Chater and Hahn’s (1997) transformational approach. Vokey and Brooks (1992) created test stimuli such that the G and NG groups could be divided into equal subsets of high- and low-similarity items. These authors found that stimuli were more likely to be selected as G if stimuli were G or similar to the training set; grammaticality was interpreted as the overall similarity of all training instances to each test one. Specific similarity was supported in several later studies (Higham, 1997a, 1997b; Johnstone & Shanks, 1999; McAndrews & Moscovitch, 1985; Pothos & Bailey, 2000). Specific similarity contrasts with associative learning in that the former assumes that “a string. . .constitutes a functional unit. . . [and] forms a representation which is encoded, stored, and retrieved in an all-or-nothing manner” (Perruchet, 1994, p. 223), whereas the latter assumes that stimulus processing is bottom-up, starting from constituent symbols and gradually building up more complex chunks. How stimuli are encoded plausibly depends on their format (Whittlesea & Dorken, 1993). For example, Pothos and Bailey (2000) created AGL stimuli to encourage processing as individual, nonanalyzable entities.
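
Specific similarity as defined above corresponds to the standard Levenshtein edit distance, which can be computed as in the following sketch. The function names and example strings are hypothetical. Treating a distance of at most one as "similar" is an assumption; for novel test items (distance of at least one from every training item) it coincides with the one-letter criterion in the text.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def is_similar(test_item, training_items, max_distance=1):
    """True if some training item is within max_distance edits of the test item."""
    return any(edit_distance(test_item, t) <= max_distance for t in training_items)


# Hypothetical strings.
print(edit_distance("MSSV", "MSV"))            # one deletion
print(is_similar("MSXV", ["MSSV", "VXVS"]))    # one substitution away from MSSV
print(is_similar("VVVV", ["MSSV"]))            # three substitutions away
```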

Implicit Versus Explicit Cognition

Vokey and Brooks’s (1992) model assumes that training stimuli need be individually represented for specific similarity computations to occur. In categorization, individual representation of exemplars usually implies that they can be consciously activated; there have been correlations between exemplar recognition memory and corresponding exemplar categorization effects (Nosofsky & Zaki, 1998; Nosofsky, Clark, & Shin, 1989). However, in AGL, Higham and Vokey (1994) argued that exemplar effects do not require explicit exemplar retrieval.

Generalized Context Model Similarity

Form of Knowledge; Implicit Versus Explicit Cognition

Pothos and Bailey (2000) modeled similarity in AGL with the GCM (Nosofsky, 1988), a widely supported exemplar categorization model. The GCM can be characterized as a nonlinear estimator of category boundaries, whose operation depends on approximately six parameters, which cannot be set a priori (Ashby & Alfonso-Reese, 1995). As applied to AGL, the average similarity of a test stimulus to all training ones is assumed to affect the likelihood of its selection as G. In Pothos and Bailey’s (2000) study, similarity was assessed empirically, where participants provided similarity ratings among all AGL stimuli. Pothos and Bailey (2000) found that GCM similarity predicted participants’ grammaticality decisions independently of grammaticality, a combination of chunk strength measures, and specific similarity with item length (unlike Vokey & Brooks, 1992, specific similarity was computed relative to all training items). That is, empirical similarity was not predicted by either chunk strength or specific similarity, but the basis of the empirical similarity ratings among AGL stimuli is unclear (for example, similarity ratings of test items might have been influenced by whether they were G). Contrasting specific similarity with the GCM, one notes that specific similarity involves comparing test items to a single training item, whereas GCM similarity depends on average similarity to all training items. The latter approach is more common in categorization (Pothos, 2005a), but specific similarity has been supported both as a statistical model (nearest neighbor methods) and in psychological research (Luce & Pisoni, 1998).
In categorization, a new instance X is likely to be classified in a category A on the basis of X’s average similarity to A’s members, when the salience and distinguishability of A’s members is enhanced (Lacroix, Giguere, & Larochelle, 2005; Regehr & Brooks, 1993; Rouder & Ratcliff, 2004; Shin & Nosofsky, 1992). For example, according to Rouder and Ratcliff (2006), as learning enables participants to differentiate among exemplars, categorization becomes exemplar-based, but early on, when stimuli are confusable, categorization is rule-based (Gibson & Gibson, 1955; Johansen & Palmeri, 2002; Logan, 1988). However, Vokey and Brooks (1992) reported that the influence of specific similarity in AGL decreased with item memory. This result further indicates that specific similarity and average exemplar (GCM) similarity are distinct influences on AGL performance, although the lack of a direct comparison renders any conclusion tentative. Finally, whether AGL knowledge is implicit or explicit was not considered in Pothos and Bailey (2000).
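
The contrast drawn above between similarity to a single training item and average similarity to all training items can be sketched as follows. This is not the procedure of Pothos and Bailey (2000), who used empirical similarity ratings. As an assumption, similarity is computed here as an exponentially decaying function of distance, a form commonly used in exemplar models, and the distance function is a hypothetical stand-in; all names and values are illustrative.

```python
import math


def gcm_similarity(distance, sensitivity=1.0):
    """Exponential-decay similarity, as commonly used in exemplar models."""
    return math.exp(-sensitivity * distance)


def average_similarity(test_item, training_items, distance_fn, sensitivity=1.0):
    """Mean similarity of a test item to every training exemplar."""
    sims = [gcm_similarity(distance_fn(test_item, t), sensitivity)
            for t in training_items]
    return sum(sims) / len(sims)


def nearest_similarity(test_item, training_items, distance_fn, sensitivity=1.0):
    """Similarity to the single closest training exemplar."""
    return max(gcm_similarity(distance_fn(test_item, t), sensitivity)
               for t in training_items)


def mismatch_distance(a, b):
    """Hypothetical stand-in distance: mismatching positions plus length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


training = ["MSSV", "VXXM"]            # hypothetical training strings
print(average_similarity("MSSX", training, mismatch_distance))
print(nearest_similarity("MSSX", training, mismatch_distance))
```

The two measures can rank test items differently: an item with one very close neighbor and many distant ones scores high on nearest similarity but low on average similarity.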

Simple Recurrent Networks

Form of Knowledge; Implicit Versus Explicit Cognition

An SRN (Elman, 1990) is a variation of the feedforward three-layer network, whereby a context layer stores the activations of a hidden layer at every time step and feeds this information into the next time step. The output of the SRN depends on both its current input and its history of activation; thus, SRNs are particularly suited for discovering structure in temporal sequences. The operation of an SRN depends on several parameters that cannot be determined a priori (e.g., a learning parameter affects the speed of connection changes, and the number of hidden units affects the network’s generalization ability). Cleeremans and McClelland (1991; Cleeremans, 1993) compared how an SRN acquired the structure of a finite state language in an SRT with participants’ performance. SRN responses gradually reflected knowledge of an increasing number of previous symbols. By contrast, the human ability to take into account preceding symbols was capped at around four items (Cowan, 2001; Chen & Cowan, 2005). One might think that with enough training, participants would acquire sensitivity to longer range dependencies, but this is unlikely: Cleeremans and McClelland (1991) tested their participants for 60,000 trials. Modifying the SRN by introducing response and item priming enabled excellent fits with human performance. Boucher and Dienes (2003) compared an SRN with a chunking model in AGL. They found that the SRN differed from their chunking model in two ways. First, the SRN learns forward transitional probabilities accurately. For example, an SRN can learn if symbol A is followed twice as frequently by symbol X as by symbol Y. Second, the SRN is subject to catastrophic forgetting. If the network is presented with new information, it forgets old information. Empirical results supported the chunking model but not the SRN.
Humans are sensitive to contingencies other than forward transitional probabilities, they do not display catastrophic forgetting, and their sensitivity to forward transitional probabilities is crude (as long as a chunk is activated, it does not matter whether its frequency differs from that of other chunks). An SRN can be seen as a computational instantiation of an associative learning model sensitive to forward transitional probabilities. Moreover, SRNs are considered to reflect similarity processes in that similar inputs lead to similar outputs. In psycholinguistics, many researchers consider the operation of a neural network as fundamentally incompatible with psychological rules (Marcus et al., 1995; E. E. Smith, Langston, & Nisbett, 1992). However, neural networks do develop structures that resemble rules. Dienes (1992) examined an autoassociator network trained to reproduce the results of various AGL studies. He identified sequence parts such that their presence would guarantee a G classification by the network and proposed that such parts might be interpreted as rules. Arguably, if emergent properties of neural networks resemble rules, they should be considered rules in the same way we consider emergent properties of brain neurons as rules (Pothos, 2005a; but cf. Gross, Rocha-Miranda, & Bender, 1972). It is unclear how the knowledge developed by an SRN can be considered implicit or explicit.
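
The architecture described above, in which a context layer carries the previous hidden activations into the next time step, can be sketched as a forward pass. This is a minimal illustration rather than any of the networks used in the cited studies: there is no learning, the two-symbol alphabet is hypothetical, and the weights are hand-picked placeholder values that a trained network would instead acquire from the training sequences.

```python
import math

SYMBOLS = ["A", "B"]

# Hypothetical, hand-picked weights; a trained network would learn these.
W_IN = [[0.5, -0.5], [-0.3, 0.8]]    # hidden x input
W_CTX = [[0.4, 0.1], [-0.2, 0.6]]    # hidden x context
W_OUT = [[1.0, -1.0], [-0.5, 0.7]]   # output x hidden


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def srn_step(symbol, context):
    """One time step: returns (next-symbol distribution, new context)."""
    x = [1.0 if s == symbol else 0.0 for s in SYMBOLS]       # one-hot input
    net = [a + b for a, b in zip(matvec(W_IN, x), matvec(W_CTX, context))]
    hidden = [math.tanh(h) for h in net]
    scores = [math.exp(o) for o in matvec(W_OUT, hidden)]
    total = sum(scores)
    return [s / total for s in scores], hidden                # context := hidden


def run(sequence):
    """Feed a sequence symbol by symbol; return the final output distribution."""
    context = [0.0] * len(W_CTX)
    output = None
    for symbol in sequence:
        output, context = srn_step(symbol, context)
    return output


print(run("AB"))  # final input is 'B' ...
print(run("BB"))  # ... with a different history, so the output differs
```

Because the context is a copy of the previous hidden state, the same current symbol yields different predictions after different histories, which is what makes the network sensitive to sequential structure.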

PARSER

Form of Knowledge

PARSER, a computer-implemented model, encodes new input in terms of fragments that are initially randomly generated from the available symbols (Perruchet et al., 2002; there is another version in which stimulus encoding involves initially individual symbols). Fragments that are frequently used are favored in subsequent encodings; the others disappear from the system (cf. Mathews, 1991). PARSER eventually learns to encode G sequences in terms of fragments that reflect the structure of the corresponding finite state language. PARSER simulates the function of an attentional window (like short-term memory) that enables input processing by a succession of discrete, disjunctive attentional foci. The number of elements that can be perceived during each focus is limited. Also, co-occurring elements are chunked together to form larger fragments. Fragments may disappear from the system in either of two ways: by a decay process, such that fragments must be continually reinforced to be maintained, or by an interference process, such that the weight for element XY is reduced if X and Y appear in contexts other than XY. PARSER depends on various parameters (notably decay and interference), but it is not currently possible to appreciate how sensitive to parameter changes PARSER’s predictions are. Perruchet et al. (2002; cf. Servan-Schreiber & Anderson, 1990) asked participants to divide AGL sequences into parts before and after a familiarization phase. The number of different fragments, but not the total number of fragments, reliably decreased before and after familiarization. They suggested that familiarization resulted in fragments more representative of the structure of the AGL stimuli (not larger ones); more representative fragments are basically more frequent ones. PARSER was successfully fitted to human experimental results.
Perruchet and Peereman (2004) compared PARSER with an SRN and found that the SRN was sensitive to forward associations more so than was PARSER, but PARSER had a more balanced sensitivity across several measures of contingency (forward and backward transitions as well as more complex contingency functions) and provided better fits to participant performance than did the SRN, albeit in a psycholinguistics task rather than in AGL. Although the learning outcomes from PARSER and the chunking hypothesis are nearly the same, the learning processes are different. According to chunking, more complex fragments are built bottom-up from simpler fragments and individual symbols; unrepresentative fragments are not developed as their constituent elements do not co-occur often enough. By contrast, PARSER starts with knowledge of different fragments. As the system processes a set of stimuli, the fragments that are inconsistent with the structure of the stimuli disappear with the aid of an interference mechanism. Therefore, chunking can readily be motivated as an associative learning model, but this is less true for PARSER.
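
The decay and interference processes described above can be illustrated with a single weight-update step. This is a loose sketch of the verbal description only, not the PARSER implementation: the function name, the dictionary representation, the rule that any shared element triggers interference, and all parameter values are assumptions made for illustration.

```python
def update_weights(weights, perceived, gain=1.0, decay=0.05, interference=0.005):
    """Apply one PARSER-like update after a fragment has been perceived.

    weights: dict mapping fragment -> weight (an updated copy is returned).
    All parameter values are illustrative placeholders.
    """
    updated = {}
    for fragment, weight in weights.items():
        if fragment != perceived:
            weight -= decay                       # unreinforced fragments fade
            if set(fragment) & set(perceived):    # shared element, other context
                weight -= interference
        if weight > 0:
            updated[fragment] = weight            # weight <= 0: fragment is lost
    updated[perceived] = weights.get(perceived, 0.0) + gain
    return updated


# Hypothetical state: 'XY' is well established, 'XZ' is barely alive.
state = {"XY": 1.0, "XZ": 0.04}
print(update_weights(state, "XY"))   # 'XZ' decays, suffers interference, and is dropped
```

Repeating such updates over a training set leaves only fragments that recur often enough to offset decay and interference, which corresponds to the elimination of unrepresentative fragments described in the text.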

Implicit Versus Explicit Cognition

Perruchet et al. (2002) claimed that fragment knowledge was explicit and there is little doubt that this was the case with their procedure, as participants were asked to indicate how AGL stimuli should be divided into fragments. However, such a procedure is different from the standard AGL one, in which participants are
exposed to AGL stimuli and incidentally form, or do not form, fragments. Moreover, participants’ recognition of the relevant fragments could be epiphenomenal to some other type of acquired knowledge. In other words, participants could acquire some nonfragmentary knowledge that allows them to recognize which fragments are representative of the training items.

Fluency

Form of Knowledge; Implicit Versus Explicit Cognition

Certain stimuli are processed faster than others; for example, among word stimuli, some are more legible or familiar than others. Suppose that in a set of stimuli, some can be processed more easily, or fluently, than others and that these stimuli are evaluated along some dimension (e.g., Were they seen before? Are they aesthetically pleasing? Are they G?). The fluency hypothesis proposes that responses favor the stimuli processed most fluently (Whittlesea & Price, 2001; Whittlesea & Williams, 2000; Whittlesea, Jacoby, & Girard, 1990). That is, participants confuse whatever stimulus dimension they are meant to be processing with fluency. Fluency is related to the “mere exposure effect” (R. Reber, Schwarz, & Winkielman, 2004; Zajonc, 1968): Stimuli encountered previously are considered more pleasing than novel stimuli. In AGL, Gordon and Holyoak (1983) used a standard AGL paradigm except that they replaced the grammaticality task with a “liking” one, in which participants rated how much they liked the test items. Legal strings were rated as more pleasing (i.e., the structural mere exposure effect; Zizak & Reber, 2004), although it is unclear how participants decided one meaningless letter string was more pleasing than another. Note that the AGL structural mere exposure effect occurs only with stimuli made of highly familiar symbols. Stimuli that are G (or high in chunk strength or similar to the training items) are plausibly more fluent than others (cf. Miller, 1958). Buchner (1994) examined how the grammaticality status of an AGL string affects its fluency and vice versa. Following a memorization training procedure, test G or NG sequences (and distractors) were presented on a computer screen, initially obscured by a black square that was gradually removed. Participants had to identify as quickly as possible the obscured sequence in each trial. Buchner found faster responses to G than NG sequences.
Hence, G sequences appear to be more fluent than NG ones. However, perceptual clarification speed did not predict judgments in a later grammaticality task. Hence, fluency, as operationalized by Buchner, does not explain grammaticality judgments. Additionally, manipulations enhancing fluency can be totally irrelevant to the structure of the stimuli, such that it is possible to alter the likelihood of selecting an item as G in ways that have nothing to do with, for example, grammaticality. Kinder et al. (2003) examined fluency arising from ease of processing. These authors used a perceptual clarification procedure, expecting that rapidly identified test strings would be more fluent, and therefore more likely to be selected as G, than slowly identified strings. This prediction was confirmed only when, according to Kinder et al., participants were (instructionally and/or methodologically) guided to respond on the basis of gut-feeling intuition. Kinder et al. suggested that an alternative, recollective, strategy can be adopted in AGL, whereby participants at test try to recall as much as they can from training to decide which new items are G. Such a recollective strategy was hypothesized to be uninfluenced by fluency. Fluency relates to understanding AGL not only in terms of stimulus structural properties but also via episodic aspects of the learning procedure (Whittlesea & Dorken, 1993; Whittlesea & Leboe, 2000). When fluency is assumed to be independent of the structural content of AGL stimuli, it is orthogonal to accounts of AGL pertaining to rules, similarity, or associative learning. When fluency can arise from stimulus properties, it can, in principle, covary with chunk overlap (cf. Buchner, 1994), rule, or instance hypotheses in AGL. Further work is needed here. Finally, fluency, as a nonrecollective strategy, reflects (by definition) implicit cognition. However, Kinder et al. (2003) were not very clear on how to encourage, or assess, recollective versus nonrecollective strategies.

Additional Data Transfer In standard AGL, the training and test sequences are created from the same set of symbols. In transfer experiments, this is not the case; test sequences look different from the training ones; for example, the training sequences might consist of T,V,J,X, whereas the test ones might read B,C,W,K. Participants can successfully discriminate between G and NG strings in transfer experiments with a level of accuracy typically lower than that of nontransfer experiments (Brooks & Vokey, 1991; Howard & Ballas, 1980; Manza & Reber, 1997; Mathews et al., 1989; A. S. Reber, 1969). One intuition is that for transfer to occur, AGL stimuli must be processed at an abstract level such that their regularity structure is encoded independently of their appearance. Redington and Chater (1996, 2002) modeled transfer results without abstraction at training, with a learning process restricted to bigrams and trigrams. Test items were recognized as G if they could be parsed with familiar bigrams and trigrams. If the symbols changed at test, a mapping between old and new symbols was attempted, and processing of the test stimuli proceeded as in the nontransfer case. However, participants cannot do transfer AGL on the basis of bigrams and trigrams alone (Gomez & Schvaneveldt, 1994; Manza & Reber, 1997). There is evidence that stimuli are represented with both appearance-specific and abstract codes, even if the latter is harder to do (Kroger, Holyoak, & Hummel, 2004). A. S. Reber (1969) suggested that transfer is possible because participants represent the abstract rule structure of the G items. Brooks and Vokey (1991) proposed a mechanism of abstract analogy, whereby stimuli can be similar both if they are superficially similar and if they share the same abstract structure (e.g., MSSSV and XYYYT). Abstraction is often equated with rules in psychology; the notion of abstract analogy sheds new light on what rule knowledge can or cannot be (Pothos, 2005a; cf. 
Marcus et al., 1995). Servan-Schreiber (1991) proposed an extension of his competitive chunking model, whereby abstract chunks are formed and strengthened in a manner analogous to concrete chunks. In abstract chunks, concrete symbols are replaced with variables such that the same–different relations among the concrete symbols are preserved. Dienes et al.
(1999; Altmann & Dienes, 1999) presented an SRN with two sets of input units (one for the symbols of the training items and one for those of the test items) that could simulate transfer results in AGL. There is little research as to whether fluency can capture transfer results. Participants rate as more pleasing test G items in nontransfer AGL (Gordon & Holyoak, 1983), but they do not always do so in a transfer version (Whittlesea & Wright, 1997; Zizak & Reber, 2004). Gomez, Gerken, and Schvaneveldt (2000) suggested that repetition structure is the basis of transfer, and Tunney and Altmann (1999) highlighted the importance of frequency distributions of elements in the first position. Tunney and Altmann (2001) proposed two independent mechanisms for transfer: one that depends on the presence of repeating elements (cf. Brooks & Vokey, 1991) and another guided by sequential dependencies of nonrepeating elements (cf. Dienes et al., 1999). These mechanisms have been interpreted as exemplar and fragmentary processes, respectively. However, such an interpretation may be premature. First, no model of rule knowledge was evaluated in Tunney and Altmann (2001): An AGL rule hypothesis very plausibly includes knowledge of sequential dependencies (cf. A. S. Reber, 1967). Second, a number of studies have reported contrasts between grammaticality and either specific similarity (Brooks & Vokey, 1991) or fragment knowledge (Knowlton & Squire, 1996). It is unclear how this empirical contrast can be explained in Tunney and Altmann’s proposal. Overall, the transfer paradigm provides an excellent medium for studying abstraction, but current results do not allow us to support one AGL hypothesis more than the others.
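
The same–different relations that define an abstract chunk, and the abstract analogy between strings such as MSSSV and XYYYT, can be illustrated with a minimal Python sketch. It is not a model taken from the AGL literature: the function names `repetition_pattern` and `same_abstract_structure` are hypothetical, and the sketch assumes that abstraction amounts to recoding each distinct symbol as a variable in order of first appearance (with at most 26 distinct symbols per string).

```python
def repetition_pattern(string):
    """Recode a string so that only its same-different relations remain.

    Each distinct symbol is replaced by a variable (A, B, C, ...) in order
    of first appearance. Assumes at most 26 distinct symbols.
    """
    mapping = {}
    recoded = []
    for symbol in string:
        if symbol not in mapping:
            mapping[symbol] = chr(ord("A") + len(mapping))
        recoded.append(mapping[symbol])
    return "".join(recoded)


def same_abstract_structure(first, second):
    """True if two strings share the same repetition structure."""
    return repetition_pattern(first) == repetition_pattern(second)


print(repetition_pattern("MSSSV"))                # ABBBC
print(same_abstract_structure("MSSSV", "XYYYT"))  # True
print(same_abstract_structure("MSSSV", "XYZYT"))  # False
```

On this recoding, strings built from entirely different letter sets can still match, which is the property a transfer test requires; the sketch says nothing about how such codes would be learned or strengthened.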

Stimulus Format AGL is possible with stimuli other than letter strings, such as graphical symbols, colors, or musical tones (Altmann et al., 1995; Chan, 1992; Pothos, Chater, & Ziori, 2006; Whittlesea & Wright, 1997; cf. Pothos & Cox, 2002). Pothos and Bailey (2000) compared embedded geometric shapes with sequences of geometric shapes. The former were meant to give the impression of more coherent wholes, but Pothos and Bailey (2000) failed to find any differences in specific similarity influences. Chang and Knowlton (2004; see also Higham, 1997b) found that changing the font and case of the training stimuli (letter strings) did not affect grammaticality accuracy but impaired sensitivity to bigram and trigram frequencies. These results corroborate Whittlesea and Wright’s (1997) episodic processing perspective of AGL. Relating to abstraction, Altmann et al. (1995) demonstrated transfer from tones to letters, suggesting that the training stimuli were processed at a level abstract enough to enable modality-independent knowledge. Conway and Christiansen (2005, in press) used a crossover design, with training items from two grammars in two modalities (auditory tones, color sequences). At test, all sequences were presented in one of the vocabularies used in training. Participants successfully identified G items from either training grammar, showing that superficial aspects of the training stimuli were encoded. As Manza and Reber (1997) observed, participants apparently learn both about the abstract structure and the superficial characteristics of AGL stimuli. Conway and Christiansen’s (2005, in press) results suggest that regularity in different modalities may be processed simultaneously. Mayr (1996) used a variant of the SRT, whereby objects appeared in different locations on a screen. The object sequence and the location one were deterministic, and participants learned both regularities. Rah, Reber, and Hsiao (2000) used a similar task (based on tones and lights), again finding that participants learned both sequences. Overall, AGL results with stimuli other than letter strings demonstrate that stimulus processing extracts regularity at multiple levels of abstraction. The concurrent processing of multiple regularities is a marker of implicit cognition (A. S. Reber, 1993).

Other Neuroscience Data Alzheimer’s disease (AD) has been linked with damage in the medial temporal lobe, specifically the neocortex and the hippocampus. Loss of declarative memory is a functional marker of AD. Parkinson’s disease (PD) is associated primarily with damage to the striatum/basal ganglia and the caudate nucleus and, accordingly, PD patients show impairments in habit learning (e.g., Eldridge, Masterman, & Knowlton, 2002). AGL results with AD patients are limited, as AD patients display similar characteristics to amnesic individuals in AGL. Several investigators (Peigneux, Meulemans, Van der Linden, Salmon, & Petit, 1999; P. J. Reber & Squire, 1999; J. G. Smith et al., 2001; Witt, Nuehsman, & Deuschl, 2002) have shown intact AGL with PD patients. Peigneux et al.’s (1999) PD patients were at chance in a second presentation of the test items and J. G. Smith and McDowall’s (2006) PD participants were impaired in an AGL task with corrective feedback. However, both findings probably reflect an attentional deficit in PD. Moody, Bookheimer, Vanek, and Knowlton (2004) examined neural activity in PD patients with a probabilistic classification task and found that there was activation in prefrontal cortex areas associated with explicit memory retrieval; habit learning areas in PD patients were dark (suggesting lack of activation, as expected). This could mean either that the probabilistic task reflects explicit cognition generally or that PD patients could identify explicit strategies to cope with the task, but that such strategies were not adopted by nonpatient controls. Additionally, J. G. Smith and McDowall (2006) thoroughly assessed the explicit knowledge of both control and PD patients in an AGL task to find that explicit knowledge was more likely to correlate with accuracy for controls. But, as stated, J. 
Smith and McDowall used corrective feedback and thus their results may be less applicable to AGL generally (see also Fletcher, Buechel, Josephs, Friston, & Dolan, 1999). Overall, extrapolating Moody et al.’s (2004) results, one can suggest that the implicit component of AGL does not function with PD patients. This could happen either because it is impaired or because its normal operation is prevented in some other way. With respect to the latter possibility, normal individuals can process or integrate regularity structure from different sources (Mayr, 1996; Rah, Reber, & Hsiao, 2000), but this does not appear to be the case for PD patients: In an SRT involving two correlated sequences (objects, locations), PD patients learned the individual sequences but not the integrated one (J. G. Smith & McDowall, 2005). Accordingly, the rich structure of AGL stimuli may involve too many possible regularities for PD patients to (implicitly) process concurrently; hence, their intact explicit system takes over; with
simpler implicit learning tasks, such as the SRT, PD patients’ performance is intact. The competition between implicit and explicit cognition implied by Moody et al. (2004) has been supported by other investigations (Eldridge et al., 2002; Poldrack et al., 2001; P. J. Reber, Stark, & Squire, 1998; P. J. Reber, Gitelman, Parrish, & Mesulam, 2003; Skosnik et al., 2002). Neurologically, this competition appears to involve the basal ganglia and the medial temporal lobe. The neuroscience perspective on the implicit versus explicit contrast complements the functional interpretation of the contrast considered earlier (in terms of stimulus complexity; Sun et al., 2005). There have been attempts to link neuroscience and functional considerations. For example, Atallah, Frank, and O’Reilly (2004) proposed two learning mechanisms, one driven by the posterior cortex that involves overlapping representations and slow, integrative learning (e.g., “What is the best place to park my car?”) and another driven by the hippocampus that involves sparse, distinct representations, and fast, literal encoding (e.g., “Where did I park yesterday?”). Further research is needed, however, before such a perspective can be linked to either the implicit– explicit contrast or the rule–similarity–associative learning distinction. Overall, the neuroscience data corroborate a view of separate implicit and explicit components in competition with each other. Finally, recent fMRI studies have examined possible competition between rules and chunks: Lieberman, Chang, Chiao, Bookheimer, and Knowlton (2004) found that chunk strength performance was linked to the medial temporal lobe, whereas grammaticality was linked to the right caudate nucleus, with corresponding activations negatively correlated.

Rules, Similarity, and Associative Learning The general controversy on how to define the relations between rules, similarity, and associative learning is reflected in AGL. The AGL review suggests the following questions. First, how are rules to be understood? Second, should fragmentary knowledge be considered microrules or similarity (or neither)? Third, what is the relation between specific similarity and chunking measures? Finally, how many hypotheses for AGL performance are needed? To address these issues, rules, specific similarity, and associative learning are specified in an abstract framework. In this way, a comparison between psychological explanations may be achieved that is not obscured by implementational details (Anderson, 1978; Marr, 1982).

Associative Learning I consider the co-occurrence view of associative learning in AGL. Let $t_1 \dots t_z$ be the allowed transitions of a finite state language, in which the index enumerates transitions (I am referring to the finite state language a participant abstracts, not the experimenter’s one; also the analysis below is the same whether symbol specificity or abstract codes are assumed). Test sequences are labeled as $t_1 \dots t_k \dots t_n$; for simplicity, I assume 1 corresponds to the first position and $n$ to the last and that $1 < k < n$ (i.e., indexing is inconsistent across strings). How can a sequence be selected as G? On the basis of bigram overlap, a sequence is more likely to be classified as G than NG if it has more frequent (from training) bigrams, that is, if the sum $f(t_1 t_2) + f(t_2 t_3) + \dots + f(t_k t_{k+1}) + \dots + f(t_{n-1} t_n)$ is greater ($f$: frequency). Generalizing, G classification of a test item on the basis of fragment knowledge depends on:

$$
\begin{aligned}
& f(t_1 t_2) + f(t_2 t_3) + \dots + f(t_k t_{k+1}) + \dots + f(t_{n-1} t_n) \\
{}+{} & f(t_1 t_2 t_3) + f(t_2 t_3 t_4) + \dots + f(t_k t_{k+1} t_{k+2}) + \dots + f(t_{n-2} t_{n-1} t_n) \\
{}+{} & f(t_1 t_2 t_3 t_4) + f(t_2 t_3 t_4 t_5) + \dots + f(t_k t_{k+1} t_{k+2} t_{k+3}) + \dots + f(t_{n-3} t_{n-2} t_{n-1} t_n) \\
{}+{} & \dots + f(t_1 \dots t_k \dots t_n)
\end{aligned}
$$

(As fragments become larger they end up corresponding to entire strings.) Overall, fragment knowledge is bottom-up: Smaller [n]grams are learned first and gradually lead to more complex ones.
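
The summed fragment frequency above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than any published chunk-strength measure: the names `ngrams`, `fragment_counts`, and `fragment_score` are hypothetical, $f$ is taken to be the raw count of a fragment across all training items, and the toy training set is invented for the example.

```python
from collections import Counter


def ngrams(string, n):
    """Return all contiguous fragments of length n, left to right."""
    return [string[i:i + n] for i in range(len(string) - n + 1)]


def fragment_counts(training_items):
    """Count f(.) for every fragment of length 2 and above in training."""
    counts = Counter()
    for item in training_items:
        for n in range(2, len(item) + 1):
            counts.update(ngrams(item, n))
    return counts


def fragment_score(test_item, counts, max_n=None):
    """Sum training frequencies over the fragments of a test item.

    max_n=2 gives the bigram-only sum; max_n=None includes every fragment
    size up to the whole string.
    """
    top = len(test_item) if max_n is None else min(max_n, len(test_item))
    return sum(counts[g] for n in range(2, top + 1) for g in ngrams(test_item, n))


training = ["MSV", "MSSV"]  # toy training set, not from any experiment
counts = fragment_counts(training)
print(fragment_score("MSV", counts, max_n=2))  # 4: f(MS) + f(SV)
print(fragment_score("MSV", counts))           # 5: adds f(MSV)
print(fragment_score("VSM", counts))           # 0: no familiar fragments
```

A higher score would make a G classification more likely on the co-occurrence view; how scores map onto response probabilities is left open here.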

Specific Similarity A test string $t_1 t_2 \dots t_k \dots t_{n-1} t_n$ is most likely to be classified as G on the basis of specific similarity if there is an identical string in training (trivially). It is next most likely if there are present in training either of the two [n-1]grams it consists of, and so forth. The worst approximation to specific similarity is provided by knowledge of bigrams, as the bigrams between two sequences could be similar, but their specific similarity could be low. Formalizing, the probability of classification on the basis of specific similarity increases with increasing (in which frequencies, as always, correspond to training items):

$$
\begin{array}{l}
f(t_1 t_2 \dots t_k \dots t_{n-1} t_n) \\
f(t_1 \dots t_k \dots t_{n-1}),\; f(t_2 \dots t_k \dots t_n) \\
f(t_1 \dots t_k \dots t_{n-2}),\; f(t_3 \dots t_k \dots t_n),\; f(t_2 \dots t_k \dots t_{n-1}) \\
f(t_1 t_2 t_3),\; f(t_2 t_3 t_4),\; f(t_k t_{k+1} t_{k+2}),\; f(t_{n-2} t_{n-1} t_n) \\
f(t_1 t_2),\; f(t_2 t_3), \dots,\; f(t_k t_{k+1}), \dots,\; f(t_{n-1} t_n)
\end{array}
$$

Thus, specific similarity assumes top-down knowledge: That is, to the extent that specific similarity is empirically supported, larger [n]grams must be encoded first and only gradually (possibly) broken down into their constituent elements. Overall, it is clear that specific similarity and associative learning are distinct views of what is learned in AGL, differentiated in terms of whether stimulus encoding is bottom-up or top-down. Moreover, Knowlton and Squire’s (1996) chunking measures are an associative learning model of AGL; they have an interpretation in terms of similarity, as feature overlap (Pothos, 2005a; Tversky, 1977), but this similarity is different from specific similarity.
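
One way to see how this top-down ordering differs from a bigram sum is the following Python sketch. It rests on an assumption that goes beyond the text: the ordering is read lexicographically, so that a match on a larger fragment always outweighs any number of matches on smaller ones. The names `ngrams` and `specific_similarity_key` and the toy strings are hypothetical.

```python
from collections import Counter


def ngrams(string, n):
    """Return all contiguous fragments of length n, left to right."""
    return [string[i:i + n] for i in range(len(string) - n + 1)]


def specific_similarity_key(test_item, training_items):
    """Summed training frequencies per fragment size, largest size first.

    Python compares tuples lexicographically, so comparing two keys (for
    test items of equal length) gives priority to whole-string matches,
    then [n-1]grams, and so on down to bigrams.
    """
    key = []
    for n in range(len(test_item), 1, -1):
        counts = Counter(g for item in training_items for g in ngrams(item, n))
        key.append(sum(counts[g] for g in ngrams(test_item, n)))
    return tuple(key)


training = ["MSSV", "MSV"]  # toy training set, not from any experiment
print(specific_similarity_key("MSSV", training))  # (1, 2, 5)
print(specific_similarity_key("MSVV", training))  # (0, 1, 4)
print(specific_similarity_key("SVMS", training))  # (0, 0, 4)
```

MSVV and SVMS have the same bigram overlap with training (the last entry of each key), yet MSVV ranks higher because it contains a trained trigram. This is the sense in which bigram knowledge is the worst approximation to specific similarity.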

Rules Knowledge of a finite state language is basically knowledge of all the allowed transitions, such that a participant could classify a test item as G by deciding whether the second symbol is allowed given the first one, whether the third symbol is allowed given the first two, and so forth (A. S. Reber, 1976). In other words, knowledge of the finite state language is (for a particular sequence):

$$
\begin{array}{l}
t_1 \\
t_1 t_2 \\
t_1 t_2 t_3 \\
t_1 t_2 t_3 t_k \dots \\
t_1 t_2 t_3 t_k \dots t_n
\end{array}
$$

The above-described knowledge is subsumed by knowledge of the entire string and approximated by knowledge of fragments. For example, if one were to break the string into two [n-1]grams, then complete transitional knowledge would be knowledge of the [n-1]grams plus knowledge of which goes first (Shanks et al., 1997). Accordingly, knowledge of the finite state language for a string is approximated by (in order of more accurate approximations, listed first):

$$
\begin{array}{l}
t_1 \dots t_k \dots t_{n-1},\; t_2 \dots t_k \dots t_n \\
t_1 \dots t_k \dots t_{n-2},\; t_2 \dots t_k \dots t_{n-1},\; t_3 \dots t_k \dots t_n \\
t_1 t_2 t_3,\; t_2 t_3 t_4,\; t_k t_{k+1} t_{k+2},\; t_{n-2} t_{n-1} t_n \\
t_1 t_2,\; t_2 t_3, \dots,\; t_k t_{k+1}, \dots,\; t_{n-1} t_n
\end{array}
$$

However, there is a different way to approximate knowledge of the finite language that does not depend on consecutive symbols. For example, a person might know that when the second symbol is an M, the fourth symbol is an X; this knowledge would be of the form: $t_k - t_l$, where $1 \le k < l \le n$. (The dash means that we do not have in-between symbols.) It does not appear necessary to include relations $t_i - t_k - t_l$, where $1 \le i < k < l \le n$, as any three-symbol relation can be decomposed into 2 two-symbol ones. (Because two-symbol relations include positional information, the problem of indeterminacy that occurs in decomposing trigrams into bigrams is not an issue here.) Knowledge of $t_k - t_l$, where $1 \le k < l \le n$, is included in knowledge of the [l-k]gram $t_k \dots t_l$. In fact, we can postulate that knowledge of $t_k - t_l$ cannot exist without processing of $t_k \dots t_l$, and, consistent with Bukach, Bub, Masson, and Lindsay (2004) and Glenberg (1997), knowledge of $t_k - t_l$ is nothing but the summation of experiences of $t_k \dots t_l$, but in the case where the intermediate symbols are too variable to lead to a stable representation (Pothos, 2005a). Indeed, Gomez (2002) showed that nonadjacent dependencies would be learned when the adjacent ones were very variable (or very stable). Thus, effectively, knowledge of $t_k - t_l$, where $1 \le k \le i \le l \le n$, is the same as knowledge of $t_k \dots t_l$. Thus, knowledge of a finite state language is knowledge of all [n]grams, with simpler [n]grams corresponding to cruder approximations of this knowledge.

Thus far, nothing has been said as to how knowledge of a finite state grammar corresponds to rules. Two possibilities can be identified. First, assume that knowledge of the finite state language develops as a function of experience with different [n]grams (this is not contentious) and that the effective knowledge of the finite state language (i.e., the knowledge that we apply in order to make grammaticality judgments) depends on the extent of experience with different [n]grams (this is the transitional strength referred to by Johnstone & Shanks, 1999). This possibility may be called the frequency-dependent view of rule knowledge, whereby knowledge of rules is as follows (where $f'$ indicates effective frequency):

$$
\begin{array}{l}
f'(t_1 \dots t_k \dots t_{n-1}),\; f'(t_2 \dots t_k \dots t_n) \\
f'(t_1 \dots t_k \dots t_{n-2}),\; f'(t_2 \dots t_k \dots t_{n-1}),\; f'(t_3 \dots t_k \dots t_n) \\
f'(t_1 t_2 t_3),\; f'(t_2 t_3 t_4),\; f'(t_k t_{k+1} t_{k+2}),\; f'(t_{n-2} t_{n-1} t_n) \\
f'(t_1 t_2),\; f'(t_2 t_3), \dots,\; f'(t_k t_{k+1}), \dots,\; f'(t_{n-1} t_n)
\end{array}
$$

Note that in this scheme, there is no assumption as to whether knowledge is bottom-up or top-down: The processor could start from larger [n]grams and gradually break them down or from smaller ones and gradually build them up. I refer to the second possibility as the frequency-independent view of AGL rules. As before, assume that knowledge of the finite state language develops as a function of experience with [n]grams. The second assumption is that the effective knowledge of the finite state language is independent of the extent of experience with different [n]grams; in other words, once we have “discovered” a rule, it is applied in the same way regardless of the frequency with which it has been experienced. This is Pothos’s (2005a) rule proposal. That this is a reasonable way to understand rules is evident in language, whereby our immediate recognition of the phrase “colorless green ideas sleep furiously” as grammatically correct cannot be guided by previous experience, as the phrase is entirely novel (Chomsky, 1965). Accordingly, by this view, rule knowledge is as follows (where again there is no suggestion as to whether it is top-down, bottom-up, or otherwise):

$$
\begin{array}{l}
t_1 \dots t_k \dots t_{n-1},\; t_2 \dots t_k \dots t_n \\
t_1 \dots t_k \dots t_{n-2},\; t_2 \dots t_k \dots t_{n-1},\; t_3 \dots t_k \dots t_n \\
t_1 t_2 t_3,\; t_2 t_3 t_4,\; t_k t_{k+1} t_{k+2},\; t_{n-2} t_{n-1} t_n \\
t_1 t_2,\; t_2 t_3, \dots,\; t_k t_{k+1}, \dots,\; t_{n-1} t_n
\end{array}
$$

It should now be apparent that, in the general case, the frequency-dependent view of rules in its bottom-up version is identical to associative learning and in its top-down version is identical to specific similarity. This equivalence has arguably been the single most important factor obstructing progress in AGL, confusing contrasts among rules and grammaticality and other knowledge influences. Of course, contrasts between grammaticality and associative learning/specific similarity have been demonstrated, but, unless specifically defined, grammaticality is a notion devoid of explanatory content.
Retaining a conception of AGL rules is important, as a claim that rules are learned in AGL can be related to a vast literature on rules in categorization, language, and reasoning (Pothos, 2005a). The frequency-independent view of AGL rules can be understood in terms of either Dulany et al.’s (1984) microrules, whereby participants develop tests to decide which items are G, or a rule-based representation of the finite state language each participant develops in training. The latter possibility corresponds to the view of rules postulated in language (Chomsky, 1965). There has been no evidence that such rules are developed in AGL, nor a proposal for a corresponding learning mechanism.
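
The difference between the frequency-dependent and frequency-independent views can be sketched in Python. The sketch makes several simplifying assumptions that are not part of the proposal itself: knowledge is restricted to bigrams, effective frequency $f'$ is taken to be the raw training count, and a frequency-independent rule is treated as "this transition has been encountered at least once". The names `learn_bigrams`, `frequency_dependent`, and `frequency_independent` and the toy training set are hypothetical.

```python
from collections import Counter


def bigrams(string):
    """Return the consecutive symbol pairs of a string."""
    return [string[i:i + 2] for i in range(len(string) - 1)]


def learn_bigrams(training_items):
    """Count how often each transition occurs across training items."""
    return Counter(b for item in training_items for b in bigrams(item))


def frequency_dependent(test_item, counts):
    """Graded evidence: more frequently trained transitions count for more."""
    return sum(counts[b] for b in bigrams(test_item))


def frequency_independent(test_item, counts):
    """All-or-none: a discovered transition applies however often it was seen."""
    return all(counts[b] > 0 for b in bigrams(test_item))


# Toy training set: MSV is seen nine times, MXV only once.
training = ["MSV"] * 9 + ["MXV"]
counts = learn_bigrams(training)

print(frequency_dependent("MSV", counts), frequency_dependent("MXV", counts))      # 18 2
print(frequency_independent("MSV", counts), frequency_independent("MXV", counts))  # True True
print(frequency_independent("MVX", counts))                                        # False
```

Under the graded score the rare string MXV looks far weaker than MSV, whereas the all-or-none test treats both as equally acceptable. This is the contrast that would need to be tested empirically to separate rule knowledge from associative learning or specific similarity.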

Finally, it should now be clear that rules, specific similarity, and associative learning form distinct hypotheses about the form of AGL knowledge; moreover, there is an additional conception of similarity (through chunking) distinct from specific similarity. Whereas there have been sophisticated empirical contrasts between specific similarity and associative learning, this is not so for the rule proposal (which has typically, and inappropriately, been equated with grammaticality). Future research will hopefully clarify this issue.

Implicit Versus Explicit Cognition In brief, the findings of Perruchet and Pacteau (1990) and Dulany et al. (1984) need be reconciled with those of Knowlton and Squire (1996) and A. S. Reber (1989). Adopting the perspective of Dienes and Perner (1999; cf. Schacter, 1992; Shanks, 2005), it is noted that implicit cognition involves knowledge that is activated but not consciously activated in a cognitive operation. It is also postulated that for conscious activation of a representation, this representation must be activated within working memory and must be of sufficient quality. Grammaticality judgments involving fragments or whole instances would typically involve computations over a number of elements, too great to be represented in working memory (Baddeley, 1983; Cowan, 2001). Hence, such judgments would typically be implicit (behaviorally, they would correspond to gut-feeling intuitions of which test items are most familiar or best formed; A. S. Reber, 1993). Recognition, by contrast, would be implicit or explicit depending on the quality of the fragment or instance representations, which would in turn depend on instructional or methodological manipulations, the distributional characteristics of the training items, and participant strategies (cf. Higham et al., 2000; Turner & Fischler, 1993). Thus, Perruchet and Pacteau (1990) showed that fragment knowledge can be recognized explicitly, but arguably this same knowledge would be used implicitly in standard AGL grammaticality selections. Vokey and Brooks’s (1992, 1994) whole exemplar knowledge would likewise be expected to affect grammaticality selections implicitly, given that identifying the most similar training exemplar would presumably require consideration of several, roughly similar, items. Dulany et al.’s (1984) microrules could have two forms. At test, conscious activation of salient fragments or instances in working memory may be associated with conscious activation of a feeling of correctness. 
Alternatively, participants may represent such associations propositionally, as tests to determine the grammaticality of test strings. Passive observational training is more consistent with the former possibility, directed hypothesis search, with the latter. Either way, microrules would be applied individually and activated explicitly (these have indeed been Dulany’s conclusions; Dulany, 2003, 2004). If AGL leads to a network of rules, then plausibly such knowledge would influence performance implicitly (A. S. Reber, 1993). By analogy with linguistic processing, we very rarely consciously activate the grammatical and syntactic knowledge that allows us to understand linguistic input. With respect to fluency, if one adopts Kinder et al.’s (2003) approach in which fluency arises from nonrecollective processing, then, by definition, fluency represents an implicit influence. It is possible, however, that future formulations of the fluency perspective that take into account structural stimulus properties will differ in this respect (cf. Buchner, 1994). In conclusion, the two dimensions along which AGL performance can be characterized, that is, rules, similarity, and associative learning, on the one hand, and implicit versus explicit cognition, on the other, are intimately intertwined. Characterizing AGL performance in this way allows an appreciation of the value of AGL research in the context of corresponding debates in cognitive science generally. The value of AGL is evident in that it allows a specific formulation of the contrast between rules, similarity, and associative learning along with a view of the implicit–explicit distinction that reconciles seemingly conflicting findings.

References
Abrams, M., & Reber, A. S. (1989). Implicit learning: Robustness in the face of psychiatric disorders. Journal of Psycholinguistic Research, 17, 425–439.
Allen, R., & Reber, A. S. (1980). Very long term memory for tacit knowledge. Cognition, 8, 175–185.
Altmann, G. T. M., & Dienes, Z. (1999). Rule learning by seven-month-old infants and neural networks. Science, 284, 87.
Altmann, G. T. M., Dienes, Z., & Goode, A. (1995). Modality independence of implicitly learned grammatical knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 899–912.
Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249–277.
Ashby, G. F., & Alfonso-Reese, L. A. (1995). Categorization as probability density estimation. Journal of Mathematical Psychology, 39, 216–233.
Ashby, G. F., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442–481.
Atallah, H. E., Frank, M. J., & O’Reilly, R. C. (2004). Hippocampus, cortex, and basal ganglia: Insights from computational models of complementary learning systems. Neurobiology of Learning and Memory, 82, 253–267.
Baddeley, A. D. (1983). Working memory. Philosophical Transactions of the Royal Society of London, 302B, 311–324.
Berry, D. C., & Dienes, Z. (1993). Implicit learning: Theoretical and empirical issues. Hove, England: Erlbaum.
Boucher, L., & Dienes, Z. (2003). Two ways of learning associations. Cognitive Science, 27, 807–842.
Brooks, R. L., & Vokey, R. J. (1991). Abstract analogies and abstracted grammars: Comments on Reber (1989) and Mathews et al. (1989). Journal of Experimental Psychology: General, 120, 316–323.
Buchner, A. (1994). Indirect effects of synthetic grammar learning in an identification task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 550–566.
Buchner, A., & Wippich, W. (2000). On the reliability of implicit and explicit memory measures. Cognitive Psychology, 40, 227–259.
Bukach, C. M., Bub, D. N., Masson, M. E. J., & Lindsay, D. S. (2004). Category specificity in normal episodic learning: Applications to object recognition and category-specific agnosia. Cognitive Psychology, 48, 1–46.
Chan, C. (1992). Implicit cognitive processes: Theoretical issues and applications in computer systems design. Unpublished doctoral dissertation, University of Oxford, Oxford, England.
Chang, G. Y., & Knowlton, B. J. (2004). Visual feature learning in artificial grammar classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 714–722.
Chater, N. (1999). The search for simplicity: A fundamental cognitive principle? Quarterly Journal of Experimental Psychology, 52A, 273–302.

Chater, N., & Hahn, U. (1997). Representational distortion, similarity and the universal law of generalization. In Proceedings of the Similarity and Categorization Workshop (pp. 31–36). Edinburgh, Scotland: University of Edinburgh.
Chen, Z., & Cowan, N. (2005). Chunk limits and length limits in immediate recall: A reconciliation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1235–1249.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N., & Miller, G. A. (1958). Finite state languages. Information and Control, 1, 91–112.
Churchland, P. M. (1990). Cognitive activity in artificial grammar neural networks. In D. N. Osherson & E. E. Smith (Eds.), An invitation to cognitive science: Vol. 3. Thinking (pp. 199–227). Cambridge, MA: MIT Press.
Cleeremans, A. (1993). Mechanisms of implicit learning: Connectionist models of sequence processing. Cambridge, MA: MIT Press.
Cleeremans, A. (2005). Computational correlates of consciousness. Progress in Brain Research, 150, 81–98.
Cleeremans, A., & McClelland, J. L. (1991). Learning the structure of event sequences. Journal of Experimental Psychology: General, 120, 235–253.
Conway, C. M., & Christiansen, M. H. (2005). Modality-constrained statistical learning of tactile, visual, and auditory sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 24–39.
Conway, C. M., & Christiansen, M. H. (in press). Statistical learning within and between modalities: Pitting abstract against stimulus specific representations. Psychological Science.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185.
Destrebecqz, A., & Cleeremans, A. (2001). Can sequence learning be implicit? New evidence with the process dissociation procedure. Psychonomic Bulletin and Review, 8, 343–350.
Dienes, Z. (1992). Connectionist and memory-array models of artificial grammar learning. Cognitive Science, 16, 41–79.
Dienes, Z. (2004). Assumptions of subjective measures of unconscious mental states: Higher order thoughts and bias. Journal of Consciousness Studies, 11, 25–45.
Dienes, Z., & Altmann, G. (2003). Measuring learning using an untrained control group: Comment on R. Reber and Perruchet. The Quarterly Journal of Experimental Psychology, 56A, 117–123.
Dienes, Z., Altmann, G. T. M., & Gao, S. (1999). Mapping across domains without feedback: A neural network model of transfer of implicit knowledge. Cognitive Science, 23, 53–82.
Dienes, Z., Altmann, G. T. M., Kwan, L., & Goode, A. (1995). Unconscious knowledge of artificial grammars is applied strategically. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1322–1338.
Dienes, Z., Broadbent, D., & Berry, D. (1991). Implicit and explicit knowledge bases in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 875–887.
Dienes, Z., & Kuhn, G. (2005). Implicit learning of nonlocal musical rules: Implicitly learning more than chunks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1417–1432.
Dienes, Z., & Longuet-Higgins, H. C. (2004). Can musical transformations be implicitly learned? Cognitive Science, 28, 531–558.
Dienes, Z., & Perner, J. (1999). A theory of implicit and explicit knowledge. Behavioral and Brain Sciences, 22, 735–808.
Dienes, Z., & Scott, R. (2005). Measuring unconscious knowledge: Distinguishing structural knowledge and judgment knowledge. Psychological Research, 69, 338–351.
Don, A. J., Schellenberg, E. G., Reber, A. S., DiGirolamo, D. M., & Wang, P. P. (2003). Implicit learning in individuals with Williams Syndrome. Developmental Neuropsychology, 23, 201–225.
Dulany, D. E. (1997). Consciousness in the explicit (deliberative) and implicit (evocative). In J. D. Cohen & J. W. Schooler (Eds.), Scientific approaches to consciousness (pp. 179–211). Mahwah, NJ: Erlbaum.
Dulany, D. E. (2003). Strategies for putting consciousness in its place. Journal of Consciousness Studies, 10, 33–43.
Dulany, D. E. (2004). Higher order representation in a mentalistic metatheory. In R. J. Gennaro (Ed.), Higher-order theories of consciousness: An anthology (pp. 315–338). Amsterdam: Benjamins.
Dulany, D. E., Carlson, R. A., & Dewey, G. I. (1984). A case of syntactical learning and judgment: How conscious and how abstract? Journal of Experimental Psychology: General, 113, 541–555.
Dulany, D. E., Carlson, R. A., & Dewey, G. I. (1985). On consciousness in syntactic learning and judgment: A reply to Reber, Allen, and Regan. Journal of Experimental Psychology: General, 114, 33–49.
Eldridge, L. L., Masterman, D., & Knowlton, B. J. (2002). Intact implicit habit learning in Alzheimer’s disease. Behavioral Neuroscience, 116, 722–726.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Fletcher, P., Buechel, C., Josephs, O., Friston, K., & Dolan, R. (1999). Learning-related neuronal responses in prefrontal cortex studied with functional neuroimaging. Cerebral Cortex, 9, 168–178.
Gentner, D., & Medina, J. (1998). Similarity and the development of rules. Cognition, 65, 263–297.
Gibson, J. J., & Gibson, E. J. (1955). Perceptual learning: Differentiation or enrichment? Psychological Review, 62, 32–41.
Glenberg, A. M. (1997). What memory is for. Behavioral and Brain Sciences, 20, 1–55.
Goldstone, R. L., & Barsalou, L. W. (1998). Reuniting perception and conception. Cognition, 65, 231–262.
Gomez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13, 431–436.
Gomez, R. L., Gerken, L., & Schvaneveldt, R. W. (2000). The basis of transfer in artificial grammar learning. Memory and Cognition, 28, 253–263.
Gomez, R. L., & Schvaneveldt, R. W. (1994). What is learned from artificial grammars? Transfer tests of simple association. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 396–410.
Goodman, N. (1972). Seven strictures on similarity. In N. Goodman (Ed.), Problems and projects (pp. 437–447). Indianapolis, IN: Bobbs-Merrill.
Gordon, P. C., & Holyoak, K. J. (1983). Implicit learning and generalization of the “mere exposure” effect. Journal of Personality and Social Psychology, 45, 492–500.
Gray, J. (2004). Consciousness: Creeping up on the hard problem. Oxford, England: Oxford University Press.
Gross, C. G., Rocha-Miranda, C. E., & Bender, D. B. (1972). Visual properties of cells in inferotemporal cortex of the macaque. Journal of Neurophysiology, 35, 96–111.
Halford, G., Wilson, W., & Philips, S. (1998). Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology. Behavioral and Brain Sciences, 21, 803–865.
Hayes, N. A., & Broadbent, D. E. (1988). Two modes of learning for interactive tasks. Cognition, 28, 249–276.
Higham, P. A. (1997a). Chunks are not enough: The insufficiency of feature frequency-based explanations of artificial grammar learning. Canadian Journal of Experimental Psychology, 51, 126–137.
Higham, P. A. (1997b). Dissociations of grammaticality and specific

242

POTHOS

similarity effects in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1029 –1045. Higham, P. A., & Vokey, J. R. (1994). Resource to stored exemplars is not necessarily explicit: A comment on Knowlton, Ramus, and Squire (1992). Psychological Science, 5, 59 – 60. Higham, P. A., Vokey, J. R., & Pritchard, J. L. (2000). Beyond dissociation logic: Evidence for controlled and automatic influences in artificial grammar learning. Journal of Experimental Psychology: General, 129, 457– 470. Howard, J. H., & Ballas, J. A. (1980). Syntactic and semantic factors in classification of nonspeech transient patterns. Perception and Psychophysics, 28, 431– 439. Huppert, F. A., & Piercy, M. (1978). The role of trace strength in recency and frequency judgments by amnesic and control subjects. Quarterly Journal of Experimental Psychology, 30, 347–354. Johansen, M. K., & Palmeri, T. J. (2002). Are there representational shifts during category learning? Cognitive Psychology, 45, 482–553. Johnstone, T., & Shanks, D. R. (1999). Two mechanisms in implicit grammar learning? Comment on Meulemans and Van der Linden (1997). Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 524 –531. Juola, P., & Plunkett, K. (1998). Why double dissociations don’t mean much. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 561–566). Mahwah, NJ: Erlbaum. Kinder, A., & Shanks, D. R. (2001). Amnesia and the declarative/ nondeclarative distinction: A recurrent network model of classification, recognition, and repetition priming. Journal of Cognitive Neuroscience, 13, 648 – 669. Kinder, A., Shanks, D. R., Cock, J., & Tunney, R. J. (2003). Recollection, fluency, and the explicit/implicit distinction in artificial grammar learning. Journal of Experimental Psychology: General, 132, 551–565. Knowlton, B. J. (1999). 
What can neuropsychology tell us about category learning? Trends in Cognitive Sciences, 3, 123–124. Knowlton, B. J., Ramus, S. J., & Squire, L. R. (1992). Intact artificial grammar learning in amnesia: Dissociation of classification learning and explicit memory for specific instances. Psychological Science, 3, 172– 179. Knowlton, B. J., & Squire, L. R. (1993, December). The learning of categories: Parallel brain systems for item memory and category knowledge. Science, 262, 1747–1749. Knowlton, B. & Squire, L. R. (1994a). Artificial grammar learning and implicit memory: Reply to Higham and Vokey. Psychological Science, 5, 61. Knowlton, B. J., & Squire, L. R. (1994b). The information acquired during artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 79 –91. Knowlton, B. J., & Squire, L. R. (1996). Artificial grammar learning depends on implicit acquisition of both abstract and exemplar-specific information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 169 –181. Koch, C. (2004). The quest for consciousness: A neurobiological approach. Englewood, CO: Roberts and Company. Kroger, J. K., Holyoak, K. J., & Hummel, J. E. (2004). Varieties of sameness: The impact of relational complexity on perceptual comparisons. Cognitive Science, 28, 335–358. Lacroix, G. L., Giguere, G., & Larochelle, S. (2005). The origin of exemplar effects in rule-driven categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 272–288. Lieberman, M. D., Chang, G. Y., Chiao, J., Bookheimer, S. Y., & Knowlton, B. J. (2004). An event-related fMRI study of artificial grammar learning in a balanced chunk strength design. Journal of Cognitive Neuroscience, 16, 427– 438. Litman, L., & Reber, A. S. (2005). Implicit cognition and thought. In K. J.

Holyoak & R. G. Morrison (Eds.), Cambridge handbook of thinking and reasoning (pp. 431– 453). New York: Cambridge University Press. Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492–527. Lovibond, P. F., & Shanks, D. R. (2002). The role of awareness in Pavlovian conditioning: Empirical evidence and theoretical implications. Journal of Experimental Psychology: Animal Behavior Processes, 28, 3–26. Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19, 1–36. Manza, L., & Reber, A. S. (1997). Representing artificial grammars: Transfer across stimulus forms and modalities. In D. C. Berry (Ed.), How implicit is implicit learning? (pp. 73–106). Oxford: Oxford University Press. Marcus, G. F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 189 –256. Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman. Mathews, R. C. (1991). The forgetting algorithm: How fragmentary knowledge of exemplars can abstract knowledge. Journal of Experimental Psychology: General, 1, 117–119. Mathews, R. C., Buss, R. R., Stanley, W. B., Blanchard-Fields, F., Cho, J. R., & Druhan, B. (1989). Role of implicit and explicit processes in learning from examples: A synergistic effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1083–1100. Mayr, U. (1996). Spatial attention and implicit sequence learning: Evidence for independent learning of spatial and nonspatial sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 350 –364. McAndrews, M. P., & Moscovitch, M. (1985). Rule-based and exemplarbased classification in artificial grammar learning. Memory & Cognition, 13, 469 – 475. Meulemans, T., & Van der Linden, M. (1997). 
Associative chunk strength in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1007–1028. Meulemans, T., & Van der Linden, M. (2003). Implicit learning of complex information in amnesia. Brain and Cognition, 52, 250 –257. Miller, G. A. (1958). Free recall of redundant strings of letters. Journal of Experimental Psychology, 56, 485– 491. Moody, T. D., Bookheimer, S. Y., Vanek, Z., & Knowlton, B. J. (2004). An implicit learning task activates medial temporal lobe in patients with Parkinson’s disease. Behavioral Neuroscience, 118, 438 – 442. Morris, C. C., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519 –533. Nosofsky, R. M. (1988). Similarity, frequency, and category representations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 54 – 65. Nosofsky, R. M. (1989). Further tests of an exemplar-similarity approach to relating identification and categorization. Perception & Psychophysics, 45, 279 –290. Nosofsky, R. M., Clark, S. E., & Shin, H. J. (1989). Rules and exemplars in categorization, identification, and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 2, 282–304. Nosofsky, R. M., Stanton, R. D., & Zaki, S. R. (2005). Procedural interference in perceptual classification: Implicit learning or cognitive complexity? Memory & Cognition, 33, 1256 –1271. Nosofsky, R. M., & Zaki, S. R. (1998). Dissociations between categorization and recognition in amnesic and normal individuals: An exemplarbased interpretation. Psychological Science, 9, 247–255. Nosofsky, R., & Zaki, S. (1999). Math modeling, neuropsychology, and

THEORIES OF ARTIFICIAL GRAMMAR LEARNING category learning: Response to B. Knowlton (1999). Trends in Cognitive Sciences, 3, 125–126. Peigneux, P., Meulemans, T., Van der Linden, M., Salmon, E., & Petit, H. (1999). Exploration of implicit artificial grammar learning in Parkinson’s disease. Acta Neurologica Belgica, 99, 107–117. Perruchet, P. (1994). Defining the knowledge units of a synthetic language: Comment on Vokey and Brooks (1992). Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 223–228. Perruchet, P., & Pacteau, C. (1990). Synthetic grammar learning: Implicit rule abstraction or explicit fragmentary knowledge? Journal of Experimental Psychology: General, 119, 264 –275. Perruchet, P., & Peereman, R. (2004). The exploitation of distributional information in syllable processing. Journal of Neurolinguistics, 17, 97–119. Perruchet, P., Vinter, A., Pacteau, C., & Gallego, J. (2002). The formation of structurally relevant units in artificial grammar learning. Quarterly Journal of Experimental Psychology, 55A, 485–503. Pinker, S. (1994). The language instinct. New York: Morrow. Poldrack, R. A., Clark, J., Pare-Blagoev, E. J., Shohamy, D., Creso Moyana, J., Myers, C., & Gluck, M. A. (2001). Interactive memory systems in the human brain. Nature, 414, 546 –550. Pothos, E. M. (2005a). Expectations about stimulus structure in implicit learning. Memory & Cognition, 33, 171–181. Pothos, E. M. (2005b). The rules versus similarity distinction. Behavioral & Brain Sciences, 28, 1– 49. Pothos, E. M., & Bailey, T. M. (2000). The importance of similarity in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 847– 862. Pothos, E. M., Chater, N., & Ziori, E. (2006). Does stimulus appearance affect learning? The American Journal of Psychology, 119, 277–301. Pothos, E. M., & Cox, W. M. (2002). Cognitive bias for alcohol-related information in inferential processes. Drug and Alcohol Dependence, 66, 235–241. 
Rah, S. K. Y., Reber, A. S., & Hsiao, A. T. (2000). Another wrinkle on the dual-task SRT experiment: It’s probably not dual-task. Psychonomic Bulletin & Review, 7, 309–313.
Rathus, J. H., Reber, A. S., Manza, L., & Kushner, M. (1988). Implicit and explicit learning: Differential effects of affective states. Perceptual and Motor Skills, 79, 163–184.
Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 6, 855–863.
Reber, A. S. (1969). Transfer of syntactic structure in synthetic languages. Journal of Experimental Psychology, 81, 115–119.
Reber, A. S. (1976). Implicit learning of synthetic language. Journal of Experimental Psychology: Human Learning and Memory, 2, 88–94.
Reber, A. S. (1989). Implicit learning and tacit knowledge. Journal of Experimental Psychology: General, 118, 219–235.
Reber, A. S. (1993). Implicit learning and tacit knowledge. New York: Oxford University Press.
Reber, A. S., & Allen, R. (1978). Analogic and abstraction strategies in synthetic grammar learning: A functional interpretation. Cognition, 6, 189–221.
Reber, A. S., & Lewis, S. (1977). Toward a theory of implicit learning: The analysis of the form and structure of a body of tacit knowledge. Cognition, 5, 333–361.
Reber, P. J., Gitelman, D. R., Parrish, T. B., & Mesulam, M. M. (2003). Dissociating explicit and implicit category knowledge with fMRI. Journal of Cognitive Neuroscience, 15, 574–583.
Reber, P. J., & Squire, L. R. (1999). Intact learning of artificial grammars and intact category learning by patients with Parkinson’s disease. Behavioral Neuroscience, 113, 235–242.
Reber, P. J., Stark, C. E. L., & Squire, L. R. (1998). Contrasting cortical activity associated with category memory and recognition memory. Learning & Memory, 5, 420–428.
Reber, R., & Perruchet, P. (2003). The use of control groups in artificial grammar learning. The Quarterly Journal of Experimental Psychology, 56A, 97–115.
Reber, R., Schwarz, N., & Winkielman, P. (2004). Processing fluency and aesthetic pleasure: Is beauty in the perceiver’s processing experience? Personality and Social Psychology Review, 8, 364–382.
Redington, F. M., & Chater, N. (1996). Transfer in artificial grammar learning: Methodological issues and theoretical implications. Journal of Experimental Psychology: General, 125, 123–138.
Redington, M., & Chater, N. (2002). Knowledge representation and transfer in artificial grammar learning. In R. M. French & A. Cleeremans (Eds.), Implicit learning and consciousness (pp. 121–143). Hove, United Kingdom: Psychology Press.
Regehr, G., & Brooks, L. R. (1993). Perceptual manifestations of an analytic structure: The priority of holistic individuation. Journal of Experimental Psychology: General, 122, 92–114.
Rips, L. J. (1989). Similarity, typicality and categorization. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning. Cambridge, United Kingdom: Cambridge University Press.
Rouder, J. N., & Ratcliff, R. (2004). Comparing categorization models. Journal of Experimental Psychology: General, 133, 63–82.
Rouder, J. N., & Ratcliff, R. (2006). Comparing exemplar- and rule-based theories of categorization. Current Directions in Psychological Science, 15, 9–13.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
Schacter, D. L. (1992). Understanding implicit memory. American Psychologist, 47, 559–569.
Schacter, D. L., Chiu, C.-Y. P., & Ochsner, K. N. (1993). Implicit memory: A selective review. Annual Review of Neuroscience, 16, 159–182.
Schooler, L. J., & Hertwig, R. (2005). How forgetting aids heuristic inference. Psychological Review, 112, 610–628.
Servan-Schreiber, E. (1991). The competitive chunking theory: Models of perception, learning, and memory. Unpublished doctoral dissertation, Department of Psychology, Carnegie-Mellon University.
Servan-Schreiber, E., & Anderson, J. R. (1990). Learning artificial grammars with competitive chunking. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 592–608.
Shanks, D. R. (2005). Implicit learning. In K. Lamberts & R. Goldstone (Eds.), Handbook of cognition (pp. 202–220). London: Sage.
Shanks, D. R., & Darby, R. J. (1998). Feature- and rule-based generalization in human associative learning. Journal of Experimental Psychology: Animal Behavior Processes, 24, 405–415.
Shanks, D. R., Johnstone, T., & Staggs, L. (1997). Abstraction processes in artificial grammar learning. Quarterly Journal of Experimental Psychology, 50A, 216–252.
Shanks, D. R., & Perruchet, P. (2002). Dissociation between priming and recognition in the expression of sequential knowledge. Psychonomic Bulletin & Review, 9, 362–367.
Shanks, D. R., & St. John, M. F. (1994). Characteristics of dissociable human learning systems. Behavioral and Brain Sciences, 17, 367–447.
Shin, H. J., & Nosofsky, R. M. (1992). Similarity-scaling studies of “dot-pattern” classification and recognition. Journal of Experimental Psychology: General, 121, 278–304.
Skosnik, P. D., Mirza, F., Gitelman, D. R., Parrish, T. B., Mesulam, M. M., & Reber, P. J. (2002). Neural correlates of artificial grammar learning. NeuroImage, 17, 1306–1314.
Sloman, S. A., & Rips, L. J. (1998). Similarity as an explanatory construct. Cognition, 65, 87–101.
Smith, E. E., Langston, C., & Nisbett, R. E. (1992). The case for rules in reasoning. Cognitive Science, 16, 1–40.
Smith, E. E., Patalano, A. L., & Jonides, J. (1998). Alternative strategies of categorization. Cognition, 65, 167–196.
Smith, J. G., & McDowall, J. (2005). The implicit sequence learning deficit in patients with Parkinson’s disease: A matter of impaired sequence integration? Neuropsychologia, 44, 275–288.
Smith, J. G., & McDowall, J. (2006). When artificial grammar acquisition in Parkinson’s disease is impaired: The case of learning via trial-by-trial feedback. Cognitive Brain Research, 1067, 216–228.
Smith, J., Siegert, R. J., McDowall, J., & Abernethy, D. (2001). Preserved implicit learning on both the serial reaction time task and artificial grammar in patients with Parkinson’s disease. Brain and Cognition, 45, 378–391.
Sun, R., Merrill, E., & Peterson, T. (2001). From implicit skills to explicit knowledge: A bottom-up model of skill learning. Cognitive Science, 25, 203–244.
Sun, R., Slusarz, P., & Terry, C. (2005). The interaction of the explicit and the implicit in skill learning: A dual process approach. Psychological Review, 112, 159–192.
Tunney, R. J. (2005). Sources of confidence in implicit cognition. Psychonomic Bulletin & Review, 12, 367–373.
Tunney, R. J., & Altmann, G. T. M. (1999). The transfer effect in artificial grammar learning: Reappraising the evidence on the transfer of sequential dependencies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1–12.
Tunney, R. J., & Altmann, G. T. M. (2001). Two modes of transfer in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 614–639.
Tunney, R. J., & Shanks, D. R. (2003). Does opposition logic provide evidence for conscious and unconscious processes in artificial grammar learning? Consciousness and Cognition, 12, 201–218.
Turner, C. W., & Fischler, I. S. (1993). Speeded tests of implicit knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1165–1177.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Vokey, J. R., & Brooks, L. R. (1992). Salience of item knowledge in learning artificial grammar. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 328–344.
Vokey, J. R., & Brooks, L. R. (1994). Fragmentary knowledge and the processing-specific control of structural sensitivity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1504–1510.
Vokey, J. R., & Higham, P. A. (2004). Opposition logic and neural network models in artificial grammar learning. Consciousness and Cognition, 13, 565–578.
Wasserman, E. A., & Miller, R. R. (1997). What’s elementary about associative learning? Annual Review of Psychology, 48, 573–607.
Whittlesea, B. W. A., & Dorken, M. D. (1993). Incidentally, things in general are particularly determined: An episodic-processing account of implicit learning. Journal of Experimental Psychology: General, 122, 227–248.
Whittlesea, B. W. A., Jacoby, L. L., & Girard, K. (1990). Illusions of immediate memory: Evidence of an attributional basis for feelings of familiarity and perceptual quality. Journal of Memory and Language, 29, 716–732.
Whittlesea, B. W. A., & Leboe, J. P. (2000). The heuristic basis of remembering and classification: Fluency, generation, and resemblance. Journal of Experimental Psychology: General, 129, 84–106.
Whittlesea, B. W. A., & Price, J. R. (2001). Implicit/explicit memory versus analytic/nonanalytic processing: Rethinking the mere exposure effect. Memory & Cognition, 29, 234–246.
Whittlesea, B. W. A., & Williams, L. D. (2000). The source of feelings of familiarity: The discrepancy-attribution hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 547–565.
Whittlesea, B. W. A., & Wright, R. L. (1997). Implicit (and explicit) learning: Acting adaptively without knowing the consequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 181–200.
Wilkinson, L., & Shanks, D. R. (2004). Intentional control and implicit sequence learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 354–369.
Williams, P., & Tarr, M. J. (1997). Structural processing and implicit memory for possible and impossible figures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1344–1361.
Wills, S., & Mackintosh, N. J. (1998). Peak shift on an artificial dimension. Quarterly Journal of Experimental Psychology, 51B, 1–31.
Witt, K., Nuehsman, A., & Deuschl, G. (2002). Intact artificial grammar learning in patients with cerebellar degeneration and advanced Parkinson’s disease. Neuropsychologia, 40, 1534–1540.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology Monograph Supplement, 9(2), Part 2.
Zizak, D. M., & Reber, A. S. (2004). Implicit preferences: The role(s) of familiarity in the structural mere exposure effect. Consciousness and Cognition, 13, 336–362.

Received March 10, 2006
Revision received July 3, 2006
Accepted July 12, 2006
