Rethinking the Logical Problem of Language Acquisition

Brian MacWhinney
Carnegie Mellon University

[The child’s acquisition of grammar] is hopelessly underdetermined by the fragmentary evidence available. -- Chomsky 1968 Language and Mind

Abstract

The study of child language acquisition is dominated by three major competing visions: socialization theory, learning theory, and nativist theory. Each takes a different approach to a core issue in developmental psycholinguistics known as the logical problem of language acquisition (LPLA). This paper argues that the LPLA is composed of two, only partially related, sub-problems. The first form of the LPLA emphasizes recovery from overgeneralization. Nativists claim that, contrary to the claims of socialization theory, recovery occurs without corrective feedback under the guidance of innate constraints. Learning theory presents five plausible and interesting alternatives to constraints, including: conservatism, indirect negative evidence, competition, cue construction, and monitoring. The second form of the LPLA focuses on error-free processes in acquisition. The nativist claim is that error-free performance shows that the child understands the possible shape of human language. However, error-free performance can also arise from these same five learning mechanisms. The availability of so many mechanisms for addressing the logical problem indicates that it is time to view recovery from overgeneralization and error-free learning not as logical problems, but as evidence for the collaboration of acquisitional supports. We now need to specify the interactions of mechanisms derived from each of the three major competing visions. Emergentist models (MacWhinney, 1999c) present a particularly promising framework for specifying this integration, if they can develop richer linguistic representations and make fuller use of data on spontaneous conversational interactions.

Three Approaches to Child Language Learning

The study of child language acquisition is dominated by three major competing visions: socialization theory, learning theory, and nativist theory. Socialization theory holds that language is acquired from social interactions. Learning theory, in its modern connectionist form, holds that language is acquired through the detection of patterns in the input. Nativist theory holds that language is innately derived from a series of genetically programmed modules. Each of these theories is committed to providing fundamental accounts for all of the core phenomena of language acquisition. Among these core phenomena, one that has been the particular focus of theoretical attention is the capacity for recovery from overgeneralization. Overgeneralization and the subsequent recovery from overgeneralization are common processes in the normal course of language acquisition. Sometime during the first years, every normally developing English-speaking child will produce an overgeneralization like "goed" or "ated." We can be sure that children will also learn to stop making these errors. Each of the three major competing visions tells a very different story about how and why this recovery occurs. This paper will examine those assumptions, calling into question each of the currently accepted approaches to this issue and suggesting an alternative approach grounded on the notion of multiple supports for language learning.


1. Socialization Theory

The oldest and most widely held approach to language acquisition is socialization theory. This approach focuses on the role of caregivers as sources of social wisdom. Children are viewed as novices who are learning to act like others so that they can communicate their desires. The earliest articulation of this point of view was provided by St. Augustine in his Confessions when he described the ways in which he crudely negotiated the meanings of words with his elders in order to express his wills and desires:

This I remember; and have since observed how I learned to speak. It was not that my elders taught me words (as, soon after, other learning) in any set method; but I, longing by cries and broken accents and various motions of my limbs to express my thoughts, that so I might have my will, and yet unable to express all I willed or to whom I willed, did myself, by the understanding which Thou, my God, gavest me, practise the sounds in my memory. When they named anything, and as they spoke turned towards it, I saw and remembered that they called what they would point out by the name they uttered. And that they meant this thing, and no other, was plain from the motion of their body, the natural language, as it were, of all nations, expressed by the countenance, glances of the eye, gestures of the limbs, and tones of the voice, indicating the affections of the mind as it pursues, possesses, rejects, or shuns. And, thus, by constantly hearing words, as they occurred in various sentences, I collected gradually for what they stood; and, having broken in my mouth to these signs, I thereby gave utterance to my will. Thus, I exchanged with those about me these current signs of our wills, and so launched deeper into the stormy intercourse of human life, yet depending on parental authority and the beck of elders.

This view of language as a negotiated expression of will fits in well with the views of many developmental psychologists (Bruner, 1978; Ervin-Tripp, 1981; Moerk, 1983; Snow, 1995; Tomasello, 1999); social anthropologists (Heath, 1983; Hymes, 1964; Ochs, 1985; Scollon, 1976); and functional linguists (Chafe, 1987; Givón, 1979). Perhaps the strongest version of socialization theory is the position advocated by Hopper (1987), who suggests that grammar emerges directly from social interaction.

In child language, we see that the first uses of grammatical forms are often tightly linked to discourse contexts. For example, Schieffelin (1985) showed that the emergence of the Kaluli ergative is confined to high transitive uses in particular conversational contexts. Similarly, Idiazabal (in press) shows that the first uses of the perfective in Basque appear within narrative structures.

Socialization theory emphasizes the developmental importance of corrective feedback. In many middle-class families, those "magic moments" in which the parent provides corrective feedback occur hundreds of times each day. Let us take a look at one of the most often cited of these moments, as reported by McNeill (1966):

Child: Nobody don't like me.
Mother: No, say "Nobody likes me."
Child: Nobody don't like me.
(dialogue repeated eight times)
Mother: Now listen carefully, say "Nobody likes me."
Child: Oh! Nobody don't likes me.

Examining data from Adam, Eve, and Sarah (Brown, 1973) for evidence of learning during these magic moments, Brown and Hanlon (1970) found that correction was more often for meaning than for form and that, when formal correction was provided by the parent, it was not immediately echoed by the child. Further research has modified this initial assessment. Typically, the form of corrective feedback is not so overt as in the example from McNeill. Instead, parents rely on more subtle forms of recasting. Parents tend to provide corrections for form most often when the child's utterance is very close to the adult standard, often containing only one error (Bohannon, MacWhinney, & Snow, 1990; Bohannon & Stanowicz, 1988). Children who receive corrective feedback in the form of recasts tend to learn the corrected structures more quickly (Farrar, 1992; Nelson, 1982; Nelson, Denninger, Bonvillian, Kaplan, & Baker, 1984).

A very general finding of this research is that the type of feedback that parents provide to their children is finely tuned to the developmental stage of the child's grammar (Demetras, Post, & Snow, 1986; Hirsh-Pasek, Treiman, & Schneiderman, 1984; Morgan, Bonamo, & Travis, 1995; Penner, 1987; Post, 1994; Snow, 1995; Sokolov, 1993; Sokolov & MacWhinney, 1990).

It makes sense for a parent to provide some form of corrective feedback. However, unless the feedback is extremely stereotypic, the child may have trouble interpreting it as an overt correction (Marcus, 1993; Saxton, 1997). Consider the most extreme and clear form of corrective feedback: every time the child made a grammatical mistake, the parent would clap his hands and say "ungrammatical." If a parent were to provide absolutely obvious and uniform negative evidence in this way, interactions would look like this:

Child: me want more.
Father: ungrammatical.
Child: want more milk.
Father: ungrammatical.
Child: more milk!
Father: ungrammatical.
Child: (cries)
Father: ungrammatical.

However, parents cannot interact with their children in this unresponsive way. If they are to provide any form of feedback, it needs to be through recasting and expansion, rather than overt correction. Here is a more plausible interaction:

Child: Me want more.
Father: You want more? More what?
Child: Want more milk.
Father: You want more milk?
Child: More milk!
Father: Sure, honey, I'll get you some more.
Child: (cries)
Father: Now don't cry. Daddy is getting you some.

The parent's main goal in providing feedback to the child is not the provision of negative evidence, but the extraction of the child's meaning and the maintenance of a successful interaction. When one thinks a bit about the language learning process, this makes sense.

If the parent started from the beginning by providing uniform negative feedback to all ungrammatical sentences, virtually all of the child's first 1000 utterances would be marked with the word "ungrammatical" and an eyebrow raise or a clap of the hands. The child would learn little from this process except perhaps to avoid communicating with a person who provides nothing but raised eyebrows.

Proponents of socialization theory can argue that feedback need not be provided in this absolute fashion. Rather, both parents and children could obey the principles of signal detection theory by maximizing "hits" and "correct rejections," while minimizing "misses" and "false alarms." However, Marcus (1993) has shown that the actual distribution of individual types of feedback such as recasts or expansions to specific syntactic constructions is so noisy that a huge amount of feedback would be required before the child could establish a sufficient level of confidence to know that a given construction is either correct or incorrect.

Unfortunately, much of the discussion of the use of negative evidence has tended to underestimate the information-processing abilities of the child. In most models, it is assumed that the child focuses on the use of a single cue, rather than a combination of cues. By integrating a variety of cues with differential cue validities (Anderson, 1982; MacWhinney & Bates, 1989; Massaro, 1987), the child could establish an overall "negative feedback index" for each utterance. Some of the cues that could be integrated include overt correction, recasting, expansions, clarification questions, topic continuation, proxemics, gesture, and intonation. If the child could put together all of this information, there might be enough parental feedback to tag sentences as grammatical or ungrammatical.
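To make the cue-integration idea concrete, here is a minimal sketch in Python of how several imperfect cues might be pooled into a single negative feedback index for an utterance. The sketch is not part of the original proposal; the cue names, the validity values, and the noisy-or combination rule are all illustrative assumptions.

    # Hypothetical validities: the probability that a given response type, when it
    # occurs, signals a problem with the child's utterance. The values are
    # illustrative assumptions, not estimates from the feedback literature.
    CUE_VALIDITY = {
        "overt_correction": 0.95,
        "recast": 0.60,
        "clarification_question": 0.50,
        "expansion": 0.45,
        "stressed_repetition": 0.40,
    }

    def negative_feedback_index(observed_cues):
        """Pool the cues observed in the adult's response into a single index.

        Uses a simple noisy-or rule: each cue independently contributes some
        evidence that something was wrong. Returns a value between 0 and 1.
        """
        miss_probability = 1.0
        for cue in observed_cues:
            miss_probability *= 1.0 - CUE_VALIDITY.get(cue, 0.0)
        return 1.0 - miss_probability

    # A recast accompanied by a clarification question yields a fairly strong signal.
    print(negative_feedback_index(["recast", "clarification_question"]))  # ~0.8

Whether children actually pool cues in anything like this way is, of course, exactly the empirical question at issue; the sketch only shows that the computation itself is not demanding.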

Socialization theory emphasizes the importance of tutoring, scaffolding, and corrective feedback as cues that guide the child through every step of linguistic socialization. This view tends to minimize the importance of a priori hypotheses while maximizing the impact of the structure of the sociolinguistic environment. Socialization theory places a great emphasis on the "here and now" as the wellspring of grammatical learning. Because it assigns no particular role to memory or off-line hypothesis checking, socialization theory views linguistic input as having a direct and immediate effect on language learning. For the issue of recovery from overgeneralization, the finding that would provide the strongest support for socialization theory is one that shows direct links between parental feedback and recovery from overgeneralization. Evidence that parental feedback plays no direct role in language acquisition would strike at the heart of socialization theory by weakening its conceptual underpinnings and markedly limiting its scope.

2. Learning Theory

The second major vision of the process of language learning is that espoused by the empiricists and associationists. The roots of empiricism go back to Aristotle and the Skeptics in ancient Greece. In the modern period, philosophers such as Locke, Hume, and Berkeley outlined the general shape of associationist psychology. In the period between the two world wars, associationist thinking was a dominant theme in American psychology. During that period, associationism became closely linked to behaviorism, particularly in the work of Skinner, Hull, and Thorndike. In modern times, associationist thinking has reemerged without its earlier behaviorist linkages in the context of connectionism or neural network modeling. For our current concerns, one of the most interesting connectionist models is the account of past tense learning explored by Rumelhart and McClelland (1986), Plunkett and Marchman (1991), and MacWhinney and Leinbach (1991). Connectionism tends to be rather agnostic on the issue of the Poverty of the Stimulus. Some connectionist models, such as back-propagation networks, rely heavily on corrective feedback. Others, such as Competitive Learning and Adaptive Resonance Theory (ART), learn from positive data alone. Despite this apparent eclecticism and agnosticism, the issue of the Logical Problem of Language Acquisition is just as much a problem for connectionists as it is for nativists and social environmentalists. Not all learning mechanisms must be expressed in neural networks. However, all of the mechanisms we will consider here could be expressed in neural network terms.

3. Generativist Theory

The third major approach to language acquisition is Chomsky's generativist theory. At the core of this view of language learning is the rejection of the behaviorist premises of socialization theory. Chomsky (1965) has argued that natural language input is full of retracings, errors, and slips of the tongue. Because language is degenerate in these various ways, the child should find it difficult to acquire grammatical rules by simple induction across the input. This analysis of the learning challenge (Chomsky, 1980) has been called the "Argument from the Poverty of the Stimulus." According to this argument, no child can build an adult language out of such degenerate input without being guided by a rich set of species-specific innate hypotheses. These hypotheses are encoded genetically as specifications for the shape of the language organ. Some refer to this issue as the "Logical Problem of Language Acquisition" (LPLA) (Baker, 1979), while others have called it "Plato's Problem", "Chomsky's Problem", "Gold's Problem", or "Baker's Paradox". The LPLA has served as the fundamental motivation for an enormous body of research on both first and second language acquisition in the generativist framework. The argumentation in this work takes one of two forms. The first form focuses on the process of recovery from overgeneralization:

1. A linguistic structure is presented.
2. It is shown that children sometimes overgeneralize the use of this structure. This demonstrates that the structure is not being learned by rote.
3. It is argued that there is not enough evidence in the "stimulus" to force recovery from overgeneralization.
4. Therefore, the observed recovery from overgeneralization must be due to some innate mechanisms.

We can think of this argument as the argument from recovery. We will refer to it as the LPLA #1.


The second form of the nativist argument focuses on certain grammatical features that the child putatively produces without any errors at all. For this second type of phenomenon the argument is as follows:

1. A linguistic structure is presented.
2. It is shown that children use this structure without ever making mistakes.
3. This structure is shown to be so rare that children never encounter it in the input.
4. Therefore, the observed correct performance and avoidance of alternative incorrect performances must be due to innate mechanisms.

We can think of this analysis as the argument from lack of error. We will refer to it as the LPLA #2. Both of these forms of the LPLA are arguments from the Poverty of the Stimulus. However, they differ in terms of the nature of the child's performances. In the first case, the child produces errors and then recovers. In the second case, the child never makes an error in the first place. LPLA #1 and LPLA #2 have played very different roles in the literature. In his articulations of the theory of Principles and Parameters (P&P), Chomsky has tended to emphasize the importance of lack of error and LPLA #2. However, empirical studies of child language acquisition have tended more to emphasize recovery from overgeneralization and LPLA #1. A failure to clearly distinguish these two very different lines of argumentation has led to some confusion in this literature. Therefore, one of our goals here will be to clarify this distinction and to analyze each of the arguments separately.

The LPLA #1: Recovery from overgeneralization

Although these two forms of the LPLA look at very different language phenomena, both are grounded conceptually on a formal analysis presented by Gold (1967). Gold contrasted two different language-learning situations: text presentation and informant presentation. With informant presentation, the language learner can receive feedback from an infallible informant regarding the grammaticality of every candidate sentence.

This corrective feedback is called "negative evidence" and it only requires that ungrammatical strings be clearly identified as unacceptable. Whenever the learner formulates an overly general guess about some particular linguistic structure, the informant will label the resulting structure as ungrammatical and the learner will use this information to restrict the developing grammar. In the case of text presentation, the learner only receives information on acceptable sentences and no information regarding ungrammaticality is available. Gold showed that, with only text presentation, languages with reasonably complex grammars, such as those that have phrase structure rules, are not learnable. Nativists have then argued that, since language is not learnable from input in this way, it must be innate in the sense that the child must already have identified the basic shape of possible grammars before any learning begins.

Gold's proof is formulated in the terms of the abstract objects of recursive function theory. However, it only takes a little rephrasing to see how the proof can be applied directly to the actual process of language learning. The child can be viewed as the learner and the adult can be viewed as the informant. It does not matter for Gold's argument whether the child or the adult is the source of a given string. What is important is only the shape of the feedback associated with that string. In text presentation, no feedback can occur, so the following interaction types are possible:

   Utterance                   Feedback    Result
1. Child says, "went."         none        none
2. Child says, "*goed."        none        none
3. Adult says, "went."         none        positive data

The only information that the child receives in these sequences is positive data, since there is no feedback regarding the child's own productions. In sequence #1, there is no information presented regarding the acceptability of "went." However, sequence #3 does provide this positive evidence by allowing the "text" to include acceptable sequences. In sequence #2, there is no information presented regarding the unacceptability of "goed." Moreover, the adult "text presentation" will never produce the form "goed." Therefore, the child has no direct way of knowing that "goed" is ungrammatical.

Unlike text presentation, informant presentation provides feedback. The strongest form of feedback is that which presents positive feedback for grammatical utterances and negative feedback for ungrammatical utterances. If the child makes an error, it will be marked by a signal from the adult. The adult can produce the error directly, along with information signaling the fact that the error is ungrammatical. The provision of these signals is the responsibility of the adult. In the informant presentation scenario, there are four types of possible sequences:

   Utterance                   Adult Feedback    Result
1. Child says, "went."         Good              Positive data
2. Child says, "*goed."        Bad               Corrective feedback
3. Adult says, "went."         Good              Positive data
4. Adult says, "*goed."        Bad               Corrective feedback

There is no attested example in the literature of a sequence like #4 in which a parent spontaneously produced a random error just to have an opportunity to mark it as ungrammatical. Although such sequences never actually occur, they would fit in well with the Gold framework, if they did. However, in Gold's framework, a sequence like #2 is functionally equivalent to a sequence like #4, so the absence of #4 does not affect the analysis. In cases like sequence #3, the provision of positive feedback is not necessary, since the child can reasonably assume that most forms produced by the adult are grammatical. Of course, adults will occasionally make errors. However, on the level of the lexical item and the construction, the notion that adult input is correct is a good working assumption. To implement this fully, the child may need to filter out false starts and retracings, and just store away words, constructions, and sentences that are clear, unretraced, and fully comprehended. Once this is done, the child can then treat all remaining adult forms as positive evidence.

With text presentation, if the learner formulates an overly general hypothesis, there is no way to exclude that general hypothesis. Consider a very simple example in which the learner is given a corpus of regular present and past tense verbs, along with a few verbs that have irregular past tense forms. Using the regular past tense examples, the learner will induce a grammar that adds “-ed” to the end of the present tense. This rule will then produce the overgeneralized form “goed.” Without information regarding the ungrammaticality of “goed,” the learner will never be able to recover from this overgeneralization and will never learn to restrict the language to the smaller grammar that produces just “went.” Thus, the grammar induced by this process will forever remain too big, since it will include both “goed” and “went.”

[Figure: the overly general grammar, which produces forms such as "goed," "runned," "falled," and "wented," properly includes the correct grammar, which produces only forms such as "went" and "jumped."]


Gold showed that this problem occurs inevitably for the learning of all but the simplest forms of language. If the set of languages being explored includes only finite languages generated by finite-state machines (i.e., languages generated by regular, Markov processes), text presentation is adequate. To see why this is true, consider a simple finite-state grammar such as Grammar (1):

(1) [Finite-state diagram with a start state, an end state, and transitions labeled A, B, C, and D; the grammar generates the strings ABD and AC.]

This grammar will generate the strings ABD or AC. If we add the string ACD to the positive evidence in the input, the grammar will add a new connection to permit the additional string. The result will be Grammar (2):

(2) [The same finite-state diagram with one new transition added, so that the grammar now generates ABD, AC, and ACD.]

Learning involves the addition of new connections or transitions between nodes and no cutting or rewiring of old transitions. New positive strings always lead to the addition of new transitions. There is no way for a finite grammar of this type to overgeneralize or overgenerate, since it is simply an organized summary of the information in the input strings. Basically, the learning of a finite-state grammar is a very conservative, data-based process. However, if the set of possible grammars that may be confronting the child includes all possible finite grammars of this type as well as potentially at least one non-finite grammar, Gold shows that the correct grammar cannot be induced from text presentation.

For example, one non-finite grammar that is consistent with the strings ABD, AC, and ACD is Grammar (3):

(3) S  -> AP + (BP)
    AP -> A + (C)
    BP -> (B) + D

The problem with this grammar is that it will also generate the ungrammatical string ACBD. Since the learner will never be told that ACBD is ungrammatical, there will be no way to reject the nonfinite grammar and no way to settle on the correct grammar indicated in (2). Gold's proof relates to the case in which a child is willing to consider all possible finite-state grammars along with just one nonfinite grammar. One might object that this is a rather bizarre limitation. However, Gold selected this configuration only to illustrate the problem in its simplest form. One could equally well imagine that the child is examining the utility of many alternative non-finite grammars, along with the basic finite-state summaries of the input. If one allows the child to hypothesize multiple possible nonfinite grammars, the problem only gets worse. In this second scenario, the child could induce Grammar (4):

(4) S  -> AP + (BP) + (C) + (D)
    AP -> A
    BP -> BD

This second nonfinite grammar would generate illegal strings such as ABDCD or ADD. If the child goes down the road of formulating all manner of non-finite grammars, it is difficult to constrain this process to just a particular grammar. In fact, the child might well formulate both (3) and (4) as alternatives. Given this, and given the a priori commitment to view language identification as deterministic, many linguists and psycholinguists have accepted Gold's analysis and used it as the foundation stone upon which to build further analyses. When coupled with certain additional forms of argumentation, this logical problem of language acquisition (LPLA) has functioned as a major conceptual pillar supporting current work in generative linguistics, language acquisition theory, and second language acquisition theory.
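The core of the problem can be spelled out in a few lines of code. The sketch below (Python; the enumeration is written by hand for this toy case and is in no way part of Gold's proof) shows that the overly general Grammar (3) is consistent with every string the learner will ever hear, so positive data alone can never force its rejection.

    from itertools import product

    def grammar3_strings():
        """Enumerate the language of Grammar (3): S -> AP (BP), AP -> A (C), BP -> (B) D."""
        strings = set()
        for has_c, has_bp, has_b in product([False, True], repeat=3):
            ap = "A" + ("C" if has_c else "")
            bp = (("B" if has_b else "") + "D") if has_bp else ""
            strings.add(ap + bp)
        return strings

    positive_data = {"ABD", "AC", "ACD"}    # everything the learner ever hears
    overly_general = grammar3_strings()     # {'A', 'AC', 'AD', 'ABD', 'ACD', 'ACBD'}

    # Grammar (3) accounts for all of the positive data...
    assert positive_data <= overly_general
    # ...but it also generates strings, such as ACBD, that never occur in the input.
    # Because text presentation never marks these extra strings as ungrammatical,
    # nothing in the data can force the learner to reject Grammar (3).
    print(sorted(overly_general - positive_data))   # ['A', 'ACBD', 'AD']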

Solving the LPLA #1 through conservatism

The most direct way for a language learner to solve Gold's problem is to avoid formulating overly general grammars in the first place. If the child never overgeneralizes, there is no problem of recovery from overgeneralization and no need for negative evidence or corrective feedback. In the examples presented above, the conservative child would avoid formulating Grammar (4) and never go beyond a finite-state grammar. To ensure that this happens, the child simply has to avoid constructing a grammar with greater than finite-state complexity. This first solution to the LPLA #1 emphasizes the child's obedience to the Subset Principle of Angluin (1980) or Fodor and Crain (1987). The Subset Principle requires the child to avoid overgeneralization by always sticking with the most conservative grammar. It stipulates that grammars are ordered in a subset relation such that the child explores the more restrictive grammar first before even considering the less restrictive one. In essence, the Subset Principle says that the child is conservative.

Virtually all accounts of language learning assume some degree of conservatism in the child's approach to rule induction. Many children are able to avoid falling into the trap of overgeneralization by using linguistic forms cautiously and conservatively. For example, if a child avoids using a verb with dative movement until that verb is detected in a sentence with dative movement, dative movement overgeneralization will never occur. Conservative learners can learn without negative evidence, because they never make errors. This means that they never actually go beyond the data given. Baker (1981), Fodor and Crain (1987), Maratsos, Kuczaj, Fox, and Chalkley (1979) and others have emphasized the extent to which syntactic learning can proceed conservatively, often avoiding the need for negative evidence. Wolfe Quintero (1992) has shown that conservatism can be used to account for learner acquisition of the sentence patterns that have been used to motivate the subjacency constraint and its related parameter.

For example, she notes that second language learners acquire these positive contexts for wh-movement in this order:

What did the little girl hit __ with the block today?
What did the boy play with __ behind his mother?
What did the boy read a story about __ this morning?

Because they are proceeding conservatively, learners never produce forms such as:

*What did the boy with ___ read a story this morning?

They never hear this structure in the input and never hypothesize a grammar that includes it. As a result, they never make overgeneralizations and never attempt wh-movement in this particular context. Data from Maratsos, Kuczaj, Fox, and Chalkley (1979) suggest that this same analysis may also apply to first language learners.

Many child language researchers have emphasized the importance of item-based constructions (Braine, 1976; Lieven, Pine, & Baldwin, 1997; MacWhinney, 1975, 1982; Tomasello, 1992) in acquisition. If the child formulates and applies these patterns conservatively, overgeneralization will be minimized. For example, a common overgeneralization at age 3 involves the frequent verb "say." Children will ask parents to "say me that story" instead of "tell me that story." However, conservative children will not make this error, since they will only use the verb "say" in exactly the way it was used in the input. In the terms of MacWhinney (1982; 1988), conservative children will learn a finite-state transition network centered on the lexical item "tell." This network accepts (or generates) an NP in the role of "speaker" in preverbal position, an NP in the role of "listener" in postverbal position, and an NP in the role of "story" in the post-postverbal slot. A second network is used to produce the periphrastic dative, as in "tell that story to me." These two networks can then be joined into a single item-based finite-state grammar that operates on narrowly defined lexical categories. Children can learn this item-based grammar using positive data only. They can also learn a similar network for the verb "say."

However, for that network, there is only the periphrastic dative. Moreover, for the verb "say," the category of the NP in postverbal position is defined semantically as a short verbalization, rather than a longer story. This means that to minimize the possibility of error here, the child has to be conservative in three ways:

1. The child needs to formulate each syntactic combination as an item-based pattern.
2. Each item-based pattern needs to record the exact semantic status of each positive instance of an argument in a particular grammatical configuration (MacWhinney, 1988).
3. Attempts to use the item-based pattern with new arguments must be closely guided by the semantics of previously encountered positive instances.

If the child has a good memory and applies this method cautiously, overgeneralization will be minimized. Conservatism can be viewed as a powerful mechanism for addressing the LPLA. However, it is better understood as one of several crucial supports for successful acquisition. Children will eventually go "beyond the information given" and produce the occasional error (Jespersen, 1922). However, by blending a certain level of conservatism with other supports for successful acquisition, the child can make optimal progress in language learning.
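A minimal sketch of such an item-based pattern is given below (Python; the class, the role names, and the semantic labels are illustrative assumptions rather than claims about the actual representations). The pattern is keyed to a single verb, records the semantic types attested for each argument slot, and licenses a new combination only when every filler matches something already attested in its slot.

    from dataclasses import dataclass, field

    @dataclass
    class ItemBasedPattern:
        """An item-based construction: argument slots tied to a single lexical item."""
        verb: str
        roles: tuple                                   # e.g. ("speaker", "listener", "message")
        attested: dict = field(default_factory=dict)   # role -> semantic types seen in the input

        def learn(self, example):
            """Record the semantic type observed for each role in a positive instance."""
            for role, sem_type in example.items():
                self.attested.setdefault(role, set()).add(sem_type)

        def licenses(self, candidate):
            """Conservative use: accept a combination only if every filler's semantic
            type has already been attested for its role."""
            return all(sem_type in self.attested.get(role, set())
                       for role, sem_type in candidate.items())

    # The double-object pattern for "tell," learned from positive data only.
    tell = ItemBasedPattern("tell", ("speaker", "listener", "message"))
    tell.learn({"speaker": "person", "listener": "person", "message": "story"})
    print(tell.licenses({"speaker": "person", "listener": "person", "message": "story"}))  # True

    # A pattern for "say" built from the input never attests a listener slot in this
    # frame, so a conservative learner never generates "say me that story."
    say = ItemBasedPattern("say", ("speaker", "message"))
    say.learn({"speaker": "person", "message": "short verbalization"})
    print(say.licenses({"speaker": "person", "listener": "person", "message": "story"}))  # False

On this reading, the three kinds of conservatism listed above correspond to keying the pattern to a single verb, recording the attested semantic types, and checking new fillers against that record.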

Solving the LPLA #1 by recovering from overgeneralization

Even if the child minimizes error through conservatism, successful learning will require some form of negative evidence. The logic of Gold's proof cannot be avoided. When the child overgeneralizes, some force must prune back that overgeneralization. However, researchers (Marcus, 1993) have often mistakenly assumed that negative evidence is equivalent to overt parental correction. This is only true if the learner has no ability to construct secondary comparisons across the positive input. If we modify the Gold scenario by providing the learner with the ability to construct searches across the input, there are at least four ways to compute negative evidence from positive instances. These four processes are: competition, cue construction, monitoring, and indirect negative evidence.

1. Competition

Psychological theories have often referred to the notion of competition (Freud, 1958; Herbart, 1891). In the area of language acquisition, MacWhinney (1978) used competition to account for the interplay between "rote" and "analogy" in learning morphophonology. This mechanism was later generalized to all levels of linguistic processing in the form of the Competition Model (MacWhinney, 1988; MacWhinney & Bates, 1989). In the 1990s, the Competition Model was further elaborated in terms of neural network theory. The Competition Model views overgeneralizations as arising from three types of pressures. The first is the underlying analogic pressure that produces the overgeneralization. The second pressure is the growth in the rote episodic auditory representation of a correct form. This representation slowly grows in strength over time, as it is repeatedly strengthened through encounters with the input data. The third pressure is the competition of analogy with rote. Consider the case of "*goed" and "went" viewed diagrammatically. The overgeneralization "goed" is supported by analogy. It competes against the weak rote form "went," which is supported by auditory memory:

[Diagram: the meaning go + PAST is linked both to "went," which receives episodic/rote support, and to "go + ed," which receives analogic pressure; the two forms are in competition.]

As the strength of the rote auditory form for "went" grows, it begins to win out in the competition against the analogic form "*goed". Finally, the error is eliminated. Saxton (1997) has emphasized the ways in which competition operates directly during conversation.

He argues that, "When the child produces an utterance containing an erroneous form, which is responded to immediately with an utterance containing the correct adult alternative to the erroneous form (i.e. when negative evidence is supplied), then the child may perceive the adult form as being in contrast with the equivalent child form. Cognizance of a relevant contrast can then form the basis for perceiving the adult form as a correct alternative to the child form." (p. 155). Saxton refers to this juxtaposition as the Direct Contrast hypothesis. A paradigmatic example of a Direct Contrast exchange for Saxton would be:

Child: Well, I feeled it.
Adult: I felt it.
Child: I felt it.

As Saxton notes, the child is aware of the existence of both "felt" and "feeled" and uses the parental data to reinforce the strength of the former. Thus, Saxton's Direct Contrast account is equivalent to the Competition Model account (MacWhinney, 1993). Further implementing this concept, Saxton (1997; 1998) has conducted training experiments with novel irregular past tense forms. His studies clearly demonstrate the efficacy of providing correct models that are closely tuned to the child's own productions (Bohannon et al., 1990; Bohannon & Stanowicz, 1988). If the learner is sufficiently conservative, learning will be close to error free. In this account, conservatism works by placing relatively more reliance on episodic/rote support and discounting the influences of analogic pressure. Errors will only occur in cases where analogy is strongly in competition with rote. Generalizing away from the particular example given above, the general schema for competition looks like this:

[Diagram: a single meaning is linked to two competing word forms, one supported by analogic pressure and the other by episodic support.]


The competition between two candidate forms is governed by the strength of their episodic auditory representations. In the case of the competition between “*goed” and “went”, the overgeneralized form has little episodic auditory strength, since it is heard seldom if at all in the input. Although “*goed” lacks auditory support, it has strong analogic support from the general pattern for past tense formation (MacWhinney & Leinbach, 1991). In the Competition Model, analogic pressure stimulates overgeneralization and episodic auditory encoding reins it in. The analogic pressure hypothesized in this account has been described in detail in several connectionist models of morphophonological learning. The models that most closely implement the type of competition being described here are the models of MacWhinney and Leinbach (1991) for English and MacWhinney, Leinbach, Taraban, and McDonald (1989) for German. In these models, there is a pressure for regularization according to the general pattern that produces forms such as “*goed” and “*ranned”. In addition, there are weaker gang effects that lead to overgeneralizations such as “*stang” for the past tense of “sting”.
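A toy simulation can make these dynamics concrete. In the sketch below (Python; the initial strengths, the learning rate, and the update rule are illustrative assumptions, not parameters taken from any of the models cited above), episodic support for "went" grows with each encounter in the input while analogic support for "goed" remains constant, and the error disappears once the rote form wins the competition.

    rote_strength = 0.05       # initial episodic/rote support for "went"
    analogic_strength = 0.60   # analogic support for "go" + "-ed" from the regular pattern
    learning_rate = 0.10       # gain in rote strength per encounter with "went" in the input

    for encounter in range(1, 51):
        # Each encounter with "went" strengthens the episodic trace, with diminishing returns.
        rote_strength += learning_rate * (1.0 - rote_strength)
        winner = "went" if rote_strength > analogic_strength else "*goed"
        if winner == "went":
            print(f"rote 'went' wins the competition after {encounter} encounters")
            break

In the fuller connectionist models cited above, the analogic term is of course not a constant but is itself a product of the network's generalization over many verbs; the constant used here is only a stand-in for that pressure.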

Morphological Competition

Bowerman (1987) has suggested that recovery from overgeneralizations such as "*unsqueeze" is particularly problematic for a Competition Model account. To make this example concrete, let us imagine that "*unsqueeze" is being used to refer to the voluntary opening of a clenched fist. In this case, likely competitors include "release" or "let go." Because there is no rote auditory support for "*unsqueeze," forms like "release" or "let go" will eventually compete against and eliminate this particular error. Several semantic cues support this process of recovery. In particular, inanimate objects such as rubber balls and sponges cannot be "*unsqueezed" in the same way that they can be "squeezed." Squeezing is only reversible if we focus on the action of the body part doing the squeezing, not the object being squeezed. Or consider the competition between "*unapprove" and "disapprove". We might imagine that a mortgage loan application that has been initially approved can then be subsequently "unapproved." At that point, we would still not have heard "unapproved" actually supported by input data, but there would be less direct competition with "disapprove." Forces that minimize the competition between meanings can help an overgeneralization survive long enough for it to begin to carve out its own "ecological niche" (MacWhinney, 1989).

Lexical Competition

The same logic that can be used to account for recovery from morphological overgeneralizations can be used to account for recovery from lexical overgeneralizations. For example, a child may overgeneralize the word "kitty" to refer to tigers and lions. The child will eventually learn the correct names for these animals and restrict the overgeneralized form. The same three forces are at work here: analogic pressure, competition, and episodic encoding. Although the child has never actually seen a "kitty" that looks like a tiger, there are enough shared features to license the generalization. If the parent supplies the name "tiger," there is a new episodic encoding which then begins to compete with the analogic pressure. If no new name is supplied, the child may still begin to accumulate some negative evidence, noting that this particular use of "kitty" is not yet confirmed in the input. Merriman (1999) has shown how the linking of competition to a theory of attentional focusing can account for the major empirical findings in the literature on Mutual Exclusivity (Markman, 1989), or the tendency to treat each object as having only one name. By treating this constraint as an emergent bias, we avoid a variety of empirical problems (MacWhinney, 1991). Since competition is implemented probabilistically through fuzzy logic (Massaro, 1987) or connectionist nets, it only imposes a bias, rather than a fixed constraint. The probabilistic basis for competition allows the child to deal with hierarchical category structure without having to enforce major conceptual reorganization (Carey, 1985).

Competition may initially lead a child to avoid referring to a "robin" as a "bird," since the form "robin" would be a direct match. However, there are contexts in which "bird" does not compete directly with "robin." These include reference to a collection of different types of birds that may include robins, reference to an object that cannot be clearly identified as a robin, or anaphoric reference to an item that was earlier mentioned as a "robin."

Syntactic Frame Competition

Overgeneralizations in syntax arise when a valency pattern common to a large group of verbs is incorrectly overextended to a new verb. This type of overextension has been analyzed in both distributed networks (Miikkulainen & Mayberry, 1999) and interactive activation networks (MacDonald, Pearlmutter, & Seidenberg, 1994; MacWhinney, 1987). These networks demonstrate the same gang effects and generalizations found in networks for morphological forms (Plunkett & Marchman, 1993) and spelling correspondences (Plaut, McClelland, Seidenberg, & Patterson, 1996). If a word shares a variety of semantic features with a group of other words, it will be treated syntactically as a member of the group. Consider the example of overgeneralizations of dative movement. Verbs like "give", "send", and "ship" all share a set of semantic features involving the transfer of an object through some physical medium. In this regard, they are quite close to a verb like "deliver" and the three-argument group exerts strong analogic pressure on the verb "deliver". However, dative movement only applies to certain frequent, monosyllabic transfer verbs and not to multisyllabic, Latinate forms with a less transitive semantics such as "deliver" or "recommend." When children overgeneralize and say, "Tom delivered the library the book," they are being influenced by the underlying analogic pressure of the group of transfer verbs that permit dative movement. In effect, the child has created a new argument frame for the verb "deliver." The first argument frame only specifies two arguments – a subject or "giver" and an object or "thing transferred."

The new lexical entry specifies three arguments. These two homophonous entries for "deliver" are now in competition, just as "*goed" and "went" were in competition. Like the entry for "*goed", the three-place entry for "deliver" has good analogic support, but no support from episodic encoding derived from the input. Over time, it loses in its competition with the two-argument form of "deliver" and its progressive weakening, along with the strengthening of the competing form, leads to recovery from overgeneralization. Thus, the analysis of recovery from "Tom delivered the library the book" is identical to the analysis of recovery from "*goed".

2. Cue construction

Most recovery from overgeneralization relies on competition. However, competition will eventually encounter limits in its ability to deal with the fine details of grammatical patterns. To illustrate these limits, consider the case of recovery from causative overgeneralizations such as "*I untied my shoes loose". This particular extension receives analogic support from verbs like "shake" or "kick" which permit "I shook my shoes loose" or "I kicked my shoes loose." It appears that the child is not initially tuned in to the fine details of these semantic classifications. Bowerman (1988) has suggested that the process of recovery from overgeneralization may lead the child to construct new features to block overgeneralization. We can refer to this process as "cue construction." Recovering from other causative overgeneralizations may also require cue construction. For example, an error such as "*The gardener watered the tulips flat" can be attributed to a derivational pattern which yields three-argument verbs from "hammer" or "rake", as in "The gardener raked the grass flat." Source-goal overgeneralization can also fit into this framework. Consider "*The maid poured the tub with water" instead of "The maid poured water into the tub" and "*The maid filled water into the tub" instead of "The maid filled the tub with water". In each case, the analogic pressure from one group of words leads to the establishment of a case frame that is incorrect for a particular verb. Although this competition could be handled just by the strengthening of the correct patterns, it seems likely that the child also needs to clarify the shape of the semantic features that unify the "pour" verbs and the "fill" verbs.

Bowerman (personal communication) provides an even more challenging example. One can say "The customers drove the taxi driver crazy," but not "*The customers drove the taxi driver sad." The error involves an overgeneralization of the exact shape of the resultative adjective. A connectionist model of the three-argument case frame for "drive" would determine not only that certain verbs license a third possible argument, but also what the exact semantic shape of that argument can be. In the case of the standard pattern for verbs like "drive", the resultant state must be terminative, rather than transient. To express this within the Competition Model context, we would need to have a competition between a confirmed three-argument form for "drive" and a looser overgeneral form based only on analogic pressure. A similar competition account can be used to account for recovery from an error such as "*The workers unloaded the truck empty" which contrasts with "The workers loaded the truck full". In both of these cases, analogic pressure seems weak, since examples of such errors are extremely rare in the language learning literature. The actual modeling of these competitions in a neural network will require detailed lexical work and extensive corpus analysis. A sketch of the types of models that will be required is given in MacWhinney (1999a).

3. Monitoring

The Competition Model holds that, over time, correct forms gain strength from encounters with positive exemplars and that this increasing strength leads them to drive out incorrect forms. In the terms of Gold's analysis, this strengthening of correct forms can guarantee the learnability of language. However, by itself, competition does not fully account for the dynamics of language processing in real social interactions. Consider a standard self-correction such as "I gived, uh, gave my friend a peach." Here the correct form "gave" is activated in real time just after the production of the overgeneralization.

MacWhinney (1978) and Elbers (1993) have treated this type of self-correction as involving "expressive monitoring" in which the child listens to her own output, compares the correct weak rote form with the incorrect overgeneralization, and attempts to block the output of the incorrect form. One possible outcome of expressive monitoring is the strengthening of the weak rote form and weakening of the analogic forms. Exactly how this is implemented will vary from model to model. In general, retraced false starts move from incorrect forms to correct forms, indicating that the incorrect forms are produced quickly, whereas the correct rote forms take time to activate. Kawamoto (1994) has shown how a recurrent connectionist network can simulate exactly these timing asymmetries between analogic and rote retrieval. For example, Kawamoto's model captures the experimental finding that incorrect regularized pronunciations of "pint" to rhyme with "hint" are produced faster than correct irregular pronunciations.

An even more powerful learning mechanism is what MacWhinney (1978) called "receptive monitoring." If the child shadows input structures closely, he will be able to pick up many discrepancies between his own productive system and the forms he hears. Berwick (1987) found that a great deal of syntactic learning can be driven by the attempt to extract meaning during comprehension. Whenever the child cannot parse an input sentence, the failure to parse can be used as a means of expanding the grammar. The kind of analysis through synthesis that occurs in some parsing systems can make powerful use of positive instances to establish new syntactic frames. Receptive monitoring can also be used to recover from overgeneralization. The child may monitor the form "went" in the input and attempt to use his own grammar to match that input. If the result of the receptive monitoring is "*goed", the child can use the mismatch to reset the weights in the analogic system to avoid future overgeneralizations. Neural network models that rely on back-propagation assume that negative evidence is continually available for every learning trial. This assumption is clearly much too strong.

However, not all connectionist models rely on the availability of negative evidence. For example, Kohonen's self-organizing feature map model (Miikkulainen, 1993) learns linguistic patterns simply using co-occurrences in the data with no reliance on negative evidence.

4. Indirect Negative Evidence

Another interesting approach to the LPLA involves the examination of the input corpus to compute indirect negative evidence. This computation can be illustrated with the error "*goed." To construct indirect negative evidence in this case, children need to track:

1. The frequency of all verbs.
2. The frequency of the past tense as marked by the regular "-ed."
3. The ratio of (2) over (1).
4. The frequency of the verb "go."
5. The predicted frequency of the form "*goed" as the product of (3) times (4).
6. The actual frequency of "*goed" in the input.

If (5) exceeds (6) by some specified threshold, then children can conclude that the form "*goed" is excluded by the grammar. They can do this without ever receiving overt correction from the informant. Arguments based on this analysis have been offered by Chomsky (1981), Lasnik (1989), Braine (1989) and others. In logical terms, indirect negative evidence is an interesting solution to the LPLA. However, there is little actual evidence that children keep track of the facts they would need to perform this computation. For elements (1) and (2) above, it might be sufficient to only track the relative frequency of the present and the past for a few core verbs. However, some frequency tracking of the general class must be done. A neural network model or some other generalization mechanism could compute (3) and (5). Moreover, the frequency tracking in (4) and (6) is something that most learning models will have to assume in any case. The real question for this approach is whether children actually compute anything like (1) and (2).

Recent evidence for a slow rise in generalization abilities before age 3 (Pine, Lieven, & Rowland, 1998; Tomasello, 2000) suggests that indirect negative evidence might well be available to older children, but probably not to younger children. Interestingly, the structures for which indirect negative evidence provides the most useful accounts are ones that are learned rather late. These typically involve the LPLA #2, rather than the LPLA #1. For example, the learner could compute indirect negative evidence that would block wh-raising from object-modifying relatives in sentences such as:

The police arrested the thieves who were carrying the loot.
*What did the police arrest the thieves who were carrying?

To do this, they would need to track the frequency of sentences such as:

Bill thought the thieves were carrying the loot.
What did Bill think the thieves were carrying?

Noting that raising from predicate complements occurs fairly frequently, children can reasonably conclude that the absence of raising from object modification position means that it is ungrammatical. Coupled with conservatism, indirect negative evidence could be a powerful mechanism for avoiding overgeneralization of complex syntactic structures. Unfortunately, we have little direct evidence demonstrating that either children or adults compute indirect negative evidence in the way suggested above. One problem faced by the indirect negative evidence account is that the child would need to know beforehand which structures to include in the ratio. For example, the child would need to know that the frequency of raising in relatives needs to be compared with the frequency of raising in complements. However, if learning is item-based, as suggested earlier, this comparison could be restricted to structures potentially involving a particular lexical item such as "what" or "where." This suggests that the computation of indirect negative evidence may be partially linked to the same item-based mechanisms that support conservatism.
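For the morphological case, the six-step computation listed above can be written out directly. The sketch below (Python; the corpus counts and the decision threshold are made-up numbers used purely for illustration) compares the predicted and observed frequencies of "*goed."

    corpus_counts = {
        "verb_tokens": 10000,         # (1) frequency of all verbs
        "regular_past_tokens": 2500,  # (2) frequency of past tenses marked with regular "-ed"
        "go_tokens": 400,             # (4) frequency of the verb "go"
        "goed_tokens": 0,             # (6) actual frequency of "goed" in the input
    }
    threshold = 20                    # illustrative criterion for treating a gap as reliable

    past_rate = corpus_counts["regular_past_tokens"] / corpus_counts["verb_tokens"]  # (3)
    predicted_goed = past_rate * corpus_counts["go_tokens"]                          # (5)

    if predicted_goed - corpus_counts["goed_tokens"] > threshold:
        print(f"expected about {predicted_goed:.0f} tokens of 'goed' but observed "
              f"{corpus_counts['goed_tokens']}; treat 'goed' as excluded by the grammar")

As the text notes, the tracking in steps (4) and (6) is something most learning models already assume; the open empirical question is whether anything like steps (1) through (3) is computed at all.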


The Competition Model account can also be extended to compute indirect negative evidence. The indirect negative evidence tracker could note that, although “squeeze” occurs frequently in the input, “*unsqueeze” does not. Diagrammatically, this mechanism works through the juxtaposition of a form receiving episodic support (“squeeze”) with a predicted inflected form (“unsqueeze”).

[Diagram: the confirmed form "squeeze," which receives episodic/rote support, is compared with the predicted form "unsqueeze," which is generated by analogic prediction but remains unconfirmed in the input; gap tracking registers this non-occurrence.]

This mechanism uses analogic pressure to predict the form "*unsqueeze." This is the same mechanism as used in the generation of "*goed." However, the child does not need to actually produce "*unsqueeze," only to hypothesize its existence. This form is then tracked in the input. If it is not found, the comparison of the near-zero strength of the unconfirmed form "unsqueeze" with the confirmed form "squeeze" leads to the strengthening of competitors such as "release" and blocking of any attempts to use "unsqueeze." Although this mechanism is plausible, it is more complicated than the basic competition mechanism and places a greater requirement on memory for the tracking of non-occurrences. Since the end result of this tracking of indirect negative evidence is the same as that of the basic competition mechanism, it is reasonable to imagine that learners use this mechanism only as a fallback strategy, relying on simple competition for most problems with overgeneralization.

Solving the LPLA #1 by recharacterizing the target

A less direct, but equally effective, method of solving the LPLA #1 involves a recharacterization of the shape of the target grammar.


Gold's analysis shows that, if the child hypothesizes a language with more than finite-state complexity, negative evidence will be needed to recover from overgeneralization. However, if we provide a characterization of language that stays within the bounds set by this proof, then we can assume that children are capable of learning language through simple positive data. In that case, the LPLA #1 essentially vanishes. There are five ways we can achieve this type of recharacterization. The first involves the postulation of a set of innate constraints, as in Principles and Parameters (P&P) Theory. A second involves the imposition of a strict ordering on the set of constraints, as in Optimality Theory (OT). A third approach views constraints not as innate, but as emergent. A fourth recharacterization involves providing alternative characterizations of the formal shape of the target grammar. The fifth involves a recharacterization of the end state of language learning as probabilistic, rather than deterministic. Let us examine each of these five recharacterizations.

1. Innate constraints

Generativists argue that children solve the LPLA by obeying innate constraints on the shape of possible grammars that they consider. Viewed historically, the constraints imposed by the child have played a large role in the development of generative theory. For example, early on, generativists realized that, even with informant presentation, the child could not learn a full transformational grammar of the type proposed in Chomsky (1957). The problem at that time was a technical one, since the transformational component of the grammar could be characterized and ordered in so many alternative ways that it was essentially impossible to know which form was uniquely correct, even with negative evidence. The solution was to constrain the shape and ordering of transformations (Chomsky, 1973). For example, permutations were eliminated, since they could be formulated as combinations of additions and deletions. Pursuing this line of thinking, Wexler and Culicover (1980) showed that constraints such as subjacency could allow children to acquire a transformational grammar, as long as some types of negative evidence were provided.

Their demonstration depended on the fact that subjacency limited the depth to which the child would have to track interrelations between syntactic roles across clauses. Lightfoot (1989) then showed that the child could acquire nearly all of the important rules of the language from non-embedded structures. He called this degree-0 learnability. Over the last four decades, each new version of generative grammar has brought with it a new vision of the innate constraints that provide the child with prior guidance about the shape of human language. In the 1980s, these constraints involved parameterized principles contained in a series of modules. Children were thought to begin learning with the parameters set for some default value and would only change this default setting if they encountered some triggering linguistic structure (Jespersen, 1922; Matthews & Demopoulos, 1989). The learning of marked parameters in the theory of Principles and Parameters (P&P) can avoid the LPLA #1 if three conditions are met. First, there must be a small set of possible parameters constituting the set of possible human languages. Second, there must be a clear specification of the unmarked settings of these parameters. Third, there must be a clear specification of the surface structure triggers that would lead the child to move from an unmarked parameter setting to a marked parameter setting for each of the hypothesized parameters. Despite two decades of work within the framework of P&P, none of these three conditions has yet been met. Nonetheless, researchers in the P&P tradition remain optimistic about the program, as well as its newer articulation in the minimalist framework. Chomsky (1981) has noted that the P&P view of language acquisition leads directly to a trivial solution to the LPLA. However, there has not yet been any general acceptance of this view among generative linguists (Osherson, Stob, & Weinstein, 1989) or child language researchers (Pinker, 1984).


2. Strict constraint ordering

Like P&P, Optimality Theory (OT) views language structure as arising from the application of a universal set of constraints. Learning a particular language is essentially just learning the correct ordering of the constraints in this universal set. The fullest articulation of OT has been in the area of phonology, where Tesar and Smolensky (2000) have offered a formal proof of the learnability of OT phonology without negative evidence. Initially, one might think that this demonstration has little bearing on the main line of discussion regarding the learnability of grammar. However, OT has now also been applied to syntax (Barbosa, Fox, Hagstrom, McGinnis, & Pesetsky, 1997). Moreover, as Pulleybank and Turkel (1997) observe, OT faces the same learnability problems in phonology and in syntax.

Although both P&P and OT emphasize the role of constraints in typology and learning, they are still generative grammars deep down. In P&P, it is assumed that the basic rules of X-bar syntax and move-α operate to produce all possible structures. The constraints then act as filters, weeding out the millions of ill-formed structures and leaving the few that are actually grammatical. In OT phonology, the same strategy applies. Each word begins in its underlying form. All possible derivations through the phonological processes that implement the constraints are then generated, and those that violate highly ranked constraints are thrown out. The single remaining form is the one that violates either no constraint or only some very weakly ranked constraint.

In OT, learning the phonology of a language involves learning a specific ordering of the universal constraints. Tesar and Smolensky (2000) show that, if one assumes no interaction between constraints and a strict dominance ordering within each possible language, a certain form of indirect negative evidence can be used to learn which constraints should be demoted on the basis of particular data from a language. If a child learns a form from the input in which constraint B takes precedence over constraint A, and if constraint A is ranked above constraint B in the child's current grammar, then the child will simply demote constraint A on the basis of this positive evidence.
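The logic of this demotion step can be sketched in a few lines of code. The sketch below is only an illustration of the idea, not Tesar and Smolensky's full Constraint Demotion algorithm; the constraint names, the violation profiles, and the use of a single total ordering rather than stratified rankings are all simplifying assumptions introduced here for exposition.

```python
# Illustrative sketch of error-driven constraint demotion (a simplification,
# not Tesar & Smolensky's full algorithm). The grammar is a strict dominance
# ranking, highest constraint first; a learning datum pairs the observed adult
# winner with the losing candidate the child's grammar currently prefers, each
# given as a dictionary of violation counts. All names here are hypothetical.

def demote(ranking, winner, loser):
    """Rerank so that the observed winner beats this particular competitor."""
    favors_winner = [c for c in ranking if loser.get(c, 0) > winner.get(c, 0)]
    favors_loser = [c for c in ranking if winner.get(c, 0) > loser.get(c, 0)]
    if not favors_winner:
        return ranking                                     # datum carries no ranking information
    pivot = min(ranking.index(c) for c in favors_winner)   # highest constraint favoring the winner
    keep = [c for c in ranking if c not in favors_loser or ranking.index(c) > pivot]
    demoted = [c for c in favors_loser if ranking.index(c) <= pivot]
    insert_at = keep.index(ranking[pivot]) + 1             # slot just below the pivot
    return keep[:insert_at] + demoted + keep[insert_at:]

# Hypothetical case: the child ranks NO-CODA over FAITH, but the adult form
# keeps its coda, so NO-CODA (constraint A) must be demoted below FAITH (B).
grammar = ["NO-CODA", "FAITH", "ONSET"]
adult_form = {"NO-CODA": 1}          # the observed winner violates NO-CODA, satisfies FAITH
child_form = {"FAITH": 1}            # the child's preferred candidate deletes the coda
print(demote(grammar, adult_form, child_form))   # ['FAITH', 'NO-CODA', 'ONSET']
```

On a sketch of this kind, each piece of positive data either leaves the ranking alone or demotes an offending constraint, which is the sense in which the learner never needs an explicit correction.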

This method works equally well for learning either OT phonology or OT syntax.

Both OT and P&P achieve their ability to solve the LPLA at the expense of making extremely strong claims about the shape of human language. Attempts to test simple versions of P&P (Hyams, 1986) have not produced clear empirical (Liceras, 1989; Pizzuto & Caselli, 1993; Valian, 1991) or conceptual (Truscott & Wexler, 1989) support. Direct application of OT to child language leads to complex derivations (Bernhardt & Stemberger, 1998) and unclear predictive power. Moreover, the rigid ordering assumptions made in OT seem to undercut its utility as a psycholinguistic theory.

3. Emergent constraints

Evidence that the child follows some general guidelines in recovering from overgeneralization and avoiding errors can be interpreted as evidence for innate constraints. However, it can equally well be explained through the operation of emergent constraints that solidify during the process of language learning itself. In other words, the child can use language learning to learn about the shape of language learning. In the next major section, we will examine this possibility in detail.

4. Alternative formal analysis

Gold's formulation of the LPLA rests on Chomsky's characterization of the relations between types of grammars known as the Chomsky Hierarchy (Chomsky, 1963). Other formal work has presented alternative ways of understanding the shape of human language. By refining or modifying the formal characterization of human language, these alternative analyses can lead to markedly different consequences in the context of Gold's analysis. We can mention at least two analyses of this type, each of which presents an interesting solution to the LPLA.


One solution to the LPLA strikes directly at the notion (Reich, 1969) that language cannot be described by finite-state grammars. Hausser (1999) has developed a powerful parser based on the use of left-associative grammar. He has shown that left-associative grammar can be expressed as a finite-state grammar that orders words in terms of part-of-speech categories. Because we know that finite-state grammars can be acquired from positive evidence (Hopcroft & Ullman, 1979), children should be able to learn left-associative grammars directly without ever encountering the LPLA. Given that these grammars can parse sentences in a time-linear and psycholinguistically plausible fashion, they would seem to be excellent candidates for further exploration by child language researchers.

A second formal solution to the LPLA arises in the context of the theory of categorial grammar. Kanazawa (1998) shows that a particular class of categorial grammars, the k-valued grammars, can be learned from positive data within the Gold framework. Moreover, he shows that most of the customary versions of categorial grammar discussed in the linguistic literature fall within this k-valued class. These attempts to recharacterize the nature of human language through revised formal analysis all stand as useful approaches to the LPLA. By characterizing the target language in a way that makes it learnable by children, linguists help bridge the gap between linguistic theory and child language studies.

5. Revised end-state criterion

A particularly powerful solution to the LPLA was proposed by Horning (1969), just after the publication of the original Gold analysis. Horning showed that, if language identification is treated in terms of a certain probability of identification, rather than an absolute guarantee of no further error ever, then a language can be identified on the basis of positive evidence alone. It is surprising that this solution has not received more attention. This crucial early demonstration undercuts the core logic of the LPLA as it applies to the learning of all rule systems up to the level of context-sensitive grammars.

If learning were deterministic, children would go through a series of attempts to hypothesize the "correct" grammar for the language. Once they hit on the correct identification, they would remain with this final guess forever. The fact that adults make speech errors and differ in their judgments regarding at least some syntactic structures suggests that this criterion is too strong and that the analysis provided by Horning is more realistic.
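To make the force of a probabilistic criterion concrete, consider the following toy comparison. It is an intuition pump, not Horning's actual proof, and the two "grammars" and their probabilities are invented for illustration: an overgeneral grammar that also allows forms like "goed" must spread its probability mass over more strings, so a corpus of positive data alone makes the tighter grammar increasingly more probable.

```python
# A toy illustration (not Horning's proof) of why shifting to a probabilistic
# success criterion matters. Two hypothetical grammars both generate every
# sentence the child actually hears, but the overgeneral one also allows
# "goed" and "ated"; because it spreads probability over more strings, the
# positive data alone favor the tighter grammar.

from math import log

# Hypothetical probabilistic grammars: distributions over a few past-tense forms.
tight = {"went": 0.5, "ate": 0.5}
overgeneral = {"went": 0.25, "ate": 0.25, "goed": 0.25, "ated": 0.25}

corpus = ["went", "ate", "went", "went", "ate", "went", "ate", "ate"]  # positive data only

def log_likelihood(grammar, data):
    """Log probability of the observed corpus under the grammar."""
    return sum(log(grammar[form]) for form in data)

for name, grammar in [("tight", tight), ("overgeneral", overgeneral)]:
    print(name, round(log_likelihood(grammar, corpus), 2))
# The tight grammar scores higher on every positive datum, so its relative
# probability grows without the child ever hearing a correction.
```

Viewed this way, a probabilistic end-state criterion converts mere non-occurrence into usable indirect negative evidence.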

The LPLA #2: Errors children never make

Beginning in the early 1980s, workers in the generative tradition began to shift their attention from the LPLA #1 to the LPLA #2. Because there are so many mechanisms capable of achieving recovery from overgeneralization, this alternative form of the LPLA seemed to provide clearer and less ambiguous guidance for the discovery of the contents of Universal Grammar. Argumentation in this area has centered on characterizing a set of grammatical errors that English-speaking children never make. Failure to produce possible errors is then used as evidence for the innateness of structural dependency, c-command and the three binding conditions, subjacency, and the empty category principle. The basic form of the argument has remained constant throughout the various versions of the theories of Government and Binding, Principles and Parameters, and Minimalism.

The analysis of non-occurring errors is not linked to the search for a set of parameters within P&P. Because the erroneous setting of a parameter can lead to overgeneralization, parameter-setting data are relevant to the LPLA #1, not the LPLA #2. The data that are relevant to the LPLA #2 are those that show evidence of non-parameterized universals. The paradigm case of argumentation based on the LPLA #2 is, instead, the child's obedience to the Structural Dependency condition, as presented by Chomsky in his formal discussion with Jean Piaget (Piatelli-Palmarini, 1980, p. 40). Chomsky notes that children learn early on to move the auxiliary to initial position in questions like "Is the man coming?" One possible formulation of this movement rule looks only at the surface structure of a sentence like "The man is coming" and formulates the question as moving the first auxiliary to initial position.

However, if children want to question the proposition given in (1), they will never produce a movement such as (2). Instead, they will always produce (3).

1. The man who is first in line is coming.
2. Is the man who __ first in line is coming?
3. Is the man who is first in line __ coming?

The movement of the auxiliary involves a movement of INFL to COMP that is subject to the head movement constraint. In (2) the auxiliary would have to move around the N' of "man" and the CP and Comp of the relative clause, but this would be blocked by the head movement constraint (HMC). No such barriers exist in the main clause. In addition, if the auxiliary moves as in (2), it leaves a gap that violates the empty category principle (ECP). However, Chomsky's analysis of this pattern does not rely on the details of the operation of the ECP and the HMC. Chomsky simply argues that the child has to realize that phrasal structure is somehow involved in this process and that one cannot formulate the rule of auxiliary movement as "move the first auxiliary to the front." This restriction on auxiliary movement is called "structural dependency." Chomsky claims that, "A person might go through much or all of his life without ever having been exposed to relevant evidence, but he will nevertheless unerringly employ the structure-dependent generalization, on the first relevant occasion." A more general statement of this type is provided by Hornstein and Lightfoot (1981), who claim that, "People attain knowledge of the structure of their language for which no evidence is available in the data to which they are exposed as children."

As Pullum (1996) has noted, a major problem with Chomsky's analysis in this case is the fact that children do indeed hear sentences such as "The child who is first in line is getting the prize" or "The child who is first in line will get the prize." A conservative child can easily hold off on producing auxiliary movement in complex sentences until hearing one or two sentences with the needed positive evidence.


Pullum's analysis, although technically accurate, seems to miss the essence of Chomsky's point. First, it is certainly true that sentences such as (1) are extremely rare in the input to children. In a search of the input to the three children studied by Brown (1973), I found no such sentences. Sentences of this type may well appear in the Wall Street Journal corpus studied by Pullum, but they are rare in speech addressed to children. Second, it would seem counter-intuitive to argue against Chomsky's basic point. The structural dependency condition only requires that the child pay attention to the relations between words, rather than just their serial order. Behaghel (1923) pointed out that words that are meaningfully related typically appear next to each other. Some appreciation of this principle must certainly be basic to both auditory and visual processing across species, and it does not conflict with any of the fundamental tenets of an emergentist view of learning. Although Chomsky may have overstated this argument a bit, it is difficult to imagine a language learner who does not pay some attention to conceptual structure. Given this general ability to represent conceptual structure, it seems fair to wonder what kind of child would even consider producing a sentence such as "Is the man who first in line is coming?"

The theory of item-based learning (MacWhinney, 1975, 1982, 1988) supports Chomsky's analysis. In that theory, the syntactic positions of arguments are specified in relation to the predicates with which they cluster. Children learn the positioning of the auxiliary marking a yes-no question on an item-by-item basis. For each yes-no auxiliary, children learn that it must appear in preinitial position (before the subject NP). As several of these yes-no auxiliary item-based patterns accumulate, they form a gang, which then constitutes an emergent construction (Goldberg, 1999). This learning is driven by positive evidence. When the child first needs to form a question on the basis of (1), the available device is therefore one that is formulated in terms of relations, not positions, and (3) is produced instead of (2). Thus, both an item-based account and a Chomskyan account agree on the importance of structural dependency. However, the item-based account views the particular implementation of structural dependency in this case as emergent from earlier item-based learning.
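The core of this item-based account can be sketched very simply. The toy sketch below is my own illustration, not the actual implementation of item-based learning; the relational representation and the names in it are assumptions. The learned frame for "is" refers to the subject NP as a whole constituent, so the auxiliary inside the relative clause never even becomes a candidate for fronting.

```python
# Toy sketch of an item-based frame for yes-no questions with "is"
# (illustrative only; the relational clause representation is an assumption).
# The learned frame says: aux + subject NP + predicate.

def form_yes_no_question(clause):
    """Apply the item-based pattern learned for the auxiliary 'is'."""
    # The subject NP is handled as a single constituent, however complex it is,
    # so the "is" inside the relative clause is simply never considered.
    return " ".join([clause["aux"]] + clause["subject"] + clause["predicate"]) + "?"

declarative = {                                   # (1) "The man who is first in line is coming."
    "subject": ["the", "man", "who", "is", "first", "in", "line"],
    "aux": "is",
    "predicate": ["coming"],
}

print(form_yes_no_question(declarative))
# -> "is the man who is first in line coming?"   i.e. pattern (3); (2) cannot arise
```

Because the pattern is stated over grammatical relations rather than string positions, error (2) is simply not in the space of outputs this kind of learner can generate, even though no innate structural dependency constraint has been stipulated.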

This analysis of a solution to a particular instance of the LPLA #2 relied on positive evidence, conservative item-based learning, and competition. The mechanisms of monitoring and indirect negative evidence can provide additional support for (3) over (2). In general, all of the mechanisms that we discussed in terms of our solution to the LPLA #1 apply with equal strength to the LPLA #2. Let us consider how these processes apply to some of the other standard arguments based on the LPLA #2.

One constraint that has a clear impact on adult English is the complex-NP constraint (Ross, 1974), or head movement constraint, which blocks movement of a noun out of a relative clause, as in (4) and (5).

4. * Who did John believe the man that kissed __ arrived?
5. Who did John believe __ kissed his buddy?

The problems that we have with sentences like (4) can be viewed in processing terms (O'Grady, in press). Verbs like "believe" encourage the initial wh-word to continue its search for a gap as long as they are expecting complements, as in (5). However, when the expectation for a complement is blocked by the presence of a complex NP as direct object, the usual complement-based filler strategy is thrown for a loop. It is important to realize that what causes the problem is the ambiguity after the verb, not the time taken to find a gap. For example, we can compare (6), in which a gap is found right away, with (7), in which it is found later.

6. Who could my friends have asked __ to take the biscuits to Tom last week?
7. Who could my friends have asked us to take the biscuits to Tom for __ last week?

Neither of these causes problems, because the cues for continuing the search are clear. The complex-NP constraint also blocks movement from prepositional phrases and other complex NPs, as in (8) through (10).

8. * Who did pictures of ___ surprise you?
9. * What did you see a happy ___ ?
10. * What did you stand between the wall and ___ ?
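The processing account just sketched can be illustrated with a toy filler-gap check. This is my own simplification, not O'Grady's model; the tokenization and the hand-coded island spans are assumptions. The point is only that a fronted wh-filler can be linked to a gap in a bare complement clause, but the link is abandoned once the gap site falls inside a complex NP.

```python
# Toy filler-gap check (illustrative; a simplification of the processing
# account, not O'Grady's model). Sentences are hand-tokenized, with "__"
# marking the gap and island_spans marking any complex-NP region.

def filler_gap_ok(tokens, island_spans):
    """True if the fronted wh-filler can be linked to the gap site."""
    gap_index = tokens.index("__")
    return not any(start <= gap_index < end for start, end in island_spans)

# (5) "Who did John believe __ kissed his buddy?"  -- gap in a bare complement clause
ex5 = "who did John believe __ kissed his buddy".split()
print(filler_gap_ok(ex5, island_spans=[]))                       # True

# (4) "Who did John believe the man that kissed __ arrived?" -- gap inside a complex NP
ex4 = "who did John believe the man that kissed __ arrived".split()
complex_np = [(4, 9)]         # tokens "the man that kissed __" form the complex NP
print(filler_gap_ok(ex4, island_spans=complex_np))               # False
```

This is of course only a structural check; the fuller processing story ties the abandonment of the search to the verb's thwarted expectation for a complement.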

The constraint in (10) has also been treated as the coordinated-NP constraint in some accounts. Although it appears that most children obey these constraints, there are some exceptions. Wilson and Peters (1988) present these violations of the complex-NP constraint from Wilson's son Seth:

what am I cooking on a hot __ ? (stove)
what did I get lost at the __ , Dad?
what are we gonna look for some __ ? (houses)
what is this a funny __ , Dad?
what are we gonna push number __ ? (9)
where did you pin this on my __ ? (robe)
what are you shaking all the __ ? (batter and milk)
what is this medicine for my __ ? (cold)
what are we gonna go at Auntie and __ ? (Priya, the name of a babysitter)

Nearly all of these violations involve movement of a noun modified by an adjective. It appears that Seth had in fact learned to produce these violations almost as a game. Nonetheless, it is interesting to see that this putatively universal principle could be so easily violated by a young child. In my own recordings of my sons Ross and Mark, I observed only a very few violations. One occurred when my son Mark was 5;4.4. He said, out of the blue as it were: "Dad, next time when it's Indian Guides and my birthday, what do you think a picture of ___ should be on my cake?" Catherine Snow reports that at age 10;10, her son Nathaniel said, "I have a fever, but I don't want to be taken a temperature of." Most researchers would agree that violations are rare. However, the structures that might trigger violations are also rare.

The binding theory (Chomsky, 1981) focused quite heavily on a set of three proposed universal conditions on the binding of pronouns and reflexives to referents. Sentence (11) illustrates two of the constraints.

11. He said that Bill hurt himself.

In (11), "he" cannot be coreferential with "Bill" because the pronoun c-commands "Bill." At the same time, "himself" must be coreferential with "Bill," because "Bill" is a clausemate that c-commands the reflexive.


When attempting to apply the LPLA to the study of the binding constraints, it is important to remember that the sentences produced or interpreted are fully grammatical. However, one of the possible interpretations is disallowed by the universal constraints. This means that, to study the imposition of the constraints, researchers must rely on comprehension studies. As an example of the studies conducted during this period, consider a study of long-distance movement of adjuncts by de Villiers, Roeper, and Vainikka (1990). Children were divided into two age groups, 3;7 to 5;0 and 5;1 to 6;11, and were given sentences such as:

12. When did the boy say he hurt himself?
13. When did the boy say how he hurt himself?
14. Who did the boy ask what to throw?

For (12), 44% of responses gave long-distance interpretations, associating "when" with "hurt himself". For (13), with a medial wh-phrase blocking a long-distance interpretation, only 6% of responses were long-distance. So children were sensitive to the conditions on traces, in accord with P&P theory. However, it appears that this sensitivity develops over time. In the youngest group, children had trouble even understanding sentences with medial arguments like (14). The fact that this ability improves over time suggests that there may well be learning occurring for the easier patterns such as (12) at an earlier age.

The argument in this particular case is very different from Chomsky's argument regarding the structural dependency condition. In this case, we know that children themselves actually produce sentences with these structures. De Villiers et al. report these instances from Brown's subject Adam:

What chu like to have? – 30 months
What you think this look like? – 30 months
What he went to play with? – 31 months
What do you think the grain is going to taste like? – 55 months

The question is when children are able to construct the two interpretations for (12) and when they realize that only one of these interpretations is available for (13). The P&P answer is that this depends on parameter setting. First, the child must realize that their language, unlike Chinese, allows movement. Next, they must decide whether the movement can be local, as in German, or both local and distant, as in English. Finally, they must decide whether the movement is indexed by pronouns, traces, or both. However, once a parameter-setting account is detailed in this way, it can be difficult to distinguish it from a learning account. Using positive evidence, children can first learn that some movement can occur. Next, they can learn to move locally; finally, they can acquire, one by one, the cues for linking the moved argument to its original argument position. In learning these structures, children must be sensitive to complex syntactic configurations. This means that any learning account must provide a large role for syntactic structure and must supply mechanisms that are capable of acquiring complex patterns.

Implications

The study of the LPLA provided a useful focus for child language research in the 1970s and 1980s. However, the use of the LPLA #1 as a way of guiding research has not kept pace with advances in theory, experimentation, and observation. We now know that recovery from overgeneralization is supported by a set of five powerful processes that effectively solve the LPLA #1. The process of recovery from overgeneralization continues to be an important research topic, but it is no longer appropriate to conduct this investigation within the narrow conceptual focus of the LPLA #1.

The LPLA #2 has more life in it. Human language is the result of a long, gradual process of evolution (MacWhinney, in press). This process has provided us with some clear ideas about the possible shapes of sounds, words, and sentences in language. These ideas are grounded primarily in facts about our bodies (MacWhinney, 1999b) and in general processes of cognition, perception, and action.


By pursuing the study of error-free acquisition in the context of the LPLA #2, we can hope to shed light on these universals. However, we need to conduct this study in the context of an integrated account that derives insights from each of the major competing visions.

How can we unite the insights of the three major competing views of language development to derive a fuller, more satisfying account? One framework for producing this integration is provided by the concept of emergentism (MacWhinney, 2001). Emergentism views language structure as emerging from processes operating on six different time scales, including phylogeny, embryology, development, online processing, and diachronic change. Emergentism in the area of language acquisition commits itself to providing a neurologically and socially grounded mechanistic account of the interaction of these forces. This means that any integration of the three competing visions must occur at the level of neural mechanism and the body. Constructing this account is currently a goal, rather than an achieved reality (Elman, Bates, Plunkett, Johnson, & Karmiloff-Smith, 1996).

One way to begin building this integration is to look at how socialization processes interact with specific learning mechanisms. In the Competition Model, children rely on stored auditory representations to recover from overgeneralization. These stored representations are in fact delayed traces of interactions with adults. This means that an integrated emergentist theory needs to understand the ways in which adults can assist the child in acquiring accurate stored auditory forms. One way in which a parent can do this is through recasting. Marcus (1993) has suggested that parents are inconsistent in their provision of negative evidence to the child. However, there is abundant evidence that parents can provide finely tuned, sensitive input (Snow, 1995). This suggests that what is important to the child is not the provision of negative evidence, but the sensitive provision of finely tuned positive evidence, in accord with the Competition Model analysis. As Merriman (1999) has argued, successful learning depends on the child being able to attend to the objects and actions being discussed.

Tomasello (1999) has also emphasized the role of joint attention and mutual understanding in language learning. Careful examination of the impact of these social frameworks on language learning can further clarify the processes of recovery from overgeneralization.

One promising avenue for developing an emergentist account would integrate analyses and findings from generative theory with the theory of item-based learning. The clearer separation of phrasal structure, lexicon, and processing through unification that Chomsky has articulated in the current Minimalist Program matches up in some ways with the claims of item-based learning and Construction Grammar. However, there is not yet a fully powerful way of simulating item-based learning in neural networks (MacWhinney, 1999a). This means that major advances must be achieved in learning-theory models before they can properly capture the actions of an item-based processor. In summary, the successful construction of an integrated emergentist account of error-free learning will require major conceptual advances in each of the three major competing visions of human language learning.

References

Anderson, N. (1982). Methods of information integration theory. New York: Academic Press.
Angluin, D. (1980). Inductive inference of formal languages from positive data. Information and Control, 45, 117-135.
Baker, C. L. (1979). Syntactic theory and the projection problem. Linguistic Inquiry, 10, 533-581.
Baker, C. L., & McCarthy, J. J. (Eds.). (1981). The logical problem of language acquisition. Cambridge, MA: MIT Press.


Barbosa, P., Fox, D., Hagstrom, P., McGinnis, M., & Pesetsky, D. (Eds.). (1997). Is the best good enough? Optimality and competition in syntax. Cambridge, MA: MIT Press.
Behaghel, O. (1923). Deutsche Syntax. Heidelberg: Winter.
Bernhardt, B., & Stemberger, J. (1998). Handbook of phonological development. San Diego, CA: Academic Press.
Berwick, R. (1987). Parsability and learnability. In B. MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bohannon, N., MacWhinney, B., & Snow, C. (1990). No negative evidence revisited: Beyond learnability or who has to prove what to whom. Developmental Psychology, 26, 221-226.
Bohannon, N., & Stanowicz, L. (1988). The issue of negative evidence: Adult responses to children's language errors. Developmental Psychology, 24, 684-689.
Bowerman, M. (1987). Commentary. In B. MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bowerman, M. (1988). The "no negative evidence" problem. In J. Hawkins (Ed.), Explaining language universals (pp. 73-104). London: Blackwell.
Braine, M. D. S. (1976). Children's first word combinations. Monographs of the Society for Research in Child Development, 41 (Whole No. 1).
Braine, M. D. S. (1989). Modeling the acquisition of linguistic structure. In Y. Levy, I. Schlesinger, & M. Braine (Eds.), Categories and processes in language acquisition (pp. 217-259). Hillsdale, NJ: Lawrence Erlbaum Associates.
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Brown, R., & Hanlon, C. (1970). Derivational complexity and order of acquisition in child speech. In J. R. Hayes (Ed.), Cognition and the development of language (pp. 11-54). New York: Wiley.


Bruner, J. (1978). On prelinguistic prerequisites of speech. In R. N. Campbell & P. T. Smith (Eds.), Recent advances in the psychology of language. New York: Plenum Press.
Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.
Chafe, W. (1987). Cognitive constraints on information flow. In R. Tomlin (Ed.), Coherence and grounding in discourse. Philadelphia, PA: Benjamins.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chomsky, N. (1963). Formal properties of grammars. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2). New York: Wiley.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1980). Rules and representations. New York: Columbia University Press.
Chomsky, N. (1981). Lectures on government and binding. Cinnaminson, NJ: Foris.
de Villiers, J., Roeper, T., & Vainikka, A. (1990). The acquisition of long distance rules. In L. Frazier & J. de Villiers (Eds.), Language processing and language acquisition. Amsterdam: Kluwer.
Demetras, M., Post, K., & Snow, C. (1986). Feedback to first-language learners. Journal of Child Language, 13, 275-292.
Elbers, L., & Wijnen, F. (1993). Effort, production skill, and language learning. In C. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development (pp. 337-368). Timonium, MD: York.
Elman, J., Bates, E., Plunkett, K., Johnson, M., & Karmiloff-Smith, A. (1996). Rethinking innateness. Cambridge, MA: MIT Press.
Ervin-Tripp, S. (1981). Social process in first and second language learning. In H. Winitz (Ed.), Native language and foreign language acquisition. New York, NY: The New York Academy of Sciences.
Farrar, J. (1992). Negative evidence and grammatical morpheme acquisition. Developmental Psychology, 28, 90-98.

Fodor, J., & Crain, S. (1987). Simplicity and generality of rules in language acquisition. In B. MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Freud, S. (1958). Psychopathology of everyday life. New York: New American Library, Mentor.
Givón, T. (1979). On understanding grammar. New York: Academic Press.
Gold, E. (1967). Language identification in the limit. Information and Control, 10, 447-474.
Goldberg, A. E. (1999). The emergence of the semantics of argument structure constructions. In B. MacWhinney (Ed.), The emergence of language (pp. 197-213). Mahwah, NJ: Lawrence Erlbaum Associates.
Hausser, R. (1999). Foundations of computational linguistics: Man-machine communication in natural language. Berlin: Springer.
Heath, S. (1983). Ways with words: Language, life and work in communities and classrooms. Cambridge: Cambridge University Press.
Herbart, J. F. (1891). A text-book in psychology. New York: Appleton and Co.
Hirsh-Pasek, K., Trieman, R., & Schneiderman, M. (1984). Brown and Hanlon revisited: Mother sensitivity to grammatical form. Journal of Child Language, 11, 81-88.
Hopcroft, J., & Ullman, J. (1979). Introduction to automata theory, languages, and computation. Reading, MA: Addison-Wesley.
Hopper, P. (1987). Emergent grammar. In J. Aske, N. Beery, L. Michaelis, & H. Filip (Eds.), Berkeley Linguistics Society, Vol. 13. Berkeley: University of California Press.
Horning, J. J. (1969). A study of grammatical inference. Stanford, CA: Stanford University, Computer Science Department.
Hornstein, N., & Lightfoot, D. (1981). Explanation in linguistics: The logical problem of language acquisition. London: Longmans.


Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: D. Reidel.
Hymes, D. (1964). Language in culture and society: A reader in linguistics and anthropology. New York: Harper and Row.
Jespersen, O. (1922). Language: Its nature, development, and origin. London: George Allen and Unwin.
Kanazawa, M. (1998). Learnable classes of categorial grammars. Stanford, CA: CSLI Publications.
Kawamoto, A. (1994). One system or two to handle regulars and exceptions: How time-course of processing can inform this debate. In S. D. Lima, R. L. Corrigan, & G. K. Iverson (Eds.), The reality of linguistic rules (pp. 389-416). Amsterdam: John Benjamins.
Lasnik, H. (1989). On certain substitutes for negative data. In R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory. Dordrecht: Kluwer.
Liceras, J. (1989). On some properties of the "pro-drop" parameter: Looking for missing subjects in non-native Spanish. In S. Gass & J. Schachter (Eds.), Linguistic perspectives on second language acquisition (pp. 109-133). Cambridge: Cambridge University Press.
Lieven, E. V. M., Pine, J. M., & Baldwin, G. (1997). Positional learning and early grammatical development. Journal of Child Language, 24, 187-219.
Lightfoot, D. (1989). The child's trigger experience: Degree-0 learnability. Behavioral and Brain Sciences, 12, 321-375.
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). Lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676-703.
MacWhinney, B. (1975). Pragmatic patterns in child syntax. Stanford Papers and Reports on Child Language Development, 10, 153-165.


MacWhinney, B. (1978). The acquisition of morphophonology. Monographs of the Society for Research in Child Development, 43 (Whole No. 1), 1-123.
MacWhinney, B. (1982). Basic syntactic processes. In S. Kuczaj (Ed.), Language acquisition: Vol. 1. Syntax and semantics (pp. 73-136). Hillsdale, NJ: Lawrence Erlbaum Associates.
MacWhinney, B. (1987). Toward a psycholinguistically plausible parser. In S. Thomason (Ed.), Proceedings of the Eastern States Conference on Linguistics. Columbus, OH: Ohio State University.
MacWhinney, B. (1988). Competition and teachability. In R. Schiefelbusch & M. Rice (Eds.), The teachability of language (pp. 63-104). New York: Cambridge University Press.
MacWhinney, B. (1989). Competition and lexical categorization. In R. Corrigan, F. Eckman, & M. Noonan (Eds.), Linguistic categorization (pp. 195-242). Philadelphia: Benjamins.
MacWhinney, B. (1991). Reply to Woodward and Markman. Developmental Review, 11, 192-194.
MacWhinney, B. (1993). The (il)logical problem of language acquisition. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (pp. 61-70). Hillsdale, NJ: Lawrence Erlbaum Associates.
MacWhinney, B. (1999a). Connectionism and language learning. In S. Kemmer (Ed.), Data-driven models of language learning. Stanford, CA: CSLI Press.
MacWhinney, B. (1999b). The emergence of language from embodiment. In B. MacWhinney (Ed.), The emergence of language (pp. 213-256). Mahwah, NJ: Lawrence Erlbaum Associates.
MacWhinney, B. (2001). Emergence from what? Journal of Child Language, 28, 726-736.
MacWhinney, B. (in press). The gradual evolution of language. In T. Givón & B. Malle (Eds.), The evolutionary emergence of language. Amsterdam: Benjamins.

MacWhinney, B. (Ed.). (1999c). The emergence of language. Mahwah, NJ: Lawrence Erlbaum Associates.
MacWhinney, B., & Bates, E. (Eds.). (1989). The crosslinguistic study of sentence processing. New York: Cambridge University Press.
MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 29, 121-157.
MacWhinney, B. J., Leinbach, J., Taraban, R., & McDonald, J. L. (1989). Language learning: Cues or rules? Journal of Memory and Language, 28, 255-277.
Maratsos, M., Kuczaj, S. A., Fox, D. E., & Chalkley, M. A. (1979). Some empirical studies in the acquisition of transformational relations: Passives, negatives, and the past tense. In W. A. Collins (Ed.), Children's language and communication. Hillsdale, NJ: Lawrence Erlbaum Associates.
Marcus, G. (1993). Negative evidence in language acquisition. Cognition, 46, 53-85.
Markman, E. (1989). Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press.
Massaro, D. (1987). Speech perception by ear and eye. Hillsdale, NJ: Lawrence Erlbaum Associates.
Matthews, R., & Demopoulos, W. (1989). Learnability and linguistic theory. Dordrecht: Kluwer.
McNeill, D. (1966). The creation of language by children. In J. Lyons & R. Wales (Eds.), Psycholinguistics papers. Edinburgh: University of Edinburgh Press.
Merriman, W. (1999). Competition, attention, and young children's lexical processing. In B. MacWhinney (Ed.), The emergence of language (pp. 331-358). Mahwah, NJ: Lawrence Erlbaum Associates.
Miikkulainen, R. (1993). Subsymbolic natural language processing. Cambridge, MA: MIT Press.


Miikkulainen, R., & Mayberry, M. R. (1999). Disambiguation and grammar as emergent soft constraints. In B. MacWhinney (Ed.), The emergence of language (pp. 153-176). Mahwah, NJ: Lawrence Erlbaum Associates.
Moerk, E. (1983). The mother of Eve as a first language teacher. Norwood, NJ: Ablex.
Morgan, J. L., Bonamo, K. M., & Travis, L. L. (1995). Negative evidence on negative evidence. Developmental Psychology, 31, 180-197.
Nelson, K. (1982). Experimental gambits in the service of language acquisition theory. In S. Kuczaj (Ed.), Language development: Syntax and semantics. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nelson, K. E., Denninger, M. S., Bonvilian, J. D., Kaplan, B. J., & Baker, N. D. (1984). Maternal input adjustments and non-adjustments as related to children's linguistic advances and to language acquisition theories. In A. D. Pellegrini & T. D. Yawkey (Eds.), The development of oral and written language in social contexts. Norwood, NJ: Ablex.
Ochs, E. (1985). The acquisition of Samoan. In D. I. Slobin (Ed.), The crosslinguistic study of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum Associates.
Osherson, D., Stob, M., & Weinstein, S. (1989). Learning theory and natural language. In R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory. Dordrecht: Kluwer.
Penner, S. G. (1987). Parental responses to grammatical and ungrammatical child utterances. Child Development, 58, 376-384.
Piatelli-Palmarini, M. (1980). Language and learning: The debate between Jean Piaget and Noam Chomsky. Cambridge, MA: Harvard University Press.
Pine, J. M., Lieven, E. V. M., & Rowland, C. F. (1998). Comparing different models of the development of the English verb category. Linguistics, 36, 4-40.


Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.
Pizzuto, E., & Caselli, M. (1993). The acquisition of Italian morphology: A reply to Hyams. Journal of Child Language, 20, 707-712.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multilayered perceptron: Implications for child language acquisition. Cognition, 38, 43-102.
Plunkett, K., & Marchman, V. (1993). From rote learning to system building. Cognition, 49, xx-xx.
Post, K. (1994). Negative evidence. In J. Sokolov & C. Snow (Eds.), Handbook of research in language development using CHILDES (pp. 132-173). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pulleybank, D., & Turkel, W. (1997). The logical problem of language acquisition in Optimality Theory. In P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, & D. Pesetsky (Eds.), Is the best good enough? Optimality and competition in syntax (pp. 399-420). Cambridge, MA: MIT Press.
Pullum, G. (1996). Learnability, hyperlearning, and the poverty of the stimulus. In J. Johnson, M. Juge, & J. Moxley (Eds.), Proceedings of the 22nd Annual Meeting: General Session and Parasession on the Role of Learnability in Grammatical Theory (pp. 498-513). Berkeley, CA: Berkeley Linguistics Society.
Reich, P. (1969). The finiteness of natural language. Language, 45, 831-843.
Ross, J. (1974). Three batons for cognitive psychology. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the symbolic processes. New York: Wiley.


Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (pp. 216-271). Cambridge, MA: MIT Press.
Saxton, M. (1997). The Contrast Theory of negative input. Journal of Child Language, 24, 139-161.
Saxton, M., Kulcsar, B., Greer, M., & Rupra, M. (1998). Longer term effects of corrective input: An experimental approach. Journal of Child Language, 25, 701-721.
Schieffelin, B. (1985). The acquisition of Kaluli. In D. Slobin (Ed.), The crosslinguistic study of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum Associates.
Scollon, R. (1976). Conversations with a one year old: A case study of the developmental foundation of syntax. Honolulu: University Press of Hawaii.
Snow, C. (1995). Issues in the study of input: Finetuning, universality, individual and developmental differences, and necessary causes. In P. Fletcher & B. MacWhinney (Eds.), The handbook of child language (pp. 180-193). Oxford: Blackwell.
Sokolov, J. L. (1993). A local contingency analysis of the fine-tuning hypothesis. Developmental Psychology, 29, 1008-1023.
Sokolov, J. L., & MacWhinney, B. (1990). The CHIP framework: Automatic coding and analysis of parent-child conversational interaction. Behavior Research Methods, Instruments, and Computers, 22, 151-161.
Tesar, B., & Smolensky, P. (2000). Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge: Cambridge University Press.
Tomasello, M. (1999). The cultural origins of human communication. New York: Cambridge University Press.

Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74, 209-253.
Truscott, J., & Wexler, K. (1989). Some problems in the parametric analysis of learnability. In R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory. Dordrecht: Kluwer.
Valian, V. (1991). Syntactic subjects in the early speech of American and Italian children. Cognition, 40, 21-81.
Wexler, K., & Culicover, P. (1980). Formal principles of language acquisition. Cambridge, MA: MIT Press.
Wilson, B., & Peters, A. M. (1988). What are you cookin' on a hot?: Movement constraints in the speech of a three-year-old blind child. Language, 64(2), 249-273.
Wolfe Quintero, K. (1992). Learnability and the acquisition of extraction in relative clauses and wh-questions. Studies in Second Language Acquisition, 14, 39-70.

i

The competition between "went" and "*goed" has also been treated as an instance of "blocking" (Baker & McCarthy, 1981; Pinker, 1984). In the blocking account, "went" is said to block "*goed" because lexically-based rules are ordered before general rules in the rule cycle of the morphological component. This account involves an unnecessary commitment to strict rule-ordering and an unnecessary invocation of an ability to order rules according to some innate criteria. Since the explanatory power of blocking is completely captured by the mechanism of competition, we will rely on competition here, rather than blocking.
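As a rough illustration of how competition, rather than blocking, can drive out the overgeneralized form, consider the toy simulation below. The update rule, the strength values, and the "analogy support" parameter are assumptions introduced here for exposition, not the Competition Model's actual implementation; the point is only that repeated positive exemplars of "went" are enough for it to win the competition without any rule ordering.

```python
# Rough sketch of competition between "went" and "*goed" (illustrative only;
# the update rule and parameters are assumed, not the Competition Model's
# actual implementation). The rote form gains strength from every positive
# exemplar in the input; the overgeneralized form gets only weak analogical
# support from the regular "-ed" pattern, so it gradually loses the competition.

import random

strength = {"went": 0.2, "goed": 0.4}      # early state: the pattern-based form is strong
ANALOGY_SUPPORT = 0.01                     # assumed leakage from the regular "-ed" pattern
LEARNING_RATE = 0.1

def produce():
    """Pick the past tense of 'go' in proportion to current strengths."""
    total = sum(strength.values())
    return "went" if random.random() < strength["went"] / total else "goed"

for episode in range(200):                 # each episode: the child hears adult "went"
    strength["went"] += LEARNING_RATE * (1.0 - strength["went"])      # positive evidence
    strength["goed"] = max(ANALOGY_SUPPORT, strength["goed"] * 0.98)  # no direct support

print(strength)                            # "went" near 1.0, "goed" near the analogy floor
print([produce() for _ in range(5)])       # overwhelmingly "went"
```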

