MODELLING GRAMMAR GROWTH:
Universal grammar without innate principles or parameters

Georgia M. Green
University of Illinois
27 April 1998
Abstract
This paper[1] sketches a solution to the string-to-structure problem in first language acquisition within a set of emergentist assumptions that minimizes innate linguistic knowledge, minimizes demands for linguistic analysis by the language learner, and maximizes the projection of lexical properties of words. These conceptual constraints minimize the rules that have to be developed: in number, in complexity, and in diversity. The goal of this paper is to demonstrate the compatibility of this program with theories of grammar which describe grammars of natural languages in terms of inheritance hierarchies of constraints on linguistic object types such as word, phrase, (syntactic) category, semantic content, referential index, nominal object, and the like. The constructs and distinctions that would have to be attributed to the infant learner in order to represent the knowledge of words at the one-word stage provide a foundation for developing, via incremental addition of properties and distinctions, a competence that includes phrasal expressions with compositional semantics. All of the specifically syntactic notions that must be developed appear to have counterparts in non-syntactic notions that are implicit in the child's understanding of the world at the age when language acquisition begins. An analysis of the observed sequence of stages in the acquisition of the syntax of English questions shows in more detail how language acquisition can be characterized in terms of incremental monotonic changes to a type hierarchy constituting an increasingly less skeletal grammar.
[1] This work was supported in part by the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. I am indebted beyond words to Sae-Youn Cho, John Huitema, Neal Pearlmutter, Hisayo Tokura, and especially to Kay Bock, Gary Dell, Cynthia Fisher, and Jerry Morgan, whose interest, observations, and incisive questions made a semester-long seminar on this material an intellectual work-out and all the work it involved worthwhile. I am grateful to Vic Ferreira, Hana Filip, Susan Garnsey, Tsuneko Nakazawa, and Ivan Sag for comments on earlier versions which have contributed to improving the present exposition. The first six sections of this paper summarize and update work that was first presented at the 1994 HPSG conference in Copenhagen; the gist of the analysis in the second half was presented in a poster at the 1997 GALA conference in Edinburgh.
1 Preliminaries
This paper represents a somewhat programmatic account of how language acquisition might proceed incrementally, given an extremely minimal initial ontology of abstract linguistic elements and relations. It purports to be a demonstration of potential, not an argument with any claim of exhaustiveness. Conspicuous lacunae (e.g., discussion of the development of grammatical relations as distinct from semantic roles or thematic relations, and of what determines phrase order) only point up how many other relations and phenomena have to be established to motivate an explicit account of grammatical knowledge. The goal is an emergentist description of language acquisition that minimizes specifically linguistic innate knowledge, minimizes demands for linguistic analysis by the language learner, and maximizes the projection of lexical properties of words. As many others have argued (e.g., Macnamara (1972, 1982), Anderson (1977), Bates and MacWhinney (1979, 1982), Maratsos and Chalkley (1980), Braine (1987), Pinker (1987, 1989), and recently, Bloom (1993)), if grammars of actual natural languages are not virtually innate, they have to grow from something. The present argument assumes that human beings are born with language-receptive brains, that the ability to learn language is (largely) independent of intelligence, and that the operational entailments of whatever brain properties guarantee language-receptivity are still unknown. Given that attributing language acquisition to innate knowledge of linguistic categories is not an explanation, but an admission of failure to find one, in exploring what might be involved in acquiring a constraint-based grammar "from scratch," the reasoning pursued here relies on the proposition that much of what is universal in grammars of natural languages is so because it is constrained by design properties of a communication system and, like the acquisition of the ability to discriminate objects in 3-dimensional space, by general cognitive and developmental properties of human beings. The analysis does not assume that the language-learner is a little linguist, that the language-learner evaluates grammars by comparing the sets of sentences they generate, that any specific, detailed theory of Universal Grammar is innate, or that infants have the capacity to segment speech, classify segments, and infer constituent structure before they begin to learn a grammar. The analysis does assume that the child brings to the task of language learning the ability to recognize types of things, the ability to discriminate subtypes, the ability to attribute properties to types and subtypes, and the assumption that inheritance of properties is strict (monotonic), so that subtypes cannot lack properties that their supertypes have. This is essentially what is required to learn (or invent) a head-driven phrase structure grammar, and it is presumably easily demonstrated that children have these abilities at the age when they acquire language.
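For concreteness, these assumed abilities can be modeled in a few lines of code. The following sketch is merely illustrative (the types thing, animal, and dog and their properties stand in for whatever the child actually represents): it implements a hierarchy in which subtypes strictly (monotonically) inherit the properties of their supertypes and may only add information, never retract it.

    # A toy type hierarchy with strict (monotonic) inheritance: a subtype may
    # add properties but never contradict or drop a property of its supertypes.

    class Type:
        def __init__(self, name, parent=None, **props):
            self.name, self.parent, self.props = name, parent, props

        def constraints(self):
            """Collect all properties inherited from supertypes plus local ones."""
            inherited = self.parent.constraints() if self.parent else {}
            for attr, val in self.props.items():
                if attr in inherited and inherited[attr] != val:
                    raise ValueError(
                        f"{self.name} contradicts inherited {attr}: non-monotonic")
            return {**inherited, **self.props}

    thing  = Type("thing")
    animal = Type("animal", thing, animate="+")
    dog    = Type("dog", animal, barks="+")    # adds a property, retracts nothing
    print(dog.constraints())                   # {'animate': '+', 'barks': '+'}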
2 Head-driven phrase structure grammar
Head-driven phrase structure grammars consist of multiple-inheritance hierarchies of types of linguistic objects (Pollard & Sag 1987, 1994; Pollard (In press); Sag 1997). In such grammars, constraints defining the types and subtypes at particular nodes in the hierarchy impose compositionality on the meanings of linguistic expressions so that the semantics of any expression is a function of the semantics of its parts. Other such constraints project the properties of heads to the phrases they head, and guarantee correspondence between "extracted" elements and the clauses they belong to.[2]

[2] Pollard & Sag 1994 or Sag 1997 should be consulted for an introduction to the theory. Sag & Wasow 1998 provides a very elementary introduction. In providing concrete illustrations, for readers unfamiliar with the theory, of how a child might induce a head-driven phrase-structure grammar, I have naturally simplified irrelevant details at various points. Indeed, if acquisition really does proceed as the development of a taxonomic hierarchy of linguistic objects, it is to be expected that there may be hidden differences of little or no practical consequence in grammars developed by different children with exposure to indefinitely varying linguistic input.
3 The "one-word" stage: Learning words
If acquisition proceeds by making ever finer distinctions, i.e., by discriminating properties of sounds and sound-types, words and word-types, and (eventually) phrases and phrase-types, then by the time a child's behavior evinces a correspondence between something phonetic and something semantic, she must already have an ability to discriminate kinds of things, and, insofar as knowledge of properties of things is propositional knowledge, to add to a store of propositional knowledge. "Learning words" amounts to the realization that types of concrete objects and of situations have names, and entails an ontology of language-related objects that includes sounds, linguistic signs, sets, lists, propositions, names of things ("nominal-objects"), typed reference variables ("referential indices"), and contexts (knowledge about situations), as represented in a taxonomic hierarchy of types of linguistic objects such as (1):
(1)  object
       speech-sound        {a, m, ae, b, ...}
       list
       set
       boolean             {+, -}
       index
       sign                [PHON    list(speech-sounds)
                            CONTENT nom-obj ∨ proposition
                            CONTEXT context]
       nom-obj             [INDEX index
                            RESTR set(propositions)]
       proposition
       context             [BACKGRD set(propositions)
                            C-INDS  contextual-indices]
       contextual-indices  [SPKR index
                            ADDR index]
The first fifty or so words that a child learns apparently are primarily
- names of individuals: mommy, daddy,
- names of categories of things: baby, juice, milk, cookie, dog, cat, shoes, ball, car, book,
- words for properties: hot, allgone, dirty, cold, here, there, up,
- words for relations: eat, see, go, sit,
- social expressions (words that are part of a particular activity, rather than having a naming or predicating function): Hi, Bye-bye, No, Yes, Please, Bravo, So big.
Plausibly, property and relation expressions are represented[3] within this ontology as signs with predicative content (as in (2)a), while names are represented as signs with restricted individual reference (as in (2)b), and social expressions are represented as signs for which only the context is specified (as in (2)c).
(2) a. [PHON    /iyt/
        CONTENT [eat-rel
                 EATER index
                 EATEN index]]
    b. [PHON    /gaga/
        CONTENT [nom-obj
                 INDEX       [1]
                 RESTRICTION {[dog-rel
                               INSTANCE [1]]}]]

    c. [PHON    /baybay/
        CONTEXT [BKGD {[leave-rel
                        LEAVER index
                        LEFT   index]}]]

[3] An expression in capitals is a feature name; a lower-case expression names the sort of object which is its value. An expression enclosed in slashes is a string of speech sounds. Curly braces enclose set values; angled brackets enclose list values; matching numbered tags like [1] indicate a single, shared value.
If this is what knowing a word involves, then learning another word evidently amounts to discriminating a(nother) phoneme-sequence and associating it with a(nother) situation, individual, or category.
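On this view, the lexicon at the one-word stage is just a growing list of such signs. A minimal sketch of what adding the word in (2)b amounts to (the function and variable names here are invented for illustration; the feature geometry loosely follows (1) and (2), with the shared tag [1] modeled as reuse of one Python object):

    # Learning a word = pairing a discriminated phoneme-sequence with content,
    # as in (2b): /gaga/ names an individual restricted to be a dog.

    lexicon = []

    def learn_word(phon, content=None, context=None):
        lexicon.append({"phon": phon, "content": content, "context": context})

    index_1 = {"type": "index"}                    # tag [1]
    learn_word(phon=("g", "a", "g", "a"),
               content={"type": "nom-obj",
                        "index": index_1,
                        "restr": [{"rel": "dog-rel", "instance": index_1}]})

    entry = lexicon[0]
    assert entry["content"]["index"] is entry["content"]["restr"][0]["instance"]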
Recovery from error in word learning
Narrowing down the meaning of a term (so that truck, for example, refers to only a certain type of large vehicle) would amount to either discovering that the particular category has subdivisions, and the name attaches to only one of them, or discovering that the name attaches to some element lower in the pre-existing, non-linguistic hierarchy that represents the child's ontology or theory of the world. To the extent that either kind of development involves non-monotonic changes to the linguistic system, they are at the maximally specified fringes of the hierarchy, where they can't entail wholesale revisions of knowledge about subtypes. This is equally true of correcting for overspecification: learning that cookie refers to more kinds of things than just vanilla wafers. Where the types postulated don't quite correspond to those of the (presumed) adult grammar, they differ in either of two ways.

1. In the adult grammar, some of the objects have more attributes than are reflected here. E.g., indices have attributes for grammatical person and number, and signs have attributes for quantifier-scope representations, etc.

2. The properties attributed to types in the one-word grammar are properties of intermediate-level properties of more structured types in the adult grammar which are not yet differentiated in the one-word grammar. E.g., content and context are attributes of an intermediate attribute of sign; but at this point, the child's grammar does not encompass distinctions that would motivate the intermediate attribute. Similarly, there are other kinds of linguistic signs besides word-signs, which is what the initial grammar describes.

Eliminating the first kind of difference involves just the addition of attributes and declarations of their value type, reflecting distinctions the child comes to recognize as significant, just as she might learn that paper can be torn, and infer a boolean-valued attribute tears for her model of paper. Insofar as this is just addition of information, it is incremental and monotonic. Transitions eliminating the second kind of difference involve Type Differentiation: a type X with certain attributes is discovered to have an additional attribute, with subtypes differing substantively according to what value they have for this new attribute. This is illustrated schematically in (3):

(3)  type-x:[...]   -->        type-x:[...]
                               /          \
                     subtype1:[G +]    subtype2:[G -]
The subcase of this where a type X with particular values is discovered to be a subtype of a previously unknown supertype which does not share all of the attribute specifications is essentially the same, as illustrated in (4), the only difference being the names of the types (addresses might be a better metaphor). In any case, the internal representation of the name of a property of any sort of object is a fairly superficial aspect of the system of knowledge representation, not an aspect of its content.

(4)  type-x:[F +, G -]   -->        supertype:[F +]
                                    /              \
                          type-x:[G -]    other-subtype:[G +]
Both cases involve learning that some property (whether previously observed or not) is independently variable, and both surely occur regularly in non-linguistic learning.[4]

[4] A non-linguistic example of type differentiation might be learning the difference between boys and girls, or that not all cats are gray. An example of the type-revision subcase might be the child of nutrition-fanatic parents learning that fruit and dessert are not the same thing, that there are cold-and-creamy and dry-and-crumbly desserts that are not fruit.
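Under the same toy Type class sketched in Section 1, both transitions can be rendered as pure additions; the type and attribute names below are illustrative only:

    # Reusing the toy Type class above: both transitions only add information.

    # (3) Type differentiation: a new attribute G is discovered to vary,
    #     inducing two subtypes of type-x; nothing known about type-x is lost.
    type_x   = Type("type-x")
    subtype1 = Type("subtype-1", type_x, G="+")
    subtype2 = Type("subtype-2", type_x, G="-")

    # (4) Supertype discovery: type-x [F+, G-] turns out to share F+ with a
    #     previously unknown supertype; its own G- specification stays put.
    supertype = Type("supertype", None, F="+")
    type_x4   = Type("type-x", supertype, G="-")
    other     = Type("other-subtype", supertype, G="+")
    print(type_x4.constraints())    # {'F': '+', 'G': '-'}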
4 Getting syntax: multi-word utterances
A major milestone in first language acquisition is advancement to the so-called "two-word stage." The frequent occurrence of two-word signs is better seen as a performance or processing matter than as a competence or knowledge matter. For one thing, even children who are just beginning to put words together two at a time produce the occasional three- or four-word utterance (like clock on there or other cover down there). Furthermore, a number of two-word utterances are understood as having more implicit parts than just two. The regular appearance of utterances longer than two words will represent not so much a quantum change in grammar or ability or competence, as something more on the order of a developmental change relating to the size and organization of the mental structure that amounts to a "buffer," a change which one would expect to correlate with changes in non-linguistic behavior of the sort observed by Piaget. It seems likely that the notions of compositionality, syntactic categories, subcategorization, and heads develop together, in the solution to a simultaneous constraint satisfaction problem (cf. Pinker, 1989: 250). Accounting for the properties that each of these notions represents requires representing at least one of the other notions. Perhaps surprisingly, the need to distinguish between syntactic categories and semantic content types is implicit in two-word utterances insofar as reference to the content of arguments can be semantically entailed without there being any actual syntactic argument present. The model must have this property if we take the interpretations of caregivers seriously. Examples include the celebrated mommy sock ('mommy is putting my sock on me'), as well as negative phrases without overt relation terms, such as no bed, where a relation term with particular expressed roles just has no phonological expression. (The relation intended in the use of such an expression may be specific, though we have no way of knowing, after the fact, just what relation it is.)
Learning about phrases
A grammar that advances beyond a list of sign-meaning correspondences to a structure-dependent system requires, minimally, the notion phrase: the idea of linguistic objects which consist of a succession of signs, and a principle of semantic compositionality which entails that the content value of the phrase is some function of the content values of the parts, regardless of how vaguely or specifically those content values map onto the reality that the child perceives. Maintaining a non-arbitrary account of the meanings of phrases in terms of the meanings of their parts in the face of non-semantic constraints on combination entails the notion 'syntactic category', the notion 'head-of-a-phrase' (the part which determines the general character of the phrase), a principle that guarantees projection of properties of heads to the phrases they head, and something that tracks the satisfaction of selection requirements. Learning that phrases are signs that have signs as subparts is straightforwardly the addition of information. The diagram in (5) represents this evolution of the type hierarchy schematically.
(5)  sign                                      sign
     [PHON    list(speech-sounds)              [PHON    list(speech-sounds)
      CONTENT nom-obj ∨ proposition    -->      CONTENT nom-obj ∨ proposition
      CONTEXT context]                          CONTEXT context]
                                                /              \
                                            word            phrase
                                                             [HEAD-DTR     sign
                                                              NON-HEAD-DTR sign]
At this stage in the discussion, there is nothing to say about the content of the attributes head-dtr and non-head-dtr beyond the fact that they are distinct from each other. But having a grammar for phrasal expressions requires that they have content, as described in Section 5.[5]

[5] Non-headed phrases such as coordinate structures emerge in the syntax explosion following the so-called two-word stage.
Compositionality
The meaning of a phrase is not a simple sum of the meanings of its parts, but a complex function where different kinds of words have different kinds of meanings and combine in phrases in type-specific ways:
- property- (or relation-)denoting words can combine with individual-denoting words to form predicating phrases like Robin cry, see train,
- property-denoting words can combine with property-denoting words to form predicating expressions (as in more sing, all wet),
- property-denoting words can combine with individual-denoting words to form individual-denoting phrases like dry pants, big spoon.
Insofar as the combining function that represents compositionality is sensitive to the categorial value ('part of speech') of the parts, and insofar as the least arbitrary accounts are sensitive to which part of the expression is the syntactic head, accounting for compositionality requires implicit knowledge of syntactic categories, as well as knowledge of the associated notion of headedness. Support for this account comes as much from getting the syntax and semantics of modification right as it does from complements and strict subcategorization.
5 The Projection of lexical properties to phrases
Since the elementary syntactic objects whose combination a grammar describes have both syntactic and semantic properties, the grammar has to represent how the semantic and syntactic properties of phrases are a function of the semantic and syntactic properties of their parts. A learnable grammar has to represent this as non-arbitrary, learnable functions. The function denoting the projection of semantic properties amounts at this stage to two mutually exclusive identity statements which represent the uncontroversial hypothesis of formal semantics that a head-adjunct phrase has the meaning of the adjunct predicating something about the head it is adjoined to, while a head-argument phrase has the meaning of the head predicating something of its arguments. This is represented in the elaboration of a bit of the type hierarchy in (6), as the sharing of structure between the identically tagged values.
(6)                       phrase
                         /      \
    head-adjunct-phrase            head-argument-phrase
    [CONT     [2]                  [CONT     [1]
     HEAD-DTR [CONT [1]]            HEAD-DTR [CONT [1]]]
     ADJU-DTR [CONT [2] [ARG [1]]]]
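These two identity statements can be sketched in code; the structure sharing between tagged values in (6) is modeled here, merely illustratively, as identity of Python objects:

    # A sketch of the two projection clauses in (6); contents are plain dicts,
    # and tag-identity (structure sharing) is modeled as object identity.

    def head_argument_content(head_cont):
        # the phrase means what its head predicates of its arguments
        return head_cont

    def head_adjunct_content(head_cont, adjunct_cont):
        # the adjunct predicates something of the head it is adjoined to
        adjunct_cont["arg"] = head_cont        # token identity, not a copy
        return adjunct_cont

    cry  = {"rel": "cry-rel", "crier": "robin"}
    loud = {"rel": "loud-rel", "arg": None}
    phrase_cont = head_adjunct_content(cry, loud)
    assert phrase_cont["arg"] is cry           # shared structure, as tagged in (6)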
From the language-learner's point of view, it represents the idea that one way in which phrase types differ from one another is in how their meanings are a function of the meanings of their parts. A separate function which describes the projection of syntactic properties of phrases from their heads is necessary because syntactic selection, and the projection and composition of meanings, vary independently, and constituent structure may be independent of both, while the syntactic character of a phrase is always the same as the syntactic character of the constituent which is its head. Represented as a constraint on phrases, as in (7), it says that the head properties of a phrase are the same as the head properties of its head daughter.
(7)  phrase
     [HEAD     [1] part-of-speech
      HEAD-DTR sign [HEAD [1]]]
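A corresponding sketch of the constraint in (7), with object identity again standing in for the tag [1] (the part-of-speech labels are illustrative):

    # Building a phrase projects the head daughter's HEAD value unchanged.

    def make_phrase(head_dtr, non_head_dtr):
        return {"head": head_dtr["head"],        # Head Feature Principle, (7)
                "head-dtr": head_dtr,
                "non-head-dtr": non_head_dtr}

    see   = {"phon": "see",   "head": {"pos": "V"}}
    train = {"phon": "train", "head": {"pos": "N"}}
    vp = make_phrase(see, train)
    assert vp["head"] is see["head"]   # the phrase is verbal because its head is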
An analog of this interpretation of the head attribute that is implicit in the child's everyday experience is her ability to distinguish between those parts of a thing that make it what it is, and those parts that, while normal or desirable, are not essential, for example, the idea that dolls must have heads and bodies, but needn't have feet or hair, or that cars must have frames and wheels, but needn't have doors, roofs, and so on. The attribute head must not include phonological representation, because the phonological representation of a phrase includes the phonological representations of all its parts, not just a particular one. It must not include content, because it is already clear that the content value of a phrase is only sometimes the same as the content value of its head. It must represent syntactic category information, though, however well or poorly it correlates with content. Thus, the additional feature head must have a value which is a part-of-speech from among a set approximating {N, V, A, Det}, and word-signs must correlate the value N with nominal-object content, the values V and A with propositional content, and Det with quantifier content.
Inducing the attribute head and an attribute head-daughter to identify which phrase-part is the head, with these constraints on them, constitutes an incremental, monotonic change (and not restructuring or reclassification of value types at intermediate levels of the ontology).

Subcategorization
In theory, two classes of facts motivate classifying words in terms of the syntactic categories of the expressions they combine with: those which indicate selection for a subclass of sign-type, and those that have to do with assigning semantic roles to argument daughters of a phrase. As for the former, if types are semantic, describing the selection as semantic proliferates the number of semantically-defined phrase types. This results in a syntactically unstructured set of syntactic types, rather than a syntactically structured set with semantically structured subtypes, and so makes the account of how meanings are composed increasingly arbitrary. It is more straightforward to distinguish syntactic sub-classes and map them to semantic types than to make all selection semantic. Since regularities are never completely semantic anyway (there are always pockets of arbitrariness), any apparent redundancy should guide learning in a way that reduces idiosyncrasy. As is commonly observed, redundancy makes it easier to reconstruct the message, given the fallibility of vocal communication, and provides "bootstraps" for learning (cf. Bates and MacWhinney 1979, 1982, Maratsos and Chalkley 1980, Pinker 1987). In learning, certain "local" attributes of linguistic expressions have to be segregated (and before long contrasted with nonlocal unbounded-dependency-tracking attributes (see Sec. 7.3)), because what gets subcategorized for is linguistic objects with a certain part of speech and a certain content,[6] rather than entire signs with phonological information and arbitrary details of structure and content. There seems to be an analog of this sort of reorganization of information in the child's everyday experience in identifying and naming a complex object consisting of already named and identified objects (like the notion 'pair' or 'parents' or 'siblings'[7]). Note that substantive entailments that were formerly true do not become false with this kind of object identification and reorganization of information. Subcategorization attributes can be inferred which take lists of these local things as their value; any number of appropriate means are available to distinguish among subjects, complements and adjuncts (even at the "two-word" stage, children's grammars distinguish agents and patients, and therefore also probably subjects from objects of transitive verbs).[8] In any case, the child must come to know that the subcategorization properties of a phrase are a function of those of its head. More specifically, he learns the principle represented in (8).

(8) A phrase subcategorizes for all the arguments its head subcategorizes for that aren't sisters of the head.

Like the Head Feature Principle described above, this subcategorization or valence principle amounts to an additional constraint on the type phrase; a sketch of it in code follows below. The analog of this principle in the everyday experience of infants is the idea that when you find what you were seeking, you stop looking for it. Finally, speech at this stage shows evidence of knowledge that just as heads select particular sorts of categories for their arguments, modifiers require particular sorts of categories for their heads.

[6] At least, an indicatable index, and maybe a relation type.
[7] Or, like 'mother-and-child' (bo-shi) or 'parent(s)-and-child' (oya-ko) in Japanese.
[8] In this discussion, grammatical relations are represented by position on subcat and comp-dtrs lists ordered by decreasing obliqueness. See Pollard & Sag 1987 for discussion of identifying grammatical relations by obliqueness.
Thus there must be a constraint like (9) on head-adjunct-phrases that requires the adjunct-daughter to modify something of the type that the head-daughter belongs to, which can be represented through a modifies attribute.

(9)  head-adjunct-phrase
     [HEAD-DTR [HEAD [1]]
      ADJU-DTR [HEAD [MOD [1]]]]

The available non-linguistic correlate of this very specifically linguistic notion would be the association relation 'go with' (as in hats go with heads, leashes go with dogs, etc.).
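The valence principle in (8) can be sketched as list subtraction; the category labels here are illustrative placeholders, not a claim about the actual inventory:

    # A phrase's SUBCAT is its head daughter's SUBCAT minus the
    # requirements realized as sister daughters, per (8).

    def phrase_subcat(head_subcat, realized):
        # "when you find what you were seeking, you stop looking for it"
        return [req for req in head_subcat if req not in realized]

    see = {"phon": "see", "subcat": ["NP-subject", "NP-object"]}
    vp_subcat = phrase_subcat(see["subcat"], realized=["NP-object"])  # see train
    print(vp_subcat)   # ['NP-subject'] -- the VP is still looking for a subject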
6 Interim summary
So far, it has been shown that
- A solution to the string-to-structure problem can be framed in constraint-based terms.
- No specifically syntactic primitives need to be assumed to be innate.
- Notions crucial for the development of syntax appear to have counterparts in non-syntactic notions that are implicit in the child's understanding of the world at the age when language acquisition takes off (9-18 months).
- The acquisition of constraints can be represented as the addition of high-level attributes of kinds in an inheritance hierarchy, constructed by the child the same way her theories of physical objects and events and social interaction are constructed, however that may be.
- Constraints on the types of linguistic object that come to have to be distinguished do not have to be assumed to be innate, because they have counterparts in nonlinguistic cognition which facilitate acquisition of knowledge generally, and so should be seen as natural developments in the child's ongoing classification of the world.
7 The Acquisition of Questions
The sequence of stages in the development of questions is not controversial.[9] Questions like Dat? Doggy? Uzzae? (D: 1.4.25)[10] appear early, in the one-word stage. Polar ("yes-no") questions appear first as just intonation-marked, and uninverted forms appear before inverted forms. In constituent ("WH") questions, Where and What emerge before other interrogative words, with be, and are always utterance-initial, in examples like Whereda N? (T: 1.3.20 - 1.4.24), Whereda [Name]? (T: 1.3.20 - 1.4.24), Where N?/Where's N? (T: 1.4.25 - 1.6.26). Why-questions like Why kitty sleep? come next, without auxiliaries, at about the same time as auxiliaries emerge in negative sentences. How-questions come after that, at about the same time as auxiliaries in positive sentences, and inverted polar questions (Did you see him?). WH-questions are still not inverted: How she can do that? An incremental account of the acquisition of questions should account for these stages with grammars that differ only by the monotonic addition of information. A respectable account will predict both the non-adult patterns that systematically occur, as enumerated in (10), and the spontaneous recovery from them.

[9] See Clark & Clark 1977, Foss & Hakes 1978.
[10] Attributed forms are mainly from journal studies; T is the child Travis from Tomasello 1992, M is the Madeleine described in Hall 1997, and references to R and D are from the author's unpublished journals. References to "Finer's child," "Gillian," and "Ga(vin)" are from email discussions on the Linguist List.
(10) a. WH-questions without inversion: Why his feet are cold? (T: 2.3.09)
     b. missing auxiliary, wrong (finite) verb form: Why Ann and Dave bought this? (T: 2.3.06)
     c. missing auxiliary, right verb form: Why we going to the doctor's office?
     d. inversion too far: What could be that? (R: 2.0.20)
     e. wrong auxiliary: Are you have a knife? What are you see? (Ga: 1.11)
     f. "Double tense" questions (auxiliaried, inverted Q with finite complement VFORM): Does Brinlee and Jana has shorts? (R: 1.11)
     g. "Double auxiliary" forms: What's "delusions" is? (D: 2.3)

Some "errors", for example, apparent Left Branch violations, are probably best explained as performance error even relative to the child's grammar, insofar as they are frequently spontaneously self-corrected, as in the monologues quoted in (11).

(11) a. Finer's child (3.10): What do diesels put diesel fuel in their? What do diesels put diesel fuel in that place? What do diesels put diesel fuel in?
     b. Gillian (2.4): Who is it peepee? (flushing) Whose pee-pee was that?

7.1 One-word stage
Presumably, at the one-word stage, questions like Dat? Dat. and Uzzae? are just words with more contextual than semantic content, and their question force is carried by their intonation,[11] as illustrated in (12).

(12)  [CONTEXT [BKGD   {[want-rel
                        EXPR [1]
                        [proposition [tell-name-rel
                                      AGNT  [2]
                                      GOAL  [1]
                                      THEME [3] (deictically indicated)]]]}
                C-INDS [SPKR [1]
                        ADDR [2]]]]

7.2 Two-word stage questions
At the two-word stage, when the child has acquired the minimal grammar for phrases, WH-words (where, who, what) are plausibly analyzed as predicative heads (like verbs or prepositions) with very underspecified content. In this analysis there would be an index corresponding to the subject, but none to any object, and they might be inexplicit about the property predicated of an argument. The annotated constituent-structure tree in (13) represents the question Where's Robin? at this stage.[12]

[11] There is abundant evidence that children as young as 13 months are sensitive to prosodic cues (Morgan and Newport 1981, Wanner and Gleitman 1982, Hirsh-Pasek et al. 1987, Jusczyk et al. 1992).
[12] There is no reason to assume that at this stage the s is parsed as an auxiliary verb, or indeed as anything except a bit of phonology, the principles for the occurrence of which are not understood, implying (correctly, I believe) that it goes in and out, like articles (cf. Brown 1968).
(13)                  [PHON    <Where's Robin?>
                       LOCAL   [HEAD    [4]
                                SUBCAT  < >
                                CONTENT [6]]
                       CONTEXT [BKGD   {[want-rel
                                         EXPR [1]
                                         [tell-location-rel
                                          AGENT [2]
                                          GOAL  [1]
                                          THEME [3]]]}
                                C-INDS [SPKR [1]
                                        ADDR [2]]]]
                        /                          \
                       H                            C
     [PHON  <Where's>                     [PHON  <Robin>
      LOCAL [HEAD   [4]                    LOCAL [5] [HEAD n
             SUBCAT <[5] NP>                          CONT [INDEX [3]]]]
             CONT   [6] [at-rel
                         THEME [3]]]]
The fact that WH-words are phrase-initial at this stage is perhaps best seen as only the performance effect of the child having no reason to make utterances with (gross) order different from what he hears. Once children start combining words freely, word order errors seem to be rare and isolated (i.e., limited to very specific constructions), suggesting that they represent incorrect analyses. (An example of such an error would be that quantified predicative phrases typically precede their arguments in spontaneous utterances like All gone milk, which have no model in the language of the environment.)
7.3 Organization of Information about Unbounded Dependencies
The appearance of multi-constituent WH-initial questions evidences a new kind of phrase (head-filler-phrase), where the head is a type of clause missing some constituent, and the non-head is a phrase that has the properties required to inhere in whatever is missing. The analog of a head-filler-phrase in a child's cognitive experience would be the ability to identify what is missing when something is missing, and to recognize it when it is located outside of where it belongs.[13] The correspondence between the filler and what is missing from the head is another instantiation of the same general notion that supported the Subcategorization Principle: when you find something, you stop looking for it, but you don't stop looking for it until you find it. This representation of constituent questions also requires a feature slash whose value encodes the properties required to inhere in whatever is missing, and some kind of representation of the questioning non-head filler-expression. In the earliest productions, the only non-local value implicated is the characterization of gap sites (what is missing). However, the WH-ness of question words is also nonlocal information. It originates in non-heads, however, and is propagated through heads, but not like head information; i.e., whose book is a WH-phrase even though the WH-property comes from a modifier of the head. WH-phrases like this are produced before the verbal syntax of questions is fully mastered, and are presumably understood long before that. The distinguishing property of WH-ness information is represented by a nonlocal feature que. It is inherited by the phrase of which it is a part, and represents the fact that, e.g., whose book is a WH-phrase because whose is a WH-word. WH-ness information is independent of information about the gap. Tracking local and nonlocal information entails another organizing attribute (synsem), to segregate information about a phrase's inherent properties from information about its subconstituents, and provide a common anchor for local and nonlocal (unbounded dependency) information. This means that the type hierarchy must be elaborated as in (14) to include the following types and additional constraints.

[13] One might seek to test this claimed correspondence by comparing the age at which a child gains the ability to solve shape-sorting puzzles with the age at which WH-extractions begin to occur. Performance on such a concrete task must be expected to underrepresent ability, however, since children are reported (Kay Bock, p.c.) to demonstrate discriminations that affect them earlier than purely abstract ones.
(14)
TYPE            CONSTRAINTS                                IS-A
synsem          [LOCAL local, NONLOCAL nonlocal]           object
local           [HEAD ..., CONT ..., CONTEXT ...,          object
                 SUBCAT list(synsems)]
nonlocal        [SLASH local, QUE nom-obj]                 object
sign            [PHON ..., SYNSEM synsem]                  object
head-filler-ph  [HEAD-DTR   [LOCAL    [HEAD verb]          phrase
                             NONLOCAL [SLASH [1]]]
                 FILLER-DTR [LOCAL [1]]]
The correspondence between slash[14] and local specified for head-filler-phrases amounts to the observation that a given puzzle piece fits in a particular place in a puzzle. A cognitive prerequisite for this would seem to be the ability to solve slot-and-filler problems where you have to identify what's missing from a picture, or to observe that customarily present objects are absent, like no milk in the bottle, collar gone from dog, etc.

[14] Although the chart represents slash and que as having individual objects of a certain type as their values, nonlocal features of this type actually have to be set-valued in order to have empty-set values when they lack a substantive value. Confirmation that slash is set-valued comes with the ability (acquired much later) to understand sentences like Which violin is that sonata easiest to play t on t? and Manners, Daddy is hard to talk to t about t. The value for que seems always (in all languages) to be no larger than a singleton set; one WH-word never binds (asks about) more than one nominal-object. (The fact that a nominal-object could be referred to in terms of multiple properties (What did Kim find t and give t to Sandy?) doesn't even involve a non-singleton set for que, and the fact that a WH-word can be used in connection with other WH-words asking about other nominal objects (Who said what to whom?) are irrelevant.) We would be talking about something like Which book by which author won the Newbery prize in 1971? meaning 'Name the book that won the prize and its author,' and such questions do not seem to be possible in any language.
Propagation of Unbounded Dependency Information
Information about a gap has to match information about the filler, which may be at some remove from it, so there must be some means of propagating the information from the gap site (following immediately or shortly after the subcategorizing word) up to the filler-daughter in the head-filler-phrase, where it is resolved. Most simply, the information that selection requirements for a complement or modifier of a lexical head will be satisfied by the non-head daughter in the head-filler-phrase is represented on that head, and propagated from head to head.[15] The nonlocal (slash and que) values of a phrase are the same as those of the head daughter (except where specified to be different, as in the sort declarations for head-filler-phrase). The analog in the everyday experience of a child would be the notion that if the essential part of something has a special property, or is missing something, then the thing of which it is a part ordinarily has that same property.

[15] A sort of lexical redundancy rule (cf. the Lexical Amalgamation of slash in Sag (1997)) must ensure that the slash value of a lexical item is the union of the slash values of all of its arguments. An analog in the child's cognitive experience of this principle would be the understanding that properties of the parts are cumulatively properties of the whole; when a part of something is affected in a certain way (say, injured or dirty), the object as a whole is affected in that way.
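The launch, propagation, and cancellation of slash values can be sketched as follows (an illustrative toy, not the feature logic itself; matching is simplified to equality of attribute-value pairs, and all names are invented):

    # A gap site launches a SLASH value, each phrase inherits its head
    # daughter's SLASH, and a head-filler phrase cancels it against a
    # matching filler ("found it: stop looking").

    def gap(local):
        return {"slash": {frozenset(local.items())}}

    def headed_phrase(head_dtr):
        return {"slash": set(head_dtr["slash"])}     # nonlocal comes from the head

    def head_filler_phrase(filler_local, head_dtr):
        sought = frozenset(filler_local.items())
        assert sought in head_dtr["slash"], "filler does not match the gap"
        return {"slash": head_dtr["slash"] - {sought}}

    np_gap = gap({"head": "noun"})                   # see __
    vp     = headed_phrase(np_gap)
    s      = headed_phrase(vp)                       # Robin see __
    wh_q   = head_filler_phrase({"head": "noun"}, s) # what Robin see
    print(wh_q["slash"])                             # set(): dependency resolved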
Is this monotonic learning?
Feature addition is clearly incremental and monotonic. Feature-structure reorganization that amounts to elaborating a classification is also incremental, as described in Section 3 above.
7.4 Inverted Yes-No questions
As a prerequisite to being able to produce the inverted forms of polar questions, the child obviously must have the capacity to ask polar questions. Typically, he does, marking them by intonation rather than inversion (just as his caregivers often do), as in Dat da park? (D: 1.11). Furthermore, the child's vocabulary must include auxiliary verbs such as be, can, will, since inverted forms occur only with an auxiliary verb. Interestingly, in diary studies, modals show up at about the same time as inverted questions (about 23 months), about two months after the copula. Two observations about the constituents of inverted polar questions entail their exemplifying an additional kind of phrase, consisting of a head and all of its arguments.
- Individual auxiliary verbs select both a VP complement with a particular verbal inflection type (vform), and a subject, which is the same as the subject that the complement VP subcategorizes for.
- Constituent order in inverted polar questions is different from other sentences; in other sentences, subjects routinely precede their VPs, but in polar questions, the auxiliary precedes the subject, which is followed by the rest of the predicate.
An additional subtype of head-arguments-phrase which treats subject and auxiliary verb as sisters of the auxiliary's complement enables all of these facts to be represented as interconnected constraints in the grammar, as described in (15).
(15)  inverted-ph
      [HD-DTR    [HEAD   [VFORM finite
                          AUX   +
                          INV   +]
                  SUBCAT <[1], [2] VP[SUBCAT <[1]>]>]
       COMP-DTRS <[SYNSEM [1]], [SYNSEM [2]]>]
How phrase-order options get connected with just auxiliaries
The principles of phrase order, and how they might be learned incrementally, are beyond the scope of this work. For the present, it is only a conviction that an explanatory account need not depend on innate rules of abstract case assignment or the like. But granting that in learning a relatively fixed phrase order language like English (or even Japanese), a child has no reason to imagine utterances with a phrase order grossly different from what she perceives, the child comes to know that (in English):
- Only auxiliaries may occur in inverted ("flat") structures.
- All auxiliaries can occur in uninverted (hierarchical) structures.
Conceivably, this is represented in the addition of constraints reflecting the following sequence of observations, as the sketch after this list illustrates.
1. be occurs either before or after its subject.
2. When it precedes its subject, be has special ([inv +]) forms (e.g., aren't I, not *amn't I).
3. Certain other forms (can, will) with the same meaning type as verbs can occur before the subject, although most verbs cannot.[16]
4. Therefore the ones that can are also special ([inv +]) when they do, and different ([inv -]) when they do not.
5. The property of potentially having a positive value for the feature [inv] (i.e., being invertible) defines a subclass of verbs ([aux +]); all other verbs are [aux -].
Observing the fact that among verbs, only auxiliaries have a vowel-reduced form (e.g., 'll, 'd, c'n, c'd, 's, 're, 'm) and that only auxiliaries have a negative form (isn't, wasn't, ain't, can't, won't, etc., but not *missn't, *cryn't) would confirm the distinctness of the class of auxiliary verbs.
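The observation sequence above can be rendered, illustratively, as a series of purely additive updates to toy lexical entries (the feature names and the tiny lexicon are stand-ins):

    # Steps 1-5 as monotonic lexical updates: each step only adds
    # feature specifications to entries, never removes any.

    lexicon = {"be": {}, "can": {}, "will": {}, "miss": {}, "cry": {}}

    for v in ("be", "can", "will"):      # steps 1-4: attested before the subject
        lexicon[v]["invertible"] = True  # so potentially [inv +]
    for v in ("miss", "cry"):            # never attested before the subject
        lexicon[v]["invertible"] = False

    for entry in lexicon.values():       # step 5: invertibility defines [aux +]
        entry["aux"] = "+" if entry["invertible"] else "-"

    print({v: e["aux"] for v, e in lexicon.items()})
    # {'be': '+', 'can': '+', 'will': '+', 'miss': '-', 'cry': '-'}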
Learning when inversion is obligatory
Having had the opportunity to observe ample evidence from caregiver speech that inversion in main clause Yes-No questions is optional, and to model their own speech accordingly, children may or may not assume (incorrectly, along with many linguists)[17] from ambient discourse that inversion does not occur in embedded questions. They also have ample experience consistent with an assumption that inversion in main clause WH-interrogatives is obligatory, and none inconsistent.[18]

[16] This is a negative observation of a common sort; for example, children learn early that English syllables do not begin with [dl], by not hearing any such syllables.
[17] In fact, inversion is syntactically optional in a number of types of embedded clauses, including both polar and WH-questions, so that sentences like I wonder did he leave and It's unclear who did he see are fully acceptable in at least some registers in a variety of dialects. For discussion, see Green 1981 and Green & Morgan 1996.
[18] Clarification echo questions (You spilled what on the rug?) do not involve head-filler constructions, and are beyond the scope of this discussion. See Ginzburg & Sag 1998 for some analysis.
If a learner does not suppose that [inv +] and [inv -] WH-clauses are in complementary distribution, and believes [inv +] is always an option, her grammar should just lack any [inv] constraint on head-WH-filler-phrases; she should ask uninverted sincere WH-questions like Why his feet are cold? This analysis predicts correctly that children who report questions may have embedded inversions like Say where am I when I'm in here, meaning 'Ask where I am when I'm in here.'[19] Learning not to invert would be a monotonic change; it would involve just adding the restriction [inv -] for some clause types.

[19] A transcription of the entire interchange between R (3.6) and her (linguist) mother:
R: Say where am I when I'm in here.
G: Where am I when I'm in here.
R: Say where's Robin when I'm in here.
G: Where's Robin when I'm in here.
R: When I'm in here say where's my name.
G: Where's my name.
R: When I'm in here, call me.
G: Robin.
7.5 Do-support
At the point of acquiring do-questions, the child already knows that one option for polar questions is inversion, and that inversion only occurs with [aux +] verbs. A plausible scenario for acquiring the competence to use the do-support construction would begin with the meaningless "noise" [də] heard at the beginning of utterances interpreted as questions finally getting parsed as the auxiliary verb do (did). (Meaninglessness may be the reason that it takes so long.) Interestingly, some children try be before they settle on do, as shown in (16).

(16) a. Are you have a knife? (All Ga: 1.11)
     b. What are you see?
     c. What are you do?

Correcting this error (if it is, as it appears to be, systematic, and part of the child's grammar) would require a non-monotonic change; however, it would be localized to the lexical entries, where non-monotonic learning has to be possible, or one could never recover from routine errors in associating meanings with forms. Learning that the complement VP has to be uninflected ([vform base]) would be monotonic, since it just involves the addition of information. This predicts, apparently correctly, that before this information is added, children should use whatever verbal inflection value they would use if the "embedding" auxiliary weren't there.[20] Thus, we should find brief stages where the complement of do is finite rather than uninflected (Does Brinlee and Jana has shorts? (R: 1.11)).

[20] Evidence seems to be available only from attempts to learn this do-support construction. Modals don't appear until inverted questions do, and always seem to have the correct [base] form. Perfective have does not appear until much later. (See Brown 1973: 335. Past participles are rare through "Stage V" (MLU 4.0, upper bound 13; typically ages 2.2 - 4.0), even though questions appear in Stage III (MLU 2.75, upper bound 9; typical ages 1.10 - 2.8), and participles have distinguishable forms for some verbs that are very frequent in discourse with children (see, be, fall, break). Forms like Why they got them? (T: 2.3) do occur, but cannot be distinguished from approximately equally appropriate past tenses.) Relying on a possibly faulty memory, it seems likely that after "Stage V", children may say things like I been to Carle Park, I been playing with Joey and ask questions like You been to the park? Where you been?, but not ones like *Do you been to the park? *Where do you been?. One might infer from this that when children finally learn to segment out and parse [əv] and [həv] as have, they know it must be an [aux +] and therefore potentially an [inv +] verb.
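The target entry for auxiliary do can be sketched as follows (illustrative names; the shared subject tag [1] is again modeled as object identity):

    # The corrected entry: an invertible auxiliary taking a base-form VP
    # complement whose unrealized subject is the same as do's own subject,
    # i.e. SUBCAT < [1]NP, VP[vform base, SUBCAT <[1]>] >.

    subject = {"head": "noun"}                     # tag [1]
    do_entry = {
        "phon": "do",
        "head": {"pos": "verb", "aux": "+", "inv": "+"},
        "subcat": [subject,
                   {"head": {"pos": "verb", "vform": "base"},
                    "subcat": [subject]}],
    }
    assert do_entry["subcat"][0] is do_entry["subcat"][1]["subcat"][0]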
7.6 Recovery from errors in terms of monotonic additions to minimal grammars
The proof of the pudding was going to be whether a monotonic, incremental, start-from-scratch learning hypothesis could give an account of recovery from errors. I hope to have demonstrated that the early failure to invert polar questions at all (Why his feet are cold? Why you're writing? Why they got them?) is correctible by adding the specification [head-dtr [inv +]] in the type declarations for the WH-filler subtype of head-filler-phrase, and that the initial failure of do-support (Why Ann and Dave bought this? What Daddy did? (M)) is correctible by learning, in addition to that, that [də] is an (invertible) auxiliary verb do that subcategorizes for a [vform base] complement. The short-lived appearance of finite complements in do-support constructions (Does Brinlee and Jana has shorts? What did you did?) represents a (false) analysis of Do/Did and 's as empty bits of phonology, rather than as verbs with identity-semantics that subcategorize for a complement with a particular inflection type. Such an analysis, preceding an analysis where they are parsed as auxiliary verbs, predicts the attested occurrence of "double auxiliary" forms where the first auxiliary is be, as in What's "delusions" is? (D: 2.3) or Look how big I'm are (M). Recovery can be accomplished by including in their new entries the specification that they are verbs, in fact invertible auxiliary verbs, and subcategorize for a verb phrase complement and a subject which is the same as the subject that their complement subcategorizes for: [subcat <[1], VP[vform base, subcat <[1]>]>]. Supposed Left Branch violations like Who is that [t]'s?; Whose is it [t] bicycle? involve "extraction" of an NP from a determiner phrase, or of a determiner phrase itself. If gaps are licensed as suggested above (cf. Pollard & Sag 1994, Ch. 9; Sag 1997) in derived lexical entries which require that one of the arguments of a head not be realized, the assumption that the determiner phrase and the possessor NP are correctly analyzed as specifiers rather than arguments will preclude examples like this in a mature grammar. A grammar that would allow LBC violations would be one which didn't restrict the position of gaps to argument[21] expressions. Learning not to produce these forms would reflect the addition of constraints.

8 Conclusion
The analysis presented here of the observed sequence of stages in the acquisition of the syntax of English polar and constituent questions illustrates how language acquisition can be characterized in terms of incremental and largely monotonic changes to a type hierarchy that constitutes an increasingly less skeletal constraint-based grammar. Insofar as making incrementally finer distinctions among linguistic objects has parallels in what the child learns about the world she lives in as she matures, this provides a plausible and falsifiable alternative to a view of acquisition in which the child is seen as inductively hypothesizing rules, or as setting or switching specific parameters of an innate and detailed grammar template. An advantage of this approach is that at the same time it says that languages are all of the same general character (multiple-inheritance hierarchies of linguistic objects), it allows languages to differ from one another in substantive ways, and makes the existence of language-particular constraints and constituent-structure schemata unremarkable, rather than an embarrassment. Insofar as details of types and constraints in grammars are similar or identical across languages, this can plausibly be attributed to common communicative pressures and common developmental reactions to them. When we can give a predictive account of the variability of human languages both within and across cultures, as well as of the universals, we will be on our way to having a theory of universal grammar that is more than a promissory note.

[21] Adjunct extraction also has to be licensed, and is mastered early. The best analysis of this in mature grammars is still elusive, but see Bouma et al. 1998 for an analysis of the problems.
Appendix I: A grammar for the one-word stage

object
  speech-sound        {a, m, ae, b, ...}
  list
  set
  boolean             {+, -}
  index
  sign                [PHON    list(speech-sounds)
                       CONTENT nom-obj ∨ proposition
                       CONTEXT context]
  nom-obj             [INDEX index
                       RESTR set(propositions)]
  proposition
  context             [BACKGRD set(propositions)
                       C-INDS  contextual-indices]
  contextual-indices  [SPKR index
                       ADDR index]
Appendix II: A grammar with compositional semantics and subcategorization

TYPE             CONSTRAINTS                                          IS-A
part-of-speech   [MOD part-of-speech ∨ none]                          object
noun                                                                  part-of-speech
verb                                                                  part-of-speech
adjective                                                             part-of-speech
preposition                                                           part-of-speech
determiner                                                            part-of-speech
quantifier                                                            object
local            [HEAD    part-of-speech                              object
                  SUBCAT  list(locals)
                  CONTENT nom-obj ∨ proposition ∨ quantifier
                  CONTEXT context]
sign             [PHON  list(speech-sounds)                           object
                  LOCAL local]
word                                                                  sign
phrase           [LOCAL     [HEAD [1], SUBCAT [2]]                    sign
                  HEAD-DTR  sign [LOCAL [HEAD   [1]
                                         SUBCAT append-locals([2], [3])]]
                  COMP-DTRS [3] list(signs)]
head-arg-ph      [LOCAL    [CONTENT [1]]                              phrase
                  HEAD-DTR [LOCAL [CONTENT [1]]]]
head-adjunct-ph  [LOCAL    [CONTENT [2]]                              phrase
                  HEAD-DTR [LOCAL [HEAD [3], CONTENT [1]]]
                  ADJU-DTR [LOCAL [HEAD    [MOD [3]]
                                   CONTENT [2] [ARG [1]]]]]

The types speech-sound, list, set, boolean, context, c-inds, proposition, nom-obj, and index are as in Appendix I. The types sign and phrase are as in Appendix I, with the additional constraints noted.
Appendix III: A Grammar for Questions

TYPE               CONSTRAINTS                                        IS-A
synsem             [LOCAL local, NONLOCAL nonlocal]                   object
local              [HEAD    part-of-speech                            object
                    CONTENT content
                    SUBCAT  list(synsems)
                    CONTEXT context]
nonlocal           [SLASH set(locals), QUE set(nom-objs)]             object
sign               [PHON   list(speech-sounds)                        object
                    SYNSEM synsem]
phrase             [SYNSEM|LOCAL|HEAD [1]                             sign
                    HEAD-DTR|SYNSEM|LOCAL|HEAD [1]]
head-nexus-ph      [SYNSEM|LOCAL|CONTENT [1]                          phrase
                    HEAD-DTR|SYNSEM|LOCAL|CONTENT [1]]
head-args-ph       [SUBCAT    [2]                                     head-nexus-ph
                    HEAD-DTR  sign [SUBCAT append-synsems([2], [3])]
                    COMP-DTRS [3] list(signs)]
head-comps-ph      [HEAD-DTR word]                                    head-args-ph
head-su-ph         [HEAD-DTR|SYNSEM|LOCAL|SUBCAT <[ ]>]               head-args-ph
inverted-ph        [HEAD-DTR|SYNSEM|LOCAL|HEAD [VFORM finite          head-comps-ph
                                                AUX   +
                                                INV   +]]
head-filler-ph     [SYNSEM|LOCAL [HEAD verb, SUBCAT < >]              head-nexus-ph
                    HEAD-DTR   [SYNSEM|NONLOCAL [SLASH [1]]]
                    FILLER-DTR [LOCAL [1]]]
head-WH-filler-ph  [FILLER-DTR|SYNSEM|NONLOCAL [QUE {[2]}]]           head-filler-ph
verb               [VFORM vform                                       part-of-speech
                    AUX   boolean
                    INV   boolean]
auxiliary-verb     [AUX +]                                            verb
main-verb          [AUX -, INV -]                                     verb
vform              {fin, inf, base}                                   object

The types speech-sound, list, set, boolean, nom-obj, index, proposition, and quantifier are as in Appendix II. The types sign, local, phrase, and head-adjunct-ph are as in Appendix II, with the additions noted.
References
Anderson, J. R. (1977). Induction of augmented transition networks. Cognitive Science, 1, 125-157.

Bates, Elizabeth, and Brian MacWhinney. (1979). The functionalist approach to the acquisition of grammar. In E. Ochs and B. Schieffelin (Eds.), Developmental pragmatics. New York: Academic Press.

Bates, Elizabeth, and Brian MacWhinney. (1982). The development of grammar. In E. Wanner and L. Gleitman (Eds.), Language acquisition: the state of the art. Cambridge, MA: MIT Press.

Bloom, L. (1993). The transition from infancy to language. Cambridge, England: Cambridge University Press.

Bouma, Gosse, Robert Malouf, and Ivan A. Sag. (1998). Satisfying constraints on adjunction and extraction. Ms.

Braine, Martin D. S. (1987). What is learned in acquiring word-classes: a step toward an acquisition theory. In B. MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: L. Erlbaum Associates.

Brown, Roger. (1968). The development of wh questions in child speech. Journal of Verbal Learning and Verbal Behavior, 7, 277-290.

Brown, Roger. (1973). A first language: the early stages. Cambridge, MA: Harvard University Press.

Clark, Herbert H., and Eve Clark. (1977). Psychology and language: an introduction. New York: Harcourt Brace Jovanovich.

Foss, D., and D. Hakes. (1978). Psycholinguistics. Englewood Cliffs, NJ: Prentice-Hall.

Ginzburg, Jonathan, and Ivan A. Sag. (1998). English interrogative constructions. Ms., CSLI Publications.

Green, Georgia M. (1981). Pragmatics and syntactic description. Studies in the Linguistic Sciences, 11(1), 27-37. Department of Linguistics, University of Illinois, Urbana.

Green, Georgia M., and Jerry L. Morgan. (1996). Auxiliary inversions and the notion 'default specification'. Journal of Linguistics, 32, 43-56.

Hall, Brian. (1997). Madeleine's world. Boston: Houghton Mifflin.

Hirsh-Pasek, K., D. G. Kemler Nelson, P. W. Jusczyk, K. Wright-Cassidy, B. Druss, and L. Kennedy. (1987). Clauses are perceptual units for young infants. Cognition, 26, 269-286.

Jusczyk, P. W., K. Hirsh-Pasek, D. G. Kemler Nelson, L. J. Kennedy, A. Woodward, and J. Piwoz. (1992). Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology, 24, 252-293.

Macnamara, John. (1972). Cognitive basis for language learning in infants. Psychological Review, 79, 1-13.
Macnamara, John. (1982). Names for things: a study of child language. Cambridge, MA: Bradford Books/MIT Press.

Maratsos, M., and M. A. Chalkley. (1980). The internal language of children's syntax: the ontogenesis and representation of syntactic categories. In K. E. Nelson (Ed.), Children's language (Vol. 1). New York: Gardner Press.

Morgan, James L., and Elissa Newport. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67-85.

Pinker, Steven. (1987). The bootstrapping problem in language acquisition. In B. MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: L. Erlbaum Associates.

Pinker, Steven. (1989). Learnability and cognition. Cambridge, MA: MIT Press.

Pollard, Carl. (In press). Strong generative capacity in HPSG. In A. Kathol, J.-P. Koenig, and G. Webelhuth (Eds.), Studies in Constraint-based Lexicalism. Stanford: CSLI Publications.

Pollard, Carl, and Ivan Sag. (1987). Information-based syntax and semantics (Vol. 1). Stanford University, Center for the Study of Language and Information.

Pollard, Carl, and Ivan Sag. (1994). Head-driven phrase structure grammar. Chicago: University of Chicago Press.

Sag, Ivan A. (1997). English relative clause constructions. Journal of Linguistics, 33, 431-483.

Sag, Ivan A., and Thomas Wasow. (1998). Syntactic theory: a formal introduction. Ms.

Tomasello, Michael. (1992). First verbs: a case study of early grammatical development. Cambridge, England: Cambridge University Press.

Wanner, E., and L. Gleitman. (1982). Language acquisition: the state of the art. Cambridge, MA: MIT Press.