6/1/18 David McNeill. For a book (not yet titled) on children’s first language
PART 2
Chapter 1. Speech-sound development DRAFT
Beginning speech-sounds
    First speech-sounds
    Categorial perception, 'tangential' motor theory
    Why no speech-sounds at birth?
    Limits to studies
Babbling
First words
    Jakobson's theory (new basis)
Acquisition 1 rides on and shapes Acquisition 3, extinguishes
    The second year (shaping)
    The third year "dark age" (extinction)
Acquisition 2 vocabulary growth
References
New points
To review and to make this chapter self-contained, here again is the basic argument, first presented in Chap 1: children's language development is not continuous. It is a succession of recapitulated language origins, the first of which, gesture-first, extinguished with the Neanderthals or other early human species. It nonetheless left a residue in Homo sapiens, and children now recapitulate it. This is the language of children in roughly the second year of life, with a recapitulation of its extinction as well in the third year. Children then start to recapitulate a second origin, the origin of our human language, around age 3. The gesture-first language does not turn into the gesture–speech unity language. The actual development of human language therefore does not start in children until age 3 or so. However, there is also continuity from a third acquisition, that of speech. It evolved to ensure adult-infant attachment. This is where the development of the first speech-sounds, babbling, the first words, and vocabulary development come in. There are recapitulations here too, as we shall see.
Beginning speech-sounds

Before speech starts, the infant's vocalizations are close to the primate mode: stereotyped, repetitive, tied to specific emotions and functions, like coos and cries.
In phylogenesis, the gesture self-response was the force breaking vocalization free from primate stereotypy. What form does the self-response wedge take for children now, at 2 months? They are not recapitulating the self-response itself (it does not emerge in development until much later) but could recapitulate something of the gesture-orchestrated speech it unleashed. The speech looks like a kind of ideophone – a "quasi-ideophone", a vocalization that resembles something, in this case a movement (not yet a meaning). The speech-sound is an "image" of movements of head, torso, and arms and hands. Full-blown adult ideophones with meaning can orchestrate speech (Kita 1993), and this may be possible without meaning in the earliest infant vocalizations. The infant need not be in any way conscious of eliciting attachment, but everything she does has this effect.

The quasi-ideophone starts at roughly 2 months. The child has by then developed some voluntary control of hand and arm movements and with them orchestrates Cs and Vs. Quasi-ideophones appear in adult subjects in an experiment by Gentilucci and Dalla Volta (2007). In fact, the experiment is a good approximation of babbling: an action shapes the mouth as it utters a CV. Saying "ba" with a smaller mouth opening when picking up a small object than when picking up a large object is a quasi-ideophone in which the vocal movement is an image of a manual action.

Behind the quasi-ideophone is speech-for-attachment. This touches everything in the child's development as a speaking human. In the origin of language, it broke vocalization loose from primate stereotypy. It now opens the way into the whole narration of the human mind. It brings not only continuity through successive gesture-first and gesture–speech unity recapitulations but is the bedrock on which they stand.

First speech-sounds
Clark cites perception research showing that by 2 months infants can discriminate "b" from "p", "d" and "g" (pp. 65-66). We can assume the infants also control their own vocalizations for the speech-sounds they discriminate. Indeed, Leopold (1947), in his diaries, a vast record of his infant daughter's speech, found that by 2 months all the sounds that Clark mentions, and others, are "frequent" (pp. 99-103). Before then his daughter "cooed" and cried (which she likely continued to do while adding speech-sounds).¹

¹ A 20-month-old infant who had been tracheostomized and was aphonic from 5 months showed, after the tracheostomy tube was removed, an extreme reduction of the consonants typical for her age (Locke and Pearson 1990). However, she also showed a preference for bilabial stops, the sounds that appear among the first speech-sounds at 2 months. This could be the speech-for-attachment process attempting to restart itself after a long delay (I have no information on the child's subsequent speech development).

How do quasi-ideophones apply to the first infant speech-sounds? It could work as follows.
The stop consonants, as vocal movements, are images of – resemble – the hand and other movements with which they co-occur. I've put my own gesture-sound correlations in Table 1 to illustrate how vocalizations like "b", "p", "d" and "g" could be images of manual gestures for one speaker. I do not suggest that infants are miniature phoneticians, but they have access to the feelings of their movements producing speech-sounds, much as adults feel that "b" and "p" differ as movements as well as sounds, and that the differences resemble the accompanying movements of their body, limbs and hands. If there is control over producing movements, the infant's movements, seemingly random, could actually be recapitulations of the ancient breakthrough into speech. These co-occurrences may be labile, but the motions are simple and broad, and infants at 2 months could likely perform them.

Table 1. Vocalization-movement illustrations

Discriminated sounds: "b" / "p"
Gestures (adult intuitions): Hand opens / Hand closes
Vocal images of hand motions: Mouth opens with no delay of voicing / Mouth opens with delay

Discriminated sounds: "b" / "g"
Gestures (adult intuitions): Hand opens with move forward / Hand opens plus move backward
Vocal images of hand motions: Mouth opens with no delay of voicing / Back of mouth opens with immediate voicing

Discriminated sounds: "b" / "d"
Gestures (adult intuitions): Hand opens with move forward / Hand opens further with move forward
Vocal images of hand motions: Mouth opens with no delay of voicing / Mouth stays open with immediate voicing
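For readers who want the correspondences in a more compact notation, Table 1 can also be restated as a small lookup structure; this is only a re-notation of the table, with the same adult intuitions as entries and nothing added:

```python
# Table 1 re-notated as a mapping from a discriminated sound pair to the contrasting
# manual gesture and its vocal "image". Entries are the adult intuitions from the
# table, for one speaker; they are illustrations, not data.
TABLE_1 = {
    ("b", "p"): {
        "gestures": "hand opens / hand closes",
        "vocal_images": "mouth opens with no delay of voicing / mouth opens with delay",
    },
    ("b", "g"): {
        "gestures": "hand opens with move forward / hand opens plus move backward",
        "vocal_images": "mouth opens with no delay of voicing / back of mouth opens with immediate voicing",
    },
    ("b", "d"): {
        "gestures": "hand opens with move forward / hand opens further with move forward",
        "vocal_images": "mouth opens with no delay of voicing / mouth stays open with immediate voicing",
    },
}
```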
Categorial perception, ‘tangential’ motor theory
Clark also describes "categorial perception", once thought to be unique to speech but since shown to exist in many sensory domains and in a wide variety of animals and birds. For infants, this also appears at 2 months. Categorial perception is studied with sounds made to differ in micro-steps along, say, delay of voicing onset, an acoustic dimension known to differ between "b" and "p". As the delay increases, the child continues to hear "b" until it hits a threshold; then perception jumps to "p". One hypothesis that applies to both speech-sound discrimination and categorial perception is the motor theory, which claims that we perceive by matching our own speech to the acoustic stimulus; we hear what we ourselves say (also called analysis-by-synthesis). If the gestural self-response is this synthesis, then the motor theory explains how production and perception relate. The difficulty, of course, is that we have to understand that the self-response is not yet possible for infants this young; in fact, we have to say it is not yet possible for another 2 or 3 years.²

² Wikipedia summarizes various criticisms of the motor theory, but none apply here. Some are entangled with the modular "speech-is-special" aspect, but the gesture–speech unity, gesture-orchestrated speech, self-response interpretation is non-modular and is in no way confounded by the entertaining phenomena mentioned – "duplex perception" with door slams contradicting the supposed difference between speech and non-speech perception (Fowler & Rosenblum 1990); the McGurk effect with bouncing ping-pong balls; and hearing acoustic differences within sound categories. Other criticisms point to contextual information sources supposedly outside motor experience, but gesture-orchestrated speech is intrinsically contextual. And finally, that infant motor control is insufficient to explain speech-sound discrimination – but the motor control that matters is the first gesture-orchestrated speech, not the refined control of speech in a fully developed language. Only birds that hear songs in terms of vocal movements seem beyond it, since "manual" gestures could not be part of it (but gestures of a different kind could: Pika and Bugnyar 2011) (see Williams & Nottebohm 1985).
Quasi-ideophones, however, provide a tangential version of the motor theory that might explain both categorial perception and speech-sound discrimination at 2 months. We are not examining unlimited arrays of speech-sounds – just a few – and we adopt the idea that categorial perception is a general perceptual property. The speech signal triggers two gestures, one of the vocal tract, the other of the hands, torso, and head. If they don't match, the hand and body gestures dominate. A "b" gesture occurs (hand- or arm-opening) and the mouth-opening oral movement matches. It continues to match, but less perfectly as the delay of voicing grows larger, until a different hand gesture arises (hand-opening delayed) – it dominates vocal-tract movements, and perception switches to "p". It is a kind of motor-theory/quasi-ideophone/categorial-perception blend that may be within reach of an infant's capabilities at 2 months. I don't have a nifty name for this process, and "tangential motor theory" will have to do. Observing infants' speech-sounds and hand movements now has a new interest (infant mimicry of adult facial expressions, if enlarged to include adult and infant speech and the infant's hand and body movements, could be a method; cf. Meltzoff & Moore 1983).
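To make the threshold behavior concrete, a minimal toy sketch follows (my own illustration, not a model from the chapter or the literature): perception as winner-take-all matching of a voice-onset-time (VOT) value against two assumed "opening" templates, one immediate ("b"-like) and one delayed ("p"-like). The 60 ms template value and the resulting boundary are arbitrary stand-ins.

```python
# Toy sketch of the 'tangential motor theory' idea: a stimulus on a voice-onset-time
# (VOT) continuum is matched against two assumed opening templates; whichever template
# mismatches less wins, so perception jumps categorically even though the acoustic
# input changes in equal micro-steps. All numeric values are illustrative only.

B_TEMPLATE_MS = 0.0    # "b": opening with no delay of voicing
P_TEMPLATE_MS = 60.0   # "p": opening with delayed voicing (assumed value)

def perceive(vot_ms: float) -> str:
    """Return the category whose opening template the stimulus matches better."""
    return "b" if abs(vot_ms - B_TEMPLATE_MS) <= abs(vot_ms - P_TEMPLATE_MS) else "p"

if __name__ == "__main__":
    for vot in range(0, 61, 10):
        # Stays "b" for small delays, then switches to "p" once the delayed template wins.
        print(f"VOT = {vot:2d} ms -> perceived '{perceive(vot)}'")
```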
Why no speech-sounds at birth?

One last question: why does speech not appear immediately, at birth? Birth cries differ depending on the language of the mother (Mampe et al. 2009). Why does a résumé of primate stereotypy appear at all? I think the answer lies in three or possibly four places. First, the adult-infant system: the infant needs to hear adult speech and see it coming from another person (speech heard in utero probably envelops the fetus and does not have a locus). Second, to form quasi-ideophones the infant must be able to move both hands/arms and oral parts with some amount of control, and this too could delay the onset. Third, the self-response, while it triggers phylogenesis, is not accessible to the infant; only gesture-orchestrated speech in the form of quasi-ideophones is, and finding how to manage them may take time. And possibly fourth, if there is some condition of the early human creature in which gesture-orchestrated speech evolved that matures in today's infant at around 2 months, speech could be delayed until then. Online guides for parents describe some relevant maturations at 2 months that could have prevailed at the origin of gesture-orchestrated speech, in hand movements (primate-like grip reflexes disappear, deliberate grips emerge) and in vision (fixation and tracking of the adult face, especially the eyes, and possibly the mouth when speaking), which the evolution of the self-response also would have required. This reason, if sustained, could be the basic explanation of the 2-month delay of speech-sounds. Infants, recapitulating the breakthrough, have to wait until they too meet these conditions.

Limits to studies
A limitation of the studies in this area is that they have not correlated the infant's vocalizations with movements. I tried to start my own gesture-orchestrations with gestures and watched to see which of the four speech-sounds came out, but obviously my hand-opening and "ba" and the other movement-sound correlations arose from many sources and tell us nothing about the hand-movement/speech correlations of infants. Only observing infant motion and vocalization together can hint at primordial recapitulations. One scholar I am aware of who has been interested in infant perception and movement together at very early ages is Colwyn Trevarthen at the University of Edinburgh, some of whose work I described in Chap 1. Clark's Table 5.1, to be discussed in the First Words section, records gestures with a child's first words in the second year of life. One effect of this limit to studies is that, at several places in the following, I can hypothesize but cannot verify how gestures are involved in research findings. My statements should then be regarded as predictions which further studies could test.
Babbling

Here are two clips of "Charlotte" at 10 months that figure in the discussion to follow:
[Clip 1: "ëdja"]

[Clip 2: "ai saitjëz"]
The quasi-ideophone develops along several dimensions and produces what I am saying goes on when an infant babbles – a merger of what the infant sees and hears, her play, her motor control, and the continuation of quasi-ideophone vocal movements echoing manual-brachial ones without meaning, all under the blanket of fostering adult-infant attachment, the evolutionary imperative it meets. The first babbles are repetitions of CVs – "ba-ba-ba-ba" and the like – possibly a vocal image of the hands closing and opening repeatedly. Variations come later, say "ba-ga-ta-ba", possibly an image of the hands closing and opening with variations of position and shape.

"Charlotte", on this way of thinking, illustrates an advanced form of babbling speech-orchestration by manual/brachial movements, and indeed shows it quite strikingly. In the first clip her mouth is fully open, her hand is also fully open, and her arm is lifted to full length overhead. She articulates an open "a", a vocal-tract echo of the arm/hand. The second clip shows her hand as it moves from (her) left to right, opening and closing as it goes. Her mouth is partly open, and we infer that her tongue movements are creating the complex sequence of consonants and vowels that Kendon transcribes as "saitjëz", which the opening and closing hand could orchestrate. At the moment of the clip she is saying the closed "i" or "j", the vocal tract echoing the then-closed hand (clips used with the permission of Kendon, who also allows quoting his transcript, "Charlotte's" mother, and the now 13-year-old "Charlotte" herself). As Kendon says, these are gestures in all but meaning.

The thread of adult-infant attachment was also much elaborated. "Charlotte", for those who have seen Kendon's video, performs in a style worthy of a Roman senator, directing her gaze at her audience, as in clip 1, or at her hand as if following it, as in clip 2. This clip is a clue, too, to how "Charlotte" organized her babble dance. It was led by gesture; speech followed. She made her gestures and they led to vocal echoes, which were equally part of the spectacle. There is also synchrony of gesture and vocalization, as orchestration by quasi-ideophones implies. Do her utterances reveal discreteness and merging? Not in the full sense, because they lack meaning, but the discreteness and merging that do exist are unborn, as it were, awaiting this animating spark of meaning.
First words

Gesture-first, the next recapitulation, at about the first birthday, delivers the spark. It brings meanings that animate the discretenesses and "merges" of babbling. There is continuity from babbling's quasi-ideophones to gesture-first's ideophones. What are these ideophone gestures like? A table in Clark suggests that they are often truncated actions – the hand reaching as if to pick up an object but not completing the act. Kendon has written of these early symbols (1991). Thanks to gesture-first, the hand has taken on symbolic value beyond being just a movement. The gestures are pantomimes and can orchestrate speech-sounds as the earlier babble movements did, but now as images of meanings related to the actions the gesture pantomimes. No longer quasi-ideophones, they are real ideophones (Haiman 2018). Like gesture-first symbols in general, they are icons of actions in the world. In Clark's Table 5.1 (p. 113, based on Carter 1979), the infant's truncated actions convey actions desired but not fulfilled; that is, actions in the world that are themselves truncated. Some examples (with comments on the gesture details that could have orchestrated the speech):

Request help to obtain object. Gesture: reach to object. Speech: "m-" ("Mommy"?); if the gesture included closure, it could orchestrate bilabial closure.

Request attention to object. Gesture: points, which Vygotsky and others linked to truncated grasps. Speech: "d-" (child's name? "there"?), where a closed gesture could orchestrate a second type of vocal-tract closure – the obtain-grasp and the point-(possible)-grasp forming a contrast; or "l-" ("look"?, with an alternate version of vocal-tract closure).

Request transfer to self or to other. Gesture: reach to person. Speech: "phonetic variants of 'David', 'Mommy'". Impossible to say if a gesture orchestrates "David", but if there was no closure it could orchestrate the non-stop bilabial "m-".

Get help removing object. Gesture: waving hands, slapping (seemingly incomplete versions of removing or knocking away the object). Speech: "b-" ("bad"??), another bilabial closure, a stop that the slap could orchestrate; the hand-wave pantomime of removal would not orchestrate this consonant but could orchestrate the following vowels.

Get help to change situation. Gesture: negative headshake. Speech: "nasalized glottal stop sequence" (a headshake could directly shape this sound, repeated shakes a sequence of them; this is causation, not an ideophone).
The table also suggests that when the child starts recapitulating gesture-first, it rides on the discreteness and merging that babbling passed on. The result is that discreteness fixes words as entities and merging scaffolds them vis-à-vis the pantomime – sounds no longer in extensive babbled strings but isolated, "discrete", and "merged", not with other sounds but with gestures. There need not be gesture-sound synchrony, and I assume there was none (the table does not show timing). Truncated-action pantomimes continue to be dominant during the whole of the gesture-first recapitulation as other semantic relations enter, as described in Greenfield & Smith. This gives a new perspective on the vocabulary explosion at around 18 months and why it is greater for children whose vocabulary growth has been slow. In the analysis in Chap 1, I proposed it was because "scaffolding" by gesture drew words in, and children low in words filled their supply quickly; now we can see that a need to complete ideophones would be the engine for this.

Jakobson's theory (new basis)
An important theoretical question concerns Roman Jakobson's (1968, originally 1941) theory that babbling and the first speech-contrasts are developmentally distinct, even with a period of silence at the transition (p. 22):

As all observers acknowledge with great surprise, the child then loses nearly all his ability to produce sounds in a passing over from the pre-language stage to the first acquisition of words, i.e., to the first genuine stage of langage … It is easy to understand how those articulations which are lacking in the language of the child's environment easily disappear from his inventory. But it is striking that, in addition, many other sounds which are common both to the child's babbling and to the adult language are in the same way disposed of, in spite of the environmental model which he depends on.
Clark summarizes the critique of this theory (p. 111) that modern researchers have provided, which emphasizes the continuity of speech-sounds running from babbling into first words (as put by Oller et al. 1976: "strong similarities of sound sequences in babbling and first words and absences in first words of sounds not in babbling"). There is also a continuity of quasi-ideophones to ideophones. Paradoxically, this continuity explains Jakobson's observation, although in a way he did not envision (whether it is frequent is another matter). At some point, the sources Jakobson drew on had observed something like sounds common to both babbling and adult language being disposed of in the child's first words, and with quasi-ideophone/ideophone continuity we can see how this might have happened.

Despite quasi-ideophone/ideophone continuity, there is an important difference. Quasi-ideophone babbling is free, expansive, liberated by echoing the infant's newfound motor control, a source of fascination and play (as we gather for "Charlotte").
The ideophones that gesture-first creates, on the other hand, carry meanings dependent on the child's own practical actions, truncated into pantomimes. This opens the possibility that different actions appear in babbling and in the first words. Then the child could lose the babble sounds even though they are part of her linguistic environment. An opening-closing hand might have orchestrated "Charlotte's" babbled "i" but not have been part of any practical action she would perform, and so not part of her pantomime-orchestration of the first word-sounds. If there were no other quasi-ideophones for the sound (which we don't know) and no pragmatic actions that could have been pantomimed for it, her first words could have disposed of it.³ It is unlikely the continuity papers have considered this possibility. Further, a silent period between babbling and the first words also could appear, though not always, even if the quasi-ideophone-to-ideophone change is seamless. If the advent of gesture-first is slowed for some reason, then quasi-ideophones could cease, a pause intrude, and the first words be delayed. For other children, the processes could run on without a break; both outcomes reflect the same basic developmental phenomenon. The effect of disposed-of speech-sounds, however, is not likely to last long, and a fleeting existence could be one reason it has not been observed in modern studies. It is certainly true, as Elbers & Ton (1985) write, that "…new words may influence the character and the course of babbling, whereas babbling in turn may give rise to phonological preferences for selecting other new words" (p. 551), and even if "Charlotte" had disposed of "i" she soon would have discovered how to orchestrate it with an ideophone from other sources.

The analysis adopts the critique that the papers cited by Clark have put forth, going back at least to 1976, while explaining how first words still might dispose of some babbling sounds and how even a silent period could come to be. It has the advantage of combining views that have been perceived as opposed. And it shows how a version of Jakobson's theory is displayed in Clark's Table 5.1. Long babble strings like "Charlotte's" "ëdja ëdet dit ai tjë-e-h isë atje si iai kh' ai saitjëz bop bop"⁴ are reduced to single-sound first words once meaning is part of the orchestration. Ideophones from pantomimes explain this. I agree that restoring Jakobson's idea is a long shot, but before removing it from all consideration it needs to be judged with reference to action.
³ This is pure speculation: I am trying to make clear how sounds could be disposed of, and I have no information on "Charlotte's" first word-sounds and gestures.

⁴ Transcription by Adam Kendon, quoted with permission.
Acquisition 1 rides on and shapes Acquisition 3, extinguishes

The second year (shaping)
Whatever the final disposition of Jakobson's theory, gestures change during the second year as they come to carry the new semantic/pragmatic relations Greenfield & Smith documented. And speech changes as well. There are many kinds of points, but one appears in a new relation, a demand (not a request), and with it a new intonation contour. The same explanation applies to points with a falling intonation and to requests with a flat or rising intonation. A point is complete in the sense that it does not approximate an action but is an action in itself. A falling, terminal ("complete") intonation contour then echoes it (Cameron-Faulkner 2014, who also included eye-gaze). Joint completeness is the link. Requests have flat or rising intonation contours (Flax et al. 1991). A truncated reach naturally fits the pragmatic function of a request, and a flat or rising intonation contour, itself incomplete, echoes it. Such is expected from gesture-first – the vocal movement echoing the gesture in terms of a semantic-pragmatic relation: falling intonation echoing pointing as a demand, flat or rising intonation echoing incomplete action as a request. If one thinks of gestures and vocal movements as "merging", the complete action merges with a complete intonation contour to form an ideophone of demand, while the incomplete action of a request merges with an incomplete "flat" intonation contour in an ideophone of request. This is how a gesture-first language orchestrates speech; gesture and speech relate not as co-expressive but as entities in semantic relations. Co-expressiveness comes later, with Acquisition 2. Here then is Acquisition 1 shaping the speech-for-attachment of Acquisition 3.

It seems likely too that the demand quality of the point, once established, adds to the infant's understanding of the quality of a demand, by contrasting it to non-demand requests. If the truncated reach or some development of it continues with flat intonation, the infant would have ideophones carrying both semantic-pragmatic relations. Infants at 18 months with two kinds of ideophones may be on their way to the difference in pragmatic value that adults distinguish (a point with a falling intonation is almost imperious; one with a flat or rising contour indicates or creates a reference and is not a demand at all). And the two ideophones, orchestrating intonation contours of different qualities, promote what Jakobson at another point described as the child acquiring contrasts of sounds (not sounds as such). Contrasts could be vocal movements that echo gestures – bilabial "b"/dental "d" differences echo closing the full hand vs. closing a few fingers only, and short and long vowels echo gestures with or without cut-offs, the short vowels being the ones with some interrupting force (this is surely not universal, just one possibility: depending on the gesture which happens to be the start of the contrasting orchestrations, others then contrast to it, and this can vary).
As Jakobson said, contrasts are the thing, and this is separate from any fixed hierarchy. The several versions of vocal-tract closure in Clark's Table 5.1 suggest families of contrasts around possibly different versions of gestures.

Another interesting example could be either an ideophone or a quasi-ideophone orchestration. It is the 2-year-old's ability to distinguish words with long vowels (like "tea") from words with short vowels (like "tick") by adding a coda consonant to only (or chiefly) the short-vowel words (Miles et al. 2016). Two-year-olds often omit consonants from terminal positions but do so less with short vowels. This fits a convention of English phonology and could be another impact of Acquisition 1 on Acquisition 3. We do not have gesture information and must speculate as to how gesture-first ideophones could be involved, but we can suppose that the very shortness of short vowels is orchestrated by gestures that are in some way abbreviated – cut off, terminated with another gesture, or other possibilities – with meanings to correspond ("tick" ends abruptly, and an ideophone with this meaning could come from an abrupt gesture). The cut-offs open a space for the coda consonant, orchestrating the possibility of the coda too. If it is an ideophone, it includes a semantic relation in the echo or "merge" of the meaning of "tick", an abrupt sound, with a cut-off manual gesture as a kind of sound. This account also applies to quasi-ideophones; with them, it would be the vocal movement echoing the gesture movement itself, without meaning. I think the quasi-ideophone alternative is less insightful, as it doesn't include a role for the child's burgeoning vocabulary. Nonetheless, either mode predicts that the manual side is also shaped: an abrupt, cut-off movement echoes the short vowel with coda consonant, and the sides of the ideophone converge.

The third year "dark age" (extinction)
As the child enters his or her third year, Acquisition 1 continues its recapitulation of gesture-first, including now its extinction. At the same time Acquisition 2 is beginning. These changes affect the speech-for-attachment of Acquisition 3. The most important change is that, with the extinction of gesture-first, vocal movements cease echoing manual movements as ideophones. Speech is no longer "merged" in a semantic relation to a gesture but is synchronous and co-expressive with it as a gesture–speech unity. These changes have major effects on the child's ability, for one thing, to produce clusters of consonants. Smit (1993) summarizes children at 2;0 (well into gesture-first and entering the "dark age") reducing initial consonant clusters to stops, leaving out the "approximants" [w, j, r, l]. The initial [kw] of "queen", for example, comes out as "k". These reductions could be naturally formed out of ideophones.
The child has a gesture embodying her idea of a queen (say, an up-and-down circle around her head for a crown), but unless at the same time she can orchestrate the hand, say, to slide forward, the approximant [w] will be disposed of; this is at 2;0. Starting at 3 these reductions become rare, and the reason is that gesture-first has extinguished and with it the ideophones that led to them. This much is the "dark age". When self-responses to gestures come on with Acquisition 2, the child learns to produce the target "queen" and can do it because learning takes place unimpeded in a new framework of gesture–speech co-expressivity, without the ideophone's limits. Finally, the extinction of gesture-first and the beginning of gesture–speech unity dramatically change "merge". In gesture-first it had the function of connecting the ideophone's gestural and vocal movements. In the new regime it unpacks the GP (growth point) gesture–speech units that Acquisition 2 is bringing forth.
Acquisition 2 vocabulary growth

How does Acquisition 2 foster word learning? Growth point mimicry is surprisingly common among adults and is possible for young children once Acquisition 2 has begun. Importantly, it makes vocabulary growth possible. There must be many ways that children acquire words, but I wish to describe one that is specific to Acquisition 2. Infants mimic adult speech and facial expressions from 2 months, and Acquisition 2 gives them the ability to also mimic growth points. For adults, such mimicry shows every sign of being involuntary and unconscious, and for children it would be a natural continuation of mimicry with the new form of language that Acquisition 2 brings. The power of mimicry is that the mental process of one person orchestrates that of another person. Person 1 orchestrates person 2's speech by virtue of 2's mimicry of 1's gesture. The result is two persons sharing one growth point and one awareness – sharing gesture-orchestrated speech and gesture–speech unity awareness (gesture coders often quite unconsciously mimic the gestures and speech they are coding to achieve this insight). The figure shows the phenomenon with adults:
Embodiment in two bodies (from Furuyama 2000). Panel 1. Silent involuntary mimicry of gesture by learner (on left) synchronized with teacher’s speech. Panel 2. While speaking, learner (left) appropriates teacher’s gesture, nullifying usual prohibition of physical contact by strangers. As Furuyama observed, the teacher invited contact by turning his body away, giving the same left-right orientation to his gesture space as the learner’s. Computer art by Fey Parrill. Used with permission of University of Chicago Press.
Equipped with this kind of mimicry, a child's vocabulary could expand rapidly. Adult and child momentarily share growth points and word awareness, and this seems a powerful situation for learning. Huttenlocher et al. (1991) discovered a correlation between the amount and quality of speech that adults use when speaking to children and the children's own vocabulary growth. Mimicry explains this, since crucially it depends on the child being spoken to, hence ensuring a correlation of adult and child speech that will be positive and substantial. Mimicry by little children is rarely as focused as the adult example, however. Imagine a child well into Acquisition 2, with something like these abilities, who hears a word that interests her but which she does not know. She may attempt to repeat the word, mimic her speaker's gesture and get a growth point, but mimicry also can be via shared gaze, body orientation, topic – the key is congruence between child and adult. Yet, because it is so non-specific, it may not look like mimicry. The child fills the gap by fitting various things from the situation, as she comprehends it, into the growth point she mimics. This gives the new word a meaning, although errors are possible. Such is the vocabulary growth that Acquisition 2 makes possible.

Ella at 3 has a word-game with her father that includes mimicry of this kind (CHILDES/ Eng-UK/ Forrester/ 030018). She is playing with the word "Nutella" and her own name. While this is not word learning but word play, it is the kind of total-situation mimicry in which learning may take place. There is explicit mimicry by both FAT (464) and Ella (480). And it is related to learning – a modification of how to say "Nutella", to bring out the joke of "n::ut [nʌʌt] Ella":
453  CHI:  nut↑[nʌt] ella
454        (0.9)
455  CHI:  daddy ,
456  FAT:  yeah
457        (0.5)
458  CHI:  nut[nut]ella and nut[nʌt] ella (.)
459        ↑A::T would be silly ,
460  FAT:  it would wouldn't it
461        (0.2)
462  CHI:  the sound
463        (0.5)
464  FAT:  nut[nut]ella and nut[nʌt] ella
465        (0.6)
466  CHI:  yeah
…
480  CHI:  … n::ut [nʌʌt] ella

(spacing in 453, 458, 464 and 480 as in the original). Pronunciation: ʌ = the "u" in "strut"; u = the vowel in "goose".
References

Carter, Anne L. 1979. The disappearance schema: Case study of a second-year communication behavior. In E. Ochs & B. B. Schieffelin (eds.), Developmental Pragmatics, pp. 131-165. Academic Press.
Clark, Eve. 2016. First Language Acquisition. Cambridge University Press.
Elbers, Loekie & Ton, Josi. 1985. Play pen monologues: The interplay of words and babbles in the first words period. Journal of Child Language 12: 551-565.
Everett, Daniel L. 2017. How Language Began: The Story of Humanity's Greatest Invention. Norton.
Flax, J., Lahey, M., Harris, K. & Boothroyd, A. 1991. Relations between prosodic variables and communicative functions. Journal of Child Language 18(1): 3-19.
Fowler, C. A. & Rosenblum, L. D. 1990. Duplex perception: A comparison of monosyllables and slamming doors. Journal of Experimental Psychology: Human Perception and Performance 16(4): 742-754.
Furuyama, Nobuhiro. 2000. Gestural interaction between the instructor and the learner in origami instruction. In D. McNeill (ed.), Language and Gesture, pp. 99-117. Cambridge University Press.
Gentilucci, Maurizio & Dalla Volta, Riccardo. 2007. The motor system and the relationship between speech and gesture. Gesture 7: 159-177.
Greenfield, Patricia M. & Smith, Joshua H. 1976. The Structure of Communication in Early Language Development. Academic Press.
Haiman, John. 2018. Ideophones and the Evolution of Language. Cambridge University Press.
Huttenlocher, Janellen, Haight, Wendy, Bryk, Anthony, Seltzer, Michael & Lyons, Thomas. 1991. Early vocabulary growth: Relation to language input and gender. Developmental Psychology 27(2): 236-248.
Jakobson, Roman. 1968. Child Language, Aphasia, and Phonological Universals. Mouton (German version published in 1941).
Kendon, Adam. 1991. Some considerations for a theory of language origins. Man 26: 199-221.
Kita, Sotaro. 1993. Language and thought interface: A study of spontaneous gestures and Japanese mimetics. PhD dissertation, Departments of Linguistics and Psychology, The University of Chicago.
Leopold, Werner F. 1947. Speech Development of a Bilingual Child: A Linguist's Record. Vol. II: Sound-learning in the First Two Years. Northwestern University Press.
Locke, John L. & Pearson, Dawn M. 1990. Linguistic significance of babbling: Evidence from a tracheostomized infant. Journal of Child Language 17(1): 1-16.
Mampe, Birgit, Friederici, Angela D., Christophe, Anne & Wermke, Kathleen. 2009. Newborns' cry melody is shaped by their native language. Current Biology 19: 1-4.
Meltzoff, A. N. & Moore, M. K. 1983. Newborn infants imitate adult facial gestures. Child Development 54(3): 702-709.
Miles, Kelly, Yuen, Ivan, Cox, Felicity & Demuth, Katherine. 2016. The prosodic licensing of coda consonants in early speech: Interactions with vowel length. Journal of Child Language 43: 265-282.
Oller, D. Kimbrough, Wieman, Leslie A., Doyle, William J. & Ross, Carol. 1976. Infant babbling and speech. Journal of Child Language 3: 1-11.
Pika, Simone & Bugnyar, Thomas. 2011. The use of referential gestures in ravens (Corvus corax) in the wild. Nature Communications, 29 November 2011.
Smit, A. B. 1993. Phonologic error distributions in the Iowa-Nebraska Articulation Norms Project: Word-initial consonant clusters. Journal of Speech and Hearing Research 36(5): 931-947.
Tomasello, Michael. 1999. The Cultural Origins of Human Cognition. Harvard University Press.
Williams, H. & Nottebohm, F. 1985. Auditory responses in avian vocal motor neurons: A motor theory for song perception in birds. Science 229(4710): 279-282.
New points

Quasi-ideophones distinguished from ideophones, and both distinguished from gesture–speech unity. I came to appreciate ideophones as central to Acquisition 1 in discussions with John Haiman about his new book, cited above.
and cries. In phylogenesis, the gesture self-response was the force breaking vocalization free from primate stereotypy. What form does the self-response wedge take for children now at 2 months? They are not recapitulating the self-response itself (it does not emerge in development until much later) but could recapitulate something of the gesture-orchestrated speech it unleashed. The speech looks like a kind of ideophone – a “quasi-ideophone”, a vocalization that resembles something, in this case a movement (not yet a meaning). The speech-sound is an “image” of movements of head, torso, and arms and hands. Full-blown adult ideophones with meaning can orchestrate speech (Kita 1993) and this may be possible without meaning in the earliest infant vocalizations. The infant need not be in any way conscious of eliciting attachment but everything she does has this effect. The quasi-ideophone starts at 2 months roughly. The child has by then developed some voluntary control of hand and arm movements and with them orchestrates Cs and Vs. Quasi-ideophones appear in adult subjects in an experiment by Gentilucci and Dalla Volta (2007). In fact, the experiment is a good approximation of babbling. An action shapes the mouth as it utters a CV. Saying “ba” with a smaller mouth opening when picking up a small object than when picking up a large object is a quasi-ideophone where the vocal movement is an image of a manual action. Behind the quasi-ideophone is speech-for-attachment. This touches everything in the child’s development as a speaking human. In the origin of language, it broke vocalization loose from primate stereotypy. It now opens the way into the whole narration of the human mind. It brings not only continuity through successive gesture-first and gesture–speech unity recapitulations but is the bedrock on which they stand. . First speech-sounds
Clark cites perception research that by 2 months infants can discriminate “b” from “p”, “d” and “g” (pp. 65-66). We can assume the infants also control their own vocalizations for the speech-sounds they discriminate. Indeed, Leopold (1947), in his diaries, a vast record of his infant daughter’s speech, found that by 2 months all the sounds that Clark mentions and others are “frequent” (pp. 99-103). Before then his daughter “cooed” and cried (which likely she continued to do and added speech-sounds as well).1 How do quasi-ideophones apply to the first infant speech-sounds? It could work as follows. The stop consonants, as vocal movements, are images of – 1
A 20-month-old infant who had been tracheostomized and was aphonic from 5 months showed, after the tracheostomy tube was removed, extreme reduction of the consonants typical for her age (Lock and Pearson 1990). However, she also showed a preference for bilabial stops, the sounds that appear among the first speechsounds at 2 months. It could be the speech-for-attachment process attempting to restart itself after long delay (I have no information on the child’s subsequent speech development).
2
resemble – the hand and other movements with which they co-occur. I’ve put my own gesture-sound correlations in Table 1 to illustrate how vocalizations like “b”, “p”, “d” and “g” could be images of manual gestures for one speaker. I do not suggest that infants are miniature phoneticians, but they have access to the feelings of their movements producing speech-sounds much as adults feel that “b” and “p” differ as movements as well as sounds, and that the differences resemble the accompanying movements of their body, limbs and hands. If there is control over producing movements, the infant’s movements, seemingly random, could actually be recapitulations of the ancient breakthrough into speech. These co-occurrences may be labile, but the motions are simple and broad, and infants at 2 months could likely perform them. Table 1. Vocalization-movement illustrations Discriminated sounds Gestures (adult intuitions) “b” / “p”
Hand opens / Hand closes
“b” / “g”
Hand opens with move forward / Hand opens plus move backward
“b” / “d”
Hand opens with move forward / Hand opens further with move forward
Vocal images of hand motions Mouth opens with no delay of voicing / Mouth opens with delay Mouth opens with no delay of voicing / Back of mouth opens with immediate voicing Mouth opens with no delay of voicing / Mouth stays open with immediate voicing
Categorial perception, ‘tangential’ motor theory
Clark also describes “categorial perception”, once thought to be unique to speech but since shown to exist in many sensory domains and with a wide variety of animals and birds. For infants, this also appears at 2 months. Categorial perception is studied with sounds that are made to differ in micro steps of, say, delay of voicing onset, an acoustic dimension known to differ between “b” and “p”. As the delay increases, the child continues to hear “b” until it hits a threshold, then perception jumps to “p”. One hypothesis that applies to both speech-sound discrimination and categorial perception is the motor theory, which claims that we perceive by matching our own speech to the acoustic stimulus; we hear what we ourselves say (also called analysis-by-synthesis). If the gestural self-response is this synthesis, then the motor theory explains how production and perception relate. The difficulty, of course, is that we have to understand that the self-response is not yet
3
possible for infants this young; in fact, we have to say it is not yet possible for another 2 or 3 years. 2 Quasi-ideophones provide a tangential version of the motor theory however that might explain both categorical perception and speech-sounds discrimination at 2 months. We are not examining unlimited arrays of speech-sounds – just a few – and we adopt the idea that categorial perception is a general perceptual property. The speech signal triggers two gestures, one of the vocal tract, the other of the hands, torso, and head. If they don’t match, the hand and body gestures dominate. A “b” gesture occurs (hand-or arms-opening) and the mouth-opening oral movement matches. It continues to match but less perfectly as the delay of voicing grows larger, until a different hand gesture arises (hand-opening delayed) – it dominates vocal tract movements, and perception switches to a “p”. It is a kind of motor theory/quasi-ideophone/categorial perception that may be within reach of an infant’s capabilities at 2 months. I don’t have a nifty name for this process, and “tangential motor-theory” will have to do. Now observing infant’s speech-sounds and hand movements has a new interest (infant mimicry of adult facial expressions, if enlarged to include adult and infant speech and the infant’s hand and body movements, could be a method; cf. Meltzoff & Moore 1983). Why no speech-sounds at birth?
One last question: why does speech not appear immediately, at birth? Birthcries differ depending on the language of the mother (Mampe et al. 2009). Why does a résumé of primate stereotypy appear at all? I think the answer lies in three or possibly four places. First, the adult-infant system. The infant needs to hear adult speech and see it coming from another person (speech heard intrauterine probably envelops the fetus and does not have a locus). Also, to form quasiideophones the infant must be able to move both hands/arms and oral parts with some amount of control, and this too could delay the onset. Third, the selfresponse, while it triggers phylogenesis, is not accessible to the infant, only gesture-orchestrated speech in the form of quasi-ideophones is, and finding how to manage them may take time. And possibly fourth, if there is some condition of the 2
Wikipedia summarizes various criticisms of the motor theory, but none apply here. Some are entangled with the modular “speech-is-special” aspect, but the gesture–speech unity, gesture-orchestrated speech, self-response interpretation is non-modular and is in no way confounded by the entertaining phenomena mentioned – “duplex perception” with door slams contradicting the supposed difference between speech and non-speech perception (Fowler & Rosenblum1990); the McGurk Effect with bouncing ping-pong balls; and hearing acoustic differences within sound categories; other criticisms point to contextual information sources supposedly outside motor experience, but gesture-orchestrated speech is intrinsically contextual; and finally that infant motor control is insufficient to explain speech-sound discrimination, but the motor control that matters is the first gestureorchestrated speech, not the refined control of speech in fully a fully developed language. Only birds that hear songs in terms of vocal movements seem beyond it, since “manual” gestures could not be part of it (but gestures of a different kind could, Pika and Bugnyar 2011) (see Williams & Nottebohm 1985).
4
early human creature in which gesture-orchestrated speech evolved that matures in the infant today at around 2 months, speech could delay until then. Online guides for parents describe some relevant maturations at 2 months that could have prevailed at the origin of gesture-orchestrated speech, in hand movements (primatelike grip reflexes disappear, deliberate grips emerge) and vision (fixation and tracking of the adult face, especially eyes, and possibly the mouth when speaking), which the evolution of the self-response also would have required. This reason, if sustained, could be the basic explanation of the 2-months delay of speech-sounds. Infants, recapitulating the breakthrough, have to wait until they too meet these conditions. Limits to studies
A limitation of the studies in this area is not having correlated the infant’s vocalizations with movements. I tried to start my gesture-orchestrations with gestures and watched to see which of the 4 speech-sounds came out, but obviously my hand-opening and “ba” and the other movement-sound correlations arose from many sources and tell us nothing about the hand movement-speech correlations of infants. Observing infant motion and vocalization together, alone can hint at primordial recapitulations. One scholar I am aware of who has been interested in infant perception and movement together at very early ages is Colwyn Trevarthen at the University of Edinburgh, some of whose work I described in Chap 1. Clark’s table 5.1, to be discussed in the First Words section, records gestures with a child’s first words in the second year of life. One effect of this limit to studies is that, at several places in the following, I can hypothesize but cannot verify how gestures are involved in research findings. My statements should then be regarded as predictions which further studies could test.
Babbling Here are two clips of “Charlotte” at 10 months that figure in the discussion to follow:
5
ëdja
ai saitjëz
The quasi-ideophone develops along several dimensions and produces what I am saying goes on when an infant babbles – a merger of what the infant sees and hears, her play, her motor control and the continuation of quasi-ideophone vocal movements echoing manual-brachial ones without meaning, all under the blanket of fostering adult-infant attachment, the evolutionary imperative it meets. The first babbles are repetitions of CVs – like “ba-ba-ba-ba” etc., possibly a vocal image of the hands closing-opening repeatedly. Variations come later, say, ba-ga-ta-ba etc., an image possibly of the hands closing-opening with variations of position and shape. “Charlotte” to this way of thinking illustrates an advanced form of babbling speech-orchestration by manual/brachial movements, and indeed shows it quite strikingly. The first clip is her mouth full open and hand also full open and her arm lifted to full length overhead. She articulates an open “a”, a vocal-tract echo of the arm/hand. The second clip shows her hand as it moves from (her) left to right, opening and closing as it goes. Her mouth is partly open and we infer her tongue movements are creating the complex sequence of consonants and vowels that Kendon transcribes as “saitjëz” which the opening and closing hand could orchestrate. At the moment of the clip she is saying the closed “i” or “j”, the vocal tract echoing the then closed hand (clips used with the permission of Kendon, who also allows quoting his transcript, “Charlotte’s” mother, and the now 13-year-old “Charlotte” herself). As Kendon says, these are gestures in all but meaning. The thread of adult-infant attachment was also much elaborated, “Charlotte”, for those who have seen Kendon’s video, performs in a style worthy of Roman Senator, directs her gaze at her audience as in clip 1 or at her hand as if following it as in clip 2. This clip is a clue, too, to how “Charlotte” organized her babble dance. It was led by gesture; speech followed. She made her gestures and they led to vocal echoes, which were equally part of the spectacle. There is also synchrony of gesture and vocalization, as orchestration by quasi-ideophones implies. Do her utterances reveal discreteness and merging? Not 6
in the full sense, because rtheym lack meaning, but the discreteness and merging that do exist are unborn as it were, awaiting this animating spark of meaning.
First words Gesture-first, the next recapitulation at about the first birthday, delivers the spark. It brings meanings that animate the discretenesses and “merges” of babbling. There is continuity from babbling’s quasi-ideophones to gesture-first’s ideophones. What are these ideophone gestures like? A table in Clark suggests that they are often truncated actions – the hand reaching as if to pick up an object but not completing it. Kendon has written of these early symbols (1991). Thanks to gesture-first, the hand has taken on symbolic value beyond being just a movement. The gestures are pantomimes and can orchestrate speech sounds as did the earlier babble movements but now as images of meanings related to the actions the gesture pantomimes. No longer quasi-ideophones they are real ideophones (Haiman 2018). Like gesture-first symbols in general, they are icons of actions in the world. In Clark’s Table 5.1 (p. 113, based on Carter 1979), the infant’s truncated actions convey actions desired but not fulfilled; that is, actions in the world that are themselves truncated. Some examples (with comments on the gesture details that could have orchestrated the speech): Request help to obtain object. Gesture: reach to object. Speech: “m-” (Mommy?) and if the gesture included closure it could orchestrate bilabial closure. Request attention to object. Gesture: points, which Vygotsky and others linked to truncated grasps. Speech: “d-” (child’s name? there?) where a closed gesture could orchestrate a second type of vocal tract closure – the obtain-grasp and point-(possible)-grasp forming a contrast or “l-” (look?? with an alternate version vocal tract closure). Request transfer to self or to other. Gesture: reach to person. Speech: “phonetic variants of ‘David’, ‘Mommy’” Impossible to say if gesture orchestrates “David” but if there was no closure it could orchestrate non-stop bilabial “m-"; Get help removing object. Gesture: waving hands, slapping (seemingly incomplete versions of removing or knocking away object). Speech: “b-” (“bad”??), another bilabial closure stop that slap could orchestrate; the hand-wave pantomime of removal would not orchestrate this consonant but could orchestrate following vowels. Get help to change situation. Gesture: negative headshake. Speech: “nasalized glottal stop sequence” (headshake could directly shape this sound, repeated shakes a sequence of them, and this is causation, not an ideophone).
7
The table also suggests that when the child starts recapitulating gesture-first it rides on the discreteness and merging that babbling passed on. The result is that discreteness fixes words as entities and merging scaffolds them vis-à-vis the pantomime – sounds no longer in extensive babbled strings but isolated, “discrete”, and “merged”, not with other sounds, but with gestures. There need not be gesture-sound synchrony and I assume there was none (the table does not show timing). Truncated-action pantomimes continue to be dominant during the whole of the gesture-first recapitulation as other semantic relations enter, as described in Greenfield & Smith. This gives a new perspective on the vocabulary explosion at around 18 months and why it is greater for children whose vocabulary growth has been slow. In the analysis in Chap 1, I proposed it was because “scaffolding” by gesture drew words in, and children low in them filled their supply quickly; and now we can see that a need to complete ideophones would be the engine for this. Jakobson’s theory (new basis)
An important theoretical question concerns Roman Jakobson’s (1968, originally 1941) theory that babbling and the first speech-contrasts are developmentally distinct, even with a period of silence at the transition (p. 22): As all observers acknowledge with great surprise, the child then loses nearly all his ability to produce sounds in a passing over from the pre-language stage to the first acquisition of words, i.e., to the first genuine stage of langage … It is easy to understand how those articulations which are lacking in the language of the child’s environment easily disappear from his inventory. But it is striking that in addition, many other sounds which are common both to both the child’s babbling and to the adult language are in the same way disposed of, in spite of the environmental model which he depends on.
Clark summarizes the critique of this theory (p. 111) that modern researchers have provided, which emphasizes the continuity of speech-sounds running from babbling into first words (as put by Oller et al. 1976, “strong similarities of sound sequences in babbling and first words and absences in first words of sounds not in babbling”). There is also a continuity of quasi-ideophones to ideophones. Paradoxically, this continuity explains Jakobson’s observation, although in a way he did not envision (whether it was frequent is another matter). At some point, the sources Jakobson drew on had observed something like sounds common to both babbling and adult language being disposed of in the child’s first words, and with quasi-ideophone-to-ideophone continuity we can see how this might have happened. Despite that continuity, there is an important difference. Quasi-ideophone babbling is free, expansive, liberated by echoing the infant’s newfound motor control, a source of fascination and play (as we gather for
“Charlotte”). The ideophones that gesture-first creates, on the other hand, carry meanings dependent on the child’s own practical actions, truncated into pantomimes. This opens the possibility that different actions appear in babbling and in the first words. Then the child could lose the babble sounds even though they are part of her linguistic environment. An opening-closing hand might have orchestrated “Charlotte’s” babbled “i” but not have been part of any practical action she would perform, and so not part of her pantomime-orchestration of the first word-sounds. If there were no other quasi-ideophones for the sound (which we don’t know) and no pragmatic actions that could have been pantomimed for it, her first words could have disposed of it.3 It is unlikely the continuity papers have considered this possibility. Further, a silent period between babbling and the first words also could appear, though not always, even if the quasi-ideophone-to-ideophone change is seamless. If the advent of gesture-first is slowed for some reason, then quasi-ideophones could cease, a pause intrude, and the first words be delayed. For other children, the processes could run on without a break, and both outcomes reflect the same basic developmental phenomenon. The effect of disposed-of speech-sounds, however, is not likely to last long, and a fleeting existence could be one reason it has not been observed in modern studies. It is certainly true, as Elbers & Ton (1985) write, that “…new words may influence the character and the course of babbling, whereas babbling in turn may give rise to phonological preferences for selecting other new words” (p. 551), and even if “Charlotte” had disposed of “i” she soon would have discovered how to orchestrate it with an ideophone from other sources. The analysis adopts the critique that the papers cited by Clark have put forth, going back at least to 1976, while explaining how first words still might dispose of some babbling sounds and how even a silent period could come to be. It has the advantage of combining views that have been perceived as opposed. And it shows how a version of Jakobson’s theory is displayed in Clark’s Table 5.1. Long babble strings like “Charlotte’s” “ëdja ëdet dit ai tjë-e-h isë atje si iai kh’ ai saitjëz bop bop”,4 are reduced to single-sound first words once meaning is part of the orchestration. Ideophones from pantomimes explain this. I agree that restoring Jakobson’s idea is a long shot, but before removing it from all consideration it needs to be judged with reference to action.
3 This is pure speculation: I am trying to make clear the explanation of how sounds could be disposed of and I have no information on “Charlotte’s” first word-sounds and gestures.
4 Transcription by Adam Kendon, quoted with permission.
Acquisition 1 rides on and shapes Acquisition 3, extinguishes

The second year (shaping)
Whatever the final disposition of Jakobson’s theory, gestures change during the second year as they come to carry the new semantic/pragmatic relations Greenfield & Smith documented. And speech changes as well. There are many kinds of points, but one appears in a new relation, a demand (not a request), and with it a new intonation contour. The same kind of explanation applies to points with a falling intonation and to requests with a flat or rising intonation. A point is complete in the sense that it does not approximate an action but is an action in itself. A falling, terminal (“complete”) intonation contour then echoes it (Cameron-Faulkner 2014, who also included eye-gaze). Joint completeness is the link. Requests have flat or rising intonation contours (Flax et al. 1991). A truncated reach naturally fits the pragmatic function of a request, and a flat or rising intonation contour, itself incomplete, echoes it. Such is expected from gesture-first – the vocal movement echoing the gesture in terms of a semantic-pragmatic relation: falling intonation echoing pointing as a demand, a flat or rising intonation echoing incomplete action as a request.

If one thinks of gestures and vocal movements as “merging”, the complete action merges with a complete intonation contour to form an ideophone of demand, while the incomplete action of a request merges with an incomplete “flat” intonation contour in an ideophone of request. This is how a gesture-first language orchestrates speech; gesture and speech relate not as co-expressive but as entities in semantic relations. Co-expressiveness comes later, with Acquisition 2. Here then is Acquisition 1 shaping the speech-for-attachment of Acquisition 3.

It seems likely too that the demand quality of the point, once established, adds to the infant’s understanding of the quality of a demand, by contrasting it to non-demand requests. If the truncated reach or some development of it continues with flat intonation, the infant would have ideophones carrying both semantic-pragmatic relations. Infants at 18 months with two kinds of ideophones may be on their way to the difference in pragmatic value that adults distinguish (a point with a falling intonation is almost imperious; one with a flat or rising contour indicates or creates a reference and is not a demand at all). And the two ideophones, orchestrating intonation contours of different qualities, promote what Jakobson at another point described as the child acquiring contrasts of sounds (not sounds as such). Contrasts could be vocal movements that echo gestures – bilabial “b”/dental “d” differences echo closing the full hand vs. closing a few fingers only, and short and long vowels echo gestures with or without cut-offs, the short vowels being the ones with some interrupting force (this is surely not universal, just one possibility: depending on the gesture
which happens to be the start of the contrasting orchestrations, others then contrast to it, and this can vary). As Jakobson said, contrasts are the thing, and this is separate from a fixed hierarchy, if any. The several versions of vocal-tract closure in Clark’s Table 5.1 suggest families of contrasts around possibly different versions of gestures.

Another interesting example could be either an ideophone or a quasi-ideophone orchestration. It is the 2-year-old’s ability to distinguish words with long vowels (like “tea”) from words with short vowels (like “tick”) by adding a coda consonant to only (or chiefly) the short-vowel words (Miles et al. 2016). Two-year-olds often omit consonants from terminal positions but do so less with short vowels. This fits a convention of English phonology and could be another impact of Acquisition 1 on Acquisition 3. We do not have gesture information and must speculate as to how gesture-first ideophones could be involved, but we can suppose that the very shortness of short vowels is orchestrated by gestures that are in some way abbreviated – cut off, terminated with another gesture, or other possibilities, with meanings to correspond (“tick” ends abruptly, and an ideophone with this meaning could come from an abrupt gesture). The cut-offs open a space for the coda consonant, orchestrating the possibility of the coda too. If it was an ideophone, it includes a semantic relation in the echo or “merge” of the meaning of “tick”, meaning an abrupt sound, with a cut-off manual gesture as a kind of sound. This account also applies to quasi-ideophones. With them, it would be the vocal movement echoing the gesture movement itself, without meaning. I think the quasi-ideophone alternative is less insightful, as it doesn’t include a role for the child’s burgeoning vocabulary. Nonetheless, either mode predicts that the manual side is also shaped: an abrupt, cut-off movement echoes the short vowel with coda consonant, and the sides of the ideophone converge.

The third year “dark age” (extinction)
As the child enters his/her third year, Acquisition 1 continues its recapitulation of gesture-first, including now its extinction. At the same time, Acquisition 2 is beginning. These changes affect the speech-for-attachment in Acquisition 3. The most important change is that with the extinction of gesture-first, vocal movements cease echoing manual movements as ideophones. Speech is no longer “merged” in a semantic relation to a gesture but is synchronous and co-expressive with it as a gesture–speech unity. These changes have major effects on the child’s ability, for one thing, to produce clusters of consonants. Smit (1993) summarizes children at 2;0 (well into gesture-first and entering the “dark age”) reducing initial consonant clusters to stops, leaving out “approximants” – [w, j, r, l]. The initial [kw] of “queen”, for example, comes out as “k”. These reductions could be naturally formed out of
ideophones. The child has a gesture embodying her idea of a queen (say, an up-and-down circle around her head for a crown), but unless at the same time she can orchestrate the hand, say, to slide forward, the approximant [w] will be disposed of; this is at 2;0. Starting at 3, these reductions become rare, and the reason is that gesture-first has extinguished and with it the ideophones that led to them. This much is the “dark age”. When self-responses to gestures come on with Acquisition 2, the child learns to produce the target “queen” and can do it because learning takes place unimpeded in a new framework of gesture–speech co-expressivity, without the ideophone’s limits.

Finally, the extinction of gesture-first and the beginning of gesture–speech unity dramatically change “merge”. In gesture-first it had the function of connecting the ideophone’s gestural and vocal movements. In the new regime it unpacks the GP gesture–speech units that Acquisition 2 is bringing forth.
Acquisition 2 vocabulary growth

How does Acquisition 2 foster word learning? Growth point mimicry is surprisingly common among adults and is possible for young children once Acquisition 2 has begun. Importantly, it makes vocabulary growth possible. There must be many ways that children acquire words, but I wish to describe one that is specific to Acquisition 2. Infants mimic adult speech and facial expressions from 2 months, and Acquisition 2 gives them the ability to also mimic growth points. For adults, such mimicry shows every sign of being involuntary and unconscious, and for children it would be a natural continuation of mimicry with the new form of language that Acquisition 2 brings. The power of mimicry is that the mental process of one person orchestrates that of another person. Person 1 orchestrates person 2’s speech by virtue of 2’s mimicry of 1’s gesture. The result is two persons sharing one growth point and one awareness – gesture-orchestrated speech and gesture–speech unity awareness (gesture coders often quite unconsciously mimic the gestures and speech they are coding to achieve this insight). The figure shows the phenomenon with adults:
Embodiment in two bodies (from Furuyama 2000). Panel 1. Silent involuntary mimicry of gesture by learner (on left) synchronized with teacher’s speech. Panel 2. While speaking, learner (left) appropriates teacher’s gesture, nullifying usual prohibition of physical contact by strangers. As Furuyama observed, the teacher invited contact by turning his body away, giving the same left-right orientation to his gesture space as the learner’s. Computer art by Fey Parrill. Used with permission of University of Chicago Press.
Equipped with this kind of mimicry, a child’s vocabulary could expand rapidly. Adult and child momentarily share growth points and word awareness, and this seems a powerful situation for learning. Huttenlocher et al. (1991) discovered a correlation between the amount and quality of speech that adults use in speaking to children and the children’s own vocabulary growth. Mimicry explains this, since it crucially depends on the child being spoken to, which ensures a positive and substantial correlation between adult and child speech. Mimicry by little children is rarely as focused as the adult example, however. Imagine a child well into Acquisition 2, with something like these abilities, who hears a word that interests her but which she does not know. She may attempt to repeat the word, mimic her speaker’s gesture and get a growth point, but mimicry also can be via shared gaze, body orientation, topic – the key is congruence between child and adult. Yet, because it is so non-specific, it may not look like mimicry. The child fills the gap by fitting various things from the situation, as she comprehends it, into the growth point she mimics. This gives the new word a meaning, although errors are possible. Such is the vocabulary growth that Acquisition 2 makes possible.

Ella at 3 has a word-game with her father that includes mimicry of this kind (CHILDES/Eng-UK/Forrester/030018). She is playing with the word “Nutella” and her own name. While this is not word learning but word play, it is the kind of total-situation mimicry in which learning may take place. There is explicit mimicry by both FAT (464) and Ella (480). And it is related to learning – a modification of how to say “Nutella”, to bring out the joke of “n::ut [nʌʌt] Ella”:
453  CHI:  nut↑[nʌt] ella
454         (0.9)
455  CHI:  daddy ,
456  FAT:  yeah
457         (0.5)
458  CHI:  nut[nut]ella and nut[nʌt] ella (.) ↑A::T
459         would be silly ,
460  FAT:  it would wouldn't it
461         (0.2)
462  CHI:  the sound
463         (0.5)
464  FAT:  nut[nut]ella and nut[nʌt] ella
465         (0.6)
466  CHI:  yeah
…
480  CHI:  … n::ut [nʌʌt] ella

(spacing in 453, 458, 464 and 480 as in original). Pronunciation: ʌ = the “u” in “Strut”; u = the vowel in “goose”.
References

Carter, Anne L. 1979. The disappearance schema: Case study of a second-year communication behavior. In E. Ochs & B. B. Schieffelin (eds.), Developmental Pragmatics, pp. 131-165. Academic Press.
Clark, Eve. 2016. First Language Acquisition. Cambridge University Press.
Elbers, Loekie & Ton, Josi. 1985. Play pen monologues: The interplay of words and babbles in the first words period. Journal of Child Language 12: 551-565.
Everett, Daniel L. 2017. How Language Began: The Story of Humanity's Greatest Invention. Norton.
Flax, J., Lahey, M., Harris, K. & Boothroyd, A. 1991. Relations between prosodic variables and communicative functions. Journal of Child Language 18(1): 3-19.
Fowler, C. A. & Rosenblum, L. D. 1990. Duplex perception: A comparison of monosyllables and slamming doors. Journal of Experimental Psychology: Human Perception and Performance 16(4): 742-754.
Furuyama, Nobuhiro. 2000. Gestural interaction between the instructor and the learner in origami instruction. In D. McNeill (ed.), Language and Gesture, pp. 99-117. Cambridge University Press.
Gentilucci, Maurizio & Dalla Volta, Riccardo. 2007. The motor system and the relationship between speech and gesture. Gesture 7: 159-177.
Greenfield, Patricia M. & Smith, Joshua H. 1976. The Structure of Communication in Early Language Development. Academic Press.
Haiman, John. 2018. Ideophones and the Evolution of Language. Cambridge University Press.
Huttenlocher, Janellen, Haight, Wendy, Bryk, Anthony, Seltzer, Michael & Lyons, Thomas. 1991. Early vocabulary growth: Relation to language input and gender. Developmental Psychology 27(2): 236-248.
Jakobson, Roman. 1968. Child Language, Aphasia, and Phonological Universals. Mouton (German version published in 1941).
Kendon, Adam. 1991. Some considerations for a theory of language origins. Man 26: 199-221.
Kita, S. 1993. Language and thought interface: A study of spontaneous gestures and Japanese mimetics. PhD dissertation, Departments of Linguistics and Psychology, The University of Chicago, Regenstein Library.
Leopold, Werner F. 1947. Speech Development of a Bilingual Child: A Linguist's Record. Vol. II: Sound-Learning in the First Two Years. Northwestern University Press.
Locke, John L. & Pearson, Dawn M. 1990. Linguistic significance of babbling: Evidence from a tracheostomized infant. Journal of Child Language 17(1): 1-16.
Mampe, Birgit, Friederici, Angela D., Christophe, Anne & Wermke, Kathleen. 2009. Newborns' cry melody is shaped by their native language. Current Biology 19: 1-4.
Meltzoff, A. N. & Moore, M. K. 1983. Newborn infants imitate adult facial gestures. Child Development 54(3): 702-709.
Miles, Kelly, Yuen, Ivan, Cox, Felicity & Demuth, Katherine. 2016. The prosodic licensing of coda consonants in early speech: Interactions with vowel length. Journal of Child Language 43: 265-282.
Oller, D. Kimbrough, Wieman, Leslie A., Doyle, William J. & Ross, Carol. 1976. Infant babbling and speech. Journal of Child Language 3: 1-11.
Pika, Simone & Bugnyar, Thomas. 2011. The use of referential gestures in ravens (Corvus corax) in the wild. Nature Communications, 29 November 2011.
Smit, A. B. 1993. Phonologic error distributions in the Iowa-Nebraska Articulation Norms Project: Word-initial consonant clusters. Journal of Speech and Hearing Research 36(5): 931-947.
Tomasello, Michael. 1999. The Cultural Origins of Human Cognition. Harvard University Press.
Williams, H. & Nottebohm, F. 1985. Auditory responses in avian vocal motor neurons: A motor theory for song perception in birds. Science 229(4710): 279-282.
New points

Quasi-ideophones distinguished from ideophones, and both distinguished from gesture–speech unity. I came to appreciate ideophones as central to Acquisition 1 in discussions with John Haiman about his new book, cited above.