CHRIST'S COLLEGE CAMBRIDGE EMBODIED LANGUAGE II

Gesture–Speech Unity or Embodiment — Its Origin in Human Evolution¹
David McNeill
University of Chicago

Contents
Abstract
1. Why we gesture
2. Language origin
2.1 Gesture-first
2.1.1 Models of supplanting and scaffolding
2.1.2 Problems with pantomime
2.1.3 Two current gesture-first theories
2.1.3.1 The mirror system hypothesis
2.1.3.2 The recursion hypothesis
2.1.4 End of gesture-first
2.2 Mead's Loop
2.2.1 Answering puzzles
Comments by Renia Lopez-Ozieblo
Responses by David McNeill
2.2.2 How Mead's Loop engendered Growth Point properties
3 Last word on the origin
3. Timeline
References

Abstract

Minimal packages of language embodiment have been called growth points (GPs). In a GP, gesture and speech are inherent and equal parts. Out of a GP comes speech orchestrated around a gesture. Can theories of language origin explain this dynamic process? A popular theory, gesture-first, cannot; in fact, it fails twice, predicting what did not evolve (that gesture was marginalized when speech emerged) and not predicting what did evolve (that there is gesture–speech unity). A new theory, called Mead's Loop, is proposed that meets the test. Mead's Loop agrees that gesture was indispensable to the origin of language but holds that gesture was not first, that no gesture-first system could have led to language, and that to reach language, gesture and speech had to be “equiprimordial.”

Keywords: gesture, dialectic, dynamic dimension, psychological predicates, language, thought and action, cognitive being, origin of language

¹ Based largely on Chapters 2, 3, 5 and 6 of McNeill, D. How Language Began: Gesture and Speech in Human Evolution, Cambridge, 2012. Prepared for Embodied Language II, Christ’s College, Cambridge, 2–4 Sept., 2013.

1. Why we gesture

Why do we gesture? Many would say that gesture brings emphasis, energy and ornamentation to speech (which is assumed to be the core of what is taking place); in short, gesture is an “add-on.” However, the evidence is against this.² The reasons we gesture are more profound. Language is inseparable from gesture. While gestures enhance the material carriers of meaning, the core is gesture and speech together. They are bound more tightly than calling gesture an “add-on” or “ornament” implies: they are united as a matter of thought itself. Even if, for some reason, the hands are restrained and a gesture is not externalized, the imagery it embodies can still be present, hidden but integrated with speech (it may surface in some other part of the body, the feet for example). But the ultimate reason we gesture is that the origin of language was the origin of gesture and language jointly; the two cannot be separated.

I can do no better than start with Humboldt’s distinction between Ergon and Energeia: “‘An important distinction…kept reemerging’; this was Humboldt’s distinction between language as Ergon—language viewed as structure—and as Energeia—language as an ‘embodied moment of meaning located both in the organism and in the medium that the organism uses for expression.’ The latter is language at the moment of its use, ‘alive, in an actor’.” (Elena Levy, quoting Glick 1983, on Heinz Werner).

Saussure (1959, in lectures around 1910) crystallized Ergon into the synchronic approach, wherein language is considered a static entity, its components all seen panoramically at one theoretical instant. In language, Saussure said, there are only differences, and to see them the synchronic view is essential. This approach is fundamental to most current-day linguistics, an academic field founded upon it, and a century of insights attests to its intellectual vigor. But there is also Energeia. “Why do we gesture?” is a question in this domain, the dynamics of language. As it intersects linguistic form, gesture brings language to life; without it, language would stand inert, a crystalline object. Humboldt’s Ergon and Energeia are conceptualized here as dimensions of language that cross in the growth point, described below. The dimensions have classically been called “linguistic” (Ergon) and “psychological” (Energeia), but better, if less colorful (and less proprietary), terms are static and dynamic. Some phenomena are more accessible or prominent on one dimension, others on the other, but the dynamic and static dimensions cannot be isolated; they intersect and interact. The dimensions are structured on different principles and draw on different methods of description and analysis (the field of linguistics, as mentioned, specializing in the static). In a trenchant remark concerning the “poetic function” of language, Roman Jakobson concisely explained the two principal axes of the static dimension. He wrote that “[t]he poetic function projects the principle of equivalence from the axis of selection into the axis of combination” (Jakobson 1960, p. 358). By the poetic function he means a process whereby sequences come to have contrastive values. His “axis of selection” is the paradigmatic axis: the contrasts established when a linguistic form is selected from a set of equivalents (“sheep” and “mutton” are equivalents that contrast on an axis of selection in English). The “axis of combination” is the syntagmatic axis: by combining words, new linguistic units and values appear (combining “hit” and “ball” into “hit the ball” generates a verb phrase, a unit at a higher level, and a direct object, a value “ball” does not have outside the combination).

The dynamic dimension, cross-cutting the static, could be termed the “activity” of language, but I call it “inhabiting” language. Inhabiting is done with one’s being, thought and action, and becomes part of the speaker’s cognitive being at the moment of speech (Merleau-Ponty 1962). It is more than just making language your own. Inhabitance is all-encompassing, organizing thought and action, effecting goals and fulfilling presuppositions. The “inhabiting” terminology has the advantage of alluding both to langue, Saussure’s static system revealed synchronically, and to whatever animates it, the dynamic dimension. Besides Merleau-Ponty, a historical figure associated with language regarded dynamically is Vygotsky (1987). On the dynamic dimension, units come and go, emerge and disperse in real time. It crosses at 90 degrees the abstractions from time and the unmoving totality of synchronic langue.

² Kendon (2008), who also argues against the view.

2. Language origin

How did this dynamic system emerge in human evolution? Even though we are referring to events that took place hundreds of thousands of years ago, origin theories can be empirically tested. We ask: does a theory explain how the dialectic of gesture–speech unity evolved? Can it explain gesture–speech synchrony? If it cannot, that is one kind of falsification. Does it also positively predict that these things could not evolve? That is another kind. We shall see that a popular theory, the mirror neuron hypothesis of gesture-first, is such a theory, failing in both senses.

2.1 Gesture-first

This theory, recurring in many recent books and articles, says that the first steps toward language phylogenetically were not speech, nor speech with gesture, but gestures alone. Vocalizations in non-human primates, the presumed precursors of speech without gesture’s assistance, are too rigid and restricted in their functions to offer a plausible starting point for language, but primate gestures appear to offer the desired flexibility. Thus, the argument goes, gesture could have been the linguistic launching pad (speech evolving later). The gestures in this theory are regarded as the mimicry of real actions, a kind of pantomime, hence the appeal of mirror neurons as the mechanism. Present-day chimpanzees show this kind of action mimicry. However, gesture-first could not have produced the gesture–speech unity that we find in ourselves. It “predicts” what did not evolve (that gesture withered or was marginalized when speech arose) and does not predict what did evolve (that gesture is an integral part of speaking). That so many have adopted it I explain by folk (and fabricated) beliefs about gestures that do not stand up to scrutiny when actual speech and gesture are examined. The contradiction of gesture-first is that, by its own account, speech first supplants gesture, yet speech ends up integrated with gesture. Why does gesture-first say gesture must have withered when speech emerged? The logic of gesture-first, at its very core, makes the supplantation of gesture by speech, overt or hidden, inescapable. This is why every advocate automatically posits it. Empirically, there is a perfect correlation between advocating gesture-first and positing the supplantation step (Table 1). Moreover, a conceptual point explains this correlation: supplantation is built into the gesture-first theory. It is important to see that gesture-first is a theory about the origin of speech (not gesture). Given that aim, it must logically consider that from gesture one gets to speech, and here supplantation enters: it is logically unavoidable. I do not deny that gesture-first may once have existed, but if it did, it could not have led to human language. It would have created pantomime, which does not synchronize with speech, and then died out or branched off into a dead end. In fact, in children’s language development we see that it possibly did once exist but reached a dead end. The earliest stages of language development look much as gesture-first would expect, but then nothing comes of them; having no effects, gesture-first disappears from the child’s development. The imagery–language dialectic at the heart of our language then emerges separately and later.

Table 1. Gesture-first advocates and supplantation of gesture by speech.

Source: Henry Sweet (and presumably Henry Higgins).
Statement: “Gesture … helped to develop the power of forming sounds while at the same time helping to lay the foundation of language proper. When men first expressed the idea of ‘teeth’, ‘eat’, ‘bite’, it was by pointing to their teeth. If the interlocutor’s back was turned, a cry for attention was necessary which would naturally assume the form of the clearest and most open vowel. A sympathetic lingual gesture would then accompany the hand gesture which later would be dropped as superfluous so that ADA or more emphatically ATA would mean ‘teeth’ or ‘tooth’ and ‘bite’ or ‘eat’, these different meanings being only gradually differentiated” (Henderson 1971, pp. 3–4). (Thanks to Bencie Woll for bringing this passage to my attention.)

Source: Rizzolatti and Arbib.
Statement: “Manual gestures progressively lost their importance, whereas, by contrast, vocalization acquired autonomy, until the relation between gestural and vocal communication inverted and gesture became purely an accessory factor to sound communication” (p. 193).

Source: Stefanini et al. (referring to Gentilucci and others).
Statement: “the primitive mechanism that might have been used to transfer a primitive arm gesture communicative system from the arm to the mouth...” (p. 218).

Source: Tomasello, thinking in terms of primates and very young (one-year and less) human infants but with the suggestion that something similar took place in phylogenesis.
Statement: “Infants’ iconic gestures emerge on the heels of their first pointing … they are quickly replaced by conventional language … because both iconic gestures and linguistic conventions represent symbolic ways of indicating referents” (2008, p. 323).

2.1.1 Models of supplanting and scaffolding

Contemporary coded gesture systems, such as the Warlpiri sign language, or ASL signs produced with speech by hearing sign–speech bilinguals, do not combine with speech: speech and sign either repel each other in time, breaking synchrony, or are simultaneous but not co-expressive. Either way, language-like coded gestures, being semiotically like spoken language, do not form gesture–speech unities. The principle is that juxtaposing two codes does not form a dialectic of semiotic opposites; the codes are semiotically similar, not opposite, so an imagery–language dialectic cannot form. This constrained evolution: language could not have progressed from a gesture-first language, and required something else that enabled a combination of opposites. Below, I argue that “Mead's Loop” provided it. The following models demonstrate inherent gesture-first limitations.

a) Warlpiri sign language. Women use the Warlpiri sign language of Aboriginal Australia when they are under (apparently quite frequent) speech bans and also, casually, when speech is not prohibited. When the latter happens, signs and speech co-occur, letting us see what may have occurred at the hypothetical gesture–speech or sign–speech crossover. Here is one example from Kendon (1988):

The spacing is meant to show relative durations, not that signs and speech were performed with temporal gaps (they were performed continuously). Speech and sign start out together at the beginning of each phrase but, since signing is slower, they immediately fall out of step. Each is on a track of its own, and they do not unify. Speech does not slow down to keep pace with gesture, as would be expected if speech and gesture were unified (mutual speech–gesture slowing occurs with gesticulations, as described in McNeill et al. 2008). They then reset (there is one reset in the example) and immediately begin to separate again. So, according to this model, co-expressive speech–gesture synchrony would be systematically interrupted at the crossover point of gesture and speech codes. Yet synchrony of co-expressive speech and gesture is what evolved.

b) English–ASL bilinguals. The second model is Emmorey et al.’s (2005) observation of the pairings of signs and speech by hearing ASL/English bilinguals. While 94% of such pairings are signs and words translating each other, 6% are not mutual translations. In the latter, sign and speech collaborate to form sentences, half in speech, half in sign. For example, a bilingual says, “all of a sudden [LOOKS-AT-ME]” (from a Sylvester and Tweety narration; capitals signify signs simultaneous with speech). This could be “scaffolding,” but it does not create the combinations of unlike semiotic modes at co-expressive points that we are looking for. First, signs and words are of the same semiotic type: segmented, analytic, repeatable, listable, and so on. Second, there is no global-synthetic component, no built-in merging of analytic/combinatoric forms with gesture’s global synthesis, and the spoken and gestured elements are not co-expressive but are different constituents of a sentence. Of course, ASL/English bilinguals have the ability to form GP-style cognitive units. But if we imagine a transitional species evolving this ability, the bilingual ASL/English model suggests that scaffolding did not lead to GP-style cognition; on the contrary, it implies two analytic/combinatoric codes dividing the work. If we surmise that an old pantomime/sign system did scaffold speech and then withered away, we are left unable to explain how gesticulation emerged and became engaged with speech. We conclude that scaffolding, even if it occurred, would not have led to current-day speech–gesticulation linkages.

Corballis, in his 2002 argument for speech supplanting a gesture-first system of communication, points out the advantages of speech over gesture. There is the ability to communicate while manipulating objects and to communicate in the dark. Less obviously, speech reduces demands on attention, since interlocutors do not have to look at one another (p. 191). While valid, these advantages are not needed to explain why speech became the linguistic medium. There are also positive reasons for gestures not being language-like, and these reasons would hold even if gesture and speech co-evolved as a single adaptation. All across the world, languages are spoken/auditory unless there is some interference with the channel (deafness, acoustic incompatibility, religious practice, etc.), and no culture has a visual/gestural primary language. Susan Goldin-Meadow, Jenny Singleton and I (1996) once proposed that gesture is the non-linguistic side of the gesture–speech dual semiotic because it is better than speech for imagery: gesture has multiple dimensions on which to vary, while speech has only the one dimension of time. Given this asymmetry, even if speech and gesture were jointly selected, as proposed, speech would turn out to be the medium of linguistic segmentation.

2.1.2 Problems with pantomime

A central reason why gesture-first cannot lead to gesture–speech unity is that its gestures would be pantomimes. Gesture-first portrays the initial communicative actions as symbolic replications of actions of self, others and entities, and holds that it was these replications that scaffolded speech. The process appeals because it clearly taps the (straight) mirror neuron response. Arbib (2012) gives it a central role. Donald (1991) likewise posited mimesis as an early stage in the evolution of human intelligence. It is conceivable that an apelike brain is capable of pantomime and that pantomime was already in place in the last common ancestor of chimpanzees and humans, some 8 million years ago. Contemporary bonobos are capable of it, supporting this idea (Pollick 2006). The problem is not a lack of pantomime precursors but that pantomime repels speech. The distinguishing mark of gesticulation compared to pantomime is that gesticulation is integrated with speech; it is an aspect of speaking itself. In pantomime this does not occur. There is no co-construction with speech, no co-expressiveness and no dual semiotic modes, and the timing is different (if there is speech at all). Pantomime, if it relates to speaking at all, does so, as Susan Duncan points out, as a “gap filler,” appearing where speech does not, for example completing a sentence (“the parents were OK but the kids were [pantomime of knocking things over]”). Movement by itself offers no clue to whether a gesture is “gesticulation” or “pantomime”; what matters is whether two modes of semiosis combine to co-express one idea unit simultaneously. Pantomime does not; it lacks this dual semiosis.

2.1.3 Two current gesture-first theories

Table 1 is a roster of gesture-first advocates, going back to Henry Sweet (said to be Shaw’s model for Henry Higgins in Pygmalion). Note how all of them at some point say that speech supplants the original gesture language and that gesture is then marginalized. Gesture–speech unity is permanently blocked. This logical position is inescapable and is the undoing of every gesture-first theory, including two recent ones.

2.1.3.1 The mirror system hypothesis

Michael Arbib (2005, 2012), in his gesture-first theory, envisions an “‘expanding spiral’ of increasingly sophisticated protosign and protospeech,” a spiral moving from gesture-first to speech with pantomime as a central bridge. He writes (2012, p. 229), “… the path to language went through protosign, rather than building speech directly from vocalizations. It shows how praxic hand movements could have evolved from the communicative gestures of apes and then, along the hominid line, via pantomime to protosign.” The Warlpiri sign/speech and bilingual ASL/English situations above, however, suggest that pantomime or any kind of sign language alone could never have evolved into spoken language. The spiral model’s gradual change may seem unlike the “crossovers” modeled above, but the models still apply. With each turn of the spiral the two codes meet, and this meeting is the mechanism of the spiral. A pantomime (or sign) spins off a bit more of itself into speech; speech then becomes, to this extent, another code. Far from shaping gesture or being shaped by it, speech as a code repels coded gesture and/or divides the labor between itself and its former gesture master; these are the demonstrable effects of two codes pitted against each other in the sign language models above, and they still exist in the expanding spiral.
The spiral, however, might work by exchange: as gesture and speech spiral, bits of co-expressive gesture and speech trade places. A gesture (a bit of sign language up till now) gives structure to a bit of speech (a vocal gesture till now), and the speech (gesture till now) also gives a bit of global-synthetic semiosis to gesture; so they remain united.³ But note what this does. It supplants gesture by speech, yes; but it also supplants speech by gesture. Each bit of gesture, formerly a bit of language, ceases to be language and becomes a bit of global-synthetic gesture. The global-synthetic gesture bits accumulate. The outcome is that gesture, unless the whole phylogenetic exchange for some reason reverses, cannot be a language again. The existence of deaf sign languages, including “home signs” invented by children themselves (Goldin-Meadow 2003), shows that this prediction is false. Mead's Loop does not use gesture–speech exchanges. Gesture and speech evolved together, equiprimordially, the Loop bringing one’s own gestures into the orchestration of speech, with gesture–speech unity, not exchange, as the result. Moreover, when speech is blocked, sign languages are natural outcomes of a gesture–speech unity. Unity gives both modes equal potential for codification. Sign languages arise as naturally as speech (language is everywhere spoken unless there is blockage, because gesture is better than speech for imagery). In keeping with this unity (now, gesture–sign unity), signs should also have their own spontaneous gesticulations. Susan Duncan (2005) observed co-expressive gestures in Taiwanese Sign Language. The gestures were manual, iconic distortions of canonical sign forms, and accordingly were perfectly synchronous with the signs. In Canary Row narrations, the distortions incorporated the context and differentiated interiority when the signer had just described Sylvester climbing the pipe on the outside, exactly as gestures by hearing speakers do. So a spiral either produces mutually repellent gestures and speech, contrary to fact, or makes sign languages impossible, also contrary to fact. The only way to avoid the supplantation trap is for gesture and speech to have been “equiprimordial,” to have evolved together, and this is the process of Mead's Loop.

³ Arbib writes, for example (2012, p. 231), “Protospeech builds on protosign in an expanding spiral: Neural mechanisms that evolved for the control of protosign production came to also control the vocal apparatus with increasing flexibility, yielding protospeech as, initially, an adjunct to protosign” (emphasis added), which seems close to the exchange model.
2.1.3.2 The recursion hypothesis

Michael Corballis (2011) likewise continues to advocate gesture-first in a new work, which takes as its central theme a posited universal, recursion, the embedding of like things into like. This ability, Corballis proposes, is part of the psychological foundation of human culture, “…providing the creative potential for such diverse activities as reconstructing past episodes or imagining future ones, telling stories, creating music or art, and manufacturing edifices and complicated machines…” (p. 181), but the gesture-first creature would not have been capable of it for language. This is because recursion in language is not only the embedding of like into like; it also has dual semiosis and enters into gesture–speech unities. Gesture-first, trapped in its own logic, does not achieve this. Gesture-first would have split recursion across out-of-synchrony gesture and speech, or divided it so that speech or gesture, but not both, had it. But we, surviving the extinction of gesture-first, evolved Mead's Loop and the possibility of recursion in a gesture–speech unity, wherein gesture imagery includes recursion and orchestrates speech with it as well. In the following example, the speaker says “you can’t tell”: the containing clause states an ambiguity in the cartoon while, simultaneously, the left hand moves into a new space with the value Ambiguity. Speech continues, “if the bowling ball is under Sylvester or inside of him”: two embedded clauses giving the poles of the ambiguity, while the left hand, still in the Ambiguity space, moves inward with “under” and outward with “inside of him” (Figure 1).

2.1.4 End of gesture-first

The upshot is that gesture-first has little light to shed on the origin of language as we know it; at best it explains the evolution of pantomime as a stage of phylogenesis that, if it once occurred, went extinct as a code and landed at a different point on the continuum of gestures. Instead, I propose the somewhat out-of-the-box theory of Mead's Loop. The next section shows how Mead's Loop created a dynamic system of language.

Figure 1 panels: [is under Sylvester] [or inside of him]

Figure 1. GESTURE–SPEECH UNITY WITH RECURSION. The speaker has outlined what she took to be an ambiguity in the bowling ball episode. She first states the ambiguity (“you can’t tell if the bowling ball”) and then, recursively, states the alternatives (“is under Sylvester or inside of him”). Concurrently and co-expressively, she moves her left hand to a certain space for the ambiguity itself (the first gesture, actually two gestures in the same space with “you can’t tell…” etc.), and then opposes two spaces within it for the poles of the ambiguity (two further gestures in the “ambiguity” space: first the hand moves forward with “is under”, then inward with “or inside of him”); so there is recursion on both sides of the dialectic. The recursions, spoken and gestured, partake of the usual dialectic semiotic oppositions: while speech is codified, composed of recurrent elements with constraints of meaning and form, gesture is global and synthetic, and the meaning of the whole (ambiguity) determines the meanings of the parts (the two poles; the “under” pole, in particular, is anti-iconic). Transcriptions by Susan Duncan.

2.2 Mead's Loop

Instead of gesture-first, I propose that speech and gesture were what Liesbet Quaeghebeur has called “equiprimordial.” Gesture and speech had to be naturally selected together. Neither gesture-first nor speech-first could have led to language. I propose instead a mechanism for its selection that I call, after the early 20th-century philosopher George Herbert Mead, “Mead's Loop.” Mead's Loop is a hypothesis about what emerged some one-half to one million years ago in the evolution of the human brain. There began to evolve, in the brain, a thought–language–hand link, localized at least in part in the area now called Broca’s area (other brain areas also must have been involved: the right hemisphere for imagery and metaphoricity, Wernicke’s area for categorized semantics, the prefrontal cortex for fields of equivalents and psychological predicates, and Broca’s area for constructions and unpacking; all are “language areas” in the brain). This link was a new kind of mirror neuron, a “twisted” mirror neuron. Mirror neurons have been directly recorded in monkeys and are presumed to reside in all primate brains, including ours. To quote a Wikipedia article, “[a] mirror neuron is a neuron that fires both when an animal acts and when the animal observes the same action performed by another.” I call these mirror neurons “straight,” to distinguish them from the Mead's Loop “twist.” Note what they provide. The significance of the mirror neuron response is that of the action it mimics: the action of another is repeated and becomes as one’s own. If the mirror neuron circuit is used to produce a gesture, it will be a mimicked action, a pantomime. However, pantomime repels speech. Its timing vis-à-vis speech is loose; often there is no speech at all, and if there is speech, the speech and the pantomime do not combine into a gesture–speech unit. These empirical limits are inevitable if pantomime is the product of an earlier gesture-first language which evolution sidelined when Mead's Loop began.

Here is the twist: G. H. Mead said that a gesture is meaningful when it evokes the same response in the one making it as it evokes in the one receiving it: “The gesture which indicates the object indicates the object both to the other and to the individual himself. In so far as this takes place the gesture is called significant. The meaning of significance is that the individual, in indicating the object or a part of the object to another and by this same process to one’s self, takes the role of the other and the indication of the object involves one’s tending to act upon this gesture, as the other. The gesture is said then to signify the action, and comes therefore to stand for this character of the object. It stands for the reactions to the object, or for its meaning.”⁴

For evolution, this suggests a kind of “twist,” in which mirror neurons came to respond to one’s own gestures as if from another. They thereby brought one’s own gesture imagery and its significance into the motor areas for orchestrating speech. Gesture, not syntax, became the dynamic unit (syntax arises separately as a means to stabilize the dialectic of imagery and form). With metaphoricity, the expansion of meaning is unlimited. Twisted mirror neurons also built a social orientation into gesture. While “straight” mirror neurons reproduce the actions of another, with meanings that are those of the other’s actions, the Mead’s Loop “twist” responds to one’s own gestures as if from another, and brings into the action-orchestration areas of the brain different meanings, those of the gestures. This was Mead’s insight: gesture (and speech) have a fundamentally social character and, to be meaningful, must have a social/public presence. With Mead's Loop, this occurs when the one making and the one receiving are the same. This is the “twist”; the gesture then becomes a meaningful and socially pertinent event, with the potential to connect dynamically to everything else in language, and specifically to orchestrate vocal and manual movements. But was the twist needed? It was, because a gesture, although emanating from the same brain area as speech, does not unite with speech without self-response. Such a gesture would be incomplete: it is neither synchronous nor co-expressive with speech. Gesture–speech unity happens only when the gesture gets a self-response via Mead's Loop. Then it becomes able to orchestrate speech movements.

2.2.1. Answering puzzles

I have discussed Mead's Loop with well-informed linguists and, with their help, have, I believe, identified the sticking points in understanding it and the arguments in support of it. The most insidious is a seemingly irresistible attraction, for many readers, to a linear, cause-line style of thinking. This gets in the way of Mead's Loop and the GP, which require thinking that is non-linear, simultaneous and “all at once” (cf. Quaeghebeur 2012). The following exchange clarifies this obstacle.

Comments by Renia Lopez-Ozieblo / Responses by David McNeill

Self-response and gesture-speech unit at the same time: I can, at a push, follow the idea of the self-response and the unit formation all happening together. But if for Mead’s Loop to originate I need a gesture, but this gesture is not the same as the GP one …

This is the key – you must not think sequentially. Think instead of Mead's Loop “all at once”. And how? There is just one gesture. The “gesture” has reality on two levels; think of a “gesture” as occurring and, at the same time, as self-responding; the trick is to avoid sequential thinking. The word “response” sounds sequential, but think of it not as a stimulus-response sequence but, all at once, as a gesture that has self-response as an integral aspect. This is no different from self-awareness of any action. If you act and are aware of your agency, it is not that you act and then become aware that you were the one who did it; self-awareness is part of the act from its inception, and this makes it qualitatively different from unaware action. Thus it is not that you gesture and then respond; the self-response is a fundamental part of the gesture.

4 Emphasis added.

Why would the gesture come by itself? And wouldn’t this be fuel for the gesture-first argument?

It’s not a separate gesture. It is a level or residue in the new gesture that Mead's Loop made possible. In any case, a primate gesture is not necessarily a gesture-first gesture; it is better considered a precursor to Mead's Loop than a gesture facing extinction. The figure (from Amy Pollick) shows a precursor-type gesture; in this case, an iconic-deictic gesture by one bonobo to show another bonobo where to go.

Why is this not Mead's Loop?

It is not impossible that it is, in incipient form. But Mead's Loop would mean that the bonobo on the left also had a self-response to her gesture, so that the gesture was more than an expression of her wish to get the other bonobo to move forward: it also had significance for her as a public act, to which she responded as though it came from another.

I think this is solved if instead of thinking of “gesture” we can think of the thesis “imagery”. But was this what you meant?

Yes, but a gesture is an image, one that is materialized by the body itself. Image and gesture are not separate entities. This is true even if an overt gesture is lacking: a gesture is still present as imagery, at the low-material end of a continuum with a full gesture at the high end. Imagery is gesture’s most constant aspect; the outward movement is its material embodiment. Since less newsworthiness summons less material, the absence of an overt gesture is just the low endpoint of this continuum. The dual semiosis of imagery and form is still present.

What is Mead’s Loop adding?

Without it, the gesture is a single-level pantomime or point, and as such is repelled by or otherwise unable to unify with speech. With it, the gesture becomes two levels, and gains control over speech orchestration. (Try the bonobo gesture yourself, in the same imagined situation: “that way” or some equivalent speech is automatic. This would be Mead's Loop coming into play.)

De Ruiter, in his paper “Gesture and Speech Production” (1998), p. 60, says: “It seems therefore necessary to assume that the gesture planner, after having constructed a motor program for the gesture, sends a ‘resume’ signal to the conceptualizer. Upon receiving this gesture, the conceptualizer can send the preverbal message to the formulator”. Is this not Mead's Loop?

It is a loop, but it lacks two essentials of Mead's Loop. First, there is no social reference, which for Mead was crucial and was the reason Mead's Loop had an adaptive advantage in evolution. This loop has no biological reason. Second, it does not capture gesture–speech unity. The gesture is a signal to the conceptualizer, but it has no orchestration power. It is a “resume” signal. Accordingly, it produces a delay of speech, not an orchestration of it. It tells the conceptualizer to let go, but the unit of speech it releases is the unit the conceptualizer has already created, and the gesture does not orchestrate it. This and other Speaking-inspired models (such as Kita and Özyürek’s 2003 model) face the empirical fact of synchrony and include it, but for them synchrony is arbitrary and external, not inherent as it is in the GP.5

2.2.2. How Mead's Loop engendered Growth Point properties

A growth point or GP is a cognitive package that combines semiotically opposite linguistic categorial and imagistic components (McNeill & Duncan 2000). Table 2 summarizes the semiotic oppositions inside a GP.

Imagery side | Language side
Global: meanings of parts dependent on meaning of whole. | Compositional: meaning of whole dependent on meaning of parts.
Synthetic: distinguishable meanings in single image. | Analytic: distinguishable meanings separated.
Additive: no new syntagmatic values when images combine. | Combinatoric: new syntagmatic values when parts combine.

Table 2. Semiotic Oppositions Within a GP

The GP becomes the minimal unit of the dynamic dimension itself. It is called a growth point because it is meant to be the initial pulse of thinking-for- (and while-) speaking, out of which a dynamic process of organization emerges. The linguistic component categorizes the visuo-actional imagery component.6 The linguistic component is important since, by categorizing the imagery, it brings the gesture into the system of language. Imagery is equally important, since it grounds sequential linguistic categories in an instantaneous visuospatial frame. Imagery provides the GP with the property of ‘chunking’, a hallmark of expert performance (cf. Chase & Ericsson 1981), whereby a chunk of linguistic output is organized around the presentation of an image. Synchronized speech and gesture are the key to this theoretical GP unit: their synchrony means that the same idea is realized in two opposite semiotic modes, which the GP, as a gesture–speech unity, combines in a dialectic. This model is intrinsically dynamic, as semiotic opposition is unstable and seeks resolution. A grammatical construction is this resolution, and a construction able to unpack the GP dialectic without distortion is the spoken result. Mead’s Loop led to the GP because it had both semiotic and motor effects:



• Semiotically, it brought the gesture’s meaning into the mirror neuron area. Mirror neurons were no longer confined to the semiosis of actions. One’s own gestures (such as “rising hollowness”) entered, as if liberating action from action and opening it to imagery in gesture. Extended by metaphoricity, the significance of imagery is unlimited. So from this one change, the meaning potential of language moved beyond action alone and expanded vastly.
• At the motor level, in Brodmann’s areas 44 and 45 (i.e. Broca’s Area), the areas of the brain where speech movements are orchestrated, Mead’s Loop enabled significant imagery – gesture – to “chunk” vocal motor control, the foundation of the GP.

5 One could posit a modification of the de Ruiter model, namely, that the gesture, when sent back, is not a “resume” signal but has the power to reorchestrate the already formulated speech. This requires new connections that further undermine the modularity of Speaking and hardly seems economical. However, a loop within a loop like this is not impossible.
6 For this reason questions like “how can the image of a sunset be a component of speaking?” are irrelevant. There is in the image no actional component, and possibly not a global-synthetic semiotic component either. If a visuo-actional image of a sunset were to occur, we expect that it would be like many other such images: perhaps a vertical surface created by the hand, with the palm vertical and facing ego, at the locus of the “sunset” as a discursive object. In other words, the image changes from a photo- or painting-like picture so as to mesh with linguistic form.

I conclude by listing the chief properties of the dynamic dimension and the GP, and how Mead’s Loop engendered them. For the GP itself, I refer to McNeill (2012). With only general primate cognition, this ensemble of properties would not have come together into a dynamic dimension of language. It goes without saying that other factors could have influenced the dual semiotic, social reference and psychological predicate properties, but all would seem to have their roots in the semiotic and motor effects of Mead’s Loop.

Inhabitance. Gesture, the instantaneous, global, nonconventional component (Energeia), is “not an external accompaniment” of speech, which is the sequential, analytic, combinatoric component (Ergon); it is not a “representation” of meaning, but instead meaning “inhabits” it: “The link between the word and its living meaning is not an external accompaniment to intellectual processes, the meaning inhabits the word, and language ‘is not an external accompaniment to intellectual processes.’7 We are therefore led to recognize a gestural or existential significance to speech…. Language certainly has inner content, but this is not self-subsistent and self-conscious thought. What then does language express, if it does not express thoughts? It presents or rather it is the subject’s taking up of a position in the world of his meanings” (Merleau-Ponty 1962, p. 193; emphasis in the original).8

The GP is a mechanism geared to this “existential content” of speech, this “taking up a position in the world.” Gesture, orchestrating speech, is inhabited by the same “living meaning” that inhabits the word (and beyond it, the discourse). The widely observed positive correlation between speech and gesture complexity thus involves greater “inhabitance” (if inhabitance can be graded, half in, half out); it involves more of the body, in movements more closely coordinated by the significance being differentiated from the context of speaking. It is the thought–language–hand link that Mead's Loop created, simultaneously affecting both motor domains.

7 Merleau-Ponty’s quotation is from Gelb and Goldstein (1925, p. 158).
8 I am indebted to Jan Arnold for this entire quotation.

The dual semiotic and dialectic. The new form of mirror neuron response in Mead’s Loop merged vocal movements and gesture, and synchronized them at points where they were co-expressive of an underlying idea unit, laying the ground for the imagery–language dialectic and the dynamic dimension of language. Once codified linguistic forms evolved (along with GPs), a dialectic would be the immediate response. Without Mead’s Loop, gesture and speech would have had only the loose connection seen with pantomime, and language could not have escaped the single-semiosis box.

The social reference. The social orientation of mirror neurons with the Mead’s Loop “twist” gave gestures and GPs a social reference character. Without Mead’s Loop, gestures could have social reference only if directed at an interlocutor. But speech-unified gesticulations are not necessarily aimed at someone else (in contrast to emblems, points or pantomimes), yet gestures and their GPs are social (“public”) entities. Also, crucially, a foundation in Mead’s Loop opened a route over which the social conventions of speech and thought could form, with GPs absorbing social-interactive content. An important effect of the inherent sociality of GPs due to Mead’s Loop arose in the origin of syntax, namely, “shareability” (Freyd 1983).

Origin of psychological predicates. The psychological predicate, the differentiation of what the speaker deems newsworthy in the immediate context, was inherent to Mead’s Loop by virtue of how it brought gesture in as a speech-orchestrating force. The inseparability of the psychological predicate from context resides in Mead’s Loop’s self-response. A gesture of the gesticulation kind (in contrast to sign language signs) cannot be extracted from one context and carried into another. Over Mead’s Loop it orchestrates vocal movements under the gesture’s significance. The result is inherently context-bound, as a matter of how it is formed. Moreover, given the social reference of Mead’s Loop, the contexts that gesture differentiates mesh with ongoing social interactions, so what is newsworthy is meaningful in a social framework. Eventually, the clever Mead’s Loop creature was able to shape contexts to fit intended differentiations. This was an elaboration of the psychological predicate functionality of Mead’s Loop and may or may not have been part of it from the beginning. (It seems possible that the ability to shape context to fit the intended meaning is linked to another ability, which also had to emerge: using metapragmatic indicators to orchestrate unpackings for goals and intentions. Together, they promote utterances that will be true to goals and intentions.)

Origin of catchments. Catchments are threads of consistent imagery attached to themes. They too arose with Mead’s Loop as a matter of course. Mead’s Loop binds imagery with speech and brings the meaning of the image into the (twisted) mirror neuron circuit.
Although not shown here, each time the “it down” speaker (see McNeill 2005) regarded the bowling ball as an antagonistic force, its iconic imagery returned along with this theme. There were a half dozen such occurrences, and theme and image were bound together throughout. The theme is bound to the image because “twisted” mirror neurons, echoing the gesture, also echo its significance as an antagonistic force plus whatever iconicity it had. Details varied, but the recurring antagonistic-force theme was a constant.

Metapragmatic indicators. Mead's Loop engendered gesture–speech unities, as we have seen, but it also had a role to play in what Michael Silverstein (2003) has pioneered: the collection of metapragmatic scaffolding that encompasses the GP dialectic. The essential feature of metapragmatics, awareness and control of the pragmatic effects of one’s utterance, stemmed from the social impetus that Mead's Loop provided. Without its sense of one’s own actions as social and public, there could be no metapragmatic indicators.

The equiprimordiality of speech and gesture. To select Mead’s Loop, speech and gesture had to evolve together. One could not have come first and the other later. This basic difference from gesture-first is perhaps the one step, if we try to single one out, that sets human linguistic evolution apart. Some avian species (crows, ravens) have evolved surprisingly elaborate vocal and gestural repertoires but have not taken the step that led to language: evolving a unit that is integrally both sound production and gesture. The entire Mead's Loop process involves one’s own gesture impinging on the same area of the brain where vocal actions are orchestrated, so both are necessary.

Origin of unpacking. The GP is the core idea at the moment of speaking, differentiating a context; unpacking with a syntactic construction cradles it, and intuitions of well-formedness comprise the dialectic’s stop-order.
Some constructions unpack their GPs and nothing more; others add further meanings, such as caused motion, but all arise from the GP. GP and unpacking are not necessarily sequential. But even if simultaneous, they are functionally distinct, with unpacking dependent on the GP. This is the key to the origin of unpacking as well. Because the GP combines imagery with codified form, a self-created selection pressure for the static dimension also arose.

But not theory of mind. Mead’s Loop, however, is not theory of mind. In a sense, they are opposites. The Mead’s Loop adaptation brings self-awareness of one’s own behavior as social, not a theory of the cognitions and intentions of another (a theory of mind could evolve from straight mirror neurons in any case, which further limits its role in the origin of language). Both theory of mind and Mead's Loop depend on a more fundamental faculty, self-aware agency, a topic the late Susan Hurley explored in depth. Awareness of self as agent seems to develop in the child around age 4, and both Mead's Loop and theory of mind show themselves then as well (children’s language is discussed in the fifth part of this series). The origin of Mead's Loop depended on this sense. It is only through awareness of one’s own agency that gestures can be responded to as if from another.

Summary. Gesture–speech unity is felt across a spectrum of dynamic effects: prosody (the peak corresponding to the GP), the formation and differentiation of contexts, and the energizing of communicative dynamism, and with it the amount of coding material in both gesture and speech, which move together in a positive relationship. All of this sorts out the two modes of consciousness of sentences.

3. Last word on the origin

Pulling the threads together, we have a new idea of how language began. How Mead's Loop works, in outline:

• Gestures emanating from Broca's Area are mirrored by it.
• The gesture acquires significance, according to Mead, by being responded to by its maker in the same way it is responded to by others.
• It acquires the sense of being social/public.
• It is now ready to orchestrate actions of the vocal tract (or of the hands, in the case of a sign language).
• Thus speech (or signing) becomes a dual semiotic: imagery orchestrating symbols that are themselves codified by sociocultural conventions.

The scenario is the emergence of human family life. The family, and child-rearing in particular, are environments where the social/public value of one’s own gestures is adaptive, and there Mead's Loop could have been naturally selected (no doubt it was adaptive in other contexts as well). Archeologists date the dawn of family life (with cooking hearths, for example) to about one million years ago, with a stable family membership and a division of labor. So it was then that the natural selection of Mead's Loop possibly also began. Mead's Loop also gave the speaker the sense that gestures have a public, social significance. Whereas straight mirror neurons take a socially received action and make it into a personal property, Mead's Loop makes one’s own gesture into a socially referenced action. For the speaker it gains public significance. This social reference was a vital innovation. It gave the adult, typically a mother inculcating cultural norms in infants, the sense of being an instructor as opposed to being just a doer with an onlooker (the chimpanzee way). Entire cultural practices of human childrearing depend upon this sense (Tomasello 1999). To do this, the adult must be sensitive to her own gestures as social/public actions. Sensing actions as social would benefit the next generation of children, who as a result would cope better.
Given the social reference of Mead’s Loop, natural selection for it would arise in situations where sensing one’s own actions as social and public is advantageous: for example, as suggested here, imparting information to infants, where it gives the adult, typically a mother, the sense of being an instructor. The focus of this selection would be adults; in this scenario, language evolved in women. Their infants, both female and male, benefit from cultural inculcation and inherit any genetic dispositions. Sarah Hrdy (2009) has highlighted the group rearing of infants, including the infants of other adults, as an aspect of early human family life. The ability to engage in collective infant rearing clearly demands (and naturally selects) seeing one’s own actions as social. Such practices stand in sharp contrast to chimpanzee infant rearing, where infants, if left without their mothers, are neglected and vulnerable to attacks by adults in the same group. On this view it was women, mothers, in whom language began. We should not be surprised, accordingly, that baby talk, the speech register that adults instinctively adopt when speaking to infants, appears universally.

Mead's Loop “twisted” mirror neurons, in other words, are symbolic in their initiation, control and effects (giving material carriers to meanings and multimodal embodiments to cognitive “being”). Their natural selection may have had nothing to do with straight mirror neurons. Also, the downstream links of Mead's Loop and straight mirror neurons differ: with “twisted” neurons, the thought–language–hand link; with “straight” ones, actions manipulating the world. This difference reflects separate evolutions as well. Whether you are persuaded by these arguments depends, ultimately, on taking seriously gesture–speech unity: that gesture and speech comprise a single multimodal system, and that gesture is not an accompaniment, ornament, supplement or “add-on” to speech but is actually part of it. Gesture-first does not predict this language–gesture integration. When we look at models of speech–gesture crossovers of the kind that, in theory, gesture-first would have encountered when speech supplanted an original gesture language, we do not find conditions for gesture–speech unity, but instead non-co-expressiveness or mutual speech–gesture exclusion. Adding to the damage is Woll’s (2005/2006) argument that gesture-first not only leaves gestures unable to integrate with speech but also blocks, within speech itself, the arbitrary pairing of signifiers with signifieds that is characteristic of (or, Saussure says, defining of) a linguistic code.

3. Timeline

Finally, when did Mead's Loop evolve? The phrase “the dawn of language” suggests that language burst forth at some definite point, say 150–200 kya (thousand years ago), when the prefrontal expansion of the human brain was complete. But the origin of language has elements that began long before: 5 mya (million years ago) for bipedalism, on which things gestural depend. I think 1 mya, based on humanlike family life dated to then, marks the start of the forebrain expansion, of the selection of self-responsiveness in mirror neurons, and of the resulting reconfiguration of Areas 44/45. I imagine this form of living was itself the product of changes in reproduction patterns, female fertility cycles, child rearing and neoteny, all of which must have been emerging over long periods before. So this says that language as we know it emerged over 1 to 2 million years.