John Benjamins Publishing Company


This is a contribution from New Perspectives on the Origins of Language. Edited by Claire Lefebvre, Bernard Comrie and Henri Cohen. © 2013. John Benjamins Publishing Company This electronic file may not be altered in any way. The author(s) of this article is/are permitted to use this PDF file to generate printed copies to be used by way of offprints, for their personal use only. Permission is granted by the publishers to post this file on a closed server which is accessible to members (students and staff) only of the author’s/s’ institute, it is not permitted to post this PDF on the open internet. For any other use of this material prior written permission should be obtained from the publishers or through the Copyright Clearance Center (for USA: www.copyright.com). Please contact [email protected] or consult our website: www.benjamins.com Tables of Contents, abstracts and guidelines are available at www.benjamins.com

part 2

At the roots of language

© 2013. John Benjamins Publishing Company All rights reserved


Reconstructed fossil vocal tracts and the production of speech: Phylogenetic and ontogenetic considerations

Louis-Jean Boë, Jean Granat, Jean-Louis Heim, Pierre Badin, Guillaume Barbier, Guillaume Captier, Antoine Serrurier, Pascal Perrier, Nicolas Kielwasser & Jean-Luc Schwartz

Institut national polytechnique de Grenoble, Université Stendhal, Muséum National d’Histoire Naturelle

The end of the twentieth century and the beginning of this one saw a reorganization of research in the field of speech and language emergence (SLE). Naturalism is the core of this new approach. It consists in describing the relations between biological aspects (in every sense of the word) on the one hand and speech and language, on the other hand, by an accumulation of hypotheses and evidence derived from a wide range of data collected by means of interdisciplinary collaborations. As with studies on the origin of man, a profusion of hypotheses has arisen, which sometimes lead to very dubious developments, based on flimsy results and on too little data, and connected to related but not fully mastered or overly simplified disciplines. This is why regular critical overviews do not seem superfluous. First, we propose a classification (push and pull theory) that provides a new reading of the various theories which have been proposed for half a century. In the present state of knowledge, it is not possible to infer when our ancestors acquired the faculty of speech and language: control of speech articulators, coordination between larynx and vocal tract, phonology, syntax, semantics, and recursion. Some longstanding questions remain unanswered: Why is our species alone in having speech and language? Many other questions are (for now) badly formulated problems: we do not have sufficient data to answer them. Perhaps these questions will remain unanswered. But we think that the following question can be answered: If we suppose that our ancestors (and distant cousins) controlled their larynx and vocal tract in the same way as present-day humans, did the geometry of their vocal tract allow them to produce the universal sound structures of the languages spoken today? 
We analyzed 32 skulls from the present to 1.6 Ma (million years) BP (before present) of fossil hominids available at the Musée de l’Homme in Paris or in the literature: (1) 10–30 ka (thousand years) BP: modern humans: Paleolithic; (2) 90–200 ka BP: anatomically modern humans; (3) 45–90 ka BP: Neanderthals;
(4) 1.6 Ma BP: Homo ergaster. These skulls are all well preserved and most possess a mandible but the vertebral column has been reconstructed from some fossil vertebrae. We attempt to: (1) identify the position of the hyoid bone and glottis; (2) reconstruct a vocal tract in a plausible way using an articulatory model; (3) quantify the acoustic capabilities of this reconstructed vocal tract. For this purpose, we combine phylogeny and ontogeny. We can now state that our ancestors and distant cousins were equipped with a vocal tract that could produce the same variety of vowel sounds as we can today: mainly the vowels /i a u/. Vocal tract morphology has allowed for the emergence and production of speech for several hundred thousand years. But how can we know to what extent earlier hominids mastered the control skills needed to produce speech? New lines of research are proposed in which orofacial abilities necessary to the emergence of speech are linked to a precursor mechanism dedicated to feeding (masticating-swallowing gestures).

1.  Introduction

Speech – the philosophical, cognitive, linguistic, and social touchstone – confers a singular status on humans among living species. Debates concerning the origins of speech, the conditions for its emergence, the nature of its production and perception, the understanding of its acquisition and pathology, and the exploration of its diversity in the world’s languages raise numerous questions. Many of these are still widely debated although they may date back to centuries-old intuitions, beliefs, and myths (see Auroux 1982, 1989, 1992, 1995, 2000, 2007). For the human species, language and speech undoubtedly constitute an existential question that has given rise to countless myths, theories, digressions, and even prohibitions. The end of the twentieth century and the start of this one witnessed a profound reorganization of research in the field of speech and language emergence (SLE). Free of all religious, philosophical, or institutional constraints, these naturalistic studies are characterized by their multidisciplinary approach, and involve a wide range of methods and data. A good deal of evidence shows that work on SLE is burgeoning, both in academia and in the media: academies (e.g. The New York Academy of Sciences, The California Academy of Sciences) have promoted it, and learned societies have been created (Association for the Study of Language in Prehistory with its journal Mother Tongue). The ninth edition of EVOLANG, the Evolution of Language Conference, was held in Kyoto in 2012. In another sign of this change, a workshop titled Language Origins Research: State of the Art as of 1997 was held as part of the 16th International Congress of Linguists organized by the Société de Linguistique de Paris, which in 1866 had written in its constitution that it would not accept any work on the origin of
language. In 2004, one of the authors of the present chapter presented, at a session of this society, a paper on monogenesis and Ruhlen’s hypothesis (cf. Ruhlen 1994). The Collège de France invited Luigi Luca Cavalli-Sforza to give courses between 1981 and 1990, and published one of his works (Gènes, peuples et langues); in 2002, it organized an international colloquium on the origin and evolution of languages: approaches, models, paradigms. The CNRS and the European Science Foundation have supported programs in France – OHLL (Origine de l’Homme du Langage et des Langues) – and in Europe – OMLL (Origins of Man, Language and Languages) and the European-based Hand to Mouth project (some of the authors have been members of these projects). In the summer of 2010, the Université du Québec à Montréal organized a Summer Institute in Cognitive Sciences dedicated to the origin of language. The Origins and Evolution of Language and Speech conference organized under the aegis of the New York Academy of Sciences in 1975 (Harnad, Steklis & Lancaster 1976) is a good illustration of this new approach, which has given up the problem of the temporal and spatial location of the origin of languages and the pursuit of comparative research. The 13 parts of this conference, which were organized into position papers and discussions, specified that the goal concerned the very nature of language (presented by Chomsky) and of protolanguages. The cognitive aspects of communication were examined, along with the neurological aspects (lateralization, pathologies, memory disorders, aphasia) and the parallel approaches taken by artificial intelligence. Paleobiological and comparative paleoanthropological work on the great apes was also placed in the cognitive and neural domain. The core topics were language and the production/perception of speech (anatomy, phylogeny, and biomechanics of the vocal tract).
In the field of linguistics, this represented the replacement of the culturalist program, which had reached its peak in structuralism, with a new focus on naturalism, but on other bases than those of the nineteenth century. We can also note that speech occupies a significant place in this new framework, whereas it had previously been marginalized, considered as a secondary issue or even a phenomenon outside the scope of linguistics (not so long ago; cf. Milner 1983, p. 183 and Hagège 1985, pp. 171–172), by Saussurean structuralism, with its dogma of the primacy and superiority of language over speech, and by an approach to phonology (from Troubetzkoy & Hjelmslev to Dell; cf. Dell 1973, pp. 5, 49) in which the determination of the form was not considered to be bound by the substance in any way (Boë 1997a, 1997b). Today, a review of the literature on speech and language emergence reveals a large number of very different methodological approaches and initiatives. Declarations of intent and consensus positions (Christiansen & Kirby 2003; Hauser, Chomsky & Fitch 2002) stress the need for multidisciplinary work (Figure 1). We ourselves are well aware of this need and have applied it.

Figure 1.  The fields contributing to research on speech and language emergence: a plethora of very different data, protocols, and investigative methods (based on Christiansen & Kirby 2003). Fields shown in the figure: speech and language acquisition, pathology (phonetics, psycholinguistics); speech and language universals and changes (phonetics, linguistics); cognition (neurophysiology); theory of mind (psychology); developmental biology (genetics); research of symbolic cues (archeology, prehistory); articulation, acoustics, perception (speech sciences); modeling (biomechanics); epistemology (philosophy); modern men and fossils (physical anthropology, paleoanthropology); animal communication (primatology, comparative psychology); control, multi-agent modeling (robotics).

2.  Issues and risks of an expanding multidisciplinary dialogue: The contagion of scientific ideas

2.1  The contagion of scientific ideas

Due to their composite nature and their development traditions, multidisciplinary domains are very permeable to data, theories, and even simple, uncorroborated hypotheses that have been formulated outside their own fields of investigation, especially if these ideas support their own theories (circular reasoning). How are the ideas and theories developed in these fields transmitted?

…an idea, born in the brain of one individual, may have, in the brains of other individuals, descendants that resemble it. Ideas can be transmitted, and by being transmitted from one person to another, they may even propagate. (Sperber 1996, p. 1)

In this way, Dan Sperber (1996) proposes an epidemiology of representations. He examines two categories of representations. The first are private mental representations: beliefs, intentions, preferences, etc. The second category includes public representations, which are far less numerous: these are images, signals, and texts that are transformed by the communicator and then commuted by the recipient into private representations. Ideas are transmitted and transformed as they move from one person
to another: “not randomly but in the direction of content that demands less mental effort and results in greater cognitive effects.” This is the tendency to optimize the effect/effort ratio (Sperber & Wilson 1986). In our opinion, this explains why certain hoaxes work so well: they can have resounding and long-lasting success if they are designed to correspond to cultural prejudices, in which case they can be accepted easily and propagated quickly. Thus, in 1912, the Piltdown Man skull arrived at the right time in Great Britain, which had hitherto been lacking in fossils: it provided proof of the missing link between apes and humans predicted by Darwin; furthermore, the skull must have contained a large brain, a proof of intelligence that supported the theory of the supremacy of the white race (Gould 1981, 1983, pp. 78–131; Thomas 2002). It took 40 years for top British scientists and the authorities at the British Museum to recognize that Eoanthropus dawsoni (from the name of its inventor, Charles Dawson) was only a hoax – a skillful assemblage of bones from a skull that dated back at most to the Middle Ages and the mandible of a young (and recent) orangutan from Borneo.

Regarding the emergence of speech, our work has verified the hypothesis that certain theories – very flimsy but very intuitive, that is, with a good effect/effort ratio – have been able to penetrate the private mental representation module and, so to speak, parasitize it and feed it with the transmission of public representations (Boë 2001). Thus, the following claims continue to propagate successfully:

–– the statement that a tiny bone, the hyoid bone, is a key factor in the origin of speech, even though all mammals have one. The existence of the hyoid bone is not a relevant factor in the discussion of speech emergence. Claiming that the hyoid is indispensable for clear speech is no more accurate than claiming that the first cervical vertebra is indispensable for standing upright, given that the vast majority of mammals have this vertebra but they have not all acquired the upright position and bipedal walking;

–– the larynx descent theory proposed by Philip Lieberman (1984) has the aim of explaining generally and very simply why the great apes do not speak, why Neanderthals could only utter a few relatively undifferentiated vowels – and those very slowly – and why babies do not generate a well-formed vowel system in their first months of life. Because of its generality and simplicity, this theory is a textbook example of a contagious idea: we have shown that the comparison of Neanderthals with modern humans reveals that in fact the larynx has not descended (Boë, Heim, Honda & Maeda 2002; Heim, Boë & Abry 2002; Boë, Heim, Honda & Maeda et al. 2007; Boë, Ménard, Captier & Davis et al. 2013). In all likelihood, our close cousins did not have any vocal tract disability. We will return to this point in detail later on to show that, ever since our ancestors stood erect, that is to say several million years ago, there has been no larynx descent during the course of phylogeny.

2.2  Circular reasoning

To support their theory that Neanderthals could not produce the cardinal vowels of human languages, namely /i a u/, Philip Lieberman and Edmund Crelin (1971) chose the skull of La Chapelle-aux-Saints. This skull, which was discovered in several pieces, was reconstructed in 1913 by Marcellin Boule, who considered that the Neanderthals were closer to great apes than to humans (Figure 2), which was the dominant idea at the time (Boule 1913). It was later shown that Boule’s reconstruction was biased (Heim 1986, 1989). In our work, we have shown that Neanderthals were very close to Homo sapiens, as is increasingly acknowledged today (Patou-Mathis 2006). Nevertheless, Lieberman (1984, 2007) has not yet taken this finding into consideration.

Figure 2.  Neanderthal as represented by Marcellin Boule in the early 1900s when he reconstructed the skull of La Chapelle-aux-Saints (photo L.-J. Boë) and a new reconstruction by Élisabeth Daynès (2009), who worked closely with anthropologists specializing in Neanderthals (© 2009 Photo E. Daynès, Reconstruction Atelier Daynès, Paris)

2.3  Ascientific presuppositions

As we have indicated, SLE is a field with a rich background of beliefs. Thus, Neanderthals, considered to be a different species from Homo sapiens, are at the center of some lively controversies that go beyond the purely scientific. As Sophie de Beaune asks, is prehistory still a science?

Whether or not Neanderthals used the language capacities available to them is a question that goes beyond the limitations of exact sciences. Opinions are divided on this question, and hypotheses bring ascientific presuppositions into play. This is still a highly subjective field in which belief-based arguments clash […] when it becomes necessary to defend and rehabilitate it. We feel that researchers on both sides are not merely defending a scientific theory. (De Beaune 2007, p. 16; our translation)

3.  Structuring epistemology: The push-pull framework

3.1  Pull and push approaches

Speech and language emergence is the result of a remarkable conjunction of:

–– the existence of a highly sophisticated auditory system and of organs for respiration, mastication, and swallowing shaped for vital functions;

–– the exaptation (Gould & Vrba 1982) of these organs as vocal instruments capable of producing complex sound signals, coordinated with respiration;

–– the emergence of cognitive capacities that allowed humans:
   –– to learn fine control of the vocal instrument (phonation + articulation);
   –– to make use of a doubly compositional system, language, that enables them, on the basis of only a few dozen sounds, to generate tens of thousands of words and combine them in an infinite number of sentences (generativity, creativity);
   –– to arbitrarily associate meanings with each of the elements produced in this way, thanks to a linguistic system, language, that is shared by a group of speakers and listeners.
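
The combinatorial point in the list above (a few dozen sounds yielding tens of thousands of words) can be made concrete with a rough calculation. The Python sketch below is only an illustration: the inventory of 20 consonants and 5 vowels, the restriction to consonant-vowel syllables, and the function names are hypothetical assumptions, not figures or methods taken from this chapter.

```python
def count_cv_words(n_consonants: int, n_vowels: int, max_syllables: int) -> int:
    """Count distinct forms made of 1..max_syllables consonant-vowel syllables."""
    # Each syllable pairs one consonant with one vowel.
    n_syllables = n_consonants * n_vowels
    # A form of k syllables can be built in n_syllables ** k ways.
    return sum(n_syllables ** k for k in range(1, max_syllables + 1))


def count_word_strings(n_words: int, length: int) -> int:
    """Count strings of exactly `length` words, ignoring any grammar."""
    return n_words ** length


# Hypothetical inventory (assumption for illustration only).
forms = count_cv_words(n_consonants=20, n_vowels=5, max_syllables=2)
strings = count_word_strings(forms, length=3)
```

Under these assumptions, 25 sounds already give about ten thousand one- and two-syllable forms, and the number of word strings grows without bound as the permitted length increases, which is the sense in which the second level of composition is open-ended.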

Learning to speak requires reference, recopying, and mimicry, but only to a limited extent, since the generativity inherent in language allows children to produce new combinations and generalizations. Recursion, which is claimed to be specific to human language (Chomsky 1980), gives it enormous productive possibilities. The coexistence of very powerful learning and sound production/perception capacities on one hand and symbolic cognitive capacities on the other hand was certainly not sufficient to enable humans to start speaking. In and of themselves, these two clusters of potential could not have underlain this cognitive leap. One might hypothesize that certain motivations and objectives enabled humans to benefit from this combination of circumstances that resulted from evolution. The question we ask concerns the possibility of SLE from an initial state in which language was absent. By setting aside the underlying question of when SLE took place in phylogenetic development, we propose a possible combination of the approaches that Jean-Luc Schwartz (2010) has referred to as pull and push, cross-cutting a dichotomy between function and substance (Figure 3). This possible classification is based on a Darwinian model, in which if emergence takes place, it presupposes a preliminary state, a process of generating genetic variants, a process of selection, and a process of reproduction. In this context, the approaches we will call pull are those that emphasize the functional content of this
selection process: Which are the best variants? This question leads to another, more general, one: What is the function of language and why was this evolved (apomorphic) secondary specialization retained by evolutionary selection? Pull approaches aim to pull language from a major functional objective, which remains to be determined. What we call push approaches focus on the preliminary state, the main properties of which must be determined as much as possible: What existed before language? More specifically, what kind of prelanguage did nonhuman primates have that might have provided the essential prerequisites of language? Push approaches are thus centered on language’s continuity with the functions, mechanisms, and ingredients that preceded it, rather than on the major evolutionary discontinuity that the appearance of language represents. They ask what could have pushed the higher primates (hominids) toward SLE. They avoid the question – which is an essential one – of the possible selection process, with the hope of linking up language with possible earlier forms (protolanguage).
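
The Darwinian scheme behind this classification (a preliminary state, generation of variants, selection, reproduction) can be pictured as a toy loop. The Python sketch below is a deliberately minimal illustration under strong assumptions: the single numerical "trait", the Gaussian variation, the truncation selection, and all names and parameter values are hypothetical and are not a model proposed in this chapter. In the sketch, the `initial_state` argument plays the role of what push approaches study, and the `fitness` function plays the role of what pull approaches study.

```python
import random


def evolve(initial_state, fitness, generations=50, pop_size=30, sigma=0.1, seed=0):
    """Toy variation/selection/reproduction loop on a single numerical trait."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    # Preliminary state: every individual starts with the same trait value.
    population = [initial_state] * pop_size
    for _ in range(generations):
        # Variation: each individual yields a slightly perturbed variant.
        variants = [x + rng.gauss(0.0, sigma) for x in population]
        # Selection: keep the half of the variants with the highest fitness.
        variants.sort(key=fitness, reverse=True)
        survivors = variants[: pop_size // 2]
        # Reproduction: survivors are copied to refill the population
        # (pop_size is assumed to be even in this sketch).
        population = survivors + survivors
    # Report the mean trait value of the final population.
    return sum(population) / len(population)


# Hypothetical fitness function: trait values closer to 1.0 are favored.
def closeness_to_one(x):
    return -(x - 1.0) ** 2


final_mean = evolve(0.0, closeness_to_one)
```

Changing only `initial_state` or only `fitness` and rerunning the loop is a simple way to see that the outcome depends on both the starting point and the selective pressure, which is why the two families of approaches are complementary rather than competing.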

3.2  Motivations and pull hypotheses

Some pull hypotheses had been proposed in the nineteenth century and even well before. In essence, they fit into the Darwinian framework: the passions that in previous centuries were considered to be the engines of speech development became needs, multiple functions with adaptive benefits:

–– the extension, structuring, and maintenance of social relationships to increase the chances of survival (cooperation, forming alliances, political organization, status, bonding, etc.), not only for small groups but also for large populations;

–– the choice of sexual partners (enhanced ability to persuade and seduce);

–– the possibility of transmitting information (memory) and educating children better;

–– hunting (sign language is not as useful if one has weapons in one’s hands);

–– technology that requires complex learning and planning of a procedure: for manufacturing tools, as well as their diversification and semiotic and sociocultural function; transportation of materials, flint knapping workshops, microliths, bone tools, blades, needles (which can be studied by experimental archaeology); the transmission of flint knapping techniques and their diffusion throughout the ancient world; the construction of complex sites;

–– the possibility of representing other people’s mental states.

All of these motivations can be combined under the heading of social and technological cooperation and the transmission of mental representations (cf. Ambrose 2001;
Bickerton 2003; Dunbar 1993, 1996; Gärdenfors 2003; Harnad et al. 1976; Hockett & Ascher 1964; Jablonski & Aiello 1978; Johansson 2005; King 1996; Landau  1991; Leroi-Gourhan 1964; Lock & Peters 1999; Montagu 1976; Origgi 2001; Premack 2004; Schepartz 1993; Schick & Toth 1993; Sperber 2000; Sperber & Origgi 2005; ­Stoczkowski 2001; Tomasello 1999; Tomasello, Carpenter, Call, Behne & Moll 2005; White 1999; Wildgen 2004; Wynn 1999). In fact, many of the arguments regarding the emergence of speech can be contradicted, at least in part: some other mammals hunt effectively in groups, raise their young (albeit sometimes for a very short time), use tools (albeit rudimentary), cooperate, reproduce successfully, and have good representations of allies, competitors, and predators. Like François Rastier (2006), one might wonder whether, in some cases, these are not merely petits romans anthropologiques (“anthropological novelettes”); certainly, they are very popular in the media: thus, language is said to have enabled hunters to be sure that their wives were being faithful in their absence (Coon 1962, pp. 87–88); was used to tell stories (Homo inventans; McNeil 1996), for example to ensure peace in tribes dominated by aggressive males (Homo narrans; Victorri 1999, 2005); to talk spitefully about people who were not there while tightening social bonds (grooming and gossip hypothesis, Dunbar 1996); to influence other people’s behavior and secure one’s social status by communicating information that a priori appears improbable, for example; or to engage in politics (Dessales 2001; Dessales, Picq & Victorri 2006). It is true that, in our current state of knowledge, these suggestions can be neither corroborated nor invalidated, but at most judged on the basis of their logic and plausibility. 
Nevertheless, if one adopts a Popperian scientific process in which “the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability” (Popper 1963, 36), one might perhaps wonder about their relevance. Thus, a major objective of future research will be to strive to obtain evidence, direct or indirect, confirming or refuting these scenarios or newer ones.

3.3  Push theories: Substance and function

We now come to what we have called push theories, that is, those focusing on the investigation of the state that existed prior to the emergence of speech. The objective is to specify, as accurately as possible, the continuities between other animals (and especially nonhuman primates, of course) and humans, by pushing research as far as possible into what are believed to be the prerequisites, seeking evidence of their presence or absence in our evolutionary predecessors. Considering the plentiful literature of the last 50 years, one can propose a second dichotomy in this regard, namely between the prerequisites for cognitive functions and the prerequisites for
substance (genetic equipment, apparatus for speech perception and production); following the lead of information technologists, one might refer to these two components as software and hardware. Figure 3 schematizes the interactions that may have led to SLE.

Figure 3.  Speech emergence is pushed by pre-existing mechanisms and ingredients and pulled by social and technological motivations, while providing evolutionary benefits. Labels shown in the figure: pull (society, technology); selection (evolutionary benefits); speech and language; push (software: functions; hardware: substance).

3.4  Push hardware: The FOXP2 gene

Specific language disorders are among the issues that feed into the research, and the controversies, concerning the specific faculty of language, its possible dissociation from other cognitive capacities, and its innateness. Following an initial publication concerning language disorders affecting three generations of a single family, a relationship was posited between a gene and the type of dysphasia observed (Gopnik 1990). Five years later, for the first time, the mutation of a single gene (FOXP2, for forkhead box P2, chromosome 7) was associated with morphosyntactic learning problems affecting 15 members in three generations of a British family, the 30-member KE family. Thereafter, the in-depth studies by Faraneh Vargha-Khadem emphasized the dysarthric aspects of the deficit: the KE family’s problems are not limited to language deficits (Vargha-Khadem, Gadian, Copp & Mishkin 2005; Vargha-Khadem, Watkins, Alcock, Fletcher & Passingham 1995). The affected subjects have serious problems controlling coordinated complex movements (buccofacial dyspraxia of the mandible
and upper lip), resulting in altered speech production. In fact, vocal learning (the phonation process), which is one of the components of speech production, is also found in other animal species; this led to the idea of identifying FOXP2 in songbirds that are able to change innate vocalizations and create new ones. The zebra finch is a bird that always generates practically the same song in adulthood. It has been shown that the expression of FOXP2 in young males increases significantly during the vocal learning period in a region belonging to the brain circuit involved in song learning (Haesler et al. 2004, 2007). If the level of expression of FOXP2 is experimentally lowered, treated birds show a diminished imitative capacity, a lesser ability to link sounds together, and an unstable song in adulthood. This confirms the gene’s influence on sound learning and production (Haesler et al. 2007). The FOXP2 gene appears to have mutated very little during the course of evolution (Enard et al. 2002): in 75 million years, the estimated time since mice separated from the monkeys and apes (rhesus monkey, orangutan, chimpanzee) and humans, only two nonsilent mutations (i.e. mutations that resulted in a modified amino acid) have occurred. The most probable scenario is that, in addition to playing a role in the embryonic development of the brain circuits underlying language, the FOXP2 gene is also involved at a later stage in sensorimotor abilities. However, it remains to be determined whether FOXP2 is important for motor production or for motor learning and to what extent production deficits can affect the learning of language itself. Thus, the debate has been repositioned: there is no question of a gene for grammar, but researchers now have a point of entry for understanding the neuromolecular mechanisms that influence language and speech acquisition. Other genes influencing language may well be discovered (Fisher & Marcus 2006, p. 9).

3.5  Push software: Prerequisites for cognitive functions

With the push software component, we address the very rich topic of the cognitive environment likely to have played a role in the emergence of language, either because it provided the minimal conditions and thus in some sense the premises or because it constituted a system of constraints that shaped human language in some way. It is natural to then ask whether this cognitive environment is shared by other animals (particularly non-human primates) as well as human primates. Without going into detail on all the various hypotheses and studies that have enriched the literature in the last 20 years, we shall mention several lines of investigation. First of all, to be able to speak, one has to be able to hear: the human auditory system is very sophisticated, but no more so than the auditory systems of a very large number of other species of mammals. All the evidence suggests that continuity is the rule rather than the exception here. Multisensory integration capacities
(Ghazanfar, Maier, Hoffman & Logothetis 2005) and perceptual categorization capacities (Kluender, Diehl & Killeen 1987; Kuhl & Miller 1975) appear to be no better at discriminating humans from other mammals. Next, one must produce sounds with the vocal tract. Although the hypothesis that humans have some specific features in this regard was once widely accepted, in our view it is now obsolete. Thus, researchers have now turned to more complex integrative cognitive capacities, in two fields in particular. First, speech and language presuppose the solving of complex problems related to the matching of action and perception, in order to understand other people’s actions, as well as to acquire and learn to control complex actions. Since Rizzolatti and Arbib’s (1998) work, mirror neurons have been proposed to be the critical neuronal system linking perception and action in the brain, and thus to provide a crucial nucleus in the evolution of language. Noting that mirror neurons in monkeys’ brains respond not only during the production and perception of hand gestures but also during mimicry of facial expressions, Arbib (2005) deemed them to be the cornerstone of a phylogenetic sequence that led to imitation, the learning of simple sequences and then more complex ones – the initial step in the direction of sophisticated communication systems – and eventually the shift to protolanguages that combine independence of form and content with sequentiality of actions, the intrinsic constituents characterizing human language. Next (and this is closely related to the previous point), one might turn to communicative capacities. Studies of these capacities in other animal species are of course legion. In relation to the appearance of language, the main issues concern the sophistication of primate communication systems, and particularly the question of whether hand or voice has priority.
Communicative gestures are attested in numerous species of monkeys and apes: threat gestures (Meguerditchian & Vauclair 2006) and designation or pointing gestures (Hopkins & Leavens 1998; Leavens 2004). However, some species also seem to have vocal communication systems, a particularly noteworthy case being vervet monkeys’ alarm cries (Cheney & Seyfarth 1990), which distinguish between three kinds of predators: the eagle that flies down from the sky, the leopard that approaches on foot, and the snake lying concealed on the ground. A debate arose among primatologists regarding which of these systems of communication – gestural or oral – constituted the most plausible precursor of human language. Gestural communication seems to have good properties of intentional control and functional specialization (flexibility), which make it a powerful system of social communication (Corballis 2003). Moreover, this system is naturally predisposed to contain referential capacities, since a gesture may refer to elements in the outside world by means of its intrinsic properties (iconicity). Pointing is an evident starting point, enabling monkeys to focus their attention on the object so designated (Leavens 2004).

Still, there are arguments in favor of vocal communication as well. Contrary to earlier beliefs, it is not totally dependent on a system of emotional control triggering reflexive behaviors. Rather, it increasingly appears to be complex, flexible, and tactical – that is to say, controlled – up to a point, depending on the goal of communication (Arnold & Zuberbühler 2006; Lemasson & Barbu 2011; Slocombe & Zuberbühler 2005). Its continuity with speech gave rise to Peter MacNeilage’s (1998) Frame-Content theory, which proposes that the mastication system might have provided a precursor system that was naturally adapted for modulating vocalizations and then generating consonant-vowel sequences that may have been the basis for development of the syllable and of phonology in general. A version of this approach has developed around the Vocalize to Localize theory (Abry, Vilain & Schwartz 2004, 2009).

Finally, beyond the social functions of communication such as those we have just examined, one might wonder whether language is rooted in a deeper level of communication. Research on theory of mind, humans’ ability to project themselves into another person’s brain in order to systematically search for and decipher intentions, raises the question of whether this kind of mechanism exists in the ape brain (Premack & Woodruff 1978) and, more generally, of how social interaction capacities differ between nonhuman primates and Homo sapiens (Origgi 2001; Sperber & Origgi 2005). In a recent article, Michael Tomasello et al. (2005) hypothesize that a crucial difference resides in shared intentionality, namely the ability not only to read the partner’s mind but also to know the value of a shared action and thus to share mental states, a crucial step in the emergence and evolution of sophisticated cultural cognition mechanisms, such as language.

4.  Genetic hardware: HOX genes

In the ontogeny of the human embryo, an organizational process builds up the different tissues required. The 1978 discovery of HOX and non-HOX genes in the fruit fly and then in all mammals indicates that these “architect genes” have been present throughout evolution. After the first 15 to 20 days following fertilization (Couly, Coltey & Le Douarin 1993; Couly & Bennaceur 1998; Couly et al. 2002), these genes are responsible for embryo development (Figure 4) and determine the anterior-posterior and dorsoventral organization of the embryo, and thus the placement of the base of the skull, the head, and the body. Consequently, these genes are involved in the growth of bones in the head and neck, the framework in which the vocal tract is situated. An overview of this development (Benoît 2001, 2008a) enables us to grasp the basic outlines of the bone structure of the head, the hyoid bone, and the neck. Non-HOX genes (which are distributed among the 46 chromosomes) are responsible for putting in place the elements needed for the membranous ossification of the anterior and upper portion of the head and the front of the mandible. HOX genes (distributed among four chromosomes) are responsible for the enchondral ossification of the back of the skull (the occiput), the base of the skull, the hyoid bone, the cervical spine (vertebrae C1 to C7) and the rest of the postcranial skeleton.

Figure 4.  Zones of expression of HOX and non-HOX genes. Schematic representation of the migration and destination of neural crests (based on Charrier & Creuzet 2007). [Labels in the figure: HOX; non-HOX; r1 to r8; B1, B2, 3, 4.]

For a long time, studies of comparative anatomy have stressed the unchanging nature of the cervical spine. In his 1912 treatise on variations in the human vertebral column, Anatole-Félix Le Double emphasized this point, from a global anthropological perspective: “Of all regions of the spine, the cervical region is the one in which the number of vertebrae is most fixed in animals belonging to the class of Mammals. Apart from a few exceptions […], it always includes 7 vertebrae […], which explains already its low variability in the human species” (p. 8; our translation). To visualize the areas of expression of these developmental genes, we have drawn a line in the midsagittal plane passing through the lambda point (at the junction of the interparietal and occipitoparietal sutures), the anterior synostosic crest of the sphenoid bone, and the margin between the body and the greater horns of the hyoid bone (Figure 5).



Figure 5.  Zones of influence of HOX and non-HOX genes on the bone structures of the head and neck (based on Benoît 2001, 2008a). [Labels in the figure: membranous ossification (with connective tissue); enchondral ossification (replacement of hyaline cartilage); lambda; sella turcica; cartilage; hyoid; HOX genes; at birth no HOX genes.]

5.  Anatomical hardware

5.1  The vocal tract

In order to speak, one needs, at the very least, a phonatory system that generates a sound source by means of the vibration of the vocal cords located in the larynx and a vocal tract that modifies the features of this source to produce a succession of sounds. In that way, a sound continuum is generated in which it is possible to identify a sequence of syllables made up of vowels and consonants. The vocal tract can be considered as a tube that extends from the glottis (the space between the vocal folds) to the lips, with a possible bifurcation into the nasal cavities, which are connected when the velum is lowered (Figure 6). In essence, this tract is delimited by soft parts (lips, tongue, pharyngeal wall) whose configuration depends on the cranial morphology and the position of the spine in relation to it. Control of this tract essentially consists in managing its configuration by the positioning of articulators (mandible, tongue, velum, lips), especially the openness of the lips and the position and size of a zone of constriction or occlusion that may range from the pharynx to the dental-alveolar zone. What we call the oral cavity length (OCL) is the distance from the prosthion to the pharyngeal point, while pharynx height (PH) is the distance from the pharyngeal point to the glottis (Figure 6). The ratio PH/OCL is the pharynx height index (PHI).
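The definition of the pharynx height index can be written out as a short calculation. The following is a minimal sketch only: the function name and the two measurements are hypothetical values chosen for illustration and are not taken from the chapter.

```python
def pharynx_height_index(ph_mm: float, ocl_mm: float) -> float:
    """Return PHI = PH / OCL, with both lengths given in millimetres."""
    if ph_mm <= 0 or ocl_mm <= 0:
        raise ValueError("PH and OCL must be positive lengths")
    return ph_mm / ocl_mm


# Hypothetical measurements, chosen only for illustration (not from the chapter).
example_ph_mm = 72.0   # pharyngeal point to glottis
example_ocl_mm = 90.0  # prosthion to pharyngeal point

example_phi = pharynx_height_index(example_ph_mm, example_ocl_mm)
print(f"PHI = {example_phi:.2f}")
```

Because PHI is a ratio of two lengths, it is dimensionless and can be compared across vocal tracts of different overall size.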

Figure 6.  Anatomical section of a man’s vocal tract from the lips to the glottis (G. Captier, Montpellier anatomical laboratory): (1) outer limit of lip protrusion; (2) prosthion and infradentale; (3) incisors; (4) hard palate; (5) velum; (6) pharyngeal point, which corresponds to the area of contact with the velum, when it is raised; (7) pharyngeal wall; (8) anterior tubercle of C1; (9) odontoid process; (10) C2; (11) C3; (12) C4; (13) glottis below the vestibular folds; (14) hyoid bone

5.2  The hyoid bone

In male subjects, the hyoid bone is located at the level of C3–C4 and the glottis at C5–C6 (one vertebra higher for females) (cf. Figure 6). The line that connects the tuberculum sellae (or tubercle of the sella turcica) to the posterior portion of the hyoid bone passes through the projection of the foramen mandibulae; this property makes it possible to position the hyoid bone (Figure 7). This line also corresponds to the border between HOX and non-HOX genes for the mandible and skull base (Benoît 2008b, 2012). The hyoid bone is extremely fragile and disappears very quickly in the course of fossilization due to its small size and the fact that it is not connected to other bones. Only a few specimens have been discovered:

–– two Neanderthals: at Kebara, Mount Carmel, in Israel (60 ka [thousand years]) and at the El Sidron site in Spain (43 ka), both of which are very similar to those of modern humans (Arensburg 1991; Arensburg et al. 1989; Rodriguez, Cabo & Egocheaga 2003);
–– several pre-Neanderthals, at the Sima de los Huesos site in Spain (at least 530 ka), presenting the characteristics of modern humans (Martinez et al. 2008);
–– one Australopithecus afarensis, at the Dikika site in Ethiopia (3.3 Ma [million years]), for “Selam,” a three-year-old child; this bone resembles that of a chimpanzee more than a human one (Alemseged et al. 2006).

Figure 7.  Position of the hyoid bone predicted by the intersection of two lines: Tuberculum sellae to the Foramen mandibulae, and the line parallel to Frankfort line through the mandibular symphysis

As we have seen, the anatomical layout of the base of the skull, the cervical spine, the hyoid bone, and thus the larynx is controlled by the expression of HOX genes. Since the transition to the upright stance, this morphology has probably not changed much (Boë et al. 2007; Granat, Peyre & Boë 2007).
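The geometric construction described for Figure 7 (the hyoid bone placed at the intersection of the tuberculum sellae to foramen mandibulae line with a line parallel to the Frankfort line through the mandibular symphysis) can be illustrated as a 2D line intersection. This is a sketch under stated assumptions, not the authors’ procedure: all coordinates are hypothetical, the function names are invented for the example, and the Frankfort direction is assumed to be given by two landmarks (porion and orbitale).

```python
from typing import Tuple

Point = Tuple[float, float]


def cross(a: Point, b: Point) -> float:
    """z-component of the 2D cross product."""
    return a[0] * b[1] - a[1] * b[0]


def intersect_lines(p: Point, d: Point, q: Point, e: Point) -> Point:
    """Intersection of the lines p + t*d and q + s*e in the sagittal plane."""
    denom = cross(d, e)
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    t = cross((q[0] - p[0], q[1] - p[1]), e) / denom
    return (p[0] + t * d[0], p[1] + t * d[1])


# Hypothetical landmark coordinates in mm (x forward, y upward); illustration only.
tuberculum_sellae = (0.0, 0.0)
foramen_mandibulae = (10.0, -40.0)
porion = (-20.0, 10.0)      # assumed landmark defining the Frankfort line
orbitale = (60.0, 10.0)     # assumed landmark defining the Frankfort line
symphysis = (70.0, -80.0)   # point on the mandibular symphysis

# Direction of the tuberculum sellae -> foramen mandibulae line.
sellae_dir = (foramen_mandibulae[0] - tuberculum_sellae[0],
              foramen_mandibulae[1] - tuberculum_sellae[1])
# Direction of the Frankfort line; the second line is parallel to it.
frankfort_dir = (orbitale[0] - porion[0], orbitale[1] - porion[1])

hyoid_estimate = intersect_lines(tuberculum_sellae, sellae_dir,
                                 symphysis, frankfort_dir)
print(hyoid_estimate)
```

With real data, the landmark coordinates would come from a calibrated lateral radiograph or a fossil reconstruction; the function raises an error if the two lines happen to be parallel.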

5.3  The hyoid bone and the lowering of the mandible

Turning now to musculature, the hyoid bone is inserted at the center of a star-shaped network made up of the supra- and infrahyoid muscles (Figure 8a). This network serves to stabilize the position of the larynx and tongue and enable the mandible to be lowered. Its positioning in this intricate muscle complex means that the hyoid intervenes indirectly in breathing, chewing, swallowing, maintaining head and neck posture, and speech production. In fact, during speech production, this bone can move vertically and horizontally by close to ±7 mm. With almost a centimeter and a half of latitude, it is not crucial to position the hyoid bone of a fossil to within a few millimeters. These movements are essentially related to movements of the tongue and mandible (Beautemps, Badin & Bailly 2001; Hoole & Kroos 1998). The mandible is lowered by contraction of the pterygoid muscles and co-contraction of the two bellies of the digastric muscle (Figure 8b). The position of the hyoid bone under the mandibular plane facilitates this movement (Figure 9).

Figure 8.  (a) Star-shaped suspension of the hyoid bone with the (1) anterior and posterior digastric muscles, (2) thyrohyoid muscle, and (3) sternothyroid muscle (based on Szunyoghy & Fehér 1996); (b) The role of the pterygoid and digastric muscles in lowering the mandible (based on Kamina 2009). [Labels in panel (b): pterygoid; estimated rotation center (mandibular foramen); posterior digastric; anterior digastric; hyoid.]

Figure 9.  Lowering of the mandible by co-contraction of the two bellies of the digastric muscle and action of the pterygoid muscle. Anterior (1) and posterior part (2) of the digastric muscle between the fossa of the digastric, the hyoid, and the mastoid process. One can see that the hyoid bone remains well under the mandibular plane (X-rays: O. Granat & J. Granat)


5.4  The hyoid bone and the tongue

The hyoid bone supports the tongue: the mylohyoid, geniohyoid, and hyoglossal muscles are inserted in it (Figure 10). We have increasingly precise 2D and 3D models of this organ (Wilhelms-Tricarico 1995; Payan & Perrier 1997; Takemoto 2001; Gérard, Wilhelms-Tricarico, Perrier & Payan 2003; Gérard 2004; Gérard, Perrier & Payan 2006; Buchaillard 2007; Buchaillard, Perrier & Payan 2009) (Figure 11), which make it possible to monitor the actions involved in moving it (Gérard, Perrier & Payan 2006; Perrier, Payan, Buchaillard, Nazari & Chabanas 2011) and conduct comparative studies of its anatomy in humans and the great apes (Takemoto 2008).

Figure 10.  Schematic representation of the hyoid bone, the tongue muscles, and their insertions (based on Takemoto 2001). [Labels in the figure: stylo-glossus; superior longitudinal; hyo-glossus; fibers; inferior longitudinal; genio-glossus; genio-hyoid; mylo-hyoid; hyoid.]

5.5  The hyoid bone and the larynx

The larynx, which is composed of the thyroid, cricoid, and arytenoid cartilages and contains the vocal folds, is attached to the hyoid bone by the thyrohyoid membrane (Figure 12). Biometric data on these cartilages from various sources reflect the stable size of this structure. The dimensions of the various parts and their respective positions (Ajmani 1990; Leksan et al. 2005; Miller et al. 1998) enable us to position this group under the mandibular plane. It is important to note that the best-preserved fossil hyoid bone, the one from Kebara 2, has the same dimensions as a modern hyoid – specifically 32.4 mm for W1 (Figure 12) – as Figure 13 shows.


Figure 11.  3D biomechanical model of the tongue and its surrounding bones (Buchaillard et al. 2009)

Figure 12.  Relative positions of the hyoid bone, the thyroid, arytenoid, and cricoid cartilages, and the thyroarytenoid muscle, which constitutes the vocal folds (distances and standard deviations in mm). H1, height of the hyoid bone body: 12.3 (1.4); H2, larynx total height: 38.1 (8.2); H3, height between hyoid bone superior edge and cricoid lower edge: 63.1 (4.9); W1, hyoid bone width: 29.3 (5.2); W2, thyroid cartilage width: 32.2 (4.3)


Figure 13.  Kebara and modern hyoid bones (Arensburg 1991; Arensburg et al. 1989)
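To make the comparison between the Kebara 2 hyoid and modern hyoids concrete, the W1 values quoted above (32.4 mm for Kebara 2; 29.3 mm with a standard deviation of 5.2 mm in Figure 12) can be expressed as a standardized difference. This is an illustrative sketch added here, not an analysis reported in the chapter; the function name is hypothetical, and treating the Figure 12 values as a modern reference distribution is an assumption.

```python
def standardized_difference(value: float, mean: float, sd: float) -> float:
    """Distance of a value from a reference mean, in units of standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Values quoted in the text and in the caption of Figure 12 (mm).
modern_w1_mean_mm = 29.3
modern_w1_sd_mm = 5.2
kebara2_w1_mm = 32.4

z_score = standardized_difference(kebara2_w1_mm, modern_w1_mean_mm, modern_w1_sd_mm)
within_one_sd = abs(z_score) <= 1.0
print(f"z = {z_score:.2f}, within one SD of the modern mean: {within_one_sd}")
```

On these figures the Kebara 2 width lies less than one standard deviation above the modern mean, which is consistent with the statement that it falls within the modern range.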

5.6  The hyoido-laryngeal space

The outer limit of the lips depends on the alveolar rims of the maxilla and mandible, while the pharyngeal wall is limited in the back region by the anterior edge of the spine. As for the vocal folds, there are limits that allow one to locate them too: the group comprising the hyoid bone (support for the tongue and attachment point for the depressor muscles of the mandible) and the larynx suspended from it is positioned in the neck, below the mandibular plane and above the seventh cervical vertebra, which constitutes the lower limit (Figure 14).

Figure 14.  Bony and cartilaginous limits within which the soft parts of the vocal tract are situated: the lips in front of the incisors, the hard palate, the mandible, the anterior portion of the cervical spine and its seventh vertebra, and the glottis (space between the vocal cords) between the thyroid and cricoid cartilages (based on Pernkopf 1980). [Labels in the figure: prosthion; hard palate; PNS; C1; incisors; infradental; mentum; gonion; hyoid; thyroid; glottis; cricoid; C5; C6; hyoido-laryngeal space; mandible plane: upper limit; cervical vertebrae, pharyngeal wall: posterior limit; C7: lower limit.]


5.7  The cervical spine

To obtain the mean dimensions of the cervical spines of modern humans, we used orthodontic X-rays of 56 adult men and 83 adult women. In general, only vertebrae C1 to C5 are shown, with possibly the top of C6. We chose to use the basion and, in the anterior portion of the spine, the top of C1, the border between C1 and the body of C2, the top and bottom of C2 to C5, and the top of C6 as benchmarks (see Figure 15). The hypothesis that the dimensions of the spine are unchanging was not disconfirmed. Table 1 presents the means (and standard deviations, in mm) of our measurements.

Figure 15.  Landmarks used to measure the height of the cervical spine. [Labels in the figure: basion; C1 to C5; D2 to D5.]

Table 1.  Distances and standard deviations (in mm) between the basion and the top of C1; heights of the bodies of vertebrae C2 to C5 and intervertebral disks D2 to D5. The distances are measured on the anterior face of the vertebrae

Distances (mm)      Men           Women
Basion–C1            7.2 (2.0)     6.0 (1.5)
C1                  17.1 (2.3)    14.7 (2.2)
C2 (body)           23.7 (2.4)    21.1 (2.3)
D2                   4.6 (1.2)     4.3 (1.0)
C3                  13.8 (1.8)    12.0 (1.2)
D3                   5.1 (1.2)     4.8 (1.1)
C4                  12.8 (2.1)    11.2 (1.4)
D4                   4.7 (1.1)     4.6 (1.0)
C5                  12.3 (2.0)    10.6 (1.2)
D5                   5.1 (1.2)     4.8 (1.1)
C1–C5 (bottom)      94.1 (8.0)    83.4 (5.9)
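As a consistency check on Table 1, the mean segment heights can be summed and compared with the reported C1–C5 (bottom) span. The sketch below assumes that this span runs from the top of C1 to the bottom of C5, so that it includes C1, the body of C2, D2, C3, D3, C4, D4 and C5 but excludes Basion–C1 and D5; that reading is inferred from the numbers and is not stated in the text. Variable names are hypothetical.

```python
# Mean heights in mm from Table 1, as (men, women), from the top of C1 down to C5.
segments_mm = {
    "C1": (17.1, 14.7),
    "C2 (body)": (23.7, 21.1),
    "D2": (4.6, 4.3),
    "C3": (13.8, 12.0),
    "D3": (5.1, 4.8),
    "C4": (12.8, 11.2),
    "D4": (4.7, 4.6),
    "C5": (12.3, 10.6),
}

# Reported C1–C5 (bottom) means from Table 1.
reported_span_mm = (94.1, 83.4)

men_total = sum(men for men, _ in segments_mm.values())
women_total = sum(women for _, women in segments_mm.values())

print(f"men:   summed {men_total:.1f} mm, reported {reported_span_mm[0]} mm")
print(f"women: summed {women_total:.1f} mm, reported {reported_span_mm[1]} mm")
```

Under this assumption the summed means agree with the reported spans to within about 0.1 mm, a discrepancy small enough to be explained by rounding of the individual means.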

Our results correspond well to previously published findings (Danforth 1930; Gilad & Nissan 1985; Katz et al. 1975; Kosif et al. 2007; Liguoro et al. 1994). Males have greater vertebral body heights, C3 through C7, than females (p