The Evolution of Speech and Language

Dr. Philip Lieberman and Dr. Robert C. McCarthy The Evolution of Speech and Language

SpringerReference

The Evolution of Speech and Language Abstract Human speech, language, and cognition derive from anatomy and neural mechanisms that have been shaped by the Darwinian process of natural selection acting on variation but that have roots present in other living species. Language did not suddenly arise 50,000-100,000 years ago through a mutation that yielded an innate "faculty of language," nor does the human brain include an organ devoted to language and language alone. Broca's area is not the center of language. Neural circuits linking local activity in different neural structures regulate complex behaviors. Neural circuits that were present in early mammal-like reptiles play a part in regulating laryngeal phonation, conveying both referential information and emotion. Speech plays a central role, enabling transmission of information at a rate that exceeds the auditory fusion frequency. The unique human tongue enhances the robustness of speech, but Neanderthals and other archaic hominins whose neck and skull proportions preclude their having an adult-like human tongue nevertheless could talk. Comparative studies of present-day apes suggest that hominin "protolanguage" lacking syntax never existed. The neural bases of human language are not domain-specific - in other words, they are not devoted to language alone. Mutations on the FOXP2 transcriptional gene shared by humans, Neanderthals, and at least one other archaic species enhanced synaptic plasticity in cortical-basal ganglia circuits that are implicated in motor behavior, cognitive flexibility, language, and associative learning. A selective sweep occurred about 200,000 years ago on a unique human version of this gene. Other transcriptional genes appear to be implicated in enhancing cortical-basal ganglia and other neural circuits.

Introduction The findings of studies of the communicative capacities of living species, the biological bases of human language including recent advances in neuroscience and genetics, and the archaeological record suggest that human language has a long evolutionary history. Human language makes use of anatomical structures and brain mechanisms that have deep evolutionary roots. The evolutionary framework proposed by Charles Darwin appears to account for human language, which is linked to the biological bases of other aspects of human behavior including cognition and motor control. Aspects of human anatomy and brain mechanisms that were initially adapted to serve other ends were modified to confer human linguistic capacities. Natural selection acted on mutations that were useful in the Darwinian "struggle for existence," entailing the survival of progeny in particular ecosystems. For hominins, this includes their culture. Thus, culture, defined in a broad sense, must be taken into account. A wide range of independent studies will also be considered. These include comparative studies of the behavior, morphology, and neuroanatomy of other species. Traditional "experiments in nature," studies of the behavioral deficits of human subjects arising from trauma, strokes, and neurodegenerative diseases, have yielded insights on the neural bases of motor control, language, cognition, and emotional regulation. The findings and implications of these studies will be reviewed. The findings of neuroimaging techniques such as functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) have yielded further insights on both the nature and evolution of the brain bases of language and cognition. Recent genetic studies such as ones comparing the DNA of apes, humans, and extinct hominins provide a fresh starting point for understanding how the human brain was shaped.
They point to mutations on transcriptional factors and selective sweeps in the last 500 Ka acting on humans, Neanderthals, and other hominin species. A unique human mutation and selective sweep about 250 Ka may have yielded current human cognitive and linguistic capacities. The archaeological record cannot, in itself, serve as an index of cognitive or linguistic ability, but it has yielded valuable insights that will be taken into account. Although it is probable that earlier forms of hominin language lacked many of the characteristics of present-day languages, many proposals concerning the precise form of language at any period can neither be verified nor refuted owing to the inherent impossibility of observing behavior in prehistory. However, it is possible to rule out some proposals, such as a "protolanguage" that had words but no syntax (Bickerton 1990). The fact that present-day apes raised in a language-using environment can master simple syntax using sign language or manual phonetic systems rules out this possibility. The evolution of the human tongue, which will be discussed here, likewise rules out Neanderthal language being limited to humming (Mithen 2005). Chomsky's (2012) claim that hominins lacked any form of language until 50 Ka is

http://www.springerreference.com/index/chapterdbid/363355 © Springer-Verlag Berlin Heidelberg 2014

3 Mar 2014 13:29 1

ruled out by the findings of the studies reviewed here. Chomsky's (1972, 2012) dismissal of natural selection having a significant role in biological evolution also is ruled out by this evidence. Selective sweeps on mutations on transcriptional genes enhanced cortical-basal ganglia circuits implicated in language, cognition, and motor control (Enard et al. 2002; Reimers-Kipping et al. 2011; Lieberman 2013; Maricic et al. 2013).

The Evolutionary Framework Charles Darwin in 1859 could not have imagined the progress achieved in our understanding of the biological bases of human language and their evolution. Although our understanding is imperfect - no one can state with certainty how biological brains work - current findings affirm that the agents for the evolution of language are those proposed by Darwin: natural selection, whereby "…any variation, however slight and from whatever cause proceeding, if it be in any degree profitable for an individual of any species, in its infinitely complex relations to other organic beings and to external nature, will tend to the preservation of that individual, and will generally be inherited by its offspring. The offspring, also, will thus have a better chance of surviving…." (p. 61), and the "…fact that an organ originally constructed for one purpose…may be converted into one for a wholly different purpose…." (p. 190). Only humans possess language, but its evolution does not appear to involve any singular, uniquely human, evolutionary process. Despite what I view as claims to the contrary by Noam Chomsky and his colleagues, the evolution of the specialized anatomy and neural substrates that confer language derives from these Darwinian mechanisms, as is the case for the specialized capabilities of other species, for example, butterflies or anteaters. Moreover, comparative studies and insights from genetics and neurophysiology show that language has roots that can be traced back in time to extinct species, as is the case for other aspects of human behavior. Some aspects of human language can be observed in other living species. It is also becoming apparent that elements of the neural substrate that confer language are involved in other aspects of cognitive and motor behavior.
Current research shows that genes that play critical roles in the development of the muscles, lungs, and brains of other species have been modified through the process of natural selection to enhance human cognitive and linguistic capacities. The seemingly intractable problem in tracing the evolution of human language is the extinction of the hominin species that represent key intermediate stages in the evolution of language. However, the fossil and archaeological records and current knowledge of the biological bases of human language allow us to rule out implausible scenarios and permit reasonable inferences about the evolution of language.

Communication and Cognition Human language serves as a medium both of thought and communication, and specialized anatomy has evolved to enhance the robustness of human speech, the default modality by which humans communicate using language. Studies of the functional architecture of the brains of primates and other living species show that activity in different parts of the brain linked in neural circuits generally is necessary to carry out a "complex" behavior, whether that is picking up an object, comprehending the meaning and emotional content of a sentence, or changing the direction of a thought process (e.g., Alexander et al. 1986; Kotz et al. 2003; Monchi et al. 2001, 2006a; Simard et al. 2011; Wang et al. 2005; Postle 2006; Lieberman 2000, 2002, 2006, 2013). The operations performed by the linked neural structures' circuits do not, in general, appear to be domain-specific. Neural circuits that are active during cognitive tasks - including arithmetic, sorting objects according to shape or color, shifting cognitive sets, etc. - are also involved in linguistic tasks such as keeping words in short-term "working memory" and comprehending the meaning of a sentence. Debates as to whether language evolved to serve communication or cognition are thus inherently irresolvable. Since language plays a role in virtually all aspects of human culture, it also is difficult to identify any single aspect of behavior (e.g., social interaction) that was "the" factor that led to the evolution of human language. Moreover, the specific form of a language appears to both reflect the needs of a culture and to a degree affect the thoughts and manner in which the speakers of a language view the world (Everett 2012). However, it is clear that language is the primary medium by which people communicate, transmitting the information that constitutes a culture from generation to generation and sharing thoughts from individual to individual in

every time and place. It is thus improbable that natural selection facilitating communication had no role in the evolution of language (e.g., Chomsky 2012; Fitch 2010).

Comparative Studies Comparative studies of species other than present-day humans show that they possess some aspects of human language to a lesser degree. Humans and chimpanzees share a common ancestor that lived about six million years ago (The Chimpanzee Sequencing and Analysis Consortium 2005). Thus, although living apes have also evolved since that epoch, studies of the communicative capabilities of apes can yield some inferences on the linguistic abilities of extinct hominin species. Any aspect of language that can be mastered by present-day apes most likely was present in the early stages of hominin evolution. It is apparent that culture plays a role in the acquisition of linguistic capabilities. Chimpanzees, if exposed to a language-using environment early in life, can acquire active vocabularies of about 150 words, communicating their needs and observations to humans and to each other (e.g., Gardner and Gardner 1969; Savage-Rumbaugh et al. 1985). The linguistic environment can even be one in which chimpanzees use a version of American Sign Language (ASL) to communicate with each other. The chimpanzee Loulis acquired some ASL proficiency in infancy when he could only observe and participate in ASL communication with other ASL-using chimpanzees. Biological capacities that may not be apparent in one cultural setting can be observed in other environments. No one in 1800, for example, could have thought that cars could routinely pass each other at closing speeds that exceed 350 km/h without massive loss of life. Comparative studies of ape communication show that apes can actively use simple syntax and comprehend spoken words and simple spoken sentences. It thus is improbable that any "protolanguage" lacking syntax ever existed. Other living species can also comprehend spoken words. Some dogs can learn in one trial to associate hundreds of spoken words with specific objects (Kaminski et al. 2004). However, no nonhuman species can talk.
Apes instead use manual sign language and other manual systems to signify words, lending plausibility to the idea that manual gestures played a significant role in the early stages of language evolution (Hewes 1973).

Human Speech Manual gestures, facial expressions, and body language all continue to play a role in human communication, but speech is the default, primary, phonetic modality of language. Sign languages are a comparatively recent invention, dating to the eighteenth century. Speech confers numerous advantages over communication by means of manual gestures, facial expressions, and posture, such as not having to direct one's attention to individuals who are communicating and freeing one's hands when communicating. However, perhaps more importantly, speech allows humans to rapidly transmit information. The rate at which the sounds that convey words are transmitted exceeds the fusion frequency of the auditory system - the rate at which other sounds merge into a meaningless buzz. The process by which this high transmission rate is achieved has been enhanced through the evolution of the species-specific human tongue and the airway above the larynx. The fossil record thus provides a time line for this process as well as for the evolution of the brain bases for speech motor control, language, and some critical elements of human cognitive ability (Lieberman and McCarthy 2007; Lieberman 2007, 2013). Understanding this process requires some prefatory information on the physiology of speech production. The invention of pipe organs in mediaeval Europe shows that some knowledge of the physiology of speech production was present. In a pipe organ, a source of acoustic energy with a wide frequency spectrum is filtered by pipes that allow energy to pass through them in narrow ranges of frequency, producing particular musical notes. Johannes Muller systematically described the physiology of speech production in 1848. Fig. 1 presents a sketch of the anatomy involved in speech production.

Fig. 1 The lungs, larynx, and supralaryngeal vocal tract

The lungs provide the source of energy for speech production. Speech is almost always produced during expiration, reflecting the evolutionary history of the lungs. As Darwin pointed out, the lungs of mammals and other terrestrial species evolved from the swim bladders of fish. Swim bladders enabled fish to hover at a particular depth, thereby conserving energy that would otherwise be necessary to move fins or tails. This is accomplished by storing air extracted from water by gills in elastic sacs, which adjust the fish's body volume so that the water displaced at a given depth matches its weight. Human lungs retain this elastic property, reflecting the opportunistic, proximate, logic of evolution. During quiet inspiration, the diaphragm and intercostal and abdominal muscles expand the lungs. The elastic recoil of the lungs then provides the force that expels air during expiration. The duration of inspiration and expiration is almost equal. Since the elastic lungs act in a manner analogous to a rubber balloon, the alveolar (lung) air pressure is at a maximum at the start of each expiration and falls linearly as the volume of the elastic lung sacs falls. The alveolar pressure of the outgoing flow of air impinges on the vocal cords of the larynx. The pattern of activity during speech and singing is quite different. The diaphragm is immobilized. The duration of expiration is keyed to the length of the sentence that a speaker produces, and alveolar air pressure is maintained at an almost uniform level until the end of expiration. This entails the speaker issuing a set of instructions to the intercostal and abdominal muscles to "hold back" against the elastic recoil force, which is high at the start of the expiration and gradually falls.
The intercostal and abdominal muscles contain muscle "spindles" that monitor the force that they produce (Bouhuys 1974). The diaphragm contains few spindles, which accounts for its taking no part during speech and singing. A speaker must anticipate the length of the sentence that he or she intends to produce, generally taking in more air before the start of a long sentence. Since lung volume is higher before the start of a long sentence, the holdback maneuvers of the intercostal and abdominal muscles must take the higher elastic recoil force into account to achieve a relatively level alveolar air pressure during the sentence (Bouhuys 1974). In his experiments, Johannes Muller found that the rate at which the vocal cords open and close depends on (1) the tension of the muscles that make up the vocal cords, which are complex structures made up of muscles, cartilage, and

other tissue, and (2) alveolar air pressure. The fundamental frequency of phonation (F0) is determined by the rate at which the vocal cords open and close. It is necessary that alveolar air pressure be regulated during speech, as otherwise the initial high alveolar air pressure would blow apart the vocal cords, or the fundamental frequency of phonation would start at a very high F0 and rapidly decrease throughout the sentence. This F0 pattern generally does not occur during speech. For most declarative sentences in languages such as English, F0 is more or less level except for momentary controlled peaks that signal emphasis and a sharp decline at the sentence's end (Armstrong and Ward 1926; Lieberman 1967). Aerostatic and aerodynamic forces and muscle tension acting on the vocal cords of the larynx in all mammals and anurans generate and modulate phonation in much the same manner. However, as Victor Negus (1949) pointed out, the larynges of species that rely on vocal communication, including humans, have been adapted to facilitate phonation at the expense of protecting the lungs from the intrusion of water and reducing the rate at which air can be transferred into the lungs during inspiration. The meaningful calls of many species are differentiated by F0 contours, and F0 variations convey emotional information in all known human cultures and languages. But the fundamental frequency of phonation also conveys linguistic distinctions beyond signaling the end of sentence-like segments and yes-no questions, which in English and other languages have rising or level sentence-end F0 contours (Armstrong and Ward 1926; Pike 1945). Local modulation of F0 contours differentiates words in tone languages such as the Chinese languages (Tseng 1981). Many independent studies show that primates signal referential information by means of calls that have different F0 contours (e.g., Cheney and Seyfarth 1990).
This again points out the implausibility of any stage in early hominin communication that relied exclusively on manual gestures.
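
The pressure regulation described earlier in this section can be illustrated numerically. The following is a minimal sketch, not a physiological model: it assumes, as the text states, that elastic recoil pressure falls linearly during expiration, and it computes how much of that pressure would have to be opposed to keep alveolar pressure level. The function names and all pressure values are hypothetical, chosen only for illustration.

```python
def recoil_pressure(t, duration, p_start, p_end):
    """Elastic recoil pressure at time t, falling linearly from p_start to p_end.

    Units are arbitrary; the linear fall is the simplification stated in the text.
    """
    fraction = t / duration
    return p_start + (p_end - p_start) * fraction


def holdback(t, duration, p_start, p_end, p_target):
    """Pressure the muscles would need to oppose to hold alveolar pressure at p_target.

    Returns 0.0 once recoil no longer exceeds the target; what happens after
    that point is outside the scope of this sketch.
    """
    excess = recoil_pressure(t, duration, p_start, p_end) - p_target
    return max(0.0, excess)


if __name__ == "__main__":
    # Hypothetical 2-second sentence; recoil falls from 20 to 8 (arbitrary units),
    # while the level pressure wanted for speech is 8.
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, holdback(t, 2.0, 20.0, 8.0, 8.0))
```

In this sketch the required holdback is largest at the start of the sentence and shrinks as the lungs deflate. A larger breath before a long sentence would correspond to a higher starting value and therefore a larger initial holdback.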

The Supralaryngeal Vocal Tract and Encoding The larynx, however, is not the key anatomical structure involved in speech production. Its primary role is that of a transducer; the larynx converts the relatively slow flow of air out of the lungs into phonation. As the vocal cords open and close, an almost periodic series of "puffs" of air that contain acoustic energy at audible frequencies enters the airway above the vocal cords, the supralaryngeal vocal tract (SVT). The acoustic energy generated by the larynx is the "source" of energy for phonated sounds such as the vowels and initial consonants of the words bit and map. Acoustic energy occurs at the fundamental frequency of phonation and at its harmonics, which are integral multiples of F0. Movable type is often used as a metaphor to describe the speech signal; speakers ostensibly string together phonemes, segmental sounds approximated by the letters of the alphabet, to form words. The phonemes /t/, /a/, and /b/, for example, can be rearranged to form the words tab or bat. This, however, is not the case. In the 1960s attempts were made to build devices that would string together phonemes to produce comprehensible speech. The first step appeared to be the isolation of phonemes. It was thought that it would be possible to isolate phonemes, equivalent to the letters of the alphabet, from tape recordings. When a person spoke the word too, there hypothetically should be a segment of tape that contained the phoneme /t/ before a segment of tape that contained the phoneme or sound [u] (the phonetic symbol for the actual sound of the vowel of the word too). However, much to everyone's surprise, it proved to be impossible to isolate sounds that corresponded to the hypothetical phonemes that formed words. When the segment of recording tape that was supposed to correspond to the phoneme /t/ in the word too was isolated and linked to the vowel [i] segmented from the word tea, the result was incomprehensible.
The reason for this phenomenon soon became evident - the position assumed by the lips, tongue, jaw, and larynx for the phoneme /t/ is affected by those necessary to produce the vowel [u] in a different manner than the vowel [i] yielding different overall "encoded" formant frequency patterns for the words too and tea. It was impossible to segment a "pure" [t] or for that matter any consonant or vowel. Encoding is a general effect. For example, when producing a word such as bit, the position of the tongue, jaw, lips, and larynx which determine the formant frequencies of [b] must move to the different positions necessary to produce [I] (the vowel of this word) and then to produce the [t]. As they move, albeit rapidly, there must be transitions between the SVT shapes of each of the phonemes. Speech allows humans to transmit information at a rate that exceeds that of any other acoustic signal by means of a complex perceptual process. Research at Haskins Laboratories in the 1960s first showed that speech perception involves listeners paying attention to the encoded, melded formant frequency pattern that conveys the entire word. The Haskins research program attempted to devise a machine for blind persons that would transform printed texts into speech. They found that it was impossible to find any acoustic signals that would correspond to isolated phonemes. Moreover, when any other nonspeech acoustic signals or mixed tactile/acoustic signals were used to transmit the segmental roughly phonemic representation of words, it was necessary to transmit about 20 phonemes per second to approximate a normal speaking rate. However, the fusion frequency of the human auditory system is about 15 Hz and the hypothetical phonemes would merge into a meaningless buzz. In short, the perceptual unit to which listeners pay attention is the

encoded word with the consonant-vowel (CV) syllable being the minimal unit. Listeners perceptually decode each word at some internal level, taking account of the constraints of speech production according to the "motor theory of speech perception" (Liberman et al. 1967). The formant frequency patterns of the hypothetical independent phonemes posited by linguists are always melded together into syllables and words. This seeming deficiency explains why speech is the default medium conveying information through the medium of language. The minimal unit in the speech signal is inherently an encoded CV or longer word sequence. It then can be perceptually decoded into sequences of phonemes. At the neural level, phonemes may be motor-control instruction sets - gestures that a person learns that are instantiated in the motor cortex as matrisomes, instructions that guide a set of muscles to perform an act (Sanes et al. 1999). The encoded instruction sets yield the motor acts necessary to produce an entire syllable or word. Segmental un-encoded motor gestures and acoustic cues do not characterize fluent speech. If, for example, a person's lips are viewed when she or he utters the words too and tea (the syllables [tu] and [ti]), it is apparent that the lips are already pursed and projecting to produce the vowel [u] of too at the very start of the syllable. Chinese orthography, which codes words, is a better approximation of the speech signal than alphabetic systems. Successful speech-recognition systems that have been developed since 1967 use algorithms that involve matching the incoming acoustic signal to probable word templates.
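
The rate argument in this section reduces to a simple comparison, sketched here using only the two figures given in the text (about 20 phonemes per second for normal speech and a fusion frequency of about 15 Hz). The function names are hypothetical, and treating fusion as a hard threshold is a simplification made for illustration.

```python
SPEECH_PHONEME_RATE_HZ = 20.0  # approximate phoneme rate of normal speech (from the text)
FUSION_FREQUENCY_HZ = 15.0     # approximate auditory fusion frequency (from the text)


def interval_ms(rate_hz):
    """Time available per item, in milliseconds, at a given presentation rate."""
    return 1000.0 / rate_hz


def would_fuse(rate_hz, fusion_hz=FUSION_FREQUENCY_HZ):
    """True if discrete, unencoded sounds presented at rate_hz exceed the fusion frequency."""
    return rate_hz > fusion_hz


if __name__ == "__main__":
    print(interval_ms(SPEECH_PHONEME_RATE_HZ))  # time per phoneme at the speech rate
    print(would_fuse(SPEECH_PHONEME_RATE_HZ))   # separate sounds at this rate would merge
```

Separate sounds delivered at the speech rate would each occupy less time than the auditory system needs to keep them distinct, which is why the text concludes that listeners must attend to encoded syllables and words and not to a string of isolated segments.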

Fig. 2 The adult human tongue and supralaryngeal vocal tract. Half of the tongue, the "SVTh segment," rests in the oral cavity, half the "SVTv segment" rests in the pharynx

The Supralaryngeal Vocal Tract and Normalization Johannes Muller (1848) realized that the airway above the larynx, the supralaryngeal vocal tract (SVT), played a critical role in speech production in a manner similar to a pipe organ. The note that is heard is the product of the organ pipe acoustically filtering the source of sound energy produced by the air flowing through a constriction at the end of the pipe. The airflow through the constriction produces acoustic energy across a wide range of audible frequencies. The organ pipe reduces the amount of sound energy that passes through it at most frequencies. The frequency at which maximum acoustic energy passes through the organ pipe is perceived as the musical note. The length and shape of the organ pipe determine the musical note. The length and shape of the SVT, which can be thought of as a malleable organ pipe, result in maximum acoustic energy passing through it at formant frequencies that, along with durational cues, are the determinants of individual phonemes. The average formant frequencies of the vowel [i], for example, are 270, 2,300, and 3,000 Hz for the adult males

studied by Peterson and Barney (1952), whereas the formant frequencies of the vowel [u] were 300, 870, and 2,240 Hz. The absolute values of the formant frequencies for any phoneme depend on the length of a speaker's SVT. The average formant frequencies of [i] and [u] of the women in Peterson and Barney's study, who had shorter SVTs, were higher. One of the problems encountered in devising automatic speech-recognition systems is how to take into account the effect of speakers' differing SVT lengths. Since the length of the SVT varies from person to person and during the years of childhood and adolescence for the same individual, the absolute values of the formant frequency pattern vary. For example, the formant frequencies of an [i] would be 1.5 times higher for a child whose SVT was 11.3 cm long than for an adult whose SVT was 17 cm long. Both formant frequency patterns would be perceived as examples of an [i] owing to a speech-specific process of perceptual normalization in which listeners internally estimate the length of a speaker's SVT (Nearey 1978). Listeners can estimate SVT length after hearing a short stretch of speech or by "reverse engineering" a known phrase such as a person saying hello. Nearey (1978) showed that the vowel [i] (of the word see) was an optimal signal for immediate SVT normalization. The problem of SVT normalization was evident in one of the earliest studies aimed at achieving speech recognition by machine. Since different speakers' SVTs differ in length, any successful automatic procedure would have to take this into account. Data from Peterson and Barney's (1952) study pointed to [i] (the vowel of see) and to a lesser extent [u] being optimal acoustic cues for SVT normalization. Words having the form [hVd], such as heed and had, produced by ten different speakers were presented in quasi-random order to a panel of listeners; the listeners had to immediately adjust for different speakers' voices and identify each word.
Out of 10,000 trials, listeners misidentified [i] two times, [u] six times, and other vowels hundreds of times. The speakers in Peterson and Barney's (1952) study spoke different dialects of American English, and some had foreign accents, which led to the supposition that dialect differences were responsible for some of the errors. However, Hillenbrand et al. (1995) reported virtually identical results in a study which eliminated dialect variation and made use of computer-implemented technology that was unavailable in 1952 to analyze the speech signals.
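
The dependence of formant frequencies on SVT length can be sketched with a standard acoustic idealization that is not part of the studies cited above: a uniform tube closed at one end (the larynx) and open at the other (the lips), whose resonances fall at odd multiples of c/4L. Such a tube approximates a neutral vowel, not [i], so the absolute values below are illustrative only; the point is that resonances scale inversely with length. The speed of sound used and the function name are assumptions.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed approximate speed of sound in warm, moist air


def uniform_tube_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    The k-th resonance is (2k - 1) * c / (4 * L), a quarter-wavelength resonator.
    """
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, count + 1)]


if __name__ == "__main__":
    adult = uniform_tube_formants(17.0)   # SVT length for the adult in the text
    child = uniform_tube_formants(11.3)   # SVT length for the child in the text
    print([round(f) for f in adult])
    print([round(f) for f in child])
    # Every resonance scales by the same factor, the inverse ratio of the lengths.
    print([round(fc / fa, 2) for fc, fa in zip(child, adult)])
```

Under this idealization every formant of the shorter tube is higher by the factor 17/11.3, or about 1.5, matching the figure given in the text. A listener or a recognizer that estimates this single scale factor can map both speakers' patterns onto the same vowel.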

The Unique Human Tongue Charles Darwin raised the question of why the human tongue is so peculiar. Darwin noted: The strange fact that every particle of food and drink which we swallow has to pass over the orifice of the trachea, with some risk of falling into the lungs…. (1859, p. 191) In the twentieth century, Victor Negus's studies of comparative anatomy demonstrated both the species-specific nature of the human tongue and the fact that it increased the risk of choking to death on food. Negus concluded that the adult human larynx was carried down into the pharynx because it "is closely apposed to the tongue" (Negus 1949, pp. 25-26). Choking on food remains the fourth leading cause of accidental death in the United States ( http://www.nsc.org/library/report_injury_usa.htm). Negus speculated that the unique shape of the human tongue in some manner facilitated speech communication, compensating for increasing the risk of choking. That supposition has been validated by computer-modeling studies that calculate the range of formant frequencies that could be produced by adult human and nonhuman SVTs (Lieberman et al. 1969, 1972; Lieberman and Crelin 1971; Carre et al. 1995; De Boer 2010). The initial Lieberman et al. (1969) study calculated the formant frequency patterns of the vowels that a rhesus macaque's tongue and SVT could produce. The range of tongue shapes was estimated by taking into account constraints on tongue deformation, which were subsequently confirmed by Takemoto (2001, 2008). The monkey's tongue was positioned as far as possible to produce the SVT configurations used by adult human speakers to yield the "point" vowels [i], [u], and [a]. These vowels delimit the range of vowels used in human languages (Greenberg 1963). The computer-modeling technique showed that the monkey's vowel space did not include these point vowels. Newborn infants have SVTs that are similar to those of nonhuman primates (Negus 1949; Crelin 1969). 
The SVT computer modeling of Lieberman et al. (1972) used similar techniques to model the SVTs of chimpanzees and human newborn infants. Crelin (1969) had published the first comprehensive anatomy of the human newborn. Crelin concluded that the Neanderthal SVT was similar to that of a large human newborn on the basis of the total pattern of morphological similarities between the basicrania of newborns and the adult male Neanderthal specimen from La Chapelle-aux-Saints. Lieberman and Crelin (1971) produced a computer model of the reconstructed La Chapelle 1 specimen using cineradiographic data on newborn infant cry (Truby et al. 1965) and cineradiographic data on adult human speech (Perkell 1969) to guide the jaw, tongue, lip, and laryngeal maneuvers to derive the range of possible vowels. As

http://www.springerreference.com/index/chapterdbid/363355 © Springer-Verlag Berlin Heidelberg 2014

3 Mar 2014 13:29 7

Dr. Philip Lieberman and Dr. Robert C. McCarthy The Evolution of Speech and Language

SpringerReference

Lieberman and Crelin (1971, p. 211) noted, "When we were in doubt as, for example, with respect to the range of variation in the area of the larynx, we used data derived from adult Man that would enhance the phonetic ability of the Neanderthal vocal tract …" At birth in humans, most of the tongue is positioned in the mouth and its shape is flat as is the case for other mammals. The proportion of the tongue in the oral "horizontal" (SVTH) part of the infant oral cavity relative to the part of the tongue in the "vertical" pharynx (SVTV) - SVTH/SVTV - was 1.5 when the larynx was positioned at its lowest point in the "forceful" cries pictured in Truby et al. (1965, pp. 75-78). The human tongue does not attain its adult 1:1 SVTH/SVTV proportions and almost circular posterior midsagittal shape until around 6-8 years of age. The descent and reshaping of the human tongue was determined using cephalometric radiographs of 28 subjects between the ages of 1 month and 14 years. The developmental process by which the species-specific human vocal tract is formed is complex and takes 6-8 years and sometimes as long as 10 years (Lieberman and McCarthy 1999; Lieberman et al. 2001). The length of the oral cavity is first shortened in humans by developmental processes that move the hard palate back on the base of the skull, shortening the nasopharynx (D. Lieberman 2011). The shape and position of the tongue then gradually changes from the newborn tongue, which is flat and is positioned almost entirely in the oral cavity. The human tongue descends down into the pharynx and achieves its posterior rounded contour, carrying the larynx down with it. By 6-8 years of age, SVT H (the horizontal segment) and SVTV (the vertical segment) reach the 1:1 proportion. Data from a longitudinal study of 605 subjects imaged using magnetic resonance imaging (MRI) and computed tomography (CT) are consistent with these developmental studies (Vorparian et al. 2009). 
In contrast, the nonhuman primate tongue is long, rectangular, and positioned primarily in the oral cavity. In fetal development and shortly after birth, the chimpanzee larynx drops slightly owing to an increase in the distance between the larynx and hyoid (Nishimura 2003; Nishimura et al. 2003, 2006, 2008), whereas the human growth pattern involves the descent and shaping of the tongue. Tongue shape and SVTH/SVTV proportions in nonhuman primates remain almost constant from birth onward.

The human tongue's oral and pharyngeal proportions and shape explain why only adult humans can produce the vowels [i], [u], and [a] and why these vowels contribute to the robustness of human vocal communication. Half the tongue (SVTH) is positioned in the oral cavity, and half (SVTV) is positioned in the pharynx. SVTH and SVTV meet at an approximate right angle, owing to the tongue's posterior circular shape. The extrinsic muscles of the tongue, muscles anchored in bone, can move the almost undeformed tongue to create abrupt midpoint ten-to-one discontinuities in the SVT's cross-sectional area.

Stevens's (1972) parallel research explained why the unique human tongue contributed to the robustness of human vocal communication. Stevens showed that only the species-specific human SVT can produce the ten-to-one midpoint area function discontinuities that are necessary to produce the vowels [i], [u], and [a], which he termed "quantal." Stevens employed both computer modeling using Henke's (1966) algorithm and physical models (wooden tubes that could be shifted to change the position of the 10:1 changes in SVT cross-sectional area). Quantal vowels are perceptually salient owing to the convergence of two formant frequencies, which yields spectral peaks. Their formant frequency patterns do not shift when tongue position varies by about one centimeter around the midpoint. Speakers thus can be sloppy and still produce the "same" vowel.
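The convergence of two formant frequencies at a midpoint discontinuity can be illustrated with a textbook idealization. The sketch below is not Stevens's procedure or Henke's algorithm. It assumes that a large area discontinuity lets the pharyngeal and oral sections be treated as two acoustically uncoupled quarter-wavelength tubes, as in an [a]-like shape, and it uses an assumed sound speed of 35,000 cm/s and a hypothetical 17 cm tract. All function names are illustrative.

```python
# Idealized, uncoupled two-tube sketch of an [a]-like vocal tract shape.
# Assumptions (not from the source): sound speed, 17 cm tract length,
# and complete acoustic decoupling of the two sections.
SPEED_OF_SOUND_CM_S = 35_000.0


def quarter_wave_resonances(length_cm, max_hz=4000.0):
    """Resonances (Hz) of a tube closed at one end and open at the other."""
    resonances = []
    n = 1
    while True:
        freq = (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        if freq > max_hz:
            return resonances
        resonances.append(freq)
        n += 1


def two_tube_formants(back_cm, front_cm, max_hz=4000.0):
    """Pool the resonances of the back (pharyngeal) and front (oral) tubes."""
    pooled = quarter_wave_resonances(back_cm, max_hz)
    pooled += quarter_wave_resonances(front_cm, max_hz)
    return sorted(pooled)


def lowest_two(total_cm, boundary_cm):
    """Two lowest resonances when the area discontinuity sits at boundary_cm."""
    formants = two_tube_formants(boundary_cm, total_cm - boundary_cm)
    return formants[0], formants[1]


if __name__ == "__main__":
    total = 17.0  # hypothetical adult tract length in cm
    for boundary in (6.5, 7.5, 8.5, 9.5, 10.5):
        f1, f2 = lowest_two(total, boundary)
        print(f"boundary at {boundary:4.1f} cm: {f1:7.1f} Hz, {f2:7.1f} Hz")
```

In this simplified picture the two lowest resonances come closest together when the discontinuity sits at the midpoint of the tract. The uncoupled model shows only that convergence. The stability of the formant pattern under small shifts of the discontinuity depends on acoustic coupling between the sections, which this sketch deliberately leaves out.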
Nearey (1978) subsequently showed that the vowel [i] is an optimal signal for determining the length of a speaker's vocal tract - a necessary step in the complex process of recovering the linguistic content from the acoustic signals that convey speech. Whereas the identical formant frequency pattern can represent two different vowels for speakers who have different SVT lengths, no such overlap occurs for tokens of [i].

Independent computer-modeling studies carried out by Lieberman and Crelin (1971), Lieberman et al. (1972), Stevens (1972), Carre et al. (1995), and De Boer (2010) have reached similar conclusions. Carre and his colleagues used a technique that directed the computer model to produce a vocal tract that could produce the full range of formant frequencies of human speech by modifying a nonhuman SVT. The system "grew" a pharynx equal in length to its oral segment. Carre et al. (1995) concluded that in order to produce the full phonetic range of human speech, "a vocal tract must have independently controllable oral and pharyngeal cavities nearly equal in length."

The computer-modeling studies of Boë and colleagues (e.g., Boë et al. 2002) have disputed these findings. They claim that Neanderthal and newborn human SVTs have the same phonetic potential as ones having the oral and pharyngeal proportions of human adults. However, their computer model inherently produces irrelevant results because it distorts any SVT into the proportions of a human adult SVT (De Boer and Fitch 2010; Lieberman 2012). A straight plastic mailing tube would also assume the shape and proportions of an adult human tongue during speech using the computer-modeling procedure employed by the Boë studies.

The Linguistic Capacities of Living Nonhuman Species

Syntax and Semantics

The linguistic capacities of other species are seemingly so limited compared to those of humans that they can be ignored. Nonetheless, they can provide insights into the evolution of the different aspects of human language and the role of culture. As noted above, apes raised in environments in which sign language and other manual phonetic means are used can acquire vocabularies of about 150 words. They can also expand the semantic referents of words. The chimpanzee Washoe, for example, used the word dirty on her own as an epithet. She had previously heard the word used by the Gardner team only to refer to her soiling herself. The limits on the passive vocabulary of chimpanzees have not been determined, but some dogs can learn at least the primary meanings of 200 words (Kaminski et al. 2004). It is evident that, when exposed to a rich human linguistic environment, other living species have the capacity to learn some aspects of human language.

Conversely, humans raised in an extremely deprived linguistic/social environment do not appear to develop either linguistic or cognitive proficiency. Genie, a child locked in a room and virtually deprived of human contact until puberty, failed to develop either normal language or cognitive ability despite intensive therapy (Curtiss 1977). Other less well-documented cases of feral children suggest a similar interaction between the environment and "inherent" biological capacities.

Vocal Tract Normalization and the Range of Speech

The neural basis for vocal tract normalization appears to be genetically transmitted in humans since other species make use of this ability for another purpose - estimating the size of conspecifics and other species from their vocalizations (Fitch 2000). However, other living species cannot talk. One phonetic limit derives from their tongues. In contrast to the human tongue, the tongues of animals always remain anchored in their mouths during vocalization. Deer, for example, can transiently lower the larynx by increasing the distance between the hyoid bone and larynx. The lowered formant frequencies serve to signal to conspecifics that the animal is larger than it actually is. However, the deer cannot change the shape of their SVT to produce quantal speech sounds. This is apparent in the McElligott et al. (2006) study that synchronized audio and video recordings of mature groaning fallow bucks. Their acoustic analyses (e.g., Fig. 1, p. 342) show no change in vowel quality. The deer instead produces the same schwa-like vowel with gradually falling formant frequencies. Cineradiographs of other mammals vocalizing in Fitch (2000) again show that though transient larynx lowering occurs, the animals cannot produce the shapes necessary to produce quantal vowels because the tongue is still positioned in the animals' mouths. Thus, the dynamic articulatory maneuvers executed by animals discussed in some detail in Fitch (2010, pp. 315-320) do not increase the phonetic range of their vocalizations.

Fitch is correct when he notes that the key to the evolution of the human SVT involves the descent of the tongue, not the larynx, which is carried down into the throat as the tongue moves down into the pharynx and is reshaped. However, Fitch overlooks the fact that the SVT shapes that are necessary to produce quantal vowels involve movements of the reshaped human tongue as a whole in the right-angle space formed at the junction of the oral cavity and pharynx. Thus, the undocumented "extinct hominid" discussed by Fitch (2010, p. 318), which has a "flat" nonhuman primate-like tongue anchored in the oral cavity instead of a rounded, human tongue with equal SVTH and SVTV segments, could not have produced quantal vowels. Surprisingly, De Boer and Fitch (2010, p. 41), in their discussion of the vocal tracts of humans and other species, note that the unique attributes of the human tongue and SVT are adaptations for enhancing the robustness of speech communication.
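The effect of transient larynx lowering described above, lower formants without a change in vowel quality, can be illustrated with the standard uniform-tube approximation of a schwa-like vowel. The following is a minimal sketch under stated assumptions: the tract is treated as a uniform tube closed at the glottis and open at the lips, the sound speed of 35,000 cm/s is assumed, and the tube lengths are hypothetical values, not measurements from deer or any other animal.

```python
# Uniform-tube (schwa-like) approximation: F_n = (2n - 1) * c / (4 * L).
# The sound speed and the example lengths are assumptions for illustration.
SPEED_OF_SOUND_CM_S = 35_000.0


def uniform_tube_formants(length_cm, count=3):
    """First `count` resonances (Hz) of a uniform tube closed at one end."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def formant_spacing_ratios(formants):
    """Ratios of each formant to the first; constant for any uniform tube."""
    return [f / formants[0] for f in formants]


if __name__ == "__main__":
    for length in (17.5, 21.0, 25.0):  # hypothetical tract lengths in cm
        formants = uniform_tube_formants(length)
        ratios = formant_spacing_ratios(formants)
        print(length, [round(f) for f in formants], ratios)
```

Lengthening the tube lowers every formant in proportion while the ratios between formants stay at 1:3:5, so the idealized vowel remains schwa-like. A listener, or a conspecific, could read the lower frequencies as a cue to a longer tract, but the phonetic range is not enlarged.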

Speech Would Be Possible, Absent the Human Tongue

It is imperative to point out that speech and language are possible without the ability to produce quantal vowels. Lieberman and Crelin concluded in 1971 that the archaeological record demonstrates that Neanderthals must have possessed spoken language in order to transmit their stone-working technology, and in light of the selective advantage of speech's information transfer rate, it can be argued that Neanderthals undoubtedly talked. The overall error rate for all vowels in the Peterson and Barney (1952) study was low, only 4.6%. However, the incremental difference in speech intelligibility that derives from being able to produce quantal vowels is only one example of the fact that small differences drive natural selection. Evolution for adult lactose tolerance, for example, has occurred independently on different genes. It is possible to survive, absent this capacity. The selective advantage was an incremental increase in the food supply (Tishkoff et al. 2007). As Darwin (1859, p. 61) put it:

any variation, however slight and from whatever cause proceeding, if it be in any degree profitable to an individual of any species, in its infinitely complex relation to other organic beings and to external nature, will tend to the preservation of that individual, and will generally be inherited by its offspring. The offspring, also, will thus have a better chance of surviving … I have called this principle, by which each small variation if useful is preserved by the term of Natural selection.

The Fossil Record of Evolution of the Human Tongue and SVT

Given the advantages of the species-specific vocal tract for speech production, it is of great interest to determine the sizes and shapes of hominin vocal tracts. However, this presents challenges since the hyoid bone is the only component of the SVT that fossilizes, so that indirect approaches are necessary to infer the size and shape of the tongue, larynx, and other soft tissues. Researchers have focused on the bones that directly border the vocal tract - the hyoid, basicranium, mandible, cervical vertebrae, clavicles, and sternum - since these bones constrain the size and shape of the SVT. In addition, researchers have used the comparative method to reconstruct characteristics of the vocal tract in hominins.

Hyoid Bone and Larynx

Eight hyoid bones make up the hominin fossil record (see Table 1). Capasso et al. (2008, p. 1007) argued that the lack of muscular impressions on a ~400-ky-old hyoid body from Italy indicates a "reduced capability for elevating this hyoid bone and modulating the length of the vocal tract," whereas Martínez et al. (2008) and Rosas et al. (2006) argued that the archaic hyoid bodies from Sima de los Huesos and El Sidrón, Spain, look nearly identical to those for fully modern humans. Arensburg and colleagues (1989, 1990) argued that the Neanderthal hyoid bone from Kebara, Israel, looks essentially modern, although several of its dimensions fall outside the range of variation of modern humans (Lieberman 1993, 1994; Arensburg 1994). However, it is unclear that there is any relationship between hyoid morphology and SVT length and shape.

Table 1 Hyoid bones in the fossil record

| Element | Species | Age (m.y.) | Site | References |
|---|---|---|---|---|
| Body | Australopithecus afarensis | 3.3 | Dikika, Ethiopia | Alemseged et al. (2006) |
| Bodies | Homo heidelbergensis | 0.60 (b) | Sima de los Huesos, Spain | Martínez et al. (2008) |
| Body | Homo heidelbergensis (a) | 0.40 | Castel di Guido, Italy | Capasso et al. (2008) |
| Bodies | Homo neanderthalensis | 0.043 | El Sidrón, Spain | Rosas et al. (2006) |
| Hyoid | Homo neanderthalensis | 0.060 | Kebara, Israel | Arensburg et al. (1989) |
| Hyoid | Homo sapiens | 0.019 | Ohalo II, Israel | Hershkovitz et al. (1995) |

(a) Capasso et al. (2008) attributed this hyoid to Homo erectus
(b) Date from Bischoff et al. (2007)

The hyoid bone can, however, provide concrete evidence about the presence or absence of air sacs. A hyoid body for a juvenile Australopithecus afarensis specimen from Dikika, Ethiopia, exhibits a scalloped dorsal surface identical in morphology to the hyoid bodies of great apes that have laryngeal air sacs (Alemseged et al. 2006). This indicates that loss of air sacs occurred sometime between 3.3 Ma and 600 ky (De Boer 2012). In nonhuman primates, laryngeal air sacs are thought to recycle air from the lungs during long vocalizations (Hewitt et al. 2000) and enhance the impression of the vocalizer's size (De Boer 2009), but their presence limits the ability to produce distinctive speech by reducing the perceptual effects of vowels (De Boer 2009, 2012).

The larynx does not fossilize. However, it is possible to say something about its spatial relationships in hominins using comparative data from nonhuman primates. In humans, great apes, and lar gibbons, the thyroid cartilage is separated from the hyoid bone with the lateral thyrohyoid ligament and triteceum cartilage intervening between the two, which allows the thyroid cartilage to move independently of the hyoid (Bibby and Preston 1981; Nishimura 2003). This is different from the hyolaryngeal configuration in monkeys and agile gibbons, where the hyoid bone and thyroid cartilage overlap. Nishimura et al. (2003, 2006, 2008) showed that the thyroid cartilage descends relative to the hyoid prior to 1 year of age in chimpanzees, much the same as it does in modern humans. It is therefore likely that descent of the thyroid cartilage relative to the hyoid bone is the first step in a multistage process of laryngeal descent that did not necessarily have anything to do with vocalization (Nishimura et al. 2006). Parsimony would suggest that a thyroid cartilage descended relative to the hyoid and the presence of a lateral thyrohyoid ligament and triteceum cartilage were characteristics exhibited by the last common ancestor of chimpanzees and hominins.

Basicranium

The bones of the palate, vomer, sphenoid, and basioccipital (for the purposes of this paper, the "basicranium") form the roof of the SVT. Several lines of research used the basicranium to reconstruct SVT size and shape in hominin specimens. As noted previously, the first such attempt was by Lieberman and Crelin (1971) and Lieberman et al. (1972), who noted that the "unflexed" basicranium of the adult Neanderthal specimen La Chapelle 1 resembled the basicranium of a human infant or a nonhuman primate. These researchers reconstructed La Chapelle 1 as having a superiorly positioned hyoid and larynx, long SVTH, and unequal SVTH:SVTV ratio. Laitman and Crelin (1976) provided additional justification for this approach, noting that the basioccipital rotates ventrally during early postnatal development, bringing the suprahyoid muscles into a more inferior position and contributing to hyoid and larynx descent. By analogy, fossil hominins with "flexed" basicrania were thought to have low hyoids and larynges in a modern-humanlike configuration, whereas hominins with "unflexed" basicrania were thought to have high hyoids and larynges and an infant human- or nonhuman primate-like tongue and SVT.

Laitman and colleagues characterized flexion using a "cranial baseline" spanning the palate, nasopharynx, sphenoid, and basioccipital and reconstructed nonhuman primate-like vocal tracts for paranthropiths, australopiths, and some archaic Homo and modern humanlike vocal tracts for OH 12 (Homo habilis), Kabwe and Steinheim (archaic Homo), Skhul 5, and late Pleistocene H. sapiens (Crelin 1973; Laitman et al. 1978, 1979; Laitman and Heimbuch 1982). The statistical procedure used to characterize the cranial baseline took into account the length of the oral cavity and nasopharynx and the angles formed by the basioccipital, vomer, and palate relative to one another.

Other researchers have also relied on the basicranium to make inferences about SVT size and shape. Budil (1994) used the relationship between flexion and orientation of the styloid process in Petralona (archaic Homo) to infer a low hyoid and larynx position for this specimen. Duchin (1990) noted that H. sapiens, Neanderthals, and H. erectus have similarly shaped oral cavities, suggesting that fibers of three muscles (genioglossus, palatoglossus, mylohyoid) important for movements of the tongue had muscle fiber orientations similar to those of modern humans.

The use of the basicranial angle for reconstructing SVT size and shape has been criticized for a number of reasons. First, it has been noted that many normal human populations have unflexed basicrania, like those exhibited by Neanderthals, but normal vocal tracts (Kean and Houghton 1982; Houghton 1993; Frayer and Nicolay 2000). Carlisle and Siegel (1974, 1978) and Falk (1975) noted that the hyoid in Lieberman and Crelin's (1971) reconstruction was positioned higher relative to the mandible's inferior border than is normal in chimpanzees. Lieberman and McCarthy (1999) showed that basicranial flexion and laryngeal descent occur at two entirely different times during ontogeny, suggesting that any effect flexion has on laryngeal descent must be indirect. Although there may be no direct relationship between basicranial flexion, craniofacial shortening, and laryngeal descent, other studies note that the modern human naso- and oropharynx do not leave much room for the hyoid and larynx near the skull base, and space limitations are such that normal increases in size of the adenoids, tonsils, and other lymphatic tissues during childhood often threaten the airway (Tourné 1991).

As noted above, Laitman and colleagues characterized various skulls as either "flexed" or "unflexed" using a cranial baseline. However, these terms do not just refer to flexion of the cranial base in a narrow sense, but to the position of the palate relative to the nasopharynx and basioccipital on the underside of the cranium.
The use of terminology most often employed for referring to the orientation of the basioccipital, sphenoid, frontal, and ethmoid bones relative to one another on the inner surface (brain side) of the cranium (see Lieberman and McCarthy 1999) was unfortunate, as it has been a perpetual source of misunderstanding. The statistical procedure employed by Laitman et al. (1979) and Laitman and Heimbuch (1982) clustered together hominin specimens with palates angled relative to the nasopharynx and basioccipital.

In retrospect, the real characteristic that determines speech ability is not basicranial flexion, but length of the SVTH, since this dimension cannot be overly long if there is to be a 1:1 proportion. There is not necessarily a link between SVT dimensions and flexion of the basicranium as measured by the cranial baseline. Modern humans are characterized by a large, globular brain and cranial vault; short, retracted palate; short nasopharynx; and flexed basioccipital (Lieberman et al. 2002; Lieberman 2008; Gunz et al. 2010, 2012). In fully modern humans, these traits are related to a SVT with equal-length horizontal and vertical segments, since the palate, nasopharynx, and basioccipital directly border SVTH. In chimpanzees and other nonhuman primates, a long palate, long nasopharynx, and unflexed basioccipital form the superior border for a long SVTH and short SVTV.

However, there is a wider range of variation in these parameters in fossil hominins. Olduvai Hominid (OH) 12, Kabwe, and Steinheim have been reconstructed with "flexed" basicrania, but each of these specimens has a long nasopharynx and long, projecting face. In other words, previous reconstructions based on basicranial flexion and/or palate length (Crelin 1973; Laitman and Crelin 1976; Laitman et al. 1978, 1979; Duchin 1990; Budil 1994) agree with reconstructions based on SVTH length only insofar as a long nasopharynx and long, projecting face are attributes normally associated with an unflexed basicranium. There is one specimen, Skhul 5, that has an intriguing combination of features in this regard. This specimen has a relatively long palate but a short, fully modern humanlike nasopharynx and flexed basioccipital. However, in this one case, a moderate SVTH that does not fall outside the modern human range is nevertheless incompatible with a 1:1 SVT (see below).

One implication of the above discussion is that one way to rule out certain SVT sizes and shapes is to measure or estimate SVTH. Since modern humans have a 1:1 SVT, this measurement can then be doubled to estimate hominin SVT length. Using this approach, Lieberman and McCarthy (2007) showed that Neanderthal crania are characterized by a long palate and nasopharynx that form the roof of a SVTH that is outside the range of modern humans, and Lieberman and McCarthy (2007) and Granat et al. (2008) reconstructed a SVTH for Skhul 5 that falls within the high end of the fully modern human range. SVTH can be approximated by taking a measurement from prosthion (landmark on the alveolar bone between the central incisors) to the pharyngeal tubercle, which is the point where the pharyngeal constrictors that form the posterior pharyngeal wall attach to the basioccipital, and adding ten millimeters to approximate the anteroposterior length of the lips. Several studies (Boë et al. 2002; Granat et al. 2007) have used basion (landmark on the anterior edge of the foramen magnum) instead of the pharyngeal tubercle to model SVT length. These studies claimed that a modern human-shaped tongue would fit into a Neanderthal head and neck, but did not account for the fact that the posterior pharyngeal wall does not extend all the way back to basion. It is clear that Neanderthals and other archaic hominins have long, projecting faces (Arsuaga et al. 1997; Trinkaus 2003) and that they also would have had long SVTH dimensions.

Mandible

At one time, it was thought that the genial tubercle, which marks the attachment for the genioglossus and geniohyoid muscles of the tongue to the inner border of the mandibular symphysis, provided the "surest anatomical evidence of speech that the skeleton affords" (Hooton 1946, p. 169; see DuBrul and Reed 1960), although its presence and distribution in modern humans is highly variable. In nonhuman primates, the hyoid body is positioned at the mandible's inferior border at or near the gonial angle. As noted above, Lieberman and Crelin (1971) positioned the hyoid body just superior to the mandible's inferior border in their La Chapelle 1 reconstruction, a move which was criticized at the time. However, it is not the inferior border of the mandible that the hyoid must fall below, but instead the mylohyoid line (which marks the attachment area for the mylohyoid muscle to the deep surface of the mandible) and the attachment sites of the other suprahyoid muscles. The hyoid cannot be positioned much higher than this, since the suprahyoid muscles that attach to the hyoid need to act as elevators, not depressors, of the tongue, in order for the tongue to function in swallowing. There is therefore a constraint on the superior position of the hyoid bone relative to the mandible.

Finally, there is some evidence that the resting position of the hyoid may be more superior than previously appreciated in nonhuman primates with normal head and neck postures. The inferior position of the hyoid may be partially an artifact of the manner in which nonhuman primates are positioned for MRI (Nishimura et al. 2008), so that the slightly higher hyoid position reconstructed by Lieberman and Crelin (1971) may fall within a normal range of variation. All hominins, therefore, would have possessed hyoids positioned no higher than the inferior attachment points of the suprahyoid muscles to the mandible's inner border, in close approximation to the mandible's inferior border.

Cervical Vertebrae

In modern humans the hyoid descends below the basicranium, palate, and mandible during ontogeny (Lieberman and McCarthy 1999; Lieberman et al. 2001), but maintains a steady position relative to the cervical vertebrae after 2 years of age. At that age, the superior margin of the hyoid body lies opposite the intervertebral disk between C3 and C4 (Carlsöö and Leijon 1960; Roche and Barkla 1965; Westhorpe 1987). In adults, the hyoid body is positioned as far inferiorly as C4 (King 1952; Bench 1963; Ardran and Kemp 1972), the vocal folds of the larynx lie opposite C5, and the cricoid cartilage lies opposite C6-C7 (Roche and Barkla 1965; Koppel et al. 1968). It is therefore feasible to infer hyoid and larynx position (and, by extension, SVTV) by reconstructing neck length in Neanderthals and other hominins that have preserved cervical vertebrae.

Table 2 shows the ventral heights of cervical vertebrae C3-C7 for two Neanderthals (La Chapelle 1, La Ferrassie 1) and Skhul 5. Ventral heights for each of the Neanderthal specimens fall within the low end of the range of modern human data (Arensburg et al. 1990). However, it is clear that, when the vertebrae are fit together, Neanderthals would have relatively short necks. McCarthy et al. (n.d.) used an algorithm constructed for humans to estimate intervertebral disk heights and reconstructed neck lengths of about 12 cm for the two Neanderthals and 11 cm for Skhul 5. For comparison, Trinkaus (1983) estimated Shanidar 2's neck length at 11.5 cm. As noted above, in modern human adults, the vocal folds of the larynx lie opposite C5. Placing the vocal folds of the larynx in the same position produces a ~8 cm long SVTV for Neanderthals and a 7.5 cm SVTV for Skhul 5, which are close to the averages for modern humans. However, SVTH is so long in these specimens that SVT proportions range between 1.54:1 and 1.60:1.

Table 2 Ventral heights of cervical vertebrae. Data are from Arensburg et al. (1990) and McCown and Keith (1939). Parentheses indicate original authors' estimates based on damaged fossils

| Vertebra # | La Chapelle 1 | La Ferrassie 1 | Skhul 5 | Modern human mean (range) |
|---|---|---|---|---|
| C3 | 11.25 | 12.5 | (10.0) | 14.1 (11.0-17.0) |
| C4 | (10.6) | 12.0 | 8.5 | 13.5 (10.3-16.2) |
| C5 | 11.5 | 11.5 | 9.5 | 12.7 (10.5-15.2) |
| C6 | 12.1 | 12.5 | 10.0 | 12.7 (10.2-15.9) |
| C7 | 13.4 | 13.0 | 12.5 | 14.4 (11.2-16.3) |
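As a rough cross-check on the claim that these necks were short, the Table 2 values can simply be summed for each specimen and compared with the sum of the modern human means. This is only an illustrative sketch. It is not the algorithm of McCarthy et al. (n.d.), it ignores intervertebral disks and the C1-C2 vertebrae, and it treats the authors' parenthesized estimates as ordinary values. The dictionary and function names are hypothetical.

```python
# Sum of the C3-C7 ventral heights from Table 2, in the table's own units.
# Parenthesized (estimated) values in the table are used as given.
VENTRAL_HEIGHTS = {
    "La Chapelle 1": [11.25, 10.6, 11.5, 12.1, 13.4],
    "La Ferrassie 1": [12.5, 12.0, 11.5, 12.5, 13.0],
    "Skhul 5": [10.0, 8.5, 9.5, 10.0, 12.5],
    "Modern human mean": [14.1, 13.5, 12.7, 12.7, 14.4],
}


def summed_height(name):
    """Total C3-C7 ventral height for one column of Table 2."""
    return round(sum(VENTRAL_HEIGHTS[name]), 2)


def fraction_of_modern_mean(name):
    """Summed height relative to the summed modern human means."""
    return summed_height(name) / summed_height("Modern human mean")


if __name__ == "__main__":
    for specimen in VENTRAL_HEIGHTS:
        total = summed_height(specimen)
        share = fraction_of_modern_mean(specimen)
        print(f"{specimen}: {total} ({share:.0%} of modern mean)")
```

Every fossil column sums to less than the modern human mean column, which is consistent with the short reconstructed necks, although a full neck-length estimate also requires the disk heights and upper cervical vertebrae that this sketch omits.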

SVT Reconstructions and Constraints on Hyoid and Larynx Position

A more conservative way to reconstruct SVT shape would be to consider constraints on the position of the hyoid and larynx in the throat. The hyoid bone must be positioned at or below the mandible's inferior border; otherwise, the suprahyoid muscles would not function as elevators of the hyoid during swallowing. Similarly, the hyoid bone cannot lie below the pectoral girdle or else the infrahyoid muscles would not act as depressors of the hyoid, and protraction of the hyoid during swallowing would be impeded by the sternum and clavicles. Taking these constraints into account, the lowest possible position for the hyolaryngeal complex in the three specimens above is for the vocal folds to be at C6, which would put the cricoid cartilage at C7/T1. In this case, La Chapelle 1, La Ferrassie 1, and Skhul 5 would have SVT ratios between 1.31:1 and 1.33:1, still outside the quantal region (see Fig. 3). The anatomical features that result in unequal SVT ratios are different for the Neanderthal specimens and Skhul 5. The two Neanderthal specimens have a long palate, long nasopharynx, and short neck, whereas Skhul 5 has a moderately long palate, short nasopharynx, and short neck.
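The arithmetic behind these proportions is simple enough to sketch. The example below pairs the ~8 cm Neanderthal SVTV quoted in the text with both ends of the reported 1.54:1 to 1.60:1 range. That pairing is an assumption made purely for illustration, since the text does not say which ratio belongs to which specimen, and the function names are hypothetical.

```python
# Illustrative SVT proportion arithmetic; all lengths are in centimeters.

def svt_ratio(svth_cm, svtv_cm):
    """Horizontal-to-vertical proportion, SVTH/SVTV."""
    return svth_cm / svtv_cm


def implied_svth(ratio, svtv_cm):
    """SVTH implied by a reported ratio and a reconstructed SVTV."""
    return ratio * svtv_cm


def extra_svtv_for_equal_segments(svth_cm, svtv_cm):
    """Additional vertical length needed for SVTV to match SVTH (1:1)."""
    return max(0.0, svth_cm - svtv_cm)


if __name__ == "__main__":
    svtv = 8.0  # approximate Neanderthal SVTV from the text
    for ratio in (1.54, 1.60):  # ends of the reported range, paired by assumption
        svth = implied_svth(ratio, svtv)
        extra = extra_svtv_for_equal_segments(svth, svtv)
        print(f"ratio {ratio}: SVTH about {svth:.2f} cm, "
              f"1:1 would need about {extra:.2f} cm more SVTV")
```

Under these assumptions a 1:1 tract would require several additional centimeters of pharyngeal length, which is why a long SVTH cannot be matched by SVTV unless the neck is considerably longer than the reconstructions allow.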

Fig. 3 Position of the hyoid and larynx reconstructed for (a) La Ferrassie 1, a Neanderthal, (b) Skhul 5, and (c) Predmosti 3, a fully modern human associated with Upper Paleolithic tools. The shaded hyoid and larynx in (a) and (b) indicate the position of these structures in a hypothetical 1:1 SVT. A (d) chimpanzee and (e) modern human SVT configuration are provided for comparison

Schedule of Evolutionary Events

The hyoid, basicranium, mandible, and cervical vertebrae all provide different insights into the evolution of the vocal tract and the timing of the appearance of hominin speech abilities. Comparative data on the spatial relationships between the hyoid and larynx (Nishimura 2003), in combination with ontogenetic data for chimpanzees (Nishimura et al. 2003), indicate that descent of the thyroid cartilage relative to the hyoid body in early infancy arose in an ancestor of extant hominoids, probably for reasons unrelated to vocalization.

The origin of a straight bar-like hyoid body sometime between 3.3 My and 600 ky ago indicates the loss of laryngeal air sacs in Homo erectus or one of its predecessors or descendants (De Boer 2009, 2012). In great apes, laryngeal air sacs are thought to recycle air from the lungs during prolonged vocalizations. It is known that the nasal cavity, which is parallel to the SVT along the pathway to a speaker's lips, absorbs acoustic energy. This effect degrades the recovery of formant frequencies, reducing speech intelligibility (Bond 1976). Laryngeal air sacs would have had a similar effect, reducing speech intelligibility if hominin speech communication made use of segmental phonemes differentiated in part by their formant frequency patterns. This perhaps accounts for the disappearance of laryngeal air sacs.

As noted above, in modern humans, intercostal and abdominal muscles innervated by thoracic nerves are recruited during sustained speech to regulate air pressure. Evidence for an expanded thoracic vertebral canal in Neanderthals and modern humans - and perhaps in Homo erectus - may be important in this regard (MacLarnon and Hewitt 2004), although such evidence must remain circumstantial because it is difficult to predict a nerve's cross section from the width of a bony canal, as shown for the hypoglossal canal by DeGusta et al. (1999).
A fully modern humanlike SVT, with equally long horizontal and vertical segments, did not arise until sometime after the appearance of H. sapiens 200 ky ago, as the culmination of three distinct processes: (1) shortening of the nasopharynx, (2) shortening of the face, and (perhaps) (3) slight elongation of the neck. Neanderthals and other archaic hominins had a long palate and nasopharynx, a configuration which is associated with a long SVTH. A long SVTH cannot be paired with a long SVTV, unless Neanderthals had necks that were much longer than those of fully modern H. sapiens. All available evidence suggests that their necks were short, even if they do fall within the range of variation of modern H. sapiens. If one considers the fossil hominins from Mugharet es-Skhul to be anatomically modern H. sapiens, the final two steps in the above schedule occurred after the origin of H. sapiens. However, in light of the biological cost of the modern human tongue - an increased propensity for choking to death - it is apparent that speech was the default medium for hominin


language before the evolution of a fully modern human tongue. Otherwise, there would not have been any selective advantage for the retention of mutations and the selective sweep that resulted in the human tongue's proportions. We can thus conclude that the neural substrate for speech production was present in earlier, extinct hominins.

The Neural Substrate for Speech

Neural Circuits

The neural bases of human speech and language have for almost 200 years been linked to discrete localized structures, organs of the brain. Locationist theories for the brain bases of language derive from early nineteenth century phrenology. Phrenologists proposed that areas of the neocortex - the outermost layer of the human brain - were the seats of complex aspects of human behavior such as piety, language, mathematical ability, and so on (Spurzheim 1815). Each area was devoted to a specific, observable aspect of behavior. Since direct inspection of the cortex in living subjects was not possible, the area of the skull above the location of the hypothetical seat was the metric that supposedly correlated with the degree to which that behavior was manifested by a particular subject. These claims were tested and were found wanting. When the skulls of clerics were studied, some had exceedingly small areas of their skull at the hypothetical seat for piety. Homicidal individuals, conversely, could be found who had large bumps on their skulls supposedly related to piety and trust. Phrenology fell into disfavor, but it never died. Paul Broca had decided that the seat of language was not between a person's eyes as Spurzheim had suggested, but Broca operated within the same paradigm. In 1861 Broca published his study of a stroke victim, patient Tan, whose speech was limited to a syllable that sounded like tan. Broca limited his postmortem observations to the cortical surface of the patient's brain and linked a cortical area that was damaged to the patient's deficits. Thus, Broca's area was born - a cortical area devoted to language and, according to most accounts, language alone. Broca examined a second patient who had similar speech deficits, in this case also noting damage extending into the basal ganglia, but ignored the subcortical damage.
A few neurologists demurred, pointing out that postmortem examinations showed that language and speech were disrupted only when subcortical brain damage was present. It became evident that Broca's area is not the brain's center of language when the brains of both of the patients examined by Broca were imaged more than a century later. The patients' brains had been preserved in alcohol; a high-resolution MRI of Broca's patient Tan shows that he also had massive damage to the basal ganglia, other subcortical structures, and pathways connecting cortical and subcortical neural structures (Dronkers et al. 2007). Moreover, the left inferior frontal gyrus of the brain - the traditional site of Broca's area in textbook illustrations and published studies - was not the cortical area actually damaged in Tan. Damage occurred anterior to it, close to the ventrolateral prefrontal cortex. After the 1970s, brain imaging techniques such as computerized tomography (CT) and magnetic resonance imaging (MRI) allowed the brains of thousands of patients who suffered aphasia to be examined. Patients who had suffered brain damage limited to cortex that spared subcortical brain structures recovered. Conversely, aphasia only resulted when subcortical structures were damaged. Alexander et al. (1987), for example, documented the speech production deficits of patients who had suffered strokes that damaged the basal ganglia and other subcortical structures, but spared the cortex altogether. The current view, expressed by Stuss and Benson in their comprehensive 1986 study, is that aphasia never occurs without subcortical damage. Researchers with very divergent positions on linguistic theory share this view in light of the fact that the ventrolateral prefrontal cortex and Broca's area form parts of basal ganglia circuits implicated in regulating speech and language (Lieberman 2000, 2002; Ullman 2004).
The brain bases for complex motor acts, such as walking or talking, are neural circuits that link local operations performed in different neural structures. While some neural structures, such as those involved in the initial stages of visual perception, are domain-specific, other neural structures perform local operations that constitute elements of different circuits that regulate seemingly unrelated aspects of behavior. Your car makes use of similar functional architecture. If your car won't start, the repair manual will not instruct you to locate the center of starting. The manual instead will point out a set of linked structures that each performs a local operation. The battery, for example, provides electrical power to the starter motor but it also powers the car's lights, radio, computer, etc., through circuits linking it to these devices. The battery is not in itself the sole device dedicated to electrical power. The generator and voltage regulator form part of the electrical system, and in a hybrid gasoline-electric car, the braking system also charges the battery (Fig. 4).


Fig. 4 The basal ganglia and thalamus are positioned deep within the skull. The putamen and globus pallidus (pallidum) are contiguous and form the lentiform nucleus

The basal ganglia, subcortical structures that date back in time to anurans similar to present-day frogs, support circuits that link different areas of motor and prefrontal cortex. The basal ganglia operate as a sequencing engine (Marsden and Obeso 1994), regulating motor control and cognition - including aspects of language. Damage to the subcortical basal ganglia or the circuits linking them to cortex explains why the signs and symptoms of aphasia can include speech production deficits, difficulties comprehending the meaning of a sentence, and cognitive deficits. Indeed, although aphasia is usually characterized as a language deficit, Kurt Goldstein (1948), one of the leading aphasia specialists of the twentieth century, pointed out its primary cognitive deficit - loss of the "abstract capacity."

Circuits Linking Cortex and the Basal Ganglia

Invasive tracer studies of the brains of monkeys and other mammals first mapped out a class of circuits that linked areas of motor cortex with the basal ganglia. Retrograde tracer studies entail injecting a chemical or virus that will propagate back down the neural pathway that transmits the electrochemical signals controlling the muscle. The animal must live for a while for the tracer to move down the circuit before being sacrificed. The animal's brain is then sliced into thin sections. Color couplers, similar in principle to those that were used in conventional color film, attach themselves to the tracer and mark out the circuit when the sectioned, color-stained brain tissue is viewed under a microscope. Other tracers can be injected into neural structures that mark the "ascending" neural pathways. Cortical-basal ganglia circuits were discovered that connected areas of prefrontal cortex through the basal ganglia and other subcortical structures to temporal and parietal cortical regions of the brain (e.g., Alexander et al. 1986). These invasive techniques could not be used to map out human neural circuits, but noninvasive diffusion tensor imaging (DTI) confirmed that these cortical-basal ganglia circuits appear to be similar in humans and nonhuman primates (Lehéricy et al. 2004). Evidence from studies of neurodegenerative diseases such as Parkinson disease that degrade basal ganglia operations suggested that nonhuman primates and humans had similar cortical-basal ganglia circuits. In Parkinson disease (PD), depletion of the neurotransmitter dopamine degrades the local operations of the basal ganglia (Jellinger 1990). Patients have difficulty in sequencing the submovements that are necessary to carry out internally directed motor acts such as walking. A common clinical observation is that PD patients who have difficulty walking will do better when they are asked


to copy someone walking. An external model allows them to function better. Similar problems occur when PD patients execute manual acts (Harrington and Haaland 1991), and speech motor control also deteriorates (Lieberman et al. 1992; Lieberman 2006). Cognitive inflexibility and difficulties performing cognitive acts that require planning or selecting criteria also occur in PD (e.g., Lange et al. 1992). As Alexander et al. (1986), Cummings (1993), and other studies note, prefrontal cortical areas associated with "higher" human cognitive capacities project to the basal ganglia, thus accounting for cognitive deficits associated with insult to the basal ganglia component of cortical-basal ganglia circuits. A "subcortical dementia" involving profound diminution of cognitive flexibility can occur in PD. Patients so afflicted perseverate - in other words, they are unable to change the direction of a thought process or action (Flowers and Robertson 1985; Fig. 5).

Fig. 5 Studies of the deficits of Parkinson disease and tracer studies of other species formed the basis for three cortical-basal ganglia circuits noted in Cummings (1993). The dorsolateral circuit is involved in cognition, the lateral orbital circuit in emotional regulation, and the anterior cingulate in attention and laryngeal control

Marsden and Obeso (1994) exhaustively reviewed the effects of surgical lesions and dopamine replacement therapy aimed at mitigating the problems associated with PD and concluded that the basal ganglia were a sequencing engine that could link submovements - motor acts stored in motor cortex - to carry out an internally guided motor act such as walking. This fit into the traditional view that PD affected motor acts. Marsden and Obeso also noted that when circumstances dictated, the basal ganglia could change a course of action. Focal brain damage limited to the basal ganglia results in similar speech production and cognitive deficits. Bilateral lesions to the caudate nucleus and putamen of the basal ganglia in the subject studied by Pickett et al. (1998) resulted in severe deficits in sequencing the laryngeal, lingual, and lung motor activity necessary to produce articulate speech. The subject had profound difficulty comprehending distinctions in meaning conveyed by syntax that are comprehended by 6-year-old children and was almost incapable of planning her daily activities. When the subject sorted cards in the "Odd-Man-Out" test, which Flowers and Robertson (1985) devised to test PD patients' cognitive flexibility, she was incapable of changing the sorting criterion. The Odd-Man-Out test uses a pack of cards that each shows three images. For example, the first card might show a large triangle, a small triangle, and a large circle. Subjects are asked to identify the "odd" image on each of ten cards. If the subject selects shape as the sorting criterion, the circle is the odd image. If size is the sorting criterion, the small triangle is the odd image. The subjects taking the test are not explicitly given the sorting criteria; they must infer the criteria. The subject can start with either criterion, but after sorting ten cards, the subject is asked to pick a different sorting criterion for the next ten cards and again for at least 14-card sorts.
PD patients typically have few errors on the first ten-card sort but have difficulty and high error rates each time that they have to change the sorting criterion. The subject tested by Pickett and her colleagues was unable to even think of a different sorting criterion after successfully completing the first ten-card sort (Fig. 6).
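The logic of a single Odd-Man-Out card can be sketched in a few lines of Python. This is only an illustration of the selection rule described above, not the instrument used by Flowers and Robertson (1985); the `Image` type and the `odd_image` function are hypothetical names introduced here, and the card is the large triangle, small triangle, large circle example from the text.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class Image(NamedTuple):
    """One image on a card (hypothetical representation for illustration)."""
    shape: str
    size: str


def odd_image(card: Sequence[Image], criterion: str) -> int:
    """Return the index of the image that differs from the others
    on the given sorting criterion ("shape" or "size")."""
    values = [getattr(image, criterion) for image in card]
    counts = Counter(values)
    # The odd image is the one whose value occurs exactly once on the card.
    odd = [i for i, value in enumerate(values) if counts[value] == 1]
    if len(odd) != 1:
        raise ValueError(f"no unique odd image for criterion {criterion!r}")
    return odd[0]


# The example card from the text: large triangle, small triangle, large circle.
card = (
    Image("triangle", "large"),
    Image("triangle", "small"),
    Image("circle", "large"),
)

print(card[odd_image(card, "shape")])  # the large circle
print(card[odd_image(card, "size")])   # the small triangle
```

The same card yields a different "odd" image under each criterion, which is why shifting the criterion between ten-card sorts probes cognitive flexibility rather than perception.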


Fig. 6 Two pages of the Odd-Man-Out test. The odd shape or letter of the alphabet can be selected on the basis of either shape or size (uppercase or lowercase for the alphabet). The test subject starts by selecting one sorting criterion but has to shift the criterion after sorting ten cards. This process is repeated four to six times

Neuroimaging studies of subjects performing the Wisconsin Card Sorting Test (WCST) confirm the role of the basal ganglia in tasks involving cognitive flexibility and explain why cognitive inflexibility occurs in Parkinson disease. The WCST is an instrument that measures cognitive flexibility - a person's ability to form cognitive criteria and shift from one criterion to another. The usual form of the WCST entails subjects sorting cards that each have images that differ with respect to shape, color, and number. Subjects have to match test cards to four "reference" cards - a card with one red triangle, a card with two green stars, a card with three yellow + signs, and a card with four blue circles. A typical test card might have four yellow stars printed on it. The criterion according to which a test card is matched to one of the reference cards varies and can be the number of images, color, or shape. The subject starts out by making a sort and then is informed whether the sort was "correct" or not. For example, starting with number, matching the test card to the reference card that has four blue circles on it would be incorrect if the person running the session wanted to start with color. The subjects then would continue to match by color, receiving a "correct" response until they are told that a sort was "incorrect"; they then have to establish a new sorting criterion without any explicit instruction. After achieving a "correct" response to the new criterion - for example, number - the subject would continue with number, receiving positive feedback until the sorting criterion changed. The Monchi et al. (2001) fMRI study monitored oxygen levels in the prefrontal cortex, the basal ganglia, and other subcortical structures. Depleted oxygen levels in the ventrolateral prefrontal cortex, the caudate nucleus, and the thalamus confirmed that this cortical-striatal circuit was activated when planning criterion-sorting shifts.
Another cortical-striatal circuit that included posterior prefrontal cortex and the putamen was observed during the execution of a sorting criterion set shift. Dorsolateral prefrontal cortex was involved whenever subjects made any decision as they performed card sorts, apparently monitoring whether their responses were consistent with the chosen criterion. Other fMRI studies have replicated these findings and show that the caudate nucleus uses this information when a novel action needs to be planned (Monchi et al. 2006a, 2007). Similar activation patterns were apparent when subjects were sorting words instead of images and had to match words on the basis of semantic similarity, phonetic similarity of the start of the syllable, or rhyme (Simard et al. 2011). The neural circuits involved thus are not domain-specific; they do not operate solely on visual criteria. Studies ranging from electrophysiological recordings of neuronal activity in the basal ganglia of mice and other animals as they learn tasks (Graybiel 1995; Mirenowicz and Schultz 1996; Jin and Costa 2010) to studies of PD patients (Lange et al. 1992; Monchi et al. 2007) and birds (Brainard and Doupe 2000) also show that the basal ganglia play a critical role in associative learning and in planning and executing motor acts, including speech in humans and songs in songbirds. Basal ganglia activity in these cortical-basal ganglia circuits clearly is not domain-specific, i.e., limited to language. Motor activity, associative learning, and cognitive flexibility manifested in both linguistic and visual tasks involve local basal ganglia operations, reflecting the mark of evolution as cortical-basal ganglia circuits were adapted to "new" tasks.
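The feedback procedure of the WCST described above can be illustrated with a minimal Python sketch. It is not the clinical instrument or the protocol of Monchi et al. (2001); the names `Card`, `REFERENCE_CARDS`, `matching_reference`, and `run_session` are hypothetical, and the fourth reference card is assumed to show blue circles, as in the commonly used deck, so that each color identifies exactly one reference card.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Card(NamedTuple):
    number: int
    color: str
    shape: str


# The four reference cards. Assumption: the fourth card is blue,
# so that every number, color, and shape picks out exactly one card.
REFERENCE_CARDS = (
    Card(1, "red", "triangle"),
    Card(2, "green", "star"),
    Card(3, "yellow", "plus"),
    Card(4, "blue", "circle"),
)


def matching_reference(test_card: Card, criterion: str) -> int:
    """Index of the reference card that matches the test card on
    the given criterion ("number", "color", or "shape")."""
    value = getattr(test_card, criterion)
    for index, reference in enumerate(REFERENCE_CARDS):
        if getattr(reference, criterion) == value:
            return index
    raise ValueError(f"no reference card matches {criterion}={value!r}")


def run_session(sorts: Sequence[Tuple[Card, int]],
                rules: Sequence[str]) -> List[str]:
    """Give the examiner's feedback for each sort. The subject never
    sees `rules`; only "correct" or "incorrect" is reported."""
    return [
        "correct" if chosen == matching_reference(card, rule) else "incorrect"
        for (card, chosen), rule in zip(sorts, rules)
    ]


test_card = Card(4, "yellow", "star")  # the example test card from the text
sorts = [
    (test_card, 3),  # sorted by number while the hidden rule is color
    (test_card, 2),  # sorted by color: matches the hidden rule
    (test_card, 2),  # still sorted by color after the rule shifts to shape
]
rules = ["color", "color", "shape"]
feedback = run_session(sorts, rules)
print(feedback)
```

In this sketch the third sort is a perseverative error: the subject keeps applying the previously rewarded criterion after the hidden rule has changed, which is the pattern of cognitive inflexibility the test is designed to detect.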

Cortex

Comparative studies of the architecture of the frontal regions of the brains of monkeys and humans have been conducted over the course of more than a century (Brodmann 1908, 1909, 1912). One explanation for why we, but no other


primates, can talk and command complex syntax might seem to rest on structural cortical differences. However, this does not seem to be the case. Petrides (2005), in his review of both classical and current studies and his own research, concludes that "the basic architectonic organization" (the distribution of neurons in the frontal layers of the cortex) is the same in humans and monkeys. What characteristics of the human cortical-basal ganglia circuits noted above might allow them to regulate speech motor control and aspects of cognition in humans? Domain specificity, particularly specificity for language, is inherent in Noam Chomsky's proposal that humans possess a faculty of language devoted to language and language alone. The central premises of Chomsky (2012) and earlier publications and Pinker's (1998) modular view of the brain are that language and other aspects of behavior are each regulated by domain-specific neural circuits and structures. However, noninvasive neuroimaging techniques show that this is not the case. The transmission of information from neuron to neuron in the brain is fueled by burning glucose. Therefore, increased activity in a neural structure uses up more oxygen. Thus, studies using functional magnetic resonance imaging (fMRI), which tracks the level of neural activity in a particular region of the brain by monitoring oxygen depletion, can infer whether a particular cortical area or subcortical structure is active during a task. It is not possible to specify where cortical brain activity is occurring in terms of Brodmann areas on a "standard brain," as is often the practice in neuroimaging studies. Brodmann (1908-1912) microscopically examined brains and found differences in the distribution of neurons in the cortex, which he thought reflected functional differences. The maps that he produced have since been used to delimit areas of the cortex.
However, differences in brain morphology make it particularly difficult to identify the particular Brodmann areas in prefrontal cortex (Devlin and Poldrack 2007), but some of the local operations performed by different regions of the frontal cortex and in the subcortical components of neural circuits in humans and monkeys are becoming apparent. The dorsal posterior motor cortex areas regulate fine motor control during speech and other motor tasks (Petrides 2005). Ventrolateral prefrontal cortex, connected to posterior regions by circuits involving the subcortical basal ganglia, is active during virtually all tasks that involve selecting and retrieving information according to specific criteria (Duncan and Owen 2000). Dorsolateral prefrontal cortex is active while monitoring motor or cognitive events during a task while taking into account earlier events stored in working memory (Badre and Wagner 2006; Monchi et al. 2001, 2006a, b, 2007; Postle 2006; Wang et al. 2005). These cognitive tasks range from retrieving information and holding it in short-term "working memory" to changing the direction of a thought process.

The Antiquity of Laryngeal Control

The melody of speech, or intonation, which reflects the output of the larynx, has a deep evolutionary history. Intonation reflects laryngeal activity, and the fundamental frequency of phonation (Fo) and amplitude of speech play a central role in vocal communication, signaling sentence boundaries and other aspects of syntax (cf. Armstrong and Ward 1926; Lieberman 1967). Controlled Fo contours differentiate words in "tone" languages such as Mandarin Chinese. The neural circuits and anatomy involved in controlling Fo can be traced back to therapsids, mammal-like reptiles who during the Triassic, Jurassic, and early Cretaceous lived alongside dinosaurs. The initial role of these structures appears to be mother-infant communication. Studies of human mother-infant communication reveal a special vocal mode or register, motherese, by which parents address infants in speech that has a high fundamental frequency of phonation and extreme Fo variation (Fernald et al. 1989). Mammalian infants (including human infants) also signal for attention by means of a forceful isolation cry that has a high Fo and amplitude (Truby et al. 1965) - the cries that can keep parents awake for months. Comparative studies suggest that therapsids employed similar anatomical specializations and neural structures to produce isolation cries. All mammals possess a paleocortex, which includes the anterior cingulate cortex (ACC). The findings of studies of the behavioral effects of damage to the ACC and the neural circuits that connect it to other parts of the human brain show that the ACC plays a role in controlling Fo and directing attention to virtually anything that one wishes to do. While the soft tissue of the brains of therapsids has not survived, the inference that these mammal-like reptiles had an ACC rests on the fact that they possessed the three middle ear bones found in all present-day mammals.
The initial function of the hinge bones of the reptilian jaw was to open the jaw wide. In the course of evolution, the hinge bones took on a dual role, functioning as "organs" of hearing. This transition from mammal-like reptiles to mammals involved the former jaw bones migrating into the middle ear, where they serve as a mechanical amplifier that enhances auditory acuity. All mammals have both an ACC and these middle ear bones, so that middle ear bones are regarded as an index for the presence of the ACC in mammal-like therapsids. The Darwinian struggle for existence transcends aggressive acts, instead referring to any aspect of behavior that confers a selective advantage and increases reproductive success. Vocal communication that enhances mother-infant contact


increases an individual's biological fitness and thereby contributes to its reproductive success. Anterior cingulate cortex-basal ganglia neural circuits are involved in both attention and laryngeal control and therefore appear to function in that role. For example, lesion studies on mice show that mouse mothers do not pay attention to their infants when neural circuits to the ACC are disrupted (MacLean and Newman 1988; Newman 1985). In adult humans with Parkinson disease, degradation of ACC-to-basal ganglia circuits results in patient apathy (Cummings 1993). Hypophonation, an anomalously low-amplitude phonation, often occurs in patients with PD (Tsanas et al. 2009), and in extreme cases, PD patients can become mute (Cummings 1993). Virtually every PET or fMRI neuroimaging study ever published shows ACC activity when subjects are asked to perform any task. Our reptilian heritage is evident when an earache occurs as a result of grinding one's teeth. We retain the "old" nerve pathways between the jaw and bones that have migrated into the middle ear. An alternate proposal stressing human uniqueness is that laryngeal control in humans derives from a circuit that directly links cortex to the larynx, bypassing the basal ganglia. According to Fitch (2010), this hypothetical circuit is the key to humans being able to learn to talk and to the "faculty of language," conferring the ability to form and comprehend sentences that have embedded clauses (Hauser et al. 2002). Fitch also claims that vocal imitation in birds derives from a similar direct cortical-to-laryngeal neural circuit. However, Fitch overlooks evidence that shows that this human-specific hypothetical circuit does not exist. The basis for Fitch's claim derives from one of the first attempts to study human neural circuits, decades before cortical-basal ganglia circuits were mapped out by tracer studies in other species.
Kuypers (1958) attempted to adapt an invasive tracer technique used in early studies on animals to study human brains. The Nauta-Gygax (1954) technique which Kuypers attempted to adapt involves first destroying a discrete part of a living animal's brain. After a few weeks, the animal is sacrificed, its brain is impregnated with a silver solution that highlights neuronal structure, and it is dissected and microscopically examined. The infused silver solution can show damage to downstream neurons of the neural circuit that connected them to the structure of the brain that had been destroyed, thereby tracing out a neural circuit. Instead of surgically lesioning any part of a human subject's cortex, Kuypers examined the brains of patients who had died after massive damage to one hemisphere of their brain. Since no one dies because of laryngeal dysfunction (other than from respiratory obstruction), it is difficult to see how this technique could be used to isolate a cortical-to-laryngeal neural circuit. Nonetheless, Iwatsubo et al. (1990) again used the Nauta-Gygax technique to study the brain of an 84-year-old woman who died after two massive strokes. They reported degeneration in spinal cord neurons that they believed revealed a direct cortical-to-laryngeal neural circuit. Jurgens's (2002) review of 301 studies is cited by Fitch and others to support the claim that this neural circuit is present in humans. However, Jurgens (2002) overlooked damage to basal ganglia and pathways between the cortex and basal ganglia, which clearly indicates that the degeneration of brainstem neurons does not constitute evidence for a direct cortical-to-laryngeal circuit. Terao et al. (1997) compared the brainstem neurons of four patients who died from brain damage after strokes with those of four age-matched patients who died from non-neurological diseases. No differences were apparent when the brainstem neurons of the two groups were examined.
The neuronal degeneration observed in the Kuypers and Iwatsubo studies therefore may reflect degeneration after death, not downstream damage in circuits linked to cortex. CT scans and an autopsy of the patient studied by Iwatsubo et al. (1990) showed "massive infarctions of the entire territories of the middle cerebral artery on the right and anterior and middle cerebral arteries on the left." The patient's basal ganglia, pathways to the basal ganglia, and other midbrain and brainstem structures would have been damaged or destroyed by these lesions, resulting in death. Subsequent studies have used noninvasive techniques to study the neural circuits regulating vocalization in living humans. Schulz et al. (2005) used positron emission tomography (PET), which tracks the metabolism of glucose in the brain, to map neural circuits that are active when subjects talked. They concluded that the circuits were a "phylogenetically older system present in all mammals" (p. 1845) involving the basal ganglia and thalamus. As noted above, diffusion tensor imaging (DTI), which directly maps out neural circuits, confirms that the human circuits are those present in monkeys and apes (Lehéricy et al. 2004). Fitch apparently overlooked the conclusion of Jurgens's review article. Jurgens (2002, p. 251) concluded that "motor coordination of learned vocal patterns comes from the motor cortex and basal ganglia." Fitch also takes note of the evolution of vocal imitation in birds to support his claim that direct human cortical control of laryngeal activity is the basis for the evolution of human speech and language. Fitch (2010, p. 352) identifies area "X" of the bird brain as the cortical homolog of a circuit similar to the hypothetical Kuypers-Iwatsubo neural circuit. However, area "X" is homologous to the mammalian basal ganglia (Brainard and Doupe 2000).


The Sudden Appearance of Language According to Noam Chomsky

Fitch's hypothetical unique human circuit reflects the position of Noam Chomsky concerning the biological basis of human language. The details of Chomsky's theories have changed over time, but they all posit an innate "faculty of language" that genetically transmits "knowledge of language." In his many publications, Chomsky posits an innate universal grammar (UG) that conveys the details of syntax of all human languages. In 1976 Chomsky stated that

…language is as much an organ of the body as the eye or heart or the liver. It's strictly characteristic of the species, has a highly intricate structure, developed more or less independently of experience in very specific ways, and so on. (Chomsky 1976, p. 57)

Chomsky's 2012 model focuses on the proposal made in Hauser et al. (2002) that the faculty of language includes a "narrow" faculty of language, specific to humans, that confers the ability to form and comprehend sentences that contain embedded clauses, such as "The boy who fell down was thin." According to linguists following Chomsky's train of thought, sentences having embedded clauses are formed in the human mind by a process of "recursion" that melds simple sentences such as "The boy was thin" and "The boy fell down" together. It is not clear how Fitch's unique neural circuit to the larynx could account for the evolution of this mental capacity, but that problem is irrelevant because Chomsky's (2012) proposal for the biological bases of human language is outside the framework of evolutionary biology. Chomsky now claims that human language suddenly appeared between 100,000 and 50,000 years ago by means of a mutation that yielded a mental capacity that he terms merge. Following linguistic practice, Chomsky uses the term "rule" for merge, the mental capacity, the "narrow faculty of language," that yields recursion and complex sentences.
Language, in Chomsky's account, was absent before this mutation occurred in one person. The mutation somehow became a characteristic of all human brains, without natural selection playing a role in this process. Chomsky's theories have been outside of the framework of evolutionary biology for decades. In a 1972 publication, he stated that "It is perfectly safe to attribute this development [of innate language structures] to 'natural selection', so long as we realize that there is no substance to this assertion, that it amounts to no more than a belief that there is some naturalistic explanation for these phenomena" (Chomsky 1972, p. 97). The pages of The Science of Language (Chomsky 2012) include a sustained argument that natural selection has virtually no role in evolution. Yet selective sweeps, in which a mutation that enhances survival in a particular ecosystem spreads through a human population, have been well documented. For example, natural selection acted on different genes to confer the ability to use milk as an additional food source in different parts of the world at different times (Tishkoff et al. 2007). In human settings where domesticated milk-producing animals were absent, adult lactose tolerance did not evolve. The interplay between the ecosystem and genetic evolution was stressed by Darwin (1859) throughout On the Origin of Species. According to Chomsky and those who share his view, children would otherwise not be able to acquire any language because sufficient information supposedly is not present in the utterances that children hear in the early years of life. The syntactic rules, speech sounds, and syllable structures of the language a child hears are acquired because "parameters and principles" that activate them are innate, i.e., genetically transmitted. It's as though all human brains are computers that have identical preloaded software that automatically selects the appropriate instruction set for whatever language a child encounters in the first years of life. 
In substance, Chomsky's claim is that children do not really learn language in the way that they learn to use forks or chopsticks or the modes of behavior or technology of any human culture. In other words, Chomsky's claim is equivalent to the assertion that no child would "acquire" a language unless this preloaded language software was in his or her brain. However, natural selection on humans never ended, and this biological certainty demonstrates the implausibility of UG. If anyone's native language was acquired because UG existed, children of Chinese ancestry born and raised in the United States would be unable to fluently speak English. In adult humans, the ability to digest cows' milk is an innate, genetically transmitted biological attribute. In cultures that possessed herds of animals that could be milked, natural selection acted on genetic variations to yield individuals who could digest milk as adults, thereby enhancing their survival and their children's survival. While human cultures exist in which animal husbandry and sources of milk were absent, leading to reduced adult lactose tolerance, language typifies all human cultures. Virtually all aspects of human life are culturally transmitted, and the primary medium of transmission is language. Thus, if
anyone's ancestors had been using any particular language for an extended period - let's say the last 3,000 years (the period when adult lactose tolerance or other attested aspects of human biology evolved) - then if any innate UG were necessary to "acquire" that language, natural selection would have acted to enhance survival, optimizing the UG to favor acquiring the particular linguistic characteristics of the indigenous language. The United States is an optimal "experiment in nature" since the ancestors of most Americans did not speak English before their arrival. Since languages differ dramatically, you might have never been able to acquire English if your ancestors were speaking Chinese, Hungarian, or some other language whose structure and sound pattern differs dramatically from English. Natural selection acting over generations would have optimized your innate UG for your ancestral language rendering it deficient for acquiring English. The time depth of the Chinese languages whose syllable structure differs profoundly from that of English would thus prevent Americans of Chinese ancestry from "acquiring" fluent English. Since that is not the case, we can conclude that the innate Chomskian UG does not exist.

Fully Human Linguistic and Cognitive Capability

Brain Size

One aspect of cognitive capability that is apparent in the fossil record is brain size. The first anatomical study of an ape, Tyson's 1699 dissection of an orangutan, revealed many of the anatomical affinities between great apes and humans, but it showed that the ape brain was much smaller than that of any normal adult human. A great deal of attention has since been focused on the human brain being about three times larger than a chimpanzee brain because brains require lots of biological support. Hence, scholars have reasoned that a large brain signifies that it must be useful. Texts on hominin evolution generally include a chart showing an increase in brain size over time. Current assessments show that humans have a scaled-up primate brain with about three times as many neurons (the basic computing elements of all brains) as a chimpanzee brain (Herculano-Houzel 2009). The size of most parts of the mammalian brain scales up in proportion to overall brain size, but the human brain differs in that the posterior temporal cortex is disproportionately larger than would be expected (Semendeferi et al. 1997, 2002). Temporal cortex is part of the human long-term information storage system. Working memory, the ability to keep information in short-term memory during a cognitive process, appears to access information from permanent memory through neural circuits linking prefrontal cortex to temporal cortex and other structures (Badre and Wagner 2006; Postle 2006). The human prefrontal cortex has long been associated with "higher" cognition and, as noted previously, when linked with the basal ganglia and other neural structures, it is involved in the range of cognitive acts involving "executive control." 
Dorsolateral and ventrolateral prefrontal cortex work through the basal ganglia as well as through direct neural circuits with information-storing regions of the brain, pulling memory traces of images, words, and probably other stored information into short-term working memory (e.g., Postle 2006; Badre and Wagner 2006; Miller and Wallis 2009). Although some studies that compared chimpanzee and human brains using magnetic resonance imaging studies have claimed that humans have a disproportionately larger prefrontal cortex, thereby enhancing human cognitive capabilities, that may not be the case. As Semendeferi et al. (2002) noted, MRI scans inherently cannot show that humans have a disproportionately larger prefrontal cortex. The human frontal cortex includes prefrontal as well as posterior areas involved in motor control and, together, these areas are not disproportionately larger than an ape's. It is impossible to differentiate prefrontal cortical areas from the motor regions of the frontal cortex on an MRI. Many proposals have been made for why hominin brains became larger over time. One recurring theory hinges on abrupt climate changes taking place in Africa where early hominins evolved. Alternating periods of frigid glacial cold and heat may have resulted in natural selection for individuals able to think of creative solutions to survival. However, there is no evidence for glacial cold in Africa. Another proposal in the same vein suggests alternating periods of drought and heavy rainfall that resulted in desertlike or lush rain forests in the Rift Valley of Africa as the causal element driving hominin brain size enlargement. However, if archaic hominins responded in a manner similar to other species, the most likely scenario would have involved hominins moving away when the climate became inhospitable. Another theory stressed the ability to communicate with members of one's group, noting the correlation between group size and size of the neocortex. 
However, the missing data point is that of solitary orangutans' brains. Nonetheless, it is clear that brain size is linked to cognitive ability if only in memory storage capacity. Lartet (1868) noted a gradual increase in brain size in wolves and their prey, and Jerison (1973) and subsequent researchers (e.g., Kondoh 2010) have documented this trend across multiple groups of animals. Acquiring a meal and avoiding being a meal
apparently drove the evolution of larger brains. As Darwin (1859, p. 61) pointed out, natural selection will act on any attribute that enhances the survival of progeny in "the infinitely complex relations to other organic beings and to external nature" that all beings face. Given the interlocked neural structures implicated in motor control, cognition, and language, it is improbable that any "one" factor was responsible for driving an increase in hominin brain size.

Transcriptional Genes

Advances in genetics and neurophysiology have unlocked a new means of inquiry into the evolution of human language and cognition. Transcription factors are essentially "master" genes that affect the way that other genes are activated to form brains and bodies. They govern the transcription of information stored in DNA into a different form, single-stranded mRNA, which is later translated into functional proteins that make up the building blocks of the body. The proteins that transcription factor genes encode bind to DNA sequences near the genes that they regulate so as to control the degree to which those genes release information to the mRNA. Many members of the extended KE family in London who had severe deficits in speech production, sentence comprehension, and cognitive ability possessed only one copy of the human FOXP2 transcription factor, instead of the normal two (Fisher et al. 1998). The Foxp2 gene is one of many transcription factors that exist in all mammals, birds, and other creatures. The mouse form of Foxp2 (the lowercase spelling indicates that it is not the human version, which is capitalized) controls embryonic development of the lungs, intestinal system, heart, and other muscles as well as the spinal column in mice (Shu et al. 2001). Humans are separated from mice by 75 million years of evolution (Mouse Genome Sequencing Consortium 2002). The Foxp2 and FOXP2 genes encode a protein that regulates the expression of other genes during processes that mark embryonic development, such as signal transduction, cellular differentiation, and pattern formation. Mutations to other Forkhead transcription factor genes have been implicated in a number of developmental disorders. 
The areas of expression of FOXP2 and Foxp2 in both the human and mouse brain are similar and include neural structures that form the human cortical-striatal-cortical circuits involved in motor control and cognition - the thalamus, caudate nucleus, putamen, and other subcortical structures (Lai et al. 2003). These neural structures are all intricately interconnected. The cerebellum, which receives input from the inferior olives, is involved in motor coordination. The cortical plate (layer 6), the input level of the cortex, is also affected by the FOXP2 mutation. The subsequent focus on the role of FOXP2 in human evolution follows from its being one of the few genes that have been shown to differ from their chimpanzee versions (The Chimpanzee Sequencing and Analysis Consortium 2005). During the six- or seven-million-year period that separates humans and chimpanzees, a "human" version evolved that differs from the version found in chimpanzees. In that period the human form of FOXP2 underwent two substitutions in its DNA sequence, causing two amino acid changes in the FOXP2 protein, compared to one amino acid substitution between chimpanzees and mice over the previous 70 million years. The form of FOXP2 that has two amino acid substitutions also occurs in Neanderthals and Denisovans - a group related to Neanderthals. A third change unique to humans occurred in intron 8 of the FOXP2 gene and resulted in a selective sweep during the period in which anatomically modern humans appeared some 260,000 years ago (Maricic et al. 2013). The date of this "selective sweep" for the unique human form of FOXP2, approximately 260 ky ago, was first established by Enard and colleagues (2002). Selective sweeps occur when a gene confers a significant advantage in the Darwinian "struggle for existence," that is, when it results in an individual's having more surviving children, as was the case for adult lactose tolerance. In most instances it is unclear what the function of a gene is that differs between chimpanzees and humans. 
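The contrast in substitution counts given above can be turned into a rough rate comparison. The sketch below uses only the figures quoted in the text; the 6.5-million-year value is an assumed midpoint of the "six- or seven-million-year" period, and the function and variable names are hypothetical. It is a back-of-the-envelope illustration, not a formal test for selection, since rates based on one or two substitutions are very noisy.

```python
# Back-of-the-envelope comparison using only the figures quoted in the text.
# The 6.5 My value is an assumed midpoint of "six- or seven-million-year".


def substitutions_per_million_years(substitutions: int, span_my: float) -> float:
    """Average number of amino acid substitutions per million years."""
    return substitutions / span_my


# Two amino acid changes on the human lineage since the split from chimpanzees.
human_lineage_rate = substitutions_per_million_years(2, 6.5)

# One amino acid change between chimpanzees and mice over the previous 70 My.
earlier_rate = substitutions_per_million_years(1, 70.0)

fold_difference = human_lineage_rate / earlier_rate

print(f"human lineage:   {human_lineage_rate:.3f} substitutions per My")
print(f"earlier period:  {earlier_rate:.3f} substitutions per My")
print(f"fold difference: {fold_difference:.1f}")
```

Under these assumptions the apparent rate on the human lineage is on the order of twenty times the earlier rate, which is the kind of discrepancy that drew attention to FOXP2.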
However, in this instance, the effects of a FOXP2 anomaly in the KE family showed that it plays a key role in the attributes of speech, language, and cognition that distinguish humans from chimpanzees and other living species. Members of this extended family who had the anomalous version of FOXP2 had difficulty executing simple orofacial maneuvers, such as simultaneously protruding their tongue while pursing their lips, as well as repeating spoken words and talking. In addition, these subjects had difficulty comprehending distinctions conveyed by syntax and forming sentences and had, as a group, lower scores on intelligence tests (Vargha-Khadem et al. 1995, 1998; Watkins et al. 2002). Anomalies in basal ganglia also were noted (Watkins et al. 2002). Mouse "knock-in" studies have demonstrated that the human version of FOXP2 enhances information transfer and associative learning in the basal ganglia. In light of basal ganglia activity in both associative learning and motor control, this would account for cognitive deficits in afflicted members of the KE family as well as their inability to learn and execute the complex motor acts that enable us to talk and which also control internally guided motor acts. The human form of FOXP2 was knocked into mice (Enard et al. 2009; Lieberman 2009; Reimers-Kipping et al. 2011). When the human version of FOXP2 was knocked into mouse pups, their vocal calls were somewhat different than the calls of mouse pups
that had the normal "wild" version of Foxp2. When the wild form of Foxp2 was knocked out in mouse pups, they died soon after, reflecting its role in lung and cardiovascular development. The critical finding was that the knocked-in human form of FOXP2 conferred increased synaptic plasticity in basal ganglia neurons as well as increased lengths of dendrites in the basal ganglia, thalamus, and layer VI of the cortex. In particular, the human version of FOXP2 increased synaptic plasticity in medium spiny neurons in the basal ganglia and in the substantia nigra, another structure of cortical-basal ganglia circuits (Alexander et al. 1986; Cummings 1993). Increasing synaptic plasticity has been shown to enhance associative motor learning in mice (Jin and Costa 2010), a result consistent with one of the first findings of modern neuroscience. Hebb (1949) formulated the theory that learning rests on the modification of synapses, the structures that transfer information from one neuron to another, a paradigm that has since guided research on the basic operations of the brain. Synaptic modification is the neural mechanism by which the relations that hold between seemingly unrelated phenomena are learned. The process by which we learn anything - motor acts, words, concepts, etc. - involves modifying synaptic "weights," the degree to which synapses transmit information to a neuron. This is the case for creatures as far removed on the evolutionary scale as humans and mollusks (Carew et al. 1981). Increased synaptic plasticity and connectivity in the medium spiny neurons of the basal ganglia that resulted from the human form of FOXP2 is especially significant in light of their role in associative learning. In combination with dopamine-activated neurons in the basal ganglia and prefrontal cortex, these neurons in essence guide associative learning, coding the expectation of achieving a desired goal or avoiding aversive outcomes (Bar-Gad and Bergman 2001; Joshua et al. 2008; Assad and Eskander 2011). 
In short, information stored in the synaptic weights of these neurons directs the process of associative learning, allowing animals to learn to perform complex linked sequences. In humans, similar processes account for our learning complex grammatical "rules," as well as the complex "rules" that guide our interactions with other people, other species, and the ever-changing conditions of life. Other "highly accelerated regions" (HARs) of the human genome may be implicated in neural development (Konopka et al. 2009). These advances in genetics suggest that neural circuits that humans share with other primates were, in effect, "supercharged" by the action of transcriptional genes. Circuits linking prefrontal cortex and the basal ganglia enhance the cognitive and motor capabilities involved in human language and speech. The role of circuits linking cortex and the basal ganglia is well established, but DTI studies reveal a bewildering array of neural circuits in the human brain. We are at the starting point of an understanding of language, cognition, and motor control.
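The notion of synaptic "weights" that are modified during associative learning, discussed earlier in this section, can be illustrated with a toy model. The sketch below is a textbook simplification of a Hebbian update (a weight grows when the input and the receiving neuron are active together); it is not the model used in any of the studies cited here, and every name and number in it is a hypothetical choice made for illustration.

```python
# Toy Hebbian learning: weights grow when input and output are active together.
# All names and values are illustrative assumptions, not taken from the cited studies.


def neuron_output(weights, inputs):
    """Weighted sum of the inputs reaching a single model neuron."""
    return sum(w * x for w, x in zip(weights, inputs))


def hebbian_update(weights, inputs, output, learning_rate=0.1):
    """One Hebbian step: each weight changes by learning_rate * input * output."""
    return [w + learning_rate * x * output for w, x in zip(weights, inputs)]


# Input 0 already drives the neuron (strong synapse); input 1 starts ineffective.
weights = [1.0, 0.0]
second_input_alone = [0.0, 1.0]

response_before = neuron_output(weights, second_input_alone)

# Present both inputs together several times so the second synapse is strengthened.
for _ in range(5):
    paired_inputs = [1.0, 1.0]
    output = neuron_output(weights, paired_inputs)
    weights = hebbian_update(weights, paired_inputs, output)

response_after = neuron_output(weights, second_input_alone)

print("response to second input before pairing:", response_before)
print("response to second input after pairing: ", response_after)
```

After repeated pairing, the previously ineffective input drives the model neuron on its own, which is the sense in which an association is stored in the weights. Plain Hebbian updates of this kind grow without bound; realistic models add normalization or decay, which is omitted here for brevity.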

The Archaeological Record

The artifacts preserved in the archaeological record inherently cannot provide direct evidence of their maker's cognitive or linguistic abilities. Computers and software used to write novels represent a technology that is far more complex than Jane Austen's quill pen, yet they do not constitute evidence for a leap in human cognitive ability over the last two centuries. However, there are periods extending over millions of years in which Oldowan tools are virtually identical. Oldowan tools continued to be made by Homo erectus as well as hand axes that are more complex, and it is difficult to see how the technique necessary to make them could have been transmitted without some form of language, but they conform to the same pattern over a hundred thousand years. Neanderthal Levallois stone tools are more complex still, and Neanderthals survived in a cold difficult environment. Their brains were as large as those of humans and they undoubtedly talked. But one signal element that characterizes all human cultures was missing - there is almost no trace of the creative impulse that leads humans to produce art (artifacts that are useless in the struggle for existence that humans produce and value). Nor is there evidence of the pattern of innovation and imitation that marks human behavior in some cultures. However, the intersection of culture and the biological bases of human cognitive and linguistic ability precludes the archaeological record providing any detailed conclusions on any stage of the evolution of language. In his notebooks, Charles Darwin expressed no doubt that the Yaghan, the indigenous inhabitants of Tierra del Fuego, had the same general cognitive and linguistic capabilities as Europeans (Browne 1995), but he noted the primitive aspects of Yaghan culture. They were naked except for animal skins thrown over their shoulders; later studies found that their tools were similar to those fabricated at Oldowan sites. 
Yet the three Yaghans who had been brought to England 3 years before on the Beagle had acquired English and conducted themselves in an acceptable early Victorian manner before reverting to Yaghan behavior when they returned to Tierra del Fuego.

Conclusion

The evolution of the biological bases of human language involved natural selection acting on anatomy and neural
mechanisms that have a long evolutionary history. Human language did not suddenly arise from a single mutation that occurred 50,000-100,000 years ago, creating an "organ" of the brain - a hypothetical faculty of language - devoted to language and language alone. The traditional localization of language to Broca's area of the neocortex is also incorrect. Computed tomography (CT) and magnetic resonance (MR) imaging of patients exhibiting aphasia show that damage to subcortical parts of the brain like the basal ganglia, putamen, or olives is necessary to induce permanent loss of language. Neural circuits linking local activity in different neural structures regulate complex behaviors, including speech and language. Circuits linking motor cortex and prefrontal cortex with the basal ganglia play critical roles in motor control including speech, cognitive flexibility, working memory, and other aspects of cognition. Studies of Parkinson disease, which disrupts basal ganglia operations, have established a relationship between deficits in motor control of internally guided acts and deficits in cognition including language. The basal ganglia constitute a sequencing engine that links internally guided motor acts that control walking, talking, and manual tasks. The basal ganglia also act to interrupt a sequence when changing circumstances suggest a different response. Similar operations involving the basal ganglia linked to prefrontal cortex contribute to cognitive flexibility, the comprehension of distinctions in meaning conveyed by syntax, and tasks such as mental arithmetic. These neural circuits linking cortex and the basal ganglia are not domain-specific, i.e., they are not committed to language and language alone. Speech plays a critical role in language, permitting information transfer at a rate that exceeds the fusion frequency of the human auditory system. 
The high data transmission rate of speech involves the process of "encoding," where the individual "phonemes" - roughly equivalent to the letters of the alphabet - are melded together into words. The acoustic signals that convey words are generated by the supralaryngeal vocal tract (SVT), which acts as a variable acoustic filter that produces local energy maxima occurring at formant frequencies that are the major determinants of vowel and consonant quality. However, the overall length of a speaker's SVT determines the absolute value of the formant frequencies that convey a given vowel or consonant. Listeners must form an estimate of the length of the SVT that generated a particular formant frequency pattern to recover the vowel or consonant. The vowel [i] (the vowel of the word see) and to a lesser extent [u] (the vowel of too) are optimal signals for recovering the length of a speaker's SVT. Computer-modeling studies and acoustic analyses of the vocalizations of human newborn infants and monkeys and apes show that their SVTs cannot generate these vowels and other "quantal" speech sounds that enhance the robustness of speech communication. Their tongues prevent them from producing the quantal speech sounds that are among the few attested "universals" of human language. Human infants have a tongue situated almost entirely in the oral cavity, more closely resembling the proportions of the tongue in a nonhuman primate or other mammal than that of an adult human. The shape of the adult human tongue is attained through a developmental process involving restructuring of the skull to shorten the length of the oral cavity and descent of the tongue, hyoid, and larynx below the mandible into the neck. This process takes 6-8 years in modern humans. In humans, half the adult tongue is situated in the pharynx in the neck and half is in the oral cavity. Computer-modeling studies show that this configuration enables humans to form the SVT shapes necessary to produce quantal sounds. 
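The dependence of absolute formant frequencies on SVT length, noted above, can be illustrated with the simplest acoustic idealization: a uniform tube closed at the larynx and open at the lips, whose resonances fall at odd multiples of c / 4L. This approximates only a neutral, schwa-like vowel and is not the computer modeling used in the studies summarized here; the speed of sound, the tube lengths, and the function names are assumptions chosen for illustration.

```python
# Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L).
# A crude stand-in for a neutral vowel; all constants are illustrative assumptions.

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value for warm, humid air


def uniform_tube_formants(length_cm, count=3):
    """First `count` resonance frequencies (Hz) of a uniform tube of given length."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def estimate_length_cm(first_formant_hz):
    """Invert the formula for n = 1 to recover tube length from the first resonance."""
    return SPEED_OF_SOUND_CM_PER_S / (4.0 * first_formant_hz)


long_tract = uniform_tube_formants(17.0)   # hypothetical adult-sized tract
short_tract = uniform_tube_formants(8.5)   # hypothetical tract half as long

print("17.0 cm tube:", [round(f) for f in long_tract])
print(" 8.5 cm tube:", [round(f) for f in short_tract])
print("length recovered from first resonance:", estimate_length_cm(long_tract[0]))
```

Halving the tube length doubles every resonance, so the same vowel category corresponds to different absolute frequencies for speakers with different SVT lengths. A listener who can estimate length, as `estimate_length_cm` does in this idealized case, can normalize for that difference.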
The evolution of the human tongue can be traced by taking into account the relative length of the oral cavity, which can be determined by examining bony landmarks on the basicrania of fossil hominins, and neck length, which can be estimated from the dimensions of their cervical vertebrae. Since the tongue carries the larynx down into the neck with it to attain the 1:1 oral to pharyngeal proportions necessary to produce quantal speech sounds, there must be sufficient room for the laryngeal maneuvers involved in swallowing. Taking into account these constraints, Neanderthals and other archaic hominins with a long face and nasopharynx would not have had the 1:1 SVT proportions necessary to produce quantal speech. Speech is, however, possible without the ability to produce quantal sounds. Since the human tongue has a biological liability - increasing the risk of choking on food lodged in a descended larynx - the neural capacity for speech must have been present before the adaptations that yielded the adult humanlike tongue. The neural circuits that regulate laryngeal phonation appear to have a long evolutionary past, deriving from circuits whose initial purpose was mother-infant communication in transitional mammal-like reptiles. The neural capacity to regulate alveolar (lung) air pressure during speech may also be present in other species. Humans do not appear to have a unique, direct cortical-to-laryngeal neural circuit that confers the ability to learn and execute the motor commands that underlie speech. That seems to be the province of cortical-basal ganglia circuits which in humans are similar to circuits mapped out in nonhuman primates, except that, in the human circuits, synaptic plasticity and dendritic connectivity are enhanced by mutations on the FOXP2 transcriptional gene. Studies of the pattern of deficits of an extended family in which many individuals had only one copy of the FOXP2 gene established the role of the gene in speech motor control,
cognition, and language. Comparisons of DNA from chimpanzees, contemporary humans, and fossil DNA recovered from Neanderthals and Denisovans revealed distinctions between the forms of FOXP2 in chimpanzees, Neanderthals, and humans. A series of selective sweeps, the most recent at about 260 ky ago, occurred on the uniquely human form of FOXP2. Other transcriptional genes are most likely implicated in evolution of the neural capacities that confer human cognitive and linguistic ability. Darwin pointed out that biological evolution and the ecosystem, which for humans includes their culture, are not separable. Selective sweeps for adult lactose tolerance, for example, occurred in cultural settings in which milk-producing animals were domesticated. This argues against the universal grammar (UG) proposed by Chomsky, which supposedly specifies the details of syntax and the sound pattern of every language. In Chomsky's account, UG is necessary for a child to acquire any language. If such were the case, natural selection and selective sweeps would have occurred to optimize UG for the language spoken in a particular culture, rendering it deficient for acquiring languages that differed. However, a child from any cultural background can learn any language with native proficiency if he or she is immersed in a given culture at an early age. Likewise, chimpanzees raised in a human setting where sign language is used can, to a degree, learn to use words and understand simple syntax - demonstrating the continuity of the evolution of human language and the antiquity of some of the elements that constitute language.

Cross-References

Ancient DNA Archaeology Cooperation, Coalition and Alliances Cultural Evolution in Africa and Eurasia During the Middle and Late Pleistocene Evolution of the Primate Brain Hominoid Cranial Diversity and Adaptation Neanderthals and Their Contemporaries Origin of Modern Humans Paleoecology: An Adequate Window on the Past? Primate Intelligence The Emergence of Symbolic Behavior

References Alemseged Z, Spoor F, Kimbel WH et al (2006) A juvenile early hominin skeleton from Dikika, Ethiopia. Nature 443:296-301 Alexander GE, DeLong FH, Strick PL (1986) Parallel organization of segregated circuits linking basal ganglia and cortex. Ann Rev Neurosci 9:357-381 Alexander MP, Naeser MA, Palumbo CL (1987) Correlations of subcortical CT lesion sites and aphasia profiles. Brain 110:961-991 Ardran GM, Kemp FH (1972) A functional assessment of relative tongue size. Am J Roentgen 114:282-288 Arensburg B (1994) Middle Paleolithic speech capabilities: a response to Dr. Lieberman. Am J Phys Anthropol 94:279-280 Arensburg B, Tillier AM, Vandermeersch B et al (1989) A Middle Paleolithic human hyoid bone. Nature 338:758-760 Arensburg B, Schepartz LA, Tillier A-M et al (1990) A reappraisal of the anatomical basis for speech in Middle Paleolithic hominids. Am J Phys Anthropol 83:137-146 Armstrong LE, Ward IC (1926) Handbook of English intonation. Teubner, Leipzig Arsuaga JL, Martinez I, Gracia A et al (1997) The Sima de los Huesos crania (Sierra de Atapuerca, Spain: a comparative study. J Hum Evol 33:219-281 Assad WF, Eskander EN (2011) Encoding of both positive and negative reward prediction errors by neurons of the primate lateral prefrontal cortex and caudate nucleus. J Neurosci 31:17772-17787 Badre D, Wagner AD (2006) Computational and neurobiological mechanisms underlying cognitive flexibility. Proc Natl Acad Sci U S A 103:7186-7190

http://www.springerreference.com/index/chapterdbid/363355 © Springer-Verlag Berlin Heidelberg 2014

3 Mar 2014 13:29 26

Dr. Philip Lieberman and Dr. Robert C. McCarthy The Evolution of Speech and Language

SpringerReference

Bar-Gad I, Bergman H (2001) Stepping out of the box: information processing in the neural networks of the basal ganglia. Curr Opin Neurobiol 11:689-695
Bench RW (1963) Growth of the cervical vertebrae as related to tongue, face and denture behavior. Am J Orthod 49:183-214
Bibby RE, Preston CB (1981) The hyoid triangle. Am J Orthod 80:92-97
Bickerton D (1990) Language and species. University of Chicago Press, Chicago
Bischoff JL, Williams RW, Rosenbauer RJ et al (2007) High-resolution U-series dates from the Sima de los Huesos hominids yields 600 +∞/-66 kyrs: implications for the evolution of the early Neanderthal lineage. J Arch Sci 34:763-770
Boë L-J, Heim J-L, Honda K et al (2002) The potential Neanderthal vowel space was as large as that of modern humans. J Phon 30:465-484
Bond ZS (1976) Identification of vowels excerpted from neutral nasal contexts. J Acoust Soc Am 59:1229-1232
Bouhuys A (1974) Breathing. Grune and Stratton, New York
Brainard MS, Doupe AJ (2000) Interruption of a basal-ganglia-forebrain circuit prevents plasticity of learned vocalizations. Nature 404:762-766
Broca P (1861) Nouvelle observation d'aphémie produite par une lésion de la moitié postérieure des deuxième et troisième circonvolutions frontales. Bull Soc Anat 6:398-407, 2nd ser
Brodmann K (1908) Beiträge zur histologischen Lokalisation der Grosshirnrinde. VII. Mitteilung: Die cytoarchitektonische Cortexgliederung der Halbaffen (Lemuriden). J Psychol Neurol 10:287-334
Brodmann K (1909) Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Barth, Leipzig
Brodmann K (1912) Ergebnisse über die vergleichende histologische Lokalisation der Grosshirnrinde mit besonderer Berücksichtigung des Stirnhirns. Anat Anz Suppl 41:157-216
Browne J (1995) Charles Darwin voyaging. Princeton University Press, Princeton
Budil I (1994) Functional reconstruction of the supralaryngeal vocal tract of fossil human. Hum Evol 9:35-52
Capasso L, Michetti E, D'Anastasio R (2008) A Homo erectus hyoid bone: possible implications for the origin of the human capability for speech. Coll Anthropol 4:1007-1011
Carew TJ, Walters ET, Kandel ER (1981) Associative learning in Aplysia: cellular correlates supporting a conditioned fear hypothesis. Science 211:501-503
Carlisle RC, Siegel MI (1974) Some problems in the interpretation of Neanderthal speech capabilities: a reply to Lieberman. Am Anthropol 76:319-322
Carlisle RC, Siegel MI (1978) Additional comments on problems in the interpretation of Neanderthal speech capabilities. Am Anthropol 80:367-372
Carlsöö S, Leijon G (1960) A radiographic study of the position of the hyo-laryngeal complex in relation to the skull and cervical column in man. Trans R Sch Dent Umea (Stockh) 513-535
Carre R, Lindblom B, MacNeilage P (1995) Acoustic factors in the evolution of the human vocal tract. CR Acad Sci Par t320(Ser IIb):471-476
Cheney DL, Seyfarth RM (1990) How monkeys see the world: inside the mind of another species. University of Chicago Press, Chicago
Chomsky N (1972) Language and mind. Harcourt Brace Jovanovich, New York
Chomsky N (1976) On the nature of language. In: Steklis HB, Harnad SR, Lancaster J (eds) Origins and evolution of language and speech. New York Academy of Sciences, New York
Chomsky N (2012) The science of language. Cambridge University Press, Cambridge
Crelin ES (1969) Anatomy of the newborn: an atlas. Lea and Febiger, Philadelphia
Crelin ES (1973) The Steinheim skull: a linguistic link. Yale Sci 48:10-14
Cummings JL (1993) Frontal-subcortical circuits and human behavior. Arch Neurol 50:873-880
Curtiss S (1977) Genie: a psycholinguistic study of a modern-day "wild child". Academic, New York
Darwin C (1859/1964) On the origin of species. Harvard University Press, Cambridge, MA
De Boer B (2009) Acoustic analysis of primate air sacs and their effect on vocalization. J Acoust Soc Am 126:3329-3343
De Boer B (2010) Modeling vocal anatomy's significant effect on speech. J Evol Psychol 8:351-366
De Boer B (2012) Loss of air sacs improved hominin speech abilities. J Hum Evol 62:1-6

De Boer B, Fitch WT (2010) Computer models of vocal tract evolution: an overview and critique. Adapt Behav 18:36-47
DeGusta D, Gilbert WH, Turner SP (1999) Hypoglossal canal size and hominid speech. Proc Natl Acad Sci U S A 96:1800-1804
Devlin JT, Poldrack RA (2007) In praise of tedious anatomy. Neuroimage 37:1033-1058
Dronkers NF, Plaisant O, Iba-Zizen MT et al (2007) Paul Broca's historic cases: high resolution MR imaging of the brains of Leborgne and Lelong. Brain 130:1432-1441
DuBrul EL, Reed CA (1960) Skeletal evidence of speech? Am J Phys Anthropol 18:153-156
Duchin L (1990) The evolution of articulate speech: comparative anatomy of the oral cavity in Pan and Homo. J Hum Evol 19:687-697
Duncan J, Owen AM (2000) Common regions of the human frontal lobe recruited by diverse cognitive demands. TINS 10:475-483
Enard W, Przeworski M, Fisher SE et al (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869-872
Enard W, Gehre S, Hammerschmidt K et al (2009) A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice. Cell 137:961-971
Everett DL (2012) Language: the cultural tool. Pantheon, New York
Falk D (1975) Comparative anatomy of the larynx in man and the chimpanzee: implications for language in Neanderthal. Am J Phys Anthropol 43:123-132
Fernald AT, Taeschner J, Dunn J et al (1989) A cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal infants. J Child Lang 16:477-501
Fisher SE, Vargha-Khadem F, Watkins KE et al (1998) Localization of a gene implicated in a severe speech and language disorder. Nat Genet 18:168-170
Fitch WT (2000) Skull dimensions in relation to body size in nonhuman mammals: the causal bases for acoustic allometry. Zool 103:40-58
Fitch WT (2010) The evolution of language. Cambridge University Press, New York
Flowers KA, Robertson C (1985) The effects of Parkinson's disease on the ability to maintain a mental set. J Neurol Neurosurg Psych 48:517-529
Frayer DW, Nicolay CW (2000) Fossil evidence for the origin of speech sounds. In: Wallin NL, Merker B, Brown S (eds) The origins of music. MIT Press, Cambridge, MA
Gardner RA, Gardner BT (1969) Teaching sign language to a chimpanzee. Science 165:664-672
Goldstein K (1948) Language and language disturbances. Grune and Stratton, New York
Granat J, Boë L-J, Badin P et al (2007) Prediction of the ability of reconstituted vocal tracts of fossils to produce speech. In: ICPhS XVI, pp 381-384. www.icphs2007.de
Graybiel AM (1995) Building action repertoires: memory and learning functions of the basal ganglia. Curr Opin Neurobiol 5:733-741
Greenberg J (1963) Universals of language. MIT Press, Cambridge, MA
Gunz P, Neubauer S, Maureille B et al (2010) Brain development after birth differs between Neanderthals and modern humans. Curr Biol 20:R921-R922
Gunz P, Neubauer S, Golovanova L et al (2012) A uniquely modern human pattern of endocranial development: insights from a new cranial reconstruction of the Neandertal newborn from Mezmaiskaya. J Hum Evol 62:300-313
Harrington DL, Haaland KY (1991) Sequencing in Parkinson's disease: abnormalities in programming and controlling movement. Brain 114:99-115
Hauser MD, Chomsky N, Fitch WT (2002) The faculty of language: what is it, who has it, how did it evolve? Science 298:1569-1579
Hebb DO (1949) The organization of behavior: a neurophysiological theory. Wiley, New York
Henke W (1966) Dynamic articulatory model of speech production using computer simulation. Ph.D. diss, MIT
Herculano-Houzel S (2009) The human brain in numbers: a linearly scaled-up primate brain. Front Hum Neurosci 3:1-11
Hershkovitz I, Speirs MS, Frayer D et al (1995) Ohalo II H2: a 19,000-year-old skeleton from a water-logged site at the Sea of Galilee, Israel. Am J Phys Anthropol 96:215-234
Hewes GW (1973) Primate communication and the gestural origin of language. Curr Anthropol 14:5-24

Hewitt G, MacLarnon A, Jones KE (2000) The functions of laryngeal air sacs in primates: a new hypothesis. Folia Primatol 73:70-94
Hillenbrand J, Getty LA, Clark MJ et al (1995) Acoustic characteristics of American English vowels. J Acoust Soc Am 97:3099-3111
Hooton EA (1946) Up from the Ape, 2nd edn. Macmillan, New York
Houghton P (1993) Neandertal supralaryngeal vocal tract. Am J Phys Anthropol 90:139-146
Iwatsubo T, Kuzuhara S, Kanemitsu MD et al (1990) Corticofugal projections to the motor nuclei of the brainstem and spinal cord in humans. Neurology 140:309-312
Jellinger K (1990) New developments in the pathology of Parkinson's disease. In: Streifler MB, Korczyn AD, Melamed E et al (eds) Advances in neurology, vol 53, Parkinson's disease: anatomy, pathology, and therapy. Raven, New York
Jerison HJ (1973) Evolution of the brain and intelligence. Academic, New York
Jin X, Costa RM (2010) Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature 466:457-462
Joshua M, Adler A, Mitelman R et al (2008) Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J Neurosci 28:11673-11684
Jurgens U (2002) Neural pathways underlying vocal control. Neurosci Biobehav Rev 26:235-258
Kaminski J, Call J, Fischer J (2004) Word learning in a domestic dog: evidence for "fast mapping". Science 304:1682-1683
Kean MR, Houghton P (1982) The Polynesian head: growth and form. J Anat 135:423-435
King EW (1952) A roentgenographic study of pharyngeal growth. Angle Orthod 22:23-37
Kondoh M (2010) Linking learning adaptation to trophic interactions: a brain size-based approach. Funct Ecol 24:35-43
Konopka G, Bomar JM, Winden K et al (2009) Human-specific transcriptional regulation of CNS development genes by FOXP2. Nature 462:213-217
Koppel HJ, Kendrick GS, Moreland JE (1968) The vertebral level of the arytenoid cartilages. Anat Rec 160:583-586
Kotz SA, Frisch S, von Cramon DY et al (2003) Syntactic language processing: ERP lesion data on the role of the basal ganglia. J Int Neuropsychol Soc 9:1053-1060
Kuypers H (1958) Corticobulbar connections to the pons and lower brainstem in man. Brain 81:364-388
Lai CS, Gerrelli D, Monaco AP et al (2003) FOXP2 expression during brain development coincides with adult sites of pathology in a severe speech and language disorder. Brain 126:2455-2462
Laitman JT, Crelin ES (1976) Postnatal development of the basicranium and vocal tract region in man. In: Bosma JT (ed) Development of the basicranium. National Institutes of Health, Bethesda
Laitman JT, Heimbuch RC (1982) The basicranium of Plio-Pleistocene hominids as an indicator of their upper respiratory systems. Am J Phys Anthropol 59:323-343
Laitman JT, Heimbuch RC, Crelin ES (1978) Developmental change in a basicranial line and its relationship to the upper respiratory system in primates. Am J Anat 152:467-482
Laitman JT, Heimbuch RC, Crelin ES (1979) The basicranium of fossil hominids as an indicator of their upper respiratory systems. Am J Phys Anthropol 51:15-34
Lange KW, Robbins TW, Marsden CD et al (1992) L-dopa withdrawal in Parkinson's disease selectively impairs cognitive performance in tests sensitive to frontal lobe dysfunction. Psychopharmacology (Berl) 107:394-404
Lartet E (1868) De quelques cas de progression organique vérifiables dans la succession des temps, géologiques sur des mammifères de même famille et de même genre. Comp Rendus Acad Sci Par 66:1119-1122
Lehéricy S, Ducros M, Van de Moortele PF et al (2004) Diffusion tensor fiber tracking shows distinct corticostriatal circuits in humans. Ann Neurol 55:522-527
Liberman AM, Cooper FS, Shankweiler DP et al (1967) Perception of the speech code. Psychol Rev 74:431-461
Lieberman P (1967) Intonation, perception and language. MIT Press, Cambridge, MA
Lieberman P (1993) On the Kebara KMH 2 hyoid and Neanderthal speech. Curr Anthropol 34:172-175
Lieberman P (1994) Hyoid bone position and speech: reply to Dr Arensburg, et al. (1990). Am J Phys Anthropol 94:275-278

Lieberman P (2000) Human language and our reptilian brain: the subcortical bases of speech, syntax, and thought. Harvard University Press, Cambridge, MA
Lieberman P (2002) On the nature and evolution of the neural bases of human language. Yearb Phys Anthropol 45:36-62
Lieberman P (2006) Toward an evolutionary biology of language. Harvard University Press, Cambridge, MA
Lieberman P (2007) The evolution of human speech: its anatomical and neural bases. Curr Anthropol 48:39-66
Lieberman DE (2008) Speculations about the selective basis for modern human craniofacial form. Evol Anthropol 17:55-68
Lieberman P (2009) FOXP2 and human cognition. Cell 137:800-802
Lieberman DE (2011) The evolution of the human head. Harvard University Press, Cambridge, MA
Lieberman P (2012) Vocal tract anatomy and the neural bases of talking. J Phon 40:608-622
Lieberman P (2013) The unpredictable species: what makes humans unique. Princeton University Press, Princeton
Lieberman P, Crelin ES (1971) On the speech of Neanderthal man. Linguistics Inq 2:203-222
Lieberman DE, McCarthy RC (1999) The ontogeny of cranial base angulation in humans and Chimpanzees and its implications for reconstructing pharyngeal dimensions. J Hum Evol 36:487-517
Lieberman P, McCarthy RC (2007) Tracking the evolution of language and speech. Expedition 49:15-20
Lieberman P, Klatt DH, Wilson WH (1969) Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science 164:1185-1187
Lieberman P, Crelin ES, Klatt DH (1972) Phonetic ability and related anatomy of the newborn, adult human, Neanderthal man, and the Chimpanzee. Am Anthropol 74:287-307
Lieberman P, Kako ET, Friedman J et al (1992) Speech production, syntax comprehension, and cognitive deficits in Parkinson's disease. Brain Lang 43:169-189
Lieberman DE, McCarthy RC, Hiiemae KM et al (2001) Ontogeny of larynx and hyoid descent in humans: implications for deglutition and vocalization. Arch Oral Biol 46:117-128
Lieberman DE, McBratney BM, Krovitz G (2002) The evolution and development of cranial form in Homo sapiens. Proc Natl Acad Sci U S A 99:1134-1139
MacLarnon AM, Hewitt GP (2004) Increased breathing control: another factor in the evolution of human language. Evol Anthropol 13:181-197
MacLean PD, Newman JD (1988) Role of midline frontolimbic cortex in the production of the isolation call of Squirrel monkeys. Brain Res 450:111-123
Maricic T, Günther V, Georgiev O et al (2013) A recent evolutionary change affects a regulatory element in the human FOXP2 gene. Mol Biol Evol 25:1257-1259
Marsden CD, Obeso JA (1994) The functions of the basal ganglia and the paradox of stereotaxic surgery in Parkinson's disease. Brain 117:877-897
Martínez I, Arsuaga JL, Quam R et al (2008) Human hyoid bones from the middle Pleistocene site of the Sima de los Huesos (Sierra de Atapuerca, Spain). J Hum Evol 54:118-124
McCarthy RC, Strait DS, Yates FW et al (in preparation) Origin of fully modern human speech capabilities
McCown TD, Keith A (1939) The stone age of Mount Carmel. The fossil human remains from the Levalloiso-Mousterian, vol II. Oxford University Press, Oxford
McElligott AG, Birrer M, Vannoni E (2006) Retraction of the mobile descended larynx during groaning enables fallow bucks (Dama dama) to lower their formant frequencies. J Zool 270:340-345
Miller EK, Wallis JD (2009) Executive function and higher order cognition: definition and neural substrates. In: Squire LR (ed) Encyclopedia of neuroscience, vol 4. Springer, New York
Mirenowicz J, Schultz W (1996) Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379:449-451
Mithen S (2005) The singing Neanderthals: the origins of music, language, mind and body. Weidenfeld and Nicolson, London
Monchi O, Petrides M, Petre V et al (2001) Wisconsin card sorting revisited: distinct neural circuits participating in different stages of the task identified by event-related functional magnetic resonance imaging. J Neurosci 21:7739-7741
Monchi O, Petrides M, Strafella AP et al (2006a) Functional role of the basal ganglia in the planning and execution of actions. Ann Neurol 59:257-264
Monchi O, Ko JH, Strafella AP (2006b) Striatal dopamine release during performance of executive function: A [11C] raclopride PET study. Neuroimage 33:907-912
Monchi O, Petrides M, Mejia-Constain B et al (2007) Cortical activity in Parkinson disease during executive processing depends on striatal involvement. Brain 130:233-244
Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520-562
Muller J (1848) The physiology of the senses, voice and muscular motion with the mental faculties (trans: Baly W). Walton and Maberly, London
Nauta WJH, Gygax PA (1954) Silver impregnation of degenerating axons in the central nervous system: a modified technic. Stain Technol 29:91-93
Nearey T (1978) Phonetic features for vowels. Indiana University Linguistics Club, Bloomington
Negus VE (1949) The comparative anatomy and physiology of the larynx. Hafner, New York
Newman JD (1985) The infant cry of primates: an evolutionary perspective. In: Lester BM, Zachariah Boukydis CF (eds) Infant crying: theoretical and research perspectives. Plenum, New York
Nishimura T (2003) Comparative morphology of the hyo-laryngeal complex in anthropoids: two steps in the evolution of the descent of the larynx. Primates 44:41-49
Nishimura T, Mikami A, Suzuki J et al (2003) Descent of the larynx in chimpanzee infants. Proc Natl Acad Sci U S A 100:6930-6933
Nishimura T, Mikami A, Suzuki J, Matsuzawa T (2006) Descent of the hyoid in Chimpanzees: evolution of face flattening and speech. J Hum Evol 51:244-254
Nishimura T, Oishi T, Suzuki J et al (2008) Development of the supralaryngeal vocal tract in Japanese macaques: implications for the evolution of the descent of the larynx. Am J Phys Anthropol 135:182-194
Perkell JS (1969) Physiology of speech production: results and implications of a quantitative cineradiographic study. MIT Press, Cambridge, MA
Peterson GE, Barney HL (1952) Control methods used in a study of the vowels. J Acoust Soc Am 24:175-184
Petrides M (2005) Lateral prefrontal cortex: architectonic and functional organization. Phil Trans R Soc Lond B 360:781-795
Pickett ER, Kuniholm E, Protopapas A et al (1998) Selective speech motor, syntax and cognitive deficits associated with bilateral damage to the putamen and the head of the caudate nucleus: a case study. Neuropsychologia 36:173-188
Pike KL (1945) The intonation of American English. University of Michigan Press, Ann Arbor
Pinker S (1998) How the mind works. Norton, New York
Postle BR (2006) Working memory as an emergent property of the mind and brain. Neuroscience 139:23-38
Reimers-Kipping S, Hevers S, Pääbo S et al (2011) Humanized Foxp2 specifically affects cortico-basal ganglia circuits. Neuroscience 175:75-84
Roche AF, Barkla DH (1965) The level of the larynx during childhood. Ann Otol Rhinol Laryngol 74:645-654
Rosas A, Martínez-Maza C, Bastir M et al (2006) Paleobiology and comparative morphology of a late Neandertal sample from El Sidrón, Asturias, Spain. Proc Natl Acad Sci U S A 103:19266-19271
Sanes JN, Donoghue JP, Thangaraj V et al (1999) Shared neural substrates controlling hand movements in human motor cortex. Science 268:1775-1777
Savage-Rumbaugh S, Rumbaugh D, McDonald K (1985) Language learning in two species of Apes. Neurosci Biobehav Rev 9:653-665
Schulz GM, Varga M, Jeffries K et al (2005) Functional neuroanatomy of human vocalization: an H2 15O PET study. Cereb Cortex 15:1835-1847
Semendeferi K, Damasio H, Frank R et al (1997) The evolution of the frontal lobes: a volumetric analysis based on three-dimensional reconstructions of magnetic resonance scans of human and Ape brains. J Hum Evol 32:375-378
Semendeferi K, Lu A, Schenker N et al (2002) Humans and great apes share a large frontal cortex. Nat Neurosci 5:272-276
Shu W, Yang H, Zhang L et al (2001) Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lungs and act as transcriptional repressors. J Biol Chem 276:27488-27497

Simard F, Joanette Y, Petrides M et al (2011) Fronto-striatal contributions to lexical set-shifting. Cereb Cortex 21:1084-1093
Spurzheim JK (1815) The physiognomical system of Drs. Gall and Spurzheim. Printed for Baldwin Cradock and Joy, London
Stevens KN (1972) Quantal nature of speech. In: David EE Jr, Denes PB (eds) Human communication: a unified view. McGraw Hill, New York
Stuss DT, Benson DF (1986) The frontal lobes. Raven, New York
Takemoto H (2001) Morphological analyses of the human tongue musculature for three-dimensional modeling. J Speech Hear Res 44:95-107
Takemoto H (2008) Morphological analyses and 3D modeling of the tongue musculature of the Chimpanzee (Pan troglodytes). Am J Primatol 70:966-975
Terao S, Li M, Hashizume Y et al (1997) Upper motor neuron lesions in stroke patients do not induce anterograde transneuronal degeneration in spinal anterior horn cells. Stroke 28:2553-2556
The Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the Chimpanzee genome and comparison with the human genome. Nature 437:69-87
Tishkoff SA, Reed F et al (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 39:31-40
Tourné LPM (1991) Growth of the pharynx and its physiologic implications. Am J Orthod Dentofac Orthop 99:129-139
Trinkaus E (1983) The Shanidar Neandertals. Academic, New York
Trinkaus E (2003) Neandertal faces were not long; modern human faces are short. Proc Natl Acad Sci U S A 100:8142-8145
Truby HL, Bosma JF, Lind J (1965) Newborn infant cry. Almquist and Wiksell, Uppsala
Tsanas A, Little MA, McSharry PE et al (2009) Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests. IEEE Trans Biomed Eng 53:1943-1953
Tseng C-Y (1981) An acoustic study of tones in Mandarin. PhD dissertation, Brown University
Tyson E (1699) Orang-outang, sive, Homo sylvestris; or, the anatomy of a pygmie compared with that of a monkey, an ape, and a man. Printed for Thomas Bennet and Daniel Brown, London
Ullman MT (2004) Contributions of memory circuits to language: the declarative/procedural model. Cognition 92:231-270
Vargha-Khadem F, Watkins K, Alcock K et al (1995) Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proc Natl Acad Sci U S A 92:930-933
Vargha-Khadem F, Watkins KE, Price CJ et al (1998) Neural basis of an inherited speech and language disorder. Proc Natl Acad Sci U S A 95:12695-12700
Vorperian HK, Wang S, Chung MK et al (2009) Anatomic development of the oral and pharyngeal proportions of the vocal tract: an imaging study. J Acoust Soc Am 125:1666-1678
Wang J, Rao H, Wetmore GS et al (2005) Perfusion functional MRI reveals cerebral blood flow pattern under psychological stress. Proc Natl Acad Sci U S A 102:17804-17809
Watkins KE, Vargha-Khadem F, Ashburner J et al (2002) MRI analysis of an inherited speech and language disorder: structural brain abnormalities. Brain 125:465-478
Westhorpe RN (1987) The position of the larynx in children and its relationship to the ease of intubation. Anaesth Intensive Care 15:384-388

The Evolution of Speech and Language

Dr. Philip Lieberman, Brown University, Providence, USA
Dr. Robert C. McCarthy, Department of Biological Sciences, Benedictine University, Lisle, USA

DOI: 10.1007/SpringerReference_363355
URL: http://www.springerreference.com/index/chapterdbid/363355
Part of: Handbook of Paleoanthropology
Editors: Professor Winfried Henke and Professor Ian Tattersall
PDF created on: March 3, 2014 13:29

© Springer-Verlag Berlin Heidelberg 2014
