Language Evolution from a Simulation Perspective

Language Evolution from a Simulation Perspective: on the Coevolution of Compositionality and Regularity

GONG, Tao

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electronic Engineering

© The Chinese University of Hong Kong May 2007

The Chinese University of Hong Kong holds the copyright of this thesis. Any person(s) intending to use a part or the whole of the materials in this thesis in a proposed publication must seek copyright release from the Dean of the Graduate School.

To my parents, Dekui GONG and Qihui WANG, for their unfailing consideration and continuous support.

“Chaque langue forme un système où tout se tient, et a un plan général d’une merveilleuse rigueur.” (“Every language forms a system, all parts of which organically cohere and interact.”) ---- Antoine Meillet (1893)

“Art is a lie, but it helps us to understand the truth.” ---- Pablo Picasso

Acknowledgement

This thesis is a synthesis of the research during my PhD study at both City University of Hong Kong (CityU) and The Chinese University of Hong Kong (CUHK). I am indebted to my supervisor, Prof. William S-Y WANG, so greatly that all the words in this thesis do not suffice to thank him. Without his kindness in enrolling me in the Language Engineering Laboratory (LEL) at CityU in 2003, I would not have had the opportunity to begin my academic study in evolutionary linguistics. His special wisdom led me into the realm of linguistics, and his insightful papers on language evolution inspired me to choose my topic on the computational simulation of language origin. He has guided me to appreciate the beauty and cherish the joy of doing multi-disciplinary research in basic science, and his encouragement has supported me all along in overcoming various difficulties in research. He is a mentor not only of research but also of life. I am deeply grateful for his valuable lessons on how to be disciplined and persistent, how to handle miscellaneous interferences, and how to be optimistic about life. All these years of studying with him have transformed me from an ignorant student and a “sad apple” into a dedicated researcher and an energetic person. I would like to extend my gratitude to Prof. Ron CHEN, who stimulated my interest in complex networks, partially supported my study at CityU, and sincerely helped me in the search for suitable future academic positions. I also want to thank Professors Thomas Hun-Tak LEE and John H. HOLLAND. Prof. Lee’s great knowledge of language acquisition and critical comments on my work were invaluable in the design of my model. Prof. Holland’s insightful lectures at the Complex System Summer School and precious suggestions during his visit to LEL greatly inspired me not only to complete my work but also to extend my vision to other evolutionary topics. The friendship of these leading scholars has guided my explorations in this field. I am also grateful to Professors Vittorio LORETO, Jim HURFORD, Simon KIRBY, Luc STEELS, and Tom SCHOENEMANN, and to the scholars Christophe COUPÉ and Paul VOGT, for the useful discussions during a number of exchanges between their institutions and LEL. They have kindly supported my pursuit of this research throughout these years.


I have been in LEL for almost four years, and I owe a lot to the technical support and moral encouragement given by all the current and alumni members of this great family. It was my great honor to collaborate with Dr. James W. MINETT and Dr. Jinyun KE. Their rich ideas and experience in linguistics and computational simulation greatly inspired my research, and they generously helped me to improve my awkward English and polish my publications. In addition, I am especially grateful to Dr. Feng WANG, Dr. Gang PENG and Dr. Ching-Pong AU, who kindly helped me adapt to life in Hong Kong. Furthermore, I greatly enjoyed discussing various research and living topics with my current colleagues, Susan Lan SHUAI, Francis Chunkit WONG, Hongying ZHENG, Raymond Weiwen WU, Yingwei WONG, Margaret LEI, and Hui CHEN. I am very fortunate to have these close amigos in my life. Finally, I would like to thank the members of a bigger family, the Digital Signal Processing and Speech Technology Laboratory (DSPSTL). The leaders of this laboratory, Professors Pak-Chung CHING and Tan LEE, have generously supported my basic science research in this engineering laboratory, and I enjoyed working, growing, and improving my skills on and off the bench with the colleagues in DSPSTL: Meng YUAN, Yujia LI, Nengheng ZHENG, Hua OUYANG, Wei ZHANG, Ning WANG, Feng TIAN, Houwei CAO, Yao QIAN, Zhao WANG, Jing ZHANG, Sheng ZHANG, Yvone Siuwa LI, and Natalie Li ZENG. Without these friends, my PhD life at CUHK would have lost many of its colors.


Abstract

The thesis presents a multi-agent computational model to explore a key question in language emergence: whether syntactic abilities result from innate, species-specific competences, or whether they evolve from domain-general abilities through gradual adaptation. The model simulates the coevolutionary emergence of two linguistic universals in human language, compositionality (in the form of lexical items) and regularity (in the form of constituent word orders): the acquisition and conventionalization of these features coevolve during the transition from a holistic signaling system to a compositional language. It also traces a “bottom-up” process of syntactic development: by reiterating local orders between two lexical items, agents can gradually form global order(s) that regulate multiple lexical items in sentences. These results suggest that compositionality, regularity, and the correlated linguistic abilities could have emerged from domain-general abilities, such as pattern extraction and sequential learning. Apart from individual learning mechanisms, the thesis also explores the effects of cultural transmission and of social and semantic structures on language evolution. First, it simulates the major forms of cultural transmission and discusses the role of conventionalization during horizontal transmission in language evolution. Second, it traces the emergence and maintenance of language in some stable social structures, and explores the role of popular agents in language evolution, the relationship between mutual understanding and social hierarchy, and the effect of exoteric communications on the convergence of communal languages. Finally, it studies language maintenance under different semantic spaces, and illustrates that the semantic structure may bias the constituent word order, which can help predict the word order bias observed in human languages. These explorations examine the role of self-organization in language evolution, offer some reconsideration of the bottleneck effect during cultural transmission, and shed light on the study of social structure effects on language evolution.



Table of Contents

Acknowledgement .... i
Abstract .... iii
List of Figures .... viii
List of Tables .... xiii

Chapter 1. Introduction .... 1
1.1. Evolutionary Linguistics and Simulation Perspective .... 1
1.2. The Coevolutionary Hypothesis and My Computational Framework .... 3
1.3. Thesis Structure .... 7

Chapter 2. Research Question: Language Emergence .... 10
2.1. Three Stages of Language Evolution .... 10
2.1.1. What is the capacity for human language? .... 10
2.1.2. What is evolving during language evolution? .... 13
2.1.3. How to date language evolution? .... 17
2.1.4. Language emergence .... 19
2.1.5. Language change .... 21
2.1.6. Language death .... 23
2.2. Mainstream Theories on Language Emergence .... 24
2.2.1. Innatism .... 25
2.2.2. Connectionism/Emergentism .... 34
2.2.3. The scenarios of Innatism and Connectionism/Emergentism .... 41
2.3. The Coevolutionary Hypothesis on Language Emergence .... 47
2.3.1. Holistic protolanguage with no syntactic structure to encode descriptive meanings .... 48
2.3.2. Coevolution of compositionality and regularity .... 49
2.3.3. From “local” to “global”, a “bottom-up” process of syntactic development .... 52

Chapter 3. Research Method: Computational Simulation .... 56
3.1. Major Components of Computational Linguistics .... 56
3.2. Computational Simulation .... 58
3.2.1. The necessity of computational simulation .... 59
3.2.2. The limitations and difficulties of computational simulation .... 66
3.2.3. Classification based on the purpose .... 69
3.2.4. Classification based on the resolution of language representation .... 71
3.2.5. Classification based on the situatedness of individuals and the structure of language .... 77
3.2.6. Classification based on the adopted research methods .... 86
3.2.7. General steps to build up a computational model on language evolution .... 90
3.3. My Computational Framework on Language Emergence .... 93

Chapter 4. Compositionality-Regularity Coevolution Model .... 96
4.1. Semantic Space .... 96
4.2. Utterance Space .... 99
4.3. Representation and Acquisition of Linguistic Knowledge .... 99
4.3.1. Lexical rules .... 101
4.3.2. Syntactic rules .... 103
4.3.3. Categories .... 104
4.3.4. Acquisition of linguistic knowledge .... 107
4.4. Memory System .... 121
4.5. Communication Scenario .... 124
4.5.1. The utterance exchange .... 124
4.5.2. The environmental cue .... 127
4.5.3. The interaction of linguistic rules in production and comprehension .... 129
4.5.4. The strength-based competition .... 131
4.6. Indices to Test the Performance .... 139

Chapter 5. The Coevolutionary Emergence Process and Effects of Language-related Abilities .... 143
5.1. The Coevolutionary Emergence of Compositionality and Regularity .... 143
5.1.1. The simulation condition and parameter setting .... 143
5.1.2. Language emergence through iterated communications .... 144
5.1.3. Heterogeneity in language development .... 155
5.2. The Effects of Language-related Abilities on Language Emergence .... 162
5.2.1. The effects of the production probability (PP) .... 165
5.2.2. The effects of the learning ability .... 166
5.2.3. The effects of the ability to build up a shared intention .... 172
5.2.4. Language emergence as a collective result of language-related abilities .... 174

Chapter 6. The Role of Cultural Transmissions in Language Evolution .... 178
6.1. Introduction .... 178
6.2. The Acquisition Framework .... 183
6.3. The Role of Cultural Transmission in Language Evolution .... 186
6.3.1. Language emergence in the acquisition framework .... 186
6.3.2. Exp.1 .... 192
6.3.3. Exp.2 .... 197
6.4. Conclusion and General Discussion .... 201

Chapter 7. The Effects of Social Structure on Language Evolution .... 206
7.1. Introduction .... 206
7.2. The Effects of Social Structure on Language Evolution .... 211
7.2.1. Exp.1: a community with a single popular agent .... 211
7.2.2. Exp.2: a community with a given distribution of individuals’ popularities .... 214
7.2.3. Exp.3: the linguistic contact between two communities .... 221
7.3. Conclusion and General Discussion .... 224

Chapter 8. The Prediction of Word Order Bias .... 228
8.1. Introduction .... 228
8.2. Simulations on the Word Order Bias .... 231
8.2.1. Terminology to classify local and global orders as well as order changes .... 231
8.2.2. Exp.1 .... 235
8.2.3. Exp.2 .... 239
8.3. Conclusion and General Discussion .... 246

Chapter 9. General Discussion .... 250
9.1. Major Conclusions of this Research .... 250
9.1.1. The coevolution of compositionality and regularity .... 250
9.1.2. The influence of the linguistic environment .... 251
9.1.3. The self-organization of various intensively-connected components .... 253
9.1.4. The underlying driving force of semantics .... 254
9.2. Adopted Assumptions and Empirical Bases .... 256
9.2.1. Agents share the same semantic space and semantic categories .... 256
9.2.2. Agents have communicative intentions and symbolic communication abilities .... 258
9.2.3. Agents can limitedly read others’ intentions .... 259
9.2.4. Agents have the imitation ability .... 260
9.2.5. Agents have the pattern extraction and associative learning abilities .... 261
9.2.6. Agents have the sequencing and categorization abilities .... 263
9.2.7. Agents’ behaviors are rule governed .... 264
9.3. Limitations and Future Modifications .... 266
9.3.1. The implicit connection between semantics and syntax .... 267
9.3.2. The scale-up problems .... 269
9.3.3. The offline utterance processing mechanisms .... 271
9.4. Emergentism and Computational Simulation Revisited .... 274
9.5. Final Remark: A Scenario for the Evolution of Linguistic Morphology .... 276

Appendix A. Algorithms and Parameters of the Model .... 278
A.1 Data Structures of Basic Components .... 278
A.1.1 Data structures representing the artificial language (see Figure A.1) .... 278
A.1.2 Data structures representing the linguistic knowledge (see Figure A.2) .... 279
A.1.3 Data structures representing the memory system and nonlinguistic information .... 280
A.2 Acquisition of Linguistic Knowledge .... 281
A.2.1 The insertion of a new M-U mapping in the buffer .... 281
A.2.2 The acquisition of lexical rules .... 283
A.2.3 The acquisition of categories and syntactic rules .... 285
A.3 Utterance Exchange .... 288
A.3.1 Production .... 288
A.3.2 Comprehension .... 291
A.3.3 The selection of categories and syntactic rules in production .... 296
A.3.4 The selection of categories and syntactic rules in comprehension .... 299
A.4 Strength-based Competition .... 302
A.5 Parameters of the Model .... 308
A.5.1 Parameters for the artificial language .... 308
A.5.2 Parameters for the linguistic rules .... 309
A.5.3 Parameters for the communication .... 310

Appendix B. Adjustments on Word Order Stability and Transitivity Matrices .... 312
Bibliography .... 318


List of Figures

Figure 1.1. The conceptual framework of the computational model in this thesis. The SEMANTICS rectangle represents the predefined semantic space, and the ovals represent the three aspects of linguistic knowledge acquired by agents based on different domain-general abilities: pattern extraction, sequential learning, and categorization. The EMERGENT GLOBAL ORDERS rectangle represents the emergent syntactic patterns triggered by these domain-general abilities and correlated linguistic knowledge (indicated by arrows). .... 6
Figure 3.1. The explicit meaning transference. A communicative episode consisting of the explicit transfer of both the utterance /zknvrt/ and the meaning “three apples” from the speaker (left) to the listener (right) (adapted from A. Smith 2003c). .... 75
Figure 3.2. The semiotic triangle. The arbitrary or conventionalized relation is indicated by a solid line, and the indirect relation by a dotted line (adapted from Ogden and Richards 1923). .... 76
Figure 3.3. The speaking (pij) and listening (qij) matrices. (a) An example of the speaking and listening matrices; (b) an example of inconsistent coherent speaking and listening matrices; (c) an example of consistent coherent speaking and listening matrices (adapted from Ke et al. 2002). .... 79
Figure 3.4. The lexical convergence. The X-axis shows the number of communications, and the Y-axis the similarity or the convergence rate. The thick line traces SI, the ticked line traces PC, and the dashed line traces IC. Phase transitions of IC and PC take place around 3,000 communications. Simulation conditions: population size = 10, number of meanings = number of utterances = 3, . = 0.2 (adapted from Ke et al. 2002). .... 80
Figure 3.5. The iterative learning framework. Linguistic competence (Hi) regulates an individual (Ai) of generation i to express certain meanings (Mi) with certain utterances (Ui). These utterances are also the primary linguistic data exposed to individuals in generation i+1. In this figure, each generation has one individual and learning happens only between individuals in successive generations (adapted from K. Smith et al. 2003). .... 81
Figure 3.6. Learning and production. (a) Learning: the large filled circles represent activated, related nodes (labelled with the component they represent) and small filled circles represent associative weights; (b) Production: the relevant connection weights are highlighted in gray (adapted from Smith et al. 2003). .... 82
Figure 3.7. The compositional structure (average of 100 agents in one generation; the cone height corresponds to the probability of using that node: the greatest height corresponds to 100% use of that node in the whole population) (adapted from Munroe and Cangelosi 2002). .... 85
Figure 3.8. The structure of the neural network in an individual. The input layer includes visual information of a mushroom and a linguistic utterance to describe a mushroom. The output layer includes an action part to correlate language communications with effective actions, and a linguistic output for training other individuals (adapted from Munroe and Cangelosi 2002). .... 85
Figure 4.1. The examples of lexical rules. Numbers in brackets are lexical rule strengths. Lexical rules (a)-(d) are holistic, (e)-(g) are compositional, in which (e) and (f) are word rules and (g) is a phrase rule. Lexical rules (b) and (c) are synonymous rules, and (b) and (f) are homonymous rules. .... 101
Figure 4.2. The examples of syntactic rules. Numbers in brackets are syntactic rule strengths. “” is “after”, “S” is “surround”, and “T” is “between”. Syntactic rule (1) means that the utterance of Lexical rule (a) is before the utterance of Lexical rule (b). Syntactic rule (2) means that the utterance of Lexical rule (c) is after the utterance of any lexical rule in Category (I). Syntactic rule (3) means that the utterance of any lexical rule in Category (II) surrounds the utterance of any lexical rule in Category (III). Syntactic rule (4) means that the utterance of any lexical rule in Category (III) is surrounded by the utterance of any lexical rule in Category (IV). .... 104
Figure 4.3. The examples of categories. Numbers in brackets are lexical or syntactic rule strengths. Numbers in square brackets are association weights of lexical rules. S, V, and O in brackets are syntactic roles of categories. Cat1, Cat2 and Cat3 are categories, in which Lex-List is the list of lexical rules and Syn-List is the list of syntactic rules. The sememe of the lexical rule (d) can be either Ag or Pat in different integrated meanings, and this lexical rule is associated with both an S category and an O category with different association weights. The syntactic rule (I) indicates a local order “after” between the utterance of any lexical member from the V category and the utterance of any lexical member from the S category, and it can be simplified as SV. .... 106
Figure 4.4. The examples of detecting recurrent patterns to acquire lexical rules. .... 112
Figure 4.5. The examples of the acquired syntactic rules and categories based on the instances in Figure 4.4. .... 113
Figure 4.6. The example of category merging. .... 114
Figure 4.7. The learning mechanism in ILM. In the grammatical construction, “#” is a semantic variable (unspecified item) and “*” is an utterance variable (unspecified syllables). When “#” is replaced by “Mike” or “Peter”, “*” will be replaced by /mike/ or /peter/ respectively (adapted from Kirby 2002a). .... 117
Figure 4.8. The alignment-based learning mechanism in Vogt’s model. (a) An example of the instance set. In each M-U mapping, the meaning consists of 4 numbers, each indicating the value of a semantic feature, such as shape and RGB colours; (b) some recurrent patterns (underlined) found after hearing an M-U mapping: “(1, 0, 0, 0.57)” /redcircle/. The recurrent patterns in Cases (a)-(d) are allowed, but those in Cases (e) and (f) are disallowed (adapted from Vogt 2005a). .... 118
Figure 4.9. The two-level memory system. Com(i), Com(i+1), Com(i+k) are indices of communications. In each communication, the listener can acquire several M-U mappings. .... 122
Figure 4.10. One round of utterance exchange in a communication. The dotted block of environmental cues indicates that this information is optional; and even if it is available, it could be unreliable. .... 125
Figure 4.11. An example of interactions of different types of linguistic rules in production (on the left) and comprehension (on the right). CatS, CatV and CatO are categories with syntactic roles S, V and O. “” is “after”. .... 129
Figure 4.12. An example of calculating the combined strength in production (on the left) and comprehension (on the right). AssoWdef and SynStrdef are both 0.5. “T” means the local order “in between”. The blackened rules are the speaker’s and the listener’s winning sets of linguistic rules. .... 136
Figure 4.13. The conceptual figure showing the transition from relying on nonlinguistic information to relying on linguistic information. .... 138
Figure 5.1. The emergence of compositionality and regularity in Sim.1. .... 146
Figure 5.2. The emergence of compositionality and regularity in Sim.2. .... 147
Figure 5.3. The expressivities of global and local orders in Sim.1 (a)-(d) and Sim.2 (e)-(h). .... 148
Figure 5.4. The UR and Disp curves in Sim.1 (a) and Sim.2 (b). .... 153
Figure 5.5. The UR and Disp of the communal language and idiolects in Sim.1 (a)-(d) and Sim.2 (e)-(h). .... 158
Figure 5.6. The average category sizes in idiolects in Sim.1 (a)-(d) and Sim.2 (e)-(h). .... 159
Figure 5.7. The regularity in idiolects in Sim.1 (a)-(d) and Sim.2 (e)-(h). .... 160
Figure 5.8. The regularity in expressions of every integrated meaning in Sim.1 (a)-(d) and Sim.2 (e)-(h). .... 161
Figure 5.9. The effects of PP on language emergence: the UR and CT values of the emergent languages under different PP values. The distance between a pair of error bars above and below a data point is twice the standard deviation. .... 165
Figure 5.10. The effects of the buffer size on language emergence: the UR and CT values of the emergent languages under different buffer sizes. .... 166
Figure 5.11. The effects of DP on language emergence: the UR and CT values of the emergent languages under different DP values. .... 167
Figure 5.12. The effects of NoDetRecPatExm on language emergence. (a) and (b) show the UR and CT values of the emergent languages under different NoDetRecPatExm; (c) and (d) show a typical run under the situation where NoDetRecPatExm is 5. .... 168
Figure 5.13. The shared lexical rules after 600 rounds of communications in a simulation where NoDetRecPatExm is 5. ComLex records the shared rules; the first number in the bracket is the number of shared holistic rules, the second is that of shared compositional rules, and the sum of these two numbers is written outside the bracket. For each shared rule, Hol denotes a holistic rule; Ag, Pr or Pat denote the semantic roles of the sememe in a compositional rule; the double value denotes the average strength of this shared lexical rule in all agents; the numbers left of “” indicate different sememes, and the numbers right of “” indicate different syllables. .... 170
Figure 5.14. The effects of the ability to learn simple sequential relations on language emergence: the UR and CT values of the emergent languages under different DetLocOrd Probability values. .... 171
Figure 5.15. The effects of RC on language emergence: the UR and CT values of the emergent languages under different RC values. .... 173
Figure 6.1. The acquisition framework. Empty dots are adults and filled ones are children. Different arrows represent different cultural transmissions. In vertical and oblique transmissions (AC-coms), only adults can be speakers and children are always listeners. In horizontal transmissions among children (CC-coms), children can be either speakers or listeners. .... 184
Figure 6.2. The emergence of compositionality and regularity in Sim.1. .... 188
Figure 6.3. The emergence of compositionality and regularity in Sim.2. .... 189
Figure 6.4. The children’s URs in Sim.1 and Sim.2. .... 190
Figure 6.5. The statistical results of Exp.1 on language emergence: the UR, CT and URser values of the emergent languages after 100 generations of learning under different numbers of CC-coms. Numbers outside brackets are average values, numbers inside brackets are standard deviation values. The distance between a pair of error bars above and below a data point is twice the standard deviation. .... 193
Figure 6.6. The statistical results of Exp.1 on language emergence: the UR curves of the communal languages under different numbers of CC-coms. Each panel lists 10 UR curves in each condition. The X-axis is the number of generations, and the Y-axis the percentage of integrated meanings. .... 194
Figure 6.7. The statistical results of Exp.1 on language maintenance: the UR, URser and URini of the communal language after 100 generations of learning under different numbers of CC-coms. .... 195
Figure 6.8. The statistical results of Exp.1 on language maintenance: the UR curves under different numbers of CC-coms. .... 196
Figure 6.9. The statistical results of Exp.2 on language emergence: the UR, CT and URser of the emergent language after 100 generations of learning under different ratios of AC-coms over CC-coms. .... 198
Figure 6.10. The statistical results of Exp.2 on language emergence: the group UR curves (a) and the child UR curves (b) under a certain ratio of AC-coms over CC-coms (40:160). Each panel lists 10 UR curves in each condition. The X-axis is the number of generations, and the Y-axis the percentage of integrated meanings. .... 199
Figure 6.11. The statistical results of Exp.2 on language emergence: the UR curves of the emergent languages under different ratios of AC-coms over CC-coms. .... 199
Figure 6.12. The statistical results of Exp.2 on language maintenance: the UR, URser and URini of the communal languages after 100 generations of learning under different ratios of AC-coms over CC-coms. .... 200
Figure 6.13. The statistical results of Exp.2 on language maintenance: the UR curves of the communal languages after 100 generations of learning under different ratios of AC-coms over CC-coms. .... 201
Figure 7.1. The statistical results in Exp.1 on language emergence: the UR and CT values of the emergent languages under different PR values. The distance between a pair of error bars above and below a data point is twice the standard deviation. .... 211
Figure 7.2. The statistical results in Exp.1 on language maintenance: the UR, URser and URini values of the communal languages under different PR values. .... 212
Figure 7.3. The individuals’ popularities in different power-law distributions. The top figure is in normal axes, and the bottom one in logarithmic axes. .... 217
Figure 7.4. The statistical results in Exp.2 on language emergence: the high UR, last UR and CT values of the emergent languages in communities of different sizes and with different power-law distributions. .... 218
Figure 7.5. The statistical results in Exp.2 on language maintenance: the UR, URser and URini values of the communal languages in communities of different sizes and with different power-law distributions. .... 219
Figure 7.6. The statistical results in Exp.3 on language emergence: the UR and CT values of the communal languages in different communities under various degrees of linguistic contact. .... 222
Figure 7.7. The statistical results in Exp.3 on language maintenance: the URCross-group values between two communal languages sharing different linguistic features. Numbers outside brackets are average values, and numbers inside brackets standard deviation values. .... 223
Figure 8.1. The understandabilities of local and global orders (a) and the local order states of the communal language (b) under the initial SV and SO local orders in Exp.1. The X-axes in these panels express the rounds of communications. In (a), the Y-axes express the proportion of integrated meanings. .... 235
Figure 8.2. The understandabilities of local and global orders (a) and the local order states of the communal language (b) under the initialized SV and OS local orders in Exp.1. .... 236
Figure 8.3. The stability matrix (a) and the transitivity matrix (b) in Exp.1. .... 237
Figure 8.4. The understandabilities of local and global orders (a) and the local order states of the communal language (b) under the initialized SV and SO local orders in Exp.2. .... 240
Figure 8.5. The understandabilities of local and global orders (a) and the local order states of the communal language (b) under the initialized SV and OS local orders in Exp.2. .... 241
Figure 8.6. The stability matrix (a) and the transitivity matrix (b) in Exp.2. .... 242
Figure 8.7. The word order bias shown in Exp.2. The stability of a set of local orders (its corresponding global orders are inside the brackets) is indicated by different types of rectangles, and its transitivity is indicated by different types of arrows. .... 244
Figure A.1. The data structures representing the artificial language. Each rectangle is a data structure. The arrow indicates that the data structure on the right contains some instances of the data structure on the left. .... 279
Figure A.2. The data structures representing the linguistic rules. Each rounded rectangle contains a list of pointers. The arrow indicates that the pointers in the pointer list point to some instances of the data structures representing the linguistic rules. .... 279
Figure A.3. The data structures representing the memory system. .... 280
Figure A.4. The data structure representing the environmental cue. .... 281
Figure A.5. The data structure representing the language user (agent). .... 281
Figure A.6. The flow chart of the insertion of a new M-U mapping into the buffer. Rectangles represent actions, rounded rectangles represent the starting and ending points, diamonds represent decisions, and a pair of rounded trapeziums indicates a repetition of several actions. CurBufSize represents the current buffer size, MaxBufSize the maximum buffer size. Buffer is an instance of CBuffer, in which Buffer[i] stores the ith M-U mapping in the buffer. newMU is an instance of CMeanUtter storing the new M-U mapping to be inserted. m is a repetition variable, and feedback is the listener’s confidence feedback to the speaker. .... 282
Figure A.7. The flow chart of the acquisition of lexical rules. MU1 and MU2 are instances of CMeanUtter; Mean and Utter are the meaning and utterance parts of MU1 or MU2. newCompList is a list of pointers, each pointing to a newComp, an instance of CMeanUtter. LTMem is an instance of CLTMem. Pdet is the probability of the detection of recurrent patterns. .... 283
Figure A.8. The flow chart of the acquisition of categories and syntactic rules. Each item in MUmatSet stores a list of compositional rules (in matLexRule) and an integrated meaning (in Mean) from an M-U mapping in Buffer. No.matLexRule indicates the number of compositional rules stored in matLexRule. These compositional rules can partially or fully match the utterance of the M-U mapping in Buffer. Cat is an instance of CCat. LexRule, shaLex, othLexM, and othLexN are instances of CLexRule; SynRule is an instance of CSynRule. LTMem is an instance of CLTMem. SV/VS, SO/OS, and VO/OV are local orders among word rules. S-VO, V-SO, and O-SV are local orders between word rules and phrase rules. m and n are repetition variables. .... 286
Figure A.9. The flow chart of production. Parallelograms are the interface between the speaker and the listener. Case 1 and Case 2 are conditions in production. Pcrt is the probability for random creation. randNo is a randomly generated double value between 0.0 and 1.0. LexRule is an instance of CLexRule, Cat is an instance of CCat, and SynRule is an instance of CSynRule. LTMem is an instance of CLTMem, and IMMem is an instance of CIMMem. numHolist, numAg, numPr1/2, numPat, numAgPr, numAgPat, and numPrPat record the numbers of the corresponding lexical rules. .... 288
Figure A.10. The flow chart of the listener’s comprehension. Case 1, Case 2 and Case 3 are conditions in comprehension. .... 292
Figure A.11. The flow chart of the selection of categories and syntactic rules in production (3-word-rule case). Cati_j (i=1, 2, 3; j=1, 2) and Cat are instances of CCat; Syni_j (i=1, 2, 3; j=1, 2) and SynRule are instances of CSynRule. .... 296
Figure A.12. The flow chart of the selection of categories and syntactic rules in comprehension (3-word-rule case). LocOrds are local orders between the utterances of two of the word rules in the heard sentence. Cati_j (i=1, 2, 3; j=1, 2) and Cat are instances of CCat; Syni_j (i=1, 2, 3; j=1, 2) and SynRule are instances of CSynRule. .... 299


List of Tables

Table 5.1. The major parameters and their values in the parameter set. The effects of the parameters marked with "*" are discussed in detail in later sections. .... 145
Table 5.2. The possible prevalent local orders if SV wins the competition with VS. .... 155
Table 6.1. The possible prevalent local orders if SV wins the competition with VS. .... 191
Table 7.1. The four situations for the simulations in Exp.3 on language maintenance. .... 222
Table 8.1. The 19 local order states. .... 232
Table A.1. The criteria for separating different cases in production (“||” is the logic operator “OR”, “&” is “AND”, and “!” is “NOT”). .... 289
Table A.2. The criteria for separating different cases in production (modified). .... 292
Table A.3. The criteria for separating different cases in comprehension. .... 293
Table A.4. Situations for calculating the combined strength in production. .... 305
Table A.5. Situations for calculating the combined strength in comprehension (Case 1). .... 306
Table A.6. Situations for calculating the combined strength in comprehension (Case 2 & Case 3). .... 307
Table A.7. Parameters for the artificial language. .... 308
Table A.8. Parameters for the linguistic rules. .... 309
Table A.9. Parameters for the communication. .... 310
Table B.1. The score calculation of the local order states. .... 313
Table B.2. Transitions among different types of syntax that require one or two intermediate states. .... 314


Chapter 1. Introduction

1.1. Evolutionary Linguistics and Simulation Perspective

Language defines human intelligence. Humans have long wondered why only our species has language, why language is the way it is, and how language became so. These questions, which concern the evolution of human language, are among the most intriguing in the understanding of human nature. The preliminary study of these questions dates back to the publication of The Origin of Species by Charles Darwin in 1859. However, the scientific conditions of the nineteenth century did not allow a systematic study of this topic, and many speculative conjectures lacked sufficient empirical foundation and experimental verification. Therefore, in 1866, the Société de Linguistique de Paris imposed a ban on the discussion of this topic in scientific discourse: “La Société n’admet aucune communication concernant, soit l’origine du langage, soit la création d’une langue universelle.” (“The Society will accept no communication dealing with either the origin of language or the creation of a universal language.”) (Stam 1976)

In the following “dark period”, which lasted almost a century, the curiosity about human language never ceased. Since the 1950s, linguistics has proliferated, owing to the availability of more data on the languages of the world, the increasing understanding of the behaviors of humans and other species, the development of new technology such as computers and of new research methods such as statistical analysis, and the multidisciplinary contributions from a variety of fields such as archaeology, anthropology, and neurology. Over fifty years of modern linguistic research has revealed more about the nature of human language than the Société could ever have imagined (Yang 2005). Meanwhile, there has been a resurgence of the discussion of language evolution (Schoenemann 1999). The investigation of this topic has returned as a scientific and collaborative enterprise, and has gradually become a beacon for research (Oudeyer 2006).

The study of language evolution aims to identify when, where, and how a language emerges, changes, and dies out. Genetic studies and archaeological and anthropological findings can be consulted to answer the “when” question; research on animal behavior can help answer the “where” question; and historical linguistics may shed light on the “how” question (Ke and Holland 2006). However, language and linguistic behaviors do not fossilize: speech acts are inherently ephemeral, and the relevant fossil and archaeological clues are tantalizingly equivocal (Wang 1991; Hauser et al. 2002a). In addition, even though some early hominids might have been capable of using primitive languages, it is difficult to compare the behaviors of these extinct species with those of modern humans. We can only speculate about what these primitive languages looked like and what capacities the early hominids had to process them (Lieberman 2006). Therefore, the study of language evolution has been largely restricted to the synchronic timescale, and many hypotheses about the evolution of language in the past, based on the data of the present, lack a scientific method of evaluation (Ke and Holland 2006).

Thanks to the emergence of the computer, we can break down this barrier by adopting the simulation perspective in linguistic research. Ever since von Neumann designed a model in the late 1940s to demonstrate that a machine could reproduce itself (Neumann 1966), the simulation approach has been widely adopted in many research areas, including mathematics, physics, theoretical biology, and artificial intelligence. Computational models that abstract the dynamic process of language evolution, or that specify the behaviors of language users in communication, serve to evaluate the internal coherence of many verbally expressed hypotheses on language evolution, especially those on language origin, and the results of these models help to modify existing theories or even generate new ones (Cangelosi and Parisi 2002; Oudeyer 2006; Mareschal and Thomas 2006). Computational models of language evolution enable us to replicate the environment of early hominids indicated by ancient remains, to compress several million years of evolution into a short simulation, and even to predict plausible future trajectories of language evolution from the simulation results. Together with empirical data, computational simulation opens a new arena for discussing language evolution. Its rationality has gradually been accepted by many linguists, and numerous models studying the evolution of language and the human communication system have been developed (see Parisi 1997; K. Wagner et al. 2003; Wang et al. 2004; Gong and Wang 2005; Vogt et al. 2006, for overviews).
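To make this paradigm concrete, below is a minimal, self-contained sketch of the style of model these overviews survey: a population of agents repeatedly exchanges forms for meanings, and a shared vocabulary self-organizes without central control. It is written in C++ (the language of the data structures in Appendix A), but it is an illustrative toy under simplifying assumptions (integer meanings and forms, instant alignment by the listener), not the thesis model or any model cited above.

// A toy naming-game-style simulation: agents converge on a shared
// meaning-form vocabulary purely through repeated pairwise communications.
#include <cstdlib>
#include <iostream>
#include <map>
#include <vector>

int main() {
    std::srand(7);
    const int kAgents = 10, kMeanings = 5, kForms = 50, kRounds = 20000;
    // Each agent's idiolect: meaning index -> preferred utterance form.
    std::vector<std::map<int, int>> lexicon(kAgents);

    for (int r = 0; r < kRounds; ++r) {
        int s = std::rand() % kAgents;  // speaker
        int l = std::rand() % kAgents;  // listener
        if (s == l) continue;
        int m = std::rand() % kMeanings;
        // The speaker invents a random form if it has none for this meaning.
        if (!lexicon[s].count(m)) lexicon[s][m] = std::rand() % kForms;
        // The listener aligns to the heard form (a crude conventionalization).
        lexicon[l][m] = lexicon[s][m];
    }

    // Convergence: the fraction of agent pairs agreeing on each meaning.
    int agree = 0, total = 0;
    for (int m = 0; m < kMeanings; ++m)
        for (int i = 0; i < kAgents; ++i)
            for (int j = i + 1; j < kAgents; ++j) {
                ++total;
                if (lexicon[i].count(m) && lexicon[j].count(m) &&
                    lexicon[i].at(m) == lexicon[j].at(m))
                    ++agree;
            }
    std::cout << "convergence: " << double(agree) / total << std::endl;
    return 0;
}

Even this stripped-down loop exhibits the signature result of such models: global agreement emerges from purely local interactions. This is exactly the kind of verbally stated hypothesis, here the self-organization of a lexicon, whose internal coherence simulation allows us to check.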


1.2. The Coevolutionary Hypothesis and My Computational Framework

Among the contemporary wide spectrum of opinions and viewpoints on language evolution, there is a central question: whether syntactic abilities culturally evolved in a “mosaic” fashion (Wang 1982; Schoenemann and Wang 1996), or whether they were biologically determined by some unique genetic traits. The dogma in linguistics, founded by the Chomskyan School (Chomsky 1965, 2002; Pinker and Bloom 1990; Pinker 1994; Jackendoff 2002) almost half a century ago, states that genetic traits determine the grammatical knowledge of human language, and that this human-unique and language-specific knowledge exists as “modules” wired into human brains from birth (Lieberman 2000, 2006). This innateness hypothesis about human linguistic knowledge now faces critical challenges.

On the one hand, research in neuroscience and cognitive science has yielded a better understanding of the general functions of human brains in processing language and performing other cognitive activities. Comparisons between human and animal behaviors have uncovered many precursors of human linguistic abilities in other species, and advances in genetics now allow us to specify the genetic similarities between humans and other primates. Together, these findings reveal not only the intensive connections between linguistic abilities and other cognitive capacities, but also the great similarities between human and primate behaviors, both of which shake the foundation of the innateness hypothesis.

On the other hand, physicists, mathematicians, and social scientists have discovered many subtle properties and general mechanisms that operate in natural systems and human societies, such as self-organization (Camazine et al. 2001), phase transition (May 1976), and emergence (Bonabeau et al. 1999). These discoveries inspire some reconsideration of evolution itself and trigger further explorations of how these newly found mechanisms work along with natural selection. All of this has provided new insights into language and suggested more plausible answers to questions such as how linguistic abilities and knowledge evolved and how natural selection proceeded together with those general mechanisms in various systems, seriously questioning the belief in the innateness of human language and linguistic knowledge. The growing body of empirical data and the deeper understanding of human language as a Complex Adaptive System (Steels 2000; Wang 2006) have induced a new view on language evolution.


This view, advocated by the Emergentism/Connectionism School (Elman et al. 1996; MacWhinney 1999), suggests that syntactic as well as other linguistic abilities gradually evolved from general abilities that initially might not have been specific to language, and that the complex process of language evolution results from a variety of linguistic and nonlinguistic factors. Instead of discussing “the global picture” of language evolution or the complete trajectory of syntactic development in human language, this thesis focuses on a “local picture”: it studies only two linguistic universals, compositionality (in the form of lexical items) and regularity (in the form of simple word order), and mainly explores how these universals gradually emerged and were maintained in the primitive languages adopted by early hominids. Considering the close connections between linguistic features and individuals’ linguistic abilities, and the complex interactions between language and the linguistic environment of communications, I propose a coevolutionary hypothesis on the emergence of compositionality and regularity. It suggests that the acquisition of lexical items and the development of syntactic knowledge coevolve during the transition from a holistic signaling system to a compositional language with preliminary linguistic features similar to those observed in modern languages. Some general learning abilities are proposed as the precursors to the linguistic abilities of processing lexical items and simple word order, and they bridge the gap between domain-general abilities and language-specific competences. For instance, the pattern extraction ability (L. Fillmore 1979; Monchi et al. 2001) may eventually lead to the emergence of compositionality in human language. The sequential learning ability (Lashley 1951; Hauser 1996; Christiansen and Chater 2008) and the ability to reiterate available knowledge make it possible to acquire local orders and blend them into global orders at the sentence level. Instead of assuming a built-in regularity, this constructivist approach allows a “bottom-up” development of regularity in human language. In addition to these crucial abilities, other fundamental ones, such as the imitation ability and the symbolic communication ability, are also necessary for language emergence. As indicated by the empirical data, most of these abilities are present at different levels in other species, and were, at least at the very beginning, not specific to language processing. For instance, the sequential learning ability is also tapped in other domains such as tool making and miming (Corballis 1989, 2002; Allott 1991; Gärdenfors 2003).


This thesis adopts the simulation perspective to study the emergence of compositionality and regularity in a population of artificial language users. A multi-agent model is designed, in which language users (agents) are endowed with the simple abilities discussed above. During communications, by applying these abilities to process arbitrary sound sequences that encode simple semantic expressions, agents autonomously develop their languages to exchange information. Without any centralized guidance, after iterated communications, an artificial communal language that shares similar features with modern languages may emerge through self-organization of the linguistic components and individual learning abilities. The conceptual framework of this model is illustrated in Figure 1.1.

In this model, utterances encoding simple semantic expressions such as “run” (meaning “a fox is running”) or “chase” (meaning “a wolf is chasing a sheep”) are exchanged among agents in communications. Through pattern extraction, individuals may acquire components that repeatedly appear in the exchanged utterances as lexical items (e.g., “fox” ↔ /FOX/, “wolf” ↔ /WOLF/, “chase” ↔ /CHASE/, “sheep” ↔ /SHEEP/, where “↔” means “mapped with”, and the capitalized letters delimited by '/' represent utterance syllables; in simulations, these syllables are not necessarily identical to the semantic expressions). By sequential learning, individuals may notice order relations among lexical items in the exchanged utterances and acquire these relations as local orders. In addition, when an individual observes that two (or more) meaning-utterance mappings involve a single lexical item that displays the same local order with respect to two lexical items having the same semantic role (‘Ag’, the actor of an action; ‘Pr’, the action; and ‘Pat’, the entity that undergoes an action), he/she can assign these two lexical items to the same category. For convenience, categories are labeled with the syntactic roles to which they correspond in simple declarative sentences in English (i.e., ‘S’, Subject; ‘V’, Verb; and ‘O’, Object). The local order is also acquired as a syntactic rule between the first lexical item and the newly formed category. Then, by reiterating local orders among the categories, individuals gradually acquire emergent global word orders that regulate how lexical items of these categories are strung together into an utterance expressing the integrated meaning encoded by those lexical items. For instance, if there exist some S, V, and O categories that are locally ordered S before V and S before O, then two emergent global orders are produced, SVO and SOV. The integrated meaning “chase” could then be expressed as either /WOLF CHASE SHEEP/ or /WOLF SHEEP CHASE/.
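To make these three steps concrete, the following minimal sketch (in Python) walks through them on toy data. It is an illustration of the logic only, not the thesis model: the meaning-utterance pairs, the trivial one-syllable alignment, and all function names are hypothetical.

from itertools import permutations

# Toy meaning-utterance pairs (hypothetical data). A meaning assigns semantic
# items to the roles Ag/Pr/Pat; an utterance is a tuple of syllables.
EXCHANGES = [
    ({"Ag": "wolf", "Pr": "chase", "Pat": "sheep"}, ("WOLF", "CHASE", "SHEEP")),
    ({"Ag": "fox",  "Pr": "chase", "Pat": "sheep"}, ("FOX", "CHASE", "SHEEP")),
    ({"Ag": "wolf", "Pr": "bite",  "Pat": "fox"},   ("WOLF", "BITE", "FOX")),
]
CATEGORY = {"Ag": "S", "Pr": "V", "Pat": "O"}  # semantic role -> category label

def extract_lexicon(exchanges):
    # Pattern extraction: adopt a recurring item-syllable pair as a lexical
    # item (trivially aligned here by lowercase match, for illustration only).
    lexicon = {}
    for meaning, utterance in exchanges:
        for item in meaning.values():
            for syllable in utterance:
                if syllable.lower() == item:
                    lexicon[item] = syllable
    return lexicon

def extract_local_orders(exchanges, lexicon):
    # Sequential learning: record which category precedes which in the
    # observed utterances, as pairwise local orders.
    orders = set()
    for meaning, utterance in exchanges:
        position = {role: utterance.index(lexicon[item])
                    for role, item in meaning.items()}
        for r1 in position:
            for r2 in position:
                if position[r1] < position[r2]:
                    orders.add((CATEGORY[r1], CATEGORY[r2]))
    return orders

def global_orders(local_orders):
    # Reiteration: a global order is any arrangement of S, V and O that
    # violates none of the acquired local orders.
    return ["".join(p) for p in permutations("SVO")
            if all(p.index(a) < p.index(b) for a, b in local_orders)]

lexicon = extract_lexicon(EXCHANGES)
print(global_orders(extract_local_orders(EXCHANGES, lexicon)))  # ['SVO']
# With only "S before V" and "S before O" acquired, as in the example above,
# two global orders survive:
print(global_orders({("S", "V"), ("S", "O")}))                  # ['SVO', 'SOV']

The thesis model is of course richer (utterances are not pre-segmented, and, as discussed below, mechanisms such as multi-criteria competition operate during communications); the sketch only exposes the logical skeleton by which local orders accumulate into global ones.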


Figure 1.1. The conceptual framework of the computational model in this thesis. The SEMANTICS rectangle represents the predefined semantic space, and the ovals represent the three aspects of linguistic knowledge acquired by agents through three domain-general abilities: pattern extraction, sequential learning, and categorization. The EMERGENT GLOBAL ORDERS rectangle represents the emergent syntactic patterns triggered by these domain-general abilities and the correlated linguistic knowledge (indicated by arrows).

As an analogy to the emergence of human language, the simulation results of this model recapitulate the major stages in the emergence of compositionality and regularity in the communal language and evaluate whether the domain-general abilities with which agents are equipped are sufficient to trigger these universals in the communal language, both of which help to verify the coevolutionary hypothesis. Apart from tracing the emergence of linguistic universals, the proposed model also simulates the interactions among the artificial language, its users, and the linguistic environment. It further discusses the influence of some correlated linguistic and nonlinguistic factors on language evolution. These correlated factors include those concerning the artificial language, such as the semantic structures; those concerning the language users, such as the learning abilities; and those concerning the linguistic environment, such as the nonlinguistic information, different forms of cultural transmission, and various types of social structures. The discussion of the influence of these factors attempts to provide a comprehensive understanding of language evolution. In addition to the physical factors inherent to the model, some general mechanisms and subtle properties of the human communication system, such as multi-criteria competition and self-organization, also take effect during language evolution. The proposed model provides an efficient way to investigate these mechanisms and their effects on language evolution, which could be hard to predict in verbal hypotheses and difficult to trace in empirical studies.


Moreover, this investigation may be instructive to the study of other natural or social phenomena that involve similar mechanisms or properties. The coevolutionary hypothesis inspires us to take an integrated view of human language and to notice that language evolution can be affected by many factors as well as by random events (Steels 2000; de Boer 2001; Ke 2004). Based on this view, the emergence of syntax and the singularity of human language can be reasonably explained without prematurely resorting to the innateness explanation. I do not attempt to claim that coevolution is the definitive answer to the question of syntax evolution. Instead, as a sensible response to some arguments advocated by Innatism, my purpose is to provide an exploratory discussion of possible scenarios of language evolution in a quantitative manner, and to propose simpler, but still useful, forms of behaviors that could bridge the gap between no language at all and the languages we use today. Coevolution, as a process separate from innately determined development, may explain the evolution of some linguistic universals.

1.3. Thesis Structure

The thesis consists of nine chapters, which are divided into four parts: 1) the research background of language emergence and computational simulation; 2) the description of the compositionality-regularity coevolution model; 3) the application of this model in the study of cultural transmissions, social structures, and word order bias; and 4) the general discussion and summary of conclusions. The first part includes Chapter 2 and Chapter 3. Chapter 2 first presents the research background of the study of language evolution, with a brief description of the main historical stages in this area. It then discusses the mainstream theories on language emergence, reviewing their central claims and addressing their major problems. The theoretical arguments for the coevolutionary hypothesis are given in the last section of Chapter 2. Chapter 3 provides a state-of-the-art review of computational modeling. This chapter summarizes both the advantages and limitations of computational simulation. Based on a brief review of some existing models in the field, it gives a general tutorial on how to develop a computational model to study linguistic problems. Then, it briefly describes the computational framework of the model designed in this thesis.


The following two chapters, Chapter 4 and Chapter 5, form the second part. Chapter 4 introduces the major components of this model, and discusses its rationality and originality compared with other models addressing similar questions. Chapter 5 presents the simulation results on the coevolution of compositionality and regularity and on the “bottom-up” process of syntactic development. After explaining these results, it traces the heterogeneous development of individuals’ languages, and discusses the effects of various language-related abilities. This discussion suggests that the whole process of language emergence is a collective result of factors from different domains and at different levels. The third part includes Chapter 6 to Chapter 8. Chapter 6 studies the role of different forms of cultural transmission in language emergence and maintenance. These forms are horizontal, vertical, and oblique transmissions, which differ in the relationship between the participants in a communication. The simulation results, acquired in an acquisition framework, illustrate the effects of these inter- and intra-generational transmissions, inspire some reconsideration of the bottleneck effect during cultural transmissions, and shed light on the roles of conventionalization during horizontal transmissions in language evolution. By manipulating the probabilities for individuals to participate in communications and those for individuals of different communities to communicate with each other, Chapter 7 explores the effects of social structures in three experiments: the first discusses the role played by a popular agent in language emergence and maintenance; the second examines social structures in which individuals’ popularities follow different power-law distributions, and discusses the relation between linguistic communications and social hierarchy; and the third explores the effect of exoteric communications on the convergence of the communal languages adopted in different communities. Chapter 8 addresses the question of word order bias in the languages of the world, and examines, from a simulation perspective, whether functional constraints operating at the sentence level can result from constraints working at a lower level. By tracing the transitivity and stability of different sets of local orders, the model shows that the degree of bias in the global orders is caused by self-organization of local orders and the semantic structures cognized by agents. This study further reveals the close connection between syntax and other linguistic components.


The final part, Chapter 9, summarizes the major findings of this research and highlights the empirical bases of the adopted assumptions concerning individuals’ linguistic abilities. It continues with a discussion of some limitations of the model and possible future modifications. After a recapitulation of Emergentism and computational simulation, it concludes with a scenario proposed for the evolution of linguistic morphology, which is another important linguistic feature in human language not yet simulated in this model.

Chapter 2. Research Question: Language Emergence

This chapter first introduces the major stages of language evolution, including language emergence, change, and death. It then discusses the contemporary mainstream theories on language emergence, Innatism and Connectionism/Emergentism. Finally, it proposes the coevolutionary hypothesis on the emergence of compositionality and regularity.

2.1. Three Stages of Language Evolution

The evolution of human language can be considered from two complementary aspects: that of the user and that of the tool (Wang 1982). On the one hand, there is the evolution of the human biological system. After numerous anatomical modifications over several million years, we have achieved the capacity for learning and using any modern language in the world. On the other hand, there is the evolution of language itself, from body gestures, facial expressions and sounds for various activities and emotions, to the intricate, abstract, and elaborate symbolic devices that characterize all human languages. Before tackling any specific problem of language evolution, we have to clarify some general questions, which include: what is the capacity for human language, what is evolving during language evolution, and how to date language evolution in different timescales.

2.1.1. What is the capacity for human language?

The capacity for human language is postulated to exist in the following three aspects (Chomsky 1972, 1986, 1995; Steels 2005), of which the first concerns the language user and the others concern the language itself: 1) The biological capacity of language, which is “the set of physiological and cognitive components that an individual needs for participating in a language community” (Steels 2005, p.214). These capacities include some basic abilities or functions of physical organs to handle linguistic materials, such as the associative memory to memorize the mappings between
meanings and utterances, the sequential ability to apply syntax to regulate linguistic components, and the vocal tract and vocal-auditory channel to exchange signals in communications. 2) The idiolect or I-Language, which is the internal language system of an individual, represented in the brains of the population (Kirby 1999). It is the body of knowledge that an individual uses to produce and comprehend language. As defined by Steels (2005), an idiolect contains a specific inventory of sounds, a particular set of lexical items, a set of grammatical constructions, etc. The difference between the idiolect defined by Steels and I-Language defined by Chomsky is that the latter focuses more on abstract, grammatical knowledge, and precludes some actual use of language as attested in speech, such as sets of lexical items and sounds. 3) The communal language or E-Language, which is “the consensus that has arisen in a particular population on how to express meanings, and such consensus emerges from activities of individual agents and is not physically stored anywhere or globally observable by individuals in a population” (Steels 2005, p.214). From the ecological perspective, a communal language can be viewed as “the extrapolation from I-Languages whose speakers in a particular population communicate successfully with each other most of the time” (Mufwene 2001, p.2). Communal languages of different communities can differ, and they affect one another via the contact of the idiolects of their members. E-Language, however, is defined as the “performance” that imperfectly reflects an individual’s “competence” (idiolect) (Chomsky 1986), which concentrates more on use at the individual level and does not clearly distinguish it from use at the community level as the consensus arising in a population. Both idiolect and communal language are developed by individuals using their biological capacities. The communal language can be viewed as the product of the idiolects, and the idiolect as the product of the communal language to which an individual has access (Kirby 2002a). I-Language can be viewed as a mechanism by which an individual constructs utterances in E-Language, and E-Language, in turn, provides the examples that allow an individual to construct his/her I-Language (Burling 2005). A comprehensive discussion of language evolution should consider the distinction of this dual existence of language, but such a distinction has been ignored in many discussions (Ke 2004). For instance, the viewpoint of “language as an organism” (Christiansen 1994; Christiansen and Chater 2008) claims that language has adapted to be symbiotic within human cognitive processes. It argues that certain linguistic structures (e.g.,
binding, head order, and subjacency) are more adaptively processed by idiolects, but it does not consider how communal languages have evolved accordingly. In addition, it is necessary to distinguish between the evolution of the biological capacity for human language and the evolution of idiolects and communal languages (Batali 1998). First, these two levels of evolution are correlated. It is indisputable that human beings are biologically predisposed to acquire and use language (Lieberman 2006). Without biological capacities such as the anatomical and neurological endowment, and sufficiently powerful cognitive abilities, an individual is deprived of the abilities to produce and perceive complex signals, as well as to participate in communications; it is therefore impossible for him/her to form and transmit linguistic knowledge. Meanwhile, these two levels of evolution are different. The adaptive benefits of those biological capacities might not be specific to communications. To a certain degree, the mutual intelligibility or communicative efficiency (Komarova and Niyogi 2004; Niyogi 2006) of language users can provide some selective pressure for the development of the correlated capacities and linguistic knowledge. After fundamental biological capacities were adopted into linguistic activities, language evolution, especially the emergence, modification, and enrichment of grammatical resources in human language, could occur mainly in idiolects and communal languages (Batali 1998). A central question on the evolution of the biological capacity for human language is whether those abilities or functions are language-specific (Chomsky 1972, 1986 and 1995; Pinker and Bloom 1990), or whether they were gradually adapted from general abilities driven by functional principles (Hurford et al. 1998; Knight et al. 2000) or through spandrels (by-products with no previous function that come to serve some novel function, Gould and Lewontin 1979) or exaptation (an organ with an original function being adapted to perform another function, Gould and Vrba 1982). The current consensus is that humans must have a biologically determined set of predispositions that impact our ability to learn and use language (Brighton et al. 2005a). There have been a large number of discussions on the degree to which these capacities are language-specific or human-unique, and on how they affect language evolution (see Hurford et al. 1998; Knight et al. 2000; Oller and Griebel 2004; Minett and Wang 2005; Lieberman 2006, for overviews).


Several hot topics concerning the evolution of language itself include: how linguistic features in idiolects and communal languages have emerged and changed, whether as a result of innate grammatical knowledge (Chomsky 1965 and 1986) or as a consequence of communications and other nonlinguistic factors (Hurford et al. 1998; Knight et al. 2000; Labov 2001); and how idiolects and communal languages affect each other via various forms of contact or transmission.

2.1.2. What is evolving during language evolution?

Linguists have noticed that there are many invariant sub-patterns or properties across languages. For example, Hockett (1960) proposed thirteen design features or properties for human languages, e.g., arbitrariness, i.e., the meanings contained in the discrete units used in human language (e.g., sounds or gestures) are arbitrarily assigned and conventionalized through language use; and displacement, i.e., language can describe events that do not happen in the immediate temporal or spatial environment of the conversation. Greenberg (1963, 1966) observed that among the six mathematically possible orderings of the three nuclear constituents of a transitive clause, i.e., subject (S), verb (V) and object (O), only three orders (SOV, SVO, VSO) occur with high frequencies among the languages of the world. Another important property across languages is recursion (Chomsky 1986), i.e., human language tends to use minimal materials to maximize its expressive power. Such recursiveness exists in lexical items (e.g., affixation), phrases (e.g., adjuncts) and sentences (e.g., embedding), and there have been many discussions of the nature of recursion in humans and human languages (Hauser et al. 2002a; Fitch and Hauser 2004). These particular features of language structure and use that hold across most but not necessarily all languages of the world are defined as linguistic universals (Christiansen and Kirby 2003a). Various linguistic universals have been identified in phonological, syntactic, and many other linguistic aspects; arbitrariness, displacement, dominant constituent word orders, and recursion are some examples of these universals. The evolution of these universals indirectly reflects the evolution of idiolects and communal languages. Study of these universals and their causations can inform us about what language is, what constraints it has, and how it is linked to other cognitive capacities of the human brain (de Boer 2001). Two important linguistic universals, compositionality and regularity at the syntactic level, are studied in this thesis. Compositionality reflects the phenomenon that in most languages, the
meanings of complex expressions are determined by the meanings of their components. For example, the meaning of the following English expression is determined by its components, the lexical items /cat/, /rat/ and /eats/: /cat eats rat/

This compositional feature also holds in many other languages, e.g.:

/…/ in Chinese (gloss: cat eat rat);
/un chat mange un rat/ in French (gloss: (determiner) cat eat (determiner) rat).

Compositionality provides building blocks (e.g., lexical items or phrases) to create complex expressions (e.g., clauses or sentences), and structural constraints are necessary to organize these building blocks within complex expressions. Regularity at the syntactic level reflects the fact that many languages adopt conventionalized structures, in the form of word order or morphology, to build up complex expressions. In other words, the structure of the components in a complex expression also affects the meaning of this expression. Regularity at the syntactic level manifests itself in the following three aspects. First of all, the order in which lexical items appear in an expression can affect the grammaticality and meaning of this expression. For instance, the following expressions are ungrammatical in English, since the orders of their components violate the basic constituent word order (the unmarked or prototypical order in which the nuclear constituents, subject, verb, and object, appear in sentences of a language, Tomlin 1986) of English (SVO): /cat rat eats/ and /eats cat rat/

However, the following expression, with an order similar to the first of the above English expressions, is grammatical in Japanese, whose basic constituent word order is SOV:

/…/ (gloss: cat (nominative) rat (accusative) eat (canonical))

In addition, the following two English expressions have different meanings even though they share identical lexical items and the same “noun verb noun” order: /cat eats rat/ and /rat eats cat/

According to the basic constituent word order of English (SVO), in the former expression, cat is the subject that instigates the eating action and rat is the object that undergoes this action. In the latter expression, however, with rat being the subject and cat being the object, the meaning completely differs from that of the former one. Second, grammatical information can be expressed by morphological tags (morphemes such as words, word stems and affixes assigned to components of expressions). For example, in Navajo (an American Indian language, Tomlin 1986), different morphological tags can be assigned to the same noun to make it either a subject or an object, as shown in the following two sentences (gloss: boy girl saw):

/Ashkii at’ééd yiyiictsá/ means “boy saw girl”.
/Ashkii at’ééd biictsá/ means “girl saw boy”.

A variety of morphological devices are adopted in human languages. For example, English assigns morphological tags to verbs to express tense and aspect in sentences (e.g., /-ed/ for the past tense and /-ing/ for the progressive aspect, where /-/ represents the root of a verb). Russian uses /-a/, /-u/ or /-y/ to indicate nominative, accusative and genitive. Japanese also contains some morphological tags, such as the nominative marker (romanized as ga) and the accusative marker (romanized as o) in the aforementioned Japanese expression.
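As a toy illustration of these two strategies, the sketch below encodes the same proposition either by position, as in English, or by case-like tags that leave word order free. The markers -NOM and -ACC are hypothetical stand-ins for real morphology, and the code is mine, not part of the thesis model.

def encode_by_order(agent, verb, patient):
    # SVO-style: grammatical roles are read off positions.
    return [agent, verb, patient]

def encode_by_tags(agent, verb, patient):
    # Case-marking style: roles are read off suffixes, so any order works.
    return [patient + "-ACC", agent + "-NOM", verb]

def decode_by_tags(words):
    # Recover the roles from the tags alone, ignoring positions entirely.
    roles = {}
    for w in words:
        if w.endswith("-NOM"):
            roles["subject"] = w[:-len("-NOM")]
        elif w.endswith("-ACC"):
            roles["object"] = w[:-len("-ACC")]
        else:
            roles["verb"] = w
    return roles

print(encode_by_order("cat", "eats", "rat"))
# ['cat', 'eats', 'rat']
print(decode_by_tags(encode_by_tags("cat", "eats", "rat")))
# {'object': 'rat', 'subject': 'cat', 'verb': 'eats'}

Note that decode_by_tags never consults positions, which is why scrambling a tagged utterance is harmless; languages with rich morphology can, in general, afford freer word order.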


Finally, many languages are mixed systems of word order and morphology. Some (e.g., Chinese) rely heavily on word order to regulate compositional materials; others (e.g., some agglutinative languages such as Classic Ainu in Japan, Shibatani 1990) mainly adopt morphology to build up complex expressions and carry grammatical information. Compositionality and regularity provide both the building blocks (Holland 1995) and the building mechanisms to systematically express infinite meanings with limited linguistic materials, which endow human language with the ability to make “infinite use of finite means” (Von Humboldt 1972). Some linguists (e.g., Krifka 2001) define regularity as a subset of compositionality, whereas others (e.g., C. Fillmore 1968; Croft 2001) place more emphasis on the structural features of human language, and suggest that certain structural entities (constructions) can be independent of lexical items and can themselves transmit meanings. The development of compositionality and regularity is a hot topic in language evolution, which has been discussed in many theoretical scenarios, e.g., the “bootstrapping” scenario (Calvin and Bickerton 2000) and the “formulaic” scenario (Wray 1998, 2002a, 2002b). These scenarios will be discussed in Section 2.2. Apart from these theoretical argumentations, there are many empirical studies in historical linguistics (Arlotto 1981) tracing the changes of these universals in languages, e.g., lexical diffusion (Wang 1969, 1977), the syntactic change from Early English to Modern English (Fischer et al. 2000), and the disappearance of pro-drop (the ability to drop the pronominal subject from a sentence without sacrificing the grammaticality of the resulting expression, Niyogi 2006) in Modern French. In the end, it is necessary to note that there are languages in which compositionality and regularity are inexplicit. First of all, morphemes and lexical boundaries are equivocal in polysynthetic, agglutinative languages such as Classic Ainu, in which many expressions are holophrastic. Second, in Yi (a minority language in China, Chen and Wu 1998), although the basic constituent word order is SOV, in some cases the subject and the object can be distinguished by different tones assigned to the verb, as shown in the following sentences:

/tshi33 mu33ka55 ndu21/ means “John hits him”.
/tshi33 mu33ka55 ndu34/ means “he hits John”.
(gloss: he John hit)

Finally, word order is less constrained in some languages when a speaker wants to emphasize various components in a sentence. For instance, in Mohawk (another American Indian language, Baker 2001), any of the following six expressions (adapted from Tomlin 1986) is grammatical, and all express the same meaning “Sak likes the dress” with different emphases:

/Sak ranuhwe’s ne atya’tawi/ (gloss: Sak likes the dress)
/Sak atya’tawi ranuhwe’s/ (gloss: Sak dress likes)
/Ranuhwe’s ne atya’tawi ne Sak/ (gloss: likes the dress (the) Sak)
/Ranuhwe’s ne Sak ne atya’tawi/ (gloss: likes (the) Sak the dress)
/Atya’tawi Sak ranuhwe’s/ (gloss: dress Sak likes)
/Atya’tawi ranuhwe’s ne Sak/ (gloss: dress likes the Sak)

These exceptional cases illustrate the variety of human languages: each language conforms to some universal properties while also having its unique features; and even for the same universal property, different languages may have different forms of it and follow different trajectories in developing it.

2.1.3. How to date language evolution?

Language evolution takes place in different timescales. Wang (1991) proposed three timescales to define different aspects of the synchronic and diachronic study of language: 1) The microhistory timescale, which is “reckoned across a very thin slice of time, in years or decades” (Wang 1991, p.61). Based on the study of language in the microhistory timescale, we can make short-term predictions on which of the various usages of today will continue into the language of tomorrow (Weinreich et al. 1968). An important question in this timescale is how a
child acquires a language from his/her linguistic environment. Study of this question usually deals with a stable linguistic environment, i.e., the linguistic input to a child is assumed to be stable. 2) The mesohistory timescale, which is “the middle timescale of language evolution” (Wang 1991, p.63), in terms of centuries or millennia. A classical question in this timescale is by what manner or means a linguistic change is implemented (Wang 1991). Some interesting topics include the study of language contact (the prolonged association between the speakers of different languages, Thomason and Kaufman 1988; Crystal 1992) within and among different social communities, and the consequent changes in their communal languages, e.g., how linguistic innovations (e.g., lexical items or morphemes) are diffused in a social community (Wang 1977) and how different languages borrow lexical items from one another. Generally speaking, the dynamics of these changes in the mesohistory timescale is smooth and gradual. 3) The macrohistory timescale, which is the largest timescale, “much longer than the span of time over which current methods of historical linguistics can take us” (Wang 1991, p.68). The evolution of many fundamental linguistic universals is studied in this timescale, e.g., how linguistic universals such as compositionality and regularity emerged, and how language gradually evolved from a protolanguage state to a modern state after a series of major modifications. Research in the macrohistory timescale concentrates on the emergent state (Wang 1982), which may undergo abrupt changes. The timescale during which a language is acquired by an individual is also referred to as the ontogenetic timescale; and the timescale during which a language evolves from some primitive stage to a modern stage, with the advent of new universals or the modification of available ones, is also dubbed the phylogenetic timescale. In linguistics, “phylogenetic” was first used in the historical linguistics concept of the phylogenetic tree, which was derived from biology to describe ancestral and inheritance relations among languages. The concept of the phylogenetic timescale defines the time in which the phylogeny of a language occurs, and it does not specify whether the phylogeny is achieved through genetic transmission. Some researchers (e.g., de Grolier 1983; Kirby 2001) proposed another timescale in their theories or computational models, the glossogenetic timescale, which is intermediate between the
phylogenetic and ontogenetic timescales. It considers both the ontogeny, i.e., how an idiolect is acquired, and the phylogeny, i.e., how a communal language is formed after generations of linguistic communications. Based on the glossogenetic timescale, many linguistic questions have been explored, including how idiolects and communal languages affect each other and how linguistic universals emerge through communications among individuals in different communities or within and across generations. After clarifying these general questions, we can separate language evolution into three stages based on the various states of linguistic universals, the distinct linguistic environments, and the different timescales. These stages are: language emergence, language change, and language death. Research on these stages covers a variety of topics, including linguistic systems, individual learning, population dynamics, social connections, and so on.

2.1.4. Language emergence

Emergence is the process whereby “macroscale phenomena results from microscale interaction” (Schelling 1978), or the ways in which structures can arise without being prespecified, given materials and temporal constraints (Elman et al. 1996). Language emergence is the process from “essentially no language” to “a simple language”, or from “a primitive language without universals” to “a modified language with certain linguistic universals”. It has two distinct senses (Wang 1991; Holland 1998), the ontogenetic and the phylogenetic. Ontogenetically, emergence refers to the process whereby an infant acquires a language from its environment, i.e., Language Acquisition (Clark 2003a); phylogenetically, it refers to the process whereby Homo sapiens made the gradual transition from a prelinguistic communication system, perhaps not unlike those of our ape contemporaries, to a communication system with languages of the sort we use today (Gong and Wang 2005). The phylogenetic emergence of language is also referred to as Language Origin (Ruhlen 1994). The phylogeny and the ontogeny are alike if we assume that language emergence in both situations results from a series of small and gradual modifications. From this point of view, how the child learns language and how this learning led from no language to the kind of languages we use today can be viewed as twin questions, and research on the ontogeny is presumably beneficial for projecting the trajectory of the phylogenetic emergence. This analogy has long been adopted by
linguists in their research. However, strictly speaking, these two aspects concentrate on different linguistic capacities, and they are achieved in different linguistic environments and timescales. The ontogeny focuses on the regeneration of the biological capacity, as well as on the acquisition and update of idiolects, i.e., how an internal system of language is formed and modified based on the learning mechanisms at the individual level. During the ontogeny, the linguistic input to an individual is assumed to be stable, and the acquisition process mainly occurs in the microhistory timescale. The phylogeny, however, concentrates on the communal language, i.e., how it is developed through communications among individuals using their idiolects. It touches upon both the learning mechanisms at the individual level and the external mechanisms at the community level. During the phylogeny, the state of linguistic universals undergoes modifications, with the occurrence of new universals as well as the adjustment of available ones, and the linguistic environment changes accordingly. Depending on the complexity and spread of linguistic universals, the phylogenetic process usually occurs in either the mesohistory or the macrohistory timescale. Given these crucial differences, we have to be very cautious when reconstructing the phylogenetic emergence of language based on observations of language acquisition. Study of the latter takes place in a modern linguistic environment where both human cognitive abilities and language itself have already developed into sophisticated systems. During language origin, however, early hominids had to invent the system de novo in a piecemeal manner. Therefore, it is imprecise to postulate either that the complexity of modern languages results from sudden changes, or that early hominids were already capable of manipulating linguistic materials of semantic or syntactic complexity comparable to that of modern languages. Neglecting these differences in linguistic environment and cognitive abilities, some theories or scenarios on language emergence, as discussed later, are hasty in drawing unrealistic conclusions. Making judgments about human language on the basis of the languages most easily and often studied today will give us a narrow perspective (Wray and Grace 2007). Studies of the ontogenetic and phylogenetic emergence of language involve different research methods. Research on the former usually has a strong empirical basis, since many
linguistic data are available and experiments can be conducted directly on human adults or children. However, due to its long time span and the extinction of intermediate states, it is hardly possible to retrieve sufficient linguistic data, especially oral data, to reconstruct the phylogeny of language. Multi-disciplinary research has to be adopted in order to make the study of the phylogeny of language “a legitimate scientific enquiry” (Christiansen and Kirby 2003a, p.300). To propose, support, and verify related theories on the phylogeny, findings from biology, archaeology, and cognitive science are referred to, and methods such as the comparative approach, logical analogy, computational simulation, and mathematical analysis are necessary.

2.1.5. Language change

Change is the process by which some features go from “one state” to “another state”. Language change is the process during which the states of some linguistic universals change after they have been established in idiolects and communal languages. The study of language change concentrates on changes both in idiolects, e.g., how idiolects across generations become different through the internalization of the corresponding grammar in language learners, and in communal languages, e.g., how the communal language of an adult population is defined in terms of the aggregate output of the (changing) individuals of that population (Briscoe 2002). Generally speaking, emergence and death (discussed in Section 2.1.6) are also changes, but since they mainly deal with the foundations of linguistic universals, they are excluded here. Language change mainly occurs in the mesohistory timescale. The linguistic environment is stable in the sense that the biological capacities for language are mature and many linguistic universals have already been established. It is also unstable in the sense that, as a consequence of the frequent occurrence of salient forms of linguistic universals and the nonstop influence of linguistic or nonlinguistic factors, qualitative linguistic differences may be observed after several generations or even within a few decades. Topics in language change include phonological and syntactic changes (Niyogi 2006). As for phonological change, the Great Vowel Shift in English, which occurred during the Middle English period from the fourteenth to the sixteenth century, has been intensively studied (Wang 1969; Wolfe 1972). During this shift, the long vowels of English went through a cyclic shift so that the pronunciations of words using these long vowels changed systematically (Niyogi 2006). In
addition, there are many studies concerning phonological mergers (two phonemes that were distinguished by speakers of a language stop being distinguished) and splits (one phoneme splits into two) in some languages. For example, in the Wu dialect (one of the major dialects of Chinese), some diphthongs (e.g., /oy/) underwent a change to monophthongs (e.g., /o/), and the corresponding phonological categories were gradually merged (Shen 1997). As for syntactic change, there are many topics covering changes in compositionality and regularity. For example, studies on lexical borrowing (also referred to as word loaning, the gradual or sudden adoption or incorporation of individual words or even large sets of vocabulary items from another language or dialect) concern how lexical items change their forms due to the innovation or contact of idiolects or communal languages (e.g., Bloomfield 1933; Cheng 1987). Studies on the disappearance of pro-drop in Modern French and the disappearance of the OV order in Modern Chinese concern the change of morphological tags and word order. Take the disappearance of the OV order as an example: in Old Chinese, the object, especially in the form of a pronoun, was usually placed before the verb in certain questions and negation sentences, as in the following negation sentence. However, this order has almost totally disappeared in Modern Chinese:

/…/ (gloss: the ancients not me cheat (declarative)) means “the ancients did not cheat me”.

Both internal (e.g., Shore 1995; Labov 1994) and external (e.g., Romaine 1994; Labov 1972, 2001; Mufwene 2001) manners or means of implementing a language change are being explored by linguists and sociolinguists. Internal mechanisms, such as the various learning mechanisms and learning patterns, can cause individual differences in language development (Shore 1995) and affect communal languages across generations. External mechanisms include the learning bottleneck (a constraint on the sample of utterances from which the language learner must try to reconstruct the communal language of his/her speech community during the transmission across generations, Christiansen and Kirby 2003a; Kirby 1998; K. Smith et al. 2003), and various patterns of language contact (Fisiak 1995; Mufwene 2001) that are determined by linguistic or nonlinguistic factors. Language contact can cause the diversity and convergence of communal languages, and may also be invoked to construe the emergence of pidgin and creole
languages (Bickerton 1995; Mufwene 2001). The effects of these internal and external mechanisms have been demonstrated in some computational models (e.g., Nettle 1999b, 1999c; Livingstone 2002), verified by some empirical data (e.g., the lexical borrowing between Taiwan Mandarin and Mainland Mandarin, Cheng 1987), and explored in the emergence of some creole languages due to language competition (e.g., the emergence of Daohua, a creole language in Sichuan, China, from the competition between Southwest Mandarin and Tibetan, Atshogs 2003, 2004). These mechanisms have also been discussed in some theories on cultural dissemination (the emergence and maintenance of differences in beliefs, attitudes, behaviors, institutions and practices among individuals of different groups, Axelrod 1997; Greenspan and Shanker 2004).

2.1.6. Language death

Death is the process from “something” to “nothing”. Language death usually means the process during which an entire generation of speakers gradually or suddenly fails to transmit its language to the next generation (if any) (Minett and Wang 2008). In other words, a language is dead when it has lost all its speakers, just as a species is dead when no more specimens are left to instantiate it (Mufwene 2008). Based on the analysis of current languages around the world, some linguists (e.g., Crystal 2000) estimated that over 50% of these languages (over 3,000, especially those with few speakers) would die within the next 100 years, and the majority of languages in history have already died out (Pagel 1995). The process of language death usually involves the simplification of linguistic forms along with the restriction of linguistic functions (Knab 1980). In a sense, fewer languages can make communications relatively easier. However, the extinction of a language usually causes the disappearance not only of some unique linguistic features (Ebert 2005) contained in this language but also of the cultural identity of its users (Wang 2003). The aim of the study of language death is to find strategies by which certain languages and the related cultural identities can be maintained. This study may also shed light on the research on language emergence (Mufwene 2004), because the death of one language is sometimes accompanied by the birth of another. Language death can occur either suddenly in the microhistory timescale (e.g., through genocide caused by wars or natural disasters) or gradually in the mesohistory timescale (e.g., through migration, competition or the union of various ethnic groups). In the former, there is a dramatic
change of the linguistic environment; in the latter, the linguistic environment gradually becomes harmful to the dying language. Many mechanisms or factors taking effect during language change may also affect language death, and the relevant research methods on language change can also be borrowed to study language death. For example, analysis based on limited empirical data can extract patterns in the process of language death and discover the reasons that cause the death; computational simulations (e.g., Abrams and Strogatz 2003; Minett and Wang 2008) are also useful for tracing the dying process and studying the effects of related linguistic or nonlinguistic factors on language death. Language emergence, change, and death are closely correlated, since they are just three different aspects from which we view the evolution of language. For example, the process of language change and origin proceeds mainly via language acquisition. Without transmission to new generations, a language will die out and no further changes will occur. The features of language change and death may also take effect during language origin.
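To give a flavor of such simulations, here is a minimal sketch (in Python) of the language-competition dynamics of Abrams and Strogatz (2003), in which the fraction x of speakers of language X evolves under the relative status s of X; the parameter values below are illustrative choices of mine, not the published fits.

def abrams_strogatz(x=0.4, s=0.45, a=1.31, c=1.0, dt=0.01, steps=5000):
    # Forward-Euler integration of dx/dt = (1-x)*P(Y->X) - x*P(X->Y),
    # with switching attractivenesses P(Y->X) = c*x^a*s and
    # P(X->Y) = c*(1-x)^a*(1-s); a is the volatility exponent.
    for _ in range(steps):
        p_yx = c * x ** a * s
        p_xy = c * (1 - x) ** a * (1 - s)
        x += dt * ((1 - x) * p_yx - x * p_xy)
    return x

# With lower status (s < 0.5), language X steadily loses speakers
# and heads toward extinction:
print(round(abrams_strogatz(), 4))  # a value close to 0

Minett and Wang (2008) enrich this kind of dynamics with bilingualism and social structure, which is one way such models can suggest maintenance strategies.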

2.2. Mainstream Theories on Language Emergence

Among the three stages of language evolution, language emergence, especially the phylogenetic emergence of language, is the most intriguing to scholars, and research on this topic has mobilized scholars from many different scientific cultures and undergone spectacular development. A major stimulus for research in this area was the paper published by Hockett (1960), in which human language was compared with various forms of animal communication. Another significant stimulus was provided by a major conference held in New York City, sponsored by the New York Academy of Sciences (Harnad et al. 1976). That this area has rapidly grown into a powerful magnet for interdisciplinary research can be seen in the numerous anthologies published in the past dozen years (e.g., Hawkins and Gell-Mann 1992; Oller and Griebel 2004; Christiansen and Kirby 2003b; Minett and Wang 2005) and in the recent EVOLANG series of conferences organized by Jim Hurford, Chris Knight and many others (e.g., Hurford et al. 1998; Knight et al. 2000; Wray 2002c; Tallerman 2005; Cangelosi et al. 2006). Many disciplines have contributed significantly to our knowledge of this topic. For example, anthropologists have uncovered more and more fossils of our ancestors, particularly in
northern Africa (e.g., Asfaw et al. 2002; White et al. 2003). From these discoveries, we may conjecture that our species evolved into its modern form about 160,000 years ago, an important landmark for dating language emergence. This date roughly matches the time range of some language-related developments in our genes (e.g., FOXP2, Hurst et al. 1990; Fisher et al. 1998; Enard et al. 2002). Another landmark must be placed at around 50,000 years ago, when cultural achievements blossomed in the form of stone tools, art forms in sculpture and cave paintings, and burial sites (Appenzeller 1998; Klein 1999). Many studies by population geneticists have given us some baselines for the earliest major human migrations across waters (e.g., Bowler et al. 2003). All these indirectly indicate the gradually enriched and refined communicative abilities of our ancestors. In addition, the beginning of the twenty-first century has witnessed the coming together of molecular genetics and neuroscience (Marcus 2004). The integration of the new knowledge here will have important bearing on such age-old controversies as whether there is a “language organ” (Anderson and Lightfoot 2002) and whether language emerged monogenetically or polygenetically (Freedman and Wang 1996). More recently, computational simulation has gradually been adopted in this field. By quantifying the hypotheses and simulating the processes, computational simulation can investigate the dynamic, emergent process of linguistic universals, reconstruct the trajectory of language emergence, recapitulate the major breakthroughs during the emergence, and explore the effects of various factors in this process. This new method will be briefly introduced in Chapter 3. Many viewpoints on language emergence have been put forward on the basis of this multidisciplinary research. Among the available theories, Innatism and Connectionism/Emergentism are the two mainstreams. The major contradiction between them lies in their different views on the nature of linguistic features, knowledge, and abilities: whether they are innate in humans or gradually exapted from fundamental abilities present in both humans and other species.

2.2.1. Innatism

The Chomskyan School (Chomsky 1965, 1972, 1986, 1995, 2002; Pinker and Bloom 1990; Pinker 1994; Jackendoff 2002) proposed an innateness view on language emergence based on assumptions extracted from the study of language acquisition. These assumptions, summarized by Clark (2003a), include: a) the syntax of natural language is too complex for children to learn from the forms they hear; b) adults offer a distorted and imperfect source of data; and c)
children learn their first language so fast that they must rely on an innate capacity, especially for syntax. Innatism believes that humans have a set of species-specific capacities, the Language Faculty (Chomsky 2002), for mastering and using a natural language, which is viewed as an “instinctive tendency” for language. “A particular language L is an instantiation of the initial state of the cognitive system of the language faculty with options specified.” (Chomsky 1995, p.219) According to the summary of Lieberman (2000, 2006), these cognitive capacities involve “receptive resources” to separate linguistic signals from the rest of the background noise, and to build the rich system of linguistic knowledge that every speaker possesses, based on other inner resources activated by a limited and fragmentary linguistic experience. Among the linguistic knowledge, syntax, defined as the principles and procedures for arranging words in such ways that long strings of words can be uttered and understood effortlessly (Calvin and Bickerton 2000), derives from a “specialized, genetically transmitted syntax module” or an “organ” in the human brain (Chomsky 1986), which instantiates the Universal Grammar (UG, the set of grammatical principles and parameters which makes human language possible, usually thought to be determined by the human genome and to have a physical existence in the brain, Chomsky 1972, 1986). The UG, containing invariant principles and associated parameters of variation, specifies the total range of grammatical rules, including phonology, morphology and syntax, that can occur in any given human language. As summarized by Fauconnier and Turner (2002), Innatism believes that children do not learn the rules that govern syntax by means of general cognitive processes such as imitation (a behavior performed by one individual modeled upon that of another, Stanford 2006) or associative learning. Instead, the principles and parameters coded in the hypothetical UG are triggered to yield the correct grammar of a particular language as a child is exposed to normal discourse (the linking of sentences such that they constitute a narrative, Yip 2006). The hypothetical, innate grammar-acquiring capacity in the human brain that enables any normal human to learn any human language is called the Language Acquisition Device (LAD). To conclude, Innatism places the distinctiveness of language in a specific genetic endowment for a specifically, genetically instructed language module (Fauconnier and Turner 2002). This module is devoted to language alone, and it is distinct from the mechanisms that regulate other aspects of human behaviors (Lieberman 2006). The initial state of the language
faculty consists of invariant principles and a finite array of choices as to how the whole system can function (Chomsky 2004). A particular language is fully determined by these principles and parameters, as well as by related basic operations such as merge and move (Chomsky 1995) in the wired-in UG. Minimal learning is involved in acquiring a language; after exposure to a fragment of a particular language, the UG can trigger a detailed representation of the syntax of that language. Innatism underscores the dissociation of language from other cognitive domains, and the discontinuity between prelinguistic and linguistic developments (Volterra et al. 2005). Many nativists have embraced a discontinuous viewpoint on language emergence (e.g., Chomsky 1972, 1986), which states that language emerged rather suddenly as a result of some remarkable massive mutation in the early hominid gene pool. The mutation of FOXP2 was recently postulated to be one such unique event. Other nativists (e.g., Pinker and Bloom 1990; Newmeyer 1991; Lightfoot 2000) have viewed language as having arisen by gradual natural selection. There must have been a series of steps leading from no language at all to the modern languages that we now use, and each step is small and gradual enough to have been produced by a random mutation or recombination (Pinker and Bloom 1990). These nativists admit that natural selection can account for the complex design features of a trait such as language (Chomsky 2002). To explain the evolution of the language faculty defined by the innate UG, some scholars (e.g., Pinker and Bloom 1990; Newmeyer 1991) proposed that each element of the UG may have its own adaptive function, and that the UG may have evolved gradually in a piecemeal manner as a result of natural selection. However, other scholars (e.g., Lightfoot 2000) proposed that the UG is like spandrels, evolving as a by-product of something else and not as the result of adaptive changes favoring survival to the reproductive age. Innatism was established around the 1950s, and this speculation about the predominance of innate knowledge about language has become the default dogma in linguistics (Deacon 2003). However, this theory contains some critical problems: 1) Researchers from cognitive science (e.g., Fauconnier and Turner 2002) have pointed out that Innatism commits two mistakes by assuming that cause and effect are qualitatively isomorphic, as are function and organ. For example, when recognizing an effect, we usually conceive of the cause as having much the same status as the effect. If the effect is dramatic, a dramatic causal event is also expected (Cause-Effect Isomorphism, Fauconnier and Turner 2002).


This leads us to think that a discontinuity in effect must come from a discontinuity in cause, and therefore, that the sudden appearance of language must be linked to a catastrophic neural event. However, sharp behavioral accelerations and discontinuities can often be the results of underlying processes that are essentially continuous (Elman et al. 1996). For example, a counter example to the above analogy is the physical phenomenon, phase transition (May 1976), i.e., under continuous heat, ice can go abruptly from solid to liquid as water, and later on, from water to steam. This phenomenon may characterize the case of language emergence; the abrupt emergence of language or linguistic universals might be a continuous causation of some general factors, initially not specific to language. As put by Bates and Carnevale (1993, p.461): “discontinuous outcomes can emerge from continuous change within a single system.” Similarly, when observing a new organismic function, we usually assume that the onset of this function requires the evolution of a new organ (Function-Organ Isomorphism, Fauconnier and Turner 2002). This leads to the thought of a “language organ”. However, the continuous evolution of an organ does not necessarily correlate with a continuous evolution of a function. Language is a singularity of function subserved by human brains and assisted by various other organs, but the evolution of language does not necessarily correlate with the evolution of the brain or other organs. 2) Innatism has some difficulties in explaining the specific process of ontogeny and phylogeny of language. First, Innatism does not clearly state what, in the evolution of human brains, could have been the precursor of the language module. Nor does it state clearly what pressures from natural selection could have produced such a module (Fauconnier and Turner 2002). For example, some scenario based on Innatism (e.g., the “bootstrapping” scenario discussed in Section 2.2.3) gives a detailed description about which cortex is responsible for which aspect of linguistic functions, but it states nothing about how such topology came from. Second, the Innatism assumptions on language acquisition are questionable. For example, the logic problem on language acquisition based on the “poverty of stimulus” arguments (Chomsky 1980; Marcus 1999) states that the input to the learner is too inconsistent and incomplete to determine the acquisition of grammar, and parents rarely provide negative evidence (the explicit labeling of children’s incorrect output as errors by parents or other caregivers),


correcting the grammatical errors of their young children. Even when corrective feedback is provided, children tend to ignore it. Therefore, learning must rely on additional constraints from the UG. These arguments have been strongly questioned. For example, Horning (1969) has proved on mathematical grounds that if a sufficiently close approximation of the grammar is acceptable, negative evidence is not necessary. Pullum and Scholz (2002), after revealing the logic behind these arguments, have argued against the often taken-for-granted “inaccessibility” of the crucial data for data-driven learning, an inaccessibility usually invoked to support the existence of innate knowledge. MacWhinney (2004), based on a careful analysis of child language corpora, has cast doubt on the claims regarding the absence or incompleteness of relevant exemplars. Meanwhile, there has been a growing body of evidence showing that children acquire words and syntax by means of statistical (Saffran et al. 1996) or associative learning (MacWhinney 2004), imitation (Meltzoff and Moore 1977, 1984; Bates and Goodman 1997), and subtle social cues (Tomasello 2004) given by parents and caretakers, such as facial or gestural expressions conveying displeasure or corrections. The process of statistical pre-emption (Braine and P. Brooks 1995; P. Brooks and Tomasello 1999; Marcotte 2006) is also a powerful way for learners to gather indirect negative evidence (Goldberg 2006). In addition, Tomasello (2003, 2004) has pointed out that the innate UG is doubtful when considering two problems in language acquisition: a) how to link one’s abstract UG to the particularities of the language that one is learning, and b) how to understand the changing nature of children’s language across development under the same UG. Based on some empirical findings, he has suggested that children have at their disposal much more powerful learning mechanisms than the simple association and blind induction assumed by Chomsky. Besides, there exist plausible and rigorous mechanisms that characterize adult linguistic competence in much more child-friendly terms than the innate UG (Tomasello 2003). Some of these mechanisms are: a) sharing intention, following the attention and gesturing of others, directing the attention of others, and culturally learning the intentional actions of others; and b) pattern detection at various levels, e.g., forming perceptual and conceptual categories of “similar” objects and events, and performing statistics-based distributional analyses on various kinds of perceptual and behavioral sequences (a minimal sketch of one such analysis follows below). Innatism argues that the complexity of core language cannot be learned inductively by general cognitive processes, but all the above observations and arguments suggest that many generalizations about language that have traditionally been seen as requiring recourse to innate stipulations specific to language can actually be explained by general cognitive mechanisms (Goldberg 2006).
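To make the statistical-learning claim concrete, the following minimal sketch (in Python) computes transitional probabilities between adjacent syllables in an unsegmented stream and posits word boundaries at the probability dips, the distributional cue that eight-month-olds are argued to exploit. The syllable inventory, the three-syllable “words” and the 0.5 threshold are hypothetical illustrations, not Saffran et al.’s actual stimuli or model:

import random
from collections import Counter

random.seed(1)

# Hypothetical three-syllable "words"; the input stream is their random
# concatenation, with no pauses or other boundary cues.
words = [["ti", "bu", "do"], ["pa", "go", "la"], ["da", "ro", "pi"]]
stream = [syl for w in random.choices(words, k=200) for syl in w]

# Estimate the transitional probability TP(y|x) = count(x followed by y) / count(x).
pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])
tp = {(x, y): c / firsts[x] for (x, y), c in pairs.items()}

# Posit a word boundary wherever the transitional probability dips below a threshold.
segments, current = [], [stream[0]]
for x, y in zip(stream, stream[1:]):
    if tp[(x, y)] < 0.5:          # a low TP suggests a word boundary
        segments.append("".join(current))
        current = []
    current.append(y)
segments.append("".join(current))

print(segments[:6])   # e.g., ['tibudo', 'pagola', 'daropi', ...]

Within the hypothetical words the transitional probability is 1.0, while across word boundaries it drops to roughly one third, so simple thresholding recovers the word inventory without any negative evidence.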


3) Some linguistic features and acquisition mechanisms, which were originally assumed to be language-specific and part of the language faculty, have now been argued to be general and shared among various species. In addition, observations from primate studies have added to the growing body of evidence showing that linguistic skills did not just suddenly appear in humans (Chow 2005), and have demonstrated that the brain mechanisms that yield human syntactic abilities also have evolutionary antecedents outside the domain of language (Lieberman 2006). First of all, the ability to assign meanings to acoustic units is present in non-human primates and other species, at least as precursors. For example, some insects, such as Apis mellifera, can learn specific objects and their physical parameters, and master some abstract interrelationships, such as sameness and difference (Giurfa et al. 2001). Dolphins (Mercado et al. 2000) and primates (Oden et al. 1988) can also learn and generalize the concepts of sameness and difference. This ability is essential for a child who is just starting language (Burling 2005). A border collie, Rico, was found to be able to fast-map the labels of 200 different items (Kaminski et al. 2004), though whether it really understands reference remains in question (Bloom 2004; Tomasello 2004). Such “fast mapping” ability (the ability to form quick and rough hypotheses about the meaning of a new word after even a single exposure, Carey 1978; Behrend et al. 2001) was originally assumed to be mediated by the LAD (Markson and Bloom 1997). Many studies on chimpanzees that have been taught to use sign languages or other manual systems have shown that they can communicate and think using words (Gardner and Gardner 1983, 1989; Savage-Rumbaugh et al. 1986). Second, the ability to adjust meaning as a function of a combinatorial rule might not be human-specific, and it can be acquired through general cognitive learning mechanisms. Some primates also possess simple syntactic abilities; e.g., free-ranging putty-nosed monkeys can combine two vocalizations into different call sequences that are linked to specific external events (Arnold and Zuberbühler 2006). Comparative studies have shown that, similar to children, chimpanzees and monkeys also have the sequential ability to manipulate simple objects and to acquire serial orders based on statistical information (Hauser et al. 2001; Terrace 2002; Hauser


1996, 2005). Savage-Rumbaugh and Savage-Rumbaugh (1993) have reported that chimpanzees can comprehend distinctions in meanings conveyed by simple syntax. Recently, Gentner et al. (2006) reported that European starlings (Sturnus vulgaris) were capable not only of recognizing acoustic patterns defined by a recursive, self-embedding, context-free grammar (Chomsky 1965; Hopcroft and Ullman 1979), but also of classifying new patterns defined by this grammar and of reliably excluding agrammatical patterns. This evidence contradicts the central claim in Innatism that the capacity for syntactic recursion forms the computational core of a unique human language faculty (Hauser et al. 2002a; Fitch and Hauser 2004; Fitch et al. 2005). In addition, based on mathematical analysis (Solé 2005), a simple word-object matrix might provide the bases of syntax “almost for free”, and some computational models (de Pauw 2002, 2006; Gong et al. 2005b) have shown that, using rigid data-driven approaches or machine learning mechanisms, fixed word order can be learned without requiring many specific, built-in syntactic strategies. Third, based on newly developed equipment and tools such as Functional Magnetic Resonance Imaging (fMRI), Magnetoencephalography (MEG), Electroencephalography (EEG), and Positron Emission Tomography (PET), researchers have found that some neuroanatomical cortical areas in human brains that were originally assumed to be unique to linguistic activities, e.g., Broca’s (Broca 1861; Lichtheim 1885) and Wernicke’s areas (Wernicke 1874; de Renzi and Vignolo 1962), are also involved in other cognitive activities, such as processing music (Maess et al. 2001) and integrating hand movements with vision (Corballis 2002), most of which have nothing to do with vocal language. In addition, some researchers have reported that the structural homologs of Broca’s and Wernicke’s areas in other species such as rhesus macaques (Macaca mulatta) are also active during the presence of species-specific vocalizations (Gil-da-Costa et al. 2006), and that these cortical areas are also involved in controlling the orofacial musculature during vocalizations (Petrides et al. 2005). Fourth, although the left hemispheres of human brains are generally more active during linguistic tasks, a complementary cooperation of both hemispheres involving several cortical areas has been traced in some language comprehension tasks, and some of the involved areas are also activated during other cognitive behaviors (Just et al. 1996; Beeman and Chiarello 1998; Kotz et al. 2003; Rissman et al. 2003). For instance, Just et al. (1996) have traced that, during the comprehension of English subject or object relative clause sentences, with the increase in the


syntactic complexity, the homologs of Broca’s and Wernicke’s areas in the right hemisphere become activated, though to a lesser degree than their left-hemisphere counterparts. In addition, the linguistically nondominant right hemisphere contributes to, and may even be necessary for, the optimal processing of language (Pulvermüller 2002). As shown in lexical decision experiments (Pulvermüller and Mohr 1996; Hasbrooke and Chiarello 1998), additional involvement of the right hemisphere can improve language processing compared to left-hemispheric processing alone. Meanwhile, as shown in the category-specific deficits revealed by the lexical decision task (Neininger and Pulvermüller 2003), lesions in the right hemisphere can reduce language processing abilities. Furthermore, cortical areas are interrelated and malleable. They can even take on new functions following brain damage or birth defects (Donoghue 1995; Elman et al. 1996). For instance, the comprehension of sign language can even activate the auditory cortex (Nishimura et al. 1999). These empirical findings suggest that language has considerable behavioral and neural links with related nonlinguistic skills and with the sensorimotor substrate that allows language to be perceived and produced (Dick et al. 2005), and that those language-related neuroanatomical areas are also present as homologs in the brains of other species, where they are involved in speech-related activities (e.g., vocalizations). Therefore, the linguistic processing mechanisms might be neither language-specific nor human-unique. Finally, although Innatism suggests that language is the result of mechanisms that are specifically human and specific to language, it underestimates the central role of human speech, the physical supporting medium of language (Lieberman 2006), and says little about phonology (Yip 2006). Speech is to human beings as echolocation is to bats and song is to birds (Aaltonen and Uusipaikka 2005). Although no other species appears able to produce human speech, most of the acoustic-perceptual characteristics of human speech are neither species-specific nor of recent origin (Lieberman 2006). The mechanisms underlying speech in humans could be evolutionarily ancient, inherited from a vertebrate ancestor (Hauser 2001), and it seems that the neural substrate of speech production and comprehension also “overlaps” with that of nonlinguistic activities involving the body parts that express language (Pulvermüller 2006). Many empirical studies have revealed that many non-human species also possess some basic phonological knowledge and skills relating to the sounds themselves and to the structures in


which they occur. As for sound distinctions, many birds can learn to respond to the acoustic parameters that convey human speech (Hauser 1996). Blackbirds and pigeons can discriminate steady-state vowels that are differentiated only by their formant frequency patterns (Heinz et al. 1981). Some studies (e.g., Kluender et al. 1987, 1998) discovered that, besides humans, Japanese quail (Coturnix coturnix) and starlings are also aware of certain phonological categories (at least the prototypes) and display the perceptual magnet effect (the decrease of intra-categorical perceptual differences and the increase of inter-categorical perceptual differences, Kuhl 1991; Kuhl et al. 1992). Bengalese finches have even shown syntactic control of their songs (Okanoya 2002). In addition, as reviewed by Hauser (2001), many species, such as mice (Ehret and Haack 1981), swamp sparrows (Nelson and Marler 1989), and field crickets (Wyttenbach and Hoy 1999), have been reported to show categorical perception (when the variable and confusable stimulation that reaches the eyes and ears is classified by the mind into discrete, distinct categories whose members somehow come to resemble one another more than they resemble members of other categories, Liberman et al. 1957) in distinguishing conspecific calls, and this ability has also been found in other mammals such as chinchillas (Kuhl and Miller 1975, 1978), and in primates such as pygmy marmosets (Snowdon 1987), Japanese macaques (May et al. 1989), macaques (Fischer 1998), and baboons (Cheney and Seyfarth 2005). As for structural distinctions, the ability to pay attention to transitional probabilities has long been argued to be crucial for humans, both adults and children, to acquire a language (Saffran et al. 1996; Pierrehumbert 2003). However, other species, such as cotton-top tamarins, are also able to track transitional probabilities for adjacent syllables but not for consonants (Hauser et al. 2001), whereas human infants can do this for nonadjacent consonants but not for nonadjacent syllables. In addition, it has been shown that starlings can acquire rules for the temporal patterning of song motifs, and can even learn the structural form AnBn, which is present in many languages such as Chinese (Gentner et al. 2006); this pattern is sketched below. With the accumulation of more comparative studies on phonological abilities between humans and other species, the list of shared phonological abilities keeps growing. This inspires us to reconsider the singularity of human speech, and to wonder whether the existence of apparently similar phonological skills necessarily implies similar neural structures or cognitive mechanisms involved in both human languages and animal vocalizations (Bolhuis and Gahr 2006; Lieberman 2006; Yip 2006).
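To make the grammar classes at issue concrete, the following minimal sketch (in Python) contrasts the context-free pattern AnBn (n A’s followed by exactly n B’s) with the finite-state alternation pattern (AB)n. The letters A and B are stand-ins for the two classes of starling motifs (rattles and warbles); this illustrates only the patterns themselves, not Gentner et al.’s training procedure. Recognizing AnBn requires keeping count of the A’s, which a finite-state device cannot do in general, and this is precisely why it is taken as a probe of supra-regular capacity:

def matches_anbn(seq):
    """Context-free pattern A^n B^n (n >= 1): n A's followed by exactly n B's."""
    n = len(seq) // 2
    return len(seq) == 2 * n and n >= 1 and seq == ["A"] * n + ["B"] * n

def matches_abn(seq):
    """Finite-state pattern (AB)^n (n >= 1): simple alternation, no counting needed."""
    return len(seq) >= 2 and len(seq) % 2 == 0 and \
           all(s == ("A" if i % 2 == 0 else "B") for i, s in enumerate(seq))

print(matches_anbn(list("AAABBB")))   # True: grammatical under A^n B^n
print(matches_anbn(list("ABABAB")))   # False: agrammatical under A^n B^n
print(matches_abn(list("ABABAB")))    # True: grammatical under (AB)^n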


These empirical findings from linguistic and nonlinguistic aspects have raised suspicions about the Innatist claims on the uniqueness of human language and the correlated linguistic mechanisms. They also prompt a reconsideration of the connection between linguistic and other cognitive abilities, which Innatism keeps isolated from each other.

2.2.2. Connectionism/Emergentism

Connectionism/Emergentism, as a new school of theory on language emergence, was gradually established during the period when more and more problems with Innatism were being pointed out. Both Connectionism and Emergentism (Elman et al. 1996; Elman 1999; MacWhinney 1999) deny the pre-existence of syntactic representations, of the innate UG, and of an inborn acquisition device specific to language such as the LAD. Linguistic universals exist, but their existence does not imply that they are prefigured in the brain; although innate properties are necessarily universal, universals of human language are not necessarily an innate genetic endowment (Croft 2001). As suggested by Deacon (1997), the universal rules or explicit grammar axioms may have emerged spontaneously and independently in each evolving language, in response to universal biases in the selection processes affecting language transmission. “Language does not expect us to build everything starting with lumber, nails or blueprint; instead, it provides us with an incredibly large number of prefabs” (Bolinger 1976, p.96). Instead of springing forth in its full splendor, language is seen as having evolved “in a mosaic fashion” (Wang 1982; Schoenemann and Wang 1996), with the emergence of semantics, phonology, morphology, and syntax all at different times and according to different schedules. Constraints on learning and cultural transmission (the mechanism by which behaviors persist over time by being acquired and performed by a number of individuals, Christiansen and Kirby 2003a) play an important role in determining linguistic structures (Brighton et al. 2005a). Connectionism concentrates more on the questions of how language is acquired, how linguistic knowledge is represented in human brains, and how the richness of experience helps in acquiring a language (Elman 2005). It “provides a useful conceptual framework for understanding emergent form and the interaction of constraints at multiple levels” (Elman et al.


1996, p.359). The ontogeny of language, as Connectionism claims, is not fundamentally different from any other type of learning, and can be accounted for by the same mechanisms that are required for interactions with the environment in general. It is a task of integrating and coordinating multiple sources of information in the service of communicative goals (Bates and MacWhinney 1979). This theory takes the pursuit of generality so seriously that it ultimately arrives at the strongest possible conclusion concerning the nature of the human language faculty, i.e., that it has no special properties of its own (O’Grady 2005). Rich linguistic and nonlinguistic experiences are sufficient for children to acquire the language they are exposed to, based on some general learning operations, such as statistical learning based on transitional probabilities (e.g., Saffran et al. 1996) or on occurrence frequency (e.g., Shi and Werker 2001; Shi et al. 2006), and cue-based learning using multiple acoustic and phonological cues (e.g., Shi et al. 1999). Language is viewed as a new machine built out of old parts (Bates and Goodman 1997), “emerging from a nexus of skills in attention, perception, imitation, and symbolic processing that transcend the boundaries of ‘language proper’” (Bates and Dick 2002, p.294). Connectionism takes approaches that enable developmental, cognitive, and neurobiological issues to be addressed within a single, integrated formalism, and it provides new ways of thinking about how cognitive processes are implemented in the brain and how disorders in brain functions lead to disorders of cognition (Plaut 2003). Emergentism focuses more on the questions of how language came to be the way it is and how various factors shape linguistic structures. Emergentism views language as a kind of “interface” among a variety of more basic abilities (Wang 1982). These abilities may underlie nonlinguistic processes, and involve various mechanisms operating in different domains and at different levels, e.g., the perception of patterns in the frequency and temporal domains, the coding and storage of events and objects at different memory levels, and the manipulation of various hierarchical mental structures. Many of these abilities are present to different degrees in other animals, and probably emerged much earlier than language in hominid evolution. They were increasingly made accessible for use in the elaboration of language as well as of several other elaborate human institutions, such as mathematics and music. Language did not come out of nowhere, nor did it arise as some bizarre genetic mutation unrelated to other aspects of human cognition and social life (Tomasello 1999). The phylogeny of language is a process of elaboration of these domain-general abilities into linguistic activities, and language should be viewed not as


a wholesale innovation, but as “a complex reconfiguration of ancestral systems that have been adapted in evolutionarily novel ways” (Fisher and Marcus 2005, p.9). Emergentism adopts three approaches to construe the phylogeny of language: the sociobiological explanation, the sociocultural explanation, and a mixed approach combining these two, some of which were first proposed to explore how a certain innate strategy increases its prevalence over generations. 1) The sociobiological explanation, also referred to as the functional approach, relies on genetic coding and natural selection to explain how a certain strategy gradually becomes prevalent over generations (Steels 2005). This approach, first introduced by Hurford (1989), connects communication with some functional or selective advantage; it assumes that individuals having more successful communications have a selective advantage for producing more offspring than others, so that some innate strategies for successful communications gradually become prevalent in later generations (a minimal sketch of this selection loop follows below). The selective advantage may be a survival benefit, as in a foraging scenario (e.g., Cangelosi and Parisi 1998), e.g., obtaining more resources to survive; or a mating benefit, as in a sexual selection scenario (e.g., Okanoya 2002), e.g., being attractive to more mates. On the phylogenetic timescale, those prevalent strategies would become the strategies for establishing and maintaining the communication system in the group. Innatists can adopt this explanation to answer how some strategies for linguistic universals have gradually become part of the innate LAD (e.g., strategies for compositionality, Nowak et al. 2000, 2001); emergentists, however, can adopt it to answer how general learning frameworks have been adapted and have led to the emergence of grammar through natural selection (e.g., Cangelosi and Parisi 1998). 2) The sociocultural explanation, also dubbed the emergent approach, ascribes the emergence of linguistic universals in both idiolects and communal languages to the expressive power in communications and the social factors in communities (Steels 2005). It allows no direct connection between communications and functional or selective advantage. This explanation postulates that the strategies which help individuals to maximize expressive power, minimize communicative effort, and optimize communicative success would become prevalent on a cultural timescale (the glossogenetic timescale).
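Before turning to the mechanisms underlying the sociocultural explanation, the core loop of the sociobiological explanation in 1) above can be illustrated with the following minimal sketch (in Python). The population size, mutation rate, and the abstraction of an innate “strategy” as a single number measuring communicative success are my own simplifications, not taken from any of the models cited. Agents whose strategies yield more successful communications leave more offspring, so the mean strategy value rises over generations:

import random
random.seed(0)

POP, GENS, MUT = 100, 60, 0.05

# Each agent's heritable "strategy" is abstracted as the probability that it
# communicates successfully with a random partner.
pop = [random.random() * 0.2 for _ in range(POP)]   # initially poor communicators

def fitness(i):
    # Expected communicative success with sampled partners: both parties'
    # strategies must succeed for an interaction to succeed.
    partners = random.sample(range(POP), 10)
    return sum(pop[i] * pop[p] for p in partners) / 10

for g in range(GENS):
    scores = [fitness(i) + 1e-6 for i in range(POP)]
    # Fitness-proportional reproduction with small mutations.
    pop = [min(1.0, max(0.0, random.choices(pop, weights=scores)[0]
                        + random.gauss(0, MUT)))
           for _ in range(POP)]

print(f"mean strategy after {GENS} generations: {sum(pop) / POP:.2f}")
# typically well above the initial average of about 0.10

Note that nothing in this loop is language-specific; it genetically assimilates any trait rewarded by communicative success, which is exactly why both Innatists and emergentists can appeal to it for their own purposes.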


Some nonlinguistic properties, such as self-organization, and general learning mechanisms, such as learning from experience and lateral inhibition (a mechanism by which neurons are able to determine more precisely the origin of a stimulus, A. Anderson 1995), are effective for spreading linguistic features among group members (e.g., Hutchins and Hazlehurst 1995; Ke et al. 2002; Oliphant 1997) and for triggering crucial behavioral patterns for establishing the human communication system (e.g., altruism and cooperation, Noble 2000; Ohtsuki et al. 2006). Among these properties and mechanisms, self-organization is an important property exhibited at different levels and in different aspects of human language. Self-organization is “a process in which pattern at the global level of a system emerges from numerous interactions among the lower-level components of the system; moreover, the rules specifying interactions among the system components are executed using only local information, without reference to the global pattern” (Camazine et al. 2001, p.8). This property characterizes many systems in which macroscopic outcomes emerge from microscopic interactions among various system components, and in which the global organizational properties are not seen at the local level (Oudeyer 2006). Despite their diverse appearances, different systems with self-organization (e.g., the ant nest, the foraging patterns of bees, the magnetization of magnets, the crystallization of snowflakes, and so on, see Ball 2001) share some common features (Bonabeau et al. 1999): a) emergence, the creation of spatiotemporal structures that arise unexpectedly from interactions among a system’s components, rather than from a property imposed on the system by an external ordering influence; b) multistability, the possible coexistence of several stable states, due to the magnification of random deviations sensitive to the initial conditions; and c) phase transition, in which the behavior of a self-organizing system may change abruptly and dramatically. The property of self-organization influences language evolution at two levels (Ke 2004). At the community level, linguistic communications are the interactions among the lower-level components (individual language users) of the system, and the communal language developed through linguistic communications is the emergent pattern at the global level. Idiolects determine communications based on local information, without referring to global knowledge of the communal language. Meanwhile, at the individual level, the internal mechanisms for acquiring linguistic knowledge work on the lower-level components (lexical items and syntax) of the language on a local scale, and the idiolect that emerges is the pattern on a global scale. The work cited above has demonstrated self-organization at either or both of these levels, and has shown that certain patterns or features on a global scale can gradually emerge through interactions or mechanisms that work on a local scale.
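The following minimal naming game (in Python) illustrates this community-level self-organization in the spirit of the conventionalization models cited above (e.g., Ke et al. 2002; Steels’s naming-game work), though it is not a reimplementation of any of them; the population size, interaction count, and alignment rule are illustrative choices. Every interaction is purely local, yet a single communal name reliably emerges:

import random
random.seed(0)

N, STEPS = 50, 20000

# Each agent stores a set of candidate names for one object; no agent ever
# sees the population-wide ("global") distribution of names.
memories = [set() for _ in range(N)]
next_name = 0

for t in range(STEPS):
    s, h = random.sample(range(N), 2)         # one local speaker-hearer interaction
    if not memories[s]:                       # the speaker invents a name if needed
        memories[s].add(next_name)
        next_name += 1
    name = random.choice(sorted(memories[s]))
    if name in memories[h]:                   # success: both agents align on the name
        memories[s] = {name}
        memories[h] = {name}
    else:                                     # failure: the hearer adopts a candidate
        memories[h].add(name)

surviving = {n for m in memories for n in m}
print(f"{len(surviving)} communal name(s) after {STEPS} local interactions")  # typically 1

The emergent convention is also multistable in the sense above: which name wins depends on the amplification of early random choices, not on any external ordering influence.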


The sociocultural explanation suggests that it is the language itself that undergoes linguistic selection, rather than language users who undergo natural selection. In other words, language itself evolves to be adaptive, rather than humans evolving to be adapted for using language. This explanation has been utilized to study the phylogeny of various aspects of language, such as lexical conventionalization (Ke et al. 2002), the emergence of compositionality (Kirby 2001, 2002), the formation of the vowel system (de Boer 2001), and the emergence of perceptually grounded categories (Steels and Belpaeme 2005). As pointed out by Steels (2005), there are two crucial differences between the sociocultural explanation and the sociobiological explanation. In the former, a) selection does not go through fitness, genetic coding, and reproductive success, but through cultural transmissions and direct or indirect feedback on successful or failed communications; and b) innovation is based not on genetic mutation or recombination, but on invention, the use of salient linguistic structures, and their partial adoption by others. 3) The mixed approach combines both the sociobiological and sociocultural explanations to explain the phylogeny of language. It has been used in some simulations (e.g., Munroe and Cangelosi 2002) to study how the Baldwin Effect (i.e., the sustained behavior of a species or group can shape the evolution of that species, Baldwin 1896, 1897), a means from the sociobiological explanation, helps to genetically encode certain strategies that evolve through self-organization, a means from the sociocultural explanation (a minimal sketch of this effect follows below). It has also been used to develop a coevolution theory of human brains and language (Deacon 1997, 2003), the main idea of which is that “the adaptive advantage of language communication would have provided selection for progressively internalizing certain crucial features of language structure in order to make it more efficient and more easily acquired” (Deacon 2003, p.328). This theory assumes that language arose slowly through cognitive and cultural inventiveness, and that the demanding environment favored genetic variations that rendered the human brain more adept at language. In other words, cognitive effort and genetic assimilation interacted as language and brain coevolved.
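A minimal sketch of the Baldwin Effect, in the style of Hinton and Nowlan’s (1987) classic simulation, is given below (in Python). The genome length, trial count, and reward scheme are illustrative choices of mine, not the setup of Munroe and Cangelosi (2002). Lifetime learning lets partially correct genomes earn partial fitness, creating a gradient along which an initially learned behavior becomes progressively innate:

import random
random.seed(0)

POP, GENS, L, TRIALS = 200, 50, 12, 30

# Genes: 1 = innately correct, 0 = innately wrong, None = plastic (settable by
# lifetime learning). The weights bias the initial population so that a few
# genomes are free of hard-wired errors.
pop = [[random.choices([0, 1, None], weights=[1, 3, 4])[0] for _ in range(L)]
       for _ in range(POP)]

def fitness(genome):
    if 0 in genome:
        return 0.0                      # a hard-wired wrong gene cannot be repaired
    plastic = genome.count(None)
    # Learning = random guessing on the plastic positions; succeeding on an
    # earlier trial pays more, so fewer plastic genes mean higher expected payoff.
    for t in range(TRIALS):
        if all(random.random() < 0.5 for _ in range(plastic)):
            return 1.0 + (TRIALS - t)
    return 1.0

for gen in range(GENS):
    scores = [fitness(g) + 1e-6 for g in pop]
    parent = lambda: random.choices(pop, weights=scores)[0]
    # Fitness-proportional selection with uniform crossover.
    pop = [[a if random.random() < 0.5 else b for a, b in zip(parent(), parent())]
           for _ in range(POP)]

innate = sum(g.count(1) for g in pop) / (POP * L)
print(f"proportion of innately correct genes: {innate:.2f}")  # rises above its initial ~0.38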


To a certain extent, these three approaches match the three perspectives on language evolution proposed by Ke (2004): the biological perspective, which views language as the result of a biological evolutionary process governed by natural selection; the cultural perspective, which emphasizes the nature of language as a cultural phenomenon and studies its evolution as a cultural selection process; and the mixed perspective, which views the biological and cultural aspects of language as coevolving. The mixed perspective also suggests that the biological and cultural perspectives should be considered complementary in their contributions to constructing a full picture of language evolution, especially regarding language emergence. Unlike Innatism, Connectionism and Emergentism do not advocate a nativist account of the emergence of linguistic universals. However, they also face some problems in giving an integrated picture of language emergence. First of all, many connectionists have utilized Artificial Neural Networks (ANNs, Schalkoff 1997) as abstract models to represent the general cognitive module in human brains. These ANNs include the Simple Recurrent Network (SRN, Elman 1991, 2004) and the Recurrent Auto-Associative Memory (RAAM, Pollack 1990; Chalmers 1990); the SRN architecture is sketched below. However, whether these frameworks are capable of representing the actual cognitive module is under heated discussion, since the current development of neurobiology is insufficient to provide an explicit empirical basis for the SRN and RAAM. Furthermore, whether these frameworks can actually acquire some unique linguistic features or competences, such as a recursive, context-free grammar (Christiansen and Devlin 1997; Fitch and Hauser 2004; Rodriguez 2001; Grüning 2006) or the competence to handle Combinatorial Productivity (whether the network can generalize from limited exemplars to novel but still correct sentences, van der Velde et al. 2004; van der Velde 2005; van der Velde and de Kamps 2006; Wong et al. 2006), is still highly controversial. Second, as discussed by Pullum and Scholz (2002), linguistics lacks a solid theory and the empirical studies needed to identify which aspects of language can be learned and which must be innate. There are many language-related abilities whose origins have not been touched upon. Even for those capacities claimed to be domain-general, such as the sequential learning ability (Christiansen and Ellefson 2002; Christiansen and Chater 2008) and the pattern extraction ability (Tomasello 2003), it remains a challenge for emergentists to explain how these abilities and their biases for language could have evolved (Fauconnier and Turner 2002). Besides the claims that certain abilities are domain-general, a quantitative analysis is necessary to determine the different degrees to which these abilities are present in humans and other species.
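Returning to the ANN frameworks mentioned above, the following sketch (in Python with NumPy) shows the bare architecture of an Elman-style SRN: the hidden layer receives the current input together with a copy of its own previous state (the context units). The layer sizes and random weights are placeholders, and training (normally by backpropagation) is omitted; this illustrates only the architecture under discussion, not the settings of any cited model:

import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 5, 8, 5     # e.g., one-hot "words" in, next-word prediction out
W_in = rng.normal(0.0, 0.5, (n_hid, n_in))
W_ctx = rng.normal(0.0, 0.5, (n_hid, n_hid))   # recurrent weights from the context units
W_out = rng.normal(0.0, 0.5, (n_out, n_hid))

def srn_forward(sequence):
    """Run a sequence of one-hot vectors through the SRN, returning softmax outputs."""
    h = np.zeros(n_hid)                        # the context starts empty
    outputs = []
    for x in sequence:
        h = np.tanh(W_in @ x + W_ctx @ h)      # new hidden state mixes input and context
        z = W_out @ h
        outputs.append(np.exp(z) / np.exp(z).sum())
    return outputs

sentence = [np.eye(n_in)[i] for i in (0, 3, 1)]   # a toy three-word "sentence"
for probs in srn_forward(sentence):
    print(np.round(probs, 2))

Because the context units carry information forward in time, the network’s response to each word depends on the words before it; this is what allows trained SRNs to pick up sequential, and more controversially, limited recursive, structure.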


In addition, many emergentists have proposed functional constraints to explain linguistic universals. However, these explanations seem to be ad hoc (Kirby 1999), or often constructed “after the event” (Lass 1980). Some popular functional constraints are: a) economy, whereby commonly used linguistic forms are shortened to simplify the utterance for the sake of economy (Croft 2000); b) iconicity, whereby the structure of the language reflects the structure of the experience of its speakers (Croft 2000); c) processing, whereby linguistic structures evolve to make processing easier (Cutler et al. 1985); and d) pragmatics, whereby some linguistic structures are natural consequences of certain characteristics of natural language use. These constraints have been adopted to explain some morphosyntactic universals. However, some universals cannot be explained purely by functional constraints (e.g., the universal order of the prepositional noun-modifier hierarchy, Kirby 1999), the functional motivations causing some changes in universals are hard to identify (e.g., the disappearance of the OV local order in modern Chinese, as shown in previous examples), and some of these functional explanations do not address the question of how those universals are formed in a language at the very beginning. Third, both the sociobiological and sociocultural explanations face some challenges in giving a reasonable, comprehensive explanation of the phylogeny of language. Some challenges for the sociobiological explanation are: 1) It is subject to challenges to the validity of its assumption that the advantage of language and the language faculty directly determines the fitness of human beings (Ke 2004). Not all linguistic features, and not all information exchanged via language (e.g., rumors or gossip), can directly provide selective advantages. In a cultural environment, such selective advantage becomes much less explicit. 2) It does not clearly state which linguistic universals or related competences are supposed to be emergent from domain-general abilities. Therefore, both Innatism and Emergentism can adopt it for their own purposes.


3) It is difficult to define the “global hand”, the global fitness function, to evaluate language-related activities, since it might require some global information to be available to each individual in the community (Steels 2006). However, such information is hardly observable. As for the sociocultural explanation, it emphasizes that language itself evolves to be adaptive, rather than humans evolving to be adapted to using language. However, too much emphasis on this point may exaggerate the degree to which language acts as a self-adaptive system, and lead to the implication that language can evolve freely, independently of the learning mechanisms of its users; then, in order to be efficiently acquired, language may directly modify its users’ linguistic abilities. Some scholars (e.g., Dawkins 1976; Blackmore 1999) proposed a “meme” view of language and its evolution, which suggested that language acted like a virus, and that human linguistic knowledge showed signs of having been designed for transmitting memes with high fecundity, fidelity and longevity (Dawkins 1976), rather than for conveying information about particular topics such as hunting, foraging or the symbolic representation of social contracts (Blackmore 1999). This extreme view confuses the relation between language and its users by mistaking the result for the cause. It is language users that develop and modify their language based on their linguistic abilities, and use this language to fulfill communicative activities. In order to be efficiently acquired, language could change itself to better fit its users, but any change in it must match the corresponding linguistic abilities of its users, and the degree to which it changes in its own right must be confined within its users’ linguistic abilities. These restrictions imposed by individual linguistic abilities on the acquired language are quite obvious; e.g., without learning mechanisms oriented toward certain linguistic features, individuals can by no means acquire a language that contains those features.

2.2.3. The scenarios of Innatism and Connectionism/Emergentism

Calvin and Bickerton (2000) proposed a “bootstrapping” scenario, a version of Innatism, to explain language emergence. A protolanguage (Bickerton 1995, 2007), made up of utterances comprising a few words and with no syntactic structure, was assumed to be the primitive stage of human language, and modern language with hierarchical syntactic structure was conceived to originate from it through a single exaptation followed by a series of Baldwin Effects. Bickerton


(2002) argued that it was the pragmatics of communicating food sources that spurred hominids into linguistic expression, and that the transmission of information arose directly from the requirements of foraging for food, avoiding predators, and instructing offspring. Along with the appearance of “a single combinatorial operation of hierarchical concatenation”, those pre-existing substrates of words “qua conceptual relations and categorical perception” directly triggered many distinctive syntactic properties of human language. Human brains were viewed as “language organs”, in which different nerve tissues in the frontal, temporal and parietal lobes served different linguistic functions. To sum up, the “bootstrapping” scenario suggests that the transition from protolanguage to language, i.e., from words to word concatenation, and then to hierarchical syntactic structure, is synthetic and discontinuous. The advancement at each stage requires further complex cognitive and linguistic abilities, and some genetic changes lead to a novel language faculty that enables individual language users to go beyond their cultural heritage and innovate a modern language (Kirby 2007a). Jackendoff (2002) has extended this scenario by laying out a detailed developmental process from symbols encoding atomic meanings, to a protolanguage without hierarchy, and then to a modern language with complex syntax and phonology. The “bootstrapping” scenario itself commits the mistake discussed in Section 2.1.4, i.e., it fails to notice the difference between the phylogenetic emergence and the ontogenetic emergence of language when proposing a scenario based on observations of language acquisition. According to some reviews (e.g., Prizant 1983; Shore 1995) of empirical studies on language development in normal children, there are two preponderant acquisition styles. The first is the synthetic style (Peters 1977), which characterizes the language development of most if not all normal children. In this style, children emphasize single words for primarily referential functions, and acquire more complex language by combining elements into multiword utterances. The second is the gestalt style (Prizant 1983). In this style, children produce unanalyzed language forms or chunks with little appreciation of their internal structures or specific meanings, and these utterances may still be used somewhat appropriately in communicative interactions (Peters 1983). The language acquisition of some autistic children has been argued to follow the same gestalt style (Prizant 1983). Similar distinctions between these language developmental processes have been made by other researchers, e.g., the “word babies” (who mainly produce clearly


articulated single words used in referential contexts in early acquisition) and “intonation babies” (who target longer utterances in early production and concentrate on intonation contour, with less well-articulated segments) of Dore (1974). The dominance of the synthetic style in children’s language acquisition inspired the “bootstrapping” scenario to propose that the phylogenetic emergence of language should also follow a similar style. However, the ontogeny and the phylogeny of language differ in several aspects, among which the key difference lies in their distinct linguistic environments. In the primitive environment, the exchanged meanings could mainly include integrated events or frequent situations (Arbib 2005), and these meanings could only be mapped to holophrastic utterances. Therefore, communications in social interactions such as foraging could be mainly achieved by exchanging holistic messages with only generic references (Wray 2002b). Under this condition, the phylogeny of language, at least its early stage, should not follow the predominant synthetic style of language acquisition, during which the linguistic input to the children is specialized (e.g., motherese, Tomasello 2003), with a restricted range of topics, highly selected words and syntactic structures, and isolated words with specific references, usually accompanied by concrete gestures redundant with and reinforcing the message conveyed in speech (Volterra et al. 2005). After the symbols for atomic meanings are acquired through segmenting the holophrastic utterances, the subsequent developmental process may begin to follow the “bootstrapping” scenario. Based on Emergentism, Wray (1998, 2000, 2002a, 2002b) proposed a “formulaic” scenario to explore the phylogeny of language. Considering both the similarity between human and animal communication systems and the primitive linguistic environment, protolanguage is defined as holistic or “formulaic” (Wray 2002b), being “composed mainly of ‘holophrases’ or ‘unitary utterances’ that symbolized frequently occurring situations … without being decomposable into distinct words” (Arbib 2005, p.108); words as we know them then coevolved culturally with syntax through fractionation. These frequently occurring situations might involve similar objects or actions, and their holophrastic expressions might occasionally share one or a few similar syllables (a linguistic syllable is a vowel plus one or more preceding or following consonants, Yip 2006). The transition from protolanguage to language then proceeds through the repeated discovery that one could gain expressive power by fractionating (decomposing)


holophrastic utterances into shorter ones conveying components of a scene or a command (Arbib 2006a). By repeated segmentation and isolation, the learner can divide unitary utterances into meaningful sub-units and syntactic rules that govern the recombination of these sub-units. An example of the decomposition process is shown below:

So if, besides /tebima/ meaning “give that to her”, /kumapi/ meant “share this with her”, then it might be concluded that /ma/ had the meaning “female person” + “beneficiary”. (Wray 2000, p.297)
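The pairwise analysis in Wray’s example can be made concrete with the following minimal sketch (in Python), in the spirit of the detection mechanisms used in iterated-learning models such as Kirby (2001); the toy repertoire and the flat meaning representation are my own illustrations. A form shared by two holistic utterances whose meanings also share a component becomes a candidate lexical item for that component:

from itertools import combinations

# A hypothetical holistic repertoire: utterance string -> set of atomic
# meaning components.
repertoire = {
    "tebima": {"give", "that", "her-beneficiary"},
    "kumapi": {"share", "this", "her-beneficiary"},
}

def shared_substrings(u1, u2, min_len=2):
    """All substrings (length >= min_len) of u1 that also occur in u2."""
    return {u1[i:j]
            for i in range(len(u1))
            for j in range(i + min_len, len(u1) + 1)
            if u1[i:j] in u2}

# Pairwise comparison: align shared forms with shared meaning components.
for (u1, m1), (u2, m2) in combinations(repertoire.items(), 2):
    for form in shared_substrings(u1, u2):
        for meaning in m1 & m2:
            print(f"/{form}/ may mean '{meaning}'  (from /{u1}/ vs /{u2}/)")

# Prints: /ma/ may mean 'her-beneficiary'  (from /tebima/ vs /kumapi/)

When two utterances share several forms or several meaning components, the learner cannot yet decide which aligns with which; repeated exposure to further utterances prunes such competing hypotheses, which is why this mechanism requires no negative evidence.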

To sum up, the “formulaic” scenario suggests that the transition from protolanguage to language, i.e., from holistic utterances to lexical items and syntax, is analytic and continuous. Generations of language users analyze the signal-meaning correspondences that arise by chance in the repertoire of holistic expressions, and eventually generalize these analyses to novel utterances (Kirby 2007a). The emergence of syntax and lexical items is mainly a process of conventionalization (a language user conforms his/her language to the language of the community; it is a process of social agreement that some utterance will have a particular meaning, Burling 2005) and recursiveness (mechanisms or knowledge working at a local or lower level are adopted to operate on materials at a global or higher level). After Wray proposed her “formulaic” scenario, there was a debate on whether the phylogeny of language follows a synthetic or an analytic process (K. Smith 2006; A. Smith 2006; Tallerman 2007). Tallerman (2007) proposed some arguments against this analytic scenario. For instance, the brain size limit of early humans makes it impossible for them to memorize many holistic expressions. In addition, the effort to store and retrieve a mapping between a holistic utterance and a proposition is relatively greater than that needed to memorize a compositional mapping between a signal and an atomic concept. Furthermore, there might be more counter-examples than correct instances for individuals extracting compositional mappings. It turns out that these objections are problematic. For instance, as argued by K. Smith (2006), there is no direct link between brain size and the capacity for lexical memorization (Jackendoff 2002). The small brain size of early hominids does not indicate that they could not store and retrieve many holistic expressions. In addition, the effort to memorize a mapping between a signal and an atomic concept is not directly comparable with the effort to memorize a


mapping between a holistic utterance and a proposition involving a predicate and some argument(s). Furthermore, a pattern detection mechanism based on pairwise comparison (e.g., Kirby 2001) does not need to consider counter-examples at all. In a sense, there are neither “correct” nor “incorrect” examples; what individuals learn are conventionalized mappings that can be either compositional or holistic. Setting aside these problematic objections, the analytic scenario does contain some problems. On the one hand, as pointed out by Tallerman (2007), some proposed examples of holistic utterances involve complex semantic structures, which would require a brain with storage and retrieval capacities vastly superior to those available to early hominids. For example, Arbib (2005, 2006a) and Mithen (2005) gave the following two holistic expressions to describe some frequent situations in the life of early hominids:

Arbib’s example (2005, 2006a): /grooflook/ or /koomzask/ means: “Take your spear and go around the other side of that animal and we will have a better chance together of being able to kill it.” or “The alpha male has killed a meat animal and now the tribe has a chance to feast together. Yum, yum!” Mithen’s example (2005): a holistic utterance means: “Go and hunt the hare I saw five minutes ago behind the stone at the top of the hill”.

The semantic structures in these examples contain many clauses and hierarchical structures, which are difficult even for modern humans to handle, let alone early hominids with insufficient semantic and syntactic abilities. In addition, as argued by A. Smith (2006), it is “utterly implausible” that early hominids would have considered such specific propositions as frequent situations, and it is “wholly unlikely” that hearers could possibly reconstruct such complex meanings from context, without any help from the utterance structure. Furthermore, various lexical constraints on vocabulary acquisition, such as the whole object constraint (a novel label is usually perceived as referring to the whole object and not its parts, substance or


characteristics, Markman 1990), preclude the possibility of acquiring expressions encoding the kinds of complex meanings in these examples. On the other hand, idioms in modern languages are claimed by Wray (2002b) to be one type of holistic expression akin to those of protolanguage (the others include adjuncts, collocations, sentence frames and standard situational utterances). During the production and comprehension of these idioms, no single word within them is assumed to play any role; the semantic or syntactic information of each word is assumed not to be activated. However, Cutting and Bock (1997) have concluded that idioms in modern languages are not produced as “frozen phrases”; instead, they are syntactically analyzed, and their literal meanings are also activated during production. The above two problems of the “formulaic” scenario are partially caused by committing the mistakes discussed in Section 2.1.4. First, these cited holistic examples indicate that the researchers who propose them assume that ancient humans were already able to manipulate semantic complexity comparable to that of modern languages. Second, it should be noted that formulae in modern languages are rather different, since they appear to contain individual words (Kirby 2007a). Therefore, it is misleading to use idioms in modern languages to characterize the features of protolanguage, as this neglects the primitive syntactic level of protolanguage and the syntactic abilities of ancient humans. A large proportion of idioms in modern languages are grammatical. Instead of being “frozen phrases” from the very beginning, they could have resulted from a process of metaphorical innovation (Deutscher 2005) or reanalysis (Hopper and Traugott 2003). For instance, some Chinese idioms or clichés are frozen metaphorical expressions of historical stories or simple situations, but the metaphorical meanings of these expressions have lost their connections to the original sources, e.g.:

/…/ (“to send off the pigeon”) connotatively means “to miss an appointment”;

/…/ (“to climb the chimney”) connotatively means “to die”.

Some English idioms are also formed through a similar metaphorical process:


/to kick the bucket/ (from Vogt 2005a) and /to buy the farm/ (Kirby 2007a) both connotatively mean “to die”;
/to hit the roof (ceiling)/ (from Wray 2002b) connotatively means “to be angry”.

Other idioms are formed through a reanalysis process due to frequent use, and their whole meanings are slightly different from the meanings of their components, e.g., in French:

/bon/ (“good”) + /jour/ (“day”) become /bonjour/ (“how are you”/“good morning”).

In this section, the “bootstrapping” scenario of Innatism and the “formulaic” scenario of Connectionism/Emergentism have been briefly reviewed. As for the latter scenario, some inappropriate criticisms of it have been rebutted, and some latent problems in it have been investigated. Based on these discussions, I adopt a theoretical framework based on a modified “formulaic” scenario to explain the phylogeny of language, one which adjusts the semantics of the primitive holistic expressions and combines an analytic process of segmenting holistic expressions with a synthetic process of concatenating lexical items. This framework and my coevolutionary viewpoint on the phylogenetic emergence of language are introduced in the next section.

2.3. The Coevolutionary Hypothesis on Language Emergence

This thesis contributes to the study of language origin by investigating the emergence of some linguistic universals, such as compositionality and regularity. Among the various levels of regularity, my focus is the regularity in the form of basic constituent word orders. Without discussing how the biological capacity for language emerges, the thesis mainly explores the influence of some basic abilities on language evolution. On the glossogenetic timescale, how idiolects and communal languages are updated within and across generations, and how they affect each other, are discussed in a theoretical framework. Based on the sociocultural explanation, this framework adopts a “from simple to complex” routine to describe the emergence of a compositional language out of a holistic signaling system, and implements a “bottom-up” process of syntactic development during language emergence.


Three main aspects of this framework are introduced in this section: the first deals with which type of protolanguage should be adopted as the primitive stage of human language, and what simple semantic structure is contained in this protolanguage; the second touches upon what the emergent process should look like under the related mechanisms; and the third focuses on word order, and discusses whether this syntactic feature can gradually emerge as a result of some domain-general abilities.

2.3.1. Holistic protolanguage with no syntactic structure to encode descriptive meanings

Considering the two problems discussed in Section 2.2.3, I make some modifications to the “formulaic” scenario. I postulate that the meanings contained in holistic utterances are descriptions of relatively simple integrated events, and that structural information is gradually revealed in utterances with the emergence of compositionality and regularity in the holistic protolanguage. On the one hand, the meanings contained in the utterances of a holistic protolanguage could not be too complex, beyond the scope of the cognitive abilities of Homo sapiens. Meanwhile, the meanings exchanged during foraging activities, as still shown in many modern hunter-gatherer communities (Barnard 2004), should be complex to a certain extent, such that information about food sources, locations of predators, or some environmental events can be explicitly described, and expected responses can be clearly triggered from the individuals participating in conversations. Therefore, meanings encoded in utterances should be a little more complex than the atomic concept of a single object or action, since utterances encoding atomic concepts could occasionally be ambiguous, and would require further cognitive or psychological effort to detect the intentions contained in them. For instance, in order to trigger an escape action from an early hominid, considering his/her limited abilities to detect intentions, sending an utterance encoding a single atomic concept like “tiger”, “jump” or “run” is less efficient than sending an utterance encoding an explicit event, such as “a tiger is jumping at you!” or “you run!”. This discussion suggests that the meanings exchanged in these conversations should be descriptive of simple integrated events, probably involving a single predicate and its related arguments, e.g., “a wolf is running” or “a lion is chasing a gazelle”. It is relatively easy for such meanings to be produced by a speaker without much language-specific knowledge or mechanisms, and it is also easy for them to be inferred and reconstructed by a hearer from contextual or nonlinguistic cues.
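Such a descriptive meaning space can be represented concretely as predicate-argument structures, as in the minimal sketch below (in Python); the field names and utterance strings are hypothetical illustrations, not the representation of any particular model. The point is that in a holistic protolanguage a whole meaning maps to one unanalyzed utterance, with no part of the string corresponding to any single constituent:

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Meaning:
    """A simple integrated event: one predicate plus its argument(s)."""
    predicate: str                   # the action
    agent: str                       # the instigator of the action
    patient: Optional[str] = None    # the entity undergoing the action, if any

m1 = Meaning(predicate="run", agent="wolf")                       # "a wolf is running"
m2 = Meaning(predicate="chase", agent="lion", patient="gazelle")  # "a lion is chasing a gazelle"

# Holistic protolanguage: each whole Meaning maps to one unanalyzed utterance;
# no substring of the utterance corresponds to any individual field.
holistic_lexicon = {m1: "guwapi", m2: "tebakolu"}
print(holistic_lexicon[m2])   # tebakolu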


In addition, these utterances encoding descriptive meanings are sufficient for individuals to develop linguistic universals based on some social intelligence. As claimed in the “formulaic” scenario, by comparing holistic utterances previously acquired in communications, individuals can detect atomic concepts, such as actions or their instigators, and develop some mechanisms for regulating these concepts to express salient integrated meanings. Furthermore, by replacing the predicates and arguments in these integrated meanings with the corresponding actions involved in other social activities and the related human instigators, individuals can create manipulative meanings, e.g., “B come” or “A make a fire”, in which “A” and “B” are human instigators. Then, individuals who participate in other social activities but not in foraging can also acquire similar linguistic knowledge from these manipulative expressions. On the other hand, the initial holistic utterances should contain no structure, since they are always unitarily produced and comprehended. When holistic utterances are segmented into sub-units and some of these sub-units are acquired as lexical items, structural information concerning the order among these sub-units begins to emerge. Other sub-units, arbitrarily introduced or not mapped to any atomic meanings, could also be acquired as morphological tags. These structural features regarding word order or morphological tags can probably be acquired as rules for governing lexical items. This leads to the emergence of regularity at the syntactic level. When a set of lexical items and regulatory mechanisms are conventionalized, the exchanged holistic utterances can be fully segmented into lexical items and morphological tags, and a compositional language then emerges.

2.3.2. Coevolution of compositionality and regularity

The concept of coevolution was borrowed from biology. In biology, coevolution is the mutual evolutionary influence between two or more species. Each party in a coevolutionary relationship exerts selective pressures on the other, thereby affecting the other’s evolution (Futuyma and Slatkin 1983). The interactions between species can be competitive or cooperative. Generally speaking, species that have mutually influenced one another’s evolution are said to have coevolved (Purves et al. 2004). There are three types of coevolution: 1) reciprocal coevolution, e.g., the flower-insect interactions (Ehrlich and Raven 1964), in which flowers coevolve with the insects that pollinate


them and feed on their nectar, in a mutualist relationship where the reproductive success of one species is beneficial for the other species’ survival; 2) competitive coevolution, e.g., the predator-prey or host-parasite interactions (Turner 1981), in which the survival of individuals of one species requires the death of individuals of the other species; and 3) diffuse coevolution, in which a broad group of species are involved and influenced by a wide variety of predators, parasites, prey, or mutualists. In a sense, the entire cosmos can be attributed to diffuse coevolution (Jantsch 1980). Coevolution has inspired many studies in research fields apart from biology. For example, a family of coevolutionary algorithms has been proposed for optimization, machine learning, and the design of strategies in games (e.g., Steels 1998a; Michelle et al. 2006). The reciprocal and competitive features of coevolution also characterize some social phenomena; the coevolution of people and plagues (Wills 1996) is an example of competitive coevolution. In linguistics, the development of some language-related functions or patterns has been assumed to coevolve with other cognitive abilities or nonlinguistic factors in a reciprocal manner, e.g., the coevolution of neocortical size, group size, and language in humans (Dunbar 1993), the coevolution of the human brain and language (Deacon 1997), the coevolution of language size and the critical period (Hurford and Kirby 1999), and the coevolution of language and the LAD (Briscoe 2000). In addition, some linguistic aspects have been claimed to coevolve with other aspects, e.g., the coevolution of phonology and the lexicon (Nettle 1998). In my modified scenario of the phylogeny of language, compositionality and regularity are assumed to have coevolved in a reciprocal manner during the transition from holistic utterances to a compositional language. This coevolutionary hypothesis is supported by the following three aspects. First of all, from the aspect of pragmatics, compositionality and regularity are inseparable. They are necessary and reciprocal to each other when an individual combines lexical items into sentences to encode integrated events. Although they deal with different linguistic aspects, both can be viewed as recurrent patterns in different domains and at different levels. Compositionality can be viewed as recurrent patterns in both utterance syllables and meaning constituents, i.e., identical utterance syllables and identical atomic meanings; morphology can be viewed as recurrent patterns in utterance syllables, i.e., identical utterance syllables assigned to lexical items with the same semantic role in different integrated meanings, such as actions, their instigators, or


Second, the coevolution of compositionality and regularity has been traced in some empirical studies of language acquisition and normal language processing. For example, Bates and Goodman (1997) studied the development of linguistic abilities in normal children, and showed that the emergence of grammatical abilities is highly dependent on the acquisition of a sufficient number of lexical items. Similar observations have been made in some atypical populations, including early talkers (Bates et al. 1995a) and children with brain damage (Bates et al. 1995b). Studies of language breakdown in children and adults with neurological disorders (Bates and Goodman 1997) also support the claim that there is a strong association between grammar and lexical items; once the development of one aspect is damaged, the development of the other is affected as well. In addition, studies of the online processing of Spanish (Reyes 1995) and Italian (Bates et al. 1996) suggest that in normal adults, both lexical and grammatical structures are processed in an integrated store, following common principles of access and integration, and that sentence processing requires probabilistic, semantic, and syntactic knowledge coded in lexicons (MacDonald 1994).

Finally, empirical studies of second language acquisition have traced some mechanisms for simultaneously acquiring compositionality and regularity. L. Fillmore (1979) reported a study of five Spanish-speaking children learning English as a second language. She found that in order to participate in social discourse, these children adopted a social strategy of producing formulaic (holistic) or memorized, unanalyzed utterances, which allowed them to use the language long before they knew anything about its structure, and before they could create any sentences in this language themselves. These holistic utterances or gestalt forms partially resulted from abilities in rote memory and motor proficiency that exceeded the children’s linguistic comprehension and productive linguistic abilities (Prizant 1983). Some of these utterances were associated with particular activities or routines, and were often used in somewhat appropriate contexts, and the meanings contained in them could be understood based on contexts or nonlinguistic cues. With more of such holistic utterances being memorized, these children began to notice and
acquire similarities among them as lexical or phrasal items. Then, based on these items, the children could identify some structural information, such as word order or morphological information, which paved the way for their acquisition of grammatical knowledge in the unknown language. Through this analytic process, these children gradually grasped some lexical items and grammatical knowledge of English.

To a certain degree, the linguistic environment in this report resembles the primitive environment in which some basic semantic concepts were presumably established. For instance, in the primitive environment, the exchanged sentences may also have encoded simple integrated events involving frequent activities or routines, and nonlinguistic or contextual information could also have provided limited assistance in comprehending these sentences. In order to participate in social discourse, early hominids also needed to acquire linguistic knowledge from their linguistic instances, and since most of these instances were holophrastic, the acquisition of linguistic knowledge may have followed an analytic process similar to that shown in L. Fillmore’s report. There are, however, crucial differences between the two cases: the mental representations of language could be different, and the level of the adopted mechanisms could also be different. Nevertheless, some of the mechanisms reported in this study could probably have been utilized by early hominids to develop compositionality and regularity in their languages. This study actually inspired Wray to establish her “formulaic” scenario on language emergence.

2.3.3. From “local” to “global”: a “bottom-up” process of syntactic development

This thesis mainly discusses the basic constituent word order with which subject, object, and verb appear in sentences of different languages. Following Tomlin’s terminology (1986), subject (S) here refers to the primary syntactic relation borne by a Noun Phrase (NP) with respect to the verb; object (O) refers to the secondary grammatical relation borne by an NP with respect to the verb; and verb (V) refers to the verb root. In the simple descriptive semantic space adopted in this thesis, lexical items serving as subjects always encode the instigators of actions, and lexical items serving as objects always encode the entities that undergo actions. Other grammatical information, such as tense, aspect, voice, agreement, and pronominal clitics, is excluded.

A domain-general ability, the simple sequential learning ability (the ability to encode and represent the order of discrete elements occurring in a temporal sequence, such as the sequence of
sounds making up words or the sequence of words making up sentences, Christiansen and Kirby 2003a), is assumed to be the precursor of the syntactic abilities in human language. “Simple” here means that the sequential orders concern only a limited number (say, 2) of discrete elements in a temporal sequence, and that the orders language learners manipulate are rather simple (e.g., before or after). Local order is defined as a simple sequential order of two lexical items with which they appear, adjacently or separately, in sentences. Global order is the basic constituent word order at the sentence level, which regulates the order of two or three discrete elements. My hypothesis on the emergence of regularity is that global orders in sentences can emerge by reiterating local orders, and that the whole emergent process follows a “bottom-up” routine, evolving from local, simple orders to global, complex syntax. This hypothesis is supported by the following four aspects.

First of all, the sequential learning ability is domain-general, present to different degrees in humans and in other species such as chimpanzees and monkeys. For example, in an experiment training a chimpanzee to learn Arabic numerals (Biro and Matsuzawa 1999; Tomonaga and Matsuzawa 2000; Hauser 2005), when presented with three to five Arabic numerals on a monitor, the subject pressed each number in its appropriate ordinal sequence, independent of numerical distance. Other studies have discovered that nonhuman primates such as cotton-top tamarins show clear evidence of possessing mechanisms capable of computing the serial orders occurring in an input speech stream (Hauser et al. 2001).

Second, humans are capable of detecting and adopting local information to process modern languages. For example, O’Grady (2005) reported that most native English speakers regard the following sentence with partial agreement as grammatical, since its first conjunct /there is water/ is grammatical:

/There is water and sand on the floor/ (adapted from O’Grady 2005)

A similar phenomenon has been noticed when native speakers process other languages such as Moroccan Arabic and Brazilian Portuguese (Munn 1999). Based on these findings, O’Grady (2005) has proposed a “carpentry” theory of syntax emergence, according to which, driven by efficiency and local sequential information, sentences with complex word
orders can be gradually built up. In addition, Akhtar and Tomasello (1997) discovered that young children, in their early stage of language acquisition, not only fail to generalize SVO word order knowledge from one verb to another, but are also unable to use it as a cue for comprehending sentences with novel verbs. This indicates that global information is not directly available, but is acquired and processed based on local, piecemeal information.
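The following minimal sketch illustrates this idea under deliberately crude assumptions (it is not the mechanism implemented in this thesis’s model): each observed sentence contributes pairwise precedences between constituent categories, i.e., local orders, and the global order is simply the permutation most consistent with the majority of these local votes.

```python
from collections import Counter
from itertools import combinations, permutations

def learn_local_orders(corpus):
    """Count pairwise precedences (local orders) between categories."""
    votes = Counter()
    for sentence in corpus:
        for a, b in combinations(sentence, 2):  # a precedes b here
            votes[(a, b)] += 1
    return votes

def infer_global_order(votes, categories=("S", "V", "O")):
    """Choose the global order most consistent with the local orders."""
    def consistency(order):
        return sum(votes[(a, b)]
                   for i, a in enumerate(order) for b in order[i + 1:])
    return max(permutations(categories), key=consistency)

# Hypothetical input: sentences as sequences of constituent categories.
corpus = [("S", "V", "O"), ("S", "V", "O"), ("S", "O", "V")]
print(infer_global_order(learn_local_orders(corpus)))  # ('S', 'V', 'O')
```

Nothing in this procedure inspects a sentence as a whole; the global pattern is assembled bottom-up from records of which item preceded which.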


Third, Tomlin (1986) has proposed a few functional principles to explain why SOV and SVO are more frequent than other word orders in human languages. Some of these principles, if correct, reveal that in human languages, global orders are formed under the guidance of local orders. For example, V-O Binding claims that a transitive verb forms a more cohesive syntactic and semantic connection with its object than with its subject, and the Animated First Principle claims that in basic transitive clauses, the most “animated” NP will precede the other NPs.

Finally, from “simple” to “complex”, from “local” to “global”, and “bottom-up” are general trajectories that characterize the evolution of many biological phenomena. For example, the evolution of the lung from the esophagus in freshwater fishes to the breathing organ in terrestrial vertebrates followed a series of changes that occurred locally (Mayr 1982). The evolution of the larynx from its initial survival function to more complex functions, such as enhancing phonation and generating human speech, followed a “bottom-up” routine through a series of simple modifications in the muscles and cartilages that control the larynx (Negus 1949; Lieberman 2006). The evolution of the eyes and the early development of the human embryo also followed similar trajectories (Jacob 1977).

Based on such empirical evidence, Jacob (1977) proposed a “tinkerer” or “bricolage” view of evolution, which claims that evolution, like a tinkerer, is an opportunist, an inventor that takes whatever materials are available and tinkers with them. It rarely produces novelties from scratch. Instead, it works on what already exists, either transforming a system to give it new functions or combining several systems to produce a more elaborate one. Unlike engineers, who have blueprints or global arrangements beforehand, tinkerers who tackle the same problem are likely to end up with different solutions, since they only refer to the local materials available at their disposal. This view proposes that the evolutionary process does not favor the evolution of domain-specific modules, particularly if a way can be found to accomplish the task by modifying pre-existing mechanisms (Schoenemann 2005). During evolution, structures and systems that were previously adapted for one purpose can be modified to take on new tasks, though the results are not necessarily simple or elegant (Lieberman 2006).

Following the “tinkerer” view, language must have emerged by making significant use of the abilities that our ancestors already had. According to Arbib (2006a), the full syntax of human language could be a result of bricolage: a historical accumulation of piecemeal strategies for achieving a wide range of communicative goals, complemented by a process of generalization whereby some strategies become unified into a single general strategy. Considering the mainstream theories on language emergence, Innatism is more like an engineer, who requires the whole blueprint at the beginning of evolution, for whom small modifications during the process may not greatly affect the whole arrangement; it therefore has difficulty accounting for the variety of human language. On the contrary, Connectionism/Emergentism is more like a tinkerer, holding that the complex language-specific abilities in modern languages should gradually emerge from simple available competences. Given similar domain-general abilities, various linguistic features and abilities could burgeon in different situations, which leads to the variety of human language.

To summarize, my hypothesis, based on Emergentism, suggests that language emergence should be a coevolutionary process of compositionality and regularity, that the development of syntax in human language should follow a “bottom-up” routine, and that the related linguistic abilities for processing lexical items and local orders should derive from some domain-general abilities, such as pattern extraction and sequential learning.

Chapter 3. Research Method: Computational Simulation

This chapter first introduces the major components of computational linguistics. Then, it discusses computational simulation on language evolution, together with a review of some contemporary models. And finally, it briefly describes the framework of my coevolution model on language emergence.

3.1. Major Components of Computational Linguistics

Computational linguistics (CL) dates back to the machine translation efforts of the 1960s. Now, with the rapid development of computational power, the availability and reusability of linguistic resources, and multidisciplinary contributions from mathematics, information theory, neuroscience, and the humanities, its scope has been greatly extended to touch upon a variety of topics (Huang and Lenders 2004), covering both intrinsic linguistic aspects, such as semantics and syntax, and modern tasks, such as computer-aided natural language processing and human-machine linguistic interfaces. Aided by Artificial Intelligence (AI, the study of intelligent behavior, learning, and adaptation in machines, Jackson Jr. 1985) and Artificial Life (ALife, the study of life through the use of human-made analogs of living systems, Adami 1998), CL has applied the simulation method not only to the design of machines that process various languages, but also to the exploration of topics concerning the evolution of language and of particular linguistic features.

There are currently three representative components in CL (Huang and Lenders 2004; Gong 2005), all of which are interdependent and contain many subfields:

1) Natural Language Processing (NLP), which studies the automated generation and understanding of natural human languages (Harris 1985), so that eventually people will be able to interact with computers as they do with other people. Some of the latest NLP frameworks or methods include FrameNet (C. Fillmore et al. 2004) and XML (eXtensible Markup Language) based NLP tools (Wilcock et al. 2004).


2) Construction and Analysis of Language Databases, which deals with the construction of formal representations of languages in computer databases, and the analysis of the statistical characteristics of these databases. An emergent field called corpus linguistics has grown out of this line of research (Kučera and Francis 1967; Biber et al. 1998), and many linguistic corpora have been developed and widely referred to in not only linguistic but also engineering research, such as the British National Corpus (BNC, http://www.natcorp.ox.ac.uk), the Child Language Data Exchange System (CHILDES, http://childes.psy.cmu.edu, Sokolov and Snow 1994; MacWhinney 1995), and the multi-language corpora provided by the Linguistic Data Consortium (LDC, http://www.ldc.upenn.edu).

3) Computational Simulation (CS), which consists of “operationalized” hypotheses or theories that are expressed neither in words nor in mathematical symbols, but as computer programs (Parisi 2006). The simulation results of these programs are the empirical predictions derived from the incorporated hypotheses or theories (Cangelosi and Parisi 2002). Computational simulation is an invaluable tool for transforming developmental theories from a descriptive science into a mature explanatory science.

CL and its components share many properties with traditional linguistics, yet they possess some new characteristics (Gong 2005).

First of all, a distinct feature of CL is that it solves linguistic problems using computational techniques. Many new techniques have been utilized in CL, especially in NLP and CS. For instance, a hierarchical network structure was adopted in FrameNet (C. Fillmore et al. 2004), in which nodes denote linguistic features and the connections among them denote the relations between these features. Dynamic analysis was adopted in some computational models (e.g., Wang et al. 2004) to study lexical diffusion. Other techniques, such as association networks (e.g., Brighton et al. 2005a), rule-based systems (e.g., Kirby 2002a), and decision trees (e.g., Steels 2002; Steels et al. 2002), have also found their way into NLP methods and computational models.

Second, the representative components of CL can assist linguistic research from different angles. The starting point for exploring a language could be the construction and analysis of a corpus of it. More information can then be exchanged in this language between humans and computers when the rules and regularities extracted from the corpus are implemented in NLP.


Meanwhile, the evolutionary panorama of this language, projected by CS, may guide the collection of linguistic data, the comprehension of linguistic features, and the modification of NLP methods.

Third, like traditional areas of linguistics, CL is of a multi-disciplinary nature, containing a wide range of perspectives for theorizing and applications. Besides mathematics and computer science, anthropology, psychology, physics, and biology all contribute to CL. These disciplines present new records for language databases, introduce new techniques into NLP, and provide different theoretical backgrounds for CS. Meanwhile, to achieve great advancement in CL, cross-language analyses, cross-discipline research, and cross-region collaboration are also necessary.

3.2. Computational Simulation

“A basic task of science is to build models — simplified and abstracted descriptions — of natural phenomena” (Belew et al. 1996, p.432). Computational simulation has long been adopted by researchers from many disciplines to study various natural phenomena, and it has recently joined the endeavor to serve as an effective method for studying language evolution. It can help to recapitulate language evolution, reconstruct language history, and reconsider the effects of linguistic and nonlinguistic factors during language evolution. A large amount of work based on computational models has been collected in recent anthologies on language evolution (e.g., Hurford et al. 1998; Knight et al. 2000; Briscoe 2002; Cangelosi and Parisi 2002; Wray 2002c; Christiansen and Kirby 2003b; Cangelosi et al. 2006), and several comprehensive or brief reviews of this area can be found in Parisi (1997), Kirby (2002b), K. Wagner et al. (2003), Wang et al. (2004), Gong and Wang (2005), and Vogt et al. (2006).

In this section, computational simulation is introduced from three aspects: the first deals with the necessity of computational simulation; the second provides some criteria for classifying contemporary computational models; and the last enumerates some general steps for establishing a computational model to explore linguistic questions.


3.2.1. The necessity of computational simulation

Why should we adopt computational simulation, and how could it be of any help? There are three primary reasons: the first concerns its complementary role to traditional studies of language evolution; the second suggests that it is an appropriate tool for studying language as a Complex Adaptive System (CAS, Holland 1995, 1998; Briscoe 1998; Steels 2000); and the last concerns its validity (Gong 2005).

1) Computational simulation is complementary to traditional empirical or theoretical studies of language evolution. This role manifests itself in the following two aspects.

a) The limitations of the traditional means of studying language evolution.

First, the reconstruction of the trajectory of language development has long relied upon anthropological and archaeological findings, but most of them provide only preliminary or indirect evidence. For example, based on the anatomical analysis of hominid cranium fossils, we can trace the emergence of the earliest anatomically modern humans to at least 160,000 years ago (White et al. 2003), but this only provides a lower bound for language origin, without further information about the trajectory of language emergence.

Second, since it is generally impossible for any ancient speech to be preserved in anthropological or archaeological evidence, anthropologists have no choice but to use hominid fossil records to deduce the timing and order of speech-related adaptations, but these records can tell us little about the roots of cognition and language because the soft tissues involved do not fossilize (Stanford 2006). Therefore, the conclusions inferred from this limited evidence could be incomplete or even inaccurate. For example, Lieberman et al. (1969) concluded that, as a by-product of upright bipedal locomotion, the descent of the larynx allowed human beings to mimic a variety of sounds, which provided a possible source of spoken language (Beaken 1996). However, new comparative data have shown that some mammals (e.g., goats and monkeys) lower their larynges and tongues during vocalization (Fitch 2002, 2006), and vocal mimicry also exists in certain avian species (Hauser 1996; Janik and Slater 2000) and aquatic mammals (e.g., cetaceans); yet none of these nonhuman species produce complex vocalizations comparable to human speech (Nishimura et al. 2003). More convincing evidence and further analysis are needed to study the speech-related adaptations toward human language.


Third, since no fossils allow us to trace language development from its prehistoric states, historical linguistics has been confined to a time limit beyond which little information can be retrieved (Trask 1996). One way to overcome this limitation is to adopt a comparative approach, observing and comparing the various forms of communication systems in other animals (Oller and Griebel 2004), especially primates. These species’ learned, culturally varied behaviors can provide us with a sense of the likeliest range of behavioral or cognitive options that natural selection would have taken with the ancestral forms of humans (Stanford 2006). Some similarities and crucial differences between human and animal communication systems have been identified by empirical studies, including cultural factors, such as social structure (e.g., Dunbar 1993), and biological factors, such as declarative (e.g., Terrace 2002) and episodic (Raby et al. 2007) memory, as well as physiological (e.g., Nishimura et al. 2003) and neural bases (e.g., Rizzolatti et al. 1996). However, due to the fundamental distinctions between language and purely functional, stimulus-bound animal communication systems (Chomsky 1967), no matter how subtle and advanced these factors and gestural or vocalization systems might be in other species, they are not comparable to the complexity and creativity of the human language system (Greenspan and Shanker 2004), and the study of human language using animal models has offered only limited insight (Chow 2005). Apart from this comparative approach, we still need a “test bed” to better understand the effects of these factors and the process by which they entered into human language and the human communication system.

Finally, one important source for understanding language evolution is the comparison of the performance of normal human subjects with that of aphasic patients (Caplan 1987) in linguistic tasks. For instance, the discoveries of Broca’s and Wernicke’s areas were based on this approach, and the study of the KE family revealed the FOXP2 gene, which appears to play important roles in human language (Hurst et al. 1990; Fisher et al. 1998; Enard et al. 2002; Marcus 2004). However, it is usually difficult to find patients suffering from a specific linguistic impairment, and it is generally impossible to compare a subject’s behavior before and after an injury to his/her brain occurs. In the human population, it has been estimated that only about 2% to 5% of children who are otherwise normal have significant difficulties in acquiring expressive or receptive language (Bishop et al. 1995), and these reported cases fall into various types of language disorders. In addition, the scarcity of families with a large number of affected individuals limits geneticists’ ability to identify the responsible loci by focusing on large multi-generational
pedigrees (Chow 2005). Furthermore, the modern technologies for temporarily disabling some abilities by shielding certain brain areas (e.g., the Wada test, Hugdahl 2005), or for reinforcing certain brain areas by inducing electric fields in part of the brain (e.g., Transcranial Magnetic Stimulation (TMS), Pulvermüller et al. 2005), are still in a burgeoning stage and face technical and ethical problems. For instance, to test a theory, it is not ethically justifiable to put a neurologically intact subject in a situation that might degrade his/her neural activities.

In the above situations, where traditional means fail to provide a comprehensive analysis, computational simulation offers new perspectives and new data.

First, by controlling the initial or boundary conditions, it can investigate the dynamics of a developmental process or some transient behaviors. For example, many simulations of language emergence predefined some linguistic environments and equipped individuals with some linguistic abilities, in order to explore whether certain initial conditions could lead to the emergence of particular linguistic features (e.g., Kirby 1999, 2002a; Vogt 2005b; Gong et al. 2005b, 2006a, 2006b).

Second, by manipulating different situations and comparing the consequences across them, it can probe the necessary and sufficient conditions for certain linguistic phenomena. For example, by defining different population replacement strategies, de Boer (2001) studied the effects of these strategies on the maintenance of the vowel system of an emergent language across generations. By adopting different social structures to restrict communications among individuals, Ke et al. (2004) explored the conditions under which a linguistic innovation could spread.

Third, by incorporating various parameters, it can test whether some factors are prerequisites for the emergence of linguistic universals. For example, by adjusting the probability for an adult or a child to be chosen as the speaker or the listener in a communication, Vogt (2005a) studied the influence of different types of communications on the emergence of a compositional language.

Fourth, by singling out or blocking some factors, it can study the isolated or collective effects of a single factor or a set of factors on language evolution. For example, by comparing the
isolated effects of two related factors, cultural variation and learning cost, Munroe and Cangelosi (2002) presented a comprehensive study of the Baldwin Effect on language emergence. Ke et al. (2006) discussed the collective effects of various parameters related to some lower-level cognitive abilities on language emergence, and showed that some parameter changes could result in phase transitions in the emergent language.

b) The complementary role of computational simulation to linguistic theories or hypotheses that are sometimes founded on unreliable bases.

Most of these theories or hypotheses are based on empirical findings or research that covers only limited timescales in history, or touches upon short-term phenomena. Even generalized theories that discuss the whole evolutionary trajectory of human language may simply describe a rough picture, focusing mainly on general processes or primary driving forces while leaving out details and exceptions or neglecting the effects of correlated factors. For instance, Jackendoff (2002) listed the conceptual stages from a symbolic, atomic protolanguage to modern language with complex syntactic structures, but he did not give detailed or explicit explanations of how the structural features in the different stages were formed in the first place. Meanwhile, both the “bootstrapping” and the “formulaic” scenarios (discussed in Chapter 2) proposed relatively detailed explanations of the transitions from protolanguages to modern languages, but the dynamic processes of these transitions were not further explored. In these situations, where linguistic theories are devoid of convincing arguments to describe an integrated picture of language evolution, computational simulation can contribute in the following four ways.

First of all, as pointed out by Christiansen and Kirby (2003a), computational simulation has three major roles in assisting theories or hypotheses proposed on the basis of observations and generalizations of empirical findings: a) exemplification, to demonstrate how an explanation works; b) evaluation, to help researchers identify hidden problems or trigger additional experiments that modify or enrich the original theories, e.g., using the performance of human subjects in comprehending an artificial language to further verify the results obtained from a simulation (Christiansen and Ellefson 2002; Perruchet et al. 2004); and c) exploration,
to explore the general ways in which explanatory mechanisms or theoretical constructs interact, thus directing us to new theories.

Second, as summarized by Cangelosi and Parisi (2002), if a theory or hypothesis can be expressed as a computer program, the theory or hypothesis itself has to be explicit, detailed, consistent, and complete, because without these properties the simulation would neither run on a computer nor generate meaningful results. When translating an underlying theory into a computer model, a researcher must specify precisely what is meant by its various terms. Building a model to implement a theory thus provides a means of testing the internal self-consistency of the theory; once implemented as a computer program, an inconsistent or incomplete theory becomes obviously problematic, leading to conflict situations in which the program fails to function (Mareschal and Thomas 2006).

Third, since the simulation results are the empirical predictions derived from the theory incorporated in the program, the program can generate many detailed empirical predictions, some of which may even conflict with the available empirical data (Parisi 2006), since empirical data may only cover limited conditions under specific settings of the related factors.

Finally, simulations, like virtual experimental laboratories, allow researchers to observe phenomena under well-controlled conditions, manipulate the variables or conditions that control the phenomena, and determine the consequences of these manipulations. The abundant results are beneficial for thoroughly analyzing the incorporated theories and hypotheses, and some long-term consequences help to modify these theories and hypotheses. Even the failures of a program sometimes point to a need to reassess the situation and reevaluate the involved theories (Mareschal and Thomas 2006).

2) Computational simulation allows precisely studying the nature of language as a CAS.

A CAS is defined as a complex system in which the behavior of individual elements and the nature of their interactions may change, thus giving rise to higher-order dynamics (Steels 2000). A CAS usually consists of a large number of entities that are self-organizing. It is sensitive to initial conditions and tends to change in ways that depend on the particular environment in
which it exists. It also tends to be organized in a hierarchy of multiple levels, with entities at a lower level determining the properties of a single entity at a higher level. Language is likewise a complex, self-organizing, and adaptive system; it possesses this hierarchical organization and can be viewed as a CAS: neurons interact in the brain to produce the linguistic behavior of an individual, and individuals interact using their idiolects to produce the global properties of the communal language. Meanwhile, computational simulation usually adopts a synthetic strategy (Steels 1997a) using bottom-up and constructive approaches, with which researchers only need to decide the basic components of a system, the rules by which these components interact, and possibly the environment in which they do so. In addition, computational simulation allows researchers to create scenarios in which a large number of entities are assigned a set of properties and interact in such a way that researchers can observe the global properties emerging from these interactions (Cangelosi and Parisi 2002). Therefore, computational simulation is an effective approach to studying a CAS. Finally, adopting computational simulation in the study of language evolution can complement laboratory experiments and data collection, since these traditional methods may only adopt top-down approaches, obtain results under limited conditions, or reveal the effects of some underlying mechanisms only indirectly.

3) Computational simulation is valid in the following three respects.

First of all, in a sense, it is more rigorous than mathematical theories, which, for compactness, may occasionally replace some steps with conventions or shortcuts (Holland 2005); if even a single step is missing or left out in a computer program, the simulation might collapse or the generated results could be of no use.

Second, most of its assumptions are plausible and supported by empirical findings or theories in linguistics as well as in other disciplines. For instance, “follow the majority” is a plausible psychological assumption adopted in many computational simulations studying collective behaviors (e.g., Ke et al. 2002).


Third, it uses objective, realistic mechanisms, and follows well-defined, traceable procedures to obtain convincing, replicable results. For example, the results generated by some neural network models (e.g., Christiansen and Ellefson 2002) matched well the empirical findings in artificial language perception tasks, and the lexical diffusion process demonstrated by Wang et al. (2004) has already been traced in linguistic data (e.g., Shen 1997).

Apart from these aspects, there are other advantages to using computational simulation in linguistic studies.

First, simulation provides a convenient way for cross-disciplinary communication and cross-fertilization (Ke 2004). A computational framework with explicit and quantified conditions and parameter settings can be borrowed to study similar phenomena in other disciplines, and its results can be exported from one field to another in a comprehensible fashion (Belew et al. 1996). For instance, after modifying the general framework for simulating the Baldwin Effect (Hinton and Nowlan 1987), Munroe and Cangelosi (2002) adopted it in their own models to study the Baldwin Effect on language emergence. Ke (2004) adopted some general models of complex networks (e.g., the small-world network, Watts 1999, 2003; the scale-free network, Barabási 2002; and the local-world network, Li and Chen 2003) in her own study to explore the effects of social structure on language change.

Second, many computer programs involve interactions among a large number of entities, and their results can capture complexities that may be extremely difficult to predict or explain based on experience or mathematical expressions. For example, the Talking Heads experiment (Steels 1995, 1997b, 1998b, 1999; Steels and Kaplan 2002) simulated a series of physically grounded adaptive language games among robotic agents; in addition to the modules handling linguistic information, this system also involved physical equipment and related engineering operations. Fluid Construction Grammar (FCG, Steels 2004; de Beule and Steels 2005) considered many semantic and syntactic features, as well as related cognitive and linguistic mechanisms, and traced the emergence of a language with complex morphosyntactic structures; the whole emergent process in this model involved complexities of many kinds.


3.2.2. The limitations and difficulties of computational simulation

Despite the numerous advantages discussed above, computational simulation is still in a burgeoning stage and has its limitations and difficulties. Its major limitations are simplification and specification.

1) Simplification. Most available models are highly simplified and idealized “toy models” of real phenomena, capturing restricted features and abstracting them to such an extent that they might not fully represent the original features of the real phenomena. For example, most current ANNs are “toy models of the brain” (Schalkoff 1997); their input is like a “toy string set” and they “do not even have an interface with semantics” (Pinker 1996). Their major functions and mechanisms (e.g., Parallel Distributed Processing, McClelland and Rumelhart 1988) are highly simplified compared with the activities of real neurons in human brains. In addition, although many models can produce the expected results with a limited number of agents or under idealized situations, when some parameters or situations are scaled up, e.g., when the population size is increased, the complexity of the semantics is increased, or the information transmission is imperfect, these models may well break down (the scale-up problem, Cangelosi and Parisi 2002).

On the one hand, simplification produces understanding; a certain degree of simplification is necessary to study the target question and reach reasonable conclusions. All scientific work and many scientific breakthroughs rely on simplification and on finding simpler ways to view complex phenomena (Feldman 2006). To study a CAS such as human language, it is wise to separate the whole system into smaller subsets, disentangle the complex process into relatively simple sub-processes, and divide confusing, mixed results into direct ones dependent on specific situations. After simplification, within those subsets, sub-processes, and particular situations, the target question is much more tractable, the process is more readily apprehensible, and the factors related to the results are more easily identified. The convincing conclusions obtained in these simplified cases pave the way for further study of the target question with increased complexity, and eventually lead to a comprehensive understanding of the whole target question. These “divide and conquer” and “from simple to complex” strategies are common in scientific investigation.


However, the assumptions on which a simplification is based may be questionable, and the results obtained may then be meaningless as well. For instance, among the many models studying the emergence of linguistic universals, some are based on the “principles and parameters” framework (e.g., Briscoe 2000; Nowak et al. 2000, 2001), and others are based on conventionalization or self-organization (e.g., Ke et al. 2002; Gong et al. 2005b); if either of these assumptions is implausible, the results of the related models are also questionable.

On the other hand, a model should be neither too simple nor too complicated. If a model is oversimplified, it may miss the essence of the problem and contribute little to the understanding of the target question, or its results may have been directly built in through the initial conditions, with the related factors becoming trivial and having no effect on the results. On the contrary, a model containing too many entities, whose effects are hard to disentangle, also contributes little to the comprehension of the target question, because it is difficult to examine the effect of each factor on the final results, and the conclusions may be misled or overwhelmed by these numerous factors.

2) Specification. During simplification, a model needs to focus on the specific areas and particular factors that are most relevant to the incorporated theory or hypothesis. For example, some models studying the evolution of linguistic universals represented language as meaning-form mappings (e.g., Nettle 1999b, 1999c; Nowak 2000, 2001; Ke et al. 2002); others treated language as simple, independent syllables (e.g., Kirby 1999; Briscoe 2000; Gong et al. 2005b), leaving out the phonological features of human language; and still others concentrated on the syntax of simple semantic expressions (e.g., Kirby 1999; Smith et al. 2003; Gong et al. 2005b), leaving out the morphological and grammatical features of human language. In a CAS such as human language, any factor may potentially influence any component of the system. However, it is not possible for researchers to work on all these factors simultaneously, nor is it possible to resolve all of them once and for all. Therefore, specification is necessary. But too much specification may render the corresponding conclusions insignificant, since ideal situations in which one linguistic aspect is hardly affected by the other aspects are rarely possible in reality.


In a sense, simplification and specification are inevitable. To study the target question relatively easily and directly, simplification is necessary; to concentrate on the major factors relevant to the target question, specification is necessary. But too much simplification or specification reduces a model’s contribution to understanding the target question. Therefore, depending on the level or resolution of understanding sought, we need to strike a balance between simplification and complication, and between specification and incorporation; it is usually difficult to achieve such a balance when designing, evaluating, and modifying computational models.

Besides these limitations, several other difficulties challenge the scholars who adopt computational simulation in their research.

First, implementing a model requires many arbitrary choices in defining its details, some of which are difficult to abstract from reality. Any small negligence in parameter setting, coding, or the display of results may introduce a built-in bias toward certain results, or cause misinterpretations of the results, which would seriously affect the validity and generality of the model. To overcome this difficulty, it is necessary to make explicit, justified, and plausible choices beforehand, and to adopt systematic analysis afterwards.

Second, due to the lack of direct evidence and quantitative evaluation mechanisms, a direct comparison between simulation results and empirical data is often impossible (Cangelosi and Parisi 2002), especially for simulations of language origin, since there are few data directly pointing to the different stages in the emergence of human language from its prelinguistic versions (Niyogi 2006). In addition, some results, and the correlated situations in which they are obtained, are not yet observable in reality; for these results, a comparison can only be made in principle. One way to partially overcome this difficulty is to adopt a comparative approach (Hauser 1996; Oller and Griebel 2004) by studying other communication systems found in the animal world. Case studies of the origin of these systems can provide fruitful empirical data to sharpen the questions and models on language origin; however, as argued in Section 3.2.1, this approach has its limitations. Another way to overcome this difficulty is to rely on the development of the various techniques incorporated into language evolution models. For example, with the aid of brain imaging technologies such as fMRI or EEG, the results of neural network simulations of language processing can be compared with experiments on human brains.


Having discussed the advantages and limitations of computational simulation, in the next sections I will summarize four criteria for classifying contemporary computational models, which can help researchers to analyze these models and subsequently design their own. The first criterion deals with the purpose of models, the second and third criteria deal with the content of models, and the last one deals with the methods adopted in models.

3.2.3. Classification based on the purpose

Different models may serve different purposes, though these purposes are often confused or misunderstood in practice. Holland (2005) divided current computational models into the following three types based on this criterion.

1) Data-driven models. These models are meant to generate outputs that mimic and predict the data collected through experiments and observations. An example of data-driven models is the weather prediction model, the sole aim of which is to predict the weather based on the available data using whatever tools are available; the adopted analyzing and predicting mechanisms may have nothing to do with the phenomenon itself.

2) Existence-proof models. These models are meant to show that something is or is not possible in principle. Many computational models of language evolution are existence-proof models, which can demonstrate that some abilities or mechanisms are crucial or insignificant for language evolution, and that some linguistic or nonlinguistic factors do or do not, in principle, play a role in shaping linguistic structures. An example of existence-proof models is the SRN model proposed by Wong et al. (2006), which demonstrated that the general structure of an SRN can develop the competence to handle combinatorial productivity, given particularly designed training data and sufficient training time. An opposite conclusion was drawn by van der Velde (2005), who studied the same problem in his own existence-proof model. Whether an existence-proof model really demonstrates what it is meant to prove relies greatly upon its assumptions, its adopted mechanisms, and the rationality of its analysis. As pointed out by Wong et al. (2005), there were structural differences between their SRN models and those of van der Velde, and the analysis of the latter was based on incomplete data.


Another existence-proof model is the SRN model proposed by Lupyan and Christiansen (2002). Based on the results of training on an artificial language with different word orders, their connectionist model demonstrated that flexible word orders were no harder to learn than the common SVO and SOV orders. This result contradicts the subset principle (Pinker 1995), which states that fixed word order is the default, and that children are conservative, only switching on “free word order” when positive evidence is obtained. However, this model itself contained some questionable aspects: e.g., the training language made no distinction between case marking and word order, so the roles of these linguistic features were ambiguous and indistinguishable.

3) Exploratory models. These models put together a set of mechanisms and explore the possibilities inherent in them. They can demonstrate, in principle, the effects of some previously neglected factors, which may inspire further studies or new theories. A large proportion of models fall into this type. For example, Gong et al. (2005a, 2005b) proposed an existence-proof model demonstrating that lexical items and syntax could coevolve during language emergence. An extension of this model (Gong et al. 2004) further explored the relationship between language use and social structure: the original mechanisms concerning the acquisition of linguistic information were put together with a set of mechanisms for the formation of social connections, and the simulation results showed the emergence of a small-world network along with the emergence of a compositional language, which, compared with empirical data, could be possible in principle.

To a certain degree, many existence-proof models are also exploratory. First, in order to demonstrate that some results are possible in principle, a computational model may be designed to incorporate some factors directly relevant to these results, based on empirical studies or theoretical analyses. In other situations, however, the model may incorporate some indirect factors, which may also lead to the expected results in principle but whose effects are not yet clear; in these situations, one purpose of the model is to explore the effects of these factors. Second, before the simulation results are available, all these models simply explore what would happen. Models incorporating direct factors serve the role of existence proof, whether or not the simulation results match the expected ones. Models incorporating indirect factors may reveal the mutual influence (complementary or contradictory) between the direct and indirect factors,
and generate some results caused by this mutual influence, which will trigger reconsideration of the original claims and further study of the incorporated factors.

3.2.4. Classification based on the resolution of language representation

Ke (2004) classified current models into four levels based on the resolution with which language is represented in them.

1) Language as a synthetic whole. At this level, language evolution is studied from a macro perspective. Language is simulated without considering its internal structure, and sometimes an overall fitness is defined to determine language evolution. An example of models at this level is the language competition model proposed by Abrams and Strogatz (2003). In their model, two languages, X and Y, were each defined as a synthetic whole, and the fitness (attractiveness) of a language increased with its number of speakers and its perceived status (a parameter that reflects the social or economic opportunities afforded to its speakers). Without considering the internal structure of the two languages, the model focused on the conversion of speakers from one language to the other. Equation 3.1 shows the transition probabilities for converting from Y to X and vice versa, where x and y are the fractions of the population speaking X and Y, x + y = 1, s (0 ≤ s ≤ 1) measures the relative status of X, c is a constant, and the exponent a is greater than 1.0. Equation 3.2 shows the differential equations of the change between these two languages.

P_{YX}(x, s) = c x^a s   and   P_{XY}(y, 1-s) = c y^a (1-s)   (3.1)

dx/dt = y P_{YX}(x, s) - x P_{XY}(y, 1-s)   and   dy/dt = x P_{XY}(y, 1-s) - y P_{YX}(x, s)   (3.2)
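To make these dynamics concrete, the following minimal Euler-integration sketch simulates Equations 3.1 and 3.2; the exponent a = 1.31 is the value reported by Abrams and Strogatz, and the other parameter values are illustrative assumptions.

```python
def simulate_competition(x0=0.2, s=0.4, c=1.0, a=1.31, dt=0.01, steps=50000):
    """Euler integration of Equations 3.1-3.2: x is the fraction of the
    population speaking X, y = 1 - x, and s is the relative status of X."""
    x = x0
    for _ in range(steps):
        y = 1.0 - x
        p_yx = c * x ** a * s          # rate at which Y speakers adopt X
        p_xy = c * y ** a * (1.0 - s)  # rate at which X speakers adopt Y
        x += dt * (y * p_yx - x * p_xy)
    return x

# With s < 0.5, the lower-status minority language X is driven to extinction.
print(round(simulate_competition(), 4))
```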

Based on these equations and related analysis, the dynamics of the competition between two languages was studied and some key factors affecting the competition were investigated. Although this model matches some empirical data well, it has a linguistic problem: in real language competition, speakers cannot directly change from using X to using Y or vice versa; there must be some intermediate bilingual state. Minett and Wang (2008) extended this model by introducing a bilingual state Z. In their model, the differential equations were updated as shown in Equation 3.3, where x + y + z = 1, c' is a constant, and, for instance, Q_{ZY}(1-x, 1-s) = c' (1-x)^a (1-s) is the probability for a bilingual speaker to give up language X and become a monolingual speaker of Y:
dx/dt = z Q_{ZX}(1-y, s) - x P_{XZ}(y, 1-s)   and   dy/dt = z Q_{ZY}(1-x, 1-s) - y P_{YZ}(x, s)   (3.3)
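Extending the sketch above to Equation 3.3 with the bilingual fraction z gives the following minimal integration; it assumes, purely for illustration, that c' = c and that the same exponent a governs all transition probabilities.

```python
def simulate_bilingual(x0=0.2, y0=0.7, s=0.4, c=1.0, a=1.31, dt=0.01, steps=50000):
    """Euler integration of Equation 3.3; z = 1 - x - y is the bilingual fraction."""
    x, y = x0, y0
    for _ in range(steps):
        z = 1.0 - x - y
        dx = z * c * (1 - y) ** a * s - x * c * y ** a * (1 - s)    # Z->X gain, X->Z loss
        dy = z * c * (1 - x) ** a * (1 - s) - y * c * x ** a * s    # Z->Y gain, Y->Z loss
        x, y = x + dt * dx, y + dt * dy
    return x, y, 1.0 - x - y

print([round(v, 4) for v in simulate_bilingual()])  # final fractions of X, Y, Z
```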

In addition to the monolingual states of X and Y, this extended model introduced another stable state in which X and Y coexist. This is a more realistic representation of the ongoing situations of language competition, and it suggests some mechanisms for preserving endangered languages from a bilingual perspective.

Computational models at this level can investigate the mechanisms and dynamics of language evolution from a macro perspective. However, language as a CAS contains many components and complex structures, each of which is subject to different constraints and may have different evolutionary dynamics. It is therefore necessary to develop specific models to study the subsystems of language from a micro perspective.

2) Subsystem as an independent whole. Models at this level focus on specific subsystems of language, such as phonology, semantics, syntax, and morphology. Some models concern the performance of an individual in learning tasks in a specific linguistic domain on the microhistory timescale, such as the segmentation of continuous speech (Christiansen et al. 1998), bilingual acquisition (Sternberg and Christiansen 2006), and learning the distinctions among grammatical categories (Elman 1990). Other models focus on the evolutionary dynamics of cultural transmission at the community level; they usually define some learning frameworks or learning strategies, and apply population dynamics to examine the evolution of specific linguistic subsystems. The learning frameworks or strategies adopted in these models include:

a) Strategies adopted in the sociobiological explanation, such as the assumption that parents whose idiolects have a better communicative fitness produce more offspring, and that these offspring learn the idiolects of their parents. Some models (e.g., Komarova and Nowak 2001) adopted these strategies; in them, the idiolect of an individual was treated as a lexical matrix, and the matrices of the language users with higher communicative fitness would spread through the community after several generations.


b) Strategies adopted in the sociocultural explanation, such as the assumption that systems with a better fitness have a higher chance of being learned and maintained in the next generation. Some models (e.g., Kirby 1999) adopted such strategies; in them, some competing variants initially coexisted in the population, and their fitness determined their degrees of preference and their proportions in the population.

c) Optimization strategies. These strategies postulate that some optimization processes are ongoing in individual language users. For example, Redford et al. (2001) introduced explicit, centralized optimization mechanisms in their model to study the evolution of phonological structure in a community; these mechanisms were the articulatory cost and its role in shaping syllable repertoires. However, the model did not study how such mechanisms could be realized in a community of decentralized, autonomous language users. Some later models proposed self-organizing mechanisms by which phonological structures are culturally selected (e.g., de Boer 2001; Oudeyer 2005, 2006), but these models belong to the next level of resolution, since they consider language use.

Most models at this level either focus on an individual language user or do not consider actual language use. Language use is an important medium through which language changes, and it should not be omitted from the evolutionary framework of language.

3) Language embodied in use. Models at this level usually simulate a situation where language is used in a population. Language use can be simulated as pair-wise interactions among individuals: each interaction usually requires two individuals, a speaker and a listener, or a teacher and a learner, chosen from the population according to different constraints, and every individual uses his/her own linguistic knowledge (in the form of a look-up table of meaning-utterance mappings, Steels 2002; a neural network, Munroe and Cangelosi 2002; or a list of linguistic rules, Kirby 2001) to produce meanings or comprehend utterances. Through iterated interactions, idiolects are formed and updated based on some self-organizing mechanisms, and a communal language may emerge.
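As a concrete illustration of such pair-wise self-organization, here is a drastically simplified, naming-game-style sketch in the spirit of Steels’s language games; it is not the model developed in this thesis, and the population size, meanings, syllable inventory, and adoption rule are all arbitrary assumptions.

```python
import random

random.seed(1)
MEANINGS = ["wolf", "bear"]
POPULATION = [dict() for _ in range(10)]  # agent: meaning -> preferred form

def new_form():
    return "".join(random.choice("ptkmnaiu") for _ in range(4))

def interact(speaker, listener):
    meaning = random.choice(MEANINGS)
    if meaning not in speaker:
        speaker[meaning] = new_form()  # invent a form if none is known yet
    form = speaker[meaning]
    if listener.get(meaning) == form:
        return True                    # success: conventions already aligned
    listener[meaning] = form           # failure: listener adopts speaker's form
    return False

for _ in range(2000):
    speaker, listener = random.sample(POPULATION, 2)
    interact(speaker, listener)

# With enough interactions, one form per meaning typically survives.
print({m: {agent[m] for agent in POPULATION if m in agent} for m in MEANINGS})
```

Despite the purely local updates, the population usually converges on a shared form for each meaning, which is the self-organizing effect just mentioned.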


This assumption was criticized by some researchers (e.g., Hutchins and Hazlehurst 2002; Vogt 2007), but others (e.g., Schoenemann 1999, 2005) argued that a relatively complex meaning system must have pre-existed before a complex syntax emerged. In addition, if a model focuses on how lexical items are conventionalized through linguistic communications, or how syntactic structures gradually emerge in human language, the postulation of the pre-existing semantic space is acceptable. Otherwise, further simulations are required to study the formation of semantic concepts, which moves such models to the next level. The second controversial assumption is the "explicit meaning transference" assumption illustrated in Figure 3.1 (A. Smith 2001, 2003a, 2003c, 2005; Gong and Wang 2005; Gong et al. 2005b): during interactions among individuals, the intended meanings, encoded in linguistic utterances produced by speakers, are always accurately available to listeners. Some empirical findings have shown that children (Tomasello 1999) and other animals such as chimpanzees (Tomasello 2000; de Waal 2005) can have a high level of cognitive abilities in terms of social intelligence. Based on these abilities, a certain degree of mind-reading or theory of mind (the ability for an individual to place himself/herself in the mind of others, to ascribe mental states to them, and to act on that insight, Byrne 1995; Stanford 2006) is possible. However, if this assumption were true, language as a communication medium could have been unnecessary, since intended meanings would always be available to listeners without linguistic communications. In addition, it is obvious that there is no direct connection between speakers' production and listeners' comprehension; speakers always use the utterances that they believe to represent the intended meanings, and listeners always interpret the utterances into the meanings that they believe these utterances express (Kirby 2002b). Other channels, such as pointing while talking or feedback by gestural or facial expressions, can only provide a certain degree of confirmation. Regarding pointing while talking, Quine (1960) provided a good counterexample: if a child heard his/her mother use a word like /gavagai/ while pointing to a rabbit, what meaning should the child assign to the utterance: the rabbit, some part of it, or any of a host of properties or details of that situation? This problem is referred to as referential indeterminacy (Quine 1960): in a given context, a single word could have many possible referents. Furthermore, feedback through facial expressions or gestures (e.g., nodding) may not allow speakers to confidently know whether listeners correctly infer their intended meanings.


Figure 3.1. The explicit meaning transference. A communicative episode which consists of the explicit transfer of both the utterance /zknvrt/ and the meaning “three apples” from the speaker (left) to the listener (right) (adapted from A. Smith 2003c).

All these suggest that there is no telepathic access to other individuals' minds (Bickerton 2007). Meanings contained in the speaker's utterances cannot be directly transferred, but must instead be inferred by the listener from the signals and the context in which these utterances are heard (A. Smith 2003a). This inference process may require linguistic knowledge, nonlinguistic information, or other constraints. For example, visual information can be a nonlinguistic cue to assist comprehension. A. Smith (2003b) has listed some constraints possibly adopted by children during language acquisition: a) Whole-Object Bias (Macnamara 1972): children naturally represent their environment in terms of the objects within it; b) Shape Bias (Landau et al. 1988): children are more likely to categorize new objects in terms of their shape, rather than other perceptual features; c) Mutual Exclusivity (Markman 1989): children assume that a newly-encountered word does not refer to the same thing as a word which already exists in their lexicon; and d) Principle of Contrast (Clark 1987): any difference in form marks a difference in meaning. Some of these cues or constraints were simulated in computational models to avoid the "explicit meaning transference" assumption (e.g., Gong et al. 2005b; A. Smith 2005; K. Smith et al. 2006). 4) Meaning and/or form embodiment. Models at this level usually adopt an elaborated representation of language, and they can study the evolution of complex semantic and/or syntactic structures. Models at this level usually assume that language is grounded in the agents' own cognitive representations (internal symbol grounding) and in the social or physical environment in which they develop their linguistic capabilities (external symbol grounding) (Cangelosi 2006; Vogt 2007; Vogt and Divina 2007). They usually deal with the symbol grounding problem (Harnad 1990), which concerns the construction of a semiotic relation between a referent (something concrete or abstract),
a meaning (a representation inside an agent's brain that has some function for the agent), and a form (the conveyed signal), as illustrated by the semiotic triangle (Ogden and Richards 1923) in Figure 3.2. An important feature of the models studying the symbol grounding problem is that both the properties of the agents' own bodies and their physical environment can affect and contribute to the acquisition of a lexicon directly grounded in the world they live in (Steels 2003a; Cangelosi 2005; Loula et al. 2007). There are two types of symbol grounding problem. One is the physical symbol grounding problem (Vogt 2002), whose difficult part is to construct a relation between a referent and a meaning. This problem usually concerns idiolects, but a communal language consists of the forms conventionalized through interactions among agents and their environments. Therefore, the community has to deal with another symbol grounding problem, the social symbol grounding problem (Vogt and Divina 2007), whose difficult part is to construct a relation between a referent and a form. Some external factors, such as cultural transmissions and social structures, will influence this process. To implement symbol grounding, some mechanisms are usually required (Cangelosi 2007), such as the direct grounding of the agent's basic lexicon by linking perceptual representations with symbols through supervised feedback, and mechanisms to transfer the grounding from basic symbols to new ones obtained by logical combinations of the elementary lexicon. The models dealing with the physical symbol grounding problem usually study the emergence of simple semantic concepts based on features received from sensorimotor channels. For example, in the Talking Heads experiment (e.g., Steels 1995, 1999; Steels and Kaplan 2002), the information about the shape, color, and size of different objects could be acquired by individuals. During a communication, this information could help the listener decide which object was described by the speaker, and then the listener could gradually build up his/her own semantic concepts of various objects.

Figure 3.2. The semiotic triangle. The arbitrary or conventionalized relation is indicated by a solid line, and the indirect relation by a dotted line. (adapted from Ogden and Richards 1923)


The models dealing with the social symbol grounding problem usually study the emergence of a shared lexicon through biological and/or cultural mechanisms in a population of agents (Cangelosi and Parisi 2002). For example, Vogt (2002, 2007) developed some robotic experiments, in which mobile robots developed a symbolic structure from scratch by engaging in a series of language games with other robots, and used this structure to efficiently communicate the names of a few objects. In addition to the symbol grounding problem, there are other models exploring the emergence of forms with morphosyntactic structures. For instance, in FCG (Steels 2004), robots were required to perceive a series of motions. In order to communicate these scenes, they had to build up forms with morphosyntactic structures through pair-wise language games. Then, along with the emergence of a symbolic system for simple concepts, some syntactic structures to regulate these symbols also emerged.

3.2.5. Classification based on the situatedness of individuals and the structure of language

K. Wagner et al. (2003) proposed a criterion based on the content of the model from two aspects: whether language users are situated in an environment, and whether the communication activity uses unstructured tokens or structured utterances composed of multiple tokens. Situated simulations place individual language users in an environment with physical objects or an artificial world with virtual materials. In addition to communications among individuals, interactions between individuals and their environmental entities, such as gathering food or avoiding predators, can also affect the environment and/or modify their internal states. Nonsituated simulations, in contrast, focus more on the dynamics of the communication system and language. In these simulations, individuals only send and receive signals among one another during communications. Some models (e.g., the Talking Heads experiment and FCG) also put individuals in an environment and study the symbol grounding problem, but these models are nonsituated, since they do not simulate any nonlinguistic interactions between the environment and individuals. Utterances in structured communications consist of small units such as words or phrases.
These simulations can clearly study how such utterances are formed or regulated, and how individuals produce and comprehend them. All these aspects deal with the formation and application of linguistic knowledge. Therefore, these simulations can explore the evolution of linguistic universals (e.g., compositionality and regularity) and related mechanisms. However, utterances in unstructured simulations consist of single units on multiple channels, and the values of different channels can be mutually dependent or independent. These simulations concentrate on how other nonlinguistic channels help to comprehend linguistic information, and how nonlinguistic factors affect language evolution. Based on the situatedness of individuals and the structure of language, many simulations can be divided into four subgroups: 1) Nonsituated and unstructured models. These models contain nonsituated individuals and unstructured signals. The tasks of individuals include encoding and decoding arbitrary meanings in communications. The linguistic materials are represented and processed by look-up tables, associative memories, neural networks, and so on. The lexical conventionalization model (Wang and Ke 2001; Ke et al. 2002) is an example of this type. It demonstrated the emergence of a coherent vocabulary through self-organization during dyadic communications in a population. In this model, language was simulated as a set of M-U mappings between inseparable meanings and utterances. Each individual had his/her own speaking (pij) and listening (qij) matrices for storing M-U mappings (see Figure 3.3 (a)), whose entries indicated the probability with which a meaning was correlated with an utterance. In a speaking matrix, all probabilities in a row summed up to 1.0; in a listening matrix, all probabilities in a column summed up to 1.0. In production, based on his/her speaking matrix, the speaker encoded the meaning into an utterance according to the probabilities of different mappings of that meaning. In comprehension, based on his/her listening matrix, the listener decoded the utterance into a meaning according to the probabilities of different mappings of that utterance. After that, through the "explicit meaning transference", the speaker's intended meaning was compared with the listener's comprehended one. If they matched, the probability of the chosen mapping in the speaker's speaking matrix was increased by an amount δ, and other probabilities in the same row were decreased accordingly; meanwhile, the probability of the chosen mapping in the listener's listening matrix was increased, and other probabilities in the same column were decreased. Otherwise, an inverse adjustment was executed. All probabilities in
these matrices were randomly initialized. After a number of communications, a coherent vocabulary emerged: all individuals shared similar speaking and listening matrices (see Figure 3.3 (b) and (c)).
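To make this update cycle concrete, the following Python sketch implements one dyadic communication in the spirit of this model; for brevity, a single speaker-listener pair is simulated rather than a population, the renormalization step is just one simple way to realize the "decreased accordingly" adjustment, and the function name, the matrix initialization, and the choice δ = 0.2 are illustrative assumptions rather than the exact procedure of Ke et al. (2002):

import numpy as np

def communicate(p_spk, q_lst, meaning, delta=0.2, rng=np.random):
    """One dyadic communication: production, comprehension, and the
    reinforcement (or inverse adjustment) of the chosen M-U mappings."""
    n_m, n_u = p_spk.shape
    utt = rng.choice(n_u, p=p_spk[meaning])        # speaker encodes the meaning
    heard = rng.choice(n_m, p=q_lst[:, utt])       # listener decodes the utterance
    sign = 1.0 if heard == meaning else -1.0       # "explicit meaning transference"
    p_spk[meaning, utt] = max(p_spk[meaning, utt] + sign * delta, 1e-6)
    p_spk[meaning] /= p_spk[meaning].sum()         # row still sums to 1.0
    q_lst[heard, utt] = max(q_lst[heard, utt] + sign * delta, 1e-6)
    q_lst[:, utt] /= q_lst[:, utt].sum()           # column still sums to 1.0
    return heard == meaning

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(3), size=3)              # random speaking matrix
q = rng.dirichlet(np.ones(3), size=3).T            # random listening matrix
for _ in range(3000):                              # iterated communications
    communicate(p, q, int(rng.integers(3)), rng=rng)

After enough communications, the rows of p and the columns of q concentrate on one mapping per meaning, the analogue of the emergent coherent vocabulary.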

Figure 3.3. The speaking (pij) and listening (qij) matrices. (a) An example of the speaking and listening matrices; (b) an example of inconsistent coherent speaking and listening matrices; (c) an example of consistent coherent speaking and listening matrices. (adapted from Ke et al. 2002)

Using matrices to represent M-U mappings was first introduced by Hurford (1989) (the transmission and reception matrices), and it has been widely adopted in many models (e.g., the active and passive matrices, Nowak et al. 1999; the encoding and decoding matrices, Komarova and Niyogi 2004; Niyogi 2006). An interesting result shown in some of these models is that, occasionally, the listening and speaking matrices are inconsistent (see Figure 3.3 (b)): a speaker cannot understand the utterance he/she produces for a certain meaning, or a hearer understands a certain meaning from a certain utterance but is unable to produce that utterance for that meaning. Some additional mechanisms to couple the speaking and listening matrices of an individual (e.g., the self-talk, Ke et al. 2002) can prevent such inconsistency, which is unrealistic in human languages.
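Such consistency can be quantified directly from an individual's two matrices. The following Python sketch computes the probability that an agent decodes his/her own productions back to the intended meanings, one plausible reading of the individual convergence rate (IC) index mentioned below; the exact formula in Ke et al. (2002) may differ:

import numpy as np

def individual_consistency(p_spk, q_lst):
    """Average over meanings of P(decoded meaning == intended meaning)
    when the agent listens to his/her own productions (self-talk)."""
    n_m = p_spk.shape[0]
    return sum(float(np.dot(p_spk[m], q_lst[m])) for m in range(n_m)) / n_m

print(individual_consistency(np.eye(3), np.eye(3)))             # 1.0: consistent
print(individual_consistency(np.eye(3), np.full((3, 3), 1/3)))  # 1/3: inconsistent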


Figure 3.4. The lexical convergence. The X-axis shows the number of communications, and the Y-axis the similarity or the convergence rate. The thick line traces SI, the ticked line traces PC, and the dashed line traces IC. Phase transitions of IC and PC take place around 3,000 communications. Simulation conditions: population size = 10, number of meanings = number of utterances = 3, δ = 0.2. (adapted from Ke et al. 2002)

Some indices were defined, such as the similarity of the mapping matrices (SI), measuring the similarity between the matrices of two agents; the individual convergence rate (IC), measuring the degree of consistency of an individual's speaking and listening matrices; and the population convergence rate (PC). A process of conventionalization with some phase transitions (a sudden increase in IC and PC values) can be traced using these indices (see Figure 3.4). 2) Nonsituated and structured models. Language users in these models are equipped with capacities for participating in communications where structured utterances are produced and comprehended. These models consider language in a more complex form, with certain structures instead of simple mappings between meanings and utterances. An example of these models is the Iterated Learning Model (ILM) proposed by Kirby (1999). It introduced an iterated learning framework (see Figure 3.5) on the glossogenetic timescale to simulate communications between individuals across generations, and demonstrated a transition from a holistic signaling system to a compositional language.


Figure 3.5. The iterated learning framework. Linguistic competence (Hi) regulates an individual (Ai) of generation i to express certain meanings (Mi) with certain utterances (Ui). These utterances are also the primary linguistic data exposed to individuals in generation i+1. In this figure, each generation has one individual, and learning happens only between individuals in successive generations. (adapted from K. Smith et al. 2003)

K. Smith et al. (2003) extended ILM by introducing a structured representation of semantics and utterances. In their model, as shown in Equation 3.4, meanings were represented as points in an F-dimensional space, and each dimension had V discrete values. Utterances were represented as strings of characters, the length of which was from 1 to lmax, and each character wi was drawn from an alphabet set Σ:

M = {(f1 f2 … fF) : 1 ≤ fi ≤ V, 1 ≤ i ≤ F} and U = {w1 w2 … wl : wi ∈ Σ, 1 ≤ l ≤ lmax}    (3.4)

For a specific meaning or utterance (say, "(12)" or /abc/), there were many holistic or compositional components. A component was a vector, each feature of which had either the same value as in the meaning or utterance, or a wildcard (*). Holistic components had the same features as the specific meaning or utterance, and compositional components had wildcard(s) in certain dimensions of the meaning space or locations of the utterance (e.g., the compositional meaning components "(1*)" or "(*2)", or the compositional utterance components /ab*/ or /*bc/). An associative network was adopted to store M-U mappings (see Figure 3.6). Every crossing point in this network represented a lexical mapping between a meaning component and an utterance component, and an associative weight stored at this point indicated the probability of this mapping. This associative network covered all mappings between meaning and utterance components, and all associative weights were initialized to zero. The evolution of language was indicated by the adjustment and competition of these associative weights. The adjustment of associative weights happened during learning (comprehension), based on the acquired M-U mappings produced by the individual in the previous generation.
For example, when an individual heard an M-U mapping "(21)" → /ab/, he/she would either increase (+) or decrease (−) the associative weights of the related mappings in his/her associative network, and leave the unrelated ones unchanged, as shown in Figure 3.6 (a).

Figure 3.6. Learning and production. (a) Learning: the large filled circles represent activated, related nodes (labelled with the component they represent) and small filled circles represent associative weights; (b) Production: the relevant connection weights are highlighted in gray. (adapted from K. Smith et al. 2003)

The competition of associative weights happened in production. By comparing the average weights of all applicable holistic or combinable compositional mappings, the speaker decided how to encode a meaning into an utterance. For example, suppose the speaker wanted to express the meaning "(21)" (as shown in Figure 3.6 (b)). There were three ways for him/her to do so: a) using a holistic utterance (gray circle indicated by i); b) using some compositional utterances (gray circles indicated by ii) with one order; and c) using some compositional utterances (gray circles indicated by iii) with another order. The average associative weights in these conditions were calculated, and the winning way was chosen accordingly. By comparing the results in different conditions, the model showed that a compositional language emerged when there was a bottleneck in the transmission between individuals of different generations. It also discussed the relationships among meanings in the environment, and discovered that the maximum compositionality occurred when a language learner perceived his/her world as structured, i.e., when the objects in the environment related to one another in a structured way.
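The following Python sketch illustrates this production competition; the weight table, its keys, and the two-feature meanings are simplified stand-ins for the associative network, not the exact data structures of K. Smith et al. (2003):

from itertools import product

def produce(meaning, weights):
    """Encode a two-feature meaning by letting a holistic mapping compete
    with compositional mappings in both component orders; the option with
    the highest (average) associative weight wins. Keys in `weights` are
    (meaning_component, utterance_substring, slot): slot 0/1 gives the
    substring's position, None marks a holistic mapping, '*' a wildcard."""
    f1, f2 = meaning
    best = (float('-inf'), None)
    for (m, u, slot), w in weights.items():        # a) holistic candidates
        if m == (f1, f2) and slot is None and w > best[0]:
            best = (w, u)
    part1 = [(w, u, s) for (m, u, s), w in weights.items() if m == (f1, '*')]
    part2 = [(w, u, s) for (m, u, s), w in weights.items() if m == ('*', f2)]
    for (w1, u1, s1), (w2, u2, s2) in product(part1, part2):
        if {s1, s2} == {0, 1}:                     # b)/c) the two component orders
            score = 0.5 * (w1 + w2)                # average associative weight
            utt = u1 + u2 if s1 == 0 else u2 + u1
            if score > best[0]:
                best = (score, utt)
    return best[1]

weights = {((2, 1), 'ab', None): 0.4,
           ((2, '*'), 'a', 0): 0.7, (('*', 1), 'b', 1): 0.9,
           ((2, '*'), 'a', 1): 0.1, (('*', 1), 'b', 0): 0.2}
print(produce((2, 1), weights))   # 'ab': the compositional route (avg 0.8) beats 0.4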


3) Situated and unstructured models. Many models of this type were originally inspired, directly or indirectly, by observations of animal communication systems, not human language. Animals can communicate simple meanings, such as an alarm call, a food call, and a predator call (e.g., Cheney and Seyfarth 1990; Caine et al. 1995). These calls play an important role in their survival, such as finding food and avoiding predators. Simulating these behaviors and comparing animal communication systems with the human communication system can provide insights into the prehistoric state of human language. Some models of this type focus on the formation and application of animal calls. For example, the neural network model of Cangelosi and Parisi (1998) concentrated on the formation of utterances for describing different objects, and the cellular-automata adaptation model designed by Grim et al. (2000) studied food calls. Other models of this type focus on animal behaviors and strategies, some of which, such as altruism and cooperation, are also important in the human communication system. For example, the neural network model of Ackley and Littman (1994) studied altruism; the signalling game of Noble (2000) focused on cooperation and competition; and the cellular automata model of Jaffe and Cipriani (2007) studied the dominance of cooperative behaviors over non-cooperative ones via biological or cultural transmissions. 4) Situated and structured models. These models study both linguistic communications and nonlinguistic interactions between individuals and their environment. Linguistic communications can assist individuals to participate in interactions with the environment, e.g., efficient communications may help individuals to undertake collective activities such as foraging. Meanwhile, the complexity required in interactions with the environment can be a driving force for individuals to adopt some linguistic behaviors to express complex concepts or processes, e.g., different items in the environment require different semantic concepts, and complex activities involving multiple objects and actions require mechanisms for regulating lexical items to encode these objects and actions in expressions. One example of these models is the mushroom foraging model designed by Munroe and Cangelosi (2002). In this model, generations of individuals were foraging in an artificial world containing poisonous and edible mushrooms represented by different visual information, and different actions were required for properly eating different types of edible mushrooms. The fitness of an individual increased when he/she properly ate an edible mushroom.
Each individual used a neural network to handle linguistic and visual information (see Figure 3.8). An individual's linguistic competence was stored as connection weights in the cross-layer connections of his/her neural network. The language was structured: two clusters of nodes in the linguistic input and output were set up beforehand, and a winner-takes-all mechanism was executed inside each cluster, i.e., the node with the highest value in a cluster was set to 1, and all the others were set to 0. After a generation, some expert foragers were chosen as parents based on their fitness values. Then, through asexual reproduction, each of them produced some offspring who copied the initial connection weights of their parent's neural network, with some errors controlled by a mutation rate as in the Genetic Algorithm (GA, Holland 1995). The simulation included two stages of foraging, each of which covered several generations. In Stage 1, individuals judged the edibility of encountered mushrooms only by their visual information. After several generations, Stage 2 began, in which linguistic communications were incorporated. In this stage, the expert foragers were carried over into the next generation as "teachers". Each new agent continued to forage as before, but most of the time, the visual information of encountered mushrooms was blocked, while a linguistic input from his/her parent was always given. The update of linguistic competence (connection weights) took place during a lifetime learning that included several training tasks based on a supervised learning algorithm (a machine learning technique for creating a function from training data by mapping inputs to desired outputs, Micheli-Tzanakou 2000): in Task 1, the child decided his/her corresponding action based on his/her parent's linguistic description of the mushroom that the child encountered; in Task 2, the child was required to produce a linguistic output to describe the encountered mushroom based on its visual information, and then a learning algorithm (e.g., the back-propagation mechanism, a supervised learning algorithm to train an artificial neural network, which calculates the gradient of the network's error with respect to its modifiable connection weights, and uses this gradient in a stochastic gradient descent procedure to find weights that minimize the error, Rumelhart et al. 1986; Werbos 1994) was applied to adjust the connection weights of the child's neural network based on the difference between his/her linguistic output and his/her parent's linguistic output describing the same mushroom;
in Task 3, the child tried to match his/her own linguistic output to his/her parent's, which was input to the child, and the back-propagation learning mechanism was also applied.
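The winner-takes-all step can be stated compactly in code. The following Python sketch post-processes a linguistic output vector under an assumed cluster layout of three nodes per cluster; the layout and the values are illustrative, not those of the original network:

import numpy as np

def winner_takes_all(output, clusters=((0, 1, 2), (3, 4, 5))):
    """Within each cluster, set the most active node to 1 and all the
    other nodes to 0, yielding one discrete 'word' per cluster."""
    out = np.zeros_like(output)
    for cluster in clusters:
        winner = cluster[int(np.argmax(output[list(cluster)]))]
        out[winner] = 1.0
    return out

print(winner_takes_all(np.array([0.2, 0.9, 0.1, 0.4, 0.3, 0.8])))
# [0. 1. 0. 0. 0. 1.]: one noun-like and one verb-like node remain active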

Figure 3.8. The structure of the neural network in an individual. The input layer includes visual information of a mushroom and a linguistic utterance to describe a mushroom. The output layer includes an action part to correlate language communications with effective actions, and a linguistic output for training other individuals. (adapted from Munroe and Cangelosi 2002)

After the foraging of Stage 1, individuals achieved the ability to distinguish different mushrooms based on visual information. When language communication was incorporated, this ability was achieved within fewer generations and with a higher level of accuracy. Meanwhile, the model traced the emergence of a simple syntactic structure in the linguistic input and output of the neural network, i.e., one cluster of the linguistic input and output was adopted for distinguishing the edibility of mushrooms (similar to a noun), and the other was adopted for distinguishing different actions towards edible mushrooms (similar to a verb) (see Figure 3.7).

Figure 3.7. The compositional structure (average of 100 agents in one generation, the cone height corresponds to the probability of using that node: the greatest height corresponds to 100% use of that node in the whole population). (adapted from Munroe and Cangelosi 2002)


This model also tested the Baldwin Effect (Baldwin 1896) from two aspects: a) learning cost, whether a fitness penalty was given for eating poisonous mushrooms; and b) cultural variation, whether a random change was added to the parents' idiolects when the parents trained their children. By testing the difference between the language of the children and that of the parents, and by analyzing the efficiency of language acquisition, this model showed that the learning cost could make individuals gradually assimilate into their genomes some explicit features (e.g., lexical properties) of the specific language exposed to them, and that under cultural variation, the Baldwinian processes could cause the assimilation of a predisposition to learn any language they were exposed to.

3.2.6. Classification based on the adopted research methods

Simulations can use different research methods to study various topics on language evolution. Based on the adopted methods, computational models can be divided into two subgroups: 1) Behaviorist models. These models simulate a set of actual behaviors of an individual or a community of individuals, and study the effects of these behaviors on language evolution. These behaviors include acquisition, production and comprehension of linguistic materials, communication with others, interaction with the environment, and birth or death. A widely adopted framework to implement behaviorist models is the multi-agent system (MAS). In computer science, a MAS is defined as a system composed of a number of agents, which are collectively capable of reaching some goals that are difficult to achieve for an individual agent (Ferber 1998; Alonso et al. 2003). Each agent is an independent, autonomous unit that possesses certain abilities (Steels 2003b). Agents can share similar (homogeneous) or different (heterogeneous) characteristics, and act according to simple rules or local information. Their abilities include obtaining knowledge, applying it in communications with other agents or interactions with the environment (if situated), and other activities such as moving, substituting old agents, or being substituted by new agents. In a MAS, the simple local interactions or communications among agents often lead to complex structures at the global level. This "bottom-up" self-organizing routine makes MAS widely used in the study of CAS. For example, a MAS can be used to simulate a neural system, in which neurons are agents, interactions among them are their abilities, and the function of the system is the goal collectively achieved by them. In a human society, people are agents, various social activities are their abilities, and some collective behaviors achieved by them are the goals of the MAS.
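A minimal MAS skeleton of this kind can be written in a few lines of Python; the class and function names are illustrative placeholders, and concrete models would fill in the agents' knowledge and the interaction routine:

import random

class Agent:
    """An autonomous unit with private knowledge and interaction abilities."""
    def __init__(self):
        self.knowledge = {}              # e.g., the agent's idiolect

    def interact(self, other):
        """Placeholder pair-wise interaction; real models implement
        production, comprehension, and knowledge updates here."""
        pass

def run(population_size=10, n_interactions=1000, seed=0):
    rng = random.Random(seed)
    agents = [Agent() for _ in range(population_size)]
    for _ in range(n_interactions):
        speaker, listener = rng.sample(agents, 2)   # random local pairing
        speaker.interact(listener)                  # simple local interaction
    return agents       # global structures (e.g., a communal language) emerge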


MAS has been adopted in many models studying language evolution, especially the evolution of a communal language (e.g., ILM; Ke et al. 2002; Munroe and Cangelosi 2002; Gong et al. 2005a, 2005b, 2006a, 2006b). In these models, artificial language users (agents) are equipped with computational mechanisms to develop their idiolects and shape the communal language. After iterated communications, a population of agents is able to develop an artificial language having certain linguistic features that resemble those in human language, such as compositionality (e.g., ILM), regularity (e.g., Gong et al. 2005b), and a vowel system (e.g., de Boer 2001). 2) Mathematical models. These models do not directly simulate the individual behaviors relevant to the development of the system. Instead, they abstract the general or average performance of these behaviors into some mathematical equations, and then, based on the analysis of these equations, the dynamics of the whole system can be revealed. Some mathematical models, such as the language competition models with or without the bilingual state (Abrams and Strogatz 2003; Minett and Wang 2008), were reviewed in Section 3.2.4. The mathematical equations in those models predicted the stable states during language competition and revealed some mechanisms for the system to approach and reside in these stable states. Mathematical models can also study the dynamics of other linguistic features or linguistic behaviors. For example, Wang et al. (2004) presented a mathematical model studying the lexical diffusion (Wang 1969) of one word in a population of agents. This model assumed that at some time instant t, the frequency of the unchanged form of a word, u(t), and the frequency of the changed form of that word, c(t), could be calculated from the frequencies at an earlier time instant, say, t − 1. It also postulated that the use of the changed form was propagated by contact between pairs of agents, one of whom used the changed form and the other the unchanged form. Based on these assumptions, the increase in the frequency of changed forms was proportional to the product of c(t)u(t) and the rate of affective contact (Shen 1997).


The frequencies of changed forms at time t + Δt can be written as in Equation 3.5, where α is the rate of affective contact and u(t) + c(t) = 1:

c(t + Δt) = c(t) + α c(t) u(t) Δt and u(t + Δt) = u(t) − α c(t) u(t) Δt    (3.5)

The differential equation for the lexical diffusion of one word and its solution are shown in Equation 3.6, where k is the initial frequency of the changed forms at t = 0:

dc/dt = α c(t) u(t), with solution c(t) = k exp(αt) / (1 + k [exp(αt) − 1])    (3.6)

This model can be extended to study the diffusion of multiple words, and the lexical diffusion process suggested by this model matched well the results of some empirical studies (e.g., Shen 1997). To date, most mathematical models focus only on the general dynamics of some simplified systems and leave out some complexities; it is difficult to establish a mathematical model to study the dynamics of a complex nonlinear system containing many factors as well as many interactions among these factors. In order to gain a comprehensive understanding of language evolution, which involves numerous elements such as individuals, their behaviors, and linguistic and nonlinguistic constraints, both behaviorist and mathematical models are necessary. In the above sections, four kinds of criteria to classify computational models have been summarized. These criteria are correlated, and the subgroups under different criteria can overlap. For example, models at the first level of resolution are usually mathematical models (e.g., Komarova and Nowak 2001; Minett and Wang 2008) that focus on the dynamics of the whole system, such as the equilibrium and transient states, and the average characteristics of individuals, whereas models at the remaining levels of resolution are usually behaviorist models (e.g., Kirby 1999; Gong et al. 2006a) that focus on the actual interactions of one or a group of agents. In addition, the subgroups under one criterion are not mutually exclusive; a computational model may fall into more than one subgroup following one criterion. For example, the language change model by Ke et al. (2004) treated language as a synthetic whole with some innovations, and studied the effects of interactions constrained by social structures on the diffusion of these innovations; therefore, this model falls into both the first and third levels of resolution. Meanwhile, the subgroups based on different criteria are not mutually exclusive;
a computational model can fall into different subgroups following different criteria. For example, the Talking Heads experiment (Steels 1997b, 1999; Steels and Kaplan 2002) is an exploratory model, a model at the fourth level of resolution, a nonsituated and structured model, and a behaviorist model. Third, the subgroups following one or different criteria can be mutually beneficial. For instance, the models at different levels of resolution can assist each other. Analysis in models at the first level of resolution focuses more on the general mechanisms of language evolution, which may influence the choice of principles in designing models at the remaining levels. Meanwhile, findings in models at the remaining levels of resolution provide evidence and arguments to support or oppose the theories investigated by models at the first level of resolution (Ke 2004). In addition, mathematical and behaviorist models can provide mutual benefits to each other. Mathematical models provide mathematical or theoretical support for behaviorist models that describe similar phenomena. For instance, some mathematical models (e.g., Baronchelli et al. 2005, 2006) analytically derived the dynamics of the emergence of communication conventions (e.g., common vocabularies) through iterated language games. This self-organizing process contains a period of preparation followed by a rather sharp (S-shape) transition towards a set of conventions. This S-shape phase transition has already been shown in many behaviorist models (e.g., Steels 1995; Shen 1997; Ke et al. 2002; Niyogi 2006), and it can be adopted to describe the changing processes in linguistic behaviors over successive generations (e.g., Weinreich et al. 1968; Bailey 1973). Meanwhile, behaviorist models specify some mechanisms that cause the whole system to follow the dynamical processes shown in mathematical models. For instance, Steels and Belpaeme (2005) listed three types of mechanisms from nativism (e.g., genetic transmission), empiricism (e.g., shared learning mechanisms), and culturalism (e.g., conventionalization through communications and language games), and traced the formation of common color categories in various computational models. Each of these mechanisms worked in different domains and adopted different ways to develop, spread, and merge color categories. For example, the nativism mechanism built color categories into genomes and relied upon genetic transmission to spread these genomes among agents, whereas the culturalism mechanism developed color categories through conventionalization and adopted cultural transmissions to spread these categories.
However, the simulation results showed that, given different mechanisms, the formation of common color categories all underwent an S-shape transition from no conventions to a set of color categories shared among agents.

3.2.7. General steps to build up a computational model on language evolution

The above sections discussed some classification criteria and briefly reviewed a number of computer models. Based on the study of these models, a general procedure to build up a computational model on language evolution can be summarized as follows: 1) Define an artificial language. Before the simulation, an artificial language should be defined based on the target question of the model. If the aim is to explore the evolution of linguistic universals, an artificial language can be defined such that some universals may gradually develop in it. For example, in SRN models (e.g., van der Velde et al. 2004; Wong et al. 2006) studying whether an SRN is capable of generalizing syntactic information from limited sentences and applying it in the prediction task on some novel sentences, the artificial language was defined as a set of sentences comprising nouns and verbs that follow a consistent word order. Other models studying the evolution of compositionality and regularity defined their artificial languages as M-U mappings. In such a language, compositionality was reflected in the segmentation of atomic meanings from integrated expressions (e.g., Vogt 2005a, 2005b) and the segmentation of syllables from utterances (e.g., ILM), and regularity was reflected in the manipulation of compositional materials (lexical items) in encoding and decoding utterances (e.g., ILM; Gong et al. 2005b). However, if the model focuses on language as a holistic entity without addressing its detailed components, the artificial language can simply be defined as a synthetic whole. For example, in the language competition models (Abrams and Strogatz 2003; Minett and Wang 2008), the considered factors, such as the fractions of speakers and the social status of these languages, did not directly deal with any linguistic features; therefore, the artificial languages in these models were greatly simplified, leaving out semantic and syntactic features. It is important to note that although these models usually predefine an artificial language so that the considered structures can emerge, this does not mean that the results are directly built in. First, the predefined language simply allows the possibility for certain structures to emerge. Some of these structures are expected, but others are not.
For instance, the lexical conventionalization model (Ke et al. 2002) defined lexical items as M-U mappings in speaking and listening matrices. The emergence of a consistent coherent vocabulary was expected (as shown in Figure 3.3), but the emergence of an inconsistent coherent vocabulary was not. Studying such unexpected results can prompt reconsideration of some assumptions, reveal some previously neglected effects, and guide modifications of the model towards more realistic results. Second, even if most results are expected beforehand, when they eventually emerge, their occurrence frequencies might still be unpredictable, since these frequencies can be determined by initial conditions or by factors working at lower levels. For example, in the computational model described in this thesis, although all six global orders for regulating three lexical items are expected beforehand, different global orders emerge in different simulations, and a statistical analysis shows that some global orders are more frequent than others. This bias is partially caused by the design of the semantic space (Minett et al. 2006). 2) Define linguistic knowledge and related mechanisms. After an artificial language is defined, it is necessary to define the linguistic knowledge and related mechanisms to acquire and apply this language in communications. The definition of linguistic knowledge and related mechanisms is also related to the target question of the model. For instance, in ILM (e.g., Kirby 1999), to simulate the evolution of compositionality, relevant linguistic knowledge on how to store compositional materials (e.g., compositional rules) and correlated mechanisms on how to acquire these compositional materials (e.g., the detection of recurrent patterns) were defined. In addition, the mechanisms to acquire linguistic knowledge can be either language-specific or not. By applying some domain-general abilities to linguistic activities, we can explore the evolutionary relations between language-specific competences and domain-general abilities. For example, the SRN used in connectionist models is a general learning framework, not specific to language, but some studies have shown that this learning framework can acquire simple linguistic knowledge (e.g., Elman 1990). 3) Define communication or training scenarios. After the linguistic knowledge and related mechanisms to handle the artificial language are defined, the activities for developing the knowledge, applying the mechanisms, and evolving the language need to be clarified.


In models adopting MAS, iterated communications are such activities. There are many types of iterated communications, such as dyadic communications or one-speaker-multiple-listeners communications, all of which are defined according to the following two criteria: a) The internal components. A communication covers several components, such as production, comprehension, and other necessary components like the information transmission (utterance or feedback) between the speaker and the listener, the memory system, and so on. Various models may define their communication scenarios covering all or some of these components. b) The external constraints on selecting agents to communicate. Agents can be randomly chosen to communicate with each other, or they can be chosen following some social or other arbitrary constraints. For example, in Vogt's model (2005a), the possibilities for an adult to be chosen as a speaker or a listener were arbitrarily predefined. In models adopting SRN to study individual language processing abilities, training scenarios are such activities. For example, in the SRN model of Wong et al. (2006), in order to examine the competence to handle combinatorial productivity, the training sentences were separated into four groups, each containing a limited set of nouns and verbs. After the network was trained with these groups of sentences, sentences comprising cross-group items were input to the network as testing sentences, to evaluate whether the network could generalize linguistic knowledge from limited training sentences and efficiently apply it in the prediction task on the novel testing sentences. 4) Define the indices to test the performance of the system. After the artificial language, the linguistic knowledge and related mechanisms to handle the language, and the playground for developing the language are defined, the computer program is ready to run. In order to systematically evaluate its performance and explicitly interpret its results, several indices are necessary. For example, in ILM, to evaluate compositionality, the expressivity of compositional rules was defined as how many integrated meanings could be expressed using compositional rules. A high value of this index indicated the emergence of a language consisting of compositional rules. In Vogt's models (2005a), the communicative accuracy was defined to indicate the percentage of the speaker's intended meanings that were accurately comprehended by the listener.
A high value of this index showed a high similarity of the idiolects among individuals. In some neural network models (e.g., van der Velde et al. 2004; Wong et al. 2006), the Grammatical Prediction Error (GPE, Christiansen and Chater 1999) was defined as in Equation 3.7. In this equation, the presumably correct and incorrect activations of each node in the output layer are predefined based on the linguistic knowledge contained in the training and testing sentences, and the ratio of the actual activations of these nodes in a specific prediction task is calculated to evaluate the performance of the network in this task. A low value of GPE generally corresponds to a good performance in the prediction task:

GPE = 1 − correct activation / (correct activation + incorrect activation)    (3.7)
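In code, Equation 3.7 reduces to a few lines. The Python sketch below assumes, for illustration, that the output layer is given as a dictionary from candidate next words to activation values, together with the set of grammatically licensed continuations:

def gpe(activations, licensed):
    """Grammatical Prediction Error: the share of output activation that
    falls on grammatically incorrect continuations (Equation 3.7)."""
    correct = sum(a for w, a in activations.items() if w in licensed)
    incorrect = sum(a for w, a in activations.items() if w not in licensed)
    return 1.0 - correct / (correct + incorrect)

acts = {'runs': 0.6, 'run': 0.1, 'dog': 0.3}
print(gpe(acts, licensed={'runs'}))   # 0.4: 40% of the activation is ungrammatical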

3.3. My Computational Framework on Language Emergence

In this thesis, I propose an existence-proof model to demonstrate whether compositionality and regularity at the syntactic level can coevolve during the transition from a holistic signaling system to a compositional language. This model is also exploratory: it incorporates the sequential learning ability to explore whether this domain-general ability can pave the way for the emergence of syntactic features in human language, and whether regularity (in the form of global order) can emerge from local orders. This model represents language at the third level of resolution; it incorporates both subsystems of language (e.g., semantics and syntax) and language use (e.g., cultural transmissions among individuals of the same and different generations). Language is represented by M-U mappings. The semantic space includes a set of integrated meanings, each representing a simple event involving some atomic components, e.g., an action together with one or two of its arguments. An utterance contains either no structure, if it is processed as a whole, or some structure, if it can be segmented into compositional subunits. To avoid the "explicit meaning transference" assumption (discussed in Section 3.2.4), environmental information is simulated to assist comprehension, though there is no physical or virtual environment in which individuals are situated; this model is therefore a nonsituated and structured model. The model aims to explain the emergence of compositionality and regularity as a result of self-organization in both idiolects and a communal language. It adopts MAS to simulate cultural transmissions and a rule-based system to simulate individual linguistic knowledge.


In MAS, each individual is an agent who has some linguistic knowledge and related mechanisms to apply this knowledge to develop his/her idiolect in communications. Agents can communicate with others, and be assigned to different generations. A communal language may emerge via iterated communications between agents of the same or different generations. Linguistic knowledge can be represented in many ways. In neural network models (e.g., Munroe and Cangelosi 2002; Wong et al. 2006), an individual's linguistic knowledge is stored in the connection weights of his/her neural network. This linguistic knowledge is applied during the calculation of the output based on some input, and can be updated by some back-propagation mechanism. This way of representation is general in the sense that it does not consider any specific linguistic features such as semantics or syntax; it is also restricted in the sense that the structure of the neural network is predefined, and both knowledge and training data have to conform to this structure. In other models, a rule-based system is adopted to represent linguistic knowledge; such systems have been widely adopted in AI simulations (Jackson Jr. 1985). A rule-based system contains a set of rules, a memory for storing them, and some mechanisms to adjust them. Each rule contains a condition and a response, and sometimes an additional part indicating its strength (probability or frequency). If its condition is satisfied, a rule is said to be active or activated, and its response will be executed. For the same condition, different rules with different responses can be active simultaneously, and based on their strengths, they can compete with one another. Rule strength can be adjusted following certain criteria, e.g., if a rule is frequently activated, its strength tends to increase. The knowledge of a phenomenon can be specified as a set of rules, each focusing on some condition(s) in that phenomenon and suggesting some appropriate response(s). If many individuals share a set of rules, they are considered to have similar knowledge of that phenomenon. Many cognitive and linguistic phenomena can be viewed as governed by different types of rules (Gumb 1972; Hayes 2004). For instance, lexical rules can deal with how to map meanings to utterances, and morphological rules can instruct how to assign morphological tags to lexical items. However, most models that use rule-based systems to represent linguistic knowledge do not explicitly state how these rules are stored, activated, and executed in human brains. Further research in neuroscience may provide answers to these questions.
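The following Python sketch shows what such a rule-based system can look like in its simplest form; the class layout, the reinforcement amount, and the example lexical rules are illustrative assumptions, not the rule types defined later in this thesis:

import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable      # when does the rule apply to a context?
    response: Callable       # what does it produce or do?
    strength: float = 0.5    # probability- or frequency-like score

def apply_rules(rules, context, rng=random.Random(0)):
    """Activate all rules whose conditions hold, let them compete by
    strength, execute the winner, and reinforce it slightly."""
    active = [r for r in rules if r.condition(context)]
    if not active:
        return None
    winner = rng.choices(active, weights=[r.strength for r in active])[0]
    winner.strength = min(winner.strength + 0.05, 1.0)   # frequent use strengthens
    return winner.response(context)

# e.g., two competing lexical rules mapping the same meaning to utterances:
rules = [Rule(lambda c: c['meaning'] == 'deer', lambda c: '/bc/', 0.8),
         Rule(lambda c: c['meaning'] == 'deer', lambda c: '/de/', 0.2)]
print(apply_rules(rules, {'meaning': 'deer'}))   # usually '/bc/'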


In a sense, any abstract or specific, general or complex rules can be defined in a rule-based system, and empirical bases are necessary to support the correlated mechanisms for processing these rules. Compared with the neural network, the rule-based system is more "practical" in recapitulating linguistic behaviors, but it is not firmly grounded in biology. The neural network is "motivated" by biological findings, but its structure-sensitivity makes it ineffective at representing linguistic behaviors in various situations. In order to study the mechanisms of producing, comprehending, and learning lexical items and word order, my model adopts the rule-based system and defines several types of rules to represent linguistic knowledge. A communal language is said to emerge if all individuals share similar linguistic rules and consistently use them in communications. In addition, there is a critical difference between rules defined in a rule-based system and those defined in linguistic discussions. In the latter, rules are usually defined by linguists as abstract regulatory mechanisms that can even be independent of specific instances. For example, some linguists (e.g., Pinker 1996; Marcus et al. 1999) suggested that the categorical information or grammatical rules in human brains were predefined, independent of linguistic instances, and not acquirable through statistical learning. In my model, such "visionary" rules are precluded, and categorical information can be gradually acquired based on actual instances and statistical information detected during language use. Dyadic communications between two agents (a speaker and a listener) are simulated in this model, in which the speaker's production, the listener's comprehension, and the rule competition processes are modeled. Apart from random communications, cultural transmissions among agents of the same or different generations and communications among agents in particular social structures are also simulated. Finally, some indices are defined to evaluate the emergent language, such as the expressivity and understandability of lexical rules and global or local orders. The values of these indices can indicate the emergence of linguistic universals, trace the "bottom-up" process of syntactic development, and evaluate the effects of linguistic or nonlinguistic factors on language evolution.

Chapter 4. Compositionality-Regularity Coevolution Model

This chapter introduces in detail the coevolutionary model of language emergence, including its semantic and utterance spaces, its representation and acquisition of linguistic knowledge, its communication scenario, and the indices to evaluate its performance. The chapter also compares this model with some contemporary models that address similar questions. Its implementation details are provided in Appendix A.

4.1. Semantic Space

The model predefines a semantic space that contains a set of integrated meanings. Agents in the multi-agent system understand not only the atomic concepts of objects and actions, but also the simple semantic roles assigned to these concepts in the integrated meanings. This assumption is made for the following three reasons. First of all, language had to conform to the conceptual system that we already had (Burling 2005). Language seems to have appeared in evolution only after early hominids had become capable of generating and categorizing actions and creating and categorizing mental representations of objects, events, and relations. Similarly, infants' brains are busy representing and evoking concepts and generating myriad actions long before they utter their first well-selected word, form sentences, and truly use a language (Damasio and Damasio 1992). Therefore, a primitive semantic concept system must have been established via other sensory channels (e.g., vision) before linguistic communications became possible (Jackendoff 2002), and a simple structured meaning system must have pre-existed before a complex syntax emerged (e.g., Schoenemann 1999, 2005). Second, as discussed in Chapter 2, meanings contained in the holistic utterances exchanged during primitive linguistic communications could be descriptions of some integrated events frequently occurring in the environment of early hominids.
This semantic information must have been shared among agents before it was encoded in utterances and exchanged in communications. Finally, compositionality in this model deals with how word-like utterances are mapped to semantic concepts and how lexical items get conventionalized, and regularity in this model concerns the emergence of syntactic structures in utterances. Rather than semantics itself, both concentrate on the development of linguistic utterances under the guidance of semantics. Therefore, a built-in semantic space is acceptable. In this model, a semantic space, M = P × A × A, is constructed from two sets of sememes (atomic units of proposed or intended concepts): 1) A set of animate objects: A = {aj : j = 1, …, nA}, where nA is the number of these objects. Some examples of aj could be "fox" or "tiger". 2) A set of actions: P = {pj : j = 1, …, nP}, where nP is the number of these actions. Some examples of pj could be "run" or "chase". An integrated meaning is an element of the semantic space, m ∈ M, and it can be classified into the following two types:

Type1: "pi<aj>", 1 ≤ i ≤ nP and 1 ≤ j ≤ nA;

Type2: "pi<aj, ak>", 1 ≤ i ≤ nP, 1 ≤ j, k ≤ nA, and j ≠ k (i.e., reflexives are avoided);

where aj ∈ A denotes Agent (Ag), the instigator of an action; ak ∈ A denotes Patient (Pat), an entity that undergoes an action; and pi ∈ P denotes Predicate (Pr), the action. In addition, pi in Type1 meanings denotes Predicate1 (Pr1), which represents an action (e.g., "run") involving a single argument, Ag; and pi in Type2 meanings denotes Predicate2 (Pr2), which represents an action (e.g., "chase") involving two arguments, Ag and Pat. By replacing specific sememes with these semantic roles (e.g., Ag, Pr1/2, or Pat), we can rewrite the two types of integrated meanings as "Pr1<Ag>" and "Pr2<Ag, Pat>". These forms will be used in the following discussions.


This semantic space does not include actions such as "give" or "take" that involve three arguments. Nor does it include integrated meanings in which a two-argument action appears with only a Patient (e.g., "hit<people>" or "eat<rice>" in Chinese), since these can be viewed as Type2 meanings with omitted (or dropped) Ags. These exclusions will not greatly affect the generality of this semantic space. A semantic expression, m′, is an element of the space M′ = (P ∪ {#}) × (A ∪ {#}) × (A ∪ {#}). A semantic expression does not need to specify the values of all its components; unspecified component(s) are written as "#". A semantic expression can denote a single sememe (e.g., "wolf", "run<#>", or "chase<#, #>", where the number of "#" indicates the number of arguments that the action can have, and thus whether it belongs to Pr1 or Pr2), two sememes that cannot form a Type2 integrated meaning (e.g., "chase<fox, #>" or "fight<#, wolf>"), or a Type1 or Type2 integrated meaning containing two or three sememes (e.g., "run<fox>" or "chase<fox, wolf>"). Generally speaking, a Type2 meaning with an animate object (e.g., "fox" or "tiger") as Ag and an inanimate object (e.g., "food" or "meat") as Pat is transparent; after all its sememes are identified, it is obvious which is Ag and which is Pat. For example, if all sememes of an integrated meaning are identified as "fox", "eat", and "meat", the whole meaning, in a normal situation, must be "eat<fox, meat>", instead of "eat<meat, fox>". However, a Type2 meaning with different animate objects as both Ag and Pat is opaque; even after all its sememes are identified, it is still unclear which is Ag and which is Pat. For example, if all sememes of an integrated meaning are identified as "chase", "fox", and "wolf", then without further linguistic (e.g., word order, morphological, or context cues) or nonlinguistic (e.g., the visual scene of this event) information, it is unclear "who is chasing whom". The transparency and opacity of integrated meanings were first clarified in my early model studying the emergence of compositionality and the convergence of global orders (Gong and Wang 2005; Gong et al. 2005b). As shown in that model, with no further syntactic information, the processing of transparent integrated meanings is relatively easier than that of opaque ones. In addition, since these two types of integrated meanings may share identical sememes as Ag, the processing of transparent integrated meanings and that of opaque ones can affect each other and collectively contribute to syntactic development.


models in which agents can perform actions on inanimate objects in their environment (e.g., Munroe and Cangelosi 2002) have touched upon transparent integrated meanings, and others based on the sociocultural explanation (e.g., the ILM) have taken account of opaque integrated meanings. The semantic space of my model involves both Type1 and opaque Type2 meanings.

In this semantic space, n_A = 4 and n_P = 8. In other words, there are 4 animate objects to serve as Ag or Pat, and 8 actions to serve as Pr, among which 4 (Pr1) involve a single argument and the other 4 (Pr2) involve two arguments. In total, these 12 sememes can build up 64 integrated meanings, of which 16 (4 × 4) are Type1 and 48 (4 × 4 × (4 − 1)) are Type2. The type ratio between Type1 and Type2 meanings is thus 1:3. For each integrated meaning, a token frequency denotes its occurrence frequency in the agents’ environment. In most simulations, the ratio between the token frequency of a Type1 meaning and that of a Type2 meaning is set to 3:1. Considering both token and type frequencies, during meaning selection in production the probability of choosing a Type1 meaning equals that of choosing a Type2 meaning.
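To make these numbers concrete, the following sketch (in Python; not the thesis code, and the sememe names are illustrative assumptions) enumerates the semantic space and implements the frequency-weighted meaning selection described above:

```python
import random

# 4 animate objects, 4 single-argument predicates (Pr1), and
# 4 two-argument predicates (Pr2); the names are illustrative.
ANIMATES = ["fox", "wolf", "tiger", "deer"]
PR1 = ["run", "hop", "sleep", "jump"]
PR2 = ["chase", "fight", "bite", "follow"]

# Meanings are (predicate, Ag, Pat) triples; "#" marks an unspecified slot.
TYPE1 = [(p, a, "#") for p in PR1 for a in ANIMATES]   # 16 meanings
TYPE2 = [(p, a, b) for p in PR2 for a in ANIMATES
         for b in ANIMATES if a != b]                  # 48 meanings, reflexives excluded

# Token frequencies in a 3:1 ratio, so the two types are selected
# equally often overall: 16 * 3 == 48 * 1.
TOKEN_FREQ = {m: 3 for m in TYPE1} | {m: 1 for m in TYPE2}

def select_meaning():
    """Pick an integrated meaning for production, weighted by token frequency."""
    meanings = list(TOKEN_FREQ)
    return random.choices(meanings, weights=[TOKEN_FREQ[m] for m in meanings])[0]
```

Sampling select_meaning() many times yields Type1 and Type2 meanings in roughly equal proportions, matching the equal-probability property stated above.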

4.2. Utterance Space

An utterance space, U = S × S × … × S, is constructed from a set of syllables, each of which is chosen from an alphabet set: S = {s_j : j = 1, …, n_S}, where n_S is the size of this alphabet set. Examples of s_j could be /a/, /b/, /c/, etc. These syllables can be viewed as the building blocks of human speech, but no phonetic features (e.g., consonants, vowels, tones, or stresses) are considered in this model. An utterance is an element of the utterance space, u ∈ U, and it can be written as a syllable string: u = /s_1 s_2 … s_{L_u}/, where L_u is the number of syllables in this utterance and L_u ≤ L, with L a positive integer indicating the upper bound on utterance length. In this model, n_S = 30 and L = 9, and the utterance space can be easily adjusted through these two parameters.
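A corresponding sketch of this utterance space (again Python; the choice of ASCII symbols to pad out the 30-syllable alphabet is my own):

```python
import random
import string

N_S = 30   # size of the syllable alphabet
L_MAX = 9  # upper bound on utterance length

# 26 letters plus 4 extra symbols stand in for the 30 syllables;
# no phonetic features are attached to them.
SYLLABLES = list(string.ascii_lowercase) + ["1", "2", "3", "4"]
assert len(SYLLABLES) == N_S

def random_utterance():
    """Generate a legal utterance: 1 to L_MAX syllables, no internal boundaries."""
    length = random.randint(1, L_MAX)
    return "/" + "".join(random.choice(SYLLABLES) for _ in range(length)) + "/"
```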

4.3. Representation and Acquisition of Linguistic Knowledge

In this model, the artificial language exchanged among agents is represented by M-U mappings between the semantic expressions of integrated meanings and utterances. Take the following M-U mapping as an example:


“chase<fox, deer>” ↔ /abcd/

Its semantic part consists of an integrated meaning (“chase<fox, deer>”) chosen from the semantic space, and its utterance part consists of a string (/abcd/) chosen from the utterance space. There are three types of string in this model:

1) A string is called a sentence if it encodes an integrated meaning; e.g., the string /abcd/ in the above example is a sentence representing the integrated meaning “chase<fox, deer>”;

2) A continuous string is called a word if it encodes a single sememe; e.g., if the substring /bc/ encodes the sememe “deer”, it is a word;

3) A continuous or separate string is called a phrase if it encodes two sememes that cannot form a Type2 meaning; e.g., if the separate substring /a*d/ represents “chase<fox, #>”, i.e., “a fox is chasing something”, then /a*d/ is a phrase, where “*” indicates an unspecified stretch that can be filled by syllable(s).

Without particular linguistic senses, the concepts of sentence, word, and phrase are simply notations for different types of string encoding either integrated meanings or their sememe(s). A sentence comprises either a holistic, inseparable string, or a combination of words and phrases. In the latter case, the syllables in a sentence are concatenated without boundaries: in the above example, if the sentence /abcd/ is composed of the word /bc/ and the phrase /a*d/, there is no physical boundary between /a/ and /b/ or between /c/ and /d/. Agents can recognize and distinguish each syllable in an utterance, and during linguistic communications they have to learn linguistic knowledge on how to represent sememes with words or phrases, how to combine words or phrases into sentences that encode integrated meanings, and how to parse sentences into words or phrases.

The linguistic knowledge of an agent is represented by different types of rules: a) lexical rules, encoding the mappings between semantics and utterances; b) categories, encoding the membership of lexical rules and mediating the semantic roles with the syntactic structures; and c) syntactic rules, encoding the order relations between lexical rules or between
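As a concrete illustration of how a word and a separate phrase yield a boundary-free sentence (a minimal Python sketch, not the thesis code):

```python
def fill_gap(phrase: str, word: str) -> str:
    """Fill the unspecified stretch '*' of a separate phrase with a word's string."""
    return phrase.replace("*", word)

# /a*d/ ("chase<fox, #>") combined with /bc/ ("deer") gives /abcd/,
# a sentence encoding "chase<fox, deer>" with no physical boundaries left.
assert fill_gap("a*d", "bc") == "abcd"
```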


lexical members of different categories. The representation of the artificial language and the linguistic rules in this model is shown in Appendix A, Section A.1.

4.3.1. Lexical rules

A lexical rule denotes a relation between a semantic expression and an utterance; it is an instance of linguistic knowledge on how to map a semantic expression to an utterance. In this model, each lexical rule contains an M-U mapping and a strength. The M-U mapping of a lexical rule is bidirectional: a lexical rule can be activated during production if its semantic expression fully or partially matches the intended meaning, and during comprehension if its utterance fully or partially matches the heard sentence. In other words, a lexical rule can be referred to in both production and comprehension. The strength of a lexical rule numerically indicates the probability (from 0.0 to 1.0) of using its M-U mapping successfully. The initial value of lexical rule strength is 0.5, and it is adjusted in communications. Lexical rule strength provides a criterion for the selection of and competition among lexical rules.

There are two types of lexical rules in this model, holistic rules and compositional rules, classified by their M-U mappings. The M-U mapping of a holistic rule maps a sentence to an integrated meaning. The M-U mapping of a compositional rule either maps a word to a single sememe (a compositional rule of this type is also called a word rule) or maps a phrase to two sememes that cannot form an integrated meaning (a compositional rule of this type is also called a phrase rule). Some examples of lexical rules are shown in Figure 4.1.

Lexical rules

Holistic rules:
(a) “chase<fox, wolf>” ↔ /ad/ (0.5)
(b) “hop<fox>” ↔ /a/ (0.4)
(c) “hop<fox>” ↔ /bc/ (0.6)
(d) “run<wolf>” ↔ /ef/ (0.5)

Compositional rules:
(e) “wolf” ↔ /d/ (0.6)
(f) “run<#>” ↔ /a/ (0.8)
(g) “chase<fox, #>” ↔ /ab*d/ (0.7)

Figure 4.1. Examples of lexical rules. Numbers in brackets are lexical rule strengths. Lexical rules (a)-(d) are holistic and (e)-(g) are compositional, among which (e) and (f) are word rules and (g) is a phrase rule. Lexical rules (b) and (c) are synonymous rules, and (b) and (f) are homonymous rules.
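The following sketch (Python) shows one way such rules could be represented, together with the strength-based competition between a holistic encoding and a compositional one that the next paragraph describes. The scoring of a rule combination (here, the mean of the member strengths) is my assumption, since this chapter only states that strengths decide the outcome:

```python
from dataclasses import dataclass

@dataclass
class LexicalRule:
    meaning: str            # e.g. "chase<fox, wolf>", "wolf", or "chase<fox, #>"
    utterance: str          # syllable string; "*" marks the gap of a separate phrase
    strength: float = 0.5   # initial value; adjusted after each communication

# Rules (a), (e), and (g) of Figure 4.1.
rule_a = LexicalRule("chase<fox, wolf>", "ad", 0.5)   # holistic
rule_e = LexicalRule("wolf", "d", 0.6)                # word rule
rule_g = LexicalRule("chase<fox, #>", "ab*d", 0.7)    # phrase rule

# Candidate encodings of "chase<fox, wolf>": holistic /ad/ versus the
# compositional /abdd/ built by filling the phrase's gap with the word.
compositional = rule_g.utterance.replace("*", rule_e.utterance)  # "abdd"
holistic_score = rule_a.strength                                 # 0.5
compositional_score = (rule_e.strength + rule_g.strength) / 2    # 0.65
winner = compositional if compositional_score > holistic_score else rule_a.utterance
# Here the compositional encoding /abdd/ wins the competition.
```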


Based on some regulatory mechanisms, agents can combine word or phrase rules into sentences. For example, if all the lexical rules in Figure 4.1 are stored in one agent, there are two ways for him/her to encode the integrated meaning “chase<fox, wolf>” into a sentence: either using the string /ad/ of holistic rule (a), or combining the strings of compositional rules (e) and (g) into /abdd/. Similarly, there are two ways to encode the integrated meaning “run<wolf>”: either using the string /ef/ of holistic rule (d), or combining the words of compositional rules (e) and (f) into /da/ or /ad/, each of which follows a different word order. In these examples, the strengths of the related lexical rules decide which way is used to encode the intended meaning.

Natural languages contain many synonyms and homonyms. Synonyms are words or expressions with different pronunciations but the same meaning, e.g.:

In English, /freedom/ and /liberty/, /subway/ and /tube/;
In Chinese, /公共汽车/ and /巴士/ (“bus”), /间谍/ and /特务/ (“spy”).

Homonyms are words or expressions with the same pronunciation but different meanings, e.g.:

In English, /aye/, /I/ and /eye/, /dear/ and /deer/;
In Chinese, /原因/ (“reason”) and /元音/ (“vowel”), /加剧/ (“to enhance”) and /家具/ (“furniture”).

In the artificial language of this model, synonyms and homonyms are introduced by synonymous and homonymous rules. Lexical rules whose M-U mappings have the same meaning but different utterances are synonymous rules, e.g., lexical rules (b) and (c) in Figure 4.1; lexical rules whose M-U mappings have the same utterance but different meanings are homonymous rules, e.g., lexical rules (b) and (f) in Figure 4.1. Synonyms and homonyms add to the complexity of language processing; homonyms, for example, may cause ambiguity in comprehension, especially when contextual information is absent. There are several reasons for the existence of synonyms and homonyms in natural languages, such as lexical borrowing (e.g., /liberty/ in English was borrowed from French), different


usage in various domains (e.g., one member of a synonym pair used in everyday conversation and the other in linguistics), and the existence of contextual, linguistic, or nonlinguistic cues (e.g., one member of a homophone pair used as a verb and the other as a noun). Empirical studies on language acquisition have traced some mechanisms adopted by children to avoid synonyms and homonyms, such as mutual exclusivity (Markman 1989) and the principle of contrast (Clark 1987). These mechanisms have been simulated in some models (e.g., A. Smith 2003b, 2003c; Gong et al. 2005b; Divina and Vogt 2006; K. Smith et al. 2006). However, there is a critical problem: by directly adopting these avoidance mechanisms, the emergent language precludes the possibility of preserving synonyms or homonyms, which contradicts what is observed in natural languages. This model, without adopting any direct avoidance mechanism, will explore whether language use itself can help to distinguish synonyms and homonyms. For instance, some homonymous rules, based on their different semantic information, could be associated with different categories; the different usage of these categories may then help to distinguish these lexical rules, and even allow some homonymous rules to coexist in both idiolects and the communal language.

4.3.2. Syntactic rules

A syntactic rule denotes a relation between a set of lexical rules and a set of order relations; it is the linguistic knowledge on the local orders that the utterances of lexical rules can follow. In this model, the local orders with which strings appear in sentences can be detected and acquired by agents as syntactic rules. A syntactic rule contains a strength and a simple local order between the strings of two lexical rules or of two clusters (categories) of lexical rules. The simple local orders are:

1) before: a word or continuous phrase is before (adjacently or not) another word or continuous phrase;

2) after: a word or continuous phrase is after (adjacently or not) another word or continuous phrase;

3) surround: a separate phrase surrounds a word;

4) between: a word is surrounded by a separate phrase.

The strength of a syntactic rule numerically indicates the probability (from 0.0 to 1.0) of using its local order successfully. The initial value of


syntactic rule strength is 0.5, and it can be adjusted in communications. Some examples of syntactic rules are shown in Figure 4.2.

Figure 4.2. Examples of syntactic rules. Numbers in brackets are syntactic rule strengths; “<<” is “after”, “I” is “surround”, and “J” is “between”. Syntactic rule (1) means that the utterance of lexical rule (a) is before the utterance of lexical rule (b). Syntactic rule (2) means that the utterance of lexical rule (c) is after the utterance of any lexical rule in Category (I). Syntactic rule (3) means that the utterance of any lexical rule in Category (II) surrounds the utterance of any lexical rule in Category (III). Syntactic rule (4) means that the utterance of any lexical rule in Category (III) is surrounded by the utterance of any lexical rule in Category (IV).
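A sketch of how syntactic rules could be represented (Python; the enum symbols follow the notation of Figure 4.2, and the concrete operands and the strength value are placeholders, since the figure itself did not survive extraction):

```python
from dataclasses import dataclass
from enum import Enum

class LocalOrder(Enum):
    BEFORE = ">>"    # precedes another word/continuous phrase, adjacently or not
    AFTER = "<<"     # follows another word/continuous phrase
    SURROUND = "I"   # a separate phrase surrounds a word
    BETWEEN = "J"    # a word is surrounded by a separate phrase

@dataclass
class SyntacticRule:
    left: str               # a lexical rule or a category, e.g. "lex (a)" or "Cat1"
    order: LocalOrder
    right: str
    strength: float = 0.5   # initial value; adjusted in communications

# Rule (1) of Figure 4.2: lexical rule (a) comes before lexical rule (b);
# the strength 0.6 is illustrative.
rule1 = SyntacticRule("lex (a)", LocalOrder.BEFORE, "lex (b)", strength=0.6)
```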

Each syntactic rule contains a local order between the utterances of two lexical rules. Multiple syntactic rules can be activated simultaneously to build up a global order regulating utterances that involve several lexical rules, and their strengths determine which global order is adopted when multiple choices are available.

4.3.3. Categories

A category connects a set of lexical rules with a set of syntactic rules; it is the linguistic knowledge concerning which lexical rule follows which local order. Lexical rules can be associated with a category if their semantic expressions have the same semantic role (e.g., Pr1, Pr2, Ag, or Pat) in some integrated meanings and their utterances are similarly used in some sentences, i.e., they have identical local orders with respect to the utterances of other lexical rules. This identical local order is also associated with the category, as a syntactic rule.

Categories mediate semantic roles (e.g., Pr1/2, Ag, and Pat) and semantic structures (e.g., Type1 and Type2 integrated meanings) with syntactic roles (e.g., verb (V), subject (S), and object (O)) and syntactic structures (e.g., global orders). First, to a certain extent, syntactic roles and semantic roles are correlated. In this model, without considering the passive voice, words of lexical rules that encode sememes having the semantic role Pr1 or Pr2 in integrated meanings are usually V in sentences,


words of lexical rules that encode sememes as Ag are usually S, and words of lexical rules that encode sememes as Pat are usually O. Therefore, a category having the semantic role Pr1 or Pr2 is also considered a V category, a category of Ag an S category, and a category of Pat an O category. Similarly, phrase rules can be associated with phrase categories: phrase rules encoding sememes as Ag and Pat may form an SO category if their utterances are similarly used in sentences, those encoding sememes as Ag and Pr2 can form an SV category, and those encoding sememes as Pr2 and Pat can form a VO category.

Second, although semantic roles and syntactic roles are correlated, semantic structures initially have nothing to do with syntactic structures. Different syntactic structures can encode integrated meanings of the same type: to encode a Type1 integrated meaning, there are two possible word orders (SV or VS); to encode a Type2 integrated meaning, there are six possible word orders (SVO, SOV, OSV, VSO, VOS, OVS). Meanwhile, based on its relative orders with respect to other rules, one lexical rule can be associated with different categories that all have the same syntactic role. Therefore, before specific forms of syntactic structure emerge, there is no correlation between semantic structures and syntactic structures. A systematic mapping between semantic and syntactic structures (the semantics-syntax correspondence) emerges only after consistent word orders are developed to express meanings with the same semantic structure. The emergence of this mapping results from the formation and self-organization of categories.

During the formation of categories, a lexical rule can be simultaneously associated with different categories. First, some word rules whose sememes can be either Ag or Pat in different integrated meanings can be associated with both S and O categories. For example, the sememe “fox” can be Ag in “chase<fox, wolf>” but Pat in “chase<wolf, fox>”, so a lexical rule encoding this sememe can be associated with both an S and an O category. Second, due to its different local orders with respect to different rules, one lexical rule can be associated with different categories that all have the same syntactic role. In order to allow a lexical rule to be associated with multiple categories, a weight is set up for each association. The association weight of a lexical rule to a category numerically indicates the probability (from 0.0 to 1.0) for this lexical rule to follow the syntactic rules of that category.


In other words, an association weight can be viewed as a membership value of a lexical rule to a category. The initial value of an association weight is 0.5, and it can be adjusted in communications. In this model, each category contains a syntactic role, a list of lexical rules, and a list of syntactic rules. Lexical rules in a category encode sememes that have the same semantic role in integrated meanings, and syntactic rules in a category encode local orders between lexical rules of this category and those of other categories, or between lexical rules of this category and some other lexical rules. Some examples of categories are shown in Figure 4.3.

Categories

Cat1 (V):
Lex-List: (a) “fight<#, #>” ↔ /b/ (0.5) [0.5]; (b) “run<#>” ↔ /bc/ (0.6) [0.6]
Syn-List: (I) Cat1 >> Cat2 (SV) (0.6)

Cat2 (S):
Lex-List: (c) “wolf” ↔ /ac/ (0.5) [0.5]; (d) “fox” ↔ /d/ (0.3) [0.4]
Syn-List: (I) Cat1 >> Cat2 (SV) (0.6); (II) Cat2
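A final sketch of the category structure (Python; the rule names and numeric values follow the reconstruction of Figure 4.3 above, with a minimal stand-in for lexical rules; round-bracket numbers are rule strengths and square-bracket numbers are association weights):

```python
from dataclasses import dataclass, field

@dataclass
class LexRule:              # minimal stand-in for the lexical rules of Section 4.3.1
    meaning: str
    utterance: str
    strength: float

@dataclass
class Category:
    syntactic_role: str                           # "V", "S", "O", or "SV"/"VO"/"SO"
    lex_list: list = field(default_factory=list)  # (LexRule, association weight) pairs
    syn_list: list = field(default_factory=list)  # local orders shared by the members

# Cat2 (S) of Figure 4.3: two word rules with their association weights.
cat2 = Category("S", lex_list=[
    (LexRule("wolf", "ac", 0.5), 0.5),
    (LexRule("fox",  "d",  0.3), 0.4),
])
cat2.syn_list.append("Cat1 >> Cat2 (SV) (0.6)")   # shared with Cat1's Syn-List
```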
