How much do we really know about human languages? A computational model of language diversification and extinction

Fermín Moscoso del Prado Martín∗
Laboratoire Dynamique du Langage (UMR–5596), Centre National de la Recherche Scientifique, Lyon, France & Institut Rhône-Alpin des Systèmes Complexes, Lyon, France

Michael Dunn†
Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

(Dated: October 9, 2010)
Abstract

An assumption of many current linguistic theories is that the distribution of linguistic features present in extant human languages is a representative sample of the features that could possibly have evolved. This implicit assumption underlies many claims that current linguistic features provide a reliable estimate of the relative optimality of language properties, from either linguistic or general cognitive perspectives. However, languages become extinct more often because of socio-political factors than because of linguistic or cognitive ones. It could therefore well be that most of the languages that have historically evolved disappeared, for reasons unrelated to their linguistic properties, without leaving any trace of potentially important linguistic innovations. In this study, we introduce a mesoscopic-scale model of the diversification and extinction of human languages. The results of the model suggest that the development of human languages is a complex, self-organizing dynamical system that is subject to oscillations and naturally settles into critical states. Crucially, our results indicate that the current sample of human linguistic features is a very limited and biased sample of what has actually existed. This implies that many important linguistic innovations are missing from the current records. We argue that, based only on the sample of historically known features, it is highly problematic to draw inferences about the “optimal” properties of human languages and/or the underlying neurophysiological mechanisms that drive such optimality.
Language is among the most salient traits specific to the human species. As such, understanding its properties and the biological bases that enable this ability is a crucial area of research spanning multiple disciplines. According to the 2009 edition of Ethnologue, there are at present approximately seven thousand living languages in the world [1]. In addition, many of these languages have a considerable number of dialects and vernaculars restricted to particular regions and/or social groups. Dialects are not officially recognized as languages in their own right. Nevertheless, they constitute consistent and often temporally resilient systems of communication that differ from the standard varieties in phonological, lexical, syntactic, and semantic aspects. The distinction between a language and a dialect is far from clear, and the decision is often made on purely political grounds. The enormous diversity of consistent communication systems is the product of thousands of years of evolution and competition between languages and groups of people.

Despite the obvious differences between individual languages, many have noticed that some features of languages take only a limited number of values across the languages of the world. Some of these values also occur more frequently than others in known languages. This has led to what some popular linguistic theories term ‘language universals’ or even ‘universal grammar’ (UG; [2]). In Chomsky’s original proposal, UG would be directly encoded in the human genome. More recent proposals aim to explain these cross-linguistic regularities as the outcome of a sort of natural selection process [3], an idea that dates back to Darwin himself [4].
Drawing a parallel with biological evolution, these views propose that, over time, speakers tend to use more ‘fit’ forms of language. On one account, the fitness of a feature or combination of features is determined by its suitability for an innate, language-specific system [5]. On another, it is driven by domain-general cognitive mechanisms that influence how easily a linguistic feature or trait is acquired, understood, or produced [6, 7]. The selection process, whether language-specific or general-cognitive in nature, would lead to the prevalence of languages with fitter linguistic traits or features. As a consequence, prevalent (combinations of) features are searched for in extant or historically known human languages [8]. This search implicitly assumes that the current distribution of linguistic features (and the known cases of feature change within or across languages) is a representative sample of the featural developments that could possibly have evolved. Under that assumption, the distribution would reveal information about the constraints in the human brain that determine the shape of languages.

Darwin’s often quoted “curiously parallel” patterns between the evolution of languages and the evolution of species [4] must, however, be taken with some care. In the biological case, the survival of a species (in a particular environment) is mostly determined by its degree of genetic fitness. Therefore, among competing species from different genetic lineages, the fitter will prevail and the less fit will become extinct. This selection also implies that fit genes or combinations of genes ultimately benefit from the selection process, leading to a distribution of fitter genes across the species. In the linguistic analogue [9], the evolution and selection of the features within a particular language is also likely to be driven by their degrees of fitness. However, there is an important difference. The extinction of a language (with all its features) is in most cases determined by factors that have little or no relation to the fitness of its linguistic features; socio-political factors seem to matter most. The most likely reason for the extinction of a language is that the social or ethnic group in which it lived was economically or militarily overpowered, or conquered, by a group speaking another language or dialect. Consider, for instance, the Americas. Since European colonization began, a very large number of indigenous languages have become extinct (a pattern that unfortunately continues today). However, it would be difficult to claim that these extinctions happened because the Spanish, English, Portuguese, French, or Dutch colonizers spoke languages with fitter linguistic features than those of the native peoples. It is far more plausible that they simply had much ‘fitter’ armies and navies [10].
In essence, the evolution of a language is likely to be governed by pressures on the optimality of its features, much like the evolution of species. The selection between different languages, in contrast, receives very little, if any, influence from linguistic factors. The extinction of a language entails the loss of the developments and linguistic innovations evolved by its speakers throughout history. Because the selection is extra-linguistic, some potentially fit evolved features may have been lost through socio-political accidents that pruned the evolutionary tree. Once a branch of the tree disappears, as with species, the innovations that it contained are unlikely to develop again. To claim that the current sample and distribution of linguistic features is representative of the ‘optimal’ set, one would need to establish that it covers a significant proportion of what has evolved.

The developments lost with extinct languages can be viewed as a leakage of evolutionary time. Consider first the total time accumulated by the speakers of current languages and of their ancestor languages. Compare it with the total time accumulated by the speakers of extinct languages, together with the corresponding proportion of the speakers of their ancestral languages (which may or may not be shared with the ancestors of the extant languages). The ratio between the two estimates how representative the features present in current languages are. It also indicates how well these features can support strong claims about the organization of the relevant neurophysiological mechanisms.

In recent years, problems concerning the evolution of human languages have attracted considerable interest from physical scientists, who have begun to model them with techniques from dynamical systems and statistical mechanics (e.g., [11–24]). Ref. [15] broadly classifies these models into macroscopic and microscopic models. Macroscopic models are defined in terms of differential equations operating on average population statistics. Microscopic models, by contrast, go into the detailed structure of individual languages and speakers. These models have provided insights into the evolution and geographical distribution of languages. However, different models disagree to some extent, both with one another and with empirical data. For instance, based on an analysis of data from the 1996 edition of Ethnologue, Ref. [25] concludes that the distribution of speakers across languages follows a two-regime power law, with different scaling exponents for the more and less popular languages. The models presented in Refs. [20, 21] replicate this distribution. On the other hand, Ref. [3] analyzed the 2000 edition of Ethnologue and concluded that the distribution of the number of speakers across languages is actually log-normal, a claim supported by the models presented in Refs. [16, 23].
A more surprising result, though one on which most models appear to agree, is that the number of languages spoken in the world grows exponentially over time, roughly paralleling the simulated growth of the world population [15, 16, 20, 21, 23]. A monotonic increase in the number of world languages appears at odds with the observation that linguistic diversity is declining alarmingly [26, 27]. This decline is happening despite (or perhaps because of) the equally alarming increase in the rate of global population growth. Does this suggest that technological and commercial development has qualitatively changed the dynamics of the system?

In this study, we introduce a new model of the diversification and extinction of languages. In terms of the classification of Ref. [15], our model goes into more detail than models based on differential equations over mean population values. It does not, however, directly represent the actual linguistic properties of languages, their geographical distribution, or the life course of individual speakers, as microscopic models do. We would therefore place our model at a mesoscopic scale, intermediate between the microscopic and macroscopic models. We will use it to provide tentative answers to the questions posed above. To what extent do current languages constitute a representative sample of the patterns that have evolved? Does the distribution of the number of speakers across languages have a temporally stable shape? Is the apparent decrease in linguistic diversity the trace of a qualitatively different process, or is it consistent with the general pattern of the system?
THE MODEL
To make inferences about the process of diversification and extinction of languages, we abstract away from their actual properties and geographic distributions. For our purposes, we assume that languages change their characteristics randomly over time, as a diffusion process in an unknown, very high-dimensional space of all possible human languages. The diffusion may show spatially dependent drifts or preferred directions, reflecting the evolutionary pressures towards fitter linguistic features. Without loss of generality, we restrict the model to the assumption that a language may split into multiple branches at random points in time. Each branch is a consistent system of communication used as the primary means of communication by a group of people larger than some minimum size. Whether these systems are languages, dialects, or vernaculars does not matter for our purposes; any consistent system is counted equally. The model is schematized in Fig. 1.

Suppose that at some time $\tau$ there exist in the world $L(\tau)$ different consistent systems of communication $\ell_1, \ldots, \ell_L$, which for simplicity we call languages. Each has a different number of speakers $s_1, \ldots, s_L$, and the speakers of all languages together make up the world population at that time:
$$\sum_{i=1}^{L} s_i(\tau) = W(\tau). \qquad (1)$$
Note that, in this model, each individual is mapped onto a single language. In the simulations, we assign each speaker to their primary or dominant native language or dialect, without considering whether they can speak any other languages.
Competition
At each time step, the total world population $W(\tau)$ is provided as an input. To compute the distribution of the population at time $\tau$ from the population at time $\tau - 1$, we first compute the proportion of the total population that spoke $\ell_i$ at $\tau - 1$, $p_i(\tau - 1) = s_i(\tau - 1)/W(\tau - 1)$. Some of the population will have switched languages, because descendants of speakers of $\ell_i$ grow up as speakers of a different language $\ell_j$. This can result from intermarriage between speakers of different languages, from migration, or from plain socio-economic or legal pressure. As a general rule, more speakers move towards the more popular languages than towards the less popular ones, giving a ‘rich get richer’ type of process. The probability of a language switch also increases with the number of available languages: when there are many other languages, it is more probable that one of them dominates the original one. Rather than modeling the individual transitions between pairs of languages (as done, for instance, in Ref. [23]), we model directly how this process changes the proportion of speakers of each language. For this purpose, we use the escort distribution from non-extensive thermodynamics (cf. [28]). The population at time $\tau$ is therefore sampled from the escort distribution

$$q_i(\tau) = \frac{p_i^{\alpha}(\tau - 1)}{\sum_{j=1}^{L} p_j^{\alpha}(\tau - 1)}, \qquad (2)$$
where $\alpha$ is a parameter controlling the amount of competition between languages. Values of $\alpha > 1$ exaggerate the inequalities of the original distribution (i.e., ‘rich get richer’), whereas values of $\alpha < 1$ redistribute the speakers (i.e., ‘poor get richer’). As required, the effect of $\alpha$ depends both on the values of each $p_i(\tau - 1)$ and on the number of languages in the population. The new values $s_i(\tau)$ are drawn as a random sample of size $W(\tau)$ from a multinomial distribution with parameters $q_1(\tau), \ldots, q_L(\tau)$. Any language with $s_i(\tau) = 0$ is considered extinct. At each point in time, this procedure permanently removes $E(\tau)$ languages from the population.

Some languages are remarkably resilient. Despite having relatively few speakers, they survive for many generations in a relatively healthy state. A constant value $\alpha > 1$ cannot account for this. We therefore assign each language, at its birth (details below), an individual value $\alpha_i$ drawn from a normal distribution with mean $\mu_\alpha > 1$ and standard deviation $\sigma_\alpha < \mu_\alpha - 1$. This individual parameter indicates the relative fitness (in geographic, socio-economic, and/or military terms) of the group of speakers of a language. Because this relative fitness can change with time, the $\alpha_i$ values take one step of a random walk at each time point. The step has size $\delta_\alpha \ll \sigma_\alpha$ and a randomly chosen direction. Eq. 2 then becomes
$$q_i(\tau) = \frac{p_i^{\alpha_i(\tau - 1)}(\tau - 1)}{\sum_{j=1}^{L} p_j^{\alpha_j(\tau - 1)}(\tau - 1)}. \qquad (3)$$
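The competition step described above (Eqs. 2 and 3, multinomial resampling, removal of extinct languages, and the random-walk update of each $\alpha_i$) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions. The function names are hypothetical, and the text does not specify whether the random-walk step on $\alpha_i$ comes before or after resampling; the sketch applies it after resampling.

```python
import numpy as np

def escort(p, alpha):
    """Escort distribution q_i = p_i**a_i / sum_j p_j**a_j (Eqs. 2 and 3).

    `alpha` may be a scalar (Eq. 2) or an array with one value per language (Eq. 3).
    """
    w = np.power(p, alpha)
    return w / w.sum()

def competition_step(s_prev, alpha, w_next, delta_alpha, rng):
    """One hypothetical competition step.

    s_prev: speakers per language at tau - 1 (all > 0); alpha: per-language
    fitness array; w_next: world population W(tau) in population units.
    Returns surviving speaker counts, their updated alphas, and E(tau).
    """
    p = s_prev / s_prev.sum()               # p_i(tau - 1)
    q = escort(p, alpha)                    # q_i(tau)
    s_next = rng.multinomial(w_next, q)     # s_i(tau), sums to W(tau)
    alive = s_next > 0                      # s_i(tau) == 0 -> extinct
    n_extinct = int(np.count_nonzero(~alive))
    # Random-walk step of fixed size delta_alpha in a random direction
    # (assumed here to happen after resampling).
    steps = rng.choice([-1.0, 1.0], size=int(alive.sum())) * delta_alpha
    return s_next[alive], alpha[alive] + steps, n_extinct
```

The initial $\alpha_i$ can be drawn with `rng.normal(mu_alpha, sigma_alpha, size=L)`. With a scalar exponent, `escort(p, 2.0)` shifts mass towards the largest language and `escort(p, 0.5)` shifts it towards the smallest, matching the ‘rich get richer’ and ‘poor get richer’ regimes described above.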
Diversification
At each time step, after new population values have been assigned to the existing languages and any extinct ones removed, each language may split into more than one descendant. Each language $\ell_i$ at time $\tau - 1$ can give rise to $1 + n_i$ languages at time $\tau$. At each step, the $n_i$ values are sampled from a Poisson distribution with rate parameter $\rho$. This reflects the probability that a group of speakers of $\ell_i$ has developed a consistent system of communication different from $\ell_i$. How large the linguistic divergence between the split languages is does not matter for our purposes; it suffices that the new system is consistent within a sub-population. Splits can have many causes, including migrations of groups of speakers, contact with other languages (i.e., horizontal influences), differentiation of a social group or class, regional variation, or even fashions. The number of variants that can arise from a language depends on how many speakers it has. We therefore take $n_i$ to be the minimum of the Poisson sample above and $\lfloor s_i / s_{\min} \rfloor$, where $s_{\min}$ is a parameter giving the minimum number of speakers of each new language. Finally, the populations of the new languages are obtained through a multinomial sample of size $s_i$ with $1 + n_i$ equiprobable options. Descendants that receive no speakers in this sample are removed from the pool and not counted [29]. The first descendant of a language is taken to be the evolved version of the original, and as such it inherits the fitness parameter $\alpha_i$. For each remaining descendant, the fitness parameter is obtained by averaging $\alpha_i$ with a sample from the normal distribution with mean $\mu_\alpha$ and variance $\sigma_\alpha^2$ (as above). In this way, the new population of speakers inherits some of the fitness of the original population (through cultural development, technology, etc.), but with some variability because it is a new social group. Averaging with the original normal distribution guarantees a certain degree of regression towards the mean, which keeps the average distribution of cultural fitnesses relatively stable.

Notice that the model above is a dissipative system in the thermodynamic sense, with a limited but steadily expanding volume. In a dynamical-systems interpretation, the rate of world population increase corresponds to the rate of volume expansion. On the one hand, diversification (controlled by $\rho$ and $s_{\min}$) is a tendency towards chaos that increases the entropy of the system at each time step. On the other hand, competition (controlled by $\mu_\alpha$, $\sigma_\alpha$, and $\delta_\alpha$) is a tendency towards deterministic order (i.e., towards a single language), which binds the energy in the system and thus reduces its entropy. Systems of this type sometimes drift naturally towards critical meta-stable states, exhibiting what is known as Self-Organizing Criticality (SOC; [30]). Such a property has been observed in the evolutionary patterns of language [31]. It would also be consistent with the observations in Ref. [25] of pervasive power-law distributions in the population of languages. We therefore investigate this point in our simulations.
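The diversification step can be sketched as follows. The function name is hypothetical. The sketch also assumes that "averaging $\alpha_i$ with samples from the normal distribution" means the plain mean of $\alpha_i$ and a single fresh draw, and that a birth is counted for every non-empty descendant other than the first.

```python
import numpy as np

def diversify(s, alpha, rho, s_min, mu_alpha, sigma_alpha, rng):
    """One hypothetical diversification step.

    Each language i branches into 1 + n_i descendants, with
    n_i = min(Poisson(rho), floor(s_i / s_min)); its speakers are split
    uniformly at random among the branches. Returns the new speaker counts,
    the new alphas, and the number of births B.
    """
    new_s, new_alpha, births = [], [], 0
    for s_i, a_i in zip(s, alpha):
        n_i = min(int(rng.poisson(rho)), int(s_i // s_min))
        parts = rng.multinomial(s_i, np.full(n_i + 1, 1.0 / (n_i + 1)))
        for j, part in enumerate(parts):
            if part == 0:
                continue                    # empty descendants are dropped
            if j == 0:
                new_alpha.append(a_i)       # evolved continuation keeps alpha_i
            else:
                # New social group: average alpha_i with a fresh normal draw.
                new_alpha.append(0.5 * (a_i + rng.normal(mu_alpha, sigma_alpha)))
                births += 1
            new_s.append(part)
    return np.array(new_s, dtype=np.int64), np.array(new_alpha), births
```

The cap $\lfloor s_i / s_{\min} \rfloor$ means that a language with fewer than $s_{\min}$ speakers can never split, however large $\rho$ is. The multinomial split also conserves the total number of speakers exactly.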
Variables of Interest
To estimate what proportion of evolutionary time leaks away through language extinctions, each language accumulates the number of speakers of its ancestors. We call this quantity its kept time $k_i$. When a language splits, each descendant inherits a share of $k_i$ proportional to the share of the parent language's population that went on to speak the new language. The values of $k_i(\tau)$ for languages that become extinct are accumulated over time, giving an estimate of the leaked time $T(\tau)$. At each time point, the proportion of kept time is computed as

$$K(\tau) = \frac{\sum_{i=1}^{L(\tau)} k_i(\tau)}{T(\tau) + \sum_{i=1}^{L(\tau)} k_i(\tau)}. \qquad (4)$$
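The bookkeeping behind Eq. (4) might look like the sketch below. The helper names are hypothetical, and so is the per-step ordering: extinct languages hand their $k_i$ to $T$, and surviving languages then add their current speakers to $k_i$. The text does not pin this ordering down.

```python
import numpy as np

def split_kept_time(k_parent, s_parent, parts):
    """Descendants inherit k_parent in proportion to the speakers they took."""
    return k_parent * np.asarray(parts, dtype=float) / s_parent

def update_kept_time(k, s_new, leaked):
    """Hypothetical per-step bookkeeping (ordering is an assumption).

    Extinct languages (s_new == 0) hand their kept time to the leaked
    time T; survivors add their current speakers to k_i.
    """
    extinct = s_new == 0
    leaked = leaked + float(k[extinct].sum())
    alive = ~extinct
    return k[alive] + s_new[alive], leaked

def kept_proportion(k, leaked):
    """Eq. (4): K = sum_i k_i / (T + sum_i k_i)."""
    kept = float(np.sum(k))
    return kept / (leaked + kept)
```

Because `split_kept_time` divides the parent's $k_i$ in proportion to the multinomial split, the kept time of the descendants always sums to that of the parent. Kept time therefore leaves the living pool only through extinctions, which is what $T(\tau)$ measures.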
To illustrate how the population of languages evolves, we also record the following quantities at each step of the simulations: the total number of languages $L(\tau)$, the number of language births $B(\tau)$, the number of language extinctions $E(\tau)$, and the mean value of the $\alpha_i$ coefficients in the population (both plain and weighted by the number of speakers of each language). Finally, to investigate whether the system is indeed in a critical state, we record the language entropy $S$. This is the entropy of the distribution of speakers across languages relative to the maximum possible entropy:

$$S(\tau) = \frac{-\sum_{i=1}^{L(\tau)} p_i(\tau) \log p_i(\tau)}{\log L(\tau)}. \qquad (5)$$

This measure provides an index of the relative stability of the shape of the distribution of speakers across languages. It also reflects the relative rate of entropy production by the system (as an upper bound for the relative Kolmogorov-Sinai entropy of the system), which will be useful in assessing its criticality.
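Eq. (5) is simply the Shannon entropy of the speaker proportions, normalized so that 1 means all languages have the same number of speakers. The function name below is hypothetical. Returning 0 when only one language remains (where $\log L = 0$) is an assumed convention that the text does not state.

```python
import numpy as np

def language_entropy(s):
    """Eq. (5): Shannon entropy of p_i = s_i / W divided by log L."""
    s = np.asarray(s, dtype=float)
    s = s[s > 0]                      # extinct languages do not count in L
    n_lang = s.size
    if n_lang < 2:
        return 0.0                    # assumed convention: log L = 0 here
    p = s / s.sum()
    return float(-(p * np.log(p)).sum() / np.log(n_lang))
```

For example, `language_entropy([5, 5, 5])` is 1 (a perfectly even distribution), while `language_entropy([97, 1, 1, 1])` is about 0.12, because one language holds almost all speakers.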
SIMULATIONS & RESULTS

Parameter settings
As input to the simulations, we used an estimate of the total world population between 10,000 BC and July 1, 2008 AD [32]. From these estimates, we interpolated 12,008 hypothetical years, which define the time steps of the simulation and are depicted by the upper line of Fig. 2(a). To avoid numerical underflow from very small probabilities, the population unit in the simulations was four speakers. At the first time step, we randomly split the global population estimate of one million into speakers of one thousand hypothetical primordial languages [33]. We report the results of four combinations of parameter values, which illustrate the overall pattern of results of this type of model. In all four cases we set $\mu_\alpha = 1.0005$ and $s_{\min} = 8$. The remaining parameter values for each simulation were [34]:

Simulation 1: $\rho = 0.003331$, $\sigma_\alpha = 0.0001$, $\delta_\alpha = 6 \cdot 10^{-7}$.
Simulation 2: $\rho = 0.0029$, $\sigma_\alpha = 0.0001$, $\delta_\alpha = 6 \cdot 10^{-7}$.
Simulation 3: $\rho = 0.003331$, $\sigma_\alpha = 0.0003$, $\delta_\alpha = 6 \cdot 10^{-5}$.
Simulation 4: $\rho = 0.003331$, $\sigma_\alpha = 0.0002$, $\delta_\alpha = 6 \cdot 10^{-6}$.

To sketch the phylogenetic relations between the final languages, we recorded, in each simulation and for every resulting language, its ancestors at the time points corresponding to 10,000 BC, 7,500 BC, 5,000 BC, 2,500 BC, and 1 AD. This gives a rough classification of the final languages into phylogenetic “language families” and “subfamilies” at five levels.
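The reported settings can be collected in a small configuration and checked against the constraint $\sigma_\alpha < \mu_\alpha - 1$ from the model section. The extra check $\delta_\alpha < \sigma_\alpha$ is an assumed reading of the random-walk step size. The uniform multinomial used for the initial split of one million people into 1,000 primordial languages is also a hypothetical choice, since the text only says the split was random. All names are illustrative.

```python
import numpy as np

MU_ALPHA = 1.0005   # shared by all four simulations
S_MIN = 8           # s_min as reported in the text
POP_UNIT = 4        # one simulated speaker stands for four people
SIMULATIONS = {
    1: dict(rho=0.003331, sigma_alpha=0.0001, delta_alpha=6e-7),
    2: dict(rho=0.0029, sigma_alpha=0.0001, delta_alpha=6e-7),
    3: dict(rho=0.003331, sigma_alpha=0.0003, delta_alpha=6e-5),
    4: dict(rho=0.003331, sigma_alpha=0.0002, delta_alpha=6e-6),
}

def satisfies_constraints(params):
    """sigma_alpha < mu_alpha - 1 (stated); delta_alpha < sigma_alpha (assumed)."""
    return (MU_ALPHA > 1
            and params["sigma_alpha"] < MU_ALPHA - 1
            and params["delta_alpha"] < params["sigma_alpha"])

def primordial_languages(rng, world=1_000_000, n_languages=1000):
    """Hypothetical initial split: uniform multinomial over the languages."""
    units = world // POP_UNIT        # 250,000 population units
    return rng.multinomial(units, np.full(n_languages, 1.0 / n_languages))
```

All four parameter sets satisfy both checks. In simulation 3, $\sigma_\alpha = 0.0003$ comes closest to the bound $\mu_\alpha - 1 = 0.0005$.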
Number of languages
Fig. 2(a) shows the evolution of the total world population (upper line) together with the simulated number of languages in each simulation (lower lines). In contrast with the results of Refs. [15, 16, 20, 21, 23], the number of languages does not grow monotonically. Instead, it oscillates in all four simulations (note that the vertical axis is on a logarithmic scale), even though the global population increases monotonically. This seems more realistic than concluding that the number of languages and/or dialects in the world increases steadily and exponentially. However, all simulations agree in showing a stark increase in the number of languages that parallels the dramatic increase in the population growth rate in the 20th century. Does this contradict the empirical fact that many languages are currently becoming extinct? Does it signal a fundamental change of regime that makes the model inadequate for the recent situation? To see that this may not be the case, consider the rates of language births and extinctions depicted in Fig. 2(b). In all but the third simulation, the rate of language extinctions has increased notably during the last two thousand years. This is a response to the corresponding increase in the language “birth rate”, which the extinction rate roughly tracks with a small time lag. It is consistent with the large number of language extinctions documented in historical times. In the most recent times, however, the upward trend in the extinction rate is markedly attenuated. In simulation 4 (which, as we argue below, is the most realistic) the rate even decreases. Despite this attenuation, the rate remains fairly high. Recall that what we call languages here includes any consistent system of communication, including regional dialects and sociolects.
We would argue that the increase in the birth rate may correspond to a growing number of variants of the more powerful languages (Chinese, Spanish, Arabic, English, etc.). The extinctions, meanwhile, happen mostly among the less powerful (and possibly, in terms of diversity, more valuable) languages, such as small indigenous languages of less developed areas. This is reflected in the evolution of the average fitness parameter in the population, depicted in Fig. 2(c,d). In simulations 1 and 2, this parameter hardly varies at all. The reason is that the very low values of $\sigma_\alpha$ and $\delta_\alpha$ induce almost no Darwinian selection of cultures (not languages). In simulation 3, by contrast, the overly aggressive settings of $\sigma_\alpha$ and $\delta_\alpha$ drive the population average well below one, even without weighting by the number of speakers (Fig. 2(c)). This would place all languages in saturated, non-competing dynamics, which is hardly realistic. Finally, simulation 4 shows the most realistic pattern. Averaged over languages, the fitness is above one, with most languages suffering from the competition. Weighting by the number of speakers, however, reveals that a cultural selection process has taken place. Most people end up speaking one of the culturally stronger languages, and the languages associated with more fragile cultures are at an increasingly marked disadvantage.
Distribution of speakers across languages
Fig. 2(e) plots the evolution of the language entropy over the course of the simulations. The entropy seems to follow a series of plateaus, corresponding to transient equilibria of the system, which are most clearly visible in simulation 4. In this simulation, the length and location of the plateaus accurately reflect the different rates of global population growth shown in Fig. 2(a). The ‘staircase’ shape of simulation 4 suggests a sequence of phase transitions. Higher rates of population growth decrease the relative entropy rate of the system; that is, they increase the relative inequality of the distribution. This suggests an adaptive system that naturally keeps itself at a critical point with respect to the influx of population. All four simulations also show, albeit very slightly, a trace of another downward adjustment in response to the final increase in the population growth rate. This drop appeared consistently when the simulations were re-run several times with different random seeds.

These discrete changes in the entropy of the system reflect changes in the shape of the distribution of speakers across languages. Fig. 2(f) shows this distribution at times near the center of each entropy plateau in simulation 4, as well as at the end of the simulation. The dots and solid lines plot the histograms of the distribution at each time step on a log-log scale [35]. The dashed lines show log-normal fits to the data, in agreement with the empirical observation of Ref. [3]. However, the curves could also be read as power laws with two different exponents, as concluded in Ref. [25], or as a single power law with an exponential cutoff [36]. These distributional shapes are hard to tell apart, because they are known to mimic each other closely [37]. In the power-law interpretation, the scaling exponent also seems to increase over the course of the simulation, and the tail of the distribution looks increasingly straight. This suggests a transition from a log-normal-like regime into a critical, power-law regime.
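The text does not say how the histograms and log-normal fits in Fig. 2(f) were computed. The sketch below shows one common approach, offered as an assumption rather than as the method used: a density histogram on logarithmically spaced bins, plus a maximum-likelihood log-normal fit (the mean and standard deviation of the log speaker counts). All names are hypothetical.

```python
import numpy as np

def log_binned_histogram(sizes, n_bins=20):
    """Density histogram of speaker counts on logarithmically spaced bins."""
    sizes = np.asarray(sizes, dtype=float)
    edges = np.logspace(np.log10(sizes.min()), np.log10(sizes.max()), n_bins + 1)
    edges[0], edges[-1] = sizes.min(), sizes.max()   # guard against rounding
    density, _ = np.histogram(sizes, bins=edges, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])        # geometric bin centres
    return centers, density

def fit_lognormal(sizes):
    """Maximum-likelihood log-normal fit: mean and s.d. of log sizes."""
    logs = np.log(np.asarray(sizes, dtype=float))
    return logs.mean(), logs.std()

def lognormal_pdf(x, mu, sigma):
    """Log-normal density, e.g. for overlaying the fit on the histogram."""
    x = np.asarray(x, dtype=float)
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))
```

Plotting `density` against `centers` on log-log axes, with `lognormal_pdf(centers, mu, sigma)` as a dashed overlay, produces the kind of comparison described above. A log-normal shows up as a parabola in such a plot, which is why it is easy to confuse with a two-regime power law over a limited range.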
Self-Organizing Criticality
The next question is whether the model naturally settles into a critical state. The entropy of the speaker distribution passes through a sequence of relative equilibria in response to changes in the rate of global population growth. This suggests that the system self-organizes into a state that optimally dissipates the energy in each state, consistent with the SOC hypothesis of a meta-stable critical state [30]. To test this, we examined the time series of relative increments in the number of living languages. At each step, the increment is the number of born languages minus the number of extinct languages, normalized by the total number of languages at that step. We asked whether this series is a stochastic fractal, as is characteristic of critical states, by estimating its Hurst exponent [38] with Detrended Fluctuation Analysis (DFA; [39]). To make the series stationary in variance, each series was divided by a running standard deviation of the same window size. After this transformation, the series were stationary in variance [40], and normal quantile-quantile plots showed that their values were normally distributed. They were therefore instances of a (possibly fractional) Gaussian noise [41]. In all four cases, the Hurst exponent corresponded to persistent fractional Gaussian noise, with estimated values of .83, .83, .83, and .70 for simulations 1 to 4, respectively. These fractal patterns are consistent with the ‘punctuated equilibria’ observed in language change [31]. They are also consistent with the pervasive power laws found in the population of human languages [25], which the model reproduces as well. Besides the power law with cutoff in the distribution of speakers across languages, we observed truncated power-law distributions in three further quantities: the number of speakers per language “family”, the number of languages per family, and language ‘ages’.
Taken together, these results point towards a system with the SOC property. This is not entirely surprising, since Ref. [30] had already shown that models of biological evolution exhibit SOC. The result extends the parallel between the evolution of species and the evolution and diversification of human languages [3, 4, 42].
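DFA, as used above, can be sketched in NumPy as first-order DFA (linear detrending in non-overlapping windows). The sketch omits the running-standard-deviation normalization, because the text does not give the window size. The default range of scales (16 up to a quarter of the series length) is likewise an illustrative assumption, and the function names are hypothetical.

```python
import numpy as np

def relative_increments(births, extinctions, n_languages):
    """(B - E) / L at every step: the series analysed in the text."""
    return (np.asarray(births, float) - np.asarray(extinctions, float)) / np.asarray(n_languages, float)

def dfa_exponent(x, scales=None):
    """First-order detrended fluctuation analysis (DFA-1).

    Returns the slope of log F(n) against log n. For a fractional Gaussian
    noise this slope estimates the Hurst exponent H (0.5 = uncorrelated,
    > 0.5 = persistent).
    """
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                       # integrated profile
    N = y.size
    if scales is None:
        scales = np.unique(np.logspace(np.log10(16), np.log10(N // 4), 12).astype(int))
    fluct = []
    for n in scales:
        n_win = N // n
        segs = y[: n_win * n].reshape(n_win, n)       # non-overlapping windows
        t = np.arange(n)
        slope, intercept = np.polyfit(t, segs.T, 1)   # linear trend per window
        trend = np.outer(slope, t) + intercept[:, None]
        fluct.append(np.sqrt(np.mean((segs - trend) ** 2)))
    h, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return h
```

As a sanity check, uncorrelated Gaussian noise should give a value near 0.5, and its cumulative sum (a random walk) a value near 1.5. Values such as the .70 to .83 reported above fall in the persistent range between 0.5 and 1.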
Leakage of evolutionary time
Finally, we can address how representative the current sample of human languages is. Fig. 2(g) shows the accumulated proportion of evolutionary time that is represented in the final distribution of languages. In all four simulations, the kept time very quickly settled at very low values (below one percent). It then rose sharply, driven by the enormous growth of the world population in historical times. The pattern is very similar across simulations, which suggests that it does not depend strongly on the particular parameter values. Even after this late increase, the proportion is only about three percent. The size of this loss is confirmed by the proportion of languages from the original populations (using the phylogenetic representation described above) that left descendants in the final population. As shown in Fig. 2(h), most of the languages that existed at the historical reference times have become extinct without leaving any trace in the current population. In simulation 4, for instance, only 22 of the 1,000 primordial languages are represented in the final population of 9,705 languages, forming 22 macro-families. The rest of the linguistic diversity, its evolution, and the innovations it gave rise to have left no trace in the current distribution of languages.
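Given the ancestor recorded at a reference time for each final language, the proportion shown in Fig. 2(h) reduces to counting distinct ancestors. The function below is a hypothetical sketch of that count. The 22-of-1,000 figure for simulation 4 works out to 2.2 percent.

```python
def surviving_fraction(ancestor_ids, n_reference):
    """Fraction of reference-time languages with at least one surviving descendant.

    ancestor_ids: for each final language, the id of its ancestor at the
    reference time (e.g. 10,000 BC); n_reference: languages alive then.
    """
    return len(set(ancestor_ids)) / n_reference
```

For simulation 4, 9,705 final languages that trace back to 22 distinct primordial ancestors give `surviving_fraction(ids, 1000) == 0.022`.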
DISCUSSION
We have introduced a simple yet powerful model of the diversification, birth, and extinction of human languages. The model is driven by Darwinian selection on cultural features, subject to stochastic variability. The results of the model, especially those of simulation 4, seem to be in good agreement with the empirical observations on the distribution of human languages. A number of simple extensions to this model could provide additional information. For instance, geographic information could be added by placing the speakers on a map or lattice, as was done by Refs. [13, 20, 21, 24], making it possible to simulate in more detail the pattern of local interactions between different groups of languages. Similarly, further information about the interrelationships of human groups throughout history, as provided by genetics and archaeology, could be taken into consideration. Notice, however, that the SOC property exhibited by the model suggests that including further constraints is not likely to produce qualitatively different results in the long run. Rather, this would probably introduce small temporary disturbances in the model, after which it would naturally regain its self-organized critical state. An important implication of the SOC property of the model is that re-running the “tape of evolution” with minimally different starting conditions would give rise to dramatically different results [30] in the languages that were selected and in how they would have evolved. It is crucial to recall here that, unlike in the biological parallel, the selection was not driven by features of the languages, but rather by features of the cultures on which each language was riding. Thus, whereas in biology one can be sure that the set of selected features represents a significant improvement (adaptation) over the original set, this process in the linguistic analogue is likely to be much slower at best. A re-run of the evolution could lead to a completely different set of linguistic features, perhaps including features whose mere possibility is unknown to us, but that may have arisen in history with a relatively high probability, given the massive percentages of lost information. Furthermore, the levels of communication between populations have been increasing throughout history. This increases the contact and possible horizontal influences between languages, possibly leading to a more homogeneous evolution of linguistic features in modern times. This implies that the available sample is not only small but probably also heavily biased, as the later stages of the evolution have received more horizontal feature transmissions (which are also absent from the biological counterpart).
Therefore, the languages lost at the very early stages of evolution might be expected to contain a larger degree of linguistic heterogeneity (and thus a wider range of starting points for the diffusion processes) than a current sample of equivalent size. This makes the estimates in Figs. 2(g,h) upper bounds on the information that has been kept. Our answer to the question in the title is therefore: not that much. Undoubtedly, the field of Linguistics provides us with vast and detailed knowledge about the languages of the world, as well as good hypotheses on the forms of the languages from which these may have evolved. This is evidenced, for instance, by the consistent and detailed reconstruction of undocumented prehistoric languages, such as Proto-Indo-European, from the languages that descended from them. However, using the current prevalence of certain features in known languages to make inferences about their relative optimality, and thus about constraints on the neurophysiological system that causes them (irrespective of whether this is a specific ‘modular’ system [2, 5] or just the interaction of multiple general cognitive mechanisms [6, 7]), does not seem to be justified; such inferences can only be derived from the properties of the brain/mind, rather than from those of the resulting languages. As argued above, known languages are only a tiny and biased sample of the languages that have existed during the evolution of mankind, and the selection of their features has been mostly determined by extra-linguistic factors.
∗ Corresponding author: [email protected]
† [email protected]
[1] Ethnologue: Languages of the World, 16th ed., edited by M. P. Lewis (SIL International, Dallas, TX, 2009) http://www.ethnologue.com/
[2] N. A. Chomsky, Aspects of the Theory of Syntax (MIT Press, Cambridge, MA, 1965)
[3] W. J. Sutherland, Nature 423, 276 (2003)
[4] C. R. Darwin, The Descent of Man, and Selection in Relation to Sex (reprint of the 2nd edition) (Princeton University Press, Princeton, NJ, 1871/1981)
[5] A. Prince and P. Smolensky, Science 275, 1604 (1997)
[6] J. L. Elman, E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett, Rethinking Innateness: A Connectionist Perspective on Development (The MIT Press/Bradford Books, Cambridge, MA, 1996)
[7] M. H. Christiansen and N. Chater, Behav. Brain Sci. 31, 489 (2008)
[8] Universals of Human Language (v. 1–4), edited by J. H. Greenberg (Stanford University Press, Stanford, CA, 1978)
[9] Some authors consider that the evolution of species and the evolution of languages are not just analogous, but are rather two instances of a general theory of evolution [42? ].
[10] In the opinion of some [? ], the claim that fitter and more ‘civilized races’ speak fitter languages is part of the argument put forward by Darwin. This would assume a very strong version of the Whorf–Sapir linguistic relativity hypothesis, by which language determines the success of a nation.
[11] Simulating the Evolution of Language, edited by A. Cangelosi and D. Parisi (Springer-Verlag, New York, NY, 2002)
[12] D. M. Abrams and S. H. Strogatz, Nature 424, 900 (2003)
[13] M. Patriarca and T. Leppänen, Physica A 338, 296 (2004)
[14] J. Mira and A. Paredes, Europhys. Lett. 69, 1031 (2005)
[15] D. Stauffer and C. Schulze, Phys. Life Rev. 2, 89 (2005)
[16] C. Schulze and D. Stauffer, Int. J. Mod. Phys. C 16, 781 (2005)
[17] C. Schulze and D. Stauffer, Computing in Science and Engineering 8, 60 (2006)
[18] X. Castelló, V. M. Eguíluz, and M. San Miguel, New J. Phys. 8, 308 (2006)
[19] K. Kosmidis, A. Kalampokis, and P. Argyrakis, Physica A 366, 495 (2006)
[20] V. M. de Oliveira, M. A. F. Gomes, and I. R. Tsang, Physica A 361, 361 (2006)
[21] V. M. de Oliveira, P. R. A. Campos, M. A. F. Gomes, and I. R. Tsang, Physica A 368, 257 (2006)
[22] D. Stauffer, X. Castelló, V. M. Eguíluz, and M. San Miguel, Physica A 374, 835 (2007)
[23] Ç. Tunçay, Europhys. Lett. 82, 20004 (2008)
[24] M. Patriarca and E. Heinsalu, Physica A 388, 174 (2009)
[25] M. A. F. Gomes, G. L. Vasconcelos, I. J. Tsang, and I. R. Tsang, Physica A 271, 489 (1999)
[26] M. Krauss, Language 68, 4 (1992)
[27] D. Crystal, Language Death (Cambridge University Press, Cambridge, UK, 2000)
[28] C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems: An Introduction (Cambridge University Press, Cambridge, UK, 1993)
[29] Notice here that we are assuming that every language needs to split away from another preexisting language. This is true in the overwhelming majority of cases. However, some cases of new languages arising ex novo are known, most saliently some modern sign languages. By ignoring these cases, we are assuming that their effect on the global population is minimal.
[30] P. Bak and M. Paczuski, Proc. Natl. Acad. Sci. U.S.A. 92, 6689 (1995)
[31] E. Lieberman, J.-B. Michel, J. Jackson, T. Tang, and M. A. Nowak, Nature 449, 713 (2007)
[32] http://en.wikipedia.org/wiki/World_population (retrieved on Dec. 17, 2009), based on median values from the US Department of the Census estimates, http://www.census.gov/ipc/www/worldhis.html.
[33] Simulations run with different numbers of initial languages showed that this number mattered little for the results obtained from around 8,000 BC onwards.
[34] The values of ρ were set to represent approximately one dialectal split every three generations (∼75 years) in the high condition, and slightly longer (∼85 years) in the low condition. The values of the remaining parameters were chosen through trial and error to represent different situations.
[35] For accurate estimates of the right tails, the densities were estimated using logarithmic bin widths. This has the side-effect of overestimating the densities in the leftmost bins.
[36] This is the more parsimonious power-law model: since the number of speakers of a language is always upper-bounded by the global population, a cutoff is required in any case.
[37] A. Clauset, C. R. Shalizi, and M. E. J. Newman, SIAM Rev. (2007)
[38] B. B. Mandelbrot and J. R. Wallis, Water Resour. Res. 5, 228 (1969)
[39] C. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, Phys. Rev. E 49, 1685 (1994)
[40] The DFA method deals automatically with non-stationarities in the mean, which were in any case very small.
[41] B. B. Mandelbrot and J. W. van Ness, SIAM Rev. 10, 422 (1968)
[42] J. Whitfield, PLoS Biol. 6, e186 (2008)
Figure 1. Schema of the model. Each of the circles represents a language or dialect, with their diameters reflecting their numbers of speakers. Each row of circles represents a point in time, and the solid arrows represent the genealogy of languages. The columns represent the evolution of individual languages. At each time point a language can split following a Poisson distribution with rate ρ. The competition among languages (horizontal dashed links) is governed by the escort parameter α. When one language becomes extinct, marked by the Xs in the diagram, it is removed from the pool. The proportional loss of evolutionary time (shaded circles) propagates up the tree from the extinction of a language, indicating the proportion of each population that is not represented in the final population.
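The Figure 1 caption states that, at each time point, a language can split following a Poisson distribution with rate ρ. A minimal sketch of that splitting step follows, assuming that ρ is the expected number of daughter languages per language per time step and that the genealogy is stored as a hypothetical child-to-mother dictionary. The function name and integer-id scheme are illustrative only; how speakers are divided on splitting, the α-governed competition, and extinction are not modelled here.

```python
import numpy as np


def split_step(living, parent, next_id, rho, rng):
    """One time step of splitting: each living language spawns Poisson(rho) daughters.

    Daughter ids are consecutive integers starting at `next_id`; each daughter's
    mother is recorded in `parent`. Returns the new list of living languages and
    the next unused id.
    """
    daughters = []
    for mother, k in zip(living, rng.poisson(rho, size=len(living))):
        for _ in range(k):
            parent[next_id] = mother
            daughters.append(next_id)
            next_id += 1
    return living + daughters, next_id
```

Calling it repeatedly with `rng = np.random.default_rng(seed)` grows the tree by roughly ρ times the number of living languages per step, before any extinction is applied.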
Figure 2. (a) Number of languages (lower lines) and total world population (top line) by year. Notice the vertical logarithmic scale. (b) Number of languages born/extinct by year (smoothed). (c) Mean value of the α fitness index across the population of languages. (d) Mean value of α weighted by the proportion of speakers of each language. (e) Evolution of the language entropy. (f) Evolution of the distribution of speakers across languages in simulation 4 (in log-log scale). (g) Preserved evolutionary time. Notice the vertical logarithmic scale. (h) Proportion of surviving families (of those that could have evolved from the year of reference).