Scaling in Ensembles and Networks

Robert Shour
Abstract This is an overview of three universal scaling laws: 1. The 4/3 Scaling Law, 2. The Ensemble Scaling Law, 3. The Natural Logarithm Scaling Law, with a focus on the Ensemble Scaling Law.
Introduction Three scaling laws appear to be universal and in some sense scale-invariant: 1. The 4/3 Scaling Law, which is also manifested in 3/4 metabolic scaling. 2. The Ensemble Scaling Law: R = ηr; R is the ensemble's collective rate for some process, η is the ensemble's entropy or degrees of freedom relative to its mean path length µ, and r = µ/δt is the ensemble's average individual rate. 3. The Natural Logarithm Scaling Law: the mean path length in steps in a homogeneously scaled ensemble is the natural logarithm. This article mainly discusses the Ensemble Scaling Law and, incidental to it, the Natural Logarithm Scaling Law. Adapting ensemble scaling law ideas to metabolic scaling leads to the 4/3 Scaling Law, mentioned near the conclusion of this article.
On problem solving and scaling laws In the ancient world, the consensus view, adopted in Ptolemy's Almagest, was that the Earth was stationary and the sun and planets revolved around the Earth. The consensus prevailed until the heliocentric point of view adopted by Copernicus about 1,400 years later. Nowadays five year old children know that the Earth revolves around the sun; a sun stationary relative to the Earth provides a simpler, more useful reference frame than a stationary Earth. Having regard for the disconnect between the perceived simplicity in modern times of some inferences made by physicists compared to their historically perceived difficulty, this article sets out some of the reasoning leading to the Ensemble Scaling Law, since I suspect that the three scaling laws discussed here will eventually be perceived as simple and obvious (unless someone proves them wrong). The development of ideas leading to these three scaling laws involved asking a question that led to unanticipated and seemingly unrelated questions, and regularly encountering mistakes and inconsistencies that required fixing, leading to solutions for problems that at the outset were not known to be problems at all. It is like being in a dark room, not knowing if it is a room, not knowing if it is dark, not knowing what to look for, and hoping for an answer without knowing if there is even a question: lost. It is hard to solve a problem that at the outset is not recognized as a problem and much easier with hindsight.
Naming scaling laws The name the 4/3 Scaling Law gives more precision than 'the 4/3 law', a name used to describe the law relating to wind eddies empirically inferred by Lewis Fry Richardson in 1926. The name Ensemble Scaling Law is better than 'Network Rate Theorem' used in my 2012 arXiv article on isotropy and energy scaling. Features of the law suggest it applies generally: ensemble is more encompassing than network. As well, 'ensemble' alludes to statistical mechanics, which plays a role in the Ensemble Scaling Law. The name Natural Logarithm Scaling Law is descriptive. Observation of scaling patterns in nature implies the three scaling laws; the term 'theorem' is inapplicable.
Scaling laws: common attributes The three scaling laws are universal. The 4/3 Scaling Law is scale invariant. The Ensemble Scaling Law describes a measure of system entropy and its relation to system capacities that applies regardless of size. The three scaling laws all involve energy, thermodynamics, statistical mechanics, degrees of freedom, and entropy; all are modeled by mathematical scaling. Since scaling is mathematically described using an exponent applied to a scale factor (or base of a logarithmic function), scaling implies that degrees of freedom play a role. Degrees of freedom are based on dimension. Size does not affect an ensemble's dimensionality. The common mathematical and dimensional attributes of the three scaling laws suggest that a common physical principle may underlie them.
Status of the three scaling laws Are these three scaling laws valid laws? This article sets out reasons for thinking they are.
The three scaling laws: relationships to each other The 4/3 Scaling Law is likely the most fundamental of the three scaling laws because it seems to relate to emergence (a system growing) and because it implies (unobviously) the Ensemble Scaling Law which in turn implies the Natural Logarithm Scaling Law. Adapting the Ensemble Scaling Law in 2008 implied the existence of 4/3 scaling. The 4/3 Scaling Law can be used to infer the existence of the Ensemble Scaling Law. Since each of the 4/3 Scaling Law and the Ensemble Scaling Law implies the other, the validity of one supports the validity of the other and suggests a common underlying physical principle. The 4/3 Scaling Law implies that the capacity of a system to transmit and to receive energy depends on available degrees of freedom, and, per dimension, a four dimensional system has greater capacity to transmit energy than a three dimensional system has to receive it. The Ensemble Scaling Law seems the most remarkable of the three laws because but for the specific lexical problem it addressed, I can’t imagine hypothesizing the relationship between degrees of freedom and ensemble capacity. The mean path length in its dual role as length and scale factor in the Ensemble Scaling Law implies that the mean path length for a homogeneously scaled system is the natural logarithm.
Problem solving and analogy Karl Popper wrote All Life is Problem Solving (Popper, 1999). The creation of spoken language involves collective problem solving: how to encode percepts and concepts in an organized way using organs of speech: lips, tongue, mouth, larynx. Figuring out the relationship between the lexical creativity of societies compared to the average lexical creativity of their constituent individual members is a particular problem considered in this article. George Pólya (1887 - 1985), a mathematician, wrote on mathematical problem solving (Pólya, 1954, 1962, 1957). He included as problem solving tools induction, generalization, specialization, analogy and variation, and gave examples. His observations apply not just to mathematics but also to the creation and evolution of language and generally to all problem solving. James Clerk Maxwell observed about analogical sets of ideas: "recognition of the formal analogy between the two systems of ideas leads to a knowledge of both, more profound than could be obtained by studying each system separately" (Maxwell, 1890, Vol. II, p. 647). It may be easier to detect certain underlying principles in one system than in another; recognizing that those principles operate in both systems can illuminate both systems, yielding the 'more profound' knowledge of which Maxwell spoke. Analogies can be found between mathematical models on one hand and physical phenomena, graphs and data on the other, to furnish a theory explanatory of observations. Learned activities such as walking, hunting, farming, manufacturing, sport techniques, dancing and problem solving generally involve analogy: if a person in a certain situation acts in a certain way, then another person in an analogical situation acting in an analogical way will achieve an analogical outcome. In devising a mathematical model of a given observation, analogy saves time and energy. A yet unsolved target problem can be solved if the circumstances are analogical to an applicable well-established source theory or already solved problem. Considerably less work may be required to find the analogical source solution than to instead solve the target problem from scratch; that is why calculus theorems are useful. Finding an applicable analogy can save energy. The experience of successfully using analogy to solve novel problems rewards analogical repetitions of the same problem solving techniques.
Problem solving sequences as evidence of validity Solutions that click into place in sequence, particularly after a series of mistakes, inconsistencies and failures, confer (sometimes mistakenly) increased plausibility; a solution advancing the problem appears not so much as a discovery as an inevitable encounter. This effect is amplified if the researcher does not know in advance what to expect or what to look for, as if what is sought instead finds the seeker, or as if the solution was always there waiting to be found. This article attempts to recreate parts of the problem solving sequences leading to the Ensemble Scaling Law.
Inference sets Denote as an inference set any collection which may include observations, data, perceptions, inferences (reasoning). The concept of number, an inference set, for example, includes the idea of counting identical things, the concept of countability, and the application of an inference: that it is convenient to provide names and symbols to differently sized groups. An inference set can include other inference sets. Inference sets can be hierarchical; that is an attribute of taxonomy or categorization. Mathematical theories, themselves inference sets, are sometimes built up from preliminary steps called lemmas, which are also inference sets.
In particular, parts of language can be considered parts of hierarchies of inference sets: phonemes, morphemes, alphabets, orthography, sentences, words, grammars, stories, books, encyclopedias.
Energy as a common currency for things and abstractions Energy is used to make ideas and collections of ideas. Devising theories requires energy. Forks and knives, buildings and bridges, automobiles and planes, books and movies, are realizations of energy invested in thought: ideas, with the application of energy, converted into things. Money enables people to acquire things that were produced by the expenditure of energy. Language is a tool (an artifact) for communication, produced by collective problem solving, which requires a collective investment of energy. Natural phenomena, organisms and space itself involve distribution and applications of energy. In making and using artifacts (including language), societies collectively appraise their efficiency; feedback leads to increases in efficiency. Organisms compete for survival; those that increase their energy efficiency increase their survivability. Cross-disciplinary analogies for ideas used to characterize theories, artifacts and natural phenomena arise because energy plays a role everywhere. Energy is a common currency for money, artifacts, and abstractions. Societies emergently and collectively seek the best return for energy invested in solving problems and in the creation of new abstractions. Economics applies to society's collective deployment of its energy resources in problem solving; "a potatoe-field should pay as well as a clover-field, and a clover-field as a turnip-field" (Jevons, 1879, Preface, p. liv). Analogously, promulgating a new idea should pay as well — or better, considering the added energy cost of learning a new idea — as increasing the efficiency of an existing idea, and similarly for artifacts.
Initial question and tacks 'Tis evident, that all the sciences have a relation, greater or less, to human nature' (Hume, 1888, p. xix). The problem solving sequence leading to the Ensemble Scaling Law began with this question about human nature: Why do people act violently? One answer is that they are under the influence of ideas. The first line of investigation involved reading about history and culture to learn what ideas might have such influence. After reading several books in 2001, I concluded that learning about relevant history and culture exceeded my abilities and available time (and probable lifespan). The initial tack failed. I narrowed the scope of investigation to the nature of inference: behind deliberate conduct is reasoning, either of the actor or of someone influencing the actor, or of both. But to analyze reasoning requires knowledge of the capacities and limitations of language, and so this second tack failed. Then I began to look at language but knew too little. This third tack failed. In problem solving, there is an advantage in accumulating a repertoire of failed hypotheses, assumptions and calculations: you know where not to go, narrowing the scope of investigation. In June 2002 I began to keep notes of observations about inference and language, hoping that the accumulation of observations might after a few years reveal a pattern, despite having no hypothesis and no endpoint in mind. I continued that process and in June 2005 read my notes: over 105,000 words. I reached a conclusion: I had wasted my time. Ironically, I now hope I was wrong in my conclusion then.
Inference and language My 2002 to 2005 notes explored the concept of inference sets, collective intelligence, and emergence. They also considered an analogy based on an idea of Michael Barnsley I read about in 1994. Barnsley had invented an algorithm to reduce the digital size of images (Barnsley and Hurd, 1993). I analogized the collective efforts of software engineers increasing the efficiency of data compression software to the collective efforts of societies increasing the efficiency of inference sets. Emergence (Thompson, 1945; Kauffman, 1993, 1995; Holland, 1998; Johnson, 2001; Gould, 2002) and complexity (Waldorp, 1992; Boccara, 2004) are concepts popularized in the 1990s. Complex systems — cells, organisms, brains, markets, societies — emerge from networked components that individually do not exhibit emergent attributes. The data compression software analogy, inference sets, emergence, and collective behavior combined imply that societies emergently collectively increase the efficiency of their store of inference sets. Histories of ideas such as calculus (Baron, 1969; Boyer, 1949), mathematics (Boyer, 1991; Cajori, 1993 originally 1928 and 1929) and biology (genetics), and economic progress (Romer, 1990) evidence that. Analogous observations apply to language based on historical linguistics (McMahon, 1994; Campbell, 1998). Emergence of more efficient inference sets is difficult to observe over periods of months and years and difficult to measure. That applies to language. In the past 60 years language facility has been described as an instinct (Pinker, 2000) with a grammar module (Chomsky, 1975) evolving in human brains, a point of view that minimizes the collective role of societies in emergently building languages. A grammar module is unlikely (Everett, 2017); speech is too recent for brains to have evolved a grammar module. Before the existence of words, the efficient use of which is improved by grammar, what would prompt or require the evolution of a grammar module? More likely, ‘Speech is a non-instinctive, acquired, “cultural” function’ (Sapir, 1921, p. 4).
Increasing average IQs hypothesis In early July 2005 I read an article that mentioned the Flynn Effect. The 'Flynn Effect' was called that by Richard Herrnstein and Charles Murray (Herrnstein and Murray, 1994, p. 307) due to the role of James Flynn in focusing attention on the increase in average IQ test scores first identified in the 1930s. The average rate of increase in IQs was in the vicinity of 3% per decade. There was no consensus on why average IQs increase. I had read about increasing average IQs before, but now read this observation in the context of having just read my notes about societies increasing the efficiency of their inference sets. The notes emergently implied that average IQs may increase because more efficient inference sets enable more efficient problem solving. Whether increasing efficiency of inference sets explains increasing IQs seemed to me to be a distraction from the initial question about human behavior. (Wrong again, as it turned out.) Then I wondered: is there a way to show a correlation between increasing average IQs and the increasing efficiency of inference sets? Directly measuring increases in the efficiency of existing ideas would be difficult. Instead, based on the economic point of view above, infer that the rate at which ideas increase in number is the same as the rate at which existing ideas become more efficient; society would collectively allocate problem solving energy resources to obtain the highest return.
What inference sets increase in efficiency like IQs? What data might be proxy for the rate of increase in the number of inference sets? The number of magazines, books, novels, or newspapers sold or words written or read? The number of pages in printed books or encyclopedias? The number of new patents issued? The number of new mathematics theorems or scientific discoveries? How can the discreteness of theorems and discoveries be measured in a way that makes them countable? Finding tabulated totals for such data might be difficult. More pointedly, none of these are necessarily connected to society-wide collective emergent problem solving efforts that increase the efficiency of inference sets. The problem solving rates of small subsets of society are not necessarily representative of a whole society's average problem solving rate. In July 2005, after a few weeks I arrived at the sizes of lexicons as possible proxies for measuring the rate of increase in inference sets. The number of words increases because the number of inference sets increases. Words are countable. Word counts can be compared provided the criteria of what counts as a word for different compilations of words are consistent. Lexicons include a large number of solved problems: how to make phonemes, which of the available phonemes to use, what rules apply to joining phonemes together, what musicality applies to phonemes chosen for encoding, how to use phonemes to encode percepts and concepts, how to join words into phrases, sentences and stories. Problems involved in building a language involve lots of problem solvers — the whole society speaking the language over many generations. Because of the large number of solved problems involved in building a language and the large number of problem solvers, the rate of increase of a lexicon is a good proxy candidate to use to calculate the rate of increase in the efficiency of collective problem solving. The internet provided some information about lexicon sizes. 'The vocabulary has grown from the 50,000 to 60,000 words in Old English to the tremendous number of entries – 650,000 to 750,000 – in an unabridged dictionary of today' (Encyclopedia Americana, Volume 10. Grolier, 1999). Old English may be dated to 500 or so in the common era. The historical English lexical growth rate worked out to be roughly 3.39% per decade. This seemed too close to the 3.3% per decade or so rate of increase in average IQs to be a mere coincidence.
Lexicostatistics Are there studies of other kinds of lexical growth rates that can be compared to the English lexical growth rate, to test the 3.39% per decade rate of increase estimate? David Crystal's book on language mentions lexicostatistics (Crystal, 2005, p. 333), the study of lexical change. Morris Swadesh measured, by studying historical records, how many of a list of words not culturally dependent (such as you, he, we, they, this, that, there) were still in common (cognates) for two related languages after a thousand years; he said it was about 86% (Swadesh, 1971, p. 276). Swadesh called his study of lexical change glottochronology. Swadesh noted the divergence rate of two related languages from each other was — based on his 86% per thousand year finding — not more than 14% per thousand years (Swadesh, 1971). Using that rate, he extrapolated backward in time and estimated that Indo-European began at least 7,000 years earlier, that is, at least 7,000 years before 1966 when he wrote. Swadesh's 7,000 year estimate can be improved by a recent study. Using computer programs, in 2003 Gray and Atkinson estimated that English's ancestral language began 8,700 years ago (Gray and Atkinson, 2003). Since 37 years separate Swadesh's 1966 estimate of the origin of Indo-European at least 7,000 years ago and Gray and Atkinson's 2003 estimate of an origin 8,700 years ago (so that Swadesh's estimate corresponds to at least 7,037 years before 2003), update Swadesh's estimate of the rate of divergence of two daughter languages as

(7037/8700) × 14% = 11.32% (1)

per thousand years. If the same collective problem solving that applies to growing lexicons applies to the two diverging lexicons of sister languages, then each of the diverging languages should on average separate from their common source at the rate of 1/2 × 11.32% = 5.66% per thousand years, much more slowly than English lexical growth. This presents a new problem. This new problem leads to the Ensemble Scaling Law.
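A quick numeric check of the update in equation (1) and the halving step, using only figures from the text above:

```python
# Sketch: updating Swadesh's divergence rate with Gray and Atkinson's date.
swadesh_rate = 14.0          # % per thousand years, Swadesh's divergence ceiling
swadesh_age = 7000 + 37      # Swadesh's 1966 estimate, restated relative to 2003
gray_atkinson_age = 8700     # Gray and Atkinson's 2003 estimate

updated_rate = swadesh_rate * swadesh_age / gray_atkinson_age
print(round(updated_rate, 2))      # 11.32 (% per thousand years, equation (1))
print(round(updated_rate / 2, 2))  # 5.66  (each daughter's separation rate)
```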
Why do these lexical rates of change differ? The average 3.39% per decade English lexical growth rate plausibly results from collective problem solving applied to devising words for new ideas, and indirectly measures increases in the number of inference sets, which are represented by new words. If the same process of collective problem solving applies to modify existing words, how could these two rates — on one hand average English lexical growth of 3.39% per decade and on the other hand half the rate of lexical divergence for related languages, 5.66% per thousand years — differ so much? The average English lexical growth rate (3.39% per decade is 339% per thousand years) is about 60 times greater than 5.66% per thousand years. Perhaps the rates are measuring different effects? Perhaps 5.66% per thousand years is a kind of 'fossil rate' embedded in the English lexical growth rate?
Ask experts By September 2005 these problems had arisen: • Does increasing inference set efficiency account for increasing average IQs? • Are measurements of English lexical growth rates reliable statistics? • Why does half the rate of lexical divergence of daughter languages differ from the English lexical growth rate? • Glottochronology can estimate the age of the ancestral Indo-European language. Can the English lexical growth rate be used to estimate the age of language itself: when language was invented? If the rate of increases in the efficiency of inference sets is almost constant, and if human physiology is constant over tens of thousands of years, it might be possible to estimate when language began, far before historical records came into existence. • Suppose the possible fossil rate of change in language suggested by glottochronology, 5.66% per thousand years, was applied to the size of the English lexicon in 1989, 616,500 words, in the Oxford English Dictionary (OED) (Simpson and Weiner, 1989)? Can that estimate when grammatical language began? In September 2005, I wondered, had increasing compression of information contained in language ever been proposed as an explanation for increasing average IQs? The basis for the question was: If language and symbolic communication over time increase in conceptual density, then the ability of speakers to juggle concepts increases over time; . . . it becomes easier to be “smarter”. Maybe I should ask experts? If research is preparation for future publication, asking questions has some risks; even if not intended, posing the question to others could put the question and proposed solutions for it into the public domain. But posing the question to an expert seemed likely to be faster than searching the literature and future publication was not, at the time, a consideration.
On September 19, 2005, I emailed a friend, a professor at the University of Waterloo, setting out the basis for the question; he forwarded the question to a colleague. His colleague replied: 'I've heard various cognitive explanations, as the Wikipedia article summarizes, but not the linguistic change one.' On September 20, 2005 another professor, an expert in the area, answered my emails similarly and expressed interest in these ideas. It thus appeared that the hypothesis was novel; but was there evidence for it?
English lexicon sizes To calculate English lexical growth rates requires historical English lexicon sizes. Dr. Traxel (University of Münster) helpfully replied to a September 2005 email from me that the Old English lexicon is: '. . . from 24,000 (M. Scheler, Der englische Wortschatz, Grundlagen der Anglistik und Amerikanistik 9 (Berlin, 1977), p. 14) to 30,000 (H. Gneuss, 'The Old English Language', in The Cambridge Companion to Old English Literature, ed. M. Godden and M. Lapidge (Cambridge, 1991), pp. 23-54, at p. 39). But bear in mind that these are simply the extant words.' Gneuss was a good starting place. Bosworth in 1898 mentioned an estimate of 38,000 English words of which 23,000 were of Anglo-Saxon origin (Bosworth, 1898, Preface, p. iv). After Bosworth's death Toller supplemented Bosworth's dictionary (Toller, 1921). The University of Toronto's Dictionary of Old English (DOE) covers the years 600 to 1150. Letter A was released in 1994. The letter I is being prepared for publication in 2018 (University of Toronto, 2017). Since DOE is intended to complement the OED (Simpson and Weiner, 1989), word counts are likely determined on a comparable basis. By October 9, 2005 DOE had completed A through F: 11,052 words. Extrapolating to 24 letters gives 37,892 words, close to Bosworth's 38,000 estimate. By September 2008, DOE had added G; extrapolating to 24 letters gives 37,113 words. My 2005 and 2006 spreadsheets used 37,892; with the addition of G, 37,113 words may be a better estimate of the size of the lexicon. The University of Toronto also has The Early Modern English Dictionaries Database (Lancashire, 1999) with about 200,000 words for 1530 to 1657. The OED in 1989 had 616,500 words (Simpson and Weiner, 1989). My 2005 calculations did not refer to the University of Michigan's Middle English dictionary, based on a large corpus from 1100 to 1500, with 54,081 entries; it is not used in this article either, mainly because I did not know about it then and did not want to redo spreadsheet calculations now. It may be worth investigating.
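The extrapolation arithmetic is reproduced if A through F is counted as 7 of 24 Old English letters; my assumption (not stated in the text) is that DOE treats Æ as a separate letter between A and B. A minimal sketch:

```python
# Sketch of the DOE extrapolation; assumes A..F spans 7 letters (A, Æ, B, C, D, E, F)
# out of 24 letters in total, and that the total is truncated rather than rounded.
words_a_to_f = 11052
letters_done = 7
letters_total = 24

print(words_a_to_f * letters_total // letters_done)  # 37892, matching the text
```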
English lexical growth rates Increasing to 616,500 words at 1989 gives a per decade English lexical growth rate of: • 37,113 words at 1150: 3.35%. • 37,892 words at 1150: 3.32%. • 200,000 words at 1657: 3.39%. Using 1657 as a starting point is probably better than using 1150, since more recent writings are likely more comprehensive than writings hundreds of years older. Even if the decision as to what counts as a word is consistent, comparable rates for other kinds of collective problem solving would give more confidence that English lexical growth rates correspond to the rate of increase in average IQs.
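These figures can be reproduced by reading the rates as continuously compounded averages, rate = ln(N2/N1)/decades; the compounding convention is my inference from the numbers, not stated in the text. A minimal sketch:

```python
import math

def decade_rate(n1, year1, n2, year2):
    """Average continuously compounded growth rate, in % per decade."""
    decades = (year2 - year1) / 10
    return 100 * math.log(n2 / n1) / decades

print(round(decade_rate(37113, 1150, 616500, 1989), 2))   # 3.35
print(round(decade_rate(37892, 1150, 616500, 1989), 2))   # 3.32
print(round(decade_rate(200000, 1657, 616500, 1989), 2))  # 3.39
```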
The log hypothesis A child is born knowing nothing, more or less. Suppose the child acquires all knowledge from its parents and via parents from ancestors. Assume each generation contributes an equal amount to the child's acquired knowledge. Then the knowledge the child receives would be log_2(n) = k, for k generations of knowledge, times the average individual knowledge of parents and ancestors totaling n in number: a child networks with historical knowledge possessed by ancestors. It is a simplified way to model the acquisition of knowledge — the acquisition of a store of solved problems. What is the base of the logarithmic function? If only a hierarchy of ancestors contributed knowledge the base would be 2. A child, however, has companions and contemporaries other than parents. The base of the log is not 2, but what is it? This suggests exploring other bases; looking for numerical bases other than 2 turned out to be hopeless, but hopeless in interesting ways.
Band size In 2006 I reasoned: humans in a group communicate daily with each other. Over time efficiencies in conveying information develop. Improved abstractions disseminate in the band. Slightly more efficient abstractions emerge. The average size of a paleolithic hunter band may have been about 50 before language and 150 after language (Dunbar, 1997); Dunbar supposes that language as a grooming analog enables tripling band size. Perhaps 3, 50, 85 or 150 is the number of people who can comfortably converse with each other. Here anthropological studies link to a mathematical problem about networks (I would not have supposed that possible); everything is connected.
By October 2006 I had a vague, chancy hypothesis that the log of band size is a factor in the rate of increase in abstraction efficiency.
Spreadsheets in vain From October 3, 2005 to May 2007, the spreadsheets I set up included: • October 10, 2005: Can English lexical growth rates estimate when Indo-European began? Calculating rates of lexical increase for 38,000 to 60,000 words at the year 600, 200,000 words at the year 1657 and 616,000 to 1,250,000 words at the years 1989 and 2000 gave increases of 1.1% to 5.3% per decade. Projecting at those rates back to one word gave 228 to 966 decades ago. The calculations were wrong because the Old English dictionary included words accumulated from 600 to 1150; the 38,000 Old English words should be dated to 1150, not the year 600, an error that took me two years to identify. • Suppose language began one half, one, two or five million years ago leading to OED's 616,500 words in 1989. How does the rate of change compare to 3.39% per decade? For one half million years ago, the rate would be 0.027% per decade, or 2.7% per thousand years. The per thousand year rate seemed closer to the 5.6% per thousand year rate based on Swadesh's ideas. • Using 50, 85 and 150 as bases in logarithmic functions in spreadsheets in 2006 did not give 3.39% per decade. • Mistaken assumptions incorporated into the spreadsheets implied that the rate of lexical increase from 600 to 1657 was 1.6%; the mistakes took two years to identify and correct. The spreadsheets were littered with mistaken assumptions and analyses. The spreadsheet rates, however, turned out to be helpful guides. But spreadsheets failed as a substitute for mathematical analysis. By May 2007 I was ready to give up (again).
The mean path length solution Perhaps the required function was not logarithmic, perhaps it was too complex for me, or perhaps even too complex for a human being, to understand. At the end of May 2007 I resolved on a final attempt to find a scale factor, if there was one. Suppose the collective problem solving rate R of a society with n members multiplied the average individual problem solving rate r such that R = log_b(n) × r for some base b not 2, 50, 85 or 150. I finally realized that in principle no number at all representing acquaintances could work as a base. The more people an individual knows, the larger b would be. But the larger b is, the smaller is the knowledge multiplier log_b(n) = η(n). (I now use Greek η because it sounds similar to the start of the word entropy.) It cannot be that knowing more people reduces the knowledge benefit of networking. But if the network rate multiplier is logarithmic, then when the base b increases (more acquaintances), log_b(n) would decrease (less information). What base b increases the value of log_b(n) when b decreases? If the time it takes for people to connect decreases, then the information supply to an individual should increase. The average (mean) time of connection is proportional to the mean path length µ, measured in steps, between members of a network. Thus amend the hypothesis to: R = η(pop) × η(lex) × r, where, for a given date, pop is the number of English speaking people and lex is the number of words in the English lexicon, and the bases for the log functions of population and lexicon are their respective mean path lengths. Is η(n) = log_µ(n) for a network? Almost.
Clustering coefficient The mean path length µ applies equally to all nodes. Using an average as a base treats social and lexical networks as, in the average, homogeneously scaled. But network nodes have closer connections to some nodes than others; not all acquaintances are friends. The preference of nodes for certain other nodes can be characterized using a clustering coefficient. The clustering coefficient Ci for the ith node measures what proportion of the nodes adjacent to it actually connect to it in one step — not via intermediaries. A network's clustering coefficient C is the average of all Ci. The function sought, η(n) = C log_µ(n), can be calculated if n, µ and C are known for a network. Why C and not 1/C in front of log_µ(n)? Suppose µ^a = n, so log_µ(n) = a, where a represents a levels of scaling. Since only an average proportion 0 < C < 1 of nodes connect, energy not used to connect to one step away nodes can be used to connect to farther away nodes: there are not a hierarchical levels (a being the exponent of µ), but rather a/C > a hierarchically scaled levels. So η(n) = C log_µ(n). Equivalently, energy moves on only a proportion Ci of the possible one step paths to adjacent nodes, and so the available energy can go a farther distance, to a/C levels. Of the energy available to move through log_µ(n) levels, only a proportion C of that energy is used for the average number of adjacent nodes.
Mean path length scaling implications Take the network’s mean path length µ to be the distance between all pairs of nodes to simplify analysis. Assume the mean is the actual path length. Either node in a pair can transmit information to the other. A single path for bi-directional transfer of information would seem to be efficient. The bi-directionality implies that the energy to receive and the energy to transmit are the same.
However, metabolic scaling reveals bi-directionality to be mistaken; the two system aspect of the 4/3 Scaling Law is mentioned later in this article. Transmitters are in four dimensional systems (three spatial dimensions and one dimension for the flow of information) and receivers are in three dimensional spaces. There are likewise two distinct networks within social networks. What obscures the existence of the two contemporaneous networks is that the same person can — at different times — be in one of the two networks. Zipf presciently treated speakers and hearers as distinct groups with different priorities (Zipf, 1949).
Mean path length data A 1998 study of 225,226 networked actors found µ = 3.65 and C = 0.79 (Watts and Strogatz, 1998). Do values for µ and C apply to networks in different places, eras and cultures? For a mean path length, applying the 4/3 Scaling Law to the Natural Logarithm later in this article suggests yes. Perhaps that applies to the clustering coefficient C too; I don't know. In the following I will use the 1998 study's value for C. A 2001 study on about 3/4 of a million words appearing in the British National Corpus, 70 million words of written English (Ferrer i Cancho and Solé, 2001), found µ = 2.67 and C = 0.437. A 2002 study based on an online English thesaurus of about 30,000 words found µ = 3.16 and C = 0.53 (Motter et al, 2002). What follows uses the 2001 study, with its larger sample size more representative of usage than a thesaurus.
Population data Both η(pop) and η(lex) require numbers. For lexicon sizes I use the numbers above. For English population at 1150, Hinde remarked (Hinde, 2003, p. 28) that the English population was 1.6 to 1.7 million in 1086 (Domesday) and estimated (p. 24) English population growth of 0.5% per year for 1086 to 1348. Similar estimates for Domesday are in (Sawyer, 1998, p. 149), (Grigg, 1980, p. 53), and (Snooks, 1995, p. 34). Using these sources, estimate 2,300,000 English speakers in 1150. The estimated English population in 1656 was 5,281,347 (Wrigley et al, 1989, Table 7.8); use that figure for 1657, the end date of the Middle English dictionary. For native English speakers in 1989, add censuses: for the USA, 248,000,000 in 1990 (Perry and Mackun, 2001); Canada 27,296,859 in 1991 (Sta, 1996); England 50,748,000 in 1991 (UKC, 2003); Australia, 16,850,540 in 1991 (Aus, 1993, p. 11): total 343,595,000 people, round up to 350,000,000.
Finding the individual rate using network entropy From June 2007 to August 2009 finding average r included: • Adjusting Swadesh's 14% per thousand year rate using Gray and Atkinson's computer-based (Gray and Atkinson, 2003) results, giving 11.32% per thousand years. Half that is 5.66% per thousand years. • Using end dates not start dates to date lexicons in dictionaries. • Recognizing that to calculate R and r averaged for a period of time t1 to t2 requires calculating η(pop) and η(lex) averaged over the same period of time. • At 1657: η(pop) = η(15,000,000) = 10.082; η(lex) = η(200,000) = 5.623. • At 1989: η(pop) = η(350,000,000) = 12.04; η(lex) = η(616,500) = 5.932. • Using averages for η(pop) and η(lex) for 1657 to 1989 and R = 3.39% per decade in

R = η(pop) × η(lex) × r (2)

gives average r = 5.6% per thousand years. In 2007 I divided 5.6% per thousand years by two on the incorrect assumption that at a beginning time η was 0, an error corrected in the summer of 2009. Using the incorrect assumption, in 2007 I inferred that the ratio between the rate of divergence of two daughter languages and the average individual nodal rate r was 4 : 1 instead of the correct 2 : 1. The explanation for the 2 : 1 ratio is set out later in this article. In June 2007 the apparent whole number relationship between r and the rate at which languages diverged persuaded me to continue investigating these ideas; the formula for η seemed to know the answer but I did not. And what was the formula C log_µ(n)?
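A minimal sketch of the entropy calculations, taking η(n) = C ln(n)/ln(µ) with the social network values (µ = 3.65, C = 0.79) for η(pop) and the word network values (µ = 2.67, C = 0.437) for η(lex); the pairing is my reading of the text. It closely reproduces most of the listed values:

```python
import math

def eta(n, mu, C):
    """Network entropy: degrees of freedom relative to the mean path length."""
    return C * math.log(n) / math.log(mu)

# Social network (Watts and Strogatz, 1998): mu = 3.65, C = 0.79
# Word network (Ferrer i Cancho and Sole, 2001): mu = 2.67, C = 0.437
print(round(eta(15_000_000, 3.65, 0.79), 3))   # 10.082 (eta(pop) at 1657)
print(round(eta(350_000_000, 3.65, 0.79), 2))  # ~12.0  (article gives 12.04, at 1989)
print(round(eta(616_500, 2.67, 0.437), 3))     # 5.932  (eta(lex) at 1989)
```

Solving equation (2) for r with the article's averaged entropies for 1657 to 1989 gives r = R/(η(pop) × η(lex)) ≈ 3.39/64 ≈ 0.05% per decade, i.e. roughly 5 to 6% per thousand years, in line with the 5.6% figure in the text.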
Robustness of rate calculations In spreadsheets from October 2005 to May 2007 varying the number of words at starting and end dates had a relatively immaterial effect on the calculated lexical growth rates. The results were robust because of the logarithmic formulas.
Lexical rate problems, 2007 Might the match between the rate at which average IQs increase and the rate of increase in the English lexicon be coincidence? The plausibility of the hypothesis that increasing average IQs reflect the increasing efficiency of inference sets depends also on accepting that: • The rate of increase in the size of a lexicon should match the rate of increase in the efficiency of existing concepts.
• Increasing the efficiency of existing inference sets helps solve problems collectively. • The available data is reliable enough to give good estimates of rates of increase in the efficiency of inference sets. Different English dictionaries have different word counts. Comparing word counts for dictionaries compiled at different dates requires that the compared dictionaries have the same criteria for what counts as a word. Both DOE and OED are academically grounded dictionaries, probably with similar if not the same criteria for what counts as a word. The robustness of calculated average rates (due to the role of the logarithmic function) is helpful. The larger the lexicon and the longer the period of time for different historical lexicons, the more reliable are calculations of average rates. The remarkable concurrence between half the rate of divergence for daughter languages, 5.66% per thousand years, and the rate 5.6% per thousand years for r calculated using η gives much assurance. But why do they match?
Exponent implications in July 2007 η(pop) × η(lex) = 12.04 × 5.932 = 71.42 at 1989 represents 71.42 scaled hierarchies — orders of magnitude relative to the mean path lengths for a population and a lexicon. English speaking society's 1989 collective problem solving capacity is 71.42 orders of magnitude in degrees of freedom greater than that of an individual who is unnetworked and without language. The relationship R = η(pop) × η(lex) × r accounts for the role of language in increasing the number of collectively solved problems. η(lex), the entropy of the lexicon, as proxy for the entropy of the cumulative store of collectively solved problems, enormously increases society's collective problem solving degrees of freedom. For English speakers, η(lex) multiplies collective problem solving capacity (absent language) by a factor of 5.932. A year contains 31,536,000 seconds, about 3.2 × 10^7 seconds. The current estimated age of the universe is 13.7 billion years: 1.37 × 10^10 years. In seconds, the age of the universe is 3.2 × 10^7 × 1.37 × 10^10 ≈ 4.4 × 10^17. To represent the age of the universe in seconds using a power of 10 requires an exponent of 17. To represent how much greater English speaking society's problem solving degrees of freedom at 1989 are than an individual's requires an exponent of 71.42. Even considering that an individual learns and acquires some proportion of society's problem solving capacity and knowledge, compared to an individual's problem solving capacity the cumulative problem solving capacity of all humans who ever lived would seem omniscient. The collective capacity pre-exists and outlasts every person. After a few weeks of considering these observations in July 2007 (which seemed overwhelming) I decided to instead investigate C log_µ(n).
Problems with network entropy The initial hypothesis about a child acquiring knowledge — adding solved problems to a store of solved problems — can be visualized as a scaled pyramid of linked nodes. In June and July 2007 a pyramid model implied these problems: • Counting problem. For µ a scale factor, µ^k = n for some k, for n nodes in a network. Each node has potential clustered sources of information µ + µ^2 + . . . + µ^k. Each node would have more information sources than nodes in the network. If the network receives n units of energy per time unit for each generation of information transmission involving a particular node, not enough energy exists to transmit that information to the recipient. What model resolves this? This problem is analogous to problems raised by the ergodic hypothesis. • Commensurability problem. How can the mean path length, a measure of distance, scale n, a population size? • Categorization problem. Is a mean path length a distance or a scale factor? • The n − 1 problem. If a node receives information from the rest of the network, the argument of log should be n − 1 not n, unless the node transmits, impossibly, new information to itself (perhaps related to a self-energy problem). • The connection problem. How would nodes in such a network connect?
Information theory and thermodynamics I thought articles about information (MacKay, 2003; Shannon, 1948), networks (Watts and Strogatz, 1998; Latora and Marchiori, 2001) and emergence (Waldorp, 1992; Kauffman, 1993; Holland, 1998) might assist in solving problems with η. Information entropy is defined by a logarithmic function with base 2. The formula for η resembles that for information entropy. To learn more about entropy I began reading about thermodynamics (Longair, 2003; Allen and Maxwell, 1948) and statistical mechanics (Gibbs, 1914; Tolman, 1942). By late 2007 it was apparent that C log_µ(n) is the entropy of a network based on a network statistic, the mean path length µ. Shannon noted that information entropy was maximized when the probability of two symbols is equal (Shannon, 1948). Khinchin points out (Khinchin, 1957) that Jensen's inequality about concave and convex functions (Jensen, 1906) implies Shannon's observation. The scale factor being the mean path length µ implies homogeneous scaling and isotropy on average.
Flattened pyramid The counting, n − 1 and connection problems were resolved, due to persistence and luck, in December 2007 by flattening the hierarchically scaled pyramid model. Each scaled group of nodes is contained in the next scaled up group of nodes for all k levels; n energy units can reach the n nodes of k levels all at once, because of the nestedness of clusters. The smallest cluster would have a number of nodes proportional to µ, the next smallest µ^2 and so on, while the number of clusters represented by the exponent of the scale factor would go down; the scale factor µ^(k−1) and the mean path length µ multiplied together gave µ^k = n. There are not k levels of the pyramid but rather a single level of nested sets of nodes with n nodes. Each node is in k nested hierarchies, solving the n − 1 problem. As for the connection problem, it is the capacity to connect to other nodes, not actually connecting, that matters. A couple of years later this led to characterizing network entropy as network degrees of freedom relative to network mean path length. The commensurability and categorization problems are addressed later in this article in considering the dual roles of mean path lengths.
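A toy numeric illustration of the flattened, nested-cluster model (the values of µ and k here are arbitrary choices of mine):

```python
# Toy version of the flattened pyramid: n = mu**k nodes; the nested cluster
# sizes are mu, mu**2, ..., mu**k, and every node lies in k nested clusters.
mu, k = 3, 4          # arbitrary example values
n = mu ** k           # 81 nodes

cluster_sizes = [mu ** level for level in range(1, k + 1)]
print(cluster_sizes)              # [3, 9, 27, 81]
print(cluster_sizes[-1] == n)     # True: the largest cluster is the whole network
print(mu ** (k - 1) * mu == n)    # True: scale factor mu**(k-1) times length mu
```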
Metcalfe's Law Metcalfe's Law estimates that the value of a communications network grows as the square of the number of users. A 2006 article (Briscoe et al, 2006) derived an estimate that growth increased instead by n log(n). Calculating network growth in 2007 for the same problem using the η formula above gives:

η2 − η1 ∝ C log_µ(n2) − C log_µ(n1) = C log_µ(n2/n1). (3)

Like the 2006 article this involves a logarithmic function, but the derivation is much shorter. Would η give the result that efficiently if it were invalid? Maybe. If equation (3) is correct then network entropy η has powerful problem solving utility.
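Equation (3) implies the entropy gain depends only on the ratio n2/n1, so, for instance, doubling a network adds the same entropy at any size. A minimal check, borrowing the word network values µ = 2.67 and C = 0.437 used earlier (that pairing is my choice, for illustration):

```python
import math

def eta_gain(n1, n2, mu=2.67, C=0.437):
    """Entropy gain from growing a network from n1 to n2 nodes, per equation (3)."""
    return C * math.log(n2 / n1) / math.log(mu)

# Doubling the network yields the same gain whether the network is small or large:
print(round(eta_gain(1_000, 2_000), 4))          # 0.3084
print(round(eta_gain(1_000_000, 2_000_000), 4))  # 0.3084
```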
Natural logarithm scaling In the Ensemble Scaling Law involving R and r, the formula log_µ(n) × µ is implicit, since r = µ/t for information or energy proportional to µ. The mean path length µ has dual roles, as a scale factor — base of the logarithmic function — and as a length which log_µ(n) multiplies. Using another symbol such as s for the base of the logarithmic function dispenses with the ambiguity of µ's roles, but the dual role itself is revealing. Choosing the length µ in log_s(n) × µ, where s^k µ = n, induces the value of k.
Calculus courses teach that e^x is its own derivative. It had not occurred to me that a physical principle might lie beneath the natural logarithm. The dual role of µ as scale factor and length implies that the rate of change of a scale factor µ^k is proportional to µ, implying that the mean path length equals the natural logarithm. The mean path length is by definition the same for all nodes. If a system's actual path length for all pairs of nodes is the same — equivalent to assuming a constant speed for all nodes in the system — then the mean path length for a homogeneously scaled, and therefore isotropic, network must equal the natural logarithm. Using the mean path length of a system both to scale it and as a length helps reveal the natural logarithm's connection to isotropy. The ambiguity of the role of the mean path length, treated as a feature not a defect, seems to be ingeniously exploited by Nature. If energy distribution in networks and systems, such as the atmosphere or space, is actually or on average uniform (isotropic), then the natural logarithm should figure in modeling phenomena. This would account for the ubiquity of the natural logarithm in the natural sciences. The natural logarithm is also a kind of compromise mean path length, reconciling competing features of the mean path length. On one hand a smaller base — and hence a length requiring less energy to traverse — increases entropy log_µ(n), so increasing the capacity of an ensemble and hence the benefit of networking. On the other hand, the advantage of an individual's longer path length can be a proportionally larger economic output. Equalizing the base of η increases collective output; but an individual prefers to receive the benefit of its larger output. This relates to an economic policy conundrum: does equality mean proportionality (output proportional to path length) or does it mean equalizing outputs by aiming for averaging inputs or outputs? Natural logarithm scaling is one of my favorite aspects of the Ensemble Scaling Law, partly because it arrived unsought. Had the Ensemble Scaling Law been a known law, the isotropic scaling of the natural logarithm would have been routinely mentioned, which was not the case.
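One way to make the final step explicit (my gloss, not a derivation given in the text): treating the number of scaling levels k as the variable,

d/dk (µ^k) = µ^k ln(µ),

so requiring the rate of change of the scale factor to equal µ^k itself, with proportionality constant 1, forces ln(µ) = 1, that is µ = e ≈ 2.718. The measured mean path lengths cited above (2.67 for the word network, 3.16 for the thesaurus, 3.65 for the actor network) cluster around e, consistent with this reading.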
Testing mean path length scaling In February 2008, I posted an article on lexical growth on arXiv. More evidence for a 3.39% per decade increase in efficiency of inference sets seemed desirable, to reduce the chance that the rate of increasing average IQs matched the rate of English lexical growth by coincidence. In 2008, the role of the mean path length as scale factor seemed crucial, but eventually it emerged that more important was the concept of degrees of freedom; calculating degrees of freedom relative to the mean path length permitted use of measurements of mean path lengths for various systems. In March of 2008, in that context, I wondered: could the validity of lexical scaling ideas be tested by looking for scaling in other contexts? John Whitfield had recently authored a book on biological scaling (Whitfield, 2006). His book discussed metabolic scaling and referred to an important article (West et al, 1997) - WBE 1997. The 4/3 Scaling Law emerged from attempting to apply lexical scaling ideas to metabolic scaling and is a story itself. Sometime after, I happened across an article by William Nordhaus on how lighting increases in efficiency.
Nordhaus's lighting study William Nordhaus is a professor of economics at Yale. He co-authored later editions of Paul Samuelson's university economics textbook. In investigating whether price indices omit technological advances that improve economic well-being, in 1991 he studied how much lighting has improved in its long history (Nordhaus, 1994). I read about Nordhaus's lighting study in November 2008 and wondered if it might contain implicit observations about the increasing efficiency of inference sets. Nordhaus measured lumens (a measure of brightness) given off by a sesame oil lamp typical of Babylonia in 1750 BCE (Nordhaus, 1994, p. 64) with a Minolta illuminance meter. He calculated the labor cost for 1000 lumen-hours: 41.5 hours of work in 1750 BCE. The labor cost for 1000 lumen-hours in 1992 was 0.000119 hours of work (Nordhaus, 1994, Table 1.6). He remarks 'an hour's work today will buy about 350,000 times as much illumination as could be bought in early Babylonia' (Nordhaus, 1994, p. 33), no calculation shown. This must be based on the ratio of labor costs of lumens at 1992 compared to 1750 BCE: 41.5/0.000119 = 348,739.5. Over the span of 3,742 years, the 348,739.5 times increase in efficiency is 3.41% per decade, about one half of one per cent higher than the 3.39% found for the English lexical growth rate from 1657 to 1989. Nordhaus's study requires accurate estimates of long ago labor costs and lighting efficiencies, which sounds challenging. The matching rates of increase in the efficiency of lighting and of inference sets (for which lexical growth is a proxy) reinforce the reliability of rates calculated for each. Is reliance on Nordhaus's paper cherry picking data? Nordhaus's study reflects increases in the efficiency of inference sets used to make lighting — ideas converted to usable artifacts. There is no cherry picking because I had no surfeit of collective problem solving data to pick and choose from; it was tough to find instances, such as lexical growth, for which there is data. A promising field for such data is in economics, where lots of things are measured; η seems to work when applied to economic growth rates (Shour, 2009). More collectively built countable inference sets likely exist, but I did not find many. Not everyone in society is a lighting inventor; only a small proportion of society invents. How can a small subset of lighting inventors reflect a broadly based collective problem solving rate?
Here are two possible answers; perhaps both apply. First, collective problem solving applied to lighting reflects society testing (using) and evaluating which lighting technology is better. All society participates in solving the problem of which technology is more efficient. Second, the rate at which inference sets increase in general may in particular increase the problem solving abilities of inventors. Similarly for language, all society is involved in testing and evaluating which phonemes, morphemes, words, expressions and grammatical rules are most useful and efficient. With more efficient linguistic tools, problem solvers can increase their problem solving efficiency.
Error rates How reliable are rates such as the 3.41% per decade based on Nordhaus's study? Suppose the labor lighting cost at 1992 is 10% too high or too low. For 3,742 years the error in estimating the average rate of increase in efficiency is less than nine-tenths of one per cent of the 3.41% per decade estimated based on Nordhaus's study. For 332 years from 1657 to 1989, the error for the 1989 size of the English lexicon being high or low by 10% is about 9% of 3.39%. For a longer period an error in lexicon size has less effect on estimating the rate of increase. Similar observations apply to start date estimated amounts.
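These error bounds follow from the continuous-compounding formula used above: a 10% error in one endpoint shifts ln(N2/N1) by ln(1.1) ≈ 0.095, which is then divided by the number of decades. A minimal sketch:

```python
import math

def rate_error_pct(years, rate_pct_per_decade, endpoint_error=0.10):
    """Relative error in an average per-decade rate from a mis-measured endpoint."""
    decades = years / 10
    shift = 100 * math.log(1 + endpoint_error) / decades  # absolute shift, % per decade
    return 100 * shift / rate_pct_per_decade              # as a % of the rate itself

print(round(rate_error_pct(3742, 3.41), 1))  # ~0.7: under nine-tenths of one per cent
print(round(rate_error_pct(332, 3.39), 1))   # ~8.5: about 9% of 3.39
```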
Other rates Oeppen and Vaupel collected data on longevity. Danish males increased longevity from 1840 to 1919 from 43.11 to 56.69 years, 3.46% per decade. Other rates based on their data are similar (Oeppen and Vaupel, 2002), but for periods of time shorter than for Nordhaus's lighting study. Eisner (Eisner, 2003) estimates the London homicide rate in 1278 at 15 per 100,000 inhabitants (p. 84) compared to the English homicide rate in 1975 of 1.2 per 100,000 inhabitants (p. 99): an average decrease of 3.75% per decade, close to but not the same as 3.41% per decade. Increases in the efficiency of inference sets relating to public health and socialization should result in improvements; these data do not give as close a match in rates as for lighting, lexicons and IQs.
An entropy analogy for scaling ensembles Scaling lexical growth led to η = C log_µ(n). Thermodynamics implies that C log_µ(n) represents network entropy. (Clausius coined the word entropy (Clausius, 1865, p. 400); English translation: (Clausius, 1867, p. 357).) Instead of an indirect approach to scaling a lexicon by seeking a scale factor, start from Clausius's entropy

η = Q/T (4)

for η entropy, Q total heat (energy), T a degree Kelvin. T is proportional to an amount of energy required to increase temperature by one degree Kelvin in a given system, and Q is a measure of energy in the system. By analogy, for an individual IQ,

η(n) = IQ_ind/IQ_1 (5)

where IQ_ind is an individual's IQ and IQ_1 is a single IQ point. The single IQ point is also analogous to a mean path length, proportional to the average amount of energy required to travel a mean path length or to learn enough to generate a single IQ point. An individual's IQ is analogous to total heat Q in a system. The analogy holds for one IQ point compared to a person's IQ, and also for average individual problem solving capacity IQ (here IQ_i) compared to the average problem solving capacity IQ of a group (here IQ_gp). The analogy applies so long as the individual measurement in the denominator is that of a component of the whole ensemble measured in the numerator. For a network of problem solvers, their total solving capacity is

IQ_gp = η(n) × IQ_i. (6)

This in effect is η(pop) × r in the terminology above. By analogy, a lexical network scales the same way: the two types of degrees of freedom multiply: η(pop) × η(lex). Then

IQ_gp = η(pop) × η(lex) × IQ_i (7)

for a society compared to an average component individual, a result the same as R = η(pop) × η(lex) × r. In effect we have applied statistical mechanics to the analysis of collective and individual intelligence, considering intelligence as a rate of problem solving. If information output increases with energy input, then increased effort — increased energy input — increases a person's and a society's problem solving output — knowledge. Education increases the knowledge component used in problem solving, represented by η(lex) in equation (2). Clausius devised the concept of mean path length in response to an objection by Buijs-Ballot (Buijs-Ballot, 1858): gas molecules do not move in straight lines at the speeds claimed by Clausius, as shown by slow wafting of tobacco smoke. Clausius replied (Clausius, 1859) that gas molecules moved in straight lines but only over small distances; the slow dispersal of smoke was due to intermediate collisions of particles. The mean length of path between collisions could be estimated.
Stanley Milgram did an experiment that involved a mean path length for social connections (Travers and Milgram, 1969). The average number of intermediate acquaintanceship steps — degrees of separation — between pairs of individuals in a society and the average number of intermediate collisions between pairs of gas molecules are analogous. The mean path length concept useful in the kinetic theory of gases is also useful in social and lexical networks.
Swadesh glottochronology confirming network entropy Half of Morris Swadesh's glottochronological rate of lexical divergence, updated by the Gray and Atkinson paper (Gray and Atkinson, 2003), is 5.66% per thousand years. An entirely different approach using lexical growth and network entropies gives 5.6% per thousand years, a close correspondence. The correspondence is so close that it convinced me in the summer of 2009 that the formula for network entropy η and its relationship to lexical growth must be valid. What explains the numerical correspondence? Mother language M has two daughter languages D1 and D2, with subscripted parameters respectively using the same notation. Simplifying assumptions for a period of time from t_1 to t_2:
• r_D1 = r_D2 = r. Average individual lexical problem solving rate r is unchanging since average individual physiologies are unchanging.
• Lex_M = Lex_D1 = Lex_D2. Hence η(Lex_M) = η(Lex_D1) = η(Lex_D2). Lexicon sizes for mother and daughter languages are the same. While the lexicons of daughter languages diverge from the mother language and from each other, since collective rates of change are the same, their lexicon sizes remain equal.
• Pop_M = Pop_D1 = Pop_D2. Hence η(Pop_M) = η(Pop_D1) = η(Pop_D2). The mother language's population has divided into two daughter populations. But since the effect on the collective rate is a logarithmic function of population, the effect of slightly different population sizes on the average rate over time is negligible.
• d Lex_M/dt = 0. To compare how daughter languages change relative to their mother language, suppose the mother language is unchanging. This implies that Lex_M(t_1) = Lex_M(t_2); as if individual lexical creativity r_M = 1, so Lex_M is unchanged.
The ratio of rates for a changing daughter lexicon to that of the unchanging mother language is:

[η(Lex_D1) × η(Pop_D1) × r] / [η(Lex_M) × η(Pop_M) × r_M]    (8)
= r / r_M    (9)
= r / 1 = r.    (10)
Due to the assumptions, η(Lex) and η(Pop) appearing in the numerator for the daughter language and in the denominator for the mother language cancel. The last line (10) uses r_M = 1. Swadesh's glottochronology provides persuasive confirmation in two ways: first the numerical correspondence, and second the principled explanation it provides for the numerical correspondence. But for Swadesh and his work I would have given up on these ideas. Moreover, the η calculations in turn substantiate the validity of Swadesh's approach. “Such an agreement between results which are obtained from entirely different principles cannot be accidental; it rather serves as a powerful confirmation of the two principles and the first subsidiary hypothesis annexed to them” (Clausius, 1899).
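The cancellation in equations (8)–(10) can be checked symbolically; a minimal sketch using the sympy library (the symbol names are mine, for illustration):

```python
import sympy as sp

# Symbols for the entropies and rates appearing in equations (8)-(10).
eta_lex, eta_pop, r, r_M = sp.symbols('eta_lex eta_pop r r_M', positive=True)

# By the simplifying assumptions, eta(Lex) and eta(Pop) are the same for
# mother and daughter languages, so they appear above and below the line
# and cancel.
ratio = (eta_lex * eta_pop * r) / (eta_lex * eta_pop * r_M)
print(sp.simplify(ratio))               # r/r_M -- equation (9)
print(sp.simplify(ratio.subs(r_M, 1)))  # r     -- equation (10)
```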
While I went through various degrees of skepticism from June 2007 to June 2009 about the ideas sketched above, most of those doubts were allayed by the explanation of how it is that the ‘fossil rate’ found by Swadesh is contained in the English lexical growth rate. Swadesh came close to showing that half the rate of divergence isolated the average individual rate of increase in the efficiency of inference sets.
Some implications of the Swadesh rate What does a rate of 5.6% per thousand years imply? The average person without any social networking or language — without the factors η(Pop) and η(Lex) that multiply the problem solving capacity they were born with and model the ability of an individual to learn from others and to acquire known ideas — would take 1,000 years to increase their existing store of knowledge by 5.6%. If collective knowledge increased only at the individual rate, ancestral Homo sapiens likely would have been challenged to survive in their environments. If one hundred thousand years ago life span was 30 years, an individual Homo sapiens would not have had enough brain power or a long enough life to improve their circumstances using unaided problem solving capacity. Inventing speech had to be a group effort; there were too many problems to be solved by a single Homo sapiens or even by a single generation of a group. When Homo sapiens lived in small tribes, the advantage of collective problem solving would have been small. Lacking physical advantages over other animals, Homo sapiens must have faced survival bottlenecks that made their posterity a hit-and-miss prospect. The average individual problem solving rate is much less than society's collective rate. This partly addresses the initial question about human behavior: a society's ideas can so overwhelm an individual's problem solving resources that they profoundly influence an individual's behavior. A society's values matter. The relationship R = η(pop) × η(lex) × r sheds light on the nature of 'intelligence' or IQs. Part of individual problem solving capacity depends on r, which depends on genetics and health. Part depends on η(pop), how much a person's
society enhances or inhibits social networking and transmission of knowledge. Part depends on η(lex), a proxy for society's available store of knowledge. The relationship may partly account for morality: society adopts norms that inhibit individuals from damaging the multiplicative benefits of η(Pop) and η(Lex) for societies and those who live in them.
Mean path length as a scale factor In C log_µ(n) r, the rate r = µ/t measures the rate — the t in µ/t — at which energy or information moves through the network in mean path length steps. Entropy is a measure of the number of degrees of freedom expressed by the exponent of a scale factor. Entropy is not an esoteric idea that measures disorder; it emerges algebraically from the concepts of dimension and degrees of freedom: it is a measure of possible states. In log_µ(n)µ, the µ on the right side is a length; as the base of the logarithm, µ keeps track of the number of degrees of freedom in n relative to µ.
Arriving at the Ensemble Scaling Law: mean path length statistics The concept of network entropy log_µ(n)µ and its relation to collective network capacity is recent, probably for a variety of reasons, which include:
• The clustering coefficient C, a concept necessary for η(n) = C log_s(n), was popularized only recently by Watts and Strogatz's 1998 article (Watts and Strogatz, 1998).
• Computers only recently enabled calculation of mean path lengths and clustering coefficients for lexicons and for social and other networks (Watts and Strogatz, 1998; Ferrer i Cancho and Solé, 2001; Motter et al, 2002; Achard et al, 2006).
• Detecting emergence leading to lexical growth is difficult without antecedent concepts of emergence.
• IQ tests date to the early 20th century. Rising average IQs are a recent observation; explaining increasing average IQs requires applying recent concepts such as emergence and swarm intelligence to recent data on rising IQs.
• The idea of calculating the rate at which lexicons increase as a proxy rate for collective problem solving and the rate at which inference sets become efficient is likely novel, though Swadesh measured lexical change earlier. The scientific spirit of our times prepared the way with the concept of emergence and by recently making available for analogy the technology of software data compression.
• Articles can be located and accessed more quickly on the internet than in a library. Online resources such as archive.org and ResearchGate speed up access
to articles and books. Online resources speed dissemination of data and increase the rate of problem solving. Diverse ideas helpful, and perhaps necessary, for conceiving of network entropy are recent. Average IQs can be used to reveal the average individual rate r using η. Measurement of an individual's IQ indirectly measures the individual's average problem solving rate, including the rate at which problem solving techniques are acquired and implemented. Thus R = n × IQ_av for average IQ_av in a population of n individuals, and

R/n = IQ_av.    (11)

Equation (11) reveals the connection between R/n and IQ_av; the rate of increase on the left side of the equation is the same as the rate of increase on the right side. That is how the rate at which average IQs increase reflects the rate of increase in the efficiency of inference sets in general and of the English lexicon in particular. Studies in the years 2000 to 2010 calculated mean path lengths using computers. The possibility arose of relating computed mean path lengths to collective rates. The mean path length is practically useful in ensemble scaling because it can be calculated for a network and then used to calculate network entropy. The role of the mean path length also leads to the Ensemble Scaling Law. In these historical, conceptual and technological contexts the Ensemble Scaling Law emerged in consequence of contributions from a wide variety of problem solvers: an uncoordinated emergent group effort.
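A sketch of the kind of computation those studies performed, using the networkx library on a synthetic small-world graph (the graph parameters are assumptions chosen only for illustration, not data from the cited studies):

```python
import math
import networkx as nx

# A Watts-Strogatz small-world graph as a stand-in for a measured
# social or lexical network.
G = nx.connected_watts_strogatz_graph(n=2000, k=10, p=0.1, seed=42)

mu = nx.average_shortest_path_length(G)  # mean path length, in steps
C = nx.average_clustering(G)             # clustering coefficient

# Network entropy eta = C * log_mu(n), computed by change of base.
eta = C * math.log(G.number_of_nodes()) / math.log(mu)
print(f"mu = {mu:.2f}, C = {C:.2f}, eta = {eta:.1f}")
```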
Degrees of freedom possibilities Computed mean path lengths permit connecting degrees of freedom to collective and individual problem solving rates. That reveals phenomena, not at the quantum level, in which degrees of freedom play a role. Energy sufficient to explore all degrees of freedom is not required, just enough for one possibility. A mean path length is the average distance between any pair of network nodes. If the pair of nodes chosen are at the extreme opposite reaches of the network, then there are log_µ(n) = k levels separating them. Then the same energy required to traverse a single mean path length µ would at the same time appear to traverse kµ. What matters is the capacity of a network or system to connect nodes, not that all possible paths be actually realized. These aspects of degrees of freedom suggest these possibilities: There are not many worlds (Everett, 1957) due to quantum mechanics but only a large number of available degrees of freedom. Quantum computing theory exploits degrees of problem solving freedom. The ergodic hypothesis problem — insufficient time exists to permit all possibilities to occur (Ma, 2000) — is avoided by the Ensemble Scaling Law.
Feynman's sum over histories may be an instance of the Ensemble Scaling Law. Superposition involves degrees of freedom and the Ensemble Scaling Law.
The 4/3 Scaling Law As a test of the validity of ensemble and lexical scaling, in March 2008 I adapted those ideas to a derivation in WBE 1997 (West et al, 1997) of 3/4 metabolic scaling. Metabolic scaling is characterized by metabolism Y ∝ M^b for organism mass M and some exponent b. Rubner in 1883 concluded, based on his measurements, that b = 2/3 (Rubner, 1883, 1902, 1982); beginning in 1932 Kleiber concluded, based on his measurements, that b = 3/4 (Kleiber, 1932, 1947) and (Kleiber, 1961, p. 214). WBE 1997 derived 3/4 metabolic scaling by modeling energy and oxygen distribution from circulatory system to organism using scaled circulatory system tubes: for the kth level, β scales tube radius r_k, γ scales tube length ℓ_k, and n scales the number of branchings N_k. The service volume to which capillaries distribute energy is spherical with radius ℓ/2. Then β = n^(−1/2) and γ = n^(−1/3). The number of capillaries is N_c = n^N. Using the sum of a geometric series and approximations, WBE 1997 showed b = 3/4. Exponents of a scale factor for lexicons increase with lexical size, but for organisms b in M^b is constant. How then to adapt lexical to metabolic scaling? In April 2008, I treated a blood flow path from aorta to capillary as analogous to the nested scaled energy clusters used in lexical scaling. Per level the circulatory system had 4/3 the degrees of freedom (the exponent of a scale factor) compared to the spherical service volume. (For a week I thought this was an algebra mistake.) Though the mathematics I used was both different — no geometric series, no summing and no approximations — and shorter than that in WBE 1997, I concluded (incorrectly) that I had merely confirmed the results of WBE 1997. A couple of weeks later, I noticed in a text on heat (Allen and Maxwell, 1948, p. 743) 4/3 scaling in an intermediate step of Boltzmann's derivation (Boltzmann, 1884; Planck, 1914) of Stefan's Law. Stefan's Law (Stefan, 1879) applies to black body radiation at all scales, consistent with 4/3 scaling being universal, implying that 4/3 scaling leading to metabolic scaling does not merely apply to many sizes of organisms; it applies universally at all scales. (Stefan's Law can be found using dimensional analysis (Mahajan, 2014, pp. 181–185), which it turns out is a clue about the role of dimensions in 4/3 scaling.) Derivation of 4/3 scaling in metabolism can be adapted to 4/3 scaling of radiation by dividing a radiation cone into segments analogous to individual tube volumes; a radiation cone segment's average cross-sectional area and average length correspond respectively to a circulatory system tube's cross-sectional area and length. In October 2010 I read an article by the authors of WBE 1997 (Brown et al, 2005) rebutting a critique of WBE 1997. The critique (Kozlowski, 2004, 2005) claimed the assumption of fixed volume capillaries in WBE 1997 led to a contradiction. The
critique's analysis appeared valid. As a result, I reviewed my 2008 article and found problems. In WBE 1997, the scale factor β is squared in calculating tube volume because the radius r_k is squared for the cross-sectional area A of a tube. WBE first finds that the radius scale factor β = n^(−1/2) and in its derivation squares it: β² = (n^(−1/2))² = n^(−1). These steps — taking the square root followed by squaring, a mathematical step followed by its inverse — imply that the scale factor for cross-sectional area A is conceptually more significant than β. Let v be the scale factor for tube volume V_k, such that v ∝ n^(−1) since the number of tubes per level scales by n and flow volume per level is constant. Then v ∝ β²γ = (n^(−1/2))² n^(−1/3) = n^(−4/3). WBE 1997 implies that the exponents of v and of β² expressed in n are the same, which amounts to saying that a tube volume and a tube cross-sectional area scale the same way. This inconsistency is hard to see in WBE 1997. Moreover, γ scales a length and so does β, so their relationship to n should be the same, not different as in WBE 1997. If v ∝ n^(−1) then we cannot also have v ∝ n^(−4/3) as above. Somehow an extra 1/3 creeps in. With hindsight, the inconsistencies involving scale factors arise (in my view) from not assigning one dimension to blood flow; one dimensional blood flow across cross-sectional area A of a tube differentiates the two dimensions of A from the three dimensions of A plus blood flow. Including the dimensionality of blood flow accounts for three dimensional volume scaling the same way as two dimensional area plus one dimensional blood flow: both have three dimensions. From 2010 to late 2015 I tried to discern where the 1/3 extra to 1 in 4/3 scaling came from. Mathematics explains the origin of perplexity about the extra 1/3:

4k : 3k = (4/3)k : k = 4/3 : 1.    (12)
The apparent scaling by a fraction 4/3 in fact was a ratio of the scaling of two distinct systems: the circulatory system scaled by 4 and the service volume it supplies scaled by 3. There is no fractional 4/3 scaling of a parameter. The 4/3 as an exponent describes degrees of freedom per scaling. Degrees of freedom relates to dimension; 4/3 scaling can only exist as a ratio of dimensions. Consider an empty circulatory system as 3 dimensional. Add one dimensional blood flow and together the system has four dimensions, supplying the three dimensional volume proportional to organism mass. Scaling arising from a ratio of dimensions is used in an 1838 paper proposing a geometric explanation for why large animals breathe more slowly than smaller ones. Sarrus and Rameaux (Sarrus and Rameaux, 1838) said that an organism's heat is produced by a three dimensional volume and is emitted by a two dimensional skin surface. To maintain a constant body temperature, breathing must slow as a 2/3 power of organism mass. The surface law described by Sarrus and Rameaux suffered from a number of defects — skin surface is hard to measure (stretched or unstretched, with folds or without) (Brody, 1945) — but primarily because an animal is not a machine for creating
heat to be discharged into the air, but, via its circulatory system, supplies energy to its organs (as WBE 1997 assumed). Adjust the observations of Sarrus and Rameaux to a four dimensional circulatory system including blood flow supplying a three dimensional organism mass, and their analysis applies by analogy: metabolism scales by a 4/3 power of organism mass. Metabolism must slow because the 4/3 scaling of energy supply would exceed (by 1/3 the entropy) the capacity of cells to use energy, and would overheat the cells, leading to death. By slowing metabolism, the same intracellular biochemistry found in smaller animals can work in larger ones, which reduces the physiological change required for organisms to adapt to increasing size.
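A quick arithmetic check of the scale-factor exponents discussed above; a minimal Python sketch, with the exponents taken from the WBE 1997 definitions quoted in the text:

```python
from fractions import Fraction

# Scale-factor exponents (powers of n) as defined in WBE 1997:
beta_exp = Fraction(-1, 2)    # beta  = n^(-1/2) scales tube radius
gamma_exp = Fraction(-1, 3)   # gamma = n^(-1/3) scales tube length

# Tube volume scales as beta^2 * gamma, so the exponents add:
v_exp = 2 * beta_exp + gamma_exp
print(v_exp)                  # -4/3, not the -1 that constant flow volume per level requires

# Equation (12): the per-level exponent ratio 4k : 3k is 4/3 at every level k.
k = 7                         # any level number; the k's cancel
print(Fraction(4 * k, 3 * k)) # 4/3
```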
Other instances of the 4/3 Scaling Law After I unexpectedly found 4/3 scaling (instead of 3/4 scaling) in connection with metabolism in 2008, 4/3 scaling also appeared, as mentioned above, in Boltzmann's proof of Stefan's Law. In 2016 I read about similar principles using different dimensions in Sarrus and Rameaux's 1838 article. The 4/3 Scaling Law also appears in:
• John James Waterston's 1845 article, published in 1892 at the instance of Lord Rayleigh (Waterston, 1892). Waterston supposed a gravitationally bound elastic plane struck on the underside by a number of molecules sufficient to maintain a fixed height above the Earth. The vis viva (kinetic energy) of the molecules was 4/3 that of the elastic plane in free fall. If by assumption the gravity that causes descent were removed, then the elastic plane would have 4/3 more energy to move outward than the energy of stasis. Applying this reasoning to space itself, space would expand outward: space plus radiation (instead of molecular motion) has a 4/3 pressure relative to static space. One referee of the 1845 paper stated 'the paper is nothing but nonsense, unfit for reading before the Society' (Waterston, 1892, p. 2). Rayleigh remarks that 'highly speculative investigations, especially by an unknown author, are best brought before the world through some other channel than a scientific society, which naturally hesitates to admit into its printed records matters of uncertain value.'
• Consider gas molecules moving with average velocity v in a given volume. What is the average velocity v_rel of the gas molecules relative to each other? Clausius asserted v_rel = (3/4)v (Clausius, 1859). Maxwell showed that v_rel = √2 v, accurately matching ratios of measured specific heat capacities (Longair, 2013, p. 7); Clausius erred (Maxwell, 1890, Vol. 1, p. 387). To calculate relative velocity, Clausius first held all molecules still except one. Finding v_rel = (3/4)v, Clausius erroneously generalized this for all n moving molecules. But one molecule moving among many is only a test particle sampling distances between molecules. What Clausius actually found was that the mean path length of the n molecules all moving was 3/4 the distance between
n equally sized cube volumes containing the molecules. He had calculated the ratio of velocities for a three dimensional system with motion compared to a static three dimensional system. Average lengths between cube centers in static three dimensional space are 4/3 (inverting the compared spaces and his 3/4 ratio) the mean path length of the four dimensional system. Three dimensional space stretches lengths in four dimensional space by 4/3. The 4/3 Scaling Law thus has two ratio manifestations, one for degrees of freedom and the other for lengths. The same energy in four dimensions has 4 degrees of freedom compared to three degrees of freedom in three dimensional space. On the other hand, length L in four dimensions with equal energy per dimension has length (4/3)L in three dimensional space because in three dimensional space each dimension has 4/3 as much energy per dimension.
• H. Minkowski in 1908 gave a lecture using a four dimensional space to explain special relativity: three spatial dimensions and one time dimension. Motion at a constant speed (such as light motion) is proportional to time. Minkowski's four dimensions correspond to a four dimensional space in the 4/3 Scaling Law. Minkowski noted 'that one can never determine from physical phenomena whether space, which is assumed to be at rest, may not after all be in uniform translation' (Minkowski, 2012, p. 39). In 1998, astronomical observations of supernovae contradicted his space-at-rest assumption.
• Hermann Bondi uses scaling to derive special relativity (Bondi, 1980, originally 1962).
• General relativity implies (Wang, 2010) that for distance scale factor a, radiation energy density varies with distance as 1/a⁴ and matter energy density varies as 1/a³. The scalings 1/a⁴ and 1/a³ are consistent with the 4/3 Scaling Law.
• Lewis Fry Richardson found that wind gusts scale by a 4/3 power:

F(ℓ) ∝ ℓ^(4/3),    (13)
a result similar to the result implicit in Clausius's work on 3/4 mean path lengths (Richardson, 1926). Kolmogorov in his 1941 articles derived the result (Kolmogorov, 1991b,a). Batchelor explained Kolmogorov's work (Batchelor, 2008). (I still have difficulty following Kolmogorov and Batchelor.) Dimensional analysis can be used to arrive at Kolmogorov's result (Körner, 1996, p. 191).
• In 1998, astronomers reported on measurements of type IA supernovae, useful standard candles for distance calibration. They found the dark energy proportion Ω_Λ of the universe's energy density (Ω = 1) was about 0.70. Using the ratio-of-lengths version of the 4/3 Scaling Law, for radiation energy density ρ_γ, matter energy density ρ_M and energy quantity E:
ρ_γ / ρ_M = (E/1³) / (E/(4/3)³) = 64/27 ≈ 0.7033/0.2967;    (14)

the numerator is close to the 2010 measurement of Ω_Λ = 0.728 (+0.015/−0.016), with a 68% confidence limit (Jarosik et al, 2010; Komatsu et al, 2011).
• Luminosities for type IA supernovae were 'on average 25% less than anticipated' (Cheng, 2010, p. 259). A single space reference frame implied the supernovae were 10% to 15% farther than expected (Riess et al, 1998) and (Carroll and Ostlie, 2007, pp. 1042–1044). With a four dimensional space causing three dimensional space to expand, 25% dimmer implies 1/3 farther, consistent with the 4/3 Scaling Law.
• In 2001, mathematicians proved, using sophisticated mathematics, that the fractal envelope for Brownian motion is 4/3 (Lawler et al, 2001); the envelope of moving particles in a volume has four degrees of freedom compared to a static environment, which has three. The 4/3 Scaling Law thus implies the same result.
• In 2014, a study of 740 type IA supernovae implied a matter energy density proportion of 0.2965 (Betoule et al, 2014), less than six tenths of one per cent different from the 0.2967 predicted by the 4/3 Scaling Law.
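The arithmetic in equation (14) and the comparison with the supernova measurements is easy to verify (a minimal check; the measured values are those quoted above):

```python
# Equation (14): the cubed 4/3 length ratio splits the total energy density.
ratio = (4 / 3) ** 3              # = 64/27, approximately 2.370
omega_lambda = 64 / (64 + 27)     # approximately 0.7033, dark energy proportion
omega_matter = 27 / (64 + 27)     # approximately 0.2967, matter proportion

print(f"{ratio:.3f} {omega_lambda:.4f} {omega_matter:.4f}")
# Compare: WMAP 7-year Omega_Lambda = 0.728 (+0.015/-0.016);
# Betoule et al (2014) imply a matter proportion of 0.2965.
```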
The 4/3 Scaling Law implying the Ensemble Scaling Law The 4/3 Scaling Law implies that the capacity of a system is linearly related to its degrees of freedom per scaling: 4 dimensions can spread energy among more dimensions. That implication is expressed in the Ensemble Scaling Law. In 2008, I had adapted lexical scaling to find 4/3 scaling. For example, (β^k r)² = β^(2k) r²; the scale factor exponent is 2k. The exponent ratio 4k/3k for the circulatory system compared to the service volume simplifies to 4/3. The difference between lexical scaling and metabolic scaling was only apparent. The exponent b in M^b in metabolic scaling is constant because it is a ratio. The exponent of a scale factor for a four dimensional system grows with size, and so does the exponent of a scale factor for the corresponding three dimensional system, but the growth part cancels: 4k/3k = 4/3. At every level there is a 4 : 3 ratio, so this can confusingly masquerade as a geometric series, because the same scaling prevails at each level. Since I found 4/3 scaling by adapting the Ensemble Scaling Law to metabolic scaling and, on the other hand, the 4/3 Scaling Law implies (very subtly) the Ensemble Scaling Law, the two laws imply each other. They are equivalent aspects of a general principle. If one is invalid, so is the other. Swadesh's work on glottochronology indirectly supports the validity of the 4/3 Scaling Law.
The 4/3 Scaling Law and mean path lengths The average mean path length for an isotropic network distributing information is the base of the natural logarithm, e ≈ 2.71828. For a network of information receivers, the mean path length should be 4/3 × 2.71828 ≈ 3.624. Watts and Strogatz found the mean path length for a network of actors to be 3.65, close to what theory predicts. Using the mean path length for the actors network for social networks generally is justified if measured mean path lengths conform to theory. The English lexicon study (Ferrer i Cancho and Solé, 2001) found a mean path length of µ = 2.67, not far off what natural logarithm scaling predicts for isotropic distribution.
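A one-line check of these comparisons (the measured values are those quoted above):

```python
import math

mu_isotropic = math.e            # predicted mean path length, e, about 2.71828
mu_receivers = (4 / 3) * math.e  # predicted for receiver networks, about 3.624

print(f"{mu_isotropic:.3f} {mu_receivers:.3f}")
# Measured: 3.65 for the Watts-Strogatz actors network;
# 2.67 for the English lexicon (Ferrer i Cancho and Sole, 2001).
```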
Two reference frames of the 4/3 Scaling Law Two reference frames (arguably less elegant than having only one) can simplify some problems. Quantum entanglement and the cosmological horizon problem might resolve if things too far apart in three dimensions are closer together in four dimensions. The two slit experiment might be explained by particles lighting up dark energy ripples in two reference frames. The cosmic scale factor a(t) is simpler if a does not change with time. Cosmology proposes that inflation expanded the universe initially, which then slowed down and later sped up (dark energy). If the 4/3 Scaling Law and two reference frames apply, then inflation could arise from distance ratios relative to the then-current size of the universe: the ratio of 100 compared to 1,000 when the universe was smaller is larger than the ratio of 100 compared to one million when the universe is bigger. Two reference frames could simplify the concepts of proper time, proper distance and co-moving distance used in cosmology. Perhaps tensors in gravity theory are required because of the assumption of one reference frame. Two reference frames might be characterized as a broken symmetry; they seem to relate to emergence.
Plausibility of Ensemble Scaling and 4/3 Scaling Using the mean path length µ to scale an ensemble gives degrees of freedom. Using computers, the mean path length for networks such as actors or English words can be calculated. Mean energy to connect nodes is proportional to mean path length. If a proportion C of nodes connect, η = C log_µ(n). Using η for populations and lexicons gives the average individual problem solving rate r.
Mathematical principles that involve finding degrees of freedom relative to the mean path length are uncomplicated. The psychological hurdles in ensemble scaling lie in the assumption that lexical growth rate R is a proxy for the increasing efficiency of collectively created inference sets, and in accepting data about R. Describing the mistakes on the way to ensemble scaling takes longer than explaining it. It is possible that ensemble scaling occurs at the quantum level too. The 4/3 Scaling Law, on the other hand, has been an unnoticed part of the scholarly landscape from 1845 until now, possibly with more precursors (though some are obscure) in the historical record than is the case for the Ensemble Scaling Law. That the 4/3 Scaling Law accounts for Kleiber's 3/4 metabolic scaling requires only updating the 1838 insights of Sarrus and Rameaux. The psychological hurdles for the 4/3 Scaling Law applied to dark energy are considerable. 'The nature of dark energy is now perhaps the most profound mystery in cosmology and astrophysics. And it may remain forever so' (Cho, 2012). What if years from now (or centuries, as sometimes happens in science), children in kindergarten know that there are four dimensions and three dimensions in our universe, and regard that as no more mysterious than the Earth rotating on its axis? Another psychological hurdle: the 4/3 Scaling Law posits two contemporaneous universes. Still, that is a less formidable hurdle than Everett's many worlds hypothesis, which has adherents in the physics community. The Ensemble Scaling Law has arrived due to the computability and availability of statistics for mean path lengths and clustering coefficients for networks (social, lexical, conceptual) and due to concepts of emergence and software data compression. The 4/3 Scaling Law depends on acceptance of commonalities among diverse phenomena and related mathematical theories. Ironically, more work with a longer history supports the 4/3 Scaling Law than the Ensemble Scaling Law.
Conclusion The relationship between glottochronology rates and English lexical growth rates implicitly includes a hypothesis corresponding to the Ensemble Scaling Law. In this way I inadvertently ran into the Ensemble Scaling Law; it took a few years for me to appreciate its generality. This article began with a question about human behavior. The question diverted into ideas about language, problem solving and IQs, thermodynamics, statistical mechanics, and eventually cosmology. Perhaps this validates Hume: all sciences 'have a relation, greater or less, to human nature'. Ask questions, learn some mathematics, read widely, make notes, look for patterns and analogies. I am surprised at what that led to.
References and Citations
(1993) Census characteristics of Australia—1991 Census of Population and Housing. Australian Bureau of Statistics, Catalogue No 27100
(1996) Population and dwelling counts, for Canada, Provinces and Territories, 1991 and 1996 Censuses
(2003) Revised population estimates England & Wales 1991–2000. National Statistics
Achard S, Salvador R, Whitcher B, Suckling J, Bullmore E (2006) A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. J Neurosci 26(1):63–72
Allen HS, Maxwell RS (1948) A Text-book of Heat. Macmillan and Co.
Barnsley M, Hurd LP (1993) Fractal Image Compression. AK Peters
Baron ME (1969) The Origins of the Infinitesimal Calculus. Dover
Batchelor GK (2008) Kolmogoroff's theory of locally isotropic turbulence. Mathematical Proceedings of the Cambridge Philosophical Society 43(4):533–559
Betoule M, et al (2014) Improved cosmological constraints from a joint analysis of the SDSS-II and SNLS supernova samples. A & A 568(A22)
Boccara N (2004) Modeling Complex Systems. Springer
Boltzmann L (1884) Ableitung des Stefan'schen Gesetzes, betreffend die Abhängigkeit der Wärmestrahlung von der Temperatur aus der electromagnetischen Lichttheorie. Ann Phys 22:291–294
Bondi H (1980, originally 1962) Relativity and Common Sense. Dover
Bosworth J (1898) A Compendious Anglo-Saxon and English Dictionary. John Russell Smith
Boyer CB (1949) The History of the Calculus and Its Conceptual Development. Dover
Boyer CB (1991) A History of Mathematics (revision by Merzbach, U. C.). Wiley
Briscoe B, Odlyzko A, Tilly B (2006) A refutation of Metcalfe's law and a better estimate for the value of networks and network interconnections. IEEE Spectrum p 26
Brody S (1945) Bioenergetics and Growth with Special Reference to the Efficiency Complex in Domestic Animals. Reinhold
Brown JH, West GB, Enquist BJ (2005) Yes, West, Brown and Enquist's model of allometric scaling is both mathematically correct and biologically relevant. Funct Ecol 19:735–738
Buijs-Ballot CHD (1858) Ueber die Art von Bewegung, welche wir Wärme und Elektricität nennen. Annalen der Physik 179(2):240–259, DOI 10.1002/andp.18581790205, URL http://dx.doi.org/10.1002/andp.18581790205
Cajori F (1993, originally 1928 and 1929) A History of Mathematical Notation. Dover
Campbell L (1998) Historical Linguistics – An Introduction, Second Edition. MIT Press
Ferrer i Cancho R, Solé RV (2001) The small world of human language. Proceedings of the Royal Society of London B 268:2261–2266
Carroll BW, Ostlie DA (2007) An Introduction to Modern Astrophysics – Second Edition. Pearson Addison Wesley
Cheng TP (2010) Relativity, Gravitation and Cosmology, Second Edition. Oxford
Cho A (2012) What is dark energy? Science 336:1090–1091
Chomsky N (1975) Reflections on Language. Random House
Clausius R (1859) On the Mean Length of the Paths described by the separate Molecules of Gaseous Bodies on the occurrence of Molecular Motion: together with some other Remarks upon the Mechanical Theory of Heat (translation by Guthrie, P. of Annalen, No. 10, 1858). Philos Mag 17, Fourth Series(112):81–91
Clausius R (1865) Ueber verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie. Annalen der Physik und Chemie 125(7):353–400
Clausius R (1867) The Mechanical Theory of Heat with its Applications to the Steam-Engine and to the Physical Properties of Bodies (translated by Tyndall, J.). John van Voorst
Clausius R (1899) On the Motive Power of Heat, and on the laws which can be deduced from it for the theory of heat, translation by Magie, W. F. of Annalen der Physik, Vol 79 (1850). In: Magie WF (ed) The Second Law of Thermodynamics, Harper & Brothers
Crystal D (2005) Cambridge Encyclopedia of Language (Second edition). Cambridge University Press
Dunbar R (1997) Grooming, Gossip and Language. Harvard University Press
Eisner M (2003) Long-term historical trends in violent crime. Crime and Justice 30:83–142
Everett DL (2017) How Language Began: The Story of Humanity's Greatest Invention. W. W. Norton
Everett H (1957) The many-worlds interpretation of quantum mechanics. Thesis, Princeton University
Gibbs JW (1914) Elementary Principles in Statistical Mechanics Developed with Especial Reference to the Rational Foundation of Thermodynamics. Yale University Press
Gould SJ (2002) The Structure of Evolutionary Theory. Harvard University Press
Gray RD, Atkinson QD (2003) Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426:435–439
Grigg DB (1980) Population Growth and Agrarian Change. Cambridge University Press
Herrnstein RJ, Murray C (1994) The Bell Curve: Intelligence and Class Structure in American Life. Simon & Schuster
Hinde A (2003) England's Population: A History from the Domesday Survey to 1939. Hodder Arnold
Holland JH (1998) Emergence – From Chaos to Order. Addison-Wesley
Hume D (1888) A Treatise of Human Nature: being an Attempt to introduce the experimental Method of Reasoning into Moral Subjects. Oxford Univ. Press (reprint of 1739 ed.)
Jarosik N, et al (2010) Seven-year Wilkinson Microwave Anisotropy Probe (WMAP) observations. arXiv:1001.4744v1
Jensen JLWV (1906) Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica 30:175
Jevons WS (1879) The Theory of Political Economy (Second Edition). MacMillan
Johnson S (2001) Emergence — The Connected Lives of Ants, Brains, Cities, and Software. Scribner
Kauffman S (1995) At Home in the Universe – The Search for the Laws of Self-Organization and Complexity. Oxford University Press
Kauffman SA (1993) The Origins of Order. Oxford
Khinchin AY (1957) Mathematical Foundations of Information Theory. Dover
Kleiber M (1932) Body size and metabolism. Hilgardia 6:315
Kleiber M (1947) Body size and metabolic rate. Physiol Rev 27(4):511–541
Kleiber M (1961) The Fire of Life: An Introduction to Animal Bioenergetics. John Wiley & Sons
Kolmogorov AN (1991a) Dissipation of energy in the locally isotropic turbulence (translation of 1941 article in Russian by V. Levin). Proc R Soc A 434(1890):15–17
Kolmogorov AN (1991b) The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers (translation of 1941 article in Russian by V. Levin). Proc R Soc Lond A 434:9–13
Komatsu E, et al (2011) Seven-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: Cosmological interpretation. ApJS 192:18
Körner TW (1996) The Pleasures of Counting. Cambridge University Press
Kozlowski J, Konarzewski M (2004) Is West, Brown and Enquist's model of allometric scaling mathematically correct and biologically relevant? Funct Ecol 18:283–289
Kozlowski J, Konarzewski M (2005) West, Brown and Enquist's model of allometric scaling again: the same questions remain. Funct Ecol 19:739–743
Lancashire I (ed) (1999) The Early Modern English Dictionaries Database (EMEDD). University of Toronto
Latora V, Marchiori M (2001) Efficient behavior of small-world networks. Phys Rev Lett 87(19)
Lawler GF, Schramm O, Werner W (2001) The dimension of the planar Brownian frontier is 4/3. Math Res Lett 8:401–411
Longair MS (2003) Theoretical Concepts in Physics, Second edition. Cambridge University Press
Longair MS (2013) Quantum Concepts in Physics — An Alternative Approach to the Understanding of Quantum Mechanics. Cambridge University Press
Ma SK (2000) Statistical Mechanics. World Scientific
MacKay DJC (2003) Information Theory, Inference, and Learning Algorithms. Cambridge University Press
Mahajan S (2014) The Art of Insight in Science and Engineering – Mastering Complexity. The MIT Press
Maxwell JC (1890) The Scientific Papers of James Clerk Maxwell. Cambridge University Press
McMahon AMS (1994) Understanding Language Change. Cambridge University Press
Minkowski H (2012) Space and Time – Minkowski's Papers on Relativity (translated by Fritz Lewertoff and Vesselin Petkov). Minkowski Institute Press, Montreal, Quebec, Canada
Motter A, de Moura A, Lai YC, Dasgupta P (2002) Topology of the conceptual network of language. Phys Rev E 65:065102(R)
Nordhaus WD (1994) Do real-output and real-wage measures capture reality? The history of lighting suggests not. Tech. Rep. 1078, Cowles Foundation for Research in Economics, Yale University
Oeppen J, Vaupel JW (2002) Broken limits to life expectancy. Science 296:1029
Perry MJ, Mackun PJ (2001) Population change and distribution: Census 2000 brief. nationalatlas.gov, Census 2000 Brief Series
Pinker S (2000) The Language Instinct. Harper Collins
Planck M (1914) The Theory of Heat Radiation (translator Masius, M.) — Second Edition. P. Blakiston's Son & Co.
Pólya G (1954) Mathematics and Plausible Reasoning. Princeton University Press
Pólya G (1957) How to Solve It — A New Aspect of Mathematical Method. Princeton University Press
Pólya G (1962) Mathematical Discovery — On Understanding, Learning, and Teaching Problem Solving. Wiley
Popper K (1999) All Life is Problem Solving. Routledge
Richardson LF (1926) Atmospheric diffusion shown on a distance-neighbour graph. Proc R Soc A 110(756):709–737, DOI 10.1098/rspa.1926.0043
Riess AG, et al (1998) Observational evidence from supernovae for an accelerating universe and a cosmological constant. Astron J 116:1009
Romer PM (1990) Endogenous technological change. J Polit Econ 98(5):S71–S102
Rubner M (1883) Über den Einfluss der Körpergrösse auf Stoff- und Kraftwechsel. Z Biol 19:536–562
Rubner M (1902) Die Gesetze des Energieverbrauchs bei der Ernährung. Franz Deuticke
Rubner M (1982) The Laws of Energy Consumption in Nutrition (translators Markoff, A. and Sandri-White, A.). Academic Press
Sapir E (1921) Language: An Introduction to the Study of Speech. Harcourt Brace
Sarrus F, Rameaux J (1838) Rapport sur un mémoire adressé à l'Académie royale de Médecine. Bull Acad R Med Paris 3:1094–1100
Sawyer PH (1998) From Roman Britain to Norman England, 2nd edn. Routledge
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423, 623–656
Shour R (2009) A theory of intelligence. arXiv:0909.0173v8
Simpson J, Weiner E (eds) (1989) Oxford English Dictionary. Oxford University Press
Snooks GD (1995) The dynamic role of the market in the Anglo-Norman economy, 1086–1300 and beyond. In: A Commercialising Economy: England 1086 to c. 1300, Manchester University Press
Stefan J (1879) Über die Beziehung zwischen der Wärmestrahlung und der Temperatur. Sitzungsberichte der mathematisch-naturwissenschaftlichen Classe der kaiserlichen Akademie der Wissenschaften 79(1):391–428
Swadesh M (1971) The Origin and Diversification of Language. Aldine-Atherton
Thompson DW (1945) On Growth and Form. Cambridge University Press
Toller T (1921) An Anglo-Saxon Dictionary Supplement. Oxford
Tolman RC (1942) The Principles of Statistical Mechanics. Oxford Univ. Press
Travers J, Milgram S (1969) An experimental study of the small world problem. Sociometry 32(4):425–443
University of Toronto (2017) Dictionary of Old English: 2017 progress report. URL www.doe.utoronto.ca
Waldrop MM (1992) Complexity — The Emerging Science at the Edge of Order and Chaos. Simon & Schuster
Wang Y (2010) Dark Energy. Wiley-VCH
Waterston JJ (1892) On the physics of media that are composed of free and perfectly elastic molecules in a state of motion. Philos T Roy Soc A 183:1–79
Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393:440
West GB, Brown JH, Enquist BJ (1997) A general model for the origin of allometric scaling laws in biology. Science 276:122–126
Whitfield J (2006) In the Beat of a Heart. Joseph Henry Press
Wrigley EA, Schofield R, Lee RD (1989) The Population History of England, 1541–1871: A Reconstruction. Cambridge University Press
Zipf GK (1949) Human Behavior and the Principle of Least Effort. Hafner Publishing Company, New York (1972 reprint)