Implicative Relations in Word-Based Morphological Systems

Farrell Ackerman, University of California, San Diego
Robert Malouf, San Diego State University

April 26, 2016

“If, therefore, we wish to understand language in its true inwardness we must disabuse our minds of preferred ‘values’ and accustom ourselves to look upon English and Hottentot with the same cool, yet interested, detachment.” (Sapir 1921:124)
1 Introduction

Speakers of languages with complex morphology and multiple inflection classes confront a large learning task whose solution raises fundamental questions about morphological systems and their organization. This task receives a general formulation as the Paradigm Cell Filling Problem (PCFP) in Ackerman et al. (2009:54):

PCFP: Given exposure to an inflected wordform of a novel lexeme, what licenses reliable inferences about the other wordforms in its inflectional family?

For example, in Tundra Nenets (Samoyedic branch of Uralic) each noun lexeme has at least 210 possible inflected forms for the morphosyntactic feature combinations of 7 cases, 3 numbers, and 3 persons and 3 numbers for possessors: (7 × 3) + (7 × 3 × 3 × 3) = 210 distinct wordforms.1 So, confronted with the need to produce a previously unencountered form of a known inflected word, what guides a native speaker's guesses about the (patterns of) wordforms that encode all of the licit morphosyntactic feature combinations of a lexeme? How do speakers reliably resolve uncertainties in the selection of an appropriate form? In some intuitive sense, the problem seems increasingly difficult (i) the larger the number of morphosyntactic properties a language contains, (ii) the greater the number of allomorphic variants it uses to encode them, and (iii) the more extensive the conjugation classes and
1 We ignore for present purposes both predicative forms and so-called predestinative forms which host possessive markers serving to indicate future beneficiaries (see ?).
subclasses, i.e., distinctive patterns, over which words can be distributed. In fact, morphological complexity is commonly calculated by considering these factors.2 Ackerman & Malouf (2013) refer to this perspective on morphological typology as enumerative complexity or E-complexity: this is the classification and quantification of morphological phenomena by reference to factors (i)–(iii) above, which all figure in the formal shapes of words. In this respect, even a language like Tundra Nenets seems impoverished in comparison with, e.g., Estonian (Blevins 2006, Baerman 2014a), Seri (Marlett 2009, Baerman 2014b), Archi (Kibrik 1991, Corbett 2013) or South Saami (Gabbard 2015).

The essential challenge, as formulated in the PCFP, is not new, and proposed answers to it have a similar profile (Paul 1891, Anttila 1989, Wurzel 1989, Fertig 2013): analogical inferences from forms belonging to known inflectional patterns permit reasonable guesses concerning likely candidates for unknown forms. Analogy as a mechanism that facilitates morphological learning, guides use, and directs change has gained increasing recognition as a crucial explanatory resource over the years in numerous behavioral and cognitive domains (Esper 1925, 1966, 1973, Gentner et al. 2001, Hofstadter & Sander 2014, among others). But in order for analogy to be useful in understanding the development, maintenance and changes of morphological systems, the phenomena over which it applies must be clearly delineated. In this chapter, accordingly, we identify the basic word-based morphological assumptions which permit analogical inferencing to operate effectively via systems of implicational relations.

In section 2 we differentiate between familiar morpheme-based approaches and word-based approaches in terms of their interpretations of the part-whole relations central to morphological analysis. We explore the internal structure of complex words, the status of words in morphological theory, and how this provides insights into the organization of words as systems of relatedness. We begin by suggesting that the construct “morpheme”, construed in the variety of operating assumptions deriving from the Post-Bloomfieldian tradition or more comprehensively, if vacuously, construed as in Beard (1995) and the Distributed Morphology tradition3, provides little insight into the nature and effects of word internal structure: we will argue that the importance of internal structure for morphology is not in the identification of exponents for meaningful bits, but in the ways that the organization of exponents facilitates patterns of discriminability that help to distinguish and relate (classes of) words4. Sometimes, as we will see, this involves (semi-)classic morphemic composition, but more commonly it requires considering words as recombinant gestalts, i.e., wholes consisting of configurations of redeployed elements (segmental, suprasegmental) that each alone do not contribute invariant meanings irrespective of the word contexts in which they occur. Most obviously, the reuse of the same forms with different functions in different word contexts is illustrated by
2 Cf. the articles in Sampson et al. (2010) and Miestamo et al. (2008): the more morphosyntactic distinctions, mappings and classes, the greater the morphological complexity.
3 Marantz (2013) is a recent defense of this position.
4 See Blevins (in press) for an extended exploration of this theme.
the distributions of morphomes5 and the phenomenon of polyfunctionality6. This establishes the word, a patterned entity, as an essential unit of analysis in morphological theory. We then turn from the internal structure of words to their participation as parts within systems of relations constitutive of morphological organization. Developing themes found in Ackerman & Blevins (2008), we discuss affinities between the conception of part-whole relations in word-based approaches and similar analytic assumptions presently leading to reconceptualizations in the developmental sciences.

In section 3 we demonstrate how a particular interpretation of implicational relations among words in paradigms can be understood in information-theoretic terms: we show how relations between words analyzed information-theoretically in terms of conditional entropy suggest new perspectives on old issues and lead to new research questions, including ways to address the Paradigm Cell Filling Problem. Focusing on the systems of relations guiding morphological organization, we introduce a different perspective on morphological complexity: instead of focusing on the additive aspect of how individual elements combine and are distributed, the complementary perspective, referred to as integrative complexity or I-complexity in Ackerman & Malouf (2013), identifies and measures the relations between individual elements, i.e., the paradigmatic systems, that produce the organization in morphological systems. We also explore some recent developments and applications of the model presented in section 3. In section 4 we provide some conclusions.
2 Part/Whole Relations

Since the middle of the 20th century there has been growing recognition within the developmental sciences of the importance of understanding the principles shaping systemic organization (see Gottlieb 1997, Oyama et al. 2001, Jablonka & Lamb 2006, Gilbert & Epel 2008, Overton 2010, Bateson & Gluckman 2011, among others), represented by the family of developmental dynamic systems approaches within both biology and psychology. There is an emerging cross-disciplinary consensus that familiar analyses which reduce complex wholes to their constitutive parts and formulate associated procedures to reconstruct wholes from these parts have led to inevitable gaps in our understanding of the targeted phenomena: it has been observed that the most ordinary, and often most puzzling, objects display properties that are not explicable in this way. Hence the revived currency of the mysterious-sounding dictum “The whole is more/different than the sum of its parts”. In line with this, Anderson (1972:393), arguing for a complex dynamic systems approach to the analysis of complex objects in contrast to the then-prevalent reductionism, writes the following:

The main fallacy in this kind of thinking is that the reductionist hypothesis does not by any means imply a “constructionist” one: The ability to reduce everything
5 See Aronoff (1994), Cruschina et al. (2013).
6 Ackerman & Bonami (in press) provide a formal analysis of polyfunctional markers in Tundra Nenets.
to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. In fact, the more the elementary particle physicists tell us about the nature of the fundamental laws the less relevance they seem to have to the very real problems of the rest of science, much less to those of society. The constructionist hypothesis breaks down when confronted with the twin difficulties of scale and complexity. The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear, and the understanding of the new behaviors requires research which I think is as fundamental in its nature as any other.

A view that recognizes the theoretical significance of wholes emerging from their parts necessarily requires an understanding of the parts that receive their value from their participation in those wholes. From this perspective, the focus is on the relations between parts, rather than only on the parts themselves. Modern generative models of morphology that regard both words and paradigms as epiphenomena rather than first-class theoretical objects, simply reducible to their parts (if abstracted correctly) and combinatoric rules (if formulated correctly), make this same constructionist error (see Blevins (2006) on constructionist v. abstractionist approaches to morphological analysis). In contrast, from a contemporary perspective the hypotheses guiding Word and Paradigm proposals have a natural affinity to much research in the complexity sciences and their efforts to explicate and explain systemic organization in different domains. For example, Russell (1930), arguing against reductionist genocentric speculations7 for understanding the development of morphology in biology, observes that “the unity of the organism is … not decomposable without loss, and cannot be resynthesized in its original completeness from the abstract components distinguished by analysis” (Russell 1930:146). This early insight, as well as the two “cardinal laws” he formulates, can be seen as foundational for most modern research on the central role of systems and wholes in developmental science (Gilbert & Sarkar 2000, Stiles 2008, Bateson & Gluckman 2011). His two laws are usefully phrased as follows (Russell 1930:147):

1. The activity of the whole cannot be fully explained in terms of the activities of the parts isolated by analysis, and it can be less explained the more abstract are the parts distinguished.

2. No part of any living unity and no single process of any complex organic activity can be
7 This is, effectively, what Benítez-Burraco & Longa (2010) refer to as Evo-DevoGEN in their reassessment of longstanding operating assumptions in Chomskyan theory. Their claim is that Chomskyan theory as ordinarily practiced is incompatible with the sort of systems perspective on biological analysis associated with most research in Evolutionary Developmental Biology. Accordingly, they propose a radically revised version of the theory that adopts many insights of perennial critics of that framework. For example, Elman et al. (1996) articulate a systems perspective on language and development that argues against Evo-DevoGEN.
fully understood in isolation from the structure and the function of the organism as a whole.

Returning to language, Matthews (1991:204) observes that morphological analysis focuses on identifying patterns of part-whole relations constitutive of word internal structure as well as the relatedness between words that produces the organization characteristic of paradigmatic systems:

…words are not merely wholes made up of parts, but are themselves construable as parts with respect to systems of forms in which they participate.

Differing views about part/whole relations permit a perspicuous contrast between (post)structuralist assumptions typified in generativist (particularly Chomskyan) approaches and the classical representations associated with Word and Paradigm (WP) approaches.8 In particular, the distinctive properties of these approaches can be identified by distinguishing between two different domains over which part-whole relations are definable. We will argue that word-based approaches, unlike “morpheme”-based approaches, recognize words and paradigms as wholes whose theoretical status provides instructive insights into morphological organization.

In the domain of inflectional word structure, questions arise concerning the internal composition of (complex) words, especially concerning the relevant units of analysis, their types of combination, and their association with morphosyntactic properties. For example, relevant inquiry concerns whether all of the pieces of complex words are meaningful, whether linear and hierarchical combination are the sole modes of arrangement, and whether composition is the only means for achieving the meanings associated with words. We will refer to this as Domain 1. The appropriate identification of word internal structure facilitates the discovery of inflectional patterns that (classes of) words participate in. In the domain of relations between words, questions arise concerning sets and subsets of patterns that constitute the organization of morphological systems: this is the domain of paradigm structure, which, of course, presupposes that words are construed as parts of patterned wholes. We will refer to this as Domain 2.
2.1 Domain 1: The internal structure of words

The central focus of structuralist and generative morphology has been on the internal structure of words, that is, on Domain 1. A prevailing assumption has been that internal structure consists of morphemes, interpreted in an increasingly capacious and theory-driven fashion over the years (Marantz 2013, but see Hay & Baayen 2005).

Understanding a complicated whole by reducing it into constitutive parts is a familiar and successful analytic strategy in the sciences. Within the field of linguistic morphology, Blevins
8 See Blevins (in press) for a comprehensive introduction to Word and Paradigm morphology.
(in press), following Hockett (1987), refers to this as disassembly. For example, the complicated Hungarian word in (1) can be disassembled into 4 parts, each bearing its own meaning.

(1) bátor-ság-om-ról
    brave-ness-1SG.POSS-about
    ‘about my bravery’

It is conventionally thought in Post-Bloomfieldian structuralist morphology and its modern congeners within the generative paradigm that the theoretician's task is to identify the constitutive pieces of complex words and to explicate the nature of the reassembly processes that reconstitute the whole from its parts. For (1) this is straightforward: a simple concatenation of the disassembled parts additively produces a word that reflects both the form and the meaning of these parts.9 In the present instance, the meaning of the assembled whole can be construed as a composite of its meaningful pieces: an adjectival base meaning ‘brave’ is followed by a marker of nominalization and this is followed in turn by a possessive marker and a case marker. What theory requires, accordingly, is an inventory of parts and the rules that arrange them into words.

A widespread approach to morphological analysis, reinforced in introductory linguistic textbooks and assumed in highly developed structuralist approaches such as Distributed Morphology, hypothesizes that the fundamental units of disassembly and reassembly are morphemes. Morphemes, in their ordinary interpretation, are bi-unique form-meaning pairings.10 This approach can be characterized as syntagmatic, because it emphasizes both the linear concatenation of constitutive parts as well as their hierarchical arrangement into (binary) branching structures, and compositional, because it derives the meaning of the whole word from the meanings of its identifiable parts, including the scopal relations provided by the hierarchical organization of its elements. On this view,
9 We ignore here the issue of possible hierarchical structure within the word.
10 Recent efforts within Distributed Morphology have assimilated the criticisms levelled at both standard and extended notions of the morpheme construct by denominating all of the word internal structure conceived as non-morphemic in WP models as morphemic: this is motivated by Beard's lexeme/morpheme proposal, according to which every element which is not a lexeme is a morpheme. This concurs with common assumptions within WP approaches that part of morphology is the study of word internal structure, but it also, in effect, defines all word internal structure as necessarily morphemic. It includes as “morphemic” every type of exponent conventionally used to argue against the standard notion of morpheme. In doing so, it locates the difference between WP and structuralist approaches in their implementational choices and the guiding assumptions about how language analysis corresponds with the analysis of complex objects in other disciplines. The crucial question appears to be not what the elements of word internal structure are called, but how they are used in different morphological approaches. For instance, since DM adopts the standard many-to-many relations between morphosyntactic properties and their forms found in WP proposals, a distinguishing difference between alternative approaches concerns whether the “internal structure of inflectional morphology receives a syntactic treatment” (Marantz 2013:908): the syntactic treatment of words is one of the gambits guiding research in DM that differentiates it from WP approaches. In the latter, morphology and syntax are interdependent, but independent systems governed by their own primitives and principles of organization: syntax is not morphology beyond the word, as suggested by the structuralists, nor is morphology syntax below the word, as suggested in various generative proposals. Finally, concerning correspondence with other disciplines, it is not evident how the implementational preferences in, e.g., DM, relate to results achieved by systems-oriented proposals such as those described here.
Hungarian morphology can serve as a model system for understanding cross-linguistic morphology, since it so clearly exemplifies the essential assumptions guiding morphemic theory. In fact, this basic analytic strategy appears so intuitive, given the instructive example of Hungarian, that it seems commonsensical to extend it to languages in which the meanings of the parts and the composite meaning of the whole are less transparently related than they are in Hungarian. This is the issue Lounsbury (1953) raised when he wondered whether morphological theory should be predicated on theorizing a “fictive agglutinative analog”. How much are all languages underlyingly like what an idealized Hungarian is on its surface? In this connection, a fundamental theoretical question that arises, for both Hungarian and other languages, concerns whether the instructions for reassembly of the pieces, in terms of e.g. linear sequence or hierarchy, can adequately recapture the nature of Hungarian complex words, and whether the parts' participation in the whole, in Hungarian and in other languages more generally, is unrecapturable without the diacritic instructions for reassembly posited in all such theories. With at least equal plausibility, even such straightforward Hungarian data can be interpreted as motivating a morphological theory guided by the assumption that there are gradient degrees of regularity, as in Bybee (1985) and Bochner (1993), among others.

Of course, in Hungarian, as in other languages, there are similarly structured words that are not simply the sums of their parts. Hungarian egész-ség-em-ről ‘about my health’ contains all the same basic parts as bátor-ság-om-ról (namely, an adjective root, a nominalizer, a possessive marker, and an elative case suffix), but the derived nominal egész-ség conventionally denotes ‘health’, similar to what occurs in English, not the expected compositional meaning ‘wholeness’. Accepting the need to stipulate the meaning of the derived noun, however, the rest of the word's meaning conforms with expectation. Such locally idiosyncratic departures from expectations, then, can be viewed as restricted deviations that support the basic analysis into meaningful parts and the division between regular and irregular forms. In both instances the basic elements recur in essentially the same locations and functions: the potential role of their patterned organization is hidden by the coincidence of the same pieces organized in the same way.

Consideration of the first past tense realizations of -em conjugation verbs in Mari (Uralic) provides a simple demonstration of how the organization of elements can be as crucial as the elements themselves. This is illustrated with partial paradigms for the representative verb kol ‘die’.
(2)      positive       negative
    1    kolə̂-š-ə̂m      š-ə̂m kolə̂
    2    kolə̂-š-ə̂c      š-ə̂c kolə̂
    3    kolə̂-š         ə̂š kolə̂
In Mari there are clearly segmentable elements whose invariant sequence is associated with past tense and 1sg subject for the relevant conjugation class; on the other hand, it is the location of this unit as a suffix or as an independent unit preceding the verbal stem that is
associated with positive versus negative polarity of verbs, respectively.11 The same pieces deployed in different morphological configurations convey different polarity values for verbs: they take on different functions in the word context in which they occur.

We find another, far more complex, example of the challenge of conveying, e.g., the singular versus plural number distinction in nouns in the Agar dialect of Dinka (Western Nilotic), as analyzed in Andersen (2014).12 This language distinguishes case and number for its inventory of mostly mono-syllabic and di-syllabic noun lexemes by word internal interactions among four parameters: (i) vowel length, (ii) tone, (iii) voice quality of the vowel, and (iv) vowel quality alternation grade. He presents the following noun pairs to illustrate the evident independence of any specific collection of these parameters from determinate number values:

(3)   SG         PL
      dı̠́t        djɛ̠̀ɛt      ‘bird’
      kɔ̠̀ɔɔr      kà̠ar       ‘elbow’
      rjɛ̠̀ɛm      rı̠̂m        ‘blood’
      cjé̤ec      cı̤́c        ‘bee’
      lá̤j        là̤aj       ‘animal’
      mà̠ac       mɛ̤̂ɛc       ‘fire’
      dò̠m        dṳ̂um       ‘field’
      tò̤oɲ       tô̤oɲ       ‘pot’
      t̪wô̤ooɲ     t̪ṳ́uɲ       ‘ember’
      ɰà̤am       ɰɔ̤̀ɔɔm      ‘thigh’
Grossly describing one simple contrast: when we compare the pairs of words for ‘thigh’ and ‘elbow’, it becomes clear that they display mirror images with respect to length and tone in their singular and plural exponents. Whereas the singular for ‘elbow’ has triple length for its vowels and low tone on the first vowel, this is the word internal pattern for the plural of ‘thigh’. Similarly, the double length vowels and low tone for the singular of ‘thigh’ parallel the same pattern for the plural of ‘elbow’. Concerning the relation between singular and plural pairs of the subset of types presented above, Andersen (2014:226) writes:

From the very beginning of linguistic research on Dinka, it has been noted that number inflection of nouns in this language is irregular. Mitterutzner (1866:15) and Beltrame (1880:22–24) stated that there is no general rule for forming the plural from the singular, and both authors made observations about the types of phonetic differences existing between the singular form and the plural form of a noun… That number inflection of simple native nouns, such as those … above, is
11 Following Ackerman & Stump (2004), we assume that the periphrastic expressions for negative polarity fill cells in the morphological paradigm of the lexeme ‘die’. See Bonami (2015) for a formal treatment of the relation between multiword morphological expressions and their syntactic realization.
12 See Baerman (2012) for a detailed analysis of similar data in Nuer that recognizes the importance of the ideas presented in this chapter for the organization of the Nuer system.
indeed irregular and unpredictable, has recently been established by Ladd et al. (2009). The plural form cannot be predicted from the singular form, nor can the singular form be predicted from the plural form, and the number inflection may appear to be totally irregular.

An approach to word structure that focuses on the surface shapes of word pairs, rather than on the generation of individual words, predicts that there is hidden patternment in this evident profusion of forms. It is the morphologist's task to reveal this. Indeed, Andersen delineates numerous distinct patterns, demonstrating that what seemed utterly irregular is actually organized into different patterns and subpatterns with many or few members. Patterned pairings of words emerge from varying configurations of vowel length, tone, vowel quality and vowel gradation. A crucial ingredient for understanding this system is the recognition that words are primary objects of morphological theory and that, consequently, contrasts between words disclose the patterned nature of morphological organization. Indeed, a fundamental distinction between “morpheme”-based and word-based approaches is the claim in the former that words are epiphenomenal and the hypothesis in the latter that they represent an important independent level of analysis. This basic view is foundational for the careful exploration of Georgian morphology in Gurevich (2006:44):

The meaning of the whole word licenses the exponents to be used, but there is no precondition that the meanings of the exponents have to combine to comprise the meaning of the whole. Compositionality may, indeed, emerge, but as a side product rather than a central principle, or perhaps as an effective learning strategy. The whole itself may contribute meaning to the meanings of the parts, or may override the meanings of the parts.

(Classes of) words are wholes that are distinguishable by means of the patterns associated with their pieces. From a word-based perspective, surface wordforms are best viewed as ‘recombinant gestalts’ or configurations of recurrent partials (segmental or suprasegmental) that get distributed in principled ways among members of paradigms. This parallels what Oudeyer (2006:22) describes as the “systematic reuse” (we would suggest systemic reuse) of phonological distinctions: all languages have repertoires of gestures and combinations of gestures which are small in relation to the repertoires of syllables, and whose elements are systematically reused to make syllables. Similarly, the domain of morphology can be seen as an instance of a complex adaptive system, redeploying the same pieces in new ways for different purposes. Given this, the analysis of morphology begins to look like it can benefit from methods used in other fields which study such systems.

The adoption of the word as an independent and necessary unit of analysis also permits words, in turn, to be parts of paradigmatic systems or niches. In this respect, a consequence of
permitting words to be contrasted with words is the possibility of discovering morphological organization in the systems of relations between words. This accords with Robins' (1959:128) observation:

…words anchored, as it were, in the paradigms of which they form a part usually bear a consistent, relatively simple and statable grammatical function. The word is a more stable and solid focus of grammatical relations than the component morpheme by itself. Put another way, grammatical statements are abstractions, but they are more profitably abstracted from words as wholes than from individual morphemes.

In this connection Blevins (in press) makes an incisive observation about the consequences of favoring a focus on pieces and their composition, while ignoring the pivotal role of the word in morphology. Solely focusing on the comprehensive reduction of the word into smaller pieces denies a whole line of inquiry into morphology, namely, the examination of how surface patterns of words cohere into organized systems. In particular, the surface expressions of words do not simply motivate the need for operations that transpose covert invariant representations into overt wordforms; they are centrally informative units which contribute to the similarity and difference relations facilitating the organization of and relations between paradigms. It is to this other aspect of the part/whole organization of morphology that we now turn.
2.2 Domain 2: The place of words in the systemic organization of paradigms

The value of describing contrasts between words differing in morphosyntactic content and surface exponence is evident from the Mari and Agar Dinka examples discussed in the preceding section: the words used in these data displays are representative of general patterns instantiated by the morphological systems of these languages. Rather than being epiphenomena, as words are taken to be in some theoretical frameworks, the patterns extracted from the comparisons between related words can be interpreted as instructive about fundamental organizing principles of morphological systems. As argued in the following section, such patterns provide data for quantifying and modelling relations between (classes of) words.

The explanatory utility of words and paradigms in the dynamics of language change is unquestionable. This is especially evident from instances of analogy. For example, Anttila (1989:91) provides a simple example from Estonian, where the loss of medial -k as an onset to a closed syllable had consequences for forms of words without a medial -k. The nominative singular form of the word kask ‘birch’ had an original plural form kas-k-ed ‘birches’. Since the -k functioned as the onset to the closed syllable of the case/number marker, it was elided: the new form was kased. This form resembled the nominative plural for some -s final words. For example, the original nominative singular for ‘fir’ was kuus, while its nominative plural was kuused. This resemblance to the nominative plural for ‘birch’ led to an innovatory nominative singular form for ‘fir’, specifically kuusk. Historical examples such as this demonstrate the
importance of words and their role in morphological systems. But they also suggest that language as a constantly changing object is profitably viewed more generally as a dynamic system. This was the view of Paul (1891:5–6):

Thus it is that the different uses, which have come to be associated with a word or a phrase, associate themselves with each other. Thus to the different cases of the same noun, the different tenses, moods and persons of the same verb, the different derivatives of the same root, associate themselves, thanks to the relationships between their sounds and the meaning; … further forms of different words with similar functions – e.g., all plurals, all genitives, … all masculines which form their plural by means of umlaut as contrasted with those that form it otherwise … These associations may one and all arise and operate without consciousness, and they must not be confounded with grammatical categories, which are the result of conscious abstraction, though they not infrequently cover the same ground.

Paul also observed that the patterned relations which inhered in such associative networks were useful as analogical bases for grammatical generalizations. Indeed, Davies (1998:259) identifies the creative role that analogy has in Paul's conception of language organization:

Paul's concept of analogy and of analogical proportion is a definite attempt at providing a generalized account at a certain level of detail of how language production occurs and of how the speaker and hearer can produce an infinite number of forms and sentences which they have not heard before.

Appropriate analogical inferences are facilitated by the implicational relations characteristic of paradigm structure, as observed by Matthews (1991) and explored by Wurzel (1989). The basic nature of such structure is made plain in Paunonen's (1976) analysis of Finnish nominal paradigms.13 Following the classification scheme found in Pihel & Pikamäe (1999:758–771) we can represent a subset of the Finnish nominal paradigms as follows:14
(4)   NOM.SG   GEN.SG   PART.SG   PART.PL   INESS.PL
      ovi      oven     ovea      ovia      ovissa      ‘door’ (8)
      kieli    kielen   kieltä    kieliä    kielissä    ‘language’ (32)
      vesi     veden    vettä     vesiä     vesissä     ‘water’ (10)
      lasi     lasin    lasia     laseja    laseissa    ‘glass’ (4)
      nalle    nallen   nallea    nalleja   nalleissa   ‘teddy’ (9)
      kirje    kirjeen  kirjettä  kirjeitä  kirjeissä   ‘letter’ (78)
The numbers in parentheses after the glosses refer to Pihel & Pikamäe's classes. Looking at classes 8, 32, 10, and 4, it becomes clear that nominative singular forms ending in the vowel -i
13 See also Thymé (1993).
14 The following discussion describes the patterns in the simplest way in order to provide a sense of the nature of implicational relations. Among other factors, it ignores well-known phonological generalizations and patterns of stem alternation.
are not diagnostic of the corresponding genitive singular wordform: these variants end in -en for 8 and 32, but in -in for class 4. Moreover, familiarity with the genitive singular forms for classes 8, 32, and 10 is not directly diagnostic for predicting the nominative singular, since there is a consonant difference in the stem for class 10, from -d to -s. Likewise, knowledge of the genitive singular is not unambiguously diagnostic of the partitive singular for these classes, since class 8 appears to pattern with class 9 for this morphosyntactic property set in ending with -ea, while 32 and 10 appear to pattern together, both taking -tä. Though it seems that there is a kaleidoscope of patterns in this subparadigm, a close look reveals that there are some individual forms in specific cells that are predictive of forms in other cells, while in other instances ensembles of forms from several cells jointly predict the particular patterns for words. For example, knowledge of the partitive plural form of a word in classes 4 and 9 is not enough to predict their nominative singular variants. However, knowledge of both the partitive plural and the genitive singular is jointly predictive of the correct nominative singular forms. Relatedly, knowledge of the partitive singular alone for class 8 and class 9 is not enough to predict whether the nominative singular ends in -i or -e. But knowledge of the forms for both the partitive singular and the partitive plural is reliably predictive of the nominative singular forms.15

Paradigmatic implications of this sort appear to be an essential property of morphological organization cross-linguistically and are particularly important in languages with complex morphological systems. The existence of relations between words such as these in complex systems motivates the need for the application of new quantitative methods for revealing patterns that would otherwise remain undetected and for calculating degrees of relatedness between the words within (and across) identified patterns. This will be the subject of section 3.
2.3 Summary
In sum, it is useful to distinguish between different approaches to morphology in terms of how they conceptualize part/whole relations with respect to Domain 1 and Domain 2. So, what is the difference in the interpretation of word internal structure between “morpheme”-based and word-based approaches? From a word-based perspective, the “morpheme”-based foci on resolving the indeterminacies of segmentation, developing reassembly and readjustment rules, assuming a stark division between regular and irregular formations, and presuming the epiphenomenal status of words seem to misunderstand the fundamentally part/whole organization of morphology. The cross-linguistic examination of word internal structure reveals, instead, how the configurative organization of elements constitutive of words yields patterns that distinguish (classes of) words from each other. The function of internal structure is, accordingly, discriminative (see Baayen & Ramscar 2015, Ramscar et al. 2015): it provides discriminably different patterns that facilitate the patterns of similarity and difference that words participate in. This leads to questions concerning how wordforms are
15 There are, of course, many other implicative relations among forms in this illustrative subparadigm.
organized into structured networks of conjugation and declension classes within inflectional and derivational families. It becomes natural to ask why the systems of organization cohere in the ways that they do, how such organization is learned, and whether the nature of the organization reflects learnability constraints, either specific to language or relevant in other learned domains as well. The largest difference between these approaches concerns whether words as whole units are regarded as theoretical objects of interest: they are not in generative approaches, but they are in WP approaches. Given the status of words as theoretical objects, they can function as parts in systems of relations that constitute another whole, namely, the patterns of related words. This is a dimension of paradigmatic organization, where whole words constitute parts of patterns which themselves are constitutive of larger systems of word patterns and their interrelations. Beyond descriptions of paradigmatic structure in terms of implicative organization and intuitions concerning their importance, modern developments of word-based proposals have also devised ways to measure and quantify the relations constitutive of this structure. We explicate and explore one particularly fertile approach to this below.
3 Information-theoretic models

The complexity associated with lexically-conditioned allomorphy typically shows only a loose correlation with systematic phonological or semantic conditions (synchronically, at least) and often seems to serve no apparent communicative function. All natural languages show a certain degree of what Baerman et al. (2010:2) call “gratuitous” morphological complexity and Wurzel (1986:76) describes as “ballast” in the linguistic system. Take, for example, Pite Saami nominal paradigms. In Pite Saami, seven cases (setting aside the marginal essive and abessive) and two numbers are encoded via a suffix and the choice of a weak or strong stem, where stem grades are distinguished by regular patterns of consonant and vowel alternations. For example, the complete paradigm for bäbbmo ‘food’ is (Wilbur 2014:102):
16 Class Ie nouns are also distinguished by “non-adjacent regressive vowel harmony triggered by the presence of /j/ in certain case/number suffixes” (Wilbur 2014:102).
17 Class II nouns show variation in the suffix vowel, though “there do not appear to be many words in Class II, and the data in the corpus are ultimately inconclusive” (Wilbur 2014:104).
18 In Class IIIb, nominative singular forms drop a stem-final consonant. For example, compare Class IIIa vanás ‘boat’ NOM.SG ∼ vadnás-a GEN.SG and Class IIIb bena ‘dog’ NOM.SG ∼ bednag-a GEN.SG. In both, the -n- ∼ -dn- alternation follows from general stem grade patterns, but the loss of the final -g in bena does not (Wilbur 2014:106).
(5)          SG          PL
    NOM      bäbbmo      biebmo
    GEN      biebmo      biebmoj
    ACC      biebmov     biebmojd
    ILL      bäbbmoj     biebmojda
    INESS    biebmon     biebmojn
    ELAT     biebmost    biebmojst
    COM      biebmojn    biebmoj

        NOM.SG  GEN.SG  ACC.SG  ILL.SG  INESS.SG  ELAT.SG  COM.SG
Ia      str+a   wk+a    wk+av   str+aj  wk+an     wk+ast   wk+ajn
Ib      str+á   wk+á    wk+áv   str+áj  wk+án     wk+ást   wk+ájn
Ic      str+o   wk+o    wk+ov   str+oj  wk+on     wk+ost   wk+ojn
Id      str+å   wk+å    wk+åv   str+åj  wk+ån     wk+åst   wk+åjn
Ie16    str+e   wk+e    wk+ev   str+áj  wk+en     wk+est   wk+ijn
II17    wk+aj   str+a   str+av  str+aj  str+an    str+ast  str+ajn
IIIa    wk+∅    str+a   str+av  str+ij  str+in    str+ist  str+ijn
IIIb    wk+V18  str+a   str+av  str+ij  str+in    str+ist  str+ijn

        NOM.PL  GEN.PL  ACC.PL   ILL.PL    INESS.PL  ELAT.PL   COM.PL
Ia      wk+a    wk+aj   wk+ajd   wk+ajda   wk+ajn    wk+ajst   wk+aj
Ib      wk+á    wk+áj   wk+ájd   wk+ájda   wk+ájn    wk+ájst   wk+áj
Ic      wk+o    wk+oj   wk+ojd   wk+ojda   wk+ojn    wk+ojst   wk+oj
Id      wk+å    wk+åj   wk+åjd   wk+åjda   wk+åjn    wk+åjst   wk+åj
Ie      wk+e    wk+ij   wk+ijd   wk+ijda   wk+ijn    wk+ijst   wk+ij
II      str+a   str+aj  str+ajd  str+ajda  str+ajn   str+ajst  str+aj
IIIa    str+a   str+ij  str+ijd  str+ijda  str+ijn   str+ijst  str+ij
IIIb    str+a   str+ij  str+ijd  str+ijda  str+ijn   str+ijst  str+ij

Table 1: Pite Saami nominal inflection classes (adapted from Wilbur 2014); str = strong stem grade, wk = weak stem grade
Following Wilbur (2014), Pite Saami has eight nominal declensions showing distinct grade and suffix patterns, as shown in Table 1. Since the assignment of lexical items to particular declensions is largely arbitrary (though influenced by phonological factors), these classes add complexity to the inflectional system in a way that serves no communicative purpose.

In classical paradigm-based models of morphology, a morphological system is represented via two distinct components: a set of exemplary full paradigms that exhibit the inflectional classes of a language, and sets of diagnostic principal parts which can be used to deduce which inflectional class a given lexeme belongs to. Speakers may memorize complete paradigms for frequent lexemes, but for infrequent lexemes speakers must produce wordforms by analogy from known lexical items. Given the right wordforms of a novel lexeme, Word and Paradigm models provide a general strategy for filling in the rest of the paradigm by exploiting its
implicational structure. In general, a small set of diagnostic principal parts is often sufficient to identify the inflectional class of a lexeme and thus to accurately predict the remaining wordforms of the lexeme. Paradigm-based models also reflect a measure of E-complexity: languages with a greater number of possible exponents, inflectional classes, and principal parts will require more wordforms to be memorized by the language user (and recorded by the lexicographer) in exemplary paradigms.

However, from the point of view of the (fluent) language user, this is an artificial measure of complexity. While speakers of morphologically complex languages do often have to produce wordforms that they have never heard before, they rarely have to predict all forms of a given lexeme. On the contrary, speakers must produce some subset of the complete paradigm of a lexeme given knowledge of some other subset, a task which rarely requires completely resolving a lexeme's inflectional class membership. In addition, speakers have no guarantee that they will have been exposed to the most relevant or diagnostic principal parts of a novel lexeme. Thus, patterns of implicational relations among all wordforms within paradigms, not just principal parts, can be interpreted as providing speakers with a means for carrying out these predictions with incomplete information.19
3.1 Entropy

In order to assess the strength of implicational relations among wordforms, we will use the information-theoretic notion entropy as the measure of uncertainty or predictability (Ackerman et al. 2009, Ackerman & Malouf 2013).20 This permits us to quantify “prediction” as a reduction in uncertainty, or information entropy (Shannon 1948).21 Suppose we are given a random variable X which can take on one of a set of alternative values x_1, x_2, …, x_n with corresponding probabilities p(x_1), p(x_2), …, p(x_n). Then, the amount of uncertainty in X, or, alternatively, the degree of information conveyed on learning the value of X, is the entropy H(X):

    H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
The entropy H(X) is the weighted average of the surprisal -log_2 p(x_i) for each possible outcome x_i. The surprisal is a measure of the amount of information expressed by a particular outcome measured in bits, where 1 bit is the information content of a choice between two equally probable outcomes. Outcomes which are less probable (and therefore harder to predict) have higher surprisal. Specifically, surprisal is 0 bits for outcomes which always occur (p(x) = 1) and approaches infinity for very unlikely events (as p(x) approaches 0).
19 This fact motivates Stump & Finkel (2009, 2013) to propose a notion of “cell predictability” as one measure of paradigm transparency.
20 There are different ways to calculate complexity and the way that it impacts on the predictiveness and predictability of forms. A particularly well-developed alternative can be found in Stump & Finkel (2013).
21 For a comprehensive treatment of information theory, see Cover & Thomas (2006), while Pierce (1980) offers a more accessible introduction.
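To make the definition concrete, the entropy of a distribution can be computed directly. The following sketch is our own Python illustration, not part of the original chapter; the entropy helper is a hypothetical name:

    import math

    def entropy(probs):
        # Shannon entropy, in bits, of a discrete probability distribution.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # a fair coin: 1.0 bit
    print(entropy([0.9, 0.1]))   # a biased coin: ~0.47 bits
    print(entropy([1/8] * 8))    # eight equally likely outcomes: 3.0 bits

Outcomes with probability zero contribute nothing to the sum, which is why they are filtered out before the logarithm is taken.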
The more choices there are in a given domain and the more evenly distributed the probability of each particular alternative, the greater the uncertainty or surprise there is (on average) that a particular choice among competitors will be made and, hence, the greater the entropy. Conversely, choices with only a few possible outcomes or with one or two highly probable outcomes among many unlikely exceptions have a low entropy.

With this as background we can now return to the Pite Saami nominal paradigms in Table 1 to quantify the uncertainty among the nominal types. Suppose we want to represent the inflection class membership of an arbitrary lexeme. This is, for instance, the problem faced by a lexicographer preparing a dictionary of the language. If D is the set of declensions for a particular paradigm, the probability (assuming all declensions are equally likely) of an arbitrary lexeme belonging to a particular declension d is

    p(d) = \frac{1}{|D|}
Since in the Pite Saami example there are eight distinct classes, the probability of any lexeme belonging to any one class would be 1/8. We could represent a lexeme's declension as a choice among eight equally likely alternatives, which thus has an entropy of -log_2(1/8) = 3 bits. This is the declension entropy H(D), the average information required to record the inflection class membership of a lexeme.

In general, not all inflection classes are equally likely: for Pite Saami, Wilbur (2014:100) reports that Class I is “a sort of default class” which contains the majority of nouns, while Classes II and III are relatively rare. In any inflectional system, some classes will have more members than others, and a randomly selected lexeme is more likely to be a member of a class with many members. Let F_typ(d) be the type frequency of declension d, i.e., the number of lexemes that are members of that class. Then, in general, the probability of a declension d is:

    p(d) = \frac{F_{typ}(d)}{\sum_{d' \in D} F_{typ}(d')}

That is, the probability of a randomly selected word being in declension d is just the number of lexemes which actually are in declension d divided by the sum of the lexeme counts for all declensions (which in turn is just the total number of lexemes in the relevant vocabulary). Factoring type frequency into our calculation of declension entropy can only reduce our estimate, sometimes substantially. Hypothetically, suppose that 95% of Pite Saami noun lexemes are evenly divided among the subclasses of Class I and the remaining 5% are divided among the other three classes. Then our estimate of the declension entropy would be reduced from 3 bits to 2.6 bits.
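The effect of this hypothetical frequency skew can be checked with the entropy helper sketched above (again, our own illustration, using assumed figures):

    import math

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Hypothetical type frequencies: 95% of noun lexemes spread evenly over
    # the five Class I subclasses, the remaining 5% over II, IIIa, and IIIb.
    probs = [0.95 / 5] * 5 + [0.05 / 3] * 3
    print(entropy(probs))   # ~2.57 bits, i.e. the 2.6 bits cited above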
In many cases, inflection class membership is also at least partly predictable by external factors, such as the phonological shape or lexical gender of the root. Any information that helps speakers predict the realization of a wordform can only reduce the entropy. For the sake of this example, we ignore both frequency and these external factors. This means that the entropy values we present are upper bounds. If all factors are taken into account, the actual entropies will likely be much lower.

Recording the declension of an arbitrary noun lexeme (the problem faced by our hypothetical lexicographer) is more difficult than the problem faced by a speaker. An exhaustive dictionary might provide complete paradigms showing all of the inflected forms of a lexeme or class of lexemes, while speakers need only produce one single form in any particular context. When we look at individual paradigm cells rather than full paradigms/inflection classes, we find much less uncertainty than the declension entropy would lead us to expect. While there are eight declensions in Table 1, most cells show only seven distinct forms, and the illative singular has only five possible realizations. Let D_{c=r} be the set of declensions for which the paradigm cell c has the formal realization r. Then the probability p_c(r) that a paradigm cell c of a particular lexeme has the realization r is the probability of that lexeme belonging to one of the declensions in D_{c=r}, or:

    p_c(r) = \sum_{d \in D_{c=r}} p(d)
The entropy of this distribution is the paradigm cell entropy H(c), the uncertainty in the realization of a paradigm cell c. Carrying out the necessary calculations for the Pite Saami paradigms, we get:
(6) NOM.SG  GEN.SG  ACC.SG  ILL.SG  INESS.SG  ELAT.SG  COM.SG
    3.000   2.406   2.406   2.250   2.750     2.750    2.750
    NOM.PL  GEN.PL  ACC.PL  ILL.PL  INESS.PL  ELAT.PL  COM.PL
    2.406   2.750   2.750   2.750   2.750     2.750    2.750
Note that the paradigm cell entropy varies across the paradigm cells. The illative singular has only five possible realizations and an entropy of 2.250 bits, while the most diverse cells have an entropy (at 3.000 bits, assuming uniform declension probabilities) equal to that of the declension system as a whole. The average entropy across all cells is 2.658 bits; this average is a measure of how difficult it is for a speaker to guess the realization of any one wordform of any particular lexeme in the absence of any information about that lexeme's declension. An entropy of 2.66 bits is equivalent to selecting among only 2^2.658 = 6.31 equally likely alternatives. That is, the Pite Saami paradigms in Table 1 fall into eight declensions, but selecting the realization for a particular wordform of a lexeme is as difficult as a choice among a little more than six equally likely alternatives.
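The per-cell entropies in (6) can be reproduced mechanically from Table 1. In the sketch below (our own illustration; the PARADIGM dictionary simply transcribes Table 1), each of the eight declensions is assumed to be equally probable:

    import math
    from collections import Counter

    def entropy(probs):
        # Shannon entropy, in bits, of a discrete probability distribution.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # One realization per declension, in the order Ia, Ib, Ic, Id, Ie, II,
    # IIIa, IIIb, transcribed from Table 1 (str/wk = strong/weak stem grade).
    PARADIGM = {
        'NOM.SG':   ['str+a', 'str+á', 'str+o', 'str+å', 'str+e', 'wk+aj', 'wk+∅', 'wk+V'],
        'GEN.SG':   ['wk+a', 'wk+á', 'wk+o', 'wk+å', 'wk+e', 'str+a', 'str+a', 'str+a'],
        'ACC.SG':   ['wk+av', 'wk+áv', 'wk+ov', 'wk+åv', 'wk+ev', 'str+av', 'str+av', 'str+av'],
        'ILL.SG':   ['str+aj', 'str+áj', 'str+oj', 'str+åj', 'str+áj', 'str+aj', 'str+ij', 'str+ij'],
        'INESS.SG': ['wk+an', 'wk+án', 'wk+on', 'wk+ån', 'wk+en', 'str+an', 'str+in', 'str+in'],
        'ELAT.SG':  ['wk+ast', 'wk+ást', 'wk+ost', 'wk+åst', 'wk+est', 'str+ast', 'str+ist', 'str+ist'],
        'COM.SG':   ['wk+ajn', 'wk+ájn', 'wk+ojn', 'wk+åjn', 'wk+ijn', 'str+ajn', 'str+ijn', 'str+ijn'],
        'NOM.PL':   ['wk+a', 'wk+á', 'wk+o', 'wk+å', 'wk+e', 'str+a', 'str+a', 'str+a'],
        'GEN.PL':   ['wk+aj', 'wk+áj', 'wk+oj', 'wk+åj', 'wk+ij', 'str+aj', 'str+ij', 'str+ij'],
        'ACC.PL':   ['wk+ajd', 'wk+ájd', 'wk+ojd', 'wk+åjd', 'wk+ijd', 'str+ajd', 'str+ijd', 'str+ijd'],
        'ILL.PL':   ['wk+ajda', 'wk+ájda', 'wk+ojda', 'wk+åjda', 'wk+ijda', 'str+ajda', 'str+ijda', 'str+ijda'],
        'INESS.PL': ['wk+ajn', 'wk+ájn', 'wk+ojn', 'wk+åjn', 'wk+ijn', 'str+ajn', 'str+ijn', 'str+ijn'],
        'ELAT.PL':  ['wk+ajst', 'wk+ájst', 'wk+ojst', 'wk+åjst', 'wk+ijst', 'str+ajst', 'str+ijst', 'str+ijst'],
        'COM.PL':   ['wk+aj', 'wk+áj', 'wk+oj', 'wk+åj', 'wk+ij', 'str+aj', 'str+ij', 'str+ij'],
    }

    def cell_entropy(cell):
        # Identical realizations in different declensions pool their probability.
        counts = Counter(PARADIGM[cell])
        return entropy([n / 8 for n in counts.values()])

    for cell in PARADIGM:
        print(cell, round(cell_entropy(cell), 3))
    print('average:', round(sum(cell_entropy(c) for c in PARADIGM) / 14, 3))  # 2.658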
3.2 Conditional entropy

Guessing the realization of a single wordform is quite a bit easier than guessing the declension of a lexeme. But even this overstates the complexity of the system, as speakers must have some information about the lexeme in order to know that the lexeme even exists. At
a minimum, speakers will know at least one wordform of a lexeme for which they wish to produce a novel wordform. To quantify the predictability of one form given the other, we can measure the size of the surprise associated with these forms using conditional entropy H(Y|X), the uncertainty in the value of Y given that we already know the value of X:

    H(Y|X) = H(X,Y) - H(X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log_2 p(y|x)
The smaller H(Y|X) is, the more predictable Y is on the basis of X, i.e., the less surprised one is that Y is selected given knowledge of X. In the case where X completely determines Y, the conditional entropy H(Y|X) is 0 bits: given the value of X, there is no question remaining as to what the value of Y is. On the other hand, if X gives us no information about Y at all, the conditional entropy H(Y|X) is equal to H(Y): given the value of X, we are just as uncertain about the value of Y as we would be without knowing X at all.

Above we defined p_c(r), the probability that paradigm cell c of a lexeme has the realization r. We can easily generalize that to the joint probability of two cells c_1 and c_2 having the realizations r_1 and r_2 respectively:

    p_{c_1,c_2}(r_1, r_2) = \sum_{d \in D_{c_1=r_1 \wedge c_2=r_2}} p(d)

To quantify paradigm cell inter-predictability in terms of conditional entropy, we can define the conditional probability of a realization given another realization of a cell in the same lexeme's paradigm:

    p_{c_2}(r_2 | c_1 = r_1) = \frac{p_{c_1,c_2}(r_1, r_2)}{p_{c_1}(r_1)}

With this background, the conditional entropy H(c_2|c_1) of a cell c_2 given knowledge of the realization of c_1 for a particular lexeme is:

    H(c_2|c_1) = -\sum_{r_1} p_{c_1}(r_1) \sum_{r_2} p_{c_2}(r_2 | c_1 = r_1) \log_2 p_{c_2}(r_2 | c_1 = r_1)
In the case of the Pite Saami forms in Table 1, if we know the genitive singular, then we know the nominative plural; these forms are always identical. If we know that the nominative plural is marked by the weak grade stem and the suffix -a, then we can predict with certainty that the illative singular is marked by the strong grade stem and the suffix -aj. That is, H(ILL.SG | NOM.PL = wk+a) is 0 bits. If, though, the nominative plural is in str+a, there are two possibilities for the illative singular, either str+aj (in one class) or str+ij (in two classes). Therefore, H(ILL.SG | NOM.PL = str+a) = 0.918 bits.
Averaging across each of the possible realizations for the nominative plural, we get the conditional entropy:

    H(ILL.SG | NOM.PL) = 0.344 bits
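These figures can be verified with a small extension of the entropy sketch (our own illustration; the realizations are transcribed from Table 1):

    import math
    from collections import Counter

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # NOM.PL and ILL.SG realizations, in declension order Ia ... IIIb.
    NOM_PL = ['wk+a', 'wk+á', 'wk+o', 'wk+å', 'wk+e', 'str+a', 'str+a', 'str+a']
    ILL_SG = ['str+aj', 'str+áj', 'str+oj', 'str+åj', 'str+áj', 'str+aj', 'str+ij', 'str+ij']

    def cond_entropy(known, unknown):
        # H(unknown | known) = H(known, unknown) - H(known),
        # with all eight declensions taken to be equally probable.
        n = len(known)
        joint = entropy([v / n for v in Counter(zip(known, unknown)).values()])
        return joint - entropy([v / n for v in Counter(known).values()])

    print(cond_entropy(NOM_PL, ILL_SG))   # H(ILL.SG | NOM.PL): ~0.344 bits
    print(cond_entropy(ILL_SG, NOM_PL))   # H(NOM.PL | ILL.SG): ~0.500 bits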
In other words, while guessing the ILL.SG of a lexeme is a choice among five alternatives, guessing the ILL.SG on the basis of the NOM.PL requires (on average) a choice among only 2^0.344 = 1.3 alternatives. The conditional entropy is a measure of the difficulty of solving one instance of what was referred to above as the Paradigm Cell Filling Problem: predicting a specific unknown wordform from a specific known wordform.

A complete table of pairwise conditional entropies for Pite Saami nouns is given in Table 2. One obvious pattern is that for the majority of cell pairs, the conditional entropy is zero bits. That is, most cells in the paradigm are completely predictable from most other cells. And, even for the cells which are not completely predictable, the conditional entropy is consistently less than one bit: all cells are mostly predictable from any other cell.

These values tell us how difficult it is to guess one particular wordform on the basis of one other particular wordform. In general, however, we cannot predict which forms a speaker will generalize from or to. This will depend on the cell probability p(c), the probability that a randomly selected wordform is some lexeme's realization of cell c. In the simplest case we can assume that all cells are equally likely, so if C is the set of cells in a paradigm then:

    p(c) = \frac{1}{|C|}

Or, we could estimate p(c) from the token frequency F_tok(c) of the cell c in a representative corpus:

    p(c) = \frac{F_{tok}(c)}{\sum_{c' \in C} F_{tok}(c')}

Given p(c), the expected values E[col = c_2] and E[row = c_1] are the average uncertainty in guessing the form of some cell c_2 or guessing based on the form of cell c_1 (respectively):

    E[col = c_2] = \sum_{c_1 \neq c_2} p(c_1) H(c_2|c_1)

    E[row = c_1] = \sum_{c_2 \neq c_1} p(c_2) H(c_2|c_1)
Column averages E[col] are a measure of predictedness, or how difficult it is to guess the realization of a cell (on average) given knowledge of some other cell. For Pite Saami, we get:
(7) NOM.SG  GEN.SG  ACC.SG  ILL.SG  INESS.SG  ELAT.SG  COM.SG
    0.368   0.038   0.038   0.079   0.118     0.118    0.118
    NOM.PL  GEN.PL  ACC.PL  ILL.PL  INESS.PL  ELAT.PL  COM.PL
    0.038   0.118   0.118   0.118   0.118     0.118    0.118
          NOM.SG GEN.SG ACC.SG ILL.SG INESS.SG ELAT.SG COM.SG NOM.PL GEN.PL ACC.PL ILL.PL INESS.PL ELAT.PL COM.PL
NOM.SG    —      0.000  0.000  0.000  0.000    0.000   0.000  0.000  0.000  0.000  0.000  0.000    0.000   0.000
GEN.SG    0.594  —      0.000  0.344  0.344    0.344   0.344  0.000  0.344  0.344  0.344  0.344    0.344   0.344
ACC.SG    0.594  0.000  —      0.344  0.344    0.344   0.344  0.000  0.344  0.344  0.344  0.344    0.344   0.344
ILL.SG    0.750  0.500  0.500  —      0.500    0.500   0.500  0.500  0.500  0.500  0.500  0.500    0.500   0.500
INESS.SG  0.250  0.000  0.000  0.000  —        0.000   0.000  0.000  0.000  0.000  0.000  0.000    0.000   0.000
ELAT.SG   0.250  0.000  0.000  0.000  0.000    —       0.000  0.000  0.000  0.000  0.000  0.000    0.000   0.000
COM.SG    0.250  0.000  0.000  0.000  0.000    0.000   —      0.000  0.000  0.000  0.000  0.000    0.000   0.000
NOM.PL    0.594  0.000  0.000  0.344  0.344    0.344   0.344  —      0.344  0.344  0.344  0.344    0.344   0.344
GEN.PL    0.250  0.000  0.000  0.000  0.000    0.000   0.000  0.000  —      0.000  0.000  0.000    0.000   0.000
ACC.PL    0.250  0.000  0.000  0.000  0.000    0.000   0.000  0.000  0.000  —      0.000  0.000    0.000   0.000
ILL.PL    0.250  0.000  0.000  0.000  0.000    0.000   0.000  0.000  0.000  0.000  —      0.000    0.000   0.000
INESS.PL  0.250  0.000  0.000  0.000  0.000    0.000   0.000  0.000  0.000  0.000  0.000  —        0.000   0.000
ELAT.PL   0.250  0.000  0.000  0.000  0.000    0.000   0.000  0.000  0.000  0.000  0.000  0.000    —       0.000
COM.PL    0.250  0.000  0.000  0.000  0.000    0.000   0.000  0.000  0.000  0.000  0.000  0.000    0.000   —

Table 2: Conditional entropies H(col|row) for the Pite Saami noun paradigms in Table 1 (rows = known cell, columns = predicted cell)
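Table 2 and the averages discussed below can be generated by extending the earlier sketches; this fragment (ours) assumes the PARADIGM dictionary, entropy, and cond_entropy helpers defined above:

    from itertools import permutations

    cells = list(PARADIGM)
    H = {(r, c): cond_entropy(PARADIGM[r], PARADIGM[c])
         for r, c in permutations(cells, 2)}

    # Column average = predictedness, as in (7); row average = predictiveness,
    # as in (8). Each cell is averaged against the 13 other cells.
    E_col = {c2: sum(H[c1, c2] for c1 in cells if c1 != c2) / 13 for c2 in cells}
    E_row = {c1: sum(H[c1, c2] for c2 in cells if c2 != c1) / 13 for c1 in cells}

    print(E_col['NOM.SG'], E_row['ILL.SG'])   # ~0.368, ~0.519
    print(sum(H.values()) / len(H))           # overall average H(P): ~0.116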
Row averages indicate a cell's predictiveness: the average uncertainty in another paradigm cell given knowledge of that cell. For Pite Saami, we have:
(8) NOM.SG  GEN.SG  ACC.SG  ILL.SG  INESS.SG  ELAT.SG  COM.SG
    0.000   0.311   0.311   0.519   0.019     0.019    0.019
    NOM.PL  GEN.PL  ACC.PL  ILL.PL  INESS.PL  ELAT.PL  COM.PL
    0.311   0.019   0.019   0.019   0.019     0.019    0.019
The nominative singular is very predictive but harder to predict: on its basis all other forms are completely predictable, making it a principal part in the classical sense. The illative singular is the least predictive: knowing the illative singular leaves on average 0.519 bits of uncertainty remaining about the realization of another cell.

The average across all possible pairs of wordforms, the overall average conditional entropy H(P), is:

    H(P) = Σ_j p(c_j) E[col = c_j] = Σ_i p(c_i) E[row = c_i]

which, with equiprobable cells, reduces to the mean of the pairwise conditional entropies:

    H(P) = (1 / (|C| (|C| − 1))) Σ_{i ≠ j} H(c_j | c_i)
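Continuing the Python sketch above (and reusing the table and CELLS defined there), this is a one-line computation:

    # H(P): the mean of H(c_j | c_i) over all ordered pairs of distinct
    # cells, assuming uniform cell and class probabilities
    n = len(CELLS)
    H_P = sum(table.values()) / (n * (n - 1))
    print(f"H(P) = {H_P:.3f} bits, a choice among {2 ** H_P:.2f} alternatives")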
For our Pite Saami example, H(P) is 0.116 bits, equivalent to a choice among only 2^0.116 ≈ 1.08 equally likely declensions. That is, while Pite Saami has eight nominal declensions from the point of view of a lexicographer trying to describe the language, for a speaker trying to use the system it has on average only slightly more than one: this is the I-complexity of this paradigm. Accordingly, descriptions of inflectional systems in the form of Table 1 have the potential to greatly overstate the apparent complexity of a morphological system.

In (6) above, we saw that the average entropy per cell in the Pite Saami nominal paradigm is 2.658 bits, and the average conditional entropy given one other cell is 0.116 bits. That means that, on average, each paradigm cell provides 2.658 − 0.116 = 2.542 bits of information about each other cell. So, while these paradigms show a moderate degree of E-complexity (measured either by the number of paradigm cells and inflection classes or by paradigm cell entropy), each wordform of a lexeme also provides the speaker with a lot of information about the other wordforms. By exploiting these implicational relations, speakers can make reliable inferences about unknown cells on the basis of known ones.

These results are in line with the range of average conditional entropy values reported by Ackerman & Malouf (2013). In a small sample of paradigms taken from typologically and genetically diverse languages, the average conditional entropy ranged from 0 bits to 0.75 bits. The sample included languages with impressively complex-looking morphological systems, such as Nuer (see section 2.1) and Chiquihuitlán Mazatec (a language with at least 109
different verbal conjugations). Despite the large range in the E-complexities of these languages, as measured by the number of paradigm cells, allomorphs, and inflection classes, the I-complexities fell within a narrow range.

One thing to note about these results is that they do not depend on the specific forms in the paradigm. All that matters is whether a speaker can discriminate between two forms: the specific form and the manner of discrimination (whether by affixes, tone, stress, ablaut, etc.) is irrelevant. A (hypothetical) straightforwardly agglutinative language with the paradigm in (9) has an average conditional entropy of 0 bits, as expected:

(9)
    I    -a    -am    -aj    -ajm
    II   -o    -om    -oj    -ojm
However, a similarly hypothetical fusional paradigm like the one in (10) also has an average conditional entropy of 0 bits:

(10)
    I    -am   -ij    -im    -ux
    II   -it   -os    -un    -ad
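Both hypothetical systems can be checked with the cond_entropy helper from the earlier sketch; the labels c1–c4 below are arbitrary placeholders for the four paradigm cells:

    # Paradigms (9) and (10): every cell distinguishes the two classes,
    # so every pairwise conditional entropy is 0 bits in both systems
    AGGLUT = {"I":  {"c1": "-a",  "c2": "-am", "c3": "-aj", "c4": "-ajm"},
              "II": {"c1": "-o",  "c2": "-om", "c3": "-oj", "c4": "-ojm"}}
    FUSIONAL = {"I":  {"c1": "-am", "c2": "-ij", "c3": "-im", "c4": "-ux"},
                "II": {"c1": "-it", "c2": "-os", "c3": "-un", "c4": "-ad"}}

    for name, cls in [("agglutinative (9)", AGGLUT), ("fusional (10)", FUSIONAL)]:
        cells = list(cls["I"])
        h = [cond_entropy(cls, r, c) for r in cells for c in cells if r != c]
        print(f"{name}: average H = {sum(h) / len(h):.3f} bits")  # 0.000 for both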
In both paradigms, each cell uniquely identifies the inflection class. There may be other reasons that paradigms like (9) are preferred. They might be easier for children to learn, or they might simply be more likely to evolve through natural processes of grammaticalization. Indeed, given the syntactic origins of many morphological markers, word internal structure of the forms that occupy paradigm cells is commonly encountered among the world's languages (Bickel & Nichols 2013a,b). Instructively, from the perspective of I-complexity, these systems are equivalent, suggesting that there are many strategies that can produce similar outcomes. An information-theoretic approach, accordingly, provides important insights about cross-linguistic morphological organization that are inaccessible without the recognition of words and paradigms as primary objects of analysis.

In the examples in (9) and (10), it is the inventory of affixes that leads to low conditional entropy: rearranging the alignment between allomorphs and inflection classes would make no difference. In most real languages, however, inflection classes are organized into an implicational structure in a way that supports speakers' ability to generalize to unknown forms. To quantify the role that implicational organization specifically plays in Pite Saami morphology, we can perform a simple 'bootstrap' simulation (Davison & Hinkley 1997). Take Pite Saami′, an alternate version of Pite Saami with the same E-complexity but with formal realizations assigned randomly to paradigm cells. More specifically, we generate Pite Saami′ by randomly shuffling the cells in each of the columns in Table 1, so that each declension is produced by randomly selecting (without replacement) for each of the fourteen paradigm cells one of the possible realizations of that cell. The result is a language with more or less the same E-complexity – the same number of declensions, paradigm cells, and allomorphs – as Pite Saami, but with no implicational structure.

If we repeat this experiment and compute the average conditional entropy for a range of randomly generated Pite Saami′s, we find that no randomized Pite Saami′ had an average conditional entropy as low as the actual language, and the average of the average conditional entropies for the randomized languages is 0.322 bits. The observed average conditional entropy is well outside what would be expected under the null hypothesis, which we can confidently reject in favor of the alternative, namely that the assignment of realizations to cells in Pite Saami is not in fact random. Instead, realizations are assigned to cells in a way that reduces average conditional entropy by licensing inferences about unknown wordforms, and this is crucial to lowering the language's I-complexity. In contrast, Ackerman & Malouf (2013) performed this same experiment with Russian, a language with relatively low E-complexity, and got a very different result: the average conditional entropy of randomized versions of Russian is 0.541 bits, only slightly higher than the actual average conditional entropy of 0.538 bits. This indicates that the implicational structure of the Russian paradigm is much less important for constraining the overall average conditional entropy. Nearly any random mapping between the morphosyntactic property sets and the resources for exponence in Russian yields low entropy, so there is no need for such languages to rely on implicational organization.
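A minimal, self-contained version of this bootstrap can be sketched in Python. The toy CLASSES table again stands in for Wilbur's actual Table 1 data (all forms are invented), so the specific numbers it prints are illustrative only:

    import random
    from collections import defaultdict
    from math import log2

    CLASSES = {
        "I":   {"nom.sg": "-a", "gen.sg": "-an", "nom.pl": "-ah", "gen.pl": "-aj"},
        "II":  {"nom.sg": "-a", "gen.sg": "-in", "nom.pl": "-ih", "gen.pl": "-aj"},
        "III": {"nom.sg": "-o", "gen.sg": "-on", "nom.pl": "-oh", "gen.pl": "-oj"},
    }

    def cond_entropy(classes, known, guessed):
        """H(guessed | known) in bits, with equiprobable classes."""
        p_cls = 1 / len(classes)
        p_k, p_kg = defaultdict(float), defaultdict(float)
        for forms in classes.values():
            p_k[forms[known]] += p_cls
            p_kg[(forms[known], forms[guessed])] += p_cls
        return -sum(p * log2(p / p_k[k]) for (k, _), p in p_kg.items())

    def avg_cond_entropy(classes):
        """Mean H(c_j | c_i) over all ordered pairs of distinct cells."""
        cells = list(next(iter(classes.values())))
        h = [cond_entropy(classes, r, c) for r in cells for c in cells if r != c]
        return sum(h) / len(h)

    def shuffle_cells(classes, rng):
        """One randomized counterpart (a 'Pite Saami-prime'): for each cell,
        reassign its attested realizations across the classes at random,
        without replacement."""
        names = list(classes)
        out = {cls: {} for cls in names}
        for cell in next(iter(classes.values())):
            forms = [classes[cls][cell] for cls in names]
            rng.shuffle(forms)
            for cls, form in zip(names, forms):
                out[cls][cell] = form
        return out

    rng = random.Random(0)
    observed = avg_cond_entropy(CLASSES)
    sims = [avg_cond_entropy(shuffle_cells(CLASSES, rng)) for _ in range(1000)]
    print(f"observed: {observed:.3f} bits")
    print(f"randomized mean: {sum(sims) / len(sims):.3f} bits")
    # one-sided bootstrap p-value: how often does a random assignment of
    # realizations to cells do as well as the attested system?
    print(f"p = {sum(s <= observed for s in sims) / len(sims):.3f}")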
3.3 Emerging research directions

While the simplest and most straightforward way to reduce uncertainty would be to have the one-to-one mappings without any allomorphy found in prototypical agglutinative strategies, even casual acquaintance with cross-linguistic morphological systems reveals that this is somewhat atypical. However, the differences in strategies are rendered unimportant from the present perspective, since efficient integrative complexity can be achieved in many ways in different language systems. In this connection, the organization of morphological systems discovered by the use of information-theoretic and other quantitative measures encourages a more careful examination of subtle cues to morphological structure that further facilitate this organization and make it learnable. The hypothesis that morphology is sensitive to uncertainty reduction and to the effects that this has on accurate inferencing, i.e., providing an answer to the PCFP, leads to experimental research and modeling in order to understand both language particular systems and typological phenomena.

In the previous section we introduced how information-theoretic measures have been applied to reveal relations that otherwise remain latent in the data distributions. This encourages new ways of inquiring about familiar issues that bear on what has become newly visible in the domain of morphological organization. These are the perennial issues bearing on the identification of veridical data sets, the resources responsible for the learnability of such data sets, and the distillation of principles guiding cross-linguistic generalizations that are responsive to language variation.

One direction pursued in current research is an effort to more directly ground entropy calculations in linguistic forms. As noted above, the manner of calculating entropy shown here gives an upper bound on entropy. Two significant factors which have the potential to greatly
reduce the effective entropy are excluded. For one, the calculations assume that all inflection classes are equally likely, which is certainly not the case for real inflection systems. In fact, in Pite Saami, as in most systems, the majority of lexemes fall into one or two classes, leaving the remaining classes as 'irregular' or 'exceptional'. For a given number of classes, the uniform distribution is the one with the highest entropy; the kinds of highly skewed distributions that are found in natural inflectional systems will have much lower entropy.22 Most grammatical descriptions give little or no detail about the number of lexemes in each inflection class, but class type frequencies can usually be inferred from dictionaries or wordlists (Bonami & Henri 2010).

A second important property of real linguistic systems is not accounted for in the calculations shown here. While inflection class systems are, by our working definition, arbitrary, most involve at least probabilistic correlations with factors such as semantic class, phonological form, grammatical gender, lexical tier, and other external semantic or grammatical dimensions of classification. The influence of these external factors is difficult to quantify. But, to the extent that these factors help speakers solve the PCFP, they will serve to lower the average conditional entropy of the paradigms, and should be included in the calculations (see also Stump & Finkel 2013).

A more worrisome problem with the calculations in the previous section is that they depend on the particular analysis of Pite Saami nouns sketched in Table 1.23 The accuracy of the entropy values depends on the correctness of Wilbur's analysis, the knowledge implicit in that analysis that native speakers are able to draw on, and the ability of native speakers to apply that knowledge in specific instances. All three of these assumptions are suspect. No linguistic analysis, no matter how carefully constructed, captures every detail of a language: as Sapir (1921:39) observed, “all grammars leak”. Inflection class systems often include remnants of old historical developments, leading to highly abstract analyses whose synchronic psychological reality is unclear. The analysis in Table 1 depends on a distinction between strong and weak stems which is overtly marked by a sometimes idiosyncratic combination of vowel and consonant alternations. And finally, in order to apply this analysis to solving the PCFP, speakers need to be able to recognize whether a wordform is strong or weak in isolation, which sometimes presupposes knowledge of exactly the information which they are trying to predict. For example, not all Pite Saami nouns participate in grade alternations. For nouns that do, an illative plural in -ajda is diagnostic of class membership. But for nouns that do not show an overt grade distinction, an illative plural in -ajda could indicate membership in class Ia or in class II. The average conditional entropy of the system for these nouns, with only affixal marking and no stem alternations, is 0.237 bits, more than twice what was calculated above.
22 See Shannon (1948) for a formal discussion. Intuitively, though, knowledge of unequal inflection class probabilities will make it easier for speakers to guess the realization of some paradigm cell of a novel lexeme. This prior knowledge of frequencies is a source of information, and information lowers entropy.

23 In a slightly different context, the importance of the underlying analytic assumptions to complexity measures is explored by Stump & Finkel (2013, 2015).
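The point of footnote 22 is easy to illustrate numerically; in the Python sketch below, the skewed distribution over eight classes is invented for illustration:

    from math import log2

    def entropy(probs):
        """Shannon entropy in bits of a discrete distribution."""
        return -sum(p * log2(p) for p in probs if p > 0)

    # eight inflection classes, as in the Pite Saami noun system
    uniform = [1 / 8] * 8
    # hypothetical skew: most lexemes in one or two classes, the rest rare
    skewed = [0.60, 0.25, 0.05, 0.04, 0.03, 0.01, 0.01, 0.01]

    print(f"uniform: {entropy(uniform):.3f} bits")  # 3.000, the maximum
    print(f"skewed:  {entropy(skewed):.3f} bits")   # roughly 1.7 bits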
These are cogent criticisms that have been made by Sims (2015) in her response to Ackerman & Malouf's (2013) analysis of Modern Greek, and more extensively by Bonami & Boyé (2014) and Bonami & Luís (2014), who adapt Albright & Hayes's (2003) Minimal Generalization Learner to extract implicational relations directly from surface representations of paradigms. This allows them to include sensitivity to class frequencies in their model and to avoid assuming in advance a particular analysis of the data. Their learning algorithm induces an analysis using only the information that is available to speakers learning the system. In more recent work, Bonami & Beniamine (2015) have extended this approach to address the joint predictiveness of several cells for previously unencountered wordforms.

Recent research within this information-theoretic approach has gone in numerous directions. There have been efforts that (1) identify larger data sets supplemented with frequency information to serve as objects of measurement, replacing the measurement of forms derived from descriptive grammars, as well as developing appropriate tools for their measurement (Bonami 2014, Sims 2015, Bonami & Beniamine 2015), (2) more carefully explore the nature of the phonological/phonetic stimuli constitutive of word internal structure and, more generally, inquire about the appropriate forms that words as objects of analysis should take (Lehiste 1972, Kemps et al. 2005, Blazej & Cohen-Goldberg 2015, Seyfarth et al. 2015, Plag et al. to appear), (3) explore how analogical inference may rely upon implicative organization in the learning of complex morphological systems (Baayen & Ramscar 2015, Ramscar et al. 2015), and (4) identify cross-linguistic generalizations concerning possible constraints on the organization of morphological systems (Stump & Finkel 2013, Baerman et al. 2015, Ackerman & Malouf 2015).
4 Conclusions

A word-based implicative approach, as characterized here, is leading morphological theory to “refurbish its foundations” (Hockett 1987) and to undergo important reconceptualizations concerning its methodologies, its objects of inquiry, and its ideas about theory construction. The nature of these changes aligns it more with recent dynamic systems perspectives on analysis in the developmental sciences (Lehrman 1953, 1970, von Bertalanffy 1973, Oyama et al. 2001), both (ecological) evolutionary developmental biology (Gilbert & Epel 2008, Laland et al. 2010, Arthur 2010) and developmental psychology (Karmiloff-Smith 1994, Elman et al. 1996, Stiles 2008, Spencer et al. 2009, Hood et al. 2010). In the broadest terms this relates to their emphases on the (probabilistic) modeling of complex interactions of multi-level systems. Morphology is clearly an adaptive system consisting of discriminable parts: word internal structure represents patterns of elements constitutive of words, and words are the parts that constitute the patterns of paradigm structure. The analogy with biological systems permits researchers to entertain the notion that the evident complexity of morphological organization emerges from relatively simple interactions among its contributing morphological, phonological, phonetic and semantic component parts. As Camazine et al. (2001) suggest with respect
to biological systems:

    Relatively little needs to be coded at the behavioral level … In place of explicitly coding for a pattern by means of a blueprint or recipe, self-organized pattern formation relies on positive feedback, negative feedback, and a dynamic system involving large numbers of actions and interactions. (Camazine et al. 2001:13)

    With such self-organization, environmental randomness can act as the ‘imagination of the system’, the raw material from which structures arise. Fluctuations can act as seeds from which patterns and structures are nucleated and grow. (Camazine et al. 2001:26)

This way of seeing has permitted researchers analyzing complex phenomena to see what their unassisted imaginations failed to anticipate: remarkable ranges of possibility and extraordinary variation become visible and cohere when the appropriate assumptions and tools are used to make sense of the ‘imagination of the system’. Solving the PCFP and related problems necessitates exploring the imagination in morphological systems, and this requires a proper understanding of word internal structure and the patterns of relatedness within which words are organized.
References

Ackerman, F. & Blevins, J. P. (2008). Syntax: The state of the art. In van Sterkenberg, P. (ed.), Unity and diversity of languages, Benjamins. 215–229.
Ackerman, F., Blevins, J. P. & Malouf, R. (2009). Parts and wholes: Patterns of relatedness in complex morphological systems and why they matter. In Blevins, J. P. & Blevins, J. (eds.), Analogy in grammar: Form and acquisition, Oxford: Oxford University Press. 54–82.
Ackerman, F. & Bonami, O. (in press). Systemic polyfunctionality and morphology-syntax interdependencies. In Hippisley, A. & Gisborne, N. (eds.), Defaults in morphological theory, Oxford: Oxford University Press.
Ackerman, F. & Malouf, R. (2013). Morphological organization: The low conditional entropy conjecture. Language 89. 429–464.
Ackerman, F. & Malouf, R. (2015). The No Blur Principle effects as an emergent property of language. In Proceedings of the 41st meeting of the Berkeley Linguistics Society.
Ackerman, F. & Stump, G. (2004). Paradigms and periphrasis: A study in realization-based lexicalism. In Sadler, L. & Spencer, A. (eds.), Projecting morphology, Stanford: CSLI Publications. 111–157.
Albright, A. & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition 90. 119–161.
Andersen, T. (2014). Number in Dinka. In Storch, A. & Dimmendaal, G. J. (eds.), Number – constructions and semantics, Amsterdam: Benjamins. 221–264.
Anderson, P. W. (1972). More is different. Science 177. 393–396.
Anttila, R. (1989). Historical and comparative linguistics. Amsterdam: Benjamins.
Aronoff, M. (1994). Morphology by itself: Stems and inflectional classes. Cambridge, MA: MIT Press.
Arthur, W. (2010). Evolution: A developmental approach. John Wiley and Sons.
Baayen, R. H. & Ramscar, M. (2015). Abstraction, storage, and naive discriminative learning. In Dabrowska, E. & Divjak, D. (eds.), Handbook of cognitive linguistics, Mouton de Gruyter.
Baerman, M. (2012). Paradigmatic chaos in Nuer. Language 88. 467–494.
Baerman, M. (2014a). Covert systematicity in a distributionally complex system. Journal of Linguistics 50. 1–47.
Baerman, M. (2014b). Floating morphological paradigms in Seri. Paper presented at the Sixteenth International Morphology Meeting, Budapest.
Baerman, M., Brown, D. & Corbett, G. (2010). Morphological complexity: A typological perspective. http://www.morphology.surrey.ac.uk/Papers/Morphological_complexity.pdf.
Baerman, M., Brown, D. & Corbett, G. G. (eds.) (2015). Understanding and measuring morphological complexity. Oxford: Oxford University Press.
Bateson, P. & Gluckman, P. (2011). Plasticity, robustness, development and evolution. Cambridge: Cambridge University Press.
Beard, R. (1995). Lexeme-morpheme base morphology: A general theory of inflection and word formation. Albany, NY: SUNY Press.
Beltrame, G. (1880). Grammatica e vocabolario della lingua denka. Rome: Giuseppe Civelli.
Benítez-Burraco, A. & Longa, V. M. (2010). Evo-devo—of course, but which one? Some comments on Chomsky's analogies between the biolinguistic approach and evo-devo. Biolinguistics 4. 308–323.
Bickel, B. & Nichols, J. (2013a). Exponence of selected inflectional formatives. In Dryer, M. S. & Haspelmath, M. (eds.), The world atlas of language structures online, Leipzig: Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/21.
Bickel, B. & Nichols, J. (2013b). Fusion of selected inflectional formatives. In Dryer, M. S. & Haspelmath, M. (eds.), The world atlas of language structures online, Leipzig: Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/20.
Blazej, L. J. & Cohen-Goldberg, A. M. (2015). Can we hear morphological complexity before words are complex? Journal of Experimental Psychology: Human Perception & Performance 41. 50–68.
Blevins, J. P. (2006). Word-based morphology. Journal of Linguistics 42. 531–573.
Blevins, J. P. (in press). Word and paradigm morphology. Oxford: Oxford University Press.
Bochner, H. (1993). Simplicity in generative grammar. Mouton.
Bonami, O. (2014). La structure fine des paradigmes de flexion. Habilitation, Université Paris Diderot.
Bonami, O. (2015). Periphrasis as collocation. Morphology 25. 63–110.
Bonami, O. & Beniamine, S. (2015). Implicative structure and joint predictiveness. In Pirelli, V., Marzi, C. & Ferro, M. (eds.), Word structure and word usage: Proceedings of the NetWordS final conference.
Bonami, O. & Boyé, G. (2014). De formes en thèmes. In Villoing, F., Leroy, S. & David, S. (eds.), Foisonnements morphologiques: Études en hommage à Françoise Kerleroux, Presses Universitaires de Paris Ouest. 17–45.
Bonami, O. & Henri, F. (2010). Assessing empirically the inflectional complexity of Mauritian Creole. Paper presented at the workshop on Formal Aspects of Creole Studies, Berlin. Online: http://www.llf.cnrs.fr/Gens/Bonami/presentations/BoHen-FACS-10.pdf.
Bonami, O. & Luís, A. R. (2014). Sur la morphologie implicative dans la conjugaison du portugais : une étude quantitative. In Léonard, J.-L. (ed.), Morphologie flexionnelle et dialectologie romane: Typologie(s) et modélisation(s), Leuven: Peeters. Number 22 in Mémoires de la Société de Linguistique de Paris. 111–151.
Bybee, J. L. (1985). Morphology: A study of the relation between meaning and form. Philadelphia: Benjamins.
Camazine, S., Deneubourg, J.-L., Franks, N. R., Sneyd, J., Theraulaz, G. & Bonabeau, E. (2001). Self-organization in biological systems. Princeton University Press.
Corbett, G. G. (2013). The unique challenge of the Archi paradigm. In Proceedings of the 37th annual meeting of the Berkeley Linguistics Society: Special session on languages of the Caucasus. 52–67.
Cover, T. M. & Thomas, J. A. (2006). Elements of information theory. Hoboken: John Wiley and Sons, 2nd edition.
Cruschina, S., Maiden, M. & Smith, J. C. (eds.) (2013). The boundaries of pure morphology: Diachronic and synchronic perspectives. Oxford: Oxford University Press.
Davies, A. M. (1998). Nineteenth-century linguistics, vol. 4 of History of Linguistics. New York: Longman.
Davison, A. C. & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D. & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Esper, E. A. (1925). A technique for the experimental investigation of associative interference in artificial linguistic material. Language Monographs.
Esper, E. A. (1966). Social transmission of an artificial language. Language 42. 575–580.
Esper, E. A. (1973). Analogy and association in linguistics and psychology. Athens, GA: University of Georgia Press.
Fertig, D. (2013). Analogy and morphological change. Edinburgh: Edinburgh University Press.
Gabbard, K. (2015). South Saami vowel alternations. Ms., UC San Diego.
Gentner, D., Holyoak, K. J. & Kokinov, B. N. (2001). The analogical mind: Perspectives from cognitive science. Cambridge, MA: MIT Press.
Gilbert, S. F. & Epel, D. (2008). Ecological developmental biology. Sinauer Associates.
Gilbert, S. F. & Sarkar, S. (2000). Embracing complexity: Organicism for the 21st century. Developmental Dynamics 219. 1–9.
Gottlieb, G. (1997). Synthesizing nature-nurture: The prenatal roots of instinctive behavior. Sussex: Psychology Press.
Gurevich, O. I. (2006). Constructional morphology: The Georgian version. Ph.D. dissertation, University of California, Berkeley.
Hay, J. & Baayen, R. H. (2005). Shifting paradigms: Gradient structure in morphology. Trends in Cognitive Science 9. 342–348.
Hockett, C. F. (1987). Refurbishing our foundations. Amsterdam: John Benjamins.
Hofstadter, D. & Sander, E. (2014). Surfaces and essences: Analogy as the fuel and fire of thinking. New York: Basic Books.
Hood, K. E., Halpern, C. T., Greenberg, G. & Lerner, R. M. (eds.) (2010). Handbook of developmental science, behavior, and genetics. Wiley.
Jablonka, E. & Lamb, M. J. (2006). Evolution in four dimensions: Genetic, epigenetic, behavioral and symbolic variation in the history of life. Cambridge, MA: MIT Press.
Joos, M. (ed.) (1957). Readings in linguistics I. Chicago: University of Chicago Press.
Karmiloff-Smith, A. (1994). Precis of Beyond modularity: A developmental perspective on cognitive science. Behavioral and Brain Sciences 17. 693–707.
Kemps, R. J. J. K., Wurm, L. H., Ernestus, M., Schreuder, R. & Baayen, R. H. (2005). Prosodic cues for morphological complexity in Dutch and English. Language and Cognitive Processes 20. 43–73.
Kibrik, A. E. (1991). Organising principles for nominal paradigms in Daghestanian languages: Comparative and typological observations. In Plank, F. (ed.), Paradigms: The economy of inflection, Mouton de Gruyter. 255–274.
Ladd, D. R., Remijsen, B. & Manyang, A. (2009). On the distinction between regular and irregular inflectional morphology: Evidence from Dinka. Language 85. 659–670.
Laland, K. N., Odling-Smee, J. & Myles, S. (2010). How culture shaped the human genome: Bringing genetics and the human sciences together. Nature Reviews Genetics 11. 137–148.
Lehiste, I. (1972). The timing of utterances and linguistic boundaries. Journal of the Acoustical Society of America 51. 2018–2024.
Lehrman, D. S. (1953). A critique of Konrad Lorenz's theory of instinctive behavior. Quarterly Review of Biology 28. 337–363.
Lehrman, D. S. (1970). Semantic and conceptual issues in the nature-nurture problem. In Aronson, L. R. & Schneirla, T. C. (eds.), Development and evolution of behavior, W. H. Freeman and Co. 17–52.
Lounsbury, F. (1953). Oneida verb morphology. Yale University Publications in Anthropology 48. New Haven: Yale University Press. Chapter 1 reprinted in Joos (1957), 379–385.
Marantz, A. (2013). No escape from morphemes in morphological processing. Language and Cognitive Processes 28. 905–916.
Marlett, S. A. (2009). A grammar of Seri. http://www.und.nodak.edu/instruct/smarlett/Stephen_Marlett/GrammarDraft.html.
Matthews, P. H. (1991). Morphology. Cambridge: Cambridge University Press.
Miestamo, M., Sinnemäki, K. & Karlsson, F. (eds.) (2008). Language complexity: Typology, contact, change. Amsterdam: John Benjamins.
Mitterutzner, J. C. (1866). Die Dinka-Sprache in Central-Africa. Kurze Grammatik, Text und Wörterbuch. Brixen: Verlag von A. Weger's Buchhandlung.
Oudeyer, P.-Y. (2006). Self-organization in the evolution of speech. Oxford: Oxford University Press.
Overton, W. F. (2010). Life-span development: Concepts and issues. Handbook of Life-Span Development 1. 1–29.
Oyama, S., Gray, R. D. & Griffiths, P. E. (2001). Cycles of contingency: Developmental systems theory and evolution. Cambridge, MA: MIT Press.
Paul, H. (1891). Principles of the history of language. London: Longmans, Green and Co. Translated from the 2nd edition by H. A. Strong.
Paunonen, H. (1976). Allomorfien dynamiikkaa [The dynamics of allomorphs]. Virittäjä 79.
Pierce, J. R. (1980). An introduction to information theory: Symbols, signals and noise. Dover.
Pihel, K. & Pikamäe, A. (1999). Soome-eesti sõnaraamat. Valgus.
Plag, I., Homann, J. & Kunter, G. (to appear). Homophony and morphology: The acoustics of word-final S in English. Journal of Linguistics.
Ramscar, M., Dye, M., Blevins, J. P. & Baayen, R. H. (2015). Morphological development. In Bar-On, A. & Ravid, D. (eds.), Handbook of communication disorders, Mouton de Gruyter.
Robins, R. H. (1959). In defense of WP. Transactions of the Philological Society. 116–144.
Russell, E. (1930). The interpretation of development and heredity. Oxford: Clarendon.
Sampson, G. B., Gil, D. & Trudgill, P. (eds.) (2010). Language complexity as an evolving variable. Oxford: Oxford University Press.
Sapir, E. (1921). Language. San Diego: Harcourt Brace.
Seyfarth, S., Ackerman, F. & Malouf, R. (2015). Acoustic differences in morphologically-distinct homophones. Presentation at the American International Morphology Meeting, Amherst, MA.
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal 27. 379–423, 623–656.
Sims, A. D. (2015). Inflectional defectiveness. Cambridge: Cambridge University Press.
Spencer, J. P., Blumberg, M. S., McMurray, B., Robinson, S. R., Samuelson, L. K. & Tomblin, J. B. (2009). Short arms and talking eggs: Why we should no longer abide the nativist-empiricist debate. Child Development Perspectives 3. 79–87.
Stiles, J. (2008). The fundamentals of brain development: Integrating nature and nurture. Cambridge: Harvard University Press.
Stump, G. & Finkel, R. (2009). Principal parts and degrees of paradigmatic transparency. In Blevins, J. P. & Blevins, J. (eds.), Analogy in grammar: Form and acquisition, Oxford: Oxford University Press. 13–54.
Stump, G. & Finkel, R. (2013). Morphological typology: From word to paradigm. Cambridge: Cambridge University Press.
Stump, G. & Finkel, R. (2015). Contrasting modes of representation for inflectional systems: Some implications for computing morphological complexity. In Baerman, M., Brown, D. & Corbett, G. G. (eds.), Understanding and measuring morphological complexity, Oxford: Oxford University Press. 119–140.
Thymé, A. (1993). Connectionist approach to nominal inflection: Paradigm patterning and analogy in Finnish. Ph.D. dissertation, UC San Diego.
von Bertalanffy, L. (1973). General system theory: Foundations, development, applications. New York: Braziller, revised edition.
Wilbur, J. (2014). A grammar of Pite Saami. Berlin: Language Science Press.
Wurzel, W. U. (1986). Die wiederholte Klassifikation von Substantiven. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 39. 76–96.
Wurzel, W. U. (1989). Inflectional morphology and naturalness. Dordrecht: Springer.