Multiplex lexical networks reveal patterns in early

0 downloads 0 Views 3MB Size Report
Sep 13, 2016 - We find that the emerging multiplex network topology is an important proxy of the cognitive ... a fruit that is red' or to 'name a red fruit' [5], likely because hearing fruit first activates rep- ... We demonstrate that the multiplex model is more accu- rate than any ...... 19 Nicole M Beckage and Eliana Colunga.
Multiplex lexical networks reveal patterns in early word acquisition in children arXiv:1609.03207v1 [physics.soc-ph] 11 Sep 2016

Massimo Stella1,+ , Nicole M. Beckage2,3 , and Markus Brede1 1

2

Institute for Complex System Simulations, University of Southampton, UK Department of Computer Science, Institute of Cognitive Science, University of Colorado Boulder, USA 3 Department of Electrical Engineering and Computer Science, University of Kansas, USA + Corresponding author: [email protected]

September 13, 2016 Abstract Network models of language have provided a way of linking cognitive processes to the structure and connectivity of language. However, one shortcoming of current approaches is focusing on only one type of linguistic relationship at a time, missing the complex multirelational nature of language. In this work, we overcome this limitation by modelling the mental lexicon of English-speaking toddlers as a multiplex lexical network, i.e. a multi-layered network where N=529 words/nodes are connected according to four types of relationships: (i) free associations, (ii) feature sharing, (iii) co-occurrence, and (iv) phonological similarity. We provide analysis of the topology of the resulting multiplex and then proceed to evaluate single layers as well as the full multiplex structure on their ability to predict empirically observed age of acquisition data of English speaking toddlers. We find that the emerging multiplex network topology is an important proxy of the cognitive processes of acquisition, capable of capturing emergent lexicon structure. In fact, we show that the multiplex topology is fundamentally more powerful than individual layers in predicting the ordering with which words are acquired. Furthermore, multiplex analysis allows for a quantification of distinct phases of lexical acquisition in early learners: while initially all the multiplex layers contribute to word learning, after about month 23 free associations take the lead in driving word acquisition.

Introduction Language is a complex system: a non-trivial, multi-level mapping of meanings onto words [1, 2]. In order to communicate, humans must learn how to use linguistic structures for expressing meaning and thoughts as words. The cognitive processes driving language learning likely organise the components of language into a so-called mental lexicon (ML) [2], which can be described by a web-like structure of interacting lexical items (e.g. word representations). These linguistic interactions are multi-faceted or multiplex in nature, as they must account for several types of relationships. Empirical studies in psycholinguistics have suggested that rather than providing exact word definitions (as in common dictionaries), the ML contains

1

word meanings in terms of the above multi-relational word patterns [3–7]. How this internal, multiplex organisation of language relates to and influences language learning is still poorly understood. In this paper we quantify the relation between the structure of language and cognitive processing through multiplex networks [8,9]. We achieve this by combining information about word phonology, semantics, and syntactic components in the layers of a multiplex network. Going beyond the network’s topological description, we then exploit the multiplex structure to predict lexical acquisition of young children. These representational patterns heavily influence the way people learn and access information from the lexicon. For example, individuals respond differently when asked to ’name a fruit that is red’ or to ’name a red fruit’ [5], likely because hearing fruit first activates representations of fruit as compared to first activating the colour red. Further, in phonological confusability tasks, words with more phonological neighbours are more likely to be incorrectly identified in noisy environments, suggesting that phonological features influence recall and recognition [10]. Understanding the structure and interactions of the ML provides insights into human cognition and language. A framework for investigating the multi-relational, dynamic nature of the ML requires two key aspects: (i) accounting for different types of linguistic relationships and (ii) considering cognitive processes that shape and influence the structure of the ML. One approach that is capable of addressing both of these requirements is to use models that are based on a multiplex, or multi-layer, network representation [8,9,11]. Hence, we project the complexity of an individual’s mental lexicon (ML) to a multiplex network structure, which we call Multiplex Lexical Network (MLN). The MLN is a multi-layer network, where nodes represent words. The layers of the MLN capture relationships previously shown to influence children’s lexical learning in single-layer network studies, namely 1) free-associations [12,13], 2) shared features [14,15], 3) co-occurrence in child directed speech [16,17] and 4) phonological similarity [10,18]. Nodes can additionally be characterised by external (word-specific) features such as frequency or length. Previous literature on modelling language use and learning through network science has largely focused on single-layer representations of networks, for a review see [6, 7, 19]. These works strongly suggests that cognitive constraints and mechanisms affecting the use of language can be uncovered and explored through network science. For example, semantic relationships of all types have been shown to have small-world structure, i.e. they are described by highly clustered networks that have short paths between nodes [13,20–22]. In fact, it has been suggested that this small-world structure may be cognitively beneficial to language learning and use [6, 19, 23]. Beyond topological classifications of language networks, experimental results shown strong correlations between network structure and human performance in language related tasks. Measurements of retrieval times [4, 5, 24], age of acquisition [13,15,17] and even semantic degradation due to ageing [23] can be studied through analyses of single layer networks. Phonological similarities among words, when represented by a network, are characterised by a bound on the maximum sizes of phonological neighbourhoods and a tendency to avoid local clustering [10, 25, 26]. Experimental work suggests that this reduces confusability and aids in quick retrieval of words from memory [10,27]. While single layer networks reveal aspects of the structure of language and languagerelated cognitive processes, there is still no unified understanding of language and cognition that accounts for both phonological and semantic aspects of language. Multi-layered linguistic networks begin to fill this gap. Liu et al. [28] analysed Chinese as a multi-layered network, composed of three syntactic networks of words and one network of co-occurring phonemes. They found that almost half of the syntactic dependency relations in modern Chinese sentences are between co-occurring words. A similar analysis was performed for

2

English and Croatian [29]. While these studies consider topological features of a multi-layer linguistic networks, more research exploring the role of a multiplex representation of language is necessary in order to highlight the interplay between structural multi-layer features and the cognitive processes shaping them. The main advantage of the multiplex approach is that it allows for a more detailed representation of the complexity of many real-world systems [30]. This added complexity provides additional insights into a system’s structure and dynamics. To name some examples, in the last few years multiplex modelling has provided novel insights about social balance in online platforms [31, 32], the emergence and stability of multiculturalism [33], congestion in transportation networks [30], and ecosystem characterisation in ecology [34, 35], cf. [8, 9] for a review on multi-layer and multiplex networks. Here, we use multiplex lexical networks to understand language acquisition in children by assessing the impact of several external and internal structural network features. We evaluate performance using empirical parent reports of a child’s production, aggregated to capture the order of early learned words for children between 16 and 30 months [36]. Our main aim is to assess the use and possible necessity of a multiplex network representation in capturing emerging features of children’s MLs. We demonstrate that the multiplex model is more accurate than any single layer representation in predicting this normative acquisition trajectory of young children. We find that the multiplex network can account for developmental trends, offering predictions and interpretability that analysis based on word features, or single layer network cannot.

Results Multiplex network construction and analysis We first construct a multiplex lexical network composed of four layers capturing i) free associations [12], (ii) feature sharing [14], (iii) co-occurrence [16], and (iv) phonological relationships [10] (cf. Methods). Panels (a) and (b) of Fig. 1 provide a visual impression of the connections and clusters in the four multiplex layers, either by separating out the layers (a), or by treating the multiplex network as an edge-coloured graph (b). For easier visualisation words have been grouped into topic areas and clustered into two communities and marked by red (e.g. clothing, body, etc.) and yellow (games, toys, food, etc.) and obtained via Louvain’s method [37]. The data representation in Fig. 1(a) also shows how words are connected within word topics. In agreement with other authors [3, 4], we find that semantically related words tend to be clustered in the association, feature, and co-occurrence layers. Single layer analysis, with findings and comparisons to appropriate null models, are summarized in Tab. 1. The results in the table show that apart from the association layer, all layers are disconnected, and have fairly small giant components. In contrast, the multiplex network is fully connected [38], so that for every pair of nodes there is a multiplex path joining them. This indicates the importance of inter-layer jumps for navigating the ML. In any network representation detecting how the organisation of connections differs from a random one can provide insightful information on the represented system. Here we briefly report on the topological features of words in the MLN (e.g. being a hub, being in a densely connected core) for relating them to word acquisition through ordering experiments in the next Section. In terms of word clustering, previous studies highlighted small-worldness as being a distinctive trait of language networks [10, 20, 21]. Analysis of clustering and shortest path lengths (cf. Tab. 1) confirm this trend: each MLN layer is a small-world [39]. Further,

3

bathroom

Asso. – Words in Furniture

potty

Feat. – Words in Clothing coat

window porch

dress

shower

bench bed crib chair sofa table couch

drawer door bedroom bathtub sink closet refrigerator room kitchen

boots belt necklace pants mittens

oven

stove

People

Clothing

Body

pajamas shirt

underpants

Descriptive Places

Quantifiers

Pronouns Vehicle

Outside

diaper hat sneaker zipper

Food

tissue

Sound

(a)

Toys Household

Prepositions

scarf

garage

stairs basement

(b)

jeans sweater

beads button shorts

dryer

jacket

gloves

Action

Animal

Time Questions

Furniture

Games

(c) K.Tau

-

1.0 0.8 0.6 0.4

Asso.

Co-occ.

Feat.

0.2

-

Phon.

0 -0.1

eat

! carry touch cover

pour

read

swim

)

(d) he

hurry paint

stop wash feed fall talk drive pick be write blow push sit hide tear throw lick

hate shake drop

bump

spill

Mult. 0.49

her

0.40

mine

share

walk hold sleep fix do sing knock help get wait kick

( him

think cut have look find stay play

say like hear

you

wish

take bite

put

me

-

Co-occ. ride Words in Actions

0.30 she

Phon. Words in Pronouns

0.20

-

0.10 0

kiss

tickle chase

Figure 1: (a) Visualisation of the multiplex network of the mental lexicon of young children (grey box), highlighting some particular subgraphs in each layer (insets). (b) Visualisation of the MLN as an edge coloured network. For simplicity, words are clustered into Communicative Development Inventory categories (e.g. clothing, toys, etc.). (c) Visualisation of the degree correlations across layers quantified by the Kendall Tau. (d) Visualisation of edge overlap among different layers, compared against configuration models. The aggregate layer is reported for quantifying how much each layer contributes to it. apart from the phonological layer which is expected to be assortatively mixed [10, 25], all other layers are characterised by disassortative mixing by degree. Combining findings of a high disconnectedness ratio θ [40], strong clustering and typically high assortativity values indicates the presence of a marked core-periphery structure in the MLN layers (as further supported by a k-core analysis, see Supplementary Information). Words in the core display different topological properties that might result in different word learning patterns compared to words in the periphery, as we test in the next Section. Further analysis in Fig. 2(b) also reveals that layers differ in their degree distributions which are right-skewed and exponentiallike for the association and phonological layers and significantly more heavy-tailed for the co-occurrence layer. The degree distribution for the multi- or overlapping degree of the MLN [11, 41] underlines the presence of hubs across the multiplex structure, i.e. words which are significantly more connected than average and that are typically important for navigation through concepts [20, 21, 38]. This is why we will test whether hubs are words learned at

4

earlier stages in the next Section. One might wonder if a representation of the ML in terms of four separate layers is an appropriate representation. To address this, we carried out a structural reducibility analysis [42] which tests if multiplex layers can be aggregated without losing information (see Supplementary Information for details). Results reported in Fig. 2(b) show that aggregation cannot be performed without information loss, emphasising that our multiplex representation is irreducible [42]. The analysis also quantifies similarity between individual layers of the multiplex. We find the strongest similarity between the co-occurrence and phonological layers. This result is not surprising as co-occurrence in child directed speech has strong phonological influences, particularly for young children [16]. The second strongest similarity is found between the layers for free association and feature sharing norms, which are both related to semantic features [12, 14, 43]. These results suggest that the network layers are irreducible and capture relational structures of language that have been shown previously to influence word learning. The multiplex structure allows quantification of correlations across different aspects of the ML. We are interested in understanding how the degree structure in one layer is similar to the degree structure in another layer. Fig. 1(c) visualizes these results for the MLN. We find that words in the feature and co-occurrence layers tend to have negative degree correlations (Kendall Tau κ ≈ −0.13, p < 0.0001), so that hubs in one layer tend to have lower degrees in the other layer. In contrast, the co-occurrence and phonological layers display positive degree correlations (Kendall Tau κ ≈ 0.64, p < 0.0001): in the children’s lexicon words having many co-occurrences display larger phonological neighbourhoods too, further supporting the idea that phonological similarities and co-occurrences in children influence each other [16, 18]. Another question of interest is to what extend the presence of relationships in one layer implies relationships in other layers. This correlation can be quantified by edge overlap (or multiplexity [44]). Results are reported in Fig. 1(d). As one would expect, we find some overlap between connections in the feature and association layers, as both capture semantic aspects of the ML. Importantly, however, links tend not to overlap across layers. This suggests that different layers of the multiplex tend to capture different language and relational aspects of the ML.

Multiplex Orderings: Results and Discussion Next, we explore the process of word learning through the MLN assuming a preferential acquisition scenario [13, 15]. We generate a word acquisition ordering by ranking words according to features derived from the vocabulary network structure. We compare these orderings to those based on word specific information such as word frequency and word length. Orderings are evaluated by measuring the relative gain in the number of words correctly predicted. More specifically, we define word gain as the size of the overlap between the set of predicted words and the actual words reported as learned by age t. To obtain relative word gains we calculate the difference to random guessing and normalise by the total number of words that have been learned at age t (see Methods). Hence, positive relative word gain indicates results superior to, and negative values indicate performance inferior to, random guessing. Results in Fig. 4, supported by results in the next Section, suggest that the normative acquisition trajectory is well characterized by three distinct learning phases. In a first stage, comprising months 19 and 20 which we call the very early learning stage (VELS), each ordering essentially tries to predict the first 40 learned words. This phase is followed by an early learning stage (ELS) covering an age range between 20 and 23 months. Last we

5

Network

L

θ

CC(def.) a

GCSize D

hdi

Associations (Asso.) Asso. Config. Model Feature Norms (Feat.) Feat. Config. Model Co-occurrences (Co-occ.) Co-occ. Config. Model Phonological Similarities (Phon.) Phon. Config. Model Multiplex Aggregate Mult. Agg. Combined CMs

2441 2441 2389 2389 2147 2147 349

0.02 0.02 0.76 0.76 0.497 0.497 0.68

0.2 0.03 0.63 0.38(2) 0.69 0.75 0.37

-0.1 -0.007(6) -0.01 -0.07(6) -0.44 -0.4 0.48

527 527 128 128 330 330 175

3.2 3.00(1) 1.8 1.72(1) 2.2 2.19(1) 7.7

349 6998 6998

0.68 0.004 0.004

0.02 0.33 0.18(5)

0.03 -0.07 -0.1

241(5) 12.5(3)5.31(7) 529 5 2.4 529 4.2(1) 2.25(1)

6 6.1(1) 3 3.1(1) 4 4.2(1) 20

Table 1: Metrics for the MLN with N = 529 nodes listing edge count L, average degree hki, disconnectedness ratio θ, mean deformed clustering coefficient CC, assortativity coefficient a, size of the largest connected component GCSize, diameter D and mean shortest path length hdi. For reference, also expectations from configuration models as reference models are reported with errors given by standard deviations (reported in parentheses for the last digits). discriminate a late learning stage (LLS) comprising ages between 23 and 28 months. Table 3 reports average relative word gains over the entire study period and for each of the mentioned stages of learning. A first observation from Fig. 4 and Tab. 2 is that the use of frequency data to predict word learning depends strongly on what corpus is used to compile frequency data. We find that frequency data computed based on child directed speech (which was also used to compute the co-occurrence networks [16]) allows for markedly better ordering than random baseline predictions. In contrast, word frequency data gathered for adult speakers [45] give results close to random chance and are thus not suitable for predicting word learning in young children. While previous work has found frequency to be a good predictor for word learning [46], results in Fig. 4 suggest that some topological network information is more predictive of acquisition ordering. For instance, degree-based orderings in the association layer yield the highest word gains against random word guessing (see Supplementary Fig. S4). Notice that a simple addition of layers does not improve performances of the degree-based orderings: orderings based on the multi-degree network [11] perform worse than using just the association layer. We conjecture this is due to the lack of strong positive degree correlations across the layers (see Fig. 1(c)), i.e. nodes that are hub nodes in one layer often have low degrees in other layers so that uniformly combining degrees washes out much of the MLN structure that may be useful to young language learners. The high prediction power of word degree in the association layer (see Fig. 1(c)) is interesting in the context of understanding acquisition processes of young children. Our finding is in agreement with previous studies indicating how association norms are generally valid predictive models for language learning [13, 15, 47]. We conjecture associations perform particularly well in our case because they do contain both strong relationships with semantic memory [12, 24, 43] and early learned words [15]. Previous work has pointed out that it is very difficult to outperform random guessing in the ELS and particularly in the VELS because children may be trying out different ways of learning [47]. We find that better than random prediction is possible, at least for normative

6

The Multiplex Lexicon Network Asso.! ! ! !

Feat.! !

���������� ������������

���

▼ ◆ ○ ▼ ○ ■ ○ ■ ▼ ● ▼ ○ ■ ■ ○ ○ ▼ ◆ ○ ■ ○ ● ● ▼ ● ■ ■ ■ ■ ■ ▼ ○ ● ● ▼ ◆ ▼ ▼ ○ ▲▲▲▲▲▲◆ ◆ ◆ ● ● ● ○ ▼ ○ ▼ ○ ▼ ◆ ◆ ◆ ▲▲▲ ● ◆ ▲▲▲ ◆ ● ◆ ◆ ▲ ● ▲▲ ● ▲

��-� ●

��-� ■

��-�

◆ ▲ ▼ ○

Florida Asso. McRae Feature Norms

▲▲ ▲

Co-occurrence

!

■ ■ ■ ○ ▼ ○ ▼ ○ ▼ ○ ■ ▼ ◆ ◆ ◆ ○ ◆ ▼ ■ ○ ◆ ● ◆ ▼ ◆ ○ ● ▼ ◆ ○ ■ ▼ ◆ ◆ ▼ ● ▼ ○ ○

Phonological Sim.

!

Co-occ.! ! ! Phon.! !

1.19 . 10-2"



1.02 . 10-2"

Aggregated

Relative Entropy of different Aggregates "

Multidegree

���

���

���

0.72 . 10-2" 0"

���� ������

Figure 3: Relative entropy for different aggreFigure 2: Cumulative degree distributions gation stages of the MLN. The maximum relaP (K ≥ k) for the individual layers and for the tive entropy determines which layer configuraaggregate and multi-degree. tion has the least redundant information. The maximum relative entropy is highlighted in red and it indicates irreducibility of the MLN, cf. [42]. acquisition, even in very early learning during the VELS, in which orderings based on the association layer, and multiplex measures give better than random results (cf. Tab. 2). The ELS phase is characterised by the dominating performance of a global multiplex measure, multiplex closeness. In fact, all orders based on individual layer features fail to replicate the peak in word gain observed for multiplex closeness (cf. Fig. 4). Multiplex measures such as the versatile PageRank [48] or multidegree also give considerably better performance than random guessing. Good performance can also be achieved using an ordering based on word length, possibly because of a least effort effect [1] that suggests that short words are easier to memorise and learn. This hypothesis is also supported by the observation that word length loses its predictive power at later learning stages. From these collective results, we can conclude that early learning is strongly influenced by the multiplex structure, specifically the closeness of words via all layers that form the MLN structure. Results for the LLS show another large change of trends. Here we find that the association degree gives best predictability. Global and local multiplex measures, as well as word specific features such as word frequency, perform slightly worse that association degree. This suggests that relational learning is very important, but this type of relational learning is best captured by the single association layer. This is compatible with the observed period of semantic learning in children [18]. This may also be due to the design of vocabulary norms collection. During later stages of learning fewer and fewer words are monitored. This results in difficulty teasing apart performance differences. During the LLS relative word gains for each ordering approach word gains predicted by the baseline of random guessing, making performance in the LLS hard to evaluate. Averaging over the entire time frame it becomes clear that closeness measures on the multiplex allow for best predictions. This provides evidence for the hypothesis that predictors based on the multiplex can outperform single layer network measures. So far, we have tested

7

����

�������� ���� ����

���� ���� ���� ���� ����

��

��

�������� ����� (��� �� ������) �� �� ��

◆◆



�� ����� ����



■ ���� (�����) ◆ ▼ ▼ ▼ ◆ ● ● ■ ■ ■ ◆ ����� (���������) ■◆ ▼◆ ◆ ●▼ ▼◆ ■ ■ ▼◆ ● ▼ ■ ◆ ▲ ����� (������) ▲ ▼ ▼ ■ ■ ● ■ ▲ ▲◆ ▼ ■ ■●■ ■ ▲ ▲● ● ▲ ▲ ▲ ▲ ▼ ▼ ������ (���������) ▼ ■ ◆ ▲ ▲ ■ ▲ ◆ ■ ▼◆ ●▼ ● ▲ ▼ ◆ ▲ ▲ ▲ ■ ◆ ● ▼ ▲ ◆ ■ ▼ ▲ ▼ ■ ◆ ▼ ▲ ● ■ ▲ ▼ ◆ ● ▼ ■ ▲ ●● ◆ ▼ ■ ▲ ◆ ▼ ■ ▲ ▼ ●●●●●●●◆ ▲ ▼ ■◆ ● ▲ ●◆ ■ ● ● ▲ � ��� ��� ��� ��� ��� �������� ����� (# ������� �����) ◆

Figure 4: Relative word gains for several acquisition orderings, at different learning stages. Error bars are approximately the same size as the symbols in the plot.

Table 2: Average relative word gains for different orderings, evaluated when: 40 words have been learned in VELS, 160 words have been learned in ELS, when 369 have been learned in LLS and when all the 529 MLN words have been learned (Total). The empirical highest word gain of each stage is highlighted in red. Results from the optimisation procedure are reported at the end of the table. Error bars for the empirical orderings were estimated as being ≈ 10−3 and are not reported for brevity. Ordering Short Len. Deg. (Asso.) Clos. (Multiplex) Freq. (Child.) Multidegree PageR. (Multiplex) Opt. Deg. (VELS) Opt. Deg. (ELS) Opt. Deg. (LLS) Opt. Clos. (VELS) Opt. Clos. (ELS) Opt. Clos. (LLS)

VELS 0.044 0.077 0.070 -0.007 0.070 0.063 0.17(2) 0.14(2) 0.13(2) 0.19(3) 0.16(2) 0.14(2)

ELS 0.143 0.129 0.191 0.091 0.114 0.142 0.13(2) 0.17(2) 0.17(2) 0.19(2) 0.25(2) 0.24(2)

LLS 0.026 0.088 0.083 0.077 0.072 0.081 0.08(1) 0.09(1) 0.09(1) 0.07(1) 0.09(1) 0.10(1)

Total 0.057 0.095 0.107 0.072 0.080 0.093 0.09(1) 0.11(1) 0.11(1) 0.12(2) 0.13(1) 0.14(1)

ranking measures on single network layers, and for the combined multiplex and shown that prediction performance can benefit from taking account of information from multiple layers. However, we assumed that each layer plays an equal role. Our next aim is to test for a changing influence of layers as the early lexicon evolves to better understand developmental mechanisms of acquisition

Optimisation Procedure To explore the influence of different multiplex layers on word acquisition we consider linear convex combinations of single-layer network metrics (cf. Methods section). For a given network metric, we then optimise relative word gains to obtain a set of coefficients that indicate the influence of all multiplex layers on word acquisition. In order to avoid over-fitting, we perform a Monte Carlo robustness analysis: we optimise over subsets of word trajectories sampled uniformly at random and including only 80% of the original whole word dataset. Reported results come with error bars calculated by averaging over randomly sampled word sets. We also confirmed that the improvement in performance of optimised multiplex parameters is strongly dependent on the structure and overlap of layers, as a randomised multiplex network is not able to replicate quantitatively similar prediction accuracies (see Supplementary Information). While we explored many different local and global network metrics (see Supplementary Information) we here focus on presenting the results for the best performing measures of degree and closeness. As reported in Tab. 3 optimisation of layer contributions clearly improves prediction performance at all learnings stages over the word orderings from the previous section. All optimisation results gave negligible weights on the order of (≈ 10−3 ) for the phonological layer, which also performed poorly in the ordering experiments above. For this reason we exclude the phonological layer from the presentation of results in Fig. 5. Ternary plots

8

0.16

0.16

0.12

0.12

0.08

0.08

0.04

0.04

0.03

0

0

0

0

0

0.4

●●● ■■ ■

0.2

◆ ◆◆

■ ◆

19



Associations



Feature Norms



Co-occurrences

■■ ■■■■■■■■■■■■■■■■■■■■ ◆◆ ◆◆◆◆◆◆◆ ◆◆◆◆◆◆◆◆◆◆◆◆◆ 21

23

24

Closeness Optimisation

(d) 0.8

●●●●●●●●●●●●●● ●●●●●●

26

Optimal Layer Importance

Optimal Layer Importance

0.04

0



0.6

0.0

0.04

0.06

Degree Optimisation ●



●●●● ●●

●●

● ◆◆●

0.4

0.2

0.0

28



0.6

●●●●●●

●●●●●



Associations



Feature Norms



Co-occurrences

◆ ◆ ■ ● ◆ ●■ ◆◆ ◆◆◆◆◆◆ ● ◆◆ ■ ◆◆◆◆ ■ ◆◆◆◆◆◆◆ ■■ ■■■■■■ ■■■■■■■■■■■■■■ 19

21

23

24

26

28

Learning Stage (Age in Months)

Learning Stage (Age in Months)

(�)

(�) �������� ����� (��� �� ������) �� �� �� ��

��



���� ����

◼ ◆

����

◆◆◆

���� ����





◆◼ ◼ ◼ ◆◆◆ ◼ ◆ ◆◆ ◼◼ ◼ ◆◼ ◼◼ ◼◼ ◆ ◼ ◼ ◼ ◼◆ ◆◆ ◆ ◼

���� (�����) ����� (���������)

◼ ◆

�������� ���� ����

����

◼◼ ◆ ◆◼ ◆◆ ◼◆

���� �

��� ��� ��� ��� �������� ����� (# ������� �����)

��� ����

���� ����



���� ����

◼ ◆

����

◼◆ ◼◆ ◼

◆◆◆

���� ����





◆◼ ◼ ◼ ◆◆◆ ◼ ◆ ◆◆ ◼◼ ◼ ◆◼ ◼◼ ◼◼ ◆ ◼ ◼ ◼ ◼◆ ◆◆ ◆ ◼

���� (�����) ����� (���������)

◼ ◆

◼◼ ◆ ◆◼ ◆◆ ◼◆

◼◆ ◼◆ ◼

���� �

���

��

���� ����

����

��� ����

����

�������� ����� (��� �� ������) �� �� �� ��

��

��

���� ����

���� �������� ���� ����

0.08

0.12

0.08



0.12

0.18

0.12

0.8

0.06

0.24

0.16

(c)

0.09

��� ��� ��� ��� �������� ����� (# ������� �����)

���

Figure 5: (a) and (b): Ternary plots of the average relative word gains at the end of VELS, the middle of ELS and the end of VELS for degree-based (a) and closeness-based (b) optimisation. (c) and (d): average optimal coefficients indicating layer importances for different learning stages obtained from cross-validation experiments with degree-based (c) and closeness-based (d) optimisation. (e) and (f): averages for the relative word gains corresponding to weighting layers for optimal word gains at the end of VELS, middle of ELS, and end of LLS for degree-based (e) and closeness-based (f) optimisation. Error margins were estimated as standard deviations over cross-validation ensembles.

9

in panels (a) and (b) in Fig. 5 visualise the optimisation landscape of relative word gains over the remaining three layer weights at different stages of word acquisition, i.e. at the end of VELS, middle of ELS, and end of LLS. Optima are typically fairly broad in VELS, but they narrow for later learning stages. We also note that optima change their location in the landscape, suggesting that different linguistic information may play different roles at different stages of development. To investigate this further, optimal layer weights are displayed as a function of age in panels (c) and (d) of Fig. 5. Similar patterns are found for both degree and closeness based rankings and we see a clear distinction between different learning stages. Initially in VELS all layers contribute strongly toward optimal performance. In the case of degree-based optimisation the main contributions stem from the association and the feature norm layer. Instead, when optimising closeness combinations all layers are found to initially contribute equally. The next learning stage, the ELS marks a transition region between the VELS and the LLS in which dominant contributions stem from only one layer, the association layer. For degree-based optimisations, contributions from the co-occurrence layer become very small in the LLS, while associations contribute 80% of weight. An analogous transition is observed in the ELS phase when closeness is optimised. In this case the importances of the feature and the co-occurrence layers are changing over time but end up contributing less than 20% of weight, while associations make up 80% (as in the LLS from the degree optimisation). In panels (e) and (f) of Fig. 4 we compare the evolution of relative word gains for optimisations aiming to predict word acquisition at different stages. We consider periods of acquisition stopping at the end of VELS, up to the middle of ELS and the entire time period up to age 28 months, using both degree and closeness based combinations. When predicting only words learned in VELS, our model provides a significant enhancement in average word gains in this stage, but this combination of layer weighting performs poorly at later stages (see also Table 3), suggesting that VELS is qualitatively different than other periods of language acquisition. Optimisation aimed to predict words up to the middle of ELS gives similar results to optimisation for the entire time frame suggesting similar learning strategies are used during both ELS and LLS. More importantly, results in Fig. 4(e) and (f) point out the importance of closeness centrality for predicting word acquisition at early stages. This finding is already reflected in the highest word gains achieved by closeness centrality in the single-measure based rankings reported in Table 3 in ELS. In fact, as reported in Fig. 4(e), in the early learning stage ranking by multiplex closeness centrality clearly outperforms the optimal convex combination of degree centralities. Optimal linear combinations of individual layer closeness perform even better in ELS, see Fig. 4(f). It is important to stress that closeness centrality is a global network feature accounting for the position of a node relative to all other nodes in the network. This is distinct from local characteristics which only measure a nodes relationship to its immediate neighbours, such as degree. Our findings thus give support to the hypothesis that, particularly in the early learning stages, word learning is influenced by the global structure of the multiplex lexical network. This global multiplex structure seems able to capture some important word patterns that shape or influence early lexicon development.

Conclusions and future work In this paper we have described and analysed normative word acquisition of children between the ages of 19 and 28 months introducing the framework of multiplex lexical networks (MLN). The motivation behind our approach is that multiplex networks allow for the simultaneous

10

analysis of correlations and interactions among words in phonological, semantic, and syntactic contexts. This provides a network representation of the mental lexicon which is irreducible [42] and captures more linguistic information than separate analysis of any single layer previously could [15, 17, 24, 47]. We go beyond a purely topological description by exploiting network properties to predict the order of normative word learning in children. Comparing the prediction power of various network features in a preferential acquisition scenario [15], we find evidence that global multiplex measures can greatly outperform individual layer characteristics. Interestingly, we find that the best topological feature in predicting word learning changes through the course of development. This fact allows us to use network information to distinguish three stages of learning: (i) a very early learning stage where all layers, apart from the phonological one, contribute substantially to prediction, (ii) an early learning stage which marks a transition period, and (iii) a late learning stage in which contributions of word associations dominate word learning. Comparing the predictive power of various MLN topological features over time confirms: (i) the superiority of some multiplex network characteristics relative to single layer features, and (ii) the special role closeness centrality might play in word learning. This is emphasised in two ways: (i) strong performance of multiplex closeness, outperforming all combinations of local network characteristics tested in the early learning phase, and (ii) the best overall predictions are possible by using certain weightings of the individual layer measure of closeness. We thus find very strong indications that, particularly at early stages of learning, word acquisition in children might be strongly driven by distances of learned words relative to other known words at various levels of the human mental lexicon. Our use of the MLN allows us to explore questions about lexical acquisition, uncovering developmental learning stages and quantifying the influence of certain types of linguistic structure on acquisition. Nevertheless, our MLN is limited in important ways. It is important to bear in mind that the MLN representation is only a projection of an individual’s ML. Additionally, the results presented in this paper consider lexical acquisition of a normative child. That is to say, no specific child learns words in this particular order. Instead the order is obtained by average over roughly 1000 productive vocabulary reports. We plan to extend our model to the acquisition trajectory of individual children as future work. Another limitation is the use of the CDI vocabulary. While the CDI is a commonly used check list of words a child produces, there are many words that a child may know that are not on the CDI. Expanding our model to the full vocabulary of a child, and for longer periods of development will further increase our understanding of the language acquisition process. In the future, our MLN can be corroborated by individual learning trajectories as well as language learning at later stages and across other languages. Future work might extend our analysis by considering correlations in word learning across other cognitive and linguistic domains to build an even richer picture of language organisation and learning than the one offered here. For instance, one might argue that the marginal influence of the phonological layer found in this study may reflect that the definition of phonological similarity employed here is not able to capture effects in young children. Indeed previous work finds phonological information useful in explaining cognitive behaviour and language learning tasks in adults [27, 49, 50]. Other metrics of phonological similarity for children should be tested in future work. Even considering these limitations, we have shown strong evidence that multiplex lexical networks provide a powerful formalised representation for exploring and explaining cognitive processes using language, capturing a richer picture of the mental lexicon than previous works using only single-layer networks.

11

Methods Age of acquisition dataset Age of acquisition orderings are constructed based on the MacArthur-Bates Communicative Development Inventory (CDI) norms [36]. The CDI, based on parent report of the productive vocabulary of children aged between 16 and 30 months, has been shown to be predictive of future language ability in young children [36,51]. The norms are averaged over more than 1000 vocabulary reports and indicate the percentage of children, at a given age, that reportedly produce (and understand) a specific word. From these norms, we inferred an ordering of learning. For example, ”mommy” is reported as produced by 93% of children by 16 months; this word is learned earlier than the word ”chair” which is produced by only 14% of 16 month olds. We assume a word is known once 50% of children in a given month produce that word, we then order words within a month based on the assumption that higher rates of production indicate earlier learning. Note that this type of ordering has been used in other developmental modelling contexts [15, 47] and can be used to define a strict ordering of learning across all items on the CDI.

Construction of the multiplex network The four layers of the MLN were selected according to previous single-layer network studies investigating which single-layer network representations may carry information about word learning by young children [6, 15, 17, 19]. The four layers we consider here are (i) a semantic layer based on the Florida Free Association Norms [12] where words A and B are connected if one of them remind of the other; (ii) a layer capturing feature similarity based on the McRae Feature Norms [14] where words A and B are connected if they share more than one semantic features; (iii) a layer based on word co-occurrences (measured in child-directed speech [16]) where words A and B are connected if they co-occur more than X times, where the threshold X was chosen approximately to match the link density of the other semantic layers; and a (iv) layer capturing phonological similarities (based on WordNet 3.0 [52]) where words A and B are connected if they have phonological transcriptions with edit distance 1. We treat the association layer as undirected and edge weights in layers (i) to (iii) are ignored. The resulting MLN is composed of 529 words which represent the intersection between the CDI data and the network information for words that have at least one connection. For more details we refer to the Supporting Information.

Overlap measures and word ranking A word trajectory is an ordered list τ = (w1 , w2 , ..., wN ) and the learning trajectory derived from the CDI gives an empirical ordering τaoa for word learning as observed in children. Predicted word orderings are generated as follows. First, each word is ranked according to a word score, si , which is derived from network specific or extrinsic measures (e.g. frequency or word length). Words are then ordered according to word score, starting with the largest score. Words receive a position in the predicted trajectory according to their position in the ordering of word scores. Ties in word orderings si are taken into account by averaging over all resolutions. For example, one might wish to consider orderings based on degree in the association layer and would obtain the following word scores sf ood = 62, swater = 45, seat = 20. The corresponding predicted acquisition trajectory would be τ = (f ood, water, eat). Apart from the measures mention in the main text of the paper, we also considered further measures listed in the Supporting Information.

12

To evaluate predictive performance of a word score, we first measure the overlap O(τ, τaoa , t) between the predicted learning trajectory τ and the empirically known trajectory τaoa at time t by counting the number of words that co-occur in τ and τaoa up to time t. We then define relative word gain G(τ, τaoa , t) as the overlap minus the random performance, i.e. G(τ, τaoa , t) =

O(τ, τaoa , t) − hO(τrand , τaoa , t)i . t

(1)

The reference case of random word ordering is constructed by averaging over random permutations of τaoa . Relative word gain gives a measure of the predictive power of word scores. A value of G = 0 indicates performance equivalent to random guessing, while scores larger (smaller) than zero indicate performance superior (inferior) to random guessing. We also consider the average word gain over the entire time period or specific learning periods.

Optimal combinations of layers To allow for varying influences of layers we construct word scores as convex combinations of individual layer word scores, i.e. a word score sw for word w is obtained by combining layerspecific values of that word eat) hon) sw = αs(Asso) + βs(F + γs(Co−occ) + (1 − α − β − γ)s(P , w w w w (Asso)

(F eat)

(Co−occ)

(2)

(P hon)

where sw , sw , sw , and sw give the word scores obtained by using single layer metrics in the respective layers and coefficients α, β, γ give the influence of each layer on the word score sw . This linear combination of single-layer features is similar in spirit to previous decompositions of multiplex centrality in edge-coloured graphs, see [53] and discussions in [42]. For individual layer word scores we consider all relevant single-layer network statistics, see Supporting Information. Notice that, contrary to ranking experiments above, word scores sw now depend on the combined set of coefficients, so that different values of α, β, γ can lead to different word orderings for the same sets of individual layer scores. For a given network statistic we then find the set of coefficients that maximises average word gain (either over the entire time period or over specific learning phases) in a data set in which 20% of the words have been left out at random. 20% was chosen as a reasonable value, in order not to remove too many words particularly in the very early word learning stages (when only 20 or 40 words are learned). To perform the optimisation a numerical scheme using the differential evolution method [54] is used and averages are calculated over 50 configurations of left out words. In low dimensional cases, visualisations of the search space can be given in ternary plots (see Fig. 5(a) and (b)) and we have verified that global optima have been reached by the numerical scheme in these cases.

References 1 Ramon

Ferrer i Cancho and Ricard V Sol´e. Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences, 100(3):788–791, 2003.

2 Jean

Aitchison. Words in the mind: An introduction to the mental lexicon. John Wiley & Sons, 2012.

3M

Ross Quillian. Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral science, 12(5):410–430, 1967.

13

4 Allan

M Collins and M Ross Quillian. Retrieval time from semantic memory. Journal of verbal learning and verbal behavior, 8(2):240–247, 1969.

5 Allan

M Collins and Elizabeth F Loftus. A spreading-activation theory of semantic processing. Psychological review, 82(6):407, 1975.

6 Javier

Borge-Holthoefer and Alex Arenas. Semantic networks: structure and dynamics. Entropy, 12(5):1264–1302, 2010.

7 Andrea

Baronchelli, Ramon Ferrer-i Cancho, Romualdo Pastor-Satorras, Nick Chater, and Morten H Christiansen. Networks in cognitive science. Trends in cognitive sciences, 17(7):348–360, 2013.

8 Mikko

Kivel¨ a, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.

9 Stefano

Boccaletti, G Bianconi, R Criado, Charo I Del Genio, J G´omez-Garde˜ nes, M Romance, I Sendina-Nadal, Z Wang, and M Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.

10 Michael

S Vitevitch. What can graph theory tell us about word learning and lexical retrieval? Journal of Speech, Language, and Hearing Research, 51(2):408–422, 2008.

11 Manlio

De Domenico, Albert Sol´e-Ribalta, Emanuele Cozzo, Mikko Kivel¨a, Yamir Moreno, Mason A Porter, Sergio G´ omez, and Alex Arenas. Mathematical formulation of multilayer networks. Physical Review X, 3(4):041022, 2013.

12 Douglas

L Nelson, Cathy L McEvoy, and Thomas A Schreiber. The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3):402–407, 2004.

13 Mark

Steyvers and Joshua B Tenenbaum. The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive science, 29(1):41–78, 2005.

14 Ken

McRae, George S Cree, Mark S Seidenberg, and Chris McNorgan. Semantic feature production norms for a large set of living and nonliving things. Behavior research methods, 37(4):547–559, 2005.

15 Thomas

T Hills, Mounir Maouene, Josita Maouene, Adam Sheya, and Linda Smith. Longitudinal analysis of early semantic networks preferential attachment or preferential acquisition? Psychological Science, 20(6):729–739, 2009.

16 Brian

MacWhinney. The CHILDES project: The database, volume 2. Psychology Press,

2000. 17 Nicole

Beckage, Linda Smith, and Thomas Hills. Small worlds and semantic network growth in typical and late talkers. PloS one, 6(5):e19348, 2011.

18 Fernanda

Marafiga Wiethan, Let´ıcia Arruda N´oro, and Helena Bolli Mota. Early lexical and phonological acquisition and its relationships. In CoDAS, volume 26, pages 260–264. SciELO Brasil, 2014.

19 Nicole

M Beckage and Eliana Colunga. Language networks as models of cognition: Understanding cognition through language. In Towards a Theoretical Framework for Analyzing Complex Linguistic Networks, pages 3–30. Springer, 2015.

20 Adilson

E Motter, Alessandro PS de Moura, Ying-Cheng Lai, and Partha Dasgupta. Topology of the conceptual network of language. Physical Review E, 65(6):065102, 2002.

21 Mariano

Sigman and Guillermo A Cecchi. Global organization of the WordNet lexicon. Proceedings of the National Academy of Sciences, 99(3):1742–1747, 2002.

14

22 Ramon

Ferrer i Cancho and Richard V Sol´e. The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482):2261–2265, 2001.

23 Joaqu´ ın

Go˜ ni, Gonzalo Arrondo, Jorge Sepulcre, Inigo Martincorena, Nieves V´elez de Mendiz´abal, Bernat Corominas-Murtra, Bartolom´e Bejarano, Sergio Ardanza-Trevijano, Herminia Peraita, Dennis P Wall, et al. The semantic organization of the animal category: Evidence from semantic verbal fluency and network theory. Cognitive processing, 12(2):183– 196, 2011.

24 Simon

De Deyne and Gert Storms. Word associations: Network and semantic properties. Behavior Research Methods, 40(1):213–231, 2008.

25 Massimo

Stella and Markus Brede. Patterns in the English language: Phonological networks, percolation and assembly models. Journal of Statistical Mechanics, 2015:P05006, 2015.

26 Massimo

Stella and Markus Brede. Investigating the phonetic organisation of the english language via phonological networks, percolation and Markov models. In Proceedings of ECCS 2014, pages 219–229. Springer International Publishing, 2016.

27 Michael

S Vitevitch, Kit Ying Chan, and Steven Roodenrys. Complex network structure influences processing in long-term and short-term memory. Journal of memory and language, 67(1):30–44, 2012.

28 Haitao

Liu and Jin Cong. Empirical characterization of modern chinese as a multi-level system from the complex network approach. Journal of Chinese Linguistics, 42(1):1–38, 2014.

29 Sanda

Martinˇci´c-Ipˇsi´c, Domagoj Margan, and Ana Meˇstrovi´c. Multilayer network of language: A unified framework for structural analysis of linguistic subsystems. Physica A: Statistical Mechanics and its Applications, 457:117–128, 2016.

30 Manlio

De Domenico, Clara Granell, Mason A Porter, and Alex Arenas. The physics of multilayer networks. arXiv preprint arXiv:1604.02021, 2016.

31 Michael

Szell, Renaud Lambiotte, and Stefan Thurner. Multirelational organization of large-scale social networks in an online world. Proceedings of the National Academy of Sciences, 107(31):13636–13641, 2010.

32 Jes´ us

G´ omez-Gardenes, Irene Reinares, Alex Arenas, and Luis Mario Flor´ıa. Evolution of cooperation in multiplex networks. Scientific reports, 2, 2012.

33 Federico

Battiston, Andrea Cairoli, Vincenzo Nicosia, Adrian Baule, and Vito Latora. Interplay between consensus and coherence in a model of interacting opinions. Physica D: Nonlinear Phenomena, 2015.

34 Shai

Pilosof, Mason A Porter, and Sonia K´efi. Ecological multilayer networks: A new frontier for network ecology. arXiv preprint arXiv:1511.04453, 2015.

35 Massimo

Stella, Cecilia S Andreazzi, Sanja Selakovic, Alireza Goudarzi, and Alberto Antonioni. Parasite spreading in spatial ecological multiplex networks. arXiv preprint arXiv:1602.06785, 2016.

36 P

S Dale and L Fenson. Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers, 28(1):125–127, 1996.

37 Mark

Newman. Networks: an introduction. Oxford University Press, 2010.

15

38 Manlio

De Domenico, Albert Sol´e-Ribalta, Sergio G´omez, and Alex Arenas. Navigability of interconnected networks under random failures. Proceedings of the National Academy of Sciences, 111(23):8351–8356, 2014.

39 Duncan

J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. nature, 393(6684):440–442, 1998.

40 Marcus

Kaiser. Mean clustering coefficients: the role of isolated nodes and leafs on clustering measures for small-world networks. New Journal of Physics, 10(8):083042, 2008.

41 Federico

Battiston, Vincenzo Nicosia, and Vito Latora. Structural measures for multiplex networks. Physical Review E, 89(3):032804, 2014.

42 Manlio

De Domenico, Vincenzo Nicosia, Alexandre Arenas, and Vito Latora. Structural reducibility of multilayer networks. Nature communications, 6, 2015.

43 Michael

Zock. Words in books, computers and the human mind. Journal of Cognitive Science, 16(4):355–378, 2015.

44 Valerio

Gemmetto and Diego Garlaschelli. Multiplexity versus correlation: the role of local constraints in real multiplexes. Scientific reports, 5, 2015.

45 Adrien

Barbaresi. Language-classified Open Subtitles (LACLOS): download, extraction, and quality assessment. PhD thesis, BBAW, 2014.

46 Victor

Kuperman, Hans Stadthagen-Gonzalez, and Marc Brysbaert. Age-of-acquisition ratings for 30,000 english words. Behavior Research Methods, 44(4):978–990, 2012.

47 N

M Beckage, A Aguilar, and E Colunga. Modeling lexical acquisition through networks. Proc. of the 37th Conf. of the Cog. Sci. Society, 2015.

48 Manlio

De Domenico, Albert Sol´e-Ribalta, Elisa Omodei, Sergio G´omez, and Alex Arenas. Ranking in interconnected multilayer networks reveals versatile nodes. Nature communications, 6, 2015.

49 Michael

S Vitevitch, Kit Ying Chan, and Rutherford Goldstein. Insights into failed lexical retrieval from network science. Cognitive psychology, 68:1–32, 2014.

50 Melissa

K Stamer and Michael S Vitevitch. Phonological similarity influences word learning in adults learning spanish as a foreign language. Bilingualism: Language and Cognition, 15(03):490–502, 2012.

51 L

Fenson, P S Dale, J S Reznick, E Bates, D J Thal, S J Pethick, M Tomasello, C B Mervis, and J Stiles. Variability in early communicative development. Monographs of the society for research in child development, 59(5):1–185, 1994.

52 George

A Miller. WordNet: a lexical database for english. Communications of the ACM, 38(11):39–41, 1995.

53 Luis

Sol´ a, Miguel Romance, Regino Criado, Julio Flores, Alejandro Garcia del Amo, and Stefano Boccaletti. Eigenvector centrality of nodes in multiplex networks. Chaos: An Interdisciplinary Journal of Nonlinear Science, 23(3):033131, 2013.

54 Kenneth

Price, Rainer M Storn, and Jouni A Lampinen. Differential evolution: a practical approach to global optimization. Springer Science & Business Media, 2006.

55 Manlio

De Domenico, Mason A Porter, and Alex Arenas. Muxviz: a tool for multilayer analysis and visualization of networks. Journal of Complex Networks, page cnu038, 2014.

16

Author contributions statement M.S. designed the original study, M.S., M.B. and N.B. conceived the experiments, N.B. provided the empirical data, M.S. conducted the experiments, M.S., M.B. and N.B. analysed the results. All authors reviewed the manuscript.

Additional information The authors declare no competing financial interests. M.S. was supported by an EPSRC Doctoral Training Centre grant (EP/G03690X/1). The authors acknowledge the use of the muxViz software [55].

17