Online Social Network Model Based on Local Preferential Attachment
Yang Yang, Shuyuan Jin, Yang Zuo, and Jin Xu
Peking University, Beijing, 100871, P.R. China
{yyang1988,jinshuyuan,yangzuo,jxu}@pku.edu.cn
Abstract. A social network is a kind of complex network that reflects social relations among people and is characterised by a distinctive node degree correlation. In recent years, the node degree correlation of online social networks (OSNs) has been found to undergo a transition from assortativity to dissortativity. To reveal the mechanism of network evolution that leads to this transition, this paper proposes an OSN model based on local preferential attachment (LPA). The LPA model is shown to reproduce not only the transition of node degree correlation from assortative to dissortative, but also common characteristics of social networks, including the “small world” phenomenon, the “scale free” property and community structures.
Keywords: Local preferential attachment, Node degree correlation, Assortativity.
1 Introduction
The structural and functional properties of complex networks depend on their static or evolving network structures. To better understand the relationships between them, appropriate network models are needed for theoretical and experimental studies. For instance, the WS model [1] and the NW model [2] introduce randomness into regular networks by rewiring existing edges (WS) or adding shortcut edges (NW), which helps to explain that “small world” networks lie between regular networks and random networks in their degree of randomness, combining short average path length with a high clustering coefficient. The BA model [3] grows networks by preferential attachment and showed that analogous behavior in webpage linking or human interaction can give rise to the power-law degree distributions observed in the corresponding real-world networks. The development of social network studies has in turn stimulated the understanding of complex networks. After extensive theoretical and experimental research, it is generally believed that the essential properties of social networks are short average path length, high clustering coefficient, scale-free node degree distribution, community structures and assortative node degree correlation (see Table 1). Therefore, different approaches have been taken
Corresponding author.
W. Han et al. (Eds.): APWeb 2014 Workshops, LNCS 8710, pp. 35–46, 2014. © Springer International Publishing Switzerland 2014
Table 1. Coefficient of node degree correlation (r) for social networks, technological networks and biological networks

Network                      Size      r        Ref.
Physics coauthorship         52909     0.363    [4]
Biology coauthorship         1520251   0.127    [4]
Mathematics coauthorship     253339    0.120    [4]
Film actor collaborations    449913    0.208    [4]
Company directors            7673      0.276    [4]
Power grid                   4941      0.003    [5]
Internet                     10697     -0.189   [4]
World Wide Web               269504    -0.065   [4]
Protein interaction          2115      -0.156   [4]
Neural network               307       -0.163   [4]
Marine food web              134       -0.247   [4]
Freshwater food web          92        -0.276   [4]
to build social network models that reproduce part or all of these essential features. In recent years, however, studies have found that OSNs are not assortative but mostly dissortative or only weakly assortative [6] (see Table 2). To explain this further, we first divide social networks into traditional social networks and OSNs according to the type of human interaction. In OSNs, social relationships are maintained by virtual web links, such as follower relationships on Twitter and acquaintanceships on Facebook, while in traditional social networks, links are maintained by face-to-face contact, such as collaborations between actors or scientists. The former conclusion may therefore be adjusted: traditional social networks are assortative and OSNs are dissortative. However, a recent study of a large social network called Wealink (see footnote 1), with data sets over 27 months, found that the network underwent a transition from the initial assortativity characteristic of traditional networks to the subsequent dissortativity characteristic of many OSNs [6]. Before this, no study had addressed the transition of node degree correlation in an OSN, and it was generally believed that a social network stays either assortative or dissortative throughout its evolution. In this paper, a novel model is proposed to reproduce the transition of node degree correlation in OSNs. Numerical simulations indicate that this network model also exhibits the “small world” phenomenon, the scale-free property and community structures. The rest of the paper is organized as follows: in Section 2, we give a brief review of related work. Section 3 introduces the motivation of the LPA model and
Footnote 1:
http://www.wealink.com
Table 2. Node degree correlation coefficient (r) of online social networks

Network        Size       r         Ref.
Cyworld        12048186   -0.13     [7]
nioki          50259      -0.13     [8]
MySpace        100000     0.02      [7]
Orkut          100000     0.02      [7]
Xiaonei        396836     -0.0036   [9]
Flickr         1846198    0.202     [10]
LiveJournal    5284457    0.179     [10]
YouTube        1157827    -0.033    [10]
mixi           360802     0.1215    [10]
its algorithm. Simulation results are discussed in Section 4 and conclusions are made in the last section.
2 Related Works
It was argued in [11] that assortative mixing is an important feature that differentiates social networks from other complex networks. This has been taken into consideration by many works on social network modeling [11–13]. However, there are still models that do not involve assortative mixing [14–17]. In this section, we briefly summarize related work from two perspectives: social network models without assortative mixing, and social network models involving assortative mixing.
2.1 Models without Assortative Mixing
Social network models that do not include assortativity are either those proposed before [11], such as [15, 18], or those that focus on other network properties [14, 16]. Taking geographical space into consideration, a spatial social network model was built in [14] by letting the edge probability between any two nodes depend on their spatial distance. The model captures “small-world” properties, a skewed degree distribution and community structures. Another model, in [15], was built to reproduce short path length, high clustering and scale-free or exponential link distributions, based on the assumption that acquaintanceships tend to form between any two friends of a person. A community-structure-oriented model was built in [16] based on inner-community and inter-community preferential attachment mechanisms; its community structure and scale-free properties were demonstrated by theoretical results and numerical simulations. In [18], a model incorporating three features extracted from real human behavior was proposed; it reproduces a high level of clustering or
network transitivity and strong community structures. In summary, none of the models mentioned above incorporates node degree correlation.
2.2 Models Involving Assortative Mixing
The assortative mixing pattern of social networks was first found in [4], where a model of assortatively mixed networks was also proposed. Within that model, networks were found to percolate more easily and to be more robust to vertex removal if they are assortative. In [11], a simple model was built to help explain the latent relationship between assortative mixing and variation in the sizes of communities. In [12], a social network model was built to efficiently produce very large networks to be used as platforms for studying sociodynamic phenomena. The model combines random attachment with implicit preferential attachment and is characterized by a high clustering coefficient, assortativity and community structures. Inspired by individuals’ altruistic behavior in sociology, a model with “altruistic attachment” growth was developed in [13], which found that altruism leads to the emergence of scaling and assortative mixing in social networks. The work in this paper extends the above models by proposing an evolutionary model that incorporates the transition of node degree correlation from assortative to dissortative during its growth. To the best of our knowledge, the model in this paper is the first to focus on the transition of node degree correlation together with other essential features, including short average path length, high clustering coefficient, scale-free node degree distribution and community structures.
3 Model
3.1 Motivation for the Model
To explain the transition from assortativity to dissortativity in Wealink, a conjecture was proposed in [6]: in the early stage of network growth, the structure of this OSN reflects traditional social networks, meaning that users link to others who are their friends in the real world, which makes the OSN an identity mapping of a traditional social network. As the OSN attracts more users, preferential attachment occurs and the OSN gradually evolves into a dissortative one. Consequently, we build the LPA model on the following considerations: for a constantly growing OSN, there exists one tight-knit group at the initial stage. The group is an identity mapping of a traditional social network and corresponds to the seed network of the LPA model. Link establishment occurs when a new user joins the OSN. The user establishes relationships first with a randomly chosen pre-existing member and second with an influential neighbor of that member. “Local preferential attachment (LPA)” refers to the second relationship formation process, during which the influential neighbor is chosen preferentially based on local information. Namely, the first chosen member with
limited neighbors provides only partial information to the new user, and the user cannot get to know the globally most influential members. The influence of a user is measured by node degree centrality [19], i.e., the higher the degree, the greater the influence. The LPA model is characterised by three important features: (1) a complete seed network at the initial stage; (2) growing network size, as newcomers are introduced into the network; (3) preferential attachment based on partial information, as newcomers attach to influential neighbors of the first randomly chosen user.
3.2 Model Algorithm
The LPA model produces undirected and unweighted networks, with nodes representing social network members and links representing acquaintance relationships. The algorithm is described as follows:
Step 1. Initially, the target network N starts with a complete seed network N_s consisting of S_0 nodes.
Step 2. A new node n_a arrives at the network.
Step 3. A node n_b is chosen from N with probability P_r and a link is established between n_a and n_b.
Step 4. A node n_c is chosen from the neighbors of node n_b with probability P_l and a link is established between n_a and n_c.
Step 5. Repeat steps 3 to 4 m times (m < S_0).
Step 6. Repeat steps 2 to 5 until the network size reaches S.
The probabilities P_r and P_l are defined as follows:
$$P_r = \frac{m}{S_0 + r}$$
$$P_l(n_j) = \frac{k_j}{\sum_i k_i}$$
where r is the number of completed repetitions of step 6, ranging from 0 to S − S_0 − 1, and k_i is the degree of node n_i. Since n_c is drawn from the neighbors of n_b, the sum in P_l is understood to run over those neighbors.
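The six steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `lpa_network` and its parameters are hypothetical, the choice of n_b in step 3 is assumed to be uniform over pre-existing nodes in each of the m repetitions, and repeated links to the same node are assumed to collapse into a single link.

```python
import random


def lpa_network(size, seed_size=5, m=3, rng=None):
    """Grow an undirected network by local preferential attachment.

    Returns an adjacency mapping {node: set(neighbors)}.
    """
    if not 1 <= m < seed_size:
        raise ValueError("expected 1 <= m < seed_size")
    if rng is None:
        rng = random.Random()

    # Step 1: complete seed network on seed_size nodes.
    adj = {i: set(range(seed_size)) - {i} for i in range(seed_size)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    # Steps 2 and 6: add nodes one at a time until the target size is reached.
    for new in range(seed_size, size):
        existing = list(adj)  # nodes present before `new` arrives
        adj[new] = set()
        for _ in range(m):  # Step 5: repeat steps 3 and 4 m times
            # Step 3 (assumption: uniform choice among pre-existing nodes).
            nb = rng.choice(existing)
            link(new, nb)
            # Step 4: pick a neighbor of nb with probability proportional
            # to its degree, i.e. k_j divided by the sum over nb's neighbors.
            candidates = [v for v in adj[nb] if v != new]
            weights = [len(adj[v]) for v in candidates]
            nc = rng.choices(candidates, weights=weights, k=1)[0]
            link(new, nc)
    return adj
```

For example, `lpa_network(5000, seed_size=5, m=3, rng=random.Random(0))` grows one network of 5000 nodes; passing a seeded `random.Random` makes a run repeatable.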
4 Statistical Results
In this section we present the simulation results of the LPA model. The generated networks are shown to exhibit evolving degree correlation, the “rich club” phenomenon, the “small world” phenomenon, the “scale free” property and community structures. All statistical results are averaged over 100 iterations.
4.1 Evolution of Node Degree Correlation
Node degree correlation is a comprehensive measure of the pattern of links between node pairs. It is the Pearson correlation coefficient of the degrees at either end of a link, defined by Newman [4] as follows:
$$r = \frac{M^{-1}\sum_i j_i k_i - \left[M^{-1}\sum_i \frac{1}{2}(j_i + k_i)\right]^2}{M^{-1}\sum_i \frac{1}{2}(j_i^2 + k_i^2) - \left[M^{-1}\sum_i \frac{1}{2}(j_i + k_i)\right]^2}$$
where M is the total number of links in the network, and j_i and k_i are the degrees of the two nodes at either end of the i-th link. Networks with positive r are assortative: nodes of similar degree tend to link to each other. Networks with negative r are dissortative: nodes tend to link to others of dissimilar degree.
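As an illustration of the definition of r given above, the following sketch evaluates it directly from an edge list. The function name `degree_assortativity` is hypothetical, and the graph is assumed to be simple and undirected. For a regular graph (all degrees equal) the denominator is zero, so the sketch returns NaN in that case.

```python
def degree_assortativity(edges):
    """Newman's degree correlation r for a list of undirected (u, v) links."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    M = len(edges)
    # M^-1 * sum_i j_i k_i
    mean_product = sum(degree[u] * degree[v] for u, v in edges) / M
    # M^-1 * sum_i (j_i + k_i) / 2
    mean_half_sum = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / M
    # M^-1 * sum_i (j_i^2 + k_i^2) / 2
    mean_half_squares = sum(
        0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges
    ) / M

    denominator = mean_half_squares - mean_half_sum ** 2
    if denominator == 0:
        return float("nan")  # all link ends have the same degree
    return (mean_product - mean_half_sum ** 2) / denominator
```

A star graph, in which one hub is linked to several leaves, is the extreme dissortative case: `degree_assortativity([(0, 1), (0, 2), (0, 3), (0, 4)])` evaluates to -1.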
Fig. 1. Evolution of node degree correlation. (a) Coefficient vs. network size (x-axis: network size; y-axis: node degree coefficient). The network size ranges from 1000 to 35000 nodes with m = 3. (b) Coefficient vs. m (x-axis: m; y-axis: node degree coefficient). The size of the network is 5000 nodes with m ranging from 1 to 7. Both panels show data points with a Bézier curve.
The value of node degree correlation lies not only in its ability to differentiate social networks from other types of networks; it can also be used to reveal the trend of acquaintanceship establishment between people in social networks. For example, the physics coauthorship and film actor collaboration networks are both assortatively mixed, and it can be inferred that partnerships tend to form between people of similar reputation or influence. In OSNs, by contrast, the dissortative mixing of some networks indicates the opposite tendency in establishing follower relationships, and the inconspicuous assortative mixing of others indicates random link formation. In particular, from the evolutionary pattern of Wealink, it can be inferred that when the website was first established, all new registrants had few links and similar degrees, so the network was characterised by assortative mixing. As more registrants join, highly influential ones emerge and are followed by more and more less influential ones, leading
the network to evolve into a dissortative one. The LPA model is built to reveal this evolutionary pattern. Fig. 1(a) shows the simulation results for the node degree correlation coefficient. The coefficient drops sharply while the network size is smaller than 10000, followed by a smooth downtrend towards a steady negative value. At the end of the evolutionary process, the correlation coefficient reaches a stable state. This process reproduces the evolution of node degree correlation from assortative to dissortative, and in the steady state the network remains weakly dissortative. Fig. 1(b) shows another property: the node degree correlation coefficient decreases monotonically as m grows. That is, if newly added members are willing to make more acquaintances, the network becomes less assortative. This can explain why traditional social networks are assortative while OSNs are dissortative or only weakly assortative: social activity in OSNs is maintained by web links, so one can follow more people than one could be acquainted with in a real social environment. From the simulation we can conclude that the seed network has an important impact on the evolution of node degree correlation. In the initial stage, the network is a seed network of S_0 nodes, each linked to the other S_0 − 1 nodes. All linked node pairs then have identical degrees, so the network is perfectly assortatively mixed and the correlation coefficient is taken to be 1. As new nodes of lower degree are added, pre-existing nodes are more likely to be linked to, and some of them grow into influential nodes with high degree. While the network is small, the correlation coefficient drops sharply as more low-degree nodes attach to influential ones. After the coefficient passes the turning point around 0, the network has grown larger and the decline becomes smooth.
4.2 “Rich Club” Phenomenon
In human society, the “rich club” phenomenon refers to the fact that people who possess most of the social wealth tend to form tight social relationships. In a social network, the “rich people” correspond to influential nodes. Measured by node degree centrality, the “rich club” in a social network is composed of high-degree nodes, which are inclined to form links with each other. Networks produced by the LPA model are characterised by the “rich club” phenomenon. During the evolutionary process in numerical simulation, one “rich club” forms. The “club” is a sub-network consisting only of high-degree nodes linked to each other, although these nodes may also be connected to low-degree nodes that are not “rich club” members. The occurrence of the “rich club” phenomenon can be explained as follows: during the evolutionary process, the nodes of the seed network are tightly connected with each other and their initial degrees are higher than that of any newly added member. Therefore, they are more likely to be linked to by newly added nodes, and this advantage is amplified as the network grows. Meanwhile, several newly added nodes happen to be linked to by
Fig. 2. The “rich club” phenomenon. This network consists of 1000 nodes with m = 3. It is displayed using the Fruchterman-Reingold layout scheme [20], with nodes of higher degree placed in the center. Seed nodes are painted red, newly added nodes green, and links between nodes of degree higher than 100 yellow. In this figure, nodes linked to each other by yellow lines form a tight-knit group of high-degree nodes, most of which are red nodes and a few of which are green nodes.
other nodes and become members of the “rich club”. As a result, the “rich club” includes most of the seed nodes and a very small proportion of the newly added nodes. This phenomenon is presented in Fig. 2, where a tightly connected “rich club” has formed by the end of the evolution process.
4.3 “Small-World” Phenomenon
Small world networks are highly transitive networks. They contain a high proportion of cliques and quasi-cliques, in which node pairs are mostly connected by short paths. A simple measure of network transitivity is the clustering coefficient, which captures the fraction of connected triplets that are closed, i.e. the probability that a friend of one's friend is also one's own friend. The clustering coefficient C_i of any node i can be calculated as follows:
$$C_i = \frac{2 e_i}{n_i (n_i - 1)}$$
where n_i is the number of neighbors of node i and e_i is the number of links between these n_i neighbors. The average clustering coefficient C is the average value over all nodes:
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i$$
where N is the network size. A second measure used to verify the “small-world” phenomenon is the average shortest path length, which is the mean length of the shortest paths between all node pairs. Social networks are typically “small-world” networks. They are more transitive than random networks and are characterised by high average clustering coefficients and short average distances between node pairs. The relationship between the network evolution mechanism and transitivity is analysed in numerical simulations. The results are drawn in Fig. 3(a). In the figure, the network size grows from 0 to 50000 nodes with m = 3, and network transitivity is calculated every time 1000 nodes are added. From the results, we
Fig. 3. Evolution of network metrics. (a) Network transitivity (x-axis: network size; y-axis: clustering coefficient; data points with a Bézier curve). (b) Network diameter, radius and average shortest paths (x-axis: network size; y-axis: length).
can see that there is a sharp decline during the first stage, which stops when the network size reaches 5000, with the lowest point no less than 0.285. During the second stage, the clustering coefficient rises gradually as the network grows. At the equilibrium of network evolution, the clustering coefficient stays around 0.310. It can be concluded that the networks produced by the LPA model exhibit the transition from assortativity to dissortativity, which matches the one reported by Hu et al. [6] in analysing the Wealink data sets. The value at equilibrium also matches those of Flickr and LiveJournal, at 0.313 and 0.330. This can be explained as follows: initially, the seed network is a complete network with clustering coefficient equal to 1. As new nodes are added, the network becomes more and more sparse and the clustering coefficient declines. However, since each newly added node repeats m iterations of linking to a randomly chosen node and one of its influential neighbors, a definite number of triangles (m per new node) is formed, which sets a lower limit on the clustering coefficient as the network grows large. Similarly, the simulation results for the diameter, radius and average shortest path length are shown in Fig. 3(b). In the figure, the network size grows from 0 to 50000 nodes with m = 3, and values are calculated every time 1000 nodes are added. The diameter and radius are calculated from 1 iteration, while the average shortest path length is averaged over 50 iterations. The diameter of the network falls between 3 and 4, while the radius falls between 2 and 3. The average shortest path length ranges between 2 and 3. From the results, we can see that the network diameter and radius are independent of network size, and it can be concluded that the strength of association between node pairs is steady and strong. This property matches those of traditional social networks [1] and online social networks [10].
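The two “small-world” measures used in this subsection, the average clustering coefficient and the average shortest path length, can be sketched as follows. This is an illustrative sketch, not the code used for the simulations: the function names are hypothetical, the network is assumed to be given as an adjacency mapping {node: set(neighbors)}, nodes with fewer than two neighbors are assumed to have C_i = 0, and path lengths are averaged over reachable pairs only.

```python
from collections import deque


def clustering_coefficient(adj, node):
    """C_i = 2 e_i / (n_i (n_i - 1)) for one node."""
    neighbors = adj[node]
    n = len(neighbors)
    if n < 2:
        return 0.0  # assumed convention when no triplet is centred on the node
    # Each link between two neighbors is seen from both of its ends.
    e = sum(1 for u in neighbors for v in adj[u] if v in neighbors) // 2
    return 2.0 * e / (n * (n - 1))


def average_clustering(adj):
    """C = (1 / N) * sum of C_i over all N nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def average_shortest_path_length(adj):
    """Mean breadth-first-search distance over all reachable ordered pairs."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

The breadth-first search is run once per node, so the path-length function costs on the order of N times the number of links; for networks of tens of thousands of nodes a sampled subset of source nodes may be a practical substitute.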
Fig. 4. Node degree distribution displayed in log scale (x-axis: node degree; y-axis: proportion). Networks with sizes of 5000, 10000 and 15000 nodes are displayed in red, green and blue.
4.4 “Scale Free” Property
“Scale free” networks are heterogeneous networks with a power-law degree distribution. Simulations were carried out to verify that preferential attachment based on partial information produces “scale free” networks. Fig. 4 shows the degree distribution of networks of different sizes. All of them follow a power-law distribution and display pronounced heavy tails.
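The quantity plotted in Fig. 4, the proportion of nodes having each degree, can be computed with a short sketch such as the one below. The function names are hypothetical and the network is assumed to be an adjacency mapping {node: set(neighbors)}. The complementary cumulative form P(K ≥ k) is included as an assumption of convenience, since it is less noisy in the tail when drawn on log-log axes; the paper itself plots the plain proportions.

```python
from collections import Counter


def degree_distribution(adj):
    """Map each observed degree k to the proportion of nodes with degree k."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: counts[k] / n for k in sorted(counts)}


def complementary_cumulative(distribution):
    """Map each observed degree k to P(K >= k)."""
    result = {}
    tail = 1.0
    for k in sorted(distribution):
        result[k] = tail
        tail -= distribution[k]
    return result
```

On log-log axes a power-law distribution appears as an approximately straight line, which is the visual check applied to Fig. 4.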
Fig. 5. Community division. A randomly chosen network of 50000 nodes is drawn in 15 colors. This network is displayed with the DrL layout [21], with nodes belonging to the same community shown in one color. There are 15 communities of different sizes in this figure.
4.5 Community Structure
Community structure is one of the most important characteristics of social networks. In our social interactions, communities of different sizes and closeness form around common interests or organizational structures. Using the fast community detection algorithm [22], which is based on modularity optimization [23], or the fast greedy algorithm [24], simulated networks of different sizes are tested and communities of different sizes are found in these
networks. As a further illustration, a randomly chosen network of 5000 nodes is tested with the fast greedy algorithm, and 15 communities are found with modularity 0.394; the result is shown in Fig. 5. The network is displayed with the DrL layout [21]. In summary, the social network model based on local preferential attachment produces networks with communities of different sizes, in accordance with real social networks.
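The modularity value quoted above scores a given division of a network into communities. A minimal sketch of that score is shown below, using the standard form Q = Σ_c [l_c / M − (d_c / 2M)²], where l_c is the number of links inside community c, d_c is the total degree of its nodes and M is the number of links. The function name `modularity` is hypothetical, the network is assumed to be an adjacency mapping {node: set(neighbors)}, and the sketch only evaluates a partition; it does not perform the community detection of [22] or [24].

```python
def modularity(adj, communities):
    """Modularity Q of a partition given as an iterable of node collections."""
    M = sum(len(neighbors) for neighbors in adj.values()) / 2
    q = 0.0
    for community in communities:
        members = set(community)
        # Links with both ends inside the community (each seen twice).
        internal = sum(1 for u in members for v in adj[u] if v in members) / 2
        degree_sum = sum(len(adj[u]) for u in members)
        q += internal / M - (degree_sum / (2 * M)) ** 2
    return q
```

For two triangles joined by a single link, splitting the network into the two triangles gives Q = 5/14, while placing all nodes in one community gives Q = 0.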
5 Conclusions
In this paper, we propose an LPA model to characterize online social network evolution. This model is capable of reproducing networks with evolving node degree correlation, the “small world” phenomenon, the “scale free” property and community structures. Unlike most existing models, the proposed model naturally produces networks with the same topological characteristics as real OSNs. From the statistical results we can see that the seed network has an important influence on the evolution of node degree correlation, and that the evolution process goes from strong assortativity to a steady state of weak dissortativity. The “rich club” phenomenon known from human society is also found.
Acknowledgments. This work is supported by the State Key Development Program of Basic Research of China (973) under Grant Nos. 2013cd329601 and 2013cb329602.
References
1. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998)
2. Newman, M.E., Watts, D.J.: Renormalization group analysis of the small-world network model. Physics Letters A 263, 341–346 (1999)
3. Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
4. Newman, M.E.: Assortative mixing in networks. Physical Review Letters 89, 208701 (2002)
5. Zhou, T., Lü, L., Zhang, Y.C.: Predicting missing links via local information. The European Physical Journal B 71, 623–630 (2009)
6. Hu, H., Wang, X.: Evolution of a large online social network. Physics Letters A 373, 1105–1110 (2009)
7. Ahn, Y.Y., Han, S., Kwak, H., Moon, S., Jeong, H.: Analysis of topological characteristics of huge online social networking services. In: Proceedings of the 16th International Conference on World Wide Web, pp. 835–844. ACM (2007)
8. Holme, P., Edling, C.R., Liljeros, F.: Structure and time evolution of an internet dating community. Social Networks 26, 155–174 (2004)
9. Fu, F., Liu, L., Wang, L.: Empirical analysis of online social networks in the age of web 2.0. Physica A: Statistical Mechanics and its Applications 387, 675–684 (2008)
10. Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, pp. 29–42. ACM (2007)
11. Newman, M.E., Park, J.: Why social networks are different from other types of networks. Physical Review E 68, 036122 (2003)
12. Toivonen, R., Onnela, J.P., Saramäki, J., Hyvönen, J., Kaski, K.: A model for social networks. Physica A: Statistical Mechanics and its Applications 371, 851–860 (2006)
13. Li, P., Zhang, J., Small, M.: Emergence of scaling and assortative mixing through altruism. Physica A: Statistical Mechanics and its Applications 390, 2192–2197 (2011)
14. Wong, L.H., Pattison, P., Robins, G.: A spatial model for social networks. Physica A: Statistical Mechanics and its Applications 360, 99–120 (2006)
15. Davidsen, J., Ebel, H., Bornholdt, S.: Emergence of a small world from local interactions: Modeling acquaintance networks. Physical Review Letters 88, 128701 (2002)
16. Li, C., Maini, P.K.: An evolving network model with community structure. Journal of Physics A: Mathematical and General 38, 9741 (2005)
17. Marsili, M., Vega-Redondo, F., Slanina, F.: The rise and fall of a networked society: A formal model. Proceedings of the National Academy of Sciences of the United States of America 101, 1439–1442 (2004)
18. Jin, E.M., Girvan, M., Newman, M.E.: Structure of growing social networks. Physical Review E 64, 46132 (2001)
19. Freeman, L.C.: Centrality in social networks conceptual clarification. Social Networks 1, 215–239 (1979)
20. Fruchterman, T.M., Reingold, E.M.: Graph drawing by force-directed placement. Software: Practice and Experience 21, 1129–1164 (1991)
21. Martin, S., Brown, W., Klavans, R., Boyack, K.: DrL: Distributed recursive (graph) layout. SAND2008-2936J: Sandia National Laboratories (2008)
22. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, P10008 (2008)
23. Newman, M.E.: Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 8577–8582 (2006)
24. Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Physical Review E 70, 66111 (2004)