Internetworking Assortativity in Facebook

Francesco Buccafurri

Gianluca Lax

Antonino Nocera

Domenico Ursino

DIIES Dept. Univ. of Reggio Calabria Italy e-mail: [email protected]

DIIES Dept. Univ. of Reggio Calabria Italy e-mail: [email protected]

DIIES Dept. Univ. of Reggio Calabria Italy e-mail: [email protected]

DIIES Dept. Univ. of Reggio Calabria Italy e-mail: [email protected]

Abstract: The role of assortativity in real-world and on-line social networks has been widely investigated in the literature, and several forms of assortativity have been analyzed. However, none of these forms is able to capture some pieces of knowledge that become strategic when moving from a single-social-network to a multiple-social-network perspective. The relevance of this perspective is rapidly increasing, due to the interaction among users, applications, and information flows of different social networks. This interaction is the key feature of a new emerging paradigm called social internetworking scenario. Here, all the knowledge concerning information crossing different social networks assumes high importance. This stimulates the study of assortativity from the social internetworking scenario perspective. In this paper, we propose a new notion of assortativity that captures some important aspects of this issue. Furthermore, we give an effective methodology for its computation. An extensive experimental analysis has been performed to measure the new form of assortativity for Facebook w.r.t. seven other (top-level) on-line social networks.

Keywords: assortativity, assortative mixing, social networks, social internetworking scenarios, Facebook

I. INTRODUCTION

The term “assortativity” or “assortative mixing” indicates the preference of a network node to relate to other nodes that are similar to it in some way. Similarity can be defined in various ways even though, in most cases, node degree is chosen. In this case, the term “degree-degree assortativity” is adopted. The concept of assortativity was introduced in the renowned paper of Newman [1]. Here, the author presents this concept, specializes it to degree-degree assortativity, and shows that, from this point of view, real social networks are often assortative, whereas technological and biological networks show a disassortative behavior. In the past, assortativity in social networks has been widely investigated, as witnessed by the high number of papers about this topic (see Section II). Several forms of assortativity have been analyzed (for instance, betweenness centrality-betweenness centrality assortativity), and important results, along with several possible applications, have been presented.

In recent years, (on-line) social networks have shown a spectacular growth, both in their number and in their size. This led to a scenario in which a single user joins several social networks, which appear tightly connected with each other and form a unique context called social internetworking scenario [2], [3]. This scenario cannot be seen as a unique social graph, since the involved social networks maintain their specificities. On the other hand, it cannot be seen as a set of disconnected social networks, since users registered to several social networks allow other users, who did not join the same social networks, to interact with each other. This implies that a lot of information may cross different social networks, resulting in a challenging scenario in which the scope of the action of both people and applications is not confined to a single social network. Consider, for example, the case of applications aimed at implementing tribal marketing strategies, in which allowing the passage of information from a certain social network (a community) to other social networks (different communities) represents the most effective way to conquer new market segments. Therefore, a social internetworking scenario can be seen as a sort of “federation” of social networks, and the key issue on which to focus our attention is precisely how information can cross different social networks.

From this new perspective, classical measures and properties defined in the context of social network analysis should be reviewed to take the inter-social-network aspects into account. Among these, assortativity is certainly an important issue to deal with. Indeed, if we apply the classical assortativity notions, no knowledge about the specificity of the social internetworking scenario can be drawn, since no SIS-specific entities are included in these notions.

Conversely, the idea underlying this paper is to introduce some new notions of assortativity that are able to capture some peculiarities of the social internetworking scenario and, thus, to give us potential tools to study how information passes across social networks. Concerning this aspect, the key role is played by “bridge nodes” and “me edges” [4]. A me edge represents a connection between two accounts of the same user in two different social networks, declared by the user herself through simple tools provided by most social networks (requiring just a few clicks). A bridge (node) is associated with the account of a user in which a me edge is declared. Due to the importance of bridges from this new perspective, it is interesting to understand whether the bridge behavior follows some form of assortativity specific to the social internetworking scenario. The relevance of this issue is not simply related to its analytical aspects, but may enable a number of possible applications, such as: (i) information spreading (as described above), (ii) intelligent techniques for detecting missing me edges (since bridge assortativity, if it exists, may drive the search for candidates towards the neighborhood of declared bridges), and (iii) supporting a crawling strategy oriented to a social internetworking scenario (because standard crawling strategies may remain confined in the starting social network, if we do not force, in some way, the visit of bridges to move towards other social networks).

In this paper, we introduce and study this new form of bridge-bridge assortativity in two different versions, namely: (i) Loose Internetworking Assortativity (LIA), denoting the preference of a bridge to have other bridges among its friends, and (ii) Strict Internetworking Assortativity (SIA), representing the preference of a bridge, having a me edge from a social network S to a social network T, to have, among its friends, other bridges having a me edge from S to T. We also present a methodology for the computation of these two kinds of assortativity, along with a suitable experimental campaign to test their significance. In these experiments, we consider a number of social networks, including the most famous ones, and we compute the Loose Internetworking Assortativity and the Strict Internetworking Assortativity of Facebook, whose importance, popularity and attractiveness in the social network analysis field are unquestionable.

The plan of this paper is as follows: Section II presents related literature about assortativity. Our assortativity measure is proposed in Section III. Section IV describes our experimental campaign. Some possible applications of our assortativity notion are introduced in Section V. Finally, in Section VI, we draw our conclusions.

II. RELATED WORK

The concepts of assortativity and degree assortativity were introduced in the renowned paper of Newman [1]. Here, the author defines a measure of assortativity for networks and shows that real social networks are often assortative, whereas technological and biological networks tend to be disassortative. In the same paper, the author proposes a model for an assortative network and exploits it for analytic and numeric studies. He finds that assortative networks tend to percolate more easily than disassortative ones and that they are more robust to node removal.

A further important study concerning social network assortativity has been proposed in [5]. In this paper, the authors confirm the results of [1] and investigate the relation between clustering and assortativity in the communities composing a social network. In particular, they show that group structure does not depend only on observed clustering, but also accounts for degree assortativity. Finally, they demonstrate that assortativity can be expected whenever there is a variation in the group size and that the predicted level of assortativity compares well with that observed in real-world networks.

The concept of assortativity in social networks is a particular case of that of homophily. This derives from the famous homophily principle “similarity breeds connection” [6], which can be applied to network ties of every type. The result is that people’s personal networks are homogeneous w.r.t. many sociodemographic, behavioral, and intrapersonal characteristics. Homophily limits people’s social worlds in terms of received information, attitudes and interactions.

In the wake of [1] and [5], the authors of [7] develop a model that, starting from microscopic mechanisms of growth, allows the modeling of biological, technological and on-line social networks. This model is exploited to perform statistical evaluations.
The results of this task show that the statistical properties of biological, technological and on-line social networks are in good agreement with those of the real-world social networks of scientists co-authoring papers in condensed matter physics. In this analysis, assortativity plays a key role. Indeed, the authors show that, while the majority of technological and biological networks appear to be disassortative with respect to the degree, social networks are generally assortative.

A further development of the investigation of [7] is presented in [8]. Here, the authors propose an analysis of assortativity/disassortativity for different kinds of network. In particular, they focus on the same network categories considered in [1], [5], [7]. They confirm the results of [7] about the disassortativity of biological and technological networks. Furthermore, they show that real social networks are assortative. As for on-line social networks, contrary to widespread belief and to the results of [7], they find that not all of them are assortative. Specifically, they find that the Chinese professional social network Wealink underwent a transition from degree assortativity to degree disassortativity. Finally, in the same paper, the authors investigate the relations among network assortativity, clustering and modularity.

Most of the results of [8] are confirmed in [9], in which the authors study the structural evolution of large on-line social networks. This investigation reveals that, with the huge increase in the size of these networks, many network properties, such as density, clustering, heterogeneity, and modularity, show a non-monotone behavior. As for degree assortativity, the authors demonstrate that, as a consequence of their growth, on-line social networks underwent a transition from degree assortativity, characteristic of collaborative networks, to degree disassortativity.

In the wake of the previous papers, the authors of [10] analyze assortativity on three very large social networks, namely Cyworld, MySpace and Orkut.
They compute the degree assortativity of these three networks and find that on-line social networks encouraging activities that cannot be copied in real life do not show a degree correlation pattern similar to that of real-life social networks. An opposite behavior is observed for those on-line social networks handling activities similar to real-life ones.

In [11], the authors perform an analysis of assortativity and other network parameters on both standard social graphs and interaction graphs. In the latter, ties represent real interactions between users, instead of static friendship relationships, as happens for social graphs. As far as assortativity is concerned, the authors found that interaction graphs present a higher assortativity than social graphs. They verified their conjectures by performing experimental tests on Facebook.

In [12], the authors present a study on degree assortativity for co-author networks. In particular, they define a growth model for this kind of network and show that the generated network presents the same features and, in particular, the same degree-assortativity level, as a real co-author network (i.e., cul.arxiv.org).

In [13], the authors present two algorithms to change the degree correlation among nodes in a network while keeping the degree distribution unchanged. To obtain this feature, these algorithms simply rewire network links until the expected level of assortativity (resp., disassortativity), in accordance with a specific parameter p, is obtained. As a consequence, assortativity (resp., disassortativity) acts as an engine guiding the network changes. A further contribution of this paper is an investigation of the properties of the newly generated network w.r.t. the original one. This investigation shows that, although the degree distribution remains unchanged, the variations in the assortativity level cause significant changes in several other parameters, such as clustering coefficient, shell structure and percolation.

An interesting study about the relationship between assortativity and centrality can be found in [14]. In this paper, the authors investigate the relation between the degree-degree correlation coefficient and the BC-BC (i.e., betweenness centrality-betweenness centrality) one. In particular, they classify networks on the basis of degree-degree correlation into three categories, namely assortative, disassortative and neutral. After this, they verify what happens for the three network categories when BC-BC correlation is considered. They find that, for disassortative and neutral networks, the BC-BC correlation has the same trend as degree-degree correlation. By contrast, for assortative networks, given a node with BC equal to g, the average BC of its neighbors is almost independent of g. This implies that each network user is surrounded by almost the same influential environment of people, no matter how influential she is [14].

An in-depth study about the relation between Shannon entropy and degree assortativity can be found in [15]. In this paper, the authors define a general class of degree-degree correlated networks and obtain the corresponding Shannon entropy starting from some suitable parameters. A first result found by performing some tests on this general class is that the maximum entropy does not typically correspond to neutral networks but to either assortative or disassortative ones. Specifically, they show that, for highly heterogeneous scale-free networks, the maximum entropy is obtained for disassortative networks. Starting from this result, they construct a model that, in the absence of further knowledge concerning network evolution, allows the computation of the expected value of disassortativity. In case empirical observations on a network deviate from the model prediction, it is possible to infer that the network is assortative (this generally happens in social networks).
In [16], the author performs sensitivity analyses to evaluate the impact of missing data on the structural properties of social networks. The analysis shows that some missing data mechanisms can dramatically alter estimates of several network parameters and that assortativity is one of them. Finally, he shows that, contrary to widespread belief, degree assortativity does not necessarily improve network robustness to random omission of nodes.

In [17], the authors investigate the assortativity of psychological states in real social networks and on-line social networks. In particular, they want to verify whether the tendency to be assortative shown by real-world social networks is also valid for on-line social networks. For this purpose, they perform an analysis of six months of tweets in Twitter and show that this network is equally subject to the social mechanisms that cause assortativity in real social networks. Specifically, they show that assortativity takes place at the level of happiness or subjective well-being. Their result can be exploited for better understanding how both positive and negative sentiments spread through on-line social ties.

A further analysis of assortativity on Twitter is performed in [18]. Here, the authors crawl the entire Twitter site and obtain 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. Their analysis concerns several network parameters, such as degree distribution, diameter, reciprocity of user friendship declaration, homophily and assortativity. The authors investigate the last two parameters in two dimensions, namely geographic location and popularity, and obtain some interesting knowledge patterns. For instance, they show that users with 1,000 followers or less are likely to be geographically close to their reciprocal friends and also to have a popularity similar to theirs.

An interesting application of degree assortativity is proposed in [19]. Here, the authors exploit this measure, along with several other ones, to classify YouTube users into spammers, promoters, and legitimate users. In the proposed algorithm, node assortativity is considered for all the four types of degree-degree correlations (i.e., in-in, in-out, out-in, out-out).

As pointed out in the introduction, our paper lies in the wake of the literature about assortativity mentioned above. However, to the best of our knowledge, it represents the first attempt to define assortativity on social internetworking scenarios instead of on single social networks.

III. INTERNETWORKING ASSORTATIVITY

In this section, we present our proposal of internetworking assortativity tailored to a social internetworking scenario. In our proposal, a key role is played by bridges, since they are the main actors in this context. In particular, our internetworking assortativity concept is based on the observation that bridges often have other bridges as their friends. This observation has already been exploited to detect missing me edges [4]. In a SIS, we can identify two forms of assortativity:

• Loose Internetworking Assortativity (LIA), as the preference of a bridge to have other bridges among its friends;

• Strict Internetworking Assortativity (SIA), as the preference of a bridge, having a me edge from a social network S towards a social network T, to have, among its friends, other bridges having a me edge from S to T.

To define the above measures, we need some preliminary definitions and results. Observe that, as usual in social network analysis, we represent social networks as graphs.

Definition 3.1: A social internetworking system (SIS) $\Omega$ is a directed graph $G = \langle N, E \rangle$, where $N$ is the set of nodes, $E$ is the set of edges (i.e., ordered pairs of nodes), and $N$ is partitioned into subsets, each corresponding to a social network. Given a node $a \in N$, we denote by $S(a)$ the social network which $a$ belongs to. $E$ is partitioned into two subsets $E_f$ and $E_m$: $E_f$ is called the set of friendship edges and $E_m$ the set of me edges. $E_f$ is such that, for each $(a, b) \in E_f$, $S(a) = S(b)$, while $E_m$ is such that, for each $(a, b) \in E_m$, $S(a) \neq S(b)$.

Given a social network $S$ belonging to $\Omega$, a bridge $b$ (of $S$) in $\Omega$ is a node of $S$ such that there exists a me edge $(b, x) \in E_m$. For a me edge $(b, x) \in E_m$, we say that $b$ is a bridge towards $S(x)$. Given a node $a$, we denote by $\Gamma(a)$ the set of nodes in $S(a)$ such that, for each $b \in \Gamma(a)$, $(a, b) \in E_f$. $\Gamma(a)$ is called the set of neighbors of $a$. From now on, we consider a given SIS $\Omega$. We are ready to introduce the notions at the basis of our assortativity measures.

Definition 3.2: Let $S$ and $T$ be two social networks of $\Omega$, let $B_S$ be the set of bridges of $S$, and let $B_{S,T}$ be the set of the bridges of $S$ towards $T$. We define:

1) The Loose Bridge Friend Fraction of $S$ as

$$LBF_S = \frac{|\{b \in B_S \mid \Gamma(b) \cap B_S \neq \emptyset\}|}{|B_S|}$$

2) The Strict Bridge Friend Fraction of $S$ towards $T$ as

$$SBF_{S,T} = \frac{|\{b \in B_{S,T} \mid \Gamma(b) \cap B_{S,T} \neq \emptyset\}|}{|B_{S,T}|}$$

In words, the Loose Bridge Friend Fraction of a social network $S$ measures the fraction of the bridges of $S$ having at least one friend that is a bridge. Moreover, the Strict Bridge Friend Fraction of a social network $S$ towards $T$ represents the fraction of the bridges of $S$ towards $T$ having at least one friend that is a bridge towards $T$ too.

In the next definition we introduce the concept of random version of a social network $S$. Intuitively, it is obtained by maintaining the macro parameters of $S$ (i.e., number of nodes, number of edges, number of me edges) and replacing the deterministic occurrence of friendship edges and me edges with two random variables with uniform distribution. The role of this notion is to provide a theoretical reference to measure the bias of real-life social networks w.r.t. the case in which no bridge-bridge correlation exists. This approach is commonly adopted in the literature in this context [1].

Definition 3.3: Let $S$ be a social network of $\Omega$. We define the random version of $S$, denoted by $\hat{S}$, as the random graph having:

• the same nodes as $S$;

• $E_f^S$ as set of friendship edges, a random variable following a uniform distribution such that $|E_f^S| = |\{(a, b) \in E_f \text{ s.t. } a \in S\}|$;

• $E_m^S$ as set of me edges, a random variable following a uniform distribution such that $|E_m^S| = |\{(a, b) \in E_m \text{ s.t. } a \in S\}|$.

Now we extend the definitions of Loose Bridge Friend Fraction and Strict Bridge Friend Fraction to the case of the random version of a social network. Observe that, given a social network $S$, since $\hat{S}$ is a random graph, the following holds for the above measures: (i) the Loose Bridge Friend Fraction is the probability that a bridge has a bridge as a friend; (ii) the Strict Bridge Friend Fraction is the probability that a bridge towards a given social network $T$ has a bridge towards the same social network $T$ as a friend. This is encoded in the following definition.

Definition 3.4: Let $\hat{S}$ and $\hat{T}$ be the random versions of the two social networks $S$ and $T$ in $\Omega$, respectively. Let $B_{\hat{S}}$ be the set of bridges of $\hat{S}$, and let $B_{\hat{S},\hat{T}}$ be the set of the bridges of $\hat{S}$ towards $\hat{T}$. We define:

1) The Loose Bridge Friend Fraction of $\hat{S}$ as

$$LBF_{\hat{S}} = P\left(\Gamma(b) \cap B_{\hat{S}} \neq \emptyset \mid b \in B_{\hat{S}}\right)$$

2) The Strict Bridge Friend Fraction of $\hat{S}$ towards $\hat{T}$ as

$$SBF_{\hat{S},\hat{T}} = P\left(\Gamma(b) \cap B_{\hat{S},\hat{T}} \neq \emptyset \mid b \in B_{\hat{S},\hat{T}}\right)$$

where $P(\cdot)$ represents the probability of the considered event.

The above probabilities can be computed by the following theorem:

Theorem 3.1: Let $S$ and $T$ be two social networks of $\Omega$, and let $\hat{S}$ and $\hat{T}$ be their random versions. Let $n$ be the number of nodes of $S$, $n_b$ the number of its bridges, $n_b^{\hat{S}\hat{T}}$ the number of bridges having a me edge from $\hat{S}$ towards $\hat{T}$, and $d$ the average outdegree of $S$. Then:

1) $$LBF_{\hat{S}} = 1 - \prod_{k=1}^{n_b-1} \frac{n-d-k}{n-k}$$

2) $$SBF_{\hat{S},\hat{T}} = 1 - \prod_{k=1}^{n_b^{\hat{S}\hat{T}}-1} \frac{n-d-k}{n-k}$$

Proof. First, we prove Item (1). $LBF_{\hat{S}}$ can be computed as the probability that there is a bridge in the neighborhood of a bridge, which is the complement of the probability that there is no bridge in the neighborhood of a bridge. It is easy to see that the corresponding random variable (the number of bridges among the $d$ neighbors of a bridge, drawn from the $n-1$ other nodes, $n_b-1$ of which are bridges) follows a hypergeometric distribution. As a consequence:

$$
\begin{aligned}
LBF_{\hat{S}} &= 1 - \frac{\binom{n_b-1}{0}\binom{(n-1)-(n_b-1)}{d}}{\binom{n-1}{d}} \\
&= 1 - \frac{\dfrac{(n_b-1)!}{0!\,(n_b-1)!} \cdot \dfrac{(n-n_b)!}{d!\,(n-n_b-d)!}}{\dfrac{(n-1)!}{d!\,(n-d-1)!}} \\
&= 1 - \frac{(n-n_b)!}{d!\,(n-n_b-d)!} \cdot \frac{d!\,(n-d-1)!}{(n-1)!} \\
&= 1 - \frac{(n-n_b)!}{(n-1)(n-2)\cdots(n-n_b+1)\,(n-n_b)!} \cdot \frac{(n-d-1)!}{(n-d-n_b)!} \\
&= 1 - \frac{1}{(n-1)(n-2)\cdots(n-n_b+1)} \cdot \frac{(n-d-1)(n-d-2)\cdots(n-d-n_b+1)\,(n-d-n_b)!}{(n-d-n_b)!} \\
&= 1 - \prod_{k=1}^{n_b-1} \frac{n-d-k}{n-k}
\end{aligned}
$$

The proof of Item (2) is analogous to the previous one, except for the parameters of the hypergeometric distribution, which must be specialized according to the definition of $SBF_{\hat{S},\hat{T}}$ instead of $LBF_{\hat{S}}$. This simply implies considering $n_b^{\hat{S}\hat{T}}$ instead of $n_b$ in the hypergeometric distribution. □

Now we are ready to define the Loose Internetworking Assortativity and the Strict Internetworking Assortativity.

Definition 3.5: Let $S$ be a social network of $\Omega$. We define the Loose Internetworking Assortativity of $S$ as:

$$LIA_S = LBF_S - LBF_{\hat{S}}$$

Observe that, according to Definition 3.3, $\hat{S}$ is a random social network built in such a way that no bridge-bridge correlation exists among its nodes [1]. Thus, LIA measures how much a social network is biased w.r.t. the random case in terms of the probability of finding bridges among the friends of a bridge. SIA can be defined as follows.

Definition 3.6: Let $S$ be a social network of $\Omega$. We define the Strict Internetworking Assortativity of $S$ as:

$$SIA_S = \frac{\sum_{T \in \Omega \setminus \{S\}} \left(SBF_{S,T} - SBF_{\hat{S},\hat{T}}\right)}{|\Omega \setminus \{S\}|}$$

Intuitively, SIA measures how much a social network is biased w.r.t. the random case in terms of the probability of finding, among the friends of a bridge, bridges that are coherent with it in terms of target social network.
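As an illustration of how the definitions of this section fit together, the following Python fragment gives a minimal sketch of the bridge friend fractions of Definition 3.2, the closed form of Theorem 3.1, and the two assortativity measures of Definitions 3.5 and 3.6. The function names, the dictionary-based graph representation and the toy data are our own assumptions for illustration and are not part of the methodology described in this paper. The sketch also assumes that every target network has at least one bridge towards it.

```python
import math

def bridge_friend_fraction(neighbors, bridges):
    """Fraction of `bridges` having at least one neighbor in `bridges`.

    With bridges = B_S this is LBF_S; with bridges = B_{S,T} it is SBF_{S,T}.
    `neighbors` maps each node a to its neighbor set Gamma(a) (hypothetical layout).
    """
    bridges = set(bridges)
    if not bridges:
        raise ValueError("the bridge set must not be empty")
    hits = sum(1 for b in bridges if bridges & set(neighbors.get(b, ())))
    return hits / len(bridges)

def random_bridge_friend_fraction(n, d, n_b):
    """Closed form of Theorem 3.1: 1 - prod_{k=1}^{n_b-1} (n-d-k)/(n-k).

    The product is accumulated in log space so that large n_b stays stable.
    """
    if n - d - (n_b - 1) <= 0:
        return 1.0  # too few non-bridge nodes: a bridge friend is certain
    log_p_none = math.fsum(math.log1p(-d / (n - k)) for k in range(1, n_b))
    return -math.expm1(log_p_none)  # equals 1 - exp(log_p_none)

def lia(neighbors, bridges, n, d):
    """Loose Internetworking Assortativity (Definition 3.5)."""
    return (bridge_friend_fraction(neighbors, bridges)
            - random_bridge_friend_fraction(n, d, len(set(bridges))))

def sia(neighbors, bridges_towards, n, d):
    """Strict Internetworking Assortativity (Definition 3.6).

    `bridges_towards` maps every other network T to the set B_{S,T};
    each set is assumed to be non-empty.
    """
    diffs = [bridge_friend_fraction(neighbors, b_st)
             - random_bridge_friend_fraction(n, d, len(set(b_st)))
             for b_st in bridges_towards.values()]
    return sum(diffs) / len(diffs)

# Toy network with 5 nodes and average outdegree 1 (made-up data).
TOY_NEIGHBORS = {"a": {"b", "x"}, "b": {"x"}, "c": {"y"}, "x": {"a"}, "y": set()}
TOY_BRIDGES = {"a", "b", "c"}
TOY_BRIDGES_TOWARDS = {"T1": {"a", "b"}, "T2": {"c"}}

if __name__ == "__main__":
    print("LIA:", lia(TOY_NEIGHBORS, TOY_BRIDGES, n=5, d=1))
    print("SIA:", sia(TOY_NEIGHBORS, TOY_BRIDGES_TOWARDS, n=5, d=1))
```

Accumulating the product as a sum of logarithms is a design choice of this sketch: it keeps the computation stable when the number of bridges is in the order of millions, as happens for real social networks.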

IV. EXPERIMENTS

In the past, several papers proposing assortativity measures applied the newly defined measures to some (real-world or on-line social) networks, both to verify whether the hypothesis leading to the new assortativity measure made sense in real contexts and to measure the assortativity/disassortativity degree of some of these contexts w.r.t. the new measure. We proceed in this way also for our assortativity measure. We decided to conduct an experimental campaign on Facebook. The choice of this social network is motivated by the fact that it is currently the on-line social network with the highest number of users, and it has attracted the attention of many researchers (see, for instance, [20], [21]). To conduct our campaign, we had to consider a SIS and to extract a Facebook-centered sample obtained by considering Facebook users, their friendships and their me edges towards the other networks of the SIS. As a consequence, existing Facebook samples, which were used in many experiments in the past (see, for instance, the sample described in [20]), were not suitable for our campaign because they did not contain information about me edges. The detection of friendships and me edges relies on the two standards XFN and FOAF. XFN (XHTML Friends Network) [22] uses an attribute, called rel, to specify the kind of relationship between two accounts. Possible values of rel are me, friend, contact, co-worker, and parent. FOAF (Friend-Of-A-Friend) [23] is a human-readable ontology serialized into an XML document encoding human relationships. Our SIS consisted of the following social networks: {Twitter, YouTube, LiveJournal, Flickr, Facebook, MySpace, LinkedIn, Friendster}.
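To make the role of the rel attribute concrete, the following Python fragment gives a minimal sketch of how me edges and friendship edges could be read from XFN markup using only the standard library. It is an illustration under our own assumptions and not the extraction procedure used in our campaign: the class and function names, the sample page and its URLs are hypothetical.

```python
from html.parser import HTMLParser

class RelLinkParser(HTMLParser):
    """Collects (href, rel values) for each <a> or <link> tag that has a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        href, rel = attr.get("href"), attr.get("rel")
        if href and rel:
            # rel may hold several space-separated XFN values
            self.links.append((href, set(rel.lower().split())))

def extract_edges(html):
    """Returns (me edge targets, friendship edge targets) found in a profile page."""
    parser = RelLinkParser()
    parser.feed(html)
    me_targets = [href for href, rel in parser.links if "me" in rel]
    friend_targets = [href for href, rel in parser.links
                      if rel & {"friend", "contact"}]
    return me_targets, friend_targets

# Hypothetical profile page fragment (made-up URLs).
SAMPLE_PAGE = """
<a href="http://twitter.com/example_user" rel="me">My Twitter account</a>
<a href="http://example.org/alice" rel="friend met">Alice</a>
<a href="http://example.org/about">About</a>
"""

if __name__ == "__main__":
    me_targets, friend_targets = extract_edges(SAMPLE_PAGE)
    print("me edges towards:", me_targets)
    print("friendship edges towards:", friend_targets)
```

In this sketch, a link is treated as a friendship edge when its rel attribute contains friend or contact; which XFN values should count as friendship is a choice that depends on the analysis at hand.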
To obtain our sample, we could not rely on a specific crawling technique because the way a crawler proceeds introduces some biases in the parameter estimation (for instance, a crawler specific for bridges, such as BDS [24], produces a sample with a fraction of bridges higher than the average fraction of bridges in the network, thus biasing the estimation of LIA and SIA). Thus, to avoid biases, we uniformly sampled Facebook. The uniform sampling of a social network is generally not trivial. However, in Facebook, this activity is facilitated by the organization of user identifiers adopted by this network. Specifically, until 2007, each Facebook user was associated with a 32-bit numeric identifier. After 2007, Facebook introduced 64-bit identifiers. As a consequence, the URL address of the profile page of a Facebook user is http://www.facebook.com/XXX, where XXX is her 64-bit numeric identifier. For instance, 1477228048 is the numeric identifier of the profile of one of the authors of this paper. On the basis of the previous reasoning, to obtain a uniform sample of Facebook, it suffices to generate 64-bit numbers uniformly at random and, for each number, to verify whether it has already been assigned to a real user. For this task, several parsing strategies can be adopted. We have chosen to exploit the SNAKE system [25], which supports data extraction in a multi-social-network scenario. Our sampling activity was conducted from January 7th, 2013 to April 19th, 2013. The main features of our sample are reported in Table I. The whole sample can be accessed at the address http://www.ursino.unirc.it/assortativity.html. For the reviewer the password to open the archive is “73482936”.

TABLE I. MAIN FEATURES OF OUR FACEBOOK SAMPLE.

Parameter                                                 Value
Number of visited nodes                                   49,268
Number of seen nodes                                      3,900,055
Number of friendship edges                                3,846,407
Average outdegree of publicly accessible accounts only    372.5334
Average outdegree of nodes                                78.0711
Number of me edges                                        277
Fraction of bridges                                       0.001137

Once the sample was obtained, we computed the Loose Internetworking Assortativity $LIA_{Fb}$ for Facebook (Fb). For this purpose, we had to compute $LBF_{Fb}$ and $LBF_{\widehat{Fb}}$. As for the computation of $LBF_{Fb}$, we used our Facebook sample and, for each bridge of this sample, we verified whether there existed a bridge in its neighborhood. The fraction of bridges having this property is $LBF_{Fb}$ = 0.4642. As for the computation of $LBF_{\widehat{Fb}}$ (see Theorem 3.1), we needed three Facebook parameters, namely the number $n$ of Facebook users, the average outdegree $d$ of Facebook nodes and the number $n_b$ of Facebook bridges. From our sample, we estimated $d$ and the fraction of bridges (see Table I). Actually, as for the average outdegree of nodes, we computed both the one measured for publicly accessible accounts only (as done in [20]) and the global average outdegree (obtained as the ratio between the number of edges and the number of visited nodes). We set $d$ to the latter, since we had to consider all Facebook nodes (and not only the public ones) in our experiments. From the Facebook official report of December 2012, $n$ is 1.06 billion.
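The uniform sampling step described above amounts to rejection sampling over the identifier space. The following Python fragment is a minimal sketch of this idea under our own assumptions: the function name, its parameters and the existence predicate are hypothetical, and in a real campaign the predicate would be replaced by a check of the corresponding profile page.

```python
import random

def sample_uniform_ids(exists, sample_size, id_bits=64, seed=None, max_draws=None):
    """Draws identifiers uniformly at random and keeps those assigned to a user.

    `exists` is a caller-supplied predicate telling whether an identifier
    corresponds to a real account (hypothetical; not a real API).
    Every assigned identifier has the same chance of being selected.
    """
    rng = random.Random(seed)
    found, seen, draws = [], set(), 0
    while len(found) < sample_size:
        if max_draws is not None and draws >= max_draws:
            break  # give up after the allowed number of attempts
        draws += 1
        uid = rng.getrandbits(id_bits)
        if uid in seen:
            continue  # identifier already tested, do not count it twice
        seen.add(uid)
        if exists(uid):
            found.append(uid)
    return found

if __name__ == "__main__":
    # Toy predicate on a 16-bit space: pretend multiples of 1000 are real accounts.
    toy_exists = lambda uid: uid % 1000 == 0
    print(sample_uniform_ids(toy_exists, sample_size=5, id_bits=16, seed=0))
```

The `max_draws` bound is a safeguard of this sketch: when assigned identifiers are rare in the identifier space, most draws are rejected and the loop may otherwise run for a long time.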
By combining this value of $n$ with the fraction of bridges derived from our sample, we obtain $n_b$ = 1,205,220. As a consequence, $LBF_{\widehat{Fb}}$ = 0.0571. Therefore, on the basis of Definition 3.5, we have that $LIA_{Fb}$ = 0.4071. To fully understand this result, it is worth recalling that assortativity ranges in the real interval [−1, 1] and that, in the past, even the most assortative networks had an assortativity degree lower than 0.4. For instance, in the paper of Newman [1] mentioned in Section II, the most assortative network was the physics coauthorship one, with an assortativity value equal to 0.363. On the basis of this reasoning, we can conclude that Facebook, with a LIA equal to 0.4071, is highly assortative as far as bridge-bridge assortativity is concerned.

As for the calculation of $SIA_{Fb}$, for each social network $T \in \Omega \setminus \{Facebook\}$, we computed $SBF_{Fb,T}$ and $SBF_{\widehat{Fb},\hat{T}}$. The computation of $SBF_{Fb,T}$ (resp., $SBF_{\widehat{Fb},\hat{T}}$) is analogous to the one of $LBF_{Fb}$ (resp., $LBF_{\widehat{Fb}}$). As a consequence, by applying Definition 3.6, we obtain $SIA_{Fb}$ = 0.05434. This result indicates that, as for SIA, Facebook is neutral.

The interpretation of the above experimental conclusions is obviously related to behavioral and sociological aspects of social network users, so that it can be correctly given only if a lot of further non-technical information (besides that considered in this paper) is taken into account. This is clearly out of the scope of this paper. Anyway, we give some basic explanation of our results, which show that Facebook is assortative in the loose sense, while it is neutral in the strict sense. Intuitively, this phenomenon is related to the propensity of people to imitate their acquaintances. As a matter of fact, the declaration of a me edge results in the insertion of the logo/url of the target social network in the home page of the user. Thus, the friends of this user could be enticed to insert the same feature in their page. This explains the loose assortativity measured in Facebook. Probably, the target social network chosen by the “imitators” mostly depends on factors not related to the target social networks of the imitated user, like, for example, the membership of the imitators in a given social network.
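The arithmetic linking Table I to the values of $n_b$ and $LIA_{Fb}$ reported in this section can be retraced with a few lines of Python. This is only a sketch for checking the figures: the variable names are ours, and the values of $LBF_{Fb}$ and $LBF_{\widehat{Fb}}$ are taken as reported above rather than recomputed.

```python
# Figures reported in Table I and in the text of this section.
visited_nodes = 49_268
friendship_edges = 3_846_407
bridge_fraction = 0.001137
n = 1_060_000_000          # Facebook users, official report of December 2012

# Global average outdegree: ratio between edges and visited nodes.
avg_outdegree = friendship_edges / visited_nodes

# Estimated number of Facebook bridges.
n_b = round(n * bridge_fraction)

# Definition 3.5 applied to the reported fractions.
lbf_fb = 0.4642            # measured on the sample
lbf_fb_random = 0.0571     # reported value for the random version
lia_fb = lbf_fb - lbf_fb_random

if __name__ == "__main__":
    print(f"average outdegree d = {avg_outdegree:.4f}")
    print(f"number of bridges n_b = {n_b:,}")
    print(f"LIA_Fb = {lia_fb:.4f}")
```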

P OSSIBLE APPLICATIONS

1) Information spreading over a SIS: The problem of spreading information over large communities as quickly, pervasively and effectively as possible is of crucial importance in many areas, such as economy, government, culture and society. This problem is even more evident in new-generation communication systems, such as social networks and especially SISs. Information spreading was first considered in the context of economics, above all in marketing [26], [27]. In a SIS context, information spreading is even more challenging and raises some new issues, mainly related to the possibility that information can cross different social networks. We expect that, if we apply to a SIS one of the techniques conceived in the literature for facilitating information spreading in a (single) social network, we will be far from the best possible result, since those techniques do not take into account the problem of cross-spreading (i.e., information spreading across different social networks), which is instead crucial in this context. In handling this issue, bridges are the main actors since, thanks to me edges, they allow information to spread among social networks. In this scenario, our assortativity measure can play an important role. For instance, think of a marketing campaign in a SIS. In this case, it is important to select the right nodes from which information spreading should be started. It can be easily seen that these nodes should be bridges and should belong to the social networks with the highest bridge-bridge assortativity. In fact, if this condition holds, then information passes through a high number of bridges, thus maximizing the cross-spreading probability.

2) Intelligent techniques for the extraction of hidden me edges: We have seen that, in a SIS, bridges and me edges play a key role. Unfortunately, for various reasons, users do not always make their role of bridge explicit by specifying their me edges, so that potentially very useful information is lost. As a consequence, in the overall underlying (social internetworking) graph a large number of me edges are missing, and their discovery is a very important issue. In other words, an interesting problem of missing link detection arises, which partially overlaps with link prediction, since we may expect that a portion of the missing me edges will be inserted in the graph later. Bridge-bridge assortativity can support this task. Indeed, if a network is assortative, it is possible to start from an already existing me edge and to identify the corresponding source and target nodes. These are clearly bridges and, thanks to the assortativity property, it is likely that several other bridges (and, consequently, several other candidate pairs) can be found in their neighborhood. This reduces the computational cost of detecting hidden me edges by restricting the search space. Moreover, if SIA is positive for the involved networks, we expect that, given a me edge from the social network A to the social network B, the me edges found by applying the previous approach link the same pair (A, B) of social networks. The reader can find a detailed description of this approach in [4].

3) Supporting a SIS-oriented crawler: In the context of social network analysis, a crucial task is the extraction of significant samples of the social network being analyzed. Clearly, the sample must be as representative as possible of the original social network. In the past, several crawling strategies for single social networks have been proposed. Among them, the most representative ones are Breadth First Search (BFS) [28], Random Walk (RW) [29] and Metropolis-Hastings Random Walk (MH) [30]. They have been extensively investigated for single social networks and their pros and cons have been highlighted [20], [31]. Social network analysis is still crucial in the context of SISs. However, we cannot expect that a crawling strategy that works well for single social networks is still valid for SISs, due to the specific topological features of these structures. Indeed, one of the main problems of classical crawling techniques, when applied to a SIS, is that they tend to remain confined to the social network they started from [24]. This is because the fraction of bridges in a social network is low and, consequently, without specific strategies it is improbable that a crawling technique visits many bridges. A crawler specific for SISs has already been presented in [24]. It adopts a specific heuristic that favors bridges in the selection of the nodes to visit. In this context, bridge-bridge assortativity may help to improve this crawler, especially in the selection of the bridges to visit. Indeed, it is possible to define a “score” for each bridge on the basis of the bridge-bridge assortativity degree of the social networks targeted by its me edges. The higher this degree, the higher the score of the corresponding bridge. This score can be taken into account when choosing the next bridges to be visited by the crawler, so as to increase the probability that, in the next steps, many of the visited nodes are in turn bridges.
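The scoring idea of point 3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the crawler of [24]: the score of a bridge is taken to be the mean bridge-bridge assortativity of the networks targeted by its me edges (the text above only requires the score to grow with that degree), non-bridges are simply visited last, and all names (`bridge_score`, `crawl`, `neighbors`, `me_targets`) as well as the toy graph and the assortativity values are hypothetical placeholders rather than measured data.

```python
import heapq
from itertools import count


def bridge_score(target_networks, assortativity):
    """Mean bridge-bridge assortativity of the networks targeted by a node's me edges.

    Returns None when the node has no me edge towards a known network (non-bridge).
    The choice of the mean is an assumption made for this sketch.
    """
    values = [assortativity[t] for t in target_networks if t in assortativity]
    return sum(values) / len(values) if values else None


def crawl(start, neighbors, me_targets, assortativity, budget):
    """Best-first visit in which the highest-scoring bridges are expanded first."""
    tie = count()  # insertion counter: breaks ties and keeps heap entries comparable
    frontier = [(0.0, next(tie), start)]
    seen = {start}
    visited = []
    while frontier and len(visited) < budget:
        _, _, node = heapq.heappop(frontier)
        visited.append(node)
        for nxt in neighbors.get(node, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            score = bridge_score(me_targets.get(nxt, ()), assortativity)
            # heapq is a min-heap, so scores are negated; non-bridges go last.
            priority = float("inf") if score is None else -score
            heapq.heappush(frontier, (priority, next(tie), nxt))
    return visited


# Toy data, purely illustrative (not values measured in this paper).
neighbors = {"a": ["b", "c", "d"], "b": ["e"]}
me_targets = {"b": ["Twitter"], "c": ["Flickr"], "e": ["Twitter", "Flickr"]}
assortativity = {"Twitter": 0.4, "Flickr": 0.1}

print(crawl("a", neighbors, me_targets, assortativity, budget=4))
```

On this toy graph the visit order is a, b, e, c: the non-bridge d is left for last, whereas a plain BFS would reach it before e. The same ranking could also serve to pick the seed bridges discussed in point 1).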

VI. CONCLUSION

In this paper, we have proposed an extension of the assortativity concept to SISs to capture some specific features of this new emerging scenario. The aim of our approach was to obtain measures able to extract knowledge about the specificity of the social internetworking scenario, concerning information crossing among different social networks. To do this, we had to take into account the peculiar entities of SISs, namely bridges and me edges, which are not considered by the classical assortativity measures. Therefore, we have proposed two forms of bridge-bridge assortativity, based on the hypothesis that the friends of a bridge in a SIS are in turn bridges. These two forms are Loose Internetworking Assortativity, denoting the preference of a bridge to have other bridges among its friends, and Strict Internetworking Assortativity, representing the preference of a bridge having a me edge from a social network S towards a social network T to have among its friends other bridges having a me edge from S towards T.

The above measures have been tested on Facebook, currently the most popular social network with more than 1 billion users, and we have concluded that Facebook is bridge-bridge assortative in the loose sense and neutral in the strict sense. This confirms that a correlation among bridges exists in a real-life case (at least in Facebook), showing that investigation in this direction is meaningful. We have given a possible explanation of the results based on the expected behavior of people. This issue merits further analysis involving also sociological aspects, which we plan to address in our future research. Finally, it appears interesting to test our measures on other social networks to check whether the bridge-bridge neutrality of Facebook in the strict sense, as well as its strong assortativity in the loose sense, are confirmed.

ACKNOWLEDGMENT

This work has been partially supported by the TENACE PRIN Project (n. 20103P34XC) funded by the Italian Ministry of Education, University and Research.

REFERENCES

[1] M. Newman, “Assortative mixing in networks,” Physical Review Letters, vol. 89, no. 20, p. 208701, 2002.
[2] Y. Okada, K. Masui, and Y. Kadobayashi, “Proposal of Social Internetworking,” in Proc. of the International Human.Society@Internet Conference (HSI 2005). Asakusa, Tokyo, Japan: Lecture Notes in Computer Science, Springer, 2005, pp. 114–124.
[3] F. Buccafurri, V. Foti, G. Lax, A. Nocera, and D. Ursino, “Bridge Analysis in a Social Internetworking Scenario,” Information Sciences, vol. 224, pp. 1–18, 2013, Elsevier.
[4] F. Buccafurri, G. Lax, A. Nocera, and D. Ursino, “Discovering Links among Social Networks,” in Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012). Bristol, United Kingdom: Lecture Notes in Computer Science, Springer, 2012, pp. 467–482.
[5] M. Newman and J. Park, “Why social networks are different from other types of networks,” Physical Review E, vol. 68, no. 3, p. 036122, 2003.
[6] M. McPherson, L. Smith-Lovin, and J. Cook, “Birds of a feather: Homophily in social networks,” Annual Review of Sociology, vol. 27, pp. 415–444, 2001.
[7] M. Catanzaro, G. Caldarelli, and L. Pietronero, “Social network growth with assortative mixing,” Physica A: Statistical Mechanics and its Applications, vol. 338, no. 1, pp. 119–124, 2004.
[8] H. Hu and X. Wang, “Disassortative mixing in online social networks,” EPL (Europhysics Letters), vol. 86, no. 1, p. 18003, 2009.
[9] H. B. Hu and X. F. Wang, “Evolution of a large online social network,” Physics Letters A, vol. 373, no. 12, pp. 1105–1110, 2009.
[10] Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, “Analysis of topological characteristics of huge online social networking services,” in Proc. of the International Conference on World Wide Web (WWW’07). Banff, Alberta, Canada: ACM, 2007, pp. 835–844.
[11] C. Wilson, B. Boe, A. Sala, K. Puttaswamy, and B. Zhao, “User interactions in social networks and their implications,” in Proc. of the ACM European Conference on Computer Systems (EuroSys’09). Nuremberg, Germany: ACM, 2009, pp. 205–218.
[12] M. Catanzaro, G. Caldarelli, and L. Pietronero, “Assortative model for social networks,” Physical Review E, vol. 70, no. 3, pp. 037101–037104, 2004.
[13] R. Xulvi-Brunet and I. Sokolov, “Changing correlations in networks: assortativity and dissortativity,” Acta Physica Polonica B, vol. 36, no. 5, pp. 1431–1455, 2005.
[14] K. Goh, E. Oh, B. Kahng, and D. Kim, “Betweenness centrality correlation in social networks,” Physical Review E, vol. 67, no. 1, p. 017101, 2003.
[15] S. Johnson, J. Torres, J. Marro, and M. Muñoz, “Entropic origin of disassortativity in complex networks,” Physical Review Letters, vol. 104, no. 10, p. 108702, 2010.
[16] G. Kossinets, “Effects of missing data in social networks,” Social Networks, vol. 28, no. 3, pp. 247–268, 2006.
[17] J. Bollen, B. Gonçalves, G. Ruan, and H. Mao, “Happiness is assortative in online social networks,” Artificial Life, vol. 17, no. 3, pp. 237–251, 2011.
[18] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in Proc. of the International Conference on World Wide Web (WWW’10). Raleigh, NC, USA: ACM, 2010, pp. 591–600.
[19] F. Benevenuto, T. Rodrigues, V. Almeida, J. Almeida, and M. Gonçalves, “Detecting spammers and content promoters in online video social networks,” in Proc. of the International Conference on Research and Development in Information Retrieval (SIGIR ’09). Boston, MA, USA: ACM, 2009, pp. 620–627.
[20] M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou, “Walking in Facebook: A case study of unbiased sampling of OSNs,” in Proc. of the International Conference on Computer Communications (INFOCOM’10). San Diego, CA, USA: IEEE, 2010, pp. 1–9.
[21] A. Patriquin, “Connecting the Social Graph: Member Overlap at OpenSocial and Facebook,” http://blog.compete.com/2007/11/12/connecting-the-social-graph-memberoverlap-at-opensocial-and-facebook/, 2007.
[22] “XFN - XHTML Friends Network,” http://gmpg.org/xfn, 2012.
[23] D. Brickley and L. Miller, “The Friend of a Friend (FOAF) project,” http://www.foaf-project.org/, 2012.
[24] F. Buccafurri, G. Lax, A. Nocera, and D. Ursino, “Crawling Social Internetworking Systems,” in Proc. of the International Conference on Advances in Social Analysis and Mining (ASONAM 2012). Istanbul, Turkey: IEEE Computer Society, 2012, pp. 505–509.
[25] F. Buccafurri, G. Lax, B. Liberto, A. Nocera, and D. Ursino, “Supporting Community Mining and People Recommendations in a Social Internetworking Scenario,” in Proc. of the International Workshop on Mining Communities and People Recommenders at ECML/PKDD 2012 (COMMPER 2012), Bristol, UK, 2012, pp. 24–31.
[26] J. Goldenberg, E. Libai, and E. Muller, “Talk of the network: A complex systems look at the underlying process of word-of-mouth,” Marketing Letters, vol. 12, no. 3, pp. 211–223, 2001.
[27] Y. Wong, R. Chan, and T. Leung, “Managing information diffusion in Internet marketing,” European Journal of Marketing, vol. 39, no. 7/8, pp. 926–946, 2005.
[28] S. Ye, J. Lang, and F. Wu, “Crawling online social graphs,” in Proc. of the International Asia-Pacific Web Conference (APWeb’10). Busan, Korea: IEEE, 2010, pp. 236–242.
[29] L. Lovász, “Random walks on graphs: A survey,” Combinatorics, Paul Erdős is Eighty, vol. 2, no. 1, pp. 1–46, 1993.
[30] D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, “On unbiased sampling for unstructured peer-to-peer networks,” in Proc. of the International Conference on Internet Measurements. Rio de Janeiro, Brazil: ACM, 2006, pp. 27–40.
[31] M. Kurant, A. Markopoulou, and P. Thiran, “On the bias of BFS (Breadth First Search),” in Proc. of the International Teletraffic Congress (ITC 22). Amsterdam, The Netherlands: IEEE, 2010, pp. 1–8.