Cite this paper as: Tutoky G., Paralič J. (2011) Time Based Modeling of Collaboration Social Networks. In: Jędrzejowicz P., Nguyen N.T., Hoang K. (eds) ...
Time Based Modeling of Collaboration Social Networks

Gabriel Tutoky and Ján Paralič

Dept. of Cybernetics and Artificial Intelligence, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 040 01 Košice, Slovakia
{Gabriel.Tutoky,Jan.Paralič}@tuke.sk
Abstract. This article describes basic approaches to the modeling of collaboration networks. We discuss typical ways of modeling collaboration networks, and we propose a new extension for weighting the ties among event participants together with the idea of aging of the ties among collaborators. The classical as well as the proposed approaches to weighting ties in collaboration networks are experimentally evaluated on a real data set and compared with people's opinions expressed in a targeted inquiry.

Keywords: social network analysis, collaboration networks, networks modeling, network projection.
1 Introduction

In recent years, many "social networks" have been analyzed, such as various Internet communities, email networks, peer-to-peer networks, telephone call graphs or train routes [1]. Each of these networks is interesting in some specific aspect, and together they provide a notable data source for network analysis. They are usually large-scale networks with thousands of nodes and edges. Analysis of these networks, usually based on global properties, can lead to interesting and helpful results. Nevertheless, there are many situations in which the data used for the analysis do not carry sufficient information; for example, temporal information is often neglected. In this paper we describe a new approach to modeling and analyzing one particular type of social network, affiliation networks, that makes use of additional strands of information, including temporal information.

An affiliation network is a network of actors connected by common memberships in groups or events such as clubs, teams, organizations or other activities. Affiliation networks are a special type of two-mode social network [2], where one mode is a set of actors and the second mode is a set of events with which the actors are affiliated. A tie between an actor and an event is created if the actor is a member of the group or participates in the particular event. Affiliation networks describe collections of actors rather than simply ties between pairs of actors. Based on such an affiliation network we are able to derive connections among members of one of the modes based on linkages established through the second mode [2].

P. Jędrzejowicz et al. (Eds.): ICCCI 2011, Part I, LNCS 6922, pp. 409–418, 2011. © Springer-Verlag Berlin Heidelberg 2011
Affiliation networks have been studied in the past, e.g. the attendance of women at social events [3], movies and their actors [4], or the co-authorship network of scientists and their papers [1]. Whereas the first two examples used unweighted representations of the networks, the last work introduced an interesting approach to building a collaboration network with weighted ties between authors of the same paper. The weight of the tie between two co-authors of a single paper is derived from the number of the paper's collaborators, and the final weight between two authors is the sum of these weights over all papers on which they collaborated. This approach allows finding the "most connected" scientists in the whole collaboration network.

In our work we build a collaboration network of teenagers (the necessary background is given in Section 2) based on their participation in educative-pedagogic workshops, in order to identify the "most important" persons at a desired point in time. In the network building process we used two different approaches: 1) a modified weighting of the ties between collaborators based on the weighting used by Newman in [1, 5], and 2) our own weighting, described in detail in Section 2.2. Both weightings are then evaluated with the temporal aspects taken into account. We propose time-dependent weights that decrease over time (see Section 2.3). We assume that the weight of the tie between two collaborators depends on the number of event participants and on the time interval between the events at which these actors collaborated. The weight decreases with an increasing number of event participants and also with increasing length of the time interval between two common events.
2 DAK Collaboration Network

The DAK community network (see http://www.zksm.sk/) is a collaboration network of members (usually teenagers) of a non-profit organization that organizes educative-pedagogic workshops for young people [6]. Around 10 workshops are usually organized annually, with 150 to 700 participants at a single workshop. The number of participants depends on the workshop's type and duration. All participants of a single workshop are partitioned into smaller groups, usually with 8 to 12 members. Each group member cooperates with the other group members, so new social relations are established within the group.

Generally there are two types of group members: group participants and the leader(s) of a group. Participants spend most of the time inside the group (i.e. they usually do not establish social relations outside the group), but a leader cooperates with other leaders, which creates another type of social relation. Additionally we recognize two types of groups: participants' groups and organizers' groups. In summary, we have two types of groups and four types of group members: base participant, base organizer, leader of a participants' group and leader of an organizers' group.

Compositional attributes in the DAK data set are available for both actors and events. Actors are described by attributes such as date of birth (age) and gender, as well as by geographical attributes (city or area of living). Events are described by their type.
We can recognize two main types of events: events for base participants and events for organizers. Events for organizers are further categorized by their type of activity, such as registration, security or accommodation. Moreover, temporal attributes are available together with the compositional attributes, e.g. the start and end of events/workshops. From these data we can derive several other attributes, such as the "length of event" or the "time of first visit" of a particular actor. In our case the latter means the moment when the actor visited any event for the first time.
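As an illustration of how such derived attributes could be computed, the following minimal Python sketch works on a hypothetical list of event records. The record layout, the actor names and the convention that an event's length counts both its first and last day are assumptions made for this example, not part of the DAK data set description.

```python
from datetime import date

# Hypothetical event records: (event id, start date, end date, participants).
events = [
    ("w1-group-a", date(2010, 1, 22), date(2010, 1, 24), {"anna", "boris"}),
    ("w2-group-c", date(2010, 8, 10), date(2010, 8, 15), {"boris", "cyril"}),
]


def event_length_days(start, end):
    """Length of an event in days, counting both the first and the last day
    (an assumed convention)."""
    return (end - start).days + 1


def first_visits(event_records):
    """Map each actor to the start date of the earliest event they attended."""
    first = {}
    for _event_id, start, _end, participants in event_records:
        for actor in participants:
            if actor not in first or start < first[actor]:
                first[actor] = start
    return first


lengths = {event_id: event_length_days(start, end)
           for event_id, start, end, _ in events}
visits = first_visits(events)
```

Here `lengths` holds the "length of event" attribute per event and `visits` the "time of first visit" per actor.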
3 Collaboration Network Modeling

The collaboration network described above can be expressed for a single workshop by a bipartite graph representing a two-mode network. The first mode is a set of actors, and the second mode is a set of events with which the actors are affiliated. We represent each group at a single workshop as a single event, so for one workshop we obtain several events. Additionally, we added two more events to represent cooperation between leaders of participants and leaders of organizers.

One of the advantages of the DAK data set is the availability of temporal information. We are able to track events in time and recognize which events were organized in parallel and which sequentially. We are also able to track the participation of single actors in particular events.

2.1 Base Projection Onto One-Mode Network

Affiliation networks (two-mode representation) are most often projected onto one-mode networks where actors are linked to one another by their affiliation with events (co-membership or co-operation), and at the same time events are linked by the actors who are their members (overlapping events). This property is called duality of the affiliation network [2].

Usually, weights in both affiliation (two-mode) and projected (one-mode) networks have binary values: a tie in the network either exists or not. In the projection of two-mode networks onto one-mode networks we can use different measures for weight definition, e.g. the ones summarized by Opsahl in [7]:

- The weight is determined as a count of participations (co-occurrences), i.e. the number of events in which two actors participated together. Formally,

  $$w_{ij} = \sum_{p} 1, \qquad (1)$$

  where $w_{ij}$ is the weight between actors (nodes of the first mode) $i$ and $j$, and $p$ are the events (nodes of the second mode) in which $i$ and $j$ participated together.

- Newman in [1, 5] proposed an extended determination of weights while working with scientific collaboration networks. He supposes that the strength of social bonds between collaborators is higher with a lower count of collaborators on a paper and, vice versa, lower with many collaborators on a paper. He proposed formula (2) for defining the weights among collaborators,

  $$w_{ij} = \sum_{p} \frac{1}{N_p - 1}, \qquad (2)$$

  where $N_p$ is the count of collaborators on paper (event) $p$.
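The two projection rules, the co-occurrence count of formula (1) and Newman's weighting of formula (2), can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper; the function name `project`, the dictionary layout of `events` and the toy data are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def project(events, weighting="simple"):
    """Project a binary two-mode network onto a weighted one-mode network.

    events: dict mapping an event id to the set of actors who took part.
    weighting: "simple" adds 1 per shared event (formula 1);
               "newman" adds 1 / (N_p - 1) per shared event (formula 2).
    Returns a dict mapping a sorted actor pair (i, j) to the tie weight.
    """
    if weighting not in ("simple", "newman"):
        raise ValueError("weighting must be 'simple' or 'newman'")
    weights = defaultdict(float)
    for participants in events.values():
        n_p = len(participants)
        if n_p < 2:
            continue  # a single participant creates no tie
        contribution = 1.0 if weighting == "simple" else 1.0 / (n_p - 1)
        for i, j in combinations(sorted(participants), 2):
            weights[(i, j)] += contribution
    return dict(weights)


# Toy data: actors a and b meet twice, once alone and once together with c.
events = {
    "e1": {"a", "b"},
    "e2": {"a", "b", "c"},
}
simple = project(events, "simple")
newman = project(events, "newman")
```

With this toy data the pair (a, b) receives weight 2 under simple counting, while under Newman's weighting the three-person event contributes only 1/2 to each of its pairs.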
Until now we have considered only binary two-mode networks and their projection onto weighted one-mode networks. However, there also exist weighted two-mode networks, such as networks of online forums (where the weight is determined as the count of posts or posted characters) or the collaboration network described above and in [8, 9]. Both measures for weight definition just presented can be extended to weighted two-mode networks as follows:

- The co-occurrence count becomes

  $$w_{ij} = \sum_{p} w_{i,p}, \qquad (3)$$

  where $w_{i,p}$ is the weight of the tie of the $i$th actor to the $p$th event in which $i$ and $j$ participated together. This method differentiates how two particular actors interact with the common event, and projects it onto a directed weighted one-mode network [7].

- In a similar way, Newman's method can be extended for the projection of weighted two-mode networks. The weights are calculated by the following formula:

  $$w_{ij} = \sum_{p} \frac{w_{i,p}}{N_p - 1}. \qquad (4)$$
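A minimal Python sketch of the two weighted projections follows, under the assumption that the two-mode network is stored as a nested dictionary from events to actor weights. The function name `project_weighted` and the forum-like toy data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def project_weighted(two_mode, newman=False):
    """Project a weighted two-mode network onto a directed one-mode network.

    two_mode: dict mapping an event id to a dict {actor: w_ip}, where w_ip is
              the weight of the actor's tie to that event.
    newman:   False sums w_ip over shared events (formula 3);
              True sums w_ip / (N_p - 1) over shared events (formula 4).
    Returns a dict mapping an ordered actor pair (i, j) to the weight w_ij.
    """
    weights = defaultdict(float)
    for actor_weights in two_mode.values():
        n_p = len(actor_weights)
        if n_p < 2:
            continue  # no co-participant, hence no tie
        divisor = (n_p - 1) if newman else 1
        # Ordered pairs: w_ij depends on actor i's own tie to the event.
        for i, j in permutations(actor_weights, 2):
            weights[(i, j)] += actor_weights[i] / divisor
    return dict(weights)


# Toy data: number of posts each actor wrote in two forum threads.
forum = {
    "thread1": {"a": 4, "b": 1},
    "thread2": {"a": 2, "b": 3, "c": 1},
}
plain = project_weighted(forum)
scaled = project_weighted(forum, newman=True)

# Out-strength of actor "a" in the Newman-style projection.
out_strength_a = sum(w for (i, _j), w in scaled.items() if i == "a")
```

In this example the out-strength of actor a in the Newman-style projection is 6, which equals the sum of a's two-mode weights (4 + 2).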
This formula would create a directed one-mode network in which the out-strength of a node is equal to the sum of the weights attached to the ties in the two-mode network that originated from that node [7].

2.2 Extension of One-Mode Projection

In the next two sections we describe our extensions of the projection of two-mode collaboration networks onto one-mode networks. This projection step has a strong impact on the analysis of collaboration networks, because it determines how suitable the resulting network model is.

First, we propose a new weighting of the ties created among event participants that is more general than Newman's weighting method. The reason is that Newman's weighting results in a fast decreasing value with just a small increase in the number of event participants (above two). This can be good in some cases, but not in general for any collaboration network. We suggest using an exponential curve:

$$w_{ij} = \sum_{p} q^{\,2 - N_p}. \qquad (5)$$

Here, too, the weights decrease with an increasing number of event participants, but the parameter $q$ can be adjusted to the particular type of collaboration network, which changes the shape of the exponential curve. The parameter $q$ depends on the collaboration type and should be estimated by a domain expert, e.g. with the following formula:

$$q = \sqrt[N - 2]{\frac{1}{w}}. \qquad (6)$$
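A small numeric sketch of this calibration is given below. It assumes that the exponential curve has the form q^(2 − N) for an event with N participants (so that two participants give the weight 1), and that the parameter is obtained by solving this expression for a target weight. The function names and the symbol `q` are illustrative assumptions.

```python
def exponential_weight(n_participants, q):
    """Tie weight created at one event, assuming the form q ** (2 - N)."""
    if n_participants < 2:
        raise ValueError("a tie needs at least two participants")
    return q ** (2 - n_participants)


def calibrate_q(target_weight, n_participants):
    """Solve target_weight = q ** (2 - N) for q.

    Requires N > 2 and a target weight in the interval (0, 1].
    """
    if n_participants <= 2:
        raise ValueError("calibration needs more than two participants")
    if not 0 < target_weight <= 1:
        raise ValueError("target weight must lie in (0, 1]")
    return target_weight ** (1.0 / (2 - n_participants))


def newman_weight(n_participants):
    """Newman's per-event weight 1 / (N - 1), for comparison."""
    return 1.0 / (n_participants - 1)


# An expert decides that 8 participants of one event should be tied by 0.62.
q = calibrate_q(0.62, 8)
weight_for_8 = exponential_weight(8, q)      # reproduces the target weight
newman_for_8 = newman_weight(8)              # about 0.14286
```

Under these assumptions a target weight of 0.62 for 8 participants gives a value of `q` slightly above 1.08, whereas Newman's weighting assigns about 0.14286 to the same event size.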
Formula (6) makes it easier to set an optimal value of the parameter $q$ for a particular type of collaboration network, because $w$ is the weight that should be established between $N$ participants of an event in the collaboration network. For example, in a scientific collaboration network the strength of collaboration ties among 8 scientists should be weaker (by Newman's weighting it is 0.14286), whereas in a collaboration network of students, or in the DAK network described above, the strength of the ties among 8 collaborators participating in the same event should be higher, e.g. 0.62. The number 2 in the index of the radical represents the "ideal" number of event participants, at which the strongest ties are created among event participants (this is analogous to Newman's method).

2.3 Time Based Network Modeling

Various collaboration networks contain time series data; usually the time of the event is known. It is reasonable to assume that the weight of ties created between participants of a common event will decrease over time. We therefore propose time-dependent weights in our representation of the one-mode projected affiliation network, a kind of aging of the ties. This is similar to the approach presented in [10, 11], where the authors considered aging of the nodes in the context of citation networks. They describe a node's age as an influence on the probability of connecting the current node to new nodes in the network.

Our proposed weight aging method is based on the assumption that past collaborations among network members are less important than recently created collaborations. After a sufficiently long time, past collaborations no longer have any influence on the present and are removed from the network; old ties among collaborators that are not refreshed are thus "forgotten".

From the social network analysis point of view, our proposal of aging of the edges can lead to new opportunities in network analysis:

- Tracking collaborations over time, i.e. tracking the collaboration strength among selected actors of the network as time passes. This should provide detailed information describing the evolution of cooperation among the selected actors.
- Creation of network snapshots at a given time. This allows us to obtain the actual network state at a desired time and consequently to analyze e.g. the strongest collaborations in the network. It can lead to different results of network analysis, because older collaborations are not weighted as highly as recently created ones. In the collaboration network we are able to "view" the still current and (in our belief) important collaborations among network members.
We have investigated the human forgetting curve described by Ebbinghaus in [12], which has an exponential course, for the possibility of using it for the aging of edges (i.e. for decreasing their weights) with passing time. The forgetting curve can be described as

$$R = e^{-t/S}, \qquad (7)$$

where $R$ is memory retention, $S$ is the relative strength of memory, and $t$ is time. According to [12, 13], the rate of decrease of the forgetting curve depends on the repetition of memory; in the context of a collaboration network this is the repetition of collaborations among actors. If actors collaborate frequently (the collaboration strength grows quickly in a short time), the edge ages more slowly than in the case of just one (or very sporadic) mutual collaboration.

We have also investigated similar work on the aging of nodes: in [11] the authors studied the aging of nodes in scientific collaboration networks and derived an exponential course of network aging, similar to the forgetting curve (formula 7).

In the network modeling process we propose to use an exponential curve for modeling the aging of edges, described by the formula

$$w_{ij}(t + \Delta t) = w_{ij}(t)\, e^{-\Delta t / S}, \qquad (8)$$

where $w_{ij}(t + \Delta t)$ is the weight after time $\Delta t$ has elapsed since the last collaboration at time $t$ among actors $i$ and $j$. The factor $1/S$ expresses the rate at which the edge weight decreases with passing time, and $S$ is the relative strength of collaboration.

Footnote 2: We consider the interval [0, 1] for weight values, where 0 represents the weakest and 1 the strongest tie.
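The aging of a single tie can be sketched as follows. The sketch assumes that the decay has the same exponential shape as the Ebbinghaus forgetting curve R = exp(−t/S), i.e. that a weight is multiplied by exp(−Δt/S) after a time Δt without collaboration. The function names, the time unit (days) and the example values of S are assumptions for illustration only.

```python
import math


def retention(t, strength):
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return math.exp(-t / strength)


def aged_weight(weight, elapsed, strength):
    """Weight of a tie `elapsed` time units after the last collaboration,
    assuming the exponential decay weight * exp(-elapsed / strength)."""
    if elapsed < 0:
        raise ValueError("elapsed time must not be negative")
    return weight * retention(elapsed, strength)


# A tie of weight 1.0, observed 100 days after the last collaboration.
weak_tie = aged_weight(1.0, 100, strength=50)     # low relative strength
strong_tie = aged_weight(1.0, 100, strength=200)  # high relative strength
```

A larger relative strength S makes the tie age more slowly, which mirrors the observation that frequently repeated collaborations are forgotten more slowly.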
2.4 Network Modeling Process

We decompose the network modeling process into the following steps:

- Creation of the two-mode network: from the available real data we created the affiliation network with expert support.
- Projection of the two-mode network onto a one-mode network using the following alternative weighting schemes:
  - Simple weighting: each collaboration at an event has the value 1.
  - Newman's weighting: collaboration strength is derived from the number of event participants by the formula $1/(N_p - 1)$ (see equation 2).
  - Our proposed weighting: collaboration strength is derived from the number of event participants by means of equation 5; for the parameter of the exponential curve ($q$) we used the value 1.04.
- Solitary network modeling over time:
  - Simple summation over all collaborations: the collaborations computed in the previous step are summed (see equation 1 for the simple weighting case and equation 2 for Newman's weighting).
  - Aging of edges: simulation of the aging of network edges. We created 24 network snapshots, one for the time of each workshop, and derived the collaboration strength before and after the workshop.

Collaboration strength between two selected actors is depicted in Figure 1 for the different weighting schemes analyzed.
Fig. 1. Simple summing of collaboration weights (dotted lines): the collaboration weight between two selected actors never decreases. "Aging" of collaboration weights (dashed lines): with a high frequency of collaboration (e.g. between 22.1.2010 and 10.8.2010) the weight increases relatively fast, but without collaboration (after 10.8.2010) the weight decreases. Triangle-marked lines: the weight increases at each collaboration by the constant value 1. Circle-marked lines: the weight increases by Newman's weighting, see formula (2) for the dotted circle-marked line. Square-marked lines: the weight increases by our proposed weighting, see formula (5) for the dotted square-marked line.
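The two ways of modeling a tie over time, simple summation and aging, can be compared with a short simulation for one pair of actors. This is a sketch under stated assumptions: the per-event contributions are taken as given (they could come from any of the weighting schemes above), the decay is exponential with a fixed constant S, and the function name `snapshot_weights` and the day-based time axis are hypothetical.

```python
import math


def snapshot_weights(collaborations, snapshot_days, strength=None):
    """Tie weight between one pair of actors at each snapshot time.

    collaborations: list of (day, contribution) pairs sorted by day.
    snapshot_days:  increasing list of days at which the tie is observed.
    strength:       None for simple summation; otherwise the decay constant S
                    (assumed fixed) used to age the tie between updates.
    """
    result = []
    weight = 0.0       # weight as of the last collaboration
    last_day = None    # day of the last collaboration seen so far
    pending = list(collaborations)
    for day in snapshot_days:
        # Apply every collaboration that happened up to this snapshot.
        while pending and pending[0][0] <= day:
            event_day, contribution = pending.pop(0)
            if strength is not None and last_day is not None:
                weight *= math.exp(-(event_day - last_day) / strength)
            weight += contribution
            last_day = event_day
        current = weight
        if strength is not None and last_day is not None:
            # Age the tie from the last collaboration to the snapshot time.
            current = weight * math.exp(-(day - last_day) / strength)
        result.append(current)
    return result


collaborations = [(0, 1.0), (10, 1.0)]   # two common events, 10 days apart
snapshots = [0, 10, 110]
summed = snapshot_weights(collaborations, snapshots)
aged = snapshot_weights(collaborations, snapshots, strength=100)
```

With summation the weight never decreases, while with aging it grows during the period of frequent collaboration and drops once the collaborations stop, which is the qualitative behaviour described in the caption of Fig. 1.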
3 Evaluation

We implemented all methods for the projection of two-mode networks onto one-mode networks presented above and evaluated them for both variants of modeling over time: simple summing of all collaboration weights, and aging of ties (collaborations) with passing time.

In order to evaluate which of these approaches models reality best, we compared them with data gathered by means of a targeted inquiry among 16 respondents who are current members of the analyzed collaboration network. We selected members who have known the network structure very well for a long time and who follow the activities of its members through the organized workshops. Each respondent had to create a list of the 30 most important persons in the network in his/her opinion, sorted from the most important to the least important person. As a result of this inquiry we obtained from all respondents together a list of 90 distinct actors, each mentioned 1 to 15 times. For our evaluation we filtered this list to persons who were mentioned at least 4 times and then ordered it by the average position assigned to the actor by the respondents.

On the other side, for each particular model of the collaboration network we obtained lists of the top 35, 30, 25 and 15 actors by Hubs and Authorities analysis [14, 15]. We compared these lists with the results of our inquiry. For each list size we evaluated how well the particular model of the collaboration network estimates the most important actors: we computed the intersection between the two lists (one gathered from the inquiry described above, the other calculated by Hubs and Authorities for the particular collaboration network model) and expressed it as a percentage of the whole. The results are presented graphically in Figure 2.
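The list comparison can be sketched as follows. The sketch assumes that "percentage of the whole" means the size of the intersection of the two top-k lists divided by k; the function name `top_k_overlap` and the toy rankings are hypothetical.

```python
def top_k_overlap(survey_ranking, model_ranking, k):
    """Percentage of actors shared by the top-k of both rankings.

    Both rankings are lists of actor ids ordered from most to least important.
    The overlap is expressed relative to k (an assumed interpretation).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_survey = set(survey_ranking[:k])
    top_model = set(model_ranking[:k])
    return 100.0 * len(top_survey & top_model) / k


# Toy rankings: one from the inquiry, one from a network model.
survey = ["a", "b", "c", "d"]
model = ["b", "a", "e", "c"]
overlap_3 = top_k_overlap(survey, model, 3)
overlap_4 = top_k_overlap(survey, model, 4)
```

For these toy rankings two of the top three actors coincide, and three of the top four.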
Fig. 2. Evaluation results for the most important actors in the network. The three weighting types (simple, Newman's and our proposed weighting) are distinguished by gray tone. Left columns of each tone display the results for simple summing of weights over time, whereas right columns display the results with aging of ties over time.
This experiment confirmed our assumption that a projection of the two-mode network based on weighting with the constant value 1 (formula (1)) cannot provide a sufficient model of the collaboration network (see dark columns in Figure 2). On the other hand, the experiment showed unexpectedly high precision for Newman's weighting of collaboration ties with simple summing of weights over time (see all left middle-gray columns). In this case we expected better results for Newman's weighting than for constant weighting, but we also expected higher precision from our proposed weighting. Our expectation was validated in the case of aging of the ties, where our proposed weighting gives better results than Newman's weighting (see lightest gray columns), especially for identifying the 15 most important actors.

In the next step we evaluated the mean absolute deviation of the ordering of important actors, i.e. we took all differences (in absolute value) between the ordering obtained from the inquiry and the ordering obtained from network analysis (see Figure 3). The best results for aging were achieved with our approach to weighting, and for simple summing our approach also achieved good results.
Fig. 3. Evaluation results of ordering of important actors by mean absolute deviation
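A minimal sketch of the ordering comparison is given below. It assumes that positions are counted from 1 and that only actors present in both rankings enter the mean; how actors missing from one list were handled in the evaluation is not stated, so this choice, the function name and the toy data are assumptions.

```python
def mean_absolute_rank_deviation(survey_ranking, model_ranking):
    """Mean absolute difference of positions for actors found in both lists."""
    survey_pos = {actor: pos for pos, actor in enumerate(survey_ranking, start=1)}
    model_pos = {actor: pos for pos, actor in enumerate(model_ranking, start=1)}
    common = [actor for actor in survey_ranking if actor in model_pos]
    if not common:
        raise ValueError("the rankings share no actors")
    total = sum(abs(survey_pos[a] - model_pos[a]) for a in common)
    return total / len(common)


survey = ["a", "b", "c", "d"]   # ordering from the inquiry
model = ["b", "a", "e", "c"]    # ordering from a network model
deviation = mean_absolute_rank_deviation(survey, model)
```

In the toy data the actors a, b and c appear in both lists, each displaced by one position, so the deviation is 1.0; a lower value means the model's ordering is closer to the respondents' ordering.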
4 Conclusion

In Section 2.1 we described various methods for weighting collaborations among event participants in collaboration networks, and in Section 2.2 we described one new method for weighting the ties. In Sections 2.3 and 2.4 we described our method for modeling networks with passing time, in which we proposed a method for aging of the ties among collaborators. We then evaluated all presented methods on data from the DAK collaboration network. The experiments brought positive results: the proposed type of weighting, especially in combination with aging, produced very good results.

There is still space for further investigation of the proposed methods. In our future work we will evaluate different collaboration network models with further feedback from actors. Currently we are gathering data from respondents who are asked to express the strength of their collaboration with persons they have collaborated with. In addition, they are asked to express the trend of the collaboration strength over the last 2 years.

Acknowledgments. The work presented in this paper was supported by the Slovak Grant Agency of Ministry of Education and Academy of Science of the Slovak Republic under grant No. 1/0042/10 (30%) and by the Slovak Research and Development Agency under the contract No. APVV-0208-10 (30%). This work is also the result of the project implementation Development of Centre of Information and Communication Technologies for Knowledge Systems (project number: 26220120030) supported by the Research & Development Operational Programme funded by the ERDF (40%).
References

1. Newman, M.E.J.: Who is the best connected scientist? A study of scientific coauthorship networks. Complex Networks (2004)
2. Wasserman, S., Faust, K.: Social Network Analysis. Cambridge University Press, Cambridge (1994)
3. Davis, A., Gardner, B.B., Gardner, M.R.: Deep South. A Social Anthropological Study of Caste and Class. University of Chicago Press, Chicago (1941)
4. Watts, D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature (1998)
5. Newman, M.E.J.: Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. The American Physical Society 64 (2001)
6. DAK - Collaboration Network, data set of non-profit organization (2011), http://www.domcek.org
7. Opsahl, T.: Projecting two-mode networks onto weighted one-mode networks (2009)
8. Tutoky, G., Paralič, J.: Modelovanie a analýza malej komunitnej sociálnej siete. In: 5th Workshop on Intelligent and Knowledge Oriented Technologies, Bratislava (2010)
9. Tutoky, G., Repka, M., Paralič, J.: Structural analysis of social groups in collaboration network. In: Faculty of Electrical Engineering and Informatics of the Technical University of Košice, Košice (2011)
10. Hajra, K.B., Sen, P.: Aging in citation networks. Elsevier Science, Amsterdam (2008)
11. Zhu, H., Wang, X., Zhu, J.-Y.: The effect of aging on network structure. The American Physical Society (2003)
12. Ebbinghaus, H.: Memory: A Contribution to Experimental Psychology. Teachers College, New York (1885)
13. Savara, S.: The Ebbinghaus Forgetting Curve – And How To Overcome It, http://sidsavara.com/personal-productivity/the-ebbinghauscurve-of-forgetting
14. Kleinberg, J.M.: Authoritative Sources in a Hyperlinked Environment. In: ACM-SIAM Symposium on Discrete Algorithms (1998)
15. Batagelj, V., Mrvar, A.: Pajek - Program for Analysis and Visualization of Large Networks, Ljubljana (2009)