The Evolution and Dynamics of Research Networks

Vladimir Batagelj1, Bettina Hoser2, Claudia Müller3, Steffen Staab4 and Gerd Stumme5

1 University of Ljubljana, Department of Mathematics, 1000 Ljubljana, Jadranska 19, Slovenia
2 University of Karlsruhe, Institute for Information Systems and Management (Information Services and Electronic Markets), 76128 Karlsruhe, Kaiserstrasse 12, Germany
3 University of Stuttgart, Institute for IT-Services, 70550 Stuttgart, Allmandring 30 E, Germany
4 University of Koblenz-Landau, Institute for Computer Science (Information Systems and Semantic Web), 56070 Koblenz, Universitätsstrasse 1, Germany
5 University of Kassel, Electrical Engineering and Computer Science (Knowledge & Data Engineering), 34121 Kassel, Wilhelmshöher Allee 73, Germany

Abstract. Existing collaboration and innovation in scientific communities can be enhanced by understanding the underlying patterns and hidden relations. Social network analysis is an appropriate method for revealing such patterns. However, research in this area has focused mainly on social networks. One promising approach is to use homophily networks as well. Furthermore, extending the static network model to a dynamic one makes it possible to understand the interdependencies in these networks. A mathematical description of possible analyses is given. Finally, the resulting research questions are illustrated and the need for an interdisciplinary research approach is pointed out.

Keywords. Homophily networks, social networks, evolution, scientific community

1 Analysis of Research Networks

Communities of researchers have been studied from many different angles for many years (e.g., [1], [2], [3], [4]). The subject is attractive to researchers because the data needed for the analysis are relatively easy to obtain [5]. Furthermore, the collaboration of researchers can be improved by understanding the underlying patterns and relations in scientific communities. However, most analyses focus on a single type of network (e.g., only on co-authorship or only on citation networks). To get a complete picture of a research community, the information contained in these different networks should be combined.

Our long-term vision is to build a system that allows a researcher who is not a specialist in Social Network Analysis (SNA) to gain insight into any research domain that he is not yet acquainted with. For instance, a researcher with a background in semantic web communities wants to gain knowledge of the data mining field. He would like to find out the essential information about the new community, i.e. the main actors, the main lines of research, the most central papers, etc., without extensive literature work.

There are many algorithms for different kinds of network analysis (e.g., [6], [7]) and knowledge discovery (e.g., [8]). All of them have a number of drawbacks: they do not interact in a coherent way, they are usually restricted to one specific type of dataset, and their results are not easy for a non-SNA specialist to interpret. Creating a system that overcomes these drawbacks poses interesting implementation and research issues. From an application perspective, we would require such a system to fulfill at least the following tasks:

– Identify the main trends in a specific research community (such as the semantic web community) over the last n years,
– Reveal the main trends of a specific topic (such as “topic detection”) over the last n years (this differs from the previous task in that it may involve different communities that do not know each other),
– Observe how colleagues move through communities (“What is Tim Berners-Lee up to next?”),
– Analyse one community over time (decay, growth, stability),
– Discover the emergence, convergence/divergence and decay of communities (mergers & acquisitions), and
– Recommend, for a given interest profile, membership in a community or the creation of a new community.

This contribution summarizes the results of a lively two-day discussion about how scientific communities can be analysed over time and how changes in topic and community structure can be identified. It is organized as follows. First, we apply the idea to folksonomies and social platforms and define a general model. Then we give an introduction to network evolution; the proposed methods address some of the tasks of the envisaged application. We conclude with a short outlook in which open questions in the field of interdependent evolution of networks are discussed.

2 A Static Model of Homophily and Social Networks

We started our considerations with the definition of a general model. This model is based on folksonomies and social platforms, where the following two kinds of networks are recognized:

1. Homophily is the tendency of individuals to contact similar people at a higher rate than dissimilar people. Cultural, behavioral, genetic, or material information that flows through networks will tend to be localized. The distance in terms of social characteristics is translated into network distance [9].
2. A social network is a social structure made of nodes (generally individuals or organizations) that are tied by one or more specific types of interdependency representing social acts such as communication (e.g. letters, e-mail, face-to-face), co-authorship of texts, trading or friendship.

In spite of the differences, empirical investigations have shown that the two kinds of networks exhibit correlations. How one network may give predictive insight into the other, and how one network helps to explain and predict relationships in the other, remains an open question.

Both kinds of network are built from an overlapping set of entities. Social networks (typically) constitute one-mode networks of actors and specific relationships between them. Homophily networks constitute higher-mode, n-partite networks of actors and things. Homophily networks also lead to relationships between actors, but only to inferred relationships of closeness by interest, which do not necessarily imply that one actor is aware of the other.

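[Figure: diagram with the labels Actors, Things, Social Networks, Homophily Networks and Groups (explicit/implicit); the correspondence between the two kinds of groups is marked “=?”.]

Fig. 1. Deriving Groups from Communities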

In homophily networks, unsupervised data mining techniques, e.g. K-means, cluster actors into common interest groups based on descriptions of the actors in terms of many things, e.g. tags, keywords, or the articles they cite. In social networks, graph clustering techniques, e.g. spectral methods or cliques, cluster actors based on how they communicate. Both clustering techniques return groups. No matter how meaningful these groups appear, they need to be carefully evaluated. Clearly, though, the different groups lead to different insights into the community under consideration.

An overview of the interaction between the two kinds of networks is sketched in Figure 1. The figure also mentions some methods in order to illustrate its points, but the set of methods is only exemplary and not meant to be exhaustive. The figure illustrates that one may define e-mail or co-authorship networks on actors (first level), here researchers organized in institutions, which yields communication networks (second level). Researchers use systems like Bibsonomy to organize their publications and references. Using this information, the similarities of tags, content or co-citations can be obtained (homophily networks, level 3). The results of the analyses are applied to identify implicit and explicit communities.
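One way to make the comparison between the two kinds of groups concrete is to count how often they agree on pairs of actors. The following minimal sketch uses the Rand index for this purpose; the choice of this measure, the function name `rand_index` and the toy groups are illustrative assumptions, not part of the approach described above, and the sketch assumes non-overlapping groups.

```python
from itertools import combinations


def rand_index(groups_a, groups_b):
    """Fraction of actor pairs on which two groupings agree.

    A pair counts as an agreement if both groupings put the two actors
    in the same group, or both put them in different groups.
    Assumes each grouping is a partition (no overlapping groups).
    """
    label_a = {actor: i for i, group in enumerate(groups_a) for actor in group}
    label_b = {actor: i for i, group in enumerate(groups_b) for actor in group}
    actors = sorted(set(label_a) & set(label_b))
    pairs = list(combinations(actors, 2))
    agreements = sum(
        (label_a[u] == label_a[v]) == (label_b[u] == label_b[v])
        for u, v in pairs
    )
    return agreements / len(pairs)


# Hypothetical groups: one from clustering descriptions (homophily),
# one from clustering the communication graph (social network).
interest_groups = [{"ann", "bob", "cid"}, {"dan", "eve"}]
social_groups = [{"ann", "bob"}, {"cid", "dan", "eve"}]

print(rand_index(interest_groups, social_groups))
```

A value close to 1 would suggest that interest-based and communication-based groups largely coincide; lower values point to actors whose interests and contacts place them in different groups.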

[Figure: diagram with the labels Researchers, Institutions; Tags, words, keywords, events, conferences; Folksonomy; Communication Networks (e-mail, co-author, etc.); Similarity based on tags, content, co-citation, co-visit; Graph Clustering Result; Data Clustering Result; Groups (analyze or recommend); SNA community, SemWeb community, explicit (mailing list, ...) / implicit; the correspondence between the groups is marked “=?”.]

Fig. 2. Example Methods for Clustering into Groups

3 Evolution of Homophily and Social Networks

In the proposed approach there are three levels: actor descriptions, homophily relations and social networks. Using the interplay of these three levels, we try to answer several questions about the evolution of networks.

The basic level consists of the actor descriptions. Let $V$ be the set of actors; to each actor $v \in V$ is assigned its description $T_v$. For example, the description $T_v$ can consist of the tags or keywords used by author $v$, but also of some of his other properties.

We try to relate the development of the social network to the homophily relation among actors induced by the (dis)similarity of their descriptions. Assuming that the descriptions of actors are “correlated with” (are inducing) the social network structure, we can try to operationalize homophily using an appropriate dissimilarity between the actors’ descriptions, for example

$$d(u, v) = \frac{|T_u \oplus T_v|}{|T_u \cup T_v|},$$

where $\oplus$ is the symmetric difference, $A \oplus B = (A \cup B) \setminus (A \cap B)$. The dissimilarity $d$ determines a kind of homophily space or field $(V, d)$. From it we can derive “homophily” networks, for example the $r$-neighborhoods network or the $k$-nearest neighbors network.

On the basis of the homophily relation, the social network $(V, L)$ is constructed through the social actions of actors. The assumption is that actors that are closer in $(V, d)$ have a higher probability of establishing a link in the social network. An important question is how good our assumption about the interconnections between the three levels is. Possible approaches to answering it are:

– Let $N_r = (V, L_r)$ be the $r$-neighborhoods network, $L_r = \{(u, v) : d(u, v) \le r\}$. Then we can observe how the “compatibility” index $\mathrm{Com}(r) = \frac{|L \cap L_r|}{|L|}$ changes with $r$. There exists $r^*$ such that $\mathrm{Com}(r) = 1$ for all $r \ge r^*$. A small value of $r^*$ indicates a strong influence of actors’ descriptions on the formation of social network links.
– Another possibility is a kind of permutation test. We introduce the quantity $S = \sum_{(u,v) \in L} d(u, v)$ measuring the stress in the social network and, for a permutation $\pi : V \to V$, $S(\pi) = \sum_{(u,v) \in L} d(\pi(u), \pi(v))$. Now we can compare the value of $S$ against the distribution of $S(\pi)$ for random permutations $\pi$. If only few values of $S(\pi)$ are smaller than $S$, this indicates a strong influence of descriptions/homophily on the structure of the social network.

The interconnection also goes in the opposite direction: actors linked in the social network will acquire more similar descriptions.

So far we have looked at the static model. A sequence of such models indexed by time $t$ describes the evolution of the system. An example is given in Figure 3, which shows how, over time, sub-communities of the larger community split, merge, grow and shrink.
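The quantities of the static model can be tried out on small data. The sketch below implements the dissimilarity d(u, v) = |T_u ⊕ T_v| / |T_u ∪ T_v|, the r-neighborhoods network, the compatibility index Com(r) and the permutation test for the stress S. The function names, the representation of descriptions as Python sets, the treatment of links as unordered pairs and the toy data are assumptions made for illustration only.

```python
import random


def dissimilarity(t_u, t_v):
    """d(u, v) = |T_u (+) T_v| / |T_u u T_v|; 0.0 if both descriptions are empty."""
    union = t_u | t_v
    return len(t_u ^ t_v) / len(union) if union else 0.0


def r_neighborhood_links(desc, r):
    """Links L_r of the r-neighborhoods network, as sorted pairs of actors."""
    actors = sorted(desc)
    return {
        (u, v)
        for i, u in enumerate(actors)
        for v in actors[i + 1:]
        if dissimilarity(desc[u], desc[v]) <= r
    }


def compatibility(links, desc, r):
    """Com(r) = |L n L_r| / |L|: share of social links between r-close actors."""
    close = sum(1 for u, v in links if dissimilarity(desc[u], desc[v]) <= r)
    return close / len(links)


def stress(links, desc, pi=None):
    """S(pi) = sum of d(pi(u), pi(v)) over social links; pi=None gives S."""
    if pi is None:
        pi = {v: v for v in desc}
    return sum(dissimilarity(desc[pi[u]], desc[pi[v]]) for u, v in links)


def permutation_test(links, desc, trials=1000, seed=0):
    """Return S and the fraction of random permutations with S(pi) < S."""
    rng = random.Random(seed)
    actors = sorted(desc)
    observed = stress(links, desc)
    smaller = 0
    for _ in range(trials):
        shuffled = actors[:]
        rng.shuffle(shuffled)
        pi = dict(zip(actors, shuffled))
        if stress(links, desc, pi) < observed:
            smaller += 1
    return observed, smaller / trials


# Hypothetical descriptions (tags per author) and social links (co-authorship).
desc = {
    "a": {"rdf", "owl", "ontology"},
    "b": {"rdf", "owl", "sparql"},
    "c": {"kmeans", "svm", "clustering"},
    "d": {"svm", "clustering", "boosting"},
}
links = {("a", "b"), ("c", "d")}

print(r_neighborhood_links(desc, 0.5))
print(compatibility(links, desc, 0.5))
print(permutation_test(links, desc, trials=200))
```

In this toy example every social link joins two actors with similar tags, so no random relabelling of actors produces a smaller stress than the observed one, which is the situation the permutation test reads as a strong influence of homophily.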

[Figure: three snapshots of the community structure at time points T1, T2 and T3.]

Fig. 3. Evolving communities


Assume that at each time point we have a clustering $C = \{C_1, C_2, \ldots, C_k\}$ where $\emptyset \subset C_i \subseteq V$. Note that $C$ need not be a partition; some clusters $C_i$ can even overlap. The clustering $C$ can be explicit (given as data) or implicit (determined by some procedure, such as the islands algorithm [10], clustering with relational constraint [11], k-cliques [12], cores [13], etc.). In directed networks, an interesting approach could be the single-center clusters, which are obtained by the leaders strategy in constrained clustering.

We also assume that the evolution relation $\leadsto$ among clusters from consecutive time points is known: $C_i(t) \leadsto C_j(t+1)$ means that cluster $C_j(t+1)$ evolved from cluster $C_i(t)$. Again, it can be given as data or computed. There are nontrivial problems in determining the evolution relation computationally.
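When the evolution relation is not given as data, one simple heuristic is to relate two clusters from consecutive time points whenever their member sets overlap strongly. The sketch below does this with a Jaccard overlap and a threshold, and then reads splits, merges, births and deaths off the resulting relation. The overlap criterion, the threshold value and the function names are assumptions for illustration; this is not presented as the method intended by the authors, and it sidesteps the nontrivial problems mentioned above.

```python
def jaccard(a, b):
    """Overlap |A n B| / |A u B| of two non-empty clusters."""
    return len(a & b) / len(a | b)


def evolution_relation(before, after, threshold=0.3):
    """Pairs (i, j) such that cluster i at time t is related to cluster j at t+1.

    Assumption: a cluster is said to evolve into another one if their
    Jaccard overlap reaches the threshold.
    """
    return {
        (i, j)
        for i, c_i in enumerate(before)
        for j, c_j in enumerate(after)
        if jaccard(c_i, c_j) >= threshold
    }


def evolution_events(before, after, relation):
    """Label simple patterns: death and split (time t), merge and birth (time t+1)."""
    events = []
    for i in range(len(before)):
        successors = [j for p, j in relation if p == i]
        if not successors:
            events.append(("death", i))
        elif len(successors) > 1:
            events.append(("split", i))
    for j in range(len(after)):
        predecessors = [i for i, q in relation if q == j]
        if not predecessors:
            events.append(("birth", j))
        elif len(predecessors) > 1:
            events.append(("merge", j))
    return events


# Hypothetical clusterings at two consecutive time points (actors as integers).
clusters_t = [{1, 2, 3, 4}, {5, 6}, {7, 8}]
clusters_t1 = [{1, 2}, {3, 4}, {5, 6, 7, 8}, {9, 10}]

relation = evolution_relation(clusters_t, clusters_t1)
print(sorted(relation))
print(evolution_events(clusters_t, clusters_t1, relation))
```

Because the clusters are plain sets, the same code also accepts overlapping clusterings; the threshold then controls how much shared membership is required before two clusters are considered related.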

[Figure: Pajek drawing of a network whose nodes are labelled with author names.]

Fig. 4. Islands in authors’ citations network of Social Networks field

What are the properties of the evolution relation? Different patterns can be observed in the evolution of clusters [14]: birth of a new cluster, merging/fusion of clusters, growth of a cluster, splitting of a cluster, contraction/decay of a cluster, death of a cluster, stability of a cluster, internal change in a cluster, etc. The clusterings can be analyzed at the level of the social network based on structural characteristics of clusters: single-kernel (strong component, clique, etc.), centrality/centralization, etc. For example:


– a cluster that is not single-kernel can indicate a future splitting of the cluster, and
– a change in the cluster’s kernel structure can indicate a change in the cluster’s topics.

Additional analyses can be based on the other two levels:

– $S(C)$ can be used to measure the stress/divergence in a cluster $C$,
– to each cluster $C$ we can assign its vocabulary $T(C) = \bigcup_{v \in C} T_v$. Analyzing how the vocabulary changes through time, $(T(C_t))_{t \in \mathcal{T}}$, we can gain insight into the life of the cluster: convergence to common topics, change of interest, etc.

All of the described analyses can already be done using existing tools. But in several places these analyses are too crude; new methods, more tuned to the data and problems, need to be developed. Here we described only the simplest model, which can be extended in several directions:

– in the same model we can consider different sets of vertices: authors, works, tags, journals/conferences, institutions, countries, etc., and
– in the same model we can consider different social networks: co-authorship, citations, communications, etc.
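The cluster-level quantities S(C) and T(C) from this section can be sketched in a few lines. The code below assumes that S(C) sums the dissimilarity d over the social links with both endpoints in C, and it measures the change between consecutive vocabularies with the same set dissimilarity used for actor descriptions. Both choices, as well as the function names and the toy snapshots, are illustrative assumptions.

```python
def dissimilarity(t_u, t_v):
    """|A (+) B| / |A u B| for two sets; 0.0 if both are empty."""
    union = t_u | t_v
    return len(t_u ^ t_v) / len(union) if union else 0.0


def cluster_stress(cluster, links, desc):
    """S(C): sum of d(u, v) over links inside the cluster (assumed definition)."""
    return sum(
        dissimilarity(desc[u], desc[v])
        for u, v in links
        if u in cluster and v in cluster
    )


def vocabulary(cluster, desc):
    """T(C): union of the descriptions T_v of all actors v in the cluster."""
    return set().union(*(desc[v] for v in cluster))


def vocabulary_changes(snapshots):
    """Dissimilarity between vocabularies of consecutive (cluster, desc) snapshots."""
    vocabularies = [vocabulary(cluster, desc) for cluster, desc in snapshots]
    return [dissimilarity(a, b) for a, b in zip(vocabularies, vocabularies[1:])]


# Hypothetical cluster observed at two time points; actor "b" changes tags.
cluster = {"a", "b"}
links = {("a", "b")}
desc_t1 = {"a": {"rdf", "owl"}, "b": {"rdf", "sparql"}}
desc_t2 = {"a": {"rdf", "owl"}, "b": {"rdf", "owl"}}

print(cluster_stress(cluster, links, desc_t1))
print(cluster_stress(cluster, links, desc_t2))
print(sorted(vocabulary(cluster, desc_t1)))
print(vocabulary_changes([(cluster, desc_t1), (cluster, desc_t2)]))
```

In this toy example the stress inside the cluster drops to zero while the vocabulary shrinks, which corresponds to the convergence to common topics mentioned above.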

4 Outlook

Future research will focus on the problem of the interdependent evolution of networks, that is, on analysing the evolution of different networks defined on the same set of actors. An example from the research community is the co-evolution of the co-authorship network, the co-citation network, the network of co-participation in research projects and, for instance, the communication network (which is most probably difficult to obtain). Research questions might be:

– How do the evolution and the dynamics in one network influence the others?
– Is there a “leading” network that mainly drives the co-evolution of the coupled system?
– How do social processes explain the results found?
– Are there predictors for the success or failure of a coupled system, given that success and failure have been defined in the context?

Going a step further, one could ask what happens when the set of actors is unstable or even unpredictable. In the “real world”, actors join and leave groups constantly. How can this behavior be modeled in the dynamics of co-evolving networks?

These questions call for an interdisciplinary research approach. Coupled systems and their behavior are a well-established field within physics. The social aspects of the question are investigated in sociology and psychology. Data acquisition and handling require computer and information scientists if the data are gathered from the Internet. Ideas might also be found in other fields such as biology or chemistry.

Such research has a number of benefits. For instance, the administrator of a collaboration platform for scientists could monitor how the community as a whole works, and also how parts of the community evolve, and could then intervene if the system shows signs of failure. Another application could be a “recommender service” for researchers, which would return results such as relevant papers, relevant actors, or changes of research topics within a given field.

References

1. Freeman, L.C.: The impact of computer based communication of the social structure of an emerging scientific speciality. Social Networks 3 (1984) 201–221
2. Newman, M.E.: The structure of scientific collaboration communities. Proceedings of the National Academy of Science (PNAS) 98 (2001) 404–409
3. Barabasi, A.L., Jeong, H., Neda, Z., Ravasz, E., Schubert, A., Vicsek, T.: Evolution of the social network of scientific collaborations. Physica 311 (2002) 590–614
4. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E 69 (2004)
5. Batagelj, V.: Wos2pajek – networks from web of science (http://pajek.imfm.si/doku.php?id=wos2pajek) (2007)
6. Wasserman, S., Faust, K.: Social network analysis: methods and applications. Cambridge Univ. Press, Cambridge (1997)
7. Newman, M.E.: The structure and function of complex networks. SIAM Review 45 (2003)
8. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. American Association for Artificial Intelligence Magazine (1996) 37–54
9. McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: Homophily in social networks. Annual Review of Sociology (http://dx.doi.org/10.1146/annurev.soc.27.1.415) 27 (2001) 415–444
10. Zaveršnik, M., Batagelj, V.: Islands (http://vlado.fmf.uni-lj.si/pub/networks/Doc/Sunbelt/islands.pdf). In: XXIV International Sunbelt Social Network Conference, Portorož, Slovenia (2004)
11. Batagelj, V., Ferligoj, A., Mrvar, A.: Hierarchical clustering in large networks (pajek.imfm.si/lib/exe/fetch.php?id=presentations&cache=cache&media=slides:clurelcon.pdf). In: Workshop on complex networks, Louvain-la-Neuve (2008)
12. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 (2005) 814
13. Batagelj, V., Zaveršnik, M.: Generalized cores. http://arxiv.org/abs/cs.DS/0202039 (2002)
14. Palla, G., Barabasi, A.L., Vicsek, T.: Quantifying social group evolution. Nature (http://www.nature.com/nature/journal/v446/n7136/pdf/nature05670.pdf) 446 (2007) 664