SHORT PAPER International Journal of Recent Trends in Engineering, Vol 2, No. 3, November 2009

A Centrality Approach to Identify Sets of Key Players in an Online Weblog

Dr. M. Mohamed Sathik1, A. Abdul Rasheed2

1 Reader in Computer Science, Sathakathullah Appa College, Palayamkottai, Tirunelveli, Tamilnadu, India. Email: [email protected]
2 Assistant Professor in Computer Applications, Valliammai Engineering College, SRM Nagar, Kattankulathur, Tamilnadu, India. Email: [email protected]

Abstract: Social Network Analysis (SNA) is an active area of research in both interdisciplinary and multidisciplinary fields. Key players can be identified using different methodologies, including centrality measures. Few studies have done so using datasets from different areas. In this paper, we identify the sets of key players from blog posts and the responses posted by individuals, using centrality measures.

Index Terms: Social Networking, Key Players, Centrality Measures, Graphs.

I. INTRODUCTION

The study of networks is an active research topic because networks can model many complex real-world systems. Social networks are graphs of interactions between individuals. A social network N consists of a collection of nodes, such as people, organizations or groups, together with a collection of link sets, which generalize the idea of a link from A to B. Social Network Analysis (SNA) provides a spectrum of tools and theoretical approaches for holistic exploration of the interaction patterns among individuals, groups and even organizations.

Social networks have gained popularity recently with the advent of sites such as MySpace, Friendster, Orkut, Twitter and Facebook. The number of users participating in these networks is large and still growing. Technorati has indexed 133 million blog records since 2002, with 900000 blog posts in 24 hours. By June 2008, Technorati tracked blogs in 81 languages, and there were 77.7 million unique visitors in the US by August 2008 [9]. This growing trend has led researchers to analyze blog posts along several dimensions. A fundamental problem related to these networks is the discovery of clusters or communities.

A blog, also referred to as a weblog, is a popular way of publishing information on the web. It comprises blog posts, or content written by bloggers, which are typically organized into categories. Blogs create a context for dialogue between bloggers and readers. Most blog platforms provide a personal writing space that is easy to publish to and easy to share. Online social networking has become a very popular application in the age of Web 2.0, which lets users communicate, interact and share on the World Wide Web (WWW).

Key players are those elements in the network that are considered important with regard to some criteria [1]. Identifying key players is one of the goals in an online interaction medium such as blog posts. Many blog posts accumulate over time, and each may receive one or more responses. The growing phenomenon of blog posts needs to be analyzed. This leads to the task of identifying the key players, namely those who received more responses to their blog posts.

The remainder of the paper is structured as follows. Section 2 discusses the motivation and problem definition. Related work is discussed in Section 3. Section 4 describes the data used in our study, the method applied to identify the sets of key players in that dataset, and the results obtained. In Section 5 we conclude the paper and discuss the results.

II. MOTIVATION AND PROBLEM DEFINITION

A social network is generally viewed as a graph because its structure is so complex. Measuring network location means finding the centrality of a node. These measures give insight into the various roles and groupings in a network: who the connectors, leaders, bridges and key players are. The Key Player Problem (KPP) can be viewed as two sub-problems, as stated below [6]:

i. KPP-1: Given a social network SN, find a set of k nodes (called a kp-set of order k) which, if removed, would maximally disrupt communication among the remaining nodes. KPP-1 is the identification of key players for the purpose of disrupting or fragmenting the network by removing the key nodes.
ii. KPP-2: Given a social network SN, find a kp-set of order k that is maximally connected to all other nodes. KPP-2 is the identification of key players for the purpose of optimally diffusing something through the network by using the key players as seeds.

In graph theory and network analysis, there are various measures of the centrality of a vertex that determine its relative importance within the graph. The centrality approach consists of measuring the centrality of each node in the network, then selecting the k most central nodes to comprise the kp-set. Centrality measures how centrally an individual is positioned in a social network.

85 © 2009 ACADEMY PUBLISHER
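The KPP-1 definition can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the paper: the toy edge list, the function names `connected_pairs` and `best_kp_set`, the undirected treatment of links, and the use of "number of node pairs that can still reach each other" as the disruption score are all hypothetical choices made for illustration. It is not the procedure used by the Key Player program [8].

```python
from collections import deque
from itertools import combinations


def connected_pairs(adj, removed=frozenset()):
    """Count unordered node pairs that can still reach each other
    once the nodes in `removed` are deleted (undirected view)."""
    seen = set(removed)          # removed nodes are never visited
    pairs = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        pairs += size * (size - 1) // 2
    return pairs


def best_kp_set(adj, k):
    """Exhaustively search for the k nodes whose removal leaves the
    fewest connected pairs (feasible only for small graphs and small k)."""
    return min(combinations(sorted(adj), k),
               key=lambda s: connected_pairs(adj, frozenset(s)))


# Hypothetical toy network: two triangles joined through node 4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (6, 7), (5, 7)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

kp_set = best_kp_set(adj, 1)
print(kp_set, connected_pairs(adj), connected_pairs(adj, frozenset(kp_set)))
```

In this toy graph, removing the bridging node 4 leaves the fewest connected pairs, which is the kind of node KPP-1 is meant to find. The exhaustive search grows combinatorially with k, which is one reason heuristic or centrality-based selection is attractive on larger networks.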

Degree centrality, betweenness centrality, closeness centrality and eigenvector centrality are the four measures of an individual's centrality in a social network that are widely used in network analysis. For a graph G = (V, E) with n vertices, the betweenness centrality $C_B(v)$ of vertex v is defined by:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (1)$$

where $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of shortest paths from s to t that pass through vertex v. Freeman defined the betweenness measure as the sum, over pairs of nodes, of the proportion of shortest paths from one node to another that pass through a given node. With regard to KPP-1, a node with high betweenness centrality is responsible for connecting many pairs of nodes via the best path, and deleting that node should cause many pairs of nodes to become fully disconnected or at least more distantly connected.

Degree centrality and closeness centrality are useful measures for KPP-2. The degree centrality of a node is the number of direct connections that the node has, i.e., the number of links incident upon it. For a graph G = (V, E) with n vertices, the degree centrality $C_D(v)$ of vertex v is:

$$C_D(v) = \frac{\deg(v)}{n - 1} \qquad (2)$$

where deg(v) is the degree of vertex v.

Closeness is one of the basic concepts in a topological space. We can say that two sets are close if they are arbitrarily near to each other. In a social network, closeness centrality measures how close a vertex is to all other vertices in the graph. Vertices that tend to have short geodesic distances to other vertices within the graph have higher closeness. This can be measured as:

$$C_C(v) = \frac{n - 1}{\sum_{t \in V \setminus \{v\}} d_G(v, t)} \qquad (3)$$

where $d_G(v, t)$ is the geodesic distance from v to t and n ≥ 2 is the size of the network's 'connectivity component' V reachable from v. Closeness can be regarded as a measure of how long it will take information to spread from a given vertex to the other reachable vertices in the network. The larger the closeness centrality of a vertex, the shorter the average distance from the vertex to any other vertex, and thus the better positioned the vertex is to spread information to other vertices. The closeness centrality of all vertices can be calculated by solving the all-pairs shortest-paths problem.

III. RELATED LITERATURE

A social network SN can be modelled as a graph G = (V, E), where V is a set of objects, called nodes or vertices, and E is a set of links, called edges, that connect two elements of V. Cutpoints and key players are nodes whose removal would fragment the network into disconnected subgroups. We found that literature in this field is scarce; we summarize some significant works here.

Ref. [2] is a notable work that introduces the problem and its classification, and explains how centrality measures can be applied to social networks. Ref. [4], by the same author, is its counterpart and explains how centrality measures can be applied to identify traffic flows over a network structure. Ref. [3] provides a geometric characterization of the key player identified with an intercentrality measure, which takes into account both a player's centrality and that player's contribution to the centrality of the others. The authors take games as the field of their study and report results from that analysis. Ref. [6] provides insight into the problem and can serve as introductory material on the key player problem: its meaning, its classification and its application areas. It also discusses how centrality measures are useful in finding key players and explains the different approaches applied to find them. Ref. [1] applies an information theory approach, proposing a new method for finding a set of key players using entropy measures. Ref. [5] combines existing methods for calculating exact and approximate values of closeness centrality and presents new algorithms to rank the top-k vertices with the highest closeness centrality.

IV. MATERIAL, METHOD AND RESULTS

We used the dataset collected by Gopal (2007) [7]. This dataset is a collection of blog posts by AIDS patients. The blog responses were collected at random by the author. The blogs are represented by numbers to protect the privacy of the patients. The dataset contains an edge list for a directed graph representing the pattern of citation among 146 unique blogs related to AIDS patients and their support networks. A directed edge from one blog to another indicates that the former had a link to the latter on its web page (more specifically, the former refers to the latter in its so-called 'blogroll'). Blog posts and their responses are represented by a graph structure: when there is a response j to a blog post i, an edge runs from i to j. A vertex represents a blog post and an edge represents the response between the two vertices. In this way the entire graph structure is created. We then applied the Key Player program [8] to identify the key players in this social network graph.

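As a rough illustration of the centrality approach (measure the centrality of each node, then keep the k most central), the following Python sketch computes degree, closeness and betweenness centrality on a small hypothetical graph and selects a kp-set from the scores. Several things here are assumptions rather than the paper's procedure: the toy edge list, all function names, the undirected treatment of edges (the blog dataset is directed), counting each unordered pair once in betweenness, and taking closeness as (n - 1) divided by the sum of geodesic distances over the reachable component, which matches the statement that larger closeness means shorter average distance.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs(adj, s):
    """Unweighted shortest paths from s: distances, path counts,
    predecessors on shortest paths, and the visiting order."""
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    return dist, sigma, preds, order


def degree_centrality(adj):
    """deg(v) / (n - 1) for every vertex."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}


def closeness_centrality(adj):
    """(reachable vertices - 1) / (sum of geodesic distances)."""
    result = {}
    for v in adj:
        dist = bfs(adj, v)[0]
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Sum of sigma_st(v) / sigma_st via Brandes-style accumulation."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve.
    return {v: c / 2 for v, c in cb.items()}


def top_k(scores, k):
    """The k highest-scoring nodes, ties broken by node label."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical toy graph, not the blog dataset.
adj = build_graph([(1, 2), (2, 3), (3, 4), (3, 5)])
dc = degree_centrality(adj)
cc = closeness_centrality(adj)
bc = betweenness_centrality(adj)
print(top_k(dc, 2), top_k(cc, 2), top_k(bc, 2))
```

On this toy graph all three measures rank node 3 first and node 2 second, but on real networks the rankings can differ, which is why the choice of measure depends on whether KPP-1 or KPP-2 is being addressed.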

Fig. 1 shows the key players with respect to KPP-1. The key players are: 7, 12, 37, 73, 118, 142 and 143. These are the players who received more responses than the others.

Fig. 1: Blocks and Cutpoints (blue in colour) under KPP-1

Fig. 2 represents the set of key players with respect to KPP-2. We then turned our attention to identifying the sets of key players in accordance with KPP-2, that is, the identification of key players for the purpose of optimally diffusing something through the network by using them as seeds. Among the 146 nodes, only five are identified as key players in this context: 118, 126, 139, 143 and 145. These are the nodes that are maximally connected to all other nodes, and they are identified by cutpoints and cutsets.

Fig. 2: A Set of Key Players (yellow in colour) under KPP-2

V. CONCLUSION AND DISCUSSION

A key player is someone who is always in the spotlight and is always involved in community activities. In this work, we used the centrality approach to identify the sets of key players in a weblog, taking blogs posted by AIDS patients as the dataset. There are seven key players under KPP-1 and five key players under KPP-2. For visualizing KPP-1 we used NetDraw [10], and for visualizing KPP-2 we used Pajek. We used baseline settings, such as the fragmentation criteria, group size and number of iterations, to identify the key players. The outcome of this work is also presented as figures and tables.

VI. REFERENCES

[1] Daniel Ortiz-Arroyo, D. M. Akbar Hussain, "An information theory approach to identify sets of key players", LNCS 5376, pp. 15-26, 2008.
[2] Stephen P. Borgatti, "Identifying sets of key players in a social network", Computational and Mathematical Organization Theory, Springer US, vol. 12, no. 1, pp. 21-34, 2006.
[3] Coralio Ballester, Antoni Calvo-Armengol, Yves Zenou, "Who's who in networks. Wanted: the key player", Econometrica, vol. 74, no. 5, pp. 1403-1417, 2006.
[4] Stephen P. Borgatti, "Centrality and network flow", Social Networks, vol. 27, pp. 55-71, 2005.
[5] Kazuya Okamoto, Wei Chen, Xiang-Yang Li, "Ranking of closeness centrality for large-scale social networks", Springer Lecture Notes in Computer Science, pp. 186-195, 2008.
[6] Stephen P. Borgatti, "The Key Player Problem", available at: www.steveborgatti.com/.../borgatti%20%20NAS%20%20The%20Key%20Player%20Problem%203.doc
[7] S. Gopal, "The evolving social geography of blogs", Societies and Cities in the Age of Instant Access, H. Miller, Ed. Berlin: Springer, 2007, pp. 275-294.
[8] Key Player Program, available at: www.analytictech.com/keyplayer/keyplayer.htm
[9] http://technorati.com/blogging/state-of-theblogosphere/
[10] http://www.analytictech.com/Netdraw/netdraw.htm
