Identification of opinion leaders in social network - IEEE Xplore

Identification of Opinion Leaders in Social Network

Nida Saddaf Khan, Maira Ata, Quratulain Rajput
Faculty of Computer Science, Institute of Business Administration (IBA), Garden/Kiyani Shaheed Road, 74400 Karachi, Pakistan
{nskhan,maria.ata, qrajput}@iba.edu.pk

Abstract- Identification of the opinion leaders on the world's largest microblogging site, Twitter, is a crucial problem of social network analysis. Twitter has quick information flow and a high impact on forming the opinion of the mass public. This paper presents a study based on various centrality measure approaches for finding key players for a specific political trend on Twitter. A novel weighted approach is proposed for finding opinion leaders based on the centrality measures. An experiment has been conducted on Twitter data for the sit-in procession of Pakistani politician Imran Khan in 2014. It has been found that Twitter opinion leadership makes a significant contribution to individuals’ involvement in political reforms.

Keywords: opinion leaders, analysis, social network, twitter

I. INTRODUCTION

Social networking sites have become an integral part of today’s society and have changed the way information flows in communities. Over the last decade, social media has played a vital role in large-scale political participation and has had a great influence on the process of political reform [1] [2] [3]. The concept of the opinion leader first emerged in 1940 and was further developed through different studies and theories. Today, social media have a significant role not only in the economy but also in society. Among the multitude of social media, Twitter has emerged as the world's largest microblogging service and third largest social networking site, and it has particular potential to shape political discourse. In recent years, people have used Twitter as a platform for expressing, promoting, organizing and responding to opinions on political issues [1]. From the US and Iranian presidential elections [2] to the Egyptian and Turkish protests [3], people have used Twitter to demonstrate their views and political involvement. Twitter allows users to tweet their views, and a recent study (footnote 1) shows that individuals who are active in tweeting and upvoting stories can be thought of as opinion leaders. Thus, opinion leaders tend to play a role as a new type of agenda generator or news disseminator, irrespective of their social, economic, or political standing. That is, any user has the opportunity to become an opinion leader if the user can produce noticeable information that attracts public attention.

Footnote 1: http://www.hcpdols.com/digital-opinion-leaders-havetwice-the-reach-online-shows-asco-study/

Identification of such influential nodes in a social network can be extremely useful for viral marketing, for propagating one’s point of view, and for setting which topics dominate the public agenda. Several studies have attempted to identify influential users on the basis of the most popular users [4] or the most discussed users [5]. The users who are instigators of the longest cascades of information [6] are also considered opinion leaders in the literature. These correspond to what [7] calls “influentials” in the diffusion of innovations process, and may suffice for market researchers looking to send a message to the widest possible public [8]. However, these techniques do little to identify influence at a more local or personal level. Huanhuan Liu used web ranking algorithms such as PageRank and HITS [9] for finding influential nodes. The most influential nodes can be thought of as opinion leaders, who play a major role in developing a certain perspective through tweets. Katz [10] describes the “opinion leader” as a person able to change the opinions, attitudes, and/or behaviors of their “everyday associates” [11]. Opinion leaders use social pressure and social support to exert personal influence. These locally influential individuals are important because they help guide political discussion, which is integral to the strength of democracy [11]. Pinar Bilgic [12] worked on a project for “Detecting Opinion Leaders in Twitter”. She describes an opinion leader as follows: “An opinion leader is an influencer in his own social network that increases the traffic and usage of the entity he shares on the web”. In this definition, an entity could be a product, a service, or any topic that people are sharing on the network.
When such a leader shares an entity, the frequency of discussion increases, or people use his opinion to express their own thoughts about a certain topic, which in turn increases the number of followers of his opinion. Such scenarios are strong indicators of an active key player who has taken the role of leader in terms of opinion. We have used this definition of opinion leaders and have adopted the hypothesis suggested by Bilgic as the foundation of our model.

Twitter has also shown its political potential in Pakistan during the Azadi Tehreek of Imran Khan. Pakistan Tehreek-e-Insaf (PTI) chairman Imran Khan announced a sit-in protest with his supporters in 2014 against rigging in the elections. The protest started in August throughout the country. PTI has a separate wing for youth known as the Insaf Youth Wing (IYW) (footnote 2), and it considers the youth of Pakistan the source of change to raise the standard of politics and to eliminate corruption from the country. This led the youth of Pakistan to participate actively in the party’s activities and give their input in politics. As the young generation uses social networking most frequently, the role of social media in the Azadi March was phenomenal. What is unique about this particular case is how Twitter was used to spread information about the demonstrations from the ground.

In this paper, we use Pinar Bilgic's [12] definition of opinion leader to determine the opinion leaders in a network. The presented approach is a weighted approach based on the centrality measures. The rest of the paper is organized as follows: Section II provides a literature review on opinion leadership. The proposed approach is discussed in Section III, followed by experimental results in Section IV. Lastly, Section V discusses the findings and observations from the presented work.

II. RELATED WORK

Identification of opinion leaders and followers in a network is beneficial in several ways, such as targeting opinion leaders for marketing, understanding the structure of information flow, community health campaigns, administrative science, and, most interestingly, politics. Our study is motivated by this wide variety of applications. Because opinion leaders are significant, many researchers have carried out research on identifying them.

Huanhuan Liu et al. [13] proposed a new method, Synthesized Centrality, to find opinion leaders in a network of users. Synthesized Centrality is a measure for finding influential nodes in a network. It is calculated by multiplying the normalized degree by the betweenness and then dividing by the normalized closeness to obtain a comprehensive centrality measure. It combines the features captured by all three centrality measures and proved to be significant in finding opinion leaders. For the network structure they took microblogs from Sina, one of the biggest microblogging websites in China. The network comprised more than four thousand users of Shanghai University, which formed a local social area network, SHULAN. They applied and compared PageRank, HITS, and Synthesized Centrality over the learned network SHULAN, where PageRank and HITS are link analysis algorithms used in that work to find the relative importance of a node. Their results show that Synthesized Centrality is better at identifying opinion leaders, i.e., the leaders identified by Synthesized Centrality are likely to be opinion leaders under PageRank and HITS as well. Furthermore, Synthesized Centrality has a higher accuracy for identifying opinion leaders, and the accuracy increases as the number of opinion leaders increases.

Footnote 2: www.insaf.pk/

The research article [14] presented a new approach based on the analysis of online communities for finding opinion leaders and opinion trends. The approach is illustrated by looking at sample forums in which opinions on the Apple iPhone are exchanged. The analysis comprises four steps. In the first step, the users’ opinions on the product are extracted by text mining and then classified by polarity (positive, negative or neutral) using a data mining classification method, the Support Vector Machine (SVM). After classification, each forum user is assigned an opinion. If a user expresses diverging opinions, he is linked to the class reflecting his strongest opinion. In the second step, the communication relationships among users are identified by text-based relationship mining methods. The extracted users and their opinions, together with the relationships found among them, form a social network represented as a graph. The nodes represent the users of the forum and the edges represent their communication relationships. Nodes and edges have attributes which help to characterize them: node attributes describe the users’ opinions, e.g., about a specific product or topic, while edge attributes describe the communication frequency. The resulting graph is analyzed by determining key figures for the position of single users and for the overall structure of the network. The measures of centrality (degree, betweenness, and closeness) were used to identify the opinion leaders. Degree centrality was used to find the local opinion leaders, while closeness and betweenness were used to identify the opinion leaders of the overall network. This approach is a mixture of social network analysis and text mining for finding the polarity of the posts. Thus existing positive opinions can be reinforced via marketing measures. Negative opinions can be counteracted in time, e.g.,
by product improvements or appropriate marketing measures. The approach can be considered a basic concept for analyzing opinions and finding opinion leaders in social networks.

Stephen P. Borgatti put major emphasis on the key player problem (KPP), specifically KPP-Pos and KPP-Neg. The KPP-Neg problem deals with finding a set of k nodes (called a kp-set of order k) in a given social network whose removal would maximally disrupt communication among the remaining nodes. Examples include, in public health, finding whom to immunize or quarantine in order to slow the spread of an infectious disease, or, in criminal justice, whom to arrest or discredit to disrupt a criminal network. KPP-Pos, in contrast, is concerned with finding a kp-set of order k that is maximally connected to all other nodes in a given social network, for example selecting peer health advocates for diffusing safe practices (e.g., bleaching) and materials, or selecting employees for intervention prior to a change initiative. He studied several approaches along with their shortfalls. In the end, he concluded that the reciprocal of weighted distance gives the optimal solution to both of the problems under study.

III. PROPOSED APPROACH

In this research study, the opinion leaders are identified on the basis of their network structure on social media, i.e., how they are connected with other users. The proposed research is carried out in the following steps: Data Acquisition, Data Set Creation, Network Creation, Calculation of Centrality Measures, and their Comparative Analysis.

A. Data Acquisition

To download the network data, NodeXL (footnote 3) is used. NodeXL is a free and open network analysis and visualization software package. It is a popular package similar to other network visualization tools such as Pajek (footnote 4), UCINET (footnote 5) and Gephi (footnote 6). We used it to download Tweet data from Twitter about the Azadi Tehreek of the Pakistani politician Imran Khan. The Azadi Tehreek is based on the Azadi March, the largest public protest march ever held in Pakistan, which ran from 14 August 2014 to 17 December 2014. It was organized by the Pakistan Tehreek-e-Insaf party against Prime Minister Nawaz Sharif over claims of governmental manipulation in the 2013 general election.

B. Data Set

The hashtag #dharna (sit-in protest) was used for the acquisition of Tweets. The downloaded data contained various fields about the Tweets, including sender account ID, receiver account ID, relationship (mention, reply, or tweet), date, and several visual and label properties. We preprocessed the data so that it contains only those attributes which are necessary for network creation and analysis. After preprocessing, the data consists of the following columns: sender account ID, receiver account ID, and relationship, where all three types of relationship are retained. A few sample data points are shown in Table 1.

TABLE 1.

SAMPLE DATA POINTS

| S No. | Sender | Receiver | Relationship |
|---|---|---|---|
| 1 | User_1 | User_10 | Mentions |
| 2 | User_2 | User_17 | Replies to |
| 3 | User_40 | User_101 | Tweet |

C. Network Creation

After cleaning the data, we moved to the next step of network creation. Sender and receiver account IDs serve as the vertices of the network, while the Relationship column is used to create the edges. The relationship identifies the type of Tweet sent by the sender to the receiver. It could be the mention of an account holder in a Tweet, a reply to a Tweet, or a retweet by the same sender. In our data (Table 1), Sender is the sender of the tweet, Receiver is the receiver of the tweet, and Relationship is one of Mention/Reply/Tweet. The sample data points given in Table 1 are drawn in Fig. 1 to show the network structure. The network created from the dataset is a directed graph having more than four hundred vertices and almost seven hundred edges.

Footnote 3: http://www.nodexl.codeplex.com/
Footnote 4: http://www.vlado.fmf.uni-lj.si/pub/networks/pajek/
Footnote 5: https://sites.google.com/site/ucinetsoftware/home
Footnote 6: http://gephi.github.io/

Fig. 1. A sample network structure (edges: User_1 -> User_10, labeled Mentions; User_2 -> User_17, labeled Replies to; User_40 -> User_101, labeled Tweet)
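As a minimal sketch of the network-creation step, the three sample rows of Table 1 can be loaded into a directed graph in plain Python. The study itself built its network with NodeXL; the function name `build_graph` and the data layout below are illustrative assumptions, not part of the original workflow.

```python
from collections import defaultdict

# Sample rows of Table 1 as (sender, receiver, relationship) tuples.
rows = [
    ("User_1", "User_10", "Mentions"),
    ("User_2", "User_17", "Replies to"),
    ("User_40", "User_101", "Tweet"),
]


def build_graph(rows):
    """Hypothetical helper: build a directed graph from edge rows.

    Returns the vertex set, an out-adjacency map whose entries keep the
    relationship label of each edge, and the in-degree of every vertex.
    """
    vertices = set()
    adjacency = defaultdict(list)
    incoming = defaultdict(int)
    for sender, receiver, relationship in rows:
        vertices.update((sender, receiver))
        adjacency[sender].append((receiver, relationship))
        incoming[receiver] += 1  # one more mention/reply/tweet received
    in_degree = {v: incoming[v] for v in vertices}
    return vertices, dict(adjacency), in_degree


vertices, adjacency, in_degree = build_graph(rows)
print(len(vertices), "vertices")
print(sorted(in_degree.items()))
```

Each row becomes one directed edge from sender to receiver, so a user's in-degree counts how often that user was mentioned, replied to or tweeted at.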

D. Calculation of Centrality Measures

Centrality is a structural characteristic of individuals in the network: a centrality score tells us something about how that individual fits within the network overall. The most commonly used centrality measures are Degree, Betweenness and Closeness. These measures are used to find the important and central nodes in the network according to the particular application. Degree centrality is used when we need to find the central nodes in a network in terms of direct ties. Betweenness centrality is used when individuals in a brokering position need to be identified. Closeness centrality is used when we need to find the nodes which are close to everyone else in a network. Each of these measures identifies centrally located nodes from a different perspective, but no single measure gives a collective centrality score that combines the effect of all three. We took this idea and propose an approach in which the three measures (degree, betweenness, and closeness) are combined according to their respective weights, and a collective score is computed for each node to show its location in the network. Individuals with high centrality scores are often more likely to be leaders, key conduits of information, and early adopters of anything that spreads in a network. In this phase, the importance of each node is calculated by the Weighted-Degree-Betweenness-Closeness-Centrality (WDBCC) measure. The formula for calculating WDBCC is given in Table 2.

TABLE 2.

EQUATION FOR CALCULATION OF WDBCC

$$\mathrm{WDBCC}(n) = \alpha D + \beta B + \gamma C$$

where
D = in-degree of node n,
B = betweenness of node n,
C = closeness of node n,
α, β, γ = the respective weights of each measure, with α + β + γ = 1.
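The weighted combination can be illustrated with a short Python sketch. It assumes that WDBCC is the weighted sum of the three measures and that each measure is min-max normalized to [0, 1] before weighting; the paper mentions normalization without naming a method, so that choice, the helper names and the raw scores below are all hypothetical. The default weights are the ones reported in the experiments (0.5, 0.4, 0.1).

```python
def min_max(scores):
    """Rescale a {node: value} map to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node: 0.0 for node in scores}
    return {node: (value - lo) / (hi - lo) for node, value in scores.items()}


def wdbcc(in_degree, betweenness, closeness, alpha=0.5, beta=0.4, gamma=0.1):
    """Weighted sum of normalized in-degree, betweenness and closeness."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    d, b, c = min_max(in_degree), min_max(betweenness), min_max(closeness)
    return {n: alpha * d[n] + beta * b[n] + gamma * c[n] for n in in_degree}


# Hypothetical raw scores for three users; u3 is a poorly connected user
# that nevertheless has the largest closeness value.
raw_in_degree = {"u1": 10, "u2": 4, "u3": 0}
raw_betweenness = {"u1": 120.0, "u2": 30.0, "u3": 0.0}
raw_closeness = {"u1": 0.2, "u2": 0.1, "u3": 1.0}

scores = wdbcc(raw_in_degree, raw_betweenness, raw_closeness)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these toy numbers, the user with the largest closeness but no incoming ties ends up last, which is the effect a low closeness weight is meant to produce.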

In the calculation of WDBCC we used in-degree centrality because the graph is directed, and the in-degree of a node is more important than its out-degree, as it shows how many times a node is mentioned, replied to or tweeted at. We used a weight function because it is a mathematical device used when performing a sum, integral, or average to give some elements more "weight" or influence on the result than other elements in the same set. Here, the respective effect of each measure can be adjusted through the weights for different applications. In the next section, we analyze and discuss the results obtained by each of the methods (degree, betweenness, and closeness) and compare them with the results obtained by WDBCC.

IV. EXPERIMENTS & RESULTS

The network of Tweets contains 467 users and 1060 edges. After data cleaning and normalizing the variables of the dataset, the network was created. The network of the selected data set is shown in Fig. 2(a). As the dense connection areas in the network structure show, there are many nodes playing an important role in terms of in-degree and betweenness, highlighted in red in Fig. 2(b) and Fig. 2(c) respectively.

Fig. 2 (a). Network of Twitter Users

Fig. 2 (c). The user having maximum betweenness

Fig. 2(b) highlights the dense area in red to emphasize the user who has been tweeted at, mentioned or replied to the maximum number of times by other users in the network. Fig. 2(c) highlights in red the connections of the user who has the highest betweenness and connects several otherwise disconnected components of the network.

TABLE 3.

TOP SCORED USERS IN EACH MEASURE

| Rank | In-Degree | Betweenness | Closeness | WDBCC |
|---|---|---|---|---|
| 1 | User_174 | User_174 | User_22 | User_174 |
| 2 | User_319 | User_319 | User_136 | User_319 |
| 3 | User_310 | User_56 | User_188 | User_43 |
| 4 | User_43 | User_43 | User_294 | User_55 |
| 5 | User_259 | User_161 | User_326 | User_56 |
| 6 | User_55 | User_55 | User_373 | User_161 |
| 7 | User_169 | User_459 | User_386 | User_22 |
| 8 | User_399 | User_399 | User_391 | User_373 |
| 9 | User_56 | User_288 | User_406 | User_391 |
| 10 | User_123 | User_100 | User_417 | User_462 |
| 11 | User_161 | User_51 | User_424 | User_399 |
| 12 | User_30 | User_169 | User_462 | User_188 |
| 13 | User_132 | User_281 | User_93 | User_294 |
| 14 | User_151 | User_132 | User_168 | User_386 |
| 15 | User_192 | User_123 | User_214 | User_169 |

Fig. 2 (b). The user having maximum in-degree

The centrality measures of in-degree, betweenness and closeness are calculated for each node of the network. The top fifteen key players of the network identified by each measure, together with those identified by our weighted measure WDBCC, are shown in Table 3. The WDBCC score is calculated by adding the weighted centrality measures, where the weights α = 0.5, β = 0.4 and γ = 0.1 are assigned to the in-degree, betweenness and closeness measures respectively. The equation for the WDBCC calculation is given in Table 2. The weightage of in-degree centrality is kept high (i.e., 0.5) because, in this particular network, users who are frequently referred to (mention/reply/tweet) by other users have some importance in the network compared to other nodes, and in-degree centrality captures this when identifying key players. Betweenness is assigned a weightage of 0.4 to give second importance to users who play an influential role. The lowest weightage, 0.1, is assigned to closeness because closeness by itself is not very useful: users having no ties with other users can get the highest closeness score of 1. It can nevertheless give a good clue about the key players of the network when combined with the in-degree and betweenness centrality measures.

The WDBCC score identifies the key players in a network by combining the three centrality scores with their respective weights. These leaders not only have many connections but are also important because users in their networks use them to connect with other networks. The pie chart of the WDBCC score in Fig. 3 shows the distribution of the score among the various measures and their combinations. Fifty-three percent of the users identified as opinion leaders by WDBCC are also identified by both the in-degree and betweenness centrality measures, while forty-seven percent are marked by closeness centrality. The remaining combinations of approaches have zero proportion. An interesting aspect of the WDBCC score is that its calculation can be tuned according to the nature of the domain; this is why the closeness measure has been given a low weight here. If the data belong to a problem where a particular measure is not significant, its effect can be minimized by lowering its weight. Hence α, β, and γ are the controlling parameters of the WDBCC score, through which the effect of each measure can be controlled.
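The reported 53/47 split can be checked directly against the rankings in Table 3. The following sketch transcribes the four top-fifteen lists (numeric user IDs only) and counts how many WDBCC leaders also appear in both the in-degree and betweenness lists, and how many appear in the closeness list; the variable names are illustrative.

```python
# Top-15 user IDs per measure, transcribed from Table 3 in rank order.
in_degree = [174, 319, 310, 43, 259, 55, 169, 399, 56, 123, 161, 30, 132, 151, 192]
betweenness = [174, 319, 56, 43, 161, 55, 459, 399, 288, 100, 51, 169, 281, 132, 123]
closeness = [22, 136, 188, 294, 326, 373, 386, 391, 406, 417, 424, 462, 93, 168, 214]
wdbcc = [174, 319, 43, 55, 56, 161, 22, 373, 391, 462, 399, 188, 294, 386, 169]

# WDBCC leaders that are in BOTH the in-degree and betweenness top 15.
degree_and_betweenness = set(in_degree) & set(betweenness)
shared_with_both = [u for u in wdbcc if u in degree_and_betweenness]

# WDBCC leaders that are in the closeness top 15.
shared_with_closeness = [u for u in wdbcc if u in set(closeness)]

pct_both = round(100 * len(shared_with_both) / len(wdbcc))
pct_closeness = round(100 * len(shared_with_closeness) / len(wdbcc))
print(len(shared_with_both), pct_both)            # 8 users, 53 percent
print(len(shared_with_closeness), pct_closeness)  # 7 users, 47 percent
```

The two groups are disjoint and together cover all fifteen WDBCC leaders, which matches the statement that the remaining combinations have zero proportion.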

Fig. 3. Top fifteen key players: composition by WDBCC

V. DISCUSSION

To understand the findings and observations from the presented work, we analyzed the tweets of a few top-scoring users to understand the reason behind their high scores. The top two users are discussed here. Consider Table 3, where user_174 is at the top of the list with the maximum WDBCC score. The analysis of his activities on this network reveals that he is referred to the maximum number of times by other users in their tweets. Users like to retweet his tweets, and they also mention him in their tweets asking for suggestions or approvals. Beyond tweets, he also holds a position in the network which enables him to connect people of different networks which are otherwise disconnected. This position makes his status very important: people belonging to different networks like to communicate with him, giving him credit for the flow of information among different sub-networks. It has been found that this user has a positive opinion about PTI (the political party concerned) and spreads messages of the same polarity to his connected users. In summary, user_174 has a good opinion of the party, has an influential position in the network structure, and is the user others like to refer to for information about the party. User_174 thus fulfills many criteria to be called an opinion leader. PTI can use him to spread a better image of the party's intentions and to increase its vote bank for future elections.

The second highest rank is assigned to user_319, and analysis reveals that he is working as an official representative of PTI on the Twitter network. This person is also responsible for the flow of positive sentiment, having a high degree of tweets/replies/mentions. People contact him with queries about party decisions, and they retweet his messages to other users. The information provided by him is considered authentic and reliable, which is also a characteristic of leaders. The WDBCC score has played a significant role in identifying the possible leaders in a network, as can be seen from the cases discussed above.

VI. CONCLUSION

Opinion leaders are people who bring about change and influence the masses with their revolutionary ideas. Today, social media has become the primary way for the public to maintain closer ties and stay informed about the world. Social media also provides a platform where people can interact with others, and it helps in identifying the leaders in the network. In this paper, an approach has been proposed which calculates a score called WDBCC to rank the users in a network on the basis of network structure and the direction of the flow of information. The experiment shows that the approach is useful for identifying opinion leaders when nodes in the network have many connections and when they are the only connecting point for otherwise disconnected components. In future, we plan to calculate the WDBCC score on datasets from different domains and to further extend this study to identify more characteristics of opinion leaders.

REFERENCES

[1] Ashleigh Morpeau, "Twitter and political and civic engagement: Is there a relationship?", 2011.
[2] A. Burns & B. Eltham, "Twitter free Iran: An evaluation of Twitter’s role in Iran’s 2009 election crisis", Record of the Communications Policy & Research Forum, 2009.
[3] B. Epstein & R. Kraft, "Why Less is Doing More: The Political Uses, Influence, and Potential of Twitter", Conference Papers, Southern Political Science Association, 2010.
[4] M. Cha & K. P. Gummadi, "Measuring User Influence in Twitter: The Million Follower Fallacy", pp. 10–17, 2010.
[5] E. Bakshy, J. M. Hofman, D. J. Watts, W. A. Mason, "Everyone’s an Influencer: Quantifying Influence on Twitter", Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM), pp. 65–74, 2011.
[6] K. Lerman & R. Ghosh, "Information contagion: An empirical study of the spread of news on Digg and Twitter social networks", Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM), 2010.
[7] E. M. Rogers, "Diffusion of Innovations", 4th Edition, Free Press, 2010.
[8] D. Watts & P. Dodds, "Influentials, Networks, and Public Opinion Formation", Journal of Consumer Research, 2010. E. Katz & P. F. Lazarsfeld, "Personal Influence: The Part Played by People in the Flow of Mass Communications", Transaction Publishers, 2006.
[9] E. Katz, "The Two-Step Flow of Communication: An Up-To-Date Report on an Hypothesis", Public Opinion Quarterly, 1957.
[10] J. Dillard, C. Segrin & J. Harden, "Primary and secondary goals in the production of interpersonal influence messages", Communication Monographs, 1989.
[11] D. C. Mutz, "Hearing the other side: Deliberative versus participatory democracy", Cambridge University Press, 2006.
[12] Pinar Bilgic, “Detecting Opinion Leaders in Twitter”, in Digital Marketing, Psychology, Social Media, June 2013.
[13] Huanhuan Liu, Xiaoqing Yu, Jing Lu, “Identifying Top-N Opinion Leaders on Local Social Network,” ICSSC, 2013.
[14] Freimut Bodendorf, Carolin Kaiser, “Detecting Opinion Leaders and Trends in Online Social Networks,” SWSM’09, Proceedings of the Second ACM Workshop on Social Web Search and Mining, pp. 65-68, ACM, New York, NY, USA, 2009.
[15] Stephen P. Borgatti, “Identify sets of Key players in a social network,” Computational & Mathematical Organization Theory, Vol. 12, Issue 1, pp. 21-34, April 2006.
[16] M. Zubair Shafiq, Muhammad U. Ilyas, Alex X. Liu, Hayder Radha, “Identifying Leaders and Followers in Online Social Networks,” IEEE Journal on Selected Areas in Communications (JSAC), 2013 special issue on Emerging Technologies in Communications, Vol. 31, No. 8, pp. 1-13, Aug 2013.
[17] Mohamma Abdel-Ghany, “Identifying Opinion Leaders using Social Network Analysis, A study in an Egyptian Village,” Russian Journal of Agriculture and Socio-Economic Sciences, Vol. 4, pp. 12-19, Africa, 2012.
[18] Chang Sup Park, “Does Twitter motivate involvement in politics? Tweeting, opinion leadership, and political engagement,” Computers in Human Behavior, pp. 1641–1648, 2013.
[19] Lu Zhong, Chao Gao, Zili Zhang, Ning Shi, Jiajin Huang, “Identifying Influential Nodes in Complex Networks: A Multiple Attributes Fusion Method,” AMT, pp. 11-22, 2014.
[20] Jingyu Zhou, Yunlong Zhang, Jia Cheng, “Preference-based mining of top-K influential nodes in social networks,” Future Generation Comp. Syst. (FGCS), pp. 31-47, 2014.