Comparing Group Characteristics to Explain Community Structures in Social Media Networks


Ruei-Yuan Chang, Sheng-Lung Peng, Guanling Lee, Chia-Jung Chang
Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan
[email protected], {slpeng, guanling}@mail.ndhu.edu.tw, [email protected]

Abstract

Recently, the identification of community structures in social media networks (SMNs) has become a hot topic. The characteristics of SMNs have typically been analyzed by clustering SMN users based on their relationships. However, popular SMNs such as LiveJournal and Flickr allow users to join or create communities according to their personal interests. In contrast to previous studies, which have typically employed clustering strategies to categorize SMN users, this paper examines the communities that users have actually joined, in which a user can belong to multiple communities. The community structures are analyzed using data collected from four popular SMNs, namely LiveJournal, Flickr, Orkut, and YouTube. Several measurements are proposed to model the characteristics and structures of these communities, and the experimental results show that the interconnection among communities is high, especially among Flickr and YouTube users. These findings differ considerably from those of previous studies that applied a cluster analysis methodology.

Keywords: Social media network, Cluster analysis, Measurement.

1 Introduction

With the growth of social media networks (SMNs) such as Facebook and Friendster, many users rely on SMNs to communicate with friends and share their daily life experiences. SMN platforms are also becoming huge dissemination and marketing platforms that allow information and ideas to reach a large population in a short period. As reported in [19], Taiwan had approximately 16.7 million internet users in 2011, and the most widely used social networking site in Taiwan that year was Facebook, used by about 92.3% of internet users. Elucidating the behavior of SMN users has therefore become a critical research topic. In [3][5][8][15], the problems of measuring the effect of information dissemination and of identifying a small subset of nodes within an SMN that maximizes the spread of influence have been discussed. Based on a social graph and a large set of user actions, [6-7] and [14] determined the propagation of influence when SMN users performed specific actions. Furthermore, the roles of leaders and followers in an SMN were defined by analyzing the number of users affected by a given user within a given period. In addition, [1-2][4][10][16-17] discussed the problems associated with partitioning SMN users into clusters. In these works, SMN users were considered to belong to a cluster if they exhibited strong connections with other users in that cluster and weak connections with users outside it. In other words, previous research categorized users into clusters according to whether their degrees of intra- and interconnection were high or low. However, in currently popular SMNs, users can join or create their own communities. Therefore, the purpose of this study is to elucidate the structures and characteristics of the actual communities that users have joined, and to represent the relationships among SMN users by using these characteristics. The datasets we employed to analyze the community structures were collected from four popular SMNs: (1) LiveJournal; (2) Flickr; (3) Orkut; and (4) YouTube. Moreover, we propose several measurements to model the community characteristics and structures.

The remainder of this paper is organized as follows. The problem definition and the three proposed measurements are presented in Section 2. The experiments conducted on the four datasets and their results are discussed in Section 3. Finally, Section 4 concludes the paper.

*Corresponding author: Sheng-Lung Peng; E-mail: [email protected]
DOI: 10.6138/JIT.2015.16.6.20140306

Journal of Internet Technology Volume 16 (2015) No.6

2 Problem Definition and Proposed Measurements

As discussed in [12-13], an SMN is defined as an interaction graph G = (V, E), where the vertex set V represents the SMN users and the edge set E contains an edge $e_{ij}$ if nodes i and j are friends in G. In previous studies [11], the users (i.e., the nodes in G) were partitioned into clusters according to their degree of interconnectedness. In other words, nodes were grouped into the same cluster if their intra-cluster similarity was high, where similarity has typically been measured as a function of the number of connections among the nodes. Figure 1 shows an example of this concept, in which the nodes are partitioned into three clusters.

Figure 1 Cluster Example

However, as discussed above, users of currently popular SMNs can join or create their own communities. In particular, a user can join multiple communities according to his or her personal interests. This differs from the cluster concept of previous works because, in the real community model, members can belong to multiple communities, and a community does not necessarily exhibit tight connectivity. Therefore, the measurements used to model a cluster are unsuitable for measuring the characteristics of SMN communities. To address these problems, we propose three measurements for modeling the characteristics of a community that are based on the purpose of a real community.

Community center: If a user has many friends in a community, then the information shared by that user is likely to be noticed by many users in that community. We therefore define a community center as a member who has considerably more intra-community connections than the other nodes in that community. Formally, the center degree $c_i^A$ of node i in Community A is expressed as

$$c_i^A = \frac{\mathrm{degree}_i^A}{|N_A| - 1} \tag{1}$$

where $\mathrm{degree}_i^A$ denotes the number of connections (i.e., friends) of node i within Community A, and $|N_A|$ is the number of members of Community A. Because a node can be connected to at most every other member, $|N_A| - 1$ is the maximum number of connections of a node in Community A. When $c_i^A$ is larger than a predefined threshold, node i is considered a center of Community A, and the center set $C_A$ of Community A is the collection of all centers in Community A.

Intra-connection degree: We propose the intra-connection degree to measure the internal structure of a community; it is defined relative to complete connectivity, i.e., a community in which every pair of members is connected. The degree of intra-connection $\mathrm{Intra}_A$ of Community A is obtained from the following equation:

$$\mathrm{Intra}_A = \frac{|E_A|}{|N_A|\,(|N_A| - 1)/2} \tag{2}$$

where $|E_A|$ denotes the number of edges among the members of Community A, and the denominator is the maximum number of edges that Community A can have, i.e., the number of edges of a complete graph on $|N_A|$ nodes. Thus, a larger value of $\mathrm{Intra}_A$ implies tighter connections among the members of Community A.

Inter-connection degree: We apply the concept of community centers to measure the degree of interconnectedness among communities. The basic idea is that if a center of Community A is also a center of other communities, then Community A has a strong connection with those communities. A node that is a center of multiple communities is denoted as an OC. The degree of interconnectedness of Community A, denoted as $\mathrm{Inter}_A$, is measured by

$$\mathrm{Inter}_A = \frac{|\{\, i \in C_A : i \text{ is an OC} \,\}|}{|C_A|} \tag{3}$$

where the numerator is the number of OCs in Community A, that is, centers of A that are also centers of at least one other community, and the denominator is the number of centers $|C_A|$. A high $\mathrm{Inter}_A$ indicates a high proportion of OCs in $C_A$, in which case Community A is considered to have a strong connection with other communities. In the following section, we conduct a set of experiments on real datasets and compare the proposed measurements with the characteristics of each dataset.
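To make Equations (1) to (3) concrete, the following Python sketch computes the center degree, the center set, and the intra- and inter-connection degrees on a toy friendship graph. It is a minimal illustration under stated assumptions, not the authors' implementation: the input format, the function names, and the center threshold of 0.4 are hypothetical (the paper only refers to a predefined threshold), and single-member communities and communities without centers are assigned 0 by convention.

```python
from itertools import combinations

# Hypothetical input format: an undirected friendship graph stored as
# adjacency sets, and communities given as {community name: member set}.
Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build adjacency sets from undirected friendship pairs (e_ij)."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def center_degree(adj: Graph, members: set[str], i: str) -> float:
    """Equation (1): friends of node i inside A divided by |N_A| - 1."""
    if len(members) < 2:
        return 0.0  # assumption: a one-member community has no centers
    return len(adj.get(i, set()) & members) / (len(members) - 1)


def center_set(adj: Graph, members: set[str], threshold: float) -> set[str]:
    """C_A: members whose center degree is larger than the threshold."""
    return {i for i in members if center_degree(adj, members, i) > threshold}


def intra_degree(adj: Graph, members: set[str]) -> float:
    """Equation (2): |E_A| divided by the edge count of a complete graph."""
    n = len(members)
    if n < 2:
        return 0.0  # assumption: no possible edges, report 0
    edges = sum(1 for u, v in combinations(members, 2) if v in adj.get(u, set()))
    return edges / (n * (n - 1) / 2)


def inter_degrees(
    adj: Graph, communities: dict[str, set[str]], threshold: float
) -> dict[str, float]:
    """Equation (3): fraction of each community's centers that are OCs."""
    centers = {name: center_set(adj, m, threshold) for name, m in communities.items()}
    # Count how many communities each node is a center of.
    times_center: dict[str, int] = {}
    for cs in centers.values():
        for node in cs:
            times_center[node] = times_center.get(node, 0) + 1
    result: dict[str, float] = {}
    for name, cs in centers.items():
        # An OC is a node that is a center of more than one community.
        oc_count = sum(1 for node in cs if times_center[node] > 1)
        # assumption: a community without centers gets Inter_A = 0
        result[name] = oc_count / len(cs) if cs else 0.0
    return result


if __name__ == "__main__":
    adj = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
    communities = {"A": {"a", "b", "c", "d"}, "B": {"c", "d", "e"}}
    for name, members in communities.items():
        print(name, sorted(center_set(adj, members, 0.4)), intra_degree(adj, members))
    print(inter_degrees(adj, communities, 0.4))
```

On this toy graph, node c is a center of both A and B and is therefore an OC, so each community has an inter-connection degree of 1/3. With a threshold of 0.5, c is no longer a center of B (its center degree there is exactly 0.5, and the comparison is strict), so both inter-connection degrees drop to 0.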

3 Experimental Results

The datasets of the four SMN platforms (LiveJournal, Orkut, Flickr, and YouTube) used in the experiments were obtained from [18]. LiveJournal was developed by Brad Fitzpatrick in 1999; it is a global SMN platform where users share common passions and interests. Orkut is an SMN service provided by Google, on which users can build their own virtual social links on the Internet. Flickr, developed by Ludicorp, is a platform that allows users to upload and share their pictures; users can also tag their pictures to ease browsing. YouTube is a video-sharing website that allows users to upload, view, and share videos. Most YouTube content is uploaded by regular users, although media corporations such as CBS also use YouTube to disseminate their content. Unregistered users can visit YouTube to watch videos, and registered users can upload an unlimited number of videos.

The first experiment examines the relationship between the number of community members and the intra-connection degree of communities. Figures 2 and 3 show the results for small and large communities, respectively.


As shown in the figures, the mean degree of intra-connection of communities with 2 to 10 nodes is considerably greater than that of other communities, indicating that the members of small communities are very tightly connected. When the number of nodes in a community exceeds 10, the degree of intra-connection becomes substantially smaller, implying that these communities have a loose connection structure. This is because users join a community according to their interests: the members of a community share similar interests but do not necessarily know each other.

In the second experiment, we examine the relationship between community size and the ratio of the number of centers to the number of nodes in the community. Figures 4 and 5 show the results for small and large communities, respectively. Figure 4 shows that the ratio of the number of centers to the number of members tends to increase with community size, peaking when the community size ranges from … nodes. However, Figure 5 shows that when the number of nodes in a community exceeds 100, the ratio tends to decrease, and it stabilizes once the number of nodes exceeds 400. These observations indicate that the number of centers initially increases with the size of a community, whereas the ratio stabilizes once a community matures into a major community.

The third experiment examines the relationship between the number of community members and the degree of interconnection of the community. Figures 6 and 7 show the results for small and large communities, respectively.
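The per-size averages reported in the first three experiments can be sketched as a simple group-by over communities. The Python sketch below is an assumption about the aggregation rather than the authors' procedure: it groups communities by their exact number of members and averages one measurement per group, for example $\mathrm{Intra}_A$ for the first experiment, the ratio $|C_A| / |N_A|$ for the second, or $\mathrm{Inter}_A$ for the third. The size bounds that separate small from large communities are hypothetical parameters, and the values in the demo are made up.

```python
from collections import defaultdict
from statistics import mean


def average_by_size(
    records: list[tuple[int, float]], min_size: int, max_size: int
) -> dict[int, float]:
    """Average a per-community measurement over communities of equal size.

    records: one (number of members, measurement) pair per community.
    min_size, max_size: inclusive size bounds (hypothetical cut-offs for
    "small" or "large" communities; the paper does not state them).
    """
    groups: dict[int, list[float]] = defaultdict(list)
    for size, value in records:
        if min_size <= size <= max_size:
            groups[size].append(value)
    # Sort by size so the result reads like the x-axis of a plot.
    return {size: mean(values) for size, values in sorted(groups.items())}


def center_ratio(num_centers: int, num_members: int) -> float:
    """Second experiment: |C_A| / |N_A| for a single community."""
    return num_centers / num_members


if __name__ == "__main__":
    # Made-up toy values, not data from the paper.
    intra = [(2, 1.0), (2, 0.0), (3, 2 / 3), (12, 0.1), (12, 0.3)]
    print(average_by_size(intra, 2, 10))
    print(average_by_size(intra, 11, 1000))
    print(center_ratio(3, 4))
```

Grouping by exact size keeps the sketch simple; coarser size buckets would smooth the curves for large communities, where few communities share the same size.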

Figure 2 The Average Intra-Connection Degree of Small Size Communities

Figure 3 The Average Intra-Connection Degree of Large Size Communities

Figure 4 The Average Ratio of Small Size Communities

Figure 5 The Average Ratio of Large Size Communities

Figure 6 The Average Interconnection Degree of Small Size Communities

Figure 7 The Average Interconnection Degree of Large Size Communities

As indicated in the figures, the degree of interconnection among Flickr and YouTube users is quite high, implying that the centers of one community are also likely to be centers of other communities. This is attributable to the primary functions of Flickr (photo sharing) and YouTube (video sharing): a user can post a popular photo or video in several communities to increase his or her popularity. The experimental results thus show that the interconnection among real communities is high, which differs considerably from the concept of cluster analysis proposed in previous studies.

The fourth experiment examines the relationship between the degree of intra-connection and the number of centers in a community. In this experiment, a threshold of α% is used to select only those communities whose degree of intra-connection is larger than the threshold; a larger value of α therefore selects communities with a higher degree of intra-connection. Figure 8 shows the experimental results. The figure indicates that the number of centers in a community remains stable as α varies. The number of centers does not increase with α because the degree of intra-connection of large communities is typically small, as shown by the first experiment. Therefore, the larger the value of α, the smaller the communities that satisfy the threshold, and consequently the number of centers does not increase with α.

4 Conclusion

By exploring the structures and characteristics of specific SMNs according to the actual communities that users have joined, we have elucidated the characteristics of SMNs. We proposed the following three measurements to model the characteristics of SMNs based on their community structures: (1) community center; (2) degree of intra-connection; and (3) degree of interconnection. To measure the degree of interconnection of a community, we proposed the concept of centers and used it to define a strong connection among communities. We also conducted experiments using four real datasets (LiveJournal, Flickr, Orkut, and YouTube) to analyze the characteristics of SMNs. According to the experimental results, communities tend to exhibit a loose connection structure when the number of members exceeds 10. Furthermore, when communities reach a certain scale (i.e., when a community matures into a major community), the ratio of the number of centers to the number of community members stabilizes. We also observed considerably high degrees of interconnection among Flickr and YouTube communities, which differs considerably from the concept of cluster analysis proposed in previous studies. Recently, a study of the evolution of social networks based on tagging practices was presented in [9]. It would be interesting to investigate whether such tags can serve as characteristics that explain community structures.

Acknowledgements

This work was partially supported by the National Science Council of Taiwan under contracts NSC 101-2221-E-259-002 and NSC 101-2221-E-259-004.

Figure 8 The Average Number of Centers of Different α

References

[1] L. Backstrom, D. Huttenlocher, J. Kleinberg and X. Lan, Group Formation in Large Social Networks: Membership, Growth, and Evolution, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, 2006, pp. 44-54.
[2] M. Biryukov, Co-author Network Analysis in DBLP: Classifying Personal Names, Second International Conference MCO 2008, Metz, France, 2008, pp. 399-408.


[3] W. Chen, Y. Wang and S. Yang, Efficient Influence Maximization in Social Networks, 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 2009, pp. 199-208.
[4] D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg and S. Suri, Feedback Effects between Similarity and Social Influence in Online Communities, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, 2008, pp. 160-168.
[5] P. Domingos and M. Richardson, Mining the Network Value of Customers, 7th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2001, pp. 57-66.
[6] A. Goyal, F. Bonchi and L. Lakshmanan, Discovering Leaders from Community Actions, 17th ACM Conference on Information and Knowledge Management, Napa Valley, CA, 2008, pp. 499-508.
[7] L. B. Jabeur, L. Tamine and M. Boughanem, Active Microbloggers: Identifying Influencers, Leaders and Discussers in Microblogging Networks, 19th International Conference on String Processing and Information Retrieval, Cartagena de Indias, Colombia, 2012, pp. 111-117.
[8] D. Kempe, J. Kleinberg and É. Tardos, Maximizing the Spread of Influence through a Social Network, 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, 2003, pp. 137-146.
[9] H.-L. Kim, J. G. Breslin, H.-C. Chao and L. Shu, Evolution of Social Networks Based on Tagging Practices, IEEE Transactions on Services Computing, Vol. 6, No. 2, pp. 252-261, April-June, 2013.
[10] A. Lancichinetti, S. Fortunato and J. Kertész, Detecting the Overlapping and Hierarchical Community Structure in Complex Networks, New Journal of Physics, Vol. 11, March, 2009, doi: 10.1088/1367-2630/11/3/033015.
[11] C. Lee, F. Reid, A. McDaid and N. Hurley, Detecting Highly Overlapping Community Structure by Greedy Clique Expansion, SNAKDD 2010: The 4th International Workshop on Social Network Mining and Analysis, Washington, DC, 2010, pp. 3-9.
[12] J. Leskovec, K. J. Lang, A. Dasgupta and M. W. Mahoney, Statistical Properties of Community Structure in Large Social and Information Networks, 17th International Conference on World Wide Web, Beijing, China, 2008, pp. 695-704.
[13] L. Licamele and L. Getoor, Social Capital in Friendship-Event Networks, 6th IEEE International Conference on Data Mining, Hong Kong, China, 2006, pp. 959-964.
[14] D. Lu, Q. Li and S. S. Liao, A Graph-Based Action Network Framework to Identify Prestigious Members through Member's Prestige Evolution, Decision Support Systems, Vol. 53, No. 1, April, 2012, pp. 44-54.
[15] M. Richardson and P. Domingos, Mining Knowledge-Sharing Sites for Viral Marketing, 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002, pp. 61-70.
[16] M. G. Rodriguez and M. Rogati, Bridging Offline and Online Social Graph Dynamics, 21st ACM International Conference on Information and Knowledge Management, Maui, HI, 2012, pp. 2447-2450.
[17] D. Schiöberg, F. Schneider, H. Schiöberg, S. Schmid, S. Uhlig and A. Feldmann, Tracing the Birth of an OSN: Social Graph and Profile Analysis in Google+, 3rd Annual ACM Web Science Conference, Evanston, IL, 2012, pp. 265-274.
[18] IMC 2007 Data Sets, http://socialnetworks.mpi-sws.org/data-imc2007.html
[19] Department of Industrial Technology, Ministry of Economic Affairs, ITeS in Taiwan, 2011, http://web.iii.org.tw/ReadFile/?p=Product&n=2012330114025.pdf

Biographies

Ruei-Yuan Chang received the BS degree in applied mathematics from National Dong Hwa University, Hualien, Taiwan, R.O.C., in 2006, and the MS and PhD degrees in computer science and information engineering from the same university in 2008 and 2016, respectively. His research interests are in designing and analyzing algorithms for Combinatorics and Networks.

Sheng-Lung Peng is an associate professor of the Department of Computer Science and Information Engineering at National Dong Hwa University, Hualien, Taiwan. He received the BS degree in Mathematics from National Tsing Hua University, and the MS and PhD degrees in Computer Science from National Chung Cheng University and National Tsing Hua University, Taiwan, respectively. His research interests are in designing and analyzing algorithms for Combinatorics, Bioinformatics, and Networks.


Guanling Lee received the BS, MS, and PhD degrees, all in computer science, from National Tsing Hua University, Taiwan, Republic of China, in 1995, 1997, and 2001, respectively. She joined National Dong Hwa University, Taiwan, as an assistant professor in the Department of Computer Science and Information Engineering in August 2001, and became an associate professor in 2005. Her research interests include resource management in the mobile environment, data scheduling on wireless channels, search in P2P networks, and data mining.

Chia-Jung Chang received the MS degree in computer science and information engineering from National Dong Hwa University, Taiwan, Republic of China, in 2012. His research interests include data mining, databases, and social network analysis.
