2010 International Conference on Advances in Social Networks Analysis and Mining
Visualizing the evolution of users’ profiles from online social networks
Dieudonné Tchuente, Marie-Françoise Canut, Nadine Baptiste Jessel, André Péninou, Anass El Haddadi
IRIT (Institut de Recherche en Informatique de Toulouse), Generalized Information Systems
118, Route de Narbonne, F-31062 Toulouse, France
Abstract— Nowadays, online social networks host more and more applications in order to let their users find everything they need on a single platform. The number and diversity of the interactions that take place over time between users and applications within these platforms make these environments very good candidates for learning various types of information about users’ interests. We are particularly interested in determining users’ short-term and long-term interests, which are essential for adaptive systems that take into account the evolution of users’ needs. While studies in adaptive systems focus on computing interest weights and time periods to determine a user’s short-term and long-term profile, we focus instead on visualizing users’ interests with temporal graphs. In a case study on Facebook, we use dynamic graphs to view the influence of social ties on the user’s interests.
Keywords: user profile; social networks; temporal graph; dynamic graph; text mining.
978-0-7695-4138-9/10 $26.00 © 2010 IEEE DOI 10.1109/ASONAM.2010.79
I. INTRODUCTION
Generally, the main goal of constructing users’ profiles in information systems is to adapt the content of information to the user’s characteristics or interests. There are many applications on the web where building users’ profiles is very important, such as the customization of query results in search engines according to users’ preferences. For instance, suppose that a user who is an avid programmer enters the word “java” in a search engine. It is important for the search engine to know that the user is a programmer, so that it first returns tutorials on the Java programming language. An adaptive system could also consider another case, where the user is a programmer but, every summer, searches for information about tourist destinations where he can spend his holidays. In this case, depending on the period of the user’s search, the first results will be about the Indonesian island of Java or about Java tutorials.
Basically, profile information may be explicitly provided by the user (e.g., favorite activities or interests, sex, age, country) or implicitly gathered by a software agent (e.g., cookies, web server log files) [1]. A user profile is static if its data does not change over time, and dynamic otherwise. In real cases, profiles are always dynamic and can be long-term or short-term. A long-term profile contains user interests that do not vary over time [2]. Taking the example from the previous paragraph, we can consider “java programmer” or “java programming language” as part of the user’s long-term interests. A short-term profile contains the user’s interests over a fixed period of time (e.g., a single search session) [3]. In the previous example, we can consider “java tourism” as part of the user’s short-term interests. Generally, to better infer the user’s needs, it is important to consider both long-term and short-term profiles [1].
Currently, research focuses on computing interest weights and time periods to determine users’ short-term and long-term profiles [4, 5, 6]. In this paper, we focus instead on visualizing users’ interests with temporal graphs in order to design users’ short-term and long-term profiles. Our experimentation is carried out in the context of online social networks (OSN), with the added advantage of showing the influence of social ties on the evolution of users’ profiles.
The structure of our paper is as follows. Section II explores related work on data extraction and analysis of users’ profiles in online social networks. Section III presents our methodology and some results of our experimentation on Facebook. Section IV highlights the conclusions and suggests ideas for future research.
II. RELATED WORK
Currently, OSN (e.g., Facebook, MySpace, Orkut) have become real “operating systems” or platforms in which a user can find all the types of applications he needs. Because of the uniqueness and complexity of OSN, one of our previous works addressed the modeling of users’ profiles for these specific environments [7]. Overall, four major concepts are considered in a user model for OSN: (i) the User concept contains explicit data provided by the user (e.g., name, age, sex, addresses, hobbies) and social ties (e.g., list of friends, list of groups); (ii) the Security concept contains data on the security parameters defined by the user in order to control access to his profile by other users or third parties; (iii) the Application concept contains data on the applications used by the user. These may be internal applications, which are provided by default in the platform (e.g., Mail, Instant Messaging, Wall, Photos, Comments, Groups, etc.), or external (third-party) applications, which are built by third parties and may add any other features to the OSN; (iv) the Stream activities concept contains data about the use of applications across the user’s social ties (generally, traces of the activities of the user’s friends). Before analyzing user profiles in OSN, the first challenge is to extract the user’s data. For this extraction, some authors use data represented in RDF ontology formats such as FOAF (Friend of a Friend) or SIOC (Semantically-Interlinked
Online Communities Project) [8, 9]. However, these vocabularies are mainly implemented in blog platforms and used for security and privacy reasons. Data from OSN platforms are not widely available in these formats. Other authors parse the HTML source code of users’ profiles for data extraction [10], but the data extracted this way are very incomplete or even nonexistent for many OSN such as Facebook or Friendster. External (third-party) applications, which more and more OSN provide, are another source for extracting data on users’ profiles [11]. Although these applications require the user’s explicit permission for data collection and involve many security risks [12, 13], they provide the means to access more complete and up-to-date data on users’ activities. For our experimentation, we have used an external application in Facebook. Once data from a social network are available, the methods and tools used for the analysis of social graphs are mostly static and only consider graphs between users [14]. In our approach, we enrich users’ social graphs with their interests, constructed from qualitative data on their activities (use of applications). This method is similar to the construction of users’ semantic profiles [1], but with two major advantages: (i) visualization of the evolution of each user’s interests over the graph, which can help to design the user’s short-term and long-term interests; (ii) visualization of the influence of social ties on the user’s interests through dynamic graph generation. The methodology and first results of our experimentation are presented below.
most third-party applications [15]. We developed a third-party application (https://apps.facebook.com/analyze_network) that a panel of 85 users installed on their profiles. Because the API gives access to a user’s friends, we accessed a total of 7,081 profiles, which were anonymized (using MD5 hashing) for the study. Our application has only academic goals and is intended only for volunteer users who explicitly authorize us to store and analyze information from their profiles during our experimentation. The quality of the data available to a third-party application depends on the security parameters defined by the user (e.g., additional authorizations that the user may give to the application, and the position of the user with respect to the application: user of the third-party application or friend of an application user) [7]. For the Facebook API, we found that the majority of the data available on the user of the third-party application are the stream activities of his friends. We therefore ask the user to give us explicit authorization to access stream activities (which cover many types of applications) (figure 2). For the user of our third-party application, we extract data about his applications such as photos, videos, status, links, notes, tags, external applications, etc. The extracted data are mainly text content (e.g., status, links, notes, tags) associated with the applications used by the user and his friends. Some metadata, such as the time of the user’s interaction with applications, are also extracted.
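The anonymization of profile identifiers mentioned above can be illustrated with a minimal Python sketch. It only assumes that each profile has a string identifier and replaces it with its MD5 digest; the function name `anonymize_id` and the optional `salt` parameter are hypothetical and are not described in the paper.

```python
import hashlib


def anonymize_id(profile_id: str, salt: str = "") -> str:
    """Return the hexadecimal MD5 digest of a profile identifier.

    The optional salt is an assumption added for illustration; the paper
    only states that profiles were anonymized with MD5.
    """
    return hashlib.md5((salt + profile_id).encode("utf-8")).hexdigest()


# Hypothetical identifiers, mapped to stable pseudonyms.
pseudonyms = {pid: anonymize_id(pid) for pid in ["1001", "1002"]}
```

The same identifier always maps to the same pseudonym, so ties between users are preserved in the anonymized graph.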
III. METHODOLOGY AND PROFILE VISUALIZATION
Our experimentation can be divided into three major steps (data extraction, data preparation and mining, and data or profile visualization) (figure 1), as in any Knowledge Discovery in Databases process. Each of these steps is described below.
Figure 2. Data extraction on Facebook.
B. Data preparation and profile building
For each user publication within applications, or for each stream activity, we extract the associated text data and metadata. The discovery of users’ interests is carried out by text mining over all the data published by the user and his friends. Thus, for our experimentation, we have used a platform that integrates both text mining and graph visualization [16], rather than platforms dedicated only to the structural analysis of social networks (e.g., Ucinet, Stocnet, Pajek). Here, a document consists of all the data and metadata associated with a user. For text mining, the major steps are structuring the text document, segmenting the text into features, and applying filters and dictionaries (figure 1). Filters and
Figure 1. Process for building and visualizing users’ profiles.
A. Data extraction
As mentioned in the previous section, we have chosen to use third-party applications in OSN to gather as much data as possible on users’ social ties and activities. We are especially interested in the Facebook API because Facebook users are more familiar with third-party applications. Indeed, the Facebook API has taken a step ahead of its main competitors, such as Google OpenSocial, and is currently the platform that hosts the
dictionaries are used to take into account the vocabulary and semantics of terms (concepts):
Figure 3. Long-term and short-term interests visualization for a user (“Up”). Time has been divided into 4 semesters based on the homogeneity of activity frequency (semester 2 of 2008, semester 1 of 2009, semester 2 of 2009 and semester 1 of 2010). The graph is undirected, node-weighted, and edge-weighted. Each node represents a user (histogram bars with different colors) or an interest (histogram bars in blue). For a given node, each bar represents the frequency of the node in one time period. The bars of each node follow a clockwise representation of the time periods, beginning here with semester 1 of 2009. For instance, for the node “Up”, the red bar represents his activity frequency in semester 1 of 2009, the orange bar his activity frequency in semester 1 of 2010, the yellow bar his activity frequency in semester 2 of 2008, and the green bar his activity frequency in semester 2 of 2009.
• Negative filters (empty concepts): these filters contain concepts that have no meaning in the context of the study. Typical examples of empty concepts are the articles of any language. Depending on the languages studied, some negative filters already exist and can be reused or enriched.
• Positive filters (domain concepts): these filters contain the only terms to be retained in the document. This may be done by projecting the text documents onto a specific domain ontology in order to select only the concepts of that domain. For our experimentation, we are not interested in a specific domain, so we used as positive filters all the concepts that appear more often than a predefined threshold in the document (e.g., more than 2 or 3 times).
• Synonym dictionaries: several terms may refer to a single concept of the studied area. Synonym dictionaries link the terms that refer to the same concept. Depending on the languages studied, a synonym dictionary can be automatically built and enriched from the document.
Once interests (domain concepts) are discovered from the text, they are projected onto users and their dates of use (time) in order to construct a 3-D co-occurrence matrix [17]. Depending on the desired temporal granularity of the analysis, time can be split into different periods (e.g., days, months, quarters, semesters, or years). For a given period, the weight of a tie between a user and an interest (a value in the 3-D matrix) is the number of times they appear together.
of 2009, “U37” for semester 2 of 2008, “U3, U14, U32, etc.” for semester 1 of 2010).
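The filtering and counting steps described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform used in the paper [16]: the stop-word set, the synonym map, the whitespace tokenization, the semester helper and all function names are hypothetical. The sparse dictionary keyed by (user, concept, period) plays the role of the 3-D co-occurrence matrix.

```python
from collections import Counter
from datetime import date

# Hypothetical negative filter (empty concepts) and synonym dictionary.
NEGATIVE_FILTER = {"the", "a", "of", "le", "la", "de"}
SYNONYMS = {"jeux": "jeu", "film": "films"}


def semester(d: date) -> str:
    """Map a date to a semester label such as '2009-S1'."""
    return f"{d.year}-S{1 if d.month <= 6 else 2}"


def extract_concepts(text: str) -> list:
    """Tokenize on whitespace, drop empty concepts, merge synonyms."""
    tokens = [t.strip('.,;:!?"()').lower() for t in text.split()]
    return [SYNONYMS.get(t, t) for t in tokens if t and t not in NEGATIVE_FILTER]


def build_cooccurrence(posts, min_count=2):
    """Build a sparse 3-D co-occurrence matrix from (user, date, text) posts.

    A concept is kept (positive filter) only if it appears more than
    min_count times over all posts. The result maps
    (user, concept, period) to the number of times they appear together.
    """
    extracted = [(u, semester(d), extract_concepts(txt)) for u, d, txt in posts]
    totals = Counter(c for _, _, concepts in extracted for c in concepts)
    kept = {c for c, n in totals.items() if n > min_count}
    matrix = Counter()
    for user, period, concepts in extracted:
        for concept in concepts:
            if concept in kept:
                matrix[(user, concept, period)] += 1
    return matrix


# Toy posts, invented for illustration only.
posts = [
    ("U1", date(2009, 3, 1), "the music music"),
    ("U1", date(2009, 9, 1), "music jeux"),
    ("U2", date(2009, 3, 5), "Jeux jeu film"),
]
matrix = build_cooccurrence(posts, min_count=2)
```

Changing `semester` to return a month or a year label changes the temporal granularity without touching the rest of the sketch.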
The user “Up” studied here agreed to be identified in order to validate the constructed graph. He recognized himself in almost all of the interests and associated periods in the graph. The interests that he does not recognize are certainly related to his friends and can be identified through sub-graphs or dynamic graphs.
C. Evolving and dynamic profile visualization
To view the profiles constructed from the co-occurrence matrices, we currently rely on the Visugraph prototype [18], which is specialized in the analysis of evolving and dynamic graphs. The graph of all profiles can be viewed but, for clarity, we present a graph restricted to the egocentric network of a user named “Up” (figure 3).
To better visualize each period and observe trends across periods, the dynamic graph of the profile can be observed by displaying the graphs of successive periods. Figure 4 presents the first 2 periods (semester 2 of 2008 and semester 1 of 2009) of the dynamic profile of the user “Up”. The analysis should focus on the changes between the two periods rather than on each period individually. From the first to the second period, some friends of the user have disappeared (U51, U2), but almost all the interests of the first period remain in the second period. This could mean that users U2 and U51 do not really influence the interests of the user in the first period. However, in the second period, several interests appear (e.g., Youtube, holidays (“vacances”), fashion (“mode”), games (“jeu”), festival (“fête”), girls (“filles”)). These new interests appear at the same time as several friends of the user (U13, U18, U22, U28, U33, and U36). This could mean that these new users influence the appearance of the new interests. To better see the users involved in each new interest, the sub-graph associated with this interest can be viewed.
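The comparison between two successive periods described above amounts to set differences on the nodes of each period graph. The following Python sketch illustrates this reading; the function name `compare_periods` and the dictionary layout are assumptions, and the sample data is a toy subset loosely based on the labels quoted in the text, not the actual graph.

```python
def compare_periods(previous, current):
    """Compare two period graphs given as {'users': set, 'interests': set}.

    For each kind of node, return the labels that disappeared, appeared,
    or persisted between the previous and the current period.
    """
    return {
        kind: {
            "disappeared": previous[kind] - current[kind],
            "appeared": current[kind] - previous[kind],
            "persisted": previous[kind] & current[kind],
        }
        for kind in ("users", "interests")
    }


# Toy data for illustration only.
period_1 = {"users": {"U29", "U2", "U51"}, "interests": {"photos", "music"}}
period_2 = {
    "users": {"U29", "U13", "U18"},
    "interests": {"photos", "music", "Youtube", "vacances"},
}
changes = compare_periods(period_1, period_2)
```

Users listed under "appeared" in the same period as new interests are only candidates for influence; confirming it would still require inspecting the sub-graph of each new interest, as the text suggests.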
The more periods a node is involved in, the closer it is to the center. Thus, these nodes can be considered as part of the user’s long-term interests throughout all the studied time periods (green rectangle, figure 3). Here, these long-term interests include items such as photos, music, games (“jeu”), dailymotion, etc. The user “Up” has about 105 friends, but only one of them (“U29”) is constantly active. Details on the interactions and the common interests between “Up” and “U29” can be viewed by extracting the sub-graph limited to these two users. When nodes are more involved in one particular period, they move closer to the vertex representing this period. Thus, these nodes can be considered as part of the user’s short-term interests during this time period (red rectangles, figure 3). In semester 1 of 2009, we clearly identify movies (“films”), sociology or the car brand “Renault”. The most active friends of the user can also be identified (e.g., “U57” for semester 1
Figure 4. First two periods of a dynamic profile (user “Up”). Some users disappear in the second period (green arrows in period 1). New users (green arrows in period 2) and new interests (black arrows in period 2) appear in the second period.
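In the paper, long-term and short-term interests are read visually from the position of the nodes in figure 3. As a purely illustrative numeric analogue, the Python sketch below labels an interest from its frequency per period. The rule and the `dominance` threshold are assumptions made for this sketch and are not part of the authors’ method.

```python
def classify_interest(freq_by_period, dominance=0.8):
    """Label an interest from its frequency in each period.

    Assumed rule: 'long-term' if the interest is active in every period,
    'short-term' if one period holds at least `dominance` of its total
    frequency, 'mixed' otherwise.
    """
    total = sum(freq_by_period.values())
    if total == 0:
        return "inactive"
    active = [p for p, f in freq_by_period.items() if f > 0]
    if len(active) == len(freq_by_period):
        return "long-term"
    if max(freq_by_period.values()) / total >= dominance:
        return "short-term"
    return "mixed"


# Hypothetical frequencies over the four semesters of the study.
steady = {"2008-S2": 3, "2009-S1": 4, "2009-S2": 2, "2010-S1": 5}
burst = {"2008-S2": 0, "2009-S1": 6, "2009-S2": 0, "2010-S1": 0}
```

With these values, `classify_interest(steady)` yields "long-term" and `classify_interest(burst)` yields "short-term", which matches the intuition of nodes close to the center versus nodes close to the vertex of a single period.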
IV. CONCLUSION
In this paper, we address two main points: the visualization of the evolution of users’ interests (to easily visualize users’ long-term and short-term interests) and the influence of users’ social ties on their interests. We are particularly interested in the context of online social networks. Our approach is based on text mining with temporal and dynamic graph visualization. The objective of this paper is mainly to present our methodology and first results. However, we still have to improve and validate our method on more profiles extracted from OSN, and to add features for sub-graph visualization to the Visugraph prototype [18]. We also plan to validate the constructed visual profiles by comparing them with existing methods for computing users’ long-term and short-term profiles [5] and for computing the influence of social networks on users’ profiles [19].
V. ACKNOWLEDGMENTS
This research was supported in part by the DIDES (Direction of Innovation and e-Services Development) of La Poste Group (Paris).
REFERENCES
[1] Susan Gauch, Mirco Speretta, Aravind Chandramouli, Alessandro Micarelli, “User Profiles for Personalized Information Access”, in The Adaptive Web, Vol. 4321 (2007), pp. 54-89.
[2] Bin Tan, Xuehua Shen, ChengXiang Zhai, “Mining Long-Term Search History to Improve Search Accuracy”, in KDD’06, August 20-23, 2006, Philadelphia, Pennsylvania, USA, pp. 718-723.
[3] Mariam Daoud, Lynda-Tamine Lechani, Mohand Boughanem, “Towards a graph-based user profile modeling for a session based personalized search”, in Knowledge and Information Systems, 2009, pp. 365-398.
[4] Kazunari Sugiyama, Kenji Hatano, Masatoshi Yoshikawa, “Adaptive Web Search Based on User Profile Constructed Without Any Effort From the User”, in WWW ’04: Proceedings of the 13th International Conference on World Wide Web (2004), pp. 675-684.
[5] Lin Li, Zhenglu Yang, Botao Wang, Masaru Kitsuregawa, “Dynamic Adaptation Strategies for Long-Term and Short-Term User Profile to Personalize Search”, in APWeb/WAIM, Vol. 4505 (2007), pp. 228-240.
[6] Marco Benini, Alberto Trombetta, Michela Acquaviva, “A Model for Short-Term Content Adaptation”, in the 14th International World Wide Web Conference, May 10-14, 2005, Chiba, Japan.
[7] Nadine Jessel, Dieudonné Tchuente, Marie-Françoise Canut, André Péninou, “Quelle modélisation des profils utilisateurs des réseaux sociaux numériques ?”, in Colloque International EUTIC - Usages et Enjeux des TIC (EUTIC 2009), Bordeaux, November 18-20, 2009.
[8] Jennifer Golbeck, Matthew Rothstein, “Linking Social Networks on the Web with FOAF: A Semantic Web Case Study”, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008).
[9] Uldis Bojars, Alexandre Passant, Richard Cyganiak, John Breslin, “Weaving SIOC into the Web of Linked Data”, LDOW2008, April 22, 2008, Beijing, China.
[10] Sophia Alim, Ruquya Abdul-Rahman, Daniel Neagu, Mick Ridley, “Data Retrieval from Online Social Network Profiles for Social Engineering Applications”, Internet Technology and Secured Transactions, ICITST 2009, pp. 1-5.
[11] Bernie Hogan, “A comparison of on and offline networks through the Facebook API”, electronic paper available at http://ssrn.com/abstract=1331029.
[12] Joseph Bonneau, Jonathan Anderson, George Danezis, “Prying data out of a Social Network”, in 2009 Advances in Social Network Analysis and Mining, pp. 249-254.
[13] C. Patsakis, A. Asthenidis, A. Chatzidimitriou, “Social networks as an attack platform: Facebook case study”, Eighth International Conference on Networks (ICN ’09), 2009, pp. 245-247.
[14] Mohsen Jamali, Hassan Abolhassani, “Different Aspects of Social Network Analysis”, Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI’06).
[15] Facebook statistics 2009, http://www.facebook.com/press/info.php?statistics
[16] Josiane Mothe, Claude Chrisment, Taoufiq Dkaki, Bernard Dousset, Said Karouach, “Combining mining and visualization tools to discover the geographic structure of a domain”, in Computers, Environment and Urban Systems, Elsevier, special issue on Geographic Information Retrieval, special issue No. 4, pp. 460-484, July 2006.
[17] Eloïse Loubier, Bernard Dousset, “Visualisation and analysis of relational data by considering temporal dimension”, in International Conference on Enterprise Information Systems (ICEIS 2007), Funchal, Madeira, Portugal, June 12-16, 2007, Vol. ISAS, INSTICC Press, pp. 550-553.
[18] Brigitte Gay, Eloïse Loubier, “Dynamics and Evolution Patterns of Business Networks”, in 2009 Advances in Social Network Analysis and Mining, pp. 290-295.
[19] Masahiro Kimura, Kazumi Saito, Ryohei Nakano, Hiroshi Motoda, “Finding Influential Nodes in a Social Network from Information Diffusion Data”, Social Computing and Behavioral Modeling (2009), pp. 1-8.