LNAI 4694 - User Modeling in the Social Web - Springer Link

User Modeling in the Social Web Francesca Carmagnola, Federica Cena, and Cristina Gena Department of Computer Science, University of Torino Corso Svizzera 185, Torino, Italy {carmagnola,cena,cgena}@di.unito.it

Abstract. This paper presents the idea of reasoning over users’ tags to define and enrich the user model. We apply our approach to iCITY, an adaptive, web-based, multi-device social recommender system. iCITY exploits a tag-based user model that is enriched with information derived from the tags users insert in the system, and also filled with the tags the user has already used on other social web sites. Moreover, we propose an architecture that enables the iCITY tag-based user model to be exported and shared with other social applications in a semantically enhanced way. Finally, we propose sharing the user profile, together with the list of tags, in a shared syntax (such as RDF(S), OWL, RSS).

Keywords: user modeling, tagging, social application, Web 2.0.

1 Introduction

Nowadays, we are witnessing a major transformation of the Web as we have known it. We refer to a new paradigm, the Web 2.0 [1], a label coined by O’Reilly Media in 2004 to denote a new generation of Web-based services, such as social networking sites, social software [2], wikis, communication tools, weblogs, podcasts, RSS feeds (and other forms of many-to-many publishing) and folksonomies [3], that emphasize online collaboration among users. This new paradigm offers users several ways to participate easily in the creation of web content: it makes it easy and stimulating to tag (i.e., to label resources by means of keywords), insert new content, share objects, provide comments and so on. In this sense, the most popular Web tools [4] are Del.icio.us¹, which allows tagging and sharing bookmarks, Flickr², which allows storing, sharing and retrieving photos, Digg³, which allows sharing news, and YouTube⁴, which allows sharing videos. On the one side, the “one-size-fits-all” approach seems inadequate to satisfy user needs in this novel social environment. The user is no longer a passive entity: she is active and demanding, and she requires her needs to be personally satisfied. Thus, personalization,

This work has been funded by the Municipality of Torino and the CSP research center. We want to thank Franco Carcillo, Andrea Toso, Luca Console, Fabiana Vernero, Anna Goy for their support.
1 http://del.icio.us/
2 http://www.flickr.com/
3 http://www.digg.com/
4 http://www.youtube.com/

B. Apolloni et al. (Eds.): KES 2007/WIRN 2007, Part III, LNAI 4694, pp. 745–752, 2007. © Springer-Verlag Berlin Heidelberg 2007


already considered crucial in many areas (from e-commerce to e-learning, from tourism and cultural heritage to digital libraries and so on), is strongly needed in the social web as well. On the other side, with the proliferation of user-adaptive web-based systems, a user can easily interact on the Web with different systems that collect data about her [5]. More specifically, to perform adaptation, a user-adaptive system needs to maintain a user model containing the user profile (all the available personal information about a user), the user’s preferences and the system’s assumptions about the user’s current state. On the basis of such user information, the system exploits reasoning strategies (such as heuristic rules, decision trees, Bayesian networks, production rules, inductive reasoning, etc.) to derive user knowledge, update the model and decide on adaptation strategies and techniques. Up to now, user-adaptive applications have built a model of the user starting from personal information (e.g. characteristics, preferences, knowledge, interests, goals, activities). Such information can be explicitly provided by the user herself, or implicitly inferred from her interactions with the system. With the advent of the Web 2.0, systems can also collect all the tags the user uses to label items. Therefore, systems have a great opportunity to exploit tag annotations in order to enrich and extend the user model [6]. “Annotations can become part of his user profile as an indication of his perspective on the content collection and interest in the annotated object” [7]. Thus, from the tags the user has inserted, systems can obtain knowledge about the user in terms of preferences, interests, etc. Moreover, systems may use this “tag-enriched” profile for recommendations, using techniques such as collaborative filtering or case-based reasoning. To infer knowledge about a specific user by reasoning on tags, we need the means to analyze the semantics of tags.
Thus we propose to map all these tags to some ontologies; in this way, systems may reason over the semantics of the tags, especially in relation to the user who has inserted them (see also [8]). Besides the idea of reasoning on tags to exploit a tag-based user model, we make a further consideration. Currently each system builds its own model of the user; consequently, user data are fragmented over many applications on the web, and the user profile is inherently distributed and often inconsistent. In this direction, we present the idea of giving systems the possibility of exchanging the tags (and the related information) used by a specific user interacting with a system. The main advantage of this solution is that social systems that want to offer personalized services can overcome the “cold start problem”, which may occur at the beginning of the interaction with a new user, when no information about her is available. If systems exchange the list of tags a specific user has inserted, they do not need to wait for a large amount of tagging activity from the user, and at the same time they avoid bothering the user by asking for personal information during the registration phase. Furthermore, we are investigating the possibility of adopting the same approach, and of using the same architecture, to exchange the user profile across web-based systems, as in a cross-system personalization approach. The paper is structured as follows: Sec. 2 presents a possible scenario in the social adaptive web, Sec. 3 presents our proposal of reasoning on tags to enrich the user model, Sec. 4 illustrates our proposed architecture for the interoperability of user models and tags, and Sec. 5 concludes the paper.


2 Scenario

iCITY⁵ is a social, web-based, multi-device recommender system. It provides suggestions on cultural events in the city of Turin, and allows users to insert new events, add information to events, and insert comments and tags. Recommendations are based on the user model enriched with tags and on the user location, and the presentation interface is adapted to the device being used. A general description of iCITY can be found in [6], while a description of the tag-based user model can be found in [8]. Now imagine a scenario where the user Cristina, who regularly uses social software such as Del.icio.us (to collect and share bookmarks) and Flickr (to store, share and retrieve photos), decides to register in iCITY in order to stay informed about the events in Turin. Furthermore, she can store and tag the events in the iCITY agenda, and retrieve them on her GPRS-equipped mobile phone when she is out and about. During the registration phase she inserts, besides some other personal information, her accounts (expressed as URLs)⁶ on the web communities she belongs to. She does so because she has discovered that iCITY can learn her interests and preferences by reading the tags she has inserted in other web community tools. Moreover, iCITY also monitors and reasons about her direct interactions with the system itself and about the tags she inserts. She finds it very useful to be helped by iCITY in finding more interesting events, also because she knows she can always check the assumptions about her interests and preferences in her scrutable user profile and correct them if necessary.

3 Reasoning on Tags

This section investigates the possibility of reasoning on the semantics of the tags inserted by users. This operation aims at enriching the user model both by refining the values of existing user features and by inferring new user features that were not present before. To do so, we perform three main tasks:

1) Categorization of tags. We start with a preliminary evaluation whose goal is to explore how users tag events, and how this information can be used for user modeling. To simulate the tagging activity on iCITY, we chose a list of events from the RSS channel feeding our system. We selected 15 events of different categories (art, theater, cinema, music, books), grouped them and presented them to three different groups of users. We selected 39 users from among students (23 subjects), researchers working in our departments (10 subjects), and relatives and friends (6 subjects). Then, we gave each user a printed list of 5 events with the corresponding descriptions, and we asked them to tag the events. Users were free to write their own tags (up to 5 tags per event) or to choose them from the event description (this second option is motivated by the fact that iCITY also suggests tags that are automatically extracted from the event description). At the end, we collected 217 tags and analyzed them inductively, according to the principles of Grounded Theory [9].

5 A work-in-progress prototype can be found at http://icity.di.unito.it/dsa/
6 http://del.icio.us/cristinagena; http://www.flickr.com/photos/alfia1973



From the analysis two categories emerged: proposed tags (76%), i.e. tags derived from the event description, and free tags (24%), i.e. tags not derived from the description. We then analyzed the tags in more depth, considering other properties: specific tags (61.19%), which add more specific information about the event; generic tags (22.37%), which classify the event in a more general category; contextual tags (13.24%), which concern the event context, such as location, time, etc.; synonym tags (2.74%), which are synonyms of words in the event description; and invented tags (2.17%), such as unhyphenated compound words like “PicassoExhibition”. Notice that these categories are similar in some ways to those defined by the group working on “My Web 2.0”, the social search engine of Yahoo! [10]. With respect to their analysis, two categories are missing from our classification: subjective tags (tags that express the user’s opinions and emotions) and organizational tags (tags that identify personal stuff). We did not find any occurrence of these kinds of tags. This can be explained by the fact that our sample users had no interest in classifying resources, since it was a test situation and not a real usage situation, where users can use tags to organize and share their resources. Considering this gap, we extended the classification obtained from the test with the two categories that did not emerge from it: subjective tags and organizational tags. Afterwards, we also considered the typologies of tags suggested by iCITY: i) the most popular tags in the community; ii) the tags most often inserted previously by the user; and iii) the tags recommended on the basis of the user model features combined with the event description. As a consequence, our classification is extended with most popular tags, most used tags and recommended tags. These categories are considered subclasses of the more general class of proposed tags.

2) How to automatically analyze tags.
First of all, we had to solve the issue of how to process all the categories seen above. Some tags can be analyzed by exploiting the iCITY events ontology. However, a better solution could be to analyze tags by means of a natural language ontology, such as WordNet. In the following we provide, for each category of our classification, some ideas on how to analyze it. Proposed tags/free tags. This is the easiest category to analyze, since the categories are based on the user’s selections and the proposed selections are controlled by the system. Thus, it is possible to check whether the selected tags derive from the system’s inference (recommended tags), from the tags most used by the user (most used tags), or from tags inserted by other users (most popular tags); or whether they are inserted for the first time by the user (free tags) and, in this case, which of them do not belong to the WordNet dictionary (invented tags). Generic/specific tags. Tags are classified as “generic” when it is possible to map them onto the upper categories of the iCITY ontology, and as “specific” when they can be mapped onto lower concepts or instances of WordNet linked to the ontology categories. Synonym tags. Inserted tags are compared with the WordNet vocabulary to identify whether they are synonyms of the words used in the description of the specific event. Contextual tags. iCITY tries to discover whether the tag is related to the context of the event, using the WordNet vocabulary. This operation is possible only for tags with a



well-defined format (e.g. time) or for tags which represent instances of concepts previously identified as contextual ones in WordNet (e.g. location-based concepts). Subjective tags. The tags expressing opinions and emotions can be identified by means of WordNet. Organizational tags. These tags can be used to organize events and the user’s personal stuff. Thus, it is difficult to recognize them by using WordNet. We consider tags “organizational” when they are used with high frequency by the same user. In this case, we assume that the user used them in order to retrieve the event in the future. Last but not least, we also analyzed the meaning of the tag. WordNet can return the category to which the tag belongs, and we used this information to check whether the tag belongs to the same category as the event. E.g., a user could tag the movie “Ray” (about Ray Charles’s life) with the tag “jazz”, which is a lower concept of a WordNet category. Here we need to face the problem of the possible polysemy of tags, which can make the use of WordNet difficult. Finally, we also considered the action of adding tags as feedback for user modeling. From the action of tagging, we can infer the following user dimensions: i) the interactivity level, i.e. how much the user interacts with the system. It is related to the user’s willingness to interact and to the actual possibility the user has to interact. Since tagging requires more effort than other user actions, it is a relevant indicator of the user’s interactivity level; ii) the organization level, i.e. the user’s inclination to categorize things. The main motivation for tagging resources on the Web is to organize them in a personal way in order to better visualize, store and retrieve them later; iii) the user’s interests: the fact that a user spends time tagging a specific item is a strong indicator of a probable interest in it.
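The per-category checks described above can be sketched as follows. This is only an illustrative sketch and not the iCITY implementation: the small sets and dictionaries stand in for WordNet and for the iCITY event ontology, and every name and value in them (`ONTOLOGY_UPPER`, `ONTOLOGY_LOWER`, `SYNONYMS`, `CONTEXTUAL`, `SUBJECTIVE`, the threshold `org_threshold`) is a hypothetical assumption introduced for the example.

```python
# Toy stand-ins for WordNet and the iCITY event ontology (hypothetical data).
ONTOLOGY_UPPER = {"music", "cinema", "art", "theater", "books"}
ONTOLOGY_LOWER = {"jazz": "music", "opera": "music", "cubism": "art"}
SYNONYMS = {"film": {"movie"}, "movie": {"film"}}
CONTEXTUAL = {"turin", "evening", "weekend"}
SUBJECTIVE = {"boring", "great", "moving"}
DICTIONARY = (ONTOLOGY_UPPER | set(ONTOLOGY_LOWER) | set(SYNONYMS)
              | CONTEXTUAL | SUBJECTIVE)


def classify_tag(tag, description_words, proposed_tags, user_history,
                 org_threshold=3):
    """Return the set of category labels that apply to one tag."""
    t = tag.lower()
    words = {w.lower() for w in description_words}
    history = [h.lower() for h in user_history]
    labels = set()

    # Proposed vs. free: proposed tags are the ones the system offered.
    if t in {p.lower() for p in proposed_tags}:
        labels.add("proposed")
    else:
        labels.add("free")
        # A free tag unknown to the dictionary is treated as invented.
        if t not in DICTIONARY:
            labels.add("invented")

    # Generic vs. specific: upper ontology categories vs. lower concepts.
    if t in ONTOLOGY_UPPER:
        labels.add("generic")
    elif t in ONTOLOGY_LOWER:
        labels.add("specific")

    # Synonym: the tag is a synonym of a word in the event description.
    if SYNONYMS.get(t, set()) & words:
        labels.add("synonym")
    if t in CONTEXTUAL:
        labels.add("contextual")
    if t in SUBJECTIVE:
        labels.add("subjective")

    # Organizational: used with high frequency by the same user.
    if history.count(t) >= org_threshold:
        labels.add("organizational")
    return labels


labels = classify_tag("jazz", ["movie", "Ray", "Charles"], ["movie"], [])
```

A real system would replace the toy lookups with queries to WordNet and to the ontology, and would still have to deal with polysemy, which this sketch ignores.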
3) Matching between tags and user model dimensions. This classification of tags can be used to analyze how each category can be relevant for user modeling dimensions. If the user selects one of the proposed tags, we can infer a medium level of participation in the tagging activity, a low level of knowledge of the content and a medium level of organization attitude. None of these inferences is very strong, since this behavior could be caused by laziness or by the fact that she simply found the right tag among those suggested by the system. Thus, we need to further analyze the typology of the proposed tags. If the user selects the most popular tags, we can weakly infer that she trusts other community members and that she is close to the general opinion (high conformism level). In the same way, if she tends to tag with the same word over several interactions, we infer a propensity to be regular in habits (high orderliness level). Finally, the selection of tags recommended by the system implies a high level of trust in the system. In contrast, the fact that the user exploits a lot of free tags lets us make further assumptions. It could imply a high knowledge of the topic, high creativity, great participation in the tagging activity (since using personal words requires more effort than simply selecting one of the suggested tags) and a high level of organization. The last three inferred values are even higher when the free tag is an invented one. If the user uses specific words, this could indicate a deep knowledge of the topic. The converse does not hold: if she uses generic words, this does not necessarily



imply a low knowledge. In fact, if the generic words are appropriate, this could indicate a high knowledge that allows the use of highly abstract concepts. The use of synonyms could again imply a good knowledge of the topic and a high level of creativity, while contextual tags could imply that the user has a high practical knowledge, probably derived from direct participation in the event, and thus a high interest in it. If the meaning of the tag reveals some cross-categorization, this could imply a high knowledge of the event. Finally, organizational tags express a high attitude to organization and creativity, and subjective tags reveal a tendency to personalize the interaction.
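The matching between tag categories and user model dimensions can be written down as a simple rule table. The sketch below is only one possible encoding, under our own assumptions: the dimension names, the level labels (“low”, “medium”, “high”, “very high”, “good”, “present”) and the function `collect_evidence` are hypothetical, and the sketch deliberately collects the evidence per dimension without resolving conflicts (e.g. a proposed tag that is also specific), since the text marks several of these inferences as weak.

```python
from collections import defaultdict

# Rule table: tag category -> list of (user dimension, suggested level).
# The labels are an illustrative encoding of the inferences in the text.
RULES = {
    "proposed": [("participation", "medium"), ("knowledge", "low"),
                 ("organization", "medium")],
    "most_popular": [("conformism", "high"), ("trust_in_community", "high")],
    "most_used": [("orderliness", "high")],
    "recommended": [("trust_in_system", "high")],
    "free": [("knowledge", "high"), ("creativity", "high"),
             ("participation", "high"), ("organization", "high")],
    "invented": [("creativity", "very high"), ("participation", "very high"),
                 ("organization", "very high")],
    "specific": [("knowledge", "high")],
    "generic": [],  # generic words do not necessarily imply low knowledge
    "synonym": [("knowledge", "good"), ("creativity", "high")],
    "contextual": [("practical_knowledge", "high"), ("interest", "high")],
    "cross_category": [("knowledge", "high")],
    "organizational": [("organization", "high"), ("creativity", "high")],
    "subjective": [("personalization", "present")],
}


def collect_evidence(labels):
    """Group the suggested levels by user dimension for a set of tag labels."""
    evidence = defaultdict(list)
    for label in sorted(labels):  # sorted for a deterministic order
        for dimension, level in RULES.get(label, []):
            evidence[dimension].append((label, level))
    return dict(evidence)


evidence = collect_evidence({"free", "invented"})
```

How the collected evidence is turned into a single value per dimension (for instance by weighting weak and strong inferences differently) is left open here.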

4 An Architecture for User Model Interoperability in the Adaptive Social Web

According to our approach, what a web community application may wish to do with users’ tags is threefold: i) to extract them regardless of the format used to represent them; ii) to reason upon them in order to enrich its knowledge of the user (especially regarding interests and preferences); iii) to make such tags available to other web applications in order to achieve an effective exchange of knowledge⁷. In order to perform all three tasks, we are developing a modular architecture. The main components of this architecture are the Importer Module and the Exporter Module, which are respectively in charge of extracting the tags and of making them available to other applications. Let us analyze the tasks of each component in more depth, with the aid of Figure 1. Once the user provides her web community accounts, the Importer Module retrieves the corresponding files containing the set of tags used by the user in the web community tools she has pointed out. Then the Importer Module extracts the tags, independently of the format used to represent them. For instance, Del.icio.us uses the following RDF markup: ...

While Flickr uses the XML markup: naples ravello church

Once all the user’s tags have been extracted, the Importer Module looks for correspondences in the Event Ontology of iCITY, which is an RDFS ontology defined starting from the classification of events in TorinoCultura⁸ (a web portal managed by the municipality of Torino for informing citizens about ongoing cultural events in the city)

7 Some systems already make the list of tags available in some XML-based syntax (such as RDF, OWL).
8 http://www.torinocultura.it/



Fig. 1. Architecture

and enriched with WordNet⁹. In particular, classes and subclasses of the ontology are mapped onto WordNet concepts by means of the WordNetTab extension for Protégé, so that the tags inserted by the user can be automatically mapped onto the event ontology. Thus iCITY reasons over the tags to enhance its knowledge about the user’s interests and preferences (for a more detailed description of this mechanism see [8]). The Importer Module is in fact able to find a correspondence between the extracted user’s tags and the WordNet terms mapped onto the ontology. The system then increases the level of the inferred user’s interest in the class the tag belongs to (notice that the iCITY user model is designed as an overlay model of the Event Ontology). For instance, if Cristina has often used the tag “music” to annotate her bookmarks in Del.icio.us, iCITY infers that she likes music and proposes musical events to her. However, Cristina could also use either personal tags that have no correspondence with the ontology (e.g., “research-Links”, “personalLinks”), or terms related to a different concept. For instance, she could use the tag “Japanese” to annotate the web site of her favorite Japanese restaurant, yet not like Japanese music. To better understand the semantics of the tags, the Importer Module is designed to analyze the HTML keyword meta-tags of the web site the tagged bookmark refers to (as done by Del.icio.us). However, the module might fail to understand the meta-tags for some reason or, given the heterogeneity of the web community systems that tags can come from, there is the risk that no HTML meta-tags are available (e.g., Flickr). To overcome these problems and let other systems better understand the iCITY public tag-based user profile, we are defining in iCITY an Exporter Module which generates an RSS file containing RDF statements about the tags...of the tags the user inserts into the system.
In such a file each tag is linked to the class of the public and shared ontology it belongs to (the tag refers to an event which is automatically mapped onto the ontology). Thus, the other web-based systems that want to explore the

9 http://wordnet.princeton.edu/



tags a specific user used in the interaction with iCITY can find semantic hints for understanding and disambiguating the concepts.
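The two modules can be illustrated with a minimal sketch. It is not the iCITY code: the mapping `TERM_TO_CLASS` stands in for the WordNet terms mapped onto the Event Ontology, the predicate names `usedTag` and `refersToClass` are hypothetical, and the exporter returns plain (subject, predicate, object) tuples rather than an actual RSS/RDF serialization, whose exact vocabulary the paper does not specify.

```python
from collections import Counter

# Hypothetical mapping from WordNet-style terms to Event Ontology classes.
TERM_TO_CLASS = {
    "music": "Music",
    "jazz": "Music",
    "cinema": "Cinema",
    "theater": "Theater",
}


def import_tags(tags, user_model=None):
    """Importer sketch: raise the interest level of each matched class.

    The user model is an overlay of the ontology: one counter per class.
    Tags with no correspondence in the ontology are returned separately.
    """
    model = Counter(user_model or {})
    unmatched = []
    for tag in tags:
        ontology_class = TERM_TO_CLASS.get(tag.lower())
        if ontology_class is None:
            unmatched.append(tag)
        else:
            model[ontology_class] += 1
    return model, unmatched


def export_statements(user, tagged):
    """Exporter sketch: link every tag to the user and to its ontology class.

    `tagged` is a list of (tag, ontology_class) pairs.
    """
    statements = []
    for tag, ontology_class in tagged:
        statements.append((user, "usedTag", tag))
        statements.append((tag, "refersToClass", ontology_class))
    return statements


model, unmatched = import_tags(["music", "Jazz", "personalLinks"])
statements = export_statements("cristina", [("jazz", "Music")])
```

Note that a lookup of this kind cannot by itself resolve ambiguous tags such as “Japanese”; that is exactly the case in which the class attached to each exported tag gives other systems the semantic hint they need.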

5 Conclusion and Future Works

Several web community tools give users the chance to insert tags. In our opinion, tags are really useful in revealing users’ interests and preferences; thus user-adaptive systems may benefit from them to enrich their user models. This paper has presented a preliminary proposal which shows how systems can reason over tags to infer knowledge about users, and how tags could be exchanged over the web to share and re-use the user knowledge which is fragmented across the distributed user models of social web applications. We are investigating the possibility of sharing and exporting, using the same architecture and a shared format (e.g., RSS, RDF(S), OWL, UserML), the complete user model (or a portion of it) together with the user’s tags. In this way the user could save her user model and voluntarily submit her profile to adaptive social applications able to parse and understand such a format. Thus the user is no longer a passive entity, but has the power to control the information systems have stored about her. In this direction, the need for scrutable user models becomes crucial [11].

References

1. T, O.: What is web 2.0. design patterns and business models for the next generation of software (30/09/05)
2. M, T.: The rise of social software, networker. 7(3) (2003)
3. M, G., E, T.: Folksonomie. tidying up with tags. D-Lib Magazine 12(1), 2 (2006)
4. A., D., S, L., A, M.: Semantic halo for collaboration tagging systems. In: Workshop on Social Navigation and Community-Based Adaptation Technologies AH(2006) Dublin, Ireland (2006)
5. M., M., P, B.: The Adaptive Web. Communications of the ACM 45 (2002)
6. Carmagnola, F., Cena, F., Console, L., Cortassa, O., Ferri, M., Gena, C., Goy, A., Parena, M., Torre, I., Toso, A., Vernero, F., Vellar, A.: icity – an adaptive social mobile guide for cultural events. In: Proceedings of Mobile Guide 2006 (2006), http://mobileguide06.di.unito.it/
7. van Setten, M.R.B., van Vliet, H.L.G., van Huten, Y.M.V.: On the importance of “who tagged what”. In: Wade, V., Ashman, H., Smyth, B. (eds.) AH 2006. LNCS, vol. 4018, Springer, Heidelberg (2006)
8. Carmagnola, F., Cena, F., Cortassa, O., Gena, C., Torre, I.: Towards a tag-based user model: how can user model benefit from tags? In the Proceedings of UM2007 (to appear) (2007)
9. Strauss, A., Corbin, J.: Basics of qualitative research: grounded theory procedures and techniques. Sage Publications, Newbury Park, Calif (1990)
10. Xu, Z., Fu, Y., Mao, J., Su, D.: Towards the semantic web: Collaborative tag suggestions. In: Proceedings of workshop on collaborative Web tagging (2006)
11. J, K., J, K.R., P, L.: Foundations for personalised documents: a scrutable user model server. In: Proceedings of ADCS’ 2001 Australian Document Computing Symposium pp. 43–50 (2001)