Multi-source Provenance-aware User Interest Profiling on the Social Semantic Web
Fabrizio Orlandi
Digital Enterprise Research Institute, National University of Ireland, Galway
[email protected]
Abstract. The creation of accurate user profiles of interest across heterogeneous websites is a fundamental step for personalisation, recommendations and analysis of social networks. The opportunities offered by the Web of Data and Semantic Web technologies introduce interesting new challenges. In particular, the main benefits for user profiling techniques are the large amount of structured information that is already available and a solution to the “cold start” problem. On the other hand, it is difficult to manage a massive “open corpus” such as the Web of Data and to select only the relevant features and sources from a heterogeneous collection of datasets. Hence we propose using semantic technologies for interlinking social websites, together with provenance management on the Web of Data, to retrieve accurate information about data producers. The goal is to build comprehensive user profiles based on qualitative and quantitative measures of user activities across social sites.
Keywords: Social Web, Semantic Web, User Modelling, User Profiles, Web of Data, Provenance of Data.
1 Introduction
The extraction, analysis and representation of information about users’ knowledge and activities on the Web play an important role for software systems providing personalisation and recommendations to their users. The demand for personalisation on social media websites, search engines, e-commerce websites, etc. is clearly growing, and personalisation is becoming an essential part of every relevant web service. The challenge for web service providers is to provide accurate recommendations and personalisation without having to explicitly ask for users’ input and without making users wait through a long initial training period before they receive valuable recommendations (the “cold start” problem). To overcome these challenges it is important to create accurate user models and integrate relevant information about users from different sources on the Web [3]. In this regard the Web of Data is a valid and extensive source of information for profiling and recommendation algorithms. It offers structured data from different domains and communities, and it provides easily accessible, machine-readable data that can help to solve the “cold start” problem and to enrich the level of detail of user profiles. However, one of the main challenges in dealing with the Web of Data is to select only the relevant features and sources [2]. This is especially true in the social web context, where content is usually created by users themselves, changes frequently and is spread across multiple heterogeneous social sites. In this context provenance of data is a building block for establishing data trust and quality measures, for the knowledge acquisition and filtering process, and for the user profiling phase [4]. Moreover, semantic annotations and reasoning provide a solution to the problem of selecting only the relevant information.

Research Goals. The purpose of our research is to investigate: (i) how to extract relevant information from social media websites and make it available following the Linked Data principles; (ii) how to use the Web of Data as a global open corpus for personalisation purposes; (iii) the role of provenance on the Web of Data and how to use it for user profiling; (iv) how to improve current user profiling and personalisation techniques by leveraging the potential of Linked Data, provenance of data and Social Web Science.

J. Masthoff et al. (Eds.): UMAP 2012, LNCS 7379, pp. 378–381, 2012. © Springer-Verlag Berlin Heidelberg 2012
2 Research Contributions
Following a state-of-the-art review of the research areas of user modelling and personalisation, especially in relation to the Semantic Web field, our contributions in the first two years of research focused on the following:
1) The development of a framework for the semantic representation and data management of wikis. In particular we built an efficient application with a simple user interface enabling semantic searching and browsing capabilities on top of different interlinked wikis. We described how we designed a common model for representing social and structural wiki features and how we extracted semantic data from heterogeneous wikis [6].
2) A solution for representing and managing provenance of data from Wikipedia (and other wikis) using Semantic Web technologies. In particular we provided a specific lightweight ontology for provenance in wikis. We then implemented a framework for the extraction of provenance data from Wikipedia, as well as an application for accessing the generated data in a meaningful way and exposing it to the Web of Data.
3) An approach for modelling and managing provenance on DBpedia (one of the largest datasets on the Web of Data) using Wikipedia edits, and making this information available on the Web of Data. For this purpose a modelling solution, an information extraction framework and a provenance-computation system have been implemented [7].
4) A Semantic Web approach to filtering public microblog posts that match interests from personalised user profiles. Our approach includes the automatic generation of multi-domain, personalised user profiles of interest, the filtering of the Twitter stream based on the generated profiles, and the delivery of the matching posts in real time [5].
5) A system that allows users to set fine-grained privacy preferences for the creation of privacy-aware faceted user profiles on the Social Web.
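To make contribution 4 more concrete, the following is a minimal sketch of profile-based filtering, not the system described in [5]. It assumes that each post has already been annotated with concept labels and that a profile is a plain mapping from concept to weight. The names `Post`, `matches` and `filter_stream`, as well as the threshold rule, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Post:
    """A microblog post with concept annotations (assumed to be given)."""
    author: str
    text: str
    concepts: frozenset


def matches(profile: Dict[str, float], post: Post, threshold: float = 0.5) -> bool:
    """Return True if the profile weights of the post's concepts sum to at least the threshold."""
    score = sum(profile.get(concept, 0.0) for concept in post.concepts)
    return score >= threshold


def filter_stream(profile: Dict[str, float], posts: Iterable[Post],
                  threshold: float = 0.5) -> Iterator[Post]:
    """Lazily yield only the posts that match the profile, preserving stream order."""
    for post in posts:
        if matches(profile, post, threshold):
            yield post
```

Because `filter_stream` is a generator, it can consume an unbounded stream one post at a time, which is the property a real-time delivery setting would need. The summed-weight rule is only one possible matching criterion.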
Fig. 1. The complete profiling process: from user activities on heterogeneous social media websites (1), to their provenance representation (2), to the data aggregation and analysis (3)
3 Current and Future Work
The studies conducted on the use of semantics for interlinking social websites, and subsequently on provenance on the Web of Data, provide us with the necessary baseline for our current work. In particular we focus on building comprehensive user profiles based on quantitative and qualitative measures of user activities across different social websites. Provenance of data is particularly useful for evaluating, on each website and/or dataset, the type and amount of contributions to be attributed to a particular user. This would allow us to infer expertise, interests and qualitative estimations of users’ activities.

In more detail, we are now focusing on user profiling algorithms for Wikipedia users that take into account the different possible types of contributions on that wiki. These include not only contributions that change the content of Wikipedia articles but also those that result in changes on the Web of Data, in this case in the DBpedia dataset. Every edit in Wikipedia that involves structural features of an article results in a change in the DBpedia dataset. Hence we are currently investigating the relevance of those edits compared to the other types.

Moreover, one of our current activities involves real-time user profiling and personalisation on Twitter. The aim is to provide a user profiling framework that builds user profiles of interests across different social websites. These user profiles can then be used for personalising and filtering social web streams of messages such as the Twitter stream (see also [1]).

We plan to implement a framework that manages user models from different applications, starting from Wikipedia and integrating them with those from other social media websites. Specific ontologies will be used to represent and connect user models from different applications in an interoperable way. The application will be capable of collecting information from the Web of Data and using it to enrich the user models. Approaches to automatically aggregate ontology-based user models will be explored.
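As an illustration of how provenance records could feed a profile, the sketch below counts a user's contributions per topic and weights them by contribution type. It is a simplified example under stated assumptions, not our implemented algorithm: the `Contribution` record, the type labels `content` and `structural`, and the numeric weights are all hypothetical, and the relative relevance of structural edits is precisely what remains under investigation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical weights per contribution type (illustrative values only).
TYPE_WEIGHTS = {"content": 1.0, "structural": 2.0}


@dataclass(frozen=True)
class Contribution:
    """One provenance record: who contributed what kind of edit to which topic."""
    user: str
    topic: str
    kind: str    # e.g. "content" or "structural"
    source: str  # e.g. "wikipedia"


def build_profile(contributions: Iterable[Contribution], user: str,
                  type_weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return normalised interest scores for one user (scores sum to 1, or {} if none)."""
    weights = TYPE_WEIGHTS if type_weights is None else type_weights
    raw = defaultdict(float)
    for c in contributions:
        if c.user == user:
            # Contribution types without a weight are ignored (weight 0).
            raw[c.topic] += weights.get(c.kind, 0.0)
    total = sum(raw.values())
    if total == 0:
        return {}
    return {topic: score / total for topic, score in raw.items()}
```

Keeping the type weights as a parameter makes it straightforward to compare different assumptions about how much structural edits should count relative to content edits.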
4 Conclusions and Future Research Questions
The research plan described above is expected to be completed in two more years. One of the main challenges is the evaluation of the implemented user profiling algorithms. In particular, at the moment no corpus is available for measuring the quality of the developed methods, and user-based evaluations are demanding and require a large number of participants. Another challenge is the study of the different sets of features that a user profiling algorithm has to implement depending on the use case, the application and the source. For example, the profiling algorithm should adapt to the personalisation scenario, whether it is music recommendation or the filtering of a microblog stream. Moreover, the user model should also adapt to the source from which the user information is extracted. Users use different sites for different purposes, hence the activities performed and the interests expressed on social websites should be captured using different criteria. Thus, profile information originating from different sources may have different importance for the application that requires the aggregated profile, and this issue needs to be investigated. Finally, as our background is closer to the Semantic Web area of research, a more thorough investigation of user modelling topics will be beneficial. This applies especially to the automatic aggregation and representation of user models, which is crucial to our research: it is still an open research challenge, although extensive work has already been done.
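The idea that different sources may carry different importance for a given application can be sketched as a weighted merge of per-source profiles. This is only an illustration of the open question, assuming that each source profile is a mapping from interest to score and that each application supplies its own source weights. The function name `aggregate_profiles` and the weight values used with it are hypothetical.

```python
from typing import Dict


def aggregate_profiles(profiles_by_source: Dict[str, Dict[str, float]],
                       source_weights: Dict[str, float]) -> Dict[str, float]:
    """Merge per-source interest profiles into one profile normalised to sum to 1.

    Sources that have no entry in source_weights contribute nothing.
    """
    combined: Dict[str, float] = {}
    for source, profile in profiles_by_source.items():
        weight = source_weights.get(source, 0.0)
        for interest, score in profile.items():
            combined[interest] = combined.get(interest, 0.0) + weight * score
    total = sum(combined.values())
    if total <= 0:
        return {}
    return {interest: score / total for interest, score in combined.items()}
```

A music recommender and a microblog filter could call the same function with different `source_weights`, so the same underlying activity data would yield different aggregated profiles per application. How such weights should be chosen is the issue that still needs to be investigated.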
References
1. Abel, F., Gao, Q., Houben, G.-J., Tao, K.: Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part II. LNCS, vol. 6644, pp. 375–389. Springer, Heidelberg (2011)
2. Aroyo, L., Houben, G.: User modeling and adaptive Semantic Web. Semantic Web Journal (2010)
3. Carmagnola, F., Cena, F., Gena, C.: User model interoperability: a survey. User Modeling and User-Adapted Interaction (2011)
4. Hartig, O., Zhao, J.: Publishing and Consuming Provenance Metadata on the Web of Linked Data. In: McGuinness, D.L., Michaelis, J.R., Moreau, L. (eds.) IPAW 2010. LNCS, vol. 6378, pp. 78–90. Springer, Heidelberg (2010)
5. Kapanipathi, P., Orlandi, F., Sheth, A., Passant, A.: Personalized Filtering of the Twitter Stream. In: SPIM Workshop at ISWC 2011, pp. 6–13. CEUR-WS (2011)
6. Orlandi, F., Passant, A.: Semantic Search on Heterogeneous Wiki Systems. In: International Symposium on Wikis (WikiSym 2010). ACM (2010)
7. Orlandi, F., Passant, A.: Modelling provenance of DBpedia resources using Wikipedia contributions. Journal of Web Semantics 9(2) (2011)