Personalized Blog Recommendation Using the Value, Semantic, and Social Model

Tse-Ming Tsai, Chia-Chun Shih
Institute for Information Industry, Taipei, Taiwan
{eric, chiachun}@iii.org.tw

Seng-cho T. Chou
Department of Information Management, National Taiwan University, Taipei, Taiwan
[email protected]
Abstract

A blog has the potential to change the way we perceive information and make friends. Using the value, semantic, and social models of the blogger, this work analyzes user interfaces and user profiles to support proactive recommendation. By retrieving the personal value inclination, article topic semantics, and interactive social network of bloggers, this study takes a step toward a blog recommendation system. Related work, the analysis model, the system architecture, the application UI, and a discussion are presented.

1. Introduction

How people perceive and publish personal information has always been an interesting and critical problem, because it reflects how humans form, communicate, and use information. The media form has evolved from bamboo, paper, television broadcasting, email, and the web to, now, blogs. A blog removes the intermediary in channel selection, so everyone can present themselves without any filtering mechanism. Searching for blogs of like-minded people and reading interesting blog articles have become part of bloggers’ daily lives. However, this activity is usually limited to the boundary of one’s social reach. Because many preferences, opinions, and relationships accumulate in the blogosphere, this study looks into the possibility of brokering those of like mind in a more comprehensive and automatic manner, to make the information space friendlier. We analyze existing blog user interfaces and use mining technology to recommend blogs to these bloggers.

2. Related work

2.1. Recommender systems

Information overload is a general problem in the web era, and the recommender system is one of the most popular remedies. A recommender system helps people locate potentially interesting items, freeing them from frustrating browsing sessions [7]. The key issue is designing a method to estimate which items a user favors. Existing methods can be divided into three approaches [1]: (1) content-based approaches, (2) collaborative approaches, and (3) hybrids of the two. However, these approaches all rely on an accumulation of the user’s favorite items to enhance performance. One solution is to include a user model, which helps in understanding user characteristics when fewer usage records have been gathered [6].

2.2. User Modeling

In their primary form, user profiles cover user interests, user preferences, and simple user demographics, which are mostly derived from questionnaires. These characteristics can be gathered from explicit investigation and/or implicit observation. User preferences are useful in e-commerce, but they fall short when applied to recommending articles in the Blogosphere. Unlike products with clear and objective descriptions, blog articles are often more subjective and emotional. User interests and preferences alone do not provide an in-depth description of the user’s mindset [10].
1-4244-0674-9/06/$20.00 ©2006 IEEE.
2.3. Blogs

The blog is becoming the most popular form for users to represent themselves on the Web. According to an estimate by Technorati [8], the number of blogs worldwide doubles every six months. People use blogs to share their lives with friends, express opinions, and introduce things they find interesting. Blog content is therefore an excellent source for understanding bloggers. Blogs not only give people an opportunity to express themselves but are also a platform for interaction and communication. Comments, trackbacks, blogrolls, links, and syndications are some of the ways bloggers interact in the Blogosphere [9]. These interactions are useful for understanding how one communicates with others and can be used as critical clues for building personal profiles.
2.4. Semantic Tagging

Tagging is a de facto feature of recent web applications. Tagging allows users to define their own terms to arrange items [11]. Unlike a traditional classification scheme, which uses a top-down approach to centrally control the hierarchy structure, tagging works bottom-up and relies on “collective intelligence” to drive democratically chosen classification results [12]. Although tagging enhances usability, precision and recall may drop [13]. Some studies have tried to augment tagging with ontologies or taxonomies and have shown promising results [4].
3. Analysis model

In this section, we introduce our three-dimension analytical model for blogs. We view a blog from three perspectives. First, a blog is a personal publication platform. Bloggers show some aspects of themselves in their blogs, so content published in personal blogs is a useful hint from which recommender systems can infer bloggers’ interests. Second, blog content is highly subjective. Common recommendation techniques, which take an objective view to model users and items, are not appropriate for the Blogosphere. Last, a blog is a two-way communication platform that promotes social interactions. These interactions imply different degrees of care toward others, such as understanding, concern, and intimacy. We summarize these perspectives with a three-dimension model whose dimensions are semantic, value, and social, respectively (see Figure 1).

Figure 1. Three-dimension model (dimensions: Semantic, Value, Social)
3.1. Semantic

To understand blog content, we need to capture the semantics it implies. We use two approaches: hard approaches and soft approaches. Hard approaches use NLP (Natural Language Processing) and IR (Information Retrieval) techniques to parse blog articles. Key concepts or terms of articles are extracted and then used to build personal ontologies. Soft approaches borrow the power of tags, which are sets of user-defined terms used to arrange items. With the help of tags, the term extraction process can be omitted. Personal ontologies are useful for locating bloggers’ domain characteristics, such as interests and expertise [14].
3.2. Value

Pure semantic approaches can only discover limited user characteristics. To obtain in-depth user characteristics, an appropriate user value model should be defined in advance. Lu et al. proposed a value-driven model [2] (see Figure 2), which derives many personal value attributes from an exchange view and a norm view. We present our personal value model for the Blogosphere based on Lu et al.’s work.

Figure 2. Referenced value model [2]
3.3. Social

Social Network Analysis (SNA) provides useful tools for looking into a social system. A blog is a social platform, and SNA can help analyze blog communities. Through comments, trackbacks, and sometimes blogrolls or syndication activities, bloggers build connections to others [15]. In most cases, these interactions leave traces in blog content, so we can track them without the aid of user activity logs. Since different kinds of user interactions imply different types and degrees of relationships [3], we identify possible types of relationships and map interactions to relationships with different degrees.
4. System architecture
Our proposed system architecture is shown in Figure 3. Users access the system through the existing blog infrastructure. A Content Aggregator collects blog content, and the Content Extractor formats the content to support follow-up processing. Semantic Analysis and Social Analysis process blog content to discover bloggers’ areas of interest and social patterns. A personal value model of the user is calculated from a pre-designed questionnaire. All computation results are stored in user profiles, which are the basis of recommendations.
Figure 3. System Architecture (components: User, Blog Infrastructure (Comment, Trackback, RSS), Recommendation, Questionnaires, Value Model, User Profile, Semantic Analysis, Social Analysis, RSS, Content Aggregator, Content Extractor)

4.1. Blog infrastructure

The blog infrastructure constitutes the interfaces facing users. Native blog functionalities, such as comments, trackbacks, and RSS subscription, as well as recommendation and personalization components, are built into the blog infrastructure.

4.2. Content aggregator

The Content Aggregator routinely collects blog content and stores the up-to-date content.

4.3. Content extractor

The Content Extractor transforms blog pages into a pre-defined data format. We extract the title, author(s), timestamps, article content, link(s), tag(s), comment(s), and trackback(s) from each blog article. The blogger’s profile, blogroll(s), and syndication activities are extracted from blogs as well. Table 1 shows our decomposition of blog articles.

Table 1. Decomposition of blog articles

Blog Articles
  Title
  Author
  Publication Timestamp
  Permalink Address
  Tags/Categories
  Article: Raw Content, Content, Extracted Links
  Trackbacks
  Comments: Raw Comments, Content, Extracted Links
  …(More Comments)

4.4. Semantic analysis

The major purpose of Semantic Analysis is to discover bloggers’ areas of interest. Given all the content a blogger has published, key terms are highlighted through IR/NLP and/or user tagging approaches. The distribution of term frequency can then be taken as an important indicator of the blogger’s areas of interest. Our semantic analysis approach will be based on an “evolving ontology”. With the help of tags, we can infer users’ semantic cognition of objects, which makes it possible to build a user-oriented semantic model [5]. As more objects are tagged, the model evolves and completes itself. Existing ontologies, for example DMOZ [16], are referenced in the process to merge and correlate tags. Existing ontologies may enhance performance in the initial stage, when fewer objects have been tagged.
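The term-frequency indicator described above can be illustrated with a minimal sketch of the soft (tag-based) approach. The input shape (one list of tags per article), the lowercasing step, and the function name `interest_distribution` are assumptions made for illustration; they are not taken from the system described in this paper, and the ontology-merging step is not modeled.

```python
from collections import Counter


def interest_distribution(articles):
    """Return the relative frequency of each tag across a blogger's articles.

    `articles` is a list of tag lists, one per article (assumed input shape).
    Tags are lowercased and stripped so that "Travel" and "travel" merge.
    """
    counts = Counter(
        tag.strip().lower()
        for tags in articles
        for tag in tags
        if tag.strip()
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing tagged yet: no interest estimate
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical tags from three articles by one blogger.
posts = [["Travel", "photography"], ["travel", "food"], ["travel"]]
dist = interest_distribution(posts)
top_interest = max(dist, key=dist.get)
```

In this sketch the most frequent tag stands in for the blogger's main area of interest; a fuller version would first map tags onto shared concepts, as the evolving-ontology idea suggests.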
4.5. Social analysis

In Social Analysis, we build a social network to describe the Blogosphere. In the social network, nodes represent bloggers and arcs represent relationships between bloggers. Each arc is associated with a weight that shows the degree of relationship between the bloggers. Arc weights are calculated as follows:

$$\mathrm{SocialDegree}(a, b) = \sum_{R_i(a, b) = 1} \mathrm{strength}_i(R_i) \tag{1}$$
Each type of interaction is given a value to show the interaction strength. The degree of relationship of two bloggers is the sum of interaction strength between the two bloggers.
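As a rough sketch of this sum, the snippet below adds up one strength value per interaction type observed between two bloggers. The strength values, the triple-based input format, and the choice to treat interactions as directed from a to b are all assumptions for illustration; the paper does not specify them.

```python
# Hypothetical strength per interaction type; the paper gives no concrete values.
STRENGTH = {"comment": 1.0, "trackback": 2.0, "blogroll": 3.0, "syndication": 1.5}


def social_degree(interactions, a, b):
    """Sum the strength of each interaction type observed from a to b.

    `interactions` is an iterable of (source, target, type) triples.
    Each type counts at most once, mirroring the condition R_i(a, b) = 1.
    Unknown interaction types contribute 0.
    """
    observed = {kind for (src, dst, kind) in interactions if src == a and dst == b}
    return sum(STRENGTH.get(kind, 0.0) for kind in observed)


# Hypothetical interaction records extracted from blog content.
records = [
    ("alice", "bob", "comment"),
    ("alice", "bob", "comment"),    # repeated type is counted once
    ("alice", "bob", "trackback"),
    ("bob", "carol", "blogroll"),
]
degree_ab = social_degree(records, "alice", "bob")
degree_ac = social_degree(records, "alice", "carol")
```

Counting repeated interactions, or making the relationship symmetric, would be equally plausible readings of the formula and only require changing how `observed` is built.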
4.6. Value model

Currently, we have defined four attributes to describe one’s value orientation:
• Freshness: how much users respect the timeliness of information
• Cost-sensitivity: the degree to which users care about money
• Popularity: the degree to which users enjoy popular information
• Controversial: how much users enjoy participating in controversial discussions
We will use questionnaires to measure the degree of each attribute.
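One way to turn questionnaire answers into a value vector is sketched below. The 1-to-5 Likert scale, the averaging of answers per attribute, the rescaling to [0, 1], and the default of 0.0 for unanswered attributes are assumptions chosen for illustration; the questionnaire design itself is not specified in this paper.

```python
# Attribute order of the value vector (names follow the four attributes above).
VALUE_ATTRIBUTES = ("freshness", "cost_sensitivity", "popularity", "controversial")


def value_vector(answers, scale_min=1, scale_max=5):
    """Average the answers per attribute and rescale each mean to [0, 1].

    `answers` maps an attribute name to a list of Likert scores
    (assumed to lie between scale_min and scale_max).
    """
    vector = []
    for attr in VALUE_ATTRIBUTES:
        scores = answers.get(attr, [])
        if not scores:
            vector.append(0.0)  # assumption: no answers means no expressed inclination
            continue
        mean = sum(scores) / len(scores)
        vector.append((mean - scale_min) / (scale_max - scale_min))
    return vector


# Hypothetical questionnaire answers for one user.
answers = {
    "freshness": [5, 5],
    "cost_sensitivity": [1, 3],
    "popularity": [3],
    "controversial": [],
}
profile = value_vector(answers)
```

A vector in this form can be compared between two bloggers with a similarity measure such as cosine similarity.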
in the left sidebar. Our calculation results are in the right sidebar, including recommended people, recommended articles, personal FOAF file icon, and personal value model.
4.7. User profile

Personal characteristics are stored in a specific user profile format. We plan to use FOAF [17] and the relationship extension [18] to describe users. If necessary, we will add customized elements to FOAF as well.
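A profile along these lines could be serialized as RDF/XML roughly as follows. This is only a sketch: the `ext` namespace URI, the element names under it, and the function `build_profile` are hypothetical stand-ins for the customized elements, and the actual profile schema used by the system is not given in this paper. The FOAF and RELATIONSHIP namespace URIs are those of the cited vocabularies.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
REL = "http://purl.org/vocab/relationship/"
EXT = "http://example.org/blogrec#"  # hypothetical namespace for customized elements

for prefix, uri in (("rdf", RDF), ("foaf", FOAF), ("rel", REL), ("ext", EXT)):
    ET.register_namespace(prefix, uri)


def build_profile(name, weblog, friends, values):
    """Build a minimal FOAF profile with relationship links and value scores."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}name").text = name
    ET.SubElement(person, f"{{{FOAF}}}weblog", {f"{{{RDF}}}resource": weblog})
    for friend in friends:
        # rel:friendOf comes from the RELATIONSHIP vocabulary.
        link = ET.SubElement(person, f"{{{REL}}}friendOf")
        other = ET.SubElement(link, f"{{{FOAF}}}Person")
        ET.SubElement(other, f"{{{FOAF}}}name").text = friend
    for attr, score in values.items():
        # Customized (hypothetical) elements carrying value-model scores.
        ET.SubElement(person, f"{{{EXT}}}{attr}").text = f"{score:.2f}"
    return ET.tostring(root, encoding="unicode")


profile_xml = build_profile(
    "Alice", "http://example.org/alice", ["Bob"], {"freshness": 0.8}
)
```

Keeping the customized scores in a separate namespace leaves the FOAF part readable by any FOAF-aware tool.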
4.8. Recommendation

We will combine three kinds of clues to calculate a score for each pair of blogger and blog article. The score estimates how likely the blogger is to be interested in the article. The formula is as follows:

$$\mathrm{Score}(U, A) = \alpha \cdot \mathrm{SemanticSim}(U, A) + \beta \cdot \mathrm{SocialDegree}(U, \mathrm{Author}(A)) + \gamma \cdot \mathrm{ValueSim}(U, \mathrm{Author}(A)) \tag{2}$$

where U is a blogger, A is an article, and Author(A) is the author of article A. In (2), SocialDegree is computed as in (1). The semantic and value characteristics of bloggers and articles are represented as vectors, and their similarity is calculated using the cosine similarity measure:

$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{3}$$
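The weighted combination and the cosine measure can be sketched as below. The weights (0.5, 0.3, 0.2) are placeholders, since the paper does not report values for α, β, and γ, and the sketch assumes the social degree has already been scaled to a range comparable with the two similarity terms.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(user_sem, article_sem, social_degree, user_val, author_val,
          alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of semantic similarity, social degree, and value similarity.

    alpha, beta, gamma are hypothetical weights; social_degree is assumed
    to be pre-scaled (for example to [0, 1]).
    """
    return (alpha * cosine_similarity(user_sem, article_sem)
            + beta * social_degree
            + gamma * cosine_similarity(user_val, author_val))


# Hypothetical vectors: semantic term weights and four value attributes.
same_topic = score([3, 4], [3, 4], 1.0, [1, 0, 1, 0], [1, 0, 1, 0])
unrelated = score([1, 0], [0, 1], 0.0, [1, 0, 0, 0], [0, 1, 0, 0])
```

Ranking all candidate articles for a blogger by this score and returning the top few would give the recommendation list.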
5. Results and discussion

Figure 4 shows a simplified example of a user profile in our system. The profile is based on FOAF and represented as an XML file. In addition to basic personal information, the profile covers the user’s semantic, social, and value characteristics. The system also calculates a score for each characteristic using the formulas described in Section 4. The score represents the user’s degree of preference for the characteristic. Figure 5 shows the user interface. We customized Drupal (http://www.drupal.org) to implement our user interface. As seen in Figure 5, common blog elements, such as the tag cloud, friend list, and blogrolls, are presented
Figure 4. XML representation of user profile (sections: Personal Information, Semantic, Social, Value)
6. Conclusions

Through the three dimensions of the value, semantic, and social models, our recommendation approach can be applied to the emerging Blogosphere and can improve bloggers’ experience in gathering featured items. Current approaches may not be comprehensive enough, since the way people use blogs continues to evolve; however, we can start to envision a more active and sensitive web platform for people to connect with one another. The system can know users better and provide more accurate information the more they reveal about themselves. Matching accuracy is the next topic to address, and privacy and authorization issues cannot be ignored either. By retrieving the personal value inclination, article topic semantics, and interactive social network of bloggers, this study takes a step toward a blog recommendation system.
Figure 5. User Interface

7. Acknowledgement

This research was supported by the III Innovative and Prospective Technologies Project of the Institute for Information Industry, sponsored by MOEA, ROC, and by the National Science Council of the Republic of China under Grant Numbers NSC95-3114-P-001-002-Y02 and NSC 94-2218-E-002-054.

8. References

[1] G. Adomavicius and A. Tuzhilin, “Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions”, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, June 2005.
[2] Mei-Rung Lu and Soe-Tsyr Yuan, “A Value-Driven Architecture Strategy of Adaptive SOA and EDA”, International Conference of Information Management Research and Practice, Taiwan, 2005.
[3] C. Marlow, “Audience, structure and authority in the weblog community”, paper presented at International Communication Association, 2004.
[4] Z. Xu, Y. Fu, J. Mao, and D. Su, “Towards the Semantic Web: Collaborative Tag Suggestions”, Collaborative Web Tagging Workshop on Fifteenth International World Wide Web Conference (WWW’06), 2006.
[5] P. Schmitz, “Inducing Ontology from Flickr Tags”, Collaborative Web Tagging Workshop on Fifteenth International World Wide Web Conference (WWW’06), 2006.
[6] M. Degemmis, P. Lops, G. Semeraro, M. F. Costabile, O. Licchelli, and S. Guida, “A hybrid collaborative recommender system based on user profiles”, International Conference on Enterprise Information Systems, 2004.
[7] M. Balabanović and Y. Shoham, “Fab: content-based, collaborative recommendation”, Communications of the ACM, vol. 40, no. 3, March 1997.
[8] D. Sifry, “State of the Blogosphere”, available at http://technorati.com/weblog/2006/02/81.html, Technorati, February 2006.
[9] C. Miller and D. Shepherd, “Blogging as Social Action: A Genre Analysis of the Weblog”, available at http://blog.lib.umn.edu/blogosphere/, 2004.
[10] M. Efron, “Cultural Orientation: Classifying Subjective Documents by Cocitation Analysis”, AAAI Fall Symposium on Style and Meaning in Language, Art, and Music, 2004.
[11] S. Golder and B. Huberman, “The Structure of Collaborative Tagging Systems”, Technical report, Information Dynamics Lab, HP Labs, 2005.
[12] C. Shirky, “Ontology is Overrated: Categories, Links, and Tags”, available at http://shirky.com/writings/ontology_overrated.html, 2005.
[13] A. Mathes, “Folksonomies – Cooperative Classification and Communication Through Shared Metadata”, Computer Mediated Communication – LIS590CMC, Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, available at http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html, 2004.
[14] L. Razmerita, A. Angehrn, and A. Maedche, “Ontology based user modeling for Knowledge Management Systems”, 9th International Conference on User Modeling, Pittsburgh, 2003.
[15] S. Herring, I. Kouper, J. Paolillo, and L. Scheidt, “Conversations in the blogosphere: An analysis "from the bottom up"”, the 38th International Conference on System Sciences, 2004.
[16] Open Directory Project (http://dmoz.org)
[17] FOAF project (http://xmlns.com/foaf/0.1)
[18] RELATIONSHIP: A vocabulary for describing relationships between people (http://purl.org/vocab/relationship)