
MOBILE INFORMATION SYSTEMS

Time-Ordered Collaborative Filtering for News Recommendation

XIAO Yingyuan 1,2, AI Pengqiang 1, Ching-Hsien Hsu 1,3, WANG Hongya 4, JIAO Xu 1

1 Tianjin University of Technology, 300384, Tianjin, China
2 Tianjin Key Lab of Intelligence Computing and Novel Software Technology, 300384, China
3 Chung Hua University, 30012, Taiwan
4 Donghua University, 201620, Shanghai, China

Abstract: Faced with hundreds of thousands of news articles on news websites, users find it difficult to locate the articles they are interested in. Therefore, various news recommender systems have been built. In news recommendation, the news articles read by a user typically form a time sequence. However, traditional news recommendation algorithms rarely consider the time sequence characteristic of user browsing behaviors, so their performance in predicting the next news article a user will read is not good enough. To solve this problem, this paper proposes a time-ordered collaborative filtering recommendation algorithm (TOCF), which takes the time sequence characteristic of user behaviors into account. In addition, a new method for computing the similarity among different users, named time-dependent similarity, is proposed. To demonstrate the efficiency of our solution, extensive experiments are conducted along with detailed performance analysis.

Keywords: time sequence; time-dependent similarity; time-ordered collaborative filtering

China Communications • December 2015

I. INTRODUCTION

The ever-growing popularity of mobile devices and the rapid advent of wireless technology enable people to access the Internet more easily [1, 2]. In particular, more and more people are accustomed to reading news on a PC or smartphone. A variety of news websites and news aggregation websites, such as Google News and Yahoo News, provide online news services for users. However, faced with thousands of news articles at every moment, users find it difficult to locate the articles they are interested in. Personalized news recommendation technology is the most effective way to solve this problem.

News recommender systems aim to offer news articles to a user based on his/her interests. To reflect a specific user's interests in the recommendations, those interests are predicted from the user's behavior data or from the content of the news articles he/she has read. There are three basic kinds of news recommendation algorithms: content-based, collaborative filtering and hybrid approaches. In this paper, we focus on collaborative filtering.

In traditional collaborative filtering algorithms, user browsing behaviors are always treated as a set or collection of news articles. Some research work focuses on adding temporal characteristics to news recommendation algorithms [3, 4]. However, few studies take the time sequence characteristic into account. Therefore, although many news recommendation algorithms can make meaningful recommendations, few of them are designed to predict the next news article that a user will read after reading a specific news article, which is also an important need for users.

In this paper, we take the time sequence characteristic of user behaviors into account when computing the similarity among different users and propose a time-dependent similarity measure. The time sequence characteristic of user behaviors is then also integrated into the recommendation algorithm, yielding a news recommendation algorithm named time-ordered collaborative filtering (TOCF), which treats the specific news article as a context when predicting the next news article for a user. In summary, the contributions of our work are twofold:

• A novel user similarity measure: unlike prior user similarity measures, the time-dependent similarity measure integrates the time sequence characteristic of user behaviors into the computation of user similarity. It can compute the similarity between a long-term user and a relatively short-term user in a reasonable way, avoiding the defect of the traditional Jaccard similarity measure.
• Time-ordered collaborative filtering algorithm: we propose a novel news recommendation algorithm called time-ordered collaborative filtering (TOCF). Experimental results show that TOCF performs better than traditional collaborative filtering algorithms when predicting the next news article a user will read. In addition, after analyzing the experimental results, we find that users who share similar interests also have similar reading patterns (see Section 5).

The rest of this paper is organized as follows. Section 2 reviews the existing work relevant to personalized news recommendation. In Section 3, the problem definition and baseline predictor are given. In Section 4, the time-dependent similarity measure and TOCF are proposed. Section 5 evaluates our method through extensive experiments. Finally, Section 6 concludes this paper.

II. RELATED WORK

2.1 Content-based recommendation

Content-based recommendation methods are based on a description of the item and a profile of the user's preferences. In a content-based news recommender system, keywords are used to abstract the features of news articles, and a user profile is built to indicate the type of news article the user likes. Based on the feature representations of news articles and the user profile, content-based methods recommend news articles that may be of interest to the user.

The key to content-based methods is the representation of news articles. News content is often represented using the vector space model, and a well-known weighting method is TF-IDF [5]. Before the TF-IDF values are calculated, a series of preprocessing steps is executed, including stop-word removal, tokenization and stemming. A news article is then represented by a keyword vector in which each entry is the TF-IDF value of the corresponding keyword. The user profile can be created from the user's past behaviors. For a newly published news article, the similarity between the user profile and the article can be computed with a similarity function (Jaccard similarity or cosine similarity).

However, the TF-IDF method represents a news article as a bag of words, which ignores the semantic relations between words. Semantic methods were therefore introduced into news recommendation systems. These methods make use of domain ontologies [6], lexical ontologies [7] containing synsets associated with corresponding lexical representations, named entities [8], or tags [9]. For example, Goossen et al. [10] extended the TF-IDF measure by using ontological concepts. Capelle et al. [11] proposed a method called SF-IDF, which combines TF-IDF with WordNet synsets.



In addition, a news article can be represented by a topic distribution. For example, probabilistic Latent Semantic Indexing (PLSI) and Latent Dirichlet Allocation (LDA) have also been used in news recommendation.
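The TF-IDF pipeline described in this subsection can be illustrated with a minimal sketch. It is not the implementation of any cited system: the stop-word list, the sample headlines and the helper names (tokenize, tfidf_vectors, user_profile, cosine) are assumptions made for illustration, stemming is omitted, and the user profile is assumed to be the plain average of the vectors of the articles already read.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}


def tokenize(text):
    """Lowercase, split on non-letters and drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def user_profile(read_vectors):
    """Assumed profile: average of the vectors of the articles the user read."""
    profile = defaultdict(float)
    for vec in read_vectors:
        for term, weight in vec.items():
            profile[term] += weight / len(read_vectors)
    return dict(profile)


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical headlines; the user is assumed to have read article 0.
articles = [
    "Stock markets rally as tech shares climb",
    "Tech shares slide after earnings report",
    "Local team wins the football championship",
    "Football fans celebrate championship victory",
]
vectors = tfidf_vectors(articles)
profile = user_profile([vectors[0]])
scores = {i: cosine(profile, vectors[i]) for i in (1, 2, 3)}
```

In this toy corpus only article 1 shares keywords with the article already read, so it is the only candidate with a non-zero score. Two articles that describe the same event with different vocabulary would score zero, which is the bag-of-words limitation noted above.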

2.2 Collaborative filtering recommendation

Unlike content-based approaches, which use the content of news articles, collaborative filtering methods leverage the ratings of other users to predict the rating of news articles for a particular user. In news recommendation, the ratings are typically binary: a click on a piece of news corresponds to 1, whereas a non-click is represented as 0 [12]. Collaborative filtering methods can be categorized into two types: memory-based algorithms and model-based algorithms. Memory-based algorithms make rating predictions for users based on their past ratings. These algorithms predict the rating of a newly published news article based on a group of users who share similar reading preferences. GroupLens [13] gathered and disseminated the ratings of users to predict scores of news articles for a particular user, based on the heuristic rule that people who agreed in the past will probably agree again. Model-based collaborative filtering algorithms build users' profiles in a probabilistic way. For example, SVD and SVD++ are used in recommendation algorithms [14].

Collaborative filtering recommendation algorithms can efficiently capture users' interests from their past behaviors, without needing to process the content of news articles. Therefore, collaborative filtering algorithms are content-free. However, because hundreds of thousands of news articles are published on news websites, a user can read only a small fraction of them, which leads to sparsity. In addition, cold start is a potential problem for collaborative filtering recommendation algorithms.

Content-based algorithms and collaborative filtering algorithms can both make meaningful recommendations. However, each has its advantages and disadvantages. Recent research has demonstrated that a hybrid approach [15, 16], combining collaborative filtering and content-based methods, can be more effective in some cases. For example, Liu et al. [17] developed a Bayesian framework for predicting users' current news interests from the activities of a particular user and the news trends demonstrated in the activity of all users. They combine a content-based recommendation mechanism, which uses learned user profiles, with an existing collaborative filtering mechanism to generate personalized news recommendations.
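As an illustration of memory-based collaborative filtering over binary click data, the following is a minimal sketch and not the GroupLens algorithm or the method proposed in this paper. The user names, article identifiers, function names (cosine_binary, recommend) and the choice of cosine similarity over click sets are all assumptions made for the example.

```python
import math
from collections import defaultdict


def cosine_binary(a, b):
    """Cosine similarity between two click sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target, clicks, k=2, top_n=3):
    """Rank unread articles by the summed similarity of the k nearest
    neighbours who clicked them (a simple, assumed scoring rule)."""
    target_clicks = clicks[target]
    neighbours = sorted(
        (
            (cosine_binary(target_clicks, other_clicks), user)
            for user, other_clicks in clicks.items()
            if user != target
        ),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, user in neighbours:
        if sim <= 0:
            continue  # a neighbour with no overlap carries no evidence
        for article in clicks[user] - target_clicks:
            scores[article] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical click histories: a click is 1, everything else is 0.
clicks = {
    "alice": {"n1", "n2", "n3"},
    "bob": {"n1", "n2", "n4"},
    "carol": {"n2", "n3", "n4", "n5"},
    "dave": {"n6"},
}
recommendations = recommend("alice", clicks)
```

Each history here is an unordered set, so the order in which a user clicked the articles plays no role in the similarity or in the ranking. Most articles are absent from most histories even in this toy data, which mirrors the sparsity problem mentioned above, and a user with an empty history receives no recommendations at all, which is the cold-start problem.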

, where denotes the identifier of the news article read by , represents the news article read by the user at the time and tk1