Important User Group Based Web Service Recommendation
Lulan Yu1,2, Min Gao1,2, Xinyu Xiao1,2, Xiang Li1,2, Qingyu Xiong1,2
1 Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, Chongqing 400044, China
2 School of Software Engineering, Chongqing University, Chongqing 400044, China

Abstract—With the rapid growth of online services, web service recommendation systems (WSRS) have received extensive attention in both academia and industry. As an effective personalization technique, a WSRS solicits recommendations from its users and recommends appropriate services to target users. However, with the advent of shilling attacks, noisy attack profiles now degrade the accuracy of recommendation results. Since current state-of-the-art approaches rarely take such security aspects into consideration, we propose a novel recommendation framework based on the Important User Group (IUG), combined with a traditional collaborative filtering algorithm, to achieve robust web service recommendation. In our work, three selection methods are applied to obtain the IUG, eliminating a certain number of malicious users. Experimental results on Meizu-AppCom, WS-DREAM, and Epinions demonstrate the resilience of IUG to shilling attacks.

Keywords—Important User Group; User-based Recommendation Algorithm; QoS; Web Service; Shilling Attack
I. INTRODUCTION
With the accelerated proliferation of online businesses, relaying appropriate web services to target users has become a challenging issue. A Recommender System (RS) [1], which aims to help its users find services they are interested in, is recognized as an effective personalization technique that presents recommendations based on the previous preferences of its users and the choices of like-minded users. A prevalent approach, collaborative filtering (CF), has been discussed extensively in the literature [2], [3] due to its remarkable effectiveness and usability. In practice, subjects communicating in the digital world are characterized by virtuality, indirectness, and concealment. For this reason, identifying trustworthy users is a difficult task. Currently, there are two ways to cope with it: filtering users based on trust relationships [4] and recognizing reliable users [5]. Both are practical methods for improving accuracy. Additionally, many studies have confirmed that shilling attacks, a major concern at present, severely affect recommendation results [6], and the robustness of the system is seriously challenged [7]. A diverse array of detection schemes has been developed, such as identifying malicious users or reducing the effect of fake profiles, based on supervised or unsupervised learning, classification, and clustering [8]. Unfortunately, most earlier work attempted to enhance the accuracy and efficiency of the recommendation algorithm
without considering shilling attacks. Motivated by the above issues, we propose a novel web service recommendation algorithm based on the Important User Group (IUG), which retains users with a high importance ranking and filters out low-ranking ones. The aim is to improve recommendation accuracy in the presence of attacker profiles, which previous studies may not have treated as noisy data. The IUG architecture consists of three phases: user-service information collection, IUG selection, and QoS prediction and recommendation. We also propose three types of IUG selection criteria: QoS-based, user behavior-based, and social relationship-based.

To evaluate our IUG-based algorithm, we conducted three experiments: one on Meizu-AppCom, a real-world dataset that we crawled and that contains information contributed by 921002 users from 726 apps, and the others on the well-known WS-DREAM [9] and Epinions datasets. Compared with the traditional approach, the experimental results indicate that our algorithm overcomes the vulnerability to shilling attacks at a reasonable overhead.

The rest of the paper is organized as follows. Section 2 presents the fundamentals of user-based collaborative filtering and shilling attacks. In Section 3, we introduce the concept of the Important User Group and its selection criteria, and propose four indicators of user behavior. The framework of our IUG algorithm is described in Section 4. The experimental studies in Section 5 demonstrate the robustness of our algorithm. Lastly, Section 6 concludes the paper.

II. BACKGROUND
A. Collaborative Filtering Recommendation Algorithms
CF, an effective and prevalent recommendation approach, solicits and relays recommendations among users. Without analyzing the contents of items or extracting features, CF relies solely on the target user's rating information to identify a group of similar users whose preferences resemble those of the target user. This method has been widely adopted by large companies throughout the world, for instance Netflix [10] and YouTube [11]. For QoS-aware CF algorithms, traditional web service recommendation mainly consists of three phases: similarity computation [12], QoS value prediction [5], [13], [14], and recommendation [2]. Researchers have employed many techniques to improve CF accuracy
and effectiveness, among which the QoS-aware algorithms analyzed in [2], [14] paved the way for later research. To enhance similarity computation, adjusting user similarity scores [15] and calculating QoS attribute values [12] are both applicable. Tolerance to user-item matrix sparsity was analyzed in [16], where a time-aware CF approach was first described. The studies [17], [18] proposed applying location information to the CF recommendation process to overcome the weaknesses of data sparsity and scalability.

B. Shilling Attack
Due to the open nature of collaborative systems, recent studies have shown that traditional collaborative filtering recommender systems (CFRS) are extremely vulnerable to so-called "shilling attacks", in which attackers counterfeit multiple user profiles to increase or decrease how often target web services are invoked [6]. Attackers commonly construct attack profiles according to certain attack models, which require little knowledge of the system yet exert great influence. The profile of a shilling attacker is essentially an injected record of ratings on various items, and it normally comprises four types of items: Target Items, Filler Items, Selected Items, and Unrated Items (Non-voted Items) [19]. Three classical attack models are used in our experimental studies: Random Attack, Average Attack, and Bandwagon Attack. They are briefly explained below.

1) Random Attack: In the Random Attack, Filler Items are assigned random ratings in a small range around the average rating contributed by all users. Depending on the purpose of the attack, this model generates push-attack or nuke-attack profiles, assigning the maximum or minimum rating, respectively, to the target item. The Random Attack is relatively simple, since only limited knowledge of the system database is needed to mount it.
2) Average Attack: The Average Attack requires the average rating of each item and is therefore considerably more sophisticated than the Random Attack. To affect as many recommendation results as possible, it relies on more neighboring users, which means it needs more fundamental information from the rating dataset. Consequently, estimating the average rating of each item takes time. However, since some systems explicitly provide information on user preferences, implementing this attack becomes easier.

3) Bandwagon Attack: The Bandwagon Attack is derived from the Random Attack but differs in its use of Selected Items, a handful of well-known popular items that are given the maximum rating. The attacker usually chooses a large number of popular items in the recommender system as Filler Items, so that the attack profiles resemble real user profiles, which enhances the effectiveness of the attack.

We use the above three basic attack models in the experiments section to analyze the IUG and to evaluate the IUG-based recommendation framework.

III. IMPORTANT USER GROUP SELECTION
Numerous analyses have shown that malicious users tend to behave in similar ways, such as writing numerous reviews in a short period of time, commenting only on particular items, or leaving only the highest or the lowest ratings. Because their data are unusable for recommendation, we define them as the Unimportant User Group (UUG), some of whose members are attackers. The IUG is a collection of genuine users with reliable comments and trustworthy ratings of web services, which provides effective reference and guidance for predicting future recommendations. To capture the features of the IUG and define it precisely, we propose three types of selection criteria: QoS-based, user behavior-based, and social relationship-based, which are described in the following subsections. We can either apply one of them, depending on the characteristics of the dataset, or combine them to achieve more sophisticated and accurate filtering.

A. IUG selection based on QoS
Known as a vital performance index of Web services, QoS is often used to represent the set of non-functional properties of Web services. The study [6] shows that the distribution of QoS data is long-tailed: approximately 90% of response times fall in the range [0, 1] second, which we call the Normal Range. Thus, users who generate more QoS values in the Normal Range are considered more important. The ratio $P_i$ representing the importance of user $i$ is calculated as follows:

$$P_i = \frac{N_i}{N_n} \tag{1}$$
where $N_i$ denotes the number of response-time records in the Normal Range that belong to user $i$, and $N_n$ denotes the number of all response-time records in all user profiles.

B. IUG selection based on User Behaviors
In a WSRS, user behaviors mainly consist of giving ratings or comments for services. Owing to personal habits, the services rated by a user reflect that user's interest pattern. To identify the IUG reasonably and precisely, we propose the following four key indicators, which are used throughout the paper.

Average Deviation of User Rating: This is the average deviation of each rating given by a user from the mean of that user's ratings, designed to measure the fluctuation magnitude of the user's data. It is defined as

$$R_b = \frac{\sum_i \left| R_{ui} - \bar{R}_{ui} \right|}{N_{ur}} \tag{2}$$

where $R_{ui}$ is the rating that user $u$ gave to service $i$, $\bar{R}_{ui}$ is the average of all the ratings contributed by user $u$, and $N_{ur}$ is the number of ratings given by $u$.

Range Value of User Rating: This is the difference between the highest and the lowest ratings of a single user, also intended to reveal the fluctuation amplitude of the user's ratings. We define it as

$$R_r = R_{max} - R_{min} \tag{3}$$

where $R_{max}$ is the maximum rating value of the user and $R_{min}$ is the minimum one.
User Active Rating Day Proportion: This is the ratio of the number of days on which the user was active to the number of ratings the user gave, intended to measure the normality of user behaviour. It is calculated as follows:

$$R_d = \frac{N_d}{N_{ur}} \tag{4}$$

where $N_d$ is the number of active days on which the target user gave ratings, and $N_{ur}$ is the number of all of his/her ratings.
User Interest Degree: This represents the degree of a user's interest in target services and is also a measure of the normality of user behaviour. It is defined as

$$R_i = \frac{N_a}{N_{ur}} \tag{5}$$

where $N_a$ is the number of services the user has rated and $N_{ur}$ is the number of all of his/her ratings.

Based on the four indicators above, a feature vector is created for each user and normalized, and the importance mark is obtained by summing the elements of this vector. All the importance marks are then sorted in descending order, and the users at the top of the list constitute the IUG.

IV. IMPORTANT USER GROUP BASED SERVICE RECOMMENDATION FRAMEWORK
In this section, we propose a recommendation framework based on the Important User Group, as illustrated in Fig. 1. After data selection, the framework consists of two main parts: IUG selection and prediction of QoS.

Fig. 1. The framework of web service recommendation algorithms based on IUG
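To make the IUG selection step concrete, the following Python sketch computes the QoS-based ratio of Eq. (1) and the four behaviour indicators of Eqs. (2)-(5), then ranks users by their importance mark. It is only an illustration under stated assumptions, not the authors' implementation: the function names, the `(service_id, rating, day)` tuple layout, the `top_k` cut-off, and the choice of min-max normalization are all hypothetical, since the text does not specify the normalization method or the size of the IUG.

```python
from statistics import mean

NORMAL_RANGE = (0.0, 1.0)  # seconds; the "Normal Range" for response time


def qos_importance(response_times):
    """Eq. (1): P_i = N_i / N_n for every user.

    response_times maps user -> list of response times in seconds.
    N_n is the number of response-time records over all users.
    """
    lo, hi = NORMAL_RANGE
    total = sum(len(times) for times in response_times.values())
    return {
        user: sum(lo <= t <= hi for t in times) / total
        for user, times in response_times.items()
    }


def behaviour_features(ratings):
    """Indicators of Eqs. (2)-(5) for one user.

    ratings is a list of (service_id, rating, day) tuples (assumed layout).
    """
    values = [r for _, r, _ in ratings]
    n_ur = len(values)
    avg = mean(values)
    r_b = sum(abs(v - avg) for v in values) / n_ur   # Eq. (2)
    r_r = max(values) - min(values)                  # Eq. (3)
    r_d = len({d for _, _, d in ratings}) / n_ur     # Eq. (4)
    r_i = len({s for s, _, _ in ratings}) / n_ur     # Eq. (5)
    return [r_b, r_r, r_d, r_i]


def min_max(column):
    """Min-max normalization (an assumption; the paper only says 'normalized')."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def select_iug(user_ratings, top_k):
    """Rank users by the sum of their normalized indicators; keep the top_k."""
    users = list(user_ratings)
    feats = [behaviour_features(user_ratings[u]) for u in users]
    columns = [min_max(list(col)) for col in zip(*feats)]
    marks = {u: sum(col[i] for col in columns) for i, u in enumerate(users)}
    ranked = sorted(users, key=lambda u: marks[u], reverse=True)
    return ranked[:top_k], marks
```

In this sketch, a user who gives the same extreme rating to many services on a single day scores low on Eqs. (2)-(4) and therefore falls toward the bottom of the ranking, which matches the behaviour pattern the text attributes to the UUG.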
C. IUG selection based on Social Relationships
The input of a WSRS based on social relationships is composed of two parts: user relationship information and user history information (ratings, response time, throughput, etc.). High-degree user nodes are known to be important: the more relationships a user has with others, the more essential he/she is to the user network [20], [21]. Thus, in this paper, we apply two crucial centrality measures of social networks to select the IUG, namely degree centrality and betweenness centrality.

Degree Centrality: Degree centrality is the most direct indicator of node centrality in social networks. The greater the degree of a node, the more important the node is in the network. The degree centrality of a node can be represented by the following formula:

$$C_D(v_i) = \frac{d_i}{n - 1} \tag{6}$$
where $n$ represents the total number of nodes in the network to which node $v_i$ belongs, and $d_i$ represents the degree of the node.

Node Betweenness Centrality: Betweenness centrality measures the number of shortest paths in a social network that pass through a node. User nodes with high betweenness play a greater role in communication between nodes. The betweenness centrality of a node can be represented by the following formula: $C_B(v_i) = \sum \sigma_{st}(v_i) / \sigma_{st}$ (7), where the sum runs over $v_s \neq v_i \neq v_t$, s