Logic Profiling for Multicriteria Rating on Web Pages

Alessandra Mileo¹

Abstract. I propose a general framework for user-oriented, content-based recommender systems that provides preferential multicriteria rating of Web pages and automatically generates and updates the user’s profile through logic programming techniques.

1 Introduction

The World Wide Web (WWW) is a large, distributed set of heterogeneous documents that is constantly changing, which makes it hard for a visitor to locate interesting pages by exploring all the possible paths where pertinent information can be found. The solutions proposed so far are mainly based on machine learning techniques [4, 5], sometimes tracing the user’s whole browsing behavior to update the profile [3], and are generally content-based. Several limitations of these systems showed that multicriteria rating, multidimensionality and flexibility are needed to improve the understanding of users and items [1].

2 User’s Profile Generation

In my approach, the user’s browsing behavior is represented by a set of meaningful user actions, traced in a log file during browsing and extracted in the form of logic predicates when the user asks the system for suggestions on the links of a page. The user’s profile is related to i) sources and ii) reasons to recommend a link.

Interests in sources are represented by a set of pages Url_i (in the form of URL addresses) that are interesting to the user. These pages can be a subset of the pages included among the favourite links in the browser menu, and they are written as a set of facts of the form fav(Url_i). When a user asks for suggestions, the initial profile is enriched with new facts derived from the user’s recent browsing behavior.

Which actions should be considered meaningful, and consequently used to update the profile, is a crucial issue, since a user may perform several kinds of actions that are mostly irrelevant in determining the user’s preferences in a specific context. Thus, I decided to select a reduced set of actions among those traced in the log file and to rewrite them as logic predicates to be used in the inference process. The filtering mechanism considers the following actions meaningful:

i. visiting a Url with frequency N for an average dwell time D: vis(Url, N, D);
ii. adding/removing a Url to/from the list of favourites at time T: add(Url, T) / rem(Url, T);
iii. following/discarding a suggestion with frequency N: foll(Url, N) / disc(Url, N).

Any other action is filtered out.
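As a minimal sketch of the filtering step, the following Python fragment turns raw log records into the predicates listed above. The record layout (action, url, value), the action names "visit" and "scroll", the example URLs and the function name extract_predicates are assumptions made for illustration; the log format itself is not specified here.

```python
from collections import defaultdict

# Hypothetical raw action names; only these are considered meaningful.
MEANINGFUL = {"visit", "add", "rem", "foll", "disc"}


def extract_predicates(log):
    """Turn raw log records into vis/add/rem/foll/disc facts, dropping the rest."""
    dwell = defaultdict(list)   # url -> dwell time (ms) of each visit
    counts = defaultdict(int)   # ("foll" | "disc", url) -> frequency
    facts = []
    for action, url, value in log:
        if action not in MEANINGFUL:
            continue                            # any other action is filtered out
        if action == "visit":
            dwell[url].append(value)            # value is the dwell time in ms
        elif action in ("add", "rem"):
            facts.append((action, url, value))  # value is the timestamp T
        else:
            counts[(action, url)] += 1
    for url, times in dwell.items():
        # vis(Url, N, D): N visits with average dwell time D
        facts.append(("vis", url, len(times), sum(times) / len(times)))
    for (action, url), n in counts.items():
        facts.append((action, url, n))
    return facts


log = [
    ("visit", "a.example", 4000),
    ("scroll", "a.example", None),   # not meaningful, dropped
    ("visit", "a.example", 6000),
    ("add", "b.example", 17),
    ("disc", "c.example", None),
    ("disc", "c.example", None),
]
facts = extract_predicates(log)
```

In an actual system these tuples would be written out as logic facts for the reasoner; tuples are used here only to keep the sketch self-contained.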

Preferences on reasons are defined by the user in the form of an ordering relation on the reasons for which a page can be recommended. Each reason for recommendation is associated with a function f_i(P_0, P_j) representing a relation between the page the user is currently visiting, P_0, and each of the pages linked from P_0, namely P_1, ..., P_j. In preliminary tests I considered three reasons to recommend a link: f_1 is related to a similarity metric, f_2 expresses how a page is correlated with interesting sources, and f_3 combines both criteria. According to the ordering relation, each function f_i is associated with a value p(f_i) ∈ {1..n}, where n is the number of reasons, that will be used for ranking.
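One simple way to derive the values p(f_i) from an ordering is sketched below. It assumes that the most preferred reason receives the highest value, which is consistent with the values used in Example 1; the function name priorities and the string labels are hypothetical.

```python
def priorities(order):
    """Map reasons listed from most to least preferred to the values n..1."""
    n = len(order)
    return {reason: n - k for k, reason in enumerate(order)}


# The order f1, f3, f2 (most preferred first) is the one used in Example 1.
p = priorities(["f1", "f3", "f2"])
```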

3 Updating Profile by Adapting Rules

The formal setting I propose for profile update is that of Answer Set Programming (ASP). ASP is based on the stable model semantics for logic programs proposed by Gelfond and Lifschitz [2], and it can be seen as bringing together concepts and results from Logic Programming, Default Reasoning and Deductive Databases. The intuition is to use automated commonsense (nonmonotonic) reasoning to update the user’s interests by revising the logic program (adding new facts and rules according to information from the log file) and by modifying numerical thresholds in logic rules. Updating the parameters makes it possible to adapt the learning process to different users, as well as to a user whose interests change.

A page Url_j is classified as being either interesting or uninteresting. In particular, Url_j is interesting if the log file records that it has been added to the list of favourites during browsing². Thus, Url_j is interesting either by default (it is in the list of favourites) or if:

i) Url_j has a frequency N of visits and an average dwell time D above the dwell-time threshold, or
ii) the recommended Url_j has been visited with frequency N > Min_f.

When a page is inferred to be uninteresting, the default mechanism detects an exception to the default. This may happen in three cases:

i) Url_j has a frequency N of visits and an average dwell time D below the dwell-time threshold;
ii) the suggested Url_j is ignored with a frequency N > Min_d;
iii) Url_j is removed from the favourites at time T and not added again later.

The values of the dwell-time threshold (expressed in milliseconds), Min_f (real) and Min_d (real) are initially fixed by the system as equal to the minimum value extracted from the log file, and are then revised on the basis of how the user reacts to suggestions.
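The classification rules can be approximated procedurally as in the sketch below. This is not the ASP encoding, only a Python illustration of the default and its exceptions. The fact tuples ("vis", url, N, D), ("add", url, T), ("rem", url, T), ("foll", url, N) and ("disc", url, N), the parameter names dwell_min, min_f and min_d, and the function name are hypothetical. Two further assumptions are made: exceptions take precedence over positive evidence, and a page with no evidence at all is reported as not interesting.

```python
def is_interesting(url, favourites, facts, dwell_min, min_f, min_d):
    """Classify url as interesting (True) or uninteresting (False)."""
    mine = [f for f in facts if f[1] == url]
    adds = [f[2] for f in mine if f[0] == "add"]   # timestamps of additions
    rems = [f[2] for f in mine if f[0] == "rem"]   # timestamps of removals

    # Exceptions to the default (cases i-iii for "uninteresting").
    short_visits = any(f[0] == "vis" and f[3] < dwell_min for f in mine)
    ignored = any(f[0] == "disc" and f[2] > min_d for f in mine)
    # Removed and not added again later: the last removal follows every addition.
    removed_for_good = bool(rems) and max(rems) > max(adds, default=float("-inf"))
    if short_visits or ignored or removed_for_good:
        return False

    # Default (favourite or added during browsing) and positive evidence (i-ii).
    long_visits = any(f[0] == "vis" and f[3] > dwell_min for f in mine)
    followed = any(f[0] == "foll" and f[2] > min_f for f in mine)
    return url in favourites or bool(adds) or long_visits or followed


facts = [
    ("vis", "a", 3, 9000.0),   # long visits
    ("rem", "b", 5),
    ("add", "b", 8),           # added again after removal
    ("rem", "c", 9),           # removed and never re-added
    ("disc", "d", 4),          # suggestion ignored 4 times
]
favourites = {"b", "c", "d"}
result = {u: is_interesting(u, favourites, facts, 5000, 2, 2) for u in "abcde"}
```

With these illustrative thresholds, pages a and b are classified as interesting, while c, d and the unseen page e are not.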

¹ Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano. Email: [email protected]

² Supposing a sort of coherence of the search, the log file is cleaned each time the user closes the browser or manually types a different URL into it, thus moving to a new search context.

4 Computing Suggestions

The process of computing suggestions and adding them to a Web page consists of four phases:

1. Clustering: page P_0 and the pages P_j (j = 1..m) pointed to by P_0 at the first level of depth are clustered on the basis of a similarity function s(P_0, P_j) (e.g. cosine similarity);
2. Preferred Ranking: page P_0 and the pages P_j are given a weight W_j = |WC_j| + |WI_j|, where:
   WC_j = {Url_i | Url_i is in the same cluster as P_j}
   WI_j = {Url_i | Url_i points to/is pointed to by P_j}

3. Computing similarities:
   (a) f_1(P_0, P_j) is equal to the value of the similarity function s(P_0, P_j) if P_0 and P_j are in the same cluster, zero otherwise;
   (b) f_2(P_0, P_j) is equal to |WC_j| if P_0 and P_j are in the same cluster, zero otherwise;
   (c) f_3(P_0, P_j) is equal to W_j − W_0 if W_j > W_0, zero otherwise.
   The values of f_i are arranged in a similarity matrix S ∈ ℝ^(j×i), where the j-th row corresponds to page P_j, the i-th column corresponds to correlation function f_i, and each element s_ji is the result of f_i(P_0, P_j). S is multiplied by a correlation matrix C ∈ ℝ^(i×i), a diagonal matrix in which each element d_ii on the diagonal is the weight of f_i according to the preferences on reasons expressed by the ordering relation. This operation yields a weighted similarity matrix S_w ∈ ℝ^(j×i). Each row of S_w is associated with a value r_j, a linear combination of the elements of the j-th row, which is then used to rank pages.
4. Adding suggestions: once r_j has been computed for each page P_j, results are presented to the user by associating symbols with the URLs in P_0 or by showing an ordered list of the URLs of P_0, accessible by clicking on the corresponding link.

Example 1. Suppose user U is browsing page P_0, which points to four pages P_j, j = 1..4. The interesting sources are Url_i, i = 1..4. Suppose the following similarities³: s(P_i, P_i) = 1, s(P_0, P_3) = 0.92, s(P_0, Url_4) = 0.87, s(P_1, Url_1) = 0.7, s(P_1, Url_2) = 0.8, s(P_2, Url_3) = 0.9, s(P_3, Url_4) = 0.78. The values are placed in the similarity matrix S; weights and links are illustrated in Figure 1. Suppose we expressed the order f_1 ≻ f_3 ≻ f_2; thus p(f_1) = 3, p(f_3) = 2 and p(f_2) = 1 are placed on the diagonal of matrix C.

[Figure 1. Example 1: preferred ranking. Graph of P_0, the pages P_1..P_4 and the sources Url 1..Url 4, with weights W_0 = 2, W_1 = 2, W_2 = 1, W_3 = 2, W_4 = 4.]
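The three functions of phase 3 can be sketched as a single helper that builds one row of S. The function name similarity_row and its argument names are hypothetical. The weights are those shown in Figure 1, and |WC_3| = 1 is read from the matrix S of Example 1. For pages outside the cluster of P_0 the value passed as wc_size is a placeholder, since f_2 is zero for them anyway.

```python
def similarity_row(s0j, same_cluster, wc_size, w_j, w_0):
    """Return [f1, f2, f3] for a page P_j linked from P_0."""
    f1 = s0j if same_cluster else 0.0             # similarity, same cluster only
    f2 = float(wc_size) if same_cluster else 0.0  # |WC_j|, same cluster only
    f3 = float(w_j - w_0) if w_j > w_0 else 0.0   # weight increment over P_0
    return [f1, f2, f3]


W0 = 2
S = [
    similarity_row(0.0, False, 0, 2, W0),   # P1 (wc_size is a placeholder)
    similarity_row(0.0, False, 0, 1, W0),   # P2 (wc_size is a placeholder)
    similarity_row(0.92, True, 1, 2, W0),   # P3, in the same cluster as P0
    similarity_row(0.0, False, 0, 4, W0),   # P4 (wc_size is a placeholder)
]
```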

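Matrix S_w is given by S · C as follows:

\[
S_w =
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0.92 & 1 & 0 \\
0 & 0 & 2
\end{pmatrix}
\begin{pmatrix}
3 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 2
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
2.76 & 1 & 0 \\
0 & 0 & 4
\end{pmatrix}
\]

³ Similarity results for the remaining combinations of pages are lower than a fixed minimum, so they are considered equal to zero.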

The ranking value r_j for page P_j is a linear combination of the elements of the j-th row of S_w, i.e. $r_j = \sum_{i=1}^{3} s_{ji}$ with s_ji taken from S_w; thus r_1 = r_2 = 0, r_3 = 2.76 + 1 = 3.76, and r_4 = 4. Note that P_3 is less preferred than P_4 despite being in the same cluster as P_0. This is due to the combination of two factors. First, the correlation between P_0 and P_3 is not as substantial as the increment of interestingness the user could get by visiting page P_4. Second, function f_3, which combines the interestingness of a page with the similarity metric, is not much less preferred by the user than f_1, which is based on similarity only. As a consequence, given the preferential nature of the multicriteria framework, page P_3 is slightly less preferred than P_4, because browsing through P_4 results in a significant increment of interestingness for the user, although P_3 is more similar to P_0 than P_4 is. A different order on the functions f_i would affect this result.
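The arithmetic of Example 1 can be checked with a few lines of plain Python. The matrix entries and the weights are those of the example; the variable names are chosen here for illustration. Because C is diagonal, the product S · C reduces to scaling each column of S by the corresponding weight.

```python
# Rows are P1..P4, columns are f1, f2, f3 (values from Example 1).
S = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.92, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
weights = [3, 1, 2]   # diagonal of C: p(f1), p(f2), p(f3)

# Multiplying by a diagonal matrix scales column i by weights[i].
Sw = [[s * w for s, w in zip(row, weights)] for row in S]

# r_j is the sum of the j-th row of Sw.
r = [sum(row) for row in Sw]

# Page indices (1-based) sorted by decreasing r_j; ties keep their original order.
ranking = sorted(range(1, len(r) + 1), key=lambda j: r[j - 1], reverse=True)
```

The resulting ranking places P_4 first and P_3 second, in agreement with the discussion above. Swapping the weights of f_1 and f_3 in the list is a quick way to see how a different order on the functions changes the outcome.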

5 Conclusions and Future Work

Although no prototype is available for complete validation yet, preliminary tests on a few log file instances showed that the classical machine learning approaches mentioned in the introduction could give better solutions only when the continuous browsing activity of different users is taken into account. My alternative approach considers the profile as user- and interaction-oriented, in that only the recent browsing activity of the user who is running the system is detailed and used for profile update. In this case, the cost of using a reasoning system at run time is not too much to pay for the benefits that may be gained, as acceptable recommendations can be obtained even after a few user interactions by combining rating criteria. Non-intrusiveness is granted by the fact that the initial user’s interests are extracted almost automatically: no complex statistical information or feature selection is needed.

There could potentially be n correlation functions⁴; this would generate an n-dimensional matrix expressing the preferential order on the functions f_i, thus allowing multicriteria rating by a linear combination of the final results of each criterion. Nonetheless, more detailed experimental results are needed to evaluate the effectiveness of this approach and provide significant empirical data. These aspects will be detailed in a future full paper, which will also investigate further extensions of the framework, such as i) ordering the rules applied to infer the interestingness of a page and ii) quantifying the degree of interestingness of a page.

REFERENCES
[1] G. Adomavicius and A. Tuzhilin, ‘Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions’, IEEE Trans. Knowl. Data Eng., 17, 734–749, (2005).
[2] M. Gelfond and V. Lifschitz, ‘The stable model semantics for logic programming’, ICLP/SLP, 1070–1080, (1988).
[3] H. Lieberman, ‘Letizia: An agent that assists web browsing’, IJCAI, 1, 924–929, (1995).
[4] D. Mladenić, ‘Web location of personal web watcher project’, http://www-ai.ijs.si/DunjaMladenic/pww.html, (1999).
[5] J.M. Pazzani and D. Billsus, ‘Learning and revising user profiles: The identification of interesting web sites’, Machine Learning, 3, 313–331, (1997).

⁴ This number is limited by the available space and may affect performance, but n > 2 already allows multicriteria rating to be applied.
