2010 Second International Conference on MultiMedia and Information Technology

A Graph Indexing Approach for Content-Based Recommendation System

Tao PENG, Wendong WANG, XiangYang GONG, Ye TIAN
State Key Laboratory of Networking and Switching, Beijing University of Posts and Telecommunications, Beijing, China
e-mail: [email protected]

XiaoGang YANG, Jian MA
Nokia Beijing Research Center, Beijing, China

Abstract- Conventional content-based recommendation systems use classification algorithms to partition items into groups and generate one ranking list of items per group. An important characteristic of such systems is that they use the same ranking list to make recommendations for every item in a group, ignoring differences among items inside the group. This paper proposes a content-based recommendation system built on top of a weighted undirected graph. The graph describes the content similarity between items based on the semantic relations of their metadata. The neighbors of a node in the graph form a ranking list of items to be recommended, and there is one ranking list for each item, so the system is able to emphasize differences among related items. We developed a prototype of the proposed system in the Kaleido Photo project, and it proves sufficient for recommending the photos most similar to the one the user is viewing.

Keywords- Graph Indexing; Content-Based Recommender System; Semantic Relations; Feature Similarity

I. INTRODUCTION

With the explosion of information, recommendation systems are playing an increasingly important role in daily life. Most recommendation systems fall into three categories [1]: content-based recommendation systems, collaborative filtering recommendation systems, and hybrids of the two. Content-based recommendation systems recommend items similar to the ones a user previously chose. Collaborative filtering recommendation systems make recommendations based on what people with similar tastes and preferences liked in the past. Hybrid recommendation systems combine content-based and collaborative methods.

Conventional content-based recommendation systems usually use classification algorithms such as a Bayesian classifier to partition items into groups and generate a ranking list for each group [2]. The ranking list is used to make recommendations for the items in the group, so every item in the same group receives the same recommendations and the differences between items within a group are ignored.

978-0-7695-4008-5/10 $26.00 © 2010 IEEE DOI 10.1109/MMIT.2010.84

In the meantime, graph indexing is getting attention in the content-based information retrieval area [3]. Content-based recommendation systems have their roots in content-based information retrieval. Semantic relations of item metadata have been proposed as a basis for calculating the similarity between items [4], and semantic graph indexing has already been considered as an efficient information retrieval technique [5].

In this paper, we propose a content-based recommendation system built on top of a weighted undirected semantic relation graph. The graph describes the similarity between items based on the semantic relations of their metadata. Each node in the graph represents an item in the system. The neighbors of a node form a ranking list of items to be recommended, and there is one ranking list for each item in the graph. The proposed system is therefore able to make recommendations for all items while also accounting for the differences among them. We also developed a prototype of the proposed recommendation system in the Kaleido Photo project, which aims at improving the quality of photo sharing services for mobile device users.

The rest of the paper is organized as follows. In Section 2, we present the system architecture. We describe how we build the semantic relation graph index in Section 3, and how recommendations are made in Section 4. In Section 5, we present a performance evaluation using precision and recall metrics. Finally, we conclude our work and describe future work in Section 6.

II. SYSTEM OVERVIEW

This section describes the architecture of the proposed content-based recommendation system. We also discuss the key parts of the system, as well as some related methods. Figure 1 depicts the architectural design of the proposed system. It consists of three major components. The system is driven by user-supplied items. Generating the semantic relation graphs requires reading a large amount of data from the metadata database. To reduce the impact on the performance of the main item metadata database, we use a replica database that duplicates all metadata from the main metadata database, and build the relation graphs from the replica.

Authorized licensed use limited to: BEIJING UNIVERSITY OF POST AND TELECOM. Downloaded on June 23,2010 at 07:20:12 UTC from IEEE Xplore. Restrictions apply.

Figure 1. Proposed System Architecture. [Diagram labels: Users Uploading Items; Main Item Metadata DB; Database Replication; Item Metadata Replica DB; Single Dimensional Relation Graph Builders (Location, Time, Author and Event Relation Graph Builders); Graph Index Fusing; Index DB; Recommendation Matching/Dispatching; Content-Based Recommendation.]

The single dimensional graph builders are a collection of relation graph builders. Each relation graph builder builds a semantic relation graph based on one item metadata attribute. The relation graph is an undirected weighted graph. Each node of the graph represents an item in the system. The similarity of two nodes' metadata attribute determines the weight of the edge connecting them. The graph therefore describes the similarity of a single metadata attribute between all items in the system. We define the weight $w_{ij}$ of the edge connecting items $P_i$ and $P_j$ as

$$w_{ij} = \omega(A_i, A_j)\, f(A_i, A_j) \qquad (1)$$

where $A_i$ and $A_j$ are the values of the corresponding metadata attribute for the two items, and $\omega$ is the weight function determined by the similarity of $A_i$ and $A_j$.

The graph index fusing component combines all graphs generated by the single dimensional graph builders. It does this by adding each edge of those graphs between the corresponding nodes of the output graph. If the edge already exists, the weight of the new edge is added to the weight of the existing edge. The output graph is still an undirected weighted graph, but it now describes the semantic relations of all metadata attributes between items.

In the semantic relation graph generated by this component, the similarity of two items is ranked by the weight of the edge connecting them. We define the weight $W$ of the edge between two nodes $P_i$ and $P_j$ as

$$W(P_i, P_j) = \sum_{k=1}^{n} \omega(A_{ik}, A_{jk})\, f(A_{ik}, A_{jk}) \qquad (2)$$

where $n$ is the number of item metadata attributes, $A_{ik}$ is the value of attribute $A_k$ for item $P_i$, and $\omega$ is the weight given to the similarity in value of attribute $A_k$ between the two items $P_i$ and $P_j$. The definition of $f$ depends on the type of the attribute passed in. As in [6], we normalize $f$ to take values in [0, 1]. In the next section, we discuss in detail how we determine $\omega$ and $f$.

We implemented a prototype of the proposed system in the Kaleido Photo project. The Kaleido Photo deployment is drawn in Figure 2, where clients communicate with the server via WIFI or mobile networks. We also set up a testing environment as shown in Figure 3.

Figure 2. Example of Prototype System Deployment [Diagram labels: Kaleido Photo Server; Public Internet; WIFI Router; Internet Tablets; Cell Phone; GSM; GSNs; Base Station.]

Figure 3. Testing Environment [Diagram labels: Kaleido Photo Server; LAN; WIFI Router; Internet Tablets.]

III. GENERATING SEMANTIC RELATION GRAPH INDEX

In this section, we illustrate the proposed system in detail based on our prototype implementation in the Kaleido Photo project.


A. Single Dimensional Relation Graph Generation

Here we explain the single dimensional relation graph builder using the time attribute of photos as an example. Graphs for the other single dimensional relations (e.g., location, author, event) are calculated similarly. To generate a single dimensional relation graph using (1), we need to define ω and f for each metadata attribute. Table I lists the definitions of f used in our prototype implementation. For each metadata attribute, we choose a granularity and group photos into collections that form a single-attribute hierarchy. Photos in the same hierarchy are considered similar to each other. The lower the group two photos share in the hierarchy, the more similar they are considered, and thus the higher the weight value ω given to the edge connecting them in the single dimensional relation graph.

TABLE I. METADATA ATTRIBUTES USED IN PHOTO RECOMMENDATION

Attributes  | Type   | Domain      | Distance Measure
------------+--------+-------------+------------------------
Taken Time  | Year   | YYYY        | Y1 == Y2 ? 1 : 0
Upload Time | Year   | YYYY        | Y1 == Y2 ? 1 : 0
Location    | String |             | |L1 ∩ L2| / Lmax
GPS         | Double | ,           | |G1 - G2| > 10 ? 0 : 1
Author      | String |             | A1 == A2 ? 1 : 0
Event       | String |             | |E1 ∩ E2| / Emax
Emotion     | String | Collections | |E1 ∩ E2| / Emax
Tags        | String | Collections | |T1 ∩ T2| / Tmax

For example, in the time dimension, we first choose a granularity of one day to group photos. Photos are then grouped into collections that form a time hierarchy like the one in Figure 4. We use this hierarchy to determine the similarity between photos. Photos in the same leaf group, as well as photos in the same upper group, are considered similar to each other, so the function f returns 1 for them in (1).

All photos belonging to the same group are considered to have the same similarity to each other. Photos in the same leaf group of the hierarchy are the most closely related in time, so they are given a larger ω in (1). Photos belonging to the same upper group but not to the same leaf group are considered less related, and thus the function ω in (1) returns a smaller value for them. For example, in Figure 4, photos in the leaf group 2009-June-9 have a similarity weight of 10. A photo pair with one photo in the 2009-June-15 group and the other in the 2009-June-9 group could have a similarity weight of 5, and a pair with one photo in 2009-June-15 and the other in 2009-July-13 could have a similarity weight of 2.
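As a rough illustration of how (1) could be evaluated for the time attribute, the following Python sketch uses the example weights 10, 5 and 2 from the text for ω and the year-equality measure from Table I for f. The function names, the day/month/year levels of the hierarchy, the value 0 for photos that share no group, and the reading of the "max" subscript in Table I as the larger set size are all assumptions made for this example; the paper does not specify them.

```python
from datetime import date


def f_year(d1: date, d2: date) -> float:
    """Table I measure for the time attributes: Y1 == Y2 ? 1 : 0."""
    return 1.0 if d1.year == d2.year else 0.0


def omega_time(d1: date, d2: date) -> float:
    """Hypothetical omega: larger when two photos share a lower group.

    The values 10 / 5 / 2 follow the example in the text; the three-level
    day < month < year hierarchy is an assumption of this sketch.
    """
    if d1 == d2:                                    # same leaf (day) group
        return 10.0
    if (d1.year, d1.month) == (d2.year, d2.month):  # same month group
        return 5.0
    if d1.year == d2.year:                          # same year group
        return 2.0
    return 0.0                                      # no shared group (assumed)


def time_edge_weight(d1: date, d2: date) -> float:
    """Edge weight w_ij = omega(A_i, A_j) * f(A_i, A_j), as in (1)."""
    return omega_time(d1, d2) * f_year(d1, d2)


def f_overlap(s1: set, s2: set) -> float:
    """Set measure from Table I, read here as |S1 ∩ S2| / max(|S1|, |S2|).

    Interpreting the 'max' subscript as the larger set size is an assumption.
    """
    largest = max(len(s1), len(s2))
    return len(s1 & s2) / largest if largest else 0.0
```

With these definitions, two photos taken on 2009-06-09 get weight 10, a 2009-06-15 / 2009-06-09 pair gets 5, and a 2009-06-15 / 2009-07-13 pair gets 2, matching the example in the text.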

Figure 4. Example of Time Groups Hierarchy

After calculating the time relation between all photos, we obtain a semantic relation graph for the time attribute similar to Figure 5. In Figure 5, each node $P_i$ represents a photo in the system, and the weight $w_i$ of an edge is the semantic similarity calculated from the time attribute of the two photos it connects. As Figure 5 shows, the single dimensional relation graph consists of several fully connected sub-graphs, because not all photos in the system are related in the time dimension. Each sub-graph can be stored in a matrix like

$$S_{time\_i} = \begin{bmatrix} 0 & w_1 & \cdots & w_j \\ w_1 & 0 & \cdots & w_k \\ \vdots & \vdots & \ddots & \vdots \\ w_j & w_k & \cdots & 0 \end{bmatrix} \qquad (3)$$

where $S_{time\_i}$ is the $i$th sub-graph of the time relation graph and $w$ is the weight of the edges between nodes in the graph, calculated using (1).

Figure 5. Example of Semantic Relation Graph for Time Attribute

B. Full Semantic Relation Graph Generation

After calculating the single dimensional relation graphs for all metadata attributes, we have a matrix similar to (3) for each attribute. We then sum all the matrices to form a relation matrix for all photos in the system:

$$S_n = \begin{bmatrix} 0 & W_1 & \cdots & W_n \\ W_1 & 0 & \cdots & W_k \\ \vdots & \vdots & \ddots & \vdots \\ W_n & W_k & \cdots & 0 \end{bmatrix} \qquad (4)$$

where $S_n$ is the relation matrix for all photos in the system, and each $W$ is calculated by summing the corresponding weights found in the single dimensional relation graphs. Note that $W$ here agrees with (2), where the weight of an edge is the sum of all corresponding edges in the single dimensional relation graphs. $S_n$ is used to store the full semantic relation graph we want to generate. An example of such a graph is given in Figure 6.
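The summation that produces (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the prototype's code: each single dimensional graph is held as a dictionary mapping an unordered pair of photo IDs to its weight, and the function name `fuse_graphs` and the sample weights are invented for the example.

```python
from collections import defaultdict


def fuse_graphs(graphs):
    """Sum edge weights across single dimensional graphs, as in (2) and (4).

    Each graph maps frozenset({id_a, id_b}) -> weight, so an edge has the
    same key regardless of direction (the graph is undirected).
    """
    fused = defaultdict(float)
    for graph in graphs:
        for edge, weight in graph.items():
            # A missing edge starts at 0; an existing edge accumulates.
            fused[edge] += weight
    return dict(fused)


# Hypothetical single dimensional graphs for three photos.
time_graph = {frozenset({1, 2}): 10.0, frozenset({1, 3}): 5.0}
author_graph = {frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0}

full_graph = fuse_graphs([time_graph, author_graph])
```

A sparse dictionary is used here instead of a dense matrix only to keep the sketch short; the entries of `full_graph` correspond to the off-diagonal values $W$ in (4).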

Figure 6. Example of Full Semantic Relation Graph

IV. MAKING RECOMMENDATIONS

This section explains how recommendations are made in the proposed system. After calculating the semantic relation graph as described in the previous section, we save the graph in the index database. Each photo's unique ID in the system is used as an identifier for the corresponding node in the graph. The process is illustrated in Figure 7.

Figure 7. Example of Photo_id-Graph_node Mapping [Graph of nodes P1 to P7 connected by weighted edges.]

We make recommendations based on what a user is viewing, so the input is the ID of the photo that the user is currently viewing. We find the node in the semantic relation graph corresponding to the given photo ID and mark it as a temporary root. We can then find the list of photo IDs whose corresponding nodes are connected to the root, and sort this ID list in decreasing order of the weight of the edge connecting each node to the root. Recall that the weight of an edge in the graph represents the similarity of the two photos. The sorted ID list is therefore a similarity ranking list, and our recommendations are taken from the top of the list to the bottom.

For instance, in Figure 7, if a user is viewing photo 1, we first map photo 1 to node P1 in the semantic graph. We then find that P2, P3, P4 and P6 are connected to P1, so the list of photo IDs is 2, 3, 4, 6. Next we sort the list by the weight of the edges connecting these nodes to P1. If we assume w5 > w4 > w6 > w13 (the weights of the edges from P1 to P3, P2, P4 and P6, respectively), we obtain the similarity ranking list 3, 2, 4, 6. According to this ranking list, we make recommendations in the order of photo IDs 3, 2, 4 and 6.

V. PERFORMANCE EVALUATION

In order to evaluate the performance of the prototype implementation, we conducted tests on a data set collected in our internal software trial, from which more than 5000 photos were selected and 8 kinds of metadata attributes were considered. Precision and recall are the most popular metrics for evaluating information retrieval systems, and they are used for evaluating recommender systems too [7]. Precision is defined as the ratio of the number of relevant items selected to the number of items selected, shown in (5) [8]. Precision represents the probability that a selected item is relevant.

$$P = \frac{N_{rs}}{N_s} \qquad (5)$$

where $N_{rs}$ is the number of truly relevant items recommended by the system, and $N_s$ is the number of items recommended by the system. Recall, shown in (6), is defined as the ratio of the number of relevant items selected to the total number of relevant items available. Recall represents the probability that a relevant item is selected.

$$R = \frac{N_{rs}}{N_r} \qquad (6)$$

where $N_{rs}$ is the number of truly relevant items recommended by the system, and $N_r$ is the number of relevant items in the system.

Table II shows our test results for precision and recall, using different numbers of photos as the input data set. Precision is 0.786 and recall is 0.815 even for the smallest data set of 50 photos, and both improve quickly as the number of input photos increases, reaching 0.925 and 0.942 at 5000 photos.

TABLE II. SUMMARY OF PRECISION AND RECALL RESULTS

Number of photos | Precision | Recall
-----------------+-----------+-------
50               | 0.786     | 0.815
100              | 0.846     | 0.898
200              | 0.886     | 0.918
500              | 0.901     | 0.938
1000             | 0.908     | 0.939
2000             | 0.920     | 0.941
5000             | 0.925     | 0.942
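The ranking step of Section IV and the metrics in (5) and (6) can be sketched together in Python. The adjacency data below covers only the P1 neighborhood of the Figure 7 example, with made-up numeric weights chosen to satisfy w5 > w4 > w6 > w13; the function names and the set of "relevant" photos are hypothetical and serve only to exercise the formulas, not to reproduce the results in Table II.

```python
def recommend(graph, photo_id):
    """Return the neighbor IDs of photo_id sorted by decreasing edge weight."""
    neighbors = graph.get(photo_id, {})
    return sorted(neighbors, key=lambda pid: neighbors[pid], reverse=True)


def precision_recall(recommended, relevant):
    """Compute P = Nrs / Ns and R = Nrs / Nr, as in (5) and (6)."""
    n_rs = len(set(recommended) & set(relevant))  # relevant and recommended
    precision = n_rs / len(recommended) if recommended else 0.0
    recall = n_rs / len(relevant) if relevant else 0.0
    return precision, recall


# Adjacency for node P1 only; the weights are invented so that
# w5 (P1-P3) > w4 (P1-P2) > w6 (P1-P4) > w13 (P1-P6).
graph = {1: {2: 0.7, 3: 0.9, 4: 0.5, 6: 0.2}}

ranking = recommend(graph, 1)   # photo 1 is the temporary root
top2 = ranking[:2]              # recommend only the two most similar photos
p, r = precision_recall(top2, relevant={3, 4, 6})  # hypothetical ground truth
```

Here `ranking` reproduces the order 3, 2, 4, 6 from the worked example. Truncating the list to the top two entries shows the usual trade-off: one of the two recommended photos is relevant, while only one of the three relevant photos is recovered.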


Precision and recall from the test results are also plotted in Figure 8 and Figure 9, from which we can see that the performance of the proposed system stays high and stable when dealing with a large number of photos.

Figure 8. Tests Results Precision [Plot of precision (0 to 1) against Number of Photos (0 to 5500).]

Figure 9. Tests Results Recall [Plot of recall (0 to 1) against Number of Photos (0 to 5500).]

VI. CONCLUSION AND FUTURE WORK

In this paper, we propose a content-based recommendation system based on a semantic relation graph. First, for each item metadata attribute, we group items into collections to form a single-attribute similarity hierarchy, and create a single dimensional relation graph on the basis of that hierarchy. Second, we superimpose all single dimensional relation graphs into a full semantic relation graph, which describes the metadata semantic relations of all items. Third, we explain how we make recommendations based on the full semantic relation graph and give an example to illustrate the process. We have implemented a prototype of the proposed recommendation system in the Kaleido Photo project. Test results show that the graph indexing approach can quickly and efficiently recommend relevant photos based on the photo a user is viewing.

The graph indexing approach is also feasible for other recommendation systems, such as those used by online video sharing sites like YouTube [9] and online book stores like Amazon [10]. A potential risk of the approach is that, for very large item sets, generating the semantic graph index might be slow because it requires a large amount of CPU time and memory bandwidth. A distributed computing environment such as the Hadoop project [11] is considered a good solution, and may be the next step of the Kaleido Photo project.

ACKNOWLEDGMENT

This paper is partly supported by the BUPT-NOKIA Joint Project, the National High-Tech Research and Development 863 Program of China (2007AA01Z206, 2009AA01Z210), and the National Basic Research 973 Program of China (2009CB320504).

REFERENCES

[1] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering. Piscataway, NJ, USA, 2005, pp. 734-749.
[2] R.J. Mooney and L. Roy. Content-Based Book Recommending Using Learning for Text Categorization. Proceedings of the SIGIR-99 Workshop on Recommender Systems: Algorithms and Evaluation. Berkeley, CA, USA, 1999.
[3] S. Berretti, A.D. Bimbo, and E. Vicario. Efficient Matching and Indexing of Graph Models in Content-Based Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. Washington, USA, 2001, pp. 1089-1105.
[4] Y. Wang, N. Stash, L. Aroyo, L. Hollink and G. Schreiber. Semantic Relations in Content-based Recommender Systems. http://www.cs.vu.nl/~guus/papers/, 2009.
[5] X. Yang, Y. Tian, T. Peng, C. Chen, J. Ma. Semantic-Based Graph Index for Mobile Photo Search. The 2nd International Workshop on Education Technology and Computer Science, ETCS 2010, in press.
[6] S. Debnath, N. Ganguly and P. Mitra. Feature Weighting in Content Based Recommendation System Using Social Network Analysis. WWW 2008. Beijing, China. 2008.
[7] Z. Zaier, R. Godin and L. Faucher. Evaluating Recommender Systems. International Conference on Automated solutions for Cross Media Content and Multi-channel Distribution. 2008.
[8] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, Vol. 22, pp. 5-53, 2004.
[9] http://www.youtube.com/
[10] http://www.amazon.com/
[11] http://hadoop.apache.org/
