A Semantic Framework for Personalized Ad Recommendation based on Advanced Textual Analysis

Dorothea Tsatsou, Fotis Menemenis, Ioannis Kompatsiaris
Informatics and Telematics Institute, Centre for Research and Technology Hellas
6th km Charilaou-Thermi Road, 57001 Thermi, Thessaloniki, Greece
{dorothea, fotis, ikom}@iti.gr

Paul C. Davis
Motorola Applied Research & Technology Center
1295 E. Algonquin Rd., Schaumburg, IL 60196, USA
[email protected]

ABSTRACT
In this paper we present a hybrid recommendation system that combines ontological knowledge with content-extracted linguistic information, derived from pre-trained lexical graphs, in order to produce high-quality, personalized recommendations. In the described approach, such recommendations are exemplified in an advertising scenario. We propose a distributed system architecture that uses semantic knowledge, based on terminologically enriched domain ontologies, to learn ontological user profiles and consequently infer recommendations through fuzzy semantic reasoning. A real-world user study demonstrates the improvements attained in providing user-relevant recommendations with the aid of semantic profiles.
Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software – Distributed systems.

General Terms
Algorithms.

Keywords
Ad recommendation, ontology population, lexical graph, ontological user profile, fuzzy reasoning.
1. INTRODUCTION
Personalized recommendation systems aim to present users with online content tailored to each user's specific interests. The personalization of ad recommendation has received particular attention in recent years, since the internet business model relies heavily on advertising. Personalizing ad recommendation, however, is particularly challenging: the scarcity and brevity of the text and metadata accompanying ads leave recommenders with insufficient criteria to filter ads adequately.

Our choice of ontological user profiles in the recommendation process aims to remedy the vocabulary impedance problem [1] as well as the cold-start problem in recommender systems [2]. The first problem is addressed by extracting semantic metadata from textual content and expressing them in a uniform, machine-understandable vocabulary via domain reference ontologies. The process of classifying text to ontology concepts, formulated in semantic axioms, involves a novel combination of linguistic analyses through the use of lexical graphs. The latter problem is thus alleviated by the fundamental domain information provided in the reference ontologies. The effectiveness of this system depends on the advanced textual analysis used to terminologically enrich the semantic knowledge. We extend a previously implemented approach by [3], where graphs carrying information about interrelated domain terms were employed for content-based recommendation in order to ameliorate the vocabulary impedance problem.

The use of explicit domain knowledge, represented by an ontology or taxonomy, to improve recommendation accuracy and completeness has been explored in the past. The population techniques in this paper adopt the methodology proposed in [4] and [2], where concept vectors are constructed for each ontology concept by indexing a training set of web pages. In these approaches, the terms in the vectors are essentially a bag-of-words, with no accounting for relations between them. The aforementioned techniques place the load directly on the server and refrain from utilizing much of the formal semantics offered by ontologies, possibly due to performance limitations. However, advanced inference capabilities through reasoning over formal semantics have been explored in [5], with acceptable performance. Our approach extends this technique to fuzziness, enabling us to handle fuzzy annotation and user preference weights.
2. PROPOSED APPROACH
This paper discusses techniques which combine semantic knowledge with statistical terminological information, unobtrusively extracted from the content a user consumes, in order to match and rank advertisements against the interest scores in the user's semantic profile. A typical usage scenario is as follows: the user consumes a content item (ad, article, annotated video, short text). The item's textual data are analyzed in order to extract its semantic information, based on domain ontologies that have been terminologically enriched in an offline process. This information translates into a set of user preferences, which are captured in the semantic user profile through an automated procedure. User preferences are then matched semantically against a set of supplied, automatically annotated ads to determine whether to recommend an ad, as well as the degree of confidence that the ad is useful to the user. The match confidence degree is used to rank the recommended ads to achieve more accurate recommendations. The framework's architecture is depicted in Figure 1.
Figure 1. Overview of the Ad Recommendation System
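To make the flow above concrete, the following is a minimal end-to-end sketch of the usage scenario. All names and the toy matching logic are illustrative stand-ins, not the system's actual components, which are detailed in Sections 3 and 4.

```python
# A minimal sketch of the usage scenario; all names and the toy matching
# logic are illustrative, not the system's actual components.

def classify_text(text, concept_vectors):
    """Stub for Section 4.1: map the item's text to weighted ontology concepts."""
    return {"Striker": 0.8, "ManchesterUnitedFC": 0.6}   # hypothetical output

def update_profile(profile, classified):
    """Stub for Section 4.3: fold classified concepts into the user profile."""
    for concept, weight in classified.items():
        profile[concept] = max(profile.get(concept, 0.0), weight)

def match_degree(profile, ad_annotation):
    """Stub for Section 4.4: fuzzy match confidence between profile and ad."""
    shared = [min(profile[c], w) for c, w in ad_annotation.items() if c in profile]
    return max(shared) if shared else 0.0

profile = {}
update_profile(profile, classify_text("article about a striker...", {}))
ads = [("ad-1", {"Striker": 0.9}), ("ad-2", {"Referee": 0.7})]
ranked = sorted(((ad, match_degree(profile, ann)) for ad, ann in ads),
                key=lambda pair: pair[1], reverse=True)
# -> [('ad-1', 0.8), ('ad-2', 0.0)]; only ads with a positive degree are shown
```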
3. ONTOLOGY POPULATION WITH LEXICAL GRAPHS
Populating an ontology with statistical contextual information is performed in an offline training process which involves the construction of at least one lexical graph per topic, based on linguistic analysis over a large set of web corpora relevant to the domain in question, and requires at least one reference ontology per domain.
3.1 Ontological Knowledge Modeling
The approach does not require extensive ontologies or mapping every detail of the domain. For recommendation purposes, it suffices to identify the domain of the specific recommendation problem and model (or reproduce from existing ontologies) the basic information. The expressivity of the ontologies supported by our system rests within the DLP (Description Logic Programs) fragment as defined in [6]. An ontology O consists of a set of concepts C and a set of roles R, and includes axioms that comprise semantic rules in accordance with the ontology's expressivity. Figure 2 illustrates a portion of an example ontology for the soccer domain.
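For illustration, such a lightweight soccer ontology might contain axioms like the following. These are hypothetical examples consistent with the DLP fragment's expressivity, not an excerpt of the actual ontology:

$$\mathit{Striker} \sqsubseteq \mathit{Player}, \qquad \mathit{Player} \sqsubseteq \exists\mathit{playsFor}.\mathit{Team}, \qquad \mathit{hasCoach} \sqsubseteq \mathit{hasStaff}$$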
Figure 2. Extract of an example soccer ontology
3.2 Lexical Graph Creation
The lexical graph creation and update process follows the principles in [3]: the graphs used within our system have the form of a network of connected words (terms) and are progressively built up by processing textual content found on the web. The basic elements of this model, denoted by G, are the set of graph nodes (or vertices) V and the set of graph edges E connecting pairs of nodes, so that in short G ≡ {V, E}. Each node t_k ∈ V in the graph is thus connected to a set of correlating nodes {t_1, t_2, ..., t_n} ∈ V that comprise that term's graph neighborhood. After training the graph as described in [3], a set of tokens is derived. Each token, denoted a lemma, corresponds to a graph node and is assigned attributes such as the lemma's part-of-speech (POS) tag, and statistical information such as its term frequency (tf) and node degree (deg), i.e. the number of its neighbors. The edges of the graph carry a degree denoting the co-occurrence (cooc) of a pair of terms in the same sentence.
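The following is a minimal in-memory sketch of such a graph. The attribute names mirror the description above; the update routine is a simplification of the training process in [3] (for instance, POS tagging is stubbed out):

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Minimal in-memory form of the lexical graph G = {V, E}.

@dataclass
class Lemma:                                   # a node t_k in V
    pos: str = "NN"                            # part-of-speech tag (simplified)
    tf: int = 0                                # term frequency in the corpus
    neighbors: Dict[str, int] = field(default_factory=dict)  # edges: term -> cooc

    @property
    def deg(self) -> int:                      # node degree: number of neighbors
        return len(self.neighbors)

graph: Dict[str, Lemma] = {}

def observe_sentence(terms: List[str]) -> None:
    """Update tf for each term and cooc for each pair in the same sentence."""
    for t in terms:
        graph.setdefault(t, Lemma()).tf += 1
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            if t1 != t2:
                graph[t1].neighbors[t2] = graph[t1].neighbors.get(t2, 0) + 1
                graph[t2].neighbors[t1] = graph[t2].neighbors.get(t1, 0) + 1

observe_sentence(["rooney", "scored", "for", "united"])
```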
3.3 Populating the Ontology
The topic graphs serve as enriched dictionaries for each domain and enable us to define contextual relations between the concepts and the terms of the domain. These relations are used to classify text to ontology concepts. Hence, each ontology concept c_i ∈ C is terminologically mapped, where possible, to the lemma in the graph bearing the same string. Each concept can then be enriched by a vector c_v = ⟨t_1·w_1, t_2·w_2, ..., t_n·w_n⟩ of weighted terms, where each term t_i ∈ V is in the graph neighborhood of the term mapped to concept c_i, and each weight w_i ∈ E represents the cooc of the neighboring term with the term mapped to c_i.

However, matching absolute strings to graph nodes leads to a loss of valuable information, dispersed among the variations and synonyms of terms represented by different lemmas in the graph. Therefore, we employ a series of linguistic analyses to detect and bring together information on closely related or identical data.

Wikipedia Named Entity Normalization (NEN). This mechanism is used to group the neighborhoods of all variations of a named entity in the ontology under a single reference name. We accept the Wikipedia representation of a NE as the reference name for that entity and assume that all NEs in the ontology are expressed in, or can be converted to, such a reference name. For example, the terms "Man Utd" and "MUFC" are variations of the English soccer team Manchester United FC. If some or all of these terms are present as separate lemmas in the graph, it is essential that all the neighbors of these variations are merged under a common neighborhood in order to achieve richer and more consistent concept vectors.

We attempt to retrieve all variations of a single NE from a token list extracted from a set of texts relevant to the NE. The texts comprise the top N results returned from querying the reference name on a search engine. Tokenization of the texts retrieves a set of terms (referred to as variation candidates) that are then further filtered in order to discard linguistically incompatible terms. The heuristic filter compares the order of letters in a retrieved candidate against the order of letters in the wiki-reference name, as sketched below. For example, the token "MUFC" contains the letters m, u, f and c in the same order as the reference name "Manchester United FC" and is therefore accepted, while "Featured Comic Strips" contains letters found in the reference name but in the wrong order and is therefore rejected. Remaining candidates are then queried against Wikipedia. A candidate is finally accepted as a valid variation of the concept only if the query redirects to (a) a page whose header is the original reference name, or (b) a Wikipedia disambiguation page that contains a hyperlink to the original reference name in its body text.
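A plausible reading of the letter-order heuristic is a subsequence check over the letters of the two strings, as in this sketch (the helper names are ours):

```python
# Letter-order filter for variation candidates: a candidate is kept only if
# its letters occur in the reference name in the same relative order.

def letters(s: str) -> str:
    """Lowercase letters of s, ignoring spaces and punctuation."""
    return "".join(ch for ch in s.lower() if ch.isalpha())

def is_order_compatible(candidate: str, reference: str) -> bool:
    """True if the candidate's letters form a subsequence of the reference's."""
    ref = letters(reference)
    pos = 0
    for ch in letters(candidate):
        pos = ref.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True

assert is_order_compatible("MUFC", "Manchester United FC")                       # accepted
assert not is_order_compatible("Featured Comic Strips", "Manchester United FC")  # rejected
```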
WordNet-based Synonym Detection. Similarly, we attempt to identify and merge neighborhoods of semantically identical nouns in the graph. For this purpose, all WordNet synonyms of a reference concept are retrieved and classified as variations of that term. The initial concept vector is updated with the terms in the neighborhoods of all the synonyms retrieved from the graph.

Neighborhood Selection. All graph nodes representing variations (or the reference name) of a concept c_i are assembled in a single vector c_v, along with the terms in their joint neighborhood as formed after the abovementioned analysis. A lower deg threshold is employed to identify the most common terms of the topic and prune them from the concept's neighborhood. The lower deg threshold formula is given in (1).
$$L\mathit{deg}_G = \mathrm{avg}(\mathit{deg}_{t \in V}) + 3 \cdot \mathrm{stdDev}(\mathit{deg}_{t \in V}) \qquad (1)$$

where LdegG is the lower deg threshold, avg(deg_{t∈V}) is the average deg over all terms in the graph and stdDev(deg_{t∈V}) is the standard deviation of the deg over all terms in the graph. Similarly, to avoid circumstantial co-occurrence between two terms, terms whose cooc is lower than the average cooc of the examined concept's neighborhood are discarded.

Weighting. All terms in the concept vector are assigned a weight w(n_i) ∈ [0,1], n_i being a term in the vector, representing the confidence degree with which the term describes the concept. All NE variations are assigned the maximum participation degree, i.e. 1. All synonyms of a noun or adjective are assigned 0.9, to allow some uncertainty with respect to the sense of the particular synonym. All other neighborhood terms in the vector are assigned the normalized cooc degree between the neighboring term and the concept's name variation term. The normalized weight is calculated with formula (2). If a term appears in the neighborhood of two or more variations, the maximum of the normalized term degrees is retained.

$$w(n_i) = \frac{\mathit{cooc}(t, n_i)}{\mathit{tf}(t)} \qquad (2)$$

where n_i is adjacent to the root lemma t, cooc(t, n_i) is the co-occurrence degree between t and n_i, and tf(t) is the tf of the root lemma t.
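A sketch of pruning and weighting under formulas (1) and (2) follows. It assumes a plain dictionary form of the graph and reads formula (1) as an upper bound on node degree (terms above it are pruned as too common); both assumptions are ours:

```python
from statistics import mean, stdev

# `graph` maps each lemma to {"tf": int, "neighbors": {term: cooc}}.

def degree_threshold(graph: dict) -> float:
    degrees = [len(node["neighbors"]) for node in graph.values()]
    return mean(degrees) + 3 * stdev(degrees)            # formula (1)

def concept_vector(graph: dict, variations: list) -> dict:
    """Joint weighted neighborhood of a concept's name variations."""
    limit = degree_threshold(graph)
    vector = {v: 1.0 for v in variations}                # NE variations -> 1.0
    for v in variations:
        node = graph.get(v)
        if not node or not node["neighbors"]:
            continue
        avg_cooc = mean(node["neighbors"].values())
        for term, cooc in node["neighbors"].items():
            deg = len(graph.get(term, {"neighbors": {}})["neighbors"])
            if deg > limit or cooc < avg_cooc:           # prune common/incidental terms
                continue
            w = min(cooc / node["tf"], 1.0)              # formula (2), kept in [0,1]
            vector[term] = max(vector.get(term, 0.0), w) # max across variations
    return vector
```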
4. THE RECOMMENDATION SYSTEM
The recommendation system uses the concept vectors derived from the previous step to discern the semantics of user-consumed content. The individual components that exploit these semantics to produce a set of ranked recommendations are described below.
4.1 Mapping Text to Concepts
The pre-trained concept vectors are used to classify ads and user-consumed content to ontology concepts. A vector of terms is retained for each individual content item or advertisement. Each term is assigned a tf value proportional to its appearances in the text, which represents the participation weight of that term in the content item or ad. The classification process is based on a look-up scheme in which concept mappings for the extracted text terms are retrieved, i.e. each extracted term is looked up, via an inverted index, in all the concept vectors. The weight w_i of term t_i in the concept vector and its tf in the text determine the participation weight w'_i = w_i · tf of the concept c_i in the content. If the same concept emerges more than once in a single content item, the final weight is the maximum over all occurrences of the concept in the content. After all concept vectors are examined, the set of retrieved concepts and their resulting weights constitutes the classification set {c_1·w'_1, c_2·w'_2, ..., c_n·w'_n} for the particular content item or ad.
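A minimal sketch of this look-up classification, assuming whitespace tokenization for brevity:

```python
from collections import defaultdict

# Inverted index from terms to (concept, weight) pairs, queried with the
# item's tf-weighted terms.

def build_index(concept_vectors: dict) -> dict:
    """{concept: {term: w}} -> {term: [(concept, w), ...]}"""
    index = defaultdict(list)
    for concept, vec in concept_vectors.items():
        for term, w in vec.items():
            index[term].append((concept, w))
    return index

def classify(text: str, index: dict) -> dict:
    """Return the classification set {c_i: w'_i} with w'_i = w_i * tf."""
    terms = text.lower().split()
    tf = {t: terms.count(t) / len(terms) for t in set(terms)}
    result: dict = {}
    for term, freq in tf.items():
        for concept, w in index.get(term, []):
            # keep the maximum if the same concept emerges more than once
            result[concept] = max(result.get(concept, 0.0), w * freq)
    return result
```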
4.2 Automatic Advertisement Annotation
Raw ads can be annotated semantically based on the aforementioned process. For each concept in the classification set of the ad, a unique instance of the concept is automatically generated, i.e. atom_i : c_i, c_i ∈ C. A reasoner-induced semantic processing step then determines whether some ontology property might quantify each concept, based on the range and sub-domain of the property. Such properties are assigned connecting ground values, thus relating a unique atom to the atom of the instance of the concept quantified by that property, i.e. (atom_i, atom_j) : r_j, r_j ∈ R.
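A rough sketch of this annotation step follows. In the real system property applicability is decided by reasoning over the ontology; here it is reduced to a plain domain/range lookup table, which is our simplification:

```python
from itertools import count

# One instance per classified concept, then property assertions between
# instances whose concepts fit a property's (domain, range) pair.

_ids = count()

def annotate(classification: dict, properties: dict):
    """classification: {concept: weight}; properties: {role: (domain, range)}."""
    instances = {c: f"atom_{next(_ids)}" for c in classification}  # atom_i : c_i
    assertions = [
        (instances[dom], instances[rng], role)     # (atom_i, atom_j) : r_j
        for role, (dom, rng) in properties.items()
        if dom in instances and rng in instances
    ]
    return instances, assertions
```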
4.3 Semantic User Profile
User transactions with web content are tracked through a transaction listener. Refraining from exploring the non-trivial issue of tracking disinterests, we consider viewed content items and clicked advertisements as positive transactions, hence interests. Upon consumption, the textual content is classified and the emerging concepts are added to the user profile along with their classification weights. Concepts are incrementally appended to, eliminated from, or weight-modified in the profile. A semantic rule is created and updated for each user, expressed as ⊔_n ∃hasInterest.Interest_{i∈n} ⊑ Profile, where ⊔_n denotes the disjunction of all preferences n in the user's profile. Concepts are existentially quantified with general descriptor properties of the ontology (where applicable), based on the given property's range, forming complex concepts of the type ∃Property_j.Concept_i. Properties considered general descriptors are those not assigned to a specific sub-domain.

Preferences are depicted in the user profile as Profile Concepts, which may consist not only of individual ontology concepts, but also of persistent combinations of ontology concepts that represent occurrences of atomic concepts with strong correlation. Each concept combination is expressed as a conjunction of the individual correlating concepts in the user profile, such as ⊓_k Concept_{h∈k}. The combinations for each set of classified concepts are produced for every set of terms in a consumed item based on formula (3):

$$c_{total} = \sum_{i=1}^{N} C_{N,i} = \sum_{i=1}^{N} \frac{N!}{i!\,(N-i)!} \qquad (3)$$

where N is the number of individual concepts participating in the combination and c_total is the sum of all combinations C_{N,i}, i = 1, ..., N. Empirical results have demonstrated that N = 3 is sufficient to adequately represent plausible persistent combinations, as sketched below.
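A short sketch of the combination generation, reading N = 3 as a cap on combination size (our interpretation of the empirical setting):

```python
from itertools import combinations

# Every conjunction of up to n_max classified concepts is produced per
# consumed item, as counted by formula (3).

def concept_combinations(classified: set, n_max: int = 3) -> list:
    combos = []
    for size in range(1, min(n_max, len(classified)) + 1):
        combos.extend(combinations(sorted(classified), size))
    return combos

print(concept_combinations({"Striker", "ManchesterUnitedFC"}))
# [('ManchesterUnitedFC',), ('Striker',), ('ManchesterUnitedFC', 'Striker')]
```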
Profile concepts are updated upon each user transaction. The weights of the profile concepts are also decayed by a temporal factor, expressed in formula (4):

$$w' = \alpha \cdot f \cdot e^{-\frac{t - t_{last}}{\lambda}} \qquad (4)$$
where α is the last recorded normalized weight of the profile concept, f is the concept's frequency of appearance in the profile, t is the current timestamp, t_last is the timestamp of the last transaction involving this profile concept, and λ is the mean concept life (in milliseconds). This weight is then normalized to [0,1] with respect to the user's transaction total.
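A minimal sketch of the decay in formula (4), assuming the exponent is negative so that untouched preferences fade as t - t_last grows:

```python
import math

def decayed_weight(alpha: float, f: int, t: float, t_last: float,
                   mean_life: float) -> float:
    """w' = alpha * f * exp(-(t - t_last) / lambda); timestamps in ms."""
    return alpha * f * math.exp(-(t - t_last) / mean_life)

DAY_MS = 24 * 3600 * 1000
# A preference last reinforced 30 days ago, with a 60-day mean concept life:
w = decayed_weight(alpha=0.8, f=5, t=30 * DAY_MS, t_last=0, mean_life=60 * DAY_MS)
# w ~= 2.43 here; the system then normalizes it to [0,1] over the user's
# transaction total.
```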
4.4 Matchmaking
A semantic reasoning service is employed to match complex semantic user profiles against the extracted ad annotations. The devised reasoner, called f-PocketKRHyper¹, extends the Pocket KRHyper [5] mobile reasoner to fuzziness, thus supporting the management of uncertainty in annotations and user preferences. Consequently, by appending the user's implicit information, formulated with formal semantics, to a given domain ontology, f-PocketKRHyper can decide whether a given ad item matches the user profile and to what degree. In addition, the reasoner supports weighted concepts, based on Straccia's concept weight modifiers [7]; thus the participation degree of the produced entailments can be controlled by the confidence weight with which the user preferences participate in the profile.
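As a purely illustrative approximation (not the actual f-PocketKRHyper procedure, which performs model-generation reasoning over the ontology), the effect of weighted matching can be pictured as preference weights modifying the entailment degree, in the spirit of Straccia's weighted concepts [7]:

```python
# Illustrative fuzzy matching only: preference weights act as modifiers on
# the annotation degrees of shared concepts.

def fuzzy_match(profile: dict, ad_annotation: dict) -> float:
    """Both arguments map concepts to degrees in [0,1]; returns match degree."""
    shared = [pref_w * ad_annotation[c]          # weight modifies the degree
              for c, pref_w in profile.items() if c in ad_annotation]
    return max(shared, default=0.0)

degree = fuzzy_match({"Striker": 0.8}, {"Striker": 0.9, "Referee": 0.4})
# degree == 0.72; a zero degree means the ad is not recommended
```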
5. EXPERIMENTAL EVALUATION
For the implementation of the approach, a single ontology outlining the soccer domain was developed for proof-of-concept purposes, comprising 547 concepts and 18 roles. A lexical graph was trained for this topic on a manually collected dataset of web-crawled soccer articles and ads. A large corpus of additional web-crawled articles and ads was collected and used for testing. A user study was conducted with 38 users, chosen without any specific constraint, in order to evaluate the inter-dependent components of the overall system². A profile was learned per user after consumption of 20 soccer articles, and a set of ranked ads was recommended at the end; the users were asked to rate both. Results demonstrated an average success rate of 69.74% for the top 10 final recommendations and of 79.47% for the derived profile, while 75.26% was the average satisfaction score for the system's performance as a whole.

The users were also asked to rate ads presented to them during article viewing. The ads were provided by the reasoner-based recommender (RBR) described herein and by the content-based recommender (CBR) described in [3], while the users were unaware of the recommendation method. Results demonstrate that the RBR performs well from the beginning of the trials, achieving scores comparable to the straightforward CBR. Comparison of the two recommenders verifies that the RBR shows progressive and rather steady improvement as the user profile evolves and becomes more stable. The CBR's performance, on the other hand, fluctuates significantly, since it depends on successfully identifying ads relevant to the text; we also notice that its accuracy gradually decreases as the users come to expect targeted recommendations.
Figure 3. Rating (points) and Polynomial (order 6) Trendlines for the two Recommenders.
6. CONCLUSIONS
The work described in this paper has presented evidence that semantic knowledge, in combination with statistical terminological data captured in lexical graphs, can benefit the recommendation of content items expressed by text. Furthermore, it has confirmed the effectiveness of employing formal semantics that capture the underlying substance of user preferences to achieve richer and more meaningful recommendations, even when little information about the user is available. The methodology described allows for the recommendation of any kind of content item for which textual data exist, and for any domain for which semantic knowledge is available.
7. REFERENCES
[1] Ribeiro-Neto B., Cristo M., Golgher P.B., de Moura E.S. 2005. Impedance Coupling in Content-targeted Advertising. In Proceedings of the 28th Annual International ACM SIGIR Conference (SIGIR '05), pp. 496-503, ACM, New York.
[2] Sieg A., Mobasher B., Burke R. 2007. Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search. IEEE Intelligent Informatics Bulletin, Vol. 8, No. 1 (November 2007).
[3] Papadopoulos S., Menemenis F., Kompatsiaris Y., Bratu B. 2009. Lexical Graphs for Improved Contextual Ad Recommendation. In Proceedings of the 31st European Conference on Information Retrieval (Toulouse, France, April 2009).
[4] Trajkova J., Gauch S. 2003. Improving Ontology-Based User Profiles. M.S. Thesis, EECS, University of Kansas, August 2003.
[5] Kleemann T., Sinner A. 2005. User Profiles and Matchmaking on Mobile Phones. In Proceedings of INAP 2005, 16th International Conference on Applications of Declarative Programming and Knowledge Management, Fukuoka, 2005.
[6] Grosof B.N., Horrocks I., Volz R., Decker S. 2003. Description Logic Programs: Combining Logic Programs with Description Logic. In Proceedings of the 12th International Conference on the World Wide Web (WWW 2003).
[7] Straccia, U. 2005. Towards a Fuzzy Description Logic for the Semantic Web (Preliminary Report). In Proceedings of the 2nd European Semantic Web Conference (ESWC 2005).

¹ Initially developed within the aceMedia project. This work was partially supported by the European Commission under contract FP6-001765. http://www.acemedia.org/
² The authors thank George Kalfas of ITI for setting up and running the user trials in Greece and Mir Farooq Ali of Motorola for running the U.S. trials.